Hello all, hope everyone is having a good time and actively pursuing your aspirations. In this post, I will discuss a technique called “Weighted Boxes Fusion” (WBF). Let’s get started!

In the previous article, we discussed the NMS and Soft-NMS techniques for filtering the predictions of object detection models. These techniques work well and remain the most widely used to date. However, they are designed for filtering the predictions of a single model. What if you have predictions from multiple models? Weighted boxes fusion is a novel method for combining the predictions of multiple object detection models.


Aiming for high accuracy leads to the creation of ensemble models (mostly in competitions), in which the predictions from different models must be merged into the final predictions. Weighted boxes fusion is used for merging the predictions from multiple models. Unlike NMS or Soft-NMS, which simply remove part of the predictions, WBF uses the confidence scores of all proposed bounding boxes to construct averaged boxes. This method does not discard any bounding box; instead, it combines them, which improves the quality of the combined predictions.

Comparison with NMS & Soft-NMS:

  • Both NMS and Soft-NMS filter the boxes by discarding those with low confidence scores, whereas WBF uses information from all the boxes.
  • WBF can improve the results when all the ensembled models predict inaccurate boxes, by averaging them. See the figure below for a better understanding.
Figure: Comparison of WBF and NMS

Weighted Boxes Fusion Algorithm:


Inputs:

  • List of box predictions from each model; each box is 4 numbers (x1, y1, x2, y2).
    This list has 3 dimensions: (models_number, model_preds, 4)
  • List of scores for each model
  • List of labels for each model
  • List of weights for each model. Default: weight == 1 for each model. (If more weightage should be given to a particular model, increase its weight to 2, 3, etc.)
  • IOU threshold (To check overlap)
  • Confidence threshold (to discard boxes of low confidence score)
  • Confidence type: Max or Avg (how to calculate the confidence of averaged boxes)


Outputs:

  • List of final boxes
  • List of final confidence scores
  • List of final labels
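To make the input format concrete, here is a hypothetical set of inputs for a two-model ensemble (all values and variable names are illustrative, not from the paper):

```python
# Hypothetical inputs for an ensemble of two models
# (normalized [x1, y1, x2, y2] coordinates; all names are illustrative)
boxes_list = [
    [[0.10, 0.10, 0.50, 0.50], [0.60, 0.60, 0.90, 0.90]],  # model 1: 2 boxes
    [[0.12, 0.11, 0.51, 0.49]],                            # model 2: 1 box
]
scores_list = [[0.9, 0.4], [0.8]]  # one confidence score per box
labels_list = [[0, 1], [0]]        # one class label per box
weights = [2, 1]                   # trust model 1 twice as much as model 2

# the boxes input has 3 dimensions: (models_number, model_preds, 4)
assert len(boxes_list) == len(scores_list) == len(labels_list) == len(weights)
```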


Create a dictionary that maps each label to the corresponding boxes from all the models, as shown below, and sort each list in the dictionary by score.

Boxes = {label_1: B_1, label_2: B_2, ... label_n: B_n},
where B = [[score1, x1, y1, x2, y2], [score2, x1, y1, x2, y2], ...]
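This grouping step can be sketched as follows. The helper below is my own (not the repo's API), assuming per-model flat lists of boxes, scores, and labels:

```python
from collections import defaultdict

def group_by_label(boxes_list, scores_list, labels_list):
    """Build {label: [[score, x1, y1, x2, y2], ...]}, each list sorted by score."""
    grouped = defaultdict(list)
    for boxes, scores, labels in zip(boxes_list, scores_list, labels_list):
        for box, score, label in zip(boxes, scores, labels):
            grouped[label].append([score] + list(box))
    for label in grouped:
        grouped[label].sort(key=lambda b: -b[0])  # highest confidence first
    return dict(grouped)

Boxes = group_by_label(
    [[[0, 0, 1, 1], [0, 0, 2, 2]]],  # one model, two boxes
    [[0.3, 0.7]],
    [[5, 5]],                        # both boxes have label 5
)
# Boxes == {5: [[0.7, 0, 0, 2, 2], [0.3, 0, 0, 1, 1]]}
```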

The algorithm below is applied to each label's list in the dictionary (B_1, B_2, ...):

  1. As established above, B_n is a list of boxes for label_n, sorted by confidence score.
  2. Declare empty lists L and F for boxes clusters and fused boxes, respectively. Each position in the list L can contain a set of boxes (or single box), which form a cluster; each position in F contains only one box, which is the fused box from the corresponding cluster in L.
  3. Iterate through the boxes b in B (b¹, b², b³, ...) and try to find a matching box in F. A match is found if the IOU of b with a box in F is greater than the threshold. Here b is a single box, whereas F can contain multiple boxes, so the IOU is calculated against every box in F, and the index (pos) at which b has the highest IOU is selected.
  4. If no match is found, i.e., if no box in F has an IOU greater than the threshold with b, then append b to F and L as new entries and proceed to the next box in B.
  5. If a match is found, take the index (pos) of the matched box in F and add b to L at that index. Then recalculate the box coordinates and confidence score in F[pos], using all T boxes accumulated in cluster L[pos], with the fusion formulas (1), (2), (3), (4), (5).
  6. After all boxes in B are processed, re-scale the confidence scores in F using formulas (6) and (7). Rescaling is necessary to give more weight to prominent boxes: if the number of boxes in a cluster is low, it may mean that only a small number of models actually predicted it, so the confidence should be decreased in that case.
Formulas (1)–(5): weighted box coordinates and confidence calculation
Formulas (6)–(7): re-scaling the confidence scores
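For reference, the fusion formulas from the WBF paper, written out for a cluster of T boxes accumulated across N models (confidence C, coordinates X1, X2, Y1, Y2):

```latex
C = \frac{1}{T}\sum_{i=1}^{T} C_i \quad (1)

X_{1,2} = \frac{\sum_{i=1}^{T} C_i \cdot X_{1,2,i}}{\sum_{i=1}^{T} C_i} \quad (2),(3)
\qquad
Y_{1,2} = \frac{\sum_{i=1}^{T} C_i \cdot Y_{1,2,i}}{\sum_{i=1}^{T} C_i} \quad (4),(5)

C \leftarrow C \cdot \frac{\min(T, N)}{N} \quad (6)
\qquad \text{or} \qquad
C \leftarrow C \cdot \frac{T}{N} \quad (7)
```

That is, the fused confidence is the average of the cluster's scores, the fused coordinates are confidence-weighted averages, and the final confidence is scaled down when fewer boxes than models landed in the cluster; the paper reports that both rescaling variants (6) and (7) work similarly in practice.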

Code module:

Below is a sample code module; find_matching_box() and get_weighted_box() are not shown here to keep it simple.

find_matching_box(): performs step 3 of the algorithm

get_weighted_box(): performs steps 5 & 6 of the algorithm

overall_boxes = []
for label in Boxes:
    boxes = Boxes[label]
    new_boxes = []       # clusters of original boxes (list L)
    weighted_boxes = []  # one fused box per cluster (list F)
    # Clusterize boxes
    for j in range(0, len(boxes)):
        index, best_iou = find_matching_box(weighted_boxes,
                                            boxes[j], iou_thr)
        if index != -1:  # Match found: add to the cluster and re-fuse it
            new_boxes[index].append(boxes[j])
            weighted_boxes[index] = get_weighted_box(new_boxes[index], conf_type)
        else:            # No match: start a new cluster with this box
            new_boxes.append([boxes[j].copy()])
            weighted_boxes.append(boxes[j].copy())
    # Rescale confidence based on number of models and boxes
    for i in range(len(new_boxes)):
        weighted_boxes[i][1] = weighted_boxes[i][1] * \
            len(new_boxes[i]) / weights.sum()
    overall_boxes.append(weighted_boxes)
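For completeness, here is a self-contained sketch of WBF for a single label, assuming equal model weights and the "Avg" confidence type. The helpers box_iou, fuse_cluster, and wbf_single_label are my own illustrative names, not the API of the authors' repository:

```python
import numpy as np

def box_iou(a, b):
    # a, b: [x1, y1, x2, y2]
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def fuse_cluster(cluster):
    # cluster: list of [score, x1, y1, x2, y2]
    # Confidence: average of scores; coordinates: confidence-weighted average
    scores = np.array([b[0] for b in cluster])
    coords = np.array([b[1:] for b in cluster])
    fused_coords = (scores[:, None] * coords).sum(axis=0) / scores.sum()
    return [float(scores.mean())] + fused_coords.tolist()

def wbf_single_label(boxes, iou_thr=0.55, n_models=1):
    # boxes: all models' boxes for one label, each [score, x1, y1, x2, y2]
    boxes = sorted(boxes, key=lambda b: -b[0])  # sort by score, descending
    clusters, fused = [], []                    # lists L and F from the article
    for b in boxes:
        pos, best_iou = -1, iou_thr
        for i, f in enumerate(fused):           # find the best match in F
            v = box_iou(b[1:], f[1:])
            if v > best_iou:
                pos, best_iou = i, v
        if pos == -1:                           # no match: start a new cluster
            clusters.append([b])
            fused.append(list(b))
        else:                                   # match: add to cluster, re-fuse
            clusters[pos].append(b)
            fused[pos] = fuse_cluster(clusters[pos])
    for i, c in enumerate(clusters):            # re-scale: C * min(T, N) / N
        fused[i][0] *= min(len(c), n_models) / n_models
    return fused

# Two models agree on one object; a lone low-overlap box stays separate
merged = wbf_single_label(
    [[0.9, 0, 0, 10, 10], [0.8, 1, 1, 11, 11], [0.7, 50, 50, 60, 60]],
    iou_thr=0.55, n_models=2)
# merged[0] fuses the first two boxes (score (0.9+0.8)/2 = 0.85);
# merged[1] keeps the lone box with score rescaled to 0.7 * 1/2 = 0.35
```

The lone box's confidence is halved because only one of the two models predicted it, which is exactly the penalty the rescaling step is meant to apply.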


WBF has applications in 2 major scenarios:

  • Ensemble models: predictions from different models on the same data.
    Example use-case: the Global Wheat Detection competition on Kaggle.
  • Predictions from a single model on augmented data.
    Example use-case: test-time augmentation (TTA) on a single model.

WBF gives better results than NMS & Soft-NMS in both cases.

  • WBF can also be used to ensemble manual labels from experts in medical applications.

Is WBF an alternative to NMS?:

Sadly, WBF gives inferior results to NMS in the single-model use-case. Here are the results from an ablation study conducted by the authors:

Model: RetinaNet detector with the ResNet152 backbone trained on the Open Images dataset.


  • NMS with default IoU threshold = 0.5 — mAP: 0.4902
  • WBF with optimal parameters — mAP: 0.4532 (IoU threshold = 0.43, score threshold = 0.21)

That’s all for now. I hope you got something out of it. The references are below; do check them out. Thank you.


  1. https://arxiv.org/abs/1910.13302
  2. https://github.com/ZFTurbo/Weighted-Boxes-Fusion

Weighted Boxes Fusion was originally published in Towards Data Science on Medium, where people are continuing the conversation by highlighting and responding to this story.