Failed to reproduce benchmark results for Jigsaw-pretrained object detection

The paper states that a ResNet-50 model pretrained on the Jigsaw task with ImageNet-1K should score around 56.6 (as shown in Table 6).
We ran the fine-tuning three times on Detectron using the pretrained model (https://dl.fbaipublicfiles.com/fair_self_supervision_benchmark/models/detection/resnet50_jigsaw_in1k_pretext.pkl), and each time the results did not match the paper:

attempt 1: 44.137
attempt 2: 44.055
attempt 3: 45.638

We used the same configuration as the paper (https://github.com/facebookresearch/fair_self_supervision_benchmark/blob/master/configs/benchmark_tasks/object_detection_frozen/voc07/fast_rcnn_R-50-C4_with_ss_proposals_trainval.yaml).
We did not modify the configuration file in any way and ran on 2 GPUs.
We used all 5,011 VOC2007 trainval images for training and tested on the 4,952 VOC2007 test images.
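For reference, we launched the fine-tuning roughly like this. This is a sketch, not the exact command: tools/train_net.py with --cfg plus trailing config overrides is standard Detectron usage, but the output directory and the NUM_GPUS/TRAIN.WEIGHTS overrides below are placeholders for our setup.

# Hypothetical reconstruction of our launch, run from a Detectron checkout.
# Only the entry point and flag names are standard; paths are placeholders.
import subprocess

subprocess.run([
    "python2", "tools/train_net.py",
    "--cfg", "configs/benchmark_tasks/object_detection_frozen/voc07/"
             "fast_rcnn_R-50-C4_with_ss_proposals_trainval.yaml",
    "OUTPUT_DIR", "/storage/workspace/Network/models/resnet50_jigsaw_in1k",
    "NUM_GPUS", "2",
    "TRAIN.WEIGHTS", "/path/to/resnet50_jigsaw_in1k_pretext.pkl",
], check=True)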


Thanks for reporting! Can you check whether you are able to reproduce the numbers with a supervised model such as: https://dl.fbaipublicfiles.com/fair_self_supervision_benchmark/models/detection/resnet50_in1k_supervised.pkl

Hi! I’m on the same team as ‘iwill231’.

The experimental results are as follows.
I used the supervised model you mentioned (resnet50_in1k_supervised, for detection).

paper: 68.5 ± 0.3 (as shown in Table 6)
attempt: 48.935

During the experiment, the metrics were displayed as follows:

=== TRAIN PHASE ===
json_stats: {"accuracy_cls": "0.970703", "eta": "0:00:19", "iter": 29800, "loss": "0.230634", "loss_bbox": "0.155998", "loss_cls": "0.087899", "lr": "0.000200", "mb_qsize": 64, "mem": 3228, "time": "0.099779"}
json_stats: {"accuracy_cls": "0.972656", "eta": "0:00:17", "iter": 29820, "loss": "0.227868", "loss_bbox": "0.144723", "loss_cls": "0.075733", "lr": "0.000200", "mb_qsize": 64, "mem": 3228, "time": "0.099779"}
json_stats: {"accuracy_cls": "0.974609", "eta": "0:00:15", "iter": 29840, "loss": "0.262981", "loss_bbox": "0.184999", "loss_cls": "0.077793", "lr": "0.000200", "mb_qsize": 64, "mem": 3228, "time": "0.099779"}
json_stats: {"accuracy_cls": "0.974609", "eta": "0:00:13", "iter": 29860, "loss": "0.238943", "loss_bbox": "0.173168", "loss_cls": "0.075246", "lr": "0.000200", "mb_qsize": 64, "mem": 3228, "time": "0.099780"}
json_stats: {"accuracy_cls": "0.964844", "eta": "0:00:11", "iter": 29880, "loss": "0.294908", "loss_bbox": "0.180296", "loss_cls": "0.104424", "lr": "0.000200", "mb_qsize": 64, "mem": 3228, "time": "0.099780"}
json_stats: {"accuracy_cls": "0.976562", "eta": "0:00:09", "iter": 29900, "loss": "0.273847", "loss_bbox": "0.174627", "loss_cls": "0.078685", "lr": "0.000200", "mb_qsize": 64, "mem": 3228, "time": "0.099781"}
json_stats: {"accuracy_cls": "0.968750", "eta": "0:00:07", "iter": 29920, "loss": "0.289842", "loss_bbox": "0.194769", "loss_cls": "0.097236", "lr": "0.000200", "mb_qsize": 64, "mem": 3228, "time": "0.099781"}
json_stats: {"accuracy_cls": "0.976562", "eta": "0:00:05", "iter": 29940, "loss": "0.255976", "loss_bbox": "0.158928", "loss_cls": "0.072126", "lr": "0.000200", "mb_qsize": 64, "mem": 3228, "time": "0.099782"}
json_stats: {"accuracy_cls": "0.976562", "eta": "0:00:03", "iter": 29960, "loss": "0.209973", "loss_bbox": "0.150499", "loss_cls": "0.070146", "lr": "0.000200", "mb_qsize": 64, "mem": 3228, "time": "0.099783"}
json_stats: {"accuracy_cls": "0.957031", "eta": "0:00:01", "iter": 29980, "loss": "0.299155", "loss_bbox": "0.194809", "loss_cls": "0.098019", "lr": "0.000200", "mb_qsize": 64, "mem": 3228, "time": "0.099783"}
json_stats: {"accuracy_cls": "0.972656", "eta": "0:00:00", "iter": 29999, "loss": "0.280807", "loss_bbox": "0.190085", "loss_cls": "0.090866", "lr": "0.000200", "mb_qsize": 64, "mem": 3228, "time": "0.099784"}
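For comparing runs, here is a minimal sketch for pulling these json_stats lines back out of a training log; "train.log" is a placeholder path, and note that the numeric fields are logged as strings:

# Parse Detectron "json_stats:" lines from a training log into dicts.
import json

def read_json_stats(path):
    stats = []
    with open(path) as f:
        for line in f:
            if "json_stats: " in line:
                stats.append(json.loads(line.split("json_stats: ", 1)[1]))
    return stats

# Print the last few iterations for a quick side-by-side check.
for s in read_json_stats("train.log")[-3:]:
    print(s["iter"], float(s["loss"]), float(s["accuracy_cls"]))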

=== TEST PHASE ===
INFO json_dataset_evaluator.py: 251: ~~~~ Summary metrics ~~~~
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.172
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.416
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.112
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.015
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.080
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.228
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.222
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.276
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.278
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.026
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.149
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.357
INFO json_dataset_evaluator.py: 218: Wrote json eval results to: /storage/workspace/Network/models/resnet50_in1k_supervised/test/voc_2007_final/ResNet50_fast_rcnn/detection_results.pkl
INFO task_evaluation.py: 62: Evaluating bounding boxes is done!
INFO task_evaluation.py: 185: copypaste: Dataset: voc_2007_final
INFO task_evaluation.py: 187: copypaste: Task: box
INFO task_evaluation.py: 190: copypaste: AP,AP50,AP75,APs,APm,APl
INFO task_evaluation.py: 191: copypaste: 0.1718,0.4157,0.1123,0.0146,0.0799,0.2281
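To aggregate the final numbers across attempts, a small sketch that reads the last "copypaste:" header/values pair from each evaluation log (the log file names are placeholders):

# Collect the final copypaste metrics (AP header + values) from an eval log.
def read_copypaste_metrics(path):
    with open(path) as f:
        lines = [l.split("copypaste: ", 1)[1].strip()
                 for l in f if "copypaste: " in l]
    header, values = lines[-2], lines[-1]  # "AP,AP50,..." then the numbers
    return dict(zip(header.split(","), map(float, values.split(","))))

for log in ["attempt1.log", "attempt2.log", "attempt3.log"]:
    print(log, read_copypaste_metrics(log)["AP50"])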


Thanks for responding. If the supervised models are not working either, then the problem is either on the user side or in the Detectron code itself.

Can you try training the end-to-end model without proposal files? That would correspond to this config file: https://github.com/facebookresearch/fair_self_supervision_benchmark/blob/master/configs/benchmark_tasks/object_detection_frozen/voc07/e2e_faster_rcnn_R-50-C4_trainval.yaml#L16

You can also replace the weights with the Detectron-provided weights: https://github.com/facebookresearch/Detectron/blob/master/MODEL_ZOO.md#imagenet-pretrained-models (see R-50.pkl).
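To help rule out a weight-loading mismatch, here is a rough sketch for diffing blob names between two such checkpoints. It assumes the usual Detectron layout (a pickled dict of blob name -> numpy array, sometimes nested under a "blobs" key); the local paths are placeholders.

# Diff blob names between two Detectron-style .pkl checkpoints.
import pickle

def load_blobs(path):
    with open(path, "rb") as f:
        data = pickle.load(f, encoding="latin1")  # py2-era pickle files
    return data.get("blobs", data)

a = load_blobs("resnet50_in1k_supervised.pkl")  # placeholder paths
b = load_blobs("R-50.pkl")
print("only in first :", sorted(set(a) - set(b)))
print("only in second:", sorted(set(b) - set(a)))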

Finally, if the above also doesn't work, try training the model from https://github.com/facebookresearch/fair_self_supervision_benchmark/blob/master/configs/legacy_tasks/object_detection_full_finetune/voc07/e2e_faster_rcnn_R-50-C4_trainval.yaml#L16, which is closer to the original Detectron setup.