As part of our submission pipeline for the challenge, I’m trying to reproduce the results reported in the paper. I use AlexNet trained on the jigsaw pretext task with YFCC100M to extract features from VOC07, then train and test a linear SVM on VOC07 for image classification. I obtain the following results for features from different layers (my result vs. the result in Table 19 of the paper):
- conv3 = 47.64 vs 54.7
- conv4 = 48.6 vs 55.4
- conv5 = 43.67 vs 49.7
I see a consistent drop of roughly 6 to 7 points compared with the results in the paper. I’m using the default config file (alexnet_jigsaw_extract_features.yaml) provided in the repo, without any changes to the parameters. I’m also using the provided PARAMS_FILE for the corresponding pretext task.
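For reference, here is a minimal Python sketch of how the per-layer gaps work out from the numbers listed above. The variable names are only illustrative and are not part of the repo.

```python
# Scores per layer as (my run, paper Table 19), copied from the list above.
results = {
    "conv3": (47.64, 54.7),
    "conv4": (48.6, 55.4),
    "conv5": (43.67, 49.7),
}

# Absolute gap in points (paper minus my run), rounded to two decimals.
gaps = {layer: round(paper - mine, 2) for layer, (mine, paper) in results.items()}

for layer, gap in gaps.items():
    print(f"{layer}: {gap:.2f} points below the paper")
```

The gaps come out between about 6 and 7 points for all three layers, which is why I suspect a systematic difference in the setup rather than run-to-run noise.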
Can you please let us know whether this reduction is expected? If not, do you have any thoughts or suggestions on what I might be missing to reproduce the results from the paper?