Are the K=1 and K=3 predictions the first one and the first three of the six samples we submit for each test example? Or are they selected randomly?
Also, the evaluation criteria say that "The leaderboard will be sorted by minFDE at K=6. The rankings will be based on this metric too." However, the current ranking does not seem to be strictly sorted by minFDE at K=6. Are you using minADE instead?
Yep, we select the first and the first 3 of your predictions for K=1 and K=3.
And the final ranking will be based on minFDE, as mentioned in the evaluation criteria; the leaderboard will be updated shortly. Sorry for the confusion.
May I ask if the test set is selected to be more challenging than the validation set? For some reason our performance dropped ~30% from the validation set to the test set.
It’s not intended to be more challenging per se, but we do make sure that the test, train, and validation sets all come from different areas with no overlap. So yes, the test set can be more challenging in the sense that its distribution might not exactly match the training distribution, depending on the area of the city. A 30% drop does seem like a lot, though.
If you have any questions about the dataset, I’d encourage opening an issue on our GitHub repo https://github.com/argoai/argoverse-api. That way, other people using the dataset can benefit from the discussion as well.