KILT - Wizard of Wikipedia failure due to test set inconsistency

Hi, I submitted to KILT - Wizard of Wikipedia earlier today and the submission failed.
The stdout shows:
Starting Evaluation…
Evaluating for phase dl-wow
WARNING: DIFFERENT SIZE gold: 2942 guess: 2944

However, the KILT GitHub page shows that the WoW test file has 2944 lines, not 2942.

The stderr shows:
Traceback (most recent call last):
File "/code/scripts/workers/", line 511, in run_submission,
File "/tmp/tmp97i6utq4/compute/challenge_data/challenge_689/", line 96, in evaluate
run_output = evaluate_run(test_annotation_file, user_submission_file)
File "/tmp/tmp97i6utq4/compute/challenge_data/challenge_689/", line 25, in evaluate_run
result = _calculate_metrics(gold_records, guess_records)
File "/tmp/tmp97i6utq4/compute/challenge_data/challenge_689/", line 115, in _calculate_metrics
), f"you should provide exactly one valid answer for {guess_item['id']}"
AssertionError: you should provide exactly one valid answer for 56765562-99cd-11ea-aa09-57c1ffc5a65b_1

Can you help check whether there is a problem with the test set on EvalAI's end? Thanks.

Never mind, it was an issue on my end.
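
For anyone who runs into the same size warning and assertion, a quick local check of the prediction file may help narrow things down before resubmitting. The sketch below is not the evaluator's code. It assumes the submission is a JSONL file where each line is an object with an "id" and an "output" list whose entries may carry an "answer" key, and the function name check_predictions and its expected_count parameter are made up for illustration.

```python
import json
import sys
from collections import Counter


def check_predictions(lines, expected_count=None):
    """Return a list of problems found in JSONL prediction lines.

    Assumed record shape (an assumption, not taken from the evaluator):
    {"id": "...", "output": [{"answer": "..."}]}
    """
    problems = []
    ids = []
    for lineno, raw in enumerate(lines, start=1):
        if not raw.strip():
            problems.append(f"line {lineno}: blank line")
            continue
        try:
            record = json.loads(raw)
        except json.JSONDecodeError as exc:
            problems.append(f"line {lineno}: invalid JSON ({exc.msg})")
            continue
        if not isinstance(record, dict) or "id" not in record:
            problems.append(f"line {lineno}: record has no 'id' field")
            continue
        rec_id = str(record["id"])
        ids.append(rec_id)
        outputs = record.get("output")
        if not isinstance(outputs, list):
            outputs = []
        # Count only output entries that actually contain an answer.
        answers = [o for o in outputs if isinstance(o, dict) and "answer" in o]
        if len(answers) != 1:
            problems.append(
                f"line {lineno}: id {rec_id!r} has {len(answers)} answers, expected 1"
            )
    # The same id appearing more than once inflates the record count.
    for rec_id, count in Counter(ids).items():
        if count > 1:
            problems.append(f"id {rec_id!r} appears {count} times")
    if expected_count is not None and len(ids) != expected_count:
        problems.append(f"found {len(ids)} records, expected {expected_count}")
    return problems


if __name__ == "__main__":
    # Usage: python check_predictions.py <predictions.jsonl> [expected_count]
    expected = int(sys.argv[2]) if len(sys.argv) > 2 else None
    with open(sys.argv[1], encoding="utf-8") as handle:
        for problem in check_predictions(handle, expected):
            print(problem)
```

Passing the gold size that the evaluator reports as the optional second argument makes a count mismatch show up locally. Duplicate ids and records with zero or several answers are listed individually, which makes the offending lines easy to find.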