Are we allowed to use compute_features.py

For the sake of the competition, can we use the computed features from the baseline and improve upon them? Or do we need to perform feature extraction from the raw CSV files using our own metrics?

Second question: do you have an estimate of how large the generated feature pickle files will be when run on train/val/test? I'm asking because feature generation is CPU-intensive, and we're looking to run it on a separate VM with more CPU cores, then move the output to one with more CUDA cores.

Yes, you can use compute_features.py. You can re-use anything from the released baseline code.

For reference, the feature pickle files were 1.4 GB for the train set, 300 MB for val, and 2.3 GB for test. The test set is larger because the number of centerlines was not restricted to 1.
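Since the pickle files can run into the gigabytes, it may be worth checking their sizes before copying them from the CPU VM to the GPU VM. A minimal sketch (not part of the baseline code; the file path here is a throwaway example):

```python
import os
import pickle
import tempfile

def file_size_gb(path):
    """Return the size of a file on disk in gigabytes."""
    return os.path.getsize(path) / 1e9

# Demo with a small throwaway pickle; replace with your
# generated feature pickle paths in practice.
with tempfile.NamedTemporaryFile(suffix=".pkl", delete=False) as f:
    pickle.dump({"features": list(range(1000))}, f)
    demo_path = f.name

print(f"{demo_path}: {file_size_gb(demo_path):.6f} GB")
os.remove(demo_path)
```

Running this on the actual train/val/test pickles lets you budget disk space and transfer time between the two VMs before kicking off training.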
