For the sake of the competition, can we use the computed features from the baseline and improve upon them? Or do we need to perform feature extraction from the raw CSV files using our own metrics?
Second question: do you have an estimate of how large the generated feature pickle files will be when run on train/val/test? I’m asking because feature generation is CPU-intensive, and we’re planning to run it on a separate VM with more CPU cores, then move the resulting files to one with more CUDA cores.