Our team would like to understand how the test input data will be delivered to the model. As we see it, there are two possible approaches:
(I) The 10-min segments are treated as independent inputs: no time correlation between them is preserved, and they are tested in random order.
(II) The 10-min segments are correlated in time and tested in chronological order, so that information from previous segments can be kept and used for the next one.
Mean classifier probability values were calculated for groups of five consecutive 1-min segments, and the maximum probability across each 60-min interval was calculated.
However, in the example with the InceptionTime model, the submission.csv (msg2022 → examples → inceptionTime → submission → submission.csv) does not make it explicit whether the probabilities are calculated for each .parquet file or for a group of samples in every subfolder. The output looks like overall probabilities for the whole pre-ictal event:
How are the tests designed? Are we expected to compute probabilities for each .parquet segment independently, or to return just one overall probability for a whole pre-ictal event?
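For reference, the aggregation quoted above (mean over groups of five consecutive 1-min segments, then the maximum over each 60-min interval) could look roughly like the sketch below. This is only our reading of that sentence: the function name `aggregate_probabilities` is hypothetical, and we assume non-overlapping groups and that any incomplete trailing interval is dropped, neither of which the description states.

```python
import numpy as np

def aggregate_probabilities(p_1min, group=5, window=60):
    """Mean over non-overlapping groups of `group` one-minute probabilities,
    then the maximum of those means within each `window`-minute interval.

    Assumption: groups do not overlap and an incomplete final interval is
    discarded; the quoted description does not specify either point.
    """
    if window % group != 0:
        raise ValueError("window must be a multiple of group")
    p = np.asarray(p_1min, dtype=float)
    n_windows = len(p) // window              # number of complete intervals
    p = p[: n_windows * window]               # drop the incomplete tail
    # Shape: (intervals, groups per interval, segments per group)
    grouped = p.reshape(n_windows, window // group, group)
    group_means = grouped.mean(axis=2)        # one mean per 5-min group
    return group_means.max(axis=1)            # one value per 60-min interval

# Example: two hours of 1-min probabilities
probs = np.zeros(120)
probs[10:15] = 1.0   # one full 5-min group in the first hour
probs[60:65] = 0.5   # one full 5-min group in the second hour
hourly = aggregate_probabilities(probs)
```

With this reading, each 60-min interval yields a single probability, which is why it matters whether the submission expects one value per .parquet file or one per event.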
Sorry to bother you with a similar issue, but I have a question regarding the test data that I hope you can help me with.
The description states that:
> Seizures that occur within 4 hours of the previous seizure are not labelled (These are known as lead seizures).
In the training data, we could simply drop the sequences that fall within that time frame (or handle them in some other way).
But if we approach the problem like an “image classification task”, one sequence at a time, without the possibility of including information from previous sequences (please correct me if I’m wrong), are we allowed to, for example, hardcode into the models the time frames in which no seizure can be recorded?
Another aspect that is not too clear to me is related to the timeframe. From your comment:
it seems that the sequences used for the private leaderboard were randomly picked from the dataset, but I would have expected them all to refer to a time span after the last values of the training data. Could you please help me clarify this aspect?
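For the training-data case mentioned above, dropping sequences inside the 4-hour window could be sketched as follows. This is a minimal illustration under assumptions: it takes "that time frame" to mean the 4 hours following a seizure onset, and it assumes segment start times and seizure onsets are available as `datetime` objects. The names `keep_segment` and `filter_segments` are hypothetical and not part of the challenge code.

```python
from datetime import datetime, timedelta

EXCLUSION = timedelta(hours=4)  # window length taken from the quoted description

def keep_segment(segment_start, seizure_onsets, exclusion=EXCLUSION):
    """Return False if the segment starts within `exclusion` after any onset.

    Assumption: the window is [onset, onset + exclusion), i.e. a segment
    starting exactly 4 hours after the onset is kept.
    """
    return not any(
        onset <= segment_start < onset + exclusion for onset in seizure_onsets
    )

def filter_segments(segment_starts, seizure_onsets):
    """Keep only the training segments outside every exclusion window."""
    return [t for t in segment_starts if keep_segment(t, seizure_onsets)]

# Example with one seizure at 12:00
onsets = [datetime(2020, 1, 1, 12, 0)]
starts = [
    datetime(2020, 1, 1, 11, 50),  # before the seizure: kept
    datetime(2020, 1, 1, 12, 10),  # inside the window: dropped
    datetime(2020, 1, 1, 15, 50),  # inside the window: dropped
    datetime(2020, 1, 1, 16, 0),   # exactly 4 h later: kept
]
kept = filter_segments(starts, onsets)
```

This only covers training data, where seizure times are known; it says nothing about whether the same information may be used at test time.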
Based on the discussion about the test samples, it is still not entirely clear to me whether samples from the past may be used. Therefore, once again my explicit question:
Is it explicitly allowed to use sequences from the past? I mean, if I have a sample with its timestamp and know that some consecutive 10-minute sequences precede it, can I also use this chain of 10-minute sequences for the prediction? In other words, can I include the extended history?
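To make the question concrete, this is roughly what I mean by collecting the extended history. It is a sketch only, assuming segment start timestamps are available and consecutive segments start exactly 10 minutes apart; the name `contiguous_history` and the `max_segments` limit are hypothetical. Whether this is permitted on the test data is exactly what I am asking.

```python
from datetime import datetime, timedelta

SEGMENT = timedelta(minutes=10)  # segment length, as in the 10-minute sequences

def contiguous_history(target_start, available_starts, max_segments=6,
                       segment=SEGMENT):
    """Return start times of the unbroken run of segments just before the target.

    Walks backwards in 10-minute steps and stops at the first gap or once
    `max_segments` have been collected. Assumption: timestamps match exactly;
    real data may need a tolerance.
    """
    available = set(available_starts)
    history = []
    t = target_start - segment
    while len(history) < max_segments and t in available:
        history.append(t)
        t -= segment
    return history[::-1]  # oldest first

# Example: segments at 10:00, 10:10, 10:20, then a gap, then 10:40
available = [
    datetime(2020, 1, 1, 10, 0),
    datetime(2020, 1, 1, 10, 10),
    datetime(2020, 1, 1, 10, 20),
    datetime(2020, 1, 1, 10, 40),
]
history_a = contiguous_history(datetime(2020, 1, 1, 10, 30), available)
history_b = contiguous_history(datetime(2020, 1, 1, 10, 50), available)
```

Here `history_a` contains the three segments from 10:00 to 10:20, while `history_b` contains only the 10:40 segment because of the gap at 10:30.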