Hope you are doing well! As discussed before, I am participating in this year's Gaze Prediction challenge.
I have a few questions about it. Sorry for reaching out on this platform, but my teammate also posted the questions on the EvalAI forum and we could not get any response.
Here are my questions:
q1. How do we convert valid.parquet into valid.json and vice versa?
q2. How do we apply the 5-sample median filter (with 3 samples as minimum support) over the gaze values? Is this filter the reason for the difference in total sample count between the valid.parquet and valid.json files?
q3. What method is used for the linear interpolation to the timestamps 25 and 50 ms after the last input timestamp?
My response:
q1. You can use pandas to read the parquet file and convert it to any format you want.
For example, install pandas: pip install -U pandas
Then in Python:
import pandas as pd
df = pd.read_parquet("valid.parquet")  # read the parquet file
df.to_json("valid.json")               # save it as JSON
q3. Not sure I understand this question: we do not linearly interpolate to timestamps 25 and 50 ms (the task is to predict gaze at these future time points given the input).
q2. This is a question for my colleague, and I will get back to you once I have an answer.
Question 1: Sachin already covered the file format. There is no backwards mapping from valid.json to valid.parquet because we remove some information to make it harder to cheat by using different sequences.
Question 2, about the median filter: here's the code. Since we only run it once across the datasets, it is not optimized or pretty. The reason for the sample count in the valid.json (and test.json) files is that we randomly sample a subset of sequences; we do this so that participants can't infer predictions from future sequences.
import math
import numpy

def _median_filter_vector(vector, num_samples=5, min_support=3):
    """Median filters a 1-dimensional input vector.

    Args:
        vector: input vector to be filtered
        num_samples: support for the median filter
        min_support: if fewer than min_support values are valid, sets the filtered value to nan

    NOTE: The first and last num_samples // 2 values are not filtered.
    """
    filtered = list(vector[:num_samples // 2])
    for index in range(num_samples // 2, len(vector) - num_samples // 2):
        # Collect the valid (non-nan) values in the window around this sample.
        valid_values = [value for value in vector[index - num_samples // 2 : index + num_samples // 2 + 1]
                        if not math.isnan(value)]
        support = len(valid_values)
        if support < min_support:
            filtered.append(float('nan'))
        else:
            filtered.append(numpy.median(valid_values))
    filtered.extend(vector[-(num_samples // 2):])
    assert len(vector) == len(filtered)
    return filtered
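To make the behavior concrete, here is a small self-contained demo on a vector with missing samples (it restates the filter logic so the snippet runs on its own; the input values are made up):

```python
import math

import numpy

def median_filter_vector(vector, num_samples=5, min_support=3):
    # Same logic as the challenge filter, restated so this demo is standalone.
    half = num_samples // 2
    filtered = list(vector[:half])  # leading edge passes through unfiltered
    for index in range(half, len(vector) - half):
        valid = [v for v in vector[index - half : index + half + 1]
                 if not math.isnan(v)]
        filtered.append(float(numpy.median(valid)) if len(valid) >= min_support
                        else float('nan'))
    filtered.extend(vector[-half:])  # trailing edge passes through unfiltered
    return filtered

nan = float('nan')
out = median_filter_vector([1.0, nan, 3.0, 4.0, 5.0, nan, 7.0])
print(out)  # [1.0, nan, 3.5, 4.0, 4.5, nan, 7.0]
```

Note that the nan values at the edges survive untouched, while the middle samples become medians over their 5-sample window as long as at least 3 values in the window are valid.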
Question 3:
We linearly interpolate between the median-filtered samples to obtain the prediction targets. The median filter acts as a low-pass filter to reduce detection noise, and the linear interpolation yields the correct values at the requested timestamps.
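As a sketch of how such targets could be computed (the timestamps and gaze values below are invented, and this is my reading of the description rather than the organizers' exact pipeline), numpy.interp performs the linear interpolation between the filtered samples:

```python
import numpy as np

# Hypothetical median-filtered samples from a full sequence: time in ms, gaze x.
t = np.array([0.0, 20.0, 40.0, 60.0, 80.0, 100.0, 120.0])
x = np.array([0.10, 0.14, 0.18, 0.22, 0.26, 0.30, 0.34])

last_input_t = 60.0  # hypothetical last timestamp shown to the model
targets_t = last_input_t + np.array([25.0, 50.0])  # 25 and 50 ms into the future

# Linearly interpolate the filtered gaze between its neighboring samples.
targets_x = np.interp(targets_t, t, x)
print(targets_x)  # [0.27 0.32]
```

Here the target at 85 ms falls between the 80 ms and 100 ms samples, so its value is the weighted average of those two filtered gaze values.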