Analysis of pupil data

There are many valid approaches to the analysis of pupillometry data, but the optimal approach will depend on the type of experiment being run and the research question in mind. Kelbsch et al. (2019) provide an informative view of standards in pupillometry, and there are some great examples of community-developed packages for streamlining pupillometry data analysis. Whilst it is always worth exploring the wider options that are available, PyPlr includes a set of pandas-reliant (you won't regret learning pandas!) scripting tools for implementing a standard data processing pipeline that is optimised to work with some of the idiosyncrasies of Pupil Labs data. So far we have found these tools to be adequate for analysing our own data, but we welcome suggestions for improvements.

See here for a general introduction to analysing Pupil Labs pupillometry data. What follows is a typical PyPlr analysis pipeline with tools from pyplr.utils and pyplr.preproc.

Export with Pupil Player

The first step in our pipeline is to export the data with Pupil Player, making sure the required plugins (e.g., the Annotation Player plugin) are enabled. Currently this must be done individually for each recording. Printed below is the file structure of a typical recording after export (more info on the recording format here).

[1]:
from pyplr import utils

# Pupil Labs recording directory
rec_dir = '/Users/jtm/OneDrive - Nexus365/protocols/pipr_protocol/JTM'
utils.print_file_structure(rec_dir)
JTM/
    .DS_Store
    annotation.pldata
    annotation_player.pldata
    annotation_player_timestamps.npy
    annotation_timestamps.npy
    blinks.pldata
    blinks_timestamps.npy
    eye1.intrinsics
    eye1.mp4
    eye1_lookup.npy
    eye1_timestamps.npy
    fixations.pldata
    fixations_timestamps.npy
    gaze.pldata
    gaze_timestamps.npy
    info.player.json
    notify.pldata
    notify_timestamps.npy
    pupil.pldata
    pupil_timestamps.npy
    surface_definitions_v01
    user_info.csv
    world.intrinsics
    world.mp4
    world_lookup.npy
    world_timestamps.npy
    analysis/
        plr_extraction.png
        plr_metrics.csv
        plr_overall_metrics.png
        processed.csv
        pupil_processing.png
    exports/
        000/
            annotations.csv
            export_info.csv
            gaze_positions.csv
            pupil_gaze_positions_info.txt
            pupil_positions.csv
            world.mp4
            world_timestamps.csv
            world_timestamps.npy
    pyplr_analysis/
    offline_data/
        tokens/
            gaze_positions_consumer_Vis_Polyline.token
            gaze_positions_producer_GazeFromRecording.token
            pupil_positions_consumer_Pupil_From_Recording.token
            pupil_positions_consumer_System_Timelines.token
            pupil_positions_producer_Pupil_From_Recording.token

Load exported data

Now we can set up some constants and use pyplr.utils to get things moving. new_subject(...) returns a dictionary with the root directory, recording id, data directory and a newly made output directory for the analysis results. Then, passing the data directory to load_pupil(...), we load the specified columns from the exported pupil_positions.csv file.

[2]:
# Columns to load
use_cols = ['confidence',
            'method',
            'pupil_timestamp',
            'eye_id',
            'diameter_3d',
            'diameter']

# Get a handle on a subject
s = utils.new_subject(
    rec_dir, export='000', out_dir_nm='pyplr_analysis')

# Load pupil data
samples = utils.load_pupil(
    s['data_dir'], eye_id='best', cols=use_cols)
samples
************************************************************
*************************** JTM ****************************
************************************************************
Loaded 48552 samples
[2]:
                 eye_id  confidence   diameter  method  diameter_3d
pupil_timestamp
85838.895658          1    0.977211  50.102968  3d c++     6.105771
85838.903656          1    0.997867  50.037086  3d c++     6.099682
85838.911644          1    0.998295  49.702628  3d c++     6.060059
85838.919551          1    0.997833  49.968637  3d c++     6.093759
85838.927504          1    0.998183  49.883191  3d c++     6.084404
...                 ...         ...        ...     ...          ...
86246.730916          1    0.818574  56.684975  3d c++     6.966634
86246.739307          1    0.947873  56.824819  3d c++     6.983579
86246.746889          1    0.969337  57.176728  3d c++     7.027931
86246.758588          1    0.956524  56.838490  3d c++     6.988390
86246.762987          1    0.927155  56.933094  3d c++     7.000156

48552 rows × 5 columns

Preprocessing

Preprocessing pupillometry data is relatively straightforward and typically involves dealing with signal loss due to eye blinks and smoothing out any high-frequency noise. There are sophisticated algorithms for detecting blinks in a pupil time course, but Pupil Player has its own Blink Detection plugin that detects blinks based on rapid changes in the confidence metric. Whilst you can export these blink events and use the timestamps to index and mask the pupil time course, we find that it is effective simply to mask the pupil data with thresholds on the first derivative and the confidence metric, and then to follow up with linear interpolation and smoothing.
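
For intuition, the masking steps boil down to something like the following pandas operations. This is a rough sketch of the idea only, not PyPlr's actual implementation:

import numpy as np

# Illustration only: flag samples where the absolute first derivative
# exceeds the mean by 3 standard deviations, or where confidence < 0.8
masked = samples.copy()
d = masked['diameter'].diff().abs()
masked.loc[d > d.mean() + 3 * d.std(), 'diameter'] = np.nan
masked.loc[masked['confidence'] < 0.8, 'diameter'] = np.nan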

[3]:
from pyplr import graphing
from pyplr import preproc

# Sampling frequency
SAMPLE_RATE = 120

# Pupil columns to analyse
pupil_cols = ['diameter_3d', 'diameter']

# Make figure for processing
f, axs = graphing.pupil_preprocessing(nrows=5, subject='Example')

# Plot the raw data
samples[pupil_cols].plot(title='Raw', ax=axs[0], legend=True)
axs[0].legend(loc='center right', labels=['mm', 'pixels'])

# Mask first derivative
samples = preproc.mask_pupil_first_derivative(
    samples, threshold=3.0, mask_cols=pupil_cols)
samples[pupil_cols].plot(
    title='Masked 1st deriv (3*SD)', ax=axs[1], legend=False)

# Mask confidence
samples = preproc.mask_pupil_confidence(
    samples, threshold=0.8, mask_cols=pupil_cols)
samples[pupil_cols].plot(
    title='Masked confidence (<0.8)', ax=axs[2], legend=False)

# Interpolate
samples = preproc.interpolate_pupil(
    samples, interp_cols=pupil_cols)
samples[pupil_cols].plot(
    title='Linear interpolation', ax=axs[3], legend=False)

# Smooth
samples = preproc.butterworth_series(
    samples, fields=pupil_cols, filt_order=3,
    cutoff_freq=4/(SAMPLE_RATE/2))
samples[pupil_cols].plot(
    title='3rd order Butterworth filter with 4 Hz cut-off',
    ax=axs[4], legend=False);

[Figure: five-panel preprocessing plot showing the raw data, masked 1st derivative (3*SD), masked confidence (<0.8), linear interpolation and 3rd order Butterworth filter stages]
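
Note that butterworth_series expects the cut-off in normalised form: 4 / (SAMPLE_RATE / 2) expresses 4 Hz as a fraction of the Nyquist frequency. Assuming the function wraps SciPy's filter design and zero-phase filtering (an assumption on our part; check the pyplr.preproc source), the equivalent by-hand version would look something like this:

from scipy import signal

# Sketch: 3rd order low-pass Butterworth with a 4 Hz cut-off,
# expressed as a fraction of the Nyquist frequency (SAMPLE_RATE / 2)
b, a = signal.butter(N=3, Wn=4 / (SAMPLE_RATE / 2))
smoothed = signal.filtfilt(b, a, samples['diameter'].to_numpy())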

Extraction

Having cleaned the time course, we need to extract the events of interest. This requires the annotation events sent during the recording, which carry the timestamps needed for extraction. These can be loaded with utils.load_annotations(...).

[4]:
events = utils.load_annotations(s['data_dir'])
events
Loaded 6 events
[4]:
              index     label  duration color               creation_time creator protocol  pulse_duration                         pulse_spec
timestamp
85854.348667   1835  LIGHT_ON       NaN   red  2020-11-10 09:39:10.888989     jtm    pulse            1000  [0, 0, 0, 0, 0, 0, 0, 0, 0, 1979]
85919.820362   9629  LIGHT_ON       NaN  blue  2020-11-10 09:39:10.882007     jtm    pulse            1000  [0, 0, 0, 2500, 0, 0, 0, 0, 0, 0]
85985.280630  17422  LIGHT_ON       NaN  blue  2020-11-10 09:39:10.882007     jtm    pulse            1000  [0, 0, 0, 2500, 0, 0, 0, 0, 0, 0]
86050.768628  25217  LIGHT_ON       NaN   red  2020-11-10 09:39:10.888989     jtm    pulse            1000  [0, 0, 0, 0, 0, 0, 0, 0, 0, 1979]
86116.228025  33008  LIGHT_ON       NaN  blue  2020-11-10 09:39:10.882007     jtm    pulse            1000  [0, 0, 0, 2500, 0, 0, 0, 0, 0, 0]
86181.688172  40801  LIGHT_ON       NaN   red  2020-11-10 09:39:10.888989     jtm    pulse            1000  [0, 0, 0, 0, 0, 0, 0, 0, 0, 1979]

Now we can pass our samples and events to utils.extract(...) along with some parameters defining the number of samples to extract and how much to offset the data relative to the event. Here we extract 65-second epochs (7800 samples at 120 Hz) and offset by a 5-second (600-sample) baseline period.

[5]:
# Number of samples to extract and which sample
# should mark the onset of the event
DURATION = 7800
ONSET_IDX = 600

# Extract the event ranges
ranges = utils.extract(
    samples,
    events,
    offset=-ONSET_IDX,
    duration=DURATION,
    borrow_attributes=['color'])
ranges
Extracted ranges for 6 events
[5]:
             eye_id  confidence   diameter  method  diameter_3d  interpolated      orig_idx color
event onset
0     0         1.0    0.998989  48.404937  3d c++     5.990967             0  85849.311553   red
      1         1.0    0.998972  48.436117  3d c++     5.994949             0  85849.319757   red
      2         1.0    0.999134  48.469848  3d c++     5.999237             0  85849.327708   red
      3         1.0    0.998426  48.505896  3d c++     6.003802             0  85849.335675   red
      4         1.0    0.998895  48.544031  3d c++     6.008612             0  85849.343669   red
...             ...         ...        ...     ...          ...           ...           ...   ...
5     7795      1.0    0.922093  54.375155  3d c++     6.707360             0  86242.127040   red
      7796      1.0    0.977674  54.338056  3d c++     6.703963             0  86242.135026   red
      7797      1.0    0.933031  54.304633  3d c++     6.701016             0  86242.142965   red
      7798      1.0    0.970404  54.274990  3d c++     6.698505             0  86242.150957   red
      7799      1.0    0.903852  54.249187  3d c++     6.696413             0  86242.162996   red

46800 rows × 8 columns

utils.extract(...) returns a pandas DataFrame with a two-level (event, onset) hierarchical index. The original index to the data is preserved in a column along with any attributes borrowed from the annotations. Now is a good time to create additional columns expressing the pupil data as percent signal change from baseline.
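
For a single event and column, the calculation underlying percent_signal_change presumably follows the standard formula, pc = (x - baseline) / baseline * 100, where the baseline is the mean of the pre-stimulus samples. A sketch for one event (the real function handles multiple columns and events in one go):

# Percent signal change for event 0, diameter_3d only
base = ranges.loc[0, 'diameter_3d'].iloc[0:ONSET_IDX].mean()
pc = (ranges.loc[0, 'diameter_3d'] - base) / base * 100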

[6]:
# Calculate baselines: the mean of the 600 pre-stimulus
# samples (5 s) for each event
baselines = ranges.loc[:, range(0, ONSET_IDX), :].mean(level=0)

# New columns for percent signal change
ranges = preproc.percent_signal_change(
    ranges, baselines, pupil_cols)
ranges
[6]:
             eye_id  confidence   diameter  method  diameter_3d  interpolated      orig_idx color  diameter_3d_pc  diameter_pc
event onset
0     0         1.0    0.998989  48.404937  3d c++     5.990967             0  85849.311553   red       -6.907039    -7.009474
      1         1.0    0.998972  48.436117  3d c++     5.994949             0  85849.319757   red       -6.845173    -6.949574
      2         1.0    0.999134  48.469848  3d c++     5.999237             0  85849.327708   red       -6.778534    -6.884774
      3         1.0    0.998426  48.505896  3d c++     6.003802             0  85849.335675   red       -6.707607    -6.815521
      4         1.0    0.998895  48.544031  3d c++     6.008612             0  85849.343669   red       -6.632865    -6.742260
...             ...         ...        ...     ...          ...           ...           ...   ...             ...          ...
5     7795      1.0    0.922093  54.375155  3d c++     6.707360             0  86242.127040   red        8.891820     8.796684
      7796      1.0    0.977674  54.338056  3d c++     6.703963             0  86242.135026   red        8.836673     8.722455
      7797      1.0    0.933031  54.304633  3d c++     6.701016             0  86242.142965   red        8.788822     8.655579
      7798      1.0    0.970404  54.274990  3d c++     6.698505             0  86242.150957   red        8.748060     8.596268
      7799      1.0    0.903852  54.249187  3d c++     6.696413             0  86242.162996   red        8.714104     8.544640

46800 rows × 10 columns
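
Because the 'color' attribute was borrowed from the annotations, a quick sanity check at this point is to average the percent-change traces within each stimulus colour and plot them against time. A minimal matplotlib sketch, not part of PyPlr:

import numpy as np
import matplotlib.pyplot as plt

# Time axis in seconds, relative to light onset
time = (np.arange(DURATION) - ONSET_IDX) / SAMPLE_RATE

fig, ax = plt.subplots()
for color, df in ranges.groupby('color'):
    # Average across events at each sample of the epoch
    average = df.groupby(level='onset')['diameter_3d_pc'].mean()
    ax.plot(time, average, color=color, label=color)
ax.set_xlabel('Time (s)')
ax.set_ylabel('Pupil diameter (% change)')
ax.legend(title='Stimulus')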

Plotting and parametrisation

It is common practice when analysing the PLR to describe it in terms of parameters relating to time, velocity and acceleration. So once you have an array of data representing a pupil's response to light, be it from a single trial or an average of multiple trials, simply pass it to pyplr.plr.PLR along with some basic parameters.

[7]:
from pyplr.plr import PLR

# Average the response across all six trials
average_plr = ranges.mean(level=1)['diameter_3d'].to_numpy()
plr = PLR(average_plr,
          sample_rate=SAMPLE_RATE,
          onset_idx=ONSET_IDX,
          stim_duration=1)  # seconds; the light pulses here were 1 s

Now you can do a quick plot with options to display the velocity and acceleration profiles and the derived parameters:

[8]:
fig = plr.plot(vel=True, acc=True, print_params=True)
[Figure: average PLR with velocity and acceleration profiles and printed parameters]

Or simply access the parameters as a DataFrame.

[9]:
params = plr.parameters()
params
[9]:
                  value
Baseline       6.751031
Latency_a      0.225000
Latency_b      0.283333
T2MaxVel       0.425000
T2MaxCon       1.550000
T2Rec75pc     18.825000
PeakCon        3.164023
ConAmplitude   3.587008
VelConMax      5.940593
VelConAve      2.670336
AccConMax     35.127363
ConTime        1.325000
VelRedMax      0.999670
AccRedMax      7.021978
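
Finally, the parameters can be saved alongside the processed data. Here we assume the output directory created by new_subject(...) is stored in the subject dictionary under the key 'out_dir' (inspect s.keys() to be sure):

import os.path as op

# Save the parametrisation for this subject (key name assumed)
params.to_csv(op.join(s['out_dir'], 'plr_metrics.csv'))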