Hit and signal regressions per subject #25

Open
2 tasks done
aridyckovsky opened this issue May 11, 2021 · 13 comments
Labels
data Manipulates data in some way

@aridyckovsky
Member

aridyckovsky commented May 11, 2021

TODO:

  • Create an is_hit logical column (0, 1)
  • Reaction time regression on signal time per participant

Potential regression ideas:

library(lme4)

lmer(
  reaction_time ~ 1 + signal_time + (1 | id),
  data = combined_hits_df[is.finite(combined_hits_df$reaction_time), ]
)

glmer(is_hit ~ 1 + signal_time + (1 | id), data = combined_hits_df, family = "binomial")
@aridyckovsky aridyckovsky added the data Manipulates data in some way label May 11, 2021
@aridyckovsky aridyckovsky added this to the Analysis milestone May 11, 2021
@aridyckovsky aridyckovsky changed the title Hit-dependent regressions per subject Hit and signal regressions per subject May 11, 2021
aridyckovsky added a commit that referenced this issue May 11, 2021
@aridyckovsky
Member Author

@psokolhessner I've added the ideas from above for regressions, the first of which can be found here: https://github.com/sokolhessnerlab/itrackvalr/blob/main/notebooks/behavioral_data_preprocessing.md#predict-is_hit-using-signal_time

The models as written produce fit warnings, one of which is common to both: Some predictor variables are on very different scales: consider rescaling

The glmer model also outputs this: optimizer (Nelder_Mead) convergence code: 0 (OK) ; 0 optimizer warnings; 3 lme4 warnings

@psokolhessner
Member

Ah, that warning (some predictor variables are on very different scales) would be because signal time will have values in the thousands, as compared to the intercept (a value of 1). Though such pedestrian numeric scale differences shouldn't matter, they do. We'll need to rescale signal_time for both regressions. I'd consider rescaling by 3,600, turning it from units of seconds into units of hours (and fractions thereof). Then everything lives on a similar scale.
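
A minimal sketch of that rescaling, assuming signal_time is recorded in seconds (the data frame and column names are illustrative, mirroring the models above):

```r
# Illustrative only: rescale signal_time from seconds to hours so the
# predictor lives on a scale comparable to the intercept (a value of 1).
combined_hits_df <- data.frame(
  id          = rep(1:2, each = 3),
  signal_time = c(120, 900, 1800, 2400, 3000, 3600)  # seconds
)

combined_hits_df$signal_time_hours <- combined_hits_df$signal_time / 3600

range(combined_hits_df$signal_time_hours)  # 0.0333... to 1 for a one-hour task
```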

@psokolhessner
Member

The glmer output is slightly more opaque. What are the additional 3 lme4 warnings?

@psokolhessner
Member

When running models, you want to store their output too. So the calls to lmer and glmer should be something like

model1 = lmer(...

What to name the models: we may be working with them quite a bit, so keeping names clear but not too long would be good. I'd use a name indicating that this is regression output (e.g. model or fit), plus descriptors that capture the regression's features and/or sequence and variants. I've used names like model1, model2a, model2b, etc. before, as well as model_RT_SignalTime_MFX (the latter captures that it's model output; it's on RT, using signal time, and is mixed effects [fixed and random]).
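
For instance, a sketch of storing the fits under descriptive names (assumes lme4 is installed and combined_hits_df exists with these columns; the names are just one option):

```r
library(lme4)

# Store each fit under a descriptive name so it can be reused later
model_RT_SignalTime_MFX <- lmer(
  reaction_time ~ 1 + signal_time + (1 | id),
  data = combined_hits_df
)

model_Hit_SignalTime_MFX <- glmer(
  is_hit ~ 1 + signal_time + (1 | id),
  data = combined_hits_df,
  family = "binomial"
)

# Stored objects can then be summarized, compared, and plotted
summary(model_RT_SignalTime_MFX)
```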

@psokolhessner
Member

Left suggestions for how to do this in the RMD file with this commit: e2efb35

@aridyckovsky
Member Author

aridyckovsky commented May 18, 2021

@psokolhessner thanks for all of this. I updated the renv.lock to include lmerTest, so that will be accessible throughout the repo. I also adjusted the signal and reaction time scales to the [0,1] interval, and the models ran without warning. Pasting the summary responses here:

Predicting is_hit from signal_time

## Generalized linear mixed model fit by maximum likelihood (Laplace Approximation) [glmerMod]
##  Family: binomial  ( logit )
## Formula: is_hit ~ 1 + signal_time + (1 | id)
##    Data: scaled_combined_hits_df
## 
##      AIC      BIC   logLik deviance df.resid 
##   2307.5   2323.9  -1150.7   2301.5     1797 
## 
## Scaled residuals: 
##          Min           1Q       Median           3Q          Max 
## -2.494411658 -0.784968172 -0.477289551  0.902517615  2.366707656 
## 
## Random effects:
##  Groups Name        Variance    Std.Dev.   
##  id     (Intercept) 0.698673966 0.835867194
## Number of obs: 1800, groups:  id, 50
## 
## Fixed effects:
##                 Estimate   Std. Error  z value   Pr(>|z|)    
## (Intercept)  0.230697540  0.157237994  1.46719    0.14233    
## signal_time -0.859634004  0.182285149 -4.71588 2.4067e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##             (Intr)
## signal_time -0.572

Predicting reaction_time from signal_time

## Linear mixed model fit by REML. t-tests use Satterthwaite's method ['lmerModLmerTest']
## Formula: reaction_time ~ 1 + signal_time + (1 | id)
##    Data: scaled_combined_hits_df %>% na.omit()
## 
## REML criterion at convergence: 2463
## 
## Scaled residuals: 
##          Min           1Q       Median           3Q          Max 
## -1.856746002 -0.522942508 -0.203867491  0.199189511  5.955977930 
## 
## Random effects:
##  Groups   Name        Variance    Std.Dev.   
##  id       (Intercept) 0.253031509 0.503022374
##  Residual             1.067215435 1.033061196
## Number of obs: 821, groups:  id, 50
## 
## Fixed effects:
##                  Estimate    Std. Error            df  t value   Pr(>|t|)    
## (Intercept)   1.424493424   0.100223741 105.550683060 14.21313 < 2.22e-16 ***
## signal_time   0.774275825   0.128541525 793.026666766  6.02355 2.6097e-09 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##             (Intr)
## signal_time -0.585
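
One way to implement the [0,1] rescaling described above is min-max scaling (a sketch; the helper name is illustrative, and the actual method used in the notebook may differ):

```r
# Map a numeric vector onto [0, 1]; NAs are ignored when finding the range
rescale_01 <- function(x) {
  rng <- range(x, na.rm = TRUE)
  (x - rng[1]) / (rng[2] - rng[1])
}

signal_time <- c(12, 900, 1800, 3599)
rescale_01(signal_time)  # smallest value maps to 0, largest to 1
```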

@aridyckovsky
Member Author

aridyckovsky commented May 18, 2021

Models to run:

Predict the probability of a hit by signal time with signal time random effects

is_hit ~ 1 + signal_time + (1 + signal_time | id)

Predict the reaction time for a hit by signal time with signal time random effects

reaction_time ~ 1 + signal_time + (1 + signal_time | id)

Predict the probability of a false alarm by response time

is_false_alarm ~ 1 + resp_time + (1 | id)

Predict the probability of a false alarm by response time with response time random effects

is_false_alarm ~ 1 + resp_time + (1 + resp_time | id)

Note:

is_false_alarm is 0 when is_hit == 1, or 1 otherwise
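
As a sketch of that definition (column names follow the note; the data values are made up):

```r
# One row per response/button-press; a false alarm is any response
# that is not a hit
responses <- data.frame(
  resp_time = c(10.2, 55.8, 90.1),
  is_hit    = c(1, 0, 1)
)

responses$is_false_alarm <- ifelse(responses$is_hit == 1, 0, 1)
responses$is_false_alarm  # 0 1 0
```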

Checklist:

  • Create is_false_alarm column based on is_hit and other responses (resp_time) that occurred without a signal
  • Run above models
  • Plot the effects from models via predict
  • See whether p(hit) differs between the first and second halves of the task (first and second 1800 seconds)
  • Play with ranef(), fixef() and coef()
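
A hedged sketch of the random-slopes models listed above, plus the ranef()/fixef()/coef() calls from the checklist (assumes lme4 is installed and a combined_hits_df with these columns exists):

```r
library(lme4)

# Random intercepts and random signal_time slopes per participant
fit_hit_rfx <- glmer(
  is_hit ~ 1 + signal_time + (1 + signal_time | id),
  data = combined_hits_df, family = "binomial"
)

fit_rt_rfx <- lmer(
  reaction_time ~ 1 + signal_time + (1 + signal_time | id),
  data = combined_hits_df
)

fixef(fit_hit_rfx)     # group-level (fixed) effects
ranef(fit_hit_rfx)$id  # per-participant deviations from the fixed effects
coef(fit_hit_rfx)$id   # fixed + random effects combined, per participant
```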

@psokolhessner
Member

A tweak to the definition of is_false_alarm to more clearly define what goes into "otherwise":
is_false_alarm contains entries for all responses/button-presses, and is 0 when is_hit == 1, or 1 otherwise.

Alternatively, the variable can be named is_falsealarm_vs_hit (much like is_hit could be more clearly named is_hit_vs_miss).

aridyckovsky added a commit that referenced this issue May 25, 2021
@aridyckovsky
Member Author

I think it's clearer to identify boolean variables as is_hit and is_false_alarm, keeping them strictly logical. Introducing a "vs" into the labeling means we must rely on interpretation to understand the underlying logical value's meaning.

For example, the definition of the strictly boolean variable is_hit is very easy to understand from a data-reading perspective: if is_hit is 1 (true), then it's a hit; if is_hit is 0 (false), then it's not a hit. However, is_hit_vs_miss is not straightforward at the boolean level -- it requires annotation separate from the data to interpret correctly, i.e., is it a hit when true but a miss when false, or a hit when false but a miss when true? In this case, we are better served by adding an is_miss variable to maintain clear boolean variables with no room for interpretation error.

@psokolhessner
Member

Per conversation, will transition to use is_hit_given_signal and is_hit_given_response etc.

@aridyckovsky
Member Author

Plus: is_false_alarm_given_response

@aridyckovsky
Member Author

The models we've discussed are now part of the main pipeline via the analyze_behavioral_data sub-pipeline. The output of the analysis notebook can be found here.

@psokolhessner
Member

psokolhessner commented May 25, 2021

Fantastic. Really nice, clear evidence - with increasing time in the task, people...

  1. Miss more double-sized ticks/have fewer hits.
  2. Are slower to respond when a double-tick does happen.
  3. Are more likely to erroneously report a double-sized tick (false alarm) when responding.

(note the careful phrasing of no. 3; if we wanted to say "are more likely to false alarm when no signal is present", that would require a fourth regression or pair of regressions on is_false_alarm_given_nosignal using trial number or step time instead of resp_time or signal_time)

Interesting to note how robust all of these effects are - fully RFX models identify the exact same effects, implying that most participants experience the effects of the passage of time on hits, reaction time, and false alarms in the same or very similar ways. Visualization or characterization of the individual-level estimates (either given by coef() or the sum of fixef and ranef) would also likely establish that. Of course, if they experienced these the same way, then the RFX estimates would be 0, and that's not the case, so there is some variability, just not much, and it's dwarfed by the FFX term (group-level overall shared effect) in magnitude.
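
A self-contained sketch (on simulated data, since the real frame isn't shown here) of pulling those individual-level estimates and checking that coef() equals fixef() plus ranef():

```r
library(lme4)

# Simulated stand-in for the real data
set.seed(1)
d <- data.frame(
  id          = factor(rep(1:20, each = 30)),
  signal_time = runif(20 * 30)
)
d$is_hit <- rbinom(nrow(d), 1, plogis(0.2 - 0.8 * d$signal_time))

fit <- glmer(is_hit ~ 1 + signal_time + (1 + signal_time | id),
             data = d, family = "binomial")

per_subject <- coef(fit)$id  # one row of (intercept, slope) per participant

# coef() is fixef() plus the participant-specific ranef() deviations
all.equal(per_subject$signal_time,
          unname(fixef(fit)["signal_time"]) + ranef(fit)$id$signal_time)

# Visualize the spread of individual slopes
hist(per_subject$signal_time,
     xlab = "signal_time slope (logit scale)",
     main = "Per-participant slopes")
```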

The remaining items from the checklist above (#25 (comment) - mainly plotting model output, and mean p(hit) by half) will nicely wrap this up. Thank you @aridyckovsky this is looking great!!

@aridyckovsky aridyckovsky self-assigned this May 26, 2021