Hit and signal regressions per subject #25

Open
2 tasks done
aridyckovsky opened this issue May 11, 2021 · 13 comments
Labels
data Manipulates data in some way

@aridyckovsky
Member

aridyckovsky commented May 11, 2021

TODO:

  • Create an is_hit logical column (0, 1)
  • Reaction time regression on signal time per participant

Potential regression ideas:

library(lme4)

lmer(
  reaction_time ~ 1 + signal_time + (1 | id),
  data = combined_hits_df[is.finite(combined_hits_df$reaction_time), ]
)

glmer(is_hit ~ 1 + signal_time + (1 | id), data = combined_hits_df, family = "binomial")
@aridyckovsky aridyckovsky added the data Manipulates data in some way label May 11, 2021
@aridyckovsky aridyckovsky added this to the Analysis milestone May 11, 2021
@aridyckovsky aridyckovsky changed the title Hit-dependent regressions per subject Hit and signal regressions per subject May 11, 2021
aridyckovsky added a commit that referenced this issue May 11, 2021
@aridyckovsky
Member Author

@psokolhessner I've added the ideas from above for regressions, the first of which can be found here: https://github.com/sokolhessnerlab/itrackvalr/blob/main/notebooks/behavioral_data_preprocessing.md#predict-is_hit-using-signal_time

The models as written produce fit warnings, one of which is common to both: Some predictor variables are on very different scales: consider rescaling

The glmer model also outputs this: optimizer (Nelder_Mead) convergence code: 0 (OK) ; 0 optimizer warnings; 3 lme4 warnings

@psokolhessner
Member

Ah, that warning (some predictor variables are on very different scales) would be because signal time will have values in the thousands, as compared to the intercept (a value of 1). Though such pedestrian numeric scale differences shouldn't matter, they do. We'll need to rescale signal_time for both regressions. I'd consider rescaling by 3,600, turning it from units of seconds into units of hours (and fractions thereof). Then everything lives on a similar scale.
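
A minimal sketch of that rescaling, assuming signal_time is recorded in seconds (the data frame and column names are illustrative, mirroring the models above):

```r
# Illustrative only: rescale signal_time from seconds to hours so the
# predictor lives on a scale comparable to the intercept (a value of 1).
combined_hits_df <- data.frame(
  id          = rep(1:2, each = 3),
  signal_time = c(120, 900, 1800, 2400, 3000, 3600)  # seconds
)

combined_hits_df$signal_time_hours <- combined_hits_df$signal_time / 3600

range(combined_hits_df$signal_time_hours)  # 0.0333... to 1 for a one-hour task
```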

@psokolhessner
Member

The glmer output is slightly more opaque. What are the additional 3 lme4 warnings?

@psokolhessner
Member

When running models, you want to store their output too. So the calls to lmer and glmer should be something like

model1 = lmer(...

What to name the models: we may be working with them quite a bit, so keeping names clear but not too long would be good. I'd use a name indicating that this is regression output (e.g. model or fit), plus descriptors that capture the regression's features and/or sequence and variants. I've used names like model1, model2a, model2b, etc. before, as well as model_RT_SignalTime_MFX (the latter captures that it's model output; it's on RT, using signal time, and is mixed effects [fixed and random]).
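
For instance, a sketch of storing the fits under descriptive names (assumes lme4 is installed and combined_hits_df exists with these columns; the names are just one option):

```r
library(lme4)

# Store each fit under a descriptive name so it can be reused later
model_RT_SignalTime_MFX <- lmer(
  reaction_time ~ 1 + signal_time + (1 | id),
  data = combined_hits_df
)

model_Hit_SignalTime_MFX <- glmer(
  is_hit ~ 1 + signal_time + (1 | id),
  data = combined_hits_df,
  family = "binomial"
)

# Stored objects can then be summarized, compared, and plotted
summary(model_RT_SignalTime_MFX)
```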

@psokolhessner
Member

Left suggestions for how to do this in the RMD file with this commit: e2efb35

@aridyckovsky
Member Author

aridyckovsky commented May 18, 2021

@psokolhessner thanks for all of this. I updated the renv.lock to include lmerTest, so that will be accessible throughout the repo. I also adjusted the signal and reaction time scales to the [0,1] interval, and the models ran without warning. Pasting the summary responses here:

Predicting is_hit from signal_time

## Generalized linear mixed model fit by maximum likelihood (Laplace Approximation) [glmerMod]
##  Family: binomial  ( logit )
## Formula: is_hit ~ 1 + signal_time + (1 | id)
##    Data: scaled_combined_hits_df
## 
##      AIC      BIC   logLik deviance df.resid 
##   2307.5   2323.9  -1150.7   2301.5     1797 
## 
## Scaled residuals: 
##          Min           1Q       Median           3Q          Max 
## -2.494411658 -0.784968172 -0.477289551  0.902517615  2.366707656 
## 
## Random effects:
##  Groups Name        Variance    Std.Dev.   
##  id     (Intercept) 0.698673966 0.835867194
## Number of obs: 1800, groups:  id, 50
## 
## Fixed effects:
##                 Estimate   Std. Error  z value   Pr(>|z|)    
## (Intercept)  0.230697540  0.157237994  1.46719    0.14233    
## signal_time -0.859634004  0.182285149 -4.71588 2.4067e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##             (Intr)
## signal_time -0.572

Predicting reaction_time from signal_time

## Linear mixed model fit by REML. t-tests use Satterthwaite's method ['lmerModLmerTest']
## Formula: reaction_time ~ 1 + signal_time + (1 | id)
##    Data: scaled_combined_hits_df %>% na.omit()
## 
## REML criterion at convergence: 2463
## 
## Scaled residuals: 
##          Min           1Q       Median           3Q          Max 
## -1.856746002 -0.522942508 -0.203867491  0.199189511  5.955977930 
## 
## Random effects:
##  Groups   Name        Variance    Std.Dev.   
##  id       (Intercept) 0.253031509 0.503022374
##  Residual             1.067215435 1.033061196
## Number of obs: 821, groups:  id, 50
## 
## Fixed effects:
##                  Estimate    Std. Error            df  t value   Pr(>|t|)    
## (Intercept)   1.424493424   0.100223741 105.550683060 14.21313 < 2.22e-16 ***
## signal_time   0.774275825   0.128541525 793.026666766  6.02355 2.6097e-09 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##             (Intr)
## signal_time -0.585
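
One way to implement the [0,1] rescaling described above is min-max scaling (a sketch; the helper name is illustrative, and the actual method used in the notebook may differ):

```r
# Map a numeric vector onto [0, 1]; NAs are ignored when finding the range
rescale_01 <- function(x) {
  rng <- range(x, na.rm = TRUE)
  (x - rng[1]) / (rng[2] - rng[1])
}

signal_time <- c(12, 900, 1800, 3599)
rescale_01(signal_time)  # smallest value maps to 0, largest to 1
```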

@aridyckovsky
Member Author

aridyckovsky commented May 18, 2021

Models to run:

Predict the probability of a hit by signal time with signal time random effects

is_hit ~ 1 + signal_time + (1 + signal_time | id)

Predict the reaction time for a hit by signal time with signal time random effects

reaction_time ~ 1 + signal_time + (1 + signal_time | id)

Predict the probability of a false alarm by response time

is_false_alarm ~ 1 + resp_time + (1 | id)

Predict the probability of a false alarm by response time with response time random effects

is_false_alarm ~ 1 + resp_time + (1 + resp_time | id)

Note:

is_false_alarm is 0 when is_hit == 1, or 1 otherwise
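
As a sketch of that definition (column names follow the note; the data values are made up):

```r
# One row per response/button-press; a false alarm is any response
# that is not a hit
responses <- data.frame(
  resp_time = c(10.2, 55.8, 90.1),
  is_hit    = c(1, 0, 1)
)

responses$is_false_alarm <- ifelse(responses$is_hit == 1, 0, 1)
responses$is_false_alarm  # 0 1 0
```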

Checklist:

  • Create is_false_alarm column based on is_hit and other responses (resp_time) that occurred without a signal
  • Run above models
  • Plot the effects from models via predict
  • See whether p(hit) differs between the first and second halves of the task (first and second 1800 seconds)
  • Play with ranef(), fixef() and coef()
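
A hedged sketch of the random-slopes models listed above, plus the ranef()/fixef()/coef() calls from the checklist (assumes lme4 is installed and a combined_hits_df with these columns exists):

```r
library(lme4)

# Random intercepts and random signal_time slopes per participant
fit_hit_rfx <- glmer(
  is_hit ~ 1 + signal_time + (1 + signal_time | id),
  data = combined_hits_df, family = "binomial"
)

fit_rt_rfx <- lmer(
  reaction_time ~ 1 + signal_time + (1 + signal_time | id),
  data = combined_hits_df
)

fixef(fit_hit_rfx)     # group-level (fixed) effects
ranef(fit_hit_rfx)$id  # per-participant deviations from the fixed effects
coef(fit_hit_rfx)$id   # fixed + random effects combined, per participant
```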

@psokolhessner
Member

A tweak to the definition of is_false_alarm to more clearly define what goes into "otherwise":
is_false_alarm contains entries for all responses/button-presses, and is 0 when is_hit == 1, or 1 otherwise.

Alternatively, the variable can be named is_falsealarm_vs_hit (much like is_hit could be more clearly named is_hit_vs_miss).

aridyckovsky added a commit that referenced this issue May 25, 2021
@aridyckovsky
Member Author

I think it's clearer to identify boolean variables as is_hit and is_false_alarm, keeping them strictly logical. Introducing a "vs" into the labeling means we must rely on interpretation to understand the underlying logical value's meaning.

For example, the definition of the strictly boolean variable is_hit is very easy to understand from a data-reading perspective: if is_hit is 1 (true), then it's a hit; if is_hit is 0 (false), then it's not a hit. However, is_hit_vs_miss is not straightforward at the boolean level -- it requires annotation separate from the data to interpret correctly, i.e., is it a hit when true but a miss when false, or a hit when false but a miss when true? In this case, we are better served by adding an is_miss variable to maintain clear boolean variables with no room for interpretation error.

@psokolhessner
Member

Per conversation, will transition to use is_hit_given_signal and is_hit_given_response etc.

@aridyckovsky
Member Author

Plus: is_false_alarm_given_response

@aridyckovsky
Member Author

The models we've discussed are now part of the main pipeline via the analyze_behavioral_data sub-pipeline. The output of the analysis notebook can be found here.

@psokolhessner
Member

psokolhessner commented May 25, 2021

Fantastic. Really nice, clear evidence - with increasing time in the task, people...

  1. Miss more double-sized ticks/have fewer hits.
  2. Are slower to respond when a double-tick does happen.
  3. Are more likely to erroneously report a double-sized tick (false alarm) when responding.

(note the careful phrasing of no. 3; if we wanted to say "are more likely to false alarm when no signal is present", that would require a fourth regression or pair of regressions on is_false_alarm_given_nosignal using trial number or step time instead of resp_time or signal_time)

Interesting to note how robust all of these effects are - fully RFX models identify the exact same effects, implying that most participants experience the effects of the passage of time on hits, reaction time, and false alarms in the same or very similar ways. Visualization or characterization of the individual-level estimates (either given by coef() or the sum of fixef and ranef) would also likely establish that. Of course, if they experienced these the same way, then the RFX estimates would be 0, and that's not the case, so there is some variability, just not much, and it's dwarfed by the FFX term (group-level overall shared effect) in magnitude.
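
A self-contained sketch (on simulated data, since the real frame isn't shown here) of pulling those individual-level estimates and checking that coef() equals fixef() plus ranef():

```r
library(lme4)

# Simulated stand-in for the real data
set.seed(1)
d <- data.frame(
  id          = factor(rep(1:20, each = 30)),
  signal_time = runif(20 * 30)
)
d$is_hit <- rbinom(nrow(d), 1, plogis(0.2 - 0.8 * d$signal_time))

fit <- glmer(is_hit ~ 1 + signal_time + (1 + signal_time | id),
             data = d, family = "binomial")

per_subject <- coef(fit)$id  # one row of (intercept, slope) per participant

# coef() is fixef() plus the participant-specific ranef() deviations
all.equal(per_subject$signal_time,
          unname(fixef(fit)["signal_time"]) + ranef(fit)$id$signal_time)

# Visualize the spread of individual slopes
hist(per_subject$signal_time,
     xlab = "signal_time slope (logit scale)",
     main = "Per-participant slopes")
```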

The remaining items from the checklist above (#25 (comment) - mainly plotting model output, and mean p(hit) by half) will nicely wrap this up. Thank you @aridyckovsky this is looking great!!

@aridyckovsky aridyckovsky self-assigned this May 26, 2021