Codebook

The codebook specifies the data types, possible values, and other information for each column in the data files.

Table of contents

Word features

TODO: insert short text about this section in this file

Please find the files under this link: Word features

Column name Possible values Value type Description Num missing values Missing value description Source
word string Words as they appear in the stimuli texts. Words are split at white-space. 0 nan nan
word_with_punct The word as it appears in the text, including punctuation. 0 nan nan
word_index_in_text 1-180 Integer The index of the word in the text. Indexing starts at 1. 0 nan nan
word_index_in_sent 1-51 Integer The index of the word in the sentence. Indexing starts at 1. 0 nan nan
sent_index_in_text 1-12 Integer The index of a sentence in the respective text. Indexing starts at 1. 0 nan nan
word_limit_char_indices no stats? Specifies the limits of each word in character indices. Format: [word_start],[word_end]. For example: 3,7 means a word starts at character index 3 in the text and ends at character index 7. The properties of the character indices are specified in char_index_in_text. 0 nan nan
text_id b0, b1, b2, b3, b4, b5, p0, p1, p2, p3, p4, p5 Unique identifier given to each stimulus text. 0 nan nan
text_id_numeric 0-11 Integer Numerical value of text_id; 0=b0, 1=b1, 2=b2, 3=b3, 4=b4, 5=b5, 6=p0, 7=p1, 8=p2, 9=p3, 10=p4, 11=p5 0 nan nan
text_domain biology: 954, physics: 941 Categorical The domain of the stimulus text. 0 nan nan
text_domain_numeric 0: 954, 1: 941 Categorical Numerical value of text_domain; 0=biology, 1=physics. 0 nan nan
word_length 2-33 Integer Word length in number of characters, including symbols such as hyphens but excluding sentence-final punctuation (e.g., z.B. = 4 characters; DNA-Kette = 9 characters; eats. = 4 characters). 0 nan nan
STTS_punctuation_before nan: 1883, $(: 12 Categorical If a word is preceded by a punctuation mark, the STTS-PoS-tag of the punctuation mark is added here. 1883 nan nan
STTS_punctuation_after nan: 1689, $.: 101, $,: 93, $(: 10, $($,: 2 Categorical If a word is followed by a punctuation mark, the STTS-PoS-tag of the punctuation mark is added here. 1689 nan nan
is_in_quote 0: 1881, 1: 14 Categorical Whether or not the word is part of an expression in quotes. 0 nan Manually tagged
is_in_parentheses 0: 1890, 1: 5 Categorical Whether or not the word is part of a phrase in parentheses. 0 nan Manually tagged
is_clause_beginning 0: 1796, 1: 99 Categorical Whether or not the word is the beginning of a clause. 0 nan Manually tagged
is_sent_beginning 0: 1798, 1: 97 Categorical Whether or not the word is the beginning of a new sentence. 0 nan Manually tagged
is_clause_end Whether or not the word is the end of a clause. 0 nan nan
is_sent_end Whether or not the word is the end of a sentence. 0 nan nan
is_abbreviation 0: 1890, 1: 5 Categorical Whether or not the entire word is an abbreviation. 0 nan Manually tagged
is_expert_technical_term 0: 1740, 1: 155 Categorical 1 if the word is a technical term that is not generally understandable. E.g.: "Agarose". 0 nan Manually tagged
is_general_technical_term 0: 1646, 1: 249 Categorical 1 if the word is a technical term that is generally understandable. E.g.: "elektrisch" 0 nan nan
contains_symbol 0: 1887, 1: 8 Categorical Whether or not the word contains a symbol. E.g.: β-D-Glucose 0 nan nan
contains_hyphen 0: 1866, 1: 29 Categorical Whether or not the word contains a hyphen, e.g. 1 for DNA-Fragment. Words with the tag TRUNC (compositional first element) do not count: in "Sekundär- und Tertiärstrukturen", "Sekundär-" is not treated as containing a hyphen. 0 nan nan
contains_abbreviation 0: 1883, 1: 12 Categorical Whether or not the word contains an abbreviation. 0 for words that are only an abbreviation. See is_abbreviation. E.g. 1 for DNA-Fragment, 0 for DNA. 0 nan nan
STTS_PoS_tag ADJA: 154, ADJD: 53, ADV: 73, APPR: 184, APPRART: 48, APZR: 1, ART: 276, CARD: 9, KOKOM: 17, KON: 66, KOUI: 6, KOUS: 16, NE: 4, NN: 515, PAV: 18, PDAT: 16, PDS: 7, PIAT: 5, PIDAT: 9, PIS: 10, PPER: 25, PPOSAT: 7, PRELAT: 6, PRELS: 29, PRF: 25, PTKA: 1, PTKNEG: 4, PTKVZ: 13, PTKZU: 10, PWAV: 1, TRUNC: 5, VAFIN: 73, VAINF: 8, VMFIN: 25, VMINF: 1, VVFIN: 102, VVINF: 33, VVIZU: 2, VVPP: 38 Categorical Part-of-speech tags according to the STTS-tagset. See stimuli/ANNOTATION.MD for more information. 0 nan Manually tagged
type string The orthographical representation of a word as found in the corpus; this data is case sensitive, i.e. there is a distinction between name and Name. 4 nan dlexDB
type_length_chars 2.0-33.0 Integer The length of the type of a word in characters. See the description of word_length for a definition of how characters are counted. 1 nan nan
PoS_tag adja: 162, adjd: 54, adv: 91, appr: 182, apprart: 48, art: 280, card: 9, kokom: 17, kon: 63, koui: 5, kous: 16, ne: 7, nn: 508, pdat: 16, pds: 7, piat: 5, pidat: 2, pis: 14, pper: 24, pposat: 7, prelat: 6, prels: 24, prf: 25, ptka: 1, ptkneg: 4, ptkvz: 15, ptkzu: 10, pwav: 1, trunc: 5, vafin: 73, vainf: 8, vmfin: 24, vminf: 1, vvfin: 103, vvinf: 33, vvizu: 2, vvpp: 38, xy: 5 Categorical Part-of-speech tag as defined by the dlexDB query. 0 nan dlexDB
lemma string nan 4 nan dlexDB
lemma_length_chars 1.0-32.0 Integer nan 3 nan dlexDB
syllables string nan 25 nan dlexDB
type_length_syllables 1.0-14.0 Integer nan 24 nan dlexDB
annotated_type_frequency_normalized min: 0.00817507899599, max: 24738.5901996, mean: 3889.8532, std: 6967.089 Float The number of occurrences of an annotated type in corpus. An annotated type is a unique combination of a type, its part-of-speech tag and its lemma. 127 nan dlexDB
type_frequency_normalized min: 0.00817507899599, max: 26530.3631386, mean: 4409.2283, std: 7712.5287 Float nan 115 nan dlexDB
lemma_frequency_normalized min: 0.00817507899599, max: 80100.3069113, mean: 13063.8057, std: 25247.1898 Float nan 115 nan dlexDB
familiarity_normalized min: 0.0, max: 26530.3631386, mean: 4074.0362, std: 7634.0602 Float nan 117 nan dlexDB
regularity_normalized min: 0.0, max: 2123.30585022, mean: 37.6119, std: 123.3575 Float nan 116 nan dlexDB
document_frequency_normalized min: 0.126068429944, max: 9372.80956103, mean: 3073.6225, std: 3377.4549 Float nan 116 nan dlexDB
sentence_frequency_normalized min: 0.0155184320176, max: 30912.3596552, mean: 6119.8019, std: 9642.457 Float nan 116 nan dlexDB
cumulative_syllable_corpus_frequency_normalized min: 1.40611358731, max: 125126.524676, mean: 16825.508, std: 15793.39 Float nan 116 nan dlexDB
cumulative_syllable_lexicon_frequency_normalized min: 0.428085856899, max: 218985.607753, mean: 23221.2613, std: 31879.0143 Float nan 119 nan dlexDB
cumulative_character_corpus_frequency_normalized min: 15533.2550482, max: 7810554.20193, mean: 1917789.2641, std: 1253328.3202 Float nan 116 nan dlexDB
cumulative_character_lexicon_frequency_normalized min: 47003.8270876, max: 18380479.713, mean: 4265792.357, std: 2812004.0938 Float nan 116 nan dlexDB
cumulative_character_bigram_corpus_frequency_normalized min: 5138.64210483, max: 1322150.62097, mean: 363265.3368, std: 217175.5613 Float nan 116 nan dlexDB
cumulative_character_bigram_lexicon_frequency_normalized min: 12677.7626521, max: 2788357.77704, mean: 590209.5889, std: 442407.5129 Float nan 116 nan dlexDB
cumulative_character_trigram_corpus_frequency_normalized min: 4358.04468689, max: 603427.130456, mean: 227949.9158, std: 122856.9432 Float nan 116 nan dlexDB
cumulative_character_trigram_lexicon_frequency_normalized min: 11942.3111499, max: 899592.89035, mean: 237804.6839, std: 171696.6712 Float nan 116 nan dlexDB
initial_letter_frequency_normalized min: 199.202149895, max: 110461.430317, mean: 38381.0963, std: 33346.9984 Float nan 116 nan dlexDB
initial_bigram_frequency_normalized min: 1.57779024623, max: 53801.2331077, mean: 12768.0203, std: 14670.9631 Float nan 116 nan dlexDB
initial_trigram_frequency_normalized min: -0.00817507899599, max: 29048.3692201, mean: 5888.4981, std: 8949.4325 Float nan 116 nan dlexDB
avg_cond_prob_in_bigrams min: 1.2e-07, max: 0.5006180465, mean: 0.0451, std: 0.0448 Float The conditional probability of the bigram, given the occurrence of its first component. In other words, how likely it is for the second component to follow directly after the first. Here, this measure is computed on the basis of the annotated type information. 116 nan dlexDB
avg_cond_prob_in_trigrams min: 3.153e-06, max: 25.0, mean: 0.2526, std: 0.6009 Float The conditional probability of the trigram, given the occurrence of its initial bigram. In other words, how likely it is for the third component to follow directly after the initial pair. Here, this measure is computed on the basis of the annotated type information. 116 nan dlexDB
neighbors_coltheart_higher_freq_cum_freq_normalized min: 0.0, max: 44055.247282, mean: 2248.7136, std: 7540.5582 Float nan 116 nan dlexDB
neighbors_coltheart_higher_freq_count_normalized min: 0.0, max: 8.13363128109, mean: 0.2077, std: 0.5007 Float nan 116 nan dlexDB
neighbors_coltheart_all_cum_freq_normalized min: 0.0, max: 49782.1108458, mean: 5076.6032, std: 10127.1033 Float nan 116 nan dlexDB
neighbors_coltheart_all_count_normalized min: 0.0, max: 47.5175301158, mean: 15.7971, std: 14.4153 Float nan 116 nan dlexDB
neighbors_levenshtein_higher_freq_cum_freq_normalized min: 0.0, max: 44055.247282, mean: 2879.4346, std: 7921.0448 Float nan 116 nan dlexDB
neighbors_levenshtein_higher_freq_count_normalized min: 0.0, max: 11.9864039932, mean: 0.3277, std: 0.6576 Float nan 116 nan dlexDB
neighbors_levenshtein_all_cum_freq_normalized min: 0.0, max: 54875.2749862, mean: 6722.366, std: 11598.2601 Float nan 116 nan dlexDB
neighbors_levenshtein_all_count_normalized min: 0.0, max: 75.7711966712, mean: 24.6418, std: 22.5295 Float nan 116 nan dlexDB
sent_surprisal_gpt2-base Surprisal value extracted from a language model (GerPT2-base) with the sentence as context. 0 nan nan
text_surprisal_gpt2-base Surprisal value extracted from a language model (GerPT2-base) with the text as context. 0 nan nan
sent_surprisal_gpt2-large Surprisal value extracted from a language model (GerPT2-large) with the sentence as context. 0 nan nan
text_surprisal_gpt2-large Surprisal value extracted from a language model (GerPT2-large) with the text as context. 0 nan nan
sent_surprisal_llama-7b Surprisal value extracted from a language model (LeoLM-7b) with the sentence as context. 0 nan nan
text_surprisal_llama-7b Surprisal value extracted from a language model (LeoLM-7b) with the text as context. 0 nan nan
sent_surprisal_llama-13b Surprisal value extracted from a language model (LeoLM-13b) with the sentence as context. 0 nan nan
text_surprisal_llama-13b Surprisal value extracted from a language model (LeoLM-13b) with the text as context. 0 nan nan
sent_surprisal_bert-base Surprisal value extracted from a language model (BERT-base) with the sentence as context. 0 nan nan
text_surprisal_bert-base Surprisal value extracted from a language model (BERT-base) with the text as context. 0 nan nan
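
The word_limit_char_indices column above stores each word's limits as [word_start],[word_end]. A minimal sketch of how such a value could be parsed and used to cut the word out of a text follows. It assumes that the indices are 1-based (as stated for char_index_in_text) and that the end index is inclusive; the codebook does not state the latter explicitly. The helper names and the example sentence are made up for illustration.

```python
from typing import Tuple


def parse_word_limits(value: str) -> Tuple[int, int]:
    """Split a '[word_start],[word_end]' string into two integers."""
    start_text, end_text = value.split(",")
    return int(start_text), int(end_text)


def slice_word(text: str, limits: str) -> str:
    """Return the characters of `text` covered by `limits`.

    Assumption: character indices start at 1 and the end index is inclusive.
    """
    start, end = parse_word_limits(limits)
    return text[start - 1:end]


# Hypothetical example sentence, not taken from the stimuli.
example_text = "Die DNA-Kette ist lang."
print(parse_word_limits("5,13"))
print(slice_word(example_text, "5,13"))
```

If the end index turns out to be exclusive in the data files, the slice in slice_word would need to stop one character earlier.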

Stimuli and comprehension questions

TODO: insert short text about this section in this file

Please find the file under this link: Stimuli including comprehension questions

Column name Possible values Value type Description Num missing values Missing value description Source
text_id b0, b1, b2, b3, b4, b5, p0, p1, p2, p3, p4, p5 Unique identifier given to each stimulus text. 0 nan nan
text_id_numeric 0-11 Integer Numerical value of text_id; 0=b0, 1=b1, 2=b2, 3=b3, 4=b4, 5=b5, 6=p0, 7=p1, 8=p2, 9=p3, 10=p4, 11=p5 0 nan nan
text_domain biology: 6, physics: 6 Categorical The domain of the stimulus text. 0 nan nan
text_domain_numeric 0: 6, 1: 6 Categorical Numerical value of text_domain; 0=biology, 1=physics. 0 nan nan
source The source of the stimulus text. 0 nan nan
headline string The header of the respective stimulus text. 0 nan nan
tq_1 string Text question 1. 0 nan nan
tq_1_option1 string Option 1 for text question 1. 0 nan nan
tq_1_option2 string Option 2 for text question 1. 0 nan nan
tq_1_option3 string Option 3 for text question 1. 0 nan nan
tq_1_option4 string Option 4 for text question 1. 0 nan nan
tq_2 string Text question 2. 0 nan nan
tq_2_option1 string Option 1 for text question 2. 0 nan nan
tq_2_option2 string Option 2 for text question 2. 0 nan nan
tq_2_option3 string Option 3 for text question 2. 0 nan nan
tq_2_option4 string Option 4 for text question 2. 0 nan nan
tq_3 string Text question 3. 0 nan nan
tq_3_option1 string Option 1 for text question 3. 0 nan nan
tq_3_option2 string Option 2 for text question 3. 0 nan nan
tq_3_option3 string Option 3 for text question 3. 0 nan nan
tq_3_option4 string Option 4 for text question 3. 0 nan nan
bq_1 string Background question 1. 0 nan nan
bq_1_option1 string Option 1 for background question 1. 0 nan nan
bq_1_option2 string Option 2 for background question 1. 0 nan nan
bq_1_option3 string Option 3 for background question 1. 0 nan nan
bq_1_option4 string Option 4 for background question 1. 0 nan nan
bq_2 string Background question 2. 0 nan nan
bq_2_option1 string Option 1 for background question 2. 0 nan nan
bq_2_option2 string Option 2 for background question 2. 0 nan nan
bq_2_option3 string Option 3 for background question 2. 0 nan nan
bq_2_option4 string Option 4 for background question 2. 0 nan nan
bq_3 string Background question 3. 0 nan nan
bq_3_option1 string Option 1 for background question 3. 0 nan nan
bq_3_option2 string Option 2 for background question 3. 0 nan nan
bq_3_option3 string Option 3 for background question 3. 0 nan nan
bq_3_option4 string Option 4 for background question 3. 0 nan nan
correct_ans_tq_1 1-4 Integer The index of the correct answer for text question 1, given as the option number used in this file. For example, 2 means that the answer in the column "tq_1_option2" is the correct answer to this question. 0 nan nan
correct_ans_tq_2 1-4 Integer The index of the correct answer for text question 2, given as the option number used in this file. For example, 2 means that the answer in the column "tq_2_option2" is the correct answer to this question. 0 nan nan
correct_ans_tq_3 1-4 Integer The index of the correct answer for text question 3, given as the option number used in this file. For example, 2 means that the answer in the column "tq_3_option2" is the correct answer to this question. 0 nan nan
correct_ans_bq_1 1-4 Integer The index of the correct answer for background question 1, given as the option number used in this file. For example, 2 means that the answer in the column "bq_1_option2" is the correct answer to this question. 0 nan nan
correct_ans_bq_2 1-4 Integer The index of the correct answer for background question 2, given as the option number used in this file. For example, 2 means that the answer in the column "bq_2_option2" is the correct answer to this question. 0 nan nan
correct_ans_bq_3 1-4 Integer The index of the correct answer for background question 3, given as the option number used in this file. For example, 2 means that the answer in the column "bq_3_option2" is the correct answer to this question. 0 nan nan
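
The correct_ans_* columns point to one of the four option columns of the same question. As a rough sketch, the lookup could be written as follows; the function name and the example row are hypothetical, and the row is represented as a plain dictionary rather than any particular file reader's output.

```python
def correct_answer_text(row: dict, question: str) -> str:
    """Return the text of the correct option for `question` (e.g. 'tq_1').

    The value in 'correct_ans_<question>' is an option number from 1 to 4
    and selects the column '<question>_option<number>'.
    """
    option_number = int(row[f"correct_ans_{question}"])
    return row[f"{question}_option{option_number}"]


# Hypothetical row with placeholder strings, not taken from the stimuli file.
example_row = {
    "tq_1": "placeholder question",
    "tq_1_option1": "answer A",
    "tq_1_option2": "answer B",
    "tq_1_option3": "answer C",
    "tq_1_option4": "answer D",
    "correct_ans_tq_1": 2,
}
print(correct_answer_text(example_row, "tq_1"))
```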

Items

TODO: insert short text about this section in this file

Please find the file under this link: Items

| Column name | Possible values | Value type | Description | Num missing values | Missing value description | Source |
| --- | --- | --- | --- | --- | --- | --- |
| version | 0-119 | Integer | Specifies the version of the items. In each version, the order of the stimuli and the order of the answer options for each question differ. The specifics of each version can be found in the items.tsv. | 0 | nan | nan |
| text_id | b0, b1, b2, b3, b4, b5, p0, p1, p2, p3, p4, p5 | | Unique identifier given to each stimulus text. | 0 | nan | nan |
| text_domain | biology: 720, physics: 720 | Categorical | The domain of the stimulus text. | 0 | nan | nan |
| order_bq_1_ans | no stats? | | The order in which the answers for background question 1 were presented. | 0 | nan | nan |
| order_bq_2_ans | no stats? | | See description of order_bq_1_ans | 0 | nan | nan |
| order_bq_3_ans | no stats? | | See description of order_bq_1_ans | 0 | nan | nan |
| order_tq_1_ans | no stats? | | See description of order_bq_1_ans | 0 | nan | nan |
| order_tq_2_ans | no stats? | | See description of order_bq_1_ans | 0 | nan | nan |
| order_tq_3_ans | no stats? | | See description of order_bq_1_ans | 0 | nan | nan |
| trial | 1-12 | Integer | Each participant reads all 12 texts, the order of which follows their trial number. If text b0 has trial number 2 for participant 5, this participant read text b0 as the second text. | 0 | nan | nan |

Areas of interest (AOI)

TODO: insert short text about this section in this file

Please find the files under this link: AOI

| Column name | Possible values | Value type | Description | Num missing values | Missing value description | Source |
| --- | --- | --- | --- | --- | --- | --- |
| aoi_type | | | The shape of the area of interest. In this corpus, all aois are rectangles around the characters. | 0 | nan | nan |
| aoi | 1-1121 | Integer | The region of interest specified as character index in the text (see char_index_in_text). Defines which character has been fixated. | 0 | nan | nan |
| start_x | 80-1622 | Integer | The x-coordinate in pixels of the top left corner of the aoi rectangle. | 0 | nan | nan |
| start_y | 21-920 | Integer | The y-coordinate in pixels of the top left corner of the aoi rectangle. | 0 | nan | nan |
| end_x | 92-1634 | Integer | The x-coordinate in pixels of the bottom right corner of the aoi rectangle. | 0 | nan | nan |
| end_y | 99-998 | Integer | The y-coordinate in pixels of the bottom right corner of the aoi rectangle. | 0 | nan | nan |
| character | | string | Character as text. | 0 | nan | nan |
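
Each AOI is a rectangle given by its top left corner (start_x, start_y) and bottom right corner (end_x, end_y). The sketch below shows one way a gaze position in pixels could be mapped to the character AOI that contains it. It assumes that the start edges belong to the rectangle and the end edges do not, which the codebook does not specify; the class, the function and the example rectangles are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Aoi:
    aoi: int          # character index in the text
    start_x: int      # top left corner, pixels
    start_y: int
    end_x: int        # bottom right corner, pixels
    end_y: int
    character: str

    def contains(self, x: float, y: float) -> bool:
        # Assumption: start edges are inclusive, end edges are exclusive.
        return self.start_x <= x < self.end_x and self.start_y <= y < self.end_y


def find_aoi(aois: Iterable[Aoi], x: float, y: float) -> Optional[Aoi]:
    """Return the first AOI whose rectangle contains (x, y), or None."""
    for candidate in aois:
        if candidate.contains(x, y):
            return candidate
    return None


# Hypothetical rectangles, chosen to lie within the value ranges listed above.
example_aois = [
    Aoi(aoi=1, start_x=80, start_y=21, end_x=92, end_y=99, character="D"),
    Aoi(aoi=2, start_x=92, start_y=21, end_x=104, end_y=99, character="i"),
]
hit = find_aoi(example_aois, 95, 50)
print(hit.character if hit else None)
```

With half-open rectangles, a point on the shared edge of two neighbouring characters is assigned to exactly one of them.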

Dependency trees

TODO: insert short text about this section in this file

Please find the file under this link: Dependency trees

| Column name | Possible values | Value type | Description | Num missing values | Missing value description | Source |
| --- | --- | --- | --- | --- | --- | --- |
| sent_index_in_text | 1-12 | Integer | The index of a sentence in the respective text. Indexing starts at 1. | 0 | nan | nan |
| sentence | | string | The sentence in the text. | 0 | nan | nan |
| dependency_tree | | string | The dependency tree of the sentence in the text. | 0 | nan | nan |
| text_id_numeric | 0-11 | Integer | Numerical value of text_id; 0=b0, 1=b1, 2=b2, 3=b3, 4=b4, 5=b5, 6=p0, 7=p1, 8=p2, 9=p3, 10=p4, 11=p5 | 0 | nan | nan |
| text_id | b0, b1, b2, b3, b4, b5, p0, p1, p2, p3, p4, p5 | | Unique identifier given to each stimulus text. | 0 | nan | nan |

Fixations

TODO: insert short text about this section in this file

Please find the files under this link: Fixations

Column name Possible values Value type Description Num missing values Missing value description Source
fixation_index 1-1469 Integer The index of the fixation in temporal order. 0 nan nan
text_domain bio: 203667, biology: 1032, physics: 199721 Categorical The domain of the stimulus text. 0 nan nan
trial 1-12 Integer Each participant reads all 12 texts, the order of which follows their trial number. If text b0 has trial number 2 for participant 5, this participant read text b0 as the second text. 0 nan nan
acc_bq_1 min: 0.0, max: 1.0, mean: 0.3869, std: 0.487 Float The accuracy of background question 1. The answer can be either true or false, so the value is either 0 or 1. 5785 For participant 1 (p0), 31 (p0, b1, b5), 32 (p0), 61 (p1), 62 (b0, b1, b3, b5, 04) and 90 (b3) the accuracies for certain trials are missing due to hardware problems (missing measurements). nan
acc_bq_2 min: 0.0, max: 1.0, mean: 0.3564, std: 0.4789 Float The accuracy of background question 2. The answer can be either true or false, so the value is either 0 or 1. 5785 For participant 1 (p0), 31 (p0, b1, b5), 32 (p0), 61 (p1), 62 (b0, b1, b3, b5, 04) and 90 (b3) the accuracies for certain trials are missing due to hardware problems (missing measurements). nan
acc_bq_3 min: 0.0, max: 1.0, mean: 0.4217, std: 0.4938 Float The accuracy of background question 3. The answer can be either true or false, so the value is either 0 or 1. 5785 For participant 1 (p0), 31 (p0, b1, b5), 32 (p0), 61 (p1), 62 (b0, b1, b3, b5, 04) and 90 (b3) the accuracies for certain trials are missing due to hardware problems (missing measurements). nan
acc_tq_1 min: 0.0, max: 1.0, mean: 0.6625, std: 0.4729 Float The accuracy of text question 1. The answer can be either true or false, so the value is either 0 or 1. 5785 For participant 1 (p0), 31 (p0, b1, b5), 32 (p0), 61 (p1), 62 (b0, b1, b3, b5, 04) and 90 (b3) the accuracies for certain trials are missing due to hardware problems (missing measurements). nan
acc_tq_2 min: 0.0, max: 1.0, mean: 0.6326, std: 0.4821 Float The accuracy of text question 2. The answer can be either true or false, so the value is either 0 or 1. 5785 For participant 1 (p0), 31 (p0, b1, b5), 32 (p0), 61 (p1), 62 (b0, b1, b3, b5, 04) and 90 (b3) the accuracies for certain trials are missing due to hardware problems (missing measurements). nan
acc_tq_3 min: 0.0, max: 1.0, mean: 0.6564, std: 0.4749 Float The accuracy of text question 3. The answer can be either true or false, so the value is either 0 or 1. 5785 For participant 1 (p0), 31 (p0, b1, b5), 32 (p0), 61 (p1), 62 (b0, b1, b3, b5, 04) and 90 (b3) the accuracies for certain trials are missing due to hardware problems (missing measurements). nan
fixation_duration 2-4474 Integer The duration of the fixation in milliseconds. 0 nan nan
next_saccade_duration 1.0-9491.0 Integer The duration of the saccade that follows a fixation in milliseconds. 46 nan nan
previous_saccade_duration nan-nan Integer The duration of the saccade that precedes a fixation in milliseconds. 515 nan nan
version 0-105 Integer Specifies the version of the items. In each version, the order of the stimuli and the order of the answer options for each question differ. The specifics of each version can be found in the items.tsv. 0 nan nan
line 1-12 Integer The texts were presented on the screen in multiple lines. Specifies the line of the respective row; indexing starts at 1. 0 nan nan
aoi 1-1121 Integer The region of interest specified as character index in the text (see char_index_in_text). Defines which character has been fixated. 0 nan nan
char_index_in_line 1-100 Integer Index of a character in the line. Indexing starts at 1. 0 nan nan
original_fixation_index 1-1478 Integer The index of the uncorrected fixation. 0 nan nan
is_fixation_adjusted False: 382202, True: 22218 Categorical Whether or not the fixation has been adjusted manually. 0 nan Manually tagged.
reader_id 0-105 Integer The unique identifier given to each reader. Reader IDs start at 0. 0 nan nan
text_id b0, b1, b2, b3, b4, b5, p0, p1, p2, p3, p4, p5 Unique identifier given to each stimulus text. 0 nan nan
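
The text_domain counts in this table list both bio and biology. Their sum (203667 + 1032 = 204699) equals the count reported for text_domain_numeric = 0 in the Scanpaths table, which suggests that bio is a spelling variant of biology. Under that assumption, the labels could be harmonized before grouping by domain, roughly as sketched here; the alias table and function name are hypothetical.

```python
from collections import Counter

# Assumption: "bio" is a spelling variant of "biology".
DOMAIN_ALIASES = {"bio": "biology"}


def normalize_domain(label: str) -> str:
    """Map known label variants to the canonical domain name."""
    return DOMAIN_ALIASES.get(label, label)


# Counts as listed for text_domain in the table above.
raw_counts = Counter({"bio": 203667, "biology": 1032, "physics": 199721})

merged_counts = Counter()
for label, count in raw_counts.items():
    merged_counts[normalize_domain(label)] += count

print(dict(merged_counts))
```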

Scanpaths

TODO: insert short text about this section in this file

Please find the files under this link: Scanpaths

Column name Possible values Value type Description Num missing values Missing value description Source
fixation_index 1-1469 Integer The index of the fixation in temporal order. 0 nan nan
text_domain bio: 4682, biology: 200017, physics: 199721 Categorical The domain of the stimulus text. 0 nan nan
trial 1-12 Integer Each participant reads all 12 texts, the order of which follows their trial number. If text b0 has trial number 2 for participant 5, this participant read text b0 as the second text. 0 nan nan
acc_bq_1 min: 0.0, max: 1.0, mean: 0.3869, std: 0.487 Float The accuracy of background question 1. The answer can be either true or false, so the value is either 0 or 1. 5785 For participant 1 (p0), 31 (p0, b1, b5), 32 (p0), 61 (p1), 62 (b0, b1, b3, b5, 04) and 90 (b3) the accuracies for certain trials are missing due to hardware problems (missing measurements). nan
acc_bq_2 min: 0.0, max: 1.0, mean: 0.3564, std: 0.4789 Float The accuracy of background question 2. The answer can be either true or false, so the value is either 0 or 1. 5785 For participant 1 (p0), 31 (p0, b1, b5), 32 (p0), 61 (p1), 62 (b0, b1, b3, b5, 04) and 90 (b3) the accuracies for certain trials are missing due to hardware problems (missing measurements). nan
acc_bq_3 min: 0.0, max: 1.0, mean: 0.4217, std: 0.4938 Float The accuracy of background question 3. The answer can be either true or false, so the value is either 0 or 1. 5785 For participant 1 (p0), 31 (p0, b1, b5), 32 (p0), 61 (p1), 62 (b0, b1, b3, b5, 04) and 90 (b3) the accuracies for certain trials are missing due to hardware problems (missing measurements). nan
acc_tq_1 min: 0.0, max: 1.0, mean: 0.6625, std: 0.4729 Float The accuracy of text question 1. The answer can be either true or false, so the value is either 0 or 1. 5785 For participant 1 (p0), 31 (p0, b1, b5), 32 (p0), 61 (p1), 62 (b0, b1, b3, b5, 04) and 90 (b3) the accuracies for certain trials are missing due to hardware problems (missing measurements). nan
acc_tq_2 min: 0.0, max: 1.0, mean: 0.6326, std: 0.4821 Float The accuracy of text question 2. The answer can be either true or false, so the value is either 0 or 1. 5785 For participant 1 (p0), 31 (p0, b1, b5), 32 (p0), 61 (p1), 62 (b0, b1, b3, b5, 04) and 90 (b3) the accuracies for certain trials are missing due to hardware problems (missing measurements). nan
acc_tq_3 min: 0.0, max: 1.0, mean: 0.6564, std: 0.4749 Float The accuracy of text question 3. The answer can be either true or false, so the value is either 0 or 1. 5785 For participant 1 (p0), 31 (p0, b1, b5), 32 (p0), 61 (p1), 62 (b0, b1, b3, b5, 04) and 90 (b3) the accuracies for certain trials are missing due to hardware problems (missing measurements). nan
fixation_duration 2-4474 Integer The duration of the fixation in milliseconds. 0 nan nan
next_saccade_duration 1.0-9491.0 Integer The duration of the saccade that follows a fixation in milliseconds. 46 nan nan
previous_saccade_duration 1.0-9491.0 Integer The duration of the saccade that precedes a fixation in milliseconds. 515 nan nan
version 0-105 Integer Specifies the version of the items. In each version, the order of the stimuli and the order of the answer options for each question differ. The specifics of each version can be found in the items.tsv. 0 nan nan
line 1-12 Integer The texts were presented on the screen in multiple lines. Specifies the line of the respective row; indexing starts at 1. 0 nan nan
aoi 1-1121 Integer The region of interest specified as character index in the text (see char_index_in_text). Defines which character has been fixated. 0 nan nan
char_index_in_line 1-100 Integer Index of a character in the line. Indexing starts at 1. 0 nan nan
original_fixation_index 1-1478 Integer The index of the uncorrected fixation. 0 nan nan
is_fixation_adjusted False: 382202, True: 22218 Categorical Whether or not the fixation has been adjusted manually. 0 nan Manually tagged.
reader_id 0-105 Integer The unique identifier given to each reader. Reader IDs start at 0. 0 nan nan
text_id b0, b1, b2, b3, b4, b5, p0, p1, p2, p3, p4, p5 Unique identifier given to each stimulus text. 0 nan nan
word_index_in_text 1-180 Integer The index of the word in the text. Indexing starts at 1. 0 nan nan
sent_index_in_text 1-12 Integer The index of a sentence in the respective text. Indexing starts at 1. 0 nan nan
char_index_in_text 1-1121 Integer Index of a character in the text. Indexing starts at 1. 0 nan nan
word string Words as they appear in the stimuli texts. Words are split at white-space. 0 nan nan
character string Character as text. 0 nan nan
text_id_numeric 0-11 Integer Numerical value of text_id; 0=b0, 1=b1, 2=b2, 3=b3, 4=b4, 5=b5, 6=p0, 7=p1, 8=p2, 9=p3, 10=p4, 11=p5 0 nan nan
text_domain_numeric 0: 204699, 1: 199721 Categorical Numerical value of text_domain; 0=biology, 1=physics. 0 nan nan
reader_domain_numeric 0: 223158, 1: 181262 Categorical Numerical encoding of the reader domain; 0=biology, 1=physics. 0 nan nan
expert_status_numeric 0: 154333, 1: 250087 Categorical Numerical value of expert_status; 0=beginner, 1=expert. 0 nan nan
expert_reading_label_numeric 0: 290883, 1: 113537 Categorical Numeric encoding of whether the reader is an expert in the text domain (i.e. text_domain == reader_domain and reader is expert). 1=expert_reading, 0=non-expert_reading 0 nan nan
expert_reading_label expert_reading: 113537, non-expert_reading: 290883 Categorical Whether the reader is an expert in the text domain (i.e. text_domain == reader_domain and reader is expert) 0 nan nan
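
The expert reading label is described above as holding when text_domain equals reader_domain and the reader is an expert. A small sketch of that rule, using the numeric encodings from this table (0=biology, 1=physics; 0=beginner, 1=expert), could look like this; the function name is hypothetical.

```python
EXPERT_READING_LABELS = {1: "expert_reading", 0: "non-expert_reading"}


def expert_reading_label_numeric(
    text_domain_numeric: int,
    reader_domain_numeric: int,
    expert_status_numeric: int,
) -> int:
    """Return 1 if an expert reads a text from their own domain, else 0."""
    same_domain = text_domain_numeric == reader_domain_numeric
    is_expert = expert_status_numeric == 1
    return int(same_domain and is_expert)


# A physics expert (reader domain 1, status 1) reading a biology text (domain 0).
label = expert_reading_label_numeric(0, 1, 1)
print(label, EXPERT_READING_LABELS[label])
```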

Reading measures

TODO: insert short text about this section in this file

Please find the files under this link: Reading measures

Column name Possible values Value type Description Num missing values Missing value description Source
word_index_in_sent 1-51 Integer The index of the word in the sentence. Indexing starts at 1. 0 nan nan
sent_index_in_text 1-12 Integer The index of a sentence in the respective text. Indexing starts at 1. 0 nan nan
FFD min: 0, max: 2144, mean: 166.4158, std: 132.8433 Float First-fixation duration: duration of the first fixation on a word if this word is fixated in first-pass reading, otherwise 0. 0 nan nan
SFD min: 0, max: 2144, mean: 118.8309, std: 135.573 Float Single-fixation duration: duration of the only first-pass fixation on a word, 0 if the word was skipped or more than one fixation occurred in the first-pass (equals FFD in case of a single first-pass fixation). 0 nan nan
FD min: 0, max: 2144, mean: 203.5219, std: 116.9324 Float First duration: duration of the first fixation on a word (identical to FFD if not skipped in the first-pass). 0 nan nan
FPRT min: 0, max: 9649, mean: 247.1511, std: 298.6889 Float First-pass reading time: sum of the durations of all first-pass fixations on a word (0 if the word was skipped in the first-pass). 0 nan nan
FRT min: 0, max: 9649, mean: 291.8272, std: 288.631 Float First-reading time: sum of the durations of all fixations from first fixating the word (regardless of whether the first fixation occurs in first-pass reading) until leaving the word for the first time (equals FPRT if the word was fixated in the first-pass). 0 nan nan
TFT min: 0, max: 25314, mean: 632.8199, std: 720.3975 Float Total-fixation time: sum of all fixations on a word (FPRT+RRT). 0 nan nan
RRT min: 0, max: 23902, mean: 385.6688, std: 597.5206 Float Re-reading time: sum of the durations of all fixations on a word that do not belong to the first-pass (TFT-FPRT). 0 nan nan
RPD_inc min: 0, max: 318898, mean: 632.8199, std: 3881.7376 Float Inclusive regression-path duration: Sum of all fixation durations starting from the first first-pass fixation on a word until fixation on a word to the right of this word (including all regressive fixations on previous words), 0 if the word was not fixated in the first-pass (RPD_exc+RBRT). 0 nan nan
RPD_exc min: 0, max: 315640, mean: 342.295, std: 3815.3786 Float Exclusive regression-path duration: Sum of all fixation durations after initiating a first-pass regression from a word until fixating a word to the right of this word, without counting fixations on the word itself (RPD_inc-RBRT). 0 nan nan
RBRT min: 0, max: 10675, mean: 290.5249, std: 358.8929 Float Right-bounded reading time: Sum of all fixation durations on a word until a word to the right of this word is fixated (RPD_inc-RPD_exc). 0 nan nan
Fix 0: 14182, 1: 127943 Categorical Fixation: 1 if the word was fixated, otherwise 0 (FPF or RR). 0 nan nan
FPF 0: 38408, 1: 103717 Categorical First-pass fixation: 1 if the word was fixated in the first-pass, otherwise 0. 0 nan nan
RR 0: 48283, 1: 93842 Categorical Re-reading: 1 if the word was fixated after the first-pass reading, otherwise 0 (sign(RRT)). 0 nan nan
FPReg 0: 119060, 1: 23065 Categorical First-pass regression: 1 if a regression was initiated in the first-pass reading of the word, otherwise 0 (sign(RPD_exc)). 0 nan nan
TRC_out 0-15 Integer Total count of outgoing regressions: total number of regressive saccades initiated from this word. 0 nan nan
TRC_in 0-12 Integer Total count of incoming regressions: total number of regressive saccades landing on this word. 0 nan nan
LP 0-28 Integer Landing position: position of the first saccade on the word expressed by ordinal position of the fixated character. 0 nan nan
SL_in -162 to 156 Integer Incoming saccade length: length of the saccade that leads to first fixation on a word in number of words; positive sign if the saccade is a progressive one, negative sign if it is a regression. 0 nan nan
SL_out -179 to 63 Integer Outgoing saccade length: length of the first saccade that leaves the word in number of words; positive sign if the saccade is a progressive one, negative sign if it is a regression; 0 if the word is never fixated. 0 nan nan
TFC The total fixation count on the word. 0 nan nan
text_domain_numeric 0: 71550, 1: 70575 Categorical Numerical value of text_domain; 0=biology, 1=physics. 0 nan nan
trial 1-12 Integer Each participant reads all 12 texts, the order of which follows their trial number. If text b0 has trial number 2 for participant 5, this participant read text b0 as the second text. 0 nan nan
text_id b0, b1, b2, b3, b4, b5, p0, p1, p2, p3, p4, p5 Unique identifier given to each stimulus text. 0 nan nan
text_id_numeric 0-11 Integer Numerical value of text_id; 0=b0, 1=b1, 2=b2, 3=b3, 4=b4, 5=b5, 6=p0, 7=p1, 8=p2, 9=p3, 10=p4, 11=p5 0 nan nan
reader_id 0-105 Integer The unique identifier given to each reader. Reader IDs start at 0. 0 nan nan
gender_numeric 0.0: 66325, 1.0: 73905, nan: 1895 Categorical Numerical value of gender; 0=male, 1=female. 1895 nan nan
reader_domain_numeric 0: 81485, 1: 60640 Categorical Numerical encoding of the reader domain; 0=biology, 1=physics. 0 nan nan
expert_status_numeric 0: 53060, 1: 89065 Categorical Numerical value of expert_status; 0=beginner, 1=expert. 0 nan nan
domain_expert_status_numeric 0: 30320, 1: 51165, 2: 22740, 3: 37900 Categorical Numerical value of domain_expert_status; 0=biology-beginner, 1=biology-expert, 2=physics-beginner, 3=physics-expert. 0 nan nan
expert_reading_label_numeric 0: 97547, 1: 44578 Categorical Numeric encoding of whether the reader is an expert in the text domain (i.e. text_domain == reader_domain and reader is expert). 1=expert_reading, 0=non-expert_reading 0 nan nan
expert_reading_label expert_reading: 44578, non-expert_reading: 97547 Categorical Whether the reader is an expert in the text domain (i.e. text_domain == reader_domain and reader is expert) 0 nan nan
age min: 18.0, max: 41.0, mean: 24.1644, std: 4.1809 Float Reader's age. 3790 nan nan
mean_acc_bq min: 0.0, max: 1.0, mean: 0.6487, std: 0.3076 Float The mean accuracy of all text questions for one text read by one reader. 1958 nan nan
mean_acc_tq min: 0.0, max: 1.0, mean: 0.3939, std: 0.3158 Float The mean accuracy of all background questions for one text read by one reader. 1958 nan nan
acc_bq_1 min: 0.0, max: 1.0, mean: 0.3922, std: 0.4883 Float The accuracy of background question 1. The answer can be either true or false, so the value is either 0 or 1. 1958 For participants 1 (p0), 31 (p0, b1, b5), 32 (p0), 61 (p1), 62 (b0, b1, b3, b5, 04) and 90 (b3), the accuracies for certain trials are missing due to hardware problems (missing measurements). nan
acc_bq_2 min: 0.0, max: 1.0, mean: 0.3619, std: 0.4805 Float The accuracy of background question 2. The answer can be either true or false, so the value is either 0 or 1. 1958 For participants 1 (p0), 31 (p0, b1, b5), 32 (p0), 61 (p1), 62 (b0, b1, b3, b5, 04) and 90 (b3), the accuracies for certain trials are missing due to hardware problems (missing measurements). nan
acc_bq_3 min: 0.0, max: 1.0, mean: 0.4277, std: 0.4947 Float The accuracy of background question 3. The answer can be either true or false, so the value is either 0 or 1. 1958 For participants 1 (p0), 31 (p0, b1, b5), 32 (p0), 61 (p1), 62 (b0, b1, b3, b5, 04) and 90 (b3), the accuracies for certain trials are missing due to hardware problems (missing measurements). nan
acc_tq_1 min: 0.0, max: 1.0, mean: 0.6469, std: 0.4779 Float The accuracy of text question 1. The answer can be either true or false, so the value is either 0 or 1. 1958 For participants 1 (p0), 31 (p0, b1, b5), 32 (p0), 61 (p1), 62 (b0, b1, b3, b5, 04) and 90 (b3), the accuracies for certain trials are missing due to hardware problems (missing measurements). nan
acc_tq_2 min: 0.0, max: 1.0, mean: 0.6428, std: 0.4792 Float The accuracy of text question 2. The answer can be either true or false, so the value is either 0 or 1. 1958 For participants 1 (p0), 31 (p0, b1, b5), 32 (p0), 61 (p1), 62 (b0, b1, b3, b5, 04) and 90 (b3), the accuracies for certain trials are missing due to hardware problems (missing measurements). nan
acc_tq_3 min: 0.0, max: 1.0, mean: 0.6563, std: 0.4749 Float The accuracy of text question 3. The answer can be either true or false, so the value is either 0 or 1. 1958 For participants 1 (p0), 31 (p0, b1, b5), 32 (p0), 61 (p1), 62 (b0, b1, b3, b5, 04) and 90 (b3), the accuracies for certain trials are missing due to hardware problems (missing measurements). nan
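
The descriptions above state several relations between the reading measures in parentheses (for example TFT = FPRT+RRT and RPD_inc = RPD_exc+RBRT). The following is a minimal sketch of how one row could be checked against these relations. The function names `sign` and `check_reading_measures` and the example row are hypothetical and not part of the dataset or its tooling; `sign` is assumed to map positive values to 1 and everything else to 0, which fits the non-negative durations listed above.

```python
import math


def sign(x):
    """Return 1 for positive values and 0 otherwise (assumed reading of sign(...))."""
    return 1 if x > 0 else 0


def check_reading_measures(row):
    """Return the names of the codebook relations that a single row violates."""
    checks = {
        "TFT == FPRT + RRT": math.isclose(row["TFT"], row["FPRT"] + row["RRT"]),
        "RPD_inc == RPD_exc + RBRT": math.isclose(
            row["RPD_inc"], row["RPD_exc"] + row["RBRT"]
        ),
        "Fix == (FPF or RR)": row["Fix"] == int(bool(row["FPF"] or row["RR"])),
        "RR == sign(RRT)": row["RR"] == sign(row["RRT"]),
        "FPReg == sign(RPD_exc)": row["FPReg"] == sign(row["RPD_exc"]),
    }
    return [name for name, ok in checks.items() if not ok]


# Hypothetical example row: values are invented for illustration only.
example_row = {
    "FPRT": 200, "RRT": 150, "TFT": 350,
    "RPD_inc": 500, "RPD_exc": 300, "RBRT": 200,
    "Fix": 1, "FPF": 1, "RR": 1, "FPReg": 1,
}
print(check_reading_measures(example_row))  # prints []
```

An empty list means the row satisfies all five relations; any returned name points to a relation worth inspecting in the data.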

Merged: fixations, participant info, reading measures and word features

TODO: insert short text about this section in this file

Please find the files under this link: Reading measures merged

Column name Possible values Value type Description Num missing values Missing value description Source
word string Words as they appear in the stimuli texts. Words are split at white-space. 0 nan nan
word_with_punct The word as it appears in the text, including punctuation. 0 nan nan
word_index_in_text 1-180 Integer The index of the word in the text. Indexing starts at 1. 0 nan nan
word_index_in_sent 1-51 Integer The index of the word in the sentence. Indexing starts at 1. 0 nan nan
sent_index_in_text 1-12 Integer The index of a sentence in the respective text. Indexing starts at 1. 0 nan nan
text_id_numeric 0-11 Integer Numerical value of text_id; 0=b0, 1=b1, 2=b2, 3=b3, 4=b4, 5=b5, 6=p0, 7=p1, 8=p2, 9=p3, 10=p4, 11=p5 0 nan nan
text_domain biology: 71550, physics: 70575 Categorical The domain of the stimulus text. 0 nan nan
word_length 2-33 Integer Word length is defined in number of characters including symbols like hyphens but without sentence punctuation at the end (e.g., z.B. = 4 characters; DNA-Kette = 9 characters; eats. = 4 characters). 0 nan nan
STTS_punctuation_before 0.0: 70800, 0: 70425, $(: 900 Categorical If a word is preceded by a punctuation mark, the STTS-PoS-tag of the punctuation mark is added here. 0 nan nan
STTS_punctuation_after $(: 750, $($,: 150, $,: 6975, $.: 7575, 0: 126675 Categorical If a word is followed by a punctuation mark, the STTS-PoS-tag of the punctuation mark is added here. 0 nan nan
is_in_quote 0: 141075, 1: 1050 Categorical Whether or not the word is part of an expression in quotes. 0 nan Manually tagged
is_in_parentheses 0: 141750, 1: 375 Categorical Whether or not the word is part of a phrase in parentheses. 0 nan Manually tagged
is_clause_beginning 0: 134700, 1: 7425 Categorical Whether or not the word is the beginning of a clause. 0 nan Manually tagged
is_sent_beginning 0: 134850, 1: 7275 Categorical Whether or not the word is the beginning of a new sentence. 0 nan Manually tagged
is_clause_end Whether or not the word is the end of a clause. 0 nan nan
is_sent_end Whether or not the word is the end of a sentence. 0 nan nan
is_abbreviation 0: 141750, 1: 375 Categorical Whether or not the entire word is an abbreviation. 0 nan Manually tagged
is_expert_technical_term 0: 130500, 1: 11625 Categorical 1 if the word is a technical term that is not generally understandable. E.g.: "Agarose". 0 nan Manually tagged
is_general_technical_term 0: 123450, 1: 18675 Categorical 1 if the word is a technical term that is generally understandable. E.g.: "elektrisch" 0 nan nan
contains_symbol 0: 141525, 1: 600 Categorical Whether or not the word contains a symbol. E.g.: β-D-Glucose 0 nan nan
contains_hyphen 0: 139950, 1: 2175 Categorical Whether or not the word contains a hyphen. E.g.: 1 for DNA-Fragment. Words tagged TRUNC (compositional first elements) do not count as containing a hyphen, e.g. "Sekundär-" in "Sekundär- und Tertiärstrukturen". 0 nan nan
contains_abbreviation 0: 141225, 1: 900 Categorical Whether or not the word contains an abbreviation. 0 for words that are only an abbreviation. See is_abbreviation. E.g. 1 for DNA-Fragment, 0 for DNA. 0 nan nan
STTS_PoS_tag ADJA: 11550, ADJD: 3975, ADV: 5475, APPR: 13800, APPRART: 3600, APZR: 75, ART: 20700, CARD: 675, KOKOM: 1275, KON: 4950, KOUI: 450, KOUS: 1200, NE: 300, NN: 38625, PAV: 1350, PDAT: 1200, PDS: 525, PIAT: 375, PIDAT: 675, PIS: 750, PPER: 1875, PPOSAT: 525, PRELAT: 450, PRELS: 2175, PRF: 1875, PTKA: 75, PTKNEG: 300, PTKVZ: 975, PTKZU: 750, PWAV: 75, TRUNC: 375, VAFIN: 5475, VAINF: 600, VMFIN: 1875, VMINF: 75, VVFIN: 7650, VVINF: 2475, VVIZU: 150, VVPP: 2850 Categorical Part-of-speech tags according to the STTS-tagset. See stimuli/ANNOTATION.MD for more information. 0 nan Manually tagged
type string The orthographical representation of a word as found in the corpus; this data is case sensitive, i.e. there is a distinction between name and Name. 0 nan dlexDB
type_length_chars 0.0-33.0 Integer The length of the type of a word in characters. See the description of word_length for a definition of how characters are counted. 0 nan nan
PoS_tag adja: 12150, adjd: 4050, adv: 6825, appr: 13650, apprart: 3600, art: 21000, card: 675, kokom: 1275, kon: 4725, koui: 375, kous: 1200, ne: 525, nn: 38100, pdat: 1200, pds: 525, piat: 375, pidat: 150, pis: 1050, pper: 1800, pposat: 525, prelat: 450, prels: 1800, prf: 1875, ptka: 75, ptkneg: 300, ptkvz: 1125, ptkzu: 750, pwav: 75, trunc: 375, vafin: 5475, vainf: 600, vmfin: 1800, vminf: 75, vvfin: 7725, vvinf: 2475, vvizu: 150, vvpp: 2850, xy: 375 Categorical Part-of-speech tag as defined by the dlexDB query. 0 nan dlexDB
lemma string nan 0 nan dlexDB
lemma_length_chars 0.0-32.0 Integer nan 0 nan dlexDB
syllables string nan 0 nan dlexDB
type_length_syllables 0.0-14.0 Integer nan 0 nan dlexDB
annotated_type_frequency_normalized min: 0.0, max: 24738.5901996, mean: 3629.1612, std: 6797.6492 Float The number of occurrences of an annotated type in the corpus. An annotated type is a unique combination of a type, its part-of-speech tag and its lemma. 0 nan dlexDB
type_frequency_normalized min: 0.0, max: 26530.3631386, mean: 4141.6498, std: 7546.5578 Float nan 0 nan dlexDB
lemma_frequency_normalized min: 0.0, max: 80100.3069113, mean: 12271.0154, std: 24660.3797 Float nan 0 nan dlexDB
familiarity_normalized min: 0.0, max: 26530.3631386, mean: 3822.4994, std: 7457.3314 Float nan 0 nan dlexDB
regularity_normalized min: 0.0, max: 2123.30585022, mean: 35.3095, std: 119.8288 Float nan 0 nan dlexDB
document_frequency_normalized min: 0.0, max: 9372.80956103, mean: 2885.4746, std: 3353.4877 Float nan 0 nan dlexDB
sentence_frequency_normalized min: 0.0, max: 30912.3596552, mean: 5745.1861, std: 9454.5921 Float nan 0 nan dlexDB
cumulative_syllable_corpus_frequency_normalized min: 0.0, max: 125126.524676, mean: 15795.556, std: 15820.9152 Float nan 0 nan dlexDB
cumulative_syllable_lexicon_frequency_normalized min: 0.0, max: 218985.607753, mean: 21763.0396, std: 31363.3366 Float nan 0 nan dlexDB
cumulative_character_corpus_frequency_normalized min: 0.0, max: 7810554.20193, mean: 1800394.2485, std: 1298158.5605 Float nan 0 nan dlexDB
cumulative_character_lexicon_frequency_normalized min: 0.0, max: 18380479.713, mean: 4004667.3367, std: 2909455.8454 Float nan 0 nan dlexDB
cumulative_character_bigram_corpus_frequency_normalized min: 0.0, max: 1322150.62097, mean: 341028.5141, std: 227677.2532 Float nan 0 nan dlexDB
cumulative_character_bigram_lexicon_frequency_normalized min: 0.0, max: 2788357.77704, mean: 554080.6642, std: 451286.9101 Float nan 0 nan dlexDB
cumulative_character_trigram_corpus_frequency_normalized min: 0.0, max: 603427.130456, mean: 213996.2534, std: 130950.6249 Float nan 0 nan dlexDB
cumulative_character_trigram_lexicon_frequency_normalized min: 0.0, max: 899592.89035, mean: 223247.7744, std: 175811.3775 Float nan 0 nan dlexDB
initial_letter_frequency_normalized min: 0.0, max: 110461.430317, mean: 36031.6466, std: 33586.1123 Float nan 0 nan dlexDB
initial_bigram_frequency_normalized min: 0.0, max: 53801.2331077, mean: 11986.4422, std: 14536.7787 Float nan 0 nan dlexDB
initial_trigram_frequency_normalized min: -0.00817507899599, max: 29048.3692201, mean: 5528.0412, std: 8782.9659 Float nan 0 nan dlexDB
avg_cond_prob_in_bigrams min: 0.0, max: 0.5006180465, mean: 0.0423, std: 0.0447 Float The conditional probability of the bigram, given the occurrence of its first component. In other words, how likely it is for the second component to follow directly after the first. Here, this measure is computed on the basis of the annotated type information. 0 nan dlexDB
avg_cond_prob_in_trigrams min: 0.0, max: 25.0, mean: 0.2371, std: 0.5852 Float The conditional probability of the trigram, given the occurrence of its initial bigram. In other words, how likely it is for the third component to follow directly after the initial pair. Here, this measure is computed on the basis of the annotated type information. 0 nan dlexDB
neighbors_coltheart_higher_freq_cum_freq_normalized min: 0.0, max: 44055.247282, mean: 2111.0615, std: 7323.9586 Float nan 0 nan dlexDB
neighbors_coltheart_higher_freq_count_normalized min: 0.0, max: 8.13363128109, mean: 0.195, std: 0.4875 Float nan 0 nan dlexDB
neighbors_coltheart_all_cum_freq_normalized min: 0.0, max: 49782.1108458, mean: 4765.8454, std: 9884.7277 Float nan 0 nan dlexDB
neighbors_coltheart_all_count_normalized min: 0.0, max: 47.5175301158, mean: 14.8301, std: 14.4676 Float nan 0 nan dlexDB
neighbors_levenshtein_higher_freq_cum_freq_normalized min: 0.0, max: 44055.247282, mean: 2703.1737, std: 7703.635 Float nan 0 nan dlexDB
neighbors_levenshtein_higher_freq_count_normalized min: 0.0, max: 11.9864039932, mean: 0.3077, std: 0.6418 Float nan 0 nan dlexDB
neighbors_levenshtein_all_cum_freq_normalized min: 0.0, max: 54875.2749862, mean: 6310.865, std: 11349.5391 Float nan 0 nan dlexDB
neighbors_levenshtein_all_count_normalized min: 0.0, max: 75.7711966712, mean: 23.1334, std: 22.6083 Float nan 0 nan dlexDB
sent_surprisal_gpt2-base Surprisal value extracted from a language model (GerPT2-base) with the sentence as context. 0 nan nan
text_surprisal_gpt2-base Surprisal value extracted from a language model (GerPT2-base) with the text as context. 0 nan nan
sent_surprisal_gpt2-large Surprisal value extracted from a language model (GerPT2-large) with the sentence as context. 0 nan nan
text_surprisal_gpt2-large Surprisal value extracted from a language model (GerPT2-large) with the text as context. 0 nan nan
sent_surprisal_llama-7b Surprisal value extracted from a language model (LeoLM-7b) with the sentence as context. 0 nan nan
text_surprisal_llama-7b Surprisal value extracted from a language model (LeoLM-7b) with the text as context. 0 nan nan
sent_surprisal_llama-13b Surprisal value extracted from a language model (LeoLM-13b) with the sentence as context. 0 nan nan
text_surprisal_llama-13b Surprisal value extracted from a language model (LeoLM-13b) with the text as context. 0 nan nan
sent_surprisal_bert-base Surprisal value extracted from a language model (BERT-base) with the sentence as context. 0 nan nan
text_surprisal_bert-base Surprisal value extracted from a language model (BERT-base) with the text as context. 0 nan nan
FFD min: 0, max: 2144, mean: 166.4158, std: 132.8433 Float First-fixation duration: duration of the first fixation on a word if this word is fixated in first-pass reading, otherwise 0. 0 nan nan
SFD min: 0, max: 2144, mean: 118.8309, std: 135.573 Float Single-fixation duration: duration of the only first-pass fixation on a word, 0 if the word was skipped or more than one fixation occurred in the first-pass (equals FFD in case of a single first-pass fixation). 0 nan nan
FD min: 0, max: 2144, mean: 203.5219, std: 116.9324 Float First duration: duration of the first fixation on a word (identical to FFD if not skipped in the first-pass). 0 nan nan
FPRT min: 0, max: 9649, mean: 247.1511, std: 298.6889 Float First-pass reading time: sum of the durations of all first-pass fixations on a word (0 if the word was skipped in the first-pass). 0 nan nan
FRT min: 0, max: 9649, mean: 291.8272, std: 288.631 Float First-reading time: sum of the durations of all fixations from the first fixation on the word (regardless of whether that fixation occurs in first-pass reading) until leaving the word for the first time (equals FPRT if the word was fixated in the first-pass). 0 nan nan
TFT min: 0, max: 25314, mean: 632.8199, std: 720.3975 Float Total-fixation time: sum of all fixations on a word (FPRT+RRT). 0 nan nan
TFC The total fixation count on the word. 0 nan nan
RRT min: 0, max: 23902, mean: 385.6688, std: 597.5206 Float Re-reading time: sum of the durations of all fixations on a word that do not belong to the first-pass (TFT-FPRT). 0 nan nan
RPD_inc min: 0, max: 318898, mean: 632.8199, std: 3881.7376 Float Inclusive regression-path duration: Sum of all fixation durations starting from the first first-pass fixation on a word until fixation on a word to the right of this word (including all regressive fixations on previous words), 0 if the word was not fixated in the first-pass (RPD_exc+RBRT). 0 nan nan
RPD_exc min: 0, max: 315640, mean: 342.295, std: 3815.3786 Float Exclusive regression-path duration: Sum of all fixation durations after initiating a first-pass regression from a word until fixating a word to the right of this word, without counting fixations on the word itself (RPD_inc-RBRT). 0 nan nan
RBRT min: 0, max: 10675, mean: 290.5249, std: 358.8929 Float Right-bounded reading time: Sum of all fixation durations on a word until a word to the right of this word is fixated (RPD_inc-RPD_exc). 0 nan nan
Fix 0: 14182, 1: 127943 Categorical Fixation: 1 if the word was fixated, otherwise 0 (FPF or RR). 0 nan nan
FPF 0: 38408, 1: 103717 Categorical First-pass fixation: 1 if the word was fixated in the first-pass, otherwise 0. 0 nan nan
RR 0: 48283, 1: 93842 Categorical Re-reading: 1 if the word was fixated after the first-pass reading, otherwise 0 (sign(RRT)). 0 nan nan
FPReg 0: 119060, 1: 23065 Categorical First-pass regression: 1 if a regression was initiated in the first-pass reading of the word, otherwise 0 (sign(RPD_exc)). 0 nan nan
TRC_out 0-15 Integer Total count of outgoing regressions: total number of regressive saccades initiated from this word. 0 nan nan
TRC_in 0-12 Integer Total count of incoming regressions: total number of regressive saccades landing on this word. 0 nan nan
LP 0-28 Integer Landing position: position of the first saccade on the word expressed by ordinal position of the fixated character. 0 nan nan
SL_in -162 to 156 Integer Incoming saccade length: length of the saccade that leads to first fixation on a word in number of words; positive sign if the saccade is a progressive one, negative sign if it is a regression. 0 nan nan
SL_out -179 to 63 Integer Outgoing saccade length: length of the first saccade that leaves the word in number of words; positive sign if the saccade is a progressive one, negative sign if it is a regression; 0 if the word is never fixated. 0 nan nan
acc_bq_1 min: 0.0, max: 1.0, mean: 0.3922, std: 0.4883 Float The accuracy of background question 1. The answer can be either true or false, so the value is either 0 or 1. 1958 For participants 1 (p0), 31 (p0, b1, b5), 32 (p0), 61 (p1), 62 (b0, b1, b3, b5, 04) and 90 (b3), the accuracies for certain trials are missing due to hardware problems (missing measurements). nan
acc_bq_2 min: 0.0, max: 1.0, mean: 0.3619, std: 0.4805 Float The accuracy of background question 2. The answer can be either true or false, so the value is either 0 or 1. 1958 For participants 1 (p0), 31 (p0, b1, b5), 32 (p0), 61 (p1), 62 (b0, b1, b3, b5, 04) and 90 (b3), the accuracies for certain trials are missing due to hardware problems (missing measurements). nan
acc_bq_3 min: 0.0, max: 1.0, mean: 0.4277, std: 0.4947 Float The accuracy of background question 3. The answer can be either true or false, so the value is either 0 or 1. 1958 For participants 1 (p0), 31 (p0, b1, b5), 32 (p0), 61 (p1), 62 (b0, b1, b3, b5, 04) and 90 (b3), the accuracies for certain trials are missing due to hardware problems (missing measurements). nan
acc_tq_1 min: 0.0, max: 1.0, mean: 0.6469, std: 0.4779 Float The accuracy of text question 1. The answer can be either true or false, so the value is either 0 or 1. 1958 For participants 1 (p0), 31 (p0, b1, b5), 32 (p0), 61 (p1), 62 (b0, b1, b3, b5, 04) and 90 (b3), the accuracies for certain trials are missing due to hardware problems (missing measurements). nan
acc_tq_2 min: 0.0, max: 1.0, mean: 0.6428, std: 0.4792 Float The accuracy of text question 2. The answer can be either true or false, so the value is either 0 or 1. 1958 For participants 1 (p0), 31 (p0, b1, b5), 32 (p0), 61 (p1), 62 (b0, b1, b3, b5, 04) and 90 (b3), the accuracies for certain trials are missing due to hardware problems (missing measurements). nan
acc_tq_3 min: 0.0, max: 1.0, mean: 0.6563, std: 0.4749 Float The accuracy of text question 3. The answer can be either true or false, so the value is either 0 or 1. 1958 For participants 1 (p0), 31 (p0, b1, b5), 32 (p0), 61 (p1), 62 (b0, b1, b3, b5, 04) and 90 (b3), the accuracies for certain trials are missing due to hardware problems (missing measurements). nan
mean_acc_tq min: 0.0, max: 1.0, mean: 0.3939, std: 0.3158 Float The mean accuracy of all background questions for one text read by one reader. 1958 nan nan
mean_acc_bq min: 0.0, max: 1.0, mean: 0.6487, std: 0.3076 Float The mean accuracy of all text questions for one text read by one reader. 1958 nan nan
text_domain_numeric 0: 142125 Categorical Numerical value of text_domain; 0=biology, 1=physics. 0 nan nan
trial 1-12 Integer Each participant reads all 12 texts, the order of which follows their trial number. If text b0 has trial number 2 for participant 5, this participant read text b0 as the second text. 0 nan nan
text_id b0, b1, b2, b3, b4, b5, p0, p1, p2, p3, p4, p5 Unique identifier given to each stimulus text. 0 nan nan
reader_id 0-105 Integer The unique identifier given to each reader. Reader IDs start at 0. 0 nan nan
gender_numeric 0.0: 66325, 1.0: 73905, nan: 1895 Categorical Numerical value of gender; 0=male, 1=female. 1895 nan nan
reader_domain_numeric 0: 81485, 1: 60640 Categorical Numerical encoding of the reader domain; 0=biology, 1=physics. 0 nan nan
age min: 18.0, max: 41.0, mean: 24.1644, std: 4.1809 Float Reader's age. 3790 nan nan
expert_status_numeric 0: 53060, 1: 89065 Categorical Numerical value of expert_status; 0=beginner, 1=expert. 0 nan nan
domain_expert_status_numeric 0: 30320, 1: 51165, 2: 22740, 3: 37900 Categorical Numerical value of domain_expert_status; 0=biology-beginner, 1=biology-expert, 2=physics-beginner, 3=physics-expert. 0 nan nan
expert_reading_label_numeric 0: 90960, 1: 51165 Categorical Numeric encoding of whether the reader is an expert in the text domain (i.e. text_domain == reader_domain and reader is expert). 1=expert_reading, 0=non-expert_reading 0 nan nan
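
The participant columns above define `expert_reading_label_numeric` as "text_domain == reader_domain and reader is expert", and `domain_expert_status_numeric` as a four-way code over reader domain and expert status. The sketch below shows one way to derive both from the numeric columns. The function names are hypothetical, and the formula `2 * reader_domain_numeric + expert_status_numeric` is an assumption inferred from the listed mapping (0=biology-beginner, 1=biology-expert, 2=physics-beginner, 3=physics-expert), not a documented procedure.

```python
def domain_expert_status(reader_domain_numeric, expert_status_numeric):
    """Combine reader domain (0=biology, 1=physics) and expert status
    (0=beginner, 1=expert) into the four-way code listed in the codebook."""
    return 2 * reader_domain_numeric + expert_status_numeric


def expert_reading_label(text_domain_numeric, reader_domain_numeric,
                         expert_status_numeric):
    """Return 1 (expert_reading) if an expert reads a text from their own
    domain, otherwise 0 (non-expert_reading)."""
    is_expert_reading = (
        text_domain_numeric == reader_domain_numeric
        and expert_status_numeric == 1
    )
    return 1 if is_expert_reading else 0


# A physics expert (reader domain 1, expert status 1) reading a physics text (1).
print(domain_expert_status(1, 1))     # prints 3 (physics-expert)
print(expert_reading_label(1, 1, 1))  # prints 1 (expert_reading)

# The same reader on a biology text (0) is not an expert reading.
print(expert_reading_label(0, 1, 1))  # prints 0 (non-expert_reading)
```

Recomputing the labels this way and comparing them with the stored columns is a quick sanity check after merging files.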

Merged: scanpaths, participant info, reading measures and word features

TODO: insert short text about this section in this file

Please find the files under this link: Scanpaths merged

Column name Possible values Value type Description Num missing values Missing value description Source
fixation_index 1-1469 Integer The index of the fixation in temporal order. 0 nan nan
text_domain bio: 4682, biology: 200017, physics: 199721 Categorical The domain of the stimulus text. 0 nan nan
trial 1-12 Integer Each participant reads all 12 texts, the order of which follows their trial number. If text b0 has trial number 2 for participant 5, this participant read text b0 as the second text. 0 nan nan
acc_bq_1 min: 0.0, max: 1.0, mean: 0.3869, std: 0.487 Float The accuracy of background question 1. The answer can be either true or false, so the value is either 0 or 1. 5785 For participants 1 (p0), 31 (p0, b1, b5), 32 (p0), 61 (p1), 62 (b0, b1, b3, b5, 04) and 90 (b3), the accuracies for certain trials are missing due to hardware problems (missing measurements). nan
acc_bq_2 min: 0.0, max: 1.0, mean: 0.3564, std: 0.4789 Float The accuracy of background question 2. The answer can be either true or false, so the value is either 0 or 1. 5785 For participants 1 (p0), 31 (p0, b1, b5), 32 (p0), 61 (p1), 62 (b0, b1, b3, b5, 04) and 90 (b3), the accuracies for certain trials are missing due to hardware problems (missing measurements). nan
acc_bq_3 min: 0.0, max: 1.0, mean: 0.4217, std: 0.4938 Float The accuracy of background question 3. The answer can be either true or false, so the value is either 0 or 1. 5785 For participants 1 (p0), 31 (p0, b1, b5), 32 (p0), 61 (p1), 62 (b0, b1, b3, b5, 04) and 90 (b3), the accuracies for certain trials are missing due to hardware problems (missing measurements). nan
acc_tq_1 min: 0.0, max: 1.0, mean: 0.6625, std: 0.4729 Float The accuracy of text question 1. The answer can be either true or false, so the value is either 0 or 1. 5785 For participants 1 (p0), 31 (p0, b1, b5), 32 (p0), 61 (p1), 62 (b0, b1, b3, b5, 04) and 90 (b3), the accuracies for certain trials are missing due to hardware problems (missing measurements). nan
acc_tq_2 min: 0.0, max: 1.0, mean: 0.6326, std: 0.4821 Float The accuracy of text question 2. The answer can be either true or false, so the value is either 0 or 1. 5785 For participants 1 (p0), 31 (p0, b1, b5), 32 (p0), 61 (p1), 62 (b0, b1, b3, b5, 04) and 90 (b3), the accuracies for certain trials are missing due to hardware problems (missing measurements). nan
acc_tq_3 min: 0.0, max: 1.0, mean: 0.6564, std: 0.4749 Float The accuracy of text question 3. The answer can be either true or false, so the value is either 0 or 1. 5785 For participants 1 (p0), 31 (p0, b1, b5), 32 (p0), 61 (p1), 62 (b0, b1, b3, b5, 04) and 90 (b3), the accuracies for certain trials are missing due to hardware problems (missing measurements). nan
fixation_duration 2-4474 Integer The duration of the fixation in milliseconds. 0 nan nan
next_saccade_duration 1.0-9491.0 Integer The duration of the saccade that follows a fixation in milliseconds. 46 nan nan
previous_saccade_duration 1.0-9491.0 Integer The duration of the saccade that precedes a fixation in milliseconds. 515 nan nan
version 0-105 Integer Specifies the version of the items. In each version, the order of the stimuli and the order of the answer options for each question differ. The specifics of each version can be found in the items.tsv. 0 nan nan
line 1-12 Integer The texts were presented on the screen in multiple lines. Specifies the line of the respective row; indexing starts at 1. 0 nan nan
aoi 1-1121 Integer The region of interest specified as character index in the text (see char_index_in_text). Defines which character has been fixated. 0 nan nan
char_index_in_line 1-100 Integer Index of a character in the line. Indexing starts at 1. 0 nan nan
original_fixation_index 1-1478 Integer The index of the uncorrected fixation. 0 nan nan
is_fixation_adjusted False: 382202, True: 22218 Categorical Whether or not the fixation has been adjusted manually. 0 nan Manually tagged.
reader_id 0-105 Integer The unique identifier given to each reader. Reader IDs start at 0. 0 nan nan
text_id b0, b1, b2, b3, b4, b5, p0, p1, p2, p3, p4, p5 Unique identifier given to each stimulus text. 0 nan nan
word_index_in_text 1-180 Integer The index of the word in the text. Indexing starts at 1. 0 nan nan
sent_index_in_text 1-12 Integer The index of a sentence in the respective text. Indexing starts at 1. 0 nan nan
char_index_in_text 1-1121 Integer Index of a character in the text. Indexing starts at 1. 0 nan nan
word string Words as they appear in the stimuli texts. Words are split at white-space. 0 nan nan
character string Character as text. 0 nan nan
text_id_numeric 0-11 Integer Numerical value of text_id; 0=b0, 1=b1, 2=b2, 3=b3, 4=b4, 5=b5, 6=p0, 7=p1, 8=p2, 9=p3, 10=p4, 11=p5 0 nan nan
text_domain_numeric 0: 204699, 1: 199721 Categorical Numerical value of text_domain; 0=biology, 1=physics. 0 nan nan
reader_domain_numeric 0: 223158, 1: 181262 Categorical Numerical encoding of the reader domain; 0=biology, 1=physics. 0 nan nan
expert_status_numeric 0: 154333, 1: 250087 Categorical Numerical value of expert_status; 0=beginner, 1=expert. 0 nan nan
expert_reading_label_numeric 0: 290883, 1: 113537 Categorical Numeric encoding of whether the reader is an expert in the text domain (i.e. text_domain == reader_domain and reader is expert). 1=expert_reading, 0=non-expert_reading 0 nan nan
expert_reading_label expert_reading: 113537, non-expert_reading: 290883 Categorical Whether the reader is an expert in the text domain (i.e. text_domain == reader_domain and reader is expert) 0 nan nan
word_with_punct The word as it appears in the text, including punctuation. 96 nan nan
word_index_in_sent 1-51 Integer The index of the word in the sentence. Indexing starts at 1. 0 nan nan
word_length 2-33 Integer Word length is defined in number of characters including symbols like hyphens but without sentence punctuation at the end (e.g., z.B. = 4 characters; DNA-Kette = 9 characters; eats. = 4 characters). 0 nan nan
STTS_punctuation_before 0.0: 211108, 0: 189407, $(: 3905 Categorical If a word is preceded by a punctuation mark, the STTS-PoS-tag of the punctuation mark is added here. 0 nan nan
STTS_punctuation_after $(: 3260, $($,: 573, $,: 22559, $.: 25794, 0: 352234 Categorical If a word is followed by a punctuation mark, the STTS-PoS-tag of the punctuation mark is added here. 0 nan nan
is_in_quote 0: 399715, 1: 4705 Categorical Whether or not the word is part of an expression in quotes. 0 nan Manually tagged
is_in_parentheses 0: 403155, 1: 1265 Categorical Whether or not the word is part of a phrase in parentheses. 0 nan Manually tagged
is_clause_beginning 0: 388232, 1: 16188 Categorical Whether or not the word is the beginning of a clause. 0 nan Manually tagged
is_sent_beginning 0: 386681, 1: 17739 Categorical Whether or not the word is the beginning of a new sentence. 0 nan Manually tagged
is_clause_end Whether or not the word is the end of a clause. 0 nan nan
is_sent_end Whether or not the word is the end of a sentence. 0 nan nan
is_abbreviation 0: 403478, 1: 942 Categorical Whether or not the entire word is an abbreviation. 0 nan Manually tagged
is_expert_technical_term 0: 332354, 1: 72066 Categorical 1 if the word is a technical term that is not generally understandable. E.g.: "Agarose". 0 nan Manually tagged
is_general_technical_term 0: 325333, 1: 79087 Categorical 1 if the word is a technical term that is generally understandable. E.g.: "elektrisch" 0 nan nan
contains_symbol 0: 400458, 1: 3962 Categorical Whether or not the word contains a symbol. E.g.: β-D-Glucose 0 nan nan
contains_hyphen 0: 388149, 1: 16271 Categorical Whether or not the word contains a hyphen. E.g.: 1 for DNA-Fragment. Words tagged TRUNC (compositional first elements) do not count as containing a hyphen, e.g. "Sekundär-" in "Sekundär- und Tertiärstrukturen". 0 nan nan
contains_abbreviation 0: 399423, 1: 4997 Categorical Whether or not the word contains an abbreviation. 0 for words that are only an abbreviation. See is_abbreviation. E.g. 1 for DNA-Fragment, 0 for DNA. 0 nan nan
STTS_PoS_tag ADJA: 51041, ADJD: 12714, ADV: 12236, APPR: 22470, APPRART: 5566, APZR: 91, ART: 37340, CARD: 1594, KOKOM: 2428, KON: 5798, KOUI: 654, KOUS: 2521, NE: 955, NN: 162980, PAV: 3444, PDAT: 3292, PDS: 1374, PIAT: 791, PIDAT: 1653, PIS: 1322, PPER: 2511, PPOSAT: 1360, PRELAT: 1302, PRELS: 4193, PRF: 3606, PTKA: 97, PTKNEG: 687, PTKVZ: 1490, PTKZU: 583, PWAV: 76, TRUNC: 1137, VAFIN: 10340, VAINF: 1206, VMFIN: 3953, VMINF: 153, VVFIN: 23854, VVINF: 7713, VVIZU: 578, VVPP: 9317 Categorical Part-of-speech tags according to the STTS-tagset. See stimuli/ANNOTATION.MD for more information. 0 nan Manually tagged
type string The orthographical representation of a word as found in the corpus; this data is case sensitive, i.e. there is a distinction between name and Name. 0 nan dlexDB
type_length_chars 0.0-33.0 Integer The length of the type of a word in characters. See the description of word_length for a definition of how characters are counted. 0 nan nan
PoS_tag adja: 53330, adjd: 12226, adv: 15728, appr: 22193, apprart: 5566, art: 37918, card: 1594, kokom: 2428, kon: 5405, koui: 559, kous: 2521, ne: 1386, nn: 160585, pdat: 3292, pds: 1374, piat: 791, pidat: 352, pis: 2063, pper: 2434, pposat: 1360, prelat: 1302, prels: 4076, prf: 3606, ptka: 97, ptkneg: 687, ptkvz: 1891, ptkzu: 583, pwav: 76, trunc: 1137, vafin: 10340, vainf: 1206, vmfin: 3829, vminf: 153, vvfin: 23978, vvinf: 7713, vvizu: 578, vvpp: 9317, xy: 746 Categorical Part-of-speech tag as defined by the dlexDB query. 0 nan dlexDB
lemma string nan 0 nan dlexDB
lemma_length_chars 0.0-32.0 Integer nan 0 nan dlexDB
syllables string nan 0 nan dlexDB
type_length_syllables 0.0-14.0 Integer nan 0 nan dlexDB
annotated_type_frequency_normalized min: 0.0, max: 24738.5901996, mean: 1950.9055, std: 5185.3006 Float The number of occurrences of an annotated type in the corpus. An annotated type is a unique combination of a type, its part-of-speech tag, and its lemma. 0 nan dlexDB
type_frequency_normalized min: 0.0, max: 26530.3631386, mean: 2247.4523, std: 5847.2187 Float nan 0 nan dlexDB
lemma_frequency_normalized min: 0.0, max: 80100.3069113, mean: 7203.2409, std: 19769.4428 Float nan 0 nan dlexDB
familiarity_normalized min: 0.0, max: 26530.3631386, mean: 2191.7786, std: 5759.2592 Float nan 0 nan dlexDB
regularity_normalized min: 0.0, max: 2123.30585022, mean: 46.8657, std: 137.5046 Float nan 0 nan dlexDB
document_frequency_normalized min: 0.0, max: 9372.80956103, mean: 1684.1043, std: 2829.0626 Float nan 0 nan dlexDB
sentence_frequency_normalized min: 0.0, max: 30912.3596552, mean: 3137.4539, std: 7374.8037 Float nan 0 nan dlexDB
cumulative_syllable_corpus_frequency_normalized min: 0.0, max: 125126.524676, mean: 15768.7784, std: 17529.5528 Float nan 0 nan dlexDB
cumulative_syllable_lexicon_frequency_normalized min: 0.0, max: 218985.607753, mean: 27232.3183, std: 36883.9628 Float nan 0 nan dlexDB
cumulative_character_corpus_frequency_normalized min: 0.0, max: 7810554.20193, mean: 2053804.334, std: 1596380.3916 Float nan 0 nan dlexDB
cumulative_character_lexicon_frequency_normalized min: 0.0, max: 18380479.713, mean: 4612580.9638, std: 3597155.0404 Float nan 0 nan dlexDB
cumulative_character_bigram_corpus_frequency_normalized min: 0.0, max: 1322150.62097, mean: 356831.454, std: 269772.388 Float nan 0 nan dlexDB
cumulative_character_bigram_lexicon_frequency_normalized min: 0.0, max: 2788357.77704, mean: 629626.1651, std: 539088.9742 Float nan 0 nan dlexDB
cumulative_character_trigram_corpus_frequency_normalized min: 0.0, max: 603427.130456, mean: 200341.8076, std: 144122.7012 Float nan 0 nan dlexDB
cumulative_character_trigram_lexicon_frequency_normalized min: 0.0, max: 899592.89035, mean: 236423.2776, std: 199573.1416 Float nan 0 nan dlexDB
initial_letter_frequency_normalized min: 0.0, max: 110461.430317, mean: 28045.0077, std: 30618.9167 Float nan 0 nan dlexDB
initial_bigram_frequency_normalized min: 0.0, max: 53801.2331077, mean: 8706.0335, std: 12743.2638 Float nan 0 nan dlexDB
initial_trigram_frequency_normalized min: -0.00817507899599, max: 29048.3692201, mean: 3754.6304, std: 7393.1224 Float nan 0 nan dlexDB
avg_cond_prob_in_bigrams min: 0.0, max: 0.5006180465, mean: 0.0313, std: 0.0466 Float The conditional probability of the bigram, given the occurrence of its first component. In other words, how likely it is for the second component to follow directly after the first. Here, this measure is computed on the basis of the annotated type information. 0 nan dlexDB
avg_cond_prob_in_trigrams min: 0.0, max: 25.0, mean: 0.2251, std: 0.8814 Float The conditional probability of the trigram, given the occurrence of its initial bigram. In other words, how likely it is for the third component to follow directly after the initial pair. Here, this measure is computed on the basis of the annotated type information. 0 nan dlexDB
neighbors_coltheart_higher_freq_cum_freq_normalized min: 0.0, max: 44055.247282, mean: 1276.643, std: 5775.4034 Float nan 0 nan dlexDB
neighbors_coltheart_higher_freq_count_normalized min: 0.0, max: 8.13363128109, mean: 0.1556, std: 0.4321 Float nan 0 nan dlexDB
neighbors_coltheart_all_cum_freq_normalized min: 0.0, max: 49782.1108458, mean: 2794.1781, std: 7982.6321 Float nan 0 nan dlexDB
neighbors_coltheart_all_count_normalized min: 0.0, max: 47.5175301158, mean: 9.0448, std: 12.679 Float nan 0 nan dlexDB
neighbors_levenshtein_higher_freq_cum_freq_normalized min: 0.0, max: 44055.247282, mean: 1683.6273, std: 6153.8504 Float nan 0 nan dlexDB
neighbors_levenshtein_higher_freq_count_normalized min: 0.0, max: 11.9864039932, mean: 0.2681, std: 0.5814 Float nan 0 nan dlexDB
neighbors_levenshtein_all_cum_freq_normalized min: 0.0, max: 54875.2749862, mean: 3761.4734, std: 9299.5647 Float nan 0 nan dlexDB
neighbors_levenshtein_all_count_normalized min: 0.0, max: 75.7711966712, mean: 14.1417, std: 19.6383 Float nan 0 nan dlexDB
sent_surprisal_gpt2-base Surprisal value extracted from a language model (GerPT2-base) with the sentence as context. 0 nan nan
text_surprisal_gpt2-base Surprisal value extracted from a language model (GerPT2-base) with the text as context. 0 nan nan
sent_surprisal_gpt2-large Surprisal value extracted from a language model (GerPT2-large) with the sentence as context. 0 nan nan
text_surprisal_gpt2-large Surprisal value extracted from a language model (GerPT2-large) with the text as context. 0 nan nan
sent_surprisal_llama-7b Surprisal value extracted from a language model (LeoLM-7b) with the sentence as context. 0 nan nan
text_surprisal_llama-7b Surprisal value extracted from a language model (LeoLM-7b) with the text as context. 0 nan nan
sent_surprisal_llama-13b Surprisal value extracted from a language model (LeoLM-13b) with the sentence as context. 0 nan nan
text_surprisal_llama-13b Surprisal value extracted from a language model (LeoLM-13b) with the text as context. 0 nan nan
sent_surprisal_bert-base Surprisal value extracted from a language model (BERT-base) with the sentence as context. 0 nan nan
text_surprisal_bert-base Surprisal value extracted from a language model (BERT-base) with the text as context. 0 nan nan
FFD min: 0, max: 2144, mean: 195.9741, std: 124.5597 Float First-fixation duration: duration of the first fixation on a word if this word is fixated in first-pass reading, otherwise 0. 0 nan nan
SFD min: 0, max: 2144, mean: 107.9483, std: 134.474 Float Single-fixation duration: duration of the only first-pass fixation on a word, 0 if the word was skipped or more than one fixation occurred in the first-pass (equals FFD in case of a single first-pass fixation). 0 nan nan
FD min: 0, max: 2144, mean: 226.9857, std: 103.7904 Float First duration: duration of the first fixation on a word (identical to FFD if not skipped in the first-pass). 0 nan nan
FPRT min: 0, max: 9649, mean: 408.9247, std: 526.0428 Float First-pass reading time: sum of the durations of all first-pass fixations on a word (0 if the word was skipped in the first-pass). 0 nan nan
FRT min: 0, max: 9649, mean: 456.8788, std: 518.1388 Float First-reading time: sum of the durations of all fixations from first fixating the word (regardless of whether the first fixation occurs in first-pass reading) until leaving the word for the first time (equals FPRT if the word was fixated in the first-pass). 0 nan nan
TFT min: 0, max: 25314, mean: 1333.0163, std: 1428.494 Float Total-fixation time: sum of all fixations on a word (FPRT+RRT). 0 nan nan
TFC The total fixation count on the word. 0 nan nan
RRT min: 0, max: 23902, mean: 924.0916, std: 1240.0587 Float Re-reading time: sum of the durations of all fixations on a word that do not belong to the first-pass (TFT-FPRT). 0 nan nan
RPD_inc min: 0, max: 318898, mean: 1076.7946, std: 5339.73 Float Inclusive regression-path duration: Sum of all fixation durations starting from the first first-pass fixation on a word until fixation on a word to the right of this word (including all regressive fixations on previous words), 0 if the word was not fixated in the first-pass (RPD_exc+RBRT). 0 nan nan
RPD_exc min: 0, max: 315640, mean: 557.5849, std: 5209.143 Float Exclusive regression-path duration: Sum of all fixation durations after initiating a first-pass regression from a word until fixating a word to the right of this word, without counting fixations on the word itself (RPD_inc-RBRT). 0 nan nan
RBRT min: 0, max: 10675, mean: 519.2098, std: 638.9024 Float Right-bounded reading time: Sum of all fixation durations on a word until a word to the right of this word is fixated (RPD_inc-RPD_exc). 0 nan nan
Fix 0: 110, 1: 404310 Categorical Fixation: 1 if the word was fixated, otherwise 0 (FPF or RR). 0 nan nan
FPF 0: 56838, 1: 347582 Categorical First-pass fixation: 1 if the word was fixated in the first-pass, otherwise 0. 0 nan nan
RR 0: 48241, 1: 356179 Categorical Re-reading: 1 if the word was fixated after the first-pass reading, otherwise 0 (sign(RRT)). 0 nan nan
FPReg 0: 308156, 1: 96264 Categorical First-pass regression: 1 if a regression was initiated in the first-pass reading of the word, otherwise 0 (sign(RPD_exc)). 0 nan nan
TRC_out 0-15 Integer Total count of outgoing regressions: total number of regressive saccades initiated from this word. 0 nan nan
TRC_in 0-12 Integer Total count of incoming regressions: total number of regressive saccades landing on this word. 0 nan nan
LP 1-28 Integer Landing position: position on the word where the first saccade lands, expressed as the ordinal position of the fixated character. 0 nan nan
SL_in -162-156 Integer Incoming saccade length: length of the saccade that leads to first fixation on a word in number of words; positive sign if the saccade is a progressive one, negative sign if it is a regression. 0 nan nan
SL_out -179-63 Integer Outgoing saccade length: length of the first saccade that leaves the word in number of words; positive sign if the saccade is a progressive one, negative sign if it is a regression; 0 if the word is never fixated. 0 nan nan
mean_acc_tq min: 0.0, max: 1.0, mean: 0.3883, std: 0.3144 Float The mean accuracy of all text questions for one text read by one reader. 5785 nan nan
mean_acc_bq min: 0.0, max: 1.0, mean: 0.6505, std: 0.3052 Float The mean accuracy of all background questions for one text read by one reader. 5785 nan nan
gender_numeric 0.0: 187536, 1.0: 212874, nan: 4010 Categorical Numerical value of gender; 0=male, 1=female. 4010 nan nan
age min: 18.0, max: 41.0, mean: 24.0283, std: 4.1436 Float Reader's age. 8459 nan nan
domain_expert_status_numeric 0: 89325, 1: 133833, 2: 65008, 3: 116254 Categorical Numerical value of domain_expert_status; 0=biology-beginner, 1=biology-expert, 2=physics-beginner, 3=physics-expert. 0 nan nan
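
Several of the reading measures above are defined in terms of one another: TFT = FPRT + RRT, RBRT = RPD_inc - RPD_exc, Fix = FPF or RR, RR = sign(RRT), and FPReg = sign(RPD_exc). The following is a minimal sketch of how these identities could be checked for a single word-level row. The function name `violated_identities`, the tolerance `tol`, and the example row are hypothetical; the values are made up for illustration and are not taken from the data.

```python
import math


def violated_identities(row, tol=1e-6):
    """Return the documented identities that do not hold for one row.

    `row` maps column names to numeric values. `tol` is an assumed
    absolute tolerance for comparing the float-typed durations.
    """
    def same(a, b):
        return math.isclose(a, b, abs_tol=tol)

    checks = {
        "TFT = FPRT + RRT": same(row["TFT"], row["FPRT"] + row["RRT"]),
        "RBRT = RPD_inc - RPD_exc": same(row["RBRT"], row["RPD_inc"] - row["RPD_exc"]),
        "Fix = FPF or RR": row["Fix"] == int(row["FPF"] == 1 or row["RR"] == 1),
        "RR = sign(RRT)": row["RR"] == int(row["RRT"] > 0),
        "FPReg = sign(RPD_exc)": row["FPReg"] == int(row["RPD_exc"] > 0),
    }
    # Keep only the names of the identities that failed.
    return [name for name, holds in checks.items() if not holds]


# Made-up example row (not real data).
example_row = {
    "FPRT": 200.0, "RRT": 300.0, "TFT": 500.0,
    "RPD_inc": 450.0, "RPD_exc": 250.0, "RBRT": 200.0,
    "Fix": 1, "FPF": 1, "RR": 1, "FPReg": 1,
}
print(violated_identities(example_row))  # → []
```

An empty list means the row is consistent with all five identities. A non-empty list names the identities worth inspecting for that row.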

AOI to word mapping

TODO: insert short text about this section in this file

Please find the file under this link: aoi to word mapping

| Column name | Possible values | Value type | Description | Num missing values | Missing value description | Source |
|---|---|---|---|---|---|---|
| text_id | b0, b1, b2, b3, b4, b5, p0, p1, p2, p3, p4, p5 | | Unique identifier given to each stimulus text. | 0 | nan | nan |
| word_index_in_text | 1-180 | Integer | The index of the word in the text. Indexing starts at 1. | 0 | nan | nan |
| char_index_in_text | 1-1121 | Integer | Index of a character in the text. Indexing starts at 1. | 0 | nan | nan |
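
The mapping can be used to look up which word a given character belongs to. The following is a minimal sketch, assuming the file contains one row per character. The helper name `build_char_to_word` and the sample rows are hypothetical, and how whitespace characters are mapped is not specified here.

```python
def build_char_to_word(rows):
    """Map (text_id, char_index_in_text) to word_index_in_text.

    `rows` is an iterable of dicts keyed by the column names above,
    for example as produced by csv.DictReader (values may be strings).
    """
    lookup = {}
    for row in rows:
        key = (row["text_id"], int(row["char_index_in_text"]))
        lookup[key] = int(row["word_index_in_text"])
    return lookup


# Made-up rows for illustration: characters 1-3 belong to word 1,
# characters 4-5 belong to word 2.
sample_rows = [
    {"text_id": "b0", "word_index_in_text": "1", "char_index_in_text": "1"},
    {"text_id": "b0", "word_index_in_text": "1", "char_index_in_text": "2"},
    {"text_id": "b0", "word_index_in_text": "1", "char_index_in_text": "3"},
    {"text_id": "b0", "word_index_in_text": "2", "char_index_in_text": "4"},
    {"text_id": "b0", "word_index_in_text": "2", "char_index_in_text": "5"},
]
char_to_word = build_char_to_word(sample_rows)
print(char_to_word[("b0", 4)])  # → 2
```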

Participants

TODO: insert short text about this section in this file

Please find the file under this link: Participant information

| Column name | Possible values | Value type | Description | Num missing values | Missing value description | Source |
|---|---|---|---|---|---|---|
| reader_id | 0-105 | Integer | The unique identifier given to each reader. Reader IDs start at 0. | 0 | nan | nan |
| reader_domain | biology: 43, physics: 32 | Categorical | The area of expertise of the reader. All readers are students whose major is either physics or biology. | 0 | nan | nan |
| reader_domain_numeric | 0: 43, 1: 32 | Categorical | Numerical encoding of the reader domain; 0=biology, 1=physics. | 0 | nan | nan |
| expert_status | beginner: 28, expert: 47 | Categorical | Reader's expert status. Readers are considered experts if they are either MSc or PhD students. 1st semester BSc students are considered beginners. | 0 | nan | nan |
| expert_status_numeric | 0: 28, 1: 47 | Categorical | Numerical value of expert_status; 0=beginner, 1=expert. | 0 | nan | nan |
| domain_expert_status | biology-beginner: 16, biology-expert: 27, physics-beginner: 12, physics-expert: 20 | Categorical | The combination of the readers' major (reader_domain) and their expertise (expert_status). | 0 | nan | nan |
| domain_expert_status_numeric | 0: 16, 1: 27, 2: 12, 3: 20 | Categorical | Numerical value of domain_expert_status; 0=biology-beginner, 1=biology-expert, 2=physics-beginner, 3=physics-expert. | 0 | nan | nan |
| glasses | no: 54, yes: 20, nan: 1 | Categorical | Whether or not reader is wearing glasses. | 1 | nan | nan |
| age | min: 18.0, max: 41.0, mean: 24.1644, std: 4.2098 | Float | Reader's age. | 2 | nan | nan |
| handedness | right: 68, left: 6, nan: 1 | Categorical | Reader's handedness. | 1 | nan | nan |
| hours_sleep | min: 0.0, max: 11.0, mean: 7.2095, std: 1.3138 | Float | The hours of sleep of the participant before the experiment. | 1 | nan | nan |
| alcohol | no: 71, yes: 3, nan: 1 | Categorical | Whether or not a participant consumed alcohol within 24 hours before the experiment start. | 1 | nan | nan |
| gender | female: 39, male: 35, nan: 1 | Categorical | Reader's gender. | 1 | nan | nan |
| gender_numeric | 0.0: 35, 1.0: 39, nan: 1 | Categorical | Numerical value of gender; 0=male, 1=female. | 1 | nan | nan |
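
The documented codes for domain_expert_status_numeric are consistent with combining the two binary encodings as 2 * reader_domain_numeric + expert_status_numeric. The sketch below illustrates this relationship. It is an observation about the codes listed above, not necessarily how the column was produced, and the function name `combine_domain_and_expertise` is hypothetical.

```python
# Labels as documented for domain_expert_status_numeric.
DOMAIN_EXPERT_STATUS = {
    0: "biology-beginner",
    1: "biology-expert",
    2: "physics-beginner",
    3: "physics-expert",
}


def combine_domain_and_expertise(reader_domain_numeric, expert_status_numeric):
    """Combine domain (0=biology, 1=physics) and status (0=beginner, 1=expert)."""
    return 2 * reader_domain_numeric + expert_status_numeric


code = combine_domain_and_expertise(reader_domain_numeric=1, expert_status_numeric=0)
print(code, DOMAIN_EXPERT_STATUS[code])  # → 2 physics-beginner
```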