Skip to content

Data Export Columns

Geoff Goehle edited this page Aug 18, 2014 · 8 revisions

These are potential columns in the big data table we are hoping to build. I only included things which we currently have access to on the theory that we want to use existing data, not make upgrades to our current tables and then allow those upgrades time to propagate.

There are some questions which need to be addressed or at least dismissed.

  • Table design: Do we have one giant table for portability and ease of use, or do we have smaller tables for performance and assume people know how to do joins?
  • Data purity: A lot of data which people care about can be included but will not be very "pure". For example fields like seed, due date, answer date can be included from the appropriate tables, but they may not actually be the seed, due date, answer date that the student had when the answer was recorded. Most of the time they will be the same but there will certainly be times when they are not.
  • Computed columns: Do we want to have computed columns that do not reflect any data we actually store and may involve some educated guessing. For example, we have the final number of incorrect attempts and correct attempts and the final score, but we dont have these values at the time of each attempt. Do we want to try and compute these values for each answer row? What happens when we fail (e.g. the question has auxiliary fields which were not recorded) or are just wrong (e.g. the seed changed)?

Table Columns

  • Answer ID: Salted hash or Unique int
  • Course ID: Salted hash
  • Student ID: Salted hash
  • Set ID: Salted hash
  • Problem ID: Salted hash
  • Answer Timestamp: Unix time
  • Answer String
  • Answer Correct String: String of 1's, 0's corresponding to correctness of answers
  • Problem Path: Also serves as unique identifier of problem
  • Final Problem Status
  • Total Incorrect Attempts
  • Total Correct Attempts
  • Problem Value: Possibly impure
  • Problem Max Attempts: Possibly impure
  • Seed: Possibly impure
  • Open Date: Unix time, Possibly impure
  • Due Date: Unix time, Possibly impure
  • Answer Date: Unix time, Possibly impure
  • Set Type
  • Library Subject: Possibly Missing
  • Library Chapter: Possibly Missing
  • Library Section: Possibly Missing
  • Library Keywords: Possibly Missing
  • Raw Status of Attempt: Post-Computed (ratio of 1's to number of answers).
  • Status of Attempt: Post-Computed, Possibly Missing, Possibly Impure
  • Final Set Grade: Post-Computed, Possibly Missing, Possibly Impure
  • Number Incorrect Previous Attempts: Post-Computed, Possibly Missing, Possibly Impure
  • Number Correct Previous Attempts: Post-Computed, Possibly Missing, Possibly Impure