Copying field data

cjshawMIT edited this page Feb 14, 2018 · 8 revisions

When collecting data from the field, you'll want to grab the qbank assessment data as well as any student-uploaded files.

On Windows installations of unplatform, there is a provided script called DataExtractionScript.bat. Just run that script and copy the resulting zip file. The script takes care of the assessment data export as well as the student-uploaded files, and bundles both sets of data into a single zip file.

MongoDB

For a gstudio / MongoDB installation, just run mongodump and you'll have all the qbank data! Depending on your configuration, it will most likely be stored in the following databases:

  • assessment
  • assessment_authoring
  • hierarchy
  • id
  • logging
  • relationship
  • repository
  • resource

Specific collections

In the field, student responses are only saved in certain database collections -- the rest is data used to serve the assessments, and does not change in the field. Student-related data lives in:

  • assessment/AssessmentSection
  • assessment/AssessmentTaken
  • logging/Log
  • logging/LogEntry
  • repository/Asset
  • repository/Repository

Also, any files uploaded as part of an assessment are put into the webapps/CLIx/studentResponseFiles directory (see below for more details).

Make sure that all of the above data is saved by the field service providers and transported back to the central data store for analytics and research purposes.

NOTE If you expect that teachers in the field will be authoring new content, then you MUST save / export the entire MongoDB datastore (not just the student-specific collections listed above), so that you capture the new questions and assessments.
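When no new content was authored in the field, the student-specific collections listed above can be dumped individually rather than dumping whole databases. A sketch, again as a dry run that only prints the commands; the school1-dump output directory is a made-up example:

```shell
# Dry run: print a mongodump command for each student-related
# collection. Remove the `echo` to execute for real.
for spec in assessment/AssessmentSection assessment/AssessmentTaken \
            logging/Log logging/LogEntry \
            repository/Asset repository/Repository; do
    db=${spec%/*}      # text before the slash: the database
    coll=${spec#*/}    # text after the slash: the collection
    echo mongodump --db "$db" --collection "$coll" --out school1-dump
done
```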

Merging multiple MongoDBs

Once you have collected the data from multiple schools, you can merge all the data into a single MongoDB instance with mongorestore. If you collected the raw MongoDB data (i.e. the *.wt files), then you'll first need to run mongodump against each data set to produce the *.json and *.bson files.

Converting raw MongoDB files into a database dump

To convert from the raw WiredTiger *.wt files to a proper MongoDB database dump, you have to run mongodump against a running MongoDB instance. For example, if your *.wt files are located in a directory called rj1:

$ ls rj1
-rw-r--r--  1 user  staff         49 Mar 21  2017 WiredTiger
-rw-r--r--  1 user  staff         21 Mar 21  2017 WiredTiger.lock
-rw-r--r--  1 user  staff        933 Feb 13 13:11 WiredTiger.turtle
-rw-r--r--  1 user  staff     188416 Feb 13 13:11 WiredTiger.wt
-rw-r--r--  1 user  staff       4096 Feb 13 13:11 WiredTigerLAS.wt
-rw-r--r--  1 user  staff  104857728 Dec  4 23:05 WiredTigerLog.0000000067
-rw-r--r--  1 user  staff  104857728 Dec  4 22:44 WiredTigerPreplog.0000000001
-rw-r--r--  1 user  staff  104857728 Dec  4 22:44 WiredTigerPreplog.0000000002
-rw-r--r--  1 user  staff      36864 Feb 13 13:11 _mdb_catalog.wt
-rw-r--r--  1 user  staff      77824 Feb 13 13:11 collection-0--1013810609953599019.wt
-rw-r--r--  1 user  staff      36864 Feb 13 13:11 collection-0--1650450412842504590.wt
-rw-r--r--  1 user  staff      61440 Feb 13 13:11 collection-0-1527613604414953189.wt
etc...

$ mongod --dbpath rj1 &
$ mongodump -o rj1-dump
$ ls rj1-dump
drwxr-xr-x  16 user  staff  544 Feb 13 12:54 assessment
drwxr-xr-x   6 user  staff  204 Feb 13 12:54 assessment_authoring
drwxr-xr-x  20 user  staff  680 Feb 13 12:54 gstudio-mongodb
drwxr-xr-x   4 user  staff  136 Feb 13 12:54 hierarchy
drwxr-xr-x  10 user  staff  340 Feb 13 12:54 id
drwxr-xr-x   6 user  staff  204 Feb 13 12:54 logging
drwxr-xr-x   6 user  staff  204 Feb 13 12:54 relationship
drwxr-xr-x   6 user  staff  204 Feb 13 12:54 repository
drwxr-xr-x   6 user  staff  204 Feb 13 12:54 resource

$ ls rj1-dump/assessment
-rw-r--r--  1 user  staff    342301 Feb 13 12:54 Assessment.bson
-rw-r--r--  1 user  staff        93 Feb 13 12:54 Assessment.metadata.json
-rw-r--r--  1 user  staff   1712600 Feb 13 12:54 AssessmentOffered.bson
-rw-r--r--  1 user  staff       100 Feb 13 12:54 AssessmentOffered.metadata.json
-rw-r--r--  1 user  staff    917299 Feb 13 12:54 AssessmentSection.bson
-rw-r--r--  1 user  staff       100 Feb 13 12:54 AssessmentSection.metadata.json
-rw-r--r--  1 user  staff    109002 Feb 13 12:54 AssessmentTaken.bson
-rw-r--r--  1 user  staff        98 Feb 13 12:54 AssessmentTaken.metadata.json
-rw-r--r--  1 user  staff    353323 Feb 13 12:54 Bank.bson
-rw-r--r--  1 user  staff        87 Feb 13 12:54 Bank.metadata.json
-rw-r--r--  1 user  staff  13972948 Feb 13 12:54 Item.bson
-rw-r--r--  1 user  staff        87 Feb 13 12:54 Item.metadata.json
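With raw data from several schools, the start-dump-stop cycle above can be scripted. A minimal sketch, shown as a dry run in which `run` prints each command instead of executing it; the directory names are examples, and the `--fork`/`--shutdown` mongod flags are Unix-only:

```shell
# Dry run: `run` prints each command instead of executing it.
# Change the body of run() to "$@" to execute for real.
run() { echo "$@"; }

# Directory names (rj1, rj2, ...) are examples; adjust to your layout.
for dir in rj1 rj2 rj3; do
    run mongod --dbpath "$dir" --fork --logpath "$dir.log"  # temporary instance
    run mongodump -o "$dir-dump"                            # dump all its databases
    run mongod --dbpath "$dir" --shutdown                   # stop before the next one
done
```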

Merging data dumps

Once you have a set of dump directories with *.json and *.bson files, you can restore them into a single database. For the first set, use the --drop flag to clean out any existing data:

$ mongorestore --drop rj1-dump

For all subsequent directories, do not use the --drop flag, so that the previously merged data sets are preserved. Note that the question and assessment data are always duplicated, which will cause MongoDB to emit warnings about duplicate IDs. You can safely ignore those, assuming that teachers in the field did not edit existing assessments or questions.

$ mongorestore rj2-dump
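The drop-then-merge sequence generalizes to any number of dumps. A sketch, shown as a dry run in which `run` prints each command instead of executing it; the dump directory names are examples:

```shell
# Dry run: `run` prints each command instead of executing it.
run() { echo "$@"; }

# Only the FIRST restore gets --drop; later restores must not use it,
# or they would wipe the data merged so far.
run mongorestore --drop rj1-dump
for d in rj2-dump rj3-dump; do
    run mongorestore "$d"
done
```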

Filesystem storage

For an unplatform installation, the JSON data files will be located in the webapps/ directory under the qbank installation (the path can be found by visiting https://localhost:8080/datastore_path). You'll want to copy the entire webapps/CLIx/ directory (which will include the student-uploaded files).

Student-uploaded files

In either of the two situations above (gstudio or unplatform), you still need to copy the student-uploaded files, which are not stored in GridFS but on the filesystem. These files will be located in the webapps/ directory under the qbank installation (the full path can be found by visiting https://localhost:8080/datastore_path). You'll want to copy the entire webapps/CLIx/studentResponseFiles/ directory.
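The copy itself is just a recursive directory copy. A minimal sketch: the helper function and both paths are hypothetical, and the real location of webapps/ should be taken from https://localhost:8080/datastore_path as described above.

```shell
# Sketch: copy the studentResponseFiles directory out of a qbank
# installation. Both path arguments are hypothetical examples.
copy_student_files() {
    src="$1/webapps/CLIx/studentResponseFiles"   # qbank installation root
    dest="$2"                                    # e.g. a mounted USB drive
    mkdir -p "$dest"
    cp -a "$src" "$dest/"                        # -a preserves timestamps/permissions
}

# Example invocation (paths are assumptions):
# copy_student_files /opt/qbank /mnt/usb/school1
```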