Skip to content
Jiyin He edited this page Mar 30, 2015 · 7 revisions

Table of content

[Database](## How are things stored in the database?)

How are things stored in the database?

There are 9 tables in the database. The detailed definition data type and constraints can be found in the files db/db_schema.mwb and tasklog.sql. Here, we document what each table/field is used for.

user: information used to identify an unique user.

  • userid - the internal unique identifier of a user.
  • username - If the user would register, then this should be the email address of the user.
  • password - If the user would register, then he/she can have a password.
  • token - Apart from username/password we can generate a random token to store in the user's browser storage and use that as the identifier of the user. This would be useful if the user does not register.
  • IP - Just some additional information used for identifying the user.
  • role - The user can be an experiment subject, an admin person for the experiment, or an annotator, depending on the study.
  • activated - This can be useful for example when a user forgets to turn off the logger after the experiment period, then we can set the user to be inactivated to stop the logging.

user_profile: further details of the users, e.g., from questionnaires.

  • userid - a foreign key referred from the user table.
  • name - the name of the user
  • age - the age of the user
  • education - the education level of the user
  • profession - the profession of the user
  • experienceSearch - how much experience the user has with search systems, likert scale.
  • frequencySearch - how often the user uses search systems,
  • searchFor - what does the user usually search for? Free text.
  • otherInfo - other information recorded per experiment. Free text.

Foreign key: userid is a foreign key from the user table. Updates/deletes are cascade.

task: task descriptions.

  • taskid - an unique identifier for the tasks created by users
  • userid - a task is created by a user
  • taskname - a short name for the task
  • description - a short description for the task
  • timeCreated - the client time when the task is created
  • timeCompleted - the client time when the task is marked as completed
  • timeStarted - the client time when a user indicates that he/she starts working on the task
  • timeStored - the server timestamp when the task is uploaded in the database.

Foreign key: userid is a foreign key from the user table. However, when the user is updated/deleted, no action for this table. The tasks created by the (deleted) user can still be used. We need to manually update/delete the corresponding task information if needed.

task_hierarchy: the hierarchy structure of the tasks. The tasks are stored as a tree. Each task points to its parent task (a broader task). For a root task, it points to -1.

  • taskid
  • supTask - the parent task that the current task points to. Default -1, i.e., the current task is a root.

Foreign key: the taskid is a foreign key from the task table. Updates/deletes are cascade. However, supTask is not constrained by existing tasks, extra care is needed when(1) inserting a supTask, check if it is an existing task; and (2) deleting a task when it has a child: the child should be connected to the parent of the deleted task or -1.

query: queries issued by users. Each issued query is a unique record. It doesn't matter if the query is has been issued before or not.

  • queryid - the unique identifier of a query
  • query - query text
  • clientTime - the client time when the query is issued
  • timestamp - the server time when the query is uploaded
  • userid - the user who issued the query.

Information in this table can actually be derived from the searchLog or eventLog table, but it would facilitate the annotation tasks.

Foreign key: userid is a foreign key from the user table, but no action will be performed on updating/deleting the user. We need to manually update/delete records if needed.

searchLog: search log of a user. This is a subset of the event log.

  • userid - the user being logged
  • element - the element being listened.
  • event - the event being logged
  • referer - the HTTP referer of the current status. Using this we can re-construct the user path.
  • refererType - the type of the referer, e.g., whether it's from a query, a querySuggetion, or a result link.
  • info - detailed information about the event.
  • clientTime - the client time when the event happens
  • timestamp - the timestamp of the server when the event is logged.
  • url - the current url. We assume that search activities always happen with a certain url. This url can be the url of the search engine or the url of the clicked result.
  • searchEngine - the search engine with which the search happened.

Foreign key: userid is a foreign key from the user table. Updates/deletes are cascade.

Examples of events that can be logged:

What can happen during user interaction with a search system:

  1. user issues a query -> SERP loads

  2. user clicks on a result -> result page lands (-> user goes back to SERP -> SERP loads). User can also go from one result page to some other web pages, but this has nothing to do with the search engine anymore.

  3. user clicks on pagination -> SERP loads

  4. user clicks on query suggestion -> SERP loads

element event referer refererType info url
querybox query none none queryid, queryType: user URL of the search engine
serp load previous url It could by query, query suggestion, pagination, or webpage (e.g., when return from a clicked result) The SERP page content, e.g., as html current url
pager click previous url SERP page number current url
query seggestion click previous url SERP queryid, queryType: query suggestion current url
webpage load previous url SERP time loaded current url
webpage focus/focus out previous url SERP/other (a page may be open from a SERP but then user goes to other tabs and then come back later) time current url

eventLog: similar as searchLog, but with some more logging options and extra fields:

  • device - in case we go for multiple device types, e.g., mobile/tablet/computer

anno_task_query: annotation of query - task relation.

  • taskid - the task to be annotated
  • userid - user who does the annotation
  • queryid - queries associated with that task

Foreign key: userid, taskid, queryid are foreign keys. On update cascade, but on deletion no action.

anno_relevance: annotation of the relevance of a url for a query

  • userid - the user who does the annotation
  • queryid - the query being annotated
  • url - the url being annotated
  • relevant - the relevance of the url for the query
  • type - the type of relevance. For example, users can find the relevance information in 3 ways: summary - user find answer directly from summary; doc - user find relevant information from the document; direct - user find the answer directly from the search engine.

Foreign key: userid, queryid are foreign keys. On update cascade, on delete no action.

Clone this wiki locally