-
Notifications
You must be signed in to change notification settings - Fork 1
Home
[Database](## How are things stored in the database?)
There are 9 tables in the database. The detailed definition data type and constraints can be found in the files db/db_schema.mwb and tasklog.sql. Here, we document what each table/field is used for.
user: information used to identify an unique user.
- userid - the internal unique identifier of a user.
- username - If the user would register, then this should be the email address of the user.
- password - If the user would register, then he/she can have a password.
- token - Apart from username/password we can generate a random token to store in the user's browser storage and use that as the identifier of the user. This would be useful if the user does not register.
- IP - Just some additional information used for identifying the user.
- role - The user can be an experiment subject, an admin person for the experiment, or an annotator, depending on the study.
- activated - This can be useful for example when a user forgets to turn off the logger after the experiment period, then we can set the user to be inactivated to stop the logging.
user_profile: further details of the users, e.g., from questionnaires.
- userid - a foreign key referred from the user table.
- name - the name of the user
- age - the age of the user
- education - the education level of the user
- profession - the profession of the user
- experienceSearch - how much experience the user has with search systems, likert scale.
- frequencySearch - how often the user uses search systems,
- searchFor - what does the user usually search for? Free text.
- otherInfo - other information recorded per experiment. Free text.
Foreign key: userid is a foreign key from the user table. Updates/deletes are cascade.
task: task descriptions.
- taskid - an unique identifier for the tasks created by users
- userid - a task is created by a user
- taskname - a short name for the task
- description - a short description for the task
- timeCreated - the client time when the task is created
- timeCompleted - the client time when the task is marked as completed
- timeStarted - the client time when a user indicates that he/she starts working on the task
- timeStored - the server timestamp when the task is uploaded in the database.
Foreign key: userid is a foreign key from the user table. However, when the user is updated/deleted, no action for this table. The tasks created by the (deleted) user can still be used. We need to manually update/delete the corresponding task information if needed.
task_hierarchy: the hierarchy structure of the tasks. The tasks are stored as a tree. Each task points to its parent task (a broader task). For a root task, it points to -1.
- taskid
- supTask - the parent task that the current task points to. Default -1, i.e., the current task is a root.
Foreign key: the taskid is a foreign key from the task table. Updates/deletes are cascade. However, supTask is not constrained by existing tasks, extra care is needed when(1) inserting a supTask, check if it is an existing task; and (2) deleting a task when it has a child: the child should be connected to the parent of the deleted task or -1.
query: queries issued by users. Each issued query is a unique record. It doesn't matter if the query is has been issued before or not.
- queryid - the unique identifier of a query
- query - query text
- clientTime - the client time when the query is issued
- timestamp - the server time when the query is uploaded
- userid - the user who issued the query.
Information in this table can actually be derived from the searchLog or eventLog table, but it would facilitate the annotation tasks.
Foreign key: userid is a foreign key from the user table, but no action will be performed on updating/deleting the user. We need to manually update/delete records if needed.
searchLog: search log of a user. This is a subset of the event log.
- userid - the user being logged
- element - the element being listened.
- event - the event being logged
- referer - the HTTP referer of the current status. Using this we can re-construct the user path.
- refererType - the type of the referer, e.g., whether it's from a query, a querySuggetion, or a result link.
- info - detailed information about the event.
- clientTime - the client time when the event happens
- timestamp - the timestamp of the server when the event is logged.
- url - the current url. We assume that search activities always happen with a certain url. This url can be the url of the search engine or the url of the clicked result.
- searchEngine - the search engine with which the search happened.
Foreign key: userid is a foreign key from the user table. Updates/deletes are cascade.
Examples of events that can be logged:
What can happen during user interaction with a search system:
-
user issues a query -> SERP loads
-
user clicks on a result -> result page lands (-> user goes back to SERP -> SERP loads). User can also go from one result page to some other web pages, but this has nothing to do with the search engine anymore.
-
user clicks on pagination -> SERP loads
-
user clicks on query suggestion -> SERP loads
element | event | referer | refererType | info | url |
---|---|---|---|---|---|
querybox | query | none | none | queryid, queryType: user | URL of the search engine |
serp | load | previous url | It could by query, query suggestion, pagination, or webpage (e.g., when return from a clicked result) | The SERP page content, e.g., as html | current url |
pager | click | previous url | SERP | page number | current url |
query seggestion | click | previous url | SERP | queryid, queryType: query suggestion | current url |
webpage | load | previous url | SERP | time loaded | current url |
webpage | focus/focus out | previous url | SERP/other (a page may be open from a SERP but then user goes to other tabs and then come back later) | time | current url |
eventLog: similar as searchLog, but with some more logging options and extra fields:
- device - in case we go for multiple device types, e.g., mobile/tablet/computer
anno_task_query: annotation of query - task relation.
- taskid - the task to be annotated
- userid - user who does the annotation
- queryid - queries associated with that task
Foreign key: userid, taskid, queryid are foreign keys. On update cascade, but on deletion no action.
anno_relevance: annotation of the relevance of a url for a query
- userid - the user who does the annotation
- queryid - the query being annotated
- url - the url being annotated
- relevant - the relevance of the url for the query
- type - the type of relevance. For example, users can find the relevance information in 3 ways: summary - user find answer directly from summary; doc - user find relevant information from the document; direct - user find the answer directly from the search engine.
Foreign key: userid, queryid are foreign keys. On update cascade, on delete no action.