Home

Table of content

[Database](## How are things stored in the database?)

How are things stored in the database?

There are 9 tables in the database. The detailed definition data type and constraints can be found in the files db/db_schema.mwb and tasklog.sql. Here, we document what each table/field is used for.

user: information used to identify an unique user.

userid - the internal unique identifier of a user.
username - If the user would register, then this should be the email address of the user.
password - If the user would register, then he/she can have a password.
token - Apart from username/password we can generate a random token to store in the user's browser storage and use that as the identifier of the user. This would be useful if the user does not register.
IP - Just some additional information used for identifying the user.
role - The user can be an experiment subject, an admin person for the experiment, or an annotator, depending on the study.
activated - This can be useful for example when a user forgets to turn off the logger after the experiment period, then we can set the user to be inactivated to stop the logging.

user_profile: further details of the users, e.g., from questionnaires.

userid - a foreign key referred from the user table.
name - the name of the user
age - the age of the user
education - the education level of the user
profession - the profession of the user
experienceSearch - how much experience the user has with search systems, likert scale.
frequencySearch - how often the user uses search systems,
searchFor - what does the user usually search for? Free text.
otherInfo - other information recorded per experiment. Free text.

Foreign key: userid is a foreign key from the user table. Updates/deletes are cascade.

task: task descriptions.

taskid - an unique identifier for the tasks created by users
userid - a task is created by a user
taskname - a short name for the task
description - a short description for the task
timeCreated - the client time when the task is created
timeCompleted - the client time when the task is marked as completed
timeStarted - the client time when a user indicates that he/she starts working on the task
timeStored - the server timestamp when the task is uploaded in the database.

Foreign key: userid is a foreign key from the user table. However, when the user is updated/deleted, no action for this table. The tasks created by the (deleted) user can still be used. We need to manually update/delete the corresponding task information if needed.

task_hierarchy: the hierarchy structure of the tasks. The tasks are stored as a tree. Each task points to its parent task (a broader task). For a root task, it points to -1.

taskid
supTask - the parent task that the current task points to. Default -1, i.e., the current task is a root.

Foreign key: the taskid is a foreign key from the task table. Updates/deletes are cascade. However, supTask is not constrained by existing tasks, extra care is needed when(1) inserting a supTask, check if it is an existing task; and (2) deleting a task when it has a child: the child should be connected to the parent of the deleted task or -1.

query: queries issued by users. Each issued query is a unique record. It doesn't matter if the query is has been issued before or not.

queryid - the unique identifier of a query
query - query text
clientTime - the client time when the query is issued
timestamp - the server time when the query is uploaded
userid - the user who issued the query.

Information in this table can actually be derived from the searchLog or eventLog table, but it would facilitate the annotation tasks.

Foreign key: userid is a foreign key from the user table, but no action will be performed on updating/deleting the user. We need to manually update/delete records if needed.

searchLog: search log of a user. This is a subset of the event log.

userid - the user being logged
element - the element being listened.
event - the event being logged
referer - the HTTP referer of the current status. Using this we can re-construct the user path.
refererType - the type of the referer, e.g., whether it's from a query, a querySuggetion, or a result link.
info - detailed information about the event.
clientTime - the client time when the event happens
timestamp - the timestamp of the server when the event is logged.
url - the current url. We assume that search activities always happen with a certain url. This url can be the url of the search engine or the url of the clicked result.
searchEngine - the search engine with which the search happened.

Foreign key: userid is a foreign key from the user table. Updates/deletes are cascade.

Examples of events that can be logged:

What can happen during user interaction with a search system:

user issues a query -> SERP loads
user clicks on a result -> result page lands (-> user goes back to SERP -> SERP loads). User can also go from one result page to some other web pages, but this has nothing to do with the search engine anymore.
user clicks on pagination -> SERP loads
user clicks on query suggestion -> SERP loads

element	event	referer	refererType	info	url
querybox	query	none	none	queryid, queryType: user	URL of the search engine
serp	load	previous url	It could by query, query suggestion, pagination, or webpage (e.g., when return from a clicked result)	The SERP page content, e.g., as html	current url
pager	click	previous url	SERP	page number	current url
query seggestion	click	previous url	SERP	queryid, queryType: query suggestion	current url
webpage	load	previous url	SERP	time loaded	current url
webpage	focus/focus out	previous url	SERP/other (a page may be open from a SERP but then user goes to other tabs and then come back later)	time	current url

eventLog: similar as searchLog, but with some more logging options and extra fields:

device - in case we go for multiple device types, e.g., mobile/tablet/computer

anno_task_query: annotation of query - task relation.

taskid - the task to be annotated
userid - user who does the annotation
queryid - queries associated with that task

Foreign key: userid, taskid, queryid are foreign keys. On update cascade, but on deletion no action.

anno_relevance: annotation of the relevance of a url for a query

userid - the user who does the annotation
queryid - the query being annotated
url - the url being annotated
relevant - the relevance of the url for the query
type - the type of relevance. For example, users can find the relevance information in 3 ways: summary - user find answer directly from summary; doc - user find relevant information from the document; direct - user find the answer directly from the search engine.

Foreign key: userid, queryid are foreign keys. On update cascade, on delete no action.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Home

Table of content

How are things stored in the database?

Clone this wiki locally