From a5fb0c42ed2b5ddf72d9f111f463dc76bb325f57 Mon Sep 17 00:00:00 2001 From: Bruno Amaral Date: Fri, 29 Mar 2024 11:51:13 +0000 Subject: [PATCH 1/3] update info for articles table --- docs/02.1-database tables and fields.md | 100 +++++++++++------------- 1 file changed, 45 insertions(+), 55 deletions(-) diff --git a/docs/02.1-database tables and fields.md b/docs/02.1-database tables and fields.md index 14042dbb..4b59349f 100644 --- a/docs/02.1-database tables and fields.md +++ b/docs/02.1-database tables and fields.md @@ -1,5 +1,3 @@ -[TOC] - # Description of database tables and fields [Tables and Relations Diagram](images/gregory-table-relations-diagram.png) @@ -8,63 +6,53 @@ An Article is a published piece of knowledge, and is usually assigned the value of `science paper` for the `kind` column. -| Field Name | Field Type | Options/Comments | Description | -| ------------------- | --------------- | ------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------- | -| article_id | AutoField | primary_key=True | | -| title | TextField | blank=False, null=False, unique=True | | -| link | URLField | max_length=2000, blank=False, null=False | | -| doi | CharField | max_length=280, blank=True, null=True | Digital Object Identifier, used to avoid duplicates. | -| summary | TextField | blank=True, null=True | The abstract of the paper or full content of a news article. | -| source | ForeignKey | Sources, on_delete=models.DO_NOTHING, db_column='source', blank=True, null=True | | -| published_date | DateTimeField | blank=True, null=True | The date we get from the source or from the maintenance task that checks the DOI for extra information. | -| discovery_date | DateTimeField | auto_now_add=True | Date when we added the article to the database. | -| authors | ManyToManyField | Authors, blank=True | | -| categories | ManyToManyField | Categories | | -| entities | ManyToManyField | Entities | | -| relevant | BooleanField | blank=True, null=True | Manual annotation of the relevancy of the article to improve the quality of life for patients. | -| ml_prediction_gnb | BooleanField | blank=True, null=True | Machine Learning Prediction with the Gausian Naive Bayes Model. | -| ml_prediction_lr | BooleanField | blank=True, null=True | Machine Learning Prediction with the Logarithm Regression Model | -| noun_phrases | JSONField | blank=True, null=True | Natural Language Processing to pick up the noun phrases in the title. Not used right now. | -| sent_to_admin | BooleanField | blank=True, null=True | True if the article was sent to the admin for review. | -| sent_to_subscribers | BooleanField | blank=True, null=True | True if the article was sent to the subscribers. | -| kind | CharField | choices=KINDS, max_length=50, default='science paper' | Not used right now. Possible values are `science paper` and `news article`. | -| access | CharField | choices=ACCESS_OPTIONS, max_length=50, default=None, null=True | Indicates if the article is open access or restricted. | -| publisher | CharField | max_length=150, blank=True, null=True, default=None | Publisher of the Journal | -| container_title | CharField | max_length=150, blank=True, null=True, default=None | Journal Title | -| crossref_check | DateTimeField | blank=True, null=True | Date and time when we last checked crossref.org for up to date information. | -| takeaways | TextField | blank=True, null=True | Extraction of key takeaways from the summary / abstract. | +| Field Name | Field Type | Options/Comments | Description | +| --------------------- | ----------------- | -------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------- | +| article_id | AutoField | primary_key=True | | +| title | TextField | blank=False, null=False, unique=True | | +| link | URLField | max_length=2000, blank=False, null=False | | +| doi | CharField | max_length=280, blank=True, null=True | Digital Object Identifier, used to avoid duplicates. | +| summary | TextField | blank=True, null=True | The abstract of the paper or full content of a news article. | +| source | ForeignKey | Sources, on_delete=models.DO_NOTHING, db_column='source', blank=True, null=True | | +| published_date | DateTimeField | blank=True, null=True | The date we get from the source or from the maintenance task that checks the DOI for extra information. | +| discovery_date | DateTimeField | auto_now_add=True | Date when we added the article to the database. | +| authors | ManyToManyField | Authors, blank=True | | +| categories | ManyToManyField | Categories | | +| entities | ManyToManyField | Entities | | +| relevant | BooleanField | blank=True, null=True | Manual annotation of the relevancy of the article to improve the quality of life for patients. | +| ml_prediction_gnb | BooleanField | blank=True, null=True | Machine Learning Prediction with the Gaussian Naive Bayes Model. | +| ml_prediction_lr | BooleanField | blank=True, null=True | Machine Learning Prediction with the Logistic Regression Model. | +| ml_prediction_lsvc | BooleanField | blank=True, null=True | Machine Learning Prediction with the Linear Support Vector Classification Model. | +| noun_phrases | JSONField | blank=True, null=True | Natural Language Processing to pick up the noun phrases in the title. Not used right now. | +| sent_to_admin | BooleanField | blank=True, null=True | True if the article was sent to the admin for review. | +| sent_to_subscribers | BooleanField | blank=True, null=True | True if the article was sent to the subscribers. | +| kind | CharField | choices=KINDS, max_length=50, default='science paper' | Not used right now. Possible values are `science paper` and `news article`. | +| access | CharField | choices=ACCESS_OPTIONS, max_length=50, default=None, null=True | Indicates if the article is open access or restricted. | +| publisher | CharField | max_length=150, blank=True, null=True, default=None | Publisher of the Journal | +| container_title | CharField | max_length=150, blank=True, null=True, default=None | Journal Title | +| crossref_check | DateTimeField | blank=True, null=True | Date and time when we last checked crossref.org for up to date information. | +| takeaways | TextField | blank=True, null=True | Extraction of key takeaways from the summary / abstract. | ## Authors -| Field Name | Field Type | Options/Comments | Description | -| ----------- | ------------- | -------------------------------------------------- | ---------------------------------------------------------------------------------- | -| author_id | AutoField | primary_key=True | Primary key | -| family_name | CharField | max_length=150, blank=False, null=False | | -| given_name | CharField | max_length=150, blank=False, null=False | | -| ORCID | CharField | max_length=150, blank=True, null=True, unique=True | The URL for the author profile. Example: https://orcid.org/0000-0003-3045-0304 | -| country | CountryField | blank=True, null=True | Taken from the ORCID profile, empty most of the times. Does not equal affiliation. | -| orcid_check | DateTimeField | blank=True, null=True | Date of when we last checked ORCID to avoid overloading them with requests | +| Field Name | Field Type | Options/Comments | Description | +| ------------ | ------------- | -------------------------------------------------- | ---------------------------------------------------------------------------------- | +| author_id | AutoField | primary_key=True | Primary key | +| family_name | CharField | max_length=150, blank=False, null=False | | +| given_name | CharField | max_length=150, blank=False, null=False | | +| ORCID | CharField | max_length=150, blank=True, null=True, unique=True | The URL for the author profile. Example: https://orcid.org/0000-0003-3045-0304 | +| country | CharField | max_length=2, blank=True, null=True | Country code (ISO 3166-1 alpha-2). Taken from the ORCID profile. | +| orcid_check | DateTimeField | blank=True, null=True | Date of when we last checked ORCID to avoid overloading them with requests. | ## Categories -A category is a medicine, a treatment, or special area of research. Categories are applied to both clinical trials and articles and allow us to look at both side by side. In this example we can see how much research was published around Autologous Hematopoietic Stem Cell Transplantation (aHSCT) and how many clinical trials were done in the same period of time. https://gregory-ms.com/categories/ahsct/ - -| Field Name | Field Type | Options/Comments | Description | -| -------------------- | ---------- | -------------------------------------------------------------------------------------------------------------------------------------------------- | -------------------------------------------------------------------------------------- | -| category_id | AutoField | primary_key=True | | -| category_description | TextField | blank=True, null=True | Annotation of the description of the drug and reason for including it in the database. | -| category_name | CharField | max_length=200, blank=True, null=True | | -| category_slug | SlugField | blank=True, null=True, unique=True | Used to build the API endpoint | -| category_terms | ArrayField | CharField(max_length=100), default=list, verbose_name='Terms to include in category (comma separated)', help_text="Add terms separated by commas." | List of search terms for that category | - -## Entities - -Not used right now, the initial idea was to identify entities based on an ontology. - -| Field Name | Field Type | Options/Comments | -| ---------- | ---------- | ---------------- | -| entity | TextField | | -| label | TextField | | +| Field Name | Field Type | Options/Comments | Description | +| --------------------- | ----------- | ------------------------------------------------------------------------------------------------------------- | -------------------------------------------------------------------------------------- | +| category_id | AutoField | primary_key=True | | +| category_description | TextField | blank=True, null=True | Annotation of the description of the drug and reason for including it in the database. | +| category_name | CharField | max_length=200, blank=True, null=True | | +| category_slug | SlugField | blank=True, null=True, unique=True | Used to build the API endpoint | +| category_terms | ArrayField | CharField(max_length=100), default=list, verbose_name='Terms to include in category (comma separated)' | List of search terms for that category | ## Sources @@ -95,8 +83,10 @@ Groundwork so that we can one day use the same Gregory install for multiple fiel ## Trials -| Field Name | Field Type | Options/Comments | World Health Organisation match | Description | -| ----------------------------- | ----------------- | --------------------------------------------------------------------------------- | ------------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | +Additional field in the `trials` table: + +| Field Name | Field Type | Options/Comments | Description | +| ------------------------- | ------------- | ----------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------ | | trial_id | AutoField | primary_key=True | | | | discovery_date | DateTimeField | blank=True, null=True | | Date when the trial was added to the database | | title | TextField | blank=False, null=False, unique=True | | | From 7962408685d083dceac32061dcdefffcd2425466 Mon Sep 17 00:00:00 2001 From: Bruno Amaral Date: Fri, 29 Mar 2024 11:56:47 +0000 Subject: [PATCH 2/3] add missing field --- docs/02.1-database tables and fields.md | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/docs/02.1-database tables and fields.md b/docs/02.1-database tables and fields.md index 4b59349f..cd81bb37 100644 --- a/docs/02.1-database tables and fields.md +++ b/docs/02.1-database tables and fields.md @@ -98,7 +98,8 @@ Additional field in the `trials` table: | sent | BooleanField | blank=True, null=True | | Unsure if it can be removed. | | sent_to_subscribers | BooleanField | blank=True, null=True | | | | sent_to_admin | BooleanField | blank=True, null=True, default=False | | True if it was sent to the admin | -| sent_real_time_notification | BooleanField | blank=True, null=True, default=False | | | +| last_updated | DateTimeField | | The timestamp when the trial record was last updated. | +| sent_real_time_notification | BooleanField | blank=True, null=True, default=False | Indicates whether a real-time notification was sent for the trial. | | categories | ManyToManyField | Categories, blank=True | | | | identifiers | JSONField | blank=True, null=True | | Right now there isn’t a single identifier number for clinical trials. Each registry uses its own format and we are saving them all as a jsonb column. There is a plan to establish a [Universal Trial Number](https://www.who.int/clinical-trials-registry-platform/unambiguous-trial-identification/the-universal-trial-number-(utn)#:~:text=The%20aim%20of%20the%20Universal,the%20history%20of%20the%20trial.) for the WHO registry. | | history | HistoricalRecords | | | We are keeping track of changes to the clinical trials but are not using this information right now. | From 3e0477c97222d15d41451a09bdc38d0bbfa52ea8 Mon Sep 17 00:00:00 2001 From: Bruno Amaral Date: Fri, 29 Mar 2024 12:15:48 +0000 Subject: [PATCH 3/3] minor edit --- docs/02.1-database tables and fields.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/02.1-database tables and fields.md b/docs/02.1-database tables and fields.md index cd81bb37..64def295 100644 --- a/docs/02.1-database tables and fields.md +++ b/docs/02.1-database tables and fields.md @@ -83,7 +83,7 @@ Groundwork so that we can one day use the same Gregory install for multiple fiel ## Trials -Additional field in the `trials` table: +Data for Clinical Trials | Field Name | Field Type | Options/Comments | Description | | ------------------------- | ------------- | ----------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------ |