Nextprot dataset and protein examples #423
"@id": "https://www.nextprot.org/entry/NX_P52701",
"includedInDataset": "ftp://ftp.nextprot.org/pub/current_release/xml/nextprot_all.xml.gz",
"citation": {
"@id": "",
I guess the citations are not mandatory?
A good start, but there are things I would suggest changing.
@@ -0,0 +1,43 @@
{
"@type": "Dataset",
Should have an @id property to identify the dataset.
I agree, @id is important for Dataset
"name": "Creative Commons CC BY 4.0 Attribution",
"url": "https://creativecommons.org/licenses/by/4.0/"
}
}
The identifier property is missing.
And it is a minimum property according to https://bioschemas.org/profiles/Dataset/0.3-RELEASE-2019_06_14/.
By the way, I did not see a dct:conformsTo linking to the corresponding Bioschemas profile version. You should add it if you do not have it yet.
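A small check along these lines could catch missing properties before the markup is published. This is only a sketch in Python: the helper `missing_properties` and the `REQUIRED` tuple are hypothetical. Only `identifier` is confirmed as a minimum property in this thread, so the other names in the tuple are assumptions to be verified against the Dataset profile, and the `keywords` value is shortened for illustration.

```python
import json

# Hypothetical list: only "identifier" is confirmed as minimum in this thread.
# Verify the rest against the Bioschemas Dataset profile before relying on it.
REQUIRED = ("@id", "identifier", "name", "description", "url", "keywords")


def missing_properties(markup, required=REQUIRED):
    """Return the required keys that are absent or empty in a JSON-LD object."""
    return [key for key in required if markup.get(key) in (None, "", [], {})]


# Dataset markup as submitted in this pull request (keywords shortened).
dataset = json.loads("""
{
  "@type": "Dataset",
  "name": "neXtProt entries",
  "description": "The collection of neXtProt entries for human proteins",
  "url": "https://www.nextprot.org",
  "keywords": "nextprot,Human,Proteins"
}
""")

print(missing_properties(dataset))
```

With the markup as submitted, the check reports the two properties raised in the comments above, `@id` and `identifier`.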
@@ -0,0 +1,43 @@
{
"@type": "Dataset",
"name": "neXtProt entries",
Is 'entries' part of the dataset name?
It does not seem to be, from what I saw on their website.
"description": "The collection of neXtProt entries for human proteins",
"url": "https://www.nextprot.org",
"keywords": "nextprot,Human,Proteins,Proteome,Proteomics,protein database,protein knowledgebase,protein resource,human protein,human proteome,function,medical,disease,expression,interactions,sequence,isoform,mutation,variant,phenotypes,proteomics,peptide,structure,3D,annotation,biocuration,chromosomes,protein validation,protein-coding genes,post-translational modifications,ptm,data integration,systems biology,genetic variations,UniProt",
"distribution": [
This looks to be correct
{
"@context": "http://schema.org",
"@type": "DataRecord",
"@id": "https://www.nextprot.org/entry/NX_P52701",
This @id should be different from the value for the main entity. You may just want to add #DR onto the end of this one.
"@context": "http://schema.org",
"@type": "DataRecord",
"@id": "https://www.nextprot.org/entry/NX_P52701",
"includedInDataset": "ftp://ftp.nextprot.org/pub/current_release/xml/nextprot_all.xml.gz",
The value of this should be the value of the @id in your Dataset markup. It should point to the description of the dataset rather than the download file.
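The two identifier comments on the DataRecord can be checked mechanically. The following Python sketch assumes a hypothetical helper `check_record_links` and a hypothetical Dataset `@id` (`DATASET_ID`); the real value is whatever the Dataset markup ends up declaring.

```python
def check_record_links(record, dataset_id):
    """Return a list of problems with a DataRecord's identifiers."""
    problems = []
    entity_id = record.get("mainEntity", {}).get("@id")
    # The record and the entity it describes need distinct identifiers.
    if record.get("@id") == entity_id:
        problems.append("DataRecord @id equals mainEntity @id")
    # includedInDataset should reference the Dataset description, not a file.
    if record.get("includedInDataset") != dataset_id:
        problems.append("includedInDataset does not match the Dataset @id")
    return problems


# Hypothetical Dataset @id, used only for illustration.
DATASET_ID = "https://www.nextprot.org/#dataset"

submitted = {
    "@type": "DataRecord",
    "@id": "https://www.nextprot.org/entry/NX_P52701",
    "includedInDataset": "ftp://ftp.nextprot.org/pub/current_release/xml/nextprot_all.xml.gz",
    "mainEntity": {
        "@id": "https://www.nextprot.org/entry/NX_P52701",
        "@type": "Protein",
    },
}

# Revision following the review: "#DR" suffix, and a link to the Dataset @id.
revised = {
    **submitted,
    "@id": submitted["@id"] + "#DR",
    "includedInDataset": DATASET_ID,
}

print(check_record_links(submitted, DATASET_ID))
print(check_record_links(revised, DATASET_ID))
```

The submitted record triggers both problems, while the revised one passes with an empty list.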
"@type": "DataRecord",
"@id": "https://www.nextprot.org/entry/NX_P52701",
"includedInDataset": "ftp://ftp.nextprot.org/pub/current_release/xml/nextprot_all.xml.gz",
"citation": {
Omit this property if there isn't a value for it.
You may also want to add a citation property into your dataset markup.
"mainEntity": {
"@id": "https://www.nextprot.org/entry/NX_P52701",
"@type": "Protein",
"http://purl.org/dc/terms/conformsTo": "https://bioschemas.org/specifications/Protein/0.9-DRAFT",
Please update to 0.11-RELEASE
"http://purl.org/dc/terms/conformsTo": "https://bioschemas.org/specifications/Protein/0.9-DRAFT",
"identifier": "NX_P52701",
"name": "DNA mismatch repair protein Msh6",
"description": "",
Omit properties for which there is no data. However, in this case you probably want to include the text from the overview section of your webpage.
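If the markup is generated from a template, empty values such as `"description": ""` or `"citation": {"@id": ""}` can be pruned before serialization. A minimal sketch in Python, assuming a hypothetical helper `drop_empty` (not part of any Bioschemas tooling) and a cut-down record built from the values in this pull request:

```python
EMPTY = ("", None, [], {})


def drop_empty(value):
    """Recursively remove empty strings, None, and empty containers."""
    if isinstance(value, dict):
        cleaned = {key: drop_empty(item) for key, item in value.items()}
        return {key: item for key, item in cleaned.items() if item not in EMPTY}
    if isinstance(value, list):
        cleaned = [drop_empty(item) for item in value]
        return [item for item in cleaned if item not in EMPTY]
    return value


record = {
    "@type": "DataRecord",
    "citation": {"@id": ""},
    "mainEntity": {
        "@type": "Protein",
        "identifier": "NX_P52701",
        "name": "DNA mismatch repair protein Msh6",
        "description": "",
    },
}

print(drop_empty(record))
```

Pruning bottom-up means that an object left with no properties, like the `citation` here, is dropped along with its empty `@id`.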
"isEncodedByBioChemEntity": {
"@type": "Gene",
"name": "MSH6",
"identifier": "HGNC:7329",
"hasRepresentation": "2p16.3"
},
"taxonomicRange": {
"@id": "https://identifiers.org/taxonomy:9606",
"@type": "Taxon",
"name": "Human"
}
Do you intend that these are embedded within the hasBioChemEntityPart rather than properties of the protein directly?
According to the profile, these two properties can be used directly for a Protein.
I agree with @AlasdairGray's comments. I have added a couple more, please have a look. Thanks.
},
{
"@type": "DataDownload",
"fileFormat": "RDF"
Aren't you missing the download URL here?
Examples