OpenCGA Catalog Data Models
This section explains all the data models used in Catalog.
For more detailed information about the Java data models, browse the source code of the Java beans or take a look at the JSON Schemas.
Some fields appear in many of the data models:
- id: a positive numeric identifier that is unique across the whole Catalog. This id can be used in the API and REST web services.
- attributes: applications using OpenCGA can use this field to store custom information; any well-formed JSON object is accepted.
- lastActivity: reports the last time the data was updated, which is useful when refreshing a web client interface.
Sessions register every login and logout made by the user. A sessionId is valid only while the logout field is empty.
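The validity rule above can be sketched as a small check. This is only an illustration, not Catalog's own code: the helper name `is_session_active` is hypothetical, and it assumes an open session stores `logout` as an empty string (or omits it).

```python
def is_session_active(session):
    """Return True while the session has no logout timestamp."""
    # Assumption: an open session has logout == "" or no logout key at all.
    return not session.get("logout")


open_session = {
    "id": "Sq6JKQ5Uv8MOwK5jmgyd",
    "ip": "10.0.0.14",
    "login": "20141215162449",
    "logout": "",
}
closed_session = dict(open_session, logout="20141215165837")

print(is_session_active(open_session))    # True
print(is_session_active(closed_session))  # False
```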
The user is the root level of the hierarchy. It represents any person registered in the system.
Most relevant fields are:
- id: Alphanumeric string identifier. This is the only non-numeric id.
- status: Accepted account status:
  - ACTIVE
  - BANNED
  - DELETED
  - ACTIVATION_PENDING
- role: Accepted role values:
  - ADMIN
  - USER
  - ANONYMOUS
- tools:
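A client could guard against unknown status or role values with plain enumerations. The sketch below only mirrors the accepted values listed above; the class names and the `validate_user` helper are hypothetical and are not part of the OpenCGA API.

```python
from enum import Enum


class AccountStatus(Enum):
    ACTIVE = "ACTIVE"
    BANNED = "BANNED"
    DELETED = "DELETED"
    ACTIVATION_PENDING = "ACTIVATION_PENDING"


class Role(Enum):
    ADMIN = "ADMIN"
    USER = "USER"
    ANONYMOUS = "ANONYMOUS"


def validate_user(user):
    """Raise ValueError if status or role is not one of the accepted values."""
    # Looking an Enum up by value raises ValueError for unknown values.
    AccountStatus(user["status"])
    Role(user["role"])


validate_user({"id": "jcoll", "status": "ACTIVE", "role": "USER"})  # passes silently
```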
Example:
{
"id": "jcoll",
"name": "jacobo",
"email": "[email protected]",
"password": "dWArAxd6QlNqzL9qGchg",
"organization": "ACME",
"role": "USER",
"status": "ACTIVE",
"sessions": [
{
"id" : "Sq6JKQ5Uv8MOwK5jmgyd",
"ip" : "10.0.0.14",
"login" : "20141215162449",
"logout" : "20141215165837"
}
],
"lastActivity": "20141215182938676",
"tools": [],
"configs": {},
"attributes": {},
"projects": [
{
"id": 14,
"name": "Project1",
"alias": "proj1",
"creationDate": "20141215182727",
"description": "Test project",
"organization": "ACME",
"status": "",
"studies": [ ],
"attributes": {}
}
]
}
* In this example the studies and tools arrays have been left empty. They are explained below.
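The date fields in the example above (login, creationDate, lastActivity) look like compact `yyyyMMddHHmmss` strings, with lastActivity carrying three extra digits that appear to be milliseconds. That reading is an assumption drawn from the example values only. Under that assumption, a client could parse them as follows (`parse_catalog_time` is a hypothetical helper name):

```python
from datetime import datetime, timedelta


def parse_catalog_time(value):
    """Parse a 14-digit timestamp, optionally followed by 3 millisecond digits."""
    # Assumption: the first 14 characters are yyyyMMddHHmmss.
    base = datetime.strptime(value[:14], "%Y%m%d%H%M%S")
    # Assumption: any trailing digits (up to three) are milliseconds.
    millis = int(value[14:17] or 0)
    return base + timedelta(milliseconds=millis)


created = parse_catalog_time("20141215182727")
last_activity = parse_catalog_time("20141215182938676")
print(last_activity > created)  # True
```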
The study is the main Catalog object. A study is a set of files, jobs and samples, among others. All the files in a study share the same location, cipher and sharing options (ACLs).
Most relevant fields are:
- type: (to cohort?)
  - CASE_SET
  - CONTROL_SET
  - CASE_CONTROL
  - PAIRED
  - FAMILY
  - TRIO
- stats: (to cohort?)
- status:
  - ACTIVE
- diskUsage: Sum of the diskUsage of all files in the study.
- cipher: Mechanism used to encrypt all files in the study. Accepted values:
  - NONE: Without encryption.
  - AES_256: Not implemented yet.
- uri: Location of the study. A URI is required instead of a path because the study could be in different hosts and file systems.
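The diskUsage rule can be illustrated with a short sketch. It assumes every entry in the study's files array carries its own `diskUsage` value and that entries without one count as zero; `study_disk_usage` is a hypothetical helper, not a Catalog function.

```python
def study_disk_usage(files):
    """Sum the diskUsage of every file entry in a study."""
    # Assumption: entries without a diskUsage field contribute 0.
    return sum(f.get("diskUsage", 0) for f in files)


files = [
    {"id": 3, "type": "FILE", "diskUsage": 24276833},
    {"id": 4, "type": "FILE", "diskUsage": 1000},
    {"id": 5, "type": "FOLDER"},
]
print(study_disk_usage(files))  # 24277833
```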
Example:
{
"id": 15,
"name": "Study test 1",
"alias": "std1",
"type": "FAMILY",
"creatorId": "jcoll",
"creationDate": "20141215182938",
"description": "",
"status": "ACTIVE",
"lastActivity": "20141215182938",
"diskUsage": 0,
"cipher": "NONE",
"acl": [ ],
"experiments": [ ],
"files": [ ],
"jobs": [ ],
"samples": [ ],
"uri": "hdfs:///data/opencga/catalog2/users/jcoll/projects/14/15/",
"datasets": [
{
"id": 0,
"name": "bam_test_files",
"creationDate": "20141215182938",
"description": " ... ",
"files": [ 26, 27, 28, 29, 35, 36, 38],
"attributes": { }
}
],
"cohorts": [ ],
"variableSets": [ ],
"stats": { },
"attributes": { }
}
* In this example the files, jobs and samples arrays have been left empty. They are explained below.
Most relevant fields of a file are:
- type: Accepted values:
  - FILE: Any real file stored in the file system.
  - FOLDER: File container.
  - INDEX: Not a real file. Represents an indexed file in an OpenCGA-Storage engine. Removed at v0.6.0.
- format:
  - PLAIN
  - GZIP
  - BINARY
  - IMAGE
  - EXECUTABLE
- bioformat:
  - VARIANT
  - ALIGNMENT
  - SEQUENCE
  - NONE
- status: File status. For more information, go to File life cycle. Accepted values:
  - UPLOADING: The file is being uploaded.
  - UPLOADED: Whole file uploaded. It has to be moved to the final destination.
  - INDEXING: The file is being indexed. Removed at v0.6.0.
  - READY: File is ready to use.
  - DELETING: Deletion pending.
  - DELETED: Deleted file. Irreversible deletion.
- jobId and experimentId: Specify the source of the file. A file can be generated from a job or an experiment.
Example:
{
"id" : 3,
"name" : "chr14.phase1_release_v3.20101123.snps_indels_svs.genotypes.refpanel.AMR.vcf.gz",
"type" : "FILE",
"format" : "GZIP",
"bioformat" : "VARIANT",
"path" : "data/vcf/chr14.phase1_release_v3.20101123.snps_indels_svs.genotypes.refpanel.AMR.vcf.gz",
"ownerId" : "jcoll",
"creationDate" : "20141215162449",
"description" : " ... ",
"status" : "READY",
"diskUsage" : 24276833,
"experimentId" : -1,
"sampleIds" : [ ],
"jobId" : -1,
"acl" : [ ],
"stats" : { },
"attributes" : { }
}
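In the file example above both jobId and experimentId are -1. Assuming that -1 means "not set" (the document does not state this explicitly), the source of a file could be resolved with a sketch like this; `file_source` is a hypothetical helper name.

```python
def file_source(file_entry):
    """Return ("job", id), ("experiment", id), or None when no source is set."""
    # Assumption: -1 (or a missing field) marks an unset source.
    job_id = file_entry.get("jobId", -1)
    experiment_id = file_entry.get("experimentId", -1)
    if job_id != -1:
        return ("job", job_id)
    if experiment_id != -1:
        return ("experiment", experiment_id)
    return None


print(file_source({"id": 3, "jobId": -1, "experimentId": -1}))   # None
print(file_source({"id": 658, "jobId": 138, "experimentId": -1}))  # ('job', 138)
```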
Example of a job:
{
"id" : 138,
"name" : "Test job",
"userId" : "jcoll",
"toolName" : "network-miner",
"date" : "20141031151537",
"description" : " ... ",
"startTime" : 1415632245213,
"endTime" : 1415632258708,
"outputError" : "",
"commandLine" : "/opt/opencga/analysis/network-miner/babelomics/babelomics.sh --tool network-miner --seedlist 150140.chrom20.ILLUMINA.bwa.CHM1.20131218.bam.bai --significant-value 0.05 --list HG00096.mapped.ILLUMINA.bwa.GBR.low_coverage.20120522.bam --list-tags gene --intermediate 1 --outdir /home/cafetero/opencga/catalog/jobs/J_KrOrWfEwkx/ --order ascending --interactome hsa --randoms 500 --components false --group all --o-name result",
"visits" : -1,
"status" : "READY",
"outDirId" : "6",
"tmpOutDirUri" : "file:///home/cafetero/opencga/catalog/jobs/J_KrOrWfEwkx/",
"input" : [
66
],
"tags" : [ ],
"output" : [
658,
659
],
"attributes" : { },
"executionAttributes" : {
"type" : "analysis",
"jobExecutionId" : "268",
"executionManager" : "SGE",
"qname" : "normal.q",
"group" : "cafetero",
"jobname" : "network-miner_Test_job",
"end_time" : "Wed Dec 10 11:10:06 2014",
"jobnumber" : 268,
"failed" : 0,
"start_time" : "Wed Dec 10 11:10:06 2014",
"hostname" : "host001",
"qsub_time" : "Wed Dec 10 11:10:00 2014",
"mem" : "0.000",
"cpu" : "0.049",
"exit_status" : 0
}
}
Example of a sample:
{
"id" : 19,
"name" : "SMP00096",
"source" : "",
"individual" : null,
"description" : " ... ",
"annotationSets" : [
{
"name" : "Basic annotation",
"variableSetId" : 21,
"annotations" : [
{ "id" : "NAME", "value" : "Glennie the platypus" },
{ "id" : "BORN-DATE", "value" : "20071000000000" },
{ "id" : "GENDER", "value" : "FEMALE" },
{ "id" : "PHEN", "value" : "CASE" },
{ "id" : "WEIGHT", "value" : 25.38 }
],
"date" : "20141216135957",
"attributes" : { }
}
]
}
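Each annotation set stores its annotations as a list of id/value pairs. For lookups by variable id, a client may prefer a dictionary. The sketch below assumes annotation ids are unique within a set; `annotations_as_dict` is a hypothetical helper name.

```python
def annotations_as_dict(annotation_set):
    """Map each annotation id to its value for one annotation set."""
    # Assumption: ids are unique within a set; a repeated id would keep the last value.
    return {a["id"]: a["value"] for a in annotation_set["annotations"]}


annotation_set = {
    "name": "Basic annotation",
    "variableSetId": 21,
    "annotations": [
        {"id": "NAME", "value": "Glennie the platypus"},
        {"id": "GENDER", "value": "FEMALE"},
        {"id": "WEIGHT", "value": 25.38},
    ],
}
values = annotations_as_dict(annotation_set)
print(values["GENDER"])  # FEMALE
```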