NetWAS Guide for Python

Alicja Tadych edited this page Jul 27, 2017 · 16 revisions

Version 1.0

The GIANT API Python module can be used to run NetWAS analyses from Python.


Prerequisites

Requests

The Requests module is needed to communicate with the GIANT servers. It can be installed using:

$ pip install requests

Pandas (Python Data Analysis Library) - optional

NetWAS job results can be analyzed using Pandas data frames. Scientific Python distributions such as Anaconda already include Pandas. Otherwise, it can be installed using:

$ pip install pandas


Installation

$ git clone https://github.com/FunctionLab/giant-api.git
$ cd giant-api/client/python

To use the client, add the line from giantapi import NetwasJob to your Python session or script.
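
The installation steps above only clone the repository, so the import presumably resolves only when Python is started from giant-api/client/python. For scripts that live elsewhere, one option is to put that directory on sys.path first. The following is a minimal sketch, not part of the giantapi client: add_client_to_path is a hypothetical helper, and the clone location ~/giant-api/client/python is an assumption that should be adjusted to wherever the repository was cloned.

```python
import os
import sys


def add_client_to_path(client_dir):
    """Put the cloned client directory on sys.path if it is not there yet."""
    client_dir = os.path.abspath(os.path.expanduser(client_dir))
    if client_dir not in sys.path:
        sys.path.insert(0, client_dir)
    return client_dir


# Assumed clone location; adjust to wherever `git clone` was run.
client_dir = add_client_to_path("~/giant-api/client/python")

# Once the path above is correct, the import from this guide should work:
# from giantapi import NetwasJob
```

Calling the helper more than once is harmless, because the directory is added only if it is missing.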


Usage

Example 1

After installation, start a Python session.

$ python

Import NetwasJob from the giantapi module.

>>> from giantapi import NetwasJob

Define a new NetWAS job.

>>> nj1 = NetwasJob(title="NetWAS Python client demo I",
                    gwas_file="./bmi-2012.out",
                    gwas_format="vegas",
                    tissue="api-demo",
                    p_value=0.01)

The keyword argument tissue='api-demo' will invoke a special demo mode on the GIANT servers, simulating a short NetWAS job. (Ordinary NetWAS jobs take up to 30 minutes to complete.)

You can optionally supply an email address with the email keyword argument; a notification of the job results will be sent to that address.

Job status can be checked simply by displaying (or printing) the job object.

>>> nj1
{
    "id": null,
    "created": null,
    "title": "NetWAS Python client demo I",
    "email": null,
    "gwas_file": "./bmi-2012.out",
    "gwas_format": "vegas",
    "tissue": "api-demo",
    "p_value": 0.01,
    "log_file": null,
    "result_file": null,
    "status": null
}

To start the SVM training, call the job's start() method.

>>> nj1.start()

Job status can be checked by printing the job as usual.

>>> nj1
{
    "id": "d32492a5-9dbd-41c3-a15f-9fa60a43c93c",
    "created": "2015-08-06T19:50:38.093858Z",
    "title": "NetWAS Python client demo I",
    "email": null,
    "gwas_file": "http://giant-api.princeton.edu/media/uploads/.../bmi-2012.out",
    "gwas_format": "vegas",
    "tissue": "api-demo",
    "p_value": 0.01,
    "log_file": "http://giant-api.princeton.edu/media/results/.../log.txt",
    "result_file": null,
    "status": "running"
}

The training log can be viewed using:

>>> print(nj1.log())

Reading genes
NEW Class array
Cross Validation Trial 0
SLACK NORM =1
ALG=3
Learned
Classified 3103 examples
NEW Class array
Cross Validation Trial 1
SLACK NORM =1
...

After a short while, results of the demo will be available.

>>> nj1
{
    "id": "d32492a5-9dbd-41c3-a15f-9fa60a43c93c",
    "created": "2015-08-06T19:50:38.093858Z",
    "title": "NetWAS Python client demo I",
    "email": null,
    "gwas_file": "http://giant-api.princeton.edu/media/uploads/.../bmi-2012.out",
    "gwas_format": "vegas",
    "tissue": "api-demo",
    "p_value": 0.01,
    "log_file": "http://giant-api.princeton.edu/media/results/.../log.txt",
    "result_file": "http://giant-api.princeton.edu/media/results/.../result.txt",
    "status": "completed"
}
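
Because ordinary jobs can take up to 30 minutes, re-displaying the job by hand can be replaced by a small polling loop. The following is a minimal Python 3 sketch, not part of the giantapi client. The name wait_for_status and all of its parameters are hypothetical, and get_status stands for whatever zero-argument function you write to read the current status string. This guide only shows the status inside the printed job, so how to read it programmatically is left as an assumption. Only the "running" and "completed" values shown above are known; any failure status the server may report is not handled here.

```python
import time


def wait_for_status(get_status, done=("completed",), interval=10.0,
                    timeout=1800.0, sleep=time.sleep, clock=time.time):
    """Poll get_status() until it returns a value in `done`.

    Returns the final status, or raises RuntimeError once `timeout`
    seconds have passed without reaching a finished state.
    """
    deadline = clock() + timeout
    while True:
        status = get_status()
        if status in done:
            return status
        if clock() >= deadline:
            raise RuntimeError("job still %r after %.0f s" % (status, timeout))
        sleep(interval)  # wait before asking again


# Stand-in for a real job: a canned sequence of status values.
fake_statuses = iter([None, "running", "running", "completed"])
final = wait_for_status(lambda: next(fake_statuses), interval=0.0)
print(final)
```

The default timeout of 1800 seconds mirrors the 30-minute upper bound mentioned earlier; the sleep and clock parameters exist only so the loop can be exercised without real waiting.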

The result file can be downloaded from the given URL, or its contents can be retrieved as tab-separated text with the result() method (Example 2 reads this text into a Pandas data frame):

>>> result = nj1.result()
>>> for line in result.splitlines()[0:10]:
        print(line)

gene	class	z_score
FOXI2	-1	0.507116
DHCR24	-1	0.498371
MRPL12	-1	0.471051
NMI	-1	0.453594
TWF1	-1	0.452324
ZBTB24	-1	0.451116
FGF2	-1	0.438866
GOLM1	-1	0.38386
COX5B	-1	0.381948
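
Since Pandas is optional, the tab-separated text can also be parsed with the standard library alone. The following is a minimal Python 3 sketch under that assumption: parse_netwas_result is a hypothetical helper, not part of the giantapi client, and the sample text simply repeats the first rows of the output shown above.

```python
import csv
import io


def parse_netwas_result(text):
    """Turn tab-separated NetWAS result text into a list of row dicts."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    rows = []
    for row in reader:
        rows.append({
            "gene": row["gene"],
            "class": int(row["class"]),        # class label, e.g. -1
            "z_score": float(row["z_score"]),  # score as a number
        })
    return rows


# First rows of the demo output above, used here as sample input.
sample = (
    "gene\tclass\tz_score\n"
    "FOXI2\t-1\t0.507116\n"
    "DHCR24\t-1\t0.498371\n"
    "MRPL12\t-1\t0.471051\n"
)
rows = parse_netwas_result(sample)
top = max(rows, key=lambda r: r["z_score"])
print(top["gene"], top["z_score"])
```

With a real job, the sample string would be replaced by the text returned from nj1.result().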


Example 2

The second example illustrates NetWAS with actual networks. Define two jobs with the same GWAS file but on different tissues.

>>> nj1 = NetwasJob(title="GIANT API Python Client Example - Heart",
                    gwas_file="./bmi-2012.out",
                    gwas_format="vegas",
                    tissue="heart",
                    p_value=0.05)
>>> nj2 = NetwasJob(title="GIANT API Python Client Example - Muscle",
                    gwas_file="./bmi-2012.out",
                    gwas_format="vegas",
                    tissue="muscle",
                    p_value=0.01)

Jobs can be started and their progress monitored.

>>> nj1.start()
>>> nj2.start()

>>> print(nj1.log())
...
>>> print(nj2.log())
...

When the jobs are completed, and if pandas is installed, you can read their results into data frames.

>>> try:
        from StringIO import StringIO  # Python 2
    except ImportError:
        from io import StringIO        # Python 3

>>> import pandas as pd
>>> # DataFrame.from_csv was removed in pandas 1.0; read_csv with index_col=0 replaces it
>>> df1 = pd.read_csv(StringIO(nj1.result()), sep='\t', index_col=0)
>>> df2 = pd.read_csv(StringIO(nj2.result()), sep='\t', index_col=0)

>>> df1.head()

        class   z_score
gene                   
NAP1L1      -1  2.06044
LMF1        -1  1.81052
PPA2        -1  1.80502
SOS1        -1  1.69904
TMEM30A     -1  1.66365

>>> df2.head()

       class   z_score
gene                   
KRT6B      -1  0.817125
EFTUD2     -1  0.793585
CDKN1B     -1  0.792667
ITGA2B     -1  0.725828
DLAT       -1  0.686413

Result data frames can be joined on the gene column for comparison.

>>> dfc = df1.join(df2, lsuffix='_heart', rsuffix='_muscle')
>>> dfc.head()

         class_heart  z_score_heart  class_muscle  z_score_muscle
gene                                                         
NAP1L1            -1      2.06044            -1     -0.076636
LMF1              -1      1.81052            -1      0.054080
PPA2              -1      1.80502            -1     -0.552529
SOS1              -1      1.69904            -1     -0.311439
TMEM30A           -1      1.66365            -1     -0.003742

Results can be filtered or processed using the standard Pandas data frame functions. For example, we can filter for genes whose network-prioritized rankings moved in the same direction for both tissues (although the z-scores in this particular example were too small to be meaningful).

>>> dfc[(dfc['class_heart'] * dfc['class_muscle'] > 0) & 
        (dfc['z_score_heart' ] > 0.5) & 
        (dfc['z_score_muscle'] > 0.5)]

        class_heart  z_score_heart  class_muscle  z_score_muscle
gene
SNRPN            -1       1.153500            -1        0.615083
IQGAP1           -1       0.534167            -1        0.685550
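
The same join-and-filter logic can be sketched without Pandas. The following is a minimal Python 3 sketch: join_and_filter is a hypothetical helper, the {gene: (class, z_score)} layout is an assumption made for illustration, and the sample values are copied from the outputs shown above. It keeps only genes present in both results, which matches the Pandas filter here because rows missing a muscle z-score fail the threshold anyway.

```python
def join_and_filter(heart, muscle, min_z=0.5):
    """Join two {gene: (class, z_score)} dicts on gene and keep genes
    whose classes agree in sign and whose z-scores both exceed min_z."""
    kept = {}
    for gene in heart.keys() & muscle.keys():  # genes present in both
        class_h, z_h = heart[gene]
        class_m, z_m = muscle[gene]
        if class_h * class_m > 0 and z_h > min_z and z_m > min_z:
            kept[gene] = (class_h, z_h, class_m, z_m)
    return kept


# Values copied from the heart and muscle outputs shown in this example.
heart = {
    "NAP1L1": (-1, 2.06044),
    "SNRPN": (-1, 1.153500),
    "IQGAP1": (-1, 0.534167),
}
muscle = {
    "NAP1L1": (-1, -0.076636),
    "SNRPN": (-1, 0.615083),
    "IQGAP1": (-1, 0.685550),
}
hits = join_and_filter(heart, muscle)
print(sorted(hits))
```

NAP1L1 is dropped because its muscle z-score is below the threshold, leaving the same two genes as the Pandas result.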

Testing

Source code for the two examples is provided in the demo.py and netwas.py scripts, respectively. These can be executed using:

$ python -i demo.py
$ python -i netwas.py

Retrieving old jobs from the GIANT server

Use the class method NetwasJob.from_server() with a valid job ID to retrieve that job from the GIANT server. This allows checking the progress and results of long-running jobs from subsequent Python sessions. (Users are still responsible for keeping track of IDs of any NetWAS jobs they create.)
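
Since keeping track of job IDs is left to the user, one simple approach is to record each ID in a small local file right after starting the job. The following is a minimal Python 3 sketch, not part of the giantapi client: remember_job, load_jobs, and the file name are all hypothetical, and the ID is assumed to be copied from the printed job.

```python
import json
import os


def load_jobs(path):
    """Return the saved {job_id: title} mapping, or {} if none exists."""
    if not os.path.exists(path):
        return {}
    with open(path) as fh:
        return json.load(fh)


def remember_job(path, job_id, title):
    """Add one job ID and its title to the JSON registry file."""
    jobs = load_jobs(path)
    jobs[job_id] = title
    with open(path, "w") as fh:
        json.dump(jobs, fh, indent=2)
```

For instance, calling remember_job("netwas_jobs.json", job_id, "NetWAS job retrieval demo") after start() stores the pair, and load_jobs("netwas_jobs.json") in a later session lists the IDs that can be passed to NetwasJob.from_server().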

Example

Define and start a new NetWAS job, record its ID, then exit Python.

>>> nj1 = NetwasJob(title="NetWAS job retrieval demo",
                    gwas_file="./bmi-2012.out",
                    gwas_format="vegas",
                    tissue="api-demo",
                    p_value=0.01)
>>> nj1.start()
>>> nj1
{
    "id": "a99e5848-ee59-4e55-a8d2-95186441634c",
    ...
    "status": "running",
    ...
}
>>> exit()

Now start a new Python session and retrieve a previously created job using its ID.

>>> from giantapi import NetwasJob
>>> nj1 = NetwasJob.from_server('1461585d-89d6-4c63-a0b1-d40d32cccce7')
>>> nj1
{
    "id": "1461585d-89d6-4c63-a0b1-d40d32cccce7",
    "created": "2017-07-26T18:47:32Z",
    "title": "GIANT API NetWAS Example in Python",
    "email": "",
    "gwas_file": "http://giant-api.princeton.edu/media/netwas/uploads/1461585d-89d6-4c63-a0b1-d40d32cccce7/bmi-2012.out",
    "tissue": "api-demo",
    "p_value": 0.01,
    "log_file": "http://giant-api.princeton.edu/media/netwas/results/1461585d-89d6-4c63-a0b1-d40d32cccce7/log.txt",
    "results_file": "http://giant-api.princeton.edu/media/netwas/results/1461585d-89d6-4c63-a0b1-d40d32cccce7/results.txt",
    "status": "completed"
}

>>> result = nj1.result()
>>> for line in result.splitlines()[0:15]:
        print(line)

gene	class	z_score
FOXI2	-1	0.507116
DHCR24	-1	0.498371
MRPL12	-1	0.471051
NMI	-1	0.453594
TWF1	-1	0.452324
ZBTB24	-1	0.451116
FGF2	-1	0.438866
GOLM1	-1	0.38386
COX5B	-1	0.381948
LRRC27	-1	0.378153
NPNT	-1	0.37708
CNPY3	-1	0.376211
RPL29	-1	0.374641
RPL21	-1	0.374403