Add news item about KM/C-PH survival analysis tool using the lifelines package #2112

Merged: 3 commits, Aug 12, 2023
39 changes: 9 additions & 30 deletions content/news/2023-08-07-generic-tabular-plotter/index.md
@@ -40,14 +40,16 @@ This Iris test data is relatively tiny with about 150 rows, so the points
are nicely separated, making the hover information easy to use.

Interactive html plots work best for at most, a few thousand well spread points,
-so the hover display is easy to control.
+so the hover display is easy to control. They reliably freeze a recent Firefox build at 10k rows, so this tool
+now always fails if >5k rows are chosen for html output. Advice to this effect has been
+added to the form.

-Interactive html output is available in stand alone format, where 3MB of javascript is included,
+For <5k rows of data, interactive html output is available in stand alone format, where 3MB of javascript is included,
allowing it to be viewed offline. Short form html requires an internet connection to download the
javascript into the browser so cannot be viewed offline.

-PNG plots are recommended for large numbers of rows, since the the hover function tends to be less useful
-when the plot is very crowded.
+Only PNG output options will work for large numbers of rows, since the hover function tends to be less useful
+when the plot is very crowded, and large html outputs can make browser windows freeze up.

If the tabular data does not have a header row of column names, the user can supply and use a
comma delimited list, as the "header" parameter on the tool form.
@@ -64,31 +66,8 @@ transformed evalues seem highly collinear with bitscores, in the few samples tes

### Installation for testing

-The [plotly_tabular_plot](https://toolshed.g2.bx.psu.edu/repository/browse_repository?id=a4961ff57ce13935) tool, owned by fubar, is available for testing, in the main Galaxy Toolshed.
-It is very new and so not suitable for production use yet. Please let me know if it works for you.
-
-### Tool code
-
-The tool code is available for review at the <a href="https://github.com/fubar2/plotly_tabular_tool">github repository</a> where issues should
-be raised when there are problems or suggestions. This is machine generated code, so pull requests don't
-make much sense. The generator can be rerun with simple changes easily so please suggest
-any useful things you'd like to see.
-
-### Regenerating and editing the tool
-
-This tool was created using the <a href="https://github.com/fubar2/galaxy_tf_overlay">ToolFactory</a> automated, form driven code generator
-described in this <a href="https://training.galaxy.lazarus.name/training-material/topics/dev/tutorials/tool-generators/tutorial.html">training module.</a>
-
-This generic tabular file plotter tool is now included in the ToolFactory docker or local installation, in a built-in history containing 3
-advanced examples.
-
-The code and test for the tool in the Toolshed can be regenerated by re-running the supplied history job in the ToolFactory.
-
-The regenerated tool form can be edited, changing the tool_id to create a new tool. The new tool could be specialised for
-certain kinds of tabular data, such as this [25 column Galaxy Blast search output plotter](https://github.com/fubar2/plotly_blast_tool). It
-was generated using the generic tabular plotter as the starting point, adding a default header and
-a transformation of the evalues, to make it better suited to that specific kind of data. That generated tool is also included in the
-ToolFactory built-in advanced history.
-
+The [plotly_tabular_plot](https://toolshed.g2.bx.psu.edu/repository/browse_repository?id=a4961ff57ce13935) tool, owned by fubar,
+is available for testing in the main Galaxy Toolshed.
+It is very new and so not suitable for production use yet. Please let me know if it works for you at the [github repository](https://github.com/fubar2/plotly_tabular_tool).


59 changes: 59 additions & 0 deletions content/news/2023-08-11-lifelineskmcph/index.md
@@ -0,0 +1,59 @@
---
title: "Survival analysis for right censored data using lifelines"
date: "2023-08-11"
authors: Ross Lazarus
authors_structured:
- github: fubar2
tease: "Kaplan-Meier and Cox proportional hazards models are available for testing in Galaxy"
hide_tease: true
subsites: [all]
---

A wrapper for the [lifelines](https://lifelines.readthedocs.io/en/latest/Survival%20Analysis%20intro.html) package is available. It:

1. Runs a Kaplan-Meier analysis and generates a KM plot with 95% confidence intervals.
2. If a grouping variable is provided, produces curves by group.
3. If there are exactly 2 groups, runs a log-rank test of the hypothesis of no difference.
4. If covariates are provided, runs a Cox proportional hazards model, tests the proportionality assumption,
and generates partial plots for each covariate.

Any Galaxy tabular data with columns containing time and status, in a format suitable for pandas and lifelines, can be used as input.
Time might be an integer number of months since a treatment. Status might be 0 for no failure at observation time, and 1 for death or failure.
Other columns can be used as groups for KM, or as covariates for Cox-PH.

If the data has no header row, the default column names are col1, ..., coln, unless a header parameter containing the column names in order,
delimited with ",", is supplied on the tool form.

Whatever the source of column names, they must match the ones provided as parameters.
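
The naming rule above can be sketched in a few lines of plain Python. This is an illustration only, not the tool's actual code: the function names `resolve_column_names` and `check_requested_columns` are hypothetical, and the error behaviour is an assumption.

```python
def resolve_column_names(n_columns, header=None):
    """Return column names for a table that has no header row.

    header is the optional comma-delimited string from the tool form.
    """
    if header:
        names = [name.strip() for name in header.split(",")]
        if len(names) != n_columns:
            raise ValueError(
                f"header has {len(names)} names but the data has {n_columns} columns"
            )
        return names
    # Default naming described in the text: col1 ... coln
    return [f"col{i}" for i in range(1, n_columns + 1)]


def check_requested_columns(names, requested):
    """Fail early if a column named as a tool parameter is not in the table."""
    missing = [col for col in requested if col not in names]
    if missing:
        raise ValueError(f"columns not found in input: {missing}")


names = resolve_column_names(3, header="week, arrest, race")
check_requested_columns(names, ["week", "arrest"])
print(names)
```

Checking the requested names before any model is fitted gives a clearer error than a failed column lookup later on.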

### Using the Rossi recidivism data from the lifelines tutorials

With race as a grouping variable, the report shows a log-rank test result.

![KM plot sample](lifelines_rossi_km.png)

A comma separated list of covariate column names (prio, age, race, mar, fin) was provided,
so a Cox-PH model is run, the proportionality assumption is tested, and recommendations are made
in the text report.

![KM plot sample](lifelines_report.png)

For each covariate, a Schoenfeld diagnostic plot is produced in a history collection.

![KM plot sample](lifelines_rossi_schoenfeld.png)

Partial plots are produced for each covariate. Covariates with more than 10 distinct values are split into quintiles.
Quintiles are meaningless for non-ordinal categories with more than 10 values, but should work for ordinal ones.
Covariates with 10 or fewer distinct values are used as is.
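
The binning rule can be illustrated with the standard library alone. The helper `bin_for_partial_plot` is hypothetical, and the tool may compute its quintiles differently, for example with pandas.

```python
from statistics import quantiles


def bin_for_partial_plot(values, max_distinct=10):
    """Return plotting levels: raw values if few, else quintile numbers 1-5."""
    if len(set(values)) <= max_distinct:
        return list(values)
    cuts = quantiles(values, n=5)  # four cut points separating five bins
    # A value's bin is 1 plus the number of cut points strictly below it
    return [1 + sum(v > c for c in cuts) for v in values]


ages = list(range(17, 45))   # 28 distinct values, so quintiles are used
married = [0, 1, 1, 0, 1]    # 2 distinct values, used as is
print(sorted(set(bin_for_partial_plot(ages))))
print(bin_for_partial_plot(married))
```

Quintile numbers only make sense when the values have a natural order. That is why an unordered category with many levels produces meaningless bins.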

![C-PH partial plot samples](agepartialrossi.png)

![C-PH partial plot samples](parolepartialrossi.png)

A tabular survival table and life table are written to the collection.

### Installation for testing

The [lifelines tool](https://toolshed.g2.bx.psu.edu/view/fubar/lifelines_km_cph_tool/dd5e65893cb8), owned by fubar,
is available for testing in the main Galaxy Toolshed. It is very new, so it is not yet suitable for production use.
Please let me know if it works for you at the [github repository](https://github.com/fubar2/lifelines_tool).