SEPIA is a framework for comparing the accuracies of algorithms that prioritize individuals by risk of transmitting HIV (Human Immunodeficiency Virus).
SEPIA is written in Python 3 and requires the following dependencies:
sudo apt-get update
sudo apt-get install python3-pip
pip3 install numpy
pip3 install scipy
pip3 install matplotlib
pip3 install seaborn
In addition, efficacy_functions.py imports the following modules; gzip, sys, and itertools ship with the Python standard library, while the rest come from the packages installed above:
from gzip import open as gopen
from sys import stderr
import numpy as np
import scipy.stats as stats
import matplotlib.pyplot as plt
from itertools import repeat
SEPIA is designed to be used through the bash interface. To install SEPIA, clone the directory to the desired location:
git clone https://github.com/Moshiri-Lab/SEPIA.git
SEPIA.py matches each individual in the user's ordering with the number of people that individual infected during a specified period of time. It then computes the Kendall tau-b correlation coefficient between the user's ordering and the optimal ordering.
usage: SEPIA.py [-h] -m METRIC [-i INPUT] [-t TRANSMISSIONHIST]
[-c CONTACTNET] -s START [-e END] [-v]
File takes in a prioritization ordering and runs through the SEPIA workflow to
output the Kendall Tau B correlation coefficient between their ordering and
the most optimal ordering, as generated by the chosen metric. If verbose flag
is specified, intermediate data in the process can be outputted to stderr.
-h, --help show this help message and exit
-m METRIC, --metric METRIC
Metric of prioritization (1-6) (default: None)
-i INPUT, --input INPUT
Input File - User's Ordering (default: stdin)
-t TRANSMISSIONHIST, --transmissionHist TRANSMISSIONHIST
Transmission History File (default: )
-c CONTACTNET, --contactNet CONTACTNET
Contact History File (default: )
-s START, --start START
Time Start (default: None)
-e END, --end END Time End (default: inf)
-v, --verbose Print Intermediate List with Individuals Matched to
Counts (default: False)
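To make the reported statistic concrete, the sketch below computes Kendall's tau-b between a user's ordering and the counts that define an optimal ordering. It is an illustration rather than SEPIA's own code: the individuals, counts, and ordering are made up, and `kendall_tau_b` is a hypothetical helper written in plain Python. SciPy's `scipy.stats.kendalltau` computes the same tau-b variant by default.

```python
from itertools import combinations
from math import sqrt

def kendall_tau_b(x, y):
    """Kendall tau-b between two equal-length numeric sequences (handles ties)."""
    concordant = discordant = ties_x = ties_y = 0
    pairs = list(combinations(zip(x, y), 2))
    for (x1, y1), (x2, y2) in pairs:
        dx, dy = x1 - x2, y1 - y2
        if dx == 0:
            ties_x += 1
        if dy == 0:
            ties_y += 1
        if dx != 0 and dy != 0:
            if dx * dy > 0:
                concordant += 1
            else:
                discordant += 1
    denom = sqrt((len(pairs) - ties_x) * (len(pairs) - ties_y))
    return (concordant - discordant) / denom if denom else float("nan")

# Hypothetical example data (not from SEPIA).
counts = {"A": 4, "B": 0, "C": 1, "D": 2, "E": 1}  # people infected in the window
user_order = ["A", "C", "D", "E", "B"]             # highest priority first

# Position in the user's ordering: 0 is the highest priority.
user_rank = list(range(len(user_order)))
# Negate counts so that a larger count corresponds to an earlier position.
optimal_score = [-counts[person] for person in user_order]

tau = kendall_tau_b(user_rank, optimal_score)
print(f"Kendall tau-b: {tau:.4f}")
```

A value near 1 means the user's ordering agrees closely with the optimal one, and a value near -1 means it is close to the reverse.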
efficacyFunctions.py defines several functions used in the scripts above.
We have implemented six distinct metrics to generate optimal orderings, with each defining a unique way of calculating the count values of individuals such that individuals with higher counts will have higher priority in the ordering.
In Metric 1, each individual's count is the number of individuals to whom they have directly (one edge away) transmitted HIV.
The figure below illustrates an example transmission network, with arrows indicating a transmission from one person (node) to another:
In this example, Person A has four outgoing edges, indicating that Person A transmitted HIV to four people and has a direct transmission count of 4. Similarly, Person B has no outgoing edges, so Person B's count is 0.
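As a minimal sketch of this counting rule, the snippet below tallies outgoing edges per person within a time window. The record layout `(source, target, time)`, the function name `direct_counts`, the transmission times, and the inclusive window bounds are all assumptions made for illustration; only Person A's four transmissions and Person B's lack of transmissions come from the example.

```python
from collections import Counter

# Hypothetical records: (source, target, time). The times are made up.
transmissions = [
    ("A", "B", 1.0),
    ("A", "C", 2.0),
    ("A", "D", 3.0),
    ("A", "E", 4.0),
]

def direct_counts(transmissions, start, end=float("inf")):
    """Count outgoing transmissions per source with start <= time <= end."""
    counts = Counter()
    for source, _target, time in transmissions:
        if start <= time <= end:
            counts[source] += 1
    return counts

counts = direct_counts(transmissions, start=0.0)
print(counts["A"], counts["B"])  # a Counter reports 0 for absent individuals
```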
In Metric 2, each individual's count is the slope of a best-fit line through a step graph of the individual's outgoing transmissions, with the specified time period on the horizontal axis. The best-fit line starts at the individual's first transmission to someone else, so individuals who transmit HIV to more people over a short time period have steeper slopes and receive higher priority.
With this metric, we aim to account for the idea that individuals who transmitted HIV to others more recently should have higher priority than individuals whose transmissions occurred longer ago.
The following figure shows the resulting lines of best-fit for two cases:
The graph on the left represents a case in which the individual started transmitting HIV more recently, whereas the graph on the right represents a case in which the individual had multiple outgoing transmissions early in the time period but stopped towards the middle. This design gives higher priority to the individual in the left graph, who has multiple recent outgoing transmissions, because their slope is greater.
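One way to reproduce this behavior is sketched below. It is not SEPIA's implementation: the function name `step_slope`, the example times, and in particular the choice to sample the step graph at each transmission time plus the end of the window are assumptions. Under those assumptions, an ordinary least-squares fit gives a steeper slope to the individual whose transmissions are recent.

```python
def step_slope(times, end):
    """Least-squares slope of cumulative transmissions versus time.

    Assumption: the step graph is sampled at each transmission time and once
    more at the end of the window, so the fit starts at the first transmission.
    """
    times = sorted(t for t in times if t <= end)
    if not times:
        return 0.0
    xs = times + [end]
    ys = list(range(1, len(times) + 1)) + [len(times)]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        return 0.0  # all sample points share one time, so no slope is defined
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx

recent = step_slope([7, 8, 9], end=10)  # transmissions late in the window
early = step_slope([1, 2, 3], end=10)   # transmissions early, then none
print(recent > early)
```

Because the early transmitter's line must also cover the long flat stretch after their last transmission, its slope is pulled down, which matches the prioritization described above.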
Metric 3 extends Metric 1 in order to analyze an individual's broader impact on the community.
Each individual's count is the cumulative number of individuals to whom they indirectly (more than 1 edge away) transmitted HIV, up to a given number of degrees away.
For instance, in the example transmission network from (1), Person A (highlighted in red) directly transmitted HIV to Persons B, C, D, and E (highlighted in yellow), who then transmitted HIV to Persons F, G, H, and I (highlighted in blue), who then transmitted HIV to Persons J, K, L, M, N, and O (highlighted in green). Thus, given the number of degrees away = 2 (2 edges away in the figure), Person A's indirect transmissions are to Persons F, G, H, and I, for a count of 4. Similarly, given the number of degrees away = 3, Person A's indirect transmissions are to Persons F, G, H, I, J, K, L, M, N, and O, for a count of 10.
Metric 4 merges Metrics 1 and 3 to take into account each individual's direct and indirect transmissions.
Each individual's count is the cumulative number of individuals to whom they have directly (1 edge away) and indirectly (2+ edges away) transmitted HIV, up to a given number of degrees away.
In the example network from (1), at 2 degrees away, Person A (highlighted in red) has 4 direct transmissions (to Persons B, C, D, and E) and 4 indirect transmissions (to Persons F, G, H, and I) for a total count of 8. Similarly, at 3 degrees away, Person A has 4 direct transmissions and 10 indirect transmissions (Persons F, G, H, I, J, K, L, M, N, O) for a total count of 14.
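The indirect and combined counts can both be read off a breadth-first search that records how many people are first reached at each depth. The sketch below is illustrative only: the function names are hypothetical, and the adjacency list is partly guessed. The text fixes Person A's four transmissions, Person B having none, Person D's transmissions to G and H, and which people sit at each depth; the remaining edges are assumptions.

```python
from collections import deque

# Partly assumed wiring of the example transmission network.
network = {
    "A": ["B", "C", "D", "E"],
    "C": ["F"], "D": ["G", "H"], "E": ["I"],
    "F": ["J"], "G": ["K", "L"], "H": ["M"], "I": ["N", "O"],
}

def counts_by_depth(network, source, max_depth):
    """Return {depth: number of individuals first reached at that depth}."""
    seen = {source}
    per_depth = {}
    queue = deque([(source, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not expand beyond the requested number of degrees
        for child in network.get(node, []):
            if child not in seen:
                seen.add(child)
                per_depth[depth + 1] = per_depth.get(depth + 1, 0) + 1
                queue.append((child, depth + 1))
    return per_depth

def indirect_count(network, source, max_depth):
    """Individuals reached at 2 or more edges away (indirect only)."""
    per_depth = counts_by_depth(network, source, max_depth)
    return sum(n for depth, n in per_depth.items() if depth >= 2)

def total_count(network, source, max_depth):
    """Individuals reached at 1 or more edges away (direct plus indirect)."""
    return sum(counts_by_depth(network, source, max_depth).values())

print(indirect_count(network, "A", 2), indirect_count(network, "A", 3))
print(total_count(network, "A", 2), total_count(network, "A", 3))
```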
Metric 5 measures an individual's priority based on their number of contacts, with an edge in a contact network existing between any two individuals who have a relationship through which HIV may be transmitted.
The figure below illustrates an example contact network corresponding to the transmission network in previous examples:
In this example, Person A has undirected edges between themself and Persons B, C, D, and E, so Person A has a count of 4. Similarly, Person B has undirected edges between themself and Persons A, R, and S, so Person B has a count of 3.
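Counting contacts amounts to computing each node's degree in an undirected graph, as in the sketch below. The edge-list representation and the name `contact_counts` are assumptions, and the list holds only the contact edges that this section states for Persons A, B, and D; the network in the figure contains more.

```python
from collections import defaultdict

# Contact edges stated in the text for Persons A, B, and D.
contact_edges = [
    ("A", "B"), ("A", "C"), ("A", "D"), ("A", "E"),
    ("B", "R"), ("B", "S"),
    ("D", "G"), ("D", "H"), ("D", "P"),
]

def contact_counts(edges):
    """Number of distinct contacts per person in an undirected edge list."""
    neighbors = defaultdict(set)
    for u, v in edges:
        neighbors[u].add(v)  # each undirected edge counts for both endpoints
        neighbors[v].add(u)
    return {person: len(nbrs) for person, nbrs in neighbors.items()}

degrees = contact_counts(contact_edges)
print(degrees["A"], degrees["B"])
```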
Metric 6 combines Metrics 1 and 5 in order to take into account each individual's number of direct transmissions and number of contacts.
In the example transmission and contact networks from (1) and (5), Person D has direct transmissions to Persons G and H, and is in contact with Persons A, G, H, and P, so Person D has a total count of 6.
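The arithmetic for Person D can be sketched as a plain sum of the two counts. The use of `Counter` objects here is an assumption for illustration, not a description of SEPIA's internals; the two input values come from the example above.

```python
from collections import Counter

# Person D's values as described in the example.
direct = Counter({"D": 2})    # direct transmissions to G and H
contacts = Counter({"D": 4})  # contacts A, G, H, and P

# Adding Counters sums the values key by key.
combined = direct + contacts
print(combined["D"])
```

Note that Counter addition discards entries whose total is zero; looking up such an individual still yields 0, so the resulting ordering is unaffected.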