
Function `baltic.loadJSON`

Barney Potter edited this page Oct 14, 2024 · 1 revision

loadJSON() Function

Description

The loadJSON() function in BALTIC loads a Nextstrain JSON and creates a BALTIC tree object from it. It accepts a path to a local JSON file, a URL to a Nextstrain JSON, or an already-loaded JSON object, and provides options for attribute translation and tree processing.

Syntax

def loadJSON(json_object, json_translation={'name':'name','absoluteTime':'num_date'},
             verbose=False, sort=True, stats=True)

Parameters

  • json_object (str or dict): The path to a local JSON file, a URL to a Nextstrain JSON, or an already-parsed JSON object (dict).
  • json_translation (dict): A dictionary that translates JSON attributes to tree attributes. Default is {'name': 'name', 'absoluteTime': 'num_date'}.
  • verbose (bool): If True, prints verbose output during the process. Default is False.
  • sort (bool): If True, sorts the branches of the tree after loading. Default is True.
  • stats (bool): If True, calculates tree statistics after loading. Default is True.
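
To make the role of json_translation concrete, here is a minimal sketch of what such a dictionary expresses for a single node. It is not BALTIC's implementation: translate_node is a hypothetical helper, the node layout (a top-level name plus a node_attrs block whose entries may be wrapped as {"value": ...}) is an assumption based on Nextstrain's Auspice v2 format, and the 'height': 'div' mapping is only an illustration of a non-default translation.

```python
# Hypothetical helper, not part of BALTIC: shows how a translation dictionary
# maps JSON keys onto tree attribute names for one Auspice-v2-style node.
def translate_node(node, json_translation):
    """Return {tree_attribute: value} for a single JSON node."""
    node_attrs = node.get("node_attrs", {})
    translated = {}
    for tree_attr, json_key in json_translation.items():
        if json_key in node:                    # top-level key, e.g. "name"
            value = node[json_key]
        elif json_key in node_attrs:            # nested key, e.g. "num_date"
            value = node_attrs[json_key]
            if isinstance(value, dict) and "value" in value:
                value = value["value"]          # unwrap {"value": ...}
        else:
            continue                            # key absent on this node
        translated[tree_attr] = value
    return translated


# Made-up node used only for illustration.
node = {"name": "tip_A", "node_attrs": {"num_date": {"value": 2015.5}, "div": 0.002}}

default_translation = {"name": "name", "absoluteTime": "num_date"}
divergence_translation = {"name": "name", "height": "div"}  # assumed alternative

print(translate_node(node, default_translation))
# {'name': 'tip_A', 'absoluteTime': 2015.5}
print(translate_node(node, divergence_translation))
# {'name': 'tip_A', 'height': 0.002}
```

The keys of the dictionary are the attribute names wanted on the tree, and the values are the names used in the JSON, so switching from a time tree to a divergence tree only requires changing the dictionary.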

Return Value

  • tuple: A two-element tuple (tree, metadata): the BALTIC tree object created from the JSON, followed by the metadata from the JSON.

Functionality

  1. Loads the JSON data from a file, URL, or object.
  2. Extracts metadata and tree structure from the JSON.
  3. Calls make_treeJSON() to create a BALTIC tree object from the JSON tree data.
  4. Translates JSON attributes to BALTIC tree attributes based on the json_translation dictionary.
  5. Processes node attributes and traits.
  6. Calculates branch lengths based on specified attributes (e.g., 'height' or 'absoluteTime').
  7. Optionally sorts branches and calculates tree statistics.
  8. Extracts and assigns color mappings from the JSON metadata.
  9. Returns the processed BALTIC tree object and the JSON metadata.
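
Steps 1 and 2 above can be sketched in plain Python as follows. This is an illustration under stated assumptions, not BALTIC's actual code: resolve_json_input and its fetch parameter are hypothetical names, URL detection by prefix is a simplification, and the top-level "meta" and "tree" keys are assumed from the Auspice v2 layout.

```python
import json


def _fetch_with_requests(url):
    """Download a URL as text; requests is third-party and only needed here."""
    import requests
    response = requests.get(url)
    response.raise_for_status()
    return response.text


def resolve_json_input(json_object, fetch=_fetch_with_requests):
    """Return (meta, tree) from a URL, a local path, or a parsed dict."""
    if isinstance(json_object, str):
        if json_object.startswith(("http://", "https://")):
            data = json.loads(fetch(json_object))   # string that looks like a URL
        else:
            with open(json_object) as handle:       # otherwise treat as local path
                data = json.load(handle)
    else:
        data = json_object                          # already-parsed JSON object
    # Assumed Auspice v2 layout: metadata and tree live under separate keys.
    return data["meta"], data["tree"]


# Made-up, already-parsed document used only for illustration.
document = {"meta": {"title": "demo"}, "tree": {"name": "root", "children": []}}
meta, tree_json = resolve_json_input(document)
print(meta["title"], tree_json["name"])
# demo root
```

Passing the download function as a parameter keeps the sketch testable without network access; loadJSON() itself does not expose such a parameter.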

Use Cases

  1. Loading Nextstrain JSON files into BALTIC for analysis or visualization.
  2. Importing trees and associated metadata from web-based phylogenetic resources.
  3. Processing trees with custom JSON formats by specifying appropriate attribute translations.
  4. Extracting both tree structure and associated metadata from a single JSON source.

Example

import baltic as bt
import matplotlib.pyplot as plt

# Load a Nextstrain JSON (local file or URL)
tree, metadata = bt.loadJSON("https://nextstrain.org/charon/getDataset?prefix=/zika", verbose=True)

# Print basic tree information
print(f"Number of tips: {len(tree.getExternal())}")
print(f"Number of internal nodes: {len(tree.getInternal())}")

# Check available traits
sample_node = tree.Objects[0]
print("Available traits:", list(sample_node.traits.keys()))

# Visualize the tree
fig, ax = plt.subplots(figsize=(12, 8))
tree.plotTree(ax)
tree.addText(ax, x_attr=lambda n: n.x + 0.001)
plt.title("Phylogenetic Tree")
plt.show()

# Plot a time-scaled tree if absoluteTime has been set on the tips
if getattr(tree.getExternal()[0], 'absoluteTime', None) is not None:
    fig, ax = plt.subplots(figsize=(12, 8))
    # Use absoluteTime for both branches and labels so they share x coordinates
    tree.plotTree(ax, x_attr=lambda n: n.absoluteTime)
    tree.addText(ax, x_attr=lambda n: n.absoluteTime)
    ax.set_xlabel("Time")
    plt.title("Time-scaled Phylogenetic Tree")
    plt.show()

# Print some metadata information
print("\nMetadata:")
for key in metadata:
    print(f"  {key}")

# If color mappings are available, print an example
if hasattr(tree, 'cmap') and tree.cmap:
    print("\nColor mapping example:")
    example_trait = next(iter(tree.cmap))
    print(f"Trait: {example_trait}")
    for category, color in list(tree.cmap[example_trait].items())[:5]:
        print(f"  {category}: {color}")

Notes

  • The function can handle JSON data from local files, URLs (specifically Nextstrain URLs), or pre-loaded JSON objects.
  • The json_translation dictionary is crucial for mapping JSON attributes to BALTIC tree attributes. Modify this if your JSON uses different key names.
  • Ensure that the json_translation dictionary includes mappings for 'name' and at least one of 'absoluteTime', 'length', or 'height'.
  • The function extracts color mappings from the JSON metadata, which can be useful for consistent coloring in visualizations.
  • When loading from a Nextstrain URL, the function uses the requests library to fetch the data, so requests must be installed for URL input.
  • The function assumes a specific structure for Nextstrain JSONs. If working with custom JSON formats, you may need to modify the function or preprocess your JSON.
  • The sort and stats options allow for immediate processing of the tree after loading, which can be useful for quick analyses or visualizations.
  • This function is particularly useful for working with Nextstrain data or other phylogenetic data in a similar JSON format.
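
As an illustration of the color-mapping note above, the sketch below builds a trait-to-color lookup from Auspice-v2-style metadata. It is a guess at the general idea rather than BALTIC's code: extract_colour_maps is a hypothetical helper, the "colorings" list with "key" and "scale" entries is assumed from the Auspice v2 format, and the sample values are invented.

```python
# Hypothetical helper, not part of BALTIC.
def extract_colour_maps(meta):
    """Build {trait: {category: colour}} from Auspice-v2-style metadata."""
    colour_maps = {}
    for colouring in meta.get("colorings", []):
        scale = colouring.get("scale")
        if isinstance(scale, list):     # explicit [category, colour] pairs
            colour_maps[colouring["key"]] = {
                category: colour for category, colour in scale
            }
    return colour_maps


# Invented metadata used only for illustration.
meta = {
    "colorings": [
        {"key": "country", "type": "categorical",
         "scale": [["Brazil", "#4C90C0"], ["Colombia", "#E67932"]]},
        {"key": "num_date", "type": "continuous"},   # no explicit scale
    ]
}

print(extract_colour_maps(meta))
# {'country': {'Brazil': '#4C90C0', 'Colombia': '#E67932'}}
```

Colorings without an explicit scale are skipped, so a lookup like this only covers traits for which the JSON author fixed the colors.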