Add function to read back taxonomy grouping json
lczech committed Dec 17, 2024
1 parent 6bafda4 commit 7060cea
Showing 2 changed files with 64 additions and 0 deletions.
lib/genesis/taxonomy/functions/kmer.cpp: 32 additions, 0 deletions
@@ -682,6 +682,38 @@ void write_taxonomy_grouping_to_json(
    utils::JsonWriter().write( doc, target );
}

// --------------------------------------------------------------------------
// read_taxonomy_grouping_from_json
// --------------------------------------------------------------------------

std::vector<TaxonomyGroupData> read_taxonomy_grouping_from_json(
    std::shared_ptr<utils::BaseInputSource> source
) {
    std::vector<TaxonomyGroupData> result;
    auto doc = utils::JsonReader().read( source );
    auto& arr = doc.get_array();
    result.reserve( arr.size() );
    for( auto& child : arr ) {
        TaxonomyGroupData elem;

        // Group indices have to be consecutive, starting at 0, in the order of the array.
        elem.group_index = child.at( "group_index" ).get_number_unsigned();
        if( elem.group_index != result.size() ) {
            throw std::runtime_error(
                "Taxonomy grouping json file contains " + std::to_string( arr.size() ) +
                " entries, but with non-consecutive group indices. "
                "Expected group_index " + std::to_string( result.size() ) +
                ", but found group_index " + std::to_string( elem.group_index )
            );
        }
        elem.num_sequences = child.at( "num_sequences" ).get_number_unsigned();
        elem.sum_seq_lengths = child.at( "sum_seq_lengths" ).get_number_unsigned();
        for( auto const& tax : child.at( "taxa" ).get_array() ) {
            elem.taxa.push_back( tax.get_string() );
        }
        result.push_back( std::move( elem ));

        // The json element has been fully converted; release its contents.
        child.clear();
    }
    return result;
}

// --------------------------------------------------------------------------
// write_kmer_taxonomy_to_json
// --------------------------------------------------------------------------
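The consecutive-index check in read_taxonomy_grouping_from_json() can be illustrated without the genesis json classes. The following is a minimal, library-free sketch, assuming the json array has already been converted into a list of structs; `GroupData` is a local mirror of TaxonomyGroupData, and `check_consecutive_group_indices()` is a hypothetical helper, not part of genesis. It enforces the same invariant as the reader: the entry at position i must carry group_index i.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Local mirror of TaxonomyGroupData, so that this sketch compiles without genesis.
struct GroupData
{
    std::size_t group_index;
    std::size_t num_sequences;
    std::size_t sum_seq_lengths;
    std::vector<std::string> taxa;
};

// Hypothetical helper: throws unless the groups carry the indices 0, 1, 2, ...
// in exactly the order in which they appear in the list.
void check_consecutive_group_indices( std::vector<GroupData> const& groups )
{
    for( std::size_t i = 0; i < groups.size(); ++i ) {
        if( groups[i].group_index != i ) {
            throw std::runtime_error(
                "Taxonomy grouping contains " + std::to_string( groups.size() ) +
                " entries, but with non-consecutive group indices. "
                "Expected group_index " + std::to_string( i ) +
                ", but found group_index " + std::to_string( groups[i].group_index )
            );
        }
    }
}
```

Requiring the index to equal the position means the resulting vector can be indexed directly by group_index, without a separate lookup.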
lib/genesis/taxonomy/functions/kmer.hpp: 32 additions, 0 deletions
@@ -222,6 +222,29 @@ std::string grouped_taxonomy_report( Taxonomy const& tax );
// Json Format
// =================================================================================================

/**
* @brief Data of a single group, as a plain struct.
*
* The functions for grouping a Taxonomy, group_by_taxon_sizes() and
* group_with_target_number_of_groups(), annotate each Taxon of the Taxonomy with KmerTaxonData.
* This data can be stored in a json file with write_taxonomy_grouping_to_json(), which is meant
* for inspecting the results of the grouping.
*
* When reading this json data back in, however, we might not always want to reconstruct all data
* on the Taxonomy it came from; some use cases only need the data per group, without the
* underlying Taxonomy. To this end, read_taxonomy_grouping_from_json() reads such a file and
* produces a vector with one element of group data per group.
*
* This struct holds the data of one such group. See write_taxonomy_grouping_to_json() for details
* on the json format.
*/
struct TaxonomyGroupData
{
    size_t group_index;
    size_t num_sequences;
    size_t sum_seq_lengths;
    std::vector<std::string> taxa;
};
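Since TaxonomyGroupData does not depend on the Taxonomy, the per-group data can be used on its own, for example to find out which group a taxon was assigned to. The sketch below is an illustration only: `GroupData` is a local mirror of TaxonomyGroupData, `map_taxa_to_groups()` is a hypothetical helper that is not part of genesis, and it assumes that each entry of `taxa` appears in at most one group.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <unordered_map>
#include <vector>

// Local mirror of TaxonomyGroupData, so that this sketch compiles without genesis.
struct GroupData
{
    std::size_t group_index;
    std::size_t num_sequences;
    std::size_t sum_seq_lengths;
    std::vector<std::string> taxa;
};

// Hypothetical helper: map each taxon entry to the index of the group containing it.
// Throws if the same taxon entry is listed in more than one group.
std::unordered_map<std::string, std::size_t> map_taxa_to_groups(
    std::vector<GroupData> const& groups
) {
    std::unordered_map<std::string, std::size_t> result;
    for( auto const& group : groups ) {
        for( auto const& taxon : group.taxa ) {
            // emplace() does not overwrite; .second is false if the key already existed.
            bool const inserted = result.emplace( taxon, group.group_index ).second;
            if( ! inserted ) {
                throw std::runtime_error( "Taxon listed in multiple groups: " + taxon );
            }
        }
    }
    return result;
}
```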

/**
* @brief Write the resulting list of groups of taxonomic grouping to json.
*
@@ -234,6 +257,15 @@ void write_taxonomy_grouping_to_json(
    std::shared_ptr<utils::BaseOutputTarget> target
);

/**
* @brief Read back the data written by write_taxonomy_grouping_to_json().
*
* See TaxonomyGroupData for a description of the resulting struct per taxonomic group.
*/
std::vector<TaxonomyGroupData> read_taxonomy_grouping_from_json(
    std::shared_ptr<utils::BaseInputSource> source
);
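As an example of working with the returned vector directly, the following sketch computes simple summary statistics over all groups. It assumes, based on the field names only, that `sum_seq_lengths` is the total length of all `num_sequences` sequences in a group, so that their ratio is the mean sequence length. `GroupData`, `GroupingSummary`, and `summarize_groups()` are hypothetical names for this illustration, not part of genesis.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Local mirror of TaxonomyGroupData, so that this sketch compiles without genesis.
struct GroupData
{
    std::size_t group_index;
    std::size_t num_sequences;
    std::size_t sum_seq_lengths;
    std::vector<std::string> taxa;
};

// Hypothetical summary over all groups read from the json file.
struct GroupingSummary
{
    std::size_t total_sequences = 0;
    std::size_t total_seq_length = 0;
    std::vector<double> mean_seq_lengths; // one entry per group, 0.0 for empty groups
};

GroupingSummary summarize_groups( std::vector<GroupData> const& groups )
{
    GroupingSummary summary;
    summary.mean_seq_lengths.reserve( groups.size() );
    for( auto const& group : groups ) {
        summary.total_sequences += group.num_sequences;
        summary.total_seq_length += group.sum_seq_lengths;

        // Assumption: sum_seq_lengths / num_sequences is the mean sequence length.
        summary.mean_seq_lengths.push_back(
            group.num_sequences == 0
            ? 0.0
            : static_cast<double>( group.sum_seq_lengths ) /
              static_cast<double>( group.num_sequences )
        );
    }
    return summary;
}
```

Guarding against empty groups avoids a division by zero.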

/**
* @brief Write a Taxonomy with KmerTaxonData, in our internal Json format.
*