Plugins

Téo Lemane edited this page Oct 20, 2021 · 1 revision

Plugin support

kmtricks already provides some basic filtering (--hard-min, --recurrence-min) and a k-mer rescue procedure (--soft-min, --share-min). Without plugins, there are two ways to apply more specific filters to the kmtricks matrix: 1) Run kmtricks up to the count step, then use the kmtricks API to stream the matrix and apply the filters. The API provides everything needed for this, but it is not always convenient. 2) Produce the whole matrix and read it back for filtering, which is inefficient because the unfiltered matrix has to be written and then read again.

To address this, kmtricks now supports plugins. A plugin is a class that inherits from km::IMergePlugin and is compiled as a shared library. kmtricks loads it at run time using POSIX run-time dynamic linking.
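To illustrate what POSIX run-time dynamic linking involves, the sketch below opens a shared library with `dlopen` and resolves the `extern "C"` entry points that the examples further down export (`use_template`, `create0`/`createN`, `destroy`). It is not kmtricks' actual loader: `LoadedPlugin` and `load_plugin` are hypothetical, and the factory and destroy functions are kept as raw addresses so the snippet does not depend on kmtricks headers.

```cpp
#include <dlfcn.h>  // dlopen, dlsym, dlclose, dlerror (POSIX)
#include <cstddef>
#include <string>

// Hypothetical bundle of what a host needs from a plugin library.
struct LoadedPlugin {
  void* handle = nullptr;   // library handle returned by dlopen
  bool templated = false;   // what the library's use_template() returned
  void* factory = nullptr;  // address of create0 or createN
  void* destroy = nullptr;  // address of destroy
};

// Opens `path` and resolves the plugin entry points.
// `max_k` (e.g. 32, 64, 512) is only used when the plugin is a template.
// Returns false and fills `error` if the library or a symbol is missing.
bool load_plugin(const std::string& path, std::size_t max_k,
                 LoadedPlugin& out, std::string& error) {
  out = LoadedPlugin{};
  out.handle = dlopen(path.c_str(), RTLD_NOW | RTLD_LOCAL);
  if (!out.handle) {
    const char* msg = dlerror();
    error = msg ? msg : "dlopen failed";
    return false;
  }
  // Release the library and report which step failed.
  auto fail = [&](const std::string& what) {
    error = what;
    dlclose(out.handle);
    out = LoadedPlugin{};
    return false;
  };

  void* sym = dlsym(out.handle, "use_template");
  if (!sym) return fail("missing symbol: use_template");
  auto use_template = reinterpret_cast<int (*)()>(sym);
  out.templated = use_template() != 0;

  // Non-template plugins export create0; template plugins export createN.
  const std::string factory_name =
      out.templated ? "create" + std::to_string(max_k) : std::string("create0");
  out.factory = dlsym(out.handle, factory_name.c_str());
  if (!out.factory) return fail("missing symbol: " + factory_name);

  out.destroy = dlsym(out.handle, "destroy");
  if (!out.destroy) return fail("missing symbol: destroy");
  return true;
}
```

A host would then presumably cast `factory` to a function returning `km::IMergePlugin*`, create one instance per partition, and release each one through `destroy` before calling `dlclose`. On older glibc versions, linking requires `-ldl`.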

The primary purpose of plugins is filtering. However, a plugin sees every row of the matrix (a k-mer or hash and its count vector) while the matrix is being built, so it can be used for anything else that needs this per-row view.
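Conceptually, the host side behaves like the loop below: each row built for a partition is handed to the plugin, which may edit the counts in place, and only rows for which it returns true are kept. This is a simplified model, not kmtricks' merge code. `RowFilter`, `Row`, `merge_partition` and `SharedInTwo` are hypothetical names, and 4-byte counts are assumed.

```cpp
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

using count_type = std::uint32_t;  // assumption: 4 bytes per count

// Minimal stand-in for km::IMergePlugin (hash mode only), for illustration.
struct RowFilter {
  virtual ~RowFilter() = default;
  virtual bool process_hash(std::uint64_t h, std::vector<count_type>& counts) = 0;
};

struct Row {
  std::uint64_t hash;
  std::vector<count_type> counts;  // one abundance per sample
};

// Hypothetical host loop: every row goes through the plugin as it is built,
// and only rows for which the plugin returns true reach the output.
std::vector<Row> merge_partition(std::vector<Row> rows, RowFilter& plugin) {
  std::vector<Row> kept;
  for (auto& row : rows) {
    if (plugin.process_hash(row.hash, row.counts))
      kept.push_back(std::move(row));
  }
  return kept;
}

// Example filter: keep rows with a non-zero count in at least two samples.
struct SharedInTwo : RowFilter {
  bool process_hash(std::uint64_t, std::vector<count_type>& counts) override {
    std::size_t present = 0;
    for (count_type c : counts) present += (c > 0);
    return present >= 2;
  }
};
```

For instance, among the rows `{0, 3, 4}`, `{5, 0, 0}` and `{1, 1, 1}`, `SharedInTwo` keeps the first and the third.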

Usage

1. Clone the repository

git clone --recursive https://github.com/tlemane/kmtricks

2. Write the plugin

The plugin source files must be put in kmtricks/plugins directory. For instance: kmtricks/plugins/my_plugin/my_plugin.cpp.

Plugins must inherit from km::IMergePlugin interface:

class IMergePlugin
{
public:
  IMergePlugin() = default;
  virtual ~IMergePlugin() {}
  virtual void set_out_dir(const std::string& s) final { m_output_directory = s; }
  virtual void set_partition(size_t p) final { m_partition = p; }

  // override me: see TemplateEx
  virtual void set_kmer_size(const size_t kmer_size) { m_kmer_size = kmer_size; }

  // override me if your plugin needs some configuration
  // the kmtricks CLI can pass a basic string to plugins (--plugin-config),
  // for instance a path to a config file
  virtual void configure(const std::string& s) {}

  // called on each row during matrix construction
  // values in count_vector can be modified (e.g. set abundances below your threshold to zero), but do not resize it!
  // return true -> keep the row
  // return false -> discard the row
  virtual bool process_kmer(const uint64_t* kmer_data, std::vector<typename selectC<DMAX_C>::type>& count_vector) { return true; }

  // same as process_kmer but for hash mode
  virtual bool process_hash(uint64_t h, std::vector<typename selectC<DMAX_C>::type>& count_vector) { return true; }

protected:

  // if you need to dump something, write it to this directory.
  // corresponds to <kmtricks_dir>/plugin_output
  std::string m_output_directory;

  // kmer-size
  size_t m_kmer_size;

  // partitions are merged in parallel and each thread uses its own plugin instance
  // this is the partition id
  size_t m_partition;
};
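The protected members are what a plugin needs to produce its own output. Since each partition is merged by its own instance, including `m_partition` in the file name keeps two threads from writing the same file. The sketch below uses a small stand-in base class instead of the real header so it compiles on its own. `PluginBase`, `DiscardCounter` and the `discarded_<partition>.txt` file name are hypothetical. Writing the file from the destructor assumes each instance is destroyed once its partition is done (the `destroy` function in the examples deletes it).

```cpp
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <string>
#include <vector>

using count_type = std::uint32_t;  // assumption: built with -c 4

// Stand-in for the parts of km::IMergePlugin used below (illustration only).
class PluginBase {
public:
  virtual ~PluginBase() = default;
  void set_out_dir(const std::string& s) { m_output_directory = s; }
  void set_partition(std::size_t p) { m_partition = p; }
  virtual bool process_hash(std::uint64_t, std::vector<count_type>&) { return true; }

protected:
  std::string m_output_directory;
  std::size_t m_partition = 0;
};

// Discards rows with any abundance below a threshold and, when destroyed,
// writes how many rows it discarded to a per-partition file.
class DiscardCounter : public PluginBase {
public:
  explicit DiscardCounter(count_type threshold) : m_threshold(threshold) {}

  ~DiscardCounter() override {
    std::ofstream out(m_output_directory + "/discarded_" +
                      std::to_string(m_partition) + ".txt");
    out << m_discarded << '\n';
  }

  bool process_hash(std::uint64_t, std::vector<count_type>& counts) override {
    for (count_type c : counts) {
      if (c < m_threshold) {
        ++m_discarded;
        return false;
      }
    }
    return true;
  }

private:
  count_type m_threshold;
  std::size_t m_discarded = 0;
};
```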

a. Basic plugin (mainly for filtering)

#include <kmtricks/plugin.hpp>

// DMAX_C is a compile definition set by cmake
using count_type = typename km::selectC<DMAX_C>::type;

class BasicEx : public km::IMergePlugin
{
public:
  BasicEx() = default;
private:
  unsigned int m_threshold {0};

  // Override process_kmer
  // Discard rows in which any abundance is below the threshold
  bool process_kmer(const uint64_t* kmer_data, std::vector<count_type>& count_vector) override
  {
    for (auto& c : count_vector)
      if (c < m_threshold)
        return false;
    return true;
  }

  // Override configure (not necessary if you don't need configuration)
  // The string is passed to kmtricks with --plugin-config
  // Here it's a simple example where the string is a threshold
  // It could be a path to a config file for instance
  void configure(const std::string& s) override
  {
    m_threshold = static_cast<unsigned int>(std::stoul(s));
  }
};

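// Make the plugin loadable: kmtricks resolves these C-linkage symbols after loading the library
// use_template() returns 0, so the non-template factory create0() is used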
extern "C" std::string plugin_name() { return "BasicEx"; }
extern "C" int use_template() { return 0; }
extern "C" km::IMergePlugin* create0() { return new BasicEx(); }
extern "C" void destroy(km::IMergePlugin* p) { delete p; }
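The interface comment also allows editing counts in place. As a variant of BasicEx, the sketch below sets abundances below the threshold to zero instead of dropping the row, and only discards rows that end up entirely empty. `MergePluginStandIn` and `MaskEx` are hypothetical names: the stand-in replaces km::IMergePlugin so the snippet compiles without kmtricks. With the real header you would inherit from km::IMergePlugin and export the same four symbols as BasicEx.

```cpp
#include <algorithm>
#include <cstdint>
#include <string>
#include <vector>

using count_type = std::uint32_t;  // assumption: built with -c 4

// Stand-in for km::IMergePlugin (illustration only).
struct MergePluginStandIn {
  virtual ~MergePluginStandIn() = default;
  virtual void configure(const std::string&) {}
  virtual bool process_kmer(const std::uint64_t*, std::vector<count_type>&) { return true; }
};

class MaskEx : public MergePluginStandIn {
public:
  // --plugin-config is the threshold, as in BasicEx
  void configure(const std::string& s) override {
    m_threshold = static_cast<count_type>(std::stoul(s));
  }

  // Zero out low abundances in place (the vector is never resized),
  // then keep the row only if at least one sample still has a count.
  bool process_kmer(const std::uint64_t*, std::vector<count_type>& counts) override {
    for (auto& c : counts)
      if (c < m_threshold) c = 0;
    return std::any_of(counts.begin(), counts.end(),
                       [](count_type c) { return c != 0; });
  }

private:
  count_type m_threshold{0};
};
```

With `--plugin-config 3`, the row `{1, 5, 2}` is kept as `{0, 5, 0}`, while `{1, 2}` is discarded.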

b. Template plugin (if you need the k-mer content)

kmtricks instantiates its k-mer representation for several maximum k-mer sizes (the MAX_K template parameter of km::Kmer<>), so a plugin that needs the k-mer content through the km::Kmer<> class must itself be a template.

#include <kmtricks/plugin.hpp>

// Same as BasicEx
using count_type = typename km::selectC<DMAX_C>::type;

template<std::size_t MAX_K>
class TemplateEx : public km::IMergePlugin
{
public:
  TemplateEx() = default;
private:
  unsigned int m_threshold {0};
  // Declare a k-mer
  km::Kmer<MAX_K> m_kmer;

public:
  // same as BasicEx
  void configure(const std::string& s) override
  {
    m_threshold = static_cast<unsigned int>(std::stoul(s));
  }

  // Override set_kmer_size to pass the k-mer size to m_kmer
  void set_kmer_size(size_t kmer_size) override
  {
    this->m_kmer_size = kmer_size;
    m_kmer.set_k(this->m_kmer_size);
  }

  // Override process_kmer
  // Discard rows in which any abundance is below the threshold, but only for k-mers starting with 'A'
  bool process_kmer(const uint64_t* kmer_data, std::vector<count_type>& count_vector) override
  {
    m_kmer.set64_p(kmer_data);
    if (m_kmer.at(0) == 'A')
    {
      for (auto& c : count_vector)
      {
        if (c < m_threshold)
        {
          return false;
        }
      }
    }
    return true;
  }
};

// Make the plugin loadable
extern "C" std::string plugin_name() { return "TemplateEx"; }
extern "C" int use_template() { return 1; }
extern "C" km::IMergePlugin* create32() { return new TemplateEx<32>(); } // called if --kmer-size is in [8, 32)
extern "C" km::IMergePlugin* create64() { return new TemplateEx<64>(); } // called if --kmer-size is in [32, 64)
extern "C" km::IMergePlugin* create512() { return new TemplateEx<512>(); } // called if --kmer-size is in [480, 512)

// With create32, create64 and create512, the plugin supports k-mer sizes in [8, 64) and [480, 512)
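The ranges above are consistent with each k-mer size being served by the instantiation for the next multiple of 32 above it. The helper below encodes that rule. It is an assumption inferred from the stated ranges, not code taken from kmtricks, and `max_k_for` and `factory_symbol` are hypothetical names.

```cpp
#include <cstddef>
#include <string>

// Assumed rule: k-mer size k uses the MAX_K instantiation equal to the
// next multiple of 32 strictly above k (k = 31 -> 32, k = 32 -> 64).
std::size_t max_k_for(std::size_t k) { return (k / 32 + 1) * 32; }

// Name of the factory symbol a host would look up for this k-mer size.
std::string factory_symbol(std::size_t k, bool use_template) {
  return use_template ? "create" + std::to_string(max_k_for(k))
                      : std::string("create0");
}
```

Under this rule, a run with `--kmer-size 100` would look up `create128`, which TemplateEx does not export. That is why 100 falls outside the supported ranges.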

3. Compile and run

After compilation, the plugins are available as shared libraries at build/plugins/lib<name>.so.

a. Compile plugins only and use an already compiled binary of kmtricks (from conda for instance)

Compile:

# '-c 4': bytes per count. Plugins and kmtricks must be compiled with the same value;
# for instance, the binary provided by the conda package uses 4 bytes per count.
./install.sh -c 4 -p -q

Note that the conda package's binary with plugin support is named kmtricksp, not kmtricks.
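Why the `-c` values must match: the count vector is passed by reference across the shared-library boundary, so both sides must agree on its element type. The mapping below is a hypothetical stand-in for `km::selectC`. It assumes that `-c` accepts 1, 2 or 4 bytes and is not the kmtricks definition.

```cpp
#include <cstddef>
#include <cstdint>
#include <type_traits>
#include <vector>

// Hypothetical stand-in for km::selectC: bytes per count -> integer type.
template <std::size_t Bytes> struct CountFor;
template <> struct CountFor<1> { using type = std::uint8_t; };
template <> struct CountFor<2> { using type = std::uint16_t; };
template <> struct CountFor<4> { using type = std::uint32_t; };

// A plugin built with -c 2 and a kmtricks binary built with -c 4 would
// disagree on the element type of the count vector they exchange.
static_assert(!std::is_same<std::vector<CountFor<2>::type>,
                            std::vector<CountFor<4>::type>>::value,
              "different -c values give incompatible count vectors");
```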

Run:

# To be consistent with the examples, an integer is used for --plugin-config
# In a real use case, it would be something like a path to a config file
kmtricksp --plugin build/plugins/lib<name>.so --plugin-config 12 <KMTRICKS_ARGS>
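When `--plugin-config` is a path rather than a number, `configure` has to read the file itself. The sketch below accepts either form. The `key=value` format and the `threshold` key are hypothetical choices for illustration, since kmtricks only forwards the raw string.

```cpp
#include <fstream>
#include <stdexcept>
#include <string>

// Accepts either a bare integer (as in the examples) or a path to a file
// with one "key=value" per line, e.g. "threshold=12" (hypothetical format).
unsigned long parse_threshold(const std::string& s) {
  std::ifstream in(s);
  if (!in) return std::stoul(s);  // not a readable file: treat as a number
  std::string line;
  while (std::getline(in, line)) {
    auto eq = line.find('=');
    if (eq != std::string::npos && line.substr(0, eq) == "threshold")
      return std::stoul(line.substr(eq + 1));
  }
  throw std::runtime_error("no 'threshold' key in " + s);
}
```

Inside a plugin, `configure` could then become `m_threshold = static_cast<unsigned int>(parse_threshold(s));`.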

b. Compile plugins and kmtricks

Compile:

./install.sh -p

Run:

./bin/kmtricks --plugin build/plugins/lib<name>.so --plugin-config 12 <KMTRICKS_ARGS>