
MADlib Module Anatomy

agorajek edited this page May 13, 2011 · 13 revisions

This page explains all the elements needed to successfully develop and plug in a new MADlib module.

Say you want to write a new MADlib module called NewModule (code name: newmod).

1. Module files overview (source tree):

./src/
    modules/
        newmod/              # (REQUIRED) new directory for the module code
            newmod.sql_in    # (REQUIRED) SQL file to create DB objects
            newmod.py_in     # (optional) Python code
            newmod.c/cpp     # (optional) C/C++ code for this module
            test/            # (optional) directory for SQL test scripts
                newmod.sql_in    
                ...
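Creating this layout by hand is straightforward. The sketch below does the same thing in Python. The helper name `scaffold_module` is hypothetical and not part of MADlib. It only creates empty placeholder files that match the tree above, and it picks `.cpp` arbitrarily for the C/C++ case.

```python
from pathlib import Path
from typing import List


def scaffold_module(root: Path, name: str, with_python: bool = False,
                    with_cpp: bool = False) -> List[Path]:
    """Create empty placeholder files for a new module under root/src/modules/<name>.

    Hypothetical helper for illustration; MADlib does not ship this function.
    """
    mod_dir = root / "src" / "modules" / name
    # parents=True also creates src/ and src/modules/ if they are missing.
    (mod_dir / "test").mkdir(parents=True, exist_ok=True)
    # The SQL file is the only required code file; the test script is optional.
    files = [mod_dir / f"{name}.sql_in", mod_dir / "test" / f"{name}.sql_in"]
    if with_python:
        files.append(mod_dir / f"{name}.py_in")
    if with_cpp:
        files.append(mod_dir / f"{name}.cpp")
    for path in files:
        # touch() creates missing files and does not overwrite existing content.
        path.touch(exist_ok=True)
    return files
```

For example, `scaffold_module(Path("."), "newmod", with_python=True)` run from the repository root would create the required SQL file, a Python stub, and a test script stub.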

2. Module files explained:

  • newmod.sql_in - SQL file that creates the database objects for this method. This is the only required code file, because a module/method can be written entirely in SQL, in which case no Python or C/C++ code is needed. This file is preprocessed with m4 during the installation phase and currently uses the following meta variables:
    • MADLIB_SCHEMA - will be replaced with the target schema name
    • PLPYTHON_LIBDIR - used inside PL/Python routines (UDFs); will be replaced with the path to the directory containing the Python module of each method
    • MODULE_PATHNAME - used inside C routines (UDFs); will be replaced with the path to the directory containing the C/C++ module of each method
  • newmod.py_in - Python code for the newmod module (preprocessed during the build phase for each DB platform)
  • newmod.c/cpp - C/C++ code for the newmod module
  • test/newmod.sql_in - SQL test script written according to the Unit-Testing-Guide
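To show what the m4 preprocessing step does to a `.sql_in` file, here is a rough Python approximation of whole-word macro substitution. This is not how MADlib's installer invokes m4. It ignores m4 quoting and every other m4 feature, and the replacement values in `META` are made-up examples.

```python
import re
from typing import Dict

# Hypothetical replacement values; the real ones come from the installation settings.
META = {
    "MADLIB_SCHEMA": "madlib",
    "PLPYTHON_LIBDIR": "/opt/madlib/python",
    "MODULE_PATHNAME": "/opt/madlib/lib",
}


def expand_meta(text: str, values: Dict[str, str]) -> str:
    """Replace meta variables only where they appear as whole words.

    \\b treats '_' as a word character, so MADLIB_SCHEMA_X is left alone,
    similar to how m4 only expands complete macro names.
    """
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, values)) + r")\b")
    return pattern.sub(lambda m: values[m.group(1)], text)
```

For instance, `expand_meta("CREATE SCHEMA MADLIB_SCHEMA;", META)` yields `CREATE SCHEMA madlib;`.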

3. Configuration:

To include the new module in the generic (database-independent) installation, only one configuration file needs to be edited: ./config/Modules.yml. Add a new name element, with an optional depends item:

    - name:    newmod
      depends: ['othermod1', 'othermod2']
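The depends item implies that a module's dependencies must be installed before the module itself. The following minimal sketch illustrates that ordering with the standard-library topological sorter (Python 3.9+). It assumes the YAML has already been parsed into a list of dictionaries. The `MODULES` contents and the `install_order` helper are hypothetical, and MADlib's own installer may resolve dependencies differently.

```python
from graphlib import TopologicalSorter

# Hypothetical parsed content of ./config/Modules.yml: what a YAML loader
# would return for a list of {name, depends} entries.
MODULES = [
    {"name": "newmod", "depends": ["othermod1", "othermod2"]},
    {"name": "othermod1"},
    {"name": "othermod2", "depends": ["othermod1"]},
]


def install_order(modules):
    """Return module names so that every module comes after its dependencies."""
    known = {m["name"] for m in modules}
    graph = {}
    for m in modules:
        deps = m.get("depends", [])  # depends is optional
        missing = set(deps) - known
        if missing:
            raise ValueError(f"{m['name']} depends on unknown module(s): {sorted(missing)}")
        graph[m["name"]] = deps  # node -> its predecessors
    # static_order() raises graphlib.CycleError (a ValueError) on circular depends.
    return list(TopologicalSorter(graph).static_order())
```

With the sample data, `install_order(MODULES)` places othermod1 first, then othermod2, then newmod.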

4. Adding support for other DB platforms:

If any of the code must be adjusted for a particular database platform, the files that require changes must be replicated under a dedicated ./ports/<portid>/modules/newmod directory, as shown below:

./ports/
    greenplum/                   # Example port id: greenplum
        modules/
            newmod/              # (REQUIRED) new directory for the module code
                newmod.sql_in    # (optional) SQL file to create DB objects
                newmod.py_in     # (optional) Python code
                newmod.c/cpp     # (optional) C/C++ code for this module
                test/            # (optional) directory for SQL test scripts
                    newmod.sql_in    
                    ...
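One way to read this layout is that a file under the port directory takes precedence over the generic file of the same name in ./src/modules. The helper below sketches that lookup. The function name and the exact override rule are assumptions for illustration, not MADlib's documented build logic.

```python
from pathlib import Path
from typing import Optional


def resolve_module_file(root: Path, module: str, filename: str,
                        port_id: Optional[str] = None) -> Path:
    """Hypothetical lookup: prefer the port-specific copy, else the generic file."""
    if port_id is not None:
        ported = root / "ports" / port_id / "modules" / module / filename
        if ported.is_file():
            return ported  # platform-specific override
    generic = root / "src" / "modules" / module / filename
    if generic.is_file():
        return generic  # database-independent default
    raise FileNotFoundError(f"{filename} not found for module {module!r}")
```

Under this rule, a greenplum build would take newmod.py_in from ./ports/greenplum/modules/newmod if that copy exists, and fall back to ./src/modules/newmod for every file that was not replicated.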