-
Notifications
You must be signed in to change notification settings - Fork 128
MADlib Module Anatomy
This page explains all the elements needed to develop and plug-in a new MADlib module.
- Module files overview (source tree perspective)
- Module files explained
- Configuration
- Adding support for other DB platforms
Say you want to write a new MADlib module called NewModule (code name: newmod), use the following directory tree structure as a reference for your module.
./src/
modules/
newmod/ # (REQUIRED) new directory for the module code
newmod.cpp # (optional) C/C++ code for this module
newmod.hpp # (optional) C/C++ header for this module
...
ports/
postgres/
modules/
newmod/
newmod.sql_in # (REQUIRED) SQL file to create DB objects
newmod.py_in # (optional) Python code
test/ # (optional) directory for SQL test scripts
newmod.sql_in # (optional) test scripts that will be run during install-check
...
-
newmod.sql_in - SQL file which creates database objects for this method. This is the only required code file, because there could me a module/method written completely in SQL. There would be no need for Python or C/C++ code in such case. This file is preprocessed with m4 during installation phase and currently uses the following meta variables:
- MADLIB_SCHEMA - will be replaced with the target schema name
- PLPYTHON_LIBDIR - used inside PL/Python routines (UDFs) and will be replaced with a path to a directory with the Python module of each method
- MODULE_PATHNAME - used inside C routines (UDFs) and will be replaced with a path to a directory with the C/C++ module of each method
-
newmod.py_in - Python code for newmod module (preprocessed during build phase for each DB platform)
-
newmod.c/cpp - C/C++ code for newmod module
-
test/newmod.sql_in - SQL test script that is executed during install-check
In order to include the new module in the generic (not database dependent) installation only the following config file must be edited: ./config/Modules.yml
. New name
element must be added with an optional depends
item:
- name: newmod
depends: ['othermod1', 'othermod2']
If you must adjust any of the code to a particular database platform the files which requires changes must be replicated under a dedicated ./port/<portid>/module
directory, see below. The database <portid>
you will be referring to must be already defined in ./config/Ports.yml
config file.
./ports/
greenplum/ # Example port id: greenplum
modules/
newmod/ # (REQUIRED) new directory for the module code
newmod.sql_in # (optional) SQL file to create DB objects
newmod.py_in # (optional) Python code
...