MADlib Installer Notes (v0.1alpha)
*Updated on 1/3/2011*
*Updated version 12/21/2010*
These instructions assume you have Greenplum installed in `$GPHOME` and a `gpadmin` user account with sudo privileges.
- Make sure you have Python `setuptools` installed. If you're using Greenplum, you'll need to do this manually. Log into your `gpadmin` account, make sure you've sourced the Greenplum paths (`source $GPHOME/greenplum_path.sh`), and do the following:
```sh
cd /tmp
wget http://pypi.python.org/packages/2.6/s/setuptools/setuptools-0.6c11-py2.6.egg#md5=bfa92100bd772d5a213eedd356d64086
sh setuptools-0.6c11-py2.6.egg
```
- Install the Python libraries used by the MADlib installer:
```sh
export CFLAGS="-L$GPHOME/ext/python/lib/ -L$GPHOME/lib/"
$GPHOME/ext/python/bin/easy_install argparse hashlib pyyaml sqlparse psycopg2
```
- Change directory into the `madlib-contrib` root. You now have two choices: build an RPM for distribution and install it, or simply install the Python code you have. The former is closer to how installation will eventually work; the latter is easier for MADlib developers.

a. **Option 1: Build an RPM and install it.**
```sh
python setup.py bdist_rpm
cd dist
sudo rpm -Uvh madlib-0.01-1.noarch.rpm
sudo chown -R gpadmin $GPHOME/ext/python/lib/python2.6/site-packages/mad*
```
b. **Option 2: Install directly from the repo.**
```sh
python setup.py install
```
- In your newly installed `madpy` package, use `vi` (or your favorite editor) to edit the `madpy/Config.yml` file to reflect your setup. You'll likely only need to change `connect_args`, but you may want to change the other fields as well.
```sh
vi $GPHOME/ext/python/lib/python2.6/site-packages/madpy/Config.yml
```
- Make sure you have already created the appropriate MADlib schema in the appropriate database (the schema and database specified in `madpy/Config.yml` in the previous step), and that the PL/pgSQL and PL/PythonU languages are installed in that database:
```sql
CREATE SCHEMA <your_madlib_schema>;
CREATE LANGUAGE plpgsql;
CREATE LANGUAGE plpythonu;
```
- Now that the Python libraries are installed in the filesystem, it's time to build the database extensions and install them:
```sh
madpack install
```
- To undo the installation, uninstall the extensions from the database, and remove the RPM if you installed that way:
```sh
madpack uninstall
sudo rpm -e madlib
```
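Before running `madpack install`, it can help to confirm that Greenplum's Python can actually import the libraries installed earlier. The following is a minimal sketch, not part of madpy: the helper name `missing_modules` is hypothetical, and it assumes you run it with `$GPHOME/ext/python/bin/python`. Note that the `pyyaml` package is imported under the name `yaml`. The sketch avoids `importlib` so that it also runs on the Python 2.6 interpreter these notes assume.

```python
# Import names of the packages installed with easy_install above.
# The "pyyaml" distribution is imported as "yaml".
REQUIRED = ["argparse", "hashlib", "yaml", "sqlparse", "psycopg2"]


def missing_modules(names):
    """Return the subset of `names` that cannot be imported (hypothetical helper)."""
    missing = []
    for name in names:
        try:
            __import__(name)
        except ImportError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    missing = missing_modules(REQUIRED)
    if missing:
        print("Missing modules: " + ", ".join(missing))
    else:
        print("All installer dependencies are importable.")
```

If any module is reported missing, rerun the `easy_install` step with the Greenplum paths sourced.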
Information about packages is stored in two places:

- Installation configuration is in `madpy/Config.yml`. The format is fairly straightforward: you specify a unique name for your method (which should be the directory name under `methods` in the repo) and a desired port to install (which should be the directory name under `<yourmethod>/src`). If you like, you can also place a `Config.yml` file in a directory of your choosing and run `madpack -c /<path-to-dir> install`.
- Each port directory should have an `Install.yml` file that specifies the SQL scripts to roll "forward" (`fw`) and "backward" (`bw`). A `module` key is also required to hold the module name (it is currently unused, so this may change). See `sketch/src/extended_sql/pg_gp/Install.yml` for an example. The `depends` key holds a list of modules that this one depends on; those modules are installed before this one is attempted. See `profile/src/extended_sql/pg_gp/Install.yml` for an example.
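The `depends` key implies an install order in which every module comes after the modules it depends on, which is a topological sort. The sketch below shows that ordering under stated assumptions: each port's `Install.yml` is taken to be already parsed into a dict, and the module names and `fw` script names are hypothetical placeholders. It illustrates the technique only; madpack's own traversal may differ.

```python
def install_order(ports):
    """Return module names ordered so each module follows its dependencies.

    `ports` maps a module name to its parsed Install.yml contents (a dict).
    Raises ValueError on a dependency cycle or an unknown dependency.
    """
    order = []
    state = {}  # module name -> "visiting" or "done"

    def visit(name, path):
        if state.get(name) == "done":
            return
        if state.get(name) == "visiting":
            raise ValueError("dependency cycle: " + " -> ".join(path + [name]))
        if name not in ports:
            raise ValueError("unknown dependency: " + name)
        state[name] = "visiting"
        for dep in ports[name].get("depends", []):
            visit(dep, path + [name])
        state[name] = "done"
        order.append(name)  # appended only after all of its dependencies

    for name in sorted(ports):  # sorted for a deterministic result
        visit(name, [])
    return order


# Hypothetical parsed Install.yml contents; names and scripts are placeholders.
ports = {
    "alpha_mod": {"module": "alpha_mod", "fw": "alpha.sql", "depends": ["beta_mod"]},
    "beta_mod": {"module": "beta_mod", "fw": "beta.sql", "depends": ["gamma_mod"]},
    "gamma_mod": {"module": "gamma_mod", "fw": "gamma.sql"},
}

# Prints gamma_mod first, then beta_mod, then alpha_mod.
for name in install_order(ports):
    print(name + ": run " + ports[name]["fw"])
```

Depth-first search keeps the sketch short and reports a cycle or an unknown dependency as soon as it is reached.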
Note: the `madpack` script will attempt to run `make install` in the port directory, which you can use to generate the appropriate SQL install directory references via pgxs. This requires you to set up a few things:
1. The Makefile for your method should end with the line
```make
include config.mk
```
Do not create `config.mk` yourself; it is autogenerated (and deleted) during the madpack installation process. See `sketch/src/extended_sql/pg_gp/Makefile` for an example.
2. SQL scripts should use the string `MADLIB_SCHEMA` as the schema before any function or table names; it will be replaced by the value of `target_schema` in `Config.yml`. See `sketch/src/extended_sql/pg_gp/sketches.sql.in` for an example.
3. SQL scripts are now passed through the m4 preprocessor, which allows you to place conditional text into your SQL. For example, in [[sketches.sql.in|https://github.com/madlib/madlib-contrib/blob/master/methods/sketch/src/extended_sql/pg_gp/sketches.sql.in]] I have:
```sql
CREATE AGGREGATE MADLIB_SCHEMA.fmcount(anyelement)
(
    sfunc = MADLIB_SCHEMA.fmsketch_trans,
    stype = bytea,
    finalfunc = MADLIB_SCHEMA.fmsketch_getcount,
    ifdef(`GREENPLUM',`prefunc = MADLIB_SCHEMA.fmsketch_merge,')
    initcond = ''
);
```
You can now register macro definitions in the `Config.yml` file via the `prep_flags` key, as in `prep_flags: -DMADLIB -DGREENPLUM`.
Try running `madpack -h`, which provides fairly extensive help.
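The preprocessing rules described in items 2 and 3 above (replacing `MADLIB_SCHEMA`, and expanding m4 macros defined through `prep_flags`) can be sketched as follows. This is not madpack's implementation: the function names are hypothetical, and the sketch assumes that `prep_flags` is passed to `m4` as command-line arguments and that the schema placeholder is then replaced by plain text substitution.

```python
import re
import shlex
import subprocess


def substitute_schema(sql, target_schema):
    """Replace each standalone MADLIB_SCHEMA token with the target schema."""
    # A function replacement avoids re treating backslashes in the name as escapes.
    return re.sub(r"\bMADLIB_SCHEMA\b", lambda m: target_schema, sql)


def m4_command(prep_flags, path):
    """Build the m4 argument list, e.g. from "-DMADLIB -DGREENPLUM"."""
    return ["m4"] + shlex.split(prep_flags) + [path]


def preprocess(path, prep_flags, target_schema):
    """Run m4 over an .sql.in file, then fill in the schema name."""
    proc = subprocess.Popen(m4_command(prep_flags, path), stdout=subprocess.PIPE)
    out, _ = proc.communicate()
    if proc.returncode != 0:
        raise RuntimeError("m4 failed on " + path)
    return substitute_schema(out.decode("utf-8"), target_schema)
```

With `-DGREENPLUM` among the flags, the `ifdef` in the aggregate above expands to the `prefunc = MADLIB_SCHEMA.fmsketch_merge,` line; without it, that line expands to nothing, so the same source serves non-Greenplum builds.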
If you run into trouble, here are some manual steps to clean things out. I assume here that you specified a schema called `madlib`; if not, substitute your schema name below:
```
% psql <your database name>
psql (8.2.13)
Type "help" for help.
<your database name>=# drop schema madlib cascade;
<your database name>=# \q
% rm -rf $GPHOME/ext/python/lib/python2.6/site-packages/mad*
%
```
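Because `rm -rf .../mad*` deletes everything the glob matches, it can be worth listing the matches before deleting them. A minimal standard-library sketch of that dry run follows; the function name `remove_madlib_packages` is hypothetical, and the site-packages path is the Greenplum layout used throughout these notes.

```python
import glob
import os
import shutil


def remove_madlib_packages(site_packages, dry_run=True):
    """Remove entries matching site_packages/mad*; return the matched paths.

    With dry_run=True (the default) nothing is deleted, so you can check
    the list before committing to the removal.
    """
    matches = sorted(glob.glob(os.path.join(site_packages, "mad*")))
    if not dry_run:
        for path in matches:
            if os.path.isdir(path) and not os.path.islink(path):
                shutil.rmtree(path)  # package directories such as madpy/
            else:
                os.remove(path)  # plain files and symlinks
    return matches


if __name__ == "__main__":
    # If GPHOME is unset, this falls back to a relative path and matches nothing.
    gphome = os.environ.get("GPHOME", "")
    site = os.path.join(gphome, "ext/python/lib/python2.6/site-packages")
    for path in remove_madlib_packages(site, dry_run=True):
        print("would remove: " + path)
```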