Installing MADlib on Pivotal HAWQ
=================================

MADlib is a library of statistics and machine learning functions that can be installed in HAWQ. MADlib is installed separately from the main HAWQ installation. For a description of the general MADlib installation process, refer to the MADlib installation guide for PostgreSQL and GPDB:
https://github.com/madlib/madlib/wiki/Installation-Guide

An installation script, hawq_install.sh, installs the MADlib RPM distribution on the HAWQ master and segment nodes. It installs the MADlib files but does not register MADlib functions with HAWQ databases. After running hawq_install.sh, use the madpack utility program to install, reinstall, or upgrade the MADlib database objects.

After adding new segment nodes to HAWQ, MADlib must be installed on the new segment nodes. Do this after the HAWQ binaries are properly installed on those nodes, and preferably before running gpexpand.

Upgrading HAWQ from 1.1 to 1.2
------------------------------

In HAWQ 1.1, a portion of MADlib v0.5 came preinstalled. These functions in their original form are incompatible with HAWQ 1.2 and are removed as part of the HAWQ 1.2 upgrade. Remove any dependencies on MADlib 0.5 from the installation before performing the HAWQ 1.2 upgrade. When the HAWQ upgrade is complete, install MADlib 1.5 or higher and then reinstall the MADlib database objects using the madpack utility.

Requirements
------------

Complete the following tasks before running the MADlib installation script:

- Make sure you have rpm, gpssh, and gpscp in your PATH.
- Make sure that the HAWQ binaries are installed properly on all master and segment nodes in your cluster (including the new segment nodes, when adding nodes).
- Add hawq_install.sh to your PATH.
- Make sure the HOSTFILE lists all segment nodes (including the new segment nodes, when adding nodes).

To install MADlib:

1. Run the following command to install MADlib:

       hawq_install.sh -r <RPM_FILEPATH> -f <HOSTFILE> [-s] [-d <GPHOME>] [--prefix <MADLIB_INSTALL_PATH>]

Required Settings
-----------------

-r | --rpm-path <RPM_FILEPATH>
    The path to the MADlib RPM file.

-f | --host-file <HOSTFILE>
    The file containing the host names of all new segments.

Optional Settings
-----------------

-s | --skip-localhost
    Set this option to prevent MADlib installation on the localhost.

-d | --set-gphome <GPHOME>
    The HAWQ installation path. If you do not specify one, the installer uses the value stored in the environment variable GPHOME.

--prefix <MADLIB_INSTALL_PATH>
    The MADlib installation path. If not set, the default value ${GPHOME}/madlib is used.

-h | -? | --help
    Displays help.

Example
-------

    hawq_install.sh -r /home/gpadmin/madlib/madlib-1.5-Linux.rpm -f /usr/local/greenplum-db/hostfile
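
The checks listed under Requirements can be scripted and run before the installer. The following is a minimal sketch, assuming a POSIX-compatible shell such as bash. The function name check_madlib_prereqs is hypothetical and is not part of hawq_install.sh, MADlib, or HAWQ. It only verifies that the required tools are in PATH and that the host file exists and is not empty.

```shell
#!/usr/bin/env bash
# Hypothetical preflight helper; not shipped with MADlib or HAWQ.
# Usage: check_madlib_prereqs <hostfile>

check_madlib_prereqs() {
    local hostfile="${1:-}"
    local missing=0
    local tool

    # Tools that the Requirements section says must be in PATH.
    for tool in rpm gpssh gpscp hawq_install.sh; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing from PATH: $tool" >&2
            missing=1
        fi
    done

    # The host file must exist and contain at least one byte.
    if [ -z "$hostfile" ] || [ ! -s "$hostfile" ]; then
        echo "host file is missing or empty: ${hostfile:-<none>}" >&2
        missing=1
    fi

    # 0 when every check passed, 1 otherwise.
    return "$missing"
}
```

For example, check_madlib_prereqs /usr/local/greenplum-db/hostfile returns a nonzero status and names each unmet requirement on standard error. It does not confirm that the HAWQ binaries are installed on every node, so that requirement still has to be checked separately.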