Installing Apache MADlib on Apache HAWQ (incubating) ================================================================= Apache MADlib is a library of statistics and machine learning functions that can be installed in Apache HAWQ. MADlib is installed separately from the main HAWQ installation. For a description of the general MADlib installation process, refer to the MADlib installation guide for PostgreSQL and GPDB: https://cwiki.apache.org/confluence/display/MADLIB/Installation+Guide An installation script, hawq_install.sh, installs the MADlib RPM distribution on the HAWQ master and segment nodes. It installs the MADlib files but does not register MADlib functions with HAWQ databases. After running hawq_install.sh, use the madpack utility program to install, reinstall, or upgrade the MADlib database objects. After adding new segment nodes to HAWQ, MADlib must be installed on the new segment nodes. This should be done after the HAWQ binaries are properly installed and preferably before running gpexpand. Requirements ------------ Check that you have completed the following tasks before running the MADlib installation script: - Make sure you have rpm, gpssh, and gpscp in your PATH. - Make sure that you have HAWQ binaries installed properly on all master and segment nodes in your cluster (also new segment nodes when adding new nodes). - Add hawq_install.sh to your PATH. - Make sure the HOSTFILE lists all segment nodes (also new segment nodes when adding new nodes). To install MADlib: 1. Run the following command to install MADlib: hawq_install.sh -r -f [-s] [-d ] [--prefix ] Required Settings ----------------- -r | --rpm-path The path to the MADlib RPM file. -f | --host-file The file containing the host names of all new segments. Optional Settings ----------------- -s | --skip-localhost Set this option to prevent MADlib installation on the localhost. -d | --set-gphome Indicates the HAWQ installation path. If you do not specify one, the installer uses the value stored in the environment variable GPHOME. --prefix Indicates MADlib installation path. If not set, the default value ${GPHOME}/madlib is used. -h | -? | --help Displays help. Example ------- hawq_install.sh -r /home/gpadmin/madlib/madlib-1.11-Linux.rpm -f /usr/local/greenplum-db/hostfile