Getting started with apache-pig: Installation or Setup


Example

Linux

Requirements (r0.16.0)

Mandatory

According to the current Apache Pig documentation, Pig supports only Unix and Windows operating systems.

  • Hadoop 0.23.x, 1.x, or 2.x
  • Java 1.6 or later, with the JAVA_HOME environment variable set to the Java installation directory

Optional

  • Python 2.7 or later (for Python UDFs)
  • Ant 1.8 (for builds)
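Before going further, you can confirm the Java requirement from the shell. The function below is a minimal sketch; its name, check_java_home, is hypothetical, and it only verifies that JAVA_HOME points at a directory containing an executable bin/java. It does not check the Java version or the Hadoop requirement.

```shell
#!/usr/bin/env bash
# Hypothetical helper: verifies that JAVA_HOME is set and points at a
# directory with an executable bin/java. It does NOT check the Java version.
check_java_home() {
  if [ -z "${JAVA_HOME:-}" ]; then
    echo "JAVA_HOME is not set"
    return 1
  fi
  if [ ! -x "$JAVA_HOME/bin/java" ]; then
    echo "no executable java under $JAVA_HOME/bin"
    return 1
  fi
  echo "ok"
}
```

For example, check_java_home && "$JAVA_HOME/bin/java" -version also prints the installed Java version once the check passes.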

Download the latest Pig release

Download the latest version of Pig from http://pig.apache.org/releases.html#Download

Installation

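mkdir /home/Pig
cd ~/Downloads/
tar zxvf pig-(latest-version).tar.gz
mv pig-(latest-version)/* /home/Pig/

Writing to /home/Pig usually requires root privileges, so you may need to run mkdir and mv with sudo. If you install Pig elsewhere, adjust the paths used in the configuration below.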
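If you script the installation, these steps can be wrapped in a function so the tarball name appears only once. This is a sketch under assumptions: install_pig is a hypothetical name, and it expects the archive to contain a single top-level directory, whose contents it copies into the destination. Copying with cp -R from dir/. also picks up hidden files that a plain mv dir/* would skip.

```shell
#!/usr/bin/env bash
# Hypothetical helper: unpack a Pig release tarball and copy its contents
# into DEST. Usage: install_pig <path/to/pig-X.Y.Z.tar.gz> <dest-dir>
install_pig() {
  local tarball=$1 dest=$2 tmp top
  tmp=$(mktemp -d) || return 1
  tar -xzf "$tarball" -C "$tmp" || { rm -rf "$tmp"; return 1; }
  # Expect exactly one top-level directory (e.g. pig-X.Y.Z/) in the archive.
  set -- "$tmp"/*
  if [ "$#" -ne 1 ] || [ ! -d "$1" ]; then
    echo "unexpected archive layout" >&2
    rm -rf "$tmp"
    return 1
  fi
  top=$1
  mkdir -p "$dest" || { rm -rf "$tmp"; return 1; }
  # Copy bin/, lib/, conf/, ... (including hidden files) into DEST.
  cp -R "$top"/. "$dest"/ || { rm -rf "$tmp"; return 1; }
  rm -rf "$tmp"
}
```

For example: install_pig ~/Downloads/pig-0.16.0.tar.gz /home/Pig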

Configuration

After installing Apache Pig, we have to configure it.

Open the .bashrc file

vim ~/.bashrc

In the .bashrc file, set the following variables:

export PIG_HOME=/home/Pig
export PATH=$PATH:$PIG_HOME/bin

Save the file and reload .bashrc in the current shell:

. ~/.bashrc
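If the setup is run from a script rather than by hand, appending the export lines unconditionally duplicates them on every run. The sketch below appends a line only when it is not already present; add_line_once is a hypothetical helper that takes the target file as an argument, so you can try it on a scratch file before pointing it at ~/.bashrc.

```shell
#!/usr/bin/env bash
# Hypothetical helper: append LINE to FILE only if FILE does not already
# contain exactly that line (-F matches literally, -x matches the whole line).
add_line_once() {
  local file=$1 line=$2
  touch "$file" || return 1
  grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Example (single quotes keep $PATH and $PIG_HOME unexpanded until .bashrc runs):
#   add_line_once ~/.bashrc 'export PIG_HOME=/home/Pig'
#   add_line_once ~/.bashrc 'export PATH=$PATH:$PIG_HOME/bin'
```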

Verifying Pig version

pig -version

If the installation is successful, the above command displays the installed Pig version number.

Testing Pig Installation

pig -h

This should display Pig's usage information, listing all available command-line options.

Pig is now installed locally, and you can run it in local mode with:

pig -x local

Connecting to Hadoop

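If Hadoop 1.x or 2.x is installed on the cluster and the HADOOP_HOME environment variable is set, you can connect Pig to Hadoop by adding the following line to .bashrc, as before:

export PIG_CLASSPATH=$HADOOP_HOME/conf

On Hadoop 2.x the configuration files are usually in $HADOOP_HOME/etc/hadoop, so use that directory instead.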
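To avoid hard-coding either directory layout, a small helper can pick whichever one exists. This is a sketch; hadoop_conf_dir is a hypothetical name, and it prefers etc/hadoop when both directories are present.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the Hadoop configuration directory under the
# given HADOOP_HOME, preferring the 2.x layout (etc/hadoop) over 1.x (conf).
hadoop_conf_dir() {
  local home=$1
  if [ -d "$home/etc/hadoop" ]; then
    echo "$home/etc/hadoop"
  elif [ -d "$home/conf" ]; then
    echo "$home/conf"
  else
    echo "no Hadoop configuration directory under $home" >&2
    return 1
  fi
}

# Example (the function must be defined before this line, e.g. in ~/.bashrc):
#   export PIG_CLASSPATH=$(hadoop_conf_dir "$HADOOP_HOME")
```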

Running Pig

Execution Modes

You can run Pig either with the pig command (bin/pig) or by running the jar file (java -cp pig.jar).

Pig scripts can be executed in three different modes:

  • Local Mode

     pig -x local ...
    
  • MapReduce Mode (default mode)

     pig -x mapreduce ...
          (or)
     pig ...
    
  • Tez Mode

     pig -x tez ...
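When switching between modes, for example testing a script locally before running it on the cluster, a thin wrapper can reject a misspelled mode name before Pig starts. The sketch below assumes pig is on your PATH; run_pig is a hypothetical name, and it only accepts the three modes listed above.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: run a Pig script in one of the execution modes
# listed above, rejecting any other mode name before Pig is started.
run_pig() {
  local mode=$1 script=$2
  case "$mode" in
    local|mapreduce|tez) ;;
    *)
      echo "unsupported mode: $mode (expected local, mapreduce or tez)" >&2
      return 2
      ;;
  esac
  if [ ! -f "$script" ]; then
    echo "script not found: $script" >&2
    return 1
  fi
  pig -x "$mode" "$script"
}
```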
    

Interactive Mode

Pig can be run in interactive mode using the Grunt shell. Pig Latin statements and commands can be entered interactively in this shell.

Example

$ pig -x <mode> <enter>
grunt>

Here, <mode> is one of the execution modes described in the previous section.

Batch Mode

Pig can also be executed in batch mode, where you provide a .pig file containing a list of Pig Latin statements and commands.

Example

$ pig -x <mode> <script.pig>

As with interactive mode, <mode> can be any of the execution modes described above.
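As a concrete batch-mode example, the sketch below writes a small word-count script and a two-line input file, then runs the script in local mode. The file names (input.txt, wordcount.pig) are illustrative, and the script is only executed if pig is found on the PATH.

```shell
#!/usr/bin/env bash
# Create a scratch directory with an input file and a word-count script
# (illustrative names), then run it in local mode if pig is available.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

printf 'hello pig\nhello hadoop\n' > input.txt

cat > wordcount.pig <<'EOF'
-- Load each line of input.txt as a single chararray field.
lines   = LOAD 'input.txt' AS (line:chararray);
-- Split each line into words, one word per tuple.
words   = FOREACH lines GENERATE FLATTEN(TOKENIZE(line)) AS word;
grouped = GROUP words BY word;
counts  = FOREACH grouped GENERATE group AS word, COUNT(words) AS n;
DUMP counts;
EOF

if command -v pig >/dev/null 2>&1; then
  pig -x local wordcount.pig
else
  echo "pig not found on PATH; script written to $workdir/wordcount.pig"
fi
```

When it runs, DUMP prints one tuple per distinct word, such as (hello,2); the order of the tuples is not guaranteed.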