This post is the second in a (hopefully) substantive and informative series about Apache Crunch, a framework that makes it easier for Java developers to write MapReduce programs for Hadoop.

In my first tutorial on Apache Crunch, I covered the benefits of Crunch and walked through some basic driver code to show what it can do at an entry level. In today’s entry, I’d like to walk you through getting Crunch installed on your local machine so you can start playing with it yourself. If you’ve done this already, you’ll love the next tutorial on Java objects and materialization (coming soon).

Let’s talk about what you need:


Install Eclipse or IntelliJ (but seriously though, if you’re just getting set up, get IntelliJ, it’s amazing). You’ve got this step down, I’m sure.

Install Maven. You can do this in less than 5 minutes on any home operating system (Windows, Mac, Linux) by following the installation steps on the Maven site. If you’re on Mac or Linux, this is an even simpler process:

Open Terminal (or equivalent command line). Enter these commands exactly (one at a time):

ruby -e "$(curl -fsSL"
brew install maven

The first command will ask you to confirm by pressing Enter; do so. It will also ask for your password; enter it. You should see an ‘Installation Successful’ statement upon completion. The second command installs Maven using Brew, which is what the first command installed. The download is about 10 MB, and the output should say as much when it completes.
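
Before moving on, it can help to confirm that the tools really are on your PATH. The snippet below is a minimal sketch and not part of the official Maven or Brew instructions. The helper name `have` is made up for illustration, and checking for `java` alongside `mvn` is my own assumption (Maven needs a Java installation to run).

```shell
# Hypothetical helper: succeeds if the given command is on the PATH.
have() {
  command -v "$1" >/dev/null 2>&1
}

# Report on the tools this tutorial relies on.
for tool in java mvn; do
  if have "$tool"; then
    echo "$tool: found at $(command -v "$tool")"
  else
    echo "$tool: not found"
  fi
done
```

If mvn shows up as found, running `mvn -version` prints the Maven and Java versions in use.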

Create a Crunch Project (comprehensive guide here). You can do this with a few short commands. Note: the groupId, package, and artifactId values used below can be customized. For example, you don’t have to call your package com.hadoopsters.bigdata; you can call it mycompany.banana.suitcase, but it’s best to follow Java package naming conventions. The same applies to crunchdemo: you can call it MyCrunchDemoSupreme, it’s up to you.

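  1. Open Terminal (or equivalent command line). Navigate to your development work area, such as an Eclipse workspace or a code project folder on your Mac.
  2. Enter this command exactly:
    mvn archetype:generate -Dfilter=org.apache.crunch:crunch-archetype
  3. When asked to choose an archetype, enter 1 (org.apache.crunch:crunch-archetype).
  4. When asked to choose a version, just hit ENTER to accept the default (the latest one listed, 0.13.0 in the output below).
  5. Enter your groupId (com.hadoopsters.bigdata) and then your artifactId (crunchdemo).
  6. Prompt will say 1.0-SNAPSHOT for the version, but just hit ENTER.
  7. Prompt will suggest your groupId as the package (com.hadoopsters.bigdata here), but just hit ENTER.
  8. Prompt will say “Y:”, but just hit ENTER.
  9. Your Crunch project should be created in the current folder, in a directory called crunchdemo (or whatever you named it).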

Expected output:

[INFO] Generating project in Interactive mode
[INFO] No archetype defined. Using maven-archetype-quickstart (org.apache.maven.archetypes:maven-archetype-quickstart:1.0)
Choose archetype:
1: remote -> org.apache.crunch:crunch-archetype (Create a basic, self-contained job for Apache Crunch.)
Choose a number or apply filter (format: [groupId:]artifactId, case sensitive contains): : 1
Choose org.apache.crunch:crunch-archetype version:
1: 0.4.0-incubating
2: 0.5.0-incubating
3: 0.6.0
4: 0.7.0
5: 0.7.0-hadoop2
6: 0.8.0
7: 0.8.0-hadoop2
8: 0.8.1
9: 0.8.1-hadoop2
10: 0.8.2
11: 0.8.2-hadoop2
12: 0.8.3
13: 0.8.3-hadoop2
14: 0.8.4
15: 0.8.4-hadoop2
16: 0.9.0
17: 0.9.0-hadoop2
18: 0.10.0
19: 0.10.0-hadoop2
20: 0.11.0
21: 0.11.0-hadoop2
22: 0.12.0
23: 0.12.0-hadoop2
24: 0.13.0
Choose a number: 24: 
Downloaded: (15 KB at 19.1 KB/sec)
Downloaded: (4 KB at 13.3 KB/sec)
Define value for property 'groupId': : com.hadoopsters.bigdata
Define value for property 'artifactId': : crunchdemo
Define value for property 'version': 1.0-SNAPSHOT: :
Define value for property 'package': com.hadoopsters.bigdata: :
Confirm properties configuration:
groupId: com.hadoopsters.bigdata
artifactId: crunchdemo
version: 1.0-SNAPSHOT
package: com.hadoopsters.bigdata
Y: :
[INFO] ----------------------------------------------------------------------------
[INFO] Using following parameters for creating project from Archetype: crunch-archetype:0.13.0
[INFO] ----------------------------------------------------------------------------
[INFO] Parameter: groupId, Value: com.hadoopsters.bigdata
[INFO] Parameter: artifactId, Value: crunchdemo
[INFO] Parameter: version, Value: 1.0-SNAPSHOT
[INFO] Parameter: package, Value: com.hadoopsters.bigdata
[INFO] Parameter: packageInPathFormat, Value: com/hadoopsters/bigdata
[INFO] Parameter: package, Value: com.hadoopsters.bigdata
[INFO] Parameter: version, Value: 1.0-SNAPSHOT
[INFO] Parameter: groupId, Value: com.hadoopsters.bigdata
[INFO] Parameter: artifactId, Value: crunchdemo
[INFO] project created from Archetype in dir: /Users/landon/Desktop/DevWorkspace/crunchdemo
[INFO] ------------------------------------------------------------------------
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 04:35 min
[INFO] Finished at: 2015-10-01T22:42:09-04:00
[INFO] Final Memory: 13M/120M
[INFO] ------------------------------------------------------------------------
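
If you’d rather skip the prompts (say, in a setup script), the same answers can be passed as properties up front. This is only a sketch under a few assumptions: it uses the archetype plugin’s batch mode (`-B`) and its `archetypeGroupId` / `archetypeArtifactId` / `archetypeVersion` properties instead of the `-Dfilter` form used above, and it isn’t taken from the Crunch guide. The function name `generate_crunch_project` and the `MVN` override are my own inventions for illustration.

```shell
# Hypothetical wrapper around the archetype command used in this tutorial.
# Usage: generate_crunch_project <groupId> <artifactId> [archetypeVersion]
# MVN can be overridden (e.g. MVN=echo) to preview the command without running Maven.
generate_crunch_project() {
  group_id="$1"
  artifact_id="$2"
  archetype_version="${3:-0.13.0}"   # the default picked in the session above

  "${MVN:-mvn}" archetype:generate -B \
    -DarchetypeGroupId=org.apache.crunch \
    -DarchetypeArtifactId=crunch-archetype \
    -DarchetypeVersion="$archetype_version" \
    -DgroupId="$group_id" \
    -DartifactId="$artifact_id" \
    -Dversion=1.0-SNAPSHOT \
    -Dpackage="$group_id"
}
```

Calling `generate_crunch_project com.hadoopsters.bigdata crunchdemo` should give you the same project as the interactive session. Prefixing the call with `MVN=echo` just prints the command so you can check it first.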

If everything went well, you should have a Crunch project ready to go! Let’s see what’s in it by importing the project to IntelliJ.
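
If you’d like a quick look from the terminal first, note that the Maven output reports packageInPathFormat as com/hadoopsters/bigdata, which is just the package name with the dots swapped for slashes. Assuming the archetype follows the standard Maven layout (sources under src/main/java), the sketch below lists whatever source files were generated. The helper name `package_to_path` is hypothetical.

```shell
# Hypothetical helper: turn a Java package name into its directory form,
# e.g. com.hadoopsters.bigdata -> com/hadoopsters/bigdata.
package_to_path() {
  printf '%s\n' "$1" | tr '.' '/'
}

project_dir="crunchdemo"                 # artifactId from the session above
package_name="com.hadoopsters.bigdata"   # package from the session above
src_dir="$project_dir/src/main/java/$(package_to_path "$package_name")"

# List the generated sources if the directory exists.
if [ -d "$src_dir" ]; then
  ls "$src_dir"
else
  echo "No sources found at $src_dir"
fi
```

Run it from the folder where you generated the project, and swap in your own artifactId and package if you customized them.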

Import into IntelliJ (as Maven project)

Now you have a Crunch project and can start playing with things in the MemPipeline on your local machine (or MapReduce and Spark if you’re so bold, though I’d recommend getting familiar with Crunch in local form first). Definitely walk through the WordCount example on the Apache Crunch website and see how it works!

Next time, we’ll write our first Crunch program in a MemPipeline, and explore more advanced topics like Java objects and materialization.
