This directory contains a pre-built version of Hadoop for
demonstrating OpenJDK-8 on aarch64 systems.  This build of Hadoop
deliberately contains no native code.

Setup
=====

To set up the environment, source the env.sh script:

  $ . env.sh
  
To verify the installation, check that hadoop is on your PATH and
that it reports its version:

  $ which hadoop
  $ hadoop version
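
The same check can be scripted so that a missing PATH entry produces a
clear message. This is a minimal sketch; the helper name check_on_path
is hypothetical and is not part of this distribution:

```shell
# check_on_path is a hypothetical helper, not shipped with this build.
# It succeeds and prints the resolved location if the named command is
# on PATH; otherwise it prints a hint to stderr and returns 1.
check_on_path() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1 found at $(command -v "$1")"
    else
        echo "$1 not found on PATH; did you source env.sh?" >&2
        return 1
    fi
}
```

After sourcing env.sh, it can be used like this:

  $ check_on_path hadoop && hadoop version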

TeraSort Demo
=============

The goal of TeraSort is to sort a large amount of data as fast as
possible. The example comprises the following steps:

  1) Generating the input data via teragen
  2) Running the actual terasort on the input data
  3) Validating the sorted output data via teravalidate

Those discrete steps map to the following shell scripts:

  $ teragen       <n-gigabytes> <output-filename>
  $ terasort      <input-filename> <output-filename>
  $ teravalidate  <input-filename> <output-filename>

For example:

  $ teragen 1 teragen-1GB
  $ terasort teragen-1GB terasort-1GB-sorted
  $ teravalidate terasort-1GB-sorted terasort-1GB-validated
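
The three steps can also be chained so that a failing step stops the
run. The following is a sketch only; run_terasort_demo is a
hypothetical name, and the file names simply follow the pattern of the
example above:

```shell
# run_terasort_demo is a hypothetical wrapper, not shipped with this build.
# It runs teragen, terasort and teravalidate in order for the given size
# in gigabytes and stops at the first step that fails.
# The body is a subshell "( ... )" so its variables do not leak into
# the calling shell.
run_terasort_demo() (
    size_gb="$1"
    input="teragen-${size_gb}GB"
    sorted="terasort-${size_gb}GB-sorted"
    validated="terasort-${size_gb}GB-validated"

    teragen "$size_gb" "$input" &&
    terasort "$input" "$sorted" &&
    teravalidate "$sorted" "$validated"
)
```

With env.sh sourced, the 1 GB example above becomes:

  $ run_terasort_demo 1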

Available Demos
===============

  aggregatewordcount: An Aggregate based map/reduce program that counts the words in the input files.
  aggregatewordhist: An Aggregate based map/reduce program that computes the histogram of the words in the input files.
  dbcount: An example job that counts pageviews from a database.
  grep: A map/reduce program that counts the matches of a regex in the input.
  join: A job that effects a join over sorted, equally partitioned datasets.
  multifilewc: A job that counts words from several files.
  pentomino: A map/reduce tile laying program to find solutions to pentomino problems.
  pi: A map/reduce program that estimates Pi using the Monte Carlo method.
  randomtextwriter: A map/reduce program that writes 10GB of random textual data per node.
  randomwriter: A map/reduce program that writes 10GB of random data per node.
  secondarysort: An example defining a secondary sort to the reduce.
  sleep: A job that sleeps at each map and reduce task.
  sort: A map/reduce program that sorts the data written by the random writer.
  sudoku: A sudoku solver.
  teragen: Generates the input data for terasort.
  terasort: Runs the terasort.
  teravalidate: Checks the results of terasort.
  wordcount: A map/reduce program that counts the words in the input files.
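
How each demo is launched depends on how this build is packaged, which
this README does not state. In a stock Hadoop distribution the demos
live in the hadoop-mapreduce-examples jar and are started with
"hadoop jar <jar> <demo-name> [args]". The sketch below assumes that
layout; run_demo and EXAMPLES_JAR are hypothetical names, and you must
supply the location of the jar yourself:

```shell
# run_demo and EXAMPLES_JAR are hypothetical names, not part of this build.
# EXAMPLES_JAR must point at the hadoop-mapreduce-examples jar; its
# location in this distribution is an assumption left to the reader.
run_demo() {
    if [ -z "${EXAMPLES_JAR:-}" ]; then
        echo "EXAMPLES_JAR is not set" >&2
        return 1
    fi
    # Pass the demo name and its arguments through unchanged.
    hadoop jar "$EXAMPLES_JAR" "$@"
}
```

For instance, in stock Hadoop the pi demo takes the number of maps and
the number of samples per map:

  $ export EXAMPLES_JAR=<path-to-examples-jar>
  $ run_demo pi 10 100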