This directory contains a pre-built version of Hadoop for demonstrating
OpenJDK-8 on aarch64 systems. This build of Hadoop deliberately
contains no native code.

Setup
=====

To set up the environment, source the env.sh script:

  $ . env.sh

You can verify that the installation is complete by checking that
hadoop is on your PATH:

  $ which hadoop
  $ hadoop version

Teragen Demo
============

The goal of TeraSort is to sort a large amount of data as fast as
possible. The example comprises the following steps:

  1) Generating the input data via teragen
  2) Running the actual terasort on the input data
  3) Validating the sorted output data via teravalidate

Those discrete steps map to the following shell scripts:

  $ teragen <n-gigabytes> <output-filename>
  $ terasort <input-filename> <output-filename>
  $ teravalidate <input-filename> <output-filename>

For example:

  $ teragen 1 teragen-1GB
  $ terasort teragen-1GB terasort-1GB-sorted
  $ teravalidate terasort-1GB-sorted terasort-1GB-validated

Available Demos
===============

  aggregatewordcount: An Aggregate based map/reduce program that
    counts the words in the input files.
  aggregatewordhist: An Aggregate based map/reduce program that
    computes the histogram of the words in the input files.
  dbcount: An example job that counts the pageviews from a database.
  grep: A map/reduce program that counts the matches of a regex in
    the input.
  join: A job that effects a join over sorted, equally partitioned
    datasets.
  multifilewc: A job that counts words from several files.
  pentomino: A map/reduce tile laying program to find solutions to
    pentomino problems.
  pi: A map/reduce program that estimates Pi using a Monte Carlo
    method.
  randomtextwriter: A map/reduce program that writes 10GB of random
    textual data per node.
  randomwriter: A map/reduce program that writes 10GB of random data
    per node.
  secondarysort: An example defining a secondary sort to the reduce.
  sleep: A job that sleeps at each map and reduce task.
  sort: A map/reduce program that sorts the data written by the
    random writer.
  sudoku: A sudoku solver.
  teragen: Generate data for the terasort.
  terasort: Run the terasort.
  teravalidate: Checking results of terasort.
  wordcount: A map/reduce program that counts the words in the input
    files.
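
The three TeraSort steps from the Teragen Demo section can be chained
so that a failure in one step stops the run. The following is a minimal
sketch, assuming env.sh has already been sourced so that teragen,
terasort and teravalidate are on the PATH. The function name
run_terasort_demo is hypothetical and is not shipped with this build;
the file names follow the pattern used in the example above.

```shell
#!/bin/sh
# Sketch only: run_terasort_demo is an illustrative helper, not part of
# this Hadoop build. It assumes teragen, terasort and teravalidate are
# on the PATH (for example after sourcing env.sh).

run_terasort_demo() {
    size_gb="$1"                                  # input size in gigabytes
    input="teragen-${size_gb}GB"                  # written by teragen
    sorted="terasort-${size_gb}GB-sorted"         # written by terasort
    validated="terasort-${size_gb}GB-validated"   # written by teravalidate

    # Each step runs only if the previous one succeeded.
    teragen "$size_gb" "$input" || return 1
    terasort "$input" "$sorted" || return 1
    teravalidate "$sorted" "$validated" || return 1

    echo "TeraSort demo finished: $validated"
}
```

After the function has been defined in the current shell, a 1 GB run
would be started with:

  $ run_terasort_demo 1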