public class WordCount extends Object
WordCount is the second in a series of four successively more detailed
'word count' examples. If you have not already done so, look at MinimalWordCount first.
After this example, see the DebuggingWordCount pipeline, which introduces
additional concepts.
For a detailed walkthrough of this example, see http://beam.incubator.apache.org/use/walkthroughs/
Basic concepts, also in the MinimalWordCount example: Reading text files; counting a PCollection; writing to GCS.
New Concepts:
1. Executing a Pipeline both locally and using the selected runner
2. Using ParDo with static DoFns defined out-of-line
3. Building a composite transform
4. Defining your own pipeline options
Concept #1: You can execute this pipeline either locally or using the selected runner. The runner and the output location are now command-line options rather than hard-coded values, as they were in the MinimalWordCount example. To execute this pipeline locally, specify a local output file or an output prefix on GCS:
--output=[YOUR_LOCAL_FILE | gs://YOUR_OUTPUT_PREFIX]
To change the runner, specify:
--runner=YOUR_SELECTED_RUNNER
See examples/java/README.md for instructions about how to configure different runners.
The input file defaults to gs://apache-beam-samples/shakespeare/kinglear.txt
and can be overridden with --inputFile.
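The flag convention described above (`--name=value`, with a default for the input file) can be illustrated with a small plain-Java sketch. This is not how Beam parses its options: the class `ArgsSketch` and its `parse` method are hypothetical names introduced only for illustration, and treating `--output` as required is an assumption of the sketch.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper for illustration only; not part of Beam. */
final class ArgsSketch {
  // Default taken from the documentation above.
  static final String DEFAULT_INPUT =
      "gs://apache-beam-samples/shakespeare/kinglear.txt";

  /** Parses arguments of the form --name=value into a map. */
  static Map<String, String> parse(String[] args) {
    Map<String, String> options = new LinkedHashMap<>();
    options.put("inputFile", DEFAULT_INPUT);
    for (String arg : args) {
      int eq = arg.indexOf('=');
      // Require the "--" prefix and a non-empty name before '='.
      if (!arg.startsWith("--") || eq < 3) {
        throw new IllegalArgumentException("Expected --name=value, got: " + arg);
      }
      options.put(arg.substring(2, eq), arg.substring(eq + 1));
    }
    // Assumption of this sketch: an output location must always be given.
    if (!options.containsKey("output")) {
      throw new IllegalArgumentException("Missing required option --output");
    }
    return options;
  }

  public static void main(String[] args) {
    Map<String, String> options = parse(args);
    System.out.println("inputFile = " + options.get("inputFile"));
    System.out.println("output    = " + options.get("output"));
    System.out.println("runner    = " + options.get("runner"));
  }
}
```

Running the sketch with only `--output=/tmp/counts` leaves `inputFile` at its default and `runner` unset. Passing `--inputFile=...` replaces the default.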
| Modifier and Type | Class and Description |
|---|---|
| static class | WordCount.CountWords: A PTransform that converts a PCollection containing lines of text into a PCollection of formatted word counts. |
| static class | WordCount.FormatAsTextFn: A SimpleFunction that converts a Word and Count into a printable string. |
| static interface | WordCount.WordCountOptions: Options supported by WordCount. |
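To make the roles of CountWords and FormatAsTextFn concrete, the following plain-Java sketch mimics the same two steps on an in-memory list of lines, without any Beam dependency. All names here (`WordCountSketch`, `extractWords`, `countWords`, `formatAsText`) are hypothetical. Splitting on runs of non-letter characters and the `word: count` output format are assumptions of this sketch, not taken from the documentation above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical in-memory analogue of the pipeline; not Beam code. */
final class WordCountSketch {
  /** Assumed tokenization: split on runs of non-letter characters. */
  static List<String> extractWords(String line) {
    List<String> words = new ArrayList<>();
    for (String word : line.split("[^\\p{L}]+")) {
      if (!word.isEmpty()) { // a leading separator yields an empty token
        words.add(word);
      }
    }
    return words;
  }

  /** Analogue of CountWords: lines of text in, per-word counts out. */
  static Map<String, Long> countWords(List<String> lines) {
    // TreeMap keeps the output order deterministic for this sketch.
    Map<String, Long> counts = new TreeMap<>();
    for (String line : lines) {
      for (String word : extractWords(line)) {
        counts.merge(word, 1L, Long::sum);
      }
    }
    return counts;
  }

  /** Analogue of FormatAsTextFn: one word and its count to a printable string. */
  static String formatAsText(String word, long count) {
    return word + ": " + count;
  }

  /** Runs both steps and returns one formatted line per distinct word. */
  static List<String> run(List<String> lines) {
    List<String> formatted = new ArrayList<>();
    for (Map.Entry<String, Long> entry : countWords(lines).entrySet()) {
      formatted.add(formatAsText(entry.getKey(), entry.getValue()));
    }
    return formatted;
  }
}
```

The sketch is case-sensitive, so `To` and `to` are counted separately. The sorted map is used only to make the result reproducible; it does not imply that the pipeline's output is ordered.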
| Constructor and Description |
|---|
| WordCount() |
public static void main(String[] args)