public class WindowedWordCount extends Object
This class, WindowedWordCount, is the last in a series of four successively more
detailed 'word count' examples. First take a look at MinimalWordCount,
WordCount, and DebuggingWordCount.
Basic concepts, also in the MinimalWordCount, WordCount, and DebuggingWordCount examples: Reading text files; counting a PCollection; writing to GCS; executing a Pipeline both locally and using a selected runner; defining DoFns; creating a custom aggregator; user-defined PTransforms; defining PipelineOptions.
New Concepts:
1. Unbounded and bounded pipeline input modes
2. Adding timestamps to data
3. Windowing
4. Re-using PTransforms over windowed PCollections
5. Writing to BigQuery
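Concepts 2 through 4 can be illustrated without the Beam SDK. The following is a minimal plain-Java sketch, not the code of this example class: it assumes each line already carries an event timestamp, places the line in a fixed window, and then applies the same word-counting logic separately within each window. The names PerWindowCountSketch, TimestampedLine, and countPerWindow are hypothetical and exist only for this illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Plain-Java illustration of per-window word counts; does not use the Beam SDK. */
public class PerWindowCountSketch {

    /** Hypothetical holder: a line of text plus an event timestamp in epoch milliseconds. */
    public static final class TimestampedLine {
        final long timestampMillis;
        final String text;

        public TimestampedLine(long timestampMillis, String text) {
            this.timestampMillis = timestampMillis;
            this.text = text;
        }
    }

    /** Returns word counts keyed by the start (epoch millis) of each fixed window. */
    public static Map<Long, Map<String, Long>> countPerWindow(
            List<TimestampedLine> lines, long windowMillis) {
        Map<Long, Map<String, Long>> counts = new TreeMap<>();
        for (TimestampedLine line : lines) {
            // Round the timestamp down to the start of its window.
            long windowStart =
                line.timestampMillis - Math.floorMod(line.timestampMillis, windowMillis);
            Map<String, Long> perWord =
                counts.computeIfAbsent(windowStart, k -> new TreeMap<>());
            // Split on runs of non-letter characters and count each word.
            for (String word : line.text.split("[^\\p{L}]+")) {
                if (!word.isEmpty()) {
                    perWord.merge(word, 1L, Long::sum);
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<TimestampedLine> lines = Arrays.asList(
            new TimestampedLine(0L, "king lear king"),
            new TimestampedLine(59_999L, "lear"),
            new TimestampedLine(60_000L, "king"));
        // One-minute windows: the first two lines share a window, the third does not.
        System.out.println(countPerWindow(lines, 60_000L));
    }
}
```

The counting loop itself knows nothing about windows; only the grouping key changes. That mirrors the idea of re-using an existing counting transform over windowed data.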
By default, the examples will run with the DirectRunner.
To change the runner, specify:
--runner=YOUR_SELECTED_RUNNER
See examples/java/README.md for instructions about how to configure different runners.
Optionally specify the input file path via:
--inputFile=gs://INPUT_PATH,
which defaults to gs://apache-beam-samples/shakespeare/kinglear.txt.
Specify an output BigQuery dataset and, optionally, a table for the output. If you don't
specify the table, one will be created for you using the job name. If you don't specify the
dataset, a dataset called beam_examples must already exist in your project.
--bigQueryDataset=YOUR-DATASET --bigQueryTable=YOUR-NEW-TABLE-NAME.
By default, the pipeline will do fixed windowing, on 1-minute windows. You can
change this interval by setting the --windowSize parameter, e.g. --windowSize=10
for 10-minute windows.
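To make the effect of the window size concrete, here is a minimal plain-Java sketch of how a timestamp maps to a fixed window of a given length. It is an illustration of the arithmetic only, under the assumption that windows are aligned to the epoch; it is not Beam's implementation, and the class and method names (FixedWindowSketch, windowStart, windowEnd) are hypothetical.

```java
import java.time.Duration;
import java.time.Instant;

/** Plain-Java illustration of fixed-window boundaries; does not use the Beam SDK. */
public class FixedWindowSketch {

    /** Returns the inclusive start of the epoch-aligned fixed window containing the timestamp. */
    public static Instant windowStart(Instant timestamp, Duration size) {
        long sizeMillis = size.toMillis();
        long ts = timestamp.toEpochMilli();
        // floorMod keeps the rounding correct for timestamps before the epoch.
        return Instant.ofEpochMilli(ts - Math.floorMod(ts, sizeMillis));
    }

    /** Returns the exclusive end of the same window. */
    public static Instant windowEnd(Instant timestamp, Duration size) {
        return windowStart(timestamp, size).plus(size);
    }

    public static void main(String[] args) {
        // Ten minutes, as with --windowSize=10.
        Duration size = Duration.ofMinutes(10);
        Instant t = Instant.parse("2024-01-01T12:34:56Z");
        System.out.println(windowStart(t, size) + " .. " + windowEnd(t, size));
    }
}
```

With a 10-minute size, every element whose timestamp falls between 12:30:00 (inclusive) and 12:40:00 (exclusive) lands in the same window, so a smaller window size yields more, shorter groups of counts.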
On a signal to terminate the process (CTRL-C), the example will try to cancel the running pipelines and then exit.
| Modifier and Type | Class and Description |
|---|---|
| static interface | WindowedWordCount.Options: Options supported by WindowedWordCount. |
| Constructor and Description |
|---|
| WindowedWordCount() |
public static void main(String[] args) throws IOException
Throws: IOException