public class DebuggingWordCount extends Object
This class, DebuggingWordCount, is the third in a series of four successively more
detailed 'word count' examples. You may first want to take a look at MinimalWordCount
and WordCount. After this example, see the WindowedWordCount pipeline, which
introduces additional concepts.
Basic concepts, also in the MinimalWordCount and WordCount examples: Reading text files; counting a PCollection; executing a Pipeline both locally and using a selected runner; defining DoFns.
New Concepts:
1. Logging to Cloud Logging
2. Controlling worker log levels
3. Creating a custom aggregator
4. Testing your Pipeline via PAssert
To execute this pipeline locally, specify general pipeline configuration:
--project=YOUR_PROJECT_ID
To change the runner, specify:
--runner=YOUR_SELECTED_RUNNER
To use the additional logging discussed below, specify:
--workerLogLevelOverrides={"org.apache.beam.examples":"DEBUG"}
Note that when you run via mvn exec, you may need to escape
the quotation marks as appropriate for your shell. For example, in bash:
mvn compile exec:java ... \
-Dexec.args="... \
--workerLogLevelOverrides={\\\"org.apache.beam.examples\\\":\\\"DEBUG\\\"}"
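The escaping shown above can be produced mechanically. The following is a minimal sketch, not part of the Beam examples: the class LogOverrideArgs and its two methods are hypothetical names introduced here for illustration. It builds the JSON value of --workerLogLevelOverrides from a map, then replaces each double quote with three backslashes and a quote, matching the bash form above. It assumes namespace and level strings contain no characters that need JSON escaping.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper for illustration; not part of Apache Beam.
class LogOverrideArgs {

  // Builds the flag as the pipeline should finally receive it, for example
  // --workerLogLevelOverrides={"org.apache.beam.examples":"DEBUG"}
  // Assumes keys and values need no JSON escaping.
  static String toFlag(Map<String, String> overrides) {
    StringJoiner json = new StringJoiner(",", "{", "}");
    for (Map.Entry<String, String> e : overrides.entrySet()) {
      json.add("\"" + e.getKey() + "\":\"" + e.getValue() + "\"");
    }
    return "--workerLogLevelOverrides=" + json;
  }

  // Replaces every double quote with three backslashes followed by a quote,
  // the form used inside a double-quoted -Dexec.args="..." value in bash.
  static String escapeForBashExecArgs(String flag) {
    return flag.replace("\"", "\\\\\\\"");
  }

  public static void main(String[] args) {
    Map<String, String> overrides = new LinkedHashMap<>();
    overrides.put("org.apache.beam.examples", "DEBUG");
    String flag = toFlag(overrides);
    System.out.println(flag);                        // unescaped form
    System.out.println(escapeForBashExecArgs(flag)); // form to paste into -Dexec.args
  }
}
```

Other shells have different quoting rules, so the escaped form is only meant for the bash case described above.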
Concept #2: Dataflow workers that execute user code are configured by default to log to Cloud Logging at the "INFO" log level and higher. You can override the log level for specific logging namespaces by specifying:
--workerLogLevelOverrides={"Name1":"Level1","Name2":"Level2",...}
For example, by specifying:
--workerLogLevelOverrides={"org.apache.beam.examples":"DEBUG"}
when executing this pipeline using the Dataflow service, Cloud Logging would contain
"DEBUG" or higher level logs for the org.apache.beam.examples package, in
addition to the default "INFO" or higher level logs for all other packages. The default
Dataflow worker logging configuration can also be overridden by specifying
--defaultWorkerLogLevel=<one of TRACE, DEBUG, INFO, WARN, ERROR>. For example,
by specifying --defaultWorkerLogLevel=DEBUG when executing this pipeline with
the Dataflow service, Cloud Logging would contain all "DEBUG" or higher level logs. Note
that changing the default worker log level to TRACE or DEBUG will significantly increase
the volume of log output.
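The interaction between a default level and a per-namespace override can be tried locally with java.util.logging. This is an analogy only and does not configure Dataflow workers: the class LogLevelOverrideDemo is a hypothetical name, and mapping "DEBUG" to Level.FINE is an assumption made for this sketch. The root logger stands in for --defaultWorkerLogLevel=INFO, and the org.apache.beam.examples logger stands in for the override.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Local analogy for illustration; it does not touch Dataflow worker settings.
class LogLevelOverrideDemo {
  // Strong references, because the LogManager holds loggers only weakly.
  static final Logger ROOT = Logger.getLogger("");
  static final Logger EXAMPLES = Logger.getLogger("org.apache.beam.examples");

  static void configure() {
    ROOT.setLevel(Level.INFO);     // stands in for the default "INFO" level
    EXAMPLES.setLevel(Level.FINE); // stands in for the "DEBUG" override (assumed mapping)
  }

  // True if a debug-level (FINE) message from the named logger would be accepted.
  static boolean debugEnabled(String loggerName) {
    return Logger.getLogger(loggerName).isLoggable(Level.FINE);
  }

  // True if an info-level message from the named logger would be accepted.
  static boolean infoEnabled(String loggerName) {
    return Logger.getLogger(loggerName).isLoggable(Level.INFO);
  }

  public static void main(String[] args) {
    configure();
    // A logger under the overridden namespace inherits the lower threshold.
    System.out.println(debugEnabled("org.apache.beam.examples.DebuggingWordCount"));
    // A logger outside that namespace (hypothetical name) keeps the default.
    System.out.println(debugEnabled("com.example.other"));
    System.out.println(infoEnabled("com.example.other"));
  }
}
```

Loggers below the overridden namespace inherit its level, while every other logger keeps the default, which mirrors the behavior described above for the two flags.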
The input file defaults to gs://apache-beam-samples/shakespeare/kinglear.txt
and can be overridden with --inputFile.
| Modifier and Type | Class and Description |
|---|---|
| static class | DebuggingWordCount.FilterTextFn: A DoFn that filters for a specific key based upon a regular expression. |
| static interface | DebuggingWordCount.WordCountOptions: Options supported by DebuggingWordCount. |
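The idea behind FilterTextFn, keeping only entries whose key matches a regular expression while counting matched and unmatched keys, can be sketched in plain Java without any Beam dependency. RegexKeyFilter below is a hypothetical stand-in, not the Beam class; the sample words and counts are made up for illustration, and the use of a full-string match is an assumption of this sketch.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

// Plain-Java stand-in for the filtering idea; not the Beam FilterTextFn itself.
class RegexKeyFilter {
  private final Pattern filter;
  long matched = 0;   // plays the role of a custom "matched" counter
  long unmatched = 0; // plays the role of a custom "unmatched" counter

  RegexKeyFilter(String regex) {
    this.filter = Pattern.compile(regex);
  }

  // Keeps entries whose key fully matches the pattern (assumed semantics).
  Map<String, Long> apply(Map<String, Long> counts) {
    Map<String, Long> kept = new LinkedHashMap<>();
    for (Map.Entry<String, Long> e : counts.entrySet()) {
      if (filter.matcher(e.getKey()).matches()) {
        matched++;
        kept.put(e.getKey(), e.getValue());
      } else {
        unmatched++;
      }
    }
    return kept;
  }

  public static void main(String[] args) {
    Map<String, Long> counts = new LinkedHashMap<>();
    counts.put("Flourish", 3L); // hypothetical sample data
    counts.put("stomach", 1L);
    counts.put("king", 5L);

    RegexKeyFilter f = new RegexKeyFilter("Flourish|stomach");
    System.out.println(f.apply(counts));
    System.out.println("matched=" + f.matched + " unmatched=" + f.unmatched);
  }
}
```

In a real pipeline the counters would be reported through the runner rather than held in fields, and the filtered output would be checked with PAssert instead of printed.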
| Constructor and Description |
|---|
| DebuggingWordCount() |
public static void main(String[] args)