public class HourlyTeamScore extends UserScore
This class extends UserScore. In addition to the concepts introduced there, it introduces windowing and element timestamps, and the use of Filter.by().
This pipeline processes data collected from gaming events in batch, building on UserScore but using fixed windows. It calculates the sum of scores per team for each window. Optionally, you can specify two timestamps: data timestamped before the first or after the second is filtered out.
This supports a model in which late-arriving data, collected after the intended analysis window has ended, can still be included, while data timestamped before the beginning of the analysis window is removed.
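The start/stop filtering described above can be sketched in plain Java, without the Beam SDK. This is a minimal sketch, not the pipeline's actual Filter.by() code: `GameEvent` and `TimestampFilterSketch` are hypothetical names, and treating both bounds as inclusive is an assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

/** Hypothetical stand-in for a parsed gaming event; not a Beam type. */
final class GameEvent {
    final String user;
    final String team;
    final int score;
    final long timestampMillis;

    GameEvent(String user, String team, int score, long timestampMillis) {
        this.user = user;
        this.team = team;
        this.score = score;
        this.timestampMillis = timestampMillis;
    }
}

final class TimestampFilterSketch {
    /** Keeps events timestamped within [startMillis, stopMillis]; inclusive bounds are an assumption. */
    static List<GameEvent> filter(List<GameEvent> events, long startMillis, long stopMillis) {
        // One predicate per bound, echoing the two separate filtering steps described above.
        Predicate<GameEvent> notBeforeStart = e -> e.timestampMillis >= startMillis;
        Predicate<GameEvent> notAfterStop = e -> e.timestampMillis <= stopMillis;
        Predicate<GameEvent> keep = notBeforeStart.and(notAfterStop);
        List<GameEvent> kept = new ArrayList<>();
        for (GameEvent e : events) {
            if (keep.test(e)) {
                kept.add(e);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<GameEvent> events = List.of(
            new GameEvent("alice", "red", 5, 100L),
            new GameEvent("bob", "blue", 3, 250L),
            new GameEvent("carol", "red", 7, 400L));
        // Only bob's event (t=250) lies inside [200, 300].
        System.out.println(filter(events, 200L, 300L).size()); // prints 1
    }
}
```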
By using windowing and adding element timestamps, we can do finer-grained analysis than with the
UserScore pipeline. However, our batch processing is high-latency, in that we don't get
results from plays at the beginning of the batch's time period until the batch is processed.
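To make the windowing idea concrete, here is a plain-Java sketch (again not Beam code) that assigns each event to a fixed window based on its timestamp and then sums scores per team within each window. The one-hour window size is an assumption suggested only by the class name, windows are assumed to be aligned to the epoch, and `FixedWindowTeamSums` and `Event` are hypothetical names.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class FixedWindowTeamSums {
    /** Hypothetical event: team name, score, and event timestamp in epoch milliseconds. */
    static final class Event {
        final String team;
        final int score;
        final long timestampMillis;

        Event(String team, int score, long timestampMillis) {
            this.team = team;
            this.score = score;
            this.timestampMillis = timestampMillis;
        }
    }

    /** Start of the epoch-aligned fixed window containing ts; floorMod also handles ts < 0. */
    static long windowStart(long ts, long windowSizeMillis) {
        return ts - Math.floorMod(ts, windowSizeMillis);
    }

    /** Maps window start -> (team -> summed score); TreeMaps keep both levels sorted. */
    static Map<Long, Map<String, Integer>> sumPerTeamPerWindow(List<Event> events, long windowSizeMillis) {
        Map<Long, Map<String, Integer>> sums = new TreeMap<>();
        for (Event e : events) {
            long start = windowStart(e.timestampMillis, windowSizeMillis);
            sums.computeIfAbsent(start, k -> new TreeMap<>()).merge(e.team, e.score, Integer::sum);
        }
        return sums;
    }

    public static void main(String[] args) {
        long hour = 60L * 60L * 1000L;
        List<Event> events = List.of(
            new Event("red", 5, 10L),
            new Event("red", 7, hour - 1),
            new Event("blue", 3, 20L),
            new Event("red", 2, hour + 5));
        System.out.println(sumPerTeamPerWindow(events, hour));
        // prints {0={blue=3, red=12}, 3600000={red=2}}
    }
}
```

Keying the outer map by window start keeps each window's team totals together, which matches the "sum of scores per team, for each window" output described above.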
To execute this pipeline using the Dataflow service, specify the pipeline configuration like this:
--project=YOUR_PROJECT_ID
--tempLocation=gs://YOUR_TEMP_DIRECTORY
--runner=BlockingDataflowRunner
--dataset=YOUR-DATASET
where the BigQuery dataset you specify must already exist.
Optionally include --input to specify the batch input file path.
To indicate a time after which data should be filtered out, include the
--stopMin argument. For example, --stopMin=2015-10-18-23-59 indicates that any data
timestamped after 23:59 PST on 2015-10-18 should not be included in the analysis.
To indicate a time before which data should be filtered out, include the --startMin argument.
If you're using the default input specified in UserScore,
"gs://dataflow-samples/game/gaming_data*.csv", then
--startMin=2015-11-16-16-10 --stopMin=2015-11-17-16-10 are good values.
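The --startMin and --stopMin values use a year-month-day-hour-minute format. Below is a minimal java.time sketch of turning such a value into an epoch timestamp; the pipeline's own parsing code may differ. `MinuteTimestampParser` is a hypothetical name, and reading "PST" as the `America/Los_Angeles` zone (which observes daylight saving time) is an assumption.

```java
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.format.DateTimeFormatter;

final class MinuteTimestampParser {
    // Format of the --startMin/--stopMin values: year-month-day-hour-minute.
    private static final DateTimeFormatter FORMAT = DateTimeFormatter.ofPattern("yyyy-MM-dd-HH-mm");
    // Assumption: "PST" means US Pacific local time, including daylight saving time.
    private static final ZoneId PACIFIC = ZoneId.of("America/Los_Angeles");

    /** Converts a value such as "2015-10-18-23-59" to epoch milliseconds. */
    static long toEpochMillis(String value) {
        return LocalDateTime.parse(value, FORMAT).atZone(PACIFIC).toInstant().toEpochMilli();
    }

    public static void main(String[] args) {
        long start = toEpochMillis("2015-11-16-16-10");
        long stop = toEpochMillis("2015-11-17-16-10");
        System.out.println(stop - start); // prints 86400000 (24 hours)
    }
}
```

With the values suggested for the default input, the analysis range covers exactly 24 hours.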
Nested classes inherited from UserScore: UserScore.ExtractAndSumScore

| Constructor and Description |
|---|
| `HourlyTeamScore()` |
| Modifier and Type | Method and Description |
|---|---|
| `protected static Map<String,WriteToBigQuery.FieldInfo<KV<String,Integer>>>` | `configureWindowedTableWrite()` Create a map of information that describes how to write pipeline output to BigQuery. |
| `static void` | `main(String[] args)` Run a batch pipeline to do windowed analysis of the data. |
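As a plain-Java illustration of the kind of map configureWindowedTableWrite() returns, the sketch below builds an ordered map from column name to a column type plus a function that extracts the column value from a (team, total) pair and its window start. `FieldInfo` and `TeamScore` here are hypothetical stand-ins, not `WriteToBigQuery.FieldInfo` or Beam's `KV`, and the column names `team`, `total_score`, and `window_start` are assumptions.

```java
import java.time.Instant;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.BiFunction;

final class WindowedTableConfigSketch {
    /** Hypothetical stand-in for WriteToBigQuery.FieldInfo: a column type plus a value extractor. */
    static final class FieldInfo<T> {
        final String bigQueryType;
        // Computes the column value from an element and its window start (epoch millis).
        final BiFunction<T, Long, Object> extractor;

        FieldInfo(String bigQueryType, BiFunction<T, Long, Object> extractor) {
            this.bigQueryType = bigQueryType;
            this.extractor = extractor;
        }
    }

    /** Hypothetical (team, total score) pair standing in for KV<String, Integer>. */
    static final class TeamScore {
        final String team;
        final int total;

        TeamScore(String team, int total) {
            this.team = team;
            this.total = total;
        }
    }

    static Map<String, FieldInfo<TeamScore>> configureWindowedTableWrite() {
        // LinkedHashMap preserves the column order of the output table.
        Map<String, FieldInfo<TeamScore>> config = new LinkedHashMap<>();
        config.put("team", new FieldInfo<TeamScore>("STRING", (kv, windowStart) -> kv.team));
        config.put("total_score", new FieldInfo<TeamScore>("INTEGER", (kv, windowStart) -> kv.total));
        config.put("window_start", new FieldInfo<TeamScore>("STRING",
            (kv, windowStart) -> Instant.ofEpochMilli(windowStart).toString()));
        return config;
    }

    /** Builds one output row by applying every column's extractor. */
    static Map<String, Object> toRow(TeamScore kv, long windowStart) {
        Map<String, Object> row = new LinkedHashMap<>();
        for (Map.Entry<String, FieldInfo<TeamScore>> e : configureWindowedTableWrite().entrySet()) {
            row.put(e.getKey(), e.getValue().extractor.apply(kv, windowStart));
        }
        return row;
    }

    public static void main(String[] args) {
        System.out.println(toRow(new TeamScore("red", 12), 0L));
        // prints {team=red, total_score=12, window_start=1970-01-01T00:00:00Z}
    }
}
```

Passing the window start into each extractor is what lets a windowed write record which window a team's sum belongs to.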
Methods inherited from UserScore: configureBigQueryWrite

`protected static Map<String,WriteToBigQuery.FieldInfo<KV<String,Integer>>> configureWindowedTableWrite()`

Create a map of information that describes how to write pipeline output to BigQuery. This map is passed to the WriteWindowedToBigQuery constructor to write team score sums, and includes information about window start time.

Copyright © 2016 The Apache Software Foundation. All rights reserved.