public class TFIDF extends Object
Concepts: joining data; side inputs; logging
To execute this pipeline locally, specify general pipeline configuration:
--project=YOUR_PROJECT_ID
and a local output file or output prefix on GCS:
--output=[YOUR_LOCAL_FILE | gs://YOUR_OUTPUT_PREFIX]
To execute this pipeline using the Dataflow service, specify pipeline configuration:
--project=YOUR_PROJECT_ID
--stagingLocation=gs://YOUR_STAGING_DIRECTORY
--runner=BlockingDataflowPipelineRunner
and an output prefix on GCS:
--output=gs://YOUR_OUTPUT_PREFIX
The default input is gs://dataflow-samples/shakespeare/ and can be overridden with --input.
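The flags above all follow a `--name=value` shape. As a rough illustration of how such arguments resolve into settings, here is a minimal plain-Java sketch. It is not the option handling used by this class: `FlagSketch` and `parse` are hypothetical names, and treating `--output` as mandatory is an assumption drawn from the instructions above rather than something this page states.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, for illustration only; not part of the TFIDF class.
final class FlagSketch {
    // Default input documented on this page.
    static final String DEFAULT_INPUT = "gs://dataflow-samples/shakespeare/";

    /** Parses arguments of the form --name=value into a name-to-value map. */
    static Map<String, String> parse(String[] args) {
        Map<String, String> flags = new LinkedHashMap<>();
        flags.put("input", DEFAULT_INPUT); // used unless --input overrides it
        for (String arg : args) {
            int eq = arg.indexOf('=');
            // Require the "--" prefix and a non-empty name before '='.
            if (!arg.startsWith("--") || eq < 3) {
                throw new IllegalArgumentException("Expected --name=value, got: " + arg);
            }
            flags.put(arg.substring(2, eq), arg.substring(eq + 1));
        }
        // Assumption: an output location must always be supplied.
        if (!flags.containsKey("output")) {
            throw new IllegalArgumentException("--output is required");
        }
        return flags;
    }
}
```

Calling `FlagSketch.parse(new String[] {"--project=YOUR_PROJECT_ID", "--output=/tmp/tfidf"})` would leave `input` at the default location, while adding an `--input=...` argument replaces it.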
| Modifier and Type | Class and Description |
|---|---|
| static class | TFIDF.ComputeTfIdf: A transform containing a basic TF-IDF pipeline. |
| static class | TFIDF.ReadDocuments: Reads the documents at the provided URIs and returns all lines from the documents, each tagged with the document it came from. |
| static class | TFIDF.WriteTfIdf: A PTransform to write, in CSV format, a mapping from term and URI to score. |
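This page does not spell out the scoring formula, so the following is only a sketch of the textbook TF-IDF definition on in-memory collections, not the transform's actual implementation. It assumes term frequency is a word's count divided by the document's total word count, and inverse document frequency is the natural log of (number of documents / number of documents containing the word). `TfIdfSketch` and `compute` are hypothetical names.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory illustration; the real transform operates on PCollections.
final class TfIdfSketch {
    /**
     * Maps each term to its score per document URI.
     * Input: document URI -> the words of that document, in order.
     */
    static Map<String, Map<String, Double>> compute(Map<String, List<String>> wordsByUri) {
        int totalDocs = wordsByUri.size();

        // Document frequency: in how many documents does each term occur?
        Map<String, Integer> docCount = new HashMap<>();
        for (List<String> words : wordsByUri.values()) {
            for (String term : new HashSet<>(words)) {
                docCount.merge(term, 1, Integer::sum);
            }
        }

        Map<String, Map<String, Double>> scores = new TreeMap<>();
        for (Map.Entry<String, List<String>> doc : wordsByUri.entrySet()) {
            List<String> words = doc.getValue();

            // Raw count of each term within this document.
            Map<String, Integer> counts = new HashMap<>();
            for (String term : words) {
                counts.merge(term, 1, Integer::sum);
            }

            for (Map.Entry<String, Integer> c : counts.entrySet()) {
                double tf = (double) c.getValue() / words.size();
                double idf = Math.log((double) totalDocs / docCount.get(c.getKey()));
                scores.computeIfAbsent(c.getKey(), k -> new TreeMap<>())
                      .put(doc.getKey(), tf * idf);
            }
        }
        return scores;
    }
}
```

Under this definition a word that appears in every document scores 0, because its inverse document frequency is log(1). Each (term, URI, score) triple in the result corresponds to one row of the kind of term-and-URI-to-score mapping that TFIDF.WriteTfIdf is described as writing.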
| Constructor and Description |
|---|
| TFIDF() |
| Modifier and Type | Method and Description |
|---|---|
| static Set<URI> | listInputDocuments(org.apache.beam.runners.flink.examples.TFIDF.Options options): Lists documents contained beneath the options.input prefix/directory. |
| static void | main(String[] args) |
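To give a feel for what listing the documents beneath a directory into a Set<URI> can look like, here is a minimal sketch for the local-filesystem case only. It is an assumption-laden illustration, not the body of listInputDocuments: `ListDocumentsSketch` and `listLocalDocuments` are hypothetical names, the sketch takes a `Path` instead of the Options object, it does not handle gs:// prefixes, and it does not descend into subdirectories.

```java
import java.io.IOException;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Set;
import java.util.TreeSet;
import java.util.stream.Stream;

// Hypothetical local-only illustration; not the TFIDF implementation.
final class ListDocumentsSketch {
    /** Returns the URIs of the regular files directly inside the given directory. */
    static Set<URI> listLocalDocuments(Path directory) throws IOException {
        Set<URI> uris = new TreeSet<>();
        // Files.list returns a lazy stream that must be closed after use.
        try (Stream<Path> entries = Files.list(directory)) {
            entries.filter(Files::isRegularFile)
                   .forEach(p -> uris.add(p.toUri()));
        }
        return uris;
    }
}
```

Returning a sorted set keeps the result deterministic, which is convenient when each URI is later used to tag the lines read from that document.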
public static Set<URI> listInputDocuments(org.apache.beam.runners.flink.examples.TFIDF.Options options) throws URISyntaxException, IOException
Lists documents contained beneath the options.input prefix/directory.
Throws: URISyntaxException, IOException
Copyright © 2016 The Apache Software Foundation. All rights reserved.