public class TfIdf extends Object
Concepts: joining data; side inputs; logging
To execute this pipeline locally, specify a local output file or output prefix on GCS:
--output=[YOUR_LOCAL_FILE | gs://YOUR_OUTPUT_PREFIX]
To change the runner, specify:
--runner=YOUR_SELECTED_RUNNER
See examples/java/README.md for instructions about how to configure different runners.
The default input is gs://apache-beam-samples/shakespeare/ and can be overridden with --input.
| Modifier and Type | Class and Description |
|---|---|
| static class | TfIdf.ComputeTfIdf: A transform containing a basic TF-IDF pipeline. |
| static class | TfIdf.ReadDocuments: Reads the documents at the provided URIs and returns all lines from the documents, each tagged with the document it came from. |
| static class | TfIdf.WriteTfIdf: A PTransform to write, in CSV format, a mapping from term and URI to score. |
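
The TF-IDF computation that TfIdf.ComputeTfIdf is described as performing can be illustrated without a pipeline runner. The following is a minimal in-memory sketch, not the Beam implementation: it assumes one common definition (term frequency = occurrences of the term divided by the number of words in the document; inverse document frequency = natural log of total documents over documents containing the term). The class name TfIdfSketch and the method compute are hypothetical, and the exact formula used by the example pipeline may differ.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper for illustration only; not part of the Beam examples.
class TfIdfSketch {

    /** Returns term -> (document name -> TF-IDF score) for pre-tokenized documents. */
    static Map<String, Map<String, Double>> compute(Map<String, List<String>> docs) {
        int totalDocs = docs.size();

        // Document frequency: the number of documents that contain each term.
        Map<String, Integer> docFreq = new HashMap<>();
        for (List<String> words : docs.values()) {
            for (String term : new HashSet<>(words)) {
                docFreq.merge(term, 1, Integer::sum);
            }
        }

        Map<String, Map<String, Double>> scores = new TreeMap<>();
        for (Map.Entry<String, List<String>> doc : docs.entrySet()) {
            List<String> words = doc.getValue();

            // Per-document count of each term.
            Map<String, Integer> counts = new HashMap<>();
            for (String term : words) {
                counts.merge(term, 1, Integer::sum);
            }

            for (Map.Entry<String, Integer> count : counts.entrySet()) {
                String term = count.getKey();
                // Assumed definitions: tf = count / words in doc, idf = ln(N / df).
                double tf = (double) count.getValue() / words.size();
                double idf = Math.log((double) totalDocs / docFreq.get(term));
                scores.computeIfAbsent(term, k -> new TreeMap<>())
                      .put(doc.getKey(), tf * idf);
            }
        }
        return scores;
    }
}
```

Under these assumed definitions, a term that appears in every document scores 0 everywhere, because ln(N / N) = 0. In the pipeline version the per-term and per-document counts would have to be joined, which is presumably where the joining and side-input concepts listed at the top come in.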
| Constructor and Description |
|---|
| TfIdf() |
| Modifier and Type | Method and Description |
|---|---|
| static Set<URI> | listInputDocuments(org.apache.beam.examples.complete.TfIdf.Options options): Lists documents contained beneath the options.input prefix/directory. |
| static void | main(String[] args) |
public static Set<URI> listInputDocuments(org.apache.beam.examples.complete.TfIdf.Options options) throws URISyntaxException, IOException
Lists documents contained beneath the options.input prefix/directory.
Throws: URISyntaxException, IOException
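
To make the contract of listInputDocuments concrete, here is a rough sketch of the local-directory case only. It is not the method's actual implementation: the real method takes a TfIdf.Options and also accepts GCS prefixes, which this sketch does not attempt. The class ListDocumentsSketch and the method listLocalDocuments are hypothetical names introduced for illustration.

```java
import java.io.File;
import java.io.IOException;
import java.net.URI;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper for illustration only; handles local directories, not gs:// prefixes.
class ListDocumentsSketch {

    /** Returns the URIs of the regular files directly beneath inputDir. */
    static Set<URI> listLocalDocuments(String inputDir) throws IOException {
        File dir = new File(inputDir);
        // listFiles() returns null if the path is not a directory or cannot be read.
        File[] entries = dir.listFiles();
        if (entries == null) {
            throw new IOException("Not a readable directory: " + inputDir);
        }
        Set<URI> uris = new HashSet<>();
        for (File entry : entries) {
            if (entry.isFile()) {
                uris.add(entry.toURI());
            }
        }
        return uris;
    }
}
```

Returning a Set<URI> mirrors the documented return type, so each document is identified by a URI that can later be used to tag the lines read from it.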