- All Implemented Interfaces:
- Destroyable, PipelineConfigurable, StageLifecycle<BatchRuntimeContext>, Transformation<KeyValue<LongWritable,Text>,KeyValue<String,co.cask.cdap.api.data.format.StructuredRecord>>
public class ConnectorSource
extends BatchSource<LongWritable,Text,KeyValue<String,co.cask.cdap.api.data.format.StructuredRecord>>
Internal batch source used as a connector between pipeline phases.
Although this class extends BatchSource, it is not instantiated through the plugin framework; it is
created explicitly by the application.
The batch connector is simply a PartitionedFileSet in which each partition is named after the phase that wrote to it.
This allows multiple phases to use the same local PartitionedFileSet as a sink, while the source reads data
from all partitions.
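The partition-per-phase idea can be illustrated with a minimal, self-contained sketch. This is not the CDAP PartitionedFileSet API; the class and method names below are hypothetical stand-ins that model the behavior: each writing phase gets its own partition keyed by the phase name, and the source merges records across every partition on read.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory model of the connector dataset (not the CDAP API):
// one partition per writing phase, keyed by the phase name.
public class ConnectorModel {
  // partition name (the phase that wrote it) -> records in that partition
  private final Map<String, List<String>> partitions = new LinkedHashMap<>();

  // a phase writes only to its own partition
  void write(String phase, String record) {
    partitions.computeIfAbsent(phase, k -> new ArrayList<>()).add(record);
  }

  // the source reads across all partitions, regardless of which phase wrote them
  List<String> readAll() {
    List<String> all = new ArrayList<>();
    for (List<String> records : partitions.values()) {
      all.addAll(records);
    }
    return all;
  }

  public static void main(String[] args) {
    ConnectorModel connector = new ConnectorModel();
    connector.write("phase-1", "recordA");
    connector.write("phase-2", "recordB");
    System.out.println(connector.readAll()); // [recordA, recordB]
  }
}
```

The key property this models is that sinks stay isolated (each phase only touches its own partition) while the downstream source sees the union of everything written.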
This is done because we do not want the connector to show up as a plugin that users can select and use, and
because it relies on features not exposed in the ETL API (local workflow datasets).
TODO: improve the storage format. Each record is currently stored as JSON, which is obviously not ideal.
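To see why per-record JSON is wasteful, consider a naive encoder sketched below. This is an illustration only, assuming a record modeled as a plain Map rather than a StructuredRecord: every record repeats all of its field names as string keys, overhead a schema-aware binary format would avoid.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class JsonRecordSketch {
  // Naive JSON encoding of a record: field names are repeated in every record,
  // which is the per-record overhead the TODO above refers to.
  static String toJson(Map<String, Object> record) {
    StringBuilder sb = new StringBuilder("{");
    boolean first = true;
    for (Map.Entry<String, Object> e : record.entrySet()) {
      if (!first) {
        sb.append(",");
      }
      first = false;
      sb.append("\"").append(e.getKey()).append("\":");
      Object value = e.getValue();
      if (value instanceof String) {
        sb.append("\"").append(value).append("\"");
      } else {
        sb.append(value);
      }
    }
    return sb.append("}").toString();
  }

  public static void main(String[] args) {
    Map<String, Object> record = new LinkedHashMap<>();
    record.put("id", 1);
    record.put("name", "alice");
    System.out.println(toJson(record)); // {"id":1,"name":"alice"}
  }
}
```

With a fixed schema shared by all records in a partition, the keys ("id", "name") need not be stored per record at all, which is one direction an improved format could take.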