- All Implemented Interfaces:
- Destroyable, PipelineConfigurable, StageLifecycle<BatchRuntimeContext>, Transformation<KeyValue<String,co.cask.cdap.api.data.format.StructuredRecord>,KeyValue<NullWritable,Text>>
public class ConnectorSink
extends BatchSink<KeyValue<String,co.cask.cdap.api.data.format.StructuredRecord>,NullWritable,Text>
Internal batch sink used as a connector between pipeline phases.
Though this extends BatchSink, it will not be instantiated through the plugin framework; it is
created explicitly by the application. This is because we don't want it to show up as a plugin that
users can select and use, and also because it relies on features not exposed in the ETL API
(local workflow datasets).
The batch connector is just a PartitionedFileSet, where each partition is named after the phase that wrote to it.
This way, multiple phases can use the same local PartitionedFileSet as a sink, and the source will read data
from all partitions.
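The partition-per-phase scheme described above can be sketched in plain Java. This is a minimal in-memory stand-in, not the CDAP implementation: `PhasePartitionedStore` and its methods are hypothetical, modeling only the idea that each writing phase owns a partition keyed by its name and that the connector source reads the union of all partitions.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory stand-in for the PartitionedFileSet connector:
// each writing phase gets its own partition, keyed by the phase name.
class PhasePartitionedStore {
    private final Map<String, List<String>> partitions = new LinkedHashMap<>();

    // A sink phase appends records into its own partition.
    void write(String phaseName, String record) {
        partitions.computeIfAbsent(phaseName, k -> new ArrayList<>()).add(record);
    }

    // The connector source reads the union of all partitions,
    // so several upstream phases can feed one downstream phase.
    List<String> readAll() {
        List<String> all = new ArrayList<>();
        for (List<String> partition : partitions.values()) {
            all.addAll(partition);
        }
        return all;
    }
}
```

This mirrors why a single local dataset suffices as the hand-off point: writers never collide because partitions are namespaced by phase, and the reader does not need to know how many phases wrote.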
TODO: improve the storage format. It is currently JSON of the record, which is obviously not ideal.