@Beta public abstract class JavaSparkExecutionContext extends Object implements RuntimeContext, Transactional, WorkflowInfoProvider
| Constructor and Description |
|---|
| `JavaSparkExecutionContext()` |
| Modifier and Type | Method and Description |
|---|---|
| `<K,V> org.apache.spark.api.java.JavaPairRDD<K,V>` | `fromDataset(String datasetName)` Creates a JavaPairRDD from the given Dataset. |
| `<K,V> org.apache.spark.api.java.JavaPairRDD<K,V>` | `fromDataset(String datasetName, Map<String,String> arguments)` Creates a JavaPairRDD from the given Dataset with the given set of Dataset arguments. |
| `abstract <K,V> org.apache.spark.api.java.JavaPairRDD<K,V>` | `fromDataset(String datasetName, Map<String,String> arguments, Iterable<? extends Split> splits)` Creates a JavaPairRDD from the given Dataset with the given set of Dataset arguments and a custom list of Splits. |
| `org.apache.spark.api.java.JavaRDD<StreamEvent>` | `fromStream(String streamName)` Creates a JavaRDD that represents all events from the given stream. |
| `<V> org.apache.spark.api.java.JavaPairRDD<Long,V>` | `fromStream(String streamName, Class<V> valueType)` Creates a JavaPairRDD that represents all events from the given stream. |
| `<T> org.apache.spark.api.java.JavaPairRDD<Long,co.cask.cdap.api.stream.GenericStreamEventData<T>>` | `fromStream(String streamName, co.cask.cdap.api.data.format.FormatSpecification formatSpec, Class<T> dataType)` Creates a JavaPairRDD that represents all events from the given stream. |
| `abstract <T> org.apache.spark.api.java.JavaPairRDD<Long,co.cask.cdap.api.stream.GenericStreamEventData<T>>` | `fromStream(String streamName, co.cask.cdap.api.data.format.FormatSpecification formatSpec, long startTime, long endTime, Class<T> dataType)` Creates a JavaPairRDD that represents data from the given stream for events in the given time range. |
| `abstract org.apache.spark.api.java.JavaRDD<StreamEvent>` | `fromStream(String streamName, long startTime, long endTime)` Creates a JavaRDD that represents events from the given stream in the given time range. |
| `abstract <K,V> org.apache.spark.api.java.JavaPairRDD<K,V>` | `fromStream(String streamName, long startTime, long endTime, Class<? extends co.cask.cdap.api.stream.StreamEventDecoder<K,V>> decoderClass, Class<K> keyType, Class<V> valueType)` Creates a JavaPairRDD that represents events from the given stream in the given time range. |
| `abstract <V> org.apache.spark.api.java.JavaPairRDD<Long,V>` | `fromStream(String streamName, long startTime, long endTime, Class<V> valueType)` Creates a JavaPairRDD that represents events from the given stream in the given time range. |
| `abstract TaskLocalizationContext` | `getLocalizationContext()` Returns a Serializable TaskLocalizationContext which can be used to retrieve files localized to task containers. |
| `abstract long` | `getLogicalStartTime()` Returns the logical start time of this Spark job. |
| `abstract Metrics` | `getMetrics()` |
| `abstract PluginContext` | `getPluginContext()` Returns a Serializable PluginContext which can be used to request plugin instances. |
| `abstract ServiceDiscoverer` | `getServiceDiscoverer()` Returns a Serializable ServiceDiscoverer for service discovery in a Spark program, which can be passed into the Spark program's closures. |
| `abstract SparkSpecification` | `getSpecification()` |
| `<K,V> void` | `saveAsDataset(org.apache.spark.api.java.JavaPairRDD<K,V> rdd, String datasetName)` Saves the given JavaPairRDD to the given Dataset. |
| `abstract <K,V> void` | `saveAsDataset(org.apache.spark.api.java.JavaPairRDD<K,V> rdd, String datasetName, Map<String,String> arguments)` Saves the given JavaPairRDD to the given Dataset with the given set of Dataset arguments. |
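Taken together, the methods above describe a read, transform, write cycle: `fromDataset` produces key/value pairs, Spark transforms them, and `saveAsDataset` persists the result. The CDAP and Spark classes themselves are not reproduced here, so the sketch below uses plain Java collections as stand-ins for a JavaPairRDD of (row key, line) pairs; the dataset contents and the word-count transformation are illustrative only.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class WordCountFlow {
    // Stand-in for reading a key/value Dataset into pair form
    // (sec.fromDataset("lines") would yield a JavaPairRDD instead).
    static List<Map.Entry<Long, String>> fromDataset(Map<Long, String> lines) {
        return new ArrayList<>(lines.entrySet());
    }

    // Stand-in transformation: count words across all values, as a
    // Spark job might with flatMap + reduceByKey on the JavaPairRDD.
    static Map<String, Integer> countWords(List<Map.Entry<Long, String>> pairs) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<Long, String> pair : pairs) {
            for (String word : pair.getValue().split("\\s+")) {
                counts.merge(word, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<Long, String> lines = new LinkedHashMap<>();
        lines.put(1L, "to be or not");
        lines.put(2L, "to be");
        // saveAsDataset(...) would persist this result back to a Dataset.
        System.out.println(countWords(fromDataset(lines))); // {be=2, not=1, or=1, to=2}
    }
}
```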
Methods inherited from class java.lang.Object: `clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait`

Methods inherited from interface RuntimeContext: `getAdmin, getApplicationSpecification, getNamespace, getRunId, getRuntimeArguments`

Methods inherited from interface Transactional: `execute`

Methods inherited from interface WorkflowInfoProvider: `getWorkflowInfo, getWorkflowToken`

`public abstract SparkSpecification getSpecification()`

Returns: the SparkSpecification used to configure this Spark job instance.

`public abstract long getLogicalStartTime()`

Returns the logical start time of this Spark job.
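`getLogicalStartTime()` returns a `long` in milliseconds. Assuming it follows the usual epoch-milliseconds convention, the value converts to a wall-clock instant as follows (the sample timestamps are illustrative):

```java
import java.time.Instant;

public class LogicalStartTime {
    // Convert a logical start time in epoch milliseconds (the form
    // assumed for getLogicalStartTime()) to an ISO-8601 instant string.
    static String toIso(long logicalStartTimeMillis) {
        return Instant.ofEpochMilli(logicalStartTimeMillis).toString();
    }

    public static void main(String[] args) {
        System.out.println(toIso(0L));             // 1970-01-01T00:00:00Z
        System.out.println(toIso(1451606400000L)); // 2016-01-01T00:00:00Z
    }
}
```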
`public abstract ServiceDiscoverer getServiceDiscoverer()`

Returns a Serializable ServiceDiscoverer for service discovery in a Spark program, which can be passed into the Spark program's closures.

Returns: a Serializable ServiceDiscoverer

`public abstract Metrics getMetrics()`

Returns a Serializable Metrics which can be used to emit custom metrics from the user's Spark program. It can also be passed into the Spark program's closures, so that workers can emit their own metrics.

Returns: a Serializable Metrics for Spark programs

`public abstract PluginContext getPluginContext()`

Returns a Serializable PluginContext which can be used to request plugin instances. The returned instance can also be used in the Spark program's closures.

Returns: a Serializable PluginContext

`public abstract TaskLocalizationContext getLocalizationContext()`
Returns a Serializable TaskLocalizationContext which can be used to retrieve files localized to task containers. The returned instance can also be used in the Spark program's closures.

Returns: a TaskLocalizationContext for the Spark program
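The getters above all stress that the returned instances are Serializable, because anything captured by a Spark closure must be serialized to reach the workers. The following self-contained sketch (plain JDK, no CDAP or Spark classes) shows a closure surviving a serialization round trip:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.function.Function;

public class SerializableClosure {
    // A function that is both Function and Serializable, so it can be
    // shipped to remote workers the way Spark ships closures. This mirrors
    // why the context hands out Serializable ServiceDiscoverer, Metrics,
    // and PluginContext instances: anything a closure captures must
    // survive serialization.
    interface SerFunction<T, R> extends Function<T, R>, Serializable {}

    // Serialize the function to bytes and read it back, as if it had
    // crossed the wire to a task container.
    static <T, R> Function<T, R> roundTrip(SerFunction<T, R> f) {
        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
                oos.writeObject(f);
            }
            try (ObjectInputStream ois =
                     new ObjectInputStream(new ByteArrayInputStream(bos.toByteArray()))) {
                @SuppressWarnings("unchecked")
                Function<T, R> copy = (Function<T, R>) ois.readObject();
                return copy;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // The deserialized copy still computes: the closure survived the trip.
        SerFunction<Integer, Integer> doubler = x -> x * 2;
        System.out.println(roundTrip(doubler).apply(21)); // prints 42
    }
}
```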
`public <K,V> org.apache.spark.api.java.JavaPairRDD<K,V> fromDataset(String datasetName)`

Creates a JavaPairRDD from the given Dataset.

Type Parameters: `K` - key type; `V` - value type
Parameters: `datasetName` - name of the Dataset
Returns: a JavaPairRDD instance that reads from the given Dataset
Throws: DatasetInstantiationException - if the Dataset doesn't exist

`public <K,V> org.apache.spark.api.java.JavaPairRDD<K,V> fromDataset(String datasetName, Map<String,String> arguments)`

Creates a JavaPairRDD from the given Dataset with the given set of Dataset arguments.

Type Parameters: `K` - key type; `V` - value type
Parameters: `datasetName` - name of the Dataset; `arguments` - arguments for the Dataset
Returns: a JavaPairRDD instance that reads from the given Dataset
Throws: DatasetInstantiationException - if the Dataset doesn't exist

`public abstract <K,V> org.apache.spark.api.java.JavaPairRDD<K,V> fromDataset(String datasetName, Map<String,String> arguments, @Nullable Iterable<? extends Split> splits)`

Creates a JavaPairRDD from the given Dataset with the given set of Dataset arguments and a custom list of Splits. Each Split will create a Partition in the JavaPairRDD.

Type Parameters: `K` - key type; `V` - value type
Parameters: `datasetName` - name of the Dataset; `arguments` - arguments for the Dataset; `splits` - list of Splits, or null to use the default splits provided by the Dataset
Returns: a JavaPairRDD instance that reads from the given Dataset
Throws: DatasetInstantiationException - if the Dataset doesn't exist

`public org.apache.spark.api.java.JavaRDD<StreamEvent> fromStream(String streamName)`

Creates a JavaRDD that represents all events from the given stream.

Parameters: `streamName` - name of the stream
Returns: a JavaRDD instance that reads from the given stream
Throws: DatasetInstantiationException - if the stream doesn't exist

`public abstract org.apache.spark.api.java.JavaRDD<StreamEvent> fromStream(String streamName, long startTime, long endTime)`

Creates a JavaRDD that represents events from the given stream in the given time range.

Parameters: `streamName` - name of the stream; `startTime` - the starting time of the stream to be read, in milliseconds (inclusive); passing 0 means start reading from the first event available in the stream; `endTime` - the ending time of the stream to be read, in milliseconds (exclusive); passing Long.MAX_VALUE means read up to the latest event available in the stream
Returns: a JavaRDD instance that reads from the given stream
Throws: DatasetInstantiationException - if the stream doesn't exist
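All the time-ranged `fromStream` overloads share the same boundary rules: `startTime` is inclusive, `endTime` is exclusive, and `0` / `Long.MAX_VALUE` act as "from the first event" / "up to the latest event" sentinels. A minimal sketch of that `[startTime, endTime)` filter (the timestamps are illustrative):

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class StreamTimeRange {
    // The documented semantics: startTime inclusive, endTime exclusive.
    static boolean inRange(long timestamp, long startTime, long endTime) {
        return timestamp >= startTime && timestamp < endTime;
    }

    // Keep only the timestamps that fall in [start, end).
    static List<Long> select(List<Long> timestamps, long start, long end) {
        return timestamps.stream()
                .filter(t -> inRange(t, start, end))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Long> ts = Arrays.asList(1000L, 2000L, 3000L);
        // 1000 is included (inclusive start), 3000 is excluded (exclusive end).
        System.out.println(select(ts, 1000L, 3000L));        // [1000, 2000]
        // The sentinel pair (0, Long.MAX_VALUE) selects everything.
        System.out.println(select(ts, 0L, Long.MAX_VALUE));  // [1000, 2000, 3000]
    }
}
```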
`public <V> org.apache.spark.api.java.JavaPairRDD<Long,V> fromStream(String streamName, Class<V> valueType)`

Creates a JavaPairRDD that represents all events from the given stream. The key in the resulting JavaPairRDD is the event timestamp. The stream body will be decoded as the given value type; currently Text, String and ByteWritable are supported.

Parameters: `streamName` - name of the stream; `valueType` - type to decode the stream body to
Returns: a JavaPairRDD instance that reads from the given stream
Throws: DatasetInstantiationException - if the stream doesn't exist

`public abstract <V> org.apache.spark.api.java.JavaPairRDD<Long,V> fromStream(String streamName, long startTime, long endTime, Class<V> valueType)`
Creates a JavaPairRDD that represents events from the given stream in the given time range. The key in the resulting JavaPairRDD is the event timestamp. The stream body will be decoded as the given value type; currently Text, String and ByteWritable are supported.

Parameters: `streamName` - name of the stream; `startTime` - the starting time of the stream to be read, in milliseconds (inclusive); passing 0 means start reading from the first event available in the stream; `endTime` - the ending time of the stream to be read, in milliseconds (exclusive); passing Long.MAX_VALUE means read up to the latest event available in the stream; `valueType` - type to decode the stream body to
Returns: a JavaPairRDD instance that reads from the given stream
Throws: DatasetInstantiationException - if the stream doesn't exist

`public abstract <K,V> org.apache.spark.api.java.JavaPairRDD<K,V> fromStream(String streamName, long startTime, long endTime, Class<? extends co.cask.cdap.api.stream.StreamEventDecoder<K,V>> decoderClass, Class<K> keyType, Class<V> valueType)`

Creates a JavaPairRDD that represents events from the given stream in the given time range. Each stream event will be decoded by an instance of the given StreamEventDecoder class.

Parameters: `streamName` - name of the stream; `startTime` - the starting time of the stream to be read, in milliseconds (inclusive); passing 0 means start reading from the first event available in the stream; `endTime` - the ending time of the stream to be read, in milliseconds (exclusive); passing Long.MAX_VALUE means read up to the latest event available in the stream; `decoderClass` - the StreamEventDecoder used to decode each StreamEvent; `keyType` - the type of the decoded key; `valueType` - the type of the decoded value
Returns: a JavaPairRDD instance that reads from the given stream
Throws: DatasetInstantiationException - if the stream doesn't exist
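The decoder-based overload delegates per-event decoding to a `StreamEventDecoder<K,V>`. The real interface lives in `co.cask.cdap.api.stream` and is not reproduced here; the sketch below uses a hypothetical stand-in interface (`EventDecoder`, with a simplified `Event` type) purely to show the pattern of decoding each event into one (key, value) pair:

```java
import java.nio.charset.StandardCharsets;
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.Map;

public class DecoderSketch {
    // Hypothetical stand-in for a stream event: a timestamp plus a raw body.
    static final class Event {
        final long timestamp;
        final byte[] body;
        Event(long timestamp, byte[] body) { this.timestamp = timestamp; this.body = body; }
    }

    // Hypothetical stand-in for StreamEventDecoder<K,V>: turn each event
    // into a single (key, value) pair, the shape the decoder-based
    // fromStream produces for its JavaPairRDD.
    interface EventDecoder<K, V> {
        Map.Entry<K, V> decode(Event event);
    }

    // Example decoder: key = event timestamp, value = body decoded as UTF-8.
    static final EventDecoder<Long, String> TEXT_DECODER =
        e -> new SimpleImmutableEntry<>(e.timestamp, new String(e.body, StandardCharsets.UTF_8));

    public static void main(String[] args) {
        Event e = new Event(1234L, "hello".getBytes(StandardCharsets.UTF_8));
        Map.Entry<Long, String> kv = TEXT_DECODER.decode(e);
        System.out.println(kv.getKey() + " -> " + kv.getValue()); // 1234 -> hello
    }
}
```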
`public <T> org.apache.spark.api.java.JavaPairRDD<Long,co.cask.cdap.api.stream.GenericStreamEventData<T>> fromStream(String streamName, co.cask.cdap.api.data.format.FormatSpecification formatSpec, Class<T> dataType)`

Creates a JavaPairRDD that represents all events from the given stream. The first entry in each pair is a Long representing the event timestamp, while the second entry is a GenericStreamEventData containing data decoded from the stream event body based on the given FormatSpecification.

Type Parameters: `T` - value type
Parameters: `streamName` - name of the stream; `formatSpec` - the FormatSpecification describing the format in the stream
Returns: a JavaPairRDD instance that reads from the given stream
Throws: DatasetInstantiationException - if the stream doesn't exist

`public abstract <T> org.apache.spark.api.java.JavaPairRDD<Long,co.cask.cdap.api.stream.GenericStreamEventData<T>> fromStream(String streamName, co.cask.cdap.api.data.format.FormatSpecification formatSpec, long startTime, long endTime, Class<T> dataType)`

Creates a JavaPairRDD that represents data from the given stream for events in the given time range. The first entry in each pair is a Long representing the event timestamp, while the second entry is a GenericStreamEventData containing data decoded from the stream event body based on the given FormatSpecification.

Type Parameters: `T` - value type
Parameters: `streamName` - name of the stream; `formatSpec` - the FormatSpecification describing the format in the stream; `startTime` - the starting time of the stream to be read, in milliseconds (inclusive); passing 0 means start reading from the first event available in the stream; `endTime` - the ending time of the stream to be read, in milliseconds (exclusive); passing Long.MAX_VALUE means read up to the latest event available in the stream
Returns: a JavaPairRDD instance that reads from the given stream
Throws: DatasetInstantiationException - if the stream doesn't exist

`public <K,V> void saveAsDataset(org.apache.spark.api.java.JavaPairRDD<K,V> rdd, String datasetName)`

Saves the given JavaPairRDD to the given Dataset.

Parameters: `rdd` - the JavaPairRDD to be saved; `datasetName` - name of the Dataset
Throws: DatasetInstantiationException - if the Dataset doesn't exist

`public abstract <K,V> void saveAsDataset(org.apache.spark.api.java.JavaPairRDD<K,V> rdd, String datasetName, Map<String,String> arguments)`

Saves the given JavaPairRDD to the given Dataset with the given set of Dataset arguments.

Parameters: `rdd` - the JavaPairRDD to be saved; `datasetName` - name of the Dataset; `arguments` - arguments for the Dataset
Throws: DatasetInstantiationException - if the Dataset doesn't exist

Copyright © 2016 Cask Data, Inc. Licensed under the Apache License, Version 2.0.