@Beta public abstract class JavaSparkExecutionContextBase extends Object implements SchedulableProgramContext, RuntimeContext, Transactional, WorkflowInfoProvider, SecureStore, MetadataReader, MetadataWriter
| Constructor and Description |
|---|
| JavaSparkExecutionContextBase() |
| Modifier and Type | Method and Description |
|---|---|
| abstract co.cask.cdap.api.spark.dynamic.SparkInterpreter | createInterpreter() Creates a new instance of SparkInterpreter for Scala code compilation and interpretation. |
| abstract void | execute(int timeoutInSeconds, TxRunnable runnable) Transactions with a custom timeout are not supported in Spark. |
| <K,V> org.apache.spark.api.java.JavaPairRDD<K,V> | fromDataset(String datasetName) Creates a JavaPairRDD from the given Dataset. |
| <K,V> org.apache.spark.api.java.JavaPairRDD<K,V> | fromDataset(String datasetName, Map<String,String> arguments) Creates a JavaPairRDD from the given Dataset with the given set of dataset arguments. |
| abstract <K,V> org.apache.spark.api.java.JavaPairRDD<K,V> | fromDataset(String datasetName, Map<String,String> arguments, Iterable<? extends Split> splits) Creates a JavaPairRDD from the given Dataset with the given set of dataset arguments and a custom list of Splits. |
| <K,V> org.apache.spark.api.java.JavaPairRDD<K,V> | fromDataset(String namespace, String datasetName) Creates a JavaPairRDD from the given Dataset. |
| <K,V> org.apache.spark.api.java.JavaPairRDD<K,V> | fromDataset(String namespace, String datasetName, Map<String,String> arguments) Creates a JavaPairRDD from the given Dataset with the given set of dataset arguments. |
| abstract <K,V> org.apache.spark.api.java.JavaPairRDD<K,V> | fromDataset(String namespace, String datasetName, Map<String,String> arguments, Iterable<? extends Split> splits) Creates a JavaPairRDD from the given Dataset with the given set of dataset arguments and a custom list of Splits. |
| org.apache.spark.api.java.JavaRDD<StreamEvent> | fromStream(String streamName) Creates a JavaRDD that represents all events from the given stream. |
| <V> org.apache.spark.api.java.JavaPairRDD<Long,V> | fromStream(String streamName, Class<V> valueType) Creates a JavaPairRDD that represents all events from the given stream. |
| <T> org.apache.spark.api.java.JavaPairRDD<Long,co.cask.cdap.api.stream.GenericStreamEventData<T>> | fromStream(String streamName, co.cask.cdap.api.data.format.FormatSpecification formatSpec, Class<T> dataType) Creates a JavaPairRDD that represents all events from the given stream. |
| abstract <T> org.apache.spark.api.java.JavaPairRDD<Long,co.cask.cdap.api.stream.GenericStreamEventData<T>> | fromStream(String streamName, co.cask.cdap.api.data.format.FormatSpecification formatSpec, long startTime, long endTime, Class<T> dataType) Creates a JavaPairRDD that represents data from the given stream for events in the given time range. |
| abstract org.apache.spark.api.java.JavaRDD<StreamEvent> | fromStream(String streamName, long startTime, long endTime) Creates a JavaRDD that represents events from the given stream in the given time range. |
| abstract <K,V> org.apache.spark.api.java.JavaPairRDD<K,V> | fromStream(String streamName, long startTime, long endTime, Class<? extends co.cask.cdap.api.stream.StreamEventDecoder<K,V>> decoderClass, Class<K> keyType, Class<V> valueType) Creates a JavaPairRDD that represents events from the given stream in the given time range. |
| abstract <V> org.apache.spark.api.java.JavaPairRDD<Long,V> | fromStream(String streamName, long startTime, long endTime, Class<V> valueType) Creates a JavaPairRDD that represents events from the given stream in the given time range. |
| org.apache.spark.api.java.JavaRDD<StreamEvent> | fromStream(String namespace, String streamName) Creates a JavaRDD that represents all events from the given stream. |
| <V> org.apache.spark.api.java.JavaPairRDD<Long,V> | fromStream(String namespace, String streamName, Class<V> valueType) Creates a JavaPairRDD that represents all events from the given stream. |
| <T> org.apache.spark.api.java.JavaPairRDD<Long,co.cask.cdap.api.stream.GenericStreamEventData<T>> | fromStream(String namespace, String streamName, co.cask.cdap.api.data.format.FormatSpecification formatSpec, Class<T> dataType) Creates a JavaPairRDD that represents all events from the given stream. |
| abstract <T> org.apache.spark.api.java.JavaPairRDD<Long,co.cask.cdap.api.stream.GenericStreamEventData<T>> | fromStream(String namespace, String streamName, co.cask.cdap.api.data.format.FormatSpecification formatSpec, long startTime, long endTime, Class<T> dataType) Creates a JavaPairRDD that represents data from the given stream for events in the given time range. |
| abstract org.apache.spark.api.java.JavaRDD<StreamEvent> | fromStream(String namespace, String streamName, long startTime, long endTime) Creates a JavaRDD that represents events from the given stream in the given time range. |
| abstract <K,V> org.apache.spark.api.java.JavaPairRDD<K,V> | fromStream(String namespace, String streamName, long startTime, long endTime, Class<? extends co.cask.cdap.api.stream.StreamEventDecoder<K,V>> decoderClass, Class<K> keyType, Class<V> valueType) Creates a JavaPairRDD that represents events from the given stream in the given time range. |
| abstract <V> org.apache.spark.api.java.JavaPairRDD<Long,V> | fromStream(String namespace, String streamName, long startTime, long endTime, Class<V> valueType) Creates a JavaPairRDD that represents events from the given stream in the given time range. |
| abstract TaskLocalizationContext | getLocalizationContext() Returns a Serializable TaskLocalizationContext which can be used to retrieve files localized to task containers. |
| abstract long | getLogicalStartTime() Returns the logical start time of this Spark job. |
| abstract MessagingContext | getMessagingContext() Returns a MessagingContext which can be used to interact with the transactional messaging system. |
| abstract Metrics | getMetrics() Returns a Serializable Metrics which can be used to emit custom metrics from the user's Spark program. |
| abstract PluginContext | getPluginContext() Returns a Serializable PluginContext which can be used to request plugin instances. |
| abstract SecureStore | getSecureStore() Returns a Serializable SecureStore which can be used to access secure store data. |
| abstract ServiceDiscoverer | getServiceDiscoverer() Returns a Serializable ServiceDiscoverer for service discovery in a Spark program, which can be passed into the Spark program's closures. |
| abstract SparkExecutionContext | getSparkExecutionContext() Returns the underlying SparkExecutionContext used by this object. |
| abstract SparkSpecification | getSpecification() |
| <K,V> void | saveAsDataset(org.apache.spark.api.java.JavaPairRDD<K,V> rdd, String datasetName) Saves the given JavaPairRDD to the given Dataset. |
| abstract <K,V> void | saveAsDataset(org.apache.spark.api.java.JavaPairRDD<K,V> rdd, String datasetName, Map<String,String> arguments) Saves the given JavaPairRDD to the given Dataset with the given set of dataset arguments. |
| <K,V> void | saveAsDataset(org.apache.spark.api.java.JavaPairRDD<K,V> rdd, String namespace, String datasetName) Saves the given JavaPairRDD to the given Dataset. |
| abstract <K,V> void | saveAsDataset(org.apache.spark.api.java.JavaPairRDD<K,V> rdd, String namespace, String datasetName, Map<String,String> arguments) Saves the given JavaPairRDD to the given Dataset with the given set of dataset arguments. |
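Taken together, a typical use of this context is inside a Spark program's main class: read a Dataset as a JavaPairRDD, transform it, and save the result. A minimal sketch, assuming the concrete JavaSparkExecutionContext is handed to a JavaSparkMain implementation; the dataset names and key/value types here are illustrative, not part of this class's contract:

```java
import java.util.Locale;

import co.cask.cdap.api.spark.JavaSparkExecutionContext;
import co.cask.cdap.api.spark.JavaSparkMain;
import org.apache.spark.api.java.JavaPairRDD;

// Hypothetical program: reads the "purchases" dataset, normalizes the
// string values, and writes the result to the "cleaned" dataset.
public class CleanupSpark implements JavaSparkMain {
  @Override
  public void run(JavaSparkExecutionContext sec) throws Exception {
    // fromDataset infers <K,V> from the call site; the actual types must
    // match what the underlying dataset exposes for batch reads.
    JavaPairRDD<byte[], String> input = sec.fromDataset("purchases");
    JavaPairRDD<byte[], String> cleaned =
        input.mapValues(v -> v.trim().toLowerCase(Locale.ROOT));
    sec.saveAsDataset(cleaned, "cleaned");
  }
}
```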
Methods inherited from class java.lang.Object:
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Methods inherited from interface SchedulableProgramContext:
getTriggeringScheduleInfo

Methods inherited from interface RuntimeContext:
getAdmin, getApplicationSpecification, getClusterName, getDataTracer, getNamespace, getRunId, getRuntimeArguments

Methods inherited from interface Transactional:
execute

Methods inherited from interface WorkflowInfoProvider:
getWorkflowInfo, getWorkflowToken

Methods inherited from interface SecureStore:
getSecureData, listSecureData

Methods inherited from interface MetadataReader:
getMetadata, getMetadata

Methods inherited from interface MetadataWriter:
addProperties, addTags, addTags, removeMetadata, removeProperties, removeProperties, removeTags, removeTags

public abstract SparkSpecification getSpecification()
Returns: the specification used to configure this Spark job instance.

public abstract long getLogicalStartTime()
Returns the logical start time of this Spark job.

public abstract ServiceDiscoverer getServiceDiscoverer()
Returns a Serializable ServiceDiscoverer for service discovery in a Spark program, which can be passed into the Spark program's closures.
Returns: a Serializable ServiceDiscoverer.

public abstract Metrics getMetrics()
Returns a Serializable Metrics which can be used to emit custom metrics from the user's Spark program. It can also be passed into the Spark program's closures, so that workers can emit their own metrics.
Returns: a Serializable Metrics for Spark programs.

public abstract PluginContext getPluginContext()
Returns a Serializable PluginContext which can be used to request plugin instances. The returned instance can also be used in the Spark program's closures.
Returns: a Serializable PluginContext.

public abstract SecureStore getSecureStore()
Returns a Serializable SecureStore which can be used to access secure store data. The returned instance can also be used in the Spark program's closures.
Returns: a Serializable SecureStore.

public abstract MessagingContext getMessagingContext()
Returns a MessagingContext which can be used to interact with the transactional messaging system. Currently the returned instance can only be used in the Spark driver process.
Returns: a MessagingContext.

public abstract TaskLocalizationContext getLocalizationContext()
Returns a Serializable TaskLocalizationContext which can be used to retrieve files localized to task containers. The returned instance can also be used in the Spark program's closures.
Returns: a TaskLocalizationContext for the Spark program.
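The Serializable qualifiers on the getters above matter in practice: instances such as Metrics and ServiceDiscoverer can be captured in Spark closures and used on executors, while the MessagingContext is usable only in the driver. A sketch of the difference, assuming sec is the concrete context passed to a Spark program; the stream and metric names are invented:

```java
// Serializable: safe to capture in executor-side closures.
Metrics metrics = sec.getMetrics();

JavaRDD<StreamEvent> events = sec.fromStream("purchases");
// Each executor task emits its own metrics through the captured instance.
events.foreach(event -> metrics.count("events.seen", 1));

// Driver-only: do not ship this into a closure.
MessagingContext messaging = sec.getMessagingContext();
```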
public <K,V> org.apache.spark.api.java.JavaPairRDD<K,V> fromDataset(String datasetName)
Creates a JavaPairRDD from the given Dataset.
Type parameters: K - key type; V - value type.
Parameters: datasetName - name of the dataset.
Returns: a new JavaPairRDD instance that reads from the given dataset.
Throws: DatasetInstantiationException - if the dataset doesn't exist.

public <K,V> org.apache.spark.api.java.JavaPairRDD<K,V> fromDataset(String namespace, String datasetName)
Creates a JavaPairRDD from the given Dataset.
Type parameters: K - key type; V - value type.
Parameters: namespace - namespace in which the dataset exists; datasetName - name of the dataset.
Returns: a new JavaPairRDD instance that reads from the given dataset.
Throws: DatasetInstantiationException - if the dataset doesn't exist.

public <K,V> org.apache.spark.api.java.JavaPairRDD<K,V> fromDataset(String datasetName, Map<String,String> arguments)
Creates a JavaPairRDD from the given Dataset with the given set of dataset arguments.
Type parameters: K - key type; V - value type.
Parameters: datasetName - name of the dataset; arguments - arguments for the dataset.
Returns: a new JavaPairRDD instance that reads from the given dataset.
Throws: DatasetInstantiationException - if the dataset doesn't exist.

public <K,V> org.apache.spark.api.java.JavaPairRDD<K,V> fromDataset(String namespace, String datasetName, Map<String,String> arguments)
Creates a JavaPairRDD from the given Dataset with the given set of dataset arguments.
Type parameters: K - key type; V - value type.
Parameters: namespace - namespace in which the dataset exists; datasetName - name of the dataset; arguments - arguments for the dataset.
Returns: a new JavaPairRDD instance that reads from the given dataset.
Throws: DatasetInstantiationException - if the dataset doesn't exist.

public abstract <K,V> org.apache.spark.api.java.JavaPairRDD<K,V> fromDataset(String datasetName, Map<String,String> arguments, @Nullable Iterable<? extends Split> splits)
Creates a JavaPairRDD from the given Dataset with the given set of dataset arguments and a custom list of Splits. Each Split will create a Partition in the JavaPairRDD.
Type parameters: K - key type; V - value type.
Parameters: datasetName - name of the dataset; arguments - arguments for the dataset; splits - list of Splits, or null to use the default splits provided by the dataset.
Returns: a new JavaPairRDD instance that reads from the given dataset.
Throws: DatasetInstantiationException - if the dataset doesn't exist.

public abstract <K,V> org.apache.spark.api.java.JavaPairRDD<K,V> fromDataset(String namespace, String datasetName, Map<String,String> arguments, @Nullable Iterable<? extends Split> splits)
Creates a JavaPairRDD from the given Dataset with the given set of dataset arguments and a custom list of Splits. Each Split will create a Partition in the JavaPairRDD.
Type parameters: K - key type; V - value type.
Parameters: namespace - namespace in which the dataset exists; datasetName - name of the dataset; arguments - arguments for the dataset; splits - list of Splits, or null to use the default splits provided by the dataset.
Returns: a new JavaPairRDD instance that reads from the given dataset.
Throws: DatasetInstantiationException - if the dataset doesn't exist.
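A sketch of the dataset read variants. The dataset name, argument key, and value types below are illustrative assumptions; which arguments are valid depends on the dataset type being read:

```java
// Hypothetical runtime arguments for the dataset.
Map<String, String> args = new HashMap<>();
args.put("scan.prefix", "2018-");  // invented key, not a CDAP constant

// Read with runtime arguments; key/value types must match the dataset.
JavaPairRDD<byte[], Row> rows = sec.fromDataset("transactions", args);

// Passing null for splits falls back to the dataset's default splits;
// a custom Iterable<? extends Split> maps one Split to one RDD partition.
JavaPairRDD<byte[], Row> sameRows = sec.fromDataset("transactions", args, null);
```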
public org.apache.spark.api.java.JavaRDD<StreamEvent> fromStream(String streamName)
Creates a JavaRDD that represents all events from the given stream.
Parameters: streamName - name of the stream.
Returns: a new JavaRDD instance that reads from the given stream.
Throws: DatasetInstantiationException - if the stream doesn't exist.

public org.apache.spark.api.java.JavaRDD<StreamEvent> fromStream(String namespace, String streamName)
Creates a JavaRDD that represents all events from the given stream.
Parameters: namespace - namespace in which the stream exists; streamName - name of the stream.
Returns: a new JavaRDD instance that reads from the given stream.
Throws: DatasetInstantiationException - if the stream doesn't exist.

public abstract org.apache.spark.api.java.JavaRDD<StreamEvent> fromStream(String streamName, long startTime, long endTime)
Creates a JavaRDD that represents events from the given stream in the given time range.
Parameters: streamName - name of the stream; startTime - starting time of the stream to read, in milliseconds (inclusive); passing 0 means start from the first event available in the stream; endTime - ending time of the stream to read, in milliseconds (exclusive); passing Long.MAX_VALUE means read up to the latest event available in the stream.
Returns: a new JavaRDD instance that reads from the given stream.
Throws: DatasetInstantiationException - if the stream doesn't exist.

public abstract org.apache.spark.api.java.JavaRDD<StreamEvent> fromStream(String namespace, String streamName, long startTime, long endTime)
Creates a JavaRDD that represents events from the given stream in the given time range.
Parameters: namespace - namespace in which the stream exists; streamName - name of the stream; startTime - starting time of the stream to read, in milliseconds (inclusive); passing 0 means start from the first event available in the stream; endTime - ending time of the stream to read, in milliseconds (exclusive); passing Long.MAX_VALUE means read up to the latest event available in the stream.
Returns: a new JavaRDD instance that reads from the given stream.
Throws: DatasetInstantiationException - if the stream doesn't exist.
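The start/end semantics above (inclusive start, exclusive end, with 0 and Long.MAX_VALUE as open bounds) make relative windows easy to express. A sketch, assuming sec is the execution context and a stream named "purchases" exists:

```java
// Read only the last hour of events.
long endTime = System.currentTimeMillis();
long startTime = endTime - TimeUnit.HOURS.toMillis(1);
JavaRDD<StreamEvent> lastHour = sec.fromStream("purchases", startTime, endTime);

// 0 and Long.MAX_VALUE together mean "every event currently in the stream".
JavaRDD<StreamEvent> all = sec.fromStream("purchases", 0L, Long.MAX_VALUE);
```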
public <V> org.apache.spark.api.java.JavaPairRDD<Long,V> fromStream(String streamName, Class<V> valueType)
Creates a JavaPairRDD that represents all events from the given stream. The key in the resulting JavaPairRDD is the event timestamp. The stream body will be decoded as the given value type. Currently Text, String, and ByteWritable are supported.
Type parameters: V - value type.
Parameters: streamName - name of the stream; valueType - type to decode the stream body to.
Returns: a new JavaPairRDD instance that reads from the given stream.
Throws: DatasetInstantiationException - if the stream doesn't exist.

public <V> org.apache.spark.api.java.JavaPairRDD<Long,V> fromStream(String namespace, String streamName, Class<V> valueType)
Creates a JavaPairRDD that represents all events from the given stream. The key in the resulting JavaPairRDD is the event timestamp. The stream body will be decoded as the given value type. Currently Text, String, and ByteWritable are supported.
Type parameters: V - value type.
Parameters: namespace - namespace in which the stream exists; streamName - name of the stream; valueType - type to decode the stream body to.
Returns: a new JavaPairRDD instance that reads from the given stream.
Throws: DatasetInstantiationException - if the stream doesn't exist.

public abstract <V> org.apache.spark.api.java.JavaPairRDD<Long,V> fromStream(String streamName, long startTime, long endTime, Class<V> valueType)
Creates a JavaPairRDD that represents events from the given stream in the given time range. The key in the resulting JavaPairRDD is the event timestamp. The stream body will be decoded as the given value type. Currently Text, String, and ByteWritable are supported.
Type parameters: V - value type.
Parameters: streamName - name of the stream; startTime - starting time of the stream to read, in milliseconds (inclusive); passing 0 means start from the first event available in the stream; endTime - ending time of the stream to read, in milliseconds (exclusive); passing Long.MAX_VALUE means read up to the latest event available in the stream; valueType - type to decode the stream body to.
Returns: a new JavaPairRDD instance that reads from the given stream.
Throws: DatasetInstantiationException - if the stream doesn't exist.

public abstract <V> org.apache.spark.api.java.JavaPairRDD<Long,V> fromStream(String namespace, String streamName, long startTime, long endTime, Class<V> valueType)
Creates a JavaPairRDD that represents events from the given stream in the given time range. The key in the resulting JavaPairRDD is the event timestamp. The stream body will be decoded as the given value type. Currently Text, String, and ByteWritable are supported.
Type parameters: V - value type.
Parameters: namespace - namespace in which the stream exists; streamName - name of the stream; startTime - starting time of the stream to read, in milliseconds (inclusive); passing 0 means start from the first event available in the stream; endTime - ending time of the stream to read, in milliseconds (exclusive); passing Long.MAX_VALUE means read up to the latest event available in the stream; valueType - type to decode the stream body to.
Returns: a new JavaPairRDD instance that reads from the given stream.
Throws: DatasetInstantiationException - if the stream doesn't exist.
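When the body can be decoded directly to a value type, the timestamp-keyed variants save writing a decoder. A sketch (stream name assumed; Tuple2 is from the scala package shipped with Spark):

```java
// Decode each event body as a String; the pair key is the event timestamp.
JavaPairRDD<Long, String> bodies =
    sec.fromStream("purchases", 0L, Long.MAX_VALUE, String.class);

// Example follow-up: count events per second using the timestamp keys.
Map<Long, Long> perSecond = bodies
    .mapToPair(pair -> new Tuple2<>(pair._1() / 1000L, pair._2()))
    .countByKey();
```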
public abstract <K,V> org.apache.spark.api.java.JavaPairRDD<K,V> fromStream(String streamName, long startTime, long endTime, Class<? extends co.cask.cdap.api.stream.StreamEventDecoder<K,V>> decoderClass, Class<K> keyType, Class<V> valueType)
Creates a JavaPairRDD that represents events from the given stream in the given time range. Each stream event will be decoded by an instance of the given StreamEventDecoder class.
Parameters: streamName - name of the stream; startTime - starting time of the stream to read, in milliseconds (inclusive); passing 0 means start from the first event available in the stream; endTime - ending time of the stream to read, in milliseconds (exclusive); passing Long.MAX_VALUE means read up to the latest event available in the stream; decoderClass - the StreamEventDecoder class for decoding StreamEvent; keyType - the type of the decoded key; valueType - the type of the decoded value.
Returns: a new JavaPairRDD instance that reads from the given stream.
Throws: DatasetInstantiationException - if the stream doesn't exist.

public abstract <K,V> org.apache.spark.api.java.JavaPairRDD<K,V> fromStream(String namespace, String streamName, long startTime, long endTime, Class<? extends co.cask.cdap.api.stream.StreamEventDecoder<K,V>> decoderClass, Class<K> keyType, Class<V> valueType)
Creates a JavaPairRDD that represents events from the given stream in the given time range. Each stream event will be decoded by an instance of the given StreamEventDecoder class.
Parameters: namespace - namespace in which the stream exists; streamName - name of the stream; startTime - starting time of the stream to read, in milliseconds (inclusive); passing 0 means start from the first event available in the stream; endTime - ending time of the stream to read, in milliseconds (exclusive); passing Long.MAX_VALUE means read up to the latest event available in the stream; decoderClass - the StreamEventDecoder class for decoding StreamEvent; keyType - the type of the decoded key; valueType - the type of the decoded value.
Returns: a new JavaPairRDD instance that reads from the given stream.
Throws: DatasetInstantiationException - if the stream doesn't exist.
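For full control over how each event becomes a key/value pair, supply a StreamEventDecoder implementation. MyDecoder below is a hypothetical decoder class packaged with the program, not part of this API:

```java
// MyDecoder is assumed to implement StreamEventDecoder<LongWritable, Text>;
// it controls how each raw StreamEvent is turned into a key/value pair.
JavaPairRDD<LongWritable, Text> decoded = sec.fromStream(
    "purchases", 0L, Long.MAX_VALUE,
    MyDecoder.class, LongWritable.class, Text.class);
```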
public <T> org.apache.spark.api.java.JavaPairRDD<Long,co.cask.cdap.api.stream.GenericStreamEventData<T>> fromStream(String streamName, co.cask.cdap.api.data.format.FormatSpecification formatSpec, Class<T> dataType)
Creates a JavaPairRDD that represents all events from the given stream. The first entry in each pair is a Long representing the event timestamp, while the second entry is a GenericStreamEventData, which contains data decoded from the stream event body based on the given FormatSpecification.
Type parameters: T - value type.
Parameters: streamName - name of the stream; formatSpec - the FormatSpecification describing the format in the stream.
Returns: a new JavaPairRDD instance that reads from the given stream.
Throws: DatasetInstantiationException - if the stream doesn't exist.

public <T> org.apache.spark.api.java.JavaPairRDD<Long,co.cask.cdap.api.stream.GenericStreamEventData<T>> fromStream(String namespace, String streamName, co.cask.cdap.api.data.format.FormatSpecification formatSpec, Class<T> dataType)
Creates a JavaPairRDD that represents all events from the given stream. The first entry in each pair is a Long representing the event timestamp, while the second entry is a GenericStreamEventData, which contains data decoded from the stream event body based on the given FormatSpecification.
Type parameters: T - value type.
Parameters: namespace - namespace in which the stream exists; streamName - name of the stream; formatSpec - the FormatSpecification describing the format in the stream.
Returns: a new JavaPairRDD instance that reads from the given stream.
Throws: DatasetInstantiationException - if the stream doesn't exist.

public abstract <T> org.apache.spark.api.java.JavaPairRDD<Long,co.cask.cdap.api.stream.GenericStreamEventData<T>> fromStream(String streamName, co.cask.cdap.api.data.format.FormatSpecification formatSpec, long startTime, long endTime, Class<T> dataType)
Creates a JavaPairRDD that represents data from the given stream for events in the given time range. The first entry in each pair is a Long representing the event timestamp, while the second entry is a GenericStreamEventData, which contains data decoded from the stream event body based on the given FormatSpecification.
Type parameters: T - value type.
Parameters: streamName - name of the stream; formatSpec - the FormatSpecification describing the format in the stream; startTime - starting time of the stream to read, in milliseconds (inclusive); passing 0 means start from the first event available in the stream; endTime - ending time of the stream to read, in milliseconds (exclusive); passing Long.MAX_VALUE means read up to the latest event available in the stream.
Returns: a new JavaPairRDD instance that reads from the given stream.
Throws: DatasetInstantiationException - if the stream doesn't exist.

public abstract <T> org.apache.spark.api.java.JavaPairRDD<Long,co.cask.cdap.api.stream.GenericStreamEventData<T>> fromStream(String namespace, String streamName, co.cask.cdap.api.data.format.FormatSpecification formatSpec, long startTime, long endTime, Class<T> dataType)
Creates a JavaPairRDD that represents data from the given stream for events in the given time range. The first entry in each pair is a Long representing the event timestamp, while the second entry is a GenericStreamEventData, which contains data decoded from the stream event body based on the given FormatSpecification.
Type parameters: T - value type.
Parameters: namespace - namespace in which the stream exists; streamName - name of the stream; formatSpec - the FormatSpecification describing the format in the stream; startTime - starting time of the stream to read, in milliseconds (inclusive); passing 0 means start from the first event available in the stream; endTime - ending time of the stream to read, in milliseconds (exclusive); passing Long.MAX_VALUE means read up to the latest event available in the stream.
Returns: a new JavaPairRDD instance that reads from the given stream.
Throws: DatasetInstantiationException - if the stream doesn't exist.
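The FormatSpecification variants let the platform do the decoding. A sketch that reads CSV-formatted bodies as StructuredRecords; the schema fields, format name, and stream name are illustrative assumptions:

```java
// Describe the stream body as CSV with a two-field schema.
Schema schema = Schema.recordOf("purchase",
    Schema.Field.of("item", Schema.of(Schema.Type.STRING)),
    Schema.Field.of("price", Schema.of(Schema.Type.DOUBLE)));
FormatSpecification csv =
    new FormatSpecification("csv", schema, Collections.<String, String>emptyMap());

// Each value carries a record decoded according to the schema.
JavaPairRDD<Long, GenericStreamEventData<StructuredRecord>> records =
    sec.fromStream("purchases", csv, StructuredRecord.class);
```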
public <K,V> void saveAsDataset(org.apache.spark.api.java.JavaPairRDD<K,V> rdd, String datasetName)
Saves the given JavaPairRDD to the given Dataset.
Parameters: rdd - the JavaPairRDD to be saved; datasetName - name of the Dataset.
Throws: DatasetInstantiationException - if the Dataset doesn't exist.

public <K,V> void saveAsDataset(org.apache.spark.api.java.JavaPairRDD<K,V> rdd, String namespace, String datasetName)
Saves the given JavaPairRDD to the given Dataset.
Parameters: rdd - the JavaPairRDD to be saved; namespace - the namespace in which the specified Dataset is to be saved; datasetName - name of the Dataset.
Throws: DatasetInstantiationException - if the Dataset doesn't exist.

public abstract <K,V> void saveAsDataset(org.apache.spark.api.java.JavaPairRDD<K,V> rdd, String datasetName, Map<String,String> arguments)
Saves the given JavaPairRDD to the given Dataset with the given set of dataset arguments.
Parameters: rdd - the JavaPairRDD to be saved; datasetName - name of the Dataset; arguments - arguments for the Dataset.
Throws: DatasetInstantiationException - if the Dataset doesn't exist.

public abstract <K,V> void saveAsDataset(org.apache.spark.api.java.JavaPairRDD<K,V> rdd, String namespace, String datasetName, Map<String,String> arguments)
Saves the given JavaPairRDD to the given Dataset with the given set of dataset arguments.
Parameters: rdd - the JavaPairRDD to be saved; namespace - the namespace in which the specified Dataset is to be saved; datasetName - name of the Dataset; arguments - arguments for the Dataset.
Throws: DatasetInstantiationException - if the Dataset doesn't exist.
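A sketch of the namespaced save with per-run dataset arguments. The namespace, dataset name, and argument key are invented, and results stands for a previously computed JavaPairRDD:

```java
// Hypothetical per-run arguments for the output dataset.
Map<String, String> outArgs = new HashMap<>();
outArgs.put("output.partition", "2018-06-01");  // invented key

// Write into the "dailyTotals" dataset of the "analytics" namespace.
sec.saveAsDataset(results, "analytics", "dailyTotals", outArgs);
```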
public abstract void execute(int timeoutInSeconds, TxRunnable runnable) throws org.apache.tephra.TransactionFailureException
Transactions with a custom timeout are not supported in Spark.
Specified by: execute in interface Transactional.
Throws: org.apache.tephra.TransactionFailureException - always.

public abstract co.cask.cdap.api.spark.dynamic.SparkInterpreter createInterpreter() throws IOException
Creates a new instance of SparkInterpreter for Scala code compilation and interpretation.
Returns: a new SparkInterpreter.
Throws: IOException - if a local directory for storing the compiled class files cannot be created.

public abstract SparkExecutionContext getSparkExecutionContext()
Returns the underlying SparkExecutionContext used by this object.

Copyright © 2018 Cask Data, Inc. Licensed under the Apache License, Version 2.0.