@Beta public interface JavaSparkMain extends Serializable

Implementations of this interface define the main logic of a Spark program and are given a JavaSparkExecutionContext for interacting with CDAP. For example:
```java
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.Function;
import org.apache.spark.api.java.function.PairFunction;
import scala.Tuple2;

public class JavaSparkTest implements JavaSparkMain {
  @Override
  public void run(JavaSparkExecutionContext sec) throws Exception {
    JavaSparkContext sc = new JavaSparkContext();

    // Create a RDD from stream "input", with the event body decoded as a UTF-8 String
    JavaRDD<String> streamRDD = sec.fromStreamAsStringPair("input").values();

    // Create a RDD from dataset "lookup", which represents a lookup table from String to Long
    JavaPairRDD<String, Long> lookupRDD = sec.fromDataset("lookup");

    // Join the "input" stream with the "lookup" dataset and save the result to the "output" dataset
    JavaPairRDD<String, Long> resultRDD = streamRDD
      .mapToPair(new PairFunction<String, String, String>() {
        @Override
        public Tuple2<String, String> call(String s) throws Exception {
          return new Tuple2<>(s, s);
        }
      })
      .join(lookupRDD)
      .mapValues(new Function<Tuple2<String, Long>, Long>() {
        @Override
        public Long call(Tuple2<String, Long> v1) throws Exception {
          return v1._2();
        }
      });

    sec.saveAsDataset(resultRDD, "output");
  }
}
```
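For readers unfamiliar with the pair-RDD operations above, the following is a minimal sketch of the same key-based join using plain Java collections, without Spark. The class name `JoinSketch` and the sample data are made up for illustration; it only mirrors the semantics of `mapToPair` / `join` / `mapValues` on distinct keys.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class JoinSketch {
    // Mimics streamRDD.mapToPair(s -> (s, s)).join(lookupRDD).mapValues(t -> t._2):
    // for every stream line with an entry in the lookup table, emit (line, lookedUpValue).
    public static Map<String, Long> join(List<String> streamLines, Map<String, Long> lookup) {
        Map<String, Long> result = new HashMap<>();
        for (String s : streamLines) {
            // mapToPair(s -> (s, s)) keys each line by itself, so the join key is the line
            if (lookup.containsKey(s)) {
                // join keeps only keys present on both sides; mapValues keeps the lookup value
                result.put(s, lookup.get(s));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> stream = Arrays.asList("apple", "banana", "cherry");
        Map<String, Long> lookup = new HashMap<>();
        lookup.put("apple", 1L);
        lookup.put("cherry", 3L);
        // Only "apple" and "cherry" appear on both sides, so only they survive the join
        System.out.println(join(stream, lookup));
    }
}
```

In the Spark version this same pairing and matching happens per partition across worker nodes, which is why the closures must be serializable.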
This interface extends Serializable because the closures are anonymous classes in Java, and Spark serializes the closures before sending them to worker nodes. Serializing an inner anonymous class requires the outer containing class to be serializable as well; otherwise a NotSerializableException is thrown. Making this interface serializable therefore gives a neater API.

| Modifier and Type | Method and Description |
|---|---|
| void | run(JavaSparkExecutionContext sec) This method will be called when the Spark program starts. |
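The serialization requirement described above can be demonstrated with plain JDK serialization, independent of Spark: an anonymous inner class that uses its enclosing instance captures it in a hidden field, so serializing the closure also serializes the outer object. The names `SerializationDemo`, `SerFn`, `PlainOuter`, and `SerializableOuter` are made up for this sketch.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class SerializationDemo {

    // A closure interface that, like Spark's function interfaces, extends Serializable.
    public interface SerFn extends Serializable {
        void apply();
    }

    // Outer class that is NOT serializable.
    public static class PlainOuter {
        public SerFn closure() {
            return new SerFn() {
                @Override
                public void apply() {
                    // Referencing the enclosing instance forces the anonymous class
                    // to capture PlainOuter.this in a hidden field.
                    System.out.println(PlainOuter.this);
                }
            };
        }
    }

    // Outer class that IS serializable.
    public static class SerializableOuter implements Serializable {
        public SerFn closure() {
            return new SerFn() {
                @Override
                public void apply() {
                    System.out.println(SerializableOuter.this);
                }
            };
        }
    }

    // Attempts JDK serialization and reports whether it succeeded.
    public static boolean serializes(Object o) {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(o);
            return true;
        } catch (NotSerializableException e) {
            return false;
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        // Fails: the captured PlainOuter instance is not serializable
        System.out.println(serializes(new PlainOuter().closure()));
        // Succeeds: the captured SerializableOuter instance is serializable
        System.out.println(serializes(new SerializableOuter().closure()));
    }
}
```

This is the same failure mode a Spark program hits when its containing class is not serializable, and why extending Serializable here avoids it.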
void run(JavaSparkExecutionContext sec) throws Exception

This method will be called when the Spark program starts.

Parameters:
sec - the context for interacting with CDAP

Throws:
Exception

Copyright © 2016 Cask Data, Inc. Licensed under the Apache License, Version 2.0.