Spark


Overview of RDDs

Spark SQL & DataFrames -- source

Spark SQL is a Spark module for structured data processing, built on top of Spark's RDD API.
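
To illustrate the layering, here is a minimal sketch that runs a SQL query and then reaches the RDD underneath the result. It assumes Spark SQL is on the classpath and runs in local mode; the object name, the `people` view and the sample rows are made up for illustration.

```scala
import org.apache.spark.sql.SparkSession

object SparkSqlSketch {
  def main(args: Array[String]): Unit = {
    // Local session for experimentation only (assumption: no cluster available).
    val spark = SparkSession.builder()
      .appName("spark-sql-sketch")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical sample data turned into a DataFrame with named columns.
    val people = Seq(("Ana", 34), ("Bo", 19)).toDF("name", "age")

    // Register the DataFrame as a temporary view so SQL can refer to it.
    people.createOrReplaceTempView("people")
    val adults = spark.sql("SELECT name FROM people WHERE age >= 21")
    adults.show()

    // The query result is still backed by an RDD of Row objects.
    println(adults.rdd.getNumPartitions)

    spark.stop()
  }
}
```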

A Dataset is a distributed collection of data. It is an abstraction on top of the RDD API that combines the benefits of RDDs (strong typing and powerful lambda transformations) with Spark SQL's optimized execution engine.
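
A small sketch of a typed Dataset may make this concrete. It assumes Spark SQL is on the classpath and runs in local mode; the `Person` case class, the object name and the sample rows are hypothetical.

```scala
import org.apache.spark.sql.{Dataset, SparkSession}

// Hypothetical record type; declared at top level so Spark can derive an encoder for it.
final case class Person(name: String, age: Int)

object DatasetSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("dataset-sketch")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Strong typing: the element type is checked at compile time.
    val people: Dataset[Person] = Seq(Person("Ana", 34), Person("Bo", 19)).toDS()

    // Lambda transformations, as with RDDs, but planned by Spark SQL's optimizer.
    val adultNames: Dataset[String] = people.filter(p => p.age >= 21).map(p => p.name)

    // Print the physical plan chosen by the optimizer, then the result.
    adultNames.explain()
    adultNames.show()

    spark.stop()
  }
}
```

A typo such as `p.agee` fails at compile time here, whereas a misspelled column name in a SQL string only fails when the query is analyzed at runtime.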