package sql
Type Members
- case class CustomTransform(transform: (DataFrame) ⇒ DataFrame, requiredColumns: Seq[String] = Seq.empty[String], addedColumns: Seq[String] = Seq.empty[String], removedColumns: Seq[String] = Seq.empty[String], skipWhenPossible: Boolean = true) extends Product with Serializable
- case class DariaValidationError(smth: String) extends Exception with Product with Serializable
- case class DataFrameColumnsException(smth: String) extends Exception with Product with Serializable
- trait DataFrameValidator extends AnyRef
- case class EtlDefinition(sourceDF: DataFrame, transform: (DataFrame) ⇒ DataFrame, write: (DataFrame) ⇒ Unit, metadata: Map[String, Any] = ...) extends Product with Serializable
spark-daria can be used as a lightweight framework for running ETL analyses in Spark.
You can define EtlDefinitions, group them in a collection, and run the ETLs via jobs.
Components of an ETL
An ETL starts with a DataFrame, runs a series of transformations (filter, custom transformations, repartition), and writes out data.
The EtlDefinition class is generic and can be molded to suit all ETL situations. For example, it can read a CSV file from S3, run transformations, and write out Parquet files on your local filesystem.
- case class InvalidColumnSortOrderException(smth: String) extends Exception with Product with Serializable
- case class InvalidDataFrameSchemaException(smth: String) extends Exception with Product with Serializable
- case class MissingDataFrameColumnsException(smth: String) extends Exception with Product with Serializable
- class ParquetCompactor extends AnyRef
- case class ProhibitedDataFrameColumnsException(smth: String) extends Exception with Product with Serializable
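The EtlDefinition flow described above can be sketched as follows. The stand-in case class matches the signature shown in this listing, but the process() method body, the sample data, and the /tmp output path are assumptions for illustration, not necessarily spark-daria's own implementation:

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

val spark = SparkSession.builder().master("local[*]").appName("etl-sketch").getOrCreate()
import spark.implicits._

// Minimal stand-in matching the EtlDefinition signature in this listing.
// process() wiring (write after transform) is an assumption for illustration.
case class EtlDefinition(
    sourceDF: DataFrame,
    transform: (DataFrame) => DataFrame,
    write: (DataFrame) => Unit,
    metadata: Map[String, Any] = Map.empty[String, Any]
) {
  def process(): Unit = write(transform(sourceDF))
}

val sourceDF: DataFrame = Seq(("bob", 31), ("sue", 25)).toDF("name", "age")

// A transformation: keep only rows with age > 30
def keepOverThirty()(df: DataFrame): DataFrame =
  df.filter(col("age") > 30)

// A writer: persist the result as Parquet (placeholder path)
def parquetWriter(df: DataFrame): Unit =
  df.write.mode("overwrite").parquet("/tmp/etl_sketch_output")

val etl = EtlDefinition(sourceDF, keepOverThirty(), parquetWriter)
etl.process()
```

Because the source DataFrame, transform, and write step are plain values, several such definitions can be collected in a Seq and processed in one job.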
Value Members
- object ColumnExt
Additional methods for the Spark Column class
- Since 0.0.1
- object DariaValidator
- object DariaWriters
- object DataFrameExt
- object DataFrameHelpers extends DataFrameValidator
- object FunctionsAsColumnExt
- object SparkSessionExt
- object functions
Spark [has a ton of SQL functions](https://spark.apache.org/docs/2.1.0/api/java/org/apache/spark/sql/functions.html) and spark-daria is meant to fill in any gaps.
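As an illustration of the kind of gap-filler such a functions object can hold, here is a hypothetical helper built only from Spark's own Column API (the name singleSpace is illustrative, not a claim about spark-daria's actual API):

```scala
import org.apache.spark.sql.{Column, SparkSession}
import org.apache.spark.sql.functions.{col, regexp_replace, trim}

val spark = SparkSession.builder().master("local[*]").appName("functions-sketch").getOrCreate()
import spark.implicits._

// Hypothetical gap-filler: collapse runs of whitespace to a single space,
// composed from Spark's built-in regexp_replace and trim.
def singleSpace(c: Column): Column = trim(regexp_replace(c, "\\s+", " "))

val df = Seq("  hi   there ", "ok").toDF("greeting")
val cleaned = df.select(singleSpace(col("greeting")).as("greeting"))
```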
- object transformations
Functions available for DataFrame operations.
SQL transformations take a DataFrame as an argument and return a DataFrame. They are suitable arguments for the Dataset#transform method. It's convenient to work with DataFrames that have snake_case column names. Column names with spaces make it harder to write SQL queries.
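A custom transformation in this shape can be sketched as follows: a function returning DataFrame ⇒ DataFrame that snake_cases column names, making it a suitable argument for Dataset#transform (the implementation is an illustrative sketch, not spark-daria's own code):

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

val spark = SparkSession.builder().master("local[*]").appName("transform-sketch").getOrCreate()
import spark.implicits._

// Illustrative custom transformation: lower-case column names and
// replace spaces with underscores, so results are easy to query in SQL.
def snakeCaseColumns()(df: DataFrame): DataFrame =
  df.columns.foldLeft(df) { (acc, name) =>
    acc.withColumnRenamed(name, name.toLowerCase.replace(" ", "_"))
  }

val df = Seq((1, "a")).toDF("Some ID", "Raw Value")

// Because snakeCaseColumns() returns DataFrame => DataFrame,
// it plugs directly into Dataset#transform.
val result = df.transform(snakeCaseColumns())
```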