package sql


Type Members

  1. case class CustomTransform(transform: (DataFrame) ⇒ DataFrame, requiredColumns: Seq[String] = Seq.empty[String], addedColumns: Seq[String] = Seq.empty[String], removedColumns: Seq[String] = Seq.empty[String], skipWhenPossible: Boolean = true) extends Product with Serializable
  2. case class DariaValidationError(smth: String) extends Exception with Product with Serializable
  3. case class DataFrameColumnsException(smth: String) extends Exception with Product with Serializable
  4. trait DataFrameValidator extends AnyRef
  5. case class EtlDefinition(sourceDF: DataFrame, transform: (DataFrame) ⇒ DataFrame, write: (DataFrame) ⇒ Unit, metadata: Map[String, Any] = ...) extends Product with Serializable

    spark-daria can be used as a lightweight framework for running ETL analyses in Spark.

    You can define EtlDefinitions, group them in a collection, and run the etls via jobs.

    Components of an ETL

    An ETL starts with a DataFrame, runs a series of transformations (filter, custom transformations, repartition), and writes out data.

    The EtlDefinition class is generic and can be molded to suit all ETL situations. For example, it can read a CSV file from S3, run transformations, and write out Parquet files on your local filesystem.
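    The extract / transform / load wiring described above can be sketched as follows. This is an illustrative example, not library documentation: the input data, filter condition, and output path are assumptions, and it presumes the spark-daria `EtlDefinition` class and its `process()` runner method are on the classpath.

    ```scala
    import org.apache.spark.sql.{DataFrame, SparkSession}
    import com.github.mrpowers.spark.daria.sql.EtlDefinition

    object EtlSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().master("local[*]").appName("etl-sketch").getOrCreate()
        import spark.implicits._

        val etl = EtlDefinition(
          sourceDF = Seq(("alice", 25), ("bob", 15)).toDF("name", "age"), // extract
          transform = (df: DataFrame) => df.filter($"age" > 18),          // transform
          write = (df: DataFrame) =>
            df.write.mode("overwrite").parquet("/tmp/etl-sketch-out")     // load
        )

        etl.process() // runs extract -> transform -> write
      }
    }
    ```

    Because `transform` and `write` are plain functions, EtlDefinitions can be collected in a `Map` or `List` and executed together by a job.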

  6. case class InvalidColumnSortOrderException(smth: String) extends Exception with Product with Serializable
  7. case class InvalidDataFrameSchemaException(smth: String) extends Exception with Product with Serializable
  8. case class MissingDataFrameColumnsException(smth: String) extends Exception with Product with Serializable
  9. class ParquetCompactor extends AnyRef
  10. case class ProhibitedDataFrameColumnsException(smth: String) extends Exception with Product with Serializable

Value Members

  1. object ColumnExt

    Additional methods for the Spark Column class

    Since

    0.0.1

  2. object DariaValidator
  3. object DariaWriters
  4. object DataFrameExt
  5. object DataFrameHelpers extends DataFrameValidator
  6. object FunctionsAsColumnExt
  7. object SparkSessionExt
  8. object functions

    Spark [has a ton of SQL functions](https://spark.apache.org/docs/2.1.0/api/java/org/apache/spark/sql/functions.html) and spark-daria is meant to fill in any gaps.
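
    A sketch of the kind of gap-filling helper this object provides: collapsing runs of whitespace in a string column to a single space. spark-daria ships a similar `singleSpace` helper; treat this standalone version as an illustrative reimplementation built from stock Spark functions.

    ```scala
    import org.apache.spark.sql.Column
    import org.apache.spark.sql.functions.{col, regexp_replace, trim}

    // Collapse runs of spaces to one space and trim the ends.
    def singleSpace(c: Column): Column =
      trim(regexp_replace(c, " +", " "))

    // usage: df.withColumn("clean_name", singleSpace(col("raw_name")))
    ```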

  9. object transformations

    Functions available for DataFrame operations.

    SQL transformations take a DataFrame as an argument and return a DataFrame. They are suitable arguments for the Dataset#transform method.

    It's convenient to work with DataFrames that have snake_case column names. Column names with spaces make it harder to write SQL queries.
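
    A custom transformation of this shape (DataFrame in, DataFrame out) can be passed straight to `Dataset#transform`. The sketch below renames all columns to snake_case; the `toSnakeCase` helper is defined here for illustration and is not an API of this object.

    ```scala
    import org.apache.spark.sql.DataFrame

    // Hypothetical helper: lowercase a column name and replace spaces with underscores.
    def toSnakeCase(s: String): String =
      s.trim.replaceAll("\\s+", "_").toLowerCase

    // A DataFrame => DataFrame function, suitable for Dataset#transform.
    def snakeCaseColumns(df: DataFrame): DataFrame =
      df.columns.foldLeft(df) { (acc, colName) =>
        acc.withColumnRenamed(colName, toSnakeCase(colName))
      }

    // usage: df.transform(snakeCaseColumns)
    ```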
