object DataFrameHelpers extends DataFrameValidator
Inheritance: DataFrameHelpers → DataFrameValidator → AnyRef → Any
Value Members
- def columnToArray[T](df: DataFrame, colName: String)(implicit arg0: ClassTag[T]): Array[T]

  Converts a DataFrame column to an Array of values. N.B. This method uses collect and should only be called on small DataFrames.

  Suppose we have the following sourceDF:

      +---+
      |num|
      +---+
      |  1|
      |  2|
      |  3|
      +---+

  Let's convert the num column to an Array of values and view the results:

      val actual = DataFrameHelpers.columnToArray[Int](sourceDF, "num")
      println(actual) // Array(1, 2, 3)
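Under the hood this kind of helper has to collect the column and cast each value. Here is a minimal pure-Scala sketch of that collect-and-cast step, assuming each collected Row is modeled as a Map from column name to value (a simplification so the logic runs without Spark; this is not the library's actual code):

```scala
import scala.reflect.ClassTag

// Sketch only: the real method works on a Spark DataFrame; here each "row"
// is modeled as a Map so the logic is runnable standalone.
def columnToArraySketch[T: ClassTag](rows: Seq[Map[String, Any]], colName: String): Array[T] =
  rows.map(_(colName).asInstanceOf[T]).toArray

val rows = Seq(Map("num" -> 1), Map("num" -> 2), Map("num" -> 3))
println(columnToArraySketch[Int](rows, "num").mkString(", ")) // 1, 2, 3
```

The ClassTag context bound is what lets the generic `.toArray` build an `Array[T]` at runtime, which is why the real signature carries an implicit ClassTag as well.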
- def columnToList[T](df: DataFrame, colName: String)(implicit arg0: ClassTag[T]): List[T]

  Converts a DataFrame column to a List of values. N.B. This method uses collect and should only be called on small DataFrames.

  Suppose we have the following sourceDF:

      +---+
      |num|
      +---+
      |  1|
      |  2|
      |  3|
      +---+

  Let's convert the num column to a List of values and view the results:

      val actual = DataFrameHelpers.columnToList[Int](sourceDF, "num")
      println(actual) // List(1, 2, 3)
- def printAthenaCreateTable(df: DataFrame, athenaTableName: String, s3location: String): Unit

  Generates a CREATE TABLE query for AWS Athena.

  Suppose we have the following df:

      +--------+--------+---------+
      |    team|   sport|goals_for|
      +--------+--------+---------+
      |    jets|football|       45|
      |nacional|  soccer|       10|
      +--------+--------+---------+

  Run the code to print the CREATE TABLE query:

      DataFrameHelpers.printAthenaCreateTable(df, "my_cool_athena_table", "s3://my-bucket/extracts/people")

      CREATE TABLE IF NOT EXISTS my_cool_athena_table(
        team STRING,
        sport STRING,
        goals_for INT
      )
      STORED AS PARQUET
      LOCATION 's3://my-bucket/extracts/people'
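The generated DDL follows a fixed template. A hypothetical pure-Scala sketch of the string assembly (the helper name and the Spark-type-to-Athena-type mapping are assumptions for illustration, not the library's actual code, which reads df.schema directly):

```scala
// Hypothetical sketch: maps a few Spark type names to Athena types and
// assembles the CREATE TABLE statement from (name, type) pairs.
def athenaCreateTableSketch(
    tableName: String,
    columns: Seq[(String, String)], // (column name, Spark type name)
    s3Location: String
): String = {
  val typeMap = Map(
    "StringType"  -> "STRING",
    "IntegerType" -> "INT",
    "LongType"    -> "BIGINT",
    "DoubleType"  -> "DOUBLE"
  )
  val cols = columns.map { case (name, t) => s"  $name ${typeMap.getOrElse(t, t)}" }.mkString(",\n")
  s"CREATE TABLE IF NOT EXISTS $tableName(\n$cols\n)\nSTORED AS PARQUET\nLOCATION '$s3Location'"
}

println(athenaCreateTableSketch(
  "my_cool_athena_table",
  Seq(("team", "StringType"), ("sport", "StringType"), ("goals_for", "IntegerType")),
  "s3://my-bucket/extracts/people"
))
```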
- def readTimestamped(dirname: String): DataFrame
- lazy val spark: SparkSession
- def toArrayOfMaps(df: DataFrame): Array[Map[String, Any]]

  Converts a DataFrame to an Array of Maps. N.B. This method uses collect and should only be called on small DataFrames.

  Suppose we have the following sourceDF:

      +----------+-----------+---------+
      |profession|some_number|pay_grade|
      +----------+-----------+---------+
      |    doctor|          4|     high|
      |   dentist|         10|     high|
      +----------+-----------+---------+

  Run the code to convert this DataFrame into an Array of Maps:

      val actual = DataFrameHelpers.toArrayOfMaps(sourceDF)
      println(actual)

      Array(
        Map("profession" -> "doctor", "some_number" -> 4, "pay_grade" -> "high"),
        Map("profession" -> "dentist", "some_number" -> 10, "pay_grade" -> "high")
      )
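The core of this conversion is zipping each collected row with the column names. A minimal sketch, assuming the column names and row values are what df.columns and df.collect() would provide (names and shapes here are illustrative, not the library's code):

```scala
// Sketch: zip each row's values with the column names to build a Map per row.
def toArrayOfMapsSketch(columns: Seq[String], rows: Seq[Seq[Any]]): Array[Map[String, Any]] =
  rows.map(row => columns.zip(row).toMap).toArray

val maps = toArrayOfMapsSketch(
  Seq("profession", "some_number", "pay_grade"),
  Seq(Seq("doctor", 4, "high"), Seq("dentist", 10, "high"))
)
maps.foreach(println)
```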
- def toCreateDataFrameCode(df: DataFrame): String
- def twoColumnsToMap[keyType, valueType](df: DataFrame, keyColName: String, valueColName: String)(implicit arg0: scala.reflect.api.JavaUniverse.TypeTag[keyType], arg1: scala.reflect.api.JavaUniverse.TypeTag[valueType]): Map[keyType, valueType]

  Converts two columns in a DataFrame to a Map of key-value pairs. N.B. This method uses collect and should only be called on small DataFrames.

  Suppose we have the following sourceDF:

      +-----------+---------+
      |     island|fun_level|
      +-----------+---------+
      |    boracay|        7|
      |long island|        9|
      +-----------+---------+

  Let's convert this DataFrame to a Map with island as the key and fun_level as the value:

      val actual = DataFrameHelpers.twoColumnsToMap[String, Integer](
        sourceDF,
        "island",
        "fun_level"
      )
      println(actual)
      // Map(
      //   "boracay" -> 7,
      //   "long island" -> 9
      // )
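The underlying transformation is picking the key and value fields out of each collected row and calling toMap. A pure-Scala sketch, with rows modeled as Maps so it runs without Spark (an illustrative simplification, not the library's code):

```scala
// Sketch: extract (key, value) pairs from each "row" and build a Map.
// The casts mirror what collecting typed columns from a DataFrame requires.
def twoColumnsToMapSketch[K, V](rows: Seq[Map[String, Any]], keyCol: String, valueCol: String): Map[K, V] =
  rows.map(r => r(keyCol).asInstanceOf[K] -> r(valueCol).asInstanceOf[V]).toMap

val m = twoColumnsToMapSketch[String, Int](
  Seq(
    Map("island" -> "boracay", "fun_level" -> 7),
    Map("island" -> "long island", "fun_level" -> 9)
  ),
  "island",
  "fun_level"
)
println(m) // Map(boracay -> 7, long island -> 9)
```

Note that because later keys win in toMap, duplicate keys in the key column would silently collapse to a single entry.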
- def validateAbsenceOfColumns(df: DataFrame, prohibitedColNames: Seq[String]): Unit

  Throws an error if the DataFrame contains any of the prohibited columns.

  This code will error out:

      val sourceDF = Seq(
        ("jets", "football"),
        ("nacional", "soccer")
      ).toDF("team", "sport")

      val prohibitedColNames = Seq("team", "sport", "country", "city")

      validateAbsenceOfColumns(sourceDF, prohibitedColNames)

  This is the error message:

  > com.github.mrpowers.spark.daria.sql.ProhibitedDataFrameColumnsException: The [team, sport] columns are not allowed to be included in the DataFrame with the following columns [team, sport]
- Definition Classes
- DataFrameValidator
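The validation above boils down to a membership check over column names. A minimal sketch in plain Scala, with a hypothetical stand-in exception class (the real library throws ProhibitedDataFrameColumnsException):

```scala
// Hypothetical stand-in for the library's exception type.
class ProhibitedColumnsSketchException(msg: String) extends RuntimeException(msg)

// Sketch of the check: fail if any prohibited column is present in the DataFrame's columns.
def validateAbsenceSketch(dfColumns: Seq[String], prohibited: Seq[String]): Unit = {
  val present = prohibited.filter(dfColumns.contains)
  if (present.nonEmpty)
    throw new ProhibitedColumnsSketchException(
      s"The [${present.mkString(", ")}] columns are not allowed to be included in the DataFrame with the following columns [${dfColumns.mkString(", ")}]"
    )
}

validateAbsenceSketch(Seq("team", "sport"), Seq("country", "city")) // passes silently
```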
- def validatePresenceOfColumns(df: DataFrame, requiredColNames: Seq[String]): Unit

  Throws an error if the DataFrame doesn't contain all the required columns.

  This code will error out:

      val sourceDF = Seq(
        ("jets", "football"),
        ("nacional", "soccer")
      ).toDF("team", "sport")

      val requiredColNames = Seq("team", "sport", "country", "city")

      validatePresenceOfColumns(sourceDF, requiredColNames)

  This is the error message:

  > com.github.mrpowers.spark.daria.sql.MissingDataFrameColumnsException: The [country, city] columns are not included in the DataFrame with the following columns [team, sport]
- Definition Classes
- DataFrameValidator
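This check is the mirror image of the absence validation: collect the required columns that are missing and fail if any are. A sketch in plain Scala, using IllegalArgumentException as a stand-in for the library's MissingDataFrameColumnsException:

```scala
// Sketch: fail if any required column is absent from the DataFrame's columns.
// IllegalArgumentException stands in for MissingDataFrameColumnsException here.
def validatePresenceSketch(dfColumns: Seq[String], required: Seq[String]): Unit = {
  val missing = required.filterNot(dfColumns.contains)
  if (missing.nonEmpty)
    throw new IllegalArgumentException(
      s"The [${missing.mkString(", ")}] columns are not included in the DataFrame with the following columns [${dfColumns.mkString(", ")}]"
    )
}

validatePresenceSketch(Seq("team", "sport"), Seq("team")) // passes silently
```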
- def validateSchema(df: DataFrame, requiredSchema: StructType): Unit

  Throws an error if the DataFrame schema doesn't match the required schema.

  This code will error out:

      val sourceData = List(
        Row(1, 1),
        Row(-8, 8),
        Row(-5, 5),
        Row(null, null)
      )

      val sourceSchema = List(
        StructField("num1", IntegerType, true),
        StructField("num2", IntegerType, true)
      )

      val sourceDF = spark.createDataFrame(
        spark.sparkContext.parallelize(sourceData),
        StructType(sourceSchema)
      )

      val requiredSchema = StructType(
        List(
          StructField("num1", IntegerType, true),
          StructField("num2", IntegerType, true),
          StructField("name", StringType, true)
        )
      )

      validateSchema(sourceDF, requiredSchema)

  This is the error message:

  > com.github.mrpowers.spark.daria.sql.InvalidDataFrameSchemaException: The [StructField(name,StringType,true)] StructFields are not included in the DataFrame with the following StructFields [StructType(StructField(num1,IntegerType,true), StructField(num2,IntegerType,true))]
- Definition Classes
- DataFrameValidator
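As the error message suggests, the comparison amounts to a set difference over StructFields. A sketch with fields modeled as (name, typeName, nullable) triples so it runs without Spark (an illustrative simplification; the real method compares actual StructType values and throws InvalidDataFrameSchemaException):

```scala
// StructFields modeled as (name, typeName, nullable) triples.
// IllegalArgumentException stands in for InvalidDataFrameSchemaException.
def validateSchemaSketch(
    actualFields: Set[(String, String, Boolean)],
    requiredFields: Set[(String, String, Boolean)]
): Unit = {
  val missing = requiredFields.diff(actualFields)
  if (missing.nonEmpty)
    throw new IllegalArgumentException(
      s"The [${missing.mkString(", ")}] StructFields are not included in the DataFrame's schema"
    )
}

validateSchemaSketch(
  Set(("num1", "IntegerType", true), ("num2", "IntegerType", true)),
  Set(("num1", "IntegerType", true))
) // passes silently
```

Note that under this set-difference model, a field matches only if the name, type, and nullability all agree.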
- def writeTimestamped(df: DataFrame, outputDirname: String, numPartitions: Option[Int] = None, overwriteLatest: Boolean = true): Unit
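writeTimestamped and readTimestamped are undocumented on this page. One plausible convention for such a pair, stated purely as an assumption, is that each write lands in a timestamped subdirectory under outputDirname so the latest extract can be located on read. A sketch of that path-building step only (the helper name and format are hypothetical):

```scala
import java.time.LocalDateTime
import java.time.format.DateTimeFormatter

// Assumption, not the library's documented behavior: derive a timestamped
// subdirectory name for an extract under the given output directory.
def timestampedPath(outputDirname: String, now: LocalDateTime): String =
  s"$outputDirname/${now.format(DateTimeFormatter.ofPattern("yyyyMMddHHmmss"))}"

println(timestampedPath("s3://my-bucket/extracts", LocalDateTime.of(2020, 1, 2, 3, 4, 5)))
// s3://my-bucket/extracts/20200102030405
```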