object functions

Spark [has a ton of SQL functions](https://spark.apache.org/docs/2.1.0/api/java/org/apache/spark/sql/functions.html) and spark-daria is meant to fill in any gaps.

Linear Supertypes
AnyRef, Any

Value Members

  1. final def !=(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  2. final def ##(): Int
    Definition Classes
    AnyRef → Any
  3. final def ==(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  4. def antiTrim(col: Column): Column

    Deletes inner whitespace and leaves leading and trailing whitespace

    val actualDF = sourceDF.withColumn(
      "some_string_anti_trimmed",
      antiTrim(col("some_string"))
    )

    Removes all inner whitespace, but doesn't delete leading or trailing whitespace (e.g. changes " this has some " to " thishassome ").
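
    The behavior can be sketched in plain Scala with a word-boundary regex (an illustration, not the actual spark-daria implementation):

```scala
// Sketch of antiTrim's behavior in plain Scala (not the spark-daria internals):
// "\b\s+\b" matches whitespace only when it sits between word characters,
// so leading and trailing whitespace survives.
val antiTrimmed = " this has some ".replaceAll("\\b\\s+\\b", "")
// " thishassome "
```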

  5. def arrayExNull(cols: Column*): Column

    Like the array() function, but doesn't include null elements
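
    A plain-Scala analogy (illustrative only, not the Spark implementation): filter out nulls before assembling the collection.

```scala
// Sketch: dropping null elements, analogous to arrayExNull vs array()
val elems: Seq[String] = Seq("a", null, "b", null)
val withoutNulls = elems.filter(_ != null)
// Seq("a", "b")
```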

  6. def array_filter_nulls[T]()(implicit arg0: scala.reflect.api.JavaUniverse.TypeTag[T]): UserDefinedFunction
  7. def array_groupBy[T](f: (T) ⇒ Boolean)(implicit arg0: scala.reflect.api.JavaUniverse.TypeTag[T]): UserDefinedFunction
  8. final def asInstanceOf[T0]: T0
    Definition Classes
    Any
  9. def beginningOfMonth(colName: String): Column
  10. def beginningOfMonthDate(col: Column): Column
  11. def beginningOfMonthTime(col: Column): Column
  12. def beginningOfWeek(col: Column, lastDayOfWeek: String = "Sat"): Column
  13. def broadcastArrayContains[T](col: Column, broadcastedArray: Broadcast[Array[T]]): Column
  14. def bucketFinder(col: Column, buckets: Array[(Any, Any)], inclusiveBoundries: Boolean = false, lowestBoundLte: Boolean = false, highestBoundGte: Boolean = false): Column
  15. def clone(): AnyRef
    Attributes
    protected[lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( ... ) @native() @HotSpotIntrinsicCandidate()
  16. def dayOfWeekStr(col: Column): Column
  17. def endOfMonthDate(col: Column): Column
  18. def endOfWeek(col: Column, lastDayOfWeek: String = "Sat"): Column
  19. final def eq(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef
  20. def equals(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  21. def excelEpochToDate(colName: String): Column

    Convert an Excel epoch to date. Let's see how it works. Suppose we have the following testDF:

    +-----------------+
    |       excel_time|
    +-----------------+
    |43967.24166666666|
    |33966.78333333378|
    |43965.58383363244|
    |33964.58393533934|
    +-----------------+

    We can run the excelEpochToDate function as follows:

    import com.github.mrpowers.spark.daria.sql.functions._
    
    val actualDF = testDF
      .withColumn("date", excelEpochToDate("excel_time"))
    
    actualDF.show()
    
    +-----------------+----------+
    |       excel_time|      date|
    +-----------------+----------+
    |43967.24166666666|2020-05-16|
    |33966.78333333378|1992-12-28|
    |43965.58383363244|2020-05-14|
    |33964.58393533934|1992-12-26|
    +-----------------+----------+
  22. def excelEpochToDate(col: Column): Column

    Convert an Excel epoch to date. Let's see how it works. Suppose we have the following testDF:

    +-----------------+
    |       excel_time|
    +-----------------+
    |43967.24166666666|
    |33966.78333333378|
    |43965.58383363244|
    |33964.58393533934|
    +-----------------+

    We can run the excelEpochToDate function as follows:

    import com.github.mrpowers.spark.daria.sql.functions._
    
    val actualDF = testDF
      .withColumn("date", excelEpochToDate(col("excel_time")))
    
    actualDF.show()
    
    +-----------------+----------+
    |       excel_time|      date|
    +-----------------+----------+
    |43967.24166666666|2020-05-16|
    |33966.78333333378|1992-12-28|
    |43965.58383363244|2020-05-14|
    |33964.58393533934|1992-12-26|
    +-----------------+----------+
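
    The conversion itself is simple day arithmetic: an Excel serial number counts days since the 1900 epoch, which falls 25569 days before the Unix epoch. A plain-Scala sketch of that arithmetic (assuming the standard 25569-day offset; the spark-daria internals may differ):

```scala
import java.time.LocalDate

// Sketch: subtract the 25569-day offset between the Excel and Unix epochs,
// then treat the whole-day part as days since 1970-01-01.
val excelTime = 43967.24166666666
val date = LocalDate.ofEpochDay((excelTime - 25569).toLong)
// 2020-05-16, matching the first row above
```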
  23. def excelEpochToTimestamp(colName: String): Column

    Convert an Excel epoch to timestamp. Inspired by [Filip Czaja](http://fczaja.blogspot.com/2011/06/convert-excel-date-into-timestamp.html). Let's see how it works. Suppose we have the following testDF:

    +-----------------+
    |       excel_time|
    +-----------------+
    |43967.24166666666|
    |33966.78333333378|
    |43965.58383363244|
    |33964.58393533934|
    +-----------------+

    We can run the excelEpochToTimestamp function as follows:

    import com.github.mrpowers.spark.daria.sql.functions._
    
    val actualDF = testDF
      .withColumn("timestamp", excelEpochToTimestamp("excel_time"))
    
    actualDF.show()
    
    +-----------------+-------------------+
    |       excel_time|          timestamp|
    +-----------------+-------------------+
    |43967.24166666666|2020-05-16 05:47:59|
    |33966.78333333378|1992-12-28 18:48:00|
    |43965.58383363244|2020-05-14 14:00:43|
    |33964.58393533934|1992-12-26 14:00:52|
    +-----------------+-------------------+
  24. def excelEpochToTimestamp(col: Column): Column

    Convert an Excel epoch to timestamp. Inspired by [Filip Czaja](http://fczaja.blogspot.com/2011/06/convert-excel-date-into-timestamp.html). Let's see how it works. Suppose we have the following testDF:

    +-----------------+
    |       excel_time|
    +-----------------+
    |43967.24166666666|
    |33966.78333333378|
    |43965.58383363244|
    |33964.58393533934|
    +-----------------+

    We can run the excelEpochToTimestamp function as follows:

    import com.github.mrpowers.spark.daria.sql.functions._
    
    val actualDF = testDF
      .withColumn("timestamp", excelEpochToTimestamp(col("excel_time")))
    
    actualDF.show()
    
    +-----------------+-------------------+
    |       excel_time|          timestamp|
    +-----------------+-------------------+
    |43967.24166666666|2020-05-16 05:47:59|
    |33966.78333333378|1992-12-28 18:48:00|
    |43965.58383363244|2020-05-14 14:00:43|
    |33964.58393533934|1992-12-26 14:00:52|
    +-----------------+-------------------+
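
    Under the hood this is the same day arithmetic scaled to seconds. A plain-Scala sketch (assuming the standard 25569-day Excel-to-Unix offset; the spark-daria internals may differ):

```scala
import java.time.Instant

// Sketch: fractional Excel days -> seconds since the Unix epoch -> timestamp
val excelTime = 43967.24166666666
val instant = Instant.ofEpochSecond(((excelTime - 25569) * 86400).toLong)
// 2020-05-16T05:47:59Z, matching the first row above
```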
  25. def excelEpochToUnixTimestamp(colName: String): Column

    Convert an Excel epoch to unix timestamp. Inspired by [Filip Czaja](http://fczaja.blogspot.com/2011/06/convert-excel-date-into-timestamp.html). Let's see how it works. Suppose we have the following testDF:

    +-----------------+
    |       excel_time|
    +-----------------+
    |43967.24166666666|
    |33966.78333333378|
    |43965.58383363244|
    |33964.58393533934|
    +-----------------+

    We can run the excelEpochToUnixTimestamp function as follows:

    import com.github.mrpowers.spark.daria.sql.functions._
    
    val actualDF = testDF
      .withColumn("unix_timestamp", excelEpochToUnixTimestamp("excel_time"))
    
    actualDF.show()
    
    +-----------------+--------------------+
    |       excel_time|      unix_timestamp|
    +-----------------+--------------------+
    |43967.24166666666|1.5896080799999995E9|
    |33966.78333333378| 7.255684800000383E8|
    |43965.58383363244|1.5894648432258427E9|
    |33964.58393533934| 7.253784520133189E8|
    +-----------------+--------------------+
  26. def excelEpochToUnixTimestamp(col: Column): Column

    Convert an Excel epoch to unix timestamp. Inspired by [Filip Czaja](http://fczaja.blogspot.com/2011/06/convert-excel-date-into-timestamp.html). Let's see how it works. Suppose we have the following testDF:

    +-----------------+
    |       excel_time|
    +-----------------+
    |43967.24166666666|
    |33966.78333333378|
    |43965.58383363244|
    |33964.58393533934|
    +-----------------+

    We can run the excelEpochToUnixTimestamp function as follows:

    import com.github.mrpowers.spark.daria.sql.functions._
    
    val actualDF = testDF
      .withColumn("unix_timestamp", excelEpochToUnixTimestamp(col("excel_time")))
    
    actualDF.show()
    
    +-----------------+--------------------+
    |       excel_time|      unix_timestamp|
    +-----------------+--------------------+
    |43967.24166666666|1.5896080799999995E9|
    |33966.78333333378| 7.255684800000383E8|
    |43965.58383363244|1.5894648432258427E9|
    |33964.58393533934| 7.253784520133189E8|
    +-----------------+--------------------+
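
    The unix-timestamp variant keeps the same arithmetic as a fractional number of seconds. A plain-Scala sketch (again assuming the 25569-day offset):

```scala
// Sketch: fractional Excel days -> fractional seconds since the Unix epoch
val excelTime = 43967.24166666666
val unixTimestamp = (excelTime - 25569) * 86400.0
// close to 1.58960808e9, matching the first row above
```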
  27. final def getClass(): Class[_]
    Definition Classes
    AnyRef → Any
    Annotations
    @native() @HotSpotIntrinsicCandidate()
  28. def hashCode(): Int
    Definition Classes
    AnyRef → Any
    Annotations
    @native() @HotSpotIntrinsicCandidate()
  29. final def isInstanceOf[T0]: Boolean
    Definition Classes
    Any
  30. val isLuhnNumber: UserDefinedFunction
  31. def multiEquals[T](value: T, cols: Column*)(implicit arg0: scala.reflect.api.JavaUniverse.TypeTag[T]): Column

    Returns true if multiple columns are equal to a given value

    Suppose we have the following sourceDF:

    +---+---+
    | s1| s2|
    +---+---+
    |cat|cat|
    |cat|dog|
    |pig|pig|
    +---+---+

    We can use the multiEquals function to see if multiple columns are equal to "cat".

    val actualDF = sourceDF.withColumn(
      "are_s1_and_s2_cat",
      multiEquals[String]("cat", col("s1"), col("s2"))
    )
    
    actualDF.show()
    
    +---+---+-----------------+
    | s1| s2|are_s1_and_s2_cat|
    +---+---+-----------------+
    |cat|cat|             true|
    |cat|dog|            false|
    |pig|pig|            false|
    +---+---+-----------------+
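
    The underlying logic amounts to a forall over the values. A plain-Scala sketch with a hypothetical allEqual helper (illustrative; not the column-level implementation):

```scala
// Sketch of multiEquals' logic: true only when every value equals the target.
// allEqual is a hypothetical stand-in for the column-level implementation.
def allEqual[T](target: T, values: T*): Boolean = values.forall(_ == target)

val bothCat = allEqual("cat", "cat", "cat")  // true
val oneDog  = allEqual("cat", "cat", "dog")  // false
```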
  32. final def ne(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef
  33. def nextWeekday(col: Column): Column
  34. final def notify(): Unit
    Definition Classes
    AnyRef
    Annotations
    @native() @HotSpotIntrinsicCandidate()
  35. final def notifyAll(): Unit
    Definition Classes
    AnyRef
    Annotations
    @native() @HotSpotIntrinsicCandidate()
  36. val regexp_extract_all: UserDefinedFunction
  37. val regexp_extract_all_by_group: UserDefinedFunction
  38. def regexp_extract_all_by_group_fun(pattern: String, text: String, captureGroup: Int): Array[String]
  39. val regexp_extract_all_by_groups: UserDefinedFunction
  40. def regexp_extract_all_by_groups_fun(pattern: String, text: String, captureGroups: Seq[Int]): Array[Array[String]]
  41. def removeAllWhitespace(colName: String): Column

    Removes all whitespace in a string

    val actualDF = sourceDF.withColumn(
      "some_string_without_whitespace",
      removeAllWhitespace("some_string")
    )
    Since

    0.16.0

  42. def removeAllWhitespace(col: Column): Column

    Removes all whitespace in a string

    val actualDF = sourceDF.withColumn(
      "some_string_without_whitespace",
      removeAllWhitespace(col("some_string"))
    )

    Removes all whitespace in a string (e.g. changes "this has some" to "thishassome").

    Since

    0.16.0
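
    A plain-Scala sketch of the same transformation (illustrative; spark-daria presumably uses a regexp under the hood):

```scala
// Sketch: strip every whitespace character from the string
val cleaned = "this has some".replaceAll("\\s+", "")
// "thishassome"
```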

  43. def removeNonWordCharacters(col: Column): Column

    Removes all non-word characters from a string

    val actualDF = sourceDF.withColumn(
      "some_string_remove_non_word_chars",
      removeNonWordCharacters(col("some_string"))
    )

    Removes all non-word characters from a string, excluding whitespace (e.g. changes " ni!!ce h^^air person " to " nice hair person ").
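
    A plain-Scala sketch of the same transformation (illustrative only):

```scala
// Sketch: remove characters that are neither word characters nor whitespace
val cleaned = " ni!!ce h^^air person ".replaceAll("[^\\w\\s]+", "")
// " nice hair person "
```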

  44. def singleSpace(col: Column): Column

    Replaces all whitespace in a string with single spaces

    val actualDF = sourceDF.withColumn(
      "some_string_single_spaced",
      singleSpace(col("some_string"))
    )

    Replaces all multispaces with single spaces (e.g. changes "this  has     some" to "this has some").
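
    A plain-Scala sketch of collapsing whitespace runs (illustrative; the library version may additionally trim):

```scala
// Sketch: collapse each run of whitespace into a single space
val spaced = "this  has     some".replaceAll("\\s+", " ")
// "this has some"
```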

  45. final def synchronized[T0](arg0: ⇒ T0): T0
    Definition Classes
    AnyRef
  46. def toString(): String
    Definition Classes
    AnyRef → Any
  47. def truncate(col: Column, len: Int): Column

    Truncates the length of StringType columns

    sourceDF.withColumn(
      "some_string_truncated",
      truncate(col("some_string"), 3)
    )

    Truncates the "some_string" column to only have three characters.

  48. final def wait(arg0: Long, arg1: Int): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  49. final def wait(arg0: Long): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws( ... ) @native()
  50. final def wait(): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  51. def yeardiff(end: Column, start: Column): Column

    Returns the number of years from start to end. There is a datediff function that calculates the number of days between two dates, but there isn't a yeardiff function that calculates the number of years between two dates.

    The com.github.mrpowers.spark.daria.sql.functions.yeardiff function fills the gap. Let's see how it works!

    Suppose we have the following testDf

    +--------------------+--------------------+
    |      first_datetime|     second_datetime|
    +--------------------+--------------------+
    |2016-09-10 00:00:...|2001-08-10 00:00:...|
    |2016-04-18 00:00:...|2010-05-18 00:00:...|
    |2016-01-10 00:00:...|2013-08-10 00:00:...|
    |                null|                null|
    +--------------------+--------------------+

    We can run the yeardiff function as follows:

    import com.github.mrpowers.spark.daria.sql.functions._
    
    val actualDf = testDf
      .withColumn("num_years", yeardiff(col("first_datetime"), col("second_datetime")))
    
    actualDf.show()
    
    +--------------------+--------------------+------------------+
    |      first_datetime|     second_datetime|         num_years|
    +--------------------+--------------------+------------------+
    |2016-09-10 00:00:...|2001-08-10 00:00:...|15.095890410958905|
    |2016-04-18 00:00:...|2010-05-18 00:00:...| 5.923287671232877|
    |2016-01-10 00:00:...|2013-08-10 00:00:...| 2.419178082191781|
    |                null|                null|              null|
    +--------------------+--------------------+------------------+
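
    The arithmetic appears to be the day difference divided by 365 (an assumption consistent with the output above). A plain-Scala sketch using java.time:

```scala
import java.time.LocalDate
import java.time.temporal.ChronoUnit

// Sketch of the yeardiff arithmetic: days between the dates divided by 365
// (assumed formula, consistent with the first row of the output above)
val days  = ChronoUnit.DAYS.between(LocalDate.of(2001, 8, 10), LocalDate.of(2016, 9, 10))
val years = days / 365.0
// days == 5510, years close to 15.0958904
```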

Deprecated Value Members

  1. def finalize(): Unit
    Attributes
    protected[lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] ) @Deprecated
    Deprecated

Inherited from AnyRef

Inherited from Any

Collection functions

Date time functions

Misc functions

String functions

Support functions for DataFrames