object functions
Spark [has a ton of SQL functions](https://spark.apache.org/docs/2.1.0/api/java/org/apache/spark/sql/functions.html) and spark-daria is meant to fill in any gaps.
Value Members
- def antiTrim(col: Column): Column

  Removes all inner whitespace, but doesn't delete leading or trailing whitespace (e.g. changes " this has some " to " thishassome ").

  ```scala
  val actualDF = sourceDF.withColumn(
    "some_string_anti_trimmed",
    antiTrim(col("some_string"))
  )
  ```

- def arrayExNull(cols: Column*): Column

  Like the array() function, but doesn't include null elements.
- def array_filter_nulls[T]()(implicit arg0: scala.reflect.api.JavaUniverse.TypeTag[T]): UserDefinedFunction
- def array_groupBy[T](f: (T) ⇒ Boolean)(implicit arg0: scala.reflect.api.JavaUniverse.TypeTag[T]): UserDefinedFunction
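The array helpers above are easiest to see on plain Scala collections. A minimal sketch of the null-filtering semantics behind arrayExNull / array_filter_nulls (the real functions return Spark Columns and UserDefinedFunctions, so this only illustrates the behavior; the helper name here is made up):

```scala
// Sketch of the null-filtering semantics of arrayExNull /
// array_filter_nulls on a plain Scala Seq rather than a Spark Column.
def filterNulls[T](values: Seq[T]): Seq[T] =
  values.filter(_ != null)
```

For example, `filterNulls(Seq("a", null, "b"))` keeps only `"a"` and `"b"`.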
- def beginningOfMonth(colName: String): Column
- def beginningOfMonthDate(col: Column): Column
- def beginningOfMonthTime(col: Column): Column
- def beginningOfWeek(col: Column, lastDayOfWeek: String = "Sat"): Column
- def broadcastArrayContains[T](col: Column, broadcastedArray: Broadcast[Array[T]]): Column
- def bucketFinder(col: Column, buckets: Array[(Any, Any)], inclusiveBoundries: Boolean = false, lowestBoundLte: Boolean = false, highestBoundGte: Boolean = false): Column
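bucketFinder maps a numeric value into one of the supplied (min, max) buckets. A hypothetical plain-Scala sketch of the lookup logic (the "lo-hi" label format and the open-bound handling via Option are assumptions for illustration, not the library's exact output; the parameter name inclusiveBoundries matches the signature above):

```scala
// Hypothetical sketch of bucketFinder's lookup logic.
// A None bound means the bucket is open on that side; the "lo-hi" label
// format is an assumption, not necessarily what the library emits.
def findBucket(
    value: Double,
    buckets: Array[(Option[Double], Option[Double])],
    inclusiveBoundries: Boolean = false
): Option[String] =
  buckets.collectFirst {
    case (lo, hi)
        if lo.forall(l => if (inclusiveBoundries) value >= l else value > l) &&
           hi.forall(h => if (inclusiveBoundries) value <= h else value < h) =>
      s"${lo.getOrElse("-inf")}-${hi.getOrElse("inf")}"
  }
```

A value that falls outside every bucket yields None, mirroring a null Column result.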
- def dayOfWeekStr(col: Column): Column
- def endOfMonthDate(col: Column): Column
- def endOfWeek(col: Column, lastDayOfWeek: String = "Sat"): Column
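beginningOfWeek and endOfWeek take a lastDayOfWeek argument (defaulting to "Sat"), so the week runs from the day after lastDayOfWeek through lastDayOfWeek. A sketch of those semantics with java.time (an illustration of the date arithmetic, not the library's Column-based implementation):

```scala
import java.time.{DayOfWeek, LocalDate}
import java.time.temporal.TemporalAdjusters

// Week-boundary sketch: with the default lastDayOfWeek = SATURDAY the week
// runs Sunday through Saturday, so the beginning of the week is the most
// recent Sunday and the end is the next Saturday (or the date itself).
def beginningOfWeek(date: LocalDate, lastDayOfWeek: DayOfWeek = DayOfWeek.SATURDAY): LocalDate =
  date.`with`(TemporalAdjusters.previousOrSame(lastDayOfWeek.plus(1)))

def endOfWeek(date: LocalDate, lastDayOfWeek: DayOfWeek = DayOfWeek.SATURDAY): LocalDate =
  date.`with`(TemporalAdjusters.nextOrSame(lastDayOfWeek))
```

For example, for Thursday 2020-05-14 the week spans Sunday 2020-05-10 through Saturday 2020-05-16.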
- def excelEpochToDate(colName: String): Column

  Converts an Excel epoch to a date. Let's see how it works. Suppose we have the following testDF:

  ```
  +-----------------+
  |       excel_time|
  +-----------------+
  |43967.24166666666|
  |33966.78333333378|
  |43965.58383363244|
  |33964.58393533934|
  +-----------------+
  ```

  We can run the excelEpochToDate function as follows:

  ```scala
  import com.github.mrpowers.spark.daria.sql.functions._

  val actualDF = testDF
    .withColumn("date", excelEpochToDate("excel_time"))
  actualDF.show()
  ```

  ```
  +-----------------+----------+
  |       excel_time|      date|
  +-----------------+----------+
  |43967.24166666666|2020-05-16|
  |33966.78333333378|1992-12-28|
  |43965.58383363244|2020-05-14|
  |33964.58393533934|1992-12-26|
  +-----------------+----------+
  ```
- def excelEpochToDate(col: Column): Column

  Converts an Excel epoch to a date. Let's see how it works. Suppose we have the following testDF:

  ```
  +-----------------+
  |       excel_time|
  +-----------------+
  |43967.24166666666|
  |33966.78333333378|
  |43965.58383363244|
  |33964.58393533934|
  +-----------------+
  ```

  We can run the excelEpochToDate function as follows:

  ```scala
  import com.github.mrpowers.spark.daria.sql.functions._

  val actualDF = testDF
    .withColumn("date", excelEpochToDate(col("excel_time")))
  actualDF.show()
  ```

  ```
  +-----------------+----------+
  |       excel_time|      date|
  +-----------------+----------+
  |43967.24166666666|2020-05-16|
  |33966.78333333378|1992-12-28|
  |43965.58383363244|2020-05-14|
  |33964.58393533934|1992-12-26|
  +-----------------+----------+
  ```
- def excelEpochToTimestamp(colName: String): Column

  Converts an Excel epoch to a timestamp. Inspired by [Filip Czaja](http://fczaja.blogspot.com/2011/06/convert-excel-date-into-timestamp.html). Let's see how it works. Suppose we have the following testDF:

  ```
  +-----------------+
  |       excel_time|
  +-----------------+
  |43967.24166666666|
  |33966.78333333378|
  |43965.58383363244|
  |33964.58393533934|
  +-----------------+
  ```

  We can run the excelEpochToTimestamp function as follows:

  ```scala
  import com.github.mrpowers.spark.daria.sql.functions._

  val actualDF = testDF
    .withColumn("timestamp", excelEpochToTimestamp("excel_time"))
  actualDF.show()
  ```

  ```
  +-----------------+-------------------+
  |       excel_time|          timestamp|
  +-----------------+-------------------+
  |43967.24166666666|2020-05-16 05:47:59|
  |33966.78333333378|1992-12-28 18:48:00|
  |43965.58383363244|2020-05-14 14:00:43|
  |33964.58393533934|1992-12-26 14:00:52|
  +-----------------+-------------------+
  ```
- def excelEpochToTimestamp(col: Column): Column

  Converts an Excel epoch to a timestamp. Inspired by [Filip Czaja](http://fczaja.blogspot.com/2011/06/convert-excel-date-into-timestamp.html). Let's see how it works. Suppose we have the following testDF:

  ```
  +-----------------+
  |       excel_time|
  +-----------------+
  |43967.24166666666|
  |33966.78333333378|
  |43965.58383363244|
  |33964.58393533934|
  +-----------------+
  ```

  We can run the excelEpochToTimestamp function as follows:

  ```scala
  import com.github.mrpowers.spark.daria.sql.functions._

  val actualDF = testDF
    .withColumn("timestamp", excelEpochToTimestamp(col("excel_time")))
  actualDF.show()
  ```

  ```
  +-----------------+-------------------+
  |       excel_time|          timestamp|
  +-----------------+-------------------+
  |43967.24166666666|2020-05-16 05:47:59|
  |33966.78333333378|1992-12-28 18:48:00|
  |43965.58383363244|2020-05-14 14:00:43|
  |33964.58393533934|1992-12-26 14:00:52|
  +-----------------+-------------------+
  ```
- def excelEpochToUnixTimestamp(colName: String): Column

  Converts an Excel epoch to a unix timestamp. Inspired by [Filip Czaja](http://fczaja.blogspot.com/2011/06/convert-excel-date-into-timestamp.html). Let's see how it works. Suppose we have the following testDF:

  ```
  +-----------------+
  |       excel_time|
  +-----------------+
  |43967.24166666666|
  |33966.78333333378|
  |43965.58383363244|
  |33964.58393533934|
  +-----------------+
  ```

  We can run the excelEpochToUnixTimestamp function as follows:

  ```scala
  import com.github.mrpowers.spark.daria.sql.functions._

  val actualDF = testDF
    .withColumn("unix_timestamp", excelEpochToUnixTimestamp("excel_time"))
  actualDF.show()
  ```

  ```
  +-----------------+--------------------+
  |       excel_time|      unix_timestamp|
  +-----------------+--------------------+
  |43967.24166666666|1.5896080799999995E9|
  |33966.78333333378| 7.255684800000383E8|
  |43965.58383363244|1.5894648432258427E9|
  |33964.58393533934| 7.253784520133189E8|
  +-----------------+--------------------+
  ```
- def excelEpochToUnixTimestamp(col: Column): Column

  Converts an Excel epoch to a unix timestamp. Inspired by [Filip Czaja](http://fczaja.blogspot.com/2011/06/convert-excel-date-into-timestamp.html). Let's see how it works. Suppose we have the following testDF:

  ```
  +-----------------+
  |       excel_time|
  +-----------------+
  |43967.24166666666|
  |33966.78333333378|
  |43965.58383363244|
  |33964.58393533934|
  +-----------------+
  ```

  We can run the excelEpochToUnixTimestamp function as follows:

  ```scala
  import com.github.mrpowers.spark.daria.sql.functions._

  val actualDF = testDF
    .withColumn("unix_timestamp", excelEpochToUnixTimestamp(col("excel_time")))
  actualDF.show()
  ```

  ```
  +-----------------+--------------------+
  |       excel_time|      unix_timestamp|
  +-----------------+--------------------+
  |43967.24166666666|1.5896080799999995E9|
  |33966.78333333378| 7.255684800000383E8|
  |43965.58383363244|1.5894648432258427E9|
  |33964.58393533934| 7.253784520133189E8|
  +-----------------+--------------------+
  ```
- val isLuhnNumber: UserDefinedFunction
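isLuhnNumber is a UDF that checks the Luhn checksum used to validate credit-card and similar identification numbers. A plain-Scala sketch of the underlying algorithm (the UDF itself operates on a Spark Column; the helper name here is for illustration):

```scala
// Luhn checksum: starting from the rightmost digit, double every second
// digit, subtract 9 from any doubled value above 9, and check that the
// total is divisible by 10. Non-numeric or empty strings are invalid.
def luhnValid(s: String): Boolean =
  s.nonEmpty && s.forall(_.isDigit) && {
    val sum = s.reverse.map(_.asDigit).zipWithIndex.map {
      case (d, i) if i % 2 == 1 => if (d * 2 > 9) d * 2 - 9 else d * 2
      case (d, _)               => d
    }.sum
    sum % 10 == 0
  }
```

For example, the well-known test number 4111111111111111 passes the check, while 4111111111111112 fails.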
- def multiEquals[T](value: T, cols: Column*)(implicit arg0: scala.reflect.api.JavaUniverse.TypeTag[T]): Column

  Returns true if multiple columns are all equal to a given value. Suppose we have the following sourceDF:

  ```
  +---+---+
  | s1| s2|
  +---+---+
  |cat|cat|
  |cat|dog|
  |pig|pig|
  +---+---+
  ```

  We can use the multiEquals function to see if multiple columns are equal to "cat":

  ```scala
  val actualDF = sourceDF.withColumn(
    "are_s1_and_s2_cat",
    multiEquals[String]("cat", col("s1"), col("s2"))
  )
  actualDF.show()
  ```

  ```
  +---+---+-----------------+
  | s1| s2|are_s1_and_s2_cat|
  +---+---+-----------------+
  |cat|cat|             true|
  |cat|dog|            false|
  |pig|pig|            false|
  +---+---+-----------------+
  ```
- def nextWeekday(col: Column): Column
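nextWeekday returns the next weekday (business day) after a given date. A java.time sketch of the date arithmetic (the real function operates on a Spark Column; weekend handling here is an assumption about the intended semantics):

```scala
import java.time.{DayOfWeek, LocalDate}

// Next weekday: Friday and Saturday both jump to the following Monday;
// every other day (including Sunday) advances to the next calendar day.
def nextWeekday(date: LocalDate): LocalDate = date.getDayOfWeek match {
  case DayOfWeek.FRIDAY   => date.plusDays(3)
  case DayOfWeek.SATURDAY => date.plusDays(2)
  case _                  => date.plusDays(1)
}
```

So Friday 2020-05-15 maps to Monday 2020-05-18, while Wednesday 2020-05-13 maps to Thursday 2020-05-14.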
- val regexp_extract_all: UserDefinedFunction
- val regexp_extract_all_by_group: UserDefinedFunction
- def regexp_extract_all_by_group_fun(pattern: String, text: String, captureGroup: Int): Array[String]
- val regexp_extract_all_by_groups: UserDefinedFunction
- def regexp_extract_all_by_groups_fun(pattern: String, text: String, captureGroups: Seq[Int]): Array[Array[String]]
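The regexp_extract_all family returns every regex match in a string, whereas Spark's built-in regexp_extract returns only the first. A sketch of the extraction logic with Scala's Regex, mirroring the shape of the _fun helpers above (illustrative only; the library's exact behavior may differ):

```scala
// Extract every match of pattern in text.
def extractAll(pattern: String, text: String): Array[String] =
  pattern.r.findAllIn(text).toArray

// Extract a single capture group from every match, mirroring the shape
// of regexp_extract_all_by_group_fun.
def extractAllByGroup(pattern: String, text: String, captureGroup: Int): Array[String] =
  pattern.r.findAllMatchIn(text).map(_.group(captureGroup)).toArray
```

For example, extracting `"\\d+"` from `"a1 b22 c333"` yields all three digit runs, and capture group 1 of `"(\\w+)=(\\d+)"` pulls out the keys of a `key=value` list.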
- def removeAllWhitespace(colName: String): Column

  Removes all whitespace in a string (e.g. changes "this has some" to "thishassome").

  ```scala
  val actualDF = sourceDF.withColumn(
    "some_string_without_whitespace",
    removeAllWhitespace("some_string")
  )
  ```

  - Since 0.16.0
- def removeAllWhitespace(col: Column): Column

  Removes all whitespace in a string (e.g. changes "this has some" to "thishassome").

  ```scala
  val actualDF = sourceDF.withColumn(
    "some_string_without_whitespace",
    removeAllWhitespace(col("some_string"))
  )
  ```

  - Since 0.16.0
- def removeNonWordCharacters(col: Column): Column

  Removes all non-word characters from a string, excluding whitespace (e.g. changes " ni!!ce h^^air person " to " nice hair person ").

  ```scala
  val actualDF = sourceDF.withColumn(
    "some_string_remove_non_word_chars",
    removeNonWordCharacters(col("some_string"))
  )
  ```
- def singleSpace(col: Column): Column

  Replaces all runs of consecutive whitespace with single spaces (e.g. changes "this  has     some" to "this has some").

  ```scala
  val actualDF = sourceDF.withColumn(
    "some_string_single_spaced",
    singleSpace(col("some_string"))
  )
  ```
- def truncate(col: Column, len: Int): Column

  Truncates StringType columns to a given length.

  ```scala
  sourceDF.withColumn(
    "some_string_truncated",
    truncate(col("some_string"), 3)
  )
  ```

  Truncates the "some_string" column to only have three characters.
- def yeardiff(end: Column, start: Column): Column

  Returns the number of years from start to end. There is a datediff function that calculates the number of days between two dates, but there isn't a yeardiff function that calculates the number of years between two dates.

  The com.github.mrpowers.spark.daria.sql.functions.yeardiff function fills the gap. Let's see how it works!

  Suppose we have the following testDf:

  ```
  +--------------------+--------------------+
  |      first_datetime|     second_datetime|
  +--------------------+--------------------+
  |2016-09-10 00:00:...|2001-08-10 00:00:...|
  |2016-04-18 00:00:...|2010-05-18 00:00:...|
  |2016-01-10 00:00:...|2013-08-10 00:00:...|
  |                null|                null|
  +--------------------+--------------------+
  ```

  We can run the yeardiff function as follows:

  ```scala
  import com.github.mrpowers.spark.daria.sql.functions._

  val actualDf = testDf
    .withColumn("num_years", yeardiff(col("first_datetime"), col("second_datetime")))
  actualDf.show()
  ```

  ```
  +--------------------+--------------------+------------------+
  |      first_datetime|     second_datetime|         num_years|
  +--------------------+--------------------+------------------+
  |2016-09-10 00:00:...|2001-08-10 00:00:...|15.095890410958905|
  |2016-04-18 00:00:...|2010-05-18 00:00:...| 5.923287671232877|
  |2016-01-10 00:00:...|2013-08-10 00:00:...| 2.419178082191781|
  |                null|                null|              null|
  +--------------------+--------------------+------------------+
  ```