You can use when to mask the irrelevant values, then take the per-id maximum of the From dates on rows where To is null:
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{lit, max, when}
import spark.implicits._

val df2 = df.withColumn(
  "To",
  when(
    $"To" === lit("2198-01-01"),
    // per-id max of From over the rows whose To is null
    max(when($"To".isNull, $"From")).over(Window.partitionBy("id"))
  ).otherwise($"To")
)
df2.show
+-----+----------+----------+
| id| From| To|
+-----+----------+----------+
|James|2021-01-09|2021-01-15|
|James|2021-01-14|2021-01-22|
|James|2021-01-22| null|
| Sara|2021-01-16|2021-02-23|
| Sara|2021-02-23| null|
+-----+----------+----------+
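For a reproducible check, here is a minimal self-contained sketch. The input rows are reconstructed from the output above, and the sentinel date 2198-01-01 on the open-ended rows is an assumption inferred from the condition in the code:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{lit, max, when}

val spark = SparkSession.builder.master("local[*]").appName("demo").getOrCreate()
import spark.implicits._

// Assumed input: To = 2198-01-01 marks the rows whose end date is unknown
val df = Seq(
  ("James", "2021-01-09", "2021-01-15"),
  ("James", "2021-01-14", "2198-01-01"),
  ("James", "2021-01-22", null),
  ("Sara",  "2021-01-16", "2198-01-01"),
  ("Sara",  "2021-02-23", null)
).toDF("id", "From", "To")

val df2 = df.withColumn(
  "To",
  when(
    $"To" === lit("2198-01-01"),
    // replace the sentinel with the per-id max From among null-To rows
    max(when($"To".isNull, $"From")).over(Window.partitionBy("id"))
  ).otherwise($"To")
)
df2.show
```

Note that max ignores nulls, so the inner when effectively selects only the From values of rows whose To is null within each id partition.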