Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


0 votes
983 views
in Technique[技术] by (71.8m points)

Select values from the first table that are not present in the second table in Spark, using Scala or Spark SQL


He who fights with dragons too long becomes a dragon himself; gaze too long into the abyss, and the abyss will gaze back…

1 Reply

0 votes
by (71.8m points)

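You can do a left anti join (join type "leftanti") between the two DataFrames. It returns only the rows of the left DataFrame that have no match in the right DataFrame on the join columns, which is exactly the set of values present in the first table but not in the second.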

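 // Needed for .toDF on a Seq; assumes a SparkSession named spark (spark-shell or a notebook)
 import spark.implicits._

 // First table: the full set of rows
 val df = Seq(
   ("A","20210121","key1","value1"), ("A","20210121","key2","value2"), ("A","20210121","key3","value3"),
   ("B","20210121","key1","value1"), ("B","20210121","key2","value1"), ("B","20210121","key3","value3"),
   ("B","20210121","key4","value3"), ("C","20210121","key1","value2")
 ).toDF("row_key","data_as_of_date","key","value")

 // Second table: the rows to exclude from the first
 val df1 = Seq(
   ("A","20210121","key1","value1"), ("A","20210121","key2","value2"), ("B","20210121","key1","value1"),
   ("B","20210121","key4","value3"), ("C","20210121","key1","value2")
 ).toDF("row_key","data_as_of_date","key","value")

 // Keep the rows of df whose (row_key, data_as_of_date, key) has no match in df1
 val outputdf = df.join(df1, Seq("row_key","data_as_of_date","key"), "leftanti")
 display(outputdf) // Databricks notebook helper; use outputdf.show() elsewhere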

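The output contains the three rows of df whose (row_key, data_as_of_date, key) combination does not appear in df1 (row order may vary):

 A, 20210121, key3, value3
 B, 20210121, key2, value1
 B, 20210121, key3, value3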
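
Since the question also asks about Spark SQL, here is a minimal sketch of the same anti join written as a query. It assumes the two DataFrames are registered as temporary views. The view names first_table and second_table, the object name LeftAntiSql, and the helper missingRows are hypothetical and chosen only for illustration. The local SparkSession is there to make the example self-contained; in spark-shell or a notebook you would use the existing spark instead.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object LeftAntiSql {

  // Returns the rows of the first table that have no match in the second table.
  def missingRows(spark: SparkSession): DataFrame = {
    import spark.implicits._

    val df = Seq(
      ("A","20210121","key1","value1"), ("A","20210121","key2","value2"), ("A","20210121","key3","value3"),
      ("B","20210121","key1","value1"), ("B","20210121","key2","value1"), ("B","20210121","key3","value3"),
      ("B","20210121","key4","value3"), ("C","20210121","key1","value2")
    ).toDF("row_key","data_as_of_date","key","value")

    val df1 = Seq(
      ("A","20210121","key1","value1"), ("A","20210121","key2","value2"), ("B","20210121","key1","value1"),
      ("B","20210121","key4","value3"), ("C","20210121","key1","value2")
    ).toDF("row_key","data_as_of_date","key","value")

    // Hypothetical view names, used only so the DataFrames can be referenced from SQL
    df.createOrReplaceTempView("first_table")
    df1.createOrReplaceTempView("second_table")

    // LEFT ANTI JOIN keeps only the left-side rows with no match on the ON condition
    spark.sql(
      """SELECT f.*
        |FROM first_table f
        |LEFT ANTI JOIN second_table s
        |  ON  f.row_key = s.row_key
        |  AND f.data_as_of_date = s.data_as_of_date
        |  AND f.key = s.key""".stripMargin)
  }

  def main(args: Array[String]): Unit = {
    // Local session for illustration only
    val spark = SparkSession.builder().appName("left-anti-sql").master("local[*]").getOrCreate()
    missingRows(spark).show() // three rows: (A, key3), (B, key2), (B, key3); order may vary
    spark.stop()
  }
}
```

As in the DataFrame version, the value column is not part of the match condition, so a row is dropped whenever its three key columns appear in the second table, even if the value differs.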



...