You can try an anti join, which removes the rows of df2 that satisfy the given condition:
import pyspark.sql.functions as F

# Keep only the rows of df2 for which no df1 row exists with the same
# custm_id, transf_date and transfr but a different pdt
result = df2.alias('df2').join(
    df1.alias('df1'),
    F.expr("""
        df2.custm_id = df1.custm_id and
        df2.transf_date = df1.transf_date and
        df2.transfr = df1.transfr and
        df2.pdt != df1.pdt
    """),
    'anti'
)
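As a quick sanity check, here is a minimal sketch with made-up data (the column values below are assumptions, not taken from your DataFrames) showing which rows the anti join drops:

from pyspark.sql import SparkSession
import pyspark.sql.functions as F

spark = SparkSession.builder.getOrCreate()

# Hypothetical sample data: df1 has one product per customer/date/transfer
df1 = spark.createDataFrame(
    [(1, "2023-01-01", "T1", "A")],
    ["custm_id", "transf_date", "transfr", "pdt"],
)
df2 = spark.createDataFrame(
    [(1, "2023-01-01", "T1", "A"),   # same pdt -> no match, row is kept
     (1, "2023-01-01", "T1", "B")],  # different pdt -> matches, row is removed
    ["custm_id", "transf_date", "transfr", "pdt"],
)

result = df2.alias("df2").join(
    df1.alias("df1"),
    F.expr("""
        df2.custm_id = df1.custm_id and
        df2.transf_date = df1.transf_date and
        df2.transfr = df1.transfr and
        df2.pdt != df1.pdt
    """),
    "anti",
)
result.show()  # only the row with pdt = 'A' remains

Note that the anti join removes a df2 row as soon as any df1 row satisfies the condition, so rows that match on the key columns but share the same pdt are kept.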