scala - How to obtain the symmetric difference between two DataFrames?

asked Oct 17, 2021 in Technique by 深蓝 (71.8m points)

In the Spark SQL 1.6 API (Scala), DataFrame has functions for intersect and except, but not one for symmetric difference. Obviously, a combination of union and except can be used to generate the difference:

df1.except(df2).union(df2.except(df1))

But this seems a bit awkward. In my experience, if something seems awkward, there's a better way to do it, especially in Scala.

1 Answer

深蓝 · Answer 1 · 2021-10-17T02:47:35+0000

You can always rewrite it as:

df1.unionAll(df2).except(df1.intersect(df2))

Seriously though, UNION, INTERSECT and EXCEPT / MINUS are pretty much the standard set of SQL combining operators. I am not aware of any system that provides an XOR-like operation out of the box, most likely because it is trivial to implement using the other three and there is not much to optimize there.
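If the one-liner feels awkward at the call site, it can be wrapped in a small helper. The following is only a sketch, assuming Spark 2.0 or later, where `union` replaces the deprecated `unionAll` (on 1.6, substitute `unionAll`). The names `SymmetricDifferenceSketch` and `symmetricDifference` are hypothetical and not part of the Spark API, and the sample data is made up for illustration. Note that `except` has EXCEPT DISTINCT semantics, so duplicate rows are dropped from the result.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object SymmetricDifferenceSketch {
  // Hypothetical helper, not part of the Spark API.
  // Returns the rows that appear in exactly one of the two DataFrames.
  // Both inputs must have the same schema; duplicates are removed
  // because except behaves like SQL EXCEPT DISTINCT.
  def symmetricDifference(left: DataFrame, right: DataFrame): DataFrame =
    left.except(right).union(right.except(left))

  def main(args: Array[String]): Unit = {
    // Local session for illustration only.
    val spark = SparkSession.builder()
      .appName("symmetric-difference-sketch")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Made-up sample data: 1 is only in df1, 4 is only in df2.
    val df1 = Seq(1, 2, 3).toDF("id")
    val df2 = Seq(2, 3, 4).toDF("id")

    val result = symmetricDifference(df1, df2).as[Int].collect().sorted
    println(result.mkString(", "))

    spark.stop()
  }
}
```

With this sample data the rows shared by both DataFrames (2 and 3) drop out, leaving the ids found on only one side.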
