What is the correct syntax for filtering on multiple columns in the Scala API? For example, if I want to do something like this:

dataFrame.filter($"col01" === "something" && $"col02" === "something else")

or

dataFrame.filter($"col01" === "something" || $"col02" === "something else") 

EDIT:

This is what my original code looks like. Everything comes in as a string.

df.select($"userID" as "user", $"itemID" as "item", $"quantity" cast("int"), $"price" cast("float"), $"discount" cast ("float"), sqlf.substring($"datetime", 0, 10) as "date", $"group")
  .filter($"item" !== "" && $"group" !== "-1")
See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
thumb_up_alt 0 like thumb_down_alt 0 dislike
1.1k views
Welcome To Ask or Share your Answers For Others

1 Answer

I think I see what the issue is: Spark does not let you combine two !== comparisons in the same filter expression. The likely cause is operator precedence. Because !== ends in = (and is not one of Scala's exempt operators such as !=), Scala parses it as an assignment-style operator with lower precedence than &&, so the expression does not group the way you expect. This is also why newer Spark versions deprecate !== in favour of =!=.

To make your code work, you can write the filter with notEqual instead:

df.filter(col("item").notEqual("") && col("group").notEqual("-1"))

or split it into two filter calls in the same statement:

df.filter($"item" !== "").filter($"group" !== "-1").select(....)

The Spark Column API documentation covers the other comparison methods that are available.
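
Putting it together, here is a hedged sketch of the full select-and-filter pipeline from the question, assuming df holds the raw string columns named there (userID, itemID, quantity, price, discount, datetime, group) and that spark.implicits._ is imported as in the sketch above:

import org.apache.spark.sql.{functions => sqlf}

val cleaned = df
  .select(
    $"userID" as "user",
    $"itemID" as "item",
    $"quantity" cast "int",
    $"price" cast "float",
    $"discount" cast "float",
    sqlf.substring($"datetime", 0, 10) as "date",
    $"group")
  // notEqual sidesteps the !== precedence problem entirely
  .filter(sqlf.col("item").notEqual("") && sqlf.col("group").notEqual("-1"))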
