Welcome to ShenZhenJia Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


When iterating over a DataFrame with .foreach in Spark (Scala), is it possible to access another DataFrame, or load one from Spark SQL, in order to make comparisons? For example, DF1 holds the available days; if a day is marked as not available in DF1 but appears in DF2, I would like to ignore that row of DF1. I have the logic working when I call .collect on DF1 and iterate over the result, but DF1 will be a large dataset and I do not want to pull all of that data back onto the driver.

DF1 Schema
 |-- id: integer (nullable = false)
 |-- monday: boolean (nullable = false)
 |-- tuesday: boolean (nullable = false)
 |-- wednesday: boolean (nullable = false)
 |-- thursday: boolean (nullable = false)
 |-- friday: boolean (nullable = false)
 |-- saturday: boolean (nullable = false)
 |-- sunday: boolean (nullable = false)

DF2 Schema
 |-- start: timestamp (nullable = false)
 |-- end: timestamp (nullable = false)
 |-- dayStart: string (nullable = false)
 |-- dayEnd: string (nullable = false)
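
For reference, a DataFrame (like the SparkSession itself) lives on the driver and cannot be used inside a .foreach closure, which runs on the executors. The comparison therefore usually has to be written as a DataFrame-level operation such as a filter or a join. Below is a minimal sketch of one such approach, not a definitive solution. It rests on assumptions that the schemas above do not confirm: that dayStart holds a day name such as "Monday", that dayEnd can be ignored, and that DF2 therefore contains at most seven distinct day values, which makes collecting just those values cheap. The names AvailabilityFilter and dropConflictingRows are hypothetical.

```scala
import java.sql.Timestamp

import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, lit, lower}

object AvailabilityFilter {
  // Column names taken from the DF1 schema.
  private val days =
    Seq("monday", "tuesday", "wednesday", "thursday", "friday", "saturday", "sunday")

  // Hypothetical helper. Assumption: df2.dayStart contains a day name ("Monday", ...).
  def dropConflictingRows(df1: DataFrame, df2: DataFrame): DataFrame = {
    // Only the distinct day names are collected (at most seven values), never DF1 itself.
    val daysInDf2: Set[String] = df2
      .select(lower(col("dayStart")))
      .distinct()
      .collect()
      .map(_.getString(0))
      .toSet

    // True for a DF1 row when some day that appears in DF2 is marked unavailable.
    val conflict = days
      .filter(daysInDf2.contains)
      .map(d => !col(d))
      .foldLeft(lit(false))(_ || _)

    // The filter runs on the executors; the large DF1 is never pulled to the driver.
    df1.filter(!conflict)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("availability-sketch")
      .getOrCreate()
    import spark.implicits._

    // Made-up sample rows, for illustration only.
    val df1 = Seq(
      (1, true, true, true, true, true, false, false),
      (2, false, true, true, true, true, false, false)
    ).toDF("id", "monday", "tuesday", "wednesday", "thursday", "friday", "saturday", "sunday")

    val df2 = Seq(
      (Timestamp.valueOf("2024-01-01 09:00:00"), Timestamp.valueOf("2024-01-01 17:00:00"),
        "Monday", "Monday")
    ).toDF("start", "end", "dayStart", "dayEnd")

    // Row 2 marks monday as unavailable while Monday appears in DF2, so it is filtered out.
    dropConflictingRows(df1, df2).show()

    spark.stop()
  }
}
```

If DF2 had to be matched row by row (for example on the start and end timestamps), the same idea could be expressed as a join between DF1 and DF2 followed by a filter, rather than collecting anything to the driver.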


1 Answer

Waiting for an expert to reply

