Scala Spark reverse grouping of groupBy

Question

asked Jan 31, 2022 in Technique[技术] by 深蓝 (71.8m points)

I'm trying to reverse (flatten out) the grouping created on an RDD in Scala, like the one shown here: https://backtobazics.com/big-data/spark/apache-spark-groupby-example/

Basically, what I have is a key-value pair where the value is a list, and I want to flatten that out. I suspect the answer lies in flatMap somehow, but I can't figure out the syntax. Can anybody point me in the right direction, please?

1 Answer

深蓝 · Answer 1 · 2022-01-31T07:16:09+0000

It would help to include some code in the question, but here is how you can flatten the result of a groupBy using flatMap (the snippet is modeled on the "Spark groupBy Example Using Scala" article). For now, I assume you are working with an RDD of strings.

// `sc` is the SparkContext that spark-shell provides
import org.apache.spark.rdd.RDD
val v = Array("foo", "bar", "foobarz")
val rdd: RDD[String] = sc.parallelize(v)
// your grouping function goes here
val kvRDD: RDD[(String, Iterable[String])] = rdd.groupBy(x => x)
// if you explicitly want to keep the key and generate an RDD of (key, value) tuples
val pairRDD: RDD[(String, String)] = kvRDD.flatMap { case (k, vs) => vs.map(i => (k, i)) }
// or if you just want to undo the grouping without preserving the key
val origRDD: RDD[String] = kvRDD.flatMap { case (_, vs) => vs }
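
The flatMap pattern above can also be tried without a Spark cluster, since plain Scala collections offer the same groupBy and flatMap operations. The following is only a minimal sketch under that assumption: the object name UngroupSketch and the helpers ungroupWithKey and ungroup are hypothetical names introduced for illustration, and grouping by string length is an arbitrary choice.

```scala
object UngroupSketch {
  // Turn (key, values) groups back into one (key, value) pair per element.
  def ungroupWithKey[K, V](grouped: Map[K, Iterable[V]]): Seq[(K, V)] =
    grouped.toSeq.flatMap { case (k, vs) => vs.map(v => (k, v)) }

  // Drop the keys and return only the grouped elements.
  def ungroup[K, V](grouped: Map[K, Iterable[V]]): Seq[V] =
    grouped.toSeq.flatMap { case (_, vs) => vs }

  def main(args: Array[String]): Unit = {
    val words = Seq("foo", "bar", "foobarz")
    // Arbitrary grouping function for the example: group by string length.
    val grouped: Map[Int, Seq[String]] = words.groupBy(_.length)
    // Group order is not guaranteed, so sort before inspecting the result.
    println(ungroupWithKey(grouped).sortBy(_._2))
    println(ungroup(grouped).sorted)
  }
}
```

Note that ungrouping does not promise the original element order, either here or on an RDD. On a pair RDD, the keyed variant can probably be written more briefly with flatMapValues, for example kvRDD.flatMapValues(vs => vs), which leaves the keys untouched.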
