I am trying to find a solution in Spark to group rows that share a common element in an array column.
key      value
[k1,k2]  v1
[k2]     v2
[k3,k2]  v3
[k4]     v4
If any element of a row's key array also appears in another row's key array, both rows must be assigned the same group ID (i.e. group by common element).
Result:
key      value  GroupID
[k1,k2]  v1     G1
[k2]     v2     G1
[k3,k2]  v3     G1
[k4]     v4     G2
Some answers already suggest Spark GraphX, but right now the learning curve is too steep to justify it for a single feature.
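Without GraphX, the grouping rule can be expressed as a union-find (disjoint-set) problem: every element of a key array is merged into one set, and rows whose keys land in the same set get the same group ID. The sketch below is plain Python, not Spark code, and the names `UnionFind` and `assign_groups` are hypothetical. It treats the grouping as transitive (two rows that share no element directly still join the same group if another row links them), which is an assumption beyond the example above, and it assumes every key array is non-empty. To use it from Spark, you would additionally have to assume the key column is small enough to `collect()` to the driver, then join the computed labels back onto the DataFrame.

```python
from typing import Dict, Hashable, List, Sequence


class UnionFind:
    """Disjoint-set forest with path compression and union by size."""

    def __init__(self) -> None:
        self.parent: Dict[Hashable, Hashable] = {}
        self.size: Dict[Hashable, int] = {}

    def find(self, x: Hashable) -> Hashable:
        # Register an unseen element as its own singleton set.
        if x not in self.parent:
            self.parent[x] = x
            self.size[x] = 1
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        # Path compression: point every node on the path directly at the root.
        while x != root:
            nxt = self.parent[x]
            self.parent[x] = root
            x = nxt
        return root

    def union(self, a: Hashable, b: Hashable) -> None:
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return
        # Union by size: attach the smaller tree under the larger one.
        if self.size[ra] < self.size[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra
        self.size[ra] += self.size[rb]


def assign_groups(keys: Sequence[Sequence[Hashable]]) -> List[str]:
    """Return a label (G1, G2, ...) per key array, numbered by first appearance.

    Hypothetical helper: assumes every key array is non-empty.
    """
    uf = UnionFind()
    for key in keys:
        if not key:
            raise ValueError("key arrays must be non-empty")
        # All elements of one array belong to the same group.
        for other in key[1:]:
            uf.union(key[0], other)
    labels: Dict[Hashable, str] = {}
    result: List[str] = []
    for key in keys:
        root = uf.find(key[0])
        if root not in labels:
            labels[root] = f"G{len(labels) + 1}"
        result.append(labels[root])
    return result


if __name__ == "__main__":
    keys = [["k1", "k2"], ["k2"], ["k3", "k2"], ["k4"]]
    print(assign_groups(keys))  # ['G1', 'G1', 'G1', 'G2']
```

Union by size plus path compression keeps each `find` close to constant time, so the cost is roughly linear in the total number of array elements.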