Welcome to ShenZhenJia Knowledge Sharing Community for programmer and developer-Open, Learning and Share
menu search
person
Welcome To Ask or Share your Answers For Others

Categories

I have a data set with multiple columns. A function needs to be invoked to compute result using the data available within a row. So I used a case class with a method and created a data set using it. As example,

case class testCase(x: Double, a1: Array[Double], a2: Array[Double]) {
    var someInt = 0
    def myMethod1(): Unit = {...}    // use x, a1 and a2
    def myMethod2(): Unit = {...}    // use x, a1 and a2
    def result(): { return someInt }

It is called from the main() as

val res = myDS.map(_.result()).toDF("result")

The problem I am facing is that while the code works correctly, no matter how I invoke, unlike for the other parts of the program, the above statement does not work concurrently. Irrespective of the number executors, cores and repartitioning, only one instance of a method seems to work at time!

Any hints to what I should look at would be appreciated.

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
thumb_up_alt 0 like thumb_down_alt 0 dislike
164 views
Welcome To Ask or Share your Answers For Others

1 Answer

testCase case class should not be mutable, if you concurrently modify the state of the object your program will be non deterministic because of that. What is looking wrong with the few information you are giving is this snippet

var someInt = 0

that value is probably being modified concurrently by several tasks and I am pretty sure you don't want that.

Can you explain what are you trying to do?


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
thumb_up_alt 0 like thumb_down_alt 0 dislike
Welcome to ShenZhenJia Knowledge Sharing Community for programmer and developer-Open, Learning and Share
...