I am new to Spark and would like to know how many cores and executors should be used for a Spark job on AWS if we have 2 slave c4.8xlarge nodes and 1 c4.8xlarge master node. I have tried different combinations but am not able to understand the concept.

Thank you.

1 Answer

The Cloudera folks give a good explanation of this in the following talk:

https://www.youtube.com/watch?v=vfiJQ7wg81Y

Say you have 16 cores on each node (I think this is roughly your case). You leave 1 core for YARN to manage the node, then divide the remaining 15 by 3, so each executor gets 5 cores. There is also JVM overhead of max(384 MB, 0.07 * spark.executor.memory) per executor. So with 3 executors per node, you lose 3 * max(384 MB, 0.07 * spark.executor.memory) to JVM overhead, and the rest can be used for the memory containers.
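As a concrete illustration of that arithmetic (a rough sketch only, assuming 16 usable cores and about 60 GB of RAM per worker node; check the actual specs of your instance type):

    # Sizing sketch for one worker node (assumed numbers, not measured)
    CORES_PER_NODE=16
    CORES_FOR_YARN=1                     # leave one core for YARN / the OS
    EXECUTORS_PER_NODE=3
    CORES_PER_EXECUTOR=$(( (CORES_PER_NODE - CORES_FOR_YARN) / EXECUTORS_PER_NODE ))
    echo "cores per executor: $CORES_PER_EXECUTOR"   # prints 5
    # Memory: ~60 GB / 3 executors is roughly 20 GB each; an 18 GB heap plus
    # max(384 MB, 0.07 * 18 GB) ~= 1.3 GB of JVM overhead fits in that budget.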

However, on a cluster with many users working simultaneously, YARN can preempt some of your Spark session's containers, forcing Spark to walk back through the DAG and recompute the lost RDDs to bring them back to their present state, which is expensive. That is why you should set --num-executors, --executor-memory and --executor-cores slightly lower, to leave some room for other users in advance. But this does not apply on AWS, where you are the only user of the cluster.

--executor-memory 18g should work for you, by the way.
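For example, with 3 executors on each of your 2 worker nodes, a spark-submit invocation might look something like this (a sketch only; the application file is a placeholder):

    # Sketch: 6 executors = 3 per worker node x 2 worker nodes.
    # my_job.py is a hypothetical application; replace it with your own.
    spark-submit \
      --master yarn \
      --deploy-mode cluster \
      --num-executors 6 \
      --executor-cores 5 \
      --executor-memory 18g \
      my_job.py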

More details on tuning your cluster parameters: http://blog.cloudera.com/blog/2015/03/how-to-tune-your-apache-spark-jobs-part-2/

