
I am running a Spark job with 3 input files of 100 MB each, but for some reason the Spark UI shows the whole dataset concentrated on 2 executors. This makes the job run for 19 hours and it is still running. Below is my Spark configuration; Spark 2.3 is the version used.

spark2-submit --class org.mySparkDriver \
    --master yarn-cluster \
    --deploy-mode cluster \
    --driver-memory 8g \
    --num-executors 100 \
    --conf spark.default.parallelism=40 \
    --conf spark.yarn.executor.memoryOverhead=6000mb \
    --conf spark.dynamicAllocation.executorIdleTimeout=6000s \
    --conf spark.executor.cores=3 \
    --conf spark.executor.memory=8G

I tried repartitioning inside the code, which works: it spreads the data across 20 partitions (I used rdd.repartition(20)). But why should I have to repartition? I believed that specifying spark.default.parallelism=40 in the script would make Spark divide the input into 40 partitions and process them on 40 executors.
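For reference, this is roughly the read-and-repartition path I'm describing; a minimal sketch assuming plain RDD text input (the object name and file paths are placeholders, not my real job):

import org.apache.spark.sql.SparkSession

object RepartitionSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("repartition-sketch").getOrCreate()
    val sc = spark.sparkContext

    // sc.textFile accepts a comma-separated list of paths; these paths are placeholders.
    val rdd = sc.textFile("/data/in/file1.txt,/data/in/file2.txt,/data/in/file3.txt")

    // The partition count after a file read comes from the Hadoop input splits,
    // not directly from spark.default.parallelism.
    println(s"partitions after read: ${rdd.getNumPartitions}")

    // Explicit repartition spreads the records over 20 partitions (and therefore 20 tasks).
    val spread = rdd.repartition(20)
    println(s"partitions after repartition: ${spread.getNumPartitions}")

    spark.stop()
  }
}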

Can anyone help?

Thanks, Neethu


1 Answer

I am assuming you're running your jobs on YARN. If yes, you can check the following properties:

yarn.scheduler.maximum-allocation-mb
yarn.nodemanager.resource.memory-mb
yarn.scheduler.maximum-allocation-vcores
yarn.nodemanager.resource.cpu-vcores

In YARN, these properties determine how many containers can be instantiated on a NodeManager, based on the spark.executor.cores and spark.executor.memory values (along with the executor memory overhead).
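To make the container math concrete, here is a minimal sketch of the per-node calculation; the object and method names are made up, and the overhead formula assumes Spark's default of max(384 MB, 10% of executor memory) when spark.yarn.executor.memoryOverhead is not set:

object YarnContainerMath {
  // Rough per-node executor count: YARN fits containers by whichever of the
  // memory and vcore limits runs out first.
  def executorsPerNode(nodeMemoryMb: Long, nodeVcores: Int,
                       executorMemoryMb: Long, executorCores: Int): Int = {
    // Assumed default overhead: max(384 MB, 10% of executor memory).
    val overheadMb  = math.max(384L, (executorMemoryMb * 0.10).toLong)
    val containerMb = executorMemoryMb + overheadMb
    val byMemory    = (nodeMemoryMb / containerMb).toInt
    val byCores     = nodeVcores / executorCores
    math.min(byMemory, byCores)
  }
}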

For example, say a cluster has 10 nodes (RAM: 16 GB, cores: 6) and is set up with the following YARN properties:

yarn.scheduler.maximum-allocation-mb=10240    (10 GB)
yarn.nodemanager.resource.memory-mb=10240     (10 GB)
yarn.scheduler.maximum-allocation-vcores=4
yarn.nodemanager.resource.cpu-vcores=4

Then, with the Spark properties spark.executor.cores=2 and spark.executor.memory=4GB, you can expect 2 executors per node, so in total you'll get 19 executors + 1 container for the driver (the driver's container takes one of the 20 slots).

If the Spark properties are spark.executor.cores=3 and spark.executor.memory=8GB, then you will get only 1 executor per node, i.e. 9 executors + 1 container for the driver.
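Plugging the example values into the sketch above reproduces those counts (assuming roughly 10240 MB and 4 vcores usable per node, and that the driver's container occupies one executor slot):

val small = YarnContainerMath.executorsPerNode(10240, 4, 4096, 2)   // 2 executors per node
val large = YarnContainerMath.executorsPerNode(10240, 4, 8192, 3)   // 1 executor per node

// One container in the cluster goes to the driver, so:
println(s"4 GB / 2 cores: ${10 * small - 1} executors + 1 driver")  // 19 executors
println(s"8 GB / 3 cores: ${10 * large - 1} executors + 1 driver")  // 9 executors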

You can refer to this link for more details.

Hope this helps

