The Cloudera folks gave a good explanation of this:
https://www.youtube.com/watch?v=vfiJQ7wg81Y
If, let's say, you have 16 cores on your node (I think this is exactly your case), then you give 1 to YARN to manage the node, and divide the remaining 15 by 3, so each executor gets 5 cores.
Also, each executor JVM has an overhead of max(384M, 0.07 * spark.executor.memory).
So, if you have 3 executors per node, you pay 3 * max(384M, 0.07 * spark.executor.memory) in JVM overhead; the rest can be used for memory containers.
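To make the arithmetic concrete, here is a small sketch of the per-node memory split. The 64 GB node size, the 3 executors per node and the 1 GB reserved for YARN are assumptions for illustration, not values from the question:

```python
def executor_memory_gb(node_mem_gb, executors_per_node, yarn_reserved_gb=1.0):
    """Rough per-executor heap size after subtracting the JVM overhead
    max(384 MB, 0.07 * executor memory) from each executor's share.

    Solves: heap + max(0.384, 0.07 * heap) = per-executor share (in GB).
    """
    usable = node_mem_gb - yarn_reserved_gb
    share = usable / executors_per_node
    # First try the 7%-overhead branch: heap * 1.07 = share
    heap = share / 1.07
    if 0.07 * heap < 0.384:
        # Overhead floor of 384 MB applies instead
        heap = share - 0.384
    return heap

# Hypothetical 64 GB node, 3 executors: (64 - 1) / 3 = 21 GB per executor,
# minus ~7% overhead leaves roughly 19.6 GB of heap each.
print(round(executor_memory_gb(64, 3), 2))
```

This is why the heap you request should be noticeably below (node memory / executors per node): the overhead comes out of the same container budget.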
However, on a cluster with many users working simultaneously, YARN can evict some of your Spark session's containers, forcing Spark to walk back through the DAG and recompute the lost RDDs up to their present state, which is expensive. That is why you should set --num-executors, --executor-memory and --executor-cores slightly lower, to leave some headroom for other users in advance. But this doesn't apply on AWS, where you are the only user.
--executor-memory 18G should work for you, by the way.
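Putting the flags together, the submit command would look roughly like this. The master, deploy mode, total executor count and application jar are placeholders, not values from the question:

```shell
# Sketch only: 5 cores and 18G heap per executor, as discussed above.
# --num-executors is the TOTAL across the cluster (here assumed 3 nodes * 3 = 9).
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --num-executors 9 \
  --executor-cores 5 \
  --executor-memory 18G \
  your-app.jar
```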
More details on tuning your cluster parameters:
http://blog.cloudera.com/blog/2015/03/how-to-tune-your-apache-spark-jobs-part-2/