I am using PySpark from Django and connecting to a Spark master node with a SparkSession to execute jobs on the cluster.
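For context, the connection is set up roughly like this (a minimal sketch; the master URL `spark://spark-master:7077` and the app name are placeholders for my actual setup):

```python
from pyspark.sql import SparkSession

# Connect to an existing standalone cluster; the master URL below is a
# placeholder for the real master's address and port.
spark = (
    SparkSession.builder
    .master("spark://spark-master:7077")
    .appName("django-job")
    .getOrCreate()
)

# Trivial job just to confirm the cluster executes work.
print(spark.range(100).count())

spark.stop()
```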
My question is: do I need a full install of Spark on my local machine? All the documentation has me install Spark and then add the PySpark libraries to the Python path. I don't believe I need the entire ~500 MB distribution just to connect to an existing cluster, and I'm trying to keep my Docker containers light.
Thanks for the help.