We distribute our Python app, which uses Spark, together with a Python 3.7 interpreter (`python.exe` with all the necessary libraries sits next to `MyApp.exe`). To set `PYSPARK_PYTHON` we have a function that determines the path to our `python.exe`:
os.environ['PYSPARK_PYTHON'] = get_python()
On Windows, `PYSPARK_PYTHON` becomes `C:/MyApp/python.exe`; on Ubuntu it becomes `/opt/MyApp/python.exe`.
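For context, the helper looks roughly like this (a minimal sketch; the real `get_python` resolves the directory the application was installed into, and the interpreter is named `python.exe` on both platforms in our bundle):

```python
import os
import sys

def get_python():
    """Return the path to the bundled interpreter next to the application.

    Hypothetical sketch: assumes python.exe ships in the same directory
    as MyApp.exe (C:/MyApp on Windows, /opt/MyApp on Ubuntu).
    """
    app_dir = os.path.dirname(os.path.abspath(sys.argv[0]))
    return os.path.join(app_dir, "python.exe")

os.environ['PYSPARK_PYTHON'] = get_python()
```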
We start the master/driver node and create the `SparkSession` on Windows. Then we start a worker node on Ubuntu, but the worker fails with:
Job aborted due to stage failure: Task 1 in stage 11.0 failed 4 times, most recent failure: Lost task 1.3 in stage 11.0 (TID 1614, 10.0.2.15, executor 1): java.io.IOException: Cannot run program "C:/MyApp/python.exe": error=2, No such file or directory
Of course, there is no `C:/MyApp/python.exe` on Ubuntu. If I understand this error correctly, `PYSPARK_PYTHON` from the driver is sent to all workers.
I also tried setting `PYSPARK_PYTHON` in `spark-env.sh` and in `spark-defaults.conf`, with no effect. How can I make `PYSPARK_PYTHON` on the Ubuntu workers resolve to `/opt/MyApp/python.exe`?
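For reference, these are the kinds of entries I tried on the Ubuntu worker (paths taken from the layout described above):

```shell
# spark-env.sh on the Ubuntu worker
export PYSPARK_PYTHON=/opt/MyApp/python.exe

# spark-defaults.conf on the Ubuntu worker
# spark.pyspark.python  /opt/MyApp/python.exe
```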