I've been searching for a while to find out whether there is any way to use a Scala class in PySpark, and I haven't found any documentation or guide on the subject. Let's say I create a simple class in Scala that uses some Apache Spark libraries, something like:
import org.apache.spark.sql.{DataFrame, SQLContext}
import org.apache.spark.sql.functions.col

class SimpleClass(sqlContext: SQLContext, df: DataFrame, column: String) {
  // Select a single column from the DataFrame passed to the constructor
  def exe(): DataFrame = {
    import sqlContext.implicits._
    df.select(col(column))
  }
}
- Is there any possible way to use this class in PySpark?
- Is it too tough?
- Do I have to create a .py file?
- Is there any guide that shows how to do that?
By the way, I also looked at the Spark source code, and I felt a bit lost; I was unable to replicate its functionality for my own purposes.
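For what it's worth, this is roughly the kind of bridging I've been attempting from the Python side. It's only a rough sketch: the JAR name is a placeholder, I'm assuming SimpleClass lives in the default package, and I'm poking at PySpark/Py4J internals (_jvm, _jdf, _ssql_ctx) without knowing whether that's the intended approach.

from pyspark.sql import DataFrame, SparkSession

# simple-class.jar is a placeholder for wherever the compiled Scala class ends up
spark = (SparkSession.builder
         .appName("scala-class-from-pyspark")
         .config("spark.jars", "simple-class.jar")
         .getOrCreate())

df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "letter"])

# Reach into the JVM through the Py4J gateway (PySpark internals,
# so I'm not sure this is the right way to do it)
jvm = spark.sparkContext._jvm
java_sql_ctx = df.sql_ctx._ssql_ctx   # Java-side SQLContext behind the Python one
simple = jvm.SimpleClass(java_sql_ctx, df._jdf, "letter")

# Wrap the Java DataFrame returned by exe() back into a Python DataFrame
result = DataFrame(simple.exe(), df.sql_ctx)
result.show()

If something like this is indeed the way to go, I'd still like to know whether there is a documented or cleaner route.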