In Databricks (SparkR), I run the batch algorithm of the self-organizing map from the kohonen package in parallel, as it gives me considerable reductions in computation time compared to my local machine. However, after fitting the model I would like to download/export the trained model (a list) to my local machine so that I can continue working with the results (create plots etc.) in a way that is not available in Databricks. I know how to save & download a SparkDataFrame to csv:
sdftest # a SparkDataFrame
write.df(sdftest, path = "dbfs:/FileStore/test.csv", source = "csv", mode = "overwrite")
However, I am not sure how to do this for a 'regular' R list object.
Is there any way to save the output created in Databricks to my local machine in .RData format? If not, is there a workaround that would still allow me to continue working with the model results locally?
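What I had in mind is sketched below, but it is untested on my side; I am assuming here that the driver can write to DBFS through the /dbfs/ FUSE path, that som.model is the fitted model from the code under EDIT, and that the file names are just placeholders:

# Idea (untested): write the R list to DBFS from the driver in .RData format
save(som.model, file = "/dbfs/FileStore/som_model.RData")
# or as a single serialized object
saveRDS(som.model, file = "/dbfs/FileStore/som_model.rds")
# then copy the file to my local machine, e.g. with the Databricks CLI:
# databricks fs cp dbfs:/FileStore/som_model.RData .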
EDIT:
library(kohonen)
# Load data
sdf.cluster <- read.df("abfss://cluster.csv", source = "csv", header = "true", inferSchema = "true")
# Collect the SparkDataFrame into a local R data.frame, as kohonen::som is not available for SparkDataFrames
rdf.cluster <- SparkR::collect(sdf.cluster)
# Convert the R data.frame to a matrix, as required by kohonen::som
rdf.som <- as.matrix(rdf.cluster)
# Parallel batch SOM from kohonen (mode = "pbatch" runs the batch algorithm in parallel)
som.grid <- somgrid(xdim = 5, ydim = 5, topo = "hexagonal",
                    neighbourhood.fct = "gaussian")
set.seed(1)
som.model <- som(rdf.som, grid = som.grid, rlen = 10, alpha = c(0.05, 0.01),
                 keep.data = TRUE, dist.fcts = "euclidean", mode = "pbatch")
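Locally, after downloading the file, the plan would be to continue roughly along these lines (the file name is a placeholder; the plot types are from the kohonen documentation):

# On my local machine, outside Databricks
load("som_model.RData")            # restores som.model into the session
plot(som.model, type = "codes")    # codebook vectors
plot(som.model, type = "counts")   # number of observations mapped to each unit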
Any help is very much appreciated!