Welcome to ShenZhenJia Knowledge Sharing Community for programmer and developer-Open, Learning and Share

I have a DataFrame that was built with the `transform()` method of the `VectorAssembler` class. I also have a trained k-means model that returns the i-th center point via `clusterCenters(i)`. Each center has the same dimension as a row of the DataFrame (once the row is converted to a vector), and there are twice as many centers as there are rows in the DataFrame.

Now I want to compute the cosine similarity between each row of the DataFrame and each center vector, and append each result to a list. Here is my code:
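For reference, the quantity each iteration of the loop computes is the standard cosine similarity between the row vector $x_i$ and the center $c_j$ (the symbols $x_i$ and $c_j$ are introduced here only for illustration; $\lVert\cdot\rVert_2$ is the L2 norm that `Vectors.norm(v, 2)` returns):

$$\cos(x_i, c_j) = \frac{x_i \cdot c_j}{\lVert x_i \rVert_2 \, \lVert c_j \rVert_2} = \frac{x_i}{\lVert x_i \rVert_2} \cdot \frac{c_j}{\lVert c_j \rVert_2}$$

The second form shows that each row and each center only needs to be normalized once; every cosine is then a plain dot product of two unit vectors.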

```scala
import scala.collection.mutable.ListBuffer
import org.apache.spark.ml.linalg.Vector
import org.apache.spark.mllib.linalg.Vectors
import breeze.linalg.DenseVector // assumed: DenseVector here is Breeze's

// df, model and k are defined elsewhere; the first entry only shows the (name, cosine) structure
val cosine_list = ListBuffer(("sample_string", 0.0))
for (i <- 0 until k) { // k: number of rows in df
  // feature vector of the i-th row (note: collect() runs once per outer iteration)
  val cen0 = df.select("features").collect()(i).getAs[Vector](0)
  val cen0_new = Vectors.fromML(cen0)
  for (j <- 0 until 2 * k) { // number of centers is 2 * number of rows in df
    val cen1 = model.clusterCenters(j) // j-th cluster center
    val cen1_new = Vectors.fromML(cen1)
    val sqr_cen0 = Vectors.norm(cen0_new, 2) // L2 norm of the row
    val sqr_cen1 = Vectors.norm(cen1_new, 2) // L2 norm of the center
    val dot1 = DenseVector(cen0_new.toArray).dot(DenseVector(cen1_new.toArray))
    val cos = dot1 / (sqr_cen0 * sqr_cen1)
    val map_name = s"${i}_${j}"
    cosine_list.append((map_name, cos))
  }
}
```

The code above works correctly, but it takes a long time (depending, of course, on the size of the data). Can this snippet be made more efficient, by using a different API or otherwise? Thanks in advance!
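A minimal sketch of one possible speed-up, assuming the feature vectors fit on the driver (the original loop already assumes this, since it calls `collect()`). It collects the features once rather than once per row, normalizes every row and every center exactly once, and then computes each cosine as a plain dot product, avoiding the repeated `fromML` conversions and norm computations in the inner loop. The names `CosineSketch`, `normalize`, `dot` and `cosineAll` are hypothetical, and the code works on plain `Array[Double]` so it runs without Spark. The inputs could be obtained with `df.select("features").collect().map(_.getAs[org.apache.spark.ml.linalg.Vector](0).toArray)` and `model.clusterCenters.map(_.toArray)`.

```scala
import scala.collection.mutable.ListBuffer

// Hypothetical helper object; not part of Spark or of the original code.
object CosineSketch {
  // Scale v to unit L2 norm. A zero vector is returned unchanged, so its
  // cosines come out as 0.0 here (the original loop would produce NaN).
  def normalize(v: Array[Double]): Array[Double] = {
    val n = math.sqrt(v.map(x => x * x).sum)
    if (n == 0.0) v else v.map(_ / n)
  }

  // Dot product of two equal-length arrays, using a plain while loop.
  def dot(a: Array[Double], b: Array[Double]): Double = {
    var s = 0.0
    var idx = 0
    while (idx < a.length) {
      s += a(idx) * b(idx)
      idx += 1
    }
    s
  }

  // rows: feature vectors collected once; centers: the cluster centers.
  // Returns ("i_j", cosine) pairs in the same order as the original nested loop.
  def cosineAll(rows: Array[Array[Double]], centers: Array[Array[Double]]): List[(String, Double)] = {
    val unitRows = rows.map(normalize)       // each row normalized exactly once
    val unitCenters = centers.map(normalize) // each center normalized exactly once
    val out = ListBuffer.empty[(String, Double)]
    for (i <- unitRows.indices; j <- unitCenters.indices)
      out += ((s"${i}_${j}", dot(unitRows(i), unitCenters(j))))
    out.toList
  }
}
```

With two rows and four centers, `cosineAll` returns 8 pairs named `0_0` through `1_3`. For data too large to collect, the same normalize-once idea would have to run on the executors instead (for example by broadcasting the normalized centers); that variant is not shown here.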

Question from: https://stackoverflow.com/questions/65896652/improvement-of-scala-spark-code-snippet-for-calculating-cosine-similarity-in-ter

...