Welcome to ShenZhenJia Knowledge Sharing Community for programmer and developer-Open, Learning and Share

I want to expand each range string in the input DataFrame below into an array of values, using Scala.

Input
scala> val r_df = Seq((1,"1 to 6"),(2,"44/1 to 3")).toDF("id","range")
r_df: org.apache.spark.sql.DataFrame = [id: int, range: string]


scala> r_df.show
+---+---------+
| id|    range|
+---+---------+
|  1|   1 to 6|
|  2|44/1 to 3|
+---+---------+

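For-loop UDF:

import scala.collection.mutable.ArrayBuffer
import org.apache.spark.sql.functions.{col, udf}

// Builds the inclusive sequence data1..data2; both arguments must be Ints.
val survey_to1 = udf((data1: Int, data2: Int) => {
  val arr = new ArrayBuffer[Int]()
  for (i <- data1 to data2) {
    arr += i
  }
  arr
})

// r_df4 is presumably r_df with "range" split into new1, new2 and new3 (see the output below).
r_df4.withColumn("new", survey_to1(col("new1"),col("new3"))).show(false)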

When I apply the for-loop UDF above to the DataFrame, only the "1 to 6" row is expanded; the "44/1 to 3" row comes back as null:

+---+---------+----+----+----+------------------+
|id |range    |new1|new2|new3|new               |
+---+---------+----+----+----+------------------+
|1  |1 to 6   |1   |to  |6   |[1, 2, 3, 4, 5, 6]|
|2  |44/1 to 3|44/1|to  |3   |null              |
+---+---------+----+----+----+------------------+

Expected output

+---+---------+----+----+----+------------------+
|id |range    |new1|new2|new3|new               |
+---+---------+----+----+----+------------------+
|1  |1 to 6   |1   |to  |6   |[1, 2, 3, 4, 5, 6]|
|2  |44/1 to 3|44/1|to  |3   |[44/1, 44/2, 44/3]|
+---+---------+----+----+----+------------------+
question from:https://stackoverflow.com/questions/65933769/range-pattern-using-for-loop


1 Answer

For those specific string patterns, a regex-based UDF works:

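import org.apache.spark.sql.functions.udf

// Group 1: start of the range, either a single digit ("1") or prefix/digit ("44/1").
// Group 2: end of the range (a single digit).
val patern = "([0-9.]{2}/[0-9.]{1}|[0-9.]{1}) to ([0-9.]{1})".r

val createArray = udf { str: String =>
  // Throws scala.MatchError if str does not match the pattern exactly.
  val patern(from, _to) = str
  val parts = from.split("/")
  (parts.last.toInt to _to.toInt).toArray.map { el =>
    // Re-attach the prefix ("44/") when the start value had one.
    if (parts.length > 1) s"${parts(0)}/$el" else el.toString
  }
}

// In spark-shell, spark.implicits._ (needed for toDF and $) is already imported.
val r_df = Seq((1,"1 to 6"),(2,"44/1 to 3")).toDF("id","range")
r_df.withColumn("array", createArray($"range")).show(false)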

gives:

+---+---------+------------------+
|id |range    |array             |
+---+---------+------------------+
|1  |1 to 6   |[1, 2, 3, 4, 5, 6]|
|2  |44/1 to 3|[44/1, 44/2, 44/3]|
+---+---------+------------------+
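
The pattern above accepts only single-digit bounds, and the extractor throws a MatchError on any other input. Below is a minimal sketch of a more tolerant variant in plain Scala, so it can be tried without a Spark session. The names RangeExpander, RangePattern and expandRange are hypothetical and not part of the original answer. Allowing multi-digit numbers and returning an empty result for unmatched input are assumptions about what is wanted.

```scala
// Hypothetical helper for illustration only.
object RangeExpander {
  // Optional "prefix/" (group 1), start (group 2), end (group 3); multi-digit numbers allowed.
  private val RangePattern = """(?:(\d+)/)?(\d+) to (\d+)""".r

  def expandRange(s: String): Seq[String] = s match {
    case RangePattern(prefix, start, end) =>
      (start.toInt to end.toInt).map { n =>
        // prefix is null when the optional group did not take part in the match
        Option(prefix).fold(n.toString)(p => s"$p/$n")
      }
    case _ => Seq.empty // assumption: unmatched input yields an empty result
  }
}
```

Wrapping it for a DataFrame column should then be a one-liner along the lines of udf((s: String) => RangeExpander.expandRange(s)), which keeps the parsing logic testable on its own.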

To also support strings in the format "3a to 5a", update the regex to:

val patern = "([0-9.]{2}/[0-9.]{1}|[0-9.]{1})[a-zA-Z0-9_]* to ([0-9.]{1})[a-zA-Z0-9_]*".r

For example:

+---+---------+------------------+
|id |range    |array             |
+---+---------+------------------+
|1  |1 to 6   |[1, 2, 3, 4, 5, 6]|
|2  |44/1 to 3|[44/1, 44/2, 44/3]|
|3  |3a to 5a |[3, 4, 5]         |
+---+---------+------------------+
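
To see what the extended regex actually captures, it can be exercised in plain Scala without Spark. This is a small sketch: the object name ExtendedPatternDemo and the helper bounds are hypothetical, and the regex is copied from the answer. Note that the suffix is matched but not captured, which is why "3a to 5a" expands to [3, 4, 5]. Because the suffix class also contains digits, a multi-digit bound such as "12" is cut down to its first digit.

```scala
// Hypothetical demo object for illustration only.
object ExtendedPatternDemo {
  // Regex copied from the answer: trailing letters, digits or underscores are allowed but not captured.
  val extended = "([0-9.]{2}/[0-9.]{1}|[0-9.]{1})[a-zA-Z0-9_]* to ([0-9.]{1})[a-zA-Z0-9_]*".r

  // Returns the captured (start, end) pair, or None when the string does not match.
  def bounds(s: String): Option[(String, String)] = s match {
    case extended(from, to) => Some((from, to))
    case _                  => None
  }
}

// ExtendedPatternDemo.bounds("3a to 5a")   returns Some(("3", "5"))
// ExtendedPatternDemo.bounds("12 to 15")   returns Some(("1", "1")), the extra digits are lost
```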
