I am trying to split a column of total counts into several range-based count columns using PySpark. I know how to do this in SQL, but I am not clear on how to do it with PySpark. I would be glad if anyone could enlighten me on this.
I want to sort the matches column into 3 different bins, where:
matches = 0
matches >= 1 and <= 3
matches >= 1 and <= 5
Sample DataFrame:
+-----+-------+
|names|matches|
+-----+-------+
|  Sam|      1|
|  Tom|      3|
|  Max|      5|
|  Kai|      7|
+-----+-------+
Expected DataFrame Outcome:
+---------+---------+-----+
|0 matches|lessthan3|upto5|
+---------+---------+-----+
|        0|        1|    3|
+---------+---------+-----+
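A minimal sketch of the kind of thing I am imagining, using SQL-style conditional aggregation (sum over when flags) in the DataFrame API. This is untested; the column names come from the sample above and the bin boundaries are guessed from the expected output, so the middle bin uses a strict upper bound of 3:

```python
from pyspark.sql import SparkSession
import pyspark.sql.functions as F

spark = SparkSession.builder.getOrCreate()

# Sample data from the question
df = spark.createDataFrame(
    [("Sam", 1), ("Tom", 3), ("Max", 5), ("Kai", 7)],
    ["names", "matches"],
)

# Conditional aggregation: each output column sums a 1/0 flag for its bin.
result = df.agg(
    F.sum(F.when(F.col("matches") == 0, 1).otherwise(0)).alias("0 matches"),
    F.sum(F.when((F.col("matches") >= 1) & (F.col("matches") < 3), 1).otherwise(0)).alias("lessthan3"),
    F.sum(F.when((F.col("matches") >= 1) & (F.col("matches") <= 5), 1).otherwise(0)).alias("upto5"),
)

result.show()
# +---------+---------+-----+
# |0 matches|lessthan3|upto5|
# +---------+---------+-----+
# |        0|        1|    3|
# +---------+---------+-----+
```

Is something along these lines the idiomatic way to do it, or is there a better approach (e.g. expressing the same CASE WHEN logic through spark.sql)?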