Welcome to ShenZhenJia Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

When I multiply two columns in a Spark SQL table that contains some negative values, the result is "NaN" for rows that have a negative value in one of the columns. Are there any techniques to make the calculation work?

SELECT temperature * days FROM weather_data
question from:https://stackoverflow.com/questions/66059902/dealing-with-negatives-in-calculations-databricks-spark-sql


1 Answer

If a multiplication returns NaN, one or more of the columns probably contains NaN values; a negative operand on its own does not produce NaN. You can use nanvl to substitute a default value (e.g. 0) when the column is NaN. Combine it with coalesce to handle nulls too:

SELECT coalesce(nanvl(temperature, 0), 0) * days FROM weather_data

Example:

weather_data table:
+-----------+----+
|temperature|days|
+-----------+----+
|        NaN|   1|
|     -12.34|   2|
|       null|   3|
|       15.5|   4|
+-----------+----+

spark.sql("SELECT coalesce(nanvl(temperature, 0), 0) * days AS mul FROM weather_data").show()

+------+
|   mul|
+------+
|   0.0|
|-24.68|
|   0.0|
|  62.0|
+------+
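
To see why the negatives are not the culprit, the same arithmetic can be checked without Spark. The following is a minimal plain-Python sketch, not Spark code: the helpers nanvl and coalesce are hypothetical stand-ins written here to mimic the SQL functions (assuming nanvl passes null through unchanged, which is why coalesce is still needed), and the rows are copied from the example table above.

```python
import math

def nanvl(value, default):
    """Hypothetical stand-in for SQL nanvl: default if value is NaN, else value.
    None (SQL null) is passed through unchanged."""
    if value is not None and math.isnan(value):
        return default
    return value

def coalesce(*values):
    """Hypothetical stand-in for SQL coalesce: first non-None argument, or None."""
    for v in values:
        if v is not None:
            return v
    return None

def raw_product(temperature, days):
    """Plain multiplication with SQL-like null propagation."""
    if temperature is None or days is None:
        return None
    return temperature * days

# Same rows as the weather_data example table: (temperature, days)
rows = [(math.nan, 1), (-12.34, 2), (None, 3), (15.5, 4)]

# Without cleaning: NaN and null propagate, the negative row is fine.
raw = [raw_product(t, d) for t, d in rows]

# With cleaning: NaN and null in temperature are replaced by 0.0 first.
cleaned = [coalesce(nanvl(t, 0.0), 0.0) * d for t, d in rows]

print(cleaned)  # [0.0, -24.68, 0.0, 62.0]
```

In the uncleaned list only the first row is NaN and the third is null; the negative row already yields -24.68, which matches the Spark output above.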
