I am not able to convert string to date format in spark.sql. When I am passing the raw string , it is converted successfully but when I am trying to store that value in a variable and passing that argument , getting type mismatch error. Tried many different techniques but still getting the same error. Can someone please help me with this thing :
>>> s
'2020-10-23'
>>> type(s)
<type 'str'>
>>> spark.sql("""select cast('2020-10-23' as date)""").show()
+------------------------+
|CAST(2020-10-23 AS DATE)|
+------------------------+
| 2020-10-23|
+------------------------+
>>> spark.sql("""select cast("""+s+""" as date)""").show()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/opt/cloudera/parcels/CDH-6.2.1-1.cdh6.2.1.p4355.6905851/lib/spark/python/pyspark/sql/session.py", line 778, in sql
return DataFrame(self._jsparkSession.sql(sqlQuery), self._wrapped)
File "/opt/cloudera/parcels/CDH-6.2.1-1.cdh6.2.1.p4355.6905851/lib/spark/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1257, in __call__
File "/opt/cloudera/parcels/CDH-6.2.1-1.cdh6.2.1.p4355.6905851/lib/spark/python/pyspark/sql/utils.py", line 69, in deco
raise AnalysisException(s.split(': ', 1)[1], stackTrace)
pyspark.sql.utils.AnalysisException: u"cannot resolve 'CAST(((2020 - 10) - 23) AS DATE)' due to data type mismatch: cannot cast int to date; line 1 pos 7;
'Project [unresolvedalias(cast(((2020 - 10) - 23) as date), None)]
+- OneRowRelation
"
>>> spark.sql("""select cast("""+str(s)+""" as date)""").show()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/opt/cloudera/parcels/CDH-6.2.1-1.cdh6.2.1.p4355.6905851/lib/spark/python/pyspark/sql/session.py", line 778, in sql
return DataFrame(self._jsparkSession.sql(sqlQuery), self._wrapped)
File "/opt/cloudera/parcels/CDH-6.2.1-1.cdh6.2.1.p4355.6905851/lib/spark/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1257, in __call__
File "/opt/cloudera/parcels/CDH-6.2.1-1.cdh6.2.1.p4355.6905851/lib/spark/python/pyspark/sql/utils.py", line 69, in deco
raise AnalysisException(s.split(': ', 1)[1], stackTrace)
pyspark.sql.utils.AnalysisException: u"cannot resolve 'CAST(((2020 - 10) - 23) AS DATE)' due to data type mismatch: cannot cast int to date; line 1 pos 7;
'Project [unresolvedalias(cast(((2020 - 10) - 23) as date), None)]
+- OneRowRelation
"
>>>
>>> s
'2020-10-23'
>>> type(s)
<type 'str'>
>>>
>>> spark.sql("""select cast(date_format("""+s+""",'yyyy-MM-dd') as date)""").show()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/opt/cloudera/parcels/CDH-6.2.1-1.cdh6.2.1.p4355.6905851/lib/spark/python/pyspark/sql/session.py", line 778, in sql
return DataFrame(self._jsparkSession.sql(sqlQuery), self._wrapped)
File "/opt/cloudera/parcels/CDH-6.2.1-1.cdh6.2.1.p4355.6905851/lib/spark/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1257, in __call__
File "/opt/cloudera/parcels/CDH-6.2.1-1.cdh6.2.1.p4355.6905851/lib/spark/python/pyspark/sql/utils.py", line 69, in deco
raise AnalysisException(s.split(': ', 1)[1], stackTrace)
pyspark.sql.utils.AnalysisException: u"cannot resolve 'date_format(((2020 - 10) - 23), 'yyyy-MM-dd')' due to data type mismatch: argument 1 requires timestamp type, however, '((2020 - 10) - 23)' is of int type.; line 1 pos 12;
'Project [unresolvedalias(cast(date_format(((2020 - 10) - 23), yyyy-MM-dd, Some(America/New_York)) as date), None)]
+- OneRowRelation
"
>>>
question from:https://stackoverflow.com/questions/65867764/not-able-convert-string-to-date-in-spark-sql-getting-error