I have a Parquet dataset with a column named id. Earlier the values were small and the column was written with the int datatype, but later the values got bigger, so the newer files store it as bigint.
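To illustrate, the directory was written over time by different jobs, roughly like this (an illustrative sketch only; the paths and values are made up):

from pyspark.sql.types import IntegerType, LongType, StructField, StructType

# An older job wrote id as a 32-bit int
old = spark.createDataFrame([(1,), (2,)], StructType([StructField("id", IntegerType())]))
old.write.mode("overwrite").parquet("/tmp/ids")

# A newer job appended files where id is a 64-bit bigint
new = spark.createDataFrame([(3000000000,)], StructType([StructField("id", LongType())]))
new.write.mode("append").parquet("/tmp/ids")

So some Parquet files in the directory store id as INT32 and others as INT64.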
I am trying to read the column like this:
from pyspark.sql.functions import *
from pyspark.sql.types import *

df = spark.read.parquet("hadoop file location")
# I also tried cast(id as INT) and got the same error
df = df.selectExpr("cast(id as BIGINT) as id1")
df.show(10, False)
But I get this exception:
Caused by: org.apache.spark.sql.execution.datasources.SchemaColumnConvertNotSupportedException
Caused by: org.apache.spark.sql.execution.QueryExecutionException: Parquet column cannot be converted in file hdfsfilename.snappy.parquet. Column: [id], Expected: int, Found: INT64
The cast doesn't seem to matter; judging by the stack trace, the failure happens while the file is being scanned, before the cast is ever applied. How can I read the correct values of the id column from this data?
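Would declaring the schema up front avoid the conversion error? Something like this sketch (I'm not sure whether the Parquet reader can upcast the old INT32 files to bigint this way):

from pyspark.sql.types import LongType, StructField, StructType

# Declare id as bigint (int64) instead of letting Spark infer the schema from one file
schema = StructType([StructField("id", LongType(), True)])
df = spark.read.schema(schema).parquet("hadoop file location")

Or do the old files have to be rewritten with the new schema?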