Welcome to ShenZhenJia Knowledge Sharing Community for programmer and developer-Open, Learning and Share

I have a Parquet dataset with a column id. Earlier the values were small, so the column was written with datatype int, but later the values grew larger and the newer files store it as bigint.

I am trying to read the column as follows:

from pyspark.sql.functions import *
from pyspark.sql.types import *

df = spark.read.parquet("hadoop file location")
df = df.selectExpr("cast(id as BIGINT) as id1")  # also tried cast(id as INT); same error

df.show(10, False)

I get this exception:

Caused by: org.apache.spark.sql.execution.datasources.SchemaColumnConvertNotSupportedException

Caused by: org.apache.spark.sql.execution.QueryExecutionException: Parquet column cannot be converted in file hdfsfilename.snappy.parquet. Column: [id], Expected: int, Found: INT64

How can I read the correct values of column id from this data?
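
For reference, a minimal sketch of a commonly suggested workaround (not an answer confirmed in this thread): this exception is raised by Spark's vectorized Parquet reader when a file's physical type (here INT64) differs from the type Spark expects for the column (int). Falling back to the row-based reader and declaring the column as bigint up front is the usual approach; the config key and behavior below are assumptions about stock Spark, so verify against your version.

from pyspark.sql.types import StructType, StructField, LongType

# Assumption: the non-vectorized (parquet-mr) reader avoids the strict
# per-file type check that raises SchemaColumnConvertNotSupportedException.
spark.conf.set("spark.sql.parquet.enableVectorizedReader", "false")

# Declare id as bigint so both the older INT32 files and the newer INT64
# files fit the column type. "hadoop file location" is the placeholder
# path from the question above.
schema = StructType([StructField("id", LongType(), True)])
df = spark.read.schema(schema).parquet("hadoop file location")
df.show(10, False)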



1 Answer

Waiting for an expert to reply.
