I understand the basic theory of textFile
generating partition for each file, while wholeTextFiles
generates an RDD of pair values, where the key is the path of each file, the value is the content of each file.
Now, from a technical point of view, what's the difference between :
val textFile = sc.textFile("my/path/*.csv", 8)
textFile.getNumPartitions
and
val textFile = sc.wholeTextFiles("my/path/*.csv",8)
textFile.getNumPartitions
In both methods I'm generating 8 partitions. So why should I use wholeTextFiles
in the first place, and what's its benefit over textFile
?