Welcome to ShenZhenJia Knowledge Sharing Community for programmer and developer-Open, Learning and Share
menu search
person
Welcome To Ask or Share your Answers For Others

Categories

I need to read a different file in every map() ,the file is in HDFS

  val rdd=sc.parallelize(1 to 10000)
  val rdd2=rdd.map{x=>
    val hdfs = org.apache.hadoop.fs.FileSystem.get(new java.net.URI("hdfs://ITS-Hadoop10:9000/"), new org.apache.hadoop.conf.Configuration())
    val path=new Path("/user/zhc/"+x+"/")
    val t=hdfs.listStatus(path)
    val in =hdfs.open(t(0).getPath)
    val reader = new BufferedReader(new InputStreamReader(in))
    var l=reader.readLine()
  }
 rdd2.count

My problem is this code

val hdfs = org.apache.hadoop.fs.FileSystem.get(new java.net.URI("hdfs://ITS-Hadoop10:9000/"), new org.apache.hadoop.conf.Configuration())

takes too much running time, every time of map() needs to create a new FileSystem value. Can i put this code outside map() function so it doesn't have to create hdfs every time? Or how can i read files quickly in map()?

My code runs on multiple machines. Thank you!

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
thumb_up_alt 0 like thumb_down_alt 0 dislike
908 views
Welcome To Ask or Share your Answers For Others

1 Answer

In your case, I recommend the use of wholeTextFiles method wich will return pairRdd with the key is the file full path, and the value is the content of the file in string.

val filesPariRDD = sc.wholeTextFiles("hdfs://ITS-Hadoop10:9000/")
val filesLineCount = filesPariRDD.map( x => (x._1, x._2.length ) ) //this will return a map of fileName , number of lines of each file. You could apply any other function on the file contents
filesLineCount.collect() 

Edit

If your files are in directories which are under the same directory ( as mentioned in comments)you could use some kind of regular expression

val filesPariRDD = sc.wholeTextFiles("hdfs://ITS-Hadoop10:9000/*/")

Hope this is clear and helpful


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
thumb_up_alt 0 like thumb_down_alt 0 dislike
Welcome to ShenZhenJia Knowledge Sharing Community for programmer and developer-Open, Learning and Share

548k questions

547k answers

4 comments

86.3k users

...