Welcome to ShenZhenJia Knowledge Sharing Community for programmer and developer-Open, Learning and Share
As I am sitting here waiting for some R scripts to run...I was wondering... is there any way to parallelize rbind in R?

I frequently find myself waiting for this call to complete, as I deal with large amounts of data:

do.call("rbind", LIST)

1 Answer

I haven't found a way to do this in parallel either thus far. However, for my dataset (a list of about 1,500 data frames totaling 4.5 million rows), the following snippet seemed to help:

while (length(lst) > 1) {
    # indices of the first element of each adjacent pair
    idxlst <- seq(from = 1, to = length(lst), by = 2)

    lst <- lapply(idxlst, function(i) {
        # odd number of elements: carry the last one forward unchanged
        if (i == length(lst)) return(lst[[i]])

        # merge each adjacent pair of data frames
        rbind(lst[[i]], lst[[i + 1]])
    })
}

where lst is the list of data frames. It seemed to be about four times faster than do.call(rbind, lst), or even do.call(rbind.fill, lst) (with rbind.fill from the plyr package). Each iteration of the loop halves the number of data frames by merging adjacent pairs.
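Since every pairwise rbind within a round is independent of the others, the inner lapply can in principle be swapped for parallel::mclapply to get the parallelism the question asks about. Below is a minimal sketch under that assumption; pairwise_rbind is a hypothetical wrapper name, and mclapply only actually forks on Unix-alike systems (on Windows, mc.cores must be 1, which degrades to serial execution).

```r
library(parallel)

# Hypothetical wrapper: combine a list of data frames by repeated pairwise
# rbind, running each round's independent merges across cores with mclapply.
pairwise_rbind <- function(lst, cores = 2L) {
    while (length(lst) > 1) {
        # indices of the first element of each adjacent pair
        idxlst <- seq(from = 1, to = length(lst), by = 2)

        lst <- mclapply(idxlst, function(i) {
            # odd number of elements: carry the last one forward unchanged
            if (i == length(lst)) return(lst[[i]])
            rbind(lst[[i]], lst[[i + 1]])
        }, mc.cores = cores)
    }
    lst[[1]]
}

# Usage: combine 8 small data frames of 3 rows each into one of 24 rows
parts <- lapply(1:8, function(k) data.frame(id = k, x = rnorm(3)))
combined <- pairwise_rbind(parts)
nrow(combined)
```

Whether this wins in practice depends on how large the pieces are: forking has overhead, and the final few rounds merge only a handful of large frames, so the speedup is limited by the last (serial-ish) merges.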

