Welcome to ShenZhenJia Knowledge Sharing Community for programmer and developer-Open, Learning and Share
menu search
person
Welcome To Ask or Share your Answers For Others

Categories

I've to calculate so called competition index for a couple of the experiments. I have known position of the object and its size. I'd like to calculate the sum of the sizes in a certain radius and the sum of the distances to the objects that are within this radius. The example of the data are here:

set.seed(13181938)
    df <- data.frame(exp = rep(LETTERS[1:20], each = 100), x = rnorm(1000, 100, 50), 
                     y = rnorm(1000, 100, 50), di = rnorm(5, 2, 2))
df$comp1 <- 0
df$dist <- 0

I used a loop for calculations but it takes a lot of time to complete the calculation for 1000 objects. In the real data set I have more than 10000 objects.

fori <- function(x) {
  for (i in 1:nrow(x)){
    for (j in 1:nrow(x)){
      dist = sqrt((x$x[j] - x$x[i])^2 + (x$y[j] - x$y[i])^2)
        #print(paste(x$exp[i], x$exp[j], dist))
        if(dist < 2 & x$exp[i] == x$exp[j]){
        x$comp1[i] = x$comp1[i] + x$di[j]
        x$dist[i] = x$dist[i] + dist
      }
    }
  }
  df <- data.frame(x)
  return(df)
}

abc <- fori(df)

It takes a very long time to run this loop for the example and it means that it will take much more for the entire data set. Can you suggest any other way? I tried apply and DT but without success.

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
thumb_up_alt 0 like thumb_down_alt 0 dislike
1.3k views
Welcome To Ask or Share your Answers For Others

1 Answer

Loops like this are a perfect candidate for speeding up with Rcpp. The logic translates across unchanged:

library(Rcpp)

cppFunction('
List
computeIndex(const NumericVector x,
             const NumericVector y, 
             const NumericVector di,
             const CharacterVector ex)
{
    int n = x.size();
    NumericVector comp1(n), dist(n);

    for(int i = 0; i < n; ++i)
    {
        for(int j = 0; j < n; ++j)
        {
            double dx = x[j] - x[i], dy = y[j] - y[i];
            double d = std::sqrt(dx*dx + dy*dy);

            if((d < 2) && (ex[i] == ex[j]))
            {
                comp1[i] += di[j];
                dist[i] +=  d;
            }
        }
    }

    return List::create(Named("comp1") = comp1,
                        Named("dist") = dist);
}
')

res <- data.frame(computeIndex(df$x, df$y, df$di, df$exp))

Not only is this faster than the equivalent R-only code, but it avoids having to allocate any O(N^2) objects. You can also combine this with dplyr to avoid needless comparisons between rows with different exp values:

df %>%
    group_by(exp) %>%
    do({
        res <- computeIndex(.$x, .$y, .$di, .$exp)
        data.frame(., res)
    })

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
thumb_up_alt 0 like thumb_down_alt 0 dislike
Welcome to ShenZhenJia Knowledge Sharing Community for programmer and developer-Open, Learning and Share
...