As a matter of best practice, I'm trying to determine whether it's better to write a function and apply() it across a matrix, or to simply loop over the matrix inside the function. I tried it both ways and was surprised to find that apply() is slower. The task is to take a vector, test whether each element is positive or negative, and return a vector with 1 where the element is positive and -1 where it is negative. The mash() function loops, and the squish() function is passed to apply().
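# 100,000 standard-normal draws as a one-column matrix (despite the name)
million <- as.matrix(rnorm(100000))

# Loop version: overwrite each element in place with 1 or -1
mash <- function(x){
  for(i in 1:NROW(x))
    if(x[i] > 0) {
      x[i] <- 1
    } else {
      x[i] <- -1
    }
  return(x)
}

# apply() version: classify a single value
squish <- function(x){
  if(x > 0) {
    return(1)
  } else {
    return(-1)
  }
}

# Time the loop
ptm <- proc.time()
loop_million <- mash(million)
proc.time() - ptm

# Time apply() over the rows of the matrix
ptm <- proc.time()
apply_million <- apply(million, 1, squish)
proc.time() - ptm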
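For reference, the same comparison can be sketched with system.time(), which wraps the proc.time() bookkeeping in one call. This is only a sketch: the names x, t_loop, t_apply, loop_res and apply_res are made up for the example, set.seed(1) is an arbitrary choice, and mash() and squish() are repeated in compact form so the snippet runs on its own. It also checks that both approaches produce the same values, which the test above does not verify.

```r
set.seed(1)                      # arbitrary seed, only for reproducibility
x <- as.matrix(rnorm(1e5))       # same shape as `million`: 100000 x 1

# Compact copies of the two functions (same logic as in the question)
mash <- function(x) {
  for (i in seq_len(NROW(x))) x[i] <- if (x[i] > 0) 1 else -1
  x
}
squish <- function(x) if (x > 0) 1 else -1

# system.time() evaluates the expression and returns user/system/elapsed times
t_loop  <- system.time(loop_res  <- mash(x))
t_apply <- system.time(apply_res <- apply(x, 1, squish))

# Elapsed seconds for each approach (values vary by machine and R version)
print(c(loop = t_loop[["elapsed"]], apply = t_apply[["elapsed"]]))

# Both approaches should agree element by element
stopifnot(all(as.vector(loop_res) == apply_res))
```

A single run like this is noisy, so the timings are best read as a rough indication rather than a precise measurement.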
Results for loop_million:
   user  system elapsed
  0.468   0.008   0.483
Results for apply_million:
   user  system elapsed
  1.401   0.021   1.423
What is the advantage of using apply() over a for loop if performance is degraded? Is there a flaw in my test? I compared the two resulting objects for a clue and found:
> class(apply_million)
[1] "numeric"
> class(loop_million)
[1] "matrix"
Which only deepens the mystery. The apply() function cannot accept a simple numeric vector, which is why I cast it with as.matrix() at the beginning. But then it returns a numeric. The for loop is fine with a simple numeric vector, and it returns an object of the same class as the one passed to it.
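The class difference can be reproduced on a tiny input. The sketch below uses made-up names (v, m, a, l, err) and a three-element vector purely for illustration. It shows that a bare vector has no dim attribute, which apply() requires; that apply() over the rows of a one-column matrix hands back a plain vector when each call returns a single value; and that element-wise assignment in a loop leaves the dim attribute in place. It uses is.matrix() rather than class(), because from R 4.0.0 onward class() of a matrix reports both "matrix" and "array".

```r
v <- c(-1.5, 0.2, 3)             # plain numeric vector: no dim attribute
m <- as.matrix(v)                # 3 x 1 matrix

print(dim(v))                    # NULL
print(dim(m))                    # 3 1

# apply() needs dim(X) to have positive length, so a bare vector is rejected;
# tryCatch() captures the error message instead of stopping the script
err <- tryCatch(apply(v, 1, function(x) x),
                error = function(e) conditionMessage(e))
print(err)

# One scalar per row: apply() simplifies the result to an ordinary vector
a <- apply(m, 1, function(x) if (x > 0) 1 else -1)
print(is.matrix(a))              # FALSE

# Overwriting elements one at a time keeps the matrix shape intact
l <- m
for (i in seq_len(nrow(l))) l[i] <- if (l[i] > 0) 1 else -1
print(is.matrix(l))              # TRUE

# Same values either way; only the dim attribute differs
print(all(as.vector(l) == a))    # TRUE
```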