Welcome to ShenZhenJia Knowledge Sharing Community for programmer and developer-Open, Learning and Share


Given a large data frame, e.g. df, with 500 columns and 100 rows, how do I subset only the columns that exceed a specific threshold, e.g. 1?

Thanks



1 Answer

Assuming you mean subsetting the columns whose values are all greater than 1, you could do something like the following. The same approach works for any condition you might have; just change the if condition.
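To illustrate swapping the condition, here is a minimal sketch using a small hypothetical data frame and a hypothetical helper, subset_cols (neither comes from the original answer), with the question's threshold of 1. It keeps columns by a per-column TRUE/FALSE test rather than by cbind-ing the survivors.

```r
# Hypothetical helper (not from the original answer): keep the columns of
# `df` for which the predicate `pred` returns TRUE.
subset_cols <- function(df, pred) {
  keep <- vapply(df, pred, logical(1))  # one TRUE/FALSE per column
  df[, keep, drop = FALSE]              # drop = FALSE keeps a data frame
}

# Small deterministic example data; threshold of 1 as in the question.
df <- data.frame(x = c(2, 3, 4), y = c(0.5, 2, 3), z = c(0.1, 0.2, 0.3))
thr <- 1

all_gt  <- subset_cols(df, function(col) all(col > thr))   # every value > 1: x
any_gt  <- subset_cols(df, function(col) any(col > thr))   # at least one value > 1: x, y
mean_gt <- subset_cols(df, function(col) mean(col) > thr)  # column mean > 1: x, y
```

Only the predicate changes between the three calls, which is the "just change the if condition" idea expressed as a function argument.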

Example Data

# 30 rows x 3 columns of uniform random numbers between 0 and 1
a <- data.frame(matrix(runif(90), ncol = 3))

> a
           X1         X2         X3
1  0.33341130 0.09307143 0.51932506
2  0.78014395 0.30378432 0.67309736
3  0.19967771 0.30829771 0.60144888
4  0.77736355 0.42504910 0.23880491
5  0.60631868 0.55198423 0.29565519
6  0.24246456 0.57945721 0.17882712
7  0.10499677 0.48768998 0.54931955
8  0.92288335 0.29290491 0.72885160
9  0.85246128 0.87564673 0.60069170
10 0.39931205 0.29895856 0.83249469
11 0.33674259 0.85618041 0.62940935
12 0.27816980 0.51508938 0.76079354
13 0.19121182 0.27586235 0.21273823
14 0.66337625 0.18631150 0.67762964
15 0.00923405 0.84753915 0.08386400
16 0.33209371 0.54919903 0.49128825
17 0.97685675 0.25564765 0.56439142
18 0.26710042 0.75852884 0.88706946
19 0.32422355 0.58971620 0.84070049
20 0.73000898 0.09068726 0.92541277
21 0.80547283 0.93723241 0.31050230
22 0.28897215 0.80679092 0.06080124
23 0.32190269 0.12254342 0.42506740
24 0.52569405 0.68506407 0.68302356
25 0.31098388 0.66225007 0.08565480
26 0.67546897 0.08123716 0.58419470
27 0.29501987 0.17836528 0.79322116
28 0.20736102 0.81145297 0.44078101
29 0.75165829 0.51865202 0.36653840
30 0.63375066 0.03804626 0.69949846

Solution

A single lapply is enough. I use 0.05 as the threshold here because every value in my random data set lies between 0 and 1, so a threshold of 1 would drop every column and demonstrate nothing. Change it to whatever you need for your dataset.

# Columns that fail the test return NULL, which cbind() simply drops
b <- do.call(cbind, lapply(a, function(x) if (all(x > 0.05)) return(x)))

Output

> b
              X3
 [1,] 0.51932506
 [2,] 0.67309736
 [3,] 0.60144888
 [4,] 0.23880491
 [5,] 0.29565519
 [6,] 0.17882712
 [7,] 0.54931955
 [8,] 0.72885160
 [9,] 0.60069170
[10,] 0.83249469
[11,] 0.62940935
[12,] 0.76079354
[13,] 0.21273823
[14,] 0.67762964
[15,] 0.08386400
[16,] 0.49128825
[17,] 0.56439142
[18,] 0.88706946
[19,] 0.84070049
[20,] 0.92541277
[21,] 0.31050230
[22,] 0.06080124
[23,] 0.42506740
[24,] 0.68302356
[25,] 0.08565480
[26,] 0.58419470
[27,] 0.79322116
[28,] 0.44078101
[29,] 0.36653840
[30,] 0.69949846

Only column X3 met the condition on this occasion, so it was the only one returned. Note that the result is a matrix (hence the [1,] row labels), not a data frame.
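If you would rather keep a data frame, one alternative (not from the original answer) is to compute a logical column index in a single vectorized step and subset with drop = FALSE. This sketch assumes every column is numeric and contains no NAs; the set.seed call is an addition only to make the example reproducible, so its output will not match the table above.

```r
set.seed(1)  # added for reproducibility; not in the original answer
a <- data.frame(matrix(runif(90), ncol = 3))
thr <- 0.05

# a > thr gives a logical matrix; colSums counts the TRUEs per column, so a
# column passes only if all nrow(a) of its values exceed thr.
keep <- colSums(a > thr) == nrow(a)
b <- a[, keep, drop = FALSE]  # still a data frame, column names kept

# When no column qualifies this gives a 0-column data frame with all rows,
# whereas do.call(cbind, ...) over all-NULL results gives NULL.
none <- a[, colSums(a > 2) == nrow(a), drop = FALSE]
```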

