r - dplyr to calculate the prevalence of a variable under a condition

Question

asked Jan 31, 2022 in Technique[技术] by 深蓝 (71.8m points)

I am new to dplyr, so apologies if this question is basic. I want to count, for each column, the number of entries that are larger than 0.5. Entries below that threshold I treat as zero. A plain vector that stores these counts would be fine.

Here is the example:

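library(dplyr)
library(tidyr)

messy <- data.frame(samples = c("s1", "s2", "s3", "s4"),
                    o1 = c(0.5, 0.7, 0.8, 0.6),
                    o2 = c(0.2, 0.8, 0.8, 0.1),
                    o3 = c(0.9, 0.2, 0.0, 0.1),
                    o4 = c(0.1, 0.6, 0.4, 0.4))
# Reshape to long format: one row per sample/OTU pair
bb <- gather(messy, otu, counts, o1:o4)

# Count the entries above 0.5 per OTU and keep the result
res <- bb %>% filter(counts > 0.5) %>% group_by(otu) %>% summarize(fre = n())
# Prevalence: counts divided by the number of samples
res$fre / nrow(messy)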

Update: I believe the code in the example does what I wanted.

486 views

1 Answer

深蓝 · Answer 1 · 2022-01-31T07:26:47+0000

answered Jan 31, 2022 by 深蓝 (71.8m points)

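You can do colSums(messy[-1] > 0.5); the [-1] drops the character samples column, which would otherwise be compared as text and counted too. This doesn't use dplyr, but it is very simple and efficient.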
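
As a minimal sketch of the colSums idea applied to the data from the question: the names values, counts and prevalence below are only illustrative, and selecting the numeric columns with is.numeric is an assumption about how one might keep the samples column out of the comparison. Dividing by the number of samples is done here with colMeans, since the mean of a logical column is the share of TRUE entries.

```r
# Same example data as in the question.
messy <- data.frame(samples = c("s1", "s2", "s3", "s4"),
                    o1 = c(0.5, 0.7, 0.8, 0.6),
                    o2 = c(0.2, 0.8, 0.8, 0.1),
                    o3 = c(0.9, 0.2, 0.0, 0.1),
                    o4 = c(0.1, 0.6, 0.4, 0.4))

# Keep only the numeric columns (illustrative choice) so that the
# character `samples` column does not take part in the comparison.
values <- messy[sapply(messy, is.numeric)]

# Number of entries strictly greater than 0.5 in each column.
counts <- colSums(values > 0.5)

# Prevalence: the share of samples above the threshold in each column.
prevalence <- colMeans(values > 0.5)

print(counts)
print(prevalence)
```

Both results are named numeric vectors, one element per column. Unlike the filter/group_by version, a column with no entry above 0.5 would still appear here, with a count of 0.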
