Welcome to ShenZhenJia Knowledge Sharing Community for programmer and developer-Open, Learning and Share

I want to cluster my data into, say, 5 clusters, and then select the 50 individuals with the most dissimilar relationships from all the data, in proportion to cluster size. That means if cluster one contains 100 individuals, two contains 200, three contains 400, four contains 200, and five contains 100, I have to select 5 from the first cluster + 10 from the second + 20 from the third + 10 from the fourth + 5 from the fifth.

Data example:

     mydata<-matrix(nrow=100,ncol=10,rnorm(1000, mean = 0, sd = 1))

What I have done so far is cluster the data and rank the individuals within each cluster, then export the result to Excel and go from there … That has become a problem since my data has become really big.

I would appreciate any help or suggestions on how to do this in R.



1 Answer

I'm not sure if this is exactly what you are looking for, but maybe it helps:

mydata<-matrix(nrow=100, ncol=10, rnorm(1000, mean = 0, sd = 1))
rownames(mydata) <- paste0("id", 1:100) # some id for identification


# calculate the dissimilarity (distance) matrix, then cluster into 5 groups
sim <- dist(mydata, diag = TRUE, upper = TRUE)
cl <- cutree(hclust(sim), k = 5)

# combine results, take sum to aggregate dissimilarity
res <- data.frame(id=rownames(mydata),
                  cluster=cl, dis_sim=rowSums(as.matrix(sim)))
# order, lowest overall dissimilarity will be first
res <- res[order(res$dis_sim), ] 


# split object
reslist <- split(res, f=res$cluster)


## takes the three items with the highest overall dissimilarity in each cluster
lapply(reslist, tail, n=3) 

## returns the ids with the highest overall dissimilarity, top 20% of each cluster
lapply(reslist, function(x, p) tail(x, round(nrow(x)*p)), p=0.2)
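
The question asks for a fixed total (50) split across clusters in proportion to their size, rather than a fixed percentage per cluster. Below is a minimal sketch of one way that could be done, reusing the overall-dissimilarity ranking from above. The helper names `proportional_quota` and `select_proportional` are hypothetical, and the largest-remainder rounding is an assumption added here so that the quotas always sum to the requested total. The seed is only for reproducibility.

```r
set.seed(1)  # assumption: fixed seed, only so the example is reproducible
mydata <- matrix(rnorm(1000, mean = 0, sd = 1), nrow = 100, ncol = 10)
rownames(mydata) <- paste0("id", 1:100)

sim <- dist(mydata)
cl  <- cutree(hclust(sim), k = 5)
res <- data.frame(id = rownames(mydata), cluster = cl,
                  dis_sim = rowSums(as.matrix(sim)))

# hypothetical helper: split n_total across groups in proportion to their sizes
proportional_quota <- function(sizes, n_total) {
  stopifnot(n_total <= sum(sizes))
  exact <- sizes * n_total / sum(sizes)
  quota <- floor(exact)
  short <- n_total - sum(quota)
  if (short > 0) {
    # largest-remainder rounding: give the leftover slots to the groups
    # whose exact share was rounded down the most
    bump <- order(exact - quota, decreasing = TRUE)[seq_len(short)]
    quota[bump] <- quota[bump] + 1
  }
  quota
}

# hypothetical helper: per cluster, keep the rows with the highest dis_sim
select_proportional <- function(res, n_total) {
  parts <- split(res, res$cluster)
  quota <- proportional_quota(as.numeric(table(res$cluster)), n_total)
  picked <- Map(function(x, q) {
    head(x[order(x$dis_sim, decreasing = TRUE), ], q)
  }, parts, quota)
  do.call(rbind, picked)
}

selected <- select_proportional(res, n_total = 10)
table(selected$cluster)  # number of individuals chosen per cluster
```

With the cluster sizes from the question, `proportional_quota(c(100, 200, 400, 200, 100), 50)` gives quotas of 5, 10, 20, 10 and 5. For the real data, replace `n_total = 10` with 50. Note that `dist()` stores all pairwise distances, so memory use grows quadratically with the number of rows.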
