Welcome to ShenZhenJia Knowledge Sharing Community for programmer and developer-Open, Learning and Share
menu search
person
Welcome To Ask or Share your Answers For Others

Categories

In my understanding, I thought PCA can be performed only for continuous features. But while trying to understand the difference between onehot encoding and label encoding came through a post in the following link:

When to use One Hot Encoding vs LabelEncoder vs DictVectorizor?

It states that one hot encoding followed by PCA is a very good method, which basically means PCA is applied for categorical features. Hence confused, please suggest me on the same.

question from:https://stackoverflow.com/questions/40795141/pca-for-categorical-features

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
thumb_up_alt 0 like thumb_down_alt 0 dislike
199 views
Welcome To Ask or Share your Answers For Others

1 Answer

I disagree with the others.

While you can use PCA on binary data (e.g. one-hot encoded data) that does not mean it is a good thing, or it will work very well.

PCA is designed for continuous variables. It tries to minimize variance (=squared deviations). The concept of squared deviations breaks down when you have binary variables.

So yes, you can use PCA. And yes, you get an output. It even is a least-squared output: it's not as if PCA would segfault on such data. It works, but it is just much less meaningful than you'd want it to be; and supposedly less meaningful than e.g. frequent pattern mining.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
thumb_up_alt 0 like thumb_down_alt 0 dislike
Welcome to ShenZhenJia Knowledge Sharing Community for programmer and developer-Open, Learning and Share
...