Welcome to ShenZhenJia Knowledge Sharing Community for programmer and developer-Open, Learning and Share

I have a set of data that looks something like this:

Feature X1  Feature X2  Feature X3  Count (Y=0)  Count (Y=1)
A           27.5        0.0125      500          0
B           67.5        0.175       4000         30
A           32.5        0.325       1000         120
C           42.5        0.175       600          20
...

(i.e., for each combination of features X1, X2, and X3, I have the number of observations with output Y = 0 and the number with Y = 1)

I would like to build a logistic regression or random forest model on this data set with sklearn to predict the output Y.
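For logistic regression specifically, the counts carry everything the model needs. Writing $x_i$ for the feature vector of aggregated row $i$ (X1 one-hot encoded, plus a constant 1 for the intercept), $n_{i0}$ and $n_{i1}$ for its two counts, $\beta$ for the coefficients, and $\sigma$ for the logistic function (notation introduced here for illustration), the log-likelihood of the fully expanded data groups into one term per aggregated row:

$$
\ell(\beta) = \sum_i \Bigl[ n_{i1} \log \sigma(x_i^\top \beta) + n_{i0} \log\bigl(1 - \sigma(x_i^\top \beta)\bigr) \Bigr],
\qquad \sigma(z) = \frac{1}{1 + e^{-z}}
$$

Since identical expanded rows contribute identical terms, this equals the sum over all individual observations, so a fit that weights each (feature row, label) pair by its count maximizes the same likelihood without materializing the expanded array.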

One way to approach this is to expand each count into its own row and feed the resulting array to the model, but the total number of counts is very large (around 1e10), so the expanded array would need far more memory and compute than is practical.
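The naive expansion can be sketched with NumPy's `np.repeat`, using only the four sample rows from the table. The variable and function names are hypothetical, and X1 is left out for brevity. The sketch makes the cost concrete: the expanded array has one row per observation, i.e. `n0.sum() + n1.sum()` rows.

```python
import numpy as np

# The four example rows from the table (X1 omitted for brevity).
x2 = np.array([27.5, 67.5, 32.5, 42.5])
x3 = np.array([0.0125, 0.175, 0.325, 0.175])
n0 = np.array([500, 4000, 1000, 600])  # counts with Y = 0
n1 = np.array([0, 30, 120, 20])        # counts with Y = 1


def expand_counts(features, n0, n1):
    """Naive approach (hypothetical helper): repeat each feature row once per observation."""
    X = np.vstack([
        np.repeat(features, n0, axis=0),  # row i repeated n0[i] times, labelled 0
        np.repeat(features, n1, axis=0),  # row i repeated n1[i] times, labelled 1
    ])
    y = np.concatenate([
        np.zeros(n0.sum(), dtype=int),
        np.ones(n1.sum(), dtype=int),
    ])
    return X, y


features = np.column_stack([x2, x3])
X_big, y_big = expand_counts(features, n0, n1)
print(X_big.shape)  # (6270, 2)
```

Four aggregated rows already become 6,270 expanded rows. At roughly 1e10 observations, even five float64 columns (X1 one-hot encoded into three, plus X2 and X3) would take about 1e10 × 5 × 8 bytes = 400 GB.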

Is there a way to make sklearn models work with this aggregated structure directly, without taking the massive expanded array as input?
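One common way to do this, sketched here as an assumption rather than a confirmed answer, is to keep each aggregated row twice, once labelled 0 and once labelled 1, and pass the counts as `sample_weight`, which both `LogisticRegression.fit` and `RandomForestClassifier.fit` accept. The sketch uses only the four sample rows; the manual one-hot encoding of X1 and all variable names are illustrative choices.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression

# The four example rows from the table.
x1 = np.array(["A", "B", "A", "C"])
x2 = np.array([27.5, 67.5, 32.5, 42.5])
x3 = np.array([0.0125, 0.175, 0.325, 0.175])
n0 = np.array([500, 4000, 1000, 600])  # counts with Y = 0
n1 = np.array([0, 30, 120, 20])        # counts with Y = 1

# One-hot encode the categorical X1 and append the numeric columns.
levels = np.unique(x1)                              # ['A' 'B' 'C']
x1_onehot = (x1[:, None] == levels).astype(float)   # shape (4, 3)
features = np.column_stack([x1_onehot, x2, x3])     # shape (4, 5)

# Each aggregated row appears twice: once with label 0 and once with
# label 1, weighted by the corresponding count.
X = np.vstack([features, features])
y = np.concatenate([np.zeros(len(features), dtype=int),
                    np.ones(len(features), dtype=int)])
w = np.concatenate([n0, n1]).astype(float)

# Drop zero-count rows; they contribute nothing to either model.
keep = w > 0
X, y, w = X[keep], y[keep], w[keep]

logreg = LogisticRegression(max_iter=1000)
logreg.fit(X, y, sample_weight=w)

forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X, y, sample_weight=w)

# Predicted P(Y = 1) for each original feature combination.
p_logreg = logreg.predict_proba(features)[:, 1]
p_forest = forest.predict_proba(features)[:, 1]
print(np.round(p_logreg, 3), np.round(p_forest, 3))
```

For logistic regression, count weights give the same objective as the expanded data, so the fit should match one on the expanded array up to solver tolerance. For the random forest, split impurities use the weights, but bootstrap resampling and the `min_samples_split` / `min_samples_leaf` limits operate on aggregated rows rather than individual observations, so results can differ from a forest trained on the expanded array.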



1 Answer

Waiting for an expert to answer.
