I have a set of data that looks something like this:
Feature X1   Feature X2   Feature X3   Count (Y=0)   Count (Y=1)
A            27.5         0.0125       500           0
B            67.5         0.175        4000          30
A            32.5         0.325        1000          120
C            42.5         0.175        600           20
...
(i.e. for each combination of features X1, X2 and X3, I have the number of observations with output Y = 0 and the number with Y = 1)
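For concreteness, the aggregated table could be held in a pandas DataFrame like the sketch below (column names such as count_y0 / count_y1 are placeholders I made up for illustration):

```python
import pandas as pd

# Aggregated representation of the table above; one row per unique
# (X1, X2, X3) combination, with per-class observation counts.
df = pd.DataFrame({
    "X1": ["A", "B", "A", "C"],
    "X2": [27.5, 67.5, 32.5, 42.5],
    "X3": [0.0125, 0.175, 0.325, 0.175],
    "count_y0": [500, 4000, 1000, 600],   # number of observations with Y = 0
    "count_y1": [0, 30, 120, 20],         # number of observations with Y = 1
})
```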
I would like to build a logistic regression or random forest model on this data set with sklearn to predict the output Y.
One way to approach this is to expand each count into its own row of an array and feed that expanded array into whatever model is used, but the total number of counts is very large (around 1e10), so the expanded array would require an enormous amount of memory and compute.
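Purely to illustrate what I mean by "expanding", the sketch below does this for the toy rows above (X1 is left out for brevity, and with ~1e10 total counts the expanded arrays obviously could not be built this way on the real data):

```python
import numpy as np

# Aggregated features and per-class counts (same toy rows as above).
X_agg = np.array([[27.5, 0.0125],
                  [67.5, 0.175],
                  [32.5, 0.325],
                  [42.5, 0.175]])          # numeric features only; X1 would need encoding
counts = np.array([[500, 0],
                   [4000, 30],
                   [1000, 120],
                   [600, 20]])             # columns: count of Y=0, count of Y=1

# Naive expansion: repeat each feature row once per observed count,
# and build a matching label vector of 0s and 1s.
X_expanded = np.repeat(X_agg, counts.sum(axis=1), axis=0)
y_expanded = np.concatenate([np.repeat([0, 1], c) for c in counts])
```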
Is there a way to make sklearn models understand this aggregated data structure directly, without feeding them the massive expanded array as input?