I have a dataset of 1 minute data of 1000 stocks since 1998, that total around (2012-1998)*(365*24*60)*1000 = 7.3 Billion
rows.
Most (99.9%) of the time I will perform only read requests.
What is the best way to store this data in a db?
- 1 big table with 7.3B rows?
- 1000 tables (one for each stock symbol) with 7.3M rows each?
- any recommendation of database engine? (I'm planning to use Amazon RDS' MySQL)
I'm not used to deal with datasets this big, so this is an excellent opportunity for me to learn. I will appreciate a lot your help and advice.
Edit:
This is a sample row:
'XX', 20041208, 938, 43.7444, 43.7541, 43.735, 43.7444, 35116.7, 1, 0, 0
Column 1 is the stock symbol, column 2 is the date, column 3 is the minute, the rest are open-high-low-close prices, volume, and 3 integer columns.
Most of the queries will be like "Give me the prices of AAPL between April 12 2012 12:15 and April 13 2012 12:52"
About the hardware: I plan to use Amazon RDS so I'm flexible on that
See Question&Answers more detail:os