Welcome to ShenZhenJia Knowledge Sharing Community for programmer and developer-Open, Learning and Share

I'm creating a database schema for storing historical stock data. I currently have the schema shown below.

My requirements are to store "bar data" (date, open, high, low, close, volume) for multiple stock symbols. Each symbol might also have multiple timeframes (e.g. Google Weekly bars and Google Daily bars).

My current schema puts the bulk of the data in the OHLCV table. I'm far from a database expert and am curious if this is too naive. Constructive input is very welcome.

CREATE TABLE Exchange (exchange TEXT UNIQUE NOT NULL);

CREATE TABLE Symbol (symbol TEXT UNIQUE NOT NULL, exchangeID INTEGER NOT NULL);

CREATE TABLE Timeframe (timeframe TEXT NOT NULL, symbolID INTEGER NOT NULL);

CREATE TABLE OHLCV (date TEXT NOT NULL CHECK (date LIKE '____-__-__ __:__:__'),
    open REAL NOT NULL,
    high REAL NOT NULL,
    low REAL NOT NULL,
    close REAL NOT NULL,
    volume INTEGER NOT NULL,
    timeframeID INTEGER NOT NULL);

This means my queries currently go something like: Find the timeframeID for a given symbol/timeframe, then do a select on the OHLCV table where the timeframeID matches.
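
For illustration, the two-step lookup described above can be folded into a single JOIN. This is only a sketch using Python's built-in sqlite3 module. It assumes that exchangeID, symbolID and timeframeID refer to the implicit SQLite rowid of the parent table (the schema does not declare explicit ID columns), and the load_bars helper and the sample rows are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Schema exactly as posted in the question.
conn.executescript("""
CREATE TABLE Exchange (exchange TEXT UNIQUE NOT NULL);
CREATE TABLE Symbol (symbol TEXT UNIQUE NOT NULL, exchangeID INTEGER NOT NULL);
CREATE TABLE Timeframe (timeframe TEXT NOT NULL, symbolID INTEGER NOT NULL);
CREATE TABLE OHLCV (date TEXT NOT NULL CHECK (date LIKE '____-__-__ __:__:__'),
    open REAL NOT NULL,
    high REAL NOT NULL,
    low REAL NOT NULL,
    close REAL NOT NULL,
    volume INTEGER NOT NULL,
    timeframeID INTEGER NOT NULL);
""")

# Made-up sample rows. Assumption: the *ID columns hold the parent's rowid.
exchange_id = conn.execute("INSERT INTO Exchange VALUES ('NASDAQ')").lastrowid
symbol_id = conn.execute(
    "INSERT INTO Symbol VALUES ('GOOG', ?)", (exchange_id,)
).lastrowid
daily_id = conn.execute(
    "INSERT INTO Timeframe VALUES ('daily', ?)", (symbol_id,)
).lastrowid
weekly_id = conn.execute(
    "INSERT INTO Timeframe VALUES ('weekly', ?)", (symbol_id,)
).lastrowid

conn.executemany(
    "INSERT INTO OHLCV VALUES (?, ?, ?, ?, ?, ?, ?)",
    [
        ("2020-01-03 00:00:00", 101.0, 103.0, 100.5, 102.5, 1200, daily_id),
        ("2020-01-02 00:00:00", 100.0, 102.0, 99.0, 101.0, 1000, daily_id),
        ("2020-01-03 00:00:00", 100.0, 103.0, 99.0, 102.5, 2200, weekly_id),
    ],
)


def load_bars(conn, symbol, timeframe):
    """Return all bars for one symbol/timeframe pair, oldest first."""
    return conn.execute(
        """
        SELECT o.date, o.open, o.high, o.low, o.close, o.volume
        FROM OHLCV AS o
        JOIN Timeframe AS t ON t.rowid = o.timeframeID
        JOIN Symbol AS s ON s.rowid = t.symbolID
        WHERE s.symbol = ? AND t.timeframe = ?
        ORDER BY o.date
        """,
        (symbol, timeframe),
    ).fetchall()


daily_bars = load_bars(conn, "GOOG", "daily")
weekly_bars = load_bars(conn, "GOOG", "weekly")
```

Without an index on OHLCV.timeframeID, a query like this has to scan the whole OHLCV table, which is likely to matter once the table grows.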


1 Answer

We spent a long time looking for a suitable database structure for storing large amounts of data. The solution below is the result of more than 6 years of experience, and it now works flawlessly for our quantitative analysis.

We have been able to store hundreds of gigabytes of intraday and daily data using this scheme in SQL Server:

 Symbol - char(6)
 Date - date
 Time - time
 Open - decimal(18, 4)
 High - decimal(18, 4)
 Low - decimal(18, 4)
 Close - decimal(18, 4)
 Volume - int

All trading instruments are stored in a single table. We also have a clustered index on symbol, date and time columns.
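
As a rough sketch of the same idea outside SQL Server: in SQLite, a WITHOUT ROWID table with a composite primary key stores rows physically ordered by that key, which plays a role similar to a clustered index on symbol, date and time. The table name IntradayBar, the bars_between helper and the sample rows are assumptions made up for this example, and REAL stands in for decimal(18, 4) because SQLite has no fixed-point type.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# One table for all instruments, keyed (and physically ordered) by
# symbol, date, time. This is a SQLite analog, not the SQL Server DDL.
conn.execute(
    """
    CREATE TABLE IntradayBar (
        symbol TEXT NOT NULL,
        date   TEXT NOT NULL,     -- 'YYYY-MM-DD'
        time   TEXT NOT NULL,     -- 'HH:MM:SS'
        open   REAL NOT NULL,
        high   REAL NOT NULL,
        low    REAL NOT NULL,
        close  REAL NOT NULL,
        volume INTEGER NOT NULL,
        PRIMARY KEY (symbol, date, time)
    ) WITHOUT ROWID
    """
)

# Made-up sample rows, deliberately inserted out of order.
conn.executemany(
    "INSERT INTO IntradayBar VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
    [
        ("MSFT", "2020-01-02", "09:31:00", 50.0, 50.5, 49.9, 50.2, 300),
        ("GOOG", "2020-01-02", "09:31:00", 100.5, 101.0, 100.2, 100.8, 150),
        ("GOOG", "2020-01-02", "09:30:00", 100.0, 100.6, 99.8, 100.5, 200),
        ("GOOG", "2020-01-03", "09:30:00", 101.0, 101.5, 100.7, 101.2, 180),
    ],
)


def bars_between(conn, symbol, start_date, end_date):
    """Return one symbol's bars for an inclusive date range, oldest first."""
    return conn.execute(
        """
        SELECT date, time, open, high, low, close, volume
        FROM IntradayBar
        WHERE symbol = ? AND date BETWEEN ? AND ?
        ORDER BY date, time
        """,
        (symbol, start_date, end_date),
    ).fetchall()


goog_bars = bars_between(conn, "GOOG", "2020-01-02", "2020-01-02")
```

Because the symbol comes first in the key and the dates are ISO-formatted strings that sort chronologically, a query for one symbol and a date range reads a contiguous slice of the table.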

For daily data, we have a separate table that does not use the Time column. Its Volume column is also bigint instead of int.

The performance? We can get data out of the server in a matter of milliseconds. Remember, the database size is almost 1 terabyte.

We purchased all of our historical market data from the Kibot web site: http://www.kibot.com/

