Optimizing SQLite is tricky.
Bulk-insert performance of a C application can vary from 85 inserts per second to over 96,000 inserts per second!
Background: We are using SQLite as part of a desktop application.
We have large amounts of configuration data stored in XML files that are parsed and loaded into an SQLite database for further processing when the application is initialized.
SQLite is ideal for this situation because it's fast, it requires no specialized configuration, and the database is stored on disk as a single file.
Rationale: Initially I was disappointed with the performance I was seeing.
It turns out that the performance of SQLite can vary significantly (both for bulk inserts and selects) depending on how the database is configured and how you're using the API.
It was not a trivial matter to figure out what all of the options and techniques were, so I thought it prudent to create this community wiki entry to share the results with Stack Overflow readers in order to save others the trouble of the same investigations.
The Experiment: Rather than simply talking about performance tips in the general sense (i.e. "Use a transaction!"), I thought it best to write some C code and actually measure the impact of various options.
We're going to start with some simple data:
- A 28 MB TAB-delimited text file (approximately 865,000 records) of the complete transit schedule for the city of Toronto
- My test machine is a 3.60 GHz P4 running Windows XP.
- The code is compiled with Visual C++ 2005 as "Release" with "Full Optimization" (/Ox) and Favor Fast Code (/Ot).
- I'm using the SQLite "Amalgamation", compiled directly into my test application.
The SQLite version I happen to have is a bit older (3.6.7), but I suspect these results will be comparable to the latest release (please leave a comment if you think otherwise).
Let's write some code!
The Code: A simple C program that reads the text file line-by-line, splits the string into values and then inserts the data into an SQLite database.
In this "baseline" version of the code, the database is created, but we won't actually insert data:
/*************************************************************
    Baseline code to experiment with SQLite performance.

    Input data is a 28 MB TAB-delimited text file of the
    complete Toronto Transit System schedule/route info
    from http://www.toronto.ca/open/datasets/ttc-routes/
**************************************************************/
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <string.h>
#include "sqlite3.h"

/* Backslashes in Windows paths must be escaped inside C string literals */
#define INPUTDATA "C:\\TTC_schedule_scheduleitem_10-27-2009.txt"
#define DATABASE "c:\\TTC_schedule_scheduleitem_10-27-2009.sqlite"
#define TABLE "CREATE TABLE IF NOT EXISTS TTC (id INTEGER PRIMARY KEY, Route_ID TEXT, Branch_Code TEXT, Version INTEGER, Stop INTEGER, Vehicle_Index INTEGER, Day Integer, Time TEXT)"
#define BUFFER_SIZE 256

int main(int argc, char **argv) {

    sqlite3 * db;
    sqlite3_stmt * stmt;
    char * sErrMsg = 0;
    const char * tail = 0;
    int nRetCode;
    int n = 0;

    clock_t cStartClock;

    FILE * pFile;
    char sInputBuf [BUFFER_SIZE] = "";

    char * sRT = 0;  /* Route */
    char * sBR = 0;  /* Branch */
    char * sVR = 0;  /* Version */
    char * sST = 0;  /* Stop Number */
    char * sVI = 0;  /* Vehicle */
    char * sDT = 0;  /* Date */
    char * sTM = 0;  /* Time */

    char sSQL [BUFFER_SIZE] = "";

    /*********************************************/
    /* Open the Database and create the Schema */
    sqlite3_open(DATABASE, &db);
    sqlite3_exec(db, TABLE, NULL, NULL, &sErrMsg);

    /*********************************************/
    /* Open input file and import into Database*/
    cStartClock = clock();

    pFile = fopen (INPUTDATA,"r");
    while (!feof(pFile)) {

        fgets (sInputBuf, BUFFER_SIZE, pFile);

        /* Fields are separated by TAB characters */
        sRT = strtok (sInputBuf, "\t");  /* Get Route */
        sBR = strtok (NULL, "\t");       /* Get Branch */
        sVR = strtok (NULL, "\t");       /* Get Version */
        sST = strtok (NULL, "\t");       /* Get Stop Number */
        sVI = strtok (NULL, "\t");       /* Get Vehicle */
        sDT = strtok (NULL, "\t");       /* Get Date */
        sTM = strtok (NULL, "\t");       /* Get Time */

        /* ACTUAL INSERT WILL GO HERE */

        n++;
    }
    fclose (pFile);

    printf("Imported %d records in %4.2f seconds\n", n, (clock() - cStartClock) / (double)CLOCKS_PER_SEC);

    sqlite3_close(db);
    return 0;
}
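The field-splitting step in the loop above can be looked at on its own. The following is a minimal sketch, not the code used for the measurements: `split_fields` and `FIELD_COUNT` are hypothetical names introduced here, and it assumes each row carries seven non-empty TAB-separated values. Unlike the baseline, it also treats the line ending as a delimiter so that the last field does not keep its newline.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical constant: Route, Branch, Version, Stop, Vehicle, Date, Time */
#define FIELD_COUNT 7

/* Splits `line` in place on TAB (and line-ending) characters and stores
   up to FIELD_COUNT token pointers in `fields`.
   Returns the number of fields found.
   strtok treats consecutive delimiters as one, so empty columns are
   skipped rather than returned as empty strings. */
int split_fields(char *line, char *fields[FIELD_COUNT]) {
    int count = 0;
    char *token;

    assert(line != NULL && fields != NULL);

    token = strtok(line, "\t\r\n");
    while (token != NULL && count < FIELD_COUNT) {
        fields[count++] = token;
        token = strtok(NULL, "\t\r\n");
    }
    return count;
}
```

For a made-up row such as `"501\tA\t1\t1024\t7\t3\t06:15:00\n"`, the function returns 7 and `fields[6]` points at `06:15:00`. A row with an empty column would yield fewer fields, so the return value is worth checking before inserting.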
The "Control"
Running the code as-is doesn't actually perform any database operations, but it will give us an idea of how fast the raw C file I/O and string processing operations are.
Imported 864913 records in 0.94 seconds
Great!
We can do 920,000 inserts per second, provided we don't actually do any inserts :-)
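The rate quoted above is simply the record count divided by the elapsed time. As a small worked sketch (the helper name `records_per_second` is made up for illustration):

```c
#include <assert.h>

/* Returns the number of records processed per second.
   `seconds` must be positive. */
double records_per_second(long records, double seconds) {
    assert(seconds > 0.0);
    return (double)records / seconds;
}
```

With the control run's figures, `records_per_second(864913L, 0.94)` comes out at roughly 920,000.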
The "Worst-Case-Scenario"
We're going to generate the SQL string using the values read from the file and invoke that SQL operation using sqlite3_exec:
sprintf(sSQL, "INSERT INTO TTC VALUES (NULL, '%s', '%s', '%s', '%s', '%s', '%s', '%s')", sRT, sBR, sVR, sST, sVI, sDT, sTM);
sqlite3_exec(db, sSQL, NULL, NULL, &sErrMsg);
This is going to be slow because the SQL will be compiled into VDBE code for every insert and every insert will happen in its own transaction.
How slow?
Imported 864913 records in 9933.61 seconds
Yikes!
2 hours and 45 minutes!
That's only 85 inserts per second.
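The string-building step of this worst case can be sketched separately. The version below is an illustration under assumptions rather than the code that was timed: `build_insert_sql` and `SQL_BUFFER_SIZE` are hypothetical names, and it uses C99 `snprintf` (which the Visual C++ 2005 toolchain mentioned earlier may not provide under that name) so that a row too long for the buffer is reported instead of overflowing it.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical buffer size, matching BUFFER_SIZE in the baseline */
#define SQL_BUFFER_SIZE 256

/* Formats an INSERT statement for the TTC table into `sql`.
   `values` holds the seven column values in table order.
   Returns 0 on success, or -1 if the statement does not fit in `size`
   bytes (in which case `sql` holds a truncated string). */
int build_insert_sql(char *sql, size_t size, const char *const values[7]) {
    int written;

    assert(sql != NULL && values != NULL);

    written = snprintf(sql, size,
        "INSERT INTO TTC VALUES (NULL, '%s', '%s', '%s', '%s', '%s', '%s', '%s')",
        values[0], values[1], values[2], values[3],
        values[4], values[5], values[6]);

    /* snprintf returns the length it wanted to write, excluding the NUL */
    if (written < 0 || (size_t)written >= size) {
        return -1;
    }
    return 0;
}
```

Values are pasted into the statement verbatim, so a value containing a single quote would break the SQL. That is one more reason, besides speed, to prefer bound parameters over generated SQL text.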
Using a Transaction
By default, SQLite evaluates every INSERT / UPDATE statement within its own transaction.
If performing a large number of inserts, it's advisable to wrap your operation in a transaction:
sqlite3_exec(db, "BEGIN TRANSACTION", NULL, NULL, &sErrMsg);
pFile = fopen (INPUTDATA,"r");
while (!feof(pFile)) {
...
}
fclose (pFile);
sqlite3_exec(db, "END TRANSACTION", NULL, NULL, &sErrMsg);
Imported 864913 records in 38.03 seconds
That's better.
Simply wrapping all of our inserts in a single transaction improved our performance to 23,000 inserts per second.
Using a Prepared Statement
Using a transaction was a huge improvement, but recompiling the SQL statement for every insert doesn't make sense if we're using the same SQL over and over.
Let's use sqlite3_prepare_v2 to compile our SQL statement once and then bind our parameters to that statement using sqlite3_bind_text:
/* Open input file and import into the database */
cStartClock = clock();

sprintf(sSQL, "INSERT INTO TTC VALUES (NULL, @RT, @BR, @VR, @ST, @VI, @DT, @TM)");
sqlite3_prepare_v2(db, sSQL, BUFFER_SIZE, &stmt, &tail);

sqlite3_exec(db, "BEGIN TRANSACTION", NULL, NULL, &sErrMsg);

pFile = fopen (INPUTDATA,"r");
while (!feof(pFile)) {

    fgets (sInputBuf, BUFFER_SIZE, pFile);

    /* Fields are separated by TAB characters */
    sRT = strtok (sInputBuf, "\t");  /* Get Route */
    sBR = strtok (NULL, "\t");       /* Get Branch */
    sVR = strtok (NULL, "\t");       /* Get Version */
    sST = strtok (NULL, "\t");       /* Get Stop Number */
    sVI = strtok (NULL, "\t");       /* Get Vehicle */
    sDT = strtok (NULL, "\t");       /* Get Date */
    sTM = strtok (NULL, "\t");       /* Get Time */

    /* Bind the seven values to the compiled statement */
    sqlite3_bind_text(stmt, 1, sRT, -1, SQLITE_TRANSIENT);
    sqlite3_bind_text(stmt, 2, sBR, -1, SQLITE_TRANSIENT);
    sqlite3_bind_text(stmt, 3, sVR, -1, SQLITE_TRANSIENT);
    sqlite3_bind_text(stmt, 4, sST, -1, SQLITE_TRANSIENT);
    sqlite3_bind_text(stmt, 5, sVI, -1, SQLITE_TRANSIENT);
    sqlite3_bind_text(stmt, 6, sDT, -1, SQLITE_TRANSIENT);
    sqlite3_bind_text(stmt, 7, sTM, -1, SQLITE_TRANSIENT);

    /* Run the insert, then make the statement ready for the next row */
    sqlite3_step(stmt);

    sqlite3_clear_bindings(stmt);
    sqlite3_reset(stmt);

    n++;
}
fclose (pFile);

sqlite3_exec(db, "END TRANSACTION", NULL, NULL, &sErrMsg);

printf("Imported %d records in %4.2f seconds\n", n, (clock() - cStartClock) / (double)CLOCKS_PER_SEC);

sqlite3_finalize(stmt);
sqlite3_close(db);

return 0;
Imported 864913 records in 16.27 seconds
Nice!
There's a little bit more code (don't forget to call sqlite3_clear_bi