Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

c - Improve INSERT-per-second performance of SQLite?

Optimizing SQLite is tricky.

Bulk-insert performance of a C application can vary from 85 inserts per second to over 96,000 inserts per second!

Background: We are using SQLite as part of a desktop application.

We have large amounts of configuration data stored in XML files that are parsed and loaded into an SQLite database for further processing when the application is initialized.

SQLite is ideal for this situation because it's fast, it requires no specialized configuration, and the database is stored on disk as a single file.

Rationale: Initially I was disappointed with the performance I was seeing.

It turns out that the performance of SQLite can vary significantly (both for bulk inserts and selects) depending on how the database is configured and how you're using the API.

It was not a trivial matter to figure out what all of the options and techniques were, so I thought it prudent to create this community wiki entry to share the results with Stack Overflow readers in order to save others the trouble of the same investigations.

The Experiment: Rather than simply talking about performance tips in the general sense (i.e. "Use a transaction!"), I thought it best to write some C code and actually measure the impact of various options.

We're going to start with some simple data:

  • A 28 MB TAB-delimited text file (approximately 865,000 records) of the complete transit schedule for the city of Toronto

  • My test machine is a 3.60 GHz P4 running Windows XP.

  • The code is compiled with Visual C++ 2005 as "Release" with "Full Optimization" (/Ox) and Favor Fast Code (/Ot).

  • I'm using the SQLite "Amalgamation", compiled directly into my test application.

    The SQLite version I happen to have is a bit older (3.6.7), but I suspect these results will be comparable to the latest release (please leave a comment if you think otherwise).

Let's write some code!

The Code: A simple C program that reads the text file line-by-line, splits the string into values and then inserts the data into an SQLite database.

In this "baseline" version of the code, the database is created, but we won't actually insert data:

/*************************************************************
    Baseline code to experiment with SQLite performance.

    Input data is a 28 MB TAB-delimited text file of the
    complete Toronto Transit System schedule/route info
    from http://www.toronto.ca/open/datasets/ttc-routes/

**************************************************************/
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <string.h>
#include "sqlite3.h"

#define INPUTDATA "C:\\TTC_schedule_scheduleitem_10-27-2009.txt"     /* backslash must be escaped in C strings */
#define DATABASE "c:\\TTC_schedule_scheduleitem_10-27-2009.sqlite"
#define TABLE "CREATE TABLE IF NOT EXISTS TTC (id INTEGER PRIMARY KEY, Route_ID TEXT, Branch_Code TEXT, Version INTEGER, Stop INTEGER, Vehicle_Index INTEGER, Day Integer, Time TEXT)"
#define BUFFER_SIZE 256

int main(int argc, char **argv) {

    sqlite3 * db;
    sqlite3_stmt * stmt;
    char * sErrMsg = 0;
    const char * tail = 0;   /* sqlite3_prepare_v2() takes a const char ** */
    int nRetCode;
    int n = 0;

    clock_t cStartClock;

    FILE * pFile;
    char sInputBuf [BUFFER_SIZE] = "";

    char * sRT = 0;  /* Route */
    char * sBR = 0;  /* Branch */
    char * sVR = 0;  /* Version */
    char * sST = 0;  /* Stop Number */
    char * sVI = 0;  /* Vehicle */
    char * sDT = 0;  /* Date */
    char * sTM = 0;  /* Time */

    char sSQL [BUFFER_SIZE] = "";

    /*********************************************/
    /* Open the Database and create the Schema */
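    if (sqlite3_open(DATABASE, &db) != SQLITE_OK) {
        fprintf(stderr, "Cannot open database: %s\n", sqlite3_errmsg(db));
        sqlite3_close(db);
        return 1;
    }
    if (sqlite3_exec(db, TABLE, NULL, NULL, &sErrMsg) != SQLITE_OK) {
        fprintf(stderr, "Cannot create table: %s\n", sErrMsg);
        sqlite3_free(sErrMsg);
        sqlite3_close(db);
        return 1;
    }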

    /*********************************************/
    /* Open input file and import into Database*/
    cStartClock = clock();

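    pFile = fopen (INPUTDATA,"r");
    if (pFile == NULL) {
        perror(INPUTDATA);
        sqlite3_close(db);
        return 1;
    }

    /* fgets() returns NULL at end of file; a while (!feof(pFile)) loop
       would typically process the last line twice */
    while (fgets (sInputBuf, BUFFER_SIZE, pFile) != NULL) {

        sRT = strtok (sInputBuf, "\t");     /* Get Route */
        sBR = strtok (NULL, "\t");          /* Get Branch */
        sVR = strtok (NULL, "\t");          /* Get Version */
        sST = strtok (NULL, "\t");          /* Get Stop Number */
        sVI = strtok (NULL, "\t");          /* Get Vehicle */
        sDT = strtok (NULL, "\t");          /* Get Date */
        sTM = strtok (NULL, "\t\r\n");      /* Get Time (drops the line ending) */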

        /* ACTUAL INSERT WILL GO HERE */

        n++;
    }
    fclose (pFile);

    printf("Imported %d records in %4.2f seconds\n", n, (clock() - cStartClock) / (double)CLOCKS_PER_SEC);

    sqlite3_close(db);
    return 0;
}

The "Control"

Running the code as-is doesn't actually perform any database operations, but it will give us an idea of how fast the raw C file I/O and string processing operations are.

Imported 864913 records in 0.94 seconds

Great!

We can do 920,000 inserts per second, provided we don't actually do any inserts :-)
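
The read-and-split part of the baseline can also be written as two small functions, which makes the end-of-file handling easy to test on its own. This is a sketch, not the author's code: the names split_tab_line and count_records, and the assumption of exactly seven TAB-separated fields per line, are illustrative.

```c
#include <stdio.h>
#include <string.h>

#define NUM_FIELDS 7   /* assumed: Route, Branch, Version, Stop, Vehicle, Date, Time */

/* Split one TAB-delimited line in place. The end-of-line characters are
   removed first, so the last field does not carry a trailing '\n'.
   Returns how many fields were stored (at most NUM_FIELDS).
   Caveat: strtok() merges consecutive TABs, so an empty field is skipped. */
static int split_tab_line(char *line, char *fields[NUM_FIELDS])
{
    int count = 0;
    char *tok;

    line[strcspn(line, "\r\n")] = '\0';
    for (tok = strtok(line, "\t"); tok != NULL && count < NUM_FIELDS;
         tok = strtok(NULL, "\t"))
        fields[count++] = tok;
    return count;
}

/* Count the lines of a file that split into NUM_FIELDS fields, or return
   -1 if the file cannot be opened. Lines are assumed to fit in buf.
   The loop tests fgets() itself: it returns NULL at end of file, whereas
   while (!feof(f)) only notices EOF after a read has already failed and
   typically runs the body one extra time. */
static long count_records(const char *path)
{
    char buf[256];
    char *fields[NUM_FIELDS];
    long n = 0;
    FILE *f = fopen(path, "r");

    if (f == NULL)
        return -1;
    while (fgets(buf, sizeof buf, f) != NULL) {
        if (split_tab_line(buf, fields) == NUM_FIELDS)
            n++;
    }
    fclose(f);
    return n;
}
```

If a data set can contain empty fields, a manual strchr() loop over the TABs would preserve them, where strtok() silently shifts the remaining columns.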

The "Worst-Case Scenario"

We're going to generate the SQL string using the values read from the file and invoke that SQL operation using sqlite3_exec:

sprintf(sSQL, "INSERT INTO TTC VALUES (NULL, '%s', '%s', '%s', '%s', '%s', '%s', '%s')", sRT, sBR, sVR, sST, sVI, sDT, sTM);
sqlite3_exec(db, sSQL, NULL, NULL, &sErrMsg);

This is going to be slow because the SQL will be compiled into VDBE code for every insert and every insert will happen in its own transaction.

How slow?

Imported 864913 records in 9933.61 seconds

Yikes!

2 hours and 45 minutes!

That's only 85 inserts per second.

Using a Transaction

By default, SQLite runs in autocommit mode: every INSERT / UPDATE statement that is not inside an explicit transaction is evaluated in its own transaction.

If performing a large number of inserts, it's advisable to wrap your operation in a transaction:

sqlite3_exec(db, "BEGIN TRANSACTION", NULL, NULL, &sErrMsg);

pFile = fopen (INPUTDATA,"r");
while (fgets (sInputBuf, BUFFER_SIZE, pFile) != NULL) {

    ...

}
fclose (pFile);

sqlite3_exec(db, "END TRANSACTION", NULL, NULL, &sErrMsg);

Imported 864913 records in 38.03 seconds

That's better.

Simply wrapping all of our inserts in a single transaction improved our performance to 23,000 inserts per second.

Using a Prepared Statement

Using a transaction was a huge improvement, but recompiling the SQL statement for every insert doesn't make sense if we are using the same SQL over and over.

Let's use sqlite3_prepare_v2 to compile our SQL statement once and then bind our parameters to that statement using sqlite3_bind_text:

/* Open input file and import into the database */
cStartClock = clock();

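sprintf(sSQL, "INSERT INTO TTC VALUES (NULL, @RT, @BR, @VR, @ST, @VI, @DT, @TM)");

/* nByte = -1: read sSQL up to its NUL terminator; the tail pointer is not needed */
if (sqlite3_prepare_v2(db, sSQL, -1, &stmt, NULL) != SQLITE_OK) {
    fprintf(stderr, "Prepare failed: %s\n", sqlite3_errmsg(db));
    sqlite3_close(db);
    return 1;
}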

sqlite3_exec(db, "BEGIN TRANSACTION", NULL, NULL, &sErrMsg);

pFile = fopen (INPUTDATA,"r");
while (fgets (sInputBuf, BUFFER_SIZE, pFile) != NULL) {

    sRT = strtok (sInputBuf, "\t");   /* Get Route */
    sBR = strtok (NULL, "\t");        /* Get Branch */
    sVR = strtok (NULL, "\t");        /* Get Version */
    sST = strtok (NULL, "\t");        /* Get Stop Number */
    sVI = strtok (NULL, "\t");        /* Get Vehicle */
    sDT = strtok (NULL, "\t");        /* Get Date */
    sTM = strtok (NULL, "\t\r\n");    /* Get Time (drops the line ending) */

    sqlite3_bind_text(stmt, 1, sRT, -1, SQLITE_TRANSIENT);
    sqlite3_bind_text(stmt, 2, sBR, -1, SQLITE_TRANSIENT);
    sqlite3_bind_text(stmt, 3, sVR, -1, SQLITE_TRANSIENT);
    sqlite3_bind_text(stmt, 4, sST, -1, SQLITE_TRANSIENT);
    sqlite3_bind_text(stmt, 5, sVI, -1, SQLITE_TRANSIENT);
    sqlite3_bind_text(stmt, 6, sDT, -1, SQLITE_TRANSIENT);
    sqlite3_bind_text(stmt, 7, sTM, -1, SQLITE_TRANSIENT);

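    if (sqlite3_step(stmt) != SQLITE_DONE)   /* an INSERT finishes with SQLITE_DONE */
        fprintf(stderr, "Insert failed: %s\n", sqlite3_errmsg(db));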

    sqlite3_clear_bindings(stmt);
    sqlite3_reset(stmt);

    n++;
}
fclose (pFile);

sqlite3_exec(db, "END TRANSACTION", NULL, NULL, &sErrMsg);

printf("Imported %d records in %4.2f seconds\n", n, (clock() - cStartClock) / (double)CLOCKS_PER_SEC);

sqlite3_finalize(stmt);
sqlite3_close(db);

return 0;

Imported 864913 records in 16.27 seconds

Nice!

There's a little bit more code (don't forget to call sqlite3_clear_bi

1 Reply

Several tips:

  1. Put inserts/updates in a transaction.

  2. For older versions of SQLite, consider less paranoid durability settings (PRAGMA synchronous and PRAGMA journal_mode).

    With PRAGMA synchronous there is NORMAL, and then there is OFF, which can significantly increase insert speed if you're not too worried about the database possibly getting corrupted if the OS crashes or the machine loses power.

    If only your application crashes, the data should be fine.

    Note that in newer versions, the OFF and MEMORY settings of PRAGMA journal_mode are not safe for application-level crashes.

  3. Playing with page sizes makes a difference as well (PRAGMA page_size).

    Having larger page sizes can make reads and writes go a bit faster as larger pages are held in memory.

    Note that more memory will be used for your database.

  4. If you have indices, consider calling CREATE INDEX after doing all your inserts.

    This is significantly faster than creating the index and then doing your inserts.

  5. You have to be quite careful if you have concurrent access to SQLite, as the whole database is locked while a write is being committed, and although multiple readers can be active at once, a writer cannot commit until they have finished.

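    This has been improved somewhat with the addition of write-ahead logging (WAL, enabled with PRAGMA journal_mode=WAL) in SQLite 3.7.0 and later, where readers do not block the writer and the writer does not block readers.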

  6. Take advantage of saving space: smaller databases go faster.

    For instance, if you have key-value pairs, try making the key an INTEGER PRIMARY KEY if possible; it then becomes an alias for the table's implicit rowid column instead of a separate column with its own index.

  7. If you are using multiple threads, you can try using the shared page cache, which lets threads share loaded pages and can avoid expensive I/O calls.

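  8. Don't use !feof(file)! feof() only reports end-of-file after a read has already failed, so a while (!feof(file)) loop runs its body one extra time (typically reprocessing the last line); test the return value of fgets() instead.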

I've also asked similar questions here and here.
