Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
308 views
in Technique[技术] by (71.8m points)

c++ - How to read 4GB file on 32bit system

In my case I have different files lets assume that I have >4GB file with data. I want to read that file line by line and process each line. One of my restrictions is that soft has to be run on 32bit MS Windows or on 64bit with small amount of RAM (min 4GB). You can also assume that processing of these lines isn't bottleneck.

In current solution I read that file by ifstream and copy to some string. Here is snippet how it looks like.

std::ifstream file(filename_xml.c_str());
uintmax_t m_numLines = 0;
std::string str;
while (std::getline(file, str))
{
    m_numLines++;
}

And ok, that's working but to slowly here is a time for my 3.6 GB of data:

real    1m4.155s
user    0m0.000s
sys     0m0.030s

I'm looking for a method that will be much faster than that for example I found that How to parse space-separated floats in C++ quickly? and I loved presented solution with boost::mapped_file but I faced to another problem what if my file is to big and in my case file 1GB large was enough to drop entire process. I have to care about current data in memory probably people who will be using that tool doesn't have more than 4 GB installed RAM.

So I found that mapped_file from boost but how to use it in my case? Is it possible to read partially that file and receive these lines?

Maybe you have another much better solution. I have to just process each line.

Thanks,
Bart

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

Nice to see you found my benchmark at How to parse space-separated floats in C++ quickly?

It seems you're really looking for the fastest way to count lines (or any linear single pass analysis), I've done a similar analysis and benchmark of exactly that here

Interestingly, you'll see that the most performant code does not need to rely on memory mapping at all there.

static uintmax_t wc(char const *fname)
{
    static const auto BUFFER_SIZE = 16*1024;
    int fd = open(fname, O_RDONLY);
    if(fd == -1)
        handle_error("open");

    /* Advise the kernel of our access pattern.  */
    posix_fadvise(fd, 0, 0, 1);  // FDADVICE_SEQUENTIAL

    char buf[BUFFER_SIZE + 1];
    uintmax_t lines = 0;

    while(size_t bytes_read = read(fd, buf, BUFFER_SIZE))
    {
        if(bytes_read == (size_t)-1)
            handle_error("read failed");
        if (!bytes_read)
            break;

        for(char *p = buf; (p = (char*) memchr(p, '
', (buf + bytes_read) - p)); ++p)
            ++lines;
    }

    return lines;
}

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...