Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share

c++ - What's the most efficient way to read a file into a std::string?

I currently do this, and the conversion to std::string at the end takes 98% of the execution time. There must be a better way!

#include <fstream>
#include <string>
#include <vector>

std::string
file2string(const std::string& filename)
{
    std::ifstream file(filename.c_str());
    if(!file.is_open()){
        // If they passed a bad file name, or one we have no read access to,
        // we pass back an empty string.
        return "";
    }
    // find out how much data there is
    file.seekg(0,std::ios::end);
    std::streampos length = file.tellg();
    file.seekg(0,std::ios::beg);
    if(std::streamoff(length) <= 0){
        // empty file, or tellg() failed: nothing to read
        // (this also avoids taking &buf[0] of an empty vector below)
        return "";
    }
    // get a vector of that size and
    std::vector<char> buf(length);
    // fill the buffer with the file's contents
    file.read(&buf[0],length);
    file.close();
    // return buffer as string
    std::string s(buf.begin(),buf.end());
    return s;
}

1 Reply

Being a big fan of C++ iterator abstraction and the algorithms, I would love the following to be the fastest way to read a file (or any other input stream) into a std::string (and then print the content):

#include <algorithm>
#include <fstream>
#include <iostream>
#include <iterator>
#include <string>

int main()
{
    std::string s(std::istreambuf_iterator<char>(std::ifstream("file")
                                                 >> std::skipws),
                  std::istreambuf_iterator<char>());
    std::cout << "file='" << s << "'\n";
}

This certainly is fast for my own implementation of IOStreams, but it requires a lot of trickery to actually make it fast. Primarily, it requires optimizing algorithms to cope with segmented sequences: a stream can be seen as a sequence of input buffers. I'm not aware of any STL implementation consistently doing this optimization. The odd use of std::skipws is just to get a reference to the just-created stream: the std::istreambuf_iterator<char> constructor expects an lvalue reference, to which the temporary file stream wouldn't bind.
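
For comparison, here is a minimal sketch of the same idea with a named stream (the function name read_via_iterators is hypothetical, and opening the file in binary mode is an added assumption). Since a named std::ifstream is an lvalue, it binds directly to the std::istreambuf_iterator<char> constructor, so the std::skipws detour isn't needed. Whether this is fast still depends on whether the library optimizes for segmented sequences, as described above.

```cpp
#include <fstream>
#include <iterator>
#include <string>

// Hypothetical helper: reads a whole file via std::istreambuf_iterator<char>.
// A named std::ifstream is an lvalue, so it binds directly to the iterator's
// constructor and no std::skipws trick is needed.
std::string read_via_iterators(const std::string& path)
{
    // Assumption: binary mode, so no end-of-line translation takes place.
    std::ifstream in(path, std::ios::binary);
    // The extra parentheses around the first argument prevent the "most
    // vexing parse" (without them, s would be declared as a function).
    std::string s((std::istreambuf_iterator<char>(in)),
                  std::istreambuf_iterator<char>());
    return s;
}
```

If the file can't be opened, the iterator immediately compares equal to the end iterator and the result is simply an empty string.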

Since this probably isn't the fastest approach, I would be inclined to use std::getline() with a particular "newline" character, i.e., one which isn't in the file:

std::string s;
// optionally reserve space, although I wouldn't be too fussed about the
// reallocations because the reads probably dominate the performance
std::getline(std::ifstream("file") >> std::skipws, s, '\0');

This assumes that the file doesn't contain a null character; any other character which doesn't occur in the file would do as well. Unfortunately, std::getline() takes a char_type as its delimiter rather than an int_type. With an int_type delimiter (as taken by, e.g., the member std::istream::ignore()), you could use eof() as a value which never occurs as a character (char_type, int_type, and eof() refer to the respective members of std::char_traits<char>). The member std::istream::getline(), in turn, can't be used either: it reads into a caller-supplied character array, so you would need to know ahead of time how many characters are in the file.
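
Spelled out as a complete function, the std::getline() approach might look like the following sketch (read_via_getline is a hypothetical name; binary mode is an added assumption). The delimiter is passed as '\0' rather than the int literal 0: std::getline() is a template which deduces the character type from all of its arguments, and an int delimiter would conflict with the char deduced from the stream and the string.

```cpp
#include <fstream>
#include <string>

// Hypothetical helper: reads the whole file with std::getline(), using '\0'
// as the "newline" character. Assumption: the file contains no '\0' bytes;
// if it does, only the part before the first '\0' is returned.
std::string read_via_getline(const std::string& path)
{
    std::ifstream in(path, std::ios::binary);
    std::string s;
    // '\0' (a char), not 0 (an int): the delimiter type must match char.
    std::getline(in, s, '\0');
    return s;  // stays empty if the file couldn't be opened
}
```

If the file does contain a '\0', the result is silently truncated at that point, so this is really only suitable for text files.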

BTW, I saw some attempts to use seeking to determine the size of the file. This is bound not to work too well. The problem is that the code conversion done in std::ifstream (well, actually in std::filebuf) can create a different number of characters than there are bytes in the file. Admittedly, this isn't the case when using the default "C" locale, and it is possible to detect that no conversion is done. Otherwise, the best bet for the stream would be to run over the file and determine the number of characters being produced. I actually think that this is what would need to be done when the code conversion could do something interesting, although I don't think it actually is done. However, none of the examples explicitly set up the "C" locale, e.g., using std::locale::global(std::locale("C"));. Even then, it is also necessary to open the file in std::ios_base::binary mode, because otherwise end-of-line sequences may be replaced by a single character when reading. Admittedly, this would only make the result shorter, never longer.
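
If you do want to size the buffer up front, the caveats above suggest pinning down both the locale and the open mode. A minimal sketch under those assumptions (read_via_seek is a hypothetical name; imbuing just this stream with std::locale::classic() is used here instead of changing the global locale): the file is opened in binary mode and the data is read directly into a std::string, which also avoids the separate std::vector and the final conversion from the question. Writing through &s[0] relies on C++11, where std::string storage is guaranteed to be contiguous.

```cpp
#include <fstream>
#include <locale>
#include <string>

// Hypothetical helper: determines the size by seeking, then reads directly
// into a std::string. Assumptions: classic "C" locale and binary mode, so one
// byte in the file corresponds to one char read.
std::string read_via_seek(const std::string& path)
{
    std::ifstream in;
    in.imbue(std::locale::classic());  // imbue before opening: no code conversion
    in.open(path, std::ios::binary);   // binary: no end-of-line translation
    if (!in)
        return std::string();

    in.seekg(0, std::ios::end);
    const std::streamoff size = in.tellg();
    if (size <= 0)                     // empty file, or tellg() failed (-1)
        return std::string();
    in.seekg(0, std::ios::beg);

    std::string s(static_cast<std::string::size_type>(size), '\0');
    in.read(&s[0], size);
    // keep only what was actually read, in case fewer characters arrived
    s.resize(static_cast<std::string::size_type>(in.gcount()));
    return s;
}
```

The final resize() guards against the file delivering fewer characters than tellg() reported, e.g., if it shrinks between the seek and the read.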

The other approaches using extraction from a std::streambuf* (i.e., those involving rdbuf()) all require that the resulting content is copied at some point. Given that the file may actually be very large, this may not be an option. Without the copy, however, this could very well be the fastest approach. To avoid the copy, it would be possible to create a simple custom stream buffer which takes a reference to a std::string as its constructor argument and directly appends to this std::string:

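#include <fstream>
#include <iostream>
#include <streambuf>
#include <string>

// A write-only stream buffer which appends everything written to it
// directly to a referenced std::string, using a fixed local buffer.
class custombuf:
    public std::streambuf
{
public:
    custombuf(std::string& target): target_(target) {
        // leave one slot free so overflow() can always store its character
        this->setp(this->buffer_, this->buffer_ + bufsize - 1);
    }

private:
    std::string& target_;
    enum { bufsize = 8192 };
    char buffer_[bufsize];
    // Called when the put area is full (or from sync()): stores c unless it
    // is eof, appends the buffered characters to the target string, and
    // resets the put area.
    int_type overflow(int_type c) override {
        if (!traits_type::eq_int_type(c, traits_type::eof()))
        {
            *this->pptr() = traits_type::to_char_type(c);
            this->pbump(1);
        }
        this->target_.append(this->pbase(), this->pptr() - this->pbase());
        this->setp(this->buffer_, this->buffer_ + bufsize - 1);
        return traits_type::not_eof(c);
    }
    // flush any pending characters into the string
    int sync() override { this->overflow(traits_type::eof()); return 0; }
};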

int main()
{
    std::string s;
    custombuf   sbuf(s);
    if (std::ostream(&sbuf)
        << std::ifstream("readfile.cpp").rdbuf()
        << std::flush) {
        std::cout << "file='" << s << "'\n";
    }
    else {
        std::cout << "failed to read file\n";
    }
}
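
For comparison, the copying rdbuf() variant mentioned earlier typically extracts into a std::ostringstream and then calls str(), which returns a copy of the accumulated content. A minimal sketch (read_via_ostringstream is a hypothetical name; binary mode is an added assumption):

```cpp
#include <fstream>
#include <sstream>
#include <string>

// Hypothetical helper: the copying variant of the rdbuf() approach.
std::string read_via_ostringstream(const std::string& path)
{
    std::ifstream in(path, std::ios::binary);
    std::ostringstream out;
    // Inserting a std::streambuf* copies characters until end of file; it
    // sets failbit on `out` if nothing could be inserted (e.g., an empty or
    // unopenable file), in which case the result below is simply empty.
    out << in.rdbuf();
    return out.str();  // returns a copy of the stream's internal buffer
}
```

For large files, the stream's buffer and the string returned by str() briefly coexist, so memory for two copies of the content is needed; that is the cost the custom stream buffer avoids.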

At least with a suitably chosen buffer size, I would expect the custom stream buffer version to be fairly fast. Which version is the fastest will certainly depend on the system, the standard C++ library being used, and probably a number of other factors, i.e., you will want to measure the performance.
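
Measuring can be as simple as timing each reader with std::chrono::steady_clock. A rough sketch (time_read_us and read_iter are hypothetical names; any function taking a path and returning std::string can be plugged in):

```cpp
#include <chrono>
#include <fstream>
#include <iterator>
#include <string>

// Hypothetical helper: times one call of a file-reading function and returns
// the elapsed wall-clock time in microseconds; the content goes to `result`.
template <typename Reader>
long long time_read_us(Reader read, const std::string& path, std::string& result)
{
    const auto start = std::chrono::steady_clock::now();
    result = read(path);
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::microseconds>(stop - start).count();
}

// Example reader to plug in (hypothetical): the istreambuf_iterator approach.
std::string read_iter(const std::string& path)
{
    std::ifstream in(path, std::ios::binary);
    return std::string(std::istreambuf_iterator<char>(in),
                       std::istreambuf_iterator<char>());
}
```

Run each reader several times on the same, reasonably large file and compare: the first run may be dominated by disk I/O, while later runs are likely to be served from the operating system's file cache.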



...