Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share

c++ - Using seekg() in text mode

While trying to read a simple ANSI-encoded text file in text mode (Windows), I came across some strange behaviour with seekg() and tellg(): any time I called tellg(), saved its value (as a pos_type), and later called seekg() with it, I always wound up further ahead in the stream than where I had left off.

Eventually I did a sanity check; even if I just do this...

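#include <fstream>
#include <string>

int main()
{
   std::ifstream dataFile("myfile.txt",
         std::ifstream::in);   // text mode (the default)
   if (dataFile.is_open() && !dataFile.fail())
   {
      while (dataFile.good())
      {
         std::string line;
         // Seeking to the position we are already at should be a no-op...
         dataFile.seekg(dataFile.tellg());
         std::getline(dataFile, line);
      }
   }
}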

...then, further into the file, lines start coming back cut off halfway. Why is this happening?

1 Reply

This issue is caused by the way libstdc++ computes the current offset: it takes the file position returned by lseek64 and subtracts the number of bytes still remaining in its buffer.

The buffer is filled by read, and its length is taken from read's return value. For a text-mode file on Windows, that value is the number of bytes placed in the buffer after line-ending conversion (i.e. each 2-byte \r\n line ending is converted to a single \n; Windows also seems to append a spurious newline to the end of the file).

lseek64, however (which with MinGW results in a call to _lseeki64), returns the current absolute position in the file on disk, where every line ending still counts as 2 bytes. Once the two values are subtracted, the resulting offset is too large by 1 for each newline still remaining in the buffer (+1 for the extra newline).

The following code should demonstrate the issue. You can even use a file containing a single character and no newlines, because of the extra newline inserted by Windows.

#include <iostream>
#include <fstream>

int main()
{
  std::ifstream f("myfile.txt");

  for (char c; f.get(c);)
    std::cout << f.tellg() << ' ';
}

For a file containing the single character a, I get the following output:

2 3

The first call to tellg() is clearly off by 1. The second call returns the correct position because, once the extra newline has been taken into account, the end of the buffer has been reached and no remaining newlines are left to skew the result.

Aside from opening the file in binary mode, you can work around the issue by disabling buffering:

#include <iostream>
#include <fstream>

int main()
{
  std::ifstream f;
  f.rdbuf()->pubsetbuf(nullptr, 0);
  f.open("myfile.txt");

  for (char c; f.get(c);)
    std::cout << f.tellg() << ' ';
}

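but this is far from ideal, because without a buffer the stream has to go back to the underlying file for every character it reads.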

Hopefully MinGW / mingw-w64 or GCC can fix this, but first we need to determine who is responsible for fixing it. I suspect the root issue is Microsoft's implementation of lseek, which should return values consistent with the mode in which the file was opened.

...