Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others


c++ - fallocate vs posix_fallocate

I am debating which function to use between posix_fallocate and fallocate. posix_fallocate writes a file right away (initializes the characters to NULL). However, fallocate does not change the file size (when using FALLOC_FL_KEEP_SIZE flag). Based on my experimentation, it seems that fallocate does not write NULL or zero characters to the file.

Can someone please comment based on your experience? Thanks for your time.


1 Reply


Having files that take up more storage space than their displayed length is unusual, so unless you have a good reason for doing that (e.g. you want to use the file length to keep track of how far a download got, for the purpose of resuming it), it's best to use the default fallocate(2) behaviour (without FALLOC_FL_KEEP_SIZE). This has the same semantics as posix_fallocate(3).

The man page for fallocate(2) even says that its default behaviour (no flags) is intended as an optimal way of implementing posix_fallocate(3), and points to that as a portable way to allocate space.

The original question says something about writing zeros to the file. On a filesystem that supports preallocation, none of these calls write anything but metadata. If you read from space that's been preallocated but not yet written, you'll get zeros (not whatever was in that disk space previously; that would be a big security hole). You can only read up to the end of a file (the length, set by fallocate, ftruncate, or various other ways), so if you have a zero-length file and fallocate with FALLOC_FL_KEEP_SIZE, then you can't read anything. That has nothing to do with preallocation; it's just file size semantics.
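To see the difference between file length and allocated space, something like the following can be used. It is only a sketch, assuming Linux with glibc; prealloc_and_stat and struct prealloc_info are hypothetical names made up for this illustration. It calls fallocate(2) on a fresh file and reports st_size and st_blocks from fstat.

```c
#define _GNU_SOURCE            /* glibc: exposes fallocate(2) and FALLOC_FL_KEEP_SIZE */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* What fstat reports after one preallocation attempt (hypothetical type). */
struct prealloc_info {
    int       err;     /* 0 on success, otherwise the errno of the failing call */
    off_t     size;    /* st_size: apparent file length in bytes */
    long long blocks;  /* st_blocks: 512-byte units allocated to the file */
};

/* Create or truncate `path`, call fallocate(fd, mode, 0, len) on it and
 * report the resulting length and allocation.  Hypothetical helper. */
static struct prealloc_info prealloc_and_stat(const char *path, int mode, off_t len)
{
    struct prealloc_info r = { 0, 0, 0 };
    struct stat st;

    assert(path != NULL && len > 0);

    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0666);
    if (fd < 0) {
        r.err = errno;
        return r;
    }
    if (fallocate(fd, mode, 0, len) != 0)
        r.err = errno;         /* e.g. EOPNOTSUPP on a filesystem without support */
    if (fstat(fd, &st) == 0) {
        r.size = st.st_size;
        r.blocks = (long long)st.st_blocks;
    }
    close(fd);
    return r;
}
```

On a filesystem that supports preallocation, calling this with FALLOC_FL_KEEP_SIZE should leave size at 0 while blocks becomes non-zero, and calling it with mode 0 should give a size equal to len.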

So if you're fine with the POSIX semantics, use posix_fallocate(3), because it's more portable. Every GNU/Linux system will support it, but so will some other systems.

However, thanks to POSIX semantics, it's not that simple. If you use posix_fallocate with glibc on a filesystem that doesn't support preallocation, it will still succeed, but it does so by falling back to actually writing a byte in every block of the file.

Test program:

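#include <fcntl.h>   /* open, posix_fallocate */

int main(void) {
    /* create foo if needed; an existing file is not truncated */
    int fd = open("foo", O_RDWR|O_CREAT, 0666);
    if (fd < 0) return 1;
    /* returns 0 on success or an error number (errno is not set) */
    return posix_fallocate(fd, 0, 400000);
}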

On XFS:

$ strace ~/src/c/falloc
...
open("foo", O_RDWR|O_CREAT, 0666) = 3
fallocate(3, 0, 0, 400000)              = 0
exit_group(0)                           = ?

On a FAT32 flash drive:

open("foo", O_RDWR|O_CREAT, 0666) = 3
fallocate(3, 0, 0, 400000)              = -1 EOPNOTSUPP (Operation not supported)
fstat(3, {st_mode=S_IFREG|0755, st_size=400000, ...}) = 0
fstatfs(3, {f_type="MSDOS_SUPER_MAGIC", f_bsize=65536, f_blocks=122113, f_bfree=38274, f_bavail=38274, f_files=0, f_ffree=0, f_fsid={2145, 0}, f_namelen=1530, f_frsize=65536}) = 0
pread(3, "\0", 1, 6783)                 = 1
pwrite(3, "\0", 1, 6783)                = 1
pread(3, "\0", 1, 72319)                = 1
pwrite(3, "\0", 1, 72319)               = 1
pread(3, "\0", 1, 137855)               = 1
pwrite(3, "\0", 1, 137855)              = 1
pread(3, "\0", 1, 203391)               = 1
pwrite(3, "\0", 1, 203391)              = 1
pread(3, "\0", 1, 268927)               = 1
pwrite(3, "\0", 1, 268927)              = 1
pread(3, "\0", 1, 334463)               = 1
pwrite(3, "\0", 1, 334463)              = 1
pread(3, "\0", 1, 399999)               = 1
pwrite(3, "\0", 1, 399999)              = 1
exit_group(0)                           = ?

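In this run foo already existed at 400000 bytes (see st_size in the fstat output), so each block is read before it is written. glibc does avoid the reads if the file wasn't yet that long, but writing every block is still horrible.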
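The access pattern in that trace can be reproduced with a loop like the one below. This is a sketch modelled on what the trace shows, not glibc's actual source; touch_every_block is a hypothetical name, and the block size is passed in as a parameter here (the trace suggests glibc obtains it from fstatfs).

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Force allocation of [offset, offset + len) by touching one byte in every
 * block of size `blksize`.  Returns 0 or an error number, in the style of
 * posix_fallocate.  Hypothetical helper for illustration only. */
static int touch_every_block(int fd, off_t offset, off_t len, off_t blksize)
{
    struct stat st;

    assert(offset >= 0 && len > 0 && blksize > 0);
    if (fstat(fd, &st) != 0)
        return errno;

    /* Start at an offset chosen so that the last byte touched is exactly
     * offset + len - 1, which also extends the file to its final length. */
    for (offset += (len - 1) % blksize; len > 0; offset += blksize) {
        len -= blksize;

        if (offset < st.st_size) {
            /* Inside the existing file: look before overwriting anything. */
            char c;
            ssize_t n = pread(fd, &c, 1, offset);
            if (n < 0)
                return errno;
            if (n == 1 && c != 0)
                continue;      /* real data here, so the block already exists */
        }

        /* Write a single zero byte (the terminator of the empty string). */
        ssize_t w = pwrite(fd, "", 1, offset);
        if (w < 0)
            return errno;
        if (w != 1)
            return EIO;
    }
    return 0;
}
```

With offset 0, len 400000 and a 65536-byte block size, the first byte touched is at (400000 - 1) % 65536 = 6783 and the last at 399999, which matches the offsets in the trace.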

If you want something simple, I'd still just go with posix_fallocate. There's a FreeBSD man page for it, and it's specified by POSIX, so every POSIX-compliant system provides it. The one drawback is that it will be horrible with glibc on a filesystem that doesn't support preallocation. See for example https://plus.google.com/+AaronSeigo/posts/FGtXM13QuhQ. For a program that works with large files (e.g. torrents), this could be really bad.

You can thank POSIX semantics for requiring glibc to do this, as the specification (http://pubs.opengroup.org/onlinepubs/009695399/functions/posix_fallocate.html) doesn't define an error code for "the filesystem doesn't support preallocation". It also guarantees that if the call succeeds, subsequent writes into the allocated region won't fail due to lack of disk space. So the POSIX design doesn't provide a way to handle the case where the caller cares about efficiency / performance / fragmentation, rather than disk space guarantees. This forces the POSIX implementation to do the read-write loop, rather than leaving that as an option for callers that need a disk-space guarantee. Thanks POSIX...

I don't know whether non-GNU implementations of posix_fallocate (FreeBSD, Solaris?) similarly fall back to extremely slow read-write behaviour when the filesystem doesn't support preallocation. Apparently OS X (Darwin) doesn't implement posix_fallocate, unless it's very recent.

If you're looking to support preallocation across a lot of platforms, but without falling back to read-then-write if the OS has a way to just attempt preallocation, you have to use whatever platform-specific method is available. For example, check out https://github.com/arvidn/libtorrent/blob/master/src/file.cpp and search for file::set_size. It has several #ifdef blocks depending on what the compile target supports, starting with Windows code to load DLLs and do stuff there, then fcntl F_PREALLOCATE, or fcntl F_ALLOCSP64, then Linux fallocate(2), and finally it falls back to using posix_fallocate. I also found this 2007 list post for OS X Darwin: http://lists.apple.com/archives/darwin-dev/2007/Dec/msg00040.html
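A much reduced sketch of that kind of platform switch is shown below. It is not libtorrent's code: try_preallocate is a hypothetical name, only three targets are covered, and the Darwin branch is written from the documented fstore_t interface and is untested. The idea is to attempt the native call and hand any error, including EOPNOTSUPP, back to the caller instead of emulating.

```c
#define _GNU_SOURCE            /* glibc: exposes fallocate(2) */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Try to reserve `len` bytes starting at the beginning of the file.
 * Returns 0 on success or an error number.  An "operation not supported"
 * error means the filesystem cannot preallocate; a caller that only cares
 * about performance can ignore it.  Hypothetical helper. */
static int try_preallocate(int fd, off_t len)
{
    assert(fd >= 0 && len > 0);

#if defined(__APPLE__)
    /* Darwin (assumption, untested): ask for contiguous space first, then
     * for any space.  This reserves blocks but leaves the file length
     * unchanged; a caller wanting the length set would use ftruncate. */
    fstore_t fst = { F_ALLOCATECONTIG, F_PEOFPOSMODE, 0, len, 0 };
    if (fcntl(fd, F_PREALLOCATE, &fst) == -1) {
        fst.fst_flags = F_ALLOCATEALL;
        if (fcntl(fd, F_PREALLOCATE, &fst) == -1)
            return errno;
    }
    return 0;
#elif defined(__linux__)
    /* Linux: the raw call fails with EOPNOTSUPP rather than emulating. */
    return fallocate(fd, 0, 0, len) == 0 ? 0 : errno;
#else
    /* Elsewhere: the portable call, which may emulate depending on the libc. */
    return posix_fallocate(fd, 0, len);
#endif
}
```

A caller such as a torrent client could treat a non-zero return as "carry on without preallocation", and reserve the slow write-every-block approach for the cases where a disk-space guarantee is actually required.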

