Ted Ts’o on the Recent Ext4 Delayed Allocation Issues
There has been a lot of discussion about the recent potential ext4 data loss issues. The problem stems from ext4's use of delayed allocation to improve performance: data sits in memory for a while before blocks are allocated and written, so if the system crashes or loses power before the latest buffered data reaches the disk, that data is lost. After the reboot, the affected file can show up as a zero-length file (there is a set of patches for kernel 2.6.30 that fixes the clobbering of existing files, but a newly created file will still be zero-length). Delayed allocation is a common method for improving filesystem performance, and filesystems such as Reiser4, XFS, HFS+, ZFS and btrfs use it as well. Ted Ts’o wrote a blog post titled “Delayed allocation and the zero-length file problem” in which he discusses these issues in more depth and weighs the pros and cons of the different solutions available to the ext4 team. I think this quote sums up the issue and the best solution going forward nicely:
What’s the best path forward? For now, what I would recommend to Ubuntu gamers whose systems crash all the time and who want to use ext4, to use the nodelalloc mount option. I haven’t quantified what the performance penalty will be for this mode of operation, but the performance will be better than ext3, and at least this way they won’t have to worry about files getting lost as a result of delayed allocation. Long term, application writers who are worried about files getting lost on an unclean shutdown really should use fsync. Modern filesystems are all going to be using delayed allocation, because of its inherent performance benefits, and whether you think the future belongs to ZFS, or btrfs, or XFS, or ext4 — all of these filesystems used delayed allocation.
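For application writers wondering what "use fsync" looks like in practice, here is a minimal sketch of the usual write-to-a-temporary-file, fsync, then rename pattern. It is my own illustration, not code from Ts’o's post: the function name atomic_write and the ".tmp-" prefix are made up for the example, and it assumes a POSIX system where a directory can be opened and fsynced.

```python
import os
import tempfile


def atomic_write(path, data):
    """Replace `path` with `data` (bytes) so that a crash leaves either
    the old content or the new content, not an empty file.

    Hypothetical helper for illustration; assumes a POSIX system.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temporary file next to the target so the rename
    # stays within a single filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()              # push Python's buffer to the kernel
            os.fsync(tmp.fileno())   # ask the kernel to write it to disk
        os.replace(tmp_path, path)   # atomically rename over the old file
    except BaseException:
        # Do not leave a stray temporary file behind on failure.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise
    # fsync the directory as well so the rename itself is durable.
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

The key point is the order: the new data is forced to disk before the rename makes it visible under the real name, so the rename can never point at a file whose blocks have not been written yet. The cost is that every call waits for the disk, which is exactly the trade-off between safety and performance the post is about.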
I especially enjoy seeing how Mr. Ts’o engages his readers in the comments and holds a real discussion with them about the issue and its possible solutions.