2009-08-10 18:59:33

by Frank Mayhar

Subject: Data loss/corruption when using fallocate/ftruncate.

Hello again, folks. We've got an app that needs to use O_DIRECT for
performance and is using fallocate() to make sure the files are all in
one extent. Unfortunately the end size isn't always the fallocated size
so it has to do a truncate when it's done; the sequence is generally:

create(file)
fallocate(file, KEEP_SIZE, 0, maxlen)
write/write/write/write...
fallocate(file, 0, 0, maxlen-minus a bit)
ftruncate(file, actual-len)
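
For concreteness, the sequence above might look roughly like the
following in C. This is only a sketch under stated assumptions, not the
application's actual code: the function names, the 4096-byte slack
standing in for "a bit", and the 'x' fill pattern are all hypothetical,
and O_DIRECT is left out so that no aligned buffers are needed.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <linux/falloc.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical stand-in for the application's writes: fill [0, len) with 'x'. */
int write_fill(int fd, off_t len)
{
    char buf[4096];
    off_t done = 0;

    memset(buf, 'x', sizeof(buf));
    while (done < len) {
        size_t want = sizeof(buf);
        if ((off_t)want > len - done)
            want = (size_t)(len - done);
        ssize_t n = pwrite(fd, buf, want, done);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        done += n;
    }
    return 0;
}

/* Runs the five steps in order. Returns 0 on success, -1 on error. */
int prealloc_write_truncate(const char *path, off_t maxlen, off_t actual_len)
{
    assert(maxlen > 0 && actual_len >= 0 && actual_len <= maxlen);

    /* Step 1: create the file. */
    int fd = open(path, O_CREAT | O_TRUNC | O_RDWR, 0644);
    if (fd < 0)
        return -1;

    /* Assumed slack for "maxlen-minus a bit"; the real value is not given. */
    off_t second_len = maxlen > 4096 ? maxlen - 4096 : maxlen;

    if (fallocate(fd, FALLOC_FL_KEEP_SIZE, 0, maxlen) < 0 || /* step 2 */
        write_fill(fd, actual_len) < 0 ||                    /* step 3 */
        fallocate(fd, 0, 0, second_len) < 0 ||               /* step 4 */
        ftruncate(fd, actual_len) < 0) {                     /* step 5 */
        int saved = errno;
        close(fd);
        errno = saved;
        return -1;
    }
    return close(fd);
}

/* Counts zero bytes in the first len bytes of the file, or -1 on error. */
long count_zero_bytes(const char *path, off_t len)
{
    char buf[4096];
    long zeros = 0;
    off_t done = 0;
    int fd = open(path, O_RDONLY);

    if (fd < 0)
        return -1;
    while (done < len) {
        ssize_t n = pread(fd, buf, sizeof(buf), done);
        if (n < 0 && errno == EINTR)
            continue;
        if (n < 0) {
            zeros = -1;
            break;
        }
        if (n == 0)
            break; /* reached end of file */
        for (ssize_t i = 0; i < n && done + i < len; i++)
            if (buf[i] == 0)
                zeros++;
        done += n;
    }
    close(fd);
    return zeros;
}
```

Calling prealloc_write_truncate() and then count_zero_bytes() with the
same actual length should report zero on a healthy file, since every
byte written is nonzero; a nonzero count after the truncate would match
the symptom described below.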

We've been seeing some of these files end up all or partly zero after
(but not before) the truncate. After further analysis, it's clear that
the last extent (possibly the only extent) is being marked uninit for
some reason. The actual blocks on disk are nonzero but due to the
extent being marked uninit they are being read as zero.
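
One way to look for this state from user space is the FIEMAP ioctl,
which reports FIEMAP_EXTENT_UNWRITTEN for extents that are allocated but
marked uninitialized. The sketch below is an illustration only and is
not how the analysis above was done: count_unwritten_extents() and the
EXTENT_BATCH size are hypothetical, and the ioctl fails on filesystems
that do not support FIEMAP.

```c
#include <assert.h>
#include <fcntl.h>
#include <linux/fiemap.h>
#include <linux/fs.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>

#define EXTENT_BATCH 32 /* extents fetched per ioctl call (arbitrary) */

/*
 * Counts extents flagged unwritten within the first `upto` bytes of the
 * file. Returns -1 if the file cannot be opened or FIEMAP is unsupported.
 */
long count_unwritten_extents(const char *path, uint64_t upto)
{
    assert(path != NULL);

    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    size_t sz = sizeof(struct fiemap) +
                EXTENT_BATCH * sizeof(struct fiemap_extent);
    struct fiemap *fm = malloc(sz);
    if (fm == NULL) {
        close(fd);
        return -1;
    }

    long unwritten = 0;
    uint64_t start = 0;
    int last = 0;

    while (!last && start < upto) {
        memset(fm, 0, sz);
        fm->fm_start = start;
        fm->fm_length = upto - start;
        fm->fm_flags = FIEMAP_FLAG_SYNC; /* flush dirty data before mapping */
        fm->fm_extent_count = EXTENT_BATCH;

        if (ioctl(fd, FS_IOC_FIEMAP, fm) < 0) {
            unwritten = -1;
            break;
        }
        if (fm->fm_mapped_extents == 0)
            break; /* nothing more mapped in the range */

        for (uint32_t i = 0; i < fm->fm_mapped_extents; i++) {
            struct fiemap_extent *e = &fm->fm_extents[i];
            if (e->fe_flags & FIEMAP_EXTENT_UNWRITTEN)
                unwritten++;
            if (e->fe_flags & FIEMAP_EXTENT_LAST)
                last = 1;
            start = e->fe_logical + e->fe_length; /* resume after this extent */
        }
    }

    free(fm);
    close(fd);
    return unwritten;
}
```

Passing the file's actual length as `upto` limits the check to the range
that was written, so a nonzero result there would point at data hidden
behind an uninit extent.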

Note that this isn't easy to reproduce; lots of other stuff is going on
when this happens. Our feeling is that there's a race somewhere, quite
possibly between fallocate and ftruncate, but it's not clear. Certainly
a single-threaded application doesn't see this, nor does an application
that uses mutexes to serialize access to the file.

This is a heads-up to point out a real problem. We're still analyzing
and trying to track down the bug but it may take a little while.
--
Frank Mayhar <[email protected]>
Google, Inc.



2009-08-10 21:10:44

by Mingming Cao

Subject: Re: Data loss/corruption when using fallocate/ftruncate.

On Mon, 2009-08-10 at 11:59 -0700, Frank Mayhar wrote:
> Hello again, folks. We've got an app that needs to use O_DIRECT for
> performance and is using fallocate() to make sure the files are all in
> one extent. Unfortunately the end size isn't always the fallocated size
> so it has to do a truncate when it's done; the sequence is generally:
>
> create(file)
> fallocate(file, KEEP_SIZE, 0, maxlen)
> write/write/write/write...
> fallocate(file, 0, 0, maxlen-minus a bit)
> ftruncate(file, actual-len)
>
> We've been seeing some of these files end up all or partly zero after
> (but not before) the truncate. After further analysis, it's clear that
> the last extent (possibly the only extent) is being marked uninit for
> some reason. The actual blocks on disk are nonzero but due to the
> extent being marked uninit they are being read as zero.
>
> Note that this isn't easy to reproduce; lots of other stuff is going on
> when this happens. Our feeling is that there's a race somewhere, quite
> possibly between fallocate and ftruncate, but it's not clear. Certainly
> a single-threaded application doesn't see this, nor does an application
> that uses mutexes to serialize access to the file.
>
> This is a heads-up to point out a real problem. We're still analyzing
> and trying to track down the bug but it may take a little while.


Which kernel are you running? Two months ago there was a similar data
"loss" issue, caused by mismarking a previously preallocated but later
filled chunk as uninitialized after truncate. The following patch has
been in since 2.6.31-rc1:
http://lists.openwall.net/linux-ext4/2009/06/10/30

Mingming