2009-10-18 15:47:23

by Fredrik Andersson

Subject: Fwd: Ext4 bug with fallocate

Hi, I'd like to report what I'm fairly certain is an ext4 bug. I hope
this is the right place to do so.

My program creates a big file (around 30 GB) with posix_fallocate (to
utilize extents), fills it with data and uses ftruncate to crop it to
its final size (usually somewhere between 20 and 25 GB).
The problem is that in around 5% of cases, the program locks up
completely inside a syscall. The process then cannot be killed even
with kill -9; a reboot is the only way to recover.
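
In case it helps, the pattern boils down to roughly the following
sketch (the path, sizes and the fill loop are placeholders, not my
actual code):

#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
    /* Placeholder sizes: ~30 GB preallocated, cropped to ~22 GB. */
    const off_t alloc_size = 30LL * 1024 * 1024 * 1024;
    const off_t final_size = 22LL * 1024 * 1024 * 1024;

    int fd = open("/mnt/raid0/bigfile", O_RDWR | O_CREAT, 0644);
    if (fd < 0) { perror("open"); return 1; }

    /* Preallocate the whole file up front so ext4 lays it out in large extents. */
    int err = posix_fallocate(fd, 0, alloc_size);
    if (err) { fprintf(stderr, "posix_fallocate: %s\n", strerror(err)); return 1; }

    /* ... fill the file with data via write()/pwrite() ... */

    /* Crop to the final size; this truncate step is where the process
       gets stuck in roughly 1 in 20 runs. */
    if (ftruncate(fd, final_size) < 0) { perror("ftruncate"); return 1; }

    close(fd);
    return 0;
}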

Here is the contents of my /proc/PID/syscall:
76 0xee4a80 0x486d6aaf8 0x36390113f8 0x7fffc63cd350 0xecc050
0x7fffc63cd3f0 0x7fffc63cd5c8 0x36380e0cc7

Syscall 76 on x86_64 is, as far as I can tell, truncate, which matches
the truncate/ftruncate step at the end of my program.
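
A quick way to double-check the number is to print a few SYS_*
constants from <sys/syscall.h>, e.g. something like this (the
particular syscalls printed are just the ones relevant here):

#include <stdio.h>
#include <sys/syscall.h>

int main(void)
{
    /* On x86_64 these come out as 76, 77 and 97 respectively. */
    printf("truncate  = %d\n", SYS_truncate);
    printf("ftruncate = %d\n", SYS_ftruncate);
    printf("getrlimit = %d\n", SYS_getrlimit);
    return 0;
}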

The file is on a software RAID 0 volume with two disks, managed by
mdadm. I have reported the problem to the md people, but they insist
it's an ext4 problem.

I have also tried closing the file and opening it again prior to the
ftruncate. No change.

There are no other strange phenomena whatsoever with ext4. This
problem only arises in this particular situation.

I'm running Fedora on an x86_64 system.
I have tried several kernel versions, the most recent being 2.6.31.1.
The problem has persisted ever since the kernel that originally shipped
with Fedora 11.

Is this a known bug/problem? Grateful for any insights!

/Fredrik


2009-10-18 15:57:33

by Eric Sandeen

Subject: Re: Fwd: Ext4 bug with fallocate

Fredrik Andersson wrote:
> Hi, I'd like to report what I'm fairly certain is an ext4 bug. I hope
> this is the right place to do so.
>
> My program creates a big file (around 30 GB) with posix_fallocate (to
> utilize extents), fills it with data and uses ftruncate to crop it to
> its final size (usually somewhere between 20 and 25 GB).
> The problem is that in around 5% of the cases, the program locks up
> completely in a syscall. The process can thus not be killed even with
> kill -9, and a reboot is all that will do.

Does echo w > /proc/sysrq-trigger (this dumps blocked/sleeping
processes; or use echo t for all processes) show you where the stuck
threads are?

-Eric


2009-10-19 09:49:25

by Fredrik Andersson

Subject: Re: Fwd: Ext4 bug with fallocate

Hi, here is the data for this process:

[5958816.744013] drdbmake      D ffff88021e4c7800     0 27019  13796
[5958816.744013]  ffff8801d1bcda88 0000000000000082 ffff8801f4ce9bf0 ffff8801678b1380
[5958816.744013]  0000000000010e80 000000000000c748 ffff8800404963c0 ffffffff81526360
[5958816.744013]  ffff880040496730 00000000f4ce9bf0 000000025819cebe 0000000000000282
[5958816.744013] Call Trace:
[5958816.744013]  [<ffffffff813a9639>] schedule+0x9/0x20
[5958816.744013]  [<ffffffff81177ea5>] start_this_handle+0x365/0x5d0
[5958816.744013]  [<ffffffff8105b900>] ? autoremove_wake_function+0x0/0x40
[5958816.744013]  [<ffffffff811781ce>] jbd2_journal_restart+0xbe/0x150
[5958816.744013]  [<ffffffff8116243d>] ext4_ext_truncate+0x6dd/0xa20
[5958816.744013]  [<ffffffff81095b3b>] ? find_get_pages+0x3b/0xf0
[5958816.744013]  [<ffffffff81150a78>] ext4_truncate+0x198/0x680
[5958816.744013]  [<ffffffff810ac984>] ? unmap_mapping_range+0x74/0x280
[5958816.744013]  [<ffffffff811772c0>] ? jbd2_journal_stop+0x1e0/0x360
[5958816.744013]  [<ffffffff810acd25>] vmtruncate+0xa5/0x110
[5958816.744013]  [<ffffffff810dda10>] inode_setattr+0x30/0x180
[5958816.744013]  [<ffffffff8114d073>] ext4_setattr+0x173/0x310
[5958816.744013]  [<ffffffff810ddc79>] notify_change+0x119/0x330
[5958816.744013]  [<ffffffff810c6df3>] do_truncate+0x63/0x90
[5958816.744013]  [<ffffffff810d0cc3>] ? get_write_access+0x23/0x60
[5958816.744013]  [<ffffffff810c70cb>] sys_truncate+0x17b/0x180
[5958816.744013]  [<ffffffff8100bfab>] system_call_fastpath+0x16/0x1b

I don't know if this has anything to do with it, but I also noticed
that another process of mine, which is working just fine, is executing
a suspicious-looking function called raid0_unplug. It operates on the
same raid0/ext4 filesystem as the hung process, so I include its call
trace here too:

[5958816.744013] nodeserv      D ffff880167bd7ca8     0 17900  13796
[5958816.744013]  ffff880167bd7bf8 0000000000000082 ffff88002800a588 ffff88021e5b56e0
[5958816.744013]  0000000000010e80 000000000000c748 ffff880100664020 ffffffff81526360
[5958816.744013]  ffff880100664390 000000008119bd17 000000026327bfa9 0000000000000002
[5958816.744013] Call Trace:
[5958816.744013]  [<ffffffffa0039291>] ? raid0_unplug+0x51/0x70 [raid0]
[5958816.744013]  [<ffffffff813a9639>] schedule+0x9/0x20
[5958816.744013]  [<ffffffff813a9687>] io_schedule+0x37/0x50
[5958816.744013]  [<ffffffff81095e35>] sync_page+0x35/0x60
[5958816.744013]  [<ffffffff81095e69>] sync_page_killable+0x9/0x50
[5958816.744013]  [<ffffffff813a99d2>] __wait_on_bit_lock+0x52/0xb0
[5958816.744013]  [<ffffffff81095e60>] ? sync_page_killable+0x0/0x50
[5958816.744013]  [<ffffffff81095d74>] __lock_page_killable+0x64/0x70
[5958816.744013]  [<ffffffff8105b940>] ? wake_bit_function+0x0/0x40
[5958816.744013]  [<ffffffff81095c0b>] ? find_get_page+0x1b/0xb0
[5958816.744013]  [<ffffffff81097908>] generic_file_aio_read+0x3b8/0x6b0
[5958816.744013]  [<ffffffff810c7dc1>] do_sync_read+0xf1/0x140
[5958816.744013]  [<ffffffff8106a5e8>] ? do_futex+0xb8/0xb20
[5958816.744013]  [<ffffffff813ab78f>] ? _spin_unlock_irqrestore+0x2f/0x40
[5958816.744013]  [<ffffffff8105b900>] ? autoremove_wake_function+0x0/0x40
[5958816.744013]  [<ffffffff8105bc73>] ? add_wait_queue+0x43/0x60
[5958816.744013]  [<ffffffff81062a6c>] ? getnstimeofday+0x5c/0xf0
[5958816.744013]  [<ffffffff810c85b8>] vfs_read+0xc8/0x170
[5958816.744013]  [<ffffffff810c86fa>] sys_pread64+0x9a/0xa0
[5958816.744013]  [<ffffffff8100bfab>] system_call_fastpath+0x16/0x1b

Hope this makes sense to anyone, and please let me know if there is
more info I can provide.

/Fredrik

On Sun, Oct 18, 2009 at 5:57 PM, Eric Sandeen <[email protected]> wrote:
>
> Fredrik Andersson wrote:
>>
>> Hi, I'd like to report what I'm fairly certain is an ext4 bug. I hope
>> this is the right place to do so.
>>
>> My program creates a big file (around 30 GB) with posix_fallocate (to
>> utilize extents), fills it with data and uses ftruncate to crop it to
>> its final size (usually somewhere between 20 and 25 GB).
>> The problem is that in around 5% of the cases, the program locks up
>> completely in a syscall. The process can thus not be killed even with
>> kill -9, and a reboot is all that will do.
>
> does echo w > /proc/sysrq-trigger (this does sleeping processes; or use echo t for all processes) show you where the stuck threads are?
>
> -Eric
>

2009-10-20 16:49:26

by Fredrik Andersson

Subject: Re: Fwd: Ext4 bug with fallocate

I found the following post on the ext4 list. It seems to match the
problem I'm experiencing almost exactly.

http://osdir.com/ml/linux-ext4/2009-08/msg00184.html

Is it the same problem?

/Fredrik

On Mon, Oct 19, 2009 at 11:49 AM, Fredrik Andersson <[email protected]> wrote:
> Hi, here is the data for this process:
>
> [5958816.744013] drdbmake      D ffff88021e4c7800     0 27019  13796
> [5958816.744013]  ffff8801d1bcda88 0000000000000082 ffff8801f4ce9bf0 ffff8801678b1380
> [5958816.744013]  0000000000010e80 000000000000c748 ffff8800404963c0 ffffffff81526360
> [5958816.744013]  ffff880040496730 00000000f4ce9bf0 000000025819cebe 0000000000000282
> [5958816.744013] Call Trace:
> [5958816.744013]  [<ffffffff813a9639>] schedule+0x9/0x20
> [5958816.744013]  [<ffffffff81177ea5>] start_this_handle+0x365/0x5d0
> [5958816.744013]  [<ffffffff8105b900>] ? autoremove_wake_function+0x0/0x40
> [5958816.744013]  [<ffffffff811781ce>] jbd2_journal_restart+0xbe/0x150
> [5958816.744013]  [<ffffffff8116243d>] ext4_ext_truncate+0x6dd/0xa20
> [5958816.744013]  [<ffffffff81095b3b>] ? find_get_pages+0x3b/0xf0
> [5958816.744013]  [<ffffffff81150a78>] ext4_truncate+0x198/0x680
> [5958816.744013]  [<ffffffff810ac984>] ? unmap_mapping_range+0x74/0x280
> [5958816.744013]  [<ffffffff811772c0>] ? jbd2_journal_stop+0x1e0/0x360
> [5958816.744013]  [<ffffffff810acd25>] vmtruncate+0xa5/0x110
> [5958816.744013]  [<ffffffff810dda10>] inode_setattr+0x30/0x180
> [5958816.744013]  [<ffffffff8114d073>] ext4_setattr+0x173/0x310
> [5958816.744013]  [<ffffffff810ddc79>] notify_change+0x119/0x330
> [5958816.744013]  [<ffffffff810c6df3>] do_truncate+0x63/0x90
> [5958816.744013]  [<ffffffff810d0cc3>] ? get_write_access+0x23/0x60
> [5958816.744013]  [<ffffffff810c70cb>] sys_truncate+0x17b/0x180
> [5958816.744013]  [<ffffffff8100bfab>] system_call_fastpath+0x16/0x1b
>
> Don't know if this has anything to do with it, but I also noticed
> that another process of mine, which is working just fine, is executing
> a suspicious looking function called raid0_unplug.
> It operates on the same raid0/ext4 filesystem as the hung process. I
> include the calltrace for it here too:
>
> [5958816.744013] nodeserv      D ffff880167bd7ca8     0 17900  13796
> [5958816.744013]  ffff880167bd7bf8 0000000000000082 ffff88002800a588 ffff88021e5b56e0
> [5958816.744013]  0000000000010e80 000000000000c748 ffff880100664020 ffffffff81526360
> [5958816.744013]  ffff880100664390 000000008119bd17 000000026327bfa9 0000000000000002
> [5958816.744013] Call Trace:
> [5958816.744013]  [<ffffffffa0039291>] ? raid0_unplug+0x51/0x70 [raid0]
> [5958816.744013]  [<ffffffff813a9639>] schedule+0x9/0x20
> [5958816.744013]  [<ffffffff813a9687>] io_schedule+0x37/0x50
> [5958816.744013]  [<ffffffff81095e35>] sync_page+0x35/0x60
> [5958816.744013]  [<ffffffff81095e69>] sync_page_killable+0x9/0x50
> [5958816.744013]  [<ffffffff813a99d2>] __wait_on_bit_lock+0x52/0xb0
> [5958816.744013]  [<ffffffff81095e60>] ? sync_page_killable+0x0/0x50
> [5958816.744013]  [<ffffffff81095d74>] __lock_page_killable+0x64/0x70
> [5958816.744013]  [<ffffffff8105b940>] ? wake_bit_function+0x0/0x40
> [5958816.744013]  [<ffffffff81095c0b>] ? find_get_page+0x1b/0xb0
> [5958816.744013]  [<ffffffff81097908>] generic_file_aio_read+0x3b8/0x6b0
> [5958816.744013]  [<ffffffff810c7dc1>] do_sync_read+0xf1/0x140
> [5958816.744013]  [<ffffffff8106a5e8>] ? do_futex+0xb8/0xb20
> [5958816.744013]  [<ffffffff813ab78f>] ? _spin_unlock_irqrestore+0x2f/0x40
> [5958816.744013]  [<ffffffff8105b900>] ? autoremove_wake_function+0x0/0x40
> [5958816.744013]  [<ffffffff8105bc73>] ? add_wait_queue+0x43/0x60
> [5958816.744013]  [<ffffffff81062a6c>] ? getnstimeofday+0x5c/0xf0
> [5958816.744013]  [<ffffffff810c85b8>] vfs_read+0xc8/0x170
> [5958816.744013]  [<ffffffff810c86fa>] sys_pread64+0x9a/0xa0
> [5958816.744013]  [<ffffffff8100bfab>] system_call_fastpath+0x16/0x1b
>
> Hope this makes sense to anyone, and please let me know if there is
> more info I can provide.
>
> /Fredrik
>
> On Sun, Oct 18, 2009 at 5:57 PM, Eric Sandeen <[email protected]> wrote:
>>
>> Fredrik Andersson wrote:
>>>
>>> Hi, I'd like to report what I'm fairly certain is an ext4 bug. I hope
>>> this is the right place to do so.
>>>
>>> My program creates a big file (around 30 GB) with posix_fallocate (to
>>> utilize extents), fills it with data and uses ftruncate to crop it to
>>> its final size (usually somewhere between 20 and 25 GB).
>>> The problem is that in around 5% of the cases, the program locks up
>>> completely in a syscall. The process can thus not be killed even with
>>> kill -9, and a reboot is all that will do.
>>
>> does echo w > /proc/sysrq-trigger (this does sleeping processes; or use echo t for all processes) show you where the stuck threads are?
>>
>> -Eric
>>
>

2009-10-21 02:20:21

by Eric Sandeen

Subject: Re: Fwd: Ext4 bug with fallocate

Fredrik Andersson wrote:
> Hi, here is the data for this process:

Including all of the processes in D state (everything reported by
sysrq-w) would probably be most helpful.

Feel free to file an ext4 bug on bugzilla.kernel.org w/ this
information, too, so it doesn't get lost in busy schedules ...

Thanks,
-Eric

> [5958816.744013] drdbmake D ffff88021e4c7800 0 27019 13796
> [5958816.744013] ffff8801d1bcda88 0000000000000082 ffff8801f4ce9bf0
> ffff8801678b1380
> [5958816.744013] 0000000000010e80 000000000000c748 ffff8800404963c0
> ffffffff81526360
> [5958816.744013] ffff880040496730 00000000f4ce9bf0 000000025819cebe
> 0000000000000282
> [5958816.744013] Call Trace:
> [5958816.744013] [<ffffffff813a9639>] schedule+0x9/0x20
> [5958816.744013] [<ffffffff81177ea5>] start_this_handle+0x365/0x5d0
> [5958816.744013] [<ffffffff8105b900>] ? autoremove_wake_function+0x0/
> 0x40
> [5958816.744013] [<ffffffff811781ce>] jbd2_journal_restart+0xbe/0x150
> [5958816.744013] [<ffffffff8116243d>] ext4_ext_truncate+0x6dd/0xa20
> [5958816.744013] [<ffffffff81095b3b>] ? find_get_pages+0x3b/0xf0
> [5958816.744013] [<ffffffff81150a78>] ext4_truncate+0x198/0x680
> [5958816.744013] [<ffffffff810ac984>] ? unmap_mapping_range+0x74/0x280
> [5958816.744013] [<ffffffff811772c0>] ? jbd2_journal_stop+0x1e0/0x360
> [5958816.744013] [<ffffffff810acd25>] vmtruncate+0xa5/0x110
> [5958816.744013] [<ffffffff810dda10>] inode_setattr+0x30/0x180
> [5958816.744013] [<ffffffff8114d073>] ext4_setattr+0x173/0x310
> [5958816.744013] [<ffffffff810ddc79>] notify_change+0x119/0x330
> [5958816.744013] [<ffffffff810c6df3>] do_truncate+0x63/0x90
> [5958816.744013] [<ffffffff810d0cc3>] ? get_write_access+0x23/0x60
> [5958816.744013] [<ffffffff810c70cb>] sys_truncate+0x17b/0x180
> [5958816.744013] [<ffffffff8100bfab>] system_call_fastpath+0x16/0x1b
>
> Don't know if this has anything to do with it, but I also noticed
> that another process of mine,
> which is working just fine, is executing a suspicious looking function
> called raid0_unplug.
> It operates on the same raid0/ext4 filesystem as the hung process. I
> include the calltrace for it here too:
>
> [5958816.744013] nodeserv D ffff880167bd7ca8 0 17900 13796
> [5958816.744013] ffff880167bd7bf8 0000000000000082 ffff88002800a588
> ffff88021e5b56e0
> [5958816.744013] 0000000000010e80 000000000000c748 ffff880100664020
> ffffffff81526360
> [5958816.744013] ffff880100664390 000000008119bd17 000000026327bfa9
> 0000000000000002
> [5958816.744013] Call Trace:
> [5958816.744013] [<ffffffffa0039291>] ? raid0_unplug+0x51/0x70 [raid0]
> [5958816.744013] [<ffffffff813a9639>] schedule+0x9/0x20
> [5958816.744013] [<ffffffff813a9687>] io_schedule+0x37/0x50
> [5958816.744013] [<ffffffff81095e35>] sync_page+0x35/0x60
> [5958816.744013] [<ffffffff81095e69>] sync_page_killable+0x9/0x50
> [5958816.744013] [<ffffffff813a99d2>] __wait_on_bit_lock+0x52/0xb0
> [5958816.744013] [<ffffffff81095e60>] ? sync_page_killable+0x0/0x50
> [5958816.744013] [<ffffffff81095d74>] __lock_page_killable+0x64/0x70
> [5958816.744013] [<ffffffff8105b940>] ? wake_bit_function+0x0/0x40
> [5958816.744013] [<ffffffff81095c0b>] ? find_get_page+0x1b/0xb0
> [5958816.744013] [<ffffffff81097908>] generic_file_aio_read+0x3b8/0x6b0
> [5958816.744013] [<ffffffff810c7dc1>] do_sync_read+0xf1/0x140
> [5958816.744013] [<ffffffff8106a5e8>] ? do_futex+0xb8/0xb20
> [5958816.744013] [<ffffffff813ab78f>] ? _spin_unlock_irqrestore+0x2f/0x40
> [5958816.744013] [<ffffffff8105b900>] ? autoremove_wake_function+0x0/0x40
> [5958816.744013] [<ffffffff8105bc73>] ? add_wait_queue+0x43/0x60
> [5958816.744013] [<ffffffff81062a6c>] ? getnstimeofday+0x5c/0xf0
> [5958816.744013] [<ffffffff810c85b8>] vfs_read+0xc8/0x170
> [5958816.744013] [<ffffffff810c86fa>] sys_pread64+0x9a/0xa0
> [5958816.744013] [<ffffffff8100bfab>] system_call_fastpath+0x16/0x1b
>
> Hope this makes sense to anyone, and please let me know if there is
> more info I can provide.
>
> /Fredrik
>
> On Sun, Oct 18, 2009 at 5:57 PM, Eric Sandeen <[email protected]> wrote:
>> Fredrik Andersson wrote:
>>> Hi, I'd like to report what I'm fairly certain is an ext4 bug. I hope
>>> this is the right place to do so.
>>>
>>> My program creates a big file (around 30 GB) with posix_fallocate (to
>>> utilize extents), fills it with data and uses ftruncate to crop it to
>>> its final size (usually somewhere between 20 and 25 GB).
>>> The problem is that in around 5% of the cases, the program locks up
>>> completely in a syscall. The process can thus not be killed even with
>>> kill -9, and a reboot is all that will do.
>> does echo w > /proc/sysrq-trigger (this does sleeping processes; or use echo t for all processes) show you where the stuck threads are?
>>
>> -Eric
>>


2009-10-21 09:08:11

by Fredrik Andersson

Subject: Re: Fwd: Ext4 bug with fallocate

Alright, here is the complete sysrq w data. I will file a bug report too.

My programs are drdbmake, nodeserv, receptd.

[6130704.521614] SysRq : Show Blocked State
[6130704.521696] task PC stack pid father
[6130704.521702] pdflush D ffff8801f4ce9bdc 0 269 2
[6130704.521707] ffff88021e551960 0000000000000046 ffff88021e5b56e0
ffff8800d3506a80
[6130704.521712] 0000000000010e80 000000000000c748 ffff88021fa24740
ffff88021f843900
[6130704.521717] ffff88021fa24ab0 0000000100000008 000000025819ca1c
0080001000800013
[6130704.521721] Call Trace:
[6130704.521731] [<ffffffff813a9639>] schedule+0x9/0x20
[6130704.521735] [<ffffffff813ab2d5>] __down_read+0x85/0xb7
[6130704.521738] [<ffffffff813aa565>] down_read+0x25/0x30
[6130704.521743] [<ffffffff8114e6f2>] ext4_get_blocks_wrap+0x52/0x290
[6130704.521747] [<ffffffff8114f601>] mpage_da_map_blocks+0xd1/0x7a0
[6130704.521752] [<ffffffff8109d12c>] ? write_cache_pages+0x2fc/0x470
[6130704.521756] [<ffffffff81150330>] ? __mpage_da_writepage+0x0/0x180
[6130704.521760] [<ffffffff8114ffdb>] ext4_da_writepages+0x30b/0x590
[6130704.521764] [<ffffffff8103d91b>] ? finish_task_switch+0x5b/0xd0
[6130704.521768] [<ffffffff8109d2f8>] do_writepages+0x28/0x50
[6130704.521773] [<ffffffff810e60fb>] __writeback_single_inode+0x9b/0x3e0
[6130704.521777] [<ffffffff8104f830>] ? process_timeout+0x0/0x10
[6130704.521781] [<ffffffff813a8827>] ? io_schedule_timeout+0x37/0x50
[6130704.521785] [<ffffffff810e68ec>] generic_sync_sb_inodes+0x36c/0x440
[6130704.521789] [<ffffffff810e6b8e>] writeback_inodes+0x5e/0x100
[6130704.521792] [<ffffffff8109d4ac>] wb_kupdate+0xbc/0x140
[6130704.521796] [<ffffffff8109e6a8>] pdflush+0x118/0x220
[6130704.521799] [<ffffffff8109d3f0>] ? wb_kupdate+0x0/0x140
[6130704.521802] [<ffffffff8109e590>] ? pdflush+0x0/0x220
[6130704.521806] [<ffffffff8105b4c6>] kthread+0x56/0x90
[6130704.521811] [<ffffffff8100cf6a>] child_rip+0xa/0x20
[6130704.521814] [<ffffffff8100c969>] ? restore_args+0x0/0x30
[6130704.521818] [<ffffffff8105b470>] ? kthread+0x0/0x90
[6130704.521821] [<ffffffff8100cf60>] ? child_rip+0x0/0x20
[6130704.521826] kjournald2 D ffff88021f921e28 0 1334 2
[6130704.521830] ffff88021f921d10 0000000000000046 ffff88021f48c890
0000000000000000
[6130704.521835] 0000000000010e80 000000000000c748 ffff88021e5d8000
ffff88021f844740
[6130704.521839] ffff88021e5d8370 0000000100000282 0000000000000000
0000000000000282
[6130704.521844] Call Trace:
[6130704.521847] [<ffffffff813a9639>] schedule+0x9/0x20
[6130704.521853] [<ffffffff81178ebe>]
jbd2_journal_commit_transaction+0x1be/0x1810
[6130704.521856] [<ffffffff81040cd6>] ? dequeue_task_fair+0x256/0x260
[6130704.521860] [<ffffffff8103d91b>] ? finish_task_switch+0x5b/0xd0
[6130704.521864] [<ffffffff8105b900>] ? autoremove_wake_function+0x0/0x40
[6130704.521868] [<ffffffff8117ea8a>] kjournald2+0x11a/0x370
[6130704.521872] [<ffffffff8105b900>] ? autoremove_wake_function+0x0/0x40
[6130704.521875] [<ffffffff8117e970>] ? kjournald2+0x0/0x370
[6130704.521878] [<ffffffff8117e970>] ? kjournald2+0x0/0x370
[6130704.521881] [<ffffffff8105b4c6>] kthread+0x56/0x90
[6130704.521885] [<ffffffff8100cf6a>] child_rip+0xa/0x20
[6130704.521888] [<ffffffff8100c969>] ? restore_args+0x0/0x30
[6130704.521892] [<ffffffff8105b470>] ? kthread+0x0/0x90
[6130704.521895] [<ffffffff8100cf60>] ? child_rip+0x0/0x20
[6130704.521901] receptd D 000000000000002a 0 28746 2063
[6130704.521905] ffff8801d1b39c78 0000000000000082 ffff8801d1b39be8
ffffffff81152dd3
[6130704.521910] 0000000000010e80 000000000000c748 ffff88010014ce60
ffffffff81526360
[6130704.521914] ffff88010014d1d0 00000000810db183 000000025821c45c
0000000000000286
[6130704.521919] Call Trace:
[6130704.521922] [<ffffffff81152dd3>] ? dx_release+0x23/0x50
[6130704.521926] [<ffffffff813a9639>] schedule+0x9/0x20
[6130704.521930] [<ffffffff81177ea5>] start_this_handle+0x365/0x5d0
[6130704.521934] [<ffffffff8105b900>] ? autoremove_wake_function+0x0/0x40
[6130704.521937] [<ffffffff81178306>] jbd2_journal_start+0xa6/0xf0
[6130704.521942] [<ffffffff811593c5>] ext4_journal_start_sb+0x55/0x90
[6130704.521945] [<ffffffff81155441>] ext4_create+0x81/0x130
[6130704.521949] [<ffffffff810d23aa>] vfs_create+0x9a/0xb0
[6130704.521952] [<ffffffff810d522e>] do_filp_open+0x85e/0x990
[6130704.521956] [<ffffffff810d1d16>] ? getname+0x36/0x200
[6130704.521960] [<ffffffff810deada>] ? alloc_fd+0x4a/0x140
[6130704.521964] [<ffffffff810c5dfb>] do_sys_open+0x7b/0x110
[6130704.521967] [<ffffffff810c5ebb>] sys_open+0x1b/0x20
[6130704.521971] [<ffffffff8100bfab>] system_call_fastpath+0x16/0x1b
[6130704.521973] receptd D 000000000000002a 0 10643 2063
[6130704.521977] ffff8800155e9c78 0000000000000082 ffff8800155e9be8
ffffffff81152dd3
[6130704.521982] 0000000000010e80 000000000000c748 ffff880100149560
ffffffff81526360
[6130704.521986] ffff8801001498d0 00000000810db183 000000025adbe421
0000000000000286
[6130704.521991] Call Trace:
[6130704.521994] [<ffffffff81152dd3>] ? dx_release+0x23/0x50
[6130704.521998] [<ffffffff813a9639>] schedule+0x9/0x20
[6130704.522002] [<ffffffff81177ea5>] start_this_handle+0x365/0x5d0
[6130704.522005] [<ffffffff8105b900>] ? autoremove_wake_function+0x0/0x40
[6130704.522009] [<ffffffff81178306>] jbd2_journal_start+0xa6/0xf0
[6130704.522013] [<ffffffff811593c5>] ext4_journal_start_sb+0x55/0x90
[6130704.522016] [<ffffffff81155441>] ext4_create+0x81/0x130
[6130704.522020] [<ffffffff810d23aa>] vfs_create+0x9a/0xb0
[6130704.522023] [<ffffffff810d522e>] do_filp_open+0x85e/0x990
[6130704.522027] [<ffffffff810d1d16>] ? getname+0x36/0x200
[6130704.522030] [<ffffffff810deada>] ? alloc_fd+0x4a/0x140
[6130704.522033] [<ffffffff810c5dfb>] do_sys_open+0x7b/0x110
[6130704.522037] [<ffffffff810c5ebb>] sys_open+0x1b/0x20
[6130704.522040] [<ffffffff8100bfab>] system_call_fastpath+0x16/0x1b
[6130704.522043] receptd D 000000000000002a 0 3658 2063
[6130704.522047] ffff8801792adc78 0000000000000082 ffff8801792adbe8
ffffffff81152dd3
[6130704.522051] 0000000000010e80 000000000000c748 ffff88010014b900
ffffffff81526360
[6130704.522056] ffff88010014bc70 00000000810db183 000000025ddc053f
0000000000000286
[6130704.522060] Call Trace:
[6130704.522063] [<ffffffff81152dd3>] ? dx_release+0x23/0x50
[6130704.522067] [<ffffffff813a9639>] schedule+0x9/0x20
[6130704.522071] [<ffffffff81177ea5>] start_this_handle+0x365/0x5d0
[6130704.522075] [<ffffffff8105b900>] ? autoremove_wake_function+0x0/0x40
[6130704.522079] [<ffffffff81178306>] jbd2_journal_start+0xa6/0xf0
[6130704.522083] [<ffffffff811593c5>] ext4_journal_start_sb+0x55/0x90
[6130704.522086] [<ffffffff81155441>] ext4_create+0x81/0x130
[6130704.522089] [<ffffffff810d23aa>] vfs_create+0x9a/0xb0
[6130704.522093] [<ffffffff810d522e>] do_filp_open+0x85e/0x990
[6130704.522096] [<ffffffff810d1d16>] ? getname+0x36/0x200
[6130704.522099] [<ffffffff810deada>] ? alloc_fd+0x4a/0x140
[6130704.522103] [<ffffffff810c5dfb>] do_sys_open+0x7b/0x110
[6130704.522106] [<ffffffff810c5ebb>] sys_open+0x1b/0x20
[6130704.522109] [<ffffffff8100bfab>] system_call_fastpath+0x16/0x1b
[6130704.522112] receptd D ffff8801f4d04ba4 0 4716 2063
[6130704.522116] ffff880001f43e58 0000000000000082 ffffffff81095e00
ffff880001f43de8
[6130704.522120] 0000000000010e80 000000000000c748 ffff880100149c80
ffffffff81526360
[6130704.522125] ffff880100149ff0 0000000000060d51 000000025fd791a8
0000000000060d51
[6130704.522128] Call Trace:
[6130704.522128] [<ffffffff81095e00>] ? sync_page+0x0/0x60
[6130704.522128] [<ffffffff810b9c48>] ? delete_from_swap_cache+0x48/0x60
[6130704.522128] [<ffffffff810ba70b>] ? swap_info_get+0x6b/0xf0
[6130704.522128] [<ffffffff813a9d7b>] __mutex_lock_killable_slowpath+0xeb/0x1c0
[6130704.522128] [<ffffffff810d6c90>] ? filldir+0x0/0xe0
[6130704.522128] [<ffffffff813a9ea8>] mutex_lock_killable+0x58/0x70
[6130704.522128] [<ffffffff810d6ed2>] vfs_readdir+0x72/0xc0
[6130704.522128] [<ffffffff810d707b>] sys_getdents+0x8b/0xf0
[6130704.522128] [<ffffffff8100bfab>] system_call_fastpath+0x16/0x1b
[6130704.522128] receptd D ffff8801f4d04ba4 0 18596 2063
[6130704.522128] ffff88015a003e58 0000000000000082 ffff88021c9cb540
000000012905b025
[6130704.522128] 0000000000010e80 000000000000c748 ffff8801fb166ae0
ffff88021f843900
[6130704.522128] ffff8801fb166e50 0000000101c0ba80 000000026291bdc0
ffffe20006bcac40
[6130704.522128] Call Trace:
[6130704.522128] [<ffffffff813a9d7b>] __mutex_lock_killable_slowpath+0xeb/0x1c0
[6130704.522128] [<ffffffff810d6c90>] ? filldir+0x0/0xe0
[6130704.522128] [<ffffffff813a9ea8>] mutex_lock_killable+0x58/0x70
[6130704.522128] [<ffffffff810d6ed2>] vfs_readdir+0x72/0xc0
[6130704.522128] [<ffffffff810d707b>] sys_getdents+0x8b/0xf0
[6130704.522128] [<ffffffff8100bfab>] system_call_fastpath+0x16/0x1b
[6130704.522128] receptd D ffff8801f4d04ba4 0 569 2063
[6130704.522128] ffff8801c5705e58 0000000000000082 0000000000000000
ffff88021e7a1150
[6130704.522128] 0000000000010e80 000000000000c748 ffff8801fb161560
ffffffff81526360
[6130704.522128] ffff8801fb1618d0 0000000081580a80 00000002654bd5e6
0000000000000001
[6130704.522128] Call Trace:
[6130704.522128] [<ffffffff810b68b4>] ? anon_vma_prepare+0x34/0xf0
[6130704.522128] [<ffffffff813a9d7b>] __mutex_lock_killable_slowpath+0xeb/0x1c0
[6130704.522128] [<ffffffff810d6c90>] ? filldir+0x0/0xe0
[6130704.522128] [<ffffffff813a9ea8>] mutex_lock_killable+0x58/0x70
[6130704.522128] [<ffffffff810d6ed2>] vfs_readdir+0x72/0xc0
[6130704.522128] [<ffffffff810d707b>] sys_getdents+0x8b/0xf0
[6130704.522128] [<ffffffff8100bfab>] system_call_fastpath+0x16/0x1b
[6130704.522128] receptd D ffff8801f4d04ba4 0 15225 2063
[6130704.522128] ffff880013545e58 0000000000000082 0000000000000000
ffff88021e7a1150
[6130704.522128] 0000000000010e80 000000000000c748 ffff8801fb165ca0
ffffffff81526360
[6130704.522128] ffff8801fb166010 0000000081580a80 000000026805edc6
0000000000000001
[6130704.522128] Call Trace:
[6130704.522128] [<ffffffff810b68b4>] ? anon_vma_prepare+0x34/0xf0
[6130704.522128] [<ffffffff813a9d7b>] __mutex_lock_killable_slowpath+0xeb/0x1c0
[6130704.522128] [<ffffffff810d6c90>] ? filldir+0x0/0xe0
[6130704.522128] [<ffffffff813a9ea8>] mutex_lock_killable+0x58/0x70
[6130704.522128] [<ffffffff810d6ed2>] vfs_readdir+0x72/0xc0
[6130704.522128] [<ffffffff810d707b>] sys_getdents+0x8b/0xf0
[6130704.522128] [<ffffffff8100bfab>] system_call_fastpath+0x16/0x1b
[6130704.522128] receptd D ffff8801f4d04ba4 0 29613 2063
[6130704.522128] ffff88004071de58 0000000000000082 0000000000000000
ffff88021e7a1150
[6130704.522128] 0000000000010e80 000000000000c748 ffff8801fb1663c0
ffffffff81526360
[6130704.522128] ffff8801fb166730 0000000081580a80 000000026ac00606
0000000000000001
[6130704.522128] Call Trace:
[6130704.522128] [<ffffffff810b68b4>] ? anon_vma_prepare+0x34/0xf0
[6130704.522128] [<ffffffff813a9d7b>] __mutex_lock_killable_slowpath+0xeb/0x1c0
[6130704.522128] [<ffffffff810d6c90>] ? filldir+0x0/0xe0
[6130704.522128] [<ffffffff813a9ea8>] mutex_lock_killable+0x58/0x70
[6130704.522128] [<ffffffff810d6ed2>] vfs_readdir+0x72/0xc0
[6130704.522128] [<ffffffff810d707b>] sys_getdents+0x8b/0xf0
[6130704.522128] [<ffffffff8100bfab>] system_call_fastpath+0x16/0x1b
[6130704.522128] receptd D ffff8801f4d04ba4 0 4444 2063
[6130704.522128] ffff8800404e5e58 0000000000000082 0000000000000000
ffff88021e7a1150
[6130704.522128] 0000000000010e80 000000000000c748 ffff880040494020
ffffffff81526360
[6130704.522128] ffff880040494390 0000000081580a80 000000026b2c0f52
0000000000000001
[6130704.522128] Call Trace:
[6130704.522128] [<ffffffff810b68b4>] ? anon_vma_prepare+0x34/0xf0
[6130704.522128] [<ffffffff813a9d7b>] __mutex_lock_killable_slowpath+0xeb/0x1c0
[6130704.522128] [<ffffffff810d6c90>] ? filldir+0x0/0xe0
[6130704.522128] [<ffffffff813a9ea8>] mutex_lock_killable+0x58/0x70
[6130704.522128] [<ffffffff810d6ed2>] vfs_readdir+0x72/0xc0
[6130704.522128] [<ffffffff810d707b>] sys_getdents+0x8b/0xf0
[6130704.522128] [<ffffffff8100bfab>] system_call_fastpath+0x16/0x1b
[6130704.522128] drdbmake D ffff88021e4c7800 0 27019 13796
[6130704.522128] ffff8801d1bcda88 0000000000000082 ffff8801f4ce9bf0
ffff8801678b1380
[6130704.522128] 0000000000010e80 000000000000c748 ffff8800404963c0
ffffffff81526360
[6130704.522128] ffff880040496730 00000000f4ce9bf0 000000025819cebe
0000000000000282
[6130704.522128] Call Trace:
[6130704.522128] [<ffffffff813a9639>] schedule+0x9/0x20
[6130704.522128] [<ffffffff81177ea5>] start_this_handle+0x365/0x5d0
[6130704.522128] [<ffffffff8105b900>] ? autoremove_wake_function+0x0/0x40
[6130704.522128] [<ffffffff811781ce>] jbd2_journal_restart+0xbe/0x150
[6130704.522128] [<ffffffff8116243d>] ext4_ext_truncate+0x6dd/0xa20
[6130704.522128] [<ffffffff81095b3b>] ? find_get_pages+0x3b/0xf0
[6130704.522128] [<ffffffff81150a78>] ext4_truncate+0x198/0x680
[6130704.522128] [<ffffffff810ac984>] ? unmap_mapping_range+0x74/0x280
[6130704.522128] [<ffffffff811772c0>] ? jbd2_journal_stop+0x1e0/0x360
[6130704.522128] [<ffffffff810acd25>] vmtruncate+0xa5/0x110
[6130704.522128] [<ffffffff810dda10>] inode_setattr+0x30/0x180
[6130704.522128] [<ffffffff8114d073>] ext4_setattr+0x173/0x310
[6130704.522128] [<ffffffff810ddc79>] notify_change+0x119/0x330
[6130704.522128] [<ffffffff810c6df3>] do_truncate+0x63/0x90
[6130704.522128] [<ffffffff810d0cc3>] ? get_write_access+0x23/0x60
[6130704.522128] [<ffffffff810c70cb>] sys_truncate+0x17b/0x180
[6130704.522128] [<ffffffff8100bfab>] system_call_fastpath+0x16/0x1b
[6130704.522128] nodeserv D ffff88021eaedca8 0 17888 13796
[6130704.522128] ffff88021eaedbf8 0000000000000082 ffff8800280261e8
ffff88021e5b56e0
[6130704.522128] 0000000000010e80 000000000000c748 ffff880100148e40
ffffffff81526360
[6130704.522128] ffff8801001491b0 000000008119bd17 000000026d668c1b
0000000000000002
[6130704.522128] Call Trace:
[6130704.522128] [<ffffffffa0039291>] ? raid0_unplug+0x51/0x70 [raid0]
[6130704.522128] [<ffffffff813a9639>] schedule+0x9/0x20
[6130704.522128] [<ffffffff813a9687>] io_schedule+0x37/0x50
[6130704.522128] [<ffffffff81095e35>] sync_page+0x35/0x60
[6130704.522128] [<ffffffff81095e69>] sync_page_killable+0x9/0x50
[6130704.522128] [<ffffffff813a99d2>] __wait_on_bit_lock+0x52/0xb0
[6130704.522128] [<ffffffff81095e60>] ? sync_page_killable+0x0/0x50
[6130704.522128] [<ffffffff81095d74>] __lock_page_killable+0x64/0x70
[6130704.522128] [<ffffffff8105b940>] ? wake_bit_function+0x0/0x40
[6130704.522128] [<ffffffff81095c0b>] ? find_get_page+0x1b/0xb0
[6130704.522128] [<ffffffff81097908>] generic_file_aio_read+0x3b8/0x6b0
[6130704.522128] [<ffffffff810c7dc1>] do_sync_read+0xf1/0x140
[6130704.522128] [<ffffffff8106a5e8>] ? do_futex+0xb8/0xb20
[6130704.522128] [<ffffffff813ab78f>] ? _spin_unlock_irqrestore+0x2f/0x40
[6130704.522128] [<ffffffff8105b900>] ? autoremove_wake_function+0x0/0x40
[6130704.522128] [<ffffffff8105bc73>] ? add_wait_queue+0x43/0x60
[6130704.522128] [<ffffffff81062a6c>] ? getnstimeofday+0x5c/0xf0
[6130704.522128] [<ffffffff8105f049>] ? ktime_get_ts+0x59/0x60
[6130704.522128] [<ffffffff810c85b8>] vfs_read+0xc8/0x170
[6130704.522128] [<ffffffff810c86fa>] sys_pread64+0x9a/0xa0
[6130704.522128] [<ffffffff8100bfab>] system_call_fastpath+0x16/0x1b
[6130704.522128] Sched Debug Version: v0.09, 2.6.30.4 #1
[6130704.522128] now at 6130704522.576956 msecs
[6130704.522128] .jiffies : 10425371817
[6130704.522128] .sysctl_sched_latency : 40.000000
[6130704.522128] .sysctl_sched_min_granularity : 8.000000
[6130704.522128] .sysctl_sched_wakeup_granularity : 10.000000
[6130704.522128] .sysctl_sched_child_runs_first : 0.000001
[6130704.522128] .sysctl_sched_features : 113917
[6130704.522128]
[6130704.522128] cpu#0, 1866.760 MHz
[6130704.522128] .nr_running : 0
[6130704.522128] .load : 0
[6130704.522128] .nr_switches : 1088197559
[6130704.522128] .nr_load_updates : 1494088224
[6130704.522128] .nr_uninterruptible : -10143689
[6130704.522128] .next_balance : 10425.371927
[6130704.522128] .curr->pid : 0
[6130704.522128] .clock : 6130704522.001795
[6130704.522128] .cpu_load[0] : 0
[6130704.522128] .cpu_load[1] : 732
[6130704.522128] .cpu_load[2] : 1307
[6130704.522128] .cpu_load[3] : 1628
[6130704.522128] .cpu_load[4] : 2055
[6130704.522128] .yld_count : 62066
[6130704.522128] .sched_switch : 0
[6130704.522128] .sched_count : 1162815210
[6130704.522128] .sched_goidle : 486300434
[6130704.522128] .ttwu_count : 639575934
[6130704.522128] .ttwu_local : 567060963
[6130704.522128] .bkl_count : 643
[6130704.522128]
[6130704.522128] cfs_rq[0]:
[6130704.522128] .exec_clock : 1105652808.029192
[6130704.522128] .MIN_vruntime : 0.000001
[6130704.522128] .min_vruntime : 7063664391.526226
[6130704.522128] .max_vruntime : 0.000001
[6130704.522128] .spread : 0.000000
[6130704.522128] .spread0 : 0.000000
[6130704.522128] .nr_running : 0
[6130704.522128] .load : 0
[6130704.522128] .nr_spread_over : 8978
[6130704.522128]
[6130704.522128] rt_rq[0]:
[6130704.522128] .rt_nr_running : 0
[6130704.522128] .rt_throttled : 0
[6130704.522128] .rt_time : 0.000000
[6130704.522128] .rt_runtime : 950.000000
[6130704.522128]
[6130704.522128] runnable tasks:
[6130704.522128] task PID tree-key switches
prio exec-runtime sum-exec sum-sleep
[6130704.522128]
----------------------------------------------------------------------------------------------------------
[6130704.522128]
[6130704.522128] cpu#1, 1866.760 MHz
[6130704.522128] .nr_running : 1
[6130704.522128] .load : 1024
[6130704.522128] .nr_switches : 818082742
[6130704.522128] .nr_load_updates : 1586872071
[6130704.522128] .nr_uninterruptible : 10143702
[6130704.522128] .next_balance : 10425.371857
[6130704.522128] .curr->pid : 11137
[6130704.522128] .clock : 6130704521.459918
[6130704.522128] .cpu_load[0] : 1024
[6130704.522128] .cpu_load[1] : 514
[6130704.522128] .cpu_load[2] : 318
[6130704.522128] .cpu_load[3] : 299
[6130704.522128] .cpu_load[4] : 261
[6130704.522128] .yld_count : 283011
[6130704.522128] .sched_switch : 0
[6130704.522128] .sched_count : 947252346
[6130704.522128] .sched_goidle : 479113257
[6130704.522128] .ttwu_count : 422473135
[6130704.522128] .ttwu_local : 316765436
[6130704.522128] .bkl_count : 10
[6130704.522128]
[6130704.522128] cfs_rq[1]:
[6130704.522128] .exec_clock : 478641403.088793
[6130704.522128] .MIN_vruntime : 0.000001
[6130704.522128] .min_vruntime : 8051549362.514115
[6130704.522128] .max_vruntime : 0.000001
[6130704.522128] .spread : 0.000000
[6130704.522128] .spread0 : 987884970.987889
[6130704.522128] .nr_running : 1
[6130704.522128] .load : 1024
[6130704.522128] .nr_spread_over : 11703
[6130704.522128]
[6130704.522128] rt_rq[1]:
[6130704.522128] .rt_nr_running : 0
[6130704.522128] .rt_throttled : 0
[6130704.522128] .rt_time : 0.000000
[6130704.522128] .rt_runtime : 950.000000
[6130704.522128]
[6130704.522128] runnable tasks:
[6130704.522128] task PID tree-key switches
prio exec-runtime sum-exec sum-sleep
[6130704.522128]
----------------------------------------------------------------------------------------------------------
[6130704.522128] R bash 11137 8051549322.514115 272
120 8051549322.514115 35.937180 689280132.211960
[6130704.522128]


Thanks,
Fredrik

On Wed, Oct 21, 2009 at 4:20 AM, Eric Sandeen <[email protected]> wrote:
> Fredrik Andersson wrote:
>>
>> Hi, here is the data for this process:
>
> Including all of the processes in D state (everything reported by sysrq-w)
> would probably be most helpful.
>
> Feel free to file an ext4 bug on bugzilla.kernel.org w/ this information,
> too, so it doesn't get lost in busy schedules ...
>
> Thanks,
> -Eric
>
>> [5958816.744013] drdbmake      D ffff88021e4c7800     0 27019  13796
>> [5958816.744013]  ffff8801d1bcda88 0000000000000082 ffff8801f4ce9bf0 ffff8801678b1380
>> [5958816.744013]  0000000000010e80 000000000000c748 ffff8800404963c0 ffffffff81526360
>> [5958816.744013]  ffff880040496730 00000000f4ce9bf0 000000025819cebe 0000000000000282
>> [5958816.744013] Call Trace:
>> [5958816.744013]  [<ffffffff813a9639>] schedule+0x9/0x20
>> [5958816.744013]  [<ffffffff81177ea5>] start_this_handle+0x365/0x5d0
>> [5958816.744013]  [<ffffffff8105b900>] ? autoremove_wake_function+0x0/0x40
>> [5958816.744013]  [<ffffffff811781ce>] jbd2_journal_restart+0xbe/0x150
>> [5958816.744013]  [<ffffffff8116243d>] ext4_ext_truncate+0x6dd/0xa20
>> [5958816.744013]  [<ffffffff81095b3b>] ? find_get_pages+0x3b/0xf0
>> [5958816.744013]  [<ffffffff81150a78>] ext4_truncate+0x198/0x680
>> [5958816.744013]  [<ffffffff810ac984>] ? unmap_mapping_range+0x74/0x280
>> [5958816.744013]  [<ffffffff811772c0>] ? jbd2_journal_stop+0x1e0/0x360
>> [5958816.744013]  [<ffffffff810acd25>] vmtruncate+0xa5/0x110
>> [5958816.744013]  [<ffffffff810dda10>] inode_setattr+0x30/0x180
>> [5958816.744013]  [<ffffffff8114d073>] ext4_setattr+0x173/0x310
>> [5958816.744013]  [<ffffffff810ddc79>] notify_change+0x119/0x330
>> [5958816.744013]  [<ffffffff810c6df3>] do_truncate+0x63/0x90
>> [5958816.744013]  [<ffffffff810d0cc3>] ? get_write_access+0x23/0x60
>> [5958816.744013]  [<ffffffff810c70cb>] sys_truncate+0x17b/0x180
>> [5958816.744013]  [<ffffffff8100bfab>] system_call_fastpath+0x16/0x1b
>>
>> Don't know if this has anything to do with it, but I also noticed
>> that another process of mine, which is working just fine, is executing
>> a suspicious looking function called raid0_unplug.
>> It operates on the same raid0/ext4 filesystem as the hung process. I
>> include the calltrace for it here too:
>>
>> [5958816.744013] nodeserv      D ffff880167bd7ca8     0 17900  13796
>> [5958816.744013]  ffff880167bd7bf8 0000000000000082 ffff88002800a588 ffff88021e5b56e0
>> [5958816.744013]  0000000000010e80 000000000000c748 ffff880100664020 ffffffff81526360
>> [5958816.744013]  ffff880100664390 000000008119bd17 000000026327bfa9 0000000000000002
>> [5958816.744013] Call Trace:
>> [5958816.744013]  [<ffffffffa0039291>] ? raid0_unplug+0x51/0x70 [raid0]
>> [5958816.744013]  [<ffffffff813a9639>] schedule+0x9/0x20
>> [5958816.744013]  [<ffffffff813a9687>] io_schedule+0x37/0x50
>> [5958816.744013]  [<ffffffff81095e35>] sync_page+0x35/0x60
>> [5958816.744013]  [<ffffffff81095e69>] sync_page_killable+0x9/0x50
>> [5958816.744013]  [<ffffffff813a99d2>] __wait_on_bit_lock+0x52/0xb0
>> [5958816.744013]  [<ffffffff81095e60>] ? sync_page_killable+0x0/0x50
>> [5958816.744013]  [<ffffffff81095d74>] __lock_page_killable+0x64/0x70
>> [5958816.744013]  [<ffffffff8105b940>] ? wake_bit_function+0x0/0x40
>> [5958816.744013]  [<ffffffff81095c0b>] ? find_get_page+0x1b/0xb0
>> [5958816.744013]  [<ffffffff81097908>] generic_file_aio_read+0x3b8/0x6b0
>> [5958816.744013]  [<ffffffff810c7dc1>] do_sync_read+0xf1/0x140
>> [5958816.744013]  [<ffffffff8106a5e8>] ? do_futex+0xb8/0xb20
>> [5958816.744013]  [<ffffffff813ab78f>] ? _spin_unlock_irqrestore+0x2f/0x40
>> [5958816.744013]  [<ffffffff8105b900>] ? autoremove_wake_function+0x0/0x40
>> [5958816.744013]  [<ffffffff8105bc73>] ? add_wait_queue+0x43/0x60
>> [5958816.744013]  [<ffffffff81062a6c>] ? getnstimeofday+0x5c/0xf0
>> [5958816.744013]  [<ffffffff810c85b8>] vfs_read+0xc8/0x170
>> [5958816.744013]  [<ffffffff810c86fa>] sys_pread64+0x9a/0xa0
>> [5958816.744013]  [<ffffffff8100bfab>] system_call_fastpath+0x16/0x1b
>>
>> Hope this makes sense to anyone, and please let me know if there is
>> more info I can provide.
>>
>> /Fredrik
>>
>> On Sun, Oct 18, 2009 at 5:57 PM, Eric Sandeen <[email protected]> wrote:
>>>
>>> Fredrik Andersson wrote:
>>>>
>>>> Hi, I'd like to report what I'm fairly certain is an ext4 bug. I hope
>>>> this is the right place to do so.
>>>>
>>>> My program creates a big file (around 30 GB) with posix_fallocate (to
>>>> utilize extents), fills it with data and uses ftruncate to crop it to
>>>> its final size (usually somewhere between 20 and 25 GB).
>>>> The problem is that in around 5% of the cases, the program locks up
>>>> completely in a syscall. The process can thus not be killed even with
>>>> kill -9, and a reboot is all that will do.
>>>
>>> does echo w > /proc/sysrq-trigger (this does sleeping processes; or use
>>> echo t for all processes) show you where the stuck threads are?
>>>
>>> -Eric
>>>
>
>

2009-10-21 22:48:09

by Mingming Cao

Subject: Re: Fwd: Ext4 bug with fallocate

On Tue, 2009-10-20 at 18:49 +0200, Fredrik Andersson wrote:
> I found the following post to the ext4 list. This seems to fit my
> experienced problems pretty exactly.
>
> http://osdir.com/ml/linux-ext4/2009-08/msg00184.html
>
> Is it the same problem?
>

The link you provided is about a race between restarting a transaction
from truncate while another process is doing something like block
allocation to the same file. Do you have other threads allocating
blocks while you are truncating?
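
Just to illustrate what I mean, the race needs a pattern roughly like
the sketch below, where one thread keeps allocating blocks by writing
into the file while another thread truncates the same inode (file name
and sizes are made up, and it is only a sketch, not your code):

#include <fcntl.h>
#include <pthread.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

static int fd;

/* Keeps allocating new blocks by writing further and further into the file. */
static void *writer(void *arg)
{
    (void)arg;
    char buf[4096];
    memset(buf, 0xab, sizeof(buf));
    for (off_t off = 0; off < (off_t)1 << 30; off += (off_t)sizeof(buf))
        pwrite(fd, buf, sizeof(buf), off);
    return NULL;
}

int main(void)
{
    fd = open("testfile", O_RDWR | O_CREAT, 0644);
    if (fd < 0) { perror("open"); return 1; }

    pthread_t t;
    pthread_create(&t, NULL, writer, NULL);

    /* Truncate the same inode while the writer thread is still allocating. */
    ftruncate(fd, 100L * 1024 * 1024);

    pthread_join(t, NULL);
    close(fd);
    return 0;
}

If nothing in your program does anything like the writer thread above
while the truncate runs, then it is probably not that exact race.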

> /Fredrik
>
> On Mon, Oct 19, 2009 at 11:49 AM, Fredrik Andersson <[email protected]> wrote:
> > Hi, here is the data for this process:
> >
> > [5958816.744013] drdbmake D ffff88021e4c7800 0 27019 13796
> > [5958816.744013] ffff8801d1bcda88 0000000000000082 ffff8801f4ce9bf0
> > ffff8801678b1380
> > [5958816.744013] 0000000000010e80 000000000000c748 ffff8800404963c0
> > ffffffff81526360
> > [5958816.744013] ffff880040496730 00000000f4ce9bf0 000000025819cebe
> > 0000000000000282
> > [5958816.744013] Call Trace:
> > [5958816.744013] [<ffffffff813a9639>] schedule+0x9/0x20
> > [5958816.744013] [<ffffffff81177ea5>] start_this_handle+0x365/0x5d0
> > [5958816.744013] [<ffffffff8105b900>] ? autoremove_wake_function+0x0/
> > 0x40
> > [5958816.744013] [<ffffffff811781ce>] jbd2_journal_restart+0xbe/0x150
> > [5958816.744013] [<ffffffff8116243d>] ext4_ext_truncate+0x6dd/0xa20
> > [5958816.744013] [<ffffffff81095b3b>] ? find_get_pages+0x3b/0xf0
> > [5958816.744013] [<ffffffff81150a78>] ext4_truncate+0x198/0x680
> > [5958816.744013] [<ffffffff810ac984>] ? unmap_mapping_range+0x74/0x280
> > [5958816.744013] [<ffffffff811772c0>] ? jbd2_journal_stop+0x1e0/0x360
> > [5958816.744013] [<ffffffff810acd25>] vmtruncate+0xa5/0x110
> > [5958816.744013] [<ffffffff810dda10>] inode_setattr+0x30/0x180
> > [5958816.744013] [<ffffffff8114d073>] ext4_setattr+0x173/0x310
> > [5958816.744013] [<ffffffff810ddc79>] notify_change+0x119/0x330
> > [5958816.744013] [<ffffffff810c6df3>] do_truncate+0x63/0x90
> > [5958816.744013] [<ffffffff810d0cc3>] ? get_write_access+0x23/0x60
> > [5958816.744013] [<ffffffff810c70cb>] sys_truncate+0x17b/0x180
> > [5958816.744013] [<ffffffff8100bfab>] system_call_fastpath+0x16/0x1b
> >
> > Don't know if this has anything to do with it, but I also noticed
> > that another process of mine,
> > which is working just fine, is executing a suspicious looking function
> > called raid0_unplug.
> > It operates on the same raid0/ext4 filesystem as the hung process. I
> > include the calltrace for it here too:
> >
> > [5958816.744013] nodeserv D ffff880167bd7ca8 0 17900 13796
> > [5958816.744013] ffff880167bd7bf8 0000000000000082 ffff88002800a588
> > ffff88021e5b56e0
> > [5958816.744013] 0000000000010e80 000000000000c748 ffff880100664020
> > ffffffff81526360
> > [5958816.744013] ffff880100664390 000000008119bd17 000000026327bfa9
> > 0000000000000002
> > [5958816.744013] Call Trace:
> > [5958816.744013] [<ffffffffa0039291>] ? raid0_unplug+0x51/0x70 [raid0]
> > [5958816.744013] [<ffffffff813a9639>] schedule+0x9/0x20
> > [5958816.744013] [<ffffffff813a9687>] io_schedule+0x37/0x50
> > [5958816.744013] [<ffffffff81095e35>] sync_page+0x35/0x60
> > [5958816.744013] [<ffffffff81095e69>] sync_page_killable+0x9/0x50
> > [5958816.744013] [<ffffffff813a99d2>] __wait_on_bit_lock+0x52/0xb0
> > [5958816.744013] [<ffffffff81095e60>] ? sync_page_killable+0x0/0x50
> > [5958816.744013] [<ffffffff81095d74>] __lock_page_killable+0x64/0x70
> > [5958816.744013] [<ffffffff8105b940>] ? wake_bit_function+0x0/0x40
> > [5958816.744013] [<ffffffff81095c0b>] ? find_get_page+0x1b/0xb0
> > [5958816.744013] [<ffffffff81097908>] generic_file_aio_read+0x3b8/0x6b0
> > [5958816.744013] [<ffffffff810c7dc1>] do_sync_read+0xf1/0x140
> > [5958816.744013] [<ffffffff8106a5e8>] ? do_futex+0xb8/0xb20
> > [5958816.744013] [<ffffffff813ab78f>] ? _spin_unlock_irqrestore+0x2f/0x40
> > [5958816.744013] [<ffffffff8105b900>] ? autoremove_wake_function+0x0/0x40
> > [5958816.744013] [<ffffffff8105bc73>] ? add_wait_queue+0x43/0x60
> > [5958816.744013] [<ffffffff81062a6c>] ? getnstimeofday+0x5c/0xf0
> > [5958816.744013] [<ffffffff810c85b8>] vfs_read+0xc8/0x170
> > [5958816.744013] [<ffffffff810c86fa>] sys_pread64+0x9a/0xa0
> > [5958816.744013] [<ffffffff8100bfab>] system_call_fastpath+0x16/0x1b
> >

This stack looks to me like the thread is doing I/O that never comes back.

> > Hope this makes sense to anyone, and please let me know if there is
> > more info I can provide.
> >
> > /Fredrik
> >
> > On Sun, Oct 18, 2009 at 5:57 PM, Eric Sandeen <[email protected]> wrote:
> >>
> >> Fredrik Andersson wrote:
> >>>
> >>> Hi, I'd like to report what I'm fairly certain is an ext4 bug. I hope
> >>> this is the right place to do so.
> >>>
> >>> My program creates a big file (around 30 GB) with posix_fallocate (to
> >>> utilize extents), fills it with data and uses ftruncate to crop it to
> >>> its final size (usually somewhere between 20 and 25 GB).
> >>> The problem is that in around 5% of the cases, the program locks up
> >>> completely in a syscall. The process can thus not be killed even with
> >>> kill -9, and a reboot is all that will do.
> >>
> >> does echo w > /proc/sysrq-trigger (this does sleeping processes; or use echo t for all processes) show you where the stuck threads are?
> >>
> >> -Eric
> >>
> >



2009-10-22 07:37:06

by Fredrik Andersson

Subject: Re: Fwd: Ext4 bug with fallocate

> The link you provided about is related to a race between restart a
> transaction from truncate, and the other process is doing something like
> block allocation to the same file. Do you have another threads
> allocating blocks while you are truncating?

No, there is only a single thread in this program. No other thread or
process is accessing the same file or file descriptor.

/Fredrik

2009-10-26 10:43:03

by Fredrik Andersson

Subject: Re: Fwd: Ext4 bug with fallocate

I had the same error again, on another machine. I'm posting the call
stacks of the hung processes here too.
Hope someone can figure this out.

[6515054.873144] INFO: task pdflush:269 blocked for more than 120 seconds.
[6515054.873148] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
[6515054.873151] pdflush D ffff8801f4cf0ad4 0 269 2
[6515054.873156] ffff88021e551960 0000000000000046 ffff88021e63d6e0
ffff88007ddd1540
[6515054.873161] 0000000000010e80 000000000000c748 ffff88021fa2c020
ffff88021f843900
[6515054.873166] ffff88021fa2c390 0000000100000008 00000002844bfcb2
0080001000800013
[6515054.873170] Call Trace:
[6515054.873180] [<ffffffff813a9639>] schedule+0x9/0x20
[6515054.873184] [<ffffffff813ab2d5>] __down_read+0x85/0xb7
[6515054.873187] [<ffffffff813aa565>] down_read+0x25/0x30
[6515054.873192] [<ffffffff8114e6f2>] ext4_get_blocks_wrap+0x52/0x290
[6515054.873196] [<ffffffff8114f601>] mpage_da_map_blocks+0xd1/0x7a0
[6515054.873202] [<ffffffff8109d12c>] ? write_cache_pages+0x2fc/0x470
[6515054.873205] [<ffffffff81150330>] ? __mpage_da_writepage+0x0/0x180
[6515054.873209] [<ffffffff8114ffdb>] ext4_da_writepages+0x30b/0x590
[6515054.873213] [<ffffffff8103d91b>] ? finish_task_switch+0x5b/0xd0
[6515054.873217] [<ffffffff8109d2f8>] do_writepages+0x28/0x50
[6515054.873222] [<ffffffff810e60fb>] __writeback_single_inode+0x9b/0x3e0
[6515054.873226] [<ffffffff8104f830>] ? process_timeout+0x0/0x10
[6515054.873230] [<ffffffff813a8827>] ? io_schedule_timeout+0x37/0x50
[6515054.873234] [<ffffffff810e68ec>] generic_sync_sb_inodes+0x36c/0x440
[6515054.873237] [<ffffffff810e6b8e>] writeback_inodes+0x5e/0x100
[6515054.873241] [<ffffffff8109d4ac>] wb_kupdate+0xbc/0x140
[6515054.873244] [<ffffffff8109e6a8>] pdflush+0x118/0x220
[6515054.873248] [<ffffffff8109d3f0>] ? wb_kupdate+0x0/0x140
[6515054.873251] [<ffffffff8109e590>] ? pdflush+0x0/0x220
[6515054.873255] [<ffffffff8105b4c6>] kthread+0x56/0x90
[6515054.873259] [<ffffffff8100cf6a>] child_rip+0xa/0x20
[6515054.873263] [<ffffffff8100c969>] ? restore_args+0x0/0x30
[6515054.873266] [<ffffffff8105b470>] ? kthread+0x0/0x90
[6515054.873270] [<ffffffff8100cf60>] ? child_rip+0x0/0x20
[6515054.873275] INFO: task kjournald2:1334 blocked for more than 120 seconds.
[6515054.873277] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
[6515054.873279] kjournald2 D ffff88021e731e28 0 1334 2
[6515054.873283] ffff88021e731d10 0000000000000046 ffff88021e731ca0
ffffffff813a9b3b
[6515054.873288] 0000000000010e80 000000000000c748 ffff88021f9c5580
ffff88021f843900
[6515054.873292] ffff88021f9c58f0 0000000100000282 00000002844c0480
0000000000000282
[6515054.873296] Call Trace:
[6515054.873300] [<ffffffff813a9b3b>] ? __wait_on_bit+0x7b/0x90
[6515054.873303] [<ffffffff813a9639>] schedule+0x9/0x20
[6515054.873308] [<ffffffff81178ebe>]
jbd2_journal_commit_transaction+0x1be/0x1810
[6515054.873312] [<ffffffff81040cd6>] ? dequeue_task_fair+0x256/0x260
[6515054.873315] [<ffffffff8103d91b>] ? finish_task_switch+0x5b/0xd0
[6515054.873320] [<ffffffff8105b900>] ? autoremove_wake_function+0x0/0x40
[6515054.873324] [<ffffffff8117ea8a>] kjournald2+0x11a/0x370
[6515054.873327] [<ffffffff8105b900>] ? autoremove_wake_function+0x0/0x40
[6515054.873330] [<ffffffff8117e970>] ? kjournald2+0x0/0x370
[6515054.873333] [<ffffffff8117e970>] ? kjournald2+0x0/0x370
[6515054.873337] [<ffffffff8105b4c6>] kthread+0x56/0x90
[6515054.873340] [<ffffffff8100cf6a>] child_rip+0xa/0x20
[6515054.873344] [<ffffffff8100c969>] ? restore_args+0x0/0x30
[6515054.873347] [<ffffffff8105b470>] ? kthread+0x0/0x90
[6515054.873351] [<ffffffff8100cf60>] ? child_rip+0x0/0x20
[6515054.873361] INFO: task drdbmake:3919 blocked for more than 120 seconds.
[6515054.873363] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
[6515054.873365] drdbmake D ffff88021e56f000 0 3919 27107
[6515054.873369] ffff88021eb4da88 0000000000000082 ffff8801f4cf0ae8
ffff880008ba03f0
[6515054.873374] 0000000000010e80 000000000000c748 ffff8801049063c0
ffff88021f843900
[6515054.873378] ffff880104906730 00000001f4cf0ae8 00000002844c0480
0000000000000282
[6515054.873382] Call Trace:
[6515054.873386] [<ffffffff813a9639>] schedule+0x9/0x20
[6515054.873390] [<ffffffff81177ea5>] start_this_handle+0x365/0x5d0
[6515054.873394] [<ffffffff8105b900>] ? autoremove_wake_function+0x0/0x40
[6515054.873397] [<ffffffff811781ce>] jbd2_journal_restart+0xbe/0x150
[6515054.873402] [<ffffffff8116243d>] ext4_ext_truncate+0x6dd/0xa20
[6515054.873406] [<ffffffff81095b3b>] ? find_get_pages+0x3b/0xf0
[6515054.873409] [<ffffffff81150a78>] ext4_truncate+0x198/0x680
[6515054.873414] [<ffffffff810ac984>] ? unmap_mapping_range+0x74/0x280
[6515054.873418] [<ffffffff811772c0>] ? jbd2_journal_stop+0x1e0/0x360
[6515054.873422] [<ffffffff810acd25>] vmtruncate+0xa5/0x110
[6515054.873426] [<ffffffff810dda10>] inode_setattr+0x30/0x180
[6515054.873430] [<ffffffff8114d073>] ext4_setattr+0x173/0x310
[6515054.873433] [<ffffffff810ddc79>] notify_change+0x119/0x330
[6515054.873438] [<ffffffff810c6df3>] do_truncate+0x63/0x90
[6515054.873441] [<ffffffff810d0cc3>] ? get_write_access+0x23/0x60
[6515054.873445] [<ffffffff810c70cb>] sys_truncate+0x17b/0x180
[6515054.873448] [<ffffffff8100bfab>] system_call_fastpath+0x16/0x1b
[6515174.873066] INFO: task pdflush:269 blocked for more than 120 seconds.
[6515174.873070] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
[6515174.873074] pdflush D ffff8801f4cf0ad4 0 269 2
[6515174.873079] ffff88021e551960 0000000000000046 ffff88021e63d6e0
ffff88007ddd1540
[6515174.873085] 0000000000010e80 000000000000c748 ffff88021fa2c020
ffff88021f843900
[6515174.873090] ffff88021fa2c390 0000000100000008 00000002844bfcb2
0080001000800013
[6515174.873096] Call Trace:
[6515174.873106] [<ffffffff813a9639>] schedule+0x9/0x20
[6515174.873110] [<ffffffff813ab2d5>] __down_read+0x85/0xb7
[6515174.873115] [<ffffffff813aa565>] down_read+0x25/0x30
[6515174.873120] [<ffffffff8114e6f2>] ext4_get_blocks_wrap+0x52/0x290
[6515174.873124] [<ffffffff8114f601>] mpage_da_map_blocks+0xd1/0x7a0
[6515174.873138] [<ffffffff8109d12c>] ? write_cache_pages+0x2fc/0x470
[6515174.873146] [<ffffffff81150330>] ? __mpage_da_writepage+0x0/0x180
[6515174.873155] [<ffffffff8114ffdb>] ext4_da_writepages+0x30b/0x590
[6515174.873164] [<ffffffff8103d91b>] ? finish_task_switch+0x5b/0xd0
[6515174.873172] [<ffffffff8109d2f8>] do_writepages+0x28/0x50
[6515174.873181] [<ffffffff810e60fb>] __writeback_single_inode+0x9b/0x3e0
[6515174.873190] [<ffffffff8104f830>] ? process_timeout+0x0/0x10
[6515174.873197] [<ffffffff813a8827>] ? io_schedule_timeout+0x37/0x50
[6515174.873206] [<ffffffff810e68ec>] generic_sync_sb_inodes+0x36c/0x440
[6515174.873214] [<ffffffff810e6b8e>] writeback_inodes+0x5e/0x100
[6515174.873222] [<ffffffff8109d4ac>] wb_kupdate+0xbc/0x140
[6515174.873230] [<ffffffff8109e6a8>] pdflush+0x118/0x220
[6515174.873238] [<ffffffff8109d3f0>] ? wb_kupdate+0x0/0x140
[6515174.873245] [<ffffffff8109e590>] ? pdflush+0x0/0x220
[6515174.873253] [<ffffffff8105b4c6>] kthread+0x56/0x90
[6515174.873261] [<ffffffff8100cf6a>] child_rip+0xa/0x20
[6515174.873269] [<ffffffff8100c969>] ? restore_args+0x0/0x30
[6515174.873277] [<ffffffff8105b470>] ? kthread+0x0/0x90
[6515174.873284] [<ffffffff8100cf60>] ? child_rip+0x0/0x20
[6515174.873303] INFO: task kjournald2:1334 blocked for more than 120 seconds.
[6515174.873308] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
[6515174.873314] kjournald2 D ffff88021e731e28 0 1334 2
[6515174.873322] ffff88021e731d10 0000000000000046 ffff88021e731ca0
ffffffff813a9b3b
[6515174.873333] 0000000000010e80 000000000000c748 ffff88021f9c5580
ffff88021f843900
[6515174.873344] ffff88021f9c58f0 0000000100000282 00000002844c0480
0000000000000282
[6515174.873355] Call Trace:
[6515174.873360] [<ffffffff813a9b3b>] ? __wait_on_bit+0x7b/0x90
[6515174.873367] [<ffffffff813a9639>] schedule+0x9/0x20
[6515174.873375] [<ffffffff81178ebe>]
jbd2_journal_commit_transaction+0x1be/0x1810
[6515174.873382] [<ffffffff81040cd6>] ? dequeue_task_fair+0x256/0x260
[6515174.873389] [<ffffffff8103d91b>] ? finish_task_switch+0x5b/0xd0
[6515174.873396] [<ffffffff8105b900>] ? autoremove_wake_function+0x0/0x40
[6515174.873404] [<ffffffff8117ea8a>] kjournald2+0x11a/0x370
[6515174.873411] [<ffffffff8105b900>] ? autoremove_wake_function+0x0/0x40
[6515174.873418] [<ffffffff8117e970>] ? kjournald2+0x0/0x370
[6515174.873424] [<ffffffff8117e970>] ? kjournald2+0x0/0x370
[6515174.873431] [<ffffffff8105b4c6>] kthread+0x56/0x90
[6515174.873437] [<ffffffff8100cf6a>] child_rip+0xa/0x20
[6515174.873444] [<ffffffff8100c969>] ? restore_args+0x0/0x30
[6515174.873450] [<ffffffff8105b470>] ? kthread+0x0/0x90
[6515174.873457] [<ffffffff8100cf60>] ? child_rip+0x0/0x20
[6515174.873470] INFO: task drdbmake:3919 blocked for more than 120 seconds.
[6515174.873475] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
[6515174.873480] drdbmake D ffff88021e56f000 0 3919 27107
[6515174.873489] ffff88021eb4da88 0000000000000082 ffff8801f4cf0ae8
ffff880008ba03f0
[6515174.873499] 0000000000010e80 000000000000c748 ffff8801049063c0
ffff88021f843900
[6515174.873510] ffff880104906730 00000001f4cf0ae8 00000002844c0480
0000000000000282
[6515174.873521] Call Trace:
[6515174.873526] [<ffffffff813a9639>] schedule+0x9/0x20
[6515174.873534] [<ffffffff81177ea5>] start_this_handle+0x365/0x5d0
[6515174.873541] [<ffffffff8105b900>] ? autoremove_wake_function+0x0/0x40
[6515174.873548] [<ffffffff811781ce>] jbd2_journal_restart+0xbe/0x150
[6515174.873556] [<ffffffff8116243d>] ext4_ext_truncate+0x6dd/0xa20
[6515174.873563] [<ffffffff81095b3b>] ? find_get_pages+0x3b/0xf0
[6515174.873569] [<ffffffff81150a78>] ext4_truncate+0x198/0x680
[6515174.873578] [<ffffffff810ac984>] ? unmap_mapping_range+0x74/0x280
[6515174.873585] [<ffffffff811772c0>] ? jbd2_journal_stop+0x1e0/0x360
[6515174.873592] [<ffffffff810acd25>] vmtruncate+0xa5/0x110
[6515174.873600] [<ffffffff810dda10>] inode_setattr+0x30/0x180
[6515174.873607] [<ffffffff8114d073>] ext4_setattr+0x173/0x310
[6515174.873614] [<ffffffff810ddc79>] notify_change+0x119/0x330
[6515174.873621] [<ffffffff810c6df3>] do_truncate+0x63/0x90
[6515174.873628] [<ffffffff810d0cc3>] ? get_write_access+0x23/0x60
[6515174.873635] [<ffffffff810c70cb>] sys_truncate+0x17b/0x180
[6515174.873642] [<ffffffff8100bfab>] system_call_fastpath+0x16/0x1b
[6515294.873034] INFO: task pdflush:269 blocked for more than 120 seconds.
[6515294.873038] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
[6515294.873041] pdflush D ffff8801f4cf0ad4 0 269 2
[6515294.873046] ffff88021e551960 0000000000000046 ffff88021e63d6e0
ffff88007ddd1540
[6515294.873053] 0000000000010e80 000000000000c748 ffff88021fa2c020
ffff88021f843900
[6515294.873058] ffff88021fa2c390 0000000100000008 00000002844bfcb2
0080001000800013
[6515294.873063] Call Trace:
[6515294.873073] [<ffffffff813a9639>] schedule+0x9/0x20
[6515294.873078] [<ffffffff813ab2d5>] __down_read+0x85/0xb7
[6515294.873082] [<ffffffff813aa565>] down_read+0x25/0x30
[6515294.873088] [<ffffffff8114e6f2>] ext4_get_blocks_wrap+0x52/0x290
[6515294.873092] [<ffffffff8114f601>] mpage_da_map_blocks+0xd1/0x7a0
[6515294.873099] [<ffffffff8109d12c>] ? write_cache_pages+0x2fc/0x470
[6515294.873103] [<ffffffff81150330>] ? __mpage_da_writepage+0x0/0x180
[6515294.873107] [<ffffffff8114ffdb>] ext4_da_writepages+0x30b/0x590
[6515294.873112] [<ffffffff8103d91b>] ? finish_task_switch+0x5b/0xd0
[6515294.873117] [<ffffffff8109d2f8>] do_writepages+0x28/0x50
[6515294.873123] [<ffffffff810e60fb>] __writeback_single_inode+0x9b/0x3e0
[6515294.873136] [<ffffffff8104f830>] ? process_timeout+0x0/0x10
[6515294.873144] [<ffffffff813a8827>] ? io_schedule_timeout+0x37/0x50
[6515294.873153] [<ffffffff810e68ec>] generic_sync_sb_inodes+0x36c/0x440
[6515294.873161] [<ffffffff810e6b8e>] writeback_inodes+0x5e/0x100
[6515294.873169] [<ffffffff8109d4ac>] wb_kupdate+0xbc/0x140
[6515294.873177] [<ffffffff8109e6a8>] pdflush+0x118/0x220
[6515294.873185] [<ffffffff8109d3f0>] ? wb_kupdate+0x0/0x140
[6515294.873192] [<ffffffff8109e590>] ? pdflush+0x0/0x220
[6515294.873200] [<ffffffff8105b4c6>] kthread+0x56/0x90
[6515294.873209] [<ffffffff8100cf6a>] child_rip+0xa/0x20
[6515294.873217] [<ffffffff8100c969>] ? restore_args+0x0/0x30
[6515294.873224] [<ffffffff8105b470>] ? kthread+0x0/0x90
[6515294.873232] [<ffffffff8100cf60>] ? child_rip+0x0/0x20
[6515294.873241] INFO: task kjournald2:1334 blocked for more than 120 seconds.
[6515294.873247] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
[6515294.873253] kjournald2 D ffff88021e731e28 0 1334 2
[6515294.873273] ffff88021e731d10 0000000000000046 ffff88021e731ca0
ffffffff813a9b3b
[6515294.873284] 0000000000010e80 000000000000c748 ffff88021f9c5580
ffff88021f843900
[6515294.873294] ffff88021f9c58f0 0000000100000282 00000002844c0480
0000000000000282
[6515294.873305] Call Trace:
[6515294.873311] [<ffffffff813a9b3b>] ? __wait_on_bit+0x7b/0x90
[6515294.873318] [<ffffffff813a9639>] schedule+0x9/0x20
[6515294.873326] [<ffffffff81178ebe>]
jbd2_journal_commit_transaction+0x1be/0x1810
[6515294.873334] [<ffffffff81040cd6>] ? dequeue_task_fair+0x256/0x260
[6515294.873341] [<ffffffff8103d91b>] ? finish_task_switch+0x5b/0xd0
[6515294.873349] [<ffffffff8105b900>] ? autoremove_wake_function+0x0/0x40
[6515294.873356] [<ffffffff8117ea8a>] kjournald2+0x11a/0x370
[6515294.873363] [<ffffffff8105b900>] ? autoremove_wake_function+0x0/0x40
[6515294.873369] [<ffffffff8117e970>] ? kjournald2+0x0/0x370
[6515294.873375] [<ffffffff8117e970>] ? kjournald2+0x0/0x370
[6515294.873382] [<ffffffff8105b4c6>] kthread+0x56/0x90
[6515294.873389] [<ffffffff8100cf6a>] child_rip+0xa/0x20
[6515294.873395] [<ffffffff8100c969>] ? restore_args+0x0/0x30
[6515294.873402] [<ffffffff8105b470>] ? kthread+0x0/0x90
[6515294.873408] [<ffffffff8100cf60>] ? child_rip+0x0/0x20
[6515294.873418] INFO: task receptd:5709 blocked for more than 120 seconds.
[6515294.873423] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
[6515294.873428] receptd D 000000000000002a 0 5709 2046
[6515294.873436] ffff88009cce3c78 0000000000000086 ffff88009cce3be8
ffffffff81152dd3
[6515294.873447] 0000000000010e80 000000000000c748 ffff880104901c80
ffffffff81526360
[6515294.873457] ffff880104901ff0 00000000810db183 00000002844fcb53
0000000000000286
[6515294.873468] Call Trace:
[6515294.873473] [<ffffffff81152dd3>] ? dx_release+0x23/0x50
[6515294.873480] [<ffffffff813a9639>] schedule+0x9/0x20
[6515294.873488] [<ffffffff81177ea5>] start_this_handle+0x365/0x5d0
[6515294.873495] [<ffffffff8105b900>] ? autoremove_wake_function+0x0/0x40
[6515294.873502] [<ffffffff81178306>] jbd2_journal_start+0xa6/0xf0
[6515294.873510] [<ffffffff811593c5>] ext4_journal_start_sb+0x55/0x90
[6515294.873517] [<ffffffff81155441>] ext4_create+0x81/0x130
[6515294.873524] [<ffffffff810d23aa>] vfs_create+0x9a/0xb0
[6515294.873531] [<ffffffff810d522e>] do_filp_open+0x85e/0x990
[6515294.873538] [<ffffffff810d1d16>] ? getname+0x36/0x200
[6515294.873545] [<ffffffff810deada>] ? alloc_fd+0x4a/0x140
[6515294.873552] [<ffffffff810c5dfb>] do_sys_open+0x7b/0x110
[6515294.873558] [<ffffffff810c5ebb>] sys_open+0x1b/0x20
[6515294.873565] [<ffffffff8100bfab>] system_call_fastpath+0x16/0x1b
[6515294.873575] INFO: task drdbmake:3919 blocked for more than 120 seconds.
[6515294.873580] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
[6515294.873586] drdbmake D ffff88021e56f000 0 3919 27107
[6515294.873594] ffff88021eb4da88 0000000000000082 ffff8801f4cf0ae8
ffff880008ba03f0
[6515294.873605] 0000000000010e80 000000000000c748 ffff8801049063c0
ffff88021f843900
[6515294.873616] ffff880104906730 00000001f4cf0ae8 00000002844c0480
0000000000000282
[6515294.873627] Call Trace:
[6515294.873633] [<ffffffff813a9639>] schedule+0x9/0x20
[6515294.873640] [<ffffffff81177ea5>] start_this_handle+0x365/0x5d0
[6515294.873647] [<ffffffff8105b900>] ? autoremove_wake_function+0x0/0x40
[6515294.873654] [<ffffffff811781ce>] jbd2_journal_restart+0xbe/0x150
[6515294.873662] [<ffffffff8116243d>] ext4_ext_truncate+0x6dd/0xa20
[6515294.873669] [<ffffffff81095b3b>] ? find_get_pages+0x3b/0xf0
[6515294.873676] [<ffffffff81150a78>] ext4_truncate+0x198/0x680
[6515294.873685] [<ffffffff810ac984>] ? unmap_mapping_range+0x74/0x280
[6515294.873692] [<ffffffff811772c0>] ? jbd2_journal_stop+0x1e0/0x360
[6515294.873700] [<ffffffff810acd25>] vmtruncate+0xa5/0x110
[6515294.873707] [<ffffffff810dda10>] inode_setattr+0x30/0x180
[6515294.873714] [<ffffffff8114d073>] ext4_setattr+0x173/0x310
[6515294.873721] [<ffffffff810ddc79>] notify_change+0x119/0x330
[6515294.873728] [<ffffffff810c6df3>] do_truncate+0x63/0x90
[6515294.873734] [<ffffffff810d0cc3>] ? get_write_access+0x23/0x60
[6515294.873741] [<ffffffff810c70cb>] sys_truncate+0x17b/0x180
[6515294.873747] [<ffffffff8100bfab>] system_call_fastpath+0x16/0x1b

2009-10-27 04:42:30

by Eric Sandeen

[permalink] [raw]
Subject: Re: Fwd: Ext4 bug with fallocate

Fredrik Andersson wrote:
> Hi, I'd like to report what I'm fairly certain is an ext4 bug. I hope
> this is the right place to do so.
>
> My program creates a big file (around 30 GB) with posix_fallocate (to
> utilize extents), fills it with data and uses ftruncate to crop it to
> its final size (usually somewhere between 20 and 25 GB).
> The problem is that in around 5% of the cases, the program locks up
> completely in a syscall. The process can thus not be killed even with
> kill -9, and a reboot is all that will do.


Not trying to ignore you, but trying to sort out some corruption issues
that I'm afraid I have to rank higher than a hang for now...

From the traces, things are hung up on the i_data_sem; several
pdflushes want it for read in ext4_get_blocks_wrap, and drdbmakes
presumably have it downed for write in ext4_ext_truncate. I'm not
immediately seeing the deadlock tonight, but then it's getting late...
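
If it helps to see the shape of that, here is a purely illustrative
user-space analogy (pthreads, not kernel code; every name in it is made
up for the sketch): a writer parks on the lock and then waits forever
for something that never arrives, so every would-be reader queues up
behind it.

/*
 * Purely illustrative analogy in user space (pthreads), NOT ext4 code: a
 * "truncater" thread takes a rw-lock for write and then stalls waiting for
 * something that never arrives (a condition variable nobody signals,
 * standing in for the journal space waited on in jbd2_journal_restart).
 * A "flusher" thread that only wants the lock for read then hangs behind
 * it -- the shape of the pdflush/drdbmake traces above.  Names are
 * invented; the program hangs by design.  Build: cc -pthread hang.c
 */
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

static pthread_rwlock_t i_data_sem = PTHREAD_RWLOCK_INITIALIZER;
static pthread_mutex_t  mtx  = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t   room = PTHREAD_COND_INITIALIZER;
static int have_room = 0;                   /* never becomes true */

static void *truncater(void *arg)           /* role of drdbmake / ext4_ext_truncate */
{
    (void)arg;
    pthread_rwlock_wrlock(&i_data_sem);     /* like down_write(&i_data_sem)         */
    printf("truncater: write lock held, waiting for journal room...\n");
    pthread_mutex_lock(&mtx);
    while (!have_room)                      /* like jbd2_journal_restart stalling   */
        pthread_cond_wait(&room, &mtx);
    pthread_mutex_unlock(&mtx);
    pthread_rwlock_unlock(&i_data_sem);
    return NULL;
}

static void *flusher(void *arg)             /* role of pdflush / ext4_get_blocks_wrap */
{
    (void)arg;
    sleep(1);                               /* let the truncater get there first      */
    printf("flusher: waiting for read lock...\n");
    pthread_rwlock_rdlock(&i_data_sem);     /* like down_read() -- never returns      */
    pthread_rwlock_unlock(&i_data_sem);
    return NULL;
}

int main(void)
{
    pthread_t a, b;
    pthread_create(&a, NULL, truncater, NULL);
    pthread_create(&b, NULL, flusher, NULL);
    pthread_join(a, NULL);                  /* never completes: both threads stuck    */
    pthread_join(b, NULL);
    return 0;
}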

Is the application that's triggering this something you can share?

Thanks,
-Eric

2009-10-27 08:16:59

by Fredrik Andersson

[permalink] [raw]
Subject: Re: Fwd: Ext4 bug with fallocate

> Not trying to ignore you, but trying to sort out some corruption issues that
> I'm afraid I have to rank higher than a hang for now...

I understand. Good to know someone is thinking about it anyway :-)

>
> From the traces, things are hung up on the i_data_sem; several pdflushes
> want it for read in ext4_get_blocks_wrap, and drdbmakes presumably have it
> downed for write in ext4_ext_truncate.  I'm not immediately seeing the
> deadlock tonight, but then it's getting late...
>
> Is the application that's triggering this something you can share?
>

Sorry, I can't provide source code for these applications.

But drdbmake does what is described above: Preallocates a very big
chunk on disk and then
truncates it to its final length once it's done.

Please let me know if you need more details.

/Fredrik

2009-10-27 13:56:54

by Eric Sandeen

[permalink] [raw]
Subject: Re: Fwd: Ext4 bug with fallocate

Fredrik Andersson wrote:
>> Not trying to ignore you, but trying to sort out some corruption issues that
>> I'm afraid I have to rank higher than a hang for now...
>
> I understand. Good to know someone is thinking about it anyway :-)
>
>> From the traces, things are hung up on the i_data_sem; several pdflushes
>> want it for read in ext4_get_blocks_wrap, and drbdmakes presumably have it
>> downed for write in ext4_ext_truncate. I'm not immediately seeing the
>> deadlock tonight, but then it's getting late...
>>
>> Is the application that's triggering this something you can share?
>>
>
> Sorry, I can't provide source code for these applications.

Ok, I understand.

> But drdbmake does what is described above: Preallocates a very big
> chunk on disk and then
> truncates it to its final length once it's done.

To try to emulate, how does it write into the preallocated space; large
or small IOs? Sequential streaming? mmap writes? It may not be
relevant but would be nice to try to match it as closely as possible.

Thanks,
-Eric

> Please let me know if you need more details.
>
> /Fredrik


2009-10-27 15:29:09

by Fredrik Andersson

[permalink] [raw]
Subject: Re: Fwd: Ext4 bug with fallocate

> To try to emulate, how does it write into the preallocated space; large or
> small IOs?  Sequential streaming?  mmap writes?  It may not be relevant but
> would be nice to try to match it as closely as possible.

This is a big file that is written sequentially using stdio buffered
I/O (with a setvbuf of about 4K) in the drdbmake process. No mmap.
It is regenerated from an earlier version of the same file, and we
preallocate a file that is 25% bigger than the previous version, to
allow for more data than was in the previous file and to utilize the
extent concept in ext4.
We then read the previous file sequentially, update some entries here
and there and rewrite it sequentially into the new, fallocated file.
There is one single instance of random I/O: Once the whole new file
has been written, we seek back to the start to write a fixed-size
header. We then ftruncate the file to the proper size.
No process is concurrently reading from the file that is being
written. There is however another process, nodeserv, that does random
reads from the "previous" file (the one we're sequentially reading in
drdbmake).
The deadlock is always in the final ftruncate. It does not help to
close the file and reopen it again before the ftruncate call.
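
Roughly, the pattern looks like the sketch below (the file name, sizes
and record layout here are made up, this is not the real drdbmake; on
x86_64 it builds as-is, on 32-bit add -D_FILE_OFFSET_BITS=64):

/*
 * Minimal sketch of the write pattern: preallocate ~25% more than the
 * expected final size, stream the body in with a ~4K stdio buffer, seek
 * back to rewrite the header, then ftruncate to the final size -- the
 * call that hangs.
 */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

#define FINAL_SIZE  (20ULL * 1024 * 1024 * 1024)    /* ~20 GB final size */
#define PREALLOC    (FINAL_SIZE + FINAL_SIZE / 4)   /* 25% headroom      */
#define HDR_SIZE    4096

int main(void)
{
    int fd = open("new.db", O_CREAT | O_RDWR | O_TRUNC, 0644);
    if (fd < 0 || posix_fallocate(fd, 0, PREALLOC) != 0) {
        perror("open/posix_fallocate");
        return 1;
    }

    FILE *fp = fdopen(fd, "r+");
    static char vbuf[4096];
    setvbuf(fp, vbuf, _IOFBF, sizeof vbuf);         /* setvbuf of about 4K */

    char record[4096];
    memset(record, 'x', sizeof record);

    unsigned long long written = HDR_SIZE;
    fseeko(fp, HDR_SIZE, SEEK_SET);
    while (written < FINAL_SIZE) {                  /* sequential rewrite  */
        fwrite(record, 1, sizeof record, fp);
        written += sizeof record;
    }

    fseeko(fp, 0, SEEK_SET);                        /* the one random I/O: */
    fwrite(record, 1, HDR_SIZE, fp);                /* fixed-size header   */
    fflush(fp);

    if (ftruncate(fd, written) != 0)                /* hangs ~5% of runs   */
        perror("ftruncate");
    fclose(fp);
    return 0;
}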

/Fredrik

2009-10-27 15:37:13

by Eric Sandeen

[permalink] [raw]
Subject: Re: Fwd: Ext4 bug with fallocate

Fredrik Andersson wrote:
>> To try to emulate, how does it write into the preallocated space; large or
>> small IOs? Sequential streaming? mmap writes? It may not be relevant but
>> would be nice to try to match it as closely as possible.
>
> This is a big file that is written sequentially using stdio buffered
> I/O (with a setvbuf of about 4K) in the drdbmake process. No mmap.
> It is regenerated from an earlier version of the same file, and we
> preallocate a file that is 25% bigger than the
> previous version, to allow for more data than was in the previous file
> and to utilize the extent concept in ext4.

FWIW, you do not need to preallocate to get extents. Preallocation
fundamentally only guarantees that the space is available (somewhere),
though in practice it can lead to more contiguous allocation of that
space since it's all done up front ...
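
If you want to see how the allocation actually came out, the FIEMAP
ioctl (the same information "filefrag -v" prints) will list a file's
extents; a rough sketch, not from this thread, Linux-only:

#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/ioctl.h>
#include <linux/fs.h>
#include <linux/fiemap.h>

int main(int argc, char **argv)
{
    if (argc != 2) {
        fprintf(stderr, "usage: %s <file>\n", argv[0]);
        return 1;
    }
    int fd = open(argv[1], O_RDONLY);
    if (fd < 0) { perror("open"); return 1; }

    unsigned int max = 512;                         /* arbitrary extent cap */
    struct fiemap *fm = calloc(1, sizeof *fm + max * sizeof(struct fiemap_extent));
    fm->fm_start = 0;
    fm->fm_length = ~0ULL;                          /* map the whole file   */
    fm->fm_extent_count = max;

    if (ioctl(fd, FS_IOC_FIEMAP, fm) < 0) { perror("FS_IOC_FIEMAP"); return 1; }

    printf("%u extent(s)\n", fm->fm_mapped_extents);
    for (unsigned int i = 0; i < fm->fm_mapped_extents; i++)
        printf("  logical %llu physical %llu length %llu\n",
               (unsigned long long)fm->fm_extents[i].fe_logical,
               (unsigned long long)fm->fm_extents[i].fe_physical,
               (unsigned long long)fm->fm_extents[i].fe_length);
    free(fm);
    return 0;
}

Fewer, larger extents means the allocator kept things contiguous.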


> We then read the previous file sequentially, update some entries here
> and there and
> rewrite it sequentially into the new, fallocated file. There is one
> single instance of random I/O: Once the whole new
> file has been written, we seek back to the start to write a fixed-size
> header. We then ftruncate the file to the proper size.
> No process is concurrently reading from the file that is being
> written. There is however another process, nodeserv,
> that does random reads from the "previous" file (the one we're
> sequentially reading in drdbmake).
> The deadlock is always in the final ftruncate. It does not help to
> close the file and reopen it again before the ftruncate call.

Thanks. If I find time to think enough about the backtraces you sent,
it'll probably be obvious, but the complete description of your workload
is helpful.

Just out of curiosity, have you verified that the deadlock doesn't exist
if you skip the preallocation? I wonder about a fake test where you
simply write a bit extra, and truncate that back.
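
Something as dumb as the sketch below would do (file name and sizes are
made up, scale them to match the real workload); if that never hangs,
preallocation is implicated:

/* Rough cut at the "fake test": no posix_fallocate at all, just write a
 * bit more data than needed and then truncate back. */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
    int fd = open("noprealloc.db", O_CREAT | O_RDWR | O_TRUNC, 0644);
    if (fd < 0) { perror("open"); return 1; }

    char buf[4096];
    memset(buf, 'x', sizeof buf);

    off_t final_size = 256LL * 1024 * 1024;         /* pretend final size  */
    off_t extra      =  64LL * 1024 * 1024;         /* "a bit extra"       */
    for (off_t off = 0; off < final_size + extra; off += sizeof buf)
        if (write(fd, buf, sizeof buf) != (ssize_t)sizeof buf) {
            perror("write");
            return 1;
        }

    if (ftruncate(fd, final_size) != 0)             /* truncate that back  */
        perror("ftruncate");
    close(fd);
    return 0;
}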

-Eric

> /Fredrik