2008-11-29 13:18:28

by Andres Freund

[permalink] [raw]
Subject: EXT4 ENOSPC Bug

Hi Ted, Hi Andreas, hi all,

On a testsystem (spare laptop) with ext4 as root filesystem sometimes the
system starts to return ENOSPC to all write/create syscalls.
Sometimes the system runs without problems, at other times it starts having
problems soon after boot.
A reboot resolves the problem temporarily.

I don't see a specific usage triggering the problem.

Deleting some files sometimes allows the creation (just touch $unused_filename)
of some files, but not many.

Anything I can do to help you to debug the problem?

Andres

Kernel is: 2.6.28-rc6-andres-00007-ged31348

# df /
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/mapper/root_crypt
302855628 110032848 192822780 37% /

# df -i /
Filesystem Inodes IUsed IFree IUse% Mounted on
/dev/mapper/root_crypt
19234816 786691 18448125 5% /


# tune2fs -l /dev/mapper/root_crypt
tune2fs 1.41.3 (12-Oct-2008)
Filesystem volume name: root
Last mounted on: <not available>
Filesystem UUID: b81f5bab-9544-45a9-8f15-aa7012ae2522
Filesystem magic number: 0xEF53
Filesystem revision #: 1 (dynamic)
Filesystem features: has_journal ext_attr resize_inode dir_index filetype
needs_recovery extent flex_bg sparse_super large_file huge_file uninit_bg
dir_nlink extra_isize
Filesystem flags: signed_directory_hash
Default mount options: (none)
Filesystem state: clean
Errors behavior: Continue
Filesystem OS type: Linux
Inode count: 19234816
Block count: 76921099
Reserved block count: 0
Free blocks: 48205695
Free inodes: 18448125
First block: 0
Block size: 4096
Fragment size: 4096
Reserved GDT blocks: 1005
Blocks per group: 32768
Fragments per group: 32768
Inodes per group: 8192
Inode blocks per group: 512
Flex block group size: 16
Filesystem created: Tue Sep 30 11:57:33 2008
Last mount time: Thu Nov 27 14:30:58 2008
Last write time: Thu Nov 27 14:30:58 2008
Mount count: 116
Maximum mount count: 200
Last checked: Mon Oct 6 13:54:09 2008
Check interval: 31104000 (12 months)
Next check after: Thu Oct 1 13:54:09 2009
Reserved blocks uid: 0 (user root)
Reserved blocks gid: 0 (group root)
First inode: 11
Inode size: 256
Required extra isize: 28
Desired extra isize: 28
Journal inode: 8
First orphan inode: 288694
Default directory hash: tea
Directory Hash Seed: d0501622-2c99-4b0d-9a0c-d6b6705ff4f2
Journal backup: inode blocks


Attachments:
(No filename) (2.65 kB)
signature.asc (197.00 B)
This is a digitally signed message part.
Download all attachments
Subject: Re: EXT4 ENOSPC Bug

Andres Freund wrote:
> Hi Ted, Hi Andreas, hi all,
>
> On a testsystem (spare laptop) with ext4 as root filesystem sometimes the
> system starts to return ENOSPC to all write/create syscalls.
> Sometimes the system runs without problems, at other times it starts
having
> problems soon after boot.
> A reboot resolves the problem temporarily.
I have had the same problem.
>
> I don't see a specific usage triggering the problem.
Using Vuze(formerly azureus) seemed to trigger the problem. My system
can run fine for days before it happens.
>
> Deleting some files sometimes allows the creation (just touch
$unused_filename)
> of some files, but not many.
>
> Anything I can do to help you to debug the problem?
>
> Andres
I do not have much more info to give you, this is mostly a "me too".

Thanks,
Jonathan

2008-11-29 21:15:36

by Theodore Ts'o

[permalink] [raw]
Subject: Re: EXT4 ENOSPC Bug

On Sat, Nov 29, 2008 at 02:18:19PM +0100, Andres Freund wrote:
> Hi Ted, Hi Andreas, hi all,
>
> On a testsystem (spare laptop) with ext4 as root filesystem sometimes the
> system starts to return ENOSPC to all write/create syscalls.
> Sometimes the system runs without problems, at other times it starts having
> problems soon after boot.

Hmm.... Is there anything interesting log messages relating to ext4 or
jbd2 showing up in /var/log/messages or in dmesg?

- Ted

2008-11-29 21:31:37

by Andres Freund

[permalink] [raw]
Subject: Re: EXT4 ENOSPC Bug

On Saturday 29 November 2008 22:15:33 Theodore Tso wrote:
> On Sat, Nov 29, 2008 at 02:18:19PM +0100, Andres Freund wrote:
> > On a testsystem (spare laptop) with ext4 as root filesystem sometimes the
> > system starts to return ENOSPC to all write/create syscalls.
> > Sometimes the system runs without problems, at other times it starts
> > having problems soon after boot.
> Hmm.... Is there anything interesting log messages relating to ext4 or
> jbd2 showing up in /var/log/messages or in dmesg?
Nothing else than telling that barriers are not used (I am using dm-crypt...),
but that shouldnt be related.

Andres


Attachments:
(No filename) (619.00 B)
signature.asc (197.00 B)
This is a digitally signed message part.
Download all attachments

2008-12-01 12:35:04

by Andres Freund

[permalink] [raw]
Subject: Re: EXT4 ENOSPC Bug

Hi Ted, hi all,

Are there any additional informations you could use?
The filesystem is a bit big for you to download unfortunately (and the testdata
it contains a bit to sensitive).

Andres


Attachments:
(No filename) (192.00 B)
signature.asc (197.00 B)
This is a digitally signed message part.
Download all attachments

2008-12-01 19:42:32

by Andreas Dilger

[permalink] [raw]
Subject: Re: EXT4 ENOSPC Bug

On Dec 01, 2008 13:34 +0100, Andres Freund wrote:
> Are there any additional informations you could use?
> The filesystem is a bit big for you to download unfortunately (and the
> testdata it contains a bit to sensitive).

Som information that would be useful:
- did you run "e2fsck -f" to see if there were any errors in the filesystem?
- do you run any specific applications that seem to trigger the problem
(e.g. Vuze (formerly azureus) as was reported by another user)
- do the applications writing to this file have any unusual IO pattern
(e.g. mmap IO, lots of write+truncate+write on the same file, etc)

We discussed the creation of a debugging patch to help diagnose this
problem. It looks like you have already compiled your own kernel, so
I assume it would be possible for you to run with an additional patch?

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.


2008-12-01 20:16:12

by Andres Freund

[permalink] [raw]
Subject: Re: EXT4 ENOSPC Bug

Hi Andreas, Hi all,

On Monday 01 December 2008 20:42:04 Andreas Dilger wrote:
> On Dec 01, 2008 13:34 +0100, Andres Freund wrote:
> > Are there any additional informations you could use?
> > The filesystem is a bit big for you to download unfortunately (and the
> > testdata it contains a bit to sensitive).
> - did you run "e2fsck -f" to see if there were any errors in the
> filesystem?
Yes, no errors.

> - do you run any specific applications that seem to trigger the
> problem (e.g. Vuze (formerly azureus) as was reported by another user)
No. The Problem occured sometimes shortly after boot (Once even before X was
up) sometimes only after days of full usage.
Boot starts postgres, thats the only thing that propably always ran. But I
have seen it while postgres was idle - i.e. just some polls and stats.

> - do the applications writing to this file have any unusual IO pattern
> (e.g. mmap IO, lots of write+truncate+write on the same file, etc)
Its not a single file, but the whole filesystem, having problems.

> We discussed the creation of a debugging patch to help diagnose this
> problem. It looks like you have already compiled your own kernel, so
> I assume it would be possible for you to run with an additional patch?
Sure, no problem.
At least if the patch is not expensive during run time. The system produces
test-data for testing our development software and I cant take it down for too
long.
Thats the reason why I can use something like ext4 on such a system at all ;-)

It sometimes takes quite a while to reproduce the problem though. The system
ran stable for 4 days so far.
Same kernel as the last time the error occured though.

Any specific debug options I should enable?

Andres


Attachments:
(No filename) (1.68 kB)
signature.asc (197.00 B)
This is a digitally signed message part.
Download all attachments

2008-12-02 07:57:16

by Theodore Ts'o

[permalink] [raw]
Subject: Re: EXT4 ENOSPC Bug

Hi Andres,

What we suspect may be happening is that somehow, there is something
in your workload which is causing a leak or a double-increment of the
number of blocks reserved so that delayed allocation will succeed.
Basically, when the system writes to a block that hasn't been
previously allocated by filesystme, we reserve a data block so that
when we finally do the late delayed allocation, we know that a free
block will be available. When we do finally determine where the block
will be located on the filesystem, we decrement the reserved block
counter. If somehow the reserved block counter is growing and not
shrinking when it should, that could lead to the problem which you
describe.

So.... could you apply this patch, attached below. You can trigger it
using the attached program, debug-ioctl. If the filesystem is
quiscent, and you've typed sync once or twice, you should get the
following in your printk logs:

[ 2742.603886] ext4 debug delalloc of dm-4
[ 2742.603948] ext4: dirty blocks 0 free blocks 7324359
[ 2742.603960] ext4 debug delalloc done

If you do have some dirty blocks that haen't been flushed out to disk,
it might look liket this:

[ 2758.653682] ext4 debug delalloc of dm-4
[ 2758.653697] ext4: dirty blocks 172 free blocks 7324439
[ 2758.653703] s_dirty list:
[ 2758.653708] ino 401167: 79 2
[ 2758.653713] ino 401200: 2 2
[ 2758.653718] ino 401197: 3 2
....
[ 2758.653828] ext4 debug delalloc done

If our theory is correct, I suspect you will start to see the number
of dirty blocks grown over time, even before you start seeing ENOSPC
errors (which will happen when the number of dirty blocks exceeds the
number of free blocks).

In that case, the list of inodes that have data and metadata blocks
reserved will hopefully tell us soemthing about what might be going
on. Just run the debug-ioctl command giving a filename or directory
within the filesystem where you want to deump out the debugging
information; it will be dumpd out in the dmesg buffer.

- Ted


Attachments:
(No filename) (1.98 kB)
debug-delalloc-patch (2.42 kB)
debug-ioctl.c (863.00 B)
Download all attachments

2008-12-02 14:59:09

by Andres Freund

[permalink] [raw]
Subject: Re: EXT4 ENOSPC Bug

Hi Ted, Hi Andreas,

On Tuesday 02 December 2008 08:57:12 Theodore Tso wrote:
> So.... could you apply this patch, attached below. You can trigger it
> using the attached program, debug-ioctl. If the filesystem is
> quiscent, and you've typed sync once or twice, you should get the
> following in your printk logs:
Ok. The system now runs (without problems) with the patch enabled and I can
get the debug output.
30 Minutes after boot the system still returns to zero dirty blocks and the
free blocks seem to stay in a sensible range.
I will let it run for a while and report back if either something interesting
happens or the problem reappears and I am seeing no significant amount of dirty
blocks.

Andres



Attachments:
(No filename) (717.00 B)
signature.asc (197.00 B)
This is a digitally signed message part.
Download all attachments
Subject: Re: EXT4 ENOSPC Bug

On Mon, 01 Dec 2008, Andreas Dilger wrote:
> - do the applications writing to this file have any unusual IO pattern
> (e.g. mmap IO, lots of write+truncate+write on the same file, etc)

Vuze does a massive ammount of sparse writes. It can also be configured to
pre-allocate space (either by creating an empty non-sparse file, or by
telling the kernel that the file is expected to grow to size foo and to
please pre-allocate that in the filesystem. I don't recall how this is
done).

--
"One disk to rule them all, One disk to find them. One disk to bring
them all and in the darkness grind them. In the Land of Redmond
where the shadows lie." -- The Silicon Valley Tarot
Henrique Holschuh

2008-12-02 16:47:13

by Theodore Ts'o

[permalink] [raw]
Subject: Re: EXT4 ENOSPC Bug

On Tue, Dec 02, 2008 at 03:58:53PM +0100, Andres Freund wrote:
> Ok. The system now runs (without problems) with the patch enabled and I can
> get the debug output.
> 30 Minutes after boot the system still returns to zero dirty blocks and the
> free blocks seem to stay in a sensible range.
> I will let it run for a while and report back if either something interesting
> happens or the problem reappears and I am seeing no significant amount of dirty
> blocks.

You say you are using Postgres, right? Something you might try to see
if it triggers the problem it is creating a new database and then
restoring some database dump/backup into that new database. Some
databases expand into a new table space (or whatever terminology
Postgres uses) by random writes into a sparse portion of the file.
This could be triggering the problem, or at least trigger the problem
more quickly.

The other thing I wanted to ask is whether "df" was showing the 37%
in-use statistic at the time, or was that after you rebooted. And
although I hate to ask it, you're sure this isn't the standard "delete
an in-use file but not get the space back" Unix trap, right?

Thanks, regards,

- Ted

2008-12-02 17:47:40

by Andres Freund

[permalink] [raw]
Subject: Re: EXT4 ENOSPC Bug

Hi,

On Tuesday 02 December 2008 17:47:09 Theodore Tso wrote:
> You say you are using Postgres, right? Something you might try to see
> if it triggers the problem it is creating a new database and then
> restoring some database dump/backup into that new database. Some
> databases expand into a new table space (or whatever terminology
> Postgres uses) by random writes into a sparse portion of the file.
> This could be triggering the problem, or at least trigger the problem
> more quickly.
I tried that - I have seen no problems so far. But it is not the first time I
did not see the problem for some time.

Btw, postgres just creates the database by copying over a default database.

For an easy test with sparse files, I created a big one, set it up as a loop
device, created a filesystem and ran some stuff in it.
No Problem so far.

> The other thing I wanted to ask is whether "df" was showing the 37%
> in-use statistic at the time, or was that after you rebooted.
It definitely was before a reboot. And there were plenty of both, inodes and
blocks.

I think that I have seen the problem on metadata only changes (find /tmp -type
f|xargs touch) as well, but sometimes metadata changes were possible while file
creation was not.

Another Datapoint: File deletion sometimes made it possible to create more
files, but by far not as much as the space freed.

> And although I hate to ask it, you're sure this isn't the standard "delete
> an in-use file but not get the space back" Unix trap, right?
There were over 200GB free, so I doubt that. I don't know what could have
caused an allocation of so much space unnoticed in an idle system multiple
times.
But I do understand the reason for the question ;-)


Andres


Attachments:
(No filename) (1.69 kB)
signature.asc (197.00 B)
This is a digitally signed message part.
Download all attachments

2008-12-02 20:33:21

by Theodore Ts'o

[permalink] [raw]
Subject: Re: EXT4 ENOSPC Bug

On Tue, Dec 02, 2008 at 06:47:24PM +0100, Andres Freund wrote:
>
> I think that I have seen the problem on metadata only changes (find
> /tmp -type f|xargs touch) as well, but sometimes metadata changes
> were possible while file creation was not.

Hmm... really? Was the error message ENOSPC, or something else?

The only way I can see that it might be ENOSPC would be if we got an
error in the jbd2 layer, in start_this_handle(), but that would have
resulted in a kernel printk:

if (nblocks > journal->j_max_transaction_buffers) {
printk(KERN_ERR "JBD: %s wants too many credits (%d > %d)\n",
current->comm, nblocks,
journal->j_max_transaction_buffers);
ret = -ENOSPC;
goto out;
}


If that wasn't too long ago, you might want to search your old system
log files in /var/log for any "JBD" or "ext4" messages. That might be
an important clue, although that seems rather unlikely to me. Or in
the case where metadata changes were failing, perhaps they were doing
so with some other error that ENOSPC?

- Ted

Subject: Re: EXT4 ENOSPC Bug

Theodore Tso wrote:

> Hmm... really? Was the error message ENOSPC, or something else?
>
I can confirm I get a ENOSPC on open/creat. Here is the relevant output
from strace touch test:

open("test", O_WRONLY|O_CREAT|O_NOCTTY|O_NONBLOCK, 0666) = -1 ENOSPC (No
space left on device)

And the df -h output (first filesystem in list):

Filesystem Size Used Avail Use% Mounted on
/dev/mapper/vg0-root 88G 60G 25G 71% /
tmpfs 1.6G 12K 1.6G 1% /lib/init/rw
udev 10M 112K 9.9M 2% /dev
tmpfs 1.6G 0 1.6G 0% /dev/shm
/dev/sda1 92M 19M 69M 22% /boot
none 1.6G 3.3M 1.6G 1% /tmp

Cheers,
Jonathan

Subject: Re: EXT4 ENOSPC Bug

Jonathan Bastien-Filiatrault wrote:
> Theodore Tso wrote:
>
>> Hmm... really? Was the error message ENOSPC, or something else?
>>
> I can confirm I get a ENOSPC on open/creat. Here is the relevant output
> from strace touch test:
>
(Sorry for replying to myself)

This happened while compiling a kernel with ccache enabled (with a
relatively low hit rate for this specific build). Vuze was also opened,
but with no active download.

Cheers,
Jonathan

2008-12-03 04:37:21

by Theodore Ts'o

[permalink] [raw]
Subject: Re: EXT4 ENOSPC Bug

On Tue, Dec 02, 2008 at 07:40:43PM -0500, Jonathan Bastien-Filiatrault wrote:
> Jonathan Bastien-Filiatrault wrote:
> > Theodore Tso wrote:
> >
> >> Hmm... really? Was the error message ENOSPC, or something else?
> >>
> > I can confirm I get a ENOSPC on open/creat. Here is the relevant output
> > from strace touch test:
> >
> (Sorry for replying to myself)
>
> This happened while compiling a kernel with ccache enabled (with a
> relatively low hit rate for this specific build). Vuze was also opened,
> but with no active download.

Hmm... I build with ccache all the time. I normally don't run a
bittorrent client, though. What kernel version are you using?

Any chance you can apply the patch which I sent earlier in this
thread, trigger the debugging ioctl when it happens again?

Thanks,

- Ted

2008-12-03 15:36:13

by Aneesh Kumar K.V

[permalink] [raw]
Subject: Re: EXT4 ENOSPC Bug

On Tue, Dec 02, 2008 at 07:37:20PM -0500, Jonathan Bastien-Filiatrault wrote:
> Theodore Tso wrote:
>
> > Hmm... really? Was the error message ENOSPC, or something else?
> >
> I can confirm I get a ENOSPC on open/creat. Here is the relevant output
> from strace touch test:
>
> open("test", O_WRONLY|O_CREAT|O_NOCTTY|O_NONBLOCK, 0666) = -1 ENOSPC (No
> space left on device)
>

Can you make sure you have the below patch in the kernel.

c001077f4003fa75793bb62979baa6241dd8eb19

commit c001077f4003fa75793bb62979baa6241dd8eb19
Author: Eric Sandeen <[email protected]>
Date: Tue Aug 19 22:19:50 2008 -0400

ext4: Fix bug where we return ENOSPC even though we have plenty of inodes

-aneesh

2008-12-03 17:23:18

by Theodore Ts'o

[permalink] [raw]
Subject: Re: EXT4 ENOSPC Bug

On Wed, Dec 03, 2008 at 09:04:41PM +0530, Aneesh Kumar K.V wrote:
>
> Can you make sure you have the below patch in the kernel.
>
> c001077f4003fa75793bb62979baa6241dd8eb19
>
> commit c001077f4003fa75793bb62979baa6241dd8eb19
> Author: Eric Sandeen <[email protected]>
> Date: Tue Aug 19 22:19:50 2008 -0400
>
> ext4: Fix bug where we return ENOSPC even though we have plenty of inodes
>

Mmmm, good point, thanks. I had been assuming this was caused by some
failure in the delayed allocation code with block accounting, but we
also had a bug fix that was causing a problem with inode allocation.

That doesn't explain a report of an ENOSPC error with metadata only
changes were failing (i.e., touching a file that already exists),
although I don't think we've gotten a lot of information about that
scenario and it feels a little unconfirmed to me still...

So, yes, this patch may solve the issue for you.

- Ted


Attachments:
(No filename) (930.00 B)
0001-ext4-Fix-bug-where-we-return-ENOSPC-even-though-we.patch (1.56 kB)
Download all attachments

2008-12-03 18:18:33

by Andres Freund

[permalink] [raw]
Subject: Re: EXT4 ENOSPC Bug

On Wednesday 03 December 2008 18:23:09 Theodore Tso wrote:
> On Wed, Dec 03, 2008 at 09:04:41PM +0530, Aneesh Kumar K.V wrote:
> > Can you make sure you have the below patch in the kernel.
> > c001077f4003fa75793bb62979baa6241dd8eb19
Yes, I do have that patch in kernel (verified through git and looking at the
source). Its a part of linus' tree.

> Mmmm, good point, thanks. I had been assuming this was caused by some
> failure in the delayed allocation code with block accounting, but we
> also had a bug fix that was causing a problem with inode allocation.
>
> That doesn't explain a report of an ENOSPC error with metadata only
> changes were failing (i.e., touching a file that already exists),
> although I don't think we've gotten a lot of information about that
> scenario and it feels a little unconfirmed to me still...
Unfortunately I am not absolutely sure (I reported this) as well, this is the
reason I did not report it at first but only after no culprit could be found.

Still no luck reproducing it once more.


Andres


Attachments:
(No filename) (1.02 kB)
signature.asc (197.00 B)
This is a digitally signed message part.
Download all attachments

2008-12-10 00:08:14

by Andres Freund

[permalink] [raw]
Subject: Re: EXT4 ENOSPC Bug

Hi Ted,

Very Strange:
I booted (after an crash which I did not see) a somewhat older kernel and I
once more caught the issue. Unfortunately I do not have your patch in this
older kernel.


older kernel: 2.6.28-rc5-andres-00070-g4e14e83
kernel I rean whole last week: 2.6.28-rc6-andres-00007-ged31348
There seem to have been no significant ext4 related changes to the kernel
between those.

for i in `seq 1 1000`;do touch tmp$i;done
touch: cannot touch `tmp139': No space left on device
...

rm tmp*;for i in `seq 1 200`;do dd if=/dev/zero of=tmp$i bs=1024k count=1;done
dd: opening `tmp139': No space left on device
139
..

Various different blocksizes and filesizes all return the same result.

This all hints to the patch Anesh linked. But it is included in 2.6.28-rc5.
Any idea how that could happen?

The kernel install is just a "make;make install; update-initramfs -c -k `make
kernelrelease`; update-grub" run...

A short idea how I could verify on binary level what kernelrelease it really
is?


If a somehow corrupted compile (murked ccache?) is the culprit of this (in
addition to my inaccuracies) I am very sorry for all your work.

I let this kernel run for the next hours if you want to run some tests...

Thanks,

Andres



Attachments:
(No filename) (1.22 kB)
signature.asc (197.00 B)
This is a digitally signed message part.
Download all attachments

2009-02-16 11:47:50

by Andres Freund

[permalink] [raw]
Subject: Re: EXT4 ENOSPC Bug

Hi All,

"Luckily", to prove that I am not completly mad I recently found this
problem again. On a more recent kernel:
2.6.29-rc3-andres-00498-g68e80d5
(upstream 2.6.29 + Theodore's ext4 debug patches)

Again I got:
open("/home/andres/tt", O_WRONLY|O_CREAT|O_NOCTTY|O_NONBLOCK, 0666) = -1
ENOSPC (No space left on device)

When doing `touch /home/andres/tt`

Again df does not show something suspicious:
Filesystem Inodes IUsed IFree IUse% Mounted on
/dev/mapper/root_crypt
19234816 826296 18408520 5% /
tmpfs 491025 3 491022 1% /lib/init/rw
varrun 491025 96 490929 1% /var/run
varlock 491025 3 491022 1% /var/lock
udev 491025 5517 485508 2% /dev
tmpfs 491025 1 491024 1% /dev/shm
/dev/sda2 61312 110 61202 1% /boot

Filesystem 1K-blocks Used Available Use% Mounted on
/dev/mapper/root_crypt
302855628 202819548 100036080 67% /
tmpfs 1964100 0 1964100 0% /lib/init/rw
varrun 1964100 300 1963800 1% /var/run
varlock 1964100 0 1964100 0% /var/lock
udev 1964100 92 1964008 1% /dev
tmpfs 1964100 0 1964100 0% /dev/shm
/dev/sda2 964532 634052 281484 70% /boot

Delayed allocations are not showing anything suspicious:
[72586.050538] ext4 debug delalloc of dm-0
[72586.050547] ext4: dirty blocks 22 free blocks 25008749
[72586.050551] s_dirty list:
[72586.050556] ino 534064: 2 2
[72586.050559] ino 539981: 2 2
[72586.050563] ino 524506: 5 2
[72586.050566] ino 184479: 1 2
[72586.050571] ino 167172: 2 2
[72586.050574] ext4 debug delalloc done


Any debugging ideas?

I got this twice in the last two months, so its not really easy to
reproduce :-(

Andres

PS: All information is taken out of the running kernel while having the
problem

meminfo:
MemTotal: 3928200 kB
MemFree: 23312 kB
Buffers: 239564 kB
Cached: 2383280 kB
SwapCached: 0 kB
Active: 1490948 kB
Inactive: 2149360 kB
Active(anon): 879880 kB
Inactive(anon): 294356 kB
Active(file): 611068 kB
Inactive(file): 1855004 kB
Unevictable: 52 kB
Mlocked: 52 kB
SwapTotal: 0 kB
SwapFree: 0 kB
Dirty: 52 kB
Writeback: 4 kB
AnonPages: 1017356 kB
Mapped: 129440 kB
Slab: 123316 kB
SReclaimable: 94512 kB
SUnreclaim: 28804 kB
PageTables: 26936 kB
NFS_Unstable: 0 kB
Bounce: 0 kB
WritebackTmp: 0 kB
CommitLimit: 1964100 kB
Committed_AS: 1169664 kB
VmallocTotal: 34359738367 kB
VmallocUsed: 341580 kB
VmallocChunk: 34359392035 kB
HugePages_Total: 0
HugePages_Free: 0
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 2048 kB
DirectMap4k: 86016 kB
DirectMap2M: 4003840 kB

2009-02-16 15:20:21

by Theodore Ts'o

[permalink] [raw]
Subject: Re: EXT4 ENOSPC Bug

On Mon, Feb 16, 2009 at 12:37:19PM +0100, Andres Freund wrote:
> Hi All,
>
> "Luckily", to prove that I am not completly mad I recently found this
> problem again. On a more recent kernel:
> 2.6.29-rc3-andres-00498-g68e80d5
> (upstream 2.6.29 + Theodore's ext4 debug patches)
>
> Again I got:
> open("/home/andres/tt", O_WRONLY|O_CREAT|O_NOCTTY|O_NONBLOCK, 0666) = -1
> ENOSPC (No space left on device)

So /home/andres/tt does not exist, correct? Does this happen for any
attempt to create a new zero-length file using the "touch" command?
(Even in other directories) Can you append to a file using "cat
/etc/mailcap >> some_existing_file"?

If it is "yes" to the first two questions, and "no" the third, it
looks like this might be an inode allocation failure as opposed to a
block allocation failure.

Once this triggers, does it reliably continue to fail if you reboot?
Or does it go away when you reboot?

- Ted

2009-02-16 15:27:22

by Andres Freund

[permalink] [raw]
Subject: Re: EXT4 ENOSPC Bug

Hi,

On 02/16/2009 04:01 PM, Theodore Tso wrote:
> On Mon, Feb 16, 2009 at 12:37:19PM +0100, Andres Freund wrote:
>> "Luckily", to prove that I am not completly mad I recently found this
>> problem again. On a more recent kernel:
>> 2.6.29-rc3-andres-00498-g68e80d5
>> (upstream 2.6.29 + Theodore's ext4 debug patches)
>>
>> Again I got:
>> open("/home/andres/tt", O_WRONLY|O_CREAT|O_NOCTTY|O_NONBLOCK, 0666) = -1
>> ENOSPC (No space left on device)
>
> So /home/andres/tt does not exist, correct?
yes

> Does this happen for any
> attempt to create a new zero-length file using the "touch" command?
> (Even in other directories) Can you append to a file using "cat
> /etc/mailcap>> some_existing_file"?
Happens at the moment only for not existing files in every directory i
tried.

> Once this triggers, does it reliably continue to fail if you reboot?
Yes and no. It triggers as long as I don't delete any files:

Out of another mail:

for i in `seq 1 1000`;do touch tmp$i;done
touch: cannot touch `tmp139': No space left on device
...

rm tmp*;for i in `seq 1 200`;do dd if=/dev/zero of=tmp$i bs=1024k
count=1;done
dd: opening `tmp139': No space left on device
139
..

So, yes, seems to be an inode allocation problem.

> Or does it go away when you reboot?
Reboot fixes it.


Andres

2009-02-16 19:00:32

by Theodore Ts'o

[permalink] [raw]
Subject: Re: EXT4 ENOSPC Bug

On Mon, Feb 16, 2009 at 04:27:03PM +0100, Andres Freund wrote:
>
> So, yes, seems to be an inode allocation problem.
>

Andres, Alex, others,

I'm pretty sure the ENOSPC problem which you both found is an inode
allocation problem. Some of you seem to have an easier time
reproducing it than others; could you try this patch, and periodically
scan your system logs for the message "ext4: find_group_flex failed,
fallback succeeded"? If the problem goes away for you, and you find
the occasional aforemention message in your system log, that will
confirm what I suspect, which is the bug is in fs/ext4/inode.c's
find_group_flex() function. (If I'm wrong, the fallback code will
activate only when the filesystem is genuinely out of inodes, which
should be very rare.)

More comments are in the patch header. My current long-term plan for
dealing with this is to enhance find_group_orlov() to and
find_group_other() to understand about flex_bg's.

- Ted

commit 1012e25b371b203164e4766a98f1e696df68b56d
Author: Theodore Ts'o <[email protected]>
Date: Mon Feb 16 13:51:16 2009 -0500

ext4: Add fallback for find_group_flex

This is a workaround for find_group_flex() which badly needs to be
replaced. One of its problems (besides ignoring the Orlov algorithm)
is that it is a bit hyperactive about returning failure under
suspicious circumstances. This can lead to spurious ENOSPC failures.
Work around this for now by retrying the search using
find_group_other() if find_group_flex() returns -1. If
find_group_other() succeeds when find_group_flex(), log a warning
message. I can't quite find the motivation to spend effort working on
fixing find_group_flex() given that I want to replace it all anyway
(and in fact work on the replacement code is underway), so we may
leave the workaround for as long as find_group_flex() stays in the
kernel...

Signed-off-by: "Theodore Ts'o" <[email protected]>

diff --git a/fs/ext4/ialloc.c b/fs/ext4/ialloc.c
index a200059..21080ab 100644
--- a/fs/ext4/ialloc.c
+++ b/fs/ext4/ialloc.c
@@ -715,6 +715,13 @@ struct inode *ext4_new_inode(handle_t *handle, struct inode *dir, int mode)

if (sbi->s_log_groups_per_flex) {
ret2 = find_group_flex(sb, dir, &group);
+ if (ret2 == -1) {
+ ret2 = find_group_other(sb, dir, &group);
+ if (ret2 == 0)
+ printk(KERN_NOTICE "ext4: find_group_flex "
+ "failed, fallback succeeded dir %lu\n",
+ dir->i_ino);
+ }
goto got_group;
}


2009-02-17 17:21:15

by Alex Buell

[permalink] [raw]
Subject: Re: EXT4 ENOSPC Bug

On Mon, 16 Feb 2009 14:00:01 -0500, I waved a wand and this message
magically appears in front of Theodore Tso:

> On Mon, Feb 16, 2009 at 04:27:03PM +0100, Andres Freund wrote:
> >
> > So, yes, seems to be an inode allocation problem.
>
> Andres, Alex, others,
>
> I'm pretty sure the ENOSPC problem which you both found is an inode
> allocation problem. Some of you seem to have an easier time
> reproducing it than others; could you try this patch, and periodically
> scan your system logs for the message "ext4: find_group_flex failed,
> fallback succeeded"? If the problem goes away for you, and you find
> the occasional aforemention message in your system log, that will
> confirm what I suspect, which is the bug is in fs/ext4/inode.c's
> find_group_flex() function. (If I'm wrong, the fallback code will
> activate only when the filesystem is genuinely out of inodes, which
> should be very rare.)

OK, I had to go look through the archives on linux-ext4 mailing list to
see what the context was. For myself, this used to happen at least once
a week with 2.6.26, and less frequently with 2.6.27. I think that 2.6.28
with your patch should get rid of that problem altogether. I will of
course get in touch should I see any more of these find_group_flex
failures as that would mean your patch worked.

Thanks for your work on tracking this one down!
--
http://www.munted.org.uk

Fearsome grindings.

2009-02-17 17:36:29

by Andres Freund

[permalink] [raw]
Subject: Re: EXT4 ENOSPC Bug

On 02/16/2009 08:00 PM, Theodore Tso wrote:
> On Mon, Feb 16, 2009 at 04:27:03PM +0100, Andres Freund wrote:
>> So, yes, seems to be an inode allocation problem.

> I'm pretty sure the ENOSPC problem which you both found is an inode
> allocation problem. Some of you seem to have an easier time
> reproducing it than others; could you try this patch, and periodically
> scan your system logs for the message "ext4: find_group_flex failed,
> fallback succeeded"? If the problem goes away for you, and you find
> the occasional aforemention message in your system log, that will
> confirm what I suspect, which is the bug is in fs/ext4/inode.c's
> find_group_flex() function. (If I'm wrong, the fallback code will
> activate only when the filesystem is genuinely out of inodes, which
> should be very rare.)
>
> More comments are in the patch header. My current long-term plan for
> dealing with this is to enhance find_group_orlov() to and
> find_group_other() to understand about flex_bg's.
Ok. I am now running with the patch enabled on two machines - but as the
issue occured only 2 times in nearly 2 months on two machines...

Andres

2009-02-17 18:13:20

by Eric Sandeen

[permalink] [raw]
Subject: Re: EXT4 ENOSPC Bug

Theodore Tso wrote:
> On Mon, Feb 16, 2009 at 04:27:03PM +0100, Andres Freund wrote:
>> So, yes, seems to be an inode allocation problem.
>>
>
> Andres, Alex, others,
>
> I'm pretty sure the ENOSPC problem which you both found is an inode
> allocation problem. Some of you seem to have an easier time
> reproducing it than others; could you try this patch, and periodically
> scan your system logs for the message "ext4: find_group_flex failed,
> fallback succeeded"? If the problem goes away for you, and you find
> the occasional aforemention message in your system log, that will
> confirm what I suspect, which is the bug is in fs/ext4/inode.c's
> find_group_flex() function. (If I'm wrong, the fallback code will
> activate only when the filesystem is genuinely out of inodes, which
> should be very rare.)
>
> More comments are in the patch header. My current long-term plan for
> dealing with this is to enhance find_group_orlov() to and
> find_group_other() to understand about flex_bg's.

Ok, I finally got to where I can reliably hit this. Just as I was about
to install an ext4 with this patch in place, and the bug was preventing
the new initrd creation ;) But worked around that, and:

ext4: find_group_flex failed, fallback succeeded dir 258402
ext4: find_group_flex failed, fallback succeeded dir 258402
ext4: find_group_flex failed, fallback succeeded dir 258402
ext4: find_group_flex failed, fallback succeeded dir 258402
....

I'll see if I can dig a bit more as to why the find_group_flex failed,
if you think it's worth it, Ted.

-Eric

2009-02-17 20:09:12

by Eric Sandeen

[permalink] [raw]
Subject: Re: EXT4 ENOSPC Bug

Eric Sandeen wrote:
> Theodore Tso wrote:
>> On Mon, Feb 16, 2009 at 04:27:03PM +0100, Andres Freund wrote:
>>> So, yes, seems to be an inode allocation problem.
>>>
>> Andres, Alex, others,
>>
>> I'm pretty sure the ENOSPC problem which you both found is an inode
>> allocation problem. Some of you seem to have an easier time
>> reproducing it than others; could you try this patch, and periodically
>> scan your system logs for the message "ext4: find_group_flex failed,
>> fallback succeeded"? If the problem goes away for you, and you find
>> the occasional aforemention message in your system log, that will
>> confirm what I suspect, which is the bug is in fs/ext4/inode.c's
>> find_group_flex() function. (If I'm wrong, the fallback code will
>> activate only when the filesystem is genuinely out of inodes, which
>> should be very rare.)
>>
>> More comments are in the patch header. My current long-term plan for
>> dealing with this is to enhance find_group_orlov() to and
>> find_group_other() to understand about flex_bg's.
>
> Ok, I finally got to where I can reliably hit this. Just as I was about
> to install an ext4 with this patch in place, and the bug was preventing
> the new initrd creation ;) But worked around that, and:
>
> ext4: find_group_flex failed, fallback succeeded dir 258402
> ext4: find_group_flex failed, fallback succeeded dir 258402
> ext4: find_group_flex failed, fallback succeeded dir 258402
> ext4: find_group_flex failed, fallback succeeded dir 258402
> ....
>
> I'll see if I can dig a bit more as to why the find_group_flex failed,
> if you think it's worth it, Ted.

FWIW my problem seems to be different than others have encountered; mine
persists past reboot, while other reporters have said that a reboot
(remount) makes the problem go away.

I seem to be encountering some silliness in find_group_flex when 2 out
of 3 groups are full (I "only" have 55k inodes left, all in the last group).

-Eric

2009-02-17 22:31:29

by Alex Buell

[permalink] [raw]
Subject: Re: EXT4 ENOSPC Bug

On Tue, 17 Feb 2009 14:08:21 -0600, I waved a wand and this message
magically appears in front of Eric Sandeen:

> FWIW my problem seems to be different than others have encountered;
> mine persists past reboot, while other reporters have said that a
> reboot (remount) makes the problem go away.
>
> I seem to be encountering some silliness in find_group_flex when 2 out
> of 3 groups are full (I "only" have 55k inodes left, all in the last
> group).

I've discovered a forced fsck clears this. HTH.
--
http://www.munted.org.uk

Fearsome grindings.

2009-02-17 22:56:46

by Eric Sandeen

[permalink] [raw]
Subject: Re: EXT4 ENOSPC Bug

Alex Buell wrote:
> On Tue, 17 Feb 2009 14:08:21 -0600, I waved a wand and this message
> magically appears in front of Eric Sandeen:
>
>> FWIW my problem seems to be different than others have encountered;
>> mine persists past reboot, while other reporters have said that a
>> reboot (remount) makes the problem go away.
>>
>> I seem to be encountering some silliness in find_group_flex when 2 out
>> of 3 groups are full (I "only" have 55k inodes left, all in the last
>> group).
>
> I've discovered a forced fsck clears this. HTH.

Do you have the output of the fsck run?

-Eric

2009-02-17 23:01:08

by Alex Buell

[permalink] [raw]
Subject: Re: EXT4 ENOSPC Bug

On Tue, 17 Feb 2009 16:56:23 -0600, I waved a wand and this message
magically appears in front of Eric Sandeen:

> Alex Buell wrote:
> > On Tue, 17 Feb 2009 14:08:21 -0600, I waved a wand and this message
> > magically appears in front of Eric Sandeen:
> >
> >> FWIW my problem seems to be different than others have encountered;
> >> mine persists past reboot, while other reporters have said that a
> >> reboot (remount) makes the problem go away.
> >>
> >> I seem to be encountering some silliness in find_group_flex when 2
> >> out of 3 groups are full (I "only" have 55k inodes left, all in
> >> the last group).
> >
> > I've discovered a forced fsck clears this. HTH.
>
> Do you have the output of the fsck run?

I'm afraid not, there's no way to save the output on a forced fsck
reboot.
--
http://www.munted.org.uk

Fearsome grindings.

2009-02-18 02:29:59

by Theodore Ts'o

[permalink] [raw]
Subject: Re: EXT4 ENOSPC Bug

On Tue, Feb 17, 2009 at 02:08:21PM -0600, Eric Sandeen wrote:
> FWIW my problem seems to be different than others have encountered; mine
> persists past reboot, while other reporters have said that a reboot
> (remount) makes the problem go away.

It might or might not be the same problem, since the reporters were
doing this on a mounted root partition, and on a filesystem quite a
bit larger than your test filesystem; so it could be that the act of
shutting down and rebooting created/deleted various pid files, and
purturbed the filesystem to make the problem go away.

The other possibility is that it is the flex_bg specific counters
which were introduced specifically for find_group_flex. I'm not wild
about them since they mean we have to take an extra flex_bg specific
spin lock for every block and inode allocation. The Orlov algorithm
only needs the information when allocating directories, and since
those are rarer than file allocations, I think it should be OK to
simply sum up the necessary fields at directory allocation time
instead of trying to maintain separate counters (which could possibly
get corrupted, although I couldn't see a way that they could be
getting out of sync with reality).

- Ted

2009-02-18 21:19:01

by Andres Freund

[permalink] [raw]
Subject: Re: EXT4 ENOSPC Bug

Hi All,

On 02/17/2009 06:36 PM, Andres Freund wrote:
> On 02/16/2009 08:00 PM, Theodore Tso wrote:
>> On Mon, Feb 16, 2009 at 04:27:03PM +0100, Andres Freund wrote:
>>> So, yes, seems to be an inode allocation problem.
>> I'm pretty sure the ENOSPC problem which you both found is an inode
>> allocation problem. Some of you seem to have an easier time
>> reproducing it than others; could you try this patch, and periodically
>> scan your system logs for the message "ext4: find_group_flex failed,
>> fallback succeeded"? If the problem goes away for you, and you find
>> the occasional aforemention message in your system log, that will
>> confirm what I suspect, which is the bug is in fs/ext4/inode.c's
>> find_group_flex() function. (If I'm wrong, the fallback code will
>> activate only when the filesystem is genuinely out of inodes, which
>> should be very rare.)
>> More comments are in the patch header. My current long-term plan for
>> dealing with this is to enhance find_group_orlov() to and
>> find_group_other() to understand about flex_bg's.
> Ok. I am now running with the patch enabled on two machines - but as the
> issue occured only 2 times in nearly 2 months on two machines...
Didn't take that long:
On one of the machines I got several thousand of:

[10379.575904] ext4: find_group_flex failed, fallback succeeded dir 416319
[10379.576002] ext4: find_group_flex failed, fallback succeeded dir 416319
[10379.579981] ext4: find_group_flex failed, fallback succeeded dir 416319
[10379.580097] ext4: find_group_flex failed, fallback succeeded dir 416319
(with different directories)

No userspace visible behaviour.

So it seems you were right. It seems sensible to put that patch without
printk in the kernel until the issue is fully solved...


Andres

2009-02-18 22:48:25

by Theodore Ts'o

[permalink] [raw]
Subject: Re: EXT4 ENOSPC Bug

On Wed, Feb 18, 2009 at 10:18:57PM +0100, Andres Freund wrote:
> On one of the machines I got several thousand of:
>
> [10379.575904] ext4: find_group_flex failed, fallback succeeded dir 416319
> [10379.576002] ext4: find_group_flex failed, fallback succeeded dir 416319
> [10379.579981] ext4: find_group_flex failed, fallback succeeded dir 416319
> [10379.580097] ext4: find_group_flex failed, fallback succeeded dir 416319
> (with different directories)

Ok, that's good. Good to know the workaround works.

Can you send me a dumpe2fs of the filesystem in question? I'm curious
what was going on...

> No userspace visible behaviour.
>
> So it seems you were right. It seems sensible to put that patch without
> printk in the kernel until the issue is fully solved...

Thanks for the report. I'll push the workaround patch to Linus for
2.6.29 to avoid this problem for now. I recently sent to linux-ext4
for comment a patch to revamp the Orlov allocator for flex_bg and to
use that instead of find_group_flex(), but no way that's going into
2.6.29 at this point....

- Ted

2009-02-19 02:18:49

by Andres Freund

[permalink] [raw]
Subject: Re: EXT4 ENOSPC Bug

On 02/18/2009 10:29 PM, Theodore Tso wrote:
> Ok, that's good. Good to know the workaround works.
> Can you send me a dumpe2fs of the filesystem in question? I'm curious
> what was going on...
Will do as soon as I am at the same place as the machine. I guess thats
only interesting to you privately (size and so on)?

> Thanks for the report. I'll push the workaround patch to Linus for
> 2.6.29 to avoid this problem for now. I recently sent to linux-ext4
> for comment a patch to revamp the Orlov allocator for flex_bg and to
> use that instead of find_group_flex(), but no way that's going into
> 2.6.29 at this point....
Would it be helpfull if I test that patch?

Andres

2009-02-19 05:01:12

by Theodore Ts'o

[permalink] [raw]
Subject: Re: EXT4 ENOSPC Bug

On Thu, Feb 19, 2009 at 03:18:45AM +0100, Andres Freund wrote:
> On 02/18/2009 10:29 PM, Theodore Tso wrote:
>> Ok, that's good. Good to know the workaround works.
>> Can you send me a dumpe2fs of the filesystem in question? I'm curious
>> what was going on...
> Will do as soon as I am at the same place as the machine. I guess thats
> only interesting to you privately (size and so on)?
>
>> Thanks for the report. I'll push the workaround patch to Linus for
>> 2.6.29 to avoid this problem for now. I recently sent to linux-ext4
>> for comment a patch to revamp the Orlov allocator for flex_bg and to
>> use that instead of find_group_flex(), but no way that's going into
>> 2.6.29 at this point....
> Would it be helpfull if I test that patch?
>

Sure, I'll take all of the testing I can get. :-)

The patch is in the ext4 patch queue, and I sent them to the ext4
patch queue. The patch is also in patch work:

http://patchwork.ozlabs.org/patch/23343/

The patch which I sent you earlier (available below) is a prequisite:

http://patchwork.ozlabs.org/patch/23228/


- Ted


2009-02-19 15:48:54

by Eric Sandeen

[permalink] [raw]
Subject: Re: EXT4 ENOSPC Bug

Theodore Tso wrote:
> On Thu, Feb 19, 2009 at 03:18:45AM +0100, Andres Freund wrote:
>> On 02/18/2009 10:29 PM, Theodore Tso wrote:
>>> Ok, that's good. Good to know the workaround works.
>>> Can you send me a dumpe2fs of the filesystem in question? I'm curious
>>> what was going on...
>> Will do as soon as I am at the same place as the machine. I guess thats
>> only interesting to you privately (size and so on)?
>>
>>> Thanks for the report. I'll push the workaround patch to Linus for
>>> 2.6.29 to avoid this problem for now. I recently sent to linux-ext4
>>> for comment a patch to revamp the Orlov allocator for flex_bg and to
>>> use that instead of find_group_flex(), but no way that's going into
>>> 2.6.29 at this point....
>> Would it be helpfull if I test that patch?
>>
>
> Sure, I'll take all of the testing I can get. :-)
>
> The patch is in the ext4 patch queue, and I sent them to the ext4
> patch queue. The patch is also in patch work:
>
> http://patchwork.ozlabs.org/patch/23343/
>
> The patch which I sent you earlier (available below) is a prequisite:
>
> http://patchwork.ozlabs.org/patch/23228/

Ted, I hope the printk will be removed or at least ratelimited before it
gets upstream?

Thanks,
-Eric

2009-02-23 02:02:39

by Theodore Ts'o

[permalink] [raw]
Subject: Re: EXT4 ENOSPC Bug

On Thu, Feb 19, 2009 at 09:46:51AM -0600, Eric Sandeen wrote:
>
> Ted, I hope the printk will be removed or at least ratelimited before it
> gets upstream?
>

Yes, I've added a ratelimit.

- Ted

2009-02-27 03:57:01

by Andres Freund

[permalink] [raw]
Subject: Re: EXT4 ENOSPC Bug

On 02/18/2009 10:29 PM, Theodore Tso wrote:
>> [10379.575904] ext4: find_group_flex failed, fallback succeeded dir 416319
>> [10379.576002] ext4: find_group_flex failed, fallback succeeded dir 416319
>> [10379.579981] ext4: find_group_flex failed, fallback succeeded dir 416319
>> [10379.580097] ext4: find_group_flex failed, fallback succeeded dir 416319
>> (with different directories)
> Can you send me a dumpe2fs of the filesystem in question? I'm curious
> what was going on...
Unfortunately the system was rebooted, before I had the chance to do the
dump - since then the problem has not reemerged.
Would a dump after reboot still be usefull?

Andres