2009-08-19 05:49:04

by Graham Murray

[permalink] [raw]
Subject: Block allocation failed

Shortly after rebooting following an untidy shutdown due to an unrelated
(to ext4 or the filesystem) Oops. Unmounting the partition and running
e2fsck (version 1.41.8) just replayed the journal and stated the
filesystem was clean but I did not believe it so ran 'e2fsck -f
/dev/sdb3' and that showed several errors which I allowed it to correct.

Aug 18 23:50:07 newton EXT4-fs error (device sdb3): ext4_mb_generate_buddy: EXT4-fs: group 35: 3499 blocks in bitmap, 3243 in gd
Aug 18 23:50:07 newton Aborting journal on device sdb3:8.
Aug 18 23:50:07 newton EXT4-fs (sdb3): Remounting filesystem read-only
Aug 18 23:50:07 newton EXT4-fs error (device sdb3): ext4_mb_generate_buddy: EXT4-fs: group 36: 2941 blocks in bitmap, 2429 in gd
Aug 18 23:50:07 newton EXT4-fs error (device sdb3): ext4_mb_generate_buddy: EXT4-fs: group 37: 6806 blocks in bitmap, 5782 in gd
Aug 18 23:50:07 newton EXT4-fs error (device sdb3): ext4_mb_generate_buddy: EXT4-fs: group 49: 5616 blocks in bitmap, 5730 in gd
Aug 18 23:50:07 newton EXT4-fs error (device sdb3): ext4_mb_generate_buddy: EXT4-fs: group 65: 12586 blocks in bitmap, 12524 in gd
Aug 18 23:50:07 newton mpage_da_map_blocks block allocation failed for inode 275405 at logical offset 0 with max blocks 1 with error -30
Aug 18 23:50:07 newton This should not happen.!! Data will be lost
Aug 18 23:50:07 newton ext4_da_writepages: jbd2_start: 1023 pages, ino 275405; err -30
Aug 18 23:50:07 newton Pid: 344, comm: pdflush Not tainted 2.6.31-rc6-00043-g894ef82 #2
Aug 18 23:50:07 newton Call Trace:
Aug 18 23:50:07 newton [<c05d512b>] ? printk+0x23/0x36
Aug 18 23:50:07 newton [<c0258533>] ext4_da_writepages+0x463/0x4a0
Aug 18 23:50:07 newton [<c01bb54e>] do_writepages+0x2e/0x60
Aug 18 23:50:07 newton [<c0207bbb>] writeback_single_inode+0x16b/0x3c0
Aug 18 23:50:07 newton [<c013109a>] ? set_next_entity+0x15a/0x190
Aug 18 23:50:07 newton [<c02080d5>] generic_sync_sb_inodes+0x165/0x4d0
Aug 18 23:50:07 newton [<c020854e>] writeback_inodes+0x6e/0x110
Aug 18 23:50:07 newton [<c01ba222>] wb_kupdate+0xa2/0x120
Aug 18 23:50:07 newton [<c01bbc40>] ? pdflush+0x0/0x1f0
Aug 18 23:50:07 newton [<c01bbd26>] pdflush+0xe6/0x1f0
Aug 18 23:50:07 newton [<c01ba180>] ? wb_kupdate+0x0/0x120
Aug 18 23:50:07 newton [<c0160304>] kthread+0x84/0x90
Aug 18 23:50:07 newton [<c0160280>] ? kthread+0x0/0x90
Aug 18 23:50:07 newton [<c0103e3f>] kernel_thread_helper+0x7/0x18
Aug 18 23:56:29 newton EXT4-fs: mballoc: 0 blocks 1 reqs (0 success)
Aug 18 23:56:29 newton EXT4-fs: mballoc: 1 extents scanned, 0 goal hits, 1 2^N hits, 0 breaks, 0 lost
Aug 18 23:56:29 newton EXT4-fs: mballoc: 67 generated and it took 5291480
Aug 18 23:56:29 newton EXT4-fs: mballoc: 512 preallocated, 0 discarded
Aug 18 23:56:29 newton EXT4-fs error (device sdb3): ext4_put_super: Couldn't clean up the journal


2009-08-19 13:50:16

by Theodore Ts'o

[permalink] [raw]
Subject: Re: Block allocation failed

On Wed, Aug 19, 2009 at 06:27:46AM +0100, Graham Murray wrote:
> Shortly after rebooting following an untidy shutdown due to an unrelated
> (to ext4 or the filesystem) Oops. Unmounting the partition and running
> e2fsck (version 1.41.8) just replayed the journal and stated the
> filesystem was clean but I did not believe it so ran 'e2fsck -f
> /dev/sdb3' and that showed several errors which I allowed it to correct.

What sort of errors did it find?

> Aug 18 23:50:07 newton EXT4-fs error (device sdb3): ext4_mb_generate_buddy: EXT4-fs: group 35: 3499 blocks in bitmap, 3243 in gd
> Aug 18 23:50:07 newton Aborting journal on device sdb3:8.

Was this right after you mounted the filesystem, or did some time
take place before these errors started showing up?

- Ted

2009-08-19 14:46:01

by Graham Murray

[permalink] [raw]
Subject: Re: Block allocation failed

Theodore Tso <[email protected]> writes:

> On Wed, Aug 19, 2009 at 06:27:46AM +0100, Graham Murray wrote:
>> Shortly after rebooting following an untidy shutdown due to an unrelated
>> (to ext4 or the filesystem) Oops. Unmounting the partition and running
>> e2fsck (version 1.41.8) just replayed the journal and stated the
>> filesystem was clean but I did not believe it so ran 'e2fsck -f
>> /dev/sdb3' and that showed several errors which I allowed it to correct.
>
> What sort of errors did it find?
>
Sorry, but I do not remember all the errors, it was late at night. The
first were some sort of block error with lots of block numbers in ()
which I responded 'y' to fix. Then there were a number of files with
multiply-claimed blocks which I responded 'y' to clone. There were files
containing unallocated or deleted inodes. A number of files were
recovered to Lost+Found. A number of inode reference counts were
wrong. There may have been other errors, but I do not remember what they
were.

>> Aug 18 23:50:07 newton EXT4-fs error (device sdb3): ext4_mb_generate_buddy: EXT4-fs: group 35: 3499 blocks in bitmap, 3243 in gd
>> Aug 18 23:50:07 newton Aborting journal on device sdb3:8.
>
> Was this right after you mounted the filesystem, or did some time
> take place before these errors started showing up?

It was about 90s after the message showing the filesystem mounted

Aug 18 23:48:33 newton EXT4-fs (sdb3): barriers enabled
Aug 18 23:48:33 newton kjournald2 starting: pid 3855, dev sdb3:8, commit
interval 10 seconds
Aug 18 23:48:33 newton EXT4-fs (sdb3): internal journal on sdb3:8
Aug 18 23:48:33 newton EXT4-fs (sdb3): delayed allocation enabled
Aug 18 23:48:33 newton EXT4-fs: file extents enabled
Aug 18 23:48:33 newton EXT4-fs: mballoc enabled
Aug 18 23:48:33 newton EXT4-fs (sdb3): mounted filesystem with ordered
data mode


2009-08-19 15:25:38

by Eric Sandeen

[permalink] [raw]
Subject: Re: Block allocation failed

Graham Murray wrote:
> Theodore Tso <[email protected]> writes:
>
>> On Wed, Aug 19, 2009 at 06:27:46AM +0100, Graham Murray wrote:
>>> Shortly after rebooting following an untidy shutdown due to an unrelated
>>> (to ext4 or the filesystem) Oops. Unmounting the partition and running
>>> e2fsck (version 1.41.8) just replayed the journal and stated the
>>> filesystem was clean but I did not believe it so ran 'e2fsck -f
>>> /dev/sdb3' and that showed several errors which I allowed it to correct.
>> What sort of errors did it find?
>>
> Sorry, but I do not remember all the errors, it was late at night. The
> first were some sort of block error with lots of block numbers in ()
> which I responded 'y' to fix.

Probably something like this:

Block bitmap differences: +(32768--34306) +(65536--66049)
+(98304--99842) +(131072--131585) +(163840--165378) +(196608--197121) ...

indicating corrupt bitmaps I guess, which would explain the later mismatch.

-Eric

2009-08-19 16:20:57

by Theodore Ts'o

[permalink] [raw]
Subject: Re: Block allocation failed

On Wed, Aug 19, 2009 at 03:46:01PM +0100, Graham Murray wrote:
> Sorry, but I do not remember all the errors, it was late at night. The
> first were some sort of block error with lots of block numbers in ()
> which I responded 'y' to fix. Then there were a number of files with
> multiply-claimed blocks which I responded 'y' to clone. There were files
> containing unallocated or deleted inodes. A number of files were
> recovered to Lost+Found. A number of inode reference counts were
> wrong. There may have been other errors, but I do not remember what they
> were.

Hmm... if I had to guess, a portion of the inode table was written to
the wrong location on disk -- on top of another part of the inode
table. That is the most common cause of a large number of multiply
cliamed blocks. Was there more than half-dozen or so such inodes?
And were they numerically contiguous?

> >> Aug 18 23:50:07 newton EXT4-fs error (device sdb3): ext4_mb_generate_buddy: EXT4-fs: group 35: 3499 blocks in bitmap, 3243 in gd
> >> Aug 18 23:50:07 newton Aborting journal on device sdb3:8.
> >
> > Was this right after you mounted the filesystem, or did some time
> > take place before these errors started showing up?
>
> It was about 90s after the message showing the filesystem mounted

Hmm, the most likely cause for that would be if the block group
descriptors had an incorrect number of free blocks. But you had just
run e2fsck -f.

You might want to try running e2fsck -f twice, back to back, saving
the output of buth e2fsck runs. If the second e2fsck finds problems,
then we either have an e2fsck bug, or there is some kind of hardware
problem. Was this filesystme on some kind of RAID system by any chance?

- Ted

2009-08-19 17:33:45

by Graham Murray

[permalink] [raw]
Subject: Re: Block allocation failed

Theodore Tso <[email protected]> writes:

>> It was about 90s after the message showing the filesystem mounted
>
> Hmm, the most likely cause for that would be if the block group
> descriptors had an incorrect number of free blocks. But you had just
> run e2fsck -f.

No, sorry I was obviously not clear as to the order of things. First the
system rebooted itself a few seconds after writing an oops (spinlock bad
magic) to syslog and subitting it to kerneloops. On the reboot, the
syslog showed a normal (ie no error and no recovery) mount followed
about 90s later by the error dump I posted. After this, I unmounted the
filesystem and ran 'e2fsck /dev/sdb3' which just replayed the journal
and said the fs was clean. I then ran 'e2fsck -f' which made several
prompts for problems.