2002-11-10 22:53:26

by Martin Diehl

[permalink] [raw]
Subject: 2.5.46: buffer layer error at fs/buffer.c:399


Hi,

shortly after upgrading 2.5.45->2.5.46 I got this while cp'ing some
regular file. Source and destination was both in rootfs, which is ext2.
It's not exactly reproducible, however apparently triggered by files
larger than about 500KB - at least I got a few with those but never from
smaller one.

Despite this error/warning(?) the requested file was successfully copied
(and diffed). A forced fsck on / didn't find any corruption. And it seems
it doesn't occur if the target file exists, i.e. it's overwritten.

Kernel is 2.5.46 SMP running on UP box. The new extended attributes were
not enabled for ext2.

Martin

-------------------

* this one was triggered by "cp file1 file2" (cwd=/root)

buffer layer error at fs/buffer.c:399
Pass this trace through ksymoops for reporting
Call Trace:
[<c0151db6>] __find_get_block_slow+0xe6/0x150
[<c0152ec6>] __find_get_block+0xd6/0xf0
[<c0153277>] unmap_underlying_metadata+0x17/0x50
[<c0153853>] __block_prepare_write+0x163/0x450
[<c0153885>] __block_prepare_write+0x195/0x450
[<c0138d13>] file_read_actor+0x83/0xf0
[<c0142196>] buffered_rmqueue+0x116/0x120
[<c015434c>] block_prepare_write+0x1c/0x40
[<c01a0e70>] ext2_get_block+0x0/0x420
[<c013a1ad>] generic_file_write_nolock+0x5ed/0x9d0
[<c01a0e70>] ext2_get_block+0x0/0x420
[<c01590d5>] cp_new_stat64+0xa5/0xc0
[<c011883f>] do_page_fault+0x12f/0x44b
[<c013a603>] generic_file_write+0x53/0x70
[<c014fd3d>] vfs_write+0xcd/0x170
[<c0150028>] sys_write+0x28/0x40
[<c0109157>] syscall_call+0x7/0xb

* and this one by "dd if=/dev/zero of=/root/file bs=1k count=512"

buffer layer error at fs/buffer.c:399
Pass this trace through ksymoops for reporting
Call Trace:
[<c0151db6>] __find_get_block_slow+0xe6/0x150
[<c0152ec6>] __find_get_block+0xd6/0xf0
[<c0153277>] unmap_underlying_metadata+0x17/0x50
[<c0153853>] __block_prepare_write+0x163/0x450
[<c0153885>] __block_prepare_write+0x195/0x450
[<c0149075>] __set_page_dirty_nobuffers+0xa5/0xb0
[<c0153bb5>] __block_commit_write+0x75/0xb0
[<c015434c>] block_prepare_write+0x1c/0x40
[<c01a0e70>] ext2_get_block+0x0/0x420
[<c013a1ad>] generic_file_write_nolock+0x5ed/0x9d0
[<c01a0e70>] ext2_get_block+0x0/0x420
[<c012a07a>] update_process_times+0x2a/0x30
[<c01266af>] tasklet_hi_action+0x7f/0xe0
[<c012639c>] do_softirq+0x5c/0xc0
[<c010be45>] do_IRQ+0x1a5/0x1b0
[<c010a048>] common_interrupt+0x18/0x20
[<c013a603>] generic_file_write+0x53/0x70
[<c014fd3d>] vfs_write+0xcd/0x170
[<c012639c>] do_softirq+0x5c/0xc0
[<c0150028>] sys_write+0x28/0x40
[<c0109157>] syscall_call+0x7/0xb


2002-11-11 00:46:42

by Andrew Morton

[permalink] [raw]
Subject: Re: 2.5.46: buffer layer error at fs/buffer.c:399

Martin Diehl wrote:
>
> Hi,
>
> shortly after upgrading 2.5.45->2.5.46 I got this while cp'ing some
> regular file. Source and destination was both in rootfs, which is ext2.
> It's not exactly reproducible, however apparently triggered by files
> larger than about 500KB - at least I got a few with those but never from
> smaller one.
>
> Despite this error/warning(?) the requested file was successfully copied
> (and diffed). A forced fsck on / didn't find any corruption. And it seems
> it doesn't occur if the target file exists, i.e. it's overwritten.
>
> Kernel is 2.5.46 SMP running on UP box. The new extended attributes were
> not enabled for ext2.
>
> Martin
>
> -------------------
>
> * this one was triggered by "cp file1 file2" (cwd=/root)
>
> buffer layer error at fs/buffer.c:399
> Pass this trace through ksymoops for reporting
> Call Trace:
> [<c0151db6>] __find_get_block_slow+0xe6/0x150
> [<c0152ec6>] __find_get_block+0xd6/0xf0
> [<c0153277>] unmap_underlying_metadata+0x17/0x50

This means that we had pagecache floating about which has buffers,
but those buffers had unexpected block numbers. Possibly something
went wrong during an earlier invalidation of the device during the
mount process. Do you remember if a fsck happened during that bootup?

Is there anything unusual about your setup? Using initrd? Is the
rootfs backed by a normal old disk?

Could I please see a `dumpe2fs -h' of that device?

Please, run with this patch and see if if happens again, thanks.


--- 25/fs/buffer.c~buffer-debug Sun Nov 10 16:47:31 2002
+++ 25-akpm/fs/buffer.c Sun Nov 10 16:49:59 2002
@@ -397,6 +397,9 @@ __find_get_block_slow(struct block_devic
bh = bh->b_this_page;
} while (bh != head);
buffer_error();
+ printk("block=%llu, b_blocknr=%llu\n",
+ (unsigned long long)block, (unsigned long long)bh->b_blocknr);
+ printk("b_state=0x%08lx, b_size=%u\n", bh->b_state, bh->b_size);
out_unlock:
spin_unlock(&bd_mapping->private_lock);
page_cache_release(page);

_

2002-11-11 22:07:55

by Martin Diehl

[permalink] [raw]
Subject: Re: 2.5.46: buffer layer error at fs/buffer.c:399

On Sun, 10 Nov 2002, Andrew Morton wrote:

> > buffer layer error at fs/buffer.c:399
> > Pass this trace through ksymoops for reporting
> > Call Trace:
> > [<c0151db6>] __find_get_block_slow+0xe6/0x150
> > [<c0152ec6>] __find_get_block+0xd6/0xf0
> > [<c0153277>] unmap_underlying_metadata+0x17/0x50
>
> This means that we had pagecache floating about which has buffers,
> but those buffers had unexpected block numbers. Possibly something
> went wrong during an earlier invalidation of the device during the
> mount process. Do you remember if a fsck happened during that bootup?

Very likely, yes. AFAICT from the syslog the first occurence was about 4
hours after reboot. During the reboot there was a journal recovery for all
ext3-fs (all except rootfs are ext3). The initscripts force fsck when ext2
appears unclean so I'm pretty sure there was an automatic fsck. But
unfortunately I don't recall for sure whether it detected and fixed any
inconsistency. It completed without manual interaction and the forced fsck
after the fs/buffer.c:399 message didn't show anything.

> Is there anything unusual about your setup? Using initrd? Is the

No initrd. Nothing unusual IMHO. I wouldn't call the 1k blocksize unusual
since it was silently done by some distribution installer quite some time
ago and didn't cause any harm so far. The box had 2.5.45 running for
about a week and I've never seen the buffer layer error message before -
but there were more than a few unclean shutdowns due to sysfs/usb issues.

> rootfs backed by a normal old disk?

It's /dev/hda1 on a 40GB IDE disk - no indication of any IDE problem.

> Could I please see a `dumpe2fs -h' of that device?

see below.

> Please, run with this patch and see if if happens again, thanks.

Despite a few intentional unclean reboots it didn't reappear until now.
I'll keep the patch applied and will report if it does. However, chances
are it won't: After the initial appearance I could retrigger it a few
times when writing larger files. But suddenly it didn't appear anymore.
Maybe it was a singular event, hardly to reproduce.

Thanks,
Martin

-----------------

dumpe2fs 1.25 (20-Sep-2001)
Filesystem volume name: <none>
Last mounted on: <not available>
Filesystem UUID: a0dd0fc0-fe31-11d5-9437-e3df28bee53c
Filesystem magic number: 0xEF53
Filesystem revision #: 1 (dynamic)
Filesystem features: filetype sparse_super
Filesystem state: not clean
Errors behavior: Continue
Filesystem OS type: Linux
Inode count: 128016
Block count: 512032
Reserved block count: 25601
Free blocks: 172670
Free inodes: 99826
First block: 1
Block size: 1024
Fragment size: 1024
Blocks per group: 8192
Fragments per group: 8192
Inodes per group: 2032
Inode blocks per group: 254
Last mount time: Mon Nov 11 19:35:39 2002
Last write time: Mon Nov 11 19:41:44 2002
Mount count: 1
Maximum mount count: 33
Last checked: Mon Nov 11 19:35:38 2002
Check interval: 15552000 (6 months)
Next check after: Sat May 10 20:35:38 2003
Reserved blocks uid: 0 (user root)
Reserved blocks gid: 0 (group root)
First inode: 11
Inode size: 128