2008-03-12 09:40:41

by Tomasz Chmielewski

[permalink] [raw]
Subject: sysfs Kernel BUG when RAID bitmap file has IO errors

I have a RAID-5 bitmap file on external media:

# cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4]
md0 : active raid5 sdb1[0] sdf1[4] sde1[3] sdd1[2] sdc1[1]
1562834944 blocks level 5, 64k chunk, algorithm 2 [5/5] [UUUUU]
bitmap: 0/373 pages [0KB], 512KB chunk, file: /root/raid/bitmap

unused devices: <none>


Unfortunately, that external media is flash which slowly dies:

# md5sum /root/raid/bitmap
md5sum: /root/raid/bitmap: Input/output error


Let's access "/sys/block/md0/md/dev-sdd1/super":

# cat /sys/block/md0/md/dev-sdd1/super

# dmesg -c
------------[ cut here ]------------
Kernel BUG at 78178626 [verbose debug info unavailable]
invalid opcode: 0000 [#8]
Modules linked in: iscsi_scst crc32c libcrc32c scst_vdisk scst
iptable_filter ip_tables x_tables as_iosched deadline_iosched nfs nfsd
lockd nfs_acl auth_rpcgss sunrpc exportfs dm_snapshot dm_mirror bonding
e1000 sata_mv

Pid: 4901, comm: cat Tainted: G D (2.6.24.2-1 #1)
EIP: 0060:[<78178626>] EFLAGS: 00010212 CPU: 0
EIP is at sysfs_read_file+0x88/0xd4
EAX: 00000001 EBX: 91611780 ECX: 00000000 EDX: 976fecc0
ESI: 00001000 EDI: 976fed20 EBP: 9745b344 ESP: 8353df54
DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 0068
Process cat (pid: 4901, ti=8353c000 task=7d4caa40 task.ti=8353c000)
Stack: 93511080 0804b4d8 00001000 0804e000 91611794 783515f0 847e3380
0804e000
7817859e 00001000 7814ad7c 8353dfa0 00001000 847e3380 fffffff7
0804e000
8353c000 7814b0aa 8353dfa0 00000000 00000000 00000000 00000003
0804e000
Call Trace:
[<7817859e>] sysfs_read_file+0x0/0xd4
[<7814ad7c>] vfs_read+0x88/0x10a
[<7814b0aa>] sys_read+0x41/0x67
[<78103bca>] syscall_call+0x7/0xb
=======================
Code: c0 74 61 8b 47 18 8b 4b 0c 8b 40 04 89 43 24 89 e8 8b 74 24 14 8b
57 14 ff 16 89 c6 89 f8 e8 18 0b 00 00 81 fe ff 0f 00 00 7e 04 <0f> 0b
eb fe 85 f6 78 31 c7 43 20 00 00 00 00 89 33 eb 07 be f4
EIP: [<78178626>] sysfs_read_file+0x88/0xd4 SS:ESP 0068:8353df54
---[ end trace 7e6d8636f39e737a ]---



--
Tomasz Chmielewski
http://wpkg.org


2008-03-12 09:51:55

by Tomasz Chmielewski

[permalink] [raw]
Subject: Re: sysfs Kernel BUG when RAID bitmap file has IO errors

Tomasz Chmielewski schrieb:

(...)

> Let's access "/sys/block/md0/md/dev-sdd1/super":
>
> # cat /sys/block/md0/md/dev-sdd1/super
>
> # dmesg -c
> ------------[ cut here ]------------
> Kernel BUG at 78178626 [verbose debug info unavailable]

It turns out a broken RAID bitmap file has nothing to do with it - the
same happens on a different machine without a bitmap file:

------------[ cut here ]------------
Kernel BUG at 7817736a [verbose debug info unavailable]
invalid opcode: 0000 [#1]
Modules linked in: as_iosched nfs lockd nfs_acl sunrpc bonding dm_mirror
dm_snapshot e1000 sata_mv

Pid: 2494, comm: cat Not tainted (2.6.24.3-1 #1)
EIP: 0060:[<7817736a>] EFLAGS: 00010212 CPU: 0
EIP is at sysfs_read_file+0x88/0xd4
EAX: 00000001 EBX: 961b5880 ECX: 00000000 EDX: 964ef360
ESI: 00001000 EDI: 964ef3c0 EBP: 9705bd04 ESP: 971f1f54
DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 0068
Process cat (pid: 2494, ti=971f0000 task=970ad9a0 task.ti=971f0000)
Stack: 96443080 0804b4d8 00001000 0804e000 961b5894 7835f6f0 96193400
0804e000
781772e2 00001000 78149bd5 971f1fa0 00001000 96193400 fffffff7
0804e000
971f0000 78149f03 971f1fa0 00000000 00000000 00000000 00000003
0804e000
Call Trace:
[<781772e2>] sysfs_read_file+0x0/0xd4
[<78149bd5>] vfs_read+0x88/0x10a
[<78149f03>] sys_read+0x41/0x67
[<78103bba>] syscall_call+0x7/0xb
=======================
Code: c0 74 61 8b 47 18 8b 4b 0c 8b 40 04 89 43 24 89 e8 8b 74 24 14 8b
57 14 ff 16 89 c6 89 f8 e8 18 0b 00 00 81 fe ff 0f 00 00 7e 04 <0f> 0b
eb fe 85 f6 78 31 c7 43 20 00 00 00 00 89 33 eb 07 be f4
EIP: [<7817736a>] sysfs_read_file+0x88/0xd4 SS:ESP 0068:971f1f54
---[ end trace e39b88900c26a529 ]---


Certainly, it doesn't happen with 2.6.18 kernels.



--
Tomasz Chmielewski
http://wpkg.org

2008-03-12 22:36:36

by Andrew Morton

[permalink] [raw]
Subject: Re: sysfs Kernel BUG when RAID bitmap file has IO errors

On Wed, 12 Mar 2008 10:51:38 +0100
Tomasz Chmielewski <[email protected]> wrote:

> Tomasz Chmielewski schrieb:
>
> (...)
>
> > Let's access "/sys/block/md0/md/dev-sdd1/super":
> >
> > # cat /sys/block/md0/md/dev-sdd1/super
> >
> > # dmesg -c
> > ------------[ cut here ]------------
> > Kernel BUG at 78178626 [verbose debug info unavailable]
>
> It turns out a broken RAID bitmap file has nothing to do with it - the
> same happens on a different machine without a bitmap file:
>
> ------------[ cut here ]------------
> Kernel BUG at 7817736a [verbose debug info unavailable]

argh. Please do enable CONFIG_DEBUG_BUGVERBOSE.

> invalid opcode: 0000 [#1]
> Modules linked in: as_iosched nfs lockd nfs_acl sunrpc bonding dm_mirror
> dm_snapshot e1000 sata_mv
>
> Pid: 2494, comm: cat Not tainted (2.6.24.3-1 #1)
> EIP: 0060:[<7817736a>] EFLAGS: 00010212 CPU: 0
> EIP is at sysfs_read_file+0x88/0xd4
> EAX: 00000001 EBX: 961b5880 ECX: 00000000 EDX: 964ef360
> ESI: 00001000 EDI: 964ef3c0 EBP: 9705bd04 ESP: 971f1f54
> DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 0068
> Process cat (pid: 2494, ti=971f0000 task=970ad9a0 task.ti=971f0000)
> Stack: 96443080 0804b4d8 00001000 0804e000 961b5894 7835f6f0 96193400
> 0804e000
> 781772e2 00001000 78149bd5 971f1fa0 00001000 96193400 fffffff7
> 0804e000
> 971f0000 78149f03 971f1fa0 00000000 00000000 00000000 00000003
> 0804e000
> Call Trace:
> [<781772e2>] sysfs_read_file+0x0/0xd4
> [<78149bd5>] vfs_read+0x88/0x10a
> [<78149f03>] sys_read+0x41/0x67
> [<78103bba>] syscall_call+0x7/0xb
> =======================
> Code: c0 74 61 8b 47 18 8b 4b 0c 8b 40 04 89 43 24 89 e8 8b 74 24 14 8b
> 57 14 ff 16 89 c6 89 f8 e8 18 0b 00 00 81 fe ff 0f 00 00 7e 04 <0f> 0b
> eb fe 85 f6 78 31 c7 43 20 00 00 00 00 89 33 eb 07 be f4
> EIP: [<7817736a>] sysfs_read_file+0x88/0xd4 SS:ESP 0068:971f1f54

I assume this is the BUG_ON(count >= (ssize_t)PAGE_SIZE) in
fill_read_buffer().

This was reported recently and we prepared a debug patch but the
reporter was unable to trigger the bug again.

Please add the below and retest?


From: Andrew Morton <[email protected]>

Try to find the culprit who caused
http://bugzilla.kernel.org/show_bug.cgi?id=10150

Cc: <[email protected]>
Cc: Greg KH <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
---

drivers/base/core.c | 5 +++++
fs/sysfs/file.c | 8 +++++++-
2 files changed, 12 insertions(+), 1 deletion(-)

diff -puN fs/Kconfig~driver-core-debug-for-bad-dev_attr_show-return-value fs/Kconfig
diff -puN fs/sysfs/file.c~driver-core-debug-for-bad-dev_attr_show-return-value fs/sysfs/file.c
--- a/fs/sysfs/file.c~driver-core-debug-for-bad-dev_attr_show-return-value
+++ a/fs/sysfs/file.c
@@ -12,6 +12,7 @@

#include <linux/module.h>
#include <linux/kobject.h>
+#include <linux/kallsyms.h>
#include <linux/namei.h>
#include <linux/poll.h>
#include <linux/list.h>
@@ -94,7 +95,12 @@ static int fill_read_buffer(struct dentr
* The code works fine with PAGE_SIZE return but it's likely to
* indicate truncated result or overflow in normal use cases.
*/
- BUG_ON(count >= (ssize_t)PAGE_SIZE);
+ if (count >= (ssize_t)PAGE_SIZE) {
+ print_symbol("fill_read_buffer: %s returned bad count\n",
+ (unsigned long)ops->show);
+ /* Try to struggle along */
+ count = PAGE_SIZE - 1;
+ }
if (count >= 0) {
buffer->needs_read_fill = 0;
buffer->count = count;
diff -puN drivers/base/core.c~driver-core-debug-for-bad-dev_attr_show-return-value drivers/base/core.c
--- a/drivers/base/core.c~driver-core-debug-for-bad-dev_attr_show-return-value
+++ a/drivers/base/core.c
@@ -19,6 +19,7 @@
#include <linux/kdev_t.h>
#include <linux/notifier.h>
#include <linux/genhd.h>
+#include <linux/kallsyms.h>
#include <asm/semaphore.h>

#include "base.h"
@@ -68,6 +69,10 @@ static ssize_t dev_attr_show(struct kobj

if (dev_attr->show)
ret = dev_attr->show(dev, dev_attr, buf);
+ if (ret >= (ssize_t)PAGE_SIZE) {
+ print_symbol("dev_attr_show: %s returned bad count\n",
+ (unsigned long)dev_attr->show);
+ }
return ret;
}

_

2008-03-12 22:44:44

by Andrew Morton

[permalink] [raw]
Subject: Re: sysfs Kernel BUG when RAID bitmap file has IO errors

On Wed, 12 Mar 2008 15:36:18 -0700
Andrew Morton <[email protected]> wrote:

> I assume this is the BUG_ON(count >= (ssize_t)PAGE_SIZE) in
> fill_read_buffer().
>
> This was reported recently and we prepared a debug patch but the
> reporter was unable to trigger the bug again.
>
> Please add the below and retest?

err, don't bother.

rdev->sb_size = MD_SB_BYTES;

...

#define MD_SB_BYTES 4096

...

and

rdev->sb_size = le32_to_cpu(sb->max_dev) * 2 + 256;
bmask = queue_hardsect_size(rdev->bdev->bd_disk->queue)-1;
if (rdev->sb_size & bmask)
rdev->sb_size = (rdev->sb_size | bmask) + 1;

I'm assuming that either the above sequence of code never ran at all, or it
produced sb_size >= PAGE_SIZE.

then this:

static ssize_t
super_show(mdk_rdev_t *rdev, char *page)
{
if (rdev->sb_loaded && rdev->sb_size) {
memcpy(page, page_address(rdev->sb_page), rdev->sb_size);
return rdev->sb_size;
} else
return 0;
}

returns something >= PAGE_SIZE then this:

static int fill_read_buffer(struct dentry * dentry, struct sysfs_buffer * buffer)
{
struct sysfs_dirent *attr_sd = dentry->d_fsdata;
struct kobject *kobj = attr_sd->s_parent->s_dir.kobj;
struct sysfs_ops * ops = buffer->ops;
int ret = 0;
ssize_t count;

if (!buffer->page)
buffer->page = (char *) get_zeroed_page(GFP_KERNEL);
if (!buffer->page)
return -ENOMEM;

/* need attr_sd for attr and ops, its parent for kobj */
if (!sysfs_get_active_two(attr_sd))
return -ENODEV;

buffer->event = atomic_read(&attr_sd->s_attr.open->event);
count = ops->show(kobj, attr_sd->s_attr.attr, buffer->page);

sysfs_put_active_two(attr_sd);

/*
* The code works fine with PAGE_SIZE return but it's likely to
* indicate truncated result or overflow in normal use cases.
*/
BUG_ON(count >= (ssize_t)PAGE_SIZE);

declares game over.

Neil, pls fix?

2008-03-14 04:35:17

by NeilBrown

[permalink] [raw]
Subject: Re: sysfs Kernel BUG when RAID bitmap file has IO errors

On Wednesday March 12, [email protected] wrote:
> On Wed, 12 Mar 2008 15:36:18 -0700
> Andrew Morton <[email protected]> wrote:
>
> > I assume this is the BUG_ON(count >= (ssize_t)PAGE_SIZE) in
> > fill_read_buffer().

That's a new "BUG_ON".
Before commit 8118a859dc7abd873193986c77a8d9bdb877adc8
it was
BUG_ON(count > (ssize_t)PAGE_SIZE);

But I've been looking for an excuse to remove the "super" file from
sysfs for a while.
It doesn't really fit the sysfs model, and now clearly breaks it. And
it isn't really all that useful (you can just read from the raw device
if you want to see the superblock.
It was a "seems like a good idea at the time" thing which I now
regret.

So: I'll just remove it. (with apologies to anyone who actually uses
it, but I suspect there are only two of you :-)

Patch to follow

NeilBrown