2005-09-05 00:46:32

by Wilco Baan Hofman

Subject: RAID1 ramdisk patch

diff -urN linux-2.6.13-rc6.orig/include/linux/raid/raid1.h linux-2.6.13-rc6/include/linux/raid/raid1.h
--- linux-2.6.13-rc6.orig/include/linux/raid/raid1.h 2005-08-07 20:18:56.000000000 +0200
+++ linux-2.6.13-rc6/include/linux/raid/raid1.h 2005-09-04 11:41:24.000000000 +0200
@@ -32,6 +32,7 @@
int raid_disks;
int working_disks;
int last_used;
+ int preferred_read_disk;
sector_t next_seq_sect;
spinlock_t device_lock;
diff -urN linux-2.6.13-rc6.orig/drivers/md/raid1.c linux-2.6.13-rc6/drivers/md/raid1.c
--- linux-2.6.13-rc6.orig/drivers/md/raid1.c 2005-08-07 20:18:56.000000000 +0200
+++ linux-2.6.13-rc6/drivers/md/raid1.c 2005-09-05 01:54:26.000000000 +0200
@@ -21,6 +21,8 @@
* Additions to bitmap code, (C) 2003-2004 Paul Clements, SteelEye Technology:
* - persistent bitmap code
*
+ * Special handling of ramdisk (C) 2005 Wilco Baan Hofman <[email protected]>
+ *
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation; either version 2, or (at your option)
@@ -399,8 +401,6 @@
goto rb_out;
}
}
- disk = new_disk;
- /* now disk == new_disk == starting point for search */

/*
* Don't change to another disk for sequential reads:
@@ -409,7 +409,18 @@
goto rb_out;
if (this_sector == conf->mirrors[new_disk].head_position)
goto rb_out;
-
+
+ /* [SYN] If the preferred disk exists, return it */
+ if (conf->preferred_read_disk != -1 &&
+ (new_rdev=conf->mirrors[conf->preferred_read_disk].rdev) != NULL &&
+ new_rdev->in_sync) {
+ new_disk = conf->preferred_read_disk;
+ goto rb_out;
+ }
+
+ disk = new_disk;
+ /* now disk == new_disk == starting point for search */
+
current_distance = abs(this_sector - conf->mirrors[disk].head_position);

/* Find the disk whose head is closest */
@@ -1292,10 +1303,11 @@
static int run(mddev_t *mddev)
{
conf_t *conf;
- int i, j, disk_idx;
+ int i, j, disk_idx, ram_count;
mirror_info_t *disk;
mdk_rdev_t *rdev;
struct list_head *tmp;
+ char b[BDEVNAME_SIZE];

if (mddev->level != 1) {
printk("raid1: %s: raid level not set to mirroring (%d)\n",
@@ -1417,6 +1429,30 @@
mddev->queue->unplug_fn = raid1_unplug;
mddev->queue->issue_flush_fn = raid1_issue_flush;

+ /* [SYN] if there is a ram disk, that will be the preferred disk.
+ * .. unless there are multiple ram disks. */
+ conf->preferred_read_disk = -1;
+ for (i = 0,
+ ram_count = 0;
+ i < mddev->raid_disks;
+ i++) {
+
+ bdevname(conf->mirrors[i].rdev->bdev, b);
+ if (strncmp(b, "ram", 3) == 0) {
+ if (ram_count) {
+ conf->preferred_read_disk = -1;
+ break;
+ }
+ conf->preferred_read_disk = i;
+ ram_count++;
+ }
+ }
+ if (conf->preferred_read_disk >= 0) {
+ printk(KERN_INFO "raid1: One ram disk (%s) found, setting it preferred read disk.\n",
+ bdevname(conf->mirrors[conf->preferred_read_disk].rdev->bdev, b));
+ }
+
+
return 0;

out_no_mem:


Attachments:
syn-raid1ramdisk-20050905.patch (2.88 kB)

2005-09-05 01:28:00

by NeilBrown

Subject: Re: RAID1 ramdisk patch

On Monday September 5, [email protected] wrote:
> Hi all,
>
> I have written a small patch for use with a HDD-backed ramdisk in the md
> raid1 driver. The raid1 driver usually does read balancing on the disks,
> but I feel that if it encounters a single ram disk in the array that
> should be the preferred read disk. The application of this would be for
> example a 2GB ram disk in raid1 with a 2GB partition, where the ram disk
> is used for reading and both 'disks' used for writing.
>
> Attached is a bit of code which checks for a ram-disk and sets it as
> preferred disk. It also checks if the ram disk is in sync before
> allowing the read.

Hi,
equivalent functionality is now available in 2.6-mm and is referred
to as 'write mostly'.
If you use mdadm-2.0 and mark a device as --write-mostly, then all
read requests will go to the other device(s) if possible.
e.g.
mdadm --create /dev/md0 --level=1 --raid-disks=2 /dev/ramdisk \
--writemostly /dev/realdisk

Does this suit your needs?

You can also arrange for the write to the writemostly device to be
'write-behind' so that the filesystem doesn't wait for the write to
complete. This can reduce write-latency (though not increase write
throughput) at a very small cost of reliability (if the RAM dies, the
disk may not be 100% up-to-date).
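Spelled out with write-behind as well, creation would look something like this (a sketch: the write-behind depth and device names are illustrative, and write-behind needs a write-intent bitmap):

```shell
# RAID1 where the ramdisk serves reads and the real disk is marked
# write-mostly with write-behind buffering.  A write-intent bitmap is
# required for write-behind.  Device names and the write-behind depth
# (max outstanding writes) are illustrative.
mdadm --create /dev/md0 --level=1 --raid-disks=2 \
    --bitmap=internal --write-behind=256 \
    /dev/ramdisk \
    --write-mostly /dev/realdisk
```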

NeilBrown

2005-09-05 07:40:40

by Wilco Baan Hofman

Subject: Re: RAID1 ramdisk patch

Neil Brown wrote:

>On Monday September 5, [email protected] wrote:
>
>
>>Hi all,
>>
>>I have written a small patch for use with a HDD-backed ramdisk in the md
>>raid1 driver. The raid1 driver usually does read balancing on the disks,
>>but I feel that if it encounters a single ram disk in the array that
>>should be the preferred read disk. The application of this would be for
>>example a 2GB ram disk in raid1 with a 2GB partition, where the ram disk
>>is used for reading and both 'disks' used for writing.
>>
>>Attached is a bit of code which checks for a ram-disk and sets it as
>>preferred disk. It also checks if the ram disk is in sync before
>>allowing the read.
>>
>>
>
>Hi,
> equivalent functionality is now available in 2.6-mm and is referred
> to as 'write mostly'.
> If you use mdadm-2.0 and mark a device as --write-mostly, then all
> read requests will go to the other device(s) if possible.
> e.g.
> mdadm --create /dev/md0 --level=1 --raid-disks=2 /dev/ramdisk \
> --writemostly /dev/realdisk
>
> Does this suit your needs?
>
> You can also arrange for the write to the writemostly device to be
> 'write-behind' so that the filesystem doesn't wait for the write to
> complete. This can reduce write-latency (though not increase write
> throughput) at a very small cost of reliability (if the RAM dies, the
> disk may not be 100% up-to-date).
>
>NeilBrown
>
>
>
I was looking for that (but couldn't find it).

At this point I don't see why it wouldn't. If that also syncs from the
partition, then it's basically the same functionality, just written from a
different perspective.

To use it I'll have to deviate from stock Linux and use a non-packaged
mdadm, but that is better than applying my patch on every kernel update ;-)

Thanks, I'll look into it.

Wilco Baan Hofman

2005-11-16 13:36:31

by Sander

Subject: segfault mdadm --write-behind, 2.6.14-mm2 (was: Re: RAID1 ramdisk patch)

Neil Brown wrote (ao):
> If you use mdadm-2.0 and mark a device as --write-mostly, then all
> read requests will go to the other device(s) if possible,.
> e.g.
> mdadm --create /dev/md0 --level=1 --raid-disks=2 /dev/ramdisk \
> --writemostly /dev/realdisk
>
> Does this suit your needs?
>
> You can also arrange for the write to the writemostly device to be
> 'write-behind' so that the filesystem doesn't wait for the write to
> complete. This can reduce write-latency (though not increase write
> throughput) at a very small cost of reliability (if the RAM dies, the
> disk may not be 100% up-to-date).

With 2.6.14-mm2 (x86) and mdadm 2.1 I get a Segmentation fault when I
try this:

mdadm -C /dev/md1 -l1 -n2 --bitmap=/storage/md1.bitmap /dev/loop0 \
--write-behind /dev/loop1

loop0 is attached to a file on tmpfs, and loop1 is attached
to a file on an lvm2 volume (reiser4, if that matters).
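The backing devices were set up along these lines (a sketch; backing-file paths and sizes are illustrative):

```shell
# Back two loop devices with files, then try to create the bitmap-backed
# RAID1.  Paths and sizes are illustrative; losetup and mdadm need root.
dd if=/dev/zero of=/tmp/img0 bs=1M count=128        # on tmpfs
dd if=/dev/zero of=/storage/img1 bs=1M count=128    # on the lvm2 volume
losetup /dev/loop0 /tmp/img0
losetup /dev/loop1 /storage/img1
mdadm -C /dev/md1 -l1 -n2 --bitmap=/storage/md1.bitmap /dev/loop0 \
    --write-behind /dev/loop1
```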

I can create and use the array with:

mdadm -C /dev/md1 -l1 -n2 /dev/loop0 /dev/loop1

and

mdadm -C /dev/md1 -l1 -n2 /dev/loop0 --write-mostly /dev/loop1

mdadm is compiled with:
gcc (GCC) 4.0.3 20051023 (prerelease) (Debian 4.0.2-3)

Can/should I provide more info?

With kind regards, Sander

This is what I get if I reboot, create the images with dd,
attach them with losetup and try to create the array with mdadm:


[42949575.730000] loop: loaded (max 8 devices)
[42949584.840000] md: bind<loop0>
[42949584.840000] md: bind<loop1>
[42949584.840000] md: md1: raid array is not clean -- starting background reconstruction
[42949584.840000] md1: bitmap file is out of date (0 < 1) -- forcing full recovery
[42949584.840000] md1: bitmap file is out of date, doing full recovery
[42949584.840000] Unable to handle kernel NULL pointer dereference at virtual address 00000008
[42949584.840000] printing eip:
[42949584.840000] c01c33dd
[42949584.840000] *pde = 00000000
[42949584.840000] Oops: 0000 [#1]
[42949584.840000] last sysfs file: /devices/pci0000:00/0000:00:11.0/i2c-0/name
[42949584.840000] Modules linked in: loop dm_mod i2c_viapro i2c_core
[42949584.840000] CPU: 0
[42949584.840000] EIP: 0060:[<c01c33dd>] Not tainted VLI
[42949584.840000] EFLAGS: 00010286 (2.6.14-mm2)
[42949584.840000] EIP is at prepare_write_unix_file+0x1d/0xab
[42949584.840000] eax: 00000000 ebx: c01c33c0 ecx: 00000000 edx: c104ce60
[42949584.840000] esi: c104ce60 edi: f2f2f4a0 ebp: 00000000 esp: c2d6bd90
[42949584.840000] ds: 007b es: 007b ss: 0068
[42949584.840000] Process mdadm (pid: 749, threadinfo=c2d6b000 task=c3784580)
[42949584.840000] Stack: 30303034 00000000 c104ce60 c01c33c0 c104ce60 f2f2f4a0 00000001 c02b00f2
[42949584.840000] 00001000 00000f00 f2f2f4a0 c2674000 c104ce60 c02b1154 c03a97dc f7c278cc
[42949584.840000] c2d6bddc c02b05b4 c03a975c f7c278cc 00000000 00000000 00000000 00031f20
[42949584.840000] Call Trace:
[42949584.840000] [<c01c33c0>] prepare_write_unix_file+0x0/0xab
[42949584.840000] [<c02b00f2>] write_page+0x52/0x140
[42949584.840000] [<c02b1154>] bitmap_init_from_disk+0x384/0x450
[42949584.840000] [<c02b05b4>] bitmap_read_sb+0x84/0x2f0
[42949584.840000] [<c02b21f3>] bitmap_create+0x1a3/0x2a0
[42949584.840000] [<c02ab95a>] do_md_run+0x2ba/0x500
[42949584.840000] [<c02ac8a7>] add_new_disk+0x157/0x3b0
[42949584.840000] [<c0179034>] mpage_writepages+0x124/0x3d0
[42949584.840000] [<c013c23e>] __pagevec_free+0x3e/0x60
[42949584.840000] [<c013eff9>] release_pages+0x29/0x160
[42949584.840000] [<c02adb81>] md_ioctl+0x5a1/0x630
[42949584.840000] [<c0137918>] find_get_pages+0x18/0x40
[42949584.840000] [<c02ad5e0>] md_ioctl+0x0/0x630
[42949584.840000] [<c01ede74>] blkdev_driver_ioctl+0x54/0x60
[42949584.840000] [<c01edfb4>] blkdev_ioctl+0x134/0x180
[42949584.840000] [<c015e158>] block_ioctl+0x18/0x20
[42949584.840000] [<c015e140>] block_ioctl+0x0/0x20
[42949584.840000] [<c01674ff>] do_ioctl+0x1f/0x70
[42949584.840000] [<c016769c>] vfs_ioctl+0x5c/0x1e0
[42949584.840000] [<c0156c91>] __fput+0xe1/0x140
[42949584.840000] [<c016785d>] sys_ioctl+0x3d/0x70
[42949584.840000] [<c0102f49>] syscall_call+0x7/0xb
[42949584.840000] Code: 02 00 00 eb 89 89 f6 8d bc 27 00 00 00 00 83 ec 1c 89 5c 24 0c 89 7c 24 14 89 6c 24 18 89 c5 89 74 24 10 89 54 24 08 89 4c 24 04 <8b> 40 08 8b 40 08 8b 80 94 00 00 00 e8 92 20 fd ff 3d 18 fc ff
[42949584.840000]


--
Humilis IT Services and Solutions
http://www.humilis.net

2005-11-16 22:20:37

by Andrew Morton

Subject: Re: segfault mdadm --write-behind, 2.6.14-mm2 (was: Re: RAID1 ramdisk patch)

Sander <[email protected]> wrote:
>
> Neil Brown wrote (ao):
> > If you use mdadm-2.0 and mark a device as --write-mostly, then all
> > read requests will go to the other device(s) if possible.
> > e.g.
> > mdadm --create /dev/md0 --level=1 --raid-disks=2 /dev/ramdisk \
> > --writemostly /dev/realdisk
> >
> > Does this suit your needs?
> >
> > You can also arrange for the write to the writemostly device to be
> > 'write-behind' so that the filesystem doesn't wait for the write to
> > complete. This can reduce write-latency (though not increase write
> > throughput) at a very small cost of reliability (if the RAM dies, the
> > disk may not be 100% up-to-date).
>
> With 2.6.14-mm2 (x86) and mdadm 2.1 I get a Segmentation fault when I
> try this:

It oopsed in reiser4. reiserfs-dev added to Cc...

> mdadm -C /dev/md1 -l1 -n2 --bitmap=/storage/md1.bitmap /dev/loop0 \
> --write-behind /dev/loop1
>
> loop0 is attached to a file on tmpfs, and loop1 is attached
> to a file on a lvm2 volume (reiser4, if that matters).
>
> I can create and use the array with:
>
> mdadm -C /dev/md1 -l1 -n2 /dev/loop0 /dev/loop1
>
> and
>
> mdadm -C /dev/md1 -l1 -n2 /dev/loop0 --write-mostly /dev/loop1
>
> mdadm is compiled with:
> gcc (GCC) 4.0.3 20051023 (prerelease) (Debian 4.0.2-3)
>
> Can/should I provide more info?
>
> With kind regards, Sander
>
> This is what I get if I reboot, create the images with dd,
> attach them with losetup and try to create the array with mdadm:
>
>
> [oops trace snipped; quoted in full in the previous message]
>
>
> --
> Humilis IT Services and Solutions
> http://www.humilis.net

2005-11-16 23:08:42

by NeilBrown

Subject: Re: segfault mdadm --write-behind, 2.6.14-mm2 (was: Re: RAID1 ramdisk patch)

On Wednesday November 16, [email protected] wrote:
> Sander <[email protected]> wrote:
> >
> >
> > With 2.6.14-mm2 (x86) and mdadm 2.1 I get a Segmentation fault when I
> > try this:
>
> It oopsed in reiser4. reiserfs-dev added to Cc...
>

Hmm... It appears that md/bitmap is calling prepare_write and
commit_write with 'file' as NULL - this works for some filesystems,
but not for reiser4.

Does this patch help?

Signed-off-by: Neil Brown <[email protected]>

### Diffstat output
./drivers/md/bitmap.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff ./drivers/md/bitmap.c~current~ ./drivers/md/bitmap.c
--- ./drivers/md/bitmap.c~current~ 2005-11-17 10:05:18.000000000 +1100
+++ ./drivers/md/bitmap.c 2005-11-17 10:05:40.000000000 +1100
@@ -326,9 +326,9 @@ static int write_page(struct bitmap *bit
}
}

- ret = page->mapping->a_ops->prepare_write(NULL, page, 0, PAGE_SIZE);
+ ret = page->mapping->a_ops->prepare_write(bitmap->file, page, 0, PAGE_SIZE);
if (!ret)
- ret = page->mapping->a_ops->commit_write(NULL, page, 0,
+ ret = page->mapping->a_ops->commit_write(bitmap->file, page, 0,
PAGE_SIZE);
if (ret) {
unlock_page(page);

2005-11-17 07:50:31

by Sander

Subject: Re: segfault mdadm --write-behind, 2.6.14-mm2 (was: Re: RAID1 ramdisk patch)

Neil Brown wrote (ao):
> On Wednesday November 16, [email protected] wrote:
> > Sander <[email protected]> wrote:
> > > With 2.6.14-mm2 (x86) and mdadm 2.1 I get a Segmentation fault when I
> > > try this:
> >
> > It oopsed in reiser4. reiserfs-dev added to Cc...
> >
>
> Hmm... It appears that md/bitmap is calling prepare_write and
> commit_write with 'file' as NULL - this works for some filesystems,
> but not for reiser4.
>
> Does this patch help?

Something changed, but it seems it didn't fix it:

# mdadm -C /dev/md1 --bitmap=/storage/raid1.bitmap -l1 -n2 /dev/loop0 --write-behind /dev/loop1
mdadm: RUN_ARRAY failed: No such file or directory

(Google didn't turn up this exact error, though it found plenty of similar
ones without the 'No such file or directory' part)

[42949645.530000] md: bind<loop0>
[42949645.540000] md: bind<loop1>
[42949645.540000] md: md1: raid array is not clean -- starting background reconstruction
[42949645.540000] md1: bitmap file is out of date (0 < 1) -- forcing full recovery
[42949645.540000] md1: bitmap file is out of date, doing full recovery
[42949645.560000] md1: bitmap initialized from disk: read 0/7 pages, set 0 bits, status: 1
[42949645.560000] md1: failed to create bitmap (1)
[42949645.560000] md: pers->run() failed ...
[42949645.560000] md: md1 stopped.
[42949645.560000] md: unbind<loop1>
[42949645.560000] md: export_rdev(loop1)
[42949645.560000] md: unbind<loop0>
[42949645.560000] md: export_rdev(loop0)

# ls -l /storage/raid1.bitmap
-rw-r--r-- 1 root root 25856 Nov 17 08:37 /storage/raid1.bitmap

(the file is there, let's try again)

~# mdadm -C /dev/md1 --bitmap=/storage/raid1.bitmap -l1 -n2 /dev/loop0 --write-behind /dev/loop1
mdadm: /dev/loop0 appears to be part of a raid array:
level=raid1 devices=2 ctime=Thu Nov 17 08:37:58 2005
mdadm: /dev/loop1 appears to be part of a raid array:
level=raid1 devices=2 ctime=Thu Nov 17 08:37:58 2005
Continue creating array? yes
mdadm: bitmap file /storage/raid1.bitmap already exists, use --force to overwrite

(OK, let's try with a new bitmap file)

# mdadm -C /dev/md1 --bitmap=/storage/raid.bitmap -l1 -n2 /dev/loop0 --write-behind /dev/loop1
mdadm: /dev/loop0 appears to be part of a raid array:
level=raid1 devices=2 ctime=Thu Nov 17 08:37:58 2005
mdadm: /dev/loop1 appears to be part of a raid array:
level=raid1 devices=2 ctime=Thu Nov 17 08:37:58 2005
Continue creating array? yes
mdadm: RUN_ARRAY failed: No such file or directory

(that doesn't work either, let's force the first one)

# mdadm -C /dev/md1 --bitmap=/storage/raid1.bitmap -f -l1 -n2 /dev/loop0 --write-behind /dev/loop1
mdadm: /dev/loop0 appears to be part of a raid array:
level=raid1 devices=2 ctime=Thu Nov 17 08:40:50 2005
mdadm: /dev/loop1 appears to be part of a raid array:
level=raid1 devices=2 ctime=Thu Nov 17 08:40:50 2005
Continue creating array? yes
Segmentation fault


For some reason, the dmesg is quite a bit longer now.

[42949831.700000] Bad page state at free_hot_cold_page (in process 'mdadm', page c1043220)
[42949831.700000] flags:0x80000001 mapping:00000000 mapcount:0 count:0
[42949831.700000] Backtrace:
[42949831.700000] [<c013b320>] bad_page+0x70/0xb0
[42949831.700000] [<c013bab1>] free_hot_cold_page+0x51/0xd0
[42949831.700000] [<c013f5da>] truncate_inode_pages_range+0x11a/0x310
[42949831.700000] [<c01a2ac0>] reiser4_invalidate_pages+0x90/0xc0
[42949831.700000] [<c01ba5ed>] kill_hook_extent+0x17d/0x5b0
[42949831.700000] [<c01ac29c>] plugin_by_unsafe_id+0x1c/0x110
[42949831.700000] [<c01ba470>] kill_hook_extent+0x0/0x5b0
[42949831.700000] [<c01cd7fd>] call_kill_hooks+0x9d/0xc0
[42949831.700000] [<c01cd8f0>] kill_head+0x0/0x40
[42949831.700000] [<c01cdf76>] prepare_for_compact+0x536/0x540
[42949831.700000] [<c0192a0e>] lock_tail+0x1e/0x40
[42949831.700000] [<c01ac29c>] plugin_by_unsafe_id+0x1c/0x110
[42949831.700000] [<c01cd820>] kill_units+0x0/0x80
[42949831.700000] [<c01cd8f0>] kill_head+0x0/0x40
[42949831.700000] [<c0192933>] longterm_unlock_znode+0xa3/0x160
[42949831.700000] [<c0192bf3>] longterm_lock_znode+0x163/0x250
[42949831.700000] [<c018ce4b>] jload_gfp+0x5b/0x140
[42949831.700000] [<c01cdfb1>] kill_node40+0x31/0xc0
[42949831.700000] [<c0191a88>] carry_cut+0x48/0x60
[42949831.700000] [<c018f458>] carry_on_level+0x38/0xc0
[42949831.700000] [<c018f302>] carry+0x82/0x1a0
[42949831.700000] [<c018f704>] add_carry+0x24/0x40
[42949831.700000] [<c018f51d>] post_carry+0x3d/0xa0
[42949831.710000] [<c0194886>] kill_node_content+0xf6/0x160
[42949831.710000] [<c0194e39>] cut_tree_worker_common+0x159/0x350
[42949831.710000] [<c0194ce0>] cut_tree_worker_common+0x0/0x350
[42949831.710000] [<c0195155>] cut_tree_object+0x125/0x240
[42949831.710000] [<c0196d29>] reiser4_grab_reserved+0x49/0x190
[42949831.710000] [<c018d04f>] jrelse+0xf/0x20
[42949831.710000] [<c01bfc81>] cut_file_items+0xb1/0x180
[42949831.710000] [<c01a0108>] add_empty_leaf+0xa8/0x220
[42949831.710000] [<c01bfdab>] shorten_file+0x4b/0x260
[42949831.710000] [<c01bfb40>] update_file_size+0x0/0x90
[42949831.710000] [<c01c2f03>] setattr_truncate+0x73/0x210
[42949831.710000] [<c01ad384>] permission_common+0x24/0x40
[42949831.710000] [<c01ad360>] permission_common+0x0/0x40
[42949831.710000] [<c0162b78>] permission+0x48/0x90
[42949831.710000] [<c0163119>] __link_path_walk+0x89/0xc40
[42949831.710000] [<c01c30fe>] setattr_unix_file+0x5e/0xc0
[42949831.710000] [<c016f58f>] notify_change+0xcf/0x2d5
[42949831.710000] [<c0163d3f>] link_path_walk+0x6f/0xe0
[42949831.710000] [<c0153e9b>] do_truncate+0x4b/0x70
[42949831.710000] [<c0162b78>] permission+0x48/0x90
[42949831.710000] [<c0164704>] may_open+0x184/0x1d0
[42949831.710000] [<c01647d5>] open_namei+0x85/0x560
[42949831.710000] [<c0154fe2>] filp_open+0x22/0x50
[42949831.710000] [<c01551ad>] get_unused_fd+0x4d/0xb0
[42949831.710000] [<c01552c1>] do_sys_open+0x41/0xd0
[42949831.710000] [<c0102f49>] syscall_call+0x7/0xb
[42949831.710000] Trying to fix it up, but a reboot is needed
[42949831.710000] ------------[ cut here ]------------
[42949831.710000] kernel BUG at mm/filemap.c:480!
[42949831.710000] invalid operand: 0000 [#1]
[42949831.710000] last sysfs file: /devices/pci0000:00/0000:00:11.0/i2c-0/name
[42949831.710000] Modules linked in: loop dm_mod i2c_viapro i2c_core
[42949831.710000] CPU: 0
[42949831.710000] EIP: 0060:[<c013763d>] Tainted: G B VLI
[42949831.710000] EFLAGS: 00010246 (2.6.14-mm2)
[42949831.710000] EIP is at unlock_page+0xd/0x30
[42949831.710000] eax: 00000000 ebx: c1043220 ecx: c03cad30 edx: c1652218
[42949831.710000] esi: 00000001 edi: 00000000 ebp: 00000006 esp: c26c298c
[42949831.710000] ds: 007b es: 007b ss: 0068
[42949831.710000] Process mdadm (pid: 785, threadinfo=c26c2000 task=c6f64050)
[42949831.710000] Stack: c1043220 c013f5e1 0000000e 00007000 f2fb87ec 00000000 00000000 00000007
[42949831.710000] 00000000 c1043220 c1045260 c1040240 c1040260 c1042820 c1042800 c10415e0
[42949831.710000] 00007000 00000000 00000000 00000000 00000006 f2fb8810 00000001 00006fff
[42949831.710000] Call Trace:
[42949831.710000] [<c013f5e1>] truncate_inode_pages_range+0x121/0x310
[42949831.710000] [<c01a2ac0>] reiser4_invalidate_pages+0x90/0xc0
[42949831.710000] [<c01ba5ed>] kill_hook_extent+0x17d/0x5b0
[42949831.710000] [<c01ac29c>] plugin_by_unsafe_id+0x1c/0x110
[42949831.710000] [<c01ba470>] kill_hook_extent+0x0/0x5b0
[42949831.710000] [<c01cd7fd>] call_kill_hooks+0x9d/0xc0
[42949831.710000] [<c01cd8f0>] kill_head+0x0/0x40
[42949831.710000] [<c01cdf76>] prepare_for_compact+0x536/0x540
[42949831.710000] [<c0192a0e>] lock_tail+0x1e/0x40
[42949831.710000] [<c01ac29c>] plugin_by_unsafe_id+0x1c/0x110
[42949831.710000] [<c01cd820>] kill_units+0x0/0x80
[42949831.710000] [<c01cd8f0>] kill_head+0x0/0x40
[42949831.710000] [<c0192933>] longterm_unlock_znode+0xa3/0x160
[42949831.710000] [<c0192bf3>] longterm_lock_znode+0x163/0x250
[42949831.710000] [<c018ce4b>] jload_gfp+0x5b/0x140
[42949831.710000] [<c01cdfb1>] kill_node40+0x31/0xc0
[42949831.710000] [<c0191a88>] carry_cut+0x48/0x60
[42949831.710000] [<c018f458>] carry_on_level+0x38/0xc0
[42949831.710000] [<c018f302>] carry+0x82/0x1a0
[42949831.710000] [<c018f704>] add_carry+0x24/0x40
[42949831.710000] [<c018f51d>] post_carry+0x3d/0xa0
[42949831.710000] [<c0194886>] kill_node_content+0xf6/0x160
[42949831.710000] [<c0194e39>] cut_tree_worker_common+0x159/0x350
[42949831.710000] [<c0194ce0>] cut_tree_worker_common+0x0/0x350
[42949831.710000] [<c0195155>] cut_tree_object+0x125/0x240
[42949831.710000] [<c0196d29>] reiser4_grab_reserved+0x49/0x190
[42949831.710000] [<c018d04f>] jrelse+0xf/0x20
[42949831.710000] [<c01bfc81>] cut_file_items+0xb1/0x180
[42949831.710000] [<c01a0108>] add_empty_leaf+0xa8/0x220
[42949831.710000] [<c01bfdab>] shorten_file+0x4b/0x260
[42949831.710000] [<c01bfb40>] update_file_size+0x0/0x90
[42949831.710000] [<c01c2f03>] setattr_truncate+0x73/0x210
[42949831.710000] [<c01ad384>] permission_common+0x24/0x40
[42949831.710000] [<c01ad360>] permission_common+0x0/0x40
[42949831.710000] [<c0162b78>] permission+0x48/0x90
[42949831.710000] [<c0163119>] __link_path_walk+0x89/0xc40
[42949831.710000] [<c01c30fe>] setattr_unix_file+0x5e/0xc0
[42949831.710000] [<c016f58f>] notify_change+0xcf/0x2d5
[42949831.710000] [<c0163d3f>] link_path_walk+0x6f/0xe0
[42949831.710000] [<c0153e9b>] do_truncate+0x4b/0x70
[42949831.710000] [<c0162b78>] permission+0x48/0x90
[42949831.710000] [<c0164704>] may_open+0x184/0x1d0
[42949831.710000] [<c01647d5>] open_namei+0x85/0x560
[42949831.710000] [<c0154fe2>] filp_open+0x22/0x50
[42949831.710000] [<c01551ad>] get_unused_fd+0x4d/0xb0
[42949831.710000] [<c01552c1>] do_sys_open+0x41/0xd0
[42949831.710000] [<c0102f49>] syscall_call+0x7/0xb
[42949831.710000] Code: e8 69 ff ff ff 89 da b9 20 6f 13 c0 c7 04 24 02 00 00 00 e8 e6 77 22 00 83 c4 20 5b c3 90 53 89 c3 0f ba 30 00 19 c0 85 c0 75 08 <0f> 0b e0 01 f8 6a 38 c0 89 d8 e8 34 ff ff ff 89 da 31 c9 5b e9
[42949831.710000]

--
Humilis IT Services and Solutions
http://www.humilis.net

2005-11-17 10:12:51

by Sander

Subject: Re: segfault mdadm --write-behind, 2.6.14-mm2 (was: Re: RAID1 ramdisk patch)

Sander wrote (ao):
# Neil Brown wrote (ao):
# > On Wednesday November 16, [email protected] wrote:
# > > Sander <[email protected]> wrote:
# > > > With 2.6.14-mm2 (x86) and mdadm 2.1 I get a Segmentation fault when I
# > > > try this:
# > >
# > > It oopsed in reiser4. reiserfs-dev added to Cc...
# > >
# >
# > Hmm... It appears that md/bitmap is calling prepare_write and
# > commit_write with 'file' as NULL - this works for some filesystems,
# > but not for reiser4.
# >
# > Does this patch help?
#
# Something changed, but it didn't fix it it seems:
#
# # mdadm -C /dev/md1 --bitmap=/storage/raid1.bitmap -l1 -n2 /dev/loop0 --write-behind /dev/loop1
# mdadm: RUN_ARRAY failed: No such file or directory

FWIW, the same thing happens when I point --bitmap to /tmp/raid1.bitmap,
which is tmpfs, and also when I attach both loop0 and loop1 to files on
tmpfs.

This would suggest that reiser4 is not solely at fault?

The difference, by the way, is that I can reboot with 'shutdown -r now'
instead of sysrq, and that mdadm hangs:

# mdadm -C /dev/md1 --bitmap=/tmp/raid1.bitmap -l1 -n2 /dev/loop0 --write-behind /dev/loop1
mdadm: RUN_ARRAY failed: No such file or directory

# mdadm -C /dev/md1 -f --bitmap=/tmp/raid1.bitmap -l1 -n2 /dev/loop0 --write-behind /dev/loop1
mdadm: /dev/loop0 appears to be part of a raid array:
level=raid1 devices=2 ctime=Thu Nov 17 11:04:31 2005
mdadm: /dev/loop1 appears to be part of a raid array:
level=raid1 devices=2 ctime=Thu Nov 17 11:04:31 2005
Continue creating array? yes
[hang, no prompt, no reaction to ctrl-c, etc]


[42949549.780000] md: bind<loop0>
[42949549.780000] md: bind<loop1>
[42949549.780000] md: md1: raid array is not clean -- starting background reconstruction
[42949549.790000] md1: bitmap file is out of date (0 < 1) -- forcing full recovery
[42949549.790000] md1: bitmap file is out of date, doing full recovery
[42949549.790000] md1: bitmap initialized from disk: read 0/4 pages, set 0 bits, status: 524288
[42949549.790000] Bad page state at free_hot_cold_page (in process 'mdadm', page c10dcc20)
[42949549.790000] flags:0x80000019 mapping:f5155c84 mapcount:0 count:0
[42949549.790000] Backtrace:
[42949549.790000] [<c013b320>] bad_page+0x70/0xb0
[42949549.790000] [<c013bab1>] free_hot_cold_page+0x51/0xd0
[42949549.790000] [<c02b0a90>] bitmap_file_put+0x30/0x70
[42949549.790000] [<c02b1f8e>] bitmap_free+0x1e/0xb0
[42949549.790000] [<c02b2126>] bitmap_create+0xd6/0x2a0
[42949549.790000] [<c02ab95a>] do_md_run+0x2ba/0x500
[42949549.790000] [<c02ac8a7>] add_new_disk+0x157/0x3b0
[42949549.790000] [<c0179034>] mpage_writepages+0x124/0x3d0
[42949549.790000] [<c013c23e>] __pagevec_free+0x3e/0x60
[42949549.790000] [<c013eff9>] release_pages+0x29/0x160
[42949549.790000] [<c02adb81>] md_ioctl+0x5a1/0x630
[42949549.790000] [<c0137918>] find_get_pages+0x18/0x40
[42949549.790000] [<c02ad5e0>] md_ioctl+0x0/0x630
[42949549.790000] [<c01ede74>] blkdev_driver_ioctl+0x54/0x60
[42949549.790000] [<c01edfb4>] blkdev_ioctl+0x134/0x180
[42949549.790000] [<c015e158>] block_ioctl+0x18/0x20
[42949549.790000] [<c015e140>] block_ioctl+0x0/0x20
[42949549.790000] [<c01674ff>] do_ioctl+0x1f/0x70
[42949549.790000] [<c016769c>] vfs_ioctl+0x5c/0x1e0
[42949549.790000] [<c0156c91>] __fput+0xe1/0x140
[42949549.790000] [<c016785d>] sys_ioctl+0x3d/0x70
[42949549.790000] [<c0102f49>] syscall_call+0x7/0xb
[42949549.790000] Trying to fix it up, but a reboot is needed
[42949549.790000] md1: failed to create bitmap (524288)
[42949549.790000] md: pers->run() failed ...
[42949549.790000] md: md1 stopped.
[42949549.790000] md: unbind<loop1>
[42949549.790000] md: export_rdev(loop1)
[42949549.790000] md: unbind<loop0>
[42949549.790000] md: export_rdev(loop0)


--
Humilis IT Services and Solutions
http://www.humilis.net

2005-11-17 10:15:10

by Sander

Subject: Re: segfault mdadm --write-behind, 2.6.14-mm2 (was: Re: RAID1 ramdisk patch)

Sander wrote (ao):
# Sander wrote (ao):
# # Neil Brown wrote (ao):
# # > On Wednesday November 16, [email protected] wrote:
# # > > Sander <[email protected]> wrote:
# # > > > With 2.6.14-mm2 (x86) and mdadm 2.1 I get a Segmentation fault when I
# # > > > try this:
# # > >
# # > > It oopsed in reiser4. reiserfs-dev added to Cc...
# # > >
# # >
# # > Hmm... It appears that md/bitmap is calling prepare_write and
# # > commit_write with 'file' as NULL - this works for some filesystems,
# # > but not for reiser4.
# # >
# # > Does this patch help?
# #
# # Something changed, but it didn't fix it it seems:
# #
# # # mdadm -C /dev/md1 --bitmap=/storage/raid1.bitmap -l1 -n2 /dev/loop0 --write-behind /dev/loop1
# # mdadm: RUN_ARRAY failed: No such file or directory
#
# FWIW, the following happens when I point --bitmap to /tmp/raid1.bitmap
# which is tmpfs, and also happens when I attach both loop0 and loop1 to
# files on tmpfs.
#
# This would suggest that reiser4 is not solely at fault?
#
# The difference btw is that I can reboot with 'shutdown -r now'
# instead of sysrq. And that mdadm hangs:
#
# # mdadm -C /dev/md1 --bitmap=/tmp/raid1.bitmap -l1 -n2 /dev/loop0 --write-behind /dev/loop1
# mdadm: RUN_ARRAY failed: No such file or directory
#
# # mdadm -C /dev/md1 -f --bitmap=/tmp/raid1.bitmap -l1 -n2 /dev/loop0 --write-behind /dev/loop1
# mdadm: /dev/loop0 appears to be part of a raid array:
# level=raid1 devices=2 ctime=Thu Nov 17 11:04:31 2005
# mdadm: /dev/loop1 appears to be part of a raid array:
# level=raid1 devices=2 ctime=Thu Nov 17 11:04:31 2005
# Continue creating array? yes
# [hang, no prompt, no reaction to ctrl-c, etc]

And even more info. It seems mdadm spins:

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
749 root 25 0 1696 568 492 R 99.9 0.1 8:32.50 mdadm

Would sysrq-t be useful?


# [42949549.780000] md: bind<loop0>
# [42949549.780000] md: bind<loop1>
# [42949549.780000] md: md1: raid array is not clean -- starting background reconstruction
# [42949549.790000] md1: bitmap file is out of date (0 < 1) -- forcing full recovery
# [42949549.790000] md1: bitmap file is out of date, doing full recovery
# [42949549.790000] md1: bitmap initialized from disk: read 0/4 pages, set 0 bits, status: 524288
# [42949549.790000] Bad page state at free_hot_cold_page (in process 'mdadm', page c10dcc20)
# [42949549.790000] flags:0x80000019 mapping:f5155c84 mapcount:0 count:0
# [42949549.790000] Backtrace:
# [42949549.790000] [<c013b320>] bad_page+0x70/0xb0
# [42949549.790000] [<c013bab1>] free_hot_cold_page+0x51/0xd0
# [42949549.790000] [<c02b0a90>] bitmap_file_put+0x30/0x70
# [42949549.790000] [<c02b1f8e>] bitmap_free+0x1e/0xb0
# [42949549.790000] [<c02b2126>] bitmap_create+0xd6/0x2a0
# [42949549.790000] [<c02ab95a>] do_md_run+0x2ba/0x500
# [42949549.790000] [<c02ac8a7>] add_new_disk+0x157/0x3b0
# [42949549.790000] [<c0179034>] mpage_writepages+0x124/0x3d0
# [42949549.790000] [<c013c23e>] __pagevec_free+0x3e/0x60
# [42949549.790000] [<c013eff9>] release_pages+0x29/0x160
# [42949549.790000] [<c02adb81>] md_ioctl+0x5a1/0x630
# [42949549.790000] [<c0137918>] find_get_pages+0x18/0x40
# [42949549.790000] [<c02ad5e0>] md_ioctl+0x0/0x630
# [42949549.790000] [<c01ede74>] blkdev_driver_ioctl+0x54/0x60
# [42949549.790000] [<c01edfb4>] blkdev_ioctl+0x134/0x180
# [42949549.790000] [<c015e158>] block_ioctl+0x18/0x20
# [42949549.790000] [<c015e140>] block_ioctl+0x0/0x20
# [42949549.790000] [<c01674ff>] do_ioctl+0x1f/0x70
# [42949549.790000] [<c016769c>] vfs_ioctl+0x5c/0x1e0
# [42949549.790000] [<c0156c91>] __fput+0xe1/0x140
# [42949549.790000] [<c016785d>] sys_ioctl+0x3d/0x70
# [42949549.790000] [<c0102f49>] syscall_call+0x7/0xb
# [42949549.790000] Trying to fix it up, but a reboot is needed
# [42949549.790000] md1: failed to create bitmap (524288)
# [42949549.790000] md: pers->run() failed ...
# [42949549.790000] md: md1 stopped.
# [42949549.790000] md: unbind<loop1>
# [42949549.790000] md: export_rdev(loop1)
# [42949549.790000] md: unbind<loop0>
# [42949549.790000] md: export_rdev(loop0)


2005-11-18 14:18:49

by Vladimir V. Saveliev

Subject: Re: segfault mdadm --write-behind, 2.6.14-mm2

Hello

Andrew Morton wrote:
> Sander <[email protected]> wrote:
>>Neil Brown wrote (ao):
>>> If you use mdadm-2.0 and mark a device as --write-mostly, then all
>>> read requests will go to the other device(s) if possible.
>>> e.g.
>>> mdadm --create /dev/md0 --level=1 --raid-disks=2 /dev/ramdisk \
>>> --writemostly /dev/realdisk
>>>
>>> Does this suit your needs?
>>>
>>> You can also arrange for the write to the writemostly device to be
>>> 'write-behind' so that the filesystem doesn't wait for the write to
>>> complete. This can reduce write-latency (though not increase write
>>> throughput) at a very small cost of reliability (if the RAM dies, the
>>> disk may not be 100% up-to-date).
>>With 2.6.14-mm2 (x86) and mdadm 2.1 I get a Segmentation fault when I
>>try this:
>
> It oopsed in reiser4. reiserfs-dev added to Cc...
>
>>mdadm -C /dev/md1 -l1 -n2 --bitmap=/storage/md1.bitmap /dev/loop0 \
>>--write-behind /dev/loop1
>>
>>loop0 is attached to a file on tmpfs, and loop1 is attached
>>to a file on a lvm2 volume (reiser4, if that matters).
>>

I tried ext2 on lvm2 and that did not help.
So, for now I would assume that the problem is not in reiser4 but somewhere else.
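For readers trying to reproduce this, the failing setup described in the thread can be sketched as a small script. The device paths, image sizes, and the DRY_RUN guard are illustrative additions, not from the original report; actually creating the array requires root, loop device support, and mdadm:

```shell
#!/bin/sh
# Sketch of the reproduction steps from the thread above. DRY_RUN=1
# (the default) only prints the commands; set DRY_RUN=0 to execute
# them for real (requires root).
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Backing files: one on tmpfs (RAM-backed), one on a regular
# filesystem. Sizes are arbitrary for illustration.
run dd if=/dev/zero of=/dev/shm/ram.img bs=1M count=64
run dd if=/dev/zero of=/var/tmp/disk.img bs=1M count=64
run losetup /dev/loop0 /dev/shm/ram.img
run losetup /dev/loop1 /var/tmp/disk.img

# The command from the report: an external bitmap plus write-behind on
# the slower member, so that reads are served from the tmpfs-backed
# loop0 while writes to loop1 complete asynchronously.
run mdadm -C /dev/md1 -l1 -n2 --bitmap=/var/tmp/md1.bitmap \
    /dev/loop0 --write-behind /dev/loop1
```

In dry-run mode this just prints the intended command sequence, which is enough to compare against the invocations quoted in the thread.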