2004-10-23 23:07:37

by Mathieu

Subject: 2.6.9-mm1: LVM stopped working


Well, I gave the latest -mm tree a try. The boot looked good until it got
to the LVM stuff: vgchange does not find any volume groups. I can't say much,
because LVM is set up early in boot on this box, so the machine is pretty
much unusable. All I know for now, after making my boot scripts a little more
verbose, is that vgchange -avvv y returns this kind of message:
hdXN: cannot read LABEL
and it prints this for every partition it can test.
As I need this box up and running, I went back to 2.6.9-rc3-mm3 (it works
pretty well). I will be able to run more tests on it tomorrow, but for now
that's all I can provide.

Oh, and dmesg didn't have any oops or BUG in it and looked quite normal,
both in the IDE detection and settings messages and in the device-mapper
messages.

However, I use dm-crypt to encrypt my / (no initrd, just initramfs) and
it works under 2.6.9-mm1, so the bug is likely to be in the IDE code.

Sorry for not being able to provide more information. I will see if I can
try on another LVM'ed box, one that is not used for critical stuff.

Mathieu

--
Lots of luck ... please pass your crack pipe arounds so the rest of us
idiots can see your vision or lack of ...

- Andre Hedrick on linux-kernel


2004-10-25 19:05:06

by Christophe Saout

Subject: Re: 2.6.9-mm1: LVM stopped working

On Sunday, 24.10.2004, at 01:06 +0200, Mathieu Segaud wrote:

> Well, I gave a try to last -mm tree. The boot seemed good till it got to
> LVM stuff. Vgchange does not find any volume groups. I can't say much because
> lvm is pretty "early stuff" on this box; so it is pretty unusable. All I know
> for now, as I changed a little my boot scripts to be more verbose, is that
> vgchange -avvv y returns this kind of message:
> hdXN: cannot read LABEL
> and this message for all parts it can test....
> As I need this box up and running, I came back to 2.6.9-rc3-mm3 (it works
> pretty well). I will be able to run more tests on it, tomorrow but for now
> that's all I can provide.
>
> Oh and dmesg didn't have any oops or BUG in it, and seemed quite usual,
> in IDE detection and settings messages and device-mapper messages.
>
> However, I use dm-crypt to encrypt my / (no initrd, just initramfs) and
> it works under 2.6.9-mm1, so the bug is likely to be in IDE stuff.

Are you encrypting your PV or your LVs?

There's some new dm-crypt code in -mm1 along with some API changes, but
backward compatibility is provided and should work.



2004-10-26 04:26:36

by Mathieu

Subject: Re: 2.6.9-mm1: LVM stopped working

Christophe Saout <[email protected]> recently wrote:

> Are you encrypting your PV or your LVs?

No, they are not encrypted. I thought about the new IV code, but my
AES-encrypted / still boots :)
I wonder if it comes from some IDE changes, as the messages from vgscan and
vgchange indicate that the LABEL areas are detected but cannot be read.

Quite weird, as everything else works quite well.

>
> There's some new dm-crypt code in -mm1 along with some API changes, but
> backward compatibility is provided and should work.

Best regards,

Mathieu

--
printk("----------- [cut here ] --------- [please bite here ] ---------\n");
linux-2.6.6/arch/x86_64/kernel/traps.

2004-10-26 11:00:20

by Mathieu

Subject: Re: 2.6.9-mm1: LVM stopped working

Christophe Saout <[email protected]> recently wrote:

> Are you encrypting your PV or your LVs?
>
> There's some new dm-crypt code in -mm1 along with some API changes, but
> backward compatibility is provided and should work.

I tried 2.6.9-mm1 with all the new dm-crypt stuff reverted, and it made no
difference, so the problem is not related to these patches.
I will look further into it later; for now I must go work on my PhD :)

Best regards,

--
<riel> google rules
<google> rules: http://www.law.cornell.edu/rules/fre/overview.html

- Rik van Riel chatting with the bots on #kernelnewbies

2004-10-26 12:41:29

by Joseph Fannin

Subject: Re: 2.6.9-mm1: LVM stopped working

On Mon, Oct 25, 2004 at 09:03:22PM +0200, Christophe Saout wrote:
> On Sunday, 24.10.2004, at 01:06 +0200, Mathieu Segaud wrote:
>
>>Well, I gave a try to last -mm tree. The boot seemed good till it got to
>>LVM stuff. Vgchange does not find any volume groups. I can't say much because
>>lvm is pretty "early stuff" on this box; so it is pretty unusable.

LVM doesn't work with 2.6.9-mm1 here either, complaining that it
can't find all the pv's. I'm not using any sort of encryption. Here,
pvdisplay reports:

  --- Physical volume ---
  PV Name               /dev/hda2
  VG Name               home
  PV Size               24.52 GB / not usable 0
  Allocatable           yes (but full)
  PE Size (KByte)       4096
  Total PE              6277
  Free PE               0
  Allocated PE          6277
  PV UUID               M8tcls-7Tp7-sAYe-ypH3-if50-00JH-hvvXSL

  --- Physical volume ---
  PV Name               unknown device
  VG Name               home
  PV Size               70.47 GB / not usable 0
  Allocatable           yes (but full)
  PE Size (KByte)       4096
  Total PE              18040
  Free PE               0
  Allocated PE          18040
  PV UUID               SmreB9-Q3dp-DBBc-q0N9-v762-o6UB-1VUgYw

  --- Physical volume ---
  PV Name               unknown device
  VG Name               home
  PV Size               25.12 GB / not usable 0
  Allocatable           yes (but full)
  PE Size (KByte)       4096
  Total PE              6431
  Free PE               0
  Allocated PE          6431
  PV UUID               sbbFSh-0MP8-jtir-Jcyx-VtcE-TxNh-tfNwNe

I can open the device nodes for the 'missing' pv's in a hexeditor and see the
uuid magic; if I reboot into 2.6.9-rc4-mm1 they are found without a
problem, and everything works.

Whether or not I'll have time to try to narrow down the change
that causes this depends on things that are out of my control ATM. :-/

--
Joseph Fannin
[email protected]

2004-10-26 13:57:56

by Jeff Chua

Subject: Re: 2.6.9-mm1: LVM stopped working


On Tue, 26 Oct 2004 [email protected] wrote:

> On Mon, Oct 25, 2004 at 09:03:22PM +0200, Christophe Saout wrote:
>> On Sunday, 24.10.2004, at 01:06 +0200, Mathieu Segaud wrote:
>>
>>> Well, I gave a try to last -mm tree. The boot seemed good till it got to
>>> LVM stuff. Vgchange does not find any volume groups. I can't say much because
>>> lvm is pretty "early stuff" on this box; so it is pretty unusable.
>
> LVM doesn't work with 2.6.9-mm1 here either, complaining that it
> can't find all the pv's. I'm not using any sort of encryption. Here,
> pvdisplay reports:

It doesn't work on 2.6.10-rc1 either. Works fine on 2.6.9 and 2.4.28-rc1.

device-mapper ioctl cmd 0 failed: Inappropriate ioctl for device
striped: Required device-mapper target(s) not detected in your kernel
lvcreate: Create a logical volume



Jeff.

2004-10-26 14:01:14

by Alasdair G Kergon

Subject: Re: 2.6.9-mm1: LVM stopped working

On Tue, Oct 26, 2004 at 08:36:51AM -0400, [email protected] wrote:
> LVM doesn't work with 2.6.9-mm1 here either, complaining that it
> can't find all the pv's. I'm not using any sort of encryption. Here,
> pvdisplay reports:

> I can open the device nodes for the 'missing' pv's in a hexeditor and see the
> uuid magic; if I reboot into 2.6.9-rc4-mm1 they are found without a
> problem, and everything works.

Firstly enable lvm debugging. lvm.conf: log { file="/tmp/lvm2.log" level=7 }
Compare the lvm log files for the kernels to see where it's going wrong.
Then take complete straces (incl. read/write data) of the lvm process
with each kernel and again compare them. [Or put files on web and send URLs.]
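
A minimal sketch of that comparison, assuming the two logs (or straces) have
already been read in as lists of lines. The function name first_divergence and
the sample lines are hypothetical and not part of LVM2; real logs would first
need PIDs, timestamps and addresses stripped so that only meaningful
differences remain.

```python
from itertools import zip_longest


def first_divergence(good_lines, bad_lines):
    """Return (index, good_line, bad_line) for the first differing line, or None."""
    for i, (good, bad) in enumerate(zip_longest(good_lines, bad_lines)):
        if good != bad:
            return i, good, bad
    return None


# Hypothetical excerpts standing in for the logs taken under the two kernels.
good = ["open /dev/hda2", "read 2048 -> 2048", "label found"]
bad = ["open /dev/hda2", "read 2048 -> 1536", "read 512 -> error"]

print(first_divergence(good, bad))
```

When one log is simply shorter than the other, the missing side is reported as
None, which points at the place where the failing run stopped early.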

[To check for repeat of old problems with related symptoms:]
Were both kernels compiled with the same compiler version? Which version?
Does it make any difference if you rebuild lvm with --disable-o_direct?

Alasdair
--
[email protected]

by Bartlomiej Zolnierkiewicz

Subject: Re: 2.6.9-mm1: LVM stopped working

On Sun, 24 Oct 2004 01:06:07 +0200, Mathieu Segaud
<[email protected]> wrote:
>
> Well, I gave a try to last -mm tree. The boot seemed good till it got to
> LVM stuff. Vgchange does not find any volume groups. I can't say much because
> lvm is pretty "early stuff" on this box; so it is pretty unusable. All I know
> for now, as I changed a little my boot scripts to be more verbose, is that
> vgchange -avvv y returns this kind of message:
> hdXN: cannot read LABEL
> and this message for all parts it can test....
> As I need this box up and running, I came back to 2.6.9-rc3-mm3 (it works
> pretty well). I will be able to run more tests on it, tomorrow but for now
> that's all I can provide.
>
> Oh and dmesg didn't have any oops or BUG in it, and seemed quite usual,
> in IDE detection and settings messages and device-mapper messages.
>
> However, I use dm-crypt to encrypt my / (no initrd, just initramfs) and
> it works under 2.6.9-mm1, so the bug is likely to be in IDE stuff.

prove it ;)

There were only minor IDE changes from 2.6.9-rc3-mm3 to 2.6.9-mm1,
I don't see any obvious suspects...

2004-10-26 14:10:38

by Alasdair G Kergon

Subject: Re: 2.6.9-mm1: LVM stopped working

On Tue, Oct 26, 2004 at 09:55:38PM +0800, Jeff Chua wrote:
> It doesn't work on 2.6.10-rc1 either. Works fine on 2.6.9 and 2.4.28-rc1.
> device-mapper ioctl cmd 0 failed: Inappropriate ioctl for device

Do you get any corresponding kernel messages?
Check /dev/mapper/control corresponds to /proc/devices & /proc/misc.
(See device-mapper scripts/devmap_mknod.sh)
Use 'dmsetup version' and 'dmsetup targets' to test.
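
A rough sketch of that check in Python, under the assumption that the control
node is a character device whose major is the one listed for "misc" in
/proc/devices and whose minor is the one listed for "device-mapper" in
/proc/misc. The helper names and the sample table contents are made up for
illustration.

```python
import os
import stat


def misc_major(proc_devices_text):
    """Major number of the 'misc' driver as listed in /proc/devices."""
    for line in proc_devices_text.splitlines():
        fields = line.split()
        if len(fields) == 2 and fields[1] == "misc":
            return int(fields[0])
    return None


def dm_minor(proc_misc_text):
    """Minor number registered for 'device-mapper' in /proc/misc."""
    for line in proc_misc_text.splitlines():
        fields = line.split()
        if len(fields) == 2 and fields[1] == "device-mapper":
            return int(fields[0])
    return None


def control_node_matches(path, proc_devices_text, proc_misc_text):
    """True if `path` is a char device whose numbers match the two tables."""
    st = os.stat(path)
    if not stat.S_ISCHR(st.st_mode):
        return False
    wanted = (misc_major(proc_devices_text), dm_minor(proc_misc_text))
    return (os.major(st.st_rdev), os.minor(st.st_rdev)) == wanted


# Made-up table contents; on a real system read /proc/devices and /proc/misc.
SAMPLE_DEVICES = "Character devices:\n  1 mem\n 10 misc\n\nBlock devices:\n  3 ide0\n"
SAMPLE_MISC = " 63 device-mapper\n135 rtc\n"

print(misc_major(SAMPLE_DEVICES), dm_minor(SAMPLE_MISC))
```

On a live system one would call control_node_matches("/dev/mapper/control", ...)
with the contents of the two /proc files; a False result suggests the node was
created with stale numbers.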

Alasdair
--
[email protected]

by Bartlomiej Zolnierkiewicz

Subject: Re: 2.6.9-mm1: LVM stopped working

On Tue, 26 Oct 2004 16:00:47 +0200, Bartlomiej Zolnierkiewicz
<[email protected]> wrote:
> On Sun, 24 Oct 2004 01:06:07 +0200, Mathieu Segaud
> <[email protected]> wrote:
> >
> > Well, I gave a try to last -mm tree. The boot seemed good till it got to
> > LVM stuff. Vgchange does not find any volume groups. I can't say much because
> > lvm is pretty "early stuff" on this box; so it is pretty unusable. All I know
> > for now, as I changed a little my boot scripts to be more verbose, is that
> > vgchange -avvv y returns this kind of message:
> > hdXN: cannot read LABEL
> > and this message for all parts it can test....
> > As I need this box up and running, I came back to 2.6.9-rc3-mm3 (it works
> > pretty well). I will be able to run more tests on it, tomorrow but for now
> > that's all I can provide.
> >
> > Oh and dmesg didn't have any oops or BUG in it, and seemed quite usual,
> > in IDE detection and settings messages and device-mapper messages.
> >
> > However, I use dm-crypt to encrypt my / (no initrd, just initramfs) and
> > it works under 2.6.9-mm1, so the bug is likely to be in IDE stuff.
>
> prove it ;)

To make this task easier I prepared 2.6.9-rc3-mm3 to 2.6.9-mm1 IDE patch:

http://home.elka.pw.edu.pl/~bzolnier/ide-2.6.9-rc3-mm3-to-2.6.9-mm1.patch.bz2

Just revert it from 2.6.9-mm1.

> There were only minor IDE changes from 2.6.9-rc3-mm3 to 2.6.9-mm1,
> I don't see any obvious suspects...

2004-10-26 17:21:57

by Mathieu

Subject: Re: 2.6.9-mm1: LVM stopped working

Bartlomiej Zolnierkiewicz <[email protected]> recently wrote:

> To make this task easier I prepared 2.6.9-rc3-mm3 to 2.6.9-mm1 IDE patch:
>
> http://home.elka.pw.edu.pl/~bzolnier/ide-2.6.9-rc3-mm3-to-2.6.9-mm1.patch.bz2
>
> Just revert it from 2.6.9-mm1.

Thanks, I will test it soon.
I have just made straces of the vgchange process in the success and failure
cases (the only difference in how they were run is that I added the -v
verbose option in the failure case).

vgchange tries to read 2 chunks of data from the partition:
- the first 2048 bytes,
- and, after closing the device and reopening it, the next 512 bytes.

In the failure case, the first read succeeds but returns only 1536 bytes,
which causes the process to issue another read syscall for the "missing"
512 bytes, and that one fails.
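
The pattern seen in the failure trace (a short read followed by a retry for
the remainder) can be sketched as below. This is an illustration only, not
LVM2's code: read_fully and the ShortReader stand-in are hypothetical, and the
1536-byte limit is simply the value observed in the strace.

```python
def read_fully(read_once, size):
    """Call read_once(n) until `size` bytes arrive, EOF is hit, or it raises."""
    chunks = []
    remaining = size
    while remaining > 0:
        chunk = read_once(remaining)
        if not chunk:  # empty result: end of file or device
            break
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


class ShortReader:
    """Stand-in device: first read is cut short, any later read fails."""

    def __init__(self, data, limit):
        self.data = data
        self.limit = limit
        self.calls = 0

    def read(self, n):
        self.calls += 1
        if self.calls > 1:
            raise OSError("second read fails, as in the failure trace")
        return self.data[:min(n, self.limit)]


dev = ShortReader(bytes(2048), 1536)
try:
    read_fully(dev.read, 2048)
except OSError as exc:
    print("read failed after", dev.calls, "calls:", exc)
```

A caller written this way is correct for ordinary short reads, so the error on
the retry is what turns the kernel-side change into a visible failure.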

For now, that's all I can see.
I will enable LVM debugging for the next try.

The straces are:
http://www.crans.org/~segaud/vgchange.failure
http://www.crans.org/~segaud/vgchange.succeeded
(names are obvious)

Best regards,

Mathieu

--
"I am a living example of someone who took on an issue and benefited from it."

George W. Bush
April 25, 2001
Speaking to John King of CNN.

2004-10-26 17:57:13

by Mathieu

Subject: Re: 2.6.9-mm1: LVM stopped working

Bartlomiej Zolnierkiewicz <[email protected]> recently wrote:


>> > However, I use dm-crypt to encrypt my / (no initrd, just initramfs) and
>> > it works under 2.6.9-mm1, so the bug is likely to be in IDE stuff.
>>
>> prove it ;)
>
> To make this task easier I prepared 2.6.9-rc3-mm3 to 2.6.9-mm1 IDE patch:
>
> http://home.elka.pw.edu.pl/~bzolnier/ide-2.6.9-rc3-mm3-to-2.6.9-mm1.patch.bz2
>
> Just revert it from 2.6.9-mm1.

Reverting the IDE changes does not change anything; the error is still
there.
The only other changes I can see now are the md changes. I will try reverting
them, and if I get no positive results, I give up (for today :))

--
printk("What? oldfid != cii->c_fid. Call 911.\n");
linux-2.4.3/fs/coda/cnode.c

2004-10-26 18:06:27

by Mathieu

Subject: Re: 2.6.9-mm1: LVM stopped working

Mathieu Segaud <[email protected]> recently wrote:

> reverting ide changes do not change anything....
> error is still here
> The only changes I can see now, are the md changes. I will try reverting it,
> and if I get no positive results, I give up (for today :))

Obviously, the md changes cannot have caused this failure... my mistake.

I will keep digging for as long as need be.
--
dprintk(5, KERN_DEBUG "Jotti is een held!\n");
linux-2.6.6/drivers/media/video/zoran_card.c

2004-10-26 21:43:27

by Joseph Fannin

Subject: Re: 2.6.9-mm1: LVM stopped working (dio-handle-eof.patch)

On Tue, Oct 26, 2004 at 02:59:55PM +0100, Alasdair G Kergon wrote:
> On Tue, Oct 26, 2004 at 08:36:51AM -0400, [email protected] wrote:
> > LVM doesn't work with 2.6.9-mm1 here either, complaining that it
> > can't find all the pv's. I'm not using any sort of encryption. Here,
> > pvdisplay reports:
>
> > I can open the device nodes for the 'missing' pv's in a hexeditor and see the
> > uuid magic; if I reboot into 2.6.9-rc4-mm1 they are found without a
> > problem, and everything works.

> [To check for repeat of old problems with related symptoms:]
> Were both kernels compiled with the same compiler version? Which version?
> Does it make any difference if you rebuild lvm with --disable-o_direct?

Chris Han (BCC'ed) mailed me to let me know he'd narrowed the
problem down to the 'dio-handle-eof.patch'. Reverting it makes things
work for me too. Yay!

> Firstly enable lvm debugging. lvm.conf: log { file="/tmp/lvm2.log" level=7 }
> Compare the lvm log files for the kernels to see where it's going wrong.
> Then take complete straces (incl. read/write data) of the lvm process
> with each kernel and again compare them. [Or put files on web and send URLs.]

vgchange -a y logs that it 'Failed to read label area' for the
'missing' pv's.

I've put up some possibly useful traces, but I think I've picked out
the relevant bits just below:
http://home.columbus.rr.com/jfannin1/vgchange-trace-good.txt
http://home.columbus.rr.com/jfannin1/vgchange-trace-bad.txt

For some (but not all) of the partitions opened, the reads alternate between
these two results (the -1024 is constant):

read(4, 0xbfffe600, 2048) = ? ERESTARTSYS (To be restarted)
read(4, "\300;9\230", 2048) = -1024

If there's anything else that wants investigating, I'm still
willing as my free time allows. Thanks!

--
Joseph Fannin
[email protected]

2004-10-26 22:14:13

by Andrew Morton

Subject: Re: 2.6.9-mm1: LVM stopped working (dio-handle-eof.patch)

[email protected] wrote:
>
> > [To check for repeat of old problems with related symptoms:]
> > Were both kernels compiled with the same compiler version? Which version?
> > Does it make any difference if you rebuild lvm with --disable-o_direct?
>
> Chris Han (BCC'ed) mailed me to let me know he'd narrowed the
> problem down to the 'dio-handle-eof.patch'. Reverting it makes things
> work for me too. Yay!

If you have time, please restore dio-handle-eof.patch and then apply the
below fixup, then retest. Thanks.

--- 25/fs/direct-io.c~dio-handle-eof-fix 2004-10-26 00:49:40.363376432 -0700
+++ 25-akpm/fs/direct-io.c 2004-10-26 00:49:40.367375824 -0700
@@ -987,6 +987,8 @@ direct_io_worker(int rw, struct kiocb *i
 	isize = i_size_read(inode);
 	if (bytes_todo > (isize - offset))
 		bytes_todo = isize - offset;
+	if (!bytes_todo)
+		return 0;

 	for (seg = 0; seg < nr_segs && bytes_todo; seg++) {
 		user_addr = (unsigned long)iov[seg].iov_base;
_

2004-10-27 04:40:28

by Mathieu

Subject: Re: 2.6.9-mm1: LVM stopped working (dio-handle-eof.patch)

Andrew Morton <[email protected]> recently wrote:

> If you have time, please restore dio-handle-eof.patch and then apply the
> below fixup, then retest. Thanks.

I had time to test this fix; it did not solve the problem, whereas reverting
the complete dio-handle-eof.patch did.

Best regards,

Mathieu

--
"We ought to make the pie higher."

George W. Bush
February 15, 2000
Comment made in Columbia, South Carolina during presidential campaign.

2004-10-27 05:29:18

by Andrew Morton

Subject: Re: 2.6.9-mm1: LVM stopped working (dio-handle-eof.patch)

Mathieu Segaud <[email protected]> wrote:
>
> Andrew Morton <[email protected]> recently wrote:
>
> > If you have time, please restore dio-handle-eof.patch and then apply the
> > below fixup, then retest. Thanks.
>
> I had time to test this fix; it did not solve the problem. Whereas reverting
> the complete dio-handle-eof.patch solved it.

bummer. Can you send a super-simple means by which I can demonstrate the
problem?

Thanks.

2004-10-27 05:49:11

by Jens Axboe

Subject: Re: 2.6.9-mm1: LVM stopped working (dio-handle-eof.patch)

On Tue, Oct 26 2004, Andrew Morton wrote:
> Mathieu Segaud <[email protected]> wrote:
> >
> > Andrew Morton <[email protected]> recently wrote:
> >
> > > If you have time, please restore dio-handle-eof.patch and then apply the
> > > below fixup, then retest. Thanks.
> >
> > I had time to test this fix; it did not solve the problem. Whereas reverting
> > the complete dio-handle-eof.patch solved it.
>
> bummer. Can you send a super-simple means by which I can demonstrate the
> problem?

Hmm, maybe round the value up to a PAGE_SIZE in length?

--
Jens Axboe

2004-10-27 06:50:34

by Jens Axboe

Subject: Re: 2.6.9-mm1: LVM stopped working (dio-handle-eof.patch)

On Wed, Oct 27 2004, Jens Axboe wrote:
> On Tue, Oct 26 2004, Andrew Morton wrote:
> > Mathieu Segaud <[email protected]> wrote:
> > >
> > > Andrew Morton <[email protected]> recently wrote:
> > >
> > > > If you have time, please restore dio-handle-eof.patch and then apply the
> > > > below fixup, then retest. Thanks.
> > >
> > > I had time to test this fix; it did not solve the problem. Whereas reverting
> > > the complete dio-handle-eof.patch solved it.
> >
> > bummer. Can you send a super-simple means by which I can demonstrate the
> > problem?
>
> Hmm, maybe round the value up to a PAGE_SIZE in length?

This feels pretty icky, but should suffice for testing. Does it make a
difference?

--- /opt/kernel/linux-2.6.10-rc1-mm1/fs/direct-io.c 2004-10-27 08:29:51.866931262 +0200
+++ linux-2.6.10-rc1-mm1/fs/direct-io.c 2004-10-27 08:41:20.292172299 +0200
@@ -987,8 +987,8 @@
 	isize = i_size_read(inode);
 	if (bytes_todo > (isize - offset))
 		bytes_todo = isize - offset;
-	if (!bytes_todo)
-		return 0;
+	if (bytes_todo < PAGE_SIZE)
+		bytes_todo = PAGE_SIZE;

 	for (seg = 0; seg < nr_segs && bytes_todo; seg++) {
 		user_addr = (unsigned long)iov[seg].iov_base;

--
Jens Axboe

2004-10-27 15:09:42

by Joseph Fannin

Subject: Re: 2.6.9-mm1: LVM stopped working (dio-handle-eof.patch)

On Wed, Oct 27, 2004 at 08:41:46AM +0200, Jens Axboe wrote:
> On Wed, Oct 27 2004, Jens Axboe wrote:
> > On Tue, Oct 26 2004, Andrew Morton wrote:
> > > Mathieu Segaud <[email protected]> wrote:
> > > > Andrew Morton <[email protected]> recently wrote:
> > > >
> > > > > If you have time, please restore dio-handle-eof.patch and then apply the
> > > > > below fixup, then retest. Thanks.
> > > >
> > > > I had time to test this fix; it did not solve the problem. Whereas reverting
> > > > the complete dio-handle-eof.patch solved it.
> > >
> > > bummer. Can you send a super-simple means by which I can demonstrate the
> > > problem?
> >
> > Hmm, maybe round the value up to a PAGE_SIZE in length?
>
> This feels pretty icky, but should suffice for testing. Does it make a
> difference?

I made this change to 2.6.9-mm1 and it made no difference. vgchange still
seems to be trying to read 2048 bytes, rather than 4096 (I may not
know what I'm talking about, or even what I'm looking at, though).

--
Joseph Fannin
[email protected]

"Bull in pure form is rare; there is usually some contamination by data."
-- William Graves Perry Jr.

2004-10-27 15:36:57

by Alasdair G Kergon

Subject: Re: 2.6.9-mm1: LVM stopped working (dio-handle-eof.patch)

On Wed, Oct 27, 2004 at 11:03:35AM -0400, Joseph Fannin wrote:
> I made this change to 2.6.9-mm1 and it didn't. vgchange still
> seems to be trying to read 2048 bytes, rather than 4096 (I may not
> know what I'm talking about, or even what I'm looking at, though).

LVM2 uses the (soft) device block size for both alignment and size.
If no blocksize is defined, it uses pagesize.

Even when it only needs to change a few consecutive bytes, it still
has to read a complete aligned block, make the change, then write it
back.
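
As a worked illustration of that rule, the sketch below computes the aligned
region that has to be read in order to touch a few bytes. The function name
aligned_span and the offsets are hypothetical; only the round-down and
round-up arithmetic is the point.

```python
def aligned_span(offset, length, block_size):
    """Return (start, size) of the block-aligned region covering the bytes."""
    if length <= 0 or block_size <= 0:
        raise ValueError("length and block_size must be positive")
    start = (offset // block_size) * block_size               # round down
    end = -(-(offset + length) // block_size) * block_size    # round up
    return start, end - start


print(aligned_span(520, 8, 512))     # a few bytes inside one 512-byte block
print(aligned_span(1000, 100, 512))  # bytes straddling a block boundary
print(aligned_span(0, 2048, 2048))   # already aligned, 2048-byte blocks
```

With a 2048-byte soft block size the smallest possible request is 2048 bytes,
which matches the read size reported in the traces in this thread.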

Alasdair
--
[email protected]

2004-10-27 15:41:41

by Mathieu

Subject: Re: 2.6.9-mm1: LVM stopped working (dio-handle-eof.patch)

Jens Axboe <[email protected]> recently wrote:


> This feels pretty icky, but should suffice for testing. Does it make a
> difference?
>
> --- /opt/kernel/linux-2.6.10-rc1-mm1/fs/direct-io.c 2004-10-27 08:29:51.866931262 +0200
> +++ linux-2.6.10-rc1-mm1/fs/direct-io.c 2004-10-27 08:41:20.292172299 +0200
> @@ -987,8 +987,8 @@
> isize = i_size_read(inode);
> if (bytes_todo > (isize - offset))
> bytes_todo = isize - offset;
> - if (!bytes_todo)
> - return 0;
> + if (bytes_todo < PAGE_SIZE)
> + bytes_todo = PAGE_SIZE;
>
> for (seg = 0; seg < nr_segs && bytes_todo; seg++) {
> user_addr = (unsigned long)iov[seg].iov_base;

As 2.6.10-rc1-mm1 failed (as expected), I tried your fix applied on top of
2.6.10-rc1-mm1. This did not make any difference.
The only workaround for now is backing out dio-handle-eof-fix.patch and
dio-handle-eof.patch.
I am willing to test anything you could send :)

Best regards,

Mathieu

--
panic("esp_handle: current_SC == penguin within interrupt!");
linux-2.2.16/drivers/scsi/esp.c

2004-10-27 15:58:15

by Alasdair G Kergon

Subject: Re: 2.6.9-mm1: LVM stopped working (dio-handle-eof.patch)

On Wed, Oct 27, 2004 at 08:41:46AM +0200, Jens Axboe wrote:
> --- /opt/kernel/linux-2.6.10-rc1-mm1/fs/direct-io.c 2004-10-27 08:29:51.866931262 +0200
> +++ linux-2.6.10-rc1-mm1/fs/direct-io.c 2004-10-27 08:41:20.292172299 +0200
> @@ -987,8 +987,8 @@
> isize = i_size_read(inode);

Can that return 0?

Alasdair
--
[email protected]

2004-10-27 20:54:42

by Mathieu

Subject: Re: 2.6.9-mm1: LVM stopped working (dio-handle-eof.patch)

Andrew Morton <[email protected]> recently wrote:

> Could someone pleeeeze send out a simple recipe for repeating this problem?

Well, it fails as soon as the boot scripts activate an LVM2 volume group.

Just create one with vgcreate(8) from the lvm2 tools (I guess vgcreate itself
would fail under faulty kernels like the latest -mm's).
Then boot 2.6.9-mm1 or 2.6.10-rc1-mm1. Either the distro-specific init
scripts will report an error (No volume groups found), or you can issue
'vgchange -a y' yourself to activate all available volume groups, and it will
fail with the above error.

In my case, there are no kernel messages in dmesg related to these errors.
--
printk("----------- [cut here ] --------- [please bite here ] ---------\n");
linux-2.6.6/arch/x86_64/kernel/traps.

2004-10-27 20:32:58

by Andrew Morton

Subject: Re: 2.6.9-mm1: LVM stopped working (dio-handle-eof.patch)

Mathieu Segaud <[email protected]> wrote:
>
> Jens Axboe <[email protected]> recently wrote:
>
>
> > This feels pretty icky, but should suffice for testing. Does it make a
> > difference?
> >
> > --- /opt/kernel/linux-2.6.10-rc1-mm1/fs/direct-io.c 2004-10-27 08:29:51.866931262 +0200
> > +++ linux-2.6.10-rc1-mm1/fs/direct-io.c 2004-10-27 08:41:20.292172299 +0200
> > @@ -987,8 +987,8 @@
> > isize = i_size_read(inode);
> > if (bytes_todo > (isize - offset))
> > bytes_todo = isize - offset;
> > - if (!bytes_todo)
> > - return 0;
> > + if (bytes_todo < PAGE_SIZE)
> > + bytes_todo = PAGE_SIZE;
> >
> > for (seg = 0; seg < nr_segs && bytes_todo; seg++) {
> > user_addr = (unsigned long)iov[seg].iov_base;
>
> As 2.6.10-rc1-mm1 failed (as expected), I tried your fix applied upon
> 2.6.10-rc1-mm1. This did not make any difference.
> The only workaround for now is backing out dio-handle-eof-fix.patch and
> dio-handle-eof.patch

Could someone pleeeeze send out a simple recipe for repeating this problem?

2004-10-27 18:22:52

by Jeff Chua

Subject: Re: 2.6.9-mm1: LVM stopped working (dio-handle-eof.patch)


On Wed, 27 Oct 2004, Alasdair G Kergon wrote:

> On Wed, Oct 27, 2004 at 08:41:46AM +0200, Jens Axboe wrote:
>> --- /opt/kernel/linux-2.6.10-rc1-mm1/fs/direct-io.c 2004-10-27 08:29:51.866931262 +0200
>> +++ linux-2.6.10-rc1-mm1/fs/direct-io.c 2004-10-27 08:41:20.292172299 +0200
>> @@ -987,8 +987,8 @@
>> isize = i_size_read(inode);
>
> Can that return 0?

This may not be the problem, because the bug still exists with vanilla
2.6.10-rc1, and there's no "isize" in fs/direct-io.c there.

Thanks,
Jeff.

2004-10-28 04:54:22

by Jeff Chua

Subject: Re: 2.6.9-mm1: LVM stopped working (dio-handle-eof.patch)


On Wed, 27 Oct 2004, Andrew Morton wrote:
> Could someone pleeeeze send out a simple recipe for repeating this problem?

I'm using 2.6.10-rc1 and got the following error ...

# vgscan
Reading all physical volumes. This may take a while...
Found volume group "vg01" using metadata type lvm2

# vgchange -a y
0 logical volume(s) in volume group "vg01" now active

# lvcreate -L 100M -n lv01 vg01
device-mapper ioctl cmd 0 failed: Inappropriate ioctl for device
striped: Required device-mapper target(s) not detected in your kernel
lvcreate: Create a logical volume


Can't create logical volume (lvcreate) using 2.6.10-rc1.

No problem with 2.6.9 and 2.4.28-rc1.

Lvm tools is LVM2.2.00.25.

Here's my partial .config (same as 2.6.9 which is working with lvm).

#
# Multi-device support (RAID and LVM)
#
CONFIG_MD=y
CONFIG_BLK_DEV_MD=m
# CONFIG_MD_LINEAR is not set
# CONFIG_MD_RAID0 is not set
# CONFIG_MD_RAID1 is not set
# CONFIG_MD_RAID10 is not set
# CONFIG_MD_RAID5 is not set
# CONFIG_MD_RAID6 is not set
# CONFIG_MD_MULTIPATH is not set
CONFIG_BLK_DEV_DM=y
# CONFIG_DM_CRYPT is not set
# CONFIG_DM_SNAPSHOT is not set
# CONFIG_DM_MIRROR is not set
# CONFIG_DM_ZERO is not set


Thanks,
Jeff.

2004-10-28 15:22:03

by Alasdair G Kergon

Subject: Re: 2.6.9-mm1: LVM stopped working (dio-handle-eof.patch)

On Thu, Oct 28, 2004 at 12:52:20PM +0800, Jeff Chua wrote:
> I'm using 2.6.10-rc1 and got the following error ...
> # lvcreate -L 100M -n lv01 vg01
> device-mapper ioctl cmd 0 failed: Inappropriate ioctl for device
> striped: Required device-mapper target(s) not detected in your kernel
> lvcreate: Create a logical volume

But that's *not* the dio problem we're discussing in this thread.
It's saying userspace communication with device-mapper isn't working,
most likely because there's something wrong with the way your
system creates /dev/mapper/control when booting or the ioctl
compatibility code (what architecture?).

Alasdair
--
[email protected]

Subject: Re: 2.6.9-mm1: LVM stopped working

On Tue, Oct 26, 2004 at 03:09:25PM +0100, Alasdair G Kergon wrote:
> On Tue, Oct 26, 2004 at 09:55:38PM +0800, Jeff Chua wrote:
> > It doesn't work on 2.6.10-rc1 either. Works fine on 2.6.9 and 2.4.28-rc1.
> > device-mapper ioctl cmd 0 failed: Inappropriate ioctl for device
>
> Do you get any corresponding kernel messages?
> Check /dev/mapper/control corresponds to /proc/devices & /proc/misc.
> (See device-mapper scripts/devmap_mknod.sh)
> Use 'dmsetup version' and 'dmsetup targets' to test.
>
> Alasdair
> --
> [email protected]

2004-10-28 16:22:50

by Valdis Klētnieks

Subject: Re: 2.6.9-mm1: LVM stopped working (dio-handle-eof.patch)

On Wed, 27 Oct 2004 17:36:14 +0200, Mathieu Segaud said:

> As 2.6.10-rc1-mm1 failed (as expected), I tried your fix applied upon
> 2.6.10-rc1-mm1. This did not make any difference.
> The only workaround for now is backing out dio-handle-eof-fix.patch and
> dio-handle-eof.patch
> I am willing to test anything you could send :)

For what it's worth, I hit the exact same problem with 2.6.10-rc1-mm1
(failure to get the LVM together at boot, causing a wedge because
my / filesystem is on an LVM), and backing out those two patches has
me up and running.

# fdisk -l /dev/hda

Disk /dev/hda: 40.0 GB, 40007761920 bytes
255 heads, 63 sectors/track, 4864 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot  Start         End      Blocks   Id  System
/dev/hda1           1          29      232911   84  OS/2 hidden C: drive
/dev/hda2          30        4864    38837137+   5  Extended
/dev/hda5  *       30          32       24066   83  Linux
/dev/hda6          33        2327    18434556   8e  Linux LVM
/dev/hda7        2328        2458     1052226   82  Linux swap
/dev/hda8        2459        4864    19326163+  8e  Linux LVM

(Basically, a 24M /boot, a swap, and *two* LVM partitions - I wonder if that
has anything to do with it - it found one and didn't find the other, and gave
up with much complaining). That OS/2 partition is a remnant of what the docs 2
years ago said was needed for suspend-to-disk...

Am also able to test patches if needed...



2004-10-29 05:00:42

by Jeff Chua

Subject: Re: 2.6.9-mm1: LVM stopped working (dio-handle-eof.patch)

On Thu, 28 Oct 2004, Alasdair G Kergon wrote:

> On Thu, Oct 28, 2004 at 12:52:20PM +0800, Jeff Chua wrote:
>> I'm using 2.6.10-rc1 and got the following error ...
>> # lvcreate -L 100M -n lv01 vg01
>> device-mapper ioctl cmd 0 failed: Inappropriate ioctl for device
>> striped: Required device-mapper target(s) not detected in your kernel
>> lvcreate: Create a logical volume
>
> But that's *not* the dio problem we're discussing in this thread.
> It's saying userspace communication with device-mapper isn't working,
> most likely because there's something wrong with the way your
> system creates /dev/mapper/control when booting or the ioctl
> compatibility code (what architecture?).

That doesn't make any sense to me. Why would 2.6.9 work, then?
The architecture is Intel, running on an IBM X31 notebook.

I never had an LVM problem until 2.6.10-rc1. It just went dead.

Jeff.

2004-10-29 05:31:58

by Bernd Eckenfels

Subject: Re: 2.6.9-mm1: LVM stopped working (dio-handle-eof.patch)

In article <[email protected]> you wrote:
> doesn't make any sense to me. Why would 2.6.9 works then?
> Architecture is Intel running on IBM X31 notebook.

Perhaps you have different devfs settings? I had a similar problem.

Greetings
Bernd

2004-11-01 23:31:42

by Laurent Riffard

Subject: Re: 2.6.9-mm1: LVM stopped working


Hello,

LVM2 stopped working with 2.6.9-mm1 for me too: 2.6.9-rc4-mm1 was
fine, while 2.6.9-mm1 to 2.6.10-mm2 break LVM2. Reverting
dio-handle-eof.patch on these kernels solves the problem.

I have a simple test case here.

With 2.6.9-rc4-mm1, "pvdisplay /dev/hda4" shows:
  --- Physical volume ---
  PV Name               /dev/hda4
  VG Name               vglinux1
  PV Size               19,07 GB / not usable 0
  Allocatable           yes
  PE Size (KByte)       4096
  Total PE              4882
  Free PE               3424
  Allocated PE          1458
  PV UUID               Kvi5oA-d8NL-DU0n-vJpt-TKb3-RmDP-nrZoaz

With later -mm kernels, "pvdisplay /dev/hda4" shows:
  No physical volume label read from /dev/hda4
  Failed to read physical volume "/dev/hda4"

I tracked down the problem to this code section in fs/direct-io.c (function direct_io_worker):

1012 dio->total_pages = 0;
1013 if (user_addr & (PAGE_SIZE-1)) {
1014 dio->total_pages++;
1015 bytes -= PAGE_SIZE - (user_addr & (PAGE_SIZE - 1));
1016 }
1017 dio->total_pages += (bytes + PAGE_SIZE - 1) / PAGE_SIZE;
1018 dio->curr_user_address = user_addr;
1019
1020 ret = do_direct_IO(dio);
1021
1022 dio->result += bytes -
1023 ((dio->final_block_in_request - dio->block_in_file) <<
1024 blkbits);

In my case, direct_io_worker is called to read 2048 bytes at the beginning of /dev/hda4:
user_addr=0xbfff9800 (half-page aligned)
bytes=2048 (half a page)
So "bytes" is zeroed at line 1015.
And dio->result is zeroed at line 1023.
As a result, direct_io_worker returns 0.

Before dio-handle-eof.patch, line 1022 was :
dio->result += iov[seg].iov_len -
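
To make the arithmetic concrete, here is a small standalone C sketch (not kernel code) that replays lines 1012-1017 and 1022-1024 for the values above. It assumes PAGE_SIZE is 4096 and that the whole request was transferred, so that dio->final_block_in_request equals dio->block_in_file. The helper names count_pages and result_with_patch are made up for illustration.

```c
#define PAGE_SIZE 4096UL /* assumption: 4 KiB pages */

/*
 * Replays lines 1012-1017: returns the page count and, like the patched
 * kernel code, shrinks *bytes by the part that sits in the first,
 * partially covered page.
 */
static unsigned long count_pages(unsigned long user_addr, unsigned long *bytes)
{
    unsigned long total_pages = 0;

    if (user_addr & (PAGE_SIZE - 1)) {
        total_pages++;
        *bytes -= PAGE_SIZE - (user_addr & (PAGE_SIZE - 1));
    }
    total_pages += (*bytes + PAGE_SIZE - 1) / PAGE_SIZE;
    return total_pages;
}

/*
 * Replays lines 1022-1024 under the assumption stated above: nothing is
 * left to transfer, so the subtracted term is 0 and the value added to
 * dio->result is whatever remains in "bytes".
 */
static unsigned long result_with_patch(unsigned long user_addr,
                                       unsigned long iov_len)
{
    unsigned long bytes = iov_len;

    count_pages(user_addr, &bytes);
    return bytes - 0; /* (final_block_in_request - block_in_file) << blkbits == 0 */
}
```

With user_addr=0xbfff9800 and 2048 bytes requested, count_pages() returns 1 and leaves bytes at 0, so result_with_patch() yields 0. Under the same assumption, the pre-patch expression based on iov[seg].iov_len would yield 2048.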

What are the semantics of "bytes" at line 1015: the bytes to read on the next page?
Did I miss something?

Hope this helps...
I can run more tests if needed.

--
laurent




2004-11-02 14:42:02

by Jens Axboe

[permalink] [raw]
Subject: Re: 2.6.9-mm1: LVM stopped working (dio-handle-eof.patch)

On Wed, Oct 27 2004, Mathieu Segaud wrote:
> Jens Axboe <[email protected]> recently said:
>
>
> > This feels pretty icky, but should suffice for testing. Does it make a
> > difference?
> >
> > --- /opt/kernel/linux-2.6.10-rc1-mm1/fs/direct-io.c 2004-10-27 08:29:51.866931262 +0200
> > +++ linux-2.6.10-rc1-mm1/fs/direct-io.c 2004-10-27 08:41:20.292172299 +0200
> > @@ -987,8 +987,8 @@
> > isize = i_size_read(inode);
> > if (bytes_todo > (isize - offset))
> > bytes_todo = isize - offset;
> > - if (!bytes_todo)
> > - return 0;
> > + if (bytes_todo < PAGE_SIZE)
> > + bytes_todo = PAGE_SIZE;
> >
> > for (seg = 0; seg < nr_segs && bytes_todo; seg++) {
> > user_addr = (unsigned long)iov[seg].iov_base;
>
> As 2.6.10-rc1-mm1 failed (as expected), I tried your fix applied on top of
> 2.6.10-rc1-mm1. It did not make any difference.
> The only workaround for now is backing out dio-handle-eof-fix.patch and
> dio-handle-eof.patch.
> I am willing to test anything you could send :)

Does this work on top of 2.6.10-rc1-mm1?

--- /opt/kernel/linux-2.6.10-rc1-mm1/fs/direct-io.c 2004-10-27 08:29:51.000000000 +0200
+++ linux-2.6.10-rc1-mm1/fs/direct-io.c 2004-11-02 15:36:51.864411244 +0100
@@ -985,10 +985,12 @@
}

isize = i_size_read(inode);
- if (bytes_todo > (isize - offset))
- bytes_todo = isize - offset;
- if (!bytes_todo)
- return 0;
+ if (bytes_todo > (isize - offset)) {
+ if ((isize - offset))
+ bytes_todo = isize - offset;
+ if (bytes_todo > PAGE_SIZE)
+ bytes_todo = PAGE_SIZE;
+ }

for (seg = 0; seg < nr_segs && bytes_todo; seg++) {
user_addr = (unsigned long)iov[seg].iov_base;
@@ -1008,10 +1010,9 @@
dio->curr_page = 0;

dio->total_pages = 0;
- if (user_addr & (PAGE_SIZE-1)) {
+ if (user_addr & (PAGE_SIZE-1))
dio->total_pages++;
- bytes -= PAGE_SIZE - (user_addr & (PAGE_SIZE - 1));
- }
+
dio->total_pages += (bytes + PAGE_SIZE - 1) / PAGE_SIZE;
dio->curr_user_address = user_addr;


--
Jens Axboe

2004-11-02 15:10:58

by Mathieu

[permalink] [raw]
Subject: Re: 2.6.9-mm1: LVM stopped working (dio-handle-eof.patch)

Jens Axboe <[email protected]> recently said:

> Ehm, that should be
>
> if ((isize - offset))
> bytes_todo = isize - offset;
> else if (bytes_todo > PAGE_SIZE)
> bytes_todo = PAGE_SIZE;

Give me 2 or 3 hours (the time to get home from the office; oh, the traffic
in Paris) and I'll get you an answer (I will adapt this patch to
2.6.10-rc1-mm2 if need be).

Best,

--
There is a word for that and that word is "crap".

- Alexander Viro on linux-kernel

2004-11-02 15:10:57

by Jens Axboe

[permalink] [raw]
Subject: Re: 2.6.9-mm1: LVM stopped working (dio-handle-eof.patch)

On Tue, Nov 02 2004, Jens Axboe wrote:
> On Wed, Oct 27 2004, Mathieu Segaud wrote:
> > Jens Axboe <[email protected]> recently said:
> >
> >
> > > This feels pretty icky, but should suffice for testing. Does it make a
> > > difference?
> > >
> > > --- /opt/kernel/linux-2.6.10-rc1-mm1/fs/direct-io.c 2004-10-27 08:29:51.866931262 +0200
> > > +++ linux-2.6.10-rc1-mm1/fs/direct-io.c 2004-10-27 08:41:20.292172299 +0200
> > > @@ -987,8 +987,8 @@
> > > isize = i_size_read(inode);
> > > if (bytes_todo > (isize - offset))
> > > bytes_todo = isize - offset;
> > > - if (!bytes_todo)
> > > - return 0;
> > > + if (bytes_todo < PAGE_SIZE)
> > > + bytes_todo = PAGE_SIZE;
> > >
> > > for (seg = 0; seg < nr_segs && bytes_todo; seg++) {
> > > user_addr = (unsigned long)iov[seg].iov_base;
> >
> > As 2.6.10-rc1-mm1 failed (as expected), I tried your fix applied on top of
> > 2.6.10-rc1-mm1. It did not make any difference.
> > The only workaround for now is backing out dio-handle-eof-fix.patch and
> > dio-handle-eof.patch.
> > I am willing to test anything you could send :)
>
> Does this work on top of 2.6.10-rc1-mm1?
>
> --- /opt/kernel/linux-2.6.10-rc1-mm1/fs/direct-io.c 2004-10-27 08:29:51.000000000 +0200
> +++ linux-2.6.10-rc1-mm1/fs/direct-io.c 2004-11-02 15:36:51.864411244 +0100
> @@ -985,10 +985,12 @@
> }
>
> isize = i_size_read(inode);
> - if (bytes_todo > (isize - offset))
> - bytes_todo = isize - offset;
> - if (!bytes_todo)
> - return 0;
> + if (bytes_todo > (isize - offset)) {
> + if ((isize - offset))
> + bytes_todo = isize - offset;
> + if (bytes_todo > PAGE_SIZE)
> + bytes_todo = PAGE_SIZE;
> + }

Ehm, that should be

if ((isize - offset))
bytes_todo = isize - offset;
else if (bytes_todo > PAGE_SIZE)
bytes_todo = PAGE_SIZE;


--- /opt/kernel/linux-2.6.10-rc1-mm1/fs/direct-io.c 2004-10-27 08:29:51.000000000 +0200
+++ linux-2.6.10-rc1-mm1/fs/direct-io.c 2004-11-02 15:55:27.918459070 +0100
@@ -985,10 +985,12 @@
}

isize = i_size_read(inode);
- if (bytes_todo > (isize - offset))
- bytes_todo = isize - offset;
- if (!bytes_todo)
- return 0;
+ if (bytes_todo > (isize - offset)) {
+ if ((isize - offset))
+ bytes_todo = isize - offset;
+ else if (bytes_todo > PAGE_SIZE)
+ bytes_todo = PAGE_SIZE;
+ }

for (seg = 0; seg < nr_segs && bytes_todo; seg++) {
user_addr = (unsigned long)iov[seg].iov_base;
@@ -1008,10 +1010,9 @@
dio->curr_page = 0;

dio->total_pages = 0;
- if (user_addr & (PAGE_SIZE-1)) {
+ if (user_addr & (PAGE_SIZE-1))
dio->total_pages++;
- bytes -= PAGE_SIZE - (user_addr & (PAGE_SIZE - 1));
- }
+
dio->total_pages += (bytes + PAGE_SIZE - 1) / PAGE_SIZE;
dio->curr_user_address = user_addr;
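
For anyone comparing the two versions of the clamp, here is a standalone sketch (plain C, not kernel code) of both. It assumes PAGE_SIZE is 4096 and offset <= isize, and uses long long for every quantity for simplicity. The function names clamp_first and clamp_fixed are hypothetical.

```c
#define PAGE_SIZE 4096LL /* assumption: 4 KiB pages */

/* First version posted: the PAGE_SIZE cap is applied unconditionally. */
static long long clamp_first(long long bytes_todo, long long isize,
                             long long offset)
{
    if (bytes_todo > (isize - offset)) {
        if ((isize - offset))
            bytes_todo = isize - offset;
        if (bytes_todo > PAGE_SIZE)
            bytes_todo = PAGE_SIZE;
    }
    return bytes_todo;
}

/* Corrected version: the cap applies only when isize - offset is zero. */
static long long clamp_fixed(long long bytes_todo, long long isize,
                             long long offset)
{
    if (bytes_todo > (isize - offset)) {
        if ((isize - offset))
            bytes_todo = isize - offset;
        else if (bytes_todo > PAGE_SIZE)
            bytes_todo = PAGE_SIZE;
    }
    return bytes_todo;
}
```

For a 65536-byte request at offset 0 of a 16384-byte file, clamp_first() returns 4096 while clamp_fixed() returns 16384. The two agree when isize - offset is zero, and both leave a request that fits within the file untouched.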


--
Jens Axboe

2004-11-02 17:05:59

by Mathieu

[permalink] [raw]
Subject: Re: 2.6.9-mm1: LVM stopped working (dio-handle-eof.patch)

Jens Axboe <[email protected]> recently said:

> Ehm, that should be
>
> if ((isize - offset))
> bytes_todo = isize - offset;
> else if (bytes_todo > PAGE_SIZE)
> bytes_todo = PAGE_SIZE;
>

This one works :)
(on top of 2.6.10-rc1-mm2 too, of course)

Thanks,

--
<JALH> regex are more than some crappy posix thing
<JALH> they are an art form

- Marc Zealey on #kernelnewbies

2004-11-03 00:35:23

by Laurent Riffard

[permalink] [raw]
Subject: Re: 2.6.9-mm1: LVM stopped working


I applied this patch [http://lkml.org/lkml/2004/11/2/129]
from Jens Axboe to kernel 2.6.10-rc1-mm2 (it applied with an offset of
2 lines) and it solved the problem.

Thanks.

BTW, is there a way to reply to a post on lkml without being
subscribed?
--
laurent



2004-11-05 16:03:28

by Valdis Klētnieks

[permalink] [raw]
Subject: Re: 2.6.9-mm1: LVM stopped working (dio-handle-eof.patch)

On Tue, 02 Nov 2004 15:55:41 +0100, Jens Axboe said:

> Ehm, that should be
>
> if ((isize - offset))
> bytes_todo = isize - offset;
> else if (bytes_todo > PAGE_SIZE)
> bytes_todo = PAGE_SIZE;

(Sorry for the delay in testing this one...)

This version fixes my LVM issues on 2.6.10-rc1-mm2 as well...

>
> --- /opt/kernel/linux-2.6.10-rc1-mm1/fs/direct-io.c 2004-10-27 08:29:51.000000000 +0200
> +++ linux-2.6.10-rc1-mm1/fs/direct-io.c 2004-11-02 15:55:27.918459070 +0100
> @@ -985,10 +985,12 @@
> }
>
> isize = i_size_read(inode);
> - if (bytes_todo > (isize - offset))
> - bytes_todo = isize - offset;
> - if (!bytes_todo)
> - return 0;
> + if (bytes_todo > (isize - offset)) {
> + if ((isize - offset))
> + bytes_todo = isize - offset;
> + else if (bytes_todo > PAGE_SIZE)
> + bytes_todo = PAGE_SIZE;
> + }
>
> for (seg = 0; seg < nr_segs && bytes_todo; seg++) {
> user_addr = (unsigned long)iov[seg].iov_base;
> @@ -1008,10 +1010,9 @@
> dio->curr_page = 0;
>
> dio->total_pages = 0;
> - if (user_addr & (PAGE_SIZE-1)) {
> + if (user_addr & (PAGE_SIZE-1))
> dio->total_pages++;
> - bytes -= PAGE_SIZE - (user_addr & (PAGE_SIZE - 1));
> - }
> +
> dio->total_pages += (bytes + PAGE_SIZE - 1) / PAGE_SIZE;
> dio->curr_user_address = user_addr;
>
>
> --
> Jens Axboe
>

