2003-07-15 22:23:27

by Petro

Subject: LVM, snapshots and Linux 2.4.x

Hello again.


This time I'm having a bit of trouble with getting snapshots to work
on a 2.4.x kernel.

I have several machines configured as follows:

4 gig RAM,
2x 2.4GHz Xeon processors (hyperthreading on).
6 200 gig Western Digital drives attached to a 3ware 7800 card.
1 200 gig Western Digital drive attached to the motherboard.
Motherboard is a Supermicro SUPER P4DPi-G2 (MBD-P4DPi-G2-B)

I have tried mildly patched (i.e. only the stuff I absolutely need)
"stock" kernels, and Red Hat's 2.4.20-18.9 kernel (stock compile).

I'm trying to create and mount snapshots using "lvcreate -L10G -s -n
snaptest /dev/vg0" (and then a mount later).

Under 2.4.18 I actually get an oops:
ksymoops 2.4.1 on i686 2.4.18. Options used
-V (default)
-k /var/log/ksymoops/20030715080710.ksyms (specified)
-l /var/log/ksymoops/20030715080710.modules (specified)
-o /lib/modules/2.4.18/ (default)
-m /boot/System.map-2.4.18 (default)

Warning (compare_maps): mismatch on symbol partition_name , ksyms_base says c0208860, System.map says c0158050. Ignoring ksyms_base entry
Warning (compare_maps): mismatch on symbol nlmsvc_ops , lockd says f8984fb0, /lib/modules/2.4.18/kernel/fs/lockd/lockd.o says f8984408. Ignoring /lib/modules/2.4.18/kernel/fs/lockd/lockd.o entry
Warning (compare_maps): mismatch on symbol nfs_debug , sunrpc says f8977524, /lib/modules/2.4.18/kernel/net/sunrpc/sunrpc.o says f8977204. Ignoring /lib/modules/2.4.18/kernel/net/sunrpc/sunrpc.o entry
Warning (compare_maps): mismatch on symbol nfsd_debug , sunrpc says f8977528, /lib/modules/2.4.18/kernel/net/sunrpc/sunrpc.o says f8977208. Ignoring /lib/modules/2.4.18/kernel/net/sunrpc/sunrpc.o entry
Warning (compare_maps): mismatch on symbol nlm_debug , sunrpc says f897752c, /lib/modules/2.4.18/kernel/net/sunrpc/sunrpc.o says f897720c. Ignoring /lib/modules/2.4.18/kernel/net/sunrpc/sunrpc.o entry
Warning (compare_maps): mismatch on symbol rpc_debug , sunrpc says f8977520, /lib/modules/2.4.18/kernel/net/sunrpc/sunrpc.o says f8977200. Ignoring /lib/modules/2.4.18/kernel/net/sunrpc/sunrpc.o entry
kernel BUG at vmalloc.c:236!
invalid operand: 0000
CPU: 3
EIP: 0010:[<c012c431>] Not tainted
Using defaults from ksymoops -t elf32-i386 -a i386
EFLAGS: 00010286
eax: 0000001d ebx: 00000000 ecx: c02ccde0 edx: 00005647
esi: 00000000 edi: f7162000 ebp: fffffff4 esp: ea799d18
ds: 0018 es: 0018 ss: 0018
Process lvcreate (pid: 10777, stackpage=ea799000)
Stack: c02695b7 000000ec 00000000 00000000 f7162000 fffffff4 000001f0 f8d61000
00000001 fffffff4 c02ce188 c02ce2d8 000001f0 00000001 c0212e15 00000000
000001f2 00000163 f716216c 00000000 f7162000 ea799df8 c0212ec8 f7162000
Call Trace: [<c0212e15>] [<c0212ec8>] [<c0210ae0>] [<c020e58c>] [<c0145dc7>]
[<c0106e9b>]
Code: 0f 0b 83 c4 08 31 c0 e9 b7 01 00 00 8d 76 00 6a 02 53 e8 2c

>>EIP; c012c431 <__vmalloc+35/200> <=====
Trace; c0212e15 <lvm_snapshot_alloc_hash_table+45/8c>
Trace; c0212ec8 <lvm_snapshot_alloc+6c/e0>
Trace; c0210ae0 <lvm_do_lv_create+50c/850>
Trace; c020e58c <lvm_chr_ioctl+71c/828>
Trace; c0145dc7 <sys_ioctl+1bb/208>
Trace; c0106e9b <system_call+33/38>
Code; c012c431 <__vmalloc+35/200>
00000000 <_EIP>:
Code; c012c431 <__vmalloc+35/200> <=====
0: 0f 0b ud2a <=====
Code; c012c433 <__vmalloc+37/200>
2: 83 c4 08 add $0x8,%esp
Code; c012c436 <__vmalloc+3a/200>
5: 31 c0 xor %eax,%eax
Code; c012c438 <__vmalloc+3c/200>
7: e9 b7 01 00 00 jmp 1c3 <_EIP+0x1c3> c012c5f4 <__vmalloc+1f8/200>
Code; c012c43d <__vmalloc+41/200>
c: 8d 76 00 lea 0x0(%esi),%esi
Code; c012c440 <__vmalloc+44/200>
f: 6a 02 push $0x2
Code; c012c442 <__vmalloc+46/200>
11: 53 push %ebx
Code; c012c443 <__vmalloc+47/200>
12: e8 2c 00 00 00 call 43 <_EIP+0x43> c012c474 <__vmalloc+78/200>


6 warnings issued. Results may not be reliable.
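
A note on reading the trace above: it ends in __vmalloc, reached from
lvm_snapshot_alloc_hash_table, and ebx and esi are both zero. One possible
explanation, offered here as an assumption and not checked against the 2.4.18
or LVM 1.0.7 sources, is that the snapshot hash table is sized from the amount
of physical memory held in a 32-bit unsigned long. At exactly 4 GiB that byte
count wraps to zero, which would turn into a zero-size allocation request. The
sketch below only reproduces the arithmetic in shell; the function names, the
2% budget and the 8-byte bucket size are hypothetical, and it assumes a shell
with 64-bit arithmetic.

```shell
#!/bin/sh
# Model of a 32-bit "bytes of RAM" calculation. All names and constants here
# are illustrative assumptions, not taken from the kernel or LVM sources.

PAGE_SHIFT=12   # 4 KiB pages on i386

# Emulate a 32-bit unsigned long: keep only the low 32 bits.
u32() {
    echo $(( $1 & 0xFFFFFFFF ))
}

# Hypothetical bucket budget: 2% of RAM divided by an 8-byte bucket.
# $1 = number of physical pages
max_buckets() {
    mem=$(u32 $(( $1 << $PAGE_SHIFT )))   # wraps to 0 at exactly 4 GiB
    echo $(( mem / 100 * 2 / 8 ))
}

max_buckets 262144    # 1 GiB of RAM: prints 2684354
max_buckets 1048576   # 4 GiB of RAM: prints 0
```

If something like this is going on, a bucket count of zero would lead to a
zero-byte allocation, which would fit an oops inside __vmalloc. It would not
by itself explain a snapshot that is created but cannot be mounted.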

On a 2.4.21 kernel I get:
lvcreate -- WARNING: the snapshot will be automatically disabled once it gets full
lvcreate -- INFO: using default snapshot chunk size of 64 KB for "/dev/vg0/snap"
lvcreate -- ERROR "Cannot allocate memory" creating VGDA for "/dev/vg0/snap" in kernel

This is on a machine with 4 gig of memory running *NOTHING ELSE*.

This also leaves the LVM layer in a state that requires a hard
reboot--powercycle--to recover.

This is with:

# CONFIG_NOHIGHMEM is not set
CONFIG_HIGHMEM4G=y
# CONFIG_HIGHMEM64G is not set
CONFIG_HIGHMEM=y
CONFIG_HIGHIO=y
# CONFIG_MATH_EMULATION is not set
CONFIG_MTRR=y
CONFIG_SMP=y

If I turn off HIGHIO, the lvcreate command completes successfully, but
the snapshot is unmountable.

If I turn off HIGHMEM4G and HIGHIO, then everything works, but I lose
3G. This is non-workable.
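
To confirm which of these options a given kernel was actually built with,
something like the following can help. It is a minimal sketch: the function
name is made up for this example, and the sample config written to a
temporary file just mirrors the fragment above.

```shell
#!/bin/sh
# List the high-memory options enabled (=y) in a kernel config file.
# $1 = path to the config file used for the build
enabled_highmem_opts() {
    grep -E '^CONFIG_(NOHIGHMEM|HIGHMEM|HIGHMEM4G|HIGHMEM64G|HIGHIO)=y$' "$1" |
        sed 's/=y$//'
}

# Demo against a throwaway copy of the config fragment.
cfg=$(mktemp)
cat > "$cfg" <<'EOF'
# CONFIG_NOHIGHMEM is not set
CONFIG_HIGHMEM4G=y
# CONFIG_HIGHMEM64G is not set
CONFIG_HIGHMEM=y
CONFIG_HIGHIO=y
EOF
enabled_highmem_opts "$cfg"
rm -f "$cfg"
```

Running it against each kernel's config before a test makes it easier to keep
the HIGHMEM/HIGHIO combinations straight.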

The Red Hat kernel (2.4.20-18.9) allows the creation of the snapshot, but
the snapshot will not mount.

However--the Red Hat kernel is a real bastard child, as these are Debian
boxes with really old bits on them (pre-3.0 unstable packages).

At this point these are not production machines, so I'm willing to try
just about anything reasonable to get a *stable* platform that provides:

1) Access to 4G of RAM (or most of 4G).
2) A 1 terabyte filesystem (or close to it).
3) 10-20G snapshots of that filesystem.


What am I forgetting to mention, other than that I'm at my wits' end?

--
"On two occasions, I have been asked [by members of Parliament], 'Pray,
Mr. Babbage, if you put into the machine wrong figures, will the right
answers come out?' I am not able to rightly apprehend the kind of confusion
of ideas that could provoke such a question." -- Charles Babbage


2003-07-16 01:01:19

by Bernd Eckenfels

Subject: Re: LVM, snapshots and Linux 2.4.x

In article <[email protected]> you wrote:
> if I turn off HIGHIO, the lvcreate command completes successfully, but
> the snapshot is unmountable.

lvm was segfaulting for me with xfs if the snapshot volume got full, but I
think this is fixed.

What filesystem do you have on the snapshot volume? Depending on the
filesystem, you may need to mount it without journal replay, or with
ignoring duplicate uuids.
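
As a rough illustration of the mount options mentioned above: for XFS,
nouuid skips the duplicate-UUID check and norecovery skips log replay; for
ext3, noload skips loading the journal. The helper below only prints the
command so it can be inspected first. The function name and the mount point
/mnt/snap are made up for the example, and option support should be checked
against the mount(8) page for the kernel in use.

```shell
#!/bin/sh
# Print (not run) a read-only mount command for an LVM snapshot.
# $1 = filesystem type, $2 = snapshot device, $3 = mount point
snapshot_mount_cmd() {
    case "$1" in
        xfs)  opts="ro,nouuid,norecovery" ;;  # skip UUID check and log replay
        ext3) opts="ro,noload" ;;             # skip journal replay
        *)    opts="ro" ;;                    # plain read-only mount
    esac
    echo "mount -t $1 -o $opts $2 $3"
}

snapshot_mount_cmd xfs /dev/vg0/snaptest /mnt/snap
```

Mounting read-only matters here because both norecovery and noload leave the
journal unreplayed, so the filesystem may not be fully consistent.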

Greetings
Bernd
--
eckes privat - http://www.eckes.org/
Project Freefire - http://www.freefire.org/

2003-07-16 21:15:21

by Petro

Subject: Re: LVM, snapshots and Linux 2.4.x

On Wed, Jul 16, 2003 at 03:16:07AM +0200, Bernd Eckenfels wrote:
> In article <[email protected]> you wrote:
> > if I turn off HIGHIO, the lvcreate command completes successfully, but
> > the snapshot is unmountable.
>
> lvm was segfaulting for me with xfs if the snapshot volume got full, but I
> think this is fixed.
> What filesystem do you have on the snapshot volume? Depending on the
> filesystem, you may need to mount it without journal replay, or with
> ignoring duplicate uuids.

Kernel 2.4.18 oopses when I do the lvcreate -s.

Kernel 2.4.21 with HighMem I/O fails on the lvcreate -s.

Kernel 2.4.21 without HighMem I/O, but with high memory, fails
when I try to mount the snapshot.

Kernel 2.4.21 without high memory, and without HighMem I/O, succeeds
in both creating and mounting the snapshot.


2003-07-18 23:06:36

by Petro

Subject: Kernel Oops--2.4.18

(I sent a similar email out earlier in the week, but received only one
reply, so I'm trying again)

I'm repeatedly getting:

kernel BUG at vmalloc.c:236!
invalid operand: 0000
CPU: 2
EIP: 0010:[__vmalloc+53/512] Not tainted
EFLAGS: 00010286
eax: 0000001d ebx: 00000000 ecx: c02c80e0 edx: 000055a2
esi: 00000000 edi: f705a600 ebp: fffffff4 esp: f6fc9d18
ds: 0018 es: 0018 ss: 0018
Process lvcreate (pid: 400, stackpage=f6fc9000)
Stack: c026c0d5 000000ec 00000000 00000000 f705a600 fffffff4 000001f0 f8d4e000
00000001 fffffff4 c02c94e8 c02c9638 000001f0 00000001 c0216165 00000000
000001f2 00000163 f705a76c 00000000 f705a600 f6fc9df8 c0216218 f705a600
Call Trace: [lvm_snapshot_alloc_hash_table+69/140] [lvm_snapshot_alloc+108/224] [lvm_do_lv_create+1292/2140] [lvm_chr_ioctl+1820/2088] [sys_ioctl+443/520]
[system_call+51/56]

Code: 0f 0b 83 c4 08 31 c0 e9 b7 01 00 00 8d 76 00 6a 02 53 e8 2c

Which decodes to:

ksymoops 2.4.1 on i686 2.4.18. Options used
-V (default)
-k /var/log/ksymoops/20030718143024.ksyms (specified)
-l /var/log/ksymoops/20030718143024.modules (specified)
-o /lib/modules/2.4.18/ (default)
-m /boot/System.map-2.4.18 (default)

Warning (compare_maps): mismatch on symbol partition_name , ksyms_base says c020b870, System.map says c0158b10. Ignoring ksyms_base entry
Warning (compare_maps): mismatch on symbol nlmsvc_ops , lockd says f8973af0, /lib/modules/2.4.18/kernel/fs/lockd/lockd.o says f8972f48. Ignoring /lib/modules/2.4.18/kernel/fs/lockd/lockd.o entry
Warning (compare_maps): mismatch on symbol nfs_debug , sunrpc says f89660e4, /lib/modules/2.4.18/kernel/net/sunrpc/sunrpc.o says f8965dc4. Ignoring /lib/modules/2.4.18/kernel/net/sunrpc/sunrpc.o entry
Warning (compare_maps): mismatch on symbol nfsd_debug , sunrpc says f89660e8, /lib/modules/2.4.18/kernel/net/sunrpc/sunrpc.o says f8965dc8. Ignoring /lib/modules/2.4.18/kernel/net/sunrpc/sunrpc.o entry
Warning (compare_maps): mismatch on symbol nlm_debug , sunrpc says f89660ec, /lib/modules/2.4.18/kernel/net/sunrpc/sunrpc.o says f8965dcc. Ignoring /lib/modules/2.4.18/kernel/net/sunrpc/sunrpc.o entry
Warning (compare_maps): mismatch on symbol rpc_debug , sunrpc says f89660e0, /lib/modules/2.4.18/kernel/net/sunrpc/sunrpc.o says f8965dc0. Ignoring /lib/modules/2.4.18/kernel/net/sunrpc/sunrpc.o entry
kernel BUG at vmalloc.c:236!
invalid operand: 0000
CPU: 2
EIP: 0010:[__vmalloc+53/512] Not tainted
EFLAGS: 00010286
eax: 0000001d ebx: 00000000 ecx: c02c80e0 edx: 000055a2
esi: 00000000 edi: f705a600 ebp: fffffff4 esp: f6fc9d18
ds: 0018 es: 0018 ss: 0018
Process lvcreate (pid: 400, stackpage=f6fc9000)
Stack: c026c0d5 000000ec 00000000 00000000 f705a600 fffffff4 000001f0 f8d4e000
00000001 fffffff4 c02c94e8 c02c9638 000001f0 00000001 c0216165 00000000
000001f2 00000163 f705a76c 00000000 f705a600 f6fc9df8 c0216218 f705a600
Call Trace: [lvm_snapshot_alloc_hash_table+69/140] [lvm_snapshot_alloc+108/224] [lvm_do_lv_create+1292/2140] [lvm_chr_ioctl+1820/2088] [sys_ioctl+443/520]
Code: 0f 0b 83 c4 08 31 c0 e9 b7 01 00 00 8d 76 00 6a 02 53 e8 2c
Using defaults from ksymoops -t elf32-i386 -a i386

Code; 00000000 Before first symbol
00000000 <_EIP>:
Code; 00000000 Before first symbol
0: 0f 0b ud2a
Code; 00000002 Before first symbol
2: 83 c4 08 add $0x8,%esp
Code; 00000005 Before first symbol
5: 31 c0 xor %eax,%eax
Code; 00000007 Before first symbol
7: e9 b7 01 00 00 jmp 1c3 <_EIP+0x1c3> 000001c3 Before first symbol
Code; 0000000c Before first symbol
c: 8d 76 00 lea 0x0(%esi),%esi
Code; 0000000f Before first symbol
f: 6a 02 push $0x2
Code; 00000011 Before first symbol
11: 53 push %ebx
Code; 00000012 Before first symbol
12: e8 2c 00 00 00 call 43 <_EIP+0x43> 00000043 Before first symbol


6 warnings issued. Results may not be reliable.

This happens when I try to create a snapshot with
lvcreate -L10G -s -n snaptest /dev/vg0/lv0.

The kernel source is mildly patched with (in order):
lmsensors-patch-2.7.0vs2.4.18
i2cpatch-2.7.0vs2.4.18lvm
lvm-1.0.7-2.4.18.patch
linux-2.4.18-VFS-lock.patch
linux-2.4.18-sard.patch
3ware 7.5.2 drivers (copied over by hand).

and built using gcc 2.95.4.

The hardware is:

4 gig RAM,
2x 2.4GHz Xeon processors (hyperthreading on).
6 200 gig Western Digital drives attached to a 3ware 7800 card.
1 200 gig Western Digital drive attached to the motherboard.
Motherboard is a Supermicro SUPER P4DPi-G2 (MBD-P4DPi-G2-B)


I have:
# CONFIG_NOHIGHMEM is not set
CONFIG_HIGHMEM4G=y
# CONFIG_HIGHMEM64G is not set
CONFIG_HIGHMEM=y

I am told by the LVM Guy that this is a kernel VM bug. Is this the case?
If so, is there a patch I can apply to let me get this stuff into
production?

