2001-03-08 22:22:20

by Jeffrey Hundstad

[permalink] [raw]
Subject: lvm - lvm_map access beyond end of device

Hello,

Vital info:
Linux-2.2.18
LVM version 0.9 by Heinz Mauelshagen (13/11/2000)
gcc version 2.95.2
GDT7563RN Firmware 2.27.04-R03F


After about 27 days of uptime one of our Linux machines that is being
used as a Samba, Netatalk, and FTP server; the main data mount went into
a read-only state. The machine has a GDT7563RN ICP Vortex RAID card
with 6-SCSI channels. Five channels are being used with nine 18GB disks
on each channel. The RAID controller takes those 45 disks and makes
three RAID-5 units with two hot-spares. We are using LVM to take those
three RAID disks and striping those to one big block device. We are
mounting that 656GB disk as an EXT2 mount on /data. During testing we
banged the h*ll out of it with multiple bonnie tests and never found a
problem. But during normal use we got events in our syslog (attached),
also note the RAID controller showed no events. After a umount of the
mounted filesystem, a reboot then a fsck the system worked fine. There
were no bad reports from the fsck. Yes, I said no bad reports from
fsck, it did take 3.5 hours though. What's causing this?

Inline attachments syslog, dmesg-boot.

syslog:

Mar 8 01:15:47 files kernel: lvm - lvm_map access beyond end of device;
*rsector: 2451636592 or size: 8 wrong for minor: 0
Mar 8 01:15:47 files kernel: Bad lvm_map in ll_rw_block
Mar 8 01:15:47 files kernel: EXT2-fs error (device lvm(58,0)):
ext2_readdir: bad entry in directory #43122713: rec_len is smaller than
minimal - offset=0, inode=0, rec_len=0, name_len=0
Mar 8 01:15:47 files kernel: Remounting filesystem read-only
Mar 8 01:15:48 files kernel: lvm - lvm_map access beyond end of device;
*rsector: 2596938272 or size: 8 wrong for minor: 0
Mar 8 01:15:48 files kernel: Bad lvm_map in ll_rw_block
Mar 8 01:15:48 files kernel: EXT2-fs error (device lvm(58,0)):
ext2_readdir: bad entry in directory #43122714: rec_len is smaller than
minimal - offset=0, inode=0, rec_len=0, name_len=0
Mar 8 01:15:48 files kernel: Remounting filesystem read-only


dmesg-boot:
1 1 89
0d 000 00 1 0 0 0 0 0 0 00
0e 000 00 0 0 0 0 0 1 1 91
0f 000 00 0 0 0 0 0 1 1 99
10 0FF 0F 1 1 0 1 0 1 1 A1
11 0FF 0F 1 1 0 1 0 1 1 A9
12 000 00 1 0 0 0 0 0 0 00
13 0FF 0F 1 1 0 1 0 1 1 B1
14 000 00 1 0 0 0 0 0 0 00
15 0FF 0F 1 1 0 1 0 1 1 B9
16 000 00 1 0 0 0 0 0 0 00
17 0FF 0F 1 1 0 1 0 1 1 C1
IRQ to pin mappings:
IRQ0 -> 2
IRQ1 -> 1
IRQ3 -> 3
IRQ4 -> 4
IRQ6 -> 6
IRQ7 -> 7
IRQ8 -> 8
IRQ12 -> 12
IRQ13 -> 13
IRQ14 -> 14
IRQ15 -> 15
IRQ16 -> 16
IRQ17 -> 17
IRQ19 -> 19
IRQ21 -> 21
IRQ23 -> 23
.................................... done.
checking TSC synchronization across CPUs: passed.
PCI: PCI BIOS revision 2.10 entry at 0xfdab0
PCI: Using configuration type 1
PCI: Probing PCI hardware
PCI->APIC IRQ transform: (B0,I12,P0) -> 19
PCI->APIC IRQ transform: (B0,I12,P0) -> 19
PCI->APIC IRQ transform: (B0,I13,P0) -> 17
PCI->APIC IRQ transform: (B0,I14,P0) -> 21
PCI->APIC IRQ transform: (B0,I16,P0) -> 16
PCI->APIC IRQ transform: (B0,I16,P1) -> 21
PCI->APIC IRQ transform: (B0,I18,P3) -> 21
PCI->APIC IRQ transform: (B2,I7,P0) -> 23
Linux NET4.0 for Linux 2.2
Based upon Swansea University Computer Society NET3.039
NET4: Unix domain sockets 1.0 for Linux NET4.0.
NET4: Linux TCP/IP 1.0 for NET4.0
IP Protocols: ICMP, UDP, TCP, IGMP
TCP: Hash tables configured (ehash 524288 bhash 65536)
Starting kswapd v 1.5
Detected PS/2 Mouse Port.
pty: 256 Unix98 ptys configured
Real Time Clock Driver v1.09
RAM disk driver initialized: 16 RAM disks of 4096K size
loop: registered device at major 7
PIIX4: IDE controller on PCI bus 00 dev 91
PIIX4: not 100% native mode: will probe irqs later
ide0: BM-DMA at 0x3060-0x3067, BIOS settings: hda:DMA, hdb:pio
ide1: BM-DMA at 0x3068-0x306f, BIOS settings: hdc:pio, hdd:pio
Floppy drive(s): fd0 is 1.44M
FDC 0 is a National Semiconductor PC87306
LVM version 0.9 by Heinz Mauelshagen (13/11/2000)
lvm
-- Driver successfully initialized
Configuring GDT-PCI HA at 0/13 IRQ 17
scsi0 : GDT7563RN
scsi : 1 host.
Vendor: ICP Model: Host Drive #00 Rev:
Type: Direct-Access ANSI SCSI revision: 02
Detected scsi disk sda at scsi0, channel 0, id 0, lun 0
Vendor: ICP Model: Host Drive #15 Rev:
Type: Direct-Access ANSI SCSI revision: 02
Detected scsi disk sdb at scsi0, channel 0, id 15, lun 0
Vendor: ICP Model: Host Drive #29 Rev:
Type: Direct-Access ANSI SCSI revision: 02
Detected scsi disk sdc at scsi0, channel 0, id 29, lun 0
scsi : detected 3 SCSI disks total.
SCSI device sda: hdwr sector= 512 bytes. Sectors= 501790275 [245014 MB]
[245.0 GB]
SCSI device sdb: hdwr sector= 512 bytes. Sectors= 465949260 [227514 MB]
[227.5 GB]
SCSI device sdc: hdwr sector= 512 bytes. Sectors= 465949260 [227514 MB]
[227.5 GB]
Partition check:
sda: sda1 sda2 sda3
sdb: sdb1
sdc: sdc1
VFS: Mounted root (ext2 filesystem) readonly.
Freeing unused kernel memory: 68k freed
Adding Swap: 2097136k swap-space (priority -1)
eepro100.c:v1.09j-t 9/29/99 Donald Becker
http://cesdis.gsfc.nasa.gov/linux/drivers/eepro100.html
eepro100.c: $Revision: 1.20.2.10 $ 2000/05/31 Modified by Andrey V.
Savochkin <[email protected]> and others
eepro100.c: VA Linux custom, Dragan Stancevic <[email protected]>
2000/11/15
eth0: Intel PCI EtherExpress Pro100 82557, 00:D0:B7:88:55:39, IRQ 21.
Receiver lock-up bug exists -- enabling work-around.
Board assembly 000000-000, Physical connectors present: RJ45
Primary interface chip i82555 PHY #1.
Forcing 100Mbs full-duplex operation.
General self-test: passed.
Serial sub-system self-test: passed.
Internal registers self-test: passed.
ROM checksum self-test: passed (0x04f4518b).
NET4: AppleTalk 0.18 for Linux NET4.0
(scsi1) <Adaptec AIC-7896/7 Ultra2 SCSI host adapter> found at PCI
0/12/0
(scsi1) Wide Channel A, SCSI ID=7, 32/255 SCBs
(scsi1) Downloading sequencer code... 392 instructions downloaded
(scsi2) <Adaptec AIC-7896/7 Ultra2 SCSI host adapter> found at PCI
0/12/1
(scsi2) Wide Channel B, SCSI ID=7, 32/255 SCBs
(scsi2) Downloading sequencer code... 392 instructions downloaded
(scsi3) <Adaptec AIC-7899 Ultra 160/m SCSI host adapter> found at PCI
0/16/0
(scsi3) Wide Channel A, SCSI ID=7, 32/255 SCBs
(scsi3) Downloading sequencer code... 392 instructions downloaded
(scsi4) <Adaptec AIC-7899 Ultra 160/m SCSI host adapter> found at PCI
0/16/1
(scsi4) Wide Channel B, SCSI ID=7, 32/255 SCBs
(scsi4) Downloading sequencer code... 392 instructions downloaded
scsi1 : Adaptec AHA274x/284x/294x (EISA/VLB/PCI-Fast SCSI) 5.1.31/3.2.4
<Adaptec AIC-7896/7 Ultra2 SCSI host adapter>
scsi2 : Adaptec AHA274x/284x/294x (EISA/VLB/PCI-Fast SCSI) 5.1.31/3.2.4
<Adaptec AIC-7896/7 Ultra2 SCSI host adapter>
scsi3 : Adaptec AHA274x/284x/294x (EISA/VLB/PCI-Fast SCSI) 5.1.31/3.2.4
<Adaptec AIC-7899 Ultra 160/m SCSI host adapter>
scsi4 : Adaptec AHA274x/284x/294x (EISA/VLB/PCI-Fast SCSI) 5.1.31/3.2.4
<Adaptec AIC-7899 Ultra 160/m SCSI host adapter>
scsi : 5 hosts.
(scsi2:0:6:0) Synchronous at 20.0 Mbyte/sec, offset 16.
Vendor: TOSHIBA Model: CD-ROM XM-6401TA Rev: 1001
Type: CD-ROM ANSI SCSI revision: 02
Vendor: EXABYTE Model: Exabyte EZ17 Rev: 1.07
Type: Medium Changer ANSI SCSI revision: 02
(scsi3:0:1:0) Synchronous at 80.0 Mbyte/sec, offset 31.
Vendor: EXABYTE Model: Mammoth2 Rev: v02j
Type: Sequential-Access ANSI SCSI revision: 02
Vendor: EXABYTE Model: EXB-210 Rev: 1.07
Type: Medium Changer ANSI SCSI revision: 02
(scsi4:0:1:0) Synchronous at 80.0 Mbyte/sec, offset 31.
Vendor: EXABYTE Model: Mammoth2 Rev: v02p
Type: Sequential-Access ANSI SCSI revision: 02
Detected scsi generic sg4 at scsi3, channel 0, id 0, lun 0
Detected scsi generic sg6 at scsi4, channel 0, id 0, lun 0
st: bufsize 32768, wrt 30720, max buffers 5, s/g segs 16.
Detected scsi tape st0 at scsi3, channel 0, id 1, lun 0
Detected scsi tape st1 at scsi4, channel 0, id 1, lun 0




2001-03-09 00:48:05

by Andreas Dilger

[permalink] [raw]
Subject: Re: lvm - lvm_map access beyond end of device

Jeffrey Hundstad writes:
> After about 27 days of uptime one of our Linux machines that is being
> used as a Samba, Netatalk, and FTP server; the main data mount went into
> a read-only state.
[snip]
> We are mounting that 656GB disk as an EXT2 mount on /data.

As an aside, other than this problem, how do you find ext2 on such a large
filesystem? So far this is the largest ext2 filesystem I've seen. Are
you looking into ext3 at all to avoid fsck on crash? Granted, in this
case you still needed to do the whole fsck but in other cases ext3 would
help.

> ...during normal use we got events in our syslog (attached), also note
> the RAID controller showed no events. After a umount of the mounted
> filesystem, a reboot then a fsck the system worked fine. There
> were no bad reports from the fsck. Yes, I said no bad reports from
> fsck, it did take 3.5 hours though. What's causing this?
>
> Mar 8 01:15:47 files kernel: lvm - lvm_map access beyond end of device;
> *rsector: 2451636592 or size: 8 wrong for minor: 0
> Mar 8 01:15:47 files kernel: Bad lvm_map in ll_rw_block
> Mar 8 01:15:47 files kernel: EXT2-fs error (device lvm(58,0)):
> ext2_readdir: bad entry in directory #43122713: rec_len is smaller than
> minimal - offset=0, inode=0, rec_len=0, name_len=0
> Mar 8 01:15:47 files kernel: Remounting filesystem read-only
> Mar 8 01:15:48 files kernel: lvm - lvm_map access beyond end of device;
> *rsector: 2596938272 or size: 8 wrong for minor: 0
> Mar 8 01:15:48 files kernel: Bad lvm_map in ll_rw_block
> Mar 8 01:15:48 files kernel: EXT2-fs error (device lvm(58,0)):
> ext2_readdir: bad entry in directory #43122714: rec_len is smaller than
> minimal - offset=0, inode=0, rec_len=0, name_len=0
> Mar 8 01:15:48 files kernel: Remounting filesystem read-only

Is this the only output in your syslog, a representative sample, a
small part of a huge number of messages? How often do you have this
problem, or is it the first time?

Can you verify that inodes 43122713 and 43122714 are actually valid
directories or at least valid inode numbers? dumpe2fs -h should be
enough to tell you the number of inodes in the filesystem.

According to my calculations for your 656 GB 4kB block filesystem:

sector 2451636592 / 8 sectors/block = 306454574
sector 2596938272 / 8 sectors/block = 324617284

656 GB * 1024 MB/GB * 256 blocks/MB = 171966464 blocks

So LVM is correct in refusing to map these blocks. The real question is
where do the block numbers come from? The "bad entry" messages are
simply a result of LVM not filling in the buffer for ext2, and it is
all zeros.

The first point where we could have a bogus block number would be
in inode_getblk(), where we access inode->u.ext2_i.i_data[0] (the
block number here is not checked), but you say that e2fsck reported
all was well.

The below patch for 2.2.18 verifies the block numbers read from disk
before reading them. It would at least help narrow down the problem here.

Cheers, Andreas
========================================================================
--- fs/ext2/inode.c.orig Wed Jan 17 03:22:51 2001
+++ fs/ext2/inode.c Thu Mar 8 17:35:33 2001
@@ -228,13 +228,20 @@
int create, int new_block, int * err)
{
u32 * p;
- int tmp, goal = 0;
+ u32 tmp, goal = 0;
struct buffer_head * result;
int blocks = inode->i_sb->s_blocksize / 512;

p = inode->u.ext2_i.i_data + nr;
repeat:
tmp = *p;
+ if (tmp > le32_to_cpu(EXT2_SB(inode->i_sb)->s_es->s_blocks_count)) {
+ ext2_warning(inode->i_sb, __FUNCTION__,
+ "inode #%u has bad direct block[%d] = %u\n",
+ inode->i_ino, nr, tmp);
+ *err = -EIO;
+ return NULL;
+ }
if (tmp) {
struct buffer_head * result;
result = getblk (inode->i_dev, le32_to_cpu(tmp), inode->i_sb->s_blocksize);
@@ -293,7 +301,7 @@
int create, int blocksize,
int new_block, int * err)
{
- int tmp, goal = 0;
+ u32 tmp, goal = 0;
u32 * p;
struct buffer_head * result;
int blocks = inode->i_sb->s_blocksize / 512;
@@ -311,6 +319,13 @@
p = (u32 *) bh->b_data + nr;
repeat:
tmp = le32_to_cpu(*p);
+ if (tmp > le32_to_cpu(EXT2_SB(inode->i_sb)->s_es->s_blocks_count)) {
+ ext2_warning(inode->i_sb, __FUNCTION__,
+ "inode #%u has bad indirect block[%u][%d] = %u\n",
+ inode->i_ino, bh->b_blocknr, nr, tmp);
+ *err = -EIO;
+ return NULL;
+ }
if (tmp) {
result = getblk (bh->b_dev, tmp, blocksize);
if (tmp == le32_to_cpu(*p)) {
--
Andreas Dilger \ "If a man ate a pound of pasta and a pound of antipasto,
\ would they cancel out, leaving him still hungry?"
http://www-mddsp.enel.ucalgary.ca/People/adilger/ -- Dogbert

2001-03-09 05:02:23

by Jeffrey Hundstad

[permalink] [raw]
Subject: Re: lvm - lvm_map access beyond end of device

Andreas Dilger wrote:

> Jeffrey Hundstad writes:
> > After about 27 days of uptime one of our Linux machines that is being
> > used as a Samba, Netatalk, and FTP server; the main data mount went into
> > a read-only state.
> [snip]
> > We are mounting that 656GB disk as an EXT2 mount on /data.
>
> As an aside, other than this problem, how do you find ext2 on such a large
> filesystem? So far this is the largest ext2 filesystem I've seen. Are
> you looking into ext3 at all to avoid fsck on crash? Granted, in this
> case you still needed to do the whole fsck but in other cases ext3 would
> help.

EXT2 is nice and snappy. This is our first crash... but the fsck was fun to watch ;) We
might switch if there is a stable alternative. I've used EXT3 on my personal system and it
worked fine, and it was really sweet to move from EXT2<->EXT3. Do you recommend it for a
production system? Why isn't it in 2.2.18 OR 2.4.x?


>
> > ...during normal use we got events in our syslog (attached), also note
> > the RAID controller showed no events. After a umount of the mounted
> > filesystem, a reboot then a fsck the system worked fine. There
> > were no bad reports from the fsck. Yes, I said no bad reports from
> > fsck, it did take 3.5 hours though. What's causing this?
> >
> > Mar 8 01:15:47 files kernel: lvm - lvm_map access beyond end of device;
> > *rsector: 2451636592 or size: 8 wrong for minor: 0
> > Mar 8 01:15:47 files kernel: Bad lvm_map in ll_rw_block
> > Mar 8 01:15:47 files kernel: EXT2-fs error (device lvm(58,0)):
> > ext2_readdir: bad entry in directory #43122713: rec_len is smaller than
> > minimal - offset=0, inode=0, rec_len=0, name_len=0
> > Mar 8 01:15:47 files kernel: Remounting filesystem read-only
> > Mar 8 01:15:48 files kernel: lvm - lvm_map access beyond end of device;
> > *rsector: 2596938272 or size: 8 wrong for minor: 0
> > Mar 8 01:15:48 files kernel: Bad lvm_map in ll_rw_block
> > Mar 8 01:15:48 files kernel: EXT2-fs error (device lvm(58,0)):
> > ext2_readdir: bad entry in directory #43122714: rec_len is smaller than
> > minimal - offset=0, inode=0, rec_len=0, name_len=0
> > Mar 8 01:15:48 files kernel: Remounting filesystem read-only
>
> Is this the only output in your syslog, a representative sample, a
> small part of a huge number of messages? How often do you have this
> problem, or is it the first time?
>

This was a `grep kernel` of the days syslog file. I did take a look and it was the only thing
relevant.


>
> Can you verify that inodes 43122713 and 43122714 are actually valid
> directories or at least valid inode numbers? dumpe2fs -h should be
> enough to tell you the number of inodes in the filesystem.
>

# dumpe2fs -h /dev/vg0/lv0
Filesystem volume name: <none>
Last mounted on: <not available>
Filesystem UUID: 1e473942-d8f1-4e48-905f-2f872f7e4424
Filesystem magic number: 0xEF53
Filesystem revision #: 1 (dynamic)
Filesystem features: filetype sparse_super
Filesystem state: not clean
Errors behavior: Continue
Filesystem OS type: Linux
Inode count: 87375872
Block count: 174723072
Reserved block count: 0
Free blocks: 163429202
Free inodes: 86687944
First block: 0
Block size: 4096
Fragment size: 4096
Blocks per group: 32768
Fragments per group: 32768
Inodes per group: 16384
Inode blocks per group: 512
Last mount time: Thu Mar 8 12:28:00 2001
Last write time: Thu Mar 8 22:42:13 2001
Mount count: 1
Maximum mount count: 20
Last checked: Thu Mar 8 12:27:58 2001
Check interval: 15552000 (6 months)
Next check after: Tue Sep 4 13:27:58 2001
Reserved blocks uid: 0 (user root)
Reserved blocks gid: 0 (group root)
First inode: 11
Inode size: 128



>
> According to my calculations for your 656 GB 4kB block filesystem:
>
> sector 2451636592 / 8 sectors/block = 306454574
> sector 2596938272 / 8 sectors/block = 324617284
>
> 656 GB * 1024 MB/GB * 256 blocks/MB = 171966464 blocks
>
> So LVM is correct in refusing to map these blocks. The real question is
> where do the block numbers come from? The "bad entry" messages are
> simply a result of LVM not filling in the buffer for ext2, and it is
> all zeros.
>
> The first point where we could have a bogus block number would be
> in inode_getblk(), where we access inode->u.ext2_i.i_data[0] (the
> block number here is not checked), but you say that e2fsck reported
> all was well.
>
> The below patch for 2.2.18 verifies the block numbers read from disk
> before reading them. It would at least help narrow down the problem here.

I'll see if I can add them in next reboot... which is tricky business with a 24x7 machine in
production. I'll insert this into 2.2.19 when it comes out.


>
>
> Cheers, Andreas
> ========================================================================
> --- fs/ext2/inode.c.orig Wed Jan 17 03:22:51 2001
> +++ fs/ext2/inode.c Thu Mar 8 17:35:33 2001
> @@ -228,13 +228,20 @@
> int create, int new_block, int * err)
> {
> u32 * p;
> - int tmp, goal = 0;
> + u32 tmp, goal = 0;
> struct buffer_head * result;
> int blocks = inode->i_sb->s_blocksize / 512;
>
> p = inode->u.ext2_i.i_data + nr;
> repeat:
> tmp = *p;
> + if (tmp > le32_to_cpu(EXT2_SB(inode->i_sb)->s_es->s_blocks_count)) {
> + ext2_warning(inode->i_sb, __FUNCTION__,
> + "inode #%u has bad direct block[%d] = %u\n",
> + inode->i_ino, nr, tmp);
> + *err = -EIO;
> + return NULL;
> + }
> if (tmp) {
> struct buffer_head * result;
> result = getblk (inode->i_dev, le32_to_cpu(tmp), inode->i_sb->s_blocksize);
> @@ -293,7 +301,7 @@
> int create, int blocksize,
> int new_block, int * err)
> {
> - int tmp, goal = 0;
> + u32 tmp, goal = 0;
> u32 * p;
> struct buffer_head * result;
> int blocks = inode->i_sb->s_blocksize / 512;
> @@ -311,6 +319,13 @@
> p = (u32 *) bh->b_data + nr;
> repeat:
> tmp = le32_to_cpu(*p);
> + if (tmp > le32_to_cpu(EXT2_SB(inode->i_sb)->s_es->s_blocks_count)) {
> + ext2_warning(inode->i_sb, __FUNCTION__,
> + "inode #%u has bad indirect block[%u][%d] = %u\n",
> + inode->i_ino, bh->b_blocknr, nr, tmp);
> + *err = -EIO;
> + return NULL;
> + }
> if (tmp) {
> result = getblk (bh->b_dev, tmp, blocksize);
> if (tmp == le32_to_cpu(*p)) {