2004-06-15 15:48:49

by micah anderson

Subject: PROBLEM: 2.6.6 grinds to a halt with moderate I/O


Following the format from REPORTING-BUGS, please see the information below.
Unfortunately I cannot subscribe to the list, but I will follow the thread. I
have searched high and low, read a number of threads somewhat tangential to
this problem, and asked a few times in #kernelnewbies before reaching my
wits' end, so now I am trying here. I would really appreciate any insight, and
will be happy to provide more information or run additional tests.

1. When doing moderate I/O on a 2.6.6 system the machine becomes unusable.

2. With HIGHMEM support compiled into the kernel, a cp -vr /var /usr/tmp
works fine until it gets about halfway through the large ldap.log file
(approximately 500 MB), at which point the system can no longer fork new
processes. An existing shell keeps working, but running top, free, etc.
hangs, and vmstat 1 prints its first line and never continues. I tried many
different kernel configs to isolate the problem. I thought I had it nailed
down by passing apic=off to the kernel at boot, because the large logfile
copy test then passed, but the same problem appeared tonight when rsyncing
maildirs. Early in my tests I suspected dm-crypt, but the problem occurs
even when no encrypted filesystems are involved, and it persisted after I
removed dm-crypt support from the kernel. Disabling HIGHMEM support seems
to make the problem go away.

The machine requires a power cycle to recover. Memory was memtested for over
24 hours. The machine is an HP NetServer lh1000r with a MegaRAID controller, no IDE.
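
The reproduction described in item 2 (sustained file copying while new processes stop starting) could be instrumented roughly as follows. This is only a sketch under stated assumptions: the helper names `time_fork` and `copy_with_probe` are hypothetical and not part of the original report, and on a box that is truly wedged in the kernel the probe itself may hang instead of timing out.

```python
import subprocess
import sys
import time


def time_fork(timeout_s=30.0):
    """Spawn a trivial child process and return the elapsed seconds.

    Returns None if the child does not finish within timeout_s, which
    would resemble the symptom in the report (new processes hang).
    """
    start = time.monotonic()
    try:
        subprocess.run([sys.executable, "-c", "pass"],
                       timeout=timeout_s, check=True)
    except subprocess.TimeoutExpired:
        return None
    return time.monotonic() - start


def copy_with_probe(src, dst, chunk_size=1 << 20, probe_every=64):
    """Copy src to dst in chunks, probing process creation periodically.

    Every probe_every chunks, time a process spawn and record it.
    Returns a list of (bytes_copied, spawn_seconds_or_None) samples.
    """
    samples = []
    copied = 0
    chunks = 0
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        while True:
            buf = fin.read(chunk_size)
            if not buf:
                break
            fout.write(buf)
            copied += len(buf)
            chunks += 1
            if chunks % probe_every == 0:
                samples.append((copied, time_fork()))
    return samples
```

Pointing `copy_with_probe` at a file of several hundred megabytes (such as the ldap.log mentioned above) and watching for rising or missing spawn times would show roughly where in the copy the stall begins.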

3. kernel, i/o

4. Linux version 2.6.6 (root@willow) (gcc version 3.3.3 (Debian 20040422)) #9 SMP Fri Jun 11 17:43:06 PDT 2004

5. No oops available

6. See above for a reproducible test

7. Environment

7.1 Linux willow 2.6.6 #9 SMP Fri Jun 11 17:43:06 PDT 2004 i686 GNU/Linux

Gnu C 3.3.3
Gnu make 3.80
binutils 2.14.90.0.7
util-linux 2.12
mount 2.12
module-init-tools 3.0-pre10
e2fsprogs 1.35
PPP 2.4.2
Linux C Library 2.3.2
Dynamic linker (ldd) 2.3.2
Procps 3.2.1
Net-tools 1.60
Console-tools 0.2.3
Sh-utils 5.0.91

7.2 cat /proc/cpuinfo
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 8
model name : Pentium III (Coppermine)
stepping : 6
cpu MHz : 933.936
cache size : 256 KB
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 2
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov
pat pse36 mmx fxsr sse
bogomips : 1843.20

processor : 1
vendor_id : GenuineIntel
cpu family : 6
model : 8
model name : Pentium III (Coppermine)
stepping : 10
cpu MHz : 933.936
cache size : 256 KB
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 2
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov
pat pse36 mmx fxsr sse
bogomips : 1863.68

7.3 No module support in kernel

7.4 cat /proc/ioports
0000-001f : dma1
0020-0021 : pic1
0040-005f : timer
0060-006f : keyboard
0080-008f : dma page reg
00a0-00a1 : pic2
00c0-00df : dma2
00f0-00ff : fpu
01f0-01f7 : ide0
03c0-03df : vga+
03f6-03f6 : ide0
03f8-03ff : serial
0cf8-0cff : PCI conf1
1400-14ff : 0000:00:07.0
1800-183f : 0000:00:02.0
1800-183f : e100
1840-187f : 0000:00:08.0
1840-187f : e100
1880-188f : 0000:00:0f.1
2000-20ff : 0000:01:05.0
2400-24ff : 0000:01:05.1
3000-3fff : PCI Bus #02
3000-30ff : 0000:02:01.0

cat /proc/iomem
00000000-0009f7ff : System RAM
0009f800-0009ffff : reserved
000a0000-000bffff : Video RAM area
000c0000-000ca7ff : Video ROM
000ca800-000cbfff : Adapter ROM
000cc000-000cd7ff : Adapter ROM
000cd800-000cdfff : Adapter ROM
000ce000-000cfdff : Adapter ROM
000f0000-000fffff : System ROM
00100000-7fffffff : System RAM
00100000-002f88b1 : Kernel code
002f88b2-0038725f : Kernel data
e8001000-e8001fff : 0000:00:02.0
e8001000-e8001fff : e100
e8002000-e8002fff : 0000:00:07.0
e8003000-e8003fff : 0000:00:08.0
e8003000-e8003fff : e100
e8004000-e8004fff : 0000:00:0f.2
e8100000-e81fffff : 0000:00:02.0
e8100000-e81fffff : e100
e8200000-e82fffff : 0000:00:08.0
e8200000-e82fffff : e100
e9000000-e9ffffff : 0000:00:07.0
ea000000-ea001fff : 0000:01:05.0
ea002000-ea003fff : 0000:01:05.1
ea004000-ea0043ff : 0000:01:05.0
ea004400-ea0047ff : 0000:01:05.1
ea100000-ea1fffff : PCI Bus #02
ea100000-ea100fff : 0000:02:01.0
f0000000-f7ffffff : PCI Bus #02
f0000000-f7ffffff : PCI Bus #03
f0000000-f7ffffff : 0000:03:00.0
f0000000-f000007f : megaraid
fec00000-fec0ffff : reserved
fee00000-fee00fff : reserved
fff80000-ffffffff : reserved


7.5 lspci -vvv

0000:00:00.0 Host bridge: ServerWorks CNB20LE Host Bridge (rev 05)
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR+ FastB2B-
Status: Cap- 66MHz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort+ >SERR- <PERR-
Latency: 48, Cache Line Size: 0x08 (32 bytes)

0000:00:00.1 Host bridge: ServerWorks CNB20LE Host Bridge (rev 05)
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR+ FastB2B-
Status: Cap- 66MHz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
Latency: 32, Cache Line Size: 0x08 (32 bytes)

0000:00:02.0 Ethernet controller: Intel Corp. 82557/8/9 [Ethernet Pro 100] (rev 08)
Subsystem: Hewlett-Packard Company NetServer 10/100TX
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV+ VGASnoop- ParErr+ Stepping- SERR+ FastB2B-
Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
Latency: 64 (2000ns min, 14000ns max), Cache Line Size: 0x08 (32 bytes)
Interrupt: pin A routed to IRQ 22
Region 0: Memory at e8001000 (32-bit, non-prefetchable)
Region 1: I/O ports at 1800 [size=64]
Region 2: Memory at e8100000 (32-bit, non-prefetchable) [size=1M]
Capabilities: <available only to root>

0000:00:07.0 VGA compatible controller: ATI Technologies Inc Rage XL (rev 65) (prog-if 00 [VGA])
Subsystem: Hewlett-Packard Company: Unknown device 10e1
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping+ SERR- FastB2B-
Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
Latency: 66 (2000ns min), Cache Line Size: 0x08 (32 bytes)
Region 0: Memory at e9000000 (32-bit, non-prefetchable)
Region 1: I/O ports at 1400 [size=256]
Region 2: Memory at e8002000 (32-bit, non-prefetchable) [size=4K]
Capabilities: <available only to root>

0000:00:08.0 Ethernet controller: Intel Corp. 82557/8/9 [Ethernet Pro 100] (rev 08)
Subsystem: Hewlett-Packard Company NetServer 10/100TX
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV+ VGASnoop- ParErr+ Stepping- SERR+ FastB2B-
Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
Latency: 64 (2000ns min, 14000ns max), Cache Line Size: 0x08 (32 bytes)
Interrupt: pin A routed to IRQ 23
Region 0: Memory at e8003000 (32-bit, non-prefetchable)
Region 1: I/O ports at 1840 [size=64]
Region 2: Memory at e8200000 (32-bit, non-prefetchable) [size=1M]
Capabilities: <available only to root>

0000:00:0f.0 ISA bridge: ServerWorks OSB4 South Bridge (rev 4f)
Subsystem: ServerWorks OSB4 South Bridge
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
Status: Cap- 66MHz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR+ <PERR-
Latency: 0

0000:00:0f.1 IDE interface: ServerWorks OSB4 IDE Controller (prog-if 8a [Master SecP PriP])
Control: I/O+ Mem- BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR+ FastB2B-
Status: Cap- 66MHz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
Latency: 32
Region 4: I/O ports at 1880 [size=16]

0000:00:0f.2 USB Controller: ServerWorks OSB4/CSB5 OHCI USB Controller (rev 04) (prog-if 10 [OHCI])
Subsystem: ServerWorks OSB4/CSB5 OHCI USB Controller
Control: I/O+ Mem+ BusMaster- SpecCycle- MemWINV+ VGASnoop- ParErr+ Stepping- SERR+ FastB2B-
Status: Cap- 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
Interrupt: pin A routed to IRQ 33
Region 0: Memory at e8004000 (32-bit, non-prefetchable)

0000:01:02.0 PCI bridge: Intel Corp. 21154 PCI-to-PCI Bridge (prog-if 00 [Normal decode])
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR+ FastB2B-
Status: Cap+ 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
Latency: 64
Bus: primary=01, secondary=02, subordinate=03, sec-latency=36
I/O behind bridge: 00003000-00003fff
Memory behind bridge: ea100000-ea1fffff
Prefetchable memory behind bridge: 00000000f0000000-00000000f7f00000
Expansion ROM at 00003000 [disabled] [size=4K]
BridgeCtl: Parity- SERR- NoISA+ VGA- MAbort- >Reset- FastB2B-
Capabilities: <available only to root>

0000:01:05.0 SCSI storage controller: LSI Logic / Symbios Logic 53c1010 Ultra3 SCSI Adapter (rev 01)
Subsystem: Hewlett-Packard Company: Unknown device 60b0
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV+ VGASnoop- ParErr+ Stepping- SERR+ FastB2B-
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
Latency: 72 (4250ns min, 4500ns max), Cache Line Size: 0x08 (32 bytes)
Interrupt: pin A routed to IRQ 24
Region 0: I/O ports at 2000
Region 1: Memory at ea004000 (64-bit, non-prefetchable) [size=1K]
Region 3: Memory at ea000000 (64-bit, non-prefetchable) [size=8K]
Capabilities: <available only to root>

0000:01:05.1 SCSI storage controller: LSI Logic / Symbios Logic 53c1010 Ultra3 SCSI Adapter (rev 01)
Subsystem: Hewlett-Packard Company: Unknown device 60b0
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV+ VGASnoop- ParErr+ Stepping- SERR+ FastB2B-
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
Latency: 72 (4250ns min, 4500ns max), Cache Line Size: 0x08 (32 bytes)
Interrupt: pin B routed to IRQ 25
Region 0: I/O ports at 2400
Region 1: Memory at ea004400 (64-bit, non-prefetchable) [size=1K]
Region 3: Memory at ea002000 (64-bit, non-prefetchable) [size=8K]
Capabilities: <available only to root>

0000:02:00.0 PCI bridge: Intel Corp. 21154 PCI-to-PCI Bridge (prog-if 00 [Normal decode])
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR+ FastB2B+
Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
Latency: 64
Bus: primary=02, secondary=03, subordinate=03, sec-latency=36
I/O behind bridge: 0000f000-00000fff
Memory behind bridge: fff00000-000fffff
Prefetchable memory behind bridge: 00000000f0000000-00000000f7f00000
BridgeCtl: Parity- SERR- NoISA+ VGA- MAbort- >Reset- FastB2B-
Capabilities: <available only to root>

0000:02:01.0 SCSI storage controller: QLogic Corp. ISP12160 Dual Channel Ultra3 SCSI Processor (rev 06)
Subsystem: American Megatrends Inc. QLA12160 on AMI MegaRAID
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV+ VGASnoop- ParErr+ Stepping- SERR+ FastB2B-
Status: Cap+ 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
Latency: 64 (16000ns min), Cache Line Size: 0x08 (32 bytes)
Interrupt: pin A routed to IRQ 29
Region 0: I/O ports at 3000
Region 1: Memory at ea100000 (32-bit, non-prefetchable) [size=4K]
Capabilities: <available only to root>

0000:03:00.0 RAID bus controller: American Megatrends Inc. MegaRAID (rev 25)
Subsystem: Hewlett-Packard Company: Unknown device 60e8
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV+ VGASnoop- ParErr- Stepping- SERR+ FastB2B+
Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
Latency: 64, Cache Line Size: 0x08 (32 bytes)
Interrupt: pin A routed to IRQ 16
Region 0: Memory at f0000000 (32-bit, prefetchable)
Capabilities: <available only to root>

7.6 cat /proc/scsi/scsi
Attached devices:
Host: scsi0 Channel: 00 Id: 00 Lun: 00
Vendor: MegaRAID Model: LD 0 RAID5 140G Rev: K
Type: Direct-Access ANSI SCSI revision: 02
Host: scsi0 Channel: 05 Id: 11 Lun: 00
Vendor: SDR Model: GEM318 Rev: 0
Type: Processor ANSI SCSI revision: 02

7.7 The machine is an HP NetServer lh1000r with a MegaRAID controller, no IDE.




2004-06-15 16:01:53

by William Lee Irwin III

Subject: Re: PROBLEM: 2.6.6 grinds to a halt with moderate I/O

On Tue, Jun 15, 2004 at 10:47:45AM -0500, Micah Anderson wrote:
> Following the format from REPORTING-BUGS please see the below information.
> I unfortunately cannot subscribe to the list, but will follow the thread. I
> have searched high and low, read a number of threads somewhat tangential to
> this problem, and asked a few times in #kernelnewbies before I got to my
> wits end and now will try here. I really appreciate any insight anyone has,
> and will be happy to provide more information or additional tests
> 1. When doing moderate I/O on a 2.6.6 system the machine becomes unusable.
> 2. I found that with HIGHMEM support compiled into the kernel, when I
> did a cp -vr /var /usr/tmp it would work fine until it got about
> halfway through the large ldap.log file (approximately 500 megs) when
> the system would no longer be able to fork new processes. Your
> existing shell would function, but if you tried to run top, free, etc.
> it would hang. vmstat 1 would print the first line, but never
> continue. I ran a million different kernel configs to try and isolate
> things, and I thought I had it nailed down with passing apic=off to
> the kernel at boot because the large logfile copy test would
> pass, but when rsyncing maildirs tonight the same problem appeared. Early
> in my tests I thought the problem was dm-crypt, but the problem existed
> even when no encrypted filesystems were involved, and existed when I
> removed dm-crypt support from the kernel. Disabling HIGHMEM support seems
> to make the problem go away.

Thanks for the bugreport. I'm going to file this in the Debian BTS
after I get the FPU fixes out. Could you send along a dmesg
(/var/log/dmesg on Debian) and /proc/meminfo and /proc/cpuinfo at some
point when you can log into the box? I'll also try to reproduce this.

Thanks.


-- wli

2004-06-15 18:21:07

by micah anderson

Subject: Re: PROBLEM: 2.6.6 grinds to a halt with moderate I/O


>Thanks for the bugreport. I'm going to file this in the Debian BTS
>after I get the FPU fixes out. Could you send along a dmesg
>(/var/log/dmesg on Debian) and /proc/meminfo and /proc/cpuinfo at some
>point when you can log into the box? I'll also try to reproduce this.

I am not sure why this would be filed in the Debian BTS. Yes, the
underlying OS is Debian, but this is not a Debian kernel; it is a
vanilla 2.6.6 kernel that I compiled by hand.

Please find the dmesg and the /proc/meminfo below; the
/proc/cpuinfo was already included in the original email.

1. dmesg
Linux version 2.6.6 (root@willow) (gcc version 3.3.3 (Debian 20040422)) #10 SMP Tue Jun 15 09:25:44 PDT 2004
BIOS-provided physical RAM map:
BIOS-e820: 0000000000000000 - 000000000009f800 (usable)
BIOS-e820: 000000000009f800 - 00000000000a0000 (reserved)
BIOS-e820: 00000000000e9400 - 0000000000100000 (reserved)
BIOS-e820: 0000000000100000 - 0000000080000000 (usable)
BIOS-e820: 00000000fec00000 - 00000000fec10000 (reserved)
BIOS-e820: 00000000fee00000 - 00000000fee01000 (reserved)
BIOS-e820: 00000000fff80000 - 0000000100000000 (reserved)
1152MB HIGHMEM available.
896MB LOWMEM available.
found SMP MP-table at 000f7580
On node 0 totalpages: 524288
DMA zone: 4096 pages, LIFO batch:1
Normal zone: 225280 pages, LIFO batch:16
HighMem zone: 294912 pages, LIFO batch:16
DMI 2.3 present.
ACPI: Unable to locate RSDP
Intel MultiProcessor Specification v1.4
Virtual Wire compatibility mode.
OEM ID: HP Product ID: LP 1Kr/2Kr APIC at: 0xFEE00000
Processor #0 6:8 APIC version 17
Processor #3 6:8 APIC version 17
I/O APIC #1 Version 17 at 0xFEC00000.
I/O APIC #2 Version 17 at 0xFEC01000.
Enabling APIC mode: Flat. Using 2 I/O APICs
Processors: 2
Built 1 zonelists
Kernel command line: apic=off root=/dev/sda1 ro
Initializing CPU#0
CPU 0 irqstacks, hard=c03e4000 soft=c03e2000
PID hash table entries: 4096 (order 12: 32768 bytes)
Detected 934.115 MHz processor.
Using tsc for high-res timesource
Console: colour VGA+ 80x25
Memory: 2076156k/2097152k available (1991k kernel code, 19820k reserved, 565k data, 368k init, 1179648k highmem)
Checking if this processor honours the WP bit even in supervisor mode... Ok.
Calibrating delay loop... 1843.20 BogoMIPS
Dentry cache hash table entries: 262144 (order: 8, 1048576 bytes)
Inode-cache hash table entries: 131072 (order: 7, 524288 bytes)
Mount-cache hash table entries: 512 (order: 0, 4096 bytes)
CPU: After generic identify, caps: 0387fbff 00000000 00000000 00000000
CPU: After vendor identify, caps: 0387fbff 00000000 00000000 00000000
CPU: L1 I cache: 16K, L1 D cache: 16K
CPU: L2 cache: 256K
CPU serial number disabled.
CPU: After all inits, caps: 0383fbff 00000000 00000000 00000040
Intel machine check architecture supported.
Intel machine check reporting enabled on CPU#0.
Enabling fast FPU save and restore... done.
Enabling unmasked SIMD FPU exception support... done.
Checking 'hlt' instruction... OK.
POSIX conformance testing by UNIFIX
CPU0: Intel Pentium III (Coppermine) stepping 06
per-CPU timeslice cutoff: 731.34 usecs.
task migration cache decay timeout: 1 msecs.
enabled ExtINT on CPU#0
ESR value before enabling vector: 00000000
ESR value after enabling vector: 00000000
Booting processor 1/3 eip 2000
CPU 1 irqstacks, hard=c03e5000 soft=c03e3000
Initializing CPU#1
masked ExtINT on CPU#1
ESR value before enabling vector: 00000000
ESR value after enabling vector: 00000000
Calibrating delay loop... 1863.68 BogoMIPS
CPU: After generic identify, caps: 0387fbff 00000000 00000000 00000000
CPU: After vendor identify, caps: 0387fbff 00000000 00000000 00000000
CPU: L1 I cache: 16K, L1 D cache: 16K
CPU: L2 cache: 256K
CPU serial number disabled.
CPU: After all inits, caps: 0383fbff 00000000 00000000 00000040
Intel machine check architecture supported.
Intel machine check reporting enabled on CPU#1.
CPU1: Intel Pentium III (Coppermine) stepping 0a
Total of 2 processors activated (3706.88 BogoMIPS).
ENABLING IO-APIC IRQs
Setting 1 in the phys_id_present_map
...changing IO-APIC physical APIC ID to 1 ... ok.
Setting 2 in the phys_id_present_map
...changing IO-APIC physical APIC ID to 2 ... ok.
init IO_APIC IRQs
IO-APIC (apicid-pin) 1-0, 1-5, 1-9, 1-10, 1-11, 1-15, 2-1, 2-2, 2-3, 2-4, 2-5, 2-10, 2-11, 2-12, 2-14, 2-15 not connected.
..TIMER: vector=0x31 pin1=-1 pin2=0
...trying to set up timer (IRQ0) through the 8259A ...
..... (found pin 0) ...works.
number of MP IRQ sources: 18.
number of IO-APIC #1 registers: 16.
number of IO-APIC #2 registers: 16.
testing the IO APIC.......................
IO APIC #1......
.... register #00: 01000000
....... : physical APIC id: 01
....... : Delivery Type: 0
....... : LTS : 0
.... register #01: 000F0011
....... : max redirection entries: 000F
....... : PRQ implemented: 0
....... : IO APIC version: 0011
.... register #02: 00000000
....... : arbitration: 00
.... IRQ redirection table:
NR Log Phy Mask Trig IRR Pol Stat Dest Deli Vect:
00 001 01 0 0 0 0 0 1 1 31
01 001 01 0 0 0 0 0 1 1 39
02 000 00 1 0 0 0 0 0 0 00
03 001 01 0 0 0 0 0 1 1 41
04 001 01 0 0 0 0 0 1 1 49
05 000 00 1 0 0 0 0 0 0 00
06 001 01 0 0 0 0 0 1 1 51
07 001 01 0 0 0 0 0 1 1 59
08 001 01 0 0 0 0 0 1 1 61
09 000 00 1 0 0 0 0 0 0 00
0a 000 00 1 0 0 0 0 0 0 00
0b 000 00 1 0 0 0 0 0 0 00
0c 001 01 0 0 0 0 0 1 1 69
0d 001 01 0 0 0 0 0 1 1 71
0e 001 01 0 0 0 0 0 1 1 79
0f 000 00 1 0 0 0 0 0 0 00
IO APIC #2......
.... register #00: 02000000
....... : physical APIC id: 02
....... : Delivery Type: 0
....... : LTS : 0
.... register #01: 000F0011
....... : max redirection entries: 000F
....... : PRQ implemented: 0
....... : IO APIC version: 0011
.... register #02: 0D000000
....... : arbitration: 0D
.... IRQ redirection table:
NR Log Phy Mask Trig IRR Pol Stat Dest Deli Vect:
00 001 01 1 1 0 1 0 1 1 81
01 000 00 1 0 0 0 0 0 0 00
02 000 00 1 0 0 0 0 0 0 00
03 000 00 1 0 0 0 0 0 0 00
04 000 00 1 0 0 0 0 0 0 00
05 000 00 1 0 0 0 0 0 0 00
06 001 01 1 1 0 1 0 1 1 89
07 001 01 1 1 0 1 0 1 1 91
08 001 01 1 1 0 1 0 1 1 99
09 001 01 1 1 0 1 0 1 1 A1
0a 000 00 1 0 0 0 0 0 0 00
0b 000 00 1 0 0 0 0 0 0 00
0c 000 00 1 0 0 0 0 0 0 00
0d 001 01 1 1 0 1 0 1 1 A9
0e 000 00 1 0 0 0 0 0 0 00
0f 000 00 1 0 0 0 0 0 0 00
IRQ to pin mappings:
IRQ0 -> 0:0
IRQ1 -> 0:1
IRQ2 -> 0:2
IRQ3 -> 0:3
IRQ4 -> 0:4
IRQ6 -> 0:6
IRQ7 -> 0:7
IRQ8 -> 0:8
IRQ12 -> 0:12
IRQ13 -> 0:13
IRQ14 -> 0:14
IRQ16 -> 1:0
IRQ22 -> 1:6
IRQ23 -> 1:7
IRQ24 -> 1:8
IRQ25 -> 1:9
IRQ29 -> 1:13
.................................... done.
Using local APIC timer interrupts.
calibrating APIC timer ...
..... CPU clock speed is 933.0294 MHz.
..... host bus clock speed is 133.0327 MHz.
checking TSC synchronization across 2 CPUs: passed.
Brought up 2 CPUs
NET: Registered protocol family 16
PCI: PCI BIOS revision 2.10 entry at 0xfda11, last bus=3
PCI: Using configuration type 1
mtrr: v2.0 (20020519)
Linux Plug and Play Support v0.97 (c) Adam Belay
SCSI subsystem initialized
PCI: Probing PCI hardware
PCI: Probing PCI hardware (bus 00)
PCI: Discovered peer bus 01
PCI->APIC IRQ transform: (B0,I2,P0) -> 22
PCI->APIC IRQ transform: (B0,I8,P0) -> 23
PCI->APIC IRQ transform: (B0,I15,P0) -> 33
PCI->APIC IRQ transform: (B1,I5,P0) -> 24
PCI->APIC IRQ transform: (B1,I5,P1) -> 25
PCI->APIC IRQ transform: (B2,I1,P0) -> 29
PCI->APIC IRQ transform: (B3,I0,P0) -> 16
Machine check exception polling timer started.
Starting balanced_irq
highmem bounce pool size: 64 pages
Initializing Cryptographic API
isapnp: Scanning for PnP cards...
isapnp: No Plug & Play device found
Serial: 8250/16550 driver $Revision: 1.90 $ 8 ports, IRQ sharing disabled
ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A
Using anticipatory io scheduler
Floppy drive(s): fd0 is 1.44M
FDC 0 is a National Semiconductor PC87306
e100: Intel(R) PRO/100 Network Driver, 3.0.17
e100: Copyright(c) 1999-2004 Intel Corporation
e100: eth0: e100_probe: addr 0xe8001000, irq 22, MAC addr 00:30:6E:05:E9:D0
e100: eth1: e100_probe: addr 0xe8003000, irq 23, MAC addr 00:30:6E:05:E9:D1
Uniform Multi-Platform E-IDE driver Revision: 7.00alpha2
ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx
hda: CD-224E, ATAPI CD/DVD-ROM drive
hdc: IRQ probe failed (0xffffffba)
hdc: IRQ probe failed (0xffffffba)
hdd: IRQ probe failed (0xffffffba)
hdd: IRQ probe failed (0xffffffba)
ide0 at 0x1f0-0x1f7,0x3f6 on irq 14
hda: ATAPI 24X CD-ROM drive, 128kB Cache
Uniform CD-ROM driver Revision: 3.20
megaraid: found 0x101e:0x1960:bus 3:slot 0:func 0
scsi0:Found MegaRAID controller at 0xf8804000, IRQ:16
megaraid: [K01.04:J01.01] detected 1 logical drives.
megaraid: supports extended CDBs.
megaraid: channel[0] is raid.
megaraid: channel[1] is raid.
scsi0 : LSI Logic MegaRAID K01.04 254 commands 16 targs 5 chans 7 luns
scsi0: scanning scsi channel 0 for logical drives.
Vendor: MegaRAID Model: LD 0 RAID5 140G Rev: K
Type: Direct-Access ANSI SCSI revision: 02
scsi0: scanning scsi channel 1 for logical drives.
scsi0: scanning scsi channel 2 for logical drives.
scsi0: scanning scsi channel 4 [P0] for physical devices.
scsi0: scanning scsi channel 5 [P1] for physical devices.
Vendor: SDR Model: GEM318 Rev: 0
Type: Processor ANSI SCSI revision: 02
SCSI device sda: 286744576 512-byte hdwr sectors (146813 MB)
sda: asking for cache data failed
sda: assuming drive cache: write through
sda: sda1 sda2 < sda5 sda6 sda7 sda8 sda9 >
Attached scsi disk sda at scsi0, channel 0, id 0, lun 0
Attached scsi generic sg0 at scsi0, channel 0, id 0, lun 0, type 0
Attached scsi generic sg1 at scsi0, channel 5, id 11, lun 0, type 3
mice: PS/2 mouse device common for all mice
serio: i8042 AUX port at 0x60,0x64 irq 12
serio: i8042 KBD port at 0x60,0x64 irq 1
input: AT Translated Set 2 keyboard on isa0060/serio0
md: linear personality registered as nr 1
md: raid0 personality registered as nr 2
md: raid1 personality registered as nr 3
md: raid5 personality registered as nr 4
raid5: measuring checksumming speed
8regs : 1704.000 MB/sec
8regs_prefetch: 1364.000 MB/sec
32regs : 900.000 MB/sec
32regs_prefetch: 796.000 MB/sec
pIII_sse : 1900.000 MB/sec
pII_mmx : 2340.000 MB/sec
p5_mmx : 2504.000 MB/sec
raid5: using function: pIII_sse (1900.000 MB/sec)
md: md driver 0.90.0 MAX_MD_DEVS=256, MD_SB_DISKS=27
NET: Registered protocol family 2
IP: routing cache hash table of 16384 buckets, 128Kbytes
TCP: Hash tables configured (established 524288 bind 65536)
ip_conntrack version 2.1 (8192 buckets, 65536 max) - 300 bytes per conntrack
ip_tables: (C) 2000-2002 Netfilter core team
ipt_recent v0.3.1: Stephen Frost <[email protected]>. http://snowman.net/projects/ipt_recent/
arp_tables: (C) 2002 David S. Miller
NET: Registered protocol family 1
NET: Registered protocol family 17
md: Autodetecting RAID arrays.
md: autorun ...
md: ... autorun DONE.
EXT3-fs: INFO: recovery required on readonly filesystem.
EXT3-fs: write access will be enabled during recovery.
kjournald starting. Commit interval 5 seconds
EXT3-fs: recovery complete.
EXT3-fs: mounted filesystem with ordered data mode.
VFS: Mounted root (ext3 filesystem) readonly.
Freeing unused kernel memory: 368k freed
Adding 979920k swap on /dev/sda6. Priority:-1 extents:1
EXT3 FS on sda1, internal journal
kjournald starting. Commit interval 5 seconds
EXT3 FS on sda5, internal journal
EXT3-fs: mounted filesystem with ordered data mode.
kjournald starting. Commit interval 5 seconds
EXT3 FS on sda7, internal journal
EXT3-fs: mounted filesystem with ordered data mode.
kjournald starting. Commit interval 5 seconds
EXT3 FS on sda8, internal journal
EXT3-fs: mounted filesystem with ordered data mode.
e100: eth0: e100_watchdog: link up, 100Mbps, full-duplex
e100: eth1: e100_watchdog: link up, 100Mbps, full-duplex
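
The zone page counts near the top of this dmesg can be cross-checked against the "1152MB HIGHMEM" and "896MB LOWMEM" lines with a little arithmetic. The sketch below assumes 4 KiB pages (the usual i386 page size, not stated in the log itself); the variable names are illustrative only.

```python
PAGE_KB = 4  # assumption: 4 KiB pages, the usual i386 page size

# Page counts copied from the "On node 0 totalpages" section of the dmesg.
ZONES = {"DMA": 4096, "Normal": 225280, "HighMem": 294912}


def pages_to_mb(pages, page_kb=PAGE_KB):
    """Convert a page count to whole megabytes."""
    return pages * page_kb // 1024


# Low memory is the DMA and Normal zones together; high memory is HighMem.
lowmem_mb = pages_to_mb(ZONES["DMA"] + ZONES["Normal"])
highmem_mb = pages_to_mb(ZONES["HighMem"])
total_pages = sum(ZONES.values())

print("LOWMEM :", lowmem_mb, "MB")
print("HIGHMEM:", highmem_mb, "MB")
print("total  :", total_pages, "pages =", pages_to_mb(total_pages), "MB")
```

Under that page-size assumption the three zones add up to the reported totalpages value, so more than half of this machine's RAM sits in the HighMem zone.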


2. /proc/meminfo
MemTotal: 2077312 kB
MemFree: 1583540 kB
Buffers: 27804 kB
Cached: 374892 kB
SwapCached: 0 kB
Active: 65300 kB
Inactive: 375884 kB
HighTotal: 1179648 kB
HighFree: 765056 kB
LowTotal: 897664 kB
LowFree: 818484 kB
SwapTotal: 979920 kB
SwapFree: 979920 kB
Dirty: 64 kB
Writeback: 0 kB
Mapped: 51084 kB
Slab: 41584 kB
Committed_AS: 184572 kB
PageTables: 1032 kB
VmallocTotal: 114680 kB
VmallocUsed: 788 kB
VmallocChunk: 113892 kB
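
Since disabling HIGHMEM appears to hide the problem, it may be worth comparing the low and high memory counters over time. The following is a minimal sketch that parses meminfo-style text and splits usage into low and high memory; `parse_meminfo` and `low_high_usage` are hypothetical helper names, the sample is a subset of the snapshot above, and whether low memory is actually the bottleneck here is an assumption, not something the thread establishes.

```python
# A subset of the /proc/meminfo snapshot above (values in kB).
MEMINFO_SAMPLE = """\
MemTotal: 2077312 kB
MemFree: 1583540 kB
HighTotal: 1179648 kB
HighFree: 765056 kB
LowTotal: 897664 kB
LowFree: 818484 kB
"""


def parse_meminfo(text):
    """Parse 'Name: value kB' lines into a {name: int_kB} dictionary."""
    fields = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            fields[name.strip()] = int(parts[0])
    return fields


def low_high_usage(fields):
    """Return used kB in low and high memory, plus low usage as a fraction."""
    low_used = fields["LowTotal"] - fields["LowFree"]
    high_used = fields["HighTotal"] - fields["HighFree"]
    return {
        "low_used_kb": low_used,
        "high_used_kb": high_used,
        "low_used_frac": low_used / fields["LowTotal"],
    }


fields = parse_meminfo(MEMINFO_SAMPLE)
usage = low_high_usage(fields)
print(usage)
```

On a live system the same functions could be fed the contents of /proc/meminfo once per second during the copy test, for as long as new processes can still be started.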







2004-06-15 18:32:10

by William Lee Irwin III

Subject: Re: PROBLEM: 2.6.6 grinds to a halt with moderate I/O

At some point in the past, I wrote:
>> Thanks for the bugreport. I'm going to file this in the Debian BTS
>> after I get the FPU fixes out. Could you send along a dmesg
>> (/var/log/dmesg on Debian) and /proc/meminfo and /proc/cpuinfo at some
>> point when you can log into the box? I'll also try to reproduce this.

On Tue, Jun 15, 2004 at 01:19:08PM -0500, Micah Anderson wrote:
> I am not sure why this would be filed in the Debian BTS, yes the
> underlying OS is Debian, but this is not a Debian Kernel, it is a
> vanilla 2.6.6 kernel that I compiled by hand.

The Debian kernel team is moving Debian's 2.6 kernel as close to mainline
as is possible within policy guidelines, so the report should hopefully be
applicable to it within the next 24 hours.

On Tue, Jun 15, 2004 at 01:19:08PM -0500, Micah Anderson wrote:
> Please find attached the dmesg and the /proc/meminfo, the
> /proc/cpuinfo was already included in the original email.

Okay, thanks. I'll do some testing of copying large files shortly.


-- wli

2004-06-16 07:43:28

by Philippe Gramoullé

Subject: Re: PROBLEM: 2.6.6 grinds to a halt with moderate I/O


Hello Micah,

Could you give more information on the MegaRAID part of your setup?

Is the I/O being done on a disk/partition controlled by the MegaRAID?

If not, does the same behavior occur on a plain SCSI disk?

If so, what RAID level do you use, and how many disks?

Could you also give the MegaRAID firmware information, as well as the logical volume settings for
read, write and cache policy?

Also, is this a regression from previous kernels, such as 2.6.5 or earlier?

I've been using 2.6.3-mm3 for weeks now on Dell hardware with a MegaRAID controller
doing intensive I/O around the clock without any problems. The box has been rock solid.

Thanks,

Philippe

On Tue, 15 Jun 2004 10:47:45 -0500
Micah Anderson <[email protected]> wrote:

|
| Following the format from REPORTING-BUGS please see the below information.
| I unfortunately cannot subscribe to the list, but will follow the thread. I
| have searched high and low, read a number of threads somewhat tangential to
| this problem, and asked a few times in #kernelnewbies before I got to my
| wits end and now will try here. I really appreciate any insight anyone has,
| and will be happy to provide more information or additional tests
| [snip]