2003-01-06 23:27:10

by Chris Wood

Subject: 2.4.20, .text.lock.swap cpu usage? (ibm x440)

Due to kswapd problems in Redhat's 2.4.9 kernel, I have had to upgrade
to the 2.4.20 kernel with the IBM Summit Patches for our IBM x440. It
has run very well with one exception: between 8:00am and 9:00am our
server will see a spike in system CPU usage (visible in top) and slow
down to the point where people can't access the server.

See the following jpg of top as an example of the system usage. It
doesn't seem to be any one program.

http://www.wencor.com/slow2.4.20.jpg

When we start to have users log off the server (we have 300 telnet users
that login) the system usually bounces right back to normal. We have
had to reboot once or twice to get it fully working again (lpd went into
limbo and wouldn't come back). After the server bounces back to normal,
we can run the rest of the day without any trouble and under full heavy
load. I have never seen it happen at any other time of day and it
doesn't happen every day.

With some tips from James Cleverdon (IBM), I turned on some kernel
debugging and got the following from readprofile when the server was
having problems (truncated to the first 22 lines):
16480 total 0.0138
6383 .text.lock.swap 110.0517
4689 .text.lock.vmscan 28.2470
4486 shrink_cache 4.6729
168 rw_swap_page_base 0.6176
124 prune_icache 0.5167
81 statm_pgd_range 0.1534
51 .text.lock.inode 0.0966
38 system_call 0.6786
31 .text.lock.tty_io 0.0951
31 .text.lock.locks 0.1435
18 .text.lock.sched 0.0373
16 _stext 0.2000
15 fput 0.0586
11 .text.lock.read_write 0.0924
9 strnicmp 0.0703
9 do_wp_page 0.0110
9 do_page_fault 0.0066
9 .text.lock.namei 0.0073
9 .text.lock.fcntl 0.0714
8 sys_read 0.0294

Here is a snapshot when the server is fine, no problems (truncated):
1715833 total 1.4317
1677712 default_idle 26214.2500
4355 system_call 77.7679
2654 file_read_actor 11.0583
2159 bounce_end_io_read 5.8668
1752 put_filp 18.2500
1664 do_page_fault 1.2137
1294 fget 20.2188
1246 do_wp_page 1.5270
1233 fput 4.8164
1138 posix_lock_file 0.7903
1120 kmem_cache_alloc 3.6842
1098 do_softirq 4.9018
1042 statm_pgd_range 1.9735
882 kfree 6.1250
732 __loop_delay 15.2500
673 flush_tlb_mm 6.0089
610 fcntl_setlk64 1.3616
554 __kill_fasync 4.9464
498 zap_page_range 0.4716
414 do_generic_file_read 0.3696
409 __free_pages 8.5208
401 sys_semop 0.3530

I have to admit that most of this doesn't make a lot of sense to me and
I don't know what the .text.lock.* processes are doing. Any ideas?
Anything I can try?

Chris Wood
Wencor West, Inc.

-----------------------------------
System Info From Here Down:
IBM x440 - Dual Xeon 1.4ghz MP, with Hyperthreading turned on
6 gig RAM
2 internal 36gig drives mirrored
1 additional intel e1000 network card
2 IBM fibre adapters (QLA2300s) connected to a FastT700 SAN
RedHat Advanced Server 2.1
2.4.20 kernel built using the RH 2.4.9e8summit .config file as template

These things are listed below (hopefully this isn't overkill):
x440:/proc$ cat modules (see results below)
x440:/proc$ cat scsi/scsi (see results below)
x440:/proc$ cat cpuinfo (see results below)
x440:/proc$ cat ioports (see results below)
x440:/proc$ cat iomem (see results below)

x440:/proc$ cat modules
autofs 11876 0 (autoclean) (unused)
e1000 59280 1
bcm5700 95076 1
ipchains 50728 28
usb-uhci 26724 0 (unused)
usbcore 76448 1 [usb-uhci]
ext3 69888 7
jbd 51808 7 [ext3]
qla2300 236608 2
ips 45184 6
aic7xxx 133376 0
sd_mod 13020 16
scsi_mod 121304 4 [qla2300 ips aic7xxx sd_mod]

x440:/proc$ cat scsi/scsi
Attached devices:
Host: scsi2 Channel: 00 Id: 00 Lun: 00
Vendor: IBM Model: SERVERAID Rev: 1.00
Type: Direct-Access ANSI SCSI revision: 02
Host: scsi2 Channel: 00 Id: 15 Lun: 00
Vendor: IBM Model: SERVERAID Rev: 1.00
Type: Processor ANSI SCSI revision: 02
Host: scsi2 Channel: 01 Id: 09 Lun: 00
Vendor: IBM Model: GNHv1 S2 Rev: 0
Type: Processor ANSI SCSI revision: 02
Host: scsi3 Channel: 00 Id: 00 Lun: 00
Vendor: IBM Model: 1742 Rev: 0520
Type: Direct-Access ANSI SCSI revision: 03


x440:/proc$ cat cpuinfo
processor : 0
vendor_id : GenuineIntel
cpu family : 15
model : 1
model name : Intel(R) Xeon(TM) CPU 1.40GHz
stepping : 1
cpu MHz : 1397.190
cache size : 256 KB
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 2
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge
mca cmov
pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm
bogomips : 2785.28

processor : 1
vendor_id : GenuineIntel
cpu family : 15
model : 1
model name : Intel(R) Xeon(TM) CPU 1.40GHz
stepping : 1
cpu MHz : 1397.190
cache size : 256 KB
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 2
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge
mca cmov
pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm
bogomips : 2791.83

processor : 2
vendor_id : GenuineIntel
cpu family : 15
model : 1
model name : Intel(R) Xeon(TM) CPU 1.40GHz
stepping : 1
cpu MHz : 1397.190
cache size : 256 KB
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 2
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge
mca cmov
pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm
bogomips : 2791.83

processor : 3
vendor_id : GenuineIntel
cpu family : 15
model : 1
model name : Intel(R) Xeon(TM) CPU 1.40GHz
stepping : 1
cpu MHz : 1397.190
cache size : 256 KB
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 2
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge
mca cmov
pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm
bogomips : 2791.83

x440:/proc$ cat ioports
0000-001f : dma1
0020-003f : pic1
0040-005f : timer
0060-006f : keyboard
0070-007f : rtc
0080-008f : dma page reg
00a0-00bf : pic2
00c0-00df : dma2
00f0-00ff : fpu
01f0-01f7 : ide0
03c0-03df : vga+
03f6-03f6 : ide0
0440-044f : VIA Technologies, Inc. VT82C686 [Apollo Super ACPI]
0700-070f : VIA Technologies, Inc. VT82C586B PIPC Bus Master IDE
0700-0707 : ide0
0708-070f : ide1
0cf8-0cff : PCI conf1
1800-187f : PCI device 1014:010f (IBM)
1880-189f : VIA Technologies, Inc. USB
1880-189f : usb-uhci
18a0-18bf : VIA Technologies, Inc. USB (#2)
18a0-18bf : usb-uhci
2000-20ff : Adaptec AIC-7899P U160/m
2100-21ff : Adaptec AIC-7899P U160/m (#2)
2800-28ff : QLogic Corp. QLA2300 64-bit FC-AL Adapter
2800-28fe : qla2300
4000-40ff : QLogic Corp. QLA2300 64-bit FC-AL Adapter (#2)
4000-40fe : qla2300
7000-701f : Intel Corp. 82544EI Gigabit Ethernet Controller
7000-701f : e1000

x440:/proc$ cat iomem
00000000-0009c7ff : System RAM
0009c800-0009ffff : reserved
000a0000-000bffff : Video RAM area
000c0000-000c7fff : Video ROM
000c8000-000cffff : Extension ROM
000d0000-000d01ff : Extension ROM
000f0000-000fffff : System ROM
00100000-dffb707f : System RAM
00100000-0022b615 : Kernel code
0022b616-002a525f : Kernel data
dffb7080-dffbf7ff : ACPI Tables
dffbf800-dfffffff : reserved
e0000000-e7ffffff : S3 Inc. Savage 4
e8400000-e8401fff : IBM Netfinity ServeRAID controller
e8400000-e8401fff : ips
f0c20000-f0c3ffff : Intel Corp. 82544EI Gigabit Ethernet Controller
f0c20000-f0c3ffff : e1000
f0c40000-f0c5ffff : Intel Corp. 82544EI Gigabit Ethernet Controller
f0c40000-f0c5ffff : e1000
f1000000-f11fffff : PCI device 1014:010f (IBM)
f1200000-f127ffff : S3 Inc. Savage 4
f1600000-f160ffff : Broadcom Corporation NetXtreme BCM5700 Gigabit Ethernet
f1600000-f160ffff : bcm5700
f1610000-f1610fff : Adaptec AIC-7899P U160/m
f1610000-f1610fff : aic7xxx
f1611000-f1611fff : Adaptec AIC-7899P U160/m (#2)
f1611000-f1611fff : aic7xxx
f1820000-f1820fff : QLogic Corp. QLA2300 64-bit FC-AL Adapter
f1920000-f1920fff : QLogic Corp. QLA2300 64-bit FC-AL Adapter (#2)
fec00000-ffffffff : reserved



2003-01-06 23:44:07

by Andrew Morton

Subject: Re: 2.4.20, .text.lock.swap cpu usage? (ibm x440)

Chris Wood wrote:
>
> Due to kswapd problems in Redhat's 2.4.9 kernel, I have had to upgrade
> to the 2.4.20 kernel with the IBM Summit Patches for our IBM x440.
> ...
> 16480 total 0.0138
> 6383 .text.lock.swap 110.0517
> 4689 .text.lock.vmscan 28.2470
> 4486 shrink_cache 4.6729
> 168 rw_swap_page_base 0.6176
> 124 prune_icache 0.5167

With six gigs of memory, it looks like the VM has gone nuts
trying to locate some reclaimable lowmem.

Suggest you send the contents of /proc/meminfo and /proc/slabinfo,
captured during a period of misbehaviour.
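
Since that window is unpredictable and the box may be nearly unresponsive
when it hits, one option is to leave a small snapshotter running so the
numbers are on disk afterwards. A minimal sketch, assuming /var/tmp is a
reasonable place to drop the copies (the paths and interval are
illustrative only, not part of the request above):

#include <stdio.h>
#include <time.h>
#include <unistd.h>

/* Periodically copy /proc/meminfo and /proc/slabinfo to timestamped
 * files so snapshots from the bad period survive even if the machine
 * ends up being rebooted. */
static void snapshot(const char *src, const char *tag, long stamp)
{
        char buf[4096], out[128];
        size_t n;
        FILE *in, *dst;

        snprintf(out, sizeof(out), "/var/tmp/%s.%ld", tag, stamp);
        in = fopen(src, "r");
        dst = fopen(out, "w");
        if (in && dst)
                while ((n = fread(buf, 1, sizeof(buf), in)) > 0)
                        fwrite(buf, 1, n, dst);
        if (in)
                fclose(in);
        if (dst)
                fclose(dst);
}

int main(void)
{
        for (;;) {
                long now = (long)time(NULL);

                snapshot("/proc/meminfo", "meminfo", now);
                snapshot("/proc/slabinfo", "slabinfo", now);
                sleep(300);     /* every five minutes */
        }
        return 0;
}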

Then please apply
http://www.kernel.org/pub/linux/kernel/people/andrea/kernels/v2.4/2.4.20aa1.bz2
and send a report on the outcome.

2003-01-06 23:41:49

by William Lee Irwin III

Subject: Re: 2.4.20, .text.lock.swap cpu usage? (ibm x440)

On Mon, Jan 06, 2003 at 04:35:17PM -0700, Chris Wood wrote:
> 6383 .text.lock.swap 110.0517
> 4689 .text.lock.vmscan 28.2470
> 4486 shrink_cache 4.6729
> 168 rw_swap_page_base 0.6176
> 124 prune_icache 0.5167
> 81 statm_pgd_range 0.1534
> 51 .text.lock.inode 0.0966
> 38 system_call 0.6786
> 31 .text.lock.tty_io 0.0951
> 31 .text.lock.locks 0.1435
> 18 .text.lock.sched 0.0373
> 16 _stext 0.2000
> 15 fput 0.0586
> 11 .text.lock.read_write 0.0924
> 9 strnicmp 0.0703
> 9 do_wp_page 0.0110
> 9 do_page_fault 0.0066
> 9 .text.lock.namei 0.0073
> 9 .text.lock.fcntl 0.0714
> 8 sys_read 0.0294

This is really bad lock contention. You may need 2.5.x.


Bill

2003-01-09 02:12:47

by James Cleverdon

Subject: Re: 2.4.20, .text.lock.swap cpu usage? (ibm x440)

On Monday 06 January 2003 03:35 pm, Chris Wood wrote:
> Due to kswapd problems in Redhat's 2.4.9 kernel, I have had to upgrade
> to the 2.4.20 kernel with the IBM Summit Patches for our IBM x440. It
> has run very well with one exception: between 8:00am and 9:00am our
> server will see a spike in system CPU usage (visible in top) and slow
> down to the point where people can't access the server.
>
> See the following jpg of top as an example of the system usage. It
> doesn't seem to be any one program.
>
> http://www.wencor.com/slow2.4.20.jpg
>
> When we start to have users log off the server (we have 300 telnet users
> that login) the system usually bounces right back to normal. We have
> had to reboot once or twice to get it fully working again (lpd went into
> limbo and wouldn't come back). After the server bounces back to normal,
> we can run the rest of the day without any trouble and under full heavy
> load. I have never seen it happen at any other time of day and it
> doesn't happen every day.
>
> With some tips from James Cleverdon (IBM), I turned on some kernel
> debugging and got the following from readprofile when the server was
> having problems (truncated to the first 22 lines):
> 16480 total 0.0138
> 6383 .text.lock.swap 110.0517
> 4689 .text.lock.vmscan 28.2470
> 4486 shrink_cache 4.6729
> 168 rw_swap_page_base 0.6176
> 124 prune_icache 0.5167
> 81 statm_pgd_range 0.1534
> 51 .text.lock.inode 0.0966
> 38 system_call 0.6786
> 31 .text.lock.tty_io 0.0951
> 31 .text.lock.locks 0.1435
> 18 .text.lock.sched 0.0373
> 16 _stext 0.2000
> 15 fput 0.0586
> 11 .text.lock.read_write 0.0924
> 9 strnicmp 0.0703
> 9 do_wp_page 0.0110
> 9 do_page_fault 0.0066
> 9 .text.lock.namei 0.0073
> 9 .text.lock.fcntl 0.0714
> 8 sys_read 0.0294
>
> Here is a snapshot when the server is fine, no problems (truncated):
> 1715833 total 1.4317
> 1677712 default_idle 26214.2500
> 4355 system_call 77.7679
> 2654 file_read_actor 11.0583
> 2159 bounce_end_io_read 5.8668
> 1752 put_filp 18.2500
> 1664 do_page_fault 1.2137
> 1294 fget 20.2188
> 1246 do_wp_page 1.5270
> 1233 fput 4.8164
> 1138 posix_lock_file 0.7903
> 1120 kmem_cache_alloc 3.6842
> 1098 do_softirq 4.9018
> 1042 statm_pgd_range 1.9735
> 882 kfree 6.1250
> 732 __loop_delay 15.2500
> 673 flush_tlb_mm 6.0089
> 610 fcntl_setlk64 1.3616
> 554 __kill_fasync 4.9464
> 498 zap_page_range 0.4716
> 414 do_generic_file_read 0.3696
> 409 __free_pages 8.5208
> 401 sys_semop 0.3530
>
> I have to admit that most of this doesn't make a lot of sense to me and
> I don't know what the .text.lock.* processes are doing. Any ideas?
> Anything I can try?
>
> Chris Wood
> Wencor West, Inc.

Chris,

You're showing all the signs of the "kswapd" bug present in v2.4 kernels.
Well, kswapd gets blamed for the problem. It is actually caused by using up
nearly all of low memory with the buffer header and/or inode slab caches.
(Cat /proc/slabinfo when kswapd is running >= 99% and see if those two caches
have grown extra large.) Anyway, kswapd gets triggered because a zone has
hit its low memory threshold. But kswapd can't swap buffer headers or
inodes. The situation is hopeless, yet kswapd presses on anyway, scouring
every memory zone for pages to free, all the while holding important memory
locks.

Meanwhile, every program that wants more memory will spin on those locks.
That's what the .text.lock.* entries are: the out-of-line spin code for each
lock; it is used when the lock is already owned by some other CPU.
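
(For the curious: on 2.4/i386 the spin_lock() fast path and its busy-wait
loop are split exactly like that, with the spin loop emitted out of line
into a lock section via inline asm, which is where symbols such as
.text.lock.swap come from. The snippet below is a rough, simplified
paraphrase of the idea from the 2.4 i386 spinlock headers, not the
verbatim kernel code, with a stand-in lock type so it builds on its own;
it only assembles on x86.)

#include <stdio.h>

/* Stand-in for the kernel's spinlock_t; 1 means unlocked, as in 2.4. */
typedef struct { volatile unsigned char lock; } demo_spinlock_t;

/* Simplified paraphrase of the 2.4 i386 spin_lock(): the "js 2f" branch
 * jumps to a stub assembled into the out-of-line lock section, so
 * profiler ticks spent busy-waiting are attributed to that section
 * (the real 2.4 code adds a per-object label, e.g. .text.lock.swap,
 * which is what readprofile reports) rather than to the caller. */
static void demo_spin_lock(demo_spinlock_t *lp)
{
        __asm__ __volatile__(
                "\n1:\t"
                "lock ; decb %0\n\t"            /* try to take the lock */
                "js 2f\n"                       /* held by someone else */
                ".section .text.lock,\"ax\"\n"  /* out-of-line spin loop */
                "2:\t"
                "cmpb $0,%0\n\t"
                "rep;nop\n\t"
                "jle 2b\n\t"
                "jmp 1b\n"                      /* looks free: retry */
                ".previous"
                : "=m" (lp->lock) : : "memory");
}

int main(void)
{
        demo_spinlock_t l = { 1 };      /* start unlocked */

        demo_spin_lock(&l);
        printf("lock byte after acquire: %d\n", l.lock);
        return 0;
}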

Net result: a computer that runs like molasses in January.

Of the several proposed patches for this bug, Andrea Arcangeli's and Andrew
Morton's worked best in our tests. I believe that Andrea was going to add in
some of Andrew's code for the final fix. The kernel that is on the SLES 8 /
UL 1.0 gold CDs works fine so I assume the Vulcan Mind Meld on the patches
went well.

Unfortunately, I don't have any references to the final patch set.

> -----------------------------------
> System Info From Here Down:
> IBM x440 - Dual Xeon 1.4ghz MP, with Hyperthreading turned on
> 6 gig RAM
> 2 internal 36gig drives mirrored
> 1 additional intel e1000 network card
> 2 IBM fibre adapters (QLA2300s) connected to a FastT700 SAN
> RedHat Advanced Server 2.1
> 2.4.20 kernel built using the RH 2.4.9e8summit .config file as template
>
[ Snip! ]

Our customers have seen this on large Dell boxes too. I strongly suspect that
any v2.4 system with lots of physical memory and high I/O bandwidth can
trigger this bug.


--
James Cleverdon
IBM xSeries Linux Solutions
{jamesclv(Unix, preferred), cleverdj(Notes)} at us dot ibm dot com


2003-01-09 02:49:11

by William Lee Irwin III

Subject: Re: 2.4.20, .text.lock.swap cpu usage? (ibm x440)

On Mon, Jan 06, 2003 at 04:35:17PM -0700, Chris Wood wrote:
> With some tips from James Cleverdon (IBM), I turned on some kernel
> debugging and got the following from readprofile when the server was
> having problems (truncated to the first 22 lines):
> 16480 total 0.0138

Here are some monitoring tools that might help detect the cause of
the situation.

bloatmon is the "back end"; there's no reason to run it directly.

bloatmeter shows the "least utilized" slabs.

bloatmost shows the largest slabs.

Together these make a sort of top(1) for "lowmem pressure". Not everything
is accounted there, though; the missing pieces are largely the ones below.
(A rough stand-alone sketch of the general idea behind the scripts follows
the list.)

(1) simultaneous temporary poll table allocations
(2) pmd's
(3) kernel stacks
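
(The attached scripts themselves are not reproduced here. Purely to
illustrate the idea -- per-cache lowmem usage and utilization computed
from the /proc/slabinfo columns, which can then be sorted into a
"largest" or "least utilized" view -- a stand-alone sketch might look
like this:)

#include <stdio.h>

/* Walk /proc/slabinfo (the 1.1 format shown elsewhere in this thread)
 * and print each cache's used and total size plus utilization, so the
 * output can be sorted to find the largest or least-utilized slabs.
 * This only illustrates the idea; it is not the attached scripts. */
int main(void)
{
        char line[256], name[64];
        unsigned long active, total, objsize;
        FILE *f = fopen("/proc/slabinfo", "r");

        if (!f) {
                perror("/proc/slabinfo");
                return 1;
        }
        while (fgets(line, sizeof(line), f)) {
                if (sscanf(line, "%63s %lu %lu %lu",
                           name, &active, &total, &objsize) != 4)
                        continue;       /* version line or malformed */
                if (total == 0)
                        continue;
                printf("%-20s %8luKB of %8luKB  %5.1f%%\n",
                       name, active * objsize / 1024,
                       total * objsize / 1024,
                       100.0 * active / total);
        }
        fclose(f);
        return 0;
}

Sorting the output on the total-size or utilization column approximates
the "largest" and "least utilized" views respectively.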

Bill


Attachments:
(No filename) (760.00 B) - brief message
bloatmost (110.00 B)
bloatmeter (133.00 B)
bloatmon (413.00 B)

2003-01-09 17:24:53

by Chris Wood

Subject: Re: 2.4.20, .text.lock.swap cpu usage? (ibm x440)

Andrew Morton wrote:
> Chris Wood wrote:
>
>>Due to kswapd problems in Redhat's 2.4.9 kernel, I have had to upgrade
>>to the 2.4.20 kernel with the IBM Summit Patches for our IBM x440.
>>...
>>16480 total 0.0138
>> 6383 .text.lock.swap 110.0517
>> 4689 .text.lock.vmscan 28.2470
>> 4486 shrink_cache 4.6729
>> 168 rw_swap_page_base 0.6176
>> 124 prune_icache 0.5167
>
>
> With six gigs of memory, it looks like the VM has gone nuts
> trying to locate some reclaimable lowmem.
>
> Suggest you send the contents of /proc/meminfo and /proc/slabinfo,
> captured during a period of misbehaviour.

The server ran fine for 3 days, so it took a bit to get this info.

Is there a list of which patches I can apply if I don't want to apply
the entire 2.4.20aa1? I'm nervous about breaking other things, but may
give it a try anyway.

Thanks for the help!

Here is a /proc/meminfo when it is running fine:

total: used: free: shared: buffers: cached:
Mem: 6356955136 6035910656 321044480 0 206626816 5301600256
Swap: 2146529280 41652224 2104877056
MemTotal: 6207964 kB
MemFree: 313520 kB
MemShared: 0 kB
Buffers: 201784 kB
Cached: 5171716 kB
SwapCached: 5628 kB
Active: 3667492 kB
Inactive: 1912544 kB
HighTotal: 5373660 kB
HighFree: 203952 kB
LowTotal: 834304 kB
LowFree: 109568 kB
SwapTotal: 2096220 kB
SwapFree: 2055544 kB

Here is a /proc/meminfo when it is having problems:

total: used: free: shared: buffers: cached:
Mem: 6356955136 6337114112 19841024 0 369520640 5160353792
Swap: 2146529280 96501760 2050027520
MemTotal: 6207964 kB
MemFree: 19376 kB
MemShared: 0 kB
Buffers: 360860 kB
Cached: 5023300 kB
SwapCached: 16108 kB
Active: 2551264 kB
Inactive: 3291804 kB
HighTotal: 5373660 kB
HighFree: 15404 kB
LowTotal: 834304 kB
LowFree: 3972 kB
SwapTotal: 2096220 kB
SwapFree: 2001980 kB
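
(The telling line in the two dumps above is LowFree: about 107MB of free
lowmem in the good case versus under 4MB while the box is struggling. A
trivial watcher for just that value, with a purely illustrative
threshold, could look like this:)

#include <stdio.h>
#include <unistd.h>

/* Poll /proc/meminfo and complain when LowFree drops below a chosen
 * threshold -- the regime in which the 2.4 VM starts scanning hard for
 * reclaimable lowmem.  16MB is an arbitrary example value. */
int main(void)
{
        char line[128];
        unsigned long lowfree_kb;

        for (;;) {
                FILE *f = fopen("/proc/meminfo", "r");

                if (!f)
                        return 1;
                while (fgets(line, sizeof(line), f))
                        if (sscanf(line, "LowFree: %lu kB", &lowfree_kb) == 1 &&
                            lowfree_kb < 16384)
                                printf("warning: LowFree down to %lu kB\n",
                                       lowfree_kb);
                fclose(f);
                sleep(60);
        }
        return 0;
}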

Here is a /proc/slabinfo when it is fine:

slabinfo - version: 1.1 (SMP)
kmem_cache 64 64 244 4 4 1 : 252 126
ip_fib_hash 14 224 32 2 2 1 : 252 126
ip_conntrack 0 0 384 0 0 1 : 124 62
urb_priv 0 0 64 0 0 1 : 252 126
journal_head 1141 5929 48 33 77 1 : 252 126
revoke_table 7 250 12 1 1 1 : 252 126
revoke_record 448 448 32 4 4 1 : 252 126
clip_arp_cache 0 0 128 0 0 1 : 252 126
ip_mrt_cache 0 0 128 0 0 1 : 252 126
tcp_tw_bucket 384 510 128 13 17 1 : 252 126
tcp_bind_bucket 442 1008 32 9 9 1 : 252 126
tcp_open_request 570 570 128 19 19 1 : 252 126
inet_peer_cache 232 232 64 4 4 1 : 252 126
ip_dst_cache 807 1185 256 79 79 1 : 252 126
arp_cache 354 480 128 16 16 1 : 252 126
blkdev_requests 768 810 128 27 27 1 : 252 126
dnotify_cache 500 664 20 4 4 1 : 252 126
file_lock_cache 1157 2120 96 53 53 1 : 252 126
fasync_cache 565 600 16 3 3 1 : 252 126
uid_cache 419 448 32 4 4 1 : 252 126
skbuff_head_cache 780 1410 256 65 94 1 : 252 126
sock 426 1671 1280 288 557 1 : 60 30
sigqueue 725 725 132 25 25 1 : 252 126
kiobuf 0 0 64 0 0 1 : 252 126
cdev_cache 703 870 64 15 15 1 : 252 126
bdev_cache 9 116 64 2 2 1 : 252 126
mnt_cache 18 116 64 2 2 1 : 252 126
inode_cache 50995 50995 512 7285 7285 1 : 124 62
dentry_cache 71760 71760 128 2392 2392 1 : 252 126
dquot 0 0 128 0 0 1 : 252 126
filp 52314 52380 128 1746 1746 1 : 252 126
names_cache 28 28 4096 28 28 1 : 60 30
buffer_head 1342242 1486740 128 49558 49558 1 : 252 126
mm_struct 701 2355 256 155 157 1 : 252 126
vm_area_struct 11887 58530 128 1793 1951 1 : 252 126
fs_cache 831 2378 64 41 41 1 : 252 126
files_cache 597 2184 512 246 312 1 : 124 62
signal_act 501 2112 1408 168 192 4 : 60 30
pae_pgd 699 2378 64 41 41 1 : 252 126
size-131072(DMA) 0 0 131072 0 0 32 : 0 0
size-131072 0 0 131072 0 0 32 : 0 0
size-65536(DMA) 0 0 65536 0 0 16 : 0 0
size-65536 0 0 65536 0 0 16 : 0 0
size-32768(DMA) 0 0 32768 0 0 8 : 0 0
size-32768 1 5 32768 1 5 8 : 0 0
size-16384(DMA) 0 0 16384 0 0 4 : 0 0
size-16384 5 12 16384 5 12 4 : 0 0
size-8192(DMA) 0 0 8192 0 0 2 : 0 0
size-8192 5 7 8192 5 7 2 : 0 0
size-4096(DMA) 0 0 4096 0 0 1 : 60 30
size-4096 437 1127 4096 437 1127 1 : 60 30
size-2048(DMA) 0 0 2048 0 0 1 : 60 30
size-2048 314 434 2048 170 217 1 : 60 30
size-1024(DMA) 0 0 1024 0 0 1 : 124 62
size-1024 567 1464 1024 240 366 1 : 124 62
size-512(DMA) 0 0 512 0 0 1 : 124 62
size-512 906 968 512 120 121 1 : 124 62
size-256(DMA) 0 0 256 0 0 1 : 252 126
size-256 8724 8850 256 583 590 1 : 252 126
size-128(DMA) 2 60 128 2 2 1 : 252 126
size-128 3198 3450 128 114 115 1 : 252 126
size-64(DMA) 0 0 128 0 0 1 : 252 126
size-64 3486 4050 128 135 135 1 : 252 126
size-32(DMA) 34 116 64 2 2 1 : 252 126
size-32 22446 22446 64 387 387 1 : 252 126

Here is a /proc/slabinfo when it is having problems:

slabinfo - version: 1.1 (SMP)
kmem_cache 64 64 244 4 4 1 : 252 126
ip_fib_hash 14 224 32 2 2 1 : 252 126
ip_conntrack 0 0 384 0 0 1 : 124 62
urb_priv 0 0 64 0 0 1 : 252 126
journal_head 1660 3773 48 49 49 1 : 252 126
revoke_table 7 250 12 1 1 1 : 252 126
revoke_record 0 0 32 0 0 1 : 252 126
clip_arp_cache 0 0 128 0 0 1 : 252 126
ip_mrt_cache 0 0 128 0 0 1 : 252 126
tcp_tw_bucket 148 150 128 5 5 1 : 252 126
tcp_bind_bucket 696 896 32 8 8 1 : 252 126
tcp_open_request 120 120 128 4 4 1 : 252 126
inet_peer_cache 107 232 64 4 4 1 : 252 126
ip_dst_cache 960 960 256 64 64 1 : 252 126
arp_cache 232 360 128 12 12 1 : 252 126
blkdev_requests 768 810 128 27 27 1 : 252 126
dnotify_cache 238 332 20 2 2 1 : 252 126
file_lock_cache 1776 2040 96 51 51 1 : 252 126
fasync_cache 273 400 16 2 2 1 : 252 126
uid_cache 501 560 32 5 5 1 : 252 126
skbuff_head_cache 685 1020 256 68 68 1 : 252 126
sock 1095 1095 1280 365 365 1 : 60 30
sigqueue 203 203 132 7 7 1 : 252 126
kiobuf 0 0 64 0 0 1 : 252 126
cdev_cache 725 754 64 13 13 1 : 252 126
bdev_cache 9 116 64 2 2 1 : 252 126
mnt_cache 18 116 64 2 2 1 : 252 126
inode_cache 13808 20755 512 2965 2965 1 : 124 62
dentry_cache 5976 14070 128 469 469 1 : 252 126
dquot 0 0 128 0 0 1 : 252 126
filp 52314 52380 128 1746 1746 1 : 252 126
names_cache 8 8 4096 8 8 1 : 60 30
buffer_head 1335952 1470150 128 49005 49005 1 : 252 126
mm_struct 1620 1620 256 108 108 1 : 252 126
vm_area_struct 39180 39180 128 1306 1306 1 : 252 126
fs_cache 1815 1972 64 34 34 1 : 252 126
files_cache 1477 1477 512 211 211 1 : 124 62
signal_act 1430 1430 1408 130 130 4 : 60 30
pae_pgd 1798 1798 64 31 31 1 : 252 126
size-131072(DMA) 0 0 131072 0 0 32 : 0 0
size-131072 0 0 131072 0 0 32 : 0 0
size-65536(DMA) 0 0 65536 0 0 16 : 0 0
size-65536 0 0 65536 0 0 16 : 0 0
size-32768(DMA) 0 0 32768 0 0 8 : 0 0
size-32768 1 1 32768 1 1 8 : 0 0
size-16384(DMA) 0 0 16384 0 0 4 : 0 0
size-16384 5 5 16384 5 5 4 : 0 0
size-8192(DMA) 0 0 8192 0 0 2 : 0 0
size-8192 5 5 8192 5 5 2 : 0 0
size-4096(DMA) 0 0 4096 0 0 1 : 60 30
size-4096 981 1011 4096 981 1011 1 : 60 30
size-2048(DMA) 0 0 2048 0 0 1 : 60 30
size-2048 312 342 2048 167 171 1 : 60 30
size-1024(DMA) 0 0 1024 0 0 1 : 124 62
size-1024 1080 1080 1024 270 270 1 : 124 62
size-512(DMA) 0 0 512 0 0 1 : 124 62
size-512 832 832 512 104 104 1 : 124 62
size-256(DMA) 0 0 256 0 0 1 : 252 126
size-256 8550 8550 256 570 570 1 : 252 126
size-128(DMA) 2 60 128 2 2 1 : 252 126
size-128 2850 2850 128 95 95 1 : 252 126
size-64(DMA) 0 0 128 0 0 1 : 252 126
size-64 2591 4200 128 140 140 1 : 252 126
size-32(DMA) 34 116 64 2 2 1 : 252 126
size-32 2536 7134 64 123 123 1 : 252 126

>
> Then please apply
> http://www.kernel.org/pub/linux/kernel/people/andrea/kernels/v2.4/2.4.20aa1.bz2
> and send a report on the outcome.



2003-01-09 20:09:52

by Andrew Morton

Subject: Re: 2.4.20, .text.lock.swap cpu usage? (ibm x440)

Chris Wood wrote:
>
> ..
> The server ran fine for 3 days, so it took a bit to get this info.

Is appreciated, thanks.

> Is there a list of which patches I can apply if I don't want to apply
> the entire 2.4.20aa1? I'm nervous about breaking other things, but may
> give it a try anyway.

http://www.kernel.org/pub/linux/kernel/people/andrea/kernels/v2.4/2.4.20aa1/05_vm_16_active_free_zone_bhs-1
http://www.kernel.org/pub/linux/kernel/people/andrea/kernels/v2.4/2.4.20aa1/10_inode-highmem-2

The former is the most important and, alas, has dependencies on
earlier patches.

hm, OK. I've pulled all Andrea's VM changes and the inode-highmem fix
into a standalone diff. I'll beat on that a bit tonight before unleashing
it.

> Thanks for the help!
>
> Here is a /proc/meminfo when it is running fine:

These numbers are a little odd. You seem to have only lost 200M of
lowmem to buffer_heads. Bill, what's your take on this?

Maybe we're looking at the wrong thing. Are any of your applications
using mlock(), mlockall(), etc?

2003-01-10 00:17:17

by William Lee Irwin III

Subject: Re: 2.4.20, .text.lock.swap cpu usage? (ibm x440)

On Thu, Jan 09, 2003 at 12:18:27PM -0800, Andrew Morton wrote:
> These numbers are a little odd. You seem to have only lost 200M of
> lowmem to buffer_heads. Bill, what's your take on this?

He's really low on lowmem. It's <= 16GB or so, so mem_map is
near-irrelevant, say 60MB.

My interpretation of the numbers is as follows, where pae_pmd and
kernel_stack are both guessed from pae_pgd:

buffer_head: 166994KB 183768KB 90.87
pae_pmd: 21576KB 21576KB 100.0
kernel_stack: 14384KB 14384KB 100.0
inode_cache: 6904KB 10377KB 66.52
filp: 6539KB 6547KB 99.87
vm_area_struct: 4897KB 4897KB 100.0
size-4096: 3924KB 4044KB 97.3
size-256: 2137KB 2137KB 100.0
signal_act: 1966KB 1966KB 100.0
dentry_cache: 747KB 1758KB 42.47
sock: 1368KB 1368KB 100.0
size-1024: 1080KB 1080KB 100.0
files_cache: 738KB 738KB 100.0
size-2048: 624KB 684KB 91.22
size-64: 323KB 525KB 61.69
size-32: 158KB 445KB 35.54
size-512: 416KB 416KB 100.0
mm_struct: 405KB 405KB 100.0
size-128: 356KB 356KB 100.0
skbuff_head_cache: 171KB 255KB 67.15
ip_dst_cache: 240KB 240KB 100.0
file_lock_cache: 166KB 191KB 87.5
journal_head: 77KB 176KB 43.99
fs_cache: 113KB 123KB 92.3
pae_pgd: 112KB 112KB 100.0
blkdev_requests: 96KB 101KB 94.81
size-16384: 80KB 80KB 100.0
cdev_cache: 45KB 47KB 96.15
arp_cache: 29KB 45KB 64.44
size-8192: 40KB 40KB 100.0
names_cache: 32KB 32KB 100.0
size-32768: 32KB 32KB 100.0
tcp_bind_bucket: 21KB 28KB 77.67
sigqueue: 26KB 26KB 100.0
tcp_tw_bucket: 18KB 18KB 100.0
uid_cache: 15KB 17KB 89.46
tcp_open_request: 15KB 15KB 100.0
kmem_cache: 15KB 15KB 100.0
inet_peer_cache: 6KB 14KB 46.12
size-128(DMA): 0KB 7KB 3.33
size-32(DMA): 2KB 7KB 29.31
ip_fib_hash: 0KB 7KB 6.25
bdev_cache: 0KB 7KB 7.75
mnt_cache: 1KB 7KB 15.51
dnotify_cache: 4KB 6KB 71.68
fasync_cache: 4KB 6KB 68.25
revoke_table: 0KB 2KB 2.80

== grand total of 253.015MB, fragmentation included.
+ 60MB mem_map
== grand total of 313MB or so
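
(To make the table reproducible: each row appears to be active_objs*objsize
and num_objs*objsize from the slabinfo columns, and the two guessed rows
work out to 3 pmd pages plus one 8KB kernel stack per pae_pgd. A quick
check against a few rows of the "having problems" slabinfo posted above:)

#include <stdio.h>

/* Re-derive a few rows of the lowmem table from the slabinfo columns
 * (active_objs, num_objs, objsize).  pae_pmd and kernel_stack are not
 * real slabs; they are estimated from the pae_pgd count, assuming 3 pmd
 * pages and an 8KB stack per process, which reproduces the 21576KB and
 * 14384KB figures above. */
int main(void)
{
        unsigned long bh_active = 1335952, bh_total = 1470150, bh_size = 128;
        unsigned long pae_pgd = 1798;   /* roughly one per process */

        printf("buffer_head: %luKB %luKB %.2f\n",
               bh_active * bh_size / 1024, bh_total * bh_size / 1024,
               100.0 * bh_active / bh_total);
        printf("pae_pmd (guess):      %luKB\n", pae_pgd * 3 * 4096 / 1024);
        printf("kernel_stack (guess): %luKB\n", pae_pgd * 2 * 4096 / 1024);
        return 0;
}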


Either pollwait tables (invisible in 2.4 and 2.5), kernel stacks of
threads (which don't get pae_pgd's and are hence invisible in 2.4
and 2.5), or pagecache, with a much higher likelihood of pagecache.

Or there might be dark matter in the universe, and he's being bitten by
unaccounted !__GFP_HIGHMEM allocations, e.g. stock 2.4.x pagetables,
which aren't predictable from pae_pgd etc.; highpte of any flavor (aa or
otherwise) should fix that. But there's no way to guess, as there's zero
2.4.x PTE accounting or even any hints from this report, like average
RSS and VSZ (which are still underestimates, as 2.4.x pagetables are
leaked over the lifetime of the process vs. 2.5.x's reap-on-munmap()).


Bill

2003-01-10 00:34:42

by Brian Tinsley

Subject: Re: 2.4.20, .text.lock.swap cpu usage? (ibm x440)

>Either pollwait tables (invisible in 2.4 and 2.5), kernel stacks of
>threads (which don't get pae_pgd's and are hence invisible in 2.4
>and 2.5), or pagecache, with a much higher likelihood of pagecache.
>
The "kernel stacks of threads" may have some bearing on my incarnation
of this problem. We have several heavily threaded Java applications
running at the time the live-locks occur. At our most problematic site,
one application has a bug that can cause hundreds of timer threads (I
mean like 800 or so!) to be "accidentally" created. This site is
scheduled for an upgrade either tonight or tomorrow, so I will leave the
system as it is and see if I can still cause the live-lock to manifest
itself after the upgrade.

--

-[========================]-
-[ Brian Tinsley ]-
-[ Chief Systems Engineer ]-
-[ Emageon ]-
-[========================]-



2003-01-10 00:46:43

by William Lee Irwin III

Subject: Re: 2.4.20, .text.lock.swap cpu usage? (ibm x440)

At some point in the past, I wrote:
>> Either pollwait tables (invisible in 2.4 and 2.5), kernel stacks of
>> threads (which don't get pae_pgd's and are hence invisible in 2.4
>> and 2.5), or pagecache, with a much higher likelihood of pagecache.

On Thu, Jan 09, 2003 at 06:44:10PM -0600, Brian Tinsley wrote:
> The "kernel stacks of threads" may have some bearing on my incarnation
> of this problem. We have several heavily threaded Java applications
> running at the time the live-locks occur. At our most problematic site,
> one application has a bug that can cause hundreds of timer threads (I
> mean like 800 or so!) to be "accidentally" created. This site is
> scheduled for an upgrade either tonight or tomorrow, so I will leave the
> system as it is and see if I can still cause the live-lock to manifest
> itself after the upgrade.

There is no extant implementation of paged stacks yet. I'm working on
a different problem (mem_map on 64GB on 2.5.x). I probably won't have
time to implement it in the near future, I probably won't be doing it
vs. 2.4.x, and I won't have to if someone else does it first.
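
(Rough context for why mem_map at 64GB is a project of its own, using the
~60MB-for-6GB figure quoted earlier in the thread as the yardstick, i.e.
roughly 40 bytes of struct page per 4KB page; that per-page size is an
assumption here, not something stated in the thread:)

#include <stdio.h>

/* Back-of-the-envelope: how much permanently-allocated lowmem mem_map
 * itself consumes.  The 40 bytes per struct page is inferred from the
 * ~60MB estimate given for this 6GB machine and is only approximate. */
int main(void)
{
        unsigned long long ram_gb[] = { 6, 16, 64 };
        unsigned long long per_page = 40;       /* assumed struct page size */
        int i;

        for (i = 0; i < 3; i++) {
                unsigned long long pages = (ram_gb[i] << 30) >> 12;  /* 4KB pages */

                printf("%2lluGB RAM -> %8llu pages, mem_map ~%lluMB of lowmem\n",
                       ram_gb[i], pages, (pages * per_page) >> 20);
        }
        return 0;
}

With only ~900MB of directly mapped lowmem on a stock i386 split, the
64GB case eats most of it before anything else is allocated, hence the
separate effort.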


Bill

2003-01-10 03:08:25

by Brian Tinsley

Subject: Re: 2.4.20, .text.lock.swap cpu usage? (ibm x440)

>>>At some point in the past, I wrote:
>>> Either pollwait tables (invisible in 2.4 and 2.5), kernel stacks of
>>> threads (which don't get pae_pgd's and are hence invisible in 2.4
>>> and 2.5), or pagecache, with a much higher likelihood of pagecache.

>>On Thu, Jan 09, 2003 at 06:44:10PM -0600, Brian Tinsley wrote:
>> The "kernel stacks of threads" may have some bearing on my incarnation
>> of this problem. We have several heavily threaded Java applications
>> running at the time the live-locks occur. At our most problematic site,
>> one application has a bug that can cause hundreds of timer threads (I
>> mean like 800 or so!) to be "accidentally" created. This site is
>> scheduled for an upgrade either tonight or tomorrow, so I will leave the
>> system as it is and see if I can still cause the live-lock to manifest
>> itself after the upgrade.

>There is no extant implementation of paged stacks yet.

For the most part, this is probably a boundary condition, right? Anyone
that intentionally has 800+ threads in a single application probably
needs to reevaluate their design :)

>I'm working on a different problem (mem_map on 64GB on 2.5.x). I probably
>won't have time to implement it in the near future, I probably won't be
>doing it vs. 2.4.x, and I won't have to if someone else does it first.

Is that a hint to someone in particular?



--

-[========================]-
-[ Brian Tinsley ]-
-[ Chief Systems Engineer ]-
-[ Emageon ]-
-[========================]-


2003-01-10 03:20:59

by William Lee Irwin III

Subject: Re: 2.4.20, .text.lock.swap cpu usage? (ibm x440)

At some point in the past, I wrote:
>> There is no extant implementation of paged stacks yet.

On Thu, Jan 09, 2003 at 09:17:56PM -0600, Brian Tinsley wrote:
> For the most part, this is probably a boundary condition, right? Anyone
> that intentionally has 800+ threads in a single application probably
> needs to reevaluate their design :)

IMHO multiprogramming is as valid a use for memory as any other. Or
even otherwise, it's not something I care to get in design debates
about, it's just how the things are used.

The only trouble is support for what you're doing is unimplemented.


At some point in the past, I wrote:
>> I'm working on a different problem (mem_map on 64GB on 2.5.x). I
>> probably won't have time to implement it in the near future, I
>> probably won't be doing it vs. 2.4.x, and I won't have to if someone
>> else does it first.

On Thu, Jan 09, 2003 at 09:17:56PM -0600, Brian Tinsley wrote:
> Is that a hint to someone in particular?

Only you, if anyone. My intentions and patchwriting efforts on the 64GB
and highmem multiprogramming fronts are long since public, and publicly
stated to be targeted at 2.7. Since there isn't a 2.7 yet, 2.5-CURRENT
must suffice until there is.


Bill

2003-01-10 03:32:36

by Brian Tinsley

Subject: Re: 2.4.20, .text.lock.swap cpu usage? (ibm x440)

William Lee Irwin III wrote:

>At some point in the past, I wrote:
>>>There is no extant implementation of paged stacks yet.
>
>On Thu, Jan 09, 2003 at 09:17:56PM -0600, Brian Tinsley wrote:
>>For the most part, this is probably a boundary condition, right? Anyone
>>that intentionally has 800+ threads in a single application probably
>>needs to reevaluate their design :)
>
>IMHO multiprogramming is as valid a use for memory as any other. Or
>even otherwise, it's not something I care to get in design debates
>about, it's just how the things are used.

I agree with the philosophy in general, but if I sit down to write a
threaded application for Linux on IA-32 and wind up with a design that
uses 800+ threads in any instance (other than a bug, which was our
case), it's time to give up the day job and start riding on the back of
the garbage truck ;)

>The only trouble is support for what you're doing is unimplemented.
>
You mean the 800+ threads or Java on Linux?

>At some point in the past, I wrote:
>>>I'm working on a different problem (mem_map on 64GB on 2.5.x). I
>>>probably won't have time to implement it in the near future, I
>>>probably won't be doing it vs. 2.4.x, and I won't have to if someone
>>>else does it first.
>
>On Thu, Jan 09, 2003 at 09:17:56PM -0600, Brian Tinsley wrote:
>>Is that a hint to someone in particular?
>
>Only you, if anyone. My intentions and patchwriting efforts on the 64GB
>and highmem multiprogramming fronts are long since public, and publicly
>stated to be targeted at 2.7. Since there isn't a 2.7 yet, 2.5-CURRENT
>must suffice until there is.

In all honesty, I would enjoy nothing more than contributing to kernel
development. Unfortunately it's a bit out of my scope right now (but not
forever). If I only believed aliens seeded our gene pool with clones, I
could hook up with those folks that claim to have cloned a human and get
one of me made! ;)


2003-01-10 03:45:34

by William Lee Irwin III

Subject: Re: 2.4.20, .text.lock.swap cpu usage? (ibm x440)

William Lee Irwin III wrote:
>> IMHO multiprogramming is as valid a use for memory as any other. Or
>> even otherwise, it's not something I care to get in design debates
>> about, it's just how the things are used.

On Thu, Jan 09, 2003 at 09:42:06PM -0600, Brian Tinsley wrote:
> I agree with the philosophy in general, but if I sit down to write a
> threaded application for Linux on IA-32 and wind up with a design that
> uses 800+ threads in any instance (other than a bug, which was our
> case), it's time to give up the day job and start riding on the back of
> the garbage truck ;)

I could care less what userspace does: mechanism, not policy. Userspace
wants, and I give if I can, just as the kernel does with system calls.

800 threads isn't even a high thread count anyway, the 2.5.x testing
was with a peak thread count of 100,000. 800 threads, even with an 8KB
stack, is no more than 6.4MB of lowmem for stacks and so shouldn't
stress the system unless many instances of it are run. I suspect your
issue is elsewhere. I'll submit accounting patches for Marcelo's and/or
Andrea's trees so you can find out what's actually going on.


William Lee Irwin III wrote:
>> Only you, if anyone. My intentions and patchwriting efforts on the 64GB
>> and highmem multiprogramming fronts are long since public, and publicly
>> stated to be targeted at 2.7. Since there isn't a 2.7 yet, 2.5-CURRENT
>> must suffice until there is.

On Thu, Jan 09, 2003 at 09:42:06PM -0600, Brian Tinsley wrote:
> In all honesty, I would enjoy nothing more than contributing to kernel
> development. Unfortunately it's a bit out of my scope right now (but not
> forever). If I only believed aliens seeded our gene pool with clones, I
> could hook up with those folks that claim to have cloned a human and get
> one of me made! ;)

I don't know what to tell you here. I'm lucky that this is my day job
and that I can contribute so much. However, there are plenty who
contribute major changes (many even more important than my own) without
any such sponsorship. Perhaps emulating them would satisfy your wish.


Bill

2003-01-10 03:59:25

by Brian Tinsley

Subject: Re: 2.4.20, .text.lock.swap cpu usage? (ibm x440)

William Lee Irwin III wrote:

>William Lee Irwin III wrote:
>>>IMHO multiprogramming is as valid a use for memory as any other. Or
>>>even otherwise, it's not something I care to get in design debates
>>>about, it's just how the things are used.
>
>On Thu, Jan 09, 2003 at 09:42:06PM -0600, Brian Tinsley wrote:
>>I agree with the philosophy in general, but if I sit down to write a
>>threaded application for Linux on IA-32 and wind up with a design that
>>uses 800+ threads in any instance (other than a bug, which was our
>>case), it's time to give up the day job and start riding on the back of
>>the garbage truck ;)
>
>I could care less what userspace does: mechanism, not policy. Userspace
>wants, and I give if I can, just as the kernel does with system calls.
>
>800 threads isn't even a high thread count anyway, the 2.5.x testing
>was with a peak thread count of 100,000. 800 threads, even with an 8KB
>stack, is no more than 6.4MB of lowmem for stacks and so shouldn't
>stress the system unless many instances of it are run.

I understand your perspective here. I won't get into application design
issues as it is far out of context from this list.

>I suspect your issue is elsewhere. I'll submit accounting patches for Marcelo's and/or Andrea's trees so you can find out what's actually going on.
>
Much appreciated! I look forward to it.


>On Thu, Jan 09, 2003 at 09:42:06PM -0600, Brian Tinsley wrote:
>>In all honesty, I would enjoy nothing more than contributing to kernel
>>development. Unfortunately it's a bit out of my scope right now (but not
>>forever). If I only believed aliens seeded our gene pool with clones, I
>>could hook up with those folks that claim to have cloned a human and get
>>one of me made! ;)
>
>I don't know what to tell you here. I'm lucky that this is my day job
>and that I can contribute so much. However, there are plenty who
>contribute major changes (many even more important than my own) without
>any such sponsorship. Perhaps emulating them would satisfy your wish.

It would!

I cannot say thanks enough for the efforts of you and everyone else out
there. Frankly, I would not have my day job and would not have been able
to make Emageon what it is today were it not for you all!

Oh, please excuse the stupid humor tonight. I'm in a giddy mood for some
reason. Must be the excitement from the prospect of getting resolution
to this problem!


2003-01-10 04:13:58

by William Lee Irwin III

Subject: Re: 2.4.20, .text.lock.swap cpu usage? (ibm x440)

William Lee Irwin III wrote:
>> I don't know what to tell you here. I'm lucky that this is my day job
>> and that I can contribute so much. However, there are plenty who
>> contribute major changes (many even more important than my own) without
>> any such sponsorship. Perhaps emulating them would satisfy your wish.

On Thu, Jan 09, 2003 at 10:08:55PM -0600, Brian Tinsley wrote:
> It would!
> I cannot say thanks enough for the efforts of you and everyone else out
> there. Frankly, I would not have my day job and would not have been able
> to make Emageon what it is today were it not for you all!
> Oh, please excuse the stupid humor tonight. I'm in a giddy mood for some
> reason. Must be the excitement from the prospect of getting resolution
> to this problem!

We're straying from the subject here. Please describe your machine,
in terms of how many cpus it has and how much highmem it has, and
your workload, so I can better determine the issue. Perhaps we can
cooperatively devise something that works well for you.

Or perhaps the kernel version is not up-to-date. Please also provide
the precise kernel version (and included patches). And workload too.


Thanks,
Bill

2003-01-10 04:40:32

by Brian Tinsley

Subject: Re: 2.4.20, .text.lock.swap cpu usage? (ibm x440)

>We're straying from the subject here.
>
Sorry

>Please describe your machine,
>in terms of how many cpus it has and how much highmem it has, and
>your workload, so I can better determine the issue. Perhaps we can
>cooperatively devise something that works well for you.
>
IBM x360
Pentium 4 Xeon MP processors

2 processor system has 4GB RAM
4 processor system has 8GB RAM

1 IBM ServeRAID controller
2 Intel PRO/1000MT NICs
2 QLogic 2340 Fibre Channel HBAs

>Or perhaps the kernel version is not up-to-date. Please also provide
>the precise kernel version (and included patches). And workload too.
>
The kernel version is stock 2.4.20 with Chris Mason's data logging and
journal relocation patches for ReiserFS (neither of which are actually
in use for any mounted filesystems). It is compiled for 64GB highmem
support. And just to refresh, I have seen this exact behavior on stock
2.4.19 and stock 2.4.17 (no patches on either of these) also compiled
with 64GB highmem support.

Workload:
When the live-lock occurs, the system is performing intensive network
I/O and intensive disk reads from the fibre channel storage (i.e., the
backup program is reading files from disk and transferring them to the
backup server). I posted a snapshot of sar data collection earlier today
showing selected stats leading up to and just after the live-lock occurs
(which is noted by a ~2 minute gap in sar logging). After the live-lock
is released, the only thing that stands out is an unusual increase in
runtime for kswapd (as reported by ps).

The various Java programs mentioned in prior postings are *mostly* idle
at this point in time as it is after hours for our clients.


2003-01-10 05:08:32

by Martin J. Bligh

Subject: Re: 2.4.20, .text.lock.swap cpu usage? (ibm x440)

> IBM x360
> Pentium 4 Xeon MP processors
>
> 2 processor system has 4GB RAM
> 4 processor system has 8GB RAM
>
> 1 IBM ServeRAID controller
> 2 Intel PRO/1000MT NICs
> 2 QLogic 2340 Fibre Channel HBAs
>
>> Or perhaps the kernel version is not up-to-date. Please also provide
>> the precise kernel version (and included patches). And workload too.
>>
> The kernel version is stock 2.4.20 with Chris Mason's data logging and journal relocation patches for ReiserFS (neither of which are actually in use for any mounted filesystems). It is compiled for 64GB highmem support. And just to refresh, I have seen this exact behavior on stock 2.4.19 and stock 2.4.17 (no patches on either of these) also compiled with 64GB highmem support.
>
> Workload:
> When the live-lock occurs, the system is performing intensive network I/O and intensive disk reads from the fibre channel storage (i.e., the backup program is reading files from disk and transferring them to the backup server). I posted a snapshot of sar data collection earlier today showing selected stats leading up to and just after the live-lock occurs (which is noted by a ~2 minute gap in sar logging). After the live-lock is released, the only thing that stands out is an unusual increase in runtime for kswapd (as reported by ps).
>
> The various Java programs mentioned in prior postings are *mostly* idle at this point in time as it is after hours for our clients.


If you don't have any individual processes that need to be particularly
large (eg > 1Gb of data), I suggest you just cheat^Wfinesse the problem
and move PAGE_OFFSET from C0000000 to 80000000 - that will give you more
than twice as much lowmem to play with. I think this might even be a
config option in RedHat kernels.
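
(For concreteness, the split lives behind a single constant on i386; a
sketch of the idea only, not a tested patch. Anything else that
hard-codes 0xC0000000, such as the linker script in some trees or vendor
patches, would need the same treatment, and every process is then limited
to roughly 2GB of user address space:)

/* include/asm-i386/page.h (2.4.x), sketch only: a 2GB/2GB user/kernel
 * split so roughly twice as much memory is directly mapped as lowmem. */
#define __PAGE_OFFSET   (0x80000000)    /* stock kernels use 0xC0000000 */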

Martin.

2003-01-10 05:16:00

by William Lee Irwin III

Subject: Re: 2.4.20, .text.lock.swap cpu usage? (ibm x440)

At some point in the past, I wrote:
>> Or perhaps the kernel version is not up-to-date. Please also provide
>> the precise kernel version (and included patches). And workload too.

On Thu, Jan 09, 2003 at 10:50:03PM -0600, Brian Tinsley wrote:
> The kernel version is stock 2.4.20 with Chris Mason's data logging and
> journal relocation patches for ReiserFS (neither of which are actually
> in use for any mounted filesystems). It is compiled for 64GB highmem
> support. And just to refresh, I have seen this exact behavior on stock
> 2.4.19 and stock 2.4.17 (no patches on either of these) also compiled
> with 64GB highmem support.

Okay, can you try with either 2.4.x-aa or 2.5.x-CURRENT?

I'm suspecting either bh problems or lowpte problems.

Also, could you monitor your load with the scripts I posted?


Thanks,
Bill

2003-01-10 05:35:56

by Brian Tinsley

Subject: Re: 2.4.20, .text.lock.swap cpu usage? (ibm x440)

>
>
>Okay, can you try with either 2.4.x-aa or 2.5.x-CURRENT?
>
Yes, I *just* booted a machine with 2.4.20-aa1 in our lab. I was having
problems compiling the Linux Virtual Server code, but it's fixed now.

>I'm suspecting either bh problems or lowpte problems.
>
>Also, could you monitor your load with the scripts I posted?
>
>
Yes, they are already uploaded to a customer site and ready to go. I
need to flex the -aa1 kernel a bit before I load it there as well.


Thanks!


2003-01-10 20:34:34

by Chris Wood

Subject: Re: 2.4.20, .text.lock.swap cpu usage? (ibm x440)

Andrew Morton wrote:
> Chris Wood wrote:
>
>>..
>>The server ran fine for 3 days, so it took a bit to get this info.
>
>
> Is appreciated, thanks.
>
>
>>Is there a list of which patches I can apply if I don't want to apply
>>the entire 2.4.20aa1? I'm nervous about breaking other things, but may
>>give it a try anyway.
>
>
> http://www.kernel.org/pub/linux/kernel/people/andrea/kernels/v2.4/2.4.20aa1/05_vm_16_active_free_zone_bhs-1
> http://www.kernel.org/pub/linux/kernel/people/andrea/kernels/v2.4/2.4.20aa1/10_inode-highmem-2
>
> The former is the most important and, alas, has dependencies on
> earlier patches.
>
> hm, OK. I've pulled all Andrea's VM changes and the inode-highmem fix
> into a standalone diff. I'll beat on that a bit tonight before unleashing
> it.
>

I tried to apply 2.4.20aa1 to my /usr/src/linux and then compile it, but
it failed to compile. I do have the IBM x440 (NUMA) patches applied to
this tree; I don't know if that caused any problems, but I didn't see any
when I applied the patch. I'll attach a snip at the end of this email
just in case it points to something (there was more than this).

>
>>Thanks for the help!
>>
>>Here is a /proc/meminfo when it is running fine:
>
>
> These numbers are a little odd. You seem to have only lost 200M of
> lowmem to buffer_heads. Bill, what's your take on this?
>
> Maybe we're looking at the wrong thing. Are any of your applications
> using mlock(), mlockall(), etc?

I'm not sure; other than our services, our main programs are in Cobol
(iCobol and AcuCobol). I could ask the vendors if that would help.


------- sorry if this is an ugly paste -------

/usr/src/linux-2.4.20/include/asm/pgalloc.h: In function `get_pgd_slow':
/usr/src/linux-2.4.20/include/asm/pgalloc.h:49: `PAGE_OFFSET_RAW' undeclared (first use in this function)
/usr/src/linux-2.4.20/include/asm/pgalloc.h:54: warning: implicit declaration of function `set_64bit'
/usr/src/linux-2.4.20/include/asm/pgalloc.h: In function `free_pgd_slow':
/usr/src/linux-2.4.20/include/asm/pgalloc.h:110: `PAGE_OFFSET_RAW' undeclared (first use in this function)
In file included from /usr/src/linux-2.4.20/include/linux/blkdev.h:11,
                 from /usr/src/linux-2.4.20/include/linux/blk.h:4,
                 from init/main.c:25:
/usr/src/linux-2.4.20/include/asm/io.h: In function `virt_to_phys':
/usr/src/linux-2.4.20/include/asm/io.h:78: `PAGE_OFFSET_RAW' undeclared (first use in this function)
/usr/src/linux-2.4.20/include/asm/io.h:79: warning: control reaches end of non-void function
/usr/src/linux-2.4.20/include/asm/io.h: In function `phys_to_virt':
/usr/src/linux-2.4.20/include/asm/io.h:96: `PAGE_OFFSET_RAW' undeclared (first use in this function)
/usr/src/linux-2.4.20/include/asm/io.h:97: warning: control reaches end of non-void function
/usr/src/linux-2.4.20/include/asm/io.h: In function `isa_check_signature':
/usr/src/linux-2.4.20/include/asm/io.h:280: `PAGE_OFFSET_RAW' undeclared (first use in this function)
init/main.c: In function `start_kernel':
init/main.c:381: `PAGE_OFFSET_RAW' undeclared (first use in this function)
make: *** [init/main.o] Error 1
In file included from eni.c:9:
/usr/src/linux-2.4.20/include/linux/mm.h: In function `pmd_alloc':
/usr/src/linux-2.4.20/include/linux/mm.h:521: `PAGE_OFFSET_RAW' undeclared (first use in this function)
/usr/src/linux-2.4.20/include/linux/mm.h:521: (Each undeclared identifier is reported only once
/usr/src/linux-2.4.20/include/linux/mm.h:521: for each function it appears in.)
/usr/src/linux-2.4.20/include/linux/mm.h:522: warning: control reaches end of non-void function
In file included from /usr/src/linux-2.4.20/include/linux/highmem.h:5,
                 from /usr/src/linux-2.4.20/include/linux/vmalloc.h:8,
                 from /usr/src/linux-2.4.20/include/asm/io.h:47,
                 from /usr/src/linux-2.4.20/include/asm/pci.h:35,
                 from /usr/src/linux-2.4.20/include/linux/pci.h:622,
                 from eni.c:10:
/usr/src/linux-2.4.20/include/asm/pgalloc.h: In function `get_pgd_slow':
/usr/src/linux-2.4.20/include/asm/pgalloc.h:49: `PAGE_OFFSET_RAW' undeclared (first use in this function)
/usr/src/linux-2.4.20/include/asm/pgalloc.h:54: warning: implicit declaration of function `set_64bit'
/usr/src/linux-2.4.20/include/asm/pgalloc.h: In function `free_pgd_slow':
/usr/src/linux-2.4.20/include/asm/pgalloc.h:110: `PAGE_OFFSET_RAW' undeclared (first use in this function)
In file included from /usr/src/linux-2.4.20/include/asm/pci.h:35,
                 from /usr/src/linux-2.4.20/include/linux/pci.h:622,
                 from eni.c:10:
/usr/src/linux-2.4.20/include/asm/io.h: In function `virt_to_phys':
/usr/src/linux-2.4.20/include/asm/io.h:78: `PAGE_OFFSET_RAW' undeclared (first use in this function)
/usr/src/linux-2.4.20/include/asm/io.h:79: warning: control reaches end of non-void function
/usr/src/linux-2.4.20/include/asm/io.h: In function `phys_to_virt':
/usr/src/linux-2.4.20/include/asm/io.h:96: `PAGE_OFFSET_RAW' undeclared (first use in this function)
/usr/src/linux-2.4.20/include/asm/io.h:97: warning: control reaches end of non-void function
/usr/src/linux-2.4.20/include/asm/io.h: In function `isa_check_signature':
/usr/src/linux-2.4.20/include/asm/io.h:280: `PAGE_OFFSET_RAW' undeclared (first use in this function)
make[2]: *** [eni.o] Error 1
make[1]: *** [_modsubdir_atm] Error 2
make: *** [_mod_drivers] Error 2