Hello there,
For the past few days I've been setting up a fairly big diskless network
based on Linux. I chose 2.4.19 as the kernel because of some hardware
requirements, and on most of the newer boxes it runs fine. However,
three of the older boxes have shown some pretty odd performance and
stability issues. This message is about the latest one, an ASUS P5S-B
(with the infamous SiS 530 chipset) and an Intel eepro100 card. Details:
Host bridge: Silicon Integrated Systems [SiS] 530 Host (rev 2).
Ethernet controller: Intel Corp. 82557/8/9 [Ethernet Pro 100] (rev 8).
VGA compatible controller: Silicon Integrated Systems [SiS] 6306
3D-AGP (rev a2)
We've been using NFS for the diskless boxes, of course, and on this
particular box the CPU usage for everything is so much higher it's
amazing. It's so slow you can feel the repaints happening when running
X, or even when listing directories with ls -lR. Here is a summary of
bonnie runs on the root (NFS) partition:
1.02b ------Sequential Output------ --Sequential Input- --Random-
-Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP /sec %CP
can (v2) 300M 4054 57 5121 2 3549 3 5942 91 10195 3 442.2 4
can (v3) 300M 5965 82 7861 3 2739 2 5437 82 9316 3 675.5 6
can (v3+) 300M 4721 66 2654 1 641 0 5454 74 5017 0 690.2 7
min (v3+) 300M 3170 95 1526 2 1118 3 2913 89 4061 2 474.9 21
tri (v3+) 300M 2708 96 5997 65 2806 76 2673 90 6064 73 351.9 64
------Sequential Create------ --------Random Create--------
-Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
files /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP
can (v2) 16 44 0 4316 19 80 0 46 0 4684 8 68 0
can (v3) 16 2104 9 5060 16 2102 8 2149 6 5789 10 1833 6
can (v3+) 16 1666 7 4390 38 2103 8 1735 9 8859 17 2143 8
min (v3+) 16 1037 19 2467 45 1133 17 959 17 4645 26 1120 12
tri (v3+) 16 1066 35 1879 66 1125 28 978 30 4187 43 1334 32
Legend: hosts are listed on the left; can is a K7-900, while min (minas) and tri are K6-500s.
v2 indicates mounted with NFS version 2
v3 indicates mounted with NFS version 3
v3+ indicates NFS version 3 with Trond's nfs-all patch applied
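For reference, the option spelling behind these labels, written as hypothetical /etc/fstab lines (the server name, export path, and mount point are all placeholders; on the actual diskless boxes the root is mounted at boot time, not from fstab):

```
# "v2" rows: NFS version 2
server:/export/data  /mnt/nfs  nfs  nfsvers=2,rw  0 0
# "v3" and "v3+" rows: NFS version 3
# (v3+ is the same mount on a kernel with Trond's nfs-all patch)
server:/export/data  /mnt/nfs  nfs  nfsvers=3,rw  0 0
```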
On tri, which is the SiS 530 box referred to above, CPU usage on most
runs is, as you can see, much higher than on minas, which has
practically the same setup: K6-500, old PCI (no AGP) board, eepro100
card. Has anybody seen something like this before?
The server is a K7 with two 3c905-TXM cards; it serves all the other
boxes with no problems reported (beyond some lockd quirks I'm still
trying to get my head around).
I've tried applying Trond's patches, using pci=biosirq, and other
tweaks, but nothing has really helped. Any idea what's going on?
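(For the record, pci=biosirq is a kernel command-line parameter; a hypothetical lilo.conf fragment, with the image path and label invented purely for illustration:

```
image=/boot/vmlinuz-2.4.19
    label=linux
    read-only
    append="pci=biosirq"
```
)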
I'm crossposting to reach the appropriate parties, as it feels like
quite a cross-subject issue.
Take care,
--
Christian Reis, Senior Engineer, Async Open Source, Brazil.
http://async.com.br/~kiko/ | [+55 16] 261 2331 | NMFL
>>>>> " " == Christian Reis <[email protected]> writes:
> Hello there,
> I've been, for the past days, setting up a fairly big diskless
> network based on Linux. I've chosen to use 2.4.19 as the kernel
> because there were some hardware requirements, and for most of
> the newer boxes, it runs fine. However, for three of the older
> boxes, we have had some pretty odd performance and stability
> issues. This message is about the latest one, which is an ASUS
> P5S-B (has the infamous SIS 530 chipset) on an intel eepro100
> card. Details:
Is all this NFS over UDP? If so, the numbers should not really have
changed in 2.4.19 (yes, my patchset changes things, but stock 2.4.19
should not be too different w.r.t. 2.4.18).
Are you able to determine where in the 2.4.19-pre series the
performance dies?
Cheers,
Trond
On Wed, Aug 14, 2002 at 03:13:55AM +0200, Trond Myklebust wrote:
> >>>>> " " == Christian Reis <[email protected]> writes:
>
> > Hello there,
>
> > I've been, for the past days, setting up a fairly big diskless
> > network based on Linux. I've chosen to use 2.4.19 as the kernel
> > because there were some hardware requirements, and for most of
> > the newer boxes, it runs fine. However, for three of the older
> > boxes, we have had some pretty odd performance and stability
> > issues. This message is about the latest one, which is an ASUS
> > P5S-B (has the infamous SIS 530 chipset) on an intel eepro100
> > card. Details:
>
> Is all this NFS over UDP? If so, numbers should not really have
> changed in 2.4.19 ( - yes my patchset changes things, but stock 2.4.19
> should not be too different w.r.t 2.4.18)
>
> Are you able to determine where in the 2.4.19-pre series the
> performance dies?
Yes, it is over UDP. (Should I try TCP?)
Well, to be honest, I've only just set the network up, and I've tried
only two kernels: stock 2.4.19 and 2.4.19 with nfs-all. I haven't
experimented with swapping kernels because I've been a bit single-minded
about it being something in the hardware setup.
I can try an older kernel to see if it helps. Is 2.4.18 a good choice?
Let me try it and I'll post back.
(BTW: your patches *do* solve a problem I had: they make client-side NFS
locking actually work; before them I had serious issues with locking
under high network load. Not anymore. The flock() patch is also
essential for running sendmail on the diskless stations; before it I
was forced to use tmpfs for /var/spool/mqueue.)
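A quick way to see whether flock-style locks work on a given mount is the flock(1) wrapper from util-linux. This is only a sketch: it assumes flock(1) is installed (it may be newer than the systems in question), and lock.test is just a placeholder path on the NFS mount:

```shell
# Try to take an exclusive, non-blocking BSD flock on a file.
# Run with the current directory on the NFS mount to test that mount.
touch lock.test
if flock -n lock.test -c 'echo got exclusive flock'; then
    echo "flock works here"
else
    echo "flock failed (lock held or unsupported)"
fi
rm -f lock.test
```

If the lock can't be granted (or the filesystem doesn't support it), the command fails instead of hanging, which is what makes it usable as a smoke test.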
Take care,
--
Christian Reis, Senior Engineer, Async Open Source, Brazil.
http://async.com.br/~kiko/ | [+55 16] 261 2331 | NMFL
On 13 August 2002 22:29, Christian Reis wrote:
> On tri, which is the referred SIS530 box, as you can see, for most runs
> the CPU usage is just so much higher than minas, which has practically
> the same setup: K6-500, old PCI (no AGP) board, eepro100 card. I'm
> wondering if anybody has seen something like this before?
Start swapping hardware between these two boxes
--
vda
On Thu, Aug 15, 2002 at 11:41:20AM -0200, Denis Vlasenko wrote:
> On 13 August 2002 22:29, Christian Reis wrote:
> > On tri, which is the referred SIS530 box, as you can see, for most runs
> > the CPU usage is just so much higher than minas, which has practically
> > the same setup: K6-500, old PCI (no AGP) board, eepro100 card. I'm
> > wondering if anybody has seen something like this before?
>
> Start swapping hardware between these two boxes
I've done that, actually. I've swapped the processor, memory, and video
card (which are the only components these boxes actually contain) and
there was no change. It's very strange: block reads and block writes
consume enormous amounts of CPU no matter how much I tinker. I've tried
2.4.19, 2.2.21, the NFS patches, etc.
There could be a hidden BIOS option, but I can't figure out what it
would be. Both caches are enabled. PCI interrupts are fine. It has to be
a motherboard support problem; under Windows the box runs fine. :-(
Take care,
--
Christian Reis, Senior Engineer, Async Open Source, Brazil.
http://async.com.br/~kiko/ | [+55 16] 261 2331 | NMFL
On Wed, Aug 14, 2002 at 03:13:55AM +0200, Trond Myklebust wrote:
> >>>>> " " == Christian Reis <[email protected]> writes:
>
> > Hello there,
>
> > I've been, for the past days, setting up a fairly big diskless
> > network based on Linux. I've chosen to use 2.4.19 as the kernel
> > because there were some hardware requirements, and for most of
> > the newer boxes, it runs fine. However, for three of the older
> > boxes, we have had some pretty odd performance and stability
> > issues. This message is about the latest one, which is an ASUS
> > P5S-B (has the infamous SIS 530 chipset) on an intel eepro100
> > card. Details:
>
> Is all this NFS over UDP? If so, numbers should not really have
> changed in 2.4.19 ( - yes my patchset changes things, but stock 2.4.19
> should not be too different w.r.t 2.4.18)
Trond, I've been looking at this a bit more. I've tried 2.4.18 and
2.2.21 and nothing changes; it always looks bad. If I watch the top
output during
time dd if=/dev/zero of=TESTFILE count=3000 bs=100k
rpciod sits at around 60% CPU and there is 0% idle. A typical vmstat
line is:
procs memory swap io system cpu
r b w swpd free buff cache si so bi bo in cs us sy id
1 0 0 0 16228 4 197316 0 0 0 0 1426 603 14 86 0
Now, contrast this with the other box (same CPU speed, different motherboard):
procs memory swap io system cpu
r b w swpd free buff cache si so bi bo in cs us sy id
0 1 0 0 196216 0 53656 0 0 0 0 103 9 0 0 100
I see rpciod running there too, but only very rarely. Any idea why
rpciod would burn so much CPU in the first case but not in the second?
Maybe I need a kernel profiler to track down the actual problem?
Take care,
--
Christian Reis, Senior Engineer, Async Open Source, Brazil.
http://async.com.br/~kiko/ | [+55 16] 261 2331 | NMFL
On 15 August 2002 17:21, Christian Reis wrote:
> On Thu, Aug 15, 2002 at 11:41:20AM -0200, Denis Vlasenko wrote:
> > On 13 August 2002 22:29, Christian Reis wrote:
> > > On tri, which is the referred SIS530 box, as you can see, for most runs
> > > the CPU usage is just so much higher than minas, which has practically
> > > the same setup: K6-500, old PCI (no AGP) board, eepro100 card. I'm
> > > wondering if anybody has seen something like this before?
> >
> > Start swapping hardware between these two boxes
>
> I've done it, actually. I've swapped processor, memory and video cards
> (which are the only things the boxes actually contain) and there has
> been no change. It's very strange - block read and block write consume
> enormous amounts of CPU, no matter how much I tinker. I've tried 2.4.19,
> 2.2.21, NFS patches, etc.
Network card?
Motherboard? :-)
BIOS? 8-)
Compare BIOS/chipset setup (lspci -vvvxxx)
--
vda
On Fri, Aug 16, 2002 at 11:21:33AM -0200, Denis Vlasenko wrote:
> > > On 13 August 2002 22:29, Christian Reis wrote:
> > > > On tri, which is the referred SIS530 box, as you can see, for most runs
> > > > the CPU usage is just so much higher than minas, which has practically
> > > > the same setup: K6-500, old PCI (no AGP) board, eepro100 card. I'm
> > > > wondering if anybody has seen something like this before?
>
> Network card?
Same type of network card in both; I swapped them between the boxes, no change.
> Motherboard? :-)
> BIOS? 8-)
Yes, it has to be something in there. But they are not the same brand,
and I've already updated the BIOS.
> Compare BIOS/chipset setup (lspci -vvvxxx)
Thanks for the suggestion. Some items do differ between the two outputs,
as you can see below. One suggestive difference is that the host bridge
Control line shows I/O+ on the problematic motherboard but I/O- on the
normal one. I'm not sure what that affects, though. (BTW, I have
triple-checked, and there is no interrupt sharing going on.)
I've reached the point where I need to understand a bit more about how
interrupts are handled by the hardware and the kernel. I want to
understand how the same network card can generate thousands more
interrupts in motherboard A than in motherboard B for the same amount
of network traffic. How does the kernel (or the card?) schedule,
postpone, or batch interrupts?
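One crude way to put numbers on that question is to watch the interrupt counters directly. A sketch, assuming a POSIX shell and that the card shows up as eth0 in /proc/interrupts (both assumptions, not taken from the thread):

```shell
# count_irqs: sum the per-CPU interrupt counters on the /proc/interrupts
# line matching a device name. The format is "IRQ:  count [count...]  type  name".
count_irqs() {
    awk -v dev="$1" '$0 ~ dev {
        s = 0
        for (i = 2; i <= NF && $i + 0 == $i; i++)  # stop at the first non-numeric field
            s += $i
        print s
    }'
}

# Sample twice, one second apart, and print the per-second rate.
a=$(count_irqs eth0 < /proc/interrupts)
sleep 1
b=$(count_irqs eth0 < /proc/interrupts)
echo "$((b - a)) interrupts/sec on eth0"
```

Running this on both motherboards under the same dd load should show whether the extra CPU really comes from sheer interrupt volume or from the per-interrupt handling cost.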
Host bridge, Motherboard A (problematic):
00:00.0 Host bridge: Silicon Integrated Systems [SiS] 530 Host (rev 02)
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop-
ParErr- Stepping- SERR- FastB2B-
Status: Cap+ 66Mhz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort-
<TAbort- <MAbort+ >SERR- <PERR-
Latency: 32
Region 0: Memory at e0000000 (32-bit, non-prefetchable) [size=64M]
Capabilities: [c0] AGP version 2.0
Status: RQ=31 SBA+ 64bit- FW- Rate=x1,x2
Command: RQ=0 SBA- AGP- 64bit- FW- Rate=<none>
00: 39 10 30 05 07 00 10 22 02 00 00 06 00 20 80 00
10: 00 00 00 e0 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
30: 00 00 00 00 c0 00 00 00 00 00 00 00 00 00 00 00
40: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
50: 90 98 e0 00 00 0d b2 00 50 00 00 00 00 18 00 00
60: 26 06 06 67 00 00 00 00 c0 00 00 00 00 00 00 00
70: cc 80 00 00 88 88 88 00 00 00 00 00 00 00 00 00
80: 00 00 80 03 60 00 03 44 00 10 7b 00 48 00 00 00
90: 00 00 00 00 40 00 00 01 00 00 00 00 00 00 00 00
a0: 40 40 80 00 00 00 00 00 00 00 00 00 00 00 00 00
b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
c0: 02 00 20 00 03 02 00 1f 00 00 00 00 00 00 00 00
d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Host bridge, Motherboard B (normal):
00:00.0 Host bridge: Acer Laboratories Inc. [ALi] M1541 (rev 04)
Subsystem: Acer Laboratories Inc. [ALi] ALI M1541 Aladdin V/V+
AGP System Controller
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop-
ParErr- Stepping- SERR- FastB2B-
Status: Cap+ 66Mhz- UDF- FastB2B- ParErr- DEVSEL=slow >TAbort-
<TAbort- <MAbort+ >SERR- <PERR-
Latency: 64
Region 0: Memory at d0000000 (32-bit, non-prefetchable) [size=256M]
Capabilities: [b0] AGP version 1.0
Status: RQ=28 SBA+ 64bit- FW- Rate=x1,x2
Command: RQ=0 SBA- AGP- 64bit- FW- Rate=<none>
00: b9 10 41 15 06 00 10 24 04 00 00 06 00 40 00 00
10: 00 00 00 d0 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 b9 10 41 15
30: 00 00 00 00 b0 00 00 00 00 00 00 00 00 00 00 00
40: 13 04 81 75 00 01 00 06 09 ef a0 5c 00 00 ff ff
50: 00 f0 00 cc 00 05 07 ff 00 00 00 00 00 00 08 00
60: 00 00 00 00 00 00 00 00 7f b0 ff b0 ff 00 ff 00
70: 00 30 46 02 00 00 00 00 00 00 00 00 00 00 00 00
80: 00 00 00 00 0f c8 07 1e ea 20 20 00 00 4b 42 32
90: 20 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
b0: 02 00 10 00 03 02 00 1c 00 00 00 00 0a 00 00 00
c0: 90 00 fd df 00 00 00 00 bf 4a 00 00 00 00 00 10
d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
e0: 01 00 01 00 00 00 00 00 00 00 00 00 38 11 0c 29
f0: 00 00 00 08 00 90 95 03 00 00 00 00 00 00 00 00
Ethernet controller, Motherboard A (problematic):
00:0b.0 Ethernet controller: Intel Corp. 82557 [Ethernet Pro 100]
(rev 08)
Subsystem: Intel Corp. EtherExpress PRO/100+ Management Adapter
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV+ VGASnoop-
ParErr- Stepping- SERR- FastB2B-
Status: Cap+ 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort-
<TAbort- <MAbort- >SERR- <PERR-
Latency: 32 (2000ns min, 14000ns max), cache line size 08
Interrupt: pin A routed to IRQ 11
Region 0: Memory at c7800000 (32-bit, non-prefetchable) [size=4K]
Region 1: I/O ports at d800 [size=64]
Region 2: Memory at c7000000 (32-bit, non-prefetchable) [size=1M]
Expansion ROM at <unassigned> [disabled] [size=1M]
Capabilities: [dc] Power Management version 2
Flags: PMEClk- DSI+ D1+ D2+ AuxCurrent=0mA
PME(D0+,D1+,D2+,D3hot+,D3cold-)
Status: D0 PME-Enable- DSel=0 DScale=2 PME-
00: 86 80 29 12 17 00 90 02 08 00 00 02 08 20 00 00
10: 00 00 80 c7 01 d8 00 00 00 00 00 c7 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 86 80 0c 00
30: 00 00 00 00 dc 00 00 00 00 00 00 00 0b 01 08 38
40: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
50: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
60: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
70: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
90: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
d0: 00 00 00 00 00 00 00 00 00 00 00 00 01 00 22 7e
e0: 00 40 00 3a 00 00 00 00 00 00 00 00 00 00 00 00
f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Ethernet controller, Motherboard B (normal):
00:0b.0 Ethernet controller: Intel Corp. 82557 [Ethernet Pro 100] (rev 08)
Subsystem: Intel Corp. EtherExpress PRO/100+ Management Adapter
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV+ VGASnoop-
ParErr- Stepping- SERR- FastB2B-
Status: Cap+ 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort-
<TAbort- <MAbort- >SERR- <PERR-
Latency: 32 (2000ns min, 14000ns max), cache line size 08
Interrupt: pin A routed to IRQ 10
Region 0: Memory at de800000 (32-bit, non-prefetchable) [size=4K]
Region 1: I/O ports at a400 [size=64]
Region 2: Memory at de000000 (32-bit, non-prefetchable) [size=1M]
Expansion ROM at <unassigned> [disabled] [size=1M]
Capabilities: [dc] Power Management version 2
Flags: PMEClk- DSI+ D1+ D2+ AuxCurrent=0mA
PME(D0+,D1+,D2+,D3hot+,D3cold+)
Status: D0 PME-Enable- DSel=0 DScale=2 PME-
00: 86 80 29 12 17 00 90 02 08 00 00 02 08 20 00 00
10: 00 00 80 de 01 a4 00 00 00 00 00 de 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 86 80 0c 00
30: 00 00 00 00 dc 00 00 00 00 00 00 00 0a 01 08 38
40: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
50: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
60: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
70: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
90: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
d0: 00 00 00 00 00 00 00 00 00 00 00 00 01 00 22 fe
e0: 00 40 00 3a 00 00 00 00 00 00 00 00 00 00 00 00
f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Take care,
--
Christian Reis, Senior Engineer, Async Open Source, Brazil.
http://async.com.br/~kiko/ | [+55 16] 261 2331 | NMFL