2005-01-04 20:09:48

by Marek Habersack

Subject: Very high load on P4 machines with 2.4.28

Hello,

We have several machines with similar configurations

0000:00:00.0 Host bridge: Intel Corp. 82875P Memory Controller Hub (rev 02)
0000:00:01.0 PCI bridge: Intel Corp. 82875P Processor to AGP Controller (rev 02)
0000:00:1e.0 PCI bridge: Intel Corp. 82801 PCI Bridge (rev c2)
0000:00:1f.0 ISA bridge: Intel Corp. 82801EB/ER (ICH5/ICH5R) LPC Bridge (rev 02)
0000:00:1f.2 IDE interface: Intel Corp. 82801EB (ICH5) Serial ATA 150 Storage Controller (rev 02)
0000:00:1f.3 SMBus: Intel Corp. 82801EB/ER (ICH5/ICH5R) SMBus Controller (rev 02)
0000:02:09.0 VGA compatible controller: ATI Technologies Inc Rage XL (rev 27)
0000:02:0a.0 Ethernet controller: Intel Corp. 82541EI Gigabit Ethernet Controller (Copper)
0000:02:0b.0 Ethernet controller: Intel Corp. 82541EI Gigabit Ethernet Controller (Copper)

and

0000:00:00.0 Host bridge: Intel Corp. 82845G/GL[Brookdale-G]/GE/PE DRAM Controller/Host-Hub Interface (rev 03)
0000:00:02.0 VGA compatible controller: Intel Corp. 82845G/GL[Brookdale-G]/GE Chipset Integrated Graphics Device (rev 03)
0000:00:1e.0 PCI bridge: Intel Corp. 82801 PCI Bridge (rev 82)
0000:00:1f.0 ISA bridge: Intel Corp. 82801DB/DBL (ICH4/ICH4-L) LPC Bridge (rev 02)
0000:00:1f.1 IDE interface: Intel Corp. 82801DB/DBL (ICH4/ICH4-L) UltraATA-100 IDE Controller (rev 02)
0000:00:1f.3 SMBus: Intel Corp. 82801DB/DBL/DBM (ICH4/ICH4-L/ICH4-M) SMBus Controller (rev 02)
0000:01:05.0 Ethernet controller: Intel Corp. 82540EM Gigabit Ethernet Controller (rev 02)
0000:01:06.0 Ethernet controller: Intel Corp. 82540EM Gigabit Ethernet Controller (rev 02)

equipped with 2.6GHz P4 CPUs, 1GB of RAM and 2-4GB of swap; the kernel config
is attached. The machines' normal load averages hover no higher than 7.0,
depending on the time of day etc. Two of the machines run 2.4.25, one runs
2.4.27, and they work fine. When booted with 2.4.28, though (compiled with
Debian's gcc 2.3.5, with the P3 or P4 CPU target selected in the config), the
load climbs very fast and hovers at a value 3-4 times higher than with the
older kernels. Booted back into the old kernel, the load returns to its usual
level. The logs show nothing: no errors, nothing unusual.

Has anyone had similar problems with 2.4.28 in an environment resembling the
above? Could it be a problem with highmem i/o?
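
In case it helps with comparisons, here is a minimal sketch of how the load
can be sampled identically under each kernel (the interval and log path are
arbitrary choices, not something we actually run):

#!/bin/sh
# Log run queue, load average and memory pressure once a minute, tagged with
# the running kernel version, so the kernels can be compared over the same
# time of day.  Stop it with ^C or kill.
OUT=/var/log/load-sample.$(uname -r).log
while true; do
    date >> "$OUT"
    cat /proc/loadavg >> "$OUT"
    vmstat 1 2 | tail -1 >> "$OUT"   # the second vmstat sample is the live one
    sleep 60
done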

tia,

marek



2005-01-04 22:14:40

by Willy Tarreau

Subject: Re: Very high load on P4 machines with 2.4.28

Hi,

On Tue, Jan 04, 2005 at 08:56:36PM +0100, Marek Habersack wrote:
(...)
> equipped with 2.6Ghz P4 CPUs, 1Gb of ram, 2-4gb of swap, the kernel config
> is attached. The machines have normal load averages hovering not higher than
> 7.0, depending on the time of the day etc. Two of the machines run 2.4.25,
> one 2.4.27 and they work fine. When booted with 2.4.28, though (compiled
> with Debian's gcc 2.3.5, with p3 or p4 CPU selected in the config), the load
> is climbing very fast and hovers around a value 3-4 times higher than with
> the older kernels. Booted back in the old kernel, the load comes to its
> usual level. The logs suggest nothing, no errors, nothing unusual is
> happening.
>
> Has anyone had similar problems with 2.4.28 in an environment resembling the
> above? Could it be a problem with highmem i/o?

Never encountered it yet! Could you provide some indication of the type of
work (I/O, network, CPU, script execution, number of processes, etc.)?

Regards,
Willy

2005-01-04 22:17:56

by Willy Tarreau

Subject: Re: Very high load on P4 machines with 2.4.28

Oh, while I'm at it: are you using hyperthreading, and if so, could you
disable it? I have seen many cases where it degrades performance
significantly (e.g. highly loaded user-space network applications).
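
A quick way to check whether HT is actually active (just a sketch; note that
a uniprocessor kernel only ever reports one logical CPU, so the count mainly
tells you something on SMP builds):

#!/bin/sh
# The "ht" flag only means the CPU supports hyperthreading; whether it is in
# use shows up as the number of logical processors (2 per physical P4 with HT
# enabled, 1 when it is disabled in the BIOS or the kernel is UP).
grep -c '^processor' /proc/cpuinfo
grep '^flags' /proc/cpuinfo | head -n 1 | grep -w ht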

Willy

On Tue, Jan 04, 2005 at 08:56:36PM +0100, Marek Habersack wrote:
> Hello,
>
> We have several machines with similar configurations
>
> 0000:00:00.0 Host bridge: Intel Corp. 82875P Memory Controller Hub (rev 02)
> 0000:00:01.0 PCI bridge: Intel Corp. 82875P Processor to AGP Controller (rev 02)
> 0000:00:1e.0 PCI bridge: Intel Corp. 82801 PCI Bridge (rev c2)
> 0000:00:1f.0 ISA bridge: Intel Corp. 82801EB/ER (ICH5/ICH5R) LPC Bridge (rev 02)
> 0000:00:1f.2 IDE interface: Intel Corp. 82801EB (ICH5) Serial ATA 150 Storage Controller (rev 02)
> 0000:00:1f.3 SMBus: Intel Corp. 82801EB/ER (ICH5/ICH5R) SMBus Controller (rev 02)
> 0000:02:09.0 VGA compatible controller: ATI Technologies Inc Rage XL (rev 27)
> 0000:02:0a.0 Ethernet controller: Intel Corp. 82541EI Gigabit Ethernet Controller (Copper)
> 0000:02:0b.0 Ethernet controller: Intel Corp. 82541EI Gigabit Ethernet Controller (Copper)
>
> and
>
> 0000:00:00.0 Host bridge: Intel Corp. 82845G/GL[Brookdale-G]/GE/PE DRAM Controller/Host-Hub Interface (rev 03)
> 0000:00:02.0 VGA compatible controller: Intel Corp. 82845G/GL[Brookdale-G]/GE Chipset Integrated Graphics Device (rev 03)
> 0000:00:1e.0 PCI bridge: Intel Corp. 82801 PCI Bridge (rev 82)
> 0000:00:1f.0 ISA bridge: Intel Corp. 82801DB/DBL (ICH4/ICH4-L) LPC Bridge (rev 02)
> 0000:00:1f.1 IDE interface: Intel Corp. 82801DB/DBL (ICH4/ICH4-L) UltraATA-100 IDE Controller (rev 02)
> 0000:00:1f.3 SMBus: Intel Corp. 82801DB/DBL/DBM (ICH4/ICH4-L/ICH4-M) SMBus Controller (rev 02)
> 0000:01:05.0 Ethernet controller: Intel Corp. 82540EM Gigabit Ethernet Controller (rev 02)
> 0000:01:06.0 Ethernet controller: Intel Corp. 82540EM Gigabit Ethernet Controller (rev 02)
>
> equipped with 2.6Ghz P4 CPUs, 1Gb of ram, 2-4gb of swap, the kernel config
> is attached. The machines have normal load averages hovering not higher than
> 7.0, depending on the time of the day etc. Two of the machines run 2.4.25,
> one 2.4.27 and they work fine. When booted with 2.4.28, though (compiled
> with Debian's gcc 2.3.5, with p3 or p4 CPU selected in the config), the load
> is climbing very fast and hovers around a value 3-4 times higher than with
> the older kernels. Booted back in the old kernel, the load comes to its
> usual level. The logs suggest nothing, no errors, nothing unusual is
> happening.
>
> Has anyone had similar problems with 2.4.28 in an environment resembling the
> above? Could it be a problem with highmem i/o?
>
> tia,
>
> marek


2005-01-04 23:15:48

by Marek Habersack

Subject: Re: Very high load on P4 machines with 2.4.28

On Tue, Jan 04, 2005 at 11:03:13PM +0100, Willy Tarreau scribbled:
> Hi,
>
> On Tue, Jan 04, 2005 at 08:56:36PM +0100, Marek Habersack wrote:
> (...)
> > equipped with 2.6Ghz P4 CPUs, 1Gb of ram, 2-4gb of swap, the kernel config
> > is attached. The machines have normal load averages hovering not higher than
> > 7.0, depending on the time of the day etc. Two of the machines run 2.4.25,
> > one 2.4.27 and they work fine. When booted with 2.4.28, though (compiled
> > with Debian's gcc 2.3.5, with p3 or p4 CPU selected in the config), the load
> > is climbing very fast and hovers around a value 3-4 times higher than with
> > the older kernels. Booted back in the old kernel, the load comes to its
> > usual level. The logs suggest nothing, no errors, nothing unusual is
> > happening.
> >
> > Has anyone had similar problems with 2.4.28 in an environment resembling the
> > above? Could it be a problem with highmem i/o?
>
> Never encountered yet ! Could you provide some indications about the type of
> work (I/O, network, CPU, scripts execution, #of processes, etc...) ?
Of course. Here's some information:

The machines (with the exception of one) are virtual hosting servers running
apache with a lot of customer-provided perl scripts and php code, plus mysql
(quite heavily used). The bandwidth used ranges from 4Mbit/s (the non-virtual
box) to 24Mbit/s. The number of processes ranges from ~300 to ~600.
Interestingly enough, the machine with the highest load average is the one
generating 4Mbit/s, while the one pushing 24Mbit/s has the smallest load
average; the latter also suffers from the biggest loadavg increase. All of
the virtual machines have iptables accounting chains for each configured IP
(62 IP addresses on one, 32 on the other). The virtual boxes have two 80GB
SATA drives in a software RAID; the non-virtual box has a single IDE drive,
no RAID.
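
For reference, a per-IP accounting setup of this kind typically looks
something like the sketch below (the chain name and addresses are invented;
the rules only match and count, they do not filter anything):

#!/bin/sh
iptables -N ACCT 2>/dev/null      # create the accounting chain if missing
iptables -F ACCT                  # start from empty counters
for ip in 192.0.2.10 192.0.2.11; do
    iptables -A ACCT -d "$ip"     # traffic to this IP
    iptables -A ACCT -s "$ip"     # traffic from this IP
done
iptables -I INPUT  -j ACCT
iptables -I OUTPUT -j ACCT
# Read the per-IP counters later with: iptables -L ACCT -v -n -x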

Some diagnostics (virtuals running 2.4.25 with ow1, non-virtual 2.4.27 with
ow1):

(virtual #1)
# vmstat
procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy id wa
 0  0 108364  33376   3032 382452    1    1    85   107   79    76 42  7 52  0

# iostat
avg-cpu:  %user   %nice    %sys %iowait   %idle
          41.86    0.00    6.63    0.00   51.51

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
dev8-0           44.33        91.04       761.64   32859202  274901552
dev8-1           44.07        88.28       759.27   31863034  274048056

# cat /proc/interrupts
           CPU0
  0:   36164314    XT-PIC  timer
  1:          2    XT-PIC  keyboard
  2:          0    XT-PIC  cascade
  4:          5    XT-PIC  serial
  8:          4    XT-PIC  rtc
 10:  304927622    XT-PIC  eth0
 12:     786373    XT-PIC  eth1
 14:   31209236    XT-PIC  libata
 15:          1    XT-PIC  ide1
NMI:          0
ERR:          0


(virtual #2, the 24Mbit/s one)
# vmstat
procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy id wa
 5  3 172448  13084   1208 304048    4    4    90    50  109   117 19  8 73  0

# iostat
avg-cpu:  %user   %nice    %sys %iowait   %idle
          18.76    0.00    7.98    0.00   73.26

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
dev8-0           28.66       465.98       323.12  168199261  116634204
dev8-1           28.26       458.50       314.57  165501660  113547124

# cat /proc/interrupts
           CPU0
  0:   36164279    XT-PIC  timer
  1:          2    XT-PIC  keyboard
  2:          0    XT-PIC  cascade
  4:          5    XT-PIC  serial
  8:          4    XT-PIC  rtc
 10:  713628758    XT-PIC  eth0
 12:    1452211    XT-PIC  eth1
 14:   20094643    XT-PIC  libata
 15:          1    XT-PIC  ide1
NMI:          0
ERR:          0


(the non-virtual)
# vmstat
procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy id wa
60  0  70300 115960      0 369244    0    0    79    32   90    45 73  7 21  0

# iostat
avg-cpu:  %user   %nice    %sys %iowait   %idle
          72.64    0.03    6.63    0.00   20.71

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
dev3-0            7.62        54.77        91.76   19834569   33227488

# cat /proc/interrupts
           CPU0
  0:   36284746    XT-PIC  timer
  1:          2    XT-PIC  keyboard
  2:          0    XT-PIC  cascade
  4:          6    XT-PIC  serial
  8:          4    XT-PIC  rtc
 10:     787346    XT-PIC  eth1
 11:  120939540    XT-PIC  eth0
 14:    2743009    XT-PIC  ide0
 15:    1069794    XT-PIC  ide1
NMI:          0
ERR:          0


sar doesn't show anything interesting for any of the boxes. We tried running
2.6.10 on the non-virtual box to see whether it could cope with its load
problems; it crashed after one day with no trace in the logs of what might
have caused it (this box runs some software that's pretty bursty and
CPU-intensive, sometimes leaving around 50-80 apache processes running at a
time). One other interesting thing to note is that we have another box with a
configuration similar to the virtuals (also a virtual host), but it runs
2.4.28 with SMP+HT enabled and has no load problems at all. In the past we
had a problem with P4 machines running very slowly when the kernel was
compiled with the P4 CPU target, but that was on different hardware (an MSI
motherboard; the boxes above all have Supermicro motherboards) and compiling
the kernel with the P3 CPU target cured the slowness. That's why we tried the
same "trick" here, but this time it didn't help. Let me know if you need more
info.

thanks for your help, best regards

marek



2005-01-04 23:33:09

by Marek Habersack

Subject: Re: Very high load on P4 machines with 2.4.28

On Tue, Jan 04, 2005 at 11:05:21PM +0100, Willy Tarreau scribbled:
> Oh, while I'm at it, are you using hyperthreading, and if so, could you
yes, as I wrote in the mail I've just sent - on one box which does NOT
exhibit the problem... :)

> disable it ? I have seen many cases where it degrades performances
> significantly (eg: highly loaded user space network applications).
We saw it in two cases as well; that's why in general we don't run with HT
enabled (although we do test the boxes with it, and if it behaves, we leave
it on).

best regards,

marek



2005-01-05 05:34:45

by Willy Tarreau

Subject: Re: Very high load on P4 machines with 2.4.28

Hi,

On Wed, Jan 05, 2005 at 12:07:33AM +0100, Marek Habersack wrote:
> Interestingly enough, the machine with the highest load average is the
> one generating 4Mbit/s and the one with 24Mbit/s has the smallest load
> average value.

This is common with multi-process servers like apache when the link is
saturated: data takes longer to reach each client, so you end up with higher
concurrency.
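
A back-of-the-envelope illustration of that effect (numbers invented):

  concurrent children ~= request rate x time to serve one request
  e.g.  50 req/s x 0.2 s per response  ~=  10 busy children on a fast link
        50 req/s x 2.0 s per response  ~= 100 busy children when responses crawl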

> The latter also suffers from the biggest loadavg increase.
> All of the virtual machines have iptables accounting chains for each
> configured IP (there are between 62 IP numbers on one and 32 on the other).
> The virtual boxes have two 80GB SATA drives raided with softraid. The
> non-virtual box has a single IDE drive, no raid.

> (virtual #2, the 24Mbit/s one)
> # vmstat
> procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
>  r  b   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy id wa
>  5  3 172448  13084   1208 304048    4    4    90    50  109   117 19  8 73  0

Something I don't like: with 73% idle, you have 5 processes in the run queue.
I think this machine writes logs synchronously to disk, or stores SSL sessions
on a real disk and waits for the writes. A tmpfs would be a great help.
You can try to trace the processes' activity with:

# strace -Te write <process pid>

It will display the time spent in each write() syscall; you'll find the fds
in /proc/<pid>/fd. You may notice large times on logs or SSL sessions.
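
Expanded into a small script, that could look like the sketch below (the
process name, the 30-second window and the /tmp paths are assumptions; the
children may be called httpd rather than apache on some setups):

#!/bin/bash
# Trace write() latency of every apache child for 30 seconds and keep the
# fd -> filename mapping so slow writes can be attributed to a file.
for pid in $(pidof apache); do
    ls -l /proc/$pid/fd > /tmp/fds.$pid
    strace -T -e trace=write -p $pid -o /tmp/writes.$pid &
done
sleep 30
kill $(jobs -p)        # stop the tracers
# Large values in the trailing <...> of /tmp/writes.* point at the slow fds.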

> (the non-virtual)
> # vmstat
> procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
>  r  b   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy id wa
> 60  0  70300 115960      0 369244    0    0    79    32   90    45 73  7 21  0

Same note for this one, although it does more user-space work (php? ssl?).
It's possible that some change in 2.4.28 touches the I/O subsystem and
increases your I/O wait time in this particular application.
(...)
> One other interesting thing to note is that we have one
> other box with the similar configuration to the virtuals (also a virtual
> host) but it runs 2.4.28 with SMP+HT enabled - no load problems there at
> all.

So, to contradict myself, have you tried enabling HT on the other boxes which
suffer from the load?

> Let me know if you need more info,

You have sent quite enough info for now. Other than I/O work, I have no
idea. You may want to play with /proc/sys/vm/{bdflush,max-readahead} and
others to see if it changes things.

If your load is bursty, it might help to reduce the ratio of dirty blocks
allowed before flushing (the first field in bdflush): writes will start more
often, but each flush will take less time.
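
A sketch of what that tuning looks like (field order as documented for 2.4 in
Documentation/sysctl/vm.txt; the numbers below are examples only, not
recommendations, and the defaults vary a bit between 2.4 releases):

#!/bin/sh
# Field order is roughly: nfract ndirty (unused) (unused) interval age_buffer
# nfract_sync nfract_stop (unused).
cat /proc/sys/vm/bdflush                          # note the current values first
echo "20 500 0 0 500 3000 60 20 0" > /proc/sys/vm/bdflush
# (lowers nfract from the usual 30% to 20% so background writeback starts earlier)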

I have already solved similar problems by disabling keep-alive to decrease
the number of processes.
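
A quick way to gauge whether keep-alive is inflating the process count (the
config path and process name below are Debian apache 1.3 assumptions):

#!/bin/sh
pidof apache | wc -w                         # how many children are alive
grep -i '^KeepAlive' /etc/apache/httpd.conf  # current keep-alive settings
# Turning it off is a one-line change in the config:  KeepAlive Off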

Regards,
Willy

2005-01-05 11:33:28

by Marek Habersack

Subject: Re: Very high load on P4 machines with 2.4.28

On Wed, Jan 05, 2005 at 06:28:41AM +0100, Willy Tarreau scribbled:
> Hi,
Hello,

> On Wed, Jan 05, 2005 at 12:07:33AM +0100, Marek Habersack wrote:
> > Interestingly enough, the machine with the highest load average is the
> > one generating 4Mbit/s and the one with 24Mbit/s has the smallest load
> > average value.
>
> This is common with multi-process servers like apache if the link is
> saturated, because data takes more time to reach the client, so you have
> a higher concurrency.
The link isn't saturated - we have a 200Mbit/s margin at the moment. It's not
a bandwidth problem, that's certain.

> > The latter also suffers from the biggest loadavg increase.
> > All of the virtual machines have iptables accounting chains for each
> > configured IP (there are between 62 IP numbers on one and 32 on the other).
> > The virtual boxes have two 80GB SATA drives raided with softraid. The
> > non-virtual box has a single IDE drive, no raid.
>
> > (virtual #2, the 24Mbit/s one)
> > # vmstat
> > procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
> >  r  b   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy id wa
> >  5  3 172448  13084   1208 304048    4    4    90    50  109   117 19  8 73  0
>
> I don't like something : with 73% idle, you have 5 processes in the rq. I think
> this machine writes logs synchronously to disks, or stores SSL sessions on a
The only synchronously written logs are auth.log and mail.err. SSL is there
indeed, but the site is hardly ever accessed (as of a short while ago, the box
has a load of 0.75 while pushing out 14Mbit/s; with 2.4.28 last week it was
around 10.0 under the same conditions).

> real disk and waits for writes. A tmpfs would be a great help.
The only things writing to disk on a regular basis (apart from syslog and the
apache logs) are the php session files, one tdb database for traffic data,
and mysql (which might be using fsync - can that be the cause of the I/O
slowness?). But, in any case, the machine behaves well under kernels other
than 2.4.28.
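
If we ever try the tmpfs idea for the session files, it would be something
like this (the mount point and size are guesses; the real directory is
whatever session.save_path in php.ini points at):

#!/bin/sh
mount -t tmpfs -o size=64m,mode=1733 tmpfs /var/lib/php4
# or make it permanent in /etc/fstab:
#   tmpfs  /var/lib/php4  tmpfs  size=64m,mode=1733  0  0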

> You can try to trace the processes activity with :
>
> # strace -Te write <process pid>
> It will display the time elapsed in each write() syscall, you'll find the
> fds in /proc/<pid>/fd. You may notice big times on logs or ssl sessions.
Nope... the times are in the range of 0.000008 to 0.000045 seconds...

> > (the non-virtual)
> > # vmstat
> > procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
> >  r  b   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy id wa
> > 60  0  70300 115960      0 369244    0    0    79    32   90    45 73  7 21  0
>
> Same note for this one, although it does more user space work (php? ssl?).
poorly written perl scripts

> It's possible that some change in 2.4.28 touches the I/O subsystem and
> increases your wait I/O time in this particular application.
> (...)
Any clues as to where to look? I examined the 2.4.28 changelog and saw
nothing that would suggest such a change, but then I'm not a kernel hacker;
I might easily have missed something important.

> > One other interesting thing to note is that we have one
> > other box with the similar configuration to the virtuals (also a virtual
> > host) but it runs 2.4.28 with SMP+HT enabled - no load problems there at
> > all.
>
> So, to contradict myself, have you tried enabling HT on other boxes which
> suffer from the load ?
Yep, only one box (out of the ones with problems) boots fine with HT enabled;
the others just freeze (we thought it could have been the machine's BIOS, but
updating it didn't help).

> > Let me know if you need more info,
>
> You have send fairly enough info right now. Other than I/O work, I have no
> idea. You may want to play with /proc/sys/vm/{bdflush,max-readahead} and
> others to see if it changes things.
At this point I think we're gonna run them under the older kernels and wait
for 2.4.29 to see whether the problem still exists there. If it does, we'll
try 2.6 on the machines and if that doesn't help, we'll do some more testing
with 2.4.28 - we have our hands tied, since they are production machines and
we cannot let them run with such degraded performance for too long...

> If your load is bursty, it might help to reduce the ratio of dirty blocks
> before flushing (first field in bdflush), because although writes will
> start more often, they will take fewer time.
What about nfract_sync? Does it make sense to make it smaller as well? I've
also decreased age_buffer to 15s.
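
For the record, my reading of the units (assuming HZ=100, the stock 2.4 x86
value): interval and age_buffer are in jiffies while the nfract* fields are
percentages, so 15 seconds is 1500. A sketch with example values only:

#!/bin/sh
echo "20 500 0 0 500 1500 40 20 0" > /proc/sys/vm/bdflush
# fields used here: nfract=20%, ndirty=500, interval=500 jiffies,
# age_buffer=1500 jiffies (15 s at HZ=100), nfract_sync=40%, nfract_stop=20%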

> I already have solved similar problems by disabling keep-alive to decrease
> the number of processes.
Disabling keep-alive is routine here... :) But that is unlikely to be the
cause, since it's evidently a kernel thing.

Well, I'll see what good the bdflush changes do for the machines while they
run under the "good" kernel, and we'll schedule some testing with 2.4.28 at
some point.

thanks for your help, it's greatly appreciated!

best regards,

marek



2005-01-05 12:15:34

by Marcelo Tosatti

Subject: Re: Very high load on P4 machines with 2.4.28

On Tue, Jan 04, 2005 at 08:56:36PM +0100, Marek Habersack wrote:
> Hello,
>
> We have several machines with similar configurations
>
> 0000:00:00.0 Host bridge: Intel Corp. 82875P Memory Controller Hub (rev 02)
> 0000:00:01.0 PCI bridge: Intel Corp. 82875P Processor to AGP Controller (rev 02)
> 0000:00:1e.0 PCI bridge: Intel Corp. 82801 PCI Bridge (rev c2)
> 0000:00:1f.0 ISA bridge: Intel Corp. 82801EB/ER (ICH5/ICH5R) LPC Bridge (rev 02)
> 0000:00:1f.2 IDE interface: Intel Corp. 82801EB (ICH5) Serial ATA 150 Storage Controller (rev 02)
> 0000:00:1f.3 SMBus: Intel Corp. 82801EB/ER (ICH5/ICH5R) SMBus Controller (rev 02)
> 0000:02:09.0 VGA compatible controller: ATI Technologies Inc Rage XL (rev 27)
> 0000:02:0a.0 Ethernet controller: Intel Corp. 82541EI Gigabit Ethernet Controller (Copper)
> 0000:02:0b.0 Ethernet controller: Intel Corp. 82541EI Gigabit Ethernet Controller (Copper)
>
> and
>
> 0000:00:00.0 Host bridge: Intel Corp. 82845G/GL[Brookdale-G]/GE/PE DRAM Controller/Host-Hub Interface (rev 03)
> 0000:00:02.0 VGA compatible controller: Intel Corp. 82845G/GL[Brookdale-G]/GE Chipset Integrated Graphics Device (rev 03)
> 0000:00:1e.0 PCI bridge: Intel Corp. 82801 PCI Bridge (rev 82)
> 0000:00:1f.0 ISA bridge: Intel Corp. 82801DB/DBL (ICH4/ICH4-L) LPC Bridge (rev 02)
> 0000:00:1f.1 IDE interface: Intel Corp. 82801DB/DBL (ICH4/ICH4-L) UltraATA-100 IDE Controller (rev 02)
> 0000:00:1f.3 SMBus: Intel Corp. 82801DB/DBL/DBM (ICH4/ICH4-L/ICH4-M) SMBus Controller (rev 02)
> 0000:01:05.0 Ethernet controller: Intel Corp. 82540EM Gigabit Ethernet Controller (rev 02)
> 0000:01:06.0 Ethernet controller: Intel Corp. 82540EM Gigabit Ethernet Controller (rev 02)
>
> equipped with 2.6Ghz P4 CPUs, 1Gb of ram, 2-4gb of swap, the kernel config
> is attached. The machines have normal load averages hovering not higher than
> 7.0, depending on the time of the day etc. Two of the machines run 2.4.25,
> one 2.4.27 and they work fine. When booted with 2.4.28, though (compiled
> with Debian's gcc 2.3.5, with p3 or p4 CPU selected in the config), the load
> is climbing very fast and hovers around a value 3-4 times higher than with
> the older kernels. Booted back in the old kernel, the load comes to its
> usual level. The logs suggest nothing, no errors, nothing unusual is
> happening.
>
> Has anyone had similar problems with 2.4.28 in an environment resembling the
> above? Could it be a problem with highmem i/o?

Nothing that I'm aware of should cause such an increase in loadavg.

Marek, can you please try 2.4.28-pre1?

2005-01-05 17:50:45

by Marek Habersack

Subject: Re: Very high load on P4 machines with 2.4.28

On Wed, Jan 05, 2005 at 07:42:36AM -0200, Marcelo Tosatti scribbled:
[snip]
> > Has anyone had similar problems with 2.4.28 in an environment resembling the
> > above? Could it be a problem with highmem i/o?
>
> Nothing that I'm aware of should cause such increase in loadavg.
>
> Marek, can you please try 2.4.28-pre1 ?
I should be able to schedule that for Friday. Currently we're running 2.4.28
on one of the machines (the non-virtual one), but with mem=800M to exclude
highmem. So far, no problems... We'll see how it goes on.
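
For completeness, the mem= test is just a boot parameter; a sketch of how it
can be passed (the bootloader choice and the root= device below are
assumptions):

# /etc/lilo.conf (excerpt)
#   image=/boot/vmlinuz-2.4.28
#       label=test-nohighmem
#       append="mem=800M"
#
# or on the grub command line:
#   kernel /boot/vmlinuz-2.4.28 root=/dev/sda1 ro mem=800M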

best regards,

marek



2005-01-06 18:58:30

by Denis Vlasenko

Subject: Re: Very high load on P4 machines with 2.4.28

On Tuesday 04 January 2005 21:56, Marek Habersack wrote:
> equipped with 2.6Ghz P4 CPUs, 1Gb of ram, 2-4gb of swap, the kernel config
> is attached.

It isn't...

> The machines have normal load averages hovering not higher than
> 7.0, depending on the time of the day etc. Two of the machines run 2.4.25,
> one 2.4.27 and they work fine. When booted with 2.4.28, though (compiled
> with Debian's gcc 2.3.5, with p3 or p4 CPU selected in the config), the load
> is climbing very fast and hovers around a value 3-4 times higher than with
> the older kernels. Booted back in the old kernel, the load comes to its
> usual level. The logs suggest nothing, no errors, nothing unusual is
> happening.

You may try each of the 2.4.28-pre{1,2,3} kernels with an identical .config
and pinpoint when it started happening.
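
A sketch of how that could be scripted (the paths and the saved config file
name are assumptions; the -pre patches apply on top of plain 2.4.27):

#!/bin/sh
cd /usr/src
for pre in 1 2 3; do
    tar xjf linux-2.4.27.tar.bz2
    mv linux-2.4.27 linux-2.4.28-pre$pre
    cd linux-2.4.28-pre$pre
    bzcat ../patch-2.4.28-pre$pre.bz2 | patch -p1
    cp ../config-2.4.27 .config          # the known-good config
    make oldconfig && make dep && make bzImage modules
    cd ..
done
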
--
vda