2004-04-03 19:15:49

by Rafał J. Wysocki

Subject: 2.6.[45]-.*: weird behavior

Hi,

For quite some time I've been observing strange keyboard problems with
kernels 2.6.4 and above. Namely, in a GUI, the keyboard (which is a PS/2)
sometimes seems to get "locked" for some time (usually for a couple of
seconds) even if there is no process running in the background (of course
there are some processes in the background but they are all sleeping). By
saying "locked" I mean that the kernel does not seem to accept any keyboard
input whatsoever at that time, but after the keyboard gets "unlocked" it
properly passes all of the characters typed in the meantime to applications.

Moreover, it often takes more time to "unlock" the keyboard (up to 30 sec. or
so), in which case I can speed up the process by switching windows with a
mouse (usually it is sufficient to switch once to another window, but
sometimes more window switching is necessary).

Some days ago I noticed that a similar problem occurred, for example, when I
tried to unpack the kernel source: the tar process had apparently been
suspended several times for up to 10 sec., so it took more than 3 min. to
unpack the source (usually it takes no more than 30 sec.). There were not
any processes running in the background, though.

I noticed that this happened after updatedb had finished so I tried to
recreate this by running updatedb once again and then unpacking the kernel
once again and the problem reappeared. I'd run top before and I found that
the CPUs were spending 90+ percent of the time on IO-wait (the system is a
dual AMD64 w/ NUMA w/o kernel preemption). Strange.

I thought it was accidental (the kernel was 2.6.5-rc3-mm2), but yesterday I
noticed that the keyboard "locking" occurred even more often after I had
installed the 2.6.5-rc3-mm4 kernel (it happened after the machine had been -
seemingly - idle for some time). I ran top and this is what it showed me:

top - 23:14:39 up 3:21, 1 user, load average: 1.24, 1.12, 1.05
Tasks: 111 total, 1 running, 110 sleeping, 0 stopped, 0 zombie
Cpu(s): 0.5% user, 0.0% system, 0.0% nice, 50.0% idle, 49.5% IO-wait
Mem: 1030060k total, 1000820k used, 29240k free, 18724k buffers
Swap: 1050608k total, 3520k used, 1047088k free, 50168k cached

and it had been showing similar things for a minute or so, which is kind of
weird, IMHO. I mean, the only running process was top itself, so I don't
understand the 49.5% IO-wait.
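
As a cross-check on the figure top reports, the same percentage can be
derived from two samples of /proc/stat. The following is a minimal Python
sketch and not something used in this thread: the helper names
(parse_cpu_line, iowait_percent, sample_iowait) are made up for illustration,
and the field order (user, nice, system, idle, iowait, irq, softirq) is
assumed to match the aggregate "cpu" line of a 2.6 kernel.

```python
import time


def parse_cpu_line(stat_text):
    """Return the counters of the aggregate 'cpu' line of /proc/stat."""
    for line in stat_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "cpu":
            return [int(x) for x in fields[1:]]
    raise ValueError("no aggregate 'cpu' line found")


def iowait_percent(before, after):
    """Percentage of elapsed CPU time spent in iowait between two samples."""
    b = parse_cpu_line(before)
    a = parse_cpu_line(after)
    # Assumed order: user nice system idle iowait irq softirq [steal].
    # Later fields (guest time) are ignored so nothing is counted twice.
    deltas = [y - x for x, y in zip(b[:8], a[:8])]
    if len(deltas) < 5:
        raise ValueError("this kernel does not report an iowait column")
    total = sum(deltas)
    if total == 0:
        return 0.0
    return 100.0 * deltas[4] / total


def sample_iowait(interval=5.0, path="/proc/stat"):
    """Read /proc/stat twice, 'interval' seconds apart, and compare."""
    with open(path) as f:
        before = f.read()
    time.sleep(interval)
    with open(path) as f:
        after = f.read()
    return iowait_percent(before, after)
```

The aggregate line sums over all CPUs, so on a two-CPU machine the result is
an average over both of them. Calling sample_iowait() during a "lock" would
show whether the IO-wait share is sustained or just a momentary spike.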

Then I thought it was a NUMA-related issue but today I had the "keyboard
locking problem" on a Celeron (Coppermine)-based laptop running the 2.6.5-rc3
with exactly the same symptoms (apparently, the kernel stopped accepting the
keyboard input or at least passing it to applications - I couldn't even
switch from X to a console - and then, when I closed some windows with a
mouse preparing the system for reboot, the keyboard got "unlocked" and it
started to work as usual). So, there are two different (though a bit
similar) architectures affected, it seems.

Now, can you please tell me what may cause such behavior? I really would
like to narrow it down, so please tell me what I can do for this purpose
(I've no idea whatsoever).

Yours,
Rafael

--
Rafael J. Wysocki,
SiSK
[tel. (+48) 605 053 693]
----------------------------
For a successful technology, reality must take precedence over public
relations, for nature cannot be fooled.
-- Richard P. Feynman



2004-04-03 19:50:46

by Grzegorz Kulewski

Subject: Re: 2.6.[45]-.*: weird behavior

On Sat, 3 Apr 2004, R. J. Wysocki wrote:

> Hi,
>
> For quite some time I've been observing strange keyboard problems with
> kernels 2.6.4 and above. Namely, in a GUI, the keyboard (which is a PS/2)
> sometimes seems to get "locked" for some time (usually for a couple of
> seconds) even if there is no process running in the background (of course
> there are some processes in the background but they are all sleeping). By
> saying "locked" I mean that the kernel does not seem to accept any keyboard
> input whatsoever at that time, but after the keyboard gets "unlocked" it
> properly passes all of the characters typed in the meantime to applications.
>
> Moreover, it often takes more time to "unlock" the keyboard (up to 30 sec. or
> so), in which case I can speed up the process by switching windows with a
> mouse (usually it is sufficient to switch once to another window, but
> sometimes more window switching is necessary).
>
> Some days ago I noticed that a similar problem occurred, for example, when I
> tried to unpack the kernel source: the tar process had apparently been
> suspended several times for up to 10 sec., so it took more than 3 min. to
> unpack the source (usually it takes no more than 30 sec.). There were not
> any processes running in the background, though.
>
> I noticed that this happened after updatedb had finished so I tried to
> recreate this by running updatedb once again and then unpacking the kernel
> once again and the problem reappeared. I'd run top before and I found that
> the CPUs were spending 90+ percent of the time on IO-wait (the system is a
> dual AMD64 w/ NUMA w/o kernel preemption). Strange.
>
> I thought it was accidental (the kernel was 2.6.5-rc3-mm2), but yesterday I
> noticed that the keyboard "locking" occurred even more often after I had
> installed the 2.6.5-rc3-mm4 kernel (it happened after the machine had been -
> seemingly - idle for some time). I ran top and this is what it showed me:
>
> top - 23:14:39 up 3:21, 1 user, load average: 1.24, 1.12, 1.05
> Tasks: 111 total, 1 running, 110 sleeping, 0 stopped, 0 zombie
> Cpu(s): 0.5% user, 0.0% system, 0.0% nice, 50.0% idle, 49.5% IO-wait
> Mem: 1030060k total, 1000820k used, 29240k free, 18724k buffers
> Swap: 1050608k total, 3520k used, 1047088k free, 50168k cached
>
> and it had been showing similar things for a minute or so, which is kind of
> weird, IMHO. I mean, the only running process was top itself, so I don't
> understand the 49.5% IO-wait.
>
> Then I thought it was a NUMA-related issue but today I had the "keyboard
> locking problem" on a Celeron (Coppermine)-based laptop running the 2.6.5-rc3
> with exactly the same symptoms (apparently, the kernel stopped accepting the
> keyboard input or at least passing it to applications - I couldn't even
> switch from X to a console - and then, when I closed some windows with a
> mouse preparing the system for reboot, the keyboard got "unlocked" and it
> started to work as usual). So, there are two different (though a bit
> similar) architectures affected, it seems.
>
> Now, can you please tell me what may cause such behavior? I really would
> like to narrow it down, so please tell me what I can do for this purpose
> (I've no idea whatsoever).

Hi,

Maybe you should profile the running kernel on both configurations
to find which functions are called most often?

Can you attach config files for both the AMD64 and the laptop?

And can you reproduce this problem with 2.6.2-rc2-mm? or 2.6.2-mm?
kernels? They are working well for me and my friend on all machines (with
some minor issues) while >=2.6.4-mm? are broken on nearly all
configurations (for different, often unrelated reasons).


regards

Grzegorz Kulewski

2004-04-03 21:31:35

by Rafał J. Wysocki

Subject: Re: 2.6.[45]-.*: weird behavior

On Saturday 03 of April 2004 21:50, Grzegorz Kulewski wrote:
> On Sat, 3 Apr 2004, R. J. Wysocki wrote:
[...]
>
> Maybe you should profile the running kernel on both configurations
> to find which functions are called most often?
>
> Can you attach config files for both the AMD64 and the laptop?

The config for the laptop is attached, the other one must wait. If you think
of what they have in common, there's not much.

> And can you reproduce this problem with 2.6.2-rc2-mm? or 2.6.2-mm?
> kernels? They are working well for me and my friend on all machines (with
> some minor issues)

No, everything seems to be OK up to the 2.6.4-rc-something (roughly, the
problem appeared in the 2.6.4 for the first time).

> while >=2.6.4-mm? are broken on nearly all
> configurations (for different, often unrelated reasons).

There are some issues with the 2.6.4-and-above kernels but I've managed to get
almost all of them running anyway.

--
Rafael J. Wysocki,
SiSK
[tel. (+48) 605 053 693]
----------------------------
For a successful technology, reality must take precedence over public
relations, for nature cannot be fooled.
-- Richard P. Feynman



Attachments:
(No filename) (1.15 kB)
config-2.6.5-rc3 (30.09 kB)

2004-04-03 22:15:38

by Grzegorz Kulewski

Subject: Re: 2.6.[45]-.*: weird behavior

On Sat, 3 Apr 2004, R. J. Wysocki wrote:

> On Saturday 03 of April 2004 21:50, Grzegorz Kulewski wrote:
> > On Sat, 3 Apr 2004, R. J. Wysocki wrote:
> [...]
> > Can you attach config files for both the AMD64 and the laptop?
>
> The config for the laptop is attached, the other one must wait. If you think
> of what they have in common, there's not much.

Ok, some questions about that config file. Why do you have:
- scsi emulation
- scsi
- both generic ide options
- large block devices (2TB)?

Can you post:
- distro name and version
- dmesg or log at the end of testing - better after such a kb lock if you
can reproduce it, maybe after some stress testing, to see if any abnormal
messages appeared
- lspci -v
- lsmod
- mount
- hdparm -iIvtT for all drives
- some files from /proc describing configuration if you think they are
important.
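
Most of the items in the list above are plain command output, so they can be
gathered in one pass. The following is a rough Python sketch under stated
assumptions: the commands are the ones named in the list, while the helper
names (collect, write_report) are hypothetical. hdparm is left out on purpose
because it needs the actual drive names and usually root privileges.

```python
import subprocess
import sys

# Commands taken from the list above; extend the dict as needed.
DEFAULT_COMMANDS = {
    "dmesg": ["dmesg"],
    "lspci": ["lspci", "-v"],
    "lsmod": ["lsmod"],
    "mount": ["mount"],
}


def collect(commands=DEFAULT_COMMANDS, timeout=60):
    """Run each command and return {name: combined stdout and stderr}."""
    report = {}
    for name, argv in commands.items():
        try:
            proc = subprocess.run(
                argv, capture_output=True, text=True, timeout=timeout
            )
            report[name] = proc.stdout + proc.stderr
        except (OSError, subprocess.TimeoutExpired) as exc:
            # A missing tool should not abort the whole collection.
            report[name] = "could not run %s: %s" % (" ".join(argv), exc)
    return report


def write_report(report, out=sys.stdout):
    """Write the collected output as labelled plain-text sections."""
    for name in sorted(report):
        out.write("===== %s =====\n" % name)
        out.write(report[name].rstrip("\n") + "\n\n")
```

Running write_report(collect()) right after a keyboard "lock" and once more
on a kernel that behaves well gives two reports that can be diffed.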

Does any process sleep in D state in ps output all the time or behave
strangely? If so, maybe you should find and apply the patch that exposes the
kernel stack of each process in /proc (it was included in wolk for example)
and check which kernel function is causing the waits (for example I found
some usb problems causing D state lock of processes using some usb ioctls).
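
The D-state check can also be done without ps by reading /proc/<pid>/stat
directly. This is only an illustrative Python sketch: parse_stat and
d_state_processes are made-up names, and it relies on the documented layout
of the stat file, where the command name sits in parentheses and the state
letter follows it.

```python
import os


def parse_stat(stat_line):
    """Return (pid, comm, state) from the contents of /proc/<pid>/stat."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split around the first '(' and the last ')'.
    open_paren = stat_line.index("(")
    close_paren = stat_line.rindex(")")
    pid = int(stat_line[:open_paren])
    comm = stat_line[open_paren + 1:close_paren]
    state = stat_line[close_paren + 1:].split()[0]
    return pid, comm, state


def d_state_processes(proc_root="/proc"):
    """List (pid, comm) of processes currently in uninterruptible sleep."""
    found = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a process directory
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                pid, comm, state = parse_stat(f.read())
        except (OSError, ValueError, IndexError):
            continue  # process exited meanwhile, or unreadable entry
        if state == "D":
            found.append((pid, comm))
    return sorted(found)
```

A single snapshot says little, since any process doing disk I/O passes
through D state briefly; calling d_state_processes() every second or so and
looking for a pid that stays in the list is closer to the question asked.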

If all of that does not help, maybe you should compile the kernel with all
debug and kernel hacking options to see whether some driver locks the kernel
and sleeps or something like that, or possibly try to find which changeset
between 2.6.3 and 2.6.4 broke your setup :)
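
Finding the offending change between a known-good and a known-bad kernel
does not require reading every changeset: with an ordered list of
intermediate snapshots, a binary search needs only about log2(N) test boots.
The sketch below is generic Python and assumes things the thread does not
state, namely that such an ordered list of builds is available and that the
problem, once introduced, stays present in all later builds. first_bad and
is_bad are hypothetical names; in practice is_bad would mean "boot this
build and try to reproduce the lock by hand".

```python
def first_bad(candidates, is_bad):
    """Return the first bad entry in 'candidates' (ordered oldest to newest).

    Assumes candidates[0] is known good, candidates[-1] is known bad, and
    every entry after the first bad one is bad as well.
    """
    if len(candidates) < 2:
        raise ValueError("need at least one good and one bad candidate")
    lo, hi = 0, len(candidates) - 1  # lo: known good, hi: known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(candidates[mid]):
            hi = mid
        else:
            lo = mid
    return candidates[hi]
```

For an intermittent problem like this one, a "good" verdict is only as
reliable as the time spent trying to trigger the lock on that build.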


good luck

Grzegorz Kulewski

2004-04-03 23:02:12

by Rafał J. Wysocki

Subject: Re: 2.6.[45]-.*: weird behavior

On Sunday 04 of April 2004 00:15, Grzegorz Kulewski wrote:
> On Sat, 3 Apr 2004, R. J. Wysocki wrote:
> > On Saturday 03 of April 2004 21:50, Grzegorz Kulewski wrote:
> > > On Sat, 3 Apr 2004, R. J. Wysocki wrote:
> >
> > [...]
> >
> > > Can you attach config files for both the AMD64 and the laptop?
> >
> > The config for the laptop is attached, the other one must wait. If you
> > think of what they have in common, there's not much.
>
> Ok, some questions about that config file. Why do you have:
> - scsi emulation
> - scsi
> - both generic ide options
> - large block devices (2TB)?

SCSI support is necessary for USB storage, the rest is just for fun. :-)

> Can you post:
> - distro name and version

RH9

> - dmesg or log at the end of testing - better after such a kb lock if you
> can reproduce it, maybe after some stress testing, to see if any abnormal
> messages appeared
> - lspci -v
> - lsmod
> - mount
> - hdparm -iIvtT for all drives
> - some files from /proc describing configuration if you think they are
> important.

Well, I _really_ haven't had much time to track this down. If I'd had time,
I'd probably have checked all these things already. I don't think there is
anything unusual in what you list, though.

> Does any process sleep in D state in ps output all the time or behave
> strangely? If so, maybe you should find and apply the patch that exposes the
> kernel stack of each process in /proc (it was included in wolk for example)
> and check which kernel function is causing the waits (for example I found
> some usb problems causing D state lock of processes using some usb ioctls).

Good idea, I can do that.

> If all of that does not help, maybe you should compile the kernel with all
> debug and kernel hacking options to see whether some driver locks the kernel
> and sleeps or something like that, or possibly try to find which changeset
> between 2.6.3 and 2.6.4 broke your setup :)

Well, the patch-2.6.4.bz2 is 2.2+M big. That's _a_ _lot_ of changesets, so I
don't think I can figure this out unless I know which one could
_potentially_ cause the effects that I observe. Please give me a hint if
you have any idea.

--
Rafael J. Wysocki,
SiSK
[tel. (+48) 605 053 693]
----------------------------
For a successful technology, reality must take precedence over public
relations, for nature cannot be fooled.
-- Richard P. Feynman

2004-04-04 11:21:51

by Grzegorz Kulewski

Subject: Re: 2.6.[45]-.*: weird behavior

Maybe you should check IRQ sharing with the PS/2 port (if it is a PS/2
keyboard). Is it different under >=2.6.4-mm than under other kernels?
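
One way to compare IRQ sharing between two kernels is to parse
/proc/interrupts on each and look at which handlers sit on the same line as
the keyboard controller. The Python sketch below is an illustration only:
the function names are made up, "i8042" is assumed to be the name under
which the PS/2 controller registers its interrupts, and the parsing is a
heuristic that treats the last word before the first comma as the first
handler name.

```python
def irq_handlers(interrupts_text):
    """Map each numbered IRQ in /proc/interrupts text to its handler names."""
    lines = interrupts_text.splitlines()
    if not lines:
        return {}
    ncpus = len(lines[0].split())  # header row: CPU0 CPU1 ...
    result = {}
    for line in lines[1:]:
        label, sep, rest = line.partition(":")
        label = label.strip()
        if not sep or not label.isdigit():
            continue  # skip NMI, LOC, ERR and similar summary rows
        tokens = rest.split()
        tail = " ".join(tokens[ncpus:])  # drop the per-CPU counters
        if not tail:
            continue
        # Shared IRQs list their handlers separated by commas.
        parts = [p.strip() for p in tail.split(",")]
        # Heuristic: the first part still carries the interrupt chip name,
        # so keep only its last word as the handler name.
        parts[0] = parts[0].split()[-1]
        result[int(label)] = parts
    return result


def shared_with(interrupts_text, driver="i8042"):
    """Return {irq: [other handlers]} for IRQs that 'driver' shares."""
    shared = {}
    for irq, names in irq_handlers(interrupts_text).items():
        if driver in names and len(names) > 1:
            shared[irq] = [n for n in names if n != driver]
    return shared
```

Feeding the contents of /proc/interrupts from a 2.6.3 boot and from a 2.6.4
boot to shared_with() and comparing the two results would answer the
question above directly.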

And maybe you should add boot options to turn ACPI and other similar
options off.

And what video card are you using? There were some problems with S3 and
other cards / XFree86 implementations in the past. Maybe the kernel change is
just exposing the problem, not producing it? What about starting X with the
vesa or another "generic" driver instead of the driver for your card?

I tried hard to reproduce your problem on my Athlon 32 box with
2.6.5-rc3-mm4 but with no luck. I have some problems with this kernel, as
with other kernels past 2.6.4 (for example I cannot mount any non root non
virtual fs :)), but not this kind of problem. If I remember correctly I had
one keyboard lockup under X in the last month, but this was on a
2.6.2-rc2-mm? kernel (probably) after exiting from a game (aa) and the
keyboard did not unlock (but I didn't wait 30 sec - I do not have enough
patience :)). This problem never happened again.

What are the exact steps to reproduce your problem?

And maybe the common thing with these two configurations is the
SMP<->preemption. These two are very similar in producing problems and if
I remember correctly you have preemption enabled on the laptop. Can you
reproduce the problem without it?

I also found the patch for kernel stack in proc - it is here:

http://thebsh.namesys.com/snapshots/LATEST/extra/a_04-proc-stack.patch

There are also iowait-reason and sleep-stat patches in the same directory.

And maybe try to reproduce the problem with 2.6.5 without mm. People like
to have reports about "stable" kernels :)


good luck

Grzegorz Kulewski


PS. I am compiling 2.6.5 too to check if it works better for me.

2004-04-05 10:22:08

by Rafał J. Wysocki

Subject: Re: 2.6.[45]-.*: weird behavior

On Sunday 04 of April 2004 13:21, Grzegorz Kulewski wrote:
[...]
> PS. I am compiling 2.6.5 too to check if it works better for me.

First, thanks for your help! Unfortunately I had no time to dig into it
further and I'm now using the 2.6.5 too without any problems so far. Let's
see if it happens again,

Rafael

--
Rafael J. Wysocki,
SiSK
[tel. (+48) 605 053 693]
----------------------------
For a successful technology, reality must take precedence over public
relations, for nature cannot be fooled.
-- Richard P. Feynman