2003-06-01 01:39:39

by Albert Cahalan

Subject: another must-fix: major PS/2 mouse problem

Lots of people (check Google) get this message
from the kernel:

psmouse.c: Lost synchronization, throwing 2 bytes away.

(the number of bytes will be 1, 2, or 3)

At work, I get it when there is heavy NFS traffic.
The mouse goes crazy, jumping around and doing
random cut-and-paste all over everything. This is
with a decently fast and modern PC.

I'll guess that NFS and the mouse both have worker
threads fighting for CPU time, and neither is RT.




2003-06-04 05:34:35

by Yoann

Subject: Re: another must-fix: major PS/2 mouse problem

Is there a patch for this bug?

I have the same problem on my laptop (SiS630 chipset, Celeron 1.2 GHz, 256 MB of
RAM with 32 MB for video, PS/2 mouse using ImPS/2) while reading MP3s from an NFS
partition over 100 Mbit Ethernet. I haven't tried without NFS traffic, but I will
the next time I boot 2.5.70 (I'm currently running 2.4.20).

Yoann

Albert Cahalan wrote:
> Lots of people (check Google) get this message
> from the kernel:
>
> psmouse.c: Lost synchronization, throwing 2 bytes away.
>
> (the number of bytes will be 1, 2, or 3)
>
> At work, I get it when there is heavy NFS traffic.
> The mouse goes crazy, jumping around and doing
> random cut-and-paste all over everything. This is
> with a decently fast and modern PC.
>
> I'll guess that NFS and the mouse both have worker
> threads fighting for CPU time, and neither is RT.


2003-06-04 07:34:23

by Vojtech Pavlik

Subject: Re: another must-fix: major PS/2 mouse problem

On Tue, Jun 03, 2003 at 11:21:55PM -0700, Andrew Morton wrote:

> We believe that it may be due to the ethernet driver holding interrupts off
> for too long when the traffic is heavy.

Note that this doesn't necessarily mean that the ethernet driver
disables interrupts for too long; it just means that the
computer is only servicing network interrupts at that time, and
since the mouse interrupt has a lower priority, it's serviced
infrequently and with huge delays.

In such a case the network driver should either use interrupt mitigation
if the card supports it (reading many packets per interrupt) or
switch to a polled mode.

> Does that seem to match your observations? Does the problem happen when
> the net traffic is high?
>
> Which ethernet driver are you using?

--
Vojtech Pavlik
SuSE Labs, SuSE CR

2003-06-04 07:39:07

by Andrew Morton

Subject: Re: another must-fix: major PS/2 mouse problem

Vojtech Pavlik <[email protected]> wrote:
>
> On Tue, Jun 03, 2003 at 11:21:55PM -0700, Andrew Morton wrote:
>
> > We believe that it may be due to the ethernet driver holding interrupts off
> > for too long when the traffic is heavy.
>
> Note that this doesn't necessarily mean that the ethernet driver
> disables the interrupts for a too long time, it just means that the
> computer is only servicing the network interrupts at that time, and
> since the mouse interrupt does have a lower priority, it's serviced
> not very often and with huge delays.
>
> In such a case the network driver should either use interrupt mitigation
> if the cards supports it (reading many packets per one interrupt) or
> switch to a polled mode.

Has this problem been observed in 2.4 kernels?

2003-06-04 07:46:57

by Vojtech Pavlik

Subject: Re: another must-fix: major PS/2 mouse problem

On Wed, Jun 04, 2003 at 12:53:02AM -0700, Andrew Morton wrote:
> Vojtech Pavlik <[email protected]> wrote:
> >
> > On Tue, Jun 03, 2003 at 11:21:55PM -0700, Andrew Morton wrote:
> >
> > > We believe that it may be due to the ethernet driver holding interrupts off
> > > for too long when the traffic is heavy.
> >
> > Note that this doesn't necessarily mean that the ethernet driver
> > disables the interrupts for a too long time, it just means that the
> > computer is only servicing the network interrupts at that time, and
> > since the mouse interrupt does have a lower priority, it's serviced
> > not very often and with huge delays.
> >
> > In such a case the network driver should either use interrupt mitigation
> > if the cards supports it (reading many packets per one interrupt) or
> > switch to a polled mode.
>
> Has this problem been observed in 2.4 kernels?

No, since 2.4 doesn't have the re-sync code in the mouse driver which is
triggering in this case. But problems with the machine being flooded
with interrupts from the NIC so hard that it actually cannot do anything
are quite common.
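
For readers who have not looked at the re-sync logic, here is a minimal user-space sketch of the idea. It is not the psmouse.c source: the struct, the function name, and the 50 ms timeout are assumptions made for illustration. A standard PS/2 movement packet is 3 bytes; if the gap since the previous byte exceeds the timeout while a packet is only partly assembled, the partial packet is thrown away.

```c
#include <stdint.h>
#include <stdio.h>

#define PKT_SIZE        3   /* standard PS/2 movement packet length */
#define SYNC_TIMEOUT_MS 50  /* assumed value, for illustration only */

struct mouse_state {          /* hypothetical, not the kernel's struct */
    uint8_t  packet[PKT_SIZE];
    int      pktcnt;          /* bytes collected so far */
    uint64_t last_ms;         /* arrival time of the previous byte */
};

/*
 * Feed one byte that was delivered at time now_ms.
 * Returns 1 when a full packet is ready in m->packet, else 0.
 * *dropped receives the number of stale bytes thrown away (0 if none).
 */
int mouse_feed(struct mouse_state *m, uint8_t byte,
               uint64_t now_ms, int *dropped)
{
    *dropped = 0;
    if (m->pktcnt > 0 && now_ms - m->last_ms > SYNC_TIMEOUT_MS) {
        /* Too long since the last byte: assume we lost our place. */
        *dropped = m->pktcnt;
        fprintf(stderr, "lost sync, throwing %d bytes away\n", m->pktcnt);
        m->pktcnt = 0;
    }
    m->last_ms = now_ms;
    m->packet[m->pktcnt++] = byte;
    if (m->pktcnt == PKT_SIZE) {
        m->pktcnt = 0;
        return 1;
    }
    return 0;
}
```

In this model, a byte whose delivery is delayed by a long interrupt stall looks exactly like a byte that follows a real gap, so valid bytes get discarded and the next byte is taken as the start of a packet.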

--
Vojtech Pavlik
SuSE Labs, SuSE CR

2003-06-04 08:00:19

by Andrew Morton

Subject: Re: another must-fix: major PS/2 mouse problem

Vojtech Pavlik <[email protected]> wrote:
>
> > Has this problem been observed in 2.4 kernels?
>
> No, since 2.4 doesn't have the re-sync code in the mouse driver which is
> triggering in this case. But problems with the machine being flooded
> with interrupts from the NIC so hard that it actually cannot do anything
> are quite common.

So is the resync code doing more good than harm?

2003-06-04 08:27:17

by Vojtech Pavlik

Subject: Re: another must-fix: major PS/2 mouse problem

On Wed, Jun 04, 2003 at 01:14:13AM -0700, Andrew Morton wrote:

> > > Has this problem been observed in 2.4 kernels?
> >
> > No, since 2.4 doesn't have the re-sync code in the mouse driver which is
> > triggering in this case. But problems with the machine being flooded
> > with interrupts from the NIC so hard that it actually cannot do anything
> > are quite common.
>
> So is the resync code doing more good than harm?

Hard to tell. The people for which it does good don't complain.

--
Vojtech Pavlik
SuSE Labs, SuSE CR

2003-06-04 19:08:20

by Yoann

Subject: Re: another must-fix: major PS/2 mouse problem

Vojtech Pavlik wrote:
> On Wed, Jun 04, 2003 at 01:14:13AM -0700, Andrew Morton wrote:
>
>
>>>>Has this problem been observed in 2.4 kernels?
>>>
>>> No, since 2.4 doesn't have the re-sync code in the mouse driver which is
>>> triggering in this case. But problems with the machine being flooded
>>> with interrupts from the NIC so hard that it actually cannot do anything
>>> are quite common.
>>
>>So is the resync code doing more good than harm?
>
>
> Hard to tell. The people for which it does good don't complain.

I haven't rebooted my PC yet, so I'm still running 2.4.20 without any mouse
problems. But when I boot 2.5.70, what should I do to find where the bug
comes from? I'm a bit new here, and I've never tried to locate a bug in a
kernel...

thanks for your advice

Yoann
--
Jugglers, like programmers, handle objects which, at first sight, seem complex
and difficult to control. Some of them, with time and patience, manage to
control one or the other or both at the same time, and thus become aware of
what they are doing.


2003-06-04 23:01:15

by Albert Cahalan

Subject: Re: another must-fix: major PS/2 mouse problem

On Wed, 2003-06-04 at 04:14, Andrew Morton wrote:
> Vojtech Pavlik <[email protected]> wrote:

> >> Has this problem been observed in 2.4 kernels?
> >
> > No, since 2.4 doesn't have the re-sync code in the mouse driver which is
> > triggering in this case. But problems with the machine being flooded
> > with interrupts from the NIC so hard that it actually cannot do anything
> > are quite common.
>
> So is the resync code doing more good than harm?

The log message is useful.

I think the resync code is a bit like the OOM killer.
We need it, but something is wrong if it ever gets used.
It also doesn't quite work the way it should.

Anyway...

I only get the problem with NFS traffic. It may be
that NFS traffic is the only way I've yet found to
generate extreme network usage though.

The system with problems is an NFSv3 client that
gets abused by an in-house version control system
based on SCCS. I suppose this is like running
"tar xf foo.tar" or "tar xf foo.tar foo" over NFS.

The hardware is:

Pentium III (Coppermine)
1002.822 MHz
Apollo chipset

# lspci -s 00:0d.0 -v
00:0d.0 Ethernet controller: 3Com Corporation 3c905C-TX/TX-M [Tornado] (rev 74)
Subsystem: 3Com Corporation 3C905C-TX Fast Etherlink for PC Management NIC
Flags: bus master, medium devsel, latency 32, IRQ 11
I/O ports at ec00 [size=128]
Memory at df000000 (32-bit, non-prefetchable) [size=128]
Expansion ROM at <unassigned> [disabled] [size=128K]
Capabilities: [dc] Power Management version 2

# nfsstat -c
Client rpc stats:
calls      retrans    authrefrsh
118380     7843       0
Client nfs v2:
null       getattr    setattr    root       lookup     readlink
0       0% 0       0% 0       0% 0       0% 0       0% 0       0%
read       wrcache    write      create     remove     rename
0       0% 0       0% 0       0% 0       0% 0       0% 0       0%
link       symlink    mkdir      rmdir      readdir    fsstat
0       0% 0       0% 0       0% 0       0% 0       0% 0       0%

Client nfs v3:
null       getattr    setattr    lookup     access     readlink
0       0% 12501  10% 114     0% 68765  58% 25538  21% 4       0%
read       write      create     mkdir      symlink    mknod
8830    7% 725     0% 377     0% 3       0% 1       0% 0       0%
remove     rmdir      rename     link       readdir    readdirplus
498     0% 0       0% 367     0% 173     0% 0       0% 10      0%
fsstat     fsinfo     pathconf   commit
2       0% 2       0% 0       0% 470     0%


2003-07-23 00:38:30

by Albert Cahalan

Subject: Re: another must-fix: major PS/2 mouse problem

I may have found the problem!

On Tue, 2003-06-03 at 15:18, Yoann wrote:

> I have the same problem with my laptop, chip sis630,
> celeron 1.2Ghz, 256MB of RAM (32MB for video), mouse
> on PS/2 (ImPS/2) and read mp3 through nfs partition
> (ethernet 100MB). I haven't tried without traffic on
> nfs but I will try next time I boot on the 2.5.70

Using the lockmeter on a 2.5.75 kernel, I discovered
that boomerang_interrupt() grabs a spinlock for over
1/4 second. No joke, 253 ms. Interrupts are off AFAIK.

Mouse behavior is terrible.

It should be no surprise that NTP isn't working too
well either. The ntpd daemon keeps complaining about
losing sync and having to advance the clock by amounts
of over 100 seconds.

Could somebody with the hardware manual take a look
at that function?


2003-07-24 17:27:47

by Andrew Morton

Subject: Re: another must-fix: major PS/2 mouse problem

Albert Cahalan <[email protected]> wrote:
>
> Using the lockmeter on a 2.5.75 kernel, I discovered
> that boomerang_interrupt() grabs a spinlock for over
> 1/4 second. No joke, 253 ms. Interrupts are off AFAIK.

boomerang_interrupt() doesn't disable interrupts. Is the NIC sharing the
mouse's IRQ line?

boomerang_interrupt() is only used by nasty old NICs and yes, I guess it is
possible that something has gone wrong and is causing occasional long spins
in there.

But I am more suspecting that you're not really using boomerang_interrupt()
at all, and that something has gone wrong with lockmeter. What sort of NIC
are you using?

Bear in mind that if some other device generates an interrupt while the CPU
is running boomerang_interrupt(), lockmeter will count the time spent in
that other device's interrupt as "time spent in boomerang_interrupt()".
Which is very true, but it is not much help when one is trying to identify
the source of the problem.

Perhaps what you should do is to do an rdtsc on entry and exit of do_IRQ()
and print stuff out when "long" periods of time in do_IRQ() are noticed.

2003-07-25 01:41:01

by Albert Cahalan

Subject: Re: another must-fix: major PS/2 mouse problem

On Thu, 2003-07-24 at 13:30, Andrew Morton wrote:
> Albert Cahalan <[email protected]> wrote:

> > Using the lockmeter on a 2.5.75 kernel, I discovered
> > that boomerang_interrupt() grabs a spinlock for over
> > 1/4 second. No joke, 253 ms. Interrupts are off AFAIK.
>
> boomerang_interrupt() doesn't disable interrupts. Is the NIC sharing the
> mouse's IRQ line?

No.

           CPU0
  0:     746770          XT-PIC  timer
  1:        936          XT-PIC  i8042
  2:          0          XT-PIC  cascade
  4:          9          XT-PIC  serial
  5:          0          XT-PIC  uhci-hcd, uhci-hcd
 11:       2417          XT-PIC  eth0
 12:         60          XT-PIC  i8042
 14:      13844          XT-PIC  ide0
 15:          2          XT-PIC  ide1
NMI:          0
LOC:     751552
ERR:          0
MIS:          0


> boomerang_interrupt() is only used by nasty old NICs and yes, I guess it is
> possible that something has gone wrong and is causing occasional long spins
> in there.
>
> But I am more suspecting that you're not really using boomerang_interrupt()
> at all, and that something has gone wrong with lockmeter. What sort of NIC
> are you using?

I hope you don't consider a 100 Mb/s PCI device to be
a nasty old NIC. It's not an NE2000 you know! I have this:

00:0d.0 Ethernet controller: 3Com Corporation 3c905C-TX/TX-M [Tornado] (rev 74)
Subsystem: 3Com Corporation 3C905C-TX Fast Etherlink for PC Management NIC
Flags: bus master, medium devsel, latency 32, IRQ 11
I/O ports at ec00 [size=128]
Memory at df001000 (32-bit, non-prefetchable) [size=128]
Expansion ROM at <unassigned> [disabled] [size=128K]
Capabilities: [dc] Power Management version 2

Without heavy net usage, boomerang_interrupt can take
as long as 1950 microseconds. That would be from mounting
an NFS filesystem and receiving broadcast packets.
I didn't have an opportunity to hit NFS hard today.

That's from rdtsc on a 1002-MHz Pentium III.

> Bear in mind that if some other device generates an interrupt while the CPU
> is running boomerang_interrupt(), lockmeter will count the time spent in
> that other device's interrupt as "time spent in boomerang_interrupt()".
> Which is very true, but it is not much help when one is trying to identify
> the source of the problem.

Do the Intel IRQ controller priority rules play a role here?

> Perhaps what you should do is to do an rdtsc on entry and exit of do_IRQ()
> and print stuff out when "long" periods of time in do_IRQ() are noticed.

I added code to the top and bottom of do_IRQ, as well as to
the top and bottom of boomerang_interrupt. The lockmeter was
compiled into the kernel but never enabled. I record the
minimum and maximum time in microseconds.

-----------------------------------
IRQ     num  use       min      max
---  ------  --------  ---  -------
  0  746770  timer      40   103595
  1     936  i8042      13   389773
  2       0  cascade     -        -
  3       -  -           -        -
  4       9  serial     28       56
  5       0  uhci-hcd    -        -
  6       -  -         711      711
  7       -  -          25       25
  8       -  -           -        -
  9       -  -           -        -
 10       -  -           -        -
 11    2417  eth0       87  1535331
 12      60  i8042      18   102895
 13       -  -           -        -
 14   13844  ide0        8    51944
 15       2  ide1        7       11
NMI       0
LOC  751552
ERR       0
MIS       0
-----------------------------------



2003-07-26 03:03:17

by Andrew Morton

Subject: Re: another must-fix: major PS/2 mouse problem

Albert Cahalan <[email protected]> wrote:
>
> I hope you don't consider a 100 Mb/s PCI device to be
> a nasty old NIC. It's not an NE2000 you know! I have this:

Sorry, I got my boomerangs and vortices mixed up. Vortex is the ancient one.

> I added code to the top and bottom of do_IRQ, as well as to
> the top and bottom of boomerang_interrupt. The lockmeter was
> compiled into the kernel but never enabled. I record the
> minimum and maximum time in microseconds.
>
> -------------------------------
> IRQ num use min max
> --- ------ -------- --- -------
> 0 746770 timer 40 103595
> 1 936 i8042 13 389773
> 2 0 cascade - -
> 3 - - - -
> 4 9 serial 28 56
> 5 0 uhci-hcd - -
> 6 - - 711 711
> 7 - - 25 25
> 8 - - - -
> 9 - - - -
> 10 - - - -
> 11 2417 eth0 87 1535331
> 12 60 i8042 18 102895
> 13 - - - -
> 14 13844 ide0 8 51944
> 15 2 ide1 7 11

But did your instrumentation account for nested interrupts? What happens
if a slow i8042 interrupt happens in the middle of a 3c59x interrupt?

Still, that probably doesn't account for the stalls.

I don't know what does account for it, frankly. You could try dropping the
2.4 driver into the 2.5 tree just to verify that it is not a driver
problem. The driver has hardly changed at all.
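
The nesting concern can be made concrete with a small user-space model. This is a sketch, not kernel code: the `irq_enter_sketch` and `irq_exit_sketch` names, the fixed-depth stack, and the explicit timestamps are all assumptions. For each handler it reports both the entry-to-exit time and the "self" time with nested handlers subtracted.

```c
#include <stdint.h>

#define MAX_DEPTH 8

struct irq_frame {
    uint64_t start_us;   /* entry timestamp */
    uint64_t nested_us;  /* time spent in handlers that interrupted us */
};

struct irq_frame irq_stack[MAX_DEPTH];
int irq_depth;

/* Call on handler entry. Returns 0 on success, -1 if nested too deep. */
int irq_enter_sketch(uint64_t now_us)
{
    if (irq_depth >= MAX_DEPTH)
        return -1;
    irq_stack[irq_depth].start_us = now_us;
    irq_stack[irq_depth].nested_us = 0;
    irq_depth++;
    return 0;
}

/*
 * Call on handler exit. Stores the total (entry-to-exit) time and the
 * self time (total minus nested handlers). Returns -1 if no handler
 * is currently open.
 */
int irq_exit_sketch(uint64_t now_us, uint64_t *total_us, uint64_t *self_us)
{
    struct irq_frame *f;

    if (irq_depth == 0)
        return -1;
    f = &irq_stack[--irq_depth];
    *total_us = now_us - f->start_us;
    *self_us = *total_us - f->nested_us;
    if (irq_depth > 0)
        irq_stack[irq_depth - 1].nested_us += *total_us; /* charge the parent */
    return 0;
}
```

If a handler that enters at t=0 is interrupted at t=10 by another that runs until t=400, and the first exits at t=420, a plain entry/exit timer reports 420 us for the outer handler while its self time is only 30 us.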

2003-07-26 15:15:38

by Zwane Mwaikambo

Subject: Re: another must-fix: major PS/2 mouse problem

On Fri, 25 Jul 2003, Andrew Morton wrote:

> But did your instrumentation account for nested interrupts? What happens
> if a slow i8042 interrupt happens in the middle of a 3c59x interrupt?

Just to verify that, he could remove the local_irq_enable for
!SA_INTERRUPT.

Zwane
--
function.linuxpower.ca

2003-07-29 03:04:52

by Albert Cahalan

Subject: Re: another must-fix: major PS/2 mouse problem

On Sat, 2003-07-26 at 11:16, Zwane Mwaikambo wrote:
> On Fri, 25 Jul 2003, Andrew Morton wrote:
>
> > But did your instrumentation account for nested interrupts? What happens
> > if a slow i8042 interrupt happens in the middle of a 3c59x interrupt?
>
> Just to verify that, he could remove the local_irq_enable for
> !SA_INTERRUPT.

OK, I did this. Now, in microseconds, I get:

---------------------------
IRQ  use       min      max
---  --------  ---  -------
  0  timer      40   103968
  1  i8042      14     1138  (was 389773)
  2  cascade     -        -
  3  -           -        -
  4  serial     29       56
  5  uhci-hcd    -        -
  6  -         690      690
  7  -          40       40
  8  -           -        -
  9  -           -        -
 10  -           -        -
 11  eth0       73    31332  (was 1535331)
 12  i8042      18      215  (was 102895)
 13  -           -        -
 14  ide0        7    43846
 15  ide1        7       12
---------------------------

boomerang_interrupt itself takes 4 to 59 microseconds.

Then I switched to 2.6.0-test2. Testing more, I get the
problem with or without SMP and with or without
preemption. Here's a chunk of my log file:

Loosing too many ticks!
TSC cannot be used as a timesource. (Are you running with SpeedStep?)
Falling back to a sane timesource.
psmouse.c: Lost synchronization, throwing 3 bytes away.
psmouse.c: Lost synchronization, throwing 1 bytes away.

Arrrrgh! The TSC is my only good time source!

Remember that this is a pretty normal system. I have
a Red Hat 8 install w/ required upgrades, ext3, IDE,
a 1-GHz Pentium III, a boring VIA chipset, etc.

To reproduce, I do some PS/2 mouse movement while
doing one of:

a. Lots of concurrent write() and sync() activity to ext3.
b. Lots of NFSv3 traffic.


2003-07-29 03:15:21

by Andrew Morton

Subject: Re: another must-fix: major PS/2 mouse problem

Albert Cahalan <[email protected]> wrote:
>
> OK, I did this. Now, in microseconds, I get:
>
> ------------------------
> IRQ use min max
> --- -------- --- -------
> 0 timer 40 103968
> 1 i8042 14 1138 (was 389773)
> 2 cascade - -
> 3 - - -
> 4 serial 29 56
> 5 uhci-hcd - -
> 6 - 690 690
> 7 - 40 40
> 8 - - -
> 9 - - -
> 10 - - -
> 11 eth0 73 31332 (was 1535331)
> 12 i8042 18 215 (was 102895)
> 13 - - -
> 14 ide0 7 43846
> 15 ide1 7 12
> ------------------------
>
> boomerang_interrupt itself takes 4 to 59 microseconds.

So this looks OK, yes? (Is that instrumentation patch productisable?
Looks handy, albeit a subset of microstate accounting)

> Then I switched to 2.6.0-test2. Testing more, I get the
> problem with or without SMP and with or without
> preemption. Here's a chunk of my log file:
>
> Loosing too many ticks!
> TSC cannot be used as a timesource. (Are you running with SpeedStep?)
> Falling back to a sane timesource.
> psmouse.c: Lost synchronization, throwing 3 bytes away.
> psmouse.c: Lost synchronization, throwing 1 bytes away.
>
> Arrrrgh! The TSC is my only good time source!

Arrrgh! More PS/2 problems!

I think the lost synchronisation is the problem, would you agree?

The person who fixes this gets a Nobel prize.

> Remember that this is a pretty normal system. I have
> a Red Hat 8 install w/ required upgrades, ext3, IDE,
> a 1-GHz Pentium III, a boring VIA chipset, etc.
>
> To reproduce, I do some PS/2 mouse movement while
> doing one of:
>
> a. Lots of concurrent write() and sync() activity to ext3.
> b. Lots of NFSv3 traffic.

ie: lots of interrupt traffic causes the PS2 driver to go whacky?

2003-07-29 12:49:46

by Albert Cahalan

Subject: Re: another must-fix: major PS/2 mouse problem

On Mon, 2003-07-28 at 23:14, Andrew Morton wrote:
> Albert Cahalan <[email protected]> wrote:

> > OK, I did this. Now, in microseconds, I get:
> >
> > ------------------------
> > IRQ use min max
> > --- -------- --- -------
> > 0 timer 40 103968
> > 1 i8042 14 1138 (was 389773)
> > 2 cascade - -
> > 3 - - -
> > 4 serial 29 56
> > 5 uhci-hcd - -
> > 6 - 690 690
> > 7 - 40 40
> > 8 - - -
> > 9 - - -
> > 10 - - -
> > 11 eth0 73 31332 (was 1535331)
> > 12 i8042 18 215 (was 102895)
> > 13 - - -
> > 14 ide0 7 43846
> > 15 ide1 7 12
> > ------------------------
> >
> > boomerang_interrupt itself takes 4 to 59 microseconds.
>
> So this looks OK, yes?

I suppose boomerang_interrupt itself is OK.
Spending 104 ms in IRQ 0, 31 ms in IRQ 11, and
44 ms in IRQ 14 is not at all OK. I was hoping
to get under 200 microseconds for everything.

> (Is that instrumentation patch productisable?
> Looks handy, albeit a subset of microstate accounting)

Not really. I printk() when a value exceeds the
saved maximum, then scan my logs for the first
and last values. There's also hard-coded knowledge
of my 1-GHz CPU, which lets me convert to microseconds
as follows: us = (unsigned)(ns64>>3)/125u;

(that lets me handle up to 32 seconds)
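
The arithmetic can be checked in isolation. At 1 GHz one TSC tick is one nanosecond, so a cycle delta can be treated as nanoseconds. Dividing by 1000 is the same as shifting right by 3 (dividing by 8) and then dividing by 125, and the shift lets the divide run on a 32-bit value. A minimal sketch, where `ns_to_us` is a hypothetical wrapper name and the fixed-width types stand in for `unsigned`:

```c
#include <stdint.h>

/*
 * Convert nanoseconds to microseconds with a single 32-bit divide.
 * ns / 1000 == (ns / 8) / 125; the shift does the divide by 8.
 * Valid while (ns64 >> 3) fits in 32 bits, i.e. below 2^35 ns,
 * which is a little over 34 seconds.
 */
uint32_t ns_to_us(uint64_t ns64)
{
    return (uint32_t)(ns64 >> 3) / 125u;
}
```

For example, 1535331000 ns shifts down to 191916375, and dividing that by 125 gives 1535331 us. Beyond the 32-bit limit the cast silently truncates, so longer intervals would wrap rather than saturate.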

Huh. So the minimum value is really the first value.
Later values could be less, but that's not important.
I suppose that true min/max via a /proc file would
be pretty easy to implement. I like my 1-GHz hack.
I like a TSC that measures in nanoseconds too.
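
A true min/max tracker is indeed small. The following user-space sketch shows only the bookkeeping; it is not the patch described here, and `irq_stat`, `irq_record`, and the 16-entry array are hypothetical names. The caller is assumed to pass the elapsed microseconds measured between handler entry and exit, and exporting the array through /proc is left out.

```c
#include <stdint.h>

#define NR_IRQS_SKETCH 16   /* legacy XT-PIC lines 0..15 */

struct irq_stat {           /* hypothetical per-IRQ record */
    uint32_t count;         /* number of samples seen */
    uint32_t min_us;        /* shortest handler time so far */
    uint32_t max_us;        /* longest handler time so far */
};

struct irq_stat irq_stats[NR_IRQS_SKETCH];

/* Record one handler duration; returns 1 if it set a new maximum. */
int irq_record(unsigned irq, uint32_t elapsed_us)
{
    struct irq_stat *s;
    int new_max = 0;

    if (irq >= NR_IRQS_SKETCH)
        return 0;
    s = &irq_stats[irq];
    if (s->count == 0 || elapsed_us < s->min_us)
        s->min_us = elapsed_us;
    if (s->count == 0 || elapsed_us > s->max_us) {
        s->max_us = elapsed_us;
        new_max = 1;
    }
    s->count++;
    return new_max;
}
```

The return value keeps the existing workflow possible: a caller can still log only when a new maximum appears, while the minimum is now tracked across all samples rather than taken from the first one.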

> > Then I switched to 2.6.0-test2. Testing more, I get the
> > problem with or without SMP and with or without
> > preemption. Here's a chunk of my log file:
> >
> > Loosing too many ticks!
> > TSC cannot be used as a timesource. (Are you running with SpeedStep?)
> > Falling back to a sane timesource.
> > psmouse.c: Lost synchronization, throwing 3 bytes away.
> > psmouse.c: Lost synchronization, throwing 1 bytes away.
> >
> > Arrrrgh! The TSC is my only good time source!
>
> Arrrgh! More PS/2 problems!
>
> I think the lost synchronisation is the problem, would you agree?

It's one problem. It's a problem other people have seen.
My TSC should be good though; I'd like to use it.
At times ntpd (the NTP daemon) gets really unhappy with
the situation, yanking my clock ahead by up to 10 minutes
to compensate for lost time.

> The person who fixes this gets a Nobel prize.
>
> > Remember that this is a pretty normal system. I have
> > a Red Hat 8 install w/ required upgrades, ext3, IDE,
> > a 1-GHz Pentium III, a boring VIA chipset, etc.
> >
> > To reproduce, I do some PS/2 mouse movement while
> > doing one of:
> >
> > a. Lots of concurrent write() and sync() activity to ext3.
> > b. Lots of NFSv3 traffic.
>
> ie: lots of interrupt traffic causes the PS2 driver to go whacky?

I guess so. The ext3+IDE behavior seems to lift the blame
from boomerang_interrupt. Using ext3+IDE, I seem to need
a couple minutes to reproduce the problem. NFSv3+Ethernet
will give me the problem almost instantly.



2003-07-29 19:10:26

by Andrew Morton

Subject: Re: another must-fix: major PS/2 mouse problem

Albert Cahalan <[email protected]> wrote:
>
> > So this looks OK, yes?
>
> I suppose boomerang_interrupt itself is OK.
> Spending 104 ms in IRQ 0, 31 ms in IRQ 11, and
> 44 ms in IRQ 14 is not at all OK. I was hoping
> to get under 200 microseconds for everything.

I misread that.

Last time I checked (which was about 18 months ago) the maximum
interrupts-off time on a 500MHz desktop-class machine was 80 microseconds.

Something is broken there. Do you have another machine to sanity check
against?

2003-07-29 19:47:59

by Zwane Mwaikambo

Subject: Re: another must-fix: major PS/2 mouse problem

On Tue, 29 Jul 2003, Andrew Morton wrote:

> Albert Cahalan <[email protected]> wrote:
> >
> > > So this looks OK, yes?
> >
> > I suppose boomerang_interrupt itself is OK.
> > Spending 104 ms in IRQ 0, 31 ms in IRQ 11, and
> > 44 ms in IRQ 14 is not at all OK. I was hoping
> > to get under 200 microseconds for everything.
>
> I misread that.
>
> Last time I checked (which was about 18 months ago) the maximum
> interrupts-off time on a 500MHz desktop-class machine was 80 microseconds.

IDE has traditionally been a small headache in that department. I need to
find out how it fares in 2.5


--
function.linuxpower.ca

2003-07-29 19:50:14

by Chris Friesen

Subject: Re: another must-fix: major PS/2 mouse problem

Andrew Morton wrote:

> Last time I checked (which was about 18 months ago) the maximum
> interrupts-off time on a 500MHz desktop-class machine was 80 microseconds.

You might want to bump that up a little bit. Querying carrier signal on
a tulip chip is 100usecs with interrupts off.

Doesn't make any difference here though.

Chris


--
Chris Friesen | MailStop: 043/33/F10
Nortel Networks | work: (613) 765-0557
3500 Carling Avenue | fax: (613) 765-2986
Nepean, ON K2H 8E9 Canada | email: [email protected]

2003-07-30 06:29:32

by Pavel Machek

Subject: Re: another must-fix: major PS/2 mouse problem

Hi!

> > Loosing too many ticks!
> > TSC cannot be used as a timesource. (Are you running with SpeedStep?)
> > Falling back to a sane timesource.
> > psmouse.c: Lost synchronization, throwing 3 bytes away.
> > psmouse.c: Lost synchronization, throwing 1 bytes away.
> >
> > Arrrrgh! The TSC is my only good time source!
>
> Arrrgh! More PS/2 problems!
>
> I think the lost synchronisation is the problem, would you agree?
>
> The person who fixes this gets a Nobel prize.


If you set ps/2 synchronization timeout to 20 seconds, you are going to make vojtech
unhappy (he likes that code :-), but at least 2.6.0 will not be worse than 2.4.x...

Do you want me to create a patch?
--
Pavel
Written on sharp zaurus, because my Velo1 broke. If you have Velo you don't need...

2003-07-30 06:39:41

by Andrew Morton

Subject: Re: another must-fix: major PS/2 mouse problem

Pavel Machek <[email protected]> wrote:
>
> Hi!
>
> > > Loosing too many ticks!
> > > TSC cannot be used as a timesource. (Are you running with SpeedStep?)
> > > Falling back to a sane timesource.
> > > psmouse.c: Lost synchronization, throwing 3 bytes away.
> > > psmouse.c: Lost synchronization, throwing 1 bytes away.
> > >
> > > Arrrrgh! The TSC is my only good time source!
> >
> > Arrrgh! More PS/2 problems!
> >
> > I think the lost synchronisation is the problem, would you agree?
> >
> > The person who fixes this gets a Nobel prize.
>
>
> If you set ps/2 synchronization timeout to 20 seconds, you are going to make vojtech
> unhappy (he likes that code :-), but at least 2.6.0 will not be worse than 2.4.x...

2.6 is currently much worse than 2.4: we're buried in what appear to be
many different varieties of PS/2 bug reports.


> Do you want me to create a patch?

Well I do not know what the problem with synchronisation is, nor what
solution you propose.

But yeah, I like patches ;)


2003-07-30 12:38:58

by Albert Cahalan

Subject: Re: another must-fix: major PS/2 mouse problem

On Wed, 2003-07-30 at 01:08, Pavel Machek wrote:
> Hi!
>
> > > Loosing too many ticks!
> > > TSC cannot be used as a timesource. (Are you running with SpeedStep?)
> > > Falling back to a sane timesource.
> > > psmouse.c: Lost synchronization, throwing 3 bytes away.
> > > psmouse.c: Lost synchronization, throwing 1 bytes away.
> > >
> > > Arrrrgh! The TSC is my only good time source!
> >
> > Arrrgh! More PS/2 problems!
> >
> > I think the lost synchronisation is the problem, would you agree?
> >
> > The person who fixes this gets a Nobel prize.
>
>
> If you set ps/2 synchronization timeout to 20 seconds, you are going to make vojtech
> unhappy (he likes that code :-), but at least 2.6.0 will not be worse than 2.4.x...
>
> Do you want me to create a patch?

No. That will just hide one symptom of the problem,
making things more difficult to debug.

It won't fix my clock, which the ntpd program keeps
complaining about. Under heavy load, my clock falls
behind so much that ntpd gives up on the gentle treatment
and just yanks the clock forward by as much as 10 minutes.

It won't make the mouse run well. Maybe you'd stop the
mouse from going crazy from time to time, but there'd
still be temporary freezes from time to time. (not OK!)

It won't convince Linux that my TSC isn't broken.

It won't solve Mikael Pettersson's problem, posted
under the subject "[BUG] 2.6.0-test2 loses time on 486".
He writes:

"My old 486 test box is losing time at an alarming rate
when running 2.6.0-test kernels. It loses almost 2 minutes
per hour, less if it sits idle. This problem does not
occur when it's running a 2.4 kernel."

Gee, I get that too, on a 1 GHz Pentium III. It seems
we're all losing LOTS of clock ticks and other interrupts.

I took the net-related email addresses off the Cc: list.
Please leave me on it so I don't have to break threading.


2003-07-30 19:24:54

by Mikael Pettersson

Subject: Re: another must-fix: major PS/2 mouse problem

On 30 Jul 2003 08:29:32 -0400, Albert Cahalan wrote:
>> > > psmouse.c: Lost synchronization, throwing 3 bytes away.
>> > > psmouse.c: Lost synchronization, throwing 1 bytes away.
>> > >
>> > > Arrrrgh! The TSC is my only good time source!
>> >
>> > Arrrgh! More PS/2 problems!
>> >
>> > I think the lost synchronisation is the problem, would you agree?
>> >
>> > The person who fixes this gets a Nobel prize.
...
>It won't make the mouse run well. Maybe you'd stop the
>mouse from going crazy from time to time, but there'd
>still be temporary freezes from time to time. (not OK!)

FWIW, the problems my Dell Latitude had with the external
mice I use with it were significantly reduced once I added
"psmouse_noext" to the kernel's command line. That one
change eliminated all lost sync messages and general craziness
after resumes from suspended state.

To make the mouse move at proper speed w/o jerkiness I
also had to tweak the rate and scaling programmed into it
to match 2.4 defaults. (rate 100, scale 2:1)

In fairness, only my old Latitude has these PS/2 issues.

/Mikael