2000-11-17 20:28:59

by Tigran Aivazian

Subject: test11-pre6 still very broken

Hi,

The mysterious lockups in test11-pre5 continue in test11-pre6. It is very
difficult because the lockups appear to be kdb-specific (and kdb itself
goes mad) but when there is no kdb there is very little useful information
one can extract from a dead system...

I will start removing kernel subsystems one by one and try to reproduce
it on as plain a kernel as possible (i.e. just io, no networking etc.)

So, this not-very-useful report just says -- test11-pre6 is extremely
unstable, a simple "ltrace ls" can cause a lockup. Also, some programs
work when run normally but coredump (or hang) when run via strace, but
only sometimes, not always... (no, I don't have faulty memory, I run
memtest!)
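
Since the failures are intermittent, a small harness that repeats the same command and tallies the outcomes makes "only sometimes" easier to quantify. The following is a minimal sketch, not something from the original report: the function name, the run count and the timeout are all hypothetical choices.

```python
import subprocess

def run_repeatedly(cmd, runs=20, timeout=10.0):
    """Run cmd `runs` times and tally clean exits, failures, and hangs."""
    tally = {"ok": 0, "failed": 0, "hung": 0}
    for _ in range(runs):
        try:
            result = subprocess.run(
                cmd,
                stdout=subprocess.DEVNULL,
                stderr=subprocess.DEVNULL,
                timeout=timeout,
            )
        except subprocess.TimeoutExpired:
            # The child did not finish in time; subprocess.run kills it for us.
            tally["hung"] += 1
            continue
        # A non-zero status covers both ordinary errors and death by signal
        # (reported as a negative return code on POSIX).
        if result.returncode == 0:
            tally["ok"] += 1
        else:
            tally["failed"] += 1
    return tally
```

It could be called as `run_repeatedly(["ltrace", "ls"])`. It only catches per-process hangs and crashes; a full kernel lockup freezes the harness along with everything else.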

Regards,
Tigran


2000-11-17 20:35:09

by H. Peter Anvin

Subject: Re: test11-pre6 still very broken

Followup to: <[email protected]>
By author: Tigran Aivazian <[email protected]>
In newsgroup: linux.dev.kernel
>
> Hi,
>
> The mysterious lockups in test11-pre5 continue in test11-pre6. It is very
> difficult because the lockups appear to be kdb-specific (and kdb itself
> goes mad) but when there is no kdb there is very little useful information
> one can extract from a dead system...
>
> I will start removing kernel subsystems, one by one and try to reproduce
> it on as plain kernel as possible (i.e. just io, no networking etc.)
>
> So, this not-very-useful report just says -- test11-pre6 is extremely
> unstable, a simple "ltrace ls" can cause a lockup. Also, some programs
> work when run normally but coredump (or hang) when run via strace, but
> only sometimes, not always... (no, I don't have faulty memory, I run
> memtest!)
>

It could be that -test5 and -test6 break some assumption kdb makes.
It has been eminently stable here.

-hpa
--
<[email protected]> at work, <[email protected]> in private!
"Unix gives you enough rope to shoot yourself in the foot."
http://www.zytor.com/~hpa/puzzle.txt

2000-11-18 00:59:57

by Keith Owens

Subject: Re: test11-pre6 still very broken

On Fri, 17 Nov 2000 20:00:49 +0000 (GMT),
Tigran Aivazian <[email protected]> wrote:
>The mysterious lockups in test11-pre5 continue in test11-pre6. It is very
>difficult because the lockups appear to be kdb-specific (and kdb itself
>goes mad) but when there is no kdb there is very little useful information
>one can extract from a dead system...

Race in user space debug registers versus kdb. Only appears on SMP
systems. Working on fix.

2000-11-18 05:09:37

by Keith Owens

Subject: Re: test11-pre6 still very broken

On Fri, 17 Nov 2000 20:00:49 +0000 (GMT),
Tigran Aivazian <[email protected]> wrote:
>The mysterious lockups in test11-pre5 continue in test11-pre6. It is very
>difficult because the lockups appear to be kdb-specific (and kdb itself
>goes mad) but when there is no kdb there is very little useful information
>one can extract from a dead system...

ftp://oss.sgi.com/projects/kdb/download/ix86/kdb-v1.6-2.4.0-test11-pre7.gz

Assorted bug fixes from my work in progress tree, including one that
fixes a race between user space use of debug and kdb, ltrace trips this.

Some people have reported keyboard lockups after leaving kdb. I have
not been able to reproduce this problem, let me know if you still see
it.

2000-11-18 05:58:03

by David Ford

Subject: Re: test11-pre6 still very broken

> > The mysterious lockups in test11-pre5 continue in test11-pre6. It is very
> > difficult because the lockups appear to be kdb-specific (and kdb itself

[...]

> It could be that -test5 and -test6 break some assumption kdb makes.
> It has been eminently stable here.

Whether or not the assumptions are made, the last testN series of kernels have
introduced two serious issues IMO. The first is the mysterious deadlocks and
no way to figure them out. With kdb the machine won't hang. Without it, it
deadlocks within 36 hours.

The second issue is usb. I now have two machines that lockup on boot in USB.
One is the above workstation, the second is a Compaq laptop. Unfortunately
I have no way of unplugging the USB hardware inside the laptop :P

-d



2000-11-18 07:02:13

by Greg KH

Subject: Re: test11-pre6 still very broken

On Fri, Nov 17, 2000 at 09:27:19PM -0800, David Ford wrote:
>
> The second issue is usb. I now have two machines that lockup on boot in USB.
> One is the above workstation, the second is a Compaq laptop. Unfortunately
> I have no way of unplugging the USB hardware inside the laptop :P

Can't you not compile in the UHCI driver? Actually, it seems odd that a
Compaq laptop would have a uhci driver, as Compaq was one of the OHCI
creators...

greg k-h


--
greg@(kroah|wirex).com
http://immunix.org/~greg

2000-11-18 07:44:47

by Jeff Garzik

Subject: Re: test11-pre6 still very broken

Greg KH wrote:
>
> On Fri, Nov 17, 2000 at 09:27:19PM -0800, David Ford wrote:
> >
> > The second issue is usb. I now have two machines that lockup on boot in USB.
> > One is the above workstation, the second is a Compaq laptop. Unfortunately
> > I have no way of unplugging the USB hardware inside the laptop :P
>
> Can't you not compile in the UHCI driver? Actually, it seems odd that a
> Compaq laptop would have a uhci driver, as Compaq was one of the OHCI
> creators...

It's quite common actually. Many newer Compaq laptops, including my
dinky Presario, use Via for their north/southbridge chipset.

Jeff


--
Jeff Garzik |
Building 1024 | The chief enemy of creativity is "good" sense
MandrakeSoft | -- Picasso

2000-11-18 07:57:30

by Ben Ford

Subject: Re: test11-pre6 still very broken

Here is lspci output from the laptop in question. Is this not UHCI?

[ben@Juanita ben]$ /sbin/lspci
00:00.0 Host bridge: Intel Corporation 440BX/ZX - 82443BX/ZX Host bridge (rev 03)
00:01.0 PCI bridge: Intel Corporation 440BX/ZX - 82443BX/ZX AGP bridge (rev 03)
00:07.0 ISA bridge: Intel Corporation 82371AB PIIX4 ISA (rev 02)
00:07.1 IDE interface: Intel Corporation 82371AB PIIX4 IDE (rev 01)
00:07.2 USB Controller: Intel Corporation 82371AB PIIX4 USB (rev 01)
00:07.3 Bridge: Intel Corporation 82371AB PIIX4 ACPI (rev 03)
00:09.0 Communication controller: Lucent Microelectronics 56k WinModem (rev 01)
00:0a.0 CardBus bridge: Texas Instruments: Unknown device ac50 (rev 01)
00:11.0 Multimedia audio controller: ESS Technology ES1969 Solo-1 Audiodrive (rev 02)
00:12.0 Ethernet controller: Intel Corporation 82557 [Ethernet Pro 100] (rev 08)
01:00.0 VGA compatible controller: ATI Technologies Inc 3D Rage P/M Mobility AGP 2x (rev 64)

The machine hangs on warm reboot almost every time. On cold boot, it never has
the problem.

-b



Greg KH wrote:

> On Fri, Nov 17, 2000 at 09:27:19PM -0800, David Ford wrote:
> >
> > The second issue is usb. I now have two machines that lockup on boot in USB.
> > One is the above workstation, the second is a Compaq laptop. Unfortunately
> > I have no way of unplugging the USB hardware inside the laptop :P
>
> Can't you not compile in the UHCI driver? Actually, it seems odd that a
> Compaq laptop would have a uhci driver, as Compaq was one of the OHCI
> creators...
>
> greg k-h
>
> --
> greg@(kroah|wirex).com
> http://immunix.org/~greg



2000-11-18 08:26:54

by Greg KH

Subject: Re: test11-pre6 still very broken

On Fri, Nov 17, 2000 at 11:25:50PM -0800, Ben Ford wrote:
> Here is lspci output from the laptop in question. Is this not UHCI?

Yes it is. Just a bit funny if you think about it, but with Intel and
Via putting the UHCI core into their chipsets I guess it makes sense.
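
The controller type does not have to be guessed from the vendor name. A PCI USB host controller reports base class 0x0C and subclass 0x03, and the programming-interface byte (config space offset 0x09) distinguishes the designs. The following is a minimal sketch of that lookup; the function and table names are made up for illustration.

```python
# Programming-interface values for PCI class 0x0C (serial bus), subclass 0x03 (USB).
USB_PROG_IF = {0x00: "UHCI", 0x10: "OHCI", 0x20: "EHCI"}

def usb_controller_type(class_code):
    """Classify a 24-bit PCI class code (base class, subclass, prog-if).

    Returns "UHCI", "OHCI" or "EHCI" for known USB host controllers,
    "unknown" for other USB prog-if values, and None for non-USB devices.
    """
    base = (class_code >> 16) & 0xFF
    sub = (class_code >> 8) & 0xFF
    prog_if = class_code & 0xFF
    if (base, sub) != (0x0C, 0x03):
        return None
    return USB_PROG_IF.get(prog_if, "unknown")
```

A PIIX4 USB function like the one at 00:07.2 above would be expected to report class code 0x0C0300, which this maps to UHCI.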

One note for the archives, if you are presented a choice between an OHCI
or a UHCI controller, go for the OHCI. It has a "cleaner" interface,
handles more of the logic in the silicon, and due to this provides
faster transfers.

In its defense, the UHCI design was the first one, and OHCI
capitalized on that by fixing some of its weaknesses. Hopefully the
same thing will not happen for USB 2.0, and everyone will like EHCI.


greg k-h
(who has UHCI in all of his machines except one.)

--
greg@(kroah|wirex).com
http://immunix.org/~greg

2000-11-18 18:48:10

by Linus Torvalds

Subject: Re: test11-pre6 still very broken

In article <[email protected]>, Greg KH <[email protected]> wrote:
>On Fri, Nov 17, 2000 at 11:25:50PM -0800, Ben Ford wrote:
>> Here is lspci output from the laptop in question. Is this not UHCI?
>
>Yes it is. Just a bit funny if you think about it, but with Intel and
>Via putting the UHCI core into their chipsets I guess it makes sense.
>
>One note for the archives, if you are presented a choice between an OHCI
>or a UHCI controller, go for the OHCI. It has a "cleaner" interface,
>handles more of the logic in the silicon, and due to this provides
>faster transfers.

I'd disagree. UHCI has tons of advantages, not the least of which is
that it was there first and is widely available. If OHCI hadn't been
done we'd have _one_ nice good USB controller implementation instead of
fighting stupid and unnecessary battles that shouldn't have existed in
the first place.

For example, the UHCI root hub can be controlled without DMA, which
makes it a lot cheaper on the system. When a UHCI system is unconnected
and idle, it doesn't waste cycles on extra memory traffic the way OHCI
does.
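
To illustrate the "without DMA" point: on UHCI the root hub ports are plain I/O-space registers (PORTSC1 and PORTSC2, at offsets 0x10 and 0x12 from the I/O base), so port state can be read with an ordinary port read. The following is a minimal sketch that decodes such a 16-bit value. The bit positions are given as recalled from the UHCI design guide and should be checked against the spec; the dictionary keys are invented names.

```python
# Bit positions in a UHCI PORTSC register (to be verified against the spec).
PORTSC_BITS = {
    0: "connected",        # current connect status
    1: "connect_changed",  # connect status change
    2: "enabled",          # port enabled
    3: "enable_changed",   # port enable change
    6: "resume_detect",
    8: "low_speed",        # low-speed device attached
    9: "in_reset",         # port reset asserted
    12: "suspended",
}

def decode_portsc(value):
    """Turn a raw 16-bit PORTSC value into a dict of named boolean flags."""
    return {name: bool(value & (1 << bit)) for bit, name in PORTSC_BITS.items()}
```

For example, `decode_portsc(0x0105)` reports a connected, enabled, low-speed device and leaves every other flag false.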

UHCI also requires fewer transistors, and is the more common by far
simply because Intel is good at getting their chipsets out.

Basically, the advantages of OHCI are not worth the differentiation, and
are not always advantages at all. Many people think that it is "good"
that the root hub looks more like a regular hub, but that's just wrong.

Especially with faster speeds, the memory pressure of the USB controller
is going to be noticeable, and it would be much preferable if the root
directory of the USB tree would be separated out (and cached in the
controller) by the root hub. The UHCI approach of making the root a bit
special should be taken _further_, and not seen as a mistake.

I hope EHCI makes it all moot. Some way or another.

Linus

2000-11-19 20:35:29

by Pavel Machek

Subject: Re: test11-pre6 still very broken

Hi!

> >One note for the archives, if you are presented a choice between an OHCI
> >or a UHCI controller, go for the OHCI. It has a "cleaner" interface,
> >handles more of the logic in the silicon, and due to this provides
> >faster transfers.
>
> I'd disagree. UHCI has tons of advantages, not the least of which is
> that it was there first and is widely available. If OHCI hadn't been
> done we'd have _one_ nice good USB controller implementation instead of
> fighting stupid and unnecessary battles that shouldn't have existed in
> the first place.

UHCI has one bad disadvantage: the way it is designed, you can choose
either slow USB or slow system.

If you are doing bulk USB transfers at high speed (faster than an ISDN modem,
or so), you need to make a loop in the command "tree", which hogs your
PCI bus (leading to slow overall performance). It is called FSBR and it's
ugly. 50% system slowdown due to stupid UHCI.
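
A toy model shows why the loop is expensive. Without it, the controller walks the queue list once per 1 ms frame and then goes idle. With the last queue linked back to the first, it keeps re-fetching queue heads over PCI until the frame ends. This is a deliberately crude sketch: the per-fetch cost is a hypothetical number, every queue is assumed idle, and real hardware spends part of the frame on USB transactions and bus arbitration, so measured figures will be lower.

```python
FRAME_US = 1000.0  # one full-speed USB frame lasts 1 ms

def walk_frame(num_queues, fetch_us, fsbr):
    """Simulate one frame of an idle schedule.

    num_queues: queue heads in the schedule (at least 1)
    fetch_us:   assumed PCI cost of fetching one queue head, in microseconds
    fsbr:       if True, the last queue links back to the first (the loop)
    Returns (number of fetches, fraction of the frame spent fetching).
    """
    if num_queues < 1:
        raise ValueError("need at least one queue head")
    # next_index[i] is the queue visited after queue i; None ends the walk.
    next_index = [i + 1 for i in range(num_queues - 1)]
    next_index.append(0 if fsbr else None)

    fetches = 0
    elapsed = 0.0
    current = 0
    while current is not None and elapsed + fetch_us <= FRAME_US:
        fetches += 1
        elapsed += fetch_us
        current = next_index[current]
    return fetches, elapsed / FRAME_US
```

With four idle queues and an assumed 0.5 us per fetch, the straight list costs 4 fetches per frame, while the looped list costs 2000 and keeps the controller fetching for the whole frame.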
Pavel
--
Philips Velo 1: 1"x4"x8", 300gram, 60, 12MB, 40bogomips, linux, mutt,
details at http://atrey.karlin.mff.cuni.cz/~pavel/velo/index.html.

2000-11-20 13:07:48

by Thomas Sailer

Subject: Re: test11-pre6 still very broken

Linus Torvalds wrote:

> I'd disagree. UHCI has tons of advantages, not the least of which is
> that it was there first and is widely available. If OHCI hadn't been
> done we'd have _one_ nice good USB controller implementation instead of

UHCI has a couple of disadvantages, though (and some of them could have
been fixed with only very few added gates).

For example:

- one cannot reliably unlink transfer buffers from the queues without
waiting 1ms
- bandwidth reclamation can be a real PCI hog
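
The 1 ms wait in the first point comes from the frame length: the controller may still hold a pointer it fetched earlier in the current frame, so a driver can only reuse the memory once the frame counter has moved on. The following is a minimal sketch of that deferral, with hypothetical class and method names and no claim to match any particular driver.

```python
class UnlinkQueue:
    """Hold unlinked descriptors until the frame counter has advanced."""

    def __init__(self):
        # Each entry is (descriptor, frame number at the time of unlinking).
        self.pending = []

    def unlink(self, descriptor, current_frame):
        """Record a descriptor that was just removed from the schedule."""
        self.pending.append((descriptor, current_frame))

    def reap(self, current_frame):
        """Return descriptors that are now safe to free.

        Intended to be called once per frame. A descriptor is released only
        when the frame number differs from the one it was unlinked in, which
        also copes with the hardware frame counter wrapping around.
        """
        safe = [d for d, frame in self.pending if frame != current_frame]
        self.pending = [(d, frame) for d, frame in self.pending
                        if frame == current_frame]
        return safe
```

A descriptor unlinked in frame 5 is still held by `reap(5)` and only handed back by `reap(6)`.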

> I hope EHCI makes it all moot. Some way or another.

Only for USB2 devices. EHCI is supposed to be paired with an existing
UHCI or OHCI controller core that is supposed to take over the USB connector
if a USB 1.x hub or device is plugged in. So we end up needing to support
UHCI and OHCI for a very long time; I don't see mice and keyboards going
USB2 anytime soon 8-)
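
The routing described above can be sketched as a small decision function. This is only an illustration for a device plugged directly into a root port; the function name and the speed strings are made up here, and the authoritative rules are in the EHCI spec.

```python
def port_owner(device_speed):
    """Pick the controller that drives a root port for a directly attached device.

    device_speed: "high" for USB 2.0 high speed, "full" or "low" for USB 1.x.
    Returns "EHCI" or "companion" (the paired UHCI or OHCI core).
    """
    if device_speed == "high":
        return "EHCI"
    if device_speed in ("full", "low"):
        # The port is handed over to the companion UHCI/OHCI controller.
        return "companion"
    raise ValueError("unknown device speed: %r" % (device_speed,))
```

So a full-speed keyboard ends up on the companion controller, which is why the UHCI and OHCI drivers stay relevant.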

Tom

2000-11-21 12:02:59

by Vojtech Pavlik

Subject: Re: test11-pre6 still very broken

On Mon, Nov 20, 2000 at 01:37:23PM +0100, Thomas Sailer wrote:

> > I hope EHCI makes it all moot. Some way or another.
>
> Only for USB2 devices. EHCI is supposed to be paired with an existing
> UHCI or OHCI controller core that is supposed to take over the USB connector
> if a USB 1.x hub or device is plugged in. So we end up needing to support
> UHCI and OHCI for a very long time; I don't see mice and keyboards going
> USB2 anytime soon 8-)

Oops? I thought the paired controller there is for OSes not being able
to handle EHCI yet? So that USB works even for those ... I think EHCI
should handle even 1.x devices ... I may be wrong, though.

--
Vojtech Pavlik
SuSE Labs

2000-11-21 12:05:39

by Thomas Sailer

Subject: Re: test11-pre6 still very broken

Vojtech Pavlik wrote:

> Oops? I thought the paired controller there is for OSes not being able
> to handle EHCI yet? So that USB works even for those ... I think EHCI
> should handle even 1.x devices ... I may be wrong, though.

Check the Intel EHCI spec. Esp. the chapter about port handover...

Tom