2002-09-09 22:16:14

by Greg KH

[permalink] [raw]
Subject: [BK PATCH] USB changes for 2.5.34

This includes the previous USB changes I sent out for 2.5.33 but never
made it into your tree.

Pull from: bk://linuxusb.bkbits.net/linus-2.5

thanks,

greg k-h

drivers/usb/core/hcd.c | 21 +-
drivers/usb/core/inode.c | 43 +++--
drivers/usb/host/ehci-dbg.c | 141 +++++++++-------
drivers/usb/host/ehci-hcd.c | 117 +++++++++----
drivers/usb/host/ehci-hub.c | 13 +
drivers/usb/host/ehci-q.c | 107 +++++-------
drivers/usb/host/ehci-sched.c | 32 +--
drivers/usb/host/ehci.h | 24 ++
drivers/usb/host/ohci-dbg.c | 256 ++++++++++++++++++++++++++----
drivers/usb/host/ohci-hcd.c | 79 ++++-----
drivers/usb/host/ohci-mem.c | 27 ---
drivers/usb/host/ohci-q.c | 314 +++++++++++++++++++------------------
drivers/usb/host/ohci.h | 27 +--
drivers/usb/host/uhci-hcd.c | 187 ++++++++++------------
drivers/usb/host/uhci-hcd.h | 43 ++++-
drivers/usb/image/scanner.c | 6
drivers/usb/media/se401.c | 128 +++++++--------
drivers/usb/media/se401.h | 1
drivers/usb/net/kaweth.c | 2
drivers/usb/net/pegasus.h | 6
drivers/usb/net/usbnet.c | 5
drivers/usb/storage/protocol.c | 69 ++++++--
drivers/usb/storage/transport.c | 5
drivers/usb/storage/unusual_devs.h | 14 +
drivers/usb/storage/usb.c | 8
drivers/usb/storage/usb.h | 1
26 files changed, 1018 insertions(+), 658 deletions(-)
-----

[email protected], 2002-09-09 14:21:49-07:00, [email protected]
[PATCH] USB: se401 driver update


drivers/usb/media/se401.c | 128 +++++++++++++++++++++++-----------------------
drivers/usb/media/se401.h | 1
2 files changed, 65 insertions(+), 64 deletions(-)
------

[email protected], 2002-09-09 13:59:18-07:00, [email protected]
[PATCH] [PATCH 2.5.33+] ohci and iso-in

I added a bug in 2.5.23 when cleaning up something that
was broken ... it wasn't broken in quite the way I had
thought at the time!

This fixes a problem some folk have reported recently
with ISO-IN, by masking a common non-error outcome.

Please merge to Linus' tree, on top of the one patch
you already have queued. Thanks to Nemosoft for such
quick turnaround on testing!

drivers/usb/host/ohci-q.c | 6 +++++-
1 files changed, 5 insertions(+), 1 deletion(-)
------

[email protected], 2002-09-09 13:58:51-07:00, [email protected]
[PATCH] [PATCH] (repost) fix for big endian machines in scanner.c

This patch fixes a problem with big endian machines and scanner drivers which
use the SCANNER_IOCTL_CTRLMSG ioctl. The big endian to little endian swap was
done twice, resulting in a no-op.

drivers/usb/image/scanner.c | 6 +++---
1 files changed, 3 insertions(+), 3 deletions(-)
------

[email protected], 2002-09-09 13:58:32-07:00, [email protected]
[PATCH] USB storage: abort bug fix

Also, have you sent in the one-line fix I found for the abort bug?
Andries found that it cured his BUG_ON problem. In case you didn't save a
copy of it, I've included it below.

drivers/usb/storage/transport.c | 3 +++
1 files changed, 3 insertions(+)
------

[email protected], 2002-09-05 08:40:44-07:00, [email protected]
[PATCH] usbnet, add YOPY device IDs

A now-happy Yopy user sent me these IDs.

drivers/usb/net/usbnet.c | 5 +++++
1 files changed, 5 insertions(+)
------

[email protected], 2002-09-05 08:40:25-07:00, [email protected]
[PATCH] two byte offset for kaweth

this is the two byte offset patch to kaweth for 2.5
to prevent MIPS crashing and speed up other arches.

drivers/usb/net/kaweth.c | 2 ++
1 files changed, 2 insertions(+)
------

[email protected], 2002-09-05 08:33:20-07:00, [email protected]
USB: storage driver: replace show_trace() with BUG()

drivers/usb/storage/transport.c | 2 +-
1 files changed, 1 insertion(+), 1 deletion(-)
------

[email protected], 2002-09-04 17:35:36-07:00, [email protected]
[PATCH] Feiya 5-in-1 Card Reader

I have a USB 5-in-1 Card Reader, that will read CF and SM and SD/MMC.
Under Linux it appears as three SCSI devices.
For today, the report is on the CF part.

The CF part works fine under ordinary usb-storage SCSI simulation,
with one small problem: 8 and 32 MB cards, that are detected as
having 15872 and 63488 sectors by other readers, are detected as
having 15873 and 63489 sectors by this Feiya reader
(0x090c / 0x1132).
In the good old days probably nobody would have noticed, but these
days the partition reading code also wants to read the last sector.
This results in the SCSI code taking the device off line:

[USB storage does a READ_10, which fails since the sector is past
the end of the disk. Then it tries a READ_6 and nothing ever happens,
probably because the device does not support READ_6. Then the
error handler does an abort which triggers some bugs in scsiglue.c
and transport.c, then the error handler does a device reset, then
a host reset, then a bus reset, and finally the device is taken offline.]

The patch below does not address any bugs in the SCSI error code
(a big improvement would be just to rip it all out - this error code
never achieves anything useful but has crashed many a machine)
and does not fix the USB code either.
It just adds a flag to the unusual_devices section mentioning that
this device (my revision is 1.00) has this bug.

Without the patch the kernel crashes, or insmod usb-storage hangs.
With the patch the CF part of the device works perfectly.

(Another change is to only print "Fixing INQUIRY data" when
really something is changed, not when the data was OK already.)

Andries

drivers/usb/storage/protocol.c | 69 ++++++++++++++++++++++++++++---------
drivers/usb/storage/unusual_devs.h | 7 +++
drivers/usb/storage/usb.h | 1
3 files changed, 62 insertions(+), 15 deletions(-)
------

[email protected], 2002-09-04 12:08:14-07:00, [email protected]
USB: clean up the error path in create_special_files() for usbfs

Thanks to David Brownell for pointing out the problem here.

drivers/usb/core/inode.c | 42 ++++++++++++++++++++++++++++++++----------
1 files changed, 32 insertions(+), 10 deletions(-)
------

[email protected], 2002-09-04 11:16:26-07:00, [email protected]
[PATCH] 2.5.32-usb

This patch appears not to be in 2.5.32, but applies cleanly.

The following patch fixes 3 problems in USB:

1. Don't pci_map buffers when we know we're not going to pass them
to a device.

This was first noticed on ARM (no surprises here); the root hub
code, rh_call_control(), placed data into the buffer and then
called usb_hcd_giveback_urb(). This function called
pci_unmap_single() on this region which promptly destroyed the
data that rh_call_control() had placed there. This lead to a
corrupted device descriptor and the "too many configurations"
message.

2. If controller->hcca is NULL, don't try to dereference it.

3. If we free the root hub (in ohci-hcd.c or uhci-hcd.c), don't
leave a dangling pointer around to trip us up in usb_disconnect().
EHCI appears to get this right.

drivers/usb/core/hcd.c | 21 +++++++++++----------
drivers/usb/host/ohci-dbg.c | 3 ++-
drivers/usb/host/ohci-hcd.c | 3 ++-
drivers/usb/host/uhci-hcd.c | 1 +
4 files changed, 16 insertions(+), 12 deletions(-)
------

[email protected], 2002-09-04 11:04:10-07:00, [email protected]
USB: remove __NO_VERSION__

Thanks to Rusty "trivial" Russell

drivers/usb/core/inode.c | 1 -
1 files changed, 1 deletion(-)
------

[email protected], 2002-09-04 09:54:02-07:00, [email protected]
[PATCH] USB: pegasus driver patch

one more adapter, changed company name and forgotten flag

drivers/usb/net/pegasus.h | 6 ++++--
1 files changed, 4 insertions(+), 2 deletions(-)
------

[email protected], 2002-09-04 09:21:00-07:00, [email protected]
[PATCH] ohci-hcd endpoint scheduling, driverfs

This patch cleans up some messy parts of this driver, and
was pleasantly painless.

- gets rid of ED dma hashtables
* less memory needed
* also less (+faster) code
* ... rewrites all ED scheduling ops, they now use
cpu addresses, like EHCI and UHCI do already

- simplifies ED scheduling (no dma hashtables)
* control and bulk lists are now doubly linked
* periodic tree still singly linked; driver uses a
new CPU view "shadow" of the hardware framelist
* previous periodic code was cryptic, almost read-only
* simpler tree code for EDs with {branch,period}

- bugfixes periodic scheduling
* when CONFIG_USB_BANDWIDTH, checks per-frame load
against the limit; no more dodgey accounting
* handles iso period != 1; interrupt and iso schedule
EDs with the same routine (HW sees special TDs)
* credit usbfs with bandwidth for endpoints, not URBs

- adds driverfs output (when CONFIG_USB_DEBUG)
* resembles EHCI: 'async' (control+bulk) and
'periodic' (interrupt+iso) files show schedules
* shows only queue heads (EDs) just now (*)

- has minor text and code cleanups, etc

Now that this logic has morphed into more comprehensible
form, I know what to borrow into the EHCI code!


(*) It shows TDs on the td_list, but this patch won't
put them there. A queue fault handling update will.

drivers/usb/host/ohci-dbg.c | 253 +++++++++++++++++++++++++++++++-----
drivers/usb/host/ohci-hcd.c | 67 ++++-----
drivers/usb/host/ohci-mem.c | 27 ---
drivers/usb/host/ohci-q.c | 308 +++++++++++++++++++++++---------------------
drivers/usb/host/ohci.h | 27 +--
5 files changed, 425 insertions(+), 257 deletions(-)
------

[email protected], 2002-09-04 09:18:15-07:00, [email protected]
[PATCH] PATCH: usb-storage: fix software eject

This patch fixes the recently broken software eject of media. At least, it
should. I'm back to having compile problems again, but the fix should
be pretty self-evident.

drivers/usb/storage/usb.c | 8 +++++++-
1 files changed, 7 insertions(+), 1 deletion(-)
------

[email protected], 2002-09-04 09:16:59-07:00, [email protected]
[PATCH] Re: updated ehci patch ...

* keep watchdog on shorter leash, and just do
standard irq processing when it barks. this
means I can use a somewhat iffy vt8235 mobo.

* updates to the driverfs debug output, including
using S_IRUGO so anyone can gawk.

* some updates, mostly to use a new hcd_to_bus(),
so this version also compiles on a (slightly
patched) 2.4.20-pre5 kernel. (*)

drivers/usb/host/ehci-dbg.c | 141 +++++++++++++++++++++++-------------------
drivers/usb/host/ehci-hcd.c | 53 +++++++++------
drivers/usb/host/ehci-hub.c | 13 ++-
drivers/usb/host/ehci-q.c | 9 +-
drivers/usb/host/ehci-sched.c | 6 -
drivers/usb/host/ehci.h | 22 ++++++
6 files changed, 150 insertions(+), 94 deletions(-)
------

[email protected], 2002-09-04 09:16:29-07:00, [email protected]
[PATCH] ehci locking

I've been chasing problems on a KT333 based system, with
the 8253 southbridge and EHCI 1.0 (!), and this fixes at
least some of them:

- locking updates:
* a few routines weren't protected right
* less irqsave thrashing for schedule lock

- adds a watchdog timer that should fire when the
STS_IAA interrupt seems to be missing.

- gives ports back to companion UHCI/OHCI on rmmod

- re-enables faulted QH only after all its completion
callbacks have done their work

- removes an oops I've seen when usb-storage unlinks
stuff. (it seemed confused about error handling, but
that's not a reason to oops.)

- minor cleanup: deadcode rm, etc

Right now the watchdog just barks, and that mechanism might
go away (or into the shared hcd code). Sometimes the issue
it reports seems to clear up by itself, but sometimes not...

drivers/usb/host/ehci-hcd.c | 64 ++++++++++++++++++++++-----
drivers/usb/host/ehci-q.c | 98 +++++++++++++++++-------------------------
drivers/usb/host/ehci-sched.c | 26 ++++-------
drivers/usb/host/ehci.h | 2
4 files changed, 105 insertions(+), 85 deletions(-)
------

[email protected], 2002-09-04 09:06:34-07:00, [email protected]
[PATCH] Lexar USB CF Reader

Two weeks ago I sent this patch to the listed USB storage maintainer
([email protected]) and have not yet heard back. The
attached patch adds support for the Lexar USB CF Reader identified by
id_product 0xb002, version 0x0113 (which is the version I have). This
patch is against the 2.4.19 kernel, sorry if this is the wrong address
to send this stuff to. Thanks.

drivers/usb/storage/unusual_devs.h | 7 +++++++
1 files changed, 7 insertions(+)
------

[email protected], 2002-09-04 09:06:07-07:00, [email protected]
[PATCH] pci_free_consistent on ohci initialisation failure

The trace at the end of the message shows the init failure.

drivers/usb/host/ohci-hcd.c | 9 ++++++---
1 files changed, 6 insertions(+), 3 deletions(-)
------

[email protected], 2002-09-04 09:05:37-07:00, [email protected]
[PATCH] Re: [patch 2.5.31-bk5] uhci, misc

This patch has some small UHCI bugfixes

- on submit error, frees memory and (!) returns error code
- root hub should disconnect only once
- pci pool code shouldn't be given GFP_DMA
- uses del_timer_sync(), which behaves on SMP, not del_timer()

and cleanups:

- use container_of
- doesn't replicate so much hcd state
- no such status -ECONNABORTED
- uses bus_name in procfs, not "hc0", "hc1" etc

drivers/usb/host/uhci-hcd.c | 101 ++++++++++++++++----------------------------
drivers/usb/host/uhci-hcd.h | 8 ---
2 files changed, 38 insertions(+), 71 deletions(-)
------

[email protected], 2002-09-04 09:05:07-07:00, [email protected]
[PATCH] uhci, doc + cleanup

Another UHCI patch. I'm sending this since Dan said he was going to
start teaching "uhci-hcd" how to do control and interrupt queueing,
and this may help. Granted it checks out (I didn't test the part
that has a chance to break, though it "looks right"), I think it
should get merged in at some point. What it does:

- updates and adds some comments/docs
- gets rid of a "magic number" calling convention, instead passing
an explicit flag UHCI_PTR_DEPTH or UHCI_PTR_BREADTH (self-doc :)
- deletes bits of unused/dead code
- updates the append-to-qh code:
* start using list_for_each() ... clearer than handcrafted
loops, and it prefetches too. Lots of places should get
updated to do this, IMO.
* re-orders some stuff to fix a sequencing problem
* adds ascii-art to show how the urb queueing is done
(based on some email Johannes sent me recently)

That sequencing problem is that when splicing a QH between A and B,
it currently splices A-->QH before QH-->B ... so that if the HC is
looking at that chunk of schedule at that time, everything starting
at B will be ignored during the rest of that frame. (Since the QH
is initted to have UHCI_PTR_TERM next, stopping the schedule scan.)

I said "problem" not "bug" since in the current code it would probably
(what does that "PIIX bug" do??) just reduce control/bulk throughput.
That's because the logic is only appending towards the end of each
frame's schedule, where the FSBR loopback kicks in.

drivers/usb/host/uhci-hcd.c | 85 ++++++++++++++++++++++++--------------------
drivers/usb/host/uhci-hcd.h | 35 +++++++++++++++++-
2 files changed, 82 insertions(+), 38 deletions(-)
------


2002-09-10 00:09:25

by Linus Torvalds

[permalink] [raw]
Subject: Re: [BK PATCH] USB changes for 2.5.34


Greg, please don't do this

> [email protected], 2002-09-05 08:33:20-07:00, [email protected]
> USB: storage driver: replace show_trace() with BUG()

that BUG() thing is _way_ out of line, and has killed a few of my machines
several times for no good reason. It actively hurts debuggability, because
the machine is totally dead after it, and the whole and ONLY point of
BUG() messages is to help debugging and make it clear that we can't handle
something.

In this case, we _can_ handle it, and we're much better off with a machine
that works and that you can look up the messages with than killing it.

Rule of thumb: BUG() is only good for something that never happens and
that we really have no other option for (ie state is so corrupt that
continuing is deadly).

Linus

2002-09-10 00:17:58

by Greg KH

[permalink] [raw]
Subject: Re: [BK PATCH] USB changes for 2.5.34

On Mon, Sep 09, 2002 at 05:17:40PM -0700, Linus Torvalds wrote:
>
> Greg, please don't do this
>
> > [email protected], 2002-09-05 08:33:20-07:00, [email protected]
> > USB: storage driver: replace show_trace() with BUG()
>
> that BUG() thing is _way_ out of line, and has killed a few of my machines
> several times for no good reason. It actively hurts debuggability, because
> the machine is totally dead after it, and the whole and ONLY point of
> BUG() messages is to help debugging and make it clear that we can't handle
> something.
>
> In this case, we _can_ handle it, and we're much better off with a machine
> that works and that you can look up the messages with than killing it.
>
> Rule of thumb: BUG() is only good for something that never happens and
> that we really have no other option for (ie state is so corrupt that
> continuing is deadly).

Sorry, Matt told me to add it, I didn't realize the background. Should
I leave it as show_trace(), or just remove it? Do you want me to send
you another changeset to put it back?

thanks,

greg k-h

2002-09-10 00:22:30

by Linus Torvalds

[permalink] [raw]
Subject: Re: [BK PATCH] USB changes for 2.5.34


On Mon, 9 Sep 2002, Greg KH wrote:
>
> Sorry, Matt told me to add it, I didn't realize the background. Should
> I leave it as show_trace(), or just remove it? Do you want me to send
> you another changeset to put it back?

I've just excluded it, with a comment to never _ever_ kill the machine
unless there is a major reason for it.

We might want to add some "weaker" form of BUG_ON() for sanity checks that
aren't life-threatening (ie a "CHECK(a == b)" kind of debug facility) that
would be prettier than doing printk's and show_trace(), and that would
also be easier to disable for production kernels (not that I personally
much believe in disabling debugging like that - if it really isn't needed
it should be removed, not disabled).

Linus

2002-09-10 00:30:49

by Nicholas Miell

[permalink] [raw]
Subject: Re: [BK PATCH] USB changes for 2.5.34

On Mon, 2002-09-09 at 17:17, Linus Torvalds wrote:
>
> Greg, please don't do this
>
> > [email protected], 2002-09-05 08:33:20-07:00, [email protected]
> > USB: storage driver: replace show_trace() with BUG()
>
> [ cut "BUG() is for fatal errors only" ]
>
> Linus
>

show_trace isn't exported for modules, and (even worse) isn't even
implemented on all architectures, IIRC.
- Nicholas

2002-09-10 00:36:14

by Alan

[permalink] [raw]
Subject: Re: [linux-usb-devel] Re: [BK PATCH] USB changes for 2.5.34

On Tue, 2002-09-10 at 01:30, Linus Torvalds wrote:
> We might want to add some "weaker" form of BUG_ON() for sanity checks that
> aren't life-threatening (ie a "CHECK(a == b)" kind of debug facility) that
> would be prettier than doing printk's and show_trace(), and that would
> also be easier to disable for production kernels (not that I personally
> much believe in disabling debugging like that - if it really isn't needed
> it should be removed, not disabled).

I'd have thought you may well want the reverse. If the user didnt pick
the kernel debugging, don't die on software check option you want to
blow up. If they are debugging or its < 2.6.0-rc1 you want it to show
the stack and keep going


2002-09-10 01:22:25

by Linus Torvalds

[permalink] [raw]
Subject: Re: [BK PATCH] USB changes for 2.5.34


On 9 Sep 2002, Nicholas Miell wrote:
>
> show_trace isn't exported for modules, and (even worse) isn't even
> implemented on all architectures, IIRC.

So? If it is a problem for people, fix it. Or remove the damn call. It
_still_ isn't valid to kill a machine for a non-fatal error.

We're not Windows. We don't take GFP's for random reasons. I'm not
interested in what some people call "hardening", but I _am_ interested in
a system that is rock solid and works even when it doesn't necessarily
expect to.

Linus

2002-09-10 01:36:20

by Linus Torvalds

[permalink] [raw]
Subject: Re: [linux-usb-devel] Re: [BK PATCH] USB changes for 2.5.34


On 10 Sep 2002, Alan Cox wrote:
>
> I'd have thought you may well want the reverse. If the user didnt pick
> the kernel debugging, don't die on software check option you want to
> blow up.

No, thanks. If it's fatal, it's a BUG(), and you blow regardless.

If it isn't fatal, there is no excuse for blowing up. EVER. That's
_especially_ true for some random user who didn't ask for, and can't
handle debugging. If it's useful information that the developer believes
he wants, it shouldn't be conditional at all.

> If they are debugging or its < 2.6.0-rc1 you want it to show
> the stack and keep going

You definitely want to keep going regardless. A BUG() that takes out the
machine is just not useful, because users who aren't ready to debug it
can't even make any reports except "it stops" (which this one did if you
were under X - the machine was just _dead_).

Basically, with the amount of locking we have, a BUG() or a blow-up just
about anywhere is lethal. Most sequences (especially in drivers, but
inside filesystems etc too) tend to hold spinlocks etc that just makes it
a bad idea to BUG() out unless you really really have to, since the
machine is not likely to survive and be able to write good reports to disk
etc at pretty much any point.

(It used to be that you could take a fault just about anywhere except for
in interrupt handlers, and Linux would try its damndest to clean up and
continue as if nothing had happened. Those days are sadly gone, and
trapping and depending on killing the process seldom works well any more).

On the whole, it's a lot better to just print out a message (and call
traces are often very useful) and continue. That's not always possible, of
course, and a lot of BUG() and BUG_ON() cases are perfectly valid simply
because sometimes there isn't anything you can do except kill the machine
and try to inform the user.

I think the historical kernel behaviour ("trap and kill and continue"
historically worked so fine for _both_ major bugs and for "random sanity
test" cases) has caused us to be a bit lazy about this sometimes.

Linus

2002-09-10 01:43:57

by Linus Torvalds

[permalink] [raw]
Subject: Re: [linux-usb-devel] Re: [BK PATCH] USB changes for 2.5.34


On Mon, 9 Sep 2002, Linus Torvalds wrote:
>
> On the whole, it's a lot better to just print out a message (and call
> traces are often very useful) and continue. That's not always possible, of
> course, and a lot of BUG() and BUG_ON() cases are perfectly valid simply
> because sometimes there isn't anything you can do except kill the machine
> and try to inform the user.

Note that from an implementation standpoint I suspect that a "trap and
continue" thing can easily be pretty much exactly as the current BUG()
with a flag somewhere, say in the "third byte" of the "ud2" instruction.

That would also make it easy to dynamically change the behaviour (ie some
people might want to explicitly make even the "warnings" fatal - a kernel
version of -Werror), and the implementation should be trivial:

#define TRAP_INSTRUCTION( lethal ) \
__asm__ __volatile__( "ud2\n" \
"\t.byte %0\n" \
"\t.word %c1\n" \
"\t.long %c2\n" \
: :"i" (lethal), "i" (__LINE__), "i" (__FILE__))

and then you have

#define BUG() TRAP_INSTRUCTION(1)
#define WARN() TRAP_INSTRUCTION(0)

or something like that (where the non-lethal version just increments eip
by 9 to jump over the extended ud2 and the information pointers).

Linus

2002-09-10 02:02:29

by Matthew Dharm

[permalink] [raw]
Subject: Re: [BK PATCH] USB changes for 2.5.34

Linus, normally I would agree with you on this, but...

(1) We can't always handle it. The current code implements a heuristic,
which is not always correct. When it gets it wrong....

(2) show_trace() was proving ineffective in bringing the bad commands to
light so they might be fixed

(3) Given that we think we've fixed all of these bad initiators, the BUG()
really is something that should never happen.

Matt

On Mon, Sep 09, 2002 at 05:17:40PM -0700, Linus Torvalds wrote:
>
> Greg, please don't do this
>
> > [email protected], 2002-09-05 08:33:20-07:00, [email protected]
> > USB: storage driver: replace show_trace() with BUG()
>
> that BUG() thing is _way_ out of line, and has killed a few of my machines
> several times for no good reason. It actively hurts debuggability, because
> the machine is totally dead after it, and the whole and ONLY point of
> BUG() messages is to help debugging and make it clear that we can't handle
> something.
>
> In this case, we _can_ handle it, and we're much better off with a machine
> that works and that you can look up the messages with than killing it.
>
> Rule of thumb: BUG() is only good for something that never happens and
> that we really have no other option for (ie state is so corrupt that
> continuing is deadly).
>
> Linus
>
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/

--
Matthew Dharm Home: [email protected]
Maintainer, Linux USB Mass Storage Driver

I need a computer?
-- Customer
User Friendly, 2/19/1998


Attachments:
(No filename) (1.71 kB)
(No filename) (232.00 B)
Download all attachments

2002-09-10 02:44:16

by Linus Torvalds

[permalink] [raw]
Subject: Re: [BK PATCH] USB changes for 2.5.34


Short and sweet:

If I have a BUG() that I need to remove and replace with a printk()
(which I did) in order to even be able to debug the dang thing, then
that BUG() is a mistake. No ifs, buts or maybe's accepted. The one and
_only_ point of BUG() is to help debugging, and if it doesn't, then it
should not be there.

Longer:

On Mon, 9 Sep 2002, Matthew Dharm wrote:
>
> (3) Given that we think we've fixed all of these bad initiators, the BUG()
> really is something that should never happen.

You can't have it both ways.

Either it really never happens.

Or it _does_ happen, in which case you want sane debuggable output.

In the latter case, BUG() is the wrong thing to do, exactly because it
tends to bring the system down. In the first case it obviously doesn't
matter and isn't even relevant, and the best thing to do would be to
remove it altogether. In neither case is BUG() useful.

I'm personally in X 99% of the time except for the reasonably rare case
when I'm chasing down some bug I know I can reproduce and I want the
kernel to have access to the console.

And I doubt I'm alone in that. I suspect most people who use Linux in any
interesting situation (and no, I don't think servers are very interesting
from most standpoints) tend to do this. Agreed?

That means that the BUG() output won't even be _visible_ to 99% of all
cases. And if it causes the machine to hang, it is now not only invisible,
it cannot even be gotten hold of some other way.

Which basically means that a driver that uses BUG() for something that
isn't certain to be fatal anyway is a BUGGY driver.

If my X session is hung, that's already bad (most people will give up at
that point). If I can't even log in remotely, that's _disastrous_, since
now the BUG() causes me to not to be able to debug it at all. The BUG()
basically made itself irrlevant.

The fact is, BUG() is almost always the wrong thing to do. And it's
almost _guaranteed_ to be the wrong thing to do in a driver, since a
driver won't know what locks the rest of the kernel is holding, _and_
since it's almost always possible for a driver to try to return an error
code instead.

In short:

Either you want debugging (in which case BUG() is the wrong thing to
do), or you don't want debugging (in which case BUG() is the wrong thing
to do). You can choose either, but in neither case is BUG() acceptable.

BUG() is fine for _fundamental_ problems, where you don't have any other
choice, and where the machine really is effectively dead anyway. If the
VM notices that it's lists are corrupt, that's a BUG() thing. We don't
have much choice. If the scheduler notices that it's running on another
CPU than it thought it was running on, that's a BUG() thing.

Now, the kernel problem may be that BUG() is a bit too easy to use, and
the alternatives are not. We should fix that. But we shouldn't fix it by
using BUG() in places where it definitely doesn't belong.

Linus

2002-09-10 02:54:44

by Linus Torvalds

[permalink] [raw]
Subject: Re: [BK PATCH] USB changes for 2.5.34


On Mon, 9 Sep 2002, Linus Torvalds wrote:
>
> Which basically means that a driver that uses BUG() for something that
> isn't certain to be fatal anyway is a BUGGY driver.

There are probably good exceptions to this rule, like any rule.

For example, if the request queue (or some other fundamental internal data
structure) is found to be corrupted, I couldn't really fault a driver for
BUG()'ing out on it. It's not as if the driver could sanely even do an
end_request() in that case.

But even broken hardware is not a reason for a BUG(). For example, if the
driver notices that some part of the controller is hung hard (ie provable
hardware problem), the last thing it should do is to bring the system
down. It should fail the IO, and maybe turn itself off, but let the system
continue otherwise.

One of the best things I ever did from a debuggability standpoint was to
almost never use panic() in the base kernel, and make various kernel page
faults etc just try to kill the offender and go on.

Sometimes that "try to continue" approach ends up being nasty too (the
problem repeats and you end up with a dead machine that scrolls infinite
bug reports off the screen, making them really hard to catch), but on the
whole it tends to make things much easier to debug ("oops, I just lost my
floppy drive, let's save the messages on the harddisk and reboot").

Linus

2002-09-10 10:19:03

by Andries Brouwer

[permalink] [raw]
Subject: Re: [BK PATCH] USB changes for 2.5.34

On Mon, Sep 09, 2002 at 05:19:45PM -0700, Greg KH wrote:
> On Mon, Sep 09, 2002 at 05:17:40PM -0700, Linus Torvalds wrote:

> > Greg, please don't do this
> >
> > Rule of thumb: BUG() is only good for something that never happens
>
> Sorry, Matt told me to add it, I didn't realize the background. Should
> I leave it as show_trace(), or just remove it? Do you want me to send
> you another changeset to put it back?

There were a few BUG_ON calls in scsi_glue.c and in transport.c that
killed machines. For transport.c my source has

if (len != srb->request_bufflen) {
printk("USB transport.c: len=%d srb->request_bufflen=%d\n",
len, srb->request_bufflen);
#if 0
show_trace(NULL);
#endif
}
(show_trace was not defined here; I think the call should be deleted).

With some luck Alan's fix means that the ones in usb_stor_abort_transport()
and [us_]release() are not triggered anymore.

Andries

2002-09-10 16:41:39

by Thunder from the hill

[permalink] [raw]
Subject: Re: [BK PATCH] USB changes for 2.5.34

Hi,

On Mon, 9 Sep 2002, Linus Torvalds wrote:
> I'm personally in X 99% of the time except for the reasonably rare case
> when I'm chasing down some bug I know I can reproduce and I want the
> kernel to have access to the console.
>
> And I doubt I'm alone in that. I suspect most people who use Linux in any
> interesting situation (and no, I don't think servers are very interesting
> from most standpoints) tend to do this. Agreed?

Our gatekeeper has never even heard of X. And no, I wouldn't call it a
server. The only thing it does is to control which doors and gates are
open and which are closed, and whether or not the runaway is free...

Thunder
--
--./../...-/. -.--/---/..-/.-./..././.-../..-. .---/..-/.../- .-
--/../-./..-/-/./--..-- ../.----./.-../.-.. --./../...-/. -.--/---/..-
.- -/---/--/---/.-./.-./---/.--/.-.-.-
--./.-/-.../.-./.././.-../.-.-.-

2002-09-10 16:52:52

by Vojtech Pavlik

[permalink] [raw]
Subject: Re: [BK PATCH] USB changes for 2.5.34

On Tue, Sep 10, 2002 at 10:46:27AM -0600, Thunder from the hill wrote:
> Hi,
>
> On Mon, 9 Sep 2002, Linus Torvalds wrote:
> > I'm personally in X 99% of the time except for the reasonably rare case
> > when I'm chasing down some bug I know I can reproduce and I want the
> > kernel to have access to the console.
> >
> > And I doubt I'm alone in that. I suspect most people who use Linux in any
> > interesting situation (and no, I don't think servers are very interesting
> > from most standpoints) tend to do this. Agreed?
>
> Our gatekeeper has never even heard of X. And no, I wouldn't call it a
> server. The only thing it does is to control which doors and gates are
> open and which are closed, and whether or not the runaway is free...

So you wanted to say that you'd prefer the machine to crash on a BUG()
than try to keep going in case of a recoverable error? I don't think
you'd like to stay locked in. At least that was what this discussion was
about.

--
Vojtech Pavlik
SuSE Labs