2008-02-15 21:46:22

by Andrew Buehler

Subject: USB regression (and other failures) in 2.6.2[45]*

In my workplace, I use a customized version of Novell's ZENworks imaging
boot CD, which is based off of Linux. I have one particular model of
laptop - the IBM/Lenovo R61 - on which three different things fail
completely in current kernels (tested with 2.6.24.2 and 2.6.25-rc1):
USB, AHCI (and thus access to the SATA drive), and networking. As a
consequence of all three failing in parallel, I have no practical way to
get logs and other information off of the machine to help with tracking
down the bugs.

I am primarily concerned about the AHCI and networking issues, since
they are what need to be working in order for us to do what we need to
with these boot discs and these laptops. However, I intend to focus on
the USB issue first, because it seems slightly more tractable and fixing
it would allow me to reliably get logs off of the computer so as to
provide information to help track down and fix the other problems.

Specifically, the USB issue is more tractable in that I have found one
narrow set of circumstances in which I *can* get it to work, and so have
been able to obtain an lspci log and a dmesg log from the failing
laptop. I seem to remember the lkml FAQ advising not to simply attach
such files unsolicited, so I have not provided them here, but I am more
than willing to send them (and the matching .config file) along upon
request. Instead, I will do my best to summarize the errors as I have
observed them, though that best may be somewhat poor. In the following,
unless explicitly specified, I am using 2.6.25-rc1, simply because I
expect that it will be more likely to get attention and fixes than
earlier (released) versions.


Early in the boot process, immediately after the 'io scheduler foobar
registered' lines, the message

====
0000:00:1a.7 EHCI: BIOS handoff failed (BIOS bug?) 01010001
====

appears twice. Despite the parenthetical suggestion, I do not believe
that the problem could be a bug in the BIOS, because Windows is able to
access all of the hardware on these laptops - including USB devices,
which is what I understand EHCI to involve - without the slightest
difficulty.

If no USB Flash drive is connected during the boot process,
there are no further apparently-USB-related errors during boot that I
can recognize, and various messages about USB host controllers being
detected appear; they seem to be perfectly normal. When the boot process
completes, connecting such a drive produces no visible response
whatsoever.

If on the other hand there *is* a USB Flash drive connected during the
boot process, there are many other USB-related messages, some of which
appear to be errors. I am not certain which are in fact relevant, and
would prefer not to simply copy-and-paste blindly from the log; if the
information is necessary, I would prefer to simply provide the entire
log rather than risk missing something important. However, when the boot
process is done and the usb-storage module is loaded, the drive is in
fact recognized and can be mounted, though it is very slow to respond;
in my one test it took ~20+ seconds to mount the drive (512MB, vfat), an
unmeasured but quite long time to dump dmesg into a file on that drive,
a barely noticeable but still present blink to copy /proc/config.gz to
the drive, and four seconds to unmount afterwards.


For reference, I have on hand a version of this same boot disc built
using kernel 2.6.23.1, which does not produce the EHCI errors, and on
which the USB drive is usable in exactly the way I expect from a Linux
system. I have not made a significant attempt to narrow down the point
at which the functionality broke, but I can do so if desired, though it
will take some time - the more so as I can test this only while at work,
and am facing an impending three-day weekend.

(I do not have a working git environment, and do not understand well how
to set one up, as the mechanics and to some extent the interface
semantics of git seem to be rather different from those of any VCS with
which I am familiar. That is, however, the only reason - aside from the
time involved - why I would be unwilling to track down the exact change
which caused the regression.)

I am quite certain that I have not provided enough information to
address the problem. Please let me know what would be necessary, and I
will do my best to provide it. Additionally, if I have made any major
flubs (of etiquette or otherwise), please do point them out so that I
can avoid them in future.

--
Andrew Buehler


2008-02-16 14:32:23

by Oliver Pinter

Subject: Re: USB regression (and other failures) in 2.6.2[45]*

Adding CC (Andrew, Greg and linux-usb).

On 2/15/08, Andrew Buehler wrote:
> In my workplace, I use a customized version of Novell's ZENworks imaging
> boot CD, which is based off of Linux. I have one particular model of
> laptop - the IBM/Lenovo R61 - on which three different things fail
> completely in current kernels (tested with 2.6.24.2 and 2.6.25-rc1):
> USB, AHCI (and thus access to the SATA drive), and networking. As a
> consequence of all three failing in parallel, I have no practical way to
> get logs and other information off of the machine to help with tracking
> down the bugs.
>
> I am primarily concerned about the AHCI and networking issues, since
> they are what need to be working in order for us to do what we need to
> with these boot discs and these laptops. However, I intend to focus on
> the USB issue first, because it seems slightly more tractable and fixing
> it would allow me to reliably get logs off of the computer so as to
> provide information to help track down and fix the other problems.
>
> Specifically, the USB issue is more tractable in that I have found one
> narrow set of circumstances in which I *can* get it to work, and so have
> been able to obtain an lspci log and a dmesg log from the failing
> laptop. I seem to remember the lkml FAQ advising not to simply attach
> such files unsolicited, so I have not provided them here, but I am more
> than willing to send them (and the matching .config file) along upon
> request. Instead, I will do my best to summarize the errors as I have
> observed them, though that best may be somewhat poor. In the following,
> unless explicitly specified, I am using 2.6.25-rc1, simply because I
> expect that it will be more likely to get attention and fixes than
> earlier (released) versions.
>
>
> Early in the boot process, immediately after the 'io scheduler foobar
> registered' lines, the message
>
> ====
> 0000:00:1a.7 EHCI: BIOS handoff failed (BIOS bug?) 01010001
> ====
>
> appears twice. Despite the parenthetical suggestion, I do not believe
> that the problem could be a bug in the BIOS, because Windows is able to
> access all of the hardware on these laptops - including USB devices,
> which is what I understand EHCI to involve - without the slightest
> difficulty.
>
> If no USB Flash drive is connected during the boot process,
> there are no further apparently-USB-related errors during boot that I
> can recognize, and various messages about USB host controllers being
> detected appear; they seem to be perfectly normal. When the boot process
> completes, connecting such a drive produces no visible response
> whatsoever.
>
> If on the other hand there *is* a USB Flash drive connected during the
> boot process, there are many other USB-related messages, some of which
> appear to be errors. I am not certain which are in fact relevant, and
> would prefer not to simply copy-and-paste blindly from the log; if the
> information is necessary, I would prefer to simply provide the entire
> log rather than risk missing something important. However, when the boot
> process is done and the usb-storage module is loaded, the drive is in
> fact recognized and can be mounted, though it is very slow to respond;
> in my one test it took ~20+ seconds to mount the drive (512MB, vfat), an
> unmeasured but quite long time to dump dmesg into a file on that drive,
> a barely noticeable but still present blink to copy /proc/config.gz to
> the drive, and four seconds to unmount afterwards.
>
>
> For reference, I have on hand a version of this same boot disc built
> using kernel 2.6.23.1, which does not produce the EHCI errors, and on
> which the USB drive is usable in exactly the way I expect from a Linux
> system. I have not made a significant attempt to narrow down the point
> at which the functionality broke, but I can do so if desired, though it
> will take some time - the more so as I can test this only while at work,
> and am facing an impending three-day weekend.
>
> (I do not have a working git environment, and do not understand well how
> to set one up, as the mechanics and to some extent the interface
> semantics of git seem to be rather different from those of any VCS with
> which I am familiar. That is, however, the only reason - aside from the
> time involved - why I would be unwilling to track down the exact change
> which caused the regression.)
>
> I am quite certain that I have not provided enough information to
> address the problem. Please let me know what would be necessary, and I
> will do my best to provide it. Additionally, if I have made any major
> flubs (of etiquette or otherwise), please do point them out so that I
> can avoid them in future.
>
> --
> Andrew Buehler
>


--
Thanks,
Oliver

2008-02-16 15:20:29

by Alan Stern

Subject: Re: USB regression (and other failures) in 2.6.2[45]*

On Sat, 16 Feb 2008, Oliver Pinter wrote:

> On 2/15/08, Andrew Buehler wrote:
> > In my workplace, I use a customized version of Novell's ZENworks imaging
> > boot CD, which is based off of Linux. I have one particular model of
> > laptop - the IBM/Lenovo R61 - on which three different things fail
> > completely in current kernels (tested with 2.6.24.2 and 2.6.25-rc1):
> > USB, AHCI (and thus access to the SATA drive), and networking. As a
> > consequence of all three failing in parallel, I have no practical way to
> > get logs and other information off of the machine to help with tracking
> > down the bugs.

...

To make a long story short, the USB symptoms you describe indicate a
problem with interrupt routing. This could well explain the other
difficulties too. There are various kernel parameters you can try
putting on the boot command line to work around it: acpi=noirq or
acpi=off or pci=noacpi or a few others.

Alan Stern

2008-02-16 16:47:09

by Andrew Buehler

Subject: Re: USB regression (and other failures) in 2.6.2[45]*

(Note: I consider it blatantly incorrect to send a reply both to a
mailing list and directly to the address of someone who is subscribed to
that list unless you have reason to believe that that someone will not
see the message otherwise, but in this case I am doing so anyway because
I see no way to avoid it and still make sure all relevant people receive
the message.)

On 2/16/2008 10:20 AM, Alan Stern wrote:

> On Sat, 16 Feb 2008, Oliver Pinter wrote:
>
>> On 2/15/08, Andrew Buehler wrote:
>>> In my workplace, I use a customized version of Novell's ZENworks
>>> imaging boot CD, which is based off of Linux. I have one
>>> particular model of laptop - the IBM/Lenovo R61 - on which three
>>> different things fail completely in current kernels (tested with
>>> 2.6.24.2 and 2.6.25-rc1): USB, AHCI (and thus access to the SATA
>>> drive), and networking. As a consequence of all three failing in
>>> parallel, I have no practical way to get logs and other
>>> information off of the machine to help with tracking down the
>>> bugs.
>
> ...
>
> To make a long story short, the USB symptoms you describe indicate a
> problem with interrupt routing. This could well explain the other
> difficulties too. There are various kernel parameters you can try
> putting on the boot command line to work around it: acpi=noirq or
> acpi=off or pci=noacpi or a few others.

I have now tried all three of these, with no apparent effect; the USB
drive is still not detected when plugged in after boot. A naive search
on Google provides no indication of other possible parameters to try;
the only list I have found of ACPI-related kernel parameters includes no
others which seem likely to be helpful without more knowledge of the
specifics of the situation (and the subsystem) than I have.

What would the next step be?

--
Andrew Buehler

2008-02-16 17:17:20

by Alan Stern

Subject: Re: USB regression (and other failures) in 2.6.2[45]*

On Sat, 16 Feb 2008, Andrew Buehler wrote:

> (Note: I consider it blatantly incorrect to send a reply both to a
> mailing list and directly to the address of someone who is subscribed to
> that list unless you have reason to believe that that someone will not
> see the message otherwise, but in this case I am doing so anyway because
> I see no way to avoid it and still make sure all relevant people receive
> the message.)

I (and a lot of other people as well, to judge by the email I receive)
don't think this is incorrect. For one thing, it's not always possible
to tell whether or not the recipient is subscribed to any of the lists.
For another, getting two copies of a message is no big deal -- more
irritating (IMO) is getting a rejection message as a result of replying
to a message which was cross-posted to a closed list. But in each case,
hitting the "d" key will delete the unwanted message.

In fact, the thing that bothers me the most is when people reply to a
long email with just a few lines of new text but don't bother to prune
the long message down to its essential parts. This forces me to read
through hundreds of lines containing nothing new or of interest in
order to obtain a minimal amount of useful information.

> On 2/16/2008 10:20 AM, Alan Stern wrote:

> > To make a long story short, the USB symptoms you describe indicate a
> > problem with interrupt routing. This could well explain the other
> > difficulties too. There are various kernel parameters you can try
> > putting on the boot command line to work around it: acpi=noirq or
> > acpi=off or pci=noacpi or a few others.
>
> I have now tried all three of these, with no apparent effect; the USB
> drive is still not detected when plugged in after boot. A naive search
> on Google provides no indication of other possible parameters to try;
> the only list I have found of ACPI-related kernel parameters includes no
> others which seem likely to be helpful without more knowledge of the
> specifics of the situation (and the subsystem) than I have.
>
> What would the next step be?

People on LKML who are more familiar with interrupt routing problems
might be able to offer more help. For now, you can try things like
turning on CONFIG_USB_DEBUG, posting the output from dmesg, posting the
contents of /proc/interrupts (say before and after a new USB device is
plugged in).

Assuming that the 2.6.23 kernel works on your computer, you can go the
extreme route of installing git and doing a bisection to find the first
patch causing your difficulty.

Alan Stern

2008-02-16 21:34:18

by Andrew Buehler

Subject: Re: USB regression (and other failures) in 2.6.2[45]*

           CPU0
  0:         90   IO-APIC-edge      timer
  1:         39   IO-APIC-edge      i8042
  2:          0   XT-PIC-XT         cascade
 10:          0   IO-APIC-edge      ahci, uhci_hcd:usb5
 11:          0   IH-APIC-edge      ehci_hcd:usb1, ehci_hcd:usb2, uhci_hcd:usb3, uhci_hcd:usb4, uhci_hcd:usb6, uhci_hcd:usb7
 12:       2332   IO-APIC-edge      i8042
 14:        152   IO-APIC-edge      ide0
NMI:          0   Non-maskable interrupts
LOC:      97683   Local timer interrupts
RES:          0   Rescheduling interrupts
CAL:          0   function call interrupts
TLB:          0   TLB shootdowns
TRM:          0   Thermal event interrupts
SPU:          0   Spurious interrupts
ERR:          0
MIS:          0


Attachments:
r61-dmesg-usb_drive_during_boot-2.6.25-rc1.txt (54.31 kB)
r61-interrupts-1.txt (721.00 B)
r61-interrupts-2.txt (721.00 B)

2008-02-16 23:11:43

by Alan Stern

Subject: Re: USB regression (and other failures) in 2.6.2[45]*

On Sat, 16 Feb 2008, Andrew Buehler wrote:

> > For another, getting two copies of a message is no big deal --
>
> I disagree.

Everyone has his own taste. Obviously there's no world-wide consensus,
possibly because different people have different workflow habits and so
are affected by duplicate messages to varying extents.

> When I receive a message sent directly to me in a discussion which is on
> a list, I expect that it is because someone either considered it
> important enough to warrant making certain it came to my attention
> specifically, or wanted to continue the discussion but felt that it
> should not continue to take place on the mailing list.

Sometimes that is the case but often it isn't. Your expectations are
at variance with other people's behavior; you shouldn't expect everyone
else to change just to match your personal ideals.

On the other hand, I would be perfectly happy to edit your name out of
the reply list -- but since you said you aren't receiving all the
messages in this thread via the list that might not be a good thing to
do at the moment...


> > People on LKML who are more familiar with interrupt routing problems
> > might be able to offer more help. For now, you can try things like
> > turning on CONFIG_USB_DEBUG, posting the output from dmesg, posting
> > the contents of /proc/interrupts (say before and after a new USB
> > device is plugged in).
>
> In my current testing kernel, which I believe is the one with which I
> captured the sole successful non-2.6.23.1 dmesg so far, CONFIG_USB_DEBUG
> is on. The associated dmesg (obtained yesterday from booting with the
> Flash drive connected) is attached. (The flood of 'no version magic,
> tainting kernel' messages between line 600 and line 1160 is a side
> effect of Novell's custom environment which I have not yet made the
> effort to fix; the boot scripts attempt to detect the network card by
> modprobing every network driver available until they find one which
> works. Here, because the correct one fails, they wind up trying each one
> twice.)

The line saying:

> ehci_hcd 0000:00:1d.7: Unlink after no-IRQ? Controller is probably using the wrong IRQ.

is an indication that interrupt routing is indeed not working right.
Or possibly your EHCI controller isn't working. You could try
blacklisting or unloading ehci-hcd to see if that helps. Of course
then none of your USB devices would be able to run at high speed.

> I have transcribed the contents of /proc/interrupts both before and
> after plugging in the Flash drive I have been using for testing, and
> they are also attached. I have been as careful as I could to be sure
> that the contents of the attached 'r61-interrupts-[12].txt' files is the
> same as what was shown on the laptop, but cannot absolutely guarantee
> that I have not missed something. For the record, the '1' is from before
> connecting the drive, and the '2' is from after.

Notice that the interrupt count for IRQ 11 doesn't change when you plug
in the device. Obviously something is wrong there.
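
A minimal sketch of that comparison, assuming Python is available on
whichever machine holds the two saved snapshots: it parses two copies of
/proc/interrupts and lists the IRQs serving USB host controllers whose
counts did not move. The function names and the sample snapshots are
illustrative only; the sample values are made up, not taken from the
attachments.

```python
def parse_interrupts(text):
    """Map each IRQ label to (total count across CPUs, description)."""
    lines = text.strip().splitlines()
    ncpu = len(lines[0].split())          # header row: CPU0 CPU1 ...
    table = {}
    for line in lines[1:]:
        if ":" not in line:
            continue
        label, _, rest = line.partition(":")
        fields = rest.split()
        counts = []
        for field in fields[:ncpu]:       # leading numeric columns only
            if not field.isdigit():
                break
            counts.append(int(field))
        table[label.strip()] = (sum(counts), " ".join(fields[len(counts):]))
    return table


def unchanged_irqs(before, after, driver_substring):
    """IRQs whose description mentions driver_substring and whose count is the same."""
    b, a = parse_interrupts(before), parse_interrupts(after)
    return [irq for irq, (count, desc) in a.items()
            if driver_substring in desc and irq in b and b[irq][0] == count]


# Hypothetical snapshots, shaped like /proc/interrupts on a single-CPU boot.
BEFORE = """\
           CPU0
  0:         90   IO-APIC-edge      timer
 10:          0   IO-APIC-edge      ahci, uhci_hcd:usb5
 11:          0   IO-APIC-edge      ehci_hcd:usb1, uhci_hcd:usb3
 12:       2332   IO-APIC-edge      i8042
"""
AFTER = """\
           CPU0
  0:        180   IO-APIC-edge      timer
 10:          0   IO-APIC-edge      ahci, uhci_hcd:usb5
 11:          0   IO-APIC-edge      ehci_hcd:usb1, uhci_hcd:usb3
 12:       2400   IO-APIC-edge      i8042
"""

print(unchanged_irqs(BEFORE, AFTER, "_hcd"))   # prints ['10', '11']
```

A USB controller line whose count stays the same across a plug-in event
is the symptom described here; a count that rises would suggest the
interrupt is at least being delivered.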

In fact, it's a little surprising that almost all the USB controllers
are routed to the same IRQ. However this is beyond my area of
expertise. You could try posting a message on the linux-acpi mailing
list; the people there should know a lot more about these issues.


> > Assuming that the 2.6.23 kernel works on your computer, you can go
> > the extreme route of installing git and doing a bisection to find the
> > first patch causing your difficulty.
>
> That would require me to learn enough of how git works, as distinct from
> more traditional VCSes, to be able to use it with some confidence. This
> is not impossible - indeed I want to do it at some point - but for the
> time being I have no idea where to start, and indeed I am not especially
> clear on exactly what (from a user's perspective) the differences between
> git and e.g. CVS or Subversion are. I know that the entire concept
> revolves around a lack of centralization, but I have not been able to get
> my head around what that means in a practical sense.

There are some excellent tutorials on the web, with detailed
explanations of how to do a bisection to track down a kernel bug.

Alan Stern

2008-02-17 01:12:58

by Andrew Buehler

Subject: Re: USB regression (and other failures) in 2.6.2[45]*

On 2/16/2008 6:11 PM, Alan Stern wrote:

> On Sat, 16 Feb 2008, Andrew Buehler wrote:
>
>>> For another, getting two copies of a message is no big deal --
>>
>> I disagree.
>
> Everyone has his own taste. Obviously there's no world-wide
> consensus, possibly because different people have different workflow
> habits and so are affected by duplicate messages to varying extents.

I am well aware that this particular point is opinion. I have had
justifications for and arguments in favor of it in the past, but none of
them come readily to mind at the moment, except for the one gone over
briefly below.

>> When I receive a message sent directly to me in a discussion which
>> is on a list, I expect that it is because someone either considered
>> it important enough to warrant making certain it came to my
>> attention specifically, or wanted to continue the discussion but
>> felt that it should not continue to take place on the mailing list.
>>
>
> Sometimes that is the case but often it isn't. Your expectations are
> at variance with other people's behavior; you shouldn't expect
> everyone else to change just to match your personal ideals.

Messages sent to my address directly are explicitly not filtered into
the folders I have set up for various mailing lists, so that if someone
does send me a "heads up" reply for a specific topic on a list to which
I am subscribed it does not get caught by the list filter and fail to
come to my attention. If a message fails to be filtered into any
mailing-list folder, then I should be able to conclude that it is
specifically intended for me, and not part of normal mailing-list
traffic. The practice of sending replies to both addresses renders this
an invalid conclusion. I do not think that it is unreasonable to expect
that conclusion to be valid.

> On the other hand, I would be perfectly happy to edit your name out
> of the reply list -- but since you said you aren't receiving all the
> messages in this thread via the list that might not be a good thing
> to do at the moment...

It's not that I'm not receiving all of this thread's messages via the
list - it's that I'm not receiving *any* of them via the list, and I
suspect that the reason is that my address is in both the To:/Cc: and
the list itself. Something is filtering it such that I do not receive
"duplicate" replies in this way, but it is doing so by filtering out the
list copy rather than the direct copy. I have seen mailing lists which
do this before, but I see no other indication that the LKML is one of
them, and I would not be in the least surprised if this turned out to be
yet one more problem with gmail.

As far as I am aware, I am seeing all messages posted to the list which
do not have me in To: or Cc:. I suspect that if a reply in this thread
were posted to the list but not sent to me, I would see it on the list.
It might be worth an experiment, but since it would increase traffic for
other list members to no purpose it is probably not worth it overall.

>>> People on LKML who are more familiar with interrupt routing
>>> problems might be able to offer more help. For now, you can try
>>> things like turning on CONFIG_USB_DEBUG, posting the output from
>>> dmesg, posting the contents of /proc/interrupts (say before and
>>> after a new USB device is plugged in).
>>
>> In my current testing kernel, which I believe is the one with which
>> I captured the sole successful non-2.6.23.1 dmesg so far,
>> CONFIG_USB_DEBUG is on. The associated dmesg (obtained yesterday
>> from booting with the Flash drive connected) is attached. (The
>> flood of 'no version magic, tainting kernel' messages between line
>> 600 and line 1160 is a side effect of Novell's custom environment
>> which I have not yet made the effort to fix; the boot scripts
>> attempt to detect the network card by modprobing every network
>> driver available until they find one which works. Here, because the
>> correct one fails, they wind up trying each one twice.)
>
> The line saying:
>
>> ehci_hcd 0000:00:1d.7: Unlink after no-IRQ? Controller is probably
>> using the wrong IRQ.
>
> is an indication that interrupt routing is indeed not working right.
> Or possibly your EHCI controller isn't working. You could try
> blacklisting or unloading ehci-hcd to see if that helps. Of course
> then none of your USB devices would be able to run at high speed.

ehci-hcd is not modular in my current kernel, and if there is a way to
turn it off without its being modular I am not aware of it. I will have
to jump through a few hoops to be able to obtain a copy of the boot CD
with an updated kernel while not at work, but I will try to do so
sometime tomorrow.

In practical terms, I am frankly not especially bothered by the lack of
support for high-speed USB in Linux on this machine; the primary reason
I am interested in USB there at the moment, aside from a general
philosophy of "unsupported devices are bad and anything I can do to help
them become supported is good", is because getting it working would
allow me to easily get the necessary information out to be able to
properly report the other problems, with AHCI and networking.

>> I have transcribed the contents of /proc/interrupts both before and
>> after plugging in the Flash drive I have been using for testing,
>> and they are also attached. I have been as careful as I could to be
>> sure that the contents of the attached 'r61-interrupts-[12].txt'
>> files is the same as what was shown on the laptop, but cannot
>> absolutely guarantee that I have not missed something. For the
>> record, the '1' is from before connecting the drive, and the '2' is
>> from after.
>
> Notice that the interrupt count for IRQ 11 doesn't change when you
> plug in the device. Obviously something is wrong there.
>
> In fact, it's a little surprising that almost all the USB controllers
> are routed to the same IRQ. However this is beyond my area of
> expertise. You could try posting a message on the linux-acpi mailing
> list; the people there should know a lot more about these issues.

Until this thread, I was not even aware that ACPI was related to USB; I
had largely conflated it with a similar acronym which I think is related
to power management and which I can suddenly not even find in my kernel
config. I will, however, look into linux-acpi.

>>> Assuming that the 2.6.23 kernel works on your computer, you can
>>> go the extreme route of installing git and doing a bisection to
>>> find the first patch causing your difficulty.
>>
>> That would require me to learn enough of how git works, as distinct
>> from more traditional VCSes, to be able to use it with some
>> confidence. This is not impossible - indeed I want to do it at some
>> point - but for the time being I have no idea where to start, and
>> indeed I am not especially clear on exactly what (from a user's
>> perspective) the differences between git and e.g. CVS or Subversion
>> are. I know that the entire concept revolves around a lack of
>> centralization, but I have not been able to get my head around what
>> that means in a practical sense.
>
> There are some excellent tutorials on the web, with detailed
> explanations of how to do a bisection to track down a kernel bug.

I have found at least a place to start, and am reading up on the
subject. I will most likely not be able to make a practical start on
this until at least Tuesday, as not having direct access to the machine
I will in the long term be building on makes some things impractical,
but if no solution is forthcoming in the meantime I will expect to do this.

That will not be helpful for the other two problems, however, since
neither of them was ever working as far as I am aware. That also leaves
me hesitant to conclude that they are rooted in the same IRQ issue as
the USB problem appears to be.

Which lists or other addresses would be appropriate for reporting
problems with AHCI/libata and with networking, specifically with the
e1000/e1000e drivers? I see a mailing list for e1000 in MAINTAINERS, but
only the maintainer's address for SATA/libata/whatever else may be
involved there, and I am reflexively reluctant to bother a maintainer
directly with as little information as I presently have.

--
Andrew Buehler

2008-02-17 03:35:26

by Alan Stern

Subject: Re: USB regression (and other failures) in 2.6.2[45]*

On Sat, 16 Feb 2008, Andrew Buehler wrote:

> Messages sent to my address directly are explicitly not filtered into
> the folders I have set up for various mailing lists, so that if someone
> does send me a "heads up" reply for a specific topic on a list to which
> I am subscribed it does not get caught by the list filter and fail to
> come to my attention. If a message fails to be filtered into any
> mailing-list folder, then I should be able to conclude that it is
> specifically intended for me, and not part of normal mailing-list
> traffic. The practice of sending replies to both addresses renders this
> an invalid conclusion. I do not think that it is unreasonable to expect
> that conclusion to be valid.

It's not unreasonable. Neither is Aristotelian physics. Nevertheless,
neither one is a good match to reality.

Why not arrange instead that messages sent from a mailing list server
_do_ get filtered into the corresponding folder, even if they were also
sent to your address? This certainly should make your assumption (that
messages not filtered into any mailing-list folder are specifically
intended for you) much more valid than it is now.

> It's not that I'm not receiving all of this thread's messages via the
> list - it's that I'm not receiving *any* of them via the list, and I
> suspect that the reason is that my address is in both the To:/Cc: and
> the list itself. Something is filtering it such that I do not receive
> "duplicate" replies in this way, but it is doing so by filtering out the
> list copy rather than the direct copy. I have seen mailing lists which
> do this before, but I see no other indication that the LKML is one of
> them, and I would not be in the least surprised if this turned out to be
> yet one more problem with gmail.

Well, I _am_ receiving your messages by way of linux-usb as well as
directly, for whatever that's worth.


> > is an indication that interrupt routing is indeed not working right.
> > Or possibly your EHCI controller isn't working. You could try
> > blacklisting or unloading ehci-hcd to see if that helps. Of course
> > then none of your USB devices would be able to run at high speed.
>
> ehci-hcd is not modular in my current kernel, and if there is a way to
> turn it off without its being modular I am not aware of it.

Go into the /sys/bus/pci/drivers/ehci_hcd directory. Then for each
symbolic link to a controller device listed there, write the device's
name (with "echo -n") to the "unbind" file. For example,

echo -n 0000:00:1d.7 >unbind

That will have nearly the same effect as unloading ehci-hcd.
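
The same procedure applied to every controller at once, as a rough
sketch in Python. It assumes the sysfs layout described above and has to
run as root on the real directory. The helper names, and the
PCI-address pattern used to skip non-device links such as "module", are
illustrative choices, not part of any kernel interface.

```python
import os
import re

# Device links under a PCI driver directory are named domain:bus:slot.function.
PCI_ADDR = re.compile(r"^[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]$")


def bound_devices(driver_dir):
    """Names of the PCI devices currently bound to the driver at driver_dir."""
    return sorted(
        name for name in os.listdir(driver_dir)
        if PCI_ADDR.match(name)
        and os.path.islink(os.path.join(driver_dir, name))
    )


def unbind_all(driver_dir):
    """Write each bound device name to the driver's 'unbind' file; return the names."""
    names = bound_devices(driver_dir)
    for name in names:
        # One write per device, with no trailing newline (like "echo -n").
        with open(os.path.join(driver_dir, "unbind"), "w") as unbind_file:
            unbind_file.write(name)
    return names


# Intended use, as root:
#   unbind_all("/sys/bus/pci/drivers/ehci_hcd")
```

Calling bound_devices() first shows which controllers would be affected
before anything is written.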

> Until this thread, I was not even aware that ACPI was related to USB; I
> had largely conflated it with a similar acronym which I think is related
> to power management and which I can suddenly not even find in my kernel
> config. I will, however, look into linux-acpi.

ACPI isn't directly related to USB; rather it has to do with
transferring information between the OS and the
BIOS/vendor-specific-hardware. Power management is an example where such
a transfer is needed. In your case, the relevant information is which
IRQ is connected to which motherboard device. If you don't have ACPI
enabled in your configuration, then perhaps that's the problem -- try
enabling it.

> That will not be helpful for the other two problems, however, since
> neither of them was ever working as far as I am aware. That also leaves
> me hesitant to conclude that they are rooted in the same IRQ issue as
> the USB problem appears to be.

Maybe they aren't. But when you have multiple bugs, you have to fix
them one at a time.

> Which lists or other addresses would be appropriate for reporting
> problems with AHCI/libata and with networking, specifically with the
> e1000/e1000e drivers? I see a mailing list for e1000 in MAINTAINERS, but
> only the maintainer's address for SATA/libata/whatever else may be
> involved there, and I am reflexively reluctant to bother a maintainer
> directly with as little information as I presently have.

I don't know, but you should wait until the simpler problem is sorted
out before tackling the more complicated ones.

Alan Stern

2008-02-17 04:11:21

by Joseph Fannin

Subject: [OT] GMail (was USB regression (and other failures)...)

On Sat, Feb 16, 2008 at 08:12:40PM -0500, Andrew Buehler wrote:
> [...] and I would not be in the least surprised if this turned out to
> be yet one more problem with gmail

It is; Gmail will refuse to POP more than one copy of a mail to you,
no matter how many copies it receives via different paths. Which copy
you get seems to be dependent on which arrives first, so you can't
even hope a mail exchange will consistently arrive in one mailbox or
another.

Note that this also applies to mails cross-posted to multiple lists
you may be subscribed to; this breaks threading fantastically.

Google is aware of the issue, and considers it a feature. If you find
another free mail service which isn't so broken, I'd love to hear
about it.

---

That said, netiquette on the kernel lists is to *never* drop CC's.
Too much traffic crosses the lists for anyone to read it all and note
anything they might be interested and/or implicated in. Never
dropping CC's allows busy people to keep track of conversations
they've taken part in or that someone thinks they should see
without the worry of missing any important parts of one.

Or at least it does if your mail system isn't broken. We get what we
pay for. :-/

--
Joseph Fannin
[email protected]

2008-02-17 07:20:28

by Paul Jackson

Subject: Re: USB regression (and other failures) in 2.6.2[45]*

Andrew wrote:
> (Note: I consider it blatantly incorrect to send a reply both to a
> mailing list and directly to the address of someone who is subscribed to
> that list

Regardless of how you consider it, that is how responding to these big
lists -must- work.

There is no practical way for respondents to know, without spending at
a minimum several minutes of their time per reply, whether or not the
explicit recipients of a message are or are not also on one or more of
the receiving lists.

Do you really expect, Andrew, that I should examine the membership lists
of each of linux-scsi, linux-usb and linux-kernel (if they are even open
to the public) to see if you're subscribed to them, before responding to
a message addressed such as this?

As subscribers and submitters to such lists, we just have to learn to
deal with this reality. For example, I receive an average of 100
messages per hour on this email address, -after- my employer's spam
filters have knocked off over 90% of the incoming.

May I recommend you become an expert in procmail? That or speed
reading (and speed ignoring ;).

In a separate reply to this message, Alan Stern wrote:
> Everyone has his own taste.

This is not a matter of taste on these big lists. There is no other
practical alternative. Most of the burden of ultimate filtering must
be shifted to the recipients, and the senders asked only that they
err on the side of including every individual list or person already
on the address lists.

Joseph Fannin also replied:
> another free mail service which isn't so broken,

I'd recommend fastmail.fm as one of the least broken, most tech savvy
mail services. I believe that their free side includes IMAP, though
not POP support.

--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <[email protected]> 1.940.382.4214

2008-02-17 11:28:18

by Sergey Vlasov

Subject: Re: USB regression (and other failures) in 2.6.2[45]*

On Sat, Feb 16, 2008 at 04:33:41PM -0500, Andrew Buehler wrote:
> The associated dmesg (obtained yesterday from booting with the
> Flash drive connected) is attached.

This dmesg shows that ACPI is not enabled in your kernel config - most
likely this is the problem. Try to enable it:

1) In the "Power management options" submenu enable the "Power
Management support" option (CONFIG_PM) - if this option is
disabled, you will not see the option to enable ACPI below.

2) In the same submenu enable the "ACPI (Advanced Configuration and
Power Interface) Support" option (CONFIG_ACPI).

Without ACPI support the kernel can use legacy interrupt routing
tables from BIOS, but on new systems these tables are often broken due
to lack of testing (because all modern operating systems use ACPI
instead of these legacy tables).
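
A quick way to confirm both options before rebuilding is to scan the .config file. The snippet below is a minimal sketch; the helper names config_value and missing_options are made up for illustration, and it relies only on the usual .config conventions ("CONFIG_FOO=y" for an enabled option, "# CONFIG_FOO is not set" for a disabled one).

```python
def config_value(config_text, symbol):
    """Return the value assigned to symbol, or None if it is unset or absent."""
    for line in config_text.splitlines():
        line = line.strip()
        # "# CONFIG_FOO is not set" does not start with "CONFIG_FOO=",
        # so disabled options fall through and yield None.
        if line.startswith(symbol + "="):
            return line.split("=", 1)[1]
    return None


def missing_options(config_text, required=("CONFIG_PM", "CONFIG_ACPI")):
    """Return the required symbols that are not built in (not "=y")."""
    return [s for s in required if config_value(config_text, s) != "y"]
```

For example, missing_options(open(".config").read()) returns an empty list when both CONFIG_PM and CONFIG_ACPI are enabled.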


2008-02-17 16:17:28

by Andrew Buehler

Subject: Re: USB regression (and other failures) in 2.6.2[45]*

On 2/17/2008 2:20 AM, Paul Jackson wrote:

> Andrew wrote:

(Since there are multiple Andrews on just the LKML, and at least two -
one of whom is much more prominent than I am - in the direct address
list for this discussion, I'm not sure whether or not this is a
sufficient attribution. If it works for you, though...)

>> (Note: I consider it blatantly incorrect to send a reply both to a
>> mailing list and directly to the address of someone who is
>> subscribed to that list
>
> Regardless of how you consider it, that is how responding to these
> big lists -must- work.
>
> There is no practical way for respondents to know, without spending
> at a minimum several minutes of their time per reply, whether or not
> the explicit recipients of a message are or are not also on one or
> more of the receiving lists.

As I have now acknowledged twice (and this makes three times), there
does not seem to be a practical way to avoid it in this instance. That
does not make it any less incorrect to send a duplicate private copy to
the person in question.

> Do you really expect, Andrew, that I should examine the membership
> lists of each of linux-scsi, linux-usb and linux-kernel (if they are
> even open to the public) to see if you're subscribed to them, before
> responding to a message addressed such as this?

Of course not.

> As subscribers and submitters to such lists, we just have to learn to
> deal with this reality. For example, I receive an average of 100
> messages per hour on this email address, -after- my employer's spam
> filters have knocked off over 90% of the incoming.
>
> May I recommend you become an expert in procmail? That or speed
> reading (and speed ignoring ;).

AFAIRK (though I could be mistaken), procmail is not available under
Windows, which is what I have to use for work purposes. I have an
interest in learning it for my own purposes, but it is very much on the
back burner.

> In a separate reply to this message, Alan Stern wrote:
>
>> Everyone has his own taste.
>
> This is not a matter of taste on these big lists. There is no other
> practical alternative.

I'm not disputing that. I just consider it incorrect anyway.

> Joseph Fannin also replied:
>
>> another free mail service which isn't so broken,
>
> I'd recommend fastmail.fm as one of the least broken, most tech savvy
> mail services. I believe that their free side includes IMAP, though
> not POP support.

I'm not as fond of IMAP as I used to be, though I no longer remember
exactly why, but I thank you for the recommendation. When I have
opportunity I will check it out, though that will probably not be this
week. (I also thank Joseph for the confirmation that the problem does
lie with Gmail.)



And, since there is no longer anything specifically kernel-related in
this subthread, I do not intend to reply publicly in it again unless
requested to do so.

--
Andrew Buehler

2008-02-17 16:20:49

by Paul Jackson

Subject: Re: USB regression (and other failures) in 2.6.2[45]*

Andrew B wrote:
> Windows, which is what I have to use for work purposes.

aha -- my condolences ;)

Take care. Your last reply made as much sense
as we're likely to make of this one. Thanks.

--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <[email protected]> 1.940.382.4214

2008-02-17 16:22:02

by Andrew Buehler

Subject: Re: USB regression (and other failures) in 2.6.2[45]*

On 2/16/2008 10:35 PM, Alan Stern wrote:

> On Sat, 16 Feb 2008, Andrew Buehler wrote:
>
>> Messages sent to my address directly are explicitly not filtered
>> into the folders I have set up for various mailing lists, so that
>> if someone does send me a "heads up" reply for a specific topic on
>> a list to which I am subscribed it does not get caught by the list
>> filter and fail to come to my attention. If a message fails to be
>> filtered into any mailing-list folder, then I should be able to
>> conclude that it is specifically intended for me, and not part of
>> normal mailing-list traffic. The practice of sending replies to
>> both addresses renders this an invalid conclusion. I do not think
>> that it is unreasonable to expect that conclusion to be valid.
>
> It's not unreasonable. Neither is Aristotelian physics.
> Nevertheless, neither one is a good match to reality.
>
> Why not arrange instead that messages sent from a mailing list server
> _do_ get filtered into the corresponding folder, even if they were
> also sent to your address? This certainly should make your
> assumption (that messages not filtered into any mailing-list folder
> are specifically intended for you) much more valid than it is now.

Two reasons that I can think of off the top of my head.

One of them I mentioned above: because that precludes the possibility of
someone sending me a direct copy to draw my attention to something which
they think needs it, unless they send it separately from the list copy.
(This does not especially apply on the kernel-related mailing lists,
since no one is likely to think I am particularly worth drawing in to
any discussion there anytime soon, but it has come up elsewhere and the
basic principle is the same.)

The other is that this would lead to duplicate copies of the same reply
in the mailing list folder, which is ugly at best, especially with
respect to proper threading.
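
As a rough illustration of the filtering under discussion: assuming the list server stamps a List-Id header on the copies it distributes, while the direct copy carries the same Message-ID without that header, a filter could keep one copy per message and prefer the list copy. This is only a sketch; the function name file_messages and the "inbox" folder label are hypothetical.

```python
from email import message_from_string


def file_messages(raw_messages):
    """Return {folder: [Message-ID, ...]} with one entry per Message-ID.

    A copy carrying a List-Id header is filed under that list; a message
    seen only as a direct copy stays in "inbox". Assumes every message
    has a Message-ID header.
    """
    chosen = {}  # Message-ID -> folder picked so far
    order = []   # Message-IDs in order of first arrival
    for raw in raw_messages:
        msg = message_from_string(raw)
        msg_id = msg["Message-ID"]
        list_id = msg["List-Id"]  # header lookup is case-insensitive
        folder = list_id.strip() if list_id else "inbox"
        if msg_id not in chosen:
            order.append(msg_id)
            chosen[msg_id] = folder
        elif chosen[msg_id] == "inbox" and list_id:
            # The list copy arrived after the direct copy: prefer it.
            chosen[msg_id] = folder
    folders = {}
    for msg_id in order:
        folders.setdefault(chosen[msg_id], []).append(msg_id)
    return folders
```

This avoids duplicate copies in the list folder, but it also files a deliberate "heads up" copy under the list, which is exactly the first objection above.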

>> Until this thread, I was not even aware that ACPI was related to
>> USB; I had largely conflated it with a similar acronym which I
>> think is related to power management and which I can suddenly not
>> even find in my kernel config. I will, however, look into
>> linux-acpi.
>
> ACPI isn't directly related to USB; rather it has to do with
> transferring information between the OS and the
> BIOS/vendor-specific-hardware. Power management is an example where
> such a transfer is needed. In your case, the relevant information is
> which IRQ is connected to which motherboard device. If you don't
> have ACPI enabled in your configuration, then perhaps that's the
> problem -- try enabling it.

It is indeed not enabled, and when I check the config for the 2.6.23.1
kernel where USB works, I find that it is enabled there. I will test the
result of enabling it in the current kernel. If I don't have an answer
by the end of the day, I probably won't be able to get one until at
least Tuesday.

>> That will not be helpful for the other two problems, however, since
>> neither of them was ever working as far as I am aware. That also
>> leaves me hesitant to conclude that they are rooted in the same IRQ
>> issue as the USB problem appears to be.
>
> Maybe they aren't. But when you have multiple bugs, you have to fix
> them one at a time.

Oh, I agree - that is a large part of why I posted a "full" description
of only one problem initially, rather than all three in a single mail.

--
Andrew Buehler

2008-02-19 20:36:18

by Andrew Buehler

Subject: Re: USB regression (and other failures) in 2.6.2[45]* - mostly resolved

On 2/16/2008 10:35 PM, Alan Stern wrote:

> On Sat, 16 Feb 2008, Andrew Buehler wrote:

>> Until this thread, I was not even aware that ACPI was related to
>> USB; I had largely conflated it with a similar acronym which I
>> think is related to power management and which I can suddenly not
>> even find in my kernel config. I will, however, look into
>> linux-acpi.
>
> ACPI isn't directly related to USB; rather it has to do with
> transferring information between the OS and the
> BIOS/vendor-specific-hardware. Power management is an example where such
> a transfer is needed. In your case, the relevant information is
> which IRQ is connected to which motherboard device. If you don't have
> ACPI enabled in your configuration, then perhaps that's the problem
> -- try enabling it.

Apparently it was the problem; enabling ACPI has fixed not only the USB
problem but also the network problem (somewhat miraculously, since I'm
quite certain that I had ACPI enabled in a 2.6.23.x kernel where the
network did not work despite an apparently matching driver).

I feel somewhat foolish for having reported a regression over what turns
out to have been a simple misconfiguration, but I still do think it's
somewhat misleading at best for something so potentially important to
completely non-power-related things to be buried under the heading of
power management... I would suggest moving it somewhere else in the
config and the dependencies, except that I have neither a suggestion for
a possible place nor any idea of how much actual work that would
involve.



With those two problems out of the way, what is left is the hard-drive
issue, and that is also halfway fixed by enabling ACPI. Specifically, it
is "fixed" in that the kernel sees the hard drive and I can mount it,
but it is not fixed in that the program we need to use in this
environment does not see the drive.

I have a config from a boot disc running 2.6.5 (that's not a typo) under
which the program in question *does* see the drive, but there are
massive differences between that config and the one I am using now, and
narrowing the critical difference down is likely to be somewhat
difficult - particularly since some of the "differences" are merely
renamed config symbols (i.e. the CONFIG_SCSI_SATA_*->CONFIG_SATA_*
switchover), and I have limited ability to tell which without intensive
investigation. Are there any established techniques for simplifying this
kind of comparison?

--
Andrew Buehler

2008-02-20 15:50:34

by Alan Stern

Subject: Re: USB regression (and other failures) in 2.6.2[45]* - mostly resolved

On Tue, 19 Feb 2008, Andrew Buehler wrote:

> With those two problems out of the way, what is left is the hard-drive
> issue, and that is also halfway fixed by enabling ACPI. Specifically, it
> is "fixed" in that the kernel sees the hard drive and I can mount it,
> but it is not fixed in that the program we need to use in this
> environment does not see the drive.

What do you mean by "does not see the drive"?

> I have a config from a boot disc running 2.6.5 (that's not a typo) under
> which the program in question *does* see the drive, but there are
> massive differences between that config and the one I am using now, and
> narrowing the critical difference down is likely to be somewhat
> difficult - particularly since some of the "differences" are merely
> renamed config symbols (i.e. the CONFIG_SCSI_SATA_*->CONFIG_SATA_*
> switchover), and I have limited ability to tell which without intensive
> investigation. Are there any established techniques for simplifying this
> kind of comparison?

The only established technique is to run various kernels intermediate
between the one that works and the one that fails.

Alan Stern

2008-02-20 17:06:50

by Andrew Buehler

Subject: Re: USB regression (and other failures) in 2.6.2[45]* - mostly resolved

(I suspect that some of the existing CC:s can now be dropped, and others
might need to be added if indeed this is worth discussing on kernel
lists at all, but I don't know what the protocol on that is so I have
left all of them in for the moment.)

On 2/20/2008 10:50 AM, Alan Stern wrote:

> On Tue, 19 Feb 2008, Andrew Buehler wrote:
>
>> With those two problems out of the way, what is left is the
>> hard-drive issue, and that is also halfway fixed by enabling ACPI.
>> Specifically, it is "fixed" in that the kernel sees the hard drive
>> and I can mount it, but it is not fixed in that the program we need
>> to use in this environment does not see the drive.
>
> What do you mean by "does not see the drive"?

Its detect-hardware-and-report mode shows a HD size of 0 (which is what
it has shown in cases where the kernel has not detected the drive), its
detect-partitions-and-report mode shows no partitions and no devices
which can have partitions, and attempting to actually use it to pull
down a drive image (it's a disk-imaging program) causes it to hang at
the point where it would begin to write.

Hmm. One thing which just sprang to mind, in the stab-in-the-dark
category: in 2.6.24.2, launching the program on some machines gave
warnings along the lines of "this program is using a deprecated ioctl,
please convert it to SG_IO" (which I naturally cannot do since it's
closed and I don't have the source), but IIRC in the 2.6.25-rc2-based
disc with ACPI enabled no such message appears. If the reason that there
are no longer such messages is that the ioctl in question has now been
removed, that might explain why the program does not see the drive.

I have suspected that I might eventually need to port the old interface
forwards to be able to continue to use this program, but I did not
expect it to happen this soon...

>> I have a config from a boot disc running 2.6.5 (that's not a typo)
>> under which the program in question *does* see the drive, but there
>> are massive differences between that config and the one I am using
>> now, and narrowing the critical difference down is likely to be
>> somewhat difficult - particularly since some of the "differences"
>> are merely renamed config symbols (i.e. the
>> CONFIG_SCSI_SATA_*->CONFIG_SATA_* switchover), and I have limited
>> ability to tell which without intensive investigation. Are there
>> any established techniques for simplifying this kind of comparison?
>
> The only established technique is to run various kernels intermediate
> between the one that works and the one that fails.

I'm not sure I expressed myself clearly. I do not think the problem is
with the different kernels. I think the problem is with the different
configurations. I am asking if there are any established techniques for
comparing differences between config files from widely different
kernels.

Or, if you're suggesting running various kernels with configs which are
hybrids of the config which works and the current one which does not: in
order to do that I have to be able to tell what the actual differences
are, and at minimum the renamed symbols (not all of which I expect to be
able to identify at a glance) would make that quite difficult to do, so
I remain with the same problem and the same question.
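
One way to take some of the tedium out of such a comparison is to normalize both files before comparing them. The following is a minimal sketch, not an established tool: the function names are assumptions made for illustration, and the rename table must be supplied by whoever runs it. The only rename pattern known from this discussion is the CONFIG_SCSI_SATA_* to CONFIG_SATA_* one.

```python
import re

SET_LINE = re.compile(r"^(CONFIG_\w+)=(.*)$")
UNSET_LINE = re.compile(r"^# (CONFIG_\w+) is not set$")


def parse_config(text, renames=()):
    """Return {symbol: value}; "# CONFIG_X is not set" is recorded as "n".

    renames is a sequence of (old_prefix, new_prefix) pairs supplied by
    the caller; the first matching prefix is rewritten.
    """
    symbols = {}
    for line in text.splitlines():
        line = line.strip()
        match = SET_LINE.match(line)
        if match:
            symbol, value = match.groups()
        else:
            match = UNSET_LINE.match(line)
            if not match:
                continue  # blank line or ordinary comment
            symbol, value = match.group(1), "n"
        for old_prefix, new_prefix in renames:
            if symbol.startswith(old_prefix):
                symbol = new_prefix + symbol[len(old_prefix):]
                break
        symbols[symbol] = value
    return symbols


def diff_configs(old_text, new_text, renames=()):
    """List (symbol, old_value, new_value) for every symbol that differs.

    Renames are applied to the old config only. A symbol missing from one
    file is treated like "n", so options that simply did not exist yet are
    not reported unless the other side enables them.
    """
    old = parse_config(old_text, renames)
    new = parse_config(new_text)
    changes = []
    for symbol in sorted(old.keys() | new.keys()):
        before = old.get(symbol, "n")
        after = new.get(symbol, "n")
        if before != after:
            changes.append((symbol, before, after))
    return changes
```

Called as diff_configs(old, new, renames=[("CONFIG_SCSI_SATA_", "CONFIG_SATA_")]), where old and new hold the text of the two config files, it reports only the symbols whose effective values differ. Renames that are not in the table still appear as one removal plus one addition, so the table has to be grown by hand.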

--
Andrew Buehler

2008-02-20 17:15:41

by Alan Stern

Subject: Re: USB regression (and other failures) in 2.6.2[45]* - mostly resolved

On Wed, 20 Feb 2008, Andrew Buehler wrote:

> > What do you mean by "does not see the drive"?
>
> Its detect-hardware-and-report mode shows a HD size of 0 (which is what
> it has shown in cases where the kernel has not detected the drive), its
> detect-partitions-and-report mode shows no partitions and no devices
> which can have partitions, and attempting to actually use it to pull
> down a drive image (it's a disk-imaging program) causes it to hang at
> the point where it would begin to write.
>
> Hmm. One thing which just sprang to mind, in the stab-in-the-dark
> category: in 2.6.24.2, launching the program on some machines gave
> warnings along the lines of "this program is using a deprecated ioctl,
> please convert it to SG_IO" (which I naturally cannot do since it's
> closed and I don't have the source),

You can ask the program's author to update it.

> but IIRC in the 2.6.25-rc2-based
> disc with ACPI enabled no such message appears. If the reason that there
> are no longer such messages is that the ioctl in question has now been
> removed, that might explain why the program does not see the drive.

Could be. You can use strace to find out what system calls the program
is making.
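
A minimal sketch of that approach follows. The strace options used (-f to follow child processes, -e trace=ioctl to log only ioctl calls, -o to write the log to a file) are standard ones; the helper names, the default log file name and the regular expression are assumptions for illustration, and the parser only handles the simple "ioctl(fd, REQUEST, ...)" form of log line.

```python
import re
from collections import Counter

# Optional leading PID (present when -f is combined with -o), then the call.
IOCTL_LINE = re.compile(r"^(?:\d+\s+)?ioctl\(-?\d+, ([^,)]+)")


def strace_command(program, log_path="trace.log"):
    """Build an strace command line that logs only the program's ioctl calls."""
    return ["strace", "-f", "-e", "trace=ioctl", "-o", log_path, program]


def summarize_ioctls(log_text):
    """Count how often each ioctl request name appears in an strace log."""
    counts = Counter()
    for line in log_text.splitlines():
        match = IOCTL_LINE.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

Running the command list with subprocess.run() on both the working and the failing boot disc, then comparing the two summaries, should show whether the program issues a request on one kernel that fails or never appears on the other.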


> I'm not sure I expressed myself clearly. I do not think the problem is
> with the different kernels. I think the problem is with the different
> configurations. I am asking if there are any established techniques for
> comparing differences between config files from widely different
> kernels.

Not as far as I know.

Alan Stern

2008-02-20 18:27:49

by Andrew Buehler

Subject: Re: USB regression (and other failures) in 2.6.2[45]* - mostly resolved

On 2/20/2008 12:15 PM, Alan Stern wrote:

> On Wed, 20 Feb 2008, Andrew Buehler wrote:

>> Hmm. One thing which just sprang to mind, in the stab-in-the-dark
>> category: in 2.6.24.2, launching the program on some machines gave
>> warnings along the lines of "this program is using a deprecated
>> ioctl, please convert it to SG_IO" (which I naturally cannot do
>> since it's closed and I don't have the source),
>
> You can ask the program's author to update it.

It's provided by Novell, with whom I have no direct contact and am not
presently authorized to speak on behalf of my organization. From what I
have read about the history of their support on this program and these
discs, I do not expect that they would be willing to support it except
in environments which they provide in monolithic form; it would be
possible for me to copy an updated version of the program out of such an
environment to use in my own customized one, but I am not certain that
they have even created such an updated version, and in any case
obtaining it would almost certainly require buying the latest version of
Novell ZENworks - which my organization is certainly not prepared to do
at the present time.

In other words: I don't think that's likely to be practical in the
present instance. If you have reason to believe otherwise (past positive
experience with Novell, for instance), I'd be glad to hear it.

>> I'm not sure I expressed myself clearly. I do not think the problem
>> is with the different kernels. I think the problem is with the
>> different configurations. I am asking if there are any established
>> techniques for comparing differences between config files from
>> widely different kernels.
>
> Not as far as I know.

Oh, well... thanks anyway.

Is there any place (aside from maybe the kernel changelog, which
contains a whole lot - if not several lots - of unrelated information)
where I could find a list of config-symbol name additions, changes,
deletions and meaning changes by version or by date? That would at least
let me build a mapping between the symbols in the older config and the
ones in the new one, which is about where I would have to start.

--
Andrew Buehler

2008-02-20 19:29:55

by Alan Stern

Subject: Re: USB regression (and other failures) in 2.6.2[45]* - mostly resolved

On Wed, 20 Feb 2008, Andrew Buehler wrote:

> In other words: I don't think that's likely to be practical in the
> present instance. If you have reason to believe otherwise (past positive
> experience with Novell, for instance), I'd be glad to hear it.

Greg KH may be able to help in that respect.

> Is there any place (aside from maybe the kernel changelog, which
> contains a whole lot - if not several lots - of unrelated information)
> where I could find a list of config-symbol name additions, changes,
> deletions and meaning changes by version or by date? That would at least
> let me build a mapping between the symbols in the older config and the
> ones in the new one, which is about where I would have to start.

Not as far as I know.

Alan Stern

2008-02-21 16:05:58

by Andrew Buehler

Subject: Re: USB regression (and other failures) in 2.6.2[45]* - mostly resolved

On 2/20/2008 2:29 PM, Alan Stern wrote:

> On Wed, 20 Feb 2008, Andrew Buehler wrote:
>
>> In other words: I don't think that's likely to be practical in the
>> present instance. If you have reason to believe otherwise (past
>> positive experience with Novell, for instance), I'd be glad to hear
>> it.
>
> Greg KH may be able to help in that respect.

I know he's in the CC:, but I'm not sure he's reading this thread, and
I'm hesitant to bother people about things out of the blue unless I have
reason to expect that it's something they're going to care about. (Just
because it's a big deal for me doesn't mean it makes one whit of
difference to anyone else, and from what little I've seen on
linux-kernel he seems to be somewhat important and fairly busy...)

>> Is there any place (aside from maybe the kernel changelog, which
>> contains a whole lot - if not several lots - of unrelated
>> information) where I could find a list of config-symbol name
>> additions, changes, deletions and meaning changes by version or by
>> date? That would at least let me build a mapping between the
>> symbols in the older config and the ones in the new one, which is
>> about where I would have to start.
>
> Not as far as I know.

Then I suppose I'm reduced to browsing the Kconfig files, reading old
changelogs, and trying a lot of different configs... thanks anyway.

--
Andrew Buehler

2008-02-21 16:36:56

by Alan Stern

Subject: Re: USB regression (and other failures) in 2.6.2[45]* - mostly resolved

On Thu, 21 Feb 2008, Andrew Buehler wrote:

> On 2/20/2008 2:29 PM, Alan Stern wrote:
>
> > On Wed, 20 Feb 2008, Andrew Buehler wrote:
> >
> >> In other words: I don't think that's likely to be practical in the
> >> present instance. If you have reason to believe otherwise (past
> >> positive experience with Novell, for instance), I'd be glad to hear
> >> it.
> >
> > Greg KH may be able to help in that respect.
>
> I know he's in the CC:, but I'm not sure he's reading this thread, and
> I'm hesitant to bother people about things out of the blue unless I have
> reason to expect that it's something they're going to care about. (Just
> because it's a big deal for me doesn't mean it makes one whit of
> difference to anyone else, and from what little I've seen on
> linux-kernel he seems to be somewhat important and fairly busy...)

In this case you shouldn't worry about it. I don't know whether Greg
has been following this thread either, but I do know that he spends a
lot of time and effort trying to improve vendors' support for Linux.
This is right up his alley. What I'm not sure about is the extent of
his influence over Novell...

Alan Stern

2008-02-21 17:22:49

by Greg KH

Subject: Re: USB regression (and other failures) in 2.6.2[45]* - mostly resolved

On Thu, Feb 21, 2008 at 11:36:23AM -0500, Alan Stern wrote:
> On Thu, 21 Feb 2008, Andrew Buehler wrote:
>
> > On 2/20/2008 2:29 PM, Alan Stern wrote:
> >
> > > On Wed, 20 Feb 2008, Andrew Buehler wrote:
> > >
> > >> In other words: I don't think that's likely to be practical in the
> > >> present instance. If you have reason to believe otherwise (past
> > >> positive experience with Novell, for instance), I'd be glad to hear
> > >> it.
> > >
> > > Greg KH may be able to help in that respect.
> >
> > I know he's in the CC:, but I'm not sure he's reading this thread, and
> > I'm hesitant to bother people about things out of the blue unless I have
> > reason to expect that it's something they're going to care about. (Just
> > because it's a big deal for me doesn't mean it makes one whit of
> > difference to anyone else, and from what little I've seen on
> > linux-kernel he seems to be somewhat important and fairly busy...)
>
> In this case you shouldn't worry about it. I don't know whether Greg
> has been following this thread either, but I do know that he spends a
> lot of time and effort trying to improve vendors' support for Linux.
> This is right up his alley. What I'm not sure about is the extent of
> his influence over Novell...

Heh, yes, I've been reading this.

It sounds like an old version of a Novell product is making a newer
kernel spit out a warning message. Odds are this is fixed in a newer
one, as well as the basic issue that Novell doesn't even ship anything
based on the 2.6.24 kernel yet, so perhaps the Zenworks developers don't
even know of the issue, that's what bugzilla.novell.com is for :)

thanks,

greg k-h

2008-02-21 19:43:35

by Andrew Buehler

Subject: Re: USB regression (and other failures) in 2.6.2[45]* - mostly resolved

On 2/21/2008 12:17 PM, Greg KH wrote:

> On Thu, Feb 21, 2008 at 11:36:23AM -0500, Alan Stern wrote:
>
>> On Thu, 21 Feb 2008, Andrew Buehler wrote:

[Greg KH]
>>> I know he's in the CC:, but I'm not sure he's reading this
>>> thread, and I'm hesitant to bother people about things out of the
>>> blue unless I have reason to expect that it's something they're
>>> going to care about. (Just because it's a big deal for me doesn't
>>> mean it makes one whit of difference to anyone else, and from
>>> what little I've seen on linux-kernel he seems to be somewhat
>>> important and fairly busy...)
>>
>> In this case you shouldn't worry about it. I don't know whether
>> Greg has been following this thread either, but I do know that he
>> spends a lot of time and effort trying to improve vendors' support
>> for Linux. This is right up his alley. What I'm not sure about is
>> the extent of his influence over Novell...
>
> Heh, yes, I've been reading this.
>
> It sounds like an old version of a Novell product is making a newer
> kernel spit out a warning message.

Originally that was all that it was, but now I am seeing the product in
question not even see the hard drive despite the fact that it is
mountable with standard utilities. Whether the failure is in the program
or elsewhere I don't know, but I'm hoping it's in the program, because
if it's elsewhere I have a *lot* of tedious digging and testing ahead of
me and little real idea of where to start.

> Odds are this is fixed in a newer one,

I haven't been able to find a newer version of the program, and do not
have the standing with my organization to contact Novell on their
behalf. (When I brought the idea up with the people who do have such
standing, in another context, the answer I received was essentially
"wait until we upgrade the servers to that version, which will not be
soon".) I also have not been able to find a clear contact path in any
case.

> as well as the basic issue that Novell doesn't even ship anything
> based on the 2.6.24 kernel yet, so perhaps the Zenworks developers
> don't even know of the issue, that's what bugzilla.novell.com is for
> :)

The problem has been present at least since 2.6.23.x and I think since
2.6.18, but from what I've been able to find Novell's last kernel was
based on 2.6.16.

I didn't even know there *was* a bugzilla.novell.com - and I've been
digging around Novell's site (and Googling around it from outside) for
what seems to me like quite a while. From the looks of things, however,
they don't have a category for ZENworks or for imaging, and the Linux
I'm using is not one of the SUSE-based distros they do list. (The
environment on the boot disc itself doesn't really qualify as any kind
of distro at all.)

Unfortunately, from what I've seen elsewhere it looks as if they are A:
not interested in supporting use of any Linux except for their own SUSE for
this purpose and B: not likely to be open to supporting or even helping
with a customized environment; all of their technical documentation on
the subject is geared towards adding files and drivers to their existing
environment, not to replacing the entire kernel of that environment or
working with the results. Given that I'm building complete replacement
kernels and customizing rather a number of other things in the end
product, I'm rather inclined to suspect that even if I got into contact
with them about this they would say something to the effect of "since
you're not using our official environment, we won't/can't spend time and
effort trying to help you".

--
Andrew Buehler

2008-02-21 20:03:16

by Alan Stern

Subject: Re: USB regression (and other failures) in 2.6.2[45]* - mostly resolved

On Thu, 21 Feb 2008, Andrew Buehler wrote:

> > It sounds like an old version of a Novell product is making a newer
> > kernel spit out a warning message.
>
> Originally that was all that it was, but now I am seeing the product in
> question not even see the hard drive despite the fact that it is
> mountable with standard utilities. Whether the failure is in the program
> or elsewhere I don't know, but I'm hoping it's in the program, because
> if it's elsewhere I have a *lot* of tedious digging and testing ahead of
> me and little real idea of where to start.

You could start by running the program under strace. That should give
you a good idea of where the problem begins.

Alan Stern