Hi folks,
Second attempt now:
I already reported to Linus Torvalds and Andrew Morton that it is impossible to mount a conventional floppy drive without hanging up the whole system.
Andrew's reaction was quite ambiguous: "We did not break it"
Once again and for the last time: I do not state that floppy.c is broken. I only state that it is immpossible to mount a floppy drive with kernel 2.6.21-rc1-git1. Kernel 2.6.20 is OK. But 2.6.21-rc1-git1 is definitely buggy!
I did some work already:
a. I copied the following modules from the intact and sane kernel 2.6.20 into the 2.6.21-rc1-git1 tree:
cdrom.h, floppy.c, init.h, io.h, proc_misc.c, setup.c, timer.h, uaccess.h
b. I adjusted some hunks of the patch for module main.c (part of patch-2.6.21-rc1) to make the kernel compile without errors.
But the problem still persists, and I do not have any idea anymore where the offensive hunks in patch-2.6.21-rc1 could reside.
Questions:
a. Can someone please confirm the described problem?
b. Can someone please take action to find out where the buggy code resides?
c. Why is this untested material being pushed into main vanilla - what is going on at kernel.org please?
Please take action! The bug was introduced somewhere at the transition of 2.6.20 towards 2.6.20-git14.
Yours sincerely
Uwe
P. S.: I once again repeat my proposal: Every patch code should be tested before it is being pushed into main vanilla.
The testing tree is the mm-tree.
If this policy producing regressions persists noone takes advantage of! So please avoid those regressions in future!
If this rule "test your changes, however small, on at least 4 or 5 people" does not apply to absolutely everybody then it is worth nothing!
My platform is i386, Intel ICH4 and I am using floppy.c (Normal floppy disk support).
--
Ist Ihr Browser Vista-kompatibel? Jetzt die neuesten
Browser-Versionen downloaden: http://www.gmx.net/de/go/browser
> On Sat, 24 Feb 2007 18:54:24 +0100 "Uwe Bugla" <[email protected]> wrote:
> Hi folks,
> Second attempt now:
> I already reported to Linus Torvalds and Andrew Morton that it is impossible to mount a conventional floppy drive without hanging up the whole system.
> Andrew's reaction was quite ambiguous: "We did not break it"
Sorry, what I meant was "Neither Linus nor I broke it". ie: please report
this in a place where the person who did break it can see it. This you have
done.
> Once again and for the last time: I do not state that floppy.c is broken. I only state that it is immpossible to mount a floppy drive with kernel 2.6.21-rc1-git1. Kernel 2.6.20 is OK. But 2.6.21-rc1-git1 is definitely buggy!
> I did some work already:
> a. I copied the following modules from the intact and sane kernel 2.6.20 into the 2.6.21-rc1-git1 tree:
> cdrom.h, floppy.c, init.h, io.h, proc_misc.c, setup.c, timer.h, uaccess.h
> b. I adjusted some hunks of the patch for module main.c (part of patch-2.6.21-rc1) to make the kernel compile without errors.
> But the problem still persists, and I do not have any idea anymore where the offensive hunks in patch-2.6.21-rc1 could reside.
>
> Questions:
> a. Can someone please confirm the described problem?
> b. Can someone please take action to find out where the buggy code resides?
> c. Why is this untested material being pushed into main vanilla - what is going on at kernel.org please?
>
> Please take action! The bug was introduced somewhere at the transition of 2.6.20 towards 2.6.20-git14.
>
I think we'll find that it works OK for hundreds of other people, so it got
broken in some manner which is specific to a very small number of machines,
of which yours is one.
If that theory proves to be correct, I'm afraid that the most proactical
way of fixing this is to ask you to run a git-bisect to find the changeset
which introduced the regression.
-------- Original-Nachricht --------
Datum: Sat, 24 Feb 2007 10:07:29 -0800
Von: Andrew Morton <[email protected]>
An: "Uwe Bugla" <[email protected]>
CC: [email protected], [email protected], [email protected]
Betreff: Re: bug in kernel 2.6.21-rc1-git1: conventional floppy drive cannot be mounted without hanging up the whole system
> > On Sat, 24 Feb 2007 18:54:24 +0100 "Uwe Bugla" <[email protected]> wrote:
> > Hi folks,
> > Second attempt now:
> > I already reported to Linus Torvalds and Andrew Morton that it is
> impossible to mount a conventional floppy drive without hanging up the whole
> system.
> > Andrew's reaction was quite ambiguous: "We did not break it"
>
> Sorry, what I meant was "Neither Linus nor I broke it". ie: please report
> this in a place where the person who did break it can see it. This you
> have
> done.
OK, accepted - sorry!
>
> > Once again and for the last time: I do not state that floppy.c is
> broken. I only state that it is immpossible to mount a floppy drive with kernel
> 2.6.21-rc1-git1. Kernel 2.6.20 is OK. But 2.6.21-rc1-git1 is definitely
> buggy!
> > I did some work already:
> > a. I copied the following modules from the intact and sane kernel 2.6.20
> into the 2.6.21-rc1-git1 tree:
> > cdrom.h, floppy.c, init.h, io.h, proc_misc.c, setup.c, timer.h,
> uaccess.h
> > b. I adjusted some hunks of the patch for module main.c (part of
> patch-2.6.21-rc1) to make the kernel compile without errors.
> > But the problem still persists, and I do not have any idea anymore where
> the offensive hunks in patch-2.6.21-rc1 could reside.
> >
> > Questions:
> > a. Can someone please confirm the described problem?
> > b. Can someone please take action to find out where the buggy code
> resides?
> > c. Why is this untested material being pushed into main vanilla - what
> is going on at kernel.org please?
> >
> > Please take action! The bug was introduced somewhere at the transition
> of 2.6.20 towards 2.6.20-git14.
> >
>
> I think we'll find that it works OK for hundreds of other people, so it
> got
> broken in some manner which is specific to a very small number of
> machines,
> of which yours is one.
>
> If that theory proves to be correct, I'm afraid that the most proactical
> way of fixing this is to ask you to run a git-bisect to find the changeset
> which introduced the regression.
Guess this theory is wrong but lets wait and see to prove who is right!
>
OK, Andrew,
Problem: I do not have internet at home, I am sitting in a friends flat now.
So you cannot expect me to download all git patches of 2.6.20 to test.
Could you please explain the procedure of bisecting?
Above that I've spent hours to find out the essence of the problem and I am really beat to the bone!
And my linuxtv patches should be ported into kernel please, with or without Abraham, with or without Chehab. I swear you that they are OK and not buggy at all.
It is the wrong policy to execute protectionism on people having lots of administration power but in reality lacking the experience, who are not able to tolerate justified criticism.
I enjoyed Gerd and I enjoyed most of all german guys of the convergence crew. Those were real fine and honest people, especially Gerd himself.
Sincerely
Uwe
--
"Feel free" - 10 GB Mailbox, 100 FreeSMS/Monat ...
Jetzt GMX TopMail testen: http://www.gmx.net/de/go/mailfooter/topmail-out
-------- Original-Nachricht --------
Datum: Sat, 24 Feb 2007 10:07:29 -0800
Von: Andrew Morton <[email protected]>
An: "Uwe Bugla" <[email protected]>
CC: [email protected], [email protected], [email protected]
Betreff: Re: bug in kernel 2.6.21-rc1-git1: conventional floppy drive cannot be mounted without hanging up the whole system
> > On Sat, 24 Feb 2007 18:54:24 +0100 "Uwe Bugla" <[email protected]> wrote:
> > Hi folks,
> > Second attempt now:
> > I already reported to Linus Torvalds and Andrew Morton that it is
> impossible to mount a conventional floppy drive without hanging up the whole
> system.
> > Andrew's reaction was quite ambiguous: "We did not break it"
>
> Sorry, what I meant was "Neither Linus nor I broke it". ie: please report
> this in a place where the person who did break it can see it. This you
> have
> done.
>
> > Once again and for the last time: I do not state that floppy.c is
> broken. I only state that it is immpossible to mount a floppy drive with kernel
> 2.6.21-rc1-git1. Kernel 2.6.20 is OK. But 2.6.21-rc1-git1 is definitely
> buggy!
> > I did some work already:
> > a. I copied the following modules from the intact and sane kernel 2.6.20
> into the 2.6.21-rc1-git1 tree:
> > cdrom.h, floppy.c, init.h, io.h, proc_misc.c, setup.c, timer.h,
> uaccess.h
> > b. I adjusted some hunks of the patch for module main.c (part of
> patch-2.6.21-rc1) to make the kernel compile without errors.
> > But the problem still persists, and I do not have any idea anymore where
> the offensive hunks in patch-2.6.21-rc1 could reside.
> >
> > Questions:
> > a. Can someone please confirm the described problem?
> > b. Can someone please take action to find out where the buggy code
> resides?
> > c. Why is this untested material being pushed into main vanilla - what
> is going on at kernel.org please?
> >
> > Please take action! The bug was introduced somewhere at the transition
> of 2.6.20 towards 2.6.20-git14.
> >
>
> I think we'll find that it works OK for hundreds of other people, so it
> got
> broken in some manner which is specific to a very small number of
> machines,
> of which yours is one.
>
> If that theory proves to be correct, I'm afraid that the most proactical
> way of fixing this is to ask you to run a git-bisect to find the changeset
> which introduced the regression.
>
Andrew,
it is definitely NOT my job to repair errors that other responsibility-free people pushed into vanilla mainline without the slightest test effort in some mm-tree for example. Who wasted it must repair it, without the slightest discussion!
I have provided enough information and energy to establish guidelines how to fix that error.
Above that I am still waiting for my linuxtv patches being applied which would be a nice and honest gesture.
Any further questions, Mr. Morton, or did I pronounce clear?
So take action now please!
Yours sincerely
Uwe
--
Ist Ihr Browser Vista-kompatibel? Jetzt die neuesten
Browser-Versionen downloaden: http://www.gmx.net/de/go/browser
> it is definitely NOT my job to repair errors that other responsibility-free people pushed into vanilla mainline without the slightest test effort in some mm-tree for example. Who wasted it must repair it, without the slightest discussion!
You're assuming the author didn't test it. For things like floppy.c,
it's possible it was tested and worked on their system just fine, but
is broken on yours. As you can replicate the problem, you're the
natural person to help narrow down the cause. This is the way all
development works, not just Linux or free software.
Please start by assuming that people mean well and are trying to do
the right thing, and you'll get a lot more productive responses.
> So take action now please!
And please remember that the vast majority of us, just like you, are volunteers.
Ray
Uwe Bugla napsal(a):
> Hi folks,
Hi.
> Once again and for the last time: I do not state that floppy.c is broken. I only state that it is immpossible to mount a floppy drive with kernel 2.6.21-rc1-git1. Kernel 2.6.20 is OK. But 2.6.21-rc1-git1 is definitely buggy!
> I did some work already:
> a. I copied the following modules from the intact and sane kernel 2.6.20 into the 2.6.21-rc1-git1 tree:
> cdrom.h, floppy.c, init.h, io.h, proc_misc.c, setup.c, timer.h, uaccess.h
> b. I adjusted some hunks of the patch for module main.c (part of patch-2.6.21-rc1) to make the kernel compile without errors.
> But the problem still persists, and I do not have any idea anymore where the offensive hunks in patch-2.6.21-rc1 could reside.
>
> Questions:
> a. Can someone please confirm the described problem?
Yup (last -mm).
> b. Can someone please take action to find out where the buggy code resides?
I'll take a look. I've seen here and tried to debug it down some time ago
without any result. Sysrq is dead, probably some kind of locking issue. LOCKDEP
screams silence, code reading returned nothing. I thought, it's problem inside
my setup, I gave it up (even reporting). But it seems to be real problem now.
> c. Why is this untested material being pushed into main vanilla - what is going on at kernel.org please?
Hard to say, nobody is perfect, this should never happen, but unfortunately it does.
regards,
--
http://www.fi.muni.cz/~xslaby/ Jiri Slaby
faculty of informatics, masaryk university, brno, cz
e-mail: jirislaby gmail com, gpg pubkey fingerprint:
B674 9967 0407 CE62 ACC8 22A0 32CC 55C3 39D4 7A7E
Hnus <[email protected]> is an alias for /dev/null
Jiri Slaby napsal(a):
>> Once again and for the last time: I do not state that floppy.c is
>> broken. I only state that it is immpossible to mount a floppy drive
>> with kernel 2.6.21-rc1-git1. Kernel 2.6.20 is OK. But 2.6.21-rc1-git1
>> is definitely buggy!
>> I did some work already:
>> a. I copied the following modules from the intact and sane kernel
>> 2.6.20 into the 2.6.21-rc1-git1 tree:
>> cdrom.h, floppy.c, init.h, io.h, proc_misc.c, setup.c, timer.h, uaccess.h
>> b. I adjusted some hunks of the patch for module main.c (part of
>> patch-2.6.21-rc1) to make the kernel compile without errors.
>> But the problem still persists, and I do not have any idea anymore
>> where the offensive hunks in patch-2.6.21-rc1 could reside.
>>
>> Questions:
>> a. Can someone please confirm the described problem?
>
> Yup (last -mm).
Ok, this commit is the culprit:
Commit: 2ff2d3d74705d34ab71b21f54634fcf50d57bdd5
Author: Stephane Eranian <[email protected]> Tue, 13 Feb 2007 13:26:22 +0100
[PATCH] i386: add idle notifier
Add a notifier mechanism to the low level idle loop. You can register a
callback function which gets invoked on entry and exit from the low level id
loop. The low level idle loop is defined as the polling loop, low-power cal
or the mwait instruction. Interrupts processed by the idle thread are not
considered part of the low level loop.
The notifier can be used to measure precisely how much is spent in useless
execution (or low power mode). The perfmon subsystem uses it to turn on/off
monitoring.
Signed-off-by: stephane eranian <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Andi Kleen <[email protected]>
---
Reverting (some manual work due to irq.c changes) this on latest -mm allows me
to mount floppy again.
regards,
--
http://www.fi.muni.cz/~xslaby/ Jiri Slaby
faculty of informatics, masaryk university, brno, cz
e-mail: jirislaby gmail com, gpg pubkey fingerprint:
B674 9967 0407 CE62 ACC8 22A0 32CC 55C3 39D4 7A7E
Hnus <[email protected]> is an alias for /dev/null
On Mon, 26 Feb 2007, Jiri Slaby wrote:
>
> Ok, this commit is the culprit:
> Commit: 2ff2d3d74705d34ab71b21f54634fcf50d57bdd5
> Author: Stephane Eranian <[email protected]> Tue, 13 Feb 2007 13:26:22 +0100
>
> [PATCH] i386: add idle notifier
Interesting. It doesn't touch floppy at all, but it *does* seem to play
around with irq state.
In particular, the floppy uses IRQF_DISABLED (which means that it doesn't
want interrupts enabled when in the irq handler), and I get the feeling
that the poll_idle() stuff made that not work.
That said, the only thing that *really* seems to change (as far as a
floopy driver could notice) is the added "exit_idle()" in the do_IRQ()
sequence, and I'm not seeing that one enabling interrupts.
But the idle sequence definitely does (ie now we disable/enable interrupts
in cpu_idle(). I'm not seeing why that should matter, though.
Stephane, any ideas?
Linus
Linus Torvalds napsal(a):
>
> On Mon, 26 Feb 2007, Jiri Slaby wrote:
>> Ok, this commit is the culprit:
>> Commit: 2ff2d3d74705d34ab71b21f54634fcf50d57bdd5
>> Author: Stephane Eranian <[email protected]> Tue, 13 Feb 2007 13:26:22 +0100
>>
>> [PATCH] i386: add idle notifier
>
> Interesting. It doesn't touch floppy at all, but it *does* seem to play
> around with irq state.
>
> In particular, the floppy uses IRQF_DISABLED (which means that it doesn't
> want interrupts enabled when in the irq handler), and I get the feeling
> that the poll_idle() stuff made that not work.
$ grep flags /proc/cpuinfo
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov
pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe cid xtpr
i.e. 'ht' and no 'monitor'; both poll_idle and mwait_idle are not involved,
default_idle is.
regards,
--
http://www.fi.muni.cz/~xslaby/ Jiri Slaby
faculty of informatics, masaryk university, brno, cz
e-mail: jirislaby gmail com, gpg pubkey fingerprint:
B674 9967 0407 CE62 ACC8 22A0 32CC 55C3 39D4 7A7E
Hnus <[email protected]> is an alias for /dev/null
On Mon, Feb 26, 2007 at 07:51:19AM -0800, Linus Torvalds wrote:
> > Commit: 2ff2d3d74705d34ab71b21f54634fcf50d57bdd5
> > Author: Stephane Eranian <[email protected]> Tue, 13 Feb 2007 13:26:22 +0100
> >
> > [PATCH] i386: add idle notifier
>
> Interesting. It doesn't touch floppy at all, but it *does* seem to play
> around with irq state.
Side note: this patch adds several function calls (4), several additional
L1 cache touches and a generally inefficient code path to *every single
interrupt that exits from the idle poll*, not to mention the extra (useless,
as it doesn't get used on 99.9% of deployed systems) function call and cache
touches to every single interrupt. Keep in mind that systems acting as
routers will often be sitting in the idle loop when processing interrupts.
At the very least this overhead (which is noticable on profiles) should be
configurable. I don't think that my 586 class embedded routers really need
this crap to be going into the kernel.
-ben
--
"Time is of no importance, Mr. President, only life is important."
Don't Email: <[email protected]>.
On Mon, 26 Feb 2007, Benjamin LaHaise wrote:
>
> Side note: this patch adds several function calls (4), several additional
> L1 cache touches and a generally inefficient code path to *every single
> interrupt that exits from the idle poll*, not to mention the extra (useless,
> as it doesn't get used on 99.9% of deployed systems) function call and cache
> touches to every single interrupt.
It is in fact possible that the floppy failure might just be from some
timing-dependent thing, and the slowdown itself is the problem.
Although I do find that a bit unlikely, since machines these days are
about a million times faster than they used to be, so even if it's
unnecessarily slow, it shouldn't be noticeable for a floppy drive.
> Keep in mind that systems acting as routers will often be sitting in
> the idle loop when processing interrupts. At the very least this
> overhead (which is noticable on profiles) should be configurable. I
> don't think that my 586 class embedded routers really need this crap to
> be going into the kernel.
I'm inclined to agree. Considering that the patch is known to cause
problems, and that it's apparently broken on x86 *anyway* (the
idle_notifier_register function isn't even exported), and considering that
it's clearly bad for interrupt performance and could have been done a lot
better, I would suggest just ripping it all out.
And I think we should do the same for x86-64 too..
Linus
Hi,
On Mon, Feb 26, 2007 at 07:51:19AM -0800, Linus Torvalds wrote:
>
>
> On Mon, 26 Feb 2007, Jiri Slaby wrote:
> >
> > Ok, this commit is the culprit:
> > Commit: 2ff2d3d74705d34ab71b21f54634fcf50d57bdd5
> > Author: Stephane Eranian <[email protected]> Tue, 13 Feb 2007 13:26:22 +0100
> >
> > [PATCH] i386: add idle notifier
>
> Interesting. It doesn't touch floppy at all, but it *does* seem to play
> around with irq state.
>
> In particular, the floppy uses IRQF_DISABLED (which means that it doesn't
> want interrupts enabled when in the irq handler), and I get the feeling
> that the poll_idle() stuff made that not work.
>
> That said, the only thing that *really* seems to change (as far as a
> floopy driver could notice) is the added "exit_idle()" in the do_IRQ()
> sequence, and I'm not seeing that one enabling interrupts.
>
> But the idle sequence definitely does (ie now we disable/enable interrupts
> in cpu_idle(). I'm not seeing why that should matter, though.
>
> Stephane, any ideas?
>
I think this may be related to the issue fixed by Venkatesh here:
http://www.ussg.iu.edu/hypermail/linux/kernel/0611.3/1264.html
The same code was used for i386 but I think for both architectures, the patch
is missing default_idle(). I believe deault idle should look as follows:
void default_idle(void)
{
if (!hlt_counter && boot_cpu_data.hlt_works_ok) {
current_thread_info()->status &= ~TS_POLLING;
/*
* TS_POLLING-cleared state must be visible before we
* test NEED_RESCHED:
*/
smp_mb();
local_irq_disable();
if (!need_resched())
safe_halt(); /* enables interrupts racelessly */
else
local_irq_enable();
current_thread_info()->status |= TS_POLLING;
} else {
/* loop is done by the caller */
----> local_irq_enable();
cpu_relax();
}
}
I do not have any machine with floppy drives. Could you try this?
Thanks.
--
-Stephane
On Mon, 26 Feb 2007, Stephane Eranian wrote:
>
> The same code was used for i386 but I think for both architectures, the patch
> is missing default_idle(). I believe deault idle should look as follows:
Side note - I'll revert it regardless.
We can re-merge it later after
- it has been tested better
- people have added code to make all the expensive stuff go away if there
are no idle notifiers registered (ie "enter_idle" should NOT set the
idle-notify bit unconditionally in the idle loop, it should set if only
if there are people who care.
As it is, that patch was not only apparently buggy, but even if you fix
the actual bug, it's still broken from a performance angle.
Linus
Linus,
On Mon, Feb 26, 2007 at 09:10:22AM -0800, Linus Torvalds wrote:
> >
> > Side note: this patch adds several function calls (4), several additional
> > L1 cache touches and a generally inefficient code path to *every single
> > interrupt that exits from the idle poll*, not to mention the extra (useless,
> > as it doesn't get used on 99.9% of deployed systems) function call and cache
> > touches to every single interrupt.
>
> It is in fact possible that the floppy failure might just be from some
> timing-dependent thing, and the slowdown itself is the problem.
>
> Although I do find that a bit unlikely, since machines these days are
> about a million times faster than they used to be, so even if it's
> unnecessarily slow, it shouldn't be noticeable for a floppy drive.
>
I don't know enough about the floppy driver to comment on this but I would
agree with you here.
> > Keep in mind that systems acting as routers will often be sitting in
> > the idle loop when processing interrupts. At the very least this
> > overhead (which is noticable on profiles) should be configurable. I
> > don't think that my 586 class embedded routers really need this crap to
> > be going into the kernel.
>
> I'm inclined to agree. Considering that the patch is known to cause
> problems, and that it's apparently broken on x86 *anyway* (the
> idle_notifier_register function isn't even exported), and considering that
> it's clearly bad for interrupt performance and could have been done a lot
> better, I would suggest just ripping it all out.
>
The notifier was exported initially but then some people argued I should take
this out because there was no actual users just yet.
As for the performance impact, for non idle tasks, this translates into an
if() return. For the idle, if this is not used, you have a bitop + call
to notifier call chain with empty chain.
With perfmon, we would like to exclude useless kernel execution from being
monitored. That means poll_idle(), default_idle(), mwait_idle(). Yet we
want to capture interrupt handler execution by the idle thread because this
represents useful execution.
The notifier is one mechanism whereby we can dynamically register a callback
to start/stop monitoring to achieve our goal. One could argue the mechanism
is heavy for the usage we make of it. We could certainly add two more
perfmon hooks in the idle code and we would save the atomic call chain notifier
altogether.
Another solution would be to just rely on firmware to stop/restart performance
counters when using mwait or hlt. But we would need something for poll_idle().
An issue with this solution is that it depends on the processor architecture or
implementation, some may not always stop the counters. Yet, we would like to
provide for 'useless execution exclusion' at the interface level for all
architectures. Explicit code may be the only way to provide such guarantee.
If you think there maybe a better way to do this, I am certainly open to suggestions.
Thanks.
--
-Stephane