From: Ville Syrjälä <[email protected]>
This reverts commit 608008a45798fe9e2aee04f99b5270ea57c1376f.
It breaks wifi on my pentium 3 Fujitsu-Siemens Lifebook S6010
laptop. Scanning for APs doesn't seem to work most of the time,
and, even when it manages to find some APs it never manages to
authenticate successfully. dmesg is just littered with:
"wlan0: send auth to ... (try 1/3)
wlan0: send auth to ... (try 2/3)
wlan0: send auth to ... (try 3/3)
wlan0: authentication with ... timed out"
Presumably also USB is broken on account of the following noise
in dmesg:
"usb usb2-port2: Cannot enable. Maybe the USB cable is bad?".
Cc: Dou Liyang <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Pavel Tatashin <[email protected]>
Cc: <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Signed-off-by: Ville Syrjälä <[email protected]>
---
arch/x86/kernel/tsc.c | 22 ++++++++++------------
1 file changed, 10 insertions(+), 12 deletions(-)
diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
index 6490f618e096..203edfabe813 100644
--- a/arch/x86/kernel/tsc.c
+++ b/arch/x86/kernel/tsc.c
@@ -182,7 +182,7 @@ static void __init cyc2ns_init_boot_cpu(void)
}
/*
- * Secondary CPUs do not run through tsc_init(), so set up
+ * Secondary CPUs do not run through cyc2ns_init(), so set up
* all the scale factors for all CPUs, assuming the same
* speed as the bootup CPU. (cpufreq notifiers will fix this
* up if their speed diverges)
@@ -1389,7 +1389,7 @@ static bool __init determine_cpu_tsc_frequencies(bool early)
}
/*
- * Trust non-zero tsc_khz as authoritative,
+ * Trust non-zero tsc_khz as authorative,
* and use it to sanity check cpu_khz,
* which will be off if system timer is off.
*/
@@ -1421,14 +1421,6 @@ static unsigned long __init get_loops_per_jiffy(void)
return lpj;
}
-static void __init tsc_enable_sched_clock(void)
-{
- /* Sanitize TSC ADJUST before cyc2ns gets initialized */
- tsc_store_and_check_tsc_adjust(true);
- cyc2ns_init_boot_cpu();
- static_branch_enable(&__use_tsc);
-}
-
void __init tsc_early_init(void)
{
if (!boot_cpu_has(X86_FEATURE_TSC))
@@ -1437,7 +1429,10 @@ void __init tsc_early_init(void)
return;
loops_per_jiffy = get_loops_per_jiffy();
- tsc_enable_sched_clock();
+ /* Sanitize TSC ADJUST before cyc2ns gets initialized */
+ tsc_store_and_check_tsc_adjust(true);
+ cyc2ns_init_boot_cpu();
+ static_branch_enable(&__use_tsc);
}
void __init tsc_init(void)
@@ -1461,10 +1456,13 @@ void __init tsc_init(void)
setup_clear_cpu_cap(X86_FEATURE_TSC_DEADLINE_TIMER);
return;
}
- tsc_enable_sched_clock();
+ /* Sanitize TSC ADJUST before cyc2ns gets initialized */
+ tsc_store_and_check_tsc_adjust(true);
+ cyc2ns_init_boot_cpu();
}
cyc2ns_init_secondary_cpus();
+ static_branch_enable(&__use_tsc);
if (!no_sched_irq_time)
enable_sched_clock_irqtime();
--
2.16.4
Ville,
On Mon, 10 Sep 2018, Ville Syrjala wrote:
> From: Ville Syrjälä <[email protected]>
>
> This reverts commit 608008a45798fe9e2aee04f99b5270ea57c1376f.
>
> It breaks wifi on my pentium 3 Fujitsu-Siemens Lifebook S6010
> laptop. Scanning for APs doesn't seem to work most of the time,
> and, even when it manages to find some APs it never manages to
> authenticate successfully. dmesg is just littered with:
> "wlan0: send auth to ... (try 1/3)
> wlan0: send auth to ... (try 2/3)
> wlan0: send auth to ... (try 3/3)
> wlan0: authentication with ... timed out"
I asked for that before and I really do not understand why you do not even
make an attempt to report an issue first and allow the developers to work
with you to figure out what exactly is the problem. All you do is to send
an revert patch with a changelog which describes symptoms and probably
breaks more than it cures. Not really helpful, really.
It's surely helpful to know that you bisected it to that commit and
reverting it helps. Can you please provide more detailes information like
dmesg of an good and a bad boot?
Thanks,
tglx
Hi Ville,
The failure is surprising, because the commit is tiny, and almost does
not change the code logic.
From looking through the commit, the only functional difference this
commit makes is:
static_branch_enable(&__use_tsc) was called unconditionally from
tsc_init(), but after the commit only when tsc_khz == 0.
I wonder if on p3 static_branch_enable(&__use_tsc) fails to enable
early, when it supposed to? But, I would first try to make that
unconditional call again, and see if this fixes the problem, and then
figure out why it was not enabled when it was supposed to.
So, in tsc_init(void)
First try to add this one line back:
cyc2ns_init_secondary_cpus();
- static_branch_enable(&__use_tsc);
See if it fixes everything, and lets work from there.
Thank you,
Pavel
On 9/10/18 8:48 AM, Thomas Gleixner wrote:
> Ville,
>
> On Mon, 10 Sep 2018, Ville Syrjala wrote:
>
>> From: Ville Syrjälä <[email protected]>
>>
>> This reverts commit 608008a45798fe9e2aee04f99b5270ea57c1376f.
>>
>> It breaks wifi on my pentium 3 Fujitsu-Siemens Lifebook S6010
>> laptop. Scanning for APs doesn't seem to work most of the time,
>> and, even when it manages to find some APs it never manages to
>> authenticate successfully. dmesg is just littered with:
>> "wlan0: send auth to ... (try 1/3)
>> wlan0: send auth to ... (try 2/3)
>> wlan0: send auth to ... (try 3/3)
>> wlan0: authentication with ... timed out"
>
> I asked for that before and I really do not understand why you do not even
> make an attempt to report an issue first and allow the developers to work
> with you to figure out what exactly is the problem. All you do is to send
> an revert patch with a changelog which describes symptoms and probably
> breaks more than it cures. Not really helpful, really.
>
> It's surely helpful to know that you bisected it to that commit and
> reverting it helps. Can you please provide more detailes information like
> dmesg of an good and a bad boot?
>
> Thanks,
>
> tglx
>
>
>
On Mon, Sep 10, 2018 at 02:48:45PM +0200, Thomas Gleixner wrote:
> Ville,
>
> On Mon, 10 Sep 2018, Ville Syrjala wrote:
>
> > From: Ville Syrj?l? <[email protected]>
> >
> > This reverts commit 608008a45798fe9e2aee04f99b5270ea57c1376f.
> >
> > It breaks wifi on my pentium 3 Fujitsu-Siemens Lifebook S6010
> > laptop. Scanning for APs doesn't seem to work most of the time,
> > and, even when it manages to find some APs it never manages to
> > authenticate successfully. dmesg is just littered with:
> > "wlan0: send auth to ... (try 1/3)
> > wlan0: send auth to ... (try 2/3)
> > wlan0: send auth to ... (try 3/3)
> > wlan0: authentication with ... timed out"
>
> I asked for that before and I really do not understand why you do not even
> make an attempt to report an issue first and allow the developers to work
> with you to figure out what exactly is the problem. All you do is to send
> an revert patch with a changelog which describes symptoms and probably
> breaks more than it cures. Not really helpful, really.
You're reading way too much into this. The revert is just a point to
start the conversion. I've found that it's the best way to get the
attention of the relevant developers. Other kind of regression
reports have an unfortunate habit of disappearing into /dev/null.
>
> It's surely helpful to know that you bisected it to that commit and
> reverting it helps. Can you please provide more detailes information like
> dmesg of an good and a bad boot?
I think the only real difference (apart from the USB noise) is:
- clocksource: tsc-early: mask: 0xffffffffffffffff max_cycles: 0x1cbd01c4e18, max_idle_ns: 881590491211 ns
- Calibrating delay loop (skipped), value calculated using timer frequency.. 1718674.70 BogoMIPS (lpj=2863311530)
+ clocksource: tsc-early: mask: 0xffffffffffffffff max_cycles: 0x1cbc55fe3af, max_idle_ns: 881590592998 ns
+ Calibrating delay loop (skipped), value calculated using timer frequency.. 859455.59 BogoMIPS (lpj=1431852151)
Full logs attached.
--
Ville Syrj?l?
Intel
On Mon, Sep 10, 2018 at 05:07:10PM +0300, Ville Syrjälä wrote:
> You're reading way too much into this. The revert is just a point to
> start the conversion. I've found that it's the best way to get the
> attention of the relevant developers. Other kind of regression
> reports have an unfortunate habit of disappearing into /dev/null.
That's some strange "logic".
You're sending a patch which has "[PATCH]" in the subject but now you
say it is not really a patch but only a way to get people's attention?!?
And nothing in it says so anywhere - it looks like an actual patch and
all.
If you do that a couple of times I'm afraid the opposite might happen -
such "patches" would get ignored completely.
--
Regards/Gruss,
Boris.
Good mailing practices for 400: avoid top-posting and trim the reply.
On Mon, Sep 10, 2018 at 04:47:20PM +0200, Borislav Petkov wrote:
> On Mon, Sep 10, 2018 at 05:07:10PM +0300, Ville Syrj?l? wrote:
> > You're reading way too much into this. The revert is just a point to
> > start the conversion. I've found that it's the best way to get the
> > attention of the relevant developers. Other kind of regression
> > reports have an unfortunate habit of disappearing into /dev/null.
>
> That's some strange "logic".
>
> You're sending a patch which has "[PATCH]" in the subject but now you
> say it is not really a patch but only a way to get people's attention?!?
But it is a patch, and if it happens to get accepted as is so be
it. If not, it's a good place where to start the conversation on
how to fix the bug in another way.
You guys seem to have a notion that anything which says '[PATCH]'
is somehow final. In my book any patch is up for debate. Nothing
special about this one in that regard.
--
Ville Syrj?l?
Intel
On Mon, Sep 10, 2018 at 06:09:10PM +0300, Ville Syrjälä wrote:
> But it is a patch, and if it happens to get accepted as is so be
> it. If not, it's a good place where to start the conversation on
> how to fix the bug in another way.
Uh, more of that "logic".
It is a patch but not really, if it is applied, good, if not, also good.
WTF dude?
> You guys seem to have a notion that anything which says '[PATCH]'
> is somehow final. In my book any patch is up for debate. Nothing
> special about this one in that regard.
Well, let's see: imagine you're a maintainer. You get gazillion patches
a day. And you think, oh well, I need to review and possibly apply this.
And then move on to the next one. Because everyone is asking, when is
she/he going to apply my damn patches...
But nooo, *some* of the patches are special - they're a conversation
starter *only*! But also if applied, that's fine too.
What a bunch of bull!
What's wrong with sending a mail tagged with "[REGRESSION]" - this looks
like the tag people have adopted - and explain what the problem is, what
you've bisected it to and what your observations are? Like everyone else
reporting bugs/regressions/...
--
Regards/Gruss,
Boris.
Good mailing practices for 400: avoid top-posting and trim the reply.
On Mon, Sep 10, 2018 at 05:25:38PM +0200, Borislav Petkov wrote:
> On Mon, Sep 10, 2018 at 06:09:10PM +0300, Ville Syrj?l? wrote:
> > But it is a patch, and if it happens to get accepted as is so be
> > it. If not, it's a good place where to start the conversation on
> > how to fix the bug in another way.
>
> Uh, more of that "logic".
>
> It is a patch but not really, if it is applied, good, if not, also good.
> WTF dude?
>
> > You guys seem to have a notion that anything which says '[PATCH]'
> > is somehow final. In my book any patch is up for debate. Nothing
> > special about this one in that regard.
>
> Well, let's see: imagine you're a maintainer. You get gazillion patches
> a day. And you think, oh well, I need to review and possibly apply this.
> And then move on to the next one. Because everyone is asking, when is
> she/he going to apply my damn patches...
Sounds to me like the maintainer should figure out how to delegate some
of the load a bit. Or just go on vacation and ignore all mails. I hear
stress isn't good for you.
>
> But nooo, *some* of the patches are special - they're a conversation
> starter *only*! But also if applied, that's fine too.
That's what all patches are. No should be applying unreviewed patches
blindly.
Also often a revert is a perfect way to handle regressions. It gets
the angry users off your back ASAP allowing you to fix the bug
properly without having to rush it. I only wish all regression I've
caused would have been caught early enough for a revert to apply
cleanly.
Even if the revert isn't applied the fact that the mail has the
offending code right there makes the disussion easier. No need to
git fetch; git show <copy paste sha>; copy paste some code snippets
into the mail, etc.).
>
> What a bunch of bull!
Calm down. No one is out to revert all your patches.
>
> What's wrong with sending a mail tagged with "[REGRESSION]" - this looks
> like the tag people have adopted - and explain what the problem is, what
> you've bisected it to and what your observations are? Like everyone else
> reporting bugs/regressions/...
Maybe you can propose a new git-regression tool then? And document that
you want bugs reported using it? Ideally I'd say it should do almost
exactly what git revert does except s/revert/regression/. Though I
suppose it could include the original diff instead of the reverse.
Now, how about we stop this pointless "logic" discussion and
focus on the techinal stuff from now on?
--
Ville Syrj?l?
Intel
On Mon, Sep 10, 2018 at 06:51:13PM +0300, Ville Syrjälä wrote:
> Sounds to me like the maintainer should figure out how to delegate some
> of the load a bit. Or just go on vacation and ignore all mails. I hear
> stress isn't good for you.
Bullshit.
> That's what all patches are. No should be applying unreviewed patches
> blindly.
>
> Also often a revert is a perfect way to handle regressions. It gets
> the angry users off your back ASAP allowing you to fix the bug
More bullshit.
> Calm down. No one is out to revert all your patches.
Even more bullshit.
> Maybe you can propose a new git-regression tool then? And document that
> you want bugs reported using it? Ideally I'd say it should do almost
> exactly what git revert does except s/revert/regression/. Though I
> suppose it could include the original diff instead of the reverse.
Even even more bullshit.
> Now, how about we stop this pointless "logic" discussion and
> focus on the techinal stuff from now on?
That's the only constructive and serious thing you've said so far.
Let's.
--
Regards/Gruss,
Boris.
Good mailing practices for 400: avoid top-posting and trim the reply.
On Mon, 10 Sep 2018, Ville Syrj?l? wrote:
> On Mon, Sep 10, 2018 at 02:48:45PM +0200, Thomas Gleixner wrote:
> > On Mon, 10 Sep 2018, Ville Syrjala wrote:
> > I asked for that before and I really do not understand why you do not even
> > make an attempt to report an issue first and allow the developers to work
> > with you to figure out what exactly is the problem. All you do is to send
> > an revert patch with a changelog which describes symptoms and probably
> > breaks more than it cures. Not really helpful, really.
>
> You're reading way too much into this. The revert is just a point to
> start the conversion. I've found that it's the best way to get the
> attention of the relevant developers. Other kind of regression
> reports have an unfortunate habit of disappearing into /dev/null.
1) My workflow makes things tagged as BUG and REGRESSION urgent
automatically while [PATCH] just is queued to the normal pile of
backlog, i.e. at the end. It just sprang into my eyes by chance, but in
general you might just get the contrary of what you are looking for.
2) A proper bug report with proper information (it's documented what should
be provided), is way more worth than a patch with a mostly useless
change log, which forces me to ask for the proper information instead of
having it right away.
Thanks,
tglx
On Mon, Sep 10, 2018 at 06:23:49PM +0200, Thomas Gleixner wrote:
> On Mon, 10 Sep 2018, Ville Syrj?l? wrote:
> > On Mon, Sep 10, 2018 at 02:48:45PM +0200, Thomas Gleixner wrote:
> > > On Mon, 10 Sep 2018, Ville Syrjala wrote:
> > > I asked for that before and I really do not understand why you do not even
> > > make an attempt to report an issue first and allow the developers to work
> > > with you to figure out what exactly is the problem. All you do is to send
> > > an revert patch with a changelog which describes symptoms and probably
> > > breaks more than it cures. Not really helpful, really.
> >
> > You're reading way too much into this. The revert is just a point to
> > start the conversion. I've found that it's the best way to get the
> > attention of the relevant developers. Other kind of regression
> > reports have an unfortunate habit of disappearing into /dev/null.
>
> 1) My workflow makes things tagged as BUG and REGRESSION urgent
> automatically while [PATCH] just is queued to the normal pile of
> backlog, i.e. at the end. It just sprang into my eyes by chance, but in
> general you might just get the contrary of what you are looking for.
Ah. Might be nice to document that somewhere. I might have to type up
that git-regression tool for myself, because I'm lazy.
>
> 2) A proper bug report with proper information (it's documented what should
> be provided), is way more worth than a patch with a mostly useless
> change log, which forces me to ask for the proper information instead of
> having it right away.
I do agree that not having to ask for more information would be nice,
but hard to generalize because every subsystem needs different things.
In this case you asked for the dmesg, which isn't even mentioned in
Documentation/admin-guide/reporting-bugs.rst as far as I can see.
So I'm not quite sure which documentation you're referring to here.
--
Ville Syrj?l?
Intel
On Mon, 10 Sep 2018, Ville Syrj?l? wrote:
Good: 1718674.70 BogoMIPS (lpj=2863311530)
Bad: 859455.59 BogoMIPS (lpj=1431852151)
while both kernels agree on the CPU frequency of 996MHz. This pretty much
smells like the 32bit LPJ conversion bug which got fixed in rc3. Does the
problem persist with rc3?
Thanks,
tglx
On Mon, 10 Sep 2018, Ville Syrj?l? wrote:
> On Mon, Sep 10, 2018 at 06:23:49PM +0200, Thomas Gleixner wrote:
> > 1) My workflow makes things tagged as BUG and REGRESSION urgent
> > automatically while [PATCH] just is queued to the normal pile of
> > backlog, i.e. at the end. It just sprang into my eyes by chance, but in
> > general you might just get the contrary of what you are looking for.
>
> Ah. Might be nice to document that somewhere. I might have to type up
> that git-regression tool for myself, because I'm lazy.
Well, it's probably different between maintainers, but it's common practice
to have '[REGRESION] sub/sys got fubarred' in the subject.
> > 2) A proper bug report with proper information (it's documented what should
> > be provided), is way more worth than a patch with a mostly useless
> > change log, which forces me to ask for the proper information instead of
> > having it right away.
>
> I do agree that not having to ask for more information would be nice,
> but hard to generalize because every subsystem needs different things.
>
> In this case you asked for the dmesg, which isn't even mentioned in
> Documentation/admin-guide/reporting-bugs.rst as far as I can see.
> So I'm not quite sure which documentation you're referring to here.
I didn't look, but I expected dmesg to be part of it and a lot of people
provide it as well as the start point of their bisection. Again, I had to
do a shot into the dark and ask you whether it's fixed in -rc3. bisect
start would have told me.
So again. That revert patch habit does not make my life easier at all.
Thanks,
tglx
On Mon, Sep 10, 2018 at 07:02:43PM +0200, Thomas Gleixner wrote:
> On Mon, 10 Sep 2018, Ville Syrj?l? wrote:
> > On Mon, Sep 10, 2018 at 06:23:49PM +0200, Thomas Gleixner wrote:
> > > 1) My workflow makes things tagged as BUG and REGRESSION urgent
> > > automatically while [PATCH] just is queued to the normal pile of
> > > backlog, i.e. at the end. It just sprang into my eyes by chance, but in
> > > general you might just get the contrary of what you are looking for.
> >
> > Ah. Might be nice to document that somewhere. I might have to type up
> > that git-regression tool for myself, because I'm lazy.
>
> Well, it's probably different between maintainers, but it's common practice
> to have '[REGRESION] sub/sys got fubarred' in the subject.
>
> > > 2) A proper bug report with proper information (it's documented what should
> > > be provided), is way more worth than a patch with a mostly useless
> > > change log, which forces me to ask for the proper information instead of
> > > having it right away.
> >
> > I do agree that not having to ask for more information would be nice,
> > but hard to generalize because every subsystem needs different things.
> >
> > In this case you asked for the dmesg, which isn't even mentioned in
> > Documentation/admin-guide/reporting-bugs.rst as far as I can see.
> > So I'm not quite sure which documentation you're referring to here.
>
> I didn't look, but I expected dmesg to be part of it and a lot of people
> provide it as well as the start point of their bisection. Again, I had to
> do a shot into the dark and ask you whether it's fixed in -rc3. bisect
> start would have told me.
>
> So again. That revert patch habit does not make my life easier at all.
OK. I'll keep that in mind and try to stick REGRESSION format in the
future.
--
Ville Syrj?l?
Intel
On Mon, Sep 10, 2018 at 06:53:54PM +0200, Thomas Gleixner wrote:
> On Mon, 10 Sep 2018, Ville Syrj?l? wrote:
>
> Good: 1718674.70 BogoMIPS (lpj=2863311530)
> Bad: 859455.59 BogoMIPS (lpj=1431852151)
>
> while both kernels agree on the CPU frequency of 996MHz. This pretty much
> smells like the 32bit LPJ conversion bug which got fixed in rc3. Does the
> problem persist with rc3?
Indeed looks to be fixed by commit 17f6bac22493 ("x86/tsc:
Prevent result truncation on 32bit"). I both cherry-picked that
on top of rc2 to make sure it really is that commit, and also
tested plain rc3 to make sure it still works.
And comparing the bogomips between the three relevant commits makes
the bug pretty obvious in hindsight:
Calibrating delay loop (skipped), value calculated using timer frequency.. 1718674.70 BogoMIPS (lpj=2863311530)
Calibrating delay loop (skipped), value calculated using timer frequency.. 859455.59 BogoMIPS (lpj=1431852151)
Calibrating delay loop (skipped), value calculated using timer frequency.. 1994.50 BogoMIPS (lpj=3322410)
I suppose we just got very lucky with older kernels.
--
Ville Syrj?l?
Intel
On Tue, 11 Sep 2018, Ville Syrj?l? wrote:
> On Mon, Sep 10, 2018 at 06:53:54PM +0200, Thomas Gleixner wrote:
> > On Mon, 10 Sep 2018, Ville Syrj?l? wrote:
> >
> > Good: 1718674.70 BogoMIPS (lpj=2863311530)
> > Bad: 859455.59 BogoMIPS (lpj=1431852151)
> >
> > while both kernels agree on the CPU frequency of 996MHz. This pretty much
> > smells like the 32bit LPJ conversion bug which got fixed in rc3. Does the
> > problem persist with rc3?
>
> Indeed looks to be fixed by commit 17f6bac22493 ("x86/tsc:
> Prevent result truncation on 32bit").
Not a surprise. That was pretty clear when I looked at dmesg because
bogomips were very bogus for both variants.
So can you now understand why I prefer a proper bug/regression report with
as much information as possible over a revert patch which lacks a proper
explanation and does not even fix the underlying issue at all?
Reverting that patch as you can see from bogus mips solves exactly
nothing. It's pure chance that it booted.
You could have spared my and your time by
1) checking whether the problem persist in the latest upstream -rc
first
2) Providing useful information upfront
I don't care much about your time and how you think what's the best way to
get a discussion started, but I care very much about my time being wasted
for pointless discsussions which can be avoided more or less completely.
> I suppose we just got very lucky with older kernels.
The problem got initialy introduced with
commit dd759d93f4dd ("x86/timers: Add simple udelay calibration")
but this was not fatal because it only affected the very early boot and was
fixed up by the correct LPJ calculation in tsc_init() before any serious
user was affected. I'll whip up a fix for the affected stable kernels
nevertheless.
Thanks
tglx