Hi, everyone,
Starting with Linux 5.7-rc1, my Arrandale (Core i3-380M) laptop is
completely unbootable (the last messages on screen are GRUB's loading
kernel/initramfs). I was hoping someone would have noticed it before
5.7-rc2 had been tagged, but alas. Anyway, I bisected it down to
commit 1567c3e3467cddeb019a7b53ec632f834b6a9239 (x86, sched: Add
support for frequency invariance). After reverting it, the machine
boots again, obviously.
Let me know if you need any further info in order to fix this issue.
Complete bisection log follows.
git bisect start
# good: [7111951b8d4973bda27ff663f2cf18b663d15b48] Linux 5.6
git bisect good 7111951b8d4973bda27ff663f2cf18b663d15b48
# bad: [8f3d9f354286745c751374f5f1fcafee6b3f3136] Linux 5.7-rc1
git bisect bad 8f3d9f354286745c751374f5f1fcafee6b3f3136
# bad: [4646de87d32526ee87b46c2e0130413367fb5362] Merge tag 'mailbox-v5.7' of git://git.linaro.org/landing-teams/working/fujitsu/integration
git bisect bad 4646de87d32526ee87b46c2e0130413367fb5362
# bad: [5b67fbfc32b544daa7f4e0f4e0ecdec4e4895938] Merge tag 'kbuild-v5.7' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild
git bisect bad 5b67fbfc32b544daa7f4e0f4e0ecdec4e4895938
# good: [e129940938d84d8b71074e40a9cc4f69278eb1e1] Merge tag 'regmap-v5.7' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap
git bisect good e129940938d84d8b71074e40a9cc4f69278eb1e1
# bad: [2d385336afcc43732aef1d51528c03f177ecd54e] Merge tag 'irq-core-2020-03-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
git bisect bad 2d385336afcc43732aef1d51528c03f177ecd54e
# good: [7c4fa150714fb319d4e2bb2303ebbd7307b0fb6d] Merge branch 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
git bisect good 7c4fa150714fb319d4e2bb2303ebbd7307b0fb6d
# good: [4b9fd8a829a1eec7442e38afff21d610604de56a] Merge branch 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
git bisect good 4b9fd8a829a1eec7442e38afff21d610604de56a
# good: [9b82f05f869a823d43ea4186f5f732f2924d3693] Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
git bisect good 9b82f05f869a823d43ea4186f5f732f2924d3693
# bad: [313f16e2e35abb833eab5bdebc6ae30699adca18] Merge branch 'sched/rt' into sched/core, to pick up completed topic tree
git bisect bad 313f16e2e35abb833eab5bdebc6ae30699adca18
# bad: [ae1677c0bbe23fe30d634ac0d9f5c147ee4adbc1] arm64/topology: Populate arch_scale_thermal_pressure() for arm64 platforms
git bisect bad ae1677c0bbe23fe30d634ac0d9f5c147ee4adbc1
# bad: [b2b2042b204796190af7c20069ab790a614c36d0] sched/numa: Distinguish between the different task_numa_migrate() failure cases
git bisect bad b2b2042b204796190af7c20069ab790a614c36d0
# bad: [b4fb015eeff7f3e5518a7dbe8061169a3e2f2bc7] sched/rt: Optimize checking group RT scheduler constraints
git bisect bad b4fb015eeff7f3e5518a7dbe8061169a3e2f2bc7
# bad: [eacf0474aec8bdccdc7f19386319127c67be3588] x86, sched: Add support for frequency invariance on ATOM_GOLDMONT*
git bisect bad eacf0474aec8bdccdc7f19386319127c67be3588
# bad: [2a0abc59699896f03bf6f16efb8a3a490511216f] x86, sched: Add support for frequency invariance on SKYLAKE_X
git bisect bad 2a0abc59699896f03bf6f16efb8a3a490511216f
# bad: [1567c3e3467cddeb019a7b53ec632f834b6a9239] x86, sched: Add support for frequency invariance
git bisect bad 1567c3e3467cddeb019a7b53ec632f834b6a9239
# first bad commit: [1567c3e3467cddeb019a7b53ec632f834b6a9239] x86, sched: Add support for frequency invariance
Thanks,
Rui
On Fri, 2020-04-24 at 09:00 +0100, Rui Salvaterra wrote:
> Hi, everyone,
>
> Starting with Linux 5.7-rc1, my Arrandale (Core i3-380M) laptop is
> completely unbootable (the last messages on screen are GRUB's loading
> kernel/initramfs). I was hoping someone would have noticed it before
> 5.7-rc2 had been tagged, but alas. Anyway, I bisected it down to
> commit 1567c3e3467cddeb019a7b53ec632f834b6a9239 (x86, sched: Add
> support for frequency invariance). After reverting it, the machine
> boots again, obviously.
>
> Let me know if you need any further info in order to fix this issue.
>
> Complete bisection log follows.
> [...]
Hello Rui,
thanks for the report.
The problem you encountered is due to a bug where the code doesn't work on
machines with less than 4 physical CPU cores. It is fixed by this patch
series:
https://lore.kernel.org/lkml/[email protected]/
The series has been merged into the sched/urgent branch of the "tip" tree
(https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git). The commit
fixing the bug you observed is 23ccee22e834 (x86, sched: Account for CPUs
with less than 4 cores in freq. invariance); it will be merged into Linus'
tree at some point.
Thanks,
Giovanni Gherdovich
On Fri, 24 Apr 2020 at 09:11, Giovanni Gherdovich <[email protected]> wrote:
> [...]
Hi, Giovanni,
Great, thanks for the quick turnaround. I wager it will have hit
Linus' tree by 5.7-rc3.
Best regards,
Rui
Hi, again,
On Fri, 24 Apr 2020 at 09:11, Giovanni Gherdovich <[email protected]> wrote:
>
> The problem you encountered is due to a bug where the code doesn't work on
> machines with less than 4 physical CPU cores.
Thinking about it more thoroughly, are you sure that's the bug I'm
hitting? I'm asking because I have an i5-6200U (Skylake, also dual
core, dual thread, like the i3-380M) laptop which runs 5.7-rc1/rc2
just fine.
Thanks,
Rui
On Fri, 2020-04-24 at 18:39 +0100, Rui Salvaterra wrote:
> Hi, again,
>
> On Fri, 24 Apr 2020 at 09:11, Giovanni Gherdovich <[email protected]> wrote:
> >
> > The problem you encountered is due to a bug where the code doesn't work on
> > machines with less than 4 physical CPU cores.
>
> Thinking about it more thoroughly, are you sure that's the bug I'm
> hitting? I'm asking because I have an i5-6200U (Skylake, also dual
> core, dual thread, like the i3-380M) laptop which runs 5.7-rc1/rc2
> just fine.
>
> Thanks,
> Rui
There is an easy way to tell (besides compiling with those patches on top and
checking if it works): run the command "turbostat --interval 1 sleep 0". The
output should tell you the content of the register MSR_TURBO_RATIO_LIMIT.
If bits 31:24 are zero, you hit the bug (the code divides by that value),
otherwise you don't. Some 2-core / 4-thread CPUs have a non-zero value there
(even if it doesn't mean much), while others have zero.
The Intel Software Developer Manual (SDM) says the register content is like
this:
Bit Fields   Bit Description
 7:0         Maximum turbo ratio limit of 1 core active.
15:8         Maximum turbo ratio limit of 2 core active.
23:16        Maximum turbo ratio limit of 3 core active.
31:24        Maximum turbo ratio limit of 4 core active.
39:32        Maximum turbo ratio limit of 5 core active.
47:40        Maximum turbo ratio limit of 6 core active.
55:48        Maximum turbo ratio limit of 7 core active.
63:56        Maximum turbo ratio limit of 8 core active.
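To make the layout concrete, here is a minimal sketch (plain Python, not the
kernel's code) that splits a raw MSR_TURBO_RATIO_LIMIT value into its eight
8-bit fields and checks whether the 4-core field in bits 31:24 is zero. The
function names are made up for illustration only.

```python
def turbo_ratios(msr: int) -> list:
    """Return the eight 8-bit ratio fields; index 0 is "1 core active"."""
    return [(msr >> (8 * i)) & 0xFF for i in range(8)]


def four_core_ratio_is_zero(msr: int) -> bool:
    """True if bits 31:24 (the "4 core active" field) are zero."""
    return ((msr >> 24) & 0xFF) == 0


if __name__ == "__main__":
    # Two raw register values as turbostat would print them.
    for value in (0x00001313, 0x1B1B1B1B1B1D):
        print(hex(value), turbo_ratios(value), four_core_ratio_is_zero(value))
```

A value for which four_core_ratio_is_zero() returns True is one where the
4-core field is zero, i.e. the case described above as triggering the bug.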
As I wrote above, some 2c/4t CPUs will report (correctly) that their 4-core
turbo frequency is zero, such as this Intel Core i5-430M (Arrandale) where
I've seen turbostat saying:
cpu1: MSR_TURBO_RATIO_LIMIT: 0x00001313
19 * 133.3 = 2533.3 MHz max turbo 2 active cores
19 * 133.3 = 2533.3 MHz max turbo 1 active cores
By contrast, my laptop has an Intel Core i5-5300U (Broadwell, also
2 cores / 4 threads) and it reports:
cpu3: MSR_TURBO_RATIO_LIMIT: 0x1b1b1b1b1b1d
27 * 100.0 = 2700.0 MHz max turbo 6 active cores
27 * 100.0 = 2700.0 MHz max turbo 5 active cores
27 * 100.0 = 2700.0 MHz max turbo 4 active cores
27 * 100.0 = 2700.0 MHz max turbo 3 active cores
27 * 100.0 = 2700.0 MHz max turbo 2 active cores
29 * 100.0 = 2900.0 MHz max turbo 1 active cores
You can see above that the 4-core turbo frequency is declared as 2.7 GHz, even
though that's nonsense because there aren't 4 cores. In any case, this CPU
wouldn't trigger the bug, just like your Skylake.
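The arithmetic in the two turbostat excerpts above is just ratio times bus
clock. Below is a rough sketch that rebuilds those lines from the raw register
value. It is not turbostat's actual code: the function name is hypothetical,
and the bus clock values (133.33 MHz for the Arrandale, 100.0 MHz for the
Broadwell) are assumptions inferred from the quoted output, passed in by the
caller.

```python
def turbo_lines(msr: int, bclk_mhz: float) -> list:
    """Format one line per non-zero ratio field, highest core count first."""
    lines = []
    for cores in range(8, 0, -1):
        # The field for N active cores sits in bits (8*N - 1):(8*(N - 1)).
        ratio = (msr >> (8 * (cores - 1))) & 0xFF
        if ratio:  # fields that are zero are skipped
            lines.append(
                f"{ratio} * {bclk_mhz:.1f} = {ratio * bclk_mhz:.1f} MHz "
                f"max turbo {cores} active cores"
            )
    return lines


if __name__ == "__main__":
    print("\n".join(turbo_lines(0x00001313, 133.33)))
    print("\n".join(turbo_lines(0x1B1B1B1B1B1D, 100.0)))
```

For 0x00001313 no line is produced for 3 or 4 active cores, which is the
zero field the bug depends on.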
Thanks,
Giovanni
On Sat, 25 Apr 2020 at 17:01, Giovanni Gherdovich <[email protected]> wrote:
> [...]
Hi again, Giovanni,
Thanks for the detailed and insightful explanation and sorry for not
replying earlier. This was indeed the bug I was hitting, as my
Arrandale laptop is now booting 5.7-rc3 just fine.
Best regards,
Rui