2014-04-01 18:17:08

by Paul Gortmaker

Subject: Regression in intel_idle on Avaton/Rangely Mohon Peak board

Hi Len,

I've got an eval board with a 1.7GHz Avoton/C2000 that hangs at boot
shortly after the idle driver registration -- typically half a dozen
dmesg lines later, around rtc init, or net stack init.

It may be that this early board/early bios makes it a non-issue for
mainline, but I figured I'd better mention it anyway.

Problem starts with commit v3.12-rc4-32-gfab04b2208dd
("intel_idle: Support Intel Atom Processor C2000 Product Family").
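The commit names in this message are in `git describe` form: nearest tag, number of commits since that tag, then `g` plus the abbreviated hash. A minimal Python sketch of splitting such a name follows; the helper name `parse_describe` is made up for illustration and is not part of any tool mentioned in the thread.

```python
import re

# `git describe` style name: <tag>-<commits since tag>-g<abbreviated hash>
DESCRIBE_RE = re.compile(r"^(?P<tag>.+)-(?P<count>\d+)-g(?P<sha>[0-9a-f]+)$")


def parse_describe(name):
    """Split a describe-style name into (tag, commits_since_tag, short_hash)."""
    match = DESCRIBE_RE.match(name)
    if match is None:
        raise ValueError(f"not a git-describe style name: {name!r}")
    return match.group("tag"), int(match.group("count")), match.group("sha")


if __name__ == "__main__":
    # The bisected commit named in this message.
    print(parse_describe("v3.12-rc4-32-gfab04b2208dd"))
```

The short hash is the part to pass to `git show` or `git log` when looking the commit up.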

Even though a bisect leads here, this commit shouldn't be considered
in isolation, since it depends on these earlier v3.12 commits:

commit v3.12-rc2-1-geba682a5aeb6
"intel_idle: shrink states tables"

commit v3.12-rc2-2-g9d046ccb9808
"intel_idle: mark states tables with __initdata tag"

...and then these subsequent v3.13 fixups:

commit v3.13-rc1-1-g22e580d07f65
"intel_idle: Fixed C6 state on Avoton/Rangeley processors"

commit v3.13-rc7-1-gba0dc81ed5d9
"Revert "intel_idle: mark states tables with __initdata tag""

commit v3.13-rc7-2-g88390996c95b
"intel_idle: close avn_cstates array with correct marker"

However, even with all these present in v3.13-final, the board
still does the same thing as with the bisected fab04b2208dd commit.
The same goes for v3.14-final and today's linux-next -- i.e. the
regression persists.

The interesting part is that a nearly identical board, but with
different (newer/faster) CPU and newer BIOS doesn't have the hang.

Hardware details:

board: intel Mohon Peak CRB GA-95PEV ALPHA2 rev0.3
BIOS: EDVLCRB1.86B.0010.R00.1303272109 03/27/2013
CPU (dmidecode):
Family: Pentium 4
ID: D0 06 04 00 FF FB EB 9F
Signature: Type 0, Family 6, Model 77, Stepping 0
Version: Genuine Intel(R) CPU 4000 @ 1.70GHz

The near-identical board that doesn't hang, for comparison, has:

BIOS: EDVLCRB1.86B.0017.R00.1305271414 05/27/2013
CPU: (dmidecode):
Family: Pentium 4
ID: D0 06 04 00 FF FB EB 9F
Signature: Type 0, Family 6, Model 77, Stepping 0
Version: Genuine Intel(R) CPU 4000 @ 2.40GHz

Since it appears only the clock multiplier has changed, I'm
guessing we've hit a bug in the earlier BIOS. These boards
don't have a flash-capable BIOS AFAICT -- looks like the little
trap-door socket style that houses a removable chip?

I have the full dmidecode files, dmesg from 3.11-distro kernel,
lspci, and /proc/cpuinfo from both boards, if that is needed.

Paul.


2014-04-01 21:59:43

by Brown, Len

Subject: RE: Regression in intel_idle on Avaton/Rangely Mohon Peak board

> I've got an eval board with a 1.7GHz Avaton/C2000 that hangs at boot
> shortly after the idle driver registration -- typically 1/2 dozen
> dmesg lines later, around rtc init, or net stack init.

Paul,
Please boot the failing board with "intel_idle.max_cstate=0"
to disable intel_idle entirely, and then show the C-states
exported by acpi_idle, which presumably are stable on both boards:

dmesg | grep idle
grep . /sys/devices/system/cpu/cpu0/cpuidle/*/*

Then go back and boot with "intel_idle.max_cstate=N"
where N is incremented by 1 until the system fails,
and note the largest N that still works.
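For collecting the per-state data in one go, the `grep .` over sysfs can be sketched in Python as below. This is only an illustration: the function name `read_cpuidle_states` is hypothetical, and it assumes the usual `stateN/<attribute>` layout under the cpuidle directory quoted above.

```python
from pathlib import Path


def read_cpuidle_states(cpu_dir="/sys/devices/system/cpu/cpu0/cpuidle"):
    """Return {state directory name: {attribute file name: contents}}."""
    base = Path(cpu_dir)
    states = {}
    if not base.is_dir():
        # cpuidle not available (or a wrong path): nothing to report.
        return states
    for state_dir in sorted(base.glob("state*")):
        attrs = {}
        for entry in sorted(state_dir.iterdir()):
            if not entry.is_file():
                continue
            try:
                attrs[entry.name] = entry.read_text().strip()
            except OSError:
                # Some sysfs attributes may not be readable; skip them.
                continue
        states[state_dir.name] = attrs
    return states


if __name__ == "__main__":
    # Same "path:value" shape that `grep . */*` prints.
    for state, attrs in read_cpuidle_states().items():
        for key, value in attrs.items():
            print(f"{state}/{key}:{value}")
```

Running it once per boot (max_cstate=0, 1, 2, ...) and saving the output gives directly comparable files for the two boards.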

> The interesting part is that a nearly identical board, but with
> different (newer/faster) CPU and newer BIOS doesn't have the hang.

Possibly an electrical bug in the earlier board.
Maybe they worked around it by disabling a C-state in ACPI
and didn't test upstream Linux?

I'd be interested in the acpi_idle output above for both the
new and old boards to see if they are exporting different states
on the two boards.

dmidecode isn't useful in this case. The CPUID in /proc/cpuinfo
may be useful if the problem turns out to be associated with
some stepping.

thanks,
-Len

2014-04-01 23:45:59

by Paul Gortmaker

Subject: Re: Regression in intel_idle on Avaton/Rangely Mohon Peak board

[RE: Regression in intel_idle on Avaton/Rangely Mohon Peak board] On 01/04/2014 (Tue 17:59) Brown, Len wrote:

> > I've got an eval board with a 1.7GHz Avaton/C2000 that hangs at boot
> > shortly after the idle driver registration -- typically 1/2 dozen
> > dmesg lines later, around rtc init, or net stack init.
>
> Paul,
> Please boot the failing board with "intel_idle.max_cstate=0"
> to disable intel_idle entirely, and then show the C-states
> exported by acpi_idle, which presumably are stable on both boards:
>
> dmesg | grep idle
> grep . /sys/devices/system/cpu/cpu0/cpuidle/*/*
>
> Then go back and boot with "intel_idle.max_cstate=N"
> where N is incremented by 1 until when the system fails
> and note the largest N that still works.

OK, I kept the failing board on loan, since I expected a reply that
would contain "can you try this..." :) I will be able to do the
above tomorrow (EST).

>
> > The interesting part is that a nearly identical board, but with
> > different (newer/faster) CPU and newer BIOS doesn't have the hang.
>
> Possibly an electrical bug in the earlier board.
> Maybe they worked around it by disabling a C-state in ACPI
> and didn't test upstream Linux?
>
> I'd be interested in the acpi_idle output above for both the
> new and old boards to see if they are exporting different states
> on the two boards.

Could be; I can probably get access to the newer one again too, if
that will be useful.

>
> dmidecode isn't useful in this case. The CPUID in /proc/cpuinfo
> may be useful if the problem turns out to be associated with
> some stepping.

The dmidecode info I'd posted indicated that the steppings were
unchanged. I can get the /proc/cpuinfo tomorrow, but I figured
the dmidecode stepping info was accurate. Is it not reliable?

P.
--

>
> thanks,
> -Len
>

2014-04-02 20:01:08

by Paul Gortmaker

Subject: Re: Regression in intel_idle on Avaton/Rangely Mohon Peak board

On 14-04-01 05:59 PM, Brown, Len wrote:
>> I've got an eval board with a 1.7GHz Avaton/C2000 that hangs at boot
>> shortly after the idle driver registration -- typically 1/2 dozen
>> dmesg lines later, around rtc init, or net stack init.
>
> Paul,
> Please boot the failing board with "intel_idle.max_cstate=0"
> to disable intel_idle entirely, and then show the C-states
> exported by acpi_idle, which presumably are stable on both boards:
>
> dmesg | grep idle
> grep . /sys/devices/system/cpu/cpu0/cpuidle/*/*
>
> Then go back and boot with "intel_idle.max_cstate=N"
> where N is incremented by 1 until when the system fails
> and note the largest N that still works.

The dying board works for N=1, fails for N=2.

root@localhost:/sys/devices/system/cpu/cpuidle# grep . *
current_driver:intel_idle
current_governor_ro:menu
root@localhost:/sys/devices/system/cpu/cpuidle# dmesg|grep idle
[ 0.000000] Command line: BOOT_IMAGE=/boot/bzImage-current console=tty0 noinitrd root=/dev/sda4 rw ip=dhcp selinux=0 enforcing=0 intel_idle.max_cstate=1
[ 0.000000] Kernel command line: BOOT_IMAGE=/boot/bzImage-current console=tty0 noinitrd root=/dev/sda4 rw ip=dhcp selinux=0 enforcing=0 intel_idle.max_cstate=1
[ 0.000000] RCU dyntick-idle grace-period acceleration is enabled.
[ 0.217203] cpuidle: using governor ladder
[ 0.217309] cpuidle: using governor menu
[ 0.840598] intel_idle: MWAIT substates: 0x33000020
[ 0.840662] intel_idle: v0.4 model 0x4D
[ 0.840668] intel_idle: lapic_timer_reliable_states 0x2
[ 0.840673] intel_idle: max_cstate 1 reached
root@localhost:/sys/devices/system/cpu/cpuidle#
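The "MWAIT substates" value in the dmesg output can be decoded by hand. The sketch below assumes the driver prints EDX from CPUID leaf 5, where each 4-bit field gives the number of MWAIT sub-states for C0 through C7 (lowest nibble first); the function name is made up for illustration.

```python
def mwait_substates(edx):
    """Map C-state index (0..7) to its MWAIT sub-state count.

    Assumes the CPUID.05H:EDX layout: bits 3:0 describe C0, bits 7:4
    describe C1, and so on up to bits 31:28 for C7.
    """
    return {cstate: (edx >> (4 * cstate)) & 0xF for cstate in range(8)}


if __name__ == "__main__":
    counts = mwait_substates(0x33000020)
    for cstate, count in counts.items():
        if count:
            print(f"C{cstate}: {count} sub-state(s)")
```

For 0x33000020 this reports two sub-states for C1 and three each for C6 and C7, with nothing advertised in between.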

Another interesting data point -- the dying board doesn't die if
I boot 3.14's x86-64 defconfig. Nothing immediately jumps out at
me in the dying .config; there are a few tweaks in there like
RCU_NOCB etc. that I'll have to weed out with a pseudo .config
bisect I guess....

I'll go get the N=1 and N=2 data for the working board next.

Paul.
--

>
>> The interesting part is that a nearly identical board, but with
>> different (newer/faster) CPU and newer BIOS doesn't have the hang.
>
> Possibly an electrical bug in the earlier board.
> Maybe they worked around it by disabling a C-state in ACPI
> and didn't test upstream Linux?
>
> I'd be interested in the acpi_idle output above for both the
> new and old boards to see if they are exporting different states
> on the two boards.
>
> dmidecode isn't useful in this case. The CPUID in /proc/cpuinfo
> may be useful if the problem turns out to be associated with
> some stepping.
>
> thanks,
> -Len
>

2014-04-02 20:31:19

by Paul Gortmaker

Subject: Re: Regression in intel_idle on Avaton/Rangely Mohon Peak board

[Re: Regression in intel_idle on Avaton/Rangely Mohon Peak board] On 02/04/2014 (Wed 16:01) Paul Gortmaker wrote:

> On 14-04-01 05:59 PM, Brown, Len wrote:
> >> I've got an eval board with a 1.7GHz Avaton/C2000 that hangs at boot
> >> shortly after the idle driver registration -- typically 1/2 dozen
> >> dmesg lines later, around rtc init, or net stack init.
> >
> > Paul,
> > Please boot the failing board with "intel_idle.max_cstate=0"
> > to disable intel_idle entirely, and then show the C-states
> > > exported by acpi_idle, which presumably are stable on both boards:
> >
> > dmesg | grep idle
> > grep . /sys/devices/system/cpu/cpu0/cpuidle/*/*
> >
> > Then go back and boot with "intel_idle.max_cstate=N"
> > where N is incremented by 1 until when the system fails
> > and note the largest N that still works.
>
> The dying board works for N=1, fails for N=2.
>
> root@localhost:/sys/devices/system/cpu/cpuidle# grep . *
> current_driver:intel_idle
> current_governor_ro:menu
> root@localhost:/sys/devices/system/cpu/cpuidle# dmesg|grep idle
> [ 0.000000] Command line: BOOT_IMAGE=/boot/bzImage-current console=tty0 noinitrd root=/dev/sda4 rw ip=dhcp selinux=0 enforcing=0 intel_idle.max_cstate=1
> [ 0.000000] Kernel command line: BOOT_IMAGE=/boot/bzImage-current console=tty0 noinitrd root=/dev/sda4 rw ip=dhcp selinux=0 enforcing=0 intel_idle.max_cstate=1
> [ 0.000000] RCU dyntick-idle grace-period acceleration is enabled.
> [ 0.217203] cpuidle: using governor ladder
> [ 0.217309] cpuidle: using governor menu
> [ 0.840598] intel_idle: MWAIT substates: 0x33000020
> [ 0.840662] intel_idle: v0.4 model 0x4D
> [ 0.840668] intel_idle: lapic_timer_reliable_states 0x2
> [ 0.840673] intel_idle: max_cstate 1 reached
> root@localhost:/sys/devices/system/cpu/cpuidle#

...and the working board differs in reliable states, and it never
prints out max_cstate reached either. Here are the data sets for
no boot arg, and N=1 and N=2 from the working board with newer bios:

---------------- no bootarg ---------------------
root@localhost:/sys/devices/system/cpu/cpuidle# dmesg|grep idle
[ 0.000000] RCU dyntick-idle grace-period acceleration is enabled.
[ 0.220217] cpuidle: using governor ladder
[ 0.220323] cpuidle: using governor menu
[ 0.877519] intel_idle: MWAIT substates: 0x33000020
[ 0.877524] intel_idle: v0.4 model 0x4D
[ 0.877528] intel_idle: lapic_timer_reliable_states 0xffffffff
root@localhost:/sys/devices/system/cpu/cpuidle# grep . *
current_driver:intel_idle
current_governor_ro:menu
root@localhost:/sys/devices/system/cpu/cpuidle#

--------------- N=1 ----------------
root@localhost:/sys/devices/system/cpu/cpuidle# dmesg|grep idle
[ 0.000000] Command line: BOOT_IMAGE=/boot/bzImage-current console=tty0 noinitrd root=/dev/sda4 rw ip=dhcp selinux=0 enforcing=0 intel_idle.max_cstate=1
[ 0.000000] Kernel command line: BOOT_IMAGE=/boot/bzImage-current console=tty0 noinitrd root=/dev/sda4 rw ip=dhcp selinux=0 enforcing=0 intel_idle.max_cstate=1
[ 0.000000] RCU dyntick-idle grace-period acceleration is enabled.
[ 0.220169] cpuidle: using governor ladder
[ 0.220276] cpuidle: using governor menu
[ 0.786569] intel_idle: MWAIT substates: 0x33000020
[ 0.786574] intel_idle: v0.4 model 0x4D
[ 0.786578] intel_idle: lapic_timer_reliable_states 0xffffffff
[ 0.786582] intel_idle: max_cstate 1 reached
root@localhost:/sys/devices/system/cpu/cpuidle# grep . *
current_driver:intel_idle
current_governor_ro:menu
root@localhost:/sys/devices/system/cpu/cpuidle#

--------------- N=2 ----------------
root@localhost:~# cd /sys/devices/system/cpu/cpuidle/
root@localhost:/sys/devices/system/cpu/cpuidle# dmesg|grep idle
[ 0.000000] Command line: BOOT_IMAGE=/boot/bzImage-current console=tty0 noinitrd root=/dev/sda4 rw ip=dhcp selinux=0 enforcing=0 intel_idle.max_cstate=2
[ 0.000000] Kernel command line: BOOT_IMAGE=/boot/bzImage-current console=tty0 noinitrd root=/dev/sda4 rw ip=dhcp selinux=0 enforcing=0 intel_idle.max_cstate=2
[ 0.000000] RCU dyntick-idle grace-period acceleration is enabled.
[ 0.220415] cpuidle: using governor ladder
[ 0.220524] cpuidle: using governor menu
[ 0.877641] intel_idle: MWAIT substates: 0x33000020
[ 0.877646] intel_idle: v0.4 model 0x4D
[ 0.877649] intel_idle: lapic_timer_reliable_states 0xffffffff
root@localhost:/sys/devices/system/cpu/cpuidle# grep . *
current_driver:intel_idle
current_governor_ro:menu
root@localhost:/sys/devices/system/cpu/cpuidle#

Paul.
--

>
> Another interesting data point -- the dying board doesn't die if
> I boot 3.14's x86-64 defconfig. Nothing immediately jumps out at
> me in the dying .config ; there are a few tweaks in there like
> RCU_NOCB etc. that I'll have to weed out with a pseudo .config
> bisect I guess....
>
> I'll go get the N=1 and N=2 data for the working board next.
>
> Paul.
> --
>
> >
> >> The interesting part is that a nearly identical board, but with
> >> different (newer/faster) CPU and newer BIOS doesn't have the hang.
> >
> > Possibly an electrical bug in the earlier board.
> > Maybe they worked around it by disabling a C-state in ACPI
> > and didn't test upstream Linux?
> >
> > I'd be interested in the acpi_idle output above for both the
> > new and old boards to see if they are exporting different states
> > on the two boards.
> >
> > dmidecode isn't useful in this case. The CPUID in /proc/cpuinfo
> > may be useful if the problem turns out to be associated with
> > some stepping.
> >
> > thanks,
> > -Len
> >

2014-04-02 21:50:08

by Brown, Len

Subject: RE: Regression in intel_idle on Avaton/Rangely Mohon Peak board

> > I'd be interested in the acpi_idle output above for both the
> > new and old boards to see if they are exporting different states
> > on the two boards.
>
> Could be ; I can probably get access to the newer one again too, if
> that will be useful.

yes, please.

> >
> > dmidecode isn't useful in this case. The CPUID in /proc/cpuinfo
> > may be useful if the problem turns out to be associated with
> > some stepping.
>
> The dmidecode info I'd posted indicated that the steppings were
> unchanged. I can get the /proc/cpuinfo tomorrow, but I figured
> the dmidecode stepping info was accurate. Is it not reliable?


DMI is useful to get the BIOS version string, not much else.
Here you want the output from the CPUID instruction,
which is displayed as family/model/stepping in /proc/cpuinfo

thanks,
-Len

2014-04-02 21:58:45

by Brown, Len

Subject: RE: Regression in intel_idle on Avaton/Rangely Mohon Peak board

> [ 0.840668] intel_idle: lapic_timer_reliable_states 0x2
vs.
> [ 0.877528] intel_idle: lapic_timer_reliable_states 0xffffffff

This means CPUID.ARAT is set for the new board, and not set
for the old board. You can observe that also in /proc/cpuinfo flags
where you will likely also find a visible stepping difference between
these two boards.
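To make the 0x2 vs. 0xffffffff difference concrete, here is a small Python sketch of reading that mask. It rests on two assumptions about the driver of that era that are not spelled out in this thread: bit N of the mask stands for C-state N (so the default 0x2 means "only C1"), and the all-ones value is what the driver stores when the ARAT feature bit (CPUID.06H:EAX bit 2) is set. The names below are illustrative only.

```python
# Assumed sentinel used when the APIC timer keeps running in every C-state.
LAPIC_TIMER_ALWAYS_RELIABLE = 0xFFFFFFFF


def reliable_cstates(mask, max_cstate=7):
    """List the C-state numbers whose bit is set in the mask (bit N = C-state N)."""
    return [c for c in range(1, max_cstate + 1) if mask & (1 << c)]


def arat_implied(mask):
    """True when the mask is the all-ones value assumed to indicate ARAT."""
    return mask == LAPIC_TIMER_ALWAYS_RELIABLE


if __name__ == "__main__":
    for label, mask in (("old board", 0x2), ("new board", 0xFFFFFFFF)):
        print(label, reliable_cstates(mask), "ARAT" if arat_implied(mask) else "no ARAT")
```

Under those assumptions the old board's mask says the local APIC timer can only be trusted in C1, which is consistent with max_cstate=1 being the largest value that boots.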

Bring-up Avoton had ARAT disabled, and also had broken deep C-states.
It looks like that is what your old board has. Throw it away and
use only production steppings. If you are stuck w/ pre-production hardware,
then you need to manually modify upstream Linux to make it happy --
since upstream Linux only cares about production hardware.

thanks,
-Len

2014-04-02 23:30:13

by Paul Gortmaker

Subject: Re: Regression in intel_idle on Avaton/Rangely Mohon Peak board

On 14-04-02 05:58 PM, Brown, Len wrote:
>> [ 0.840668] intel_idle: lapic_timer_reliable_states 0x2
> vs.
>> [ 0.877528] intel_idle: lapic_timer_reliable_states 0xffffffff
>
> This means CPUID.ARAT is set for the new board, and not set
> for the old board. You can observe that also in /proc/cpuinfo flags
> where you will likely also find a visible stepping difference between
> these two boards.

Stepping is same, but ucode is different:

-microcode : 0x7
+microcode : 0xc
-bogomips : 3399.78
+bogomips : 4787.51
-cpu MHz : 1700.000
+cpu MHz : 2393.759

...and the newer core has "nonstop_tsc" and "arat" as you'd guessed.

>
> Bring-up Avoton had ARAT disabled, and also had broken deep C-states.
> It looks like that is what your old board has. Throw it away and
> use only production steppings. If you are stuck w/ pre-production hardware,

Yep, understood (and expected) -- that is why I'd mentioned at the
beginning that "It may be that this early board/early bios makes it
a non-issue for mainline..."

> then you need to manually modify upstream Linux to make it happy --
> since upstream Linux only cares about production hardware.

Actually it turns out that the defconfig boots okay because it doesn't
use CONFIG_INTEL_IDLE, so leaving that option off, or limiting max_cstate
on the command line, are two easy workarounds if others are stuck on the
pre-production kit.
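For anyone scripting the command-line workaround, a quick way to confirm which limit a given boot actually used is to parse the kernel command line. The sketch below is illustrative only (the function name is made up); it assumes a later occurrence of the parameter overrides an earlier one.

```python
def intel_idle_max_cstate(cmdline):
    """Return the intel_idle.max_cstate value from a command line, or None."""
    value = None
    for token in cmdline.split():
        key, sep, val = token.partition("=")
        if sep and key == "intel_idle.max_cstate":
            try:
                value = int(val)
            except ValueError:
                # Malformed value: ignore it and keep any earlier setting.
                pass
    return value


if __name__ == "__main__":
    with open("/proc/cmdline") as f:
        print(intel_idle_max_cstate(f.read()))
```

On the failing board, anything other than 0 or 1 here (including None, meaning no limit was given) would be expected to reproduce the hang.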

Thanks for the suggestions and the final diagnosis. The full cpuinfo
is below if there is anything you wanted to also see.

Paul.
---

processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 77
model name : Genuine Intel(R) CPU 4000 @ 1.70GHz
stepping : 0
microcode : 0x7
cpu MHz : 1700.000
cache size : 1024 KB
physical id : 0
siblings : 8
core id : 0
cpu cores : 8
apicid : 0
initial apicid : 0
fpu : yes
fpu_exception : yes
cpuid level : 11
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx est ssse3 cx16 xtpr pdcm sse4_1 sse4_2 movbe popcnt tsc_deadline_timer aes rdrand lahf_lm 3dnowprefetch dtherm tpr_shadow vnmi flexpriority ept vpid tsc_adjust smep erms
bogomips : 3399.78
clflush size : 64
cache_alignment : 64
address sizes : 36 bits physical, 48 bits virtual
power management:
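
The flags line above can be checked mechanically for the two features discussed in this thread. A minimal Python sketch, with a hypothetical helper name, that pulls the flag set out of /proc/cpuinfo text:

```python
def cpu_flags(cpuinfo_text):
    """Return the set of flags from the first 'flags' line of cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


if __name__ == "__main__":
    with open("/proc/cpuinfo") as f:
        flags = cpu_flags(f.read())
    for name in ("arat", "nonstop_tsc"):
        print(name, "present" if name in flags else "missing")
```

Fed the cpuinfo from the 1.70GHz board, both names come back missing, matching the diagnosis earlier in the thread.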