Hi,
I just built a 3.11-rc2 kernel (+ a few patches, but nothing
arch-related), and I saw the following: http://i.imgur.com/dCTqOyR.jpg
The rough transcription is
Call Trace:
<IRQ>
generic_smp_call_function_single_interrupt
smp_call_function_single_interrupt
call_function_single_interrupt
<EOI>
? default_idle
? default_idle
arch_cpu_idle
cpu_startup_entry
rest_init
start_kernel
? repair_env_string
x86_64_start_reservations
x86_64_start_kernel
Code: ... cc 81 8b 0f <0f> 32 48 c1 e2 20 89 c0 ...
RIP: __rdmsr_on_cpu+0x2e/0x44
Kernel panic - not syncing: Fatal exception in interrupt
A 3.10-rc7 kernel booted just fine. Is this likely a real issue? Or
perhaps a mis-build of some sort?
Thanks for any advice,
-ilia
On Thu, Jul 25, 2013 at 6:32 PM, Ilia Mirkin <[email protected]> wrote:
> Hi,
>
> I just built a 3.11-rc2 kernel (+ a few patches, but nothing
> arch-related), and I saw the following: http://i.imgur.com/dCTqOyR.jpg
>
> The rough transcription is
>
> Call Trace:
> <IRQ>
> generic_smp_call_function_single_interrupt
> smp_call_function_single_interrupt
> call_function_single_interrupt
> <EOI>
> ? default_idle
> ? default_idle
> arch_cpu_idle
> cpu_startup_entry
> rest_init
> start_kernel
> ? repair_env_string
> x86_64_start_reservations
> x86_64_start_kernel
> Code: ... cc 81 8b 0f <0f> 32 48 c1 e2 20 89 c0 ...
> RIP: __rdmsr_on_cpu+0x2e/0x44
> Kernel panic - not syncing: Fatal exception in interrupt
>
> A 3.10-rc7 kernel booted just fine. Is this likely a real issue? Or
> perhaps a mis-build of some sort?
FWIW this is repeatable. I did a clean build (make clean && make) and
I still see the same thing. I have a Core i7-920 cpu, not sure what
other information would be relevant. I'd love to avoid a bisect, so
some likely candidates would be most welcome.
Thanks,
-ilia
On Fri, Jul 26, 2013 at 7:59 AM, Ilia Mirkin <[email protected]> wrote:
> FWIW this is repeatable. I did a clean build (make clean && make) and
> I still see the same thing. I have a Core i7-920 cpu, not sure what
> other information would be relevant. I'd love to avoid a bisect, so
> some likely candidates would be most welcome.
Aha, figured it out. I had enabled "X86 package temperature thermal
driver" = Y, which caused my Core i7-920 to produce the above trace on
boot. Glancing over the code, should this:
    if (!cpu_has(c, X86_FEATURE_DTHERM) &&
        !cpu_has(c, X86_FEATURE_PTS))
            return -ENODEV;
perhaps be
    if (!cpu_has(c, X86_FEATURE_DTHERM) ||
        !cpu_has(c, X86_FEATURE_PTS))
            return -ENODEV;
i.e. are both of those things required, or just one of them? My cpu
has DTHERM but not PTS, according to /proc/cpuinfo.
-ilia
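
For reference, the difference between the two conditions can be sketched in plain C. This is only an illustration under stated assumptions: the two helper functions and their boolean parameters are hypothetical stand-ins for the cpu_has() tests quoted above, not the driver's actual code, and ENODEV is defined locally with its usual Linux value of 19.

```c
#include <stdbool.h>

/* Assumption: local stand-in for the kernel's ENODEV ("no such device"). */
#define ENODEV 19

/*
 * Hypothetical model of the check as quoted from the driver:
 * it bails out only when BOTH features are missing.
 */
int probe_check_original(bool has_dtherm, bool has_pts)
{
	if (!has_dtherm && !has_pts)
		return -ENODEV;
	return 0;
}

/*
 * Hypothetical model of the proposed check:
 * it bails out when EITHER feature is missing.
 */
int probe_check_proposed(bool has_dtherm, bool has_pts)
{
	if (!has_dtherm || !has_pts)
		return -ENODEV;
	return 0;
}
```

For a CPU that reports DTHERM but not PTS, probe_check_original(true, false) returns 0, so initialization would continue, while probe_check_proposed(true, false) returns -ENODEV. The two versions agree when both features are present or both are absent.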
This is already fixed in Linus's mainline tree. See commit
"f3ed0a17f0292300b3caca32d823ecd32554a667".
Thanks for the analysis; you are correct.
Thanks,
Srinivas
On 07/26/2013 06:15 AM, Ilia Mirkin wrote:
> Aha, figured it out. I had enabled "X86 package temperature thermal
> driver" = Y, which caused my Core i7-920 to produce the above trace on
> boot. Glancing over the code, should this:
>
>     if (!cpu_has(c, X86_FEATURE_DTHERM) &&
>         !cpu_has(c, X86_FEATURE_PTS))
>             return -ENODEV;
>
> perhaps be
>
>     if (!cpu_has(c, X86_FEATURE_DTHERM) ||
>         !cpu_has(c, X86_FEATURE_PTS))
>             return -ENODEV;
>
> i.e. are both of those things required, or just one of them? My cpu
> has DTHERM but not PTS, according to /proc/cpuinfo.
>
> -ilia
>