2013-05-19 11:26:34

by Tommy Apel

Subject: kernel 3.10-rc1 p-state/cpuidle panic

Hello guys, I'm getting this with the current 3.10-rc1. I've enabled the new full-NOHZ,
but I'm not sure whether that has something to do with this or whether something changed in the
p-state code.

System :
vendor_id : GenuineIntel
cpu family : 6
model : 45
model name : Intel(R) Xeon(R) CPU E5-2609 0 @ 2.40GHz
stepping : 7
microcode : 0x70b
cpu MHz : 1176.000
cache size : 10240 KB
physical id : 1
siblings : 4
core id : 3
cpu cores : 4
apicid : 38
initial apicid : 38
fpu : yes
fpu_exception : yes
cpuid level : 13
wp : yes
flags : [lots of stuff]
bogomips : 4800.56
clflush size : 64
cache_alignment : 64
address sizes : 46 bits physical, 48 bits virtual
power management:

crash bt:
PID: 0 TASK: ffff88085c585950 CPU: 5 COMMAND: "swapper/5"
#0 [ffff88107fc83b80] machine_kexec at ffffffff8102aad6
#1 [ffff88107fc83bc0] crash_kexec at ffffffff810e57d0
#2 [ffff88107fc83c90] oops_end at ffffffff810073b8
#3 [ffff88107fc83cb0] do_divide_error at ffffffff810040c2
#4 [ffff88107fc83d50] divide_error at ffffffff81637348
[exception RIP: intel_pstate_timer_func+1071]
RIP: ffffffff814f501f RSP: ffff88107fc83e08 RFLAGS: 00010246
RAX: 0000000000000000 RBX: ffff8808555b9e00 RCX: ffff8808555b9e40
RDX: 0000000000000000 RSI: 0000000000000064 RDI: 0000000000000001
RBP: 0000025debb95955 R8: ffff8808555b9f70 R9: 00000000000003e0
R10: dead000000200200 R11: 0000000000000000 R12: 0000001877f8d1d3
R13: 000000009a45ae19 R14: 0000001d27d3d1d8 R15: 0000000000000040
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
#5 [ffff88107fc83e00] intel_pstate_timer_func at ffffffff814f4fa2
#6 [ffff88107fc83e60] call_timer_fn at ffffffff8109edc3
#7 [ffff88107fc83e90] run_timer_softirq at ffffffff810a097e
#8 [ffff88107fc83f10] __do_softirq at ffffffff810990f1
#9 [ffff88107fc83f80] irq_exit at ffffffff810993ce
#10 [ffff88107fc83f90] smp_apic_timer_interrupt at ffffffff81026448
#11 [ffff88107fc83fb0] apic_timer_interrupt at ffffffff81636eca
--- <IRQ stack> ---
#12 [ffff88085c587db8] apic_timer_interrupt at ffffffff81636eca
[exception RIP: cpuidle_enter_state+72]
RIP: ffffffff814f5978 RSP: ffff88085c587e68 RFLAGS: 00000216
RAX: 000000000001c61d RBX: ffffffff810a11e8 RCX: 0000000000000018
RDX: 0000000225c17d03 RSI: ffff88085c587fd8 RDI: ffffffff81a12500
RBP: 0000000000000002 R8: 0000000000000030 R9: 0000000000000001
R10: 0000000000000000 R11: 0000000000000000 R12: ffffffff810c48dd
R13: ffff88085c173950 R14: 0000000000000086 R15: 0000000000000000
ORIG_RAX: ffffffffffffff10 CS: 0010 SS: 0018
#13 [ffff88085c587eb0] cpuidle_idle_call at ffffffff814f5aca
#14 [ffff88085c587ef0] arch_cpu_idle at ffffffff8100cef9
#15 [ffff88085c587f00] cpu_startup_entry at ffffffff810ce5af

kernel config:
http://pastebin.com/AmEqQNZx

/Tommy


2013-05-20 05:08:42

by Borislav Petkov

Subject: Re: kernel 3.10-rc1 p-state/cpuidle panic

Hmm,

divide by 0, it seems.

+ Dirk Brandewie.

On Sun, May 19, 2013 at 01:25:41PM +0200, Tommy Apel Hansen wrote:
> Hello guys, I'm getting this with the current 3.10-rc1. I've enabled the new full-NOHZ,
> but I'm not sure whether that has something to do with this or whether something changed in the
> p-state code.
>
> System :
> vendor_id : GenuineIntel
> cpu family : 6
> model : 45
> model name : Intel(R) Xeon(R) CPU E5-2609 0 @ 2.40GHz
> stepping : 7
> microcode : 0x70b
> cpu MHz : 1176.000
> cache size : 10240 KB
> physical id : 1
> siblings : 4
> core id : 3
> cpu cores : 4
> apicid : 38
> initial apicid : 38
> fpu : yes
> fpu_exception : yes
> cpuid level : 13
> wp : yes
> flags : [lots of stuff]
> bogomips : 4800.56
> clflush size : 64
> cache_alignment : 64
> address sizes : 46 bits physical, 48 bits virtual
> power management:
>
> crash bt:
> PID: 0 TASK: ffff88085c585950 CPU: 5 COMMAND: "swapper/5"
> #0 [ffff88107fc83b80] machine_kexec at ffffffff8102aad6
> #1 [ffff88107fc83bc0] crash_kexec at ffffffff810e57d0
> #2 [ffff88107fc83c90] oops_end at ffffffff810073b8
> #3 [ffff88107fc83cb0] do_divide_error at ffffffff810040c2
> #4 [ffff88107fc83d50] divide_error at ffffffff81637348
> [exception RIP: intel_pstate_timer_func+1071]
> RIP: ffffffff814f501f RSP: ffff88107fc83e08 RFLAGS: 00010246
> RAX: 0000000000000000 RBX: ffff8808555b9e00 RCX: ffff8808555b9e40
> RDX: 0000000000000000 RSI: 0000000000000064 RDI: 0000000000000001
> RBP: 0000025debb95955 R8: ffff8808555b9f70 R9: 00000000000003e0
> R10: dead000000200200 R11: 0000000000000000 R12: 0000001877f8d1d3
> R13: 000000009a45ae19 R14: 0000001d27d3d1d8 R15: 0000000000000040
> ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
> #5 [ffff88107fc83e00] intel_pstate_timer_func at ffffffff814f4fa2
> #6 [ffff88107fc83e60] call_timer_fn at ffffffff8109edc3
> #7 [ffff88107fc83e90] run_timer_softirq at ffffffff810a097e
> #8 [ffff88107fc83f10] __do_softirq at ffffffff810990f1
> #9 [ffff88107fc83f80] irq_exit at ffffffff810993ce
> #10 [ffff88107fc83f90] smp_apic_timer_interrupt at ffffffff81026448
> #11 [ffff88107fc83fb0] apic_timer_interrupt at ffffffff81636eca
> --- <IRQ stack> ---
> #12 [ffff88085c587db8] apic_timer_interrupt at ffffffff81636eca
> [exception RIP: cpuidle_enter_state+72]
> RIP: ffffffff814f5978 RSP: ffff88085c587e68 RFLAGS: 00000216
> RAX: 000000000001c61d RBX: ffffffff810a11e8 RCX: 0000000000000018
> RDX: 0000000225c17d03 RSI: ffff88085c587fd8 RDI: ffffffff81a12500
> RBP: 0000000000000002 R8: 0000000000000030 R9: 0000000000000001
> R10: 0000000000000000 R11: 0000000000000000 R12: ffffffff810c48dd
> R13: ffff88085c173950 R14: 0000000000000086 R15: 0000000000000000
> ORIG_RAX: ffffffffffffff10 CS: 0010 SS: 0018
> #13 [ffff88085c587eb0] cpuidle_idle_call at ffffffff814f5aca
> #14 [ffff88085c587ef0] arch_cpu_idle at ffffffff8100cef9
> #15 [ffff88085c587f00] cpu_startup_entry at ffffffff810ce5af
>
> kernel config:
> http://pastebin.com/AmEqQNZx
>
> /Tommy
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
>

--
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

2013-05-20 10:02:57

by Tommy Apel

Subject: Re: kernel 3.10-rc1 p-state/cpuidle panic

I think it's worth mentioning that this happens on a dual-CPU system;
I'm running the exact same kernel on a Xeon E3
and have not had this problem there.

I also changed back to the regular dyntick, and since then the dual-CPU
system has been stable.

/Tommy

On May 20, 2013 7:08 AM, "Borislav Petkov" <[email protected]> wrote:
>
> Hmm,
>
> divide by 0, it seems.
>
> + Dirk Brandewie.

2013-05-20 12:14:12

by Borislav Petkov

Subject: Re: kernel 3.10-rc1 p-state/cpuidle panic

On Mon, May 20, 2013 at 12:02:53PM +0200, Tommy Apel wrote:
> I think it's worth mentioning that this happens on a dual-CPU system;
> I'm running the exact same kernel on a Xeon E3
> and have not had this problem there.
>
> I also changed back to the regular dyntick, and since then the dual-CPU
> system has been stable.

True story - NO_HZ_FULL=y. Although I can't see the connection between
the issue and NO_HZ_FULL.

Adding Frederic and leaving in the rest for reference.

> On May 20, 2013 7:08 AM, "Borislav Petkov" <[email protected]> wrote:
> >
> > Hmm,
> >
> > divide by 0, it seems.
> >
> > + Dirk Brandewie.
> >
> > On Sun, May 19, 2013 at 01:25:41PM +0200, Tommy Apel Hansen wrote:
> > > Hello guys, I'm getting this with the current 3.10-rc1. I've enabled the new full-NOHZ,
> > > but I'm not sure whether that has something to do with this or whether something changed in the
> > > p-state code.
> > >
> > > System :
> > > vendor_id : GenuineIntel
> > > cpu family : 6
> > > model : 45
> > > model name : Intel(R) Xeon(R) CPU E5-2609 0 @ 2.40GHz
> > > stepping : 7
> > > microcode : 0x70b
> > > cpu MHz : 1176.000
> > > cache size : 10240 KB
> > > physical id : 1
> > > siblings : 4
> > > core id : 3
> > > cpu cores : 4
> > > apicid : 38
> > > initial apicid : 38
> > > fpu : yes
> > > fpu_exception : yes
> > > cpuid level : 13
> > > wp : yes
> > > flags : [lots of stuff]
> > > bogomips : 4800.56
> > > clflush size : 64
> > > cache_alignment : 64
> > > address sizes : 46 bits physical, 48 bits virtual
> > > power management:
> > >
> > > crash bt:
> > > PID: 0 TASK: ffff88085c585950 CPU: 5 COMMAND: "swapper/5"
> > > #0 [ffff88107fc83b80] machine_kexec at ffffffff8102aad6
> > > #1 [ffff88107fc83bc0] crash_kexec at ffffffff810e57d0
> > > #2 [ffff88107fc83c90] oops_end at ffffffff810073b8
> > > #3 [ffff88107fc83cb0] do_divide_error at ffffffff810040c2
> > > #4 [ffff88107fc83d50] divide_error at ffffffff81637348
> > > [exception RIP: intel_pstate_timer_func+1071]
> > > RIP: ffffffff814f501f RSP: ffff88107fc83e08 RFLAGS: 00010246
> > > RAX: 0000000000000000 RBX: ffff8808555b9e00 RCX: ffff8808555b9e40
> > > RDX: 0000000000000000 RSI: 0000000000000064 RDI: 0000000000000001
> > > RBP: 0000025debb95955 R8: ffff8808555b9f70 R9: 00000000000003e0
> > > R10: dead000000200200 R11: 0000000000000000 R12: 0000001877f8d1d3
> > > R13: 000000009a45ae19 R14: 0000001d27d3d1d8 R15: 0000000000000040
> > > ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
> > > #5 [ffff88107fc83e00] intel_pstate_timer_func at ffffffff814f4fa2
> > > #6 [ffff88107fc83e60] call_timer_fn at ffffffff8109edc3
> > > #7 [ffff88107fc83e90] run_timer_softirq at ffffffff810a097e
> > > #8 [ffff88107fc83f10] __do_softirq at ffffffff810990f1
> > > #9 [ffff88107fc83f80] irq_exit at ffffffff810993ce
> > > #10 [ffff88107fc83f90] smp_apic_timer_interrupt at ffffffff81026448
> > > #11 [ffff88107fc83fb0] apic_timer_interrupt at ffffffff81636eca
> > > --- <IRQ stack> ---
> > > #12 [ffff88085c587db8] apic_timer_interrupt at ffffffff81636eca
> > > [exception RIP: cpuidle_enter_state+72]
> > > RIP: ffffffff814f5978 RSP: ffff88085c587e68 RFLAGS: 00000216
> > > RAX: 000000000001c61d RBX: ffffffff810a11e8 RCX: 0000000000000018
> > > RDX: 0000000225c17d03 RSI: ffff88085c587fd8 RDI: ffffffff81a12500
> > > RBP: 0000000000000002 R8: 0000000000000030 R9: 0000000000000001
> > > R10: 0000000000000000 R11: 0000000000000000 R12: ffffffff810c48dd
> > > R13: ffff88085c173950 R14: 0000000000000086 R15: 0000000000000000
> > > ORIG_RAX: ffffffffffffff10 CS: 0010 SS: 0018
> > > #13 [ffff88085c587eb0] cpuidle_idle_call at ffffffff814f5aca
> > > #14 [ffff88085c587ef0] arch_cpu_idle at ffffffff8100cef9
> > > #15 [ffff88085c587f00] cpu_startup_entry at ffffffff810ce5af
> > >
> > > kernel config:
> > > http://pastebin.com/AmEqQNZx
> > >
> > > /Tommy
> > >
> >
> > --
> > Regards/Gruss,
> > Boris.
> >
> > Sent from a fat crate under my desk. Formatting is fine.
> > --
>

--
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

2013-05-20 13:55:51

by Tommy Apel

Subject: Re: kernel 3.10-rc1 p-state/cpuidle panic

Well, it beats me why it breaks on a dual-CPU system but not on a single-CPU one,
or maybe I just haven't hit it yet.

2013/5/20 Borislav Petkov <[email protected]>:
> On Mon, May 20, 2013 at 12:02:53PM +0200, Tommy Apel wrote:
>> I think it's worth mentioning that this happens on a dual-CPU system;
>> I'm running the exact same kernel on a Xeon E3
>> and have not had this problem there.
>>
>> I also changed back to the regular dyntick, and since then the dual-CPU
>> system has been stable.
>
> True story - NO_HZ_FULL=y. Although I can't see the connection between
> the issue and NO_HZ_FULL.

2013-05-20 14:16:50

by Dirk Brandewie

Subject: Re: kernel 3.10-rc1 p-state/cpuidle panic

On 05/20/2013 06:55 AM, Tommy Apel wrote:
> Well, it beats me why it breaks on a dual-CPU system but not on a single-CPU one,
> or maybe I just haven't hit it yet.
>
> 2013/5/20 Borislav Petkov <[email protected]>:
>> On Mon, May 20, 2013 at 12:02:53PM +0200, Tommy Apel wrote:
>>> I think it's worth mentioning that this happens on a dual-CPU system;
>>> I'm running the exact same kernel on a Xeon E3
>>> and have not had this problem there.
>>>
>>> I also changed back to the regular dyntick, and since then the dual-CPU
>>> system has been stable.
>>
>> True story - NO_HZ_FULL=y. Although I can't see the connection between
>> the issue and NO_HZ_FULL.
>>
>> Adding Frederic and leaving in the rest for reference.
>>
>>> On May 20, 2013 7:08 AM, "Borislav Petkov" <[email protected]> wrote:
>>>>
>>>> Hmm,
>>>>
>>>> divide by 0, it seems.
>>>>
>>>> + Dirk Brandewie.
>>>>
>>>> On Sun, May 19, 2013 at 01:25:41PM +0200, Tommy Apel Hansen wrote:
>>>>> Hello guys, I'm getting this with the current 3.10-rc1. I've enabled the new full-NOHZ,
>>>>> but I'm not sure whether that has something to do with this or whether something changed in the
>>>>> p-state code.

The following patch removes the offending division, since it was not needed
anyway. This is queued for rc2.

commit ca7bb07c8fa8d982971d4775bef48d0bc3c6e1cf
Author: Dirk Brandewie <[email protected]>
Date: Fri May 17 07:19:58 2013 -0700

cpufreq/intel_pstate: remove idle time and duration from sample and
calculations

Idle time is taken into account in the APERF/MPERF ratio calculation,
so there is no reason for the driver to track it separately. This
reduces the work in the driver and makes the code more readable.

Removing the tracking of sample duration also removes the possibility of
a divide-by-zero exception when the duration is under 1us.

https://bugzilla.kernel.org/show_bug.cgi?id=56691

Reported-by: Mike Lothian <[email protected]>
Cc: [email protected]
Signed-off-by: Dirk Brandewie <[email protected]>
---
drivers/cpufreq/intel_pstate.c | 45 +++++++--------------------------------
1 files changed, 8 insertions(+), 37 deletions(-)

diff --git a/drivers/cpufreq/intel_pstate.c b/drivers/cpufreq/intel_pstate.c
index cc3a8e6..c6e10d0 100644
--- a/drivers/cpufreq/intel_pstate.c
+++ b/drivers/cpufreq/intel_pstate.c
@@ -48,12 +48,7 @@ static inline int32_t div_fp(int32_t x, int32_t y)
}

struct sample {
- ktime_t start_time;
- ktime_t end_time;
int core_pct_busy;
- int pstate_pct_busy;
- u64 duration_us;
- u64 idletime_us;
u64 aperf;
u64 mperf;
int freq;
@@ -91,8 +86,6 @@ struct cpudata {
int min_pstate_count;
int idle_mode;

- ktime_t prev_sample;
- u64 prev_idle_time_us;
u64 prev_aperf;
u64 prev_mperf;
int sample_ptr;
@@ -450,48 +443,26 @@ static inline void intel_pstate_calc_busy(struct cpudata *cpu,
struct sample *sample)
{
u64 core_pct;
- sample->pstate_pct_busy = 100 - div64_u64(
- sample->idletime_us * 100,
- sample->duration_us);
core_pct = div64_u64(sample->aperf * 100, sample->mperf);
sample->freq = cpu->pstate.max_pstate * core_pct * 1000;

- sample->core_pct_busy = div_s64((sample->pstate_pct_busy * core_pct),
- 100);
+ sample->core_pct_busy = core_pct;
}

static inline void intel_pstate_sample(struct cpudata *cpu)
{
- ktime_t now;
- u64 idle_time_us;
u64 aperf, mperf;

- now = ktime_get();
- idle_time_us = get_cpu_idle_time_us(cpu->cpu, NULL);
-
rdmsrl(MSR_IA32_APERF, aperf);
rdmsrl(MSR_IA32_MPERF, mperf);
- /* for the first sample, don't actually record a sample, just
- * set the baseline */
- if (cpu->prev_idle_time_us > 0) {
- cpu->sample_ptr = (cpu->sample_ptr + 1) % SAMPLE_COUNT;
- cpu->samples[cpu->sample_ptr].start_time = cpu->prev_sample;
- cpu->samples[cpu->sample_ptr].end_time = now;
- cpu->samples[cpu->sample_ptr].duration_us =
- ktime_us_delta(now, cpu->prev_sample);
- cpu->samples[cpu->sample_ptr].idletime_us =
- idle_time_us - cpu->prev_idle_time_us;
-
- cpu->samples[cpu->sample_ptr].aperf = aperf;
- cpu->samples[cpu->sample_ptr].mperf = mperf;
- cpu->samples[cpu->sample_ptr].aperf -= cpu->prev_aperf;
- cpu->samples[cpu->sample_ptr].mperf -= cpu->prev_mperf;
-
- intel_pstate_calc_busy(cpu, &cpu->samples[cpu->sample_ptr]);
- }
+ cpu->sample_ptr = (cpu->sample_ptr + 1) % SAMPLE_COUNT;
+ cpu->samples[cpu->sample_ptr].aperf = aperf;
+ cpu->samples[cpu->sample_ptr].mperf = mperf;
+ cpu->samples[cpu->sample_ptr].aperf -= cpu->prev_aperf;
+ cpu->samples[cpu->sample_ptr].mperf -= cpu->prev_mperf;
+
+ intel_pstate_calc_busy(cpu, &cpu->samples[cpu->sample_ptr]);

- cpu->prev_sample = now;
- cpu->prev_idle_time_us = idle_time_us;
cpu->prev_aperf = aperf;
cpu->prev_mperf = mperf;
}