This patch fixes a bug introduced by:
commit c05199e5a57a579fea1e8fa65e2b511ceb524ffc
Author: Kan Liang <[email protected]>
Date: Tue Jan 20 04:54:25 2015 +0000
perf/x86/intel/uncore: Move uncore_box_init() out of driver initialization
It moves uncore_box_init() out of place but this meant that for desktop memory
controller (IMC) PCI-based PMU, the intialization was now missing causing a
crash at first access. This patch fixes the issue by adding the call to the
uncore_enable_box() in snb_uncore_imc_read_counter() to ensure the box
is initialized and the PCI BAR is io-remapped.
Signed-off-by: Stephane Eranian <[email protected]>
--
diff --git a/arch/x86/kernel/cpu/perf_event_intel_uncore_snb.c b/arch/x86/kernel/cpu/perf_event_intel_uncore_snb.c
index 4562e9e..755ec05 100644
--- a/arch/x86/kernel/cpu/perf_event_intel_uncore_snb.c
+++ b/arch/x86/kernel/cpu/perf_event_intel_uncore_snb.c
@@ -329,6 +329,9 @@ static void snb_uncore_imc_event_start(struct perf_event *event, int flags)
list_add_tail(&event->active_entry, &box->active_list);
+ if (box->n_active == 1)
+ uncore_enable_box(box);
+
count = snb_uncore_imc_read_counter(box, event);
local64_set(&event->hw.prev_count, count);
On Thu, 23 Apr 2015, Stephane Eranian wrote:
> This patch fixes a bug introduced by:
>
> commit c05199e5a57a579fea1e8fa65e2b511ceb524ffc
> Author: Kan Liang <[email protected]>
> Date: Tue Jan 20 04:54:25 2015 +0000
I was hoping this would fix the uncore/imc bug that the perf_fuzzer
triggers on my Haswell box, but even with this patch I can still quickly
hard lock the system
[ 79.468201] ------------[ cut here ]------------
[ 79.473121] kernel BUG at mm/vmalloc.c:1335!
[ 79.477705] invalid opcode: 0000 [#1] SMP
[ 79.482141] Modules linked in: fuse x86_pkg_temp_thermal intel_powerclamp intel_rapl iosf_mbi coretemp kvm snd_hda_codec_realtek snd_hda_codec_generic snd_hda_codec_hdmi crct10dif_pclmul snd_hda_intel snd_hda_controller snd_hda_codec crc32_pclmul snd_hda_core snd_hwdep ghash_clmulni_intel i915 aesni_intel aes_x86_64 lrw gf128mul iTCO_wdt glue_helper ppdev snd_pcm iTCO_vendor_support evdev ablk_helper drm_kms_helper psmouse snd_timer cryptd drm tpm_tis serio_raw pcspkr xhci_pci snd parport_pc soundcore tpm xhci_hcd lpc_ich parport mei_me wmi mfd_core processor video battery i2c_i801 i2c_algo_bit button mei sg sr_mod cdrom sd_mod ehci_pci ehci_hcd ahci e1000e libahci ptp crc32c_intel usbcore libata scsi_mod usb_common pps_core thermal fan thermal_sys
[ 79.554768] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.0.0+ #135
[ 79.561315] Hardware name: LENOVO 10AM000AUS/SHARKBAY, BIOS FBKT72AUS 01/26/2014
[ 79.569250] task: ffffffff81c14580 ti: ffffffff81c00000 task.ti: ffffffff81c00000
[ 79.577270] RIP: 0010:[<ffffffff811b0cd0>] [<ffffffff811b0cd0>] __get_vm_area_node+0x170/0x180
[ 79.586597] RSP: 0000:ffff88011ea03cb8 EFLAGS: 00010006
[ 79.592301] RAX: 0000000080010000 RBX: 0000000000006000 RCX: ffffc90000000000
[ 79.599950] RDX: 0000000000000001 RSI: 0000000000000001 RDI: 0000000000006000
[ 79.607581] RBP: ffff88011ea03cf8 R08: ffffe8ffffffffff R09: 00000000ffffffff
[ 79.615220] R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000002
[ 79.622850] R13: 00000000fed16000 R14: 00000000fed10000 R15: 00000000fed10000
[ 79.630480] FS: 0000000000000000(0000) GS:ffff88011ea00000(0000) knlGS:0000000000000000
[ 79.639167] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 79.645302] CR2: 000000000061c218 CR3: 0000000001c0d000 CR4: 00000000001407f0
[ 79.652953] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 79.660571] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 79.668188] Stack:
[ 79.670344] 00000000fed10000 ffffc90000000000 ffff88011ea03d18 0000000000006000
[ 79.678352] 0000000000000002 00000000fed16000 00000000fed10000 00000000fed10000
[ 79.686346] ffff88011ea03d18 ffffffff811b2130 00000000000000d0 ffffffff81039f4d
[ 79.694391] Call Trace:
[ 79.696997] <IRQ>
[ 79.699090] [<ffffffff811b2130>] get_vm_area_caller+0x40/0x50
[ 79.705505] [<ffffffff81039f4d>] ? snb_uncore_imc_init_box+0x6d/0x90
[ 79.712414] [<ffffffff810635e5>] __ioremap_caller+0x195/0x350
[ 79.718610] [<ffffffff81039f4d>] ? snb_uncore_imc_init_box+0x6d/0x90
[ 79.725462] [<ffffffff81427f6b>] ? debug_object_activate+0x14b/0x1e0
[ 79.732346] [<ffffffff810637b7>] ioremap_nocache+0x17/0x20
[ 79.738283] [<ffffffff81039f4d>] snb_uncore_imc_init_box+0x6d/0x90
[ 79.744945] [<ffffffff81039cf7>] snb_uncore_imc_event_start+0xb7/0x110
[ 79.752020] [<ffffffff81039d97>] snb_uncore_imc_event_add+0x47/0x60
[ 79.758832] [<ffffffff81162cbb>] event_sched_in.isra.85+0xfb/0x330
[ 79.765519] [<ffffffff81162f5f>] group_sched_in+0x6f/0x1e0
[ 79.771481] [<ffffffff8101df1a>] ? native_sched_clock+0x2a/0x90
[ 79.777858] [<ffffffff811637bc>] __perf_event_enable+0x25c/0x2a0
[ 79.784418] [<ffffffff810f3e69>] ? tick_nohz_irq_exit+0x29/0x30
[ 79.790820] [<ffffffff8115ef30>] ? cpu_clock_event_start+0x40/0x40
[ 79.797546] [<ffffffff8115ef80>] remote_function+0x50/0x60
[ 79.803535] [<ffffffff810f8cd1>] flush_smp_call_function_queue+0x81/0x180
[ 79.810840] [<ffffffff810f9763>] generic_smp_call_function_single_interrupt+0x13/0x60
[ 79.819328] [<ffffffff8104b5e8>] smp_trace_call_function_single_interrupt+0x38/0xc0
[ 79.827614] [<ffffffff816de9be>] trace_call_function_single_interrupt+0x6e/0x80
[ 79.835465] <EOI>
[ 79.837543] [<ffffffff8156e8b5>] ? cpuidle_enter_state+0x65/0x160
[ 79.844377] [<ffffffff8156e8a1>] ? cpuidle_enter_state+0x51/0x160
[ 79.851015] [<ffffffff8156e9e7>] cpuidle_enter+0x17/0x20
[ 79.856791] [<ffffffff810b6e39>] cpu_startup_entry+0x399/0x440
[ 79.863165] [<ffffffff816c9ddb>] rest_init+0xbb/0xd0
[ 79.868555] [<ffffffff81d46f74>] start_kernel+0x44e/0x45b
[ 79.874433] [<ffffffff81d46120>] ? early_idt_handlers+0x120/0x120
[ 79.881083] [<ffffffff81d464d7>] x86_64_start_reservations+0x2a/0x2c
[ 79.888625] [<ffffffff81d46614>] x86_64_start_kernel+0x13b/0x14a
[ 79.895775] Code: fe ff ff 0f 1f 84 00 00 00 00 00 4c 89 ef e8 d8 b9 01 00 48 83 c4 18 31 c0 5b 41 5c 41 5d 41 5e 41 5f 5d c3 0f 1f 80 00 00 00 00 <0f> 0b 66 66 66 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00
[ 79.918001] RIP [<ffffffff811b0cd0>] __get_vm_area_node+0x170/0x180
[ 79.925511] RSP <ffff88011ea03cb8>
[ 79.929921] ---[ end trace 838bf78817e86b4e ]---
[ 79.935575] Kernel panic - not syncing: Fatal exception in interrupt
[ 79.943131] Kernel Offset: disabled
[ 79.947644] drm_kms_helper: panic occurred, switching back to text console
[ 79.955802] ---[ end Kernel panic - not syncing: Fatal exception in interrupt
[ 79.964192] ------------[ cut here ]------------
On Fri, 24 Apr 2015, Vince Weaver wrote:
> [ 79.473121] kernel BUG at mm/vmalloc.c:1335!
> [ 79.477705] invalid opcode: 0000 [#1] SMP
...
> [ 79.694391] Call Trace:
> [ 79.696997] <IRQ>
> [ 79.699090] [<ffffffff811b2130>] get_vm_area_caller+0x40/0x50
> [ 79.705505] [<ffffffff81039f4d>] ? snb_uncore_imc_init_box+0x6d/0x90
This maps to:
static void snb_uncore_imc_init_box(struct intel_uncore_box *box) {
...
box->io_addr = ioremap(addr, SNB_UNCORE_PCI_IMC_MAP_SIZE);
The machine I am seeing this on is a Haswell desktop, Intel 6/60/3.
Vince
On Fri, Apr 24, 2015 at 7:03 AM, Vince Weaver <[email protected]> wrote:
> On Fri, 24 Apr 2015, Vince Weaver wrote:
>
>> [ 79.473121] kernel BUG at mm/vmalloc.c:1335!
>> [ 79.477705] invalid opcode: 0000 [#1] SMP
> ...
>> [ 79.694391] Call Trace:
>> [ 79.696997] <IRQ>
>> [ 79.699090] [<ffffffff811b2130>] get_vm_area_caller+0x40/0x50
>> [ 79.705505] [<ffffffff81039f4d>] ? snb_uncore_imc_init_box+0x6d/0x90
>
> This maps to:
>
> static void snb_uncore_imc_init_box(struct intel_uncore_box *box) {
> ...
> box->io_addr = ioremap(addr, SNB_UNCORE_PCI_IMC_MAP_SIZE);
>
> The machine I am seeing this on is a Haswell desktop, Intel 6/60/3.
>
I also got that on one Lenovo IvyBridge laptop but I had just updated the BIOS.
It was working before with older kernels. I am not sure what is going on here.
It will be easier to debug once I get my desktop Haswell back and setup with
serial console. Need to verify that the address of the BAR is the same compared
with older kernels.
Maybe Bjorn can shed some lights. Why would ioremap() die like this in the
latest 4.0 tree.
On Fri, Apr 24, 2015 at 9:38 AM, Stephane Eranian <[email protected]> wrote:
> On Fri, Apr 24, 2015 at 7:03 AM, Vince Weaver <[email protected]> wrote:
>> On Fri, 24 Apr 2015, Vince Weaver wrote:
>>
>>> [ 79.473121] kernel BUG at mm/vmalloc.c:1335!
>>> [ 79.477705] invalid opcode: 0000 [#1] SMP
>> ...
>>> [ 79.694391] Call Trace:
>>> [ 79.696997] <IRQ>
>>> [ 79.699090] [<ffffffff811b2130>] get_vm_area_caller+0x40/0x50
>>> [ 79.705505] [<ffffffff81039f4d>] ? snb_uncore_imc_init_box+0x6d/0x90
>>
>> This maps to:
>>
>> static void snb_uncore_imc_init_box(struct intel_uncore_box *box) {
>> ...
>> box->io_addr = ioremap(addr, SNB_UNCORE_PCI_IMC_MAP_SIZE);
>>
>> The machine I am seeing this on is a Haswell desktop, Intel 6/60/3.
>>
> I also got that on one Lenovo IvyBridge laptop but I had just updated the BIOS.
> It was working before with older kernels. I am not sure what is going on here.
> It will be easier to debug once I get my desktop Haswell back and setup with
> serial console. Need to verify that the address of the BAR is the same compared
> with older kernels.
>
> Maybe Bjorn can shed some lights. Why would ioremap() die like this in the
> latest 4.0 tree.
1328 static struct vm_struct *__get_vm_area_node(unsigned long size,
1329 unsigned long align, unsigned long flags,
unsigned long start,
1330 unsigned long end, int node, gfp_t gfp_mask,
const void *caller)
1331 {
1332 struct vmap_area *va;
1333 struct vm_struct *area;
1334
1335 BUG_ON(in_interrupt()); <-----------------
Is there some perfmon initialization happening in interrupt context?
On Fri, Apr 24, 2015 at 12:22 PM, Bjorn Helgaas <[email protected]> wrote:
> On Fri, Apr 24, 2015 at 9:38 AM, Stephane Eranian <[email protected]> wrote:
>> On Fri, Apr 24, 2015 at 7:03 AM, Vince Weaver <[email protected]> wrote:
>>> On Fri, 24 Apr 2015, Vince Weaver wrote:
>>>
>>>> [ 79.473121] kernel BUG at mm/vmalloc.c:1335!
>>>> [ 79.477705] invalid opcode: 0000 [#1] SMP
>>> ...
>>>> [ 79.694391] Call Trace:
>>>> [ 79.696997] <IRQ>
>>>> [ 79.699090] [<ffffffff811b2130>] get_vm_area_caller+0x40/0x50
>>>> [ 79.705505] [<ffffffff81039f4d>] ? snb_uncore_imc_init_box+0x6d/0x90
>>>
>>> This maps to:
>>>
>>> static void snb_uncore_imc_init_box(struct intel_uncore_box *box) {
>>> ...
>>> box->io_addr = ioremap(addr, SNB_UNCORE_PCI_IMC_MAP_SIZE);
>>>
>>> The machine I am seeing this on is a Haswell desktop, Intel 6/60/3.
>>>
>> I also got that on one Lenovo IvyBridge laptop but I had just updated the BIOS.
>> It was working before with older kernels. I am not sure what is going on here.
>> It will be easier to debug once I get my desktop Haswell back and setup with
>> serial console. Need to verify that the address of the BAR is the same compared
>> with older kernels.
>>
>> Maybe Bjorn can shed some lights. Why would ioremap() die like this in the
>> latest 4.0 tree.
>
> 1328 static struct vm_struct *__get_vm_area_node(unsigned long size,
> 1329 unsigned long align, unsigned long flags,
> unsigned long start,
> 1330 unsigned long end, int node, gfp_t gfp_mask,
> const void *caller)
> 1331 {
> 1332 struct vmap_area *va;
> 1333 struct vm_struct *area;
> 1334
> 1335 BUG_ON(in_interrupt()); <-----------------
>
> Is there some perfmon initialization happening in interrupt context?
May be in IPI context if I read Vince's stack trace.
This leads me to believe that this patch:
commit c05199e5a57a579fea1e8fa65e2b511ceb524ffc
Author: Kan Liang <[email protected]>
Date: Tue Jan 20 04:54:25 2015 +0000
perf/x86/intel/uncore: Move uncore_box_init() out of driver initialization
If I revert it, I bet things will work again.
> This leads me to believe that this patch:
>
> commit c05199e5a57a579fea1e8fa65e2b511ceb524ffc
> Author: Kan Liang <[email protected]>
> Date: Tue Jan 20 04:54:25 2015 +0000
>
> perf/x86/intel/uncore: Move uncore_box_init() out of driver initialization
>
> If I revert it, I bet things will work again.
Yes the initialization needs to be moved out of the IPI context.
-Andi
--
[email protected] -- Speaking for myself only
>
> > This leads me to believe that this patch:
> >
> > commit c05199e5a57a579fea1e8fa65e2b511ceb524ffc
> > Author: Kan Liang <[email protected]>
> > Date: Tue Jan 20 04:54:25 2015 +0000
> >
> > perf/x86/intel/uncore: Move uncore_box_init() out of driver
> initialization
> >
> > If I revert it, I bet things will work again.
>
> Yes the initialization needs to be moved out of the IPI context.
>
Maybe we can move them to event init, which is not in IPI context.
What do you think of this patch?
---
>From 8a61c48144921e9d1c841656829c3bae9bfb4408 Mon Sep 17 00:00:00 2001
From: Kan Liang <[email protected]>
Date: Sun, 26 Apr 2015 16:24:59 -0400
Subject: [PATCH 1/1] perf/x86/intel/uncore: move uncore_box_init to uncore
event init
commit c05199e5a57a("perf/x86/intel/uncore: Move uncore_box_init() out
of driver initialization") moves uncore_box_init into uncore_enable_box
to prevent potential boot failures. However, uncore_enable_box is not
called on some client platforms (SNB/IVB/HSW) for counting IMC event.
When it is not called, the box is not initialized, which hard locks the
system.
Additionally, uncore_enable_box along with the initialization code in it
is always called in uncore event start functions, which are in IPI
context. But the initizlization code should not be in IPI context. This
is because, for example, the IMC box initialization codes for client
platforms include ioremap, which is not allowed to be called in IPI
context.
This patch moves uncore_box_init out of IPI context, to uncore event
init. The box is initialized only when it has not yet been initialized.
Signed-off-by: Kan Liang <[email protected]>
---
arch/x86/kernel/cpu/perf_event_intel_uncore.c | 4 ++++
arch/x86/kernel/cpu/perf_event_intel_uncore.h | 2 --
arch/x86/kernel/cpu/perf_event_intel_uncore_snb.c | 3 +++
3 files changed, 7 insertions(+), 2 deletions(-)
diff --git a/arch/x86/kernel/cpu/perf_event_intel_uncore.c b/arch/x86/kernel/cpu/perf_event_intel_uncore.c
index c635b8b..cbc1a93 100644
--- a/arch/x86/kernel/cpu/perf_event_intel_uncore.c
+++ b/arch/x86/kernel/cpu/perf_event_intel_uncore.c
@@ -623,6 +623,10 @@ static int uncore_pmu_event_init(struct perf_event *event)
box = uncore_pmu_to_box(pmu, event->cpu);
if (!box || box->cpu < 0)
return -EINVAL;
+
+ /* Init box if it's not initialized yet */
+ uncore_box_init(box);
+
event->cpu = box->cpu;
event->hw.idx = -1;
diff --git a/arch/x86/kernel/cpu/perf_event_intel_uncore.h b/arch/x86/kernel/cpu/perf_event_intel_uncore.h
index 6c8c1e7..1fb2905 100644
--- a/arch/x86/kernel/cpu/perf_event_intel_uncore.h
+++ b/arch/x86/kernel/cpu/perf_event_intel_uncore.h
@@ -273,8 +273,6 @@ static inline void uncore_disable_box(struct intel_uncore_box *box)
static inline void uncore_enable_box(struct intel_uncore_box *box)
{
- uncore_box_init(box);
-
if (box->pmu->type->ops->enable_box)
box->pmu->type->ops->enable_box(box);
}
diff --git a/arch/x86/kernel/cpu/perf_event_intel_uncore_snb.c b/arch/x86/kernel/cpu/perf_event_intel_uncore_snb.c
index 4562e9e..ead70a6 100644
--- a/arch/x86/kernel/cpu/perf_event_intel_uncore_snb.c
+++ b/arch/x86/kernel/cpu/perf_event_intel_uncore_snb.c
@@ -279,6 +279,9 @@ static int snb_uncore_imc_event_init(struct perf_event *event)
if (!box || box->cpu < 0)
return -EINVAL;
+ /* Init box if it's not initialized yet */
+ uncore_box_init(box);
+
event->cpu = box->cpu;
event->hw.idx = -1;
Thanks,
Kan
> -Andi
>
>
> --
> [email protected] -- Speaking for myself only
On Sun, Apr 26, 2015 at 8:43 PM, Liang, Kan <[email protected]> wrote:
>
>>
>> > This leads me to believe that this patch:
>> >
>> > commit c05199e5a57a579fea1e8fa65e2b511ceb524ffc
>> > Author: Kan Liang <[email protected]>
>> > Date: Tue Jan 20 04:54:25 2015 +0000
>> >
>> > perf/x86/intel/uncore: Move uncore_box_init() out of driver
>> initialization
>> >
>> > If I revert it, I bet things will work again.
>>
>> Yes the initialization needs to be moved out of the IPI context.
>>
>
> Maybe we can move them to event init, which is not in IPI context.
>
> What do you think of this patch?
>
> ---
>
> From 8a61c48144921e9d1c841656829c3bae9bfb4408 Mon Sep 17 00:00:00 2001
> From: Kan Liang <[email protected]>
> Date: Sun, 26 Apr 2015 16:24:59 -0400
> Subject: [PATCH 1/1] perf/x86/intel/uncore: move uncore_box_init to uncore
> event init
>
> commit c05199e5a57a("perf/x86/intel/uncore: Move uncore_box_init() out
> of driver initialization") moves uncore_box_init into uncore_enable_box
> to prevent potential boot failures. However, uncore_enable_box is not
> called on some client platforms (SNB/IVB/HSW) for counting IMC event.
> When it is not called, the box is not initialized, which hard locks the
> system.
>
> Additionally, uncore_enable_box along with the initialization code in it
> is always called in uncore event start functions, which are in IPI
> context. But the initizlization code should not be in IPI context. This
> is because, for example, the IMC box initialization codes for client
> platforms include ioremap, which is not allowed to be called in IPI
> context.
>
> This patch moves uncore_box_init out of IPI context, to uncore event
> init. The box is initialized only when it has not yet been initialized.
>
> Signed-off-by: Kan Liang <[email protected]>
Ok this works for me now. Thanks.
Tested-by: Stephane Eranian <[email protected]>
> ---
> arch/x86/kernel/cpu/perf_event_intel_uncore.c | 4 ++++
> arch/x86/kernel/cpu/perf_event_intel_uncore.h | 2 --
> arch/x86/kernel/cpu/perf_event_intel_uncore_snb.c | 3 +++
> 3 files changed, 7 insertions(+), 2 deletions(-)
>
> diff --git a/arch/x86/kernel/cpu/perf_event_intel_uncore.c b/arch/x86/kernel/cpu/perf_event_intel_uncore.c
> index c635b8b..cbc1a93 100644
> --- a/arch/x86/kernel/cpu/perf_event_intel_uncore.c
> +++ b/arch/x86/kernel/cpu/perf_event_intel_uncore.c
> @@ -623,6 +623,10 @@ static int uncore_pmu_event_init(struct perf_event *event)
> box = uncore_pmu_to_box(pmu, event->cpu);
> if (!box || box->cpu < 0)
> return -EINVAL;
> +
> + /* Init box if it's not initialized yet */
> + uncore_box_init(box);
> +
> event->cpu = box->cpu;
>
> event->hw.idx = -1;
> diff --git a/arch/x86/kernel/cpu/perf_event_intel_uncore.h b/arch/x86/kernel/cpu/perf_event_intel_uncore.h
> index 6c8c1e7..1fb2905 100644
> --- a/arch/x86/kernel/cpu/perf_event_intel_uncore.h
> +++ b/arch/x86/kernel/cpu/perf_event_intel_uncore.h
> @@ -273,8 +273,6 @@ static inline void uncore_disable_box(struct intel_uncore_box *box)
>
> static inline void uncore_enable_box(struct intel_uncore_box *box)
> {
> - uncore_box_init(box);
> -
> if (box->pmu->type->ops->enable_box)
> box->pmu->type->ops->enable_box(box);
> }
> diff --git a/arch/x86/kernel/cpu/perf_event_intel_uncore_snb.c b/arch/x86/kernel/cpu/perf_event_intel_uncore_snb.c
> index 4562e9e..ead70a6 100644
> --- a/arch/x86/kernel/cpu/perf_event_intel_uncore_snb.c
> +++ b/arch/x86/kernel/cpu/perf_event_intel_uncore_snb.c
> @@ -279,6 +279,9 @@ static int snb_uncore_imc_event_init(struct perf_event *event)
> if (!box || box->cpu < 0)
> return -EINVAL;
>
> + /* Init box if it's not initialized yet */
> + uncore_box_init(box);
> +
> event->cpu = box->cpu;
>
> event->hw.idx = -1;
>
> Thanks,
> Kan
>
>> -Andi
>>
>>
>> --
>> [email protected] -- Speaking for myself only
On Mon, Apr 27, 2015 at 03:43:32AM +0000, Liang, Kan wrote:
>
> >
> > > This leads me to believe that this patch:
> > >
> > > commit c05199e5a57a579fea1e8fa65e2b511ceb524ffc
> > > Author: Kan Liang <[email protected]>
> > > Date: Tue Jan 20 04:54:25 2015 +0000
> > >
> > > perf/x86/intel/uncore: Move uncore_box_init() out of driver
> > initialization
> > >
> > > If I revert it, I bet things will work again.
> >
> > Yes the initialization needs to be moved out of the IPI context.
I'm for the clean revert I think. Crashing is bad, but hiding/delaying
it seems counter productive too, it'll just mean we'll only learn about
it later.
* Peter Zijlstra <[email protected]> wrote:
> On Mon, Apr 27, 2015 at 03:43:32AM +0000, Liang, Kan wrote:
> >
> > >
> > > > This leads me to believe that this patch:
> > > >
> > > > commit c05199e5a57a579fea1e8fa65e2b511ceb524ffc
> > > > Author: Kan Liang <[email protected]>
> > > > Date: Tue Jan 20 04:54:25 2015 +0000
> > > >
> > > > perf/x86/intel/uncore: Move uncore_box_init() out of driver
> > > initialization
> > > >
> > > > If I revert it, I bet things will work again.
> > >
> > > Yes the initialization needs to be moved out of the IPI context.
>
> I'm for the clean revert I think. Crashing is bad, but hiding/delaying
> it seems counter productive too, it'll just mean we'll only learn about
> it later.
So should I revert c05199e5a57a, with a Cc: stable?
Thanks,
Ingo
On Mon, 11 May 2015, Ingo Molnar wrote:
>
> * Peter Zijlstra <[email protected]> wrote:
>
> > On Mon, Apr 27, 2015 at 03:43:32AM +0000, Liang, Kan wrote:
> > >
> > > >
> > > > > This leads me to believe that this patch:
> > > > >
> > > > > commit c05199e5a57a579fea1e8fa65e2b511ceb524ffc
> > > > > Author: Kan Liang <[email protected]>
> > > > > Date: Tue Jan 20 04:54:25 2015 +0000
> > > > >
> > > > > perf/x86/intel/uncore: Move uncore_box_init() out of driver
> > > > initialization
> > > > >
> > > > > If I revert it, I bet things will work again.
> > > >
> > > > Yes the initialization needs to be moved out of the IPI context.
> >
> > I'm for the clean revert I think. Crashing is bad, but hiding/delaying
> > it seems counter productive too, it'll just mean we'll only learn about
> > it later.
>
> So should I revert c05199e5a57a, with a Cc: stable?
any progress on this issue? With 4.1-rc7 I can still quickly crash a
Haswell machine due to this bug.
Vince
* Vince Weaver <[email protected]> wrote:
> On Mon, 11 May 2015, Ingo Molnar wrote:
>
> >
> > * Peter Zijlstra <[email protected]> wrote:
> >
> > > On Mon, Apr 27, 2015 at 03:43:32AM +0000, Liang, Kan wrote:
> > > >
> > > > >
> > > > > > This leads me to believe that this patch:
> > > > > >
> > > > > > commit c05199e5a57a579fea1e8fa65e2b511ceb524ffc
> > > > > > Author: Kan Liang <[email protected]>
> > > > > > Date: Tue Jan 20 04:54:25 2015 +0000
> > > > > >
> > > > > > perf/x86/intel/uncore: Move uncore_box_init() out of driver
> > > > > initialization
> > > > > >
> > > > > > If I revert it, I bet things will work again.
> > > > >
> > > > > Yes the initialization needs to be moved out of the IPI context.
> > >
> > > I'm for the clean revert I think. Crashing is bad, but hiding/delaying
> > > it seems counter productive too, it'll just mean we'll only learn about
> > > it later.
> >
> > So should I revert c05199e5a57a, with a Cc: stable?
>
> any progress on this issue? With 4.1-rc7 I can still quickly crash a
> Haswell machine due to this bug.
This dropped through the cracks. Peter?
Thanks,
Ingo
On Mon, May 11, 2015 at 12:00:30PM +0200, Ingo Molnar wrote:
> > I'm for the clean revert I think. Crashing is bad, but hiding/delaying
> > it seems counter productive too, it'll just mean we'll only learn about
> > it later.
>
> So should I revert c05199e5a57a, with a Cc: stable?
Yep.