After turning modversions off I finally managed to get a 4.9-rc kernel to
boot.
Anyway, as per the suggestion at Linux Plumbers, I enabled KASAN, and on my
Haswell machine it falls over within a few minutes of running the perf_fuzzer.
[ 205.740194] ==================================================================
[ 205.748005] BUG: KASAN: slab-out-of-bounds in snb_uncore_imc_event_del+0x6c/0xa0 at addr ffff8800caa43768
[ 205.758324] Read of size 8 by task perf_fuzzer/6618
[ 205.763589] CPU: 0 PID: 6618 Comm: perf_fuzzer Not tainted 4.9.0-rc5 #4
[ 205.770721] Hardware name: LENOVO 10AM000AUS/SHARKBAY, BIOS FBKT72AUS 01/26/2014
[ 205.778689] ffff8800c3c479b8 ffffffff816bb796 ffff88011ec00600 ffff8800caa43580
[ 205.786759] ffff8800c3c479e0 ffffffff812fb961 ffff8800c3c47a78 ffff8800caa43580
[ 205.794850] ffff8800caa43580 ffff8800c3c47a68 ffffffff812fbbd8 ffff8800c3c47a28
[ 205.802911] Call Trace:
[ 205.805559] [<ffffffff816bb796>] dump_stack+0x63/0x8d
[ 205.811135] [<ffffffff812fb961>] kasan_object_err+0x21/0x70
[ 205.817267] [<ffffffff812fbbd8>] kasan_report_error+0x1d8/0x4c0
[ 205.823752] [<ffffffff81133275>] ? __lock_is_held+0x75/0xc0
[ 205.829868] [<ffffffff81025b12>] ? snb_uncore_imc_read_counter+0x42/0x50
[ 205.837198] [<ffffffff810222e2>] ? uncore_perf_event_update+0xe2/0x160
[ 205.844337] [<ffffffff812fc319>] kasan_report+0x39/0x40
[ 205.850085] [<ffffffff81025e3c>] ? snb_uncore_imc_event_del+0x6c/0xa0
[ 205.857114] [<ffffffff812fa8fe>] __asan_load8+0x5e/0x70
[ 205.862874] [<ffffffff81025e3c>] snb_uncore_imc_event_del+0x6c/0xa0
[ 205.869727] [<ffffffff81241bd2>] event_sched_out.isra.89+0x192/0x690
[ 205.876664] [<ffffffff81242167>] group_sched_out+0x97/0x170
[ 205.882760] [<ffffffff81242810>] __perf_event_disable+0x140/0x1b0
[ 205.889395] [<ffffffff812384e7>] event_function+0x117/0x1f0
[ 205.895503] [<ffffffff812426d0>] ? task_ctx_sched_out+0x60/0x60
[ 205.901959] [<ffffffff812383d0>] ? update_group_times+0x50/0x50
[ 205.908425] [<ffffffff8123b020>] ? perf_cgroup_attach+0xb0/0xb0
[ 205.914937] [<ffffffff8123b096>] remote_function+0x76/0xa0
[ 205.920955] [<ffffffff8118da7c>] generic_exec_single+0xfc/0x170
[ 205.927434] [<ffffffff8123b020>] ? perf_cgroup_attach+0xb0/0xb0
[ 205.933883] [<ffffffff8118dc30>] smp_call_function_single+0x140/0x1b0
[ 205.940967] [<ffffffff8118daf0>] ? generic_exec_single+0x170/0x170
[ 205.947776] [<ffffffff81238e48>] event_function_call+0x268/0x270
[ 205.954336] [<ffffffff812426d0>] ? task_ctx_sched_out+0x60/0x60
[ 205.960806] [<ffffffff81238be0>] ? task_function_call+0xc0/0xc0
[ 205.967276] [<ffffffff812426d0>] ? task_ctx_sched_out+0x60/0x60
[ 205.973740] [<ffffffff81238e79>] ? _perf_event_disable+0x29/0x70
[ 205.980300] [<ffffffff812383d0>] ? update_group_times+0x50/0x50
[ 205.986750] [<ffffffff81238e97>] ? _perf_event_disable+0x47/0x70
[ 205.993338] [<ffffffff8113a4d7>] ? do_raw_spin_unlock+0x97/0x130
[ 205.999906] [<ffffffff81238e50>] ? event_function_call+0x270/0x270
[ 206.006674] [<ffffffff81238ea8>] _perf_event_disable+0x58/0x70
[ 206.013069] [<ffffffff812386a3>] perf_event_for_each_child+0x53/0xd0
[ 206.019990] [<ffffffff81247a51>] perf_event_task_disable+0x61/0xc0
[ 206.026759] [<ffffffff810daee2>] SyS_prctl+0x3f2/0x690
[ 206.032409] [<ffffffff810daaf0>] ? SyS_umask+0x40/0x40
[ 206.038059] [<ffffffff81b8dabb>] entry_SYSCALL_64_fastpath+0x1e/0xb2
[ 206.045007] Object at ffff8800caa43580, in cache kmalloc-512 size: 512
[ 206.052015] Allocated:
[ 206.054565] PID = 1
[ 206.058367] [<ffffffff8105fcdb>] save_stack_trace+0x1b/0x20
[ 206.065933] [<ffffffff812facc6>] save_stack+0x46/0xd0
[ 206.072953] [<ffffffff812faf3d>] kasan_kmalloc+0xad/0xe0
[ 206.080214] [<ffffffff812f7e3a>] __kmalloc_node+0x4a/0x60
[ 206.087590] [<ffffffff81020799>] uncore_alloc_box+0x39/0x150
[ 206.095208] [<ffffffff81020b8f>] uncore_pci_probe+0xff/0x4f0
[ 206.102879] [<ffffffff8172bc7a>] local_pci_probe+0x7a/0xd0
[ 206.110347] [<ffffffff8172df6e>] pci_device_probe+0x19e/0x1f0
[ 206.118073] [<ffffffff818a9a1d>] driver_probe_device+0x25d/0x400
[ 206.126087] [<ffffffff818a9c9c>] __driver_attach+0xdc/0xe0
[ 206.133534] [<ffffffff818a653b>] bus_for_each_dev+0xeb/0x150
[ 206.141184] [<ffffffff818a8f2b>] driver_attach+0x2b/0x30
[ 206.148493] [<ffffffff818a8900>] bus_add_driver+0x2b0/0x330
[ 206.156042] [<ffffffff818aa9f3>] driver_register+0xd3/0x190
[ 206.165688] [<ffffffff8172b2b4>] __pci_register_driver+0xb4/0xc0
[ 206.175783] [<ffffffff8261553b>] intel_uncore_init+0x2f3/0x388
[ 206.185672] [<ffffffff81002258>] do_one_initcall+0xa8/0x210
[ 206.195261] [<ffffffff8260e4c2>] kernel_init_freeable+0x27c/0x312
[ 206.205349] [<ffffffff81b80b13>] kernel_init+0x13/0x120
[ 206.214439] [<ffffffff81b8dd35>] ret_from_fork+0x25/0x30
[ 206.222067] Freed:
[ 206.226172] PID = 0
[ 206.230341] (stack is not available)
[ 206.236044] Memory state around the buggy address:
[ 206.243157] ffff8800caa43600: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 206.252788] ffff8800caa43680: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 206.262437] >ffff8800caa43700: 00 00 00 00 00 00 00 00 00 00 00 00 00 fc fc fc
[ 206.272071] ^
[ 206.281005] ffff8800caa43780: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 206.290640] ffff8800caa43800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 206.300302] ==================================================================
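
The "Memory state" rows in the report can be turned back into byte offsets. Below is a minimal user-space sketch in C, assuming the generic KASAN shadow encoding: one shadow byte per 8 bytes of memory, where 00 means all 8 bytes are addressable, 01 to 07 mean only that many leading bytes are, and any other value (such as the fc redzone marker) means none are. The helper name first_inaccessible is made up for illustration and is not a kernel function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SHADOW_GRANULE 8 /* assumed: one shadow byte covers 8 bytes of memory */

/*
 * Hypothetical helper (not a kernel API): given the memory address that the
 * first shadow byte of a row describes, and the row's shadow bytes, return
 * the first address the row marks as not addressable. If every byte in the
 * row is addressable, the address just past the row is returned.
 */
static uint64_t first_inaccessible(uint64_t row_base, const uint8_t *shadow,
                                   size_t n)
{
    assert(shadow != NULL);
    for (size_t i = 0; i < n; i++) {
        uint64_t granule = row_base + (uint64_t)i * SHADOW_GRANULE;

        if (shadow[i] == 0)
            continue;                   /* all 8 bytes addressable */
        if (shadow[i] < SHADOW_GRANULE)
            return granule + shadow[i]; /* only the first shadow[i] bytes */
        return granule;                 /* poisoned, e.g. 0xfc redzone */
    }
    return row_base + (uint64_t)n * SHADOW_GRANULE;
}
```

Fed the row marked with ">" (base ffff8800caa43700, thirteen 00 bytes followed by fc), this returns ffff8800caa43768, which is the reported address. Subtracting the object start ffff8800caa43580 gives 0x1e8 = 488, so under these assumptions the 8-byte read begins at the first byte past a 488-byte allocation inside the 512-byte kmalloc-512 object.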
On Mon, 14 Nov 2016, Vince Weaver wrote:
> Anyway as per the suggestion at Linux Plumbers I enabled KASAN and on my
> haswell machine it falls over in a few minutes of running the perf_fuzzer.
>
> [ 205.740194] ==================================================================
> [ 205.748005] BUG: KASAN: slab-out-of-bounds in snb_uncore_imc_event_del+0x6c/0xa0 at addr ffff8800caa43768
> [ 205.758324] Read of size 8 by task perf_fuzzer/6618
> [ 205.763589] CPU: 0 PID: 6618 Comm: perf_fuzzer Not tainted 4.9.0-rc5 #4
> [ 205.770721] Hardware name: LENOVO 10AM000AUS/SHARKBAY, BIOS FBKT72AUS 01/26/2014
> [ 205.778689] ffff8800c3c479b8 ffffffff816bb796 ffff88011ec00600 ffff8800caa43580
> [ 205.786759] ffff8800c3c479e0 ffffffff812fb961 ffff8800c3c47a78 ffff8800caa43580
> [ 205.794850] ffff8800caa43580 ffff8800c3c47a68 ffffffff812fbbd8 ffff8800c3c47a28
> [ 205.802911] Call Trace:
> [ 205.805559] [<ffffffff816bb796>] dump_stack+0x63/0x8d
> [ 205.811135] [<ffffffff812fb961>] kasan_object_err+0x21/0x70
> [ 205.817267] [<ffffffff812fbbd8>] kasan_report_error+0x1d8/0x4c0
> [ 205.823752] [<ffffffff81133275>] ? __lock_is_held+0x75/0xc0
> [ 205.829868] [<ffffffff81025b12>] ? snb_uncore_imc_read_counter+0x42/0x50
> [ 205.837198] [<ffffffff810222e2>] ? uncore_perf_event_update+0xe2/0x160
> [ 205.844337] [<ffffffff812fc319>] kasan_report+0x39/0x40
> [ 205.850085] [<ffffffff81025e3c>] ? snb_uncore_imc_event_del+0x6c/0xa0
As best I can tell, this maps to:
static void snb_uncore_imc_event_del(struct perf_event *event, int flags)
{
	struct intel_uncore_box *box = uncore_event_to_box(event);
	int i;

	snb_uncore_imc_event_stop(event, PERF_EF_UPDATE);

	for (i = 0; i < box->n_events; i++) {
>>>		if (event == box->event_list[i]) {
			--box->n_events;
			break;
		}
	}
}
Can this code be right? Does it actually remove the event?
The similar code in
static void uncore_pmu_event_del(struct perf_event *event, int flags)
....
	for (i = 0; i < box->n_events; i++) {
		if (event == box->event_list[i]) {
			uncore_put_event_constraint(box, event);

			for (++i; i < box->n_events; i++)
				box->event_list[i - 1] = box->event_list[i];

			--box->n_events;
			break;
		}
	}
seems like it is more likely to be correct.
Vince
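
To make the difference between the two loops concrete, here is a small user-space sketch in C. It is not kernel code: struct toy_box, TOY_MAX_EVENTS and the three function names are hypothetical stand-ins, and only the loop bodies are copied from the snippets quoted above.

```c
#include <assert.h>

#define TOY_MAX_EVENTS 8 /* hypothetical capacity, not the kernel's value */

/* Hypothetical stand-in for the box bookkeeping; not the kernel struct. */
struct toy_box {
    int n_events;
    const void *event_list[TOY_MAX_EVENTS];
};

/* Same loop shape as snb_uncore_imc_event_del: only the count changes. */
static void del_count_only(struct toy_box *box, const void *event)
{
    assert(box->n_events <= TOY_MAX_EVENTS);
    for (int i = 0; i < box->n_events; i++) {
        if (event == box->event_list[i]) {
            --box->n_events;
            break;
        }
    }
}

/* Same loop shape as uncore_pmu_event_del: shift the tail down first. */
static void del_and_compact(struct toy_box *box, const void *event)
{
    assert(box->n_events <= TOY_MAX_EVENTS);
    for (int i = 0; i < box->n_events; i++) {
        if (event == box->event_list[i]) {
            for (++i; i < box->n_events; i++)
                box->event_list[i - 1] = box->event_list[i];
            --box->n_events;
            break;
        }
    }
}

/* Report whether event is still among the first n_events slots. */
static int toy_contains(const struct toy_box *box, const void *event)
{
    for (int i = 0; i < box->n_events; i++)
        if (box->event_list[i] == event)
            return 1;
    return 0;
}
```

Starting from a list {a, b, c} and deleting a, del_count_only leaves {a, b} visible (the deleted event stays and c is cut off), while del_and_compact leaves {b, c}. In this toy model, at least, the count-only loop does not remove the event it was asked to remove.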
On Tue, Nov 15, 2016 at 6:57 AM, Vince Weaver <[email protected]> wrote:
> On Mon, 14 Nov 2016, Vince Weaver wrote:
>
>> Anyway as per the suggestion at Linux Plumbers I enabled KASAN and on my
>> haswell machine it falls over in a few minutes of running the perf_fuzzer.
>>
>> [ 205.740194] ==================================================================
>> [ 205.748005] BUG: KASAN: slab-out-of-bounds in snb_uncore_imc_event_del+0x6c/0xa0 at addr ffff8800caa43768
>> [ 205.758324] Read of size 8 by task perf_fuzzer/6618
>> [ 205.763589] CPU: 0 PID: 6618 Comm: perf_fuzzer Not tainted 4.9.0-rc5 #4
>> [ 205.770721] Hardware name: LENOVO 10AM000AUS/SHARKBAY, BIOS FBKT72AUS 01/26/2014
>> [ 205.778689] ffff8800c3c479b8 ffffffff816bb796 ffff88011ec00600 ffff8800caa43580
>> [ 205.786759] ffff8800c3c479e0 ffffffff812fb961 ffff8800c3c47a78 ffff8800caa43580
>> [ 205.794850] ffff8800caa43580 ffff8800c3c47a68 ffffffff812fbbd8 ffff8800c3c47a28
>> [ 205.802911] Call Trace:
>> [ 205.805559] [<ffffffff816bb796>] dump_stack+0x63/0x8d
>> [ 205.811135] [<ffffffff812fb961>] kasan_object_err+0x21/0x70
>> [ 205.817267] [<ffffffff812fbbd8>] kasan_report_error+0x1d8/0x4c0
>> [ 205.823752] [<ffffffff81133275>] ? __lock_is_held+0x75/0xc0
>> [ 205.829868] [<ffffffff81025b12>] ? snb_uncore_imc_read_counter+0x42/0x50
>> [ 205.837198] [<ffffffff810222e2>] ? uncore_perf_event_update+0xe2/0x160
>> [ 205.844337] [<ffffffff812fc319>] kasan_report+0x39/0x40
>> [ 205.850085] [<ffffffff81025e3c>] ? snb_uncore_imc_event_del+0x6c/0xa0
If you pipe the report through
https://github.com/google/sanitizers/blob/master/address-sanitizer/tools/kasan_symbolize.py
it will give you line numbers and inlined frames.
> The best I can tell this maps to:
>
> static void snb_uncore_imc_event_del(struct perf_event *event, int flags)
> {
> struct intel_uncore_box *box = uncore_event_to_box(event);
> int i;
>
> snb_uncore_imc_event_stop(event, PERF_EF_UPDATE);
>
> for (i = 0; i < box->n_events; i++) {
>>>> if (event == box->event_list[i]) {
> --box->n_events;
> break;
> }
> }
> }
>
> Can this code be right? Does it actually remove the event?
> The similar code in
>
> static void uncore_pmu_event_del(struct perf_event *event, int flags)
>
> ....
>
> for (i = 0; i < box->n_events; i++) {
> if (event == box->event_list[i]) {
> uncore_put_event_constraint(box, event);
>
> for (++i; i < box->n_events; i++)
> box->event_list[i - 1] = box->event_list[i];
>
> --box->n_events;
> break;
> }
> }
>
>
> seems like it is more likely to be correct.
>
> Vince
On Tue, 15 Nov 2016, Dmitry Vyukov wrote:
> If you pipe the report through
> https://github.com/google/sanitizers/blob/master/address-sanitizer/tools/kasan_symbolize.py
> it will give you line numbers and inlined frames.
Is there any documentation for that program? If I run the dump through
it, it removes the timestamps and, as far as I can see, doesn't do anything
else.
Vince
On Tue, Nov 15, 2016 at 2:52 PM, Vince Weaver <[email protected]> wrote:
> On Tue, 15 Nov 2016, Dmitry Vyukov wrote:
>
>> If you pipe the report through
>> https://github.com/google/sanitizers/blob/master/address-sanitizer/tools/kasan_symbolize.py
>> it will give you line numbers and inlined frames.
>
> is there any documentation for that program?
the source code...
> If I run the dump through
> it, it removes the timestamps and as far as I can see doesn't do anything
> else.
There are 4 flags:
https://github.com/google/sanitizers/blob/master/address-sanitizer/tools/kasan_symbolize.py#L315
The only important one is --linux, which should point to a directory
containing vmlinux. The default value is the current working directory.
Also, do you have the kernel built with debug info? No debug info, no line numbers.
On Tue, Nov 15, 2016 at 12:57:31AM -0500, Vince Weaver wrote:
> On Mon, 14 Nov 2016, Vince Weaver wrote:
>
> > Anyway as per the suggestion at Linux Plumbers I enabled KASAN and on my
> > haswell machine it falls over in a few minutes of running the perf_fuzzer.
> >
> > [ 205.740194] ==================================================================
> > [ 205.748005] BUG: KASAN: slab-out-of-bounds in snb_uncore_imc_event_del+0x6c/0xa0 at addr ffff8800caa43768
> > [ 205.758324] Read of size 8 by task perf_fuzzer/6618
> > [ 205.763589] CPU: 0 PID: 6618 Comm: perf_fuzzer Not tainted 4.9.0-rc5 #4
> > [ 205.770721] Hardware name: LENOVO 10AM000AUS/SHARKBAY, BIOS FBKT72AUS 01/26/2014
> > [ 205.778689] ffff8800c3c479b8 ffffffff816bb796 ffff88011ec00600 ffff8800caa43580
> > [ 205.786759] ffff8800c3c479e0 ffffffff812fb961 ffff8800c3c47a78 ffff8800caa43580
> > [ 205.794850] ffff8800caa43580 ffff8800c3c47a68 ffffffff812fbbd8 ffff8800c3c47a28
> > [ 205.802911] Call Trace:
> > [ 205.805559] [<ffffffff816bb796>] dump_stack+0x63/0x8d
> > [ 205.811135] [<ffffffff812fb961>] kasan_object_err+0x21/0x70
> > [ 205.817267] [<ffffffff812fbbd8>] kasan_report_error+0x1d8/0x4c0
> > [ 205.823752] [<ffffffff81133275>] ? __lock_is_held+0x75/0xc0
> > [ 205.829868] [<ffffffff81025b12>] ? snb_uncore_imc_read_counter+0x42/0x50
> > [ 205.837198] [<ffffffff810222e2>] ? uncore_perf_event_update+0xe2/0x160
> > [ 205.844337] [<ffffffff812fc319>] kasan_report+0x39/0x40
> > [ 205.850085] [<ffffffff81025e3c>] ? snb_uncore_imc_event_del+0x6c/0xa0
>
> The best I can tell this maps to:
>
> static void snb_uncore_imc_event_del(struct perf_event *event, int flags)
> {
> struct intel_uncore_box *box = uncore_event_to_box(event);
> int i;
>
> snb_uncore_imc_event_stop(event, PERF_EF_UPDATE);
>
> for (i = 0; i < box->n_events; i++) {
> >>> if (event == box->event_list[i]) {
> --box->n_events;
> break;
> }
> }
> }
>
> Can this code be right? Does it actually remove the event?
> The similar code in
>
> static void uncore_pmu_event_del(struct perf_event *event, int flags)
>
> ....
>
> for (i = 0; i < box->n_events; i++) {
> if (event == box->event_list[i]) {
> uncore_put_event_constraint(box, event);
>
> for (++i; i < box->n_events; i++)
> box->event_list[i - 1] = box->event_list[i];
>
> --box->n_events;
> break;
> }
> }
>
>
> seems like it is more likely to be correct.
Kan, can you look at this?
>
> On Tue, Nov 15, 2016 at 12:57:31AM -0500, Vince Weaver wrote:
> > On Mon, 14 Nov 2016, Vince Weaver wrote:
> >
> > > Anyway as per the suggestion at Linux Plumbers I enabled KASAN and on my
> > > haswell machine it falls over in a few minutes of running the perf_fuzzer.
> > >
> > > [ 205.740194] ==================================================================
> > > [ 205.748005] BUG: KASAN: slab-out-of-bounds in snb_uncore_imc_event_del+0x6c/0xa0 at addr ffff8800caa43768
> > > [ 205.758324] Read of size 8 by task perf_fuzzer/6618
> > > [ 205.763589] CPU: 0 PID: 6618 Comm: perf_fuzzer Not tainted 4.9.0-rc5 #4
> > > [ 205.770721] Hardware name: LENOVO 10AM000AUS/SHARKBAY, BIOS FBKT72AUS 01/26/2014
> > > [ 205.778689] ffff8800c3c479b8 ffffffff816bb796 ffff88011ec00600 ffff8800caa43580
> > > [ 205.786759] ffff8800c3c479e0 ffffffff812fb961 ffff8800c3c47a78 ffff8800caa43580
> > > [ 205.794850] ffff8800caa43580 ffff8800c3c47a68 ffffffff812fbbd8 ffff8800c3c47a28
> > > [ 205.802911] Call Trace:
> > > [ 205.805559] [<ffffffff816bb796>] dump_stack+0x63/0x8d
> > > [ 205.811135] [<ffffffff812fb961>] kasan_object_err+0x21/0x70
> > > [ 205.817267] [<ffffffff812fbbd8>] kasan_report_error+0x1d8/0x4c0
> > > [ 205.823752] [<ffffffff81133275>] ? __lock_is_held+0x75/0xc0
> > > [ 205.829868] [<ffffffff81025b12>] ? snb_uncore_imc_read_counter+0x42/0x50
> > > [ 205.837198] [<ffffffff810222e2>] ? uncore_perf_event_update+0xe2/0x160
> > > [ 205.844337] [<ffffffff812fc319>] kasan_report+0x39/0x40
> > > [ 205.850085] [<ffffffff81025e3c>] ? snb_uncore_imc_event_del+0x6c/0xa0
> >
> > The best I can tell this maps to:
> >
> > static void snb_uncore_imc_event_del(struct perf_event *event, int flags)
> > {
> > struct intel_uncore_box *box = uncore_event_to_box(event);
> > int i;
> >
> > snb_uncore_imc_event_stop(event, PERF_EF_UPDATE);
> >
> > for (i = 0; i < box->n_events; i++) {
> > >>> if (event == box->event_list[i]) {
> > --box->n_events;
> > break;
> > }
> > }
> > }
> >
> > Can this code be right? Does it actually remove the event?
> > The similar code in
> >
> > static void uncore_pmu_event_del(struct perf_event *event, int flags)
> >
> > ....
> >
> > for (i = 0; i < box->n_events; i++) {
> > if (event == box->event_list[i]) {
> > uncore_put_event_constraint(box, event);
> >
> > for (++i; i < box->n_events; i++)
> > box->event_list[i - 1] = box->event_list[i];
> >
> > --box->n_events;
> > break;
> > }
> > }
> >
> >
> > seems like it is more likely to be correct.
>
> Kan, can you look at this?
For client IMC, there are no generic counters.
The current implementation defines its own fixed free-running counters.
event_list and n_events are unused.
I think we can just remove them.
Vince, could you please try the patch below?
------
diff --git a/arch/x86/events/intel/uncore_snb.c b/arch/x86/events/intel/uncore_snb.c
index 81195cc..a3dcc12 100644
--- a/arch/x86/events/intel/uncore_snb.c
+++ b/arch/x86/events/intel/uncore_snb.c
@@ -490,24 +490,12 @@ static int snb_uncore_imc_event_add(struct perf_event *event, int flags)
 
 	snb_uncore_imc_event_start(event, 0);
 
-	box->n_events++;
-
 	return 0;
 }
 
 static void snb_uncore_imc_event_del(struct perf_event *event, int flags)
 {
-	struct intel_uncore_box *box = uncore_event_to_box(event);
-	int i;
-
 	snb_uncore_imc_event_stop(event, PERF_EF_UPDATE);
-
-	for (i = 0; i < box->n_events; i++) {
-		if (event == box->event_list[i]) {
-			--box->n_events;
-			break;
-		}
-	}
 }
 
 int snb_pci2phy_map_init(int devid)
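
One possible reading of why removing the bookkeeping makes the report go away: the diff shows the add path bumping n_events, while the del loop compares against event_list slots that, per the description above, nothing fills in. The toy user-space model below sketches that interaction in C. It assumes that add never stores into event_list; struct toy_box, TOY_LIST_SLOTS, toy_add and toy_del are hypothetical names, and the bounds check exists only so the model can count would-be overruns instead of performing them.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_LIST_SLOTS 4 /* hypothetical array bound for the model */

/* Hypothetical stand-in for the box bookkeeping; not the kernel struct. */
struct toy_box {
    int n_events;
    const void *event_list[TOY_LIST_SLOTS]; /* never written by toy_add() */
};

/* Models the pre-patch add path as far as the diff shows it: count only. */
static void toy_add(struct toy_box *box, const void *event)
{
    (void)event; /* assumption: the event pointer is not recorded anywhere */
    box->n_events++;
}

/*
 * Models the pre-patch del loop, but skips (and counts) every index that
 * would fall outside the array instead of reading it. Returns that count.
 */
static int toy_del(struct toy_box *box, const void *event)
{
    int out_of_bounds = 0;

    assert(event != NULL);
    for (int i = 0; i < box->n_events; i++) {
        if (i >= TOY_LIST_SLOTS) {
            out_of_bounds++;
            continue;
        }
        if (event == box->event_list[i]) {
            --box->n_events;
            break;
        }
    }
    return out_of_bounds;
}
```

In this model, six adds followed by one del find no match (every slot is still NULL), leave n_events at 6, and would have read two slots past a four-slot list. With enough add/del cycles the loop bound can exceed any fixed array size, which would be consistent with the out-of-bounds read KASAN reported, though the thread itself does not spell out the mechanism.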
On Tue, 15 Nov 2016, Liang, Kan wrote:
> For client IMC, there is no generic counters.
> Current implementation defines its own fixed free running counters.
> event_list and n_events are unused.
> I think we can just remove them.
>
> Vince, could you please try the patch as below?
>
With this patch I have not been able to trigger the imc/uncore issue.
Or at least, I used to be able to trigger it within 5 minutes; now I go
longer (maybe 10 minutes) before hitting an unrelated issue.
Vince
> -----Original Message-----
> From: Vince Weaver [mailto:[email protected]]
> Sent: Tuesday, November 15, 2016 12:39 PM
> To: Liang, Kan <[email protected]>
> Cc: Peter Zijlstra <[email protected]>; Vince Weaver
> <[email protected]>; [email protected]; Ingo Molnar
> <[email protected]>; Arnaldo Carvalho de Melo <[email protected]>;
> [email protected]; [email protected]; Stephane Eranian
> <[email protected]>
> Subject: RE: perf: fuzzer KASAN slab-out-of-bounds in
> snb_uncore_imc_event_del
>
> On Tue, 15 Nov 2016, Liang, Kan wrote:
>
> > For client IMC, there is no generic counters.
> > Current implementation defines its own fixed free running counters.
> > event_list and n_events are unused.
> > I think we can just remove them.
> >
> > Vince, could you please try the patch as below?
> >
>
> With this patch I have not been able to trigger the imc/uncore issue.
>
> Or at least I used to be able to trigger it within 5 minutes, now I go longer
> (maybe 10 minutes) before hitting an unrelated issue.
>
Thanks a lot for the test.
I will submit the patch then.
Thanks,
Kan