2014-07-30 04:46:00

by Stephane Eranian

Subject: Re: [perf/x86/RAPL] BUG: unable to handle kernel NULL pointer dereference at 00000028

On Wed, Jul 30, 2014 at 6:00 AM, Fengguang Wu wrote:
> Greetings,
>
> 0day kernel testing robot got the below dmesg and the first bad commit is
>
Is this booting a guest kernel or native?
What is the host CPU?

> git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master
> commit 4788e5b4b2338f85fa42a712a182d8afd65d7c58
> Author: Stephane Eranian <[email protected]>
> AuthorDate: Tue Nov 12 17:58:50 2013 +0100
> Commit: Ingo Molnar <[email protected]>
> CommitDate: Wed Nov 27 11:16:40 2013 +0100
>
> perf/x86: Add Intel RAPL PMU support
>
> This patch adds a new uncore PMU to expose the Intel
> RAPL energy consumption counters. Up to 3 counters,
> each counting a particular RAPL event are exposed.
>
> The RAPL counters are available on Intel SandyBridge,
> IvyBridge, Haswell. The server skus add a 3rd counter.
>
> The following events are available and exposed in sysfs:
>
> - power/energy-cores: power consumption of all cores on socket
> - power/energy-pkg: power consumption of all cores + LLc cache
> - power/energy-dram: power consumption of DRAM (servers only)
>
> For each event both the unit (Joules) and scale (2^-32 J)
> is exposed in sysfs for use by perf stat and other tools.
> The files are:
>
> /sys/devices/power/events/energy-*.unit
> /sys/devices/power/events/energy-*.scale
>
> The RAPL PMU is uncore by nature and is implemented such
> that it only works in system-wide mode. Measuring only
> one CPU per socket is sufficient. The /sys/devices/power/cpumask
> file can be used by tools to figure out which CPUs to monitor
> by default. For instance, on a 2-socket system, 2 CPUs
> (one on each socket) will be shown.
>
> All the counters measure in the same unit (exposed via sysfs).
> The perf_events API exposes all RAPL counters as 64-bit integers
> counting in unit of 1/2^32 Joules (about 0.23 nJ). User level tools
> must convert the counts by multiplying them by 2^-32 to obtain
> Joules. The reason for this is that the kernel avoids
> doing floating point math whenever possible because it is
> expensive (user floating-point state must be saved). The method
> used avoids kernel floating-point usage. There is no loss of
> precision. Thanks to PeterZ for suggesting this approach.
>
> To convert the raw count in Watt:
> W = C * 2.3 / (1e10 * time)
> or ldexp(C, -32).
>
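On the conversion quoted above: ldexp(C, -32) on its own yields Joules, and
dividing by the measurement interval gives the average power in Watts, which
is what the first formula approximates (2.3e-10 is roughly 2^-32). A minimal
sketch in Python, assuming the count is a delta read from one of the
power/energy-* events over a known interval; the function names are
hypothetical and not part of perf:

```python
import math


def rapl_count_to_joules(count: int) -> float:
    """Scale a raw RAPL perf count (units of 2^-32 J) to Joules."""
    # ldexp(x, -32) computes x * 2**-32.
    return math.ldexp(count, -32)


def average_watts(count_delta: int, seconds: float) -> float:
    """Average power over an interval: energy in Joules divided by time."""
    if seconds <= 0:
        raise ValueError("interval must be positive")
    return rapl_count_to_joules(count_delta) / seconds


# Hypothetical reading: 3 * 2**32 counts (3 J) accumulated over 1.5 s.
print(average_watts(3 * 2**32, 1.5))  # 2.0
```

Keeping the counter as an integer and scaling only in user space matches the
rationale given above for avoiding floating point in the kernel.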
> RAPL PMU is a new standalone PMU which registers with the
> perf_event core subsystem. The PMU type (attr->type) is
> dynamically allocated and is available from /sys/device/power/type.
>
> Sampling is not supported by the RAPL PMU. There is no
> privilege level filtering either.
>
> Signed-off-by: Stephane Eranian <[email protected]>
> Reviewed-by: Maria Dimakopoulou <[email protected]>
> Reviewed-by: Andi Kleen <[email protected]>
> Signed-off-by: Peter Zijlstra <[email protected]>
> Cc: [email protected]
> Cc: [email protected]
> Cc: [email protected]
> Cc: [email protected]
> Link: http://lkml.kernel.org/r/[email protected]
> Signed-off-by: Ingo Molnar <[email protected]>
>
> +-----------------------------------------------------------+------------+------------+---------------+
> | | 410136f5dd | 4788e5b4b2 | next-20140724 |
> +-----------------------------------------------------------+------------+------------+---------------+
> | boot_successes | 1000 | 751 | 78 |
> | boot_failures | 0 | 149 | 3 |
> | BUG:unable_to_handle_kernel_NULL_pointer_dereference | 0 | 132 | 2 |
> | Oops | 0 | 132 | 2 |
> | EIP_is_at_rapl_pmu_init | 0 | 132 | |
> | Kernel_panic-not_syncing:Attempted_to_kill_init_exitcode= | 0 | 132 | 2 |
> | backtrace:rapl_pmu_init | 0 | 132 | |
> | backtrace:kernel_init_freeable | 0 | 132 | 2 |
> | BUG:kernel_boot_hang | 0 | 17 | 1 |
> | EIP_is_at_validate_chain | 0 | 0 | 2 |
> | backtrace:free_reserved_area | 0 | 0 | 2 |
> | backtrace:free_init_pages | 0 | 0 | 2 |
> | backtrace:populate_rootfs | 0 | 0 | 2 |
> +-----------------------------------------------------------+------------+------------+---------------+
>
> [ 0.613305] PCI: CLS 0 bytes, default 64
> [ 0.614699] Unpacking initramfs...
> [ 0.732188] Freeing initrd memory: 3276K (d3cbd000 - d3ff0000)
> [ 0.733895] BUG: unable to handle kernel NULL pointer dereference at 00000028
> [ 0.735603] IP: [<c09b20cb>] rapl_pmu_init+0x11e/0x139
> [ 0.736012] *pdpt = 0000000000000000 *pde = f000ff53f000ff53
> [ 0.736012] Oops: 0000 [#1] PREEMPT
> [ 0.736012] Modules linked in:
> [ 0.736012] CPU: 0 PID: 1 Comm: swapper Not tainted 3.12.0-05711-g4788e5b #11
> [ 0.736012] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
> [ 0.736012] task: d244c020 ti: d244e000 task.ti: d244e000
> [ 0.736012] EIP: 0060:[<c09b20cb>] EFLAGS: 00010202 CPU: 0
> [ 0.736012] EIP is at rapl_pmu_init+0x11e/0x139
> [ 0.736012] EAX: 00000000 EBX: 00000000 ECX: 00000000 EDX: 00000001
> [ 0.736012] ESI: c09b1fad EDI: 000000cc EBP: d244ff00 ESP: d244fef0
> [ 0.736012] DS: 007b ES: 007b FS: 0000 GS: 00e0 SS: 0068
> [ 0.736012] CR0: 80050033 CR2: 00000028 CR3: 00a16000 CR4: 000406b0
> [ 0.736012] Stack:
> [ 0.736012] c04ddabe 00000000 00000002 00000000 d244ff74 c0200477 c0251b16 d244ff2c
> [ 0.736012] c025467d d3ff63cb d244ff34 c02410cb d3ff63cb d244ff00 c09aa512 c080d71c
> [ 0.736012] 000000cc d244ff74 c02412d5 c0829fe0 00000286 c023b6d8 00000246 00060006
> [ 0.736012] Call Trace:
> [ 0.736012] [<c04ddabe>] ? register_syscore_ops+0x32/0x35
> [ 0.736012] [<c0200477>] do_one_initcall+0xdf/0x138
> [ 0.736012] [<c0251b16>] ? lock_release_holdtime.part.20+0x93/0xf8
> [ 0.736012] [<c025467d>] ? trace_hardirqs_on_caller+0xeb/0x1ad
> [ 0.736012] [<c02410cb>] ? parameq+0x13/0x5e
> [ 0.736012] [<c09aa512>] ? repair_env_string+0x12/0x51
> [ 0.736012] [<c02412d5>] ? parse_args+0x1bf/0x2f8
> [ 0.736012] [<c023b6d8>] ? __usermodehelper_set_disable_depth+0x3e/0x44
> [ 0.736012] [<c09aab46>] kernel_init_freeable+0xde/0x178
> [ 0.736012] [<c09aa500>] ? do_early_param+0x78/0x78
> [ 0.736012] [<c064bd10>] kernel_init+0xb/0xed
> [ 0.736012] [<c0249199>] ? schedule_tail+0xc/0x3a
> [ 0.736012] [<c0659637>] ret_from_kernel_thread+0x1b/0x28
> [ 0.736012] [<c064bd05>] ? rest_init+0xb5/0xb5
> [ 0.736012] Code: 99 87 ff 89 5c 24 04 c7 04 24 90 bf 76 c0 e8 dd e9 c9 ff 83 c8 ff eb 28 a1 44 bc a1 c0 f3 0f b8 c0 90 89 44 24 08 a1 80 73 82 c0 <8b> 40 28 89 44 24 04 c7 04 24 d4 bf 76 c0 e8 b2 e9 c9 ff 31 c0
> [ 0.736012] EIP: [<c09b20cb>] rapl_pmu_init+0x11e/0x139 SS:ESP 0068:d244fef0
> [ 0.736012] CR2: 0000000000000028
> [ 0.736012] ---[ end trace 0a81712c9fb36a0a ]---
> [ 0.736012] swapper (1) used greatest stack depth: 5800 bytes left
>
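The faulting address can be cross-checked against the Code: line of the oops.
The byte marked <8b> starts the faulting instruction, and 8b 40 28 is a 32-bit
load from EAX + 0x28; with EAX = 00000000 that gives exactly the CR2 value
00000028. The bytes just before it (a1 80 73 82 c0) load EAX from a fixed
address, which suggests the pointer stored there was still NULL. A minimal
sketch, assuming 32-bit x86 encoding and handling only this one ModRM form;
the helper name is hypothetical:

```python
REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]


def decode_mov_load_disp8(code: bytes):
    """Decode 'mov r32, [r32 + disp8]' (opcode 8B, ModRM mod=01, no SIB)."""
    opcode, modrm, disp = code[0], code[1], code[2]
    if opcode != 0x8B:
        raise ValueError("not a MOV r32, r/m32 instruction")
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    if mod != 0b01 or rm == 0b100:
        raise ValueError("only the disp8 form without a SIB byte is handled")
    # disp8 is a signed 8-bit displacement.
    disp8 = disp - 256 if disp >= 128 else disp
    return REG32[reg], REG32[rm], disp8


# Bytes at the faulting EIP and the EAX value, both taken from the oops above.
dest, base, disp = decode_mov_load_disp8(bytes.fromhex("8b4028"))
registers = {"eax": 0x00000000}
fault_address = (registers[base] + disp) & 0xFFFFFFFF

print(f"mov {dest}, [{base} + {disp:#x}] -> fault at {fault_address:#010x}")
```

In practice scripts/decodecode in the kernel tree does this for the whole
Code: line; the sketch only covers the single instruction at the fault.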
> git bisect start v3.14 v3.13 --
> git bisect bad 09df7c4c8097ca4a11393b1edd4997d786daad52 # 16:18 0- 3 x86: Remove CONFIG_X86_OOSTORE
> git bisect bad 15c81026204da897a05424c79263aea861a782cc # 16:24 2- 5 Merge branch 'x86-x32-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
> git bisect bad a0fa1dd3cdbccec9597fe53b6177a9aa6e20f2f8 # 16:33 0- 15 Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
> git bisect good edde1fb8c41d0db7c8ce17fb32886da2e389b0cc # 17:48 900+ 0 Merge tag 'localmodconfig-v3.14' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-kconfig
> git bisect good a693c46e14c9fdadbcd68ddfa94a4f72495531a9 # 17:55 900+ 0 Merge branch 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
> git bisect good 2cc3f16cad1561c6fc551aefff559e53726efc8b # 18:12 900+ 0 Merge branch 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
> git bisect bad 9326657abe1a83ed4b4f396b923ca1217fd50cba # 18:21 9- 2 Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
> git bisect bad 7bb73553e2490ac6667387ee723e0faa61e9d999 # 18:38 0- 1 tools lib traceevent: Get rid of die() in reparent_op_arg()
> git bisect bad 3d7c0144491bd8c21d53b43032274a85efdfe434 # 18:41 11- 4 perf tools: Add build and install plugins targets
> git bisect bad ba1ddf42f3c3af111d3adee277534f73c1ef6a9b # 18:43 0- 15 perf script: Print mmap[2] events also
> git bisect bad a8b4c7014cadfdacd4e1f4c963128593be6f20de # 18:49 0- 2 perf completion: Rename file to reflect zsh support
> git bisect bad 4788e5b4b2338f85fa42a712a182d8afd65d7c58 # 18:53 0- 1 perf/x86: Add Intel RAPL PMU support
> git bisect good c912dae60ae6f659455f239298110adc67a5f3e9 # 19:33 900+ 14 uprobes: Cleanup !CONFIG_UPROBES decls, unexport xol_area
> git bisect good 09897d78dbc3a544426f2272b5601c62922ccab9 # 19:44 900+ 0 Merge branch 'uprobes/core' of git://git.kernel.org/pub/scm/linux/kernel/git/oleg/misc into perf/core
> git bisect good 410136f5dd96b6013fe6d1011b523b1c247e1ccb # 19:52 900+ 0 tools/perf/stat: Add event unit and scale support
> # first bad commit: [4788e5b4b2338f85fa42a712a182d8afd65d7c58] perf/x86: Add Intel RAPL PMU support
> git bisect good 410136f5dd96b6013fe6d1011b523b1c247e1ccb # 19:56 1000+ 0 tools/perf/stat: Add event unit and scale support
> git bisect bad 1a58d9909611972fd1c081bb04a9f7dc2571e612 # 19:58 0- 3 Add linux-next specific files for 20140724
> git bisect bad 82e13c71bc655b6dc7110da4e164079dadb44892 # 20:07 448- 10 Merge branch 'for-3.16' of git://linux-nfs.org/~bfields/linux
> git bisect bad 5a7439efd1c5c416f768fc550048ca130cf4bf99 # 20:14 2- 6 Add linux-next specific files for 20140725
>
>
> This script may reproduce the error.
>
> ----------------------------------------------------------------------------
> #!/bin/bash
>
> kernel=$1
> initrd=yocto-minimal-i386.cgz
>
> wget --no-clobber https://github.com/fengguang/reproduce-kernel-bug/blob/master/initrd/$initrd
>
> kvm=(
> qemu-system-x86_64
> -enable-kvm
> -cpu Haswell,+smep,+smap
> -kernel $kernel
> -initrd $initrd
> -m 320
> -smp 1
> -net nic,vlan=1,model=e1000
> -net user,vlan=1
> -boot order=nc
> -no-reboot
> -watchdog i6300esb
> -rtc base=localtime
> -serial stdio
> -display none
> -monitor null
> )
>
> append=(
> hung_task_panic=1
> earlyprintk=ttyS0,115200
> debug
> apic=debug
> sysrq_always_enabled
> rcupdate.rcu_cpu_stall_timeout=100
> panic=10
> softlockup_panic=1
> nmi_watchdog=panic
> prompt_ramdisk=0
> console=ttyS0,115200
> console=tty0
> vga=normal
> root=/dev/ram0
> rw
> drbd.minor_count=8
> )
>
> "${kvm[@]}" --append "${append[*]}"
> ----------------------------------------------------------------------------
>
> Thanks,
> Fengguang
>
> _______________________________________________
> LKP mailing list
> [email protected]
>

2014-07-30 05:53:39

by Fengguang Wu

Subject: Re: [perf/x86/RAPL] BUG: unable to handle kernel NULL pointer dereference at 00000028

On Wed, Jul 30, 2014 at 06:45:58AM +0200, Stephane Eranian wrote:
> On Wed, Jul 30, 2014 at 6:00 AM, Fengguang Wu wrote:
> > Greetings,
> >
> > 0day kernel testing robot got the below dmesg and the first bad commit is
> >
> Is this booting a guest kernel or native?

It's a guest kernel.

> What is the host CPU?

The host CPU is E5-2680, Sandy Bridge-EP.

Thanks,
Fengguang


2014-07-30 17:56:14

by Stephane Eranian

Subject: Re: [perf/x86/RAPL] BUG: unable to handle kernel NULL pointer dereference at 00000028

On Wed, Jul 30, 2014 at 7:53 AM, Fengguang Wu wrote:
> On Wed, Jul 30, 2014 at 06:45:58AM +0200, Stephane Eranian wrote:
>> On Wed, Jul 30, 2014 at 6:00 AM, Fengguang Wu wrote:
>> > Greetings,
>> >
>> > 0day kernel testing robot got the below dmesg and the first bad commit is
>> >
>> Is this booting a guest kernel or native?
>
> It's a guest kernel.
>
>> What is the host CPU?
>
> The host CPU is E5-2680, Sandy Bridge-EP.
>
I thought this problem had already been reported a while back.

See https://lkml.org/lkml/2014/3/6/685
And https://lkml.org/lkml/2014/4/23/512

So what you are telling me here is that either those two fixes never made
it in, or you are running an older kernel.



2014-07-31 02:32:58

by Fengguang Wu

Subject: Re: [perf/x86/RAPL] BUG: unable to handle kernel NULL pointer dereference at 00000028

Hi Stephane,

On Wed, Jul 30, 2014 at 07:56:11PM +0200, Stephane Eranian wrote:
> On Wed, Jul 30, 2014 at 7:53 AM, Fengguang Wu <[email protected]> wrote:
> > On Wed, Jul 30, 2014 at 06:45:58AM +0200, Stephane Eranian wrote:
> >> On Wed, Jul 30, 2014 at 6:00 AM, Fengguang Wu <[email protected]> wrote:
> >> > Greetings,
> >> >
> >> > 0day kernel testing robot got the below dmesg and the first bad commit is
> >> >
> >> Is this booting a guest kernel or native?
> >
> > It's a guest kernel.
> >
> >> What is the host CPU?
> >
> > The host CPU is E5-2680, Sandy Bridge-EP.
> >
> I thought this problem had already been mentioned a while back.
>
> See https://lkml.org/lkml/2014/3/6/685
> And https://lkml.org/lkml/2014/4/23/512
>
> So what you are telling me here is that either those two fixes never made it
> or you are running an older kernel.

I just checked linux-next and found that the bug in rapl_pmu_init() has
been fixed. linux-next happens to show the same "BUG: unable to handle
kernel NULL pointer dereference" message, but in a different function,
validate_chain(). Attached is the dmesg from linux-next.

Sorry for the noise!

Thanks,
Fengguang



Attachments:
(No filename) (13.80 kB)
dmesg-quantal-ivb42-98:20140725122403:i386-randconfig-ib1-07251153:: (30.41 kB)

2014-07-31 17:57:27

by Stephane Eranian

Subject: Re: [perf/x86/RAPL] BUG: unable to handle kernel NULL pointer dereference at 00000028

On Thu, Jul 31, 2014 at 4:32 AM, Fengguang Wu <[email protected]> wrote:
> Hi Stephane,
>
> On Wed, Jul 30, 2014 at 07:56:11PM +0200, Stephane Eranian wrote:
>> On Wed, Jul 30, 2014 at 7:53 AM, Fengguang Wu <[email protected]> wrote:
>> > On Wed, Jul 30, 2014 at 06:45:58AM +0200, Stephane Eranian wrote:
>> >> On Wed, Jul 30, 2014 at 6:00 AM, Fengguang Wu <[email protected]> wrote:
>> >> > Greetings,
>> >> >
>> >> > 0day kernel testing robot got the below dmesg and the first bad commit is
>> >> >
>> >> Is this booting a guest kernel or native?
>> >
>> > It's a guest kernel.
>> >
>> >> What is the host CPU?
>> >
>> > The host CPU is E5-2680, Sandy Bridge-EP.
>> >
>> I thought this problem had already been mentioned a while back.
>>
>> See https://lkml.org/lkml/2014/3/6/685
>> And https://lkml.org/lkml/2014/4/23/512
>>
>> So what you are telling me here is that either those two fixes never made it
>> or you are running an older kernel.
>
> I just checked linux-next and found that the bug in rapl_pmu_init() has
> been fixed. linux-next happens to show the same "BUG: unable to handle
> kernel NULL pointer dereference" message, but in a different function,
> validate_chain(). Attached is the dmesg from linux-next.
>
> Sorry for the noise!
>
Is it fixed with the two patches I referred you to?

> Thanks,
> Fengguang
>

2014-07-31 23:59:49

by Fengguang Wu

Subject: Re: [perf/x86/RAPL] BUG: unable to handle kernel NULL pointer dereference at 00000028

On Thu, Jul 31, 2014 at 07:57:25PM +0200, Stephane Eranian wrote:
> On Thu, Jul 31, 2014 at 4:32 AM, Fengguang Wu <[email protected]> wrote:
> > Hi Stephane,
> >
> > On Wed, Jul 30, 2014 at 07:56:11PM +0200, Stephane Eranian wrote:
> >> On Wed, Jul 30, 2014 at 7:53 AM, Fengguang Wu <[email protected]> wrote:
> >> > On Wed, Jul 30, 2014 at 06:45:58AM +0200, Stephane Eranian wrote:
> >> >> On Wed, Jul 30, 2014 at 6:00 AM, Fengguang Wu <[email protected]> wrote:
> >> >> > Greetings,
> >> >> >
> >> >> > 0day kernel testing robot got the below dmesg and the first bad commit is
> >> >> >
> >> >> Is this booting a guest kernel or native?
> >> >
> >> > It's a guest kernel.
> >> >
> >> >> What is the host CPU?
> >> >
> >> > The host CPU is E5-2680, Sandy Bridge-EP.
> >> >
> >> I thought this problem had already been mentioned a while back.
> >>
> >> See https://lkml.org/lkml/2014/3/6/685
> >> And https://lkml.org/lkml/2014/4/23/512
> >>
> >> So what you are telling me here is that either those two fixes never made it
> >> or you are running an older kernel.
> >
> > I just checked linux-next and found that the bug in rapl_pmu_init() has
> > been fixed. linux-next happens to show the same "BUG: unable to handle
> > kernel NULL pointer dereference" message, but in a different function,
> > validate_chain(). Attached is the dmesg from linux-next.
> >
> > Sorry for the noise!
> >
> Is it fixed with the two patches I referred you to?

Yes.

Thanks,
Fengguang