Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S1755123AbcLZRbt convert rfc822-to-8bit (ORCPT );
        Mon, 26 Dec 2016 12:31:49 -0500
Received: from userp1040.oracle.com ([156.151.31.81]:44420 "EHLO
        userp1040.oracle.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
        with ESMTP id S1752104AbcLZRbr (ORCPT );
        Mon, 26 Dec 2016 12:31:47 -0500
Subject: Re: [GIT pull] smp/hotplug: Removal of notifiers
To: Markus Trippelsdorf, Thomas Gleixner
References: <20161226074530.GA297@x4> <20161226110600.GB297@x4> <20161226154502.GA287@x4>
Cc: Linus Torvalds, LKML, Ingo Molnar, "H. Peter Anvin", Sebastian Andrzej Siewior, Borislav Petkov
From: Boris Ostrovsky
Message-ID: <53e3b52b-f353-63c8-f96f-649d754596bc@oracle.com>
Date: Mon, 26 Dec 2016 12:31:08 -0500
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Thunderbird/45.4.0
MIME-Version: 1.0
In-Reply-To: <20161226154502.GA287@x4>
Content-Type: text/plain; charset=windows-1252
Content-Transfer-Encoding: 8BIT
X-Source-IP: userv0021.oracle.com [156.151.31.71]
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Length: 4756
Lines: 115

On 12/26/2016 10:45 AM, Markus Trippelsdorf wrote:
> On 2016.12.26 at 12:06 +0100, Markus Trippelsdorf wrote:
>> On 2016.12.26 at 08:45 +0100, Markus Trippelsdorf wrote:
>>> On 2016.12.25 at 14:39 +0100, Thomas Gleixner wrote:
>>>> Linus,
>>>>
>>>> please pull the latest smp-urgent-for-linus git tree from:
>>>>
>>>>    git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git smp-urgent-for-linus
>>>>
>>>> Thomas Gleixner (11):
>>>>       cpu/hotplug: Prevent overwriting of callbacks
>>> The following commit:
>>>
>>> commit dc280d93623927570da279e99393879dbbab39e7
>>> Author: Thomas Gleixner
>>> Date:   Wed Dec 21 20:19:49 2016 +0100
>>>
>>>     cpu/hotplug: Prevent overwriting of callbacks
>>>
>>> results in an early OOPs during boot on my AMD machine.
>>> I haven't written down the entire backtrace, but basically things start to
>>> go wrong in mce_threshold_create_device() from
>>> arch/x86/kernel/cpu/mcheck/mce_amd.c.
>>>
>>> # CONFIG_HOTPLUG_CPU is not set
>>>
>>> Reverting the commit "fixes" the issue for me.
>> CCing Sebastian and Borislav.
> BUG: unable to handle kernel NULL pointer dereference at 000000000000004c
>
> RIP: kobject_get at lib/kobject.c:594
>  (inlined by) kobject_add_internal at lib/kobject.c:214
>
> ? kobj_to_dev at include/linux/device.h:968 (discriminator 1)
>  (inlined by) get_device at drivers/base/core.c:1796 (discriminator 1)
>
> ? kobject_add at lib/kobject.c:415
>
> ? kobject_create_and_add at lib/kobject.c:753
>
> ? threshold_create_bank at arch/x86/kernel/cpu/mcheck/mce_amd.c:1212
>  (inlined by) mce_threshold_create_device at arch/x86/kernel/cpu/mcheck/mce_amd.c:1348
>
> The comment in arch/x86/kernel/cpu/mcheck/mce_amd.c says:
>
> 1384  * mcheck_init_device should be inited before threshold_init_device to
> 1385  * initialize mce_device, otherwise a NULL ptr dereference will cause panic.

My nightly test hit this as well. AMD only, Intel passed.

I haven't verified whether the commit that Markus implicated is the one
that caused this, but it's the same BUG signature (though possibly a
slightly different stack).

[    1.554351] smpboot: CPU0: AMD Engineering Sample (family: 0x10, model: 0x4, stepping: 0x1)
...
[   33.579949] BUG: unable to handle kernel NULL pointer dereference at 000000000000004c
[   33.588018] IP: kobject_get+0x11/0x80
[   33.591787] PGD 0
[   33.591788]
[   33.595386] Oops: 0000 [#1] SMP
[   33.598620] Modules linked in:
[   33.601765] CPU: 1 PID: 1 Comm: swapper/0 Not tainted 4.10.0-rc1upstream #1
[   33.608936] Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./To be filled by O.E.M., BIOS 080014 07/18/2008
[   33.620136] task: ffff880216eb6d40 task.stack: ffffc90000c60000
[   33.626235] RIP: 0010:kobject_get+0x11/0x80
[   33.630543] RSP: 0018:ffffc90000c63c98 EFLAGS: 00010202
[   33.635925] RAX: ffffffff81b6ba09 RBX: 0000000000000010 RCX: 0000000000000000
[   33.643276] RDX: 0000000000000000 RSI: 000000000000002f RDI: 0000000000000010
[   33.650627] RBP: ffffc90000c63ca8 R08: 0000000000000001 R09: 0000000000000025
[   33.657978] R10: dead000000000200 R11: dead000000000100 R12: ffff8802164887c0
[   33.665329] R13: 0000000000000000 R14: 000000000000d538 R15: ffff88021694c180
[   33.672680] FS:  0000000000000000(0000) GS:ffff88021fc80000(0000) knlGS:0000000000000000
[   33.681015] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   33.686933] CR2: 000000000000004c CR3: 0000000001e0a000 CR4: 00000000000006e0
[   33.694284] Call Trace:
[   33.696803]  kobject_add_internal+0x40/0x2e0
[   33.701199]  ? kfree_const+0x1d/0x30
[   33.704878]  kobject_add_varg+0x38/0x60
[   33.708829]  kobject_add+0x44/0x70
[   33.712331]  kobject_create_and_add+0x3e/0x80
[   33.716818]  mce_threshold_create_device+0x128/0x380
[   33.721931]  ? __debugfs_create_file+0xe9/0x130
[   33.726596]  threshold_init_device+0x26/0x56
[   33.730994]  ? severities_debugfs_init+0x3c/0x3c
[   33.735749]  ? severities_debugfs_init+0x3c/0x3c
[   33.740504]  do_one_initcall+0x45/0x170
[   33.744455]  kernel_init_freeable+0x17b/0x214
[   33.748941]  ? kernel_init_freeable+0x214/0x214
[   33.753606]  ? rest_init+0x90/0x90
[   33.757108]  kernel_init+0x9/0x100
[   33.760610]  ret_from_fork+0x25/0x30
[   33.764289] Code: 89 e5 e8 b3 a6 e5 ff c9 c3 90 55 48 89 e5 e8 a7 a6 e5 ff c9 c3 0f 1f 44 00 00 55 48 89 e5 53 48 89 fb 48 83 ec 08 48 85 ff 74 18 47 3c 01 74 1c b8 01 00 00 00 f0 0f c1 43 38 83 c0 01 83 f8
[   33.783741] RIP: kobject_get+0x11/0x80 RSP: ffffc90000c63c98
[   33.789570] CR2: 000000000000004c
[   33.792984] ---[ end trace 861eb820e5b8a9c8 ]---
[   33.797737] Kernel panic - not syncing: Fatal exception
[   33.803132] Kernel Offset: disabled
[   33.806722] ---[ end Kernel panic - not syncing: Fatal exception
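
For anyone comparing this oops against the one Markus posted, the register dump can be checked mechanically. The sketch below is only an illustration, not part of any kernel tooling: the helper name parse_registers and the trimmed OOPS excerpt are made up for this note. It assumes that RDI still holds the first argument of kobject_get() at the faulting instruction (the usual x86-64 calling convention), and it subtracts RDI from the faulting address in CR2.

```python
import re

# Excerpt copied from the oops above, with timestamps removed.
OOPS = """\
BUG: unable to handle kernel NULL pointer dereference at 000000000000004c
RIP: 0010:kobject_get+0x11/0x80
RAX: ffffffff81b6ba09 RBX: 0000000000000010 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 000000000000002f RDI: 0000000000000010
CR2: 000000000000004c CR3: 0000000001e0a000 CR4: 00000000000006e0
"""


def parse_registers(text):
    """Return {name: value} for every 'NAME: <16 hex digits>' pair in text.

    Hypothetical helper for this note; it only understands the plain
    register lines of an x86-64 oops, not segment or RIP lines.
    """
    pairs = re.findall(r"\b([A-Z][A-Z0-9]{1,3}): ([0-9a-f]{16})\b", text)
    return {name: int(value, 16) for name, value in pairs}


regs = parse_registers(OOPS)

# CR2 is the address that faulted; RDI is assumed to be the kobject
# pointer that was passed to kobject_get().
offset = regs["CR2"] - regs["RDI"]

print(f"kobject pointer (RDI): {regs['RDI']:#x}")
print(f"faulting address (CR2): {regs['CR2']:#x}")
print(f"implied member offset:  {offset:#x}")
```

Under that assumption the fault is a read 0x3c bytes into a kobject located at address 0x10. A pointer that small would be consistent with taking the address of a kobject embedded at offset 0x10 in a structure whose own pointer was NULL, which fits the quoted mce_amd.c comment about mce_device, but the log alone does not confirm which structure that was.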