From: Linus Torvalds
Date: Thu, 4 Jan 2018 12:11:56 -0800
Subject: Re: [PATCH 4.4 00/37] 4.4.110-stable review
To: Pavel Tatashin
Cc: Greg Kroah-Hartman, Linux Kernel Mailing List, Andrew Morton, Guenter Roeck, Shuah Khan, patches@kernelci.org, Ben Hutchings, lkft-triage@lists.linaro.org, stable
References: <20180103195056.837404126@linuxfoundation.org>

On Thu, Jan 4, 2018 at 8:38 AM, Pavel Tatashin wrote:
> I am getting the following panic when trying to boot 4.4.110rc1 on
> Intel(R) Xeon(R) CPU E5-2630:
>
> [ 5.923489] BUG: unable to handle kernel NULL pointer dereference at 000000000000000d
> [ 5.932259] IP: [] dyntick_save_progress_counter+0x12/0x50

Hmm. You don't have the "Code:" line in this oops anywhere, do you?

> [ 5.977905] RIP: dyntick_save_progress_counter+0x12/0x50
> [ 5.988505] RSP: 0000:ffff881ff2f27dc0 EFLAGS: 00010046
> [ 5.994434] RAX: 0000000000000001 RBX: ffffffff81b02140 RCX: ffff883fec768000
> [ 6.002403] RDX: 0000000000000000 RSI: ffff881ff2f27e5f RDI: ffff88407e958140
> [ 6.010368] RBP: ffff881ff2f27dc0 R08: ffff881ff2f27e78 R09: 000000016110f359
> [ 6.018333] R10: 0000000000000b10 R11: 0000000000000000 R12: ffffffff81b02140
> [ 6.026297] R13: 00000000ffffffdf R14: 0000000000000021 R15: 0000000200000000
> [ 6.034262] FS: 0000000000000000(0000) GS:ffff881fff940000(0000) knlGS:0000000000000000
> [ 6.043293] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 6.049707] CR2: 000000000000000d CR3: 0000000001aa6000 CR4: 0000000000360670
> [ 6.057672] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [ 6.065638] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [ 6.073603] Stack:
> [ 6.075847] ffff881ff2f27e18 ffffffff810e8fac 0000000000000202 ffff881ff2f27e60
> [ 6.084158] ffff881ff2f27e5f ffffffff810e70c0 ffffffff81b02140 ffffffff81b127a0
> [ 6.092465] 0000000000000001 0000000000000000 0000000000000003 ffff881ff2f27eb8
> [ 6.100768] Call Trace:
> [ 6.103501] [] force_qs_rnp+0xdc/0x150

The oops looks like it *might* be this:

        lock xadd %edx,0xc(%rax)

which is from the

        int snap = atomic_add_return(0, &rdtp->dynticks);

in rcu_dynticks_snap(), because %rax is 1, and that would give you the
invalid page fault and the right faulting address (0x1 + 0xc = 0xd,
matching CR2).

But that would be complete rcu data structure corruption (that rdtp
pointer comes from per_cpu_ptr(rsp->rda, cpu) in force_qs_rnp(), afaik).

The PTI patches obviously change percpu stuff, but this looks like an odd
place for that to manifest.

                Linus