Date: Sat, 30 Dec 2017 19:20:04 +0100 (CET)
From: Thomas Gleixner
To: Dominik Brodowski
Cc: Andy Lutomirski, dave.hansen@linux.intel.com, LKML, x86@kernel.org, Linus Torvalds
Subject: Re: x86/pti: smp_processor_id() called while preemptible in resume-from-sleep
In-Reply-To: <20171230153054.GA1604@light.dominikbrodowski.net>

On Sat, 30 Dec 2017, Dominik Brodowski wrote:
> On Sat, Dec 30, 2017 at 04:03:07PM +0100, Thomas Gleixner wrote:
> > On Sat, 30 Dec 2017, Dominik Brodowski wrote:
> > > resume-from-sleep (mem/S3) on v4.15-rc5-149-g5aa90a845892 triggers the
> > > following bug. If I boot with "pti=off", the kernel does not show this
> > > issue, and neither did kernels before pti was merged:
> > >
> > > [ 39.951703] ACPI: Low-level resume complete
> > > [ 39.951832] ACPI: EC: EC started
> > > [ 39.951840] PM: Restoring platform NVS memory
> > > [ 39.954648] Enabling non-boot CPUs ...
> > > [ 39.954792] x86: Booting SMP configuration:
> > > [ 39.954800] smpboot: Booting Node 0 Processor 1 APIC 0x2
> > > [ 39.954834] BUG: using smp_processor_id() in preemptible [00000000] code: sh/465
> > > [ 39.954841] caller is native_cpu_up+0x2f0/0xa30
> >
> > I can't reproduce at the moment and I can't find a possible reason for this
> > by code inspection.
>
> Thanks for taking a look at it!
>
> > Can you please provide your .config file
>
> See attached.
>
> > and perhaps decode the two offending call sites with
> >
> > scripts/faddr2line vmlinux native_cpu_up+0x2f0/0xa30 native_cpu_up+0x447/0xa30
>
> native_cpu_up+0x2f0/0xa30:
> invalidate_user_asid at arch/x86/include/asm/tlbflush.h:343

Ah, that makes sense. Missed that in the maze.

What makes less sense is that tlbflush itself. I'm surely missing something
subtle, but at first look that tlbflush is pointless.

> (inlined by) __native_flush_tlb at arch/x86/include/asm/tlbflush.h:351
> (inlined by) smpboot_setup_warm_reset_vector at arch/x86/kernel/smpboot.c:129
> (inlined by) do_boot_cpu at arch/x86/kernel/smpboot.c:950
> (inlined by) native_cpu_up at arch/x86/kernel/smpboot.c:1070
>
> native_cpu_up+0x447/0xa30:
> kern_pcid at arch/x86/include/asm/tlbflush.h:105
> (inlined by) invalidate_user_asid at arch/x86/include/asm/tlbflush.h:342
> (inlined by) __native_flush_tlb at arch/x86/include/asm/tlbflush.h:351
> (inlined by) smpboot_restore_warm_reset_vector at arch/x86/kernel/smpboot.c:146

This one even more so, as the stale comment suggests that there was some
page table fiddling at some point in the past.

> (inlined by) do_boot_cpu at arch/x86/kernel/smpboot.c:1022
> (inlined by) native_cpu_up at arch/x86/kernel/smpboot.c:1070

Let me think about it and do some archaeological research.

Thanks,

	tglx