Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751014AbdL3Skj (ORCPT ); Sat, 30 Dec 2017 13:40:39 -0500 Received: from mail-io0-f181.google.com ([209.85.223.181]:39222 "EHLO mail-io0-f181.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750912AbdL3Ski (ORCPT ); Sat, 30 Dec 2017 13:40:38 -0500 X-Google-Smtp-Source: ACJfBotHMV0DHXACvnr0eNj7HTLsZaw9TdB1xa7qknEweTP/Hq68uZxBMQBdMsuFiS0lc6j6DCYaHXx3TsLOxHz2njE= MIME-Version: 1.0 In-Reply-To: References: <20171230132927.GA2731@light.dominikbrodowski.net> <20171230153054.GA1604@light.dominikbrodowski.net> From: Linus Torvalds Date: Sat, 30 Dec 2017 10:40:37 -0800 X-Google-Sender-Auth: OA86_7Hcw4EVfulUTFDx_LUXKyY Message-ID: Subject: Re: x86/pti: smp_processor_id() called while preemptible in resume-from-sleep To: Thomas Gleixner Cc: Dominik Brodowski , Andy Lutomirski , Dave Hansen , LKML , "the arch/x86 maintainers" Content-Type: text/plain; charset="UTF-8" Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 1859 Lines: 49 On Sat, Dec 30, 2017 at 10:20 AM, Thomas Gleixner wrote: > On Sat, 30 Dec 2017, Dominik Brodowski wrote: >> >> native_cpu_up+0x2f0/0xa30: >> invalidate_user_asid at arch/x86/include/asm/tlbflush.h:343 > > Ah, that makes sense. Missed that in the maze. > > What makes less sense is that tlbflush itself. I'm surely missing something > subtle, but from a first look that tlbflush is pointless. Hmm. Regardless of whether the TLB flush makes sense in that smpboot_setup_warm_reset_vector() location (and I agree that it looks a bit odd), the warning does look pretty relevant. The __native_flush_tlb() function looks _very_ broken. It does: invalidate_user_asid(this_cpu_read(cpu_tlbstate.loaded_mm_asid)); /* * If current->mm == NULL then we borrow a mm which may change * during a task switch and therefore we must not be preempted * while we write CR3 back: */ preempt_disable(); native_write_cr3(__native_read_cr3()); preempt_enable(); but why is that preempt-disabled region only around the cr3 write? The invalidate_user_asid() logic seems to be very CPU-sensitive too. And even if there is some reason why invalidate_user_asid() really can do multiple different percpu accesses and it doesn't matter whether the thread is bouncing around on different cpu's while it does it, there doesn't seem any _reason_ not to just extend the preempt-disable over the whole series. It really looks strange how it does multiple reads (and then a final write!) to percpu state, when the cpu can change in between. So I'd suggest moving the preempt_disable() up to the top of that function, regardless of whether we could then remove that seemingly stale TLB flush in that crazy smpboot_setup/restore_warm_reset_vector() dance... Andy? Linus