From: Andy Lutomirski
Date: Mon, 27 Aug 2018 12:58:22 -0700
Subject: Re: TLB flushes on fixmap changes
To: Nadav Amit
Cc: Andy Lutomirski, Masami Hiramatsu, Peter Zijlstra, Kees Cook,
    Linus Torvalds, Paolo Bonzini, Jiri Kosina, Will Deacon,
    Benjamin Herrenschmidt, Nick Piggin, "the arch/x86 maintainers",
    Borislav Petkov, Rik van Riel, Jann Horn, Adin Scannell, Dave Hansen,
    Linux Kernel Mailing List, linux-mm, David Miller, Martin Schwidefsky,
    Michael Ellerman

On Mon, Aug 27, 2018 at 12:43 PM, Nadav Amit wrote:
> at 12:10 PM, Nadav Amit wrote:
>
>> at 11:58 AM, Andy Lutomirski wrote:
>>
>>> On Mon, Aug 27, 2018 at 11:54 AM, Nadav Amit wrote:
>>>>> On Mon, Aug 27, 2018 at 10:34 AM, Nadav Amit wrote:
>>>>> What do you all think?
>>>>
>>>> I agree in general. But I think that current->mm would need to be loaded, as
>>>> otherwise I am afraid it would break switch_mm_irqs_off().
>>>
>>> What breaks?
>>
>> Actually nothing. I just saw the IBPB stuff regarding tsk, but it should not
>> matter.
>
> So here is what I got.
> It certainly needs some cleanup, but it boots.
>
> Let me know how crappy you find it...
>
>
> diff --git a/arch/x86/include/asm/mmu_context.h b/arch/x86/include/asm/mmu_context.h
> index bbc796eb0a3b..336779650a41 100644
> --- a/arch/x86/include/asm/mmu_context.h
> +++ b/arch/x86/include/asm/mmu_context.h
> @@ -343,4 +343,24 @@ static inline unsigned long __get_current_cr3_fast(void)
>  	return cr3;
>  }
>
> +typedef struct {
> +	struct mm_struct *prev;
> +} temporary_mm_state_t;
> +
> +static inline temporary_mm_state_t use_temporary_mm(struct mm_struct *mm)
> +{
> +	temporary_mm_state_t state;
> +
> +	lockdep_assert_irqs_disabled();
> +	state.prev = this_cpu_read(cpu_tlbstate.loaded_mm);
> +	switch_mm_irqs_off(NULL, mm, current);
> +	return state;
> +}
> +
> +static inline void unuse_temporary_mm(temporary_mm_state_t prev)
> +{
> +	lockdep_assert_irqs_disabled();
> +	switch_mm_irqs_off(NULL, prev.prev, current);
> +}
> +
>  #endif /* _ASM_X86_MMU_CONTEXT_H */
> diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
> index 5715647fc4fe..ef62af9a0ef7 100644
> --- a/arch/x86/include/asm/pgtable.h
> +++ b/arch/x86/include/asm/pgtable.h
> @@ -976,6 +976,10 @@ static inline void __meminit init_trampoline_default(void)
>  	/* Default trampoline pgd value */
>  	trampoline_pgd_entry = init_top_pgt[pgd_index(__PAGE_OFFSET)];
>  }
> +
> +void __init patching_mm_init(void);
> +#define patching_mm_init patching_mm_init
> +
>  # ifdef CONFIG_RANDOMIZE_MEMORY
>  void __meminit init_trampoline(void);
>  # else
> diff --git a/arch/x86/include/asm/pgtable_64_types.h b/arch/x86/include/asm/pgtable_64_types.h
> index 054765ab2da2..9f44262abde0 100644
> --- a/arch/x86/include/asm/pgtable_64_types.h
> +++ b/arch/x86/include/asm/pgtable_64_types.h
> @@ -116,6 +116,9 @@ extern unsigned int ptrs_per_p4d;
>  #define LDT_PGD_ENTRY		(pgtable_l5_enabled() ? LDT_PGD_ENTRY_L5 : LDT_PGD_ENTRY_L4)
>  #define LDT_BASE_ADDR		(LDT_PGD_ENTRY << PGDIR_SHIFT)
>
> +#define TEXT_POKE_PGD_ENTRY	-5UL
> +#define TEXT_POKE_ADDR		(TEXT_POKE_PGD_ENTRY << PGDIR_SHIFT)
> +
>  #define __VMALLOC_BASE_L4	0xffffc90000000000UL
>  #define __VMALLOC_BASE_L5	0xffa0000000000000UL
>
> diff --git a/arch/x86/include/asm/pgtable_types.h b/arch/x86/include/asm/pgtable_types.h
> index 99fff853c944..840c72ec8c4f 100644
> --- a/arch/x86/include/asm/pgtable_types.h
> +++ b/arch/x86/include/asm/pgtable_types.h
> @@ -505,6 +505,9 @@ pgprot_t phys_mem_access_prot(struct file *file, unsigned long pfn,
>  /* Install a pte for a particular vaddr in kernel space. */
>  void set_pte_vaddr(unsigned long vaddr, pte_t pte);
>
> +struct mm_struct;
> +void set_mm_pte_vaddr(struct mm_struct *mm, unsigned long vaddr, pte_t pte);
> +
>  #ifdef CONFIG_X86_32
>  extern void native_pagetable_init(void);
>  #else
> diff --git a/arch/x86/include/asm/text-patching.h b/arch/x86/include/asm/text-patching.h
> index 2ecd34e2d46c..cb364ea5b19d 100644
> --- a/arch/x86/include/asm/text-patching.h
> +++ b/arch/x86/include/asm/text-patching.h
> @@ -38,4 +38,6 @@ extern void *text_poke(void *addr, const void *opcode, size_t len);
>  extern int poke_int3_handler(struct pt_regs *regs);
>  extern void *text_poke_bp(void *addr, const void *opcode, size_t len, void *handler);
>
> +extern struct mm_struct *patching_mm;
> +
>  #endif /* _ASM_X86_TEXT_PATCHING_H */
> diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c
> index a481763a3776..fd8a950b0d62 100644
> --- a/arch/x86/kernel/alternative.c
> +++ b/arch/x86/kernel/alternative.c
> @@ -11,6 +11,7 @@
>  #include
>  #include
>  #include
> +#include
>  #include
>  #include
>  #include
> @@ -701,8 +702,36 @@ void *text_poke(void *addr, const void *opcode, size_t len)
>  		WARN_ON(!PageReserved(pages[0]));
>  		pages[1] = virt_to_page(addr + PAGE_SIZE);
>  	}
> -	BUG_ON(!pages[0]);
> +
>  	local_irq_save(flags);
> +	BUG_ON(!pages[0]);
> +
> +	/*
> +	 * During initial boot, it is hard to initialize patching_mm due to
> +	 * dependencies in boot order.
> +	 */
> +	if (patching_mm) {
> +		pte_t pte;
> +		temporary_mm_state_t prev;
> +
> +		prev = use_temporary_mm(patching_mm);
> +		pte = mk_pte(pages[0], PAGE_KERNEL);
> +		set_mm_pte_vaddr(patching_mm, TEXT_POKE_ADDR, pte);
> +		pte = mk_pte(pages[1], PAGE_KERNEL);
> +		set_mm_pte_vaddr(patching_mm, TEXT_POKE_ADDR + PAGE_SIZE, pte);
> +
> +		memcpy((void *)(TEXT_POKE_ADDR | ((unsigned long)addr & ~PAGE_MASK)),
> +		       opcode, len);
> +
> +		set_mm_pte_vaddr(patching_mm, TEXT_POKE_ADDR, __pte(0));
> +		set_mm_pte_vaddr(patching_mm, TEXT_POKE_ADDR + PAGE_SIZE, __pte(0));
> +		local_flush_tlb();

Hmm.  This stuff is busted on SMP, and it's IMO more complicated than
needed.  How about getting rid of all the weird TLB flushing stuff and
instead putting the mapping at vaddr - __START_KERNEL_map or whatever
it is?  You *might* need to flush_tlb_mm_range() on module unload, but
that's it.

> +		sync_core();

I can't think of any case where sync_core() is needed.  The mm switch
serializes.

Also, is there any circumstance in which any of this is used before at
least jump table init?  All the early stuff is text_poke_early(),
right?
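
To make the two addressing schemes under discussion concrete, here is a minimal user-space sketch of the arithmetic only. It is not kernel code and not part of the patch. The constants are assumptions for x86-64 with 4-level paging (PGDIR_SHIFT of 39, __START_KERNEL_map at 0xffffffff80000000), and the helper names poke_window_addr() and poke_alias_addr() are hypothetical. The first helper mirrors the quoted patch, where every poke goes through one shared two-page window at TEXT_POKE_ADDR. The second mirrors the suggested "vaddr - __START_KERNEL_map" placement, read here as giving each text address its own stable alias inside patching_mm.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed x86-64 values for 4-level paging; not taken from the patch. */
#define PAGE_SHIFT		12
#define PAGE_SIZE		(UINT64_C(1) << PAGE_SHIFT)
#define PAGE_MASK		(~(PAGE_SIZE - 1))
#define PGDIR_SHIFT		39
#define START_KERNEL_MAP	UINT64_C(0xffffffff80000000) /* __START_KERNEL_map */

/* As in the quoted patch: PGD slot -5, shifted into a virtual address. */
#define TEXT_POKE_PGD_ENTRY	((uint64_t)-5)
#define TEXT_POKE_ADDR		(TEXT_POKE_PGD_ENTRY << PGDIR_SHIFT)

/*
 * Scheme in the quoted patch: one shared two-page window. Only the offset
 * within the page survives, so the window must be remapped for each poke.
 */
static uint64_t poke_window_addr(uint64_t addr, size_t len)
{
	uint64_t offset = addr & ~PAGE_MASK;

	/* A poke may straddle one page boundary, but not more. */
	assert(offset + len <= 2 * PAGE_SIZE);
	return TEXT_POKE_ADDR | offset;
}

/*
 * Suggested scheme (hypothetical helper): subtract the kernel text base so
 * that every text address maps to a distinct, fixed alias.
 */
static uint64_t poke_alias_addr(uint64_t addr)
{
	assert(addr >= START_KERNEL_MAP);
	return addr - START_KERNEL_MAP;
}
```

Under these assumptions, two pokes to different text pages land on the same window address in the first scheme, which is why the PTEs there have to be torn down and flushed each time. In the second scheme they land on different aliases, so a mapping could in principle stay in place until the underlying text goes away (for example on module unload).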