Subject: Re: TLB flushes on fixmap changes
From: Nadav Amit
Date: Mon, 27 Aug 2018 13:16:22 -0700
Cc: Masami Hiramatsu, Peter Zijlstra, Kees Cook, Linus Torvalds,
    Paolo Bonzini, Jiri Kosina, Will Deacon, Benjamin Herrenschmidt,
    Nick Piggin, the arch/x86 maintainers, Borislav Petkov, Rik van Riel,
    Jann Horn, Adin Scannell, Dave Hansen, Linux Kernel Mailing List,
    linux-mm, David Miller, Martin Schwidefsky, Michael Ellerman
Message-Id: <08A6BCB2-66C2-47ED-AEB8-AA8F4D7DBD45@gmail.com>
References: <20180824180438.GS24124@hirez.programming.kicks-ass.net>
    <56A9902F-44BE-4520-A17C-26650FCC3A11@gmail.com>
    <9A38D3F4-2F75-401D-8B4D-83A844C9061B@gmail.com>
    <8E0D8C66-6F21-4890-8984-B6B3082D4CC5@gmail.com>
    <20180826112341.f77a528763e297cbc36058fa@kernel.org>
    <20180826090958.GT24124@hirez.programming.kicks-ass.net>
    <20180827120305.01a6f26267c64610cadec5d8@kernel.org>
    <4BF82052-4738-441C-8763-26C85003F2C9@gmail.com>
    <20180827170511.6bafa15cbc102ae135366e86@kernel.org>
    <01DA0BDD-7504-4209-8A8F-20B27CF6A1C7@gmail.com>
    <0000D631-FDDF-4273-8F3C-714E6825E59B@gmail.com>
    <823D916E-4056-4A36-BDD8-0FB682A8DCAE@gmail.com>
To: Andy Lutomirski

at 12:58 PM, Andy Lutomirski wrote:

> On Mon, Aug 27, 2018 at 12:43 PM, Nadav Amit wrote:
>> at 12:10 PM, Nadav Amit wrote:
>>
>>> at 11:58 AM, Andy Lutomirski wrote:
>>>
>>>> On Mon, Aug 27, 2018 at 11:54 AM, Nadav Amit wrote:
>>>>>> On Mon, Aug 27, 2018 at 10:34 AM, Nadav Amit wrote:
>>>>>> What do you all think?
>>>>>
>>>>> I agree in general. But I think that current->mm would need to be loaded, as
>>>>> otherwise I am afraid it would break switch_mm_irqs_off().
>>>>
>>>> What breaks?
>>>
>>> Actually nothing. I just saw the IBPB stuff regarding tsk, but it should not
>>> matter.
>>
>> So here is what I got. It certainly needs some cleanup, but it boots.
>>
>> Let me know how crappy you find it...
>>
>>
>> diff --git a/arch/x86/include/asm/mmu_context.h b/arch/x86/include/asm/mmu_context.h
>> index bbc796eb0a3b..336779650a41 100644
>> --- a/arch/x86/include/asm/mmu_context.h
>> +++ b/arch/x86/include/asm/mmu_context.h
>> @@ -343,4 +343,24 @@ static inline unsigned long __get_current_cr3_fast(void)
>>         return cr3;
>>  }
>>
>> +typedef struct {
>> +       struct mm_struct *prev;
>> +} temporary_mm_state_t;
>> +
>> +static inline temporary_mm_state_t use_temporary_mm(struct mm_struct *mm)
>> +{
>> +       temporary_mm_state_t state;
>> +
>> +       lockdep_assert_irqs_disabled();
>> +       state.prev = this_cpu_read(cpu_tlbstate.loaded_mm);
>> +       switch_mm_irqs_off(NULL, mm, current);
>> +       return state;
>> +}
>> +
>> +static inline void unuse_temporary_mm(temporary_mm_state_t prev)
>> +{
>> +       lockdep_assert_irqs_disabled();
>> +       switch_mm_irqs_off(NULL, prev.prev, current);
>> +}
>> +
>>  #endif /* _ASM_X86_MMU_CONTEXT_H */
>> diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
>> index 5715647fc4fe..ef62af9a0ef7 100644
>> --- a/arch/x86/include/asm/pgtable.h
>> +++ b/arch/x86/include/asm/pgtable.h
>> @@ -976,6 +976,10 @@ static inline void __meminit init_trampoline_default(void)
>>         /* Default trampoline pgd value */
>>         trampoline_pgd_entry = init_top_pgt[pgd_index(__PAGE_OFFSET)];
>>  }
>> +
>> +void __init patching_mm_init(void);
>> +#define patching_mm_init patching_mm_init
>> +
>>  # ifdef CONFIG_RANDOMIZE_MEMORY
>>  void __meminit init_trampoline(void);
>>  # else
>> diff --git a/arch/x86/include/asm/pgtable_64_types.h b/arch/x86/include/asm/pgtable_64_types.h
>> index 054765ab2da2..9f44262abde0 100644
>> --- a/arch/x86/include/asm/pgtable_64_types.h
>> +++ b/arch/x86/include/asm/pgtable_64_types.h
>> @@ -116,6 +116,9 @@ extern unsigned int ptrs_per_p4d;
>>  #define LDT_PGD_ENTRY          (pgtable_l5_enabled() ? LDT_PGD_ENTRY_L5 : LDT_PGD_ENTRY_L4)
>>  #define LDT_BASE_ADDR          (LDT_PGD_ENTRY << PGDIR_SHIFT)
>>
>> +#define TEXT_POKE_PGD_ENTRY    -5UL
>> +#define TEXT_POKE_ADDR         (TEXT_POKE_PGD_ENTRY << PGDIR_SHIFT)
>> +
>>  #define __VMALLOC_BASE_L4      0xffffc90000000000UL
>>  #define __VMALLOC_BASE_L5      0xffa0000000000000UL
>>
>> diff --git a/arch/x86/include/asm/pgtable_types.h b/arch/x86/include/asm/pgtable_types.h
>> index 99fff853c944..840c72ec8c4f 100644
>> --- a/arch/x86/include/asm/pgtable_types.h
>> +++ b/arch/x86/include/asm/pgtable_types.h
>> @@ -505,6 +505,9 @@ pgprot_t phys_mem_access_prot(struct file *file, unsigned long pfn,
>>  /* Install a pte for a particular vaddr in kernel space. */
>>  void set_pte_vaddr(unsigned long vaddr, pte_t pte);
>>
>> +struct mm_struct;
>> +void set_mm_pte_vaddr(struct mm_struct *mm, unsigned long vaddr, pte_t pte);
>> +
>>  #ifdef CONFIG_X86_32
>>  extern void native_pagetable_init(void);
>>  #else
>> diff --git a/arch/x86/include/asm/text-patching.h b/arch/x86/include/asm/text-patching.h
>> index 2ecd34e2d46c..cb364ea5b19d 100644
>> --- a/arch/x86/include/asm/text-patching.h
>> +++ b/arch/x86/include/asm/text-patching.h
>> @@ -38,4 +38,6 @@ extern void *text_poke(void *addr, const void *opcode, size_t len);
>>  extern int poke_int3_handler(struct pt_regs *regs);
>>  extern void *text_poke_bp(void *addr, const void *opcode, size_t len, void *handler);
>>
>> +extern struct mm_struct *patching_mm;
>> +
>>  #endif /* _ASM_X86_TEXT_PATCHING_H */
>> diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c
>> index a481763a3776..fd8a950b0d62 100644
>> --- a/arch/x86/kernel/alternative.c
>> +++ b/arch/x86/kernel/alternative.c
>> @@ -11,6 +11,7 @@
>>  #include
>>  #include
>>  #include
>> +#include
>>  #include
>>  #include
>>  #include
>> @@ -701,8 +702,36 @@ void *text_poke(void *addr, const void *opcode, size_t len)
>>                 WARN_ON(!PageReserved(pages[0]));
>>                 pages[1] = virt_to_page(addr + PAGE_SIZE);
>>         }
>> -       BUG_ON(!pages[0]);
>> +
>>         local_irq_save(flags);
>> +       BUG_ON(!pages[0]);
>> +
>> +       /*
>> +        * During initial boot, it is hard to initialize patching_mm due to
>> +        * dependencies in boot order.
>> +        */
>> +       if (patching_mm) {
>> +               pte_t pte;
>> +               temporary_mm_state_t prev;
>> +
>> +               prev = use_temporary_mm(patching_mm);
>> +               pte = mk_pte(pages[0], PAGE_KERNEL);
>> +               set_mm_pte_vaddr(patching_mm, TEXT_POKE_ADDR, pte);
>> +               pte = mk_pte(pages[1], PAGE_KERNEL);
>> +               set_mm_pte_vaddr(patching_mm, TEXT_POKE_ADDR + PAGE_SIZE, pte);
>> +
>> +               memcpy((void *)(TEXT_POKE_ADDR | ((unsigned long)addr & ~PAGE_MASK)),
>> +                      opcode, len);
>> +
>> +               set_mm_pte_vaddr(patching_mm, TEXT_POKE_ADDR, __pte(0));
>> +               set_mm_pte_vaddr(patching_mm, TEXT_POKE_ADDR + PAGE_SIZE, __pte(0));
>> +               local_flush_tlb();
>
> Hmm. This is stuff busted on SMP, and it's IMO more complicated than
> needed. How about getting rid of all the weird TLB flushing stuff and
> instead putting the mapping at vaddr - __START_KERNEL_map or whatever
> it is? You *might* need to flush_tlb_mm_range() on module unload, but
> that's it.

I don't see what's wrong on SMP, since this entire piece of code should be
running under text_mutex.
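To make that serialization argument concrete, here is a minimal sketch of the
contract it relies on. It reuses the helpers from the patch above; the wrapper
itself, its includes, and its assertions are illustrative assumptions, not
posted code:

#include <linux/lockdep.h>
#include <linux/memory.h>       /* text_mutex */
#include <asm/mmu_context.h>    /* use_temporary_mm() (per the patch above) */
#include <asm/text-patching.h>  /* patching_mm (per the patch above) */

static void poke_under_temporary_mm(void *addr, const void *opcode, size_t len)
{
        temporary_mm_state_t prev;
        unsigned long flags;

        /* Callers serialize system-wide: only one patcher at a time. */
        lockdep_assert_held(&text_mutex);

        /*
         * IRQs off: no preemption or migration, so the mm switch below
         * affects only this CPU's loaded_mm/CR3.
         */
        local_irq_save(flags);
        prev = use_temporary_mm(patching_mm);

        /*
         * ... map the target pages at TEXT_POKE_ADDR, memcpy(opcode, len),
         *     clear the PTEs, local_flush_tlb() -- as in the text_poke()
         *     hunk above ...
         */

        unuse_temporary_mm(prev);
        local_irq_restore(flags);
}

Since no other CPU ever loads patching_mm, the transient writable mapping is
only visible on the CPU that currently holds text_mutex.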
I don't quite understand your proposal. I really don't want to have any
chance in which the page tables for the poked address are not preallocated.

It is more complicated than needed, and there are redundant TLB flushes. The
reason I preferred to do it this way is to avoid using other functions that
take locks during the software page-walk, and to avoid duplicating existing
code. Yet, duplication might be the way to go.

>> +               sync_core();
>
> I can't think of any case where sync_core() is needed. The mm switch
> serializes.

Good point!

>
> Also, is there any circumstance in which any of this is used before at
> least jump table init? All the early stuff is text_poke_early(),
> right?

Not before jump_label_init. However, I did not manage to get rid of the two
code paths in text_poke(), since text_poke() is used relatively early by
x86_late_time_init(), and at this stage kmem_cache_alloc() - which is needed
to duplicate init_mm - still fails.
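For completeness, here is a minimal sketch of what the deferred
patching_mm_init() could look like once the slab allocator is up. It only
follows the patch's naming; using a plain mm_alloc() instead of a full
init_mm duplication, and using get_locked_pte() to pre-populate the poking
page-table levels, are both assumptions rather than anything from the posted
patch:

#include <linux/init.h>
#include <linux/mm.h>
#include <linux/sched/mm.h>
#include <asm/pgtable_64_types.h>  /* TEXT_POKE_ADDR (per the patch above) */
#include <asm/text-patching.h>     /* patching_mm (per the patch above) */

struct mm_struct *patching_mm;

void __init patching_mm_init(void)
{
        spinlock_t *ptl;
        pte_t *ptep;

        /* Runs late enough in boot for kmem_cache_alloc() to succeed. */
        patching_mm = mm_alloc();
        BUG_ON(!patching_mm);

        /*
         * Pre-populate the page-table levels covering TEXT_POKE_ADDR so
         * that text_poke() never has to allocate while patching.  Taking
         * the PTE lock here is harmless because this runs once, at init
         * time.  Both poking pages share the same last-level table, so a
         * single walk suffices.
         */
        ptep = get_locked_pte(patching_mm, TEXT_POKE_ADDR, &ptl);
        BUG_ON(!ptep);
        pte_unmap_unlock(ptep, ptl);
}

Whether a plain mm_alloc() is sufficient here, or whether init_mm really has
to be duplicated as the last paragraph suggests, is exactly the boot-order
question left open above.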