Date: Mon, 08 May 2023 16:32:23 +0800
From: "Hou Wenlong"
To: Ard Biesheuvel
Cc: "Lai Jiangshan", "Kees Cook", "Thomas Gleixner", "Ingo Molnar",
    "Borislav Petkov", "Dave Hansen", "H. Peter Anvin", "Peter Zijlstra",
    "Petr Mladek", "Greg Kroah-Hartman", "Jason A. Donenfeld", "Song Liu",
    "Julian Pidancet"
Subject: Re: [PATCH RFC 31/43] x86/modules: Adapt module loading for PIE support
Message-ID: <20230508083223.GA116442@k08j02272.eu95sqa>
X-Mailing-List: linux-kernel@vger.kernel.org

On Sat, Apr 29, 2023 at 03:29:32AM +0800, Ard Biesheuvel wrote:
> On Fri, 28 Apr 2023 at 10:53, Hou Wenlong wrote:
> >
> > Adapt module loading to support PIE relocations. No GOT is generared for
> > module, all the GOT entry of got references in module should exist in
> > kernel GOT. Currently, there is only one usable got reference for
> > __fentry__().
> >
>
> I don't think this is the right approach. We should permit GOTPCREL
> relocations properly, which means making them point to a location in
> memory that carries the absolute address of the symbol. There are
> several ways to go about that, but perhaps the simplest way is to make
> the symbol address in ksymtab a 64-bit absolute value (but retain the
> PC32 references for the symbol name and the symbol namespace name).
> That way, you can always resolve such GOTPCREL relocations by pointing
> it to the ksymtab entry. Another option would be to take inspiration
> from the PLT code we have on ARM and arm64 (and other architectures,
> surely) and to count the GOT based relocations, allocate some extra
> r/o module space for each, and allocate slots and populate them with
> the right value as you fix up the relocations.
>
> Then, many such relocations can be relaxed at module load time if the
> symbol is in range.
> IIUC, the module and kernel will still be inside
> the same 2G window even after widening the KASLR range to 512G, so
> most GOT loads can be converted into RIP relative LEA instructions.
>
> Note that this will also permit you to do things like
>
> #define PV_VCPU_PREEMPTED_ASM \
> "leaq __per_cpu_offset(%rip), %rax \n\t" \
> "movq (%rax,%rdi,8), %rax \n\t" \
> "addq steal_time@GOTPCREL(%rip), %rax \n\t" \
> "cmpb $0, " __stringify(KVM_STEAL_TIME_preempted) "(%rax) \n\t" \
> "setne %al\n\t"
>
> or
>
> +#ifdef CONFIG_X86_PIE
> + " pushq arch_rethook_trampoline@GOTPCREL(%rip)\n"
> +#else
>   " pushq $arch_rethook_trampoline\n"
> +#endif
>
> instead of having these kludgy push/pop sequences to free up temp registers.
>
> (FYI I have looked into this PIE linking just a few weeks ago [0] so
> this is all rather fresh in my memory)
>
> [0] https://git.kernel.org/pub/scm/linux/kernel/git/ardb/linux.git/log/?h=x86-pie
>

Hi Ard,

Thanks for providing the link. It has been very helpful, as I am new to
compilers and linkers. One key difference I noticed is that you linked
the kernel with "-pie" instead of "--emit-relocs". I also noticed that
Thomas' initial patchset [0] used "-pie", but RFC v3 [1] switched to
"--emit-relocs" in order to reduce the dynamic relocation space in
mapped memory.

Another issue is that "-pie" requires adding the "-mrelax-relocations=no"
option to support older compilers and linkers. The R_X86_64_GOTPCRELX
and R_X86_64_REX_GOTPCRELX relocations are supported in binutils 2.26
and later, but the minimum version required for the kernel is 2.25.
This option disables relocation relaxation, which leaves the GOT
non-empty. I also noticed this option in
arch/x86/boot/compressed/Makefile, with the reason given in [2].

Without relocation relaxation, every GOT reference adds to the size of
the GOT. Therefore, I do not want to use GOT references in assembly
directly.
However, I realized that the compiler can still generate GOT references
in some cases, such as "fentry" calls and stack canary references.

Regarding module loading, I agree that we should support GOT references
for the module itself. I will refactor it according to your suggestion.

Thanks.

[0] https://yhbt.net/lore/all/20170718223333.110371-20-thgarnie@google.com
[1] https://yhbt.net/lore/all/20171004212003.28296-1-thgarnie@google.com
[2] https://lore.kernel.org/all/20200903203053.3411268-2-samitolvanen@google.com/

> > Signed-off-by: Hou Wenlong
> > Cc: Thomas Garnier
> > Cc: Lai Jiangshan
> > Cc: Kees Cook
> > ---
> >  arch/x86/include/asm/sections.h |  5 +++++
> >  arch/x86/kernel/module.c        | 27 +++++++++++++++++++++++++++
> >  2 files changed, 32 insertions(+)
> >
> > diff --git a/arch/x86/include/asm/sections.h b/arch/x86/include/asm/sections.h
> > index a6e8373a5170..dc1c2b08ec48 100644
> > --- a/arch/x86/include/asm/sections.h
> > +++ b/arch/x86/include/asm/sections.h
> > @@ -12,6 +12,11 @@ extern char __end_rodata_aligned[];
> >
> >  #if defined(CONFIG_X86_64)
> >  extern char __end_rodata_hpage_align[];
> > +
> > +#ifdef CONFIG_X86_PIE
> > +extern char __start_got[], __end_got[];
> > +#endif
> > +
> >  #endif
> >
> >  extern char __end_of_kernel_reserve[];
> > diff --git a/arch/x86/kernel/module.c b/arch/x86/kernel/module.c
> > index 84ad0e61ba6e..051f88e6884e 100644
> > --- a/arch/x86/kernel/module.c
> > +++ b/arch/x86/kernel/module.c
> > @@ -129,6 +129,18 @@ int apply_relocate(Elf32_Shdr *sechdrs,
> >          return 0;
> >  }
> >  #else /*X86_64*/
> > +#ifdef CONFIG_X86_PIE
> > +static u64 find_got_kernel_entry(Elf64_Sym *sym, const Elf64_Rela *rela)
> > +{
> > +        u64 *pos;
> > +
> > +        for (pos = (u64 *)__start_got; pos < (u64 *)__end_got; pos++)
> > +                if (*pos == sym->st_value)
> > +                        return (u64)pos + rela->r_addend;
> > +        return 0;
> > +}
> > +#endif
> > +
> >  static int __write_relocate_add(Elf64_Shdr *sechdrs,
> >                                  const char *strtab,
> >                                  unsigned int symindex,
> > @@ -171,6 +183,7 @@ static int __write_relocate_add(Elf64_Shdr *sechdrs,
> >                  case R_X86_64_64:
> >                          size = 8;
> >                          break;
> > +#ifndef CONFIG_X86_PIE
> >                  case R_X86_64_32:
> >                          if (val != *(u32 *)&val)
> >                                  goto overflow;
> > @@ -181,6 +194,13 @@ static int __write_relocate_add(Elf64_Shdr *sechdrs,
> >                                  goto overflow;
> >                          size = 4;
> >                          break;
> > +#else
> > +                case R_X86_64_GOTPCREL:
> > +                        val = find_got_kernel_entry(sym, rel);
> > +                        if (!val)
> > +                                goto unexpected_got_reference;
> > +                        fallthrough;
> > +#endif
> >                  case R_X86_64_PC32:
> >                  case R_X86_64_PLT32:
> >                          val -= (u64)loc;
> > @@ -214,11 +234,18 @@ static int __write_relocate_add(Elf64_Shdr *sechdrs,
> >          }
> >          return 0;
> >
> > +#ifdef CONFIG_X86_PIE
> > +unexpected_got_reference:
> > +        pr_err("Target got entry doesn't exist in kernel got, loc %p\n", loc);
> > +        return -ENOEXEC;
> > +#else
> >  overflow:
> >          pr_err("overflow in relocation type %d val %Lx\n",
> >                 (int)ELF64_R_TYPE(rel[i].r_info), val);
> >          pr_err("`%s' likely not compiled with -mcmodel=kernel\n",
> >                 me->name);
> > +#endif
> > +
> >          return -ENOEXEC;
> >  }
> >
> > --
> > 2.31.1
> >
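
For reference, here is a rough userspace sketch of the second option Ard
describes above: give the module its own GOT slots, and relax a GOT load
into a RIP-relative LEA when the symbol is in range. It is only an
illustration under stated assumptions, not the kernel's module loader
and not the refactored patch. The names struct mod_got, mod_got_slot(),
write_pc32() and apply_gotpcrel(), as well as the can_relax flag, are
hypothetical. The slot array is assumed to have been sized beforehand by
counting the GOT-based relocations, and the relaxation only handles the
"mov sym@GOTPCREL(%rip), %reg" form (opcode 0x8b rewritten to 0x8d).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical per-module GOT: each slot holds one absolute symbol address. */
struct mod_got {
	uint64_t *slots;
	size_t used;
	size_t cap;	/* assumed to be sized by counting GOT-based relocations */
};

/* Return the slot holding addr, reusing a matching slot; NULL when full. */
static uint64_t *mod_got_slot(struct mod_got *got, uint64_t addr)
{
	assert(got->slots != NULL);
	for (size_t i = 0; i < got->used; i++)
		if (got->slots[i] == addr)
			return &got->slots[i];
	if (got->used == got->cap)
		return NULL;
	got->slots[got->used] = addr;
	return &got->slots[got->used++];
}

/* Store target + addend - loc as a 32-bit displacement; -1 if out of range. */
static int write_pc32(uint8_t *loc, uint64_t target, int64_t addend)
{
	int64_t val = (int64_t)(target + (uint64_t)addend -
				(uint64_t)(uintptr_t)loc);
	int32_t disp;

	if (val < INT32_MIN || val > INT32_MAX)
		return -1;
	disp = (int32_t)val;
	memcpy(loc, &disp, sizeof(disp));
	return 0;
}

/*
 * Apply a GOTPCREL-style relocation whose 4-byte field starts at loc.
 * Returns 1 if a GOT load was rewritten into an LEA, 0 if the field now
 * points at a module GOT slot, and -1 on failure.
 */
static int apply_gotpcrel(struct mod_got *got, uint8_t *loc,
			  uint64_t sym_addr, int64_t addend, int can_relax)
{
	uint64_t *slot;

	/* 0x8b is MOV reg, r/m; ModRM with mod=00, rm=101 is RIP-relative. */
	if (can_relax && loc[-2] == 0x8b && (loc[-1] & 0xc7) == 0x05 &&
	    write_pc32(loc, sym_addr, addend) == 0) {
		loc[-2] = 0x8d;	/* LEA: the register gets the address itself */
		return 1;
	}

	slot = mod_got_slot(got, sym_addr);
	if (!slot)
		return -1;
	return write_pc32(loc, (uint64_t)(uintptr_t)slot, addend);
}
```

Instructions such as the "addq ...@GOTPCREL(%rip)" and
"pushq ...@GOTPCREL(%rip)" examples quoted above cannot be turned into
an LEA, so in this sketch they always go through a slot. Slots are
shared between relocations against the same symbol, which keeps the
extra read-only module space bounded by the number of distinct symbols.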