Building position independent code using GCC by default results in references
to symbols with external linkage to be resolved via GOT entries, which
carry the absolute addresses of the symbols, and thus need to be corrected
if the load time address of the executable != the link time address.
For fully linked binaries, such GOT indirected references are completely
useless, and actually make the startup code more complicated than necessary,
since these corrections may need to be applied more than once. In fact, we
have been very careful to avoid such references in the EFI stub code, since
it would require yet another [earlier] pass of GOT fixups which we currently
don't implement.
Older GCCs were quirky when it came to overriding this behavior using symbol
visibility, but now that we have increased the minimum GCC version to 4.6,
we can actually start setting the symbol visibility to 'hidden' globally for
all symbol references in the decompressor, getting rid of the GOT entirely.
This means we can get rid of the GOT fixup code right away, and we can start
using ordinary external symbol references in the EFI stub without running the
risk of boot regressions. (v2 note: we have already started doing this)
CC'ing Linus and Maarten, who were involved in diagnosing an issue related
to GOT entries emitted from the EFI stub ~5 years ago. [0] [1]
Many thanks to Arvind for the suggestions and the help in testing these
changes. Tested on GCC 4.6 + binutils 2.24 (Ubuntu 14.04), and GCC 8 +
binutils 2.31 (Debian Buster)
Changes since v1 [2]:
Rebase only - recent EFI changes have moved all the C code into
drivers/firmware/efi/libstub/, which is also built with hidden
visibility, and contains an additional objdump pass to detect (and
reject) absolute symbol references.
Unless anyone objects, I'd like to incorporate these changes into my
late EFI PR for v5.8, which will go out in a day or two.
Cc: Maarten Lankhorst <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Arvind Sankar <[email protected]>
[0] https://lore.kernel.org/lkml/[email protected]/
[1] https://lore.kernel.org/lkml/CA+55aFxW9PmtjOf9nUQwpU8swsFqJOz8whZXcONo+XFmkSwezg@mail.gmail.com/
[2] https://lore.kernel.org/linux-efi/[email protected]/
Ard Biesheuvel (3):
x86/boot/compressed: move .got.plt entries out of the .got section
x86/boot/compressed: force hidden visibility for all symbol references
x86/boot/compressed: get rid of GOT fixup code
arch/x86/boot/compressed/Makefile | 1 +
arch/x86/boot/compressed/head_32.S | 22 ++------
arch/x86/boot/compressed/head_64.S | 57 --------------------
arch/x86/boot/compressed/hidden.h | 19 +++++++
arch/x86/boot/compressed/vmlinux.lds.S | 16 ++++--
5 files changed, 36 insertions(+), 79 deletions(-)
create mode 100644 arch/x86/boot/compressed/hidden.h
--
2.20.1
The .got.plt section contains the part of the GOT which is used by PLT
entries, and which gets updated lazily by the dynamic loader when
function calls are dispatched through those PLT entries.
On fully linked binaries such as the kernel proper or the decompressor,
this never happens, and so in practice, the .got.plt section consists
only of the first 3 magic entries that are meant to point at the _DYNAMIC
section and at the fixup routine in the loader. However, since we don't
use a dynamic loader, those entries are never populated or used.
This means that treating those entries like ordinary GOT entries, and
updating their values based on the actual placement of the executable in
memory is completely pointless, and we can just ignore the .got.plt
section entirely, provided that it has no additional entries beyond
the first 3 ones.
So add an assertion in the linker script to ensure that this assumption
holds, and move the contents out of the [_got, _egot) memory range that
is modified by the GOT fixup routines.
Signed-off-by: Ard Biesheuvel <[email protected]>
---
arch/x86/boot/compressed/vmlinux.lds.S | 13 ++++++++++++-
1 file changed, 12 insertions(+), 1 deletion(-)
diff --git a/arch/x86/boot/compressed/vmlinux.lds.S b/arch/x86/boot/compressed/vmlinux.lds.S
index 0dc5c2b9614b..ce3fdfb93b57 100644
--- a/arch/x86/boot/compressed/vmlinux.lds.S
+++ b/arch/x86/boot/compressed/vmlinux.lds.S
@@ -44,10 +44,13 @@ SECTIONS
}
.got : {
_got = .;
- KEEP(*(.got.plt))
KEEP(*(.got))
_egot = .;
}
+ .got.plt : {
+ KEEP(*(.got.plt))
+ }
+
.data : {
_data = . ;
*(.data)
@@ -75,3 +78,11 @@ SECTIONS
. = ALIGN(PAGE_SIZE); /* keep ZO size page aligned */
_end = .;
}
+
+#ifdef CONFIG_X86_64
+ASSERT (SIZEOF(.got.plt) == 0 || SIZEOF(.got.plt) == 0x18,
+ "Unexpected GOT/PLT entries detected!")
+#else
+ASSERT (SIZEOF(.got.plt) == 0 || SIZEOF(.got.plt) == 0xc,
+ "Unexpected GOT/PLT entries detected!")
+#endif
--
2.20.1
Eliminate all GOT entries in the decompressor binary, by forcing hidden
visibility for all symbol references, which informs the compiler that
such references will be resolved at link time without the need for
allocating GOT entries.
To ensure that no GOT entries will creep back in, add an assertion to
the decompressor linker script that will fire if the .got section has
a non-zero size.
Signed-off-by: Ard Biesheuvel <[email protected]>
---
arch/x86/boot/compressed/Makefile | 1 +
arch/x86/boot/compressed/hidden.h | 19 +++++++++++++++++++
arch/x86/boot/compressed/vmlinux.lds.S | 1 +
3 files changed, 21 insertions(+)
diff --git a/arch/x86/boot/compressed/Makefile b/arch/x86/boot/compressed/Makefile
index 5f7c262bcc99..aa9ed814e5fa 100644
--- a/arch/x86/boot/compressed/Makefile
+++ b/arch/x86/boot/compressed/Makefile
@@ -40,6 +40,7 @@ KBUILD_CFLAGS += $(call cc-disable-warning, gnu)
KBUILD_CFLAGS += -Wno-pointer-sign
KBUILD_CFLAGS += $(call cc-option,-fmacro-prefix-map=$(srctree)/=)
KBUILD_CFLAGS += -fno-asynchronous-unwind-tables
+KBUILD_CFLAGS += -include hidden.h
KBUILD_AFLAGS := $(KBUILD_CFLAGS) -D__ASSEMBLY__
GCOV_PROFILE := n
diff --git a/arch/x86/boot/compressed/hidden.h b/arch/x86/boot/compressed/hidden.h
new file mode 100644
index 000000000000..49a17b6b5962
--- /dev/null
+++ b/arch/x86/boot/compressed/hidden.h
@@ -0,0 +1,19 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * When building position independent code with GCC using the -fPIC option,
+ * (or even the -fPIE one on older versions), it will assume that we are
+ * building a dynamic object (either a shared library or an executable) that
+ * may have symbol references that can only be resolved at load time. For a
+ * variety of reasons (ELF symbol preemption, the CoW footprint of the section
+ * that is modified by the loader), this results in all references to symbols
+ * with external linkage to go via entries in the Global Offset Table (GOT),
+ * which carries absolute addresses which need to be fixed up when the
+ * executable image is loaded at an offset which is different from its link
+ * time offset.
+ *
+ * Fortunately, there is a way to inform the compiler that such symbol
+ * references will be satisfied at link time rather than at load time, by
+ * giving them 'hidden' visibility.
+ */
+
+#pragma GCC visibility push(hidden)
diff --git a/arch/x86/boot/compressed/vmlinux.lds.S b/arch/x86/boot/compressed/vmlinux.lds.S
index ce3fdfb93b57..60a99dfb9d72 100644
--- a/arch/x86/boot/compressed/vmlinux.lds.S
+++ b/arch/x86/boot/compressed/vmlinux.lds.S
@@ -79,6 +79,7 @@ SECTIONS
_end = .;
}
+ASSERT (SIZEOF(.got) == 0, "Unexpected GOT entries detected!")
#ifdef CONFIG_X86_64
ASSERT (SIZEOF(.got.plt) == 0 || SIZEOF(.got.plt) == 0x18,
"Unexpected GOT/PLT entries detected!")
--
2.20.1
In a previous patch, we have eliminated GOT entries from the decompressor
binary and added an assertion that the .got section is empty. This means
that the GOT fixup routines that exist in both the 32-bit and 64-bit
startup routines have become dead code, and can be removed.
Signed-off-by: Ard Biesheuvel <[email protected]>
---
arch/x86/boot/compressed/head_32.S | 22 ++------
arch/x86/boot/compressed/head_64.S | 57 --------------------
arch/x86/boot/compressed/vmlinux.lds.S | 2 -
3 files changed, 3 insertions(+), 78 deletions(-)
diff --git a/arch/x86/boot/compressed/head_32.S b/arch/x86/boot/compressed/head_32.S
index ab3307036ba4..dfa4131c65df 100644
--- a/arch/x86/boot/compressed/head_32.S
+++ b/arch/x86/boot/compressed/head_32.S
@@ -49,16 +49,13 @@
* Position Independent Executable (PIE) so that linker won't optimize
* R_386_GOT32X relocation to its fixed symbol address. Older
* linkers generate R_386_32 relocations against locally defined symbols,
- * _bss, _ebss, _got and _egot, in PIE. It isn't wrong, just less
- * optimal than R_386_RELATIVE. But the x86 kernel fails to properly handle
+ * _bss, _ebss, in PIE. It isn't wrong, just suboptimal compared
+ * to R_386_RELATIVE. But the x86 kernel fails to properly handle
* R_386_32 relocations when relocating the kernel. To generate
- * R_386_RELATIVE relocations, we mark _bss, _ebss, _got and _egot as
- * hidden:
+ * R_386_RELATIVE relocations, we mark _bss and _ebss as hidden:
*/
.hidden _bss
.hidden _ebss
- .hidden _got
- .hidden _egot
__HEAD
SYM_FUNC_START(startup_32)
@@ -191,19 +188,6 @@ SYM_FUNC_START_LOCAL_NOALIGN(.Lrelocated)
shrl $2, %ecx
rep stosl
-/*
- * Adjust our own GOT
- */
- leal _got(%ebx), %edx
- leal _egot(%ebx), %ecx
-1:
- cmpl %ecx, %edx
- jae 2f
- addl %ebx, (%edx)
- addl $4, %edx
- jmp 1b
-2:
-
/*
* Do the extraction, and jump to the new kernel..
*/
diff --git a/arch/x86/boot/compressed/head_64.S b/arch/x86/boot/compressed/head_64.S
index 4f7e6b84be07..706fbf6eef53 100644
--- a/arch/x86/boot/compressed/head_64.S
+++ b/arch/x86/boot/compressed/head_64.S
@@ -40,8 +40,6 @@
*/
.hidden _bss
.hidden _ebss
- .hidden _got
- .hidden _egot
__HEAD
.code32
@@ -344,25 +342,6 @@ SYM_CODE_START(startup_64)
/* Set up the stack */
leaq boot_stack_end(%rbx), %rsp
- /*
- * paging_prepare() and cleanup_trampoline() below can have GOT
- * references. Adjust the table with address we are running at.
- *
- * Zero RAX for adjust_got: the GOT was not adjusted before;
- * there's no adjustment to undo.
- */
- xorq %rax, %rax
-
- /*
- * Calculate the address the binary is loaded at and use it as
- * a GOT adjustment.
- */
- call 1f
-1: popq %rdi
- subq $1b, %rdi
-
- call .Ladjust_got
-
/*
* At this point we are in long mode with 4-level paging enabled,
* but we might want to enable 5-level paging or vice versa.
@@ -447,21 +426,6 @@ trampoline_return:
pushq $0
popfq
- /*
- * Previously we've adjusted the GOT with address the binary was
- * loaded at. Now we need to re-adjust for relocation address.
- *
- * Calculate the address the binary is loaded at, so that we can
- * undo the previous GOT adjustment.
- */
- call 1f
-1: popq %rax
- subq $1b, %rax
-
- /* The new adjustment is the relocation address */
- movq %rbx, %rdi
- call .Ladjust_got
-
/*
* Copy the compressed kernel to the end of our buffer
* where decompression in place becomes safe.
@@ -539,27 +503,6 @@ SYM_FUNC_START_LOCAL_NOALIGN(.Lrelocated)
jmp *%rax
SYM_FUNC_END(.Lrelocated)
-/*
- * Adjust the global offset table
- *
- * RAX is the previous adjustment of the table to undo (use 0 if it's the
- * first time we touch GOT).
- * RDI is the new adjustment to apply.
- */
-.Ladjust_got:
- /* Walk through the GOT adding the address to the entries */
- leaq _got(%rip), %rdx
- leaq _egot(%rip), %rcx
-1:
- cmpq %rcx, %rdx
- jae 2f
- subq %rax, (%rdx) /* Undo previous adjustment */
- addq %rdi, (%rdx) /* Apply the new adjustment */
- addq $8, %rdx
- jmp 1b
-2:
- ret
-
.code32
/*
* This is the 32-bit trampoline that will be copied over to low memory.
diff --git a/arch/x86/boot/compressed/vmlinux.lds.S b/arch/x86/boot/compressed/vmlinux.lds.S
index 60a99dfb9d72..d91fdda51aa8 100644
--- a/arch/x86/boot/compressed/vmlinux.lds.S
+++ b/arch/x86/boot/compressed/vmlinux.lds.S
@@ -43,9 +43,7 @@ SECTIONS
_erodata = . ;
}
.got : {
- _got = .;
KEEP(*(.got))
- _egot = .;
}
.got.plt : {
KEEP(*(.got.plt))
--
2.20.1
On Sat, May 23, 2020 at 02:00:18PM +0200, Ard Biesheuvel wrote:
> Building position independent code using GCC by default results in references
> to symbols with external linkage to be resolved via GOT entries, which
> carry the absolute addresses of the symbols, and thus need to be corrected
> if the load time address of the executable != the link time address.
>
> For fully linked binaries, such GOT indirected references are completely
> useless, and actually make the startup code more complicated than necessary,
> since these corrections may need to be applied more than once. In fact, we
> have been very careful to avoid such references in the EFI stub code, since
> it would require yet another [earlier] pass of GOT fixups which we currently
> don't implement.
>
> Older GCCs were quirky when it came to overriding this behavior using symbol
> visibility, but now that we have increased the minimum GCC version to 4.6,
> we can actually start setting the symbol visibility to 'hidden' globally for
> all symbol references in the decompressor, getting rid of the GOT entirely.
> This means we can get rid of the GOT fixup code right away, and we can start
> using ordinary external symbol references in the EFI stub without running the
> risk of boot regressions. (v2 note: we have already started doing this)
>
> CC'ing Linus and Maarten, who were involved in diagnosing an issue related
> to GOT entries emitted from the EFI stub ~5 years ago. [0] [1]
>
> Many thanks to Arvind for the suggestions and the help in testing these
> changes. Tested on GCC 4.6 + binutils 2.24 (Ubuntu 14.04), and GCC 8 +
> binutils 2.31 (Debian Buster)
>
> Changes since v1 [2]:
> Rebase only - recent EFI changes have moved all the C code into
> drivers/firmware/efi/libstub/, which is also built with hidden
> visibility, and contains an additional objdump pass to detect (and
> reject) absolute symbol references.
>
> Unless anyone objects, I'd like to incorporate these changes into my
> late EFI PR for v5.8, which will go out in a day or two.
>
> Cc: Maarten Lankhorst <[email protected]>
> Cc: Linus Torvalds <[email protected]>
> Cc: Arvind Sankar <[email protected]>
>
> [0] https://lore.kernel.org/lkml/[email protected]/
> [1] https://lore.kernel.org/lkml/CA+55aFxW9PmtjOf9nUQwpU8swsFqJOz8whZXcONo+XFmkSwezg@mail.gmail.com/
> [2] https://lore.kernel.org/linux-efi/[email protected]/
>
Haha, I was actually doing exactly the same thing as part of getting the
compressed kernel linkable with LLD.
Acked-by: Arvind Sankar <[email protected]>
On Sat, May 23, 2020 at 02:00:21PM +0200, Ard Biesheuvel wrote:
> In a previous patch, we have eliminated GOT entries from the decompressor
> binary and added an assertion that the .got section is empty. This means
> that the GOT fixup routines that exist in both the 32-bit and 64-bit
> startup routines have become dead code, and can be removed.
>
> Signed-off-by: Ard Biesheuvel <[email protected]>
> diff --git a/arch/x86/boot/compressed/vmlinux.lds.S b/arch/x86/boot/compressed/vmlinux.lds.S
> index 60a99dfb9d72..d91fdda51aa8 100644
> --- a/arch/x86/boot/compressed/vmlinux.lds.S
> +++ b/arch/x86/boot/compressed/vmlinux.lds.S
> @@ -43,9 +43,7 @@ SECTIONS
> _erodata = . ;
> }
> .got : {
> - _got = .;
> KEEP(*(.got))
> - _egot = .;
> }
> .got.plt : {
> KEEP(*(.got.plt))
> --
> 2.20.1
>
I think you can get rid of both the KEEP's here as well?
On Sat, 23 May 2020 at 17:18, Arvind Sankar <[email protected]> wrote:
>
> On Sat, May 23, 2020 at 02:00:21PM +0200, Ard Biesheuvel wrote:
> > In a previous patch, we have eliminated GOT entries from the decompressor
> > binary and added an assertion that the .got section is empty. This means
> > that the GOT fixup routines that exist in both the 32-bit and 64-bit
> > startup routines have become dead code, and can be removed.
> >
> > Signed-off-by: Ard Biesheuvel <[email protected]>
> > diff --git a/arch/x86/boot/compressed/vmlinux.lds.S b/arch/x86/boot/compressed/vmlinux.lds.S
> > index 60a99dfb9d72..d91fdda51aa8 100644
> > --- a/arch/x86/boot/compressed/vmlinux.lds.S
> > +++ b/arch/x86/boot/compressed/vmlinux.lds.S
> > @@ -43,9 +43,7 @@ SECTIONS
> > _erodata = . ;
> > }
> > .got : {
> > - _got = .;
> > KEEP(*(.got))
> > - _egot = .;
> > }
> > .got.plt : {
> > KEEP(*(.got.plt))
> > --
> > 2.20.1
> >
>
> I think you can get rid of both the KEEP's here as well?
Yeah, they seem fairly pointless to me to begin with, given that these
contents are created by the linker on the fly, rather than passed
through from the input objects.
I'll drop them.
* Ard Biesheuvel <[email protected]> wrote:
> The .got.plt section contains the part of the GOT which is used by PLT
> entries, and which gets updated lazily by the dynamic loader when
> function calls are dispatched through those PLT entries.
>
> On fully linked binaries such as the kernel proper or the decompressor,
> this never happens, and so in practice, the .got.plt section consists
> only of the first 3 magic entries that are meant to point at the _DYNAMIC
> section and at the fixup routine in the loader. However, since we don't
> use a dynamic loader, those entries are never populated or used.
>
> This means that treating those entries like ordinary GOT entries, and
> updating their values based on the actual placement of the executable in
> memory is completely pointless, and we can just ignore the .got.plt
> section entirely, provided that it has no additional entries beyond
> the first 3 ones.
>
> So add an assertion in the linker script to ensure that this assumption
> holds, and move the contents out of the [_got, _egot) memory range that
> is modified by the GOT fixup routines.
>
> Signed-off-by: Ard Biesheuvel <[email protected]>
> ---
> arch/x86/boot/compressed/vmlinux.lds.S | 13 ++++++++++++-
> 1 file changed, 12 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/boot/compressed/vmlinux.lds.S b/arch/x86/boot/compressed/vmlinux.lds.S
> index 0dc5c2b9614b..ce3fdfb93b57 100644
> --- a/arch/x86/boot/compressed/vmlinux.lds.S
> +++ b/arch/x86/boot/compressed/vmlinux.lds.S
> @@ -44,10 +44,13 @@ SECTIONS
> }
> .got : {
> _got = .;
> - KEEP(*(.got.plt))
> KEEP(*(.got))
> _egot = .;
> }
> + .got.plt : {
> + KEEP(*(.got.plt))
> + }
> +
> .data : {
> _data = . ;
> *(.data)
> @@ -75,3 +78,11 @@ SECTIONS
> . = ALIGN(PAGE_SIZE); /* keep ZO size page aligned */
> _end = .;
> }
> +
> +#ifdef CONFIG_X86_64
> +ASSERT (SIZEOF(.got.plt) == 0 || SIZEOF(.got.plt) == 0x18,
> + "Unexpected GOT/PLT entries detected!")
> +#else
> +ASSERT (SIZEOF(.got.plt) == 0 || SIZEOF(.got.plt) == 0xc,
> + "Unexpected GOT/PLT entries detected!")
> +#endif
Cool debugging check!
Minor consistent-style nit:
s/ASSERT (
/ASSERT(
Optional: maybe even merge these on a single line, as a special exception to the col80 rule?
#ifdef CONFIG_X86_64
ASSERT(SIZEOF(.got.plt) == 0 || SIZEOF(.got.plt) == 0x18, "Unexpected GOT/PLT entries detected!")
#else
ASSERT(SIZEOF(.got.plt) == 0 || SIZEOF(.got.plt) == 0xc, "Unexpected GOT/PLT entries detected!")
#endif
But fine either way.
Thanks,
Ingo
* Ard Biesheuvel <[email protected]> wrote:
> Eliminate all GOT entries in the decompressor binary, by forcing hidden
> visibility for all symbol references, which informs the compiler that
> such references will be resolved at link time without the need for
> allocating GOT entries.
>
> To ensure that no GOT entries will creep back in, add an assertion to
> the decompressor linker script that will fire if the .got section has
> a non-zero size.
>
> Signed-off-by: Ard Biesheuvel <[email protected]>
> ---
> arch/x86/boot/compressed/Makefile | 1 +
> arch/x86/boot/compressed/hidden.h | 19 +++++++++++++++++++
> arch/x86/boot/compressed/vmlinux.lds.S | 1 +
> 3 files changed, 21 insertions(+)
>
> diff --git a/arch/x86/boot/compressed/Makefile b/arch/x86/boot/compressed/Makefile
> index 5f7c262bcc99..aa9ed814e5fa 100644
> --- a/arch/x86/boot/compressed/Makefile
> +++ b/arch/x86/boot/compressed/Makefile
> @@ -40,6 +40,7 @@ KBUILD_CFLAGS += $(call cc-disable-warning, gnu)
> KBUILD_CFLAGS += -Wno-pointer-sign
> KBUILD_CFLAGS += $(call cc-option,-fmacro-prefix-map=$(srctree)/=)
> KBUILD_CFLAGS += -fno-asynchronous-unwind-tables
> +KBUILD_CFLAGS += -include hidden.h
> + * When building position independent code with GCC using the -fPIC option,
> + * (or even the -fPIE one on older versions), it will assume that we are
> + * building a dynamic object (either a shared library or an executable) that
> + * may have symbol references that can only be resolved at load time. For a
> + * variety of reasons (ELF symbol preemption, the CoW footprint of the section
> + * that is modified by the loader), this results in all references to symbols
> + * with external linkage to go via entries in the Global Offset Table (GOT),
> + * which carries absolute addresses which need to be fixed up when the
> + * executable image is loaded at an offset which is different from its link
> + * time offset.
> + *
> + * Fortunately, there is a way to inform the compiler that such symbol
> + * references will be satisfied at link time rather than at load time, by
> + * giving them 'hidden' visibility.
> + */
> +
> +#pragma GCC visibility push(hidden)
BTW., how many such GOT entries did we have before this change, on a typical kernel?
> +ASSERT (SIZEOF(.got) == 0, "Unexpected GOT entries detected!")
s/ASSERT (
/ASSERT(
Thanks,
Ingo
On Sun, 24 May 2020 at 17:08, Ingo Molnar <[email protected]> wrote:
>
>
> * Ard Biesheuvel <[email protected]> wrote:
>
> > The .got.plt section contains the part of the GOT which is used by PLT
> > entries, and which gets updated lazily by the dynamic loader when
> > function calls are dispatched through those PLT entries.
> >
> > On fully linked binaries such as the kernel proper or the decompressor,
> > this never happens, and so in practice, the .got.plt section consists
> > only of the first 3 magic entries that are meant to point at the _DYNAMIC
> > section and at the fixup routine in the loader. However, since we don't
> > use a dynamic loader, those entries are never populated or used.
> >
> > This means that treating those entries like ordinary GOT entries, and
> > updating their values based on the actual placement of the executable in
> > memory is completely pointless, and we can just ignore the .got.plt
> > section entirely, provided that it has no additional entries beyond
> > the first 3 ones.
> >
> > So add an assertion in the linker script to ensure that this assumption
> > holds, and move the contents out of the [_got, _egot) memory range that
> > is modified by the GOT fixup routines.
> >
> > Signed-off-by: Ard Biesheuvel <[email protected]>
> > ---
> > arch/x86/boot/compressed/vmlinux.lds.S | 13 ++++++++++++-
> > 1 file changed, 12 insertions(+), 1 deletion(-)
> >
> > diff --git a/arch/x86/boot/compressed/vmlinux.lds.S b/arch/x86/boot/compressed/vmlinux.lds.S
> > index 0dc5c2b9614b..ce3fdfb93b57 100644
> > --- a/arch/x86/boot/compressed/vmlinux.lds.S
> > +++ b/arch/x86/boot/compressed/vmlinux.lds.S
> > @@ -44,10 +44,13 @@ SECTIONS
> > }
> > .got : {
> > _got = .;
> > - KEEP(*(.got.plt))
> > KEEP(*(.got))
> > _egot = .;
> > }
> > + .got.plt : {
> > + KEEP(*(.got.plt))
> > + }
> > +
> > .data : {
> > _data = . ;
> > *(.data)
> > @@ -75,3 +78,11 @@ SECTIONS
> > . = ALIGN(PAGE_SIZE); /* keep ZO size page aligned */
> > _end = .;
> > }
> > +
> > +#ifdef CONFIG_X86_64
> > +ASSERT (SIZEOF(.got.plt) == 0 || SIZEOF(.got.plt) == 0x18,
> > + "Unexpected GOT/PLT entries detected!")
> > +#else
> > +ASSERT (SIZEOF(.got.plt) == 0 || SIZEOF(.got.plt) == 0xc,
> > + "Unexpected GOT/PLT entries detected!")
> > +#endif
>
> Cool debugging check!
>
> Minor consistent-style nit:
>
> s/ASSERT (
> /ASSERT(
>
ok
> Optional: maybe even merge these on a single line, as a special exception to the col80 rule?
>
> #ifdef CONFIG_X86_64
> ASSERT(SIZEOF(.got.plt) == 0 || SIZEOF(.got.plt) == 0x18, "Unexpected GOT/PLT entries detected!")
> #else
> ASSERT(SIZEOF(.got.plt) == 0 || SIZEOF(.got.plt) == 0xc, "Unexpected GOT/PLT entries detected!")
> #endif
>
> But fine either way.
>
SIZEOF(.got.plt) <= 0x18
could be used as well, given that the section is either empty, or has
at least the 3 magic entries.
* Ard Biesheuvel <[email protected]> wrote:
> On Sun, 24 May 2020 at 17:08, Ingo Molnar <[email protected]> wrote:
> >
> >
> > * Ard Biesheuvel <[email protected]> wrote:
> >
> > > The .got.plt section contains the part of the GOT which is used by PLT
> > > entries, and which gets updated lazily by the dynamic loader when
> > > function calls are dispatched through those PLT entries.
> > >
> > > On fully linked binaries such as the kernel proper or the decompressor,
> > > this never happens, and so in practice, the .got.plt section consists
> > > only of the first 3 magic entries that are meant to point at the _DYNAMIC
> > > section and at the fixup routine in the loader. However, since we don't
> > > use a dynamic loader, those entries are never populated or used.
> > >
> > > This means that treating those entries like ordinary GOT entries, and
> > > updating their values based on the actual placement of the executable in
> > > memory is completely pointless, and we can just ignore the .got.plt
> > > section entirely, provided that it has no additional entries beyond
> > > the first 3 ones.
> > >
> > > So add an assertion in the linker script to ensure that this assumption
> > > holds, and move the contents out of the [_got, _egot) memory range that
> > > is modified by the GOT fixup routines.
> > >
> > > Signed-off-by: Ard Biesheuvel <[email protected]>
> > > ---
> > > arch/x86/boot/compressed/vmlinux.lds.S | 13 ++++++++++++-
> > > 1 file changed, 12 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/arch/x86/boot/compressed/vmlinux.lds.S b/arch/x86/boot/compressed/vmlinux.lds.S
> > > index 0dc5c2b9614b..ce3fdfb93b57 100644
> > > --- a/arch/x86/boot/compressed/vmlinux.lds.S
> > > +++ b/arch/x86/boot/compressed/vmlinux.lds.S
> > > @@ -44,10 +44,13 @@ SECTIONS
> > > }
> > > .got : {
> > > _got = .;
> > > - KEEP(*(.got.plt))
> > > KEEP(*(.got))
> > > _egot = .;
> > > }
> > > + .got.plt : {
> > > + KEEP(*(.got.plt))
> > > + }
> > > +
> > > .data : {
> > > _data = . ;
> > > *(.data)
> > > @@ -75,3 +78,11 @@ SECTIONS
> > > . = ALIGN(PAGE_SIZE); /* keep ZO size page aligned */
> > > _end = .;
> > > }
> > > +
> > > +#ifdef CONFIG_X86_64
> > > +ASSERT (SIZEOF(.got.plt) == 0 || SIZEOF(.got.plt) == 0x18,
> > > + "Unexpected GOT/PLT entries detected!")
> > > +#else
> > > +ASSERT (SIZEOF(.got.plt) == 0 || SIZEOF(.got.plt) == 0xc,
> > > + "Unexpected GOT/PLT entries detected!")
> > > +#endif
> >
> > Cool debugging check!
> >
> > Minor consistent-style nit:
> >
> > s/ASSERT (
> > /ASSERT(
> >
>
> ok
>
> > Optional: maybe even merge these on a single line, as a special exception to the col80 rule?
> >
> > #ifdef CONFIG_X86_64
> > ASSERT(SIZEOF(.got.plt) == 0 || SIZEOF(.got.plt) == 0x18, "Unexpected GOT/PLT entries detected!")
> > #else
> > ASSERT(SIZEOF(.got.plt) == 0 || SIZEOF(.got.plt) == 0xc, "Unexpected GOT/PLT entries detected!")
> > #endif
> >
> > But fine either way.
> >
>
> SIZEOF(.got.plt) <= 0x18
>
> could be used as well, given that the section is either empty, or has
> at least the 3 magic entries.
This this has bitten us before, so I think the precise assert is better.
Two-line version is OK too.
Thanks,
Ingo
On Sun, 24 May 2020 at 17:12, Ingo Molnar <[email protected]> wrote:
>
>
> * Ard Biesheuvel <[email protected]> wrote:
>
> > Eliminate all GOT entries in the decompressor binary, by forcing hidden
> > visibility for all symbol references, which informs the compiler that
> > such references will be resolved at link time without the need for
> > allocating GOT entries.
> >
> > To ensure that no GOT entries will creep back in, add an assertion to
> > the decompressor linker script that will fire if the .got section has
> > a non-zero size.
> >
> > Signed-off-by: Ard Biesheuvel <[email protected]>
> > ---
> > arch/x86/boot/compressed/Makefile | 1 +
> > arch/x86/boot/compressed/hidden.h | 19 +++++++++++++++++++
> > arch/x86/boot/compressed/vmlinux.lds.S | 1 +
> > 3 files changed, 21 insertions(+)
> >
> > diff --git a/arch/x86/boot/compressed/Makefile b/arch/x86/boot/compressed/Makefile
> > index 5f7c262bcc99..aa9ed814e5fa 100644
> > --- a/arch/x86/boot/compressed/Makefile
> > +++ b/arch/x86/boot/compressed/Makefile
> > @@ -40,6 +40,7 @@ KBUILD_CFLAGS += $(call cc-disable-warning, gnu)
> > KBUILD_CFLAGS += -Wno-pointer-sign
> > KBUILD_CFLAGS += $(call cc-option,-fmacro-prefix-map=$(srctree)/=)
> > KBUILD_CFLAGS += -fno-asynchronous-unwind-tables
> > +KBUILD_CFLAGS += -include hidden.h
>
> > + * When building position independent code with GCC using the -fPIC option,
> > + * (or even the -fPIE one on older versions), it will assume that we are
> > + * building a dynamic object (either a shared library or an executable) that
> > + * may have symbol references that can only be resolved at load time. For a
> > + * variety of reasons (ELF symbol preemption, the CoW footprint of the section
> > + * that is modified by the loader), this results in all references to symbols
> > + * with external linkage to go via entries in the Global Offset Table (GOT),
> > + * which carries absolute addresses which need to be fixed up when the
> > + * executable image is loaded at an offset which is different from its link
> > + * time offset.
> > + *
> > + * Fortunately, there is a way to inform the compiler that such symbol
> > + * references will be satisfied at link time rather than at load time, by
> > + * giving them 'hidden' visibility.
> > + */
> > +
> > +#pragma GCC visibility push(hidden)
>
> BTW., how many such GOT entries did we have before this change, on a typical kernel?
>
None if you are using a recent GCC/binutils that use a relocation type
that can be relaxed into an ordinary relative reference by the linker.
Older GCC/binutils may emit a couple for the decompressor.
> > +ASSERT (SIZEOF(.got) == 0, "Unexpected GOT entries detected!")
>
> s/ASSERT (
> /ASSERT(
>
OK
> Thanks,
>
> Ingo
On Sat, May 23, 2020 at 02:00:20PM +0200, Ard Biesheuvel wrote:
> Eliminate all GOT entries in the decompressor binary, by forcing hidden
> visibility for all symbol references, which informs the compiler that
> such references will be resolved at link time without the need for
> allocating GOT entries.
>
> To ensure that no GOT entries will creep back in, add an assertion to
> the decompressor linker script that will fire if the .got section has
> a non-zero size.
>
> Signed-off-by: Ard Biesheuvel <[email protected]>
> ---
> arch/x86/boot/compressed/Makefile | 1 +
> arch/x86/boot/compressed/hidden.h | 19 +++++++++++++++++++
> arch/x86/boot/compressed/vmlinux.lds.S | 1 +
> 3 files changed, 21 insertions(+)
>
> diff --git a/arch/x86/boot/compressed/Makefile b/arch/x86/boot/compressed/Makefile
> index 5f7c262bcc99..aa9ed814e5fa 100644
> --- a/arch/x86/boot/compressed/Makefile
> +++ b/arch/x86/boot/compressed/Makefile
> @@ -40,6 +40,7 @@ KBUILD_CFLAGS += $(call cc-disable-warning, gnu)
> KBUILD_CFLAGS += -Wno-pointer-sign
> KBUILD_CFLAGS += $(call cc-option,-fmacro-prefix-map=$(srctree)/=)
> KBUILD_CFLAGS += -fno-asynchronous-unwind-tables
> +KBUILD_CFLAGS += -include hidden.h
>
Ard, from the other thread [1] in case you missed it -- the plain
hidden.h fails to build in-tree. We need something like
KBUILD_CFLAGS += -include $(srctree)/$(src)/hidden.h
instead.
[1] https://lore.kernel.org/lkml/[email protected]/
On Wed, May 27, 2020 at 2:08 PM Arvind Sankar <[email protected]> wrote:
>
> On Sat, May 23, 2020 at 02:00:20PM +0200, Ard Biesheuvel wrote:
> > Eliminate all GOT entries in the decompressor binary, by forcing hidden
> > visibility for all symbol references, which informs the compiler that
> > such references will be resolved at link time without the need for
> > allocating GOT entries.
> >
> > To ensure that no GOT entries will creep back in, add an assertion to
> > the decompressor linker script that will fire if the .got section has
> > a non-zero size.
> >
> > Signed-off-by: Ard Biesheuvel <[email protected]>
> > ---
> > arch/x86/boot/compressed/Makefile | 1 +
> > arch/x86/boot/compressed/hidden.h | 19 +++++++++++++++++++
> > arch/x86/boot/compressed/vmlinux.lds.S | 1 +
> > 3 files changed, 21 insertions(+)
> >
> > diff --git a/arch/x86/boot/compressed/Makefile b/arch/x86/boot/compressed/Makefile
> > index 5f7c262bcc99..aa9ed814e5fa 100644
> > --- a/arch/x86/boot/compressed/Makefile
> > +++ b/arch/x86/boot/compressed/Makefile
> > @@ -40,6 +40,7 @@ KBUILD_CFLAGS += $(call cc-disable-warning, gnu)
> > KBUILD_CFLAGS += -Wno-pointer-sign
> > KBUILD_CFLAGS += $(call cc-option,-fmacro-prefix-map=$(srctree)/=)
> > KBUILD_CFLAGS += -fno-asynchronous-unwind-tables
> > +KBUILD_CFLAGS += -include hidden.h
> >
>
> Ard, from the other thread [1] in case you missed it -- the plain
> hidden.h fails to build in-tree. We need something like
> KBUILD_CFLAGS += -include $(srctree)/$(src)/hidden.h
> instead.
>
> [1] https://lore.kernel.org/lkml/[email protected]/
How about using -fvisibility=hidden instead of including this header?
--
Brian Gerst
On Wed, 27 May 2020 at 20:29, Brian Gerst <[email protected]> wrote:
>
> On Wed, May 27, 2020 at 2:08 PM Arvind Sankar <[email protected]> wrote:
> >
> > On Sat, May 23, 2020 at 02:00:20PM +0200, Ard Biesheuvel wrote:
> > > Eliminate all GOT entries in the decompressor binary, by forcing hidden
> > > visibility for all symbol references, which informs the compiler that
> > > such references will be resolved at link time without the need for
> > > allocating GOT entries.
> > >
> > > To ensure that no GOT entries will creep back in, add an assertion to
> > > the decompressor linker script that will fire if the .got section has
> > > a non-zero size.
> > >
> > > Signed-off-by: Ard Biesheuvel <[email protected]>
> > > ---
> > > arch/x86/boot/compressed/Makefile | 1 +
> > > arch/x86/boot/compressed/hidden.h | 19 +++++++++++++++++++
> > > arch/x86/boot/compressed/vmlinux.lds.S | 1 +
> > > 3 files changed, 21 insertions(+)
> > >
> > > diff --git a/arch/x86/boot/compressed/Makefile b/arch/x86/boot/compressed/Makefile
> > > index 5f7c262bcc99..aa9ed814e5fa 100644
> > > --- a/arch/x86/boot/compressed/Makefile
> > > +++ b/arch/x86/boot/compressed/Makefile
> > > @@ -40,6 +40,7 @@ KBUILD_CFLAGS += $(call cc-disable-warning, gnu)
> > > KBUILD_CFLAGS += -Wno-pointer-sign
> > > KBUILD_CFLAGS += $(call cc-option,-fmacro-prefix-map=$(srctree)/=)
> > > KBUILD_CFLAGS += -fno-asynchronous-unwind-tables
> > > +KBUILD_CFLAGS += -include hidden.h
> > >
> >
> > Ard, from the other thread [1] in case you missed it -- the plain
> > hidden.h fails to build in-tree. We need something like
> > KBUILD_CFLAGS += -include $(srctree)/$(src)/hidden.h
> > instead.
> >
> > [1] https://lore.kernel.org/lkml/[email protected]/
>
> How about using -fvisibility=hidden instead of including this header?
>
That only works for definitions, not for extern declarations.
On Wed, 27 May 2020 at 16:36, Arvind Sankar <[email protected]> wrote:
>
> On Sat, May 23, 2020 at 02:00:20PM +0200, Ard Biesheuvel wrote:
> > Eliminate all GOT entries in the decompressor binary, by forcing hidden
> > visibility for all symbol references, which informs the compiler that
> > such references will be resolved at link time without the need for
> > allocating GOT entries.
> >
> > To ensure that no GOT entries will creep back in, add an assertion to
> > the decompressor linker script that will fire if the .got section has
> > a non-zero size.
> >
> > Signed-off-by: Ard Biesheuvel <[email protected]>
> > ---
> > arch/x86/boot/compressed/Makefile | 1 +
> > arch/x86/boot/compressed/hidden.h | 19 +++++++++++++++++++
> > arch/x86/boot/compressed/vmlinux.lds.S | 1 +
> > 3 files changed, 21 insertions(+)
> >
> > diff --git a/arch/x86/boot/compressed/Makefile b/arch/x86/boot/compressed/Makefile
> > index 5f7c262bcc99..aa9ed814e5fa 100644
> > --- a/arch/x86/boot/compressed/Makefile
> > +++ b/arch/x86/boot/compressed/Makefile
> > @@ -40,6 +40,7 @@ KBUILD_CFLAGS += $(call cc-disable-warning, gnu)
> > KBUILD_CFLAGS += -Wno-pointer-sign
> > KBUILD_CFLAGS += $(call cc-option,-fmacro-prefix-map=$(srctree)/=)
> > KBUILD_CFLAGS += -fno-asynchronous-unwind-tables
> > +KBUILD_CFLAGS += -include hidden.h
> >
>
> Ard, from the other thread [1] in case you missed it -- the plain
> hidden.h fails to build in-tree. We need something like
> KBUILD_CFLAGS += -include $(srctree)/$(src)/hidden.h
> instead.
>
> [1] https://lore.kernel.org/lkml/[email protected]/
Thanks for the reminder, but I was aware of this.
I'll send out a v2 but given how close we are to the v5.7 release,
this is going to be v5.8 material anyway, so let's revisit this later.