2020-01-09 18:36:12

by Arvind Sankar

[permalink] [raw]
Subject: [PATCH 1/2] x86/boot/compressed/64: Remove .bss/.pgtable from bzImage

Commit 5b11f1cee579 ("x86, boot: straighten out ranges to copy/zero in
compressed/head*.S") introduced a separate .pgtable section, splitting
it out from the rest of .bss. This section was added without the
writeable flag, marking it as read-only. This results in the linker
putting the .rela.dyn section (containing bogus dynamic relocations from
head_64.o) after the .bss and .pgtable sections.

When we use objcopy to convert compressed/vmlinux into a binary for the
bzImage, the .bss and .pgtable sections get materialized as ~176KiB of
zero bytes in the binary in order to place .rela.dyn at the correct
location.

Fix this by marking .pgtable as writeable. This moves the .rela.dyn
section earlier so that .bss and .pgtable are the last allocated
sections and so don't appear in bzImage.

Signed-off-by: Arvind Sankar <[email protected]>
---
arch/x86/boot/compressed/head_64.S | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/boot/compressed/head_64.S b/arch/x86/boot/compressed/head_64.S
index 58a512e33d8d..6eb30f8a3ce7 100644
--- a/arch/x86/boot/compressed/head_64.S
+++ b/arch/x86/boot/compressed/head_64.S
@@ -709,7 +709,7 @@ SYM_DATA_END_LABEL(boot_stack, SYM_L_LOCAL, boot_stack_end)
/*
* Space for page tables (not in .bss so not zeroed)
*/
- .section ".pgtable","a",@nobits
+ .section ".pgtable","aw",@nobits
.balign 4096
SYM_DATA_LOCAL(pgtable, .fill BOOT_PGT_SIZE, 1, 0)

--
2.24.1


2020-01-09 18:52:23

by Arvind Sankar

[permalink] [raw]
Subject: [PATCH 2/2] x86/boot/compressed: Remove unnecessary sections from bzImage

Discarding the sections that are unused in the compressed kernel saves
about 10 KiB on 32-bit and 6 KiB on 64-bit, mostly from .eh_frame.

Signed-off-by: Arvind Sankar <[email protected]>
---
arch/x86/boot/compressed/vmlinux.lds.S | 5 +++++
1 file changed, 5 insertions(+)

diff --git a/arch/x86/boot/compressed/vmlinux.lds.S b/arch/x86/boot/compressed/vmlinux.lds.S
index 508cfa6828c5..12a20603d92e 100644
--- a/arch/x86/boot/compressed/vmlinux.lds.S
+++ b/arch/x86/boot/compressed/vmlinux.lds.S
@@ -73,4 +73,9 @@ SECTIONS
#endif
. = ALIGN(PAGE_SIZE); /* keep ZO size page aligned */
_end = .;
+
+ /* Discard all remaining sections */
+ /DISCARD/ : {
+ *(*)
+ }
}
--
2.24.1

2020-02-05 16:30:45

by Arvind Sankar

[permalink] [raw]
Subject: Re: [PATCH 1/2] x86/boot/compressed/64: Remove .bss/.pgtable from bzImage

On Thu, Jan 09, 2020 at 10:02:17AM -0500, Arvind Sankar wrote:
> Commit 5b11f1cee579 ("x86, boot: straighten out ranges to copy/zero in
> compressed/head*.S") introduced a separate .pgtable section, splitting
> it out from the rest of .bss. This section was added without the
> writeable flag, marking it as read-only. This results in the linker
> putting the .rela.dyn section (containing bogus dynamic relocations from
> head_64.o) after the .bss and .pgtable sections.
>
> When we use objcopy to convert compressed/vmlinux into a binary for the
> bzImage, the .bss and .pgtable sections get materialized as ~176KiB of
> zero bytes in the binary in order to place .rela.dyn at the correct
> location.
>
> Fix this by marking .pgtable as writeable. This moves the .rela.dyn
> section earlier so that .bss and .pgtable are the last allocated
> sections and so don't appear in bzImage.
>
> Signed-off-by: Arvind Sankar <[email protected]>
> ---
> arch/x86/boot/compressed/head_64.S | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/arch/x86/boot/compressed/head_64.S b/arch/x86/boot/compressed/head_64.S
> index 58a512e33d8d..6eb30f8a3ce7 100644
> --- a/arch/x86/boot/compressed/head_64.S
> +++ b/arch/x86/boot/compressed/head_64.S
> @@ -709,7 +709,7 @@ SYM_DATA_END_LABEL(boot_stack, SYM_L_LOCAL, boot_stack_end)
> /*
> * Space for page tables (not in .bss so not zeroed)
> */
> - .section ".pgtable","a",@nobits
> + .section ".pgtable","aw",@nobits
> .balign 4096
> SYM_DATA_LOCAL(pgtable, .fill BOOT_PGT_SIZE, 1, 0)
>
> --
> 2.24.1
>

Gentle reminder.

https://lore.kernel.org/lkml/[email protected]

2020-02-06 11:31:41

by Kees Cook

[permalink] [raw]
Subject: Re: [PATCH 1/2] x86/boot/compressed/64: Remove .bss/.pgtable from bzImage

On Thu, Jan 09, 2020 at 10:02:17AM -0500, Arvind Sankar wrote:
> Commit 5b11f1cee579 ("x86, boot: straighten out ranges to copy/zero in
> compressed/head*.S") introduced a separate .pgtable section, splitting
> it out from the rest of .bss. This section was added without the
> writeable flag, marking it as read-only. This results in the linker
> putting the .rela.dyn section (containing bogus dynamic relocations from
> head_64.o) after the .bss and .pgtable sections.

Thank you! As you know from the fg-kaslr thread[1], I ran into this (10
year old!) bug while helping there. I could not figure out why .bss was
getting allocated into the on-disk image.

> When we use objcopy to convert compressed/vmlinux into a binary for the
> bzImage, the .bss and .pgtable sections get materialized as ~176KiB of
> zero bytes in the binary in order to place .rela.dyn at the correct
> location.
>
> Fix this by marking .pgtable as writeable. This moves the .rela.dyn
> section earlier so that .bss and .pgtable are the last allocated
> sections and so don't appear in bzImage.
>
> Signed-off-by: Arvind Sankar <[email protected]>

Acked-by: Kees Cook <[email protected]>

-Kees

[1] https://lore.kernel.org/lkml/202002060251.681292DE63@keescook/

--
Kees Cook

2020-02-06 12:27:53

by Kees Cook

[permalink] [raw]
Subject: Re: [PATCH 2/2] x86/boot/compressed: Remove unnecessary sections from bzImage

On Thu, Jan 09, 2020 at 10:02:18AM -0500, Arvind Sankar wrote:
> Discarding the sections that are unused in the compressed kernel saves
> about 10 KiB on 32-bit and 6 KiB on 64-bit, mostly from .eh_frame.
>
> Signed-off-by: Arvind Sankar <[email protected]>

Looks good to me. I was worried this would paper over the problem, but
actually this solves it even more directly by removing the troublesome
sections. So, yes, let's do this too. :)

Acked-by: Kees Cook <[email protected]>

--
Kees Cook

2020-02-18 18:04:33

by Arvind Sankar

[permalink] [raw]
Subject: Re: [PATCH 1/2] x86/boot/compressed/64: Remove .bss/.pgtable from bzImage

On Wed, Feb 05, 2020 at 11:29:21AM -0500, Arvind Sankar wrote:
> On Thu, Jan 09, 2020 at 10:02:17AM -0500, Arvind Sankar wrote:
> > Commit 5b11f1cee579 ("x86, boot: straighten out ranges to copy/zero in
> > compressed/head*.S") introduced a separate .pgtable section, splitting
> > it out from the rest of .bss. This section was added without the
> > writeable flag, marking it as read-only. This results in the linker
> > putting the .rela.dyn section (containing bogus dynamic relocations from
> > head_64.o) after the .bss and .pgtable sections.
> >
> > When we use objcopy to convert compressed/vmlinux into a binary for the
> > bzImage, the .bss and .pgtable sections get materialized as ~176KiB of
> > zero bytes in the binary in order to place .rela.dyn at the correct
> > location.
> >
> > Fix this by marking .pgtable as writeable. This moves the .rela.dyn
> > section earlier so that .bss and .pgtable are the last allocated
> > sections and so don't appear in bzImage.
> >
> > Signed-off-by: Arvind Sankar <[email protected]>
> > ---
> > arch/x86/boot/compressed/head_64.S | 2 +-
> > 1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/arch/x86/boot/compressed/head_64.S b/arch/x86/boot/compressed/head_64.S
> > index 58a512e33d8d..6eb30f8a3ce7 100644
> > --- a/arch/x86/boot/compressed/head_64.S
> > +++ b/arch/x86/boot/compressed/head_64.S
> > @@ -709,7 +709,7 @@ SYM_DATA_END_LABEL(boot_stack, SYM_L_LOCAL, boot_stack_end)
> > /*
> > * Space for page tables (not in .bss so not zeroed)
> > */
> > - .section ".pgtable","a",@nobits
> > + .section ".pgtable","aw",@nobits
> > .balign 4096
> > SYM_DATA_LOCAL(pgtable, .fill BOOT_PGT_SIZE, 1, 0)
> >
> > --
> > 2.24.1
> >
>
> Gentle reminder.
>
> https://lore.kernel.org/lkml/[email protected]

Ping.

2020-02-19 12:11:51

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH 1/2] x86/boot/compressed/64: Remove .bss/.pgtable from bzImage

On Tue, Feb 18, 2020 at 01:03:54PM -0500, Arvind Sankar wrote:
> On Wed, Feb 05, 2020 at 11:29:21AM -0500, Arvind Sankar wrote:
> > On Thu, Jan 09, 2020 at 10:02:17AM -0500, Arvind Sankar wrote:
> > > Commit 5b11f1cee579 ("x86, boot: straighten out ranges to copy/zero in
> > > compressed/head*.S") introduced a separate .pgtable section, splitting
> > > it out from the rest of .bss. This section was added without the
> > > writeable flag, marking it as read-only. This results in the linker
> > > putting the .rela.dyn section (containing bogus dynamic relocations from
> > > head_64.o) after the .bss and .pgtable sections.
> > >
> > > When we use objcopy to convert compressed/vmlinux into a binary for the
> > > bzImage, the .bss and .pgtable sections get materialized as ~176KiB of
> > > zero bytes in the binary in order to place .rela.dyn at the correct
> > > location.
> > >
> > > Fix this by marking .pgtable as writeable. This moves the .rela.dyn
> > > section earlier so that .bss and .pgtable are the last allocated
> > > sections and so don't appear in bzImage.
> > >
> > > Signed-off-by: Arvind Sankar <[email protected]>
> > > ---
> > > arch/x86/boot/compressed/head_64.S | 2 +-
> > > 1 file changed, 1 insertion(+), 1 deletion(-)
> > >
> > > diff --git a/arch/x86/boot/compressed/head_64.S b/arch/x86/boot/compressed/head_64.S
> > > index 58a512e33d8d..6eb30f8a3ce7 100644
> > > --- a/arch/x86/boot/compressed/head_64.S
> > > +++ b/arch/x86/boot/compressed/head_64.S
> > > @@ -709,7 +709,7 @@ SYM_DATA_END_LABEL(boot_stack, SYM_L_LOCAL, boot_stack_end)
> > > /*
> > > * Space for page tables (not in .bss so not zeroed)
> > > */
> > > - .section ".pgtable","a",@nobits
> > > + .section ".pgtable","aw",@nobits
> > > .balign 4096
> > > SYM_DATA_LOCAL(pgtable, .fill BOOT_PGT_SIZE, 1, 0)
> > >
> > > --
> > > 2.24.1
> > >
> >
> > Gentle reminder.
> >
> > https://lore.kernel.org/lkml/[email protected]
>
> Ping.

You keep pinging. Why? Is it a showstopper or what is the urgency here?

This is shaving off some 100 - 200 KiB from the final bzImage, AFAICT. Or
is there something more broken this is fixing?

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

Subject: [tip: x86/boot] x86/boot/compressed/64: Remove .bss/.pgtable from bzImage

The following commit has been merged into the x86/boot branch of tip:

Commit-ID: 3ee372ccce4d4e7c610748d0583979d3ed3a0cf4
Gitweb: https://git.kernel.org/tip/3ee372ccce4d4e7c610748d0583979d3ed3a0cf4
Author: Arvind Sankar <[email protected]>
AuthorDate: Thu, 09 Jan 2020 10:02:17 -05:00
Committer: Borislav Petkov <[email protected]>
CommitterDate: Wed, 19 Feb 2020 17:28:57 +01:00

x86/boot/compressed/64: Remove .bss/.pgtable from bzImage

Commit

5b11f1cee579 ("x86, boot: straighten out ranges to copy/zero in
compressed/head*.S")

introduced a separate .pgtable section, splitting it out from the rest
of .bss. This section was added without the writeable flag, marking it
as read-only. This results in the linker putting the .rela.dyn section
(containing bogus dynamic relocations from head_64.o) after the .bss and
.pgtable sections.

When objcopy is used to convert compressed/vmlinux into a binary for
the bzImage:

$ objcopy -O binary -R .note -R .comment -S arch/x86/boot/compressed/vmlinux \
arch/x86/boot/vmlinux.bin

the .bss and .pgtable sections get materialized as ~176KiB of zero
bytes in the binary in order to place .rela.dyn at the correct location.

Fix this by marking .pgtable as writeable. This moves the .rela.dyn
section up in the ELF image layout so that .bss and .pgtable are the
last allocated sections and so don't appear in bzImage.

[ bp: Massage commit message. ]

Signed-off-by: Arvind Sankar <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Acked-by: Kees Cook <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
---
arch/x86/boot/compressed/head_64.S | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/boot/compressed/head_64.S b/arch/x86/boot/compressed/head_64.S
index 68f31c4..c8ee6ef 100644
--- a/arch/x86/boot/compressed/head_64.S
+++ b/arch/x86/boot/compressed/head_64.S
@@ -645,7 +645,7 @@ SYM_DATA_END_LABEL(boot_stack, SYM_L_LOCAL, boot_stack_end)
/*
* Space for page tables (not in .bss so not zeroed)
*/
- .section ".pgtable","a",@nobits
+ .section ".pgtable","aw",@nobits
.balign 4096
SYM_DATA_LOCAL(pgtable, .fill BOOT_PGT_SIZE, 1, 0)

Subject: [tip: x86/boot] x86/boot/compressed: Remove unnecessary sections from bzImage

The following commit has been merged into the x86/boot branch of tip:

Commit-ID: e11831d0ada3efc7f9268f35c80257f7e1b1dd0c
Gitweb: https://git.kernel.org/tip/e11831d0ada3efc7f9268f35c80257f7e1b1dd0c
Author: Arvind Sankar <[email protected]>
AuthorDate: Thu, 09 Jan 2020 10:02:18 -05:00
Committer: Borislav Petkov <[email protected]>
CommitterDate: Wed, 19 Feb 2020 17:35:30 +01:00

x86/boot/compressed: Remove unnecessary sections from bzImage

Discarding the sections that are unused in the compressed kernel saves
about 10 KiB on 32-bit and 6 KiB on 64-bit, mostly from .eh_frame.

Signed-off-by: Arvind Sankar <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Acked-by: Kees Cook <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
---
arch/x86/boot/compressed/vmlinux.lds.S | 5 +++++
1 file changed, 5 insertions(+)

diff --git a/arch/x86/boot/compressed/vmlinux.lds.S b/arch/x86/boot/compressed/vmlinux.lds.S
index 508cfa6..12a2060 100644
--- a/arch/x86/boot/compressed/vmlinux.lds.S
+++ b/arch/x86/boot/compressed/vmlinux.lds.S
@@ -73,4 +73,9 @@ SECTIONS
#endif
. = ALIGN(PAGE_SIZE); /* keep ZO size page aligned */
_end = .;
+
+ /* Discard all remaining sections */
+ /DISCARD/ : {
+ *(*)
+ }
}

2020-02-19 17:58:13

by Arvind Sankar

[permalink] [raw]
Subject: Re: [PATCH 1/2] x86/boot/compressed/64: Remove .bss/.pgtable from bzImage

On Wed, Feb 19, 2020 at 01:09:38PM +0100, Borislav Petkov wrote:
>
> You keep pinging. Why? Is it a showstopper or what is the urgency here?
>
> This is shaving off some 100 - 200 KiB from the final bzImage, AFAICT. Or
> is there something more broken this is fixing?
>

There isn't any particular urgency (at least until fg-kaslr patches try
to make it 64MiB bigger), but it's unclear how long to wait before
sending a reminder -- Documentation/process suggests that comments
should be received in a week or so, pinging after 4 and 6 weeks seemed
reasonable. If x86 has a longer queue, might be worth documenting that
somewhere?

2020-02-19 18:22:52

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH 1/2] x86/boot/compressed/64: Remove .bss/.pgtable from bzImage

On Wed, Feb 19, 2020 at 12:57:17PM -0500, Arvind Sankar wrote:
> There isn't any particular urgency (at least until fg-kaslr patches try

Ok, I was just making sure you're not pinging because there's something
more urgent here. See below.

> to make it 64MiB bigger), but it's unclear how long to wait before
> sending a reminder -- Documentation/process suggests that comments
> should be received in a week or so, pinging after 4 and 6 weeks seemed
> reasonable. If x86 has a longer queue, might be worth documenting that
> somewhere?

We try to track all stuff but x86 is super crazy most of the time,
especially currently, so stuff gets prioritized based on urgency and we
also try to round-robin through all submitters so that stuff doesn't get
left out.

Thus the one-week thing will never work with x86. Once a month ping
maybe.

In your case, since it is an improvement which is good to have but not
absolutely a must and not a bugfix, it is understandable that it would
get pushed back in priority.

But stuff usually won't be forgotten and we'll get to it eventually - it
is just that we're mega swamped all the time. :-\

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2020-02-19 19:07:07

by Arvind Sankar

[permalink] [raw]
Subject: Re: [PATCH 1/2] x86/boot/compressed/64: Remove .bss/.pgtable from bzImage

On Wed, Feb 19, 2020 at 07:22:30PM +0100, Borislav Petkov wrote:
> On Wed, Feb 19, 2020 at 12:57:17PM -0500, Arvind Sankar wrote:
> > There isn't any particular urgency (at least until fg-kaslr patches try
>
> Ok, I was just making sure you're not pinging because there's something
> more urgent here. See below.
>
> > to make it 64MiB bigger), but it's unclear how long to wait before
> > sending a reminder -- Documentation/process suggests that comments
> > should be received in a week or so, pinging after 4 and 6 weeks seemed
> > reasonable. If x86 has a longer queue, might be worth documenting that
> > somewhere?
>
> We try to track all stuff but x86 is super crazy most of the time,
> especially currently, so stuff gets prioritized based on urgency and we
> also try to round-robin through all submitters so that stuff doesn't get
> left out.
>
> Thus the one-week thing will never work with x86. Once a month ping
> maybe.
>
> In your case, since it is an improvement which is good to have but not
> absolutely a must and not a bugfix, it is understandable that it would
> get pushed back in priority.
>
> But stuff usually won't be forgotten and we'll get to it eventually - it
> is just that we're mega swamped all the time. :-\
>

Ok, thanks.

2020-02-22 05:09:47

by Nathan Chancellor

[permalink] [raw]
Subject: Re: [PATCH 2/2] x86/boot/compressed: Remove unnecessary sections from bzImage

On Thu, Jan 09, 2020 at 10:02:18AM -0500, Arvind Sankar wrote:
> Discarding the sections that are unused in the compressed kernel saves
> about 10 KiB on 32-bit and 6 KiB on 64-bit, mostly from .eh_frame.
>
> Signed-off-by: Arvind Sankar <[email protected]>
> ---
> arch/x86/boot/compressed/vmlinux.lds.S | 5 +++++
> 1 file changed, 5 insertions(+)
>
> diff --git a/arch/x86/boot/compressed/vmlinux.lds.S b/arch/x86/boot/compressed/vmlinux.lds.S
> index 508cfa6828c5..12a20603d92e 100644
> --- a/arch/x86/boot/compressed/vmlinux.lds.S
> +++ b/arch/x86/boot/compressed/vmlinux.lds.S
> @@ -73,4 +73,9 @@ SECTIONS
> #endif
> . = ALIGN(PAGE_SIZE); /* keep ZO size page aligned */
> _end = .;
> +
> + /* Discard all remaining sections */
> + /DISCARD/ : {
> + *(*)
> + }
> }
> --
> 2.24.1
>

This patch breaks linking with ld.lld:

$ make -j$(nproc) -s CC=clang LD=ld.lld O=out.x86_64 distclean defconfig bzImage
ld.lld: error: discarding .shstrtab section is not allowed
...

I am not exactly sure how to keep that section around (or if it is
ABSOLUTELY necessary like ld.lld seems to claim) otherwise I would send
a patch.

It would be nice not to break this tool since it is faster than ld.bfd.

Cheers,
Nathan

2020-02-22 06:56:55

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH 2/2] x86/boot/compressed: Remove unnecessary sections from bzImage

On Fri, Feb 21, 2020 at 10:08:45PM -0700, Nathan Chancellor wrote:
> On Thu, Jan 09, 2020 at 10:02:18AM -0500, Arvind Sankar wrote:
> > Discarding the sections that are unused in the compressed kernel saves
> > about 10 KiB on 32-bit and 6 KiB on 64-bit, mostly from .eh_frame.
> >
> > Signed-off-by: Arvind Sankar <[email protected]>
> > ---
> > arch/x86/boot/compressed/vmlinux.lds.S | 5 +++++
> > 1 file changed, 5 insertions(+)
> >
> > diff --git a/arch/x86/boot/compressed/vmlinux.lds.S b/arch/x86/boot/compressed/vmlinux.lds.S
> > index 508cfa6828c5..12a20603d92e 100644
> > --- a/arch/x86/boot/compressed/vmlinux.lds.S
> > +++ b/arch/x86/boot/compressed/vmlinux.lds.S
> > @@ -73,4 +73,9 @@ SECTIONS
> > #endif
> > . = ALIGN(PAGE_SIZE); /* keep ZO size page aligned */
> > _end = .;
> > +
> > + /* Discard all remaining sections */
> > + /DISCARD/ : {
> > + *(*)
> > + }
> > }
> > --
> > 2.24.1
> >
>
> This patch breaks linking with ld.lld:
>
> $ make -j$(nproc) -s CC=clang LD=ld.lld O=out.x86_64 distclean defconfig bzImage
> ld.lld: error: discarding .shstrtab section is not allowed

Well, why is it not allowed? And why isn't the GNU linker complaining?

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2020-02-22 07:02:52

by Nathan Chancellor

[permalink] [raw]
Subject: Re: [PATCH 2/2] x86/boot/compressed: Remove unnecessary sections from bzImage

On Sat, Feb 22, 2020 at 07:55:21AM +0100, Borislav Petkov wrote:
> On Fri, Feb 21, 2020 at 10:08:45PM -0700, Nathan Chancellor wrote:
> > On Thu, Jan 09, 2020 at 10:02:18AM -0500, Arvind Sankar wrote:
> > > Discarding the sections that are unused in the compressed kernel saves
> > > about 10 KiB on 32-bit and 6 KiB on 64-bit, mostly from .eh_frame.
> > >
> > > Signed-off-by: Arvind Sankar <[email protected]>
> > > ---
> > > arch/x86/boot/compressed/vmlinux.lds.S | 5 +++++
> > > 1 file changed, 5 insertions(+)
> > >
> > > diff --git a/arch/x86/boot/compressed/vmlinux.lds.S b/arch/x86/boot/compressed/vmlinux.lds.S
> > > index 508cfa6828c5..12a20603d92e 100644
> > > --- a/arch/x86/boot/compressed/vmlinux.lds.S
> > > +++ b/arch/x86/boot/compressed/vmlinux.lds.S
> > > @@ -73,4 +73,9 @@ SECTIONS
> > > #endif
> > > . = ALIGN(PAGE_SIZE); /* keep ZO size page aligned */
> > > _end = .;
> > > +
> > > + /* Discard all remaining sections */
> > > + /DISCARD/ : {
> > > + *(*)
> > > + }
> > > }
> > > --
> > > 2.24.1
> > >
> >
> > This patch breaks linking with ld.lld:
> >
> > $ make -j$(nproc) -s CC=clang LD=ld.lld O=out.x86_64 distclean defconfig bzImage
> > ld.lld: error: discarding .shstrtab section is not allowed
>
> Well, why is it not allowed? And why isn't the GNU linker complaining?

No idea, unfortunately I am not a linker expert and the patch that
changes this in lld does not really explain why it adds this
restriction:

https://github.com/llvm/llvm-project/commit/1e799942b37d04f30b73f6a9e792d551dadafeea

CC'ing Fangrui as I don't know George's email and he is usually
responsive to ld.lld issues/questions.

Cheers,
Nathan

2020-02-22 07:22:45

by Fangrui Song

[permalink] [raw]
Subject: Re: [PATCH 2/2] x86/boot/compressed: Remove unnecessary sections from bzImage

On 2020-02-22, Nathan Chancellor wrote:
>On Sat, Feb 22, 2020 at 07:55:21AM +0100, Borislav Petkov wrote:
>> On Fri, Feb 21, 2020 at 10:08:45PM -0700, Nathan Chancellor wrote:
>> > On Thu, Jan 09, 2020 at 10:02:18AM -0500, Arvind Sankar wrote:
>> > > Discarding the sections that are unused in the compressed kernel saves
>> > > about 10 KiB on 32-bit and 6 KiB on 64-bit, mostly from .eh_frame.
>> > >
>> > > Signed-off-by: Arvind Sankar <[email protected]>
>> > > ---
>> > > arch/x86/boot/compressed/vmlinux.lds.S | 5 +++++
>> > > 1 file changed, 5 insertions(+)
>> > >
>> > > diff --git a/arch/x86/boot/compressed/vmlinux.lds.S b/arch/x86/boot/compressed/vmlinux.lds.S
>> > > index 508cfa6828c5..12a20603d92e 100644
>> > > --- a/arch/x86/boot/compressed/vmlinux.lds.S
>> > > +++ b/arch/x86/boot/compressed/vmlinux.lds.S
>> > > @@ -73,4 +73,9 @@ SECTIONS
>> > > #endif
>> > > . = ALIGN(PAGE_SIZE); /* keep ZO size page aligned */
>> > > _end = .;
>> > > +
>> > > + /* Discard all remaining sections */
>> > > + /DISCARD/ : {
>> > > + *(*)
>> > > + }
>> > > }
>> > > --
>> > > 2.24.1
>> > >
>> >
>> > This patch breaks linking with ld.lld:
>> >
>> > $ make -j$(nproc) -s CC=clang LD=ld.lld O=out.x86_64 distclean defconfig bzImage
>> > ld.lld: error: discarding .shstrtab section is not allowed
>>
>> Well, why is it not allowed? And why isn't the GNU linker complaining?
>
>No idea, unfortunately I am not a linker expert and the patch that
>changes this in lld does not really explain why it adds this
>restriction:
>
>https://github.com/llvm/llvm-project/commit/1e799942b37d04f30b73f6a9e792d551dadafeea
>
>CC'ing Fangrui as I don't know George's email and he is usually
>responsive to ld.lld issues/questions.
>
>Cheers,
>Nathan

In GNU ld, it seems that .shstrtab .symtab and .strtab are special
cased. Neither the input section description *(.shstrtab) nor *(*)
discards .shstrtab . I feel that this is a weird case (probably even a bug)
that lld should not implement.

In GNU ld, the following is not useful, while lld will discard the
synthesized .strtab as requested:

/DISCARD/ : { *(.strtab) }

I think it is better making the intention (retaining .shstrtab)
explicit, by adding a .shstrtab beside /DISCARD/ :

SECTIONS {
...
.shstrtab : { *(.shstrtab) }
/DISCARD/ : { *(*) }
}

2020-02-22 07:43:20

by Nathan Chancellor

[permalink] [raw]
Subject: Re: [PATCH 2/2] x86/boot/compressed: Remove unnecessary sections from bzImage

On Fri, Feb 21, 2020 at 11:21:44PM -0800, Fangrui Song wrote:
> On 2020-02-22, Nathan Chancellor wrote:
> > On Sat, Feb 22, 2020 at 07:55:21AM +0100, Borislav Petkov wrote:
> > > On Fri, Feb 21, 2020 at 10:08:45PM -0700, Nathan Chancellor wrote:
> > > > On Thu, Jan 09, 2020 at 10:02:18AM -0500, Arvind Sankar wrote:
> > > > > Discarding the sections that are unused in the compressed kernel saves
> > > > > about 10 KiB on 32-bit and 6 KiB on 64-bit, mostly from .eh_frame.
> > > > >
> > > > > Signed-off-by: Arvind Sankar <[email protected]>
> > > > > ---
> > > > > arch/x86/boot/compressed/vmlinux.lds.S | 5 +++++
> > > > > 1 file changed, 5 insertions(+)
> > > > >
> > > > > diff --git a/arch/x86/boot/compressed/vmlinux.lds.S b/arch/x86/boot/compressed/vmlinux.lds.S
> > > > > index 508cfa6828c5..12a20603d92e 100644
> > > > > --- a/arch/x86/boot/compressed/vmlinux.lds.S
> > > > > +++ b/arch/x86/boot/compressed/vmlinux.lds.S
> > > > > @@ -73,4 +73,9 @@ SECTIONS
> > > > > #endif
> > > > > . = ALIGN(PAGE_SIZE); /* keep ZO size page aligned */
> > > > > _end = .;
> > > > > +
> > > > > + /* Discard all remaining sections */
> > > > > + /DISCARD/ : {
> > > > > + *(*)
> > > > > + }
> > > > > }
> > > > > --
> > > > > 2.24.1
> > > > >
> > > >
> > > > This patch breaks linking with ld.lld:
> > > >
> > > > $ make -j$(nproc) -s CC=clang LD=ld.lld O=out.x86_64 distclean defconfig bzImage
> > > > ld.lld: error: discarding .shstrtab section is not allowed
> > >
> > > Well, why is it not allowed? And why isn't the GNU linker complaining?
> >
> > No idea, unfortunately I am not a linker expert and the patch that
> > changes this in lld does not really explain why it adds this
> > restriction:
> >
> > https://github.com/llvm/llvm-project/commit/1e799942b37d04f30b73f6a9e792d551dadafeea
> >
> > CC'ing Fangrui as I don't know George's email and he is usually
> > responsive to ld.lld issues/questions.
> >
> > Cheers,
> > Nathan
>
> In GNU ld, it seems that .shstrtab .symtab and .strtab are special
> cased. Neither the input section description *(.shstrtab) nor *(*)
> discards .shstrtab . I feel that this is a weird case (probably even a bug)
> that lld should not implement.
>
> In GNU ld, the following is not useful, while lld will discard the
> synthesized .strtab as requested:
>
> /DISCARD/ : { *(.strtab) }
>
> I think it is better making the intention (retaining .shstrtab)
> explicit, by adding a .shstrtab beside /DISCARD/ :
>
> SECTIONS {
> ...
> .shstrtab : { *(.shstrtab) }
> /DISCARD/ : { *(*) }
> }

Thanks for the clarity. With your suggestion (diff below), I see the
following error:

arch/x86/boot/compressed/vmlinux: no symbols
ld.lld: error: undefined symbol: ZO_input_data
>>> referenced by arch/x86/boot/header.o:(.header+0x59)

ld.lld: error: undefined symbol: ZO_z_input_len
>>> referenced by arch/x86/boot/header.o:(.header+0x5D)
make[3]: *** [../arch/x86/boot/Makefile:108: arch/x86/boot/setup.elf]

It seems like the section still isn't being added?

Cheers,
Nathan

diff --git a/arch/x86/boot/compressed/vmlinux.lds.S b/arch/x86/boot/compressed/vmlinux.lds.S
index 12a20603d92e..a6c3c7440c27 100644
--- a/arch/x86/boot/compressed/vmlinux.lds.S
+++ b/arch/x86/boot/compressed/vmlinux.lds.S
@@ -74,6 +74,10 @@ SECTIONS
. = ALIGN(PAGE_SIZE); /* keep ZO size page aligned */
_end = .;

+ .shstrtab : {
+ *(.shstrtab)
+ }
+
/* Discard all remaining sections */
/DISCARD/ : {
*(*)

2020-02-22 07:44:30

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH 2/2] x86/boot/compressed: Remove unnecessary sections from bzImage

On Fri, Feb 21, 2020 at 11:21:44PM -0800, Fangrui Song wrote:
> In GNU ld, it seems that .shstrtab .symtab and .strtab are special
> cased. Neither the input section description *(.shstrtab) nor *(*)
> discards .shstrtab . I feel that this is a weird case (probably even a bug)
> that lld should not implement.

Ok, forget what the tools do for a second: why is .shstrtab special and
why would one want to keep it?

Because one still wants to know what the section names of an object are
or other tools need it or why?

Thx.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2020-02-22 15:38:54

by Arvind Sankar

[permalink] [raw]
Subject: Re: [PATCH 2/2] x86/boot/compressed: Remove unnecessary sections from bzImage

On Sat, Feb 22, 2020 at 12:42:42AM -0700, Nathan Chancellor wrote:
>
> Thanks for the clarity. With your suggestion (diff below), I see the
> following error:
>
> arch/x86/boot/compressed/vmlinux: no symbols
> ld.lld: error: undefined symbol: ZO_input_data
> >>> referenced by arch/x86/boot/header.o:(.header+0x59)
>
> ld.lld: error: undefined symbol: ZO_z_input_len
> >>> referenced by arch/x86/boot/header.o:(.header+0x5D)
> make[3]: *** [../arch/x86/boot/Makefile:108: arch/x86/boot/setup.elf]
>
> It seems like the section still isn't being added?
>
> Cheers,
> Nathan

It seems like lld also doesn't treat .symtab as special and is
discarding it, but that one is actually essential to be able to build
the bzImage.

The sections that GNU ld ends up discarding via that *(*) directive are
.dynsym, .dynstr, .gnu.hash, .eh_frame, .rela.dyn, .comment and
.dynamic.

Out of these, only .eh_frame has any significant size, and that's what
we discard in the other linker scripts (in kernel/vmlinux.lds.S and
boot/setup.ld).

It looks like it would be safest to just do
/DISCARD/ : {
*(.eh_frame)
}
instead. If you can double-check that that works with lld, I can send
out a new version.

Thanks and sorry for the breakage.

2020-02-22 16:23:23

by Arvind Sankar

[permalink] [raw]
Subject: Re: [PATCH 2/2] x86/boot/compressed: Remove unnecessary sections from bzImage

On Sat, Feb 22, 2020 at 08:42:54AM +0100, Borislav Petkov wrote:
> On Fri, Feb 21, 2020 at 11:21:44PM -0800, Fangrui Song wrote:
> > In GNU ld, it seems that .shstrtab .symtab and .strtab are special
> > cased. Neither the input section description *(.shstrtab) nor *(*)
> > discards .shstrtab . I feel that this is a weird case (probably even a bug)
> > that lld should not implement.
>
> Ok, forget what the tools do for a second: why is .shstrtab special and
> why would one want to keep it?
>
> Because one still wants to know what the section names of an object are
> or other tools need it or why?
>
> Thx.
>
> --
> Regards/Gruss,
> Boris.
>
> https://people.kernel.org/tglx/notes-about-netiquette

.shstrtab is required by the ELF specification. The e_shstrndx field in
the ELF header is the index of .shstrtab, and each section in the
section table is required to have an sh_name that points into the
.shstrtab.

2020-02-22 16:44:44

by Arvind Sankar

[permalink] [raw]
Subject: Re: [PATCH 2/2] x86/boot/compressed: Remove unnecessary sections from bzImage

On Sat, Feb 22, 2020 at 10:37:47AM -0500, Arvind Sankar wrote:
> On Sat, Feb 22, 2020 at 12:42:42AM -0700, Nathan Chancellor wrote:
> >
> > Thanks for the clarity. With your suggestion (diff below), I see the
> > following error:
> >
> > arch/x86/boot/compressed/vmlinux: no symbols
> > ld.lld: error: undefined symbol: ZO_input_data
> > >>> referenced by arch/x86/boot/header.o:(.header+0x59)
> >
> > ld.lld: error: undefined symbol: ZO_z_input_len
> > >>> referenced by arch/x86/boot/header.o:(.header+0x5D)
> > make[3]: *** [../arch/x86/boot/Makefile:108: arch/x86/boot/setup.elf]
> >
> > It seems like the section still isn't being added?
> >
> > Cheers,
> > Nathan
>
> It seems like lld also doesn't treat .symtab as special and is
> discarding it, but that one is actually essential to be able to build
> the bzImage.
>
> The sections that GNU ld ends up discarding via that *(*) directive are
> .dynsym, .dynstr, .gnu.hash, .eh_frame, .rela.dyn, .comment and
> .dynamic.
>
> Out of these, only .eh_frame has any significant size, and that's what
> we discard in the other linker scripts (in kernel/vmlinux.lds.S and
> boot/setup.ld).
>
> It looks like it would be safest to just do
> /DISCARD/ : {
> *(.eh_frame)
> }
> instead. If you can double-check that that works with lld, I can send
> out a new version.
>
> Thanks and sorry for the breakage.

Tested with lld and it seems fine with that change.

Boris, should I send the fix as a diff to the current patch in tip, or
as a fresh one that can replace it?

Thanks.

2020-02-22 17:19:35

by Arvind Sankar

[permalink] [raw]
Subject: [PATCH] x86/boot/compressed: Fix compressed kernel linking with lld

Commit TBD ("x86/boot/compressed: Remove unnecessary sections from
bzImage") discarded unnecessary sections with *(*). While this works
fine with the bfd linker, lld tries to also discard essential sections
like .shstrtab, .symtab and .strtab, which results in the link failing
since .shstrtab is required by the ELF specification. .symtab and
.strtab are also necessary to generate the zoffset.h file for the
bzImage header.

Since the only sizeable section that can be discarded is .eh_frame,
restrict the discard to only .eh_frame to be safe.

Signed-off-by: Arvind Sankar <[email protected]>
---
Sending as a fix on top of tip/x86/boot.

arch/x86/boot/compressed/vmlinux.lds.S | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/boot/compressed/vmlinux.lds.S b/arch/x86/boot/compressed/vmlinux.lds.S
index 12a20603d92e..469dcf800a2c 100644
--- a/arch/x86/boot/compressed/vmlinux.lds.S
+++ b/arch/x86/boot/compressed/vmlinux.lds.S
@@ -74,8 +74,8 @@ SECTIONS
. = ALIGN(PAGE_SIZE); /* keep ZO size page aligned */
_end = .;

- /* Discard all remaining sections */
+ /* Discard .eh_frame to save some space */
/DISCARD/ : {
- *(*)
+ *(.eh_frame)
}
}
--
2.24.1

2020-02-22 17:30:30

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH 2/2] x86/boot/compressed: Remove unnecessary sections from bzImage

On Sat, Feb 22, 2020 at 11:44:20AM -0500, Arvind Sankar wrote:
> Boris, should I send the fix as a diff to the current patch in tip, or
> as a fresh one that can replace it?

The offending commit is the top commit on tip:x86/boot so I'll merge
your new one with it and thus "convert" the former one into the new one
discarding .eh_frame only.

Thx.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2020-02-22 17:53:58

by Arvind Sankar

[permalink] [raw]
Subject: Re: [PATCH 2/2] x86/boot/compressed: Remove unnecessary sections from bzImage

On Sat, Feb 22, 2020 at 06:29:48PM +0100, Borislav Petkov wrote:
> On Sat, Feb 22, 2020 at 11:44:20AM -0500, Arvind Sankar wrote:
> > Boris, should I send the fix as a diff to the current patch in tip, or
> > as a fresh one that can replace it?
>
> The offending commit is the top commit on tip:x86/boot so I'll merge
> your new one with it and thus "convert" the former one into the new one
> discarding .eh_frame only.
>
> Thx.
>
> --
> Regards/Gruss,
> Boris.
>
> https://people.kernel.org/tglx/notes-about-netiquette

Thanks.

2020-02-22 18:15:26

by Nathan Chancellor

[permalink] [raw]
Subject: Re: [PATCH] x86/boot/compressed: Fix compressed kernel linking with lld

On Sat, Feb 22, 2020 at 12:18:59PM -0500, Arvind Sankar wrote:
> Commit TBD ("x86/boot/compressed: Remove unnecessary sections from
> bzImage") discarded unnecessary sections with *(*). While this works
> fine with the bfd linker, lld tries to also discard essential sections
> like .shstrtab, .symtab and .strtab, which results in the link failing
> since .shstrtab is required by the ELF specification. .symtab and
> .strtab are also necessary to generate the zoffset.h file for the
> bzImage header.
>
> Since the only sizeable section that can be discarded is .eh_frame,
> restrict the discard to only .eh_frame to be safe.
>
> Signed-off-by: Arvind Sankar <[email protected]>
> ---
> Sending as a fix on top of tip/x86/boot.
>
> arch/x86/boot/compressed/vmlinux.lds.S | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/arch/x86/boot/compressed/vmlinux.lds.S b/arch/x86/boot/compressed/vmlinux.lds.S
> index 12a20603d92e..469dcf800a2c 100644
> --- a/arch/x86/boot/compressed/vmlinux.lds.S
> +++ b/arch/x86/boot/compressed/vmlinux.lds.S
> @@ -74,8 +74,8 @@ SECTIONS
> . = ALIGN(PAGE_SIZE); /* keep ZO size page aligned */
> _end = .;
>
> - /* Discard all remaining sections */
> + /* Discard .eh_frame to save some space */
> /DISCARD/ : {
> - *(*)
> + *(.eh_frame)
> }
> }
> --
> 2.24.1
>

FWIW:

Tested-by: Nathan Chancellor <[email protected]>

2020-02-22 18:58:58

by Fangrui Song

[permalink] [raw]
Subject: Re: [PATCH] x86/boot/compressed: Fix compressed kernel linking with lld

On 2020-02-22, Nathan Chancellor wrote:
>On Sat, Feb 22, 2020 at 12:18:59PM -0500, Arvind Sankar wrote:
>> Commit TBD ("x86/boot/compressed: Remove unnecessary sections from
>> bzImage") discarded unnecessary sections with *(*). While this works
>> fine with the bfd linker, lld tries to also discard essential sections
>> like .shstrtab, .symtab and .strtab, which results in the link failing
>> since .shstrtab is required by the ELF specification. .symtab and
>> .strtab are also necessary to generate the zoffset.h file for the
>> bzImage header.
>>
>> Since the only sizeable section that can be discarded is .eh_frame,
>> restrict the discard to only .eh_frame to be safe.
>>
>> Signed-off-by: Arvind Sankar <[email protected]>
>> ---
>> Sending as a fix on top of tip/x86/boot.
>>
>> arch/x86/boot/compressed/vmlinux.lds.S | 4 ++--
>> 1 file changed, 2 insertions(+), 2 deletions(-)
>>
>> diff --git a/arch/x86/boot/compressed/vmlinux.lds.S b/arch/x86/boot/compressed/vmlinux.lds.S
>> index 12a20603d92e..469dcf800a2c 100644
>> --- a/arch/x86/boot/compressed/vmlinux.lds.S
>> +++ b/arch/x86/boot/compressed/vmlinux.lds.S
>> @@ -74,8 +74,8 @@ SECTIONS
>> . = ALIGN(PAGE_SIZE); /* keep ZO size page aligned */
>> _end = .;
>>
>> - /* Discard all remaining sections */
>> + /* Discard .eh_frame to save some space */
>> /DISCARD/ : {
>> - *(*)
>> + *(.eh_frame)
>> }
>> }
>> --
>> 2.24.1
>>
>
>FWIW:
>
>Tested-by: Nathan Chancellor <[email protected]>

I am puzzled. Doesn't -fno-asynchronous-unwind-tables suppress
.eh_frame in the object files? Why are there still .eh_frame?

Though, there is prior art: arch/s390/boot/compressed/vmlinux.lds.S also discards .eh_frame

2020-02-22 20:18:56

by Arvind Sankar

[permalink] [raw]
Subject: Re: [PATCH] x86/boot/compressed: Fix compressed kernel linking with lld

On Sat, Feb 22, 2020 at 10:58:06AM -0800, Fangrui Song wrote:
> On 2020-02-22, Nathan Chancellor wrote:
> >On Sat, Feb 22, 2020 at 12:18:59PM -0500, Arvind Sankar wrote:
> >> Commit TBD ("x86/boot/compressed: Remove unnecessary sections from
> >> bzImage") discarded unnecessary sections with *(*). While this works
> >> fine with the bfd linker, lld tries to also discard essential sections
> >> like .shstrtab, .symtab and .strtab, which results in the link failing
> >> since .shstrtab is required by the ELF specification. .symtab and
> >> .strtab are also necessary to generate the zoffset.h file for the
> >> bzImage header.
> >>
> >> Since the only sizeable section that can be discarded is .eh_frame,
> >> restrict the discard to only .eh_frame to be safe.
> >>
> >> Signed-off-by: Arvind Sankar <[email protected]>
> >> ---
> >> Sending as a fix on top of tip/x86/boot.
> >>
> >> arch/x86/boot/compressed/vmlinux.lds.S | 4 ++--
> >> 1 file changed, 2 insertions(+), 2 deletions(-)
> >>
> >> diff --git a/arch/x86/boot/compressed/vmlinux.lds.S b/arch/x86/boot/compressed/vmlinux.lds.S
> >> index 12a20603d92e..469dcf800a2c 100644
> >> --- a/arch/x86/boot/compressed/vmlinux.lds.S
> >> +++ b/arch/x86/boot/compressed/vmlinux.lds.S
> >> @@ -74,8 +74,8 @@ SECTIONS
> >> . = ALIGN(PAGE_SIZE); /* keep ZO size page aligned */
> >> _end = .;
> >>
> >> - /* Discard all remaining sections */
> >> + /* Discard .eh_frame to save some space */
> >> /DISCARD/ : {
> >> - *(*)
> >> + *(.eh_frame)
> >> }
> >> }
> >> --
> >> 2.24.1
> >>
> >
> >FWIW:
> >
> >Tested-by: Nathan Chancellor <[email protected]>
>
> I am puzzled. Doesn't -fno-asynchronous-unwind-tables suppress
> .eh_frame in the object files? Why are there still .eh_frame?
>
> Though, there is prior art: arch/s390/boot/compressed/vmlinux.lds.S also discards .eh_frame

The compressed kernel doesn't use the regular flags and it seems it
doesn't have that option. Maybe we should add it in to avoid generating
those in the first place.

The .eh_frame discard in arch/x86/kernel/vmlinux.lds.S does seem
superfluous though.

2020-02-22 21:01:30

by Fangrui Song

[permalink] [raw]
Subject: Re: [PATCH] x86/boot/compressed: Fix compressed kernel linking with lld

On 2020-02-22, Arvind Sankar wrote:
>On Sat, Feb 22, 2020 at 10:58:06AM -0800, Fangrui Song wrote:
>> On 2020-02-22, Nathan Chancellor wrote:
>> >On Sat, Feb 22, 2020 at 12:18:59PM -0500, Arvind Sankar wrote:
>> >> Commit TBD ("x86/boot/compressed: Remove unnecessary sections from
>> >> bzImage") discarded unnecessary sections with *(*). While this works
>> >> fine with the bfd linker, lld tries to also discard essential sections
>> >> like .shstrtab, .symtab and .strtab, which results in the link failing
>> >> since .shstrtab is required by the ELF specification. .symtab and
>> >> .strtab are also necessary to generate the zoffset.h file for the
>> >> bzImage header.
>> >>
>> >> Since the only sizeable section that can be discarded is .eh_frame,
>> >> restrict the discard to only .eh_frame to be safe.
>> >>
>> >> Signed-off-by: Arvind Sankar <[email protected]>
>> >> ---
>> >> Sending as a fix on top of tip/x86/boot.
>> >>
>> >> arch/x86/boot/compressed/vmlinux.lds.S | 4 ++--
>> >> 1 file changed, 2 insertions(+), 2 deletions(-)
>> >>
>> >> diff --git a/arch/x86/boot/compressed/vmlinux.lds.S b/arch/x86/boot/compressed/vmlinux.lds.S
>> >> index 12a20603d92e..469dcf800a2c 100644
>> >> --- a/arch/x86/boot/compressed/vmlinux.lds.S
>> >> +++ b/arch/x86/boot/compressed/vmlinux.lds.S
>> >> @@ -74,8 +74,8 @@ SECTIONS
>> >> . = ALIGN(PAGE_SIZE); /* keep ZO size page aligned */
>> >> _end = .;
>> >>
>> >> - /* Discard all remaining sections */
>> >> + /* Discard .eh_frame to save some space */
>> >> /DISCARD/ : {
>> >> - *(*)
>> >> + *(.eh_frame)
>> >> }
>> >> }
>> >> --
>> >> 2.24.1
>> >>
>> >
>> >FWIW:
>> >
>> >Tested-by: Nathan Chancellor <[email protected]>
>>
>> I am puzzled. Doesn't -fno-asynchronous-unwind-tables suppress
>> .eh_frame in the object files? Why are there still .eh_frame?
>>
>> Though, there is prior art: arch/s390/boot/compressed/vmlinux.lds.S also discards .eh_frame
>
>The compressed kernel doesn't use the regular flags and it seems it
>doesn't have that option. Maybe we should add it in to avoid generating
>those in the first place.
>
>The .eh_frame discard in arch/x86/kernel/vmlinux.lds.S does seem
>superfluous though.

Yes, please do that. I recommend suppressting unneeded sections at
compile time, instead of discarding them at link time.

https://github.com/torvalds/linux/commit/83a092cf95f28696ddc36c8add0cf03ac034897f
added -Wl,--orphan-handling=warn to arch/powerpc/Makefile .
x86 can follow if that is appropriate.

I don't recommend -Wl,--orphan-handling=error, which can unnecessarily
break the builds.

2020-02-22 23:21:12

by Nick Desaulniers

[permalink] [raw]
Subject: Re: [PATCH 2/2] x86/boot/compressed: Remove unnecessary sections from bzImage

On Sat, Feb 22, 2020 at 8:22 AM Arvind Sankar <[email protected]> wrote:
>
> On Sat, Feb 22, 2020 at 08:42:54AM +0100, Borislav Petkov wrote:
> > On Fri, Feb 21, 2020 at 11:21:44PM -0800, Fangrui Song wrote:
> > > In GNU ld, it seems that .shstrtab .symtab and .strtab are special
> > > cased. Neither the input section description *(.shstrtab) nor *(*)
> > > discards .shstrtab . I feel that this is a weird case (probably even a bug)
> > > that lld should not implement.
> >
> > Ok, forget what the tools do for a second: why is .shstrtab special and
> > why would one want to keep it?
> >
> > Because one still wants to know what the section names of an object are
> > or other tools need it or why?
> >
> > Thx.
> >
> > --
> > Regards/Gruss,
> > Boris.
> >
> > https://people.kernel.org/tglx/notes-about-netiquette
>
> .shstrtab is required by the ELF specification. The e_shstrndx field in
> the ELF header is the index of .shstrtab, and each section in the
> section table is required to have an sh_name that points into the
> .shstrtab.

Yeah, I can see it both ways. That `*` doesn't glob all remaining
sections is surprising to me, but bfd seems to be "extra helpful" in
not discarding sections that are required via ELF spec.

Kees is working on a series to just be explicit about what sections
are ordered where, and what's discarded, which should better handle
incompatibilities between linkers in regards to orphan section
placement and "what does `*` mean." Kees, that series can't come soon
enough. ;) (I think it's intended to help "fine grain" (per function)
KASLR). More comments in the other thread.

Taken from the Zen of Python, but in regards to sections in linker
scripts, "explicit is better than implicit."
--
Thanks,
~Nick Desaulniers

2020-02-22 23:35:20

by Nick Desaulniers

[permalink] [raw]
Subject: Re: [PATCH] x86/boot/compressed: Fix compressed kernel linking with lld

On Sat, Feb 22, 2020 at 1:01 PM 'Fangrui Song' via Clang Built Linux
<[email protected]> wrote:
>
> On 2020-02-22, Arvind Sankar wrote:
> >On Sat, Feb 22, 2020 at 10:58:06AM -0800, Fangrui Song wrote:
> >> On 2020-02-22, Nathan Chancellor wrote:
> >> >On Sat, Feb 22, 2020 at 12:18:59PM -0500, Arvind Sankar wrote:
> >> >> Commit TBD ("x86/boot/compressed: Remove unnecessary sections from
> >> >> bzImage") discarded unnecessary sections with *(*). While this works
> >> >> fine with the bfd linker, lld tries to also discard essential sections
> >> >> like .shstrtab, .symtab and .strtab, which results in the link failing
> >> >> since .shstrtab is required by the ELF specification. .symtab and
> >> >> .strtab are also necessary to generate the zoffset.h file for the
> >> >> bzImage header.
> >> >>
> >> >> Since the only sizeable section that can be discarded is .eh_frame,
> >> >> restrict the discard to only .eh_frame to be safe.
> >> >>
> >> >> Signed-off-by: Arvind Sankar <[email protected]>
> >> >> ---
> >> >> Sending as a fix on top of tip/x86/boot.
> >> >>
> >> >> arch/x86/boot/compressed/vmlinux.lds.S | 4 ++--
> >> >> 1 file changed, 2 insertions(+), 2 deletions(-)
> >> >>
> >> >> diff --git a/arch/x86/boot/compressed/vmlinux.lds.S b/arch/x86/boot/compressed/vmlinux.lds.S
> >> >> index 12a20603d92e..469dcf800a2c 100644
> >> >> --- a/arch/x86/boot/compressed/vmlinux.lds.S
> >> >> +++ b/arch/x86/boot/compressed/vmlinux.lds.S
> >> >> @@ -74,8 +74,8 @@ SECTIONS
> >> >> . = ALIGN(PAGE_SIZE); /* keep ZO size page aligned */
> >> >> _end = .;
> >> >>
> >> >> - /* Discard all remaining sections */
> >> >> + /* Discard .eh_frame to save some space */
> >> >> /DISCARD/ : {
> >> >> - *(*)
> >> >> + *(.eh_frame)
> >> >> }
> >> >> }
> >> >> --
> >> >> 2.24.1
> >> >>
> >> >
> >> >FWIW:
> >> >
> >> >Tested-by: Nathan Chancellor <[email protected]>
> >>
> >> I am puzzled. Doesn't -fno-asynchronous-unwind-tables suppress
> >> .eh_frame in the object files? Why are there still .eh_frame?
> >>
> >> Though, there is prior art: arch/s390/boot/compressed/vmlinux.lds.S also discards .eh_frame
> >
> >The compressed kernel doesn't use the regular flags and it seems it
> >doesn't have that option. Maybe we should add it in to avoid generating
> >those in the first place.

Ah, yikes. For reference, please see my commit:

commit b059f801a937 ("x86/purgatory: Use CFLAGS_REMOVE rather than
reset KBUILD_CFLAGS")

I'm of the conviction that reassigning KBUILD_CFLAGS via `:=`, as
opposed to strictly filtering flags out of it or appending to it, is
an antipattern. We very very carefully construct KBUILD_CFLAGS in top
level and arch/ Makefiles, and it's very easy to miss a flag or to
when you "reset" KBUILD_CFLAGS.

*Boom* Case in point.

I meant to audit the rest of the places we do this in the kernel, but
haven't had the time to revisit arch/x86/boot/compressed/Makefile.

For now, I suggest:
1. revert `Commit TBD ("x86/boot/compressed: Remove unnecessary
sections from bzImage")` as it runs afoul differences in `*` for
`DISCARD` sections between linkers, as the intent was to remove
.eh_frame, of which it's less work to not generate them in the first
place via compiler flag, rather than generate then discard via linker.
2. simply add `KBUILD_CFLAGS += -fno-asynchronous-unwind-tables` to
arch/x86/boot/compressed/Makefile with Fangrui's Sugguested-by tag.
3. Remind me to revisit my proposed cleanup of
arch/x86/boot/compressed/Makefile (which eventually will undo #2). ;)
4. tglx to remind me that my compiler is broken and that I should fix it. :P

> >
> >The .eh_frame discard in arch/x86/kernel/vmlinux.lds.S does seem
> >superfluous though.
>
> Yes, please do that. I recommend suppressting unneeded sections at
> compile time, instead of discarding them at link time.
>
> https://github.com/torvalds/linux/commit/83a092cf95f28696ddc36c8add0cf03ac034897f
> added -Wl,--orphan-handling=warn to arch/powerpc/Makefile .
> x86 can follow if that is appropriate.

Kees has patches for that; I recommend waiting on his series.

> I don't recommend -Wl,--orphan-handling=error, which can unnecessarily
> break the builds.

Agreed. Until we have CI testing with LLD which is a WIP (action item on me).
--
Thanks,
~Nick Desaulniers

2020-02-22 23:57:57

by Arvind Sankar

[permalink] [raw]
Subject: Re: [PATCH] x86/boot/compressed: Fix compressed kernel linking with lld

On Sat, Feb 22, 2020 at 03:33:20PM -0800, Nick Desaulniers wrote:
>
> Ah, yikes. For reference, please see my commit:
>
> commit b059f801a937 ("x86/purgatory: Use CFLAGS_REMOVE rather than
> reset KBUILD_CFLAGS")
>
> I'm of the conviction that reassigning KBUILD_CFLAGS via `:=`, as
> opposed to strictly filtering flags out of it or appending to it, is
> an antipattern. We very very carefully construct KBUILD_CFLAGS in top
> level and arch/ Makefiles, and it's very easy to miss a flag or to
> when you "reset" KBUILD_CFLAGS.
>
> *Boom* Case in point.
>
> I meant to audit the rest of the places we do this in the kernel, but
> haven't had the time to revisit arch/x86/boot/compressed/Makefile.
>
> For now, I suggest:
> 1. revert `Commit TBD ("x86/boot/compressed: Remove unnecessary
> sections from bzImage")` as it runs afoul differences in `*` for
> `DISCARD` sections between linkers, as the intent was to remove
> .eh_frame, of which it's less work to not generate them in the first
> place via compiler flag, rather than generate then discard via linker.
> 2. simply add `KBUILD_CFLAGS += -fno-asynchronous-unwind-tables` to
> arch/x86/boot/compressed/Makefile with Fangrui's Sugguested-by tag.
> 3. Remind me to revisit my proposed cleanup of
> arch/x86/boot/compressed/Makefile (which eventually will undo #2). ;)
> 4. tglx to remind me that my compiler is broken and that I should fix it. :P

Ok. For reference, note that arch/x86/boot/Makefile also redefines
KBUILD_CFLAGS and missed this option, which is why commit 163159aad74d
("x86/boot: Discard .eh_frame sections") was necessary.

There's also arch/x86/realmode/rm/Makefile as well.

There's also a bunch of places where the CFLAGS_REMOVE have fallen
behind the times -- they remove only -pg rather than CC_FLAGS_FTRACE.
Probably harmless currently since the other flags should be ineffective
without the -pg but might want to clean this up as well.

2020-02-23 19:37:49

by Arvind Sankar

[permalink] [raw]
Subject: [PATCH 1/2] arch/x86: Use -fno-asynchronous-unwind-tables to suppress .eh_frame sections

While discussing a patch to discard .eh_frame from the compressed
vmlinux using the linker script, Fangrui Song pointed out [1] that these
sections shouldn't exist in the first place because arch/x86/Makefile
uses -fno-asynchronous-unwind-tables.

It turns out this is because the Makefiles used to build the compressed
kernel redefine KBUILD_CFLAGS, dropping this flag.

Add the flag to the Makefile for the compressed kernel, as well as the
EFI stub Makefile to fix this.

Also add the flag to boot/Makefile and realmode/rm/Makefile so that the
kernel's boot code (boot/setup.elf) and realmode trampoline
(realmode/rm/realmode.elf) won't be compiled with .eh_frame sections,
since their linker scripts also just discard it.

Signed-off-by: Arvind Sankar <[email protected]>
Suggested-By: Fangrui Song <[email protected]>
[1] https://lore.kernel.org/lkml/[email protected]/
---
arch/x86/boot/Makefile | 1 +
arch/x86/boot/compressed/Makefile | 1 +
arch/x86/realmode/rm/Makefile | 1 +
drivers/firmware/efi/libstub/Makefile | 3 ++-
4 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/arch/x86/boot/Makefile b/arch/x86/boot/Makefile
index 012b82fc8617..24f011e0adf1 100644
--- a/arch/x86/boot/Makefile
+++ b/arch/x86/boot/Makefile
@@ -68,6 +68,7 @@ clean-files += cpustr.h
KBUILD_CFLAGS := $(REALMODE_CFLAGS) -D_SETUP
KBUILD_AFLAGS := $(KBUILD_CFLAGS) -D__ASSEMBLY__
KBUILD_CFLAGS += $(call cc-option,-fmacro-prefix-map=$(srctree)/=)
+KBUILD_CFLAGS += -fno-asynchronous-unwind-tables
GCOV_PROFILE := n
UBSAN_SANITIZE := n

diff --git a/arch/x86/boot/compressed/Makefile b/arch/x86/boot/compressed/Makefile
index 26050ae0b27e..c33111341325 100644
--- a/arch/x86/boot/compressed/Makefile
+++ b/arch/x86/boot/compressed/Makefile
@@ -39,6 +39,7 @@ KBUILD_CFLAGS += $(call cc-disable-warning, address-of-packed-member)
KBUILD_CFLAGS += $(call cc-disable-warning, gnu)
KBUILD_CFLAGS += -Wno-pointer-sign
KBUILD_CFLAGS += $(call cc-option,-fmacro-prefix-map=$(srctree)/=)
+KBUILD_CFLAGS += -fno-asynchronous-unwind-tables

KBUILD_AFLAGS := $(KBUILD_CFLAGS) -D__ASSEMBLY__
GCOV_PROFILE := n
diff --git a/arch/x86/realmode/rm/Makefile b/arch/x86/realmode/rm/Makefile
index 99b6332ba540..b11ec5d8f8ac 100644
--- a/arch/x86/realmode/rm/Makefile
+++ b/arch/x86/realmode/rm/Makefile
@@ -71,5 +71,6 @@ $(obj)/realmode.relocs: $(obj)/realmode.elf FORCE
KBUILD_CFLAGS := $(REALMODE_CFLAGS) -D_SETUP -D_WAKEUP \
-I$(srctree)/arch/x86/boot
KBUILD_AFLAGS := $(KBUILD_CFLAGS) -D__ASSEMBLY__
+KBUILD_CFLAGS += -fno-asynchronous-unwind-tables
GCOV_PROFILE := n
UBSAN_SANITIZE := n
diff --git a/drivers/firmware/efi/libstub/Makefile b/drivers/firmware/efi/libstub/Makefile
index 98a81576213d..a1140c4ee478 100644
--- a/drivers/firmware/efi/libstub/Makefile
+++ b/drivers/firmware/efi/libstub/Makefile
@@ -12,7 +12,8 @@ cflags-$(CONFIG_X86) += -m$(BITS) -D__KERNEL__ -O2 \
-mno-mmx -mno-sse -fshort-wchar \
-Wno-pointer-sign \
$(call cc-disable-warning, address-of-packed-member) \
- $(call cc-disable-warning, gnu)
+ $(call cc-disable-warning, gnu) \
+ -fno-asynchronous-unwind-tables

# arm64 uses the full KBUILD_CFLAGS so it's necessary to explicitly
# disable the stackleak plugin
--
2.24.1

2020-02-23 19:38:27

by Arvind Sankar

[permalink] [raw]
Subject: [PATCH 0/2] Stop generating .eh_frame sections

In three places in the x86 kernel we are currently generating .eh_frame
sections only to discard them later via linker script. This is in the
boot code (setup.elf), the realmode trampoline (realmode.elf) and the
compressed kernel.

Implement Fangrui and Nick's suggestion [1] to fix KBUILD_CFLAGS by
adding -fno-asynchronous-unwind-tables to avoid generating .eh_frame
sections in the first place, rather than discarding it in the linker
script.

Arvind Sankar (2):
arch/x86: Use -fno-asynchronous-unwind-tables to suppress .eh_frame sections
arch/x86: Drop unneeded linker script discard of .eh_frame

arch/x86/boot/Makefile | 1 +
arch/x86/boot/compressed/Makefile | 1 +
arch/x86/boot/setup.ld | 1 -
arch/x86/kernel/vmlinux.lds.S | 3 ---
arch/x86/realmode/rm/Makefile | 1 +
arch/x86/realmode/rm/realmode.lds.S | 1 -
drivers/firmware/efi/libstub/Makefile | 3 ++-
7 files changed, 5 insertions(+), 6 deletions(-)

--
2.24.1

2020-02-23 19:39:03

by Arvind Sankar

[permalink] [raw]
Subject: [PATCH 2/2] arch/x86: Drop unneeded linker script discard of .eh_frame

Now that we don't generate .eh_frame sections for the files in setup.elf
and realmode.elf, the linker scripts don't need the /DISCARD/ any more.

Also remove the one in the main kernel linker script, since there are no
.eh_frame sections already.

Signed-off-by: Arvind Sankar <[email protected]>
---
arch/x86/boot/setup.ld | 1 -
arch/x86/kernel/vmlinux.lds.S | 3 ---
arch/x86/realmode/rm/realmode.lds.S | 1 -
3 files changed, 5 deletions(-)

diff --git a/arch/x86/boot/setup.ld b/arch/x86/boot/setup.ld
index 3da1c37c6dd5..24c95522f231 100644
--- a/arch/x86/boot/setup.ld
+++ b/arch/x86/boot/setup.ld
@@ -52,7 +52,6 @@ SECTIONS
_end = .;

/DISCARD/ : {
- *(.eh_frame)
*(.note*)
}

diff --git a/arch/x86/kernel/vmlinux.lds.S b/arch/x86/kernel/vmlinux.lds.S
index e3296aa028fe..54f7b9f46446 100644
--- a/arch/x86/kernel/vmlinux.lds.S
+++ b/arch/x86/kernel/vmlinux.lds.S
@@ -412,9 +412,6 @@ SECTIONS
DWARF_DEBUG

DISCARDS
- /DISCARD/ : {
- *(.eh_frame)
- }
}


diff --git a/arch/x86/realmode/rm/realmode.lds.S b/arch/x86/realmode/rm/realmode.lds.S
index 64d135d1ee63..63aa51875ba0 100644
--- a/arch/x86/realmode/rm/realmode.lds.S
+++ b/arch/x86/realmode/rm/realmode.lds.S
@@ -71,7 +71,6 @@ SECTIONS
/DISCARD/ : {
*(.note*)
*(.debug*)
- *(.eh_frame*)
}

#include "pasyms.h"
--
2.24.1

2020-02-23 22:02:12

by Kees Cook

[permalink] [raw]
Subject: Re: [PATCH] x86/boot/compressed: Fix compressed kernel linking with lld

On Sat, Feb 22, 2020 at 1:01 PM 'Fangrui Song' via Clang Built Linux
<[email protected]> wrote:
> https://github.com/torvalds/linux/commit/83a092cf95f28696ddc36c8add0cf03ac034897f
> added -Wl,--orphan-handling=warn to arch/powerpc/Makefile .
> x86 can follow if that is appropriate.

I've been playing with a series to do this, here:

https://git.kernel.org/pub/scm/linux/kernel/git/kees/linux.git/log/?h=linker/orphans/x86-arm

There's some work to be done still...

--
Kees Cook

2020-02-24 04:16:08

by Nathan Chancellor

[permalink] [raw]
Subject: Re: [PATCH 0/2] Stop generating .eh_frame sections

On Sun, Feb 23, 2020 at 02:37:13PM -0500, Arvind Sankar wrote:
> In three places in the x86 kernel we are currently generating .eh_frame
> sections only to discard them later via linker script. This is in the
> boot code (setup.elf), the realmode trampoline (realmode.elf) and the
> compressed kernel.
>
> Implement Fangrui and Nick's suggestion [1] to fix KBUILD_CFLAGS by
> adding -fno-asynchronous-unwind-tables to avoid generating .eh_frame
> sections in the first place, rather than discarding it in the linker
> script.
>
> Arvind Sankar (2):
> arch/x86: Use -fno-asynchronous-unwind-tables to suppress .eh_frame sections
> arch/x86: Drop unneeded linker script discard of .eh_frame
>
> arch/x86/boot/Makefile | 1 +
> arch/x86/boot/compressed/Makefile | 1 +
> arch/x86/boot/setup.ld | 1 -
> arch/x86/kernel/vmlinux.lds.S | 3 ---
> arch/x86/realmode/rm/Makefile | 1 +
> arch/x86/realmode/rm/realmode.lds.S | 1 -
> drivers/firmware/efi/libstub/Makefile | 3 ++-
> 7 files changed, 5 insertions(+), 6 deletions(-)
>
> --
> 2.24.1
>

With both of these patches applied on top of next-20200221 and a revert
of commit e11831d0ada3 ("x86/boot/compressed: Remove unnecessary
sections from bzImage"), I do not see any .eh_frame sections in the
following files:

$ readelf -S arch/x86/boot/setup.elf \
arch/x86/realmode/rm/realmode.elf \
arch/x86/boot/compressed/vmlinux \
vmlinux |& grep eh_frame

Additionally, I can see -fno-asynchronous-unwind-tables get added to
files that didn't previously have it.

Reviewed-by: Nathan Chancellor <[email protected]>
Tested-by: Nathan Chancellor <[email protected]>

2020-02-24 06:07:53

by Fangrui Song

[permalink] [raw]
Subject: Re: [PATCH] x86/boot/compressed: Fix compressed kernel linking with lld

On 2020-02-23, Kees Cook wrote:
>On Sat, Feb 22, 2020 at 1:01 PM 'Fangrui Song' via Clang Built Linux
><[email protected]> wrote:
>> https://github.com/torvalds/linux/commit/83a092cf95f28696ddc36c8add0cf03ac034897f
>> added -Wl,--orphan-handling=warn to arch/powerpc/Makefile .
>> x86 can follow if that is appropriate.
>
>I've been playing with a series to do this, here:
>
>https://git.kernel.org/pub/scm/linux/kernel/git/kees/linux.git/log/?h=linker/orphans/x86-arm
>
>There's some work to be done still...

Thanks for investigating this! There are a number of compiler options
which can add ad-hoc sections. They may need caution.


I just filed https://sourceware.org/bugzilla/show_bug.cgi?id=25591 "Should /DISCARD/ : { *(.symtab) *(.strtab) } work?"
Let's see what GNU ld will do...

Note that * can be refined to SHF_ALLOC sections
(https://sourceware.org/binutils/docs/ld/Input-Section-Basics.html):

SECTIONS {
.text : { *(.text) }
/* This excludes .strtab / .symtab / .shstrtab */
/* https://reviews.llvm.org/D72756 implemented INPUT_SECTION_FLAGS. Not included in LLVM release/10.* */
/DISCARD/ : { INPUT_SECTION_FLAGS(SHF_ALLOC) *(*) }
}


Just realized that this was reported as https://bugs.llvm.org/show_bug.cgi?id=44452
Looks like we will probably close it as wontfix.

Subject: [tip: x86/boot] x86/boot/compressed: Remove .eh_frame section from bzImage

The following commit has been merged into the x86/boot branch of tip:

Commit-ID: 0eea39a234dc52063d14541fabcb2c64516a2328
Gitweb: https://git.kernel.org/tip/0eea39a234dc52063d14541fabcb2c64516a2328
Author: Arvind Sankar <[email protected]>
AuthorDate: Thu, 09 Jan 2020 10:02:18 -05:00
Committer: Borislav Petkov <[email protected]>
CommitterDate: Mon, 24 Feb 2020 12:30:28 +01:00

x86/boot/compressed: Remove .eh_frame section from bzImage

Discarding unnecessary sections with "*(*)" (see thread at Link: below)
works fine with the bfd linker but fails with lld:

$ make -j$(nproc) -s CC=clang LD=ld.lld O=out.x86_64 distclean defconfig bzImage
ld.lld: error: discarding .shstrtab section is not allowed

lld tries to also discard essential sections like .shstrtab, .symtab and
.strtab, which results in the link failing since .shstrtab is required
by the ELF specification: the e_shstrndx field in the ELF header is the
index of .shstrtab, and each section in the section table is required to
have an sh_name that points into the .shstrtab.

.symtab and .strtab are also necessary to generate the zoffset.h file
for the bzImage header.

Since the only sizeable section that can be discarded is .eh_frame,
restrict the discard to only .eh_frame to be safe.

[ bp: Flesh out commit message and replace offending commit with this one. ]

Signed-off-by: Arvind Sankar <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Tested-by: Nathan Chancellor <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
---
arch/x86/boot/compressed/vmlinux.lds.S | 5 +++++
1 file changed, 5 insertions(+)

diff --git a/arch/x86/boot/compressed/vmlinux.lds.S b/arch/x86/boot/compressed/vmlinux.lds.S
index 508cfa6..469dcf8 100644
--- a/arch/x86/boot/compressed/vmlinux.lds.S
+++ b/arch/x86/boot/compressed/vmlinux.lds.S
@@ -73,4 +73,9 @@ SECTIONS
#endif
. = ALIGN(PAGE_SIZE); /* keep ZO size page aligned */
_end = .;
+
+ /* Discard .eh_frame to save some space */
+ /DISCARD/ : {
+ *(.eh_frame)
+ }
}

2020-02-24 13:29:17

by Michael Matz

[permalink] [raw]
Subject: Re: [PATCH 2/2] x86/boot/compressed: Remove unnecessary sections from bzImage

Hello,

On Sat, 22 Feb 2020, Nick Desaulniers wrote:

> > > > In GNU ld, it seems that .shstrtab .symtab and .strtab are special
> > > > cased. Neither the input section description *(.shstrtab) nor *(*)
> > > > discards .shstrtab . I feel that this is a weird case (probably even a bug)
> > > > that lld should not implement.
> > >
> > > Ok, forget what the tools do for a second: why is .shstrtab special and
> > > why would one want to keep it?
> > >
> > > Because one still wants to know what the section names of an object are
> > > or other tools need it or why?
> > >
> > > Thx.
> > >
> > > --
> > > Regards/Gruss,
> > > Boris.
> > >
> > > https://people.kernel.org/tglx/notes-about-netiquette
> >
> > .shstrtab is required by the ELF specification. The e_shstrndx field in
> > the ELF header is the index of .shstrtab, and each section in the
> > section table is required to have an sh_name that points into the
> > .shstrtab.
>
> Yeah, I can see it both ways. That `*` doesn't glob all remaining
> sections is surprising to me, but bfd seems to be "extra helpful" in
> not discarding sections that are required via ELF spec.

In a way the /DISCARD/ assignment should be thought of as applying to
_input_ sections (as all such section references on the RHS), not
necessarily to output sections. What this then means for sections that
are synthesized by the link editor is less clear. Some of them are
generated regardless (as you noted, e.g. the symbol table and associated
string sections, including section name string table), some of them are
suppressed, and either lead to an followup error (e.g. with .gnu.hash), or
to invalid output (e.g. missing .dynsym for executables simply lead to
segfaults when running them).

That's the reason for the perceived inconsistency with behaviour on '*':
it's application to synthesized sections. Arguably bfd should be fixed to
also not discard the other essential sections (or alternatively to give an
error when an essential section is discarded). The lld behaviour of e.g.
discarding .shstrtab (or other synthesized sections necessary for valid
ELF output) doesn't make much sense either, though.


Ciao,
Michael.

2020-02-24 16:43:07

by Arvind Sankar

[permalink] [raw]
Subject: Re: [tip: x86/boot] x86/boot/compressed: Remove .eh_frame section from bzImage

On Mon, Feb 24, 2020 at 11:37:00AM -0000, tip-bot2 for Arvind Sankar wrote:
> The following commit has been merged into the x86/boot branch of tip:
>
> Commit-ID: 0eea39a234dc52063d14541fabcb2c64516a2328
> Gitweb: https://git.kernel.org/tip/0eea39a234dc52063d14541fabcb2c64516a2328
> Author: Arvind Sankar <[email protected]>
> AuthorDate: Thu, 09 Jan 2020 10:02:18 -05:00
> Committer: Borislav Petkov <[email protected]>
> CommitterDate: Mon, 24 Feb 2020 12:30:28 +01:00
>
> x86/boot/compressed: Remove .eh_frame section from bzImage
>

Hi Boris, apologies for the confusion and unnecessary work I've created,
but I think the preference is to merge the 2-patch series I posted
yesterday [1] instead of this.

[1] https://lore.kernel.org/lkml/[email protected]/

2020-02-24 17:17:46

by Borislav Petkov

[permalink] [raw]
Subject: Re: [tip: x86/boot] x86/boot/compressed: Remove .eh_frame section from bzImage

On Mon, Feb 24, 2020 at 11:41:29AM -0500, Arvind Sankar wrote:
> Hi Boris, apologies for the confusion and unnecessary work I've created,
> but I think the preference is to merge the 2-patch series I posted
> yesterday [1] instead of this.
>
> [1] https://lore.kernel.org/lkml/[email protected]/

What guarantees this would work and we won't hit some corner case or
toolchain configuration this hasn't been tested on?

If that happens, I need to have a state to revert back to, i.e., this
patch, discarding .eh_frame explicitly.

So I'll pick up [1] too, but give people a couple of days - a chance to
complain about. :)

--
Regards/Gruss,
Boris.

SUSE Software Solutions Germany GmbH, GF: Felix Imendörffer, HRB 36809, AG Nürnberg

2020-02-24 17:29:35

by Arvind Sankar

[permalink] [raw]
Subject: Re: [tip: x86/boot] x86/boot/compressed: Remove .eh_frame section from bzImage

On Mon, Feb 24, 2020 at 06:16:18PM +0100, Borislav Petkov wrote:
> On Mon, Feb 24, 2020 at 11:41:29AM -0500, Arvind Sankar wrote:
> > Hi Boris, apologies for the confusion and unnecessary work I've created,
> > but I think the preference is to merge the 2-patch series I posted
> > yesterday [1] instead of this.
> >
> > [1] https://lore.kernel.org/lkml/[email protected]/
>
> What guarantees this would work and we won't hit some corner case or
> toolchain configuration this hasn't been tested on?
>
> If that happens, I need to have a state to revert back to, i.e., this
> patch, discarding .eh_frame explicitly.
>
> So I'll pick up [1] too, but give people a couple of days - a chance to
> complain about. :)
>
> --
> Regards/Gruss,
> Boris.
>
> SUSE Software Solutions Germany GmbH, GF: Felix Imendörffer, HRB 36809, AG Nürnberg

Ok, note that the 2-patch series assumed that it would replace this one,
so it doesn't contain a revert of discarding .eh_frame explicitly for
the compressed kernel.

The first patch of that series at least should be safe enough, it will
stop generating .eh_frame sections in most cases, and shouldn't make any
edge cases worse than before. The second one might be a regression if we
do have some case that still creates them though.

2020-02-24 20:34:32

by Nick Desaulniers

[permalink] [raw]
Subject: Re: [PATCH 1/2] arch/x86: Use -fno-asynchronous-unwind-tables to suppress .eh_frame sections

On Sun, Feb 23, 2020 at 11:37 AM Arvind Sankar <[email protected]> wrote:
>
> While discussing a patch to discard .eh_frame from the compressed
> vmlinux using the linker script, Fangrui Song pointed out [1] that these
> sections shouldn't exist in the first place because arch/x86/Makefile
> uses -fno-asynchronous-unwind-tables.

Another benefit is that -fno-asynchronous-unwind-tables may help
reduce the size of .text!
https://stackoverflow.com/a/26302715/1027966

>
> It turns out this is because the Makefiles used to build the compressed
> kernel redefine KBUILD_CFLAGS, dropping this flag.
>
> Add the flag to the Makefile for the compressed kernel, as well as the
> EFI stub Makefile to fix this.
>
> Also add the flag to boot/Makefile and realmode/rm/Makefile so that the
> kernel's boot code (boot/setup.elf) and realmode trampoline
> (realmode/rm/realmode.elf) won't be compiled with .eh_frame sections,
> since their linker scripts also just discard it.
>
> Signed-off-by: Arvind Sankar <[email protected]>
> Suggested-By: Fangrui Song <[email protected]>
> [1] https://lore.kernel.org/lkml/[email protected]/
> ---
> arch/x86/boot/Makefile | 1 +
> arch/x86/boot/compressed/Makefile | 1 +
> arch/x86/realmode/rm/Makefile | 1 +
> drivers/firmware/efi/libstub/Makefile | 3 ++-
> 4 files changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/boot/Makefile b/arch/x86/boot/Makefile
> index 012b82fc8617..24f011e0adf1 100644
> --- a/arch/x86/boot/Makefile
> +++ b/arch/x86/boot/Makefile
> @@ -68,6 +68,7 @@ clean-files += cpustr.h
> KBUILD_CFLAGS := $(REALMODE_CFLAGS) -D_SETUP
> KBUILD_AFLAGS := $(KBUILD_CFLAGS) -D__ASSEMBLY__
> KBUILD_CFLAGS += $(call cc-option,-fmacro-prefix-map=$(srctree)/=)
> +KBUILD_CFLAGS += -fno-asynchronous-unwind-tables
> GCOV_PROFILE := n
> UBSAN_SANITIZE := n
>
> diff --git a/arch/x86/boot/compressed/Makefile b/arch/x86/boot/compressed/Makefile
> index 26050ae0b27e..c33111341325 100644
> --- a/arch/x86/boot/compressed/Makefile
> +++ b/arch/x86/boot/compressed/Makefile
> @@ -39,6 +39,7 @@ KBUILD_CFLAGS += $(call cc-disable-warning, address-of-packed-member)
> KBUILD_CFLAGS += $(call cc-disable-warning, gnu)
> KBUILD_CFLAGS += -Wno-pointer-sign
> KBUILD_CFLAGS += $(call cc-option,-fmacro-prefix-map=$(srctree)/=)
> +KBUILD_CFLAGS += -fno-asynchronous-unwind-tables
>
> KBUILD_AFLAGS := $(KBUILD_CFLAGS) -D__ASSEMBLY__
> GCOV_PROFILE := n
> diff --git a/arch/x86/realmode/rm/Makefile b/arch/x86/realmode/rm/Makefile
> index 99b6332ba540..b11ec5d8f8ac 100644
> --- a/arch/x86/realmode/rm/Makefile
> +++ b/arch/x86/realmode/rm/Makefile
> @@ -71,5 +71,6 @@ $(obj)/realmode.relocs: $(obj)/realmode.elf FORCE
> KBUILD_CFLAGS := $(REALMODE_CFLAGS) -D_SETUP -D_WAKEUP \
> -I$(srctree)/arch/x86/boot
> KBUILD_AFLAGS := $(KBUILD_CFLAGS) -D__ASSEMBLY__
> +KBUILD_CFLAGS += -fno-asynchronous-unwind-tables
> GCOV_PROFILE := n
> UBSAN_SANITIZE := n
> diff --git a/drivers/firmware/efi/libstub/Makefile b/drivers/firmware/efi/libstub/Makefile
> index 98a81576213d..a1140c4ee478 100644
> --- a/drivers/firmware/efi/libstub/Makefile
> +++ b/drivers/firmware/efi/libstub/Makefile
> @@ -12,7 +12,8 @@ cflags-$(CONFIG_X86) += -m$(BITS) -D__KERNEL__ -O2 \
> -mno-mmx -mno-sse -fshort-wchar \
> -Wno-pointer-sign \
> $(call cc-disable-warning, address-of-packed-member) \
> - $(call cc-disable-warning, gnu)
> + $(call cc-disable-warning, gnu) \
> + -fno-asynchronous-unwind-tables

I think we want to add this flag a little lower, line 27 has:

KBUILD_CFLAGS := $(cflags-y) -DDISABLE_BRANCH_PROFILING \

so the `cflags-y` variable you modify in this hunk will only set
-fno-asynchronous-unwind-tables for CONFIG_X86, which I don't think is
intentional. Though when I run

$ llvm-readelf -S drivers/firmware/efi/libstub/lib.a | grep eh_frame

after doing an x86_64 defconfig, I don't get any hits. Do you observe
.eh_frame sections on any of these objects in this dir? (I'm fine
adding it to be safe, but I'm curious why I'm not seeing any
.eh_frame)

>
> # arm64 uses the full KBUILD_CFLAGS so it's necessary to explicitly
> # disable the stackleak plugin
> --
> 2.24.1
>


--
Thanks,
~Nick Desaulniers

2020-02-24 20:46:47

by Nick Desaulniers

[permalink] [raw]
Subject: Re: [PATCH 2/2] arch/x86: Drop unneeded linker script discard of .eh_frame

On Sun, Feb 23, 2020 at 11:37 AM Arvind Sankar <[email protected]> wrote:
>
> Now that we don't generate .eh_frame sections for the files in setup.elf
> and realmode.elf, the linker scripts don't need the /DISCARD/ any more.
>
> Also remove the one in the main kernel linker script, since there are no
> .eh_frame sections already.

Yep, we could go even further and validate the object files post link
such that $(READELF) reported no .eh_frame section, suggesting the use
of -fno-asynchronous-unwind-tables in KBUILD_CFLAGS.

>
> Signed-off-by: Arvind Sankar <[email protected]>
> ---
> arch/x86/boot/setup.ld | 1 -
> arch/x86/kernel/vmlinux.lds.S | 3 ---
> arch/x86/realmode/rm/realmode.lds.S | 1 -
> 3 files changed, 5 deletions(-)
>
> diff --git a/arch/x86/boot/setup.ld b/arch/x86/boot/setup.ld
> index 3da1c37c6dd5..24c95522f231 100644
> --- a/arch/x86/boot/setup.ld
> +++ b/arch/x86/boot/setup.ld
> @@ -52,7 +52,6 @@ SECTIONS
> _end = .;
>
> /DISCARD/ : {
> - *(.eh_frame)
> *(.note*)
> }
>
> diff --git a/arch/x86/kernel/vmlinux.lds.S b/arch/x86/kernel/vmlinux.lds.S
> index e3296aa028fe..54f7b9f46446 100644
> --- a/arch/x86/kernel/vmlinux.lds.S
> +++ b/arch/x86/kernel/vmlinux.lds.S
> @@ -412,9 +412,6 @@ SECTIONS
> DWARF_DEBUG
>
> DISCARDS
> - /DISCARD/ : {
> - *(.eh_frame)
> - }
> }

grepping for eh_frame in arch/x86/ there's a comment in
arch/x86/include/asm/dwarf2.h:
40 #ifndef BUILD_VDSO
41 /*
42 * Emit CFI data in .debug_frame sections, not .eh_frame
sections.
43 * The latter we currently just discard since we don't do DWARF
44 * unwinding at runtime. So only the offline DWARF information is
45 * useful to anyone. Note we should not use this directive if
46 * vmlinux.lds.S gets changed so it doesn't discard .eh_frame.
47 */
48 .cfi_sections .debug_frame

add via:
commit 7b956f035a9ef ("x86/asm: Re-add parts of the manual CFI infrastructure")

https://sourceware.org/binutils/docs/as/CFI-directives.html#g_t_002ecfi_005fsections-section_005flist
is the manual's section on .cfi_sections directives, and states `The
default if this directive is not used is .cfi_sections .eh_frame.`.
So the comment is slightly stale since we're no longer explicitly
discarding .eh_frame in arch/x86/kernel/vmlinux.lds.S, rather
preventing the generation via -fno-asynchronous-unwind-tables in
KBUILD_CFLAGS (across a few different Makefiles). Would you mind also
updating the comment in arch/x86/include/asm/dwarf2.h in a V2? The
rest of this patch LGTM.

>
>
> diff --git a/arch/x86/realmode/rm/realmode.lds.S b/arch/x86/realmode/rm/realmode.lds.S
> index 64d135d1ee63..63aa51875ba0 100644
> --- a/arch/x86/realmode/rm/realmode.lds.S
> +++ b/arch/x86/realmode/rm/realmode.lds.S
> @@ -71,7 +71,6 @@ SECTIONS
> /DISCARD/ : {
> *(.note*)
> *(.debug*)
> - *(.eh_frame*)
> }
>
> #include "pasyms.h"
> --
> 2.24.1
>


--
Thanks,
~Nick Desaulniers

2020-02-24 20:49:38

by Nick Desaulniers

[permalink] [raw]
Subject: Re: [PATCH 0/2] Stop generating .eh_frame sections

On Sun, Feb 23, 2020 at 11:37 AM Arvind Sankar <[email protected]> wrote:
>
> In three places in the x86 kernel we are currently generating .eh_frame
> sections only to discard them later via linker script. This is in the
> boot code (setup.elf), the realmode trampoline (realmode.elf) and the
> compressed kernel.
>
> Implement Fangrui and Nick's suggestion [1] to fix KBUILD_CFLAGS by
> adding -fno-asynchronous-unwind-tables to avoid generating .eh_frame
> sections in the first place, rather than discarding it in the linker
> script.
>
> Arvind Sankar (2):
> arch/x86: Use -fno-asynchronous-unwind-tables to suppress .eh_frame sections
> arch/x86: Drop unneeded linker script discard of .eh_frame

Thanks for the series! I've left some feedback for a v2. Would you
mind please including a revert of ("x86/boot/compressed: Remove
unnecessary sections from bzImage") in a v2 series? Our CI being red
through the weekend is no bueno.

>
> arch/x86/boot/Makefile | 1 +
> arch/x86/boot/compressed/Makefile | 1 +
> arch/x86/boot/setup.ld | 1 -
> arch/x86/kernel/vmlinux.lds.S | 3 ---
> arch/x86/realmode/rm/Makefile | 1 +
> arch/x86/realmode/rm/realmode.lds.S | 1 -
> drivers/firmware/efi/libstub/Makefile | 3 ++-
> 7 files changed, 5 insertions(+), 6 deletions(-)
>
> --
> 2.24.1
>


--
Thanks,
~Nick Desaulniers

2020-02-24 20:51:49

by Nick Desaulniers

[permalink] [raw]
Subject: Re: [PATCH 2/2] x86/boot/compressed: Remove unnecessary sections from bzImage

On Mon, Feb 24, 2020 at 5:28 AM Michael Matz <[email protected]> wrote:
>
> Hello,
>
> On Sat, 22 Feb 2020, Nick Desaulniers wrote:
>
> > > > > In GNU ld, it seems that .shstrtab .symtab and .strtab are special
> > > > > cased. Neither the input section description *(.shstrtab) nor *(*)
> > > > > discards .shstrtab . I feel that this is a weird case (probably even a bug)
> > > > > that lld should not implement.
> > > >
> > > > Ok, forget what the tools do for a second: why is .shstrtab special and
> > > > why would one want to keep it?
> > > >
> > > > Because one still wants to know what the section names of an object are
> > > > or other tools need it or why?
> > > >
> > > > Thx.
> > > >
> > > > --
> > > > Regards/Gruss,
> > > > Boris.
> > > >
> > > > https://people.kernel.org/tglx/notes-about-netiquette
> > >
> > > .shstrtab is required by the ELF specification. The e_shstrndx field in
> > > the ELF header is the index of .shstrtab, and each section in the
> > > section table is required to have an sh_name that points into the
> > > .shstrtab.
> >
> > Yeah, I can see it both ways. That `*` doesn't glob all remaining
> > sections is surprising to me, but bfd seems to be "extra helpful" in
> > not discarding sections that are required via ELF spec.
>
> In a way the /DISCARD/ assignment should be thought of as applying to
> _input_ sections (as all such section references on the RHS), not
> necessarily to output sections. What this then means for sections that
> are synthesized by the link editor is less clear. Some of them are
> generated regardless (as you noted, e.g. the symbol table and associated
> string sections, including section name string table), some of them are
> suppressed, and either lead to an followup error (e.g. with .gnu.hash), or
> to invalid output (e.g. missing .dynsym for executables simply lead to
> segfaults when running them).
>
> That's the reason for the perceived inconsistency with behaviour on '*':
> it's application to synthesized sections. Arguably bfd should be fixed to
> also not discard the other essential sections (or alternatively to give an
> error when an essential section is discarded). The lld behaviour of e.g.
> discarding .shstrtab (or other synthesized sections necessary for valid
> ELF output) doesn't make much sense either, though.

Hi Michael, thank you for the precise feedback. Do you have a list of
"synthesized sections necessary for valid ELF output?" Also, could you
point me to the documentation about `*` and its relation to
"synthesized sections necessary for valid ELF output?" This will help
me file a precise bug against LLD.
--
Thanks,
~Nick Desaulniers

2020-02-24 21:05:47

by Arvind Sankar

[permalink] [raw]
Subject: Re: [PATCH 1/2] arch/x86: Use -fno-asynchronous-unwind-tables to suppress .eh_frame sections

On Mon, Feb 24, 2020 at 12:33:49PM -0800, Nick Desaulniers wrote:
> On Sun, Feb 23, 2020 at 11:37 AM Arvind Sankar <[email protected]> wrote:
> >
> > While discussing a patch to discard .eh_frame from the compressed
> > vmlinux using the linker script, Fangrui Song pointed out [1] that these
> > sections shouldn't exist in the first place because arch/x86/Makefile
> > uses -fno-asynchronous-unwind-tables.
>
> Another benefit is that -fno-asynchronous-unwind-tables may help
> reduce the size of .text!
> https://stackoverflow.com/a/26302715/1027966

Hm I don't see any change in .text size.
> > diff --git a/drivers/firmware/efi/libstub/Makefile b/drivers/firmware/efi/libstub/Makefile
> > index 98a81576213d..a1140c4ee478 100644
> > --- a/drivers/firmware/efi/libstub/Makefile
> > +++ b/drivers/firmware/efi/libstub/Makefile
> > @@ -12,7 +12,8 @@ cflags-$(CONFIG_X86) += -m$(BITS) -D__KERNEL__ -O2 \
> > -mno-mmx -mno-sse -fshort-wchar \
> > -Wno-pointer-sign \
> > $(call cc-disable-warning, address-of-packed-member) \
> > - $(call cc-disable-warning, gnu)
> > + $(call cc-disable-warning, gnu) \
> > + -fno-asynchronous-unwind-tables
>
> I think we want to add this flag a little lower, line 27 has:
>
> KBUILD_CFLAGS := $(cflags-y) -DDISABLE_BRANCH_PROFILING \
>
> so the `cflags-y` variable you modify in this hunk will only set
> -fno-asynchronous-unwind-tables for CONFIG_X86, which I don't think is
> intentional. Though when I run

It is intentional -- the other case is that we're building for ARM,
which only filters out the regular KBUILD_CFLAGS, so adding the flag for
it should not be necessary. The cflags for ARM are constructed by
manipulating KBUILD_CFLAGS. Besides it may or may not want unwind
tables. 32-bit ARM appears to have an option to enable -funwind-tables.

>
> $ llvm-readelf -S drivers/firmware/efi/libstub/lib.a | grep eh_frame
>
> after doing an x86_64 defconfig, I don't get any hits. Do you observe
> .eh_frame sections on any of these objects in this dir? (I'm fine
> adding it to be safe, but I'm curious why I'm not seeing any
> .eh_frame)
>

You mean before this patch, right? I see hits on every .o file in there
(compiling with gcc 9.2.0).

> >
> > # arm64 uses the full KBUILD_CFLAGS so it's necessary to explicitly
> > # disable the stackleak plugin
> > --
> > 2.24.1
> >
>
>
> --
> Thanks,
> ~Nick Desaulniers

2020-02-24 21:12:50

by Fangrui Song

[permalink] [raw]
Subject: Re: [PATCH 1/2] arch/x86: Use -fno-asynchronous-unwind-tables to suppress .eh_frame sections

On 2020-02-24, Arvind Sankar wrote:
>On Mon, Feb 24, 2020 at 12:33:49PM -0800, Nick Desaulniers wrote:
>> On Sun, Feb 23, 2020 at 11:37 AM Arvind Sankar <[email protected]> wrote:
>> >
>> > While discussing a patch to discard .eh_frame from the compressed
>> > vmlinux using the linker script, Fangrui Song pointed out [1] that these
>> > sections shouldn't exist in the first place because arch/x86/Makefile
>> > uses -fno-asynchronous-unwind-tables.
>>
>> Another benefit is that -fno-asynchronous-unwind-tables may help
>> reduce the size of .text!
>> https://stackoverflow.com/a/26302715/1027966
>
>Hm I don't see any change in .text size.
>> > diff --git a/drivers/firmware/efi/libstub/Makefile b/drivers/firmware/efi/libstub/Makefile
>> > index 98a81576213d..a1140c4ee478 100644
>> > --- a/drivers/firmware/efi/libstub/Makefile
>> > +++ b/drivers/firmware/efi/libstub/Makefile
>> > @@ -12,7 +12,8 @@ cflags-$(CONFIG_X86) += -m$(BITS) -D__KERNEL__ -O2 \
>> > -mno-mmx -mno-sse -fshort-wchar \
>> > -Wno-pointer-sign \
>> > $(call cc-disable-warning, address-of-packed-member) \
>> > - $(call cc-disable-warning, gnu)
>> > + $(call cc-disable-warning, gnu) \
>> > + -fno-asynchronous-unwind-tables
>>
>> I think we want to add this flag a little lower, line 27 has:
>>
>> KBUILD_CFLAGS := $(cflags-y) -DDISABLE_BRANCH_PROFILING \
>>
>> so the `cflags-y` variable you modify in this hunk will only set
>> -fno-asynchronous-unwind-tables for CONFIG_X86, which I don't think is
>> intentional. Though when I run
>
>It is intentional -- the other case is that we're building for ARM,
>which only filters out the regular KBUILD_CFLAGS, so adding the flag for
>it should not be necessary. The cflags for ARM are constructed by
>manipulating KBUILD_CFLAGS. Besides it may or may not want unwind
>tables. 32-bit ARM appears to have an option to enable -funwind-tables.

clang (as of today) has not implemented the
-funwind-tables/-fasynchronous-unwind-tables distinction as GCC does..
(probably because not many people care..)

>>
>> $ llvm-readelf -S drivers/firmware/efi/libstub/lib.a | grep eh_frame
>>
>> after doing an x86_64 defconfig, I don't get any hits. Do you observe
>> .eh_frame sections on any of these objects in this dir? (I'm fine
>> adding it to be safe, but I'm curious why I'm not seeing any
>> .eh_frame)
>>
>
>You mean before this patch, right? I see hits on every .o file in there
>(compiling with gcc 9.2.0).
>
>> >
>> > # arm64 uses the full KBUILD_CFLAGS so it's necessary to explicitly
>> > # disable the stackleak plugin
>> > --
>> > 2.24.1
>> >

2020-02-24 21:18:55

by Nick Desaulniers

[permalink] [raw]
Subject: Re: [PATCH 1/2] arch/x86: Use -fno-asynchronous-unwind-tables to suppress .eh_frame sections

On Mon, Feb 24, 2020 at 1:12 PM Fangrui Song <[email protected]> wrote:
>
> On 2020-02-24, Arvind Sankar wrote:
> >On Mon, Feb 24, 2020 at 12:33:49PM -0800, Nick Desaulniers wrote:
> >> On Sun, Feb 23, 2020 at 11:37 AM Arvind Sankar <[email protected]> wrote:
> >> >
> >> > While discussing a patch to discard .eh_frame from the compressed
> >> > vmlinux using the linker script, Fangrui Song pointed out [1] that these
> >> > sections shouldn't exist in the first place because arch/x86/Makefile
> >> > uses -fno-asynchronous-unwind-tables.
> >>
> >> Another benefit is that -fno-asynchronous-unwind-tables may help
> >> reduce the size of .text!
> >> https://stackoverflow.com/a/26302715/1027966
> >
> >Hm I don't see any change in .text size.
> >> > diff --git a/drivers/firmware/efi/libstub/Makefile b/drivers/firmware/efi/libstub/Makefile
> >> > index 98a81576213d..a1140c4ee478 100644
> >> > --- a/drivers/firmware/efi/libstub/Makefile
> >> > +++ b/drivers/firmware/efi/libstub/Makefile
> >> > @@ -12,7 +12,8 @@ cflags-$(CONFIG_X86) += -m$(BITS) -D__KERNEL__ -O2 \
> >> > -mno-mmx -mno-sse -fshort-wchar \
> >> > -Wno-pointer-sign \
> >> > $(call cc-disable-warning, address-of-packed-member) \
> >> > - $(call cc-disable-warning, gnu)
> >> > + $(call cc-disable-warning, gnu) \
> >> > + -fno-asynchronous-unwind-tables
> >>
> >> I think we want to add this flag a little lower, line 27 has:
> >>
> >> KBUILD_CFLAGS := $(cflags-y) -DDISABLE_BRANCH_PROFILING \
> >>
> >> so the `cflags-y` variable you modify in this hunk will only set
> >> -fno-asynchronous-unwind-tables for CONFIG_X86, which I don't think is
> >> intentional. Though when I run
> >
> >It is intentional -- the other case is that we're building for ARM,
> >which only filters out the regular KBUILD_CFLAGS, so adding the flag for
> >it should not be necessary. The cflags for ARM are constructed by
> >manipulating KBUILD_CFLAGS. Besides it may or may not want unwind
> >tables. 32-bit ARM appears to have an option to enable -funwind-tables.

Ah, right the `subst` from `KBUILD_CFLAGS`.
Are there other architectures that care about EFI beyond x86 and ARM? IA64?

>
> clang (as of today) has not implemented the
> -funwind-tables/-fasynchronous-unwind-tables distinction as GCC does..
> (probably because not many people care..)

Ah, thanks for the clarification.

>
> >>
> >> $ llvm-readelf -S drivers/firmware/efi/libstub/lib.a | grep eh_frame
> >>
> >> after doing an x86_64 defconfig, I don't get any hits. Do you observe
> >> .eh_frame sections on any of these objects in this dir? (I'm fine
> >> adding it to be safe, but I'm curious why I'm not seeing any
> >> .eh_frame)
> >>
> >
> >You mean before this patch, right? I see hits on every .o file in there
> >(compiling with gcc 9.2.0).
> >
> >> >
> >> > # arm64 uses the full KBUILD_CFLAGS so it's necessary to explicitly
> >> > # disable the stackleak plugin
> >> > --
> >> > 2.24.1
> >> >



--
Thanks,
~Nick Desaulniers

2020-02-24 21:22:55

by Nick Desaulniers

[permalink] [raw]
Subject: Re: [PATCH 1/2] arch/x86: Use -fno-asynchronous-unwind-tables to suppress .eh_frame sections

On Mon, Feb 24, 2020 at 1:17 PM Nick Desaulniers
<[email protected]> wrote:
>
> On Mon, Feb 24, 2020 at 1:12 PM Fangrui Song <[email protected]> wrote:
> >
> > On 2020-02-24, Arvind Sankar wrote:
> > >On Mon, Feb 24, 2020 at 12:33:49PM -0800, Nick Desaulniers wrote:
> > >> On Sun, Feb 23, 2020 at 11:37 AM Arvind Sankar <[email protected]> wrote:
> > >> >
> > >> > While discussing a patch to discard .eh_frame from the compressed
> > >> > vmlinux using the linker script, Fangrui Song pointed out [1] that these
> > >> > sections shouldn't exist in the first place because arch/x86/Makefile
> > >> > uses -fno-asynchronous-unwind-tables.
> > >>
> > >> Another benefit is that -fno-asynchronous-unwind-tables may help
> > >> reduce the size of .text!
> > >> https://stackoverflow.com/a/26302715/1027966
> > >
> > >Hm I don't see any change in .text size.
> > >> > diff --git a/drivers/firmware/efi/libstub/Makefile b/drivers/firmware/efi/libstub/Makefile
> > >> > index 98a81576213d..a1140c4ee478 100644
> > >> > --- a/drivers/firmware/efi/libstub/Makefile
> > >> > +++ b/drivers/firmware/efi/libstub/Makefile
> > >> > @@ -12,7 +12,8 @@ cflags-$(CONFIG_X86) += -m$(BITS) -D__KERNEL__ -O2 \
> > >> > -mno-mmx -mno-sse -fshort-wchar \
> > >> > -Wno-pointer-sign \
> > >> > $(call cc-disable-warning, address-of-packed-member) \
> > >> > - $(call cc-disable-warning, gnu)
> > >> > + $(call cc-disable-warning, gnu) \
> > >> > + -fno-asynchronous-unwind-tables
> > >>
> > >> I think we want to add this flag a little lower, line 27 has:
> > >>
> > >> KBUILD_CFLAGS := $(cflags-y) -DDISABLE_BRANCH_PROFILING \
> > >>
> > >> so the `cflags-y` variable you modify in this hunk will only set
> > >> -fno-asynchronous-unwind-tables for CONFIG_X86, which I don't think is
> > >> intentional. Though when I run
> > >
> > >It is intentional -- the other case is that we're building for ARM,
> > >which only filters out the regular KBUILD_CFLAGS, so adding the flag for
> > >it should not be necessary. The cflags for ARM are constructed by
> > >manipulating KBUILD_CFLAGS. Besides it may or may not want unwind
> > >tables. 32-bit ARM appears to have an option to enable -funwind-tables.
>
> Ah, right the `subst` from `KBUILD_CFLAGS`.
> Are there other architectures that care about EFI beyond x86 and ARM? IA64?

Looks like while IA64 supports CONFIG_EFI, it doesn't support
CONFIG_EFI_STUB, which controls whether drivers/firmware/efi/libstub/
gets built or not. So that patch should be good to go.

Reviewed-by: Nick Desaulniers <[email protected]>

>
> >
> > clang (as of today) has not implemented the
> > -funwind-tables/-fasynchronous-unwind-tables distinction as GCC does..
> > (probably because not many people care..)
>
> Ah, thanks for the clarification.
>
> >
> > >>
> > >> $ llvm-readelf -S drivers/firmware/efi/libstub/lib.a | grep eh_frame
> > >>
> > >> after doing an x86_64 defconfig, I don't get any hits. Do you observe
> > >> .eh_frame sections on any of these objects in this dir? (I'm fine
> > >> adding it to be safe, but I'm curious why I'm not seeing any
> > >> .eh_frame)
> > >>
> > >
> > >You mean before this patch, right? I see hits on every .o file in there
> > >(compiling with gcc 9.2.0).
> > >
> > >> >
> > >> > # arm64 uses the full KBUILD_CFLAGS so it's necessary to explicitly
> > >> > # disable the stackleak plugin
> > >> > --
> > >> > 2.24.1
> > >> >
>
>
>
> --
> Thanks,
> ~Nick Desaulniers



--
Thanks,
~Nick Desaulniers

2020-02-24 21:28:56

by Fangrui Song

[permalink] [raw]
Subject: Re: [PATCH 2/2] x86/boot/compressed: Remove unnecessary sections from bzImage

On 2020-02-24, Nick Desaulniers wrote:
>On Mon, Feb 24, 2020 at 5:28 AM Michael Matz <[email protected]> wrote:
>>
>> Hello,
>>
>> On Sat, 22 Feb 2020, Nick Desaulniers wrote:
>>
>> > > > > In GNU ld, it seems that .shstrtab .symtab and .strtab are special
>> > > > > cased. Neither the input section description *(.shstrtab) nor *(*)
>> > > > > discards .shstrtab . I feel that this is a weird case (probably even a bug)
>> > > > > that lld should not implement.
>> > > >
>> > > > Ok, forget what the tools do for a second: why is .shstrtab special and
>> > > > why would one want to keep it?
>> > > >
>> > > > Because one still wants to know what the section names of an object are
>> > > > or other tools need it or why?
>> > > >
>> > > > Thx.
>> > > >
>> > > > --
>> > > > Regards/Gruss,
>> > > > Boris.
>> > > >
>> > > > https://people.kernel.org/tglx/notes-about-netiquette
>> > >
>> > > .shstrtab is required by the ELF specification. The e_shstrndx field in
>> > > the ELF header is the index of .shstrtab, and each section in the
>> > > section table is required to have an sh_name that points into the
>> > > .shstrtab.
>> >
>> > Yeah, I can see it both ways. That `*` doesn't glob all remaining
>> > sections is surprising to me, but bfd seems to be "extra helpful" in
>> > not discarding sections that are required via ELF spec.
>>
>> In a way the /DISCARD/ assignment should be thought of as applying to
>> _input_ sections (as all such section references on the RHS), not
>> necessarily to output sections. What this then means for sections that
>> are synthesized by the link editor is less clear. Some of them are
>> generated regardless (as you noted, e.g. the symbol table and associated
>> string sections, including section name string table), some of them are
>> suppressed, and either lead to an followup error (e.g. with .gnu.hash), or
>> to invalid output (e.g. missing .dynsym for executables simply lead to
>> segfaults when running them).

Hi Michael, please see my other reply on this thread: https://lkml.org/lkml/2020/2/24/47

Synthesized sections can be matched as well. For example, SECTIONS { .pltfoo : { *(.plt) }} can rename the output section .plt to .pltfoo
It seems that in GNU ld, the synthesized section is associated with the
original object file, so it can be written as:

SECTIONS { .pltfoo : { a.o(.plt) }}

In lld, you need a wildcard to match the synthesized section *(.plt)

.rela.dyn is another example.

>> That's the reason for the perceived inconsistency with behaviour on '*':
>> it's application to synthesized sections. Arguably bfd should be fixed to
>> also not discard the other essential sections (or alternatively to give an
>> error when an essential section is discarded). The lld behaviour of e.g.
>> discarding .shstrtab (or other synthesized sections necessary for valid
>> ELF output) doesn't make much sense either, though.

I think most input section descriptions *(*) are misuse. They really
should be INPUT_SECTION_FLAGS(SHF_ALLOC) *(*)

>Hi Michael, thank you for the precise feedback. Do you have a list of
>"synthesized sections necessary for valid ELF output?" Also, could you
>point me to the documentation about `*` and its relation to
>"synthesized sections necessary for valid ELF output?" This will help
>me file a precise bug against LLD.

https://sourceware.org/binutils/docs/ld/Output-Section-Discarding.html#Output-Section-Discarding

has a few words on this topic. A large part is implementation defined.
In GNU ld, the implementation is mostly in ld/ldlang.c and ld/ldexp.c
(very long).

2020-02-24 21:34:42

by Arvind Sankar

[permalink] [raw]
Subject: Re: [PATCH 2/2] arch/x86: Drop unneeded linker script discard of .eh_frame

On Mon, Feb 24, 2020 at 12:45:51PM -0800, Nick Desaulniers wrote:
>
> grepping for eh_frame in arch/x86/ there's a comment in
> arch/x86/include/asm/dwarf2.h:
> 40 #ifndef BUILD_VDSO
> 41 /*
> 42 * Emit CFI data in .debug_frame sections, not .eh_frame
> sections.
> 43 * The latter we currently just discard since we don't do DWARF
> 44 * unwinding at runtime. So only the offline DWARF information is
> 45 * useful to anyone. Note we should not use this directive if
> 46 * vmlinux.lds.S gets changed so it doesn't discard .eh_frame.
> 47 */
> 48 .cfi_sections .debug_frame
>
> add via:
> commit 7b956f035a9ef ("x86/asm: Re-add parts of the manual CFI infrastructure")
>
> https://sourceware.org/binutils/docs/as/CFI-directives.html#g_t_002ecfi_005fsections-section_005flist
> is the manual's section on .cfi_sections directives, and states `The
> default if this directive is not used is .cfi_sections .eh_frame.`.
> So the comment is slightly stale since we're no longer explicitly
> discarding .eh_frame in arch/x86/kernel/vmlinux.lds.S, rather
> preventing the generation via -fno-asynchronous-unwind-tables in
> KBUILD_CFLAGS (across a few different Makefiles). Would you mind also
> updating the comment in arch/x86/include/asm/dwarf2.h in a V2? The
> rest of this patch LGTM.
>

i.e. just replace that last sentence with "Note ... if we decide to use
runtime DWARF unwinding again"?

The whole ifdef-ery machinery there is obsolete, all the directives its
checking support for have been there since binutils-2.18, so should
probably also clean it up to just unconditionally define them.

2020-02-24 21:50:54

by Arvind Sankar

[permalink] [raw]
Subject: Re: [PATCH 2/2] x86/boot/compressed: Remove unnecessary sections from bzImage

On Mon, Feb 24, 2020 at 01:28:28PM -0800, Fangrui Song wrote:
> Hi Michael, please see my other reply on this thread: https://lkml.org/lkml/2020/2/24/47
>
> Synthesized sections can be matched as well. For example, SECTIONS { .pltfoo : { *(.plt) }} can rename the output section .plt to .pltfoo
> It seems that in GNU ld, the synthesized section is associated with the
> original object file, so it can be written as:
>
> SECTIONS { .pltfoo : { a.o(.plt) }}
>
> In lld, you need a wildcard to match the synthesized section *(.plt)
>
> .rela.dyn is another example.
>

With the BFD toolchain, file matching doesn't actually seem to work at
least for .rela.dyn. I've tried playing around with it in the past and
if you try to use file-matching to capture relocations from a particular
input file, it just doesn't work sensibly.

2020-02-24 21:55:22

by Arvind Sankar

[permalink] [raw]
Subject: Re: [PATCH 0/2] Stop generating .eh_frame sections

On Mon, Feb 24, 2020 at 12:49:03PM -0800, Nick Desaulniers wrote:
> On Sun, Feb 23, 2020 at 11:37 AM Arvind Sankar <[email protected]> wrote:
> >
> > In three places in the x86 kernel we are currently generating .eh_frame
> > sections only to discard them later via linker script. This is in the
> > boot code (setup.elf), the realmode trampoline (realmode.elf) and the
> > compressed kernel.
> >
> > Implement Fangrui and Nick's suggestion [1] to fix KBUILD_CFLAGS by
> > adding -fno-asynchronous-unwind-tables to avoid generating .eh_frame
> > sections in the first place, rather than discarding it in the linker
> > script.
> >
> > Arvind Sankar (2):
> > arch/x86: Use -fno-asynchronous-unwind-tables to suppress .eh_frame sections
> > arch/x86: Drop unneeded linker script discard of .eh_frame
>
> Thanks for the series! I've left some feedback for a v2. Would you
> mind please including a revert of ("x86/boot/compressed: Remove
> unnecessary sections from bzImage") in a v2 series? Our CI being red
> through the weekend is no bueno.

Sorry about that. Boris already updated tip:x86/boot to only discard
eh_frame, so your CI should be ok at least now.

>
> >
> > arch/x86/boot/Makefile | 1 +
> > arch/x86/boot/compressed/Makefile | 1 +
> > arch/x86/boot/setup.ld | 1 -
> > arch/x86/kernel/vmlinux.lds.S | 3 ---
> > arch/x86/realmode/rm/Makefile | 1 +
> > arch/x86/realmode/rm/realmode.lds.S | 1 -
> > drivers/firmware/efi/libstub/Makefile | 3 ++-
> > 7 files changed, 5 insertions(+), 6 deletions(-)
> >
> > --
> > 2.24.1
> >
>
>
> --
> Thanks,
> ~Nick Desaulniers

2020-02-24 21:59:18

by Nick Desaulniers

[permalink] [raw]
Subject: Re: [PATCH 2/2] arch/x86: Drop unneeded linker script discard of .eh_frame

On Mon, Feb 24, 2020 at 1:33 PM Arvind Sankar <[email protected]> wrote:
>
> On Mon, Feb 24, 2020 at 12:45:51PM -0800, Nick Desaulniers wrote:
> >
> > grepping for eh_frame in arch/x86/ there's a comment in
> > arch/x86/include/asm/dwarf2.h:
> > 40 #ifndef BUILD_VDSO
> > 41 /*
> > 42 * Emit CFI data in .debug_frame sections, not .eh_frame
> > sections.
> > 43 * The latter we currently just discard since we don't do DWARF
> > 44 * unwinding at runtime. So only the offline DWARF information is
> > 45 * useful to anyone. Note we should not use this directive if
> > 46 * vmlinux.lds.S gets changed so it doesn't discard .eh_frame.
> > 47 */
> > 48 .cfi_sections .debug_frame
> >
> > add via:
> > commit 7b956f035a9ef ("x86/asm: Re-add parts of the manual CFI infrastructure")
> >
> > https://sourceware.org/binutils/docs/as/CFI-directives.html#g_t_002ecfi_005fsections-section_005flist
> > is the manual's section on .cfi_sections directives, and states `The
> > default if this directive is not used is .cfi_sections .eh_frame.`.
> > So the comment is slightly stale since we're no longer explicitly
> > discarding .eh_frame in arch/x86/kernel/vmlinux.lds.S, rather
> > preventing the generation via -fno-asynchronous-unwind-tables in
> > KBUILD_CFLAGS (across a few different Makefiles). Would you mind also
> > updating the comment in arch/x86/include/asm/dwarf2.h in a V2? The
> > rest of this patch LGTM.
> >
>
> i.e. just replace that last sentence with "Note ... if we decide to use
> runtime DWARF unwinding again"?

Yeah that should be good. Maybe these cleanups could be a separate
patch, if you prefer?

>
> The whole ifdef-ery machinery there is obsolete, all the directives its
> checking support for have been there since binutils-2.18, so should
> probably also clean it up to just unconditionally define them.

arch/x86/Makefile:
184 # do binutils support CFI?
185 cfi := $(call as-instr,.cfi_startproc\n.cfi_rel_offset
$(sp-y)$(comma)0\n.cfi_endproc,-DCONFIG_AS_CFI=1)
186 # is .cfi_signal_frame supported too?
187 cfi-sigframe := $(call
as-instr,.cfi_startproc\n.cfi_signal_frame\n.cfi_endproc,-DCONFIG_AS_CFI_SIGNAL_FRAME=1)
188 cfi-sections := $(call as-instr,.cfi_sections
.debug_frame,-DCONFIG_AS_CFI_SECTIONS=1)

2.18? Oh, yeah, we can clean that up to.
Documentation/process/changes.rst list binutils 2.21 as the minimum
supported version. Then I assume that code that uses those -D flags
can go, too.
--
Thanks,
~Nick Desaulniers

2020-02-24 22:02:27

by Nick Desaulniers

[permalink] [raw]
Subject: Re: [PATCH 0/2] Stop generating .eh_frame sections

On Mon, Feb 24, 2020 at 1:53 PM Arvind Sankar <[email protected]> wrote:
>
> On Mon, Feb 24, 2020 at 12:49:03PM -0800, Nick Desaulniers wrote:
> > On Sun, Feb 23, 2020 at 11:37 AM Arvind Sankar <[email protected]> wrote:
> > >
> > > In three places in the x86 kernel we are currently generating .eh_frame
> > > sections only to discard them later via linker script. This is in the
> > > boot code (setup.elf), the realmode trampoline (realmode.elf) and the
> > > compressed kernel.
> > >
> > > Implement Fangrui and Nick's suggestion [1] to fix KBUILD_CFLAGS by
> > > adding -fno-asynchronous-unwind-tables to avoid generating .eh_frame
> > > sections in the first place, rather than discarding it in the linker
> > > script.
> > >
> > > Arvind Sankar (2):
> > > arch/x86: Use -fno-asynchronous-unwind-tables to suppress .eh_frame sections
> > > arch/x86: Drop unneeded linker script discard of .eh_frame
> >
> > Thanks for the series! I've left some feedback for a v2. Would you
> > mind please including a revert of ("x86/boot/compressed: Remove
> > unnecessary sections from bzImage") in a v2 series? Our CI being red
> > through the weekend is no bueno.
>
> Sorry about that. Boris already updated tip:x86/boot to only discard
> eh_frame, so your CI should be ok at least now.

Yep: https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git/commit/?h=x86/boot&id=0eea39a234dc52063d14541fabcb2c64516a2328
Looks like our daily CI ran 6 hours ago just missed it:
https://travis-ci.com/ClangBuiltLinux/continuous-integration/jobs/290435629
Thanks for the fix!

--
Thanks,
~Nick Desaulniers

2020-02-24 22:18:42

by Fangrui Song

[permalink] [raw]
Subject: Re: [PATCH 2/2] x86/boot/compressed: Remove unnecessary sections from bzImage

On 2020-02-24, Arvind Sankar wrote:
>On Mon, Feb 24, 2020 at 01:28:28PM -0800, Fangrui Song wrote:
>> Hi Michael, please see my other reply on this thread: https://lkml.org/lkml/2020/2/24/47
>>
>> Synthesized sections can be matched as well. For example, SECTIONS { .pltfoo : { *(.plt) }} can rename the output section .plt to .pltfoo
>> It seems that in GNU ld, the synthesized section is associated with the
>> original object file, so it can be written as:
>>
>> SECTIONS { .pltfoo : { a.o(.plt) }}
>>
>> In lld, you need a wildcard to match the synthesized section *(.plt)
>>
>> .rela.dyn is another example.
>>
>
>With the BFD toolchain, file matching doesn't actually seem to work at
>least for .rela.dyn. I've tried playing around with it in the past and
>if you try to use file-matching to capture relocations from a particular
>input file, it just doesn't work sensibly.

I think most things are working in GNU ld...

/* a.x */
SECTIONS {
.rela.pltfoo : { a.o(.rela.plt) } /* *(.rela.plt) with lld */
.rela.dynfoo : { a.o(.rela.data) } /* *(.rela.dyn) with lld */
}

% cat <<e > a.s
.globl foo
foo:
call bar
.data
.quad quz
e
% as a.s -o a.o
% ld.bfd -T a.x a.o -shared -o a.so

2020-02-24 22:45:24

by Arvind Sankar

[permalink] [raw]
Subject: Re: [PATCH 2/2] x86/boot/compressed: Remove unnecessary sections from bzImage

On Mon, Feb 24, 2020 at 02:17:03PM -0800, Fangrui Song wrote:
> On 2020-02-24, Arvind Sankar wrote:
> >On Mon, Feb 24, 2020 at 01:28:28PM -0800, Fangrui Song wrote:
> >> Hi Michael, please see my other reply on this thread: https://lkml.org/lkml/2020/2/24/47
> >>
> >> Synthesized sections can be matched as well. For example, SECTIONS { .pltfoo : { *(.plt) }} can rename the output section .plt to .pltfoo
> >> It seems that in GNU ld, the synthesized section is associated with the
> >> original object file, so it can be written as:
> >>
> >> SECTIONS { .pltfoo : { a.o(.plt) }}
> >>
> >> In lld, you need a wildcard to match the synthesized section *(.plt)
> >>
> >> .rela.dyn is another example.
> >>
> >
> >With the BFD toolchain, file matching doesn't actually seem to work at
> >least for .rela.dyn. I've tried playing around with it in the past and
> >if you try to use file-matching to capture relocations from a particular
> >input file, it just doesn't work sensibly.
>
> I think most things are working in GNU ld...
>
> /* a.x */
> SECTIONS {
> .rela.pltfoo : { a.o(.rela.plt) } /* *(.rela.plt) with lld */
> .rela.dynfoo : { a.o(.rela.data) } /* *(.rela.dyn) with lld */
> }

The file matching doesn't do anything sensible. If you split your .data
section out into b.s, and update the linker script so it filters for
b.o(.rela.data), .rela.dynfoo doesn't get created, instead the default
.rela.dyn will contains the .data section relocation. If you keep the
filter as a.o(.rela.data), you get .rela.dynfoo, even though a.o doesn't
actually contain any .rela.data section any more.

>
> % cat <<e > a.s
> .globl foo
> foo:
> call bar
> .data
> .quad quz
> e
> % as a.s -o a.o
> % ld.bfd -T a.x a.o -shared -o a.so

2020-02-24 22:52:44

by Fangrui Song

[permalink] [raw]
Subject: Re: [PATCH 2/2] x86/boot/compressed: Remove unnecessary sections from bzImage

On 2020-02-24, Arvind Sankar wrote:
>On Mon, Feb 24, 2020 at 02:17:03PM -0800, Fangrui Song wrote:
>> On 2020-02-24, Arvind Sankar wrote:
>> >On Mon, Feb 24, 2020 at 01:28:28PM -0800, Fangrui Song wrote:
>> >> Hi Michael, please see my other reply on this thread: https://lkml.org/lkml/2020/2/24/47
>> >>
>> >> Synthesized sections can be matched as well. For example, SECTIONS { .pltfoo : { *(.plt) }} can rename the output section .plt to .pltfoo
>> >> It seems that in GNU ld, the synthesized section is associated with the
>> >> original object file, so it can be written as:
>> >>
>> >> SECTIONS { .pltfoo : { a.o(.plt) }}
>> >>
>> >> In lld, you need a wildcard to match the synthesized section *(.plt)
>> >>
>> >> .rela.dyn is another example.
>> >>
>> >
>> >With the BFD toolchain, file matching doesn't actually seem to work at
>> >least for .rela.dyn. I've tried playing around with it in the past and
>> >if you try to use file-matching to capture relocations from a particular
>> >input file, it just doesn't work sensibly.
>>
>> I think most things are working in GNU ld...
>>
>> /* a.x */
>> SECTIONS {
>> .rela.pltfoo : { a.o(.rela.plt) } /* *(.rela.plt) with lld */
>> .rela.dynfoo : { a.o(.rela.data) } /* *(.rela.dyn) with lld */
>> }
>
>The file matching doesn't do anything sensible. If you split your .data
>section out into b.s, and update the linker script so it filters for
>b.o(.rela.data), .rela.dynfoo doesn't get created, instead the default
>.rela.dyn will contains the .data section relocation. If you keep the
>filter as a.o(.rela.data), you get .rela.dynfoo, even though a.o doesn't
>actually contain any .rela.data section any more.

I raised the examples to support my viewpoint "synthesized sections can
be matched, as well as input sections."

If there is really a need (rare, not recommended) to rename output
sections only consisting of synthesized sections (e.g. .plt .rela.dyn),
for linker portability, it is better using a wildcard for the input
filename pattern.

As another example, SECTIONS { /DISCARD/ : { *(.rela.*) } } discards synthesized .rela.*

>>
>> % cat <<e > a.s
>> .globl foo
>> foo:
>> call bar
>> .data
>> .quad quz
>> e
>> % as a.s -o a.o
>> % ld.bfd -T a.x a.o -shared -o a.so

2020-02-24 23:09:24

by Arvind Sankar

[permalink] [raw]
Subject: Re: [PATCH 2/2] x86/boot/compressed: Remove unnecessary sections from bzImage

On Mon, Feb 24, 2020 at 02:50:59PM -0800, Fangrui Song wrote:
> On 2020-02-24, Arvind Sankar wrote:
> >On Mon, Feb 24, 2020 at 02:17:03PM -0800, Fangrui Song wrote:
> >> On 2020-02-24, Arvind Sankar wrote:
> >> >On Mon, Feb 24, 2020 at 01:28:28PM -0800, Fangrui Song wrote:
> >> >> Hi Michael, please see my other reply on this thread: https://lkml.org/lkml/2020/2/24/47
> >> >>
> >> >> Synthesized sections can be matched as well. For example, SECTIONS { .pltfoo : { *(.plt) }} can rename the output section .plt to .pltfoo
> >> >> It seems that in GNU ld, the synthesized section is associated with the
> >> >> original object file, so it can be written as:
> >> >>
> >> >> SECTIONS { .pltfoo : { a.o(.plt) }}
> >> >>
> >> >> In lld, you need a wildcard to match the synthesized section *(.plt)
> >> >>
> >> >> .rela.dyn is another example.
> >> >>
> >> >
> >> >With the BFD toolchain, file matching doesn't actually seem to work at
> >> >least for .rela.dyn. I've tried playing around with it in the past and
> >> >if you try to use file-matching to capture relocations from a particular
> >> >input file, it just doesn't work sensibly.
> >>
> >> I think most things are working in GNU ld...
> >>
> >> /* a.x */
> >> SECTIONS {
> >> .rela.pltfoo : { a.o(.rela.plt) } /* *(.rela.plt) with lld */
> >> .rela.dynfoo : { a.o(.rela.data) } /* *(.rela.dyn) with lld */
> >> }
> >
> >The file matching doesn't do anything sensible. If you split your .data
> >section out into b.s, and update the linker script so it filters for
> >b.o(.rela.data), .rela.dynfoo doesn't get created, instead the default
> >.rela.dyn will contains the .data section relocation. If you keep the
> >filter as a.o(.rela.data), you get .rela.dynfoo, even though a.o doesn't
> >actually contain any .rela.data section any more.
>
> I raised the examples to support my viewpoint "synthesized sections can
> be matched, as well as input sections."
>
> If there is really a need (rare, not recommended) to rename output
> sections only consisting of synthesized sections (e.g. .plt .rela.dyn),
> for linker portability, it is better using a wildcard for the input
> filename pattern.

Yep, you have to use * for the filename. My comment was only addressing
the fact that this part isn't accurate:

> >> >> It seems that in GNU ld, the synthesized section is associated with the
> >> >> original object file, so it can be written as:

I can't make head or tail of how GNU ld decides what file to associate
with the synthesized sections.

Also, even section name matching doesn't work with all synthesized
sections -- it's not possible to match .strtab or .shstrtab with GNU ld
in order to rename them, in addition to not being able to discard them,
while that does work with LLD.

>
> As another example, SECTIONS { /DISCARD/ : { *(.rela.*) } } discards synthesized .rela.*
>
> >>
> >> % cat <<e > a.s
> >> .globl foo
> >> foo:
> >> call bar
> >> .data
> >> .quad quz
> >> e
> >> % as a.s -o a.o
> >> % ld.bfd -T a.x a.o -shared -o a.so

2020-02-24 23:22:10

by Arvind Sankar

[permalink] [raw]
Subject: [PATCH v2 2/2] arch/x86: Drop unneeded linker script discard of .eh_frame

Now that we don't generate .eh_frame sections for the files in setup.elf
and realmode.elf, the linker scripts don't need the /DISCARD/ any more.

Remove the one in the main kernel linker script as well, since there are
no .eh_frame sections already, and fix up a comment referencing .eh_frame.

Update the comment in asm/dwarf2.h referring to .eh_frame so it continues
to make sense, as well as being more specific.

Signed-off-by: Arvind Sankar <[email protected]>
---
arch/x86/boot/compressed/vmlinux.lds.S | 5 -----
arch/x86/boot/setup.ld | 1 -
arch/x86/include/asm/dwarf2.h | 4 ++--
arch/x86/kernel/vmlinux.lds.S | 7 ++-----
arch/x86/realmode/rm/realmode.lds.S | 1 -
5 files changed, 4 insertions(+), 14 deletions(-)

diff --git a/arch/x86/boot/compressed/vmlinux.lds.S b/arch/x86/boot/compressed/vmlinux.lds.S
index 469dcf800a2c..508cfa6828c5 100644
--- a/arch/x86/boot/compressed/vmlinux.lds.S
+++ b/arch/x86/boot/compressed/vmlinux.lds.S
@@ -73,9 +73,4 @@ SECTIONS
#endif
. = ALIGN(PAGE_SIZE); /* keep ZO size page aligned */
_end = .;
-
- /* Discard .eh_frame to save some space */
- /DISCARD/ : {
- *(.eh_frame)
- }
}
diff --git a/arch/x86/boot/setup.ld b/arch/x86/boot/setup.ld
index 3da1c37c6dd5..24c95522f231 100644
--- a/arch/x86/boot/setup.ld
+++ b/arch/x86/boot/setup.ld
@@ -52,7 +52,6 @@ SECTIONS
_end = .;

/DISCARD/ : {
- *(.eh_frame)
*(.note*)
}

diff --git a/arch/x86/include/asm/dwarf2.h b/arch/x86/include/asm/dwarf2.h
index ae391f609840..f71a0cce9373 100644
--- a/arch/x86/include/asm/dwarf2.h
+++ b/arch/x86/include/asm/dwarf2.h
@@ -42,8 +42,8 @@
* Emit CFI data in .debug_frame sections, not .eh_frame sections.
* The latter we currently just discard since we don't do DWARF
* unwinding at runtime. So only the offline DWARF information is
- * useful to anyone. Note we should not use this directive if
- * vmlinux.lds.S gets changed so it doesn't discard .eh_frame.
+ * useful to anyone. Note we should not use this directive if we
+ * ever decide to enable DWARF unwinding at runtime.
*/
.cfi_sections .debug_frame
#else
diff --git a/arch/x86/kernel/vmlinux.lds.S b/arch/x86/kernel/vmlinux.lds.S
index e3296aa028fe..5cab3a29adcb 100644
--- a/arch/x86/kernel/vmlinux.lds.S
+++ b/arch/x86/kernel/vmlinux.lds.S
@@ -313,8 +313,8 @@ SECTIONS

. = ALIGN(8);
/*
- * .exit.text is discard at runtime, not link time, to deal with
- * references from .altinstructions and .eh_frame
+ * .exit.text is discarded at runtime, not link time, to deal with
+ * references from .altinstructions
*/
.exit.text : AT(ADDR(.exit.text) - LOAD_OFFSET) {
EXIT_TEXT
@@ -412,9 +412,6 @@ SECTIONS
DWARF_DEBUG

DISCARDS
- /DISCARD/ : {
- *(.eh_frame)
- }
}


diff --git a/arch/x86/realmode/rm/realmode.lds.S b/arch/x86/realmode/rm/realmode.lds.S
index 64d135d1ee63..63aa51875ba0 100644
--- a/arch/x86/realmode/rm/realmode.lds.S
+++ b/arch/x86/realmode/rm/realmode.lds.S
@@ -71,7 +71,6 @@ SECTIONS
/DISCARD/ : {
*(.note*)
*(.debug*)
- *(.eh_frame*)
}

#include "pasyms.h"
--
2.24.1

2020-02-24 23:22:43

by Arvind Sankar

[permalink] [raw]
Subject: [PATCH v2 1/2] arch/x86: Use -fno-asynchronous-unwind-tables to suppress .eh_frame sections

While discussing a patch to discard .eh_frame from the compressed
vmlinux using the linker script, Fangrui Song pointed out [1] that these
sections shouldn't exist in the first place because arch/x86/Makefile
uses -fno-asynchronous-unwind-tables.

It turns out this is because the Makefiles used to build the compressed
kernel redefine KBUILD_CFLAGS, dropping this flag.

Add the flag to the Makefile for the compressed kernel, as well as the
EFI stub Makefile to fix this.

Also add the flag to boot/Makefile and realmode/rm/Makefile so that the
kernel's boot code (boot/setup.elf) and realmode trampoline
(realmode/rm/realmode.elf) won't be compiled with .eh_frame sections,
since their linker scripts also just discard it.

Signed-off-by: Arvind Sankar <[email protected]>
Suggested-By: Fangrui Song <[email protected]>
[1] https://lore.kernel.org/lkml/[email protected]/
---
arch/x86/boot/Makefile | 1 +
arch/x86/boot/compressed/Makefile | 1 +
arch/x86/realmode/rm/Makefile | 1 +
drivers/firmware/efi/libstub/Makefile | 3 ++-
4 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/arch/x86/boot/Makefile b/arch/x86/boot/Makefile
index 012b82fc8617..24f011e0adf1 100644
--- a/arch/x86/boot/Makefile
+++ b/arch/x86/boot/Makefile
@@ -68,6 +68,7 @@ clean-files += cpustr.h
KBUILD_CFLAGS := $(REALMODE_CFLAGS) -D_SETUP
KBUILD_AFLAGS := $(KBUILD_CFLAGS) -D__ASSEMBLY__
KBUILD_CFLAGS += $(call cc-option,-fmacro-prefix-map=$(srctree)/=)
+KBUILD_CFLAGS += -fno-asynchronous-unwind-tables
GCOV_PROFILE := n
UBSAN_SANITIZE := n

diff --git a/arch/x86/boot/compressed/Makefile b/arch/x86/boot/compressed/Makefile
index 26050ae0b27e..c33111341325 100644
--- a/arch/x86/boot/compressed/Makefile
+++ b/arch/x86/boot/compressed/Makefile
@@ -39,6 +39,7 @@ KBUILD_CFLAGS += $(call cc-disable-warning, address-of-packed-member)
KBUILD_CFLAGS += $(call cc-disable-warning, gnu)
KBUILD_CFLAGS += -Wno-pointer-sign
KBUILD_CFLAGS += $(call cc-option,-fmacro-prefix-map=$(srctree)/=)
+KBUILD_CFLAGS += -fno-asynchronous-unwind-tables

KBUILD_AFLAGS := $(KBUILD_CFLAGS) -D__ASSEMBLY__
GCOV_PROFILE := n
diff --git a/arch/x86/realmode/rm/Makefile b/arch/x86/realmode/rm/Makefile
index 99b6332ba540..b11ec5d8f8ac 100644
--- a/arch/x86/realmode/rm/Makefile
+++ b/arch/x86/realmode/rm/Makefile
@@ -71,5 +71,6 @@ $(obj)/realmode.relocs: $(obj)/realmode.elf FORCE
KBUILD_CFLAGS := $(REALMODE_CFLAGS) -D_SETUP -D_WAKEUP \
-I$(srctree)/arch/x86/boot
KBUILD_AFLAGS := $(KBUILD_CFLAGS) -D__ASSEMBLY__
+KBUILD_CFLAGS += -fno-asynchronous-unwind-tables
GCOV_PROFILE := n
UBSAN_SANITIZE := n
diff --git a/drivers/firmware/efi/libstub/Makefile b/drivers/firmware/efi/libstub/Makefile
index 98a81576213d..a1140c4ee478 100644
--- a/drivers/firmware/efi/libstub/Makefile
+++ b/drivers/firmware/efi/libstub/Makefile
@@ -12,7 +12,8 @@ cflags-$(CONFIG_X86) += -m$(BITS) -D__KERNEL__ -O2 \
-mno-mmx -mno-sse -fshort-wchar \
-Wno-pointer-sign \
$(call cc-disable-warning, address-of-packed-member) \
- $(call cc-disable-warning, gnu)
+ $(call cc-disable-warning, gnu) \
+ -fno-asynchronous-unwind-tables

# arm64 uses the full KBUILD_CFLAGS so it's necessary to explicitly
# disable the stackleak plugin
--
2.24.1

2020-02-24 23:22:48

by Arvind Sankar

[permalink] [raw]
Subject: [PATCH v2 0/2] Stop generating .eh_frame sections

In three places in the x86 kernel we are currently generating .eh_frame
sections only to discard them later via linker script. This is in the
boot code (setup.elf), the realmode trampoline (realmode.elf) and the
compressed kernel.

Implement Fangrui and Nick's suggestion [1] to fix KBUILD_CFLAGS by
adding -fno-asynchronous-unwind-tables to avoid generating .eh_frame
sections in the first place, rather than discarding it in the linker
script.

Changes from v1:

Rebase on top of tip:x86/boot and include reverting the addition of
.eh_frame discard in compressed/vmlinux.lds.S.
Fix up a comment that refers to .eh_frame, pointed out by Nick.

Arvind Sankar (2):
arch/x86: Use -fno-asynchronous-unwind-tables to suppress .eh_frame sections
arch/x86: Drop unneeded linker script discard of .eh_frame

arch/x86/boot/Makefile | 1 +
arch/x86/boot/compressed/Makefile | 1 +
arch/x86/boot/compressed/vmlinux.lds.S | 5 -----
arch/x86/boot/setup.ld | 1 -
arch/x86/include/asm/dwarf2.h | 4 ++--
arch/x86/kernel/vmlinux.lds.S | 7 ++-----
arch/x86/realmode/rm/Makefile | 1 +
arch/x86/realmode/rm/realmode.lds.S | 1 -
drivers/firmware/efi/libstub/Makefile | 3 ++-
9 files changed, 9 insertions(+), 15 deletions(-)

--
2.24.1

2020-02-24 23:28:25

by Nathan Chancellor

[permalink] [raw]
Subject: Re: [PATCH v2 2/2] arch/x86: Drop unneeded linker script discard of .eh_frame

On Mon, Feb 24, 2020 at 06:21:29PM -0500, Arvind Sankar wrote:
> Now that we don't generate .eh_frame sections for the files in setup.elf
> and realmode.elf, the linker scripts don't need the /DISCARD/ any more.
>
> Remove the one in the main kernel linker script as well, since there are
> no .eh_frame sections already, and fix up a comment referencing .eh_frame.
>
> Update the comment in asm/dwarf2.h referring to .eh_frame so it continues
> to make sense, as well as being more specific.
>
> Signed-off-by: Arvind Sankar <[email protected]>

My previous review conments still stand,

Reviewed-by: Nathan Chancellor <[email protected]>
Tested-by: Nathan Chancellor <[email protected]>

2020-02-24 23:30:08

by Nathan Chancellor

[permalink] [raw]
Subject: Re: [PATCH v2 1/2] arch/x86: Use -fno-asynchronous-unwind-tables to suppress .eh_frame sections

On Mon, Feb 24, 2020 at 06:21:28PM -0500, Arvind Sankar wrote:
> While discussing a patch to discard .eh_frame from the compressed
> vmlinux using the linker script, Fangrui Song pointed out [1] that these
> sections shouldn't exist in the first place because arch/x86/Makefile
> uses -fno-asynchronous-unwind-tables.
>
> It turns out this is because the Makefiles used to build the compressed
> kernel redefine KBUILD_CFLAGS, dropping this flag.
>
> Add the flag to the Makefile for the compressed kernel, as well as the
> EFI stub Makefile to fix this.
>
> Also add the flag to boot/Makefile and realmode/rm/Makefile so that the
> kernel's boot code (boot/setup.elf) and realmode trampoline
> (realmode/rm/realmode.elf) won't be compiled with .eh_frame sections,
> since their linker scripts also just discard it.
>
> Signed-off-by: Arvind Sankar <[email protected]>
> Suggested-By: Fangrui Song <[email protected]>
> [1] https://lore.kernel.org/lkml/[email protected]/

My previous review conments still stand,

Reviewed-by: Nathan Chancellor <[email protected]>
Tested-by: Nathan Chancellor <[email protected]>

2020-02-24 23:31:07

by Nick Desaulniers

[permalink] [raw]
Subject: Re: [PATCH v2 1/2] arch/x86: Use -fno-asynchronous-unwind-tables to suppress .eh_frame sections

On Mon, Feb 24, 2020 at 3:21 PM Arvind Sankar <[email protected]> wrote:
>
> While discussing a patch to discard .eh_frame from the compressed
> vmlinux using the linker script, Fangrui Song pointed out [1] that these
> sections shouldn't exist in the first place because arch/x86/Makefile
> uses -fno-asynchronous-unwind-tables.
>
> It turns out this is because the Makefiles used to build the compressed
> kernel redefine KBUILD_CFLAGS, dropping this flag.
>
> Add the flag to the Makefile for the compressed kernel, as well as the
> EFI stub Makefile to fix this.
>
> Also add the flag to boot/Makefile and realmode/rm/Makefile so that the
> kernel's boot code (boot/setup.elf) and realmode trampoline
> (realmode/rm/realmode.elf) won't be compiled with .eh_frame sections,
> since their linker scripts also just discard it.
>
> Signed-off-by: Arvind Sankar <[email protected]>
> Suggested-By: Fangrui Song <[email protected]>

Reviewed-by: Nick Desaulniers <[email protected]>

> [1] https://lore.kernel.org/lkml/[email protected]/
> ---
> arch/x86/boot/Makefile | 1 +
> arch/x86/boot/compressed/Makefile | 1 +
> arch/x86/realmode/rm/Makefile | 1 +
> drivers/firmware/efi/libstub/Makefile | 3 ++-
> 4 files changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/boot/Makefile b/arch/x86/boot/Makefile
> index 012b82fc8617..24f011e0adf1 100644
> --- a/arch/x86/boot/Makefile
> +++ b/arch/x86/boot/Makefile
> @@ -68,6 +68,7 @@ clean-files += cpustr.h
> KBUILD_CFLAGS := $(REALMODE_CFLAGS) -D_SETUP
> KBUILD_AFLAGS := $(KBUILD_CFLAGS) -D__ASSEMBLY__
> KBUILD_CFLAGS += $(call cc-option,-fmacro-prefix-map=$(srctree)/=)
> +KBUILD_CFLAGS += -fno-asynchronous-unwind-tables
> GCOV_PROFILE := n
> UBSAN_SANITIZE := n
>
> diff --git a/arch/x86/boot/compressed/Makefile b/arch/x86/boot/compressed/Makefile
> index 26050ae0b27e..c33111341325 100644
> --- a/arch/x86/boot/compressed/Makefile
> +++ b/arch/x86/boot/compressed/Makefile
> @@ -39,6 +39,7 @@ KBUILD_CFLAGS += $(call cc-disable-warning, address-of-packed-member)
> KBUILD_CFLAGS += $(call cc-disable-warning, gnu)
> KBUILD_CFLAGS += -Wno-pointer-sign
> KBUILD_CFLAGS += $(call cc-option,-fmacro-prefix-map=$(srctree)/=)
> +KBUILD_CFLAGS += -fno-asynchronous-unwind-tables
>
> KBUILD_AFLAGS := $(KBUILD_CFLAGS) -D__ASSEMBLY__
> GCOV_PROFILE := n
> diff --git a/arch/x86/realmode/rm/Makefile b/arch/x86/realmode/rm/Makefile
> index 99b6332ba540..b11ec5d8f8ac 100644
> --- a/arch/x86/realmode/rm/Makefile
> +++ b/arch/x86/realmode/rm/Makefile
> @@ -71,5 +71,6 @@ $(obj)/realmode.relocs: $(obj)/realmode.elf FORCE
> KBUILD_CFLAGS := $(REALMODE_CFLAGS) -D_SETUP -D_WAKEUP \
> -I$(srctree)/arch/x86/boot
> KBUILD_AFLAGS := $(KBUILD_CFLAGS) -D__ASSEMBLY__
> +KBUILD_CFLAGS += -fno-asynchronous-unwind-tables
> GCOV_PROFILE := n
> UBSAN_SANITIZE := n
> diff --git a/drivers/firmware/efi/libstub/Makefile b/drivers/firmware/efi/libstub/Makefile
> index 98a81576213d..a1140c4ee478 100644
> --- a/drivers/firmware/efi/libstub/Makefile
> +++ b/drivers/firmware/efi/libstub/Makefile
> @@ -12,7 +12,8 @@ cflags-$(CONFIG_X86) += -m$(BITS) -D__KERNEL__ -O2 \
> -mno-mmx -mno-sse -fshort-wchar \
> -Wno-pointer-sign \
> $(call cc-disable-warning, address-of-packed-member) \
> - $(call cc-disable-warning, gnu)
> + $(call cc-disable-warning, gnu) \
> + -fno-asynchronous-unwind-tables
>
> # arm64 uses the full KBUILD_CFLAGS so it's necessary to explicitly
> # disable the stackleak plugin
> --
> 2.24.1
>


--
Thanks,
~Nick Desaulniers

2020-02-24 23:34:29

by Nick Desaulniers

[permalink] [raw]
Subject: Re: [PATCH v2 2/2] arch/x86: Drop unneeded linker script discard of .eh_frame

On Mon, Feb 24, 2020 at 3:21 PM Arvind Sankar <[email protected]> wrote:
>
> Now that we don't generate .eh_frame sections for the files in setup.elf
> and realmode.elf, the linker scripts don't need the /DISCARD/ any more.
>
> Remove the one in the main kernel linker script as well, since there are
> no .eh_frame sections already, and fix up a comment referencing .eh_frame.
>
> Update the comment in asm/dwarf2.h referring to .eh_frame so it continues
> to make sense, as well as being more specific.
>
> Signed-off-by: Arvind Sankar <[email protected]>

Thanks for cleaning up the comments.
Reviewed-by: Nick Desaulniers <[email protected]>

> ---
> arch/x86/boot/compressed/vmlinux.lds.S | 5 -----
> arch/x86/boot/setup.ld | 1 -
> arch/x86/include/asm/dwarf2.h | 4 ++--
> arch/x86/kernel/vmlinux.lds.S | 7 ++-----
> arch/x86/realmode/rm/realmode.lds.S | 1 -
> 5 files changed, 4 insertions(+), 14 deletions(-)
>
> diff --git a/arch/x86/boot/compressed/vmlinux.lds.S b/arch/x86/boot/compressed/vmlinux.lds.S
> index 469dcf800a2c..508cfa6828c5 100644
> --- a/arch/x86/boot/compressed/vmlinux.lds.S
> +++ b/arch/x86/boot/compressed/vmlinux.lds.S
> @@ -73,9 +73,4 @@ SECTIONS
> #endif
> . = ALIGN(PAGE_SIZE); /* keep ZO size page aligned */
> _end = .;
> -
> - /* Discard .eh_frame to save some space */
> - /DISCARD/ : {
> - *(.eh_frame)
> - }
> }
> diff --git a/arch/x86/boot/setup.ld b/arch/x86/boot/setup.ld
> index 3da1c37c6dd5..24c95522f231 100644
> --- a/arch/x86/boot/setup.ld
> +++ b/arch/x86/boot/setup.ld
> @@ -52,7 +52,6 @@ SECTIONS
> _end = .;
>
> /DISCARD/ : {
> - *(.eh_frame)
> *(.note*)
> }
>
> diff --git a/arch/x86/include/asm/dwarf2.h b/arch/x86/include/asm/dwarf2.h
> index ae391f609840..f71a0cce9373 100644
> --- a/arch/x86/include/asm/dwarf2.h
> +++ b/arch/x86/include/asm/dwarf2.h
> @@ -42,8 +42,8 @@
> * Emit CFI data in .debug_frame sections, not .eh_frame sections.
> * The latter we currently just discard since we don't do DWARF
> * unwinding at runtime. So only the offline DWARF information is
> - * useful to anyone. Note we should not use this directive if
> - * vmlinux.lds.S gets changed so it doesn't discard .eh_frame.
> + * useful to anyone. Note we should not use this directive if we
> + * ever decide to enable DWARF unwinding at runtime.
> */
> .cfi_sections .debug_frame
> #else
> diff --git a/arch/x86/kernel/vmlinux.lds.S b/arch/x86/kernel/vmlinux.lds.S
> index e3296aa028fe..5cab3a29adcb 100644
> --- a/arch/x86/kernel/vmlinux.lds.S
> +++ b/arch/x86/kernel/vmlinux.lds.S
> @@ -313,8 +313,8 @@ SECTIONS
>
> . = ALIGN(8);
> /*
> - * .exit.text is discard at runtime, not link time, to deal with
> - * references from .altinstructions and .eh_frame
> + * .exit.text is discarded at runtime, not link time, to deal with
> + * references from .altinstructions
> */
> .exit.text : AT(ADDR(.exit.text) - LOAD_OFFSET) {
> EXIT_TEXT
> @@ -412,9 +412,6 @@ SECTIONS
> DWARF_DEBUG
>
> DISCARDS
> - /DISCARD/ : {
> - *(.eh_frame)
> - }
> }
>
>
> diff --git a/arch/x86/realmode/rm/realmode.lds.S b/arch/x86/realmode/rm/realmode.lds.S
> index 64d135d1ee63..63aa51875ba0 100644
> --- a/arch/x86/realmode/rm/realmode.lds.S
> +++ b/arch/x86/realmode/rm/realmode.lds.S
> @@ -71,7 +71,6 @@ SECTIONS
> /DISCARD/ : {
> *(.note*)
> *(.debug*)
> - *(.eh_frame*)
> }
>
> #include "pasyms.h"
> --
> 2.24.1
>


--
Thanks,
~Nick Desaulniers

2020-02-25 04:11:35

by Kees Cook

[permalink] [raw]
Subject: Re: [PATCH v2 2/2] arch/x86: Drop unneeded linker script discard of .eh_frame

On Mon, Feb 24, 2020 at 06:21:29PM -0500, Arvind Sankar wrote:
> Now that we don't generate .eh_frame sections for the files in setup.elf
> and realmode.elf, the linker scripts don't need the /DISCARD/ any more.
>
> Remove the one in the main kernel linker script as well, since there are
> no .eh_frame sections already, and fix up a comment referencing .eh_frame.
>
> Update the comment in asm/dwarf2.h referring to .eh_frame so it continues
> to make sense, as well as being more specific.
>
> Signed-off-by: Arvind Sankar <[email protected]>

Reviewed-by: Kees Cook <[email protected]>

-Kees

> ---
> arch/x86/boot/compressed/vmlinux.lds.S | 5 -----
> arch/x86/boot/setup.ld | 1 -
> arch/x86/include/asm/dwarf2.h | 4 ++--
> arch/x86/kernel/vmlinux.lds.S | 7 ++-----
> arch/x86/realmode/rm/realmode.lds.S | 1 -
> 5 files changed, 4 insertions(+), 14 deletions(-)
>
> diff --git a/arch/x86/boot/compressed/vmlinux.lds.S b/arch/x86/boot/compressed/vmlinux.lds.S
> index 469dcf800a2c..508cfa6828c5 100644
> --- a/arch/x86/boot/compressed/vmlinux.lds.S
> +++ b/arch/x86/boot/compressed/vmlinux.lds.S
> @@ -73,9 +73,4 @@ SECTIONS
> #endif
> . = ALIGN(PAGE_SIZE); /* keep ZO size page aligned */
> _end = .;
> -
> - /* Discard .eh_frame to save some space */
> - /DISCARD/ : {
> - *(.eh_frame)
> - }
> }
> diff --git a/arch/x86/boot/setup.ld b/arch/x86/boot/setup.ld
> index 3da1c37c6dd5..24c95522f231 100644
> --- a/arch/x86/boot/setup.ld
> +++ b/arch/x86/boot/setup.ld
> @@ -52,7 +52,6 @@ SECTIONS
> _end = .;
>
> /DISCARD/ : {
> - *(.eh_frame)
> *(.note*)
> }
>
> diff --git a/arch/x86/include/asm/dwarf2.h b/arch/x86/include/asm/dwarf2.h
> index ae391f609840..f71a0cce9373 100644
> --- a/arch/x86/include/asm/dwarf2.h
> +++ b/arch/x86/include/asm/dwarf2.h
> @@ -42,8 +42,8 @@
> * Emit CFI data in .debug_frame sections, not .eh_frame sections.
> * The latter we currently just discard since we don't do DWARF
> * unwinding at runtime. So only the offline DWARF information is
> - * useful to anyone. Note we should not use this directive if
> - * vmlinux.lds.S gets changed so it doesn't discard .eh_frame.
> + * useful to anyone. Note we should not use this directive if we
> + * ever decide to enable DWARF unwinding at runtime.
> */
> .cfi_sections .debug_frame
> #else
> diff --git a/arch/x86/kernel/vmlinux.lds.S b/arch/x86/kernel/vmlinux.lds.S
> index e3296aa028fe..5cab3a29adcb 100644
> --- a/arch/x86/kernel/vmlinux.lds.S
> +++ b/arch/x86/kernel/vmlinux.lds.S
> @@ -313,8 +313,8 @@ SECTIONS
>
> . = ALIGN(8);
> /*
> - * .exit.text is discard at runtime, not link time, to deal with
> - * references from .altinstructions and .eh_frame
> + * .exit.text is discarded at runtime, not link time, to deal with
> + * references from .altinstructions
> */
> .exit.text : AT(ADDR(.exit.text) - LOAD_OFFSET) {
> EXIT_TEXT
> @@ -412,9 +412,6 @@ SECTIONS
> DWARF_DEBUG
>
> DISCARDS
> - /DISCARD/ : {
> - *(.eh_frame)
> - }
> }
>
>
> diff --git a/arch/x86/realmode/rm/realmode.lds.S b/arch/x86/realmode/rm/realmode.lds.S
> index 64d135d1ee63..63aa51875ba0 100644
> --- a/arch/x86/realmode/rm/realmode.lds.S
> +++ b/arch/x86/realmode/rm/realmode.lds.S
> @@ -71,7 +71,6 @@ SECTIONS
> /DISCARD/ : {
> *(.note*)
> *(.debug*)
> - *(.eh_frame*)
> }
>
> #include "pasyms.h"
> --
> 2.24.1
>

--
Kees Cook

2020-02-25 04:12:06

by Kees Cook

[permalink] [raw]
Subject: Re: [PATCH v2 1/2] arch/x86: Use -fno-asynchronous-unwind-tables to suppress .eh_frame sections

On Mon, Feb 24, 2020 at 06:21:28PM -0500, Arvind Sankar wrote:
> While discussing a patch to discard .eh_frame from the compressed
> vmlinux using the linker script, Fangrui Song pointed out [1] that these
> sections shouldn't exist in the first place because arch/x86/Makefile
> uses -fno-asynchronous-unwind-tables.
>
> It turns out this is because the Makefiles used to build the compressed
> kernel redefine KBUILD_CFLAGS, dropping this flag.
>
> Add the flag to the Makefile for the compressed kernel, as well as the
> EFI stub Makefile to fix this.
>
> Also add the flag to boot/Makefile and realmode/rm/Makefile so that the
> kernel's boot code (boot/setup.elf) and realmode trampoline
> (realmode/rm/realmode.elf) won't be compiled with .eh_frame sections,
> since their linker scripts also just discard it.
>
> Signed-off-by: Arvind Sankar <[email protected]>

Thanks for sorting this out. I think it's much cleaner than adding it to
DISCARD. :)

Reviewed-by: Kees Cook <[email protected]>

-Kees

> Suggested-By: Fangrui Song <[email protected]>
> [1] https://lore.kernel.org/lkml/[email protected]/
> ---
> arch/x86/boot/Makefile | 1 +
> arch/x86/boot/compressed/Makefile | 1 +
> arch/x86/realmode/rm/Makefile | 1 +
> drivers/firmware/efi/libstub/Makefile | 3 ++-
> 4 files changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/boot/Makefile b/arch/x86/boot/Makefile
> index 012b82fc8617..24f011e0adf1 100644
> --- a/arch/x86/boot/Makefile
> +++ b/arch/x86/boot/Makefile
> @@ -68,6 +68,7 @@ clean-files += cpustr.h
> KBUILD_CFLAGS := $(REALMODE_CFLAGS) -D_SETUP
> KBUILD_AFLAGS := $(KBUILD_CFLAGS) -D__ASSEMBLY__
> KBUILD_CFLAGS += $(call cc-option,-fmacro-prefix-map=$(srctree)/=)
> +KBUILD_CFLAGS += -fno-asynchronous-unwind-tables
> GCOV_PROFILE := n
> UBSAN_SANITIZE := n
>
> diff --git a/arch/x86/boot/compressed/Makefile b/arch/x86/boot/compressed/Makefile
> index 26050ae0b27e..c33111341325 100644
> --- a/arch/x86/boot/compressed/Makefile
> +++ b/arch/x86/boot/compressed/Makefile
> @@ -39,6 +39,7 @@ KBUILD_CFLAGS += $(call cc-disable-warning, address-of-packed-member)
> KBUILD_CFLAGS += $(call cc-disable-warning, gnu)
> KBUILD_CFLAGS += -Wno-pointer-sign
> KBUILD_CFLAGS += $(call cc-option,-fmacro-prefix-map=$(srctree)/=)
> +KBUILD_CFLAGS += -fno-asynchronous-unwind-tables
>
> KBUILD_AFLAGS := $(KBUILD_CFLAGS) -D__ASSEMBLY__
> GCOV_PROFILE := n
> diff --git a/arch/x86/realmode/rm/Makefile b/arch/x86/realmode/rm/Makefile
> index 99b6332ba540..b11ec5d8f8ac 100644
> --- a/arch/x86/realmode/rm/Makefile
> +++ b/arch/x86/realmode/rm/Makefile
> @@ -71,5 +71,6 @@ $(obj)/realmode.relocs: $(obj)/realmode.elf FORCE
> KBUILD_CFLAGS := $(REALMODE_CFLAGS) -D_SETUP -D_WAKEUP \
> -I$(srctree)/arch/x86/boot
> KBUILD_AFLAGS := $(KBUILD_CFLAGS) -D__ASSEMBLY__
> +KBUILD_CFLAGS += -fno-asynchronous-unwind-tables
> GCOV_PROFILE := n
> UBSAN_SANITIZE := n
> diff --git a/drivers/firmware/efi/libstub/Makefile b/drivers/firmware/efi/libstub/Makefile
> index 98a81576213d..a1140c4ee478 100644
> --- a/drivers/firmware/efi/libstub/Makefile
> +++ b/drivers/firmware/efi/libstub/Makefile
> @@ -12,7 +12,8 @@ cflags-$(CONFIG_X86) += -m$(BITS) -D__KERNEL__ -O2 \
> -mno-mmx -mno-sse -fshort-wchar \
> -Wno-pointer-sign \
> $(call cc-disable-warning, address-of-packed-member) \
> - $(call cc-disable-warning, gnu)
> + $(call cc-disable-warning, gnu) \
> + -fno-asynchronous-unwind-tables
>
> # arm64 uses the full KBUILD_CFLAGS so it's necessary to explicitly
> # disable the stackleak plugin
> --
> 2.24.1
>

--
Kees Cook

2020-02-25 05:36:27

by Kees Cook

[permalink] [raw]
Subject: Re: --orphan-handling=warn (was Re: [PATCH 2/2] x86/boot/compressed: Remove unnecessary sections) from bzImage

On Sat, Feb 22, 2020 at 03:20:39PM -0800, Nick Desaulniers wrote:
> Kees is working on a series to just be explicit about what sections
> are ordered where, and what's discarded, which should better handle
> incompatibilities between linkers in regards to orphan section
> placement and "what does `*` mean." Kees, that series can't come soon

So, with my series[1] applied, ld.bfd builds clean. With ld.lld, I get a
TON of warnings, such as:

(.bss.rel.ro) is being placed in '.bss.rel.ro'
(.iplt) is being placed in '.iplt'
(.plt) is being placed in '.plt'
(.rela.altinstr_aux) is being placed in '.rela.altinstr_aux'
(.rela.altinstr_replacement) is being placed in
'.rela.altinstr_replacement'
(.rela.altinstructions) is being placed in '.rela.altinstructions'
(.rela.apicdrivers) is being placed in '.rela.apicdrivers'
(.rela__bug_table) is being placed in '.rela__bug_table'
(.rela.con_initcall.init) is being placed in '.rela.init.data'
(.rela.cpuidle.text) is being placed in '.rela.text'
(.rela.data..cacheline_aligned) is being placed in '.rela.data'
(.rela.data) is being placed in '.rela.data'
(.rela.data..percpu) is being placed in '.rela.data..percpu'
(.rela.data..percpu..page_aligned) is being placed in '.rela.data..percpu'
...

But as you can see in the /DISCARD/, these (and all the others), should
be getting caught:

/DISCARD/ : {
*(.eh_frame)
+ *(.rela.*) *(.rela_*)
+ *(.rel.*) *(.rel_*)
+ *(.got) *(.got.*)
+ *(.igot.*) *(.iplt)
}

I don't understand what's happening here. I haven't minimized this case
nor opened an lld bug yet.

> enough. ;) (I think it's intended to help "fine grain" (per function)
> KASLR). More comments in the other thread.

Actually, it's rather opposed to the FGKASLR series, as for that, I need
some kind of linker script directive like this:

/PASSTHRU/ : {
*(.text.*)
}

Where "PASSTHRU" would create a 1-to-1 input-section to output-section
with the same name, flags, etc.

ld.bfd's handling of orphan sections named .text.* is to put them each
as a separate output section, after the existing .text output section.

ld.lld's handling of orphan sections named .text.* is to put them into
the .text output section.

For FGKASLR (as it is currently implemented[2]), the sections need to be
individually named output sections (as bfd does it). *However*, with the
"warn on orphans" patch, FGKASLR's intentional orphaning will backfire
(I guess the warning could be turned off, but I'd like lld to handle
FGKASLR at some point.)

Note that cheating and doing the 1-to-1 mapping by handy with a 40,000
entry linker script ... made ld.lld take about 15 minutes to do the
final link. :(

> Taken from the Zen of Python, but in regards to sections in linker
> scripts, "explicit is better than implicit."

Totally agreed. I just hope there's a good solution for this PASSTHRU
idea...

-Kees

[1] https://git.kernel.org/pub/scm/linux/kernel/git/kees/linux.git/log/?h=linker/orphans/x86-arm
[2]
https://github.com/kaccardi/linux/commit/127111e8c6170a130d8d12d73728e74acbe05e13

--
Kees Cook

2020-02-25 17:08:01

by Arvind Sankar

[permalink] [raw]
Subject: Re: --orphan-handling=warn

On Mon, Feb 24, 2020 at 09:35:04PM -0800, Kees Cook wrote:
> On Sat, Feb 22, 2020 at 03:20:39PM -0800, Nick Desaulniers wrote:
> > Kees is working on a series to just be explicit about what sections
> > are ordered where, and what's discarded, which should better handle
> > incompatibilities between linkers in regards to orphan section
> > placement and "what does `*` mean." Kees, that series can't come soon
>
> So, with my series[1] applied, ld.bfd builds clean. With ld.lld, I get a
> TON of warnings, such as:
>
> (.bss.rel.ro) is being placed in '.bss.rel.ro'
> (.iplt) is being placed in '.iplt'
> (.plt) is being placed in '.plt'
> (.rela.altinstr_aux) is being placed in '.rela.altinstr_aux'
> (.rela.altinstr_replacement) is being placed in
> '.rela.altinstr_replacement'
> (.rela.altinstructions) is being placed in '.rela.altinstructions'
> (.rela.apicdrivers) is being placed in '.rela.apicdrivers'
> (.rela__bug_table) is being placed in '.rela__bug_table'
> (.rela.con_initcall.init) is being placed in '.rela.init.data'
> (.rela.cpuidle.text) is being placed in '.rela.text'
> (.rela.data..cacheline_aligned) is being placed in '.rela.data'
> (.rela.data) is being placed in '.rela.data'
> (.rela.data..percpu) is being placed in '.rela.data..percpu'
> (.rela.data..percpu..page_aligned) is being placed in '.rela.data..percpu'
> ...
>
> But as you can see in the /DISCARD/, these (and all the others), should
> be getting caught:
>
> /DISCARD/ : {
> *(.eh_frame)
> + *(.rela.*) *(.rela_*)
> + *(.rel.*) *(.rel_*)
> + *(.got) *(.got.*)
> + *(.igot.*) *(.iplt)
> }
>
> I don't understand what's happening here. I haven't minimized this case
> nor opened an lld bug yet.
>

At least for the relocation sections, they cannot actually be discarded
because we build vmlinux with --emit-relocs (at least for x86).

However, attempting to collect them via for eg
.rela.all : { *(.rela.*) }
while using --emit-relocs works for neither ld.bfd nor ld.lld, just like
discarding doesn't work. The original .rela.<section_name> sections are
always output. ld.bfd doesn't warn, but ld.lld warns that these are all
orphan sections. I think these warnings would have to be suppressed to
make --orphan-handling=warn useful with --emit-relocs.

It seems like lld needs even .symtab, .strtab and .shstrtab (and
.symtab_shndx -- not sure what this is as it doesn't actually appear in
output) to be explicitly in the linker script to avoid warning about
them. Maybe these should be suppressed as well?

For .plt I think you just forgot to discard it. I didn't actually get a
warning for iplt, and the one for plt goes away if you do add it to
discards.

Not sure why there's a .bss.rel.ro section in the first place -- surely
the bss can't have any relocations? But that can be removed via discard
as well at least.

Subject: [tip: x86/boot] x86/vmlinux: Drop unneeded linker script discard of .eh_frame

The following commit has been merged into the x86/boot branch of tip:

Commit-ID: 6f8f0dc980028e98ae339876a8403edae4d20e39
Gitweb: https://git.kernel.org/tip/6f8f0dc980028e98ae339876a8403edae4d20e39
Author: Arvind Sankar <[email protected]>
AuthorDate: Mon, 24 Feb 2020 18:21:29 -05:00
Committer: Borislav Petkov <[email protected]>
CommitterDate: Tue, 25 Feb 2020 14:51:29 +01:00

x86/vmlinux: Drop unneeded linker script discard of .eh_frame

Now that .eh_frame sections for the files in setup.elf and realmode.elf
are not generated anymore, the linker scripts don't need the special
output section name /DISCARD/ any more.

Remove the one in the main kernel linker script as well, since there are
no .eh_frame sections already, and fix up a comment referencing .eh_frame.

Update the comment in asm/dwarf2.h referring to .eh_frame so it continues
to make sense, as well as being more specific.

[ bp: Touch up commit message. ]

Signed-off-by: Arvind Sankar <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Reviewed-by: Nathan Chancellor <[email protected]>
Reviewed-by: Nick Desaulniers <[email protected]>
Reviewed-by: Kees Cook <[email protected]>
Tested-by: Nathan Chancellor <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
---
arch/x86/boot/compressed/vmlinux.lds.S | 5 -----
arch/x86/boot/setup.ld | 1 -
arch/x86/include/asm/dwarf2.h | 4 ++--
arch/x86/kernel/vmlinux.lds.S | 7 ++-----
arch/x86/realmode/rm/realmode.lds.S | 1 -
5 files changed, 4 insertions(+), 14 deletions(-)

diff --git a/arch/x86/boot/compressed/vmlinux.lds.S b/arch/x86/boot/compressed/vmlinux.lds.S
index 469dcf8..508cfa6 100644
--- a/arch/x86/boot/compressed/vmlinux.lds.S
+++ b/arch/x86/boot/compressed/vmlinux.lds.S
@@ -73,9 +73,4 @@ SECTIONS
#endif
. = ALIGN(PAGE_SIZE); /* keep ZO size page aligned */
_end = .;
-
- /* Discard .eh_frame to save some space */
- /DISCARD/ : {
- *(.eh_frame)
- }
}
diff --git a/arch/x86/boot/setup.ld b/arch/x86/boot/setup.ld
index 3da1c37..24c9552 100644
--- a/arch/x86/boot/setup.ld
+++ b/arch/x86/boot/setup.ld
@@ -52,7 +52,6 @@ SECTIONS
_end = .;

/DISCARD/ : {
- *(.eh_frame)
*(.note*)
}

diff --git a/arch/x86/include/asm/dwarf2.h b/arch/x86/include/asm/dwarf2.h
index ae391f6..f71a0cc 100644
--- a/arch/x86/include/asm/dwarf2.h
+++ b/arch/x86/include/asm/dwarf2.h
@@ -42,8 +42,8 @@
* Emit CFI data in .debug_frame sections, not .eh_frame sections.
* The latter we currently just discard since we don't do DWARF
* unwinding at runtime. So only the offline DWARF information is
- * useful to anyone. Note we should not use this directive if
- * vmlinux.lds.S gets changed so it doesn't discard .eh_frame.
+ * useful to anyone. Note we should not use this directive if we
+ * ever decide to enable DWARF unwinding at runtime.
*/
.cfi_sections .debug_frame
#else
diff --git a/arch/x86/kernel/vmlinux.lds.S b/arch/x86/kernel/vmlinux.lds.S
index e3296aa..5cab3a2 100644
--- a/arch/x86/kernel/vmlinux.lds.S
+++ b/arch/x86/kernel/vmlinux.lds.S
@@ -313,8 +313,8 @@ SECTIONS

. = ALIGN(8);
/*
- * .exit.text is discard at runtime, not link time, to deal with
- * references from .altinstructions and .eh_frame
+ * .exit.text is discarded at runtime, not link time, to deal with
+ * references from .altinstructions
*/
.exit.text : AT(ADDR(.exit.text) - LOAD_OFFSET) {
EXIT_TEXT
@@ -412,9 +412,6 @@ SECTIONS
DWARF_DEBUG

DISCARDS
- /DISCARD/ : {
- *(.eh_frame)
- }
}


diff --git a/arch/x86/realmode/rm/realmode.lds.S b/arch/x86/realmode/rm/realmode.lds.S
index 64d135d..63aa518 100644
--- a/arch/x86/realmode/rm/realmode.lds.S
+++ b/arch/x86/realmode/rm/realmode.lds.S
@@ -71,7 +71,6 @@ SECTIONS
/DISCARD/ : {
*(.note*)
*(.debug*)
- *(.eh_frame*)
}

#include "pasyms.h"

Subject: [tip: x86/boot] x86/*/Makefile: Use -fno-asynchronous-unwind-tables to suppress .eh_frame sections

The following commit has been merged into the x86/boot branch of tip:

Commit-ID: 003602ad5516e59940de42e44c8d8033387bb363
Gitweb: https://git.kernel.org/tip/003602ad5516e59940de42e44c8d8033387bb363
Author: Arvind Sankar <[email protected]>
AuthorDate: Mon, 24 Feb 2020 18:21:28 -05:00
Committer: Borislav Petkov <[email protected]>
CommitterDate: Tue, 25 Feb 2020 13:18:29 +01:00

x86/*/Makefile: Use -fno-asynchronous-unwind-tables to suppress .eh_frame sections

While discussing a patch to discard .eh_frame from the compressed
vmlinux using the linker script, Fangrui Song pointed out [1] that these
sections shouldn't exist in the first place because arch/x86/Makefile
uses -fno-asynchronous-unwind-tables.

It turns out this is because the Makefiles used to build the compressed
kernel redefine KBUILD_CFLAGS, dropping this flag.

Add the flag to the Makefile for the compressed kernel, as well as the
EFI stub Makefile to fix this.

Also add the flag to boot/Makefile and realmode/rm/Makefile so that the
kernel's boot code (boot/setup.elf) and realmode trampoline
(realmode/rm/realmode.elf) won't be compiled with .eh_frame sections,
since their linker scripts also just discard them.

[1] https://lore.kernel.org/lkml/[email protected]/

Suggested-by: Fangrui Song <[email protected]>
Signed-off-by: Arvind Sankar <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Reviewed-by: Nathan Chancellor <[email protected]>
Reviewed-by: Nick Desaulniers <[email protected]>
Reviewed-by: Kees Cook <[email protected]>
Tested-by: Nathan Chancellor <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
---
arch/x86/boot/Makefile | 1 +
arch/x86/boot/compressed/Makefile | 1 +
arch/x86/realmode/rm/Makefile | 1 +
drivers/firmware/efi/libstub/Makefile | 3 ++-
4 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/arch/x86/boot/Makefile b/arch/x86/boot/Makefile
index 012b82f..24f011e 100644
--- a/arch/x86/boot/Makefile
+++ b/arch/x86/boot/Makefile
@@ -68,6 +68,7 @@ clean-files += cpustr.h
KBUILD_CFLAGS := $(REALMODE_CFLAGS) -D_SETUP
KBUILD_AFLAGS := $(KBUILD_CFLAGS) -D__ASSEMBLY__
KBUILD_CFLAGS += $(call cc-option,-fmacro-prefix-map=$(srctree)/=)
+KBUILD_CFLAGS += -fno-asynchronous-unwind-tables
GCOV_PROFILE := n
UBSAN_SANITIZE := n

diff --git a/arch/x86/boot/compressed/Makefile b/arch/x86/boot/compressed/Makefile
index 26050ae..c331113 100644
--- a/arch/x86/boot/compressed/Makefile
+++ b/arch/x86/boot/compressed/Makefile
@@ -39,6 +39,7 @@ KBUILD_CFLAGS += $(call cc-disable-warning, address-of-packed-member)
KBUILD_CFLAGS += $(call cc-disable-warning, gnu)
KBUILD_CFLAGS += -Wno-pointer-sign
KBUILD_CFLAGS += $(call cc-option,-fmacro-prefix-map=$(srctree)/=)
+KBUILD_CFLAGS += -fno-asynchronous-unwind-tables

KBUILD_AFLAGS := $(KBUILD_CFLAGS) -D__ASSEMBLY__
GCOV_PROFILE := n
diff --git a/arch/x86/realmode/rm/Makefile b/arch/x86/realmode/rm/Makefile
index 99b6332..b11ec5d 100644
--- a/arch/x86/realmode/rm/Makefile
+++ b/arch/x86/realmode/rm/Makefile
@@ -71,5 +71,6 @@ $(obj)/realmode.relocs: $(obj)/realmode.elf FORCE
KBUILD_CFLAGS := $(REALMODE_CFLAGS) -D_SETUP -D_WAKEUP \
-I$(srctree)/arch/x86/boot
KBUILD_AFLAGS := $(KBUILD_CFLAGS) -D__ASSEMBLY__
+KBUILD_CFLAGS += -fno-asynchronous-unwind-tables
GCOV_PROFILE := n
UBSAN_SANITIZE := n
diff --git a/drivers/firmware/efi/libstub/Makefile b/drivers/firmware/efi/libstub/Makefile
index 98a8157..a1140c4 100644
--- a/drivers/firmware/efi/libstub/Makefile
+++ b/drivers/firmware/efi/libstub/Makefile
@@ -12,7 +12,8 @@ cflags-$(CONFIG_X86) += -m$(BITS) -D__KERNEL__ -O2 \
-mno-mmx -mno-sse -fshort-wchar \
-Wno-pointer-sign \
$(call cc-disable-warning, address-of-packed-member) \
- $(call cc-disable-warning, gnu)
+ $(call cc-disable-warning, gnu) \
+ -fno-asynchronous-unwind-tables

# arm64 uses the full KBUILD_CFLAGS so it's necessary to explicitly
# disable the stackleak plugin

2020-02-25 18:31:55

by Arvind Sankar

[permalink] [raw]
Subject: Re: --orphan-handling=warn

On Mon, Feb 24, 2020 at 09:35:04PM -0800, Kees Cook wrote:
>
> Actually, it's rather opposed to the FGKASLR series, as for that, I need
> some kind of linker script directive like this:
>
> /PASSTHRU/ : {
> *(.text.*)
> }
>
> Where "PASSTHRU" would create a 1-to-1 input-section to output-section
> with the same name, flags, etc.
>
> ld.bfd's handling of orphan sections named .text.* is to put them each
> as a separate output section, after the existing .text output section.
>
> ld.lld's handling of orphan sections named .text.* is to put them into
> the .text output section.

This doesn't match ld's documentation [1] of how orphan sections are to
be handled, it's supposed to append it into an existing output section
only if the names match exactly, creating a new one if that isn't so. If
ld.lld is to be a drop-in replacement for ld.bfd, this probably needs to
change?

Also ld.lld doesn't seem to support the --unique option, I think you'll
also want that for FGKASLR to avoid merging static functions with the
same name from unrelated source files.

[1] https://sourceware.org/binutils/docs/ld/Orphan-Sections.html

>
> For FGKASLR (as it is currently implemented[2]), the sections need to be
> individually named output sections (as bfd does it). *However*, with the
> "warn on orphans" patch, FGKASLR's intentional orphaning will backfire
> (I guess the warning could be turned off, but I'd like lld to handle
> FGKASLR at some point.)
>
> Note that cheating and doing the 1-to-1 mapping by handy with a 40,000
> entry linker script ... made ld.lld take about 15 minutes to do the
> final link. :(

Out of curiosity, how long does ld.bfd take on that linker script :)

2020-02-25 19:43:49

by Kees Cook

[permalink] [raw]
Subject: Re: --orphan-handling=warn

On Tue, Feb 25, 2020 at 01:29:51PM -0500, Arvind Sankar wrote:
> On Mon, Feb 24, 2020 at 09:35:04PM -0800, Kees Cook wrote:
> > Actually, it's rather opposed to the FGKASLR series, as for that, I need
> > some kind of linker script directive like this:
> >
> > /PASSTHRU/ : {
> > *(.text.*)
> > }
> >
> > Where "PASSTHRU" would create a 1-to-1 input-section to output-section
> > with the same name, flags, etc.
> >
> > ld.bfd's handling of orphan sections named .text.* is to put them each
> > as a separate output section, after the existing .text output section.
> >
> > ld.lld's handling of orphan sections named .text.* is to put them into
> > the .text output section.
>
> This doesn't match ld's documentation [1] of how orphan sections are to
> be handled, it's supposed to append it into an existing output section
> only if the names match exactly, creating a new one if that isn't so. If
> ld.lld is to be a drop-in replacement for ld.bfd, this probably needs to
> change?

That would be lovely! :P

> Also ld.lld doesn't seem to support the --unique option, I think you'll
> also want that for FGKASLR to avoid merging static functions with the
> same name from unrelated source files.

Right, yes, that seems like something we have to depend on.

>
> [1] https://sourceware.org/binutils/docs/ld/Orphan-Sections.html
>
> >
> > For FGKASLR (as it is currently implemented[2]), the sections need to be
> > individually named output sections (as bfd does it). *However*, with the
> > "warn on orphans" patch, FGKASLR's intentional orphaning will backfire
> > (I guess the warning could be turned off, but I'd like lld to handle
> > FGKASLR at some point.)
> >
> > Note that cheating and doing the 1-to-1 mapping by handy with a 40,000
> > entry linker script ... made ld.lld take about 15 minutes to do the
> > final link. :(
>
> Out of curiosity, how long does ld.bfd take on that linker script :)

A single CPU at 100% for 15 minutes. :)

--
Kees Cook

2020-02-25 20:48:07

by Nick Desaulniers

[permalink] [raw]
Subject: Re: --orphan-handling=warn

On Tue, Feb 25, 2020 at 11:43 AM Kees Cook <[email protected]> wrote:
>
> On Tue, Feb 25, 2020 at 01:29:51PM -0500, Arvind Sankar wrote:
> > On Mon, Feb 24, 2020 at 09:35:04PM -0800, Kees Cook wrote:
> > > Note that cheating and doing the 1-to-1 mapping by handy with a 40,000
> > > entry linker script ... made ld.lld take about 15 minutes to do the
> > > final link. :(
> >
> > Out of curiosity, how long does ld.bfd take on that linker script :)
>
> A single CPU at 100% for 15 minutes. :)

I can see the implementers of linker script handling thinking "surely
no one would ever have >10k entries." Then we invented things like
-ffunction-sections, -fdata-sections, (per basic block equivalents:
https://reviews.llvm.org/D68049) and then finally FGKASLR. "640k ought
to be enough for anybody" and such.
--
Thanks,
~Nick Desaulniers

2020-02-25 22:04:52

by Kees Cook

[permalink] [raw]
Subject: Re: --orphan-handling=warn

On Tue, Feb 25, 2020 at 12:37:26PM -0800, Nick Desaulniers wrote:
> On Tue, Feb 25, 2020 at 11:43 AM Kees Cook <[email protected]> wrote:
> >
> > On Tue, Feb 25, 2020 at 01:29:51PM -0500, Arvind Sankar wrote:
> > > On Mon, Feb 24, 2020 at 09:35:04PM -0800, Kees Cook wrote:
> > > > Note that cheating and doing the 1-to-1 mapping by handy with a 40,000
> > > > entry linker script ... made ld.lld take about 15 minutes to do the
> > > > final link. :(
> > >
> > > Out of curiosity, how long does ld.bfd take on that linker script :)
> >
> > A single CPU at 100% for 15 minutes. :)
>
> I can see the implementers of linker script handling thinking "surely
> no one would ever have >10k entries." Then we invented things like
> -ffunction-sections, -fdata-sections, (per basic block equivalents:
> https://reviews.llvm.org/D68049) and then finally FGKASLR. "640k ought
> to be enough for anybody" and such.

Heh, yeah. I had no expectation that it would work _well_; I just
wanted to test if it _could_ work. And it did: FGKASLR up and running
on Clang+LLD. I stopped there before attempting the next step:
FGKASLR+LTO+CFI, which I assume would be hilariously slow linking.

--
Kees Cook

2020-02-26 01:56:33

by Fangrui Song

[permalink] [raw]
Subject: Re: --orphan-handling=warn

> > Kees is working on a series to just be explicit about what sections
> > are ordered where, and what's discarded, which should better handle
> > incompatibilities between linkers in regards to orphan section
> > placement and "what does `*` mean." Kees, that series can't come soon
>
> So, with my series[1] applied, ld.bfd builds clean. With ld.lld, I get a
> TON of warnings, such as:
>
> (.bss.rel.ro) is being placed in '.bss.rel.ro'

.bss.rel.ro (SHT_NOBITS) is lld specific. GNU ld does not have it. It is
currently used for copy relocations of symbols in read-only PT_LOAD
segments. If a relro section's statically relocated data is all zeros,
we can move the section to .bss.rel.ro

> (.iplt) is being placed in '.iplt'
> (.plt) is being placed in '.plt'
> (.rela.altinstr_aux) is being placed in '.rela.altinstr_aux'
> (.rela.altinstr_replacement) is being placed in
> '.rela.altinstr_replacement'
> (.rela.altinstructions) is being placed in '.rela.altinstructions'
> (.rela.apicdrivers) is being placed in '.rela.apicdrivers'
> (.rela__bug_table) is being placed in '.rela__bug_table'
> (.rela.con_initcall.init) is being placed in '.rela.init.data'
> (.rela.cpuidle.text) is being placed in '.rela.text'
> (.rela.data..cacheline_aligned) is being placed in '.rela.data'
> (.rela.data) is being placed in '.rela.data'
> (.rela.data..percpu) is being placed in '.rela.data..percpu'
> (.rela.data..percpu..page_aligned) is being placed in '.rela.data..percpu'
> ...

I need to figure out the exact GNU ld rule for input SHT_REL[A] retained
by --emit-relocs.

ld.bfd: warning: orphan section `.rela.meminit.text' from `arch/x86/kernel/head_64.o' being placed in section `.rela.dyn'
ld.bfd: warning: orphan section `.rela___ksymtab+__ctzsi2' from `arch/x86/kernel/head_64.o' being placed in section `.rela.dyn'
ld.bfd: warning: orphan section `.rela___ksymtab+__clzsi2' from `arch/x86/kernel/head_64.o' being placed in section `.rela.dyn'
ld.bfd: warning: orphan section `.rela___ksymtab+__clzdi2' from `arch/x86/kernel/head_64.o' being placed in section `.rela.dyn'
ld.bfd: warning: orphan section `.rela___ksymtab+__ctzdi2' from `arch/x86/kernel/head_64.o' being placed in section `.rela.dyn'

lld simply ignores such SHT_REL[A] when checking input section descriptions.
A .rela.foo relocating .foo will be named .rela.foobar if .foo is placed in .foobar

It makes sense for --orphan-handling= not to warn/error.
https://reviews.llvm.org/D75151

> But as you can see in the /DISCARD/, these (and all the others), should
> be getting caught:
>
> /DISCARD/ : {
> *(.eh_frame)
> + *(.rela.*) *(.rela_*)
> + *(.rel.*) *(.rel_*)
> + *(.got) *(.got.*)
> + *(.igot.*) *(.iplt)
> }
>
> I don't understand what's happening here. I haven't minimized this case
> nor opened an lld bug yet.

--orphan-handling was implemented per
https://bugs.llvm.org/show_bug.cgi?id=34946
It seems the reporter did not follow up after the feature was implemented.
Now we have the Linux kernel case...
Last December I encountered another case in my company.

It is pretty clear that this feature is useful and we should fix it :)

https://reviews.llvm.org/D75149

> enough. ;) (I think it's intended to help "fine grain" (per function)
> KASLR). More comments in the other thread.

> Actually, it's rather opposed to the FGKASLR series, as for that, I need
> some kind of linker script directive like this:
>
> /PASSTHRU/ : {
> *(.text.*)
> }
>
> Where "PASSTHRU" would create a 1-to-1 input-section to output-section
> with the same name, flags, etc.

/PASSTHRU/ sections are still handled as orphan sections?
Do you restrict { } to input section descriptions, not output section
data (https://sourceware.org/binutils/docs/ld/Output-Section-Data.html#Output-Section-Data)?
or symbol assignments?

You can ask https://sourceware.org/ml/binutils/2020-02/ whether
they'd like to accept the feature request:)

(My personal feeling is that I want to see more use cases to add the new
feature...)

> ld.bfd's handling of orphan sections named .text.* is to put them each
> as a separate output section, after the existing .text output section.
>
> ld.lld's handling of orphan sections named .text.* is to put them into
> the .text output section.

Confirmed. lld can adapt. I need to do some homework...

> For FGKASLR (as it is currently implemented[2]), the sections need to be
> individually named output sections (as bfd does it). *However*, with the
> "warn on orphans" patch, FGKASLR's intentional orphaning will backfire
> (I guess the warning could be turned off, but I'd like lld to handle
> FGKASLR at some point.)
>
> Note that cheating and doing the 1-to-1 mapping by handy with a 40,000
> entry linker script ... made ld.lld take about 15 minutes to do the
> final link. :(

Placing N orphan sections requires O(N^2) time (in both GNU ld and lld) :(

> > Taken from the Zen of Python, but in regards to sections in linker
> > scripts, "explicit is better than implicit."
>
> Totally agreed. I just hope there's a good solution for this PASSTHRU
> idea...
>
> -Kees
>
> [1] https://git.kernel.org/pub/scm/linux/kernel/git/kees/linux.git/log/?h=linker/orphans/x86-arm
> [2]
> https://github.com/kaccardi/linux/commit/127111e8c6170a130d8d12d73728e74acbe05e13

On 2020-02-25, Kees Cook wrote:
>On Tue, Feb 25, 2020 at 12:37:26PM -0800, Nick Desaulniers wrote:
>> On Tue, Feb 25, 2020 at 11:43 AM Kees Cook <[email protected]> wrote:
>> >
>> > On Tue, Feb 25, 2020 at 01:29:51PM -0500, Arvind Sankar wrote:
>> > > On Mon, Feb 24, 2020 at 09:35:04PM -0800, Kees Cook wrote:
>> > > > Note that cheating and doing the 1-to-1 mapping by handy with a 40,000
>> > > > entry linker script ... made ld.lld take about 15 minutes to do the
>> > > > final link. :(
>> > >
>> > > Out of curiosity, how long does ld.bfd take on that linker script :)
>> >
>> > A single CPU at 100% for 15 minutes. :)
>>
>> I can see the implementers of linker script handling thinking "surely
>> no one would ever have >10k entries." Then we invented things like
>> -ffunction-sections, -fdata-sections, (per basic block equivalents:
>> https://reviews.llvm.org/D68049) and then finally FGKASLR. "640k ought
>> to be enough for anybody" and such.
>
>Heh, yeah. I had no expectation that it would work _well_; I just
>wanted to test if it _could_ work. And it did: FGKASLR up and running
>on Clang+LLD. I stopped there before attempting the next step:
>FGKASLR+LTO+CFI, which I assume would be hilariously slow linking.

Now I learned the term FGKASLR... I need to do some homework.

2020-02-26 05:37:13

by Kees Cook

[permalink] [raw]
Subject: Re: --orphan-handling=warn

On Tue, Feb 25, 2020 at 05:56:06PM -0800, Fangrui Song wrote:
> > > Kees is working on a series to just be explicit about what sections
> > > are ordered where, and what's discarded, which should better handle
> > > incompatibilities between linkers in regards to orphan section
> > > placement and "what does `*` mean." Kees, that series can't come soon
> >
> > So, with my series[1] applied, ld.bfd builds clean. With ld.lld, I get a
> > TON of warnings, such as:
> >
> > (.bss.rel.ro) is being placed in '.bss.rel.ro'
>
> .bss.rel.ro (SHT_NOBITS) is lld specific. GNU ld does not have it. It is
> currently used for copy relocations of symbols in read-only PT_LOAD
> segments. If a relro section's statically relocated data is all zeros,
> we can move the section to .bss.rel.ro
>
> > (.iplt) is being placed in '.iplt'
> > (.plt) is being placed in '.plt'
> > (.rela.altinstr_aux) is being placed in '.rela.altinstr_aux'
> > (.rela.altinstr_replacement) is being placed in
> > '.rela.altinstr_replacement'
> > (.rela.altinstructions) is being placed in '.rela.altinstructions'
> > (.rela.apicdrivers) is being placed in '.rela.apicdrivers'
> > (.rela__bug_table) is being placed in '.rela__bug_table'
> > (.rela.con_initcall.init) is being placed in '.rela.init.data'
> > (.rela.cpuidle.text) is being placed in '.rela.text'
> > (.rela.data..cacheline_aligned) is being placed in '.rela.data'
> > (.rela.data) is being placed in '.rela.data'
> > (.rela.data..percpu) is being placed in '.rela.data..percpu'
> > (.rela.data..percpu..page_aligned) is being placed in '.rela.data..percpu'
> > ...
>
> I need to figure out the exact GNU ld rule for input SHT_REL[A] retained
> by --emit-relocs.
>
> ld.bfd: warning: orphan section `.rela.meminit.text' from `arch/x86/kernel/head_64.o' being placed in section `.rela.dyn'
> ld.bfd: warning: orphan section `.rela___ksymtab+__ctzsi2' from `arch/x86/kernel/head_64.o' being placed in section `.rela.dyn'
> ld.bfd: warning: orphan section `.rela___ksymtab+__clzsi2' from `arch/x86/kernel/head_64.o' being placed in section `.rela.dyn'
> ld.bfd: warning: orphan section `.rela___ksymtab+__clzdi2' from `arch/x86/kernel/head_64.o' being placed in section `.rela.dyn'
> ld.bfd: warning: orphan section `.rela___ksymtab+__ctzdi2' from `arch/x86/kernel/head_64.o' being placed in section `.rela.dyn'
>
> lld simply ignores such SHT_REL[A] when checking input section descriptions.
> A .rela.foo relocating .foo will be named .rela.foobar if .foo is placed in .foobar
>
> It makes sense for --orphan-handling= not to warn/error.
> https://reviews.llvm.org/D75151

Awesome! I commented on the review, but just to follow up, this cuts out
all the .rela warnings.

>
> > But as you can see in the /DISCARD/, these (and all the others), should
> > be getting caught:
> >
> > /DISCARD/ : {
> > *(.eh_frame)
> > + *(.rela.*) *(.rela_*)
> > + *(.rel.*) *(.rel_*)
> > + *(.got) *(.got.*)
> > + *(.igot.*) *(.iplt)
> > }
> >
> > I don't understand what's happening here. I haven't minimized this case
> > nor opened an lld bug yet.
>
> --orphan-handling was implemented per
> https://bugs.llvm.org/show_bug.cgi?id=34946
> It seems the reporter did not follow up after the feature was implemented.
> Now we have the Linux kernel case...
> Last December I encountered another case in my company.
>
> It is pretty clear that this feature is useful and we should fix it :)
>
> https://reviews.llvm.org/D75149

Also commented in the review; this removes the following:

... (.bss.rel.ro) is being placed in '.bss.rel.ro'
... (.iplt) is being placed in '.iplt'
... (.plt) is being placed in '.plt'
... (.symtab_shndx) is being placed in '.symtab_shndx'

I'm now only left with:

ld.lld: warning: <internal>:(.symtab) is being placed in '.symtab'
ld.lld: warning: <internal>:(.shstrtab) is being placed in '.shstrtab'
ld.lld: warning: <internal>:(.strtab) is being placed in '.strtab'

> > enough. ;) (I think it's intended to help "fine grain" (per function)
> > KASLR). More comments in the other thread.
>
> > Actually, it's rather opposed to the FGKASLR series, as for that, I need
> > some kind of linker script directive like this:
> >
> > /PASSTHRU/ : {
> > *(.text.*)
> > }
> >
> > Where "PASSTHRU" would create a 1-to-1 input-section to output-section
> > with the same name, flags, etc.
>
> /PASSTHRU/ sections are still handled as orphan sections?

If PASSTHRU existed, I wouldn't want them reported as orphan sections
(since they'd be chosen). This "section" would serve two purposes:

- remove the warning about being orphan
- position the sections in a known location (FGKASLR needs this so that
it can know where the end of the all the .text.* sections actually
are. Right now, the correct result is effectively accidental[1].)

[1] https://lore.kernel.org/lkml/[email protected]/

> Do you restrict { } to input section descriptions, not output section
> data (https://sourceware.org/binutils/docs/ld/Output-Section-Data.html#Output-Section-Data)?
> or symbol assignments?

I only imagine it being input section descriptions. (I imagine PASSTHRU
being much like DISCARD -- it won't make sense to put output section
data there either.)

> You can ask https://sourceware.org/ml/binutils/2020-02/ whether
> they'd like to accept the feature request:)
>
> (My personal feeling is that I want to see more use cases to add the new
> feature...)

This is the only case I've got for this right now. If the "do not merge
the orphan .text.* into .text" bug is fixed, then FGKASLR can continue
to work "on accident", and nothing like PASSTHRU would be needed. :)
In this case, I would have FGKASLR disable the orphan section warning,
since it would depend on orphan handling.

> > ld.bfd's handling of orphan sections named .text.* is to put them each
> > as a separate output section, after the existing .text output section.
> >
> > ld.lld's handling of orphan sections named .text.* is to put them into
> > the .text output section.
>
> Confirmed. lld can adapt. I need to do some homework...

Awesome. If it matches bfd, then that'll be sufficient:

https://sourceware.org/binutils/docs/ld/Orphan-Sections.html

Namely:
"If there is no output section with a matching name then new
output sections will be created."
and:
"... the linker attempts to place orphan sections after sections
of the same attribute ..."

So .text.blah would end up in .text.blah but after .text (since they're
both code).

(And as noted earlier, --unique support is needed too.)

> > For FGKASLR (as it is currently implemented[2]), the sections need to be
> > individually named output sections (as bfd does it). *However*, with the
> > "warn on orphans" patch, FGKASLR's intentional orphaning will backfire
> > (I guess the warning could be turned off, but I'd like lld to handle
> > FGKASLR at some point.)
> >
> > Note that cheating and doing the 1-to-1 mapping by handy with a 40,000
> > entry linker script ... made ld.lld take about 15 minutes to do the
> > final link. :(
>
> Placing N orphan sections requires O(N^2) time (in both GNU ld and lld) :(

Owchy. :)

> [...]
> Now I learned the term FGKASLR... I need to do some homework.

Thanks for looking into this! It'll be really nice to have the orphan
section warnings working in the kernel. And getting the ground work for
FGKASLR ready will be nice!

Kristen, can I convince you that FG stands for function-granular
instead of fine-grain? :)

--
Kees Cook

2020-02-26 19:11:39

by Kristen Carlson Accardi

[permalink] [raw]
Subject: Re: --orphan-handling=warn

On Tue, 2020-02-25 at 21:35 -0800, Kees Cook wrote:
>
> Thanks for looking into this! It'll be really nice to have the orphan
> section warnings working in the kernel. And getting the ground work
> for
> FGKASLR ready will be nice!
>
> Kristen, can I convince you that FG stands for function-granular
> instead of fine-grain? :)
>

Yes, sounds good to me - that way if we ever do basic block reordering
or some crazy thing like that we don't have to say even-finer-fine-
grained KASLR :).

Thanks for your help making this work.


2020-02-26 19:27:17

by Nick Desaulniers

[permalink] [raw]
Subject: Re: --orphan-handling=warn

On Wed, Feb 26, 2020 at 11:11 AM Kristen Carlson Accardi
<[email protected]> wrote:
>
> On Tue, 2020-02-25 at 21:35 -0800, Kees Cook wrote:
> >
> > Thanks for looking into this! It'll be really nice to have the orphan
> > section warnings working in the kernel. And getting the ground work
> > for
> > FGKASLR ready will be nice!
> >
> > Kristen, can I convince you that FG stands for function-granular
> > instead of fine-grain? :)
> >
>
> Yes, sounds good to me - that way if we ever do basic block reordering
> or some crazy thing like that we don't have to say even-finer-fine-
> grained KASLR :).

LOL (I laugh now, but "never say never.")
--
Thanks,
~Nick Desaulniers

2020-04-05 16:53:33

by Sergey Shatunov

[permalink] [raw]
Subject: Re: [PATCH 1/2] x86/boot/compressed/64: Remove .bss/.pgtable from bzImage

This patch causes some strange things happens with my laptop.

Cold boot crashed in some early initilization logic with message 'Failed to execute /esp/.../linux.efi: Buffer Too Small'.
After couple reboots into firmware setup (bios) or hot reboot from other working kernel (without that commit) helps it to boot.
During bisecting couple times I saw different message: 'exit_efi() failed; efi_main() failed', but above tricks helps it too.
So bisect points to that commit and I tried to compile kernel with that commit reverted and it works flawlessly.

Some notes about my setup:
Kernel tree I used: Torvalds git (which recently pulls that commit).
Kernel itself with initrd and cmdline packed into systemd-boot stub (probably here can be issues too, not sure).
Secure boot enabled with custom keyring.

I can provide more info or change my setup (for example get rid of systemd-boot stub) if needed for sure.

2020-04-05 23:20:18

by Arvind Sankar

[permalink] [raw]
Subject: Re: [PATCH 1/2] x86/boot/compressed/64: Remove .bss/.pgtable from bzImage

On Sun, Apr 05, 2020 at 10:42:46PM +0700, Sergey Shatunov wrote:
> This patch causes some strange things happens with my laptop.
>
> Cold boot crashed in some early initilization logic with message 'Failed to execute /esp/.../linux.efi: Buffer Too Small'.
> After couple reboots into firmware setup (bios) or hot reboot from other working kernel (without that commit) helps it to boot.
> During bisecting couple times I saw different message: 'exit_efi() failed; efi_main() failed', but above tricks helps it too.
> So bisect points to that commit and I tried to compile kernel with that commit reverted and it works flawlessly.
>
> Some notes about my setup:
> Kernel tree I used: Torvalds git (which recently pulls that commit).
> Kernel itself with initrd and cmdline packed into systemd-boot stub (probably here can be issues too, not sure).
> Secure boot enabled with custom keyring.
>
> I can provide more info or change my setup (for example get rid of systemd-boot stub) if needed for sure.

I'm not familiar with systemd-boot: when you say systemd-boot stub, is
that something different from the kernel's EFI_STUB option? Or is it
just a kernel with EFI_STUB enabled and with builtin initramfs + builtin
cmdline? If it is different, can you point me to the tool that creates
it?

The first error message you mention should be from systemd-boot after
the kernel exits with an EFI_BUFFER_TOO_SMALL error status. The second
one -- is that 'exit_boot() failed' rather than 'exit_efi() failed'? I
can't find the latter string in the tree.

Can you also specify a commit tag that is broken + works with this patch
reverted, just to make sure we're looking at the same code?

Thanks

Cc linux-efi.

2020-04-06 00:11:30

by Sergey Shatunov

[permalink] [raw]
Subject: Re: [PATCH 1/2] x86/boot/compressed/64: Remove .bss/.pgtable from bzImage

On Sun, 2020-04-05 at 19:18 -0400, Arvind Sankar wrote:
> I'm not familiar with systemd-boot: when you say systemd-boot stub,
> is
> that something different from the kernel's EFI_STUB option? Or is it
> just a kernel with EFI_STUB enabled and with builtin initramfs +
> builtin
> cmdline?
Basicaly systemd-boot stub is efi application with packed EFI_STUB-
enabled kernel, initrd and cmdline into single file. Source can be
found here:
https://github.com/systemd/systemd/blob/master/src/boot/efi/stub.c

It doesn't do anything unusual, just extracting data from sections and
calling efi handover.

Final image created by objcopy'ing precompiled stub and adding sections with that stuff:

objcopy \
--add-section .osrel=os_release --change-section-vma
'.osrel=0x20000' \
--add-section .cmdline=cmdline --change-section-vma
'.cmdline=0x30000' \
--add-section .linux=vmlinuz --change-section-vma
'.linux=0x2000000' \
--add-section .initrd=initrd --change-section-vma
'.initrd=0x3000000' \
/usr/lib/systemd/boot/efi/linuxx64.efi.stub output.efi

> The first error message you mention should be from systemd-boot after
> the kernel exits with an EFI_BUFFER_TOO_SMALL error status. The
> second
> one -- is that 'exit_boot() failed' rather than 'exit_efi() failed'?
> I
> can't find the latter string in the tree.
Yeah, probably it's exit_boot()

> Can you also specify a commit tag that is broken + works with this
> patch
> reverted, just to make sure we're looking at the same code?
I'm using latest master branch from Torvalds tree, so latest commit
(for now it's a10c9c710f9ecea87b9f4bbb837467893b4bef01) broken, but
with 3ee372ccce4d4e7c610748d0583979d3ed3a0cf4 reverted works.

Not sure
what I should specify as broken except the
3ee372ccce4d4e7c610748d0583979d3ed3a0cf4 itself.
But as I told 'Buffer
Too Small' changed to 'exit_boot() failed' couple times during
bisecting (it was closely around that commit, probably in commits about
.eh_frame section), but I don't remember exactly commits when this
happened. I could do one more bisecting with more details about each
step if you wish.

2020-04-06 03:51:48

by Arvind Sankar

[permalink] [raw]
Subject: Re: [PATCH 1/2] x86/boot/compressed/64: Remove .bss/.pgtable from bzImage

On Mon, Apr 06, 2020 at 07:00:39AM +0700, Sergey Shatunov wrote:
> On Sun, 2020-04-05 at 19:18 -0400, Arvind Sankar wrote:
> > I'm not familiar with systemd-boot: when you say systemd-boot stub,
> > is
> > that something different from the kernel's EFI_STUB option? Or is it
> > just a kernel with EFI_STUB enabled and with builtin initramfs +
> > builtin
> > cmdline?
> Basicaly systemd-boot stub is efi application with packed EFI_STUB-
> enabled kernel, initrd and cmdline into single file. Source can be
> found here:
> https://github.com/systemd/systemd/blob/master/src/boot/efi/stub.c
>
> It doesn't do anything unusual, just extracting data from sections and
> calling efi handover.
>
> Final image created by objcopy'ing precompiled stub and adding sections with that stuff:
>
> objcopy \
> --add-section .osrel=os_release --change-section-vma
> '.osrel=0x20000' \
> --add-section .cmdline=cmdline --change-section-vma
> '.cmdline=0x30000' \
> --add-section .linux=vmlinuz --change-section-vma
> '.linux=0x2000000' \
> --add-section .initrd=initrd --change-section-vma
> '.initrd=0x3000000' \
> /usr/lib/systemd/boot/efi/linuxx64.efi.stub output.efi

So this embeds the bzImage which is a PE executable inside another PE
executable. Before my patch, the bss section was explicitly part of the
bzImage and so would have been zero, now it isn't any more and the PE
loader is expected to zero it out before executing. systemd-boot's stub
loader doesn't do that prior to jumping to the EFI handover entry, so
the issue must be because bss contains garbage. I'm not 100% sure why
that leads to a crash, as the only variables in bss in the EFI stub are
for some boolean EFI command line arguments, so it ought to still have
worked, just as though it was invoked with random arguments. Anyway we
need to handle an uninitialized bss to get this to work properly.

I also see from systemd [0] and dracut source [1] that these VMA's seem
to be hardcoded with no checking for how big the files actually are, and
objcopy doesn't seem to complain if sections end up overlapping.

So since [2] in dracut, the space available for the .linux section
containing the bzImage shrank from ~48MiB to 16MiB. This will hopefully
still fit the compressed kernel (although an allyesconfig bzImage is far
bigger than even 48MiB), but in-place decompression is unlikely to be
possible even for a normal config, which will break another patchset
that got merged into mainline for 5.7 [3,4], which tries to avoid
copying the kernel unless necessary, and has a good chance of triggering
in-place decompression if kaslr is disabled.

I'll get systemd-boot installed here so I can reproduce and implement
some workarounds for both issues. I should hopefully have a fix in a day
or two.

[0] https://github.com/systemd/systemd/blob/9fac14980df8dcce922e1fe8856a88b09590d2c3/test/test-efi-create-disk.sh#L30
[1] https://git.kernel.org/pub/scm/boot/dracut/dracut.git/tree/dracut.sh#n2039
[2] https://git.kernel.org/pub/scm/boot/dracut/dracut.git/commit/?id=4237aeb040c276722b528001bdea31e6eb994d06
[3] https://lore.kernel.org/linux-efi/[email protected]/
[4] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=d5cdf4cfeac914617ca22866bd4685fd7f876dec

2020-04-06 07:34:35

by Ard Biesheuvel

[permalink] [raw]
Subject: Re: [PATCH 1/2] x86/boot/compressed/64: Remove .bss/.pgtable from bzImage

On Mon, 6 Apr 2020 at 05:51, Arvind Sankar <[email protected]> wrote:
>
> On Mon, Apr 06, 2020 at 07:00:39AM +0700, Sergey Shatunov wrote:
> > On Sun, 2020-04-05 at 19:18 -0400, Arvind Sankar wrote:
> > > I'm not familiar with systemd-boot: when you say systemd-boot stub,
> > > is
> > > that something different from the kernel's EFI_STUB option? Or is it
> > > just a kernel with EFI_STUB enabled and with builtin initramfs +
> > > builtin
> > > cmdline?
> > Basicaly systemd-boot stub is efi application with packed EFI_STUB-
> > enabled kernel, initrd and cmdline into single file. Source can be
> > found here:
> > https://github.com/systemd/systemd/blob/master/src/boot/efi/stub.c
> >
> > It doesn't do anything unusual, just extracting data from sections and
> > calling efi handover.
> >
> > Final image created by objcopy'ing precompiled stub and adding sections with that stuff:
> >
> > objcopy \
> > --add-section .osrel=os_release --change-section-vma
> > '.osrel=0x20000' \
> > --add-section .cmdline=cmdline --change-section-vma
> > '.cmdline=0x30000' \
> > --add-section .linux=vmlinuz --change-section-vma
> > '.linux=0x2000000' \
> > --add-section .initrd=initrd --change-section-vma
> > '.initrd=0x3000000' \
> > /usr/lib/systemd/boot/efi/linuxx64.efi.stub output.efi
>
> So this embeds the bzImage which is a PE executable inside another PE
> executable. Before my patch, the bss section was explicitly part of the
> bzImage and so would have been zero, now it isn't any more and the PE
> loader is expected to zero it out before executing. systemd-boot's stub
> loader doesn't do that prior to jumping to the EFI handover entry, so
> the issue must be because bss contains garbage. I'm not 100% sure why
> that leads to a crash, as the only variables in bss in the EFI stub are
> for some boolean EFI command line arguments, so it ought to still have
> worked, just as though it was invoked with random arguments. Anyway we
> need to handle an uninitialized bss to get this to work properly.
>

The EFI handover protocol strikes again :-(

It seems we did not include any guidance in the documentation in
Documentation/x86/boot.rst regarding zero-initializing BSS, and come
to think of it, we don't include any other requirements either, i.e.,
regarding placement wrt section alignment etc. This is a serious bug.
Even though EFI usually lays out PE/COFF images in files the exact way
they appear in memory, this is not actually required by the spec. Most
notably, the virtual size can be smaller than the file size, and the
loader is expected to zero-initialize the difference as well.

Since the EFI handover protocol should be considered deprecated at
this point (and is never going to be supported in upstream GRUB
either, for instance), I would recommend the systemd-boot developers
to start looking into deprecating this as well, and switch to the
ordinary PE/COFF entry point, and use the new initrd callback protocol
for initrd loading.

On the Linux/x86 side, we should at least add some code to the EFI
handover protocol entry point to zero initialize BSS, and ensure that
it is either not needed in other places, or add the code to deal with
those as well.

> I also see from systemd [0] and dracut source [1] that these VMA's seem
> to be hardcoded with no checking for how big the files actually are, and
> objcopy doesn't seem to complain if sections end up overlapping.
>
> So since [2] in dracut, the space available for the .linux section
> containing the bzImage shrank from ~48MiB to 16MiB. This will hopefully
> still fit the compressed kernel (although an allyesconfig bzImage is far
> bigger than even 48MiB), but in-place decompression is unlikely to be
> possible even for a normal config, which will break another patchset
> that got merged into mainline for 5.7 [3,4], which tries to avoid
> copying the kernel unless necessary, and has a good chance of triggering
> in-place decompression if kaslr is disabled.
>
> I'll get systemd-boot installed here so I can reproduce and implement
> some workarounds for both issues. I should hopefully have a fix in a day
> or two.
>

Thanks Arvind.

2020-04-06 08:45:04

by Ard Biesheuvel

[permalink] [raw]
Subject: Re: [PATCH 1/2] x86/boot/compressed/64: Remove .bss/.pgtable from bzImage

On Mon, 6 Apr 2020 at 01:18, Arvind Sankar <[email protected]> wrote:
>
> On Sun, Apr 05, 2020 at 10:42:46PM +0700, Sergey Shatunov wrote:
> > This patch causes some strange things happens with my laptop.
> >
> > Cold boot crashed in some early initilization logic with message 'Failed to execute /esp/.../linux.efi: Buffer Too Small'.
> > After couple reboots into firmware setup (bios) or hot reboot from other working kernel (without that commit) helps it to boot.
> > During bisecting couple times I saw different message: 'exit_efi() failed; efi_main() failed', but above tricks helps it too.

Could you please try adding 'efi=no_disable_early_pci_dma' to the
kernel command line? The lack of BSS zeroization may result in that
option to get inadvertently enabled, and it is known to break
exit_boot() on some systems.

2020-04-06 08:48:12

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH 1/2] x86/boot/compressed/64: Remove .bss/.pgtable from bzImage

On Mon, Apr 06, 2020 at 09:32:47AM +0200, Ard Biesheuvel wrote:
> The EFI handover protocol strikes again :-(
>
> It seems we did not include any guidance in the documentation in
> Documentation/x86/boot.rst regarding zero-initializing BSS, and come
> to think of it, we don't include any other requirements either, i.e.,
> regarding placement wrt section alignment etc. This is a serious bug.
> Even though EFI usually lays out PE/COFF images in files the exact way
> they appear in memory, this is not actually required by the spec. Most
> notably, the virtual size can be smaller than the file size, and the
> loader is expected to zero-initialize the difference as well.

Is that expectation stated explicitly somewhere?

> Since the EFI handover protocol should be considered deprecated at
> this point (and is never going to be supported in upstream GRUB
> either, for instance), I would recommend the systemd-boot developers
> to start looking into deprecating this as well, and switch to the
> ordinary PE/COFF entry point, and use the new initrd callback protocol
> for initrd loading.

Any pointers to that new initrd callback protocol?

In any case, I'd really appreciate a patch to boot.rst formulating those
requirements so that they're written down and people can find them.

> On the Linux/x86 side, we should at least add some code to the EFI
> handover protocol entry point to zero initialize BSS, and ensure that
> it is either not needed in other places, or add the code to deal with
> those as well.

Sounds like a simple fix, if that would fix it.

Thx.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2020-04-06 09:12:33

by Ard Biesheuvel

[permalink] [raw]
Subject: Re: [PATCH 1/2] x86/boot/compressed/64: Remove .bss/.pgtable from bzImage

On Mon, 6 Apr 2020 at 10:47, Borislav Petkov <[email protected]> wrote:
>
> On Mon, Apr 06, 2020 at 09:32:47AM +0200, Ard Biesheuvel wrote:
> > The EFI handover protocol strikes again :-(
> >
> > It seems we did not include any guidance in the documentation in
> > Documentation/x86/boot.rst regarding zero-initializing BSS, and come
> > to think of it, we don't include any other requirements either, i.e.,
> > regarding placement wrt section alignment etc. This is a serious bug.
> > Even though EFI usually lays out PE/COFF images in files the exact way
> > they appear in memory, this is not actually required by the spec. Most
> > notably, the virtual size can be smaller than the file size, and the
> > loader is expected to zero-initialize the difference as well.
>
> Is that expectation stated explicitly somewhere?
>

Yes, it is in the PE/COFF specification. [0]

The whole problem is that we are conflating 'loading a PE/COFF image'
with 'copying a PE/COFF image into memory', which are not the same
thing. It is not just the layout issue, we are running into other
problems with things like UEFI secure boot and TPM-based measured
boot, where the fact that omitting the standard LoadImage() boot
service (which takes care of these things under the hood) means that
you now have to do your own checks and measurements. These things are
literally all over the place at the moment, shim, GRUB, systemd-boot
etc, with no authoritative spec that describes which component should
be doing what.

> > Since the EFI handover protocol should be considered deprecated at
> > this point (and is never going to be supported in upstream GRUB
> > either, for instance), I would recommend the systemd-boot developers
> > to start looking into deprecating this as well, and switch to the
> > ordinary PE/COFF entry point, and use the new initrd callback protocol
> > for initrd loading.
>
> Any pointers to that new initrd callback protocol?
>

Commit ec93fc371f014a6fb483e3556061ecad4b40735c has the background, but ...

> In any case, I'd really appreciate a patch to boot.rst formulating those
> requirements so that they're written down and people can find them.
>

... I'll look into updating the documentation as well. Note that this
stuff is hot off the press, so there may be some issues lurking (like
this one) that we hadn't thought of yet.

OVMF and u-boot already have implementations of this initrd loading
approach. A GRUB version is under discussion.

> > On the Linux/x86 side, we should at least add some code to the EFI
> > handover protocol entry point to zero initialize BSS, and ensure that
> > it is either not needed in other places, or add the code to deal with
> > those as well.
>
> Sounds like a simple fix, if that would fix it.
>

Actually, it may be sufficient to #define __efistub_global to
__section(.data) like we already do for ARM, to ensure that these
global flags are always initialized correctly. (I'll wait for Sergey
to confirm that the spurious enabling of the PCI DMA protection
resulting from this BSS issue is causing the boot regression)



[0] https://docs.microsoft.com/en-us/windows/win32/debug/pe-format

2020-04-06 11:21:19

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH 1/2] x86/boot/compressed/64: Remove .bss/.pgtable from bzImage

On Mon, Apr 06, 2020 at 11:11:21AM +0200, Ard Biesheuvel wrote:
> Yes, it is in the PE/COFF specification. [0]
>
> The whole problem is that we are conflating 'loading a PE/COFF image'
> with 'copying a PE/COFF image into memory', which are not the same
> thing. It is not just the layout issue, we are running into other
> problems with things like UEFI secure boot and TPM-based measured
> boot, where the fact that omitting the standard LoadImage() boot
> service (which takes care of these things under the hood) means that
> you now have to do your own checks and measurements. These things are
> literally all over the place at the moment, shim, GRUB, systemd-boot
> etc, with no authoritative spec that describes which component should
> be doing what.

Sounds to me like what LoadImage() does is what the authoritative spec
should be. Perhaps we should write it down as "Do what LoadImage()
does... " and then enumerate the requirements.

> Commit ec93fc371f014a6fb483e3556061ecad4b40735c has the background, but ...

Nice, I like the aspect of letting firmware do only a minimum amount of
work. :)

> ... I'll look into updating the documentation as well.

Thanks!

> Note that this stuff is hot off the press, so there may be some issues
> lurking (like this one) that we hadn't thought of yet.

Right.

> Actually, it may be sufficient to #define __efistub_global to
> __section(.data) like we already do for ARM, to ensure that these
> global flags are always initialized correctly. (I'll wait for Sergey
> to confirm that the spurious enabling of the PCI DMA protection
> resulting from this BSS issue is causing the boot regression)

Cool, but let's not jinx it. :-)

Thx.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2020-04-06 12:38:42

by Sergey Shatunov

[permalink] [raw]
Subject: Re: [PATCH 1/2] x86/boot/compressed/64: Remove .bss/.pgtable from bzImage

On Mon, 2020-04-06 at 10:44 +0200, Ard Biesheuvel wrote:
> On Mon, 6 Apr 2020 at 01:18, Arvind Sankar <[email protected]>
> wrote:
> > On Sun, Apr 05, 2020 at 10:42:46PM +0700, Sergey Shatunov wrote:
> > > This patch causes some strange things happens with my laptop.
> > >
> > > Cold boot crashed in some early initilization logic with message
> > > 'Failed to execute /esp/.../linux.efi: Buffer Too Small'.
> > > After couple reboots into firmware setup (bios) or hot reboot
> > > from other working kernel (without that commit) helps it to boot.
> > > During bisecting couple times I saw different message:
> > > 'exit_efi() failed; efi_main() failed', but above tricks helps it
> > > too.
>
> Could you please try adding 'efi=no_disable_early_pci_dma' to the
> kernel command line? The lack of BSS zeroization may result in that
> option to get inadvertently enabled, and it is known to break
> exit_boot() on some systems.

With 'efi=no_disable_early_pci_dma' it works again.

2020-04-06 13:21:17

by Ard Biesheuvel

[permalink] [raw]
Subject: Re: [PATCH 1/2] x86/boot/compressed/64: Remove .bss/.pgtable from bzImage

On Mon, 6 Apr 2020 at 15:14, Sergey Shatunov <[email protected]> wrote:
>
> On Mon, 2020-04-06 at 10:44 +0200, Ard Biesheuvel wrote:
> > On Mon, 6 Apr 2020 at 01:18, Arvind Sankar <[email protected]>
> > wrote:
> > > On Sun, Apr 05, 2020 at 10:42:46PM +0700, Sergey Shatunov wrote:
> > > > This patch causes some strange things happens with my laptop.
> > > >
> > > > Cold boot crashed in some early initilization logic with message
> > > > 'Failed to execute /esp/.../linux.efi: Buffer Too Small'.
> > > > After couple reboots into firmware setup (bios) or hot reboot
> > > > from other working kernel (without that commit) helps it to boot.
> > > > During bisecting couple times I saw different message:
> > > > 'exit_efi() failed; efi_main() failed', but above tricks helps it
> > > > too.
> >
> > Could you please try adding 'efi=no_disable_early_pci_dma' to the
> > kernel command line? The lack of BSS zeroization may result in that
> > option to get inadvertently enabled, and it is known to break
> > exit_boot() on some systems.
>
> With 'efi=no_disable_early_pci_dma' it works again.
>

Thanks Sergey

2020-04-06 13:23:55

by Arvind Sankar

[permalink] [raw]
Subject: Re: [PATCH 1/2] x86/boot/compressed/64: Remove .bss/.pgtable from bzImage

On Mon, Apr 06, 2020 at 01:20:42PM +0200, Borislav Petkov wrote:
> On Mon, Apr 06, 2020 at 11:11:21AM +0200, Ard Biesheuvel wrote:
> > Yes, it is in the PE/COFF specification. [0]
> >
> > The whole problem is that we are conflating 'loading a PE/COFF image'
> > with 'copying a PE/COFF image into memory', which are not the same
> > thing. It is not just the layout issue, we are running into other
> > problems with things like UEFI secure boot and TPM-based measured
> > boot, where the fact that omitting the standard LoadImage() boot
> > service (which takes care of these things under the hood) means that
> > you now have to do your own checks and measurements. These things are
> > literally all over the place at the moment, shim, GRUB, systemd-boot
> > etc, with no authoritative spec that describes which component should
> > be doing what.
>
> Sounds to me like what LoadImage() does is what the authoritative spec
> should be. Perhaps we should write it down as "Do what LoadImage()
> does... " and then enumerate the requirements.
>
> > Commit ec93fc371f014a6fb483e3556061ecad4b40735c has the background, but ...
>
> Nice, I like the aspect of letting firmware do only a minimum amount of
> work. :)
>
> > ... I'll look into updating the documentation as well.
>
> Thanks!
>
> > Note that this stuff is hot off the press, so there may be some issues
> > lurking (like this one) that we hadn't thought of yet.
>
> Right.
>
> > Actually, it may be sufficient to #define __efistub_global to
> > __section(.data) like we already do for ARM, to ensure that these
> > global flags are always initialized correctly. (I'll wait for Sergey
> > to confirm that the spurious enabling of the PCI DMA protection
> > resulting from this BSS issue is causing the boot regression)

Yeah I thought of that as the easiest fix, but it might be safer to
explicitly zero-init in efi_main to avoid future problems in case
someone adds another variable in bss and isn't aware of this obscure
requirement. We actually already have sys_table in bss, but that one is
always initialized. There's also other globals that aren't annotated
(but not in bss by virtue of having initializers). What do you think?

What do you think of the other problem -- that's actually worse to fix,
as it won't just be when kaslr is disabled, the startup_64 code will do
relocation to the end of init_size and clobber the initrd before getting
to the kaslr code, so it will break as soon as the firmware loads the
"unified kernel image" at a 2Mb-aligned address. The only thing I can
think of is to just unconditionally call efi_relocate_kernel if we were
entered via handover_entry?

>
> Cool, but let's not jinx it. :-)
>
> Thx.
>
> --
> Regards/Gruss,
> Boris.
>
> https://people.kernel.org/tglx/notes-about-netiquette

2020-04-06 13:31:15

by Ard Biesheuvel

[permalink] [raw]
Subject: Re: [PATCH 1/2] x86/boot/compressed/64: Remove .bss/.pgtable from bzImage

On Mon, 6 Apr 2020 at 15:22, Arvind Sankar <[email protected]> wrote:
>
> On Mon, Apr 06, 2020 at 01:20:42PM +0200, Borislav Petkov wrote:
> > On Mon, Apr 06, 2020 at 11:11:21AM +0200, Ard Biesheuvel wrote:
> > > Yes, it is in the PE/COFF specification. [0]
> > >
> > > The whole problem is that we are conflating 'loading a PE/COFF image'
> > > with 'copying a PE/COFF image into memory', which are not the same
> > > thing. It is not just the layout issue, we are running into other
> > > problems with things like UEFI secure boot and TPM-based measured
> > > boot, where the fact that omitting the standard LoadImage() boot
> > > service (which takes care of these things under the hood) means that
> > > you now have to do your own checks and measurements. These things are
> > > literally all over the place at the moment, shim, GRUB, systemd-boot
> > > etc, with no authoritative spec that describes which component should
> > > be doing what.
> >
> > Sounds to me like what LoadImage() does is what the authoritative spec
> > should be. Perhaps we should write it down as "Do what LoadImage()
> > does... " and then enumerate the requirements.
> >
> > > Commit ec93fc371f014a6fb483e3556061ecad4b40735c has the background, but ...
> >
> > Nice, I like the aspect of letting firmware do only a minimum amount of
> > work. :)
> >
> > > ... I'll look into updating the documentation as well.
> >
> > Thanks!
> >
> > > Note that this stuff is hot off the press, so there may be some issues
> > > lurking (like this one) that we hadn't thought of yet.
> >
> > Right.
> >
> > > Actually, it may be sufficient to #define __efistub_global to
> > > __section(.data) like we already do for ARM, to ensure that these
> > > global flags are always initialized correctly. (I'll wait for Sergey
> > > to confirm that the spurious enabling of the PCI DMA protection
> > > resulting from this BSS issue is causing the boot regression)
>
> Yeah I thought of that as the easiest fix, but it might be safer to
> explicitly zero-init in efi_main to avoid future problems in case
> someone adds another variable in bss and isn't aware of this obscure
> requirement. We actually already have sys_table in bss, but that one is
> always initialized. There's also other globals that aren't annotated
> (but not in bss by virtue of having initializers). What do you think?
>

*If* we zero init BSS, I'd prefer for it to be done in the EFI
handover protocol entrypoint only. But does that fix the issue that
BSS lives outside of the memory footprint of the kernel image?


> What do you think of the other problem -- that's actually worse to fix,
> as it won't just be when kaslr is disabled, the startup_64 code will do
> relocation to the end of init_size and clobber the initrd before getting
> to the kaslr code, so it will break as soon as the firmware loads the
> "unified kernel image" at a 2Mb-aligned address. The only thing I can
> think of is to just unconditionally call efi_relocate_kernel if we were
> entered via handover_entry?
>

Yes, that seems to be the most robust approach.

2020-04-06 16:24:31

by Ard Biesheuvel

[permalink] [raw]
Subject: Re: [PATCH 1/2] x86/boot/compressed/64: Remove .bss/.pgtable from bzImage

On Mon, 6 Apr 2020 at 18:01, Arvind Sankar <[email protected]> wrote:
>
> On Mon, Apr 06, 2020 at 03:29:18PM +0200, Ard Biesheuvel wrote:
> > > >
> > > > > Actually, it may be sufficient to #define __efistub_global to
> > > > > __section(.data) like we already do for ARM, to ensure that these
> > > > > global flags are always initialized correctly. (I'll wait for Sergey
> > > > > to confirm that the spurious enabling of the PCI DMA protection
> > > > > resulting from this BSS issue is causing the boot regression)
> > >
> > > Yeah I thought of that as the easiest fix, but it might be safer to
> > > explicitly zero-init in efi_main to avoid future problems in case
> > > someone adds another variable in bss and isn't aware of this obscure
> > > requirement. We actually already have sys_table in bss, but that one is
> > > always initialized. There's also other globals that aren't annotated
> > > (but not in bss by virtue of having initializers). What do you think?
> > >
> >
> > *If* we zero init BSS, I'd prefer for it to be done in the EFI
> > handover protocol entrypoint only. But does that fix the issue that
> > BSS lives outside of the memory footprint of the kernel image?
> >
>
> Yes, I was thinking of only doing it if we didn't come through the
> pe_entry. We could also avoid re-parsing the command line if we add a
> global flag to indicate that.
>

Yeah. I was trying to avoid that, but if we end up needing to
distinguish between the two cases anyway, we might just as well
optimize that too.

> Regarding the memory footprint, if there's no initrd that might be a
> problem, since in that case ImageSize will only cover the actual data
> from the bzImage, so it would be safer to move them to data (including
> sys_table).
>


> We could just pull the stub's bss section all into data with objcopy
> similar to what ARM64 does [1]? i.e. rename .bss to .bss.efistub and
> then pull that into .data in the linker script for the compressed
> kernel?
>

I don't follow. I'm not aware of arm64 doing anything out of the
ordinary with .bss? Note that arm64 does not have a separate
decompressor, so the EFI stub is part of the kernel proper. This is
why sections are renamed to start with .init

> There is also this scary looking comment in gnu-efi's linker script:
> /* the EFI loader doesn't seem to like a .bss section, so we stick
> it all into .data: */
> I don't know what the history of that is.
>

I don't remember, to be honest, but I'm pretty sure I copy-pasted that
from elsewhere at the time.

> [1] As an aside, why doesn't ARM do this as well rather than using the
> section(.data) annotation?
>

The ARM decompressor has this hideous hack, where text [and rodata]
executes straight from flash, and BSS is relocated into DRAM. In order
for this to work, it actually *requires* GOT indirections for BSS
items, to ensure that all BSS references use absolute addresses, which
can be relocated independently from the rest of the kernel image. This
is the reason ARM does not permit a .data section in the decompressor.
However, the EFI stub does not support execute in place from flash,
and so we can permit a .data section there. At the same time, we don't
support GOT indirections in the EFI stub, so we cannot use the BSS
section. So instead, we just put the few BSS pieces we have into .data
instead.






> >
> > > What do you think of the other problem -- that's actually worse to fix,
> > > as it won't just be when kaslr is disabled, the startup_64 code will do
> > > relocation to the end of init_size and clobber the initrd before getting
> > > to the kaslr code, so it will break as soon as the firmware loads the
> > > "unified kernel image" at a 2Mb-aligned address. The only thing I can
> > > think of is to just unconditionally call efi_relocate_kernel if we were
> > > entered via handover_entry?
> > >
> >
> > Yes, that seems to be the most robust approach.

2020-04-06 17:00:08

by Arvind Sankar

[permalink] [raw]
Subject: Re: [PATCH 1/2] x86/boot/compressed/64: Remove .bss/.pgtable from bzImage

On Mon, Apr 06, 2020 at 03:29:18PM +0200, Ard Biesheuvel wrote:
> > >
> > > > Actually, it may be sufficient to #define __efistub_global to
> > > > __section(.data) like we already do for ARM, to ensure that these
> > > > global flags are always initialized correctly. (I'll wait for Sergey
> > > > to confirm that the spurious enabling of the PCI DMA protection
> > > > resulting from this BSS issue is causing the boot regression)
> >
> > Yeah I thought of that as the easiest fix, but it might be safer to
> > explicitly zero-init in efi_main to avoid future problems in case
> > someone adds another variable in bss and isn't aware of this obscure
> > requirement. We actually already have sys_table in bss, but that one is
> > always initialized. There's also other globals that aren't annotated
> > (but not in bss by virtue of having initializers). What do you think?
> >
>
> *If* we zero init BSS, I'd prefer for it to be done in the EFI
> handover protocol entrypoint only. But does that fix the issue that
> BSS lives outside of the memory footprint of the kernel image?
>

Yes, I was thinking of only doing it if we didn't come through the
pe_entry. We could also avoid re-parsing the command line if we add a
global flag to indicate that.

Regarding the memory footprint, if there's no initrd that might be a
problem, since in that case ImageSize will only cover the actual data
from the bzImage, so it would be safer to move them to data (including
sys_table).

We could just pull the stub's bss section all into data with objcopy
similar to what ARM64 does [1]? i.e. rename .bss to .bss.efistub and
then pull that into .data in the linker script for the compressed
kernel?

There is also this scary looking comment in gnu-efi's linker script:
/* the EFI loader doesn't seem to like a .bss section, so we stick
it all into .data: */
I don't know what the history of that is.

[1] As an aside, why doesn't ARM do this as well rather than using the
section(.data) annotation?

>
> > What do you think of the other problem -- that's actually worse to fix,
> > as it won't just be when kaslr is disabled, the startup_64 code will do
> > relocation to the end of init_size and clobber the initrd before getting
> > to the kaslr code, so it will break as soon as the firmware loads the
> > "unified kernel image" at a 2Mb-aligned address. The only thing I can
> > think of is to just unconditionally call efi_relocate_kernel if we were
> > entered via handover_entry?
> >
>
> Yes, that seems to be the most robust approach.

2020-04-06 17:44:06

by Arvind Sankar

[permalink] [raw]
Subject: Re: [PATCH 1/2] x86/boot/compressed/64: Remove .bss/.pgtable from bzImage

On Mon, Apr 06, 2020 at 06:22:33PM +0200, Ard Biesheuvel wrote:
>
> > We could just pull the stub's bss section all into data with objcopy
> > similar to what ARM64 does [1]? i.e. rename .bss to .bss.efistub and
> > then pull that into .data in the linker script for the compressed
> > kernel?
> >
>
> I don't follow. I'm not aware of arm64 doing anything out of the
> ordinary with .bss? Note that arm64 does not have a separate
> decompressor, so the EFI stub is part of the kernel proper. This is
> why sections are renamed to start with .init

ARM64 renames the stub sections prefixing them with .init, and that
includes .bss. The ARM64 linker script then puts .init.bss into the
.init.data section. So I was suggesting doing something similar with the
x86 stub, renaming .bss to .bss.efistub, and then putting that into
.data via linker script.

Anyway, I'll do the section annotation for now as that will be less
churn and we can revisit this after that.

>
> > There is also this scary looking comment in gnu-efi's linker script:
> > /* the EFI loader doesn't seem to like a .bss section, so we stick
> > it all into .data: */
> > I don't know what the history of that is.
> >
>
> I don't remember, to be honest, but I'm pretty sure I copy-pasted that
> from elsewhere at the time.
>
> > [1] As an aside, why doesn't ARM do this as well rather than using the
> > section(.data) annotation?
> >
>
> The ARM decompressor has this hideous hack, where text [and rodata]
> executes straight from flash, and BSS is relocated into DRAM. In order
> for this to work, it actually *requires* GOT indirections for BSS
> items, to ensure that all BSS references use absolute addresses, which
> can be relocated independently from the rest of the kernel image. This
> is the reason ARM does not permit a .data section in the decompressor.
> However, the EFI stub does not support execute in place from flash,
> and so we can permit a .data section there. At the same time, we don't
> support GOT indirections in the EFI stub, so we cannot use the BSS
> section. So instead, we just put the few BSS pieces we have into .data
> instead.
>

I see, thanks.

2020-04-06 17:44:37

by Ard Biesheuvel

[permalink] [raw]
Subject: Re: [PATCH 1/2] x86/boot/compressed/64: Remove .bss/.pgtable from bzImage

On Mon, 6 Apr 2020 at 18:52, Arvind Sankar <[email protected]> wrote:
>
> On Mon, Apr 06, 2020 at 06:22:33PM +0200, Ard Biesheuvel wrote:
> >
> > > We could just pull the stub's bss section all into data with objcopy
> > > similar to what ARM64 does [1]? i.e. rename .bss to .bss.efistub and
> > > then pull that into .data in the linker script for the compressed
> > > kernel?
> > >
> >
> > I don't follow. I'm not aware of arm64 doing anything out of the
> > ordinary with .bss? Note that arm64 does not have a separate
> > decompressor, so the EFI stub is part of the kernel proper. This is
> > why sections are renamed to start with .init
>
> ARM64 renames the stub sections prefixing them with .init, and that
> includes .bss. The ARM64 linker script then puts .init.bss into the
> .init.data section. So I was suggesting doing something similar with the
> x86 stub, renaming .bss to .bss.efistub, and then putting that into
> .data via linker script.
>

OK, I see what you mean now. IIRC, putting that into .init.data was
simply because there was no way to selectively prefix sections, and
leave .bss alone.

But I guess we can combine all these different histories and
rationales into one coherent way of managing the stub's .bss.


> Anyway, I'll do the section annotation for now as that will be less
> churn and we can revisit this after that.
>
> >
> > > There is also this scary looking comment in gnu-efi's linker script:
> > > /* the EFI loader doesn't seem to like a .bss section, so we stick
> > > it all into .data: */
> > > I don't know what the history of that is.
> > >
> >
> > I don't remember, to be honest, but I'm pretty sure I copy-pasted that
> > from elsewhere at the time.
> >
> > > [1] As an aside, why doesn't ARM do this as well rather than using the
> > > section(.data) annotation?
> > >
> >
> > The ARM decompressor has this hideous hack, where text [and rodata]
> > executes straight from flash, and BSS is relocated into DRAM. In order
> > for this to work, it actually *requires* GOT indirections for BSS
> > items, to ensure that all BSS references use absolute addresses, which
> > can be relocated independently from the rest of the kernel image. This
> > is the reason ARM does not permit a .data section in the decompressor.
> > However, the EFI stub does not support execute in place from flash,
> > and so we can permit a .data section there. At the same time, we don't
> > support GOT indirections in the EFI stub, so we cannot use the BSS
> > section. So instead, we just put the few BSS pieces we have into .data
> > instead.
> >
>
> I see, thanks.

2020-04-06 17:45:54

by Borislav Petkov

[permalink] [raw]
Subject: Re: [PATCH 1/2] x86/boot/compressed/64: Remove .bss/.pgtable from bzImage

On Mon, Apr 06, 2020 at 03:29:18PM +0200, Ard Biesheuvel wrote:
> > What do you think of the other problem -- that's actually worse to fix,
> > as it won't just be when kaslr is disabled, the startup_64 code will do
> > relocation to the end of init_size and clobber the initrd before getting
> > to the kaslr code, so it will break as soon as the firmware loads the
> > "unified kernel image" at a 2Mb-aligned address. The only thing I can
> > think of is to just unconditionally call efi_relocate_kernel if we were
> > entered via handover_entry?
> >
>
> Yes, that seems to be the most robust approach.

The commit in question is this one:

d5cdf4cfeac9 ("efi/x86: Don't relocate the kernel unless necessary")

I presume?

I'm guessing it can simply be reverted as it doesn't fix a bug but it is
just an optimization... provided I'm not missing something, of course.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette

2020-04-06 18:07:39

by Arvind Sankar

[permalink] [raw]
Subject: [PATCH 1/2] efi/x86: Move efi stub globals from .bss to .data

Commit

3ee372ccce4d ("x86/boot/compressed/64: Remove .bss/.pgtable from
bzImage")

removed the .bss section from the bzImage.

However, while a PE loader is required to zero-initialize the .bss
section before calling the PE entry point, the EFI handover protocol
does not currently document any requirement that .bss be initialized by
the bootloader prior to calling the handover entry.

When systemd-boot is used to boot a unified kernel image [1], the image
is constructed by embedding the bzImage as a .linux section in a PE
executable that contains a small stub loader from systemd together with
additional sections and potentially an initrd. As the .bss section
within the bzImage is no longer explicitly present as part of the file,
it is not initialized before calling the EFI handover entry.
Furthermore, as the size of the embedded .linux section is only the size
of the bzImage file itself, the .bss section's memory may not even have
been allocated.

In particular, this can result in efi_disable_pci_dma being true even
when it was not specified via the command line or configuration option,
which in turn causes crashes while booting on some systems.

To avoid issues, place all EFI stub global variables into the .data
section instead of .bss. As of this writing, only boolean flags for a
few command line arguments and the sys_table pointer were in .bss and
will now move into the .data section.

[1] https://systemd.io/BOOT_LOADER_SPECIFICATION/#type-2-efi-unified-kernel-images

Signed-off-by: Arvind Sankar <[email protected]>
Reported-by: Sergey Shatunov <[email protected]>
Fixes: 3ee372ccce4d ("x86/boot/compressed/64: Remove .bss/.pgtable from bzImage")
---
drivers/firmware/efi/libstub/efistub.h | 2 +-
drivers/firmware/efi/libstub/x86-stub.c | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/firmware/efi/libstub/efistub.h b/drivers/firmware/efi/libstub/efistub.h
index cc90a748bcf0..67d26949fd26 100644
--- a/drivers/firmware/efi/libstub/efistub.h
+++ b/drivers/firmware/efi/libstub/efistub.h
@@ -25,7 +25,7 @@
#define EFI_ALLOC_ALIGN EFI_PAGE_SIZE
#endif

-#ifdef CONFIG_ARM
+#if defined(CONFIG_ARM) || defined(CONFIG_X86)
#define __efistub_global __section(.data)
#else
#define __efistub_global
diff --git a/drivers/firmware/efi/libstub/x86-stub.c b/drivers/firmware/efi/libstub/x86-stub.c
index 8d3a707789de..e7af6d2eddbf 100644
--- a/drivers/firmware/efi/libstub/x86-stub.c
+++ b/drivers/firmware/efi/libstub/x86-stub.c
@@ -20,7 +20,7 @@
/* Maximum physical address for 64-bit kernel with 4-level paging */
#define MAXMEM_X86_64_4LEVEL (1ull << 46)

-static efi_system_table_t *sys_table;
+static efi_system_table_t *sys_table __efistub_global;
extern const bool efi_is64;
extern u32 image_offset;

--
2.24.1

2020-04-06 18:08:01

by Arvind Sankar

[permalink] [raw]
Subject: [PATCH 2/2] efi/x86: Always relocate the kernel for EFI handover entry

Commit

d5cdf4cfeac9 ("efi/x86: Don't relocate the kernel unless necessary")

tries to avoid relocating the kernel in the EFI stub as far as possible.

However, when systemd-boot is used to boot a unified kernel image [1],
the image is constructed by embedding the bzImage as a .linux section in
a PE executable that contains a small stub loader from systemd that will
call the EFI stub handover entry, together with additional sections and
potentially an initrd. When this image is constructed, by for example
dracut, the initrd is placed after the bzImage without ensuring that at
least init_size bytes are available for the bzImage. If the kernel is
not relocated by the EFI stub, this could result in the compressed
kernel's startup code in head_{32,64}.S overwriting the initrd.

To prevent this, unconditionally relocate the kernel if the EFI stub was
entered via the handover entry point.

[1] https://systemd.io/BOOT_LOADER_SPECIFICATION/#type-2-efi-unified-kernel-images

Signed-off-by: Arvind Sankar <[email protected]>
Reported-by: Sergey Shatunov <[email protected]>
Fixes: d5cdf4cfeac9 ("efi/x86: Don't relocate the kernel unless necessary")
---
drivers/firmware/efi/libstub/x86-stub.c | 14 ++++++++++----
1 file changed, 10 insertions(+), 4 deletions(-)

diff --git a/drivers/firmware/efi/libstub/x86-stub.c b/drivers/firmware/efi/libstub/x86-stub.c
index e7af6d2eddbf..7583e908852f 100644
--- a/drivers/firmware/efi/libstub/x86-stub.c
+++ b/drivers/firmware/efi/libstub/x86-stub.c
@@ -742,8 +742,15 @@ unsigned long efi_main(efi_handle_t handle,
* now use KERNEL_IMAGE_SIZE, which will be 512MiB, the same as what
* KASLR uses.
*
- * Also relocate it if image_offset is zero, i.e. we weren't loaded by
- * LoadImage, but we are not aligned correctly.
+ * Also relocate it if image_offset is zero, i.e. the kernel wasn't
+ * loaded by LoadImage, but rather by a bootloader that called the
+ * handover entry. The reason we must always relocate in this case is
+ * to handle the case of systemd-boot booting a unified kernel image,
+ * which is a PE executable that contains the bzImage and an initrd as
+ * COFF sections. The initrd section is placed after the bzImage
+ * without ensuring that there are at least init_size bytes available
+ * for the bzImage, and thus the compressed kernel's startup code may
+ * overwrite the initrd unless it is moved out of the way.
*/

buffer_start = ALIGN(bzimage_addr - image_offset,
@@ -753,8 +760,7 @@ unsigned long efi_main(efi_handle_t handle,
if ((buffer_start < LOAD_PHYSICAL_ADDR) ||
(IS_ENABLED(CONFIG_X86_32) && buffer_end > KERNEL_IMAGE_SIZE) ||
(IS_ENABLED(CONFIG_X86_64) && buffer_end > MAXMEM_X86_64_4LEVEL) ||
- (image_offset == 0 && !IS_ALIGNED(bzimage_addr,
- hdr->kernel_alignment))) {
+ (image_offset == 0)) {
status = efi_relocate_kernel(&bzimage_addr,
hdr->init_size, hdr->init_size,
hdr->pref_address,
--
2.24.1

2020-04-06 18:32:17

by Ard Biesheuvel

[permalink] [raw]
Subject: Re: [PATCH 1/2] efi/x86: Move efi stub globals from .bss to .data

On Mon, 6 Apr 2020 at 20:06, Arvind Sankar <[email protected]> wrote:
>
> Commit
>
> 3ee372ccce4d ("x86/boot/compressed/64: Remove .bss/.pgtable from
> bzImage")
>
> removed the .bss section from the bzImage.
>
> However, while a PE loader is required to zero-initialize the .bss
> section before calling the PE entry point, the EFI handover protocol
> does not currently document any requirement that .bss be initialized by
> the bootloader prior to calling the handover entry.
>
> When systemd-boot is used to boot a unified kernel image [1], the image
> is constructed by embedding the bzImage as a .linux section in a PE
> executable that contains a small stub loader from systemd together with
> additional sections and potentially an initrd. As the .bss section
> within the bzImage is no longer explicitly present as part of the file,
> it is not initialized before calling the EFI handover entry.
> Furthermore, as the size of the embedded .linux section is only the size
> of the bzImage file itself, the .bss section's memory may not even have
> been allocated.
>
> In particular, this can result in efi_disable_pci_dma being true even
> when it was not specified via the command line or configuration option,
> which in turn causes crashes while booting on some systems.
>
> To avoid issues, place all EFI stub global variables into the .data
> section instead of .bss. As of this writing, only boolean flags for a
> few command line arguments and the sys_table pointer were in .bss and
> will now move into the .data section.
>
> [1] https://systemd.io/BOOT_LOADER_SPECIFICATION/#type-2-efi-unified-kernel-images
>
> Signed-off-by: Arvind Sankar <[email protected]>
> Reported-by: Sergey Shatunov <[email protected]>
> Fixes: 3ee372ccce4d ("x86/boot/compressed/64: Remove .bss/.pgtable from bzImage")

Thanks Arvind.

This should be fine for the time being, but going forward, it would
indeed be better [as you suggested] to consolidate the section
tweaking logic that exists on different architectures for entirely
different reasons, but with similar results. That will also ensure
that BSS gets pulled into .data (or .init.data for arm64)
automatically, rather than requiring these manual annotations.

I'll queue these up in efi/urgent, and send them on later this week.


> ---
> drivers/firmware/efi/libstub/efistub.h | 2 +-
> drivers/firmware/efi/libstub/x86-stub.c | 2 +-
> 2 files changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/firmware/efi/libstub/efistub.h b/drivers/firmware/efi/libstub/efistub.h
> index cc90a748bcf0..67d26949fd26 100644
> --- a/drivers/firmware/efi/libstub/efistub.h
> +++ b/drivers/firmware/efi/libstub/efistub.h
> @@ -25,7 +25,7 @@
> #define EFI_ALLOC_ALIGN EFI_PAGE_SIZE
> #endif
>
> -#ifdef CONFIG_ARM
> +#if defined(CONFIG_ARM) || defined(CONFIG_X86)
> #define __efistub_global __section(.data)
> #else
> #define __efistub_global
> diff --git a/drivers/firmware/efi/libstub/x86-stub.c b/drivers/firmware/efi/libstub/x86-stub.c
> index 8d3a707789de..e7af6d2eddbf 100644
> --- a/drivers/firmware/efi/libstub/x86-stub.c
> +++ b/drivers/firmware/efi/libstub/x86-stub.c
> @@ -20,7 +20,7 @@
> /* Maximum physical address for 64-bit kernel with 4-level paging */
> #define MAXMEM_X86_64_4LEVEL (1ull << 46)
>
> -static efi_system_table_t *sys_table;
> +static efi_system_table_t *sys_table __efistub_global;
> extern const bool efi_is64;
> extern u32 image_offset;
>
> --
> 2.24.1
>

2020-04-08 07:58:36

by Ard Biesheuvel

[permalink] [raw]
Subject: Re: [PATCH 1/2] efi/x86: Move efi stub globals from .bss to .data

(add Peter, Leif and Daniel)

On Wed, 8 Apr 2020 at 09:43, Dave Young <[email protected]> wrote:
>
> On 04/06/20 at 02:06pm, Arvind Sankar wrote:
> > Commit
> >
> > 3ee372ccce4d ("x86/boot/compressed/64: Remove .bss/.pgtable from
> > bzImage")
> >
> > removed the .bss section from the bzImage.
> >
> > However, while a PE loader is required to zero-initialize the .bss
> > section before calling the PE entry point, the EFI handover protocol
> > does not currently document any requirement that .bss be initialized by
> > the bootloader prior to calling the handover entry.
> >
> > When systemd-boot is used to boot a unified kernel image [1], the image
> > is constructed by embedding the bzImage as a .linux section in a PE
> > executable that contains a small stub loader from systemd together with
> > additional sections and potentially an initrd. As the .bss section
> > within the bzImage is no longer explicitly present as part of the file,
> > it is not initialized before calling the EFI handover entry.
> > Furthermore, as the size of the embedded .linux section is only the size
> > of the bzImage file itself, the .bss section's memory may not even have
> > been allocated.
>
> I did not follow up the old report, maybe I missed something. But not
> sure why only systemd-boot is mentioned here. I also have similar issue
> with early efi failure. With these two patches applied, it works well
> then.
>
> BTW, I use Fedora 31 + Grub2
>

OK, so I take it this means that GRUB's PE/COFF loader does not
zero-initialize BSS either? Does it honor the image size in memory if
it exceeds the file size?

2020-04-08 08:13:56

by Dave Young

[permalink] [raw]
Subject: Re: [PATCH 1/2] efi/x86: Move efi stub globals from .bss to .data

On 04/06/20 at 02:06pm, Arvind Sankar wrote:
> Commit
>
> 3ee372ccce4d ("x86/boot/compressed/64: Remove .bss/.pgtable from
> bzImage")
>
> removed the .bss section from the bzImage.
>
> However, while a PE loader is required to zero-initialize the .bss
> section before calling the PE entry point, the EFI handover protocol
> does not currently document any requirement that .bss be initialized by
> the bootloader prior to calling the handover entry.
>
> When systemd-boot is used to boot a unified kernel image [1], the image
> is constructed by embedding the bzImage as a .linux section in a PE
> executable that contains a small stub loader from systemd together with
> additional sections and potentially an initrd. As the .bss section
> within the bzImage is no longer explicitly present as part of the file,
> it is not initialized before calling the EFI handover entry.
> Furthermore, as the size of the embedded .linux section is only the size
> of the bzImage file itself, the .bss section's memory may not even have
> been allocated.

I did not follow up the old report, maybe I missed something. But not
sure why only systemd-boot is mentioned here. I also have similar issue
with early efi failure. With these two patches applied, it works well
then.

BTW, I use Fedora 31 + Grub2

>
> In particular, this can result in efi_disable_pci_dma being true even
> when it was not specified via the command line or configuration option,
> which in turn causes crashes while booting on some systems.
>
> To avoid issues, place all EFI stub global variables into the .data
> section instead of .bss. As of this writing, only boolean flags for a
> few command line arguments and the sys_table pointer were in .bss and
> will now move into the .data section.
>
> [1] https://systemd.io/BOOT_LOADER_SPECIFICATION/#type-2-efi-unified-kernel-images
>
> Signed-off-by: Arvind Sankar <[email protected]>
> Reported-by: Sergey Shatunov <[email protected]>
> Fixes: 3ee372ccce4d ("x86/boot/compressed/64: Remove .bss/.pgtable from bzImage")
> ---
> drivers/firmware/efi/libstub/efistub.h | 2 +-
> drivers/firmware/efi/libstub/x86-stub.c | 2 +-
> 2 files changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/firmware/efi/libstub/efistub.h b/drivers/firmware/efi/libstub/efistub.h
> index cc90a748bcf0..67d26949fd26 100644
> --- a/drivers/firmware/efi/libstub/efistub.h
> +++ b/drivers/firmware/efi/libstub/efistub.h
> @@ -25,7 +25,7 @@
> #define EFI_ALLOC_ALIGN EFI_PAGE_SIZE
> #endif
>
> -#ifdef CONFIG_ARM
> +#if defined(CONFIG_ARM) || defined(CONFIG_X86)
> #define __efistub_global __section(.data)
> #else
> #define __efistub_global
> diff --git a/drivers/firmware/efi/libstub/x86-stub.c b/drivers/firmware/efi/libstub/x86-stub.c
> index 8d3a707789de..e7af6d2eddbf 100644
> --- a/drivers/firmware/efi/libstub/x86-stub.c
> +++ b/drivers/firmware/efi/libstub/x86-stub.c
> @@ -20,7 +20,7 @@
> /* Maximum physical address for 64-bit kernel with 4-level paging */
> #define MAXMEM_X86_64_4LEVEL (1ull << 46)
>
> -static efi_system_table_t *sys_table;
> +static efi_system_table_t *sys_table __efistub_global;
> extern const bool efi_is64;
> extern u32 image_offset;
>
> --
> 2.24.1
>

Thanks
Dave

2020-04-09 14:40:25

by Arvind Sankar

[permalink] [raw]
Subject: Re: [PATCH 1/2] efi/x86: Move efi stub globals from .bss to .data

On Wed, Apr 08, 2020 at 09:49:15AM +0200, Ard Biesheuvel wrote:
> (add Peter, Leif and Daniel)
>
> On Wed, 8 Apr 2020 at 09:43, Dave Young <[email protected]> wrote:
> >
> > On 04/06/20 at 02:06pm, Arvind Sankar wrote:
> > > Commit
> > >
> > > 3ee372ccce4d ("x86/boot/compressed/64: Remove .bss/.pgtable from
> > > bzImage")
> > >
> > > removed the .bss section from the bzImage.
> > >
> > > However, while a PE loader is required to zero-initialize the .bss
> > > section before calling the PE entry point, the EFI handover protocol
> > > does not currently document any requirement that .bss be initialized by
> > > the bootloader prior to calling the handover entry.
> > >
> > > When systemd-boot is used to boot a unified kernel image [1], the image
> > > is constructed by embedding the bzImage as a .linux section in a PE
> > > executable that contains a small stub loader from systemd together with
> > > additional sections and potentially an initrd. As the .bss section
> > > within the bzImage is no longer explicitly present as part of the file,
> > > it is not initialized before calling the EFI handover entry.
> > > Furthermore, as the size of the embedded .linux section is only the size
> > > of the bzImage file itself, the .bss section's memory may not even have
> > > been allocated.
> >
> > I did not follow up the old report, maybe I missed something. But not
> > sure why only systemd-boot is mentioned here. I also have similar issue
> > with early efi failure. With these two patches applied, it works well
> > then.
> >
> > BTW, I use Fedora 31 + Grub2
> >
>
> OK, so I take it this means that GRUB's PE/COFF loader does not
> zero-initialize BSS either? Does it honor the image size in memory if
> it exceeds the file size?

Dave, that comment was because the previous report was for systemd-boot
stub.

Ard, should I revise the commit message to make it clear it's not
restricted to systemd-boot but anything using handover entry may be
affected? Maybe just a "for example, when systemd-boot..." and then a
line to say grub2 with the EFI stub patches is also impacted?

https://src.fedoraproject.org/rpms/grub2/blob/f31/f/0001-Add-support-for-Linux-EFI-stub-loading.patch#_743

+ kernel_mem = grub_efi_allocate_pages_max(lh.pref_address,
+ BYTES_TO_PAGES(lh.init_size));

Looking at this, grub does allocate init_size for the image, but it
doesn't zero it out.

This call also looks wrong to me though. It allocates at max address of
pref_address, which, if it succeeds, will guarantee that the kernel gets
loaded entirely below pref_address == LOAD_PHYSICAL_ADDR. In native
mode, if it weren't for the EFI stub copying the kernel again, this
would cause the startup code to relocate the kernel into unallocated
memory. On a mixed-mode boot, this would cause the early page tables
setup prior to transitioning to 64-bit mode to be in unallocated memory
and potentially get clobbered by the EFI stub.

The first try to allocate pref_address should be calling
grub_efi_allocate_fixed instead.

2020-04-09 14:50:16

by Ard Biesheuvel

[permalink] [raw]
Subject: Re: [PATCH 1/2] efi/x86: Move efi stub globals from .bss to .data

On Thu, 9 Apr 2020 at 16:39, Arvind Sankar <[email protected]> wrote:
>
> On Wed, Apr 08, 2020 at 09:49:15AM +0200, Ard Biesheuvel wrote:
> > (add Peter, Leif and Daniel)
> >
> > On Wed, 8 Apr 2020 at 09:43, Dave Young <[email protected]> wrote:
> > >
> > > On 04/06/20 at 02:06pm, Arvind Sankar wrote:
> > > > Commit
> > > >
> > > > 3ee372ccce4d ("x86/boot/compressed/64: Remove .bss/.pgtable from
> > > > bzImage")
> > > >
> > > > removed the .bss section from the bzImage.
> > > >
> > > > However, while a PE loader is required to zero-initialize the .bss
> > > > section before calling the PE entry point, the EFI handover protocol
> > > > does not currently document any requirement that .bss be initialized by
> > > > the bootloader prior to calling the handover entry.
> > > >
> > > > When systemd-boot is used to boot a unified kernel image [1], the image
> > > > is constructed by embedding the bzImage as a .linux section in a PE
> > > > executable that contains a small stub loader from systemd together with
> > > > additional sections and potentially an initrd. As the .bss section
> > > > within the bzImage is no longer explicitly present as part of the file,
> > > > it is not initialized before calling the EFI handover entry.
> > > > Furthermore, as the size of the embedded .linux section is only the size
> > > > of the bzImage file itself, the .bss section's memory may not even have
> > > > been allocated.
> > >
> > > I did not follow up the old report, maybe I missed something. But not
> > > sure why only systemd-boot is mentioned here. I also have similar issue
> > > with early efi failure. With these two patches applied, it works well
> > > then.
> > >
> > > BTW, I use Fedora 31 + Grub2
> > >
> >
> > OK, so I take it this means that GRUB's PE/COFF loader does not
> > zero-initialize BSS either? Does it honor the image size in memory if
> > it exceeds the file size?
>
> Dave, that comment was because the previous report was for systemd-boot
> stub.
>
> Ard, should I revise the commit message to make it clear it's not
> restricted to systemd-boot but anything using handover entry may be
> affected? Maybe just a "for example, when systemd-boot..." and then a
> line to say grub2 with the EFI stub patches is also impacted?
>

Well, the fact the /some/ piece of software is used in production that
relies on the ill-defined EFI handover protocol is sufficient
justification, so I don't think it is hugely important to update it.

> https://src.fedoraproject.org/rpms/grub2/blob/f31/f/0001-Add-support-for-Linux-EFI-stub-loading.patch#_743
>
> + kernel_mem = grub_efi_allocate_pages_max(lh.pref_address,
> + BYTES_TO_PAGES(lh.init_size));
>
> Looking at this, grub does allocate init_size for the image, but it
> doesn't zero it out.
>
> This call also looks wrong to me though. It allocates at max address of
> pref_address, which, if it succeeds, will guarantee that the kernel gets
> loaded entirely below pref_address == LOAD_PHYSICAL_ADDR. In native
> mode, if it weren't for the EFI stub copying the kernel again, this
> would cause the startup code to relocate the kernel into unallocated
> memory. On a mixed-mode boot, this would cause the early page tables
> setup prior to transitioning to 64-bit mode to be in unallocated memory
> and potentially get clobbered by the EFI stub.
>
> The first try to allocate pref_address should be calling
> grub_efi_allocate_fixed instead.

Thanks Arvind. I'm sure the Fedora/RedHat folks on cc should be able
to get these logged somewhere.

2020-04-09 17:28:08

by Arvind Sankar

[permalink] [raw]
Subject: Re: [PATCH 1/2] efi/x86: Move efi stub globals from .bss to .data

On Thu, Apr 09, 2020 at 04:47:55PM +0200, Ard Biesheuvel wrote:
> On Thu, 9 Apr 2020 at 16:39, Arvind Sankar <[email protected]> wrote:
> >
> > On Wed, Apr 08, 2020 at 09:49:15AM +0200, Ard Biesheuvel wrote:
> > > (add Peter, Leif and Daniel)
> > >
> > > On Wed, 8 Apr 2020 at 09:43, Dave Young <[email protected]> wrote:
> > > >
> > > > On 04/06/20 at 02:06pm, Arvind Sankar wrote:
> > > > > Commit
> > > > >
> > > > > 3ee372ccce4d ("x86/boot/compressed/64: Remove .bss/.pgtable from
> > > > > bzImage")
> > > > >
> > > > > removed the .bss section from the bzImage.
> > > > >
> > > > > However, while a PE loader is required to zero-initialize the .bss
> > > > > section before calling the PE entry point, the EFI handover protocol
> > > > > does not currently document any requirement that .bss be initialized by
> > > > > the bootloader prior to calling the handover entry.
> > > > >
> > > > > When systemd-boot is used to boot a unified kernel image [1], the image
> > > > > is constructed by embedding the bzImage as a .linux section in a PE
> > > > > executable that contains a small stub loader from systemd together with
> > > > > additional sections and potentially an initrd. As the .bss section
> > > > > within the bzImage is no longer explicitly present as part of the file,
> > > > > it is not initialized before calling the EFI handover entry.
> > > > > Furthermore, as the size of the embedded .linux section is only the size
> > > > > of the bzImage file itself, the .bss section's memory may not even have
> > > > > been allocated.
> > > >
> > > > I did not follow up the old report, maybe I missed something. But not
> > > > sure why only systemd-boot is mentioned here. I also have similar issue
> > > > with early efi failure. With these two patches applied, it works well
> > > > then.
> > > >
> > > > BTW, I use Fedora 31 + Grub2
> > > >
> > >
> > > OK, so I take it this means that GRUB's PE/COFF loader does not
> > > zero-initialize BSS either? Does it honor the image size in memory if
> > > it exceeds the file size?
> >
> > Dave, that comment was because the previous report was for systemd-boot
> > stub.
> >
> > Ard, should I revise the commit message to make it clear it's not
> > restricted to systemd-boot but anything using handover entry may be
> > affected? Maybe just a "for example, when systemd-boot..." and then a
> > line to say grub2 with the EFI stub patches is also impacted?
> >
>
> Well, the fact the /some/ piece of software is used in production that
> relies on the ill-defined EFI handover protocol is sufficient
> justification, so I don't think it is hugely important to update it.
>
> > https://src.fedoraproject.org/rpms/grub2/blob/f31/f/0001-Add-support-for-Linux-EFI-stub-loading.patch#_743
> >
> > + kernel_mem = grub_efi_allocate_pages_max(lh.pref_address,
> > + BYTES_TO_PAGES(lh.init_size));
> >
> > Looking at this, grub does allocate init_size for the image, but it
> > doesn't zero it out.
> >
> > This call also looks wrong to me though. It allocates at max address of
> > pref_address, which, if it succeeds, will guarantee that the kernel gets
> > loaded entirely below pref_address == LOAD_PHYSICAL_ADDR. In native
> > mode, if it weren't for the EFI stub copying the kernel again, this
> > would cause the startup code to relocate the kernel into unallocated
> > memory. On a mixed-mode boot, this would cause the early page tables
> > setup prior to transitioning to 64-bit mode to be in unallocated memory
> > and potentially get clobbered by the EFI stub.
> >
> > The first try to allocate pref_address should be calling
> > grub_efi_allocate_fixed instead.
>
> Thanks Arvind. I'm sure the Fedora/RedHat folks on cc should be able
> to get these logged somewhere.

Ok. For dracut, the process for building the unified kernel image needs
a check to make sure the kernel can fit in the space provided for it --
there is 16MiB of space and the distro bzImage's are up to 10-11MiB in
size, so there's some slack left at present.

Additionally, in mixed-mode, the unified kernel images are quite likely
to end up with early pgtables from startup_32 clobbering the initrd,
independently of the recent kernel changes. Hopefully no-one actually
uses these in mixed-mode.

2020-04-10 11:43:49

by Thomas Meyer

[permalink] [raw]
Subject: Re: [PATCH 1/2] efi/x86: Move efi stub globals from .bss to .data

On Mon, Apr 06, 2020 at 02:06:13PM -0400, Arvind Sankar wrote:

Hi,

I did write an email to [email protected], which sadly seems to have no
mailing list archive, I wonder if my problem has anything to do with the
patches you are discussing here:

I found this reply, which contains my original email in my inbox:

Subject: Kernel v5.5 doesn't boot on my x86 laptop

Hi,

I'm using an old MacBookPro1,1 to run Fedora 30 (the last one to support
x86) and a upstream up-to-date kernel, currently 5.4.16.

I'm using sd-boot to boot into an EFI-enabled kernel which contains
an embedded initram cpio image (because loading the image from kernel's
EFI stub doesn't seem to work for some unknown reason, I tried to debug
this but my early debugging foo is too weak).

Kernel 5.4.x works correctly with this setup (but resuming from disk
seems to have broken in 5.4.x or maybe even earlier - when resuming from
disk I get all kind of funky OOPSes/errors, but that's another story, hopefully
5.5.x was fixed in this regards).

So I did have a look at the commits under arch/x86/boot and "x86/boot:
Introduce setup_indirect" (b3c72fc9a78e74161f9d05ef7191706060628f8c) did
talk about "bump setup_header version in arch/x86/boot/header.S", so I
did revert above commit and I was finally able to boot into v5.5 kernel!

So either sd-boot also needs an upgrade or this commit does break
something.
Any help is welcome, don't hesitate to get in contact with me if you
have any questions.

mfg
thomas

2020-04-10 14:39:22

by Arvind Sankar

[permalink] [raw]
Subject: Re: [PATCH 1/2] efi/x86: Move efi stub globals from .bss to .data

On Fri, Apr 10, 2020 at 01:26:05PM +0200, Thomas Meyer wrote:
> On Mon, Apr 06, 2020 at 02:06:13PM -0400, Arvind Sankar wrote:
>
> Hi,
>
> I did write an email to [email protected], which sadly seems to have no
> mailing list archive, I wonder if my problem has anything to do with the
> patches you are discussing here:
>
> I found this reply, which contains my original email in my inbox:
>
> Subject: Kernel v5.5 doesn't boot on my x86 laptop
>
> Hi,
>
> I'm using an old MacBookPro1,1 to run Fedora 30 (the last one to support
> x86) and a upstream up-to-date kernel, currently 5.4.16.
>
> I'm using sd-boot to boot into an EFI-enabled kernel which contains
> an embedded initram cpio image (because loading the image from kernel's
> EFI stub doesn't seem to work for some unknown reason, I tried to debug
> this but my early debugging foo is too weak).
>
> Kernel 5.4.x works correctly with this setup (but resuming from disk
> seems to have broken in 5.4.x or maybe even earlier - when resuming from
> disk I get all kind of funky OOPSes/errors, but that's another story, hopefully
> 5.5.x was fixed in this regards).
>
> So I did have a look at the commits under arch/x86/boot and "x86/boot:
> Introduce setup_indirect" (b3c72fc9a78e74161f9d05ef7191706060628f8c) did
> talk about "bump setup_header version in arch/x86/boot/header.S", so I
> did revert above commit and I was finally able to boot into v5.5 kernel!
>
> So either sd-boot also needs an upgrade or this commit does break
> something.
> Any help is welcome, don't hesitate to get in contact with me if you
> have any questions.
>
> mfg
> thomas
>

If it is a problem with 5.5, it would be unrelated to this thread, which
is about problems introduced by patches for the upcoming 5.7.

Thanks.

2020-04-10 14:49:01

by Arvind Sankar

[permalink] [raw]
Subject: Re: [PATCH 1/2] efi/x86: Move efi stub globals from .bss to .data

On Thu, Apr 09, 2020 at 12:35:30PM -0400, Arvind Sankar wrote:
> On Thu, Apr 09, 2020 at 04:47:55PM +0200, Ard Biesheuvel wrote:
> > On Thu, 9 Apr 2020 at 16:39, Arvind Sankar <[email protected]> wrote:
> > >
> > > On Wed, Apr 08, 2020 at 09:49:15AM +0200, Ard Biesheuvel wrote:
> > > > (add Peter, Leif and Daniel)
> > > >
> > > > On Wed, 8 Apr 2020 at 09:43, Dave Young <[email protected]> wrote:
> > > > >
> > > > > On 04/06/20 at 02:06pm, Arvind Sankar wrote:
> > > > > > Commit
> > > > > >
> > > > > > 3ee372ccce4d ("x86/boot/compressed/64: Remove .bss/.pgtable from
> > > > > > bzImage")
> > > > > >
> > > > > > removed the .bss section from the bzImage.
> > > > > >
> > > > > > However, while a PE loader is required to zero-initialize the .bss
> > > > > > section before calling the PE entry point, the EFI handover protocol
> > > > > > does not currently document any requirement that .bss be initialized by
> > > > > > the bootloader prior to calling the handover entry.
> > > > > >
> > > > > > When systemd-boot is used to boot a unified kernel image [1], the image
> > > > > > is constructed by embedding the bzImage as a .linux section in a PE
> > > > > > executable that contains a small stub loader from systemd together with
> > > > > > additional sections and potentially an initrd. As the .bss section
> > > > > > within the bzImage is no longer explicitly present as part of the file,
> > > > > > it is not initialized before calling the EFI handover entry.
> > > > > > Furthermore, as the size of the embedded .linux section is only the size
> > > > > > of the bzImage file itself, the .bss section's memory may not even have
> > > > > > been allocated.
> > > > >
> > > > > I did not follow up the old report, maybe I missed something. But not
> > > > > sure why only systemd-boot is mentioned here. I also have similar issue
> > > > > with early efi failure. With these two patches applied, it works well
> > > > > then.
> > > > >
> > > > > BTW, I use Fedora 31 + Grub2
> > > > >
> > > >
> > > > OK, so I take it this means that GRUB's PE/COFF loader does not
> > > > zero-initialize BSS either? Does it honor the image size in memory if
> > > > it exceeds the file size?
> > >
> > > Dave, that comment was because the previous report was for systemd-boot
> > > stub.
> > >
> > > Ard, should I revise the commit message to make it clear it's not
> > > restricted to systemd-boot but anything using handover entry may be
> > > affected? Maybe just a "for example, when systemd-boot..." and then a
> > > line to say grub2 with the EFI stub patches is also impacted?
> > >
> >
> > Well, the fact the /some/ piece of software is used in production that
> > relies on the ill-defined EFI handover protocol is sufficient
> > justification, so I don't think it is hugely important to update it.
> >
> > > https://src.fedoraproject.org/rpms/grub2/blob/f31/f/0001-Add-support-for-Linux-EFI-stub-loading.patch#_743
> > >
> > > + kernel_mem = grub_efi_allocate_pages_max(lh.pref_address,
> > > + BYTES_TO_PAGES(lh.init_size));
> > >
> > > Looking at this, grub does allocate init_size for the image, but it
> > > doesn't zero it out.
> > >
> > > This call also looks wrong to me though. It allocates at max address of
> > > pref_address, which, if it succeeds, will guarantee that the kernel gets
> > > loaded entirely below pref_address == LOAD_PHYSICAL_ADDR. In native
> > > mode, if it weren't for the EFI stub copying the kernel again, this
> > > would cause the startup code to relocate the kernel into unallocated
> > > memory. On a mixed-mode boot, this would cause the early page tables
> > > setup prior to transitioning to 64-bit mode to be in unallocated memory
> > > and potentially get clobbered by the EFI stub.
> > >
> > > The first try to allocate pref_address should be calling
> > > grub_efi_allocate_fixed instead.
> >
> > Thanks Arvind. I'm sure the Fedora/RedHat folks on cc should be able
> > to get these logged somewhere.
>
> Ok. For dracut, the process for building the unified kernel image needs
> a check to make sure the kernel can fit in the space provided for it --
> there is 16MiB of space and the distro bzImage's are up to 10-11MiB in
> size, so there's some slack left at present.
>
> Additionally, in mixed-mode, the unified kernel images are quite likely
> to end up with early pgtables from startup_32 clobbering the initrd,
> independently of the recent kernel changes. Hopefully no-one actually
> uses these in mixed-mode.

The grub EFI handover entry patch is busted in mixed-mode for another
reason -- while it allocates init_size, it doesn't use the correct
alignment. I tested on a Debian buster VM in mixed-mode (that was the
one I was able to get to install/boot with mixed-mode), and the early
pagetable from startup_32 ends up in unallocated memory due to the
rounding up of the bzImage address to account for kernel alignment. This
would be an existing problem prior to these patches.

Should we try to handle this in the kernel? At some point KASLR is going
to pick that memory for the kernel and overwrite the pagetables I would
think, resulting in sporadic crashes that are almost unreproducible.

2020-04-10 15:27:36

by Ard Biesheuvel

[permalink] [raw]
Subject: Re: [PATCH 1/2] efi/x86: Move efi stub globals from .bss to .data

On Fri, 10 Apr 2020 at 16:48, Arvind Sankar <[email protected]> wrote:
>
> On Thu, Apr 09, 2020 at 12:35:30PM -0400, Arvind Sankar wrote:
> > On Thu, Apr 09, 2020 at 04:47:55PM +0200, Ard Biesheuvel wrote:
> > > On Thu, 9 Apr 2020 at 16:39, Arvind Sankar <[email protected]> wrote:
> > > >
> > > > On Wed, Apr 08, 2020 at 09:49:15AM +0200, Ard Biesheuvel wrote:
> > > > > (add Peter, Leif and Daniel)
> > > > >
> > > > > On Wed, 8 Apr 2020 at 09:43, Dave Young <[email protected]> wrote:
> > > > > >
> > > > > > On 04/06/20 at 02:06pm, Arvind Sankar wrote:
> > > > > > > Commit
> > > > > > >
> > > > > > > 3ee372ccce4d ("x86/boot/compressed/64: Remove .bss/.pgtable from
> > > > > > > bzImage")
> > > > > > >
> > > > > > > removed the .bss section from the bzImage.
> > > > > > >
> > > > > > > However, while a PE loader is required to zero-initialize the .bss
> > > > > > > section before calling the PE entry point, the EFI handover protocol
> > > > > > > does not currently document any requirement that .bss be initialized by
> > > > > > > the bootloader prior to calling the handover entry.
> > > > > > >
> > > > > > > When systemd-boot is used to boot a unified kernel image [1], the image
> > > > > > > is constructed by embedding the bzImage as a .linux section in a PE
> > > > > > > executable that contains a small stub loader from systemd together with
> > > > > > > additional sections and potentially an initrd. As the .bss section
> > > > > > > within the bzImage is no longer explicitly present as part of the file,
> > > > > > > it is not initialized before calling the EFI handover entry.
> > > > > > > Furthermore, as the size of the embedded .linux section is only the size
> > > > > > > of the bzImage file itself, the .bss section's memory may not even have
> > > > > > > been allocated.
> > > > > >
> > > > > > I did not follow up the old report, maybe I missed something. But not
> > > > > > sure why only systemd-boot is mentioned here. I also have similar issue
> > > > > > with early efi failure. With these two patches applied, it works well
> > > > > > then.
> > > > > >
> > > > > > BTW, I use Fedora 31 + Grub2
> > > > > >
> > > > >
> > > > > OK, so I take it this means that GRUB's PE/COFF loader does not
> > > > > zero-initialize BSS either? Does it honor the image size in memory if
> > > > > it exceeds the file size?
> > > >
> > > > Dave, that comment was because the previous report was for systemd-boot
> > > > stub.
> > > >
> > > > Ard, should I revise the commit message to make it clear it's not
> > > > restricted to systemd-boot but anything using handover entry may be
> > > > affected? Maybe just a "for example, when systemd-boot..." and then a
> > > > line to say grub2 with the EFI stub patches is also impacted?
> > > >
> > >
> > > Well, the fact the /some/ piece of software is used in production that
> > > relies on the ill-defined EFI handover protocol is sufficient
> > > justification, so I don't think it is hugely important to update it.
> > >
> > > > https://src.fedoraproject.org/rpms/grub2/blob/f31/f/0001-Add-support-for-Linux-EFI-stub-loading.patch#_743
> > > >
> > > > + kernel_mem = grub_efi_allocate_pages_max(lh.pref_address,
> > > > + BYTES_TO_PAGES(lh.init_size));
> > > >
> > > > Looking at this, grub does allocate init_size for the image, but it
> > > > doesn't zero it out.
> > > >
> > > > This call also looks wrong to me though. It allocates at max address of
> > > > pref_address, which, if it succeeds, will guarantee that the kernel gets
> > > > loaded entirely below pref_address == LOAD_PHYSICAL_ADDR. In native
> > > > mode, if it weren't for the EFI stub copying the kernel again, this
> > > > would cause the startup code to relocate the kernel into unallocated
> > > > memory. On a mixed-mode boot, this would cause the early page tables
> > > > setup prior to transitioning to 64-bit mode to be in unallocated memory
> > > > and potentially get clobbered by the EFI stub.
> > > >
> > > > The first try to allocate pref_address should be calling
> > > > grub_efi_allocate_fixed instead.
> > >
> > > Thanks Arvind. I'm sure the Fedora/RedHat folks on cc should be able
> > > to get these logged somewhere.
> >
> > Ok. For dracut, the process for building the unified kernel image needs
> > a check to make sure the kernel can fit in the space provided for it --
> > there is 16MiB of space and the distro bzImage's are up to 10-11MiB in
> > size, so there's some slack left at present.
> >
> > Additionally, in mixed-mode, the unified kernel images are quite likely
> > to end up with early pgtables from startup_32 clobbering the initrd,
> > independently of the recent kernel changes. Hopefully no-one actually
> > uses these in mixed-mode.
>
> The grub EFI handover entry patch is busted in mixed-mode for another
> reason -- while it allocates init_size, it doesn't use the correct
> alignment. I tested on a Debian buster VM in mixed-mode (that was the
> one I was able to get to install/boot with mixed-mode), and the early
> pagetable from startup_32 ends up in unallocated memory due to the
> rounding up of the bzImage address to account for kernel alignment. This
> would be an existing problem prior to these patches.
>
> Should we try to handle this in the kernel? At some point KASLR is going
> to pick that memory for the kernel and overwrite the pagetables I would
> think, resulting in sporadic crashes that are almost unreproducible.

Upstream GRUB does not implement the EFI handover protocol at all, and
the distros all have their own GRUB forks that implement this along
with mixed mode, secure boot, shim, measured boot etc.

What you are saying is that GRUB forks turn out to exist that violate
both the PE/COFF specification and the Linux/x86 boot protocol in a
way that might break mixed mode, and nobody noticed until you happened
to find it by code inspection. While I appreciate the effort, I think
this is where I would like to draw the line, and say that there is
only so much we can do to work around bugs in out-of-tree forks of
other projects. So unless it can be done cleanly and without losing
any of the benefits of the recent cleanup and optimization work, I'd
say don't bother.

2020-04-11 08:50:59

by Thomas Meyer

[permalink] [raw]
Subject: Re: [PATCH 1/2] efi/x86: Move efi stub globals from .bss to .data

On Fri, Apr 10, 2020 at 10:38:21AM -0400, Arvind Sankar wrote:
> On Fri, Apr 10, 2020 at 01:26:05PM +0200, Thomas Meyer wrote:
> > On Mon, Apr 06, 2020 at 02:06:13PM -0400, Arvind Sankar wrote:
> >
> > Hi,
> >
> > I did write an email to [email protected], which sadly seems to have no
> > mailing list archive, I wonder if my problem has anything to do with the
> > patches you are discussing here:
> >
> > I found this reply, which contains my original email in my inbox:
> >
> > Subject: Kernel v5.5 doesn't boot on my x86 laptop
> >
> > Hi,
> >
> > I'm using an old MacBookPro1,1 to run Fedora 30 (the last one to support
> > x86) and a upstream up-to-date kernel, currently 5.4.16.
> >
> > I'm using sd-boot to boot into an EFI-enabled kernel which contains
> > an embedded initram cpio image (because loading the image from kernel's
> > EFI stub doesn't seem to work for some unknown reason, I tried to debug
> > this but my early debugging foo is too weak).
> >
> > Kernel 5.4.x works correctly with this setup (but resuming from disk
> > seems to have broken in 5.4.x or maybe even earlier - when resuming from
> > disk I get all kind of funky OOPSes/errors, but that's another story, hopefully
> > 5.5.x was fixed in this regards).
> >
> > So I did have a look at the commits under arch/x86/boot and "x86/boot:
> > Introduce setup_indirect" (b3c72fc9a78e74161f9d05ef7191706060628f8c) did
> > talk about "bump setup_header version in arch/x86/boot/header.S", so I
> > did revert above commit and I was finally able to boot into v5.5 kernel!
> >
> > So either sd-boot also needs an upgrade or this commit does break
> > something.
> > Any help is welcome, don't hesitate to get in contact with me if you
> > have any questions.
> >
> > mfg
> > thomas
> >
>
> If it is a problem with 5.5, it would be unrelated to this thread, which
> is about problems introduced by patches for the upcoming 5.7.

okay, ping me if I should test something on real hardware.

>
> Thanks.

2020-04-14 16:23:31

by Daniel Kiper

[permalink] [raw]
Subject: Re: [PATCH 1/2] efi/x86: Move efi stub globals from .bss to .data

On Fri, Apr 10, 2020 at 05:26:34PM +0200, Ard Biesheuvel wrote:
> On Fri, 10 Apr 2020 at 16:48, Arvind Sankar <[email protected]> wrote:
> > On Thu, Apr 09, 2020 at 12:35:30PM -0400, Arvind Sankar wrote:
> > > On Thu, Apr 09, 2020 at 04:47:55PM +0200, Ard Biesheuvel wrote:
> > > > On Thu, 9 Apr 2020 at 16:39, Arvind Sankar <[email protected]> wrote:
> > > > >
> > > > > On Wed, Apr 08, 2020 at 09:49:15AM +0200, Ard Biesheuvel wrote:
> > > > > > (add Peter, Leif and Daniel)
> > > > > >
> > > > > > On Wed, 8 Apr 2020 at 09:43, Dave Young <[email protected]> wrote:
> > > > > > >
> > > > > > > On 04/06/20 at 02:06pm, Arvind Sankar wrote:
> > > > > > > > Commit
> > > > > > > >
> > > > > > > > 3ee372ccce4d ("x86/boot/compressed/64: Remove .bss/.pgtable from
> > > > > > > > bzImage")
> > > > > > > >
> > > > > > > > removed the .bss section from the bzImage.
> > > > > > > >
> > > > > > > > However, while a PE loader is required to zero-initialize the .bss
> > > > > > > > section before calling the PE entry point, the EFI handover protocol
> > > > > > > > does not currently document any requirement that .bss be initialized by
> > > > > > > > the bootloader prior to calling the handover entry.
> > > > > > > >
> > > > > > > > When systemd-boot is used to boot a unified kernel image [1], the image
> > > > > > > > is constructed by embedding the bzImage as a .linux section in a PE
> > > > > > > > executable that contains a small stub loader from systemd together with
> > > > > > > > additional sections and potentially an initrd. As the .bss section
> > > > > > > > within the bzImage is no longer explicitly present as part of the file,
> > > > > > > > it is not initialized before calling the EFI handover entry.
> > > > > > > > Furthermore, as the size of the embedded .linux section is only the size
> > > > > > > > of the bzImage file itself, the .bss section's memory may not even have
> > > > > > > > been allocated.
> > > > > > >
> > > > > > > I did not follow up the old report, maybe I missed something. But not
> > > > > > > sure why only systemd-boot is mentioned here. I also have similar issue
> > > > > > > with early efi failure. With these two patches applied, it works well
> > > > > > > then.
> > > > > > >
> > > > > > > BTW, I use Fedora 31 + Grub2
> > > > > > >
> > > > > >
> > > > > > OK, so I take it this means that GRUB's PE/COFF loader does not
> > > > > > zero-initialize BSS either? Does it honor the image size in memory if
> > > > > > it exceeds the file size?
> > > > >
> > > > > Dave, that comment was because the previous report was for systemd-boot
> > > > > stub.
> > > > >
> > > > > Ard, should I revise the commit message to make it clear it's not
> > > > > restricted to systemd-boot but anything using handover entry may be
> > > > > affected? Maybe just a "for example, when systemd-boot..." and then a
> > > > > line to say grub2 with the EFI stub patches is also impacted?
> > > > >
> > > >
> > > > Well, the fact the /some/ piece of software is used in production that
> > > > relies on the ill-defined EFI handover protocol is sufficient
> > > > justification, so I don't think it is hugely important to update it.
> > > >
> > > > > https://src.fedoraproject.org/rpms/grub2/blob/f31/f/0001-Add-support-for-Linux-EFI-stub-loading.patch#_743
> > > > >
> > > > > + kernel_mem = grub_efi_allocate_pages_max(lh.pref_address,
> > > > > + BYTES_TO_PAGES(lh.init_size));
> > > > >
> > > > > Looking at this, grub does allocate init_size for the image, but it
> > > > > doesn't zero it out.
> > > > >
> > > > > This call also looks wrong to me though. It allocates at max address of
> > > > > pref_address, which, if it succeeds, will guarantee that the kernel gets
> > > > > loaded entirely below pref_address == LOAD_PHYSICAL_ADDR. In native
> > > > > mode, if it weren't for the EFI stub copying the kernel again, this
> > > > > would cause the startup code to relocate the kernel into unallocated
> > > > > memory. On a mixed-mode boot, this would cause the early page tables
> > > > > setup prior to transitioning to 64-bit mode to be in unallocated memory
> > > > > and potentially get clobbered by the EFI stub.
> > > > >
> > > > > The first try to allocate pref_address should be calling
> > > > > grub_efi_allocate_fixed instead.
> > > >
> > > > Thanks Arvind. I'm sure the Fedora/RedHat folks on cc should be able
> > > > to get these logged somewhere.
> > >
> > > Ok. For dracut, the process for building the unified kernel image needs
> > > a check to make sure the kernel can fit in the space provided for it --
> > > there is 16MiB of space and the distro bzImage's are up to 10-11MiB in
> > > size, so there's some slack left at present.
> > >
> > > Additionally, in mixed-mode, the unified kernel images are quite likely
> > > to end up with early pgtables from startup_32 clobbering the initrd,
> > > independently of the recent kernel changes. Hopefully no-one actually
> > > uses these in mixed-mode.
> >
> > The grub EFI handover entry patch is busted in mixed-mode for another
> > reason -- while it allocates init_size, it doesn't use the correct
> > alignment. I tested on a Debian buster VM in mixed-mode (that was the
> > one I was able to get to install/boot with mixed-mode), and the early
> > pagetable from startup_32 ends up in unallocated memory due to the
> > rounding up of the bzImage address to account for kernel alignment. This
> > would be an existing problem prior to these patches.
> >
> > Should we try to handle this in the kernel? At some point KASLR is going
> > to pick that memory for the kernel and overwrite the pagetables I would
> > think, resulting in sporadic crashes that are almost unreproducible.
>
> Upstream GRUB does not implement the EFI handover protocol at all, and
> the distros all have their own GRUB forks that implement this along
> with mixed mode, secure boot, shim, measured boot etc.

Exactly...

> What you are saying is that GRUB forks turn out to exist that violate
> both the PE/COFF specification and the Linux/x86 boot protocol in a
> way that might break mixed mode, and nobody noticed until you happened
> to find it by code inspection. While I appreciate the effort, I think
> this is where I would like to draw the line, and say that there is
> only so much we can do to work around bugs in out-of-tree forks of
> other projects. So unless it can be done cleanly and without losing
> any of the benefits of the recent cleanup and optimization work, I'd
> say don't bother.

I fully agree!

Daniel

Subject: [tip: efi/urgent] efi/x86: Always relocate the kernel for EFI handover entry

The following commit has been merged into the efi/urgent branch of tip:

Commit-ID: 21cb9b414301c76f77f70d990a784ad6360e5a20
Gitweb: https://git.kernel.org/tip/21cb9b414301c76f77f70d990a784ad6360e5a20
Author: Arvind Sankar <[email protected]>
AuthorDate: Thu, 09 Apr 2020 15:04:29 +02:00
Committer: Ingo Molnar <[email protected]>
CommitterDate: Tue, 14 Apr 2020 08:32:13 +02:00

efi/x86: Always relocate the kernel for EFI handover entry

Commit

d5cdf4cfeac9 ("efi/x86: Don't relocate the kernel unless necessary")

tries to avoid relocating the kernel in the EFI stub as far as possible.

However, when systemd-boot is used to boot a unified kernel image [1],
the image is constructed by embedding the bzImage as a .linux section in
a PE executable that contains a small stub loader from systemd that will
call the EFI stub handover entry, together with additional sections and
potentially an initrd. When this image is constructed, by for example
dracut, the initrd is placed after the bzImage without ensuring that at
least init_size bytes are available for the bzImage. If the kernel is
not relocated by the EFI stub, this could result in the compressed
kernel's startup code in head_{32,64}.S overwriting the initrd.

To prevent this, unconditionally relocate the kernel if the EFI stub was
entered via the handover entry point.

[1] https://systemd.io/BOOT_LOADER_SPECIFICATION/#type-2-efi-unified-kernel-images

Fixes: d5cdf4cfeac9 ("efi/x86: Don't relocate the kernel unless necessary")
Reported-by: Sergey Shatunov <[email protected]>
Signed-off-by: Arvind Sankar <[email protected]>
Signed-off-by: Ard Biesheuvel <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Link: https://lore.kernel.org/r/[email protected]
---
drivers/firmware/efi/libstub/x86-stub.c | 14 ++++++++++----
1 file changed, 10 insertions(+), 4 deletions(-)

diff --git a/drivers/firmware/efi/libstub/x86-stub.c b/drivers/firmware/efi/libstub/x86-stub.c
index 867a57e..05ccb22 100644
--- a/drivers/firmware/efi/libstub/x86-stub.c
+++ b/drivers/firmware/efi/libstub/x86-stub.c
@@ -740,8 +740,15 @@ unsigned long efi_main(efi_handle_t handle,
* now use KERNEL_IMAGE_SIZE, which will be 512MiB, the same as what
* KASLR uses.
*
- * Also relocate it if image_offset is zero, i.e. we weren't loaded by
- * LoadImage, but we are not aligned correctly.
+ * Also relocate it if image_offset is zero, i.e. the kernel wasn't
+ * loaded by LoadImage, but rather by a bootloader that called the
+ * handover entry. The reason we must always relocate in this case is
+ * to handle the case of systemd-boot booting a unified kernel image,
+ * which is a PE executable that contains the bzImage and an initrd as
+ * COFF sections. The initrd section is placed after the bzImage
+ * without ensuring that there are at least init_size bytes available
+ * for the bzImage, and thus the compressed kernel's startup code may
+ * overwrite the initrd unless it is moved out of the way.
*/

buffer_start = ALIGN(bzimage_addr - image_offset,
@@ -751,8 +758,7 @@ unsigned long efi_main(efi_handle_t handle,
if ((buffer_start < LOAD_PHYSICAL_ADDR) ||
(IS_ENABLED(CONFIG_X86_32) && buffer_end > KERNEL_IMAGE_SIZE) ||
(IS_ENABLED(CONFIG_X86_64) && buffer_end > MAXMEM_X86_64_4LEVEL) ||
- (image_offset == 0 && !IS_ALIGNED(bzimage_addr,
- hdr->kernel_alignment))) {
+ (image_offset == 0)) {
status = efi_relocate_kernel(&bzimage_addr,
hdr->init_size, hdr->init_size,
hdr->pref_address,

Subject: [tip: efi/urgent] efi/x86: Move efi stub globals from .bss to .data

The following commit has been merged into the efi/urgent branch of tip:

Commit-ID: 105cb9544b161819b7be23a8a8419353a3218807
Gitweb: https://git.kernel.org/tip/105cb9544b161819b7be23a8a8419353a3218807
Author: Arvind Sankar <[email protected]>
AuthorDate: Thu, 09 Apr 2020 15:04:28 +02:00
Committer: Ingo Molnar <[email protected]>
CommitterDate: Tue, 14 Apr 2020 08:32:13 +02:00

efi/x86: Move efi stub globals from .bss to .data

Commit

3ee372ccce4d ("x86/boot/compressed/64: Remove .bss/.pgtable from bzImage")

removed the .bss section from the bzImage.

However, while a PE loader is required to zero-initialize the .bss
section before calling the PE entry point, the EFI handover protocol
does not currently document any requirement that .bss be initialized by
the bootloader prior to calling the handover entry.

When systemd-boot is used to boot a unified kernel image [1], the image
is constructed by embedding the bzImage as a .linux section in a PE
executable that contains a small stub loader from systemd together with
additional sections and potentially an initrd. As the .bss section
within the bzImage is no longer explicitly present as part of the file,
it is not initialized before calling the EFI handover entry.
Furthermore, as the size of the embedded .linux section is only the size
of the bzImage file itself, the .bss section's memory may not even have
been allocated.

In particular, this can result in efi_disable_pci_dma being true even
when it was not specified via the command line or configuration option,
which in turn causes crashes while booting on some systems.

To avoid issues, place all EFI stub global variables into the .data
section instead of .bss. As of this writing, only boolean flags for a
few command line arguments and the sys_table pointer were in .bss and
will now move into the .data section.

[1] https://systemd.io/BOOT_LOADER_SPECIFICATION/#type-2-efi-unified-kernel-images

Fixes: 3ee372ccce4d ("x86/boot/compressed/64: Remove .bss/.pgtable from bzImage")
Reported-by: Sergey Shatunov <[email protected]>
Signed-off-by: Arvind Sankar <[email protected]>
Signed-off-by: Ard Biesheuvel <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Link: https://lore.kernel.org/r/[email protected]
---
drivers/firmware/efi/libstub/efistub.h | 2 +-
drivers/firmware/efi/libstub/x86-stub.c | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/firmware/efi/libstub/efistub.h b/drivers/firmware/efi/libstub/efistub.h
index cc90a74..67d2694 100644
--- a/drivers/firmware/efi/libstub/efistub.h
+++ b/drivers/firmware/efi/libstub/efistub.h
@@ -25,7 +25,7 @@
#define EFI_ALLOC_ALIGN EFI_PAGE_SIZE
#endif

-#ifdef CONFIG_ARM
+#if defined(CONFIG_ARM) || defined(CONFIG_X86)
#define __efistub_global __section(.data)
#else
#define __efistub_global
diff --git a/drivers/firmware/efi/libstub/x86-stub.c b/drivers/firmware/efi/libstub/x86-stub.c
index e02ea51..867a57e 100644
--- a/drivers/firmware/efi/libstub/x86-stub.c
+++ b/drivers/firmware/efi/libstub/x86-stub.c
@@ -20,7 +20,7 @@
/* Maximum physical address for 64-bit kernel with 4-level paging */
#define MAXMEM_X86_64_4LEVEL (1ull << 46)

-static efi_system_table_t *sys_table;
+static efi_system_table_t *sys_table __efistub_global;
extern const bool efi_is64;
extern u32 image_offset;