2012-05-18 15:57:17

by H. Peter Anvin

Subject: Urgent: x86-32 and GNU ld 2.22.52.0.1

I need an urgent opinion. It seems we have an epic mess on our hands.

GNU ld 2.22.52.0.1 silently changed the semantics of section-relative
symbols that are part of otherwise empty sections, and silently changes
them to absolute. We rely on section-relative symbols staying
section-relative, and actually have several sections in the linker
script solely for this purpose.

The postprocessor for the x86-32 kernel, relocs.c, currently doesn't
enforce its audited absolute symbols list. As part of the
tip:x86/trampoline rework, however, I made it error out rather than
silently producing bad output.

Ingo has found that with this particular version of GNU ld, the error
triggers. I want to emphasize that this merely catches an error which
the current version of the tool would have allowed to silently go by,
which would have (possibly) caused a failure if the kernel was
subsequently booted in anything but its default location.
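
To make the failure mode concrete, here is a minimal sketch of the arithmetic involved. The addresses and the function name are made up for illustration and are not taken from relocs.c. It only shows that a reference to a section-relative symbol is adjusted by the load offset, while a reference to an absolute symbol is left untouched.

```python
# Hypothetical numbers, for illustration only.
LINK_ADDRESS = 0xC1400000   # address the linker assigned to some symbol
LOAD_DELTA = 0x00200000     # kernel loaded 2 MiB away from its default location


def address_after_relocation(link_address, is_absolute, load_delta):
    """Return the value a reference to the symbol holds after relocation.

    References to section-relative symbols get the load delta added;
    references to absolute symbols are left alone, by definition.
    """
    if is_absolute:
        return link_address
    return link_address + load_delta


# Correctly section-relative: the reference follows the kernel.
print(hex(address_after_relocation(LINK_ADDRESS, False, LOAD_DELTA)))
# Wrongly absolute: the reference still points at the default location.
print(hex(address_after_relocation(LINK_ADDRESS, True, LOAD_DELTA)))
```

With a load delta of zero both cases give the same value, which is consistent with affected kernels booting fine at their default location.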

There are a few ways we can deal with this, but I think we need to do
one or the other:

1. We can blacklist this version of GNU ld.
2. We can uprev the tool to the one from the tip:x86/trampoline work,
with error checking, and give it a list of symbols that should
be relative but may end up as absolute. We risk build errors for
some people if the list isn't complete.
3. We do a minimal forward-port of the error checking into the current
tool.
4. We add to the list of relative symbols in the current version of
the tool without adding the error checking.

However, since it seems clear that we're silently producing corrupt
kernels out of the current build, I think we need a fix for this for 3.4.
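
As a rough sketch of what options 2 and 3 amount to, the check could look something like the following. This is not the code in relocs.c: the function name, the exception, and the `example_abs_` pattern are hypothetical, and the only real symbol names used are `__init_begin` and `__init_end`, which are reported in this thread as showing up absolute. `SHN_ABS` is the standard ELF section index for absolute symbols.

```python
import re

SHN_ABS = 0xFFF1  # reserved ELF section index meaning "absolute symbol"

# Hypothetical pattern standing in for the audited list of symbols that
# really are meant to be absolute.
AUDITED_ABSOLUTE = re.compile(r"^example_abs_")

# Symbols that should be section-relative but that an affected ld may emit
# as absolute. A real list would likely need more entries than this.
SHOULD_BE_RELATIVE = re.compile(r"^__init_(begin|end)$")


class UnauditedAbsoluteSymbol(Exception):
    """Raised instead of silently producing bad output."""


def needs_relocation(name, shndx):
    """Decide whether references to a symbol must be relocated."""
    if shndx != SHN_ABS:
        return True                  # ordinary section-relative symbol
    if AUDITED_ABSOLUTE.match(name):
        return False                 # known, intentionally absolute
    if SHOULD_BE_RELATIVE.match(name):
        return True                  # absolute only because of the linker
    # Anything else is an absolute symbol nobody has audited: fail the build.
    raise UnauditedAbsoluteSymbol(name)
```

Option 4 corresponds to extending `SHOULD_BE_RELATIVE` without the final `raise`, which is why an incomplete list would go unnoticed there.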

-hpa

--
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel. I don't speak on their behalf.


2012-05-18 16:11:40

by H. Peter Anvin

Subject: Re: Urgent: x86-32 and GNU ld 2.22.52.0.1

On 05/18/2012 08:56 AM, H. Peter Anvin wrote:
> I need an urgent opinion. It seems we have an epic mess on our hands.
>
> GNU ld 2.22.52.0.1 silently changed the semantics of section-relative
> symbols that are part of otherwise empty sections, and silently changes
> them to absolute. We rely on section-relative symbols staying
> section-relative, and actually have several sections in the linker
> script solely for this purpose.
>
> The postprocessor for the x86-32 kernel, relocs.c, currently doesn't
> enforce its audited absolute symbols list. As part of the
> tip:x86/trampoline rework, however, I made it error out rather than
> silently producing bad output.
>
> Ingo has found that with this particular version of GNU ld, the error
> triggers. I want to emphasize that this merely catches an error which
> the current version of the tool would have allowed to silently go by,
> which would have (possibly) caused a failure if the kernel was
> subsequently booted in anything but its default location.
>
> There are a few ways we can deal with this, but I think we need to do
> one or the other:
>
> 1. We can blacklist this version of GNU ld.
> 2. We can uprev the tool to the one from the tip:x86/trampoline work,
> with error checking, and give it a list of symbols that should
> be relative but may end up as absolute. We risk build errors for
> some people if the list isn't complete.
> 3. We do a minimal forward-port of the error checking into the current
> tool.
> 4. We add to the list of relative symbols in the current version of
> the tool without adding the error checking.
>
> However, since it seems clear that we're silently producing corrupt
> kernels out of the current build, I think we need a fix for this for 3.4.
>

For the record, these are the checkins out of the -tip tree. They are a
little bigger than necessary because they move the tool around to make
it available for reuse, and of course introduce additional functionality.

-hpa



Attachments:
0001-x86-relocs-Workaround-for-binutils-2.22.52.0.1-secti.patch (1.11 kB)
0021-x86-realmode-move-relocs-from-scripts-to-arch-x86-to.patch (44.65 kB)
0033-x86-relocs-Workaround-for-binutils-2.22.52.0.1-secti.patch (1.11 kB)
0034-x86-relocs-More-relocations-which-may-end-up-as-abso.patch (1.57 kB)

2012-05-18 16:14:08

by H.J. Lu

Subject: Re: Urgent: x86-32 and GNU ld 2.22.52.0.1

On Fri, May 18, 2012 at 8:56 AM, H. Peter Anvin <[email protected]> wrote:
> I need an urgent opinion. It seems we have an epic mess on our hands.
>
> GNU ld 2.22.52.0.1 silently changed the semantics of section-relative
> symbols that are part of otherwise empty sections, and silently changes
> them to absolute. We rely on section-relative symbols staying
> section-relative, and actually have several sections in the linker
> script solely for this purpose.

That is what I talked to you about a couple of days ago:

http://sourceware.org/bugzilla/show_bug.cgi?id=14052

> The postprocessor for the x86-32 kernel, relocs.c, currently doesn't
> enforce its audited absolute symbols list. As part of the
> tip:x86/trampoline rework, however, I made it error out rather than
> silently producing bad output.
>
> Ingo has found that with this particular version of GNU ld, the error
> triggers. I want to emphasize that this merely catches an error which
> the current version of the tool would have allowed to silently go by,
> which would have (possibly) caused a failure if the kernel was
> subsequently booted in anything but its default location.
>
> There are a few ways we can deal with this, but I think we need to do
> one or the other:
>
> 1. We can blacklist this version of GNU ld.

I think this is the best approach.

> 2. We can uprev the tool to the one from the tip:x86/trampoline work,
>    with error checking, and give it a list of symbols that should
>    be relative but may end up as absolute. We risk build errors for
>    some people if the list isn't complete.
> 3. We do a minimal forward-port of the error checking into the current
>    tool.
> 4. We add to the list of relative symbols in the current version of
>    the tool without adding the error checking.
>
> However, since it seems clear that we're silently producing corrupt
> kernels out of the current build, I think we need a fix for this for 3.4.
>


--
H.J.

2012-05-18 16:16:44

by H.J. Lu

Subject: Re: Urgent: x86-32 and GNU ld 2.22.52.0.1

On Fri, May 18, 2012 at 9:14 AM, H.J. Lu <[email protected]> wrote:
> On Fri, May 18, 2012 at 8:56 AM, H. Peter Anvin <[email protected]> wrote:
>> I need an urgent opinion. It seems we have an epic mess on our hands.
>>
>> GNU ld 2.22.52.0.1 silently changed the semantics of section-relative
>> symbols that are part of otherwise empty sections, and silently changes
>> them to absolute. We rely on section-relative symbols staying
>> section-relative, and actually have several sections in the linker
>> script solely for this purpose.
>
> That is what I talked to you about a couple of days ago:
>
> http://sourceware.org/bugzilla/show_bug.cgi?id=14052
>
>> The postprocessor for the x86-32 kernel, relocs.c, currently doesn't
>> enforce its audited absolute symbols list. As part of the
>> tip:x86/trampoline rework, however, I made it error out rather than
>> silently producing bad output.
>>
>> Ingo has found that with this particular version of GNU ld, the error
>> triggers. I want to emphasize that this merely catches an error which
>> the current version of the tool would have allowed to silently go by,
>> which would have (possibly) caused a failure if the kernel was
>> subsequently booted in anything but its default location.
>>
>> There are a few ways we can deal with this, but I think we need to do
>> one or the other:
>>
>> 1. We can blacklist this version of GNU ld.
>
> I think this is the best approach.
>

Please verify that binutils 2.22.52.0.2 is broken
and binutils 2.22.52.0.1 is OK.


--
H.J.

2012-05-18 16:19:56

by H. Peter Anvin

Subject: Re: Urgent: x86-32 and GNU ld 2.22.52.0.1

On 05/18/2012 09:16 AM, H.J. Lu wrote:
>
> Please verify that binutils 2.22.52.0.2 is broken
> and binutils 2.22.52.0.1 is OK.
>

I believe Ingo is using 2.22.52.0.1, and it is most definitely not okay.

<mingo> GNU ld version 2.22.52.0.1-5.fc17 20120131
<mingo> gcc version 4.7.0 20120112 (Red Hat 4.7.0-0.6) (GCC)

-hpa

--
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel. I don't speak on their behalf.

2012-05-18 16:21:07

by H. Peter Anvin

Subject: Re: Urgent: x86-32 and GNU ld 2.22.52.0.1

On 05/18/2012 09:14 AM, H.J. Lu wrote:
> On Fri, May 18, 2012 at 8:56 AM, H. Peter Anvin <[email protected]> wrote:
>> I need an urgent opinion. It seems we have an epic mess on our hands.
>>
>> GNU ld 2.22.52.0.1 silently changed the semantics of section-relative
>> symbols that are part of otherwise empty sections, and silently changes
>> them to absolute. We rely on section-relative symbols staying
>> section-relative, and actually have several sections in the linker
>> script solely for this purpose.
>
> That is what I talked to you about a couple of days ago:
>
> http://sourceware.org/bugzilla/show_bug.cgi?id=14052
>

I know, which was a very good thing... otherwise we'd probably not have
tracked this down anywhere near as quickly. Thank you.

The problem is that this version of binutils made it into Fedora 17, and
so we now have a large number of users with a known bad binutils in the
field...

-hpa

--
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel. I don't speak on their behalf.

2012-05-18 16:35:19

by H.J. Lu

Subject: Re: Urgent: x86-32 and GNU ld 2.22.52.0.1

On Fri, May 18, 2012 at 9:19 AM, H. Peter Anvin <[email protected]> wrote:
> On 05/18/2012 09:16 AM, H.J. Lu wrote:
>>
>> Please verify that binutils 2.22.52.0.2 is broken
>> and binutils 2.22.52.0.1 is OK.
>>
>
> I believe Ingo is using 2.22.52.0.1, and it is most definitely not okay.
>
> <mingo> GNU ld version 2.22.52.0.1-5.fc17 20120131
> <mingo> gcc version 4.7.0 20120112 (Red Hat 4.7.0-0.6) (GCC)
>

In that case, both 2.22.52.0.1 and 2.22.52.0.2 are bad.

--
H.J.

2012-05-18 16:46:33

by H. Peter Anvin

Subject: Re: Urgent: x86-32 and GNU ld 2.22.52.0.1

On 05/18/2012 09:35 AM, H.J. Lu wrote:
>
> In that case, both 2.22.52.0.1 and 2.22.52.0.2 are bad.
>

I suspect that really means we should have a patch which verifies by
construction, and not rely on version numbers.

-hpa

--
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel. I don't speak on their behalf.

2012-05-18 16:50:33

by H.J. Lu

Subject: Re: Urgent: x86-32 and GNU ld 2.22.52.0.1

On Fri, May 18, 2012 at 9:46 AM, H. Peter Anvin <[email protected]> wrote:
> On 05/18/2012 09:35 AM, H.J. Lu wrote:
>>
>> In that case, both 2.22.52.0.1 and 2.22.52.0.2 are bad.
>>
>
> I suspect that really means we should have a patch which verifies by
> construction, and not rely on version numbers.
>

A small testcase:

[hjl@gnu-6 pr14052]$ cat pr14052.s
        .text
        .global start   /* Used by SH targets. */
start:
        .global _start
_start:
        .global __start
__start:
        .global main    /* Used by HPPA targets. */
main:
        .dc.a 0
[hjl@gnu-6 pr14052]$ cat pr14052.t
SECTIONS {
  .text : {
    *(.text)
  }
  . = ALIGN (0x1000);
  .data : {
    _data_start = .;
    *(.data)
  }
  /DISCARD/ : { *(.*) }
}
[hjl@gnu-6 pr14052]$ make
as -o pr14052.o pr14052.s
./ld -o pr14052 -T pr14052.t pr14052.o
readelf -s pr14052

Symbol table '.symtab' contains 7 entries:
   Num:    Value          Size Type    Bind   Vis      Ndx Name
     0: 0000000000000000     0 NOTYPE  LOCAL  DEFAULT  UND
     1: 0000000000000000     0 SECTION LOCAL  DEFAULT    1
     2: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT    1 __start
     3: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT    1 _start
     4: 0000000000001000     0 NOTYPE  GLOBAL DEFAULT    1 _data_start
     5: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT    1 main
     6: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT    1 start
[hjl@gnu-6 pr14052]$

There should be no symbols in the ABS section.
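
One way to turn that expectation into an automated check, rather than relying on version numbers, is to scan the `readelf -s` output for symbols whose Ndx column reads ABS. The following is only a sketch: the function name is hypothetical, the parsing assumes the column layout shown in the symbol table above, and the "bad" row in the example is an assumed illustration of what an affected linker might print, not captured output.

```python
def absolute_symbols(readelf_output):
    """Return the names of symbols that `readelf -s` reports in ABS."""
    names = []
    for line in readelf_output.splitlines():
        fields = line.split()
        # Symbol rows have 8 columns: Num: Value Size Type Bind Vis Ndx Name.
        # Rows without a name (the null and section symbols) are skipped.
        if len(fields) < 8:
            continue
        index = fields[0]
        if not (index.endswith(":") and index[:-1].isdigit()):
            continue  # header or other non-symbol line
        if fields[6] == "ABS":
            names.append(fields[7])
    return names


# Row copied from the expected output of the test case.
good = "4: 0000000000001000 0 NOTYPE GLOBAL DEFAULT 1 _data_start"
# Assumed shape of the same row from an affected linker.
bad = "4: 0000000000001000 0 NOTYPE GLOBAL DEFAULT ABS _data_start"

print(absolute_symbols(good))
print(absolute_symbols(bad))
```

A build could run the small test case through the linker in use and fail if the returned list is not empty.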

--
H.J.

2012-05-18 16:52:00

by H. Peter Anvin

Subject: Re: Urgent: x86-32 and GNU ld 2.22.52.0.1

On 05/18/2012 09:50 AM, H.J. Lu wrote:
>
> A small testcase:
>

Right, but we can equally well just let the postprocessing tool throw an
error.

-hpa

--
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel. I don't speak on their behalf.

2012-05-18 16:55:57

by Josh Boyer

Subject: Re: Urgent: x86-32 and GNU ld 2.22.52.0.1

On Fri, May 18, 2012 at 12:20 PM, H. Peter Anvin <[email protected]> wrote:
> On 05/18/2012 09:14 AM, H.J. Lu wrote:
>> On Fri, May 18, 2012 at 8:56 AM, H. Peter Anvin <[email protected]> wrote:
>>> I need an urgent opinion. It seems we have an epic mess on our hands.
>>>
>>> GNU ld 2.22.52.0.1 silently changed the semantics of section-relative
>>> symbols that are part of otherwise empty sections, and silently changes
>>> them to absolute. We rely on section-relative symbols staying
>>> section-relative, and actually have several sections in the linker
>>> script solely for this purpose.
>>
>> That is what I talked to you about a couple of days ago:
>>
>> http://sourceware.org/bugzilla/show_bug.cgi?id=14052
>>
>
> I know, which was a very good thing... otherwise we'd probably not have
> tracked this down anywhere near as quickly. Thank you.
>
> The problem is that this version of binutils made it into Fedora 17, and
> so we now have a large number of users with a known bad binutils in the
> field...

We've not seen many kernel bugs that would seem to be blamed on this as
of yet. It does seem like a problem waiting to hit us once F17 goes GA
though. My limited 32-bit F17 machine collection definitely shows the
__init_{begin,end} symbols being absolute, but they boot fine. Likely
because the kernel isn't relocated on them.

For what it's worth, I've filed a bug against Fedora binutils here:
https://bugzilla.redhat.com/show_bug.cgi?id=822981

josh

2012-05-18 18:41:27

by Greg KH

Subject: Re: Urgent: x86-32 and GNU ld 2.22.52.0.1

On Fri, May 18, 2012 at 09:51:44AM -0700, H. Peter Anvin wrote:
> On 05/18/2012 09:50 AM, H.J. Lu wrote:
> >
> > A small testcase:
> >
>
> Right, but we can equally well just let the postprocessing tool throw an
> error.

That would probably be the best thing to do right now, and we can
backport that to the stable kernel releases also to ensure they work
properly.

thanks,

greg k-h

2012-05-18 18:52:55

by H. Peter Anvin

Subject: Re: Urgent: x86-32 and GNU ld 2.22.52.0.1

On 05/18/2012 11:41 AM, Greg KH wrote:
> On Fri, May 18, 2012 at 09:51:44AM -0700, H. Peter Anvin wrote:
>> On 05/18/2012 09:50 AM, H.J. Lu wrote:
>>>
>>> A small testcase:
>>>
>>
>> Right, but we can equally well just let the postprocessing tool throw an
>> error.
>
> That would probably be the best thing to do right now, and we can
> backport that to the stable kernel releases also to ensure they work
> properly.
>

So the question is: do you want to simply take the patches from the
trampoline branch (which are reasonably tested) or do a minimal backport
which only throws an error (which would not be)?

-hpa

2012-05-18 19:11:14

by Greg KH

Subject: Re: Urgent: x86-32 and GNU ld 2.22.52.0.1

On Fri, May 18, 2012 at 11:52:37AM -0700, H. Peter Anvin wrote:
> On 05/18/2012 11:41 AM, Greg KH wrote:
> > On Fri, May 18, 2012 at 09:51:44AM -0700, H. Peter Anvin wrote:
> >> On 05/18/2012 09:50 AM, H.J. Lu wrote:
> >>>
> >>> A small testcase:
> >>>
> >>
> >> Right, but we can equally well just let the postprocessing tool throw an
> >> error.
> >
> > That would probably be the best thing to do right now, and we can
> > backport that to the stable kernel releases also to ensure they work
> > properly.
> >
>
> So the question is: do you want to simply take the patches from the
> trampoline branch (which are reasonably tested) or do a minimal backport
> which only throws an error (which would not be)?

All 4 of those patches? They look simple and "sane" to me. They solve
the problem even with the "buggy" binutils, right? If so, sure, I'll
take those after they land in Linus's tree, which I'm guessing will be
for 3.5-rc1, right?

thanks,

greg k-h

2012-05-18 19:13:47

by H. Peter Anvin

Subject: Re: Urgent: x86-32 and GNU ld 2.22.52.0.1

On 05/18/2012 12:11 PM, Greg KH wrote:
>>
>> So the question is: do you want to simply take the patches from the
>> trampoline branch (which are reasonably tested) or do a minimal backport
>> which only throws an error (which would not be)?
>
> All 4 of those patches? They look simple and "sane" to me. They solve
> the problem even with the "buggy" binutils, right? If so, sure, I'll
> take those after they land in Linus's tree, which I'm guessing will be
> for 3.5-rc1, right?
>

We think they fix the problem even with the buggy binutils... and will
throw an error if they don't. I intend to push them for 3.5-rc1, but
Linus may want something for 3.4.

-hpa

2012-05-19 10:20:52

by Ingo Molnar

Subject: Re: Urgent: x86-32 and GNU ld 2.22.52.0.1


* Josh Boyer <[email protected]> wrote:

> On Fri, May 18, 2012 at 12:20 PM, H. Peter Anvin <[email protected]> wrote:
> > On 05/18/2012 09:14 AM, H.J. Lu wrote:
> >> On Fri, May 18, 2012 at 8:56 AM, H. Peter Anvin <[email protected]> wrote:
> >>> I need an urgent opinion. It seems we have an epic mess on our hands.
> >>>
> >>> GNU ld 2.22.52.0.1 silently changed the semantics of section-relative
> >>> symbols that are part of otherwise empty sections, and silently changes
> >>> them to absolute. We rely on section-relative symbols staying
> >>> section-relative, and actually have several sections in the linker
> >>> script solely for this purpose.
> >>
> >> That is what I talked to you about a couple of days ago:
> >>
> >> http://sourceware.org/bugzilla/show_bug.cgi?id=14052
> >>
> >
> > I know, which was a very good thing... otherwise we'd probably not have
> > tracked this down anywhere near as quickly. Thank you.
> >
> > The problem is that this version of binutils made it into Fedora 17, and
> > so we now have a large number of users with a known bad binutils in the
> > field...
>
> We've not seen many kernel bugs that would seem to be blamed
> on this as of yet. It does seem like a problem waiting to hit
> us once F17 goes GA though. My limited 32-bit F17 machine
> collection definitely shows the __init_{begin,end} symbols
> being absolute, but they boot fine. Likely because the kernel
> isn't relocated on them.

Relocation is rare, it typically happens with crashdump kernels.

Thanks,

Ingo

2012-05-19 18:12:45

by H. Peter Anvin

Subject: Re: Urgent: x86-32 and GNU ld 2.22.52.0.1

On 05/19/2012 03:20 AM, Ingo Molnar wrote:
>
> Relocation is rare, it typically happens with crashdump kernels.
>

Or if you boot using EFI (fortunately EFI32 is rare), or you have a
screwed-up memory map (which is still rare but is increasingly common.)
For a while some distros, including Fedora, had a messed-up config
which meant it was *always* relocated, but that was a configuration bug
which has since been fixed, I believe.

-hpa