Subject: regression: Bug 215601 - gcc segv at startup on ia64

Hi, this is your Linux kernel regression tracker.

I noticed a regression report in bugzilla.kernel.org that afaics nobody
acted upon since it was reported about a week ago; that's why I'm hereby
forwarding it to the lists and the relevant people. To quote
https://bugzilla.kernel.org/show_bug.cgi?id=215601 :

> On ia64, after 5f501d555653f8968011a1e65ebb121c8b43c144, the gcc
> binary crashes with SIGSEGV at startup (i.e., during ELF loading).
> Only gcc exhibits the crash (including g++, etc), other toolchain
> components (such as ld, ldd, etc) do not, and neither does any other
> binary from what I can tell. I also haven't observed the issue on
> any other architecture.
>
> Reverting this commit resolves the issue up to and including git tip,
> with no (visible) issues.
>
> Hardware: HP Integrity rx2800 i2. Kernel config attached.

Could somebody take a look into this? Or was this discussed somewhere
else already? Or even fixed?

Anyway, to get this tracked:

#regzbot introduced: 5f501d555653f8968011a1e65ebb121c8b43c144
#regzbot from: matoro <[email protected]>
#regzbot title: gcc segv at startup on ia64
#regzbot link: https://bugzilla.kernel.org/show_bug.cgi?id=215601

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)

P.S.: As the Linux kernel's regression tracker I'm getting a lot of
reports on my table. I can only look briefly into most of them and lack
knowledge about most of the areas they concern. I thus unfortunately
will sometimes get things wrong or miss something important. I hope
that's not the case here; if you think it is, don't hesitate to tell me
in a public reply, it's in everyone's interest to set the public record
straight.

--
Additional information about regzbot:

If you want to know more about regzbot, check out its web-interface, the
getting started guide, and the reference documentation:

https://linux-regtracking.leemhuis.info/regzbot/
https://gitlab.com/knurd42/regzbot/-/blob/main/docs/getting_started.md
https://gitlab.com/knurd42/regzbot/-/blob/main/docs/reference.md

The last two documents will explain how you can interact with regzbot
yourself if you want to.

Hint for reporters: when reporting a regression it's in your interest to
CC the regression list and tell regzbot about the issue, as that ensures
the regression makes it onto the radar of the Linux kernel's regression
tracker and your report won't fall through the cracks unnoticed.

Hint for developers: you normally don't need to care about regzbot once
it's involved. Fix the issue as you normally would, just remember to
include 'Link:' tags in the patch description pointing to all reports
about the issue. This has been expected from developers even before
regzbot showed up for reasons explained in
'Documentation/process/submitting-patches.rst' and
'Documentation/process/5.Posting.rst'.


Subject: Re: regression: Bug 215601 - gcc segv at startup on ia64

[Reply to get Anthony on board; I screwed up when copying and pasting his
email address when sending the mail below. Sorry for the noise!]

On 20.02.22 18:12, Thorsten Leemhuis wrote:
> Hi, this is your Linux kernel regression tracker.
>
> I noticed a regression report in bugzilla.kernel.org that afaics nobody
> acted upon since it was reported about a week ago, that's why I'm hereby
> forwarding it to the lists and the relevant people. To quote
> https://bugzilla.kernel.org/show_bug.cgi?id=215601 :
>
>> On ia64, after 5f501d555653f8968011a1e65ebb121c8b43c144, the gcc
>> binary crashes with SIGSEGV at startup (i.e., during ELF loading).
>> Only gcc exhibits the crash (including g++, etc), other toolchain
>> components (such as ld, ldd, etc) do not, and neither does any other
>> binary from what I can tell. I also haven't observed the issue on
>> any other architecture.
>>
>> Reverting this commit resolves the issue up to and including git tip,
>> with no (visible) issues.
>>
>> Hardware: HP Integrity rx2800 i2 Kernel config attached.
>
> Could somebody take a look into this? Or was this discussed somewhere
> else already? Or even fixed?
>
> Anyway, to get this tracked:
>
> #regzbot introduced: 5f501d555653f8968011a1e65ebb121c8b43c144
> #regzbot from: matoro <[email protected]>
> #regzbot title: gcc segv at startup on ia64
> #regzbot link: https://bugzilla.kernel.org/show_bug.cgi?id=215601
>
> Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
>
> P.S.: As the Linux kernel's regression tracker I'm getting a lot of
> reports on my table. I can only look briefly into most of them and lack
> knowledge about most of the areas they concern. I thus unfortunately
> will sometimes get things wrong or miss something important. I hope
> that's not the case here; if you think it is, don't hesitate to tell me
> in a public reply, it's in everyone's interest to set the public record
> straight.
>

2022-02-21 09:38:59

by Kees Cook

Subject: Re: regression: Bug 215601 - gcc segv at startup on ia64



On February 20, 2022 9:19:46 AM PST, Thorsten Leemhuis <[email protected]> wrote:
>[reply to get Anthony on board, I screwed up when copy and pasting his
>email address when sending below mail; sorry for the noise!]
>
>On 20.02.22 18:12, Thorsten Leemhuis wrote:
>> Hi, this is your Linux kernel regression tracker.
>>
>> I noticed a regression report in bugzilla.kernel.org that afaics nobody
>> acted upon since it was reported about a week ago, that's why I'm hereby
>> forwarding it to the lists and the relevant people. To quote
>> https://bugzilla.kernel.org/show_bug.cgi?id=215601 :
>>
>>> On ia64, after 5f501d555653f8968011a1e65ebb121c8b43c144, the gcc
>>> binary crashes with SIGSEGV at startup (i.e., during ELF loading).
>>> Only gcc exhibits the crash (including g++, etc), other toolchain
>>> components (such as ld, ldd, etc) do not, and neither does any other
>>> binary from what I can tell. I also haven't observed the issue on
>>> any other architecture.
>>>
>>> Reverting this commit resolves the issue up to and including git tip,
>>> with no (visible) issues.
>>>
>>> Hardware: HP Integrity rx2800 i2 Kernel config attached.
>>
>> Could somebody take a look into this? Or was this discussed somewhere
>> else already? Or even fixed?
>>
>> Anyway, to get this tracked:
>>
>> #regzbot introduced: 5f501d555653f8968011a1e65ebb121c8b43c144
>> #regzbot from: matoro <[email protected]>
>> #regzbot title: gcc segv at startup on ia64
>> #regzbot link: https://bugzilla.kernel.org/show_bug.cgi?id=215601

Does this fix it?

https://www.ozlabs.org/~akpm/mmotm/broken-out/elf-fix-overflow-in-total-mapping-size-calculation.patch

-Kees


>>
>> Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
>>
>> P.S.: As the Linux kernel's regression tracker I'm getting a lot of
>> reports on my table. I can only look briefly into most of them and lack
>> knowledge about most of the areas they concern. I thus unfortunately
>> will sometimes get things wrong or miss something important. I hope
>> that's not the case here; if you think it is, don't hesitate to tell me
>> in a public reply, it's in everyone's interest to set the public record
>> straight.
>>

--
Kees Cook

Subject: Re: regression: Bug 215601 - gcc segv at startup on ia64

Hi Kees!

On 2/21/22 08:42, Kees Cook wrote:
>>>> Reverting this commit resolves the issue up to and including git tip,
>>>> with no (visible) issues.
>>>>
>>>> Hardware: HP Integrity rx2800 i2 Kernel config attached.
>>>
>>> Could somebody take a look into this? Or was this discussed somewhere
>>> else already? Or even fixed?
>>>
>>> Anyway, to get this tracked:
>>>
>>> #regzbot introduced: 5f501d555653f8968011a1e65ebb121c8b43c144
>>> #regzbot from: matoro <[email protected]>
>>> #regzbot title: gcc segv at startup on ia64
>>> #regzbot link: https://bugzilla.kernel.org/show_bug.cgi?id=215601
>
> Does this fix it?
>
> https://www.ozlabs.org/~akpm/mmotm/broken-out/elf-fix-overflow-in-total-mapping-size-calculation.patch

I have applied this patch on top of 038101e6b2cd5c55f888f85db42ea2ad3aecb4b6 and it doesn't
fix the problem for me. Reverting 5f501d555653f8968011a1e65ebb121c8b43c144, however, fixes
the problem.

FWIW, this problem doesn't just affect GCC; systemd keeps segfaulting with this change as well.

Adrian

--
.''`. John Paul Adrian Glaubitz
: :' : Debian Developer - [email protected]
`. `' Freie Universitaet Berlin - [email protected]
`- GPG: 62FF 8A75 84E0 2956 9546 0006 7426 3B37 F5B5 F913

2022-02-22 02:36:05

by Kees Cook

Subject: Re: regression: Bug 215601 - gcc segv at startup on ia64



On February 21, 2022 11:49:20 AM PST, John Paul Adrian Glaubitz <[email protected]> wrote:
>Hi Kees!
>
>On 2/21/22 08:42, Kees Cook wrote:
>>>>> Reverting this commit resolves the issue up to and including git tip,
>>>>> with no (visible) issues.
>>>>>
>>>>> Hardware: HP Integrity rx2800 i2 Kernel config attached.
>>>>
>>>> Could somebody take a look into this? Or was this discussed somewhere
>>>> else already? Or even fixed?
>>>>
>>>> Anyway, to get this tracked:
>>>>
>>>> #regzbot introduced: 5f501d555653f8968011a1e65ebb121c8b43c144
>>>> #regzbot from: matoro <[email protected]>
>>>> #regzbot title: gcc segv at startup on ia64
>>>> #regzbot link: https://bugzilla.kernel.org/show_bug.cgi?id=215601
>>
>> Does this fix it?
>>
>> https://www.ozlabs.org/~akpm/mmotm/broken-out/elf-fix-overflow-in-total-mapping-size-calculation.patch
>
>I have applied this patch on top of 038101e6b2cd5c55f888f85db42ea2ad3aecb4b6 and it doesn't
>fix the problem for me. Reverting 5f501d555653f8968011a1e65ebb121c8b43c144, however, fixes
>the problem.
>
>FWIW, this problem doesn't just affect GCC but systemd keeps segfaulting with this change as well.

Very weird! Can you attach either of those binaries to bugzilla (or give me a URL I can fetch them from)? I can try to figure out where it is going weird...

--
Kees Cook

Subject: Re: regression: Bug 215601 - gcc segv at startup on ia64

Hi Kees!

On 2/21/22 21:58, Kees Cook wrote:
>> I have applied this patch on top of 038101e6b2cd5c55f888f85db42ea2ad3aecb4b6 and it doesn't
>> fix the problem for me. Reverting 5f501d555653f8968011a1e65ebb121c8b43c144, however, fixes
>> the problem.
>>
>> FWIW, this problem doesn't just affect GCC but systemd keeps segfaulting with this change as well.
>
> Very weird! Can you attached either of those binaries to bugzilla (or a URL I can fetch it from)? I can try to figure out where it is going weird...

Here's the initrd of that particular machine:

> https://people.debian.org/~glaubitz/initrd.img-5.17.0-rc5+

You should be able to extract the binaries from this initrd image and the "mount" command,
for example, should be one of the affected binaries.

Adrian

--
.''`. John Paul Adrian Glaubitz
: :' : Debian Developer - [email protected]
`. `' Freie Universitaet Berlin - [email protected]
`- GPG: 62FF 8A75 84E0 2956 9546 0006 7426 3B37 F5B5 F913

2022-02-24 04:23:55

by Kees Cook

Subject: Re: regression: Bug 215601 - gcc segv at startup on ia64

On Mon, Feb 21, 2022 at 10:57:01PM +0100, John Paul Adrian Glaubitz wrote:
> Hi Kees!
>
> On 2/21/22 21:58, Kees Cook wrote:
> >> I have applied this patch on top of 038101e6b2cd5c55f888f85db42ea2ad3aecb4b6 and it doesn't
> >> fix the problem for me. Reverting 5f501d555653f8968011a1e65ebb121c8b43c144, however, fixes
> >> the problem.
> >>
> >> FWIW, this problem doesn't just affect GCC but systemd keeps segfaulting with this change as well.
> >
> > Very weird! Can you attached either of those binaries to bugzilla (or a URL I can fetch it from)? I can try to figure out where it is going weird...
>
> Here's the initrd of that particular machine:
>
> > https://people.debian.org/~glaubitz/initrd.img-5.17.0-rc5+
>
> You should be able to extract the binaries from this initrd image and the "mount" command,
> for example, should be one of the affected binaries.

I don't see anything immediately obvious here, but I'll keep looking. Is
there any way to emulate ia64? I don't see anything that'll work under
QEMU...

--
Kees Cook

2022-02-24 06:59:05

by Kees Cook

Subject: Re: regression: Bug 215601 - gcc segv at startup on ia64

On Mon, Feb 21, 2022 at 10:57:01PM +0100, John Paul Adrian Glaubitz wrote:
> Hi Kees!
>
> On 2/21/22 21:58, Kees Cook wrote:
> >> I have applied this patch on top of 038101e6b2cd5c55f888f85db42ea2ad3aecb4b6 and it doesn't
> >> fix the problem for me. Reverting 5f501d555653f8968011a1e65ebb121c8b43c144, however, fixes
> >> the problem.
> >>
> >> FWIW, this problem doesn't just affect GCC but systemd keeps segfaulting with this change as well.
> >
> > Very weird! Can you attached either of those binaries to bugzilla (or a URL I can fetch it from)? I can try to figure out where it is going weird...
>
> Here's the initrd of that particular machine:
>
> > https://people.debian.org/~glaubitz/initrd.img-5.17.0-rc5+
>
> You should be able to extract the binaries from this initrd image and the "mount" command,
> for example, should be one of the affected binaries.

In dmesg, do you see any of these reports?

	pr_info("%d (%s): Uhuuh, elf segment at %px requested but the memory is mapped already\n",
		task_pid_nr(current), current->comm, (void *)addr);

I don't see anything out of order in the "mount" binary from the above
initrd. What does "readelf -lW" show for the GCC you're seeing failures
on?

--
Kees Cook

Subject: Re: regression: Bug 215601 - gcc segv at startup on ia64

Hi Kees!

On 2/24/22 06:16, Kees Cook wrote:
>> You should be able to extract the binaries from this initrd image and the "mount" command,
>> for example, should be one of the affected binaries.
>
> In dmesg, do you see any of these reports?
>
> pr_info("%d (%s): Uhuuh, elf segment at %px requested but the memory is mapped already\n",
> task_pid_nr(current), current->comm, (void *)addr);

I'll check that.

> I don't see anything out of order in the "mount" binary from the above
> initrd. What does "readelf -lW" show for the GCC you're seeing failures
> on?

I'm not 100% sure whether it's the mount binary that is affected. What happens is that once init takes over,
I'm seeing multiple "Segmentation Fault" messages on the console until I'm dropped to the initrd shell.

I can check what dmesg says.

Adrian

--
.''`. John Paul Adrian Glaubitz
: :' : Debian Developer - [email protected]
`. `' Freie Universitaet Berlin - [email protected]
`- GPG: 62FF 8A75 84E0 2956 9546 0006 7426 3B37 F5B5 F913

2022-02-24 15:17:09

by matoro

Subject: Re: regression: Bug 215601 - gcc segv at startup on ia64

Hi Kees, I can provide live ssh access to my system exhibiting the
issue. My system is a lot more stable due to using openrc rather than
systemd; for me GCC seems to be the only binary affected. Would that be
helpful?

On 2022-02-24 04:33, John Paul Adrian Glaubitz wrote:
> Hi Kees!
>
> On 2/24/22 06:16, Kees Cook wrote:
>>> You should be able to extract the binaries from this initrd image and
>>> the "mount" command,
>>> for example, should be one of the affected binaries.
>>
>> In dmesg, do you see any of these reports?
>>
>> pr_info("%d (%s): Uhuuh, elf segment at %px requested
>> but the memory is mapped already\n",
>> task_pid_nr(current), current->comm, (void
>> *)addr);
>
> I'll check that.
>
>> I don't see anything out of order in the "mount" binary from the above
>> initrd. What does "readelf -lW" show for the GCC you're seeing
>> failures
>> on?
>
> I'm not 100% sure whether it's the mount binary that is affected. What
> happens is that once init takes over,
> I'm seeing multiple "Segmentation Fault" message on the console until
> I'm dropped to the initrd shell.
>
> I can check what dmesg says.
>
> Adrian

2022-02-24 16:58:45

by Kees Cook

Subject: Re: regression: Bug 215601 - gcc segv at startup on ia64

On Thu, Feb 24, 2022 at 09:22:04AM -0500, matoro wrote:
> Hi Kees, I can provide live ssh access to my system exhibiting the issue.
> My system is a lot more stable due to using openrc rather than systemd, for
> me GCC seems to be the only binary affected. Would that be helpful?

Yeah, that might be helpful. I have access to a Debian ia64 porter box,
but it's not running the broken kernel, so I can only examine "normal"
behavior.

I'll switch to off-list email...

-Kees

--
Kees Cook

2022-02-26 13:53:56

by Kees Cook

Subject: Re: regression: Bug 215601 - gcc segv at startup on ia64

On Thu, Feb 24, 2022 at 09:22:04AM -0500, matoro wrote:
> Hi Kees, I can provide live ssh access to my system exhibiting the issue.
> My system is a lot more stable due to using openrc rather than systemd, for
> me GCC seems to be the only binary affected. Would that be helpful?

Thanks for this access! I think I see the problem. Non-PIE (i.e. normal
ET_EXEC) ia64 binaries appear to have two very non-contiguous virtual
memory PT_LOAD segments that are file-offset adjacent. As seen in
readelf -lW:

  LOAD 0x000000 0x4000000000000000 0x4000000000000000 0x00b5a0 0x00b5a0 R E 0x10000
  LOAD 0x00b5a0 0x600000000000b5a0 0x600000000000b5a0 0x0005ac 0x000710 RW  0x10000
       ^^^^^^^^ ^^^^^^^^^^^^^^^^^^

When the kernel tries to map these with a combined allocation, it asks
for a giant mmap of the file, but the file is, of course, not at all
that large, and the mapping is rejected.
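
To put numbers on that, here is a small Python sketch (illustrative
arithmetic only, not the kernel's code). It assumes the combined
allocation is sized as the distance from the page-aligned start of the
lowest segment to the end of the highest one, and it assumes 64 KiB
pages to match the 0x10000 alignment in the two LOAD lines above. The
names SEGMENTS, page_start and span_size are made up for the example.

```python
# Illustrative arithmetic only; this does not reproduce the kernel source.
PAGE_SIZE = 0x10000  # assumption: 64 KiB pages, matching the 0x10000 alignment

# (file offset, virtual address, file size, memory size) from the two LOAD lines
SEGMENTS = [
    (0x000000, 0x4000000000000000, 0x00b5a0, 0x00b5a0),
    (0x00b5a0, 0x600000000000b5a0, 0x0005ac, 0x000710),
]


def page_start(addr, page_size=PAGE_SIZE):
    """Round an address down to the start of its page."""
    return addr & ~(page_size - 1)


def span_size(segments):
    """Distance from the lowest segment's page start to the highest segment's end."""
    first = min(segments, key=lambda s: s[1])
    last = max(segments, key=lambda s: s[1])
    return last[1] + last[3] - page_start(first[1])


span = span_size(SEGMENTS)
# Bytes of the file actually covered by the two segments.
file_bytes = max(off + filesz for off, _, filesz, _ in SEGMENTS)

print("combined span:", hex(span))
print("file bytes covered:", hex(file_bytes))
```

Under these assumptions the span works out to roughly 2 EiB, while the
two segments together cover less than 48 KiB of the file.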

So... I'm trying to think about how best to deal with this. If I or
anyone else can't think of an elegant solution, I'll send a revert for
the offending patch next week.

In the meantime now I've got another dimension to regression test. ;)

--
Kees Cook

2022-02-28 11:10:46

by Magnus Groß

Subject: Re: regression: Bug 215601 - gcc segv at startup on ia64

> When the kernel tries to map these with a combined allocation, it asks
> for a giant mmap of the file, but the file is, of course, not at all
> that large, and the mapping is rejected.

> So... I'm trying to think about how best to deal with this. If I or
> anyone else can't think of an elegant solution, I'll send a revert for
> the offending patch next week.

Shouldn't we just be able to patch total_mapping_size() again to sum up
all p_memsz fields instead of comparing the minimum and maximum
p_vaddr?

Runtime complexity would be the same as we are iterating through all
segments already anyway. And I would also argue that is the behaviour
that one wanted to see in that function anyway.
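
For comparison, the two quantities can be computed side by side. This is
only a sketch of the arithmetic, not a proposed patch: sum_memsz and
vaddr_span are hypothetical helper names, the first layout reuses the
two LOAD lines Kees quoted, and the second layout is an invented example
with an alignment gap between its segments.

```python
# Each tuple is (p_vaddr, p_memsz); illustrative values only.
IA64_NON_PIE = [
    (0x4000000000000000, 0x00b5a0),
    (0x600000000000b5a0, 0x000710),
]

# Invented layout: two segments separated by an alignment gap.
GAPPED = [
    (0x00000, 0x1000),
    (0x10000, 0x0800),
]


def sum_memsz(segments):
    """Add up the in-memory size of every segment."""
    return sum(memsz for _, memsz in segments)


def vaddr_span(segments):
    """Distance from the lowest start address to the highest end address."""
    lowest = min(vaddr for vaddr, _ in segments)
    highest_end = max(vaddr + memsz for vaddr, memsz in segments)
    return highest_end - lowest


for name, layout in (("ia64 non-PIE", IA64_NON_PIE), ("gapped", GAPPED)):
    print(name, hex(sum_memsz(layout)), hex(vaddr_span(layout)))
```

On the ia64 layout the sum stays tiny while the span is enormous. On the
invented layout the sum (0x1800) is smaller than the span (0x10800), so
the two values differ whenever there are gaps between segments.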

If you agree with this, I can post a patch, but I would need to know
what tree to base it on to avoid merge conflicts with the just merged
patch from Alexey.

--
Magnus

2022-02-28 20:45:45

by Kees Cook

Subject: Re: regression: Bug 215601 - gcc segv at startup on ia64

On Mon, Feb 28, 2022 at 11:46:13AM +0100, Magnus Groß wrote:
> > When the kernel tries to map these with a combined allocation, it asks
> > for a giant mmap of the file, but the file is, of course, not at all
> > that large, and the mapping is rejected.
>
> > So... I'm trying to think about how best to deal with this. If I or
> > anyone else can't think of an elegant solution, I'll send a revert for
> > the offending patch next week.
>
> Shouldn't we just be able to patch total_mapping_size() again to instead
> sum up all p_memsz fields, instead of comparing minimum and maximum
> p_vaddr?

I don't think so, and I need to have a "minimal change" to fix this so
it's more obviously correct.

And, apologies, I failed to Cc you on this patch:
https://lore.kernel.org/linux-hardening/[email protected]/

--
Kees Cook

Subject: Re: regression: Bug 215601 - gcc segv at startup on ia64

On 02.03.22 13:01, John Paul Adrian Glaubitz wrote:
> On 2/20/22 18:12, Thorsten Leemhuis wrote:
>> I noticed a regression report in bugzilla.kernel.org that afaics nobody
>> acted upon since it was reported about a week ago, that's why I'm hereby
>> forwarding it to the lists and the relevant people. To quote
>> https://bugzilla.kernel.org/show_bug.cgi?id=215601 :
> As a heads-up, this issue has been fixed in 439a8468242b [1].
>
>> [1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=439a8468242b313486e69b8cc3b45ddcfa898fbf

Thx, but no need to in this particular case, thx to the "Link:
https://lore.kernel.org/r/[email protected]"
that Kees included in the patch description. That allowed regzbot to
automatically notice the two new mailing list threads where Kees posted
patches for testing and later also made regzbot notice the fix when it
landed in mainline:
https://linux-regtracking.leemhuis.info/regzbot/regression/[email protected]/

Sadly quite a few developers don't set such link tags, but I hope with
some education (and an improved regzbot that is a bit more useful at
least for subsystem maintainers) that will improve over time.

Ciao, Thorsten

Subject: Re: regression: Bug 215601 - gcc segv at startup on ia64

Hi Thorsten!

On 2/20/22 18:12, Thorsten Leemhuis wrote:
> I noticed a regression report in bugzilla.kernel.org that afaics nobody
> acted upon since it was reported about a week ago, that's why I'm hereby
> forwarding it to the lists and the relevant people. To quote
> https://bugzilla.kernel.org/show_bug.cgi?id=215601 :

As a heads-up, this issue has been fixed in 439a8468242b [1].

Adrian

> [1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=439a8468242b313486e69b8cc3b45ddcfa898fbf

--
.''`. John Paul Adrian Glaubitz
: :' : Debian Developer - [email protected]
`. `' Freie Universitaet Berlin - [email protected]
`- GPG: 62FF 8A75 84E0 2956 9546 0006 7426 3B37 F5B5 F913