2020-12-03 21:49:27

by Al Viro

[permalink] [raw]
Subject: [PATCHSET] saner elf compat

This series deals with the warts in ELF compat on triarch
architectures (x86_64 and mips64, that is).

x86_64 at least does use compat_binfmt_elf.c for both
32bit ABIs; the way it is done is ugly as hell, though, and more
than slightly brittle (see asm/compat.h for PRSTATUS_SIZE and SET_PR_FPVALID
definitions - IMO that kind of magic is too ugly to live).

mips64, OTOH, does not use compat_binfmt_elf.c for either of its
32bit ABIs. It has a couple of analogues (each with include of
../../../fs/binfmt_elf.c, BTW), with quite a bit of ancient cruft
accumulated in those.

Fortunately, cleanup of i386/x32 mess (first 3 commits in
the series) provides a fairly straightforward way for mips64 to use
fs/compat_binfmt_elf.c for both n32 and o32.

That stuff had been sitting around since June; lately rdd has
spotted Kconfig problems around COMPAT_BINFMT_ELF selects. All of them
had been on configs that had COMPAT_BINFMT_ELF != COMPAT && BINFMT_ELF.
For most of the architectures that's impossible to achieve, but some
(sparc, e.g.) can end up with that. Randy posted a patch adding
if BINFMT_ELF to selects that lacked it, but that looked wrong to me -
why not centralize that logics into fs/Kconfig.binfmt? IOW, what's
the point of having any such selects in arch/*/Kconfig?

The answer (for mainline) is that mips compat does *NOT* want
COMPAT_BINFMT_ELF. Not a problem with that series, though, so I'd
retested it (seems to work, both for x86_64 and mips64, execs and
coredumps for all ABIs alike), with centralization of Kconfig logics
thrown in.

It's based at 5.10-rc1 and lives in
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs.git#work.elf-compat
I'll post the individual patches in followups.

Shortlog:
binfmt_elf: partially sanitize PRSTATUS_SIZE and SET_PR_FPVALID
elf_prstatus: collect the common part (everything before pr_reg) into a struct
[elfcore-compat][amd64] clean PRSTATUS_SIZE/SET_PR_FPVALID up properly
mips binfmt_elf*32.c: use elfcore-compat.h
mips: kill unused definitions in binfmt_elf[on]32.c
mips: KVM_GUEST makes no sense for 64bit builds...
mips compat: don't bother with ELF_ET_DYN_BASE
mips: don't bother with ELF_CORE_EFLAGS
mips compat: switch to compat_binfmt_elf.c
Kconfig: regularize selection of CONFIG_BINFMT_ELF

Diffstat:
arch/Kconfig | 3 +
arch/arm64/Kconfig | 1 -
arch/ia64/kernel/crash.c | 2 +-
arch/mips/Kconfig | 8 +--
arch/mips/include/asm/elf.h | 56 +++++----------
arch/mips/include/asm/elfcore-compat.h | 29 ++++++++
arch/mips/kernel/Makefile | 4 +-
arch/mips/kernel/binfmt_elfn32.c | 106 ----------------------------
arch/mips/kernel/binfmt_elfo32.c | 109 -----------------------------
arch/mips/kernel/scall64-n64.S | 2 +-
arch/parisc/Kconfig | 1 -
arch/powerpc/Kconfig | 1 -
arch/powerpc/platforms/powernv/opal-core.c | 6 +-
arch/s390/Kconfig | 1 -
arch/s390/kernel/crash_dump.c | 2 +-
arch/sparc/Kconfig | 1 -
arch/x86/Kconfig | 2 +-
arch/x86/include/asm/compat.h | 11 ---
arch/x86/include/asm/elfcore-compat.h | 31 ++++++++
fs/Kconfig.binfmt | 2 +-
fs/binfmt_elf.c | 19 +++--
fs/binfmt_elf_fdpic.c | 22 ++----
fs/compat_binfmt_elf.c | 1 +
include/linux/elfcore-compat.h | 15 +++-
include/linux/elfcore.h | 7 +-
kernel/kexec_core.c | 2 +-
26 files changed, 127 insertions(+), 317 deletions(-)
create mode 100644 arch/mips/include/asm/elfcore-compat.h
delete mode 100644 arch/mips/kernel/binfmt_elfn32.c
delete mode 100644 arch/mips/kernel/binfmt_elfo32.c
create mode 100644 arch/x86/include/asm/elfcore-compat.h


2020-12-03 22:13:51

by Linus Torvalds

[permalink] [raw]
Subject: Re: [PATCHSET] saner elf compat

On Thu, Dec 3, 2020 at 1:46 PM Al Viro <[email protected]> wrote:
>
> The answer (for mainline) is that mips compat does *NOT* want
> COMPAT_BINFMT_ELF. Not a problem with that series, though, so I'd
> retested it (seems to work, both for x86_64 and mips64, execs and
> coredumps for all ABIs alike), with centralization of Kconfig logics
> thrown in.

Well, the diffstat looks nice:

> 26 files changed, 127 insertions(+), 317 deletions(-)

and the patches didn't trigger anything for me, but how much did this
get tested? Do you actually have both kinds of 32-bit elf mips
binaries around and a machine to test on?

Linux-mips was cc'd, but I'm adding Thomas B to the cc here explicitly
just so that he has a heads-up on this thing and can go and look at
the mailing list in case it goes to a separate mailbox for him..

Linus

2020-12-03 23:06:49

by Al Viro

[permalink] [raw]
Subject: Re: [PATCHSET] saner elf compat

On Thu, Dec 03, 2020 at 02:09:04PM -0800, Linus Torvalds wrote:
> On Thu, Dec 3, 2020 at 1:46 PM Al Viro <[email protected]> wrote:
> >
> > The answer (for mainline) is that mips compat does *NOT* want
> > COMPAT_BINFMT_ELF. Not a problem with that series, though, so I'd
> > retested it (seems to work, both for x86_64 and mips64, execs and
> > coredumps for all ABIs alike), with centralization of Kconfig logics
> > thrown in.
>
> Well, the diffstat looks nice:
>
> > 26 files changed, 127 insertions(+), 317 deletions(-)
>
> and the patches didn't trigger anything for me, but how much did this
> get tested? Do you actually have both kinds of 32-bit elf mips
> binaries around and a machine to test on?

Yes (aptitude install gcc-multilib on debian mips64el/stretch sets the toolchain
and libraries just fine, and then it's just a matter of -mabi=n32 passed
to gcc). "Machine" is qemu-system-mips64el -machine malta -m 1024 -cpu 5KEc
and the things appear to work; I hadn't tried that on the actual hardware.
I do have a Loongson-2 box, but it would take a while to dig it out and
get it up-to-date.

> Linux-mips was cc'd, but I'm adding Thomas B to the cc here explicitly
> just so that he has a heads-up on this thing and can go and look at
> the mailing list in case it goes to a separate mailbox for him..

I would certainly appreciate review and testing - this branch sat
around in the "should post it someday" state since June (it was
one of the followups grown from regset work back then), and I'm
_not_ going to ask pulling it without an explicit OK from mips
folks.

2020-12-06 04:20:10

by Al Viro

[permalink] [raw]
Subject: Re: [PATCHSET] saner elf compat

On Thu, Dec 03, 2020 at 11:03:36PM +0000, Al Viro wrote:
> > > The answer (for mainline) is that mips compat does *NOT* want
> > > COMPAT_BINFMT_ELF. Not a problem with that series, though, so I'd
> > > retested it (seems to work, both for x86_64 and mips64, execs and
> > > coredumps for all ABIs alike), with centralization of Kconfig logics
> > > thrown in.
> >
> > Well, the diffstat looks nice:
> >
> > > 26 files changed, 127 insertions(+), 317 deletions(-)
> >
> > and the patches didn't trigger anything for me, but how much did this
> > get tested? Do you actually have both kinds of 32-bit elf mips
> > binaries around and a machine to test on?
>
> Yes (aptitude install gcc-multilib on debian mips64el/stretch sets the toolchain
> and libraries just fine, and then it's just a matter of -mabi=n32 passed
> to gcc). "Machine" is qemu-system-mips64el -machine malta -m 1024 -cpu 5KEc
> and the things appear to work; I hadn't tried that on the actual hardware.
> I do have a Loongson-2 box, but it would take a while to dig it out and
> get it up-to-date.
>
> > Linux-mips was cc'd, but I'm adding Thomas B to the cc here explicitly
> > just so that he has a heads-up on this thing and can go and look at
> > the mailing list in case it goes to a separate mailbox for him..
>
> I would certainly appreciate review and testing - this branch sat
> around in the "should post it someday" state since June (it was
> one of the followups grown from regset work back then), and I'm
> _not_ going to ask pulling it without an explicit OK from mips
> folks.

BTW, there's something curious going on in ELF binary recognition for
x32. Unlike other 64bit architectures, here we have a 32bit binary
that successfully passes the native elf_check_arch(). Usually we
either have different EM_... values for 64bit and 32bit (e.g. for ppc
and sparc) or we have an explicit check for ->e_ident[EI_CLASS]
having the right value (ELFCLASS32 or ELFCLASS64 for 32bit and 64bit
binaries resp.)

For x32 that's not true - we use EM_X86_64 for ->e_machine and that's
the only thing the native elf_check_arch() is looking at. IOW,
it looks like amd64 elf_load_binary() progresses past elf_check_arch()
for x32 binaries. And gets to load_elf_phdrs(), which would appear
to have a check of its own that should reject the sucker:
/*
* If the size of this structure has changed, then punt, since
* we will be doing the wrong thing.
*/
if (elf_ex->e_phentsize != sizeof(struct elf_phdr))
goto out;
After all, ->e_phentsize is going to be 32 (sizeof(struct elf32_phdr)
rather than expected 56 (sizeof(struct elf64_phdr)) and off we bugger,
even though it happens at slightly later point than usual. Except that
we are looking at struct elf64_hdr ->e_phentsize - in struct elf32_hdr.
I.e. at offset 54, two bytes past the end of in-file struct elf32_hdr.

Usually we won't find 0x38 0x00 in that location, so everything works,
but IMO that's too convoluted.

Peter, is there any reason not to check ->ei_ident[EI_CLASS] in
amd64 elf_check_arch()? It's a 1-byte load from hot cacheline
(offset 4 and we'd just read the 4 bytes at offsets 0..3) +
compare + branch not taken, so performance impact is pretty much
nil. I'm not saying it's a security problem or anything of that
sort, just that it makes the analysis more subtle than it ought
to be...

Is it about some malformed homegrown 64bit binaries with BS value
at offset 4? Confused...

2020-12-07 03:42:07

by H. Peter Anvin

[permalink] [raw]
Subject: Re: [PATCHSET] saner elf compat

On December 5, 2020 7:23:05 PM PST, Al Viro <[email protected]> wrote:
>On Thu, Dec 03, 2020 at 11:03:36PM +0000, Al Viro wrote:
>> > > The answer (for mainline) is that mips compat does *NOT* want
>> > > COMPAT_BINFMT_ELF. Not a problem with that series, though, so
>I'd
>> > > retested it (seems to work, both for x86_64 and mips64, execs and
>> > > coredumps for all ABIs alike), with centralization of Kconfig
>logics
>> > > thrown in.
>> >
>> > Well, the diffstat looks nice:
>> >
>> > > 26 files changed, 127 insertions(+), 317 deletions(-)
>> >
>> > and the patches didn't trigger anything for me, but how much did
>this
>> > get tested? Do you actually have both kinds of 32-bit elf mips
>> > binaries around and a machine to test on?
>>
>> Yes (aptitude install gcc-multilib on debian mips64el/stretch sets
>the toolchain
>> and libraries just fine, and then it's just a matter of -mabi=n32
>passed
>> to gcc). "Machine" is qemu-system-mips64el -machine malta -m 1024
>-cpu 5KEc
>> and the things appear to work; I hadn't tried that on the actual
>hardware.
>> I do have a Loongson-2 box, but it would take a while to dig it out
>and
>> get it up-to-date.
>>
>> > Linux-mips was cc'd, but I'm adding Thomas B to the cc here
>explicitly
>> > just so that he has a heads-up on this thing and can go and look at
>> > the mailing list in case it goes to a separate mailbox for him..
>>
>> I would certainly appreciate review and testing - this branch sat
>> around in the "should post it someday" state since June (it was
>> one of the followups grown from regset work back then), and I'm
>> _not_ going to ask pulling it without an explicit OK from mips
>> folks.
>
>BTW, there's something curious going on in ELF binary recognition for
>x32. Unlike other 64bit architectures, here we have a 32bit binary
>that successfully passes the native elf_check_arch(). Usually we
>either have different EM_... values for 64bit and 32bit (e.g. for ppc
>and sparc) or we have an explicit check for ->e_ident[EI_CLASS]
>having the right value (ELFCLASS32 or ELFCLASS64 for 32bit and 64bit
>binaries resp.)
>
>For x32 that's not true - we use EM_X86_64 for ->e_machine and that's
>the only thing the native elf_check_arch() is looking at. IOW,
>it looks like amd64 elf_load_binary() progresses past elf_check_arch()
>for x32 binaries. And gets to load_elf_phdrs(), which would appear
>to have a check of its own that should reject the sucker:
> /*
> * If the size of this structure has changed, then punt, since
> * we will be doing the wrong thing.
> */
> if (elf_ex->e_phentsize != sizeof(struct elf_phdr))
> goto out;
>After all, ->e_phentsize is going to be 32 (sizeof(struct elf32_phdr)
>rather than expected 56 (sizeof(struct elf64_phdr)) and off we bugger,
>even though it happens at slightly later point than usual. Except that
>we are looking at struct elf64_hdr ->e_phentsize - in struct elf32_hdr.
>I.e. at offset 54, two bytes past the end of in-file struct elf32_hdr.
>
>Usually we won't find 0x38 0x00 in that location, so everything works,
>but IMO that's too convoluted.
>
>Peter, is there any reason not to check ->ei_ident[EI_CLASS] in
>amd64 elf_check_arch()? It's a 1-byte load from hot cacheline
>(offset 4 and we'd just read the 4 bytes at offsets 0..3) +
>compare + branch not taken, so performance impact is pretty much
>nil. I'm not saying it's a security problem or anything of that
>sort, just that it makes the analysis more subtle than it ought
>to be...
>
>Is it about some malformed homegrown 64bit binaries with BS value
>at offset 4? Confused...

I can't think of any.
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.

2020-12-07 18:06:27

by Maciej W. Rozycki

[permalink] [raw]
Subject: Re: [PATCHSET] saner elf compat

On Thu, 3 Dec 2020, Al Viro wrote:

> > Linux-mips was cc'd, but I'm adding Thomas B to the cc here explicitly
> > just so that he has a heads-up on this thing and can go and look at
> > the mailing list in case it goes to a separate mailbox for him..
>
> I would certainly appreciate review and testing - this branch sat
> around in the "should post it someday" state since June (it was
> one of the followups grown from regset work back then), and I'm
> _not_ going to ask pulling it without an explicit OK from mips
> folks.

It may be worth pushing through GDB's gdb.threads/tls-core.exp test case,
making sure no UNSUPPORTED results have been produced due to resource
limits preventing a core from being dumped (and no FAILs, of course), with
o32/n32 native GDB. This should guarantee our output is still as expected
by an interpreter. Sadly I'm currently not set up for such testing though
eventually I mean to.

Maciej

2020-12-15 20:00:19

by Thomas Bogendoerfer

[permalink] [raw]
Subject: Re: [PATCHSET] saner elf compat

On Thu, Dec 03, 2020 at 11:03:36PM +0000, Al Viro wrote:
> On Thu, Dec 03, 2020 at 02:09:04PM -0800, Linus Torvalds wrote:
> > On Thu, Dec 3, 2020 at 1:46 PM Al Viro <[email protected]> wrote:
> > >
> > > The answer (for mainline) is that mips compat does *NOT* want
> > > COMPAT_BINFMT_ELF. Not a problem with that series, though, so I'd
> > > retested it (seems to work, both for x86_64 and mips64, execs and
> > > coredumps for all ABIs alike), with centralization of Kconfig logics
> > > thrown in.
> >
> > Well, the diffstat looks nice:
> >
> > > 26 files changed, 127 insertions(+), 317 deletions(-)
> >
> > and the patches didn't trigger anything for me, but how much did this
> > get tested? Do you actually have both kinds of 32-bit elf mips
> > binaries around and a machine to test on?
>
> Yes (aptitude install gcc-multilib on debian mips64el/stretch sets the toolchain
> and libraries just fine, and then it's just a matter of -mabi=n32 passed
> to gcc). "Machine" is qemu-system-mips64el -machine malta -m 1024 -cpu 5KEc
> and the things appear to work; I hadn't tried that on the actual hardware.
> I do have a Loongson-2 box, but it would take a while to dig it out and
> get it up-to-date.
>
> > Linux-mips was cc'd, but I'm adding Thomas B to the cc here explicitly
> > just so that he has a heads-up on this thing and can go and look at
> > the mailing list in case it goes to a separate mailbox for him..
>
> I would certainly appreciate review and testing - this branch sat
> around in the "should post it someday" state since June (it was
> one of the followups grown from regset work back then), and I'm
> _not_ going to ask pulling it without an explicit OK from mips
> folks.

I've tested it on real hardware and so far everything looks good.

You can add my

Acked-by: Thomas Bogendoerfer <[email protected]>

for the MIPS part.

Thomas.

--
Crap can work. Given enough thrust pigs will fly, but it's not necessarily a
good idea. [ RFC1925, 2.3 ]

2020-12-22 20:08:02

by Al Viro

[permalink] [raw]
Subject: Re: [PATCHSET] saner elf compat

On Wed, Dec 16, 2020 at 09:44:53AM +0000, Maciej W. Rozycki wrote:
> On Wed, 16 Dec 2020, Al Viro wrote:
>
> > > It may be worth pushing through GDB's gdb.threads/tls-core.exp test case,
> > > making sure no UNSUPPORTED results have been produced due to resource
> > > limits preventing a core from being dumped (and no FAILs, of course), with
> > > o32/n32 native GDB. This should guarantee our output is still as expected
> > > by an interpreter. Sadly I'm currently not set up for such testing though
> > > eventually I mean to.
> >
> > Umm... What triple does one use for n32 gdb?
>
> I don't think there's a standardised one, just configure with CC/CXX set
> for n32 compilation, e.g.:
>
> $ /path/to/configure CC="gcc -mabi=n32" CXX="g++ -mabi=n32"
>
> (and any other options set as usually). This has to be with CC/CXX rather
> than CFLAGS/CXXFLAGS so that it is guaranteed to be never overridden with
> any logic that might do any fiddling with compilation options. This will
> set up the test suite accordingly.
>
> NB this may already be the compiler's default, depending on how it was
> configured, i.e. if `--with-abi=n32' was used, in which case no extra
> options will be required. I don't know if any standard MIPS distribution
> does it though; 64-bit MIPS/Debian might. This will be reported with `gcc
> --help -v', somewhere along the way.
>
> Let me know if there are issues with this approach.

FWIW, on debian/mips64el (both stretch and buster) the test fails with the
distro kernels (4.9- and 4.19-based) as well as with 5.10-rc1 and
5.10-rc1+that series, all in the same way:
[Current thread is 1 (LWP 4154)]
(gdb) p/x foo
Cannot find thread-local storage for LWP 4154, executable file <pathname>
Cannot find thread-local variables on this target

buster has libc6-2.28, so that should be fine for the test in question
(libthread_db definitely recent enough). That was n32 gdb; considering
how much time it had taken to build that sucker I hadn't tried o32
yet.

Note that it's not just with native coredumps - gcore-produced ones give
the same result. That was gdb from binutils-gdb.git; I'm not familiar
with gdb guts to start debugging it, so if you have any suggestions
in that direction that do not include a full rebuild... In any case,
I won't get around to that until the next week.

Incidentally, build time is bloody awful - 3 days, with qemu-3.1 on
3.5GHz amd64 host, all spent pretty much entirely in userland (both
from guest and host POV). g++-8 is atrociously slow...

That said, I don't see what in that series could possibly mess the
things up for tls, while leaving the registers working; the only
thing that realistically might've been fucked up is prstatus layout
(and possibly size), and that would've screwed the registers as
well.

2020-12-22 21:40:38

by Al Viro

[permalink] [raw]
Subject: Re: [PATCHSET] saner elf compat

On Tue, Dec 22, 2020 at 08:04:31PM +0000, Al Viro wrote:

> FWIW, on debian/mips64el (both stretch and buster) the test fails with the
> distro kernels (4.9- and 4.19-based) as well as with 5.10-rc1 and
> 5.10-rc1+that series, all in the same way:
> [Current thread is 1 (LWP 4154)]
> (gdb) p/x foo
> Cannot find thread-local storage for LWP 4154, executable file <pathname>
> Cannot find thread-local variables on this target
>
> buster has libc6-2.28, so that should be fine for the test in question
> (libthread_db definitely recent enough). That was n32 gdb; considering
> how much time it had taken to build that sucker I hadn't tried o32
> yet.
>
> Note that it's not just with native coredumps - gcore-produced ones give
> the same result. That was gdb from binutils-gdb.git; I'm not familiar
> with gdb guts to start debugging it, so if you have any suggestions
> in that direction that do not include a full rebuild... In any case,
> I won't get around to that until the next week.
>
> Incidentally, build time is bloody awful - 3 days, with qemu-3.1 on
> 3.5GHz amd64 host, all spent pretty much entirely in userland (both
> from guest and host POV). g++-8 is atrociously slow...
>
> That said, I don't see what in that series could possibly mess the
> things up for tls, while leaving the registers working; the only
> thing that realistically might've been fucked up is prstatus layout
> (and possibly size), and that would've screwed the registers as
> well.

... and it smells like the damn thing needs n32 debug info from libthread_db.so
and/or libpthread.so. Which is not packaged by debian libc6 mips64el build.
Sorry, any debugging of that crap is going to happen in January ;-/

2020-12-22 23:02:07

by Al Viro

[permalink] [raw]
Subject: Re: [PATCHSET] saner elf compat

On Tue, Dec 22, 2020 at 09:38:35PM +0000, Al Viro wrote:
> On Tue, Dec 22, 2020 at 08:04:31PM +0000, Al Viro wrote:
>
> > FWIW, on debian/mips64el (both stretch and buster) the test fails with the
> > distro kernels (4.9- and 4.19-based) as well as with 5.10-rc1 and
> > 5.10-rc1+that series, all in the same way:
> > [Current thread is 1 (LWP 4154)]
> > (gdb) p/x foo
> > Cannot find thread-local storage for LWP 4154, executable file <pathname>
> > Cannot find thread-local variables on this target
> >
> > buster has libc6-2.28, so that should be fine for the test in question
> > (libthread_db definitely recent enough). That was n32 gdb; considering
> > how much time it had taken to build that sucker I hadn't tried o32
> > yet.
> >
> > Note that it's not just with native coredumps - gcore-produced ones give
> > the same result. That was gdb from binutils-gdb.git; I'm not familiar
> > with gdb guts to start debugging it, so if you have any suggestions
> > in that direction that do not include a full rebuild... In any case,
> > I won't get around to that until the next week.
> >
> > Incidentally, build time is bloody awful - 3 days, with qemu-3.1 on
> > 3.5GHz amd64 host, all spent pretty much entirely in userland (both
> > from guest and host POV). g++-8 is atrociously slow...
> >
> > That said, I don't see what in that series could possibly mess the
> > things up for tls, while leaving the registers working; the only
> > thing that realistically might've been fucked up is prstatus layout
> > (and possibly size), and that would've screwed the registers as
> > well.
>
> ... and it smells like the damn thing needs n32 debug info from libthread_db.so
> and/or libpthread.so. Which is not packaged by debian libc6 mips64el build.
> Sorry, any debugging of that crap is going to happen in January ;-/

Cute... Completely unrelated, but there's a fun bug in mainline o32
coredumps - say readelf -a core and watch NT_FILE section dump.
Compare that for dumps done on mips32 and mips64 hosts (for the same
o32 binaries, obviously). Or to gcore(1) results on such processes,
for that matter.

What happens there is that 2aa362c49c31 ("coredump: extend core dump note
section to contain file names of mapped files") that has introduced that
section has added
#define user_long_t compat_long_t
to fs/compat_binfmt_elf.c, but not to arch/mips/kernel/binfmt_elfo32.c,
resulting in default (long) being used by fill_files_note().

2020-12-23 07:05:20

by Al Viro

[permalink] [raw]
Subject: Re: [PATCHSET] saner elf compat

[Denys Vlasenko cc'd]

On Wed, Dec 16, 2020 at 09:44:53AM +0000, Maciej W. Rozycki wrote:
> On Wed, 16 Dec 2020, Al Viro wrote:
>
> > > It may be worth pushing through GDB's gdb.threads/tls-core.exp test case,
> > > making sure no UNSUPPORTED results have been produced due to resource
> > > limits preventing a core from being dumped (and no FAILs, of course), with
> > > o32/n32 native GDB. This should guarantee our output is still as expected
> > > by an interpreter. Sadly I'm currently not set up for such testing though
> > > eventually I mean to.
> >
> > Umm... What triple does one use for n32 gdb?
>
> I don't think there's a standardised one, just configure with CC/CXX set
> for n32 compilation, e.g.:
>
> $ /path/to/configure CC="gcc -mabi=n32" CXX="g++ -mabi=n32"
>
> (and any other options set as usually). This has to be with CC/CXX rather
> than CFLAGS/CXXFLAGS so that it is guaranteed to be never overridden with
> any logic that might do any fiddling with compilation options. This will
> set up the test suite accordingly.
>
> NB this may already be the compiler's default, depending on how it was
> configured, i.e. if `--with-abi=n32' was used, in which case no extra
> options will be required. I don't know if any standard MIPS distribution
> does it though; 64-bit MIPS/Debian might. This will be reported with `gcc
> --help -v', somewhere along the way.
>
> Let me know if there are issues with this approach.

One issue is that testsuite doesn't care about $CC, $CFLAGS or anything
of that sort. What I'd done was
cat >~/bin/cc-n32 <<'EOF'
#!/bin/sh
exec /usr/bin/gcc -mabi=n32 "$@"
EOF
chmod +x ~/bin/cc-n32
and add CC_FOR_TARGET="/home/al/bin/cc-n32" in RUNTESTFLAGS.

With that it works. Moreover, it fixes a test failure on mainline.
Mainline kernel (5.10, same behaviour as debian/buster mips64el one):
Test run by al on Tue Dec 22 21:23:09 2020
Native configuration is mips64el-unknown-linux-gnuabin32

=== gdb tests ===

Schedule of variations:
unix

Running target unix
Using /usr/share/dejagnu/baseboards/unix.exp as board description file for target.
Using /usr/share/dejagnu/config/unix.exp as generic interface file for target.
Using /home/al/binutils-gdb/gdb/testsuite/config/unix.exp as tool-and-target-specific interface file.
Running /home/al/binutils-gdb/gdb/testsuite/gdb.threads/tls-core.exp ...
FAIL: gdb.threads/tls-core.exp: native: print thread-local storage variable
=== gdb Summary ===

# of expected passes 5
# of unexpected failures 1

vfs.git #work.elf-compat:
Test run by al on Tue Dec 22 21:31:14 2020
Native configuration is mips64el-unknown-linux-gnuabin32

=== gdb tests ===

Schedule of variations:
unix

Running target unix
Using /usr/share/dejagnu/baseboards/unix.exp as board description file for target.
Using /usr/share/dejagnu/config/unix.exp as generic interface file for target.
Using /home/al/binutils-gdb/gdb/testsuite/config/unix.exp as tool-and-target-specific interface file.
Running /home/al/binutils-gdb/gdb/testsuite/gdb.threads/tls-core.exp ...

=== gdb Summary ===

# of expected passes 6


Which is bloody embarrassing, since I'd completely missed the
behaviour change - this series was supposed to be an equivalent
transformation.

Anyway, the minimal patch fixing that failure is this one-liner and
unlike the elf-compat series it's trivial to backport:

[mips] fix n32 coredump breakage

Back in 2012, 49ae4d4b113b ("coredump: add a new elf note with siginfo
of the signal") has introduced a new ELF coredump note - NT_FILE. It contains
a mix of strings and addresses, and addresses are 32bit for 32bit targets
and 64bit for 64bit ones. Eventually gdb has come to use it.

Biarch targets had been taken care of from the very beginning - the
same commit has added a macro (user_long_t) with default being long
and fs/compat_binfmt_elf.c overriding it to compat_long_t.

Unfortunately, Denis had missed the mips weirdness. As the result,
on mips64 both o32 and n32 ended up using 64-bit layout. readelf(1)
is not happy. More importantly, neither is gdb(1); as the matter
of fact, gdb.thread/tls-core.exp kept complaining. Note that gcore(1)
is using 32bit layout for n32 case - it's only the kernel n32 coredumps
that get broken NT_FILE note.

NOTE: similar patch is almost certainly needed for o32; I have only
tested it with n32 gdb, though.

Fixes: 49ae4d4b113b ("coredump: add a new elf note with siginfo of the signal")
Signed-off-by: Al Viro <[email protected]>
---
diff --git a/arch/mips/kernel/binfmt_elfn32.c b/arch/mips/kernel/binfmt_elfn32.c
index 6ee3f7218c67..c073136968e8 100644
--- a/arch/mips/kernel/binfmt_elfn32.c
+++ b/arch/mips/kernel/binfmt_elfn32.c
@@ -103,4 +103,6 @@ jiffies_to_old_timeval32(unsigned long jiffies, struct old_timeval32 *value)
#undef ns_to_kernel_old_timeval
#define ns_to_kernel_old_timeval ns_to_old_timeval32

+#define user_long_t compat_long_t
+
#include "../../../fs/binfmt_elf.c"

2020-12-23 07:14:06

by Al Viro

[permalink] [raw]
Subject: Re: [PATCHSET] saner elf compat

On Wed, Dec 23, 2020 at 07:03:20AM +0000, Al Viro wrote:

Argh.... Wrong commit blamed - the parent of the correct one.
It's actually 2aa362c49c31 ("coredump: extend core dump note section to
contain file names of mapped files"). My apologies - fat-fingered
cut'n'paste...

siginfo commit does suffer the same problem, but it becomes an issue
only for 32bit processes under mips64 big-endian kernel (there it yields
e.g. zero .__sigfault.si_addr in $_siginfo when using gdb with a coredump
of 32bit process, whatever the actual faulting address had been). And
b-e mips64 is rather uncommon, so that's less of an issue.

2020-12-24 19:46:51

by Al Viro

[permalink] [raw]
Subject: [RFC][PATCH] NT_FILE/NT_SIGINFO breakage on mips compat coredumps

On Wed, Dec 23, 2020 at 07:12:13AM +0000, Al Viro wrote:
> On Wed, Dec 23, 2020 at 07:03:20AM +0000, Al Viro wrote:
>
> Argh.... Wrong commit blamed - the parent of the correct one.
> It's actually 2aa362c49c31 ("coredump: extend core dump note section to
> contain file names of mapped files"). My apologies - fat-fingered
> cut'n'paste...
>
> siginfo commit does suffer the same problem, but it becomes an issue
> only for 32bit processes under mips64 big-endian kernel (there it yields
> e.g. zero .__sigfault.si_addr in $_siginfo when using gdb with a coredump
> of 32bit process, whatever the actual faulting address had been). And
> b-e mips64 is rather uncommon, so that's less of an issue.

FWIW, here's debian/mips image (stretch) booted with
qemu-system-mips64 -M malta -cpu 5KEc:
root@mips:~# uname -a
Linux mips 4.9.0-13-5kc-malta #1 Debian 4.9.228-1 (2020-07-05) mips64 GNU/Linux
root@mips:~# cat a.c
main()
{
*(char *)0x0123 = 0;
}
root@mips:~# gcc a.c
a.c:1:1: warning: return type defaults to ‘int’ [-Wimplicit-int]
main()
^~~~
root@mips:~# ulimit -c unlimited
root@mips:~# ./a.out
[ 519.744983] do_page_fault(): sending SIGSEGV to a.out for invalid write access to 0000000000000123
[ 519.746735] epc = 00000000558477c0 in a.out[55847000+1000]
[ 519.747758] ra = 000000007792f4a8 in libc-2.24.so[77916000+16a000]
Segmentation fault (core dumped)
root@mips:~# gdb a.out core
GNU gdb (Debian 7.12-6) 7.12.0.20161007-git
Copyright (C) 2016 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "mips-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from a.out...(no debugging symbols found)...done.
[New LWP 1202]
Core was generated by `./a.out'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0 0x55dde7c0 in main ()
(gdb) print $_siginfo
$1 = {si_signo = 11, si_errno = 1, si_code = 0, _sifields = {_pad = {0, 0,
291, 0 <repeats 26 times>}, _kill = {si_pid = 0, si_uid = 0}, _timer = {
si_tid = 0, si_overrun = 0, si_sigval = {sival_int = 291,
sival_ptr = 0x123}}, _rt = {si_pid = 0, si_uid = 0, si_sigval = {
sival_int = 291, sival_ptr = 0x123}}, _sigchld = {si_pid = 0,
si_uid = 0, si_status = 291, si_utime = 0, si_stime = 0}, _sigfault = {
si_addr = 0x0}, _sigpoll = {si_band = 0, si_fd = 0}}}
(gdb) quit


Note the wrong value in _sigfault.si_addr - it should've been 0x123, not 0.

root@mips:~# readelf -n core

Displaying notes found at file offset 0x00000234 with length 0x000005f4:
Owner Data size Description
CORE 0x00000100 NT_PRSTATUS (prstatus structure)
CORE 0x00000080 NT_PRPSINFO (prpsinfo structure)
CORE 0x00000080 NT_SIGINFO (siginfo_t data)
CORE 0x00000090 NT_AUXV (auxiliary vector)
CORE 0x000001e1 NT_FILE (mapped files)
Page size: 9
Start End Page Offset
CORE 0x00000108 NT_FPREGSET (floating point registers)

For comparison, exact same image booted with qemu-system-mips -M malta:

root@mips:~# uname -a
Linux mips 4.9.0-13-4kc-malta #1 Debian 4.9.228-1 (2020-07-05) mips GNU/Linux
root@mips:~# ulimit -c unlimited
root@mips:~# ./a.out
[ 83.380870] do_page_fault(): sending SIGSEGV to a.out for invalid write access to 00000123
[ 83.390678] epc = 55e0e7c0 in a.out[55e0e000+1000]
[ 83.391525] ra = 76f644a8 in libc-2.24.so[76f4b000+16a000]
Segmentation fault (core dumped)
root@mips:~# gdb a.out core
GNU gdb (Debian 7.12-6) 7.12.0.20161007-git
Copyright (C) 2016 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "mips-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from a.out...(no debugging symbols found)...done.
[New LWP 1184]
Core was generated by `./a.out'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0 0x55e0e7c0 in main ()
(gdb) print $_siginfo
$1 = {si_signo = 11, si_errno = 1, si_code = 0, _sifields = {_pad = {291,
0 <repeats 28 times>}, _kill = {si_pid = 291, si_uid = 0}, _timer = {
si_tid = 291, si_overrun = 0, si_sigval = {sival_int = 0,
sival_ptr = 0x0}}, _rt = {si_pid = 291, si_uid = 0, si_sigval = {
sival_int = 0, sival_ptr = 0x0}}, _sigchld = {si_pid = 291,
si_uid = 0, si_status = 0, si_utime = 0, si_stime = 0}, _sigfault = {
si_addr = 0x123}, _sigpoll = {si_band = 291, si_fd = 0}}}
(gdb) quit
root@mips:~# readelf -n core

Displaying notes found at file offset 0x00000234 with length 0x00000580:
Owner Data size Description
CORE 0x00000100 NT_PRSTATUS (prstatus structure)
CORE 0x00000080 NT_PRPSINFO (prpsinfo structure)
CORE 0x00000080 NT_SIGINFO (siginfo_t data)
CORE 0x00000090 NT_AUXV (auxiliary vector)
CORE 0x0000016d NT_FILE (mapped files)
Page size: 4096
Start End Page Offset
0x55e0e000 0x55e0f000 0x00000000
/root/a.out
0x55e1e000 0x55e1f000 0x00000000
/root/a.out
0x76f4b000 0x770b5000 0x00000000
/lib/mips-linux-gnu/libc-2.24.so
0x770b5000 0x770c5000 0x0000016a
/lib/mips-linux-gnu/libc-2.24.so
0x770c5000 0x770c8000 0x0000016a
/lib/mips-linux-gnu/libc-2.24.so
0x770c8000 0x770cb000 0x0000016d
/lib/mips-linux-gnu/libc-2.24.so
0x770cd000 0x770f0000 0x00000000
/lib/mips-linux-gnu/ld-2.24.so
0x770ff000 0x77100000 0x00000022
/lib/mips-linux-gnu/ld-2.24.so
0x77100000 0x77101000 0x00000023
/lib/mips-linux-gnu/ld-2.24.so
CORE 0x00000108 NT_FPREGSET (floating point registers)

So that's not so theoretical - big-endian mips64 userland is unsupported,
but booting the big-endian mips32 userland on mips64 hardware is clearly
meant to work - they even ship a 64bit kernel built for that.

IOW, both O32 and N32 coredumps in 64bit mips kernels have broken
NT_FILE and NT_SIGINFO. And while NT_SIGINFO breakage is really
visible only on b-e, NT_FILE one is common to b-e and l-e. One of
the effects of the latter is that current gdb fails to work with
threaded coredumps of 32bit processes produced on boxen with 64bit
kernels. Coredumps generated by gcore(1) are fine...

I think the following ought to be applied. Comments?

[mips] fix malformed NT_FILE and NT_SIGINFO in 32bit coredumps

Patches that introduced NT_FILE and NT_SIGINFO notes back in 2012
had taken care of native (fs/binfmt_elf.c) and compat (fs/compat_binfmt_elf.c)
coredumps; unfortunately, compat on mips (which does not go through the
usual compat_binfmt_elf.c) had not been noticed.

As the result, both N32 and O32 coredumps on 64bit mips kernels
have those sections malformed enough to confuse the living hell out of
all gdb and readelf versions (up to and including the tip of binutils-gdb.git).

Longer term solution is to make both O32 and N32 compat use the
regular compat_binfmt_elf.c, but that's too much for backports. The minimal
solution is to do in arch/mips/kernel/binfmt_elf[on]32.c the same thing
those patches have done in fs/compat_binfmt_elf.c

Cc: [email protected] # v3.7+
Signed-off-by: Al Viro <[email protected]>
---
diff --git a/arch/mips/kernel/binfmt_elfn32.c b/arch/mips/kernel/binfmt_elfn32.c
index 6ee3f7218c67..c4441416e96b 100644
--- a/arch/mips/kernel/binfmt_elfn32.c
+++ b/arch/mips/kernel/binfmt_elfn32.c
@@ -103,4 +103,11 @@ jiffies_to_old_timeval32(unsigned long jiffies, struct old_timeval32 *value)
#undef ns_to_kernel_old_timeval
#define ns_to_kernel_old_timeval ns_to_old_timeval32

+/*
+ * Some data types as stored in coredump.
+ */
+#define user_long_t compat_long_t
+#define user_siginfo_t compat_siginfo_t
+#define copy_siginfo_to_external copy_siginfo_to_external32
+
#include "../../../fs/binfmt_elf.c"
diff --git a/arch/mips/kernel/binfmt_elfo32.c b/arch/mips/kernel/binfmt_elfo32.c
index 6dd103d3cebb..7b2a23f48c1a 100644
--- a/arch/mips/kernel/binfmt_elfo32.c
+++ b/arch/mips/kernel/binfmt_elfo32.c
@@ -106,4 +106,11 @@ jiffies_to_old_timeval32(unsigned long jiffies, struct old_timeval32 *value)
#undef ns_to_kernel_old_timeval
#define ns_to_kernel_old_timeval ns_to_old_timeval32

+/*
+ * Some data types as stored in coredump.
+ */
+#define user_long_t compat_long_t
+#define user_siginfo_t compat_siginfo_t
+#define copy_siginfo_to_external copy_siginfo_to_external32
+
#include "../../../fs/binfmt_elf.c"

2020-12-29 15:12:57

by Thomas Bogendoerfer

[permalink] [raw]
Subject: Re: [RFC][PATCH] NT_FILE/NT_SIGINFO breakage on mips compat coredumps

On Thu, Dec 24, 2020 at 07:44:38PM +0000, Al Viro wrote:
> [mips] fix malformed NT_FILE and NT_SIGINFO in 32bit coredumps
>
> Patches that introduced NT_FILE and NT_SIGINFO notes back in 2012
> had taken care of native (fs/binfmt_elf.c) and compat (fs/compat_binfmt_elf.c)
> coredumps; unfortunately, compat on mips (which does not go through the
> usual compat_binfmt_elf.c) had not been noticed.
>
> As the result, both N32 and O32 coredumps on 64bit mips kernels
> have those sections malformed enough to confuse the living hell out of
> all gdb and readelf versions (up to and including the tip of binutils-gdb.git).
>
> Longer term solution is to make both O32 and N32 compat use the
> regular compat_binfmt_elf.c, but that's too much for backports. The minimal
> solution is to do in arch/mips/kernel/binfmt_elf[on]32.c the same thing
> those patches have done in fs/compat_binfmt_elf.c
>
> Cc: [email protected] # v3.7+
> Signed-off-by: Al Viro <[email protected]>
> ---
> diff --git a/arch/mips/kernel/binfmt_elfn32.c b/arch/mips/kernel/binfmt_elfn32.c
> index 6ee3f7218c67..c4441416e96b 100644
> --- a/arch/mips/kernel/binfmt_elfn32.c
> +++ b/arch/mips/kernel/binfmt_elfn32.c
> @@ -103,4 +103,11 @@ jiffies_to_old_timeval32(unsigned long jiffies, struct old_timeval32 *value)
> #undef ns_to_kernel_old_timeval
> #define ns_to_kernel_old_timeval ns_to_old_timeval32
>
> +/*
> + * Some data types as stored in coredump.
> + */
> +#define user_long_t compat_long_t
> +#define user_siginfo_t compat_siginfo_t
> +#define copy_siginfo_to_external copy_siginfo_to_external32
> +
> #include "../../../fs/binfmt_elf.c"
> diff --git a/arch/mips/kernel/binfmt_elfo32.c b/arch/mips/kernel/binfmt_elfo32.c
> index 6dd103d3cebb..7b2a23f48c1a 100644
> --- a/arch/mips/kernel/binfmt_elfo32.c
> +++ b/arch/mips/kernel/binfmt_elfo32.c
> @@ -106,4 +106,11 @@ jiffies_to_old_timeval32(unsigned long jiffies, struct old_timeval32 *value)
> #undef ns_to_kernel_old_timeval
> #define ns_to_kernel_old_timeval ns_to_old_timeval32
>
> +/*
> + * Some data types as stored in coredump.
> + */
> +#define user_long_t compat_long_t
> +#define user_siginfo_t compat_siginfo_t
> +#define copy_siginfo_to_external copy_siginfo_to_external32
> +
> #include "../../../fs/binfmt_elf.c"

LGTM, I've applied it to mips-fixes.

Thomas.

--
Crap can work. Given enough thrust pigs will fly, but it's not necessarily a
good idea. [ RFC1925, 2.3 ]