2018-01-30 09:42:53

by Michal Hocko

[permalink] [raw]
Subject: Re: ppc elf_map breakage with MAP_FIXED_NOREPLACE

On Tue 30-01-18 14:35:12, Michael Ellerman wrote:
> Michal Hocko <[email protected]> writes:
>
> > On Mon 29-01-18 11:02:09, Anshuman Khandual wrote:
> >> On 01/29/2018 08:17 AM, Anshuman Khandual wrote:
> >> > On 01/26/2018 07:34 PM, Michal Hocko wrote:
> >> >> On Fri 26-01-18 18:04:27, Anshuman Khandual wrote:
> >> >> [...]
> >> >>> I tried to instrument mmap_region() for a single instance of 'sed'
> >> >>> binary and traced all it's VMA creation. But there is no trace when
> >> >>> that 'anon' VMA got created which suddenly shows up during subsequent
> >> >>> elf_map() call eventually failing it. Please note that the following
> >> >>> VMA was never created through call into map_region() in the process
> >> >>> which is strange.
> ...
> >>
> >> Okay, this colliding VMA seems to be getting loaded from load_elf_binary()
> >> function as well.
> >>
> >> [ 9.422410] vma c000001fceedbc40 start 0000000010030000 end 0000000010040000
> >> next c000001fceedbe80 prev c000001fceedb700 mm c000001fceea8200
> >> prot 8000000000000104 anon_vma (null) vm_ops (null)
> >> pgoff 1003 file (null) private_data (null)
> >> flags: 0x100073(read|write|mayread|maywrite|mayexec|account)
> >> [ 9.422576] CPU: 46 PID: 7457 Comm: sed Not tainted 4.14.0-dirty #158
> >> [ 9.422610] Call Trace:
> >> [ 9.422623] [c000001fdc4f79b0] [c000000000b17ac0] dump_stack+0xb0/0xf0 (unreliable)
> >> [ 9.422670] [c000001fdc4f79f0] [c0000000002dafb8] do_brk_flags+0x2d8/0x440
> >> [ 9.422708] [c000001fdc4f7ac0] [c0000000002db3d0] vm_brk_flags+0x80/0x130
> >> [ 9.422747] [c000001fdc4f7b20] [c0000000003d23a4] set_brk+0x80/0xdc
> >> [ 9.422785] [c000001fdc4f7b60] [c0000000003d1f24] load_elf_binary+0x1304/0x158c
> >> [ 9.422830] [c000001fdc4f7c80] [c00000000035d3e0] search_binary_handler+0xd0/0x270
> >> [ 9.422881] [c000001fdc4f7d10] [c00000000035f338] do_execveat_common.isra.31+0x658/0x890
> >> [ 9.422926] [c000001fdc4f7df0] [c00000000035f980] SyS_execve+0x40/0x50
> >> [ 9.423588] [c000001fdc4f7e30] [c00000000000b220] system_call+0x58/0x6c
> >>
> >> which is getting hit after adding some more debug.
> >
> > Voila! So your binary simply overrides brk by elf segments. That sounds
> > like the exactly the thing that the patch is supposed to protect from.
> > Why this is the case I dunno. It is just clear that either brk or
> > elf base are not put to the proper place. Something to get fixed. You
> > are probably just lucky that brk allocations do not spil over to elf
> > mappings.
>
> It is something to get fixed, but we can't retrospectively fix the
> existing binaries sitting on peoples' systems.

Yeah. Can we identify those somehow? Are they something people can
easily come across?

> Possibly powerpc arch code is doing something with the mmap layout or
> something else that is confusing the ELF loader, in which case we should
> fix that.

Yes this definitely should be fixed. How can elf loader completely
overlap brk mapping?

> But if not then the only solution is for the ELF loader to be more
> tolerant of this situation.
>
> So for 4.16 this patch either needs to be dropped, or reworked such that
> powerpc can opt out of it.

Yeah, let's hold on merging this until we understand what the heck is
going on here. If this turnes to be unfixable I will think of a way for
ppc to opt out.

Anshuman, could you try to run
sed 's@^@@' /proc/self/smaps
on a system with MAP_FIXED_NOREPLACE reverted?
--
Michal Hocko
SUSE Labs


2018-01-31 05:18:47

by Anshuman Khandual

[permalink] [raw]
Subject: Re: ppc elf_map breakage with MAP_FIXED_NOREPLACE

On 01/30/2018 03:12 PM, Michal Hocko wrote:
> On Tue 30-01-18 14:35:12, Michael Ellerman wrote:
>> Michal Hocko <[email protected]> writes:
>>
>>> On Mon 29-01-18 11:02:09, Anshuman Khandual wrote:
>>>> On 01/29/2018 08:17 AM, Anshuman Khandual wrote:
>>>>> On 01/26/2018 07:34 PM, Michal Hocko wrote:
>>>>>> On Fri 26-01-18 18:04:27, Anshuman Khandual wrote:
>>>>>> [...]
>>>>>>> I tried to instrument mmap_region() for a single instance of 'sed'
>>>>>>> binary and traced all it's VMA creation. But there is no trace when
>>>>>>> that 'anon' VMA got created which suddenly shows up during subsequent
>>>>>>> elf_map() call eventually failing it. Please note that the following
>>>>>>> VMA was never created through call into map_region() in the process
>>>>>>> which is strange.
>> ...
>>>>
>>>> Okay, this colliding VMA seems to be getting loaded from load_elf_binary()
>>>> function as well.
>>>>
>>>> [ 9.422410] vma c000001fceedbc40 start 0000000010030000 end 0000000010040000
>>>> next c000001fceedbe80 prev c000001fceedb700 mm c000001fceea8200
>>>> prot 8000000000000104 anon_vma (null) vm_ops (null)
>>>> pgoff 1003 file (null) private_data (null)
>>>> flags: 0x100073(read|write|mayread|maywrite|mayexec|account)
>>>> [ 9.422576] CPU: 46 PID: 7457 Comm: sed Not tainted 4.14.0-dirty #158
>>>> [ 9.422610] Call Trace:
>>>> [ 9.422623] [c000001fdc4f79b0] [c000000000b17ac0] dump_stack+0xb0/0xf0 (unreliable)
>>>> [ 9.422670] [c000001fdc4f79f0] [c0000000002dafb8] do_brk_flags+0x2d8/0x440
>>>> [ 9.422708] [c000001fdc4f7ac0] [c0000000002db3d0] vm_brk_flags+0x80/0x130
>>>> [ 9.422747] [c000001fdc4f7b20] [c0000000003d23a4] set_brk+0x80/0xdc
>>>> [ 9.422785] [c000001fdc4f7b60] [c0000000003d1f24] load_elf_binary+0x1304/0x158c
>>>> [ 9.422830] [c000001fdc4f7c80] [c00000000035d3e0] search_binary_handler+0xd0/0x270
>>>> [ 9.422881] [c000001fdc4f7d10] [c00000000035f338] do_execveat_common.isra.31+0x658/0x890
>>>> [ 9.422926] [c000001fdc4f7df0] [c00000000035f980] SyS_execve+0x40/0x50
>>>> [ 9.423588] [c000001fdc4f7e30] [c00000000000b220] system_call+0x58/0x6c
>>>>
>>>> which is getting hit after adding some more debug.
>>>
>>> Voila! So your binary simply overrides brk by elf segments. That sounds
>>> like the exactly the thing that the patch is supposed to protect from.
>>> Why this is the case I dunno. It is just clear that either brk or
>>> elf base are not put to the proper place. Something to get fixed. You
>>> are probably just lucky that brk allocations do not spil over to elf
>>> mappings.
>>
>> It is something to get fixed, but we can't retrospectively fix the
>> existing binaries sitting on peoples' systems.
>
> Yeah. Can we identify those somehow? Are they something people can
> easily come across?
>
>> Possibly powerpc arch code is doing something with the mmap layout or
>> something else that is confusing the ELF loader, in which case we should
>> fix that.
>
> Yes this definitely should be fixed. How can elf loader completely
> overlap brk mapping?
>
>> But if not then the only solution is for the ELF loader to be more
>> tolerant of this situation.
>>
>> So for 4.16 this patch either needs to be dropped, or reworked such that
>> powerpc can opt out of it.
>
> Yeah, let's hold on merging this until we understand what the heck is
> going on here. If this turnes to be unfixable I will think of a way for
> ppc to opt out.
>
> Anshuman, could you try to run
> sed 's@^@@' /proc/self/smaps
> on a system with MAP_FIXED_NOREPLACE reverted?
>

After reverting the following commits from mmotm-2018-01-25-16-20 tag.

67caea694ba5965a52a61fdad495d847f03c4025 ("mm-introduce-map_fixed_safe-fix")
64da2e0c134ecf3936a4c36b949bcf2cdc98977e ("fs-elf-drop-map_fixed-usage-from-elf_map-fix-fix")
645983ab6ca7fd644f52b4c55462b91940012595 ("mm: don't use the same value for MAP_FIXED_NOREPLACE and MAP_SYNC")
d77bab291ac435aab91fa214b85efa74a26c9c22 ("fs-elf-drop-map_fixed-usage-from-elf_map-checkpatch-fixes")
a75c5f92d9ecb21d3299cc7db48e401cbf335c34 ("fs, elf: drop MAP_FIXED usage from elf_map")
00906d029ffe515221e3939b222c237026af2903 ("mm: introduce MAP_FIXED_NOREPLACE")

$sed 's@^@@' /proc/self/smaps
-------------------------------------------
10000000-10020000 r-xp 00000000 fd:00 10558 /usr/bin/sed
Size: 128 kB
KernelPageSize: 64 kB
MMUPageSize: 64 kB
Rss: 128 kB
Pss: 128 kB
Shared_Clean: 0 kB
Shared_Dirty: 0 kB
Private_Clean: 128 kB
Private_Dirty: 0 kB
Referenced: 128 kB
Anonymous: 0 kB
LazyFree: 0 kB
AnonHugePages: 0 kB
ShmemPmdMapped: 0 kB
Shared_Hugetlb: 0 kB
Private_Hugetlb: 0 kB
Swap: 0 kB
SwapPss: 0 kB
Locked: 128 kB
VmFlags: rd ex mr mw me dw
10020000-10030000 r--p 00010000 fd:00 10558 /usr/bin/sed
Size: 64 kB
KernelPageSize: 64 kB
MMUPageSize: 64 kB
Rss: 64 kB
Pss: 64 kB
Shared_Clean: 0 kB
Shared_Dirty: 0 kB
Private_Clean: 0 kB
Private_Dirty: 64 kB
Referenced: 64 kB
Anonymous: 64 kB
LazyFree: 0 kB
AnonHugePages: 0 kB
ShmemPmdMapped: 0 kB
Shared_Hugetlb: 0 kB
Private_Hugetlb: 0 kB
Swap: 0 kB
SwapPss: 0 kB
Locked: 64 kB
VmFlags: rd mr mw me dw ac
10030000-10040000 rw-p 00020000 fd:00 10558 /usr/bin/sed
Size: 64 kB
KernelPageSize: 64 kB
MMUPageSize: 64 kB
Rss: 64 kB
Pss: 64 kB
Shared_Clean: 0 kB
Shared_Dirty: 0 kB
Private_Clean: 0 kB
Private_Dirty: 64 kB
Referenced: 64 kB
Anonymous: 64 kB
LazyFree: 0 kB
AnonHugePages: 0 kB
ShmemPmdMapped: 0 kB
Shared_Hugetlb: 0 kB
Private_Hugetlb: 0 kB
Swap: 0 kB
SwapPss: 0 kB
Locked: 64 kB
VmFlags: rd wr mr mw me dw ac
2cbb0000-2cbe0000 rw-p 00000000 00:00 0 [heap]
Size: 192 kB
KernelPageSize: 64 kB
MMUPageSize: 64 kB
Rss: 64 kB
Pss: 64 kB
Shared_Clean: 0 kB
Shared_Dirty: 0 kB
Private_Clean: 0 kB
Private_Dirty: 64 kB
Referenced: 64 kB
Anonymous: 64 kB
LazyFree: 0 kB
AnonHugePages: 0 kB
ShmemPmdMapped: 0 kB
Shared_Hugetlb: 0 kB
Private_Hugetlb: 0 kB
Swap: 0 kB
SwapPss: 0 kB
Locked: 64 kB
VmFlags: rd wr mr mw me ac
7fff7f9c0000-7fff7f9e0000 rw-p 00000000 00:00 0
Size: 128 kB
KernelPageSize: 64 kB
MMUPageSize: 64 kB
Rss: 128 kB
Pss: 128 kB
Shared_Clean: 0 kB
Shared_Dirty: 0 kB
Private_Clean: 0 kB
Private_Dirty: 128 kB
Referenced: 128 kB
Anonymous: 128 kB
LazyFree: 0 kB
AnonHugePages: 0 kB
ShmemPmdMapped: 0 kB
Shared_Hugetlb: 0 kB
Private_Hugetlb: 0 kB
Swap: 0 kB
SwapPss: 0 kB
Locked: 128 kB
VmFlags: rd wr mr mw me ac
7fff7f9e0000-7fff86280000 r--p 00000000 fd:00 33660156 /usr/lib/locale/locale-archive
Size: 107136 kB
KernelPageSize: 64 kB
MMUPageSize: 64 kB
Rss: 384 kB
Pss: 37 kB
Shared_Clean: 384 kB
Shared_Dirty: 0 kB
Private_Clean: 0 kB
Private_Dirty: 0 kB
Referenced: 384 kB
Anonymous: 0 kB
LazyFree: 0 kB
AnonHugePages: 0 kB
ShmemPmdMapped: 0 kB
Shared_Hugetlb: 0 kB
Private_Hugetlb: 0 kB
Swap: 0 kB
SwapPss: 0 kB
Locked: 37 kB
VmFlags: rd mr mw me
7fff86280000-7fff86290000 r-xp 00000000 fd:00 33660115 /usr/lib64/libdl-2.17.so
Size: 64 kB
KernelPageSize: 64 kB
MMUPageSize: 64 kB
Rss: 64 kB
Pss: 2 kB
Shared_Clean: 64 kB
Shared_Dirty: 0 kB
Private_Clean: 0 kB
Private_Dirty: 0 kB
Referenced: 64 kB
Anonymous: 0 kB
LazyFree: 0 kB
AnonHugePages: 0 kB
ShmemPmdMapped: 0 kB
Shared_Hugetlb: 0 kB
Private_Hugetlb: 0 kB
Swap: 0 kB
SwapPss: 0 kB
Locked: 2 kB
VmFlags: rd ex mr mw me
7fff86290000-7fff862a0000 r--p 00000000 fd:00 33660115 /usr/lib64/libdl-2.17.so
Size: 64 kB
KernelPageSize: 64 kB
MMUPageSize: 64 kB
Rss: 64 kB
Pss: 64 kB
Shared_Clean: 0 kB
Shared_Dirty: 0 kB
Private_Clean: 0 kB
Private_Dirty: 64 kB
Referenced: 64 kB
Anonymous: 64 kB
LazyFree: 0 kB
AnonHugePages: 0 kB
ShmemPmdMapped: 0 kB
Shared_Hugetlb: 0 kB
Private_Hugetlb: 0 kB
Swap: 0 kB
SwapPss: 0 kB
Locked: 64 kB
VmFlags: rd mr mw me ac
7fff862a0000-7fff862b0000 rw-p 00010000 fd:00 33660115 /usr/lib64/libdl-2.17.so
Size: 64 kB
KernelPageSize: 64 kB
MMUPageSize: 64 kB
Rss: 64 kB
Pss: 64 kB
Shared_Clean: 0 kB
Shared_Dirty: 0 kB
Private_Clean: 0 kB
Private_Dirty: 64 kB
Referenced: 64 kB
Anonymous: 64 kB
LazyFree: 0 kB
AnonHugePages: 0 kB
ShmemPmdMapped: 0 kB
Shared_Hugetlb: 0 kB
Private_Hugetlb: 0 kB
Swap: 0 kB
SwapPss: 0 kB
Locked: 64 kB
VmFlags: rd wr mr mw me ac
7fff862b0000-7fff86300000 r-xp 00000000 fd:00 33594504 /usr/lib64/libpcre.so.1.2.0
Size: 320 kB
KernelPageSize: 64 kB
MMUPageSize: 64 kB
Rss: 64 kB
Pss: 2 kB
Shared_Clean: 64 kB
Shared_Dirty: 0 kB
Private_Clean: 0 kB
Private_Dirty: 0 kB
Referenced: 64 kB
Anonymous: 0 kB
LazyFree: 0 kB
AnonHugePages: 0 kB
ShmemPmdMapped: 0 kB
Shared_Hugetlb: 0 kB
Private_Hugetlb: 0 kB
Swap: 0 kB
SwapPss: 0 kB
Locked: 2 kB
VmFlags: rd ex mr mw me
7fff86300000-7fff86310000 r--p 00040000 fd:00 33594504 /usr/lib64/libpcre.so.1.2.0
Size: 64 kB
KernelPageSize: 64 kB
MMUPageSize: 64 kB
Rss: 64 kB
Pss: 64 kB
Shared_Clean: 0 kB
Shared_Dirty: 0 kB
Private_Clean: 0 kB
Private_Dirty: 64 kB
Referenced: 64 kB
Anonymous: 64 kB
LazyFree: 0 kB
AnonHugePages: 0 kB
ShmemPmdMapped: 0 kB
Shared_Hugetlb: 0 kB
Private_Hugetlb: 0 kB
Swap: 0 kB
SwapPss: 0 kB
Locked: 64 kB
VmFlags: rd mr mw me ac
7fff86310000-7fff86320000 rw-p 00050000 fd:00 33594504 /usr/lib64/libpcre.so.1.2.0
Size: 64 kB
KernelPageSize: 64 kB
MMUPageSize: 64 kB
Rss: 64 kB
Pss: 64 kB
Shared_Clean: 0 kB
Shared_Dirty: 0 kB
Private_Clean: 0 kB
Private_Dirty: 64 kB
Referenced: 64 kB
Anonymous: 64 kB
LazyFree: 0 kB
AnonHugePages: 0 kB
ShmemPmdMapped: 0 kB
Shared_Hugetlb: 0 kB
Private_Hugetlb: 0 kB
Swap: 0 kB
SwapPss: 0 kB
Locked: 64 kB
VmFlags: rd wr mr mw me ac
7fff86320000-7fff864f0000 r-xp 00000000 fd:00 33660109 /usr/lib64/libc-2.17.so
Size: 1856 kB
KernelPageSize: 64 kB
MMUPageSize: 64 kB
Rss: 1280 kB
Pss: 45 kB
Shared_Clean: 1280 kB
Shared_Dirty: 0 kB
Private_Clean: 0 kB
Private_Dirty: 0 kB
Referenced: 1280 kB
Anonymous: 0 kB
LazyFree: 0 kB
AnonHugePages: 0 kB
ShmemPmdMapped: 0 kB
Shared_Hugetlb: 0 kB
Private_Hugetlb: 0 kB
Swap: 0 kB
SwapPss: 0 kB
Locked: 45 kB
VmFlags: rd ex mr mw me
7fff864f0000-7fff86500000 r--p 001c0000 fd:00 33660109 /usr/lib64/libc-2.17.so
Size: 64 kB
KernelPageSize: 64 kB
MMUPageSize: 64 kB
Rss: 64 kB
Pss: 64 kB
Shared_Clean: 0 kB
Shared_Dirty: 0 kB
Private_Clean: 0 kB
Private_Dirty: 64 kB
Referenced: 64 kB
Anonymous: 64 kB
LazyFree: 0 kB
AnonHugePages: 0 kB
ShmemPmdMapped: 0 kB
Shared_Hugetlb: 0 kB
Private_Hugetlb: 0 kB
Swap: 0 kB
SwapPss: 0 kB
Locked: 64 kB
VmFlags: rd mr mw me ac
7fff86500000-7fff86510000 rw-p 001d0000 fd:00 33660109 /usr/lib64/libc-2.17.so
Size: 64 kB
KernelPageSize: 64 kB
MMUPageSize: 64 kB
Rss: 64 kB
Pss: 64 kB
Shared_Clean: 0 kB
Shared_Dirty: 0 kB
Private_Clean: 0 kB
Private_Dirty: 64 kB
Referenced: 64 kB
Anonymous: 64 kB
LazyFree: 0 kB
AnonHugePages: 0 kB
ShmemPmdMapped: 0 kB
Shared_Hugetlb: 0 kB
Private_Hugetlb: 0 kB
Swap: 0 kB
SwapPss: 0 kB
Locked: 64 kB
VmFlags: rd wr mr mw me ac
7fff86510000-7fff86540000 r-xp 00000000 fd:00 33594516 /usr/lib64/libselinux.so.1
Size: 192 kB
KernelPageSize: 64 kB
MMUPageSize: 64 kB
Rss: 192 kB
Pss: 8 kB
Shared_Clean: 192 kB
Shared_Dirty: 0 kB
Private_Clean: 0 kB
Private_Dirty: 0 kB
Referenced: 192 kB
Anonymous: 0 kB
LazyFree: 0 kB
AnonHugePages: 0 kB
ShmemPmdMapped: 0 kB
Shared_Hugetlb: 0 kB
Private_Hugetlb: 0 kB
Swap: 0 kB
SwapPss: 0 kB
Locked: 8 kB
VmFlags: rd ex mr mw me
7fff86540000-7fff86550000 r--p 00020000 fd:00 33594516 /usr/lib64/libselinux.so.1
Size: 64 kB
KernelPageSize: 64 kB
MMUPageSize: 64 kB
Rss: 64 kB
Pss: 64 kB
Shared_Clean: 0 kB
Shared_Dirty: 0 kB
Private_Clean: 0 kB
Private_Dirty: 64 kB
Referenced: 64 kB
Anonymous: 64 kB
LazyFree: 0 kB
AnonHugePages: 0 kB
ShmemPmdMapped: 0 kB
Shared_Hugetlb: 0 kB
Private_Hugetlb: 0 kB
Swap: 0 kB
SwapPss: 0 kB
Locked: 64 kB
VmFlags: rd mr mw me ac
7fff86550000-7fff86560000 rw-p 00030000 fd:00 33594516 /usr/lib64/libselinux.so.1
Size: 64 kB
KernelPageSize: 64 kB
MMUPageSize: 64 kB
Rss: 64 kB
Pss: 64 kB
Shared_Clean: 0 kB
Shared_Dirty: 0 kB
Private_Clean: 0 kB
Private_Dirty: 64 kB
Referenced: 64 kB
Anonymous: 64 kB
LazyFree: 0 kB
AnonHugePages: 0 kB
ShmemPmdMapped: 0 kB
Shared_Hugetlb: 0 kB
Private_Hugetlb: 0 kB
Swap: 0 kB
SwapPss: 0 kB
Locked: 64 kB
VmFlags: rd wr mr mw me ac
7fff86560000-7fff86570000 r--s 00000000 fd:00 67194934 /usr/lib64/gconv/gconv-modules.cache
Size: 64 kB
KernelPageSize: 64 kB
MMUPageSize: 64 kB
Rss: 64 kB
Pss: 12 kB
Shared_Clean: 64 kB
Shared_Dirty: 0 kB
Private_Clean: 0 kB
Private_Dirty: 0 kB
Referenced: 64 kB
Anonymous: 0 kB
LazyFree: 0 kB
AnonHugePages: 0 kB
ShmemPmdMapped: 0 kB
Shared_Hugetlb: 0 kB
Private_Hugetlb: 0 kB
Swap: 0 kB
SwapPss: 0 kB
Locked: 12 kB
VmFlags: rd mr me ms
7fff86570000-7fff86590000 r-xp 00000000 00:00 0 [vdso]
Size: 128 kB
KernelPageSize: 64 kB
MMUPageSize: 64 kB
Rss: 64 kB
Pss: 1 kB
Shared_Clean: 64 kB
Shared_Dirty: 0 kB
Private_Clean: 0 kB
Private_Dirty: 0 kB
Referenced: 64 kB
Anonymous: 0 kB
LazyFree: 0 kB
AnonHugePages: 0 kB
ShmemPmdMapped: 0 kB
Shared_Hugetlb: 0 kB
Private_Hugetlb: 0 kB
Swap: 0 kB
SwapPss: 0 kB
Locked: 1 kB
VmFlags: rd ex mr mw me de
7fff86590000-7fff865c0000 r-xp 00000000 fd:00 33660102 /usr/lib64/ld-2.17.so
Size: 192 kB
KernelPageSize: 64 kB
MMUPageSize: 64 kB
Rss: 192 kB
Pss: 6 kB
Shared_Clean: 192 kB
Shared_Dirty: 0 kB
Private_Clean: 0 kB
Private_Dirty: 0 kB
Referenced: 192 kB
Anonymous: 0 kB
LazyFree: 0 kB
AnonHugePages: 0 kB
ShmemPmdMapped: 0 kB
Shared_Hugetlb: 0 kB
Private_Hugetlb: 0 kB
Swap: 0 kB
SwapPss: 0 kB
Locked: 6 kB
VmFlags: rd ex mr mw me dw
7fff865c0000-7fff865d0000 r--p 00020000 fd:00 33660102 /usr/lib64/ld-2.17.so
Size: 64 kB
KernelPageSize: 64 kB
MMUPageSize: 64 kB
Rss: 64 kB
Pss: 64 kB
Shared_Clean: 0 kB
Shared_Dirty: 0 kB
Private_Clean: 0 kB
Private_Dirty: 64 kB
Referenced: 64 kB
Anonymous: 64 kB
LazyFree: 0 kB
AnonHugePages: 0 kB
ShmemPmdMapped: 0 kB
Shared_Hugetlb: 0 kB
Private_Hugetlb: 0 kB
Swap: 0 kB
SwapPss: 0 kB
Locked: 64 kB
VmFlags: rd mr mw me dw ac
7fff865d0000-7fff865e0000 rw-p 00030000 fd:00 33660102 /usr/lib64/ld-2.17.so
Size: 64 kB
KernelPageSize: 64 kB
MMUPageSize: 64 kB
Rss: 64 kB
Pss: 64 kB
Shared_Clean: 0 kB
Shared_Dirty: 0 kB
Private_Clean: 0 kB
Private_Dirty: 64 kB
Referenced: 64 kB
Anonymous: 64 kB
LazyFree: 0 kB
AnonHugePages: 0 kB
ShmemPmdMapped: 0 kB
Shared_Hugetlb: 0 kB
Private_Hugetlb: 0 kB
Swap: 0 kB
SwapPss: 0 kB
Locked: 64 kB
VmFlags: rd wr mr mw me dw ac
7fffd27a0000-7fffd27d0000 rw-p 00000000 00:00 0 [stack]
Size: 192 kB
KernelPageSize: 64 kB
MMUPageSize: 64 kB
Rss: 64 kB
Pss: 64 kB
Shared_Clean: 0 kB
Shared_Dirty: 0 kB
Private_Clean: 0 kB
Private_Dirty: 64 kB
Referenced: 64 kB
Anonymous: 64 kB
LazyFree: 0 kB
AnonHugePages: 0 kB
ShmemPmdMapped: 0 kB
Shared_Hugetlb: 0 kB
Private_Hugetlb: 0 kB
Swap: 0 kB
SwapPss: 0 kB
Locked: 64 kB
VmFlags: rd wr mr mw me gd ac


2018-01-31 13:21:03

by Michal Hocko

[permalink] [raw]
Subject: Re: ppc elf_map breakage with MAP_FIXED_NOREPLACE

On Wed 31-01-18 10:35:38, Anshuman Khandual wrote:
> On 01/30/2018 03:12 PM, Michal Hocko wrote:
[...]
> > Anshuman, could you try to run
> > sed 's@^@@' /proc/self/smaps
> > on a system with MAP_FIXED_NOREPLACE reverted?
> >
>
> After reverting the following commits from mmotm-2018-01-25-16-20 tag.
>
> 67caea694ba5965a52a61fdad495d847f03c4025 ("mm-introduce-map_fixed_safe-fix")
> 64da2e0c134ecf3936a4c36b949bcf2cdc98977e ("fs-elf-drop-map_fixed-usage-from-elf_map-fix-fix")
> 645983ab6ca7fd644f52b4c55462b91940012595 ("mm: don't use the same value for MAP_FIXED_NOREPLACE and MAP_SYNC")
> d77bab291ac435aab91fa214b85efa74a26c9c22 ("fs-elf-drop-map_fixed-usage-from-elf_map-checkpatch-fixes")
> a75c5f92d9ecb21d3299cc7db48e401cbf335c34 ("fs, elf: drop MAP_FIXED usage from elf_map")
> 00906d029ffe515221e3939b222c237026af2903 ("mm: introduce MAP_FIXED_NOREPLACE")
>
> $sed 's@^@@' /proc/self/smaps

Interesting

> -------------------------------------------
> 10000000-10020000 r-xp 00000000 fd:00 10558 /usr/bin/sed
> 10020000-10030000 r--p 00010000 fd:00 10558 /usr/bin/sed
> 10030000-10040000 rw-p 00020000 fd:00 10558 /usr/bin/sed
> 2cbb0000-2cbe0000 rw-p 00000000 00:00 0 [heap]

We still have a brk and at a different offset. Could you confirm that we
still try to map previous brk at the clashing address 0x10030000?

> 7fff7f9c0000-7fff7f9e0000 rw-p 00000000 00:00 0
> 7fff7f9e0000-7fff86280000 r--p 00000000 fd:00 33660156 /usr/lib/locale/locale-archive
> 7fff86280000-7fff86290000 r-xp 00000000 fd:00 33660115 /usr/lib64/libdl-2.17.so
> 7fff86290000-7fff862a0000 r--p 00000000 fd:00 33660115 /usr/lib64/libdl-2.17.so
> 7fff862a0000-7fff862b0000 rw-p 00010000 fd:00 33660115 /usr/lib64/libdl-2.17.so
> 7fff862b0000-7fff86300000 r-xp 00000000 fd:00 33594504 /usr/lib64/libpcre.so.1.2.0
> 7fff86300000-7fff86310000 r--p 00040000 fd:00 33594504 /usr/lib64/libpcre.so.1.2.0
> 7fff86310000-7fff86320000 rw-p 00050000 fd:00 33594504 /usr/lib64/libpcre.so.1.2.0
> 7fff86320000-7fff864f0000 r-xp 00000000 fd:00 33660109 /usr/lib64/libc-2.17.so
> 7fff864f0000-7fff86500000 r--p 001c0000 fd:00 33660109 /usr/lib64/libc-2.17.so
> 7fff86500000-7fff86510000 rw-p 001d0000 fd:00 33660109 /usr/lib64/libc-2.17.so
> 7fff86510000-7fff86540000 r-xp 00000000 fd:00 33594516 /usr/lib64/libselinux.so.1
> 7fff86540000-7fff86550000 r--p 00020000 fd:00 33594516 /usr/lib64/libselinux.so.1
> 7fff86550000-7fff86560000 rw-p 00030000 fd:00 33594516 /usr/lib64/libselinux.so.1
> 7fff86560000-7fff86570000 r--s 00000000 fd:00 67194934 /usr/lib64/gconv/gconv-modules.cache
> 7fff86570000-7fff86590000 r-xp 00000000 00:00 0 [vdso]
> 7fff86590000-7fff865c0000 r-xp 00000000 fd:00 33660102 /usr/lib64/ld-2.17.so
> 7fff865c0000-7fff865d0000 r--p 00020000 fd:00 33660102 /usr/lib64/ld-2.17.so
> 7fff865d0000-7fff865e0000 rw-p 00030000 fd:00 33660102 /usr/lib64/ld-2.17.so
> 7fffd27a0000-7fffd27d0000 rw-p 00000000 00:00 0 [stack]

--
Michal Hocko
SUSE Labs

2018-02-01 03:14:58

by Anshuman Khandual

[permalink] [raw]
Subject: Re: ppc elf_map breakage with MAP_FIXED_NOREPLACE

On 01/31/2018 06:49 PM, Michal Hocko wrote:
> On Wed 31-01-18 10:35:38, Anshuman Khandual wrote:
>> On 01/30/2018 03:12 PM, Michal Hocko wrote:
> [...]
>>> Anshuman, could you try to run
>>> sed 's@^@@' /proc/self/smaps
>>> on a system with MAP_FIXED_NOREPLACE reverted?
>>>
>> After reverting the following commits from mmotm-2018-01-25-16-20 tag.
>>
>> 67caea694ba5965a52a61fdad495d847f03c4025 ("mm-introduce-map_fixed_safe-fix")
>> 64da2e0c134ecf3936a4c36b949bcf2cdc98977e ("fs-elf-drop-map_fixed-usage-from-elf_map-fix-fix")
>> 645983ab6ca7fd644f52b4c55462b91940012595 ("mm: don't use the same value for MAP_FIXED_NOREPLACE and MAP_SYNC")
>> d77bab291ac435aab91fa214b85efa74a26c9c22 ("fs-elf-drop-map_fixed-usage-from-elf_map-checkpatch-fixes")
>> a75c5f92d9ecb21d3299cc7db48e401cbf335c34 ("fs, elf: drop MAP_FIXED usage from elf_map")
>> 00906d029ffe515221e3939b222c237026af2903 ("mm: introduce MAP_FIXED_NOREPLACE")
>>
>> $sed 's@^@@' /proc/self/smaps
> Interesting
>
>> -------------------------------------------
>> 10000000-10020000 r-xp 00000000 fd:00 10558 /usr/bin/sed
>> 10020000-10030000 r--p 00010000 fd:00 10558 /usr/bin/sed
>> 10030000-10040000 rw-p 00020000 fd:00 10558 /usr/bin/sed
>> 2cbb0000-2cbe0000 rw-p 00000000 00:00 0 [heap]
> We still have a brk and at a different offset. Could you confirm that we
> still try to map previous brk at the clashing address 0x10030000?

yes.

[ 9.295990] vma c000001fc8137c80 start 0000000010030000 end 0000000010040000
next c000001fc81378c0 prev c000001fc8137680 mm c000001fc8108200
prot 8000000000000104 anon_vma (null) vm_ops (null)
pgoff 1003 file (null) private_data (null)
flags: 0x100073(read|write|mayread|maywrite|mayexec|account)
[ 9.296351] CPU: 47 PID: 7537 Comm: sed Not tainted 4.14.0-00006-g4bd92fe-dirty #162
[ 9.296450] Call Trace:
[ 9.296482] [c000001fc70db9b0] [c000000000b180e0] dump_stack+0xb0/0xf0 (unreliable)
[ 9.296588] [c000001fc70db9f0] [c0000000002db0b8] do_brk_flags+0x2d8/0x440
[ 9.296674] [c000001fc70dbac0] [c0000000002db4d0] vm_brk_flags+0x80/0x130
[ 9.296751] [c000001fc70dbb20] [c0000000003d2998] set_brk+0x80/0xe8
[ 9.296824] [c000001fc70dbb60] [c0000000003d2518] load_elf_binary+0x12f8/0x1580
[ 9.296910] [c000001fc70dbc80] [c00000000035d9e0] search_binary_handler+0xd0/0x270
[ 9.296999] [c000001fc70dbd10] [c00000000035f938] do_execveat_common.isra.31+0x658/0x890
[ 9.297089] [c000001fc70dbdf0] [c00000000035ff80] SyS_execve+0x40/0x50
[ 9.297162] [c000001fc70dbe30] [c00000000000b220] system_call+0x58/0x6c

But coming back to when it failed with MAP_FIXED_NOREPLACE, looking into ELF
section details (readelf -aW /usr/bin/sed), there was a PT_LOAD segment with
p_memsz > p_filesz which might be causing set_brk() to be called.


Type Offset VirtAddr PhysAddr FileSiz MemSiz Flg Align
...
LOAD 0x020328 0x0000000010030328 0x0000000010030328 0x000384 0x0094a0 RW 0x10000

which can be confirmed by just dumping elf_brk/elf_bss for this particular
instance. (elf_brk > elf_bss)

$dmesg | grep elf_brk
[ 9.571192] elf_brk 10030328 elf_bss 10030000

static int load_elf_binary(struct linux_binprm *bprm)
---------------------

if (unlikely (elf_brk > elf_bss)) {
unsigned long nbyte;

/* There was a PT_LOAD segment with p_memsz > p_filesz
before this one. Map anonymous pages, if needed,
and clear the area. */
retval = set_brk(elf_bss + load_bias,
elf_brk + load_bias,
bss_prot);


---------------------
So is not there a chance that subsequent file mapping might be overlapping
with these anon mappings ? I mean may be thats how ELF loading might be
happening right now.


2018-02-01 13:10:59

by Michal Hocko

[permalink] [raw]
Subject: Re: ppc elf_map breakage with MAP_FIXED_NOREPLACE

[CC Kees and Linus - for your background, we are talking about failures
http://lkml.kernel.org/r/[email protected]
introduced by http://lkml.kernel.org/r/[email protected]
Debugging has shown that load_elf_binary tries to map elf segment over
an existing brk - see below.]

On Thu 01-02-18 08:43:34, Anshuman Khandual wrote:
[...]
> [ 9.295990] vma c000001fc8137c80 start 0000000010030000 end 0000000010040000
> next c000001fc81378c0 prev c000001fc8137680 mm c000001fc8108200
> prot 8000000000000104 anon_vma (null) vm_ops (null)
> pgoff 1003 file (null) private_data (null)
> flags: 0x100073(read|write|mayread|maywrite|mayexec|account)
> [ 9.296351] CPU: 47 PID: 7537 Comm: sed Not tainted 4.14.0-00006-g4bd92fe-dirty #162
> [ 9.296450] Call Trace:
> [ 9.296482] [c000001fc70db9b0] [c000000000b180e0] dump_stack+0xb0/0xf0 (unreliable)
> [ 9.296588] [c000001fc70db9f0] [c0000000002db0b8] do_brk_flags+0x2d8/0x440
> [ 9.296674] [c000001fc70dbac0] [c0000000002db4d0] vm_brk_flags+0x80/0x130
> [ 9.296751] [c000001fc70dbb20] [c0000000003d2998] set_brk+0x80/0xe8
> [ 9.296824] [c000001fc70dbb60] [c0000000003d2518] load_elf_binary+0x12f8/0x1580
> [ 9.296910] [c000001fc70dbc80] [c00000000035d9e0] search_binary_handler+0xd0/0x270
> [ 9.296999] [c000001fc70dbd10] [c00000000035f938] do_execveat_common.isra.31+0x658/0x890
> [ 9.297089] [c000001fc70dbdf0] [c00000000035ff80] SyS_execve+0x40/0x50
> [ 9.297162] [c000001fc70dbe30] [c00000000000b220] system_call+0x58/0x6c
>
> But coming back to when it failed with MAP_FIXED_NOREPLACE, looking into ELF
> section details (readelf -aW /usr/bin/sed), there was a PT_LOAD segment with
> p_memsz > p_filesz which might be causing set_brk() to be called.
>
>
> Type Offset VirtAddr PhysAddr FileSiz MemSiz Flg Align
> ...
> LOAD 0x020328 0x0000000010030328 0x0000000010030328 0x000384 0x0094a0 RW 0x10000
>
> which can be confirmed by just dumping elf_brk/elf_bss for this particular
> instance. (elf_brk > elf_bss)

Hmm, interesting. So the above is not a regular brk. The check has been
added in 2001 by "v2.4.10.1 -> v2.4.10.2" but the changelog is not
revealing at all.

Btw. my /bin/ls also has MemSiz>FileSiz
LOAD 0x01ade0 0x000000000061ade0 0x000000000061ade0 0x00079c 0x001520 RW 0x200000
113: 000000000061b57c 0 NOTYPE GLOBAL DEFAULT ABS __bss_start

and do not see any problem. So this is more likely a problem of elf_brk
being placed at a wrong address. But I am desperately lost in this code
so I might be completely off.

> $dmesg | grep elf_brk
> [ 9.571192] elf_brk 10030328 elf_bss 10030000

Hmm these are on the same page. Is this really expected?

> static int load_elf_binary(struct linux_binprm *bprm)
> ---------------------
>
> if (unlikely (elf_brk > elf_bss)) {
> unsigned long nbyte;
>
> /* There was a PT_LOAD segment with p_memsz > p_filesz
> before this one. Map anonymous pages, if needed,
> and clear the area. */
> retval = set_brk(elf_bss + load_bias,
> elf_brk + load_bias,
> bss_prot);
>
>
> ---------------------
> So is not there a chance that subsequent file mapping might be overlapping
> with these anon mappings ? I mean may be thats how ELF loading might be
> happening right now.

I will study the code more but it would be really great if
somebody more familiar with this area could help me out a
bit. Why do we add this brk at all and why it doesn't matter that
we map over it by a real file mapping. As per previous email
http://lkml.kernel.org/r/[email protected] there
will be a new brk established later.

--
Michal Hocko
SUSE Labs

2018-02-01 13:41:17

by Michal Hocko

[permalink] [raw]
Subject: Re: ppc elf_map breakage with MAP_FIXED_NOREPLACE

On Thu 01-02-18 14:10:07, Michal Hocko wrote:
> [CC Kees and Linus - for your background, we are talking about failures
> http://lkml.kernel.org/r/[email protected]
> introduced by http://lkml.kernel.org/r/[email protected]
> Debugging has shown that load_elf_binary tries to map elf segment over
> an existing brk - see below.]
>
> On Thu 01-02-18 08:43:34, Anshuman Khandual wrote:
> [...]
> > [ 9.295990] vma c000001fc8137c80 start 0000000010030000 end 0000000010040000
> > next c000001fc81378c0 prev c000001fc8137680 mm c000001fc8108200
> > prot 8000000000000104 anon_vma (null) vm_ops (null)
> > pgoff 1003 file (null) private_data (null)
> > flags: 0x100073(read|write|mayread|maywrite|mayexec|account)
> > [ 9.296351] CPU: 47 PID: 7537 Comm: sed Not tainted 4.14.0-00006-g4bd92fe-dirty #162
> > [ 9.296450] Call Trace:
> > [ 9.296482] [c000001fc70db9b0] [c000000000b180e0] dump_stack+0xb0/0xf0 (unreliable)
> > [ 9.296588] [c000001fc70db9f0] [c0000000002db0b8] do_brk_flags+0x2d8/0x440
> > [ 9.296674] [c000001fc70dbac0] [c0000000002db4d0] vm_brk_flags+0x80/0x130
> > [ 9.296751] [c000001fc70dbb20] [c0000000003d2998] set_brk+0x80/0xe8
> > [ 9.296824] [c000001fc70dbb60] [c0000000003d2518] load_elf_binary+0x12f8/0x1580
> > [ 9.296910] [c000001fc70dbc80] [c00000000035d9e0] search_binary_handler+0xd0/0x270
> > [ 9.296999] [c000001fc70dbd10] [c00000000035f938] do_execveat_common.isra.31+0x658/0x890
> > [ 9.297089] [c000001fc70dbdf0] [c00000000035ff80] SyS_execve+0x40/0x50
> > [ 9.297162] [c000001fc70dbe30] [c00000000000b220] system_call+0x58/0x6c
> >
> > But coming back to when it failed with MAP_FIXED_NOREPLACE, looking into ELF
> > section details (readelf -aW /usr/bin/sed), there was a PT_LOAD segment with
> > p_memsz > p_filesz which might be causing set_brk() to be called.
> >
> >
> > Type Offset VirtAddr PhysAddr FileSiz MemSiz Flg Align
> > ...
> > LOAD 0x020328 0x0000000010030328 0x0000000010030328 0x000384 0x0094a0 RW 0x10000
> >
> > which can be confirmed by just dumping elf_brk/elf_bss for this particular
> > instance. (elf_brk > elf_bss)
>
> Hmm, interesting. So the above is not a regular brk. The check has been
> added in 2001 by "v2.4.10.1 -> v2.4.10.2" but the changelog is not
> revealing at all.
>
> Btw. my /bin/ls also has MemSiz>FileSiz
> LOAD 0x01ade0 0x000000000061ade0 0x000000000061ade0 0x00079c 0x001520 RW 0x200000
> 113: 000000000061b57c 0 NOTYPE GLOBAL DEFAULT ABS __bss_start
>
> and do not see any problem. So this is more likely a problem of elf_brk
> being placed at a wrong address. But I am desperately lost in this code
> so I might be completely off.

Thanks a lot to Michael Matz for his background. He has pointed me to
the following two segments from your binary[1]
LOAD 0x0000000000000000 0x0000000010000000 0x0000000010000000
0x0000000000013a8c 0x0000000000013a8c R E 10000
LOAD 0x000000000001fd40 0x000000001002fd40 0x000000001002fd40
0x00000000000002c0 0x00000000000005e8 RW 10000
LOAD 0x0000000000020328 0x0000000010030328 0x0000000010030328
0x0000000000000384 0x00000000000094a0 RW 10000

That binary has two RW LOAD segments, the first crosses a page border
into the second
0x1002fd40 (LOAD2-vaddr) + 0x5e8 (LOAD2-memlen) == 0x10030328 (LOAD3-vaddr)

He says
: This is actually an artifact of RELRO machinism. The first RW mapping
: will be remapped as RO after relocations are applied (to increase
: security).
: Well, to be honest, normal relro binaries also don't have more than
: two LOAD segments, so whatever RHEL did to their compilation options,
: it's something in addition to just relro (which can be detected by
: having a GNU_RELRO program header)
: But it definitely has something to do with relro, it's just not the
: whole story yet.

I am still trying to wrap my head around all this, but it smells rather
dubious to map different segments over the same page. Is this something
that might happen widely and therefore MAP_FIXED_NOREPLACE is a no-go
when loading ELF segments? Or is this a special case we can detect?

Or am I completely off?

[1] http://lkml.kernel.org/r/96458c0a-e273-3fb9-a33b-f6f2d536f90b%40linux.vnet.ibm.com

--
Michal Hocko
SUSE Labs

2018-02-01 13:49:18

by Michal Hocko

[permalink] [raw]
Subject: Re: ppc elf_map breakage with MAP_FIXED_NOREPLACE

On Thu 01-02-18 08:43:34, Anshuman Khandual wrote:
[...]
> $dmesg | grep elf_brk
> [ 9.571192] elf_brk 10030328 elf_bss 10030000
>
> static int load_elf_binary(struct linux_binprm *bprm)
> ---------------------
>
> if (unlikely (elf_brk > elf_bss)) {
> unsigned long nbyte;
>
> /* There was a PT_LOAD segment with p_memsz > p_filesz
> before this one. Map anonymous pages, if needed,
> and clear the area. */
> retval = set_brk(elf_bss + load_bias,
> elf_brk + load_bias,
> bss_prot);
>
>
> ---------------------

Just a blind shot... Does the following make any difference?
---
diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c
index 021fe78998ea..04b24d00c911 100644
--- a/fs/binfmt_elf.c
+++ b/fs/binfmt_elf.c
@@ -895,7 +895,7 @@ static int load_elf_binary(struct linux_binprm *bprm)
the correct location in memory. */
for(i = 0, elf_ppnt = elf_phdata;
i < loc->elf_ex.e_phnum; i++, elf_ppnt++) {
- int elf_prot = 0, elf_flags;
+ int elf_prot = 0, elf_flags, elf_fixed = MAP_FIXED_NOREPLACE;
unsigned long k, vaddr;
unsigned long total_size = 0;

@@ -927,6 +927,7 @@ static int load_elf_binary(struct linux_binprm *bprm)
*/
}
}
+ elf_fixed = MAP_FIXED;
}

if (elf_ppnt->p_flags & PF_R)
@@ -944,7 +945,7 @@ static int load_elf_binary(struct linux_binprm *bprm)
* the ET_DYN load_addr calculations, proceed normally.
*/
if (loc->elf_ex.e_type == ET_EXEC || load_addr_set) {
- elf_flags |= MAP_FIXED_NOREPLACE;
+ elf_flags |= elf_fixed;
} else if (loc->elf_ex.e_type == ET_DYN) {
/*
* This logic is run once for the first LOAD Program
@@ -980,7 +981,7 @@ static int load_elf_binary(struct linux_binprm *bprm)
load_bias = ELF_ET_DYN_BASE;
if (current->flags & PF_RANDOMIZE)
load_bias += arch_mmap_rnd();
- elf_flags |= MAP_FIXED_NOREPLACE;
+ elf_flags |= elf_fixed;
} else
load_bias = 0;


--
Michal Hocko
SUSE Labs

2018-02-01 20:56:39

by Kees Cook

[permalink] [raw]
Subject: Re: ppc elf_map breakage with MAP_FIXED_NOREPLACE

On Fri, Feb 2, 2018 at 12:40 AM, Michal Hocko <[email protected]> wrote:
> On Thu 01-02-18 14:10:07, Michal Hocko wrote:
> Thanks a lot to Michael Matz for his background. He has pointed me to
> the following two segments from your binary[1]
> LOAD 0x0000000000000000 0x0000000010000000 0x0000000010000000
> 0x0000000000013a8c 0x0000000000013a8c R E 10000
> LOAD 0x000000000001fd40 0x000000001002fd40 0x000000001002fd40
> 0x00000000000002c0 0x00000000000005e8 RW 10000
> LOAD 0x0000000000020328 0x0000000010030328 0x0000000010030328
> 0x0000000000000384 0x00000000000094a0 RW 10000
>
> That binary has two RW LOAD segments, the first crosses a page border
> into the second
> 0x1002fd40 (LOAD2-vaddr) + 0x5e8 (LOAD2-memlen) == 0x10030328 (LOAD3-vaddr)
>
> He says
> : This is actually an artifact of RELRO machinism. The first RW mapping
> : will be remapped as RO after relocations are applied (to increase
> : security).
> : Well, to be honest, normal relro binaries also don't have more than
> : two LOAD segments, so whatever RHEL did to their compilation options,
> : it's something in addition to just relro (which can be detected by
> : having a GNU_RELRO program header)
> : But it definitely has something to do with relro, it's just not the
> : whole story yet.
>
> I am still trying to wrap my head around all this, but it smells rather
> dubious to map different segments over the same page. Is this something
> that might happen widely and therefore MAP_FIXED_NOREPLACE is a no-go
> when loading ELF segments? Or is this a special case we can detect?

Eww. FWIW, I would expect that to be rare and detectable.

-Kees

--
Kees Cook
Pixel Security

2018-02-01 21:11:15

by Kees Cook

[permalink] [raw]
Subject: Re: ppc elf_map breakage with MAP_FIXED_NOREPLACE

On Fri, Feb 2, 2018 at 12:48 AM, Michal Hocko <[email protected]> wrote:
> On Thu 01-02-18 08:43:34, Anshuman Khandual wrote:
> [...]
>> $dmesg | grep elf_brk
>> [ 9.571192] elf_brk 10030328 elf_bss 10030000
>>
>> static int load_elf_binary(struct linux_binprm *bprm)
>> ---------------------
>>
>> if (unlikely (elf_brk > elf_bss)) {
>> unsigned long nbyte;
>>
>> /* There was a PT_LOAD segment with p_memsz > p_filesz
>> before this one. Map anonymous pages, if needed,
>> and clear the area. */
>> retval = set_brk(elf_bss + load_bias,
>> elf_brk + load_bias,
>> bss_prot);
>>
>>
>> ---------------------
>
> Just a blind shot... Does the following make any difference?
> ---
> diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c
> index 021fe78998ea..04b24d00c911 100644
> --- a/fs/binfmt_elf.c
> +++ b/fs/binfmt_elf.c
> @@ -895,7 +895,7 @@ static int load_elf_binary(struct linux_binprm *bprm)
> the correct location in memory. */
> for(i = 0, elf_ppnt = elf_phdata;
> i < loc->elf_ex.e_phnum; i++, elf_ppnt++) {
> - int elf_prot = 0, elf_flags;
> + int elf_prot = 0, elf_flags, elf_fixed = MAP_FIXED_NOREPLACE;
> unsigned long k, vaddr;
> unsigned long total_size = 0;
>
> @@ -927,6 +927,7 @@ static int load_elf_binary(struct linux_binprm *bprm)
> */
> }
> }
> + elf_fixed = MAP_FIXED;
> }
>
> if (elf_ppnt->p_flags & PF_R)
> @@ -944,7 +945,7 @@ static int load_elf_binary(struct linux_binprm *bprm)
> * the ET_DYN load_addr calculations, proceed normally.
> */
> if (loc->elf_ex.e_type == ET_EXEC || load_addr_set) {
> - elf_flags |= MAP_FIXED_NOREPLACE;
> + elf_flags |= elf_fixed;
> } else if (loc->elf_ex.e_type == ET_DYN) {
> /*
> * This logic is run once for the first LOAD Program
> @@ -980,7 +981,7 @@ static int load_elf_binary(struct linux_binprm *bprm)
> load_bias = ELF_ET_DYN_BASE;
> if (current->flags & PF_RANDOMIZE)
> load_bias += arch_mmap_rnd();
> - elf_flags |= MAP_FIXED_NOREPLACE;
> + elf_flags |= elf_fixed;
> } else
> load_bias = 0;

If I'm reading this patch correctly, the intention is to allow LOADs
after brk-expansion to collide with the prior LOAD? (This patch will
need more comments for our future sanity.) I think this makes sense,
though it might be nice to be sure we're overlapping only with the
prior LOAD (and nothing else), but it's not clear to me the best way
to accomplish that here. Even without that extra check, I think this
is a net benefit (i.e. gain MAP_FIXED_NOREPLACE without breaking
page-crossing LOADs).

Actually, maybe the second LOAD could be page-aligned first, so it
doesn't collide with the prior LOAD, and it could keep
MAP_FIXED_NOREPLACE? I'll have more time to study this on Monday...

-Kees

--
Kees Cook
Pixel Security

2018-02-12 15:47:01

by Michal Hocko

[permalink] [raw]
Subject: Re: ppc elf_map breakage with MAP_FIXED_NOREPLACE

Did you have any chance to test the following hack, Anshuman?

On Thu 01-02-18 14:48:29, Michal Hocko wrote:
[...]
> Just a blind shot... Does the following make any difference?
> ---
> diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c
> index 021fe78998ea..04b24d00c911 100644
> --- a/fs/binfmt_elf.c
> +++ b/fs/binfmt_elf.c
> @@ -895,7 +895,7 @@ static int load_elf_binary(struct linux_binprm *bprm)
> the correct location in memory. */
> for(i = 0, elf_ppnt = elf_phdata;
> i < loc->elf_ex.e_phnum; i++, elf_ppnt++) {
> - int elf_prot = 0, elf_flags;
> + int elf_prot = 0, elf_flags, elf_fixed = MAP_FIXED_NOREPLACE;
> unsigned long k, vaddr;
> unsigned long total_size = 0;
>
> @@ -927,6 +927,7 @@ static int load_elf_binary(struct linux_binprm *bprm)
> */
> }
> }
> + elf_fixed = MAP_FIXED;
> }
>
> if (elf_ppnt->p_flags & PF_R)
> @@ -944,7 +945,7 @@ static int load_elf_binary(struct linux_binprm *bprm)
> * the ET_DYN load_addr calculations, proceed normally.
> */
> if (loc->elf_ex.e_type == ET_EXEC || load_addr_set) {
> - elf_flags |= MAP_FIXED_NOREPLACE;
> + elf_flags |= elf_fixed;
> } else if (loc->elf_ex.e_type == ET_DYN) {
> /*
> * This logic is run once for the first LOAD Program
> @@ -980,7 +981,7 @@ static int load_elf_binary(struct linux_binprm *bprm)
> load_bias = ELF_ET_DYN_BASE;
> if (current->flags & PF_RANDOMIZE)
> load_bias += arch_mmap_rnd();
> - elf_flags |= MAP_FIXED_NOREPLACE;
> + elf_flags |= elf_fixed;
> } else
> load_bias = 0;
>
--
Michal Hocko
SUSE Labs

2018-02-13 01:04:00

by Anshuman Khandual

[permalink] [raw]
Subject: Re: ppc elf_map breakage with MAP_FIXED_NOREPLACE

On 02/12/2018 08:18 PM, Michal Hocko wrote:
> Did you have any chance to test the following hack, Anshuman?

The system has some issues at the moment, will get back on this.


2018-02-13 06:50:31

by Anshuman Khandual

[permalink] [raw]
Subject: Re: ppc elf_map breakage with MAP_FIXED_NOREPLACE

On 02/01/2018 07:18 PM, Michal Hocko wrote:
> On Thu 01-02-18 08:43:34, Anshuman Khandual wrote:
> [...]
>> $dmesg | grep elf_brk
>> [ 9.571192] elf_brk 10030328 elf_bss 10030000
>>
>> static int load_elf_binary(struct linux_binprm *bprm)
>> ---------------------
>>
>> if (unlikely (elf_brk > elf_bss)) {
>> unsigned long nbyte;
>>
>> /* There was a PT_LOAD segment with p_memsz > p_filesz
>> before this one. Map anonymous pages, if needed,
>> and clear the area. */
>> retval = set_brk(elf_bss + load_bias,
>> elf_brk + load_bias,
>> bss_prot);
>>
>>
>> ---------------------
>
> Just a blind shot... Does the following make any difference?
> ---
> diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c
> index 021fe78998ea..04b24d00c911 100644
> --- a/fs/binfmt_elf.c
> +++ b/fs/binfmt_elf.c
> @@ -895,7 +895,7 @@ static int load_elf_binary(struct linux_binprm *bprm)
> the correct location in memory. */
> for(i = 0, elf_ppnt = elf_phdata;
> i < loc->elf_ex.e_phnum; i++, elf_ppnt++) {
> - int elf_prot = 0, elf_flags;
> + int elf_prot = 0, elf_flags, elf_fixed = MAP_FIXED_NOREPLACE;
> unsigned long k, vaddr;
> unsigned long total_size = 0;
>
> @@ -927,6 +927,7 @@ static int load_elf_binary(struct linux_binprm *bprm)
> */
> }
> }
> + elf_fixed = MAP_FIXED;
> }
>
> if (elf_ppnt->p_flags & PF_R)
> @@ -944,7 +945,7 @@ static int load_elf_binary(struct linux_binprm *bprm)
> * the ET_DYN load_addr calculations, proceed normally.
> */
> if (loc->elf_ex.e_type == ET_EXEC || load_addr_set) {
> - elf_flags |= MAP_FIXED_NOREPLACE;
> + elf_flags |= elf_fixed;
> } else if (loc->elf_ex.e_type == ET_DYN) {
> /*
> * This logic is run once for the first LOAD Program
> @@ -980,7 +981,7 @@ static int load_elf_binary(struct linux_binprm *bprm)
> load_bias = ELF_ET_DYN_BASE;
> if (current->flags & PF_RANDOMIZE)
> load_bias += arch_mmap_rnd();
> - elf_flags |= MAP_FIXED_NOREPLACE;
> + elf_flags |= elf_fixed;
> } else
> load_bias = 0;
>
>

Yeah, it does solve the problem on mmotm-2018-01-25-16-20.


2018-02-13 10:01:47

by Michal Hocko

[permalink] [raw]
Subject: Re: ppc elf_map breakage with MAP_FIXED_NOREPLACE

On Tue 13-02-18 12:19:18, Anshuman Khandual wrote:
> On 02/01/2018 07:18 PM, Michal Hocko wrote:
> > On Thu 01-02-18 08:43:34, Anshuman Khandual wrote:
> > [...]
> >> $dmesg | grep elf_brk
> >> [ 9.571192] elf_brk 10030328 elf_bss 10030000
> >>
> >> static int load_elf_binary(struct linux_binprm *bprm)
> >> ---------------------
> >>
> >> if (unlikely (elf_brk > elf_bss)) {
> >> unsigned long nbyte;
> >>
> >> /* There was a PT_LOAD segment with p_memsz > p_filesz
> >> before this one. Map anonymous pages, if needed,
> >> and clear the area. */
> >> retval = set_brk(elf_bss + load_bias,
> >> elf_brk + load_bias,
> >> bss_prot);
> >>
> >>
> >> ---------------------
> >
> > Just a blind shot... Does the following make any difference?
> > ---
> > diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c
> > index 021fe78998ea..04b24d00c911 100644
> > --- a/fs/binfmt_elf.c
> > +++ b/fs/binfmt_elf.c
> > @@ -895,7 +895,7 @@ static int load_elf_binary(struct linux_binprm *bprm)
> > the correct location in memory. */
> > for(i = 0, elf_ppnt = elf_phdata;
> > i < loc->elf_ex.e_phnum; i++, elf_ppnt++) {
> > - int elf_prot = 0, elf_flags;
> > + int elf_prot = 0, elf_flags, elf_fixed = MAP_FIXED_NOREPLACE;
> > unsigned long k, vaddr;
> > unsigned long total_size = 0;
> >
> > @@ -927,6 +927,7 @@ static int load_elf_binary(struct linux_binprm *bprm)
> > */
> > }
> > }
> > + elf_fixed = MAP_FIXED;
> > }
> >
> > if (elf_ppnt->p_flags & PF_R)
> > @@ -944,7 +945,7 @@ static int load_elf_binary(struct linux_binprm *bprm)
> > * the ET_DYN load_addr calculations, proceed normally.
> > */
> > if (loc->elf_ex.e_type == ET_EXEC || load_addr_set) {
> > - elf_flags |= MAP_FIXED_NOREPLACE;
> > + elf_flags |= elf_fixed;
> > } else if (loc->elf_ex.e_type == ET_DYN) {
> > /*
> > * This logic is run once for the first LOAD Program
> > @@ -980,7 +981,7 @@ static int load_elf_binary(struct linux_binprm *bprm)
> > load_bias = ELF_ET_DYN_BASE;
> > if (current->flags & PF_RANDOMIZE)
> > load_bias += arch_mmap_rnd();
> > - elf_flags |= MAP_FIXED_NOREPLACE;
> > + elf_flags |= elf_fixed;
> > } else
> > load_bias = 0;
> >
> >
>
> Yeah, it does solve the problem on mmotm-2018-01-25-16-20.

Thanks a lot for testing! I will post an RFC patch shortly.

--
Michal Hocko
SUSE Labs

2018-02-13 10:05:51

by Michal Hocko

[permalink] [raw]
Subject: [RFC PATCH] elf: enforce MAP_FIXED on overlaying elf segments (was: Re: ppc elf_map breakage with MAP_FIXED_NOREPLACE)

On Fri 02-02-18 07:55:14, Kees Cook wrote:
> On Fri, Feb 2, 2018 at 12:40 AM, Michal Hocko <[email protected]> wrote:
> > On Thu 01-02-18 14:10:07, Michal Hocko wrote:
> > Thanks a lot to Michael Matz for his background. He has pointed me to
> > the following two segments from your binary[1]
> > LOAD 0x0000000000000000 0x0000000010000000 0x0000000010000000
> > 0x0000000000013a8c 0x0000000000013a8c R E 10000
> > LOAD 0x000000000001fd40 0x000000001002fd40 0x000000001002fd40
> > 0x00000000000002c0 0x00000000000005e8 RW 10000
> > LOAD 0x0000000000020328 0x0000000010030328 0x0000000010030328
> > 0x0000000000000384 0x00000000000094a0 RW 10000
> >
> > That binary has two RW LOAD segments, the first crosses a page border
> > into the second
> > 0x1002fd40 (LOAD2-vaddr) + 0x5e8 (LOAD2-memlen) == 0x10030328 (LOAD3-vaddr)
> >
> > He says
> > : This is actually an artifact of RELRO machinism. The first RW mapping
> > : will be remapped as RO after relocations are applied (to increase
> > : security).
> > : Well, to be honest, normal relro binaries also don't have more than
> > : two LOAD segments, so whatever RHEL did to their compilation options,
> > : it's something in addition to just relro (which can be detected by
> > : having a GNU_RELRO program header)
> > : But it definitely has something to do with relro, it's just not the
> > : whole story yet.
> >
> > I am still trying to wrap my head around all this, but it smells rather
> > dubious to map different segments over the same page. Is this something
> > that might happen widely and therefore MAP_FIXED_NOREPLACE is a no-go
> > when loading ELF segments? Or is this a special case we can detect?
>
> Eww. FWIW, I would expect that to be rare and detectable.

OK, so Anshuman has confirmed [1] that the patch below fixes the issue
for him. I am sending this as an RFC because this is not really my area
and load_elf_binary is obscure as hell. The changelog could see much
more clear wording than I am able to provide. Any help would be highly
appreciated.

[1] http://lkml.kernel.org/r/[email protected]

From 97e7355a6dc31a73005fa806566a57eb5c38032b Mon Sep 17 00:00:00 2001
From: Michal Hocko <[email protected]>
Date: Tue, 13 Feb 2018 10:50:53 +0100
Subject: [PATCH] elf: enforce MAP_FIXED on overlaying elf segments

Anshuman has reported that some ELF binaries in his environment fail to
start with
[ 23.423642] 9148 (sed): Uhuuh, elf segment at 0000000010030000 requested but the memory is mapped already
[ 23.423706] requested [10030000, 10040000] mapped [10030000, 10040000] 100073 anon

The reason is that the above binary has overlapping elf segments:
LOAD 0x0000000000000000 0x0000000010000000 0x0000000010000000
0x0000000000013a8c 0x0000000000013a8c R E 10000
LOAD 0x000000000001fd40 0x000000001002fd40 0x000000001002fd40
0x00000000000002c0 0x00000000000005e8 RW 10000
LOAD 0x0000000000020328 0x0000000010030328 0x0000000010030328
0x0000000000000384 0x00000000000094a0 RW 10000

That binary has two RW LOAD segments, the first crosses a page border
into the second

0x1002fd40 (LOAD2-vaddr) + 0x5e8 (LOAD2-memlen) == 0x10030328 (LOAD3-vaddr)

Handle this situation by enforcing MAP_FIXED when we establish a
temporary brk VMA to handle overlapping segments. All other mappings
will still use MAP_FIXED_NOREPLACE.

Fixes: fs, elf: drop MAP_FIXED usage from elf_map
Reported-by: Anshuman Khandual <[email protected]>
Signed-off-by: Michal Hocko <[email protected]>
---
fs/binfmt_elf.c | 13 ++++++++++---
1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c
index 2f492dfcabde..4679d1d945f9 100644
--- a/fs/binfmt_elf.c
+++ b/fs/binfmt_elf.c
@@ -895,7 +895,7 @@ static int load_elf_binary(struct linux_binprm *bprm)
the correct location in memory. */
for(i = 0, elf_ppnt = elf_phdata;
i < loc->elf_ex.e_phnum; i++, elf_ppnt++) {
- int elf_prot = 0, elf_flags;
+ int elf_prot = 0, elf_flags, elf_fixed = MAP_FIXED_NOREPLACE;
unsigned long k, vaddr;
unsigned long total_size = 0;

@@ -927,6 +927,13 @@ static int load_elf_binary(struct linux_binprm *bprm)
*/
}
}
+
+ /*
+ * Some binaries have overlapping elf segments and then
+ * we have to forcefully map over an existing mapping
+ * e.g. over this newly established brk mapping.
+ */
+ elf_fixed = MAP_FIXED;
}

if (elf_ppnt->p_flags & PF_R)
@@ -944,7 +951,7 @@ static int load_elf_binary(struct linux_binprm *bprm)
* the ET_DYN load_addr calculations, proceed normally.
*/
if (loc->elf_ex.e_type == ET_EXEC || load_addr_set) {
- elf_flags |= MAP_FIXED_NOREPLACE;
+ elf_flags |= elf_fixed;
} else if (loc->elf_ex.e_type == ET_DYN) {
/*
* This logic is run once for the first LOAD Program
@@ -980,7 +987,7 @@ static int load_elf_binary(struct linux_binprm *bprm)
load_bias = ELF_ET_DYN_BASE;
if (current->flags & PF_RANDOMIZE)
load_bias += arch_mmap_rnd();
- elf_flags |= MAP_FIXED_NOREPLACE;
+ elf_flags |= elf_fixed;
} else
load_bias = 0;

--
2.15.1

--
Michal Hocko
SUSE Labs

2018-02-14 16:32:39

by Khalid Aziz

[permalink] [raw]
Subject: Re: [RFC PATCH] elf: enforce MAP_FIXED on overlaying elf segments (was: Re: ppc elf_map breakage with MAP_FIXED_NOREPLACE)

On Tue, 2018-02-13 at 11:04 +0100, Michal Hocko wrote:
>
> From 97e7355a6dc31a73005fa806566a57eb5c38032b Mon Sep 17 00:00:00
> 2001
> From: Michal Hocko <[email protected]>
> Date: Tue, 13 Feb 2018 10:50:53 +0100
> Subject: [PATCH] elf: enforce MAP_FIXED on overlaying elf segments
>
> Anshuman has reported that some ELF binaries in his environment fail
> to
> start with
>  [   23.423642] 9148 (sed): Uhuuh, elf segment at 0000000010030000
> requested but the memory is mapped already
>  [   23.423706] requested [10030000, 10040000] mapped [10030000,
> 10040000] 100073 anon
>
> The reason is that the above binary has overlapping elf segments:
>   LOAD           0x0000000000000000 0x0000000010000000
> 0x0000000010000000
>                  0x0000000000013a8c 0x0000000000013a8c  R E    10000
>   LOAD           0x000000000001fd40 0x000000001002fd40
> 0x000000001002fd40
>                  0x00000000000002c0 0x00000000000005e8  RW     10000
>   LOAD           0x0000000000020328 0x0000000010030328
> 0x0000000010030328
>                  0x0000000000000384 0x00000000000094a0  RW     10000
>
> That binary has two RW LOAD segments, the first crosses a page border
> into the second
>
> 0x1002fd40 (LOAD2-vaddr) + 0x5e8 (LOAD2-memlen) == 0x10030328 (LOAD3-
> vaddr)
>
> Handle this situation by enforcing MAP_FIXED when we establish a
> temporary brk VMA to handle overlapping segments. All other mappings
> will still use MAP_FIXED_NOREPLACE.
>
> Fixes: fs, elf: drop MAP_FIXED usage from elf_map
> Reported-by: Anshuman Khandual <[email protected]>
> Signed-off-by: Michal Hocko <[email protected]>
> ---
>

Looks reasonable to me.

Reviewed-by: Khalid Aziz <[email protected]>