Date: Tue, 21 Mar 2017 22:16:48 +0100
From: Adam Borowski
To: Dmitry Safonov
Cc: linux-kernel@vger.kernel.org, 0x7f454c46@gmail.com, linux-mm@kvack.org,
	Andrei Vagin, Cyrill Gorcunov, Borislav Petkov, "Kirill A. Shutemov",
	x86@kernel.org, "H. Peter Anvin", Andy Lutomirski, Ingo Molnar,
	Thomas Gleixner
Subject: Re: [PATCHv3] x86/mm: set x32 syscall bit in SET_PERSONALITY()
Message-ID: <20170321211648.xcgwigbv37ktxofx@angband.pl>
References: <20170321174711.29880-1-dsafonov@virtuozzo.com>
In-Reply-To: <20170321174711.29880-1-dsafonov@virtuozzo.com>
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="7djm4lj6yhu65nrj"
Content-Disposition: inline
Content-Transfer-Encoding: 8bit
User-Agent: NeoMutt/20170113 (1.7.2)

--7djm4lj6yhu65nrj
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline
Content-Transfer-Encoding: 8bit

On Tue, Mar 21, 2017 at 08:47:11PM +0300, Dmitry Safonov wrote:
> After my changes to mmap(), its code now relies on the bitness of
> performing syscall. According to that, it chooses the base of allocation:
> mmap_base for 64-bit mmap() and mmap_compat_base for 32-bit syscall.
> It was done by:
>   commit 1b028f784e8c ("x86/mm: Introduce mmap_compat_base() for
> 32-bit mmap()").
>
> The code afterwards relies on in_compat_syscall() returning true for
> 32-bit syscalls. It's usually so while we're in context of application
> that does 32-bit syscalls. But during exec() it is not valid for x32 ELF.
> The reason is that the application hasn't yet done any syscall, so x32
> bit has not being set.
> That results in -ENOMEM for x32 ELF files as there fired BAD_ADDR()
> in elf_map(), that is called from do_execve()->load_elf_binary().
> For i386 ELFs it works as SET_PERSONALITY() sets TS_COMPAT flag.
>
> Set x32 bit before first return to userspace, during setting personality
> at exec(). This way we can rely on in_compat_syscall() during exec().
> Do also the reverse: drop x32 syscall bit at SET_PERSONALITY for 64-bits.
>
> Fixes: commit 1b028f784e8c ("x86/mm: Introduce mmap_compat_base() for
> 32-bit mmap()")

Tested: with bash:x32, mksh:amd64, posh:i386, zsh:armhf (binfmt:qemu),
fork+exec works for every parent-child combination.

Contrary to my naive initial reading of your fix, mixing syscalls from a
process of the wrong ABI also works as it did before.  While using a glibc
wrapper will call the right version, x32 processes calling amd64 syscalls is
surprisingly common -- this brings seccomp joy.

I've attached a freestanding test case for write() and mmap(); it's
freestanding asm as most of you don't have an x32 toolchain at hand, sorry
for unfriendly error messages.

So with these two patches:
x86/tls: Forcibly set the accessed bit in TLS segments
x86/mm: set x32 syscall bit in SET_PERSONALITY()

everything appears to be fine.

-- 
⢀⣴⠾⠻⢶⣦⠀ Meow!
⣾⠁⢠⠒⠀⣿⡁
⢿⡄⠘⠷⠚⠋⠀ Collisions shmolisions, let's see them find a collision or second
⠈⠳⣄⠀⠀⠀⠀ preimage for double rot13!

--7djm4lj6yhu65nrj
Content-Type: text/plain; charset=us-ascii
Content-Disposition: attachment; filename="meow.s"

.globl _start
.data
msg:	.ascii "Meow!\n"
badmsg:	.ascii "syscall failed\n"
.text
_start:
# x32
	mov $0x40000001, %rax	# syscall: write
	mov $1, %rdi
	mov $msg, %rsi
	mov $6, %rdx
	syscall

# amd64
	mov $1, %rax		# syscall: write
	mov $1, %rdi
	mov $msg, %rsi
	mov $6, %rdx
	syscall

# i386
	mov $4, %eax		# syscall: write
	mov $1, %ebx
	mov $msg, %ecx
	mov $6, %edx
	int $0x80

# x32
	mov $0x40000009, %rax	# syscall: mmap
	mov $0, %rdi
	mov $0x10000, %rsi
	mov $3, %rdx		# PROT_READ|PROT_WRITE
	mov $0x62, %r10		# MAP_PRIVATE|MAP_ANON|MAP_32BIT
	mov $-1, %r8
	mov $0, %r9
	syscall
	or %rax, %rax
	js badness

# amd64
	mov $0x9, %rax		# syscall: mmap
	mov $0, %rdi
	mov $0x10000, %rsi
	mov $3, %rdx		# PROT_READ|PROT_WRITE
	mov $0x62, %r10		# MAP_PRIVATE|MAP_ANON|MAP_32BIT
	mov $-1, %r8
	mov $0, %r9
	syscall
	or %rax, %rax
	js badness

	jmp goodbye		# m'kay, this one doesn't work, no regression
# i386
	mov $0x90, %eax		# syscall: mmap
	mov $0, %ebx
	mov $0x10000, %ecx
	mov $3, %edx		# PROT_READ|PROT_WRITE
	mov $0x62, %esi		# MAP_PRIVATE|MAP_ANON|MAP_32BIT
	mov $-1, %edi
	mov $0, %ebp
	int $0x80
	movslq %eax, %rax
	or %rax, %rax
	js badness

goodbye:
	mov $0x4000003c, %rax	# syscall: _exit
	xor %rdi, %rdi
	syscall

badness:
	# I'm too lazy to printf this as a number...
	push %rax
	mov $0x40000001, %rax	# syscall: write
	mov $1, %rdi
	mov $badmsg, %rsi
	mov $15, %rdx
	syscall
	mov $0x4000003c, %rax	# syscall: _exit
	pop %rdi
	syscall

--7djm4lj6yhu65nrj
Content-Type: text/plain; charset=us-ascii
Content-Disposition: attachment; filename=Makefile

# Any of amd64/x32/i386 will do.
X86=x86_64-linux-gnu

all: meow-x32 meow-amd64

clean:
	rm -f meow-*

meow-x32: meow.s
	$(X86)-as --x32 $^ -o $@.o
	$(X86)-ld -melf32_x86_64 -s $@.o -o $@

meow-amd64: meow.s
	$(X86)-as --64 $^ -o $@.o
	$(X86)-ld -melf_x86_64 -s $@.o -o $@

--7djm4lj6yhu65nrj--
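
The 0x40000000 offset on the x32 syscall numbers in meow.s is the "x32
syscall bit" from the patch subject. Here is a minimal sketch in Python, not
part of the original attachments, of how a raw %rax value from the test splits
into ABI and syscall. It assumes the bit is __X32_SYSCALL_BIT = 0x40000000, as
in the kernel's x86 UAPI headers. The decode() helper and the NAMES table are
hypothetical and cover only the numbers that appear in the test.

```python
# Assumption: matches __X32_SYSCALL_BIT in the kernel's x86 UAPI headers.
X32_SYSCALL_BIT = 0x40000000

# Hypothetical lookup table, limited to the syscalls named in meow.s comments.
NAMES = {0x01: "write", 0x09: "mmap", 0x3C: "_exit"}


def decode(nr):
    """Split a raw syscall number, as loaded into %rax before `syscall`,
    into (abi, name). Only covers the 64-bit-mode entry point, not int $0x80."""
    abi = "x32" if nr & X32_SYSCALL_BIT else "amd64"
    name = NAMES.get(nr & ~X32_SYSCALL_BIT, "unknown")
    return abi, name


if __name__ == "__main__":
    # The values used with the `syscall` instruction in meow.s.
    for nr in (0x40000001, 0x1, 0x40000009, 0x9, 0x4000003C):
        print(hex(nr), *decode(nr))
```

This only covers the numbers loaded before the `syscall` instruction. The
`int $0x80` blocks in the test go through the separate i386 table and never
carry this bit.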