Date: Fri, 15 Sep 2017 07:31:55 +0200
From: Ingo Molnar
To: Andy Lutomirski
Cc: Thomas Gleixner, Ingo Molnar, "H. Peter Anvin", X86 ML, Oleg Nesterov,
    Eugene Syromyatnikov, "linux-kernel@vger.kernel.org", Linus Torvalds,
    Peter Zijlstra, Andrew Morton
Subject: Re: [PATCH] x86/asm/64: do not clear high 32 bits of syscall number when CONFIG_X86_X32=y
Message-ID: <20170915053155.f336vlejdql23zxu@gmail.com>
References: <20170912225756.GA19364@altlinux.org> <20170914213316.GB17533@altlinux.org>

* Andy Lutomirski wrote:

> >> > diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
> >> > index 4916725..3bab6af 100644
> >> > --- a/arch/x86/entry/entry_64.S
> >> > +++ b/arch/x86/entry/entry_64.S
> >> > @@ -185,12 +185,10 @@ entry_SYSCALL_64_fastpath:
> >> >          */
> >> >         TRACE_IRQS_ON
> >> >         ENABLE_INTERRUPTS(CLBR_NONE)
> >> > -#if __SYSCALL_MASK == ~0
> >> > -       cmpq    $__NR_syscall_max, %rax
> >> > -#else
> >> > -       andl    $__SYSCALL_MASK, %eax
> >> > -       cmpl    $__NR_syscall_max, %eax
> >> > +#if __SYSCALL_MASK != ~0
> >> > +       andq    $__SYSCALL_MASK, %rax
> >> >  #endif
> >> > +       cmpq    $__NR_syscall_max, %rax
> >>
> >> I don't know much about x32 userspace, but there's an argument that
> >> the high bits *should* be masked off if the x32 bit is set.
> >
> > Why?
>
> Because it always worked that way.
>
> That being said, I'd be okay with applying your patch and seeing
> whether anything breaks.  Ingo?

So I believe this was introduced with x32 as a 'fresh, modern syscall ABI'
behavioral aspect, because we wanted to protect the overly complex syscall
entry code from 'weird' input values. IIRC there was an old bug where we'd
overflow the syscall table in certain circumstances ...

But our new, redesigned entry code is a lot less complex, a lot more
readable and a lot more maintainable (not to mention a lot more robust), so
if invalid RAX values with high bits set get reliably turned into -ENOSYS or
such then I'd not mind the patch per se either, as a general consistency
improvement.

Of course if something in x32 user-land breaks then this turns into an ABI
and we have to reintroduce this aspect, as a quirk :-/

It should also improve x32 syscall performance a tiny bit, right?

So might be worth a try on various grounds.

( Another future advantage would be that _maybe_ we could use the high RAX
  component as an extra (64-bit only) special argument of sorts. Not that I
  can think of any such use right now. )

Thanks,

	Ingo
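
[ Illustration, not part of the original mail: a rough user-space C model
  of the two range checks discussed above, to make the behavioral
  difference concrete. It is only a sketch. The function names
  old_lookup() and new_lookup() are hypothetical, NR_SYSCALL_MAX is an
  arbitrary placeholder rather than the real table size, and
  X32_SYSCALL_BIT is assumed to match the kernel's __X32_SYSCALL_BIT
  (0x40000000). The real check lives in assembly in
  arch/x86/entry/entry_64.S. ]

```c
#include <errno.h>
#include <stdint.h>

#define X32_SYSCALL_BIT 0x40000000u /* assumed equal to __X32_SYSCALL_BIT */
#define NR_SYSCALL_MAX  332u        /* placeholder value, for illustration only */

/*
 * Models the old sequence (andl + cmpl on %eax): a 32-bit operation
 * zero-extends into %rax, so the upper 32 bits are silently discarded
 * before the range check.
 */
static long old_lookup(uint64_t rax)
{
    uint32_t eax = (uint32_t)rax & ~X32_SYSCALL_BIT;

    if (eax > NR_SYSCALL_MAX)
        return -ENOSYS;
    return (long)eax;
}

/*
 * Models the patched sequence (andq + cmpq on %rax): only the x32 bit is
 * cleared, so any set upper bit makes the unsigned compare fail.
 */
static long new_lookup(uint64_t rax)
{
    rax &= ~(uint64_t)X32_SYSCALL_BIT;

    if (rax > NR_SYSCALL_MAX)
        return -ENOSYS;
    return (long)rax;
}
```

[ With this model, a value such as 0x100000001 is accepted as syscall 1 by
  old_lookup() but rejected with -ENOSYS by new_lookup(), while a value
  with only the x32 bit set (0x40000001) resolves to 1 in both. ]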