Date: Wed, 25 Feb 2015 10:20:43 +0100
From: Ingo Molnar
To: "H. Peter Anvin"
Cc: Andy Lutomirski, Denys Vlasenko, Linus Torvalds, Steven Rostedt,
    Borislav Petkov, Oleg Nesterov, Frederic Weisbecker,
    Alexei Starovoitov, Will Drewry, Kees Cook, X86 ML,
    "linux-kernel@vger.kernel.org"
Subject: Re: [PATCH 1/4] x86: entry.S: tidy up several suboptimal insns
Message-ID: <20150225092043.GB16165@gmail.com>
References: <1424803895-4420-1-git-send-email-dvlasenk@redhat.com>
    <54ED00B5.3020203@zytor.com>
In-Reply-To: <54ED00B5.3020203@zytor.com>

* H. Peter Anvin wrote:

> On 02/24/2015 02:25 PM, Andy Lutomirski wrote:
> > On Tue, Feb 24, 2015 at 10:51 AM, Denys Vlasenko wrote:
> >>
> >> In all three 32-bit entry points, %eax is
> >> zero-extended to %rax. It is safe to do 32-bit compare
> >> when checking that syscall# is not too large.
> >
> > Applied. Thanks!
> >
>
> NAK NAK NAK NAK NAK!!!!
>
> We have already had this turn into a security issue not
> just once but TWICE, because someone decided to
> "optimize" the path by taking out the zero extend.
>
> The use of a 64-bit compare here is an intentional "belts
> and suspenders" safety issue.

I think the fundamental fragility is that we allow the high
32 bits to be nonzero.

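To make the fragility concrete, here is a small userspace C model of the two compares. It is only a sketch and not the entry.S code: NR_SYSCALLS_DEMO, in_range_cmp32() and in_range_cmp64() are hypothetical names made up for illustration, and the table size is arbitrary.

```c
#include <stdint.h>

/* Hypothetical table size, chosen only for this illustration. */
#define NR_SYSCALLS_DEMO 512u

/* Models a 32-bit compare on %eax: only the low 32 bits are looked at. */
int in_range_cmp32(uint64_t rax)
{
    return (uint32_t)rax < NR_SYSCALLS_DEMO;
}

/* Models a 64-bit compare on %rax: the whole register is looked at. */
int in_range_cmp64(uint64_t rax)
{
    return rax < NR_SYSCALLS_DEMO;
}
```

With rax = 0xdeadbeef00000005 the 32-bit check passes (the low half is 5) while the 64-bit check fails. If the zero extend is ever lost and the table is then indexed with the full 64-bit register, a value that passed the 32-bit check points far outside the table.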
So could we just zap the high 32 bits of RAX early in the
entry code? From that point on we could use 32-bit ops and
would not have to remember the possibility at all.

That's arguably one more (cheap) instruction in the 32-bit
entry paths, but then we could use the shorter 32-bit
instructions for compares and other uses and could always
be certain that we get what we want.

But if we do that, we can do even better and optimize the
64-bit entry path as well: we could simply mask RAX with
0x3ff and not do a compare. Pad the syscall table up to
0x400 (1024) entries and fill the unused slots with
sys_ni_syscall entries.

This works on both 64-bit and 32-bit kernels, and it
removes a compare from the syscall entry path, at the cost
of a couple of kilobytes of unused syscall table.

The downside would be that if we ever grow past 1024
syscall entries we'll be in trouble: new userspace calling
syscall 1025 on an old kernel would get syscall 1. I doubt
we'll ever get that many syscalls, and user-space will be
able to be smart in any case, so it's not a showstopper.

This would be both the safest and the quickest
implementation.

Thoughts?

Thanks,

	Ingo
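The two variants discussed above can be modelled in userspace C roughly as follows. This is a sketch under stated assumptions, not a patch: the handlers, NR_REAL_DEMO, demo_table and the dispatch_*() helpers are all hypothetical names, and the real change would live in the assembly entry paths.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define NR_REAL_DEMO 2u                /* hypothetical: implemented syscalls */
#define TABLE_SIZE   0x400u            /* table padded to 1024 entries */
#define TABLE_MASK   (TABLE_SIZE - 1)  /* 0x3ff */

typedef long (*sys_fn)(void);

/* Stand-ins for real handlers; the return values are arbitrary markers. */
static long demo_sys_a(void)  { return 100; }
static long demo_sys_b(void)  { return 101; }
static long demo_sys_ni(void) { return -ENOSYS; }

static sys_fn demo_table[TABLE_SIZE];

/* Fill every slot with the "not implemented" stub, then add the real ones. */
void demo_table_init(void)
{
    for (size_t i = 0; i < TABLE_SIZE; i++)
        demo_table[i] = demo_sys_ni;
    demo_table[0] = demo_sys_a;
    demo_table[1] = demo_sys_b;
}

/* Variant 1: zap the high 32 bits first, then a 32-bit compare is enough. */
long dispatch_zext(uint64_t rax)
{
    uint32_t nr = (uint32_t)rax;       /* models zero-extending %eax */

    if (nr >= NR_REAL_DEMO)
        return demo_sys_ni();
    return demo_table[nr]();
}

/* Variant 2: mask instead of compare; the index cannot leave the table. */
long dispatch_masked(uint64_t rax)
{
    /* Numbers >= 0x400 alias low ones: 1025 & 0x3ff == 1. */
    return demo_table[rax & TABLE_MASK]();
}
```

After demo_table_init(), dispatch_masked() never reads outside the table whatever is in the register, but out-of-range numbers wrap around instead of failing, which is the aliasing trade-off mentioned above. Note also that variant 1 silently ignores garbage in the high half rather than rejecting it.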