Date: Tue, 25 Aug 2015 06:59:53 -0400
Subject: Re: Proposal for finishing the 64-bit x86 syscall cleanup
From: Brian Gerst
To: Andy Lutomirski
Cc: X86 ML, Denys Vlasenko, Borislav Petkov, Linus Torvalds, "linux-kernel@vger.kernel.org", Jan Beulich

On Mon, Aug 24, 2015 at 5:13 PM, Andy Lutomirski wrote:
> Hi all-
>
> I want to (try to) mostly or fully get rid of the messy bits (as
> opposed to the hardware-bs-forced bits) of the 64-bit syscall asm.
> There are two major conceptual things that are in the way.
>
> Thing 1: partial pt_regs
>
> 64-bit fast path syscalls don't fully initialize pt_regs: bx, bp, and
> r12-r15 are uninitialized.  Some syscalls require them to be
> initialized, and they have special awful stubs to do it.  The entry
> and exit tracing code (except for phase1 tracing) also needs them
> initialized, and it has its own messy initialization.  Compat
> syscalls are their own private little mess here.
>
> This gets in the way of all kinds of cleanups, because C code can't
> switch between the full and partial pt_regs states.
>
> I can see two ways out.  We could remove the optimization entirely,
> which consists of pushing and popping six more registers and adds
> about ten cycles to fast path syscalls on Sandy Bridge.  It also
> simplifies and presumably speeds up the slow paths.
>
> We could also annotate which syscalls need full regs and jump to the
> slow path for them.
> This would leave the fast path unchanged (we
> could duplicate the syscall table so that regs-requiring syscalls
> would turn into some asm that switches to the slow path).  We'd make
> the syscall table say something like:
>
> 59 64 execve sys_execve:regs
>
> The fast path would have exactly identical performance and the slow
> path would presumably speed up.  The down side would be additional
> complexity.

I don't think it is worth it to optimize the syscalls that need full
pt_regs (which are generally quite expensive and less frequently used)
at the expense of every other syscall.

What kind of cleanups, other than just removing the stubs, would this
allow?  Is there more code you plan to move to C?

> Thing 2: vdso compilation with binutils that doesn't support .cfi directives
>
> Userspace debuggers really like having the vdso properly
> CFI-annotated, and the 32-bit fast syscall entries are annotated
> manually in hexadecimal.  AFAIK Jan Beulich is the only person who
> understands it.
>
> I want to be able to change the entries a little bit to clean them up
> (and possibly rework the SYSCALL32 and SYSENTER register tricks, which
> currently suck), but it's really, really messy right now because of
> the hex CFI stuff.  Could we just drop the CFI annotations if the
> binutils version is too old or even just require new enough binutils
> to build 32-bit and compat kernels?

One thing I want to do is rework the 32-bit VDSO into a single image,
using alternatives to handle the selection of entry method.  The
open-coded CFI crap has made that near impossible to do.

--
Brian Gerst
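
For readers following the thread, the `:regs` table annotation proposed in the quoted message can be illustrated with a small sketch. It only shows how a table line such as `59 64 execve sys_execve:regs` might be split into fields and a "needs full pt_regs" flag. The names `SyscallEntry` and `parse_syscall_entry` are hypothetical, and this is not how the kernel's table-processing scripts actually work; the suffix itself was only a proposal.

```python
from typing import NamedTuple


class SyscallEntry(NamedTuple):
    """One parsed syscall table line (hypothetical structure)."""
    number: int
    abi: str
    name: str
    entry_point: str
    needs_full_regs: bool


def parse_syscall_entry(line: str) -> SyscallEntry:
    """Parse '<number> <abi> <name> <entry point>[:regs]'.

    The ':regs' suffix is the annotation proposed in the thread; this
    parser is an assumption made for illustration only.
    """
    number, abi, name, entry = line.split()
    # Split off an optional ':flag' suffix from the entry point.
    entry_point, sep, flag = entry.partition(":")
    if sep and flag != "regs":
        raise ValueError(f"unknown annotation {flag!r} in {line!r}")
    return SyscallEntry(int(number), abi, name, entry_point, flag == "regs")


if __name__ == "__main__":
    for text in ("0 common read sys_read", "59 64 execve sys_execve:regs"):
        e = parse_syscall_entry(text)
        path = "slow path (full pt_regs)" if e.needs_full_regs else "fast path"
        print(f"{e.name}: {e.entry_point} -> {path}")
```

A table generator built along these lines could emit the normal entry point for unannotated syscalls and a slow-path trampoline for annotated ones, which is the extra complexity the reply above argues against.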