From: Andy Lutomirski
Date: Mon, 31 Aug 2015 18:37:58 -0700
Subject: Re: [PATCH 0/7] x86 vdso32 cleanups
To: Brian Gerst
Cc: X86 ML, linux-kernel@vger.kernel.org, Ingo Molnar, H. Peter Anvin,
 Denys Vlasenko, Linus Torvalds

On Mon, Aug 31, 2015 at 6:19 PM, Andy Lutomirski wrote:
>
> On Sun, Aug 30, 2015 at 7:52 PM, Andy Lutomirski wrote:
>>
>> On Sun, Aug 30, 2015 at 2:18 PM, Brian Gerst wrote:
>> > On Sat, Aug 29, 2015 at 12:10 PM, Andy Lutomirski wrote:
>> >> On Sat, Aug 29, 2015 at 8:20 AM, Brian Gerst wrote:
>> >>> This patch set contains several cleanups to the 32-bit VDSO.  The
>> >>> main change is to only build one VDSO image, and select the
>> >>> syscall entry point at runtime.
>> >>
>> >> Oh no, we have dueling patches!
>> >>
>> >> I have a 2/3-finished series that cleans up the AT_SYSINFO mess
>> >> differently, as I outlined earlier.  I've only done the compat and
>> >> common bits (no 32-bit native support quite yet), and it enters
>> >> successfully on Intel using SYSENTER and on (fake) AMD using
>> >> SYSCALL.  The SYSRET bit isn't there yet.
>> >>
>> >> Other than some ifdeffery, the final system_call.S looks like this:
>> >>
>> >> https://git.kernel.org/cgit/linux/kernel/git/luto/linux.git/tree/arch/x86/entry/vdso/vdso32/system_call.S?h=x86/entry_compat
>> >>
>> >> The meat is:
>> >>
>> >>         .text
>> >>         .globl __kernel_vsyscall
>> >>         .type __kernel_vsyscall,@function
>> >>         ALIGN
>> >> __kernel_vsyscall:
>> >>         CFI_STARTPROC
>> >>         /*
>> >>          * Reshuffle regs so that any of the entry instructions
>> >>          * will preserve enough state.
>> >>          */
>> >>         pushl   %edx
>> >>         CFI_ADJUST_CFA_OFFSET   4
>> >>         CFI_REL_OFFSET          edx, 0
>> >>         pushl   %ecx
>> >>         CFI_ADJUST_CFA_OFFSET   4
>> >>         CFI_REL_OFFSET          ecx, 0
>> >>         movl    %esp, %ecx
>> >>
>> >> #ifdef CONFIG_X86_64
>> >>         /* If SYSENTER is available, use it. */
>> >>         ALTERNATIVE_2 "", "sysenter", X86_FEATURE_SYSENTER32, \
>> >>                           "syscall", X86_FEATURE_SYSCALL32
>> >> #endif
>> >>
>> >>         /* Enter using int $0x80 */
>> >>         movl    (%esp), %ecx
>> >>         int     $0x80
>> >> GLOBAL(int80_landing_pad)
>> >>
>> >>         /* Restore ECX and EDX in case they were clobbered. */
>> >>         popl    %ecx
>> >>         CFI_RESTORE             ecx
>> >>         CFI_ADJUST_CFA_OFFSET   -4
>> >>         popl    %edx
>> >>         CFI_RESTORE             edx
>> >>         CFI_ADJUST_CFA_OFFSET   -4
>> >>         ret
>> >>         CFI_ENDPROC
>> >>
>> >>         .size __kernel_vsyscall,.-__kernel_vsyscall
>> >>         .previous
>> >>
>> >> And that's it.
>> >>
>> >> What do you think?  This comes with massively cleaned up
>> >> kernel-side asm as well as a test case that actually validates
>> >> the CFI directives.
>> >>
>> >> Certainly, a bunch of your patches make sense regardless, and I'll
>> >> review them and add them to my queue soon.
>> >>
>> >> --Andy
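
As an aside, if anyone wants to poke at either vDSO variant from
userspace: the kernel advertises this entry point as AT_SYSINFO in the
auxiliary vector, and the calling convention is the same as for
int $0x80 (eax = syscall number, ebx/ecx/edx/esi/edi/ebp = args).  A
minimal sketch of such a test -- my own illustration, not part of
either series; build with -m32, and 20 is the 32-bit __NR_getpid:

#include <elf.h>        /* AT_SYSINFO */
#include <stdio.h>
#include <sys/auxv.h>   /* getauxval() */

int main(void)
{
        /* AT_SYSINFO only exists in the 32-bit auxv. */
        unsigned long vsyscall = getauxval(AT_SYSINFO);

        if (!vsyscall)
                return 1;

        /*
         * eax carries the syscall number in and the return value out.
         * Older SYSENTER-based vDSOs could clobber ecx/edx around the
         * kernel entry, so tell gcc they don't survive the call.
         */
        long ret = 20;  /* __NR_getpid on 32-bit x86 */
        asm volatile("call *%1"
                     : "+a" (ret)
                     : "r" (vsyscall)
                     : "ecx", "edx", "memory", "cc");
        printf("getpid() via __kernel_vsyscall: %ld\n", ret);
        return 0;
}

The ecx/edx caveat is exactly what the push/pop pairs above are
guarding against.
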
>> > How does the performance compare to the original?  Looking at the
>> > disassembly, there are two added function calls, and it reloads
>> > the args from the stack instead of just shuffling registers.
>>
>> The replacement is dramatically faster, which means I probably
>> benchmarked it wrong.  I'll try again in a day or two.
>
> It's enough slower to be problematic.  I need to figure out how to
> trace it properly.  (Hmm, maybe it's time to learn how to get perf on
> the host to trace a KVM guest.)
>
> Everything is and was hilariously slow with context tracking on.
> That needs to get fixed, and hopefully once this entry stuff is done
> someone will do the other end of it.

I got random errors from perf kvm, but I think I found at least part
of the issue.

The two irqs_disabled() calls in common.c are kind of expensive.  I
should disable them on non-lockdep kernels.

The context tracking hooks are also too expensive, even when disabled.
I should do something to optimize those.  Hello, static keys?  (A
rough sketch of what I mean is below.)  This doesn't affect syscalls,
though.

With context tracking off and the irqs_disabled checks commented out,
we're probably doing well enough.  We can always tweak the C code and
aggressively force inlining if we want a few cycles back.
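
To illustrate the static-key idea (a sketch only -- ct_enabled_key,
ct_hook_enter, and slow_ct_enter are made-up names, not the real
context tracking symbols): with a default-off key, the hook compiles
to a single patched no-op, so the disabled case costs essentially
nothing.

#include <linux/jump_label.h>

extern void slow_ct_enter(void);        /* hypothetical slow path */

static DEFINE_STATIC_KEY_FALSE(ct_enabled_key);

static inline void ct_hook_enter(void)
{
        /*
         * With the key off, this is a no-op in the instruction
         * stream; static_branch_enable() patches it into a jump to
         * the out-of-line slow path.
         */
        if (static_branch_unlikely(&ct_enabled_key))
                slow_ct_enter();
}

/* Run once at boot, only if context tracking is actually wanted: */
static void ct_enable(void)
{
        static_branch_enable(&ct_enabled_key);
}

--Andy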