Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753008AbbBYT71 (ORCPT ); Wed, 25 Feb 2015 14:59:27 -0500 Received: from mail-lb0-f173.google.com ([209.85.217.173]:46018 "EHLO mail-lb0-f173.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751852AbbBYT7Z (ORCPT ); Wed, 25 Feb 2015 14:59:25 -0500 MIME-Version: 1.0 In-Reply-To: <54EE1799.2000602@redhat.com> References: <1423778052-21038-1-git-send-email-dvlasenk@redhat.com> <1423778052-21038-2-git-send-email-dvlasenk@redhat.com> <54EE1799.2000602@redhat.com> Date: Wed, 25 Feb 2015 23:59:23 +0400 Message-ID: Subject: Re: [PATCH 2/3 v3] x86: entry_64.S: always allocate complete "struct pt_regs" From: Andrey Wagin To: Denys Vlasenko Cc: Andy Lutomirski , Linus Torvalds , Oleg Nesterov , Borislav Petkov , "H. Peter Anvin" , Frederic Weisbecker , X86 ML , Alexei Starovoitov , Will Drewry , Kees Cook , LKML Content-Type: text/plain; charset=UTF-8 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 4476 Lines: 111 2015-02-25 21:42 GMT+03:00 Denys Vlasenko : > On 02/25/2015 01:37 PM, Andrey Wagin wrote: >> 2015-02-13 0:54 GMT+03:00 Denys Vlasenko : >>> 64-bit code was using six stack slots less by not saving/restoring >>> registers which are callee-preserved according to C ABI, >>> and not allocating space for them. >>> Only when syscall needed a complete "struct pt_regs", >>> the complete area was allocated and filled in. >>> As an additional twist, on interrupt entry a "slightly less truncated pt_regs" >>> trick is used, to make nested interrupt stacks easier to unwind. >>> >>> This proved to be a source of significant obfuscation and subtle bugs. >>> For example, stub_fork had to pop the return address, >>> extend the struct, save registers, and push return address back. Ugly. >>> ia32_ptregs_common pops return address and "returns" via jmp insn, >>> throwing a wrench into CPU return stack cache. >>> >>> This patch changes code to always allocate a complete "struct pt_regs". >>> The saving of registers is still done lazily. >>> >>> "Partial pt_regs" trick on interrupt stack is retained. >>> >>> Macros which manipulate "struct pt_regs" on stack are reworked: >>> ALLOC_PT_GPREGS_ON_STACK allocates the structure. >>> SAVE_C_REGS saves to it those registers which are clobbered by C code. >>> SAVE_EXTRA_REGS saves to it all other registers. >>> Corresponding RESTORE_* and REMOVE_PT_GPREGS_FROM_STACK macros reverse it. >>> >>> ia32_ptregs_common, stub_fork and friends lost their ugly dance with >>> return pointer. >>> >>> LOAD_ARGS32 in ia32entry.S now uses symbolic stack offsets >>> instead of magic numbers. >>> >>> error_entry and save_paranoid now use SAVE_C_REGS + SAVE_EXTRA_REGS >>> instead of having it open-coded yet again. >>> >>> Patch was run-tested: 64-bit executables, 32-bit executables, >>> strace works. >>> Timing tests did not show measurable difference in 32-bit >>> and 64-bit syscalls. >> >> Hello Denys, >> >> My test vm doesn't boot with this patch. Could you help to investigate >> this issue? > > I think I found it. This part of my patch is possibly wrong: > > @@ -171,9 +171,9 @@ static inline int arch_irqs_disabled(void) > #define ARCH_LOCKDEP_SYS_EXIT_IRQ \ > TRACE_IRQS_ON; \ > sti; \ > - SAVE_REST; \ > + SAVE_EXTRA_REGS; \ > LOCKDEP_SYS_EXIT; \ > - RESTORE_REST; \ > + RESTORE_EXTRA_REGS; \ > cli; \ > TRACE_IRQS_OFF; > > The "SAVE_REST" here is intended to really *push* extra regs on stack, > but the patch changed it so that they are written to existing stack > slots above. > > From code inspection it should work in almost all cases, but some > locations where it is used are really obscure. > > If there are places where *pushing* regs is really necessary, > this can corrupt rbp,rbx,r12-15 registers. > > Your config has CONFIG_LOCKDEP=y, I think it's worth trying whether the bug > was here. > Please find updated patch attached. Can you try it? It doesn't work [ 2.282198] microcode: CPU0 sig=0x623, pf=0x0, revision=0x1 [ 2.288321] microcode: CPU1 sig=0x623, pf=0x0, revision=0x1 [ 2.289139] microcode: CPU2 sig=0x623, pf=0x0, revision=0x1 [ 2.290618] microcode: CPU3 sig=0x623, pf=0x0, revision=0x1 [ 2.292417] microcode: Microcode Update Driver: v2.00 , Peter Oruba [ 2.392592] piix4_smbus 0000:00:01.3: SMBus Host Controller at 0xb100, revision 0 [ 2.510882] systemd-fsck[349]: /dev/sda1: clean, 343/128016 files, 166911/512000 blocks [ 2.521463] Adding 4128764k swap on /dev/sda2. Priority:-1 extents:1 across:4128764k FS [ 2.573597] EXT4-fs (sda1): mounted filesystem with ordered data mode. Opts: (null) [ 2.597771] systemd-journald[283]: Received request to flush runtime journal from PID 1 [ 2.802288] audit: type=1305 audit(1424892361.629:3): audit_pid=369 old=0 auid=4294967295 ses=4294967295 res=1 [ 2.819660] traps: systemd-cgroups[380] general protection ip:7fb227939028 sp:7fff6bac59c8 error:0 in ld-2.18.so[7fb227921000+20000] [ 3.016262] traps: systemd-cgroups[390] general protection ip:7f456f7b6028 sp:7fffdc059718 error:0 in ld-2.18.so[7f456f79e000+20000] > > -- > vda > -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/