Date: Thu, 18 Jun 2015 11:31:34 +0200
From: Ingo Molnar
To: Denys Vlasenko
Cc: Linus Torvalds, Steven Rostedt, Borislav Petkov, "H. Peter Anvin", Andy Lutomirski, Oleg Nesterov, Frederic Weisbecker, Alexei Starovoitov, Will Drewry, Kees Cook, x86@kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH 4/5] x86/asm/entry/32: Replace RESTORE_RSI_RDI[_RDX] with open-coded 32-bit reads
Message-ID: <20150618093134.GA1094@gmail.com>
In-Reply-To: <557F6CC3.7070709@redhat.com>

* Denys Vlasenko wrote:

> On 06/15/2015 10:20 PM, Ingo Molnar wrote:
> >> Actually, ecx and r11 need to be loaded first. They are not so much "restored"
> >> as "prepared for SYSRET insn". Every cycle lost in loading these delays SYSRET.
> >> [...]
> >
> > So in the typical case they will still be cached, and so their max latency should
> > be around 3 cycles.
>
> If syscall flushes caches (say, a large read), or sleeps
> and CPU schedules away, then pt_regs->ip,flags are evicted
> and need to be reloaded.
>
> > In fact because they are memory loads, they don't really have dependencies,
> > they should be available to SYSRET almost immediately,
>
> They depend on the memory data.
>
> > i.e. within a cycle - and
> > there's no reason to believe why these loads wouldn't pipeline properly and
> > parallelize with the many other things SYSRET has to do to organize a return to
> > user-space, before it can actually use the target RIP and RFLAGS.
>
> This does not sound right.
>
> If it takes, say, 20 cycles to pull data from e.g. L3 cache to ECX,
> then SYSRET can't possibly complete sooner than in 20 cycles.

Yeah, that's true, but my point is: SYSRET has to do a lot of other things
(permission checks, loading the user mode state - most of which are unrelated to
R11/RCX), which take dozens of cycles, and which are probably overlapped with any
cache misses on arguments such as R11/RCX.

It's not impossible that reordering helps, for example if SYSRET has some internal
dependencies that make its parallelism worse than ideal - but I'd complicate this
code only if it gives a measurable improvement for cache-cold syscall performance.

Thanks,

	Ingo