From: Andy Lutomirski
Date: Wed, 1 Jul 2015 08:27:30 -0700
Subject: Re: gcc feature request / RFC: extra clobbered regs
To: Vladimir Makarov
Cc: Jakub Jelinek, Andy Lutomirski, gcc@gcc.gnu.org, "linux-kernel@vger.kernel.org", Linus Torvalds, "H. Peter Anvin", Ingo Molnar, Thomas Gleixner
In-Reply-To: <559405E5.7000405@redhat.com>
References: <20150630213736.GQ10247@tucnak.redhat.com> <559405E5.7000405@redhat.com>

On Wed, Jul 1, 2015 at 8:23 AM, Vladimir Makarov wrote:
>
>
> On 06/30/2015 05:37 PM, Jakub Jelinek wrote:
>>
>> On Tue, Jun 30, 2015 at 02:22:33PM -0700, Andy Lutomirski wrote:
>>>
>>> I'm working on a massive set of cleanups to Linux's syscall handling.
>>> We currently have a nasty optimization in which we don't save rbx,
>>> rbp, r12, r13, r14, and r15 on x86_64 before calling C functions.
>>> This works, but it makes the code a huge mess.  I'd rather save all
>>> regs in asm and then call C code.
>>>
>>> Unfortunately, this will add five cycles (on SNB) to one of the
>>> hottest paths in the kernel.  To counteract it, I have a gcc feature
>>> request that might not be all that crazy.  When writing C functions
>>> intended to be called from asm, what if we could do:
>>>
>>> __attribute__((extra_clobber("rbx", "rbp", "r12", "r13", "r14",
>>> "r15"))) void func(void);
>>>
>>> This will save enough pushes and pops that it could easily give us our
>>> five cycles back and then some.
>>> It's also easy to be compatible with old GCC versions -- we could
>>> just omit the attribute, since preserving a register is always safe.
>>>
>>> Thoughts?  Is this totally crazy?  Is it easy to implement?
>>>
>>> (I'm not necessarily suggesting that we do this for the syscall bodies
>>> themselves.  I want to do it for the entry and exit helpers, so we'd
>>> still lose the five cycles in the full fast-path case, but we'd do
>>> better in the slower paths, and the slower paths are becoming
>>> increasingly important in real workloads.)
>>
>> GCC already supports -ffixed-REG, -fcall-used-REG and -fcall-saved-REG
>> options, which allow tweaking the calling conventions; but it is per
>> translation unit right now.  It isn't clear which of these options
>> you mean with the extra_clobber.
>> I assume you are looking for a possibility to change this to be
>> per-function, with a caller with a different calling convention having
>> to adjust for a different ABI callee.  To some extent, recent GCC versions
>> do that automatically with -fipa-ra already - if some call-used registers
>> are not clobbered by some call and the caller can analyze that callee,
>> it can stick values in such registers across the call.
>> I'd say the most natural API for this would be to allow
>> f{fixed,call-{used,saved}}-REG in target attribute.
>>
>>
> One consequence of frequently changing the calling convention per function
> or register usage could be GCC slowdown.  RA calculates too much data and it
> requires a lot of time to recalculate it after something in the register
> usage convention is changed.

Do you mean that RA precalculates things based on the calling
convention and saves it across functions?  Hmm.  I don't think this
would be a big problem in my intended use case -- there would only be
a handful of functions using this extension, and they'd have very few
non-asm callers.

>
> Another consequence would be that RA fails to generate the code in some
> cases and, even worse, the failure might depend on the version of GCC (I
> already saw PRs where RA worked for an asm in one GCC version because a
> pseudo was changed by an equivalent constant and failed in another GCC
> version where it did not happen).
>

Would this be a problem generating code for a function with extra
"used" regs or just a problem generating code to call such a function?

I imagine that, in the former case, RA's job would be easier, not
harder, since there would be more registers to work with.  In
practice, though, I think it would just end up changing the prologue
and epilogue.

--Andy
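
For reference, a minimal sketch of the "just omit the attribute on old
compilers" fallback mentioned in the quoted request.  Everything named here
is an assumption for illustration: extra_clobber is only the attribute
proposed in this thread and is not an existing GCC attribute, and the
EXTRA_CLOBBER macro and entry_helper function are hypothetical names.  The
probe uses __has_attribute, which GCC 5+ and clang provide; on any compiler
that does not know extra_clobber the macro expands to nothing, so the
function is built with the normal calling convention.

```c
/*
 * Hypothetical wrapper for the attribute proposed in this thread.
 * extra_clobber does NOT exist in GCC today, so on current compilers
 * the probe below fails and EXTRA_CLOBBER() expands to nothing.
 */
#if defined(__has_attribute)
# if __has_attribute(extra_clobber)
#  define EXTRA_CLOBBER(...) __attribute__((extra_clobber(__VA_ARGS__)))
# endif
#endif

/* Fallback: omit the attribute.  The callee then saves and restores the
 * callee-saved registers as usual, which is always safe, just slower. */
#ifndef EXTRA_CLOBBER
# define EXTRA_CLOBBER(...)
#endif

/* Illustrative helper meant to be called from asm that has already
 * saved rbx, rbp and r12-r15 itself (name and body are made up). */
EXTRA_CLOBBER("rbx", "rbp", "r12", "r13", "r14", "r15")
int entry_helper(int x);

int entry_helper(int x)
{
	return x + 1;
}
```

With the fallback in place the C source stays the same for every compiler;
only the generated prologue and epilogue would differ if the attribute were
ever implemented.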