From: Jeremy Fitzhardinge
To: Ingo Molnar
Cc: linux-kernel@vger.kernel.org, Xen-devel, the arch/x86 maintainers,
    Ian Campbell, Zachary Amsden, Rusty Russell, Ravikiran Thirumalai
Subject: [PATCH 0 of 7] x86/paravirt: optimise pvop calls and register use
Date: Wed, 28 Jan 2009 14:35:00 -0800

Hi Ingo,

This series implements a sequence of optimisations to reduce the impact of
enabling CONFIG_PARAVIRT while running native.  They are:

(0. Move some Xen code around to make later changes work better.)

1. For a number of the pvops, the native implementation is a simple
   identity function which returns its argument.  Add a specific paravirt
   identity function which the patcher can treat specially, by directly
   inlining either nops (32-bit) or a mov (64-bit) into the instruction
   stream.  (A sketch of the idea follows after this list.)

2. When a pvop is called from asm code, the callsite also provides a hint
   about which registers are available to be clobbered by the called code.
   Until now that information was ignored and all caller-save registers
   were saved; now, don't bother saving/restoring registers which are
   clobberable.  (See the before/after sketch below.)

3. The C calling convention lists which registers the caller can expect to
   survive a function call, and which the callee is allowed to clobber.
   The latter set is quite large, especially on 64-bit, which means that
   converting a pile of simple inline functions into function calls caused
   a lot more register pressure, making the generated code much worse.

   I introduce a new "callee-save" calling convention which makes only the
   return register (eax:edx on 32-bit, rax on 64-bit) callee-clobberable;
   the callee must preserve all other registers, including the argument
   registers.  This makes the callsites for these functions clobber far
   fewer registers, giving the compiler a chance to generate better code.

   Small asm functions, which generally only use one or two registers
   anyway, can be called directly.  C code can also be called via a thunk
   which does the necessary register saving/restoring, generated by
   PV_CALLEE_SAVE_REGS_THUNK(func) and sketched below.

   The irq enable/disable/save/restore functions are the first to use this
   calling convention, since they are the most commonly used in the kernel
   and are also called from asm code.

4. Convert the pte_val/make_pte identity functions to use the callee-save
   convention; they're only identity functions anyway, so they have no
   need to trash lots of registers.
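To make (1) concrete, here is a minimal sketch of the identity-patching
idea; the function and patcher names below are illustrative, not
necessarily those used in the patches:

#include <linux/string.h>	/* memcpy */
#include <linux/types.h>	/* u64 */

/*
 * Sketch only.  Every identity pvop (native pte_val, make_pte and
 * friends) points at one well-known function, so the patcher can
 * recognise it by address rather than needing per-op knowledge.
 */
u64 _paravirt_ident_64(u64 x)
{
	return x;
}

/*
 * At patch time, a call to the identity function is replaced inline.
 * On 32-bit the argument already sits in %eax:%edx, which is also the
 * return pair, so the patcher emits nothing and the whole site is
 * padded with nops.  On 64-bit the argument arrives in %rdi but must
 * be returned in %rax, so a single mov is emitted instead:
 */
unsigned paravirt_patch_ident_64(void *insnbuf, unsigned len)
{
	/* 48 89 f8 == mov %rdi,%rax */
	static const unsigned char mov_rdi_rax[] = { 0x48, 0x89, 0xf8 };

	if (len < sizeof(mov_rdi_rax))
		return 0;	/* site too small: keep the call */
	memcpy(insnbuf, mov_rdi_rax, sizeof(mov_rdi_rax));
	return sizeof(mov_rdi_rax);
}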
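The effect of (2) is easiest to see at a hypothetical 32-bit asm callsite
which declares %eax clobberable; the CLBR_EAX name and the exact call
syntax here are illustrative:

/* Before: the clobber hint is ignored, and every caller-save
 * register is protected around the patchable indirect call: */
	push %eax; push %ecx; push %edx
	call *pv_irq_ops+PV_IRQ_irq_disable
	pop %edx; pop %ecx; pop %eax

/* After: this site says the callee may trash %eax (CLBR_EAX), so
 * only %ecx and %edx still need saving: */
	push %ecx; push %edx
	call *pv_irq_ops+PV_IRQ_irq_disable
	pop %edx; pop %ecx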
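And for (3), a sketch of what the 32-bit thunk generator and its use
could look like; the struct and macro shapes are illustrative:

/*
 * Sketch only.  On 32-bit the thunk need only preserve %ecx, since
 * %eax:%edx is the return pair and hence clobberable anyway; a 64-bit
 * flavour would save rcx, rdx, rsi, rdi and r8-r11 around the call.
 */
#define PV_CALLEE_SAVE_REGS_THUNK(func)					\
	extern typeof(func) __raw_callee_save_##func;			\
	asm(".pushsection .text;"					\
	    "__raw_callee_save_" #func ": "				\
	    "push %ecx;"						\
	    "call " #func ";"						\
	    "pop %ecx;"							\
	    "ret;"							\
	    ".popsection")

/* The ops tables hold the thunk entry in a distinct type, so the
 * compiler catches any attempt to install an unthunked C function: */
struct paravirt_callee_save {
	void *func;
};

#define PV_CALLEE_SAVE(func)						\
	((struct paravirt_callee_save) { __raw_callee_save_##func })

/* Example use for one of the irq ops: */
void native_irq_disable(void)
{
	asm volatile("cli" ::: "memory");
}
PV_CALLEE_SAVE_REGS_THUNK(native_irq_disable);

/* ...and in the native ops table initialiser:
 *	.irq_disable = PV_CALLEE_SAVE(native_irq_disable),
 */

The upshot is that a C callsite of one of these ops only has to assume
the return registers are dead, rather than the whole caller-save set.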
I had to make some adjustments to VSMP and lguest to match the new calling
conventions.  I wasn't sure how I should change VMI, so I'm waiting for
Zach's input on that (VMI doesn't compile at the moment).

In testing, the net result was that the overhead dropped by about 75%,
though I found it hard to get really stable results.  The most obvious
improvement was a reduction in L2 references, presumably meaning that L1
was getting a better hit rate.

Each of these transforms is an unambiguous improvement in generated code
for the native case, so I'm curious to hear what other people see.

Thanks,
	J