From: Jeremy Fitzhardinge
To: Ingo Molnar
Cc: linux-kernel@vger.kernel.org, Xen-devel, the arch/x86 maintainers,
    Ian Campbell, Zachary Amsden, Rusty Russell, Ravikiran Thirumalai
Subject: [PATCH 0 of 7] x86/paravirt: optimise pvop calls and register use
Date: Wed, 28 Jan 2009 14:35:00 -0800

Hi Ingo,

This series implements a sequence of optimisations to reduce the impact of
enabling CONFIG_PARAVIRT while running native.  They are:

(0. Move some Xen code around to make later changes work better.)

1. For a number of the pvops, the native implementation is a simple
   identity function which returns its argument.  Add a specific paravirt
   identity function which the patcher can treat specially, by directly
   inlining either nops (32-bit) or a mov (64-bit) into the instruction
   stream.  (A sketch of the idea follows after this list.)

2. When a pvop is called from asm code, the callsite also provides a hint
   about which registers are available to be clobbered by the called code.
   Until now that information was ignored and all caller-save registers
   were saved; now, don't bother saving/restoring registers which are
   clobberable.  (See the before/after sketch below.)

3. The C calling convention lists which registers the caller can expect to
   survive a function call, and which the callee is allowed to clobber.
   The latter set is quite large, especially on 64-bit, which means that
   converting a pile of simple inline functions into function calls caused
   a lot more register pressure, making the generated code much worse.

   I introduce a new "callee-save" calling convention which makes only the
   return register (eax:edx on 32-bit, rax on 64-bit) callee-clobberable;
   the callee must preserve all other registers, including the argument
   registers.  This makes the callsites for these functions clobber far
   fewer registers, giving the compiler a chance to generate better code.

   Small asm functions, which generally only use one or two registers
   anyway, can be called directly.  C code can also be called via a thunk
   which does the necessary register saving/restoring, generated by
   PV_CALLEE_SAVE_REGS_THUNK(func) and sketched below.

   The irq enable/disable/save/restore functions are the first to use this
   calling convention, since they are the most commonly used in the kernel
   and are also called from asm code.

4. Convert the pte_val/make_pte identity functions to use the callee-save
   convention; they're only identity functions anyway, so they have no
   need to trash lots of registers.
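To make (1) concrete, here is a minimal sketch of the identity-patching
idea; the function and patcher names below are illustrative, not
necessarily those used in the patches:

#include <linux/string.h>	/* memcpy */
#include <linux/types.h>	/* u64 */

/*
 * Sketch only.  Every identity pvop (native pte_val, make_pte and
 * friends) points at one well-known function, so the patcher can
 * recognise it by address rather than needing per-op knowledge.
 */
u64 _paravirt_ident_64(u64 x)
{
	return x;
}

/*
 * At patch time, a call to the identity function is replaced inline.
 * On 32-bit the argument already sits in %eax:%edx, which is also the
 * return pair, so the patcher emits nothing and the whole site is
 * padded with nops.  On 64-bit the argument arrives in %rdi but must
 * be returned in %rax, so a single mov is emitted instead:
 */
unsigned paravirt_patch_ident_64(void *insnbuf, unsigned len)
{
	/* 48 89 f8 == mov %rdi,%rax */
	static const unsigned char mov_rdi_rax[] = { 0x48, 0x89, 0xf8 };

	if (len < sizeof(mov_rdi_rax))
		return 0;	/* site too small: keep the call */
	memcpy(insnbuf, mov_rdi_rax, sizeof(mov_rdi_rax));
	return sizeof(mov_rdi_rax);
}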
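The effect of (2) is easiest to see at a hypothetical 32-bit asm callsite
which declares %eax clobberable; the CLBR_EAX name and the exact call
syntax here are illustrative:

/* Before: the clobber hint is ignored, and every caller-save
 * register is protected around the patchable indirect call: */
	push %eax; push %ecx; push %edx
	call *pv_irq_ops+PV_IRQ_irq_disable
	pop %edx; pop %ecx; pop %eax

/* After: this site says the callee may trash %eax (CLBR_EAX), so
 * only %ecx and %edx still need saving: */
	push %ecx; push %edx
	call *pv_irq_ops+PV_IRQ_irq_disable
	pop %edx; pop %ecx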
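And for (3), a sketch of what the 32-bit thunk generator and its use
could look like; the struct and macro shapes are illustrative:

/*
 * Sketch only.  On 32-bit the thunk need only preserve %ecx, since
 * %eax:%edx is the return pair and hence clobberable anyway; a 64-bit
 * flavour would save rcx, rdx, rsi, rdi and r8-r11 around the call.
 */
#define PV_CALLEE_SAVE_REGS_THUNK(func)					\
	extern typeof(func) __raw_callee_save_##func;			\
	asm(".pushsection .text;"					\
	    "__raw_callee_save_" #func ": "				\
	    "push %ecx;"						\
	    "call " #func ";"						\
	    "pop %ecx;"							\
	    "ret;"							\
	    ".popsection")

/* The ops tables hold the thunk entry in a distinct type, so the
 * compiler catches any attempt to install an unthunked C function: */
struct paravirt_callee_save {
	void *func;
};

#define PV_CALLEE_SAVE(func)						\
	((struct paravirt_callee_save) { __raw_callee_save_##func })

/* Example use for one of the irq ops: */
void native_irq_disable(void)
{
	asm volatile("cli" ::: "memory");
}
PV_CALLEE_SAVE_REGS_THUNK(native_irq_disable);

/* ...and in the native ops table initialiser:
 *	.irq_disable = PV_CALLEE_SAVE(native_irq_disable),
 */

The upshot is that a C callsite of one of these ops only has to assume
the return registers are dead, rather than the whole caller-save set.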
I had to make some adjustments to VSMP and lguest to match the new calling
conventions.  I wasn't sure how I should change VMI, so I'm waiting for
Zach's input on that (VMI doesn't compile at the moment).

In testing, the net result was that the overhead dropped by about 75%,
though I found it hard to get really stable results.  The most obvious
improvement was a reduction in L2 references, presumably meaning that L1
was getting a better hit rate.

Each of these transforms is an unambiguous improvement in generated code
for the native case, so I'm curious to hear what other people see.

Thanks,
	J