I'm just reworking the x86 tlb code to use smp_call_function_mask, and I
see how the UV tlb flushing hooks in. A few things occur to me:
   1. There should be a CONFIG_X86_UV to select this code. tlb_uv.o is
      around 6k, which is not trivial overhead to subject every x86_64
      kernel to.
   2. CONFIG_X86_UV should either depend on or select CONFIG_PARAVIRT.
   3. You should hook into paravirt_ops to enable your tlb-flush code.
      That is, in - say - uv_bau_init() you do
      "pv_mmu_ops.flush_tlb_others = uv_flush_tlb_others". This removes
      a test/branch in the generic code. Using paravirt_ops may open
      other opportunities to put UV-optimised functions in place without
      having to modify generic code.
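A minimal userspace sketch of the hook described in point 3 may help. It is not kernel code: the ops table, the cpumask type, the counters and uv_bau_init_sketch() are all hypothetical stand-ins, and only the names flush_tlb_others and uv_flush_tlb_others come from the text above. The point it illustrates is that the generic path makes one indirect call and contains no platform test.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's cpumask type. */
typedef unsigned long cpumask_sketch_t;

/* Hypothetical miniature of a pv_mmu_ops-style table with a single hook. */
struct mmu_ops_sketch {
	void (*flush_tlb_others)(cpumask_sketch_t cpus, unsigned long va);
};

/* Counters that make each path observable in userspace. */
int native_flushes;
int uv_flushes;

/* Default implementation: stands in for the IPI-based flush. */
void native_flush_tlb_others(cpumask_sketch_t cpus, unsigned long va)
{
	(void)cpus;
	(void)va;
	native_flushes++;
}

/* Platform implementation: stands in for the UV flush. */
void uv_flush_tlb_others(cpumask_sketch_t cpus, unsigned long va)
{
	(void)cpus;
	(void)va;
	uv_flushes++;
}

/* The table starts out pointing at the default implementation. */
struct mmu_ops_sketch mmu_ops = {
	.flush_tlb_others = native_flush_tlb_others,
};

/* Hypothetical init routine: installs the UV hook once, at boot. */
void uv_bau_init_sketch(int is_uv_system)
{
	if (is_uv_system)
		mmu_ops.flush_tlb_others = uv_flush_tlb_others;
}

/* Generic code: no "is this UV?" branch, just a call through the table. */
void flush_tlb_others(cpumask_sketch_t cpus, unsigned long va)
{
	assert(mmu_ops.flush_tlb_others != NULL);
	mmu_ops.flush_tlb_others(cpus, va);
}
```

In this sketch the platform decision is made once in the init routine, so every later flush_tlb_others() call pays only for the indirect call.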
My understanding is that the UV hardware has some kind of
payload-carrying IPI mechanism, which is a capability that could be
useful to express in a higher-level way in the kernel. Certainly I
could imagine using it in a virtual environment as a way to do
inter-VCPU messaging with less context switch overhead.
Thanks,
J
On Tuesday 29 July 2008 10:28, Jeremy Fitzhardinge wrote:
> I'm just reworking the x86 tlb code to use smp_call_function_mask, and I
> see how the UV tlb flushing hooks in. A few things occur to me:
>
> 1. There should be a CONFIG_X86_UV to select this code. tlb_uv.o is
> around 6k, which is not trivial overhead to subject every x86_64
> kernel to.
Definitely.
> 2. CONFIG_X86_UV should either depend on or select CONFIG_PARAVIRT.
> 3. You should hook into paravirt_ops to enable your tlb-flush code.
> That is, in - say - uv_bau_init() you do
> "pv_mmu_ops.flush_tlb_others = uv_flush_tlb_others". This removes
> a test/branch in the generic code. Using paravirt_ops may open
> other opportunities to put UV-optimised functions in place without
> having to modify generic code.
Really? It's not virtualized at all, although I don't like adding that
branch for such a small class of systems either.
It would possibly be better to have a new function (eg.
override_flush_tlb_others()), which returns 0 if
CONFIG_OVERRIDE_FLUSH_TLB is set, otherwise branches. And have *that*
selected by CONFIG_PARAVIRT and X86_UV.
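One possible reading of this suggestion, sketched in userspace C: the helper collapses to a constant 0 when the option is not set, so the caller's test can be optimised away, and performs a real runtime check only when the option is set. The option name CONFIG_OVERRIDE_FLUSH_TLB and the function name come from the paragraph above; the counters, the platform_has_override flag and the caller are hypothetical.

```c
#include <assert.h>

/*
 * Build with -DCONFIG_OVERRIDE_FLUSH_TLB to compile the override in.
 * Everything except that option name and override_flush_tlb_others()
 * is a hypothetical stand-in.
 */

int ipi_flushes;           /* flushes done by the generic IPI path */
int override_flushes;      /* flushes done by the platform override */
int platform_has_override; /* stands in for a runtime platform test */

#ifdef CONFIG_OVERRIDE_FLUSH_TLB
/* Returns nonzero if the platform handled the flush itself. */
static inline int override_flush_tlb_others(unsigned long cpus)
{
	(void)cpus;
	if (!platform_has_override)
		return 0;
	override_flushes++;
	return 1;
}
#else
/* Constant 0: the compiler can drop the caller's branch entirely. */
static inline int override_flush_tlb_others(unsigned long cpus)
{
	(void)cpus;
	return 0;
}
#endif

/* Generic caller: tries the override first, then falls back to IPIs. */
void flush_tlb_others_sketch(unsigned long cpus)
{
	if (override_flush_tlb_others(cpus))
		return;
	ipi_flushes++;
}
```

Built without the option, flush_tlb_others_sketch() always takes the IPI path no matter what platform_has_override says, which is the "no cost for everyone else" property being asked for.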
> My understanding is that the UV hardware has some kind of
> payload-carrying IPI mechanism, which is a capability that could be
> useful to express in a higher-level way in the kernel. Certainly I
> could imagine using it in a virtual environment as a way to do
> inter-VCPU messaging with less context switch overhead.
Yes, as I said in my review of that part of the UV tlb flushing, it would
be nice to have a generic mechanism to IPI with payload, which falls back
to a smp_call_function-like approach on platforms that don't have the
capability.
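A rough userspace sketch of what such a generic interface might look like, under stated assumptions: every name here (send_ipi_payload, hw_send_ipi_payload, the mailbox structure, handle_ipi) is hypothetical and nothing below is an existing kernel API. If a platform registers a hardware hook, it is used; otherwise the payload is parked in a per-CPU mailbox and delivered when the target runs its handler, roughly in the spirit of smp_call_function.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define PAYLOAD_MAX    32
#define NR_CPUS_SKETCH 4

typedef void (*ipi_func_t)(const void *payload, size_t len);

/* Hypothetical per-CPU mailbox used by the software fallback. */
struct mailbox {
	ipi_func_t func;
	unsigned char payload[PAYLOAD_MAX];
	size_t len;
	int pending;
};

struct mailbox mailboxes[NR_CPUS_SKETCH];

/* Optional hardware hook; NULL means no payload-carrying IPI exists. */
int (*hw_send_ipi_payload)(int cpu, ipi_func_t func,
			   const void *payload, size_t len);

/* Fallback: copy the payload; a real kernel would then send a plain IPI. */
static int sw_send_ipi_payload(int cpu, ipi_func_t func,
			       const void *payload, size_t len)
{
	struct mailbox *mb = &mailboxes[cpu];

	if (mb->pending)
		return -1; /* previous message not consumed yet */
	if (len)
		memcpy(mb->payload, payload, len);
	mb->len = len;
	mb->func = func;
	mb->pending = 1;
	return 0;
}

/* Generic entry point: prefer the hardware path, else fall back. */
int send_ipi_payload(int cpu, ipi_func_t func, const void *payload, size_t len)
{
	if (cpu < 0 || cpu >= NR_CPUS_SKETCH || func == NULL)
		return -1;
	if (len > PAYLOAD_MAX || (len > 0 && payload == NULL))
		return -1;
	if (hw_send_ipi_payload)
		return hw_send_ipi_payload(cpu, func, payload, len);
	return sw_send_ipi_payload(cpu, func, payload, len);
}

/* Stands in for the target CPU's IPI handler in the fallback case. */
void handle_ipi(int cpu)
{
	struct mailbox *mb = &mailboxes[cpu];

	if (!mb->pending)
		return;
	mb->func(mb->payload, mb->len);
	mb->pending = 0;
}

/* Example consumer: records an address to flush. */
unsigned long last_flush_addr;

void record_flush_addr(const void *payload, size_t len)
{
	if (len == sizeof(last_flush_addr))
		memcpy(&last_flush_addr, payload, len);
}
```

Callers see a single send_ipi_payload() either way; whether the payload travels with the interrupt or through memory is hidden behind the hook.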
Nick Piggin wrote:
>> 2. CONFIG_X86_UV should either depend on or select CONFIG_PARAVIRT.
>> 3. You should hook into paravirt_ops to enable your tlb-flush code.
>> That is, in - say - uv_bau_init() you do
>> "pv_mmu_ops.flush_tlb_others = uv_flush_tlb_others". This removes
>> a test/branch in the generic code. Using paravirt_ops may open
>> other opportunities to put UV-optimised functions in place without
>> having to modify generic code.
>>
>
> Really? It's not virtualized at all, although I don't like adding that
> branch for such a small class of systems either.
>
It's not virtualized, but paravirt_ops provides a wide range of
low-level hooks into all kinds of useful things; we may as well use them
if they're there. It's similar to VSMP's use of pvops: they do
something odd with shadowing the interrupt flag in the AC bit of
EFLAGS, and hook irq_enable/disable/save/restore to implement it.
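To make the idea concrete, here is a userspace sketch of four irq hooks that shadow the interrupt state in the AC bit. It is modeled loosely on the description above; the exact VSMP semantics are an assumption, the shadow_* names are hypothetical, and fake_eflags stands in for the real flags register. Only the bit positions of IF (bit 9) and AC (bit 18) are architectural facts.

```c
#include <assert.h>

#define X86_EFLAGS_IF 0x00000200UL /* interrupt enable flag, bit 9 */
#define X86_EFLAGS_AC 0x00040000UL /* alignment check flag, bit 18 */

/* Stand-in for the CPU flags register; interrupts start enabled. */
unsigned long fake_eflags = X86_EFLAGS_IF;

static unsigned long native_save_fl(void)
{
	return fake_eflags;
}

static void native_restore_fl(unsigned long flags)
{
	fake_eflags = flags;
}

/* Report "disabled" if IF is clear or the AC shadow bit is set. */
unsigned long shadow_save_fl(void)
{
	unsigned long flags = native_save_fl();

	if (!(flags & X86_EFLAGS_IF) || (flags & X86_EFLAGS_AC))
		flags &= ~X86_EFLAGS_IF;
	return flags;
}

/* Keep AC in sync with the logical interrupt state being restored. */
void shadow_restore_fl(unsigned long flags)
{
	if (flags & X86_EFLAGS_IF)
		flags &= ~X86_EFLAGS_AC;
	else
		flags |= X86_EFLAGS_AC;
	native_restore_fl(flags);
}

void shadow_irq_disable(void)
{
	unsigned long flags = native_save_fl();

	native_restore_fl((flags & ~X86_EFLAGS_IF) | X86_EFLAGS_AC);
}

void shadow_irq_enable(void)
{
	unsigned long flags = native_save_fl();

	native_restore_fl((flags | X86_EFLAGS_IF) & ~X86_EFLAGS_AC);
}
```

These four functions are the kind of thing that would be installed into the irq slots of the ops table, leaving the generic interrupt code untouched.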
> It would possibly be better to have a new function (eg.
> override_flush_tlb_others()), which returns 0 if
> CONFIG_OVERRIDE_FLUSH_TLB is set, otherwise branches. And have *that*
> selected by CONFIG_PARAVIRT and X86_UV.
>
There doesn't seem much point. CONFIG_PARAVIRT will turn all the
flush_tlb_others() into indirect calls which can be hooked, then the
paravirt patching machinery will turn them back into direct calls. So
it basically gives you the flexibility of plugging in arbitrary
functions, but with zero runtime overhead.
J
On Tue, Jul 29, 2008 at 02:12:18PM +1000, Nick Piggin wrote:
> On Tuesday 29 July 2008 10:28, Jeremy Fitzhardinge wrote:
> > I'm just reworking the x86 tlb code to use smp_call_function_mask, and I
> > see how the UV tlb flushing hooks in. A few things occur to me:
> >
> > 1. There should be a CONFIG_X86_UV to select this code. tlb_uv.o is
> > around 6k, which is not trivial overhead to subject every x86_64
> > kernel to.
>
> Definitely.
I'd like to talk about this issue separate from the virtualization one.
I think that the Linux distributions are not going to build a special
UV kernel, are they? So every distro would have to be prompted to
turn on CONFIG_X86_UV, or else their kernel is not going to boot on UV.
But you have a point about not linking the 6k UV object file where
size is an issue.
Thanks for catching that.
Perhaps the UV code should be excluded if CONFIG_EMBEDDED is set.
-Cliff
Cliff Wickman wrote:
> I think that the Linux distributions are not going to build a special
> UV kernel, are they? So every distro would have to be prompted to
> turn on CONFIG_X86_UV, or else their kernel is not going to boot on UV.
>
Distros will generally turn on everything. You could have it on by
default if the kernel is built for CONFIG_X86_GENERICARCH which enables
support for other big numa configurations. I think distros generally
build with that enabled anyway.
J
On Tue, Jul 29, 2008 at 07:43:39AM -0700, Jeremy Fitzhardinge wrote:
> Cliff Wickman wrote:
>> I think that the Linux distributions are not going to build a special
>> UV kernel, are they? So every distro would have to be prompted to
>> turn on CONFIG_X86_UV, or else their kernel is not going to boot on UV.
>>
>
> Distros will generally turn on everything. You could have it on by
> default if the kernel is built for CONFIG_X86_GENERICARCH which enables
> support for other big numa configurations. I think distros generally
> build with that enabled anyway.
>
> J
I don't know much about x86 configuration options. It looks like
CONFIG_X86_GENERICARCH is fairly new, so I don't know where to look
for samples of how it will be set by RH and SuSE.
But if the tlb_uv.o code should be present in "every" distro x86 kernel
I don't see the point of having to configure it in. Why not just
configure it out for small (embedded) kernels?
-Cliff
--
Cliff Wickman
Silicon Graphics, Inc.
[email protected]
(651) 683-3824
Cliff Wickman wrote:
> I don't know much about x86 configuration options. It looks like
> CONFIG_X86_GENERICARCH is fairly new, so I don't know where to look
> for samples of how it will be set by RH and SuSE.
>
No, it's a very old option. Certainly the Fedora Rawhide kernel I have
sitting here has it set, and I'm pretty sure RHEL always has it set,
since it's necessary to support the big iron hardware that people like
to run RHEL on. Basically, if a distro doesn't have it set, then they
don't want to support big machines like yours and you should just
recommend your customers use something else. There are a lot of other
config options which will affect whether a kernel is suitable for your
hardware, like max cpus. A "desktop only" distro might set max cpus to
8 or 16, which I assume would be disappointingly small for someone who's
just bought a 4k cpu machine.
> But if the tlb_uv.o code should be present in "every" distro x86 kernel
> I don't see the point of having to configure it in. Why not just
> configure it out for small (embedded) kernels?
Because it's not a binary thing. Lots of people who are compiling
their own kernels for specialized uses don't set CONFIG_EMBEDDED, but
also don't want a kitchen sink kernel. 6k isn't that much, but if every
obscure platform enabled some always-on code it rapidly starts to build up.
Basically, if you want to make sure you're going to get some level of
distro support, you need to make contact with the distros directly and
talk about what you'd like them to do.
J
On Tue, Jul 29, 2008 at 7:43 AM, Jeremy Fitzhardinge <[email protected]> wrote:
> Cliff Wickman wrote:
>>
>> I think that the Linux distributions are not going to build a special
>> UV kernel, are they? So every distro would have to be prompted to turn on
>> CONFIG_X86_UV, or else their kernel is not going to boot on UV.
>>
>
> Distros will generally turn on everything. You could have it on by default
> if the kernel is built for CONFIG_X86_GENERICARCH which enables support for
> other big numa configurations. I think distros generally build with that
> enabled anyway.
config X86_GENERICARCH
	bool "Generic architecture"
	depends on X86_32
	help
	  This option compiles in the NUMAQ, Summit, bigsmp, ES7000, default
	  subarchitectures. It is intended for a generic binary kernel.
	  if you select them all, kernel will probe it one by one. and will
	  fallback to default.
YH
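The "probe one by one, fall back to default" behaviour in that help text can be sketched in userspace C as an ordered table walk. This is an illustration only: the structure, the probe functions and the string-based platform id are hypothetical, and only the subarchitecture names are taken from the help text.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical descriptor for one compiled-in subarchitecture. */
struct subarch_sketch {
	const char *name;
	int (*probe)(const struct subarch_sketch *self, const char *platform_id);
};

/* Matches when the (made-up) platform id equals the subarch name. */
static int probe_by_name(const struct subarch_sketch *self,
			 const char *platform_id)
{
	return strcmp(self->name, platform_id) == 0;
}

/* The default subarch accepts anything. */
static int probe_always(const struct subarch_sketch *self,
			const char *platform_id)
{
	(void)self;
	(void)platform_id;
	return 1;
}

/* Order matters: specific subarchitectures first, default last. */
static const struct subarch_sketch subarchs[] = {
	{ "numaq",   probe_by_name },
	{ "summit",  probe_by_name },
	{ "bigsmp",  probe_by_name },
	{ "es7000",  probe_by_name },
	{ "default", probe_always  },
};

/* Try each subarch in turn; the first successful probe wins. */
const struct subarch_sketch *probe_subarch(const char *platform_id)
{
	size_t i;

	for (i = 0; i < sizeof(subarchs) / sizeof(subarchs[0]); i++)
		if (subarchs[i].probe(&subarchs[i], platform_id))
			return &subarchs[i];
	return NULL; /* unreachable while "default" is in the table */
}
```

An unrecognised machine simply falls through to the last entry, which is what lets a single binary kernel boot on all of the listed platforms.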
Yinghai Lu wrote:
> On Tue, Jul 29, 2008 at 7:43 AM, Jeremy Fitzhardinge <[email protected]> wrote:
>
>> Cliff Wickman wrote:
>>
>>> I think that the Linux distributions are not going to build a special
>>> UV kernel, are they? So every distro would have to be prompted to turn on
>>> CONFIG_X86_UV, or else their kernel is not going to boot on UV.
>>>
>>>
>> Distros will generally turn on everything. You could have it on by default
>> if the kernel is built for CONFIG_X86_GENERICARCH which enables support for
>> other big numa configurations. I think distros generally build with that
>> enabled anyway.
>>
>
> config X86_GENERICARCH
> bool "Generic architecture"
> depends on X86_32
>
Ah, overlooked that.
OK, well, either way it still needs to be a separate config option.
J
Jeremy Fitzhardinge wrote:
> Yinghai Lu wrote:
>> On Tue, Jul 29, 2008 at 7:43 AM, Jeremy Fitzhardinge <[email protected]>
>> wrote:
>>
>>> Cliff Wickman wrote:
>>>
>>>> I think that the Linux distributions are not going to build a special
>>>> UV kernel, are they? So every distro would have to be prompted to
>>>> turn on
>>>> CONFIG_X86_UV, or else their kernel is not going to boot on UV.
>>>>
>>>>
>>> Distros will generally turn on everything. You could have it on by
>>> default
>>> if the kernel is built for CONFIG_X86_GENERICARCH which enables
>>> support for
>>> other big numa configurations. I think distros generally build with
>>> that
>>> enabled anyway.
>>>
>>
>> config X86_GENERICARCH
>> bool "Generic architecture"
>> depends on X86_32
>>
>
> Ah, overlooked that.
>
> OK, well, either way it still needs to be a separate config option.
>
> J
Should there be a "generic" X86_ENTERPRISE_ARCH which turns on various
capabilities? Our licenses and support agreements with SuSE and RH are
for their "Enterprise Editions" though the customer is free to run
whatever (unsupported from us). Asking distros to enable that option
should be a no-brainer. And it could be defaulted ON with the comment
to turn it off if you have a small desktop system. This way they won't
forget... ;-) [Actually, this could turn on MAXSMP as well.]
Thanks,
Mike
Mike Travis wrote:
> Should there be a "generic" X86_ENTERPRISE_ARCH which turns on various
> capabilities? Our licenses and support agreements with SuSE and RH are
> for their "Enterprise Editions" though the customer is free to run
> whatever (unsupported from us). Asking distros to enable that option
> should be a no-brainer. And it could be defaulted ON with the comment
> to turn it off if you have a small desktop system. This way they won't
> forget... ;-) [Actually, this could turn on MAXSMP as well.]
>
I don't have a strong opinion, but I suspect that distros actually pay
attention to each option being set, since they end up having to support
them all anyway. As such, they'd generally prefer to explicitly turn
each thing on or off themselves, and big-switch config options aren't
all that useful.
But you would have to talk to a distro person to get confirmation ;)
J
On Tue, Jul 29, 2008 at 10:46:41AM -0700, Jeremy Fitzhardinge wrote:
> Cliff Wickman wrote:
>> But if the tlb_uv.o code should be present in "every" distro x86 kernel
>> I don't see the point of having to configure it in. Why not just
>> configure it out for small (embedded) kernels?
>
> Because it's not a binary thing. Lots of people who are compiling
> their own kernels for specialized uses don't set CONFIG_EMBEDDED, but
> also don't want a kitchen sink kernel. 6k isn't that much, but if every
> obscure platform enabled some always-on code it rapidly starts to build
> up.
But UV will not be obscure! It will be common among x86_64 :)
I know what you're talking about. It may sometimes be useful to turn
off chunks of code you don't need.
But is the specialized application of 64-bit processors big enough to
warrant the feature?
The size of most any 64-bit system would, I would think, make 6k of
code insignificant.
And the more options you add, the more likely someone will pick
combinations that won't work together.
> Basically, if you want to make sure you're going to get some level of
> distro support, you need to make contact with the distros directly and
> talk about what you'd like them to do.
> J
And we do. And could reasonably expect that they would turn on that
option for x86_64 kernels. We'd just, of course, rather not have to watch
and prompt to be sure all x86_64 kernels will run on our hardware.
-Cliff