2015-11-04 14:47:31

by Luiz Capitulino

Subject: Re: [PATCH V15 00/11] x86: Intel Cache Allocation Technology Support

On Thu, 1 Oct 2015 23:09:34 -0700
Fenghua Yu <[email protected]> wrote:

> This series has some preparatory patches and Intel cache allocation
> support.

Ping? What's the status of this series?

We badly need this series for KVM-RT workloads. I did try it and it
seems to work. Apart from small fixable issues, which I'll point out
in replies to the specific patches, there are some design issues on
which I need clarification. They are listed in order of relevance:

o Cache reservations are global to all NUMA nodes

CAT is mostly intended for real-time and high performance
computing. For both of them the most common setup is to
pin your threads to specific cores on a specific NUMA node.

So, suppose I have two HPC threads pinned to specific cores
on node1. I want to reserve 80% of the L3 cache to those
threads. With current patches I'd do this:

1. Create an "all-tasks" cgroup which can only access 20% of
the cache
2. Create a "hpc" cgroup which can access 80% of the cache
3. Move my HPC threads to "hpc" and all the other threads to
"all-tasks"

This has the intended behavior on node1: the "hpc" threads
will write into 80% of the L3 cache and any "all-tasks" threads
executing there will only write into 20% of the cache.

However, this is also true for node0! So, the "all-tasks"
threads can only write into 20% of the cache in node0 even
though "hpc" threads will never execute there.
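
To make the 80/20 split above concrete, here is a minimal sketch of
how the two capacity bitmasks (CBMs) could be derived. The 20-bit CBM
length and the helper names are assumptions for illustration only;
they are not taken from the patches. The sketch only relies on a CBM
being a contiguous run of set bits.

```python
def contiguous_cbm(start_bit, nbits):
    """Return a bitmask with nbits contiguous bits set, starting at start_bit."""
    if nbits < 1:
        raise ValueError("a CBM needs at least one bit set")
    return ((1 << nbits) - 1) << start_bit


def split_cache(cbm_len, hpc_fraction):
    """Split a cbm_len-bit mask into two non-overlapping contiguous CBMs."""
    hpc_bits = round(cbm_len * hpc_fraction)
    rest_bits = cbm_len - hpc_bits
    all_tasks = contiguous_cbm(0, rest_bits)      # low bits for everyone else
    hpc = contiguous_cbm(rest_bits, hpc_bits)     # remaining high bits for hpc
    return hpc, all_tasks


CBM_LEN = 20  # assumption: 20-bit CBM, not stated in this thread

hpc_mask, all_tasks_mask = split_cache(CBM_LEN, 0.80)
print(f"hpc       = {hpc_mask:#07x}")        # 0xffff0 (16 of 20 bits)
print(f"all-tasks = {all_tasks_mask:#07x}")  # 0x0000f (4 of 20 bits)
```

With the current patches, this same pair of masks takes effect on
every node, which is exactly the problem described above.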

Is this intended by design? Like, is this a hardware limitation
(given that the IA32_L3_MASK_n MSRs are global anyways) or maybe
a way to enforce cache coherence?

I was wondering if we could have masks per NUMA node, where
they are applied to processes whenever they migrate among
NUMA nodes.
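
As a rough model of what I mean by per-node masks, consider the toy
lookup below. Everything here is hypothetical (the table layout, the
function name, the 20-bit mask width); it is not an interface from the
patches, just a way to show the intended behavior on migration.

```python
FULL_CBM = 0xfffff  # assumption: 20-bit CBM, i.e. the whole cache

# Hypothetical per-group, per-node masks: the 80/20 split is only
# enforced on node 1, where the "hpc" threads are pinned.
PER_NODE_CBM = {
    "hpc":       {0: FULL_CBM, 1: 0xffff0},
    "all-tasks": {0: FULL_CBM, 1: 0x0000f},
}


def cbm_after_migration(group, new_node):
    """Return the mask a task of `group` would get after moving to new_node."""
    return PER_NODE_CBM[group][new_node]


# An "all-tasks" thread is restricted on node 1 but not on node 0.
print(hex(cbm_after_migration("all-tasks", 1)))
print(hex(cbm_after_migration("all-tasks", 0)))
```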

o How does this feature apply to kernel threads?

I'm just unable to move kernel threads out of the root
cgroup. This means that kernel threads can always write
into all cache no matter what the reservation scheme is.

Is this intended by design? Why? Unless I'm missing
something, reservations could and should be applied to
kernel threads as well.

o You can't change the root cgroup's CBM

I can understand this makes the implementation a lot simpler.
However, the reality is that there are far too few CBMs,
and losing one to the root group seems like a waste.

Can we change this or are there strong reasons not to do so?

o cgroups hierarchy is limited by the number of CBMs

Today on my Haswell system, this means that I can only have 3
directories in my cgroups hierarchy. If the number of CBMs
is expected to grow in future processors, then I think having
this feature as cgroups makes sense. However, if we're still
going to be this limited in terms of directory structure, then
it seems a bit overkill to me to have this as cgroups.
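
The limit can be pictured with a toy allocator. This is only a
sketch under assumptions: the total of 4 hardware classes is inferred
from "3 directories plus the root group", each directory is assumed to
need its own distinct mask, and the class and method names are made up
for illustration.

```python
import errno


class ClosidPool:
    """Toy model: one hardware class of service per cgroup directory."""

    def __init__(self, num_closids):
        # Class 0 is taken by the root group and cannot be handed out.
        self.free = list(range(1, num_closids))
        self.owner = {"/": 0}

    def mkdir(self, path):
        """Assign a free class to a new directory, or fail if none is left."""
        if not self.free:
            raise OSError(errno.ENOSPC, "no free class of service", path)
        self.owner[path] = self.free.pop(0)
        return self.owner[path]


pool = ClosidPool(4)  # assumption: 4 classes in total, root included
for name in ("/hpc", "/all-tasks", "/misc"):
    print(name, "->", pool.mkdir(name))

try:
    pool.mkdir("/one-too-many")
except OSError as err:
    print("mkdir failed:", err.strerror)
```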


2015-11-04 14:58:35

by Thomas Gleixner

Subject: Re: [PATCH V15 00/11] x86: Intel Cache Allocation Technology Support

On Wed, 4 Nov 2015, Luiz Capitulino wrote:

> On Thu, 1 Oct 2015 23:09:34 -0700
> Fenghua Yu <[email protected]> wrote:
>
> > This series has some preparatory patches and Intel cache allocation
> > support.
>
> Ping? What's the status of this series?

We still need to agree on the user space interface which is the
hardest part of it....

Thanks,

tglx

2015-11-04 15:12:36

by Luiz Capitulino

Subject: Re: [PATCH V15 00/11] x86: Intel Cache Allocation Technology Support

On Wed, 4 Nov 2015 15:57:41 +0100 (CET)
Thomas Gleixner <[email protected]> wrote:

> On Wed, 4 Nov 2015, Luiz Capitulino wrote:
>
> > On Thu, 1 Oct 2015 23:09:34 -0700
> > Fenghua Yu <[email protected]> wrote:
> >
> > > This series has some preparatory patches and Intel cache allocation
> > > support.
> >
> > Ping? What's the status of this series?
>
> We still need to agree on the user space interface which is the
> hardest part of it....

My understanding is that two interfaces have been proposed: the cgroups
one and an API based on syscalls or ioctls.

Are those proposals mutually exclusive? What about having the cgroups
one merged IFF it's useful, and adding the syscall API later if it's
really needed?

I don't want to make the wrong decision, but the cgroups interface is
here. Holding it back while we discuss a perfect interface that doesn't
even exist will just do users a disservice.

2015-11-04 15:28:59

by Thomas Gleixner

Subject: Re: [PATCH V15 00/11] x86: Intel Cache Allocation Technology Support

On Wed, 4 Nov 2015, Luiz Capitulino wrote:
> On Wed, 4 Nov 2015 15:57:41 +0100 (CET)
> Thomas Gleixner <[email protected]> wrote:
>
> > On Wed, 4 Nov 2015, Luiz Capitulino wrote:
> >
> > > On Thu, 1 Oct 2015 23:09:34 -0700
> > > Fenghua Yu <[email protected]> wrote:
> > >
> > > > This series has some preparatory patches and Intel cache allocation
> > > > support.
> > >
> > > Ping? What's the status of this series?
> >
> > We still need to agree on the user space interface which is the
> > hardest part of it....
>
> My understanding is that two interfaces have been proposed: the cgroups
> one and an API based on syscalls or ioctls.
>
> Are those proposals mutual exclusive? What about having the cgroups one
> merged IFF it's useful, and having the syscall API later if really
> needed?
>
> I don't want to make the wrong decision, but the cgroups interface is
> here. Holding it while we discuss a perfect interface that doesn't
> even exist will just do a bad service for users.

Well, no. We do not just introduce a random user space ABI, simply
because we would have to support it forever.

Thanks,

tglx

2015-11-04 15:36:05

by Luiz Capitulino

Subject: Re: [PATCH V15 00/11] x86: Intel Cache Allocation Technology Support

On Wed, 4 Nov 2015 16:28:04 +0100 (CET)
Thomas Gleixner <[email protected]> wrote:

> On Wed, 4 Nov 2015, Luiz Capitulino wrote:
> > On Wed, 4 Nov 2015 15:57:41 +0100 (CET)
> > Thomas Gleixner <[email protected]> wrote:
> >
> > > On Wed, 4 Nov 2015, Luiz Capitulino wrote:
> > >
> > > > On Thu, 1 Oct 2015 23:09:34 -0700
> > > > Fenghua Yu <[email protected]> wrote:
> > > >
> > > > > This series has some preparatory patches and Intel cache allocation
> > > > > support.
> > > >
> > > > Ping? What's the status of this series?
> > >
> > > We still need to agree on the user space interface which is the
> > > hardest part of it....
> >
> > My understanding is that two interfaces have been proposed: the cgroups
> > one and an API based on syscalls or ioctls.
> >
> > Are those proposals mutual exclusive? What about having the cgroups one
> > merged IFF it's useful, and having the syscall API later if really
> > needed?
> >
> > I don't want to make the wrong decision, but the cgroups interface is
> > here. Holding it while we discuss a perfect interface that doesn't
> > even exist will just do a bad service for users.
>
> Well, no. We do not just introduce a random user space ABI simply
> because we have to support it forever.

I don't think it's random. It has been under discussion for a long
time and Peter seems to be in favor of it.

But I'm all for progress here whatever route we take. In that regard,
what's your opinion on the best way to move forward?

2015-11-04 15:51:29

by Thomas Gleixner

Subject: Re: [PATCH V15 00/11] x86: Intel Cache Allocation Technology Support

On Wed, 4 Nov 2015, Luiz Capitulino wrote:
> On Wed, 4 Nov 2015 16:28:04 +0100 (CET)
> Thomas Gleixner <[email protected]> wrote:
>
> > On Wed, 4 Nov 2015, Luiz Capitulino wrote:
> > > On Wed, 4 Nov 2015 15:57:41 +0100 (CET)
> > > Thomas Gleixner <[email protected]> wrote:
> > >
> > > > On Wed, 4 Nov 2015, Luiz Capitulino wrote:
> > > >
> > > > > On Thu, 1 Oct 2015 23:09:34 -0700
> > > > > Fenghua Yu <[email protected]> wrote:
> > > > >
> > > > > > This series has some preparatory patches and Intel cache allocation
> > > > > > support.
> > > > >
> > > > > Ping? What's the status of this series?
> > > >
> > > > We still need to agree on the user space interface which is the
> > > > hardest part of it....
> > >
> > > My understanding is that two interfaces have been proposed: the cgroups
> > > one and an API based on syscalls or ioctls.
> > >
> > > Are those proposals mutual exclusive? What about having the cgroups one
> > > merged IFF it's useful, and having the syscall API later if really
> > > needed?
> > >
> > > I don't want to make the wrong decision, but the cgroups interface is
> > > here. Holding it while we discuss a perfect interface that doesn't
> > > even exist will just do a bad service for users.
> >
> > Well, no. We do not just introduce a random user space ABI simply
> > because we have to support it forever.
>
> I don't think it's random, it's in discussion for a long time and
> Peter seems to be in favor of it.

It does not matter how long it has been under discussion. We have
requests for functionality which cannot be covered with that
interface.

> But I'm all for progress here whatever route we take. In that regard,
> what's your opinion on the best way to move forward?

Talk to the people in your very own company who have a different
opinion and requests for stuff which cannot be handled by the currently
proposed interface. You yourself had a list of things you want to see
handled.

So feel free to come up with patches which implement that instead of
telling us that your company needs it badly for some reason.

Thanks,

tglx