2012-10-01 08:48:43

by Glauber Costa

Subject: Re: [PATCH v3 04/13] kmem accounting basic infrastructure

On 09/30/2012 12:23 PM, Tejun Heo wrote:
> Hello, Glauber.
>
> On Thu, Sep 27, 2012 at 10:30:36PM +0400, Glauber Costa wrote:
>>> But that happens only when pages enter and leave slab, and if it is
>>> still significant, we can try to further optimize charging. Given that
>>> this applies only when memcg is already in use, and we provide a
>>> switch to disable it globally, I really don't think this warrants
>>> implementing a fully hierarchical configuration.
>>
>> Not totally true. We still have to match every allocation to the right
>> cache, and that is actually our heaviest hit, responsible for the 2-3%
>> overhead we're seeing when this is enabled. It is the kind of path so hot
>> that people frown upon adding branches, so I don't think we'll ever get
>> it close to being free.
>
> Sure, depending on the workload, any addition to alloc/free could be
> noticeable. I don't know. I'll write more about it when replying to
> Michal's message. BTW, __memcg_kmem_get_cache() does seem a bit
> heavy. I wonder whether indexing from the cache side would make it
> cheaper? E.g. something like the following.
>
> struct kmem_cache *__memcg_kmem_get_cache(struct kmem_cache *cachep,
>                                           gfp_t gfp)
> {
>         struct kmem_cache *c;
>
>         c = cachep->memcg_params->caches[percpu_read(kmemcg_slab_idx)];
>         if (likely(c))
>                 return c;
>         /* try to create and then fall back to cachep */
> }
>
> where kmemcg_slab_idx is updated from a sched notifier (or maybe add and
> use current->kmemcg_slab_idx?). You would still need the __GFP_* and
> in_interrupt() tests, but the current->mm and PF_KTHREAD tests can be
> rolled into index selection.
>

How big would this array be? There can be a lot more kmem_caches than
there are memcgs. That is why it is done from the memcg side.
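
For illustration, here is a minimal sketch of how the per-cpu index in
Tejun's snippet might be kept current from a sched notifier. The
preempt-notifier hookup, the kmemcg_slab_idx variable, and the
memcg_cache_index() helper are assumptions for the sketch, not code from
the patchset:

#include <linux/percpu.h>
#include <linux/preempt.h>
#include <linux/sched.h>

static DEFINE_PER_CPU(int, kmemcg_slab_idx);

/* Runs every time a task is scheduled in on this CPU. */
static void kmemcg_sched_in(struct preempt_notifier *pn, int cpu)
{
        struct mem_cgroup *memcg;

        /*
         * Kernel threads and tasks without an mm are never accounted;
         * slot 0 means "use the parent cache directly".
         */
        if ((current->flags & PF_KTHREAD) || !current->mm) {
                __this_cpu_write(kmemcg_slab_idx, 0);
                return;
        }

        memcg = mem_cgroup_from_task(current);
        /* memcg_cache_index() is a made-up helper mapping memcg -> slot. */
        __this_cpu_write(kmemcg_slab_idx, memcg_cache_index(memcg));
}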


2012-10-03 22:55:09

by Tejun Heo

Subject: Re: [PATCH v3 04/13] kmem accounting basic infrastructure

Hello, Glauber.

On Mon, Oct 01, 2012 at 12:45:11PM +0400, Glauber Costa wrote:
> > where kmemcg_slab_idx is updated from a sched notifier (or maybe add and
> > use current->kmemcg_slab_idx?). You would still need the __GFP_* and
> > in_interrupt() tests, but the current->mm and PF_KTHREAD tests can be
> > rolled into index selection.
>
> How big would this array be? There can be a lot more kmem_caches than
> there are memcgs. That is why it is done from the memcg side.

The total number of memcgs is pretty limited due to the ID thing,
right? And kmemcg is only applied to a subset of caches. I don't think
the array size would be a problem in terms of memory overhead, would
it? If so, RCU synchronize and dynamically grow them?

Thanks.

--
tejun
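
For illustration, one possible shape of the "RCU synchronize and
dynamically grow" idea; struct memcg_cache_array, the memcg_caches field
on kmem_cache, and the slab_mutex locking assumption are all made up for
the sketch:

#include <linux/rcupdate.h>
#include <linux/slab.h>
#include <linux/string.h>

/* Made-up layout: one slot per memcg index, size tracked alongside. */
struct memcg_cache_array {
        int size;
        struct kmem_cache *caches[];
};

/* Grow cachep's per-memcg array; assumes the caller holds slab_mutex. */
static int memcg_grow_cache_array(struct kmem_cache *cachep, int new_size)
{
        struct memcg_cache_array *old, *new;

        new = kzalloc(sizeof(*new) + new_size * sizeof(new->caches[0]),
                      GFP_KERNEL);
        if (!new)
                return -ENOMEM;
        new->size = new_size;

        old = rcu_dereference_protected(cachep->memcg_caches,
                                        lockdep_is_held(&slab_mutex));
        if (old)
                memcpy(new->caches, old->caches,
                       old->size * sizeof(old->caches[0]));

        rcu_assign_pointer(cachep->memcg_caches, new);
        synchronize_rcu();      /* wait for lockless readers of the old array */
        kfree(old);
        return 0;
}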

2012-10-04 11:58:53

by Glauber Costa

Subject: Re: [PATCH v3 04/13] kmem accounting basic infrastructure

On 10/04/2012 02:54 AM, Tejun Heo wrote:
> Hello, Glauber.
>
> On Mon, Oct 01, 2012 at 12:45:11PM +0400, Glauber Costa wrote:
>>> where kmemcg_slab_idx is updated from a sched notifier (or maybe add and
>>> use current->kmemcg_slab_idx?). You would still need the __GFP_* and
>>> in_interrupt() tests, but the current->mm and PF_KTHREAD tests can be
>>> rolled into index selection.
>>
>> How big would this array be? There can be a lot more kmem_caches than
>> there are memcgs. That is why it is done from the memcg side.
>
> The total number of memcgs is pretty limited due to the ID thing,
> right? And kmemcg is only applied to a subset of caches. I don't think
> the array size would be a problem in terms of memory overhead, would
> it? If so, RCU synchronize and dynamically grow them?
>
> Thanks.
>

I don't want to assume the number of memcgs will always be that limited.
Sure, the ID limitation sounds like a big one, but people running VMs
usually want to pack as many of them as they possibly can into an
environment, and the fewer things standing in the way of that, the better.

That said, now that I've experimented with this a bit, indexing from the
cache side may have some advantages: it can get quite complicated to
propagate newly appearing caches to all the memcgs already in flight. We
don't have this problem on the cache side, because per-memcg instances of
a new cache are guaranteed not to exist yet, by definition.

I don't want to bloat unrelated kmem_cache structures, so I can't embed
a memcg array in there: I would have to use a pointer to a memcg array
that gets assigned at first use. But if we don't want a static size,
which you and Christoph have already frowned upon heavily, we may have
to do the same on the memcg side as well.

The array gets bigger, though, because it pretty much has to be large
enough to accommodate all css_ids. Even now, there are more of them than
the 400 I used in this patchset. Not allocating all of them at once
would lead to more complication and pointer chasing here.

I'll take a look at the alternatives today and tomorrow.
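
A sketch of the "pointer assigned at first use" idea described above;
the field and helper names are hypothetical, following the earlier
sketch, and the fast path assumes the caller is inside an RCU read-side
section:

/*
 * kmem_cache itself would carry only one extra pointer,
 *
 *      struct memcg_cache_array __rcu *memcg_caches;
 *
 * allocated the first time any memcg allocates from the cache.
 */
static struct memcg_cache_array *memcg_caches_get(struct kmem_cache *cachep,
                                                  int nr_ids)
{
        struct memcg_cache_array *arr, *new;

        arr = rcu_dereference(cachep->memcg_caches);
        if (likely(arr))
                return arr;

        new = kzalloc(sizeof(*new) + nr_ids * sizeof(new->caches[0]),
                      GFP_KERNEL);
        if (!new)
                return NULL;
        new->size = nr_ids;

        /* Lost a race with another first user?  Use the winner's array. */
        if (cmpxchg(&cachep->memcg_caches, NULL, new) != NULL) {
                kfree(new);
                arr = rcu_dereference(cachep->memcg_caches);
                return arr;
        }
        return new;
}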

2012-10-06 02:19:31

by Tejun Heo

Subject: Re: [PATCH v3 04/13] kmem accounting basic infrastructure

Hello, Glauber.

On Thu, Oct 04, 2012 at 03:55:14PM +0400, Glauber Costa wrote:
> I don't want to bloat unrelated kmem_cache structures, so I can't embed
> a memcg array in there: I would have to use a pointer to a memcg array
> that gets assigned at first use. But if we don't want a static size,
> which you and Christoph have already frowned upon heavily, we may have
> to do the same on the memcg side as well.
>
> The array gets bigger, though, because it pretty much has to be large
> enough to accommodate all css_ids. Even now, there are more of them than
> the 400 I used in this patchset. Not allocating all of them at once
> would lead to more complication and pointer chasing here.

I don't think it would require more pointer chasing. At the simplest,
we can just compare the array size each time. If you want to be more
efficient, all the arrays can be kept at the same size and resized when
the number of memcgs crosses the current size. The only runtime
overhead would be one pointer deref, which I don't think can be avoided
regardless of the indexing direction.

Thanks.

--
tejun
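
A sketch of the lookup with the "compare the array size each time" check
Tejun describes: one pointer dereference plus a bounds test, falling
back to the parent cache. Names follow the earlier sketches and are
illustrative only:

static struct kmem_cache *memcg_cache_lookup(struct kmem_cache *cachep,
                                             int idx)
{
        struct memcg_cache_array *arr;
        struct kmem_cache *c = NULL;

        rcu_read_lock();
        arr = rcu_dereference(cachep->memcg_caches);
        if (arr && idx < arr->size)     /* array may lag behind new memcgs */
                c = arr->caches[idx];
        rcu_read_unlock();

        return c ? c : cachep;          /* fall back to the parent cache */
}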