2010-04-01 04:53:38

by Balbir Singh

Subject: Re: RFC [Patch] Remove "please try 'cgroup_disable=memory' option if you don't want memory cgroups" printk at boot time.

* KAMEZAWA Hiroyuki <[email protected]> [2010-04-01 10:48:59]:

> On Wed, 31 Mar 2010 13:57:46 -0400
> Rik van Riel <[email protected]> wrote:
>
> > On 03/31/2010 11:54 AM, Larry Woodman wrote:
> > > On Wed, 2010-03-31 at 11:28 -0400, Larry Woodman wrote:
> > >> We are considering removing this printk at boot time from RHEL because
> > >> it will confuse customers, encourage them to change the boot parameters
> > >> and generate extraneous support calls. It's documented in
> > >> Documentation/kernel-parameters.txt anyway. Any thoughts???
> >
> > Yeah, that is a strange boot message...
> >
> > Acked-by: Rik van Riel <[email protected]>
> >
> please CC linux-mm and maintainers.
>
> Acked-by: KAMEZAWA Hiroyuki <[email protected]>
>
> It has been there for a year, and I think memory usage by page_cgroup
> will no longer surprise Linux kernel users.
>
> Assume x86-32.
>
> RHEL allows up to 16G of memory, right?
>
> without memcg: memmap uses 32 bytes * 16G/4k = 128M.
> with memcg: memmap+page_cgroup uses (32+20) bytes * 16G/4k = 208M.
>
> I thought this might cause OOM in ZONE_NORMAL, so I added the message when I
> wrote the original patch. This kind of memory eater can cause trouble when it
> pops up suddenly. But I think 'one year' can be an excuse.
>

I've seen this issue come up on multiple machines, so I think the printk
is useful. However, we might need to change the panic() to a big fat
warning and disable the memcg controller if we fail to allocate memory
in page_cgroup_init_flatmem().

--
Three Cheers,
Balbir
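The overhead figures quoted in this message (32 bytes of memmap per 4k page, plus 20 bytes of page_cgroup per page, on a 16G x86-32 machine) can be checked with a few lines of C. This is only a sketch of the arithmetic: the function name `per_page_overhead` is made up for illustration, and the per-page sizes are the ones given in the thread, not values read from a kernel.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Total bytes of per-page metadata for a machine with mem_bytes of RAM,
 * given the page size and the metadata cost per page.
 * Hypothetical helper, not a kernel function.
 */
static uint64_t per_page_overhead(uint64_t mem_bytes, uint64_t page_size,
                                  uint64_t bytes_per_page)
{
    assert(page_size != 0);
    return (mem_bytes / page_size) * bytes_per_page;
}
```

With 16G of memory and 4k pages there are 4194304 pages, so `per_page_overhead(16ULL << 30, 4096, 32)` gives 128M and `per_page_overhead(16ULL << 30, 4096, 32 + 20)` gives 208M, matching the numbers above.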


2010-04-07 07:59:53

by Heiko Carstens

Subject: Re: RFC [Patch] Remove "please try 'cgroup_disable=memory' option if you don't want memory cgroups" printk at boot time.

On Thu, Apr 01, 2010 at 10:23:10AM +0530, Balbir Singh wrote:
> * KAMEZAWA Hiroyuki <[email protected]> [2010-04-01 10:48:59]:
>
> > On Wed, 31 Mar 2010 13:57:46 -0400
> > Rik van Riel <[email protected]> wrote:
> >
> > > On 03/31/2010 11:54 AM, Larry Woodman wrote:
> > > > On Wed, 2010-03-31 at 11:28 -0400, Larry Woodman wrote:
> > > >> We are considering removing this printk at boot time from RHEL because
> > > >> it will confuse customers, encourage them to change the boot parameters
> > > >> and generate extraneous support calls. It's documented in
> > > >> Documentation/kernel-parameters.txt anyway. Any thoughts???
> > >
> > > Yeah, that is a strange boot message...
> > >
> > > Acked-by: Rik van Riel <[email protected]>
> > >
> > please CC linux-mm and maintainers.
> >
> > Acked-by: KAMEZAWA Hiroyuki <[email protected]>
> >
> > It has been there for a year, and I think memory usage by page_cgroup
> > will no longer surprise Linux kernel users.
> >
> > Assume x86-32.
> >
> > RHEL allows up to 16G of memory, right?
> >
> > without memcg: memmap uses 32 bytes * 16G/4k = 128M.
> > with memcg: memmap+page_cgroup uses (32+20) bytes * 16G/4k = 208M.
> >
> > I thought this might cause OOM in ZONE_NORMAL, so I added the message when I
> > wrote the original patch. This kind of memory eater can cause trouble when it
> > pops up suddenly. But I think 'one year' can be an excuse.
> >
>
> I've seen this issue come up on multiple machines, so I think the printk
> is useful. However, we might need to change the panic() to a big fat
> warning and disable the memcg controller if we fail to allocate memory
> in page_cgroup_init_flatmem().

Probably a stupid question: but isn't it possible to allocate the huge
amounts of memory only if somebody activates memcg during runtime?
And then allocate everything using vmalloc?
But that probably doesn't work, since you need to record everything
from the boot of the system, I would guess?
Just wondering because we do everything to not even waste a single bit
in struct page and all of a sudden on the enterprise distros we allocate
(by default!) 40 additional bytes per page.

2010-04-07 08:15:11

by Kamezawa Hiroyuki

Subject: Re: RFC [Patch] Remove "please try 'cgroup_disable=memory' option if you don't want memory cgroups" printk at boot time.

On Wed, 7 Apr 2010 10:00:14 +0200
Heiko Carstens <[email protected]> wrote:

> On Thu, Apr 01, 2010 at 10:23:10AM +0530, Balbir Singh wrote:
> > * KAMEZAWA Hiroyuki <[email protected]> [2010-04-01 10:48:59]:
> > I've seen this issue come up on multiple machines, so I think the printk
> > is useful. However, we might need to change the panic() to a big fat
> > warning and disable the memcg controller if we fail to allocate memory
> > in page_cgroup_init_flatmem().
>
> Probably a stupid question: but isn't it possible to allocate the huge
> amounts of memory only if somebody activates memcg during runtime?

Activation can occur only at boot but page_cgroup allocation happens at
memory hotplug.

> And then allocate everything using vmalloc?
No.

> But that probably doesn't work, since you need to record everything
> from the boot of the system, I would guess?

The story goes like this:

1. At first, page_cgroup was allocated on demand, but that required a
page->page_cgroup pointer. So we paid 8 bytes per page even when the
memory cgroup was disabled.
All page behavior was tracked from boot time.

2. A Fedora maintainer said "we will never enable memcg if you continue to use
the page->page_cgroup pointer; 8 bytes per page is too costly!".
So we decided to allocate page_cgroup all at once at boot time. This
makes the memcg runtime robust, and we got rid of the
page->page_cgroup pointer.
Users of cgroup_disable=memory waste no memory now.

> Just wondering because we do everything to not even waste a single bit
> in struct page and all of a sudden on the enterprise distros we allocate
> (by default!) 40 additional bytes per page.

3. Then I added the warning when I wrote the patch to allocate page_cgroup at boot.
It's easy to avoid the extra 40 bytes.
For enterprise, I have no concern. Enterprise admins tend to be careful and
check all default values when they use a new kernel.
That message was for desktop users running a desktop distro.

Disabling the memory cgroup by default may be an option, but no one has sent
such a patch so far.

Thanks,
-Kame
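The design described in this message, a table allocated once at boot and indexed by page frame number instead of a pointer stored in every struct page, can be illustrated with a small userspace sketch. It also folds in the earlier suggestion to warn and disable the controller, rather than panic, when the allocation fails. Everything here is an assumption for illustration: `pc_entry`, `pc_table`, `pc_table_init`, `pc_lookup` and `pc_table_free` are hypothetical names, `calloc` stands in for the boot-time allocator, and none of this is the kernel's actual page_cgroup code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for the per-page accounting record. */
struct pc_entry {
    void *owner;          /* group charged for this page (illustrative) */
    unsigned long flags;
};

/* One flat table for all pages; nothing is stored in the page itself. */
struct pc_table {
    struct pc_entry *entries; /* NULL while accounting is disabled */
    size_t nr_pages;
    bool enabled;
};

/*
 * Allocate the whole table up front. If the user disabled accounting,
 * no memory is used. If the allocation fails, print a warning and leave
 * accounting disabled instead of aborting. Returns true when enabled.
 */
static bool pc_table_init(struct pc_table *t, size_t nr_pages,
                          bool disabled_by_user)
{
    assert(t != NULL);
    t->entries = NULL;
    t->nr_pages = 0;
    t->enabled = false;

    if (disabled_by_user || nr_pages == 0)
        return false;

    t->entries = calloc(nr_pages, sizeof(*t->entries));
    if (t->entries == NULL) {
        fprintf(stderr, "warning: table allocation failed, accounting disabled\n");
        return false;
    }
    t->nr_pages = nr_pages;
    t->enabled = true;
    return true;
}

/* Look up the record for a page frame number; NULL when disabled. */
static struct pc_entry *pc_lookup(const struct pc_table *t, size_t pfn)
{
    if (!t->enabled || pfn >= t->nr_pages)
        return NULL;
    return &t->entries[pfn];
}

static void pc_table_free(struct pc_table *t)
{
    free(t->entries);
    t->entries = NULL;
    t->nr_pages = 0;
    t->enabled = false;
}
```

The trade-off matches the thread: the lookup needs no per-page pointer, so a disabled controller costs nothing, but an enabled one pays for every page from boot whether or not any group is ever created.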

2010-04-07 08:32:49

by Heiko Carstens

Subject: Re: RFC [Patch] Remove "please try 'cgroup_disable=memory' option if you don't want memory cgroups" printk at boot time.

On Wed, Apr 07, 2010 at 05:11:13PM +0900, KAMEZAWA Hiroyuki wrote:
> On Wed, 7 Apr 2010 10:00:14 +0200
> Heiko Carstens <[email protected]> wrote:
> > Probably a stupid question: but isn't it possible to allocate the huge
> > amounts of memory only if somebody activates memcg during runtime?
>
> Activation can occur only at boot but page_cgroup allocation happens at
> memory hotplug.
>
> > And then allocate everything using vmalloc?
> No.
>
> > But that probably doesn't work, since you need to record everything
> > from the boot of the system, I would guess?
>
> The story goes like this:
>
> 1. At first, page_cgroup was allocated on demand, but that required a
> page->page_cgroup pointer. So we paid 8 bytes per page even when the
> memory cgroup was disabled.
> All page behavior was tracked from boot time.
>
> 2. A Fedora maintainer said "we will never enable memcg if you continue to use
> the page->page_cgroup pointer; 8 bytes per page is too costly!".
> So we decided to allocate page_cgroup all at once at boot time. This
> makes the memcg runtime robust, and we got rid of the
> page->page_cgroup pointer.
> Users of cgroup_disable=memory waste no memory now.
>
> > Just wondering because we do everything to not even waste a single bit
> > in struct page and all of a sudden on the enterprise distros we allocate
> > (by default!) 40 additional bytes per page.
>
> 3. Then I added the warning when I wrote the patch to allocate page_cgroup at boot.
> It's easy to avoid the extra 40 bytes.
> For enterprise, I have no concern. Enterprise admins tend to be careful and
> check all default values when they use a new kernel.
> That message was for desktop users running a desktop distro.
>
> Disabling the memory cgroup by default may be an option, but no one has sent
> such a patch so far.

Thanks for explaining! Looks like whatever you do there's always somebody
who complains.
Distros _could_ ship with memcg by default off independently of the upstream
kernel. But it looks like they won't. Anyway that's a problem we can't solve
here.