hi, all
when I used memory cgroup in latest mainline, the following error occurred:
# mount -t cgroup -o memory xxx /cgroup/
# ll /cgroup/memory.memsw.*
-rw-r--r--. 1 root root 0 Jun 26 23:17 /cgroup/memory.memsw.failcnt
-rw-r--r--. 1 root root 0 Jun 26 23:17 /cgroup/memory.memsw.limit_in_bytes
-rw-r--r--. 1 root root 0 Jun 26 23:17 /cgroup/memory.memsw.max_usage_in_bytes
-r--r--r--. 1 root root 0 Jun 26 23:17 /cgroup/memory.memsw.usage_in_bytes
# cat /cgroup/memory.memsw.*
cat: /cgroup/memory.memsw.failcnt: Operation not supported
cat: /cgroup/memory.memsw.limit_in_bytes: Operation not supported
cat: /cgroup/memory.memsw.max_usage_in_bytes: Operation not supported
cat: /cgroup/memory.memsw.usage_in_bytes: Operation not supported
I'm confusing why it can't read memory.memsw.* files.
as commit:a42c390cfa0c said, CGROUP_MEM_RES_CTLR_SWAP_ENABLED and
swapaccount kernel parameter control memcg swap accounting,
but I confirmed the two options all don't be set:
# cat /usr/lib/modules/3.5.0-rc4+/source/.config | grep CGROUP_MEM
CONFIG_CGROUP_MEM_RES_CTLR=y
CONFIG_CGROUP_MEM_RES_CTLR_SWAP=y
# CONFIG_CGROUP_MEM_RES_CTLR_SWAP_ENABLED is not set
CONFIG_CGROUP_MEM_RES_CTLR_KMEM=y
# cat /proc/cmdline
BOOT_IMAGE=/vmlinuz-3.5.0-rc4+ root=/dev/mapper/vg_amd--pike--06-lv_root ro rd.lvm.lv=vg_amd-pike-06/lv_swap rd.md=0 LANG=en_US.UTF-8 console=ttyS0,115200n81 KEYTABLE=us SYSFONT=True rd.luks=0 rd.dm=0 rd.lvm.lv=vg_amd-pike-06/lv_root
so I have two problems here:
1. when kernel neither set 'CONFIG_CGROUP_MEM_RES_CTLR_SWAP_ENABLED' nor 'swapaccount' options,
why memcg have memory.memsw.* files ?
2. why we can't read memory.memsw.* ?
Addition info:
when I open CONFIG_CGROUP_MEM_RES_CTLR_SWAP_ENABLED option, the above issues are gone.
also I tested v3.4.0, there aren't the two issues, so please take a look.
--
Thanks,
Zhouping
On Tue 26-06-12 23:49:15, Zhouping Liu wrote:
> hi, all
>
> when I used memory cgroup in latest mainline, the following error occurred:
>
> # mount -t cgroup -o memory xxx /cgroup/
> # ll /cgroup/memory.memsw.*
> -rw-r--r--. 1 root root 0 Jun 26 23:17 /cgroup/memory.memsw.failcnt
> -rw-r--r--. 1 root root 0 Jun 26 23:17 /cgroup/memory.memsw.limit_in_bytes
> -rw-r--r--. 1 root root 0 Jun 26 23:17 /cgroup/memory.memsw.max_usage_in_bytes
> -r--r--r--. 1 root root 0 Jun 26 23:17 /cgroup/memory.memsw.usage_in_bytes
> # cat /cgroup/memory.memsw.*
> cat: /cgroup/memory.memsw.failcnt: Operation not supported
> cat: /cgroup/memory.memsw.limit_in_bytes: Operation not supported
> cat: /cgroup/memory.memsw.max_usage_in_bytes: Operation not supported
> cat: /cgroup/memory.memsw.usage_in_bytes: Operation not supported
>
> I'm confusing why it can't read memory.memsw.* files.
Those files are exported if CONFIG_CGROUP_MEM_RES_CTLR_SWAP=y even
if the feature is turned off when any attempt to open the file returns
EOPNOTSUPP which is exactly what you are seeing.
This is a deliberate decision see: b6d9270d (memcg: always create memsw
files if CONFIG_CGROUP_MEM_RES_CTLR_SWAP).
Does this help to explain your problem? Do you actually see any problem
with this behavior?
Thanks
--
Michal Hocko
SUSE Labs
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9
Czech Republic
On Wed, 27 Jun 2012, Michal Hocko wrote:
> > # mount -t cgroup -o memory xxx /cgroup/
> > # ll /cgroup/memory.memsw.*
> > -rw-r--r--. 1 root root 0 Jun 26 23:17 /cgroup/memory.memsw.failcnt
> > -rw-r--r--. 1 root root 0 Jun 26 23:17 /cgroup/memory.memsw.limit_in_bytes
> > -rw-r--r--. 1 root root 0 Jun 26 23:17 /cgroup/memory.memsw.max_usage_in_bytes
> > -r--r--r--. 1 root root 0 Jun 26 23:17 /cgroup/memory.memsw.usage_in_bytes
> > # cat /cgroup/memory.memsw.*
> > cat: /cgroup/memory.memsw.failcnt: Operation not supported
> > cat: /cgroup/memory.memsw.limit_in_bytes: Operation not supported
> > cat: /cgroup/memory.memsw.max_usage_in_bytes: Operation not supported
> > cat: /cgroup/memory.memsw.usage_in_bytes: Operation not supported
> >
> > I'm confusing why it can't read memory.memsw.* files.
>
> Those files are exported if CONFIG_CGROUP_MEM_RES_CTLR_SWAP=y even
> if the feature is turned off when any attempt to open the file returns
> EOPNOTSUPP which is exactly what you are seeing.
> This is a deliberate decision see: b6d9270d (memcg: always create memsw
> files if CONFIG_CGROUP_MEM_RES_CTLR_SWAP).
>
You mean af36f906c0f4?
> Does this help to explain your problem? Do you actually see any problem
> with this behavior?
>
I think it's a crappy solution and one that is undocumented in
Documentation/cgroups/memory.txt. If you can only enable swap accounting
at boot either via .config or the command line then these files should
never be added for CONFIG_CGROUP_MEM_RES_CTLR_SWAP=n or when
do_swap_account is 0. It's much easier to test if the feature is enabled
by checking for the presence of these files at the memcg mount point
rather than doing an open(2) and checking for -EOPNOTSUPP, which isn't
even a listed error code. I don't care how much cleaner it makes the
internal memcg code.
Hello, Michal, David.
On Wed, Jun 27, 2012 at 01:04:51PM -0700, David Rientjes wrote:
> I think it's a crappy solution and one that is undocumented in
> Documentation/cgroups/memory.txt. If you can only enable swap accounting
> at boot either via .config or the command line then these files should
> never be added for CONFIG_CGROUP_MEM_RES_CTLR_SWAP=n or when
> do_swap_account is 0. It's much easier to test if the feature is enabled
> by checking for the presence of these files at the memcg mount point
> rather than doing an open(2) and checking for -EOPNOTSUPP, which isn't
> even a listed error code. I don't care how much cleaner it makes the
> internal memcg code.
Yeah, it's kinda ugly. Taking a step back, do we really need be able
to configure out memsw? How much vmlinux bloat or runtime overhead
are we talking about? I don't think config options need to be this
granular.
Thanks.
--
tejun
On Wed, 27 Jun 2012, Tejun Heo wrote:
> Yeah, it's kinda ugly. Taking a step back, do we really need be able
> to configure out memsw? How much vmlinux bloat or runtime overhead
> are we talking about? I don't think config options need to be this
> granular.
>
Well it also has a prerequisite that memcg doesn't have: CONFIG_SWAP, so
even if CONFIG_CGROUP_MEM_RES_CTLR_SWAP is folded into
CONFIG_CGROUP_MEM_RES_CTLR, then these should still depend on CONFIG_SWAP
since configuring them would imply there is some limit to be enforced.
But to answer your question:
text data bss dec hex filename
25777 3644 4128 33549 830d memcontrol.o.swap_disabled
27294 4476 4128 35898 8c3a memcontrol.o.swap_enabled
Is it really too painful to not create these files when
CONFIG_CGROUP_MEM_RES_CTLR_SWAP is disabled? If so, can we at least allow
them to be opened but return -EINVAL if memory.memsw.limit_in_bytes is
written?
Hello,
On Wed, Jun 27, 2012 at 01:21:27PM -0700, David Rientjes wrote:
> Well it also has a prerequisite that memcg doesn't have: CONFIG_SWAP, so
Right.
> even if CONFIG_CGROUP_MEM_RES_CTLR_SWAP is folded into
> CONFIG_CGROUP_MEM_RES_CTLR, then these should still depend on CONFIG_SWAP
> since configuring them would imply there is some limit to be enforced.
>
> But to answer your question:
>
> text data bss dec hex filename
> 25777 3644 4128 33549 830d memcontrol.o.swap_disabled
> 27294 4476 4128 35898 8c3a memcontrol.o.swap_enabled
I still wish it's folded into CONFIG_MEMCG and conditionalized just on
CONFIG_SWAP tho.
> Is it really too painful to not create these files when
> CONFIG_CGROUP_MEM_RES_CTLR_SWAP is disabled? If so, can we at least allow
> them to be opened but return -EINVAL if memory.memsw.limit_in_bytes is
> written?
Not at all, that was the first version anyway, which (IIRC) KAME
didn't like and suggested always creating those files. KAME, what do
you think?
Thanks.
--
tejun
On Wed, 27 Jun 2012, Tejun Heo wrote:
> > text data bss dec hex filename
> > 25777 3644 4128 33549 830d memcontrol.o.swap_disabled
> > 27294 4476 4128 35898 8c3a memcontrol.o.swap_enabled
>
> I still wish it's folded into CONFIG_MEMCG and conditionalized just on
> CONFIG_SWAP tho.
>
Agreed.
(2012/06/28 5:24), Tejun Heo wrote:
> Hello,
>
> On Wed, Jun 27, 2012 at 01:21:27PM -0700, David Rientjes wrote:
>> Well it also has a prerequisite that memcg doesn't have: CONFIG_SWAP, so
>
> Right.
>
>> even if CONFIG_CGROUP_MEM_RES_CTLR_SWAP is folded into
>> CONFIG_CGROUP_MEM_RES_CTLR, then these should still depend on CONFIG_SWAP
>> since configuring them would imply there is some limit to be enforced.
>>
>> But to answer your question:
>>
>> text data bss dec hex filename
>> 25777 3644 4128 33549 830d memcontrol.o.swap_disabled
>> 27294 4476 4128 35898 8c3a memcontrol.o.swap_enabled
>
> I still wish it's folded into CONFIG_MEMCG and conditionalized just on
> CONFIG_SWAP tho.
>
In old days, memsw controller was not very stable. So, we devided the config.
And, it makes size of memory for swap-device double (adds 2bytes per swapent.)
That is the problem.
>> Is it really too painful to not create these files when
>> CONFIG_CGROUP_MEM_RES_CTLR_SWAP is disabled? If so, can we at least allow
>> them to be opened but return -EINVAL if memory.memsw.limit_in_bytes is
>> written?
>
> Not at all, that was the first version anyway, which (IIRC) KAME
> didn't like and suggested always creating those files. KAME, what do
> you think?
>
IIRC...at that time, we made decision, cgroup has no feature to
'create files dynamically'. Then, we made it in static, decision was done
at compile time and ignores "do_swap_account".
Now, IIUC, we have the feature. So, it's may be a time to create the file
with regard to "do_swap_account", making decision at boot time.
Thanks,
-Kame
[Adding Kame and Aneesh to CC]
On Wed 27-06-12 13:04:51, David Rientjes wrote:
> On Wed, 27 Jun 2012, Michal Hocko wrote:
>
> > > # mount -t cgroup -o memory xxx /cgroup/
> > > # ll /cgroup/memory.memsw.*
> > > -rw-r--r--. 1 root root 0 Jun 26 23:17 /cgroup/memory.memsw.failcnt
> > > -rw-r--r--. 1 root root 0 Jun 26 23:17 /cgroup/memory.memsw.limit_in_bytes
> > > -rw-r--r--. 1 root root 0 Jun 26 23:17 /cgroup/memory.memsw.max_usage_in_bytes
> > > -r--r--r--. 1 root root 0 Jun 26 23:17 /cgroup/memory.memsw.usage_in_bytes
> > > # cat /cgroup/memory.memsw.*
> > > cat: /cgroup/memory.memsw.failcnt: Operation not supported
> > > cat: /cgroup/memory.memsw.limit_in_bytes: Operation not supported
> > > cat: /cgroup/memory.memsw.max_usage_in_bytes: Operation not supported
> > > cat: /cgroup/memory.memsw.usage_in_bytes: Operation not supported
> > >
> > > I'm confusing why it can't read memory.memsw.* files.
> >
> > Those files are exported if CONFIG_CGROUP_MEM_RES_CTLR_SWAP=y even
> > if the feature is turned off when any attempt to open the file returns
> > EOPNOTSUPP which is exactly what you are seeing.
> > This is a deliberate decision see: b6d9270d (memcg: always create memsw
> > files if CONFIG_CGROUP_MEM_RES_CTLR_SWAP).
> >
>
> You mean af36f906c0f4?
Ahh, right. The other one was from the mm tree. Sorry about the confusion.
> > Does this help to explain your problem? Do you actually see any problem
> > with this behavior?
> >
>
> I think it's a crappy solution and one that is undocumented in
> Documentation/cgroups/memory.txt.
Yes the documentation part is really missing. I don't think the current
state is ideal as well...
> If you can only enable swap accounting at boot either via .config
> or the command line then these files should never be added for
> CONFIG_CGROUP_MEM_RES_CTLR_SWAP=n or when do_swap_account is 0.
Yes, I think we can enhance the internal implementation to support
configurable files (hugetlb controler would benefit from it as well
because the exported files depend on the supported/configured huge page
sizes). What about something like (totally untested) patch bellow? If
this sounds like a reasonable thing to support I can spin a regular
patch...
---
diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
index d3f5fba..3fc7859 100644
--- a/include/linux/cgroup.h
+++ b/include/linux/cgroup.h
@@ -527,6 +527,7 @@ struct cgroup_subsys {
/* base cftypes, automatically [de]registered with subsys itself */
struct cftype *base_cftypes;
+ bool (*cftype_enabled)(const char *name);
struct cftype_set base_cftset;
/* should be defined only by modular subsystems */
diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index 0f3527d..0d1a25d 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -2726,6 +2726,9 @@ static int cgroup_addrm_files(struct cgroup *cgrp, struct cgroup_subsys *subsys,
int err, ret = 0;
for (cft = cfts; cft->name[0] != '\0'; cft++) {
+ if (subsys->cftype_enabled && !subsys->cftype_enabled(cft->name))
+ continue;
+
if (is_add)
err = cgroup_add_file(cgrp, subsys, cft);
else
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index a2677e0..45b65ba 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -72,6 +72,13 @@ static int really_do_swap_account __initdata = 1;
static int really_do_swap_account __initdata = 0;
#endif
+bool mem_cgroup_file_enabled(const char *name)
+{
+ if (!strncmp("memsw.", name, 6))
+ return do_swap_account;
+ return true;
+}
+
#else
#define do_swap_account 0
#endif
@@ -5521,6 +5528,9 @@ struct cgroup_subsys mem_cgroup_subsys = {
.cancel_attach = mem_cgroup_cancel_attach,
.attach = mem_cgroup_move_task,
.base_cftypes = mem_cgroup_files,
+#ifdef CONFIG_CGROUP_MEM_RES_CTLR_SWAP
+ .cftype_enabled = mem_cgroup_file_enabled,
+#endif
.early_init = 0,
.use_id = 1,
.__DEPRECATED_clear_css_refs = true,
--
Michal Hocko
SUSE Labs
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9
Czech Republic
Hello, Michal.
On Thu, Jun 28, 2012 at 02:36:11PM +0200, Michal Hocko wrote:
> @@ -2726,6 +2726,9 @@ static int cgroup_addrm_files(struct cgroup *cgrp, struct cgroup_subsys *subsys,
> int err, ret = 0;
>
> for (cft = cfts; cft->name[0] != '\0'; cft++) {
> + if (subsys->cftype_enabled && !subsys->cftype_enabled(cft->name))
> + continue;
> +
> if (is_add)
> err = cgroup_add_file(cgrp, subsys, cft);
> else
I hope we could avoid this dynamic decision. That was one of the main
reasons behind doing the cftype thing. It's better to be able to
"declare" these kind of things rather than being able to implement
fully flexible dynamic logic. Too much flexibility often doesn't
achieve much while being a hindrance to evolution of code base (trying
to improve / simplify X - ooh... there's this single wacko corner case
YYY here which is really different from all other users).
really_do_swap_account can't change once booted, right? Why not just
separate out memsw cfts into a separate array and call
cgroup_add_cftypes() from init path? Can't we do that from
enable_swap_cgroup()?
Thanks.
--
tejun
Hello, KAME.
On Thu, Jun 28, 2012 at 01:04:16PM +0900, Kamezawa Hiroyuki wrote:
> >I still wish it's folded into CONFIG_MEMCG and conditionalized just on
> >CONFIG_SWAP tho.
> >
>
> In old days, memsw controller was not very stable. So, we devided the config.
> And, it makes size of memory for swap-device double (adds 2bytes per swapent.)
> That is the problem.
I see. Do you think it's now reasonable to drop the separate config
option? Having memcg enabled but swap unaccounted sounds half-broken
to me.
> IIRC...at that time, we made decision, cgroup has no feature to
> 'create files dynamically'. Then, we made it in static, decision was done
> at compile time and ignores "do_swap_account".
>
> Now, IIUC, we have the feature. So, it's may be a time to create the file
> with regard to "do_swap_account", making decision at boot time.
Heh, yeah, maybe I'm confused about how it happened. Anyways, let's
get it fixed.
Thanks!
--
tejun
(2012/06/29 3:29), Tejun Heo wrote:
> Hello, Michal.
>
> On Thu, Jun 28, 2012 at 02:36:11PM +0200, Michal Hocko wrote:
>> @@ -2726,6 +2726,9 @@ static int cgroup_addrm_files(struct cgroup *cgrp, struct cgroup_subsys *subsys,
>> int err, ret = 0;
>>
>> for (cft = cfts; cft->name[0] != '\0'; cft++) {
>> + if (subsys->cftype_enabled && !subsys->cftype_enabled(cft->name))
>> + continue;
>> +
>> if (is_add)
>> err = cgroup_add_file(cgrp, subsys, cft);
>> else
>
> I hope we could avoid this dynamic decision. That was one of the main
> reasons behind doing the cftype thing. It's better to be able to
> "declare" these kind of things rather than being able to implement
> fully flexible dynamic logic. Too much flexibility often doesn't
> achieve much while being a hindrance to evolution of code base (trying
> to improve / simplify X - ooh... there's this single wacko corner case
> YYY here which is really different from all other users).
>
> really_do_swap_account can't change once booted, right? Why not just
> separate out memsw cfts into a separate array and call
> cgroup_add_cftypes() from init path? Can't we do that from
> enable_swap_cgroup()?
>
Yes, that's will be good.
Thanks,
-Kame
On 06/28/2012 08:04 AM, Kamezawa Hiroyuki wrote:
>>
>> I still wish it's folded into CONFIG_MEMCG and conditionalized just on
>> CONFIG_SWAP tho.
>>
>
> In old days, memsw controller was not very stable. So, we devided the
> config.
> And, it makes size of memory for swap-device double (adds 2bytes per
> swapent.)
> That is the problem.
That's the tendency to happen with anything new, since we want to add it
without disrupting what's already in there. I am not very fond of config
options explosions myself, so I am for removing it.
(2012/06/29 3:31), Tejun Heo wrote:
> Hello, KAME.
>
> On Thu, Jun 28, 2012 at 01:04:16PM +0900, Kamezawa Hiroyuki wrote:
>>> I still wish it's folded into CONFIG_MEMCG and conditionalized just on
>>> CONFIG_SWAP tho.
>>>
>>
>> In old days, memsw controller was not very stable. So, we devided the config.
>> And, it makes size of memory for swap-device double (adds 2bytes per swapent.)
>> That is the problem.
>
> I see. Do you think it's now reasonable to drop the separate config
> option? Having memcg enabled but swap unaccounted sounds half-broken
> to me.
>
Hmm. Maybe it's ok if we can keep boot option. I'll cook a patch in the next week.
Thanks,
-Kame