2016-04-06 01:25:17

by David Rientjes

Subject: [patch] mm, hugetlb_cgroup: round limit_in_bytes down to hugepage size

The page_counter rounds limits down to page size values. This makes
sense, except in the case of hugetlb_cgroup where it's not possible to
charge partial hugepages.

Round the hugetlb_cgroup limit down to hugepage size.

Signed-off-by: David Rientjes <[email protected]>
---
mm/hugetlb_cgroup.c | 1 +
1 file changed, 1 insertion(+)

diff --git a/mm/hugetlb_cgroup.c b/mm/hugetlb_cgroup.c
--- a/mm/hugetlb_cgroup.c
+++ b/mm/hugetlb_cgroup.c
@@ -288,6 +288,7 @@ static ssize_t hugetlb_cgroup_write(struct kernfs_open_file *of,

switch (MEMFILE_ATTR(of_cft(of)->private)) {
case RES_LIMIT:
+ nr_pages &= ~((1 << huge_page_order(&hstates[idx])) - 1);
mutex_lock(&hugetlb_limit_mutex);
ret = page_counter_limit(&h_cg->hugepage[idx], nr_pages);
mutex_unlock(&hugetlb_limit_mutex);
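
For illustration only: a hugepage of order N spans a power-of-two number of
base pages, so the mask above simply drops any partial hugepage from the
requested limit. A minimal userspace sketch with hypothetical values (4KB
base pages, 2MB hugepages, i.e. order 9):

#include <stdio.h>

int main(void)
{
	/* a 5MB limit expressed in 4KB base pages */
	unsigned long nr_pages = 1280;
	unsigned int order = 9;		/* 2MB hugepage = 1 << 9 base pages */

	/* same mask as the patch: keep only whole hugepages */
	nr_pages &= ~((1UL << order) - 1);

	printf("%lu\n", nr_pages);	/* prints 1024, i.e. 4MB */
	return 0;
}

The 1MB above 4MB could never be charged anyway, since hugetlb_cgroup
charges a whole hugepage at a time.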


2016-04-06 07:26:59

by Nikolay Borisov

Subject: Re: [patch] mm, hugetlb_cgroup: round limit_in_bytes down to hugepage size



On 04/06/2016 04:25 AM, David Rientjes wrote:
> The page_counter rounds limits down to page size values. This makes
> sense, except in the case of hugetlb_cgroup where it's not possible to
> charge partial hugepages.
>
> Round the hugetlb_cgroup limit down to hugepage size.
>
> Signed-off-by: David Rientjes <[email protected]>
> ---
> mm/hugetlb_cgroup.c | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/mm/hugetlb_cgroup.c b/mm/hugetlb_cgroup.c
> --- a/mm/hugetlb_cgroup.c
> +++ b/mm/hugetlb_cgroup.c
> @@ -288,6 +288,7 @@ static ssize_t hugetlb_cgroup_write(struct kernfs_open_file *of,
>
> switch (MEMFILE_ATTR(of_cft(of)->private)) {
> case RES_LIMIT:
> + nr_pages &= ~((1 << huge_page_order(&hstates[idx])) - 1);

Why not:

nr_pages = round_down(nr_pages, huge_page_order(&hstates[idx]));


> mutex_lock(&hugetlb_limit_mutex);
> ret = page_counter_limit(&h_cg->hugepage[idx], nr_pages);
> mutex_unlock(&hugetlb_limit_mutex);
>

2016-04-06 07:33:25

by Nikolay Borisov

Subject: Re: [patch] mm, hugetlb_cgroup: round limit_in_bytes down to hugepage size



On 04/06/2016 10:26 AM, Nikolay Borisov wrote:
>
>
> On 04/06/2016 04:25 AM, David Rientjes wrote:
>> The page_counter rounds limits down to page size values. This makes
>> sense, except in the case of hugetlb_cgroup where it's not possible to
>> charge partial hugepages.
>>
>> Round the hugetlb_cgroup limit down to hugepage size.
>>
>> Signed-off-by: David Rientjes <[email protected]>
>> ---
>> mm/hugetlb_cgroup.c | 1 +
>> 1 file changed, 1 insertion(+)
>>
>> diff --git a/mm/hugetlb_cgroup.c b/mm/hugetlb_cgroup.c
>> --- a/mm/hugetlb_cgroup.c
>> +++ b/mm/hugetlb_cgroup.c
>> @@ -288,6 +288,7 @@ static ssize_t hugetlb_cgroup_write(struct kernfs_open_file *of,
>>
>> switch (MEMFILE_ATTR(of_cft(of)->private)) {
>> case RES_LIMIT:
>> + nr_pages &= ~((1 << huge_page_order(&hstates[idx])) - 1);
>
> Why not:
>
> nr_pages = round_down(nr_pages, huge_page_order(&hstates[idx]));

Oops, that should be:

round_down(nr_pages, 1 << huge_page_order(&hstates[idx]));
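
For a power-of-two hugepage size the two forms end up identical anyway:
round_down() in include/linux/kernel.h is essentially

	#define __round_mask(x, y)	((__typeof__(x))((y)-1))
	#define round_down(x, y)	((x) & ~__round_mask(x, y))

so round_down(nr_pages, 1 << huge_page_order(&hstates[idx])) expands to the
same mask as the original patch; it just reads more clearly.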

>
>
>> mutex_lock(&hugetlb_limit_mutex);
>> ret = page_counter_limit(&h_cg->hugepage[idx], nr_pages);
>> mutex_unlock(&hugetlb_limit_mutex);
>>

2016-04-06 09:09:58

by Michal Hocko

Subject: Re: [patch] mm, hugetlb_cgroup: round limit_in_bytes down to hugepage size

On Wed 06-04-16 10:33:19, Nikolay Borisov wrote:
>
>
> On 04/06/2016 10:26 AM, Nikolay Borisov wrote:
> >
> >
> > On 04/06/2016 04:25 AM, David Rientjes wrote:
> >> The page_counter rounds limits down to page size values. This makes
> >> sense, except in the case of hugetlb_cgroup where it's not possible to
> >> charge partial hugepages.
> >>
> >> Round the hugetlb_cgroup limit down to hugepage size.
> >>
> >> Signed-off-by: David Rientjes <[email protected]>
> >> ---
> >> mm/hugetlb_cgroup.c | 1 +
> >> 1 file changed, 1 insertion(+)
> >>
> >> diff --git a/mm/hugetlb_cgroup.c b/mm/hugetlb_cgroup.c
> >> --- a/mm/hugetlb_cgroup.c
> >> +++ b/mm/hugetlb_cgroup.c
> >> @@ -288,6 +288,7 @@ static ssize_t hugetlb_cgroup_write(struct kernfs_open_file *of,
> >>
> >> switch (MEMFILE_ATTR(of_cft(of)->private)) {
> >> case RES_LIMIT:
> >> + nr_pages &= ~((1 << huge_page_order(&hstates[idx])) - 1);
> >
> > Why not:
> >
> > nr_pages = round_down(nr_pages, huge_page_order(&hstates[idx]));
>
> Oops, that should be:
>
> round_down(nr_pages, 1 << huge_page_order(&hstates[idx]));

round_down is a bit nicer.

Anyway
Acked-by: Michal Hocko <[email protected]>

--
Michal Hocko
SUSE Labs

2016-04-06 22:10:28

by David Rientjes

Subject: [patch v2] mm, hugetlb_cgroup: round limit_in_bytes down to hugepage size

The page_counter rounds limits down to page size values. This makes
sense, except in the case of hugetlb_cgroup where it's not possible to
charge partial hugepages.

Round the hugetlb_cgroup limit down to hugepage size.

Signed-off-by: David Rientjes <[email protected]>
---
mm/hugetlb_cgroup.c | 35 ++++++++++++++++++++++++++---------
1 file changed, 26 insertions(+), 9 deletions(-)

diff --git a/mm/hugetlb_cgroup.c b/mm/hugetlb_cgroup.c
--- a/mm/hugetlb_cgroup.c
+++ b/mm/hugetlb_cgroup.c
@@ -67,26 +67,42 @@ static inline bool hugetlb_cgroup_have_usage(struct hugetlb_cgroup *h_cg)
return false;
}

+static void hugetlb_cgroup_init(struct hugetlb_cgroup *h_cgroup,
+ struct hugetlb_cgroup *parent_h_cgroup)
+{
+ int idx;
+
+ for (idx = 0; idx < HUGE_MAX_HSTATE; idx++) {
+ struct page_counter *counter = &h_cgroup->hugepage[idx];
+ struct page_counter *parent = NULL;
+ unsigned long limit;
+ int ret;
+
+ if (parent_h_cgroup)
+ parent = &parent_h_cgroup->hugepage[idx];
+ page_counter_init(counter, parent);
+
+ limit = round_down(PAGE_COUNTER_MAX,
+ 1 << huge_page_order(&hstates[idx]));
+ ret = page_counter_limit(counter, limit);
+ VM_BUG_ON(ret);
+ }
+}
+
static struct cgroup_subsys_state *
hugetlb_cgroup_css_alloc(struct cgroup_subsys_state *parent_css)
{
struct hugetlb_cgroup *parent_h_cgroup = hugetlb_cgroup_from_css(parent_css);
struct hugetlb_cgroup *h_cgroup;
- int idx;

h_cgroup = kzalloc(sizeof(*h_cgroup), GFP_KERNEL);
if (!h_cgroup)
return ERR_PTR(-ENOMEM);

- if (parent_h_cgroup) {
- for (idx = 0; idx < HUGE_MAX_HSTATE; idx++)
- page_counter_init(&h_cgroup->hugepage[idx],
- &parent_h_cgroup->hugepage[idx]);
- } else {
+ if (!parent_h_cgroup)
root_h_cgroup = h_cgroup;
- for (idx = 0; idx < HUGE_MAX_HSTATE; idx++)
- page_counter_init(&h_cgroup->hugepage[idx], NULL);
- }
+
+ hugetlb_cgroup_init(h_cgroup, parent_h_cgroup);
return &h_cgroup->css;
}

@@ -285,6 +301,7 @@ static ssize_t hugetlb_cgroup_write(struct kernfs_open_file *of,
return ret;

idx = MEMFILE_IDX(of_cft(of)->private);
+ nr_pages = round_down(nr_pages, 1 << huge_page_order(&hstates[idx]));

switch (MEMFILE_ATTR(of_cft(of)->private)) {
case RES_LIMIT:

2016-04-07 12:51:50

by Michal Hocko

Subject: Re: [patch v2] mm, hugetlb_cgroup: round limit_in_bytes down to hugepage size

On Wed 06-04-16 15:10:23, David Rientjes wrote:
[...]
> +static void hugetlb_cgroup_init(struct hugetlb_cgroup *h_cgroup,
> + struct hugetlb_cgroup *parent_h_cgroup)
> +{
> + int idx;
> +
> + for (idx = 0; idx < HUGE_MAX_HSTATE; idx++) {
> + struct page_counter *counter = &h_cgroup->hugepage[idx];
> + struct page_counter *parent = NULL;
> + unsigned long limit;
> + int ret;
> +
> + if (parent_h_cgroup)
> + parent = &parent_h_cgroup->hugepage[idx];
> + page_counter_init(counter, parent);
> +
> + limit = round_down(PAGE_COUNTER_MAX,
> + 1 << huge_page_order(&hstates[idx]));
> + ret = page_counter_limit(counter, limit);
> + VM_BUG_ON(ret);
> + }
> +}

I fail to see the point of this. Why would we want to round down
PAGE_COUNTER_MAX? It will never make a real difference. Or am I missing
something?
--
Michal Hocko
SUSE Labs

2016-04-14 20:22:34

by David Rientjes

Subject: Re: [patch v2] mm, hugetlb_cgroup: round limit_in_bytes down to hugepage size

On Thu, 7 Apr 2016, Michal Hocko wrote:

> > +static void hugetlb_cgroup_init(struct hugetlb_cgroup *h_cgroup,
> > + struct hugetlb_cgroup *parent_h_cgroup)
> > +{
> > + int idx;
> > +
> > + for (idx = 0; idx < HUGE_MAX_HSTATE; idx++) {
> > + struct page_counter *counter = &h_cgroup->hugepage[idx];
> > + struct page_counter *parent = NULL;
> > + unsigned long limit;
> > + int ret;
> > +
> > + if (parent_h_cgroup)
> > + parent = &parent_h_cgroup->hugepage[idx];
> > + page_counter_init(counter, parent);
> > +
> > + limit = round_down(PAGE_COUNTER_MAX,
> > + 1 << huge_page_order(&hstates[idx]));
> > + ret = page_counter_limit(counter, limit);
> > + VM_BUG_ON(ret);
> > + }
> > +}
>
> I fail to see the point of this. Why would we want to round down
> PAGE_COUNTER_MAX? It will never make a real difference. Or am I missing
> something?

Did you try the patch?

If we're rounding down the user value, it makes sense to be consistent
with the upper bound default to specify intent.

2016-04-15 13:24:56

by Michal Hocko

Subject: Re: [patch v2] mm, hugetlb_cgroup: round limit_in_bytes down to hugepage size

On Thu 14-04-16 13:22:30, David Rientjes wrote:
> On Thu, 7 Apr 2016, Michal Hocko wrote:
>
> > > +static void hugetlb_cgroup_init(struct hugetlb_cgroup *h_cgroup,
> > > + struct hugetlb_cgroup *parent_h_cgroup)
> > > +{
> > > + int idx;
> > > +
> > > + for (idx = 0; idx < HUGE_MAX_HSTATE; idx++) {
> > > + struct page_counter *counter = &h_cgroup->hugepage[idx];
> > > + struct page_counter *parent = NULL;
> > > + unsigned long limit;
> > > + int ret;
> > > +
> > > + if (parent_h_cgroup)
> > > + parent = &parent_h_cgroup->hugepage[idx];
> > > + page_counter_init(counter, parent);
> > > +
> > > + limit = round_down(PAGE_COUNTER_MAX,
> > > + 1 << huge_page_order(&hstates[idx]));
> > > + ret = page_counter_limit(counter, limit);
> > > + VM_BUG_ON(ret);
> > > + }
> > > +}
> >
> > I fail to see the point of this. Why would we want to round down
> > PAGE_COUNTER_MAX? It will never make a real difference. Or am I missing
> > something?
>
> Did you try the patch?
>
> If we're rounding down the user value, it makes sense to be consistent
> with the upper bound default to specify intent.

The point I've tried to raise is: why do we care and add code if we can
never reach that value? Does anybody actually check for the alignment?
--
Michal Hocko
SUSE Labs

2016-04-18 21:24:04

by David Rientjes

Subject: Re: [patch v2] mm, hugetlb_cgroup: round limit_in_bytes down to hugepage size

On Fri, 15 Apr 2016, Michal Hocko wrote:

> > > > +static void hugetlb_cgroup_init(struct hugetlb_cgroup *h_cgroup,
> > > > + struct hugetlb_cgroup *parent_h_cgroup)
> > > > +{
> > > > + int idx;
> > > > +
> > > > + for (idx = 0; idx < HUGE_MAX_HSTATE; idx++) {
> > > > + struct page_counter *counter = &h_cgroup->hugepage[idx];
> > > > + struct page_counter *parent = NULL;
> > > > + unsigned long limit;
> > > > + int ret;
> > > > +
> > > > + if (parent_h_cgroup)
> > > > + parent = &parent_h_cgroup->hugepage[idx];
> > > > + page_counter_init(counter, parent);
> > > > +
> > > > + limit = round_down(PAGE_COUNTER_MAX,
> > > > + 1 << huge_page_order(&hstates[idx]));
> > > > + ret = page_counter_limit(counter, limit);
> > > > + VM_BUG_ON(ret);
> > > > + }
> > > > +}
> > >
> > > I fail to see the point of this. Why would we want to round down
> > > PAGE_COUNTER_MAX? It will never make a real difference. Or am I missing
> > > something?
> >
> > Did you try the patch?
> >
> > If we're rounding down the user value, it makes sense to be consistent
> > with the upper bound default to specify intent.
>
> The point I've tried to raise is: why do we care and add code if we can
> never reach that value? Does anybody actually check for the alignment?

If the user modifies the value successfully, it can never be restored to
the default since the write handler rounds down. It's a matter of
consistency for a long-term maintainable kernel and prevents bug reports.
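
To make that concrete (assuming 64-bit LONG_MAX, 4KB base pages and 2MB
hugepages): PAGE_COUNTER_MAX is LONG_MAX / PAGE_SIZE, i.e. 2^51 - 1 pages,
which is not a multiple of the 512 base pages per hugepage. The default
limit therefore reads back as (2^51 - 1) * 4096 bytes, but with the
write-side rounding that exact value can never be written again -- it gets
rounded down to 2^51 - 512 pages. Rounding the default the same way when
the cgroup is created keeps the two consistent.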

2016-04-25 21:31:02

by David Rientjes

Subject: Re: [patch v2] mm, hugetlb_cgroup: round limit_in_bytes down to hugepage size

On Wed, 6 Apr 2016, David Rientjes wrote:

> The page_counter rounds limits down to page size values. This makes
> sense, except in the case of hugetlb_cgroup where it's not possible to
> charge partial hugepages.
>
> Round the hugetlb_cgroup limit down to hugepage size.
>
> Signed-off-by: David Rientjes <[email protected]>

May this be merged into -mm?

> ---
> mm/hugetlb_cgroup.c | 35 ++++++++++++++++++++++++++---------
> 1 file changed, 26 insertions(+), 9 deletions(-)
>
> diff --git a/mm/hugetlb_cgroup.c b/mm/hugetlb_cgroup.c
> --- a/mm/hugetlb_cgroup.c
> +++ b/mm/hugetlb_cgroup.c
> @@ -67,26 +67,42 @@ static inline bool hugetlb_cgroup_have_usage(struct hugetlb_cgroup *h_cg)
> return false;
> }
>
> +static void hugetlb_cgroup_init(struct hugetlb_cgroup *h_cgroup,
> + struct hugetlb_cgroup *parent_h_cgroup)
> +{
> + int idx;
> +
> + for (idx = 0; idx < HUGE_MAX_HSTATE; idx++) {
> + struct page_counter *counter = &h_cgroup->hugepage[idx];
> + struct page_counter *parent = NULL;
> + unsigned long limit;
> + int ret;
> +
> + if (parent_h_cgroup)
> + parent = &parent_h_cgroup->hugepage[idx];
> + page_counter_init(counter, parent);
> +
> + limit = round_down(PAGE_COUNTER_MAX,
> + 1 << huge_page_order(&hstates[idx]));
> + ret = page_counter_limit(counter, limit);
> + VM_BUG_ON(ret);
> + }
> +}
> +
> static struct cgroup_subsys_state *
> hugetlb_cgroup_css_alloc(struct cgroup_subsys_state *parent_css)
> {
> struct hugetlb_cgroup *parent_h_cgroup = hugetlb_cgroup_from_css(parent_css);
> struct hugetlb_cgroup *h_cgroup;
> - int idx;
>
> h_cgroup = kzalloc(sizeof(*h_cgroup), GFP_KERNEL);
> if (!h_cgroup)
> return ERR_PTR(-ENOMEM);
>
> - if (parent_h_cgroup) {
> - for (idx = 0; idx < HUGE_MAX_HSTATE; idx++)
> - page_counter_init(&h_cgroup->hugepage[idx],
> - &parent_h_cgroup->hugepage[idx]);
> - } else {
> + if (!parent_h_cgroup)
> root_h_cgroup = h_cgroup;
> - for (idx = 0; idx < HUGE_MAX_HSTATE; idx++)
> - page_counter_init(&h_cgroup->hugepage[idx], NULL);
> - }
> +
> + hugetlb_cgroup_init(h_cgroup, parent_h_cgroup);
> return &h_cgroup->css;
> }
>
> @@ -285,6 +301,7 @@ static ssize_t hugetlb_cgroup_write(struct kernfs_open_file *of,
> return ret;
>
> idx = MEMFILE_IDX(of_cft(of)->private);
> + nr_pages = round_down(nr_pages, 1 << huge_page_order(&hstates[idx]));
>
> switch (MEMFILE_ATTR(of_cft(of)->private)) {
> case RES_LIMIT:
>

2016-04-25 21:52:26

by Andrew Morton

Subject: Re: [patch v2] mm, hugetlb_cgroup: round limit_in_bytes down to hugepage size

On Mon, 18 Apr 2016 14:23:58 -0700 (PDT) David Rientjes <[email protected]> wrote:

> On Fri, 15 Apr 2016, Michal Hocko wrote:
>
> > > > > +static void hugetlb_cgroup_init(struct hugetlb_cgroup *h_cgroup,
> > > > > + struct hugetlb_cgroup *parent_h_cgroup)
> > > > > +{
> > > > > + int idx;
> > > > > +
> > > > > + for (idx = 0; idx < HUGE_MAX_HSTATE; idx++) {
> > > > > + struct page_counter *counter = &h_cgroup->hugepage[idx];
> > > > > + struct page_counter *parent = NULL;
> > > > > + unsigned long limit;
> > > > > + int ret;
> > > > > +
> > > > > + if (parent_h_cgroup)
> > > > > + parent = &parent_h_cgroup->hugepage[idx];
> > > > > + page_counter_init(counter, parent);
> > > > > +
> > > > > + limit = round_down(PAGE_COUNTER_MAX,
> > > > > + 1 << huge_page_order(&hstates[idx]));
> > > > > + ret = page_counter_limit(counter, limit);
> > > > > + VM_BUG_ON(ret);
> > > > > + }
> > > > > +}
> > > >
> > > > I fail to see the point of this. Why would we want to round down
> > > > PAGE_COUNTER_MAX? It will never make a real difference. Or am I missing
> > > > something?
> > >
> > > Did you try the patch?
> > >
> > > If we're rounding down the user value, it makes sense to be consistent
> > > with the upper bound default to specify intent.
> >
> > The point I've tried to raise is: why do we care and add code if we can
> > never reach that value? Does anybody actually check for the alignment?
>
> If the user modifies the value successfully, it can never be restored to
> the default since the write handler rounds down. It's a matter of
> consistency for a long-term maintainable kernel and prevents bug reports.

Can we please get the above reasoning into the changelog?

Also, the runtime effects of the patch are unclear - "not possible to
charge partial hugepages" sounds serious, but there's no cc:stable.
Some clarification there also please.

2016-04-25 23:55:00

by David Rientjes

Subject: [patch v3] mm, hugetlb_cgroup: round limit_in_bytes down to hugepage size

The page_counter rounds limits down to page size values. This makes
sense, except in the case of hugetlb_cgroup where it's not possible to
charge partial hugepages. If the hugetlb_cgroup margin is less than the
hugepage size being charged, the charge will fail as expected.

Round the hugetlb_cgroup limit down to hugepage size, since it is the
effective limit of the cgroup.
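
For example, assuming 4KB base pages and 2MB hugepages, a limit written as
5M is stored as 1280 base pages today: two hugepages (1024 pages) can be
charged, but the remaining 256-page margin is smaller than a hugepage, so
the last 1M of the limit can never be used. Rounding the stored limit down
to 4M reflects what can actually be charged.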

For consistency, round down PAGE_COUNTER_MAX as well when a
hugetlb_cgroup is created: this avoids bug reports from users who find
they cannot restore the limit to the kernel default.

Signed-off-by: David Rientjes <[email protected]>
---
v3: update changelog per akpm
no stable backport needed

mm/hugetlb_cgroup.c | 35 ++++++++++++++++++++++++++---------
1 file changed, 26 insertions(+), 9 deletions(-)

diff --git a/mm/hugetlb_cgroup.c b/mm/hugetlb_cgroup.c
--- a/mm/hugetlb_cgroup.c
+++ b/mm/hugetlb_cgroup.c
@@ -67,26 +67,42 @@ static inline bool hugetlb_cgroup_have_usage(struct hugetlb_cgroup *h_cg)
return false;
}

+static void hugetlb_cgroup_init(struct hugetlb_cgroup *h_cgroup,
+ struct hugetlb_cgroup *parent_h_cgroup)
+{
+ int idx;
+
+ for (idx = 0; idx < HUGE_MAX_HSTATE; idx++) {
+ struct page_counter *counter = &h_cgroup->hugepage[idx];
+ struct page_counter *parent = NULL;
+ unsigned long limit;
+ int ret;
+
+ if (parent_h_cgroup)
+ parent = &parent_h_cgroup->hugepage[idx];
+ page_counter_init(counter, parent);
+
+ limit = round_down(PAGE_COUNTER_MAX,
+ 1 << huge_page_order(&hstates[idx]));
+ ret = page_counter_limit(counter, limit);
+ VM_BUG_ON(ret);
+ }
+}
+
static struct cgroup_subsys_state *
hugetlb_cgroup_css_alloc(struct cgroup_subsys_state *parent_css)
{
struct hugetlb_cgroup *parent_h_cgroup = hugetlb_cgroup_from_css(parent_css);
struct hugetlb_cgroup *h_cgroup;
- int idx;

h_cgroup = kzalloc(sizeof(*h_cgroup), GFP_KERNEL);
if (!h_cgroup)
return ERR_PTR(-ENOMEM);

- if (parent_h_cgroup) {
- for (idx = 0; idx < HUGE_MAX_HSTATE; idx++)
- page_counter_init(&h_cgroup->hugepage[idx],
- &parent_h_cgroup->hugepage[idx]);
- } else {
+ if (!parent_h_cgroup)
root_h_cgroup = h_cgroup;
- for (idx = 0; idx < HUGE_MAX_HSTATE; idx++)
- page_counter_init(&h_cgroup->hugepage[idx], NULL);
- }
+
+ hugetlb_cgroup_init(h_cgroup, parent_h_cgroup);
return &h_cgroup->css;
}

@@ -285,6 +301,7 @@ static ssize_t hugetlb_cgroup_write(struct kernfs_open_file *of,
return ret;

idx = MEMFILE_IDX(of_cft(of)->private);
+ nr_pages = round_down(nr_pages, 1 << huge_page_order(&hstates[idx]));

switch (MEMFILE_ATTR(of_cft(of)->private)) {
case RES_LIMIT: