The commit 5c1f4e690eec ("mm/vmalloc: switch to bulk allocator in
__vmalloc_area_node()") switched to bulk page allocator for order 0
allocation backing vmalloc. However bulk page allocator does not support
__GFP_ACCOUNT allocations and there are several users of
kvmalloc(__GFP_ACCOUNT).
For now make __GFP_ACCOUNT allocations bypass bulk page allocator. In
future if there is workload that can be significantly improved with the
bulk page allocator with __GFP_ACCCOUNT support, we can revisit the
decision.
Fixes: 5c1f4e690eec ("mm/vmalloc: switch to bulk allocator in __vmalloc_area_node()")
Signed-off-by: Shakeel Butt <[email protected]>
---
mm/page_alloc.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 668edb16446a..b3acad4615d3 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5215,6 +5215,10 @@ unsigned long __alloc_pages_bulk(gfp_t gfp, int preferred_nid,
unsigned int alloc_flags = ALLOC_WMARK_LOW;
int nr_populated = 0, nr_account = 0;
+ /* Bulk allocator does not support memcg accounting. */
+ if (unlikely(gfp & __GFP_ACCOUNT))
+ goto out;
+
/*
* Skip populated array elements to determine if any pages need
* to be allocated before disabling IRQs.
--
2.33.0.882.g93a45727a2-goog
On Wed, Oct 13, 2021 at 12:43:38PM -0700, Shakeel Butt wrote:
> The commit 5c1f4e690eec ("mm/vmalloc: switch to bulk allocator in
> __vmalloc_area_node()") switched to bulk page allocator for order 0
> allocation backing vmalloc. However bulk page allocator does not support
> __GFP_ACCOUNT allocations and there are several users of
> kvmalloc(__GFP_ACCOUNT).
>
> For now make __GFP_ACCOUNT allocations bypass bulk page allocator. In
> future if there is workload that can be significantly improved with the
> bulk page allocator with __GFP_ACCCOUNT support, we can revisit the
> decision.
>
> Fixes: 5c1f4e690eec ("mm/vmalloc: switch to bulk allocator in __vmalloc_area_node()")
> Signed-off-by: Shakeel Butt <[email protected]>
> ---
> mm/page_alloc.c | 4 ++++
> 1 file changed, 4 insertions(+)
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 668edb16446a..b3acad4615d3 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -5215,6 +5215,10 @@ unsigned long __alloc_pages_bulk(gfp_t gfp, int preferred_nid,
> unsigned int alloc_flags = ALLOC_WMARK_LOW;
> int nr_populated = 0, nr_account = 0;
>
> + /* Bulk allocator does not support memcg accounting. */
> + if (unlikely(gfp & __GFP_ACCOUNT))
> + goto out;
> +
Isn't it a bit too aggressive?
How about
if (WARN_ON_ONCE(gfp & __GFP_ACCOUNT))
gfp &= ~__GFP_ACCOUNT;
And maybe with some explanatory message?
Thanks!
On Wed, Oct 13, 2021 at 3:03 PM Roman Gushchin <[email protected]> wrote:
>
> On Wed, Oct 13, 2021 at 12:43:38PM -0700, Shakeel Butt wrote:
> > The commit 5c1f4e690eec ("mm/vmalloc: switch to bulk allocator in
> > __vmalloc_area_node()") switched to bulk page allocator for order 0
> > allocation backing vmalloc. However bulk page allocator does not support
> > __GFP_ACCOUNT allocations and there are several users of
> > kvmalloc(__GFP_ACCOUNT).
> >
> > For now make __GFP_ACCOUNT allocations bypass bulk page allocator. In
> > future if there is workload that can be significantly improved with the
> > bulk page allocator with __GFP_ACCCOUNT support, we can revisit the
> > decision.
> >
> > Fixes: 5c1f4e690eec ("mm/vmalloc: switch to bulk allocator in __vmalloc_area_node()")
> > Signed-off-by: Shakeel Butt <[email protected]>
> > ---
> > mm/page_alloc.c | 4 ++++
> > 1 file changed, 4 insertions(+)
> >
> > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > index 668edb16446a..b3acad4615d3 100644
> > --- a/mm/page_alloc.c
> > +++ b/mm/page_alloc.c
> > @@ -5215,6 +5215,10 @@ unsigned long __alloc_pages_bulk(gfp_t gfp, int preferred_nid,
> > unsigned int alloc_flags = ALLOC_WMARK_LOW;
> > int nr_populated = 0, nr_account = 0;
> >
> > + /* Bulk allocator does not support memcg accounting. */
> > + if (unlikely(gfp & __GFP_ACCOUNT))
> > + goto out;
> > +
>
> Isn't it a bit too aggressive?
>
> How about
> if (WARN_ON_ONCE(gfp & __GFP_ACCOUNT))
We actually know that kvmalloc(__GFP_ACCOUNT) users exist and can
trigger bulk page allocator through vmalloc, so I don't think the
warning would be any helpful.
> gfp &= ~__GFP_ACCOUNT;
Bulk allocator is best effort, so callers have adequate fallbacks.
Transparently disabling accounting would be unexpected.
On Wed, Oct 13, 2021 at 03:26:11PM -0700, Shakeel Butt wrote:
> On Wed, Oct 13, 2021 at 3:03 PM Roman Gushchin <[email protected]> wrote:
> >
> > On Wed, Oct 13, 2021 at 12:43:38PM -0700, Shakeel Butt wrote:
> > > The commit 5c1f4e690eec ("mm/vmalloc: switch to bulk allocator in
> > > __vmalloc_area_node()") switched to bulk page allocator for order 0
> > > allocation backing vmalloc. However bulk page allocator does not support
> > > __GFP_ACCOUNT allocations and there are several users of
> > > kvmalloc(__GFP_ACCOUNT).
> > >
> > > For now make __GFP_ACCOUNT allocations bypass bulk page allocator. In
> > > future if there is workload that can be significantly improved with the
> > > bulk page allocator with __GFP_ACCCOUNT support, we can revisit the
> > > decision.
> > >
> > > Fixes: 5c1f4e690eec ("mm/vmalloc: switch to bulk allocator in __vmalloc_area_node()")
> > > Signed-off-by: Shakeel Butt <[email protected]>
> > > ---
> > > mm/page_alloc.c | 4 ++++
> > > 1 file changed, 4 insertions(+)
> > >
> > > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > > index 668edb16446a..b3acad4615d3 100644
> > > --- a/mm/page_alloc.c
> > > +++ b/mm/page_alloc.c
> > > @@ -5215,6 +5215,10 @@ unsigned long __alloc_pages_bulk(gfp_t gfp, int preferred_nid,
> > > unsigned int alloc_flags = ALLOC_WMARK_LOW;
> > > int nr_populated = 0, nr_account = 0;
> > >
> > > + /* Bulk allocator does not support memcg accounting. */
> > > + if (unlikely(gfp & __GFP_ACCOUNT))
> > > + goto out;
> > > +
> >
> > Isn't it a bit too aggressive?
> >
> > How about
> > if (WARN_ON_ONCE(gfp & __GFP_ACCOUNT))
>
> We actually know that kvmalloc(__GFP_ACCOUNT) users exist and can
> trigger bulk page allocator through vmalloc, so I don't think the
> warning would be any helpful.
>
> > gfp &= ~__GFP_ACCOUNT;
>
> Bulk allocator is best effort, so callers have adequate fallbacks.
> Transparently disabling accounting would be unexpected.
I see...
Shouldn't we then move this check to an upper level?
E.g.:
if (!(gfp & __GFP_ACCOUNT))
call_into_bulk_allocator();
else
call_into_per_page_allocator();
Not a big deal, I'm just worried about potential silent memory allocation
failures.
Thanks!
On Wed, Oct 13, 2021 at 4:15 PM Roman Gushchin <[email protected]> wrote:
>
[...]
> > >
> > > Isn't it a bit too aggressive?
> > >
> > > How about
> > > if (WARN_ON_ONCE(gfp & __GFP_ACCOUNT))
> >
> > We actually know that kvmalloc(__GFP_ACCOUNT) users exist and can
> > trigger bulk page allocator through vmalloc, so I don't think the
> > warning would be any helpful.
> >
> > > gfp &= ~__GFP_ACCOUNT;
> >
> > Bulk allocator is best effort, so callers have adequate fallbacks.
> > Transparently disabling accounting would be unexpected.
>
> I see...
>
> Shouldn't we then move this check to an upper level?
>
> E.g.:
>
> if (!(gfp & __GFP_ACCOUNT))
> call_into_bulk_allocator();
> else
> call_into_per_page_allocator();
>
If we add this check in the upper level (e.g. in vm_area_alloc_pages()
) then I think we would need WARN_ON_ONCE(gfp & __GFP_ACCOUNT) in the
bulk allocator to detect future users.
At the moment I am more inclined towards this patch's approach. Let's
say in future we find there is a __GFP_ACCOUNT allocation which can
benefit from bulk allocator and we decide to add such support in bulk
allocator then we would not need to change the bulk allocator callers
at that time just the bulk allocator.
On Wed, Oct 13, 2021 at 04:45:35PM -0700, Shakeel Butt wrote:
> On Wed, Oct 13, 2021 at 4:15 PM Roman Gushchin <[email protected]> wrote:
> >
> [...]
> > > >
> > > > Isn't it a bit too aggressive?
> > > >
> > > > How about
> > > > if (WARN_ON_ONCE(gfp & __GFP_ACCOUNT))
> > >
> > > We actually know that kvmalloc(__GFP_ACCOUNT) users exist and can
> > > trigger bulk page allocator through vmalloc, so I don't think the
> > > warning would be any helpful.
> > >
> > > > gfp &= ~__GFP_ACCOUNT;
> > >
> > > Bulk allocator is best effort, so callers have adequate fallbacks.
> > > Transparently disabling accounting would be unexpected.
> >
> > I see...
> >
> > Shouldn't we then move this check to an upper level?
> >
> > E.g.:
> >
> > if (!(gfp & __GFP_ACCOUNT))
> > call_into_bulk_allocator();
> > else
> > call_into_per_page_allocator();
> >
>
> If we add this check in the upper level (e.g. in vm_area_alloc_pages()
> ) then I think we would need WARN_ON_ONCE(gfp & __GFP_ACCOUNT) in the
> bulk allocator to detect future users.
>
> At the moment I am more inclined towards this patch's approach. Let's
> say in future we find there is a __GFP_ACCOUNT allocation which can
> benefit from bulk allocator and we decide to add such support in bulk
> allocator then we would not need to change the bulk allocator callers
> at that time just the bulk allocator.
Ok, no objections from me.
Thanks!
On Wed, Oct 13, 2021 at 04:45:35PM -0700, Shakeel Butt wrote:
> On Wed, Oct 13, 2021 at 4:15 PM Roman Gushchin <[email protected]> wrote:
> >
> [...]
> > > >
> > > > Isn't it a bit too aggressive?
> > > >
> > > > How about
> > > > if (WARN_ON_ONCE(gfp & __GFP_ACCOUNT))
> > >
> > > We actually know that kvmalloc(__GFP_ACCOUNT) users exist and can
> > > trigger bulk page allocator through vmalloc, so I don't think the
> > > warning would be any helpful.
> > >
> > > > gfp &= ~__GFP_ACCOUNT;
> > >
> > > Bulk allocator is best effort, so callers have adequate fallbacks.
> > > Transparently disabling accounting would be unexpected.
> >
> > I see...
> >
> > Shouldn't we then move this check to an upper level?
> >
> > E.g.:
> >
> > if (!(gfp & __GFP_ACCOUNT))
> > call_into_bulk_allocator();
> > else
> > call_into_per_page_allocator();
> >
>
> If we add this check in the upper level (e.g. in vm_area_alloc_pages()
> ) then I think we would need WARN_ON_ONCE(gfp & __GFP_ACCOUNT) in the
> bulk allocator to detect future users.
>
> At the moment I am more inclined towards this patch's approach. Let's
> say in future we find there is a __GFP_ACCOUNT allocation which can
> benefit from bulk allocator and we decide to add such support in bulk
> allocator then we would not need to change the bulk allocator callers
> at that time just the bulk allocator.
I agree with you. Let's apply the patch as-is.
On 13.10.2021 22:43, Shakeel Butt wrote:
> The commit 5c1f4e690eec ("mm/vmalloc: switch to bulk allocator in
> __vmalloc_area_node()") switched to bulk page allocator for order 0
> allocation backing vmalloc. However bulk page allocator does not support
> __GFP_ACCOUNT allocations and there are several users of
> kvmalloc(__GFP_ACCOUNT).
>
> For now make __GFP_ACCOUNT allocations bypass bulk page allocator. In
> future if there is workload that can be significantly improved with the
> bulk page allocator with __GFP_ACCCOUNT support, we can revisit the
> decision.
>
> Fixes: 5c1f4e690eec ("mm/vmalloc: switch to bulk allocator in __vmalloc_area_node()")
> Signed-off-by: Shakeel Butt <[email protected]>
> ---
> mm/page_alloc.c | 4 ++++
> 1 file changed, 4 insertions(+)
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 668edb16446a..b3acad4615d3 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -5215,6 +5215,10 @@ unsigned long __alloc_pages_bulk(gfp_t gfp, int preferred_nid,
> unsigned int alloc_flags = ALLOC_WMARK_LOW;
> int nr_populated = 0, nr_account = 0;
>
> + /* Bulk allocator does not support memcg accounting. */
> + if (unlikely(gfp & __GFP_ACCOUNT))
> + goto out;
> +
May be (memcg_kmem_enabled() && (gfp & __GFP_ACCOUNT)) check is better here?
> /*
> * Skip populated array elements to determine if any pages need
> * to be allocated before disabling IRQs.
>
On Wed 13-10-21 12:43:38, Shakeel Butt wrote:
[...]
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 668edb16446a..b3acad4615d3 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -5215,6 +5215,10 @@ unsigned long __alloc_pages_bulk(gfp_t gfp, int preferred_nid,
> unsigned int alloc_flags = ALLOC_WMARK_LOW;
> int nr_populated = 0, nr_account = 0;
>
> + /* Bulk allocator does not support memcg accounting. */
> + if (unlikely(gfp & __GFP_ACCOUNT))
> + goto out;
Did you mean goto failed here? This would break some which do not
have any fallback. E.g. xfs_buf_alloc_pages but likely more.
Sorry I could have been more specific when talking about bypassing the
bulk allocator. It is quite confusing because the bulk allocator
interface consists of the bulk allocator and the fallback to the normal
page allocator.
> +
> /*
> * Skip populated array elements to determine if any pages need
> * to be allocated before disabling IRQs.
> --
> 2.33.0.882.g93a45727a2-goog
--
Michal Hocko
SUSE Labs
On Thu 14-10-21 10:05:21, Vasily Averin wrote:
[...]
> > + /* Bulk allocator does not support memcg accounting. */
> > + if (unlikely(gfp & __GFP_ACCOUNT))
> > + goto out;
> > +
>
> May be (memcg_kmem_enabled() && (gfp & __GFP_ACCOUNT)) check is better here?
Yes please
--
Michal Hocko
SUSE Labs
On Thu 14-10-21 08:01:16, Shakeel Butt wrote:
> On Thu, Oct 14, 2021 at 12:16 AM Michal Hocko <[email protected]> wrote:
> >
> > On Wed 13-10-21 12:43:38, Shakeel Butt wrote:
> > [...]
> > > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > > index 668edb16446a..b3acad4615d3 100644
> > > --- a/mm/page_alloc.c
> > > +++ b/mm/page_alloc.c
> > > @@ -5215,6 +5215,10 @@ unsigned long __alloc_pages_bulk(gfp_t gfp, int preferred_nid,
> > > unsigned int alloc_flags = ALLOC_WMARK_LOW;
> > > int nr_populated = 0, nr_account = 0;
> > >
> > > + /* Bulk allocator does not support memcg accounting. */
> > > + if (unlikely(gfp & __GFP_ACCOUNT))
> > > + goto out;
> >
> > Did you mean goto failed here? This would break some which do not
> > have any fallback. E.g. xfs_buf_alloc_pages but likely more.
> >
> > Sorry I could have been more specific when talking about bypassing the
> > bulk allocator. It is quite confusing because the bulk allocator
> > interface consists of the bulk allocator and the fallback to the normal
> > page allocator.
> >
>
> I did consider 'goto failed' here but for that I have to move
> __GFP_ACCOUNT check after the "Already populated array" check in the
> function. Basically what's the point of doing other operations
> (incrementing nr_populated) if we are gonna skip bulk anyways.
I have to say I do not follow why that is a problem.
> Regarding xfs_buf_alloc_pages(), it is not using __GFP_ACCOUNT and
> vmalloc() is the only __GFP_ACCOUNT user at this point. So, not an
> issue for now but I suppose it is better to be future-proof and do the
> 'goto failed'.
Why do we want to have that silent trap?
--
Michal Hocko
SUSE Labs
On Thu, Oct 14, 2021 at 12:16 AM Michal Hocko <[email protected]> wrote:
>
> On Wed 13-10-21 12:43:38, Shakeel Butt wrote:
> [...]
> > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > index 668edb16446a..b3acad4615d3 100644
> > --- a/mm/page_alloc.c
> > +++ b/mm/page_alloc.c
> > @@ -5215,6 +5215,10 @@ unsigned long __alloc_pages_bulk(gfp_t gfp, int preferred_nid,
> > unsigned int alloc_flags = ALLOC_WMARK_LOW;
> > int nr_populated = 0, nr_account = 0;
> >
> > + /* Bulk allocator does not support memcg accounting. */
> > + if (unlikely(gfp & __GFP_ACCOUNT))
> > + goto out;
>
> Did you mean goto failed here? This would break some which do not
> have any fallback. E.g. xfs_buf_alloc_pages but likely more.
>
> Sorry I could have been more specific when talking about bypassing the
> bulk allocator. It is quite confusing because the bulk allocator
> interface consists of the bulk allocator and the fallback to the normal
> page allocator.
>
I did consider 'goto failed' here but for that I have to move
__GFP_ACCOUNT check after the "Already populated array" check in the
function. Basically what's the point of doing other operations
(incrementing nr_populated) if we are gonna skip bulk anyways.
Regarding xfs_buf_alloc_pages(), it is not using __GFP_ACCOUNT and
vmalloc() is the only __GFP_ACCOUNT user at this point. So, not an
issue for now but I suppose it is better to be future-proof and do the
'goto failed'.
Let me know what you think.
On 14.10.2021 18:23, Johannes Weiner wrote:
> On Thu, Oct 14, 2021 at 08:01:16AM -0700, Shakeel Butt wrote:
>> Regarding xfs_buf_alloc_pages(), it is not using __GFP_ACCOUNT
>
> It probably should, actually. Sorry, somewhat off-topic, but we've
> seen this consume quite large amounts of memory. __GFP_ACCOUNT and
> vmstat tracking seems overdue for this one.
If this will be required, you can use
[PATCH mm v5] memcg: enable memory accounting in __alloc_pages_bulk
https://lkml.org/lkml/2021/10/14/197
As far as I understand it will not be used right now,
however I decided to submit it anyway, perhaps it may be needed later.
Thank you,
Vasily Averin
On Thu, Oct 14, 2021 at 08:01:16AM -0700, Shakeel Butt wrote:
> Regarding xfs_buf_alloc_pages(), it is not using __GFP_ACCOUNT
It probably should, actually. Sorry, somewhat off-topic, but we've
seen this consume quite large amounts of memory. __GFP_ACCOUNT and
vmstat tracking seems overdue for this one.