2021-09-11 07:43:30

by Vasily Averin

[permalink] [raw]
Subject: [PATCH] ipc: remove memcg accounting for sops objects in do_semtimedop()

Linus proposes to revert an accounting for sops objects in
do_semtimedop() because it's really just a temporary buffer
for a single semtimedop() system call.

This object can consume up to 2 pages, syscall is sleeping one,
size and duration can be controlled by user, and this allocation
can be repeated by many thread at the same time.

However Shakeel Butt pointed that there are much more popular objects
with the same life time and similar memory consumption, the accounting
of which was decided to be rejected for performance reasons.

In addition, any usual task consumes much more accounted memory,
so 2 pages of this temporal buffer can be safely ignored.

Link: https://patchwork.kernel.org/project/linux-fsdevel/patch/[email protected]/

Fixes: 18319498fdd4 ("memcg: enable accounting of ipc resources")
Signed-off-by: Vasily Averin <[email protected]>
---
ipc/sem.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/ipc/sem.c b/ipc/sem.c
index f833238df1ce..6693daf4fe11 100644
--- a/ipc/sem.c
+++ b/ipc/sem.c
@@ -2238,7 +2238,7 @@ static long do_semtimedop(int semid, struct sembuf __user *tsops,
return -EINVAL;

if (nsops > SEMOPM_FAST) {
- sops = kvmalloc_array(nsops, sizeof(*sops), GFP_KERNEL_ACCOUNT);
+ sops = kvmalloc_array(nsops, sizeof(*sops), GFP_KERNEL);
if (sops == NULL)
return -ENOMEM;
}
--
2.25.1


2021-09-12 16:09:55

by Shakeel Butt

[permalink] [raw]
Subject: Re: [PATCH] ipc: remove memcg accounting for sops objects in do_semtimedop()

On Sat, Sep 11, 2021 at 12:40 AM Vasily Averin <[email protected]> wrote:
>
> Linus proposes to revert an accounting for sops objects in
> do_semtimedop() because it's really just a temporary buffer
> for a single semtimedop() system call.
>
> This object can consume up to 2 pages, syscall is sleeping one,
> size and duration can be controlled by user, and this allocation
> can be repeated by many thread at the same time.
>
> However Shakeel Butt pointed that there are much more popular objects
> with the same life time and similar memory consumption, the accounting
> of which was decided to be rejected for performance reasons.
>
> In addition, any usual task consumes much more accounted memory,
> so 2 pages of this temporal buffer can be safely ignored.
>
> Link: https://patchwork.kernel.org/project/linux-fsdevel/patch/[email protected]/
>
> Fixes: 18319498fdd4 ("memcg: enable accounting of ipc resources")
> Signed-off-by: Vasily Averin <[email protected]>

Thanks Vasily.

Acked-by: Shakeel Butt <[email protected]>

2021-09-13 08:38:37

by Michal Hocko

[permalink] [raw]
Subject: Re: [PATCH] ipc: remove memcg accounting for sops objects in do_semtimedop()

On Sat 11-09-21 10:40:08, Vasily Averin wrote:
> Linus proposes to revert an accounting for sops objects in
> do_semtimedop() because it's really just a temporary buffer
> for a single semtimedop() system call.
>
> This object can consume up to 2 pages, syscall is sleeping one,
> size and duration can be controlled by user, and this allocation
> can be repeated by many thread at the same time.

Is there any upper bound or is it just bounded by the number of
tasks/threads (that can be controlled by pid controller at least)?

> However Shakeel Butt pointed that there are much more popular objects
> with the same life time and similar memory consumption, the accounting
> of which was decided to be rejected for performance reasons.

Is there any measurable performance impact in this particular case?

> In addition, any usual task consumes much more accounted memory,
> so 2 pages of this temporal buffer can be safely ignored.
>
> Link: https://patchwork.kernel.org/project/linux-fsdevel/patch/[email protected]/
>
> Fixes: 18319498fdd4 ("memcg: enable accounting of ipc resources")
> Signed-off-by: Vasily Averin <[email protected]>
> ---
> ipc/sem.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/ipc/sem.c b/ipc/sem.c
> index f833238df1ce..6693daf4fe11 100644
> --- a/ipc/sem.c
> +++ b/ipc/sem.c
> @@ -2238,7 +2238,7 @@ static long do_semtimedop(int semid, struct sembuf __user *tsops,
> return -EINVAL;
>
> if (nsops > SEMOPM_FAST) {
> - sops = kvmalloc_array(nsops, sizeof(*sops), GFP_KERNEL_ACCOUNT);
> + sops = kvmalloc_array(nsops, sizeof(*sops), GFP_KERNEL);
> if (sops == NULL)
> return -ENOMEM;
> }
> --
> 2.25.1

--
Michal Hocko
SUSE Labs

2021-09-14 04:34:49

by Shakeel Butt

[permalink] [raw]
Subject: Re: [PATCH] ipc: remove memcg accounting for sops objects in do_semtimedop()

On Mon, Sep 13, 2021 at 1:37 AM Michal Hocko <[email protected]> wrote:
>
[...]
> > However Shakeel Butt pointed that there are much more popular objects
> > with the same life time and similar memory consumption, the accounting
> > of which was decided to be rejected for performance reasons.
>
> Is there any measurable performance impact in this particular case?
>

I don't think there was any regression report or any performance
evaluation. Linus raised the concern on the potential performance
impact. I suggested to backoff for this allocation for now and revisit
again once we have improved the memcg accounting for kernel memory.

2021-09-14 07:18:09

by Michal Hocko

[permalink] [raw]
Subject: Re: [PATCH] ipc: remove memcg accounting for sops objects in do_semtimedop()

On Mon 13-09-21 21:32:25, Shakeel Butt wrote:
> On Mon, Sep 13, 2021 at 1:37 AM Michal Hocko <[email protected]> wrote:
> >
> [...]
> > > However Shakeel Butt pointed that there are much more popular objects
> > > with the same life time and similar memory consumption, the accounting
> > > of which was decided to be rejected for performance reasons.
> >
> > Is there any measurable performance impact in this particular case?
> >
>
> I don't think there was any regression report or any performance
> evaluation. Linus raised the concern on the potential performance
> impact. I suggested to backoff for this allocation for now and revisit
> again once we have improved the memcg accounting for kernel memory.

I am fine with the change, I am just not satisfied with the
justification. It is not really clear what the intention is except that
Linus wanted it. I have already asked Vasily to provide more
explanation. E.g. this part really begs for clarification
"
This object can consume up to 2 pages, syscall is sleeping one,
size and duration can be controlled by user, and this allocation
can be repeated by many thread at the same time.
"

It sounds like a problem, except it is not because? A worst case
scenario evaluation would be beneficial for example

Thanks!

--
Michal Hocko
SUSE Labs

2021-09-14 14:29:14

by Michal Koutný

[permalink] [raw]
Subject: Re: [PATCH] ipc: remove memcg accounting for sops objects in do_semtimedop()

On Tue, Sep 14, 2021 at 09:13:48AM +0200, Michal Hocko <[email protected]> wrote:
> "
> This object can consume up to 2 pages, syscall is sleeping one,
> size and duration can be controlled by user, and this allocation
> can be repeated by many thread at the same time.
> "
>
> It sounds like a problem, except it is not because? A worst case
> scenario evaluation would be beneficial for example

AFAICS, the offending allocation is in place only during the duration of
the syscall. So it's basically O(#tasks).
Considering at least 2 pages for task_struct + 2 pages for kernel stack,
back of the envelope calculation gives me the footprint amplification is
<1.5.
The factor would IMO be interesting if it was >> 2 (from the PoV of
excessive (ab)use, fine-grained accounting seems to be currently
unfeasible due to performance impact).

The commit message can be more explicit about this but to the patch
Reviewed-by: Michal Koutn? <[email protected]>

2021-09-14 14:37:24

by Michal Hocko

[permalink] [raw]
Subject: Re: [PATCH] ipc: remove memcg accounting for sops objects in do_semtimedop()

On Tue 14-09-21 16:23:16, Michal Koutny wrote:
> On Tue, Sep 14, 2021 at 09:13:48AM +0200, Michal Hocko <[email protected]> wrote:
> > "
> > This object can consume up to 2 pages, syscall is sleeping one,
> > size and duration can be controlled by user, and this allocation
> > can be repeated by many thread at the same time.
> > "
> >
> > It sounds like a problem, except it is not because? A worst case
> > scenario evaluation would be beneficial for example
>
> AFAICS, the offending allocation is in place only during the duration of
> the syscall. So it's basically O(#tasks).
> Considering at least 2 pages for task_struct + 2 pages for kernel stack,
> back of the envelope calculation gives me the footprint amplification is
> <1.5.
> The factor would IMO be interesting if it was >> 2 (from the PoV of
> excessive (ab)use, fine-grained accounting seems to be currently
> unfeasible due to performance impact).

Yes this sounds exactly like something I would appreciate in the
changelog. With that or similar feel free to add
Acked-by: Michal Hocko <[email protected]>

Thanks a lot Michal for this clarification!

> The commit message can be more explicit about this but to the patch
> Reviewed-by: Michal Koutn? <[email protected]>
--
Michal Hocko
SUSE Labs