2023-03-22 15:17:50

by Florian Schmidt

Subject: [RFC] memcg v1: provide read access to memory.pressure_level

cgroups v1 has a unique way of setting up memory pressure notifications:
the user opens "memory.pressure_level" of the cgroup they want to
monitor for pressure, then opens "cgroup.event_control" and writes the fd
(among other things) to that file. memory.pressure_level has no other
use; specifically, it does not support any read or write operations.
Consequently, no handlers are provided, and the file ends up with
permissions 000. However, to actually use the mechanism, the subscribing
user must have read access to the file and open the fd for reading, see
memcg_write_event_control().
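
For readers unfamiliar with the v1 interface, the registration sequence
described above can be sketched in C roughly as follows. This is a minimal
sketch and not part of the patch: subscribe_pressure() and its parameters
are hypothetical names, cgroup_dir is assumed to be the directory of a
mounted v1 memory cgroup, and the "<event_fd> <pressure_level_fd> <level>"
line format is the one given in the cgroup v1 memory documentation.

```c
#include <fcntl.h>
#include <stdio.h>
#include <sys/eventfd.h>
#include <unistd.h>

/*
 * Hypothetical helper: register for memory pressure events on the v1
 * memory cgroup at cgroup_dir. On success, returns an eventfd to read()
 * notifications from and stores the still-open pressure_level fd in
 * *pressure_fd. Returns -1 on any failure.
 */
int subscribe_pressure(const char *cgroup_dir, const char *level,
		       int *pressure_fd)
{
	char path[4096], line[64];
	int efd = -1, pfd = -1, cfd = -1, len;

	*pressure_fd = -1;

	/* This open() is the step that fails when the 000 mode is enforced. */
	len = snprintf(path, sizeof(path), "%s/memory.pressure_level", cgroup_dir);
	if (len < 0 || (size_t)len >= sizeof(path))
		return -1;
	pfd = open(path, O_RDONLY);
	if (pfd < 0)
		return -1;

	len = snprintf(path, sizeof(path), "%s/cgroup.event_control", cgroup_dir);
	if (len < 0 || (size_t)len >= sizeof(path))
		goto fail;
	cfd = open(path, O_WRONLY);
	if (cfd < 0)
		goto fail;

	efd = eventfd(0, 0);
	if (efd < 0)
		goto fail;

	/* "<event_fd> <fd of memory.pressure_level> <level>" */
	len = snprintf(line, sizeof(line), "%d %d %s", efd, pfd, level);
	if (len < 0 || (size_t)len >= sizeof(line))
		goto fail;
	if (write(cfd, line, (size_t)len) != len)
		goto fail;

	close(cfd);
	*pressure_fd = pfd;
	return efd;

fail:
	if (efd >= 0)
		close(efd);
	if (cfd >= 0)
		close(cfd);
	close(pfd);
	return -1;
}
```

On success the caller would block in read() on the returned eventfd (each
read yields an 8-byte counter) and close both descriptors when done. The
sketch keeps the pressure_level fd open for the lifetime of the
subscription rather than assuming it can be closed early.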

This is all fine as long as the subscribing process runs as root and is
otherwise unconfined by further restrictions. However, if you add strict
access controls such as SELinux, the permission bits will be enforced,
and opening memory.pressure_level for reading will fail, preventing the
process from subscribing, even as root.

There are several ways around this issue, but adding a dummy read
handler seems like the least invasive to me. I'd be interested to hear:
(a) do you think there is a less invasive way? Alternatively, we could
add a flag in cftype in include/linux/cgroup-defs.h, but that seems
more invasive for what is a legacy interface.
(b) would you be interested to take this patch, or is it too niche a fix
for a legacy subsystem?
---
mm/memcontrol.c | 11 +++++++++++
1 file changed, 11 insertions(+)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 5abffe6f8389..e48c749d9724 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -3734,6 +3734,16 @@ static u64 mem_cgroup_read_u64(struct cgroup_subsys_state *css,
 	}
 }

+/*
+ * This function doesn't do anything useful. Its only job is to provide a read
+ * handler so that the file gets read permissions when it's created.
+ */
+static int mem_cgroup_dummy_seq_show(__always_unused struct seq_file *m,
+				     __always_unused void *v)
+{
+	return -EINVAL;
+}
+
 #ifdef CONFIG_MEMCG_KMEM
 static int memcg_online_kmem(struct mem_cgroup *memcg)
 {
@@ -5064,6 +5074,7 @@ static struct cftype mem_cgroup_legacy_files[] = {
 	},
 	{
 		.name = "pressure_level",
+		.seq_show = mem_cgroup_dummy_seq_show,
 	},
 #ifdef CONFIG_NUMA
 	{
--
2.32.0


2023-03-22 16:02:33

by Michal Hocko

Subject: Re: [RFC] memcg v1: provide read access to memory.pressure_level

On Wed 22-03-23 14:25:25, Florian Schmidt wrote:
> cgroups v1 has a unique way of setting up memory pressure notifications:
> the user opens "memory.pressure_level" of the cgroup they want to
> monitor for pressure, then opens "cgroup.event_control" and writes the fd
> (among other things) to that file. memory.pressure_level has no other
> use, specifically it does not support any read or write operations.
> Consequently, no handlers are provided, and the file ends up with
> permissions 000. However, to actually use the mechanism, the subscribing
> user must have read access to the file and open the fd for reading, see
> memcg_write_event_control().
>
> This is all fine as long as the subscribing process runs as root and is
> otherwise unconfined by further restrictions. However, if you add strict
> access controls such as selinux, the permission bits will be enforced,
> and opening memory.pressure_level for reading will fail, preventing the
> process from subscribing, even as root.
>
>
> There are several ways around this issue, but adding a dummy read
> handler seems like the least invasive to me.

I was struggling to see how that addresses the problem because all you
need is a read permission. But then I've looked into cgroup code and
learned that permissions are constructed based on available callbacks
(cgroup_file_mode). Mentioning this in the changelog would have made the
review easier ;)
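
To make the mechanism concrete, here is a simplified userspace model of
that rule. It is only a sketch under stated assumptions: struct
cftype_model, file_mode_model() and dummy_seq_show() are hypothetical
stand-ins, and the in-tree cgroup_file_mode() considers more handler types
and flags than the two modelled here.

```c
#include <errno.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Hypothetical, stripped-down stand-in for the kernel's struct cftype. */
struct cftype_model {
	const char *name;
	int (*seq_show)(void *sf, void *v);
	long (*write)(const char *buf, unsigned long nbytes);
};

/* Derive the file mode purely from which handlers are present. */
mode_t file_mode_model(const struct cftype_model *cft)
{
	mode_t mode = 0;

	if (cft->seq_show)
		mode |= S_IRUSR | S_IRGRP | S_IROTH;	/* readable by all */
	if (cft->write)
		mode |= S_IWUSR;			/* writable by owner */
	return mode;
}

/* Mirrors the idea of the patch: a read handler that only exists. */
int dummy_seq_show(void *sf, void *v)
{
	(void)sf;
	(void)v;
	return -EINVAL;
}
```

In this model, an entry with no handlers ends up with mode 000, while the
same entry with only dummy_seq_show set ends up with mode 0444, which is
the effect the patch relies on.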

I have no issue with the patch. It would be great to hear from cgroup
maintainers whether a concept of default permissions is something that
would be useful also for other files.

> I'd be interested to hear:
> (a) do you think there is a less invasive way? Alternatively, we could
> add a flag in cftype in include/linux/cgroup-defs.h, but that seems
> more invasive for what is a legacy interface.
> (b) would you be interested to take this patch, or is it too niche a fix
> for a legacy subsystem?

After you add your s-o-b, feel free to add
Acked-by: Michal Hocko <[email protected]>

If cgroup people find a concept of default permissions for a cgroup file
sound then this could be replaced by that approach but this is really an
easy workaround.
> ---
> mm/memcontrol.c | 11 +++++++++++
> 1 file changed, 11 insertions(+)
>
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 5abffe6f8389..e48c749d9724 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -3734,6 +3734,16 @@ static u64 mem_cgroup_read_u64(struct cgroup_subsys_state *css,
> }
> }
>
> +/*
> + * This function doesn't do anything useful. Its only job is to provide a read
> + * handler so that the file gets read permissions when it's created.

I would just reference cgroup_file_mode() in the comment to make our
lives easier and the comment more helpful.

> + */
> +static int mem_cgroup_dummy_seq_show(__always_unused struct seq_file *m,
> + __always_unused void *v)
> +{
> + return -EINVAL;
> +}
> +
> #ifdef CONFIG_MEMCG_KMEM
> static int memcg_online_kmem(struct mem_cgroup *memcg)
> {
> @@ -5064,6 +5074,7 @@ static struct cftype mem_cgroup_legacy_files[] = {
> },
> {
> .name = "pressure_level",
> + .seq_show = mem_cgroup_dummy_seq_show,
> },
> #ifdef CONFIG_NUMA
> {
> --
> 2.32.0

--
Michal Hocko
SUSE Labs

2023-03-22 16:05:39

by Florian Schmidt

Subject: Re: [RFC] memcg v1: provide read access to memory.pressure_level



On 22/03/2023 15:57, Michal Hocko wrote:
> On Wed 22-03-23 14:25:25, Florian Schmidt wrote:
>> cgroups v1 has a unique way of setting up memory pressure notifications:
>> the user opens "memory.pressure_level" of the cgroup they want to
>> monitor for pressure, then opens "cgroup.event_control" and writes the fd
>> (among other things) to that file. memory.pressure_level has no other
>> use, specifically it does not support any read or write operations.
>> Consequently, no handlers are provided, and the file ends up with
>> permissions 000. However, to actually use the mechanism, the subscribing
>> user must have read access to the file and open the fd for reading, see
>> memcg_write_event_control().
>>
>> This is all fine as long as the subscribing process runs as root and is
>> otherwise unconfined by further restrictions. However, if you add strict
>> access controls such as selinux, the permission bits will be enforced,
>> and opening memory.pressure_level for reading will fail, preventing the
>> process from subscribing, even as root.
>>
>>
>> There are several ways around this issue, but adding a dummy read
>> handler seems like the least invasive to me.
>
> I was struggling to see how that addresses the problem because all you
> need is a read permission. But then I've looked into cgroup code and
> learned that permissions are constructed based on available callbacks
> (cgroup_file_mode). This would have made the review easier ;)

Oh, sorry, I forgot to mention that salient detail!
I didn't check whether that was a common pattern or not...


>
> I have no issue with the patch. It would be great to hear from cgroup
> maintainers whether a concept of default permissions is something that
> would be useful also for other files.
>
>> I'd be interested to hear:
>> (a) do you think there is a less invasive way? Alternatively, we could
>> add a flag in cftype in include/linux/cgroup-defs.h, but that seems
>> more invasive for what is a legacy interface.
>> (b) would you be interested to take this patch, or is it too niche a fix
>> for a legacy subsystem?
>
> After you add your s-o-b, feel free to add
> Acked-by: Michal Hocko <[email protected]>
>
> If cgroup people find a concept of default permissions for a cgroup file
> sound then this could be replaced by that approach but this is really an
> easy workaround.

Will do, once I know the path forward and construct a proper commit
message, I'll add the s-o-b and ack.

>> ---
>> mm/memcontrol.c | 11 +++++++++++
>> 1 file changed, 11 insertions(+)
>>
>> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
>> index 5abffe6f8389..e48c749d9724 100644
>> --- a/mm/memcontrol.c
>> +++ b/mm/memcontrol.c
>> @@ -3734,6 +3734,16 @@ static u64 mem_cgroup_read_u64(struct cgroup_subsys_state *css,
>> }
>> }
>>
>> +/*
>> + * This function doesn't do anything useful. Its only job is to provide a read
>> + * handler so that the file gets read permissions when it's created.
>
> I would just reference cgroup_file_mode() in the comment to make our
> lives easier and the comment more helpful.

Ack.


>
>> + */
>> +static int mem_cgroup_dummy_seq_show(__always_unused struct seq_file *m,
>> + __always_unused void *v)
>> +{
>> + return -EINVAL;
>> +}
>> +
>> #ifdef CONFIG_MEMCG_KMEM
>> static int memcg_online_kmem(struct mem_cgroup *memcg)
>> {
>> @@ -5064,6 +5074,7 @@ static struct cftype mem_cgroup_legacy_files[] = {
>> },
>> {
>> .name = "pressure_level",
>> + .seq_show = mem_cgroup_dummy_seq_show,
>> },
>> #ifdef CONFIG_NUMA
>> {
>> --
>> 2.32.0
>

2023-03-24 15:13:40

by Michal Koutný

Subject: Re: [RFC] memcg v1: provide read access to memory.pressure_level

Hello.

On Wed, Mar 22, 2023 at 02:25:25PM +0000, Florian Schmidt <[email protected]> wrote:
> cgroups v1 has a unique way of setting up memory pressure notifications:
...
> There are several ways around this issue, but adding a dummy read
> handler seems like the least invasive to me. I'd be interested to hear:
> (a) do you think there is a less invasive way? Alternatively, we could
> add a flag in cftype in include/linux/cgroup-defs.h, but that seems
> more invasive for what is a legacy interface.

You can (as privileged user) modify file perms in userspace first (e.g.
chmod o+r memory.pressure_level) and then it can be used by non-privileged
users. (Or do LSM prevent you from that too?)
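
The userspace workaround suggested above could also be done from a
privileged helper program instead of the shell. A minimal sketch, assuming
a hypothetical add_other_read() helper that is handed the full path to
memory.pressure_level; whether an LSM policy allows the chmod is exactly
the open question raised here and is not something the code can answer.

```c
#include <sys/stat.h>

/*
 * Hypothetical helper: add the "other" read bit to path, like
 * `chmod o+r path`. Returns 0 on success, -1 on failure (errno is set).
 */
int add_other_read(const char *path)
{
	struct stat st;

	if (stat(path, &st) != 0)
		return -1;
	/* Keep the existing permission bits and add o+r. */
	return chmod(path, (st.st_mode & 07777) | S_IROTH);
}
```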

> (b) would you be interested to take this patch, or is it too niche a fix
> for a legacy subsystem?

I'd rather not extend this "unique way" with additionally unique dummy
helpers.

My 0.02 €,
Michal


2023-03-27 14:08:20

by Florian Schmidt

Subject: Re: [RFC] memcg v1: provide read access to memory.pressure_level

Hi Michal,

On 24/03/2023 15:03, Michal Koutný wrote:
> On Wed, Mar 22, 2023 at 02:25:25PM +0000, Florian Schmidt <[email protected]> wrote:
>> cgroups v1 has a unique way of setting up memory pressure notifications:
> ...
>> There are several ways around this issue, but adding a dummy read
>> handler seems like the least invasive to me. I'd be interested to hear:
>> (a) do you think there is a less invasive way? Alternatively, we could
>> add a flag in cftype in include/linux/cgroup-defs.h, but that seems
>> more invasive for what is a legacy interface.
>
> You can (as privileged user) modify file perms in userspace first (e.g.
> chmod o+r memory.pressure_level) and then it can be used by non-privileged
> users. (Or do LSM prevent you from that too?)

That's true, we can work around this in userspace (though it means you
need to give the process additional permissions, to change file
permissions on top of just reading and writing).

Though considering that the memcg_write_event_control() explicitly
checks whether the caller has read permissions on pressure_level, it
felt sensible to me that the file would be created with read permissions
in the first place, just like all the other files are created with
permissions that are suitable for their immediate use without having to
manually change permissions. The current implementation feels
inconsistent in that way.


>> (b) would you be interested to take this patch, or is it too niche a fix
>> for a legacy subsystem?
>
> I'd rather not extend this "unique way" with additionally unique dummy
> helpers.

I understand that this is all code that has no modern user any more,
which is why I tried to keep the fix as self-contained as possible.
Another option would be to have a special handler in cgroup_file_mode(),
but that feels a lot klunkier to me, and leaks a v1-specific behaviour
into the shared cgroup code.


Cheers,
Florian

2023-03-27 20:53:24

by Michal Hocko

Subject: Re: [RFC] memcg v1: provide read access to memory.pressure_level

On Mon 27-03-23 14:59:37, Florian Schmidt wrote:
> Hi Michal,
>
> On 24/03/2023 15:03, Michal Koutný wrote:
> > On Wed, Mar 22, 2023 at 02:25:25PM +0000, Florian Schmidt <[email protected]> wrote:
[...]
> > > (b) would you be interested to take this patch, or is it too niche a fix
> > > for a legacy subsystem?
> >
> > I'd rather not extend this "unique way" with additionally unique dummy
> > helpers.
>
> I understand that this is all code that has no modern user any more, which
> is why I tried to keep the fix as self-contained as possible.
> Another option would be to have a special handler in cgroup_file_mode(), but
> that feels a lot klunkier to me, and leaks a v1-specific behaviour into the
> shared cgroup code.

Yes, this is effectively a deprecated interface but I do agree that we
shouldn't really make the lives of users more complicated than necessary. If
the simplest solution to address this is to provide an empty callback
then so be it. I am not sure but I do not think there are other cgroup
interfaces to warrant a more generic solution.

--
Michal Hocko
SUSE Labs

2023-04-04 08:47:11

by Florian Schmidt

Subject: Re: [RFC] memcg v1: provide read access to memory.pressure_level

Hi all,

to summarise, I've heard generally positive feedback from Michal H and
some more reserved, but not fundamentally opposed feedback from Michal
K. Thanks to both of you.

Since there's been no other feedback for the last few days, I'll raise a
proper patch, and any potential further discussion can then be done on that.

Cheers,
Florian