LinuxLists.cc - [PATCH] cgroup: pids: use {READ,WRITE}

2019-10-12 01:07:14

Subject: [PATCH] cgroup: pids: use {READ,WRITE}_ONCE for pids->limit operations

Because pids->limit can be changed concurrently (but we don't want to
take a lock because it would be needlessly expensive), use the
appropriate memory barriers.

Fixes: commit 49b786ea146f ("cgroup: implement the PIDs subsystem")
Cc: [email protected] # v4.3+
Signed-off-by: Aleksa Sarai <[email protected]>
---
kernel/cgroup/pids.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/kernel/cgroup/pids.c b/kernel/cgroup/pids.c
index 8e513a573fe9..a726e4a20177 100644
--- a/kernel/cgroup/pids.c
+++ b/kernel/cgroup/pids.c
@@ -152,7 +152,7 @@ static int pids_try_charge(struct pids_cgroup *pids, int num)
* p->limit is %PIDS_MAX then we know that this test will never
* fail.
*/
- if (new > p->limit)
+ if (new > READ_ONCE(p->limit))
goto revert;
}

@@ -277,7 +277,7 @@ static ssize_t pids_max_write(struct kernfs_open_file *of, char *buf,
* Limit updates don't need to be mutex'd, since it isn't
* critical that any racing fork()s follow the new limit.
*/
- pids->limit = limit;
+ WRITE_ONCE(pids->limit, limit);
return nbytes;
}

@@ -285,7 +285,7 @@ static int pids_max_show(struct seq_file *sf, void *v)
{
struct cgroup_subsys_state *css = seq_css(sf);
struct pids_cgroup *pids = css_pids(css);
- int64_t limit = pids->limit;
+ int64_t limit = READ_ONCE(pids->limit);

if (limit >= PIDS_MAX)
seq_printf(sf, "%s\n", PIDS_MAX_STR);
--
2.23.0

2019-10-14 16:22:04

by Tejun Heo

[permalink] [raw]

Subject: Re: [PATCH] cgroup: pids: use {READ,WRITE}_ONCE for pids->limit operations

On Sat, Oct 12, 2019 at 12:05:39PM +1100, Aleksa Sarai wrote:
> Because pids->limit can be changed concurrently (but we don't want to
> take a lock because it would be needlessly expensive), use the
> appropriate memory barriers.

I can't quite tell what problem it's fixing. Can you elaborate a
scenario where the current code would break that your patch fixes?

Thanks.

--
tejun

2019-10-14 16:33:38

by Aleksa Sarai

[permalink] [raw]

Subject: Re: [PATCH] cgroup: pids: use {READ,WRITE}_ONCE for pids->limit operations

On 2019-10-14, Tejun Heo <[email protected]> wrote:
> On Sat, Oct 12, 2019 at 12:05:39PM +1100, Aleksa Sarai wrote:
> > Because pids->limit can be changed concurrently (but we don't want to
> > take a lock because it would be needlessly expensive), use the
> > appropriate memory barriers.
>
> I can't quite tell what problem it's fixing. Can you elaborate a
> scenario where the current code would break that your patch fixes?

As far as I can tell, not using *_ONCE() here means that if you had a
process changing pids->limit from A to B, a process might be able to
temporarily exceed pids->limit -- because pids->limit accesses are not
protected by mutexes and the C compiler can produce confusing
intermediate values for pids->limit[1].

But this is more of a correctness fix than one fixing an actually
exploitable bug -- given the kernel memory model work, it seems like a
good idea to just use READ_ONCE() and WRITE_ONCE() for shared memory
access.

[1]: https://github.com/google/ktsan/wiki/READ_ONCE-and-WRITE_ONCE

--
Aleksa Sarai
Senior Software Engineer (Containers)
SUSE Linux GmbH
<https://www.cyphar.com/>

Attachments:

(No filename) (1.12 kB)
signature.asc (235.00 B)
Download all attachments

2019-10-14 17:02:47

by Tejun Heo

[permalink] [raw]

Subject: Re: [PATCH] cgroup: pids: use {READ,WRITE}_ONCE for pids->limit operations

Hello, Aleksa.

On Tue, Oct 15, 2019 at 02:59:31AM +1100, Aleksa Sarai wrote:
> On 2019-10-14, Tejun Heo <[email protected]> wrote:
> > On Sat, Oct 12, 2019 at 12:05:39PM +1100, Aleksa Sarai wrote:
> > > Because pids->limit can be changed concurrently (but we don't want to
> > > take a lock because it would be needlessly expensive), use the
> > > appropriate memory barriers.
> >
> > I can't quite tell what problem it's fixing. Can you elaborate a
> > scenario where the current code would break that your patch fixes?
>
> As far as I can tell, not using *_ONCE() here means that if you had a
> process changing pids->limit from A to B, a process might be able to
> temporarily exceed pids->limit -- because pids->limit accesses are not
> protected by mutexes and the C compiler can produce confusing
> intermediate values for pids->limit[1].
>
> But this is more of a correctness fix than one fixing an actually
> exploitable bug -- given the kernel memory model work, it seems like a
> good idea to just use READ_ONCE() and WRITE_ONCE() for shared memory
> access.

READ/WRITE_ONCE provides protection against compiler generating
multiple accesses for a single operation. It won't prevent split
writes / reads of 64bit variables on 32bit machines. For that, you'd
have to switch them to atomic64_t's.

Thanks.

--
tejun

2019-10-16 12:57:56

by Aleksa Sarai

[permalink] [raw]

Subject: Re: [PATCH] cgroup: pids: use {READ,WRITE}_ONCE for pids->limit operations

On 2019-10-14, Tejun Heo <[email protected]> wrote:
> Hello, Aleksa.
>
> On Tue, Oct 15, 2019 at 02:59:31AM +1100, Aleksa Sarai wrote:
> > On 2019-10-14, Tejun Heo <[email protected]> wrote:
> > > On Sat, Oct 12, 2019 at 12:05:39PM +1100, Aleksa Sarai wrote:
> > > > Because pids->limit can be changed concurrently (but we don't want to
> > > > take a lock because it would be needlessly expensive), use the
> > > > appropriate memory barriers.
> > >
> > > I can't quite tell what problem it's fixing. Can you elaborate a
> > > scenario where the current code would break that your patch fixes?
> >
> > As far as I can tell, not using *_ONCE() here means that if you had a
> > process changing pids->limit from A to B, a process might be able to
> > temporarily exceed pids->limit -- because pids->limit accesses are not
> > protected by mutexes and the C compiler can produce confusing
> > intermediate values for pids->limit[1].
> >
> > But this is more of a correctness fix than one fixing an actually
> > exploitable bug -- given the kernel memory model work, it seems like a
> > good idea to just use READ_ONCE() and WRITE_ONCE() for shared memory
> > access.
>
> READ/WRITE_ONCE provides protection against compiler generating
> multiple accesses for a single operation. It won't prevent split
> writes / reads of 64bit variables on 32bit machines. For that, you'd
> have to switch them to atomic64_t's.

Maybe I'm misunderstanding Documentation/atomic_t.txt, but it looks to
me like it's explicitly saying that I shouldn't use atomic64_t if I'm
just using it for fetching and assignment.

> The non-RMW ops are (typically) regular LOADs and STOREs and are
> canonically implemented using READ_ONCE(), WRITE_ONCE(),
> smp_load_acquire() and smp_store_release() respectively. Therefore, if
> you find yourself only using the Non-RMW operations of atomic_t, you
> do not in fact need atomic_t at all and are doing it wrong.

As for 64-bit on 32-bit machines -- that is a separate issue, but from
[1] it seems to me like there are more problems that *_ONCE() fixes than
just split reads and writes.

[1]: https://lwn.net/Articles/793253/

--
Aleksa Sarai
Senior Software Engineer (Containers)
SUSE Linux GmbH
<https://www.cyphar.com/>

Attachments:

(No filename) (2.23 kB)
signature.asc (235.00 B)
Download all attachments

2019-10-16 16:24:33

by Tejun Heo

[permalink] [raw]

Subject: Re: [PATCH] cgroup: pids: use {READ,WRITE}_ONCE for pids->limit operations

Hello, Aleksa.

On Wed, Oct 16, 2019 at 07:32:19PM +1100, Aleksa Sarai wrote:
> Maybe I'm misunderstanding Documentation/atomic_t.txt, but it looks to
> me like it's explicitly saying that I shouldn't use atomic64_t if I'm
> just using it for fetching and assignment.

Hah, where is it saying that? The alternative would be seqlock or
u64_stats or straight-up locking but idk for this atomic64_t should be
fine.

> > The non-RMW ops are (typically) regular LOADs and STOREs and are
> > canonically implemented using READ_ONCE(), WRITE_ONCE(),
> > smp_load_acquire() and smp_store_release() respectively. Therefore, if
> > you find yourself only using the Non-RMW operations of atomic_t, you
> > do not in fact need atomic_t at all and are doing it wrong.
>
> As for 64-bit on 32-bit machines -- that is a separate issue, but from
> [1] it seems to me like there are more problems that *_ONCE() fixes than
> just split reads and writes.

Your explanations are too wishy washy. If you wanna fix it, please do
it correctly. R/W ONCE isn't the right solution here.

Thanks.

--
tejun

2019-10-16 17:10:31

by Aleksa Sarai

[permalink] [raw]

Subject: Re: [PATCH] cgroup: pids: use {READ,WRITE}_ONCE for pids->limit operations

On 2019-10-16, Tejun Heo <[email protected]> wrote:
> Hello, Aleksa.
>
> On Wed, Oct 16, 2019 at 07:32:19PM +1100, Aleksa Sarai wrote:
> > Maybe I'm misunderstanding Documentation/atomic_t.txt, but it looks to
> > me like it's explicitly saying that I shouldn't use atomic64_t if I'm
> > just using it for fetching and assignment.
>
> Hah, where is it saying that?

Isn't that what this says:

> Therefore, if you find yourself only using the Non-RMW operations of
> atomic_t, you do not in fact need atomic_t at all and are doing it
> wrong.

Doesn't using just atomic64_read() and atomic64_set() fall under "only
using the non-RMW operations of atomic_t"? But yes, I agree that any
locking is overkill.

> > As for 64-bit on 32-bit machines -- that is a separate issue, but from
> > [1] it seems to me like there are more problems that *_ONCE() fixes than
> > just split reads and writes.
>
> Your explanations are too wishy washy. If you wanna fix it, please do
> it correctly. R/W ONCE isn't the right solution here.

Sure, I will switch it to use atomic64_read() and atomic64_set() instead
if that's what you'd prefer. Though I will mention that on quite a few
architectures atomic64_read() is defined as:

#define atomic64_read(v) READ_ONCE((v)->counter)

--
Aleksa Sarai
Senior Software Engineer (Containers)
SUSE Linux GmbH
<https://www.cyphar.com/>

Attachments:

(No filename) (1.37 kB)
signature.asc (235.00 B)
Download all attachments

2019-10-16 22:21:32

by Aleksa Sarai

[permalink] [raw]

Subject: Re: [PATCH] cgroup: pids: use {READ,WRITE}_ONCE for pids->limit operations

On 2019-10-17, Aleksa Sarai <[email protected]> wrote:
> On 2019-10-16, Tejun Heo <[email protected]> wrote:
> > Hello, Aleksa.
> >
> > On Wed, Oct 16, 2019 at 07:32:19PM +1100, Aleksa Sarai wrote:
> > > Maybe I'm misunderstanding Documentation/atomic_t.txt, but it looks to
> > > me like it's explicitly saying that I shouldn't use atomic64_t if I'm
> > > just using it for fetching and assignment.
> >
> > Hah, where is it saying that?
>
> Isn't that what this says:
>
> > Therefore, if you find yourself only using the Non-RMW operations of
> > atomic_t, you do not in fact need atomic_t at all and are doing it
> > wrong.
>
> Doesn't using just atomic64_read() and atomic64_set() fall under "only
> using the non-RMW operations of atomic_t"? But yes, I agree that any
> locking is overkill.
>
> > > As for 64-bit on 32-bit machines -- that is a separate issue, but from
> > > [1] it seems to me like there are more problems that *_ONCE() fixes than
> > > just split reads and writes.
> >
> > Your explanations are too wishy washy. If you wanna fix it, please do
> > it correctly. R/W ONCE isn't the right solution here.
>
> Sure, I will switch it to use atomic64_read() and atomic64_set() instead
> if that's what you'd prefer. Though I will mention that on quite a few
> architectures atomic64_read() is defined as:
>
> #define atomic64_read(v) READ_ONCE((v)->counter)

Though I guess that's because on those architectures it turns out that
READ_ONCE is properly atomic?

--
Aleksa Sarai
Senior Software Engineer (Containers)
SUSE Linux GmbH
<https://www.cyphar.com/>

Attachments:

(No filename) (1.59 kB)
signature.asc (235.00 B)
Download all attachments

2019-10-16 22:21:32

by Tejun Heo

[permalink] [raw]

Subject: Re: [PATCH] cgroup: pids: use {READ,WRITE}_ONCE for pids->limit operations

Hello,

On Thu, Oct 17, 2019 at 02:29:46AM +1100, Aleksa Sarai wrote:
> > Hah, where is it saying that?
>
> Isn't that what this says:
>
> > Therefore, if you find yourself only using the Non-RMW operations of
> > atomic_t, you do not in fact need atomic_t at all and are doing it
> > wrong.
>
> Doesn't using just atomic64_read() and atomic64_set() fall under "only
> using the non-RMW operations of atomic_t"? But yes, I agree that any
> locking is overkill.

Yeah, I mean, it's an overkill. We can use seqlock or u64_stat here
but it doesn't matter that much.

> > > As for 64-bit on 32-bit machines -- that is a separate issue, but from
> > > [1] it seems to me like there are more problems that *_ONCE() fixes than
> > > just split reads and writes.
> >
> > Your explanations are too wishy washy. If you wanna fix it, please do
> > it correctly. R/W ONCE isn't the right solution here.
>
> Sure, I will switch it to use atomic64_read() and atomic64_set() instead
> if that's what you'd prefer. Though I will mention that on quite a few
> architectures atomic64_read() is defined as:
>
> #define atomic64_read(v) READ_ONCE((v)->counter)

Yeah, on archs which don't have split access on 64bits. On the ones
which do, it does something else. The generic implementation is
straight-up locking, I think.

Thanks.

--
tejun

2019-10-16 22:22:25

by Tejun Heo

[permalink] [raw]

Subject: Re: [PATCH] cgroup: pids: use {READ,WRITE}_ONCE for pids->limit operations

On Thu, Oct 17, 2019 at 02:35:20AM +1100, Aleksa Sarai wrote:
> > Sure, I will switch it to use atomic64_read() and atomic64_set() instead
> > if that's what you'd prefer. Though I will mention that on quite a few
> > architectures atomic64_read() is defined as:
> >
> > #define atomic64_read(v) READ_ONCE((v)->counter)
>
> Though I guess that's because on those architectures it turns out that
> READ_ONCE is properly atomic?

Oh yeah, on archs where 64bit accesses are atomic, READ_ONCE() /
WRITE_ONCE() would work here. If the limit variable were ulong
instead of an explicit 64bit variable, RW ONCE would work too as ulong
accesses are atomic on all archs IIRC.

Thanks.

--
tejun