Because pids->limit can be changed concurrently (but we don't want to
take a lock because it would be needlessly expensive), use the
appropriate memory barriers.
Fixes: commit 49b786ea146f ("cgroup: implement the PIDs subsystem")
Cc: [email protected] # v4.3+
Signed-off-by: Aleksa Sarai <[email protected]>
---
kernel/cgroup/pids.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/kernel/cgroup/pids.c b/kernel/cgroup/pids.c
index 8e513a573fe9..a726e4a20177 100644
--- a/kernel/cgroup/pids.c
+++ b/kernel/cgroup/pids.c
@@ -152,7 +152,7 @@ static int pids_try_charge(struct pids_cgroup *pids, int num)
* p->limit is %PIDS_MAX then we know that this test will never
* fail.
*/
- if (new > p->limit)
+ if (new > READ_ONCE(p->limit))
goto revert;
}
@@ -277,7 +277,7 @@ static ssize_t pids_max_write(struct kernfs_open_file *of, char *buf,
* Limit updates don't need to be mutex'd, since it isn't
* critical that any racing fork()s follow the new limit.
*/
- pids->limit = limit;
+ WRITE_ONCE(pids->limit, limit);
return nbytes;
}
@@ -285,7 +285,7 @@ static int pids_max_show(struct seq_file *sf, void *v)
{
struct cgroup_subsys_state *css = seq_css(sf);
struct pids_cgroup *pids = css_pids(css);
- int64_t limit = pids->limit;
+ int64_t limit = READ_ONCE(pids->limit);
if (limit >= PIDS_MAX)
seq_printf(sf, "%s\n", PIDS_MAX_STR);
--
2.23.0
On Sat, Oct 12, 2019 at 12:05:39PM +1100, Aleksa Sarai wrote:
> Because pids->limit can be changed concurrently (but we don't want to
> take a lock because it would be needlessly expensive), use the
> appropriate memory barriers.
I can't quite tell what problem it's fixing. Can you elaborate a
scenario where the current code would break that your patch fixes?
Thanks.
--
tejun
On 2019-10-14, Tejun Heo <[email protected]> wrote:
> On Sat, Oct 12, 2019 at 12:05:39PM +1100, Aleksa Sarai wrote:
> > Because pids->limit can be changed concurrently (but we don't want to
> > take a lock because it would be needlessly expensive), use the
> > appropriate memory barriers.
>
> I can't quite tell what problem it's fixing. Can you elaborate a
> scenario where the current code would break that your patch fixes?
As far as I can tell, not using *_ONCE() here means that if you had a
process changing pids->limit from A to B, a process might be able to
temporarily exceed pids->limit -- because pids->limit accesses are not
protected by mutexes and the C compiler can produce confusing
intermediate values for pids->limit[1].
But this is more of a correctness fix than one fixing an actually
exploitable bug -- given the kernel memory model work, it seems like a
good idea to just use READ_ONCE() and WRITE_ONCE() for shared memory
access.
[1]: https://github.com/google/ktsan/wiki/READ_ONCE-and-WRITE_ONCE
--
Aleksa Sarai
Senior Software Engineer (Containers)
SUSE Linux GmbH
<https://www.cyphar.com/>
Hello, Aleksa.
On Tue, Oct 15, 2019 at 02:59:31AM +1100, Aleksa Sarai wrote:
> On 2019-10-14, Tejun Heo <[email protected]> wrote:
> > On Sat, Oct 12, 2019 at 12:05:39PM +1100, Aleksa Sarai wrote:
> > > Because pids->limit can be changed concurrently (but we don't want to
> > > take a lock because it would be needlessly expensive), use the
> > > appropriate memory barriers.
> >
> > I can't quite tell what problem it's fixing. Can you elaborate a
> > scenario where the current code would break that your patch fixes?
>
> As far as I can tell, not using *_ONCE() here means that if you had a
> process changing pids->limit from A to B, a process might be able to
> temporarily exceed pids->limit -- because pids->limit accesses are not
> protected by mutexes and the C compiler can produce confusing
> intermediate values for pids->limit[1].
>
> But this is more of a correctness fix than one fixing an actually
> exploitable bug -- given the kernel memory model work, it seems like a
> good idea to just use READ_ONCE() and WRITE_ONCE() for shared memory
> access.
READ/WRITE_ONCE provides protection against compiler generating
multiple accesses for a single operation. It won't prevent split
writes / reads of 64bit variables on 32bit machines. For that, you'd
have to switch them to atomic64_t's.
Thanks.
--
tejun
On 2019-10-14, Tejun Heo <[email protected]> wrote:
> Hello, Aleksa.
>
> On Tue, Oct 15, 2019 at 02:59:31AM +1100, Aleksa Sarai wrote:
> > On 2019-10-14, Tejun Heo <[email protected]> wrote:
> > > On Sat, Oct 12, 2019 at 12:05:39PM +1100, Aleksa Sarai wrote:
> > > > Because pids->limit can be changed concurrently (but we don't want to
> > > > take a lock because it would be needlessly expensive), use the
> > > > appropriate memory barriers.
> > >
> > > I can't quite tell what problem it's fixing. Can you elaborate a
> > > scenario where the current code would break that your patch fixes?
> >
> > As far as I can tell, not using *_ONCE() here means that if you had a
> > process changing pids->limit from A to B, a process might be able to
> > temporarily exceed pids->limit -- because pids->limit accesses are not
> > protected by mutexes and the C compiler can produce confusing
> > intermediate values for pids->limit[1].
> >
> > But this is more of a correctness fix than one fixing an actually
> > exploitable bug -- given the kernel memory model work, it seems like a
> > good idea to just use READ_ONCE() and WRITE_ONCE() for shared memory
> > access.
>
> READ/WRITE_ONCE provides protection against compiler generating
> multiple accesses for a single operation. It won't prevent split
> writes / reads of 64bit variables on 32bit machines. For that, you'd
> have to switch them to atomic64_t's.
Maybe I'm misunderstanding Documentation/atomic_t.txt, but it looks to
me like it's explicitly saying that I shouldn't use atomic64_t if I'm
just using it for fetching and assignment.
> The non-RMW ops are (typically) regular LOADs and STOREs and are
> canonically implemented using READ_ONCE(), WRITE_ONCE(),
> smp_load_acquire() and smp_store_release() respectively. Therefore, if
> you find yourself only using the Non-RMW operations of atomic_t, you
> do not in fact need atomic_t at all and are doing it wrong.
As for 64-bit on 32-bit machines -- that is a separate issue, but from
[1] it seems to me like there are more problems that *_ONCE() fixes than
just split reads and writes.
[1]: https://lwn.net/Articles/793253/
--
Aleksa Sarai
Senior Software Engineer (Containers)
SUSE Linux GmbH
<https://www.cyphar.com/>
Hello, Aleksa.
On Wed, Oct 16, 2019 at 07:32:19PM +1100, Aleksa Sarai wrote:
> Maybe I'm misunderstanding Documentation/atomic_t.txt, but it looks to
> me like it's explicitly saying that I shouldn't use atomic64_t if I'm
> just using it for fetching and assignment.
Hah, where is it saying that? The alternative would be seqlock or
u64_stats or straight-up locking but idk for this atomic64_t should be
fine.
> > The non-RMW ops are (typically) regular LOADs and STOREs and are
> > canonically implemented using READ_ONCE(), WRITE_ONCE(),
> > smp_load_acquire() and smp_store_release() respectively. Therefore, if
> > you find yourself only using the Non-RMW operations of atomic_t, you
> > do not in fact need atomic_t at all and are doing it wrong.
>
> As for 64-bit on 32-bit machines -- that is a separate issue, but from
> [1] it seems to me like there are more problems that *_ONCE() fixes than
> just split reads and writes.
Your explanations are too wishy washy. If you wanna fix it, please do
it correctly. R/W ONCE isn't the right solution here.
Thanks.
--
tejun
On 2019-10-16, Tejun Heo <[email protected]> wrote:
> Hello, Aleksa.
>
> On Wed, Oct 16, 2019 at 07:32:19PM +1100, Aleksa Sarai wrote:
> > Maybe I'm misunderstanding Documentation/atomic_t.txt, but it looks to
> > me like it's explicitly saying that I shouldn't use atomic64_t if I'm
> > just using it for fetching and assignment.
>
> Hah, where is it saying that?
Isn't that what this says:
> Therefore, if you find yourself only using the Non-RMW operations of
> atomic_t, you do not in fact need atomic_t at all and are doing it
> wrong.
Doesn't using just atomic64_read() and atomic64_set() fall under "only
using the non-RMW operations of atomic_t"? But yes, I agree that any
locking is overkill.
> > As for 64-bit on 32-bit machines -- that is a separate issue, but from
> > [1] it seems to me like there are more problems that *_ONCE() fixes than
> > just split reads and writes.
>
> Your explanations are too wishy washy. If you wanna fix it, please do
> it correctly. R/W ONCE isn't the right solution here.
Sure, I will switch it to use atomic64_read() and atomic64_set() instead
if that's what you'd prefer. Though I will mention that on quite a few
architectures atomic64_read() is defined as:
#define atomic64_read(v) READ_ONCE((v)->counter)
--
Aleksa Sarai
Senior Software Engineer (Containers)
SUSE Linux GmbH
<https://www.cyphar.com/>
On 2019-10-17, Aleksa Sarai <[email protected]> wrote:
> On 2019-10-16, Tejun Heo <[email protected]> wrote:
> > Hello, Aleksa.
> >
> > On Wed, Oct 16, 2019 at 07:32:19PM +1100, Aleksa Sarai wrote:
> > > Maybe I'm misunderstanding Documentation/atomic_t.txt, but it looks to
> > > me like it's explicitly saying that I shouldn't use atomic64_t if I'm
> > > just using it for fetching and assignment.
> >
> > Hah, where is it saying that?
>
> Isn't that what this says:
>
> > Therefore, if you find yourself only using the Non-RMW operations of
> > atomic_t, you do not in fact need atomic_t at all and are doing it
> > wrong.
>
> Doesn't using just atomic64_read() and atomic64_set() fall under "only
> using the non-RMW operations of atomic_t"? But yes, I agree that any
> locking is overkill.
>
> > > As for 64-bit on 32-bit machines -- that is a separate issue, but from
> > > [1] it seems to me like there are more problems that *_ONCE() fixes than
> > > just split reads and writes.
> >
> > Your explanations are too wishy washy. If you wanna fix it, please do
> > it correctly. R/W ONCE isn't the right solution here.
>
> Sure, I will switch it to use atomic64_read() and atomic64_set() instead
> if that's what you'd prefer. Though I will mention that on quite a few
> architectures atomic64_read() is defined as:
>
> #define atomic64_read(v) READ_ONCE((v)->counter)
Though I guess that's because on those architectures it turns out that
READ_ONCE is properly atomic?
--
Aleksa Sarai
Senior Software Engineer (Containers)
SUSE Linux GmbH
<https://www.cyphar.com/>
Hello,
On Thu, Oct 17, 2019 at 02:29:46AM +1100, Aleksa Sarai wrote:
> > Hah, where is it saying that?
>
> Isn't that what this says:
>
> > Therefore, if you find yourself only using the Non-RMW operations of
> > atomic_t, you do not in fact need atomic_t at all and are doing it
> > wrong.
>
> Doesn't using just atomic64_read() and atomic64_set() fall under "only
> using the non-RMW operations of atomic_t"? But yes, I agree that any
> locking is overkill.
Yeah, I mean, it's an overkill. We can use seqlock or u64_stat here
but it doesn't matter that much.
> > > As for 64-bit on 32-bit machines -- that is a separate issue, but from
> > > [1] it seems to me like there are more problems that *_ONCE() fixes than
> > > just split reads and writes.
> >
> > Your explanations are too wishy washy. If you wanna fix it, please do
> > it correctly. R/W ONCE isn't the right solution here.
>
> Sure, I will switch it to use atomic64_read() and atomic64_set() instead
> if that's what you'd prefer. Though I will mention that on quite a few
> architectures atomic64_read() is defined as:
>
> #define atomic64_read(v) READ_ONCE((v)->counter)
Yeah, on archs which don't have split access on 64bits. On the ones
which do, it does something else. The generic implementation is
straight-up locking, I think.
Thanks.
--
tejun
On Thu, Oct 17, 2019 at 02:35:20AM +1100, Aleksa Sarai wrote:
> > Sure, I will switch it to use atomic64_read() and atomic64_set() instead
> > if that's what you'd prefer. Though I will mention that on quite a few
> > architectures atomic64_read() is defined as:
> >
> > #define atomic64_read(v) READ_ONCE((v)->counter)
>
> Though I guess that's because on those architectures it turns out that
> READ_ONCE is properly atomic?
Oh yeah, on archs where 64bit accesses are atomic, READ_ONCE() /
WRITE_ONCE() would work here. If the limit variable were ulong
instead of an explicit 64bit variable, RW ONCE would work too as ulong
accesses are atomic on all archs IIRC.
Thanks.
--
tejun