As raw_cpu_generic_read() is a plain read from a raw_cpu_ptr() address,
it's possible (albeit unlikely) that the compiler will split the access
across multiple instructions.
In this_cpu_generic_read() we disable preemption but not interrupts
before calling raw_cpu_generic_read(). Thus, an interrupt could be taken
in the middle of the split load instructions. If a this_cpu_write() or
RMW this_cpu_*() op is made to the same variable in the interrupt
handling path, this_cpu_read() will return a torn value.
Avoid this by using READ_ONCE() to inhibit tearing.
Signed-off-by: Mark Rutland <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Cc: Christoph Lameter <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Pranith Kumar <[email protected]>
Cc: Tejun Heo <[email protected]>
Cc: [email protected]
---
include/asm-generic/percpu.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/include/asm-generic/percpu.h b/include/asm-generic/percpu.h
index 0504ef8..79a8a58 100644
--- a/include/asm-generic/percpu.h
+++ b/include/asm-generic/percpu.h
@@ -67,7 +67,7 @@
#define raw_cpu_generic_read(pcp) \
({ \
- *raw_cpu_ptr(&(pcp)); \
+ READ_ONCE(*raw_cpu_ptr(&(pcp))); \
})
#define raw_cpu_generic_to_op(pcp, val, op) \
--
1.9.1
Hello, Mark.
On Mon, Sep 25, 2017 at 02:24:32PM +0100, Mark Rutland wrote:
> As raw_cpu_generic_read() is a plain read from a raw_cpu_ptr() address,
> it's possible (albeit unlikely) that the compiler will split the access
> across multiple instructions.
>
> In this_cpu_generic_read() we disable preemption but not interrupts
> before calling raw_cpu_generic_read(). Thus, an interrupt could be taken
> in the middle of the split load instructions. If a this_cpu_write() or
> RMW this_cpu_*() op is made to the same variable in the interrupt
> handling path, this_cpu_read() will return a torn value.
>
> Avoid this by using READ_ONCE() to inhibit tearing.
That's why there are irq-safe variants of the operations. Adding
READ_ONCE() doesn't generically guarantee that the reads won't be
split - e.g. there are arch which simply can't load a 64bit value with
a single instruction.
Thanks.
--
tejun
On Mon, Sep 25, 2017 at 08:18:27AM -0700, Tejun Heo wrote:
> Hello, Mark.
>
> On Mon, Sep 25, 2017 at 02:24:32PM +0100, Mark Rutland wrote:
> > As raw_cpu_generic_read() is a plain read from a raw_cpu_ptr() address,
> > it's possible (albeit unlikely) that the compiler will split the access
> > across multiple instructions.
> >
> > In this_cpu_generic_read() we disable preemption but not interrupts
> > before calling raw_cpu_generic_read(). Thus, an interrupt could be taken
> > in the middle of the split load instructions. If a this_cpu_write() or
> > RMW this_cpu_*() op is made to the same variable in the interrupt
> > handling path, this_cpu_read() will return a torn value.
> >
> > Avoid this by using READ_ONCE() to inhibit tearing.
>
> That's why there are irq-safe variants of the operations.
Unfortunately, the generic this_cpu_read(), which is intended to be
irq-safe, is not:
#define this_cpu_generic_read(pcp) \
({ \
typeof(pcp) __ret; \
preempt_disable_notrace(); \
__ret = raw_cpu_generic_read(pcp); \
preempt_enable_notrace(); \
__ret; \
})
I guess it'd be preferable to manipulate that in-place.
> Adding READ_ONCE() doesn't generically guarantee that the reads won't
> be split - e.g. there are arch which simply can't load a 64bit value
> with a single instruction.
True.
In which case, it really sounds like this_cpu_generic_read() needs to
disable interrupts too...
Thanks,
Mark.
Hello,
On Mon, Sep 25, 2017 at 04:33:02PM +0100, Mark Rutland wrote:
> Unfortunately, the generic this_cpu_read(), which is intended to be
> irq-safe, is not:
>
> #define this_cpu_generic_read(pcp) \
> ({ \
> typeof(pcp) __ret; \
> preempt_disable_notrace(); \
> __ret = raw_cpu_generic_read(pcp); \
> preempt_enable_notrace(); \
> __ret; \
> })
I see. Yeah, that looks like the bug there.
> I guess it'd be preferable to manipulate that in-place.
>
> > Adding READ_ONCE() doesn't generically guarantee that the reads won't
> > be split - e.g. there are arch which simply can't load a 64bit value
> > with a single instruction.
>
> In which case, it really sounds like this_cpu_generic_read() needs to
> disable interrupts too...
Can you please spin up a patch for this?
Thanks.
--
tejun
On Mon, 25 Sep 2017, Tejun Heo wrote:
> Hello,
>
> On Mon, Sep 25, 2017 at 04:33:02PM +0100, Mark Rutland wrote:
> > Unfortunately, the generic this_cpu_read(), which is intended to be
> > irq-safe, is not:
> >
> > #define this_cpu_generic_read(pcp) \
> > ({ \
> > typeof(pcp) __ret; \
> > preempt_disable_notrace(); \
> > __ret = raw_cpu_generic_read(pcp); \
> > preempt_enable_notrace(); \
> > __ret; \
> > })
>
> I see. Yeah, that looks like the bug there.
This is a single fetch operation of a value that needs to be atomic. It
really does not matter if an interrupt happens before or after that load
because it could also occur before or after the preempt_enable/disable
without the code being able to distinguish that case.
The fetch of a scalar value from memory is an atomic operation and that is
required from all arches. There is an exception for double word fetches.
Maybe we would need to special code that case but so far this does not
seem to have been an issue.
On Tue, 26 Sep 2017, Christopher Lameter wrote:
> On Mon, 25 Sep 2017, Tejun Heo wrote:
>
> > Hello,
> >
> > On Mon, Sep 25, 2017 at 04:33:02PM +0100, Mark Rutland wrote:
> > > Unfortunately, the generic this_cpu_read(), which is intended to be
> > > irq-safe, is not:
> > >
> > > #define this_cpu_generic_read(pcp) \
> > > ({ \
> > > typeof(pcp) __ret; \
> > > preempt_disable_notrace(); \
> > > __ret = raw_cpu_generic_read(pcp); \
> > > preempt_enable_notrace(); \
> > > __ret; \
> > > })
> >
> > I see. Yeah, that looks like the bug there.
>
> This is a single fetch operation of a value that needs to be atomic. It
> really does not matter if an interrupt happens before or after that load
> because it could also occur before or after the preempt_enable/disable
> without the code being able to distinguish that case.
>
> The fetch of a scalar value from memory is an atomic operation and that is
> required from all arches. There is an exception for double word fetches.
this_cpu_read_8() is a double word fetch on many 32bit architectures.
> Maybe we would need to special code that case but so far this does not
> seem to have been an issue.
Just because nobody ran into problem with that it is a non issue? That's
just hillarious.
It's obviously not correct and needs to be fixed _before_ someone has to go
through the pain of debugging such a problem.
Thanks,
tglx
On Tue, 26 Sep 2017, Thomas Gleixner wrote:
> > because it could also occur before or after the preempt_enable/disable
> > without the code being able to distinguish that case.
> >
> > The fetch of a scalar value from memory is an atomic operation and that is
> > required from all arches. There is an exception for double word fetches.
>
> this_cpu_read_8() is a double word fetch on many 32bit architectures.
Ok then this_cpu_read_8 for those platforms need to disable interrupts.
But that is not true for all arches.
> > Maybe we would need to special code that case but so far this does not
> > seem to have been an issue.
>
> Just because nobody ran into problem with that it is a non issue? That's
> just hillarious.
Its is even more (see your above statement) stupid to figure out that this
is actually unnecessary in the generic case and then continue the
argument.
> It's obviously not correct and needs to be fixed _before_ someone has to go
> through the pain of debugging such a problem.
As you pointed out the current approach *is* correct. Fetching a scalar
word is an atomic operation and must be one. If an arch cannot guarantee
that then arch specific measures need to be implemented to ensure that
this guarantee is kept. Could be done with interrupts disables, cmpxchg or
the reservation scheme on RISC processors. It definitely does not
belong into generic code.
Hi,
On Tue, Sep 26, 2017 at 01:47:27AM -0500, Christopher Lameter wrote:
> On Mon, 25 Sep 2017, Tejun Heo wrote:
> > On Mon, Sep 25, 2017 at 04:33:02PM +0100, Mark Rutland wrote:
> > > Unfortunately, the generic this_cpu_read(), which is intended to be
> > > irq-safe, is not:
> > >
> > > #define this_cpu_generic_read(pcp) \
> > > ({ \
> > > typeof(pcp) __ret; \
> > > preempt_disable_notrace(); \
> > > __ret = raw_cpu_generic_read(pcp); \
> > > preempt_enable_notrace(); \
> > > __ret; \
> > > })
> >
> > I see. Yeah, that looks like the bug there.
>
> This is a single fetch operation of a value that needs to be atomic. It
> really does not matter if an interrupt happens before or after that load
> because it could also occur before or after the preempt_enable/disable
> without the code being able to distinguish that case.
Unfortunately, this instance is not necessarily a single fetch operation,
given that the access is a plain C load from a pointer:
#define raw_cpu_generic_read(pcp) \
({ \
*raw_cpu_ptr(&(pcp)); \
})
... and thus the compiler is at liberty to tear the access across
multiple instructions. It probably won't, and it would almost certainly
be stupid to do so, but for better or worse we have no guarantee.
> The fetch of a scalar value from memory is an atomic operation and that is
> required from all arches.
Sure, where we ensure said fetch is a single access (e.g. with
READ_ONCE()). Otherwise the compiler is at liberty to tear the access
(e.g. if it fuses it with nearby accesses into a memcpy).
> There is an exception for double word fetches. Maybe we would need to
> special code that case but so far this does not seem to have been an
> issue.
I've sent a v2 which fixes both cases, only disabling interrupts for
those doubleword fetches, and otherwise using READ_ONCE() to prevent
tearing.
Thanks,
Mark.
On Tue, 26 Sep 2017, Christopher Lameter wrote:
> On Tue, 26 Sep 2017, Thomas Gleixner wrote:
>
> > > because it could also occur before or after the preempt_enable/disable
> > > without the code being able to distinguish that case.
> > >
> > > The fetch of a scalar value from memory is an atomic operation and that is
> > > required from all arches. There is an exception for double word fetches.
> >
> > this_cpu_read_8() is a double word fetch on many 32bit architectures.
>
> Ok then this_cpu_read_8 for those platforms need to disable interrupts.
> But that is not true for all arches.
>
> > > Maybe we would need to special code that case but so far this does not
> > > seem to have been an issue.
> >
> > Just because nobody ran into problem with that it is a non issue? That's
> > just hillarious.
>
> Its is even more (see your above statement) stupid to figure out that this
> is actually unnecessary in the generic case and then continue the
> argument.
>
> > It's obviously not correct and needs to be fixed _before_ someone has to go
> > through the pain of debugging such a problem.
>
> As you pointed out the current approach *is* correct. Fetching a scalar
> word is an atomic operation and must be one. If an arch cannot guarantee
> that then arch specific measures need to be implemented to ensure that
> this guarantee is kept. Could be done with interrupts disables, cmpxchg or
> the reservation scheme on RISC processors. It definitely does not
> belong into generic code.
Wrong. The default for 32bit architectures is that they CANNOT do atomic
fetch/write of 8 bytes. So instead of forcing that to be implemented in
every affected architecture this wants to be addressed in generic code and
those few 32bit architectures which can do atomic double word fetch/write
implement the magic functions to do so.
Thanks,
tglx
On Wed, Sep 27, 2017 at 11:01:13AM +0200, Thomas Gleixner wrote:
> The default for 32bit architectures is that they CANNOT do atomic
> fetch/write of 8 bytes.
Indeed.
In addition, even the native word size accesses weren't guaranteed to be
atomic, as we hadn't used the appropriate accessors to prevent the
compiler from messing with things behind our backs.
> So instead of forcing that to be implemented in every affected
> architecture this wants to be addressed in generic code and those few
> 32bit architectures which can do atomic double word fetch/write
> implement the magic functions to do so.
I've done this in v2 [1], which Tejun has queued [2] as a fix for v4.14.
Thanks,
Mark.
[1] https://lkml.kernel.org/r/[email protected]
[2] https://lkml.kernel.org/r/[email protected]