2012-10-30 15:35:09

by Frederic Weisbecker

Subject: [PATCH 0/2] irq_work: A couple fixes

Hi,

The first patch is extracted from my printk patches, with changelog
reworked. The second patch is an addition.

And I still wonder whether cpu_relax() is enough to make the compiler
correctly reload work->flags in the irq_work_sync() loop.
Do we need ACCESS_ONCE()?

Thanks.

Frederic Weisbecker (2):
irq_work: Fix racy check on work pending flag
irq_work: Fix racy IRQ_WORK_BUSY flag setting

kernel/irq_work.c | 29 ++++++++++++++++++++++-------
1 files changed, 22 insertions(+), 7 deletions(-)

--
1.7.5.4


2012-10-30 15:35:12

by Frederic Weisbecker

Subject: [PATCH 1/2] irq_work: Fix racy check on work pending flag

Work claiming semantics require this operation
to be SMP-safe.

So we want strict ordering between the data we want
the work to handle and the flags that describe the
work state, such that either we claim the work and
enqueue it so that it will see our data, or we fail
the claim but the CPU where the work is still pending
will see the data we prepared when it executes the work:

CPU 0                               CPU 1

data = something                    claim work
smp_mb()                            smp_mb()
try claim work                      execute work (data)

The early check for the pending flag in irq_work_claim()
fails to meet this ordering requirement. As a result,
when we fail to claim a work, it may already have been
processed by the CPU that previously owned it, leaving
our data unprocessed.

Discussing this with Steven Rostedt: we can either use
memory barriers to fix this, or rely on cmpxchg(), which
already implies a full barrier.

To fix this, we start by speculating about the value we
wish to find in work->flags, but we only draw conclusions
from the value returned by the cmpxchg() call, which either
claims the work or provides the ordering such that the CPU
where the work is pending handles our data.

Signed-off-by: Frederic Weisbecker <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Paul Gortmaker <[email protected]>
---
kernel/irq_work.c | 22 +++++++++++++++++-----
1 files changed, 17 insertions(+), 5 deletions(-)

diff --git a/kernel/irq_work.c b/kernel/irq_work.c
index 1588e3b..764240a 100644
--- a/kernel/irq_work.c
+++ b/kernel/irq_work.c
@@ -34,15 +34,27 @@ static DEFINE_PER_CPU(struct llist_head, irq_work_list);
  */
 static bool irq_work_claim(struct irq_work *work)
 {
-        unsigned long flags, nflags;
+        unsigned long flags, oflags, nflags;
 
+        /*
+         * Can't check IRQ_WORK_PENDING bit right now because the work
+         * can be running on another CPU and we need correct ordering
+         * between work flags and data to compute in work execution
+         * such that either we claim and we execute the work or the work
+         * is still pending on another CPU but it's guaranteed it will see
+         * our data when it executes the work.
+         * Start with our best wish as a premise but only deal with
+         * flags value for real after cmpxchg() ordering.
+         */
+        flags = work->flags & ~IRQ_WORK_PENDING;
         for (;;) {
-                flags = work->flags;
-                if (flags & IRQ_WORK_PENDING)
-                        return false;
                 nflags = flags | IRQ_WORK_FLAGS;
-                if (cmpxchg(&work->flags, flags, nflags) == flags)
+                oflags = cmpxchg(&work->flags, flags, nflags);
+                if (oflags == flags)
                         break;
+                if (oflags & IRQ_WORK_PENDING)
+                        return false;
+                flags = oflags;
                 cpu_relax();
         }

--
1.7.5.4

2012-10-30 15:35:19

by Frederic Weisbecker

Subject: [PATCH 2/2] irq_work: Fix racy IRQ_WORK_BUSY flag setting

The IRQ_WORK_BUSY flag is set right before we execute the
work. Once this flag value is set, the work enters a
claimable state again.

This is necessary because if we want to enqueue a work but we
fail the claim, we want to ensure that the CPU where that work
is still pending will see and handle the data we expected the
work to compute.

This might not work as expected though because IRQ_WORK_BUSY
isn't set atomically. By the time a CPU fails a work claim,
this work may well have been already executed by the CPU where
it was previously pending.

Due to the lack of an appropriate memory barrier, the
IRQ_WORK_BUSY flag value may not be visible to the CPU
trying to claim while the work is executing, until we
clear the busy bit in the work flags using cmpxchg(),
which implies a full barrier.

One solution could involve a full barrier between setting
IRQ_WORK_BUSY flag and the work execution. This way we
ensure that the work execution site sees the expected data
and the claim site sees the IRQ_WORK_BUSY:

CPU 0                                 CPU 1

data = something                      flags = IRQ_WORK_BUSY
smp_mb() (implicit with cmpxchg       smp_mb()
          on flags in claim)          execute_work (sees data from CPU 0)
try to claim

As a shortcut, let's just use xchg() that implies a full memory
barrier.

Signed-off-by: Frederic Weisbecker <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Paul Gortmaker <[email protected]>
---
kernel/irq_work.c | 7 +++++--
1 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/kernel/irq_work.c b/kernel/irq_work.c
index 764240a..ea79365 100644
--- a/kernel/irq_work.c
+++ b/kernel/irq_work.c
@@ -130,9 +130,12 @@ void irq_work_run(void)
 
                 /*
                  * Clear the PENDING bit, after this point the @work
-                 * can be re-used.
+                 * can be re-used. Use xchg to force ordering against
+                 * data to process, such that if claiming fails on
+                 * another CPU, we see and handle the data it wants
+                 * us to process on the work.
                  */
-                work->flags = IRQ_WORK_BUSY;
+                xchg(&work->flags, IRQ_WORK_BUSY);
                 work->func(work);
                 /*
                  * Clear the BUSY bit and return to the free state if
--
1.7.5.4

2012-10-30 15:52:35

by Steven Rostedt

Subject: Re: [PATCH 0/2] irq_work: A couple fixes

On Tue, 2012-10-30 at 16:34 +0100, Frederic Weisbecker wrote:
> Hi,

> And I still wonder whether cpu_relax() is enough to make the compiler
> correctly reload work->flags in the irq_work_sync() loop.
> Do we need ACCESS_ONCE()?

You mean this loop:

flags = work->flags & ~IRQ_WORK_PENDING;
for (;;) {
        nflags = flags | IRQ_WORK_FLAGS;
        oflags = cmpxchg(&work->flags, flags, nflags);
        if (oflags == flags)
                break;
        if (oflags & IRQ_WORK_PENDING)
                return false;
        flags = oflags;
        cpu_relax();
}

After the first loading of work->flags, you are worried about the
&work->flags in the cmpxchg()? The cmpxchg() will handle that itself. I
don't see any place where an ACCESS_ONCE() is required here. The cmpxchg()
acts on the address of work->flags; the compiler isn't involved with the
value at that address.
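
For reference, ACCESS_ONCE() at the time was simply a forced volatile
access (a sketch of the include/linux/compiler.h definition of that era):

/* Force the compiler to read/write x exactly once through a volatile cast. */
#define ACCESS_ONCE(x) (*(volatile typeof(x) *)&(x))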

-- Steve

2012-10-30 15:53:51

by Steven Rostedt

Subject: Re: [PATCH 1/2] irq_work: Fix racy check on work pending flag

On Tue, 2012-10-30 at 16:35 +0100, Frederic Weisbecker wrote:
> Work claiming semantics require this operation
> to be SMP-safe.
>
> So we want a strict ordering between the data we
> want the work to handle and the flags that describe
> the work state such that either we claim and we enqueue
> the work that will see our data or we fail the claim
> but the CPU where the work is still pending will
> see the data we prepared when it executes the work:
>
> CPU 0 CPU 1
>
> data = something claim work
> smp_mb() smp_mb()
> try claim work execute work (data)
>
> The early check for the pending flag in irq_work_claim()
> fails to meet this ordering requirement. As a result,
> when we fail to claim a work, it may have been processed
> by CPU that were previously owning it already, leaving
> our data unprocessed.
>
> Discussing this with Steven Rostedt, we can use memory
> barriers to fix this or we can rely on cmpxchg() that
> implies full barrier already.
>
> To fix this, we start by speculating about the value we
> wish to be in the work->flags but we only make any conclusion
> after the value returned by the cmpxchg() call that either
> claims the work or does the ordering such that the CPU
> where the work is pending handles our data.
>
> Signed-off-by: Frederic Weisbecker <[email protected]>
> Cc: Peter Zijlstra <[email protected]>
> Cc: Ingo Molnar <[email protected]>
> Cc: Thomas Gleixner <[email protected]>
> Cc: Andrew Morton <[email protected]>
> Cc: Steven Rostedt <[email protected]>
> Cc: Paul Gortmaker <[email protected]>

Acked-by: Steven Rostedt <[email protected]>

-- Steve

> ---
> kernel/irq_work.c | 22 +++++++++++++++++-----
> 1 files changed, 17 insertions(+), 5 deletions(-)
>

2012-10-30 16:25:43

by Frederic Weisbecker

Subject: Re: [PATCH 0/2] irq_work: A couple fixes

2012/10/30 Steven Rostedt <[email protected]>:
> On Tue, 2012-10-30 at 16:34 +0100, Frederic Weisbecker wrote:
>> Hi,
>
>> And I still wonder whether cpu_relax() is enough to make the compiler
>> correctly reload work->flags in the irq_work_sync() loop.
>> Do we need ACCESS_ONCE()?
>
> You mean this loop:
>
> flags = work->flags & ~IRQ_WORK_PENDING;
> for (;;) {
> nflags = flags | IRQ_WORK_FLAGS;
> oflags = cmpxchg(&work->flags, flags, nflags);
> if (oflags == flags)
> break;
> if (oflags & IRQ_WORK_PENDING)
> return false;
> flags = oflags;
> cpu_relax();
> }
>
> After the first loading of work->flags, you are worried about the
> &work->flags in the cmpxchg()? The cmpxchg() will handle that itself. I
> don't see any place that a ACCESS_ONCE() is required here. The cmpxchg()
> acts on the address of work->flags, the compiler isn't involved with the
> value at that address.

No, I was worried about the cpu_relax() in irq_work_sync().

2012-10-30 16:26:37

by Steven Rostedt

Subject: Re: [PATCH 2/2] irq_work: Fix racy IRQ_WORK_BUSY flag setting

On Tue, 2012-10-30 at 16:35 +0100, Frederic Weisbecker wrote:
> The IRQ_WORK_BUSY flag is set right before we execute the
> work. Once this flag value is set, the work enters a
> claimable state again.
>
> This is necessary because if we want to enqueue a work but we
> fail the claim, we want to ensure that the CPU where that work
> is still pending will see and handle the data we expected the
> work to compute.
>
> This might not work as expected though because IRQ_WORK_BUSY
> isn't set atomically. By the time a CPU fails a work claim,
> this work may well have been already executed by the CPU where
> it was previously pending.
>
> Due to the lack of appropriate memory barrier, the IRQ_WORK_BUSY
> flag value may not be visible by the CPU trying to claim while
> the work is executing, and that until we clear the busy bit in
> the work flags using cmpxchg() that implies the full barrier.
>
> One solution could involve a full barrier between setting
> IRQ_WORK_BUSY flag and the work execution. This way we
> ensure that the work execution site sees the expected data
> and the claim site sees the IRQ_WORK_BUSY:
>
> CPU 0 CPU 1
>
> data = something flags = IRQ_WORK_BUSY
> smp_mb() (implicit with cmpxchg smp_mb()
> on flags in claim) execute_work (sees data from CPU 0)
> try to claim
>
> As a shortcut, let's just use xchg() that implies a full memory
> barrier.
>
> Signed-off-by: Frederic Weisbecker <[email protected]>
> Cc: Peter Zijlstra <[email protected]>
> Cc: Ingo Molnar <[email protected]>
> Cc: Thomas Gleixner <[email protected]>
> Cc: Andrew Morton <[email protected]>
> Cc: Steven Rostedt <[email protected]>
> Cc: Paul Gortmaker <[email protected]>

Reviewed-by: Steven Rostedt <[email protected]>

-- Steve

> ---
> kernel/irq_work.c | 7 +++++--
> 1 files changed, 5 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/irq_work.c b/kernel/irq_work.c
> index 764240a..ea79365 100644
> --- a/kernel/irq_work.c
> +++ b/kernel/irq_work.c
> @@ -130,9 +130,12 @@ void irq_work_run(void)
>
> /*
> * Clear the PENDING bit, after this point the @work
> - * can be re-used.
> + * can be re-used. Use xchg to force ordering against
> + * data to process, such that if claiming fails on
> + * another CPU, we see and handle the data it wants
> + * us to process on the work.
> */
> - work->flags = IRQ_WORK_BUSY;
> + xchg(&work->flags, IRQ_WORK_BUSY);
> work->func(work);
> /*
> * Clear the BUSY bit and return to the free state if

2012-10-30 17:13:16

by Steven Rostedt

Subject: Re: [PATCH 0/2] irq_work: A couple fixes

On Tue, 2012-10-30 at 17:25 +0100, Frederic Weisbecker wrote:

> No, I was worried about the cpu_relax() in irq_work_sync().

That one is fine too, as this is the purpose of cpu_relax(). Not only to
relax the cpu, but also to tell gcc that the loop needs to be reread.
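
For reference, the irq_work_sync() loop in question looks roughly like
this (a sketch of kernel/irq_work.c as it stood at the time):

void irq_work_sync(struct irq_work *work)
{
        WARN_ON_ONCE(irqs_disabled());

        /*
         * cpu_relax() expands to an asm statement with a "memory" clobber
         * on x86 and most other architectures, so the compiler must
         * re-read work->flags from memory on every iteration of this loop.
         */
        while (work->flags & IRQ_WORK_BUSY)
                cpu_relax();
}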

-- Steve

2012-10-30 17:17:14

by Frederic Weisbecker

Subject: Re: [PATCH 0/2] irq_work: A couple fixes

2012/10/30 Steven Rostedt <[email protected]>:
> On Tue, 2012-10-30 at 17:25 +0100, Frederic Weisbecker wrote:
>
>> No, I was worried about the cpu_relax() in irq_work_sync().
>
> That one is fine too, as this is the purpose of cpu_relax(). Not only to
> relax the cpu, but also to tell gcc that the loop needs to be reread.

Ok, should be fine then.

Thanks!

2012-10-30 18:33:59

by anish singh

Subject: Re: [PATCH 2/2] irq_work: Fix racy IRQ_WORK_BUSY flag setting

> The IRQ_WORK_BUSY flag is set right before we execute the
> work. Once this flag value is set, the work enters a
> claimable state again.
>
> This is necessary because if we want to enqueue a work but we
> fail the claim, we want to ensure that the CPU where that work
> is still pending will see and handle the data we expected the
> work to compute.
>
> This might not work as expected though because IRQ_WORK_BUSY
> isn't set atomically. By the time a CPU fails a work claim,
> this work may well have been already executed by the CPU where
> it was previously pending.
>
> Due to the lack of appropriate memory barrier, the IRQ_WORK_BUSY
> flag value may not be visible by the CPU trying to claim while
> the work is executing, and that until we clear the busy bit in
> the work flags using cmpxchg() that implies the full barrier.
>
> One solution could involve a full barrier between setting
> IRQ_WORK_BUSY flag and the work execution. This way we
> ensure that the work execution site sees the expected data
> and the claim site sees the IRQ_WORK_BUSY:
>
> CPU 0                                 CPU 1
>
> data = something                      flags = IRQ_WORK_BUSY
> smp_mb() (implicit with cmpxchg       smp_mb()
>           on flags in claim)          execute_work (sees data from CPU 0)
> try to claim
>
As I understand without the memory barrier proposed by you the situation
would be as below:
CPU 0                                 CPU 1

data = something                      flags = IRQ_WORK_BUSY
smp_mb() (implicit with cmpxchg       execute_work (sees data from CPU 0)
          on flags in claim)
_success_ in claiming and goes
ahead and execute the work(wrong?)
cmpxchg cause flag to IRQ_WORK_BUSY

Now knows the flag==IRQ_WORK_BUSY

Am I right?

Probably a stupid question. Why do we return a bool from irq_work_queue()
when no one bothers to check the return value? Wouldn't it be better if
this function were void, given how its users call it, or am I
looking at the wrong code?
>
> As a shortcut, let's just use xchg() that implies a full memory
> barrier.
>
> Signed-off-by: Frederic Weisbecker <[email protected]>
> Cc: Peter Zijlstra <[email protected]>
> Cc: Ingo Molnar <[email protected]>
> Cc: Thomas Gleixner <[email protected]>
> Cc: Andrew Morton <[email protected]>
> Cc: Steven Rostedt <[email protected]>
> Cc: Paul Gortmaker <[email protected]>
> ---
> kernel/irq_work.c | 7 +++++--
> 1 files changed, 5 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/irq_work.c b/kernel/irq_work.c
> index 764240a..ea79365 100644
> --- a/kernel/irq_work.c
> +++ b/kernel/irq_work.c
> @@ -130,9 +130,12 @@ void irq_work_run(void)
>
> /*
> * Clear the PENDING bit, after this point the @work
> - * can be re-used.
> + * can be re-used. Use xchg to force ordering against
> + * data to process, such that if claiming fails on
> + * another CPU, we see and handle the data it wants
> + * us to process on the work.
> */
> - work->flags = IRQ_WORK_BUSY;
> + xchg(&work->flags, IRQ_WORK_BUSY);
> work->func(work);
> /*
> * Clear the BUSY bit and return to the free state if
> --
> 1.7.5.4
>

2012-10-30 18:45:55

by Steven Rostedt

Subject: Re: [PATCH 2/2] irq_work: Fix racy IRQ_WORK_BUSY flag setting

On Wed, 2012-10-31 at 03:33 +0900, anish kumar wrote:

> > CPU 0                                 CPU 1
> >
> > data = something                      flags = IRQ_WORK_BUSY
> > smp_mb() (implicit with cmpxchg       smp_mb()
> >           on flags in claim)          execute_work (sees data from CPU 0)
> > try to claim
> >
> As I understand without the memory barrier proposed by you the situation
> would be as below:
> CPU 0                                 CPU 1
>
> data = something                      flags = IRQ_WORK_BUSY
> smp_mb() (implicit with cmpxchg       execute_work (sees data from CPU 0)
>           on flags in claim)
> _success_ in claiming and goes

Correct, because it would see the stale value of flags.

> ahead and execute the work(wrong?)
> cmpxchg cause flag to IRQ_WORK_BUSY
>
> Now knows the flag==IRQ_WORK_BUSY
>
> Am I right?

right.

>
> Probably a stupid question.Why do we return the bool from irq_work_queue
> when no one bothers to check the return value?Wouldn't it be better if
> this function is void as used by the users of this function or am I
> looking at the wrong code.

Not a stupid question, as I was asking that to myself just earlier
today. But I forgot to mention it as well. Especially because it makes it
look like there's a bug in the code. Maybe someday someone will care
whether their work was finished by itself or by some other CPU.

Probably should just nix the return value.
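
For context, the queueing path in question looked roughly like this at the
time (a sketch, not a verbatim copy of kernel/irq_work.c):

/*
 * Returns true on success, false when the work was already pending on
 * some CPU -- a value that, as noted above, no current caller checks.
 */
bool irq_work_queue(struct irq_work *work)
{
        if (!irq_work_claim(work)) {
                /* Already enqueued; some CPU will run it anyway. */
                return false;
        }

        __irq_work_queue(work);
        return true;
}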

-- Steve

2012-10-31 00:36:41

by Frederic Weisbecker

Subject: Re: [PATCH 2/2] irq_work: Fix racy IRQ_WORK_BUSY flag setting

2012/10/30 anish kumar <[email protected]>:
> As I understand without the memory barrier proposed by you the situation
> would be as below:
> CPU 0 CPU 1
>
> data = something flags = IRQ_WORK_BUSY
> smp_mb() (implicit with cmpxchg execute_work (sees data from CPU 0)
> on flags in claim)
> _success_ in claiming and goes
> ahead and execute the work(wrong?)
> cmpxchg cause flag to IRQ_WORK_BUSY
>
> Now knows the flag==IRQ_WORK_BUSY
>
> Am I right?

(Adding Paul in Cc because I'm again confused with memory barriers)

Actually what I had in mind is rather that CPU 0 fails its claim
because it's not seeing the IRQ_WORK_BUSY flag as it should:


CPU 0                           CPU 1

data = something                flags = IRQ_WORK_BUSY
cmpxchg() for claim             execute_work (sees data from CPU 0)

CPU 0 should see IRQ_WORK_BUSY but it may not because CPU 1 sets this
value in a non-atomic way.

Also, while browsing Paul's perfbook, I realize my changelog is buggy.
It seems we can't reliably use memory barriers here because we would
be in the following case:

CPU 0                    CPU 1

store(work data)         store(flags)
smp_mb()                 smp_mb()
load(flags)              load(work data)

On top of this barrier pairing, we can't make the assumption that, for
example, if CPU 1 sees the work data stored in CPU 0 then CPU 0 sees
the flags stored in CPU 1.

So now I wonder if cmpxchg() can give us more confidence:


CPU 0                              CPU 1

store(work data)                   xchg(flags, IRQ_WORK_BUSY)
cmpxchg(flags, IRQ_WORK_FLAGS)     load(work data)

Can I make this assumption?

- If CPU 0 fails the cmpxchg() (which means CPU 1 has not yet xchg())
then CPU 1 will execute the work and see our data.

At least the cmpxchg()/xchg() pair orders things correctly enough to
ensure somebody will execute our work. Now some locking is probably
needed in the work function itself if its data is not per-CPU.

>
> Probably a stupid question.Why do we return the bool from irq_work_queue
> when no one bothers to check the return value?Wouldn't it be better if
> this function is void as used by the users of this function or am I
> looking at the wrong code.

No idea. Probably Peter had plans there.

2012-10-31 02:23:07

by Steven Rostedt

Subject: Re: [PATCH 2/2] irq_work: Fix racy IRQ_WORK_BUSY flag setting

On Wed, 2012-10-31 at 01:36 +0100, Frederic Weisbecker wrote:
> 2012/10/30 anish kumar <[email protected]>:
> > As I understand without the memory barrier proposed by you the situation
> > would be as below:
> > CPU 0 CPU 1
> >
> > data = something flags = IRQ_WORK_BUSY
> > smp_mb() (implicit with cmpxchg execute_work (sees data from CPU 0)
> > on flags in claim)
> > _success_ in claiming and goes
> > ahead and execute the work(wrong?)
> > cmpxchg cause flag to IRQ_WORK_BUSY
> >
> > Now knows the flag==IRQ_WORK_BUSY
> >
> > Am I right?
>
> (Adding Paul in Cc because I'm again confused with memory barriers)
>
> Actually what I had in mind is rather that CPU 0 fails its claim
> because it's not seeing the IRQ_WORK_BUSY flag as it should:
>
>
> CPU 0 CPU 1
>
> data = something flags = IRQ_WORK_BUSY
> cmpxchg() for claim execute_work (sees data from CPU 0)
>
> CPU 0 should see IRQ_WORK_BUSY but it may not because CPU 1 sets this
> value in a non-atomic way.
>
> Also, while browsing Paul's perfbook, I realize my changelog is buggy.
> It seems we can't reliably use memory barriers here because we would
> be in the following case:
>
> CPU 0 CPU 1
>
> store(work data) store(flags)
> smp_mb() smp_mb()
> load(flags) load(work data)
>
> On top of this barrier pairing, we can't make the assumption that, for
> example, if CPU 1 sees the work data stored in CPU 0 then CPU 0 sees
> the flags stored in CPU 1.
>
> So now I wonder if cmpxchg() can give us more confidence:

More confidence over what? The xchg()? They are equivalent (wrt memory
barriers).

Here's the issue that currently exists. Let's look at the code:


/*
 * Claim the entry so that no one else will poke at it.
 */
static bool irq_work_claim(struct irq_work *work)
{
        unsigned long flags, nflags;

        for (;;) {
                flags = work->flags;
                if (flags & IRQ_WORK_PENDING)
                        return false;
                nflags = flags | IRQ_WORK_FLAGS;
                if (cmpxchg(&work->flags, flags, nflags) == flags)
                        break;
                cpu_relax();
        }

        return true;
}

and

llnode = llist_del_all(this_list);
while (llnode != NULL) {
        work = llist_entry(llnode, struct irq_work, llnode);

        llnode = llist_next(llnode);

        /*
         * Clear the PENDING bit, after this point the @work
         * can be re-used.
         */
        work->flags = IRQ_WORK_BUSY;
        work->func(work);
        /*
         * Clear the BUSY bit and return to the free state if
         * no-one else claimed it meanwhile.
         */
        (void)cmpxchg(&work->flags, IRQ_WORK_BUSY, 0);
}

The irq_work_claim() will only queue its work if it's not already
pending. If it is pending, then someone is going to process it anyway.
But once we start running the function, new work needs to be processed
again.
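
For reference, the flag values used in the interleavings below are
(assuming the include/linux/irq_work.h definitions of that era):

#define IRQ_WORK_PENDING        1UL
#define IRQ_WORK_BUSY           2UL
#define IRQ_WORK_FLAGS          3UL     /* PENDING | BUSY */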

Thus we have:

CPU 1                                   CPU 2
-----                                   -----
(flags = 0)
cmpxchg(flags, 0, IRQ_WORK_FLAGS)
(flags = 3)
[...]

                                        if (flags & IRQ_WORK_PENDING)
                                                return false
flags = IRQ_WORK_BUSY
(flags = 2)
func()

The above shows the normal case where CPU 2 doesn't need to queue work,
because it's going to be done for it by CPU 1. But...



CPU 1                                   CPU 2
-----                                   -----
(flags = 0)
cmpxchg(flags, 0, IRQ_WORK_FLAGS)
(flags = 3)
[...]
flags = IRQ_WORK_BUSY
(flags = 2)
func()
                                        (sees flags = 3)
                                        if (flags & IRQ_WORK_PENDING)
                                                return false
cmpxchg(flags, 2, 0);
(flags = 0)


Here, because we did not do a memory barrier after
flags = IRQ_WORK_BUSY, CPU 2 saw stale data and did not queue its work,
and missed the opportunity. Now if you had this fix with the xchg() as
you have in your patch, then CPU 2 would not see the stale flags.
Except, with the code I showed above it still can!

CPU 1                                   CPU 2
-----                                   -----
(flags = 0)
cmpxchg(flags, 0, IRQ_WORK_FLAGS)
(flags = 3)
[...]
                                        (fetch flags)
xchg(&flags, IRQ_WORK_BUSY)
(flags = 2)
func()
                                        (sees flags = 3)
                                        if (flags & IRQ_WORK_PENDING)
                                                return false
cmpxchg(flags, 2, 0);
(flags = 0)


Even with the update of xchg(), if CPU2 fetched the flags before CPU1
did the xchg, then it would still lose out. But that's where your
previous patch comes in that does:

flags = work->flags & ~IRQ_WORK_PENDING;
for (;;) {
        nflags = flags | IRQ_WORK_FLAGS;
        oflags = cmpxchg(&work->flags, flags, nflags);
        if (oflags == flags)
                break;
        if (oflags & IRQ_WORK_PENDING)
                return false;
        flags = oflags;
        cpu_relax();
}


This now does:

CPU 1                                   CPU 2
-----                                   -----
(flags = 0)
cmpxchg(flags, 0, IRQ_WORK_FLAGS)
(flags = 3)
[...]
xchg(&flags, IRQ_WORK_BUSY)
(flags = 2)
func()
                                        oflags = cmpxchg(&flags, flags, nflags);
                                        (sees flags = 2)
                                        if (flags & IRQ_WORK_PENDING)
                                                (not true)
                                        (loop)
cmpxchg(flags, 2, 0);
(flags = 2)
                                        flags = 3



With both patch 1 and 2 you fixed the bug.

I guess you suffer from...

http://weknowmemes.com/wp-content/uploads/2012/09/my-code-doesnt-work-i-have-no-idea-why.jpg

;-)


>
>
> CPU 0                              CPU 1
>
> store(work data)                   xchg(flags, IRQ_WORK_BUSY)
> cmpxchg(flags, IRQ_WORK_FLAGS)     load(work data)
>
> Can I make this assumption?
>
> - If CPU 0 fails the cmpxchg() (which means CPU 1 has not yet xchg())
> then CPU 1 will execute the work and see our data.
>
> At least cmpxchg / xchg pair orders correctly to ensure somebody will
> execute our work. Now probably some locking is needed from the work
> function itself if it's not per cpu.

Yeah, there's nothing stopping the work->func() from executing on two
different CPUs at the same time. The protection is just for the internal
use of the llist: we don't need to worry about the work being queued twice
and corrupting the work->llnode.

But the synchronization of the func() should be up to the func() code
itself, not a guarantee of the irq_work.
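
To illustrate, a hypothetical work function (names made up) that touches
data which is not per-CPU would have to serialize on its own, e.g.:

static DEFINE_SPINLOCK(my_data_lock);
static u64 my_data;

static void my_work_func(struct irq_work *work)
{
        /*
         * Two CPUs may run this concurrently once the work has been
         * re-claimed, so the handler takes its own lock.
         */
        spin_lock(&my_data_lock);
        /* ... consume my_data ... */
        spin_unlock(&my_data_lock);
}

static struct irq_work my_work = { .func = my_work_func };

static void my_producer(u64 val)
{
        unsigned long flags;

        /* The producer may run with irqs enabled, so disable them here. */
        spin_lock_irqsave(&my_data_lock, flags);
        my_data = val;
        spin_unlock_irqrestore(&my_data_lock, flags);

        irq_work_queue(&my_work);
}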

-- Steve

2012-10-31 02:54:08

by Frederic Weisbecker

Subject: Re: [PATCH 2/2] irq_work: Fix racy IRQ_WORK_BUSY flag setting

2012/10/31 Steven Rostedt <[email protected]>:
> More confidence over what? The xchg()? They are equivalent (wrt memory
> barriers).
>
> Here's the issue that currently exists. Let's look at the code:
>
>
> /*
> * Claim the entry so that no one else will poke at it.
> */
> static bool irq_work_claim(struct irq_work *work)
> {
> unsigned long flags, nflags;
>
> for (;;) {
> flags = work->flags;
> if (flags & IRQ_WORK_PENDING)
> return false;
> nflags = flags | IRQ_WORK_FLAGS;
> if (cmpxchg(&work->flags, flags, nflags) == flags)
> break;
> cpu_relax();
> }
>
> return true;
> }
>
> and
>
> llnode = llist_del_all(this_list);
> while (llnode != NULL) {
> work = llist_entry(llnode, struct irq_work, llnode);
>
> llnode = llist_next(llnode);
>
> /*
> * Clear the PENDING bit, after this point the @work
> * can be re-used.
> */
> work->flags = IRQ_WORK_BUSY;
> work->func(work);
> /*
> * Clear the BUSY bit and return to the free state if
> * no-one else claimed it meanwhile.
> */
> (void)cmpxchg(&work->flags, IRQ_WORK_BUSY, 0);
> }
>
> The irq_work_claim() will only queue its work if it's not already
> pending. If it is pending, then someone is going to process it anyway.
> But once we start running the function, new work needs to be processed
> again.
>
> Thus we have:
>
> CPU 1 CPU 2
> ----- -----
> (flags = 0)
> cmpxchg(flags, 0, IRQ_WORK_FLAGS)
> (flags = 3)
> [...]
>
> if (flags & IRQ_WORK_PENDING)
> return false
> flags = IRQ_WORK_BUSY
> (flags = 2)
> func()
>
> The above shows the normal case were CPU 2 doesn't need to queue work,
> because its going to be done for it by CPU 1. But...
>
>
>
> CPU 1 CPU 2
> ----- -----
> (flags = 0)
> cmpxchg(flags, 0, IRQ_WORK_FLAGS)
> (flags = 3)
> [...]
> flags = IRQ_WORK_BUSY
> (flags = 2)
> func()
> (sees flags = 3)
> if (flags & IRQ_WORK_PENDING)
> return false
> cmpxchg(flags, 2, 0);
> (flags = 0)
>
>
> Here, because we did not do a memory barrier after
> flags = IRQ_WORK_BUSY, CPU 2 saw stale data and did not queue its work,
> and missed the opportunity. Now if you had this fix with the xchg() as
> you have in your patch, then CPU 2 would not see the stale flags.
> Except, with the code I showed above it still can!
>
> CPU 1 CPU 2
> ----- -----
> (flags = 0)
> cmpxchg(flags, 0, IRQ_WORK_FLAGS)
> (flags = 3)
> [...]
> (fetch flags)
> xchg(&flags, IRQ_WORK_BUSY)
> (flags = 2)
> func()
> (sees flags = 3)
> if (flags & IRQ_WORK_PENDING)
> return false
> cmpxchg(flags, 2, 0);
> (flags = 0)
>
>
> Even with the update of xchg(), if CPU2 fetched the flags before CPU1
> did the xchg, then it would still lose out. But that's where your
> previous patch comes in that does:
>
> flags = work->flags & ~IRQ_WORK_PENDING;
> for (;;) {
> nflags = flags | IRQ_WORK_FLAGS;
> oflags = cmpxchg(&work->flags, flags, nflags);
> if (oflags == flags)
> break;
> if (oflags & IRQ_WORK_PENDING)
> return false;
> flags = oflags;
> cpu_relax();
> }
>
>
> This now does:
>
> CPU 1 CPU 2
> ----- -----
> (flags = 0)
> cmpxchg(flags, 0, IRQ_WORK_FLAGS)
> (flags = 3)
> [...]
> xchg(&flags, IRQ_WORK_BUSY)
> (flags = 2)
> func()
> oflags = cmpxchg(&flags, flags, nflags);
> (sees flags = 2)
> if (flags & IRQ_WORK_PENDING)
> (not true)
> (loop)
> cmpxchg(flags, 2, 0);
> (flags = 2)
> flags = 3
>
>
>
> With both patch 1 and 2 you fixed the bug.
>
> I guess you suffer from...
>
> http://weknowmemes.com/wp-content/uploads/2012/09/my-code-doesnt-work-i-have-no-idea-why.jpg
>
> ;-)
>

Heh. No, I was OK with all the above :) My point was that
xchg()/cmpxchg() is good for synchronizing on flag values and on whether
or not the work needs to be queued, but memory barriers couldn't replace
that, and my ordering assumptions about the user data manipulated by the
work were wrong. As such my changelogs are buggy.

My last doubt was: can the cmpxchg()/xchg() pair we have settle an
ordering between flags and user data such that we don't need locking
for the latter? But as you highlight below, we definitely need locking
for user data anyway.

>>
>>
>> CPU 0 CPU 1
>>
>> store(work data) xchg(flags, IRQ_WORK_BUSY)
>> cmpxchg(flags, IRQ_WORK_FLAGS) load(work data)
>>
>> Can I make this assumption?
>>
>> - If CPU 0 fails the cmpxchg() (which means CPU 1 has not yet xchg())
>> then CPU 1 will execute the work and see our data.
>>
>> At least cmpxchg / xchg pair orders correctly to ensure somebody will
>> execute our work. Now probably some locking is needed from the work
>> function itself if it's not per cpu.
>
> Yeah, there's nothing stopping the work->func() from executing from two
> different CPUs. The protection is just for the internal use of llist
> that is used. We don't need to worry about it being queued twice and
> corrupting the work->llnode.
>
> But the synchronization of the func() should be up to the func() code
> itself, not a guarantee of the irq_work.

Agreed.

I'll refine my changelogs accordingly.

Thanks!

2012-10-31 11:16:56

by anish singh

Subject: Re: [PATCH 2/2] irq_work: Fix racy IRQ_WORK_BUSY flag setting

On Tue, 2012-10-30 at 22:23 -0400, Steven Rostedt wrote:
> On Wed, 2012-10-31 at 01:36 +0100, Frederic Weisbecker wrote:
> > 2012/10/30 anish kumar <[email protected]>:
> > > As I understand without the memory barrier proposed by you the situation
> > > would be as below:
> > > CPU 0 CPU 1
> > >
> > > data = something flags = IRQ_WORK_BUSY
> > > smp_mb() (implicit with cmpxchg execute_work (sees data from CPU 0)
> > > on flags in claim)
> > > _success_ in claiming and goes
> > > ahead and execute the work(wrong?)
> > > cmpxchg cause flag to IRQ_WORK_BUSY
> > >
> > > Now knows the flag==IRQ_WORK_BUSY
> > >
> > > Am I right?
> >
> > (Adding Paul in Cc because I'm again confused with memory barriers)
> >
> > Actually what I had in mind is rather that CPU 0 fails its claim
> > because it's not seeing the IRQ_WORK_BUSY flag as it should:
> >
> >
> > CPU 0 CPU 1
> >
> > data = something flags = IRQ_WORK_BUSY
> > cmpxchg() for claim execute_work (sees data from CPU 0)
> >
> > CPU 0 should see IRQ_WORK_BUSY but it may not because CPU 1 sets this
> > value in a non-atomic way.
> >
> > Also, while browsing Paul's perfbook, I realize my changelog is buggy.
> > It seems we can't reliably use memory barriers here because we would
> > be in the following case:
> >
> > CPU 0 CPU 1
> >
> > store(work data) store(flags)
> > smp_mb() smp_mb()
> > load(flags) load(work data)
> >
> > On top of this barrier pairing, we can't make the assumption that, for
> > example, if CPU 1 sees the work data stored in CPU 0 then CPU 0 sees
> > the flags stored in CPU 1.
> >
> > So now I wonder if cmpxchg() can give us more confidence:
>
> More confidence over what? The xchg()? They are equivalent (wrt memory
> barriers).
>
> Here's the issue that currently exists. Let's look at the code:
>
>
> /*
> * Claim the entry so that no one else will poke at it.
> */
> static bool irq_work_claim(struct irq_work *work)
> {
> unsigned long flags, nflags;
>
> for (;;) {
> flags = work->flags;
> if (flags & IRQ_WORK_PENDING)
> return false;
> nflags = flags | IRQ_WORK_FLAGS;
nflags = 1 | 3
nflags = 2 | 3
In both cases the result would be the same. If I am right, then wouldn't
this operation be redundant?
> if (cmpxchg(&work->flags, flags, nflags) == flags)
> break;
> cpu_relax();
> }
>
> return true;
> }
>
> and
>
> llnode = llist_del_all(this_list);
> while (llnode != NULL) {
> work = llist_entry(llnode, struct irq_work, llnode);
>
> llnode = llist_next(llnode);
>
> /*
> * Clear the PENDING bit, after this point the @work
> * can be re-used.
> */
> work->flags = IRQ_WORK_BUSY;
> work->func(work);
> /*
> * Clear the BUSY bit and return to the free state if
> * no-one else claimed it meanwhile.
> */
> (void)cmpxchg(&work->flags, IRQ_WORK_BUSY, 0);
> }
>
> The irq_work_claim() will only queue its work if it's not already
> pending. If it is pending, then someone is going to process it anyway.
> But once we start running the function, new work needs to be processed
> again.
>
> Thus we have:
>
> CPU 1 CPU 2
> ----- -----
> (flags = 0)
> cmpxchg(flags, 0, IRQ_WORK_FLAGS)
> (flags = 3)
> [...]
>
> if (flags & IRQ_WORK_PENDING)
> return false
> flags = IRQ_WORK_BUSY
> (flags = 2)
> func()
>
> The above shows the normal case were CPU 2 doesn't need to queue work,
> because its going to be done for it by CPU 1. But...
>
>
>
> CPU 1 CPU 2
> ----- -----
> (flags = 0)
> cmpxchg(flags, 0, IRQ_WORK_FLAGS)
> (flags = 3)
> [...]
> flags = IRQ_WORK_BUSY
> (flags = 2)
> func()
> (sees flags = 3)
> if (flags & IRQ_WORK_PENDING)
> return false
> cmpxchg(flags, 2, 0);
> (flags = 0)
>
>
> Here, because we did not do a memory barrier after
> flags = IRQ_WORK_BUSY, CPU 2 saw stale data and did not queue its work,
> and missed the opportunity. Now if you had this fix with the xchg() as
> you have in your patch, then CPU 2 would not see the stale flags.
> Except, with the code I showed above it still can!
>
> CPU 1 CPU 2
> ----- -----
> (flags = 0)
> cmpxchg(flags, 0, IRQ_WORK_FLAGS)
> (flags = 3)
> [...]
> (fetch flags)
> xchg(&flags, IRQ_WORK_BUSY)
> (flags = 2)
> func()
> (sees flags = 3)
> if (flags & IRQ_WORK_PENDING)
> return false
> cmpxchg(flags, 2, 0);
> (flags = 0)
>
>
> Even with the update of xchg(), if CPU2 fetched the flags before CPU1
> did the xchg, then it would still lose out. But that's where your
> previous patch comes in that does:
>
> flags = work->flags & ~IRQ_WORK_PENDING;
> for (;;) {
> nflags = flags | IRQ_WORK_FLAGS;
> oflags = cmpxchg(&work->flags, flags, nflags);
> if (oflags == flags)
> break;
> if (oflags & IRQ_WORK_PENDING)
> return false;
> flags = oflags;
> cpu_relax();
> }
>
>
> This now does:
>
> CPU 1 CPU 2
> ----- -----
> (flags = 0)
> cmpxchg(flags, 0, IRQ_WORK_FLAGS)
> (flags = 3)
> [...]
> xchg(&flags, IRQ_WORK_BUSY)
> (flags = 2)
> func()
> oflags = cmpxchg(&flags, flags, nflags);
> (sees flags = 2)
> if (flags & IRQ_WORK_PENDING)
> (not true)
> (loop)
> cmpxchg(flags, 2, 0);
> (flags = 2)
> flags = 3
>
>
>
> With both patch 1 and 2 you fixed the bug.
This is the best explanation anyone can put forward for this problem.
>
> I guess you suffer from...
>
> http://weknowmemes.com/wp-content/uploads/2012/09/my-code-doesnt-work-i-have-no-idea-why.jpg
>
> ;-)
>
>
> >
> >
> > CPU 0 CPU 1
> >
> > store(work data) xchg(flags, IRQ_WORK_BUSY)
> > cmpxchg(flags, IRQ_WORK_FLAGS) load(work data)
> >
> > Can I make this assumption?
> >
> > - If CPU 0 fails the cmpxchg() (which means CPU 1 has not yet xchg())
> > then CPU 1 will execute the work and see our data.
> >
> > At least cmpxchg / xchg pair orders correctly to ensure somebody will
> > execute our work. Now probably some locking is needed from the work
> > function itself if it's not per cpu.
>
> Yeah, there's nothing stopping the work->func() from executing from two
> different CPUs. The protection is just for the internal use of llist
> that is used. We don't need to worry about it being queued twice and
> corrupting the work->llnode.
>
> But the synchronization of the func() should be up to the func() code
> itself, not a guarantee of the irq_work.
>
> -- Steve
>
>

2012-10-31 11:17:04

by anish singh

Subject: Re: [PATCH 2/2] irq_work: Fix racy IRQ_WORK_BUSY flag setting

On Tue, 2012-10-30 at 14:45 -0400, Steven Rostedt wrote:
> On Wed, 2012-10-31 at 03:33 +0900, anish kumar wrote:
>
> > > CPU 0 CPU 1
> > >
> > > data = something flags = IRQ_WORK_BUSY
> > > smp_mb() (implicit with cmpxchg smp_mb()
> > > on flags in claim) execute_work (sees data from CPU
> > > 0)
> > > try to claim
> > >
> > As I understand without the memory barrier proposed by you the situation
> > would be as below:
> > CPU 0 CPU 1
> >
> > data = something flags = IRQ_WORK_BUSY
> > smp_mb() (implicit with cmpxchg execute_work (sees data from CPU 0)
> > on flags in claim)
> > _success_ in claiming and goes
>
> Correct, because it would see the stale value of flags.
>
> > ahead and execute the work(wrong?)
> > cmpxchg cause flag to IRQ_WORK_BUSY
> >
> > Now knows the flag==IRQ_WORK_BUSY
> >
> > Am I right?
>
> right.
>
> >
> > Probably a stupid question.Why do we return the bool from irq_work_queue
> > when no one bothers to check the return value?Wouldn't it be better if
> > this function is void as used by the users of this function or am I
> > looking at the wrong code.
>
> Not a stupid question, as I was asking that to myself just earlier
> today. But forgot to mention it as well. Especially, because it makes it
> look like there's a bug in the code. Maybe someday someone will care if
> their work was finished by itself, or some other CPU.
>
> Probably should just nix the return value.
Can I send a patch to fix this?
>
> -- Steve
>
>

2012-10-31 12:59:54

by Steven Rostedt

Subject: Re: [PATCH 2/2] irq_work: Fix racy IRQ_WORK_BUSY flag setting

On Wed, 2012-10-31 at 20:04 +0900, anish kumar wrote:

> > /*
> > * Claim the entry so that no one else will poke at it.
> > */
> > static bool irq_work_claim(struct irq_work *work)
> > {
> > unsigned long flags, nflags;
> >
> > for (;;) {
> > flags = work->flags;
> > if (flags & IRQ_WORK_PENDING)
> > return false;
> > nflags = flags | IRQ_WORK_FLAGS;
> nflags = 1 | 3
> nflags = 2 | 3
> In both cases the result would be the same. If I am right, then wouldn't
> this operation be redundant?

Right. Actually we could change the new loop to:

for (;;) {
        oflags = cmpxchg(&work->flags, flags, IRQ_WORK_FLAGS);
        if (oflags == flags)
                break;
        if (oflags & IRQ_WORK_PENDING)
                return false;
        flags = oflags;
        cpu_relax();
}
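
Folded back into the function, that suggestion would look roughly like
this (just a sketch, not a posted patch):

static bool irq_work_claim(struct irq_work *work)
{
        unsigned long flags, oflags;

        flags = work->flags & ~IRQ_WORK_PENDING;
        for (;;) {
                /* nflags would always be PENDING | BUSY here, so use it directly. */
                oflags = cmpxchg(&work->flags, flags, IRQ_WORK_FLAGS);
                if (oflags == flags)
                        break;
                if (oflags & IRQ_WORK_PENDING)
                        return false;
                flags = oflags;
                cpu_relax();
        }

        return true;
}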


> > if (cmpxchg(&work->flags, flags, nflags) == flags)
> > break;
> > cpu_relax();
> > }
> >
> > return true;
> > }
> >


> >
> > This now does:
> >
> > CPU 1 CPU 2
> > ----- -----
> > (flags = 0)
> > cmpxchg(flags, 0, IRQ_WORK_FLAGS)
> > (flags = 3)
> > [...]
> > xchg(&flags, IRQ_WORK_BUSY)
> > (flags = 2)
> > func()
> > oflags = cmpxchg(&flags, flags, nflags);
> > (sees flags = 2)
> > if (flags & IRQ_WORK_PENDING)
> > (not true)
> > (loop)
> > cmpxchg(flags, 2, 0);
> > (flags = 2)
> > flags = 3
> >
> >
> >
> > With both patch 1 and 2 you fixed the bug.
> This is the best explanation anyone can put forward for this problem.

Frederic,

Would you like to add my explanation to your change log? You can add the
entire thing, which I think would explain a lot to people.

Thanks,

-- Steve

2012-10-31 13:32:36

by Frederic Weisbecker

Subject: Re: [PATCH 2/2] irq_work: Fix racy IRQ_WORK_BUSY flag setting

2012/10/31 Steven Rostedt <[email protected]>:
> On Wed, 2012-10-31 at 20:04 +0900, anish kumar wrote:
>> nflags = 1 | 3
>> nflags = 2 | 3
>> In both cases the result would be same.If I am right then wouldn't this
>> operation be redundant?
>
> Right. Actually we could change the new loop to:
>
> for (;;) {
> oflags = cmpxchg(&work->flags, flags, IRQ_WORK_FLAGS);
> if (oflags == flags)
> break;
> if (oflags & IRQ_WORK_PENDING)
> return false;
> flags = oflags;
> cpu_relax();
> }

We could. But I wanted to keep the code able to handle new flags in
the future (such as IRQ_WORK_LAZY).

> Frederic,
>
> Would you like to add my explanation to your change log? You can add the
> entire thing, which I think would explain a lot to people.

It's indeed a very clear explanation. I'll put that in the changelog, thanks!

2012-10-31 13:51:59

by Steven Rostedt

Subject: Re: [PATCH 2/2] irq_work: Fix racy IRQ_WORK_BUSY flag setting

On Wed, 2012-10-31 at 20:04 +0900, anish kumar wrote:

> > This now does:
> >
> > CPU 1 CPU 2
> > ----- -----
> > (flags = 0)
> > cmpxchg(flags, 0, IRQ_WORK_FLAGS)
> > (flags = 3)
> > [...]

We can still add here:

(fetch flags)

> > xchg(&flags, IRQ_WORK_BUSY)
> > (flags = 2)
> > func()
> > oflags = cmpxchg(&flags, flags, nflags);

Then the cmpxchg() would fail, and oflags would be 2

> > (sees flags = 2)
> > if (flags & IRQ_WORK_PENDING)

This should be:
if (oflags & IRQ_WORK_PENDING)


> > (not true)
> > (loop)
> > cmpxchg(flags, 2, 0);
> > (flags = 2)

This should be:
(flags = 0)
as we described the previous cmpxchg() as failing, flags would still be
2 before this cmpxchg(), and this one would succeed.

-- Steve

> > flags = 3
> >
> >
> >