I saw random system hangs while testing virtio with blk-mq enabled and cpu
hotplug running in the background. It turns out __ref_is_percpu() doesn't
always return a correct percpu pointer. percpu_ref_put() calls
__ref_is_percpu(), which checks __PERCPU_REF_ATOMIC. After this check,
__PERCPU_REF_ATOMIC or __PERCPU_REF_DEAD might get set, so we must mask the
two bits off the percpu pointer. Fortunately we can still use the percpu data
for percpu_ref_put() even when this happens, because the final transition from
percpu to atomic mode occurs in rcu context while __ref_is_percpu() is always
called under rcu read lock.
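For illustration, a minimal userspace sketch of the problem (flag names and
bit values as in include/linux/percpu-refcount.h; the address is made up):
a percpu_count_ptr value with a flag bit set is no longer a usable pointer
until the flag bits are masked off.

#include <stdio.h>

#define __PERCPU_REF_ATOMIC	(1UL << 0)
#define __PERCPU_REF_DEAD	(1UL << 1)
#define __PERCPU_REF_ATOMIC_DEAD (__PERCPU_REF_ATOMIC | __PERCPU_REF_DEAD)

int main(void)
{
	unsigned long percpu_ptr = 0x1000;	/* pretend percpu address */

	percpu_ptr |= __PERCPU_REF_DEAD;	/* percpu_ref_kill() raced in */

	/* using the value verbatim would mean dereferencing 0x1002 */
	printf("contaminated: %#lx\n", percpu_ptr);
	/* masking the flag bits recovers the real address, 0x1000 */
	printf("masked:       %#lx\n", percpu_ptr & ~__PERCPU_REF_ATOMIC_DEAD);
	return 0;
}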
CC: Jens Axboe <[email protected]>
CC: Tejun Heo <[email protected]>
CC: Kent Overstreet <[email protected]>
Signed-off-by: Shaohua Li <[email protected]>
---
include/linux/percpu-refcount.h | 9 ++++++++-
1 file changed, 8 insertions(+), 1 deletion(-)
diff --git a/include/linux/percpu-refcount.h b/include/linux/percpu-refcount.h
index d5c89e0..6beee08 100644
--- a/include/linux/percpu-refcount.h
+++ b/include/linux/percpu-refcount.h
@@ -136,7 +136,14 @@ static inline bool __ref_is_percpu(struct percpu_ref *ref,
if (unlikely(percpu_ptr & __PERCPU_REF_ATOMIC))
return false;
- *percpu_countp = (unsigned long __percpu *)percpu_ptr;
+ /*
+ * At this point ATOMIC or DEAD might be set when percpu_ref_kill() is
+ * running. It's still safe to use percpu here, because the final
+ * transition from percpu to atomic occurs at rcu context while this
+ * routine is protected with rcu read lock.
+ */
+ *percpu_countp = (unsigned long __percpu *)(percpu_ptr &
+ ~__PERCPU_REF_ATOMIC_DEAD);
return true;
}
--
1.8.3.2
While decoupling ATOMIC and DEAD flags, f47ad4578461 ("percpu_ref:
decouple switching to percpu mode and reinit") updated
__ref_is_percpu() so that it only tests ATOMIC flag to determine
whether the ref is in percpu mode or not; however, while DEAD implies
ATOMIC, the two flags are set separately during percpu_ref_kill() and
if __ref_is_percpu() races percpu_ref_kill(), it may see DEAD w/o
ATOMIC. Because __ref_is_percpu() returns @ref->percpu_count_ptr
value verbatim as the percpu pointer after testing ATOMIC, the pointer
may now be contaminated with the DEAD flag.
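For reference, the ordering that makes this observable is in the kill path;
a paraphrased sketch of lib/percpu-refcount.c from that time (not a verbatim
copy, the WARN_ONCE and comments are dropped):

void percpu_ref_kill_and_confirm(struct percpu_ref *ref,
				 percpu_ref_func_t *confirm_kill)
{
	/* step 1: DEAD becomes visible on its own ... */
	ref->percpu_count_ptr |= __PERCPU_REF_DEAD;
	/* step 2: ... ATOMIC is only set inside the switch helper */
	__percpu_ref_switch_to_atomic(ref, confirm_kill);
	percpu_ref_put(ref);
}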
This can be fixed by clearing the flag bits before returning the
pointer which was the fix proposed by Shaohua; however, as DEAD
implies ATOMIC, we can just test for both flags at once and avoid the
explicit masking.
Update __ref_is_percpu() so that it tests that both ATOMIC and DEAD
are clear before returning @ref->percpu_count_ptr as the percpu
pointer.
Signed-off-by: Tejun Heo <[email protected]>
Reported-by: Shaohua Li <[email protected]>
Link: http://lkml.kernel.org/r/995deb699f5b873c45d667df4add3b06f73c2c25.1416638887.git.shli@kernel.org
Fixes: f47ad4578461 ("percpu_ref: decouple switching to percpu mode and reinit")
---
Hello, Shaohua.
That was a nasty one. I think this fix is slightly better. Can you
please confirm that this fixes the issues you're seeing too?
Thanks.
include/linux/percpu-refcount.h | 8 +++++++-
1 file changed, 7 insertions(+), 1 deletion(-)
diff --git a/include/linux/percpu-refcount.h b/include/linux/percpu-refcount.h
index d5c89e0..51ce60c 100644
--- a/include/linux/percpu-refcount.h
+++ b/include/linux/percpu-refcount.h
@@ -133,7 +133,13 @@ static inline bool __ref_is_percpu(struct percpu_ref *ref,
/* paired with smp_store_release() in percpu_ref_reinit() */
smp_read_barrier_depends();
- if (unlikely(percpu_ptr & __PERCPU_REF_ATOMIC))
+ /*
+ * Theoretically, the following could test just ATOMIC; however,
+ * then we'd have to mask off DEAD separately as DEAD may be
+ * visible without ATOMIC if we race with percpu_ref_kill(). DEAD
+ * implies ATOMIC anyway. Test them together.
+ */
+ if (unlikely(percpu_ptr & __PERCPU_REF_ATOMIC_DEAD))
return false;
*percpu_countp = (unsigned long __percpu *)percpu_ptr;
On Sat, Nov 22, 2014 at 09:22:42AM -0500, Tejun Heo wrote:
> While decoupling ATOMIC and DEAD flags, f47ad4578461 ("percpu_ref:
> decouple switching to percpu mode and reinit") updated
> __ref_is_percpu() so that it only tests ATOMIC flag to determine
> whether the ref is in percpu mode or not; however, while DEAD implies
> ATOMIC, the two flags are set separately during percpu_ref_kill() and
> if __ref_is_percpu() races percpu_ref_kill(), it may see DEAD w/o
> ATOMIC. Because __ref_is_percpu() returns @ref->percpu_count_ptr
> value verbatim as the percpu pointer after testing ATOMIC, the pointer
> may now be contaminated with the DEAD flag.
>
> This can be fixed by clearing the flag bits before returning the
> pointer which was the fix proposed by Shaohua; however, as DEAD
> implies ATOMIC, we can just test for both flags at once and avoid the
> explicit masking.
>
> Update __ref_is_percpu() so that it tests that both ATOMIC and DEAD
> are clear before returning @ref->percpu_count_ptr as the percpu
> pointer.
>
> Signed-off-by: Tejun Heo <[email protected]>
> Reported-by: Shaohua Li <[email protected]>
> Link: http://lkml.kernel.org/r/995deb699f5b873c45d667df4add3b06f73c2c25.1416638887.git.shli@kernel.org
> Fixes: f47ad4578461 ("percpu_ref: decouple switching to percpu mode and reinit")
> ---
> Hello, Shaohua.
>
> That was a nasty one. I think this fix is slightly better. Can you
> please confirm that this fixes the issues you're seeing too?
>
> Thanks.
>
> include/linux/percpu-refcount.h | 8 +++++++-
> 1 file changed, 7 insertions(+), 1 deletion(-)
>
> diff --git a/include/linux/percpu-refcount.h b/include/linux/percpu-refcount.h
> index d5c89e0..51ce60c 100644
> --- a/include/linux/percpu-refcount.h
> +++ b/include/linux/percpu-refcount.h
> @@ -133,7 +133,13 @@ static inline bool __ref_is_percpu(struct percpu_ref *ref,
> /* paired with smp_store_release() in percpu_ref_reinit() */
> smp_read_barrier_depends();
>
> - if (unlikely(percpu_ptr & __PERCPU_REF_ATOMIC))
> + /*
> + * Theoretically, the following could test just ATOMIC; however,
> + * then we'd have to mask off DEAD separately as DEAD may be
> + * visible without ATOMIC if we race with percpu_ref_kill(). DEAD
> + * implies ATOMIC anyway. Test them together.
> + */
> + if (unlikely(percpu_ptr & __PERCPU_REF_ATOMIC_DEAD))
> return false;
This doesn't sound like the correct answer; the DEAD/ATOMIC bits can be set by
percpu_ref_kill() right after the check.
Thanks,
Shaohua
On Sat, Nov 22, 2014 at 09:04:48AM -0800, Shaohua Li wrote:
...
> > + /*
> > + * Theoretically, the following could test just ATOMIC; however,
> > + * then we'd have to mask off DEAD separately as DEAD may be
> > + * visible without ATOMIC if we race with percpu_ref_kill(). DEAD
> > + * implies ATOMIC anyway. Test them together.
> > + */
> > + if (unlikely(percpu_ptr & __PERCPU_REF_ATOMIC_DEAD))
> > return false;
>
> This doesn't sound like the correct answer; the DEAD/ATOMIC bits can be set
> by percpu_ref_kill() right after the check.
Yes, but that's why we're fetching @percpu_ptr with ACCESS_ONCE()
before checking the flags.
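A minimal sketch of that point, in userspace C with ACCESS_ONCE() approximated
by a volatile read (names are illustrative, not the kernel's): because the
flags are tested on a local snapshot, either the snapshot already carries a
flag and we bail out, or it is the clean pre-kill pointer, which stays valid
until the RCU-delayed switch to atomic mode finishes.

#include <stdbool.h>
#include <stdio.h>

#define ATOMIC_FLAG	(1UL << 0)
#define DEAD_FLAG	(1UL << 1)

/* rough stand-in for ACCESS_ONCE(): force a single, non-reloaded read */
#define READ_ONCE_UL(x)	(*(volatile unsigned long *)&(x))

static unsigned long percpu_count_ptr;	/* shared word: pointer + flag bits */

static bool ref_is_percpu(unsigned long *out)
{
	unsigned long snap = READ_ONCE_UL(percpu_count_ptr);	/* one snapshot */

	if (snap & (ATOMIC_FLAG | DEAD_FLAG))
		return false;	/* flags seen, caller takes the atomic path */

	/*
	 * Even if a concurrent kill sets the flags right after the read,
	 * @snap is unaffected: it is either flag-free (as here) or the
	 * check above already failed, so no contaminated pointer escapes.
	 */
	*out = snap;
	return true;
}

int main(void)
{
	unsigned long p;

	percpu_count_ptr = 0x1000;			/* pretend percpu address */
	printf("percpu? %d\n", ref_is_percpu(&p));	/* 1: clean snapshot */

	percpu_count_ptr |= DEAD_FLAG;			/* percpu_ref_kill() ran */
	printf("percpu? %d\n", ref_is_percpu(&p));	/* 0: flags observed */
	return 0;
}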
Thanks.
--
tejun
On Sat, Nov 22, 2014 at 12:06:02PM -0500, Tejun Heo wrote:
> On Sat, Nov 22, 2014 at 09:04:48AM -0800, Shaohua Li wrote:
> ...
> > > + /*
> > > + * Theoretically, the following could test just ATOMIC; however,
> > > + * then we'd have to mask off DEAD separately as DEAD may be
> > > + * visible without ATOMIC if we race with percpu_ref_kill(). DEAD
> > > + * implies ATOMIC anyway. Test them together.
> > > + */
> > > + if (unlikely(percpu_ptr & __PERCPU_REF_ATOMIC_DEAD))
> > > return false;
> >
> > This doesn't sound like the correct answer; the DEAD/ATOMIC bits can be
> > set by percpu_ref_kill() right after the check.
>
> Yes, but that's why we're fetching @percpu_ptr with ACCESS_ONCE()
> before checking the flags.
Ok, I forgot we cache the percpu_ptr. Yes, this does work. You can add my
Signed-off-by to the patch.
Thanks,
Shaohua
On Sat, Nov 22, 2014 at 09:22:42AM -0500, Tejun Heo wrote:
> While decoupling ATOMIC and DEAD flags, f47ad4578461 ("percpu_ref:
> decouple switching to percpu mode and reinit") updated
> __ref_is_percpu() so that it only tests ATOMIC flag to determine
> whether the ref is in percpu mode or not; however, while DEAD implies
> ATOMIC, the two flags are set separately during percpu_ref_kill() and
> if __ref_is_percpu() races percpu_ref_kill(), it may see DEAD w/o
> ATOMIC. Because __ref_is_percpu() returns @ref->percpu_count_ptr
> value verbatim as the percpu pointer after testing ATOMIC, the pointer
> may now be contaminated with the DEAD flag.
>
> This can be fixed by clearing the flag bits before returning the
> pointer which was the fix proposed by Shaohua; however, as DEAD
> implies ATOMIC, we can just test for both flags at once and avoid the
> explicit masking.
>
> Update __ref_is_percpu() so that it tests that both ATOMIC and DEAD
> are clear before returning @ref->percpu_count_ptr as the percpu
> pointer.
>
> Signed-off-by: Tejun Heo <[email protected]>
> Reported-by: Shaohua Li <[email protected]>
> Link: http://lkml.kernel.org/r/995deb699f5b873c45d667df4add3b06f73c2c25.1416638887.git.shli@kernel.org
> Fixes: f47ad4578461 ("percpu_ref: decouple switching to percpu mode and reinit")
Applied to percpu/for-3.18-fixes and pushed out to Linus.
Thanks.
--
tejun