2018-12-11 23:04:21

by Dennis Zhou

Subject: [PATCH] blkcg: handle dying request_queue when associating a blkg

Between v3 [1] and v4 [2] of the blkg association series, the
association point moved from generic_make_request_checks(), which is
called after the request enters the queue, to bio_set_dev(), which is when
the bio is formed before submit_bio(). When the request_queue goes away,
the blkgs supporting the request_queue are destroyed and then the
q->root_blkg is set to %NULL.

This patch adds a %NULL check to blkg_tryget_closest() to prevent the
NPE caused by the above. It also adds a guard to see if the
request_queue is dying when creating a blkg to prevent creating a blkg
for a dead request_queue.

[1] https://lore.kernel.org/lkml/[email protected]/
[2] https://lore.kernel.org/lkml/[email protected]/

Fixes: 5cdf2e3fea5e ("blkcg: associate blkg when associating a device")
Reported-and-tested-by: Ming Lei <[email protected]>
Signed-off-by: Dennis Zhou <[email protected]>
---
 block/blk-cgroup.c         | 6 ++++++
 include/linux/blk-cgroup.h | 2 +-
 2 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c
index 6bd0619a7d6e..c30661ddc873 100644
--- a/block/blk-cgroup.c
+++ b/block/blk-cgroup.c
@@ -202,6 +202,12 @@ static struct blkcg_gq *blkg_create(struct blkcg *blkcg,
 	WARN_ON_ONCE(!rcu_read_lock_held());
 	lockdep_assert_held(&q->queue_lock);
 
+	/* request_queue is dying, do not create/recreate a blkg */
+	if (blk_queue_dying(q)) {
+		ret = -ENODEV;
+		goto err_free_blkg;
+	}
+
 	/* blkg holds a reference to blkcg */
 	if (!css_tryget_online(&blkcg->css)) {
 		ret = -ENODEV;
diff --git a/include/linux/blk-cgroup.h b/include/linux/blk-cgroup.h
index bf13ecb0fe4f..f025fd1e22e6 100644
--- a/include/linux/blk-cgroup.h
+++ b/include/linux/blk-cgroup.h
@@ -511,7 +511,7 @@ static inline bool blkg_tryget(struct blkcg_gq *blkg)
  */
 static inline struct blkcg_gq *blkg_tryget_closest(struct blkcg_gq *blkg)
 {
-	while (!percpu_ref_tryget(&blkg->refcnt))
+	while (blkg && !percpu_ref_tryget(&blkg->refcnt))
 		blkg = blkg->parent;
 
 	return blkg;
--
2.17.1
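
Note: the NULL-guarded walk in blkg_tryget_closest() can be illustrated
with a minimal userspace sketch. This is not the kernel code. struct node,
node_tryget() and node_tryget_closest() are hypothetical stand-ins, and a
plain integer count (0 meaning "already killed") is assumed in place of
percpu_ref.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct blkcg_gq: only what the walk needs. */
struct node {
	struct node *parent;	/* NULL above the root */
	int refcnt;		/* 0 models a reference count that was killed */
};

/* Models a tryget: succeeds only while the count is still live. */
static bool node_tryget(struct node *n)
{
	if (n->refcnt == 0)
		return false;
	n->refcnt++;
	return true;
}

/*
 * Walk towards the root until a reference is obtained or the chain ends.
 * Without the "n &&" guard, a NULL start or a chain whose every node is
 * dead would dereference NULL.
 */
static struct node *node_tryget_closest(struct node *n)
{
	while (n && !node_tryget(n))
		n = n->parent;
	return n;
}
```

With a dead leaf and a live root, the walk returns the root and takes one
reference on it. Once the root is dead as well, or when the starting
pointer is already NULL, the walk returns NULL instead of faulting, so the
caller has to handle a NULL result.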



2018-12-11 23:17:20

by Bart Van Assche

Subject: Re: [PATCH] blkcg: handle dying request_queue when associating a blkg

On Tue, 2018-12-11 at 18:03 -0500, Dennis Zhou wrote:
> diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c
> index 6bd0619a7d6e..c30661ddc873 100644
> --- a/block/blk-cgroup.c
> +++ b/block/blk-cgroup.c
> @@ -202,6 +202,12 @@ static struct blkcg_gq *blkg_create(struct blkcg *blkcg,
> WARN_ON_ONCE(!rcu_read_lock_held());
> lockdep_assert_held(&q->queue_lock);
>
> + /* request_queue is dying, do not create/recreate a blkg */
> + if (blk_queue_dying(q)) {
> + ret = -ENODEV;
> + goto err_free_blkg;
> + }
> +
> /* blkg holds a reference to blkcg */
> if (!css_tryget_online(&blkcg->css)) {
> ret = -ENODEV;

What prevents that the queue state changes after blk_queue_dying() has returned
and before blkg_create() returns? Are you sure you don't need to protect this
code with a blk_queue_enter() / blk_queue_exit() pair?

Bart.

2018-12-12 04:08:43

by Dennis Zhou

Subject: Re: [PATCH] blkcg: handle dying request_queue when associating a blkg

Hi Bart,

On Tue, Dec 11, 2018 at 03:16:13PM -0800, Bart Van Assche wrote:
> On Tue, 2018-12-11 at 18:03 -0500, Dennis Zhou wrote:
> > diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c
> > index 6bd0619a7d6e..c30661ddc873 100644
> > --- a/block/blk-cgroup.c
> > +++ b/block/blk-cgroup.c
> > @@ -202,6 +202,12 @@ static struct blkcg_gq *blkg_create(struct blkcg *blkcg,
> > WARN_ON_ONCE(!rcu_read_lock_held());
> > lockdep_assert_held(&q->queue_lock);
> >
> > + /* request_queue is dying, do not create/recreate a blkg */
> > + if (blk_queue_dying(q)) {
> > + ret = -ENODEV;
> > + goto err_free_blkg;
> > + }
> > +
> > /* blkg holds a reference to blkcg */
> > if (!css_tryget_online(&blkcg->css)) {
> > ret = -ENODEV;
>
> What prevents that the queue state changes after blk_queue_dying() has returned
> and before blkg_create() returns? Are you sure you don't need to protect this
> code with a blk_queue_enter() / blk_queue_exit() pair?
>

Hmmm. So I think the idea is that we rely on normal shutdown as I don't
think there is anything wrong with creating a blkg on a dying
request_queue. When we are doing association, the request_queue should
be pinned by the open call. What we are racing against is when the
request_queue is shutting down, it goes around and destroys the blkgs.
For clarity, QUEUE_FLAG_DYING is set in blk_cleanup_queue() before
calling blk_exit_queue() which eventually calls blkcg_exit_queue().

The use of blk_queue_dying() is to determine whether blkg shutdown has
already started as if we create one after it has started, we may
incorrectly orphan a blkg and leak it. Both blkg creation and
destruction require holding the queue_lock, so if the QUEUE_FLAG_DYING
flag is set after we've checked it, it means blkg destruction hasn't
started because it has to wait on the queue_lock. If QUEUE_FLAG_DYING is
set, then we have no guarantee of knowing what phase blkg destruction is
in leading to a potential leak.

Thanks,
Dennis
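
Note: the ordering argument above can be modeled in a small userspace
sketch. It rests on assumptions and is not kernel code. struct queue_model
and every function name here are hypothetical, a C11 atomic_flag spinlock
stands in for the queue_lock, and the dying flag is modeled as an atomic
that is set before the destruction pass takes the lock.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define MAX_GROUPS 8

/* Hypothetical model of a request_queue; none of these names are kernel API. */
struct queue_model {
	atomic_flag lock;	/* stands in for the queue_lock */
	atomic_bool dying;	/* stands in for QUEUE_FLAG_DYING */
	int ngroups;		/* live groups (blkgs), protected by lock */
};

static void queue_model_init(struct queue_model *q)
{
	atomic_flag_clear(&q->lock);
	atomic_init(&q->dying, false);
	q->ngroups = 0;
}

static void queue_model_lock(struct queue_model *q)
{
	while (atomic_flag_test_and_set_explicit(&q->lock, memory_order_acquire))
		;	/* spin until the holder releases the lock */
}

static void queue_model_unlock(struct queue_model *q)
{
	atomic_flag_clear_explicit(&q->lock, memory_order_release);
}

/* Creation path: refuse once teardown was announced. Returns 0 or -1. */
static int group_create(struct queue_model *q)
{
	int ret = -1;

	queue_model_lock(q);
	if (!atomic_load(&q->dying) && q->ngroups < MAX_GROUPS) {
		q->ngroups++;
		ret = 0;
	}
	queue_model_unlock(q);
	return ret;
}

/* Teardown path: announce first, then destroy every group under the lock. */
static void queue_teardown(struct queue_model *q)
{
	atomic_store(&q->dying, true);
	queue_model_lock(q);
	q->ngroups = 0;
	queue_model_unlock(q);
}
```

If group_create() observes the flag clear while holding the lock, the
destruction pass cannot have started yet, because it needs the same lock
and only runs after the flag is set. The new group is therefore still seen
and freed by that pass. If the flag is already set, creation is refused,
so nothing can be added behind the destruction pass and leaked.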

2018-12-12 23:56:17

by Bart Van Assche

Subject: Re: [PATCH] blkcg: handle dying request_queue when associating a blkg

On Tue, 2018-12-11 at 23:06 -0500, Dennis Zhou wrote:
> Hi Bart,
>
> On Tue, Dec 11, 2018 at 03:16:13PM -0800, Bart Van Assche wrote:
> > On Tue, 2018-12-11 at 18:03 -0500, Dennis Zhou wrote:
> > > diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c
> > > index 6bd0619a7d6e..c30661ddc873 100644
> > > --- a/block/blk-cgroup.c
> > > +++ b/block/blk-cgroup.c
> > > @@ -202,6 +202,12 @@ static struct blkcg_gq *blkg_create(struct blkcg *blkcg,
> > > WARN_ON_ONCE(!rcu_read_lock_held());
> > > lockdep_assert_held(&q->queue_lock);
> > >
> > > + /* request_queue is dying, do not create/recreate a blkg */
> > > + if (blk_queue_dying(q)) {
> > > + ret = -ENODEV;
> > > + goto err_free_blkg;
> > > + }
> > > +
> > > /* blkg holds a reference to blkcg */
> > > if (!css_tryget_online(&blkcg->css)) {
> > > ret = -ENODEV;
> >
> > What prevents that the queue state changes after blk_queue_dying() has returned
> > and before blkg_create() returns? Are you sure you don't need to protect this
> > code with a blk_queue_enter() / blk_queue_exit() pair?
> >
>
> Hmmm. So I think the idea is that we rely on normal shutdown as I don't
> think there is anything wrong with creating a blkg on a dying
> request_queue. When we are doing association, the request_queue should
> be pinned by the open call. What we are racing against is when the
> request_queue is shutting down, it goes around and destroys the blkgs.
> For clarity, QUEUE_FLAG_DYING is set in blk_cleanup_queue() before
> calling blk_exit_queue() which eventually calls blkcg_exit_queue().
>
> The use of blk_queue_dying() is to determine whether blkg shutdown has
> already started as if we create one after it has started, we may
> incorrectly orphan a blkg and leak it. Both blkg creation and
> destruction require holding the queue_lock, so if the QUEUE_FLAG_DYING
> flag is set after we've checked it, it means blkg destruction hasn't
> started because it has to wait on the queue_lock. If QUEUE_FLAG_DYING is
> set, then we have no guarantee of knowing what phase blkg destruction is
> in leading to a potential leak.

Hi Dennis,

To answer my own question: since all queue flag manipulations are protected
by the queue lock and since blkg_create() is called with the queue lock held
the above code does not need any further protection. Hence feel free to add
the following:

Reviewed-by: Bart Van Assche <bvanassche@acm.org>




2018-12-13 00:45:17

by Jens Axboe

Subject: Re: [PATCH] blkcg: handle dying request_queue when associating a blkg

On 12/11/18 4:03 PM, Dennis Zhou wrote:
> Between v3 [1] and v4 [2] of the blkg association series, the
> association point moved from generic_make_request_checks(), which is
> called after the request enters the queue, to bio_set_dev(), which is when
> the bio is formed before submit_bio(). When the request_queue goes away,
> the blkgs supporting the request_queue are destroyed and then the
> q->root_blkg is set to %NULL.
>
> This patch adds a %NULL check to blkg_tryget_closest() to prevent the
> NPE caused by the above. It also adds a guard to see if the
> request_queue is dying when creating a blkg to prevent creating a blkg
> for a dead request_queue.

Applied, thanks Dennis.

--
Jens Axboe


2018-12-13 15:50:11

by Dennis Zhou

Subject: Re: [PATCH] blkcg: handle dying request_queue when associating a blkg

On Wed, Dec 12, 2018 at 03:54:52PM -0800, Bart Van Assche wrote:
> On Tue, 2018-12-11 at 23:06 -0500, Dennis Zhou wrote:
> > Hi Bart,
> >
> > On Tue, Dec 11, 2018 at 03:16:13PM -0800, Bart Van Assche wrote:
> > > On Tue, 2018-12-11 at 18:03 -0500, Dennis Zhou wrote:
> > > > diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c
> > > > index 6bd0619a7d6e..c30661ddc873 100644
> > > > --- a/block/blk-cgroup.c
> > > > +++ b/block/blk-cgroup.c
> > > > @@ -202,6 +202,12 @@ static struct blkcg_gq *blkg_create(struct blkcg *blkcg,
> > > > WARN_ON_ONCE(!rcu_read_lock_held());
> > > > lockdep_assert_held(&q->queue_lock);
> > > >
> > > > + /* request_queue is dying, do not create/recreate a blkg */
> > > > + if (blk_queue_dying(q)) {
> > > > + ret = -ENODEV;
> > > > + goto err_free_blkg;
> > > > + }
> > > > +
> > > > /* blkg holds a reference to blkcg */
> > > > if (!css_tryget_online(&blkcg->css)) {
> > > > ret = -ENODEV;
> > >
> > > What prevents that the queue state changes after blk_queue_dying() has returned
> > > and before blkg_create() returns? Are you sure you don't need to protect this
> > > code with a blk_queue_enter() / blk_queue_exit() pair?
> > >
> >
> > Hmmm. So I think the idea is that we rely on normal shutdown as I don't
> > think there is anything wrong with creating a blkg on a dying
> > request_queue. When we are doing association, the request_queue should
> > be pinned by the open call. What we are racing against is when the
> > request_queue is shutting down, it goes around and destroys the blkgs.
> > For clarity, QUEUE_FLAG_DYING is set in blk_cleanup_queue() before
> > calling blk_exit_queue() which eventually calls blkcg_exit_queue().
> >
> > The use of blk_queue_dying() is to determine whether blkg shutdown has
> > already started as if we create one after it has started, we may
> > incorrectly orphan a blkg and leak it. Both blkg creation and
> > destruction require holding the queue_lock, so if the QUEUE_FLAG_DYING
> > flag is set after we've checked it, it means blkg destruction hasn't
> > started because it has to wait on the queue_lock. If QUEUE_FLAG_DYING is
> > set, then we have no guarantee of knowing what phase blkg destruction is
> > in leading to a potential leak.
>
> Hi Dennis,
>
> To answer my own question: since all queue flag manipulations are protected
> by the queue lock and since blkg_create() is called with the queue lock held
> the above code does not need any further protection. Hence feel free to add
> the following:
>
> Reviewed-by: Bart Van Assche <[email protected]>
>

It seems that Christoph in 57d74df90783 ("block: use atomic bitops for
->queue_flags") changed it so that flag manipulations no longer are
protected by the queue_lock in for-4.21/block. But I think my
explanation above suffices that we will always be able to clean up a
blkg created as long as the QUEUE_FLAG_DYING is not set.

Thanks for reviewing this!
Dennis