2019-03-25 05:39:23

by jianchao.wang

[permalink] [raw]
Subject: [PATCH V2 0/8]: blk-mq: use static_rqs to iterate busy tags

Hi Jens

As we know, there is a risk of accesing stale requests when iterate
in-flight requests with tags->rqs[] and this has been talked in following
thread,
[1] https://marc.info/?l=linux-scsi&m=154511693912752&w=2
[2] https://marc.info/?l=linux-block&m=154526189023236&w=2

A typical sence could be
blk_mq_get_request blk_mq_queue_tag_busy_iter
-> blk_mq_get_tag
-> bt_for_each
-> bt_iter
-> rq = taags->rqs[]
-> rq->q
-> blk_mq_rq_ctx_init
-> data->hctx->tags->rqs[rq->tag] = rq;

The root cause is that there is a window between set bit on tag sbitmap
and set tags->rqs[].

This patch would fix this issue by iterating requests with tags->static_rqs[]
instead of tags->rqs[] which would be changed dynamically. Moreover,
we will try to get a non-zero q_usage_counter before access hctxs and tags and
thus could avoid the race with updating nr_hw_queues, switching io scheduler
and even queue clean up which are all under a frozen and drained queue.

The 1st patch get rid of the useless of synchronize_rcu in __blk_mq_update_nr_hw_queues

The 2nd patch modify the blk_mq_queue_tag_busy_iter to use tags->static_rqs[]
instead of tags->rqs[] to iterate the busy tags.

The 3rd ~ 7th patch change the blk_mq_tagset_busy_iter to blk_mq_queue_tag_busy_iter
which is safer

The 8th patch get rid of the blk_mq_tagset_busy_iter.

Change log

V1 -> V2:
- Add wrapper to hide the 'infligh' parameter to user based on Sagi's suggestion.
- Other misc changes on comment.

Jianchao Wang (8)
blk-mq: get rid of the synchronize_rcu in
blk-mq: use static_rqs instead of rqs to iterate tags
blk-mq: use blk_mq_queue_tag_inflight_iter in debugfs
mtip32xx: use blk_mq_queue_tag_inflight_iter
nbd: use blk_mq_queue_tag_inflight_iter
skd: use blk_mq_queue_tag_inflight_iter
nvme: use blk_mq_queue_tag_inflight_iter
blk-mq: remove blk_mq_tagset_busy_iter

diff stat
block/blk-mq-debugfs.c | 2 +-
block/blk-mq-tag.c | 193 ++++++++++++++------------------------
block/blk-mq-tag.h | 4 +-
block/blk-mq.c | 31 ++----
drivers/block/mtip32xx/mtip32xx.c | 6 +-
drivers/block/nbd.c | 2 +-
drivers/block/skd_main.c | 4 +-
drivers/nvme/host/core.c | 12 +++
drivers/nvme/host/fc.c | 10 +-
drivers/nvme/host/nvme.h | 2 +
drivers/nvme/host/pci.c | 5 +-
drivers/nvme/host/rdma.c | 4 +-
drivers/nvme/host/tcp.c | 5 +-
drivers/nvme/target/loop.c | 4 +-
include/linux/blk-mq.h | 7 +-
15 files changed, 119 insertions(+), 172 deletions(-)

Thanks
Jianchao


2019-03-25 05:37:51

by jianchao.wang

[permalink] [raw]
Subject: [PATCH V2 2/8] blk-mq: use static_rqs instead of rqs to iterate tags

tags->rqs[] will not been cleaned when free driver tag to avoid
an extra store on a shared area in the per io path. But there
is a window between get driver tag and write tags->rqs[], so we
may see stale rq in tags->rqs[] which may have been freed, as
following case,
blk_mq_get_request blk_mq_queue_tag_busy_iter
-> blk_mq_get_tag
-> bt_for_each
-> bt_iter
-> rq = taags->rqs[]
-> rq->q
-> blk_mq_rq_ctx_init
-> data->hctx->tags->rqs[rq->tag] = rq;

In additiion, tags->rqs[] only contains the requests that get
driver tag. It is not accurate for io-scheduler case when account
busy tags in part_in_flight.

To fix both of them, the blk_mq_queue_tag_busy_iter is changed
in this patch to use tags->static_rqs[] instead of tags->rqs[].
We have to identify whether there is a io scheduler attached to
decide to use hctx->tags or hctx->sched_tags. And we will try to
get a non-zero q_usage_counter before that, then could avoid race
with update nr_hw_queues, switch io-scheduler and even queue cleanup.

Add 'inflight' parameter to determine to iterate in-flight
requests or just busy tags and add a new helper interface
blk_mq_queue_tag_inflight_iter to iterate all of the in-flight
tags and export this interface for drivers.

Signed-off-by: Jianchao Wang <[email protected]>
---
block/blk-mq-tag.c | 96 ++++++++++++++++++++++++++++++++++++--------------
block/blk-mq-tag.h | 4 +--
block/blk-mq.c | 27 +++++---------
include/linux/blk-mq.h | 5 +--
4 files changed, 83 insertions(+), 49 deletions(-)

diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
index 5f28002..bf7b235 100644
--- a/block/blk-mq-tag.c
+++ b/block/blk-mq-tag.c
@@ -216,26 +216,38 @@ struct bt_iter_data {
busy_iter_fn *fn;
void *data;
bool reserved;
+ bool inflight;
};

static bool bt_iter(struct sbitmap *bitmap, unsigned int bitnr, void *data)
{
struct bt_iter_data *iter_data = data;
struct blk_mq_hw_ctx *hctx = iter_data->hctx;
- struct blk_mq_tags *tags = hctx->tags;
bool reserved = iter_data->reserved;
+ struct blk_mq_tags *tags;
struct request *rq;

+ tags = hctx->sched_tags ? hctx->sched_tags : hctx->tags;
+
if (!reserved)
bitnr += tags->nr_reserved_tags;
- rq = tags->rqs[bitnr];
+ /*
+ * Because tags->rqs[] will not been cleaned when free driver tag
+ * and there is a window between get driver tag and write tags->rqs[],
+ * so we may see stale rq in tags->rqs[] which may have been freed.
+ * Using static_rqs[] is safer.
+ */
+ rq = tags->static_rqs[bitnr];

/*
- * We can hit rq == NULL here, because the tagging functions
- * test and set the bit before assigning ->rqs[].
+ * There is a small window between get tag and blk_mq_rq_ctx_init,
+ * so rq->q and rq->mq_hctx maybe different.
*/
- if (rq && rq->q == hctx->queue)
- return iter_data->fn(hctx, rq, iter_data->data, reserved);
+ if (rq && rq->q == hctx->queue &&
+ rq->mq_hctx == hctx &&
+ (!iter_data->inflight ||
+ blk_mq_rq_state(rq) == MQ_RQ_IN_FLIGHT))
+ return iter_data->fn(rq, iter_data->data, reserved);
return true;
}

@@ -246,7 +258,7 @@ static bool bt_iter(struct sbitmap *bitmap, unsigned int bitnr, void *data)
* or the bitmap_tags member of struct blk_mq_tags.
* @fn: Pointer to the function that will be called for each request
* associated with @hctx that has been assigned a driver tag.
- * @fn will be called as follows: @fn(@hctx, rq, @data, @reserved)
+ * @fn will be called as follows: @fn(rq, @data, @reserved)
* where rq is a pointer to a request. Return true to continue
* iterating tags, false to stop.
* @data: Will be passed as third argument to @fn.
@@ -254,13 +266,14 @@ static bool bt_iter(struct sbitmap *bitmap, unsigned int bitnr, void *data)
* bitmap_tags member of struct blk_mq_tags.
*/
static void bt_for_each(struct blk_mq_hw_ctx *hctx, struct sbitmap_queue *bt,
- busy_iter_fn *fn, void *data, bool reserved)
+ busy_iter_fn *fn, void *data, bool reserved, bool inflight)
{
struct bt_iter_data iter_data = {
.hctx = hctx,
.fn = fn,
.data = data,
.reserved = reserved,
+ .inflight = inflight,
};

sbitmap_for_each_set(&bt->sb, bt_iter, &iter_data);
@@ -362,36 +375,33 @@ void blk_mq_tagset_busy_iter(struct blk_mq_tag_set *tagset,
EXPORT_SYMBOL(blk_mq_tagset_busy_iter);

/**
- * blk_mq_queue_tag_busy_iter - iterate over all requests with a driver tag
+ * __blk_mq_queue_tag_busy_iter - iterate over all busy or inflight tags
* @q: Request queue to examine.
- * @fn: Pointer to the function that will be called for each request
- * on @q. @fn will be called as follows: @fn(hctx, rq, @priv,
- * reserved) where rq is a pointer to a request and hctx points
- * to the hardware queue associated with the request. 'reserved'
- * indicates whether or not @rq is a reserved request.
+ * @fn: Pointer to the function that will be called for each
+ * in-flight request issued by @q. @fn will be called as
+ * follows:
+ * @fn(rq, @priv, reserved)
+ * rq is a pointer to a request.'reserved' indicates whether or
+ * not @rq is a reserved request.
* @priv: Will be passed as third argument to @fn.
- *
- * Note: if @q->tag_set is shared with other request queues then @fn will be
- * called for all requests on all queues that share that tag set and not only
- * for requests associated with @q.
*/
-void blk_mq_queue_tag_busy_iter(struct request_queue *q, busy_iter_fn *fn,
- void *priv)
+static void __blk_mq_queue_tag_busy_iter(struct request_queue *q, busy_iter_fn *fn,
+ void *priv, bool inflight)
{
struct blk_mq_hw_ctx *hctx;
int i;

/*
- * __blk_mq_update_nr_hw_queues() updates nr_hw_queues and queue_hw_ctx
- * while the queue is frozen. So we can use q_usage_counter to avoid
- * racing with it.
+ * Get a reference of the queue unless it has been zero. We use this
+ * to avoid the race with the code that would modify the hctxs after
+ * freeze and drain the queue, including updating nr_hw_queues, io
+ * scheduler switching and queue clean up.
*/
if (!percpu_ref_tryget(&q->q_usage_counter))
return;

queue_for_each_hw_ctx(q, hctx, i) {
- struct blk_mq_tags *tags = hctx->tags;
-
+ struct blk_mq_tags *tags;
/*
* If no software queues are currently mapped to this
* hardware queue, there's nothing to check
@@ -399,13 +409,45 @@ void blk_mq_queue_tag_busy_iter(struct request_queue *q, busy_iter_fn *fn,
if (!blk_mq_hw_queue_mapped(hctx))
continue;

+ tags = hctx->sched_tags ? hctx->sched_tags : hctx->tags;
+
if (tags->nr_reserved_tags)
- bt_for_each(hctx, &tags->breserved_tags, fn, priv, true);
- bt_for_each(hctx, &tags->bitmap_tags, fn, priv, false);
+ bt_for_each(hctx, &tags->breserved_tags,
+ fn, priv, true, inflight);
+ bt_for_each(hctx, &tags->bitmap_tags,
+ fn, priv, false, inflight);
+ /*
+ * flush_rq represents the rq with REQ_PREFLUSH and REQ_FUA
+ * (if FUA is not supported by device) to be issued to
+ * device. So we need to consider it when iterate inflight
+ * rqs, but needn't to count it when iterate busy tags.
+ */
+ if (inflight &&
+ blk_mq_rq_state(hctx->fq->flush_rq) == MQ_RQ_IN_FLIGHT)
+ fn(hctx->fq->flush_rq, priv, false);
}
blk_queue_exit(q);
}

+/*
+ * Iterate all the busy tags including pending and in-flight ones.
+ */
+void blk_mq_queue_tag_busy_iter(struct request_queue *q, busy_iter_fn *fn,
+ void *priv)
+{
+ __blk_mq_queue_tag_busy_iter(q, fn, priv, false);
+}
+
+/*
+ * Iterate all the inflight tags.
+ */
+void blk_mq_queue_tag_inflight_iter(struct request_queue *q,
+ busy_iter_fn *fn, void *priv)
+{
+ __blk_mq_queue_tag_busy_iter(q, fn, priv, true);
+}
+EXPORT_SYMBOL(blk_mq_queue_tag_inflight_iter);
+
static int bt_alloc(struct sbitmap_queue *bt, unsigned int depth,
bool round_robin, int node)
{
diff --git a/block/blk-mq-tag.h b/block/blk-mq-tag.h
index 61deab0..bcbb699 100644
--- a/block/blk-mq-tag.h
+++ b/block/blk-mq-tag.h
@@ -32,9 +32,9 @@ extern bool blk_mq_has_free_tags(struct blk_mq_tags *tags);
extern int blk_mq_tag_update_depth(struct blk_mq_hw_ctx *hctx,
struct blk_mq_tags **tags,
unsigned int depth, bool can_grow);
-extern void blk_mq_tag_wakeup_all(struct blk_mq_tags *tags, bool);
-void blk_mq_queue_tag_busy_iter(struct request_queue *q, busy_iter_fn *fn,
+extern void blk_mq_queue_tag_busy_iter(struct request_queue *q, busy_iter_fn *fn,
void *priv);
+extern void blk_mq_tag_wakeup_all(struct blk_mq_tags *tags, bool);

static inline struct sbq_wait_state *bt_wait_ptr(struct sbitmap_queue *bt,
struct blk_mq_hw_ctx *hctx)
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 2102d91..531522e 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -93,8 +93,7 @@ struct mq_inflight {
unsigned int *inflight;
};

-static bool blk_mq_check_inflight(struct blk_mq_hw_ctx *hctx,
- struct request *rq, void *priv,
+static bool blk_mq_check_inflight(struct request *rq, void *priv,
bool reserved)
{
struct mq_inflight *mi = priv;
@@ -119,8 +118,7 @@ unsigned int blk_mq_in_flight(struct request_queue *q, struct hd_struct *part)
return inflight[0];
}

-static bool blk_mq_check_inflight_rw(struct blk_mq_hw_ctx *hctx,
- struct request *rq, void *priv,
+static bool blk_mq_check_inflight_rw(struct request *rq, void *priv,
bool reserved)
{
struct mq_inflight *mi = priv;
@@ -809,28 +807,22 @@ struct request *blk_mq_tag_to_rq(struct blk_mq_tags *tags, unsigned int tag)
}
EXPORT_SYMBOL(blk_mq_tag_to_rq);

-static bool blk_mq_rq_inflight(struct blk_mq_hw_ctx *hctx, struct request *rq,
- void *priv, bool reserved)
+static bool blk_mq_rq_inflight(struct request *rq, void *priv, bool reserved)
{
+ bool *busy = priv;
/*
* If we find a request that is inflight and the queue matches,
* we know the queue is busy. Return false to stop the iteration.
*/
- if (rq->state == MQ_RQ_IN_FLIGHT && rq->q == hctx->queue) {
- bool *busy = priv;
-
- *busy = true;
- return false;
- }
-
- return true;
+ *busy = true;
+ return false;
}

bool blk_mq_queue_inflight(struct request_queue *q)
{
bool busy = false;

- blk_mq_queue_tag_busy_iter(q, blk_mq_rq_inflight, &busy);
+ blk_mq_queue_tag_inflight_iter(q, blk_mq_rq_inflight, &busy);
return busy;
}
EXPORT_SYMBOL_GPL(blk_mq_queue_inflight);
@@ -870,8 +862,7 @@ static bool blk_mq_req_expired(struct request *rq, unsigned long *next)
return false;
}

-static bool blk_mq_check_expired(struct blk_mq_hw_ctx *hctx,
- struct request *rq, void *priv, bool reserved)
+static bool blk_mq_check_expired(struct request *rq, void *priv, bool reserved)
{
unsigned long *next = priv;

@@ -932,7 +923,7 @@ static void blk_mq_timeout_work(struct work_struct *work)
if (!percpu_ref_tryget(&q->q_usage_counter))
return;

- blk_mq_queue_tag_busy_iter(q, blk_mq_check_expired, &next);
+ blk_mq_queue_tag_inflight_iter(q, blk_mq_check_expired, &next);

if (next != 0) {
mod_timer(&q->timeout, next);
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index b0c814b..dff9bb6 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -132,8 +132,7 @@ typedef int (init_request_fn)(struct blk_mq_tag_set *set, struct request *,
typedef void (exit_request_fn)(struct blk_mq_tag_set *set, struct request *,
unsigned int);

-typedef bool (busy_iter_fn)(struct blk_mq_hw_ctx *, struct request *, void *,
- bool);
+typedef bool (busy_iter_fn)(struct request *, void *, bool);
typedef bool (busy_tag_iter_fn)(struct request *, void *, bool);
typedef int (poll_fn)(struct blk_mq_hw_ctx *);
typedef int (map_queues_fn)(struct blk_mq_tag_set *set);
@@ -321,6 +320,8 @@ bool blk_mq_run_hw_queue(struct blk_mq_hw_ctx *hctx, bool async);
void blk_mq_run_hw_queues(struct request_queue *q, bool async);
void blk_mq_tagset_busy_iter(struct blk_mq_tag_set *tagset,
busy_tag_iter_fn *fn, void *priv);
+void blk_mq_queue_tag_inflight_iter(struct request_queue *q,
+ busy_iter_fn *fn, void *priv);
void blk_mq_freeze_queue(struct request_queue *q);
void blk_mq_unfreeze_queue(struct request_queue *q);
void blk_freeze_queue_start(struct request_queue *q);
--
2.7.4


2019-03-25 05:38:00

by jianchao.wang

[permalink] [raw]
Subject: [PATCH V2 5/8] nbd: use blk_mq_queue_tag_inflight_iter

blk_mq_tagset_busy_iter is not safe that it could get stale request
in tags->rqs[]. Use blk_mq_queue_tag_inflight_iter here.

Signed-off-by: Jianchao Wang <[email protected]>
---
drivers/block/nbd.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c
index 90ba9f4..c48984b 100644
--- a/drivers/block/nbd.c
+++ b/drivers/block/nbd.c
@@ -747,7 +747,7 @@ static bool nbd_clear_req(struct request *req, void *data, bool reserved)
static void nbd_clear_que(struct nbd_device *nbd)
{
blk_mq_quiesce_queue(nbd->disk->queue);
- blk_mq_tagset_busy_iter(&nbd->tag_set, nbd_clear_req, NULL);
+ blk_mq_queue_tag_inflight_iter(nbd->disk->queue, nbd_clear_req, NULL);
blk_mq_unquiesce_queue(nbd->disk->queue);
dev_dbg(disk_to_dev(nbd->disk), "queue cleared\n");
}
--
2.7.4


2019-03-25 05:38:13

by jianchao.wang

[permalink] [raw]
Subject: [PATCH V2 1/8] blk-mq: get rid of the synchronize_rcu in __blk_mq_update_nr_hw_queues

In commit 530ca2c (blk-mq: Allow blocking queue tag iter callbacks),
we try to get a non-zero q_usage_counter to avoid access hctxs that
being modified. So the synchronize_rcu is useless and should be
removed.

Reviewed-by: Ming Lei <[email protected]>
Signed-off-by: Jianchao Wang <[email protected]>
---
block/blk-mq-tag.c | 4 +---
block/blk-mq.c | 4 ----
2 files changed, 1 insertion(+), 7 deletions(-)

diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
index a4931fc..5f28002 100644
--- a/block/blk-mq-tag.c
+++ b/block/blk-mq-tag.c
@@ -384,9 +384,7 @@ void blk_mq_queue_tag_busy_iter(struct request_queue *q, busy_iter_fn *fn,
/*
* __blk_mq_update_nr_hw_queues() updates nr_hw_queues and queue_hw_ctx
* while the queue is frozen. So we can use q_usage_counter to avoid
- * racing with it. __blk_mq_update_nr_hw_queues() uses
- * synchronize_rcu() to ensure this function left the critical section
- * below.
+ * racing with it.
*/
if (!percpu_ref_tryget(&q->q_usage_counter))
return;
diff --git a/block/blk-mq.c b/block/blk-mq.c
index a9c1816..2102d91 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -3211,10 +3211,6 @@ static void __blk_mq_update_nr_hw_queues(struct blk_mq_tag_set *set,
list_for_each_entry(q, &set->tag_list, tag_set_list)
blk_mq_freeze_queue(q);
/*
- * Sync with blk_mq_queue_tag_busy_iter.
- */
- synchronize_rcu();
- /*
* Switch IO scheduler to 'none', cleaning up the data associated
* with the previous scheduler. We will switch back once we are done
* updating the new sw to hw queue mappings.
--
2.7.4


2019-03-25 05:38:18

by jianchao.wang

[permalink] [raw]
Subject: [PATCH V2 3/8] blk-mq: use blk_mq_queue_tag_inflight_iter in debugfs

blk_mq_tagset_busy_iter is not safe that it could get stale request
in tags->rqs[]. Use blk_mq_queue_tag_inflight_iter here.

Signed-off-by: Jianchao Wang <[email protected]>
---
block/blk-mq-debugfs.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/block/blk-mq-debugfs.c b/block/blk-mq-debugfs.c
index ec1d18c..b7f2538 100644
--- a/block/blk-mq-debugfs.c
+++ b/block/blk-mq-debugfs.c
@@ -438,7 +438,7 @@ static int hctx_busy_show(void *data, struct seq_file *m)
struct blk_mq_hw_ctx *hctx = data;
struct show_busy_params params = { .m = m, .hctx = hctx };

- blk_mq_tagset_busy_iter(hctx->queue->tag_set, hctx_show_busy_rq,
+ blk_mq_queue_tag_inflight_iter(hctx->queue, hctx_show_busy_rq,
&params);

return 0;
--
2.7.4


2019-03-25 05:38:21

by jianchao.wang

[permalink] [raw]
Subject: [PATCH V2 6/8] skd: use blk_mq_queue_tag_inflight_iter

blk_mq_tagset_busy_iter is not safe that it could get stale request
in tags->rqs[]. Use blk_mq_queue_tag_inflight_iter here.

Signed-off-by: Jianchao Wang <[email protected]>
---
drivers/block/skd_main.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/block/skd_main.c b/drivers/block/skd_main.c
index 7d3ad6c..0213b19 100644
--- a/drivers/block/skd_main.c
+++ b/drivers/block/skd_main.c
@@ -395,7 +395,7 @@ static int skd_in_flight(struct skd_device *skdev)
{
int count = 0;

- blk_mq_tagset_busy_iter(&skdev->tag_set, skd_inc_in_flight, &count);
+ blk_mq_queue_tag_inflight_iter(skdev->queue, skd_inc_in_flight, &count);

return count;
}
@@ -1916,7 +1916,7 @@ static bool skd_recover_request(struct request *req, void *data, bool reserved)

static void skd_recover_requests(struct skd_device *skdev)
{
- blk_mq_tagset_busy_iter(&skdev->tag_set, skd_recover_request, skdev);
+ blk_mq_queue_tag_inflight_iter(skdev->queue, skd_recover_request, skdev);
}

static void skd_isr_msg_from_dev(struct skd_device *skdev)
--
2.7.4


2019-03-25 05:38:27

by jianchao.wang

[permalink] [raw]
Subject: [PATCH V2 7/8] nvme: use blk_mq_queue_tag_inflight_iter

blk_mq_tagset_inflight_iter is not safe that it could get stale request
in tags->rqs[]. Use blk_mq_queue_tag_inflight_iter here. A new helper
interface nvme_iterate_inflight_rqs is introduced to iterate
all of the ns under a ctrl.

Signed-off-by: Jianchao Wang <[email protected]>
---
drivers/nvme/host/core.c | 12 ++++++++++++
drivers/nvme/host/fc.c | 10 +++++-----
drivers/nvme/host/nvme.h | 2 ++
drivers/nvme/host/pci.c | 5 +++--
drivers/nvme/host/rdma.c | 4 ++--
drivers/nvme/host/tcp.c | 5 +++--
drivers/nvme/target/loop.c | 4 ++--
7 files changed, 29 insertions(+), 13 deletions(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 4706019..d6c53fe 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -3874,6 +3874,18 @@ void nvme_start_queues(struct nvme_ctrl *ctrl)
}
EXPORT_SYMBOL_GPL(nvme_start_queues);

+void nvme_iterate_inflight_rqs(struct nvme_ctrl *ctrl,
+ busy_iter_fn *fn, void *data)
+{
+ struct nvme_ns *ns;
+
+ down_read(&ctrl->namespaces_rwsem);
+ list_for_each_entry(ns, &ctrl->namespaces, list)
+ blk_mq_queue_tag_inflight_iter(ns->queue, fn, data);
+ up_read(&ctrl->namespaces_rwsem);
+}
+EXPORT_SYMBOL_GPL(nvme_iterate_inflight_rqs);
+
int __init nvme_core_init(void)
{
int result = -ENOMEM;
diff --git a/drivers/nvme/host/fc.c b/drivers/nvme/host/fc.c
index f3b9d91..667da72 100644
--- a/drivers/nvme/host/fc.c
+++ b/drivers/nvme/host/fc.c
@@ -2367,7 +2367,7 @@ nvme_fc_complete_rq(struct request *rq)
/*
* This routine is used by the transport when it needs to find active
* io on a queue that is to be terminated. The transport uses
- * blk_mq_tagset_busy_itr() to find the busy requests, which then invoke
+ * blk_mq_queue_tag_inflight_iter() to find the busy requests, which then invoke
* this routine to kill them on a 1 by 1 basis.
*
* As FC allocates FC exchange for each io, the transport must contact
@@ -2740,7 +2740,7 @@ nvme_fc_delete_association(struct nvme_fc_ctrl *ctrl)
* If io queues are present, stop them and terminate all outstanding
* ios on them. As FC allocates FC exchange for each io, the
* transport must contact the LLDD to terminate the exchange,
- * thus releasing the FC exchange. We use blk_mq_tagset_busy_itr()
+ * thus releasing the FC exchange. We use blk_mq_queue_tag_inflight_iter
* to tell us what io's are busy and invoke a transport routine
* to kill them with the LLDD. After terminating the exchange
* the LLDD will call the transport's normal io done path, but it
@@ -2750,7 +2750,7 @@ nvme_fc_delete_association(struct nvme_fc_ctrl *ctrl)
*/
if (ctrl->ctrl.queue_count > 1) {
nvme_stop_queues(&ctrl->ctrl);
- blk_mq_tagset_busy_iter(&ctrl->tag_set,
+ nvme_iterate_inflight_rqs(&ctrl->ctrl,
nvme_fc_terminate_exchange, &ctrl->ctrl);
}

@@ -2768,11 +2768,11 @@ nvme_fc_delete_association(struct nvme_fc_ctrl *ctrl)

/*
* clean up the admin queue. Same thing as above.
- * use blk_mq_tagset_busy_itr() and the transport routine to
+ * use blk_mq_queue_tag_inflight_iter() and the transport routine to
* terminate the exchanges.
*/
blk_mq_quiesce_queue(ctrl->ctrl.admin_q);
- blk_mq_tagset_busy_iter(&ctrl->admin_tag_set,
+ blk_mq_queue_tag_inflight_iter(ctrl->ctrl.admin_q,
nvme_fc_terminate_exchange, &ctrl->ctrl);

/* kill the aens as they are a separate path */
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index 527d645..4c6bc803 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -445,6 +445,8 @@ void nvme_unfreeze(struct nvme_ctrl *ctrl);
void nvme_wait_freeze(struct nvme_ctrl *ctrl);
void nvme_wait_freeze_timeout(struct nvme_ctrl *ctrl, long timeout);
void nvme_start_freeze(struct nvme_ctrl *ctrl);
+void nvme_iterate_inflight_rqs(struct nvme_ctrl *ctrl,
+ busy_iter_fn *fn, void *data);

#define NVME_QID_ANY -1
struct request *nvme_alloc_request(struct request_queue *q,
diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index a90cf5d..96faa36 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -2430,8 +2430,9 @@ static void nvme_dev_disable(struct nvme_dev *dev, bool shutdown)
nvme_suspend_queue(&dev->queues[0]);
nvme_pci_disable(dev);

- blk_mq_tagset_busy_iter(&dev->tagset, nvme_cancel_request, &dev->ctrl);
- blk_mq_tagset_busy_iter(&dev->admin_tagset, nvme_cancel_request, &dev->ctrl);
+ nvme_iterate_inflight_rqs(&dev->ctrl, nvme_cancel_request, &dev->ctrl);
+ blk_mq_queue_tag_inflight_iter(dev->ctrl.admin_q,
+ nvme_cancel_request, &dev->ctrl);

/*
* The driver will not be starting up queues again if shutting down so
diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
index 11a5eca..5660200 100644
--- a/drivers/nvme/host/rdma.c
+++ b/drivers/nvme/host/rdma.c
@@ -914,7 +914,7 @@ static void nvme_rdma_teardown_admin_queue(struct nvme_rdma_ctrl *ctrl,
{
blk_mq_quiesce_queue(ctrl->ctrl.admin_q);
nvme_rdma_stop_queue(&ctrl->queues[0]);
- blk_mq_tagset_busy_iter(&ctrl->admin_tag_set, nvme_cancel_request,
+ blk_mq_queue_tag_inflight_iter(ctrl->ctrl.admin_q, nvme_cancel_request,
&ctrl->ctrl);
blk_mq_unquiesce_queue(ctrl->ctrl.admin_q);
nvme_rdma_destroy_admin_queue(ctrl, remove);
@@ -926,7 +926,7 @@ static void nvme_rdma_teardown_io_queues(struct nvme_rdma_ctrl *ctrl,
if (ctrl->ctrl.queue_count > 1) {
nvme_stop_queues(&ctrl->ctrl);
nvme_rdma_stop_io_queues(ctrl);
- blk_mq_tagset_busy_iter(&ctrl->tag_set, nvme_cancel_request,
+ nvme_iterate_inflight_rqs(&ctrl->ctrl, nvme_cancel_request,
&ctrl->ctrl);
if (remove)
nvme_start_queues(&ctrl->ctrl);
diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
index e7e0888..4c825dc 100644
--- a/drivers/nvme/host/tcp.c
+++ b/drivers/nvme/host/tcp.c
@@ -1710,7 +1710,8 @@ static void nvme_tcp_teardown_admin_queue(struct nvme_ctrl *ctrl,
{
blk_mq_quiesce_queue(ctrl->admin_q);
nvme_tcp_stop_queue(ctrl, 0);
- blk_mq_tagset_busy_iter(ctrl->admin_tagset, nvme_cancel_request, ctrl);
+ blk_mq_queue_tag_inflight_iter(ctrl->admin_q,
+ nvme_cancel_request, ctrl);
blk_mq_unquiesce_queue(ctrl->admin_q);
nvme_tcp_destroy_admin_queue(ctrl, remove);
}
@@ -1722,7 +1723,7 @@ static void nvme_tcp_teardown_io_queues(struct nvme_ctrl *ctrl,
return;
nvme_stop_queues(ctrl);
nvme_tcp_stop_io_queues(ctrl);
- blk_mq_tagset_busy_iter(ctrl->tagset, nvme_cancel_request, ctrl);
+ nvme_iterate_inflight_rqs(ctrl, nvme_cancel_request, ctrl);
if (remove)
nvme_start_queues(ctrl);
nvme_tcp_destroy_io_queues(ctrl, remove);
diff --git a/drivers/nvme/target/loop.c b/drivers/nvme/target/loop.c
index b9f623a..50d7288 100644
--- a/drivers/nvme/target/loop.c
+++ b/drivers/nvme/target/loop.c
@@ -421,7 +421,7 @@ static void nvme_loop_shutdown_ctrl(struct nvme_loop_ctrl *ctrl)
{
if (ctrl->ctrl.queue_count > 1) {
nvme_stop_queues(&ctrl->ctrl);
- blk_mq_tagset_busy_iter(&ctrl->tag_set,
+ nvme_iterate_inflight_rqs(&ctrl->ctrl,
nvme_cancel_request, &ctrl->ctrl);
nvme_loop_destroy_io_queues(ctrl);
}
@@ -430,7 +430,7 @@ static void nvme_loop_shutdown_ctrl(struct nvme_loop_ctrl *ctrl)
nvme_shutdown_ctrl(&ctrl->ctrl);

blk_mq_quiesce_queue(ctrl->ctrl.admin_q);
- blk_mq_tagset_busy_iter(&ctrl->admin_tag_set,
+ blk_mq_queue_tag_inflight_iter(ctrl->ctrl.admin_q,
nvme_cancel_request, &ctrl->ctrl);
blk_mq_unquiesce_queue(ctrl->ctrl.admin_q);
nvme_loop_destroy_admin_queue(ctrl);
--
2.7.4


2019-03-25 05:38:40

by jianchao.wang

[permalink] [raw]
Subject: [PATCH V2 8/8] blk-mq: remove blk_mq_tagset_busy_iter

As nobody uses blk_mq_tagset_busy_iter, remove it.

Signed-off-by: Jianchao Wang <[email protected]>
---
block/blk-mq-tag.c | 95 --------------------------------------------------
include/linux/blk-mq.h | 2 --
2 files changed, 97 deletions(-)

diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
index bf7b235..a6a28dd 100644
--- a/block/blk-mq-tag.c
+++ b/block/blk-mq-tag.c
@@ -279,101 +279,6 @@ static void bt_for_each(struct blk_mq_hw_ctx *hctx, struct sbitmap_queue *bt,
sbitmap_for_each_set(&bt->sb, bt_iter, &iter_data);
}

-struct bt_tags_iter_data {
- struct blk_mq_tags *tags;
- busy_tag_iter_fn *fn;
- void *data;
- bool reserved;
-};
-
-static bool bt_tags_iter(struct sbitmap *bitmap, unsigned int bitnr, void *data)
-{
- struct bt_tags_iter_data *iter_data = data;
- struct blk_mq_tags *tags = iter_data->tags;
- bool reserved = iter_data->reserved;
- struct request *rq;
-
- if (!reserved)
- bitnr += tags->nr_reserved_tags;
-
- /*
- * We can hit rq == NULL here, because the tagging functions
- * test and set the bit before assining ->rqs[].
- */
- rq = tags->rqs[bitnr];
- if (rq && blk_mq_request_started(rq))
- return iter_data->fn(rq, iter_data->data, reserved);
-
- return true;
-}
-
-/**
- * bt_tags_for_each - iterate over the requests in a tag map
- * @tags: Tag map to iterate over.
- * @bt: sbitmap to examine. This is either the breserved_tags member
- * or the bitmap_tags member of struct blk_mq_tags.
- * @fn: Pointer to the function that will be called for each started
- * request. @fn will be called as follows: @fn(rq, @data,
- * @reserved) where rq is a pointer to a request. Return true
- * to continue iterating tags, false to stop.
- * @data: Will be passed as second argument to @fn.
- * @reserved: Indicates whether @bt is the breserved_tags member or the
- * bitmap_tags member of struct blk_mq_tags.
- */
-static void bt_tags_for_each(struct blk_mq_tags *tags, struct sbitmap_queue *bt,
- busy_tag_iter_fn *fn, void *data, bool reserved)
-{
- struct bt_tags_iter_data iter_data = {
- .tags = tags,
- .fn = fn,
- .data = data,
- .reserved = reserved,
- };
-
- if (tags->rqs)
- sbitmap_for_each_set(&bt->sb, bt_tags_iter, &iter_data);
-}
-
-/**
- * blk_mq_all_tag_busy_iter - iterate over all started requests in a tag map
- * @tags: Tag map to iterate over.
- * @fn: Pointer to the function that will be called for each started
- * request. @fn will be called as follows: @fn(rq, @priv,
- * reserved) where rq is a pointer to a request. 'reserved'
- * indicates whether or not @rq is a reserved request. Return
- * true to continue iterating tags, false to stop.
- * @priv: Will be passed as second argument to @fn.
- */
-static void blk_mq_all_tag_busy_iter(struct blk_mq_tags *tags,
- busy_tag_iter_fn *fn, void *priv)
-{
- if (tags->nr_reserved_tags)
- bt_tags_for_each(tags, &tags->breserved_tags, fn, priv, true);
- bt_tags_for_each(tags, &tags->bitmap_tags, fn, priv, false);
-}
-
-/**
- * blk_mq_tagset_busy_iter - iterate over all started requests in a tag set
- * @tagset: Tag set to iterate over.
- * @fn: Pointer to the function that will be called for each started
- * request. @fn will be called as follows: @fn(rq, @priv,
- * reserved) where rq is a pointer to a request. 'reserved'
- * indicates whether or not @rq is a reserved request. Return
- * true to continue iterating tags, false to stop.
- * @priv: Will be passed as second argument to @fn.
- */
-void blk_mq_tagset_busy_iter(struct blk_mq_tag_set *tagset,
- busy_tag_iter_fn *fn, void *priv)
-{
- int i;
-
- for (i = 0; i < tagset->nr_hw_queues; i++) {
- if (tagset->tags && tagset->tags[i])
- blk_mq_all_tag_busy_iter(tagset->tags[i], fn, priv);
- }
-}
-EXPORT_SYMBOL(blk_mq_tagset_busy_iter);
-
/**
* __blk_mq_queue_tag_busy_iter - iterate over all busy or inflight tags
* @q: Request queue to examine.
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index dff9bb6..3ba8001 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -318,8 +318,6 @@ void blk_mq_unquiesce_queue(struct request_queue *q);
void blk_mq_delay_run_hw_queue(struct blk_mq_hw_ctx *hctx, unsigned long msecs);
bool blk_mq_run_hw_queue(struct blk_mq_hw_ctx *hctx, bool async);
void blk_mq_run_hw_queues(struct request_queue *q, bool async);
-void blk_mq_tagset_busy_iter(struct blk_mq_tag_set *tagset,
- busy_tag_iter_fn *fn, void *priv);
void blk_mq_queue_tag_inflight_iter(struct request_queue *q,
busy_iter_fn *fn, void *priv);
void blk_mq_freeze_queue(struct request_queue *q);
--
2.7.4


2019-03-25 05:40:36

by jianchao.wang

[permalink] [raw]
Subject: [PATCH V2 4/8] mtip32xx: use blk_mq_queue_tag_inflight_iter

blk_mq_tagset_busy_iter is not safe that it could get stale request
in tags->rqs[]. Use blk_mq_queue_tag_inflight_iter here.

Signed-off-by: Jianchao Wang <[email protected]>
---
drivers/block/mtip32xx/mtip32xx.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/block/mtip32xx/mtip32xx.c b/drivers/block/mtip32xx/mtip32xx.c
index 83302ec..103e691 100644
--- a/drivers/block/mtip32xx/mtip32xx.c
+++ b/drivers/block/mtip32xx/mtip32xx.c
@@ -2771,12 +2771,12 @@ static int mtip_service_thread(void *data)

blk_mq_quiesce_queue(dd->queue);

- blk_mq_tagset_busy_iter(&dd->tags, mtip_queue_cmd, dd);
+ blk_mq_queue_tag_inflight_iter(dd->queue, mtip_queue_cmd, dd);

set_bit(MTIP_PF_ISSUE_CMDS_BIT, &dd->port->flags);

if (mtip_device_reset(dd))
- blk_mq_tagset_busy_iter(&dd->tags,
+ blk_mq_queue_tag_inflight_iter(dd->queue,
mtip_abort_cmd, dd);

clear_bit(MTIP_PF_TO_ACTIVE_BIT, &dd->port->flags);
@@ -3903,7 +3903,7 @@ static int mtip_block_remove(struct driver_data *dd)

blk_freeze_queue_start(dd->queue);
blk_mq_quiesce_queue(dd->queue);
- blk_mq_tagset_busy_iter(&dd->tags, mtip_no_dev_cleanup, dd);
+ blk_mq_queue_tag_inflight_iter(dd->queue, mtip_no_dev_cleanup, dd);
blk_mq_unquiesce_queue(dd->queue);

/*
--
2.7.4