2020-03-16 10:02:13

by Baolin Wang

Subject: [RESEND RFC PATCH 0/8] Add MMC packed request support

Hi All,

Some SD/MMC controllers can support packed commands or packed requests,
which means multiple requests can be packaged and handed to the host
controller to be handled at one time, improving I/O performance. Thus
this patch set adds an MMC packed request function to support packed
requests or packed commands.

In this patch set, I changed the dispatch_request() interface to allow
dispatching a batch of requests to the hardware, and expanded the MMC
software queue to support packed requests. I also implemented the SD host
ADMA3 transfer mode to support packed requests. The ADMA3 transfer mode
processes a multi-block data transfer by using a pair of command
descriptor and ADMA2 descriptor. In the future, we can easily expand the
MMC packed function to support packed commands.
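
For reference, the core interface change (shown in full in patch 1/8 below)
replaces the single-request dispatch hook with one that fills a list:

    /* Old elevator hook: returns at most one request per call */
    struct request *(*dispatch_request)(struct blk_mq_hw_ctx *);

    /*
     * New hook: appends dispatched requests to 'list' and returns 1 if
     * anything was dispatched, 0 otherwise.
     */
    int (*dispatch_requests)(struct blk_mq_hw_ctx *, struct list_head *);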

Below are some comparison data between packed requests and non-packed
requests measured with the fio tool. The fio command I used is shown
below, varying the '--rw' parameter and enabling the direct I/O flag to
measure the actual hardware transfer speed. I ran each case 5 times and
report the average speed.

./fio --filename=/dev/mmcblk0p30 --direct=1 --iodepth=20 --rw=read --bs=4K --size=512M --group_reporting --numjobs=20 --name=test_read

My eMMC card works in HS400 Enhanced strobe mode:
[ 2.229856] mmc0: new HS400 Enhanced strobe MMC card at address 0001
[ 2.237566] mmcblk0: mmc0:0001 HBG4a2 29.1 GiB
[ 2.242621] mmcblk0boot0: mmc0:0001 HBG4a2 partition 1 4.00 MiB
[ 2.249110] mmcblk0boot1: mmc0:0001 HBG4a2 partition 2 4.00 MiB
[ 2.255307] mmcblk0rpmb: mmc0:0001 HBG4a2 partition 3 4.00 MiB, chardev (248:0)

1. Non-packed request
1) Sequential read:
Speed: 59.2MiB/s, 60.4MiB/s, 63.6MiB/s, 60.3MiB/s, 59.9MiB/s
Average speed: 60.68MiB/s

2) Random read:
Speed: 31.3MiB/s, 31.4MiB/s, 31.5MiB/s, 31.3MiB/s, 31.3MiB/s
Average speed: 31.36MiB/s

3) Sequential write:
Speed: 71MiB/s, 71.8MiB/s, 72.3MiB/s, 72.2MiB/s, 71MiB/s
Average speed: 71.66MiB/s

4) Random write:
Speed: 68.9MiB/s, 68.7MiB/s, 68.8MiB/s, 68.6MiB/s, 68.8MiB/s
Average speed: 68.76MiB/s

2. Packed request
1) Sequential read:
Speed: 230MiB/s, 230MiB/s, 229MiB/s, 230MiB/s, 229MiB/s
Average speed: 229.6MiB/s

2) Random read:
Speed: 181MiB/s, 181MiB/s, 181MiB/s, 180MiB/s, 181MiB/s
Average speed: 180.8MiB/s

3) Sequential write:
Speed: 175MiB/s, 171MiB/s, 171MiB/s, 172MiB/s, 171MiB/s
Average speed: 172MiB/s

4) Random write:
Speed: 169MiB/s, 169MiB/s, 171MiB/s, 167MiB/s, 170MiB/s
Average speed: 169.2MiB/s

From the above data, we can see that packed requests improve performance
greatly. Any comments are welcome. Thanks a lot.

Baolin Wang (8):
block: Change the dispatch_request() API to support batch requests
block: Allow sending a batch of requests from the scheduler to
hardware
mmc: Add MMC packed request support for MMC software queue
mmc: host: sdhci: Introduce ADMA3 transfer mode
mmc: host: sdhci: Factor out the command configuration
mmc: host: sdhci: Remove redundant sg_count member of struct
sdhci_host
mmc: host: sdhci: Add MMC packed request support
mmc: host: sdhci-sprd: Add MMC packed request support

block/bfq-iosched.c | 38 +++-
block/blk-mq-sched.c | 15 +-
block/blk-mq.c | 2 -
block/blk-settings.c | 13 ++
block/kyber-iosched.c | 80 ++++---
block/mq-deadline.c | 26 ++-
drivers/mmc/core/block.c | 35 ++-
drivers/mmc/core/block.h | 3 +-
drivers/mmc/core/core.c | 26 +++
drivers/mmc/core/core.h | 2 +
drivers/mmc/core/queue.c | 22 +-
drivers/mmc/host/mmc_hsq.c | 271 ++++++++++++++++++++---
drivers/mmc/host/mmc_hsq.h | 25 ++-
drivers/mmc/host/sdhci-sprd.c | 30 ++-
drivers/mmc/host/sdhci.c | 499 ++++++++++++++++++++++++++++++++++++------
drivers/mmc/host/sdhci.h | 61 +++++-
include/linux/blkdev.h | 8 +
include/linux/elevator.h | 2 +-
include/linux/mmc/core.h | 6 +
include/linux/mmc/host.h | 9 +
20 files changed, 991 insertions(+), 182 deletions(-)

--
1.9.1


2020-03-16 10:02:17

by Baolin Wang

Subject: [RESEND RFC PATCH 1/8] block: Change the dispatch_request() API to support batch requests

Some SD/MMC host controllers can support packed commands or packed
requests, which means we can package several requests and send them to
the host controller at one time to improve performance.

But blk-mq always takes one request from the scheduler and dispatches it
to the device, regardless of the driver or the scheduler, so there will
only ever be one request in the local list in blk_mq_dispatch_rq_list().
That means bd.last is always true, and the driver cannot use bd.last to
decide whether there are more requests pending in the hardware queue to
help it package requests.

Thus this is a preparation patch, which changes the dispatch_request()
API to allow dispatching more than one request from the scheduler.
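
As a minimal illustration (not part of this patch; struct foo_queue and
the foo_*() helpers are hypothetical), once a batch can reach the driver,
a .queue_rq() implementation can use bd->last to decide when to actually
kick the hardware:

    static blk_status_t foo_queue_rq(struct blk_mq_hw_ctx *hctx,
                                     const struct blk_mq_queue_data *bd)
    {
            struct foo_queue *fq = hctx->queue->queuedata;

            blk_mq_start_request(bd->rq);

            /* Collect the request without starting the transfer yet. */
            foo_add_to_batch(fq, bd->rq);

            /*
             * Only submit the whole packed transfer when this is the last
             * request of the current dispatch batch.
             */
            if (bd->last)
                    foo_issue_batch(fq);

            return BLK_STS_OK;
    }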

Suggested-by: Arnd Bergmann <[email protected]>
Signed-off-by: Baolin Wang <[email protected]>
---
block/bfq-iosched.c | 12 +++++++++---
block/blk-mq-sched.c | 15 ++++-----------
block/kyber-iosched.c | 20 +++++++++++++-------
block/mq-deadline.c | 12 +++++++++---
include/linux/elevator.h | 2 +-
5 files changed, 36 insertions(+), 25 deletions(-)

diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
index 8c436ab..d7a128e 100644
--- a/block/bfq-iosched.c
+++ b/block/bfq-iosched.c
@@ -4789,7 +4789,8 @@ static inline void bfq_update_dispatch_stats(struct request_queue *q,
bool idle_timer_disabled) {}
#endif /* CONFIG_BFQ_CGROUP_DEBUG */

-static struct request *bfq_dispatch_request(struct blk_mq_hw_ctx *hctx)
+static int bfq_dispatch_requests(struct blk_mq_hw_ctx *hctx,
+ struct list_head *list)
{
struct bfq_data *bfqd = hctx->queue->elevator->elevator_data;
struct request *rq;
@@ -4811,7 +4812,12 @@ static struct request *bfq_dispatch_request(struct blk_mq_hw_ctx *hctx)
bfq_update_dispatch_stats(hctx->queue, rq, in_serv_queue,
idle_timer_disabled);

- return rq;
+ if (!rq)
+ return 0;
+
+ list_add(&rq->queuelist, list);
+
+ return 1;
}

/*
@@ -6785,7 +6791,7 @@ static ssize_t bfq_low_latency_store(struct elevator_queue *e,
.finish_request = bfq_finish_requeue_request,
.exit_icq = bfq_exit_icq,
.insert_requests = bfq_insert_requests,
- .dispatch_request = bfq_dispatch_request,
+ .dispatch_requests = bfq_dispatch_requests,
.next_request = elv_rb_latter_request,
.former_request = elv_rb_former_request,
.allow_merge = bfq_allow_bio_merge,
diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
index ca22afd..f49f9d9 100644
--- a/block/blk-mq-sched.c
+++ b/block/blk-mq-sched.c
@@ -90,28 +90,21 @@ static void blk_mq_do_dispatch_sched(struct blk_mq_hw_ctx *hctx)
struct request_queue *q = hctx->queue;
struct elevator_queue *e = q->elevator;
LIST_HEAD(rq_list);
+ int ret;

do {
- struct request *rq;
-
if (e->type->ops.has_work && !e->type->ops.has_work(hctx))
break;

if (!blk_mq_get_dispatch_budget(hctx))
break;

- rq = e->type->ops.dispatch_request(hctx);
- if (!rq) {
+ ret = e->type->ops.dispatch_requests(hctx, &rq_list);
+ if (ret == 0) {
blk_mq_put_dispatch_budget(hctx);
break;
}

- /*
- * Now this rq owns the budget which has to be released
- * if this rq won't be queued to driver via .queue_rq()
- * in blk_mq_dispatch_rq_list().
- */
- list_add(&rq->queuelist, &rq_list);
} while (blk_mq_dispatch_rq_list(q, &rq_list, true));
}

@@ -171,7 +164,7 @@ void blk_mq_sched_dispatch_requests(struct blk_mq_hw_ctx *hctx)
{
struct request_queue *q = hctx->queue;
struct elevator_queue *e = q->elevator;
- const bool has_sched_dispatch = e && e->type->ops.dispatch_request;
+ const bool has_sched_dispatch = e && e->type->ops.dispatch_requests;
LIST_HEAD(rq_list);

/* RCU or SRCU read lock is needed before checking quiesced flag */
diff --git a/block/kyber-iosched.c b/block/kyber-iosched.c
index 34dcea0..8f58434 100644
--- a/block/kyber-iosched.c
+++ b/block/kyber-iosched.c
@@ -796,12 +796,13 @@ static int kyber_get_domain_token(struct kyber_queue_data *kqd,
return NULL;
}

-static struct request *kyber_dispatch_request(struct blk_mq_hw_ctx *hctx)
+static int kyber_dispatch_requests(struct blk_mq_hw_ctx *hctx,
+ struct list_head *list)
{
struct kyber_queue_data *kqd = hctx->queue->elevator->elevator_data;
struct kyber_hctx_data *khd = hctx->sched_data;
struct request *rq;
- int i;
+ int i, ret = 0;

spin_lock(&khd->lock);

@@ -811,8 +812,11 @@ static struct request *kyber_dispatch_request(struct blk_mq_hw_ctx *hctx)
*/
if (khd->batching < kyber_batch_size[khd->cur_domain]) {
rq = kyber_dispatch_cur_domain(kqd, khd, hctx);
- if (rq)
+ if (rq) {
+ list_add(&rq->queuelist, list);
+ ret = 1;
goto out;
+ }
}

/*
@@ -832,14 +836,16 @@ static struct request *kyber_dispatch_request(struct blk_mq_hw_ctx *hctx)
khd->cur_domain++;

rq = kyber_dispatch_cur_domain(kqd, khd, hctx);
- if (rq)
+ if (rq) {
+ list_add(&rq->queuelist, list);
+ ret = 1;
goto out;
+ }
}

- rq = NULL;
out:
spin_unlock(&khd->lock);
- return rq;
+ return ret;
}

static bool kyber_has_work(struct blk_mq_hw_ctx *hctx)
@@ -1020,7 +1026,7 @@ static int kyber_batching_show(void *data, struct seq_file *m)
.finish_request = kyber_finish_request,
.requeue_request = kyber_finish_request,
.completed_request = kyber_completed_request,
- .dispatch_request = kyber_dispatch_request,
+ .dispatch_requests = kyber_dispatch_requests,
.has_work = kyber_has_work,
},
#ifdef CONFIG_BLK_DEBUG_FS
diff --git a/block/mq-deadline.c b/block/mq-deadline.c
index b490f47..9fbffba 100644
--- a/block/mq-deadline.c
+++ b/block/mq-deadline.c
@@ -378,7 +378,8 @@ static struct request *__dd_dispatch_request(struct deadline_data *dd)
* different hardware queue. This is because mq-deadline has shared
* state for all hardware queues, in terms of sorting, FIFOs, etc.
*/
-static struct request *dd_dispatch_request(struct blk_mq_hw_ctx *hctx)
+static int dd_dispatch_requests(struct blk_mq_hw_ctx *hctx,
+ struct list_head *list)
{
struct deadline_data *dd = hctx->queue->elevator->elevator_data;
struct request *rq;
@@ -387,7 +388,12 @@ static struct request *dd_dispatch_request(struct blk_mq_hw_ctx *hctx)
rq = __dd_dispatch_request(dd);
spin_unlock(&dd->lock);

- return rq;
+ if (!rq)
+ return 0;
+
+ list_add(&rq->queuelist, list);
+
+ return 1;
}

static void dd_exit_queue(struct elevator_queue *e)
@@ -774,7 +780,7 @@ static void deadline_dispatch_stop(struct seq_file *m, void *v)
static struct elevator_type mq_deadline = {
.ops = {
.insert_requests = dd_insert_requests,
- .dispatch_request = dd_dispatch_request,
+ .dispatch_requests = dd_dispatch_requests,
.prepare_request = dd_prepare_request,
.finish_request = dd_finish_request,
.next_request = elv_rb_latter_request,
diff --git a/include/linux/elevator.h b/include/linux/elevator.h
index 901bda3..a65bf5d 100644
--- a/include/linux/elevator.h
+++ b/include/linux/elevator.h
@@ -42,7 +42,7 @@ struct elevator_mq_ops {
void (*prepare_request)(struct request *, struct bio *bio);
void (*finish_request)(struct request *);
void (*insert_requests)(struct blk_mq_hw_ctx *, struct list_head *, bool);
- struct request *(*dispatch_request)(struct blk_mq_hw_ctx *);
+ int (*dispatch_requests)(struct blk_mq_hw_ctx *, struct list_head *);
bool (*has_work)(struct blk_mq_hw_ctx *);
void (*completed_request)(struct request *, u64);
void (*requeue_request)(struct request *);
--
1.9.1

2020-03-16 10:02:24

by Baolin Wang

Subject: [RESEND RFC PATCH 2/8] block: Allow sending a batch of requests from the scheduler to hardware

As we know, some SD/MMC host controllers can support packed requests,
which means we can package several requests and send them to the host
controller at one time to improve performance. So the hardware driver
expects blk-mq to dispatch a batch of requests at one time, and the
driver can use bd.last to indicate whether it is the last request in the
batch, to help combine requests as much as possible.

Thus we should add a batch requests setting for the block driver to tell
the scheduler how many requests can be dispatched in a batch, as well as
change the schedulers to dispatch more than one request when a maximum
batch requests number is set.
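
A block driver that can handle such batches would advertise its limit
when setting up the queue; a minimal sketch (the function name and the
value 8 are just examples, the real MMC wiring is in patch 3/8):

    static void foo_setup_queue(struct request_queue *q)
    {
            /*
             * Allow the I/O scheduler to dispatch up to 8 requests per
             * dispatch_requests() call; the default limit is 1.
             */
            blk_queue_max_batch_requests(q, 8);
    }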

Signed-off-by: Baolin Wang <[email protected]>
---
block/bfq-iosched.c | 32 ++++++++++++++--------
block/blk-mq.c | 2 --
block/blk-settings.c | 13 +++++++++
block/kyber-iosched.c | 74 +++++++++++++++++++++++++++-----------------------
block/mq-deadline.c | 20 ++++++++++----
include/linux/blkdev.h | 8 ++++++
6 files changed, 95 insertions(+), 54 deletions(-)

diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
index d7a128e..9c1e3aa 100644
--- a/block/bfq-iosched.c
+++ b/block/bfq-iosched.c
@@ -4793,29 +4793,37 @@ static int bfq_dispatch_requests(struct blk_mq_hw_ctx *hctx,
struct list_head *list)
{
struct bfq_data *bfqd = hctx->queue->elevator->elevator_data;
+ unsigned int batch_reqs = queue_max_batch_requests(hctx->queue) ? : 1;
struct request *rq;
struct bfq_queue *in_serv_queue;
bool waiting_rq, idle_timer_disabled;
+ int i;

- spin_lock_irq(&bfqd->lock);
+ for (i = 0; i < batch_reqs; i++) {
+ spin_lock_irq(&bfqd->lock);

- in_serv_queue = bfqd->in_service_queue;
- waiting_rq = in_serv_queue && bfq_bfqq_wait_request(in_serv_queue);
+ in_serv_queue = bfqd->in_service_queue;
+ waiting_rq = in_serv_queue && bfq_bfqq_wait_request(in_serv_queue);

- rq = __bfq_dispatch_request(hctx);
+ rq = __bfq_dispatch_request(hctx);

- idle_timer_disabled =
- waiting_rq && !bfq_bfqq_wait_request(in_serv_queue);
+ idle_timer_disabled =
+ waiting_rq && !bfq_bfqq_wait_request(in_serv_queue);

- spin_unlock_irq(&bfqd->lock);
+ spin_unlock_irq(&bfqd->lock);

- bfq_update_dispatch_stats(hctx->queue, rq, in_serv_queue,
- idle_timer_disabled);
+ bfq_update_dispatch_stats(hctx->queue, rq, in_serv_queue,
+ idle_timer_disabled);

- if (!rq)
- return 0;
+ if (!rq) {
+ if (list_empty(list))
+ return 0;

- list_add(&rq->queuelist, list);
+ return 1;
+ }
+
+ list_add(&rq->queuelist, list);
+ }

return 1;
}
diff --git a/block/blk-mq.c b/block/blk-mq.c
index a12b176..2588e7a 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -1193,8 +1193,6 @@ bool blk_mq_dispatch_rq_list(struct request_queue *q, struct list_head *list,
if (list_empty(list))
return false;

- WARN_ON(!list_is_singular(list) && got_budget);
-
/*
* Now process all the entries, sending them to the driver.
*/
diff --git a/block/blk-settings.c b/block/blk-settings.c
index c8eda2e..8c0b325 100644
--- a/block/blk-settings.c
+++ b/block/blk-settings.c
@@ -59,6 +59,7 @@ void blk_set_default_limits(struct queue_limits *lim)
lim->io_opt = 0;
lim->misaligned = 0;
lim->zoned = BLK_ZONED_NONE;
+ lim->max_batch_reqs = 1;
}
EXPORT_SYMBOL(blk_set_default_limits);

@@ -871,6 +872,18 @@ bool blk_queue_can_use_dma_map_merging(struct request_queue *q,
}
EXPORT_SYMBOL_GPL(blk_queue_can_use_dma_map_merging);

+/**
+ * blk_queue_max_batch_requests - set max requests for batch processing
+ * @q: the request queue for the device
+ * @max_batch_requests: maximum number of requests in a batch
+ **/
+void blk_queue_max_batch_requests(struct request_queue *q,
+ unsigned int max_batch_requests)
+{
+ q->limits.max_batch_reqs = max_batch_requests;
+}
+EXPORT_SYMBOL(blk_queue_max_batch_requests);
+
static int __init blk_settings_init(void)
{
blk_max_low_pfn = max_low_pfn - 1;
diff --git a/block/kyber-iosched.c b/block/kyber-iosched.c
index 8f58434..3a84a5f 100644
--- a/block/kyber-iosched.c
+++ b/block/kyber-iosched.c
@@ -801,50 +801,56 @@ static int kyber_dispatch_requests(struct blk_mq_hw_ctx *hctx,
{
struct kyber_queue_data *kqd = hctx->queue->elevator->elevator_data;
struct kyber_hctx_data *khd = hctx->sched_data;
+ unsigned int batch_reqs = queue_max_batch_requests(hctx->queue) ? : 1;
struct request *rq;
- int i, ret = 0;
+ int i, j, ret = 0;

spin_lock(&khd->lock);

- /*
- * First, if we are still entitled to batch, try to dispatch a request
- * from the batch.
- */
- if (khd->batching < kyber_batch_size[khd->cur_domain]) {
- rq = kyber_dispatch_cur_domain(kqd, khd, hctx);
- if (rq) {
- list_add(&rq->queuelist, list);
- ret = 1;
- goto out;
+ for (j = 0; j < batch_reqs; j++) {
+ /*
+ * First, if we are still entitled to batch, try to dispatch a
+ * request from the batch.
+ */
+ if (khd->batching < kyber_batch_size[khd->cur_domain]) {
+ rq = kyber_dispatch_cur_domain(kqd, khd, hctx);
+ if (rq) {
+ list_add(&rq->queuelist, list);
+ ret = 1;
+ continue;
+ }
}
- }
-
- /*
- * Either,
- * 1. We were no longer entitled to a batch.
- * 2. The domain we were batching didn't have any requests.
- * 3. The domain we were batching was out of tokens.
- *
- * Start another batch. Note that this wraps back around to the original
- * domain if no other domains have requests or tokens.
- */
- khd->batching = 0;
- for (i = 0; i < KYBER_NUM_DOMAINS; i++) {
- if (khd->cur_domain == KYBER_NUM_DOMAINS - 1)
- khd->cur_domain = 0;
- else
- khd->cur_domain++;

- rq = kyber_dispatch_cur_domain(kqd, khd, hctx);
- if (rq) {
- list_add(&rq->queuelist, list);
- ret = 1;
- goto out;
+ /*
+ * Either,
+ * 1. We were no longer entitled to a batch.
+ * 2. The domain we were batching didn't have any requests.
+ * 3. The domain we were batching was out of tokens.
+ *
+ * Start another batch. Note that this wraps back around to the
+ * original domain if no other domains have requests or tokens.
+ */
+ khd->batching = 0;
+ for (i = 0; i < KYBER_NUM_DOMAINS; i++) {
+ if (khd->cur_domain == KYBER_NUM_DOMAINS - 1)
+ khd->cur_domain = 0;
+ else
+ khd->cur_domain++;
+
+ rq = kyber_dispatch_cur_domain(kqd, khd, hctx);
+ if (rq) {
+ list_add(&rq->queuelist, list);
+ ret = 1;
+ break;
+ }
}
}

-out:
spin_unlock(&khd->lock);
+
+ if (list_empty(list))
+ ret = 0;
+
return ret;
}

diff --git a/block/mq-deadline.c b/block/mq-deadline.c
index 9fbffba..4e3d58a 100644
--- a/block/mq-deadline.c
+++ b/block/mq-deadline.c
@@ -382,16 +382,24 @@ static int dd_dispatch_requests(struct blk_mq_hw_ctx *hctx,
struct list_head *list)
{
struct deadline_data *dd = hctx->queue->elevator->elevator_data;
+ unsigned int batch_reqs = queue_max_batch_requests(hctx->queue) ? : 1;
struct request *rq;
+ int i;

- spin_lock(&dd->lock);
- rq = __dd_dispatch_request(dd);
- spin_unlock(&dd->lock);
+ for (i = 0; i < batch_reqs; i++) {
+ spin_lock(&dd->lock);
+ rq = __dd_dispatch_request(dd);
+ spin_unlock(&dd->lock);

- if (!rq)
- return 0;
+ if (!rq) {
+ if (list_empty(list))
+ return 0;

- list_add(&rq->queuelist, list);
+ return 1;
+ }
+
+ list_add(&rq->queuelist, list);
+ }

return 1;
}
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 053ea4b..d7032a0 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -338,6 +338,7 @@ struct queue_limits {
unsigned int max_write_zeroes_sectors;
unsigned int discard_granularity;
unsigned int discard_alignment;
+ unsigned int max_batch_reqs;

unsigned short max_segments;
unsigned short max_integrity_segments;
@@ -1109,6 +1110,8 @@ extern void blk_queue_required_elevator_features(struct request_queue *q,
unsigned int features);
extern bool blk_queue_can_use_dma_map_merging(struct request_queue *q,
struct device *dev);
+extern void blk_queue_max_batch_requests(struct request_queue *q,
+ unsigned int max_batch_requests);

/*
* Number of physical segments as sent to the device.
@@ -1291,6 +1294,11 @@ static inline unsigned int queue_max_segment_size(const struct request_queue *q)
return q->limits.max_segment_size;
}

+static inline unsigned int queue_max_batch_requests(const struct request_queue *q)
+{
+ return q->limits.max_batch_reqs;
+}
+
static inline unsigned queue_logical_block_size(const struct request_queue *q)
{
int retval = 512;
--
1.9.1

2020-03-16 10:02:40

by Baolin Wang

Subject: [RESEND RFC PATCH 5/8] mmc: host: sdhci: Factor out the command configuration

Move the SD command configuration into a separate function to simplify
sdhci_send_command(). Moreover, this function can be used to support the
ADMA3 transfer mode in the following patches.

Signed-off-by: Baolin Wang <[email protected]>
---
drivers/mmc/host/sdhci.c | 65 ++++++++++++++++++++++++++++--------------------
1 file changed, 38 insertions(+), 27 deletions(-)

diff --git a/drivers/mmc/host/sdhci.c b/drivers/mmc/host/sdhci.c
index 6238b5c..4de0f48 100644
--- a/drivers/mmc/host/sdhci.c
+++ b/drivers/mmc/host/sdhci.c
@@ -1546,9 +1546,43 @@ static void sdhci_finish_data(struct sdhci_host *host)
}
}

-void sdhci_send_command(struct sdhci_host *host, struct mmc_command *cmd)
+static int sdhci_get_command(struct sdhci_host *host, struct mmc_command *cmd)
{
int flags;
+
+ if ((cmd->flags & MMC_RSP_136) && (cmd->flags & MMC_RSP_BUSY)) {
+ pr_err("%s: Unsupported response type!\n",
+ mmc_hostname(host->mmc));
+ cmd->error = -EINVAL;
+ sdhci_finish_mrq(host, cmd->mrq);
+ return -EINVAL;
+ }
+
+ if (!(cmd->flags & MMC_RSP_PRESENT))
+ flags = SDHCI_CMD_RESP_NONE;
+ else if (cmd->flags & MMC_RSP_136)
+ flags = SDHCI_CMD_RESP_LONG;
+ else if (cmd->flags & MMC_RSP_BUSY)
+ flags = SDHCI_CMD_RESP_SHORT_BUSY;
+ else
+ flags = SDHCI_CMD_RESP_SHORT;
+
+ if (cmd->flags & MMC_RSP_CRC)
+ flags |= SDHCI_CMD_CRC;
+ if (cmd->flags & MMC_RSP_OPCODE)
+ flags |= SDHCI_CMD_INDEX;
+
+ /* CMD19 is special in that the Data Present Select should be set */
+ if (cmd->data || cmd->opcode == MMC_SEND_TUNING_BLOCK ||
+ cmd->opcode == MMC_SEND_TUNING_BLOCK_HS200)
+ flags |= SDHCI_CMD_DATA;
+
+ return SDHCI_MAKE_CMD(cmd->opcode, flags);
+}
+
+void sdhci_send_command(struct sdhci_host *host, struct mmc_command *cmd)
+{
+ int command;
u32 mask;
unsigned long timeout;

@@ -1605,32 +1639,9 @@ void sdhci_send_command(struct sdhci_host *host, struct mmc_command *cmd)

sdhci_set_transfer_mode(host, cmd);

- if ((cmd->flags & MMC_RSP_136) && (cmd->flags & MMC_RSP_BUSY)) {
- pr_err("%s: Unsupported response type!\n",
- mmc_hostname(host->mmc));
- cmd->error = -EINVAL;
- sdhci_finish_mrq(host, cmd->mrq);
+ command = sdhci_get_command(host, cmd);
+ if (command < 0)
return;
- }
-
- if (!(cmd->flags & MMC_RSP_PRESENT))
- flags = SDHCI_CMD_RESP_NONE;
- else if (cmd->flags & MMC_RSP_136)
- flags = SDHCI_CMD_RESP_LONG;
- else if (cmd->flags & MMC_RSP_BUSY)
- flags = SDHCI_CMD_RESP_SHORT_BUSY;
- else
- flags = SDHCI_CMD_RESP_SHORT;
-
- if (cmd->flags & MMC_RSP_CRC)
- flags |= SDHCI_CMD_CRC;
- if (cmd->flags & MMC_RSP_OPCODE)
- flags |= SDHCI_CMD_INDEX;
-
- /* CMD19 is special in that the Data Present Select should be set */
- if (cmd->data || cmd->opcode == MMC_SEND_TUNING_BLOCK ||
- cmd->opcode == MMC_SEND_TUNING_BLOCK_HS200)
- flags |= SDHCI_CMD_DATA;

timeout = jiffies;
if (host->data_timeout)
@@ -1644,7 +1655,7 @@ void sdhci_send_command(struct sdhci_host *host, struct mmc_command *cmd)
if (host->use_external_dma)
sdhci_external_dma_pre_transfer(host, cmd);

- sdhci_writew(host, SDHCI_MAKE_CMD(cmd->opcode, flags), SDHCI_COMMAND);
+ sdhci_writew(host, command, SDHCI_COMMAND);
}
EXPORT_SYMBOL_GPL(sdhci_send_command);

--
1.9.1

2020-03-16 10:02:55

by Baolin Wang

Subject: [RESEND RFC PATCH 4/8] mmc: host: sdhci: Introduce ADMA3 transfer mode

The standard SD host controller can optionally support the ADMA3 transfer
mode. ADMA3 uses a command descriptor to issue an SD command, and a
multi-block data transfer is programmed by using a pair of command
descriptor and ADMA2 descriptor. ADMA3 performs multiple multi-block data
transfers by using an integrated descriptor.

This is a preparation patch that adds the ADMA3 structures and helps to
expand the ADMA buffer to support the ADMA3 transfer mode.
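
Conceptually, each entry of the integrated descriptor table points at the
command descriptor table of one request; a rough sketch using the
structures added here (the helper below is hypothetical, the actual
descriptor programming is done in patch 7/8):

    /* Hypothetical helper: fill one 64-bit integrated descriptor entry. */
    static void sdhci_fill_integr_desc(struct sdhci_integr_64_desc *desc,
                                       dma_addr_t cmd_desc_addr, bool last)
    {
            /* Valid attribute; the last entry also carries the End attribute. */
            desc->cmd = cpu_to_le32(last ? ADMA3_INTEGR_TRAN_END :
                                           ADMA3_INTEGR_TRAN_VALID);
            /* Address of this request's command descriptor table. */
            desc->addr_lo = cpu_to_le32(lower_32_bits(cmd_desc_addr));
            desc->addr_hi = cpu_to_le32(upper_32_bits(cmd_desc_addr));
    }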

Signed-off-by: Baolin Wang <[email protected]>
---
drivers/mmc/host/sdhci.c | 106 +++++++++++++++++++++++++++++++++++++++--------
drivers/mmc/host/sdhci.h | 48 +++++++++++++++++++++
2 files changed, 137 insertions(+), 17 deletions(-)

diff --git a/drivers/mmc/host/sdhci.c b/drivers/mmc/host/sdhci.c
index 4febbcb..6238b5c 100644
--- a/drivers/mmc/host/sdhci.c
+++ b/drivers/mmc/host/sdhci.c
@@ -772,7 +772,8 @@ static void sdhci_adma_table_pre(struct sdhci_host *host,
* If this triggers then we have a calculation bug
* somewhere. :/
*/
- WARN_ON((desc - host->adma_table) >= host->adma_table_sz);
+ WARN_ON((desc - host->adma_table) >=
+ host->adma_table_sz * host->adma3_table_cnt);
}

if (host->quirks & SDHCI_QUIRK_NO_ENDATTR_IN_NOPDESC) {
@@ -4012,10 +4013,17 @@ int sdhci_setup_host(struct sdhci_host *host)
(host->caps & SDHCI_CAN_DO_ADMA2))
host->flags |= SDHCI_USE_ADMA;

+ if ((host->quirks2 & SDHCI_QUIRK2_USE_ADMA3_SUPPORT) &&
+ (host->flags & SDHCI_USE_ADMA) &&
+ (host->caps1 & SDHCI_CAN_DO_ADMA3)) {
+ DBG("Enable ADMA3 mode for data transfer\n");
+ host->flags |= SDHCI_USE_ADMA3;
+ }
+
if ((host->quirks & SDHCI_QUIRK_BROKEN_ADMA) &&
(host->flags & SDHCI_USE_ADMA)) {
DBG("Disabling ADMA as it is marked broken\n");
- host->flags &= ~SDHCI_USE_ADMA;
+ host->flags &= ~(SDHCI_USE_ADMA | SDHCI_USE_ADMA3);
}

if (sdhci_can_64bit_dma(host))
@@ -4048,7 +4056,7 @@ int sdhci_setup_host(struct sdhci_host *host)
if (ret) {
pr_warn("%s: No suitable DMA available - falling back to PIO\n",
mmc_hostname(mmc));
- host->flags &= ~(SDHCI_USE_SDMA | SDHCI_USE_ADMA);
+ host->flags &= ~(SDHCI_USE_SDMA | SDHCI_USE_ADMA | SDHCI_USE_ADMA3);

ret = 0;
}
@@ -4070,31 +4078,68 @@ int sdhci_setup_host(struct sdhci_host *host)
host->desc_sz = host->alloc_desc_sz;
host->adma_table_sz = host->adma_table_cnt * host->desc_sz;

+ host->adma3_table_cnt = 1;
+
+ if (host->flags & SDHCI_USE_ADMA3) {
+ /* We can package maximum 16 requests once */
+ host->adma3_table_cnt = SDHCI_MAX_ADMA3_ENTRIES;
+
+ if (host->flags & SDHCI_USE_64_BIT_DMA)
+ host->integr_desc_sz = SDHCI_INTEGR_64_DESC_SZ;
+ else
+ host->integr_desc_sz = SDHCI_INTEGR_32_DESC_SZ;
+
+ host->cmd_desc_sz = SDHCI_ADMA3_CMD_DESC_SZ;
+ host->cmd_table_sz = host->adma3_table_cnt *
+ SDHCI_ADMA3_CMD_DESC_SZ * SDHCI_ADMA3_CMD_DESC_ENTRIES;
+
+ buf = dma_alloc_coherent(mmc_dev(mmc),
+ host->adma3_table_cnt *
+ host->integr_desc_sz,
+ &dma, GFP_KERNEL);
+ if (!buf) {
+ pr_warn("%s: Unable to allocate ADMA3 integrated buffers - falling back to ADMA\n",
+ mmc_hostname(mmc));
+ host->flags &= ~SDHCI_USE_ADMA3;
+ host->adma3_table_cnt = 1;
+ } else {
+ host->integr_table = buf;
+ host->integr_addr = dma;
+ }
+ }
+
host->align_buffer_sz = SDHCI_MAX_SEGS * SDHCI_ADMA2_ALIGN;
/*
* Use zalloc to zero the reserved high 32-bits of 128-bit
* descriptors so that they never need to be written.
*/
buf = dma_alloc_coherent(mmc_dev(mmc),
- host->align_buffer_sz + host->adma_table_sz,
+ host->align_buffer_sz *
+ host->adma3_table_cnt +
+ host->cmd_table_sz +
+ host->adma_table_sz *
+ host->adma3_table_cnt,
&dma, GFP_KERNEL);
if (!buf) {
pr_warn("%s: Unable to allocate ADMA buffers - falling back to standard DMA\n",
mmc_hostname(mmc));
- host->flags &= ~SDHCI_USE_ADMA;
- } else if ((dma + host->align_buffer_sz) &
+ host->flags &= ~(SDHCI_USE_ADMA | SDHCI_USE_ADMA3);
+ } else if ((dma + host->align_buffer_sz * host->adma3_table_cnt) &
(SDHCI_ADMA2_DESC_ALIGN - 1)) {
pr_warn("%s: unable to allocate aligned ADMA descriptor\n",
mmc_hostname(mmc));
- host->flags &= ~SDHCI_USE_ADMA;
- dma_free_coherent(mmc_dev(mmc), host->align_buffer_sz +
- host->adma_table_sz, buf, dma);
+ host->flags &= ~(SDHCI_USE_ADMA | SDHCI_USE_ADMA3);
+ dma_free_coherent(mmc_dev(mmc), host->align_buffer_sz *
+ host->adma3_table_cnt +
+ host->cmd_table_sz +
+ host->adma_table_sz *
+ host->adma3_table_cnt, buf, dma);
} else {
host->align_buffer = buf;
host->align_addr = dma;

- host->adma_table = buf + host->align_buffer_sz;
- host->adma_addr = dma + host->align_buffer_sz;
+ host->adma_table = buf + host->align_buffer_sz * host->adma3_table_cnt;
+ host->adma_addr = dma + host->align_buffer_sz * host->adma3_table_cnt;
}
}

@@ -4495,12 +4540,21 @@ int sdhci_setup_host(struct sdhci_host *host)
regulator_disable(mmc->supply.vqmmc);
undma:
if (host->align_buffer)
- dma_free_coherent(mmc_dev(mmc), host->align_buffer_sz +
- host->adma_table_sz, host->align_buffer,
+ dma_free_coherent(mmc_dev(mmc),
+ host->align_buffer_sz * host->adma3_table_cnt +
+ host->cmd_table_sz +
+ host->adma_table_sz * host->adma3_table_cnt,
+ host->align_buffer,
host->align_addr);
host->adma_table = NULL;
host->align_buffer = NULL;

+ if (host->integr_table)
+ dma_free_coherent(mmc_dev(mmc),
+ host->adma3_table_cnt * host->integr_desc_sz,
+ host->integr_table, host->integr_addr);
+ host->integr_table = NULL;
+
return ret;
}
EXPORT_SYMBOL_GPL(sdhci_setup_host);
@@ -4513,8 +4567,11 @@ void sdhci_cleanup_host(struct sdhci_host *host)
regulator_disable(mmc->supply.vqmmc);

if (host->align_buffer)
- dma_free_coherent(mmc_dev(mmc), host->align_buffer_sz +
- host->adma_table_sz, host->align_buffer,
+ dma_free_coherent(mmc_dev(mmc),
+ host->align_buffer_sz * host->adma3_table_cnt +
+ host->cmd_table_sz +
+ host->adma_table_sz * host->adma3_table_cnt,
+ host->align_buffer,
host->align_addr);

if (host->use_external_dma)
@@ -4522,6 +4579,12 @@ void sdhci_cleanup_host(struct sdhci_host *host)

host->adma_table = NULL;
host->align_buffer = NULL;
+
+ if (host->integr_table)
+ dma_free_coherent(mmc_dev(mmc),
+ host->adma3_table_cnt * host->integr_desc_sz,
+ host->integr_table, host->integr_addr);
+ host->integr_table = NULL;
}
EXPORT_SYMBOL_GPL(sdhci_cleanup_host);

@@ -4650,8 +4713,11 @@ void sdhci_remove_host(struct sdhci_host *host, int dead)
regulator_disable(mmc->supply.vqmmc);

if (host->align_buffer)
- dma_free_coherent(mmc_dev(mmc), host->align_buffer_sz +
- host->adma_table_sz, host->align_buffer,
+ dma_free_coherent(mmc_dev(mmc),
+ host->align_buffer_sz * host->adma3_table_cnt +
+ host->cmd_table_sz +
+ host->adma_table_sz * host->adma3_table_cnt,
+ host->align_buffer,
host->align_addr);

if (host->use_external_dma)
@@ -4659,6 +4725,12 @@ void sdhci_remove_host(struct sdhci_host *host, int dead)

host->adma_table = NULL;
host->align_buffer = NULL;
+
+ if (host->integr_table)
+ dma_free_coherent(mmc_dev(mmc),
+ host->adma3_table_cnt * host->integr_desc_sz,
+ host->integr_table, host->integr_addr);
+ host->integr_table = NULL;
}

EXPORT_SYMBOL_GPL(sdhci_remove_host);
diff --git a/drivers/mmc/host/sdhci.h b/drivers/mmc/host/sdhci.h
index 5507a73..96aed99 100644
--- a/drivers/mmc/host/sdhci.h
+++ b/drivers/mmc/host/sdhci.h
@@ -274,6 +274,9 @@
#define SDHCI_PRESET_SDCLK_FREQ_MASK 0x3FF
#define SDHCI_PRESET_SDCLK_FREQ_SHIFT 0

+#define SDHCI_ADMA3_ADDRESS 0x78
+#define SDHCI_ADMA3_ADDRESS_HI 0x7c
+
#define SDHCI_SLOT_INT_STATUS 0xFC

#define SDHCI_HOST_VERSION 0xFE
@@ -346,6 +349,41 @@ struct sdhci_adma2_64_desc {
#define ADMA2_NOP_END_VALID 0x3
#define ADMA2_END 0x2

+#define SDHCI_MAX_ADMA3_ENTRIES 16
+
+/* ADMA3 command descriptor */
+struct sdhci_adma3_cmd_desc {
+ __le32 cmd;
+ __le32 reg;
+} __packed __aligned(4);
+
+#define ADMA3_TRAN_VALID 0x9
+#define ADMA3_TRAN_END 0xb
+
+/* ADMA3 command descriptor size */
+#define SDHCI_ADMA3_CMD_DESC_ENTRIES 4
+#define SDHCI_ADMA3_CMD_DESC_SZ 8
+
+/* ADMA3 integrated 32-bit descriptor */
+struct sdhci_integr_32_desc {
+ __le32 cmd;
+ __le32 addr;
+} __packed __aligned(4);
+
+#define SDHCI_INTEGR_32_DESC_SZ 8
+
+/* ADMA3 integrated 64-bit descriptor. */
+struct sdhci_integr_64_desc {
+ __le32 cmd;
+ __le32 addr_lo;
+ __le32 addr_hi;
+} __packed __aligned(4);
+
+#define SDHCI_INTEGR_64_DESC_SZ 16
+
+#define ADMA3_INTEGR_TRAN_VALID 0x39
+#define ADMA3_INTEGR_TRAN_END 0x3b
+
/*
* Maximum segments assuming a 512KiB maximum requisition size and a minimum
* 4KiB page size.
@@ -484,6 +522,8 @@ struct sdhci_host {
* block count.
*/
#define SDHCI_QUIRK2_USE_32BIT_BLK_CNT (1<<18)
+/* use ADMA3 for data read/write if hardware supports */
+#define SDHCI_QUIRK2_USE_ADMA3_SUPPORT (1<<19)

int irq; /* Device IRQ */
void __iomem *ioaddr; /* Mapped address */
@@ -520,6 +560,7 @@ struct sdhci_host {
#define SDHCI_SIGNALING_330 (1<<14) /* Host is capable of 3.3V signaling */
#define SDHCI_SIGNALING_180 (1<<15) /* Host is capable of 1.8V signaling */
#define SDHCI_SIGNALING_120 (1<<16) /* Host is capable of 1.2V signaling */
+#define SDHCI_USE_ADMA3 (1<<17) /* Host is ADMA3 capable */

unsigned int version; /* SDHCI spec. version */

@@ -552,15 +593,20 @@ struct sdhci_host {

void *adma_table; /* ADMA descriptor table */
void *align_buffer; /* Bounce buffer */
+ void *integr_table; /* ADMA3 intergrate descriptor table */

size_t adma_table_sz; /* ADMA descriptor table size */
size_t align_buffer_sz; /* Bounce buffer size */
+ size_t cmd_table_sz; /* ADMA3 command descriptor table size */

dma_addr_t adma_addr; /* Mapped ADMA descr. table */
dma_addr_t align_addr; /* Mapped bounce buffer */
+ dma_addr_t integr_addr; /* Mapped ADMA3 intergrate descr. table */

unsigned int desc_sz; /* ADMA current descriptor size */
unsigned int alloc_desc_sz; /* ADMA descr. max size host supports */
+ unsigned int cmd_desc_sz; /* ADMA3 command descriptor size */
+ unsigned int integr_desc_sz; /* ADMA3 intergrate descriptor size */

struct workqueue_struct *complete_wq; /* Request completion wq */
struct work_struct complete_work; /* Request completion work */
@@ -611,6 +657,8 @@ struct sdhci_host {

/* Host ADMA table count */
u32 adma_table_cnt;
+ /* Host ADMA3 table count */
+ u32 adma3_table_cnt;

u64 data_timeout;

--
1.9.1

2020-03-16 10:03:14

by Baolin Wang

Subject: [RESEND RFC PATCH 8/8] mmc: host: sdhci-sprd: Add MMC packed request support

Enable the ADMA3 transfer mode and add packed operations to support MMC
packed requests and improve I/O performance.

Signed-off-by: Baolin Wang <[email protected]>
---
drivers/mmc/host/sdhci-sprd.c | 30 ++++++++++++++++++++++++++++--
1 file changed, 28 insertions(+), 2 deletions(-)

diff --git a/drivers/mmc/host/sdhci-sprd.c b/drivers/mmc/host/sdhci-sprd.c
index 49afe1c..daa38ed 100644
--- a/drivers/mmc/host/sdhci-sprd.c
+++ b/drivers/mmc/host/sdhci-sprd.c
@@ -390,6 +390,12 @@ static void sdhci_sprd_request_done(struct sdhci_host *host,
mmc_request_done(host->mmc, mrq);
}

+static void sdhci_sprd_packed_request_done(struct sdhci_host *host,
+ struct mmc_packed_request *prq)
+{
+ mmc_hsq_finalize_packed_request(host->mmc, prq);
+}
+
static struct sdhci_ops sdhci_sprd_ops = {
.read_l = sdhci_sprd_readl,
.write_l = sdhci_sprd_writel,
@@ -404,6 +410,7 @@ static void sdhci_sprd_request_done(struct sdhci_host *host,
.get_max_timeout_count = sdhci_sprd_get_max_timeout_count,
.get_ro = sdhci_sprd_get_ro,
.request_done = sdhci_sprd_request_done,
+ .packed_request_done = sdhci_sprd_packed_request_done,
};

static void sdhci_sprd_request(struct mmc_host *mmc, struct mmc_request *mrq)
@@ -546,10 +553,18 @@ static void sdhci_sprd_phy_param_parse(struct sdhci_sprd_host *sprd_host,
SDHCI_QUIRK_MISSING_CAPS,
.quirks2 = SDHCI_QUIRK2_BROKEN_HS200 |
SDHCI_QUIRK2_USE_32BIT_BLK_CNT |
- SDHCI_QUIRK2_PRESET_VALUE_BROKEN,
+ SDHCI_QUIRK2_PRESET_VALUE_BROKEN |
+ SDHCI_QUIRK2_USE_ADMA3_SUPPORT,
.ops = &sdhci_sprd_ops,
};

+static const struct hsq_packed_ops packed_ops = {
+ .packed_algo = mmc_hsq_packed_algo_rw,
+ .prepare_hardware = sdhci_prepare_packed,
+ .unprepare_hardware = sdhci_unprepare_packed,
+ .packed_request = sdhci_packed_request,
+};
+
static int sdhci_sprd_probe(struct platform_device *pdev)
{
struct sdhci_host *host;
@@ -676,7 +691,18 @@ static int sdhci_sprd_probe(struct platform_device *pdev)
goto err_cleanup_host;
}

- ret = mmc_hsq_init(hsq, host->mmc, NULL, 0);
+ /*
+ * If the host controller can support ADMA3 mode, we can enable the
+ * packed request mode to improve the read/write performance.
+ *
+ * Considering the maximum ADMA3 entries (default is 16) and the request
+ * latency, we set the default maximum packed requests number is 8.
+ */
+ if (host->flags & SDHCI_USE_ADMA3)
+ ret = mmc_hsq_init(hsq, host->mmc, &packed_ops,
+ SDHCI_MAX_ADMA3_ENTRIES / 2);
+ else
+ ret = mmc_hsq_init(hsq, host->mmc, NULL, 0);
if (ret)
goto err_cleanup_host;

--
1.9.1

2020-03-16 10:03:18

by Baolin Wang

Subject: [RESEND RFC PATCH 3/8] mmc: Add MMC packed request support for MMC software queue

Some SD controllers can support packed commands or packed requests, which
means they can package several requests to be handled by the host
controller at one time. This reduces interrupts and improves the DMA
transfer; as a result, I/O performance can be improved.

Thus this patch adds an MMC packed function to support packed requests or
packed commands based on the MMC software queue.

The basic concept of this function is that we try to collect as many
requests from the block layer as possible to be linked into the MMC
packed queue by mmc_blk_hsq_issue_rw_rq(). When the last request of the
hardware queue comes, or the number of collected requests is larger than
16, or a large request comes, then we can start to package a packed
request for the host controller. The MMC packed function also supplies
packed algorithm operations to help to package qualified requests. After
the packed request finishes, the MMC packed function will help to
complete each request; at the same time, the MMC packed queue will allow
collecting more requests from the block layer. After completing each
request, the MMC packed function can try to package another packed
request for the host controller directly in the completion path, if
there are enough requests in the MMC packed queue or the request pending
flag is not set. If the pending flag was set, we should let
mmc_blk_hsq_issue_rw_rq() collect as many requests as possible.
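
The collect-or-flush decision boils down to the policy below (a simplified
paraphrase of mmc_hsq_request() in this patch, not a separate helper;
HSQ_PACKED_FLUSH_BLOCKS is 256):

    /* Sketch only: should the packed queue be pumped to the controller? */
    static bool hsq_should_flush(struct hsq_packed *packed,
                                 struct mmc_request *mrq, int queued)
    {
            if (!packed)                    /* non-packed mode: pump at once */
                    return true;
            if (mrq->data->blocks > HSQ_PACKED_FLUSH_BLOCKS)
                    return true;            /* a large request arrived */
            if (queued >= packed->max_entries)
                    return true;            /* packed queue is full */
            return false;                   /* keep collecting requests */
    }

The remaining case, the last request of a blk-mq dispatch batch, is
flushed through the new .commit_rqs()/cqe_commit_rqs() path added below.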

Signed-off-by: Baolin Wang <[email protected]>
---
drivers/mmc/core/block.c | 35 ++++--
drivers/mmc/core/block.h | 3 +-
drivers/mmc/core/core.c | 26 ++++
drivers/mmc/core/core.h | 2 +
drivers/mmc/core/queue.c | 22 +++-
drivers/mmc/host/mmc_hsq.c | 271 ++++++++++++++++++++++++++++++++++++------
drivers/mmc/host/mmc_hsq.h | 25 +++-
drivers/mmc/host/sdhci-sprd.c | 2 +-
include/linux/mmc/core.h | 6 +
include/linux/mmc/host.h | 9 ++
10 files changed, 351 insertions(+), 50 deletions(-)

diff --git a/drivers/mmc/core/block.c b/drivers/mmc/core/block.c
index 55d52fc..19c589e 100644
--- a/drivers/mmc/core/block.c
+++ b/drivers/mmc/core/block.c
@@ -1537,7 +1537,8 @@ static int mmc_blk_cqe_issue_flush(struct mmc_queue *mq, struct request *req)
return mmc_blk_cqe_start_req(mq->card->host, mrq);
}

-static int mmc_blk_hsq_issue_rw_rq(struct mmc_queue *mq, struct request *req)
+static int mmc_blk_hsq_issue_rw_rq(struct mmc_queue *mq, struct request *req,
+ bool last)
{
struct mmc_queue_req *mqrq = req_to_mmc_queue_req(req);
struct mmc_host *host = mq->card->host;
@@ -1548,19 +1549,25 @@ static int mmc_blk_hsq_issue_rw_rq(struct mmc_queue *mq, struct request *req)
mmc_pre_req(host, &mqrq->brq.mrq);

err = mmc_cqe_start_req(host, &mqrq->brq.mrq);
- if (err)
+ if (err) {
mmc_post_req(host, &mqrq->brq.mrq, err);
+ return err;
+ }

- return err;
+ if (last)
+ mmc_cqe_commit_rqs(host);
+
+ return 0;
}

-static int mmc_blk_cqe_issue_rw_rq(struct mmc_queue *mq, struct request *req)
+static int mmc_blk_cqe_issue_rw_rq(struct mmc_queue *mq, struct request *req,
+ bool last)
{
struct mmc_queue_req *mqrq = req_to_mmc_queue_req(req);
struct mmc_host *host = mq->card->host;

if (host->hsq_enabled)
- return mmc_blk_hsq_issue_rw_rq(mq, req);
+ return mmc_blk_hsq_issue_rw_rq(mq, req, last);

mmc_blk_data_prep(mq, mqrq, 0, NULL, NULL);

@@ -1959,6 +1966,18 @@ static void mmc_blk_hsq_req_done(struct mmc_request *mrq)
if (mmc_blk_rq_error(&mqrq->brq) ||
mmc_blk_urgent_bkops_needed(mq, mqrq)) {
spin_lock_irqsave(&mq->lock, flags);
+ /*
+ * The HSQ may complete more than one requests at one time
+ * for the packed request mode. So if there is one recovery
+ * request is pending, the following error requests just
+ * should be completed directly, since we should not do
+ * recovery continuously.
+ */
+ if (mq->recovery_needed) {
+ spin_unlock_irqrestore(&mq->lock, flags);
+ goto out;
+ }
+
mq->recovery_needed = true;
mq->recovery_req = req;
spin_unlock_irqrestore(&mq->lock, flags);
@@ -1971,6 +1990,7 @@ static void mmc_blk_hsq_req_done(struct mmc_request *mrq)

mmc_blk_rw_reset_success(mq, req);

+out:
/*
* Block layer timeouts race with completions which means the normal
* completion path cannot be used during recovery.
@@ -2236,7 +2256,8 @@ static int mmc_blk_wait_for_idle(struct mmc_queue *mq, struct mmc_host *host)
return mmc_blk_rw_wait(mq, NULL);
}

-enum mmc_issued mmc_blk_mq_issue_rq(struct mmc_queue *mq, struct request *req)
+enum mmc_issued mmc_blk_mq_issue_rq(struct mmc_queue *mq, struct request *req,
+ bool last)
{
struct mmc_blk_data *md = mq->blkdata;
struct mmc_card *card = md->queue.card;
@@ -2280,7 +2301,7 @@ enum mmc_issued mmc_blk_mq_issue_rq(struct mmc_queue *mq, struct request *req)
case REQ_OP_READ:
case REQ_OP_WRITE:
if (mq->use_cqe)
- ret = mmc_blk_cqe_issue_rw_rq(mq, req);
+ ret = mmc_blk_cqe_issue_rw_rq(mq, req, last);
else
ret = mmc_blk_mq_issue_rw_rq(mq, req);
break;
diff --git a/drivers/mmc/core/block.h b/drivers/mmc/core/block.h
index 31153f6..8bfb89f 100644
--- a/drivers/mmc/core/block.h
+++ b/drivers/mmc/core/block.h
@@ -9,7 +9,8 @@

enum mmc_issued;

-enum mmc_issued mmc_blk_mq_issue_rq(struct mmc_queue *mq, struct request *req);
+enum mmc_issued mmc_blk_mq_issue_rq(struct mmc_queue *mq, struct request *req,
+ bool last);
void mmc_blk_mq_complete(struct request *req);
void mmc_blk_mq_recovery(struct mmc_queue *mq);

diff --git a/drivers/mmc/core/core.c b/drivers/mmc/core/core.c
index aa54d35..f96c0b1 100644
--- a/drivers/mmc/core/core.c
+++ b/drivers/mmc/core/core.c
@@ -329,6 +329,7 @@ static int mmc_mrq_prep(struct mmc_host *host, struct mmc_request *mrq)
}
}

+ INIT_LIST_HEAD(&mrq->list);
return 0;
}

@@ -536,6 +537,31 @@ void mmc_cqe_post_req(struct mmc_host *host, struct mmc_request *mrq)
}
EXPORT_SYMBOL(mmc_cqe_post_req);

+/**
+ * mmc_cqe_commit_rqs - Commit requests pending in CQE
+ * @host: MMC host
+ * @last: Indicate if the last request from block layer
+ */
+void mmc_cqe_commit_rqs(struct mmc_host *host)
+{
+ if (host->cqe_ops->cqe_commit_rqs)
+ host->cqe_ops->cqe_commit_rqs(host);
+}
+EXPORT_SYMBOL(mmc_cqe_commit_rqs);
+
+/**
+ * mmc_cqe_is_busy - If CQE is busy or not
+ * @host: MMC host
+ */
+bool mmc_cqe_is_busy(struct mmc_host *host)
+{
+ if (host->cqe_ops->cqe_is_busy)
+ return host->cqe_ops->cqe_is_busy(host);
+
+ return false;
+}
+EXPORT_SYMBOL(mmc_cqe_is_busy);
+
/* Arbitrary 1 second timeout */
#define MMC_CQE_RECOVERY_TIMEOUT 1000

diff --git a/drivers/mmc/core/core.h b/drivers/mmc/core/core.h
index 575ac02..db81ba2 100644
--- a/drivers/mmc/core/core.h
+++ b/drivers/mmc/core/core.h
@@ -139,6 +139,8 @@ static inline void mmc_claim_host(struct mmc_host *host)
int mmc_cqe_start_req(struct mmc_host *host, struct mmc_request *mrq);
void mmc_cqe_post_req(struct mmc_host *host, struct mmc_request *mrq);
int mmc_cqe_recovery(struct mmc_host *host);
+void mmc_cqe_commit_rqs(struct mmc_host *host);
+bool mmc_cqe_is_busy(struct mmc_host *host);

/**
* mmc_pre_req - Prepare for a new request
diff --git a/drivers/mmc/core/queue.c b/drivers/mmc/core/queue.c
index 25bee3d..774ea82 100644
--- a/drivers/mmc/core/queue.c
+++ b/drivers/mmc/core/queue.c
@@ -285,11 +285,7 @@ static blk_status_t mmc_mq_queue_rq(struct blk_mq_hw_ctx *hctx,
}
break;
case MMC_ISSUE_ASYNC:
- /*
- * For MMC host software queue, we only allow 2 requests in
- * flight to avoid a long latency.
- */
- if (host->hsq_enabled && mq->in_flight[issue_type] > 2) {
+ if (mq->use_cqe && mmc_cqe_is_busy(host)) {
spin_unlock_irq(&mq->lock);
return BLK_STS_RESOURCE;
}
@@ -330,7 +326,7 @@ static blk_status_t mmc_mq_queue_rq(struct blk_mq_hw_ctx *hctx,

blk_mq_start_request(req);

- issued = mmc_blk_mq_issue_rq(mq, req);
+ issued = mmc_blk_mq_issue_rq(mq, req, bd->last);

switch (issued) {
case MMC_REQ_BUSY:
@@ -362,8 +358,19 @@ static blk_status_t mmc_mq_queue_rq(struct blk_mq_hw_ctx *hctx,
return ret;
}

+static void mmc_mq_commit_rqs(struct blk_mq_hw_ctx *hctx)
+{
+ struct mmc_queue *mq = hctx->queue->queuedata;
+ struct mmc_card *card = mq->card;
+ struct mmc_host *host = card->host;
+
+ if (mq->use_cqe)
+ mmc_cqe_commit_rqs(host);
+}
+
static const struct blk_mq_ops mmc_mq_ops = {
.queue_rq = mmc_mq_queue_rq,
+ .commit_rqs = mmc_mq_commit_rqs,
.init_request = mmc_mq_init_request,
.exit_request = mmc_mq_exit_request,
.complete = mmc_blk_mq_complete,
@@ -390,6 +397,9 @@ static void mmc_setup_queue(struct mmc_queue *mq, struct mmc_card *card)
"merging was advertised but not possible");
blk_queue_max_segments(mq->queue, mmc_get_max_segments(host));

+ if (host->max_packed_reqs != 0)
+ blk_queue_max_batch_requests(mq->queue, host->max_packed_reqs);
+
if (mmc_card_mmc(card))
block_size = card->ext_csd.data_sector_size;

diff --git a/drivers/mmc/host/mmc_hsq.c b/drivers/mmc/host/mmc_hsq.c
index aafc0d1..c0ba24e 100644
--- a/drivers/mmc/host/mmc_hsq.c
+++ b/drivers/mmc/host/mmc_hsq.c
@@ -9,6 +9,7 @@

#include <linux/mmc/card.h>
#include <linux/mmc/host.h>
+#include <linux/mmc/mmc.h>
#include <linux/module.h>

#include "mmc_hsq.h"
@@ -16,16 +17,46 @@
#define HSQ_NUM_SLOTS 64
#define HSQ_INVALID_TAG HSQ_NUM_SLOTS

+#define HSQ_PACKED_FLUSH_BLOCKS 256
+
+/**
+ * mmc_hsq_packed_algo_rw - the algorithm to package read or write requests
+ * @mmc: the host controller
+ *
+ * TODO: we can add more condition to decide if we can package this
+ * request or not.
+ */
+void mmc_hsq_packed_algo_rw(struct mmc_host *mmc)
+{
+ struct mmc_hsq *hsq = mmc->cqe_private;
+ struct hsq_packed *packed = hsq->packed;
+ struct mmc_packed_request *prq = &packed->prq;
+ struct mmc_request *mrq, *t;
+ u32 i = 0;
+
+ list_for_each_entry_safe(mrq, t, &packed->list, list) {
+ if (++i > packed->max_entries)
+ break;
+
+ list_move_tail(&mrq->list, &prq->list);
+ prq->nr_reqs++;
+ }
+}
+EXPORT_SYMBOL_GPL(mmc_hsq_packed_algo_rw);
+
static void mmc_hsq_pump_requests(struct mmc_hsq *hsq)
{
+ struct hsq_packed *packed = hsq->packed;
struct mmc_host *mmc = hsq->mmc;
struct hsq_slot *slot;
+ struct mmc_request *mrq;
unsigned long flags;
+ int ret;

spin_lock_irqsave(&hsq->lock, flags);

/* Make sure we are not already running a request now */
- if (hsq->mrq) {
+ if (hsq->mrq || (packed && packed->prq.nr_reqs)) {
spin_unlock_irqrestore(&hsq->lock, flags);
return;
}
@@ -36,16 +67,58 @@ static void mmc_hsq_pump_requests(struct mmc_hsq *hsq)
return;
}

- slot = &hsq->slot[hsq->next_tag];
- hsq->mrq = slot->mrq;
- hsq->qcnt--;
+ if (packed) {
+ /* Try to package requests */
+ packed->ops->packed_algo(mmc);
+
+ packed->busy = true;
+ hsq->qcnt -= packed->prq.nr_reqs;
+ } else {
+ slot = &hsq->slot[hsq->next_tag];
+ hsq->mrq = slot->mrq;
+ hsq->qcnt--;
+ }

spin_unlock_irqrestore(&hsq->lock, flags);

- if (mmc->ops->request_atomic)
- mmc->ops->request_atomic(mmc, hsq->mrq);
- else
- mmc->ops->request(mmc, hsq->mrq);
+ if (!packed) {
+ if (mmc->ops->request_atomic)
+ mmc->ops->request_atomic(mmc, hsq->mrq);
+ else
+ mmc->ops->request(mmc, hsq->mrq);
+
+ return;
+ }
+
+ if (packed->ops->prepare_hardware) {
+ ret = packed->ops->prepare_hardware(mmc);
+ if (ret) {
+ pr_err("failed to prepare hardware\n");
+ goto error;
+ }
+ }
+
+ ret = packed->ops->packed_request(mmc, &packed->prq);
+ if (ret) {
+ pr_err("failed to packed requests\n");
+ goto error;
+ }
+
+ return;
+
+error:
+ spin_lock_irqsave(&hsq->lock, flags);
+
+ list_for_each_entry(mrq, &packed->prq.list, list) {
+ struct mmc_data *data = mrq->data;
+
+ data->error = ret;
+ data->bytes_xfered = 0;
+ }
+
+ spin_unlock_irqrestore(&hsq->lock, flags);
+
+ mmc_hsq_finalize_packed_request(mmc, &packed->prq);
}

static void mmc_hsq_update_next_tag(struct mmc_hsq *hsq, int remains)
@@ -87,16 +160,21 @@ static void mmc_hsq_update_next_tag(struct mmc_hsq *hsq, int remains)

static void mmc_hsq_post_request(struct mmc_hsq *hsq)
{
+ struct hsq_packed *packed = hsq->packed;
unsigned long flags;
int remains;

spin_lock_irqsave(&hsq->lock, flags);

remains = hsq->qcnt;
- hsq->mrq = NULL;
+ if (packed) {
+ packed->prq.nr_reqs = 0;
+ } else {
+ hsq->mrq = NULL;

- /* Update the next available tag to be queued. */
- mmc_hsq_update_next_tag(hsq, remains);
+ /* Update the next available tag to be queued. */
+ mmc_hsq_update_next_tag(hsq, remains);
+ }

if (hsq->waiting_for_idle && !remains) {
hsq->waiting_for_idle = false;
@@ -111,10 +189,6 @@ static void mmc_hsq_post_request(struct mmc_hsq *hsq)

spin_unlock_irqrestore(&hsq->lock, flags);

- /*
- * Try to pump new request to host controller as fast as possible,
- * after completing previous request.
- */
if (remains > 0)
mmc_hsq_pump_requests(hsq);
}
@@ -154,6 +228,91 @@ bool mmc_hsq_finalize_request(struct mmc_host *mmc, struct mmc_request *mrq)
}
EXPORT_SYMBOL_GPL(mmc_hsq_finalize_request);

+/**
+ * mmc_hsq_finalize_packed_request - finalize one packed request
+ * @mmc: the host controller
+ * @prq: the packed request need to be finalized
+ */
+void mmc_hsq_finalize_packed_request(struct mmc_host *mmc,
+ struct mmc_packed_request *prq)
+{
+ struct mmc_hsq *hsq = mmc->cqe_private;
+ struct hsq_packed *packed = hsq->packed;
+ struct mmc_request *mrq, *t;
+ LIST_HEAD(head);
+ unsigned long flags;
+
+ if (!packed || !prq)
+ return;
+
+ if (packed->ops->unprepare_hardware &&
+ packed->ops->unprepare_hardware(mmc))
+ pr_err("failed to unprepare hardware\n");
+
+ /*
+ * Clear busy flag to allow collecting more requests into command
+ * queue, but now we can not pump them to controller, we should wait
+ * for all requests are completed. During the period of completing
+ * requests, we should collect more requests from block layer as much
+ * as possible.
+ */
+ spin_lock_irqsave(&hsq->lock, flags);
+ list_splice_tail_init(&prq->list, &head);
+ packed->busy = false;
+ spin_unlock_irqrestore(&hsq->lock, flags);
+
+ list_for_each_entry_safe(mrq, t, &head, list) {
+ list_del(&mrq->list);
+
+ mmc_cqe_request_done(mmc, mrq);
+ }
+
+ mmc_hsq_post_request(hsq);
+}
+EXPORT_SYMBOL_GPL(mmc_hsq_finalize_packed_request);
+
+static void mmc_hsq_commit_rqs(struct mmc_host *mmc)
+{
+ struct mmc_hsq *hsq = mmc->cqe_private;
+ struct hsq_packed *packed = hsq->packed;
+
+ if (!packed)
+ return;
+
+ mmc_hsq_pump_requests(hsq);
+}
+
+static bool mmc_hsq_is_busy(struct mmc_host *mmc)
+{
+ struct mmc_hsq *hsq = mmc->cqe_private;
+ struct hsq_packed *packed = hsq->packed;
+ unsigned long flags;
+ bool busy;
+
+ spin_lock_irqsave(&hsq->lock, flags);
+
+ /*
+ * For packed mode, when hardware is busy, we can only allow maximum
+ * packed number requests to be ready in software queue to be queued
+ * after previous packed request is completed, which avoiding long
+ * latency.
+ *
+ * For non-packed mode, we can only allow 2 requests in flight to avoid
+ * long latency.
+ *
+ * Otherwise return BLK_STS_RESOURCE to tell block layer to dispatch
+ * requests later.
+ */
+ if (packed)
+ busy = packed->busy && hsq->qcnt >= packed->max_entries;
+ else
+ busy = hsq->qcnt > 1;
+
+ spin_unlock_irqrestore(&hsq->lock, flags);
+
+ return busy;
+}
+
static void mmc_hsq_recovery_start(struct mmc_host *mmc)
{
struct mmc_hsq *hsq = mmc->cqe_private;
@@ -189,7 +348,8 @@ static void mmc_hsq_recovery_finish(struct mmc_host *mmc)
static int mmc_hsq_request(struct mmc_host *mmc, struct mmc_request *mrq)
{
struct mmc_hsq *hsq = mmc->cqe_private;
- int tag = mrq->tag;
+ struct hsq_packed *packed = hsq->packed;
+ int nr_rqs = 0, tag = mrq->tag;

spin_lock_irq(&hsq->lock);

@@ -204,20 +364,37 @@ static int mmc_hsq_request(struct mmc_host *mmc, struct mmc_request *mrq)
return -EBUSY;
}

- hsq->slot[tag].mrq = mrq;
+ hsq->qcnt++;

- /*
- * Set the next tag as current request tag if no available
- * next tag.
- */
- if (hsq->next_tag == HSQ_INVALID_TAG)
- hsq->next_tag = tag;
+ if (packed) {
+ list_add_tail(&mrq->list, &packed->list);

- hsq->qcnt++;
+ nr_rqs = hsq->qcnt;
+ } else {
+ hsq->slot[tag].mrq = mrq;
+
+ /*
+ * Set the next tag as current request tag if no available
+ * next tag.
+ */
+ if (hsq->next_tag == HSQ_INVALID_TAG)
+ hsq->next_tag = tag;
+ }

spin_unlock_irq(&hsq->lock);

- mmc_hsq_pump_requests(hsq);
+ /*
+ * For non-packed request mode, we should pump requests as soon as
+ * possible.
+ *
+ * For the packed request mode, if it is a larger request or the
+ * request count is larger than the maximum packed number, we
+ * should pump requests to controller. Otherwise we should try to
+ * combine requests as much as we can.
+ */
+ if (!packed || mrq->data->blocks > HSQ_PACKED_FLUSH_BLOCKS ||
+ nr_rqs >= packed->max_entries)
+ mmc_hsq_pump_requests(hsq);

return 0;
}
@@ -230,12 +407,17 @@ static void mmc_hsq_post_req(struct mmc_host *mmc, struct mmc_request *mrq)

static bool mmc_hsq_queue_is_idle(struct mmc_hsq *hsq, int *ret)
{
+ struct hsq_packed *packed = hsq->packed;
bool is_idle;

spin_lock_irq(&hsq->lock);

- is_idle = (!hsq->mrq && !hsq->qcnt) ||
- hsq->recovery_halt;
+ if (packed)
+ is_idle = (!packed->prq.nr_reqs && !hsq->qcnt) ||
+ hsq->recovery_halt;
+ else
+ is_idle = (!hsq->mrq && !hsq->qcnt) ||
+ hsq->recovery_halt;

*ret = hsq->recovery_halt ? -EBUSY : 0;
hsq->waiting_for_idle = !is_idle;
@@ -312,17 +494,38 @@ static int mmc_hsq_enable(struct mmc_host *mmc, struct mmc_card *card)
.cqe_wait_for_idle = mmc_hsq_wait_for_idle,
.cqe_recovery_start = mmc_hsq_recovery_start,
.cqe_recovery_finish = mmc_hsq_recovery_finish,
+ .cqe_is_busy = mmc_hsq_is_busy,
+ .cqe_commit_rqs = mmc_hsq_commit_rqs,
};

-int mmc_hsq_init(struct mmc_hsq *hsq, struct mmc_host *mmc)
+int mmc_hsq_init(struct mmc_hsq *hsq, struct mmc_host *mmc,
+ const struct hsq_packed_ops *ops, int max_packed)
{
- hsq->num_slots = HSQ_NUM_SLOTS;
- hsq->next_tag = HSQ_INVALID_TAG;
+ if (ops && max_packed > 1) {
+ struct hsq_packed *packed;
+
+ packed = devm_kzalloc(mmc_dev(mmc), sizeof(struct hsq_packed),
+ GFP_KERNEL);
+ if (!packed)
+ return -ENOMEM;
+
+ packed->ops = ops;
+ packed->max_entries = max_packed;
+ INIT_LIST_HEAD(&packed->list);
+ INIT_LIST_HEAD(&packed->prq.list);
+
+ hsq->packed = packed;
+ mmc->max_packed_reqs = max_packed;
+ } else {
+ hsq->num_slots = HSQ_NUM_SLOTS;
+ hsq->next_tag = HSQ_INVALID_TAG;

- hsq->slot = devm_kcalloc(mmc_dev(mmc), hsq->num_slots,
- sizeof(struct hsq_slot), GFP_KERNEL);
- if (!hsq->slot)
- return -ENOMEM;
+ hsq->slot = devm_kcalloc(mmc_dev(mmc), hsq->num_slots,
+ sizeof(struct hsq_slot),
+ GFP_KERNEL);
+ if (!hsq->slot)
+ return -ENOMEM;
+ }

hsq->mmc = mmc;
hsq->mmc->cqe_private = hsq;
diff --git a/drivers/mmc/host/mmc_hsq.h b/drivers/mmc/host/mmc_hsq.h
index 18b9cf5..6fcc0dc 100644
--- a/drivers/mmc/host/mmc_hsq.h
+++ b/drivers/mmc/host/mmc_hsq.h
@@ -2,6 +2,23 @@
#ifndef LINUX_MMC_HSQ_H
#define LINUX_MMC_HSQ_H

+struct hsq_packed_ops {
+ void (*packed_algo)(struct mmc_host *mmc);
+ int (*prepare_hardware)(struct mmc_host *mmc);
+ int (*unprepare_hardware)(struct mmc_host *mmc);
+ int (*packed_request)(struct mmc_host *mmc,
+ struct mmc_packed_request *prq);
+};
+
+struct hsq_packed {
+ bool busy;
+ int max_entries;
+
+ struct list_head list;
+ struct mmc_packed_request prq;
+ const struct hsq_packed_ops *ops;
+};
+
struct hsq_slot {
struct mmc_request *mrq;
};
@@ -20,11 +37,17 @@ struct mmc_hsq {
bool enabled;
bool waiting_for_idle;
bool recovery_halt;
+
+ struct hsq_packed *packed;
};

-int mmc_hsq_init(struct mmc_hsq *hsq, struct mmc_host *mmc);
+int mmc_hsq_init(struct mmc_hsq *hsq, struct mmc_host *mmc,
+ const struct hsq_packed_ops *ops, int max_packed);
void mmc_hsq_suspend(struct mmc_host *mmc);
int mmc_hsq_resume(struct mmc_host *mmc);
bool mmc_hsq_finalize_request(struct mmc_host *mmc, struct mmc_request *mrq);
+void mmc_hsq_finalize_packed_request(struct mmc_host *mmc,
+ struct mmc_packed_request *prq);
+void mmc_hsq_packed_algo_rw(struct mmc_host *mmc);

#endif
diff --git a/drivers/mmc/host/sdhci-sprd.c b/drivers/mmc/host/sdhci-sprd.c
index ae9acb8..49afe1c 100644
--- a/drivers/mmc/host/sdhci-sprd.c
+++ b/drivers/mmc/host/sdhci-sprd.c
@@ -676,7 +676,7 @@ static int sdhci_sprd_probe(struct platform_device *pdev)
goto err_cleanup_host;
}

- ret = mmc_hsq_init(hsq, host->mmc);
+ ret = mmc_hsq_init(hsq, host->mmc, NULL, 0);
if (ret)
goto err_cleanup_host;

diff --git a/include/linux/mmc/core.h b/include/linux/mmc/core.h
index b7ba881..b1be983 100644
--- a/include/linux/mmc/core.h
+++ b/include/linux/mmc/core.h
@@ -165,6 +165,12 @@ struct mmc_request {
bool cap_cmd_during_tfr;

int tag;
+ struct list_head list;
+};
+
+struct mmc_packed_request {
+ struct list_head list;
+ u32 nr_reqs;
};

struct mmc_card;
diff --git a/include/linux/mmc/host.h b/include/linux/mmc/host.h
index db5e59c..04b45d0 100644
--- a/include/linux/mmc/host.h
+++ b/include/linux/mmc/host.h
@@ -216,6 +216,14 @@ struct mmc_cqe_ops {
* will have zero data bytes transferred.
*/
void (*cqe_recovery_finish)(struct mmc_host *host);
+
+ /* If CQE is busy or not. */
+ bool (*cqe_is_busy)(struct mmc_host *host);
+ /*
+ * Serve the purpose of kicking the hardware to handle pending
+ * requests.
+ */
+ void (*cqe_commit_rqs)(struct mmc_host *host);
};

struct mmc_async_req {
@@ -385,6 +393,7 @@ struct mmc_host {
unsigned int max_blk_size; /* maximum size of one mmc block */
unsigned int max_blk_count; /* maximum number of blocks in one req */
unsigned int max_busy_timeout; /* max busy timeout in ms */
+ unsigned int max_packed_reqs; /* max number of requests can be packed */

/* private data */
spinlock_t lock; /* lock for claim and bus ops */
--
1.9.1

2020-03-16 10:03:53

by Baolin Wang

[permalink] [raw]
Subject: [RESEND RFC PATCH 6/8] mmc: host: sdhci: Remove redundant sg_count member of struct sdhci_host

The mmc_data structure already has a member to save the mapped sg count,
so there is no need for a redundant sg_count member in struct sdhci_host;
remove it. This is also a preparation patch for supporting the ADMA3
transfer mode.
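
For illustration, a minimal sketch of the flow this relies on (assuming the
usual dma_map_sg() based mapping path; build_desc_for() is a hypothetical
helper, not a real function):

	/* The mapping path already records the mapped count in struct mmc_data. */
	data->sg_count = dma_map_sg(mmc_dev(host->mmc), data->sg,
				    data->sg_len, mmc_get_dma_dir(data));

	/* ...so descriptor setup can simply iterate data->sg_count entries: */
	for_each_sg(data->sg, sg, data->sg_count, i)
		build_desc_for(sg);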

Signed-off-by: Baolin Wang <[email protected]>
---
drivers/mmc/host/sdhci.c | 12 +++++-------
drivers/mmc/host/sdhci.h | 2 --
2 files changed, 5 insertions(+), 9 deletions(-)

diff --git a/drivers/mmc/host/sdhci.c b/drivers/mmc/host/sdhci.c
index 4de0f48..6d6f450 100644
--- a/drivers/mmc/host/sdhci.c
+++ b/drivers/mmc/host/sdhci.c
@@ -708,7 +708,7 @@ static void sdhci_adma_mark_end(void *desc)
}

static void sdhci_adma_table_pre(struct sdhci_host *host,
- struct mmc_data *data, int sg_count)
+ struct mmc_data *data)
{
struct scatterlist *sg;
unsigned long flags;
@@ -722,14 +722,12 @@ static void sdhci_adma_table_pre(struct sdhci_host *host,
* We currently guess that it is LE.
*/

- host->sg_count = sg_count;
-
desc = host->adma_table;
align = host->align_buffer;

align_addr = host->align_addr;

- for_each_sg(data->sg, sg, host->sg_count, i) {
+ for_each_sg(data->sg, sg, data->sg_count, i) {
addr = sg_dma_address(sg);
len = sg_dma_len(sg);

@@ -801,7 +799,7 @@ static void sdhci_adma_table_post(struct sdhci_host *host,
bool has_unaligned = false;

/* Do a quick scan of the SG list for any unaligned mappings */
- for_each_sg(data->sg, sg, host->sg_count, i)
+ for_each_sg(data->sg, sg, data->sg_count, i)
if (sg_dma_address(sg) & SDHCI_ADMA2_MASK) {
has_unaligned = true;
break;
@@ -813,7 +811,7 @@ static void sdhci_adma_table_post(struct sdhci_host *host,

align = host->align_buffer;

- for_each_sg(data->sg, sg, host->sg_count, i) {
+ for_each_sg(data->sg, sg, data->sg_count, i) {
if (sg_dma_address(sg) & SDHCI_ADMA2_MASK) {
size = SDHCI_ADMA2_ALIGN -
(sg_dma_address(sg) & SDHCI_ADMA2_MASK);
@@ -1133,7 +1131,7 @@ static void sdhci_prepare_data(struct sdhci_host *host, struct mmc_command *cmd)
WARN_ON(1);
host->flags &= ~SDHCI_REQ_USE_DMA;
} else if (host->flags & SDHCI_USE_ADMA) {
- sdhci_adma_table_pre(host, data, sg_cnt);
+ sdhci_adma_table_pre(host, data);
sdhci_set_adma_addr(host, host->adma_addr);
} else {
WARN_ON(sg_cnt != 1);
diff --git a/drivers/mmc/host/sdhci.h b/drivers/mmc/host/sdhci.h
index 96aed99..f33830b 100644
--- a/drivers/mmc/host/sdhci.h
+++ b/drivers/mmc/host/sdhci.h
@@ -589,8 +589,6 @@ struct sdhci_host {
struct sg_mapping_iter sg_miter; /* SG state for PIO */
unsigned int blocks; /* remaining PIO blocks */

- int sg_count; /* Mapped sg entries */
-
void *adma_table; /* ADMA descriptor table */
void *align_buffer; /* Bounce buffer */
void *integr_table; /* ADMA3 intergrate descriptor table */
--
1.9.1

2020-03-16 10:04:10

by Baolin Wang

[permalink] [raw]
Subject: [RESEND RFC PATCH 7/8] mmc: host: sdhci: Add MMC packed request support

This patch adds the MMC packed operations to support packed requests,
and enables the ADMA3 transfer mode to implement this feature.

The ADMA3 transfer mode is enabled only for read and write commands;
in that mode the command-complete interrupt and the hardware data
timeout interrupt are disabled, and a software data timeout is used
instead. Other non-data commands still use the ADMA2 transfer mode,
since ADMA3 brings no benefit for them.
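
As a rough illustration of what the ADMA3 programming in this patch does per
packed request (the write_*() helpers below are made up for readability; the
real descriptor layouts are in the patch body):

	/*
	 * One packed request = one integrated descriptor table, where each
	 * entry points at a {command descriptor, ADMA2 descriptor table} pair,
	 * so the controller can run the commands back to back on its own.
	 */
	list_for_each_entry(mrq, &prq->list, list) {
		write_cmd_desc(mrq->cmd);          /* block count/size, argument, command */
		write_adma2_table(mrq->data);      /* data buffers for this command */
		write_integr_entry(pair_dma_addr); /* link the pair into the ADMA3 table */
	}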

Signed-off-by: Baolin Wang <[email protected]>
---
drivers/mmc/host/sdhci.c | 316 +++++++++++++++++++++++++++++++++++++++++++++--
drivers/mmc/host/sdhci.h | 11 ++
2 files changed, 315 insertions(+), 12 deletions(-)

diff --git a/drivers/mmc/host/sdhci.c b/drivers/mmc/host/sdhci.c
index 6d6f450..5f1f157 100644
--- a/drivers/mmc/host/sdhci.c
+++ b/drivers/mmc/host/sdhci.c
@@ -110,6 +110,19 @@ void sdhci_dumpregs(struct sdhci_host *host)
}
}

+ if (host->adma3_enabled) {
+ if (host->flags & SDHCI_USE_64_BIT_DMA) {
+ SDHCI_DUMP("ADMA3 Err: 0x%08x | ADMA3 Ptr: 0x%08x%08x\n",
+ sdhci_readl(host, SDHCI_ADMA_ERROR),
+ sdhci_readl(host, SDHCI_ADMA3_ADDRESS_HI),
+ sdhci_readl(host, SDHCI_ADMA3_ADDRESS));
+ } else {
+ SDHCI_DUMP("ADMA3 Err: 0x%08x | ADMA3 Ptr: 0x%08x\n",
+ sdhci_readl(host, SDHCI_ADMA_ERROR),
+ sdhci_readl(host, SDHCI_ADMA3_ADDRESS));
+ }
+ }
+
SDHCI_DUMP("============================================\n");
}
EXPORT_SYMBOL_GPL(sdhci_dumpregs);
@@ -287,7 +300,9 @@ static void sdhci_config_dma(struct sdhci_host *host)
goto out;

/* Note if DMA Select is zero then SDMA is selected */
- if (host->flags & SDHCI_USE_ADMA)
+ if (host->adma3_enabled)
+ ctrl |= SDHCI_CTRL_ADMA3;
+ else if (host->flags & SDHCI_USE_ADMA)
ctrl |= SDHCI_CTRL_ADMA32;

if (host->flags & SDHCI_USE_64_BIT_DMA) {
@@ -457,7 +472,7 @@ static inline void sdhci_led_deactivate(struct sdhci_host *host)
static void sdhci_mod_timer(struct sdhci_host *host, struct mmc_request *mrq,
unsigned long timeout)
{
- if (sdhci_data_line_cmd(mrq->cmd))
+ if (host->prq || sdhci_data_line_cmd(mrq->cmd))
mod_timer(&host->data_timer, timeout);
else
mod_timer(&host->timer, timeout);
@@ -465,7 +480,7 @@ static void sdhci_mod_timer(struct sdhci_host *host, struct mmc_request *mrq,

static void sdhci_del_timer(struct sdhci_host *host, struct mmc_request *mrq)
{
- if (sdhci_data_line_cmd(mrq->cmd))
+ if (host->prq || sdhci_data_line_cmd(mrq->cmd))
del_timer(&host->data_timer);
else
del_timer(&host->timer);
@@ -722,10 +737,16 @@ static void sdhci_adma_table_pre(struct sdhci_host *host,
* We currently guess that it is LE.
*/

- desc = host->adma_table;
- align = host->align_buffer;
-
- align_addr = host->align_addr;
+ if (host->adma3_enabled) {
+ desc = host->adma3_pos;
+ align = host->adma3_align_pos;
+ align_addr = host->align_addr +
+ host->adma3_align_pos - host->align_buffer;
+ } else {
+ desc = host->adma_table;
+ align = host->align_buffer;
+ align_addr = host->align_addr;
+ }

for_each_sg(data->sg, sg, data->sg_count, i) {
addr = sg_dma_address(sg);
@@ -784,6 +805,11 @@ static void sdhci_adma_table_pre(struct sdhci_host *host,
/* Add a terminating entry - nop, end, valid */
__sdhci_adma_write_desc(host, &desc, 0, 0, ADMA2_NOP_END_VALID);
}
+
+ if (host->adma3_enabled) {
+ host->adma3_pos = desc;
+ host->adma3_align_pos = align;
+ }
}

static void sdhci_adma_table_post(struct sdhci_host *host,
@@ -809,7 +835,10 @@ static void sdhci_adma_table_post(struct sdhci_host *host,
dma_sync_sg_for_cpu(mmc_dev(host->mmc), data->sg,
data->sg_len, DMA_FROM_DEVICE);

- align = host->align_buffer;
+ if (host->adma3_enabled)
+ align = host->adma3_align_pos;
+ else
+ align = host->align_buffer;

for_each_sg(data->sg, sg, data->sg_count, i) {
if (sg_dma_address(sg) & SDHCI_ADMA2_MASK) {
@@ -823,6 +852,9 @@ static void sdhci_adma_table_post(struct sdhci_host *host,
align += SDHCI_ADMA2_ALIGN;
}
}
+
+ if (host->adma3_enabled)
+ host->adma3_align_pos = align;
}
}
}
@@ -1031,7 +1063,7 @@ static void sdhci_set_timeout(struct sdhci_host *host, struct mmc_command *cmd)
static void sdhci_initialize_data(struct sdhci_host *host,
struct mmc_data *data)
{
- WARN_ON(host->data);
+ WARN_ON(!host->prq && host->data);

/* Sanity checks */
BUG_ON(data->blksz * data->blocks > 524288);
@@ -1132,7 +1164,8 @@ static void sdhci_prepare_data(struct sdhci_host *host, struct mmc_command *cmd)
host->flags &= ~SDHCI_REQ_USE_DMA;
} else if (host->flags & SDHCI_USE_ADMA) {
sdhci_adma_table_pre(host, data);
- sdhci_set_adma_addr(host, host->adma_addr);
+ if (!host->adma3_enabled)
+ sdhci_set_adma_addr(host, host->adma_addr);
} else {
WARN_ON(sg_cnt != 1);
sdhci_set_sdma_addr(host, sdhci_sdma_address(host));
@@ -1155,6 +1188,9 @@ static void sdhci_prepare_data(struct sdhci_host *host, struct mmc_command *cmd)

sdhci_set_transfer_irqs(host);

+ if (host->adma3_enabled)
+ return;
+
sdhci_set_block_info(host, data);
}

@@ -1485,6 +1521,36 @@ static void sdhci_finish_mrq(struct sdhci_host *host, struct mmc_request *mrq)
queue_work(host->complete_wq, &host->complete_work);
}

+static void sdhci_finish_packed_data(struct sdhci_host *host, int error)
+{
+ struct mmc_request *mrq;
+
+ host->data = NULL;
+ /*
+ * Reset the align buffer pointer address for unaligned mappings after
+ * finishing the transfer.
+ */
+ host->adma3_align_pos = host->align_buffer;
+
+ if (error)
+ sdhci_do_reset(host, SDHCI_RESET_CMD | SDHCI_RESET_DATA);
+
+ list_for_each_entry(mrq, &host->prq->list, list) {
+ struct mmc_data *data = mrq->data;
+
+ sdhci_adma_table_post(host, data);
+ data->error = error;
+
+ if (data->error)
+ data->bytes_xfered = 0;
+ else
+ data->bytes_xfered = data->blksz * data->blocks;
+ }
+
+ sdhci_del_timer(host, NULL);
+ sdhci_led_deactivate(host);
+}
+
static void sdhci_finish_data(struct sdhci_host *host)
{
struct mmc_command *data_cmd = host->data_cmd;
@@ -1620,7 +1686,7 @@ void sdhci_send_command(struct sdhci_host *host, struct mmc_command *cmd)

host->cmd = cmd;
host->data_timeout = 0;
- if (sdhci_data_line_cmd(cmd)) {
+ if (!host->prq && sdhci_data_line_cmd(cmd)) {
WARN_ON(host->data_cmd);
host->data_cmd = cmd;
sdhci_set_timeout(host, cmd);
@@ -2026,6 +2092,206 @@ void sdhci_set_power(struct sdhci_host *host, unsigned char mode,
* *
\*****************************************************************************/

+static void sdhci_adma3_write_cmd_desc(struct sdhci_host *host,
+ struct mmc_command *cmd)
+{
+ struct mmc_data *data = cmd->data;
+ struct sdhci_adma3_cmd_desc *cmd_desc = host->adma3_pos;
+ int blksz, command;
+ u16 mode = 0;
+
+ /* Set block count */
+ cmd_desc->cmd = cpu_to_le32(ADMA3_TRAN_VALID);
+ cmd_desc->reg = cpu_to_le32(data->blocks);
+ cmd_desc++;
+
+ /* Set block size */
+ cmd_desc->cmd = cpu_to_le32(ADMA3_TRAN_VALID);
+ blksz = SDHCI_MAKE_BLKSZ(host->sdma_boundary, data->blksz);
+ cmd_desc->reg = cpu_to_le32(blksz);
+ cmd_desc++;
+
+ /* Set argument */
+ cmd_desc->cmd = cpu_to_le32(ADMA3_TRAN_VALID);
+ cmd_desc->reg = cpu_to_le32(cmd->arg);
+ cmd_desc++;
+
+ /* set command and transfer mode */
+ if (data->flags & MMC_DATA_READ)
+ mode |= SDHCI_TRNS_READ;
+
+ if (!(host->quirks2 & SDHCI_QUIRK2_SUPPORT_SINGLE))
+ mode |= SDHCI_TRNS_BLK_CNT_EN;
+
+ if (mmc_op_multi(cmd->opcode) || data->blocks > 1)
+ mode |= SDHCI_TRNS_MULTI;
+
+ sdhci_auto_cmd_select(host, cmd, &mode);
+ mode |= SDHCI_TRNS_DMA;
+
+ command = sdhci_get_command(host, cmd);
+ command = (command << 16) | mode;
+ cmd_desc->cmd = cpu_to_le32(ADMA3_TRAN_END);
+ cmd_desc->reg = cpu_to_le32(command);
+
+ host->adma3_pos +=
+ SDHCI_ADMA3_CMD_DESC_SZ * SDHCI_ADMA3_CMD_DESC_ENTRIES;
+}
+
+static void sdhci_adma3_write_integr_desc(struct sdhci_host *host,
+ dma_addr_t addr)
+{
+ struct sdhci_integr_64_desc *integr_desc = host->integr_table;
+
+ integr_desc->cmd = cpu_to_le32(ADMA3_INTEGR_TRAN_END);
+ integr_desc->addr_lo = cpu_to_le32((u32)addr);
+
+ if (host->flags & SDHCI_USE_64_BIT_DMA)
+ integr_desc->addr_hi = cpu_to_le32((u64)addr >> 32);
+}
+
+static void sdhci_set_adma3_addr(struct sdhci_host *host, dma_addr_t addr)
+{
+ sdhci_writel(host, addr, SDHCI_ADMA3_ADDRESS);
+ if (host->flags & SDHCI_USE_64_BIT_DMA)
+ sdhci_writel(host, (u64)addr >> 32, SDHCI_ADMA3_ADDRESS_HI);
+}
+
+int sdhci_prepare_packed(struct mmc_host *mmc)
+{
+ struct sdhci_host *host = mmc_priv(mmc);
+ unsigned long timeout, flags;
+ u32 mask;
+
+ spin_lock_irqsave(&host->lock, flags);
+
+ if (!(host->flags & SDHCI_USE_ADMA3) ||
+ !(host->flags & (SDHCI_AUTO_CMD23 | SDHCI_AUTO_CMD12))) {
+ spin_unlock_irqrestore(&host->lock, flags);
+ pr_err("%s: Unsupported packed request\n",
+ mmc_hostname(host->mmc));
+ return -EOPNOTSUPP;
+ }
+
+ /* Wait max 10 ms */
+ timeout = 10;
+ mask = SDHCI_CMD_INHIBIT | SDHCI_DATA_INHIBIT;
+
+ while (sdhci_readl(host, SDHCI_PRESENT_STATE) & mask) {
+ if (timeout == 0) {
+ sdhci_dumpregs(host);
+ spin_unlock_irqrestore(&host->lock, flags);
+
+ pr_err("%s: Controller never released inhibit bit(s).\n",
+ mmc_hostname(host->mmc));
+ return -EIO;
+ }
+
+ timeout--;
+ mdelay(1);
+ }
+
+ /* Disable command complete event for ADMA3 mode */
+ host->ier &= ~SDHCI_INT_RESPONSE;
+ sdhci_writel(host, host->ier, SDHCI_INT_ENABLE);
+ sdhci_writel(host, host->ier, SDHCI_SIGNAL_ENABLE);
+
+ /*
+ * Disable data timeout interrupt, and will use software timeout for
+ * packed request.
+ */
+ sdhci_set_data_timeout_irq(host, false);
+
+ /* Enable ADMA3 mode for packed request */
+ host->adma3_enabled = true;
+
+ spin_unlock_irqrestore(&host->lock, flags);
+
+ return 0;
+}
+
+int sdhci_unprepare_packed(struct mmc_host *mmc)
+{
+ struct sdhci_host *host = mmc_priv(mmc);
+ unsigned long flags;
+
+ spin_lock_irqsave(&host->lock, flags);
+
+ /* Disable ADMA3 mode after finishing packed request */
+ host->adma3_enabled = false;
+
+ /* Re-enable command complete event after ADMA3 mode */
+ host->ier |= SDHCI_INT_RESPONSE;
+
+ sdhci_writel(host, host->ier, SDHCI_INT_ENABLE);
+ sdhci_writel(host, host->ier, SDHCI_SIGNAL_ENABLE);
+ spin_unlock_irqrestore(&host->lock, flags);
+
+ return 0;
+}
+
+int sdhci_packed_request(struct mmc_host *mmc,
+ struct mmc_packed_request *prq)
+{
+ struct sdhci_host *host = mmc_priv(mmc);
+ struct mmc_request *mrq;
+ unsigned long timeout, flags;
+ u64 data_timeout = 0;
+ dma_addr_t integr_addr;
+ int present;
+
+ /* Firstly check card presence */
+ present = mmc->ops->get_cd(mmc);
+
+ spin_lock_irqsave(&host->lock, flags);
+
+ sdhci_led_activate(host);
+
+ if (!present || host->flags & SDHCI_DEVICE_DEAD) {
+ spin_unlock_irqrestore(&host->lock, flags);
+ return -ENOMEDIUM;
+ }
+
+ host->prq = prq;
+ host->adma3_pos = host->adma_table;
+ host->adma3_align_pos = host->align_buffer;
+ integr_addr = host->adma_addr;
+
+ list_for_each_entry(mrq, &prq->list, list) {
+ struct mmc_command *cmd = mrq->cmd;
+
+ /* Set command descriptor */
+ sdhci_adma3_write_cmd_desc(host, cmd);
+ /* Set ADMA2 descriptors */
+ sdhci_prepare_data(host, cmd);
+ /* Set integrated descriptor */
+ sdhci_adma3_write_integr_desc(host, integr_addr);
+
+ /* Update the integrated descriptor address */
+ integr_addr =
+ host->adma_addr + (host->adma3_pos - host->adma_table);
+
+ /* Calculate each command's data timeout */
+ sdhci_calc_sw_timeout(host, cmd);
+ data_timeout += host->data_timeout;
+ }
+
+ timeout = jiffies;
+ if (data_timeout)
+ timeout += nsecs_to_jiffies(data_timeout);
+ else
+ timeout += 10 * HZ * prq->nr_reqs;
+ sdhci_mod_timer(host, NULL, timeout);
+
+ /* Start ADMA3 transfer */
+ sdhci_set_adma3_addr(host, host->integr_addr);
+
+ spin_unlock_irqrestore(&host->lock, flags);
+
+ return 0;
+}
+EXPORT_SYMBOL_GPL(sdhci_packed_request);
+
static void sdhci_start_request(struct mmc_host *mmc, struct mmc_request *mrq,
int present)
{
@@ -2854,9 +3120,19 @@ static bool sdhci_request_done(struct sdhci_host *host)
{
unsigned long flags;
struct mmc_request *mrq;
+ struct mmc_packed_request *prq;
int i;

spin_lock_irqsave(&host->lock, flags);
+ prq = host->prq;
+
+ if (prq) {
+ host->prq = NULL;
+ spin_unlock_irqrestore(&host->lock, flags);
+
+ host->ops->packed_request_done(host, prq);
+ return true;
+ }

for (i = 0; i < SDHCI_MAX_MRQS; i++) {
mrq = host->mrqs_done[i];
@@ -3012,6 +3288,17 @@ static void sdhci_timeout_data_timer(struct timer_list *t)

spin_lock_irqsave(&host->lock, flags);

+ if (host->prq) {
+ pr_err("%s: Packed requests timeout for hardware interrupt.\n",
+ mmc_hostname(host->mmc));
+ sdhci_dumpregs(host);
+ sdhci_finish_packed_data(host, -ETIMEDOUT);
+ queue_work(host->complete_wq, &host->complete_work);
+ spin_unlock_irqrestore(&host->lock, flags);
+
+ return;
+ }
+
if (host->data || host->data_cmd ||
(host->cmd && sdhci_data_line_cmd(host->cmd))) {
pr_err("%s: Timeout waiting for hardware interrupt.\n",
@@ -3219,7 +3506,9 @@ static void sdhci_data_irq(struct sdhci_host *host, u32 intmask)
host->ops->adma_workaround(host, intmask);
}

- if (host->data->error)
+ if (host->prq)
+ sdhci_finish_packed_data(host, host->data->error);
+ else if (host->data->error)
sdhci_finish_data(host);
else {
if (intmask & (SDHCI_INT_DATA_AVAIL | SDHCI_INT_SPACE_AVAIL))
@@ -3391,6 +3680,9 @@ static irqreturn_t sdhci_irq(int irq, void *dev_id)
host->mrqs_done[i] = NULL;
}
}
+
+ if (host->prq)
+ result = IRQ_WAKE_THREAD;
out:
spin_unlock(&host->lock);

diff --git a/drivers/mmc/host/sdhci.h b/drivers/mmc/host/sdhci.h
index f33830b..bbc937b 100644
--- a/drivers/mmc/host/sdhci.h
+++ b/drivers/mmc/host/sdhci.h
@@ -579,6 +579,7 @@ struct sdhci_host {
bool v4_mode; /* Host Version 4 Enable */
bool use_external_dma; /* Host selects to use external DMA */
bool always_defer_done; /* Always defer to complete requests */
+ bool adma3_enabled; /* ADMA3 mode enabled */

struct mmc_request *mrqs_done[SDHCI_MAX_MRQS]; /* Requests done */
struct mmc_command *cmd; /* Current command */
@@ -586,12 +587,15 @@ struct sdhci_host {
struct mmc_data *data; /* Current data request */
unsigned int data_early:1; /* Data finished before cmd */

+ struct mmc_packed_request *prq; /* Current packed request */
struct sg_mapping_iter sg_miter; /* SG state for PIO */
unsigned int blocks; /* remaining PIO blocks */

void *adma_table; /* ADMA descriptor table */
void *align_buffer; /* Bounce buffer */
void *integr_table; /* ADMA3 intergrate descriptor table */
+ void *adma3_pos; /* ADMA3 buffer position */
+ void *adma3_align_pos; /* ADMA3 Bounce buffer position */

size_t adma_table_sz; /* ADMA descriptor table size */
size_t align_buffer_sz; /* Bounce buffer size */
@@ -703,6 +707,8 @@ struct sdhci_ops {
dma_addr_t addr, int len, unsigned int cmd);
void (*request_done)(struct sdhci_host *host,
struct mmc_request *mrq);
+ void (*packed_request_done)(struct sdhci_host *host,
+ struct mmc_packed_request *prq);
};

#ifdef CONFIG_MMC_SDHCI_IO_ACCESSORS
@@ -857,4 +863,9 @@ bool sdhci_cqe_irq(struct sdhci_host *host, u32 intmask, int *cmd_error,
void sdhci_set_data_timeout_irq(struct sdhci_host *host, bool enable);
void __sdhci_set_timeout(struct sdhci_host *host, struct mmc_command *cmd);

+int sdhci_prepare_packed(struct mmc_host *mmc);
+int sdhci_unprepare_packed(struct mmc_host *mmc);
+int sdhci_packed_request(struct mmc_host *mmc,
+ struct mmc_packed_request *prq);
+
#endif /* __SDHCI_HW_H */
--
1.9.1

2020-03-18 10:02:41

by Ming Lei

[permalink] [raw]
Subject: Re: [RESEND RFC PATCH 2/8] block: Allow sending a batch of requests from the scheduler to hardware

On Mon, Mar 16, 2020 at 06:01:19PM +0800, Baolin Wang wrote:
> As we know, some SD/MMC host controllers can support packed request,
> that means we can package several requests to host controller at one
> time to improve performence. So the hardware driver expects the blk-mq
> can dispatch a batch of requests at one time, and driver can use bd.last
> to indicate if it is the last request in the batch to help to combine
> requests as much as possible.
>
> Thus we should add batch requests setting from the block driver to tell
> the scheduler how many requests can be dispatched in a batch, as well
> as changing the scheduler to dispatch more than one request if setting
> the maximum batch requests number.
>

I feel this batch dispatch style is more complicated, and some other
drivers(virtio blk/scsi) still may get benefit if we can pass real 'last' flag in
.queue_rq().

So what about the following way by extending .commit_rqs() to this usage?
And you can do whatever batch processing in .commit_rqs() which will be
guaranteed to be called if BLK_MQ_F_FORCE_COMMIT_RQS is set by driver.

diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
index 856356b1619e..cd2bbe56f83f 100644
--- a/block/blk-mq-sched.c
+++ b/block/blk-mq-sched.c
@@ -85,11 +85,12 @@ void blk_mq_sched_restart(struct blk_mq_hw_ctx *hctx)
* its queue by itself in its completion handler, so we don't need to
* restart queue if .get_budget() returns BLK_STS_NO_RESOURCE.
*/
-static void blk_mq_do_dispatch_sched(struct blk_mq_hw_ctx *hctx)
+static bool blk_mq_do_dispatch_sched(struct blk_mq_hw_ctx *hctx)
{
struct request_queue *q = hctx->queue;
struct elevator_queue *e = q->elevator;
LIST_HEAD(rq_list);
+ bool ret = false;

do {
struct request *rq;
@@ -112,7 +113,10 @@ static void blk_mq_do_dispatch_sched(struct blk_mq_hw_ctx *hctx)
* in blk_mq_dispatch_rq_list().
*/
list_add(&rq->queuelist, &rq_list);
- } while (blk_mq_dispatch_rq_list(q, &rq_list, true));
+ ret = blk_mq_dispatch_rq_list(q, &rq_list, true);
+ } while (ret);
+
+ return ret;
}

static struct blk_mq_ctx *blk_mq_next_ctx(struct blk_mq_hw_ctx *hctx,
@@ -131,11 +135,12 @@ static struct blk_mq_ctx *blk_mq_next_ctx(struct blk_mq_hw_ctx *hctx,
* its queue by itself in its completion handler, so we don't need to
* restart queue if .get_budget() returns BLK_STS_NO_RESOURCE.
*/
-static void blk_mq_do_dispatch_ctx(struct blk_mq_hw_ctx *hctx)
+static bool blk_mq_do_dispatch_ctx(struct blk_mq_hw_ctx *hctx)
{
struct request_queue *q = hctx->queue;
LIST_HEAD(rq_list);
struct blk_mq_ctx *ctx = READ_ONCE(hctx->dispatch_from);
+ bool ret = false;

do {
struct request *rq;
@@ -161,10 +166,12 @@ static void blk_mq_do_dispatch_ctx(struct blk_mq_hw_ctx *hctx)

/* round robin for fair dispatch */
ctx = blk_mq_next_ctx(hctx, rq->mq_ctx);
-
- } while (blk_mq_dispatch_rq_list(q, &rq_list, true));
+ ret = blk_mq_dispatch_rq_list(q, &rq_list, true);
+ } while (ret);

WRITE_ONCE(hctx->dispatch_from, ctx);
+
+ return ret;
}

void blk_mq_sched_dispatch_requests(struct blk_mq_hw_ctx *hctx)
@@ -173,6 +180,7 @@ void blk_mq_sched_dispatch_requests(struct blk_mq_hw_ctx *hctx)
struct elevator_queue *e = q->elevator;
const bool has_sched_dispatch = e && e->type->ops.dispatch_request;
LIST_HEAD(rq_list);
+ bool dispatch_ret;

/* RCU or SRCU read lock is needed before checking quiesced flag */
if (unlikely(blk_mq_hctx_stopped(hctx) || blk_queue_quiesced(q)))
@@ -206,20 +214,26 @@ void blk_mq_sched_dispatch_requests(struct blk_mq_hw_ctx *hctx)
*/
if (!list_empty(&rq_list)) {
blk_mq_sched_mark_restart_hctx(hctx);
- if (blk_mq_dispatch_rq_list(q, &rq_list, false)) {
+ dispatch_ret = blk_mq_dispatch_rq_list(q, &rq_list, false);
+ if (dispatch_ret) {
if (has_sched_dispatch)
- blk_mq_do_dispatch_sched(hctx);
+ dispatch_ret = blk_mq_do_dispatch_sched(hctx);
else
- blk_mq_do_dispatch_ctx(hctx);
+ dispatch_ret = blk_mq_do_dispatch_ctx(hctx);
}
} else if (has_sched_dispatch) {
- blk_mq_do_dispatch_sched(hctx);
+ dispatch_ret = blk_mq_do_dispatch_sched(hctx);
} else if (hctx->dispatch_busy) {
/* dequeue request one by one from sw queue if queue is busy */
- blk_mq_do_dispatch_ctx(hctx);
+ dispatch_ret = blk_mq_do_dispatch_ctx(hctx);
} else {
blk_mq_flush_busy_ctxs(hctx, &rq_list);
- blk_mq_dispatch_rq_list(q, &rq_list, false);
+ dispatch_ret = blk_mq_dispatch_rq_list(q, &rq_list, false);
+ }
+
+ if (dispatch_ret) {
+ if (hctx->flags & BLK_MQ_F_FORCE_COMMIT_RQS)
+ hctx->queue->mq_ops->commit_rqs(hctx);
}
}

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 87c6699f35ae..9b46f5d6c7fd 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -1238,11 +1238,15 @@ bool blk_mq_dispatch_rq_list(struct request_queue *q, struct list_head *list,
* Flag last if we have no more requests, or if we have more
* but can't assign a driver tag to it.
*/
- if (list_empty(list))
- bd.last = true;
- else {
- nxt = list_first_entry(list, struct request, queuelist);
- bd.last = !blk_mq_get_driver_tag(nxt);
+ if (!(hctx->flags & BLK_MQ_F_FORCE_COMMIT_RQS)) {
+ if (list_empty(list))
+ bd.last = true;
+ else {
+ nxt = list_first_entry(list, struct request, queuelist);
+ bd.last = !blk_mq_get_driver_tag(nxt);
+ }
+ } else {
+ bd.last = false;
}

ret = q->mq_ops->queue_rq(hctx, &bd);
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index 07fa767bff86..c0ef6990b698 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -394,6 +394,7 @@ enum {
BLK_MQ_F_SHOULD_MERGE = 1 << 0,
BLK_MQ_F_TAG_SHARED = 1 << 1,
BLK_MQ_F_NO_MANAGED_IRQ = 1 << 2,
+ BLK_MQ_F_FORCE_COMMIT_RQS = 1 << 3,
BLK_MQ_F_BLOCKING = 1 << 5,
BLK_MQ_F_NO_SCHED = 1 << 6,
BLK_MQ_F_ALLOC_POLICY_START_BIT = 8,

Thanks,
Ming

2020-03-18 10:26:47

by Baolin Wang

[permalink] [raw]
Subject: Re: [RESEND RFC PATCH 2/8] block: Allow sending a batch of requests from the scheduler to hardware

Hi Ming,

On Wed, Mar 18, 2020 at 6:01 PM Ming Lei <[email protected]> wrote:
>
> On Mon, Mar 16, 2020 at 06:01:19PM +0800, Baolin Wang wrote:
> > As we know, some SD/MMC host controllers can support packed request,
> > that means we can package several requests to host controller at one
> > time to improve performence. So the hardware driver expects the blk-mq
> > can dispatch a batch of requests at one time, and driver can use bd.last
> > to indicate if it is the last request in the batch to help to combine
> > requests as much as possible.
> >
> > Thus we should add batch requests setting from the block driver to tell
> > the scheduler how many requests can be dispatched in a batch, as well
> > as changing the scheduler to dispatch more than one request if setting
> > the maximum batch requests number.
> >
>
> I feel this batch dispatch style is more complicated, and some other
> drivers(virtio blk/scsi) still may get benefit if we can pass real 'last' flag in
> .queue_rq().
>
> So what about the following way by extending .commit_rqs() to this usage?
> And you can do whatever batch processing in .commit_rqs() which will be
> guaranteed to be called if BLK_MQ_F_FORCE_COMMIT_RQS is set by driver.

I really appreciate your suggestion, which is much simpler than mine.
It seems to solve my problem; I will try it on my platform to see if it
works and give you the feedback. Thanks again.

> diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
> index 856356b1619e..cd2bbe56f83f 100644
> --- a/block/blk-mq-sched.c
> +++ b/block/blk-mq-sched.c
> @@ -85,11 +85,12 @@ void blk_mq_sched_restart(struct blk_mq_hw_ctx *hctx)
> * its queue by itself in its completion handler, so we don't need to
> * restart queue if .get_budget() returns BLK_STS_NO_RESOURCE.
> */
> -static void blk_mq_do_dispatch_sched(struct blk_mq_hw_ctx *hctx)
> +static bool blk_mq_do_dispatch_sched(struct blk_mq_hw_ctx *hctx)
> {
> struct request_queue *q = hctx->queue;
> struct elevator_queue *e = q->elevator;
> LIST_HEAD(rq_list);
> + bool ret = false;
>
> do {
> struct request *rq;
> @@ -112,7 +113,10 @@ static void blk_mq_do_dispatch_sched(struct blk_mq_hw_ctx *hctx)
> * in blk_mq_dispatch_rq_list().
> */
> list_add(&rq->queuelist, &rq_list);
> - } while (blk_mq_dispatch_rq_list(q, &rq_list, true));
> + ret = blk_mq_dispatch_rq_list(q, &rq_list, true);
> + } while (ret);
> +
> + return ret;
> }
>
> static struct blk_mq_ctx *blk_mq_next_ctx(struct blk_mq_hw_ctx *hctx,
> @@ -131,11 +135,12 @@ static struct blk_mq_ctx *blk_mq_next_ctx(struct blk_mq_hw_ctx *hctx,
> * its queue by itself in its completion handler, so we don't need to
> * restart queue if .get_budget() returns BLK_STS_NO_RESOURCE.
> */
> -static void blk_mq_do_dispatch_ctx(struct blk_mq_hw_ctx *hctx)
> +static bool blk_mq_do_dispatch_ctx(struct blk_mq_hw_ctx *hctx)
> {
> struct request_queue *q = hctx->queue;
> LIST_HEAD(rq_list);
> struct blk_mq_ctx *ctx = READ_ONCE(hctx->dispatch_from);
> + bool ret = false;
>
> do {
> struct request *rq;
> @@ -161,10 +166,12 @@ static void blk_mq_do_dispatch_ctx(struct blk_mq_hw_ctx *hctx)
>
> /* round robin for fair dispatch */
> ctx = blk_mq_next_ctx(hctx, rq->mq_ctx);
> -
> - } while (blk_mq_dispatch_rq_list(q, &rq_list, true));
> + ret = blk_mq_dispatch_rq_list(q, &rq_list, true);
> + } while (ret);
>
> WRITE_ONCE(hctx->dispatch_from, ctx);
> +
> + return ret;
> }
>
> void blk_mq_sched_dispatch_requests(struct blk_mq_hw_ctx *hctx)
> @@ -173,6 +180,7 @@ void blk_mq_sched_dispatch_requests(struct blk_mq_hw_ctx *hctx)
> struct elevator_queue *e = q->elevator;
> const bool has_sched_dispatch = e && e->type->ops.dispatch_request;
> LIST_HEAD(rq_list);
> + bool dispatch_ret;
>
> /* RCU or SRCU read lock is needed before checking quiesced flag */
> if (unlikely(blk_mq_hctx_stopped(hctx) || blk_queue_quiesced(q)))
> @@ -206,20 +214,26 @@ void blk_mq_sched_dispatch_requests(struct blk_mq_hw_ctx *hctx)
> */
> if (!list_empty(&rq_list)) {
> blk_mq_sched_mark_restart_hctx(hctx);
> - if (blk_mq_dispatch_rq_list(q, &rq_list, false)) {
> + dispatch_ret = blk_mq_dispatch_rq_list(q, &rq_list, false);
> + if (dispatch_ret) {
> if (has_sched_dispatch)
> - blk_mq_do_dispatch_sched(hctx);
> + dispatch_ret = blk_mq_do_dispatch_sched(hctx);
> else
> - blk_mq_do_dispatch_ctx(hctx);
> + dispatch_ret = blk_mq_do_dispatch_ctx(hctx);
> }
> } else if (has_sched_dispatch) {
> - blk_mq_do_dispatch_sched(hctx);
> + dispatch_ret = blk_mq_do_dispatch_sched(hctx);
> } else if (hctx->dispatch_busy) {
> /* dequeue request one by one from sw queue if queue is busy */
> - blk_mq_do_dispatch_ctx(hctx);
> + dispatch_ret = blk_mq_do_dispatch_ctx(hctx);
> } else {
> blk_mq_flush_busy_ctxs(hctx, &rq_list);
> - blk_mq_dispatch_rq_list(q, &rq_list, false);
> + dispatch_ret = blk_mq_dispatch_rq_list(q, &rq_list, false);
> + }
> +
> + if (dispatch_ret) {
> + if (hctx->flags & BLK_MQ_F_FORCE_COMMIT_RQS)
> + hctx->queue->mq_ops->commit_rqs(hctx);
> }
> }
>
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index 87c6699f35ae..9b46f5d6c7fd 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -1238,11 +1238,15 @@ bool blk_mq_dispatch_rq_list(struct request_queue *q, struct list_head *list,
> * Flag last if we have no more requests, or if we have more
> * but can't assign a driver tag to it.
> */
> - if (list_empty(list))
> - bd.last = true;
> - else {
> - nxt = list_first_entry(list, struct request, queuelist);
> - bd.last = !blk_mq_get_driver_tag(nxt);
> + if (!(hctx->flags & BLK_MQ_F_FORCE_COMMIT_RQS)) {
> + if (list_empty(list))
> + bd.last = true;
> + else {
> + nxt = list_first_entry(list, struct request, queuelist);
> + bd.last = !blk_mq_get_driver_tag(nxt);
> + }
> + } else {
> + bd.last = false;
> }
>
> ret = q->mq_ops->queue_rq(hctx, &bd);
> diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
> index 07fa767bff86..c0ef6990b698 100644
> --- a/include/linux/blk-mq.h
> +++ b/include/linux/blk-mq.h
> @@ -394,6 +394,7 @@ enum {
> BLK_MQ_F_SHOULD_MERGE = 1 << 0,
> BLK_MQ_F_TAG_SHARED = 1 << 1,
> BLK_MQ_F_NO_MANAGED_IRQ = 1 << 2,
> + BLK_MQ_F_FORCE_COMMIT_RQS = 1 << 3,
> BLK_MQ_F_BLOCKING = 1 << 5,
> BLK_MQ_F_NO_SCHED = 1 << 6,
> BLK_MQ_F_ALLOC_POLICY_START_BIT = 8,
>
> Thanks,
> Ming
>


--
Baolin Wang

2020-03-20 10:28:28

by Baolin Wang

[permalink] [raw]
Subject: Re: [RESEND RFC PATCH 2/8] block: Allow sending a batch of requests from the scheduler to hardware

Hi Ming,

On Wed, Mar 18, 2020 at 6:26 PM Baolin Wang <[email protected]> wrote:
>
> Hi Ming,
>
> On Wed, Mar 18, 2020 at 6:01 PM Ming Lei <[email protected]> wrote:
> >
> > On Mon, Mar 16, 2020 at 06:01:19PM +0800, Baolin Wang wrote:
> > > As we know, some SD/MMC host controllers can support packed request,
> > > that means we can package several requests to host controller at one
> > > time to improve performence. So the hardware driver expects the blk-mq
> > > can dispatch a batch of requests at one time, and driver can use bd.last
> > > to indicate if it is the last request in the batch to help to combine
> > > requests as much as possible.
> > >
> > > Thus we should add batch requests setting from the block driver to tell
> > > the scheduler how many requests can be dispatched in a batch, as well
> > > as changing the scheduler to dispatch more than one request if setting
> > > the maximum batch requests number.
> > >
> >
> > I feel this batch dispatch style is more complicated, and some other
> > drivers(virtio blk/scsi) still may get benefit if we can pass real 'last' flag in
> > .queue_rq().
> >
> > So what about the following way by extending .commit_rqs() to this usage?
> > And you can do whatever batch processing in .commit_rqs() which will be
> > guaranteed to be called if BLK_MQ_F_FORCE_COMMIT_RQS is set by driver.
>
> I'm very appreciated for your good suggestion, which is much simpler than mine.
> It seems to solve my problem, and I will try it on my platform to see
> if it can work and give you the feadback. Thanks again.

I tried your approach on my platform but met some problems; see below.

>
> > diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
> > index 856356b1619e..cd2bbe56f83f 100644
> > --- a/block/blk-mq-sched.c
> > +++ b/block/blk-mq-sched.c
> > @@ -85,11 +85,12 @@ void blk_mq_sched_restart(struct blk_mq_hw_ctx *hctx)
> > * its queue by itself in its completion handler, so we don't need to
> > * restart queue if .get_budget() returns BLK_STS_NO_RESOURCE.
> > */
> > -static void blk_mq_do_dispatch_sched(struct blk_mq_hw_ctx *hctx)
> > +static bool blk_mq_do_dispatch_sched(struct blk_mq_hw_ctx *hctx)
> > {
> > struct request_queue *q = hctx->queue;
> > struct elevator_queue *e = q->elevator;
> > LIST_HEAD(rq_list);
> > + bool ret = false;
> >
> > do {
> > struct request *rq;
> > @@ -112,7 +113,10 @@ static void blk_mq_do_dispatch_sched(struct blk_mq_hw_ctx *hctx)
> > * in blk_mq_dispatch_rq_list().
> > */
> > list_add(&rq->queuelist, &rq_list);
> > - } while (blk_mq_dispatch_rq_list(q, &rq_list, true));
> > + ret = blk_mq_dispatch_rq_list(q, &rq_list, true);
> > + } while (ret);
> > +
> > + return ret;
> > }
> >
> > static struct blk_mq_ctx *blk_mq_next_ctx(struct blk_mq_hw_ctx *hctx,
> > @@ -131,11 +135,12 @@ static struct blk_mq_ctx *blk_mq_next_ctx(struct blk_mq_hw_ctx *hctx,
> > * its queue by itself in its completion handler, so we don't need to
> > * restart queue if .get_budget() returns BLK_STS_NO_RESOURCE.
> > */
> > -static void blk_mq_do_dispatch_ctx(struct blk_mq_hw_ctx *hctx)
> > +static bool blk_mq_do_dispatch_ctx(struct blk_mq_hw_ctx *hctx)
> > {
> > struct request_queue *q = hctx->queue;
> > LIST_HEAD(rq_list);
> > struct blk_mq_ctx *ctx = READ_ONCE(hctx->dispatch_from);
> > + bool ret = false;
> >
> > do {
> > struct request *rq;
> > @@ -161,10 +166,12 @@ static void blk_mq_do_dispatch_ctx(struct blk_mq_hw_ctx *hctx)
> >
> > /* round robin for fair dispatch */
> > ctx = blk_mq_next_ctx(hctx, rq->mq_ctx);
> > -
> > - } while (blk_mq_dispatch_rq_list(q, &rq_list, true));
> > + ret = blk_mq_dispatch_rq_list(q, &rq_list, true);
> > + } while (ret);
> >
> > WRITE_ONCE(hctx->dispatch_from, ctx);
> > +
> > + return ret;
> > }
> >
> > void blk_mq_sched_dispatch_requests(struct blk_mq_hw_ctx *hctx)
> > @@ -173,6 +180,7 @@ void blk_mq_sched_dispatch_requests(struct blk_mq_hw_ctx *hctx)
> > struct elevator_queue *e = q->elevator;
> > const bool has_sched_dispatch = e && e->type->ops.dispatch_request;
> > LIST_HEAD(rq_list);
> > + bool dispatch_ret;
> >
> > /* RCU or SRCU read lock is needed before checking quiesced flag */
> > if (unlikely(blk_mq_hctx_stopped(hctx) || blk_queue_quiesced(q)))
> > @@ -206,20 +214,26 @@ void blk_mq_sched_dispatch_requests(struct blk_mq_hw_ctx *hctx)
> > */
> > if (!list_empty(&rq_list)) {
> > blk_mq_sched_mark_restart_hctx(hctx);
> > - if (blk_mq_dispatch_rq_list(q, &rq_list, false)) {
> > + dispatch_ret = blk_mq_dispatch_rq_list(q, &rq_list, false);
> > + if (dispatch_ret) {
> > if (has_sched_dispatch)
> > - blk_mq_do_dispatch_sched(hctx);
> > + dispatch_ret = blk_mq_do_dispatch_sched(hctx);

Suppose we dispatched a request successfully via blk_mq_dispatch_rq_list()
and got dispatch_ret = true. Then we try to dispatch more requests from the
scheduler via blk_mq_do_dispatch_sched(), but if there are no more requests
in the scheduler, we get dispatch_ret = false. In this case we will not issue
commit_rqs() to tell the hardware to handle the previous requests dispatched
from &rq_list (sketched below).

So I think we should not overwrite 'dispatch_ret' here. Or do you have any
other thoughts on how to fix this?
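
To make the scenario concrete, here is a simplified trace of the path I mean
(the function names are the real ones, the straight-line flow is only
illustrative):

	dispatch_ret = blk_mq_dispatch_rq_list(q, &rq_list, false); /* true: &rq_list dispatched */
	dispatch_ret = blk_mq_do_dispatch_sched(hctx);              /* false: scheduler is empty */
	...
	if (dispatch_ret)	/* false, so commit_rqs() is skipped... */
		hctx->queue->mq_ops->commit_rqs(hctx);	/* ...and the requests dispatched above are never kicked */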

> > else
> > - blk_mq_do_dispatch_ctx(hctx);
> > + dispatch_ret = blk_mq_do_dispatch_ctx(hctx);
> > }
> > } else if (has_sched_dispatch) {
> > - blk_mq_do_dispatch_sched(hctx);
> > + dispatch_ret = blk_mq_do_dispatch_sched(hctx);
> > } else if (hctx->dispatch_busy) {
> > /* dequeue request one by one from sw queue if queue is busy */
> > - blk_mq_do_dispatch_ctx(hctx);
> > + dispatch_ret = blk_mq_do_dispatch_ctx(hctx);
> > } else {
> > blk_mq_flush_busy_ctxs(hctx, &rq_list);
> > - blk_mq_dispatch_rq_list(q, &rq_list, false);
> > + dispatch_ret = blk_mq_dispatch_rq_list(q, &rq_list, false);
> > + }
> > +
> > + if (dispatch_ret) {
> > + if (hctx->flags & BLK_MQ_F_FORCE_COMMIT_RQS)
> > + hctx->queue->mq_ops->commit_rqs(hctx);
> > }
> > }
> >
> > diff --git a/block/blk-mq.c b/block/blk-mq.c
> > index 87c6699f35ae..9b46f5d6c7fd 100644
> > --- a/block/blk-mq.c
> > +++ b/block/blk-mq.c
> > @@ -1238,11 +1238,15 @@ bool blk_mq_dispatch_rq_list(struct request_queue *q, struct list_head *list,
> > * Flag last if we have no more requests, or if we have more
> > * but can't assign a driver tag to it.
> > */
> > - if (list_empty(list))
> > - bd.last = true;
> > - else {
> > - nxt = list_first_entry(list, struct request, queuelist);
> > - bd.last = !blk_mq_get_driver_tag(nxt);
> > + if (!(hctx->flags & BLK_MQ_F_FORCE_COMMIT_RQS)) {
> > + if (list_empty(list))
> > + bd.last = true;
> > + else {
> > + nxt = list_first_entry(list, struct request, queuelist);
> > + bd.last = !blk_mq_get_driver_tag(nxt);
> > + }
> > + } else {
> > + bd.last = false;

If the BLK_MQ_F_FORCE_COMMIT_RQS flag is enabled, we will always get
bd.last = false, even for the real last request in the IO scheduler. I
know you already use commit_rqs() to kick the driver, but I am worried
whether it is reasonable that drivers always see bd.last = false (see
the sketch below).
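
For context, this is roughly how a driver consumes bd->last today (a generic
sketch with hypothetical foo_*() helpers, not any specific driver's code):

	static blk_status_t foo_queue_rq(struct blk_mq_hw_ctx *hctx,
					 const struct blk_mq_queue_data *bd)
	{
		foo_add_to_ring(hctx->driver_data, bd->rq);
		if (bd->last)		/* always false once BLK_MQ_F_FORCE_COMMIT_RQS is set */
			foo_kick_hw(hctx->driver_data);	/* so the kick has to move into .commit_rqs() */
		return BLK_STS_OK;
	}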

Thanks for your approach.

> > }
> >
> > ret = q->mq_ops->queue_rq(hctx, &bd);
> > diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
> > index 07fa767bff86..c0ef6990b698 100644
> > --- a/include/linux/blk-mq.h
> > +++ b/include/linux/blk-mq.h
> > @@ -394,6 +394,7 @@ enum {
> > BLK_MQ_F_SHOULD_MERGE = 1 << 0,
> > BLK_MQ_F_TAG_SHARED = 1 << 1,
> > BLK_MQ_F_NO_MANAGED_IRQ = 1 << 2,
> > + BLK_MQ_F_FORCE_COMMIT_RQS = 1 << 3,
> > BLK_MQ_F_BLOCKING = 1 << 5,
> > BLK_MQ_F_NO_SCHED = 1 << 6,
> > BLK_MQ_F_ALLOC_POLICY_START_BIT = 8,
> >
> > Thanks,
> > Ming
> >
>
>
> --
> Baolin Wang



--
Baolin Wang

2020-03-23 03:46:41

by Ming Lei

[permalink] [raw]
Subject: Re: [RESEND RFC PATCH 2/8] block: Allow sending a batch of requests from the scheduler to hardware

On Fri, Mar 20, 2020 at 06:27:41PM +0800, Baolin Wang wrote:
> Hi Ming,
>
> On Wed, Mar 18, 2020 at 6:26 PM Baolin Wang <[email protected]> wrote:
> >
> > Hi Ming,
> >
> > On Wed, Mar 18, 2020 at 6:01 PM Ming Lei <[email protected]> wrote:
> > >
> > > On Mon, Mar 16, 2020 at 06:01:19PM +0800, Baolin Wang wrote:
> > > > As we know, some SD/MMC host controllers can support packed request,
> > > > that means we can package several requests to host controller at one
> > > > time to improve performence. So the hardware driver expects the blk-mq
> > > > can dispatch a batch of requests at one time, and driver can use bd.last
> > > > to indicate if it is the last request in the batch to help to combine
> > > > requests as much as possible.
> > > >
> > > > Thus we should add batch requests setting from the block driver to tell
> > > > the scheduler how many requests can be dispatched in a batch, as well
> > > > as changing the scheduler to dispatch more than one request if setting
> > > > the maximum batch requests number.
> > > >
> > >
> > > I feel this batch dispatch style is more complicated, and some other
> > > drivers(virtio blk/scsi) still may get benefit if we can pass real 'last' flag in
> > > .queue_rq().
> > >
> > > So what about the following way by extending .commit_rqs() to this usage?
> > > And you can do whatever batch processing in .commit_rqs() which will be
> > > guaranteed to be called if BLK_MQ_F_FORCE_COMMIT_RQS is set by driver.
> >
> > I'm very appreciated for your good suggestion, which is much simpler than mine.
> > It seems to solve my problem, and I will try it on my platform to see
> > if it can work and give you the feadback. Thanks again.
>
> I tried your approach on my platform, but met some problems, see below.
>
> >
> > > diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
> > > index 856356b1619e..cd2bbe56f83f 100644
> > > --- a/block/blk-mq-sched.c
> > > +++ b/block/blk-mq-sched.c
> > > @@ -85,11 +85,12 @@ void blk_mq_sched_restart(struct blk_mq_hw_ctx *hctx)
> > > * its queue by itself in its completion handler, so we don't need to
> > > * restart queue if .get_budget() returns BLK_STS_NO_RESOURCE.
> > > */
> > > -static void blk_mq_do_dispatch_sched(struct blk_mq_hw_ctx *hctx)
> > > +static bool blk_mq_do_dispatch_sched(struct blk_mq_hw_ctx *hctx)
> > > {
> > > struct request_queue *q = hctx->queue;
> > > struct elevator_queue *e = q->elevator;
> > > LIST_HEAD(rq_list);
> > > + bool ret = false;
> > >
> > > do {
> > > struct request *rq;
> > > @@ -112,7 +113,10 @@ static void blk_mq_do_dispatch_sched(struct blk_mq_hw_ctx *hctx)
> > > * in blk_mq_dispatch_rq_list().
> > > */
> > > list_add(&rq->queuelist, &rq_list);
> > > - } while (blk_mq_dispatch_rq_list(q, &rq_list, true));
> > > + ret = blk_mq_dispatch_rq_list(q, &rq_list, true);
> > > + } while (ret);
> > > +
> > > + return ret;
> > > }
> > >
> > > static struct blk_mq_ctx *blk_mq_next_ctx(struct blk_mq_hw_ctx *hctx,
> > > @@ -131,11 +135,12 @@ static struct blk_mq_ctx *blk_mq_next_ctx(struct blk_mq_hw_ctx *hctx,
> > > * its queue by itself in its completion handler, so we don't need to
> > > * restart queue if .get_budget() returns BLK_STS_NO_RESOURCE.
> > > */
> > > -static void blk_mq_do_dispatch_ctx(struct blk_mq_hw_ctx *hctx)
> > > +static bool blk_mq_do_dispatch_ctx(struct blk_mq_hw_ctx *hctx)
> > > {
> > > struct request_queue *q = hctx->queue;
> > > LIST_HEAD(rq_list);
> > > struct blk_mq_ctx *ctx = READ_ONCE(hctx->dispatch_from);
> > > + bool ret = false;
> > >
> > > do {
> > > struct request *rq;
> > > @@ -161,10 +166,12 @@ static void blk_mq_do_dispatch_ctx(struct blk_mq_hw_ctx *hctx)
> > >
> > > /* round robin for fair dispatch */
> > > ctx = blk_mq_next_ctx(hctx, rq->mq_ctx);
> > > -
> > > - } while (blk_mq_dispatch_rq_list(q, &rq_list, true));
> > > + ret = blk_mq_dispatch_rq_list(q, &rq_list, true);
> > > + } while (ret);
> > >
> > > WRITE_ONCE(hctx->dispatch_from, ctx);
> > > +
> > > + return ret;
> > > }
> > >
> > > void blk_mq_sched_dispatch_requests(struct blk_mq_hw_ctx *hctx)
> > > @@ -173,6 +180,7 @@ void blk_mq_sched_dispatch_requests(struct blk_mq_hw_ctx *hctx)
> > > struct elevator_queue *e = q->elevator;
> > > const bool has_sched_dispatch = e && e->type->ops.dispatch_request;
> > > LIST_HEAD(rq_list);
> > > + bool dispatch_ret;
> > >
> > > /* RCU or SRCU read lock is needed before checking quiesced flag */
> > > if (unlikely(blk_mq_hctx_stopped(hctx) || blk_queue_quiesced(q)))
> > > @@ -206,20 +214,26 @@ void blk_mq_sched_dispatch_requests(struct blk_mq_hw_ctx *hctx)
> > > */
> > > if (!list_empty(&rq_list)) {
> > > blk_mq_sched_mark_restart_hctx(hctx);
> > > - if (blk_mq_dispatch_rq_list(q, &rq_list, false)) {
> > > + dispatch_ret = blk_mq_dispatch_rq_list(q, &rq_list, false);
> > > + if (dispatch_ret) {
> > > if (has_sched_dispatch)
> > > - blk_mq_do_dispatch_sched(hctx);
> > > + dispatch_ret = blk_mq_do_dispatch_sched(hctx);
>
> If we dispatched a request successfully by blk_mq_dispatch_rq_list(),
> and got dispatch_ret = true now. Then we will try to dispatch more
> reuqests from scheduler by blk_mq_do_dispatch_sched(), but if now no
> more requests in scheduler, then we will got dispatch_ret = false. In

'dispatch_ret' always holds the result of the last blk_mq_do_dispatch_sched().
When any request has been dispatched successfully, 'dispatch_ret' is true.
A new request is always added to the list before calling
blk_mq_do_dispatch_sched(), so once blk_mq_do_dispatch_sched() returns
false, it means that .commit_rqs() has already been called.

> this case, we will not issue commit_rqs() to tell the hardware to
> handle previous request dispatched from &rq_list.
>
> So I think we should not overlap the 'dispatch_ret'? Or do you have
> any other thoughts to fix?
>
> > > else
> > > - blk_mq_do_dispatch_ctx(hctx);
> > > + dispatch_ret = blk_mq_do_dispatch_ctx(hctx);
> > > }
> > > } else if (has_sched_dispatch) {
> > > - blk_mq_do_dispatch_sched(hctx);
> > > + dispatch_ret = blk_mq_do_dispatch_sched(hctx);
> > > } else if (hctx->dispatch_busy) {
> > > /* dequeue request one by one from sw queue if queue is busy */
> > > - blk_mq_do_dispatch_ctx(hctx);
> > > + dispatch_ret = blk_mq_do_dispatch_ctx(hctx);
> > > } else {
> > > blk_mq_flush_busy_ctxs(hctx, &rq_list);
> > > - blk_mq_dispatch_rq_list(q, &rq_list, false);
> > > + dispatch_ret = blk_mq_dispatch_rq_list(q, &rq_list, false);
> > > + }
> > > +
> > > + if (dispatch_ret) {
> > > + if (hctx->flags & BLK_MQ_F_FORCE_COMMIT_RQS)
> > > + hctx->queue->mq_ops->commit_rqs(hctx);
> > > }
> > > }
> > >
> > > diff --git a/block/blk-mq.c b/block/blk-mq.c
> > > index 87c6699f35ae..9b46f5d6c7fd 100644
> > > --- a/block/blk-mq.c
> > > +++ b/block/blk-mq.c
> > > @@ -1238,11 +1238,15 @@ bool blk_mq_dispatch_rq_list(struct request_queue *q, struct list_head *list,
> > > * Flag last if we have no more requests, or if we have more
> > > * but can't assign a driver tag to it.
> > > */
> > > - if (list_empty(list))
> > > - bd.last = true;
> > > - else {
> > > - nxt = list_first_entry(list, struct request, queuelist);
> > > - bd.last = !blk_mq_get_driver_tag(nxt);
> > > + if (!(hctx->flags & BLK_MQ_F_FORCE_COMMIT_RQS)) {
> > > + if (list_empty(list))
> > > + bd.last = true;
> > > + else {
> > > + nxt = list_first_entry(list, struct request, queuelist);
> > > + bd.last = !blk_mq_get_driver_tag(nxt);
> > > + }
> > > + } else {
> > > + bd.last = false;
>
> If we enabled BLK_MQ_F_FORCE_COMMIT_RQS flag, we will always get
> bd.last = false even for the real last request in the IO scheduler. I
> know you already use commit_irqs() to help to kick driver. But I
> worried if it is reasonable that drivers always get bd.last = false.
>

BLK_MQ_F_FORCE_COMMIT_RQS means the .last flag is ignored, and we can
document this usage.
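
Something like the following comment next to the flag definition would
probably be enough (the wording is only a suggestion):

	/*
	 * BLK_MQ_F_FORCE_COMMIT_RQS: the driver ignores bd->last in .queue_rq()
	 * and relies on .commit_rqs() being called after each dispatch batch to
	 * kick the hardware.
	 */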


Thanks,
Ming

2020-03-23 05:37:28

by Baolin Wang

[permalink] [raw]
Subject: Re: [RESEND RFC PATCH 2/8] block: Allow sending a batch of requests from the scheduler to hardware

On Mon, Mar 23, 2020 at 11:44 AM Ming Lei <[email protected]> wrote:
>
> On Fri, Mar 20, 2020 at 06:27:41PM +0800, Baolin Wang wrote:
> > Hi Ming,
> >
> > On Wed, Mar 18, 2020 at 6:26 PM Baolin Wang <[email protected]> wrote:
> > >
> > > Hi Ming,
> > >
> > > On Wed, Mar 18, 2020 at 6:01 PM Ming Lei <[email protected]> wrote:
> > > >
> > > > On Mon, Mar 16, 2020 at 06:01:19PM +0800, Baolin Wang wrote:
> > > > > As we know, some SD/MMC host controllers can support packed request,
> > > > > that means we can package several requests to host controller at one
> > > > > time to improve performence. So the hardware driver expects the blk-mq
> > > > > can dispatch a batch of requests at one time, and driver can use bd.last
> > > > > to indicate if it is the last request in the batch to help to combine
> > > > > requests as much as possible.
> > > > >
> > > > > Thus we should add batch requests setting from the block driver to tell
> > > > > the scheduler how many requests can be dispatched in a batch, as well
> > > > > as changing the scheduler to dispatch more than one request if setting
> > > > > the maximum batch requests number.
> > > > >
> > > >
> > > > I feel this batch dispatch style is more complicated, and some other
> > > > drivers(virtio blk/scsi) still may get benefit if we can pass real 'last' flag in
> > > > .queue_rq().
> > > >
> > > > So what about the following way by extending .commit_rqs() to this usage?
> > > > And you can do whatever batch processing in .commit_rqs() which will be
> > > > guaranteed to be called if BLK_MQ_F_FORCE_COMMIT_RQS is set by driver.
> > >
> > > I'm very appreciated for your good suggestion, which is much simpler than mine.
> > > It seems to solve my problem, and I will try it on my platform to see
> > > if it can work and give you the feadback. Thanks again.
> >
> > I tried your approach on my platform, but met some problems, see below.
> >
> > >
> > > > diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
> > > > index 856356b1619e..cd2bbe56f83f 100644
> > > > --- a/block/blk-mq-sched.c
> > > > +++ b/block/blk-mq-sched.c
> > > > @@ -85,11 +85,12 @@ void blk_mq_sched_restart(struct blk_mq_hw_ctx *hctx)
> > > > * its queue by itself in its completion handler, so we don't need to
> > > > * restart queue if .get_budget() returns BLK_STS_NO_RESOURCE.
> > > > */
> > > > -static void blk_mq_do_dispatch_sched(struct blk_mq_hw_ctx *hctx)
> > > > +static bool blk_mq_do_dispatch_sched(struct blk_mq_hw_ctx *hctx)
> > > > {
> > > > struct request_queue *q = hctx->queue;
> > > > struct elevator_queue *e = q->elevator;
> > > > LIST_HEAD(rq_list);
> > > > + bool ret = false;
> > > >
> > > > do {
> > > > struct request *rq;
> > > > @@ -112,7 +113,10 @@ static void blk_mq_do_dispatch_sched(struct blk_mq_hw_ctx *hctx)
> > > > * in blk_mq_dispatch_rq_list().
> > > > */
> > > > list_add(&rq->queuelist, &rq_list);
> > > > - } while (blk_mq_dispatch_rq_list(q, &rq_list, true));
> > > > + ret = blk_mq_dispatch_rq_list(q, &rq_list, true);
> > > > + } while (ret);
> > > > +
> > > > + return ret;
> > > > }
> > > >
> > > > static struct blk_mq_ctx *blk_mq_next_ctx(struct blk_mq_hw_ctx *hctx,
> > > > @@ -131,11 +135,12 @@ static struct blk_mq_ctx *blk_mq_next_ctx(struct blk_mq_hw_ctx *hctx,
> > > > * its queue by itself in its completion handler, so we don't need to
> > > > * restart queue if .get_budget() returns BLK_STS_NO_RESOURCE.
> > > > */
> > > > -static void blk_mq_do_dispatch_ctx(struct blk_mq_hw_ctx *hctx)
> > > > +static bool blk_mq_do_dispatch_ctx(struct blk_mq_hw_ctx *hctx)
> > > > {
> > > > struct request_queue *q = hctx->queue;
> > > > LIST_HEAD(rq_list);
> > > > struct blk_mq_ctx *ctx = READ_ONCE(hctx->dispatch_from);
> > > > + bool ret = false;
> > > >
> > > > do {
> > > > struct request *rq;
> > > > @@ -161,10 +166,12 @@ static void blk_mq_do_dispatch_ctx(struct blk_mq_hw_ctx *hctx)
> > > >
> > > > /* round robin for fair dispatch */
> > > > ctx = blk_mq_next_ctx(hctx, rq->mq_ctx);
> > > > -
> > > > - } while (blk_mq_dispatch_rq_list(q, &rq_list, true));
> > > > + ret = blk_mq_dispatch_rq_list(q, &rq_list, true);
> > > > + } while (ret);
> > > >
> > > > WRITE_ONCE(hctx->dispatch_from, ctx);
> > > > +
> > > > + return ret;
> > > > }
> > > >
> > > > void blk_mq_sched_dispatch_requests(struct blk_mq_hw_ctx *hctx)
> > > > @@ -173,6 +180,7 @@ void blk_mq_sched_dispatch_requests(struct blk_mq_hw_ctx *hctx)
> > > > struct elevator_queue *e = q->elevator;
> > > > const bool has_sched_dispatch = e && e->type->ops.dispatch_request;
> > > > LIST_HEAD(rq_list);
> > > > + bool dispatch_ret;
> > > >
> > > > /* RCU or SRCU read lock is needed before checking quiesced flag */
> > > > if (unlikely(blk_mq_hctx_stopped(hctx) || blk_queue_quiesced(q)))
> > > > @@ -206,20 +214,26 @@ void blk_mq_sched_dispatch_requests(struct blk_mq_hw_ctx *hctx)
> > > > */
> > > > if (!list_empty(&rq_list)) {
> > > > blk_mq_sched_mark_restart_hctx(hctx);
> > > > - if (blk_mq_dispatch_rq_list(q, &rq_list, false)) {
> > > > + dispatch_ret = blk_mq_dispatch_rq_list(q, &rq_list, false);
> > > > + if (dispatch_ret) {
> > > > if (has_sched_dispatch)
> > > > - blk_mq_do_dispatch_sched(hctx);
> > > > + dispatch_ret = blk_mq_do_dispatch_sched(hctx);
> >
> > If we dispatched a request successfully by blk_mq_dispatch_rq_list(),
> > and got dispatch_ret = true now. Then we will try to dispatch more
> > reuqests from scheduler by blk_mq_do_dispatch_sched(), but if now no
> > more requests in scheduler, then we will got dispatch_ret = false. In
>
> 'dispatch_ret' always holds result of the last blk_mq_do_dispatch_sched().
> When any one request has been dispatched successfully, 'dispatch_ret'
> is true. New request is always added to list before calling
> blk_mq_do_dispatch_sched(), so once blk_mq_do_dispatch_sched() returns
> false, it means that .commit_rqs() has been called.

Not really. If there are no requests in the IO scheduler, we will break out
of the loop in blk_mq_do_dispatch_sched() and return false without calling
.commit_rqs().

So in this case blk_mq_do_dispatch_sched() returns 'false', which overwrites
the 'true' returned by blk_mq_dispatch_rq_list(), and .commit_rqs() is never
called. Then the IO processing gets stuck.

static void blk_mq_do_dispatch_sched(struct blk_mq_hw_ctx *hctx)
{
struct request_queue *q = hctx->queue;
struct elevator_queue *e = q->elevator;
LIST_HEAD(rq_list);
bool ret = false;

do {
struct request *rq;

if (e->type->ops.has_work && !e->type->ops.has_work(hctx))
break;

.......
} while (ret);

return ret;
}

>
> > this case, we will not issue commit_rqs() to tell the hardware to
> > handle previous request dispatched from &rq_list.
> >
> > So I think we should not overlap the 'dispatch_ret'? Or do you have
> > any other thoughts to fix?
> >
> > > > else
> > > > - blk_mq_do_dispatch_ctx(hctx);
> > > > + dispatch_ret = blk_mq_do_dispatch_ctx(hctx);
> > > > }
> > > > } else if (has_sched_dispatch) {
> > > > - blk_mq_do_dispatch_sched(hctx);
> > > > + dispatch_ret = blk_mq_do_dispatch_sched(hctx);
> > > > } else if (hctx->dispatch_busy) {
> > > > /* dequeue request one by one from sw queue if queue is busy */
> > > > - blk_mq_do_dispatch_ctx(hctx);
> > > > + dispatch_ret = blk_mq_do_dispatch_ctx(hctx);
> > > > } else {
> > > > blk_mq_flush_busy_ctxs(hctx, &rq_list);
> > > > - blk_mq_dispatch_rq_list(q, &rq_list, false);
> > > > + dispatch_ret = blk_mq_dispatch_rq_list(q, &rq_list, false);
> > > > + }
> > > > +
> > > > + if (dispatch_ret) {
> > > > + if (hctx->flags & BLK_MQ_F_FORCE_COMMIT_RQS)
> > > > + hctx->queue->mq_ops->commit_rqs(hctx);
> > > > }
> > > > }
> > > >
> > > > diff --git a/block/blk-mq.c b/block/blk-mq.c
> > > > index 87c6699f35ae..9b46f5d6c7fd 100644
> > > > --- a/block/blk-mq.c
> > > > +++ b/block/blk-mq.c
> > > > @@ -1238,11 +1238,15 @@ bool blk_mq_dispatch_rq_list(struct request_queue *q, struct list_head *list,
> > > > * Flag last if we have no more requests, or if we have more
> > > > * but can't assign a driver tag to it.
> > > > */
> > > > - if (list_empty(list))
> > > > - bd.last = true;
> > > > - else {
> > > > - nxt = list_first_entry(list, struct request, queuelist);
> > > > - bd.last = !blk_mq_get_driver_tag(nxt);
> > > > + if (!(hctx->flags & BLK_MQ_F_FORCE_COMMIT_RQS)) {
> > > > + if (list_empty(list))
> > > > + bd.last = true;
> > > > + else {
> > > > + nxt = list_first_entry(list, struct request, queuelist);
> > > > + bd.last = !blk_mq_get_driver_tag(nxt);
> > > > + }
> > > > + } else {
> > > > + bd.last = false;
> >
> > If we enabled BLK_MQ_F_FORCE_COMMIT_RQS flag, we will always get
> > bd.last = false even for the real last request in the IO scheduler. I
> > know you already use commit_irqs() to help to kick driver. But I
> > worried if it is reasonable that drivers always get bd.last = false.
> >
>
> BLK_MQ_F_FORCE_COMMIT_RQS means the .last flag is ignored, and we can
> document this usage.

OK. Thanks for your comments.

--
Baolin Wang

2020-03-23 07:29:58

by Ming Lei

[permalink] [raw]
Subject: Re: [RESEND RFC PATCH 2/8] block: Allow sending a batch of requests from the scheduler to hardware

On Mon, Mar 23, 2020 at 01:36:34PM +0800, Baolin Wang wrote:
> On Mon, Mar 23, 2020 at 11:44 AM Ming Lei <[email protected]> wrote:
> >
> > On Fri, Mar 20, 2020 at 06:27:41PM +0800, Baolin Wang wrote:
> > > Hi Ming,
> > >
> > > On Wed, Mar 18, 2020 at 6:26 PM Baolin Wang <[email protected]> wrote:
> > > >
> > > > Hi Ming,
> > > >
> > > > On Wed, Mar 18, 2020 at 6:01 PM Ming Lei <[email protected]> wrote:
> > > > >
> > > > > On Mon, Mar 16, 2020 at 06:01:19PM +0800, Baolin Wang wrote:
> > > > > > As we know, some SD/MMC host controllers can support packed request,
> > > > > > that means we can package several requests to host controller at one
> > > > > > time to improve performence. So the hardware driver expects the blk-mq
> > > > > > can dispatch a batch of requests at one time, and driver can use bd.last
> > > > > > to indicate if it is the last request in the batch to help to combine
> > > > > > requests as much as possible.
> > > > > >
> > > > > > Thus we should add batch requests setting from the block driver to tell
> > > > > > the scheduler how many requests can be dispatched in a batch, as well
> > > > > > as changing the scheduler to dispatch more than one request if setting
> > > > > > the maximum batch requests number.
> > > > > >
> > > > >
> > > > > I feel this batch dispatch style is more complicated, and some other
> > > > > drivers(virtio blk/scsi) still may get benefit if we can pass real 'last' flag in
> > > > > .queue_rq().
> > > > >
> > > > > So what about the following way by extending .commit_rqs() to this usage?
> > > > > And you can do whatever batch processing in .commit_rqs() which will be
> > > > > guaranteed to be called if BLK_MQ_F_FORCE_COMMIT_RQS is set by driver.
> > > >
> > > > I'm very appreciated for your good suggestion, which is much simpler than mine.
> > > > It seems to solve my problem, and I will try it on my platform to see
> > > > if it can work and give you the feadback. Thanks again.
> > >
> > > I tried your approach on my platform, but met some problems, see below.
> > >
> > > >
> > > > > diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
> > > > > index 856356b1619e..cd2bbe56f83f 100644
> > > > > --- a/block/blk-mq-sched.c
> > > > > +++ b/block/blk-mq-sched.c
> > > > > @@ -85,11 +85,12 @@ void blk_mq_sched_restart(struct blk_mq_hw_ctx *hctx)
> > > > > * its queue by itself in its completion handler, so we don't need to
> > > > > * restart queue if .get_budget() returns BLK_STS_NO_RESOURCE.
> > > > > */
> > > > > -static void blk_mq_do_dispatch_sched(struct blk_mq_hw_ctx *hctx)
> > > > > +static bool blk_mq_do_dispatch_sched(struct blk_mq_hw_ctx *hctx)
> > > > > {
> > > > > struct request_queue *q = hctx->queue;
> > > > > struct elevator_queue *e = q->elevator;
> > > > > LIST_HEAD(rq_list);
> > > > > + bool ret = false;
> > > > >
> > > > > do {
> > > > > struct request *rq;
> > > > > @@ -112,7 +113,10 @@ static void blk_mq_do_dispatch_sched(struct blk_mq_hw_ctx *hctx)
> > > > > * in blk_mq_dispatch_rq_list().
> > > > > */
> > > > > list_add(&rq->queuelist, &rq_list);
> > > > > - } while (blk_mq_dispatch_rq_list(q, &rq_list, true));
> > > > > + ret = blk_mq_dispatch_rq_list(q, &rq_list, true);
> > > > > + } while (ret);
> > > > > +
> > > > > + return ret;
> > > > > }
> > > > >
> > > > > static struct blk_mq_ctx *blk_mq_next_ctx(struct blk_mq_hw_ctx *hctx,
> > > > > @@ -131,11 +135,12 @@ static struct blk_mq_ctx *blk_mq_next_ctx(struct blk_mq_hw_ctx *hctx,
> > > > > * its queue by itself in its completion handler, so we don't need to
> > > > > * restart queue if .get_budget() returns BLK_STS_NO_RESOURCE.
> > > > > */
> > > > > -static void blk_mq_do_dispatch_ctx(struct blk_mq_hw_ctx *hctx)
> > > > > +static bool blk_mq_do_dispatch_ctx(struct blk_mq_hw_ctx *hctx)
> > > > > {
> > > > > struct request_queue *q = hctx->queue;
> > > > > LIST_HEAD(rq_list);
> > > > > struct blk_mq_ctx *ctx = READ_ONCE(hctx->dispatch_from);
> > > > > + bool ret = false;
> > > > >
> > > > > do {
> > > > > struct request *rq;
> > > > > @@ -161,10 +166,12 @@ static void blk_mq_do_dispatch_ctx(struct blk_mq_hw_ctx *hctx)
> > > > >
> > > > > /* round robin for fair dispatch */
> > > > > ctx = blk_mq_next_ctx(hctx, rq->mq_ctx);
> > > > > -
> > > > > - } while (blk_mq_dispatch_rq_list(q, &rq_list, true));
> > > > > + ret = blk_mq_dispatch_rq_list(q, &rq_list, true);
> > > > > + } while (ret);
> > > > >
> > > > > WRITE_ONCE(hctx->dispatch_from, ctx);
> > > > > +
> > > > > + return ret;
> > > > > }
> > > > >
> > > > > void blk_mq_sched_dispatch_requests(struct blk_mq_hw_ctx *hctx)
> > > > > @@ -173,6 +180,7 @@ void blk_mq_sched_dispatch_requests(struct blk_mq_hw_ctx *hctx)
> > > > > struct elevator_queue *e = q->elevator;
> > > > > const bool has_sched_dispatch = e && e->type->ops.dispatch_request;
> > > > > LIST_HEAD(rq_list);
> > > > > + bool dispatch_ret;
> > > > >
> > > > > /* RCU or SRCU read lock is needed before checking quiesced flag */
> > > > > if (unlikely(blk_mq_hctx_stopped(hctx) || blk_queue_quiesced(q)))
> > > > > @@ -206,20 +214,26 @@ void blk_mq_sched_dispatch_requests(struct blk_mq_hw_ctx *hctx)
> > > > > */
> > > > > if (!list_empty(&rq_list)) {
> > > > > blk_mq_sched_mark_restart_hctx(hctx);
> > > > > - if (blk_mq_dispatch_rq_list(q, &rq_list, false)) {
> > > > > + dispatch_ret = blk_mq_dispatch_rq_list(q, &rq_list, false);
> > > > > + if (dispatch_ret) {
> > > > > if (has_sched_dispatch)
> > > > > - blk_mq_do_dispatch_sched(hctx);
> > > > > + dispatch_ret = blk_mq_do_dispatch_sched(hctx);
> > >
> > > If we dispatched a request successfully by blk_mq_dispatch_rq_list(),
> > > and got dispatch_ret = true now. Then we will try to dispatch more
> > > reuqests from scheduler by blk_mq_do_dispatch_sched(), but if now no
> > > more requests in scheduler, then we will got dispatch_ret = false. In
> >
> > 'dispatch_ret' always holds result of the last blk_mq_do_dispatch_sched().
> > When any one request has been dispatched successfully, 'dispatch_ret'
> > is true. New request is always added to list before calling
> > blk_mq_do_dispatch_sched(), so once blk_mq_do_dispatch_sched() returns
> > false, it means that .commit_rqs() has been called.
>
> Not really: if there are no requests in the IO scheduler, we will break
> the loop in blk_mq_do_dispatch_sched() and return false without calling
> .commit_rqs().

If there isn't any request to dispatch, false is returned. Otherwise, the
return value of the last blk_mq_dispatch_rq_list() call is returned.

>
> So in this case, blk_mq_do_dispatch_sched() will return 'false', which
> overwrites the 'true' returned earlier by blk_mq_dispatch_rq_list(),
> and .commit_rqs() is not called. Then the IO processing will get stuck.

See below.

>
> static void blk_mq_do_dispatch_sched(struct blk_mq_hw_ctx *hctx)
> {
> struct request_queue *q = hctx->queue;
> struct elevator_queue *e = q->elevator;
> LIST_HEAD(rq_list);
> bool ret = false;

The above initialization is just done once.

>
> do {
> struct request *rq;
>
> if (e->type->ops.has_work && !e->type->ops.has_work(hctx))
> break;
>
> .......
ret = blk_mq_dispatch_rq_list(q, list, ...);

The list includes one request, so blk_mq_dispatch_rq_list() won't return
false because of an empty list.

> } while (ret);
>
> return ret;

'ret' always holds the return value of the last blk_mq_dispatch_rq_list()
call if at least one request is dispatched. So if it becomes false, the
loop breaks, and that means .commit_rqs() has been called, because 'list'
did include one request when blk_mq_dispatch_rq_list() was called.
Otherwise, true is still returned.
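
To restate that invariant as comments on the loop (this is just the hunk
above with annotations; the budget/has_work checks are omitted):

	do {
		rq = e->type->ops.dispatch_request(hctx);
		if (!rq)
			/* nothing fetched: 'ret' keeps the result of the
			 * previous round, or stays false if nothing was
			 * ever dispatched */
			break;
		list_add(&rq->queuelist, &rq_list);
		/*
		 * rq_list is never empty here, so when this returns false
		 * it is because the driver could not take the request,
		 * and per the reasoning above .commit_rqs() has already
		 * been called on that path.
		 */
		ret = blk_mq_dispatch_rq_list(q, &rq_list, true);
	} while (ret);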

thanks,
Ming

2020-03-23 08:23:27

by Baolin Wang

[permalink] [raw]
Subject: Re: [RESEND RFC PATCH 2/8] block: Allow sending a batch of requests from the scheduler to hardware

On Mon, Mar 23, 2020 at 3:27 PM Ming Lei <[email protected]> wrote:
>
> On Mon, Mar 23, 2020 at 01:36:34PM +0800, Baolin Wang wrote:
> > On Mon, Mar 23, 2020 at 11:44 AM Ming Lei <[email protected]> wrote:
> > >
> > > On Fri, Mar 20, 2020 at 06:27:41PM +0800, Baolin Wang wrote:
> > > > Hi Ming,
> > > >
> > > > On Wed, Mar 18, 2020 at 6:26 PM Baolin Wang <[email protected]> wrote:
> > > > >
> > > > > Hi Ming,
> > > > >
> > > > > On Wed, Mar 18, 2020 at 6:01 PM Ming Lei <[email protected]> wrote:
> > > > > >
> > > > > > On Mon, Mar 16, 2020 at 06:01:19PM +0800, Baolin Wang wrote:
> > > > > > > As we know, some SD/MMC host controllers can support packed request,
> > > > > > > that means we can package several requests to host controller at one
> > > > > > > time to improve performence. So the hardware driver expects the blk-mq
> > > > > > > can dispatch a batch of requests at one time, and driver can use bd.last
> > > > > > > to indicate if it is the last request in the batch to help to combine
> > > > > > > requests as much as possible.
> > > > > > >
> > > > > > > Thus we should add batch requests setting from the block driver to tell
> > > > > > > the scheduler how many requests can be dispatched in a batch, as well
> > > > > > > as changing the scheduler to dispatch more than one request if setting
> > > > > > > the maximum batch requests number.
> > > > > > >
> > > > > >
> > > > > > I feel this batch dispatch style is more complicated, and some other
> > > > > > drivers(virtio blk/scsi) still may get benefit if we can pass real 'last' flag in
> > > > > > .queue_rq().
> > > > > >
> > > > > > So what about the following way by extending .commit_rqs() to this usage?
> > > > > > And you can do whatever batch processing in .commit_rqs() which will be
> > > > > > guaranteed to be called if BLK_MQ_F_FORCE_COMMIT_RQS is set by driver.
> > > > >
> > > > > I'm very appreciated for your good suggestion, which is much simpler than mine.
> > > > > It seems to solve my problem, and I will try it on my platform to see
> > > > > if it can work and give you the feadback. Thanks again.
> > > >
> > > > I tried your approach on my platform, but met some problems, see below.
> > > >
> > > > >
> > > > > > diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
> > > > > > index 856356b1619e..cd2bbe56f83f 100644
> > > > > > --- a/block/blk-mq-sched.c
> > > > > > +++ b/block/blk-mq-sched.c
> > > > > > @@ -85,11 +85,12 @@ void blk_mq_sched_restart(struct blk_mq_hw_ctx *hctx)
> > > > > > * its queue by itself in its completion handler, so we don't need to
> > > > > > * restart queue if .get_budget() returns BLK_STS_NO_RESOURCE.
> > > > > > */
> > > > > > -static void blk_mq_do_dispatch_sched(struct blk_mq_hw_ctx *hctx)
> > > > > > +static bool blk_mq_do_dispatch_sched(struct blk_mq_hw_ctx *hctx)
> > > > > > {
> > > > > > struct request_queue *q = hctx->queue;
> > > > > > struct elevator_queue *e = q->elevator;
> > > > > > LIST_HEAD(rq_list);
> > > > > > + bool ret = false;
> > > > > >
> > > > > > do {
> > > > > > struct request *rq;
> > > > > > @@ -112,7 +113,10 @@ static void blk_mq_do_dispatch_sched(struct blk_mq_hw_ctx *hctx)
> > > > > > * in blk_mq_dispatch_rq_list().
> > > > > > */
> > > > > > list_add(&rq->queuelist, &rq_list);
> > > > > > - } while (blk_mq_dispatch_rq_list(q, &rq_list, true));
> > > > > > + ret = blk_mq_dispatch_rq_list(q, &rq_list, true);
> > > > > > + } while (ret);
> > > > > > +
> > > > > > + return ret;
> > > > > > }
> > > > > >
> > > > > > static struct blk_mq_ctx *blk_mq_next_ctx(struct blk_mq_hw_ctx *hctx,
> > > > > > @@ -131,11 +135,12 @@ static struct blk_mq_ctx *blk_mq_next_ctx(struct blk_mq_hw_ctx *hctx,
> > > > > > * its queue by itself in its completion handler, so we don't need to
> > > > > > * restart queue if .get_budget() returns BLK_STS_NO_RESOURCE.
> > > > > > */
> > > > > > -static void blk_mq_do_dispatch_ctx(struct blk_mq_hw_ctx *hctx)
> > > > > > +static bool blk_mq_do_dispatch_ctx(struct blk_mq_hw_ctx *hctx)
> > > > > > {
> > > > > > struct request_queue *q = hctx->queue;
> > > > > > LIST_HEAD(rq_list);
> > > > > > struct blk_mq_ctx *ctx = READ_ONCE(hctx->dispatch_from);
> > > > > > + bool ret = false;
> > > > > >
> > > > > > do {
> > > > > > struct request *rq;
> > > > > > @@ -161,10 +166,12 @@ static void blk_mq_do_dispatch_ctx(struct blk_mq_hw_ctx *hctx)
> > > > > >
> > > > > > /* round robin for fair dispatch */
> > > > > > ctx = blk_mq_next_ctx(hctx, rq->mq_ctx);
> > > > > > -
> > > > > > - } while (blk_mq_dispatch_rq_list(q, &rq_list, true));
> > > > > > + ret = blk_mq_dispatch_rq_list(q, &rq_list, true);
> > > > > > + } while (ret);
> > > > > >
> > > > > > WRITE_ONCE(hctx->dispatch_from, ctx);
> > > > > > +
> > > > > > + return ret;
> > > > > > }
> > > > > >
> > > > > > void blk_mq_sched_dispatch_requests(struct blk_mq_hw_ctx *hctx)
> > > > > > @@ -173,6 +180,7 @@ void blk_mq_sched_dispatch_requests(struct blk_mq_hw_ctx *hctx)
> > > > > > struct elevator_queue *e = q->elevator;
> > > > > > const bool has_sched_dispatch = e && e->type->ops.dispatch_request;
> > > > > > LIST_HEAD(rq_list);
> > > > > > + bool dispatch_ret;
> > > > > >
> > > > > > /* RCU or SRCU read lock is needed before checking quiesced flag */
> > > > > > if (unlikely(blk_mq_hctx_stopped(hctx) || blk_queue_quiesced(q)))
> > > > > > @@ -206,20 +214,26 @@ void blk_mq_sched_dispatch_requests(struct blk_mq_hw_ctx *hctx)
> > > > > > */
> > > > > > if (!list_empty(&rq_list)) {
> > > > > > blk_mq_sched_mark_restart_hctx(hctx);
> > > > > > - if (blk_mq_dispatch_rq_list(q, &rq_list, false)) {
> > > > > > + dispatch_ret = blk_mq_dispatch_rq_list(q, &rq_list, false);
> > > > > > + if (dispatch_ret) {
> > > > > > if (has_sched_dispatch)
> > > > > > - blk_mq_do_dispatch_sched(hctx);
> > > > > > + dispatch_ret = blk_mq_do_dispatch_sched(hctx);
> > > >
> > > > If we dispatched a request successfully by blk_mq_dispatch_rq_list(),
> > > > and got dispatch_ret = true now. Then we will try to dispatch more
> > > > reuqests from scheduler by blk_mq_do_dispatch_sched(), but if now no
> > > > more requests in scheduler, then we will got dispatch_ret = false. In
> > >
> > > 'dispatch_ret' always holds result of the last blk_mq_do_dispatch_sched().
> > > When any one request has been dispatched successfully, 'dispatch_ret'
> > > is true. New request is always added to list before calling
> > > blk_mq_do_dispatch_sched(), so once blk_mq_do_dispatch_sched() returns
> > > false, it means that .commit_rqs() has been called.
> >
> > Not really, if no requests int the IO cheduler, we will break the loop
> > in blk_mq_do_dispatch_sched() and return false without calling
> > .commit_rqs().
>
> If there isn't any request to dispatch, false is returned. Otherwise,
> always return the return value of last 'blk_mq_dispatch_rq_list'.
>
> >
> > So in this case, blk_mq_do_dispatch_sched() will return 'false', which
> > overlapped the return value of 'true' from blk_mq_dispatch_rq_list(),
> > and did not call .commit_rqs(). Then the IO processing will be stuck.
>
> See below.
>
> >
> > static void blk_mq_do_dispatch_sched(struct blk_mq_hw_ctx *hctx)
> > {
> > struct request_queue *q = hctx->queue;
> > struct elevator_queue *e = q->elevator;
> > LIST_HEAD(rq_list);
> > bool ret = false;
>
> The above initialization is just done once.
>
> >
> > do {
> > struct request *rq;
> >
> > if (e->type->ops.has_work && !e->type->ops.has_work(hctx))
> > break;
> >
> > .......
> ret = blk_mq_dispatch_rq_list(q, list, ...);
>
> list includes one request, so blk_mq_dispatch_rq_list() won't return
> false in case of no request in list.
>
> > } while (ret);
> >
> > return ret;
>
> 'ret' is always updated by return value of last blk_mq_dispatch_rq_list()
> if at least one request is dispatched. So if it becomes false, the loop
> breaks, that means .commit_rqs() has been called cause 'list' does
> include one request for blk_mq_dispatch_rq_list(). Otherwise, true is
> still returned.

Sorry for my confusing description; let me try again to describe the problem.
When I tried to mount the block device, the IO got stuck with your
patch, and I did some debugging. I found we miss calling
commit_rqs() in one case:

void blk_mq_sched_dispatch_requests(struct blk_mq_hw_ctx *hctx)
@@ -173,6 +180,7 @@ void blk_mq_sched_dispatch_requests(struct
blk_mq_hw_ctx *hctx)
struct elevator_queue *e = q->elevator;
const bool has_sched_dispatch = e && e->type->ops.dispatch_request;
LIST_HEAD(rq_list);
+ bool dispatch_ret;

/* RCU or SRCU read lock is needed before checking quiesced flag */
if (unlikely(blk_mq_hctx_stopped(hctx) || blk_queue_quiesced(q)))
@@ -206,20 +214,26 @@ void blk_mq_sched_dispatch_requests(struct
blk_mq_hw_ctx *hctx)
*/
if (!list_empty(&rq_list)) {
blk_mq_sched_mark_restart_hctx(hctx);
- if (blk_mq_dispatch_rq_list(q, &rq_list, false)) {
+ dispatch_ret = blk_mq_dispatch_rq_list(q, &rq_list, false);

Suppose we dispatch one request to the block driver, and 'true' is returned here.

+ if (dispatch_ret) {
if (has_sched_dispatch)
- blk_mq_do_dispatch_sched(hctx);
+ dispatch_ret = blk_mq_do_dispatch_sched(hctx);

Then we continue trying to dispatch more requests from the IO
scheduler, but if there are no requests in the IO scheduler now, it
returns 'false' here and sets dispatch_ret to false.

else
- blk_mq_do_dispatch_ctx(hctx);
+ dispatch_ret = blk_mq_do_dispatch_ctx(hctx);
}
} else if (has_sched_dispatch) {
- blk_mq_do_dispatch_sched(hctx);
+ dispatch_ret = blk_mq_do_dispatch_sched(hctx);
} else if (hctx->dispatch_busy) {
/* dequeue request one by one from sw queue if queue is busy */
- blk_mq_do_dispatch_ctx(hctx);
+ dispatch_ret = blk_mq_do_dispatch_ctx(hctx);
} else {
blk_mq_flush_busy_ctxs(hctx, &rq_list);
- blk_mq_dispatch_rq_list(q, &rq_list, false);
+ dispatch_ret = blk_mq_dispatch_rq_list(q, &rq_list, false);
+ }
+
+ if (dispatch_ret) {

So in this case, we will not call commit_rqs() to kick the block driver,
because dispatch_ret == false, even though we've already dispatched one
request to the block driver via blk_mq_dispatch_rq_list(), which causes
the IO to get stuck.

+ if (hctx->flags & BLK_MQ_F_FORCE_COMMIT_RQS)
+ hctx->queue->mq_ops->commit_rqs(hctx);
}
}


--
Baolin Wang

2020-03-23 08:30:55

by Ming Lei

[permalink] [raw]
Subject: Re: [RESEND RFC PATCH 2/8] block: Allow sending a batch of requests from the scheduler to hardware

On Mon, Mar 23, 2020 at 04:22:38PM +0800, Baolin Wang wrote:
> On Mon, Mar 23, 2020 at 3:27 PM Ming Lei <[email protected]> wrote:
> >
> > On Mon, Mar 23, 2020 at 01:36:34PM +0800, Baolin Wang wrote:
> > > On Mon, Mar 23, 2020 at 11:44 AM Ming Lei <[email protected]> wrote:
> > > >
> > > > On Fri, Mar 20, 2020 at 06:27:41PM +0800, Baolin Wang wrote:
> > > > > Hi Ming,
> > > > >
> > > > > On Wed, Mar 18, 2020 at 6:26 PM Baolin Wang <[email protected]> wrote:
> > > > > >
> > > > > > Hi Ming,
> > > > > >
> > > > > > On Wed, Mar 18, 2020 at 6:01 PM Ming Lei <[email protected]> wrote:
> > > > > > >
> > > > > > > On Mon, Mar 16, 2020 at 06:01:19PM +0800, Baolin Wang wrote:
> > > > > > > > As we know, some SD/MMC host controllers can support packed request,
> > > > > > > > that means we can package several requests to host controller at one
> > > > > > > > time to improve performence. So the hardware driver expects the blk-mq
> > > > > > > > can dispatch a batch of requests at one time, and driver can use bd.last
> > > > > > > > to indicate if it is the last request in the batch to help to combine
> > > > > > > > requests as much as possible.
> > > > > > > >
> > > > > > > > Thus we should add batch requests setting from the block driver to tell
> > > > > > > > the scheduler how many requests can be dispatched in a batch, as well
> > > > > > > > as changing the scheduler to dispatch more than one request if setting
> > > > > > > > the maximum batch requests number.
> > > > > > > >
> > > > > > >
> > > > > > > I feel this batch dispatch style is more complicated, and some other
> > > > > > > drivers(virtio blk/scsi) still may get benefit if we can pass real 'last' flag in
> > > > > > > .queue_rq().
> > > > > > >
> > > > > > > So what about the following way by extending .commit_rqs() to this usage?
> > > > > > > And you can do whatever batch processing in .commit_rqs() which will be
> > > > > > > guaranteed to be called if BLK_MQ_F_FORCE_COMMIT_RQS is set by driver.
> > > > > >
> > > > > > I'm very appreciated for your good suggestion, which is much simpler than mine.
> > > > > > It seems to solve my problem, and I will try it on my platform to see
> > > > > > if it can work and give you the feadback. Thanks again.
> > > > >
> > > > > I tried your approach on my platform, but met some problems, see below.
> > > > >
> > > > > >
> > > > > > > diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
> > > > > > > index 856356b1619e..cd2bbe56f83f 100644
> > > > > > > --- a/block/blk-mq-sched.c
> > > > > > > +++ b/block/blk-mq-sched.c
> > > > > > > @@ -85,11 +85,12 @@ void blk_mq_sched_restart(struct blk_mq_hw_ctx *hctx)
> > > > > > > * its queue by itself in its completion handler, so we don't need to
> > > > > > > * restart queue if .get_budget() returns BLK_STS_NO_RESOURCE.
> > > > > > > */
> > > > > > > -static void blk_mq_do_dispatch_sched(struct blk_mq_hw_ctx *hctx)
> > > > > > > +static bool blk_mq_do_dispatch_sched(struct blk_mq_hw_ctx *hctx)
> > > > > > > {
> > > > > > > struct request_queue *q = hctx->queue;
> > > > > > > struct elevator_queue *e = q->elevator;
> > > > > > > LIST_HEAD(rq_list);
> > > > > > > + bool ret = false;
> > > > > > >
> > > > > > > do {
> > > > > > > struct request *rq;
> > > > > > > @@ -112,7 +113,10 @@ static void blk_mq_do_dispatch_sched(struct blk_mq_hw_ctx *hctx)
> > > > > > > * in blk_mq_dispatch_rq_list().
> > > > > > > */
> > > > > > > list_add(&rq->queuelist, &rq_list);
> > > > > > > - } while (blk_mq_dispatch_rq_list(q, &rq_list, true));
> > > > > > > + ret = blk_mq_dispatch_rq_list(q, &rq_list, true);
> > > > > > > + } while (ret);
> > > > > > > +
> > > > > > > + return ret;
> > > > > > > }
> > > > > > >
> > > > > > > static struct blk_mq_ctx *blk_mq_next_ctx(struct blk_mq_hw_ctx *hctx,
> > > > > > > @@ -131,11 +135,12 @@ static struct blk_mq_ctx *blk_mq_next_ctx(struct blk_mq_hw_ctx *hctx,
> > > > > > > * its queue by itself in its completion handler, so we don't need to
> > > > > > > * restart queue if .get_budget() returns BLK_STS_NO_RESOURCE.
> > > > > > > */
> > > > > > > -static void blk_mq_do_dispatch_ctx(struct blk_mq_hw_ctx *hctx)
> > > > > > > +static bool blk_mq_do_dispatch_ctx(struct blk_mq_hw_ctx *hctx)
> > > > > > > {
> > > > > > > struct request_queue *q = hctx->queue;
> > > > > > > LIST_HEAD(rq_list);
> > > > > > > struct blk_mq_ctx *ctx = READ_ONCE(hctx->dispatch_from);
> > > > > > > + bool ret = false;
> > > > > > >
> > > > > > > do {
> > > > > > > struct request *rq;
> > > > > > > @@ -161,10 +166,12 @@ static void blk_mq_do_dispatch_ctx(struct blk_mq_hw_ctx *hctx)
> > > > > > >
> > > > > > > /* round robin for fair dispatch */
> > > > > > > ctx = blk_mq_next_ctx(hctx, rq->mq_ctx);
> > > > > > > -
> > > > > > > - } while (blk_mq_dispatch_rq_list(q, &rq_list, true));
> > > > > > > + ret = blk_mq_dispatch_rq_list(q, &rq_list, true);
> > > > > > > + } while (ret);
> > > > > > >
> > > > > > > WRITE_ONCE(hctx->dispatch_from, ctx);
> > > > > > > +
> > > > > > > + return ret;
> > > > > > > }
> > > > > > >
> > > > > > > void blk_mq_sched_dispatch_requests(struct blk_mq_hw_ctx *hctx)
> > > > > > > @@ -173,6 +180,7 @@ void blk_mq_sched_dispatch_requests(struct blk_mq_hw_ctx *hctx)
> > > > > > > struct elevator_queue *e = q->elevator;
> > > > > > > const bool has_sched_dispatch = e && e->type->ops.dispatch_request;
> > > > > > > LIST_HEAD(rq_list);
> > > > > > > + bool dispatch_ret;
> > > > > > >
> > > > > > > /* RCU or SRCU read lock is needed before checking quiesced flag */
> > > > > > > if (unlikely(blk_mq_hctx_stopped(hctx) || blk_queue_quiesced(q)))
> > > > > > > @@ -206,20 +214,26 @@ void blk_mq_sched_dispatch_requests(struct blk_mq_hw_ctx *hctx)
> > > > > > > */
> > > > > > > if (!list_empty(&rq_list)) {
> > > > > > > blk_mq_sched_mark_restart_hctx(hctx);
> > > > > > > - if (blk_mq_dispatch_rq_list(q, &rq_list, false)) {
> > > > > > > + dispatch_ret = blk_mq_dispatch_rq_list(q, &rq_list, false);
> > > > > > > + if (dispatch_ret) {
> > > > > > > if (has_sched_dispatch)
> > > > > > > - blk_mq_do_dispatch_sched(hctx);
> > > > > > > + dispatch_ret = blk_mq_do_dispatch_sched(hctx);
> > > > >
> > > > > If we dispatched a request successfully by blk_mq_dispatch_rq_list(),
> > > > > and got dispatch_ret = true now. Then we will try to dispatch more
> > > > > reuqests from scheduler by blk_mq_do_dispatch_sched(), but if now no
> > > > > more requests in scheduler, then we will got dispatch_ret = false. In
> > > >
> > > > 'dispatch_ret' always holds result of the last blk_mq_do_dispatch_sched().
> > > > When any one request has been dispatched successfully, 'dispatch_ret'
> > > > is true. New request is always added to list before calling
> > > > blk_mq_do_dispatch_sched(), so once blk_mq_do_dispatch_sched() returns
> > > > false, it means that .commit_rqs() has been called.
> > >
> > > Not really, if no requests int the IO cheduler, we will break the loop
> > > in blk_mq_do_dispatch_sched() and return false without calling
> > > .commit_rqs().
> >
> > If there isn't any request to dispatch, false is returned. Otherwise,
> > always return the return value of last 'blk_mq_dispatch_rq_list'.
> >
> > >
> > > So in this case, blk_mq_do_dispatch_sched() will return 'false', which
> > > overlapped the return value of 'true' from blk_mq_dispatch_rq_list(),
> > > and did not call .commit_rqs(). Then the IO processing will be stuck.
> >
> > See below.
> >
> > >
> > > static void blk_mq_do_dispatch_sched(struct blk_mq_hw_ctx *hctx)
> > > {
> > > struct request_queue *q = hctx->queue;
> > > struct elevator_queue *e = q->elevator;
> > > LIST_HEAD(rq_list);
> > > bool ret = false;
> >
> > The above initialization is just done once.
> >
> > >
> > > do {
> > > struct request *rq;
> > >
> > > if (e->type->ops.has_work && !e->type->ops.has_work(hctx))
> > > break;
> > >
> > > .......
> > ret = blk_mq_dispatch_rq_list(q, list, ...);
> >
> > list includes one request, so blk_mq_dispatch_rq_list() won't return
> > false in case of no request in list.
> >
> > > } while (ret);
> > >
> > > return ret;
> >
> > 'ret' is always updated by return value of last blk_mq_dispatch_rq_list()
> > if at least one request is dispatched. So if it becomes false, the loop
> > breaks, that means .commit_rqs() has been called cause 'list' does
> > include one request for blk_mq_dispatch_rq_list(). Otherwise, true is
> > still returned.
>
> Sorry for my confusing description, let me try again to describe the problem.
> When I try to mount the block device, I got the IO stuck with your
> patch, and I did some debugging. I found we missed calling
> commit_rqs() for one case:
>
> void blk_mq_sched_dispatch_requests(struct blk_mq_hw_ctx *hctx)
> @@ -173,6 +180,7 @@ void blk_mq_sched_dispatch_requests(struct
> blk_mq_hw_ctx *hctx)
> struct elevator_queue *e = q->elevator;
> const bool has_sched_dispatch = e && e->type->ops.dispatch_request;
> LIST_HEAD(rq_list);
> + bool dispatch_ret;
>
> /* RCU or SRCU read lock is needed before checking quiesced flag */
> if (unlikely(blk_mq_hctx_stopped(hctx) || blk_queue_quiesced(q)))
> @@ -206,20 +214,26 @@ void blk_mq_sched_dispatch_requests(struct
> blk_mq_hw_ctx *hctx)
> */
> if (!list_empty(&rq_list)) {
> blk_mq_sched_mark_restart_hctx(hctx);
> - if (blk_mq_dispatch_rq_list(q, &rq_list, false)) {
> + dispatch_ret = blk_mq_dispatch_rq_list(q, &rq_list, false);
>
> Suppose we dispatch one request to block driver, and return 'true' here.
>
> + if (dispatch_ret) {
> if (has_sched_dispatch)
> - blk_mq_do_dispatch_sched(hctx);
> + dispatch_ret = blk_mq_do_dispatch_sched(hctx);
>
> Then we will continue to try to dispatch more requests from IO
> scheduler, but if there are no requests in IO scheduler now, it will
> return 'false' here, and set dispatch_ret as false.
>
> else
> - blk_mq_do_dispatch_ctx(hctx);
> + dispatch_ret = blk_mq_do_dispatch_ctx(hctx);

OK, this one is an issue, but it can be fixed simply by not updating
'dispatch_ret' for the follow-up dispatch, something like below:

if (dispatch_ret) {
if (has_sched_dispatch)
blk_mq_do_dispatch_sched(hctx);
else
blk_mq_do_dispatch_ctx(hctx);
}


Thanks,
Ming

2020-03-23 09:14:27

by Baolin Wang

[permalink] [raw]
Subject: Re: [RESEND RFC PATCH 2/8] block: Allow sending a batch of requests from the scheduler to hardware

On Mon, Mar 23, 2020 at 4:29 PM Ming Lei <[email protected]> wrote:
>
> On Mon, Mar 23, 2020 at 04:22:38PM +0800, Baolin Wang wrote:
> > On Mon, Mar 23, 2020 at 3:27 PM Ming Lei <[email protected]> wrote:
> > >
> > > On Mon, Mar 23, 2020 at 01:36:34PM +0800, Baolin Wang wrote:
> > > > On Mon, Mar 23, 2020 at 11:44 AM Ming Lei <[email protected]> wrote:
> > > > >
> > > > > On Fri, Mar 20, 2020 at 06:27:41PM +0800, Baolin Wang wrote:
> > > > > > Hi Ming,
> > > > > >
> > > > > > On Wed, Mar 18, 2020 at 6:26 PM Baolin Wang <[email protected]> wrote:
> > > > > > >
> > > > > > > Hi Ming,
> > > > > > >
> > > > > > > On Wed, Mar 18, 2020 at 6:01 PM Ming Lei <[email protected]> wrote:
> > > > > > > >
> > > > > > > > On Mon, Mar 16, 2020 at 06:01:19PM +0800, Baolin Wang wrote:
> > > > > > > > > As we know, some SD/MMC host controllers can support packed request,
> > > > > > > > > that means we can package several requests to host controller at one
> > > > > > > > > time to improve performence. So the hardware driver expects the blk-mq
> > > > > > > > > can dispatch a batch of requests at one time, and driver can use bd.last
> > > > > > > > > to indicate if it is the last request in the batch to help to combine
> > > > > > > > > requests as much as possible.
> > > > > > > > >
> > > > > > > > > Thus we should add batch requests setting from the block driver to tell
> > > > > > > > > the scheduler how many requests can be dispatched in a batch, as well
> > > > > > > > > as changing the scheduler to dispatch more than one request if setting
> > > > > > > > > the maximum batch requests number.
> > > > > > > > >
> > > > > > > >
> > > > > > > > I feel this batch dispatch style is more complicated, and some other
> > > > > > > > drivers(virtio blk/scsi) still may get benefit if we can pass real 'last' flag in
> > > > > > > > .queue_rq().
> > > > > > > >
> > > > > > > > So what about the following way by extending .commit_rqs() to this usage?
> > > > > > > > And you can do whatever batch processing in .commit_rqs() which will be
> > > > > > > > guaranteed to be called if BLK_MQ_F_FORCE_COMMIT_RQS is set by driver.
> > > > > > >
> > > > > > > I'm very appreciated for your good suggestion, which is much simpler than mine.
> > > > > > > It seems to solve my problem, and I will try it on my platform to see
> > > > > > > if it can work and give you the feadback. Thanks again.
> > > > > >
> > > > > > I tried your approach on my platform, but met some problems, see below.
> > > > > >
> > > > > > >
> > > > > > > > diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
> > > > > > > > index 856356b1619e..cd2bbe56f83f 100644
> > > > > > > > --- a/block/blk-mq-sched.c
> > > > > > > > +++ b/block/blk-mq-sched.c
> > > > > > > > @@ -85,11 +85,12 @@ void blk_mq_sched_restart(struct blk_mq_hw_ctx *hctx)
> > > > > > > > * its queue by itself in its completion handler, so we don't need to
> > > > > > > > * restart queue if .get_budget() returns BLK_STS_NO_RESOURCE.
> > > > > > > > */
> > > > > > > > -static void blk_mq_do_dispatch_sched(struct blk_mq_hw_ctx *hctx)
> > > > > > > > +static bool blk_mq_do_dispatch_sched(struct blk_mq_hw_ctx *hctx)
> > > > > > > > {
> > > > > > > > struct request_queue *q = hctx->queue;
> > > > > > > > struct elevator_queue *e = q->elevator;
> > > > > > > > LIST_HEAD(rq_list);
> > > > > > > > + bool ret = false;
> > > > > > > >
> > > > > > > > do {
> > > > > > > > struct request *rq;
> > > > > > > > @@ -112,7 +113,10 @@ static void blk_mq_do_dispatch_sched(struct blk_mq_hw_ctx *hctx)
> > > > > > > > * in blk_mq_dispatch_rq_list().
> > > > > > > > */
> > > > > > > > list_add(&rq->queuelist, &rq_list);
> > > > > > > > - } while (blk_mq_dispatch_rq_list(q, &rq_list, true));
> > > > > > > > + ret = blk_mq_dispatch_rq_list(q, &rq_list, true);
> > > > > > > > + } while (ret);
> > > > > > > > +
> > > > > > > > + return ret;
> > > > > > > > }
> > > > > > > >
> > > > > > > > static struct blk_mq_ctx *blk_mq_next_ctx(struct blk_mq_hw_ctx *hctx,
> > > > > > > > @@ -131,11 +135,12 @@ static struct blk_mq_ctx *blk_mq_next_ctx(struct blk_mq_hw_ctx *hctx,
> > > > > > > > * its queue by itself in its completion handler, so we don't need to
> > > > > > > > * restart queue if .get_budget() returns BLK_STS_NO_RESOURCE.
> > > > > > > > */
> > > > > > > > -static void blk_mq_do_dispatch_ctx(struct blk_mq_hw_ctx *hctx)
> > > > > > > > +static bool blk_mq_do_dispatch_ctx(struct blk_mq_hw_ctx *hctx)
> > > > > > > > {
> > > > > > > > struct request_queue *q = hctx->queue;
> > > > > > > > LIST_HEAD(rq_list);
> > > > > > > > struct blk_mq_ctx *ctx = READ_ONCE(hctx->dispatch_from);
> > > > > > > > + bool ret = false;
> > > > > > > >
> > > > > > > > do {
> > > > > > > > struct request *rq;
> > > > > > > > @@ -161,10 +166,12 @@ static void blk_mq_do_dispatch_ctx(struct blk_mq_hw_ctx *hctx)
> > > > > > > >
> > > > > > > > /* round robin for fair dispatch */
> > > > > > > > ctx = blk_mq_next_ctx(hctx, rq->mq_ctx);
> > > > > > > > -
> > > > > > > > - } while (blk_mq_dispatch_rq_list(q, &rq_list, true));
> > > > > > > > + ret = blk_mq_dispatch_rq_list(q, &rq_list, true);
> > > > > > > > + } while (ret);
> > > > > > > >
> > > > > > > > WRITE_ONCE(hctx->dispatch_from, ctx);
> > > > > > > > +
> > > > > > > > + return ret;
> > > > > > > > }
> > > > > > > >
> > > > > > > > void blk_mq_sched_dispatch_requests(struct blk_mq_hw_ctx *hctx)
> > > > > > > > @@ -173,6 +180,7 @@ void blk_mq_sched_dispatch_requests(struct blk_mq_hw_ctx *hctx)
> > > > > > > > struct elevator_queue *e = q->elevator;
> > > > > > > > const bool has_sched_dispatch = e && e->type->ops.dispatch_request;
> > > > > > > > LIST_HEAD(rq_list);
> > > > > > > > + bool dispatch_ret;
> > > > > > > >
> > > > > > > > /* RCU or SRCU read lock is needed before checking quiesced flag */
> > > > > > > > if (unlikely(blk_mq_hctx_stopped(hctx) || blk_queue_quiesced(q)))
> > > > > > > > @@ -206,20 +214,26 @@ void blk_mq_sched_dispatch_requests(struct blk_mq_hw_ctx *hctx)
> > > > > > > > */
> > > > > > > > if (!list_empty(&rq_list)) {
> > > > > > > > blk_mq_sched_mark_restart_hctx(hctx);
> > > > > > > > - if (blk_mq_dispatch_rq_list(q, &rq_list, false)) {
> > > > > > > > + dispatch_ret = blk_mq_dispatch_rq_list(q, &rq_list, false);
> > > > > > > > + if (dispatch_ret) {
> > > > > > > > if (has_sched_dispatch)
> > > > > > > > - blk_mq_do_dispatch_sched(hctx);
> > > > > > > > + dispatch_ret = blk_mq_do_dispatch_sched(hctx);
> > > > > >
> > > > > > If we dispatched a request successfully by blk_mq_dispatch_rq_list(),
> > > > > > and got dispatch_ret = true now. Then we will try to dispatch more
> > > > > > reuqests from scheduler by blk_mq_do_dispatch_sched(), but if now no
> > > > > > more requests in scheduler, then we will got dispatch_ret = false. In
> > > > >
> > > > > 'dispatch_ret' always holds result of the last blk_mq_do_dispatch_sched().
> > > > > When any one request has been dispatched successfully, 'dispatch_ret'
> > > > > is true. New request is always added to list before calling
> > > > > blk_mq_do_dispatch_sched(), so once blk_mq_do_dispatch_sched() returns
> > > > > false, it means that .commit_rqs() has been called.
> > > >
> > > > Not really, if no requests int the IO cheduler, we will break the loop
> > > > in blk_mq_do_dispatch_sched() and return false without calling
> > > > .commit_rqs().
> > >
> > > If there isn't any request to dispatch, false is returned. Otherwise,
> > > always return the return value of last 'blk_mq_dispatch_rq_list'.
> > >
> > > >
> > > > So in this case, blk_mq_do_dispatch_sched() will return 'false', which
> > > > overlapped the return value of 'true' from blk_mq_dispatch_rq_list(),
> > > > and did not call .commit_rqs(). Then the IO processing will be stuck.
> > >
> > > See below.
> > >
> > > >
> > > > static void blk_mq_do_dispatch_sched(struct blk_mq_hw_ctx *hctx)
> > > > {
> > > > struct request_queue *q = hctx->queue;
> > > > struct elevator_queue *e = q->elevator;
> > > > LIST_HEAD(rq_list);
> > > > bool ret = false;
> > >
> > > The above initialization is just done once.
> > >
> > > >
> > > > do {
> > > > struct request *rq;
> > > >
> > > > if (e->type->ops.has_work && !e->type->ops.has_work(hctx))
> > > > break;
> > > >
> > > > .......
> > > ret = blk_mq_dispatch_rq_list(q, list, ...);
> > >
> > > list includes one request, so blk_mq_dispatch_rq_list() won't return
> > > false in case of no request in list.
> > >
> > > > } while (ret);
> > > >
> > > > return ret;
> > >
> > > 'ret' is always updated by return value of last blk_mq_dispatch_rq_list()
> > > if at least one request is dispatched. So if it becomes false, the loop
> > > breaks, that means .commit_rqs() has been called cause 'list' does
> > > include one request for blk_mq_dispatch_rq_list(). Otherwise, true is
> > > still returned.
> >
> > Sorry for my confusing description, let me try again to describe the problem.
> > When I try to mount the block device, I got the IO stuck with your
> > patch, and I did some debugging. I found we missed calling
> > commit_rqs() for one case:
> >
> > void blk_mq_sched_dispatch_requests(struct blk_mq_hw_ctx *hctx)
> > @@ -173,6 +180,7 @@ void blk_mq_sched_dispatch_requests(struct
> > blk_mq_hw_ctx *hctx)
> > struct elevator_queue *e = q->elevator;
> > const bool has_sched_dispatch = e && e->type->ops.dispatch_request;
> > LIST_HEAD(rq_list);
> > + bool dispatch_ret;
> >
> > /* RCU or SRCU read lock is needed before checking quiesced flag */
> > if (unlikely(blk_mq_hctx_stopped(hctx) || blk_queue_quiesced(q)))
> > @@ -206,20 +214,26 @@ void blk_mq_sched_dispatch_requests(struct
> > blk_mq_hw_ctx *hctx)
> > */
> > if (!list_empty(&rq_list)) {
> > blk_mq_sched_mark_restart_hctx(hctx);
> > - if (blk_mq_dispatch_rq_list(q, &rq_list, false)) {
> > + dispatch_ret = blk_mq_dispatch_rq_list(q, &rq_list, false);
> >
> > Suppose we dispatch one request to block driver, and return 'true' here.
> >
> > + if (dispatch_ret) {
> > if (has_sched_dispatch)
> > - blk_mq_do_dispatch_sched(hctx);
> > + dispatch_ret = blk_mq_do_dispatch_sched(hctx);
> >
> > Then we will continue to try to dispatch more requests from IO
> > scheduler, but if there are no requests in IO scheduler now, it will
> > return 'false' here, and set dispatch_ret as false.
> >
> > else
> > - blk_mq_do_dispatch_ctx(hctx);
> > + dispatch_ret = blk_mq_do_dispatch_ctx(hctx);
>
> OK, this one is an issue, but it can be fixed simply by not updating
> 'dispatch_ret' for the following dispatch, something like the below
> way:
>
> if (dispatch_ret) {
> if (has_sched_dispatch)
> blk_mq_do_dispatch_sched(hctx);
> else
> blk_mq_do_dispatch_ctx(hctx);
> }

Yes, this can work.

But I found your patch drops some performance compared with my
method in patch 1/2. My method can fetch several requests from the IO
scheduler and dispatch them to the block driver at one time, but with
your patch we still need to dispatch requests one by one, which I think
will cost some performance; see the rough sketch below for what I mean
by batch dispatch.
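
This is roughly the shape I have in mind (a simplified sketch of the idea,
not the actual code from my patches; budget and tag handling are omitted,
and the hard-coded '8' just stands in for the batch limit a driver would
configure):

static bool blk_mq_do_dispatch_sched_batch(struct blk_mq_hw_ctx *hctx)
{
	struct request_queue *q = hctx->queue;
	struct elevator_queue *e = q->elevator;
	unsigned int max_batch = 8;	/* stand-in for the driver's limit */
	unsigned int count = 0;
	LIST_HEAD(rq_list);
	struct request *rq;

	/* Pull up to 'max_batch' requests out of the scheduler first... */
	while (count < max_batch &&
	       (rq = e->type->ops.dispatch_request(hctx)) != NULL) {
		list_add_tail(&rq->queuelist, &rq_list);
		count++;
	}

	if (list_empty(&rq_list))
		return false;

	/*
	 * ...then hand the whole list to the driver in one go, so that
	 * bd.last is true only for the real last request of the batch.
	 */
	return blk_mq_dispatch_rq_list(q, &rq_list, true);
}

With your approach the scheduler still hands requests to the driver one
blk_mq_dispatch_rq_list() call at a time, which is the difference I am
worried about.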
What do you think? Thanks.

--
Baolin Wang

2020-03-23 09:59:35

by Ming Lei

[permalink] [raw]
Subject: Re: [RESEND RFC PATCH 2/8] block: Allow sending a batch of requests from the scheduler to hardware

On Mon, Mar 23, 2020 at 05:13:27PM +0800, Baolin Wang wrote:
> On Mon, Mar 23, 2020 at 4:29 PM Ming Lei <[email protected]> wrote:
> >
> > On Mon, Mar 23, 2020 at 04:22:38PM +0800, Baolin Wang wrote:
> > > On Mon, Mar 23, 2020 at 3:27 PM Ming Lei <[email protected]> wrote:
> > > >
> > > > On Mon, Mar 23, 2020 at 01:36:34PM +0800, Baolin Wang wrote:
> > > > > On Mon, Mar 23, 2020 at 11:44 AM Ming Lei <[email protected]> wrote:
> > > > > >
> > > > > > On Fri, Mar 20, 2020 at 06:27:41PM +0800, Baolin Wang wrote:
> > > > > > > Hi Ming,
> > > > > > >
> > > > > > > On Wed, Mar 18, 2020 at 6:26 PM Baolin Wang <[email protected]> wrote:
> > > > > > > >
> > > > > > > > Hi Ming,
> > > > > > > >
> > > > > > > > On Wed, Mar 18, 2020 at 6:01 PM Ming Lei <[email protected]> wrote:
> > > > > > > > >
> > > > > > > > > On Mon, Mar 16, 2020 at 06:01:19PM +0800, Baolin Wang wrote:
> > > > > > > > > > As we know, some SD/MMC host controllers can support packed request,
> > > > > > > > > > that means we can package several requests to host controller at one
> > > > > > > > > > time to improve performence. So the hardware driver expects the blk-mq
> > > > > > > > > > can dispatch a batch of requests at one time, and driver can use bd.last
> > > > > > > > > > to indicate if it is the last request in the batch to help to combine
> > > > > > > > > > requests as much as possible.
> > > > > > > > > >
> > > > > > > > > > Thus we should add batch requests setting from the block driver to tell
> > > > > > > > > > the scheduler how many requests can be dispatched in a batch, as well
> > > > > > > > > > as changing the scheduler to dispatch more than one request if setting
> > > > > > > > > > the maximum batch requests number.
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > > I feel this batch dispatch style is more complicated, and some other
> > > > > > > > > drivers(virtio blk/scsi) still may get benefit if we can pass real 'last' flag in
> > > > > > > > > .queue_rq().
> > > > > > > > >
> > > > > > > > > So what about the following way by extending .commit_rqs() to this usage?
> > > > > > > > > And you can do whatever batch processing in .commit_rqs() which will be
> > > > > > > > > guaranteed to be called if BLK_MQ_F_FORCE_COMMIT_RQS is set by driver.
> > > > > > > >
> > > > > > > > I'm very appreciated for your good suggestion, which is much simpler than mine.
> > > > > > > > It seems to solve my problem, and I will try it on my platform to see
> > > > > > > > if it can work and give you the feadback. Thanks again.
> > > > > > >
> > > > > > > I tried your approach on my platform, but met some problems, see below.
> > > > > > >
> > > > > > > >
> > > > > > > > > diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
> > > > > > > > > index 856356b1619e..cd2bbe56f83f 100644
> > > > > > > > > --- a/block/blk-mq-sched.c
> > > > > > > > > +++ b/block/blk-mq-sched.c
> > > > > > > > > @@ -85,11 +85,12 @@ void blk_mq_sched_restart(struct blk_mq_hw_ctx *hctx)
> > > > > > > > > * its queue by itself in its completion handler, so we don't need to
> > > > > > > > > * restart queue if .get_budget() returns BLK_STS_NO_RESOURCE.
> > > > > > > > > */
> > > > > > > > > -static void blk_mq_do_dispatch_sched(struct blk_mq_hw_ctx *hctx)
> > > > > > > > > +static bool blk_mq_do_dispatch_sched(struct blk_mq_hw_ctx *hctx)
> > > > > > > > > {
> > > > > > > > > struct request_queue *q = hctx->queue;
> > > > > > > > > struct elevator_queue *e = q->elevator;
> > > > > > > > > LIST_HEAD(rq_list);
> > > > > > > > > + bool ret = false;
> > > > > > > > >
> > > > > > > > > do {
> > > > > > > > > struct request *rq;
> > > > > > > > > @@ -112,7 +113,10 @@ static void blk_mq_do_dispatch_sched(struct blk_mq_hw_ctx *hctx)
> > > > > > > > > * in blk_mq_dispatch_rq_list().
> > > > > > > > > */
> > > > > > > > > list_add(&rq->queuelist, &rq_list);
> > > > > > > > > - } while (blk_mq_dispatch_rq_list(q, &rq_list, true));
> > > > > > > > > + ret = blk_mq_dispatch_rq_list(q, &rq_list, true);
> > > > > > > > > + } while (ret);
> > > > > > > > > +
> > > > > > > > > + return ret;
> > > > > > > > > }
> > > > > > > > >
> > > > > > > > > static struct blk_mq_ctx *blk_mq_next_ctx(struct blk_mq_hw_ctx *hctx,
> > > > > > > > > @@ -131,11 +135,12 @@ static struct blk_mq_ctx *blk_mq_next_ctx(struct blk_mq_hw_ctx *hctx,
> > > > > > > > > * its queue by itself in its completion handler, so we don't need to
> > > > > > > > > * restart queue if .get_budget() returns BLK_STS_NO_RESOURCE.
> > > > > > > > > */
> > > > > > > > > -static void blk_mq_do_dispatch_ctx(struct blk_mq_hw_ctx *hctx)
> > > > > > > > > +static bool blk_mq_do_dispatch_ctx(struct blk_mq_hw_ctx *hctx)
> > > > > > > > > {
> > > > > > > > > struct request_queue *q = hctx->queue;
> > > > > > > > > LIST_HEAD(rq_list);
> > > > > > > > > struct blk_mq_ctx *ctx = READ_ONCE(hctx->dispatch_from);
> > > > > > > > > + bool ret = false;
> > > > > > > > >
> > > > > > > > > do {
> > > > > > > > > struct request *rq;
> > > > > > > > > @@ -161,10 +166,12 @@ static void blk_mq_do_dispatch_ctx(struct blk_mq_hw_ctx *hctx)
> > > > > > > > >
> > > > > > > > > /* round robin for fair dispatch */
> > > > > > > > > ctx = blk_mq_next_ctx(hctx, rq->mq_ctx);
> > > > > > > > > -
> > > > > > > > > - } while (blk_mq_dispatch_rq_list(q, &rq_list, true));
> > > > > > > > > + ret = blk_mq_dispatch_rq_list(q, &rq_list, true);
> > > > > > > > > + } while (ret);
> > > > > > > > >
> > > > > > > > > WRITE_ONCE(hctx->dispatch_from, ctx);
> > > > > > > > > +
> > > > > > > > > + return ret;
> > > > > > > > > }
> > > > > > > > >
> > > > > > > > > void blk_mq_sched_dispatch_requests(struct blk_mq_hw_ctx *hctx)
> > > > > > > > > @@ -173,6 +180,7 @@ void blk_mq_sched_dispatch_requests(struct blk_mq_hw_ctx *hctx)
> > > > > > > > > struct elevator_queue *e = q->elevator;
> > > > > > > > > const bool has_sched_dispatch = e && e->type->ops.dispatch_request;
> > > > > > > > > LIST_HEAD(rq_list);
> > > > > > > > > + bool dispatch_ret;
> > > > > > > > >
> > > > > > > > > /* RCU or SRCU read lock is needed before checking quiesced flag */
> > > > > > > > > if (unlikely(blk_mq_hctx_stopped(hctx) || blk_queue_quiesced(q)))
> > > > > > > > > @@ -206,20 +214,26 @@ void blk_mq_sched_dispatch_requests(struct blk_mq_hw_ctx *hctx)
> > > > > > > > > */
> > > > > > > > > if (!list_empty(&rq_list)) {
> > > > > > > > > blk_mq_sched_mark_restart_hctx(hctx);
> > > > > > > > > - if (blk_mq_dispatch_rq_list(q, &rq_list, false)) {
> > > > > > > > > + dispatch_ret = blk_mq_dispatch_rq_list(q, &rq_list, false);
> > > > > > > > > + if (dispatch_ret) {
> > > > > > > > > if (has_sched_dispatch)
> > > > > > > > > - blk_mq_do_dispatch_sched(hctx);
> > > > > > > > > + dispatch_ret = blk_mq_do_dispatch_sched(hctx);
> > > > > > >
> > > > > > > If we dispatched a request successfully by blk_mq_dispatch_rq_list(),
> > > > > > > and got dispatch_ret = true now. Then we will try to dispatch more
> > > > > > > reuqests from scheduler by blk_mq_do_dispatch_sched(), but if now no
> > > > > > > more requests in scheduler, then we will got dispatch_ret = false. In
> > > > > >
> > > > > > 'dispatch_ret' always holds result of the last blk_mq_do_dispatch_sched().
> > > > > > When any one request has been dispatched successfully, 'dispatch_ret'
> > > > > > is true. New request is always added to list before calling
> > > > > > blk_mq_do_dispatch_sched(), so once blk_mq_do_dispatch_sched() returns
> > > > > > false, it means that .commit_rqs() has been called.
> > > > >
> > > > > Not really, if no requests int the IO cheduler, we will break the loop
> > > > > in blk_mq_do_dispatch_sched() and return false without calling
> > > > > .commit_rqs().
> > > >
> > > > If there isn't any request to dispatch, false is returned. Otherwise,
> > > > always return the return value of last 'blk_mq_dispatch_rq_list'.
> > > >
> > > > >
> > > > > So in this case, blk_mq_do_dispatch_sched() will return 'false', which
> > > > > overlapped the return value of 'true' from blk_mq_dispatch_rq_list(),
> > > > > and did not call .commit_rqs(). Then the IO processing will be stuck.
> > > >
> > > > See below.
> > > >
> > > > >
> > > > > static void blk_mq_do_dispatch_sched(struct blk_mq_hw_ctx *hctx)
> > > > > {
> > > > > struct request_queue *q = hctx->queue;
> > > > > struct elevator_queue *e = q->elevator;
> > > > > LIST_HEAD(rq_list);
> > > > > bool ret = false;
> > > >
> > > > The above initialization is just done once.
> > > >
> > > > >
> > > > > do {
> > > > > struct request *rq;
> > > > >
> > > > > if (e->type->ops.has_work && !e->type->ops.has_work(hctx))
> > > > > break;
> > > > >
> > > > > .......
> > > > ret = blk_mq_dispatch_rq_list(q, list, ...);
> > > >
> > > > list includes one request, so blk_mq_dispatch_rq_list() won't return
> > > > false in case of no request in list.
> > > >
> > > > > } while (ret);
> > > > >
> > > > > return ret;
> > > >
> > > > 'ret' is always updated by return value of last blk_mq_dispatch_rq_list()
> > > > if at least one request is dispatched. So if it becomes false, the loop
> > > > breaks, that means .commit_rqs() has been called cause 'list' does
> > > > include one request for blk_mq_dispatch_rq_list(). Otherwise, true is
> > > > still returned.
> > >
> > > Sorry for my confusing description, let me try again to describe the problem.
> > > When I try to mount the block device, I got the IO stuck with your
> > > patch, and I did some debugging. I found we missed calling
> > > commit_rqs() for one case:
> > >
> > > void blk_mq_sched_dispatch_requests(struct blk_mq_hw_ctx *hctx)
> > > @@ -173,6 +180,7 @@ void blk_mq_sched_dispatch_requests(struct
> > > blk_mq_hw_ctx *hctx)
> > > struct elevator_queue *e = q->elevator;
> > > const bool has_sched_dispatch = e && e->type->ops.dispatch_request;
> > > LIST_HEAD(rq_list);
> > > + bool dispatch_ret;
> > >
> > > /* RCU or SRCU read lock is needed before checking quiesced flag */
> > > if (unlikely(blk_mq_hctx_stopped(hctx) || blk_queue_quiesced(q)))
> > > @@ -206,20 +214,26 @@ void blk_mq_sched_dispatch_requests(struct
> > > blk_mq_hw_ctx *hctx)
> > > */
> > > if (!list_empty(&rq_list)) {
> > > blk_mq_sched_mark_restart_hctx(hctx);
> > > - if (blk_mq_dispatch_rq_list(q, &rq_list, false)) {
> > > + dispatch_ret = blk_mq_dispatch_rq_list(q, &rq_list, false);
> > >
> > > Suppose we dispatch one request to block driver, and return 'true' here.
> > >
> > > + if (dispatch_ret) {
> > > if (has_sched_dispatch)
> > > - blk_mq_do_dispatch_sched(hctx);
> > > + dispatch_ret = blk_mq_do_dispatch_sched(hctx);
> > >
> > > Then we will continue to try to dispatch more requests from IO
> > > scheduler, but if there are no requests in IO scheduler now, it will
> > > return 'false' here, and set dispatch_ret as false.
> > >
> > > else
> > > - blk_mq_do_dispatch_ctx(hctx);
> > > + dispatch_ret = blk_mq_do_dispatch_ctx(hctx);
> >
> > OK, this one is an issue, but it can be fixed simply by not updating
> > 'dispatch_ret' for the following dispatch, something like the below
> > way:
> >
> > if (dispatch_ret) {
> > if (has_sched_dispatch)
> > blk_mq_do_dispatch_sched(hctx);
> > else
> > blk_mq_do_dispatch_ctx(hctx);
> > }
>
> Yes, this can work.
>
> But I found your patch will drop some performance comparing with my
> method in patch 1/2. My method can fetch several requests from IO
> scheduler and dispatch them to block driver at one time, but in your
> patch we still need dispatch request one by one, which will drop some
> performance I think.
> What do you think? Thanks.

Please run your test and see if a performance drop can be observed.

Thanks,
Ming

2020-03-24 08:30:28

by Baolin Wang

[permalink] [raw]
Subject: Re: [RESEND RFC PATCH 2/8] block: Allow sending a batch of requests from the scheduler to hardware

Hi Ming,

On Mon, Mar 23, 2020 at 5:58 PM Ming Lei <[email protected]> wrote:
>
> On Mon, Mar 23, 2020 at 05:13:27PM +0800, Baolin Wang wrote:
> > On Mon, Mar 23, 2020 at 4:29 PM Ming Lei <[email protected]> wrote:
> > >
> > > On Mon, Mar 23, 2020 at 04:22:38PM +0800, Baolin Wang wrote:
> > > > On Mon, Mar 23, 2020 at 3:27 PM Ming Lei <[email protected]> wrote:
> > > > >
> > > > > On Mon, Mar 23, 2020 at 01:36:34PM +0800, Baolin Wang wrote:
> > > > > > On Mon, Mar 23, 2020 at 11:44 AM Ming Lei <[email protected]> wrote:
> > > > > > >
> > > > > > > On Fri, Mar 20, 2020 at 06:27:41PM +0800, Baolin Wang wrote:
> > > > > > > > Hi Ming,
> > > > > > > >
> > > > > > > > On Wed, Mar 18, 2020 at 6:26 PM Baolin Wang <[email protected]> wrote:
> > > > > > > > >
> > > > > > > > > Hi Ming,
> > > > > > > > >
> > > > > > > > > On Wed, Mar 18, 2020 at 6:01 PM Ming Lei <[email protected]> wrote:
> > > > > > > > > >
> > > > > > > > > > On Mon, Mar 16, 2020 at 06:01:19PM +0800, Baolin Wang wrote:
> > > > > > > > > > > As we know, some SD/MMC host controllers can support packed request,
> > > > > > > > > > > that means we can package several requests to host controller at one
> > > > > > > > > > > time to improve performence. So the hardware driver expects the blk-mq
> > > > > > > > > > > can dispatch a batch of requests at one time, and driver can use bd.last
> > > > > > > > > > > to indicate if it is the last request in the batch to help to combine
> > > > > > > > > > > requests as much as possible.
> > > > > > > > > > >
> > > > > > > > > > > Thus we should add batch requests setting from the block driver to tell
> > > > > > > > > > > the scheduler how many requests can be dispatched in a batch, as well
> > > > > > > > > > > as changing the scheduler to dispatch more than one request if setting
> > > > > > > > > > > the maximum batch requests number.
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > I feel this batch dispatch style is more complicated, and some other
> > > > > > > > > > drivers(virtio blk/scsi) still may get benefit if we can pass real 'last' flag in
> > > > > > > > > > .queue_rq().
> > > > > > > > > >
> > > > > > > > > > So what about the following way by extending .commit_rqs() to this usage?
> > > > > > > > > > And you can do whatever batch processing in .commit_rqs() which will be
> > > > > > > > > > guaranteed to be called if BLK_MQ_F_FORCE_COMMIT_RQS is set by driver.
> > > > > > > > >
> > > > > > > > > I'm very appreciated for your good suggestion, which is much simpler than mine.
> > > > > > > > > It seems to solve my problem, and I will try it on my platform to see
> > > > > > > > > if it can work and give you the feadback. Thanks again.
> > > > > > > >
> > > > > > > > I tried your approach on my platform, but met some problems, see below.
> > > > > > > >
> > > > > > > > >
> > > > > > > > > > diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
> > > > > > > > > > index 856356b1619e..cd2bbe56f83f 100644
> > > > > > > > > > --- a/block/blk-mq-sched.c
> > > > > > > > > > +++ b/block/blk-mq-sched.c
> > > > > > > > > > @@ -85,11 +85,12 @@ void blk_mq_sched_restart(struct blk_mq_hw_ctx *hctx)
> > > > > > > > > > * its queue by itself in its completion handler, so we don't need to
> > > > > > > > > > * restart queue if .get_budget() returns BLK_STS_NO_RESOURCE.
> > > > > > > > > > */
> > > > > > > > > > -static void blk_mq_do_dispatch_sched(struct blk_mq_hw_ctx *hctx)
> > > > > > > > > > +static bool blk_mq_do_dispatch_sched(struct blk_mq_hw_ctx *hctx)
> > > > > > > > > > {
> > > > > > > > > > struct request_queue *q = hctx->queue;
> > > > > > > > > > struct elevator_queue *e = q->elevator;
> > > > > > > > > > LIST_HEAD(rq_list);
> > > > > > > > > > + bool ret = false;
> > > > > > > > > >
> > > > > > > > > > do {
> > > > > > > > > > struct request *rq;
> > > > > > > > > > @@ -112,7 +113,10 @@ static void blk_mq_do_dispatch_sched(struct blk_mq_hw_ctx *hctx)
> > > > > > > > > > * in blk_mq_dispatch_rq_list().
> > > > > > > > > > */
> > > > > > > > > > list_add(&rq->queuelist, &rq_list);
> > > > > > > > > > - } while (blk_mq_dispatch_rq_list(q, &rq_list, true));
> > > > > > > > > > + ret = blk_mq_dispatch_rq_list(q, &rq_list, true);
> > > > > > > > > > + } while (ret);
> > > > > > > > > > +
> > > > > > > > > > + return ret;
> > > > > > > > > > }
> > > > > > > > > >
> > > > > > > > > > static struct blk_mq_ctx *blk_mq_next_ctx(struct blk_mq_hw_ctx *hctx,
> > > > > > > > > > @@ -131,11 +135,12 @@ static struct blk_mq_ctx *blk_mq_next_ctx(struct blk_mq_hw_ctx *hctx,
> > > > > > > > > > * its queue by itself in its completion handler, so we don't need to
> > > > > > > > > > * restart queue if .get_budget() returns BLK_STS_NO_RESOURCE.
> > > > > > > > > > */
> > > > > > > > > > -static void blk_mq_do_dispatch_ctx(struct blk_mq_hw_ctx *hctx)
> > > > > > > > > > +static bool blk_mq_do_dispatch_ctx(struct blk_mq_hw_ctx *hctx)
> > > > > > > > > > {
> > > > > > > > > > struct request_queue *q = hctx->queue;
> > > > > > > > > > LIST_HEAD(rq_list);
> > > > > > > > > > struct blk_mq_ctx *ctx = READ_ONCE(hctx->dispatch_from);
> > > > > > > > > > + bool ret = false;
> > > > > > > > > >
> > > > > > > > > > do {
> > > > > > > > > > struct request *rq;
> > > > > > > > > > @@ -161,10 +166,12 @@ static void blk_mq_do_dispatch_ctx(struct blk_mq_hw_ctx *hctx)
> > > > > > > > > >
> > > > > > > > > > /* round robin for fair dispatch */
> > > > > > > > > > ctx = blk_mq_next_ctx(hctx, rq->mq_ctx);
> > > > > > > > > > -
> > > > > > > > > > - } while (blk_mq_dispatch_rq_list(q, &rq_list, true));
> > > > > > > > > > + ret = blk_mq_dispatch_rq_list(q, &rq_list, true);
> > > > > > > > > > + } while (ret);
> > > > > > > > > >
> > > > > > > > > > WRITE_ONCE(hctx->dispatch_from, ctx);
> > > > > > > > > > +
> > > > > > > > > > + return ret;
> > > > > > > > > > }
> > > > > > > > > >
> > > > > > > > > > void blk_mq_sched_dispatch_requests(struct blk_mq_hw_ctx *hctx)
> > > > > > > > > > @@ -173,6 +180,7 @@ void blk_mq_sched_dispatch_requests(struct blk_mq_hw_ctx *hctx)
> > > > > > > > > > struct elevator_queue *e = q->elevator;
> > > > > > > > > > const bool has_sched_dispatch = e && e->type->ops.dispatch_request;
> > > > > > > > > > LIST_HEAD(rq_list);
> > > > > > > > > > + bool dispatch_ret;
> > > > > > > > > >
> > > > > > > > > > /* RCU or SRCU read lock is needed before checking quiesced flag */
> > > > > > > > > > if (unlikely(blk_mq_hctx_stopped(hctx) || blk_queue_quiesced(q)))
> > > > > > > > > > @@ -206,20 +214,26 @@ void blk_mq_sched_dispatch_requests(struct blk_mq_hw_ctx *hctx)
> > > > > > > > > > */
> > > > > > > > > > if (!list_empty(&rq_list)) {
> > > > > > > > > > blk_mq_sched_mark_restart_hctx(hctx);
> > > > > > > > > > - if (blk_mq_dispatch_rq_list(q, &rq_list, false)) {
> > > > > > > > > > + dispatch_ret = blk_mq_dispatch_rq_list(q, &rq_list, false);
> > > > > > > > > > + if (dispatch_ret) {
> > > > > > > > > > if (has_sched_dispatch)
> > > > > > > > > > - blk_mq_do_dispatch_sched(hctx);
> > > > > > > > > > + dispatch_ret = blk_mq_do_dispatch_sched(hctx);
> > > > > > > >
> > > > > > > > If we dispatched a request successfully by blk_mq_dispatch_rq_list(),
> > > > > > > > and got dispatch_ret = true now. Then we will try to dispatch more
> > > > > > > > requests from the scheduler by blk_mq_do_dispatch_sched(), but if there are no
> > > > > > > > more requests in the scheduler, then we will get dispatch_ret = false. In
> > > > > > >
> > > > > > > 'dispatch_ret' always holds result of the last blk_mq_do_dispatch_sched().
> > > > > > > When any one request has been dispatched successfully, 'dispatch_ret'
> > > > > > > is true. New request is always added to list before calling
> > > > > > > blk_mq_do_dispatch_sched(), so once blk_mq_do_dispatch_sched() returns
> > > > > > > false, it means that .commit_rqs() has been called.
> > > > > >
> > > > > > Not really, if there are no requests in the IO scheduler, we will break the loop
> > > > > > in blk_mq_do_dispatch_sched() and return false without calling
> > > > > > .commit_rqs().
> > > > >
> > > > > If there isn't any request to dispatch, false is returned. Otherwise,
> > > > > always return the return value of last 'blk_mq_dispatch_rq_list'.
> > > > >
> > > > > >
> > > > > > So in this case, blk_mq_do_dispatch_sched() will return 'false', which
> > > > > > overlapped the return value of 'true' from blk_mq_dispatch_rq_list(),
> > > > > > and did not call .commit_rqs(). Then the IO processing will be stuck.
> > > > >
> > > > > See below.
> > > > >
> > > > > >
> > > > > > static void blk_mq_do_dispatch_sched(struct blk_mq_hw_ctx *hctx)
> > > > > > {
> > > > > > struct request_queue *q = hctx->queue;
> > > > > > struct elevator_queue *e = q->elevator;
> > > > > > LIST_HEAD(rq_list);
> > > > > > bool ret = false;
> > > > >
> > > > > The above initialization is just done once.
> > > > >
> > > > > >
> > > > > > do {
> > > > > > struct request *rq;
> > > > > >
> > > > > > if (e->type->ops.has_work && !e->type->ops.has_work(hctx))
> > > > > > break;
> > > > > >
> > > > > > .......
> > > > > ret = blk_mq_dispatch_rq_list(q, list, ...);
> > > > >
> > > > > list includes one request, so blk_mq_dispatch_rq_list() won't return
> > > > > false in case of no request in list.
> > > > >
> > > > > > } while (ret);
> > > > > >
> > > > > > return ret;
> > > > >
> > > > > 'ret' is always updated by return value of last blk_mq_dispatch_rq_list()
> > > > > if at least one request is dispatched. So if it becomes false, the loop
> > > > > breaks, that means .commit_rqs() has been called cause 'list' does
> > > > > include one request for blk_mq_dispatch_rq_list(). Otherwise, true is
> > > > > still returned.
> > > >
> > > > Sorry for my confusing description, let me try again to describe the problem.
> > > > When I try to mount the block device, I got the IO stuck with your
> > > > patch, and I did some debugging. I found we missed calling
> > > > commit_rqs() for one case:
> > > >
> > > > void blk_mq_sched_dispatch_requests(struct blk_mq_hw_ctx *hctx)
> > > > @@ -173,6 +180,7 @@ void blk_mq_sched_dispatch_requests(struct
> > > > blk_mq_hw_ctx *hctx)
> > > > struct elevator_queue *e = q->elevator;
> > > > const bool has_sched_dispatch = e && e->type->ops.dispatch_request;
> > > > LIST_HEAD(rq_list);
> > > > + bool dispatch_ret;
> > > >
> > > > /* RCU or SRCU read lock is needed before checking quiesced flag */
> > > > if (unlikely(blk_mq_hctx_stopped(hctx) || blk_queue_quiesced(q)))
> > > > @@ -206,20 +214,26 @@ void blk_mq_sched_dispatch_requests(struct
> > > > blk_mq_hw_ctx *hctx)
> > > > */
> > > > if (!list_empty(&rq_list)) {
> > > > blk_mq_sched_mark_restart_hctx(hctx);
> > > > - if (blk_mq_dispatch_rq_list(q, &rq_list, false)) {
> > > > + dispatch_ret = blk_mq_dispatch_rq_list(q, &rq_list, false);
> > > >
> > > > Suppose we dispatch one request to block driver, and return 'true' here.
> > > >
> > > > + if (dispatch_ret) {
> > > > if (has_sched_dispatch)
> > > > - blk_mq_do_dispatch_sched(hctx);
> > > > + dispatch_ret = blk_mq_do_dispatch_sched(hctx);
> > > >
> > > > Then we will continue to try to dispatch more requests from IO
> > > > scheduler, but if there are no requests in IO scheduler now, it will
> > > > return 'false' here, and set dispatch_ret as false.
> > > >
> > > > else
> > > > - blk_mq_do_dispatch_ctx(hctx);
> > > > + dispatch_ret = blk_mq_do_dispatch_ctx(hctx);
> > >
> > > OK, this one is an issue, but it can be fixed simply by not updating
> > > 'dispatch_ret' for the following dispatch, something like the below
> > > way:
> > >
> > > if (dispatch_ret) {
> > > if (has_sched_dispatch)
> > > blk_mq_do_dispatch_sched(hctx);
> > > else
> > > blk_mq_do_dispatch_ctx(hctx);
> > > }
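
Just to spell out my understanding of that fix: below is a rough sketch of how
the tail of blk_mq_sched_dispatch_requests() would look with it applied. It only
restates your snippet with the reasoning as comments, and it assumes
'dispatch_ret' is later used to decide whether a final .commit_rqs() is still
needed; the rest of the patch is omitted.

	dispatch_ret = blk_mq_dispatch_rq_list(q, &rq_list, false);
	if (dispatch_ret) {
		/*
		 * At least one request from the hctx dispatch list has been
		 * queued to the driver. A 'false' return from the calls below
		 * only means the scheduler (or the sw ctx lists) ran out of
		 * requests, not that nothing was dispatched, so it must not
		 * clobber dispatch_ret.
		 */
		if (has_sched_dispatch)
			blk_mq_do_dispatch_sched(hctx);
		else
			blk_mq_do_dispatch_ctx(hctx);
	}
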
> >
> > Yes, this can work.
> >
> > But I found your patch drops some performance compared with my
> > method in patch 1/2. My method can fetch several requests from the IO
> > scheduler and dispatch them to the block driver at one time, but with your
> > patch we still need to dispatch requests one by one, which I think costs
> > some performance.
> > What do you think? Thanks.
>
> Please run your test and see if performance drop can be observed.

From my testing (using the same fio configuration as in the cover letter), I
found your method drops some performance; see the data below.

My original patches:
Sequential read: 229.6MiB/s
Random read: 180.8MiB/s
Sequential write: 172MiB/s
Random write: 169.2MiB/s

Your patches:
Sequential read: 209MiB/s
Random read: 177MiB/s
Sequential write: 148MiB/s
Random write: 147MiB/s

--
Baolin Wang

2020-03-27 08:31:26

by Baolin Wang

[permalink] [raw]
Subject: Re: [RESEND RFC PATCH 2/8] block: Allow sending a batch of requests from the scheduler to hardware

Hi Ming,

On Tue, Mar 24, 2020 at 4:29 PM Baolin Wang <[email protected]> wrote:
>
> Hi Ming,
>
> On Mon, Mar 23, 2020 at 5:58 PM Ming Lei <[email protected]> wrote:
> >
> > On Mon, Mar 23, 2020 at 05:13:27PM +0800, Baolin Wang wrote:
> > > On Mon, Mar 23, 2020 at 4:29 PM Ming Lei <[email protected]> wrote:
> > > >
> > > > On Mon, Mar 23, 2020 at 04:22:38PM +0800, Baolin Wang wrote:
> > > > > On Mon, Mar 23, 2020 at 3:27 PM Ming Lei <[email protected]> wrote:
> > > > > >
> > > > > > On Mon, Mar 23, 2020 at 01:36:34PM +0800, Baolin Wang wrote:
> > > > > > > On Mon, Mar 23, 2020 at 11:44 AM Ming Lei <[email protected]> wrote:
> > > > > > > >
> > > > > > > > On Fri, Mar 20, 2020 at 06:27:41PM +0800, Baolin Wang wrote:
> > > > > > > > > Hi Ming,
> > > > > > > > >
> > > > > > > > > On Wed, Mar 18, 2020 at 6:26 PM Baolin Wang <[email protected]> wrote:
> > > > > > > > > >
> > > > > > > > > > Hi Ming,
> > > > > > > > > >
> > > > > > > > > > On Wed, Mar 18, 2020 at 6:01 PM Ming Lei <[email protected]> wrote:
> > > > > > > > > > >
> > > > > > > > > > > On Mon, Mar 16, 2020 at 06:01:19PM +0800, Baolin Wang wrote:
> > > > > > > > > > > > As we know, some SD/MMC host controllers can support packed request,
> > > > > > > > > > > > that means we can package several requests to host controller at one
> > > > > > > > > > > > time to improve performance. So the hardware driver expects the blk-mq
> > > > > > > > > > > > can dispatch a batch of requests at one time, and driver can use bd.last
> > > > > > > > > > > > to indicate if it is the last request in the batch to help to combine
> > > > > > > > > > > > requests as much as possible.
> > > > > > > > > > > >
> > > > > > > > > > > > Thus we should add batch requests setting from the block driver to tell
> > > > > > > > > > > > the scheduler how many requests can be dispatched in a batch, as well
> > > > > > > > > > > > as changing the scheduler to dispatch more than one request if setting
> > > > > > > > > > > > the maximum batch requests number.
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > I feel this batch dispatch style is more complicated, and some other
> > > > > > > > > > > drivers(virtio blk/scsi) still may get benefit if we can pass real 'last' flag in
> > > > > > > > > > > .queue_rq().
> > > > > > > > > > >
> > > > > > > > > > > So what about the following way by extending .commit_rqs() to this usage?
> > > > > > > > > > > And you can do whatever batch processing in .commit_rqs() which will be
> > > > > > > > > > > guaranteed to be called if BLK_MQ_F_FORCE_COMMIT_RQS is set by driver.
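
(Just to illustrate how a driver could use that, here is a rough driver-side
sketch; the my_* names are made-up placeholders, only .queue_rq()/.commit_rqs()
and bd->last are existing blk-mq interfaces, and BLK_MQ_F_FORCE_COMMIT_RQS is
the new flag proposed above:)

	static blk_status_t my_queue_rq(struct blk_mq_hw_ctx *hctx,
					const struct blk_mq_queue_data *bd)
	{
		struct my_host *host = hctx->queue->queuedata;

		/* Collect the request into the pending packed batch. */
		my_host_add_to_batch(host, bd->rq);

		/*
		 * If blk-mq says this is the last request for now, issue the
		 * packed command immediately; otherwise wait for .commit_rqs().
		 */
		if (bd->last)
			my_host_issue_packed_cmd(host);

		return BLK_STS_OK;
	}

	static void my_commit_rqs(struct blk_mq_hw_ctx *hctx)
	{
		struct my_host *host = hctx->queue->queuedata;

		/* Flush whatever was batched but not yet sent to the controller. */
		my_host_issue_packed_cmd(host);
	}

	static const struct blk_mq_ops my_mq_ops = {
		.queue_rq	= my_queue_rq,
		.commit_rqs	= my_commit_rqs,
	};

	/* plus BLK_MQ_F_FORCE_COMMIT_RQS in the tag_set flags when setting it up */
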
> > > > > > > > > >
> > > > > > > > > > I really appreciate your good suggestion, which is much simpler than mine.
> > > > > > > > > > It seems to solve my problem, and I will try it on my platform to see
> > > > > > > > > > if it can work and give you the feedback. Thanks again.
> > > > > > > > >
> > > > > > > > > I tried your approach on my platform, but met some problems, see below.
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > > diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
> > > > > > > > > > > index 856356b1619e..cd2bbe56f83f 100644
> > > > > > > > > > > --- a/block/blk-mq-sched.c
> > > > > > > > > > > +++ b/block/blk-mq-sched.c
> > > > > > > > > > > @@ -85,11 +85,12 @@ void blk_mq_sched_restart(struct blk_mq_hw_ctx *hctx)
> > > > > > > > > > > * its queue by itself in its completion handler, so we don't need to
> > > > > > > > > > > * restart queue if .get_budget() returns BLK_STS_NO_RESOURCE.
> > > > > > > > > > > */
> > > > > > > > > > > -static void blk_mq_do_dispatch_sched(struct blk_mq_hw_ctx *hctx)
> > > > > > > > > > > +static bool blk_mq_do_dispatch_sched(struct blk_mq_hw_ctx *hctx)
> > > > > > > > > > > {
> > > > > > > > > > > struct request_queue *q = hctx->queue;
> > > > > > > > > > > struct elevator_queue *e = q->elevator;
> > > > > > > > > > > LIST_HEAD(rq_list);
> > > > > > > > > > > + bool ret = false;
> > > > > > > > > > >
> > > > > > > > > > > do {
> > > > > > > > > > > struct request *rq;
> > > > > > > > > > > @@ -112,7 +113,10 @@ static void blk_mq_do_dispatch_sched(struct blk_mq_hw_ctx *hctx)
> > > > > > > > > > > * in blk_mq_dispatch_rq_list().
> > > > > > > > > > > */
> > > > > > > > > > > list_add(&rq->queuelist, &rq_list);
> > > > > > > > > > > - } while (blk_mq_dispatch_rq_list(q, &rq_list, true));
> > > > > > > > > > > + ret = blk_mq_dispatch_rq_list(q, &rq_list, true);
> > > > > > > > > > > + } while (ret);
> > > > > > > > > > > +
> > > > > > > > > > > + return ret;
> > > > > > > > > > > }
> > > > > > > > > > >
> > > > > > > > > > > static struct blk_mq_ctx *blk_mq_next_ctx(struct blk_mq_hw_ctx *hctx,
> > > > > > > > > > > @@ -131,11 +135,12 @@ static struct blk_mq_ctx *blk_mq_next_ctx(struct blk_mq_hw_ctx *hctx,
> > > > > > > > > > > * its queue by itself in its completion handler, so we don't need to
> > > > > > > > > > > * restart queue if .get_budget() returns BLK_STS_NO_RESOURCE.
> > > > > > > > > > > */
> > > > > > > > > > > -static void blk_mq_do_dispatch_ctx(struct blk_mq_hw_ctx *hctx)
> > > > > > > > > > > +static bool blk_mq_do_dispatch_ctx(struct blk_mq_hw_ctx *hctx)
> > > > > > > > > > > {
> > > > > > > > > > > struct request_queue *q = hctx->queue;
> > > > > > > > > > > LIST_HEAD(rq_list);
> > > > > > > > > > > struct blk_mq_ctx *ctx = READ_ONCE(hctx->dispatch_from);
> > > > > > > > > > > + bool ret = false;
> > > > > > > > > > >
> > > > > > > > > > > do {
> > > > > > > > > > > struct request *rq;
> > > > > > > > > > > @@ -161,10 +166,12 @@ static void blk_mq_do_dispatch_ctx(struct blk_mq_hw_ctx *hctx)
> > > > > > > > > > >
> > > > > > > > > > > /* round robin for fair dispatch */
> > > > > > > > > > > ctx = blk_mq_next_ctx(hctx, rq->mq_ctx);
> > > > > > > > > > > -
> > > > > > > > > > > - } while (blk_mq_dispatch_rq_list(q, &rq_list, true));
> > > > > > > > > > > + ret = blk_mq_dispatch_rq_list(q, &rq_list, true);
> > > > > > > > > > > + } while (ret);
> > > > > > > > > > >
> > > > > > > > > > > WRITE_ONCE(hctx->dispatch_from, ctx);
> > > > > > > > > > > +
> > > > > > > > > > > + return ret;
> > > > > > > > > > > }
> > > > > > > > > > >
> > > > > > > > > > > void blk_mq_sched_dispatch_requests(struct blk_mq_hw_ctx *hctx)
> > > > > > > > > > > @@ -173,6 +180,7 @@ void blk_mq_sched_dispatch_requests(struct blk_mq_hw_ctx *hctx)
> > > > > > > > > > > struct elevator_queue *e = q->elevator;
> > > > > > > > > > > const bool has_sched_dispatch = e && e->type->ops.dispatch_request;
> > > > > > > > > > > LIST_HEAD(rq_list);
> > > > > > > > > > > + bool dispatch_ret;
> > > > > > > > > > >
> > > > > > > > > > > /* RCU or SRCU read lock is needed before checking quiesced flag */
> > > > > > > > > > > if (unlikely(blk_mq_hctx_stopped(hctx) || blk_queue_quiesced(q)))
> > > > > > > > > > > @@ -206,20 +214,26 @@ void blk_mq_sched_dispatch_requests(struct blk_mq_hw_ctx *hctx)
> > > > > > > > > > > */
> > > > > > > > > > > if (!list_empty(&rq_list)) {
> > > > > > > > > > > blk_mq_sched_mark_restart_hctx(hctx);
> > > > > > > > > > > - if (blk_mq_dispatch_rq_list(q, &rq_list, false)) {
> > > > > > > > > > > + dispatch_ret = blk_mq_dispatch_rq_list(q, &rq_list, false);
> > > > > > > > > > > + if (dispatch_ret) {
> > > > > > > > > > > if (has_sched_dispatch)
> > > > > > > > > > > - blk_mq_do_dispatch_sched(hctx);
> > > > > > > > > > > + dispatch_ret = blk_mq_do_dispatch_sched(hctx);
> > > > > > > > >
> > > > > > > > > If we dispatched a request successfully by blk_mq_dispatch_rq_list(),
> > > > > > > > > and got dispatch_ret = true now. Then we will try to dispatch more
> > > > > > > > > requests from the scheduler by blk_mq_do_dispatch_sched(), but if there are no
> > > > > > > > > more requests in the scheduler, then we will get dispatch_ret = false. In
> > > > > > > >
> > > > > > > > 'dispatch_ret' always holds result of the last blk_mq_do_dispatch_sched().
> > > > > > > > When any one request has been dispatched successfully, 'dispatch_ret'
> > > > > > > > is true. New request is always added to list before calling
> > > > > > > > blk_mq_do_dispatch_sched(), so once blk_mq_do_dispatch_sched() returns
> > > > > > > > false, it means that .commit_rqs() has been called.
> > > > > > >
> > > > > > > Not really, if there are no requests in the IO scheduler, we will break the loop
> > > > > > > in blk_mq_do_dispatch_sched() and return false without calling
> > > > > > > .commit_rqs().
> > > > > >
> > > > > > If there isn't any request to dispatch, false is returned. Otherwise,
> > > > > > always return the return value of last 'blk_mq_dispatch_rq_list'.
> > > > > >
> > > > > > >
> > > > > > > So in this case, blk_mq_do_dispatch_sched() will return 'false', which
> > > > > > > overlapped the return value of 'true' from blk_mq_dispatch_rq_list(),
> > > > > > > and did not call .commit_rqs(). Then the IO processing will be stuck.
> > > > > >
> > > > > > See below.
> > > > > >
> > > > > > >
> > > > > > > static void blk_mq_do_dispatch_sched(struct blk_mq_hw_ctx *hctx)
> > > > > > > {
> > > > > > > struct request_queue *q = hctx->queue;
> > > > > > > struct elevator_queue *e = q->elevator;
> > > > > > > LIST_HEAD(rq_list);
> > > > > > > bool ret = false;
> > > > > >
> > > > > > The above initialization is just done once.
> > > > > >
> > > > > > >
> > > > > > > do {
> > > > > > > struct request *rq;
> > > > > > >
> > > > > > > if (e->type->ops.has_work && !e->type->ops.has_work(hctx))
> > > > > > > break;
> > > > > > >
> > > > > > > .......
> > > > > > ret = blk_mq_dispatch_rq_list(q, list, ...);
> > > > > >
> > > > > > list includes one request, so blk_mq_dispatch_rq_list() won't return
> > > > > > false in case of no request in list.
> > > > > >
> > > > > > > } while (ret);
> > > > > > >
> > > > > > > return ret;
> > > > > >
> > > > > > 'ret' is always updated by return value of last blk_mq_dispatch_rq_list()
> > > > > > if at least one request is dispatched. So if it becomes false, the loop
> > > > > > breaks, that means .commit_rqs() has been called cause 'list' does
> > > > > > include one request for blk_mq_dispatch_rq_list(). Otherwise, true is
> > > > > > still returned.
> > > > >
> > > > > Sorry for my confusing description, let me try again to describe the problem.
> > > > > When I try to mount the block device, I got the IO stuck with your
> > > > > patch, and I did some debugging. I found we missed calling
> > > > > commit_rqs() for one case:
> > > > >
> > > > > void blk_mq_sched_dispatch_requests(struct blk_mq_hw_ctx *hctx)
> > > > > @@ -173,6 +180,7 @@ void blk_mq_sched_dispatch_requests(struct
> > > > > blk_mq_hw_ctx *hctx)
> > > > > struct elevator_queue *e = q->elevator;
> > > > > const bool has_sched_dispatch = e && e->type->ops.dispatch_request;
> > > > > LIST_HEAD(rq_list);
> > > > > + bool dispatch_ret;
> > > > >
> > > > > /* RCU or SRCU read lock is needed before checking quiesced flag */
> > > > > if (unlikely(blk_mq_hctx_stopped(hctx) || blk_queue_quiesced(q)))
> > > > > @@ -206,20 +214,26 @@ void blk_mq_sched_dispatch_requests(struct
> > > > > blk_mq_hw_ctx *hctx)
> > > > > */
> > > > > if (!list_empty(&rq_list)) {
> > > > > blk_mq_sched_mark_restart_hctx(hctx);
> > > > > - if (blk_mq_dispatch_rq_list(q, &rq_list, false)) {
> > > > > + dispatch_ret = blk_mq_dispatch_rq_list(q, &rq_list, false);
> > > > >
> > > > > Suppose we dispatch one request to block driver, and return 'true' here.
> > > > >
> > > > > + if (dispatch_ret) {
> > > > > if (has_sched_dispatch)
> > > > > - blk_mq_do_dispatch_sched(hctx);
> > > > > + dispatch_ret = blk_mq_do_dispatch_sched(hctx);
> > > > >
> > > > > Then we will continue to try to dispatch more requests from IO
> > > > > scheduler, but if there are no requests in IO scheduler now, it will
> > > > > return 'false' here, and set dispatch_ret as false.
> > > > >
> > > > > else
> > > > > - blk_mq_do_dispatch_ctx(hctx);
> > > > > + dispatch_ret = blk_mq_do_dispatch_ctx(hctx);
> > > >
> > > > OK, this one is an issue, but it can be fixed simply by not updating
> > > > 'dispatch_ret' for the following dispatch, something like the below
> > > > way:
> > > >
> > > > if (dispatch_ret) {
> > > > if (has_sched_dispatch)
> > > > blk_mq_do_dispatch_sched(hctx);
> > > > else
> > > > blk_mq_do_dispatch_ctx(hctx);
> > > > }
> > >
> > > Yes, this can work.
> > >
> > > But I found your patch will drop some performance comparing with my
> > > method in patch 1/2. My method can fetch several requests from IO
> > > scheduler and dispatch them to block driver at one time, but in your
> > > patch we still need dispatch request one by one, which will drop some
> > > performance I think.
> > > What do you think? Thanks.
> >
> > Please run your test and see if performance drop can be observed.
>
> From my testing (using the same fio configuration as in the cover letter), I
> found your method drops some performance; see the data below.
>
> My original patches:
> Sequential read: 229.6MiB/s
> Random read: 180.8MiB/s
> Sequential write: 172MiB/s
> Random write: 169.2MiB/s
>
> Your patches:
> Sequential read: 209MiB/s
> Random read: 177MiB/s
> Sequential write: 148MiB/s
> Random write: 147MiB/s

After some optimization and more testing, I did not find any
performance issue with your patch compared with my old method. Sorry
for the noise in my last email.

So will you send out a formal patch? If yes, please add my Tested-by
tag. Thanks for your help.
Tested-by: Baolin Wang <[email protected]>

--
Baolin Wang

2020-04-22 09:23:28

by Baolin Wang

[permalink] [raw]
Subject: Re: [RESEND RFC PATCH 2/8] block: Allow sending a batch of requests from the scheduler to hardware

Hi Ming,

On Fri, Mar 27, 2020 at 4:30 PM Baolin Wang <[email protected]> wrote:
>
> Hi Ming,
>
> On Tue, Mar 24, 2020 at 4:29 PM Baolin Wang <[email protected]> wrote:
> >
> > Hi Ming,
> >
> > On Mon, Mar 23, 2020 at 5:58 PM Ming Lei <[email protected]> wrote:
> > >
> > > On Mon, Mar 23, 2020 at 05:13:27PM +0800, Baolin Wang wrote:
> > > > On Mon, Mar 23, 2020 at 4:29 PM Ming Lei <[email protected]> wrote:
> > > > >
> > > > > On Mon, Mar 23, 2020 at 04:22:38PM +0800, Baolin Wang wrote:
> > > > > > On Mon, Mar 23, 2020 at 3:27 PM Ming Lei <[email protected]> wrote:
> > > > > > >
> > > > > > > On Mon, Mar 23, 2020 at 01:36:34PM +0800, Baolin Wang wrote:
> > > > > > > > On Mon, Mar 23, 2020 at 11:44 AM Ming Lei <[email protected]> wrote:
> > > > > > > > >
> > > > > > > > > On Fri, Mar 20, 2020 at 06:27:41PM +0800, Baolin Wang wrote:
> > > > > > > > > > Hi Ming,
> > > > > > > > > >
> > > > > > > > > > On Wed, Mar 18, 2020 at 6:26 PM Baolin Wang <[email protected]> wrote:
> > > > > > > > > > >
> > > > > > > > > > > Hi Ming,
> > > > > > > > > > >
> > > > > > > > > > > On Wed, Mar 18, 2020 at 6:01 PM Ming Lei <[email protected]> wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > On Mon, Mar 16, 2020 at 06:01:19PM +0800, Baolin Wang wrote:
> > > > > > > > > > > > > As we know, some SD/MMC host controllers can support packed request,
> > > > > > > > > > > > > that means we can package several requests to host controller at one
> > > > > > > > > > > > > time to improve performence. So the hardware driver expects the blk-mq
> > > > > > > > > > > > > can dispatch a batch of requests at one time, and driver can use bd.last
> > > > > > > > > > > > > to indicate if it is the last request in the batch to help to combine
> > > > > > > > > > > > > requests as much as possible.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Thus we should add batch requests setting from the block driver to tell
> > > > > > > > > > > > > the scheduler how many requests can be dispatched in a batch, as well
> > > > > > > > > > > > > as changing the scheduler to dispatch more than one request if setting
> > > > > > > > > > > > > the maximum batch requests number.
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > I feel this batch dispatch style is more complicated, and some other
> > > > > > > > > > > > drivers(virtio blk/scsi) still may get benefit if we can pass real 'last' flag in
> > > > > > > > > > > > .queue_rq().
> > > > > > > > > > > >
> > > > > > > > > > > > So what about the following way by extending .commit_rqs() to this usage?
> > > > > > > > > > > > And you can do whatever batch processing in .commit_rqs() which will be
> > > > > > > > > > > > guaranteed to be called if BLK_MQ_F_FORCE_COMMIT_RQS is set by driver.
> > > > > > > > > > >
> > > > > > > > > > > I'm very appreciated for your good suggestion, which is much simpler than mine.
> > > > > > > > > > > It seems to solve my problem, and I will try it on my platform to see
> > > > > > > > > > > if it can work and give you the feadback. Thanks again.
> > > > > > > > > >
> > > > > > > > > > I tried your approach on my platform, but met some problems, see below.
> > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > > diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
> > > > > > > > > > > > index 856356b1619e..cd2bbe56f83f 100644
> > > > > > > > > > > > --- a/block/blk-mq-sched.c
> > > > > > > > > > > > +++ b/block/blk-mq-sched.c
> > > > > > > > > > > > @@ -85,11 +85,12 @@ void blk_mq_sched_restart(struct blk_mq_hw_ctx *hctx)
> > > > > > > > > > > > * its queue by itself in its completion handler, so we don't need to
> > > > > > > > > > > > * restart queue if .get_budget() returns BLK_STS_NO_RESOURCE.
> > > > > > > > > > > > */
> > > > > > > > > > > > -static void blk_mq_do_dispatch_sched(struct blk_mq_hw_ctx *hctx)
> > > > > > > > > > > > +static bool blk_mq_do_dispatch_sched(struct blk_mq_hw_ctx *hctx)
> > > > > > > > > > > > {
> > > > > > > > > > > > struct request_queue *q = hctx->queue;
> > > > > > > > > > > > struct elevator_queue *e = q->elevator;
> > > > > > > > > > > > LIST_HEAD(rq_list);
> > > > > > > > > > > > + bool ret = false;
> > > > > > > > > > > >
> > > > > > > > > > > > do {
> > > > > > > > > > > > struct request *rq;
> > > > > > > > > > > > @@ -112,7 +113,10 @@ static void blk_mq_do_dispatch_sched(struct blk_mq_hw_ctx *hctx)
> > > > > > > > > > > > * in blk_mq_dispatch_rq_list().
> > > > > > > > > > > > */
> > > > > > > > > > > > list_add(&rq->queuelist, &rq_list);
> > > > > > > > > > > > - } while (blk_mq_dispatch_rq_list(q, &rq_list, true));
> > > > > > > > > > > > + ret = blk_mq_dispatch_rq_list(q, &rq_list, true);
> > > > > > > > > > > > + } while (ret);
> > > > > > > > > > > > +
> > > > > > > > > > > > + return ret;
> > > > > > > > > > > > }
> > > > > > > > > > > >
> > > > > > > > > > > > static struct blk_mq_ctx *blk_mq_next_ctx(struct blk_mq_hw_ctx *hctx,
> > > > > > > > > > > > @@ -131,11 +135,12 @@ static struct blk_mq_ctx *blk_mq_next_ctx(struct blk_mq_hw_ctx *hctx,
> > > > > > > > > > > > * its queue by itself in its completion handler, so we don't need to
> > > > > > > > > > > > * restart queue if .get_budget() returns BLK_STS_NO_RESOURCE.
> > > > > > > > > > > > */
> > > > > > > > > > > > -static void blk_mq_do_dispatch_ctx(struct blk_mq_hw_ctx *hctx)
> > > > > > > > > > > > +static bool blk_mq_do_dispatch_ctx(struct blk_mq_hw_ctx *hctx)
> > > > > > > > > > > > {
> > > > > > > > > > > > struct request_queue *q = hctx->queue;
> > > > > > > > > > > > LIST_HEAD(rq_list);
> > > > > > > > > > > > struct blk_mq_ctx *ctx = READ_ONCE(hctx->dispatch_from);
> > > > > > > > > > > > + bool ret = false;
> > > > > > > > > > > >
> > > > > > > > > > > > do {
> > > > > > > > > > > > struct request *rq;
> > > > > > > > > > > > @@ -161,10 +166,12 @@ static void blk_mq_do_dispatch_ctx(struct blk_mq_hw_ctx *hctx)
> > > > > > > > > > > >
> > > > > > > > > > > > /* round robin for fair dispatch */
> > > > > > > > > > > > ctx = blk_mq_next_ctx(hctx, rq->mq_ctx);
> > > > > > > > > > > > -
> > > > > > > > > > > > - } while (blk_mq_dispatch_rq_list(q, &rq_list, true));
> > > > > > > > > > > > + ret = blk_mq_dispatch_rq_list(q, &rq_list, true);
> > > > > > > > > > > > + } while (ret);
> > > > > > > > > > > >
> > > > > > > > > > > > WRITE_ONCE(hctx->dispatch_from, ctx);
> > > > > > > > > > > > +
> > > > > > > > > > > > + return ret;
> > > > > > > > > > > > }
> > > > > > > > > > > >
> > > > > > > > > > > > void blk_mq_sched_dispatch_requests(struct blk_mq_hw_ctx *hctx)
> > > > > > > > > > > > @@ -173,6 +180,7 @@ void blk_mq_sched_dispatch_requests(struct blk_mq_hw_ctx *hctx)
> > > > > > > > > > > > struct elevator_queue *e = q->elevator;
> > > > > > > > > > > > const bool has_sched_dispatch = e && e->type->ops.dispatch_request;
> > > > > > > > > > > > LIST_HEAD(rq_list);
> > > > > > > > > > > > + bool dispatch_ret;
> > > > > > > > > > > >
> > > > > > > > > > > > /* RCU or SRCU read lock is needed before checking quiesced flag */
> > > > > > > > > > > > if (unlikely(blk_mq_hctx_stopped(hctx) || blk_queue_quiesced(q)))
> > > > > > > > > > > > @@ -206,20 +214,26 @@ void blk_mq_sched_dispatch_requests(struct blk_mq_hw_ctx *hctx)
> > > > > > > > > > > > */
> > > > > > > > > > > > if (!list_empty(&rq_list)) {
> > > > > > > > > > > > blk_mq_sched_mark_restart_hctx(hctx);
> > > > > > > > > > > > - if (blk_mq_dispatch_rq_list(q, &rq_list, false)) {
> > > > > > > > > > > > + dispatch_ret = blk_mq_dispatch_rq_list(q, &rq_list, false);
> > > > > > > > > > > > + if (dispatch_ret) {
> > > > > > > > > > > > if (has_sched_dispatch)
> > > > > > > > > > > > - blk_mq_do_dispatch_sched(hctx);
> > > > > > > > > > > > + dispatch_ret = blk_mq_do_dispatch_sched(hctx);
> > > > > > > > > >
> > > > > > > > > > If we dispatched a request successfully by blk_mq_dispatch_rq_list(),
> > > > > > > > > > and got dispatch_ret = true now. Then we will try to dispatch more
> > > > > > > > > > reuqests from scheduler by blk_mq_do_dispatch_sched(), but if now no
> > > > > > > > > > more requests in scheduler, then we will got dispatch_ret = false. In
> > > > > > > > >
> > > > > > > > > 'dispatch_ret' always holds result of the last blk_mq_do_dispatch_sched().
> > > > > > > > > When any one request has been dispatched successfully, 'dispatch_ret'
> > > > > > > > > is true. New request is always added to list before calling
> > > > > > > > > blk_mq_do_dispatch_sched(), so once blk_mq_do_dispatch_sched() returns
> > > > > > > > > false, it means that .commit_rqs() has been called.
> > > > > > > >
> > > > > > > > Not really, if there are no requests in the IO scheduler, we will break the loop
> > > > > > > > in blk_mq_do_dispatch_sched() and return false without calling
> > > > > > > > .commit_rqs().
> > > > > > >
> > > > > > > If there isn't any request to dispatch, false is returned. Otherwise,
> > > > > > > always return the return value of last 'blk_mq_dispatch_rq_list'.
> > > > > > >
> > > > > > > >
> > > > > > > > So in this case, blk_mq_do_dispatch_sched() will return 'false', which
> > > > > > > > overlapped the return value of 'true' from blk_mq_dispatch_rq_list(),
> > > > > > > > and did not call .commit_rqs(). Then the IO processing will be stuck.
> > > > > > >
> > > > > > > See below.
> > > > > > >
> > > > > > > >
> > > > > > > > static void blk_mq_do_dispatch_sched(struct blk_mq_hw_ctx *hctx)
> > > > > > > > {
> > > > > > > > struct request_queue *q = hctx->queue;
> > > > > > > > struct elevator_queue *e = q->elevator;
> > > > > > > > LIST_HEAD(rq_list);
> > > > > > > > bool ret = false;
> > > > > > >
> > > > > > > The above initialization is just done once.
> > > > > > >
> > > > > > > >
> > > > > > > > do {
> > > > > > > > struct request *rq;
> > > > > > > >
> > > > > > > > if (e->type->ops.has_work && !e->type->ops.has_work(hctx))
> > > > > > > > break;
> > > > > > > >
> > > > > > > > .......
> > > > > > > ret = blk_mq_dispatch_rq_list(q, list, ...);
> > > > > > >
> > > > > > > list includes one request, so blk_mq_dispatch_rq_list() won't return
> > > > > > > false in case of no request in list.
> > > > > > >
> > > > > > > > } while (ret);
> > > > > > > >
> > > > > > > > return ret;
> > > > > > >
> > > > > > > 'ret' is always updated by return value of last blk_mq_dispatch_rq_list()
> > > > > > > if at least one request is dispatched. So if it becomes false, the loop
> > > > > > > breaks, that means .commit_rqs() has been called cause 'list' does
> > > > > > > include one request for blk_mq_dispatch_rq_list(). Otherwise, true is
> > > > > > > still returned.
> > > > > >
> > > > > > Sorry for my confusing description, let me try again to describe the problem.
> > > > > > When I try to mount the block device, I got the IO stuck with your
> > > > > > patch, and I did some debugging. I found we missed calling
> > > > > > commit_rqs() for one case:
> > > > > >
> > > > > > void blk_mq_sched_dispatch_requests(struct blk_mq_hw_ctx *hctx)
> > > > > > @@ -173,6 +180,7 @@ void blk_mq_sched_dispatch_requests(struct
> > > > > > blk_mq_hw_ctx *hctx)
> > > > > > struct elevator_queue *e = q->elevator;
> > > > > > const bool has_sched_dispatch = e && e->type->ops.dispatch_request;
> > > > > > LIST_HEAD(rq_list);
> > > > > > + bool dispatch_ret;
> > > > > >
> > > > > > /* RCU or SRCU read lock is needed before checking quiesced flag */
> > > > > > if (unlikely(blk_mq_hctx_stopped(hctx) || blk_queue_quiesced(q)))
> > > > > > @@ -206,20 +214,26 @@ void blk_mq_sched_dispatch_requests(struct
> > > > > > blk_mq_hw_ctx *hctx)
> > > > > > */
> > > > > > if (!list_empty(&rq_list)) {
> > > > > > blk_mq_sched_mark_restart_hctx(hctx);
> > > > > > - if (blk_mq_dispatch_rq_list(q, &rq_list, false)) {
> > > > > > + dispatch_ret = blk_mq_dispatch_rq_list(q, &rq_list, false);
> > > > > >
> > > > > > Suppose we dispatch one request to block driver, and return 'true' here.
> > > > > >
> > > > > > + if (dispatch_ret) {
> > > > > > if (has_sched_dispatch)
> > > > > > - blk_mq_do_dispatch_sched(hctx);
> > > > > > + dispatch_ret = blk_mq_do_dispatch_sched(hctx);
> > > > > >
> > > > > > Then we will continue to try to dispatch more requests from IO
> > > > > > scheduler, but if there are no requests in IO scheduler now, it will
> > > > > > return 'false' here, and set dispatch_ret as false.
> > > > > >
> > > > > > else
> > > > > > - blk_mq_do_dispatch_ctx(hctx);
> > > > > > + dispatch_ret = blk_mq_do_dispatch_ctx(hctx);
> > > > >
> > > > > OK, this one is an issue, but it can be fixed simply by not updating
> > > > > 'dispatch_ret' for the following dispatch, something like the below
> > > > > way:
> > > > >
> > > > > if (dispatch_ret) {
> > > > > if (has_sched_dispatch)
> > > > > blk_mq_do_dispatch_sched(hctx);
> > > > > else
> > > > > blk_mq_do_dispatch_ctx(hctx);
> > > > > }
> > > >
> > > > Yes, this can work.
> > > >
> > > > But I found your patch will drop some performance comparing with my
> > > > method in patch 1/2. My method can fetch several requests from IO
> > > > scheduler and dispatch them to block driver at one time, but in your
> > > > patch we still need dispatch request one by one, which will drop some
> > > > performance I think.
> > > > What do you think? Thanks.
> > >
> > > Please run your test and see if performance drop can be observed.
> >
> > From my testing (using the same fio configuration as in the cover letter), I
> > found your method drops some performance; see the data below.
> >
> > My original patches:
> > Sequential read: 229.6MiB/s
> > Random read: 180.8MiB/s
> > Sequential write: 172MiB/s
> > Random write: 169.2MiB/s
> >
> > Your patches:
> > Sequential read: 209MiB/s
> > Random read: 177MiB/s
> > Sequential write: 148MiB/s
> > Random write: 147MiB/s
>
> After some optimization and more testing, I did not find any
> performance issue with your patch compared with my old method. Sorry
> for the noise in my last email.
>
> So will you send out a formal patch? If yes, please add my Tested-by
> tag. Thanks for your help.
> Tested-by: Baolin Wang <[email protected]>

Can I take this patch into my patch set, with your permission? Or do you
want to send it out by yourself? Thanks.

--
Baolin Wang

2020-04-22 09:28:45

by Ming Lei

[permalink] [raw]
Subject: Re: [RESEND RFC PATCH 2/8] block: Allow sending a batch of requests from the scheduler to hardware

On Wed, Apr 22, 2020 at 05:21:01PM +0800, Baolin Wang wrote:
> Hi Ming,
>
> On Fri, Mar 27, 2020 at 4:30 PM Baolin Wang <[email protected]> wrote:
> >
> > Hi Ming,
> >
> > On Tue, Mar 24, 2020 at 4:29 PM Baolin Wang <[email protected]> wrote:
> > >
> > > Hi Ming,
> > >
> > > On Mon, Mar 23, 2020 at 5:58 PM Ming Lei <[email protected]> wrote:
> > > >
> > > > On Mon, Mar 23, 2020 at 05:13:27PM +0800, Baolin Wang wrote:
> > > > > On Mon, Mar 23, 2020 at 4:29 PM Ming Lei <[email protected]> wrote:
> > > > > >
> > > > > > On Mon, Mar 23, 2020 at 04:22:38PM +0800, Baolin Wang wrote:
> > > > > > > On Mon, Mar 23, 2020 at 3:27 PM Ming Lei <[email protected]> wrote:
> > > > > > > >
> > > > > > > > On Mon, Mar 23, 2020 at 01:36:34PM +0800, Baolin Wang wrote:
> > > > > > > > > On Mon, Mar 23, 2020 at 11:44 AM Ming Lei <[email protected]> wrote:
> > > > > > > > > >
> > > > > > > > > > On Fri, Mar 20, 2020 at 06:27:41PM +0800, Baolin Wang wrote:
> > > > > > > > > > > Hi Ming,
> > > > > > > > > > >
> > > > > > > > > > > On Wed, Mar 18, 2020 at 6:26 PM Baolin Wang <[email protected]> wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > Hi Ming,
> > > > > > > > > > > >
> > > > > > > > > > > > On Wed, Mar 18, 2020 at 6:01 PM Ming Lei <[email protected]> wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > On Mon, Mar 16, 2020 at 06:01:19PM +0800, Baolin Wang wrote:
> > > > > > > > > > > > > > As we know, some SD/MMC host controllers can support packed request,
> > > > > > > > > > > > > > that means we can package several requests to host controller at one
> > > > > > > > > > > > > > time to improve performence. So the hardware driver expects the blk-mq
> > > > > > > > > > > > > > can dispatch a batch of requests at one time, and driver can use bd.last
> > > > > > > > > > > > > > to indicate if it is the last request in the batch to help to combine
> > > > > > > > > > > > > > requests as much as possible.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Thus we should add batch requests setting from the block driver to tell
> > > > > > > > > > > > > > the scheduler how many requests can be dispatched in a batch, as well
> > > > > > > > > > > > > > as changing the scheduler to dispatch more than one request if setting
> > > > > > > > > > > > > > the maximum batch requests number.
> > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > I feel this batch dispatch style is more complicated, and some other
> > > > > > > > > > > > > drivers(virtio blk/scsi) still may get benefit if we can pass real 'last' flag in
> > > > > > > > > > > > > .queue_rq().
> > > > > > > > > > > > >
> > > > > > > > > > > > > So what about the following way by extending .commit_rqs() to this usage?
> > > > > > > > > > > > > And you can do whatever batch processing in .commit_rqs() which will be
> > > > > > > > > > > > > guaranteed to be called if BLK_MQ_F_FORCE_COMMIT_RQS is set by driver.
> > > > > > > > > > > >
> > > > > > > > > > > > I'm very appreciated for your good suggestion, which is much simpler than mine.
> > > > > > > > > > > > It seems to solve my problem, and I will try it on my platform to see
> > > > > > > > > > > > if it can work and give you the feadback. Thanks again.
> > > > > > > > > > >
> > > > > > > > > > > I tried your approach on my platform, but met some problems, see below.
> > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > > diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
> > > > > > > > > > > > > index 856356b1619e..cd2bbe56f83f 100644
> > > > > > > > > > > > > --- a/block/blk-mq-sched.c
> > > > > > > > > > > > > +++ b/block/blk-mq-sched.c
> > > > > > > > > > > > > @@ -85,11 +85,12 @@ void blk_mq_sched_restart(struct blk_mq_hw_ctx *hctx)
> > > > > > > > > > > > > * its queue by itself in its completion handler, so we don't need to
> > > > > > > > > > > > > * restart queue if .get_budget() returns BLK_STS_NO_RESOURCE.
> > > > > > > > > > > > > */
> > > > > > > > > > > > > -static void blk_mq_do_dispatch_sched(struct blk_mq_hw_ctx *hctx)
> > > > > > > > > > > > > +static bool blk_mq_do_dispatch_sched(struct blk_mq_hw_ctx *hctx)
> > > > > > > > > > > > > {
> > > > > > > > > > > > > struct request_queue *q = hctx->queue;
> > > > > > > > > > > > > struct elevator_queue *e = q->elevator;
> > > > > > > > > > > > > LIST_HEAD(rq_list);
> > > > > > > > > > > > > + bool ret = false;
> > > > > > > > > > > > >
> > > > > > > > > > > > > do {
> > > > > > > > > > > > > struct request *rq;
> > > > > > > > > > > > > @@ -112,7 +113,10 @@ static void blk_mq_do_dispatch_sched(struct blk_mq_hw_ctx *hctx)
> > > > > > > > > > > > > * in blk_mq_dispatch_rq_list().
> > > > > > > > > > > > > */
> > > > > > > > > > > > > list_add(&rq->queuelist, &rq_list);
> > > > > > > > > > > > > - } while (blk_mq_dispatch_rq_list(q, &rq_list, true));
> > > > > > > > > > > > > + ret = blk_mq_dispatch_rq_list(q, &rq_list, true);
> > > > > > > > > > > > > + } while (ret);
> > > > > > > > > > > > > +
> > > > > > > > > > > > > + return ret;
> > > > > > > > > > > > > }
> > > > > > > > > > > > >
> > > > > > > > > > > > > static struct blk_mq_ctx *blk_mq_next_ctx(struct blk_mq_hw_ctx *hctx,
> > > > > > > > > > > > > @@ -131,11 +135,12 @@ static struct blk_mq_ctx *blk_mq_next_ctx(struct blk_mq_hw_ctx *hctx,
> > > > > > > > > > > > > * its queue by itself in its completion handler, so we don't need to
> > > > > > > > > > > > > * restart queue if .get_budget() returns BLK_STS_NO_RESOURCE.
> > > > > > > > > > > > > */
> > > > > > > > > > > > > -static void blk_mq_do_dispatch_ctx(struct blk_mq_hw_ctx *hctx)
> > > > > > > > > > > > > +static bool blk_mq_do_dispatch_ctx(struct blk_mq_hw_ctx *hctx)
> > > > > > > > > > > > > {
> > > > > > > > > > > > > struct request_queue *q = hctx->queue;
> > > > > > > > > > > > > LIST_HEAD(rq_list);
> > > > > > > > > > > > > struct blk_mq_ctx *ctx = READ_ONCE(hctx->dispatch_from);
> > > > > > > > > > > > > + bool ret = false;
> > > > > > > > > > > > >
> > > > > > > > > > > > > do {
> > > > > > > > > > > > > struct request *rq;
> > > > > > > > > > > > > @@ -161,10 +166,12 @@ static void blk_mq_do_dispatch_ctx(struct blk_mq_hw_ctx *hctx)
> > > > > > > > > > > > >
> > > > > > > > > > > > > /* round robin for fair dispatch */
> > > > > > > > > > > > > ctx = blk_mq_next_ctx(hctx, rq->mq_ctx);
> > > > > > > > > > > > > -
> > > > > > > > > > > > > - } while (blk_mq_dispatch_rq_list(q, &rq_list, true));
> > > > > > > > > > > > > + ret = blk_mq_dispatch_rq_list(q, &rq_list, true);
> > > > > > > > > > > > > + } while (ret);
> > > > > > > > > > > > >
> > > > > > > > > > > > > WRITE_ONCE(hctx->dispatch_from, ctx);
> > > > > > > > > > > > > +
> > > > > > > > > > > > > + return ret;
> > > > > > > > > > > > > }
> > > > > > > > > > > > >
> > > > > > > > > > > > > void blk_mq_sched_dispatch_requests(struct blk_mq_hw_ctx *hctx)
> > > > > > > > > > > > > @@ -173,6 +180,7 @@ void blk_mq_sched_dispatch_requests(struct blk_mq_hw_ctx *hctx)
> > > > > > > > > > > > > struct elevator_queue *e = q->elevator;
> > > > > > > > > > > > > const bool has_sched_dispatch = e && e->type->ops.dispatch_request;
> > > > > > > > > > > > > LIST_HEAD(rq_list);
> > > > > > > > > > > > > + bool dispatch_ret;
> > > > > > > > > > > > >
> > > > > > > > > > > > > /* RCU or SRCU read lock is needed before checking quiesced flag */
> > > > > > > > > > > > > if (unlikely(blk_mq_hctx_stopped(hctx) || blk_queue_quiesced(q)))
> > > > > > > > > > > > > @@ -206,20 +214,26 @@ void blk_mq_sched_dispatch_requests(struct blk_mq_hw_ctx *hctx)
> > > > > > > > > > > > > */
> > > > > > > > > > > > > if (!list_empty(&rq_list)) {
> > > > > > > > > > > > > blk_mq_sched_mark_restart_hctx(hctx);
> > > > > > > > > > > > > - if (blk_mq_dispatch_rq_list(q, &rq_list, false)) {
> > > > > > > > > > > > > + dispatch_ret = blk_mq_dispatch_rq_list(q, &rq_list, false);
> > > > > > > > > > > > > + if (dispatch_ret) {
> > > > > > > > > > > > > if (has_sched_dispatch)
> > > > > > > > > > > > > - blk_mq_do_dispatch_sched(hctx);
> > > > > > > > > > > > > + dispatch_ret = blk_mq_do_dispatch_sched(hctx);
> > > > > > > > > > >
> > > > > > > > > > > If we dispatched a request successfully by blk_mq_dispatch_rq_list(),
> > > > > > > > > > > and got dispatch_ret = true now. Then we will try to dispatch more
> > > > > > > > > > > reuqests from scheduler by blk_mq_do_dispatch_sched(), but if now no
> > > > > > > > > > > more requests in scheduler, then we will got dispatch_ret = false. In
> > > > > > > > > >
> > > > > > > > > > 'dispatch_ret' always holds result of the last blk_mq_do_dispatch_sched().
> > > > > > > > > > When any one request has been dispatched successfully, 'dispatch_ret'
> > > > > > > > > > is true. New request is always added to list before calling
> > > > > > > > > > blk_mq_do_dispatch_sched(), so once blk_mq_do_dispatch_sched() returns
> > > > > > > > > > false, it means that .commit_rqs() has been called.
> > > > > > > > >
> > > > > > > > > Not really, if there are no requests in the IO scheduler, we will break the loop
> > > > > > > > > in blk_mq_do_dispatch_sched() and return false without calling
> > > > > > > > > .commit_rqs().
> > > > > > > >
> > > > > > > > If there isn't any request to dispatch, false is returned. Otherwise,
> > > > > > > > always return the return value of last 'blk_mq_dispatch_rq_list'.
> > > > > > > >
> > > > > > > > >
> > > > > > > > > So in this case, blk_mq_do_dispatch_sched() will return 'false', which
> > > > > > > > > overlapped the return value of 'true' from blk_mq_dispatch_rq_list(),
> > > > > > > > > and did not call .commit_rqs(). Then the IO processing will be stuck.
> > > > > > > >
> > > > > > > > See below.
> > > > > > > >
> > > > > > > > >
> > > > > > > > > static void blk_mq_do_dispatch_sched(struct blk_mq_hw_ctx *hctx)
> > > > > > > > > {
> > > > > > > > > struct request_queue *q = hctx->queue;
> > > > > > > > > struct elevator_queue *e = q->elevator;
> > > > > > > > > LIST_HEAD(rq_list);
> > > > > > > > > bool ret = false;
> > > > > > > >
> > > > > > > > The above initialization is just done once.
> > > > > > > >
> > > > > > > > >
> > > > > > > > > do {
> > > > > > > > > struct request *rq;
> > > > > > > > >
> > > > > > > > > if (e->type->ops.has_work && !e->type->ops.has_work(hctx))
> > > > > > > > > break;
> > > > > > > > >
> > > > > > > > > .......
> > > > > > > > ret = blk_mq_dispatch_rq_list(q, list, ...);
> > > > > > > >
> > > > > > > > list includes one request, so blk_mq_dispatch_rq_list() won't return
> > > > > > > > false in case of no request in list.
> > > > > > > >
> > > > > > > > > } while (ret);
> > > > > > > > >
> > > > > > > > > return ret;
> > > > > > > >
> > > > > > > > 'ret' is always updated by return value of last blk_mq_dispatch_rq_list()
> > > > > > > > if at least one request is dispatched. So if it becomes false, the loop
> > > > > > > > breaks, that means .commit_rqs() has been called cause 'list' does
> > > > > > > > include one request for blk_mq_dispatch_rq_list(). Otherwise, true is
> > > > > > > > still returned.
> > > > > > >
> > > > > > > Sorry for my confusing description, let me try again to describe the problem.
> > > > > > > When I try to mount the block device, I got the IO stuck with your
> > > > > > > patch, and I did some debugging. I found we missed calling
> > > > > > > commit_rqs() for one case:
> > > > > > >
> > > > > > > void blk_mq_sched_dispatch_requests(struct blk_mq_hw_ctx *hctx)
> > > > > > > @@ -173,6 +180,7 @@ void blk_mq_sched_dispatch_requests(struct
> > > > > > > blk_mq_hw_ctx *hctx)
> > > > > > > struct elevator_queue *e = q->elevator;
> > > > > > > const bool has_sched_dispatch = e && e->type->ops.dispatch_request;
> > > > > > > LIST_HEAD(rq_list);
> > > > > > > + bool dispatch_ret;
> > > > > > >
> > > > > > > /* RCU or SRCU read lock is needed before checking quiesced flag */
> > > > > > > if (unlikely(blk_mq_hctx_stopped(hctx) || blk_queue_quiesced(q)))
> > > > > > > @@ -206,20 +214,26 @@ void blk_mq_sched_dispatch_requests(struct
> > > > > > > blk_mq_hw_ctx *hctx)
> > > > > > > */
> > > > > > > if (!list_empty(&rq_list)) {
> > > > > > > blk_mq_sched_mark_restart_hctx(hctx);
> > > > > > > - if (blk_mq_dispatch_rq_list(q, &rq_list, false)) {
> > > > > > > + dispatch_ret = blk_mq_dispatch_rq_list(q, &rq_list, false);
> > > > > > >
> > > > > > > Suppose we dispatch one request to block driver, and return 'true' here.
> > > > > > >
> > > > > > > + if (dispatch_ret) {
> > > > > > > if (has_sched_dispatch)
> > > > > > > - blk_mq_do_dispatch_sched(hctx);
> > > > > > > + dispatch_ret = blk_mq_do_dispatch_sched(hctx);
> > > > > > >
> > > > > > > Then we will continue to try to dispatch more requests from IO
> > > > > > > scheduler, but if there are no requests in IO scheduler now, it will
> > > > > > > return 'false' here, and set dispatch_ret as false.
> > > > > > >
> > > > > > > else
> > > > > > > - blk_mq_do_dispatch_ctx(hctx);
> > > > > > > + dispatch_ret = blk_mq_do_dispatch_ctx(hctx);
> > > > > >
> > > > > > OK, this one is an issue, but it can be fixed simply by not updating
> > > > > > 'dispatch_ret' for the following dispatch, something like the below
> > > > > > way:
> > > > > >
> > > > > > if (dispatch_ret) {
> > > > > > if (has_sched_dispatch)
> > > > > > blk_mq_do_dispatch_sched(hctx);
> > > > > > else
> > > > > > blk_mq_do_dispatch_ctx(hctx);
> > > > > > }
> > > > >
> > > > > Yes, this can work.
> > > > >
> > > > > But I found your patch will drop some performance comparing with my
> > > > > method in patch 1/2. My method can fetch several requests from IO
> > > > > scheduler and dispatch them to block driver at one time, but in your
> > > > > patch we still need dispatch request one by one, which will drop some
> > > > > performance I think.
> > > > > What do you think? Thanks.
> > > >
> > > > Please run your test and see if performance drop can be observed.
> > >
> > > From my testing (using the same fio configuration as in the cover letter), I
> > > found your method drops some performance; see the data below.
> > >
> > > My original patches:
> > > Sequential read: 229.6MiB/s
> > > Random read: 180.8MiB/s
> > > Sequential write: 172MiB/s
> > > Random write: 169.2MiB/s
> > >
> > > Your patches:
> > > Sequential read: 209MiB/s
> > > Random read: 177MiB/s
> > > Sequential write: 148MiB/s
> > > Random write: 147MiB/s
> >
> > After some optimization and more testing, I did not find any
> > performance issue with your patch compared with my old method. Sorry
> > for the noise in my last email.
> >
> > So will you send out a formal patch? If yes, please add my Tested-by
> > tag. Thanks for your help.
> > Tested-by: Baolin Wang <[email protected]>
>
> Can I take this patch into my patch set with your authority? Or you
> want to send it out by yourself? Thanks.

Hi Baolin,

You can fold the patch into your series.

Thanks,
Ming

2020-04-22 09:30:32

by Baolin Wang

[permalink] [raw]
Subject: Re: [RESEND RFC PATCH 2/8] block: Allow sending a batch of requests from the scheduler to hardware

On Wed, Apr 22, 2020 at 5:26 PM Ming Lei <[email protected]> wrote:
>
> On Wed, Apr 22, 2020 at 05:21:01PM +0800, Baolin Wang wrote:
> > Hi Ming,
> >
> > On Fri, Mar 27, 2020 at 4:30 PM Baolin Wang <[email protected]> wrote:
> > >
> > > Hi Ming,
> > >
> > > On Tue, Mar 24, 2020 at 4:29 PM Baolin Wang <[email protected]> wrote:
> > > >
> > > > Hi Ming,
> > > >
> > > > On Mon, Mar 23, 2020 at 5:58 PM Ming Lei <[email protected]> wrote:
> > > > >
> > > > > On Mon, Mar 23, 2020 at 05:13:27PM +0800, Baolin Wang wrote:
> > > > > > On Mon, Mar 23, 2020 at 4:29 PM Ming Lei <[email protected]> wrote:
> > > > > > >
> > > > > > > On Mon, Mar 23, 2020 at 04:22:38PM +0800, Baolin Wang wrote:
> > > > > > > > On Mon, Mar 23, 2020 at 3:27 PM Ming Lei <[email protected]> wrote:
> > > > > > > > >
> > > > > > > > > On Mon, Mar 23, 2020 at 01:36:34PM +0800, Baolin Wang wrote:
> > > > > > > > > > On Mon, Mar 23, 2020 at 11:44 AM Ming Lei <[email protected]> wrote:
> > > > > > > > > > >
> > > > > > > > > > > On Fri, Mar 20, 2020 at 06:27:41PM +0800, Baolin Wang wrote:
> > > > > > > > > > > > Hi Ming,
> > > > > > > > > > > >
> > > > > > > > > > > > On Wed, Mar 18, 2020 at 6:26 PM Baolin Wang <[email protected]> wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > Hi Ming,
> > > > > > > > > > > > >
> > > > > > > > > > > > > On Wed, Mar 18, 2020 at 6:01 PM Ming Lei <[email protected]> wrote:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > On Mon, Mar 16, 2020 at 06:01:19PM +0800, Baolin Wang wrote:
> > > > > > > > > > > > > > > As we know, some SD/MMC host controllers can support packed request,
> > > > > > > > > > > > > > > that means we can package several requests to host controller at one
> > > > > > > > > > > > > > > time to improve performence. So the hardware driver expects the blk-mq
> > > > > > > > > > > > > > > can dispatch a batch of requests at one time, and driver can use bd.last
> > > > > > > > > > > > > > > to indicate if it is the last request in the batch to help to combine
> > > > > > > > > > > > > > > requests as much as possible.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Thus we should add batch requests setting from the block driver to tell
> > > > > > > > > > > > > > > the scheduler how many requests can be dispatched in a batch, as well
> > > > > > > > > > > > > > > as changing the scheduler to dispatch more than one request if setting
> > > > > > > > > > > > > > > the maximum batch requests number.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > I feel this batch dispatch style is more complicated, and some other
> > > > > > > > > > > > > > drivers(virtio blk/scsi) still may get benefit if we can pass real 'last' flag in
> > > > > > > > > > > > > > .queue_rq().
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > So what about the following way by extending .commit_rqs() to this usage?
> > > > > > > > > > > > > > And you can do whatever batch processing in .commit_rqs() which will be
> > > > > > > > > > > > > > guaranteed to be called if BLK_MQ_F_FORCE_COMMIT_RQS is set by driver.
> > > > > > > > > > > > >
> > > > > > > > > > > > > I'm very appreciated for your good suggestion, which is much simpler than mine.
> > > > > > > > > > > > > It seems to solve my problem, and I will try it on my platform to see
> > > > > > > > > > > > > if it can work and give you the feadback. Thanks again.
> > > > > > > > > > > >
> > > > > > > > > > > > I tried your approach on my platform, but met some problems, see below.
> > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > > diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
> > > > > > > > > > > > > > index 856356b1619e..cd2bbe56f83f 100644
> > > > > > > > > > > > > > --- a/block/blk-mq-sched.c
> > > > > > > > > > > > > > +++ b/block/blk-mq-sched.c
> > > > > > > > > > > > > > @@ -85,11 +85,12 @@ void blk_mq_sched_restart(struct blk_mq_hw_ctx *hctx)
> > > > > > > > > > > > > > * its queue by itself in its completion handler, so we don't need to
> > > > > > > > > > > > > > * restart queue if .get_budget() returns BLK_STS_NO_RESOURCE.
> > > > > > > > > > > > > > */
> > > > > > > > > > > > > > -static void blk_mq_do_dispatch_sched(struct blk_mq_hw_ctx *hctx)
> > > > > > > > > > > > > > +static bool blk_mq_do_dispatch_sched(struct blk_mq_hw_ctx *hctx)
> > > > > > > > > > > > > > {
> > > > > > > > > > > > > > struct request_queue *q = hctx->queue;
> > > > > > > > > > > > > > struct elevator_queue *e = q->elevator;
> > > > > > > > > > > > > > LIST_HEAD(rq_list);
> > > > > > > > > > > > > > + bool ret = false;
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > do {
> > > > > > > > > > > > > > struct request *rq;
> > > > > > > > > > > > > > @@ -112,7 +113,10 @@ static void blk_mq_do_dispatch_sched(struct blk_mq_hw_ctx *hctx)
> > > > > > > > > > > > > > * in blk_mq_dispatch_rq_list().
> > > > > > > > > > > > > > */
> > > > > > > > > > > > > > list_add(&rq->queuelist, &rq_list);
> > > > > > > > > > > > > > - } while (blk_mq_dispatch_rq_list(q, &rq_list, true));
> > > > > > > > > > > > > > + ret = blk_mq_dispatch_rq_list(q, &rq_list, true);
> > > > > > > > > > > > > > + } while (ret);
> > > > > > > > > > > > > > +
> > > > > > > > > > > > > > + return ret;
> > > > > > > > > > > > > > }
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > static struct blk_mq_ctx *blk_mq_next_ctx(struct blk_mq_hw_ctx *hctx,
> > > > > > > > > > > > > > @@ -131,11 +135,12 @@ static struct blk_mq_ctx *blk_mq_next_ctx(struct blk_mq_hw_ctx *hctx,
> > > > > > > > > > > > > > * its queue by itself in its completion handler, so we don't need to
> > > > > > > > > > > > > > * restart queue if .get_budget() returns BLK_STS_NO_RESOURCE.
> > > > > > > > > > > > > > */
> > > > > > > > > > > > > > -static void blk_mq_do_dispatch_ctx(struct blk_mq_hw_ctx *hctx)
> > > > > > > > > > > > > > +static bool blk_mq_do_dispatch_ctx(struct blk_mq_hw_ctx *hctx)
> > > > > > > > > > > > > > {
> > > > > > > > > > > > > > struct request_queue *q = hctx->queue;
> > > > > > > > > > > > > > LIST_HEAD(rq_list);
> > > > > > > > > > > > > > struct blk_mq_ctx *ctx = READ_ONCE(hctx->dispatch_from);
> > > > > > > > > > > > > > + bool ret = false;
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > do {
> > > > > > > > > > > > > > struct request *rq;
> > > > > > > > > > > > > > @@ -161,10 +166,12 @@ static void blk_mq_do_dispatch_ctx(struct blk_mq_hw_ctx *hctx)
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > /* round robin for fair dispatch */
> > > > > > > > > > > > > > ctx = blk_mq_next_ctx(hctx, rq->mq_ctx);
> > > > > > > > > > > > > > -
> > > > > > > > > > > > > > - } while (blk_mq_dispatch_rq_list(q, &rq_list, true));
> > > > > > > > > > > > > > + ret = blk_mq_dispatch_rq_list(q, &rq_list, true);
> > > > > > > > > > > > > > + } while (ret);
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > WRITE_ONCE(hctx->dispatch_from, ctx);
> > > > > > > > > > > > > > +
> > > > > > > > > > > > > > + return ret;
> > > > > > > > > > > > > > }
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > void blk_mq_sched_dispatch_requests(struct blk_mq_hw_ctx *hctx)
> > > > > > > > > > > > > > @@ -173,6 +180,7 @@ void blk_mq_sched_dispatch_requests(struct blk_mq_hw_ctx *hctx)
> > > > > > > > > > > > > > struct elevator_queue *e = q->elevator;
> > > > > > > > > > > > > > const bool has_sched_dispatch = e && e->type->ops.dispatch_request;
> > > > > > > > > > > > > > LIST_HEAD(rq_list);
> > > > > > > > > > > > > > + bool dispatch_ret;
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > /* RCU or SRCU read lock is needed before checking quiesced flag */
> > > > > > > > > > > > > > if (unlikely(blk_mq_hctx_stopped(hctx) || blk_queue_quiesced(q)))
> > > > > > > > > > > > > > @@ -206,20 +214,26 @@ void blk_mq_sched_dispatch_requests(struct blk_mq_hw_ctx *hctx)
> > > > > > > > > > > > > > */
> > > > > > > > > > > > > > if (!list_empty(&rq_list)) {
> > > > > > > > > > > > > > blk_mq_sched_mark_restart_hctx(hctx);
> > > > > > > > > > > > > > - if (blk_mq_dispatch_rq_list(q, &rq_list, false)) {
> > > > > > > > > > > > > > + dispatch_ret = blk_mq_dispatch_rq_list(q, &rq_list, false);
> > > > > > > > > > > > > > + if (dispatch_ret) {
> > > > > > > > > > > > > > if (has_sched_dispatch)
> > > > > > > > > > > > > > - blk_mq_do_dispatch_sched(hctx);
> > > > > > > > > > > > > > + dispatch_ret = blk_mq_do_dispatch_sched(hctx);
> > > > > > > > > > > >
> > > > > > > > > > > > If we dispatched a request successfully by blk_mq_dispatch_rq_list()
> > > > > > > > > > > > and got dispatch_ret = true now, then we will try to dispatch more
> > > > > > > > > > > > requests from the scheduler by blk_mq_do_dispatch_sched(), but if
> > > > > > > > > > > > there are no more requests in the scheduler, we will get
> > > > > > > > > > > > dispatch_ret = false. In
> > > > > > > > > > >
> > > > > > > > > > > 'dispatch_ret' always holds the result of the last blk_mq_do_dispatch_sched().
> > > > > > > > > > > When any request has been dispatched successfully, 'dispatch_ret'
> > > > > > > > > > > is true. A new request is always added to the list before calling
> > > > > > > > > > > blk_mq_dispatch_rq_list(), so once blk_mq_do_dispatch_sched() returns
> > > > > > > > > > > false, it means that .commit_rqs() has been called.
> > > > > > > > > >
> > > > > > > > > > Not really. If there are no requests in the IO scheduler, we will break
> > > > > > > > > > out of the loop in blk_mq_do_dispatch_sched() and return false without
> > > > > > > > > > calling .commit_rqs().
> > > > > > > > >
> > > > > > > > > If there isn't any request to dispatch, false is returned. Otherwise, the
> > > > > > > > > return value of the last 'blk_mq_dispatch_rq_list' is always returned.
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > So in this case, blk_mq_do_dispatch_sched() will return 'false', which
> > > > > > > > > > overwrites the return value of 'true' from blk_mq_dispatch_rq_list(),
> > > > > > > > > > and .commit_rqs() is never called. Then the IO processing will be stuck.
> > > > > > > > >
> > > > > > > > > See below.
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > static bool blk_mq_do_dispatch_sched(struct blk_mq_hw_ctx *hctx)
> > > > > > > > > > {
> > > > > > > > > > struct request_queue *q = hctx->queue;
> > > > > > > > > > struct elevator_queue *e = q->elevator;
> > > > > > > > > > LIST_HEAD(rq_list);
> > > > > > > > > > bool ret = false;
> > > > > > > > >
> > > > > > > > > The above initialization is just done once.
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > do {
> > > > > > > > > > struct request *rq;
> > > > > > > > > >
> > > > > > > > > > if (e->type->ops.has_work && !e->type->ops.has_work(hctx))
> > > > > > > > > > break;
> > > > > > > > > >
> > > > > > > > > > .......
> > > > > > > > > ret = blk_mq_dispatch_rq_list(q, list, ...);
> > > > > > > > >
> > > > > > > > > The list includes one request, so blk_mq_dispatch_rq_list() won't return
> > > > > > > > > false due to an empty list.
> > > > > > > > >
> > > > > > > > > > } while (ret);
> > > > > > > > > >
> > > > > > > > > > return ret;
> > > > > > > > >
> > > > > > > > > 'ret' is always updated with the return value of the last
> > > > > > > > > blk_mq_dispatch_rq_list() call if at least one request is dispatched. So
> > > > > > > > > if it becomes false, the loop breaks, which means .commit_rqs() has been
> > > > > > > > > called, because 'list' does include one request when
> > > > > > > > > blk_mq_dispatch_rq_list() is called. Otherwise, true is still returned.
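
To make sure I read that correctly: below is a simplified model of the loop,
not the actual patch; budget handling and the other details elided above are
left out. The key point is that 'ret' is only ever written with the result of
the last blk_mq_dispatch_rq_list() call, and that call always sees a
one-request list:

static bool blk_mq_do_dispatch_sched(struct blk_mq_hw_ctx *hctx)
{
	struct request_queue *q = hctx->queue;
	struct elevator_queue *e = q->elevator;
	LIST_HEAD(rq_list);
	bool ret = false;	/* initialized once, before the loop */

	do {
		struct request *rq;

		/* scheduler has nothing left: break, 'ret' keeps its last value */
		if (e->type->ops.has_work && !e->type->ops.has_work(hctx))
			break;

		rq = e->type->ops.dispatch_request(hctx);
		if (!rq)
			break;

		/* rq_list carries exactly one request per iteration */
		list_add(&rq->queuelist, &rq_list);
		ret = blk_mq_dispatch_rq_list(q, &rq_list, true);
	} while (ret);

	return ret;
}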
> > > > > > > >
> > > > > > > > Sorry for my confusing description; let me try again to describe the problem.
> > > > > > > > When I tried to mount the block device, the IO got stuck with your
> > > > > > > > patch, so I did some debugging. I found we miss calling
> > > > > > > > commit_rqs() in one case:
> > > > > > > >
> > > > > > > > void blk_mq_sched_dispatch_requests(struct blk_mq_hw_ctx *hctx)
> > > > > > > > @@ -173,6 +180,7 @@ void blk_mq_sched_dispatch_requests(struct blk_mq_hw_ctx *hctx)
> > > > > > > > struct elevator_queue *e = q->elevator;
> > > > > > > > const bool has_sched_dispatch = e && e->type->ops.dispatch_request;
> > > > > > > > LIST_HEAD(rq_list);
> > > > > > > > + bool dispatch_ret;
> > > > > > > >
> > > > > > > > /* RCU or SRCU read lock is needed before checking quiesced flag */
> > > > > > > > if (unlikely(blk_mq_hctx_stopped(hctx) || blk_queue_quiesced(q)))
> > > > > > > > @@ -206,20 +214,26 @@ void blk_mq_sched_dispatch_requests(struct blk_mq_hw_ctx *hctx)
> > > > > > > > */
> > > > > > > > if (!list_empty(&rq_list)) {
> > > > > > > > blk_mq_sched_mark_restart_hctx(hctx);
> > > > > > > > - if (blk_mq_dispatch_rq_list(q, &rq_list, false)) {
> > > > > > > > + dispatch_ret = blk_mq_dispatch_rq_list(q, &rq_list, false);
> > > > > > > >
> > > > > > > > Suppose we dispatch one request to the block driver and return 'true' here.
> > > > > > > >
> > > > > > > > + if (dispatch_ret) {
> > > > > > > > if (has_sched_dispatch)
> > > > > > > > - blk_mq_do_dispatch_sched(hctx);
> > > > > > > > + dispatch_ret = blk_mq_do_dispatch_sched(hctx);
> > > > > > > >
> > > > > > > > Then we will continue trying to dispatch more requests from the IO
> > > > > > > > scheduler, but if there are no requests in the IO scheduler now, it will
> > > > > > > > return 'false' here and set dispatch_ret to false.
> > > > > > > >
> > > > > > > > else
> > > > > > > > - blk_mq_do_dispatch_ctx(hctx);
> > > > > > > > + dispatch_ret = blk_mq_do_dispatch_ctx(hctx);
> > > > > > >
> > > > > > > OK, this one is an issue, but it can be fixed simply by not updating
> > > > > > > 'dispatch_ret' for the follow-up dispatch, something like below:
> > > > > > >
> > > > > > > if (dispatch_ret) {
> > > > > > > if (has_sched_dispatch)
> > > > > > > blk_mq_do_dispatch_sched(hctx);
> > > > > > > else
> > > > > > > blk_mq_do_dispatch_ctx(hctx);
> > > > > > > }
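
For reference, applying that suggestion to the branch quoted further up would
look roughly like the sketch below; this is only an illustration of the idea,
not the final patch:

	if (!list_empty(&rq_list)) {
		blk_mq_sched_mark_restart_hctx(hctx);
		dispatch_ret = blk_mq_dispatch_rq_list(q, &rq_list, false);
		if (dispatch_ret) {
			/*
			 * Do not let the follow-up dispatch overwrite
			 * 'dispatch_ret': a 'false' from it may only mean
			 * that the scheduler ran dry, while the first list
			 * was dispatched successfully.
			 */
			if (has_sched_dispatch)
				blk_mq_do_dispatch_sched(hctx);
			else
				blk_mq_do_dispatch_ctx(hctx);
		}
	}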
> > > > > >
> > > > > > Yes, this can work.
> > > > > >
> > > > > > But I found your patch drops some performance compared with my
> > > > > > method in patch 1/2. My method can fetch several requests from the IO
> > > > > > scheduler and dispatch them to the block driver at one time, but with
> > > > > > your patch we still need to dispatch requests one by one, which I think
> > > > > > will cost some performance.
> > > > > > What do you think? Thanks.
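
What I meant by dispatching a batch there is roughly the sketch below
(illustrative only, assuming the same q/e/hctx context as the quoted code;
'max_batch' is a made-up name for whatever per-queue limit would actually be
used):

	LIST_HEAD(rq_list);
	struct request *rq;
	int count = 0;

	/* pull several requests out of the scheduler first... */
	do {
		rq = e->type->ops.dispatch_request(hctx);
		if (!rq)
			break;
		list_add_tail(&rq->queuelist, &rq_list);
	} while (++count < max_batch);

	/* ...then hand the whole list to the driver in one go */
	if (!list_empty(&rq_list))
		blk_mq_dispatch_rq_list(q, &rq_list, true);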
> > > > >
> > > > > Please run your test and see if any performance drop can be observed.
> > > >
> > > > From my testing (using the same fio configuration as in the cover letter),
> > > > I found your method drops some performance, as shown in the data below.
> > > >
> > > > My original patches:
> > > > Sequential read: 229.6MiB/s
> > > > Random read: 180.8MiB/s
> > > > Sequential write: 172MiB/s
> > > > Random write: 169.2MiB/s
> > > >
> > > > Your patches:
> > > > Sequential read: 209MiB/s
> > > > Random read: 177MiB/s
> > > > Sequential write: 148MiB/s
> > > > Random write: 147MiB/s
> > >
> > > After some optimization and more testing, I did not find any
> > > performance issue with your patch compared with my old method. Sorry
> > > for the noise in my last email.
> > >
> > > So will you send out a formal patch? If yes, please add my Tested-by
> > > tag. Thanks for your help.
> > > Tested-by: Baolin Wang <[email protected]>
> >
> > Can I take this patch into my patch set with your permission? Or do you
> > want to send it out yourself? Thanks.
>
> Hi Baolin,
>
> You can fold the patch into your series.

OK. Many thanks for your help.

--
Baolin Wang