2023-09-19 07:50:15

by Wenchao Chen

[permalink] [raw]
Subject: [PATCH V4 0/2] mmc: hsq: dynamically adjust hsq_depth to improve performance

Change in v4:
- Remove "struct hsq_slot *slot" and simplify the code.
- In general, need_change has to be greater than or equal to 2 to allow the threshold
to be adjusted.

Change in v3:
- Use "mrq->data->blksz * mrq->data->blocks == 4096" for 4K.
- Add explanation for "HSQ_PERFORMANCE_DEPTH".

Change in v2:
- Support for dynamic adjustment of hsq_depth.

Test
=====
I tested 3 times for each case and output a average speed.
Ran 'fio' to evaluate the performance:
1.Fixed hsq_depth
1) Sequential write:
Speed: 168 164 165
Average speed: 165.67MB/S

2) Sequential read:
Speed: 326 326 326
Average speed: 326MB/S

3) Random write:
Speed: 82.6 83 83
Average speed: 82.87MB/S

4) Random read:
Speed: 48.2 48.3 47.6
Average speed: 48.03MB/S

2.Dynamic hsq_depth
1) Sequential write:
Speed: 167 166 166
Average speed: 166.33MB/S

2) Sequential read:
Speed: 327 326 326
Average speed: 326.3MB/S

3) Random write:
Speed: 86.1 86.2 87.7
Average speed: 86.67MB/S

4) Random read:
Speed: 48.1 48 48
Average speed: 48.03MB/S

Based on the above data, dynamic hsq_depth can improve the performance of random writes.
Random write improved by 4.6%.

In addition, we tested 8K and 16K.
1.Fixed hsq_depth
1) Random write(bs=8K):
Speed: 116 114 115
Average speed: 115MB/S

2) Random read(bs=8K):
Speed: 83 83 82.5
Average speed: 82.8MB/S

3) Random write(bs=16K):
Speed: 141 142 141
Average speed: 141.3MB/S

4) Random read(bs=16K):
Speed: 132 132 132
Average speed: 132MB/S

2.Dynamic hsq_depth(mrq->data->blksz * mrq->data->blocks == 8192 or 16384)
1) Random write(bs=8K):
Speed: 115 115 115
Average speed: 115MB/S

2) Random read(bs=8K):
Speed: 82.7 82.9 82.8
Average speed: 82.8MB/S

3) Random write(bs=16K):
Speed: 143 141 141
Average speed: 141.6MB/S

4) Random read(bs=16K):
Speed: 132 132 132
Average speed: 132MB/S

Increasing hsq_depth cannot improve 8k and 16k random read/write performance.
To reduce latency, we dynamically increase hsq_depth only for 4k random writes.

Test cmd
=========
1)write: fio -filename=/dev/mmcblk0p72 -direct=1 -rw=write -bs=512K -size=512M -group_reporting -name=test -numjobs=8 -thread -iodepth=64
2)read: fio -filename=/dev/mmcblk0p72 -direct=1 -rw=read -bs=512K -size=512M -group_reporting -name=test -numjobs=8 -thread -iodepth=64
3)randwrite: fio -filename=/dev/mmcblk0p72 -direct=1 -rw=randwrite -bs=4K -size=512M -group_reporting -name=test -numjobs=8 -thread -iodepth=64
4)randread: fio -filename=/dev/mmcblk0p72 -direct=1 -rw=randread -bs=4K -size=512M -group_reporting -name=test -numjobs=8 -thread -iodepth=64
5)randwrite: fio -filename=/dev/mmcblk0p72 -direct=1 -rw=randwrite -bs=8K -size=512M -group_reporting -name=test -numjobs=8 -thread -iodepth=64
6)randread: fio -filename=/dev/mmcblk0p72 -direct=1 -rw=randread -bs=8K -size=512M -group_reporting -name=test -numjobs=8 -thread -iodepth=64
7)randwrite: fio -filename=/dev/mmcblk0p72 -direct=1 -rw=randwrite -bs=16K -size=512M -group_reporting -name=test -numjobs=8 -thread -iodepth=64
8)randread: fio -filename=/dev/mmcblk0p72 -direct=1 -rw=randread -bs=16K -size=512M -group_reporting -name=test -numjobs=8 -thread -iodepth=64

Wenchao Chen (2):
mmc: queue: replace immediate with hsq->depth
mmc: hsq: dynamic adjustment of hsq->depth

drivers/mmc/core/queue.c | 6 +-----
drivers/mmc/host/mmc_hsq.c | 22 ++++++++++++++++++++++
drivers/mmc/host/mmc_hsq.h | 11 +++++++++++
include/linux/mmc/host.h | 1 +
4 files changed, 35 insertions(+), 5 deletions(-)

--
2.17.1


2023-09-19 07:50:20

by Wenchao Chen

[permalink] [raw]
Subject: [PATCH V4 2/2] mmc: hsq: dynamic adjustment of hsq->depth

Increasing hsq_depth improves random write performance.

Signed-off-by: Wenchao Chen <[email protected]>
---
drivers/mmc/host/mmc_hsq.c | 21 +++++++++++++++++++++
drivers/mmc/host/mmc_hsq.h | 5 +++++
2 files changed, 26 insertions(+)

diff --git a/drivers/mmc/host/mmc_hsq.c b/drivers/mmc/host/mmc_hsq.c
index 8556cacb21a1..869da6472a6d 100644
--- a/drivers/mmc/host/mmc_hsq.c
+++ b/drivers/mmc/host/mmc_hsq.c
@@ -21,6 +21,25 @@ static void mmc_hsq_retry_handler(struct work_struct *work)
mmc->ops->request(mmc, hsq->mrq);
}

+static void mmc_hsq_modify_threshold(struct mmc_hsq *hsq)
+{
+ struct mmc_host *mmc = hsq->mmc;
+ struct mmc_request *mrq;
+ int need_change = 0;
+ int tag;
+
+ mmc->hsq_depth = HSQ_NORMAL_DEPTH;
+ for (tag = 0; tag < HSQ_NUM_SLOTS; tag++) {
+ mrq = hsq->slot[tag].mrq;
+ if (mrq && mrq->data &&
+ (mrq->data->blksz * mrq->data->blocks == 4096) &&
+ (mrq->data->flags & MMC_DATA_WRITE) && (++need_change >= 2)) {
+ mmc->hsq_depth = HSQ_PERFORMANCE_DEPTH;
+ break;
+ }
+ }
+}
+
static void mmc_hsq_pump_requests(struct mmc_hsq *hsq)
{
struct mmc_host *mmc = hsq->mmc;
@@ -42,6 +61,8 @@ static void mmc_hsq_pump_requests(struct mmc_hsq *hsq)
return;
}

+ mmc_hsq_modify_threshold(hsq);
+
slot = &hsq->slot[hsq->next_tag];
hsq->mrq = slot->mrq;
hsq->qcnt--;
diff --git a/drivers/mmc/host/mmc_hsq.h b/drivers/mmc/host/mmc_hsq.h
index aa5c4543b55f..dd352a6ac32a 100644
--- a/drivers/mmc/host/mmc_hsq.h
+++ b/drivers/mmc/host/mmc_hsq.h
@@ -10,6 +10,11 @@
* flight to avoid a long latency.
*/
#define HSQ_NORMAL_DEPTH 2
+/*
+ * For 4k random writes, we allow hsq_depth to increase to 5
+ * for better performance.
+ */
+#define HSQ_PERFORMANCE_DEPTH 5

struct hsq_slot {
struct mmc_request *mrq;
--
2.17.1

2023-09-19 07:51:35

by Wenchao Chen

[permalink] [raw]
Subject: [PATCH V4 1/2] mmc: queue: replace immediate with hsq->depth

Hsq is similar to cqe, using hsq->depth to represent
the maximum processing capacity of hsq. We can adjust
hsq->depth according to the actual situation.

Signed-off-by: Wenchao Chen <[email protected]>
---
drivers/mmc/core/queue.c | 6 +-----
drivers/mmc/host/mmc_hsq.c | 1 +
drivers/mmc/host/mmc_hsq.h | 6 ++++++
include/linux/mmc/host.h | 1 +
4 files changed, 9 insertions(+), 5 deletions(-)

diff --git a/drivers/mmc/core/queue.c b/drivers/mmc/core/queue.c
index b396e3900717..a0a2412f62a7 100644
--- a/drivers/mmc/core/queue.c
+++ b/drivers/mmc/core/queue.c
@@ -260,11 +260,7 @@ static blk_status_t mmc_mq_queue_rq(struct blk_mq_hw_ctx *hctx,
}
break;
case MMC_ISSUE_ASYNC:
- /*
- * For MMC host software queue, we only allow 2 requests in
- * flight to avoid a long latency.
- */
- if (host->hsq_enabled && mq->in_flight[issue_type] > 2) {
+ if (host->hsq_enabled && mq->in_flight[issue_type] > host->hsq_depth) {
spin_unlock_irq(&mq->lock);
return BLK_STS_RESOURCE;
}
diff --git a/drivers/mmc/host/mmc_hsq.c b/drivers/mmc/host/mmc_hsq.c
index 424dc7b07858..8556cacb21a1 100644
--- a/drivers/mmc/host/mmc_hsq.c
+++ b/drivers/mmc/host/mmc_hsq.c
@@ -337,6 +337,7 @@ int mmc_hsq_init(struct mmc_hsq *hsq, struct mmc_host *mmc)
hsq->mmc = mmc;
hsq->mmc->cqe_private = hsq;
mmc->cqe_ops = &mmc_hsq_ops;
+ mmc->hsq_depth = HSQ_NORMAL_DEPTH;

for (i = 0; i < HSQ_NUM_SLOTS; i++)
hsq->tag_slot[i] = HSQ_INVALID_TAG;
diff --git a/drivers/mmc/host/mmc_hsq.h b/drivers/mmc/host/mmc_hsq.h
index 1808024fc6c5..aa5c4543b55f 100644
--- a/drivers/mmc/host/mmc_hsq.h
+++ b/drivers/mmc/host/mmc_hsq.h
@@ -5,6 +5,12 @@
#define HSQ_NUM_SLOTS 64
#define HSQ_INVALID_TAG HSQ_NUM_SLOTS

+/*
+ * For MMC host software queue, we only allow 2 requests in
+ * flight to avoid a long latency.
+ */
+#define HSQ_NORMAL_DEPTH 2
+
struct hsq_slot {
struct mmc_request *mrq;
};
diff --git a/include/linux/mmc/host.h b/include/linux/mmc/host.h
index 62a6847a3b6f..2f445c651742 100644
--- a/include/linux/mmc/host.h
+++ b/include/linux/mmc/host.h
@@ -526,6 +526,7 @@ struct mmc_host {

/* Host Software Queue support */
bool hsq_enabled;
+ int hsq_depth;

u32 err_stats[MMC_ERR_MAX];
unsigned long private[] ____cacheline_aligned;
--
2.17.1

2023-09-26 15:07:33

by Ulf Hansson

[permalink] [raw]
Subject: Re: [PATCH V4 0/2] mmc: hsq: dynamically adjust hsq_depth to improve performance

On Tue, 19 Sept 2023 at 09:47, Wenchao Chen <[email protected]> wrote:
>
> Change in v4:
> - Remove "struct hsq_slot *slot" and simplify the code.
> - In general, need_change has to be greater than or equal to 2 to allow the threshold
> to be adjusted.
>
> Change in v3:
> - Use "mrq->data->blksz * mrq->data->blocks == 4096" for 4K.
> - Add explanation for "HSQ_PERFORMANCE_DEPTH".
>
> Change in v2:
> - Support for dynamic adjustment of hsq_depth.
>
> Test
> =====
> I tested 3 times for each case and output a average speed.
> Ran 'fio' to evaluate the performance:
> 1.Fixed hsq_depth
> 1) Sequential write:
> Speed: 168 164 165
> Average speed: 165.67MB/S
>
> 2) Sequential read:
> Speed: 326 326 326
> Average speed: 326MB/S
>
> 3) Random write:
> Speed: 82.6 83 83
> Average speed: 82.87MB/S
>
> 4) Random read:
> Speed: 48.2 48.3 47.6
> Average speed: 48.03MB/S
>
> 2.Dynamic hsq_depth
> 1) Sequential write:
> Speed: 167 166 166
> Average speed: 166.33MB/S
>
> 2) Sequential read:
> Speed: 327 326 326
> Average speed: 326.3MB/S
>
> 3) Random write:
> Speed: 86.1 86.2 87.7
> Average speed: 86.67MB/S
>
> 4) Random read:
> Speed: 48.1 48 48
> Average speed: 48.03MB/S
>
> Based on the above data, dynamic hsq_depth can improve the performance of random writes.
> Random write improved by 4.6%.
>
> In addition, we tested 8K and 16K.
> 1.Fixed hsq_depth
> 1) Random write(bs=8K):
> Speed: 116 114 115
> Average speed: 115MB/S
>
> 2) Random read(bs=8K):
> Speed: 83 83 82.5
> Average speed: 82.8MB/S
>
> 3) Random write(bs=16K):
> Speed: 141 142 141
> Average speed: 141.3MB/S
>
> 4) Random read(bs=16K):
> Speed: 132 132 132
> Average speed: 132MB/S
>
> 2.Dynamic hsq_depth(mrq->data->blksz * mrq->data->blocks == 8192 or 16384)
> 1) Random write(bs=8K):
> Speed: 115 115 115
> Average speed: 115MB/S
>
> 2) Random read(bs=8K):
> Speed: 82.7 82.9 82.8
> Average speed: 82.8MB/S
>
> 3) Random write(bs=16K):
> Speed: 143 141 141
> Average speed: 141.6MB/S
>
> 4) Random read(bs=16K):
> Speed: 132 132 132
> Average speed: 132MB/S
>
> Increasing hsq_depth cannot improve 8k and 16k random read/write performance.
> To reduce latency, we dynamically increase hsq_depth only for 4k random writes.
>
> Test cmd
> =========
> 1)write: fio -filename=/dev/mmcblk0p72 -direct=1 -rw=write -bs=512K -size=512M -group_reporting -name=test -numjobs=8 -thread -iodepth=64
> 2)read: fio -filename=/dev/mmcblk0p72 -direct=1 -rw=read -bs=512K -size=512M -group_reporting -name=test -numjobs=8 -thread -iodepth=64
> 3)randwrite: fio -filename=/dev/mmcblk0p72 -direct=1 -rw=randwrite -bs=4K -size=512M -group_reporting -name=test -numjobs=8 -thread -iodepth=64
> 4)randread: fio -filename=/dev/mmcblk0p72 -direct=1 -rw=randread -bs=4K -size=512M -group_reporting -name=test -numjobs=8 -thread -iodepth=64
> 5)randwrite: fio -filename=/dev/mmcblk0p72 -direct=1 -rw=randwrite -bs=8K -size=512M -group_reporting -name=test -numjobs=8 -thread -iodepth=64
> 6)randread: fio -filename=/dev/mmcblk0p72 -direct=1 -rw=randread -bs=8K -size=512M -group_reporting -name=test -numjobs=8 -thread -iodepth=64
> 7)randwrite: fio -filename=/dev/mmcblk0p72 -direct=1 -rw=randwrite -bs=16K -size=512M -group_reporting -name=test -numjobs=8 -thread -iodepth=64
> 8)randread: fio -filename=/dev/mmcblk0p72 -direct=1 -rw=randread -bs=16K -size=512M -group_reporting -name=test -numjobs=8 -thread -iodepth=64
>
> Wenchao Chen (2):
> mmc: queue: replace immediate with hsq->depth
> mmc: hsq: dynamic adjustment of hsq->depth
>
> drivers/mmc/core/queue.c | 6 +-----
> drivers/mmc/host/mmc_hsq.c | 22 ++++++++++++++++++++++
> drivers/mmc/host/mmc_hsq.h | 11 +++++++++++
> include/linux/mmc/host.h | 1 +
> 4 files changed, 35 insertions(+), 5 deletions(-)
>
> --
> 2.17.1
>

The series applied for next! I took the liberty of making some updates
to the commit messages and a small adjustment of the code in patch2.
Please have a look and yell at me if there is something that looks
odd!

Thanks and kind regards
Uffe

2023-09-26 16:22:28

by Wenchao Chen

[permalink] [raw]
Subject: Re: [PATCH V4 0/2] mmc: hsq: dynamically adjust hsq_depth to improve performance

A gentle ping.

On Tue, 19 Sept 2023 at 15:47, Wenchao Chen <[email protected]> wrote:
>
> Change in v4:
> - Remove "struct hsq_slot *slot" and simplify the code.
> - In general, need_change has to be greater than or equal to 2 to allow the threshold
> to be adjusted.
>
> Change in v3:
> - Use "mrq->data->blksz * mrq->data->blocks == 4096" for 4K.
> - Add explanation for "HSQ_PERFORMANCE_DEPTH".
>
> Change in v2:
> - Support for dynamic adjustment of hsq_depth.
>
> Test
> =====
> I tested 3 times for each case and output a average speed.
> Ran 'fio' to evaluate the performance:
> 1.Fixed hsq_depth
> 1) Sequential write:
> Speed: 168 164 165
> Average speed: 165.67MB/S
>
> 2) Sequential read:
> Speed: 326 326 326
> Average speed: 326MB/S
>
> 3) Random write:
> Speed: 82.6 83 83
> Average speed: 82.87MB/S
>
> 4) Random read:
> Speed: 48.2 48.3 47.6
> Average speed: 48.03MB/S
>
> 2.Dynamic hsq_depth
> 1) Sequential write:
> Speed: 167 166 166
> Average speed: 166.33MB/S
>
> 2) Sequential read:
> Speed: 327 326 326
> Average speed: 326.3MB/S
>
> 3) Random write:
> Speed: 86.1 86.2 87.7
> Average speed: 86.67MB/S
>
> 4) Random read:
> Speed: 48.1 48 48
> Average speed: 48.03MB/S
>
> Based on the above data, dynamic hsq_depth can improve the performance of random writes.
> Random write improved by 4.6%.
>
> In addition, we tested 8K and 16K.
> 1.Fixed hsq_depth
> 1) Random write(bs=8K):
> Speed: 116 114 115
> Average speed: 115MB/S
>
> 2) Random read(bs=8K):
> Speed: 83 83 82.5
> Average speed: 82.8MB/S
>
> 3) Random write(bs=16K):
> Speed: 141 142 141
> Average speed: 141.3MB/S
>
> 4) Random read(bs=16K):
> Speed: 132 132 132
> Average speed: 132MB/S
>
> 2.Dynamic hsq_depth(mrq->data->blksz * mrq->data->blocks == 8192 or 16384)
> 1) Random write(bs=8K):
> Speed: 115 115 115
> Average speed: 115MB/S
>
> 2) Random read(bs=8K):
> Speed: 82.7 82.9 82.8
> Average speed: 82.8MB/S
>
> 3) Random write(bs=16K):
> Speed: 143 141 141
> Average speed: 141.6MB/S
>
> 4) Random read(bs=16K):
> Speed: 132 132 132
> Average speed: 132MB/S
>
> Increasing hsq_depth cannot improve 8k and 16k random read/write performance.
> To reduce latency, we dynamically increase hsq_depth only for 4k random writes.
>
> Test cmd
> =========
> 1)write: fio -filename=/dev/mmcblk0p72 -direct=1 -rw=write -bs=512K -size=512M -group_reporting -name=test -numjobs=8 -thread -iodepth=64
> 2)read: fio -filename=/dev/mmcblk0p72 -direct=1 -rw=read -bs=512K -size=512M -group_reporting -name=test -numjobs=8 -thread -iodepth=64
> 3)randwrite: fio -filename=/dev/mmcblk0p72 -direct=1 -rw=randwrite -bs=4K -size=512M -group_reporting -name=test -numjobs=8 -thread -iodepth=64
> 4)randread: fio -filename=/dev/mmcblk0p72 -direct=1 -rw=randread -bs=4K -size=512M -group_reporting -name=test -numjobs=8 -thread -iodepth=64
> 5)randwrite: fio -filename=/dev/mmcblk0p72 -direct=1 -rw=randwrite -bs=8K -size=512M -group_reporting -name=test -numjobs=8 -thread -iodepth=64
> 6)randread: fio -filename=/dev/mmcblk0p72 -direct=1 -rw=randread -bs=8K -size=512M -group_reporting -name=test -numjobs=8 -thread -iodepth=64
> 7)randwrite: fio -filename=/dev/mmcblk0p72 -direct=1 -rw=randwrite -bs=16K -size=512M -group_reporting -name=test -numjobs=8 -thread -iodepth=64
> 8)randread: fio -filename=/dev/mmcblk0p72 -direct=1 -rw=randread -bs=16K -size=512M -group_reporting -name=test -numjobs=8 -thread -iodepth=64
>
> Wenchao Chen (2):
> mmc: queue: replace immediate with hsq->depth
> mmc: hsq: dynamic adjustment of hsq->depth
>
> drivers/mmc/core/queue.c | 6 +-----
> drivers/mmc/host/mmc_hsq.c | 22 ++++++++++++++++++++++
> drivers/mmc/host/mmc_hsq.h | 11 +++++++++++
> include/linux/mmc/host.h | 1 +
> 4 files changed, 35 insertions(+), 5 deletions(-)
>
> --
> 2.17.1
>