2023-09-13 19:06:40

by Ping Gan

Subject: [PATCH 0/4] nvmet: support polling queue task for bio request

The nvme target currently has no way to submit a bio to a polling
queue, so bio completion always relies on the device interrupt. When
the system is under heavy load and interrupt contention is high, it
makes sense to add a polling queue task that submits bios to the
disk's polling queue and polls the disk's completion queue for them.
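As background, the generic block-layer polling interface this builds on
looks roughly like the sketch below on recent kernels: mark a bio with
REQ_POLLED, submit it, and drive its completion with bio_poll() instead
of sleeping until the interrupt fires. This is only an illustration, not
code from this series; example_polled_read() and polled_end_io() are
made-up names.

#include <linux/bio.h>
#include <linux/blkdev.h>
#include <linux/completion.h>
#include <linux/mm.h>

static void polled_end_io(struct bio *bio)
{
	complete(bio->bi_private);
}

/* Illustrative only: read one page from a bdev whose request queue
 * advertises poll queues (QUEUE_FLAG_POLL), busy-polling the device's
 * completion queue instead of sleeping until the IRQ handler runs. */
static void example_polled_read(struct block_device *bdev, struct page *page)
{
	DECLARE_COMPLETION_ONSTACK(done);
	struct bio *bio;

	bio = bio_alloc(bdev, 1, REQ_OP_READ | REQ_POLLED, GFP_KERNEL);
	bio_add_page(bio, page, PAGE_SIZE, 0);
	bio->bi_private = &done;
	bio->bi_end_io = polled_end_io;

	submit_bio(bio);

	/* Drive the completion by polling rather than waiting for an IRQ. */
	while (!try_wait_for_completion(&done))
		bio_poll(bio, NULL, 0);

	bio_put(bio);
}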

Ping Gan (4):
nvmet: Add nvme target polling queue task parameters
nvmet: Add polling queue task for nvme target
nvmet: support bio polling queue request
nvme-core: Get lowlevel disk for target polling queue task

drivers/nvme/host/multipath.c | 20 +
drivers/nvme/target/Makefile | 2 +-
drivers/nvme/target/core.c | 55 +-
drivers/nvme/target/io-cmd-bdev.c | 243 ++++++++-
drivers/nvme/target/nvmet.h | 13 +
drivers/nvme/target/polling-queue-thread.c | 594 +++++++++++++++++++++
6 files changed, 895 insertions(+), 32 deletions(-)
create mode 100644 drivers/nvme/target/polling-queue-thread.c

--
2.26.2


2023-09-13 21:22:34

by Chaitanya Kulkarni

Subject: Re: [PATCH 0/4] nvmet: support polling queue task for bio request

On 9/13/2023 1:34 AM, Ping Gan wrote:
> The nvme target currently has no way to submit a bio to a polling
> queue, so bio completion always relies on the device interrupt. When
> the system is under heavy load and interrupt contention is high, it
> makes sense to add a polling queue task that submits bios to the
> disk's polling queue and polls the disk's completion queue for them.
>

I did some work in the past for nvmet polling and saw good
performance improvement.

Can you please share performance numbers for this series ?

-ck


2023-09-14 02:09:10

by Ping Gan

Subject: [PATCH 1/4] nvmet: Add nvme target polling queue task parameters

Add module parameters that define the polling queue task's runtime
behaviour when the nvme target submits bios to an nvme polling queue.

Signed-off-by: Ping Gan <[email protected]>
---
drivers/nvme/target/core.c | 55 ++++++++++++++++++++++++++++++++++++--
1 file changed, 53 insertions(+), 2 deletions(-)

diff --git a/drivers/nvme/target/core.c b/drivers/nvme/target/core.c
index 3935165048e7..6f49965d5d17 100644
--- a/drivers/nvme/target/core.c
+++ b/drivers/nvme/target/core.c
@@ -17,6 +17,29 @@

#include "nvmet.h"

+/* Define the polling queue thread's affinity cpu core.
+ */
+static int pqt_affinity_core = -1;
+module_param(pqt_affinity_core, int, 0644);
+MODULE_PARM_DESC(pqt_affinity_core,
+ "nvme polling queue thread's affinity core, -1 for all online cpus");
+
+/* Define a time (in usecs) that polling queue thread shall sample the
+ * io request ring before determining it to be idle.
+ */
+static int pqt_idle_usecs;
+module_param(pqt_idle_usecs, int, 0644);
+MODULE_PARM_DESC(pqt_idle_usecs,
+ "polling queue task will poll io request till idle time in usecs");
+
+/* Define the polling queue thread ring's size.
+ * The ring will be consumed by polling queue thread.
+ */
+static int pqt_ring_size;
+module_param(pqt_ring_size, int, 0644);
+MODULE_PARM_DESC(pqt_ring_size,
+ "nvme target polling queue thread ring size");
+
struct kmem_cache *nvmet_bvec_cache;
struct workqueue_struct *buffered_io_wq;
struct workqueue_struct *zbd_wq;
@@ -1648,13 +1671,34 @@ static int __init nvmet_init(void)
{
int error = -ENOMEM;

+ if ((pqt_affinity_core >= -1 &&
+ pqt_affinity_core < nr_cpu_ids) ||
+ pqt_idle_usecs > 0 || pqt_ring_size > 0) {
+ if (pqt_idle_usecs == 0)
+ pqt_idle_usecs = 1000; //default 1ms
+ if (pqt_affinity_core < -1 ||
+ pqt_affinity_core >= nr_cpu_ids) {
+ printk(KERN_ERR "bad parameter for affinity core \n");
+ error = -EINVAL;
+ return error;
+ }
+ if (pqt_ring_size == 0)
+ pqt_ring_size = 4096; //default 4k
+ error = nvmet_init_pq_thread(pqt_idle_usecs,
+ pqt_affinity_core, pqt_ring_size);
+ if (error)
+ return error;
+ }
+
nvmet_ana_group_enabled[NVMET_DEFAULT_ANA_GRPID] = 1;

nvmet_bvec_cache = kmem_cache_create("nvmet-bvec",
NVMET_MAX_MPOOL_BVEC * sizeof(struct bio_vec), 0,
SLAB_HWCACHE_ALIGN, NULL);
- if (!nvmet_bvec_cache)
- return -ENOMEM;
+ if (!nvmet_bvec_cache) {
+ error = -ENOMEM;
+ goto out_free_pqt;
+ }

zbd_wq = alloc_workqueue("nvmet-zbd-wq", WQ_MEM_RECLAIM, 0);
if (!zbd_wq)
@@ -1688,6 +1732,8 @@ static int __init nvmet_init(void)
destroy_workqueue(zbd_wq);
out_destroy_bvec_cache:
kmem_cache_destroy(nvmet_bvec_cache);
+out_free_pqt:
+ nvmet_exit_pq_thread();
return error;
}

@@ -1701,6 +1747,11 @@ static void __exit nvmet_exit(void)
destroy_workqueue(zbd_wq);
kmem_cache_destroy(nvmet_bvec_cache);

+ if ((pqt_affinity_core >= -1 &&
+ pqt_affinity_core < nr_cpu_ids) ||
+ pqt_idle_usecs > 0 || pqt_ring_size > 0)
+ nvmet_exit_pq_thread();
+
BUILD_BUG_ON(sizeof(struct nvmf_disc_rsp_page_entry) != 1024);
BUILD_BUG_ON(sizeof(struct nvmf_disc_rsp_page_hdr) != 1024);
}
--
2.26.2
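Usage sketch for the parameters above (assuming they land as-is): the
thread setup in nvmet_init() only sees values given at module load
time, e.g. something like
'modprobe nvmet pqt_affinity_core=2 pqt_idle_usecs=500 pqt_ring_size=8192'
(hypothetical values). Leaving pqt_idle_usecs or pqt_ring_size at 0
falls back to the 1000 usec and 4096-entry defaults shown in the hunk.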


2023-09-19 08:57:38

by Ping Gan

Subject: Re: [PATCH 0/4] nvmet: support polling queue task for bio request

> On 9/13/2023 1:34 AM, Ping Gan wrote:
> > The nvme target currently has no way to submit a bio to a polling
> > queue, so bio completion always relies on the device interrupt. When
> > the system is under heavy load and interrupt contention is high, it
> > makes sense to add a polling queue task that submits bios to the
> > disk's polling queue and polls the disk's completion queue for them.
> >
>
> I did some work in the past for nvmet polling and saw good
> performance improvement.
>
> Can you please share performance numbers for this series ?
>
> -ck

Hi,
I have verified this patch set on two testbeds, one for the host and
the other for the target. I used TCP as the transport protocol and
spdk perf as the initiator. I ran two groups of tests: the first with
a 4K I/O size and the other with 2M. Both groups cover randrw,
randwrite and randread, with the same prerequisites. On the initiator
side I used 1 qp, a queue depth of 32 and 1 spdk perf application; on
the target side I bound the TCP queue to 1 target core. I got the
results below.
iosize_4k   polling queue                     interrupt
randrw      NIC_rx: 338M/s  NIC_tx: 335M/s    NIC_rx: 260M/s  NIC_tx: 258M/s
randwrite   NIC_rx: 587M/s                    NIC_rx: 431M/s
randread    NIC_tx: 873M/s                    NIC_tx: 654M/s

iosize_2M   polling queue                     interrupt
randrw      NIC_rx: 738M/s  NIC_tx: 741M/s    NIC_rx: 674M/s  NIC_tx: 674M/s
randwrite   NIC_rx: 1199M/s                   NIC_rx: 1146M/s
randread    NIC_tx: 2226M/s                   NIC_tx: 2119M/s

For the 4K I/O size, the NIC bandwidth with the polling queue is more
than 30% higher than with interrupt completion. For the 2M I/O size
the improvement is less pronounced: randrw with the polling queue is
about 9% higher than with interrupts, and randwrite and randread are
about 5% higher.
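(For reference, those ratios computed from the tables above:
338/260 ≈ 1.30, 587/431 ≈ 1.36 and 873/654 ≈ 1.33 for 4K;
738/674 ≈ 1.09, 1199/1146 ≈ 1.05 and 2226/2119 ≈ 1.05 for 2M.)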


Thanks,
Ping