2020-09-29 09:15:31

by Leon Romanovsky

Subject: [PATCH blk-next 0/2] Delete the get_vector_affinity leftovers

From: Leon Romanovsky <[email protected]>

There are no drivers that implement .get_vector_affinity(), so delete
the RDMA function and simplify block code.

Thanks

P.S. This should probably go through the block tree.

Leon Romanovsky (2):
blk-mq-rdma: Delete not-used multi-queue RDMA map queue code
RDMA/core: Delete not-implemented get_vector_affinity

block/Kconfig | 5 ----
block/Makefile | 1 -
block/blk-mq-rdma.c | 44 --------------------------------
drivers/infiniband/core/device.c | 1 -
drivers/nvme/host/rdma.c | 7 ++---
include/linux/blk-mq-rdma.h | 11 --------
include/rdma/ib_verbs.h | 23 -----------------
7 files changed, 2 insertions(+), 90 deletions(-)
delete mode 100644 block/blk-mq-rdma.c
delete mode 100644 include/linux/blk-mq-rdma.h

--
2.26.2


2020-09-29 09:16:10

by Leon Romanovsky

Subject: [PATCH blk-next 2/2] RDMA/core: Delete not-implemented get_vector_affinity

From: Leon Romanovsky <[email protected]>

There are no drivers that support .get_vector_affinity(), so delete it.

Fixes: 9afc97c29b03 ("mlx5: remove support for ib_get_vector_affinity")
Signed-off-by: Leon Romanovsky <[email protected]>
---
drivers/infiniband/core/device.c | 1 -
include/rdma/ib_verbs.h | 23 -----------------------
2 files changed, 24 deletions(-)

diff --git a/drivers/infiniband/core/device.c b/drivers/infiniband/core/device.c
index 417d93bbdaca..e00ce044555d 100644
--- a/drivers/infiniband/core/device.c
+++ b/drivers/infiniband/core/device.c
@@ -2619,7 +2619,6 @@ void ib_set_device_ops(struct ib_device *dev, const struct ib_device_ops *ops)
SET_DEVICE_OP(dev_ops, get_link_layer);
SET_DEVICE_OP(dev_ops, get_netdev);
SET_DEVICE_OP(dev_ops, get_port_immutable);
- SET_DEVICE_OP(dev_ops, get_vector_affinity);
SET_DEVICE_OP(dev_ops, get_vf_config);
SET_DEVICE_OP(dev_ops, get_vf_guid);
SET_DEVICE_OP(dev_ops, get_vf_stats);
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 7fb09a36b654..b1b279d888f0 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -2319,8 +2319,6 @@ struct ib_device_ops {
int (*modify_device)(struct ib_device *device, int device_modify_mask,
struct ib_device_modify *device_modify);
void (*get_dev_fw_str)(struct ib_device *device, char *str);
- const struct cpumask *(*get_vector_affinity)(struct ib_device *ibdev,
- int comp_vector);
int (*query_port)(struct ib_device *device, u8 port_num,
struct ib_port_attr *port_attr);
int (*modify_port)(struct ib_device *device, u8 port_num,
@@ -4545,27 +4543,6 @@ static inline __be16 ib_lid_be16(u32 lid)
return cpu_to_be16((u16)lid);
}

-/**
- * ib_get_vector_affinity - Get the affinity mappings of a given completion
- * vector
- * @device: the rdma device
- * @comp_vector: index of completion vector
- *
- * Returns NULL on failure, otherwise a corresponding cpu map of the
- * completion vector (returns all-cpus map if the device driver doesn't
- * implement get_vector_affinity).
- */
-static inline const struct cpumask *
-ib_get_vector_affinity(struct ib_device *device, int comp_vector)
-{
- if (comp_vector < 0 || comp_vector >= device->num_comp_vectors ||
- !device->ops.get_vector_affinity)
- return NULL;
-
- return device->ops.get_vector_affinity(device, comp_vector);
-
-}
-
/**
* rdma_roce_rescan_device - Rescan all of the network devices in the system
* and add their gids, as needed, to the relevant RoCE devices.
--
2.26.2

2020-09-29 09:16:54

by Leon Romanovsky

Subject: [PATCH blk-next 1/2] blk-mq-rdma: Delete not-used multi-queue RDMA map queue code

From: Leon Romanovsky <[email protected]>

The RDMA vector affinity code is not backed by any driver and always
returns NULL to every ib_get_vector_affinity() call.

This means that blk_mq_rdma_map_queues() always takes the fallback path.
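For context, a minimal sketch of the behaviour described above (an editor's
illustration in Python, not kernel code; the round-robin fallback is a
simplified stand-in for blk_mq_map_queues(), which in reality also considers
CPU siblings):

```python
def map_queues_rdma(nr_queues, queue_offset, nr_cpus, vector_affinity):
    """Sketch of the removed blk_mq_rdma_map_queues() logic.

    vector_affinity(vec) stands in for ib_get_vector_affinity(); it
    returns a set of CPUs, or None -- which is all it can return now
    that no driver implements .get_vector_affinity().
    """
    mq_map = {}
    for queue in range(nr_queues):
        mask = vector_affinity(queue)  # first_vec assumed to be 0
        if mask is None:
            # Fallback path: naive round-robin over the queues, a
            # simplified stand-in for blk_mq_map_queues().
            return {cpu: queue_offset + cpu % nr_queues
                    for cpu in range(nr_cpus)}
        for cpu in mask:
            mq_map[cpu] = queue_offset + queue
    return mq_map
```

Since the only affinity callback any driver provides today is one that
returns None, the per-vector loop never completes and the fallback always
wins, which is exactly why the helper can be deleted.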

Fixes: 9afc97c29b03 ("mlx5: remove support for ib_get_vector_affinity")
Signed-off-by: Leon Romanovsky <[email protected]>
---
block/Kconfig | 5 -----
block/Makefile | 1 -
block/blk-mq-rdma.c | 44 -------------------------------------
drivers/nvme/host/rdma.c | 7 ++----
include/linux/blk-mq-rdma.h | 11 ----------
5 files changed, 2 insertions(+), 66 deletions(-)
delete mode 100644 block/blk-mq-rdma.c
delete mode 100644 include/linux/blk-mq-rdma.h

diff --git a/block/Kconfig b/block/Kconfig
index bbad5e8bbffe..8ede308a1343 100644
--- a/block/Kconfig
+++ b/block/Kconfig
@@ -227,11 +227,6 @@ config BLK_MQ_VIRTIO
depends on BLOCK && VIRTIO
default y

-config BLK_MQ_RDMA
- bool
- depends on BLOCK && INFINIBAND
- default y
-
config BLK_PM
def_bool BLOCK && PM

diff --git a/block/Makefile b/block/Makefile
index 8d841f5f986f..bbdc3e82308a 100644
--- a/block/Makefile
+++ b/block/Makefile
@@ -29,7 +29,6 @@ obj-$(CONFIG_BLK_DEV_INTEGRITY) += bio-integrity.o blk-integrity.o
obj-$(CONFIG_BLK_DEV_INTEGRITY_T10) += t10-pi.o
obj-$(CONFIG_BLK_MQ_PCI) += blk-mq-pci.o
obj-$(CONFIG_BLK_MQ_VIRTIO) += blk-mq-virtio.o
-obj-$(CONFIG_BLK_MQ_RDMA) += blk-mq-rdma.o
obj-$(CONFIG_BLK_DEV_ZONED) += blk-zoned.o
obj-$(CONFIG_BLK_WBT) += blk-wbt.o
obj-$(CONFIG_BLK_DEBUG_FS) += blk-mq-debugfs.o
diff --git a/block/blk-mq-rdma.c b/block/blk-mq-rdma.c
deleted file mode 100644
index 14f968e58b8f..000000000000
--- a/block/blk-mq-rdma.c
+++ /dev/null
@@ -1,44 +0,0 @@
-// SPDX-License-Identifier: GPL-2.0
-/*
- * Copyright (c) 2017 Sagi Grimberg.
- */
-#include <linux/blk-mq.h>
-#include <linux/blk-mq-rdma.h>
-#include <rdma/ib_verbs.h>
-
-/**
- * blk_mq_rdma_map_queues - provide a default queue mapping for rdma device
- * @map: CPU to hardware queue map.
- * @dev: rdma device to provide a mapping for.
- * @first_vec: first interrupt vectors to use for queues (usually 0)
- *
- * This function assumes the rdma device @dev has at least as many available
- * interrupt vetors as @set has queues. It will then query it's affinity mask
- * and built queue mapping that maps a queue to the CPUs that have irq affinity
- * for the corresponding vector.
- *
- * In case either the driver passed a @dev with less vectors than
- * @set->nr_hw_queues, or @dev does not provide an affinity mask for a
- * vector, we fallback to the naive mapping.
- */
-int blk_mq_rdma_map_queues(struct blk_mq_queue_map *map,
- struct ib_device *dev, int first_vec)
-{
- const struct cpumask *mask;
- unsigned int queue, cpu;
-
- for (queue = 0; queue < map->nr_queues; queue++) {
- mask = ib_get_vector_affinity(dev, first_vec + queue);
- if (!mask)
- goto fallback;
-
- for_each_cpu(cpu, mask)
- map->mq_map[cpu] = map->queue_offset + queue;
- }
-
- return 0;
-
-fallback:
- return blk_mq_map_queues(map);
-}
-EXPORT_SYMBOL_GPL(blk_mq_rdma_map_queues);
diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
index 9e378d0a0c01..5989d4e35ef3 100644
--- a/drivers/nvme/host/rdma.c
+++ b/drivers/nvme/host/rdma.c
@@ -12,7 +12,6 @@
#include <linux/string.h>
#include <linux/atomic.h>
#include <linux/blk-mq.h>
-#include <linux/blk-mq-rdma.h>
#include <linux/types.h>
#include <linux/list.h>
#include <linux/mutex.h>
@@ -2171,10 +2170,8 @@ static int nvme_rdma_map_queues(struct blk_mq_tag_set *set)
ctrl->io_queues[HCTX_TYPE_DEFAULT];
set->map[HCTX_TYPE_READ].queue_offset = 0;
}
- blk_mq_rdma_map_queues(&set->map[HCTX_TYPE_DEFAULT],
- ctrl->device->dev, 0);
- blk_mq_rdma_map_queues(&set->map[HCTX_TYPE_READ],
- ctrl->device->dev, 0);
+ blk_mq_map_queues(&set->map[HCTX_TYPE_DEFAULT]);
+ blk_mq_map_queues(&set->map[HCTX_TYPE_READ]);

if (opts->nr_poll_queues && ctrl->io_queues[HCTX_TYPE_POLL]) {
/* map dedicated poll queues only if we have queues left */
diff --git a/include/linux/blk-mq-rdma.h b/include/linux/blk-mq-rdma.h
deleted file mode 100644
index 5cc5f0f36218..000000000000
--- a/include/linux/blk-mq-rdma.h
+++ /dev/null
@@ -1,11 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0 */
-#ifndef _LINUX_BLK_MQ_RDMA_H
-#define _LINUX_BLK_MQ_RDMA_H
-
-struct blk_mq_tag_set;
-struct ib_device;
-
-int blk_mq_rdma_map_queues(struct blk_mq_queue_map *map,
- struct ib_device *dev, int first_vec);
-
-#endif /* _LINUX_BLK_MQ_RDMA_H */
--
2.26.2

2020-09-29 10:22:15

by Christoph Hellwig

Subject: Re: [PATCH blk-next 1/2] blk-mq-rdma: Delete not-used multi-queue RDMA map queue code

On Tue, Sep 29, 2020 at 12:13:57PM +0300, Leon Romanovsky wrote:
> From: Leon Romanovsky <[email protected]>
>
> The RDMA vector affinity code is not backed by any driver and always
> returns NULL to every ib_get_vector_affinity() call.
>
> This means that blk_mq_rdma_map_queues() always takes the fallback path.
>
> Fixes: 9afc97c29b03 ("mlx5: remove support for ib_get_vector_affinity")
> Signed-off-by: Leon Romanovsky <[email protected]>

So you guys totally broke the nvme queue assignment without even
telling anyone? Great job!

2020-09-29 10:37:53

by Leon Romanovsky

Subject: Re: [PATCH blk-next 1/2] blk-mq-rdma: Delete not-used multi-queue RDMA map queue code

On Tue, Sep 29, 2020 at 12:20:46PM +0200, Christoph Hellwig wrote:
> On Tue, Sep 29, 2020 at 12:13:57PM +0300, Leon Romanovsky wrote:
> > From: Leon Romanovsky <[email protected]>
> >
> > The RDMA vector affinity code is not backed by any driver and always
> > returns NULL to every ib_get_vector_affinity() call.
> >
> > This means that blk_mq_rdma_map_queues() always takes the fallback path.
> >
> > Fixes: 9afc97c29b03 ("mlx5: remove support for ib_get_vector_affinity")
> > Signed-off-by: Leon Romanovsky <[email protected]>
>
> So you guys totally broke the nvme queue assignment without even
> telling anyone? Great job!

Who is "you guys"? And it wasn't done silently, either. I'm sure that Sagi knows the craft.
https://lore.kernel.org/linux-rdma/[email protected]/

commit 759ace7832802eaefbca821b2b43a44ab896b449
Author: Sagi Grimberg <[email protected]>
Date: Thu Nov 1 13:08:07 2018 -0700

i40iw: remove support for ib_get_vector_affinity

....

commit 9afc97c29b032af9a4112c2f4a02d5313b4dc71f
Author: Sagi Grimberg <[email protected]>
Date: Thu Nov 1 09:13:12 2018 -0700

mlx5: remove support for ib_get_vector_affinity

Thanks

2020-09-29 18:31:57

by Sagi Grimberg

Subject: Re: [PATCH blk-next 1/2] blk-mq-rdma: Delete not-used multi-queue RDMA map queue code


>>> From: Leon Romanovsky <[email protected]>
>>>
>>> The RDMA vector affinity code is not backed by any driver and always
>>> returns NULL to every ib_get_vector_affinity() call.
>>>
>>> This means that blk_mq_rdma_map_queues() always takes the fallback path.
>>>
>>> Fixes: 9afc97c29b03 ("mlx5: remove support for ib_get_vector_affinity")
>>> Signed-off-by: Leon Romanovsky <[email protected]>
>>
>> So you guys totally broke the nvme queue assignment without even
>> telling anyone? Great job!
>
> Who is "you guys"? And it wasn't done silently, either. I'm sure that Sagi knows the craft.
> https://lore.kernel.org/linux-rdma/[email protected]/
>
> commit 759ace7832802eaefbca821b2b43a44ab896b449
> Author: Sagi Grimberg <[email protected]>
> Date: Thu Nov 1 13:08:07 2018 -0700
>
> i40iw: remove support for ib_get_vector_affinity
>
> ....
>
> commit 9afc97c29b032af9a4112c2f4a02d5313b4dc71f
> Author: Sagi Grimberg <[email protected]>
> Date: Thu Nov 1 09:13:12 2018 -0700
>
> mlx5: remove support for ib_get_vector_affinity
>
> Thanks

Yes, basically the use of managed affinity caused people to report
regressions about not being able to change irq affinity from procfs.

Back then I started a discussion with Thomas about making managed
affinity still allow userspace to modify this, but that was dropped
at some point. So currently rdma cannot do automatic irq
affinitization out of the box.
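To make the procfs point concrete, here is a toy model of the regression
users reported (an editor's illustration, not kernel code; the exact errno
returned for a rejected write is an assumption):

```python
class Irq:
    """Simplified model of an interrupt line's affinity handling."""

    def __init__(self, managed, affinity):
        self.managed = managed          # True for managed (kernel-owned) irqs
        self.affinity = set(affinity)   # current CPU mask

    def write_smp_affinity(self, cpus):
        # Models a write to /proc/irq/<N>/smp_affinity: for managed
        # interrupts the kernel refuses the write, so userspace cannot
        # override the kernel-chosen spread.
        if self.managed:
            raise OSError(5, "Input/output error")
        self.affinity = set(cpus)
```

With unmanaged interrupts the write succeeds and the mask changes; with
managed ones it fails outright, which is the behaviour people complained
about.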

2020-10-01 05:05:25

by Leon Romanovsky

Subject: Re: [PATCH blk-next 0/2] Delete the get_vector_affinity leftovers

On Tue, Sep 29, 2020 at 12:13:56PM +0300, Leon Romanovsky wrote:
> From: Leon Romanovsky <[email protected]>
>
> There are no drivers that implement .get_vector_affinity(), so delete
> the RDMA function and simplify block code.
>
> Thanks
>
> P.S. This should probably go through the block tree.
>
> Leon Romanovsky (2):
> blk-mq-rdma: Delete not-used multi-queue RDMA map queue code
> RDMA/core: Delete not-implemented get_vector_affinity

Jens, Keith

How can we progress here?

Thanks

>
> block/Kconfig | 5 ----
> block/Makefile | 1 -
> block/blk-mq-rdma.c | 44 --------------------------------
> drivers/infiniband/core/device.c | 1 -
> drivers/nvme/host/rdma.c | 7 ++---
> include/linux/blk-mq-rdma.h | 11 --------
> include/rdma/ib_verbs.h | 23 -----------------
> 7 files changed, 2 insertions(+), 90 deletions(-)
> delete mode 100644 block/blk-mq-rdma.c
> delete mode 100644 include/linux/blk-mq-rdma.h
>
> --
> 2.26.2
>

2020-10-02 01:29:37

by Jens Axboe

Subject: Re: [PATCH blk-next 0/2] Delete the get_vector_affinity leftovers

On 9/30/20 11:01 PM, Leon Romanovsky wrote:
> On Tue, Sep 29, 2020 at 12:13:56PM +0300, Leon Romanovsky wrote:
>> From: Leon Romanovsky <[email protected]>
>>
>> There are no drivers that implement .get_vector_affinity(), so delete
>> the RDMA function and simplify block code.
>>
>> Thanks
>>
>> P.S. This should probably go through the block tree.
>>
>> Leon Romanovsky (2):
>> blk-mq-rdma: Delete not-used multi-queue RDMA map queue code
>> RDMA/core: Delete not-implemented get_vector_affinity
>
> Jens, Keith
>
> How can we progress here?

I'd really like for the nvme side to sign off on this first.

--
Jens Axboe

2020-10-02 06:49:28

by Christoph Hellwig

Subject: Re: [PATCH blk-next 1/2] blk-mq-rdma: Delete not-used multi-queue RDMA map queue code

On Tue, Sep 29, 2020 at 11:24:49AM -0700, Sagi Grimberg wrote:
> Yes, basically the use of managed affinity caused people to report
> regressions about not being able to change irq affinity from procfs.

Well, why would they change it? The whole point of the infrastructure
is that there is a single sane affinity setting for a given setup. Now
that setting needed some refinement from the original series (e.g. the
current series about only using housekeeping cpus if cpu isolation is
in use). But allowing random users to modify affinity is just a recipe
for a trainwreck.

So I think we need to bring this back ASAP, as doing affinity right
out of the box is an absolute requirement for sane performance without
all the benchmarketing deep magic.

2020-10-02 20:23:23

by Sagi Grimberg

Subject: Re: [PATCH blk-next 1/2] blk-mq-rdma: Delete not-used multi-queue RDMA map queue code


>> Yes, basically the use of managed affinity caused people to report
>> regressions about not being able to change irq affinity from procfs.
>
> Well, why would they change it? The whole point of the infrastructure
> is that there is a single sane affinity setting for a given setup. Now
> that setting needed some refinement from the original series (e.g. the
> current series about only using housekeeping cpus if cpu isolation is
> in use). But allowing random users to modify affinity is just a recipe
> for a trainwreck.

Well, allowing people to mangle irq affinity settings seems to be a hard
requirement from the discussions in the past.

> So I think we need to bring this back ASAP, as doing affinity right
> out of the box is an absolute requirement for sane performance without
> all the benchmarketing deep magic.

Well, it's hard to argue that setting custom irq affinity is useful to
no one and hence should be prevented. I'd expect irq settings to have a
sane default that works, and if someone wants to change them, they can,
but with no guarantee of optimal performance. But IIRC this had some
dependencies on drivers and some more infrastructure to handle dynamic
changes...

2020-10-05 08:39:43

by Christoph Hellwig

Subject: Re: [PATCH blk-next 1/2] blk-mq-rdma: Delete not-used multi-queue RDMA map queue code

On Fri, Oct 02, 2020 at 01:20:35PM -0700, Sagi Grimberg wrote:
>> Well, why would they change it? The whole point of the infrastructure
>> is that there is a single sane affinity setting for a given setup. Now
>> that setting needed some refinement from the original series (e.g. the
>> current series about only using housekeeping cpus if cpu isolation is
>> in use). But allowing random users to modify affinity is just a recipe
>> for a trainwreck.
>
> Well, allowing people to mangle irq affinity settings seems to be a hard
> requirement from the discussions in the past.
>
>> So I think we need to bring this back ASAP, as doing affinity right
>> out of the box is an absolute requirement for sane performance without
>> all the benchmarketing deep magic.
>
> Well, it's hard to argue that setting custom irq affinity is useful to
> no one and hence should be prevented. I'd expect irq settings to have a
> sane default that works, and if someone wants to change them, they can,
> but with no guarantee of optimal performance. But IIRC this had some
> dependencies on drivers and some more infrastructure to handle dynamic
> changes...

The problem is that people change random settings. We need to generalize
it into a sane API (e.g. the housekeeping CPUs thing which totally makes
sense).
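(Editor's note: assuming the series referenced here is the isolcpus=
managed_irq work, the "sane API" takes the form of a boot-time setting
rather than per-irq procfs writes, e.g. on the kernel command line:)

```text
isolcpus=managed_irq,domain,2-7
```

With something like this, managed interrupts are steered onto the
housekeeping CPUs (0-1 in this hypothetical layout) instead of the
isolated ones, without anyone touching /proc/irq/*/smp_affinity.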

2020-10-06 05:01:33

by Leon Romanovsky

Subject: Re: [PATCH blk-next 1/2] blk-mq-rdma: Delete not-used multi-queue RDMA map queue code

On Mon, Oct 05, 2020 at 10:38:17AM +0200, Christoph Hellwig wrote:
> On Fri, Oct 02, 2020 at 01:20:35PM -0700, Sagi Grimberg wrote:
> >> Well, why would they change it? The whole point of the infrastructure
> >> is that there is a single sane affinity setting for a given setup. Now
> >> that setting needed some refinement from the original series (e.g. the
> >> current series about only using housekeeping cpus if cpu isolation is
> >> in use). But allowing random users to modify affinity is just a recipe
> >> for a trainwreck.
> >
> > Well, allowing people to mangle irq affinity settings seems to be a hard
> > requirement from the discussions in the past.
> >
> >> So I think we need to bring this back ASAP, as doing affinity right
> >> out of the box is an absolute requirement for sane performance without
> >> all the benchmarketing deep magic.
> >
> > Well, it's hard to argue that setting custom irq affinity is useful to
> > no one and hence should be prevented. I'd expect irq settings to have a
> > sane default that works, and if someone wants to change them, they can,
> > but with no guarantee of optimal performance. But IIRC this had some
> > dependencies on drivers and some more infrastructure to handle dynamic
> > changes...
>
> The problem is that people change random settings. We need to generalize
> it into a sane API (e.g. the housekeeping CPUs thing which totally makes
> sense).

I don't see many people jumping on the bandwagon. Someone should do it,
but who will? I personally don't have enough knowledge in that area to
do anything meaningful.

Thanks