2021-04-05 05:51:30

by Leon Romanovsky

Subject: [PATCH rdma-next 0/8] Generalize if ULP supported check

From: Leon Romanovsky <[email protected]>

Hi,

This series adds a new callback that lets an IB client report whether it
supports a given device. Such a general callback lets us reduce the memory
footprint by not starting clients on devices they are not going to work on
anyway.

Thanks

Parav Pandit (8):
RDMA/core: Check if client supports IB device or not
RDMA/cma: Skip device which doesn't support CM
IB/cm: Skip device which doesn't support IB CM
IB/core: Skip device which doesn't have necessary capabilities
IB/IPoIB: Skip device which doesn't have InfiniBand port
IB/opa_vnic: Move to client_supported callback
net/smc: Move to client_supported callback
net/rds: Move to client_supported callback

drivers/infiniband/core/cm.c | 15 +++++++++++++-
drivers/infiniband/core/cma.c | 15 +++++++++++++-
drivers/infiniband/core/device.c | 3 +++
drivers/infiniband/core/multicast.c | 15 +++++++++++++-
drivers/infiniband/core/sa_query.c | 15 +++++++++++++-
drivers/infiniband/ulp/ipoib/ipoib_main.c | 13 ++++++++++++
.../infiniband/ulp/opa_vnic/opa_vnic_vema.c | 4 +---
include/rdma/ib_verbs.h | 9 +++++++++
net/rds/ib.c | 20 ++++++++++++-------
net/smc/smc_ib.c | 9 ++++++---
10 files changed, 101 insertions(+), 17 deletions(-)

--
2.30.2


2021-04-05 05:51:30

by Leon Romanovsky

Subject: [PATCH rdma-next 2/8] RDMA/cma: Skip device which doesn't support CM

From: Parav Pandit <[email protected]>

A switchdev RDMA device does not support IB CM. When such a device is
added to the RDMA CM's device list and an application invokes
rdma_listen(), cma attempts to listen on that device even though its
IB CM attribute is disabled.

Due to this, the rdma_listen() call fails to listen on the other,
non-switchdev devices as well.

The error message below can be seen.

infiniband mlx5_0: RDMA CMA: cma_listen_on_dev, error -38

The failing call flow is shown below.

rdma_listen()
  cma_listen_on_all()
    cma_listen_on_dev()
      _cma_attach_to_dev()
      rdma_listen() <- fails on a specific switchdev device

Hence, when an IB device supports neither IB CM nor IW CM, avoid adding
such a device to the cma list.

Signed-off-by: Parav Pandit <[email protected]>
Signed-off-by: Leon Romanovsky <[email protected]>
---
drivers/infiniband/core/cma.c | 15 ++++++++++++++-
1 file changed, 14 insertions(+), 1 deletion(-)

diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
index 42a1c8955c50..80156faf90de 100644
--- a/drivers/infiniband/core/cma.c
+++ b/drivers/infiniband/core/cma.c
@@ -157,11 +157,13 @@ EXPORT_SYMBOL(rdma_res_to_id);

static int cma_add_one(struct ib_device *device);
static void cma_remove_one(struct ib_device *device, void *client_data);
+static bool cma_supported(struct ib_device *device);

static struct ib_client cma_client = {
.name = "cma",
.add = cma_add_one,
- .remove = cma_remove_one
+ .remove = cma_remove_one,
+ .is_supported = cma_supported,
};

static struct ib_sa_client sa_client;
@@ -4870,6 +4872,17 @@ static void cma_process_remove(struct cma_device *cma_dev)
wait_for_completion(&cma_dev->comp);
}

+static bool cma_supported(struct ib_device *device)
+{
+ u32 i;
+
+ rdma_for_each_port(device, i) {
+ if (rdma_cap_ib_cm(device, i) || rdma_cap_iw_cm(device, i))
+ return true;
+ }
+ return false;
+}
+
static int cma_add_one(struct ib_device *device)
{
struct rdma_id_private *to_destroy;
--
2.30.2
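
The failure mode described in this patch, and the effect of filtering, can
be modeled in plain userspace C. This is only a sketch under assumptions:
model_device, model_supported() and model_listen_on_all() are hypothetical
stand-ins rather than kernel APIs, the per-port flag stands in for
rdma_cap_ib_cm() || rdma_cap_iw_cm(), and the value 38 simply mirrors the
"error -38" in the log line above.

```c
#include <stdbool.h>
#include <stddef.h>

#define MODEL_ENOSYS 38		/* assumption: mirrors "error -38" from the log */
#define MODEL_MAX_PORTS 4

/* Hypothetical stand-in for struct ib_device; not a kernel type. */
struct model_device {
	unsigned int num_ports;			/* must be <= MODEL_MAX_PORTS */
	bool port_has_cm[MODEL_MAX_PORTS];	/* stands in for the CM cap checks */
};

/* Same shape as cma_supported(): true if any port has the capability. */
static bool model_supported(const struct model_device *dev)
{
	for (unsigned int i = 0; i < dev->num_ports; i++) {
		if (dev->port_has_cm[i])
			return true;
	}
	return false;
}

/* Listening on a device with no CM-capable port fails. */
static int model_listen_on_dev(const struct model_device *dev)
{
	return model_supported(dev) ? 0 : -MODEL_ENOSYS;
}

/*
 * Walk the device list and stop at the first error, so one bad device
 * fails the whole call. With filter == true, unsupported devices are
 * skipped, as if they had never been added to the list.
 */
static int model_listen_on_all(const struct model_device *devs, size_t n,
			       bool filter)
{
	for (size_t i = 0; i < n; i++) {
		int ret;

		if (filter && !model_supported(&devs[i]))
			continue;
		ret = model_listen_on_dev(&devs[i]);
		if (ret)
			return ret;
	}
	return 0;
}
```

With one CM-capable device and one device without any CM-capable port on
the list, the unfiltered walk returns the error for the whole call, while
the filtered walk succeeds on the remaining device.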

2021-04-05 05:52:15

by Leon Romanovsky

Subject: [PATCH rdma-next 1/8] RDMA/core: Check if client supports IB device or not

From: Parav Pandit <[email protected]>

RDMA devices use different transports (iWARP, IB, RoCE) and have
different attributes.
Not all clients are interested in all types of devices.

Implement a generic callback that each IB client can implement to decide
whether the IB core should call the client's add() and remove() for a
given IB device and client combination.

Signed-off-by: Parav Pandit <[email protected]>
Signed-off-by: Leon Romanovsky <[email protected]>
---
drivers/infiniband/core/device.c | 3 +++
include/rdma/ib_verbs.h | 9 +++++++++
2 files changed, 12 insertions(+)

diff --git a/drivers/infiniband/core/device.c b/drivers/infiniband/core/device.c
index c660cef66ac6..c9af2deba8c1 100644
--- a/drivers/infiniband/core/device.c
+++ b/drivers/infiniband/core/device.c
@@ -691,6 +691,9 @@ static int add_client_context(struct ib_device *device,
if (!device->kverbs_provider && !client->no_kverbs_req)
return 0;

+ if (client->is_supported && !client->is_supported(device))
+ return 0;
+
down_write(&device->client_data_rwsem);
/*
* So long as the client is registered hold both the client and device
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 59138174affa..777fbcbd4858 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -2756,6 +2756,15 @@ struct ib_client {
const union ib_gid *gid,
const struct sockaddr *addr,
void *client_data);
+ /*
+ * Returns if the client is supported for a given device or not.
+ * @dev: An RDMA device to check if client can support this RDMA or not.
+ *
+ * A client that is interested in specific device attributes, should
+ * implement it to check if client can be supported for this device or
+ * not.
+ */
+ bool (*is_supported)(struct ib_device *dev);

refcount_t uses;
struct completion uses_zero;
--
2.30.2
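
The effect of the new check can be modeled in userspace C. This is a
sketch only: demo_device, demo_client and demo_add_client_context() are
hypothetical stand-ins for struct ib_device, struct ib_client and
add_client_context(), and the add_calls counter replaces the real locking,
refcounting and add() invocation.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct ib_device. */
struct demo_device {
	bool kverbs_provider;
	bool is_ib;		/* illustrative attribute a client may care about */
};

/* Hypothetical stand-in for struct ib_client. */
struct demo_client {
	bool no_kverbs_req;
	/* Optional; a NULL pointer means "supported on every device". */
	bool (*is_supported)(const struct demo_device *dev);
	int add_calls;		/* bookkeeping for this sketch only */
};

static int demo_add_client_context(const struct demo_device *dev,
				   struct demo_client *client)
{
	if (!dev->kverbs_provider && !client->no_kverbs_req)
		return 0;

	/* The check added by the patch: an optional per-client veto. */
	if (client->is_supported && !client->is_supported(dev))
		return 0;

	client->add_calls++;	/* stands in for the setup work and add() */
	return 0;
}

/* Example predicate: a client that only wants IB devices. */
static bool demo_ib_only(const struct demo_device *dev)
{
	return dev->is_ib;
}
```

A client that leaves is_supported unset behaves exactly as before, which
is why existing clients need no change.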

2021-04-05 05:53:38

by Leon Romanovsky

Subject: [PATCH rdma-next 6/8] IB/opa_vnic: Move to client_supported callback

From: Parav Pandit <[email protected]>

Move to the newly introduced client_supported callback.
Avoid client registration, using the newly introduced helper callback, if
the IB device doesn't have OPA VNIC capability.

Signed-off-by: Parav Pandit <[email protected]>
Signed-off-by: Leon Romanovsky <[email protected]>
---
drivers/infiniband/ulp/opa_vnic/opa_vnic_vema.c | 4 +---
1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/drivers/infiniband/ulp/opa_vnic/opa_vnic_vema.c b/drivers/infiniband/ulp/opa_vnic/opa_vnic_vema.c
index cecf0f7cadf9..58658eba97dd 100644
--- a/drivers/infiniband/ulp/opa_vnic/opa_vnic_vema.c
+++ b/drivers/infiniband/ulp/opa_vnic/opa_vnic_vema.c
@@ -121,6 +121,7 @@ static struct ib_client opa_vnic_client = {
.name = opa_vnic_driver_name,
.add = opa_vnic_vema_add_one,
.remove = opa_vnic_vema_rem_one,
+ .is_supported = rdma_cap_opa_vnic,
};

/**
@@ -993,9 +994,6 @@ static int opa_vnic_vema_add_one(struct ib_device *device)
struct opa_vnic_ctrl_port *cport;
int rc, size = sizeof(*cport);

- if (!rdma_cap_opa_vnic(device))
- return -EOPNOTSUPP;
-
size += device->phys_port_cnt * sizeof(struct opa_vnic_vema_port);
cport = kzalloc(size, GFP_KERNEL);
if (!cport)
--
2.30.2

2021-04-05 05:53:38

by Leon Romanovsky

Subject: [PATCH rdma-next 4/8] IB/core: Skip device which doesn't have necessary capabilities

From: Parav Pandit <[email protected]>

If a device doesn't have multicast capability, avoid client registration
for it. This saves 16 Kbytes of memory for an RDMA device consisting of
128 ports.

If a device doesn't support subnet administration, avoid client
registration for it. This saves 8 Kbytes of memory for an RDMA device
consisting of 128 ports.

Signed-off-by: Parav Pandit <[email protected]>
Signed-off-by: Leon Romanovsky <[email protected]>
---
drivers/infiniband/core/multicast.c | 15 ++++++++++++++-
drivers/infiniband/core/sa_query.c | 15 ++++++++++++++-
2 files changed, 28 insertions(+), 2 deletions(-)

diff --git a/drivers/infiniband/core/multicast.c b/drivers/infiniband/core/multicast.c
index a5dd4b7a74bc..8c81acc24e3e 100644
--- a/drivers/infiniband/core/multicast.c
+++ b/drivers/infiniband/core/multicast.c
@@ -44,11 +44,13 @@

static int mcast_add_one(struct ib_device *device);
static void mcast_remove_one(struct ib_device *device, void *client_data);
+static bool mcast_client_supported(struct ib_device *device);

static struct ib_client mcast_client = {
.name = "ib_multicast",
.add = mcast_add_one,
- .remove = mcast_remove_one
+ .remove = mcast_remove_one,
+ .is_supported = mcast_client_supported,
};

static struct ib_sa_client sa_client;
@@ -816,6 +818,17 @@ static void mcast_event_handler(struct ib_event_handler *handler,
}
}

+static bool mcast_client_supported(struct ib_device *device)
+{
+ u32 i;
+
+ rdma_for_each_port(device, i) {
+ if (rdma_cap_ib_mcast(device, i))
+ return true;
+ }
+ return false;
+}
+
static int mcast_add_one(struct ib_device *device)
{
struct mcast_device *dev;
diff --git a/drivers/infiniband/core/sa_query.c b/drivers/infiniband/core/sa_query.c
index 9a4a49c37922..7e00e24d9423 100644
--- a/drivers/infiniband/core/sa_query.c
+++ b/drivers/infiniband/core/sa_query.c
@@ -176,11 +176,13 @@ static const struct nla_policy ib_nl_policy[LS_NLA_TYPE_MAX] = {

static int ib_sa_add_one(struct ib_device *device);
static void ib_sa_remove_one(struct ib_device *device, void *client_data);
+static bool ib_sa_client_supported(struct ib_device *device);

static struct ib_client sa_client = {
.name = "sa",
.add = ib_sa_add_one,
- .remove = ib_sa_remove_one
+ .remove = ib_sa_remove_one,
+ .is_supported = ib_sa_client_supported,
};

static DEFINE_XARRAY_FLAGS(queries, XA_FLAGS_ALLOC | XA_FLAGS_LOCK_IRQ);
@@ -2293,6 +2295,17 @@ static void ib_sa_event(struct ib_event_handler *handler,
}
}

+static bool ib_sa_client_supported(struct ib_device *device)
+{
+ unsigned int i;
+
+ rdma_for_each_port(device, i) {
+ if (rdma_cap_ib_sa(device, i))
+ return true;
+ }
+ return false;
+}
+
static int ib_sa_add_one(struct ib_device *device)
{
struct ib_sa_device *sa_dev;
--
2.30.2

2021-04-05 05:53:43

by Leon Romanovsky

Subject: [PATCH rdma-next 3/8] IB/cm: Skip device which doesn't support IB CM

From: Parav Pandit <[email protected]>

There are at least 3 types of RDMA devices which do not support IB CM.
They are:
(1) a switchdev (eswitch) RDMA device,
(2) an iWARP device, and
(3) an RDMA device without RoCE capability.

Hence, avoid IB CM initialization for such devices.

This saves 8 Kbytes of memory for an eswitch device consisting of 512
ports and also avoids unnecessary initialization for all 3 types of
devices above.

Signed-off-by: Parav Pandit <[email protected]>
Signed-off-by: Leon Romanovsky <[email protected]>
---
drivers/infiniband/core/cm.c | 15 ++++++++++++++-
1 file changed, 14 insertions(+), 1 deletion(-)

diff --git a/drivers/infiniband/core/cm.c b/drivers/infiniband/core/cm.c
index 8a7791ebae69..5025f2c1347b 100644
--- a/drivers/infiniband/core/cm.c
+++ b/drivers/infiniband/core/cm.c
@@ -87,6 +87,7 @@ struct cm_id_private;
struct cm_work;
static int cm_add_one(struct ib_device *device);
static void cm_remove_one(struct ib_device *device, void *client_data);
+static bool cm_supported(struct ib_device *device);
static void cm_process_work(struct cm_id_private *cm_id_priv,
struct cm_work *work);
static int cm_send_sidr_rep_locked(struct cm_id_private *cm_id_priv,
@@ -103,7 +104,8 @@ static int cm_send_rej_locked(struct cm_id_private *cm_id_priv,
static struct ib_client cm_client = {
.name = "cm",
.add = cm_add_one,
- .remove = cm_remove_one
+ .remove = cm_remove_one,
+ .is_supported = cm_supported,
};

static struct ib_cm {
@@ -4371,6 +4373,17 @@ static void cm_remove_port_fs(struct cm_port *port)

}

+static bool cm_supported(struct ib_device *device)
+{
+ u32 i;
+
+ rdma_for_each_port(device, i) {
+ if (rdma_cap_ib_cm(device, i))
+ return true;
+ }
+ return false;
+}
+
static int cm_add_one(struct ib_device *ib_device)
{
struct cm_device *cm_dev;
--
2.30.2

2021-04-05 05:54:43

by Leon Romanovsky

Subject: [PATCH rdma-next 5/8] IB/IPoIB: Skip device which doesn't have InfiniBand port

From: Parav Pandit <[email protected]>

Skip RDMA devices which don't have InfiniBand ports, using the newly
introduced client_supported() callback.

Signed-off-by: Parav Pandit <[email protected]>
Signed-off-by: Leon Romanovsky <[email protected]>
---
drivers/infiniband/ulp/ipoib/ipoib_main.c | 13 +++++++++++++
1 file changed, 13 insertions(+)

diff --git a/drivers/infiniband/ulp/ipoib/ipoib_main.c b/drivers/infiniband/ulp/ipoib/ipoib_main.c
index 8f769ebaacc6..b02c10dea242 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_main.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c
@@ -93,6 +93,7 @@ static struct net_device *ipoib_get_net_dev_by_params(
struct ib_device *dev, u32 port, u16 pkey,
const union ib_gid *gid, const struct sockaddr *addr,
void *client_data);
+static bool ipoib_client_supported(struct ib_device *device);
static int ipoib_set_mac(struct net_device *dev, void *addr);
static int ipoib_ioctl(struct net_device *dev, struct ifreq *ifr,
int cmd);
@@ -102,6 +103,7 @@ static struct ib_client ipoib_client = {
.add = ipoib_add_one,
.remove = ipoib_remove_one,
.get_net_dev_by_params = ipoib_get_net_dev_by_params,
+ .is_supported = ipoib_client_supported,
};

#ifdef CONFIG_INFINIBAND_IPOIB_DEBUG
@@ -2530,6 +2532,17 @@ static struct net_device *ipoib_add_port(const char *format,
return ERR_PTR(-ENOMEM);
}

+static bool ipoib_client_supported(struct ib_device *device)
+{
+ u32 i;
+
+ rdma_for_each_port(device, i) {
+ if (rdma_protocol_ib(device, i))
+ return true;
+ }
+ return false;
+}
+
static int ipoib_add_one(struct ib_device *device)
{
struct list_head *dev_list;
--
2.30.2

2021-04-05 07:46:49

by Leon Romanovsky

Subject: [PATCH rdma-next 7/8] net/smc: Move to client_supported callback

From: Parav Pandit <[email protected]>

Use the newly introduced client_supported() callback to avoid client
addition if the RDMA device is not of IB type.

Signed-off-by: Parav Pandit <[email protected]>
Signed-off-by: Leon Romanovsky <[email protected]>
---
net/smc/smc_ib.c | 9 ++++++---
1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/net/smc/smc_ib.c b/net/smc/smc_ib.c
index 6b65c5d1f957..f7186d9d1299 100644
--- a/net/smc/smc_ib.c
+++ b/net/smc/smc_ib.c
@@ -767,6 +767,11 @@ void smc_ib_ndev_change(struct net_device *ndev, unsigned long event)
mutex_unlock(&smc_ib_devices.mutex);
}

+static bool smc_client_supported(struct ib_device *ibdev)
+{
+ return ibdev->node_type == RDMA_NODE_IB_CA;
+}
+
/* callback function for ib_register_client() */
static int smc_ib_add_dev(struct ib_device *ibdev)
{
@@ -774,9 +779,6 @@ static int smc_ib_add_dev(struct ib_device *ibdev)
u8 port_cnt;
int i;

- if (ibdev->node_type != RDMA_NODE_IB_CA)
- return -EOPNOTSUPP;
-
smcibdev = kzalloc(sizeof(*smcibdev), GFP_KERNEL);
if (!smcibdev)
return -ENOMEM;
@@ -840,6 +842,7 @@ static struct ib_client smc_ib_client = {
.name = "smc_ib",
.add = smc_ib_add_dev,
.remove = smc_ib_remove_dev,
+ .is_supported = smc_client_supported,
};

int __init smc_ib_register_client(void)
--
2.30.2

2021-04-05 07:58:51

by Leon Romanovsky

Subject: [PATCH rdma-next 8/8] net/rds: Move to client_supported callback

From: Parav Pandit <[email protected]>

Use the newly introduced client_supported() callback to avoid client
addition if the RDMA device is not of IB type or if it doesn't
support memory management extensions.

Signed-off-by: Parav Pandit <[email protected]>
Signed-off-by: Leon Romanovsky <[email protected]>
---
net/rds/ib.c | 20 +++++++++++++-------
1 file changed, 13 insertions(+), 7 deletions(-)

diff --git a/net/rds/ib.c b/net/rds/ib.c
index 24c9a9005a6f..bd2ff7d5a718 100644
--- a/net/rds/ib.c
+++ b/net/rds/ib.c
@@ -125,18 +125,23 @@ void rds_ib_dev_put(struct rds_ib_device *rds_ibdev)
queue_work(rds_wq, &rds_ibdev->free_work);
}

-static int rds_ib_add_one(struct ib_device *device)
+static bool rds_client_supported(struct ib_device *device)
{
- struct rds_ib_device *rds_ibdev;
- int ret;
-
/* Only handle IB (no iWARP) devices */
if (device->node_type != RDMA_NODE_IB_CA)
- return -EOPNOTSUPP;
+ return false;

/* Device must support FRWR */
if (!(device->attrs.device_cap_flags & IB_DEVICE_MEM_MGT_EXTENSIONS))
- return -EOPNOTSUPP;
+ return false;
+
+ return true;
+}
+
+static int rds_ib_add_one(struct ib_device *device)
+{
+ struct rds_ib_device *rds_ibdev;
+ int ret;

rds_ibdev = kzalloc_node(sizeof(struct rds_ib_device), GFP_KERNEL,
ibdev_to_node(device));
@@ -288,7 +293,8 @@ static void rds_ib_remove_one(struct ib_device *device, void *client_data)
struct ib_client rds_ib_client = {
.name = "rds_ib",
.add = rds_ib_add_one,
- .remove = rds_ib_remove_one
+ .remove = rds_ib_remove_one,
+ .is_supported = rds_client_supported,
};

static int rds_ib_conn_info_visitor(struct rds_connection *conn,
--
2.30.2

2021-04-05 08:23:49

by Gal Pressman

Subject: Re: [PATCH rdma-next 1/8] RDMA/core: Check if client supports IB device or not

On 05/04/2021 8:49, Leon Romanovsky wrote:
> From: Parav Pandit <[email protected]>
>
> RDMA devices are of different transport(iWarp, IB, RoCE) and have
> different attributes.
> Not all clients are interested in all type of devices.
>
> Implement a generic callback that each IB client can implement to decide
> if client add() or remove() should be done by the IB core or not for a
> given IB device, client combination.
>
> Signed-off-by: Parav Pandit <[email protected]>
> Signed-off-by: Leon Romanovsky <[email protected]>
> ---
> drivers/infiniband/core/device.c | 3 +++
> include/rdma/ib_verbs.h | 9 +++++++++
> 2 files changed, 12 insertions(+)
>
> diff --git a/drivers/infiniband/core/device.c b/drivers/infiniband/core/device.c
> index c660cef66ac6..c9af2deba8c1 100644
> --- a/drivers/infiniband/core/device.c
> +++ b/drivers/infiniband/core/device.c
> @@ -691,6 +691,9 @@ static int add_client_context(struct ib_device *device,
> if (!device->kverbs_provider && !client->no_kverbs_req)
> return 0;
>
> + if (client->is_supported && !client->is_supported(device))
> + return 0;

Isn't it better to remove the kverbs_provider flag (from previous if statement)
and unify it with this generic support check?

2021-04-05 08:56:28

by Leon Romanovsky

Subject: Re: [PATCH rdma-next 1/8] RDMA/core: Check if client supports IB device or not

On Mon, Apr 05, 2021 at 09:20:32AM +0300, Gal Pressman wrote:
> On 05/04/2021 8:49, Leon Romanovsky wrote:
> > From: Parav Pandit <[email protected]>
> >
> > RDMA devices are of different transport(iWarp, IB, RoCE) and have
> > different attributes.
> > Not all clients are interested in all type of devices.
> >
> > Implement a generic callback that each IB client can implement to decide
> > if client add() or remove() should be done by the IB core or not for a
> > given IB device, client combination.
> >
> > Signed-off-by: Parav Pandit <[email protected]>
> > Signed-off-by: Leon Romanovsky <[email protected]>
> > ---
> > drivers/infiniband/core/device.c | 3 +++
> > include/rdma/ib_verbs.h | 9 +++++++++
> > 2 files changed, 12 insertions(+)
> >
> > diff --git a/drivers/infiniband/core/device.c b/drivers/infiniband/core/device.c
> > index c660cef66ac6..c9af2deba8c1 100644
> > --- a/drivers/infiniband/core/device.c
> > +++ b/drivers/infiniband/core/device.c
> > @@ -691,6 +691,9 @@ static int add_client_context(struct ib_device *device,
> > if (!device->kverbs_provider && !client->no_kverbs_req)
> > return 0;
> >
> > + if (client->is_supported && !client->is_supported(device))
> > + return 0;
>
> Isn't it better to remove the kverbs_provider flag (from previous if statement)
> and unify it with this generic support check?

I thought about it, but didn't find it worthwhile. The kverbs_provider flag
needs to be provided by the device, and all ULPs except uverbs would have
the same check.

Thanks

2021-04-06 03:21:40

by Santosh Shilimkar

Subject: [PATCH rdma-next 8/8] net/rds: Move to client_supported callback

On Apr 4, 2021, at 10:50 PM, Leon Romanovsky <[email protected]> wrote:
>
> From: Parav Pandit <[email protected]>
>
> Use newly introduced client_supported() callback to avoid client
> additional if the RDMA device is not of IB type or if it doesn't
> support device memory extensions.
>
> Signed-off-by: Parav Pandit <[email protected]>
> Signed-off-by: Leon Romanovsky <[email protected]>
> ---
> net/rds/ib.c | 20 +++++++++++++-------

Looks fine by me.

Acked-by: Santosh Shilimkar <[email protected]>


2021-04-07 06:53:54

by Jason Gunthorpe

Subject: Re: [PATCH rdma-next 4/8] IB/core: Skip device which doesn't have necessary capabilities

On Mon, Apr 05, 2021 at 08:49:56AM +0300, Leon Romanovsky wrote:
> @@ -2293,6 +2295,17 @@ static void ib_sa_event(struct ib_event_handler *handler,
> }
> }
>
> +static bool ib_sa_client_supported(struct ib_device *device)
> +{
> + unsigned int i;
> +
> + rdma_for_each_port(device, i) {
> + if (rdma_cap_ib_sa(device, i))
> + return true;
> + }
> + return false;
> +}

This is already done though:

for (i = 0; i <= e - s; ++i) {
spin_lock_init(&sa_dev->port[i].ah_lock);
if (!rdma_cap_ib_sa(device, i + 1))
continue;
[..]

if (!count) {
ret = -EOPNOTSUPP;
goto free;

Why does it need to be duplicated? The other patches are all basically
like that too.

The add_one function should return -EOPNOTSUPP if it doesn't want to
run on this device and any supported checks should just be at the
front - this is how things work right now
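
The convention described above can be sketched in userspace C as follows.
This is a model under assumptions, not kernel code: toy_add_one(),
toy_dev_ctx and the caps[] array are hypothetical, with caps[i] standing in
for a per-port check such as rdma_cap_ib_sa(), and calloc()/free() standing
in for kzalloc()/kfree().

```c
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

struct toy_port {
	bool active;
};

struct toy_dev_ctx {
	unsigned int num_ports;
	struct toy_port port[];	/* flexible array, one slot per port */
};

/*
 * On success *out owns the context. On any error nothing stays
 * allocated, so the caller never has to call a remove routine.
 */
static int toy_add_one(const bool *caps, unsigned int num_ports,
		       struct toy_dev_ctx **out)
{
	struct toy_dev_ctx *ctx;
	unsigned int count = 0;

	*out = NULL;
	ctx = calloc(1, sizeof(*ctx) + num_ports * sizeof(ctx->port[0]));
	if (!ctx)
		return -ENOMEM;
	ctx->num_ports = num_ports;

	for (unsigned int i = 0; i < num_ports; i++) {
		if (!caps[i])
			continue;
		ctx->port[i].active = true;
		count++;
	}

	if (!count) {
		free(ctx);	/* the temporary allocation is released here */
		return -EOPNOTSUPP;
	}

	*out = ctx;
	return 0;
}
```

The allocation for an unsupported device is short-lived in this shape: it
exists only between the calloc() and the free() on the -EOPNOTSUPP path.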

Jason

2021-04-07 21:13:13

by Jason Gunthorpe

Subject: Re: [PATCH rdma-next 4/8] IB/core: Skip device which doesn't have necessary capabilities

On Wed, Apr 07, 2021 at 03:06:35PM +0000, Parav Pandit wrote:
>
>
> > From: Jason Gunthorpe <[email protected]>
> > Sent: Tuesday, April 6, 2021 9:17 PM
> >
> > On Mon, Apr 05, 2021 at 08:49:56AM +0300, Leon Romanovsky wrote:
> > > @@ -2293,6 +2295,17 @@ static void ib_sa_event(struct ib_event_handler
> > *handler,
> > > }
> > > }
> > >
> > > +static bool ib_sa_client_supported(struct ib_device *device) {
> > > + unsigned int i;
> > > +
> > > + rdma_for_each_port(device, i) {
> > > + if (rdma_cap_ib_sa(device, i))
> > > + return true;
> > > + }
> > > + return false;
> > > +}
> >
> > This is already done though:

> It is but, ib_sa_device() allocates ib_sa_device worth of struct for
> each port without checking the rdma_cap_ib_sa(). This results into
> allocating 40 * 512 = 20480 rounded of to power of 2 to 32K bytes of
> memory for the rdma device with 512 ports. Other modules are also
> similarly wasting such memory.

If it returns EOPNOTSUPP then the remove is never called, so if it
allocated memory and left it allocated then it is leaking memory.

If you are saying 32k bytes of temporary allocation matters during
device startup then it needs benchmarks and a use case.

> > The add_one function should return -EOPNOTSUPP if it doesn't want to run
> > on this device and any supported checks should just be at the front - this is
> > how things work right now

> I am ok to fold this check at the beginning of add callback. When
> 512 to 1K RoCE devices are used, they do not have SA, CM, CMA etc
> caps on and all the client needs to go through refcnt + xa + sem and
> unroll them. Is_supported() routine helps to cut down all of it. I
> didn't calculate the usec saved with it.

If that is the reason then explain in the cover letter and provide
benchmarks

Jason

2021-04-07 21:13:24

by Parav Pandit

Subject: RE: [PATCH rdma-next 4/8] IB/core: Skip device which doesn't have necessary capabilities



> From: Jason Gunthorpe <[email protected]>
> Sent: Tuesday, April 6, 2021 9:17 PM
>
> On Mon, Apr 05, 2021 at 08:49:56AM +0300, Leon Romanovsky wrote:
> > @@ -2293,6 +2295,17 @@ static void ib_sa_event(struct ib_event_handler
> *handler,
> > }
> > }
> >
> > +static bool ib_sa_client_supported(struct ib_device *device) {
> > + unsigned int i;
> > +
> > + rdma_for_each_port(device, i) {
> > + if (rdma_cap_ib_sa(device, i))
> > + return true;
> > + }
> > + return false;
> > +}
>
> This is already done though:
It is, but ib_sa_device() allocates an ib_sa_device worth of struct for each port without checking rdma_cap_ib_sa().
This results in allocating 40 * 512 = 20480 bytes, rounded up to the next power of 2, i.e. 32K bytes of memory, for an RDMA device with 512 ports.
Other modules are similarly wasting such memory.
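
The arithmetic above can be checked with a few lines of C. The 40 bytes
per port and the power-of-two rounding are taken from the figures in this
message; round_up_pow2() and per_port_footprint() are hypothetical helper
names for this sketch, not kernel functions.

```c
#include <stddef.h>

/* Smallest power of two that is >= n, for n > 0. */
static size_t round_up_pow2(size_t n)
{
	size_t p = 1;

	while (p < n)
		p <<= 1;
	return p;
}

/* Bytes consumed if the allocator rounds the request up to a power of two. */
static size_t per_port_footprint(size_t per_port_bytes, size_t ports)
{
	return round_up_pow2(per_port_bytes * ports);
}
```

For 40 bytes per port and 512 ports, the raw request is 20480 bytes and the
rounded footprint is 32768 bytes, which matches the 32K quoted above.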

>
> for (i = 0; i <= e - s; ++i) {
> spin_lock_init(&sa_dev->port[i].ah_lock);
> if (!rdma_cap_ib_sa(device, i + 1))
> continue;
> [..]
>
> if (!count) {
> ret = -EOPNOTSUPP;
> goto free;
>
> Why does it need to be duplicated? The other patches are all basically like
> that too.
>
> The add_one function should return -EOPNOTSUPP if it doesn't want to run
> on this device and any supported checks should just be at the front - this is
> how things work right now
>
I am ok to fold this check into the beginning of the add callback.
When 512 to 1K RoCE devices are used, they do not have the SA, CM, CMA etc. caps on, and all the clients need to go through refcnt + xa + sem and unroll them.
The is_supported() routine helps to cut down all of it. I didn't calculate the usec saved with it.

Please let me know.

2021-04-07 21:25:10

by Parav Pandit

Subject: RE: [PATCH rdma-next 4/8] IB/core: Skip device which doesn't have necessary capabilities



> From: Jason Gunthorpe <[email protected]>
> Sent: Wednesday, April 7, 2021 8:44 PM
>
> On Wed, Apr 07, 2021 at 03:06:35PM +0000, Parav Pandit wrote:
> >
> >
> > > From: Jason Gunthorpe <[email protected]>
> > > Sent: Tuesday, April 6, 2021 9:17 PM
> > >
> > > On Mon, Apr 05, 2021 at 08:49:56AM +0300, Leon Romanovsky wrote:
> > > > @@ -2293,6 +2295,17 @@ static void ib_sa_event(struct
> > > > ib_event_handler
> > > *handler,
> > > > }
> > > > }
> > > >
> > > > +static bool ib_sa_client_supported(struct ib_device *device) {
> > > > + unsigned int i;
> > > > +
> > > > + rdma_for_each_port(device, i) {
> > > > + if (rdma_cap_ib_sa(device, i))
> > > > + return true;
> > > > + }
> > > > + return false;
> > > > +}
> > >
> > > This is already done though:
>
> > It is but, ib_sa_device() allocates ib_sa_device worth of struct for
> > each port without checking the rdma_cap_ib_sa(). This results into
> > allocating 40 * 512 = 20480 rounded of to power of 2 to 32K bytes of
> > memory for the rdma device with 512 ports. Other modules are also
> > similarly wasting such memory.
>
> If it returns EOPNOTUPP then the remove is never called so if it allocated
> memory and left it allocated then it is leaking memory.
>
I probably confused you. There is no leak today: add_one() allocates the memory, and when the per-port SA/CM etc. cap is not present, that memory is left unused until it is freed in remove_one().
Returning EOPNOTSUPP at the start of add_one(), before the allocation, is fine.

> If you are saying 32k bytes of temporary allocation matters during device
> startup then it needs benchmarks and a use case.
>
The use case is clear and explained in the commit logs, i.e. to not allocate memory which is never used.

> > > The add_one function should return -EOPNOTSUPP if it doesn't want to
> > > run on this device and any supported checks should just be at the
> > > front - this is how things work right now
>
> > I am ok to fold this check at the beginning of add callback. When
> > 512 to 1K RoCE devices are used, they do not have SA, CM, CMA etc caps
> > on and all the client needs to go through refcnt + xa + sem and unroll
> > them. Is_supported() routine helps to cut down all of it. I didn't
> > calculate the usec saved with it.
>
> If that is the reason then explain in the cover letter and provide benchmarks
I doubt it will be significant but I will do a benchmark.

2021-04-08 12:19:35

by Jason Gunthorpe

Subject: Re: [PATCH rdma-next 4/8] IB/core: Skip device which doesn't have necessary capabilities

On Wed, Apr 07, 2021 at 03:44:35PM +0000, Parav Pandit wrote:

> > If it returns EOPNOTUPP then the remove is never called so if it allocated
> > memory and left it allocated then it is leaking memory.
> >
> I probably confused you. There is no leak today because add_one
> allocates memory, and later on when SA/CM etc per port cap is not
> present, it is unused left there which is freed on remove_one().
> Returning EOPNOTUPP is fine at start of add_one() before allocation.

Most of the ULPs are OK, e.g. umad does:

umad_dev = kzalloc(struct_size(umad_dev, ports, e - s + 1), GFP_KERNEL);
if (!umad_dev)
return -ENOMEM;
for (i = s; i <= e; ++i) {
if (!rdma_cap_ib_mad(device, i))
continue;

if (!count) {
ret = -EOPNOTSUPP;
goto free;
free:
/* balances kref_init */
ib_umad_dev_put(umad_dev);

It looks like only cm.c and cma.c need fixing, just fix those two.

The CM using ULPs have a different issue though..

Jason

2021-04-09 12:32:44

by Parav Pandit

Subject: RE: [PATCH rdma-next 4/8] IB/core: Skip device which doesn't have necessary capabilities



> From: Jason Gunthorpe <[email protected]>
> Sent: Thursday, April 8, 2021 5:46 PM
> On Wed, Apr 07, 2021 at 03:44:35PM +0000, Parav Pandit wrote:
>
> > > If it returns EOPNOTUPP then the remove is never called so if it
> > > allocated memory and left it allocated then it is leaking memory.
> > >
> > I probably confused you. There is no leak today because add_one
> > allocates memory, and later on when SA/CM etc per port cap is not
> > present, it is unused left there which is freed on remove_one().
> > Returning EOPNOTUPP is fine at start of add_one() before allocation.
>
> Most of ULPs are OK, eg umad does:
>
> umad_dev = kzalloc(struct_size(umad_dev, ports, e - s + 1),
> GFP_KERNEL);
> if (!umad_dev)
> return -ENOMEM;
> for (i = s; i <= e; ++i) {
> if (!rdma_cap_ib_mad(device, i))
> continue;
>
> if (!count) {
> ret = -EOPNOTSUPP;
> goto free;
> free:
> /* balances kref_init */
> ib_umad_dev_put(umad_dev);
>
> It looks like only cm.c and cma.c need fixing, just fix those two.
Only cma.c needs fixing; cm.c also reports EOPNOTSUPP.
I will send the simplified fix through Leon.