2015-04-07 12:26:02

by Michael Wang

Subject: [PATCH v2 00/17] IB/Verbs: IB Management Helpers


Since v1:
* Apply suggestions from Doug, Ira, Jason and Tom; thanks for the comments :-)
  Please remind me if I missed anything :-P
* Adopt the new callback query_transport() to get the transport type of a
  device directly
* Rework a lot of code to adopt the new management helpers and clean up the
  old helpers

There is plenty of lengthy code that checks the transport type of an IB
device, or the link layer type of its port, but in fact we are just
speculating whether a particular management capability or feature is
supported by the device/port.

Thus, instead of inferring, we should have our own mechanism for checking IB
management capabilities, protocols and features; several proposals are
linked below.

This patch set reforms the way the transport type is obtained: we now use
query_transport() instead of inferring it from the transport and link layer
separately, and we define a new transport type to make the concept more
reasonable.

Mapping List:
driver      node-type   link-layer  old-transport   new-transport
nes         RNIC        ETH         IWARP           IWARP
amso1100    RNIC        ETH         IWARP           IWARP
cxgb3       RNIC        ETH         IWARP           IWARP
cxgb4       RNIC        ETH         IWARP           IWARP
usnic       USNIC_UDP   ETH         USNIC_UDP       USNIC_UDP
ocrdma      IB_CA       ETH         IB              IBOE
mlx4        IB_CA       IB/ETH      IB              IB/IBOE
mlx5        IB_CA       IB          IB              IB
ehca        IB_CA       IB          IB              IB
ipath       IB_CA       IB          IB              IB
mthca       IB_CA       IB          IB              IB
qib         IB_CA       IB          IB              IB

For example:
	if (transport == IB) && (link-layer == ETH)
will now become:
	if (query_transport() == IBOE)

Thus we will be able to get rid of the separate transport and link-layer
checks, which will make it easier to add a new protocol/technology (like
OPA). Also, with the newly introduced management helpers, the IB management
logic will be clearer and easier to extend.
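
To make the example above concrete, here is a minimal user-space sketch, not
kernel code. All demo_* names and enum values are hypothetical stand-ins for
the kernel types; the only thing taken from this cover letter is the mapping
rule that an old IB transport on an Ethernet link layer becomes IBOE.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel enums; values are illustrative. */
enum demo_transport {
	DEMO_TRANSPORT_IB,
	DEMO_TRANSPORT_IWARP,
	DEMO_TRANSPORT_USNIC_UDP,
	DEMO_TRANSPORT_IBOE
};

enum demo_link_layer { DEMO_LINK_IB, DEMO_LINK_ETH };

/* Old style: infer "IB over Ethernet" from two separate properties. */
static bool demo_is_iboe_old(enum demo_transport old_transport,
			     enum demo_link_layer link)
{
	return old_transport == DEMO_TRANSPORT_IB && link == DEMO_LINK_ETH;
}

/* Mapping list rule: (old transport, link layer) -> new transport. */
static enum demo_transport demo_new_transport(enum demo_transport old_transport,
					      enum demo_link_layer link)
{
	/* Only the old IB transport is split by link layer. */
	if (old_transport == DEMO_TRANSPORT_IB && link == DEMO_LINK_ETH)
		return DEMO_TRANSPORT_IBOE;
	return old_transport;
}

/* New style: a single comparison against the reported transport. */
static bool demo_is_iboe_new(enum demo_transport reported)
{
	return reported == DEMO_TRANSPORT_IBOE;
}

/* Check that both styles agree for every old (transport, link) pair. */
static void demo_check_equivalence(void)
{
	const enum demo_transport olds[] = {
		DEMO_TRANSPORT_IB, DEMO_TRANSPORT_IWARP, DEMO_TRANSPORT_USNIC_UDP
	};
	const enum demo_link_layer links[] = { DEMO_LINK_IB, DEMO_LINK_ETH };
	size_t i, j;

	for (i = 0; i < sizeof(olds) / sizeof(olds[0]); i++)
		for (j = 0; j < sizeof(links) / sizeof(links[0]); j++)
			assert(demo_is_iboe_old(olds[i], links[j]) ==
			       demo_is_iboe_new(demo_new_transport(olds[i],
								    links[j])));
}
```

Calling demo_check_equivalence() from a test harness exercises all six
combinations; the point is only that the two-property check collapses into
one comparison once the transport value itself distinguishes IB from IBOE.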

TODO:
The patch set covers a wide range of IB code, so suggestions from those who
are familiar with a particular part would be invaluable ;-)

The patches haven't been tested yet; we would appreciate it if anyone who has
this HW is willing to provide a Tested-by :-)

Proposals:
Sean:
https://www.mail-archive.com/[email protected]/msg23339.html
Doug:
https://www.mail-archive.com/[email protected]/msg23418.html
Jason:
https://www.mail-archive.com/[email protected]/msg23425.html

Michael Wang (17):
[PATCH v2 01/17] IB/Verbs: Implement new callback query_transport() for each HW
[PATCH v2 02/17] IB/Verbs: Implement raw management helpers
[PATCH v2 03/17] IB/Verbs: Use management helper cap_ib_mad() for mad-check
[PATCH v2 04/17] IB/Verbs: Use management helper cap_ib_smi() for smi-check
[PATCH v2 05/17] IB/Verbs: Use management helper cap_ib_cm() for cm-check
[PATCH v2 06/17] IB/Verbs: Use management helper cap_ib_sa() for sa-check
[PATCH v2 07/17] IB/Verbs: Use management helper cap_ib_mcast() for mcast-check
[PATCH v2 08/17] IB/Verbs: Use management helper cap_ipoib() for ipoib-check
[PATCH v2 09/17] IB/Verbs: Use helper cap_read_multi_sge() and reform svc_rdma_accept()
[PATCH v2 10/17] IB/Verbs: Adopt management helpers for IB helpers
[PATCH v2 11/17] IB/Verbs: Reform link_layer_show() and ib_uverbs_query_port()
[PATCH v2 12/17] IB/Verbs: Use management helper cap_ib_cm_dev() for cm-device-check
[PATCH v2 13/17] IB/Verbs: Reform cma/ucma with management helpers
[PATCH v2 14/17] IB/Verbs: Reserve legacy transport type for 'struct rdma_dev_addr'
[PATCH v2 15/17] IB/Verbs: Reform cma_acquire_dev() with management helpers
[PATCH v2 16/17] IB/Verbs: Cleanup rdma_node_get_transport()
[PATCH v2 17/17] IB/Verbs: Move rdma_port_get_link_layer() to mlx4 head file

---
drivers/infiniband/core/agent.c | 2
drivers/infiniband/core/cm.c | 22 +-
drivers/infiniband/core/cma.c | 281 ++++++++++++---------------
drivers/infiniband/core/device.c | 1
drivers/infiniband/core/mad.c | 20 -
drivers/infiniband/core/multicast.c | 12 -
drivers/infiniband/core/sa_query.c | 29 +-
drivers/infiniband/core/sysfs.c | 8
drivers/infiniband/core/ucm.c | 3
drivers/infiniband/core/ucma.c | 25 --
drivers/infiniband/core/user_mad.c | 26 +-
drivers/infiniband/core/uverbs_cmd.c | 6
drivers/infiniband/core/verbs.c | 51 ----
drivers/infiniband/hw/amso1100/c2_provider.c | 7
drivers/infiniband/hw/cxgb3/iwch_provider.c | 7
drivers/infiniband/hw/cxgb4/provider.c | 7
drivers/infiniband/hw/ehca/ehca_hca.c | 6
drivers/infiniband/hw/ehca/ehca_iverbs.h | 3
drivers/infiniband/hw/ehca/ehca_main.c | 1
drivers/infiniband/hw/ipath/ipath_verbs.c | 7
drivers/infiniband/hw/mlx4/main.c | 10
drivers/infiniband/hw/mlx4/mlx4_ib.h | 8
drivers/infiniband/hw/mlx5/main.c | 7
drivers/infiniband/hw/mthca/mthca_provider.c | 7
drivers/infiniband/hw/nes/nes_verbs.c | 6
drivers/infiniband/hw/ocrdma/ocrdma_main.c | 1
drivers/infiniband/hw/ocrdma/ocrdma_verbs.c | 6
drivers/infiniband/hw/ocrdma/ocrdma_verbs.h | 3
drivers/infiniband/hw/qib/qib_verbs.c | 7
drivers/infiniband/hw/usnic/usnic_ib_main.c | 1
drivers/infiniband/hw/usnic/usnic_ib_verbs.c | 6
drivers/infiniband/hw/usnic/usnic_ib_verbs.h | 2
drivers/infiniband/ulp/ipoib/ipoib_main.c | 17 -
include/rdma/ib_verbs.h | 163 ++++++++++++++-
net/sunrpc/xprtrdma/svc_rdma_recvfrom.c | 4
net/sunrpc/xprtrdma/svc_rdma_transport.c | 12 -
36 files changed, 490 insertions(+), 294 deletions(-)


2015-04-07 12:28:22

by Michael Wang

Subject: [PATCH 01/17] IB/Verbs: Implement new callback query_transport() for each HW


Add the new callback query_transport() and implement it for each HW.

Mapping List:
driver      node-type   link-layer  old-transport   new-transport
nes         RNIC        ETH         IWARP           IWARP
amso1100    RNIC        ETH         IWARP           IWARP
cxgb3       RNIC        ETH         IWARP           IWARP
cxgb4       RNIC        ETH         IWARP           IWARP
usnic       USNIC_UDP   ETH         USNIC_UDP       USNIC_UDP
ocrdma      IB_CA       ETH         IB              IBOE
mlx4        IB_CA       IB/ETH      IB              IB/IBOE
mlx5        IB_CA       IB          IB              IB
ehca        IB_CA       IB          IB              IB
ipath       IB_CA       IB          IB              IB
mthca       IB_CA       IB          IB              IB
qib         IB_CA       IB          IB              IB
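
As a rough user-space illustration of the callback pattern (not the kernel
code itself), the sketch below models a device with a per-port
query_transport() function pointer and a mandatory-callback check. The
demo_* names, the port_is_eth[] field and the two-port limit are assumptions
made only for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for enum rdma_transport_type. */
enum demo_transport {
	DEMO_TRANSPORT_IB,
	DEMO_TRANSPORT_IWARP,
	DEMO_TRANSPORT_IBOE
};

/* Hypothetical stand-in for struct ib_device. */
struct demo_device {
	/* Indexed by port number 1..2; slot 0 is unused. */
	unsigned char port_is_eth[3];
	enum demo_transport (*query_transport)(struct demo_device *dev,
					       unsigned char port_num);
};

/* Single-transport driver: every port reports the same value. */
static enum demo_transport
demo_iwarp_query_transport(struct demo_device *dev, unsigned char port_num)
{
	(void)dev;
	(void)port_num;
	return DEMO_TRANSPORT_IWARP;
}

/* Mixed driver: the answer depends on how each port is configured. */
static enum demo_transport
demo_mixed_query_transport(struct demo_device *dev, unsigned char port_num)
{
	assert(port_num >= 1 && port_num <= 2);
	return dev->port_is_eth[port_num] ? DEMO_TRANSPORT_IBOE
					  : DEMO_TRANSPORT_IB;
}

/* Mandatory check: 0 if the callback is present, -1 otherwise. */
static int demo_check_mandatory(const struct demo_device *dev)
{
	return dev->query_transport ? 0 : -1;
}
```

A device whose port 2 is configured for Ethernet would report IB for port 1
and IBOE for port 2, which is the per-port behaviour the mlx4 row of the
mapping list describes.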

Cc: Jason Gunthorpe <[email protected]>
Cc: Doug Ledford <[email protected]>
Cc: Ira Weiny <[email protected]>
Cc: Sean Hefty <[email protected]>
Signed-off-by: Michael Wang <[email protected]>
---
drivers/infiniband/core/device.c | 1 +
drivers/infiniband/core/verbs.c | 4 +++-
drivers/infiniband/hw/amso1100/c2_provider.c | 7 +++++++
drivers/infiniband/hw/cxgb3/iwch_provider.c | 7 +++++++
drivers/infiniband/hw/cxgb4/provider.c | 7 +++++++
drivers/infiniband/hw/ehca/ehca_hca.c | 6 ++++++
drivers/infiniband/hw/ehca/ehca_iverbs.h | 3 +++
drivers/infiniband/hw/ehca/ehca_main.c | 1 +
drivers/infiniband/hw/ipath/ipath_verbs.c | 7 +++++++
drivers/infiniband/hw/mlx4/main.c | 10 ++++++++++
drivers/infiniband/hw/mlx5/main.c | 7 +++++++
drivers/infiniband/hw/mthca/mthca_provider.c | 7 +++++++
drivers/infiniband/hw/nes/nes_verbs.c | 6 ++++++
drivers/infiniband/hw/ocrdma/ocrdma_main.c | 1 +
drivers/infiniband/hw/ocrdma/ocrdma_verbs.c | 6 ++++++
drivers/infiniband/hw/ocrdma/ocrdma_verbs.h | 3 +++
drivers/infiniband/hw/qib/qib_verbs.c | 7 +++++++
drivers/infiniband/hw/usnic/usnic_ib_main.c | 1 +
drivers/infiniband/hw/usnic/usnic_ib_verbs.c | 6 ++++++
drivers/infiniband/hw/usnic/usnic_ib_verbs.h | 2 ++
include/rdma/ib_verbs.h | 7 ++++++-
21 files changed, 104 insertions(+), 2 deletions(-)

diff --git a/drivers/infiniband/core/device.c b/drivers/infiniband/core/device.c
index 18c1ece..a9587c4 100644
--- a/drivers/infiniband/core/device.c
+++ b/drivers/infiniband/core/device.c
@@ -76,6 +76,7 @@ static int ib_device_check_mandatory(struct ib_device *device)
} mandatory_table[] = {
IB_MANDATORY_FUNC(query_device),
IB_MANDATORY_FUNC(query_port),
+ IB_MANDATORY_FUNC(query_transport),
IB_MANDATORY_FUNC(query_pkey),
IB_MANDATORY_FUNC(query_gid),
IB_MANDATORY_FUNC(alloc_pd),
diff --git a/drivers/infiniband/core/verbs.c b/drivers/infiniband/core/verbs.c
index f93eb8d..83370de 100644
--- a/drivers/infiniband/core/verbs.c
+++ b/drivers/infiniband/core/verbs.c
@@ -133,14 +133,16 @@ enum rdma_link_layer rdma_port_get_link_layer(struct ib_device *device, u8 port_
if (device->get_link_layer)
return device->get_link_layer(device, port_num);

- switch (rdma_node_get_transport(device->node_type)) {
+ switch (device->query_transport(device, port_num)) {
case RDMA_TRANSPORT_IB:
+ case RDMA_TRANSPORT_IBOE:
return IB_LINK_LAYER_INFINIBAND;
case RDMA_TRANSPORT_IWARP:
case RDMA_TRANSPORT_USNIC:
case RDMA_TRANSPORT_USNIC_UDP:
return IB_LINK_LAYER_ETHERNET;
default:
+ BUG();
return IB_LINK_LAYER_UNSPECIFIED;
}
}
diff --git a/drivers/infiniband/hw/amso1100/c2_provider.c b/drivers/infiniband/hw/amso1100/c2_provider.c
index bdf3507..d46bbb0 100644
--- a/drivers/infiniband/hw/amso1100/c2_provider.c
+++ b/drivers/infiniband/hw/amso1100/c2_provider.c
@@ -99,6 +99,12 @@ static int c2_query_port(struct ib_device *ibdev,
return 0;
}

+static enum rdma_transport_type
+c2_query_transport(struct ib_device *device, u8 port_num)
+{
+ return RDMA_TRANSPORT_IWARP;
+}
+
static int c2_query_pkey(struct ib_device *ibdev,
u8 port, u16 index, u16 * pkey)
{
@@ -801,6 +807,7 @@ int c2_register_device(struct c2_dev *dev)
dev->ibdev.dma_device = &dev->pcidev->dev;
dev->ibdev.query_device = c2_query_device;
dev->ibdev.query_port = c2_query_port;
+ dev->ibdev.query_transport = c2_query_transport;
dev->ibdev.query_pkey = c2_query_pkey;
dev->ibdev.query_gid = c2_query_gid;
dev->ibdev.alloc_ucontext = c2_alloc_ucontext;
diff --git a/drivers/infiniband/hw/cxgb3/iwch_provider.c b/drivers/infiniband/hw/cxgb3/iwch_provider.c
index 811b24a..09682e9e 100644
--- a/drivers/infiniband/hw/cxgb3/iwch_provider.c
+++ b/drivers/infiniband/hw/cxgb3/iwch_provider.c
@@ -1232,6 +1232,12 @@ static int iwch_query_port(struct ib_device *ibdev,
return 0;
}

+static enum rdma_transport_type
+iwch_query_transport(struct ib_device *device, u8 port_num)
+{
+ return RDMA_TRANSPORT_IWARP;
+}
+
static ssize_t show_rev(struct device *dev, struct device_attribute *attr,
char *buf)
{
@@ -1385,6 +1391,7 @@ int iwch_register_device(struct iwch_dev *dev)
dev->ibdev.dma_device = &(dev->rdev.rnic_info.pdev->dev);
dev->ibdev.query_device = iwch_query_device;
dev->ibdev.query_port = iwch_query_port;
+ dev->ibdev.query_transport = iwch_query_transport;
dev->ibdev.query_pkey = iwch_query_pkey;
dev->ibdev.query_gid = iwch_query_gid;
dev->ibdev.alloc_ucontext = iwch_alloc_ucontext;
diff --git a/drivers/infiniband/hw/cxgb4/provider.c b/drivers/infiniband/hw/cxgb4/provider.c
index 66bd6a2..a445e0d 100644
--- a/drivers/infiniband/hw/cxgb4/provider.c
+++ b/drivers/infiniband/hw/cxgb4/provider.c
@@ -390,6 +390,12 @@ static int c4iw_query_port(struct ib_device *ibdev, u8 port,
return 0;
}

+static enum rdma_transport_type
+c4iw_query_transport(struct ib_device *device, u8 port_num)
+{
+ return RDMA_TRANSPORT_IWARP;
+}
+
static ssize_t show_rev(struct device *dev, struct device_attribute *attr,
char *buf)
{
@@ -506,6 +512,7 @@ int c4iw_register_device(struct c4iw_dev *dev)
dev->ibdev.dma_device = &(dev->rdev.lldi.pdev->dev);
dev->ibdev.query_device = c4iw_query_device;
dev->ibdev.query_port = c4iw_query_port;
+ dev->ibdev.query_transport = c4iw_query_transport;
dev->ibdev.query_pkey = c4iw_query_pkey;
dev->ibdev.query_gid = c4iw_query_gid;
dev->ibdev.alloc_ucontext = c4iw_alloc_ucontext;
diff --git a/drivers/infiniband/hw/ehca/ehca_hca.c b/drivers/infiniband/hw/ehca/ehca_hca.c
index 9ed4d25..d5a34a6 100644
--- a/drivers/infiniband/hw/ehca/ehca_hca.c
+++ b/drivers/infiniband/hw/ehca/ehca_hca.c
@@ -242,6 +242,12 @@ query_port1:
return ret;
}

+enum rdma_transport_type
+ehca_query_transport(struct ib_device *device, u8 port_num)
+{
+ return RDMA_TRANSPORT_IB;
+}
+
int ehca_query_sma_attr(struct ehca_shca *shca,
u8 port, struct ehca_sma_attr *attr)
{
diff --git a/drivers/infiniband/hw/ehca/ehca_iverbs.h b/drivers/infiniband/hw/ehca/ehca_iverbs.h
index 22f79af..cec945f 100644
--- a/drivers/infiniband/hw/ehca/ehca_iverbs.h
+++ b/drivers/infiniband/hw/ehca/ehca_iverbs.h
@@ -49,6 +49,9 @@ int ehca_query_device(struct ib_device *ibdev, struct ib_device_attr *props);
int ehca_query_port(struct ib_device *ibdev, u8 port,
struct ib_port_attr *props);

+enum rdma_transport_type
+ehca_query_transport(struct ib_device *device, u8 port_num);
+
int ehca_query_sma_attr(struct ehca_shca *shca, u8 port,
struct ehca_sma_attr *attr);

diff --git a/drivers/infiniband/hw/ehca/ehca_main.c b/drivers/infiniband/hw/ehca/ehca_main.c
index cd8d290..60e0a09 100644
--- a/drivers/infiniband/hw/ehca/ehca_main.c
+++ b/drivers/infiniband/hw/ehca/ehca_main.c
@@ -467,6 +467,7 @@ static int ehca_init_device(struct ehca_shca *shca)
shca->ib_device.dma_device = &shca->ofdev->dev;
shca->ib_device.query_device = ehca_query_device;
shca->ib_device.query_port = ehca_query_port;
+ shca->ib_device.query_transport = ehca_query_transport;
shca->ib_device.query_gid = ehca_query_gid;
shca->ib_device.query_pkey = ehca_query_pkey;
/* shca->in_device.modify_device = ehca_modify_device */
diff --git a/drivers/infiniband/hw/ipath/ipath_verbs.c b/drivers/infiniband/hw/ipath/ipath_verbs.c
index 44ea939..58d36e3 100644
--- a/drivers/infiniband/hw/ipath/ipath_verbs.c
+++ b/drivers/infiniband/hw/ipath/ipath_verbs.c
@@ -1638,6 +1638,12 @@ static int ipath_query_port(struct ib_device *ibdev,
return 0;
}

+static enum rdma_transport_type
+ipath_query_transport(struct ib_device *device, u8 port_num)
+{
+ return RDMA_TRANSPORT_IB;
+}
+
static int ipath_modify_device(struct ib_device *device,
int device_modify_mask,
struct ib_device_modify *device_modify)
@@ -2140,6 +2146,7 @@ int ipath_register_ib_device(struct ipath_devdata *dd)
dev->query_device = ipath_query_device;
dev->modify_device = ipath_modify_device;
dev->query_port = ipath_query_port;
+ dev->query_transport = ipath_query_transport;
dev->modify_port = ipath_modify_port;
dev->query_pkey = ipath_query_pkey;
dev->query_gid = ipath_query_gid;
diff --git a/drivers/infiniband/hw/mlx4/main.c b/drivers/infiniband/hw/mlx4/main.c
index 0b280b1..28100bd 100644
--- a/drivers/infiniband/hw/mlx4/main.c
+++ b/drivers/infiniband/hw/mlx4/main.c
@@ -413,6 +413,15 @@ static int mlx4_ib_query_port(struct ib_device *ibdev, u8 port,
return __mlx4_ib_query_port(ibdev, port, props, 0);
}

+static enum rdma_transport_type
+mlx4_ib_query_transport(struct ib_device *device, u8 port_num)
+{
+ struct mlx4_dev *dev = to_mdev(device)->dev;
+
+ return dev->caps.port_mask[port_num] == MLX4_PORT_TYPE_IB ?
+ RDMA_TRANSPORT_IB : RDMA_TRANSPORT_IBOE;
+}
+
int __mlx4_ib_query_gid(struct ib_device *ibdev, u8 port, int index,
union ib_gid *gid, int netw_view)
{
@@ -2121,6 +2130,7 @@ static void *mlx4_ib_add(struct mlx4_dev *dev)

ibdev->ib_dev.query_device = mlx4_ib_query_device;
ibdev->ib_dev.query_port = mlx4_ib_query_port;
+ ibdev->ib_dev.query_transport = mlx4_ib_query_transport;
ibdev->ib_dev.get_link_layer = mlx4_ib_port_link_layer;
ibdev->ib_dev.query_gid = mlx4_ib_query_gid;
ibdev->ib_dev.query_pkey = mlx4_ib_query_pkey;
diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c
index cc4ac1e..209c796 100644
--- a/drivers/infiniband/hw/mlx5/main.c
+++ b/drivers/infiniband/hw/mlx5/main.c
@@ -351,6 +351,12 @@ out:
return err;
}

+static enum rdma_transport_type
+mlx5_ib_query_transport(struct ib_device *device, u8 port_num)
+{
+ return RDMA_TRANSPORT_IB;
+}
+
static int mlx5_ib_query_gid(struct ib_device *ibdev, u8 port, int index,
union ib_gid *gid)
{
@@ -1336,6 +1342,7 @@ static void *mlx5_ib_add(struct mlx5_core_dev *mdev)

dev->ib_dev.query_device = mlx5_ib_query_device;
dev->ib_dev.query_port = mlx5_ib_query_port;
+ dev->ib_dev.query_transport = mlx5_ib_query_transport;
dev->ib_dev.query_gid = mlx5_ib_query_gid;
dev->ib_dev.query_pkey = mlx5_ib_query_pkey;
dev->ib_dev.modify_device = mlx5_ib_modify_device;
diff --git a/drivers/infiniband/hw/mthca/mthca_provider.c b/drivers/infiniband/hw/mthca/mthca_provider.c
index 415f8e1..67ac6a4 100644
--- a/drivers/infiniband/hw/mthca/mthca_provider.c
+++ b/drivers/infiniband/hw/mthca/mthca_provider.c
@@ -179,6 +179,12 @@ static int mthca_query_port(struct ib_device *ibdev,
return err;
}

+static enum rdma_transport_type
+mthca_query_transport(struct ib_device *device, u8 port_num)
+{
+ return RDMA_TRANSPORT_IB;
+}
+
static int mthca_modify_device(struct ib_device *ibdev,
int mask,
struct ib_device_modify *props)
@@ -1281,6 +1287,7 @@ int mthca_register_device(struct mthca_dev *dev)
dev->ib_dev.dma_device = &dev->pdev->dev;
dev->ib_dev.query_device = mthca_query_device;
dev->ib_dev.query_port = mthca_query_port;
+ dev->ib_dev.query_transport = mthca_query_transport;
dev->ib_dev.modify_device = mthca_modify_device;
dev->ib_dev.modify_port = mthca_modify_port;
dev->ib_dev.query_pkey = mthca_query_pkey;
diff --git a/drivers/infiniband/hw/nes/nes_verbs.c b/drivers/infiniband/hw/nes/nes_verbs.c
index c0d0296..8df5b61 100644
--- a/drivers/infiniband/hw/nes/nes_verbs.c
+++ b/drivers/infiniband/hw/nes/nes_verbs.c
@@ -606,6 +606,11 @@ static int nes_query_port(struct ib_device *ibdev, u8 port, struct ib_port_attr
return 0;
}

+static enum rdma_transport_type
+nes_query_transport(struct ib_device *device, u8 port_num)
+{
+ return RDMA_TRANSPORT_IWARP;
+}

/**
* nes_query_pkey
@@ -3879,6 +3884,7 @@ struct nes_ib_device *nes_init_ofa_device(struct net_device *netdev)
nesibdev->ibdev.dev.parent = &nesdev->pcidev->dev;
nesibdev->ibdev.query_device = nes_query_device;
nesibdev->ibdev.query_port = nes_query_port;
+ nesibdev->ibdev.query_transport = nes_query_transport;
nesibdev->ibdev.query_pkey = nes_query_pkey;
nesibdev->ibdev.query_gid = nes_query_gid;
nesibdev->ibdev.alloc_ucontext = nes_alloc_ucontext;
diff --git a/drivers/infiniband/hw/ocrdma/ocrdma_main.c b/drivers/infiniband/hw/ocrdma/ocrdma_main.c
index 7a2b59a..9f4d182 100644
--- a/drivers/infiniband/hw/ocrdma/ocrdma_main.c
+++ b/drivers/infiniband/hw/ocrdma/ocrdma_main.c
@@ -244,6 +244,7 @@ static int ocrdma_register_device(struct ocrdma_dev *dev)
/* mandatory verbs. */
dev->ibdev.query_device = ocrdma_query_device;
dev->ibdev.query_port = ocrdma_query_port;
+ dev->ibdev.query_transport = ocrdma_query_transport;
dev->ibdev.modify_port = ocrdma_modify_port;
dev->ibdev.query_gid = ocrdma_query_gid;
dev->ibdev.get_link_layer = ocrdma_link_layer;
diff --git a/drivers/infiniband/hw/ocrdma/ocrdma_verbs.c b/drivers/infiniband/hw/ocrdma/ocrdma_verbs.c
index 8771755..73bace4 100644
--- a/drivers/infiniband/hw/ocrdma/ocrdma_verbs.c
+++ b/drivers/infiniband/hw/ocrdma/ocrdma_verbs.c
@@ -187,6 +187,12 @@ int ocrdma_query_port(struct ib_device *ibdev,
return 0;
}

+enum rdma_transport_type
+ocrdma_query_transport(struct ib_device *device, u8 port_num)
+{
+ return RDMA_TRANSPORT_IBOE;
+}
+
int ocrdma_modify_port(struct ib_device *ibdev, u8 port, int mask,
struct ib_port_modify *props)
{
diff --git a/drivers/infiniband/hw/ocrdma/ocrdma_verbs.h b/drivers/infiniband/hw/ocrdma/ocrdma_verbs.h
index b8f7853..4a81b63 100644
--- a/drivers/infiniband/hw/ocrdma/ocrdma_verbs.h
+++ b/drivers/infiniband/hw/ocrdma/ocrdma_verbs.h
@@ -41,6 +41,9 @@ int ocrdma_query_port(struct ib_device *, u8 port, struct ib_port_attr *props);
int ocrdma_modify_port(struct ib_device *, u8 port, int mask,
struct ib_port_modify *props);

+enum rdma_transport_type
+ocrdma_query_transport(struct ib_device *device, u8 port_num);
+
void ocrdma_get_guid(struct ocrdma_dev *, u8 *guid);
int ocrdma_query_gid(struct ib_device *, u8 port,
int index, union ib_gid *gid);
diff --git a/drivers/infiniband/hw/qib/qib_verbs.c b/drivers/infiniband/hw/qib/qib_verbs.c
index 4a35998..caad665 100644
--- a/drivers/infiniband/hw/qib/qib_verbs.c
+++ b/drivers/infiniband/hw/qib/qib_verbs.c
@@ -1650,6 +1650,12 @@ static int qib_query_port(struct ib_device *ibdev, u8 port,
return 0;
}

+static enum rdma_transport_type
+qib_query_transport(struct ib_device *device, u8 port_num)
+{
+ return RDMA_TRANSPORT_IB;
+}
+
static int qib_modify_device(struct ib_device *device,
int device_modify_mask,
struct ib_device_modify *device_modify)
@@ -2184,6 +2190,7 @@ int qib_register_ib_device(struct qib_devdata *dd)
ibdev->query_device = qib_query_device;
ibdev->modify_device = qib_modify_device;
ibdev->query_port = qib_query_port;
+ ibdev->query_transport = qib_query_transport;
ibdev->modify_port = qib_modify_port;
ibdev->query_pkey = qib_query_pkey;
ibdev->query_gid = qib_query_gid;
diff --git a/drivers/infiniband/hw/usnic/usnic_ib_main.c b/drivers/infiniband/hw/usnic/usnic_ib_main.c
index 0d0f986..03ea9f3 100644
--- a/drivers/infiniband/hw/usnic/usnic_ib_main.c
+++ b/drivers/infiniband/hw/usnic/usnic_ib_main.c
@@ -360,6 +360,7 @@ static void *usnic_ib_device_add(struct pci_dev *dev)

us_ibdev->ib_dev.query_device = usnic_ib_query_device;
us_ibdev->ib_dev.query_port = usnic_ib_query_port;
+ us_ibdev->ib_dev.query_transport = usnic_ib_query_transport;
us_ibdev->ib_dev.query_pkey = usnic_ib_query_pkey;
us_ibdev->ib_dev.query_gid = usnic_ib_query_gid;
us_ibdev->ib_dev.get_link_layer = usnic_ib_port_link_layer;
diff --git a/drivers/infiniband/hw/usnic/usnic_ib_verbs.c b/drivers/infiniband/hw/usnic/usnic_ib_verbs.c
index 53bd6a2..ff9a5f7 100644
--- a/drivers/infiniband/hw/usnic/usnic_ib_verbs.c
+++ b/drivers/infiniband/hw/usnic/usnic_ib_verbs.c
@@ -348,6 +348,12 @@ int usnic_ib_query_port(struct ib_device *ibdev, u8 port,
return 0;
}

+enum rdma_transport_type
+usnic_ib_query_transport(struct ib_device *device, u8 port_num)
+{
+ return RDMA_TRANSPORT_USNIC_UDP;
+}
+
int usnic_ib_query_qp(struct ib_qp *qp, struct ib_qp_attr *qp_attr,
int qp_attr_mask,
struct ib_qp_init_attr *qp_init_attr)
diff --git a/drivers/infiniband/hw/usnic/usnic_ib_verbs.h b/drivers/infiniband/hw/usnic/usnic_ib_verbs.h
index bb864f5..0b1633b 100644
--- a/drivers/infiniband/hw/usnic/usnic_ib_verbs.h
+++ b/drivers/infiniband/hw/usnic/usnic_ib_verbs.h
@@ -27,6 +27,8 @@ int usnic_ib_query_device(struct ib_device *ibdev,
struct ib_device_attr *props);
int usnic_ib_query_port(struct ib_device *ibdev, u8 port,
struct ib_port_attr *props);
+enum rdma_transport_type
+usnic_ib_query_transport(struct ib_device *device, u8 port_num);
int usnic_ib_query_qp(struct ib_qp *qp, struct ib_qp_attr *qp_attr,
int qp_attr_mask,
struct ib_qp_init_attr *qp_init_attr);
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 65994a1..d54f91e 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -75,10 +75,13 @@ enum rdma_node_type {
};

enum rdma_transport_type {
+ /* legacy for users */
RDMA_TRANSPORT_IB,
RDMA_TRANSPORT_IWARP,
RDMA_TRANSPORT_USNIC,
- RDMA_TRANSPORT_USNIC_UDP
+ RDMA_TRANSPORT_USNIC_UDP,
+ /* new transport */
+ RDMA_TRANSPORT_IBOE,
};

__attribute_const__ enum rdma_transport_type
@@ -1501,6 +1504,8 @@ struct ib_device {
int (*query_port)(struct ib_device *device,
u8 port_num,
struct ib_port_attr *port_attr);
+ enum rdma_transport_type (*query_transport)(struct ib_device *device,
+ u8 port_num);
enum rdma_link_layer (*get_link_layer)(struct ib_device *device,
u8 port_num);
int (*query_gid)(struct ib_device *device,
--
2.1.0

2015-04-07 12:29:32

by Michael Wang

Subject: [PATCH v2 02/17] IB/Verbs: Implement raw management helpers


Add raw helpers:
	rdma_transport_ib
	rdma_transport_iboe
	rdma_transport_iwarp
	rdma_ib_mgmt
to help us check the transport type.
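
The four helpers can be pictured with the following user-space sketch. It
assumes a simplified device that stores one transport value per port in an
array instead of calling a driver callback; every demo_* name is
hypothetical and only mirrors the shape of the helpers named above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for enum rdma_transport_type. */
enum demo_transport {
	DEMO_TRANSPORT_IB,
	DEMO_TRANSPORT_IWARP,
	DEMO_TRANSPORT_USNIC_UDP,
	DEMO_TRANSPORT_IBOE
};

/* Simplified device: transport stored per port, slot 0 unused. */
struct demo_device {
	int nports;
	enum demo_transport port_transport[4];
};

static enum demo_transport
demo_query_transport(const struct demo_device *dev, int port_num)
{
	assert(port_num >= 1 && port_num <= dev->nports);
	return dev->port_transport[port_num];
}

static bool demo_transport_ib(const struct demo_device *dev, int port_num)
{
	return demo_query_transport(dev, port_num) == DEMO_TRANSPORT_IB;
}

static bool demo_transport_iboe(const struct demo_device *dev, int port_num)
{
	return demo_query_transport(dev, port_num) == DEMO_TRANSPORT_IBOE;
}

static bool demo_transport_iwarp(const struct demo_device *dev, int port_num)
{
	return demo_query_transport(dev, port_num) == DEMO_TRANSPORT_IWARP;
}

/* IB management applies to both native IB and IBOE ports. */
static bool demo_ib_mgmt(const struct demo_device *dev, int port_num)
{
	enum demo_transport tp = demo_query_transport(dev, port_num);

	return tp == DEMO_TRANSPORT_IB || tp == DEMO_TRANSPORT_IBOE;
}
```

Grouping IB and IBOE under one predicate is the piece the later cap_*()
helpers can build on, while the three single-transport predicates stay
available for checks that need to tell them apart.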

Cc: Jason Gunthorpe <[email protected]>
Cc: Doug Ledford <[email protected]>
Cc: Ira Weiny <[email protected]>
Cc: Sean Hefty <[email protected]>
Signed-off-by: Michael Wang <[email protected]>
---
include/rdma/ib_verbs.h | 25 +++++++++++++++++++++++++
1 file changed, 25 insertions(+)

diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index d54f91e..780b3b7 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -1748,6 +1748,31 @@ int ib_query_port(struct ib_device *device,
enum rdma_link_layer rdma_port_get_link_layer(struct ib_device *device,
u8 port_num);

+static inline int rdma_transport_ib(struct ib_device *device, u8 port_num)
+{
+ return device->query_transport(device, port_num)
+ == RDMA_TRANSPORT_IB;
+}
+
+static inline int rdma_transport_iboe(struct ib_device *device, u8 port_num)
+{
+ return device->query_transport(device, port_num)
+ == RDMA_TRANSPORT_IBOE;
+}
+
+static inline int rdma_transport_iwarp(struct ib_device *device, u8 port_num)
+{
+ return device->query_transport(device, port_num)
+ == RDMA_TRANSPORT_IWARP;
+}
+
+static inline int rdma_ib_mgmt(struct ib_device *device, u8 port_num)
+{
+ enum rdma_transport_type tp = device->query_transport(device, port_num);
+
+ return (tp == RDMA_TRANSPORT_IB || tp == RDMA_TRANSPORT_IBOE);
+}
+
int ib_query_gid(struct ib_device *device,
u8 port_num, int index, union ib_gid *gid);

--
2.1.0

2015-04-07 12:30:27

by Michael Wang

Subject: [PATCH v2 03/17] IB/Verbs: Use management helper cap_ib_mad() for mad-check


Introduce helper cap_ib_mad() to help us check if the port of an
IB device supports Infiniband Management Datagrams.

Reform ib_umad_add_one() to better fit the per-port check method.
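
The per-port method can be sketched in user space as follows. This is only
an approximation of the control flow: ports without the MAD capability are
skipped, and if no port qualifies the per-device state is released again.
The demo_* names, the cap_mad[] input array and DEMO_MAX_PORTS are
assumptions made for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

#define DEMO_MAX_PORTS 8

/* Hypothetical stand-in for struct ib_umad_device. */
struct demo_umad_dev {
	int start_port;
	int end_port;
	bool port_open[DEMO_MAX_PORTS];	/* indexed by (port - start_port) */
};

/*
 * cap_mad[] is indexed by port number and says whether that port supports
 * MADs. Returns the per-device state, or NULL when no port qualifies
 * (or when allocation fails).
 */
static struct demo_umad_dev *
demo_umad_add_one(const bool *cap_mad, int s, int e)
{
	struct demo_umad_dev *u;
	int i, count = 0;

	assert(s <= e && e - s < DEMO_MAX_PORTS);

	u = calloc(1, sizeof(*u));
	if (!u)
		return NULL;

	u->start_port = s;
	u->end_port = e;

	for (i = s; i <= e; ++i) {
		if (!cap_mad[i])
			continue;	/* skip ports without MAD support */

		u->port_open[i - s] = true;
		count++;
	}

	if (!count) {
		/* Nothing to manage on this device: drop the state again. */
		free(u);
		return NULL;
	}

	return u;
}
```

Note that the port array is indexed by the offset from start_port, while the
capability is looked up by port number; keeping those two index spaces apart
matters in the teardown paths as well.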

Cc: Jason Gunthorpe <[email protected]>
Cc: Doug Ledford <[email protected]>
Cc: Ira Weiny <[email protected]>
Cc: Sean Hefty <[email protected]>
Signed-off-by: Michael Wang <[email protected]>
---
drivers/infiniband/core/mad.c | 18 +++++++++---------
drivers/infiniband/core/user_mad.c | 26 ++++++++++++++++++++------
include/rdma/ib_verbs.h | 15 +++++++++++++++
3 files changed, 44 insertions(+), 15 deletions(-)

diff --git a/drivers/infiniband/core/mad.c b/drivers/infiniband/core/mad.c
index 74c30f4..ef0c0c5 100644
--- a/drivers/infiniband/core/mad.c
+++ b/drivers/infiniband/core/mad.c
@@ -3057,9 +3057,6 @@ static void ib_mad_init_device(struct ib_device *device)
{
int start, end, i;

- if (rdma_node_get_transport(device->node_type) != RDMA_TRANSPORT_IB)
- return;
-
if (device->node_type == RDMA_NODE_IB_SWITCH) {
start = 0;
end = 0;
@@ -3069,6 +3066,9 @@ static void ib_mad_init_device(struct ib_device *device)
}

for (i = start; i <= end; i++) {
+ if (!cap_ib_mad(device, i))
+ continue;
+
if (ib_mad_port_open(device, i)) {
dev_err(&device->dev, "Couldn't open port %d\n", i);
goto error;
@@ -3086,15 +3086,15 @@ error_agent:
dev_err(&device->dev, "Couldn't close port %d\n", i);

error:
- i--;
+ while (--i >= start) {
+ if (!cap_ib_mad(device, i))
+ continue;

- while (i >= start) {
if (ib_agent_port_close(device, i))
dev_err(&device->dev,
"Couldn't close port %d for agents\n", i);
if (ib_mad_port_close(device, i))
dev_err(&device->dev, "Couldn't close port %d\n", i);
- i--;
}
}

@@ -3102,9 +3102,6 @@ static void ib_mad_remove_device(struct ib_device *device)
{
int i, num_ports, cur_port;

- if (rdma_node_get_transport(device->node_type) != RDMA_TRANSPORT_IB)
- return;
-
if (device->node_type == RDMA_NODE_IB_SWITCH) {
num_ports = 1;
cur_port = 0;
@@ -3113,6 +3110,9 @@ static void ib_mad_remove_device(struct ib_device *device)
cur_port = 1;
}
for (i = 0; i < num_ports; i++, cur_port++) {
+ if (!cap_ib_mad(device, cur_port))
+ continue;
+
if (ib_agent_port_close(device, cur_port))
dev_err(&device->dev,
"Couldn't close port %d for agents\n",
diff --git a/drivers/infiniband/core/user_mad.c b/drivers/infiniband/core/user_mad.c
index 928cdd2..b52884b 100644
--- a/drivers/infiniband/core/user_mad.c
+++ b/drivers/infiniband/core/user_mad.c
@@ -1273,9 +1273,7 @@ static void ib_umad_add_one(struct ib_device *device)
{
struct ib_umad_device *umad_dev;
int s, e, i;
-
- if (rdma_node_get_transport(device->node_type) != RDMA_TRANSPORT_IB)
- return;
+ int count = 0;

if (device->node_type == RDMA_NODE_IB_SWITCH)
s = e = 0;
@@ -1296,11 +1294,21 @@ static void ib_umad_add_one(struct ib_device *device)
umad_dev->end_port = e;

for (i = s; i <= e; ++i) {
+ if (!cap_ib_mad(device, i))
+ continue;
+
umad_dev->port[i - s].umad_dev = umad_dev;

if (ib_umad_init_port(device, i, umad_dev,
&umad_dev->port[i - s]))
goto err;
+
+ count++;
+ }
+
+ if (!count) {
+ kobject_put(&umad_dev->kobj);
+ return;
}

ib_set_client_data(device, &umad_client, umad_dev);
@@ -1308,8 +1316,12 @@ static void ib_umad_add_one(struct ib_device *device)
return;

err:
- while (--i >= s)
+ while (--i >= s) {
+ if (!cap_ib_mad(device, i))
+ continue;
+
ib_umad_kill_port(&umad_dev->port[i - s]);
+ }

kobject_put(&umad_dev->kobj);
}
@@ -1322,8 +1334,10 @@ static void ib_umad_remove_one(struct ib_device *device)
if (!umad_dev)
return;

- for (i = 0; i <= umad_dev->end_port - umad_dev->start_port; ++i)
- ib_umad_kill_port(&umad_dev->port[i]);
+ for (i = 0; i <= umad_dev->end_port - umad_dev->start_port; ++i) {
+ if (cap_ib_mad(device, i + umad_dev->start_port))
+ ib_umad_kill_port(&umad_dev->port[i]);
+ }

kobject_put(&umad_dev->kobj);
}
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 780b3b7..4013933 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -1773,6 +1773,21 @@ static inline int rdma_ib_mgmt(struct ib_device *device, u8 port_num)
return (tp == RDMA_TRANSPORT_IB || tp == RDMA_TRANSPORT_IBOE);
}

+/**
+ * cap_ib_mad - Check if the port of device has the capability Infiniband
+ * Management Datagrams.
+ *
+ * @device: Device to be checked
+ * @port_num: Port number of the device
+ *
+ * Return 0 when the port of the device doesn't support Infiniband
+ * Management Datagrams.
+ */
+static inline int cap_ib_mad(struct ib_device *device, u8 port_num)
+{
+ return rdma_ib_mgmt(device, port_num);
+}
+
int ib_query_gid(struct ib_device *device,
u8 port_num, int index, union ib_gid *gid);

--
2.1.0

2015-04-07 12:31:34

by Michael Wang

Subject: [PATCH v2 04/17] IB/Verbs: Use management helper cap_ib_smi() for smi-check


Introduce helper cap_ib_smi() to help us check if the port of an
IB device supports Infiniband Subnet Management Interface.
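
The difference between the MAD and SMI checks can be illustrated with a
small user-space sketch. It assumes, as the diffs in this series suggest,
that MAD support covers both IB and IBOE while SMI is limited to native IB,
and that the completion queue is doubled when SMI is present. The demo_*
names and the queue sizes used in any call are illustrative, not kernel
values.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for enum rdma_transport_type. */
enum demo_transport {
	DEMO_TRANSPORT_IB,
	DEMO_TRANSPORT_IWARP,
	DEMO_TRANSPORT_IBOE
};

/* MAD: available on native IB and on IBOE. */
static bool demo_cap_ib_mad(enum demo_transport tp)
{
	return tp == DEMO_TRANSPORT_IB || tp == DEMO_TRANSPORT_IBOE;
}

/* SMI: only on native IB. */
static bool demo_cap_ib_smi(enum demo_transport tp)
{
	return tp == DEMO_TRANSPORT_IB;
}

/* CQ sizing for a MAD port: doubled when the port also has an SMI QP. */
static int demo_mad_cq_size(enum demo_transport tp, int sendq, int recvq)
{
	int cq_size;

	/* Only MAD-capable ports are expected to be opened at all. */
	assert(demo_cap_ib_mad(tp));

	cq_size = sendq + recvq;
	if (demo_cap_ib_smi(tp))
		cq_size *= 2;

	return cq_size;
}
```

With send and receive queue sizes of 128 and 512, for instance, an IB port
would get a CQ of 1280 entries and an IBOE port 640, since the latter has no
SMI QP to serve.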

Cc: Jason Gunthorpe <[email protected]>
Cc: Doug Ledford <[email protected]>
Cc: Ira Weiny <[email protected]>
Cc: Sean Hefty <[email protected]>
Signed-off-by: Michael Wang <[email protected]>
---
drivers/infiniband/core/agent.c | 2 +-
drivers/infiniband/core/mad.c | 2 +-
include/rdma/ib_verbs.h | 15 +++++++++++++++
3 files changed, 17 insertions(+), 2 deletions(-)

diff --git a/drivers/infiniband/core/agent.c b/drivers/infiniband/core/agent.c
index f6d2961..61471ee 100644
--- a/drivers/infiniband/core/agent.c
+++ b/drivers/infiniband/core/agent.c
@@ -156,7 +156,7 @@ int ib_agent_port_open(struct ib_device *device, int port_num)
goto error1;
}

- if (rdma_port_get_link_layer(device, port_num) == IB_LINK_LAYER_INFINIBAND) {
+ if (cap_ib_smi(device, port_num)) {
/* Obtain send only MAD agent for SMI QP */
port_priv->agent[0] = ib_register_mad_agent(device, port_num,
IB_QPT_SMI, NULL, 0,
diff --git a/drivers/infiniband/core/mad.c b/drivers/infiniband/core/mad.c
index ef0c0c5..2668d4e 100644
--- a/drivers/infiniband/core/mad.c
+++ b/drivers/infiniband/core/mad.c
@@ -2938,7 +2938,7 @@ static int ib_mad_port_open(struct ib_device *device,
init_mad_qp(port_priv, &port_priv->qp_info[1]);

cq_size = mad_sendq_size + mad_recvq_size;
- has_smi = rdma_port_get_link_layer(device, port_num) == IB_LINK_LAYER_INFINIBAND;
+ has_smi = cap_ib_smi(device, port_num);
if (has_smi)
cq_size *= 2;

diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 4013933..ee76010 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -1788,6 +1788,21 @@ static inline int cap_ib_mad(struct ib_device *device, u8 port_num)
return rdma_ib_mgmt(device, port_num);
}

+/**
+ * cap_ib_smi - Check if the port of device has the capability Infiniband
+ * Subnet Management Interface.
+ *
+ * @device: Device to be checked
+ * @port_num: Port number of the device
+ *
+ * Return 0 when the port of the device doesn't support Infiniband
+ * Subnet Management Interface.
+ */
+static inline int cap_ib_smi(struct ib_device *device, u8 port_num)
+{
+ return rdma_transport_ib(device, port_num);
+}
+
int ib_query_gid(struct ib_device *device,
u8 port_num, int index, union ib_gid *gid);

--
2.1.0

2015-04-07 12:32:10

by Michael Wang

Subject: [PATCH v2 05/17] IB/Verbs: Use management helper cap_ib_cm() for cm-check


Introduce helper cap_ib_cm() to help us check if the port of an
IB device supports Infiniband Communication Manager.

Reform cm_add_one() to better fit the per-port check method.
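
One consequence of the per-port method is that the error path has to skip
the same ports that setup skipped. The user-space sketch below models that
with 1-based port numbers and a "while (--i)" unwind loop; the demo_* names,
the fail_at field used to inject a failure and the teardowns counter are
hypothetical and exist only for the illustration.

```c
#include <assert.h>
#include <stdbool.h>

#define DEMO_MAX_PORTS 8

/* Hypothetical stand-in for the per-device CM state. */
struct demo_cm_dev {
	int nports;
	bool cap_cm[DEMO_MAX_PORTS + 1];	/* indexed by port, 1..nports */
	bool active[DEMO_MAX_PORTS + 1];	/* ports currently set up */
	int fail_at;				/* port whose setup fails, 0 = none */
	int teardowns;				/* ports undone by the unwind */
};

/* Returns the number of ports set up, or -1 after unwinding a failure. */
static int demo_cm_add_one(struct demo_cm_dev *d)
{
	int i, count = 0;

	assert(d->nports >= 1 && d->nports <= DEMO_MAX_PORTS);

	for (i = 1; i <= d->nports; i++) {
		if (!d->cap_cm[i])
			continue;	/* port has no CM support: skip it */

		if (i == d->fail_at)
			goto unwind;	/* injected setup failure */

		d->active[i] = true;
		count++;
	}

	return count;

unwind:
	/* Walk back over earlier ports, skipping those never set up. */
	while (--i) {
		if (!d->cap_cm[i])
			continue;

		d->active[i] = false;
		d->teardowns++;
	}

	return -1;
}
```

A return value of 0 corresponds to the case where no port supports CM, in
which the caller would release the per-device state instead of registering
it.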

Cc: Jason Gunthorpe <[email protected]>
Cc: Doug Ledford <[email protected]>
Cc: Ira Weiny <[email protected]>
Cc: Sean Hefty <[email protected]>
Signed-off-by: Michael Wang <[email protected]>
---
drivers/infiniband/core/cm.c | 22 +++++++++++++++++++---
include/rdma/ib_verbs.h | 15 +++++++++++++++
2 files changed, 34 insertions(+), 3 deletions(-)

diff --git a/drivers/infiniband/core/cm.c b/drivers/infiniband/core/cm.c
index e28a494..63418ee 100644
--- a/drivers/infiniband/core/cm.c
+++ b/drivers/infiniband/core/cm.c
@@ -3761,9 +3761,7 @@ static void cm_add_one(struct ib_device *ib_device)
unsigned long flags;
int ret;
u8 i;
-
- if (rdma_node_get_transport(ib_device->node_type) != RDMA_TRANSPORT_IB)
- return;
+ int count = 0;

cm_dev = kzalloc(sizeof(*cm_dev) + sizeof(*port) *
ib_device->phys_port_cnt, GFP_KERNEL);
@@ -3783,6 +3781,9 @@ static void cm_add_one(struct ib_device *ib_device)

set_bit(IB_MGMT_METHOD_SEND, reg_req.method_mask);
for (i = 1; i <= ib_device->phys_port_cnt; i++) {
+ if (!cap_ib_cm(ib_device, i))
+ continue;
+
port = kzalloc(sizeof *port, GFP_KERNEL);
if (!port)
goto error1;
@@ -3809,7 +3810,16 @@ static void cm_add_one(struct ib_device *ib_device)
ret = ib_modify_port(ib_device, i, 0, &port_modify);
if (ret)
goto error3;
+
+ count++;
}
+
+ if (!count) {
+ device_unregister(cm_dev->device);
+ kfree(cm_dev);
+ return;
+ }
+
ib_set_client_data(ib_device, &cm_client, cm_dev);

write_lock_irqsave(&cm.device_lock, flags);
@@ -3825,6 +3835,9 @@ error1:
port_modify.set_port_cap_mask = 0;
port_modify.clr_port_cap_mask = IB_PORT_CM_SUP;
while (--i) {
+ if (!cap_ib_cm(ib_device, i))
+ continue;
+
port = cm_dev->port[i-1];
ib_modify_port(ib_device, port->port_num, 0, &port_modify);
ib_unregister_mad_agent(port->mad_agent);
@@ -3853,6 +3866,9 @@ static void cm_remove_one(struct ib_device *ib_device)
write_unlock_irqrestore(&cm.device_lock, flags);

for (i = 1; i <= ib_device->phys_port_cnt; i++) {
+ if (!cap_ib_cm(ib_device, i))
+ continue;
+
port = cm_dev->port[i-1];
ib_modify_port(ib_device, port->port_num, 0, &port_modify);
ib_unregister_mad_agent(port->mad_agent);
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index ee76010..3ba963f 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -1803,6 +1803,21 @@ static inline int cap_ib_smi(struct ib_device *device, u8 port_num)
return rdma_transport_ib(device, port_num);
}

+/**
+ * cap_ib_cm - Check if the port of device has the capability Infiniband
+ * Communication Manager.
+ *
+ * @device: Device to be checked
+ * @port_num: Port number of the device
+ *
+ * Return 0 when the port of the device doesn't support the Infiniband
+ * Communication Manager.
+ */
+static inline int cap_ib_cm(struct ib_device *device, u8 port_num)
+{
+ return rdma_ib_mgmt(device, port_num);
+}
+
int ib_query_gid(struct ib_device *device,
u8 port_num, int index, union ib_gid *gid);

--
2.1.0

2015-04-07 12:32:51

by Michael Wang

Subject: [PATCH v2 06/17] IB/Verbs: Use management helper cap_ib_sa() for sa-check


Introduce the helper cap_ib_sa() to check whether a port of an IB device
supports the InfiniBand Subnet Administrator (SA).

Rework ib_sa_add_one() to check each port individually; a device with no
SA-capable port is skipped.
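
A minimal user-space sketch of the setup-and-unwind shape in
ib_sa_add_one(), assuming a simplified port record. The names
mock_sa_port and mock_sa_add_one, and the fail_at parameter used to
simulate a registration failure, are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-port record; "capable" plays the role of cap_ib_sa(). */
struct mock_sa_port {
	bool capable;
	bool registered;
};

/*
 * Register every capable port. fail_at is the index of a port whose
 * registration is made to fail, or -1 for no failure.
 * Returns the number of registered ports, or -1 after unwinding.
 */
static int mock_sa_add_one(struct mock_sa_port *ports, int nports, int fail_at)
{
	int i, count = 0;

	assert(nports >= 0);

	for (i = 0; i < nports; i++) {
		if (!ports[i].capable)
			continue;
		if (i == fail_at)
			goto err;

		ports[i].registered = true;
		count++;
	}
	return count;

err:
	/* Undo only the ports that passed the same capability check. */
	while (--i >= 0) {
		if (ports[i].capable)
			ports[i].registered = false;
	}
	return -1;
}
```

The setup loop and the error path apply the same predicate, so a port that
was skipped during setup is never touched during cleanup.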

Cc: Jason Gunthorpe <[email protected]>
Cc: Doug Ledford <[email protected]>
Cc: Ira Weiny <[email protected]>
Cc: Sean Hefty <[email protected]>
Signed-off-by: Michael Wang <[email protected]>
---
drivers/infiniband/core/sa_query.c | 27 +++++++++++++++++----------
include/rdma/ib_verbs.h | 15 +++++++++++++++
2 files changed, 32 insertions(+), 10 deletions(-)

diff --git a/drivers/infiniband/core/sa_query.c b/drivers/infiniband/core/sa_query.c
index c38f030..f704254 100644
--- a/drivers/infiniband/core/sa_query.c
+++ b/drivers/infiniband/core/sa_query.c
@@ -450,7 +450,7 @@ static void ib_sa_event(struct ib_event_handler *handler, struct ib_event *event
struct ib_sa_port *port =
&sa_dev->port[event->element.port_num - sa_dev->start_port];

- if (rdma_port_get_link_layer(handler->device, port->port_num) != IB_LINK_LAYER_INFINIBAND)
+ if (WARN_ON(!cap_ib_sa(handler->device, port->port_num)))
return;

spin_lock_irqsave(&port->ah_lock, flags);
@@ -1153,9 +1153,7 @@ static void ib_sa_add_one(struct ib_device *device)
{
struct ib_sa_device *sa_dev;
int s, e, i;
-
- if (rdma_node_get_transport(device->node_type) != RDMA_TRANSPORT_IB)
- return;
+ int count = 0;

if (device->node_type == RDMA_NODE_IB_SWITCH)
s = e = 0;
@@ -1175,7 +1173,7 @@ static void ib_sa_add_one(struct ib_device *device)

for (i = 0; i <= e - s; ++i) {
spin_lock_init(&sa_dev->port[i].ah_lock);
- if (rdma_port_get_link_layer(device, i + 1) != IB_LINK_LAYER_INFINIBAND)
+ if (!cap_ib_sa(device, i + 1))
continue;

sa_dev->port[i].sm_ah = NULL;
@@ -1189,6 +1187,13 @@ static void ib_sa_add_one(struct ib_device *device)
goto err;

INIT_WORK(&sa_dev->port[i].update_task, update_sm_ah);
+
+ count++;
+ }
+
+ if (!count) {
+ kfree(sa_dev);
+ return;
}

ib_set_client_data(device, &sa_client, sa_dev);
@@ -1204,16 +1209,18 @@ static void ib_sa_add_one(struct ib_device *device)
if (ib_register_event_handler(&sa_dev->event_handler))
goto err;

- for (i = 0; i <= e - s; ++i)
- if (rdma_port_get_link_layer(device, i + 1) == IB_LINK_LAYER_INFINIBAND)
+ for (i = 0; i <= e - s; ++i) {
+ if (cap_ib_sa(device, i + 1))
update_sm_ah(&sa_dev->port[i].update_task);
+ }

return;

err:
- while (--i >= 0)
- if (rdma_port_get_link_layer(device, i + 1) == IB_LINK_LAYER_INFINIBAND)
+ while (--i >= 0) {
+ if (cap_ib_sa(device, i + 1))
ib_unregister_mad_agent(sa_dev->port[i].agent);
+ }

kfree(sa_dev);

@@ -1233,7 +1240,7 @@ static void ib_sa_remove_one(struct ib_device *device)
flush_workqueue(ib_wq);

for (i = 0; i <= sa_dev->end_port - sa_dev->start_port; ++i) {
- if (rdma_port_get_link_layer(device, i + 1) == IB_LINK_LAYER_INFINIBAND) {
+ if (cap_ib_sa(device, i + 1)) {
ib_unregister_mad_agent(sa_dev->port[i].agent);
if (sa_dev->port[i].sm_ah)
kref_put(&sa_dev->port[i].sm_ah->ref, free_sm_ah);
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 3ba963f..c405e45 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -1818,6 +1818,21 @@ static inline int cap_ib_cm(struct ib_device *device, u8 port_num)
return rdma_ib_mgmt(device, port_num);
}

+/**
+ * cap_ib_sa - Check if the port of device has the capability Infiniband
+ * Subnet Administrator.
+ *
+ * @device: Device to be checked
+ * @port_num: Port number of the device
+ *
+ * Return 0 when the port of the device doesn't support the Infiniband
+ * Subnet Administrator.
+ */
+static inline int cap_ib_sa(struct ib_device *device, u8 port_num)
+{
+ return rdma_transport_ib(device, port_num);
+}
+
int ib_query_gid(struct ib_device *device,
u8 port_num, int index, union ib_gid *gid);

--
2.1.0

2015-04-07 12:33:28

by Michael Wang

Subject: [PATCH v2 07/17] IB/Verbs: Use management helper cap_ib_mcast() for mcast-check


Introduce the helper cap_ib_mcast() to check whether a port of an IB
device supports InfiniBand multicast.
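
As a rough sketch of how the helpers layer on each other (multicast
depends on the SA, which depends on the port's transport being IB), here
is a user-space model. The mock_ types and functions are hypothetical and
only mirror the shape of the helpers in this series.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical transport values modelled on the cover letter's mapping. */
enum mock_transport { MOCK_IB, MOCK_IBOE, MOCK_IWARP, MOCK_USNIC_UDP };

struct mock_device {
	int phys_port_cnt;
	enum mock_transport port_transport[2];	/* index 0 holds port 1 */
};

static bool mock_transport_ib(const struct mock_device *dev, int port_num)
{
	assert(port_num >= 1 && port_num <= dev->phys_port_cnt);
	return dev->port_transport[port_num - 1] == MOCK_IB;
}

/* The SA capability follows the transport check. */
static bool mock_cap_ib_sa(const struct mock_device *dev, int port_num)
{
	return mock_transport_ib(dev, port_num);
}

/* Multicast is expressed in terms of the SA capability. */
static bool mock_cap_ib_mcast(const struct mock_device *dev, int port_num)
{
	return mock_cap_ib_sa(dev, port_num);
}
```

Because the check is per port, a dual-port device with one IB port and one
Ethernet port (as mlx4 can have) gets multicast support on the IB port only.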

Cc: Jason Gunthorpe <[email protected]>
Cc: Doug Ledford <[email protected]>
Cc: Ira Weiny <[email protected]>
Cc: Sean Hefty <[email protected]>
Signed-off-by: Michael Wang <[email protected]>
---
drivers/infiniband/core/multicast.c | 12 +++---------
include/rdma/ib_verbs.h | 15 +++++++++++++++
2 files changed, 18 insertions(+), 9 deletions(-)

diff --git a/drivers/infiniband/core/multicast.c b/drivers/infiniband/core/multicast.c
index fa17b55..bdc1880 100644
--- a/drivers/infiniband/core/multicast.c
+++ b/drivers/infiniband/core/multicast.c
@@ -780,8 +780,7 @@ static void mcast_event_handler(struct ib_event_handler *handler,
int index;

dev = container_of(handler, struct mcast_device, event_handler);
- if (rdma_port_get_link_layer(dev->device, event->element.port_num) !=
- IB_LINK_LAYER_INFINIBAND)
+ if (WARN_ON(!cap_ib_mcast(dev->device, event->element.port_num)))
return;

index = event->element.port_num - dev->start_port;
@@ -808,9 +807,6 @@ static void mcast_add_one(struct ib_device *device)
int i;
int count = 0;

- if (rdma_node_get_transport(device->node_type) != RDMA_TRANSPORT_IB)
- return;
-
dev = kmalloc(sizeof *dev + device->phys_port_cnt * sizeof *port,
GFP_KERNEL);
if (!dev)
@@ -824,8 +820,7 @@ static void mcast_add_one(struct ib_device *device)
}

for (i = 0; i <= dev->end_port - dev->start_port; i++) {
- if (rdma_port_get_link_layer(device, dev->start_port + i) !=
- IB_LINK_LAYER_INFINIBAND)
+ if (!cap_ib_mcast(device, dev->start_port + i))
continue;
port = &dev->port[i];
port->dev = dev;
@@ -863,8 +858,7 @@ static void mcast_remove_one(struct ib_device *device)
flush_workqueue(mcast_wq);

for (i = 0; i <= dev->end_port - dev->start_port; i++) {
- if (rdma_port_get_link_layer(device, dev->start_port + i) ==
- IB_LINK_LAYER_INFINIBAND) {
+ if (cap_ib_mcast(device, dev->start_port + i)) {
port = &dev->port[i];
deref_port(port);
wait_for_completion(&port->comp);
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index c405e45..5a5f6d5 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -1833,6 +1833,21 @@ static inline int cap_ib_sa(struct ib_device *device, u8 port_num)
return rdma_transport_ib(device, port_num);
}

+/**
+ * cap_ib_mcast - Check if the port of device has the capability Infiniband
+ * Multicast.
+ *
+ * @device: Device to be checked
+ * @port_num: Port number of the device
+ *
+ * Return 0 when the port of the device doesn't support Infiniband
+ * Multicast.
+ */
+static inline int cap_ib_mcast(struct ib_device *device, u8 port_num)
+{
+ return cap_ib_sa(device, port_num);
+}
+
int ib_query_gid(struct ib_device *device,
u8 port_num, int index, union ib_gid *gid);

--
2.1.0

2015-04-07 12:34:07

by Michael Wang

Subject: [PATCH v2 08/17] IB/Verbs: Use management helper cap_ipoib() for ipoib-check


Introduce the helper cap_ipoib() to check whether a port of an IB device
supports IP over InfiniBand (IPoIB).

Rework ipoib_add_one() to check each port individually; a device with no
IPoIB-capable port is skipped.

Cc: Jason Gunthorpe <[email protected]>
Cc: Doug Ledford <[email protected]>
Cc: Ira Weiny <[email protected]>
Cc: Sean Hefty <[email protected]>
Signed-off-by: Michael Wang <[email protected]>
---
drivers/infiniband/ulp/ipoib/ipoib_main.c | 17 ++++++++++-------
include/rdma/ib_verbs.h | 15 +++++++++++++++
2 files changed, 25 insertions(+), 7 deletions(-)

diff --git a/drivers/infiniband/ulp/ipoib/ipoib_main.c b/drivers/infiniband/ulp/ipoib/ipoib_main.c
index 58b5aa3..e36a926 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_main.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c
@@ -1654,9 +1654,7 @@ static void ipoib_add_one(struct ib_device *device)
struct net_device *dev;
struct ipoib_dev_priv *priv;
int s, e, p;
-
- if (rdma_node_get_transport(device->node_type) != RDMA_TRANSPORT_IB)
- return;
+ int count = 0;

dev_list = kmalloc(sizeof *dev_list, GFP_KERNEL);
if (!dev_list)
@@ -1673,13 +1671,21 @@ static void ipoib_add_one(struct ib_device *device)
}

for (p = s; p <= e; ++p) {
- if (rdma_port_get_link_layer(device, p) != IB_LINK_LAYER_INFINIBAND)
+ if (!cap_ipoib(device, p))
continue;
+
dev = ipoib_add_port("ib%d", device, p);
if (!IS_ERR(dev)) {
priv = netdev_priv(dev);
list_add_tail(&priv->list, dev_list);
}
+
+ count++;
+ }
+
+ if (!count) {
+ kfree(dev_list);
+ return;
}

ib_set_client_data(device, &ipoib_client, dev_list);
@@ -1690,9 +1696,6 @@ static void ipoib_remove_one(struct ib_device *device)
struct ipoib_dev_priv *priv, *tmp;
struct list_head *dev_list;

- if (rdma_node_get_transport(device->node_type) != RDMA_TRANSPORT_IB)
- return;
-
dev_list = ib_get_client_data(device, &ipoib_client);
if (!dev_list)
return;
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 5a5f6d5..9db8966 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -1848,6 +1848,21 @@ static inline int cap_ib_mcast(struct ib_device *device, u8 port_num)
return cap_ib_sa(device, port_num);
}

+/**
+ * cap_ipoib - Check if the port of device has the capability
+ * IP over Infiniband.
+ *
+ * @device: Device to be checked
+ * @port_num: Port number of the device
+ *
+ * Return 0 when the port of the device doesn't support
+ * IP over Infiniband.
+ */
+static inline int cap_ipoib(struct ib_device *device, u8 port_num)
+{
+ return rdma_transport_ib(device, port_num);
+}
+
int ib_query_gid(struct ib_device *device,
u8 port_num, int index, union ib_gid *gid);

--
2.1.0

2015-04-07 12:34:49

by Michael Wang

Subject: [PATCH v2 09/17] IB/Verbs: Use helper cap_read_multi_sge() and reform svc_rdma_accept()


Introduce the helper cap_read_multi_sge() to check whether a port of an
IB device supports RDMA Read with multiple scatter-gather entries.

Rework svc_rdma_accept() to use the management helpers.
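
The effect on rdma_read_max_sge() can be modelled in user space as
follows. This is a sketch under the assumption made by the patch itself,
namely that only iWARP ports lack multi-SGE RDMA Read; the mock_ names are
hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical transport values; not the kernel's enum. */
enum mock_transport { MOCK_IB, MOCK_IBOE, MOCK_IWARP };

/* Models cap_read_multi_sge(): every transport except iWARP qualifies. */
static bool mock_cap_read_multi_sge(enum mock_transport t)
{
	return t != MOCK_IWARP;
}

/*
 * Number of scatter-gather entries one RDMA Read may carry: a single
 * entry without the capability, otherwise the request capped at max_sge.
 */
static int mock_read_max_sge(enum mock_transport t, int sge_count, int max_sge)
{
	assert(sge_count >= 0 && max_sge >= 1);

	if (!mock_cap_read_multi_sge(t))
		return 1;
	return sge_count < max_sge ? sge_count : max_sge;
}
```

The caller now asks about the capability it actually needs rather than
about the transport that happens to lack it.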

Cc: Tom Talpey <[email protected]>
Cc: Jason Gunthorpe <[email protected]>
Cc: Doug Ledford <[email protected]>
Cc: Ira Weiny <[email protected]>
Cc: Sean Hefty <[email protected]>
Signed-off-by: Michael Wang <[email protected]>
---
include/rdma/ib_verbs.h | 15 +++++++++++++++
net/sunrpc/xprtrdma/svc_rdma_recvfrom.c | 4 ++--
net/sunrpc/xprtrdma/svc_rdma_transport.c | 12 +++++-------
3 files changed, 22 insertions(+), 9 deletions(-)

diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 9db8966..cae6f2d 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -1849,6 +1849,21 @@ static inline int cap_ib_mcast(struct ib_device *device, u8 port_num)
}

/**
+ * cap_read_multi_sge - Check if the port of device has the capability
+ * RDMA Read Multiple Scatter-Gather Entries.
+ *
+ * @device: Device to be checked
+ * @port_num: Port number of the device
+ *
+ * Return 0 when the port of the device doesn't support
+ * RDMA Read Multiple Scatter-Gather Entries.
+ */
+static inline int cap_read_multi_sge(struct ib_device *device, u8 port_num)
+{
+ return !rdma_transport_iwarp(device, port_num);
+}
+
+/**
* cap_ipoib - Check if the port of device has the capability
* IP over Infiniband.
*
diff --git a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
index e011027..604d035 100644
--- a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
+++ b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
@@ -118,8 +118,8 @@ static void rdma_build_arg_xdr(struct svc_rqst *rqstp,

static int rdma_read_max_sge(struct svcxprt_rdma *xprt, int sge_count)
{
- if (rdma_node_get_transport(xprt->sc_cm_id->device->node_type) ==
- RDMA_TRANSPORT_IWARP)
+ if (!cap_read_multi_sge(xprt->sc_cm_id->device,
+ xprt->sc_cm_id->port_num))
return 1;
else
return min_t(int, sge_count, xprt->sc_max_sge);
diff --git a/net/sunrpc/xprtrdma/svc_rdma_transport.c b/net/sunrpc/xprtrdma/svc_rdma_transport.c
index 4e61880..e75175d 100644
--- a/net/sunrpc/xprtrdma/svc_rdma_transport.c
+++ b/net/sunrpc/xprtrdma/svc_rdma_transport.c
@@ -979,8 +979,8 @@ static struct svc_xprt *svc_rdma_accept(struct svc_xprt *xprt)
/*
* Determine if a DMA MR is required and if so, what privs are required
*/
- switch (rdma_node_get_transport(newxprt->sc_cm_id->device->node_type)) {
- case RDMA_TRANSPORT_IWARP:
+ if (rdma_transport_iwarp(newxprt->sc_cm_id->device,
+ newxprt->sc_cm_id->port_num)) {
newxprt->sc_dev_caps |= SVCRDMA_DEVCAP_READ_W_INV;
if (!(newxprt->sc_dev_caps & SVCRDMA_DEVCAP_FAST_REG)) {
need_dma_mr = 1;
@@ -992,8 +992,8 @@ static struct svc_xprt *svc_rdma_accept(struct svc_xprt *xprt)
dma_mr_acc = IB_ACCESS_LOCAL_WRITE;
} else
need_dma_mr = 0;
- break;
- case RDMA_TRANSPORT_IB:
+ } else if (rdma_ib_mgmt(newxprt->sc_cm_id->device,
+ newxprt->sc_cm_id->port_num)) {
if (!(newxprt->sc_dev_caps & SVCRDMA_DEVCAP_FAST_REG)) {
need_dma_mr = 1;
dma_mr_acc = IB_ACCESS_LOCAL_WRITE;
@@ -1003,10 +1003,8 @@ static struct svc_xprt *svc_rdma_accept(struct svc_xprt *xprt)
dma_mr_acc = IB_ACCESS_LOCAL_WRITE;
} else
need_dma_mr = 0;
- break;
- default:
+ } else
goto errout;
- }

/* Create the DMA MR if needed, otherwise, use the DMA LKEY */
if (need_dma_mr) {
--
2.1.0

2015-04-07 12:35:26

by Michael Wang

Subject: [PATCH v2 10/17] IB/Verbs: Adopt management helpers for IB helpers


Adopt management helpers for:
ib_init_ah_from_path()
ib_init_ah_from_wc()
ib_resolve_eth_l2_attrs()
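
A small user-space sketch of the GRH decision in ib_init_ah_from_path()
after this change: any port whose transport is not IB forces a GRH, and an
IB port needs one only when the hop limit exceeds 1. The mock_ names are
hypothetical stand-ins.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical transport values; not the kernel's enum. */
enum mock_transport { MOCK_IB, MOCK_IBOE, MOCK_IWARP };

/* Returns true when the address handle must carry a GRH. */
static bool mock_needs_grh(enum mock_transport t, unsigned int hop_limit)
{
	/* Sanity check: the path record's hop limit is an 8-bit field. */
	assert(hop_limit <= 255);

	bool force_grh = (t != MOCK_IB);

	return hop_limit > 1 || force_grh;
}
```

Note that "not IB" is broader than the old "link layer is Ethernet" test
only if a non-IB, non-Ethernet port ever reaches this code.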

Cc: Jason Gunthorpe <[email protected]>
Cc: Doug Ledford <[email protected]>
Cc: Ira Weiny <[email protected]>
Cc: Sean Hefty <[email protected]>
Signed-off-by: Michael Wang <[email protected]>
---
drivers/infiniband/core/sa_query.c | 2 +-
drivers/infiniband/core/verbs.c | 6 ++----
2 files changed, 3 insertions(+), 5 deletions(-)

diff --git a/drivers/infiniband/core/sa_query.c b/drivers/infiniband/core/sa_query.c
index f704254..4e61104 100644
--- a/drivers/infiniband/core/sa_query.c
+++ b/drivers/infiniband/core/sa_query.c
@@ -540,7 +540,7 @@ int ib_init_ah_from_path(struct ib_device *device, u8 port_num,
ah_attr->port_num = port_num;
ah_attr->static_rate = rec->rate;

- force_grh = rdma_port_get_link_layer(device, port_num) == IB_LINK_LAYER_ETHERNET;
+ force_grh = !rdma_transport_ib(device, port_num);

if (rec->hop_limit > 1 || force_grh) {
ah_attr->ah_flags = IB_AH_GRH;
diff --git a/drivers/infiniband/core/verbs.c b/drivers/infiniband/core/verbs.c
index 83370de..ca06f76 100644
--- a/drivers/infiniband/core/verbs.c
+++ b/drivers/infiniband/core/verbs.c
@@ -200,11 +200,9 @@ int ib_init_ah_from_wc(struct ib_device *device, u8 port_num, struct ib_wc *wc,
u32 flow_class;
u16 gid_index;
int ret;
- int is_eth = (rdma_port_get_link_layer(device, port_num) ==
- IB_LINK_LAYER_ETHERNET);

memset(ah_attr, 0, sizeof *ah_attr);
- if (is_eth) {
+ if (!rdma_transport_ib(device, port_num)) {
if (!(wc->wc_flags & IB_WC_GRH))
return -EPROTOTYPE;

@@ -873,7 +871,7 @@ int ib_resolve_eth_l2_attrs(struct ib_qp *qp,
union ib_gid sgid;

if ((*qp_attr_mask & IB_QP_AV) &&
- (rdma_port_get_link_layer(qp->device, qp_attr->ah_attr.port_num) == IB_LINK_LAYER_ETHERNET)) {
+ (!rdma_transport_ib(qp->device, qp_attr->ah_attr.port_num))) {
ret = ib_query_gid(qp->device, qp_attr->ah_attr.port_num,
qp_attr->ah_attr.grh.sgid_index, &sgid);
if (ret)
--
2.1.0

2015-04-07 12:36:04

by Michael Wang

Subject: [PATCH v2 11/17] IB/Verbs: Reform link_layer_show() and ib_uverbs_query_port()


Reform link_layer_show() and ib_uverbs_query_port() with management helpers.
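
Sketched in user space, the reworked reporting reduces to a two-way
choice: an IB transport reports InfiniBand and everything else reports
Ethernet, so the former "Unknown" case disappears. The mock_ names below
are hypothetical.

```c
#include <assert.h>

/* Hypothetical transport and link-layer values; not the kernel's enums. */
enum mock_transport { MOCK_IB, MOCK_IBOE, MOCK_IWARP, MOCK_USNIC_UDP };
enum mock_link_layer { MOCK_LL_INFINIBAND, MOCK_LL_ETHERNET };

/* Models the value placed in resp.link_layer by ib_uverbs_query_port(). */
static enum mock_link_layer mock_port_link_layer(enum mock_transport t)
{
	return t == MOCK_IB ? MOCK_LL_INFINIBAND : MOCK_LL_ETHERNET;
}

/* Models the string printed by link_layer_show(). */
static const char *mock_link_layer_name(enum mock_link_layer ll)
{
	assert(ll == MOCK_LL_INFINIBAND || ll == MOCK_LL_ETHERNET);
	return ll == MOCK_LL_INFINIBAND ? "InfiniBand" : "Ethernet";
}
```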

Cc: Jason Gunthorpe <[email protected]>
Cc: Doug Ledford <[email protected]>
Cc: Ira Weiny <[email protected]>
Cc: Sean Hefty <[email protected]>
Signed-off-by: Michael Wang <[email protected]>
---
drivers/infiniband/core/sysfs.c | 8 ++------
drivers/infiniband/core/uverbs_cmd.c | 6 ++++--
2 files changed, 6 insertions(+), 8 deletions(-)

diff --git a/drivers/infiniband/core/sysfs.c b/drivers/infiniband/core/sysfs.c
index cbd0383..aa53e40 100644
--- a/drivers/infiniband/core/sysfs.c
+++ b/drivers/infiniband/core/sysfs.c
@@ -248,14 +248,10 @@ static ssize_t phys_state_show(struct ib_port *p, struct port_attribute *unused,
static ssize_t link_layer_show(struct ib_port *p, struct port_attribute *unused,
char *buf)
{
- switch (rdma_port_get_link_layer(p->ibdev, p->port_num)) {
- case IB_LINK_LAYER_INFINIBAND:
+ if (rdma_transport_ib(p->ibdev, p->port_num))
return sprintf(buf, "%s\n", "InfiniBand");
- case IB_LINK_LAYER_ETHERNET:
+ else
return sprintf(buf, "%s\n", "Ethernet");
- default:
- return sprintf(buf, "%s\n", "Unknown");
- }
}

static PORT_ATTR_RO(state);
diff --git a/drivers/infiniband/core/uverbs_cmd.c b/drivers/infiniband/core/uverbs_cmd.c
index a9f0489..3eb6eb5 100644
--- a/drivers/infiniband/core/uverbs_cmd.c
+++ b/drivers/infiniband/core/uverbs_cmd.c
@@ -515,8 +515,10 @@ ssize_t ib_uverbs_query_port(struct ib_uverbs_file *file,
resp.active_width = attr.active_width;
resp.active_speed = attr.active_speed;
resp.phys_state = attr.phys_state;
- resp.link_layer = rdma_port_get_link_layer(file->device->ib_dev,
- cmd.port_num);
+ resp.link_layer = rdma_transport_ib(file->device->ib_dev,
+ cmd.port_num) ?
+ IB_LINK_LAYER_INFINIBAND :
+ IB_LINK_LAYER_ETHERNET;

if (copy_to_user((void __user *) (unsigned long) cmd.response,
&resp, sizeof resp))
--
2.1.0

2015-04-07 12:36:38

by Michael Wang

Subject: [PATCH v2 12/17] IB/Verbs: Use management helper cap_ib_cm_dev() for cm-device-check


Introduce the helper cap_ib_cm_dev() to check whether any port of a device
supports the InfiniBand Communication Manager.

Cc: Tom Talpey <[email protected]>
Cc: Jason Gunthorpe <[email protected]>
Cc: Doug Ledford <[email protected]>
Cc: Ira Weiny <[email protected]>
Cc: Sean Hefty <[email protected]>
Signed-off-by: Michael Wang <[email protected]>
---
drivers/infiniband/core/cma.c | 5 ++---
drivers/infiniband/core/ucm.c | 3 +--
include/rdma/ib_verbs.h | 20 ++++++++++++++++++++
3 files changed, 23 insertions(+), 5 deletions(-)

diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
index d570030..d8a8ea7 100644
--- a/drivers/infiniband/core/cma.c
+++ b/drivers/infiniband/core/cma.c
@@ -1625,8 +1625,7 @@ static void cma_listen_on_dev(struct rdma_id_private *id_priv,
struct rdma_cm_id *id;
int ret;

- if (cma_family(id_priv) == AF_IB &&
- rdma_node_get_transport(cma_dev->device->node_type) != RDMA_TRANSPORT_IB)
+ if (cma_family(id_priv) == AF_IB && !cap_ib_cm_dev(cma_dev->device))
return;

id = rdma_create_id(cma_listen_handler, id_priv, id_priv->id.ps,
@@ -2028,7 +2027,7 @@ static int cma_bind_loopback(struct rdma_id_private *id_priv)
mutex_lock(&lock);
list_for_each_entry(cur_dev, &dev_list, list) {
if (cma_family(id_priv) == AF_IB &&
- rdma_node_get_transport(cur_dev->device->node_type) != RDMA_TRANSPORT_IB)
+ !cap_ib_cm_dev(cur_dev->device))
continue;

if (!cma_dev)
diff --git a/drivers/infiniband/core/ucm.c b/drivers/infiniband/core/ucm.c
index f2f6393..065405e 100644
--- a/drivers/infiniband/core/ucm.c
+++ b/drivers/infiniband/core/ucm.c
@@ -1253,8 +1253,7 @@ static void ib_ucm_add_one(struct ib_device *device)
dev_t base;
struct ib_ucm_device *ucm_dev;

- if (!device->alloc_ucontext ||
- rdma_node_get_transport(device->node_type) != RDMA_TRANSPORT_IB)
+ if (!device->alloc_ucontext || !cap_ib_cm_dev(device))
return;

ucm_dev = kzalloc(sizeof *ucm_dev, GFP_KERNEL);
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index cae6f2d..2767a91 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -1819,6 +1819,26 @@ static inline int cap_ib_cm(struct ib_device *device, u8 port_num)
}

/**
+ * cap_ib_cm_dev - Check if any port of device has the capability Infiniband
+ * Communication Manager.
+ *
+ * @device: Device to be checked
+ *
+ * Return 0 when no port of the device supports the Infiniband
+ * Communication Manager.
+ */
+static inline int cap_ib_cm_dev(struct ib_device *device)
+{
+ int i;
+
+ for (i = 1; i <= device->phys_port_cnt; i++) {
+ if (cap_ib_cm(device, i))
+ return 1;
+ }
+ return 0;
+}
+
+/**
* cap_ib_sa - Check if the port of device has the capability Infiniband
* Subnet Administrator.
*
--
2.1.0

2015-04-07 12:37:14

by Michael Wang

Subject: [PATCH v2 13/17] IB/Verbs: Reform cma/ucma with management helpers


Reform cma/ucma with management helpers.
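
The main structural change is that nested switch statements on node
transport and link layer collapse into one flat chain of per-port checks.
A user-space sketch of the rdma_resolve_route() dispatch follows; the
mock_ types and the MOCK_ROUTE_ values are hypothetical and stand in for
the real resolver calls.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical transport values modelled on the cover letter's mapping. */
enum mock_transport { MOCK_IB, MOCK_IBOE, MOCK_IWARP, MOCK_USNIC_UDP };

/* Stand-ins for cma_resolve_ib_route() and friends. */
enum mock_route { MOCK_ROUTE_IB = 1, MOCK_ROUTE_IBOE, MOCK_ROUTE_IW };

struct mock_device {
	int phys_port_cnt;
	enum mock_transport port_transport[4];	/* index 0 holds port 1 */
};

static enum mock_transport mock_query_transport(const struct mock_device *dev,
						int port_num)
{
	assert(port_num >= 1 && port_num <= dev->phys_port_cnt);
	return dev->port_transport[port_num - 1];
}

/* Returns the chosen resolver, or -ENOSYS for an unsupported transport. */
static int mock_resolve_route(const struct mock_device *dev, int port_num)
{
	enum mock_transport t = mock_query_transport(dev, port_num);

	if (t == MOCK_IB)
		return MOCK_ROUTE_IB;
	else if (t == MOCK_IBOE)
		return MOCK_ROUTE_IBOE;
	else if (t == MOCK_IWARP)
		return MOCK_ROUTE_IW;
	else
		return -ENOSYS;
}
```

Each branch names exactly one transport, so adding a new one means adding
one branch rather than threading a case through two switch levels.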

Cc: Jason Gunthorpe <[email protected]>
Cc: Doug Ledford <[email protected]>
Cc: Ira Weiny <[email protected]>
Cc: Sean Hefty <[email protected]>
Signed-off-by: Michael Wang <[email protected]>
---
drivers/infiniband/core/cma.c | 182 +++++++++++++----------------------------
drivers/infiniband/core/ucma.c | 25 ++----
2 files changed, 65 insertions(+), 142 deletions(-)

diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
index d8a8ea7..c23f483 100644
--- a/drivers/infiniband/core/cma.c
+++ b/drivers/infiniband/core/cma.c
@@ -435,10 +435,10 @@ static int cma_resolve_ib_dev(struct rdma_id_private *id_priv)
pkey = ntohs(addr->sib_pkey);

list_for_each_entry(cur_dev, &dev_list, list) {
- if (rdma_node_get_transport(cur_dev->device->node_type) != RDMA_TRANSPORT_IB)
- continue;
-
for (p = 1; p <= cur_dev->device->phys_port_cnt; ++p) {
+ if (!rdma_ib_mgmt(cur_dev->device, p))
+ continue;
+
if (ib_find_cached_pkey(cur_dev->device, p, pkey, &index))
continue;

@@ -633,10 +633,10 @@ static int cma_modify_qp_rtr(struct rdma_id_private *id_priv,
if (ret)
goto out;

- if (rdma_node_get_transport(id_priv->cma_dev->device->node_type)
- == RDMA_TRANSPORT_IB &&
- rdma_port_get_link_layer(id_priv->id.device, id_priv->id.port_num)
- == IB_LINK_LAYER_ETHERNET) {
+ /* Will this happen? */
+ BUG_ON(id_priv->cma_dev->device != id_priv->id.device);
+
+ if (rdma_transport_iboe(id_priv->id.device, id_priv->id.port_num)) {
ret = rdma_addr_find_smac_by_sgid(&sgid, qp_attr.smac, NULL);

if (ret)
@@ -700,8 +700,7 @@ static int cma_ib_init_qp_attr(struct rdma_id_private *id_priv,
int ret;
u16 pkey;

- if (rdma_port_get_link_layer(id_priv->id.device, id_priv->id.port_num) ==
- IB_LINK_LAYER_INFINIBAND)
+ if (rdma_transport_ib(id_priv->id.device, id_priv->id.port_num))
pkey = ib_addr_get_pkey(dev_addr);
else
pkey = 0xffff;
@@ -735,8 +734,7 @@ int rdma_init_qp_attr(struct rdma_cm_id *id, struct ib_qp_attr *qp_attr,
int ret = 0;

id_priv = container_of(id, struct rdma_id_private, id);
- switch (rdma_node_get_transport(id_priv->id.device->node_type)) {
- case RDMA_TRANSPORT_IB:
+ if (rdma_ib_mgmt(id_priv->id.device, id_priv->id.port_num)) {
if (!id_priv->cm_id.ib || (id_priv->id.qp_type == IB_QPT_UD))
ret = cma_ib_init_qp_attr(id_priv, qp_attr, qp_attr_mask);
else
@@ -745,19 +743,16 @@ int rdma_init_qp_attr(struct rdma_cm_id *id, struct ib_qp_attr *qp_attr,

if (qp_attr->qp_state == IB_QPS_RTR)
qp_attr->rq_psn = id_priv->seq_num;
- break;
- case RDMA_TRANSPORT_IWARP:
+ } else if (rdma_transport_iwarp(id_priv->id.device,
+ id_priv->id.port_num)) {
if (!id_priv->cm_id.iw) {
qp_attr->qp_access_flags = 0;
*qp_attr_mask = IB_QP_STATE | IB_QP_ACCESS_FLAGS;
} else
ret = iw_cm_init_qp_attr(id_priv->cm_id.iw, qp_attr,
qp_attr_mask);
- break;
- default:
+ } else
ret = -ENOSYS;
- break;
- }

return ret;
}
@@ -928,13 +923,9 @@ static inline int cma_user_data_offset(struct rdma_id_private *id_priv)

static void cma_cancel_route(struct rdma_id_private *id_priv)
{
- switch (rdma_port_get_link_layer(id_priv->id.device, id_priv->id.port_num)) {
- case IB_LINK_LAYER_INFINIBAND:
+ if (rdma_transport_ib(id_priv->id.device, id_priv->id.port_num)) {
if (id_priv->query)
ib_sa_cancel_query(id_priv->query_id, id_priv->query);
- break;
- default:
- break;
}
}

@@ -1006,17 +997,14 @@ static void cma_leave_mc_groups(struct rdma_id_private *id_priv)
mc = container_of(id_priv->mc_list.next,
struct cma_multicast, list);
list_del(&mc->list);
- switch (rdma_port_get_link_layer(id_priv->cma_dev->device, id_priv->id.port_num)) {
- case IB_LINK_LAYER_INFINIBAND:
+ if (rdma_transport_ib(id_priv->cma_dev->device,
+ id_priv->id.port_num)) {
ib_sa_free_multicast(mc->multicast.ib);
kfree(mc);
break;
- case IB_LINK_LAYER_ETHERNET:
+ } else if (rdma_transport_iboe(id_priv->cma_dev->device,
+ id_priv->id.port_num))
kref_put(&mc->mcref, release_mc);
- break;
- default:
- break;
- }
}
}

@@ -1037,17 +1025,13 @@ void rdma_destroy_id(struct rdma_cm_id *id)
mutex_unlock(&id_priv->handler_mutex);

if (id_priv->cma_dev) {
- switch (rdma_node_get_transport(id_priv->id.device->node_type)) {
- case RDMA_TRANSPORT_IB:
+ if (rdma_ib_mgmt(id_priv->id.device, id_priv->id.port_num)) {
if (id_priv->cm_id.ib)
ib_destroy_cm_id(id_priv->cm_id.ib);
- break;
- case RDMA_TRANSPORT_IWARP:
+ } else if (rdma_transport_iwarp(id_priv->id.device,
+ id_priv->id.port_num)) {
if (id_priv->cm_id.iw)
iw_destroy_cm_id(id_priv->cm_id.iw);
- break;
- default:
- break;
}
cma_leave_mc_groups(id_priv);
cma_release_dev(id_priv);
@@ -1966,26 +1950,14 @@ int rdma_resolve_route(struct rdma_cm_id *id, int timeout_ms)
return -EINVAL;

atomic_inc(&id_priv->refcount);
- switch (rdma_node_get_transport(id->device->node_type)) {
- case RDMA_TRANSPORT_IB:
- switch (rdma_port_get_link_layer(id->device, id->port_num)) {
- case IB_LINK_LAYER_INFINIBAND:
- ret = cma_resolve_ib_route(id_priv, timeout_ms);
- break;
- case IB_LINK_LAYER_ETHERNET:
- ret = cma_resolve_iboe_route(id_priv);
- break;
- default:
- ret = -ENOSYS;
- }
- break;
- case RDMA_TRANSPORT_IWARP:
+ if (rdma_transport_ib(id->device, id->port_num))
+ ret = cma_resolve_ib_route(id_priv, timeout_ms);
+ else if (rdma_transport_iboe(id->device, id->port_num))
+ ret = cma_resolve_iboe_route(id_priv);
+ else if (rdma_transport_iwarp(id->device, id->port_num))
ret = cma_resolve_iw_route(id_priv, timeout_ms);
- break;
- default:
+ else
ret = -ENOSYS;
- break;
- }
if (ret)
goto err;

@@ -2059,7 +2031,7 @@ port_found:
goto out;

id_priv->id.route.addr.dev_addr.dev_type =
- (rdma_port_get_link_layer(cma_dev->device, p) == IB_LINK_LAYER_INFINIBAND) ?
+ (rdma_transport_ib(cma_dev->device, p)) ?
ARPHRD_INFINIBAND : ARPHRD_ETHER;

rdma_addr_set_sgid(&id_priv->id.route.addr.dev_addr, &gid);
@@ -2536,18 +2508,15 @@ int rdma_listen(struct rdma_cm_id *id, int backlog)

id_priv->backlog = backlog;
if (id->device) {
- switch (rdma_node_get_transport(id->device->node_type)) {
- case RDMA_TRANSPORT_IB:
+ if (rdma_ib_mgmt(id->device, id->port_num)) {
ret = cma_ib_listen(id_priv);
if (ret)
goto err;
- break;
- case RDMA_TRANSPORT_IWARP:
+ } else if (rdma_transport_iwarp(id->device, id->port_num)) {
ret = cma_iw_listen(id_priv, backlog);
if (ret)
goto err;
- break;
- default:
+ } else {
ret = -ENOSYS;
goto err;
}
@@ -2883,20 +2852,15 @@ int rdma_connect(struct rdma_cm_id *id, struct rdma_conn_param *conn_param)
id_priv->srq = conn_param->srq;
}

- switch (rdma_node_get_transport(id->device->node_type)) {
- case RDMA_TRANSPORT_IB:
+ if (rdma_ib_mgmt(id->device, id->port_num)) {
if (id->qp_type == IB_QPT_UD)
ret = cma_resolve_ib_udp(id_priv, conn_param);
else
ret = cma_connect_ib(id_priv, conn_param);
- break;
- case RDMA_TRANSPORT_IWARP:
+ } else if (rdma_transport_iwarp(id->device, id->port_num))
ret = cma_connect_iw(id_priv, conn_param);
- break;
- default:
+ else
ret = -ENOSYS;
- break;
- }
if (ret)
goto err;

@@ -2999,8 +2963,7 @@ int rdma_accept(struct rdma_cm_id *id, struct rdma_conn_param *conn_param)
id_priv->srq = conn_param->srq;
}

- switch (rdma_node_get_transport(id->device->node_type)) {
- case RDMA_TRANSPORT_IB:
+ if (rdma_ib_mgmt(id->device, id->port_num)) {
if (id->qp_type == IB_QPT_UD) {
if (conn_param)
ret = cma_send_sidr_rep(id_priv, IB_SIDR_SUCCESS,
@@ -3016,14 +2979,10 @@ int rdma_accept(struct rdma_cm_id *id, struct rdma_conn_param *conn_param)
else
ret = cma_rep_recv(id_priv);
}
- break;
- case RDMA_TRANSPORT_IWARP:
+ } else if (rdma_transport_iwarp(id->device, id->port_num))
ret = cma_accept_iw(id_priv, conn_param);
- break;
- default:
+ else
ret = -ENOSYS;
- break;
- }

if (ret)
goto reject;
@@ -3067,8 +3026,7 @@ int rdma_reject(struct rdma_cm_id *id, const void *private_data,
if (!id_priv->cm_id.ib)
return -EINVAL;

- switch (rdma_node_get_transport(id->device->node_type)) {
- case RDMA_TRANSPORT_IB:
+ if (rdma_ib_mgmt(id->device, id->port_num)) {
if (id->qp_type == IB_QPT_UD)
ret = cma_send_sidr_rep(id_priv, IB_SIDR_REJECT, 0,
private_data, private_data_len);
@@ -3076,15 +3034,11 @@ int rdma_reject(struct rdma_cm_id *id, const void *private_data,
ret = ib_send_cm_rej(id_priv->cm_id.ib,
IB_CM_REJ_CONSUMER_DEFINED, NULL,
0, private_data, private_data_len);
- break;
- case RDMA_TRANSPORT_IWARP:
+ } else if (rdma_transport_iwarp(id->device, id->port_num)) {
ret = iw_cm_reject(id_priv->cm_id.iw,
private_data, private_data_len);
- break;
- default:
+ } else
ret = -ENOSYS;
- break;
- }
return ret;
}
EXPORT_SYMBOL(rdma_reject);
@@ -3098,22 +3052,17 @@ int rdma_disconnect(struct rdma_cm_id *id)
if (!id_priv->cm_id.ib)
return -EINVAL;

- switch (rdma_node_get_transport(id->device->node_type)) {
- case RDMA_TRANSPORT_IB:
+ if (rdma_ib_mgmt(id->device, id->port_num)) {
ret = cma_modify_qp_err(id_priv);
if (ret)
goto out;
/* Initiate or respond to a disconnect. */
if (ib_send_cm_dreq(id_priv->cm_id.ib, NULL, 0))
ib_send_cm_drep(id_priv->cm_id.ib, NULL, 0);
- break;
- case RDMA_TRANSPORT_IWARP:
+ } else if (rdma_transport_iwarp(id->device, id->port_num)) {
ret = iw_cm_disconnect(id_priv->cm_id.iw, 0);
- break;
- default:
+ } else
ret = -EINVAL;
- break;
- }
out:
return ret;
}
@@ -3359,24 +3308,13 @@ int rdma_join_multicast(struct rdma_cm_id *id, struct sockaddr *addr,
list_add(&mc->list, &id_priv->mc_list);
spin_unlock(&id_priv->lock);

- switch (rdma_node_get_transport(id->device->node_type)) {
- case RDMA_TRANSPORT_IB:
- switch (rdma_port_get_link_layer(id->device, id->port_num)) {
- case IB_LINK_LAYER_INFINIBAND:
- ret = cma_join_ib_multicast(id_priv, mc);
- break;
- case IB_LINK_LAYER_ETHERNET:
- kref_init(&mc->mcref);
- ret = cma_iboe_join_multicast(id_priv, mc);
- break;
- default:
- ret = -EINVAL;
- }
- break;
- default:
+ if (rdma_transport_iboe(id->device, id->port_num)) {
+ kref_init(&mc->mcref);
+ ret = cma_iboe_join_multicast(id_priv, mc);
+ } else if (rdma_transport_ib(id->device, id->port_num))
+ ret = cma_join_ib_multicast(id_priv, mc);
+ else
ret = -ENOSYS;
- break;
- }

if (ret) {
spin_lock_irq(&id_priv->lock);
@@ -3404,19 +3342,17 @@ void rdma_leave_multicast(struct rdma_cm_id *id, struct sockaddr *addr)
ib_detach_mcast(id->qp,
&mc->multicast.ib->rec.mgid,
be16_to_cpu(mc->multicast.ib->rec.mlid));
- if (rdma_node_get_transport(id_priv->cma_dev->device->node_type) == RDMA_TRANSPORT_IB) {
- switch (rdma_port_get_link_layer(id->device, id->port_num)) {
- case IB_LINK_LAYER_INFINIBAND:
- ib_sa_free_multicast(mc->multicast.ib);
- kfree(mc);
- break;
- case IB_LINK_LAYER_ETHERNET:
- kref_put(&mc->mcref, release_mc);
- break;
- default:
- break;
- }
- }
+
+ /* Will this happen? */
+ BUG_ON(id_priv->cma_dev->device != id->device);
+
+ if (rdma_transport_ib(id->device, id->port_num)) {
+ ib_sa_free_multicast(mc->multicast.ib);
+ kfree(mc);
+ } else if (rdma_transport_iboe(id->device,
+ id->port_num))
+ kref_put(&mc->mcref, release_mc);
+
return;
}
}
diff --git a/drivers/infiniband/core/ucma.c b/drivers/infiniband/core/ucma.c
index 45d67e9..42c9bf6 100644
--- a/drivers/infiniband/core/ucma.c
+++ b/drivers/infiniband/core/ucma.c
@@ -722,26 +722,13 @@ static ssize_t ucma_query_route(struct ucma_file *file,

resp.node_guid = (__force __u64) ctx->cm_id->device->node_guid;
resp.port_num = ctx->cm_id->port_num;
- switch (rdma_node_get_transport(ctx->cm_id->device->node_type)) {
- case RDMA_TRANSPORT_IB:
- switch (rdma_port_get_link_layer(ctx->cm_id->device,
- ctx->cm_id->port_num)) {
- case IB_LINK_LAYER_INFINIBAND:
- ucma_copy_ib_route(&resp, &ctx->cm_id->route);
- break;
- case IB_LINK_LAYER_ETHERNET:
- ucma_copy_iboe_route(&resp, &ctx->cm_id->route);
- break;
- default:
- break;
- }
- break;
- case RDMA_TRANSPORT_IWARP:
+
+ if (rdma_transport_ib(ctx->cm_id->device, ctx->cm_id->port_num))
+ ucma_copy_ib_route(&resp, &ctx->cm_id->route);
+ else if (rdma_transport_iboe(ctx->cm_id->device, ctx->cm_id->port_num))
+ ucma_copy_iboe_route(&resp, &ctx->cm_id->route);
+ else if (rdma_transport_iwarp(ctx->cm_id->device, ctx->cm_id->port_num))
ucma_copy_iw_route(&resp, &ctx->cm_id->route);
- break;
- default:
- break;
- }

out:
if (copy_to_user((void __user *)(unsigned long)cmd.response,
--
2.1.0

2015-04-07 12:38:11

by Michael Wang

Subject: [PATCH v2 14/17] IB/Verbs: Reserve legacy transport type for 'struct rdma_dev_addr'


Reserve the legacy transport type for the 'transport' member
of 'struct rdma_dev_addr' until we make sure this is no
longer needed.

Cc: Jason Gunthorpe <[email protected]>
Cc: Doug Ledford <[email protected]>
Cc: Ira Weiny <[email protected]>
Cc: Sean Hefty <[email protected]>
Signed-off-by: Michael Wang <[email protected]>
---
drivers/infiniband/core/cma.c | 25 +++++++++++++++++++++++--
1 file changed, 23 insertions(+), 2 deletions(-)

diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
index c23f483..e26b42e 100644
--- a/drivers/infiniband/core/cma.c
+++ b/drivers/infiniband/core/cma.c
@@ -244,14 +244,35 @@ static inline void cma_set_ip_ver(struct cma_hdr *hdr, u8 ip_ver)
hdr->ip_version = (ip_ver << 4) | (hdr->ip_version & 0xF);
}

+static inline void cma_set_legacy_transport(struct rdma_cm_id *id)
+{
+ switch (id->device->node_type) {
+ case RDMA_NODE_IB_CA:
+ case RDMA_NODE_IB_SWITCH:
+ case RDMA_NODE_IB_ROUTER:
+ id->route.addr.dev_addr.transport = RDMA_TRANSPORT_IB;
+ break;
+ case RDMA_NODE_RNIC:
+ id->route.addr.dev_addr.transport = RDMA_TRANSPORT_IWARP;
+ break;
+ case RDMA_NODE_USNIC:
+ id->route.addr.dev_addr.transport = RDMA_TRANSPORT_USNIC;
+ break;
+ case RDMA_NODE_USNIC_UDP:
+ id->route.addr.dev_addr.transport = RDMA_TRANSPORT_USNIC_UDP;
+ break;
+ default:
+ BUG();
+ }
+}
+
static void cma_attach_to_dev(struct rdma_id_private *id_priv,
struct cma_device *cma_dev)
{
atomic_inc(&cma_dev->refcount);
id_priv->cma_dev = cma_dev;
id_priv->id.device = cma_dev->device;
- id_priv->id.route.addr.dev_addr.transport =
- rdma_node_get_transport(cma_dev->device->node_type);
+ cma_set_legacy_transport(&id_priv->id);
list_add_tail(&id_priv->list, &cma_dev->id_list);
}

--
2.1.0

2015-04-07 12:38:50

by Michael Wang

Subject: [PATCH v2 15/17] IB/Verbs: Reform cma_acquire_dev() with management helpers


Rework cma_acquire_dev() to use the management helpers, and introduce
cma_validate_port() to make the code cleaner.
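
The check that cma_validate_port() is meant to perform can be sketched in
isolation as below. This is only an illustration under stated assumptions, not
the kernel code: struct sketch_port and sketch_validate_port() are hypothetical
names, the is_ib field stands in for what rdma_transport_ib() would report, and
the gid_lookup_ret/gid_port fields stand in for the result of
ib_find_cached_gid().

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the per-port state the real helper queries. */
struct sketch_port {
	bool is_ib;          /* what rdma_transport_ib() would report */
	int gid_lookup_ret;  /* stand-in for the ib_find_cached_gid() result */
	uint8_t gid_port;    /* port on which the GID was found */
};

/*
 * Return 0 only if the address type matches the port's transport and
 * the GID was found on exactly this port; otherwise a negative errno.
 */
static int sketch_validate_port(const struct sketch_port *p, uint8_t port,
				bool addr_is_ib)
{
	/* An IB address needs an IB port, and a non-IB address a non-IB port. */
	if (addr_is_ib != p->is_ib)
		return -ENODEV;

	/* Propagate a failed GID lookup unchanged. */
	if (p->gid_lookup_ret)
		return p->gid_lookup_ret;

	/* A GID found on a different port does not validate this port. */
	return port == p->gid_port ? 0 : -ENODEV;
}
```

The point of the last line is that a successful lookup on some other port is
still treated as a failure for the port being validated, which is what the
open-coded "!ret && (port == found_port)" test in cma_acquire_dev() expressed.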

Cc: Jason Gunthorpe <[email protected]>
Cc: Doug Ledford <[email protected]>
Cc: Ira Weiny <[email protected]>
Cc: Sean Hefty <[email protected]>
Signed-off-by: Michael Wang <[email protected]>
---
drivers/infiniband/core/cma.c | 69 +++++++++++++++++++++++++------------------
1 file changed, 41 insertions(+), 28 deletions(-)

diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
index e26b42e..dc05cd0 100644
--- a/drivers/infiniband/core/cma.c
+++ b/drivers/infiniband/core/cma.c
@@ -370,18 +370,36 @@ static int cma_translate_addr(struct sockaddr *addr, struct rdma_dev_addr *dev_a
return ret;
}

+static inline int cma_validate_port(struct ib_device *device, u8 port,
+ union ib_gid *gid, int dev_type)
+{
+ u8 found_port;
+ int ret = -ENODEV;
+
+ if ((dev_type == ARPHRD_INFINIBAND) && !rdma_transport_ib(device, port))
+ return ret;
+
+ if ((dev_type != ARPHRD_INFINIBAND) && rdma_transport_ib(device, port))
+ return ret;
+
+ ret = ib_find_cached_gid(device, gid, &found_port, NULL);
+
+ if (!ret && (port != found_port))
+ ret = -ENODEV;
+
+ return ret;
+}
+
static int cma_acquire_dev(struct rdma_id_private *id_priv,
struct rdma_id_private *listen_id_priv)
{
struct rdma_dev_addr *dev_addr = &id_priv->id.route.addr.dev_addr;
struct cma_device *cma_dev;
- union ib_gid gid, iboe_gid;
+ union ib_gid gid, iboe_gid, *gidp;
int ret = -ENODEV;
- u8 port, found_port;
- enum rdma_link_layer dev_ll = dev_addr->dev_type == ARPHRD_INFINIBAND ?
- IB_LINK_LAYER_INFINIBAND : IB_LINK_LAYER_ETHERNET;
+ u8 port;

- if (dev_ll != IB_LINK_LAYER_INFINIBAND &&
+ if (dev_addr->dev_type != ARPHRD_INFINIBAND &&
id_priv->id.ps == RDMA_PS_IPOIB)
return -EINVAL;

@@ -391,41 +409,36 @@ static int cma_acquire_dev(struct rdma_id_private *id_priv,

memcpy(&gid, dev_addr->src_dev_addr +
rdma_addr_gid_offset(dev_addr), sizeof gid);
- if (listen_id_priv &&
- rdma_port_get_link_layer(listen_id_priv->id.device,
- listen_id_priv->id.port_num) == dev_ll) {
+
+ if (listen_id_priv) {
cma_dev = listen_id_priv->cma_dev;
port = listen_id_priv->id.port_num;
- if (rdma_node_get_transport(cma_dev->device->node_type) == RDMA_TRANSPORT_IB &&
- rdma_port_get_link_layer(cma_dev->device, port) == IB_LINK_LAYER_ETHERNET)
- ret = ib_find_cached_gid(cma_dev->device, &iboe_gid,
- &found_port, NULL);
- else
- ret = ib_find_cached_gid(cma_dev->device, &gid,
- &found_port, NULL);
+ gidp = rdma_transport_iboe(cma_dev->device, port) ?
+ &iboe_gid : &gid;

- if (!ret && (port == found_port)) {
- id_priv->id.port_num = found_port;
+ ret = cma_validate_port(cma_dev->device, port, gidp,
+ dev_addr->dev_type);
+ if (!ret) {
+ id_priv->id.port_num = port;
goto out;
}
}
+
list_for_each_entry(cma_dev, &dev_list, list) {
for (port = 1; port <= cma_dev->device->phys_port_cnt; ++port) {
if (listen_id_priv &&
listen_id_priv->cma_dev == cma_dev &&
listen_id_priv->id.port_num == port)
continue;
- if (rdma_port_get_link_layer(cma_dev->device, port) == dev_ll) {
- if (rdma_node_get_transport(cma_dev->device->node_type) == RDMA_TRANSPORT_IB &&
- rdma_port_get_link_layer(cma_dev->device, port) == IB_LINK_LAYER_ETHERNET)
- ret = ib_find_cached_gid(cma_dev->device, &iboe_gid, &found_port, NULL);
- else
- ret = ib_find_cached_gid(cma_dev->device, &gid, &found_port, NULL);
-
- if (!ret && (port == found_port)) {
- id_priv->id.port_num = found_port;
- goto out;
- }
+
+ gidp = rdma_transport_iboe(cma_dev->device, port) ?
+ &iboe_gid : &gid;
+
+ ret = cma_validate_port(cma_dev->device, port, gidp,
+ dev_addr->dev_type);
+ if (!ret) {
+ id_priv->id.port_num = port;
+ goto out;
}
}
}
--
2.1.0

2015-04-07 12:39:25

by Michael Wang

Subject: [PATCH v2 16/17] IB/Verbs: Cleanup rdma_node_get_transport()


We have got rid of all the users of rdma_node_get_transport();
now remove it.

Cc: Jason Gunthorpe <[email protected]>
Cc: Doug Ledford <[email protected]>
Cc: Ira Weiny <[email protected]>
Cc: Sean Hefty <[email protected]>
Signed-off-by: Michael Wang <[email protected]>
---
drivers/infiniband/core/verbs.c | 21 ---------------------
include/rdma/ib_verbs.h | 3 ---
2 files changed, 24 deletions(-)

diff --git a/drivers/infiniband/core/verbs.c b/drivers/infiniband/core/verbs.c
index ca06f76..49acdbc 100644
--- a/drivers/infiniband/core/verbs.c
+++ b/drivers/infiniband/core/verbs.c
@@ -107,27 +107,6 @@ __attribute_const__ int ib_rate_to_mbps(enum ib_rate rate)
}
EXPORT_SYMBOL(ib_rate_to_mbps);

-__attribute_const__ enum rdma_transport_type
-rdma_node_get_transport(enum rdma_node_type node_type)
-{
- switch (node_type) {
- case RDMA_NODE_IB_CA:
- case RDMA_NODE_IB_SWITCH:
- case RDMA_NODE_IB_ROUTER:
- return RDMA_TRANSPORT_IB;
- case RDMA_NODE_RNIC:
- return RDMA_TRANSPORT_IWARP;
- case RDMA_NODE_USNIC:
- return RDMA_TRANSPORT_USNIC;
- case RDMA_NODE_USNIC_UDP:
- return RDMA_TRANSPORT_USNIC_UDP;
- default:
- BUG();
- return 0;
- }
-}
-EXPORT_SYMBOL(rdma_node_get_transport);
-
enum rdma_link_layer rdma_port_get_link_layer(struct ib_device *device, u8 port_num)
{
if (device->get_link_layer)
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 2767a91..f033f824 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -84,9 +84,6 @@ enum rdma_transport_type {
RDMA_TRANSPORT_IBOE,
};

-__attribute_const__ enum rdma_transport_type
-rdma_node_get_transport(enum rdma_node_type node_type);
-
enum rdma_link_layer {
IB_LINK_LAYER_UNSPECIFIED,
IB_LINK_LAYER_INFINIBAND,
--
2.1.0

2015-04-07 12:39:57

by Michael Wang

Subject: [PATCH v2 17/17] IB/Verbs: Move rdma_port_get_link_layer() to mlx4 head file


Only mlx4 still uses rdma_port_get_link_layer() now, so move it
to its private header file.

Cc: Jason Gunthorpe <[email protected]>
Cc: Doug Ledford <[email protected]>
Cc: Ira Weiny <[email protected]>
Cc: Sean Hefty <[email protected]>
Signed-off-by: Michael Wang <[email protected]>
---
drivers/infiniband/core/verbs.c | 20 --------------------
drivers/infiniband/hw/mlx4/mlx4_ib.h | 8 ++++++++
include/rdma/ib_verbs.h | 3 ---
3 files changed, 8 insertions(+), 23 deletions(-)

diff --git a/drivers/infiniband/core/verbs.c b/drivers/infiniband/core/verbs.c
index 49acdbc..f1eac93 100644
--- a/drivers/infiniband/core/verbs.c
+++ b/drivers/infiniband/core/verbs.c
@@ -107,26 +107,6 @@ __attribute_const__ int ib_rate_to_mbps(enum ib_rate rate)
}
EXPORT_SYMBOL(ib_rate_to_mbps);

-enum rdma_link_layer rdma_port_get_link_layer(struct ib_device *device, u8 port_num)
-{
- if (device->get_link_layer)
- return device->get_link_layer(device, port_num);
-
- switch (device->query_transport(device, port_num)) {
- case RDMA_TRANSPORT_IB:
- case RDMA_TRANSPORT_IBOE:
- return IB_LINK_LAYER_INFINIBAND;
- case RDMA_TRANSPORT_IWARP:
- case RDMA_TRANSPORT_USNIC:
- case RDMA_TRANSPORT_USNIC_UDP:
- return IB_LINK_LAYER_ETHERNET;
- default:
- BUG();
- return IB_LINK_LAYER_UNSPECIFIED;
- }
-}
-EXPORT_SYMBOL(rdma_port_get_link_layer);
-
/* Protection domains */

struct ib_pd *ib_alloc_pd(struct ib_device *device)
diff --git a/drivers/infiniband/hw/mlx4/mlx4_ib.h b/drivers/infiniband/hw/mlx4/mlx4_ib.h
index 6eb743f..8e86ecc 100644
--- a/drivers/infiniband/hw/mlx4/mlx4_ib.h
+++ b/drivers/infiniband/hw/mlx4/mlx4_ib.h
@@ -708,6 +708,14 @@ int __mlx4_ib_query_pkey(struct ib_device *ibdev, u8 port, u16 index,
int __mlx4_ib_query_gid(struct ib_device *ibdev, u8 port, int index,
union ib_gid *gid, int netw_view);

+static enum rdma_link_layer
+rdma_port_get_link_layer(struct ib_device *device, u8 port_num)
+{
+ /* Will this happen? */
+ BUG_ON(!device->get_link_layer);
+ return device->get_link_layer(device, port_num);
+}
+
static inline bool mlx4_ib_ah_grh_present(struct mlx4_ib_ah *ah)
{
u8 port = be32_to_cpu(ah->av.ib.port_pd) >> 24 & 3;
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index f033f824..67b3e71 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -1742,9 +1742,6 @@ int ib_query_device(struct ib_device *device,
int ib_query_port(struct ib_device *device,
u8 port_num, struct ib_port_attr *port_attr);

-enum rdma_link_layer rdma_port_get_link_layer(struct ib_device *device,
- u8 port_num);
-
static inline int rdma_transport_ib(struct ib_device *device, u8 port_num)
{
return device->query_transport(device, port_num)
--
2.1.0

2015-04-07 12:42:06

by Michael Wang

Subject: [PATCH v2 01/17] IB/Verbs: Implement new callback query_transport() for each HW


Add a new callback, query_transport(), and implement it for each HW.

Mapping List:
HW        node-type  link-layer  old-transport  new-transport
nes       RNIC       ETH         IWARP          IWARP
amso1100  RNIC       ETH         IWARP          IWARP
cxgb3     RNIC       ETH         IWARP          IWARP
cxgb4     RNIC       ETH         IWARP          IWARP
usnic     USNIC_UDP  ETH         USNIC_UDP      USNIC_UDP
ocrdma    IB_CA      ETH         IB             IBOE
mlx4      IB_CA      IB/ETH      IB             IB/IBOE
mlx5      IB_CA      IB          IB             IB
ehca      IB_CA      IB          IB             IB
ipath     IB_CA      IB          IB             IB
mthca     IB_CA      IB          IB             IB
qib       IB_CA      IB          IB             IB
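
As a rough illustration of how the callback is meant to be used, the sketch
below shows a fixed-transport driver, a per-port driver in the style of mlx4,
and a helper that tests the result. It is a minimal stand-alone example under
stated assumptions: struct sketch_device, the port_is_eth array and all
sketch_* functions are hypothetical and only mimic the shape of struct
ib_device and the real helpers; only the enum values are taken from this patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Mirrors enum rdma_transport_type as extended by this patch. */
enum rdma_transport_type {
	RDMA_TRANSPORT_IB,
	RDMA_TRANSPORT_IWARP,
	RDMA_TRANSPORT_USNIC,
	RDMA_TRANSPORT_USNIC_UDP,
	RDMA_TRANSPORT_IBOE,
};

/* Hypothetical, stripped-down stand-in for struct ib_device. */
struct sketch_device {
	enum rdma_transport_type (*query_transport)(struct sketch_device *dev,
						    uint8_t port_num);
	bool port_is_eth[3];	/* indexed by 1-based port number */
};

/* A device whose ports all use one transport returns a constant. */
static enum rdma_transport_type
sketch_fixed_iwarp(struct sketch_device *dev, uint8_t port_num)
{
	(void)dev;
	(void)port_num;
	return RDMA_TRANSPORT_IWARP;
}

/* A device with mixed ports decides per port, as the mlx4 row suggests. */
static enum rdma_transport_type
sketch_per_port(struct sketch_device *dev, uint8_t port_num)
{
	return dev->port_is_eth[port_num] ?
		RDMA_TRANSPORT_IBOE : RDMA_TRANSPORT_IB;
}

/* Callers ask one question instead of combining transport and link layer. */
static bool sketch_transport_iboe(struct sketch_device *dev, uint8_t port_num)
{
	return dev->query_transport(dev, port_num) == RDMA_TRANSPORT_IBOE;
}
```

With this shape, the old two-step test "transport is IB and link layer is ETH"
collapses into a single call such as sketch_transport_iboe(dev, port).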

Cc: Jason Gunthorpe <[email protected]>
Cc: Doug Ledford <[email protected]>
Cc: Ira Weiny <[email protected]>
Cc: Sean Hefty <[email protected]>
Signed-off-by: Michael Wang <[email protected]>
---
drivers/infiniband/core/device.c | 1 +
drivers/infiniband/core/verbs.c | 4 +++-
drivers/infiniband/hw/amso1100/c2_provider.c | 7 +++++++
drivers/infiniband/hw/cxgb3/iwch_provider.c | 7 +++++++
drivers/infiniband/hw/cxgb4/provider.c | 7 +++++++
drivers/infiniband/hw/ehca/ehca_hca.c | 6 ++++++
drivers/infiniband/hw/ehca/ehca_iverbs.h | 3 +++
drivers/infiniband/hw/ehca/ehca_main.c | 1 +
drivers/infiniband/hw/ipath/ipath_verbs.c | 7 +++++++
drivers/infiniband/hw/mlx4/main.c | 10 ++++++++++
drivers/infiniband/hw/mlx5/main.c | 7 +++++++
drivers/infiniband/hw/mthca/mthca_provider.c | 7 +++++++
drivers/infiniband/hw/nes/nes_verbs.c | 6 ++++++
drivers/infiniband/hw/ocrdma/ocrdma_main.c | 1 +
drivers/infiniband/hw/ocrdma/ocrdma_verbs.c | 6 ++++++
drivers/infiniband/hw/ocrdma/ocrdma_verbs.h | 3 +++
drivers/infiniband/hw/qib/qib_verbs.c | 7 +++++++
drivers/infiniband/hw/usnic/usnic_ib_main.c | 1 +
drivers/infiniband/hw/usnic/usnic_ib_verbs.c | 6 ++++++
drivers/infiniband/hw/usnic/usnic_ib_verbs.h | 2 ++
include/rdma/ib_verbs.h | 7 ++++++-
21 files changed, 104 insertions(+), 2 deletions(-)

diff --git a/drivers/infiniband/core/device.c b/drivers/infiniband/core/device.c
index 18c1ece..a9587c4 100644
--- a/drivers/infiniband/core/device.c
+++ b/drivers/infiniband/core/device.c
@@ -76,6 +76,7 @@ static int ib_device_check_mandatory(struct ib_device *device)
} mandatory_table[] = {
IB_MANDATORY_FUNC(query_device),
IB_MANDATORY_FUNC(query_port),
+ IB_MANDATORY_FUNC(query_transport),
IB_MANDATORY_FUNC(query_pkey),
IB_MANDATORY_FUNC(query_gid),
IB_MANDATORY_FUNC(alloc_pd),
diff --git a/drivers/infiniband/core/verbs.c b/drivers/infiniband/core/verbs.c
index f93eb8d..83370de 100644
--- a/drivers/infiniband/core/verbs.c
+++ b/drivers/infiniband/core/verbs.c
@@ -133,14 +133,16 @@ enum rdma_link_layer rdma_port_get_link_layer(struct ib_device *device, u8 port_
if (device->get_link_layer)
return device->get_link_layer(device, port_num);

- switch (rdma_node_get_transport(device->node_type)) {
+ switch (device->query_transport(device, port_num)) {
case RDMA_TRANSPORT_IB:
+ case RDMA_TRANSPORT_IBOE:
return IB_LINK_LAYER_INFINIBAND;
case RDMA_TRANSPORT_IWARP:
case RDMA_TRANSPORT_USNIC:
case RDMA_TRANSPORT_USNIC_UDP:
return IB_LINK_LAYER_ETHERNET;
default:
+ BUG();
return IB_LINK_LAYER_UNSPECIFIED;
}
}
diff --git a/drivers/infiniband/hw/amso1100/c2_provider.c b/drivers/infiniband/hw/amso1100/c2_provider.c
index bdf3507..d46bbb0 100644
--- a/drivers/infiniband/hw/amso1100/c2_provider.c
+++ b/drivers/infiniband/hw/amso1100/c2_provider.c
@@ -99,6 +99,12 @@ static int c2_query_port(struct ib_device *ibdev,
return 0;
}

+static enum rdma_transport_type
+c2_query_transport(struct ib_device *device, u8 port_num)
+{
+ return RDMA_TRANSPORT_IWARP;
+}
+
static int c2_query_pkey(struct ib_device *ibdev,
u8 port, u16 index, u16 * pkey)
{
@@ -801,6 +807,7 @@ int c2_register_device(struct c2_dev *dev)
dev->ibdev.dma_device = &dev->pcidev->dev;
dev->ibdev.query_device = c2_query_device;
dev->ibdev.query_port = c2_query_port;
+ dev->ibdev.query_transport = c2_query_transport;
dev->ibdev.query_pkey = c2_query_pkey;
dev->ibdev.query_gid = c2_query_gid;
dev->ibdev.alloc_ucontext = c2_alloc_ucontext;
diff --git a/drivers/infiniband/hw/cxgb3/iwch_provider.c b/drivers/infiniband/hw/cxgb3/iwch_provider.c
index 811b24a..09682e9e 100644
--- a/drivers/infiniband/hw/cxgb3/iwch_provider.c
+++ b/drivers/infiniband/hw/cxgb3/iwch_provider.c
@@ -1232,6 +1232,12 @@ static int iwch_query_port(struct ib_device *ibdev,
return 0;
}

+static enum rdma_transport_type
+iwch_query_transport(struct ib_device *device, u8 port_num)
+{
+ return RDMA_TRANSPORT_IWARP;
+}
+
static ssize_t show_rev(struct device *dev, struct device_attribute *attr,
char *buf)
{
@@ -1385,6 +1391,7 @@ int iwch_register_device(struct iwch_dev *dev)
dev->ibdev.dma_device = &(dev->rdev.rnic_info.pdev->dev);
dev->ibdev.query_device = iwch_query_device;
dev->ibdev.query_port = iwch_query_port;
+ dev->ibdev.query_transport = iwch_query_transport;
dev->ibdev.query_pkey = iwch_query_pkey;
dev->ibdev.query_gid = iwch_query_gid;
dev->ibdev.alloc_ucontext = iwch_alloc_ucontext;
diff --git a/drivers/infiniband/hw/cxgb4/provider.c b/drivers/infiniband/hw/cxgb4/provider.c
index 66bd6a2..a445e0d 100644
--- a/drivers/infiniband/hw/cxgb4/provider.c
+++ b/drivers/infiniband/hw/cxgb4/provider.c
@@ -390,6 +390,12 @@ static int c4iw_query_port(struct ib_device *ibdev, u8 port,
return 0;
}

+static enum rdma_transport_type
+c4iw_query_transport(struct ib_device *device, u8 port_num)
+{
+ return RDMA_TRANSPORT_IWARP;
+}
+
static ssize_t show_rev(struct device *dev, struct device_attribute *attr,
char *buf)
{
@@ -506,6 +512,7 @@ int c4iw_register_device(struct c4iw_dev *dev)
dev->ibdev.dma_device = &(dev->rdev.lldi.pdev->dev);
dev->ibdev.query_device = c4iw_query_device;
dev->ibdev.query_port = c4iw_query_port;
+ dev->ibdev.query_transport = c4iw_query_transport;
dev->ibdev.query_pkey = c4iw_query_pkey;
dev->ibdev.query_gid = c4iw_query_gid;
dev->ibdev.alloc_ucontext = c4iw_alloc_ucontext;
diff --git a/drivers/infiniband/hw/ehca/ehca_hca.c b/drivers/infiniband/hw/ehca/ehca_hca.c
index 9ed4d25..d5a34a6 100644
--- a/drivers/infiniband/hw/ehca/ehca_hca.c
+++ b/drivers/infiniband/hw/ehca/ehca_hca.c
@@ -242,6 +242,12 @@ query_port1:
return ret;
}

+enum rdma_transport_type
+ehca_query_transport(struct ib_device *device, u8 port_num)
+{
+ return RDMA_TRANSPORT_IB;
+}
+
int ehca_query_sma_attr(struct ehca_shca *shca,
u8 port, struct ehca_sma_attr *attr)
{
diff --git a/drivers/infiniband/hw/ehca/ehca_iverbs.h b/drivers/infiniband/hw/ehca/ehca_iverbs.h
index 22f79af..cec945f 100644
--- a/drivers/infiniband/hw/ehca/ehca_iverbs.h
+++ b/drivers/infiniband/hw/ehca/ehca_iverbs.h
@@ -49,6 +49,9 @@ int ehca_query_device(struct ib_device *ibdev, struct ib_device_attr *props);
int ehca_query_port(struct ib_device *ibdev, u8 port,
struct ib_port_attr *props);

+enum rdma_transport_type
+ehca_query_transport(struct ib_device *device, u8 port_num);
+
int ehca_query_sma_attr(struct ehca_shca *shca, u8 port,
struct ehca_sma_attr *attr);

diff --git a/drivers/infiniband/hw/ehca/ehca_main.c b/drivers/infiniband/hw/ehca/ehca_main.c
index cd8d290..60e0a09 100644
--- a/drivers/infiniband/hw/ehca/ehca_main.c
+++ b/drivers/infiniband/hw/ehca/ehca_main.c
@@ -467,6 +467,7 @@ static int ehca_init_device(struct ehca_shca *shca)
shca->ib_device.dma_device = &shca->ofdev->dev;
shca->ib_device.query_device = ehca_query_device;
shca->ib_device.query_port = ehca_query_port;
+ shca->ib_device.query_transport = ehca_query_transport;
shca->ib_device.query_gid = ehca_query_gid;
shca->ib_device.query_pkey = ehca_query_pkey;
/* shca->in_device.modify_device = ehca_modify_device */
diff --git a/drivers/infiniband/hw/ipath/ipath_verbs.c b/drivers/infiniband/hw/ipath/ipath_verbs.c
index 44ea939..58d36e3 100644
--- a/drivers/infiniband/hw/ipath/ipath_verbs.c
+++ b/drivers/infiniband/hw/ipath/ipath_verbs.c
@@ -1638,6 +1638,12 @@ static int ipath_query_port(struct ib_device *ibdev,
return 0;
}

+static enum rdma_transport_type
+ipath_query_transport(struct ib_device *device, u8 port_num)
+{
+ return RDMA_TRANSPORT_IB;
+}
+
static int ipath_modify_device(struct ib_device *device,
int device_modify_mask,
struct ib_device_modify *device_modify)
@@ -2140,6 +2146,7 @@ int ipath_register_ib_device(struct ipath_devdata *dd)
dev->query_device = ipath_query_device;
dev->modify_device = ipath_modify_device;
dev->query_port = ipath_query_port;
+ dev->query_transport = ipath_query_transport;
dev->modify_port = ipath_modify_port;
dev->query_pkey = ipath_query_pkey;
dev->query_gid = ipath_query_gid;
diff --git a/drivers/infiniband/hw/mlx4/main.c b/drivers/infiniband/hw/mlx4/main.c
index 0b280b1..28100bd 100644
--- a/drivers/infiniband/hw/mlx4/main.c
+++ b/drivers/infiniband/hw/mlx4/main.c
@@ -413,6 +413,15 @@ static int mlx4_ib_query_port(struct ib_device *ibdev, u8 port,
return __mlx4_ib_query_port(ibdev, port, props, 0);
}

+static enum rdma_transport_type
+mlx4_ib_query_transport(struct ib_device *device, u8 port_num)
+{
+ struct mlx4_dev *dev = to_mdev(device)->dev;
+
+ return dev->caps.port_mask[port_num] == MLX4_PORT_TYPE_IB ?
+ RDMA_TRANSPORT_IB : RDMA_TRANSPORT_IBOE;
+}
+
int __mlx4_ib_query_gid(struct ib_device *ibdev, u8 port, int index,
union ib_gid *gid, int netw_view)
{
@@ -2121,6 +2130,7 @@ static void *mlx4_ib_add(struct mlx4_dev *dev)

ibdev->ib_dev.query_device = mlx4_ib_query_device;
ibdev->ib_dev.query_port = mlx4_ib_query_port;
+ ibdev->ib_dev.query_transport = mlx4_ib_query_transport;
ibdev->ib_dev.get_link_layer = mlx4_ib_port_link_layer;
ibdev->ib_dev.query_gid = mlx4_ib_query_gid;
ibdev->ib_dev.query_pkey = mlx4_ib_query_pkey;
diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c
index cc4ac1e..209c796 100644
--- a/drivers/infiniband/hw/mlx5/main.c
+++ b/drivers/infiniband/hw/mlx5/main.c
@@ -351,6 +351,12 @@ out:
return err;
}

+static enum rdma_transport_type
+mlx5_ib_query_transport(struct ib_device *device, u8 port_num)
+{
+ return RDMA_TRANSPORT_IB;
+}
+
static int mlx5_ib_query_gid(struct ib_device *ibdev, u8 port, int index,
union ib_gid *gid)
{
@@ -1336,6 +1342,7 @@ static void *mlx5_ib_add(struct mlx5_core_dev *mdev)

dev->ib_dev.query_device = mlx5_ib_query_device;
dev->ib_dev.query_port = mlx5_ib_query_port;
+ dev->ib_dev.query_transport = mlx5_ib_query_transport;
dev->ib_dev.query_gid = mlx5_ib_query_gid;
dev->ib_dev.query_pkey = mlx5_ib_query_pkey;
dev->ib_dev.modify_device = mlx5_ib_modify_device;
diff --git a/drivers/infiniband/hw/mthca/mthca_provider.c b/drivers/infiniband/hw/mthca/mthca_provider.c
index 415f8e1..67ac6a4 100644
--- a/drivers/infiniband/hw/mthca/mthca_provider.c
+++ b/drivers/infiniband/hw/mthca/mthca_provider.c
@@ -179,6 +179,12 @@ static int mthca_query_port(struct ib_device *ibdev,
return err;
}

+static enum rdma_transport_type
+mthca_query_transport(struct ib_device *device, u8 port_num)
+{
+ return RDMA_TRANSPORT_IB;
+}
+
static int mthca_modify_device(struct ib_device *ibdev,
int mask,
struct ib_device_modify *props)
@@ -1281,6 +1287,7 @@ int mthca_register_device(struct mthca_dev *dev)
dev->ib_dev.dma_device = &dev->pdev->dev;
dev->ib_dev.query_device = mthca_query_device;
dev->ib_dev.query_port = mthca_query_port;
+ dev->ib_dev.query_transport = mthca_query_transport;
dev->ib_dev.modify_device = mthca_modify_device;
dev->ib_dev.modify_port = mthca_modify_port;
dev->ib_dev.query_pkey = mthca_query_pkey;
diff --git a/drivers/infiniband/hw/nes/nes_verbs.c b/drivers/infiniband/hw/nes/nes_verbs.c
index c0d0296..8df5b61 100644
--- a/drivers/infiniband/hw/nes/nes_verbs.c
+++ b/drivers/infiniband/hw/nes/nes_verbs.c
@@ -606,6 +606,11 @@ static int nes_query_port(struct ib_device *ibdev, u8 port, struct ib_port_attr
return 0;
}

+static enum rdma_transport_type
+nes_query_transport(struct ib_device *device, u8 port_num)
+{
+ return RDMA_TRANSPORT_IWARP;
+}

/**
* nes_query_pkey
@@ -3879,6 +3884,7 @@ struct nes_ib_device *nes_init_ofa_device(struct net_device *netdev)
nesibdev->ibdev.dev.parent = &nesdev->pcidev->dev;
nesibdev->ibdev.query_device = nes_query_device;
nesibdev->ibdev.query_port = nes_query_port;
+ nesibdev->ibdev.query_transport = nes_query_transport;
nesibdev->ibdev.query_pkey = nes_query_pkey;
nesibdev->ibdev.query_gid = nes_query_gid;
nesibdev->ibdev.alloc_ucontext = nes_alloc_ucontext;
diff --git a/drivers/infiniband/hw/ocrdma/ocrdma_main.c b/drivers/infiniband/hw/ocrdma/ocrdma_main.c
index 7a2b59a..9f4d182 100644
--- a/drivers/infiniband/hw/ocrdma/ocrdma_main.c
+++ b/drivers/infiniband/hw/ocrdma/ocrdma_main.c
@@ -244,6 +244,7 @@ static int ocrdma_register_device(struct ocrdma_dev *dev)
/* mandatory verbs. */
dev->ibdev.query_device = ocrdma_query_device;
dev->ibdev.query_port = ocrdma_query_port;
+ dev->ibdev.query_transport = ocrdma_query_transport;
dev->ibdev.modify_port = ocrdma_modify_port;
dev->ibdev.query_gid = ocrdma_query_gid;
dev->ibdev.get_link_layer = ocrdma_link_layer;
diff --git a/drivers/infiniband/hw/ocrdma/ocrdma_verbs.c b/drivers/infiniband/hw/ocrdma/ocrdma_verbs.c
index 8771755..73bace4 100644
--- a/drivers/infiniband/hw/ocrdma/ocrdma_verbs.c
+++ b/drivers/infiniband/hw/ocrdma/ocrdma_verbs.c
@@ -187,6 +187,12 @@ int ocrdma_query_port(struct ib_device *ibdev,
return 0;
}

+enum rdma_transport_type
+ocrdma_query_transport(struct ib_device *device, u8 port_num)
+{
+ return RDMA_TRANSPORT_IBOE;
+}
+
int ocrdma_modify_port(struct ib_device *ibdev, u8 port, int mask,
struct ib_port_modify *props)
{
diff --git a/drivers/infiniband/hw/ocrdma/ocrdma_verbs.h b/drivers/infiniband/hw/ocrdma/ocrdma_verbs.h
index b8f7853..4a81b63 100644
--- a/drivers/infiniband/hw/ocrdma/ocrdma_verbs.h
+++ b/drivers/infiniband/hw/ocrdma/ocrdma_verbs.h
@@ -41,6 +41,9 @@ int ocrdma_query_port(struct ib_device *, u8 port, struct ib_port_attr *props);
int ocrdma_modify_port(struct ib_device *, u8 port, int mask,
struct ib_port_modify *props);

+enum rdma_transport_type
+ocrdma_query_transport(struct ib_device *device, u8 port_num);
+
void ocrdma_get_guid(struct ocrdma_dev *, u8 *guid);
int ocrdma_query_gid(struct ib_device *, u8 port,
int index, union ib_gid *gid);
diff --git a/drivers/infiniband/hw/qib/qib_verbs.c b/drivers/infiniband/hw/qib/qib_verbs.c
index 4a35998..caad665 100644
--- a/drivers/infiniband/hw/qib/qib_verbs.c
+++ b/drivers/infiniband/hw/qib/qib_verbs.c
@@ -1650,6 +1650,12 @@ static int qib_query_port(struct ib_device *ibdev, u8 port,
return 0;
}

+static enum rdma_transport_type
+qib_query_transport(struct ib_device *device, u8 port_num)
+{
+ return RDMA_TRANSPORT_IB;
+}
+
static int qib_modify_device(struct ib_device *device,
int device_modify_mask,
struct ib_device_modify *device_modify)
@@ -2184,6 +2190,7 @@ int qib_register_ib_device(struct qib_devdata *dd)
ibdev->query_device = qib_query_device;
ibdev->modify_device = qib_modify_device;
ibdev->query_port = qib_query_port;
+ ibdev->query_transport = qib_query_transport;
ibdev->modify_port = qib_modify_port;
ibdev->query_pkey = qib_query_pkey;
ibdev->query_gid = qib_query_gid;
diff --git a/drivers/infiniband/hw/usnic/usnic_ib_main.c b/drivers/infiniband/hw/usnic/usnic_ib_main.c
index 0d0f986..03ea9f3 100644
--- a/drivers/infiniband/hw/usnic/usnic_ib_main.c
+++ b/drivers/infiniband/hw/usnic/usnic_ib_main.c
@@ -360,6 +360,7 @@ static void *usnic_ib_device_add(struct pci_dev *dev)

us_ibdev->ib_dev.query_device = usnic_ib_query_device;
us_ibdev->ib_dev.query_port = usnic_ib_query_port;
+ us_ibdev->ib_dev.query_transport = usnic_ib_query_transport;
us_ibdev->ib_dev.query_pkey = usnic_ib_query_pkey;
us_ibdev->ib_dev.query_gid = usnic_ib_query_gid;
us_ibdev->ib_dev.get_link_layer = usnic_ib_port_link_layer;
diff --git a/drivers/infiniband/hw/usnic/usnic_ib_verbs.c b/drivers/infiniband/hw/usnic/usnic_ib_verbs.c
index 53bd6a2..ff9a5f7 100644
--- a/drivers/infiniband/hw/usnic/usnic_ib_verbs.c
+++ b/drivers/infiniband/hw/usnic/usnic_ib_verbs.c
@@ -348,6 +348,12 @@ int usnic_ib_query_port(struct ib_device *ibdev, u8 port,
return 0;
}

+enum rdma_transport_type
+usnic_ib_query_transport(struct ib_device *device, u8 port_num)
+{
+ return RDMA_TRANSPORT_USNIC_UDP;
+}
+
int usnic_ib_query_qp(struct ib_qp *qp, struct ib_qp_attr *qp_attr,
int qp_attr_mask,
struct ib_qp_init_attr *qp_init_attr)
diff --git a/drivers/infiniband/hw/usnic/usnic_ib_verbs.h b/drivers/infiniband/hw/usnic/usnic_ib_verbs.h
index bb864f5..0b1633b 100644
--- a/drivers/infiniband/hw/usnic/usnic_ib_verbs.h
+++ b/drivers/infiniband/hw/usnic/usnic_ib_verbs.h
@@ -27,6 +27,8 @@ int usnic_ib_query_device(struct ib_device *ibdev,
struct ib_device_attr *props);
int usnic_ib_query_port(struct ib_device *ibdev, u8 port,
struct ib_port_attr *props);
+enum rdma_transport_type
+usnic_ib_query_transport(struct ib_device *device, u8 port_num);
int usnic_ib_query_qp(struct ib_qp *qp, struct ib_qp_attr *qp_attr,
int qp_attr_mask,
struct ib_qp_init_attr *qp_init_attr);
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 65994a1..d54f91e 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -75,10 +75,13 @@ enum rdma_node_type {
};

enum rdma_transport_type {
+ /* legacy for users */
RDMA_TRANSPORT_IB,
RDMA_TRANSPORT_IWARP,
RDMA_TRANSPORT_USNIC,
- RDMA_TRANSPORT_USNIC_UDP
+ RDMA_TRANSPORT_USNIC_UDP,
+ /* new transport */
+ RDMA_TRANSPORT_IBOE,
};

__attribute_const__ enum rdma_transport_type
@@ -1501,6 +1504,8 @@ struct ib_device {
int (*query_port)(struct ib_device *device,
u8 port_num,
struct ib_port_attr *port_attr);
+ enum rdma_transport_type (*query_transport)(struct ib_device *device,
+ u8 port_num);
enum rdma_link_layer (*get_link_layer)(struct ib_device *device,
u8 port_num);
int (*query_gid)(struct ib_device *device,
--
2.1.0

2015-04-07 12:44:19

by Michael Wang

Subject: Re: [PATCH 01/17] IB/Verbs: Implement new callback query_transport() for each HW

V2 has been sent out; please ignore this one. My apologies.

Regards,
Michael Wang

On 04/07/2015 02:28 PM, Michael Wang wrote:
>
> Add new callback query_transport() and implement for each HW.
>
> Mapping List:
> HW        node-type  link-layer  old-transport  new-transport
> nes       RNIC       ETH         IWARP          IWARP
> amso1100  RNIC       ETH         IWARP          IWARP
> cxgb3     RNIC       ETH         IWARP          IWARP
> cxgb4     RNIC       ETH         IWARP          IWARP
> usnic     USNIC_UDP  ETH         USNIC_UDP      USNIC_UDP
> ocrdma    IB_CA      ETH         IB             IBOE
> mlx4      IB_CA      IB/ETH      IB             IB/IBOE
> mlx5      IB_CA      IB          IB             IB
> ehca      IB_CA      IB          IB             IB
> ipath     IB_CA      IB          IB             IB
> mthca     IB_CA      IB          IB             IB
> qib       IB_CA      IB          IB             IB
>
> Cc: Jason Gunthorpe <[email protected]>
> Cc: Doug Ledford <[email protected]>
> Cc: Ira Weiny <[email protected]>
> Cc: Sean Hefty <[email protected]>
> Signed-off-by: Michael Wang <[email protected]>
> ---
> drivers/infiniband/core/device.c | 1 +
> drivers/infiniband/core/verbs.c | 4 +++-
> drivers/infiniband/hw/amso1100/c2_provider.c | 7 +++++++
> drivers/infiniband/hw/cxgb3/iwch_provider.c | 7 +++++++
> drivers/infiniband/hw/cxgb4/provider.c | 7 +++++++
> drivers/infiniband/hw/ehca/ehca_hca.c | 6 ++++++
> drivers/infiniband/hw/ehca/ehca_iverbs.h | 3 +++
> drivers/infiniband/hw/ehca/ehca_main.c | 1 +
> drivers/infiniband/hw/ipath/ipath_verbs.c | 7 +++++++
> drivers/infiniband/hw/mlx4/main.c | 10 ++++++++++
> drivers/infiniband/hw/mlx5/main.c | 7 +++++++
> drivers/infiniband/hw/mthca/mthca_provider.c | 7 +++++++
> drivers/infiniband/hw/nes/nes_verbs.c | 6 ++++++
> drivers/infiniband/hw/ocrdma/ocrdma_main.c | 1 +
> drivers/infiniband/hw/ocrdma/ocrdma_verbs.c | 6 ++++++
> drivers/infiniband/hw/ocrdma/ocrdma_verbs.h | 3 +++
> drivers/infiniband/hw/qib/qib_verbs.c | 7 +++++++
> drivers/infiniband/hw/usnic/usnic_ib_main.c | 1 +
> drivers/infiniband/hw/usnic/usnic_ib_verbs.c | 6 ++++++
> drivers/infiniband/hw/usnic/usnic_ib_verbs.h | 2 ++
> include/rdma/ib_verbs.h | 7 ++++++-
> 21 files changed, 104 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/infiniband/core/device.c b/drivers/infiniband/core/device.c
> index 18c1ece..a9587c4 100644
> --- a/drivers/infiniband/core/device.c
> +++ b/drivers/infiniband/core/device.c
> @@ -76,6 +76,7 @@ static int ib_device_check_mandatory(struct ib_device *device)
> } mandatory_table[] = {
> IB_MANDATORY_FUNC(query_device),
> IB_MANDATORY_FUNC(query_port),
> + IB_MANDATORY_FUNC(query_transport),
> IB_MANDATORY_FUNC(query_pkey),
> IB_MANDATORY_FUNC(query_gid),
> IB_MANDATORY_FUNC(alloc_pd),
> diff --git a/drivers/infiniband/core/verbs.c b/drivers/infiniband/core/verbs.c
> index f93eb8d..83370de 100644
> --- a/drivers/infiniband/core/verbs.c
> +++ b/drivers/infiniband/core/verbs.c
> @@ -133,14 +133,16 @@ enum rdma_link_layer rdma_port_get_link_layer(struct ib_device *device, u8 port_
> if (device->get_link_layer)
> return device->get_link_layer(device, port_num);
>
> - switch (rdma_node_get_transport(device->node_type)) {
> + switch (device->query_transport(device, port_num)) {
> case RDMA_TRANSPORT_IB:
> + case RDMA_TRANSPORT_IBOE:
> return IB_LINK_LAYER_INFINIBAND;
> case RDMA_TRANSPORT_IWARP:
> case RDMA_TRANSPORT_USNIC:
> case RDMA_TRANSPORT_USNIC_UDP:
> return IB_LINK_LAYER_ETHERNET;
> default:
> + BUG();
> return IB_LINK_LAYER_UNSPECIFIED;
> }
> }
> diff --git a/drivers/infiniband/hw/amso1100/c2_provider.c b/drivers/infiniband/hw/amso1100/c2_provider.c
> index bdf3507..d46bbb0 100644
> --- a/drivers/infiniband/hw/amso1100/c2_provider.c
> +++ b/drivers/infiniband/hw/amso1100/c2_provider.c
> @@ -99,6 +99,12 @@ static int c2_query_port(struct ib_device *ibdev,
> return 0;
> }
>
> +static enum rdma_transport_type
> +c2_query_transport(struct ib_device *device, u8 port_num)
> +{
> + return RDMA_TRANSPORT_IWARP;
> +}
> +
> static int c2_query_pkey(struct ib_device *ibdev,
> u8 port, u16 index, u16 * pkey)
> {
> @@ -801,6 +807,7 @@ int c2_register_device(struct c2_dev *dev)
> dev->ibdev.dma_device = &dev->pcidev->dev;
> dev->ibdev.query_device = c2_query_device;
> dev->ibdev.query_port = c2_query_port;
> + dev->ibdev.query_transport = c2_query_transport;
> dev->ibdev.query_pkey = c2_query_pkey;
> dev->ibdev.query_gid = c2_query_gid;
> dev->ibdev.alloc_ucontext = c2_alloc_ucontext;
> diff --git a/drivers/infiniband/hw/cxgb3/iwch_provider.c b/drivers/infiniband/hw/cxgb3/iwch_provider.c
> index 811b24a..09682e9e 100644
> --- a/drivers/infiniband/hw/cxgb3/iwch_provider.c
> +++ b/drivers/infiniband/hw/cxgb3/iwch_provider.c
> @@ -1232,6 +1232,12 @@ static int iwch_query_port(struct ib_device *ibdev,
> return 0;
> }
>
> +static enum rdma_transport_type
> +iwch_query_transport(struct ib_device *device, u8 port_num)
> +{
> + return RDMA_TRANSPORT_IWARP;
> +}
> +
> static ssize_t show_rev(struct device *dev, struct device_attribute *attr,
> char *buf)
> {
> @@ -1385,6 +1391,7 @@ int iwch_register_device(struct iwch_dev *dev)
> dev->ibdev.dma_device = &(dev->rdev.rnic_info.pdev->dev);
> dev->ibdev.query_device = iwch_query_device;
> dev->ibdev.query_port = iwch_query_port;
> + dev->ibdev.query_transport = iwch_query_transport;
> dev->ibdev.query_pkey = iwch_query_pkey;
> dev->ibdev.query_gid = iwch_query_gid;
> dev->ibdev.alloc_ucontext = iwch_alloc_ucontext;
> diff --git a/drivers/infiniband/hw/cxgb4/provider.c b/drivers/infiniband/hw/cxgb4/provider.c
> index 66bd6a2..a445e0d 100644
> --- a/drivers/infiniband/hw/cxgb4/provider.c
> +++ b/drivers/infiniband/hw/cxgb4/provider.c
> @@ -390,6 +390,12 @@ static int c4iw_query_port(struct ib_device *ibdev, u8 port,
> return 0;
> }
>
> +static enum rdma_transport_type
> +c4iw_query_transport(struct ib_device *device, u8 port_num)
> +{
> + return RDMA_TRANSPORT_IWARP;
> +}
> +
> static ssize_t show_rev(struct device *dev, struct device_attribute *attr,
> char *buf)
> {
> @@ -506,6 +512,7 @@ int c4iw_register_device(struct c4iw_dev *dev)
> dev->ibdev.dma_device = &(dev->rdev.lldi.pdev->dev);
> dev->ibdev.query_device = c4iw_query_device;
> dev->ibdev.query_port = c4iw_query_port;
> + dev->ibdev.query_transport = c4iw_query_transport;
> dev->ibdev.query_pkey = c4iw_query_pkey;
> dev->ibdev.query_gid = c4iw_query_gid;
> dev->ibdev.alloc_ucontext = c4iw_alloc_ucontext;
> diff --git a/drivers/infiniband/hw/ehca/ehca_hca.c b/drivers/infiniband/hw/ehca/ehca_hca.c
> index 9ed4d25..d5a34a6 100644
> --- a/drivers/infiniband/hw/ehca/ehca_hca.c
> +++ b/drivers/infiniband/hw/ehca/ehca_hca.c
> @@ -242,6 +242,12 @@ query_port1:
> return ret;
> }
>
> +enum rdma_transport_type
> +ehca_query_transport(struct ib_device *device, u8 port_num)
> +{
> + return RDMA_TRANSPORT_IB;
> +}
> +
> int ehca_query_sma_attr(struct ehca_shca *shca,
> u8 port, struct ehca_sma_attr *attr)
> {
> diff --git a/drivers/infiniband/hw/ehca/ehca_iverbs.h b/drivers/infiniband/hw/ehca/ehca_iverbs.h
> index 22f79af..cec945f 100644
> --- a/drivers/infiniband/hw/ehca/ehca_iverbs.h
> +++ b/drivers/infiniband/hw/ehca/ehca_iverbs.h
> @@ -49,6 +49,9 @@ int ehca_query_device(struct ib_device *ibdev, struct ib_device_attr *props);
> int ehca_query_port(struct ib_device *ibdev, u8 port,
> struct ib_port_attr *props);
>
> +enum rdma_transport_type
> +ehca_query_transport(struct ib_device *device, u8 port_num);
> +
> int ehca_query_sma_attr(struct ehca_shca *shca, u8 port,
> struct ehca_sma_attr *attr);
>
> diff --git a/drivers/infiniband/hw/ehca/ehca_main.c b/drivers/infiniband/hw/ehca/ehca_main.c
> index cd8d290..60e0a09 100644
> --- a/drivers/infiniband/hw/ehca/ehca_main.c
> +++ b/drivers/infiniband/hw/ehca/ehca_main.c
> @@ -467,6 +467,7 @@ static int ehca_init_device(struct ehca_shca *shca)
> shca->ib_device.dma_device = &shca->ofdev->dev;
> shca->ib_device.query_device = ehca_query_device;
> shca->ib_device.query_port = ehca_query_port;
> + shca->ib_device.query_transport = ehca_query_transport;
> shca->ib_device.query_gid = ehca_query_gid;
> shca->ib_device.query_pkey = ehca_query_pkey;
> /* shca->in_device.modify_device = ehca_modify_device */
> diff --git a/drivers/infiniband/hw/ipath/ipath_verbs.c b/drivers/infiniband/hw/ipath/ipath_verbs.c
> index 44ea939..58d36e3 100644
> --- a/drivers/infiniband/hw/ipath/ipath_verbs.c
> +++ b/drivers/infiniband/hw/ipath/ipath_verbs.c
> @@ -1638,6 +1638,12 @@ static int ipath_query_port(struct ib_device *ibdev,
> return 0;
> }
>
> +static enum rdma_transport_type
> +ipath_query_transport(struct ib_device *device, u8 port_num)
> +{
> + return RDMA_TRANSPORT_IB;
> +}
> +
> static int ipath_modify_device(struct ib_device *device,
> int device_modify_mask,
> struct ib_device_modify *device_modify)
> @@ -2140,6 +2146,7 @@ int ipath_register_ib_device(struct ipath_devdata *dd)
> dev->query_device = ipath_query_device;
> dev->modify_device = ipath_modify_device;
> dev->query_port = ipath_query_port;
> + dev->query_transport = ipath_query_transport;
> dev->modify_port = ipath_modify_port;
> dev->query_pkey = ipath_query_pkey;
> dev->query_gid = ipath_query_gid;
> diff --git a/drivers/infiniband/hw/mlx4/main.c b/drivers/infiniband/hw/mlx4/main.c
> index 0b280b1..28100bd 100644
> --- a/drivers/infiniband/hw/mlx4/main.c
> +++ b/drivers/infiniband/hw/mlx4/main.c
> @@ -413,6 +413,15 @@ static int mlx4_ib_query_port(struct ib_device *ibdev, u8 port,
> return __mlx4_ib_query_port(ibdev, port, props, 0);
> }
>
> +static enum rdma_transport_type
> +mlx4_ib_query_transport(struct ib_device *device, u8 port_num)
> +{
> + struct mlx4_dev *dev = to_mdev(device)->dev;
> +
> + return dev->caps.port_mask[port_num] == MLX4_PORT_TYPE_IB ?
> + RDMA_TRANSPORT_IB : RDMA_TRANSPORT_IBOE;
> +}
> +
> int __mlx4_ib_query_gid(struct ib_device *ibdev, u8 port, int index,
> union ib_gid *gid, int netw_view)
> {
> @@ -2121,6 +2130,7 @@ static void *mlx4_ib_add(struct mlx4_dev *dev)
>
> ibdev->ib_dev.query_device = mlx4_ib_query_device;
> ibdev->ib_dev.query_port = mlx4_ib_query_port;
> + ibdev->ib_dev.query_transport = mlx4_ib_query_transport;
> ibdev->ib_dev.get_link_layer = mlx4_ib_port_link_layer;
> ibdev->ib_dev.query_gid = mlx4_ib_query_gid;
> ibdev->ib_dev.query_pkey = mlx4_ib_query_pkey;
> diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c
> index cc4ac1e..209c796 100644
> --- a/drivers/infiniband/hw/mlx5/main.c
> +++ b/drivers/infiniband/hw/mlx5/main.c
> @@ -351,6 +351,12 @@ out:
> return err;
> }
>
> +static enum rdma_transport_type
> +mlx5_ib_query_transport(struct ib_device *device, u8 port_num)
> +{
> + return RDMA_TRANSPORT_IB;
> +}
> +
> static int mlx5_ib_query_gid(struct ib_device *ibdev, u8 port, int index,
> union ib_gid *gid)
> {
> @@ -1336,6 +1342,7 @@ static void *mlx5_ib_add(struct mlx5_core_dev *mdev)
>
> dev->ib_dev.query_device = mlx5_ib_query_device;
> dev->ib_dev.query_port = mlx5_ib_query_port;
> + dev->ib_dev.query_transport = mlx5_ib_query_transport;
> dev->ib_dev.query_gid = mlx5_ib_query_gid;
> dev->ib_dev.query_pkey = mlx5_ib_query_pkey;
> dev->ib_dev.modify_device = mlx5_ib_modify_device;
> diff --git a/drivers/infiniband/hw/mthca/mthca_provider.c b/drivers/infiniband/hw/mthca/mthca_provider.c
> index 415f8e1..67ac6a4 100644
> --- a/drivers/infiniband/hw/mthca/mthca_provider.c
> +++ b/drivers/infiniband/hw/mthca/mthca_provider.c
> @@ -179,6 +179,12 @@ static int mthca_query_port(struct ib_device *ibdev,
> return err;
> }
>
> +static enum rdma_transport_type
> +mthca_query_transport(struct ib_device *device, u8 port_num)
> +{
> + return RDMA_TRANSPORT_IB;
> +}
> +
> static int mthca_modify_device(struct ib_device *ibdev,
> int mask,
> struct ib_device_modify *props)
> @@ -1281,6 +1287,7 @@ int mthca_register_device(struct mthca_dev *dev)
> dev->ib_dev.dma_device = &dev->pdev->dev;
> dev->ib_dev.query_device = mthca_query_device;
> dev->ib_dev.query_port = mthca_query_port;
> + dev->ib_dev.query_transport = mthca_query_transport;
> dev->ib_dev.modify_device = mthca_modify_device;
> dev->ib_dev.modify_port = mthca_modify_port;
> dev->ib_dev.query_pkey = mthca_query_pkey;
> diff --git a/drivers/infiniband/hw/nes/nes_verbs.c b/drivers/infiniband/hw/nes/nes_verbs.c
> index c0d0296..8df5b61 100644
> --- a/drivers/infiniband/hw/nes/nes_verbs.c
> +++ b/drivers/infiniband/hw/nes/nes_verbs.c
> @@ -606,6 +606,11 @@ static int nes_query_port(struct ib_device *ibdev, u8 port, struct ib_port_attr
> return 0;
> }
>
> +static enum rdma_transport_type
> +nes_query_transport(struct ib_device *device, u8 port_num)
> +{
> + return RDMA_TRANSPORT_IWARP;
> +}
>
> /**
> * nes_query_pkey
> @@ -3879,6 +3884,7 @@ struct nes_ib_device *nes_init_ofa_device(struct net_device *netdev)
> nesibdev->ibdev.dev.parent = &nesdev->pcidev->dev;
> nesibdev->ibdev.query_device = nes_query_device;
> nesibdev->ibdev.query_port = nes_query_port;
> + nesibdev->ibdev.query_transport = nes_query_transport;
> nesibdev->ibdev.query_pkey = nes_query_pkey;
> nesibdev->ibdev.query_gid = nes_query_gid;
> nesibdev->ibdev.alloc_ucontext = nes_alloc_ucontext;
> diff --git a/drivers/infiniband/hw/ocrdma/ocrdma_main.c b/drivers/infiniband/hw/ocrdma/ocrdma_main.c
> index 7a2b59a..9f4d182 100644
> --- a/drivers/infiniband/hw/ocrdma/ocrdma_main.c
> +++ b/drivers/infiniband/hw/ocrdma/ocrdma_main.c
> @@ -244,6 +244,7 @@ static int ocrdma_register_device(struct ocrdma_dev *dev)
> /* mandatory verbs. */
> dev->ibdev.query_device = ocrdma_query_device;
> dev->ibdev.query_port = ocrdma_query_port;
> + dev->ibdev.query_transport = ocrdma_query_transport;
> dev->ibdev.modify_port = ocrdma_modify_port;
> dev->ibdev.query_gid = ocrdma_query_gid;
> dev->ibdev.get_link_layer = ocrdma_link_layer;
> diff --git a/drivers/infiniband/hw/ocrdma/ocrdma_verbs.c b/drivers/infiniband/hw/ocrdma/ocrdma_verbs.c
> index 8771755..73bace4 100644
> --- a/drivers/infiniband/hw/ocrdma/ocrdma_verbs.c
> +++ b/drivers/infiniband/hw/ocrdma/ocrdma_verbs.c
> @@ -187,6 +187,12 @@ int ocrdma_query_port(struct ib_device *ibdev,
> return 0;
> }
>
> +enum rdma_transport_type
> +ocrdma_query_transport(struct ib_device *device, u8 port_num)
> +{
> + return RDMA_TRANSPORT_IBOE;
> +}
> +
> int ocrdma_modify_port(struct ib_device *ibdev, u8 port, int mask,
> struct ib_port_modify *props)
> {
> diff --git a/drivers/infiniband/hw/ocrdma/ocrdma_verbs.h b/drivers/infiniband/hw/ocrdma/ocrdma_verbs.h
> index b8f7853..4a81b63 100644
> --- a/drivers/infiniband/hw/ocrdma/ocrdma_verbs.h
> +++ b/drivers/infiniband/hw/ocrdma/ocrdma_verbs.h
> @@ -41,6 +41,9 @@ int ocrdma_query_port(struct ib_device *, u8 port, struct ib_port_attr *props);
> int ocrdma_modify_port(struct ib_device *, u8 port, int mask,
> struct ib_port_modify *props);
>
> +enum rdma_transport_type
> +ocrdma_query_transport(struct ib_device *device, u8 port_num);
> +
> void ocrdma_get_guid(struct ocrdma_dev *, u8 *guid);
> int ocrdma_query_gid(struct ib_device *, u8 port,
> int index, union ib_gid *gid);
> diff --git a/drivers/infiniband/hw/qib/qib_verbs.c b/drivers/infiniband/hw/qib/qib_verbs.c
> index 4a35998..caad665 100644
> --- a/drivers/infiniband/hw/qib/qib_verbs.c
> +++ b/drivers/infiniband/hw/qib/qib_verbs.c
> @@ -1650,6 +1650,12 @@ static int qib_query_port(struct ib_device *ibdev, u8 port,
> return 0;
> }
>
> +static enum rdma_transport_type
> +qib_query_transport(struct ib_device *device, u8 port_num)
> +{
> + return RDMA_TRANSPORT_IB;
> +}
> +
> static int qib_modify_device(struct ib_device *device,
> int device_modify_mask,
> struct ib_device_modify *device_modify)
> @@ -2184,6 +2190,7 @@ int qib_register_ib_device(struct qib_devdata *dd)
> ibdev->query_device = qib_query_device;
> ibdev->modify_device = qib_modify_device;
> ibdev->query_port = qib_query_port;
> + ibdev->query_transport = qib_query_transport;
> ibdev->modify_port = qib_modify_port;
> ibdev->query_pkey = qib_query_pkey;
> ibdev->query_gid = qib_query_gid;
> diff --git a/drivers/infiniband/hw/usnic/usnic_ib_main.c b/drivers/infiniband/hw/usnic/usnic_ib_main.c
> index 0d0f986..03ea9f3 100644
> --- a/drivers/infiniband/hw/usnic/usnic_ib_main.c
> +++ b/drivers/infiniband/hw/usnic/usnic_ib_main.c
> @@ -360,6 +360,7 @@ static void *usnic_ib_device_add(struct pci_dev *dev)
>
> us_ibdev->ib_dev.query_device = usnic_ib_query_device;
> us_ibdev->ib_dev.query_port = usnic_ib_query_port;
> + us_ibdev->ib_dev.query_transport = usnic_ib_query_transport;
> us_ibdev->ib_dev.query_pkey = usnic_ib_query_pkey;
> us_ibdev->ib_dev.query_gid = usnic_ib_query_gid;
> us_ibdev->ib_dev.get_link_layer = usnic_ib_port_link_layer;
> diff --git a/drivers/infiniband/hw/usnic/usnic_ib_verbs.c b/drivers/infiniband/hw/usnic/usnic_ib_verbs.c
> index 53bd6a2..ff9a5f7 100644
> --- a/drivers/infiniband/hw/usnic/usnic_ib_verbs.c
> +++ b/drivers/infiniband/hw/usnic/usnic_ib_verbs.c
> @@ -348,6 +348,12 @@ int usnic_ib_query_port(struct ib_device *ibdev, u8 port,
> return 0;
> }
>
> +enum rdma_transport_type
> +usnic_ib_query_transport(struct ib_device *device, u8 port_num)
> +{
> + return RDMA_TRANSPORT_USNIC_UDP;
> +}
> +
> int usnic_ib_query_qp(struct ib_qp *qp, struct ib_qp_attr *qp_attr,
> int qp_attr_mask,
> struct ib_qp_init_attr *qp_init_attr)
> diff --git a/drivers/infiniband/hw/usnic/usnic_ib_verbs.h b/drivers/infiniband/hw/usnic/usnic_ib_verbs.h
> index bb864f5..0b1633b 100644
> --- a/drivers/infiniband/hw/usnic/usnic_ib_verbs.h
> +++ b/drivers/infiniband/hw/usnic/usnic_ib_verbs.h
> @@ -27,6 +27,8 @@ int usnic_ib_query_device(struct ib_device *ibdev,
> struct ib_device_attr *props);
> int usnic_ib_query_port(struct ib_device *ibdev, u8 port,
> struct ib_port_attr *props);
> +enum rdma_transport_type
> +usnic_ib_query_transport(struct ib_device *device, u8 port_num);
> int usnic_ib_query_qp(struct ib_qp *qp, struct ib_qp_attr *qp_attr,
> int qp_attr_mask,
> struct ib_qp_init_attr *qp_init_attr);
> diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
> index 65994a1..d54f91e 100644
> --- a/include/rdma/ib_verbs.h
> +++ b/include/rdma/ib_verbs.h
> @@ -75,10 +75,13 @@ enum rdma_node_type {
> };
>
> enum rdma_transport_type {
> + /* legacy for users */
> RDMA_TRANSPORT_IB,
> RDMA_TRANSPORT_IWARP,
> RDMA_TRANSPORT_USNIC,
> - RDMA_TRANSPORT_USNIC_UDP
> + RDMA_TRANSPORT_USNIC_UDP,
> + /* new transport */
> + RDMA_TRANSPORT_IBOE,
> };
>
> __attribute_const__ enum rdma_transport_type
> @@ -1501,6 +1504,8 @@ struct ib_device {
> int (*query_port)(struct ib_device *device,
> u8 port_num,
> struct ib_port_attr *port_attr);
> + enum rdma_transport_type (*query_transport)(struct ib_device *device,
> + u8 port_num);
> enum rdma_link_layer (*get_link_layer)(struct ib_device *device,
> u8 port_num);
> int (*query_gid)(struct ib_device *device,
>

2015-04-07 15:54:23

by Tom Talpey

Subject: Re: [PATCH v2 09/17] IB/Verbs: Use helper cap_read_multi_sge() and reform svc_rdma_accept()

On 4/7/2015 8:34 AM, Michael Wang wrote:
> /**
> + * cap_read_multi_sge - Check if the port of device has the capability
> + * RDMA Read Multiple Scatter-Gather Entries.
> + *
> + * @device: Device to be checked
> + * @port_num: Port number of the device
> + *
> + * Return 0 when port of the device don't support
> + * RDMA Read Multiple Scatter-Gather Entries.
> + */
> +static inline int cap_read_multi_sge(struct ib_device *device, u8 port_num)
> +{
> + return !rdma_transport_iwarp(device, port_num);
> +}

This just papers over the issue we discussed earlier. How *many*
entries does the device support? If a device supports one, or two,
is that enough? How does the upper layer know the limit?

This needs an explicit device attribute, to be fixed properly.
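
As a rough illustration of what an explicit attribute could look like, the sketch below carries a separate read limit next to the generic SGE limit. The struct dev_attr_sketch, its field max_sge_rd and the helper read_sge_limit() are hypothetical names used for this example only; they are not part of the patch set as posted.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-device limits; not the real struct ib_device_attr. */
struct dev_attr_sketch {
	int max_sge;	/* SGEs per SEND/WRITE work request */
	int max_sge_rd;	/* SGEs per RDMA READ work request (assumed field) */
};

/*
 * Upper-layer view: clamp the wanted SGE count to what the device
 * reports for RDMA READ, without looking at the transport type.
 */
static int read_sge_limit(const struct dev_attr_sketch *attr, int sge_count)
{
	assert(attr != NULL);
	return sge_count < attr->max_sge_rd ? sge_count : attr->max_sge_rd;
}
```

With such a field the upper layer would read the limit directly, so a device that supports one, two or many read SGEs is handled by the same code path.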

> +
> +/**
> * cap_ipoib - Check if the port of device has the capability
> * IP over Infiniband.
> *
> diff --git a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
> index e011027..604d035 100644
> --- a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
> +++ b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
> @@ -118,8 +118,8 @@ static void rdma_build_arg_xdr(struct svc_rqst *rqstp,
>
> static int rdma_read_max_sge(struct svcxprt_rdma *xprt, int sge_count)
> {
> - if (rdma_node_get_transport(xprt->sc_cm_id->device->node_type) ==
> - RDMA_TRANSPORT_IWARP)
> + if (!cap_read_multi_sge(xprt->sc_cm_id->device,
> + xprt->sc_cm_id->port_num))
> return 1;
> else
> return min_t(int, sge_count, xprt->sc_max_sge);

This is incorrect. The RDMA Read max is not at all the same as the
max_sge. It is a different operation, with a different set of work
request parameters.

In other words, the above same comment applies.


> diff --git a/net/sunrpc/xprtrdma/svc_rdma_transport.c b/net/sunrpc/xprtrdma/svc_rdma_transport.c
> index 4e61880..e75175d 100644
> --- a/net/sunrpc/xprtrdma/svc_rdma_transport.c
> +++ b/net/sunrpc/xprtrdma/svc_rdma_transport.c
> @@ -979,8 +979,8 @@ static struct svc_xprt *svc_rdma_accept(struct svc_xprt *xprt)
> /*
> * Determine if a DMA MR is required and if so, what privs are required
> */
> - switch (rdma_node_get_transport(newxprt->sc_cm_id->device->node_type)) {
> - case RDMA_TRANSPORT_IWARP:
> + if (rdma_transport_iwarp(newxprt->sc_cm_id->device,
> + newxprt->sc_cm_id->port_num)) {
> newxprt->sc_dev_caps |= SVCRDMA_DEVCAP_READ_W_INV;

Do I read this correctly that it is forcing the "read with invalidate"
capability to "on" for all iWARP devices? I don't think that is correct,
for the legacy devices you're also supporting.


> @@ -992,8 +992,8 @@ static struct svc_xprt *svc_rdma_accept(struct svc_xprt *xprt)
> dma_mr_acc = IB_ACCESS_LOCAL_WRITE;
> } else
> need_dma_mr = 0;
> - break;
> - case RDMA_TRANSPORT_IB:
> + } else if (rdma_ib_mgmt(newxprt->sc_cm_id->device,
> + newxprt->sc_cm_id->port_num)) {
> if (!(newxprt->sc_dev_caps & SVCRDMA_DEVCAP_FAST_REG)) {
> need_dma_mr = 1;
> dma_mr_acc = IB_ACCESS_LOCAL_WRITE;

Now I'm even more confused. How is the presence of IB management
related to needing a privileged lmr?



2015-04-07 16:05:22

by Michael Wang

Subject: Re: [PATCH v2 09/17] IB/Verbs: Use helper cap_read_multi_sge() and reform svc_rdma_accept()

Hi, Tom

Thanks for the comments :-)

On 04/07/2015 05:46 PM, Tom Talpey wrote:
> On 4/7/2015 8:34 AM, Michael Wang wrote:
>> /**
>> + * cap_read_multi_sge - Check if the port of device has the capability
>> + * RDMA Read Multiple Scatter-Gather Entries.
>> + *
>> + * @device: Device to be checked
>> + * @port_num: Port number of the device
>> + *
>> + * Return 0 when port of the device don't support
>> + * RDMA Read Multiple Scatter-Gather Entries.
>> + */
>> +static inline int cap_read_multi_sge(struct ib_device *device, u8 port_num)
>> +{
>> + return !rdma_transport_iwarp(device, port_num);
>> +}
>
> This just papers over the issue we discussed earlier. How *many*
> entries does the device support? If a device supports one, or two,
> is that enough? How does the upper layer know the limit?
>
> This needs an explicit device attribute, to be fixed properly.

This is a prototype meant to expose the problem we have here. I
would prefer that someone familiar with this part extend the API
in the future, based on the right logic.

For now the helper just inherits the legacy behaviour; it is
implemented this way to stay compatible with the current code.

>
>> +
>> +/**
>> * cap_ipoib - Check if the port of device has the capability
>> * IP over Infiniband.
>> *
>> diff --git a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
>> index e011027..604d035 100644
>> --- a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
>> +++ b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
>> @@ -118,8 +118,8 @@ static void rdma_build_arg_xdr(struct svc_rqst *rqstp,
>>
>> static int rdma_read_max_sge(struct svcxprt_rdma *xprt, int sge_count)
>> {
>> - if (rdma_node_get_transport(xprt->sc_cm_id->device->node_type) ==
>> - RDMA_TRANSPORT_IWARP)
>> + if (!cap_read_multi_sge(xprt->sc_cm_id->device,
>> + xprt->sc_cm_id->port_num))
>> return 1;
>> else
>> return min_t(int, sge_count, xprt->sc_max_sge);
>
> This is incorrect. The RDMA Read max is not at all the same as the
> max_sge. It is a different operation, with a different set of work
> request parameters.
>
> In other words, the above same comment applies.

Any idea on how to improve this part?

Again, all these helpers just inherit the old logic; if
it's wrong, let's correct it ;-)

And if we don't know how to correct it, we can leave this as a
signpost and wait for someone familiar with this particular part
to fix it.

>
>
>> diff --git a/net/sunrpc/xprtrdma/svc_rdma_transport.c b/net/sunrpc/xprtrdma/svc_rdma_transport.c
>> index 4e61880..e75175d 100644
>> --- a/net/sunrpc/xprtrdma/svc_rdma_transport.c
>> +++ b/net/sunrpc/xprtrdma/svc_rdma_transport.c
>> @@ -979,8 +979,8 @@ static struct svc_xprt *svc_rdma_accept(struct svc_xprt *xprt)
>> /*
>> * Determine if a DMA MR is required and if so, what privs are required
>> */
>> - switch (rdma_node_get_transport(newxprt->sc_cm_id->device->node_type)) {
>> - case RDMA_TRANSPORT_IWARP:
>> + if (rdma_transport_iwarp(newxprt->sc_cm_id->device,
>> + newxprt->sc_cm_id->port_num)) {
>> newxprt->sc_dev_caps |= SVCRDMA_DEVCAP_READ_W_INV;
>
> Do I read this correctly that it is forcing the "read with invalidate"
> capability to "on" for all iWARP devices? I don't think that is correct,
> for the legacy devices you're also supporting.

Hmm.. but that's exactly the same as the old logic, correct?
Or do you mean the old logic is wrong?

>
>
>> @@ -992,8 +992,8 @@ static struct svc_xprt *svc_rdma_accept(struct svc_xprt *xprt)
>> dma_mr_acc = IB_ACCESS_LOCAL_WRITE;
>> } else
>> need_dma_mr = 0;
>> - break;
>> - case RDMA_TRANSPORT_IB:
>> + } else if (rdma_ib_mgmt(newxprt->sc_cm_id->device,
>> + newxprt->sc_cm_id->port_num)) {
>> if (!(newxprt->sc_dev_caps & SVCRDMA_DEVCAP_FAST_REG)) {
>> need_dma_mr = 1;
>> dma_mr_acc = IB_ACCESS_LOCAL_WRITE;
>
> Now I'm even more confused. How is the presence of IB management
> related to needing a privileged lmr?

I think you actually mean we need another wrapper here
with the right name, correct?

I'm not familiar with this part, any suggestions?

Regards,
Michael Wang

>
>

2015-04-07 17:26:31

by Jason Gunthorpe

Subject: Re: [PATCH v2 03/17] IB/Verbs: Use management helper cap_ib_mad() for mad-check

On Tue, Apr 07, 2015 at 02:30:22PM +0200, Michael Wang wrote:

> - if (rdma_node_get_transport(device->node_type) != RDMA_TRANSPORT_IB)
> - return;
> -
> if (device->node_type == RDMA_NODE_IB_SWITCH) {
> start = 0;
> end = 0;
> @@ -3069,6 +3066,9 @@ static void ib_mad_init_device(struct ib_device *device)
> }
>
> for (i = start; i <= end; i++) {
> + if (!cap_ib_mad(device, i))
> + continue;
> +

I would prefer to see these changes in control flow as dedicated
patches, at the top of your patch stack.

For this kind of work a patch should be mechanical changes only, it is
easier to review that way.

Same comment applies throughout.
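
A toy model of why moving the test into the loop is a control-flow change rather than a mechanical substitution. Everything here (struct toy_dev and both helpers) is hypothetical and only mimics the shape of the two versions; it is not the MAD code.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy model: one flag per port saying whether MAD is supported there. */
struct toy_dev {
	bool port_has_mad[4];
	size_t nports;
};

/* Old shape: one device-wide test, then every port is initialised. */
static size_t init_ports_device_check(const struct toy_dev *d, bool dev_is_ib)
{
	if (!dev_is_ib)
		return 0;
	return d->nports;
}

/* New shape: the test moves into the loop and is made per port. */
static size_t init_ports_port_check(const struct toy_dev *d)
{
	size_t i, done = 0;

	for (i = 0; i < d->nports; i++) {
		if (!d->port_has_mad[i])
			continue;
		done++;
	}
	return done;
}
```

The two shapes agree when all ports look alike and can differ when they do not, which is the kind of change that is easier to review in its own patch.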

Jason

2015-04-07 17:42:38

by Jason Gunthorpe

Subject: Re: [PATCH v2 09/17] IB/Verbs: Use helper cap_read_multi_sge() and reform svc_rdma_accept()

On Tue, Apr 07, 2015 at 11:46:57AM -0400, Tom Talpey wrote:
> On 4/7/2015 8:34 AM, Michael Wang wrote:
> > /**
> >+ * cap_read_multi_sge - Check if the port of device has the capability
> >+ * RDMA Read Multiple Scatter-Gather Entries.
> >+ *
> >+ * @device: Device to be checked
> >+ * @port_num: Port number of the device
> >+ *
> >+ * Return 0 when port of the device don't support
> >+ * RDMA Read Multiple Scatter-Gather Entries.
> >+ */
> >+static inline int cap_read_multi_sge(struct ib_device *device, u8 port_num)
> >+{
> >+ return !rdma_transport_iwarp(device, port_num);
> >+}
>
> This just papers over the issue we discussed earlier. How *many*
> entries does the device support? If a device supports one, or two,
> is that enough? How does the upper layer know the limit?

I think Michael is fine to just make this one mechanical change.

The kernel only supports two kinds of devices today, ones with 1 read
SGE and ones where READ SGE == WRITE SGE == SEND SGE.

If someone makes another variation then it is up to them to propose a
better fix.


> > static int rdma_read_max_sge(struct svcxprt_rdma *xprt, int sge_count)
> > {
> >- if (rdma_node_get_transport(xprt->sc_cm_id->device->node_type) ==
> >- RDMA_TRANSPORT_IWARP)
> >+ if (!cap_read_multi_sge(xprt->sc_cm_id->device,
> >+ xprt->sc_cm_id->port_num))
> > return 1;
> > else
> > return min_t(int, sge_count, xprt->sc_max_sge);
>
> This is incorrect. The RDMA Read max is not at all the same as the
> max_sge. It is a different operation, with a different set of work
> request parameters.

The algorithm looks OK to me,

newxprt->sc_max_sge = min((size_t)devattr.max_sge,
(size_t)RPCSVC_MAXPAGES);

So it returns 1 or the number of sge entries per WR, and max_sge is
for READ/WRITE/SEND in every case except when cap_read_multi_sge()
returns 0, i.e. the single-SGE read case.
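
To make the algorithm concrete, here is a minimal stand-alone model of it. struct xprt_sketch and both helpers are hypothetical stand-ins for the svcrdma state; the page limit is passed in as a parameter rather than assuming the value of RPCSVC_MAXPAGES.

```c
#include <stdbool.h>

static int min_int(int a, int b)
{
	return a < b ? a : b;
}

/* Stand-in for the transport state consulted by rdma_read_max_sge(). */
struct xprt_sketch {
	bool read_multi_sge;	/* what cap_read_multi_sge() would report */
	int sc_max_sge;		/* min(devattr.max_sge, page limit), set at accept time */
};

static void xprt_sketch_init(struct xprt_sketch *x, bool read_multi_sge,
			     int dev_max_sge, int max_pages)
{
	x->read_multi_sge = read_multi_sge;
	x->sc_max_sge = min_int(dev_max_sge, max_pages);
}

/* Returns 1 for single-SGE readers, else the clamped SGE count. */
static int read_max_sge_sketch(const struct xprt_sketch *x, int sge_count)
{
	if (!x->read_multi_sge)
		return 1;
	return min_int(sge_count, x->sc_max_sge);
}
```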

> > /*
> > * Determine if a DMA MR is required and if so, what privs are required
> > */
> >- switch (rdma_node_get_transport(newxprt->sc_cm_id->device->node_type)) {
> >- case RDMA_TRANSPORT_IWARP:
> >+ if (rdma_transport_iwarp(newxprt->sc_cm_id->device,
> >+ newxprt->sc_cm_id->port_num)) {
> > newxprt->sc_dev_caps |= SVCRDMA_DEVCAP_READ_W_INV;
>
> Do I read this correctly that it is forcing the "read with invalidate"
> capability to "on" for all iWARP devices? I don't think that is correct,
> for the legacy devices you're also supporting.

No idea here, this logic was added in:

commit 3a5c63803d0552a3ad93b85c262f12cd86471443
Author: Tom Tucker <[email protected]>
Date: Tue Sep 30 13:46:13 2008 -0500

svcrdma: Query device for Fast Reg support during connection setup

Query the device capabilities in the svc_rdma_accept function to determine
what advanced memory management capabilities are supported by the device.
Based on the query, select the most secure model available given the
requirements of the transport and capabilities of the adapter.

Signed-off-by: Tom Tucker <[email protected]>

> >@@ -992,8 +992,8 @@ static struct svc_xprt *svc_rdma_accept(struct svc_xprt *xprt)
> > dma_mr_acc = IB_ACCESS_LOCAL_WRITE;
> > } else
> > need_dma_mr = 0;
> >- break;
> >- case RDMA_TRANSPORT_IB:
> >+ } else if (rdma_ib_mgmt(newxprt->sc_cm_id->device,
> >+ newxprt->sc_cm_id->port_num)) {
> > if (!(newxprt->sc_dev_caps & SVCRDMA_DEVCAP_FAST_REG)) {
> > need_dma_mr = 1;
> > dma_mr_acc = IB_ACCESS_LOCAL_WRITE;
>
> Now I'm even more confused. How is the presence of IB management
> related to needing a privileged lmr?

Agree, this needs to be someone else.

I think the test is probably based on this comment:

* NB: iWARP requires remote write access for the data sink
* of an RDMA_READ. IB does not.

So the if should be:

if (cap_rdma_read_needs_write(..) &&
!(newxprt->sc_dev_caps & SVCRDMA_DEVCAP_FAST_REG)) {
need_dma_mr = 1;
dma_mr_acc =
(IB_ACCESS_LOCAL_WRITE |
IB_ACCESS_REMOTE_WRITE);

And the identical if blocks merged.

Plus the
if (rdma_transport_iwarp(newxprt->sc_cm_id->device,
newxprt->sc_cm_id->port_num))
newxprt->sc_dev_caps |= SVCRDMA_DEVCAP_READ_W_INV
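
Roughly, the merged decision could be modelled as below. This is only a sketch: the flag values are illustrative, choose_dma_mr() is a made-up name, and the local_dma_lkey input is an assumption, since the condition of the middle branch is not visible in the quoted hunk.

```c
#include <stdbool.h>

/* Illustrative flag values, not the kernel's IB_ACCESS_* constants. */
enum { ACC_LOCAL_WRITE = 1 << 0, ACC_REMOTE_WRITE = 1 << 1 };

struct mr_choice {
	bool need_dma_mr;
	int dma_mr_acc;
};

/*
 * read_needs_write: what a cap_rdma_read_needs_write() helper would report.
 * fast_reg:         SVCRDMA_DEVCAP_FAST_REG is set.
 * local_dma_lkey:   assumed input for the branch not shown in the hunk.
 */
static struct mr_choice choose_dma_mr(bool read_needs_write, bool fast_reg,
				      bool local_dma_lkey)
{
	struct mr_choice c = { false, 0 };

	if (read_needs_write && !fast_reg) {
		c.need_dma_mr = true;
		c.dma_mr_acc = ACC_LOCAL_WRITE | ACC_REMOTE_WRITE;
	} else if (!fast_reg || !local_dma_lkey) {
		c.need_dma_mr = true;
		c.dma_mr_acc = ACC_LOCAL_WRITE;
	}
	return c;
}
```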

Jason

2015-04-07 18:40:20

by Hefty, Sean

Subject: RE: [PATCH v2 10/17] IB/Verbs: Adopt management helpers for IB helpers

> diff --git a/drivers/infiniband/core/sa_query.c
> b/drivers/infiniband/core/sa_query.c
> index f704254..4e61104 100644
> --- a/drivers/infiniband/core/sa_query.c
> +++ b/drivers/infiniband/core/sa_query.c
> @@ -540,7 +540,7 @@ int ib_init_ah_from_path(struct ib_device *device, u8
> port_num,
>  	ah_attr->port_num = port_num;
>  	ah_attr->static_rate = rec->rate;
>
> -	force_grh = rdma_port_get_link_layer(device, port_num) ==
> IB_LINK_LAYER_ETHERNET;
> +	force_grh = !rdma_transport_ib(device, port_num);
>
>  	if (rec->hop_limit > 1 || force_grh) {
>  		ah_attr->ah_flags = IB_AH_GRH;
> diff --git a/drivers/infiniband/core/verbs.c
> b/drivers/infiniband/core/verbs.c
> index 83370de..ca06f76 100644
> --- a/drivers/infiniband/core/verbs.c
> +++ b/drivers/infiniband/core/verbs.c
> @@ -200,11 +200,9 @@ int ib_init_ah_from_wc(struct ib_device *device, u8
> port_num, struct ib_wc *wc,
>  	u32 flow_class;
>  	u16 gid_index;
>  	int ret;
> -	int is_eth = (rdma_port_get_link_layer(device, port_num) ==
> -			IB_LINK_LAYER_ETHERNET);
>
>  	memset(ah_attr, 0, sizeof *ah_attr);
> -	if (is_eth) {
> +	if (!rdma_transport_ib(device, port_num)) {
>  		if (!(wc->wc_flags & IB_WC_GRH))
>  			return -EPROTOTYPE;
>
> @@ -873,7 +871,7 @@ int ib_resolve_eth_l2_attrs(struct ib_qp *qp,
>  	union ib_gid  sgid;
>
>  	if ((*qp_attr_mask & IB_QP_AV)  &&
> -	    (rdma_port_get_link_layer(qp->device, qp_attr->ah_attr.port_num)
> == IB_LINK_LAYER_ETHERNET)) {
> +	    (!rdma_transport_ib(qp->device, qp_attr->ah_attr.port_num))) {
>  		ret = ib_query_gid(qp->device, qp_attr->ah_attr.port_num,
>  				   qp_attr->ah_attr.grh.sgid_index, &sgid);
>  		if (ret)

The above checks would be better as:

	force_grh = rdma_transport_iboe(...)

They are RoCE/IBoE specific checks.

2015-04-07 18:49:55

by Hefty, Sean

[permalink] [raw]
Subject: RE: [PATCH v2 11/17] IB/Verbs: Reform link_layer_show() and ib_uverbs_query_port()

> diff --git a/drivers/infiniband/core/sysfs.c
> b/drivers/infiniband/core/sysfs.c
> index cbd0383..aa53e40 100644
> --- a/drivers/infiniband/core/sysfs.c
> +++ b/drivers/infiniband/core/sysfs.c
> @@ -248,14 +248,10 @@ static ssize_t phys_state_show(struct ib_port *p,
> struct port_attribute *unused,
>  static ssize_t link_layer_show(struct ib_port *p, struct port_attribute
> *unused,
>  			       char *buf)
>  {
> -	switch (rdma_port_get_link_layer(p->ibdev, p->port_num)) {
> -	case IB_LINK_LAYER_INFINIBAND:
> +	if (rdma_transport_ib(p->ibdev, p->port_num))
>  		return sprintf(buf, "%s\n", "InfiniBand");
> -	case IB_LINK_LAYER_ETHERNET:
> +	else
>  		return sprintf(buf, "%s\n", "Ethernet");
> -	default:
> -		return sprintf(buf, "%s\n", "Unknown");
> -	}
>  }
> 
>  static PORT_ATTR_RO(state);
> diff --git a/drivers/infiniband/core/uverbs_cmd.c
> b/drivers/infiniband/core/uverbs_cmd.c
> index a9f0489..3eb6eb5 100644
> --- a/drivers/infiniband/core/uverbs_cmd.c
> +++ b/drivers/infiniband/core/uverbs_cmd.c
> @@ -515,8 +515,10 @@ ssize_t ib_uverbs_query_port(struct ib_uverbs_file
> *file,
>  	resp.active_width    = attr.active_width;
>  	resp.active_speed    = attr.active_speed;
>  	resp.phys_state      = attr.phys_state;
> -	resp.link_layer      = rdma_port_get_link_layer(file->device-
> >ib_dev,
> -							cmd.port_num);
> +	resp.link_layer      = rdma_transport_ib(file->device->ib_dev,
> +							cmd.port_num) ?
> +			       IB_LINK_LAYER_INFINIBAND :
> +			       IB_LINK_LAYER_ETHERNET;
> 
>  	if (copy_to_user((void __user *) (unsigned long) cmd.response,
>  			 &resp, sizeof resp))

Both of the above check the transport in order to determine the link layer.

These values are exposed to user space.  Does anyone know what link layer iWarp returns to user space?

2015-04-07 18:56:45

by Steve Wise

[permalink] [raw]
Subject: RE: [PATCH v2 11/17] IB/Verbs: Reform link_layer_show() and ib_uverbs_query_port()

>
> > diff --git a/drivers/infiniband/core/sysfs.c
> > b/drivers/infiniband/core/sysfs.c
> > index cbd0383..aa53e40 100644
> > --- a/drivers/infiniband/core/sysfs.c
> > +++ b/drivers/infiniband/core/sysfs.c
> > @@ -248,14 +248,10 @@ static ssize_t phys_state_show(struct ib_port *p,
> > struct port_attribute *unused,
> > static ssize_t link_layer_show(struct ib_port *p, struct port_attribute
> > *unused,
> > char *buf)
> > {
> > - switch (rdma_port_get_link_layer(p->ibdev, p->port_num)) {
> > - case IB_LINK_LAYER_INFINIBAND:
> > + if (rdma_transport_ib(p->ibdev, p->port_num))
> > return sprintf(buf, "%s\n", "InfiniBand");
> > - case IB_LINK_LAYER_ETHERNET:
> > + else
> > return sprintf(buf, "%s\n", "Ethernet");
> > - default:
> > - return sprintf(buf, "%s\n", "Unknown");
> > - }
> > }
> >
> > static PORT_ATTR_RO(state);
> > diff --git a/drivers/infiniband/core/uverbs_cmd.c
> > b/drivers/infiniband/core/uverbs_cmd.c
> > index a9f0489..3eb6eb5 100644
> > --- a/drivers/infiniband/core/uverbs_cmd.c
> > +++ b/drivers/infiniband/core/uverbs_cmd.c
> > @@ -515,8 +515,10 @@ ssize_t ib_uverbs_query_port(struct ib_uverbs_file
> > *file,
> > resp.active_width = attr.active_width;
> > resp.active_speed = attr.active_speed;
> > resp.phys_state = attr.phys_state;
> > - resp.link_layer = rdma_port_get_link_layer(file->device-
> > >ib_dev,
> > - cmd.port_num);
> > + resp.link_layer = rdma_transport_ib(file->device->ib_dev,
> > + cmd.port_num) ?
> > + IB_LINK_LAYER_INFINIBAND :
> > + IB_LINK_LAYER_ETHERNET;
> >
> > if (copy_to_user((void __user *) (unsigned long) cmd.response,
> > &resp, sizeof resp))
>
> Both of the above check the transport in order to determine the link layer.
>
> These values are exposed to user space. Does anyone know what link layer iWarp returns to user space?

Ethernet:

t4:~ # ibv_devinfo -d cxgb4_0|grep link_layer
link_layer: Ethernet
link_layer: Ethernet

Steve.
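
For reference, here is a small user-space model of what the patched ternary would report, assuming the "new-transport" column of the cover letter's mapping list. This is only a sketch: the enum and function names are made up for illustration and are not the kernel's.

```c
/* Hypothetical stand-ins for the kernel enums; the values are illustrative. */
enum model_transport { MODEL_IB, MODEL_IBOE, MODEL_IWARP, MODEL_USNIC_UDP };
enum model_link_layer { MODEL_LL_INFINIBAND, MODEL_LL_ETHERNET };

/*
 * Mirrors the shape of the patched expression:
 *   rdma_transport_ib(...) ? IB_LINK_LAYER_INFINIBAND : IB_LINK_LAYER_ETHERNET
 * Everything that is not plain IB is reported as Ethernet.
 */
static inline enum model_link_layer
model_reported_link_layer(enum model_transport t)
{
	return t == MODEL_IB ? MODEL_LL_INFINIBAND : MODEL_LL_ETHERNET;
}
```

Under that assumption an iWARP port is still reported as Ethernet, which matches the ibv_devinfo output above. The model has no counterpart to the "Unknown" string that the removed default: case used to print.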


2015-04-07 20:13:17

by Jason Gunthorpe

[permalink] [raw]
Subject: Re: [PATCH v2 10/17] IB/Verbs: Adopt management helpers for IB helpers

On Tue, Apr 07, 2015 at 02:35:22PM +0200, Michael Wang wrote:
> index f704254..4e61104 100644
> +++ b/drivers/infiniband/core/sa_query.c
> @@ -540,7 +540,7 @@ int ib_init_ah_from_path(struct ib_device *device, u8 port_num,
> ah_attr->port_num = port_num;
> ah_attr->static_rate = rec->rate;
>
> - force_grh = rdma_port_get_link_layer(device, port_num) == IB_LINK_LAYER_ETHERNET;
> + force_grh = !rdma_transport_ib(device, port_num);

Maybe these tests should be called cap_mandatory_grh - but I'm not
really sure how iWarp uses the GRH fields in the AH...

Jason

2015-04-07 20:16:28

by Steve Wise

[permalink] [raw]
Subject: RE: [PATCH v2 10/17] IB/Verbs: Adopt management helpers for IB helpers



> -----Original Message-----
> From: Jason Gunthorpe [mailto:[email protected]]
> Sent: Tuesday, April 07, 2015 3:13 PM
> To: Michael Wang
> Cc: Roland Dreier; Sean Hefty; [email protected]; [email protected]; [email protected];
> [email protected]; Hal Rosenstock; Tom Tucker; Steve Wise; Hoang-Nam Nguyen; Christoph Raisch; Mike Marciniszyn; Eli Cohen;
> Faisal Latif; Upinder Malhi; Trond Myklebust; J. Bruce Fields; David S. Miller; Ira Weiny; PJ Waskiewicz; Tatyana Nikolova; Or Gerlitz; Jack
> Morgenstein; Haggai Eran; Ilya Nelkenbaum; Yann Droneaud; Bart Van Assche; Shachar Raindel; Sagi Grimberg; Devesh Sharma; Matan
> Barak; Moni Shoua; Jiri Kosina; Selvin Xavier; Mitesh Ahuja; Li RongQing; Rasmus Villemoes; Alex Estrin; Doug Ledford; Eric Dumazet; Erez
> Shitrit; Tom Gundersen; Chuck Lever
> Subject: Re: [PATCH v2 10/17] IB/Verbs: Adopt management helpers for IB helpers
>
> On Tue, Apr 07, 2015 at 02:35:22PM +0200, Michael Wang wrote:
> > index f704254..4e61104 100644
> > +++ b/drivers/infiniband/core/sa_query.c
> > @@ -540,7 +540,7 @@ int ib_init_ah_from_path(struct ib_device *device, u8 port_num,
> > ah_attr->port_num = port_num;
> > ah_attr->static_rate = rec->rate;
> >
> > - force_grh = rdma_port_get_link_layer(device, port_num) == IB_LINK_LAYER_ETHERNET;
> > + force_grh = !rdma_transport_ib(device, port_num);
>
> Maybe these tests should be called cap_mandatory_grh - but I'm not
> really sure how iWarp uses the GRH fields in the AH...
>

iWARP runs on top of TCP...this SA code is all IB-specific. The reason it was checking for ETHERNET, I think, is for RoCE. So
this change is totally incorrect, I think, because RoCE is an IB transport, but it runs on ETHERNET.

Steve.
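
To make the candidate force_grh tests easier to compare, here is a user-space sketch that evaluates three of them against a few rows of the cover letter's mapping list. It is not kernel code: the struct, the enums and the three predicate names are hypothetical, and splitting mlx4 into one row per port type is an assumption.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel enums; the values are illustrative. */
enum model_ll { LL_IB, LL_ETH };
enum model_tp { TP_IB, TP_IBOE, TP_IWARP, TP_USNIC_UDP };

struct model_port {
	const char *driver;
	enum model_ll link_layer;
	enum model_tp transport;	/* "new-transport" column */
};

/* Rows copied from the mapping list in the cover letter. */
static const struct model_port model_ports[] = {
	{ "cxgb4",      LL_ETH, TP_IWARP },
	{ "usnic",      LL_ETH, TP_USNIC_UDP },
	{ "ocrdma",     LL_ETH, TP_IBOE },
	{ "mlx4 (eth)", LL_ETH, TP_IBOE },
	{ "mlx4 (ib)",  LL_IB,  TP_IB },
	{ "mlx5",       LL_IB,  TP_IB },
};

/* Old test: link layer is Ethernet. */
static inline bool force_grh_link_layer(const struct model_port *p)
{
	return p->link_layer == LL_ETH;
}

/* Test as posted in this patch: transport is anything but IB. */
static inline bool force_grh_not_ib(const struct model_port *p)
{
	return p->transport != TP_IB;
}

/* Test suggested in review: transport is IBoE only. */
static inline bool force_grh_iboe(const struct model_port *p)
{
	return p->transport == TP_IBOE;
}

/* Count the rows for which a given predicate holds. */
static inline size_t count_ports(bool (*pred)(const struct model_port *))
{
	size_t i, n = 0;

	for (i = 0; i < sizeof(model_ports) / sizeof(model_ports[0]); i++)
		if (pred(&model_ports[i]))
			n++;
	return n;
}
```

With these rows the old link-layer test and the posted !IB test select the same four ports, because a RoCE port reports IBOE rather than IB in the new scheme. The IBoE-only test drops the iWARP and usnic rows. Whether that difference matters depends on whether those ports ever reach the AH code paths.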




2015-04-07 20:18:03

by Hefty, Sean

[permalink] [raw]
Subject: RE: [PATCH v2 10/17] IB/Verbs: Adopt management helpers for IB helpers

> > index f704254..4e61104 100644
> > +++ b/drivers/infiniband/core/sa_query.c
> > @@ -540,7 +540,7 @@ int ib_init_ah_from_path(struct ib_device *device,
> u8 port_num,
> > ah_attr->port_num = port_num;
> > ah_attr->static_rate = rec->rate;
> >
> > - force_grh = rdma_port_get_link_layer(device, port_num) ==
> IB_LINK_LAYER_ETHERNET;
> > + force_grh = !rdma_transport_ib(device, port_num);
>
> Maybe these tests should be called cap_mandatory_grh - but I'm not
> really sure how iWarp uses the GRH fields in the AH...

AH are used with unconnected endpoints, which iWarp doesn't currently support.

2015-04-07 21:11:20

by Steve Wise

[permalink] [raw]
Subject: RE: [PATCH v2 13/17] IB/Verbs: Reform cma/ucma with management helpers



> -----Original Message-----
> From: Michael Wang [mailto:[email protected]]
> Sent: Tuesday, April 07, 2015 7:37 AM
> To: Roland Dreier; Sean Hefty; [email protected]; [email protected]; [email protected];
> [email protected]
> Cc: Hal Rosenstock; Tom Tucker; Steve Wise; Hoang-Nam Nguyen; Christoph Raisch; Mike Marciniszyn; Eli Cohen; Faisal Latif; Upinder
> Malhi; Trond Myklebust; J. Bruce Fields; David S. Miller; Ira Weiny; PJ Waskiewicz; Tatyana Nikolova; Or Gerlitz; Jack Morgenstein; Haggai
> Eran; Ilya Nelkenbaum; Yann Droneaud; Bart Van Assche; Shachar Raindel; Sagi Grimberg; Devesh Sharma; Matan Barak; Moni Shoua; Jiri
> Kosina; Selvin Xavier; Mitesh Ahuja; Li RongQing; Rasmus Villemoes; Alex Estrin; Doug Ledford; Eric Dumazet; Erez Shitrit; Tom
> Gundersen; Chuck Lever; Michael Wang
> Subject: [PATCH v2 13/17] IB/Verbs: Reform cma/ucma with management helpers
>
>
> Reform cma/ucma with management helpers.
>
> Cc: Jason Gunthorpe <[email protected]>
> Cc: Doug Ledford <[email protected]>
> Cc: Ira Weiny <[email protected]>
> Cc: Sean Hefty <[email protected]>
> Signed-off-by: Michael Wang <[email protected]>
> ---
> drivers/infiniband/core/cma.c | 182 +++++++++++++----------------------------
> drivers/infiniband/core/ucma.c | 25 ++----
> 2 files changed, 65 insertions(+), 142 deletions(-)
>
> diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
> index d8a8ea7..c23f483 100644
> --- a/drivers/infiniband/core/cma.c
> +++ b/drivers/infiniband/core/cma.c
> @@ -435,10 +435,10 @@ static int cma_resolve_ib_dev(struct rdma_id_private *id_priv)
> pkey = ntohs(addr->sib_pkey);
>
> list_for_each_entry(cur_dev, &dev_list, list) {
> - if (rdma_node_get_transport(cur_dev->device->node_type) != RDMA_TRANSPORT_IB)
> - continue;
> -
> for (p = 1; p <= cur_dev->device->phys_port_cnt; ++p) {
> + if (!rdma_ib_mgmt(cur_dev->device, p))
> + continue;
> +
> if (ib_find_cached_pkey(cur_dev->device, p, pkey, &index))
> continue;
>
> @@ -633,10 +633,10 @@ static int cma_modify_qp_rtr(struct rdma_id_private *id_priv,
> if (ret)
> goto out;
>
> - if (rdma_node_get_transport(id_priv->cma_dev->device->node_type)
> - == RDMA_TRANSPORT_IB &&
> - rdma_port_get_link_layer(id_priv->id.device, id_priv->id.port_num)
> - == IB_LINK_LAYER_ETHERNET) {
> + /* Will this happen? */
> + BUG_ON(id_priv->cma_dev->device != id_priv->id.device);
> +
> + if (rdma_transport_iboe(id_priv->id.device, id_priv->id.port_num)) {
> ret = rdma_addr_find_smac_by_sgid(&sgid, qp_attr.smac, NULL);
>
> if (ret)
> @@ -700,8 +700,7 @@ static int cma_ib_init_qp_attr(struct rdma_id_private *id_priv,
> int ret;
> u16 pkey;
>
> - if (rdma_port_get_link_layer(id_priv->id.device, id_priv->id.port_num) ==
> - IB_LINK_LAYER_INFINIBAND)
> + if (rdma_transport_ib(id_priv->id.device, id_priv->id.port_num))
> pkey = ib_addr_get_pkey(dev_addr);
> else
> pkey = 0xffff;
> @@ -735,8 +734,7 @@ int rdma_init_qp_attr(struct rdma_cm_id *id, struct ib_qp_attr *qp_attr,
> int ret = 0;
>
> id_priv = container_of(id, struct rdma_id_private, id);
> - switch (rdma_node_get_transport(id_priv->id.device->node_type)) {
> - case RDMA_TRANSPORT_IB:
> + if (rdma_ib_mgmt(id_priv->id.device, id_priv->id.port_num)) {
> if (!id_priv->cm_id.ib || (id_priv->id.qp_type == IB_QPT_UD))
> ret = cma_ib_init_qp_attr(id_priv, qp_attr, qp_attr_mask);
> else
> @@ -745,19 +743,16 @@ int rdma_init_qp_attr(struct rdma_cm_id *id, struct ib_qp_attr *qp_attr,
>
> if (qp_attr->qp_state == IB_QPS_RTR)
> qp_attr->rq_psn = id_priv->seq_num;
> - break;
> - case RDMA_TRANSPORT_IWARP:
> + } else if (rdma_transport_iwarp(id_priv->id.device,
> + id_priv->id.port_num)) {
> if (!id_priv->cm_id.iw) {
> qp_attr->qp_access_flags = 0;
> *qp_attr_mask = IB_QP_STATE | IB_QP_ACCESS_FLAGS;
> } else
> ret = iw_cm_init_qp_attr(id_priv->cm_id.iw, qp_attr,
> qp_attr_mask);
> - break;
> - default:
> + } else
> ret = -ENOSYS;
> - break;
> - }
>
> return ret;
> }
> @@ -928,13 +923,9 @@ static inline int cma_user_data_offset(struct rdma_id_private *id_priv)
>
> static void cma_cancel_route(struct rdma_id_private *id_priv)
> {
> - switch (rdma_port_get_link_layer(id_priv->id.device, id_priv->id.port_num)) {
> - case IB_LINK_LAYER_INFINIBAND:
> + if (rdma_transport_ib(id_priv->id.device, id_priv->id.port_num)) {
> if (id_priv->query)
> ib_sa_cancel_query(id_priv->query_id, id_priv->query);
> - break;
> - default:
> - break;
> }
> }
>
> @@ -1006,17 +997,14 @@ static void cma_leave_mc_groups(struct rdma_id_private *id_priv)
> mc = container_of(id_priv->mc_list.next,
> struct cma_multicast, list);
> list_del(&mc->list);
> - switch (rdma_port_get_link_layer(id_priv->cma_dev->device, id_priv->id.port_num)) {
> - case IB_LINK_LAYER_INFINIBAND:
> + if (rdma_transport_ib(id_priv->cma_dev->device,
> + id_priv->id.port_num)) {
> ib_sa_free_multicast(mc->multicast.ib);
> kfree(mc);
> break;
> - case IB_LINK_LAYER_ETHERNET:
> + } else if (rdma_transport_ib(id_priv->cma_dev->device,
> + id_priv->id.port_num))
> kref_put(&mc->mcref, release_mc);
> - break;
> - default:
> - break;
> - }
> }
> }
>

Doesn't the above change result in:

if (rdma_transport_ib()) {
} else if (rdma_transport_ib()) {
}

????
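
Spelled out as a stand-alone model, the difference looks like the sketch below. All names here are hypothetical. The "presumed" variant assumes the second arm was meant to test IBoE, as the rdma_leave_multicast() hunk later in this same patch does.

```c
#include <stdbool.h>

/* Hypothetical model types; not the kernel's definitions. */
enum model_tp { TP_IB, TP_IBOE, TP_IWARP };
enum mc_action { MC_NONE, MC_SA_FREE, MC_KREF_PUT };

static inline bool model_is_ib(enum model_tp t)   { return t == TP_IB; }
static inline bool model_is_iboe(enum model_tp t) { return t == TP_IBOE; }

/* Shape of the hunk as posted: both arms test the same predicate. */
static inline enum mc_action leave_mc_as_posted(enum model_tp t)
{
	if (model_is_ib(t))
		return MC_SA_FREE;
	else if (model_is_ib(t))	/* can never be true at this point */
		return MC_KREF_PUT;
	return MC_NONE;
}

/* Presumed intent: the second arm handles IBoE. */
static inline enum mc_action leave_mc_presumed(enum model_tp t)
{
	if (model_is_ib(t))
		return MC_SA_FREE;
	else if (model_is_iboe(t))
		return MC_KREF_PUT;
	return MC_NONE;
}
```

In the as-posted shape an IBoE port falls through both arms, so the kref_put() path would never run for it.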

> @@ -1037,17 +1025,13 @@ void rdma_destroy_id(struct rdma_cm_id *id)
> mutex_unlock(&id_priv->handler_mutex);
>
> if (id_priv->cma_dev) {
> - switch (rdma_node_get_transport(id_priv->id.device->node_type)) {
> - case RDMA_TRANSPORT_IB:
> + if (rdma_ib_mgmt(id_priv->id.device, id_priv->id.port_num)) {
> if (id_priv->cm_id.ib)
> ib_destroy_cm_id(id_priv->cm_id.ib);
> - break;
> - case RDMA_TRANSPORT_IWARP:
> + } else if (rdma_transport_iwarp(id_priv->id.device,
> + id_priv->id.port_num)) {
> if (id_priv->cm_id.iw)
> iw_destroy_cm_id(id_priv->cm_id.iw);
> - break;
> - default:
> - break;
> }
> cma_leave_mc_groups(id_priv);
> cma_release_dev(id_priv);
> @@ -1966,26 +1950,14 @@ int rdma_resolve_route(struct rdma_cm_id *id, int timeout_ms)
> return -EINVAL;
>
> atomic_inc(&id_priv->refcount);
> - switch (rdma_node_get_transport(id->device->node_type)) {
> - case RDMA_TRANSPORT_IB:
> - switch (rdma_port_get_link_layer(id->device, id->port_num)) {
> - case IB_LINK_LAYER_INFINIBAND:
> - ret = cma_resolve_ib_route(id_priv, timeout_ms);
> - break;
> - case IB_LINK_LAYER_ETHERNET:
> - ret = cma_resolve_iboe_route(id_priv);
> - break;
> - default:
> - ret = -ENOSYS;
> - }
> - break;
> - case RDMA_TRANSPORT_IWARP:
> + if (rdma_transport_ib(id->device, id->port_num))
> + ret = cma_resolve_ib_route(id_priv, timeout_ms);
> + else if (rdma_transport_iboe(id->device, id->port_num))
> + ret = cma_resolve_iboe_route(id_priv);
> + else if (rdma_transport_iwarp(id->device, id->port_num))
> ret = cma_resolve_iw_route(id_priv, timeout_ms);
> - break;
> - default:
> + else
> ret = -ENOSYS;
> - break;
> - }
> if (ret)
> goto err;
>
> @@ -2059,7 +2031,7 @@ port_found:
> goto out;
>
> id_priv->id.route.addr.dev_addr.dev_type =
> - (rdma_port_get_link_layer(cma_dev->device, p) == IB_LINK_LAYER_INFINIBAND) ?
> + (rdma_transport_ib(cma_dev->device, p)) ?
> ARPHRD_INFINIBAND : ARPHRD_ETHER;
>
> rdma_addr_set_sgid(&id_priv->id.route.addr.dev_addr, &gid);
> @@ -2536,18 +2508,15 @@ int rdma_listen(struct rdma_cm_id *id, int backlog)
>
> id_priv->backlog = backlog;
> if (id->device) {
> - switch (rdma_node_get_transport(id->device->node_type)) {
> - case RDMA_TRANSPORT_IB:
> + if (rdma_ib_mgmt(id->device, id->port_num)) {
> ret = cma_ib_listen(id_priv);
> if (ret)
> goto err;
> - break;
> - case RDMA_TRANSPORT_IWARP:
> + } else if (rdma_transport_iwarp(id->device, id->port_num)) {
> ret = cma_iw_listen(id_priv, backlog);
> if (ret)
> goto err;
> - break;
> - default:
> + } else {
> ret = -ENOSYS;
> goto err;
> }
> @@ -2883,20 +2852,15 @@ int rdma_connect(struct rdma_cm_id *id, struct rdma_conn_param *conn_param)
> id_priv->srq = conn_param->srq;
> }
>
> - switch (rdma_node_get_transport(id->device->node_type)) {
> - case RDMA_TRANSPORT_IB:
> + if (rdma_ib_mgmt(id->device, id->port_num)) {
> if (id->qp_type == IB_QPT_UD)
> ret = cma_resolve_ib_udp(id_priv, conn_param);
> else
> ret = cma_connect_ib(id_priv, conn_param);
> - break;
> - case RDMA_TRANSPORT_IWARP:
> + } else if (rdma_transport_iwarp(id->device, id->port_num))
> ret = cma_connect_iw(id_priv, conn_param);
> - break;
> - default:
> + else
> ret = -ENOSYS;
> - break;
> - }
> if (ret)
> goto err;
>
> @@ -2999,8 +2963,7 @@ int rdma_accept(struct rdma_cm_id *id, struct rdma_conn_param *conn_param)
> id_priv->srq = conn_param->srq;
> }
>
> - switch (rdma_node_get_transport(id->device->node_type)) {
> - case RDMA_TRANSPORT_IB:
> + if (rdma_ib_mgmt(id->device, id->port_num)) {
> if (id->qp_type == IB_QPT_UD) {
> if (conn_param)
> ret = cma_send_sidr_rep(id_priv, IB_SIDR_SUCCESS,
> @@ -3016,14 +2979,10 @@ int rdma_accept(struct rdma_cm_id *id, struct rdma_conn_param *conn_param)
> else
> ret = cma_rep_recv(id_priv);
> }
> - break;
> - case RDMA_TRANSPORT_IWARP:
> + } else if (rdma_transport_iwarp(id->device, id->port_num))
> ret = cma_accept_iw(id_priv, conn_param);
> - break;
> - default:
> + else
> ret = -ENOSYS;
> - break;
> - }
>
> if (ret)
> goto reject;
> @@ -3067,8 +3026,7 @@ int rdma_reject(struct rdma_cm_id *id, const void *private_data,
> if (!id_priv->cm_id.ib)
> return -EINVAL;
>
> - switch (rdma_node_get_transport(id->device->node_type)) {
> - case RDMA_TRANSPORT_IB:
> + if (rdma_ib_mgmt(id->device, id->port_num)) {
> if (id->qp_type == IB_QPT_UD)
> ret = cma_send_sidr_rep(id_priv, IB_SIDR_REJECT, 0,
> private_data, private_data_len);
> @@ -3076,15 +3034,11 @@ int rdma_reject(struct rdma_cm_id *id, const void *private_data,
> ret = ib_send_cm_rej(id_priv->cm_id.ib,
> IB_CM_REJ_CONSUMER_DEFINED, NULL,
> 0, private_data, private_data_len);
> - break;
> - case RDMA_TRANSPORT_IWARP:
> + } else if (rdma_transport_iwarp(id->device, id->port_num)) {
> ret = iw_cm_reject(id_priv->cm_id.iw,
> private_data, private_data_len);
> - break;
> - default:
> + } else
> ret = -ENOSYS;
> - break;
> - }
> return ret;
> }
> EXPORT_SYMBOL(rdma_reject);
> @@ -3098,22 +3052,17 @@ int rdma_disconnect(struct rdma_cm_id *id)
> if (!id_priv->cm_id.ib)
> return -EINVAL;
>
> - switch (rdma_node_get_transport(id->device->node_type)) {
> - case RDMA_TRANSPORT_IB:
> + if (rdma_ib_mgmt(id->device, id->port_num)) {
> ret = cma_modify_qp_err(id_priv);
> if (ret)
> goto out;
> /* Initiate or respond to a disconnect. */
> if (ib_send_cm_dreq(id_priv->cm_id.ib, NULL, 0))
> ib_send_cm_drep(id_priv->cm_id.ib, NULL, 0);
> - break;
> - case RDMA_TRANSPORT_IWARP:
> + } else if (rdma_transport_iwarp(id->device, id->port_num)) {
> ret = iw_cm_disconnect(id_priv->cm_id.iw, 0);
> - break;
> - default:
> + } else
> ret = -EINVAL;
> - break;
> - }
> out:
> return ret;
> }
> @@ -3359,24 +3308,13 @@ int rdma_join_multicast(struct rdma_cm_id *id, struct sockaddr *addr,
> list_add(&mc->list, &id_priv->mc_list);
> spin_unlock(&id_priv->lock);
>
> - switch (rdma_node_get_transport(id->device->node_type)) {
> - case RDMA_TRANSPORT_IB:
> - switch (rdma_port_get_link_layer(id->device, id->port_num)) {
> - case IB_LINK_LAYER_INFINIBAND:
> - ret = cma_join_ib_multicast(id_priv, mc);
> - break;
> - case IB_LINK_LAYER_ETHERNET:
> - kref_init(&mc->mcref);
> - ret = cma_iboe_join_multicast(id_priv, mc);
> - break;
> - default:
> - ret = -EINVAL;
> - }
> - break;
> - default:
> + if (rdma_transport_iboe(id->device, id->port_num)) {
> + kref_init(&mc->mcref);
> + ret = cma_iboe_join_multicast(id_priv, mc);
> + } else if (rdma_transport_ib(id->device, id->port_num))
> + ret = cma_join_ib_multicast(id_priv, mc);
> + else
> ret = -ENOSYS;
> - break;
> - }
>
> if (ret) {
> spin_lock_irq(&id_priv->lock);
> @@ -3404,19 +3342,17 @@ void rdma_leave_multicast(struct rdma_cm_id *id, struct sockaddr *addr)
> ib_detach_mcast(id->qp,
> &mc->multicast.ib->rec.mgid,
> be16_to_cpu(mc->multicast.ib->rec.mlid));
> - if (rdma_node_get_transport(id_priv->cma_dev->device->node_type) == RDMA_TRANSPORT_IB) {
> - switch (rdma_port_get_link_layer(id->device, id->port_num)) {
> - case IB_LINK_LAYER_INFINIBAND:
> - ib_sa_free_multicast(mc->multicast.ib);
> - kfree(mc);
> - break;
> - case IB_LINK_LAYER_ETHERNET:
> - kref_put(&mc->mcref, release_mc);
> - break;
> - default:
> - break;
> - }
> - }
> +
> + /* Will this happen? */
> + BUG_ON(id_priv->cma_dev->device != id->device);
> +
> + if (rdma_transport_ib(id->device, id->port_num)) {
> + ib_sa_free_multicast(mc->multicast.ib);
> + kfree(mc);
> + } else if (rdma_transport_iboe(id->device,
> + id->port_num))
> + kref_put(&mc->mcref, release_mc);
> +
> return;
> }
> }
> diff --git a/drivers/infiniband/core/ucma.c b/drivers/infiniband/core/ucma.c
> index 45d67e9..42c9bf6 100644
> --- a/drivers/infiniband/core/ucma.c
> +++ b/drivers/infiniband/core/ucma.c
> @@ -722,26 +722,13 @@ static ssize_t ucma_query_route(struct ucma_file *file,
>
> resp.node_guid = (__force __u64) ctx->cm_id->device->node_guid;
> resp.port_num = ctx->cm_id->port_num;
> - switch (rdma_node_get_transport(ctx->cm_id->device->node_type)) {
> - case RDMA_TRANSPORT_IB:
> - switch (rdma_port_get_link_layer(ctx->cm_id->device,
> - ctx->cm_id->port_num)) {
> - case IB_LINK_LAYER_INFINIBAND:
> - ucma_copy_ib_route(&resp, &ctx->cm_id->route);
> - break;
> - case IB_LINK_LAYER_ETHERNET:
> - ucma_copy_iboe_route(&resp, &ctx->cm_id->route);
> - break;
> - default:
> - break;
> - }
> - break;
> - case RDMA_TRANSPORT_IWARP:
> +
> + if (rdma_transport_ib(ctx->cm_id->device, ctx->cm_id->port_num))
> + ucma_copy_ib_route(&resp, &ctx->cm_id->route);
> + else if (rdma_transport_iboe(ctx->cm_id->device, ctx->cm_id->port_num))
> + ucma_copy_iboe_route(&resp, &ctx->cm_id->route);
> + else if (rdma_transport_iwarp(ctx->cm_id->device, ctx->cm_id->port_num))
> ucma_copy_iw_route(&resp, &ctx->cm_id->route);
> - break;
> - default:
> - break;
> - }
>
> out:
> if (copy_to_user((void __user *)(unsigned long)cmd.response,
> --
> 2.1.0


2015-04-07 21:25:05

by Hefty, Sean

[permalink] [raw]
Subject: RE: [PATCH v2 02/17] IB/Verbs: Implement raw management helpers

> +static inline int rdma_transport_ib(struct ib_device *device, u8
> port_num)
> +{
> +	return device->query_transport(device, port_num)
> +			== RDMA_TRANSPORT_IB;
> +}
> +
> +static inline int rdma_transport_iboe(struct ib_device *device, u8
> port_num)
> +{
> +	return device->query_transport(device, port_num)
> +			== RDMA_TRANSPORT_IBOE;
> +}

We need to do something with the function names to make their use more obvious.  Both IB and IBoE have transport IB.  I think Jason suggested rdma_tech_ib / rdma_tech_iboe.

Regarding transport types, I believe that usnic supports 2 different transports.  Although usnic isn't used by anything else in the core layer, we should probably be able to handle a device that supports multiple protocols.  I'm not sure what the 'transport' should be for iWarp, since iWarp is layered over TCP.  But that may just mean that the term transport isn't great.

- Sean

2015-04-07 21:36:30

by Hefty, Sean

[permalink] [raw]
Subject: RE: [PATCH v2 13/17] IB/Verbs: Reform cma/ucma with management helpers

> diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
> index d8a8ea7..c23f483 100644
> --- a/drivers/infiniband/core/cma.c
> +++ b/drivers/infiniband/core/cma.c
> @@ -435,10 +435,10 @@ static int cma_resolve_ib_dev(struct rdma_id_private
> *id_priv)
>  	pkey = ntohs(addr->sib_pkey);
> 
>  	list_for_each_entry(cur_dev, &dev_list, list) {
> -		if (rdma_node_get_transport(cur_dev->device->node_type) !=
> RDMA_TRANSPORT_IB)
> -			continue;
> -
>  		for (p = 1; p <= cur_dev->device->phys_port_cnt; ++p) {
> +			if (!rdma_ib_mgmt(cur_dev->device, p))
> +				continue;

This check wants to be something like is_af_ib_supported().  Checking for IB transport may actually be better than checking for IB management.  I don't know if IBoE/RoCE devices support AF_IB.


> +
>  			if (ib_find_cached_pkey(cur_dev->device, p, pkey,
> &index))
>  				continue;
> 
> @@ -633,10 +633,10 @@ static int cma_modify_qp_rtr(struct rdma_id_private
> *id_priv,
>  	if (ret)
>  		goto out;
> 
> -	if (rdma_node_get_transport(id_priv->cma_dev->device->node_type)
> -	    == RDMA_TRANSPORT_IB &&
> -	    rdma_port_get_link_layer(id_priv->id.device, id_priv-
> >id.port_num)
> -	    == IB_LINK_LAYER_ETHERNET) {
> +	/* Will this happen? */
> +	BUG_ON(id_priv->cma_dev->device != id_priv->id.device);

This shouldn't happen.  The BUG_ON looks okay.


> +	if (rdma_transport_iboe(id_priv->id.device, id_priv->id.port_num)) {
>  		ret = rdma_addr_find_smac_by_sgid(&sgid, qp_attr.smac, NULL);
> 
>  		if (ret)
> @@ -700,8 +700,7 @@ static int cma_ib_init_qp_attr(struct rdma_id_private
> *id_priv,
>  	int ret;
>  	u16 pkey;
> 
> -	if (rdma_port_get_link_layer(id_priv->id.device, id_priv-
> >id.port_num) ==
> -	    IB_LINK_LAYER_INFINIBAND)
> +	if (rdma_transport_ib(id_priv->id.device, id_priv->id.port_num))
>  		pkey = ib_addr_get_pkey(dev_addr);
>  	else
>  		pkey = 0xffff;

Check here should be against the link layer, not transport.


> @@ -735,8 +734,7 @@ int rdma_init_qp_attr(struct rdma_cm_id *id, struct
> ib_qp_attr *qp_attr,
>  	int ret = 0;
> 
>  	id_priv = container_of(id, struct rdma_id_private, id);
> -	switch (rdma_node_get_transport(id_priv->id.device->node_type)) {
> -	case RDMA_TRANSPORT_IB:
> +	if (rdma_ib_mgmt(id_priv->id.device, id_priv->id.port_num)) {
>  		if (!id_priv->cm_id.ib || (id_priv->id.qp_type == IB_QPT_UD))
>  			ret = cma_ib_init_qp_attr(id_priv, qp_attr,
> qp_attr_mask);
>  		else
> @@ -745,19 +743,16 @@ int rdma_init_qp_attr(struct rdma_cm_id *id, struct
> ib_qp_attr *qp_attr,
> 
>  		if (qp_attr->qp_state == IB_QPS_RTR)
>  			qp_attr->rq_psn = id_priv->seq_num;
> -		break;
> -	case RDMA_TRANSPORT_IWARP:
> +	} else if (rdma_transport_iwarp(id_priv->id.device,
> +						id_priv->id.port_num)) {
>  		if (!id_priv->cm_id.iw) {
>  			qp_attr->qp_access_flags = 0;
>  			*qp_attr_mask = IB_QP_STATE | IB_QP_ACCESS_FLAGS;
>  		} else
>  			ret = iw_cm_init_qp_attr(id_priv->cm_id.iw, qp_attr,
>  						 qp_attr_mask);
> -		break;
> -	default:
> +	} else
>  		ret = -ENOSYS;
> -		break;
> -	}
> 
>  	return ret;
>  }
> @@ -928,13 +923,9 @@ static inline int cma_user_data_offset(struct
> rdma_id_private *id_priv)
> 
>  static void cma_cancel_route(struct rdma_id_private *id_priv)
>  {
> -	switch (rdma_port_get_link_layer(id_priv->id.device, id_priv-
> >id.port_num)) {
> -	case IB_LINK_LAYER_INFINIBAND:
> +	if (rdma_transport_ib(id_priv->id.device, id_priv->id.port_num)) {

The check should be cap_ib_sa()


>  		if (id_priv->query)
>  			ib_sa_cancel_query(id_priv->query_id, id_priv->query);
> -		break;
> -	default:
> -		break;
>  	}
>  }
> 
> @@ -1006,17 +997,14 @@ static void cma_leave_mc_groups(struct
> rdma_id_private *id_priv)
>  		mc = container_of(id_priv->mc_list.next,
>  				  struct cma_multicast, list);
>  		list_del(&mc->list);
> -		switch (rdma_port_get_link_layer(id_priv->cma_dev->device,
> id_priv->id.port_num)) {
> -		case IB_LINK_LAYER_INFINIBAND:
> +		if (rdma_transport_ib(id_priv->cma_dev->device,
> +				      id_priv->id.port_num)) {
>  			ib_sa_free_multicast(mc->multicast.ib);
>  			kfree(mc);
>  			break;

Want cap_ib_mcast()


> -		case IB_LINK_LAYER_ETHERNET:
> +		} else if (rdma_transport_ib(id_priv->cma_dev->device,
> +					     id_priv->id.port_num))
>  			kref_put(&mc->mcref, release_mc);
> -			break;
> -		default:
> -			break;

Just want else /* !cap_ib_mcast */


> -		}
>  	}
>  }
> 
> @@ -1037,17 +1025,13 @@ void rdma_destroy_id(struct rdma_cm_id *id)
>  	mutex_unlock(&id_priv->handler_mutex);
> 
>  	if (id_priv->cma_dev) {
> -		switch (rdma_node_get_transport(id_priv->id.device-
> >node_type)) {
> -		case RDMA_TRANSPORT_IB:
> +		if (rdma_ib_mgmt(id_priv->id.device, id_priv->id.port_num)) {
>  			if (id_priv->cm_id.ib)
>  				ib_destroy_cm_id(id_priv->cm_id.ib);
> -			break;
> -		case RDMA_TRANSPORT_IWARP:
> +		} else if (rdma_transport_iwarp(id_priv->id.device,
> +							id_priv->id.port_num)) {
>  			if (id_priv->cm_id.iw)
>  				iw_destroy_cm_id(id_priv->cm_id.iw);
> -			break;
> -		default:
> -			break;
>  		}
>  		cma_leave_mc_groups(id_priv);
>  		cma_release_dev(id_priv);
> @@ -1966,26 +1950,14 @@ int rdma_resolve_route(struct rdma_cm_id *id, int
> timeout_ms)
>  		return -EINVAL;
> 
>  	atomic_inc(&id_priv->refcount);
> -	switch (rdma_node_get_transport(id->device->node_type)) {
> -	case RDMA_TRANSPORT_IB:
> -		switch (rdma_port_get_link_layer(id->device, id->port_num)) {
> -		case IB_LINK_LAYER_INFINIBAND:
> -			ret = cma_resolve_ib_route(id_priv, timeout_ms);
> -			break;
> -		case IB_LINK_LAYER_ETHERNET:
> -			ret = cma_resolve_iboe_route(id_priv);
> -			break;
> -		default:
> -			ret = -ENOSYS;
> -		}
> -		break;
> -	case RDMA_TRANSPORT_IWARP:
> +	if (rdma_transport_ib(id->device, id->port_num))
> +		ret = cma_resolve_ib_route(id_priv, timeout_ms);

Best fit would be cap_ib_sa()


> +	else if (rdma_transport_iboe(id->device, id->port_num))
> +		ret = cma_resolve_iboe_route(id_priv);
> +	else if (rdma_transport_iwarp(id->device, id->port_num))
>  		ret = cma_resolve_iw_route(id_priv, timeout_ms);
> -		break;
> -	default:
> +	else
>  		ret = -ENOSYS;
> -		break;
> -	}
>  	if (ret)
>  		goto err;
> 
> @@ -2059,7 +2031,7 @@ port_found:
>  		goto out;
> 
>  	id_priv->id.route.addr.dev_addr.dev_type =
> -		(rdma_port_get_link_layer(cma_dev->device, p) ==
> IB_LINK_LAYER_INFINIBAND) ?
> +		(rdma_transport_ib(cma_dev->device, p)) ?
>  		ARPHRD_INFINIBAND : ARPHRD_ETHER;

This wants the link layer, or maybe use cap_ipoib.


> 
>  	rdma_addr_set_sgid(&id_priv->id.route.addr.dev_addr, &gid);
> @@ -2536,18 +2508,15 @@ int rdma_listen(struct rdma_cm_id *id, int
> backlog)
> 
>  	id_priv->backlog = backlog;
>  	if (id->device) {
> -		switch (rdma_node_get_transport(id->device->node_type)) {
> -		case RDMA_TRANSPORT_IB:
> +		if (rdma_ib_mgmt(id->device, id->port_num)) {

Want cap_ib_cm()


>  			ret = cma_ib_listen(id_priv);
>  			if (ret)
PiAgCQkJCWdvdG8gZXJyOw0KPiAtCQkJYnJlYWs7DQo+IC0JCWNhc2UgUkRNQV9UUkFOU1BPUlRf
SVdBUlA6DQo+ICsJCX0gZWxzZSBpZiAocmRtYV90cmFuc3BvcnRfaXdhcnAoaWQtPmRldmljZSwg
aWQtPnBvcnRfbnVtKSkgew0KPiAgCQkJcmV0ID0gY21hX2l3X2xpc3RlbihpZF9wcml2LCBiYWNr
bG9nKTsNCj4gIAkJCWlmIChyZXQpDQo+ICAJCQkJZ290byBlcnI7DQo+IC0JCQlicmVhazsNCj4g
LQkJZGVmYXVsdDoNCj4gKwkJfSBlbHNlIHsNCj4gIAkJCXJldCA9IC1FTk9TWVM7DQo+ICAJCQln
b3RvIGVycjsNCj4gIAkJfQ0KPiBAQCAtMjg4MywyMCArMjg1MiwxNSBAQCBpbnQgcmRtYV9jb25u
ZWN0KHN0cnVjdCByZG1hX2NtX2lkICppZCwgc3RydWN0DQo+IHJkbWFfY29ubl9wYXJhbSAqY29u
bl9wYXJhbSkNCj4gIAkJaWRfcHJpdi0+c3JxID0gY29ubl9wYXJhbS0+c3JxOw0KPiAgCX0NCj4g
DQo+IC0Jc3dpdGNoIChyZG1hX25vZGVfZ2V0X3RyYW5zcG9ydChpZC0+ZGV2aWNlLT5ub2RlX3R5
cGUpKSB7DQo+IC0JY2FzZSBSRE1BX1RSQU5TUE9SVF9JQjoNCj4gKwlpZiAocmRtYV9pYl9tZ210
KGlkLT5kZXZpY2UsIGlkLT5wb3J0X251bSkpIHsNCg0KY2FwX2liX2NtKCkNCg0KDQo+ICAJCWlm
IChpZC0+cXBfdHlwZSA9PSBJQl9RUFRfVUQpDQo+ICAJCQlyZXQgPSBjbWFfcmVzb2x2ZV9pYl91
ZHAoaWRfcHJpdiwgY29ubl9wYXJhbSk7DQo+ICAJCWVsc2UNCj4gIAkJCXJldCA9IGNtYV9jb25u
ZWN0X2liKGlkX3ByaXYsIGNvbm5fcGFyYW0pOw0KPiAtCQlicmVhazsNCj4gLQljYXNlIFJETUFf
VFJBTlNQT1JUX0lXQVJQOg0KPiArCX0gZWxzZSBpZiAocmRtYV90cmFuc3BvcnRfaXdhcnAoaWQt
PmRldmljZSwgaWQtPnBvcnRfbnVtKSkNCj4gIAkJcmV0ID0gY21hX2Nvbm5lY3RfaXcoaWRfcHJp
diwgY29ubl9wYXJhbSk7DQo+IC0JCWJyZWFrOw0KPiAtCWRlZmF1bHQ6DQo+ICsJZWxzZQ0KPiAg
CQlyZXQgPSAtRU5PU1lTOw0KPiAtCQlicmVhazsNCj4gLQl9DQo+ICAJaWYgKHJldCkNCj4gIAkJ
Z290byBlcnI7DQo+IA0KPiBAQCAtMjk5OSw4ICsyOTYzLDcgQEAgaW50IHJkbWFfYWNjZXB0KHN0
cnVjdCByZG1hX2NtX2lkICppZCwgc3RydWN0DQo+IHJkbWFfY29ubl9wYXJhbSAqY29ubl9wYXJh
bSkNCj4gIAkJaWRfcHJpdi0+c3JxID0gY29ubl9wYXJhbS0+c3JxOw0KPiAgCX0NCj4gDQo+IC0J
c3dpdGNoIChyZG1hX25vZGVfZ2V0X3RyYW5zcG9ydChpZC0+ZGV2aWNlLT5ub2RlX3R5cGUpKSB7
DQo+IC0JY2FzZSBSRE1BX1RSQU5TUE9SVF9JQjoNCj4gKwlpZiAocmRtYV9pYl9tZ210KGlkLT5k
ZXZpY2UsIGlkLT5wb3J0X251bSkpIHsNCg0KY2FwX2liX2NtKCkNCg0KDQo+ICAJCWlmIChpZC0+
cXBfdHlwZSA9PSBJQl9RUFRfVUQpIHsNCj4gIAkJCWlmIChjb25uX3BhcmFtKQ0KPiAgCQkJCXJl
dCA9IGNtYV9zZW5kX3NpZHJfcmVwKGlkX3ByaXYsIElCX1NJRFJfU1VDQ0VTUywNCj4gQEAgLTMw
MTYsMTQgKzI5NzksMTAgQEAgaW50IHJkbWFfYWNjZXB0KHN0cnVjdCByZG1hX2NtX2lkICppZCwg
c3RydWN0DQo+IHJkbWFfY29ubl9wYXJhbSAqY29ubl9wYXJhbSkNCj4gIAkJCWVsc2UNCj4gIAkJ
CQlyZXQgPSBjbWFfcmVwX3JlY3YoaWRfcHJpdik7DQo+ICAJCX0NCj4gLQkJYnJlYWs7DQo+IC0J
Y2FzZSBSRE1BX1RSQU5TUE9SVF9JV0FSUDoNCj4gKwl9IGVsc2UgaWYgKHJkbWFfdHJhbnNwb3J0
X2l3YXJwKGlkLT5kZXZpY2UsIGlkLT5wb3J0X251bSkpDQo+ICAJCXJldCA9IGNtYV9hY2NlcHRf
aXcoaWRfcHJpdiwgY29ubl9wYXJhbSk7DQoNCklmIGNhcF9pYl9jbSgpIGlzIHVzZWQgaW4gdGhl
IHBsYWNlcyBtYXJrZWQgYWJvdmUsIG1heWJlIGFkZCBhIGNhcF9pd19jbSgpIGZvciB0aGUgZWxz
ZSBjb25kaXRpb25zLg0KDQoNCj4gLQkJYnJlYWs7DQo+IC0JZGVmYXVsdDoNCj4gKwllbHNlDQo+
ICAJCXJldCA9IC1FTk9TWVM7DQo+IC0JCWJyZWFrOw0KPiAtCX0NCj4gDQo+ICAJaWYgKHJldCkN
Cj4gIAkJZ290byByZWplY3Q7DQo+IEBAIC0zMDY3LDggKzMwMjYsNyBAQCBpbnQgcmRtYV9yZWpl
Y3Qoc3RydWN0IHJkbWFfY21faWQgKmlkLCBjb25zdCB2b2lkDQo+ICpwcml2YXRlX2RhdGEsDQo+
ICAJaWYgKCFpZF9wcml2LT5jbV9pZC5pYikNCj4gIAkJcmV0dXJuIC1FSU5WQUw7DQo+IA0KPiAt
CXN3aXRjaCAocmRtYV9ub2RlX2dldF90cmFuc3BvcnQoaWQtPmRldmljZS0+bm9kZV90eXBlKSkg
ew0KPiAtCWNhc2UgUkRNQV9UUkFOU1BPUlRfSUI6DQo+ICsJaWYgKHJkbWFfaWJfbWdtdChpZC0+
ZGV2aWNlLCBpZC0+cG9ydF9udW0pKSB7DQoNCmNhcF9pYl9jbSgpDQoNCg0KPiAgCQlpZiAoaWQt
PnFwX3R5cGUgPT0gSUJfUVBUX1VEKQ0KPiAgCQkJcmV0ID0gY21hX3NlbmRfc2lkcl9yZXAoaWRf
cHJpdiwgSUJfU0lEUl9SRUpFQ1QsIDAsDQo+ICAJCQkJCQlwcml2YXRlX2RhdGEsIHByaXZhdGVf
ZGF0YV9sZW4pOw0KPiBAQCAtMzA3NiwxNSArMzAzNCwxMSBAQCBpbnQgcmRtYV9yZWplY3Qoc3Ry
dWN0IHJkbWFfY21faWQgKmlkLCBjb25zdCB2b2lkDQo+ICpwcml2YXRlX2RhdGEsDQo+ICAJCQly
ZXQgPSBpYl9zZW5kX2NtX3JlaihpZF9wcml2LT5jbV9pZC5pYiwNCj4gIAkJCQkJICAgICBJQl9D
TV9SRUpfQ09OU1VNRVJfREVGSU5FRCwgTlVMTCwNCj4gIAkJCQkJICAgICAwLCBwcml2YXRlX2Rh
dGEsIHByaXZhdGVfZGF0YV9sZW4pOw0KPiAtCQlicmVhazsNCj4gLQljYXNlIFJETUFfVFJBTlNQ
T1JUX0lXQVJQOg0KPiArCX0gZWxzZSBpZiAocmRtYV90cmFuc3BvcnRfaXdhcnAoaWQtPmRldmlj
ZSwgaWQtPnBvcnRfbnVtKSkgew0KPiAgCQlyZXQgPSBpd19jbV9yZWplY3QoaWRfcHJpdi0+Y21f
aWQuaXcsDQo+ICAJCQkJICAgcHJpdmF0ZV9kYXRhLCBwcml2YXRlX2RhdGFfbGVuKTsNCj4gLQkJ
YnJlYWs7DQo+IC0JZGVmYXVsdDoNCj4gKwl9IGVsc2UNCj4gIAkJcmV0ID0gLUVOT1NZUzsNCj4g
LQkJYnJlYWs7DQo+IC0JfQ0KPiAgCXJldHVybiByZXQ7DQo+ICB9DQo+ICBFWFBPUlRfU1lNQk9M
KHJkbWFfcmVqZWN0KTsNCj4gQEAgLTMwOTgsMjIgKzMwNTIsMTcgQEAgaW50IHJkbWFfZGlzY29u
bmVjdChzdHJ1Y3QgcmRtYV9jbV9pZCAqaWQpDQo+ICAJaWYgKCFpZF9wcml2LT5jbV9pZC5pYikN
Cj4gIAkJcmV0dXJuIC1FSU5WQUw7DQo+IA0KPiAtCXN3aXRjaCAocmRtYV9ub2RlX2dldF90cmFu
c3BvcnQoaWQtPmRldmljZS0+bm9kZV90eXBlKSkgew0KPiAtCWNhc2UgUkRNQV9UUkFOU1BPUlRf
SUI6DQo+ICsJaWYgKHJkbWFfaWJfbWdtdChpZC0+ZGV2aWNlLCBpZC0+cG9ydF9udW0pKSB7DQo+
ICAJCXJldCA9IGNtYV9tb2RpZnlfcXBfZXJyKGlkX3ByaXYpOw0KPiAgCQlpZiAocmV0KQ0KPiAg
CQkJZ290byBvdXQ7DQo+ICAJCS8qIEluaXRpYXRlIG9yIHJlc3BvbmQgdG8gYSBkaXNjb25uZWN0
LiAqLw0KPiAgCQlpZiAoaWJfc2VuZF9jbV9kcmVxKGlkX3ByaXYtPmNtX2lkLmliLCBOVUxMLCAw
KSkNCj4gIAkJCWliX3NlbmRfY21fZHJlcChpZF9wcml2LT5jbV9pZC5pYiwgTlVMTCwgMCk7DQoN
CmNhcF9pYl9jbSgpDQoNCg0KPiAtCQlicmVhazsNCj4gLQljYXNlIFJETUFfVFJBTlNQT1JUX0lX
QVJQOg0KPiArCX0gZWxzZSBpZiAocmRtYV90cmFuc3BvcnRfaXdhcnAoaWQtPmRldmljZSwgaWQt
PnBvcnRfbnVtKSkgew0KPiAgCQlyZXQgPSBpd19jbV9kaXNjb25uZWN0KGlkX3ByaXYtPmNtX2lk
Lml3LCAwKTsNCj4gLQkJYnJlYWs7DQo+IC0JZGVmYXVsdDoNCj4gKwl9IGVsc2UNCj4gIAkJcmV0
ID0gLUVJTlZBTDsNCj4gLQkJYnJlYWs7DQo+IC0JfQ0KPiAgb3V0Og0KPiAgCXJldHVybiByZXQ7
DQo+ICB9DQo+IEBAIC0zMzU5LDI0ICszMzA4LDEzIEBAIGludCByZG1hX2pvaW5fbXVsdGljYXN0
KHN0cnVjdCByZG1hX2NtX2lkICppZCwNCj4gc3RydWN0IHNvY2thZGRyICphZGRyLA0KPiAgCWxp
c3RfYWRkKCZtYy0+bGlzdCwgJmlkX3ByaXYtPm1jX2xpc3QpOw0KPiAgCXNwaW5fdW5sb2NrKCZp
ZF9wcml2LT5sb2NrKTsNCj4gDQo+IC0Jc3dpdGNoIChyZG1hX25vZGVfZ2V0X3RyYW5zcG9ydChp
ZC0+ZGV2aWNlLT5ub2RlX3R5cGUpKSB7DQo+IC0JY2FzZSBSRE1BX1RSQU5TUE9SVF9JQjoNCj4g
LQkJc3dpdGNoIChyZG1hX3BvcnRfZ2V0X2xpbmtfbGF5ZXIoaWQtPmRldmljZSwgaWQtPnBvcnRf
bnVtKSkgew0KPiAtCQljYXNlIElCX0xJTktfTEFZRVJfSU5GSU5JQkFORDoNCj4gLQkJCXJldCA9
IGNtYV9qb2luX2liX211bHRpY2FzdChpZF9wcml2LCBtYyk7DQo+IC0JCQlicmVhazsNCj4gLQkJ
Y2FzZSBJQl9MSU5LX0xBWUVSX0VUSEVSTkVUOg0KPiAtCQkJa3JlZl9pbml0KCZtYy0+bWNyZWYp
Ow0KPiAtCQkJcmV0ID0gY21hX2lib2Vfam9pbl9tdWx0aWNhc3QoaWRfcHJpdiwgbWMpOw0KPiAt
CQkJYnJlYWs7DQo+IC0JCWRlZmF1bHQ6DQo+IC0JCQlyZXQgPSAtRUlOVkFMOw0KPiAtCQl9DQo+
IC0JCWJyZWFrOw0KPiAtCWRlZmF1bHQ6DQo+ICsJaWYgKHJkbWFfdHJhbnNwb3J0X2lib2UoaWQt
PmRldmljZSwgaWQtPnBvcnRfbnVtKSkgew0KPiArCQlrcmVmX2luaXQoJm1jLT5tY3JlZik7DQo+
ICsJCXJldCA9IGNtYV9pYm9lX2pvaW5fbXVsdGljYXN0KGlkX3ByaXYsIG1jKTsNCj4gKwl9IGVs
c2UgaWYgKHJkbWFfdHJhbnNwb3J0X2liKGlkLT5kZXZpY2UsIGlkLT5wb3J0X251bSkpDQo+ICsJ
CXJldCA9IGNtYV9qb2luX2liX211bHRpY2FzdChpZF9wcml2LCBtYyk7DQoNCmNhcF9pYl9tY2Fz
dCgpDQoNCg0KPiArCWVsc2UNCj4gIAkJcmV0ID0gLUVOT1NZUzsNCj4gLQkJYnJlYWs7DQo+IC0J
fQ0KPiANCj4gIAlpZiAocmV0KSB7DQo+ICAJCXNwaW5fbG9ja19pcnEoJmlkX3ByaXYtPmxvY2sp
Ow0KPiBAQCAtMzQwNCwxOSArMzM0MiwxNyBAQCB2b2lkIHJkbWFfbGVhdmVfbXVsdGljYXN0KHN0
cnVjdCByZG1hX2NtX2lkICppZCwNCj4gc3RydWN0IHNvY2thZGRyICphZGRyKQ0KPiAgCQkJCWli
X2RldGFjaF9tY2FzdChpZC0+cXAsDQo+ICAJCQkJCQkmbWMtPm11bHRpY2FzdC5pYi0+cmVjLm1n
aWQsDQo+ICAJCQkJCQliZTE2X3RvX2NwdShtYy0+bXVsdGljYXN0LmliLQ0KPiA+cmVjLm1saWQp
KTsNCj4gLQkJCWlmIChyZG1hX25vZGVfZ2V0X3RyYW5zcG9ydChpZF9wcml2LT5jbWFfZGV2LT5k
ZXZpY2UtDQo+ID5ub2RlX3R5cGUpID09IFJETUFfVFJBTlNQT1JUX0lCKSB7DQo+IC0JCQkJc3dp
dGNoIChyZG1hX3BvcnRfZ2V0X2xpbmtfbGF5ZXIoaWQtPmRldmljZSwgaWQtDQo+ID5wb3J0X251
bSkpIHsNCj4gLQkJCQljYXNlIElCX0xJTktfTEFZRVJfSU5GSU5JQkFORDoNCj4gLQkJCQkJaWJf
c2FfZnJlZV9tdWx0aWNhc3QobWMtPm11bHRpY2FzdC5pYik7DQo+IC0JCQkJCWtmcmVlKG1jKTsN
Cj4gLQkJCQkJYnJlYWs7DQo+IC0JCQkJY2FzZSBJQl9MSU5LX0xBWUVSX0VUSEVSTkVUOg0KPiAt
CQkJCQlrcmVmX3B1dCgmbWMtPm1jcmVmLCByZWxlYXNlX21jKTsNCj4gLQkJCQkJYnJlYWs7DQo+
IC0JCQkJZGVmYXVsdDoNCj4gLQkJCQkJYnJlYWs7DQo+IC0JCQkJfQ0KPiAtCQkJfQ0KPiArDQo+
ICsJCQkvKiBXaWxsIHRoaXMgaGFwcGVuPyAqLw0KPiArCQkJQlVHX09OKGlkX3ByaXYtPmNtYV9k
ZXYtPmRldmljZSAhPSBpZC0+ZGV2aWNlKTsNCg0KU2hvdWxkIG5vdCBoYXBwZW4NCg0KPiArDQo+
ICsJCQlpZiAocmRtYV90cmFuc3BvcnRfaWIoaWQtPmRldmljZSwgaWQtPnBvcnRfbnVtKSkgew0K
PiArCQkJCWliX3NhX2ZyZWVfbXVsdGljYXN0KG1jLT5tdWx0aWNhc3QuaWIpOw0KPiArCQkJCWtm
cmVlKG1jKTsNCg0KY2FwX2liX21jYXN0KCkNCg0KDQo+ICsJCQl9IGVsc2UgaWYgKHJkbWFfdHJh
bnNwb3J0X2lib2UoaWQtPmRldmljZSwNCj4gKwkJCQkJCSAgICAgICBpZC0+cG9ydF9udW0pKQ0K
PiArCQkJCWtyZWZfcHV0KCZtYy0+bWNyZWYsIHJlbGVhc2VfbWMpOw0KPiArDQo+ICAJCQlyZXR1
cm47DQo+ICAJCX0NCj4gIAl9DQo+IGRpZmYgLS1naXQgYS9kcml2ZXJzL2luZmluaWJhbmQvY29y
ZS91Y21hLmMNCj4gYi9kcml2ZXJzL2luZmluaWJhbmQvY29yZS91Y21hLmMNCj4gaW5kZXggNDVk
NjdlOS4uNDJjOWJmNiAxMDA2NDQNCj4gLS0tIGEvZHJpdmVycy9pbmZpbmliYW5kL2NvcmUvdWNt
YS5jDQo+ICsrKyBiL2RyaXZlcnMvaW5maW5pYmFuZC9jb3JlL3VjbWEuYw0KPiBAQCAtNzIyLDI2
ICs3MjIsMTMgQEAgc3RhdGljIHNzaXplX3QgdWNtYV9xdWVyeV9yb3V0ZShzdHJ1Y3QgdWNtYV9m
aWxlDQo+ICpmaWxlLA0KPiANCj4gIAlyZXNwLm5vZGVfZ3VpZCA9IChfX2ZvcmNlIF9fdTY0KSBj
dHgtPmNtX2lkLT5kZXZpY2UtPm5vZGVfZ3VpZDsNCj4gIAlyZXNwLnBvcnRfbnVtID0gY3R4LT5j
bV9pZC0+cG9ydF9udW07DQo+IC0Jc3dpdGNoIChyZG1hX25vZGVfZ2V0X3RyYW5zcG9ydChjdHgt
PmNtX2lkLT5kZXZpY2UtPm5vZGVfdHlwZSkpIHsNCj4gLQljYXNlIFJETUFfVFJBTlNQT1JUX0lC
Og0KPiAtCQlzd2l0Y2ggKHJkbWFfcG9ydF9nZXRfbGlua19sYXllcihjdHgtPmNtX2lkLT5kZXZp
Y2UsDQo+IC0JCQljdHgtPmNtX2lkLT5wb3J0X251bSkpIHsNCj4gLQkJY2FzZSBJQl9MSU5LX0xB
WUVSX0lORklOSUJBTkQ6DQo+IC0JCQl1Y21hX2NvcHlfaWJfcm91dGUoJnJlc3AsICZjdHgtPmNt
X2lkLT5yb3V0ZSk7DQo+IC0JCQlicmVhazsNCj4gLQkJY2FzZSBJQl9MSU5LX0xBWUVSX0VUSEVS
TkVUOg0KPiAtCQkJdWNtYV9jb3B5X2lib2Vfcm91dGUoJnJlc3AsICZjdHgtPmNtX2lkLT5yb3V0
ZSk7DQo+IC0JCQlicmVhazsNCj4gLQkJZGVmYXVsdDoNCj4gLQkJCWJyZWFrOw0KPiAtCQl9DQo+
IC0JCWJyZWFrOw0KPiAtCWNhc2UgUkRNQV9UUkFOU1BPUlRfSVdBUlA6DQo+ICsNCj4gKwlpZiAo
cmRtYV90cmFuc3BvcnRfaWIoY3R4LT5jbV9pZC0+ZGV2aWNlLCBjdHgtPmNtX2lkLT5wb3J0X251
bSkpDQo+ICsJCXVjbWFfY29weV9pYl9yb3V0ZSgmcmVzcCwgJmN0eC0+Y21faWQtPnJvdXRlKTsN
Cg0KY2FwX2liX3NhKCkNCg0KDQo+ICsJZWxzZSBpZiAocmRtYV90cmFuc3BvcnRfaWJvZShjdHgt
PmNtX2lkLT5kZXZpY2UsIGN0eC0+Y21faWQtDQo+ID5wb3J0X251bSkpDQo+ICsJCXVjbWFfY29w
eV9pYm9lX3JvdXRlKCZyZXNwLCAmY3R4LT5jbV9pZC0+cm91dGUpOw0KPiArCWVsc2UgaWYgKHJk
bWFfdHJhbnNwb3J0X2l3YXJwKGN0eC0+Y21faWQtPmRldmljZSwgY3R4LT5jbV9pZC0NCj4gPnBv
cnRfbnVtKSkNCj4gIAkJdWNtYV9jb3B5X2l3X3JvdXRlKCZyZXNwLCAmY3R4LT5jbV9pZC0+cm91
dGUpOw0KPiAtCQlicmVhazsNCj4gLQlkZWZhdWx0Og0KPiAtCQlicmVhazsNCj4gLQl9DQo+IA0K
PiAgb3V0Og0KPiAgCWlmIChjb3B5X3RvX3VzZXIoKHZvaWQgX191c2VyICopKHVuc2lnbmVkIGxv
bmcpY21kLnJlc3BvbnNlLA0KDQoNCi0gU2Vhbg0K

2015-04-08 08:13:52

by Michael Wang

Subject: Re: [PATCH v2 03/17] IB/Verbs: Use management helper cap_ib_mad() for mad-check

On 04/07/2015 07:26 PM, Jason Gunthorpe wrote:
> On Tue, Apr 07, 2015 at 02:30:22PM +0200, Michael Wang wrote:
>
>> - if (rdma_node_get_transport(device->node_type) != RDMA_TRANSPORT_IB)
>> - return;
>> -
>> if (device->node_type == RDMA_NODE_IB_SWITCH) {
>> start = 0;
>> end = 0;
>> @@ -3069,6 +3066,9 @@ static void ib_mad_init_device(struct ib_device *device)
>> }
>>
>> for (i = start; i <= end; i++) {
>> + if (!cap_ib_mad(device, i))
>> + continue;
>> +
>
> I would prefer to see these changes in control flow as dedicated
> patches, at the top of your patch stack.
>
> For this kind of work a patch should be mechanical changes only, it is
> easier to review that way.
>
> Same comment applies throughout.

Makes sense :-) I will reorganize the sequence and put them last.

Regards,
Michael Wang

>
> Jason
>

2015-04-08 08:24:13

by Michael Wang

Subject: Re: [PATCH v2 10/17] IB/Verbs: Adopt management helpers for IB helpers

On 04/07/2015 08:40 PM, Hefty, Sean wrote:
[snip]
>> @@ -200,11 +200,9 @@ int ib_init_ah_from_wc(struct ib_device *device, u8
>> port_num, struct ib_wc *wc,
>> u32 flow_class;
>> u16 gid_index;
>> int ret;
>> - int is_eth = (rdma_port_get_link_layer(device, port_num) ==
>> - IB_LINK_LAYER_ETHERNET);
>>
>> memset(ah_attr, 0, sizeof *ah_attr);
>> - if (is_eth) {
>> + if (!rdma_transport_ib(device, port_num)) {
>> if (!(wc->wc_flags & IB_WC_GRH))
>> return -EPROTOTYPE;
>>
>> @@ -873,7 +871,7 @@ int ib_resolve_eth_l2_attrs(struct ib_qp *qp,
>> union ib_gid sgid;
>>
>> if ((*qp_attr_mask & IB_QP_AV) &&
>> - (rdma_port_get_link_layer(qp->device, qp_attr->ah_attr.port_num)
>> == IB_LINK_LAYER_ETHERNET)) {
>> + (!rdma_transport_ib(qp->device, qp_attr->ah_attr.port_num))) {
>> ret = ib_query_gid(qp->device, qp_attr->ah_attr.port_num,
>> qp_attr->ah_attr.grh.sgid_index, &sgid);
>> if (ret)
>
> The above checks would be better as:
>
> force_grh = rdma_transport_iboe(...)
>
> They are RoCE/IBoE specific checks.

Got it, will be in next version :-)

Regards,
Michael Wang

>

2015-04-08 08:28:15

by Michael Wang

Subject: Re: [PATCH v2 10/17] IB/Verbs: Adopt management helpers for IB helpers

Hi, Steve

Thanks for the comment :-)

On 04/07/2015 10:16 PM, Steve Wise wrote:
[snip]
>>>
>>> - force_grh = rdma_port_get_link_layer(device, port_num) == IB_LINK_LAYER_ETHERNET;
>>> + force_grh = !rdma_transport_ib(device, port_num);
>>
>> Maybe these tests should be called cap_mandatory_grh - but I'm not
>> really sure how iWarp uses the GRH fields in the AH...
>>
>
> iWARP runs on top of TCP...this SA code is all IB-specific. The reason it was checking for ETHERNET, I think, is for RoCE. So
> this change is totally incorrect, I think, because RoCE is an IB transport, but it runs on ETHERNET.

I guess it's the name 'transport' that is confusing folks... actually (!rdma_transport_ib)
includes RoCE/IBoE, but yes, here it's for IBoE only, so let's change it to
rdma_transport_iboe ;-)
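
To make the difference concrete, here is a minimal sketch, not code from the patch set: the demo_* names and the enum are made up for illustration and only mirror the four new-transport values from the mapping list. It shows that the "not IB" form is also true for iWARP and usNIC ports, while the IBoE form is true for RoCE/IBoE only.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the new-transport values in the mapping list. */
enum demo_transport {
	DEMO_TRANSPORT_IB,
	DEMO_TRANSPORT_IBOE,
	DEMO_TRANSPORT_IWARP,
	DEMO_TRANSPORT_USNIC_UDP,
};

/* v2 form: "not native IB", which also matches iWARP and usNIC ports. */
bool demo_force_grh_not_ib(enum demo_transport t)
{
	return t != DEMO_TRANSPORT_IB;
}

/* Narrowed form: matches IBoE/RoCE ports only. */
bool demo_force_grh_iboe(enum demo_transport t)
{
	return t == DEMO_TRANSPORT_IBOE;
}

/* Shows where the two forms agree and where they differ. */
void demo_force_grh_check(void)
{
	assert(demo_force_grh_not_ib(DEMO_TRANSPORT_IBOE));
	assert(demo_force_grh_iboe(DEMO_TRANSPORT_IBOE));
	assert(demo_force_grh_not_ib(DEMO_TRANSPORT_IWARP));	/* too broad */
	assert(!demo_force_grh_iboe(DEMO_TRANSPORT_IWARP));
}
```

Calling demo_force_grh_check() from a test program exercises both forms; the two only disagree on the non-IB, non-IBoE transports.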

Regards,
Michael Wang

>
> Steve.
>
>
>

2015-04-08 08:30:01

by Michael Wang

Subject: Re: [PATCH v2 11/17] IB/Verbs: Reform link_layer_show() and ib_uverbs_query_port()



On 04/07/2015 08:49 PM, Hefty, Sean wrote:
[snip]
>> @@ -515,8 +515,10 @@ ssize_t ib_uverbs_query_port(struct ib_uverbs_file
>> *file,
>> resp.active_width = attr.active_width;
>> resp.active_speed = attr.active_speed;
>> resp.phys_state = attr.phys_state;
>> - resp.link_layer = rdma_port_get_link_layer(file->device-
>>> ib_dev,
>> - cmd.port_num);
>> + resp.link_layer = rdma_transport_ib(file->device->ib_dev,
>> + cmd.port_num) ?
>> + IB_LINK_LAYER_INFINIBAND :
>> + IB_LINK_LAYER_ETHERNET;
>>
>> if (copy_to_user((void __user *) (unsigned long) cmd.response,
>> &resp, sizeof resp))
>
> Both of the above check the transport in order to determine the link layer.
>
> These values are exposed to user space. Does anyone know what link layer iWarp returns to user space?

It should be ETH for IWARP according to the old logic :-)
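
As a rough sketch of what "the old logic" means here (an assumption-labelled model, with made-up demo_* names, of how the link layer was derived before this series): if the driver supplies a per-port callback it decides, otherwise the link layer is inferred from the old transport, and iWARP falls into the Ethernet bucket.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; not the kernel's real enums. */
enum demo_link_layer { DEMO_LL_UNSPECIFIED, DEMO_LL_INFINIBAND, DEMO_LL_ETHERNET };
enum demo_old_transport { DEMO_OLD_IB, DEMO_OLD_IWARP, DEMO_OLD_USNIC_UDP };

struct demo_device {
	enum demo_old_transport transport;
	/* Optional per-port override; NULL when the driver has none. */
	enum demo_link_layer (*get_link_layer)(const struct demo_device *dev,
					       unsigned char port_num);
};

/* Model of the old lookup: driver callback first, then infer from transport. */
enum demo_link_layer demo_port_get_link_layer(const struct demo_device *dev,
					      unsigned char port_num)
{
	if (dev->get_link_layer)
		return dev->get_link_layer(dev, port_num);

	switch (dev->transport) {
	case DEMO_OLD_IB:
		return DEMO_LL_INFINIBAND;
	case DEMO_OLD_IWARP:
	case DEMO_OLD_USNIC_UDP:
		return DEMO_LL_ETHERNET;
	}
	return DEMO_LL_UNSPECIFIED;
}

/* Made-up override for a RoCE-like driver whose ports are all Ethernet. */
enum demo_link_layer demo_always_ethernet(const struct demo_device *dev,
					  unsigned char port_num)
{
	(void)dev;
	(void)port_num;
	return DEMO_LL_ETHERNET;
}
```

Under this model an iWARP device without a callback reports Ethernet, which matches what the ternary in the quoted hunk would return for any non-IB port.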

Regards,
Michael Wang

>

2015-04-08 08:39:11

by Michael Wang

Subject: Re: [PATCH v2 13/17] IB/Verbs: Reform cma/ucma with management helpers


On 04/07/2015 11:11 PM, Steve Wise wrote:
[snip]
>> @@ -1006,17 +997,14 @@ static void cma_leave_mc_groups(struct rdma_id_private *id_priv)
>> mc = container_of(id_priv->mc_list.next,
>> struct cma_multicast, list);
>> list_del(&mc->list);
>> - switch (rdma_port_get_link_layer(id_priv->cma_dev->device, id_priv->id.port_num)) {
>> - case IB_LINK_LAYER_INFINIBAND:
>> + if (rdma_transport_ib(id_priv->cma_dev->device,
>> + id_priv->id.port_num)) {
>> ib_sa_free_multicast(mc->multicast.ib);
>> kfree(mc);
>> break;
>> - case IB_LINK_LAYER_ETHERNET:
>> + } else if (rdma_transport_ib(id_priv->cma_dev->device,
>> + id_priv->id.port_num))
>> kref_put(&mc->mcref, release_mc);
>> - break;
>> - default:
>> - break;
>> - }
>> }
>> }
>>
>
> Doesn't the above change result in:
>
> if (rdma_transport_ib()) {
> } else if (rdma_transport_ib()) {
> }
>

My bad here.. I guess 'else' is enough.

Regards,
Michael Wang

> ????
>
>> @@ -1037,17 +1025,13 @@ void rdma_destroy_id(struct rdma_cm_id *id)
>> mutex_unlock(&id_priv->handler_mutex);
>>
>> if (id_priv->cma_dev) {
>> - switch (rdma_node_get_transport(id_priv->id.device->node_type)) {
>> - case RDMA_TRANSPORT_IB:
>> + if (rdma_ib_mgmt(id_priv->id.device, id_priv->id.port_num)) {
>> if (id_priv->cm_id.ib)
>> ib_destroy_cm_id(id_priv->cm_id.ib);
>> - break;
>> - case RDMA_TRANSPORT_IWARP:
>> + } else if (rdma_transport_iwarp(id_priv->id.device,
>> + id_priv->id.port_num)) {
>> if (id_priv->cm_id.iw)
>> iw_destroy_cm_id(id_priv->cm_id.iw);
>> - break;
>> - default:
>> - break;
>> }
>> cma_leave_mc_groups(id_priv);
>> cma_release_dev(id_priv);
>> @@ -1966,26 +1950,14 @@ int rdma_resolve_route(struct rdma_cm_id *id, int timeout_ms)
>> return -EINVAL;
>>
>> atomic_inc(&id_priv->refcount);
>> - switch (rdma_node_get_transport(id->device->node_type)) {
>> - case RDMA_TRANSPORT_IB:
>> - switch (rdma_port_get_link_layer(id->device, id->port_num)) {
>> - case IB_LINK_LAYER_INFINIBAND:
>> - ret = cma_resolve_ib_route(id_priv, timeout_ms);
>> - break;
>> - case IB_LINK_LAYER_ETHERNET:
>> - ret = cma_resolve_iboe_route(id_priv);
>> - break;
>> - default:
>> - ret = -ENOSYS;
>> - }
>> - break;
>> - case RDMA_TRANSPORT_IWARP:
>> + if (rdma_transport_ib(id->device, id->port_num))
>> + ret = cma_resolve_ib_route(id_priv, timeout_ms);
>> + else if (rdma_transport_iboe(id->device, id->port_num))
>> + ret = cma_resolve_iboe_route(id_priv);
>> + else if (rdma_transport_iwarp(id->device, id->port_num))
>> ret = cma_resolve_iw_route(id_priv, timeout_ms);
>> - break;
>> - default:
>> + else
>> ret = -ENOSYS;
>> - break;
>> - }
>> if (ret)
>> goto err;
>>
>> @@ -2059,7 +2031,7 @@ port_found:
>> goto out;
>>
>> id_priv->id.route.addr.dev_addr.dev_type =
>> - (rdma_port_get_link_layer(cma_dev->device, p) == IB_LINK_LAYER_INFINIBAND) ?
>> + (rdma_transport_ib(cma_dev->device, p)) ?
>> ARPHRD_INFINIBAND : ARPHRD_ETHER;
>>
>> rdma_addr_set_sgid(&id_priv->id.route.addr.dev_addr, &gid);
>> @@ -2536,18 +2508,15 @@ int rdma_listen(struct rdma_cm_id *id, int backlog)
>>
>> id_priv->backlog = backlog;
>> if (id->device) {
>> - switch (rdma_node_get_transport(id->device->node_type)) {
>> - case RDMA_TRANSPORT_IB:
>> + if (rdma_ib_mgmt(id->device, id->port_num)) {
>> ret = cma_ib_listen(id_priv);
>> if (ret)
>> goto err;
>> - break;
>> - case RDMA_TRANSPORT_IWARP:
>> + } else if (rdma_transport_iwarp(id->device, id->port_num)) {
>> ret = cma_iw_listen(id_priv, backlog);
>> if (ret)
>> goto err;
>> - break;
>> - default:
>> + } else {
>> ret = -ENOSYS;
>> goto err;
>> }
>> @@ -2883,20 +2852,15 @@ int rdma_connect(struct rdma_cm_id *id, struct rdma_conn_param *conn_param)
>> id_priv->srq = conn_param->srq;
>> }
>>
>> - switch (rdma_node_get_transport(id->device->node_type)) {
>> - case RDMA_TRANSPORT_IB:
>> + if (rdma_ib_mgmt(id->device, id->port_num)) {
>> if (id->qp_type == IB_QPT_UD)
>> ret = cma_resolve_ib_udp(id_priv, conn_param);
>> else
>> ret = cma_connect_ib(id_priv, conn_param);
>> - break;
>> - case RDMA_TRANSPORT_IWARP:
>> + } else if (rdma_transport_iwarp(id->device, id->port_num))
>> ret = cma_connect_iw(id_priv, conn_param);
>> - break;
>> - default:
>> + else
>> ret = -ENOSYS;
>> - break;
>> - }
>> if (ret)
>> goto err;
>>
>> @@ -2999,8 +2963,7 @@ int rdma_accept(struct rdma_cm_id *id, struct rdma_conn_param *conn_param)
>> id_priv->srq = conn_param->srq;
>> }
>>
>> - switch (rdma_node_get_transport(id->device->node_type)) {
>> - case RDMA_TRANSPORT_IB:
>> + if (rdma_ib_mgmt(id->device, id->port_num)) {
>> if (id->qp_type == IB_QPT_UD) {
>> if (conn_param)
>> ret = cma_send_sidr_rep(id_priv, IB_SIDR_SUCCESS,
>> @@ -3016,14 +2979,10 @@ int rdma_accept(struct rdma_cm_id *id, struct rdma_conn_param *conn_param)
>> else
>> ret = cma_rep_recv(id_priv);
>> }
>> - break;
>> - case RDMA_TRANSPORT_IWARP:
>> + } else if (rdma_transport_iwarp(id->device, id->port_num))
>> ret = cma_accept_iw(id_priv, conn_param);
>> - break;
>> - default:
>> + else
>> ret = -ENOSYS;
>> - break;
>> - }
>>
>> if (ret)
>> goto reject;
>> @@ -3067,8 +3026,7 @@ int rdma_reject(struct rdma_cm_id *id, const void *private_data,
>> if (!id_priv->cm_id.ib)
>> return -EINVAL;
>>
>> - switch (rdma_node_get_transport(id->device->node_type)) {
>> - case RDMA_TRANSPORT_IB:
>> + if (rdma_ib_mgmt(id->device, id->port_num)) {
>> if (id->qp_type == IB_QPT_UD)
>> ret = cma_send_sidr_rep(id_priv, IB_SIDR_REJECT, 0,
>> private_data, private_data_len);
>> @@ -3076,15 +3034,11 @@ int rdma_reject(struct rdma_cm_id *id, const void *private_data,
>> ret = ib_send_cm_rej(id_priv->cm_id.ib,
>> IB_CM_REJ_CONSUMER_DEFINED, NULL,
>> 0, private_data, private_data_len);
>> - break;
>> - case RDMA_TRANSPORT_IWARP:
>> + } else if (rdma_transport_iwarp(id->device, id->port_num)) {
>> ret = iw_cm_reject(id_priv->cm_id.iw,
>> private_data, private_data_len);
>> - break;
>> - default:
>> + } else
>> ret = -ENOSYS;
>> - break;
>> - }
>> return ret;
>> }
>> EXPORT_SYMBOL(rdma_reject);
>> @@ -3098,22 +3052,17 @@ int rdma_disconnect(struct rdma_cm_id *id)
>> if (!id_priv->cm_id.ib)
>> return -EINVAL;
>>
>> - switch (rdma_node_get_transport(id->device->node_type)) {
>> - case RDMA_TRANSPORT_IB:
>> + if (rdma_ib_mgmt(id->device, id->port_num)) {
>> ret = cma_modify_qp_err(id_priv);
>> if (ret)
>> goto out;
>> /* Initiate or respond to a disconnect. */
>> if (ib_send_cm_dreq(id_priv->cm_id.ib, NULL, 0))
>> ib_send_cm_drep(id_priv->cm_id.ib, NULL, 0);
>> - break;
>> - case RDMA_TRANSPORT_IWARP:
>> + } else if (rdma_transport_iwarp(id->device, id->port_num)) {
>> ret = iw_cm_disconnect(id_priv->cm_id.iw, 0);
>> - break;
>> - default:
>> + } else
>> ret = -EINVAL;
>> - break;
>> - }
>> out:
>> return ret;
>> }
>> @@ -3359,24 +3308,13 @@ int rdma_join_multicast(struct rdma_cm_id *id, struct sockaddr *addr,
>> list_add(&mc->list, &id_priv->mc_list);
>> spin_unlock(&id_priv->lock);
>>
>> - switch (rdma_node_get_transport(id->device->node_type)) {
>> - case RDMA_TRANSPORT_IB:
>> - switch (rdma_port_get_link_layer(id->device, id->port_num)) {
>> - case IB_LINK_LAYER_INFINIBAND:
>> - ret = cma_join_ib_multicast(id_priv, mc);
>> - break;
>> - case IB_LINK_LAYER_ETHERNET:
>> - kref_init(&mc->mcref);
>> - ret = cma_iboe_join_multicast(id_priv, mc);
>> - break;
>> - default:
>> - ret = -EINVAL;
>> - }
>> - break;
>> - default:
>> + if (rdma_transport_iboe(id->device, id->port_num)) {
>> + kref_init(&mc->mcref);
>> + ret = cma_iboe_join_multicast(id_priv, mc);
>> + } else if (rdma_transport_ib(id->device, id->port_num))
>> + ret = cma_join_ib_multicast(id_priv, mc);
>> + else
>> ret = -ENOSYS;
>> - break;
>> - }
>>
>> if (ret) {
>> spin_lock_irq(&id_priv->lock);
>> @@ -3404,19 +3342,17 @@ void rdma_leave_multicast(struct rdma_cm_id *id, struct sockaddr *addr)
>> ib_detach_mcast(id->qp,
>> &mc->multicast.ib->rec.mgid,
>> be16_to_cpu(mc->multicast.ib->rec.mlid));
>> - if (rdma_node_get_transport(id_priv->cma_dev->device->node_type) == RDMA_TRANSPORT_IB) {
>> - switch (rdma_port_get_link_layer(id->device, id->port_num)) {
>> - case IB_LINK_LAYER_INFINIBAND:
>> - ib_sa_free_multicast(mc->multicast.ib);
>> - kfree(mc);
>> - break;
>> - case IB_LINK_LAYER_ETHERNET:
>> - kref_put(&mc->mcref, release_mc);
>> - break;
>> - default:
>> - break;
>> - }
>> - }
>> +
>> + /* Will this happen? */
>> + BUG_ON(id_priv->cma_dev->device != id->device);
>> +
>> + if (rdma_transport_ib(id->device, id->port_num)) {
>> + ib_sa_free_multicast(mc->multicast.ib);
>> + kfree(mc);
>> + } else if (rdma_transport_iboe(id->device,
>> + id->port_num))
>> + kref_put(&mc->mcref, release_mc);
>> +
>> return;
>> }
>> }
>> diff --git a/drivers/infiniband/core/ucma.c b/drivers/infiniband/core/ucma.c
>> index 45d67e9..42c9bf6 100644
>> --- a/drivers/infiniband/core/ucma.c
>> +++ b/drivers/infiniband/core/ucma.c
>> @@ -722,26 +722,13 @@ static ssize_t ucma_query_route(struct ucma_file *file,
>>
>> resp.node_guid = (__force __u64) ctx->cm_id->device->node_guid;
>> resp.port_num = ctx->cm_id->port_num;
>> - switch (rdma_node_get_transport(ctx->cm_id->device->node_type)) {
>> - case RDMA_TRANSPORT_IB:
>> - switch (rdma_port_get_link_layer(ctx->cm_id->device,
>> - ctx->cm_id->port_num)) {
>> - case IB_LINK_LAYER_INFINIBAND:
>> - ucma_copy_ib_route(&resp, &ctx->cm_id->route);
>> - break;
>> - case IB_LINK_LAYER_ETHERNET:
>> - ucma_copy_iboe_route(&resp, &ctx->cm_id->route);
>> - break;
>> - default:
>> - break;
>> - }
>> - break;
>> - case RDMA_TRANSPORT_IWARP:
>> +
>> + if (rdma_transport_ib(ctx->cm_id->device, ctx->cm_id->port_num))
>> + ucma_copy_ib_route(&resp, &ctx->cm_id->route);
>> + else if (rdma_transport_iboe(ctx->cm_id->device, ctx->cm_id->port_num))
>> + ucma_copy_iboe_route(&resp, &ctx->cm_id->route);
>> + else if (rdma_transport_iwarp(ctx->cm_id->device, ctx->cm_id->port_num))
>> ucma_copy_iw_route(&resp, &ctx->cm_id->route);
>> - break;
>> - default:
>> - break;
>> - }
>>
>> out:
>> if (copy_to_user((void __user *)(unsigned long)cmd.response,
>> --
>> 2.1.0
>

2015-04-08 08:41:32

by Michael Wang

Subject: Re: [PATCH v2 02/17] IB/Verbs: Implement raw management helpers


On 04/07/2015 11:25 PM, Hefty, Sean wrote:
>> +static inline int rdma_transport_ib(struct ib_device *device, u8
>> port_num)
>> +{
>> + return device->query_transport(device, port_num)
>> + == RDMA_TRANSPORT_IB;
>> +}
>> +
>> +static inline int rdma_transport_iboe(struct ib_device *device, u8
>> port_num)
>> +{
>> + return device->query_transport(device, port_num)
>> + == RDMA_TRANSPORT_IBOE;
>> +}
>
> We need to do something with the function names to make their use more obvious. Both IB and IBoE have transport IB. I think Jason suggested rdma_tech_ib / rdma_tech_iboe.
>
> Regarding transport types, I believe that usnic supports 2 different transports. Although usnic isn't used by anything else in the core layer, we should probably be able to handle a device that supports multiple protocols. I'm not sure what the 'transport' should be for iWarp, since iWarp is layered over TCP. But that may just mean that the term transport isn't great.

Agreed, it does confuse folks; I will use 'tech' instead in the next version :-)
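
A minimal sketch of how the renamed helpers might look, assuming they keep wrapping a per-port query_transport() callback as in the quoted hunk. The demo_* names, the enum, and the dual-port test device are made up for illustration; the real names in the next version may differ.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical "tech" values mirroring the new-transport column. */
enum demo_tech {
	DEMO_TECH_IB,
	DEMO_TECH_IBOE,
	DEMO_TECH_IWARP,
	DEMO_TECH_USNIC_UDP,
};

struct demo_device {
	/* Per-port query, modelled on the query_transport() callback. */
	enum demo_tech (*query_transport)(const struct demo_device *dev,
					  unsigned char port_num);
};

bool demo_tech_ib(const struct demo_device *dev, unsigned char port_num)
{
	return dev->query_transport(dev, port_num) == DEMO_TECH_IB;
}

bool demo_tech_iboe(const struct demo_device *dev, unsigned char port_num)
{
	return dev->query_transport(dev, port_num) == DEMO_TECH_IBOE;
}

/* Made-up dual-port device: port 1 is native IB, any other port is IBoE. */
enum demo_tech demo_dual_port_query(const struct demo_device *dev,
				    unsigned char port_num)
{
	(void)dev;
	return port_num == 1 ? DEMO_TECH_IB : DEMO_TECH_IBOE;
}
```

Because the query is per port, a device with mixed ports (as listed for mlx4 in the mapping list) answers differently for each port without any separate link-layer check.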

Regards,
Michael Wang

>
> - Sean
>

2015-04-08 08:51:36

by Michael Wang

Subject: Re: [PATCH v2 09/17] IB/Verbs: Use helper cap_read_multi_sge() and reform svc_rdma_accept()

On 04/07/2015 07:42 PM, Jason Gunthorpe wrote:
[snip]
>>> @@ -992,8 +992,8 @@ static struct svc_xprt *svc_rdma_accept(struct svc_xprt *xprt)
>>> dma_mr_acc = IB_ACCESS_LOCAL_WRITE;
>>> } else
>>> need_dma_mr = 0;
>>> - break;
>>> - case RDMA_TRANSPORT_IB:
>>> + } else if (rdma_ib_mgmt(newxprt->sc_cm_id->device,
>>> + newxprt->sc_cm_id->port_num)) {
>>> if (!(newxprt->sc_dev_caps & SVCRDMA_DEVCAP_FAST_REG)) {
>>> need_dma_mr = 1;
>>> dma_mr_acc = IB_ACCESS_LOCAL_WRITE;
>>
>> Now I'm even more confused. How is the presence of IB management
>> related to needing a privileged lmr?
>
> Agree, this needs to be someone else.
>
> I think the test is probably based on this comment:
>
> * NB: iWARP requires remote write access for the data sink
> * of an RDMA_READ. IB does not.
>
> So the if should be:
>
> if (cap_rdma_read_needs_write(..) &&
> !(newxprt->sc_dev_caps & SVCRDMA_DEVCAP_FAST_REG)) {
> need_dma_mr = 1;
> dma_mr_acc =
> (IB_ACCESS_LOCAL_WRITE |
> IB_ACCESS_REMOTE_WRITE);
>
> And the identical if blocks merged.
>
> Plus the
> if (rdma_transport_iwarp(newxprt->sc_cm_id->device,
> newxprt->sc_cm_id->port_num))
> newxprt->sc_dev_caps |= SVCRDMA_DEVCAP_READ_W_INV

Sounds good :-) I'll rework this part in the next version.
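
One possible reading of the merged block, as a sketch only: the flag values and demo_* names below are invented, and read_needs_write stands in for whatever cap_rdma_read_needs_write() ends up returning. Without fast-reg support a DMA MR is needed with local write access, and remote write access is added only when the RDMA_READ data sink requires it (the iWARP case in the quoted comment).

```c
#include <assert.h>
#include <stdbool.h>

/* Invented flag values for illustration only. */
#define DEMO_ACCESS_LOCAL_WRITE		(1u << 0)
#define DEMO_ACCESS_REMOTE_WRITE	(1u << 1)
#define DEMO_DEVCAP_FAST_REG		(1u << 0)

struct demo_mr_plan {
	bool need_dma_mr;
	unsigned int dma_mr_acc;
};

/*
 * Merged decision: one fast-reg test, with remote write access added
 * only when an RDMA_READ sink needs it.
 */
struct demo_mr_plan demo_plan_dma_mr(unsigned int dev_caps, bool read_needs_write)
{
	struct demo_mr_plan plan = { false, 0 };

	if (!(dev_caps & DEMO_DEVCAP_FAST_REG)) {
		plan.need_dma_mr = true;
		plan.dma_mr_acc = DEMO_ACCESS_LOCAL_WRITE;
		if (read_needs_write)
			plan.dma_mr_acc |= DEMO_ACCESS_REMOTE_WRITE;
	}
	return plan;
}
```

This keeps the per-technology difference in a single capability flag instead of two near-identical branches.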

Regards,
Michael Wang

>
> Jason
>

2015-04-08 09:37:06

by Michael Wang

Subject: Re: [PATCH v2 13/17] IB/Verbs: Reform cma/ucma with management helpers

Hi, Sean

Thanks for the review :-) cma is the toughest part of the
reform, and I really need some guidance here.


On 04/07/2015 11:36 PM, Hefty, Sean wrote:
>> diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
>> index d8a8ea7..c23f483 100644
>> --- a/drivers/infiniband/core/cma.c
>> +++ b/drivers/infiniband/core/cma.c
>> @@ -435,10 +435,10 @@ static int cma_resolve_ib_dev(struct rdma_id_private
>> *id_priv)
>> pkey = ntohs(addr->sib_pkey);
>>
>> list_for_each_entry(cur_dev, &dev_list, list) {
>> - if (rdma_node_get_transport(cur_dev->device->node_type) !=
>> RDMA_TRANSPORT_IB)
>> - continue;
>> -
>> for (p = 1; p <= cur_dev->device->phys_port_cnt; ++p) {
>> + if (!rdma_ib_mgmt(cur_dev->device, p))
>> + continue;
>
> This check wants to be something like is_af_ib_supported(). Checking for IB transport may actually be better than checking for IB management. I don't know if IBoE/RoCE devices support AF_IB.

The wrapper makes sense, but do we have a guarantee that an IBoE port won't
be used for an AF_IB address? I just can't locate the place where we filter it out...

>
[snip]
>> - == IB_LINK_LAYER_ETHERNET) {
>> + /* Will this happen? */
>> + BUG_ON(id_priv->cma_dev->device != id_priv->id.device);
>
> This shouldn't happen. The BUG_ON looks okay.

Got it :-)

>
>
>> + if (rdma_transport_iboe(id_priv->id.device, id_priv->id.port_num)) {
>> ret = rdma_addr_find_smac_by_sgid(&sgid, qp_attr.smac, NULL);
>>
>> if (ret)
>> @@ -700,8 +700,7 @@ static int cma_ib_init_qp_attr(struct rdma_id_private
>> *id_priv,
>> int ret;
>> u16 pkey;
>>
>> - if (rdma_port_get_link_layer(id_priv->id.device, id_priv-
>>> id.port_num) ==
>> - IB_LINK_LAYER_INFINIBAND)
>> + if (rdma_transport_ib(id_priv->id.device, id_priv->id.port_num))
>> pkey = ib_addr_get_pkey(dev_addr);
>> else
>> pkey = 0xffff;
>
> Check here should be against the link layer, not transport.

I guess the name is confusing us again... what if we use rdma_tech_ib() here?
It's the only tech using the IB link layer; the others are all ETH.

>
>
>> @@ -735,8 +734,7 @@ int rdma_init_qp_attr(struct rdma_cm_id *id, struct
[snip]
>>
>> static void cma_cancel_route(struct rdma_id_private *id_priv)
>> {
>> - switch (rdma_port_get_link_layer(id_priv->id.device, id_priv-
>>> id.port_num)) {
>> - case IB_LINK_LAYER_INFINIBAND:
>> + if (rdma_transport_ib(id_priv->id.device, id_priv->id.port_num)) {
>
> The check should be cap_ib_sa()

Got it, will be in next version :-)

All the mcast/sa suggestions below will be applied too.

>
[snip]
>>
>> id_priv->id.route.addr.dev_addr.dev_type =
>> - (rdma_port_get_link_layer(cma_dev->device, p) ==
>> IB_LINK_LAYER_INFINIBAND) ?
>> + (rdma_transport_ib(cma_dev->device, p)) ?
>> ARPHRD_INFINIBAND : ARPHRD_ETHER;
>
> This wants the link layer, or maybe use cap_ipoib.

Is this related to ipoib only?

>
>
>>
>> rdma_addr_set_sgid(&id_priv->id.route.addr.dev_addr, &gid);
>> @@ -2536,18 +2508,15 @@ int rdma_listen(struct rdma_cm_id *id, int
>> backlog)
>>
>> id_priv->backlog = backlog;
>> if (id->device) {
>> - switch (rdma_node_get_transport(id->device->node_type)) {
>> - case RDMA_TRANSPORT_IB:
>> + if (rdma_ib_mgmt(id->device, id->port_num)) {
>
> Want cap_ib_cm()

Will be in the next version :-) and the other cap_ib_cm() suggestions too.

>
>
>> ret = cma_ib_listen(id_priv);
[snip]
>> @@ -3016,14 +2979,10 @@ int rdma_accept(struct rdma_cm_id *id, struct
>> rdma_conn_param *conn_param)
>> else
>> ret = cma_rep_recv(id_priv);
>> }
>> - break;
>> - case RDMA_TRANSPORT_IWARP:
>> + } else if (rdma_transport_iwarp(id->device, id->port_num))
>> ret = cma_accept_iw(id_priv, conn_param);
>
> If cap_ib_cm() is used in the places marked above, maybe add a cap_iw_cm() for the else conditions.

Sounds good, will be in next version :-)

Regards,
Michael Wang

>
>
>> - break;
>> - default:
>> + else
>> ret = -ENOSYS;
>> - break;
>> - }
>>
>> if (ret)
>> goto reject;
>> @@ -3067,8 +3026,7 @@ int rdma_reject(struct rdma_cm_id *id, const void
>> *private_data,
>> if (!id_priv->cm_id.ib)
>> return -EINVAL;
>>
>> - switch (rdma_node_get_transport(id->device->node_type)) {
>> - case RDMA_TRANSPORT_IB:
>> + if (rdma_ib_mgmt(id->device, id->port_num)) {
>
> cap_ib_cm()
>
>
>> if (id->qp_type == IB_QPT_UD)
>> ret = cma_send_sidr_rep(id_priv, IB_SIDR_REJECT, 0,
>> private_data, private_data_len);
>> @@ -3076,15 +3034,11 @@ int rdma_reject(struct rdma_cm_id *id, const void
>> *private_data,
>> ret = ib_send_cm_rej(id_priv->cm_id.ib,
>> IB_CM_REJ_CONSUMER_DEFINED, NULL,
>> 0, private_data, private_data_len);
>> - break;
>> - case RDMA_TRANSPORT_IWARP:
>> + } else if (rdma_transport_iwarp(id->device, id->port_num)) {
>> ret = iw_cm_reject(id_priv->cm_id.iw,
>> private_data, private_data_len);
>> - break;
>> - default:
>> + } else
>> ret = -ENOSYS;
>> - break;
>> - }
>> return ret;
>> }
>> EXPORT_SYMBOL(rdma_reject);
>> @@ -3098,22 +3052,17 @@ int rdma_disconnect(struct rdma_cm_id *id)
>> if (!id_priv->cm_id.ib)
>> return -EINVAL;
>>
>> - switch (rdma_node_get_transport(id->device->node_type)) {
>> - case RDMA_TRANSPORT_IB:
>> + if (rdma_ib_mgmt(id->device, id->port_num)) {
>> ret = cma_modify_qp_err(id_priv);
>> if (ret)
>> goto out;
>> /* Initiate or respond to a disconnect. */
>> if (ib_send_cm_dreq(id_priv->cm_id.ib, NULL, 0))
>> ib_send_cm_drep(id_priv->cm_id.ib, NULL, 0);
>
> cap_ib_cm()
>
>
>> - break;
>> - case RDMA_TRANSPORT_IWARP:
>> + } else if (rdma_transport_iwarp(id->device, id->port_num)) {
>> ret = iw_cm_disconnect(id_priv->cm_id.iw, 0);
>> - break;
>> - default:
>> + } else
>> ret = -EINVAL;
>> - break;
>> - }
>> out:
>> return ret;
>> }
>> @@ -3359,24 +3308,13 @@ int rdma_join_multicast(struct rdma_cm_id *id,
>> struct sockaddr *addr,
>> list_add(&mc->list, &id_priv->mc_list);
>> spin_unlock(&id_priv->lock);
>>
>> - switch (rdma_node_get_transport(id->device->node_type)) {
>> - case RDMA_TRANSPORT_IB:
>> - switch (rdma_port_get_link_layer(id->device, id->port_num)) {
>> - case IB_LINK_LAYER_INFINIBAND:
>> - ret = cma_join_ib_multicast(id_priv, mc);
>> - break;
>> - case IB_LINK_LAYER_ETHERNET:
>> - kref_init(&mc->mcref);
>> - ret = cma_iboe_join_multicast(id_priv, mc);
>> - break;
>> - default:
>> - ret = -EINVAL;
>> - }
>> - break;
>> - default:
>> + if (rdma_transport_iboe(id->device, id->port_num)) {
>> + kref_init(&mc->mcref);
>> + ret = cma_iboe_join_multicast(id_priv, mc);
>> + } else if (rdma_transport_ib(id->device, id->port_num))
>> + ret = cma_join_ib_multicast(id_priv, mc);
>
> cap_ib_mcast()
>
>
>> + else
>> ret = -ENOSYS;
>> - break;
>> - }
>>
>> if (ret) {
>> spin_lock_irq(&id_priv->lock);
>> @@ -3404,19 +3342,17 @@ void rdma_leave_multicast(struct rdma_cm_id *id,
>> struct sockaddr *addr)
>> ib_detach_mcast(id->qp,
>> &mc->multicast.ib->rec.mgid,
>> be16_to_cpu(mc->multicast.ib-
>>> rec.mlid));
>> - if (rdma_node_get_transport(id_priv->cma_dev->device-
>>> node_type) == RDMA_TRANSPORT_IB) {
>> - switch (rdma_port_get_link_layer(id->device, id-
>>> port_num)) {
>> - case IB_LINK_LAYER_INFINIBAND:
>> - ib_sa_free_multicast(mc->multicast.ib);
>> - kfree(mc);
>> - break;
>> - case IB_LINK_LAYER_ETHERNET:
>> - kref_put(&mc->mcref, release_mc);
>> - break;
>> - default:
>> - break;
>> - }
>> - }
>> +
>> + /* Will this happen? */
>> + BUG_ON(id_priv->cma_dev->device != id->device);
>
> Should not happen
>
>> +
>> + if (rdma_transport_ib(id->device, id->port_num)) {
>> + ib_sa_free_multicast(mc->multicast.ib);
>> + kfree(mc);
>
> cap_ib_mcast()
>
>
>> + } else if (rdma_transport_iboe(id->device,
>> + id->port_num))
>> + kref_put(&mc->mcref, release_mc);
>> +
>> return;
>> }
>> }
>> diff --git a/drivers/infiniband/core/ucma.c
>> b/drivers/infiniband/core/ucma.c
>> index 45d67e9..42c9bf6 100644
>> --- a/drivers/infiniband/core/ucma.c
>> +++ b/drivers/infiniband/core/ucma.c
>> @@ -722,26 +722,13 @@ static ssize_t ucma_query_route(struct ucma_file
>> *file,
>>
>> resp.node_guid = (__force __u64) ctx->cm_id->device->node_guid;
>> resp.port_num = ctx->cm_id->port_num;
>> - switch (rdma_node_get_transport(ctx->cm_id->device->node_type)) {
>> - case RDMA_TRANSPORT_IB:
>> - switch (rdma_port_get_link_layer(ctx->cm_id->device,
>> - ctx->cm_id->port_num)) {
>> - case IB_LINK_LAYER_INFINIBAND:
>> - ucma_copy_ib_route(&resp, &ctx->cm_id->route);
>> - break;
>> - case IB_LINK_LAYER_ETHERNET:
>> - ucma_copy_iboe_route(&resp, &ctx->cm_id->route);
>> - break;
>> - default:
>> - break;
>> - }
>> - break;
>> - case RDMA_TRANSPORT_IWARP:
>> +
>> + if (rdma_transport_ib(ctx->cm_id->device, ctx->cm_id->port_num))
>> + ucma_copy_ib_route(&resp, &ctx->cm_id->route);
>
> cap_ib_sa()
>
>
>> + else if (rdma_transport_iboe(ctx->cm_id->device, ctx->cm_id-
>>> port_num))
>> + ucma_copy_iboe_route(&resp, &ctx->cm_id->route);
>> + else if (rdma_transport_iwarp(ctx->cm_id->device, ctx->cm_id-
>>> port_num))
>> ucma_copy_iw_route(&resp, &ctx->cm_id->route);
>> - break;
>> - default:
>> - break;
>> - }
>>
>> out:
>> if (copy_to_user((void __user *)(unsigned long)cmd.response,
>
>
> - Sean
>

2015-04-08 11:38:33

by Tom Talpey

Subject: Re: [PATCH v2 00/17] IB/Verbs: IB Management Helpers

On 4/7/2015 8:25 AM, Michael Wang wrote:
> Mapping List:
>           node-type  link-layer  old-transport  new-transport
> nes       RNIC       ETH         IWARP          IWARP
> amso1100  RNIC       ETH         IWARP          IWARP
> cxgb3     RNIC       ETH         IWARP          IWARP
> cxgb4     RNIC       ETH         IWARP          IWARP
> usnic     USNIC_UDP  ETH         USNIC_UDP      USNIC_UDP
> ocrdma    IB_CA      ETH         IB             IBOE
> mlx4      IB_CA      IB/ETH      IB             IB/IBOE
> mlx5      IB_CA      IB          IB             IB
> ehca      IB_CA      IB          IB             IB
> ipath     IB_CA      IB          IB             IB
> mthca     IB_CA      IB          IB             IB
> qib       IB_CA      IB          IB             IB

Can I rewind to ask a high-level question - what's the testing
plan for all of this? Do you have folks lined up for verifying
each of these adapters/networks, and what tests will they run?




2015-04-08 12:41:22

by Michael Wang

Subject: Re: [PATCH v2 00/17] IB/Verbs: IB Management Helpers



On 04/08/2015 01:38 PM, Tom Talpey wrote:
> On 4/7/2015 8:25 AM, Michael Wang wrote:
>> Mapping List:
>>           node-type  link-layer  old-transport  new-transport
>> nes       RNIC       ETH         IWARP          IWARP
>> amso1100  RNIC       ETH         IWARP          IWARP
>> cxgb3     RNIC       ETH         IWARP          IWARP
>> cxgb4     RNIC       ETH         IWARP          IWARP
>> usnic     USNIC_UDP  ETH         USNIC_UDP      USNIC_UDP
>> ocrdma    IB_CA      ETH         IB             IBOE
>> mlx4      IB_CA      IB/ETH      IB             IB/IBOE
>> mlx5      IB_CA      IB          IB             IB
>> ehca      IB_CA      IB          IB             IB
>> ipath     IB_CA      IB          IB             IB
>> mthca     IB_CA      IB          IB             IB
>> qib       IB_CA      IB          IB             IB
>
> Can I rewind to ask a high-level question - what's the testing
> plan for all of this? Do you have folks lined up for verifying
> each of these adapters/networks, and what tests will they run?

I don't think anyone has access to all of this hardware, so we can
only depend on those who happen to have some of it to help with the
testing, but we are still far from that stage..

Besides, the logic in this part is not very complex, and every entry
in the mapping has corresponding code as proof, so careful review
followed by public testing on some tree could be a plan too?

But yes, I won't be able to do exhaustive testing by myself,
and no one is backing me on that currently :-P I too am waiting
for answers on how to assure the quality of a patch set like this...

Regards,
Michael Wang

>
>
>

2015-04-08 15:52:11

by Jason Gunthorpe

Subject: Re: [PATCH v2 00/17] IB/Verbs: IB Management Helpers

On Wed, Apr 08, 2015 at 02:41:18PM +0200, Michael Wang wrote:

> I think no one can have the access to all these hardware, so we can
> only depends on those who accidentally have one to help the testing,
> but it's still far from that stage..

I have seen other patches in this style use the compiler to do the
check, if the patch doesn't change the compiled output then it is
obviously OK.

Some careful use of macros might make that possible, but it is a fair
amount of work.

However, that may be the only way to get something this invasive
applied, especially since we've already seen mistakes in the manual
transforms :|

Jason

2015-04-08 16:05:47

by Michael Wang

Subject: Re: [PATCH v2 00/17] IB/Verbs: IB Management Helpers

On 04/08/2015 05:51 PM, Jason Gunthorpe wrote:
> On Wed, Apr 08, 2015 at 02:41:18PM +0200, Michael Wang wrote:
>
>> I think no one can have the access to all these hardware, so we can
>> only depends on those who accidentally have one to help the testing,
>> but it's still far from that stage..
>
> I have seen other patches in this style use the compiler to do the
> check, if the patch doesn't change the compiled output then it is
> obviously OK.
>
> Some careful use of macros might make that possible, but it is a fair
> amount of work.
>
> However, that may be the only way to get something this invasive
> applied, especially since we've already seen mistakes in the manual
> transforms :|

Makes sense. I may be able to test with mlx4 in our lab, but IMHO
careful review may be more reliable than incomplete testing in
this case. If we have some tree or branch for next staging, that
could be a good place for public testing, but it hasn't reached
that stage yet ;-)

Regards,
Michael Wang


>
> Jason
>

2015-04-08 17:02:22

by Hefty, Sean

Subject: RE: [PATCH v2 13/17] IB/Verbs: Reform cma/ucma with management helpers

> On 04/07/2015 11:36 PM, Hefty, Sean wrote:
> >> diff --git a/drivers/infiniband/core/cma.c
> b/drivers/infiniband/core/cma.c
> >> index d8a8ea7..c23f483 100644
> >> --- a/drivers/infiniband/core/cma.c
> >> +++ b/drivers/infiniband/core/cma.c
> >> @@ -435,10 +435,10 @@ static int cma_resolve_ib_dev(struct
> rdma_id_private
> >> *id_priv)
> >> pkey = ntohs(addr->sib_pkey);
> >>
> >> list_for_each_entry(cur_dev, &dev_list, list) {
> >> - if (rdma_node_get_transport(cur_dev->device->node_type) !=
> >> RDMA_TRANSPORT_IB)
> >> - continue;
> >> -
> >> for (p = 1; p <= cur_dev->device->phys_port_cnt; ++p) {
> >> + if (!rdma_ib_mgmt(cur_dev->device, p))
> >> + continue;
> >
> > This check wants to be something like is_af_ib_supported(). Checking
> for IB transport may actually be better than checking for IB management.
> I don't know if IBoE/RoCE devices support AF_IB.
>
> The wrapper make sense, but do we have the guarantee that IBoE port won't
> be used for AF_IB address? I just can't locate the place we filtered it
> out...

I can't think of a reason why IBoE wouldn't work with AF_IB, but I'm not
sure if anyone has tested it. The original check would have let IBoE
through. When I suggested checking for IB transport, I meant the actual
transport protocol, which would have included both IB and IBoE.

> >> @@ -700,8 +700,7 @@ static int cma_ib_init_qp_attr(struct
> rdma_id_private
> >> *id_priv,
> >> int ret;
> >> u16 pkey;
> >>
> >> - if (rdma_port_get_link_layer(id_priv->id.device, id_priv-
> >>> id.port_num) ==
> >> - IB_LINK_LAYER_INFINIBAND)
> >> + if (rdma_transport_ib(id_priv->id.device, id_priv->id.port_num))
> >> pkey = ib_addr_get_pkey(dev_addr);
> >> else
> >> pkey = 0xffff;
> >
> > Check here should be against the link layer, not transport.
>
> I guess the name confusing us again... what if use rdma_tech_ib() here?
> it's the only tech using IB link layers, others are all ETH.

Yes, that would work.

> >> id_priv->id.route.addr.dev_addr.dev_type =
> >> - (rdma_port_get_link_layer(cma_dev->device, p) ==
> >> IB_LINK_LAYER_INFINIBAND) ?
> >> + (rdma_transport_ib(cma_dev->device, p)) ?
> >> ARPHRD_INFINIBAND : ARPHRD_ETHER;
> >
> > This wants the link layer, or maybe use cap_ipoib.
>
> Is this related with ipoib only?

ARPHDR_INFINIBAND is related to ipoib. In your next update, maybe go
with tech_ib. I don't know the status of ipoib over iboe.

2015-04-08 18:30:44

by Doug Ledford

Subject: Re: [PATCH v2 01/17] IB/Verbs: Implement new callback query_transport() for each HW

On Tue, 2015-04-07 at 14:42 +0200, Michael Wang wrote:
> Add new callback query_transport() and implement for each HW.

My response here is going to be a long email, but that's because it's
easier to respond to the various patches all in one response in order to
preserve context. So, while I'm responding to patch 1 of 17, my
response will cover all 17 patches in whole.

> Mapping List:
>           node-type  link-layer  old-transport  new-transport
> nes       RNIC       ETH         IWARP          IWARP
> amso1100  RNIC       ETH         IWARP          IWARP
> cxgb3     RNIC       ETH         IWARP          IWARP
> cxgb4     RNIC       ETH         IWARP          IWARP
> usnic     USNIC_UDP  ETH         USNIC_UDP      USNIC_UDP
> ocrdma    IB_CA      ETH         IB             IBOE
> mlx4      IB_CA      IB/ETH      IB             IB/IBOE
> mlx5      IB_CA      IB          IB             IB
> ehca      IB_CA      IB          IB             IB
> ipath     IB_CA      IB          IB             IB
> mthca     IB_CA      IB          IB             IB
> qib       IB_CA      IB          IB             IB
>
> Cc: Jason Gunthorpe <[email protected]>
> Cc: Doug Ledford <[email protected]>
> Cc: Ira Weiny <[email protected]>
> Cc: Sean Hefty <[email protected]>
> Signed-off-by: Michael Wang <[email protected]>
> ---
> drivers/infiniband/core/device.c | 1 +
> drivers/infiniband/core/verbs.c | 4 +++-
> drivers/infiniband/hw/amso1100/c2_provider.c | 7 +++++++
> drivers/infiniband/hw/cxgb3/iwch_provider.c | 7 +++++++
> drivers/infiniband/hw/cxgb4/provider.c | 7 +++++++
> drivers/infiniband/hw/ehca/ehca_hca.c | 6 ++++++
> drivers/infiniband/hw/ehca/ehca_iverbs.h | 3 +++
> drivers/infiniband/hw/ehca/ehca_main.c | 1 +
> drivers/infiniband/hw/ipath/ipath_verbs.c | 7 +++++++
> drivers/infiniband/hw/mlx4/main.c | 10 ++++++++++
> drivers/infiniband/hw/mlx5/main.c | 7 +++++++
> drivers/infiniband/hw/mthca/mthca_provider.c | 7 +++++++
> drivers/infiniband/hw/nes/nes_verbs.c | 6 ++++++
> drivers/infiniband/hw/ocrdma/ocrdma_main.c | 1 +
> drivers/infiniband/hw/ocrdma/ocrdma_verbs.c | 6 ++++++
> drivers/infiniband/hw/ocrdma/ocrdma_verbs.h | 3 +++
> drivers/infiniband/hw/qib/qib_verbs.c | 7 +++++++
> drivers/infiniband/hw/usnic/usnic_ib_main.c | 1 +
> drivers/infiniband/hw/usnic/usnic_ib_verbs.c | 6 ++++++
> drivers/infiniband/hw/usnic/usnic_ib_verbs.h | 2 ++
> include/rdma/ib_verbs.h | 7 ++++++-
> 21 files changed, 104 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/infiniband/core/device.c b/drivers/infiniband/core/device.c
> index 18c1ece..a9587c4 100644
> --- a/drivers/infiniband/core/device.c
> +++ b/drivers/infiniband/core/device.c
> @@ -76,6 +76,7 @@ static int ib_device_check_mandatory(struct ib_device *device)
> } mandatory_table[] = {
> IB_MANDATORY_FUNC(query_device),
> IB_MANDATORY_FUNC(query_port),
> + IB_MANDATORY_FUNC(query_transport),
> IB_MANDATORY_FUNC(query_pkey),
> IB_MANDATORY_FUNC(query_gid),
> IB_MANDATORY_FUNC(alloc_pd),

I'm concerned about the performance implications of this. The size of
this patchset already points out just how many places in the code we
have to check for various aspects of the device transport in order to do
the right thing. Without going through the entire list to see how many
are on critical hot paths, I'm sure some of them are on at least
partially critical hot paths (like creation of new connections). I
would prefer to see this change be implemented via a device attribute,
not a functional call query. That adds a needless function call in
these paths.
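
As a rough illustration of the attribute approach (a standalone sketch, not code from the patch set; every demo_* name and the two-port layout are assumptions made for the example), the driver callback can be run once per port at registration time so that later checks only read a cached field:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DEMO_MAX_PORTS 2

/* Hypothetical stand-ins for illustration; not the kernel's definitions. */
enum demo_transport { DEMO_TRANSPORT_IB, DEMO_TRANSPORT_IWARP };

struct demo_device {
	/* Callback style: every check pays for an indirect call. */
	enum demo_transport (*query_transport)(struct demo_device *dev,
					       uint8_t port);
	/* Attribute style: filled in once, then only read. */
	enum demo_transport port_transport[DEMO_MAX_PORTS];
};

/* Example driver callback: port 1 is IB, every other port is iWARP. */
static enum demo_transport demo_query(struct demo_device *dev, uint8_t port)
{
	(void)dev;
	return port == 1 ? DEMO_TRANSPORT_IB : DEMO_TRANSPORT_IWARP;
}

/* Run the callback once per port at registration and cache the result. */
static void demo_register(struct demo_device *dev)
{
	dev->query_transport = demo_query;
	for (uint8_t port = 1; port <= DEMO_MAX_PORTS; port++)
		dev->port_transport[port - 1] = dev->query_transport(dev, port);
}

/* Hot-path check: a bounds assert and a memory read, no function pointer. */
static inline bool demo_port_is_ib(const struct demo_device *dev, uint8_t port)
{
	assert(port >= 1 && port <= DEMO_MAX_PORTS);
	return dev->port_transport[port - 1] == DEMO_TRANSPORT_IB;
}
```

Whether the saved indirect call matters in practice would have to be measured on the paths in question; the sketch only shows the shape of the change.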

> diff --git a/drivers/infiniband/core/verbs.c b/drivers/infiniband/core/verbs.c
> index f93eb8d..83370de 100644
> --- a/drivers/infiniband/core/verbs.c
> +++ b/drivers/infiniband/core/verbs.c
> @@ -133,14 +133,16 @@ enum rdma_link_layer rdma_port_get_link_layer(struct ib_device *device, u8 port_
> if (device->get_link_layer)
> return device->get_link_layer(device, port_num);
>
> - switch (rdma_node_get_transport(device->node_type)) {
> + switch (device->query_transport(device, port_num)) {
> case RDMA_TRANSPORT_IB:
> + case RDMA_TRANSPORT_IBOE:
> return IB_LINK_LAYER_INFINIBAND;

If we are preserving ABI, then this looks wrong. Currently, IBOE
returns transport IB and link layer Ethernet. It should not return
link layer IB, since it does not support IB link layer operations
(such as MAD access).
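
A minimal sketch of the mapping this comment asks for, assuming the new IBOE transport value is kept: IBOE falls through to the Ethernet group instead of sharing the IB case. The demo_* enums are hypothetical stand-ins with arbitrary values, not the kernel's definitions.

```c
#include <assert.h>

/* Hypothetical enums mirroring the names in the patch; values are arbitrary. */
enum demo_transport {
	DEMO_TRANSPORT_IB,
	DEMO_TRANSPORT_IWARP,
	DEMO_TRANSPORT_USNIC,
	DEMO_TRANSPORT_USNIC_UDP,
	DEMO_TRANSPORT_IBOE,
};

enum demo_link_layer {
	DEMO_LINK_LAYER_UNSPECIFIED,
	DEMO_LINK_LAYER_INFINIBAND,
	DEMO_LINK_LAYER_ETHERNET,
};

/* A zero-initialized value should read as "unspecified", never as IB. */
static_assert(DEMO_LINK_LAYER_UNSPECIFIED == 0, "unspecified must be zero");

/* Derive the link layer from the transport when the driver gives no hint. */
static enum demo_link_layer demo_link_layer(enum demo_transport tp)
{
	switch (tp) {
	case DEMO_TRANSPORT_IB:
		return DEMO_LINK_LAYER_INFINIBAND;
	case DEMO_TRANSPORT_IBOE:	/* IB transport over an Ethernet link */
	case DEMO_TRANSPORT_IWARP:
	case DEMO_TRANSPORT_USNIC:
	case DEMO_TRANSPORT_USNIC_UDP:
		return DEMO_LINK_LAYER_ETHERNET;
	default:
		return DEMO_LINK_LAYER_UNSPECIFIED;
	}
}
```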

> case RDMA_TRANSPORT_IWARP:
> case RDMA_TRANSPORT_USNIC:
> case RDMA_TRANSPORT_USNIC_UDP:
> return IB_LINK_LAYER_ETHERNET;
> default:
> + BUG();
> return IB_LINK_LAYER_UNSPECIFIED;
> }
> }

[ snip ]

> diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
> index 65994a1..d54f91e 100644
> --- a/include/rdma/ib_verbs.h
> +++ b/include/rdma/ib_verbs.h
> @@ -75,10 +75,13 @@ enum rdma_node_type {
> };
>
> enum rdma_transport_type {
> + /* legacy for users */
> RDMA_TRANSPORT_IB,
> RDMA_TRANSPORT_IWARP,
> RDMA_TRANSPORT_USNIC,
> - RDMA_TRANSPORT_USNIC_UDP
> + RDMA_TRANSPORT_USNIC_UDP,
> + /* new transport */
> + RDMA_TRANSPORT_IBOE,
> };

I'm also concerned about this. I would like to see this enum
essentially turned into a bitmap. One that is constructed in such a way
that we can always get the specific test we need with only one compare
against the overall value. In order to do so, we need to break it down
into the essential elements that are part of each of the transports.
So, for instance, we can define the two link layers we have so far, plus
reserve one for OPA which we know is coming:

RDMA_LINK_LAYER_IB = 0x00000001,
RDMA_LINK_LAYER_ETH = 0x00000002,
RDMA_LINK_LAYER_OPA = 0x00000004,
RDMA_LINK_LAYER_MASK = 0x0000000f,

We can then define the currently known high level transport types:

RDMA_TRANSPORT_IB = 0x00000010,
RDMA_TRANSPORT_IWARP = 0x00000020,
RDMA_TRANSPORT_USNIC = 0x00000040,
RDMA_TRANSPORT_USNIC_UDP = 0x00000080,
RDMA_TRANSPORT_MASK = 0x000000f0,

We could then define bits for the IB management types:

RDMA_MGMT_IB = 0x00000100,
RDMA_MGMT_OPA = 0x00000200,
RDMA_MGMT_MASK = 0x00000f00,

Then we have space to define specific quirks:

RDMA_SEPARATE_READ_SGE = 0x00001000,
RDMA_QUIRKS_MASK = 0xfffff000

Once those are defined, a few definitions for device drivers to use when
they initialize a device to set the bitmap to the right values:

#define IS_IWARP (RDMA_LINK_LAYER_ETH | RDMA_TRANSPORT_IWARP |
RDMA_SEPARATE_READ_SGE)
#define IS_IB (RDMA_LINK_LAYER_IB | RDMA_TRANSPORT_IB | RDMA_MGMT_IB)
#define IS_IBOE (RDMA_LINK_LAYER_ETH | RDMA_TRANSPORT_IB)
#define IS_USNIC (RDMA_LINK_LAYER_ETH | RDMA_TRANSPORT_USNIC)
#define IS_OPA (RDMA_LINK_LAYER_OPA | RDMA_TRANSPORT_IB | RDMA_MGMT_IB |
RDMA_MGMT_OPA)

Then you need to define the tests:

static inline bool
rdma_transport_is_iwarp(struct ib_device *dev, u8 port)
{
	return dev->port[port]->transport & RDMA_TRANSPORT_IWARP;
}

/* Note: this intentionally covers IB, IBOE, and OPA...use
rdma_port_is_ib if you want to only get physical IB devices */
static inline bool
rdma_transport_is_ib(struct ib_device *dev, u8 port)
{
	return dev->port[port]->transport & RDMA_TRANSPORT_IB;
}

static inline bool
rdma_port_is_ib(struct ib_device *dev, u8 port)
{
	return dev->port[port]->transport & RDMA_LINK_LAYER_IB;
}

static inline bool
rdma_port_is_iboe(struct ib_device *dev, u8 port)
{
	/* == binds tighter than &, so the mask needs its own parentheses */
	return (dev->port[port]->transport & IS_IBOE) == IS_IBOE;
}

static inline bool
rdma_port_is_usnic(struct ib_device *dev, u8 port)
{
	return dev->port[port]->transport & RDMA_TRANSPORT_USNIC;
}

static inline bool
rdma_port_is_opa(struct ib_device *dev, u8 port)
{
	return dev->port[port]->transport & RDMA_LINK_LAYER_OPA;
}

static inline bool
rdma_port_is_iwarp(struct ib_device *dev, u8 port)
{
	return rdma_transport_is_iwarp(dev, port);
}

static inline bool
rdma_port_ib_fabric_mgmt(struct ib_device *dev, u8 port)
{
	return dev->port[port]->transport & RDMA_MGMT_IB;
}

static inline bool
rdma_port_opa_mgmt(struct ib_device *dev, u8 port)
{
	return dev->port[port]->transport & RDMA_MGMT_OPA;
}

Other things can be changed too. Like rdma_port_get_link_layer can
become this:

{
return dev->transport & RDMA_LINK_LAYER_MASK;
}
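
The bitmap layout proposed in this mail can be tried out as standalone C. This is only a sketch: the bit values and IS_* combinations are copied from the mail, while the port_* helpers are hypothetical names that take the raw bitmap rather than a device and port, since the dev->port[port]->transport field does not exist yet.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit values copied from the proposal in this mail. */
#define RDMA_LINK_LAYER_IB       0x00000001u
#define RDMA_LINK_LAYER_ETH      0x00000002u
#define RDMA_LINK_LAYER_OPA      0x00000004u
#define RDMA_LINK_LAYER_MASK     0x0000000fu
#define RDMA_TRANSPORT_IB        0x00000010u
#define RDMA_TRANSPORT_IWARP     0x00000020u
#define RDMA_TRANSPORT_MASK      0x000000f0u
#define RDMA_MGMT_IB             0x00000100u
#define RDMA_MGMT_OPA            0x00000200u
#define RDMA_MGMT_MASK           0x00000f00u
#define RDMA_SEPARATE_READ_SGE   0x00001000u
#define RDMA_QUIRKS_MASK         0xfffff000u

#define IS_IWARP (RDMA_LINK_LAYER_ETH | RDMA_TRANSPORT_IWARP | \
		  RDMA_SEPARATE_READ_SGE)
#define IS_IB    (RDMA_LINK_LAYER_IB | RDMA_TRANSPORT_IB | RDMA_MGMT_IB)
#define IS_IBOE  (RDMA_LINK_LAYER_ETH | RDMA_TRANSPORT_IB)
#define IS_OPA   (RDMA_LINK_LAYER_OPA | RDMA_TRANSPORT_IB | RDMA_MGMT_IB | \
		  RDMA_MGMT_OPA)

/* The bit groups must not overlap, or single-mask tests become ambiguous. */
static_assert((RDMA_LINK_LAYER_MASK & RDMA_TRANSPORT_MASK) == 0, "overlap");
static_assert((RDMA_TRANSPORT_MASK & RDMA_MGMT_MASK) == 0, "overlap");
static_assert((RDMA_MGMT_MASK & RDMA_QUIRKS_MASK) == 0, "overlap");

/* True for IB, IBOE and OPA: all three carry the IB transport bit. */
static inline bool port_transport_is_ib(uint32_t bits)
{
	return (bits & RDMA_TRANSPORT_IB) != 0;
}

/* IBOE needs both of its bits set, so compare against the full pattern. */
static inline bool port_is_iboe(uint32_t bits)
{
	return (bits & IS_IBOE) == IS_IBOE;
}

/* Fabric management (MAD, SM, multicast): IB and OPA only. */
static inline bool port_has_ib_fabric_mgmt(uint32_t bits)
{
	return (bits & RDMA_MGMT_IB) != 0;
}

/* Extract just the link-layer bit from the bitmap. */
static inline uint32_t port_link_layer(uint32_t bits)
{
	return bits & RDMA_LINK_LAYER_MASK;
}
```

Note that the IBOE test is the one case that cannot be a plain single-bit AND, because IBOE is identified by a combination of a link-layer bit and a transport bit.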

From patch 2/17:


> +static inline int rdma_ib_mgmt(struct ib_device *device, u8 port_num)
> +{
> + enum rdma_transport_type tp = device->query_transport(device,
> port_num);
> +
> + return (tp == RDMA_TRANSPORT_IB || tp == RDMA_TRANSPORT_IBOE);
> +}

This looks wrong. IBOE doesn't have IB management. At least it doesn't
have subnet management.

Actually, reading through the remainder of the patches, there is some
serious confusion taking place here. In later patches, you use this as
a surrogate for cap_cm, which implies you are talking about connection
management. This is very different from the rdma_port_ib_fabric_mgmt()
test that I created above, which specifically refers to IB management
tasks unique to IB/OPA: MAD, SM, multicast.

The kernel connection management code is not really limited. It
supports IB, IBOE, iWARP, and in the future it will support OPA. There
are some places in the CM code where we test for just IB/IBOE currently,
but that's only because we split iWARP out higher up in the abstraction
hierarchy. So, calling something rdma_ib_mgmt and meaning a rather
specialized test in the CM is probably misleading.

To straighten all this out, let's break management out into the two
distinct types:

rdma_port_ib_fabric_mgmt() <- fabric specific management tasks: MAD, SM,
multicast. The proper test for this with my bitmap above is a simple
transport & RDMA_MGMT_IB test. It will be true for IB and OPA fabrics.

rdma_port_conn_mgmt() <- connection management, which we currently
support on everything except USNIC (correct Sean?), so a test would be
something like !(transport & RDMA_TRANSPORT_USNIC). This is then split
out into two subgroups, IB style and iWARP style connection management
(aka, rdma_port_iw_conn_mgmt() and rdma_port_ib_conn_mgmt()). In my
above bitmap, since I didn't give IBOE its own transport type, these
subgroups still boil down to the simple tests transport & iWARP and
transport & IB like they do today.
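
A sketch of how the connection-management tests could look on top of the bitmap, under one stated assumption: since USNIC and USNIC_UDP are separate bits in the proposal, the sketch masks both of them, which goes slightly beyond the single-bit test written in the text. The demo_* helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Transport bits copied from the bitmap proposal in this mail. */
#define RDMA_TRANSPORT_IB        0x00000010u
#define RDMA_TRANSPORT_IWARP     0x00000020u
#define RDMA_TRANSPORT_USNIC     0x00000040u
#define RDMA_TRANSPORT_USNIC_UDP 0x00000080u

/* Two distinct bits: testing only one would let the other flavour through. */
static_assert((RDMA_TRANSPORT_USNIC & RDMA_TRANSPORT_USNIC_UDP) == 0,
	      "usNIC flavours are separate bits");

/* Any style of connection management. Assumption: no usNIC flavour has it. */
static inline bool demo_conn_mgmt(uint32_t bits)
{
	return !(bits & (RDMA_TRANSPORT_USNIC | RDMA_TRANSPORT_USNIC_UDP));
}

/* IB style CM: IB, IBOE and (later) OPA all carry the IB transport bit. */
static inline bool demo_ib_conn_mgmt(uint32_t bits)
{
	return (bits & RDMA_TRANSPORT_IB) != 0;
}

/* iWARP style CM. */
static inline bool demo_iw_conn_mgmt(uint32_t bits)
{
	return (bits & RDMA_TRANSPORT_IWARP) != 0;
}
```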

From patch 3/17:


> +/**
> + * cap_ib_mad - Check if the port of device has the capability
> Infiniband
> + * Management Datagrams.
> + *
> + * @device: Device to be checked
> + * @port_num: Port number of the device
> + *
> + * Return 0 when port of the device don't support Infiniband
> + * Management Datagrams.
> + */
> +static inline int cap_ib_mad(struct ib_device *device, u8 port_num)
> +{
> + return rdma_ib_mgmt(device, port_num);
> +}
> +

Why add cap_ib_mad? It's nothing more than rdma_port_ib_fabric_mgmt
with a new name. Just use rdma_port_ib_fabric_mgmt() everywhere you
have cap_ib_mad.

From patch 4/17:


> +/**
> + * cap_ib_smi - Check if the port of device has the capability
> Infiniband
> + * Subnet Management Interface.
> + *
> + * @device: Device to be checked
> + * @port_num: Port number of the device
> + *
> + * Return 0 when port of the device don't support Infiniband
> + * Subnet Management Interface.
> + */
> +static inline int cap_ib_smi(struct ib_device *device, u8 port_num)
> +{
> + return rdma_transport_ib(device, port_num);
> +}
> +

Same as the previous patch. This is needless indirection. Just use
rdma_port_ib_fabric_mgmt directly.

Patch 5/17:

Again, just use rdma_port_ib_conn_mgmt() directly.

Patch 6/17:

Again, just use rdma_port_ib_fabric_mgmt() directly.

Patch 7/17:

Again, just use rdma_port_ib_fabric_mgmt() directly. It's perfectly
applicable to the IB mcast registration requirements.

Patch 8/17:

Here we can create a new test if we are using the bitmap I created
above:

static inline bool
rdma_port_ipoib(struct ib_device *dev, u8 port)
{
	return !(dev->port[port]->transport & RDMA_LINK_LAYER_ETH);
}

This is presuming that OPA will need ipoib devices. This will cause all
non-Ethernet link layer devices to return true, and right now, that is
all IB and all OPA devices.

Patch 9/17:

Most of the other comments on this patch stand as they are. I would add
the test:

static inline bool
rdma_port_separate_read_sge(struct ib_device *dev, u8 port)
{
	return dev->port[port]->transport & RDMA_SEPARATE_READ_SGE;
}

and add the helper function:

static inline int
rdma_port_get_read_sge(struct ib_device *dev, u8 port)
{
	if (rdma_transport_is_iwarp(dev, port))
		return 1;
	return dev->port[port]->max_sge;
}

Then, as Jason points out, if at some point in the future the kernel is
modified to support devices with asymmetrical read/write SGE sizes, this
function can be modified to support those devices.

Patch 10/17:

As Sean pointed out, force_grh should be rdma_dev_is_iboe(). The cm
handles iw devices, but you notice all of the functions you modify here
start with ib_. The iwarp connections are funneled through iw_ specific
function variants, and so even though the cm handles iwarp, ib, and roce
devices, you never see anything other than ib/iboe (and opa in the
future) get to the ib_ variants of the functions. So, they wrote the
original tests as tests against the link layer being ethernet and used
that to differentiate between ib and iboe devices. It works, but can
confuse people. So, every place that has !rdma_transport_ib should
really be rdma_dev_is_iboe instead. If we ever merge the iw_ and ib_
functions in the future, having this right will help avoid problems.

Patch 11/17:

I wouldn't reform the link_layer_show except to make it compile with the
new defines I used above.

Patch 12/17:

Go ahead and add a helper to check all ports on a dev, just make it
rdma_hca_ib_conn_mgmt() and have it loop through
rdma_port_ib_conn_mgmt() return codes.

Patch 13/17:

This patch is largely unneeded if we reworked the bitmap like I have
above. A lot of the changes you made to switch from case statements to
multiple if statements can go back to being case statements because in
the bitmap IB and IBOE are still both transport IB, so you just do the
case on the transport bits and not on the link layer bits.
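
To illustrate the point about keeping the case statements, here is a standalone sketch that switches on the transport bits only, so IB and IBOE land in the same case regardless of link layer. The bit values come from the bitmap proposal; enum demo_cm and demo_pick_cm() are hypothetical names used only for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Transport bits copied from the bitmap proposal in this mail. */
#define RDMA_TRANSPORT_IB    0x00000010u
#define RDMA_TRANSPORT_IWARP 0x00000020u
#define RDMA_TRANSPORT_MASK  0x000000f0u

/* Every case label below must lie inside the mask used in the switch. */
static_assert((RDMA_TRANSPORT_IB & RDMA_TRANSPORT_MASK) == RDMA_TRANSPORT_IB,
	      "IB bit outside mask");
static_assert((RDMA_TRANSPORT_IWARP & RDMA_TRANSPORT_MASK) ==
	      RDMA_TRANSPORT_IWARP, "iWARP bit outside mask");

enum demo_cm { DEMO_CM_IB, DEMO_CM_IW, DEMO_CM_NONE };

/* Pick the CM flavour from the transport bits, ignoring link layer bits. */
static enum demo_cm demo_pick_cm(uint32_t bits)
{
	switch (bits & RDMA_TRANSPORT_MASK) {
	case RDMA_TRANSPORT_IB:		/* covers both IB and IBOE ports */
		return DEMO_CM_IB;
	case RDMA_TRANSPORT_IWARP:
		return DEMO_CM_IW;
	default:
		return DEMO_CM_NONE;
	}
}
```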

Patch 14/17:

Seems ok.

Patch 15/17:

If you implement the bitmap like I list above, then this code will need
to be fixed up to use the bitmap. Otherwise it looks OK.

Patch 16/17:

OK.

Patch 17/17:

I would drop this patch. In the future, the mlx5 driver will support
both Ethernet and IB like mlx4 does, and we would just need to pull this
code back to core instead of only in mlx4.


--
Doug Ledford <[email protected]>
GPG KeyID: 0E572FDD




2015-04-08 18:41:25

by Hefty, Sean

Subject: RE: [PATCH v2 01/17] IB/Verbs: Implement new callback query_transport() for each HW

> I'm concerned about the performance implications of this. The size of
> this patchset already points out just how many places in the code we
> have to check for various aspects of the device transport in order to do
> the right thing. Without going through the entire list to see how many
> are on critical hot paths, I'm sure some of them are on at least
> partially critical hot paths (like creation of new connections). I
> would prefer to see this change be implemented via a device attribute,
> not a functional call query. That adds a needless function call in
> these paths.

My impression of these changes were that they would eventually lead to
the mechanism that you outlined:


> I'm also concerned about this. I would like to see this enum
> essentially turned into a bitmap. One that is constructed in such a way
> that we can always get the specific test we need with only one compare
> against the overall value. In order to do so, we need to break it down
> into the essential elements that are part of each of the transports.
> So, for instance, we can define the two link layers we have so far, plus
> reserve one for OPA which we know is coming:
>
> RDMA_LINK_LAYER_IB       = 0x00000001,
> RDMA_LINK_LAYER_ETH      = 0x00000002,
> RDMA_LINK_LAYER_OPA      = 0x00000004,
> RDMA_LINK_LAYER_MASK     = 0x0000000f,
>
> We can then define the currently known high level transport types:
>
> RDMA_TRANSPORT_IB        = 0x00000010,
> RDMA_TRANSPORT_IWARP     = 0x00000020,
> RDMA_TRANSPORT_USNIC     = 0x00000040,
> RDMA_TRANSPORT_USNIC_UDP = 0x00000080,
> RDMA_TRANSPORT_MASK      = 0x000000f0,
>
> We could then define bits for the IB management types:
>
> RDMA_MGMT_IB             = 0x00000100,
> RDMA_MGMT_OPA            = 0x00000200,
> RDMA_MGMT_MASK           = 0x00000f00,
>
> Then we have space to define specific quirks:
>
> RDMA_SEPARATE_READ_SGE   = 0x00001000,
> RDMA_QUIRKS_MASK         = 0xfffff000

I too would like to see this as the end result, but I think it's
possible to stage the changes by having the static inline calls being
added convert to using these sort of attributes.

- Sean

2015-04-08 19:36:21

by Jason Gunthorpe

Subject: Re: [PATCH v2 01/17] IB/Verbs: Implement new callback query_transport() for each HW

On Wed, Apr 08, 2015 at 06:41:22PM +0000, Hefty, Sean wrote:

> I too would like to see this as the end result, but I think it's
> possible to stage the changes by having the static inline calls
> being added convert to using these sort of attributes.

I agree as well, this patch set is already so big.

But Doug may be right, this conversion may need to be part of the
series that is applied in one go for performance reasons. But that is
just a patch at the end to optimize the inline calls.

Jason

2015-04-08 20:10:27

by Jason Gunthorpe

Subject: Re: [PATCH v2 01/17] IB/Verbs: Implement new callback query_transport() for each HW

On Wed, Apr 08, 2015 at 02:29:46PM -0400, Doug Ledford wrote:

> To straighten all this out, lets break management out into the two
> distinct types:
>
> rdma_port_ib_fabric_mgmt() <- fabric specific management tasks: MAD, SM,
> multicast. The proper test for this with my bitmap above is a simple
> transport & RDMA_MGMT_IB test. If will be true for IB and OPA fabrics.

> rdma_port_conn_mgmt() <- connection management, which we currently
> support everything except USNIC (correct Sean?), so a test would be
> something like !(transport & RDMA_TRANSPORT_USNIC). This is then split
> out into two subgroups, IB style and iWARP stype connection management
> (aka, rdma_port_iw_conn_mgmt() and rdma_port_ib_conn_mgmt()). In my
> above bitmap, since I didn't give IBOE its own transport type, these
> subgroups still boil down to the simple tests transport & iWARP and
> transport & IB like they do today.

There is a lot more variation here than just these two tests, and those
two tests won't scale to include OPA.

                IB   ROCEE  OPA
SMI             Y    N      Y    (though the OPA smi looked a bit different)
IB SMP          Y    N      N
OPA SMP         N    N      Y
GMP             Y    Y      Y
SA              Y    N      Y
PM              Y    Y      Y    (? guessing for OPA)
CM              Y    Y      Y
GMP needs GRH   N    Y      N

It may be unrealistic, but I was hoping we could largely scrub the
opaque 'is spec iWARP, is spec ROCEE' kinds of tests because they
don't tell anyone what it is the code cares about.

Maybe what is needed is a more precise language for the functions:

> > + * cap_ib_mad - Check if the port of device has the capability
> > Infiniband
> > + * Management Datagrams.

As used this seems to mean:

True if the port can do IB/OPA SMP, or GMP management packets on QP0 or
QP1. (Y Y Y) ie: Do we need the MAD layer at all.

ib_smi seems to be true if QP0 is supported (Y N Y)

Maybe the above set would make a lot more sense as:
cap_ib_qp0
cap_ib_qp1
cap_opa_qp0

ib_cm seems to mean that the CM protocol from the IBA is used on the
port (Y Y Y)

ib_sa means the IBA SA protocol is supported (Y Y Y)

ib_mcast true if the IBA SA protocol is used for multicast GIDs (Y N
Y)

ipoib means the port supports the ipoib protocol (Y N ?)

These seem reasonable and understandable, even if they currently
duplicate each other a bit.
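
A minimal sketch of how the Y/N triples above could be written down as per-technology capability bits. Everything here is hypothetical: the `port_tech` enum, the `CAP_*` bits, and the single-argument accessors are illustration only and do not match the `(device, port_num)` signatures used in the patch set. The values follow the triples listed in this mail (cap_ib_mad Y Y Y, ib_smi Y N Y, ib_cm Y Y Y, ib_mcast Y N Y).

```c
#include <stdbool.h>

/* Hypothetical port technologies, mirroring the IB / ROCEE / OPA columns. */
enum port_tech { TECH_IB, TECH_ROCE, TECH_OPA, TECH_COUNT };

/* Hypothetical capability bits, one per question a call site asks. */
#define CAP_IB_MAD   (1u << 0) /* MAD layer needed at all (QP0 or QP1) */
#define CAP_IB_SMI   (1u << 1) /* QP0 is supported */
#define CAP_IB_CM    (1u << 2) /* IBA CM protocol is used on the port */
#define CAP_IB_MCAST (1u << 3) /* IBA SA protocol is used for multicast GIDs */

/* One row per technology, filled in from the Y/N triples in the mail. */
static const unsigned int tech_caps[TECH_COUNT] = {
	[TECH_IB]   = CAP_IB_MAD | CAP_IB_SMI | CAP_IB_CM | CAP_IB_MCAST,
	[TECH_ROCE] = CAP_IB_MAD | CAP_IB_CM,
	[TECH_OPA]  = CAP_IB_MAD | CAP_IB_SMI | CAP_IB_CM | CAP_IB_MCAST,
};

/* Each accessor names exactly what the caller cares about. */
static inline bool cap_ib_mad(enum port_tech t)
{
	return (tech_caps[t] & CAP_IB_MAD) != 0;
}

static inline bool cap_ib_smi(enum port_tech t)
{
	return (tech_caps[t] & CAP_IB_SMI) != 0;
}

static inline bool cap_ib_cm(enum port_tech t)
{
	return (tech_caps[t] & CAP_IB_CM) != 0;
}

static inline bool cap_ib_mcast(enum port_tech t)
{
	return (tech_caps[t] & CAP_IB_MCAST) != 0;
}
```

The point of the sketch is only that the accessor name carries the documentation, regardless of how the bits end up being stored.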

> Patch 9/17:
>
> Most of the other comments on this patch stand as they are. I would add
> the test:
>
> rdma_port_separate_read_sge(dev, port)
> {
> return dev->port[port]->transport & RDMA_SEPARATE_READ_SGE;
> }
>
> and add the helper function:
>
> rdma_port_get_read_sge(dev, port)
> {
> if (rdma_transport_is_iwarp(dev, port))
> return 1;
> return dev->port[port]->max_sge;
> }

Hum, that is nice, but it doesn't quite fit with how the ULP needs to
work. The max limit when creating a WR is the value passed into the
qp_cap, not the device maximum limit.

To do this properly we need to extend the qp_cap, and that is just too
big a change. A one bit iWarp quirk is OK for now.
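
To illustrate the point about qp_cap, here is a rough sketch of what a ULP-side helper might look like when the limit comes from the capabilities the QP was created with rather than from the device maximum. The struct `ulp_qp_limits` and the function `ulp_max_read_sge()` are hypothetical names for this example; the "return 1 for iWARP" rule is taken from the helper quoted above.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for the send-side SGE limit the ULP asked for
 * (and was granted) when it created its QP.
 */
struct ulp_qp_limits {
	uint32_t max_send_sge;
};

/*
 * Number of SGEs the ULP may place in one RDMA READ work request.
 * The one-bit quirk says "this port wants a single SGE for reads";
 * otherwise the QP's own limit applies, not the device-wide maximum.
 */
static inline uint32_t ulp_max_read_sge(const struct ulp_qp_limits *lim,
					bool iwarp_read_quirk)
{
	if (iwarp_read_quirk)
		return 1;
	return lim->max_send_sge;
}
```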

> As Sean pointed out, force_grh should be rdma_dev_is_iboe(). The cm

I actually really prefer cap_mandatory_grh - that is what is going on
here. ie based on that name (as a reviewer) I'd expect to see the mad
layer check that the mandatory GRH is always present, or blow up.

Some of the other checks in this file revolve around pkey, I'm not
sure what rocee does there? cap_pkey_supported ?

Jason

2015-04-08 20:55:41

by Tom Talpey

Subject: Re: [PATCH v2 01/17] IB/Verbs: Implement new callback query_transport() for each HW

On 4/8/2015 4:10 PM, Jason Gunthorpe wrote:
> On Wed, Apr 08, 2015 at 02:29:46PM -0400, Doug Ledford wrote:
>
>...
>>
>> rdma_port_get_read_sge(dev, port)
>> {
>> if (rdma_transport_is_iwarp(dev, port))
>> return 1;
>> return dev->port[port]->max_sge;
>> }
>
> Hum, that is nice, but it doesn't quite fit with how the ULP needs to
> work. The max limit when creating a WR is the value passed into the
> qp_cap, not the device maximum limit.

Agreed, and I will again say that not all devices necessarily support
the same max_sge for all WR types. The current one-size-fits-all API
may make the upper layer think so, but it's possibly being lied to.

> To do this properly we need to extend the qp_cap, and that is just too
> big a change. A one bit iWarp quirk is OK for now.

Yes, it would be a large-ish change, and I like Doug's choice of word
"quirk" to capture these as exceptions, until the means for addressing
them is decided.

Overall, I like Doug's proposals, especially from an upper layer
perspective. I might suggest further refining them into categories,
perhaps "management", primarily of interest to kernel and the
plumbing of connections; and actual "RDMA semantics", of interest
to RDMA consumers.


2015-04-09 05:37:07

by Ira Weiny

Subject: Re: [PATCH v2 10/17] IB/Verbs: Adopt management helpers for IB helpers

On Tue, Apr 07, 2015 at 03:16:30PM -0500, Steve Wise wrote:
>
>
> > -----Original Message-----
> > From: Jason Gunthorpe [mailto:[email protected]]
> > Sent: Tuesday, April 07, 2015 3:13 PM
> > To: Michael Wang
> > Cc: Roland Dreier; Sean Hefty; [email protected]; [email protected]; [email protected];
> > [email protected]; Hal Rosenstock; Tom Tucker; Steve Wise; Hoang-Nam Nguyen; Christoph Raisch; Mike Marciniszyn; Eli Cohen;
> > Faisal Latif; Upinder Malhi; Trond Myklebust; J. Bruce Fields; David S. Miller; Ira Weiny; PJ Waskiewicz; Tatyana Nikolova; Or
> Gerlitz; Jack
> > Morgenstein; Haggai Eran; Ilya Nelkenbaum; Yann Droneaud; Bart Van Assche; Shachar Raindel; Sagi Grimberg; Devesh Sharma; Matan
> > Barak; Moni Shoua; Jiri Kosina; Selvin Xavier; Mitesh Ahuja; Li RongQing; Rasmus Villemoes; Alex Estrin; Doug Ledford; Eric
> Dumazet; Erez
> > Shitrit; Tom Gundersen; Chuck Lever
> > Subject: Re: [PATCH v2 10/17] IB/Verbs: Adopt management helpers for IB helpers
> >
> > On Tue, Apr 07, 2015 at 02:35:22PM +0200, Michael Wang wrote:
> > > index f704254..4e61104 100644
> > > +++ b/drivers/infiniband/core/sa_query.c
> > > @@ -540,7 +540,7 @@ int ib_init_ah_from_path(struct ib_device *device, u8 port_num,
> > > ah_attr->port_num = port_num;
> > > ah_attr->static_rate = rec->rate;
> > >
> > > - force_grh = rdma_port_get_link_layer(device, port_num) == IB_LINK_LAYER_ETHERNET;
> > > + force_grh = !rdma_transport_ib(device, port_num);
> >
> > Maybe these tests should be called cap_mandatory_grh - but I'm not
> > really sure how iWarp uses the GRH fields in the AH...
> >
>
> iWARP runs on top of TCP...this SA code is all IB-specific. The reason it was checking for ETHERNET, I think, is for RoCE. So
> this change is totally incorrect, I think, because RoCE is an IB transport, but it runs on ETHERNET.

But RoCE does not have an SA?

Looks like ib_init_ah_from_path was overloaded to handle non-standard "path
records".

It seems like the correct functionality would be to use ib_init_ah_from_path()
for true SA PathRecords and have another call iboe_init_ah() wrap
ib_init_ah_from_path() when RoCE address information is needed in the AH.

For Michael's patches, I think

force_grh = rdma_device_is_iboe(...)

is the logic we need here.
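
A rough sketch of the wrapper idea described above, using heavily simplified and hypothetical types (`sketch_path_rec`, `sketch_ah_attr`) rather than the real kernel structures. It assumes the plain PathRecord path only adds a GRH when the hop limit is greater than 1, and that the IBoE wrapper then makes the GRH unconditional.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for the kernel structures. */
struct sketch_path_rec {
	uint8_t hop_limit;
};

struct sketch_ah_attr {
	uint8_t port_num;
	bool    grh_present;
};

/*
 * Plain SA PathRecord case. Assumption for this sketch: a GRH is only
 * added when the path may leave the local subnet (hop_limit > 1).
 */
static int sketch_init_ah_from_path(uint8_t port_num,
				    const struct sketch_path_rec *rec,
				    struct sketch_ah_attr *ah)
{
	ah->port_num = port_num;
	ah->grh_present = rec->hop_limit > 1;
	return 0;
}

/*
 * IBoE wrapper: reuse the common setup, then force the GRH because the
 * address information travels in it.
 */
static int sketch_iboe_init_ah(uint8_t port_num,
			       const struct sketch_path_rec *rec,
			       struct sketch_ah_attr *ah)
{
	int ret = sketch_init_ah_from_path(port_num, rec, ah);

	if (ret)
		return ret;
	ah->grh_present = true;
	return 0;
}
```

With this split, the SA-only function no longer needs a force_grh flag at all.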

Ira


>
> Steve.
>
>
>

2015-04-09 08:05:29

by Michael Wang

Subject: Re: [PATCH v2 13/17] IB/Verbs: Reform cma/ucma with management helpers

On 04/08/2015 07:02 PM, Hefty, Sean wrote:
[snip]
>>
>> The wrapper make sense, but do we have the guarantee that IBoE port won't
>> be used for AF_IB address? I just can't locate the place we filtered it
>> out...
>
> I can't think of a reason why IBoE wouldn't work with AF_IB, but I'm not sure if anyone has tested it. The original check would have let IBoE through. When I suggested checking for IB transport, I meant the actual transport protocol, which would have included both IB and IBoE.

Got it :-)

>
>>>> @@ -700,8 +700,7 @@ static int cma_ib_init_qp_attr(struct
[snip]
>
>>>> id_priv->id.route.addr.dev_addr.dev_type =
>>>> - (rdma_port_get_link_layer(cma_dev->device, p) ==
>>>> IB_LINK_LAYER_INFINIBAND) ?
>>>> + (rdma_transport_ib(cma_dev->device, p)) ?
>>>> ARPHRD_INFINIBAND : ARPHRD_ETHER;
>>>
>>> This wants the link layer, or maybe use cap_ipoib.
>>
>> Is this related with ipoib only?
>
> ARPHRD_INFINIBAND is related to ipoib. In your next update, maybe go with tech_ib. I don't know the status of ipoib over iboe.

Will be in next version :-)

Regards,
Michael Wang

>

2015-04-09 09:45:20

by Michael Wang

Subject: Re: [PATCH v2 01/17] IB/Verbs: Implement new callback query_transport() for each HW

On 04/08/2015 10:10 PM, Jason Gunthorpe wrote:
[snip]
>
>> As Sean pointed out, force_grh should be rdma_dev_is_iboe(). The cm
>
> I actually really prefer cap_mandatory_grh - that is what is going on
> here. ie based on that name (as a reviewer) I'd expect to see the mad
> layer check that the mandatory GRH is always present, or blow up.

Sounds good, will be in next version :-)

Regards,
Michael Wang

>
> Some of the other checks in this file revolve around pkey, I'm not
> sure what rocee does there? cap_pkey_supported ?
>
> Jason
>

2015-04-09 12:42:29

by Michael Wang

Subject: Re: [PATCH v2 01/17] IB/Verbs: Implement new callback query_transport() for each HW

On 04/08/2015 10:10 PM, Jason Gunthorpe wrote:
[snip]
>
> Some of the other checks in this file revolve around pkey, I'm not
> sure what rocee does there? cap_pkey_supported ?

I'm not sure if this counts as a capability... how shall we describe it?

Regards,
Michael Wang

>
> Jason
>

2015-04-09 14:35:51

by Doug Ledford

Subject: Re: [PATCH v2 01/17] IB/Verbs: Implement new callback query_transport() for each HW

On Wed, 2015-04-08 at 14:10 -0600, Jason Gunthorpe wrote:
> On Wed, Apr 08, 2015 at 02:29:46PM -0400, Doug Ledford wrote:
>
> > To straighten all this out, lets break management out into the two
> > distinct types:
> >
> > rdma_port_ib_fabric_mgmt() <- fabric specific management tasks: MAD, SM,
> > multicast. The proper test for this with my bitmap above is a simple
> > transport & RDMA_MGMT_IB test. It will be true for IB and OPA fabrics.
>
> > rdma_port_conn_mgmt() <- connection management, which we currently
> > support everything except USNIC (correct Sean?), so a test would be
> > something like !(transport & RDMA_TRANSPORT_USNIC). This is then split
> > out into two subgroups, IB style and iWARP style connection management
> > (aka, rdma_port_iw_conn_mgmt() and rdma_port_ib_conn_mgmt()). In my
> > above bitmap, since I didn't give IBOE its own transport type, these
> > subgroups still boil down to the simple tests transport & iWARP and
> > transport & IB like they do today.
>
> There is a lot more variation here than just these two tests, and those
> two tests won't scale to include OPA.
>
>                 IB   ROCEE  OPA
> SMI             Y    N      Y    (though the OPA smi looked a bit different)
> IB SMP          Y    N      N
> OPA SMP         N    N      Y
> GMP             Y    Y      Y
> SA              Y    N      Y
> PM              Y    Y      Y    (? guessing for OPA)
> CM              Y    Y      Y
> GMP needs GRH   N    Y      N
>

You can still break this down to a manageable bitmap.

SMI, SMP, and SA are all essentially the same and can be combined to one
bitmap that is

IB_SM 0x1
OPA_SM 0x2

and the defines are such that IB devices define IB_SM, and OPA devices
define IB_SM and OPA_SM. Any minor differences between OPA and IB can
be handled by testing just the OPA_SM bit. This will exclude all IBOE
devices and iWARP devices.

GMP, PM, and CM are all the same, and are all identical to transport ==
INFINIBAND.

GMP needs GRH happens to be precisely the same as ib_dev_is_iboe.

These are exactly the tests I proposed Jason. I'm not sure I see your
point here. I guess my point is that although the scenario of all the
different items seems complex, it really does boil down to needing only
exactly what I proposed earlier to fulfill the entire test matrix.
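
A minimal sketch of the two-bit subnet-management idea described above. The `IB_SM` and `OPA_SM` values are the ones given in this mail; the per-device `SM_BITS_*` combinations and the two helper names are hypothetical and only spell out "IB devices define IB_SM, OPA devices define both, IBOE and iWARP define neither".

```c
#include <stdbool.h>

/* Bit values as proposed in the mail. */
#define IB_SM  0x1u
#define OPA_SM 0x2u

/* Hypothetical per-device settings following the description above. */
#define SM_BITS_IB    (IB_SM)
#define SM_BITS_OPA   (IB_SM | OPA_SM)
#define SM_BITS_IBOE  0u
#define SM_BITS_IWARP 0u

/* True for any fabric that has SMI/SMP/SA style subnet management. */
static inline bool has_subnet_mgmt(unsigned int bits)
{
	return (bits & IB_SM) != 0;
}

/* True only where the OPA-specific differences need handling. */
static inline bool has_opa_subnet_mgmt(unsigned int bits)
{
	return (bits & OPA_SM) != 0;
}
```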


--
Doug Ledford <[email protected]>
GPG KeyID: 0E572FDD




2015-04-09 16:01:27

by Jason Gunthorpe

Subject: Re: [PATCH v2 01/17] IB/Verbs: Implement new callback query_transport() for each HW

On Thu, Apr 09, 2015 at 10:34:30AM -0400, Doug Ledford wrote:

> These are exactly the tests I proposed Jason. I'm not sure I see your
> point here. I guess my point is that although the scenario of all the
> different items seems complex, it really does boil down to needing only
> exactly what I proposed earlier to fulfill the entire test matrix.

I have no problem with minimizing a bitmap, but I want the accessors
to make sense first.

My specific problem with your suggestion was combining cap_ib_mad,
cap_ib_sa, and cap_ib_smi into rdma_port_ib_fabric_mgmt.

Not only do the three cap things not return the same value for all
situations, the documentary knowledge is lost by the reduction.

I'd prefer we look at this from a 'what do the call sites need' view,
not a 'how do we minimize' view.

I've written this before: The mess here is that it is too hard to know
what the call sites are actually checking for when it is some baroque
conditional.

Jason

2015-04-09 16:01:23

by Jason Gunthorpe

Subject: Re: [PATCH v2 01/17] IB/Verbs: Implement new callback query_transport() for each HW

On Thu, Apr 09, 2015 at 02:42:24PM +0200, Michael Wang wrote:
> On 04/08/2015 10:10 PM, Jason Gunthorpe wrote:
> [snip]
> >
> > Some of the other checks in this file revolve around pkey, I'm not
> > sure what rocee does there? cap_pkey_supported ?
>
> I'm not sure if this counts as a capability... how shall we describe it?

I'm not sure how rocee uses pkey, but maybe the GRH and pkey thing
would work well together under a single 'cap_ethernet_ah' ?

Jason

2015-04-09 21:19:17

by Doug Ledford

Subject: Re: [PATCH v2 01/17] IB/Verbs: Implement new callback query_transport() for each HW

On Thu, 2015-04-09 at 10:01 -0600, Jason Gunthorpe wrote:
> On Thu, Apr 09, 2015 at 10:34:30AM -0400, Doug Ledford wrote:
>
> > These are exactly the tests I proposed Jason. I'm not sure I see your
> > point here. I guess my point is that although the scenario of all the
> > different items seems complex, it really does boil down to needing only
> > exactly what I proposed earlier to fulfill the entire test matrix.
>
> I have no problem with minimizing a bitmap, but I want the accessors
> to make sense first.
>
> My specific problem with your suggestion was combining cap_ib_mad,
> cap_ib_sa, and cap_ib_smi into rdma_port_ib_fabric_mgmt.
>
> Not only do the three cap things not return the same value for all
> situations, the documentary knowledge is lost by the reduction.
>
> I'd prefer we look at this from a 'what do the call sites need' view,
> not a 'how do we minimize' view.
>
> I've written this before: The mess here is that it is too hard to know
> what the call sites are actually checking for when it is some baroque
> conditional.

The two goals: being specific about what the test is returning and
minimizing the bitmap footprint; are not necessarily opposed. One can
do both at the same time.

--
Doug Ledford <[email protected]>
GPG KeyID: 0E572FDD




2015-04-09 21:37:07

by Jason Gunthorpe

Subject: Re: [PATCH v2 01/17] IB/Verbs: Implement new callback query_transport() for each HW

On Thu, Apr 09, 2015 at 05:19:08PM -0400, Doug Ledford wrote:

> The two goals: being specific about what the test is returning and
> minimizing the bitmap footprint; are not necessarily opposed. One can
> do both at the same time.

Agree

Jason

2015-04-10 06:16:30

by Ira Weiny

Subject: Re: [PATCH v2 01/17] IB/Verbs: Implement new callback query_transport() for each HW

First off there are 2 separate issues here:

1) We need to communicate if a port supports or requires various management
support from the ib_mad, ib_cm, and/or ib_sa modules.

2) We need to communicate how addresses are formatted and resolved for a
particular port


In general I don't think we need to remove all uses of the Transport
or Link Layer.

Although we may be able to remove most of the transport uses.

On Wed, Apr 08, 2015 at 02:10:15PM -0600, Jason Gunthorpe wrote:
> On Wed, Apr 08, 2015 at 02:29:46PM -0400, Doug Ledford wrote:
>
> > To straighten all this out, lets break management out into the two
> > distinct types:
> >
> > rdma_port_ib_fabric_mgmt() <- fabric specific management tasks: MAD, SM,
> > multicast. The proper test for this with my bitmap above is a simple
> > transport & RDMA_MGMT_IB test. It will be true for IB and OPA fabrics.
>
> > rdma_port_conn_mgmt() <- connection management, which we currently
> > support everything except USNIC (correct Sean?), so a test would be
> > something like !(transport & RDMA_TRANSPORT_USNIC). This is then split
> > out into two subgroups, IB style and iWARP style connection management
> > (aka, rdma_port_iw_conn_mgmt() and rdma_port_ib_conn_mgmt()). In my
> > above bitmap, since I didn't give IBOE its own transport type, these
> > subgroups still boil down to the simple tests transport & iWARP and
> > transport & IB like they do today.
>
> There is a lot more variation here than just these two tests, and those
> two tests won't scale to include OPA.
>
> IB ROCEE OPA
> SMI Y N Y (though the OPA smi looked a bit different)

Yes OPA is different but it is based on the class version of the individual
MADs not any particular device/port support.

> IB SMP Y N N

Correction:

IB SMP Y N Y (OPA supports the IB NodeInfo query)

> OPA SMP N N Y

How is this different from the SMI?

> GMP Y Y Y
> SA Y N Y
> PM Y Y Y (? guessing for OPA)
^^^
Yes

> CM Y Y Y
> GMP needs GRH N Y N
>
> It may be unrealistic, but I was hoping we could largely scrub the
> opaque 'is spec iWARP, is spec ROCEE' kinds of tests because they
> don't tell anyone what it is the code cares about.

I somewhat agree except for things like addressing. In the area of addressing
I think we are likely to need to define something like "cap_addr_ib",
"cap_addr_iboe", "cap_addr_iwarp". See below for more details.

>
> Maybe what is needed is a more precise language for the functions:
>
> > > + * cap_ib_mad - Check if the port of device has the capability
> > > Infiniband
> > > + * Management Datagrams.
>
> As used this seems to mean:
>
> True if the port can do IB/OPA SMP, or GMP management packets on QP0 or
> QP1. (Y Y Y) ie: Do we need the MAD layer at all.
>
> ib_smi seems to be true if QP0 is supported (Y N Y)
>
> Maybe the above set would make a lot more sense as:
> cap_ib_qp0
> cap_ib_qp1
> cap_opa_qp0

I disagree.

All we need right now is cap_qp0. All devices currently support QP1.

Then after all this is settled I can add:

                IB   ROCEE  OPA
OPA MAD Space   N    N      Y    Port is OPA MAD space.

>
> ib_cm seems to mean that the CM protocol from the IBA is used on the
> port (Y Y Y)

Agree.

>
> ib_sa means the IBA SA protocol is supported (Y Y Y)

I think this should be (Y N Y)

IBoE has no SA. The IBoE code "fabricates" a Path Record it does not need to
interact with the SA.

>
> ib_mcast true if the IBA SA protocol is used for multicast GIDs (Y N
> Y)

Given the above why can't we just have the "ib_sa" flag?

>
> ipoib means the port supports the ipoib protocol (Y N ?)


OPA does IPoIB so... (Y N Y)


However, I think checking the link layer is more appropriate here. It does not
make sense to do IP over IB over Eth, even though IBoE can do the "IB"
protocol.


Making flags in the driver to indicate which ULPs they support is a _bad_
_idea_.

FWIW: I don't consider the MAD and SA (multicast) modules ULPs. Rather they
are helper modules which are built to share code amongst the drivers to process
things on behalf of the drivers themselves. As such advertising a need for or
particular support within those modules make sense.

So although strictly speaking we could do IPoIBoEth, I think having IPoIB check
the LL and limiting itself to ports which are IB LL is appropriate.
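
A small sketch of what "IPoIB checks the LL" could mean in practice. The enum and the function name are hypothetical stand-ins for the kernel's link-layer query; the only rule encoded is the one stated above, that IPoIB limits itself to ports whose link layer is IB.

```c
#include <stdbool.h>

/* Hypothetical mirror of the link-layer values discussed in this thread. */
enum sketch_link_layer {
	SKETCH_LL_UNSPECIFIED,
	SKETCH_LL_INFINIBAND,
	SKETCH_LL_ETHERNET,
};

/*
 * IPoIB decides for itself which ports it binds to, based on the link
 * layer, instead of the driver advertising "I support IPoIB".
 */
static inline bool sketch_ipoib_port_usable(enum sketch_link_layer ll)
{
	return ll == SKETCH_LL_INFINIBAND;
}
```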

>
> These seem reasonable and understandable, even if they currently
> duplicate each other a bit.
>
> > Patch 9/17:
> >
> > Most of the other comments on this patch stand as they are. I would add
> > the test:
> >
> > rdma_port_separate_read_sge(dev, port)
> > {
> > return dev->port[port]->transport & RDMA_SEPARATE_READ_SGE;
> > }
> >
> > and add the helper function:
> >
> > rdma_port_get_read_sge(dev, port)
> > {
> > if (rdma_transport_is_iwarp(dev, port))
> > return 1;
> > return dev->port[port]->max_sge;
> > }
>
> Hum, that is nice, but it doesn't quite fit with how the ULP needs to
> work. The max limit when creating a WR is the value passed into the
> qp_cap, not the device maximum limit.
>
> To do this properly we need to extend the qp_cap, and that is just too
> big a change. A one bit iWarp quirk is OK for now.

I agree. This is the one place we probably want to just keep the "Transport"
check.

>
> > As Sean pointed out, force_grh should be rdma_dev_is_iboe(). The cm
>
> I actually really prefer cap_mandatory_grh - that is what is going on
> here. ie based on that name (as a reviewer) I'd expect to see the mad
> layer check that the mandatory GRH is always present, or blow up.

While a mandatory GRH (for the GMP) is what this is, the function
ib_init_ah_from_path is, more generically, handling an "IBoE address" to send
to, and therefore we need to force the GRH in the AH.

There is a whole slew of LL and Transport checks which are used to
format/resolve addresses.

Functions which need to know that the address format is "IBoE"

-- cma.c: cma_acquire_dev
-- cma.c: cma_modify_qp_rtr
-- sa_query.c: ib_init_ah_from_path
-- verbs.c: ib_resolve_eth_l2_attrs

What about a check rdma_port_req_iboe_addr()?

Functions where AF_IB is checked against LL because it does not make sense to
use AF_IB on anything but an IB LL

-- cma.c: cma_listen_on_dev
-- cma.c: cma_bind_loopback

These checks should be directly against the LL

Functions where the ARP type is checked against the link layer.

-- cma.c: cma_acquire_dev
-- cma.c: cma_bind_loopback

These checks should be directly against the LL

>
> Some of the other checks in this file revolve around pkey, I'm not
> sure what rocee does there? cap_pkey_supported ?

It seems IBoE just hardcodes the pkey to 0xffff. I don't see it used anywhere.

Function where port requires "real" PKey

-- cma.c: cma_ib_init_qp_attr

Check rdma_port_req_pkey()?
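
A rough sketch of how the two proposed checks might relate. The struct `sketch_port`, its single flag, and the helper names are hypothetical; the sketch assumes, as observed above, that an IBoE-addressed port uses the fixed PKey 0xffff while every other port uses the PKey that was actually looked up.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-port description of the address format in use. */
struct sketch_port {
	bool req_iboe_addr; /* fabricated PathRecord, GRH forced in the AH */
};

/* A "real" PKey lookup is needed only when addressing is not IBoE style. */
static inline bool sketch_port_req_pkey(const struct sketch_port *p)
{
	return !p->req_iboe_addr;
}

/* PKey to program into the QP attributes for this port. */
static inline uint16_t sketch_qp_pkey(const struct sketch_port *p,
				      uint16_t queried_pkey)
{
	return sketch_port_req_pkey(p) ? queried_pkey : 0xffff;
}
```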


Overall for the addressing choices:

The "Transport" (or protocol, or whatever) is Verbs. The Layer below Verbs
(OPA/IB/Eth/TCP) defines how addressing, route, and connection information is
generated, communicated, and used.

As Jason and Doug have been saying sometimes we want to know when that requires
SA interaction or the use of the CM protocol (or neither). Other times we just
need to know what the Address format or Link Layer is.


Ira


2015-04-10 07:46:52

by Michael Wang

Subject: Re: [PATCH v2 01/17] IB/Verbs: Implement new callback query_transport() for each HW

On 04/09/2015 11:19 PM, Doug Ledford wrote:
[snip]
>>
>> I've written this before: The mess here is that it is too hard to know
>> what the call sites are actually checking for when it is some baroque
>> conditional.
>
> The two goals: being specific about what the test is returning and
> minimizing the bitmap footprint; are not necessarily opposed. One can
> do both at the same time.

This could be an internal rework once the cap_XX() helpers are in use by the
core layer. At that point we would not need to touch the core layer again: we
would just introduce the bitmap in the verbs layer and replace the
implementation of these helpers with bitmap checks, keeping the documented semantics.
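
A minimal sketch of that staging, with hypothetical names throughout. Stage 1 derives the answer from the transport type (here only the SMI case, true for plain IB as in patch 4/17); stage 2 answers the same question from a bit stored at registration time. Callers only ever see the helper, so the switch is invisible to them.

```c
#include <stdbool.h>

/* Hypothetical mirror of the transport types in this patch set. */
enum sketch_transport {
	SKETCH_T_IB,
	SKETCH_T_IWARP,
	SKETCH_T_USNIC,
	SKETCH_T_USNIC_UDP,
	SKETCH_T_IBOE,
};

#define SKETCH_CAP_SMI 0x1u

/* Stage 1: helper implemented by inspecting the transport type. */
static inline bool sketch_cap_smi_v1(enum sketch_transport t)
{
	return t == SKETCH_T_IB;
}

/* Bits a driver would set once, when the device is registered. */
static inline unsigned int sketch_caps_for(enum sketch_transport t)
{
	return t == SKETCH_T_IB ? SKETCH_CAP_SMI : 0u;
}

/* Stage 2: same semantics, now a single bit test. */
static inline bool sketch_cap_smi_v2(unsigned int caps)
{
	return (caps & SKETCH_CAP_SMI) != 0;
}
```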

Regards,
Michael Wang

>

2015-04-10 07:48:18

by Ira Weiny

Subject: Re: [PATCH v2 01/17] IB/Verbs: Implement new callback query_transport() for each HW

> > diff --git a/drivers/infiniband/core/device.c b/drivers/infiniband/core/device.c
> > index 18c1ece..a9587c4 100644
> > --- a/drivers/infiniband/core/device.c
> > +++ b/drivers/infiniband/core/device.c
> > @@ -76,6 +76,7 @@ static int ib_device_check_mandatory(struct ib_device *device)
> > } mandatory_table[] = {
> > IB_MANDATORY_FUNC(query_device),
> > IB_MANDATORY_FUNC(query_port),
> > + IB_MANDATORY_FUNC(query_transport),
> > IB_MANDATORY_FUNC(query_pkey),
> > IB_MANDATORY_FUNC(query_gid),
> > IB_MANDATORY_FUNC(alloc_pd),
>
> I'm concerned about the performance implications of this. The size of
> this patchset already points out just how many places in the code we
> have to check for various aspects of the device transport in order to do
> the right thing. Without going through the entire list to see how many
> are on critical hot paths, I'm sure some of them are on at least
> partially critical hot paths (like creation of new connections). I
> would prefer to see this change be implemented via a device attribute,
> not a functional call query. That adds a needless function call in
> these paths.

I like the idea of a query_transport but at the same time would like to see the
use of "transport" reduced. A reduction in the use of this call could
eliminate most performance concerns.

So can we keep this abstraction if at the end of the series we limit its use?

>
> > diff --git a/drivers/infiniband/core/verbs.c b/drivers/infiniband/core/verbs.c
> > index f93eb8d..83370de 100644
> > --- a/drivers/infiniband/core/verbs.c
> > +++ b/drivers/infiniband/core/verbs.c
> > @@ -133,14 +133,16 @@ enum rdma_link_layer rdma_port_get_link_layer(struct ib_device *device, u8 port_
> > if (device->get_link_layer)
> > return device->get_link_layer(device, port_num);
> >
> > - switch (rdma_node_get_transport(device->node_type)) {
> > + switch (device->query_transport(device, port_num)) {
> > case RDMA_TRANSPORT_IB:
> > + case RDMA_TRANSPORT_IBOE:
> > return IB_LINK_LAYER_INFINIBAND;
>
> If we are preserving ABI, then this looks wrong. Currently, IBOE
> returns transport IB and link layer Ethernet. It should not return
> link layer IB, it does not support IB link layer operations (such as MAD
> access).

I think the original code has the bug.

IBoE devices currently return a transport of IB but they probably never get
here because they support the get_link_layer callback used a few lines above.
So this "bug" was probably never hit.
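
To make the "never hit" argument concrete, here is a simplified sketch of the lookup order. The types and names are hypothetical stand-ins, not the kernel's; the sketch only shows that a device providing its own link-layer callback is answered before the transport-based fallback is ever consulted.

```c
#include <stddef.h>

enum sketch_transport { SKETCH_T_IB, SKETCH_T_IWARP, SKETCH_T_USNIC };
enum sketch_ll { SKETCH_LL_UNSPECIFIED, SKETCH_LL_INFINIBAND, SKETCH_LL_ETHERNET };

/* Hypothetical device: a transport plus an optional link-layer callback. */
struct sketch_device {
	enum sketch_transport transport;
	enum sketch_ll (*get_link_layer)(const struct sketch_device *dev);
};

static enum sketch_ll sketch_port_link_layer(const struct sketch_device *dev)
{
	/* Drivers that know better (e.g. IBoE) answer directly. */
	if (dev->get_link_layer)
		return dev->get_link_layer(dev);

	/* Fallback: infer the link layer from the transport. */
	switch (dev->transport) {
	case SKETCH_T_IB:
		return SKETCH_LL_INFINIBAND;
	case SKETCH_T_IWARP:
	case SKETCH_T_USNIC:
		return SKETCH_LL_ETHERNET;
	default:
		return SKETCH_LL_UNSPECIFIED;
	}
}

/* Callback an IBoE-style driver might provide. */
static enum sketch_ll sketch_iboe_link_layer(const struct sketch_device *dev)
{
	(void)dev;
	return SKETCH_LL_ETHERNET;
}
```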

>
> > case RDMA_TRANSPORT_IWARP:
> > case RDMA_TRANSPORT_USNIC:
> > case RDMA_TRANSPORT_USNIC_UDP:
> > return IB_LINK_LAYER_ETHERNET;
> > default:
> > + BUG();
> > return IB_LINK_LAYER_UNSPECIFIED;
> > }
> > }
>
> [ snip ]
>
> > diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
> > index 65994a1..d54f91e 100644
> > --- a/include/rdma/ib_verbs.h
> > +++ b/include/rdma/ib_verbs.h
> > @@ -75,10 +75,13 @@ enum rdma_node_type {
> > };
> >
> > enum rdma_transport_type {
> > + /* legacy for users */
> > RDMA_TRANSPORT_IB,
> > RDMA_TRANSPORT_IWARP,
> > RDMA_TRANSPORT_USNIC,
> > - RDMA_TRANSPORT_USNIC_UDP
> > + RDMA_TRANSPORT_USNIC_UDP,
> > + /* new transport */
> > + RDMA_TRANSPORT_IBOE,
> > };
>
> I'm also concerned about this. I would like to see this enum
> essentially turned into a bitmap. One that is constructed in such a way
> that we can always get the specific test we need with only one compare
> against the overall value. In order to do so, we need to break it down
> into the essential elements that are part of each of the transports.
> So, for instance, we can define the two link layers we have so far, plus
> reserve one for OPA which we know is coming:
>
> RDMA_LINK_LAYER_IB = 0x00000001,
> RDMA_LINK_LAYER_ETH = 0x00000002,
> RDMA_LINK_LAYER_OPA = 0x00000004,
> RDMA_LINK_LAYER_MASK = 0x0000000f,

I would reserve more bits here.

>
> We can then define the currently known high level transport types:
>
> RDMA_TRANSPORT_IB = 0x00000010,
> RDMA_TRANSPORT_IWARP = 0x00000020,
> RDMA_TRANSPORT_USNIC = 0x00000040,
> RDMA_TRANSPORT_USNIC_UDP = 0x00000080,
> RDMA_TRANSPORT_MASK = 0x000000f0,

I would reserve more bits here.

>
> We could then define bits for the IB management types:
>
> RDMA_MGMT_IB = 0x00000100,
> RDMA_MGMT_OPA = 0x00000200,
> RDMA_MGMT_MASK = 0x00000f00,

We at least need bits for SA / CM support.

I said previously that all device types support QP1. I was wrong... I forgot
about USNIC devices. So the full management bit mask is:


RDMA_MGMT_IB_MAD = 0x00000100,
RDMA_MGMT_QP0 = 0x00000200,
RDMA_MGMT_SA = 0x00000400,
RDMA_MGMT_CM = 0x00000800,
RDMA_MGMT_OPA_MAD = 0x00001000,
RDMA_MGMT_MASK = 0x000fff00,

With a couple of spares.

The MAD stack is pretty agnostic to the types of MADs passing through it so we
don't really need PM flags etc.

>
> Then we have space to define specific quirks:
>
> RDMA_SEPARATE_READ_SGE = 0x00001000,
> RDMA_QUIRKS_MASK = 0xfffff000

shift for spares...

RDMA_SEPARATE_READ_SGE = 0x00100000,
RDMA_QUIRKS_MASK = 0xfff00000
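
A sketch that puts the proposed fields side by side so the layout can be sanity-checked. The bit values are copied from this thread; the `SKETCH_PORT_*` combinations and the two helpers are hypothetical, and which management bits each port type would really set is an assumption made for the example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Values as proposed in this thread. */
#define RDMA_LINK_LAYER_IB     0x00000001u
#define RDMA_LINK_LAYER_ETH    0x00000002u
#define RDMA_LINK_LAYER_MASK   0x0000000fu
#define RDMA_TRANSPORT_IB      0x00000010u
#define RDMA_TRANSPORT_IWARP   0x00000020u
#define RDMA_TRANSPORT_MASK    0x000000f0u
#define RDMA_MGMT_IB_MAD       0x00000100u
#define RDMA_MGMT_QP0          0x00000200u
#define RDMA_MGMT_SA           0x00000400u
#define RDMA_MGMT_CM           0x00000800u
#define RDMA_MGMT_OPA_MAD      0x00001000u
#define RDMA_MGMT_MASK         0x000fff00u
#define RDMA_SEPARATE_READ_SGE 0x00100000u
#define RDMA_QUIRKS_MASK       0xfff00000u

/* Hypothetical per-port values a driver might register (assumptions). */
#define SKETCH_PORT_IB    (RDMA_LINK_LAYER_IB | RDMA_TRANSPORT_IB | \
			   RDMA_MGMT_IB_MAD | RDMA_MGMT_QP0 | \
			   RDMA_MGMT_SA | RDMA_MGMT_CM)
#define SKETCH_PORT_IBOE  (RDMA_LINK_LAYER_ETH | RDMA_TRANSPORT_IB | \
			   RDMA_MGMT_IB_MAD | RDMA_MGMT_CM)
#define SKETCH_PORT_IWARP (RDMA_LINK_LAYER_ETH | RDMA_TRANSPORT_IWARP | \
			   RDMA_SEPARATE_READ_SGE)

/* Single-capability test: any bit of the mask is set. */
static inline bool sketch_port_has_any(uint32_t bits, uint32_t mask)
{
	return (bits & mask) != 0;
}

/*
 * Composite test: every bit of the mask is set. The parentheses matter,
 * because in C "==" binds tighter than "&".
 */
static inline bool sketch_port_has_all(uint32_t bits, uint32_t mask)
{
	return (bits & mask) == mask;
}
```

A composite such as "Ethernet link layer and IB transport" has to use the has_all form; a plain AND would also match iWARP ports, which share the Ethernet bit.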

>
> Once those are defined, a few definitions for device drivers to use when
> they initialize a device to set the bitmap to the right values:
>
> #define IS_IWARP (RDMA_LINK_LAYER_ETH | RDMA_TRANSPORT_IWARP |
> RDMA_SEPARATE_READ_SGE)
> #define IS_IB (RDMA_LINK_LAYER_IB | RDMA_TRANSPORT_IB | RDMA_MGMT_IB)
> #define IS_IBOE (RDMA_LINK_LAYER_ETH | RDMA_TRANSPORT_IB)
> #define IS_USNIC (RDMA_LINK_LAYER_ETH | RDMA_TRANSPORT_USNIC)
> #define IS_OPA (RDMA_LINK_LAYER_OPA | RDMA_TRANSPORT_IB | RDMA_MGMT_IB |
> RDMA_MGMT_OPA)
>
> Then you need to define the tests:
>
> static inline bool
> rdma_transport_is_iwarp(struct ib_device *dev, u8 port)
> {
> return dev->port[port]->transport & RDMA_TRANSPORT_IWARP;
> }
>
> /* Note: this intentionally covers IB, IBOE, and OPA...use
> rdma_dev_is_ib if you want to only get physical IB devices */
> static inline bool
> rdma_transport_is_ib(struct ib_device *dev, u8 port)
> {
> return dev->port[port]->transport & RDMA_TRANSPORT_IB;
> }
>
> rdma_port_is_ib(struct ib_device *dev, u8 port)

I prefer

rdma_port_link_layer_is_ib
rdma_port_link_layer_is_eth


> {
> return dev->port[port]->transport & RDMA_LINK_LAYER_IB;
> }
>
> rdma_port_is_iboe(struct ib_device *dev, u8 port)

I'm not sure what this means.

rdma_port_req_iboe_addr seems more appropriate because what we really need to
know is that this device requires an IBoE address format. (PKey is "fake",
PathRecord is fabricated rather than queried, GRH is required in the AH
conversion.)

> {
> return (dev->port[port]->transport & IS_IBOE) == IS_IBOE;
> }
>
> rdma_port_is_usnic(struct ib_device *dev, u8 port)

rdma_transport_is_usnic

> {
> return dev->port[port]->transport & RDMA_TRANSPORT_USNIC;
> }
>
> rdma_port_is_opa(struct ib_device *dev, u8 port)

rdma_port_link_layer_is_opa

> {
> return dev->port[port]->transport & RDMA_LINK_LAYER_OPA;
> }
>
> rdma_port_is_iwarp(struct ib_device *dev, u8 port)
> {
> return rdma_transport_is_iwarp(dev, port);

Why not call rdma_transport_is_iwarp?

> }
>
> rdma_port_ib_fabric_mgmt(struct ib_device *dev, u8 port)
> {
> return dev->port[port]->transport & RDMA_MGMT_IB;

I agree with Jason that this does not adequately describe the functionality we
are looking for.

> }
>
> rdma_port_opa_mgmt(struct ib_device *dev, u8 port)

Agree.

> {
> return dev->port[port]->transport & RDMA_MGMT_OPA;
> }
>
> Other things can be changed too. Like rdma_port_get_link_layer can
> become this:
>
> {
> return dev->transport & RDMA_LINK_LAYER_MASK;
> }
>
> From patch 2/17:
>
>
> > +static inline int rdma_ib_mgmt(struct ib_device *device, u8 port_num)
> > +{
> > + enum rdma_transport_type tp = device->query_transport(device,
> > port_num);
> > +
> > + return (tp == RDMA_TRANSPORT_IB || tp == RDMA_TRANSPORT_IBOE);
> > +}
>
> This looks wrong. IBOE doesn't have IB management. At least it doesn't
> have subnet management.

Right that is why we need a bit for CM vs SA capability.

>
> Actually, reading through the remainder of the patches, there is some
> serious confusion taking place here. In later patches, you use this as
> a surrogate for cap_cm, which implies you are talking about connection
> management. This is very different than the rdma_dev_ib_mgmt() test
> that I create above, which specifically refers to IB management tasks
> unique to IB/OPA: MAD, SM, multicast.

multicast is part of SA so should be covered by the SA capability.

>
> The kernel connection management code is not really limited. It
> supports IB, IBOE, iWARP, and in the future it will support OPA. There
> are some places in the CM code where we test for just IB/IBOE currently,
> but that's only because we split iWARP out higher up in the abstraction
> hierarchy. So, calling something rdma_ib_mgmt and meaning a rather
> specialized test in the CM is probably misleading.

Right! So we should have a CM capability.

>
> To straighten all this out, lets break management out into the two
> distinct types:
>
> rdma_port_ib_fabric_mgmt() <- fabric specific management tasks: MAD, SM,
> multicast. The proper test for this with my bitmap above is a simple
> transport & RDMA_MGMT_IB test. It will be true for IB and OPA fabrics.

General management is covered by

RDMA_MGMT_IB_MAD = 0x00000100,
RDMA_MGMT_QP0 = 0x00000200,
...
RDMA_MGMT_OPA_MAD = 0x00001000,


>
> rdma_port_conn_mgmt() <- connection management, which we currently
> support everything except USNIC (correct Sean?), so a test would be
> something like !(transport & RDMA_TRANSPORT_USNIC). This is then split
> out into two subgroups, IB style and iWARP style connection management
> (aka, rdma_port_iw_conn_mgmt() and rdma_port_ib_conn_mgmt()). In my
> above bitmap, since I didn't give IBOE its own transport type, these
> subgroups still boil down to the simple tests transport & iWARP and
> transport & IB like they do today.

Specific management features CM, route resolution (SA), and special Multicast
management requirements (SA) are covered by:

RDMA_MGMT_SA = 0x00000400,
RDMA_MGMT_CM = 0x00000800,


>
> From patch 3/17:
>
>
> > +/**
> > + * cap_ib_mad - Check if the port of device has the capability
> > Infiniband
> > + * Management Datagrams.
> > + *
> > + * @device: Device to be checked
> > + * @port_num: Port number of the device
> > + *
> > + * Return 0 when port of the device don't support Infiniband
> > + * Management Datagrams.
> > + */
> > +static inline int cap_ib_mad(struct ib_device *device, u8 port_num)
> > +{
> > + return rdma_ib_mgmt(device, port_num);
> > +}
> > +
>
> Why add cap_ib_mad? It's nothing more than rdma_port_ib_fabric_mgmt
> with a new name. Just use rdma_port_ib_fabric_mgmt() everywhere you
> have cap_ib_mad.

Because USNIC apparently does not support MADs at all. So we end up needing a
"big flag" to turn on/off ib_mad.

RDMA_MGMT_IB_MAD = 0x00000100,

>
> From patch 4/17:
>
>
> > +/**
> > + * cap_ib_smi - Check if the port of device has the capability
> > Infiniband
> > + * Subnet Management Interface.
> > + *
> > + * @device: Device to be checked
> > + * @port_num: Port number of the device
> > + *
> > + * Return 0 when port of the device don't support Infiniband
> > + * Subnet Management Interface.
> > + */
> > +static inline int cap_ib_smi(struct ib_device *device, u8 port_num)
> > +{
> > + return rdma_transport_ib(device, port_num);
> > +}
> > +
>
> Same as the previous patch. This is needless indirection. Just use
> rdma_port_ib_fabric_mgmt directly.

No this is not the same... You said:

<quote>
rdma_port_ib_fabric_mgmt() <- fabric specific management tasks: MAD, SM,
multicast. The proper test for this with my bitmap above is a simple
transport & RDMA_MGMT_IB test. It will be true for IB and OPA fabrics.
</quote>

But what we are looking for here is "does the port support QP0?", which was
previously flagged as "smi". Cover this with this flag:

RDMA_MGMT_QP0 = 0x00000200,

So the optimized version of the above is:

static inline int cap_ib_smi(struct ib_device *device, u8 port_num)
{
return [<device> <port>]->flags & RDMA_MGMT_QP0;
}
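
To make the placeholder above concrete, here is a minimal standalone sketch of
the QP0 test. The struct port_attrs type and its flags field are assumptions
for illustration only (where the per-port bitmap lives has not been settled in
this thread); the flag values are the ones proposed above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Management flag values as proposed in this thread. */
enum rdma_mgmt_flags {
	RDMA_MGMT_IB_MAD  = 0x00000100,
	RDMA_MGMT_QP0     = 0x00000200,
	RDMA_MGMT_SA      = 0x00000400,
	RDMA_MGMT_CM      = 0x00000800,
	RDMA_MGMT_OPA_MAD = 0x00001000,
};

/* Hypothetical per-port attribute storage; not an existing kernel struct. */
struct port_attrs {
	uint32_t flags;
};

/* True when the port handles SMI traffic on QP0. */
static inline bool cap_ib_smi(const struct port_attrs *port)
{
	return (port->flags & RDMA_MGMT_QP0) != 0;
}
```

With this layout an IB port would set RDMA_MGMT_QP0 while an IBoE port would
leave it clear, so the helper is a single mask test with no transport or link
layer comparison.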


>
> Patch 5/17:
>
> Again, just use rdma_port_ib_conn_mgmt() directly.

Agreed.

>
> Patch 6/17:
>
> Again, just use rdma_port_ib_fabric_mgmt() directly.

No this needs to be

rdma_port_requires_sa()

or something...

>
> Patch 7/17:
>
> Again, just use rdma_port_ib_fabric_mgmt() directly. It's perfectly
> applicable to the IB mcast registration requirements.

No this needs to be

rdma_port_requires_sa()

or something...
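
A rough sketch of what such a helper might look like, assuming the
RDMA_MGMT_SA bit proposed above and a plain flags word per port. The
choose_path_source() caller and its enum are hypothetical, added only to show
how a call site would branch between asking the SA and fabricating the record
locally as the IBoE code does.

```c
#include <stdbool.h>
#include <stdint.h>

#define RDMA_MGMT_SA 0x00000400u /* value proposed earlier in the thread */

/* Hypothetical: where a path record (or multicast join) comes from. */
enum path_source {
	PATH_FROM_SA_QUERY, /* ask the subnet administrator */
	PATH_FABRICATED,    /* build it locally, no SA involved */
};

/* True when route resolution and multicast joins must go through the SA. */
static inline bool rdma_port_requires_sa(uint32_t port_flags)
{
	return (port_flags & RDMA_MGMT_SA) != 0;
}

/* Example call site: pick the resolution method from the capability. */
static inline enum path_source choose_path_source(uint32_t port_flags)
{
	return rdma_port_requires_sa(port_flags) ? PATH_FROM_SA_QUERY
						 : PATH_FABRICATED;
}
```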

>
> Patch 8/17:
>
> Here we can create a new test if we are using the bitmap I created
> above:
>
> rdma_port_ipoib(struct ib_device *dev, u8 port)
> {
> return !(dev->port[port]->transport & RDMA_LINK_LAYER_ETH);

I would prefer a link layer (or generic) bit mask rather than '&' of
"transport" and "link layer". That just seems wrong.

> }
>
> This is presuming that OPA will need ipoib devices. This will cause all
> non-Ethernet link layer devices to return true, and right now, that is
> all IB and all OPA devices.

Agreed.

>
> Patch 9/17:
>
> Most of the other comments on this patch stand as they are. I would add
> the test:
>
> rdma_port_separate_read_sge(dev, port)
> {
> return dev->port[port]->transport & RDMA_SEPARATE_READ_SGE;
> }
>
> and add the helper function:
>
> rdma_port_get_read_sge(dev, port)
> {
> if (rdma_transport_is_iwarp(dev, port))
> return 1;
> return dev->port[port]->max_sge;
> }
>
> Then, as Jason points out, if at some point in the future the kernel is
> modified to support devices with asymmetrical read/write SGE sizes, this
> function can be modified to support those devices.
>
> Patch 10/17:
>
> As Sean pointed out, force_grh should be rdma_dev_is_iboe(). The cm
> handles iw devices, but you notice all of the functions you modify here
> start with ib_. The iwarp connections are funneled through iw_ specific
> function variants, and so even though the cm handles iwarp, ib, and roce
> devices, you never see anything other than ib/iboe (and opa in the
> future) get to the ib_ variants of the functions. So, they wrote the
> original tests as tests against the link layer being ethernet and used
> that to differentiate between ib and iboe devices. It works, but can
> confuse people. So, everyplace that has !rdma_transport_ib should
> really be rdma_dev_is_iboe instead. If we ever merge the iw_ and ib_
> functions in the future, having this right will help avoid problems.

I guess rdma_dev_is_iboe is ok. But it seems like we are keying off the
addresses not necessarily the devices.

>
> Patch 11/17:
>
> I wouldn't reform the link_layer_show except to make it compile with the
> new defines I used above.
>
> Patch 12/17:
>
> Go ahead and add a helper to check all ports on a dev, just make it
> rdma_hca_ib_conn_mgmt() and have it loop through
> rdma_port_ib_conn_mgmt() return codes.
>
> Patch 13/17:
>
> This patch is largely unneeded if we reworked the bitmap like I have
> above. A lot of the changes you made to switch from case statements to
> multiple if statements can go back to being case statements because in
> the bitmap IB and IBOE are still both transport IB, so you just do the
> case on the transport bits and not on the link layer bits.
>
> Patch 14/17:
>
> Seems ok.
>
> Patch 15/17:
>
> If you implement the bitmap like I list above, then this code will need
> fixed up to use the bitmap. Otherwise it looks OK.
>
> Patch 16/17:
>
> OK.
>
> Patch 17/17:
>
> I would drop this patch. In the future, the mlx5 driver will support
> both Ethernet and IB like mlx4 does, and we would just need to pull this
> code back to core instead of only in mlx4.

Agreed.

Ira


2015-04-10 08:19:04

by Michael Wang

Subject: Re: [PATCH v2 01/17] IB/Verbs: Implement new callback query_transport() for each HW



On 04/09/2015 06:00 PM, Jason Gunthorpe wrote:
> On Thu, Apr 09, 2015 at 02:42:24PM +0200, Michael Wang wrote:
>> On 04/08/2015 10:10 PM, Jason Gunthorpe wrote:
>> [snip]
>>>
>>> Some of the other checks in this file revolve around pkey, I'm not
>>> sure what rocee does there? cap_pkey_supported ?
>>
>> I'm not sure if this count in capability... how shall we describe it?
>
> I'm not sure how rocee uses pkey, but maybe the GRH and pkey thing
> would work well together under a single 'cap_ethernet_ah' ?

Sounds better, we can use this in all the cases that handle addresses
for the eth link layer :-)

Regards,
Michael Wang

>
> Jason
>

2015-04-10 08:25:26

by Michael Wang

Subject: Re: [PATCH v2 01/17] IB/Verbs: Implement new callback query_transport() for each HW

On 04/10/2015 08:16 AM, ira.weiny wrote:
> First off there are 2 separate issues here:
>
> 1) We need to communicate if a port supports or requires various management
> support from the ib_mad, ib_cm, and/or ib_sa modules.
>
> 2) We need to communicate how addresses are formatted and resolved for a
> particular port
>
>
> In general I don't think we need to remove all uses of the Transport
> or Link Layer.
>
> Although we may be able to remove most of the transport uses.
>
> On Wed, Apr 08, 2015 at 02:10:15PM -0600, Jason Gunthorpe wrote:
[snip]
>
>>
>> Some of the other checks in this file revolve around pkey, I'm not
>> sure what rocee does there? cap_pkey_supported ?
>
> It seems IBoE just hardcodes the pkey to 0xffff. I don't see it used anywhere.
>
> Function where port requires "real" PKey
>
> -- cma.c: cma_ib_init_qp_attr
>
> Check rdma_port_req_pkey()?

What about cap_eth_ah() for all the cases that need eth address handling?

>
>
> Over all for the addressing choices:
>
> The "Transport" (or protocol, or whatever) is Verbs. The Layer below Verbs
> (OPA/IB/Eth/TCP) defines how addressing, route, and connection information is
> generated, communicated, and used.
>
> As Jason and Doug have been saying sometimes we want to know when that requires
> SA interaction or the use of the CM protocol (or neither). Other times we just
> need to know what the Address format or Link Layer is.

So far it seems we may be able to eliminate the link layer helper in the core
layer, but I'll keep that helper in the next version; if we no longer need it
later, let's erase it then ;-)

Regards,
Michael Wang

>
>
> Ira
>

2015-04-10 14:57:21

by Ira Weiny

Subject: Re: [PATCH v2 01/17] IB/Verbs: Implement new callback query_transport() for each HW

On Fri, Apr 10, 2015 at 10:25:21AM +0200, Michael Wang wrote:
> On 04/10/2015 08:16 AM, ira.weiny wrote:
> > First off there are 2 separate issues here:
> >
> > 1) We need to communicate if a port supports or requires various management
> > support from the ib_mad, ib_cm, and/or ib_sa modules.
> >
> > 2) We need to communicate how addresses are formatted and resolved for a
> > particular port
> >
> >
> > In general I don't think we need to remove all uses of the Transport
> > or Link Layer.
> >
> > Although we may be able to remove most of the transport uses.
> >
> > On Wed, Apr 08, 2015 at 02:10:15PM -0600, Jason Gunthorpe wrote:
> [snip]
> >
> >>
> >> Some of the other checks in this file revolve around pkey, I'm not
> >> sure what rocee does there? cap_pkey_supported ?
> >
> > It seems IBoE just hardcodes the pkey to 0xffff. I don't see it used anywhere.
> >
> > Function where port requires "real" PKey
> >
> > -- cma.c: cma_ib_init_qp_attr
> >
> > Check rdma_port_req_pkey()?
>
> What about cap_eth_ah() for all the cases that need eth address handling?

That works.

>
> >
> >
> > Over all for the addressing choices:
> >
> > The "Transport" (or protocol, or whatever) is Verbs. The Layer below Verbs
> > (OPA/IB/Eth/TCP) defines how addressing, route, and connection information is
> > generated, communicated, and used.
> >
> > As Jason and Doug have been saying sometimes we want to know when that requires
> > SA interaction or the use of the CM protocol (or neither). Other times we just
> > need to know what the Address format or Link Layer is.
>
> So far it seems we may be able to eliminate the link layer helper in the core
> layer, but I'll keep that helper in the next version; if we no longer need it
> later, let's erase it then ;-)

Eliminating Link Layer is fine if we can do it, but I still think that
something like IPoIB should check the link layer.

After sleeping on it the driver exporting cap_ipoib() does not seem _so_ bad
but I still see a distinction between ULPs like IPoIB and the other modules we
have been discussing.

Ira

> Regards,
> Michael Wang
>
> >
> >
> > Ira
> >

2015-04-10 16:16:14

by Jason Gunthorpe

Subject: Re: [PATCH v2 01/17] IB/Verbs: Implement new callback query_transport() for each HW

On Fri, Apr 10, 2015 at 02:16:11AM -0400, ira.weiny wrote:

> > IB ROCEE OPA
> > SMI Y N Y (though the OPA smi looked a bit different)
>
> Yes OPA is different but it is based on the class version of the individual
> MADs not any particular device/port support.

> > OPA SMP N N Y
>
> How is this different from the SMI?

Any code that generates SMPs and SMIs is going to need to know what
format to generate them in. It seems we have a sort of weird world
where IB SMPs are supported on OPA but not the IB SMI.

Not sure any users exist though..

> > Maybe the above set would make a lot more sense as:
> > cap_ib_qp0
> > cap_ib_qp1
> > cap_opa_qp0
>
> I disagree.
>
> All we need right now is cap_qp0. All devices currently support QP1.

I didn't list iWarp in the table because everything is no, but it
doesn't support QP1.

> > ib_sa means the IBA SA protocol is supported (Y Y Y)
>
> I think this should be (Y N Y)
>
> IBoE has no SA. The IBoE code "fabricates" a Path Record it does not need to
> interact with the SA.

I was wondering why there are so many checks in the SA code. I know
RoCEE doesn't use it, so why are they there?

> > ib_mcast true if the IBA SA protocol is used for multicast GIDs (Y N
> > Y)
>
> Given the above why can't we just have the "ib_sa" flag?

Maybe I got it wrong, but yes, if it really means 'IBA SA protocol for
multicast' then it can just be cap_sa.

But there is also the idea that some devices can't do multicast at all
(iWarp), we must care about that at some point?

> However, I think checking the link layer is more appropriate here.
> It does not make sense to do IP over IB over Eth. Even though the
> IBoE can do the "IB" protocol.

Yes, it is ugly.

I think if we look closely we'll find that IPoIB today has a hard
requirement on cap_sa being true, so lets use that?

In fact any ULP that unconditionally uses the SA can use that.

> > I actually really prefer cap_mandatory_grh - that is what is going on
> > here. ie based on that name (as a reviewer) I'd expect to see the mad
> > layer check that the mandatory GRH is always present, or blow up.
>
> While GRH mandatory (for the GMP) is what this is. The function
> ib_init_ah_from_path generically is really handling an "IBoE address" to send
> to and therefore we need to force the GRH in the AH.

This makes sense to me.

It appears we have at least rocee, rocee v2 (udp?), tcp, ib and opa
address and AH formats? opa would support ib addresses too I guess.

A
bool rdma_port_addr_is_XXX()

along with a

enum AddrType rdma_port_addr_type()

Might be the thing? The latter should only be used with switch()
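
A small sketch of that pairing, for illustration only. The enum values, the
struct sketch_port type, and the name helper are all assumptions; the thread
has not fixed the list of address types (rocee v2, for instance, is left out
here because its status is still an open question).

```c
#include <stdbool.h>

/* Hypothetical address types; the final list is undecided in this thread. */
enum rdma_addr_type {
	RDMA_ADDR_IB,   /* IB addressing */
	RDMA_ADDR_IBOE, /* rocee addressing */
	RDMA_ADDR_TCP,  /* iWARP addressing */
};

/* Hypothetical per-port storage for the address type. */
struct sketch_port {
	enum rdma_addr_type addr_type;
};

/* The enum accessor: meant to be consumed by switch() only. */
static inline enum rdma_addr_type
rdma_port_addr_type(const struct sketch_port *port)
{
	return port->addr_type;
}

/* The bool variant for single-type tests in if() conditions. */
static inline bool rdma_port_addr_is_iboe(const struct sketch_port *port)
{
	return port->addr_type == RDMA_ADDR_IBOE;
}

/* Example switch() user; with no default case the compiler can warn
 * when a new address type is added but not handled here. */
static inline const char *rdma_addr_type_name(const struct sketch_port *port)
{
	switch (rdma_port_addr_type(port)) {
	case RDMA_ADDR_IB:
		return "ib";
	case RDMA_ADDR_IBOE:
		return "iboe";
	case RDMA_ADDR_TCP:
		return "tcp";
	}
	return "unknown";
}
```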

Jason

2015-04-10 16:48:39

by Doug Ledford

Subject: Re: [PATCH v2 01/17] IB/Verbs: Implement new callback query_transport() for each HW

On Fri, 2015-04-10 at 09:46 +0200, Michael Wang wrote:
> On 04/09/2015 11:19 PM, Doug Ledford wrote:
> [snip]
> >>
> >> I've written this before: The mess here is that it is too hard to know
> >> what the call sites are actually checking for when it is some baroque
> >> conditional.
> >
> > The two goals: being specific about what the test is returning and
> > minimizing the bitmap footprint; are not necessarily opposed. One can
> > do both at the same time.
>
> This could be internal reforming after the cap_XX() stuff works for core
> layer, at that time we don't need to touch core layer anymore, just
> introducing this bitmap stuff in verb layer, replacing the implementation
> of these helpers with the bitmap check, and following the semantic (description).

Agreed.

--
Doug Ledford <[email protected]>
GPG KeyID: 0E572FDD




2015-04-10 17:10:49

by Doug Ledford

Subject: Re: [PATCH v2 01/17] IB/Verbs: Implement new callback query_transport() for each HW

On Fri, 2015-04-10 at 03:48 -0400, ira.weiny wrote:
> > > diff --git a/drivers/infiniband/core/device.c b/drivers/infiniband/core/device.c
> > > index 18c1ece..a9587c4 100644
> > > --- a/drivers/infiniband/core/device.c
> > > +++ b/drivers/infiniband/core/device.c
> > > @@ -76,6 +76,7 @@ static int ib_device_check_mandatory(struct ib_device *device)
> > > } mandatory_table[] = {
> > > IB_MANDATORY_FUNC(query_device),
> > > IB_MANDATORY_FUNC(query_port),
> > > + IB_MANDATORY_FUNC(query_transport),
> > > IB_MANDATORY_FUNC(query_pkey),
> > > IB_MANDATORY_FUNC(query_gid),
> > > IB_MANDATORY_FUNC(alloc_pd),
> >
> > I'm concerned about the performance implications of this. The size of
> > this patchset already points out just how many places in the code we
> > have to check for various aspects of the device transport in order to do
> > the right thing. Without going through the entire list to see how many
> > are on critical hot paths, I'm sure some of them are on at least
> > partially critical hot paths (like creation of new connections). I
> > would prefer to see this change be implemented via a device attribute,
> > not a functional call query. That adds a needless function call in
> > these paths.
>
> I like the idea of a query_transport but at the same time would like to see the
> use of "transport" reduced. A reduction in the use of this call could
> eliminate most performance concerns.
>
> So can we keep this abstraction if at the end of the series we limit its use?

The reason I don't like a query is because the transport type isn't
changing. It's a static device attribute. The only devices that *can*
change their transport are mlx4 or mlx5 devices, and they tear down and
deregister their current device and bring up a new one when they need to
change transports or link layers. So, this really isn't something we
should query, this should be part of our static device attributes.
Every other query in the list above is for something that changes. This
is not.
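
To illustrate the difference, here is a simplified standalone sketch. The
struct sketch_device type and both accessor names are hypothetical stand-ins,
not the real struct ib_device; the point is only that a value fixed at
registration time can be read from a field instead of going through an
indirect call on every check.

```c
#include <stdint.h>

enum rdma_transport_type {
	RDMA_TRANSPORT_IB,
	RDMA_TRANSPORT_IWARP,
	RDMA_TRANSPORT_USNIC,
	RDMA_TRANSPORT_USNIC_UDP,
};

/* Hypothetical, heavily reduced device structure. */
struct sketch_device {
	/* Static attribute: filled in once by the driver at registration. */
	enum rdma_transport_type transport;
	/* Callback variant, as in patch 01/17. */
	enum rdma_transport_type (*query_transport)(struct sketch_device *dev,
						    uint8_t port_num);
};

/* Callback approach: one indirect function call per check. */
static inline enum rdma_transport_type
transport_via_callback(struct sketch_device *dev, uint8_t port_num)
{
	return dev->query_transport(dev, port_num);
}

/* Attribute approach: a plain memory load. */
static inline enum rdma_transport_type
transport_via_attr(const struct sketch_device *dev)
{
	return dev->transport;
}

/* Example driver callback that always reports iWARP. */
static enum rdma_transport_type
sketch_query_iwarp(struct sketch_device *dev, uint8_t port_num)
{
	(void)dev;
	(void)port_num;
	return RDMA_TRANSPORT_IWARP;
}
```

Both accessors return the same answer for the lifetime of the device, which is
the argument for storing it rather than querying it.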

> >
> > > diff --git a/drivers/infiniband/core/verbs.c b/drivers/infiniband/core/verbs.c
> > > index f93eb8d..83370de 100644
> > > --- a/drivers/infiniband/core/verbs.c
> > > +++ b/drivers/infiniband/core/verbs.c
> > > @@ -133,14 +133,16 @@ enum rdma_link_layer rdma_port_get_link_layer(struct ib_device *device, u8 port_
> > > if (device->get_link_layer)
> > > return device->get_link_layer(device, port_num);
> > >
> > > - switch (rdma_node_get_transport(device->node_type)) {
> > > + switch (device->query_transport(device, port_num)) {
> > > case RDMA_TRANSPORT_IB:
> > > + case RDMA_TRANSPORT_IBOE:
> > > return IB_LINK_LAYER_INFINIBAND;
> >
> > > If we are preserving ABI, then this looks wrong. Currently, IBOE
> > > returns transport IB and link layer Ethernet. It should not return
> > link layer IB, it does not support IB link layer operations (such as MAD
> > access).
>
> I think the original code has the bug.
>
> IBoE devices currently return a transport of IB but they probably never get
> here because they support the get_link_layer callback used a few lines above.
> So this "bug" was probably never hit.
>
> >
> > > case RDMA_TRANSPORT_IWARP:
> > > case RDMA_TRANSPORT_USNIC:
> > > case RDMA_TRANSPORT_USNIC_UDP:
> > > return IB_LINK_LAYER_ETHERNET;
> > > default:
> > > + BUG();
> > > return IB_LINK_LAYER_UNSPECIFIED;
> > > }
> > > }
> >
> > [ snip ]
> >
> > > diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
> > > index 65994a1..d54f91e 100644
> > > --- a/include/rdma/ib_verbs.h
> > > +++ b/include/rdma/ib_verbs.h
> > > @@ -75,10 +75,13 @@ enum rdma_node_type {
> > > };
> > >
> > > enum rdma_transport_type {
> > > + /* legacy for users */
> > > RDMA_TRANSPORT_IB,
> > > RDMA_TRANSPORT_IWARP,
> > > RDMA_TRANSPORT_USNIC,
> > > - RDMA_TRANSPORT_USNIC_UDP
> > > + RDMA_TRANSPORT_USNIC_UDP,
> > > + /* new transport */
> > > + RDMA_TRANSPORT_IBOE,
> > > };
> >
> > I'm also concerned about this. I would like to see this enum
> > essentially turned into a bitmap. One that is constructed in such a way
> > that we can always get the specific test we need with only one compare
> > against the overall value. In order to do so, we need to break it down
> > into the essential elements that are part of each of the transports.
> > So, for instance, we can define the two link layers we have so far, plus
> > reserve one for OPA which we know is coming:
> >
> > RDMA_LINK_LAYER_IB = 0x00000001,
> > RDMA_LINK_LAYER_ETH = 0x00000002,
> > RDMA_LINK_LAYER_OPA = 0x00000004,
> > RDMA_LINK_LAYER_MASK = 0x0000000f,
>
> I would reserve more bits here.

Sure. I didn't mean to imply that this was as large as these bit fields
would ever need to be, I just typed it up quickly at the time.

> >
> > We can then define the currently known high level transport types:
> >
> > RDMA_TRANSPORT_IB = 0x00000010,
> > RDMA_TRANSPORT_IWARP = 0x00000020,
> > RDMA_TRANSPORT_USNIC = 0x00000040,
> > RDMA_TRANSPORT_USNIC_UDP = 0x00000080,
> > RDMA_TRANSPORT_MASK = 0x000000f0,
>
> I would reserve more bits here.
>
> >
> > We could then define bits for the IB management types:
> >
> > RDMA_MGMT_IB = 0x00000100,
> > RDMA_MGMT_OPA = 0x00000200,
> > RDMA_MGMT_MASK = 0x00000f00,
>
> We at least need bits for SA / CM support.
>
> I said previously all device types support QP1 I was wrong... I forgot about
> USNIC devices. So the full management bit mask is.
>
>
> RDMA_MGMT_IB_MAD = 0x00000100,
> RDMA_MGMT_QP0 = 0x00000200,
> RDMA_MGMT_SA = 0x00000400,
> RDMA_MGMT_CM = 0x00000800,
> RDMA_MGMT_OPA_MAD = 0x00001000,
> RDMA_MGMT_MASK = 0x000fff00,
>
> With a couple of spares.
>
> The MAD stack is pretty agnostic to the types of MADs passing through it so we
> don't really need PM flags etc.
>
> >
> > Then we have space to define specific quirks:
> >
> > RDMA_SEPARATE_READ_SGE = 0x00001000,
> > RDMA_QUIRKS_MASK = 0xfffff000
>
> shift for spares...
>
> RDMA_SEPARATE_READ_SGE = 0x00100000,
> RDMA_QUIRKS_MASK = 0xfff00000
>
> >
> > Once those are defined, a few definitions for device drivers to use when
> > they initialize a device to set the bitmap to the right values:
> >
> > #define IS_IWARP (RDMA_LINK_LAYER_ETH | RDMA_TRANSPORT_IWARP | \
> > RDMA_SEPARATE_READ_SGE)
> > #define IS_IB (RDMA_LINK_LAYER_IB | RDMA_TRANSPORT_IB | RDMA_MGMT_IB)
> > #define IS_IBOE (RDMA_LINK_LAYER_ETH | RDMA_TRANSPORT_IB)
> > #define IS_USNIC (RDMA_LINK_LAYER_ETH | RDMA_TRANSPORT_USNIC)
> > #define IS_OPA (RDMA_LINK_LAYER_OPA | RDMA_TRANSPORT_IB | RDMA_MGMT_IB | \
> > RDMA_MGMT_OPA)
> >
> > Then you need to define the tests:
> >
> > static inline bool
> > rdma_transport_is_iwarp(struct ib_device *dev, u8 port)
> > {
> > return dev->port[port]->transport & RDMA_TRANSPORT_IWARP;
> > }
> >
> > /* Note: this intentionally covers IB, IBOE, and OPA...use
> > rdma_dev_is_ib if you want to only get physical IB devices */
> > static inline bool
> > rdma_transport_is_ib(struct ib_device *dev, u8 port)
> > {
> > return dev->port[port]->transport & RDMA_TRANSPORT_IB;
> > }
> >
> > rdma_port_is_ib(struct ib_device *dev, u8 port)
>
> I prefer
>
> rdma_port_link_layer_is_ib
> rdma_port_link_layer_is_eth

In my quick example, anything that started with rdma_transport* was
testing the high level transport attribute of any given port of any
given device, and anything that started with rdma_port* was testing the
link layer on a specific port of any given device.

>
> > {
> > return dev->port[port]->transport & RDMA_LINK_LAYER_IB;
> > }
> >
> > rdma_port_is_iboe(struct ib_device *dev, u8 port)
>
> I'm not sure what this means.
>
> rdma_port_req_iboe_addr seems more appropriate because what we really need to
> know is that this device requires an IBoE address format. (PKey is "fake",
> PathRecord is fabricated rather than queried, GRH is required in the AH
> conversion.)

TomAto, Tomato...port_is_iboe implicitly means port_req_iboe_addr.
>
> > {
> > return (dev->port[port]->transport & IS_IBOE) == IS_IBOE;
> > }
> >
> > rdma_port_is_usnic(struct ib_device *dev, u8 port)
>
> rdma_transport_is_usnic
>
> > {
> > return dev->port[port]->transport & RDMA_TRANSPORT_USNIC;
> > }
> >
> > rdma_port_is_opa(struct ib_device *dev, u8 port)
>
> rdma_port_link_layer_is_opa
>
> > {
> > return dev->port[port]->transport & RDMA_LINK_LAYER_OPA;
> > }
> >
> > rdma_port_is_iwarp(struct ib_device *dev, u8 port)
> > {
> > return rdma_transport_is_iwarp(dev, port);
>
> Why not call rdma_transport_is_iwarp?

As per my above statement, rdma_transport* tests were testing the high
level transport type, rdma_port* types were testing link layers. iWARP
has an Eth link layer, so technically port_is_iwarp makes no sense. But
since all the other types had a check too, I included port_is_iwarp just
to be complete, and if you are going to ask if a specific port is iwarp
as a link layer, it makes sense to say yes if the transport is iwarp,
not if the link layer is eth.
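
The naming split can be sketched as follows. This is a reduced, standalone
version that takes the bitmap as a plain integer (the real helpers would take
a device and port, and the IS_* macros here omit the management and quirk bits
from the fuller proposal). Treat it as an illustration of the convention, not
as the final helper set.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit values from the bitmap proposal quoted above. */
enum {
	RDMA_LINK_LAYER_IB   = 0x00000001,
	RDMA_LINK_LAYER_ETH  = 0x00000002,
	RDMA_LINK_LAYER_OPA  = 0x00000004,
	RDMA_TRANSPORT_IB    = 0x00000010,
	RDMA_TRANSPORT_IWARP = 0x00000020,
};

/* Reduced device-type macros: link layer and transport bits only. */
#define IS_IB    (RDMA_LINK_LAYER_IB  | RDMA_TRANSPORT_IB)
#define IS_IBOE  (RDMA_LINK_LAYER_ETH | RDMA_TRANSPORT_IB)
#define IS_IWARP (RDMA_LINK_LAYER_ETH | RDMA_TRANSPORT_IWARP)

/* rdma_transport_*: high level transport; true for IB, IBOE and OPA. */
static inline bool rdma_transport_is_ib(uint32_t attrs)
{
	return (attrs & RDMA_TRANSPORT_IB) != 0;
}

/* rdma_port_*: link layer; true only for a physical IB link. */
static inline bool rdma_port_is_ib(uint32_t attrs)
{
	return (attrs & RDMA_LINK_LAYER_IB) != 0;
}

/* Needs both bits set, so mask first and compare second; '==' binds
 * tighter than '&', hence the parentheses. */
static inline bool rdma_port_is_iboe(uint32_t attrs)
{
	return (attrs & IS_IBOE) == IS_IBOE;
}

/* iWARP as a "port type" keys off the transport bit, not the ETH bit. */
static inline bool rdma_port_is_iwarp(uint32_t attrs)
{
	return (attrs & RDMA_TRANSPORT_IWARP) != 0;
}
```

Note that an iWARP port also has the ETH bit set, which is why
rdma_port_is_iboe() has to compare against the full IS_IBOE pattern rather
than test the ETH bit alone.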

[ snip lots of stuff that is all correct ]

> > Patch 8/17:
> >
> > Here we can create a new test if we are using the bitmap I created
> > above:
> >
> > rdma_port_ipoib(struct ib_device *dev, u8 port)
> > {
> > return !(dev->port[port]->transport & RDMA_LINK_LAYER_ETH);
>
> I would prefer a link layer (or generic) bit mask rather than '&' of
> "transport" and "link layer". That just seems wrong.

I kept the name transport, but it's really a device attribute bitmap.
And of all the link layer types, eth is the only one for which IPoIB
makes no sense (even if it's possible to do). So, as long as the ETH
bit isn't set, we're good to go. But, calling it
dev->port[port]->attributes instead of transport would make it more
clear what it is.
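
As a sketch of that rename, assuming the bitmap is passed in as a plain
"attributes" word (a simplification; the real helper would take the device and
port number) and using the link layer values from the proposal above:

```c
#include <stdbool.h>
#include <stdint.h>

/* Link layer bits from the bitmap proposal in this thread. */
enum {
	RDMA_LINK_LAYER_IB  = 0x00000001,
	RDMA_LINK_LAYER_ETH = 0x00000002,
	RDMA_LINK_LAYER_OPA = 0x00000004,
};

/* IPoIB makes sense on every link layer except Ethernet, so the test
 * is simply "the ETH bit is not set". */
static inline bool rdma_port_ipoib(uint32_t attributes)
{
	return !(attributes & RDMA_LINK_LAYER_ETH);
}
```

Any future non-Ethernet link layer would pass this test by default, so a new
link layer that cannot run IPoIB would need the check revisited.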

> I guess rdma_dev_is_iboe is ok. But it seems like we are keying off the
> addresses not necessarily the devices.

They're one and the same. The addresses go with the device, the device
goes with the addresses. You never have one without the other. The
name of the check is not really important, just as long as it's clearly
documented. I get why you like the address variant, because it pops out
all the things that are special about IBoE addressing and calls out that
the issues need to be handled. However, saying requires_iboe_addr(),
while foreshadowing the work that needs done, doesn't actually document
the work that needs done. Whether we call it dev_is_iboe() or
requires_iboe_addr(), it would be good if the documentation spelled out
those specific requirements for reference sake.


--
Doug Ledford <[email protected]>
GPG KeyID: 0E572FDD




2015-04-10 17:37:11

by Jason Gunthorpe

Subject: Re: [PATCH v2 01/17] IB/Verbs: Implement new callback query_transport() for each HW

On Fri, Apr 10, 2015 at 01:10:43PM -0400, Doug Ledford wrote:

> documented. I get why you like the address variant, because it pops out
> all the things that are special about IBoE addressing and calls out that
> the issues need to be handled. However, saying requires_iboe_addr(),
> while foreshadowing the work that needs done, doesn't actually document
> the work that needs done. Whether we call it dev_is_iboe() or
> requires_iboe_addr(), it would be good if the documentation spelled out
> those specific requirements for reference sake.

My deep hope for this, was that the test 'requires_iboe_addr' or
whatever we call it would have a *really good* kdoc.

List all the ways iboe_addr's work, how they differ from IB addresses,
refer to the specs people should read to understand it, etc.

The patches don't do this, and maybe Michael is the wrong person to
fill that in, but we can get it done..

Jason

BTW: Michael, next time you post the series, please trim the CC
list...

2015-04-10 17:38:54

by Ira Weiny

Subject: Re: [PATCH v2 01/17] IB/Verbs: Implement new callback query_transport() for each HW

On Fri, Apr 10, 2015 at 10:15:51AM -0600, Jason Gunthorpe wrote:
> On Fri, Apr 10, 2015 at 02:16:11AM -0400, ira.weiny wrote:
>
> > > IB ROCEE OPA
> > > SMI Y N Y (though the OPA smi looked a bit different)
> >
> > Yes OPA is different but it is based on the class version of the individual
> > MADs not any particular device/port support.
>
> > > OPA SMP N N Y
> >
> > How is this different from the SMI?
>
> Any code that generates SMPs and SMIs is going to need to know what
> format to generate them in. It seems we have a sort of weird world
> where IB SMPs are supported on OPA but not the IB SMI.

My mistake, it was late. The MAD stack needs to know if it should implement the
IB SMI or the IB & OPA SMI. That also implies the use of QP0 or vice versa.

In my email to Doug I suggested "OPA MAD" which covers the need to implement
the OPA SMI on that device for MADs which have that class version.

>
> Not sure any users exist though..

I think all the "users" are either userspace or the drivers themselves. It is
really just the common SMI processing which needs to be turned on/off. Again I
think this can be covered with the QP0 flag.

RDMA_MGMT_QP0 = 0x00000200,

>
> > > Maybe the above set would make a lot more sense as:
> > > cap_ib_qp0
> > > cap_ib_qp1
> > > cap_opa_qp0
> >
> > I disagree.
> >
> > All we need right now is cap_qp0. All devices currently support QP1.
>
> I didn't list iWarp in the table because everything is no, but it
> doesn't support QP1.

Isn't ocrdma an iWarp device?

int ocrdma_process_mad(struct ib_device *ibdev,
int process_mad_flags,
u8 port_num,
struct ib_wc *in_wc,
struct ib_grh *in_grh,
struct ib_mad *in_mad, struct ib_mad *out_mad)
{
int status;
struct ocrdma_dev *dev;

switch (in_mad->mad_hdr.mgmt_class) {
case IB_MGMT_CLASS_PERF_MGMT:
dev = get_ocrdma_dev(ibdev);
if (!ocrdma_pma_counters(dev, out_mad))
status = IB_MAD_RESULT_SUCCESS | IB_MAD_RESULT_REPLY;
else
status = IB_MAD_RESULT_SUCCESS;
break;
default:
status = IB_MAD_RESULT_SUCCESS;
break;
}
return status;
}


Regardless I was wrong. USNIC devices don't support MADs at all. So we do
need the "supports MADs at all flag".

RDMA_MGMT_IB_MAD = 0x00000100,

>
> > > ib_sa means the IBA SA protocol is supported (Y Y Y)
> >
> > I think this should be (Y N Y)
> >
> > IBoE has no SA. The IBoE code "fabricates" a Path Record it does not need to
> > interact with the SA.
>
> I was wondering why there are so many checks in the SA code. I know
> RoCEE doesn't use it, so why are they there?

Which checks are you referring to? I think there are separate calls to query
the SA when running on IB for both the Route resolution and the Multicast join
operations. The choice of those calls could be made by a "cap_sa()" helper.

>
> > > ib_mcast true if the IBA SA protocol is used for multicast GIDs (Y N
> > > Y)
> >
> > Given the above why can't we just have the "ib_sa" flag?
>
> Maybe I got it wrong, but yes, if it really means 'IBA SA protocol for
> multicast' then it can just be cap_sa.
>
> But there is also the idea that some devices can't do multicast at all
> (iWarp), we must care about that at some point?

I was not sure how to handle this either... I guess you need a
cap_multicast()???

>
> > However, I think checking the link layer is more appropriate here.
> > It does not make sense to do IP over IB over Eth. Even though the
> > IBoE can do the "IB" protocol.
>
> Yes, it is ugly.
>
> I think if we look closely we'll find that IPoIB today has a hard
> requirement on cap_sa being true, so lets use that?

I don't think that is appropriate. You have been advocating that the checks
be clear as to what support we need. While currently the IPoIB layer does (for
IB and OPA) require an SA I think those checks are only appropriate when it is
attempting an SA query.

The choice to run IPoIB at all is a different matter.

>
> In fact any ULP that unconditionally uses the SA can use that.

They _can_ use that but the point of this exercise (and additional checks going
forward) is that we don't "hide" meaning like this.

IPoIB should restrict itself to running on IB link layers. Should additional
link layers be added which IPoIB works on then we add that check.

>
> > > I actually really prefer cap_mandatory_grh - that is what is going on
> > > here. ie based on that name (as a reviewer) I'd expect to see the mad
> > > layer check that the mandatory GRH is always present, or blow up.
> >
> > While GRH mandatory (for the GMP) is what this is. The function
> > ib_init_ah_from_path generically is really handling an "IBoE address" to send
> > to and therefore we need to force the GRH in the AH.
>
> This makes sense to me.
>
> It appears we have at least rocee, rocee v2 (udp?), tcp, ib and opa
> address and AH formats?

Seems that way. But has the rocee v2 been accepted?

> opa would support ib addresses too I guess.

Yes opa address == ib addresses. So there is no need to distinguish them.

>
> A
> bool rdma_port_addr_is_XXX()
>
> along with a
>
> enum AddrType rdma_port_addr_type()
>
> Might be the thing? The latter should only be used with switch()

Sounds good to me. But Doug has a point that the address type and the "port"
type go together. So this could probably be the same call for both of those.
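
A minimal sketch of what that pair of helpers might look like. The enum
values, the sketch_port stand-in and the GRH helper are hypothetical names
for illustration only; a real version would take the device and port number.

```c
#include <stdbool.h>

/* Hypothetical address types; not an existing kernel enum. */
enum rdma_addr_type {
	RDMA_ADDR_TYPE_IB,	/* IB addressing (OPA uses the same today) */
	RDMA_ADDR_TYPE_ROCEE,	/* IB transport over Ethernet */
	RDMA_ADDR_TYPE_TCP,	/* iWARP */
};

/* Stand-in for a port; only carries what the sketch needs. */
struct sketch_port {
	enum rdma_addr_type addr_type;
};

static inline enum rdma_addr_type
rdma_port_addr_type(const struct sketch_port *p)
{
	return p->addr_type;
}

/* The bool form, for simple one-off tests. */
static inline bool rdma_port_addr_is_ib(const struct sketch_port *p)
{
	return rdma_port_addr_type(p) == RDMA_ADDR_TYPE_IB;
}

/*
 * The enum form is meant for switch() only, so that the compiler can
 * warn when a newly added address type is not handled.
 */
static inline bool sketch_ah_must_force_grh(const struct sketch_port *p)
{
	switch (rdma_port_addr_type(p)) {
	case RDMA_ADDR_TYPE_ROCEE:
		return true;	/* GRH is mandatory for this address type */
	case RDMA_ADDR_TYPE_IB:
	case RDMA_ADDR_TYPE_TCP:
		return false;	/* no forced GRH in this sketch */
	}
	return false;
}
```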

Ira


2015-04-10 17:49:38

by Doug Ledford

Subject: Re: [PATCH v2 01/17] IB/Verbs: Implement new callback query_transport() for each HW

On Fri, 2015-04-10 at 13:38 -0400, ira.weiny wrote:

> Isn't ocrdma an iWarp device?

No, it's roce. It and mlx4 roce devices currently interoperate.

> > I think if we look closely we'll find that IPoIB today has a hard
> > requirement on cap_sa being true, so lets use that?
>
> I don't think that is appropriate. You have been advocating that the checks
> be clear as to what support we need. While currently the IPoIB layer does (for
> IB and OPA) require an SA I think those checks are only appropriate when it is
> attempting an SA query.
>
> The choice to run IPoIB at all is a different matter.

Appropriately named or not, Jason's choice of words "has a hard
requirement" is correct ;-) For IPoIB, the broadcast group of the fake
Ethernet fabric is a very specific IB multicast group per the IPoIB
spec.

> >
> > In fact any ULP that unconditionally uses the SA can use that.
>
> They _can_ use that but the point of this exercise (and additional checks going
> forward) is that we don't "hide" meaning like this.
>
> IPoIB should restrict itself to running on IB link layers. Should additional
> link layers be added which IPoIB works on then we add that check.

I think you're right that checking the link layer is the right thing, and
for now, there is no need to check cap_sa because the link layer check
enforces it. In the future, if there is a new link layer we want to use
this on, and it doesn't have an sa, then we have to enable sa checks and
alternate methods at that time.

--
Doug Ledford <[email protected]>
GPG KeyID: 0E572FDD




2015-04-10 17:50:19

by Tom Talpey

Subject: Re: [PATCH v2 01/17] IB/Verbs: Implement new callback query_transport() for each HW

On 4/10/2015 1:10 PM, Doug Ledford wrote:
> As per my above statement, rdma_transport* tests were testing the high
> level transport type, rdma_port* types were testing link layers. iWARP
> has an Eth link layer, so technically port_is_iwarp makes no sense. But
> since all the other types had a check too, I included port_is_iwarp just
> to be complete, and if you are going to ask if a specific port is iwarp
> as a link layer, it makes sense to say yes if the transport is iwarp,
> not if the link layer is eth.

Not wanting to split hairs, but I would not rule out the possibility
of a future device supporting iWARP on one port and another RDMA
protocol on another. One could also imagine softiWARP and softROCE
co-existing atop a single ethernet NIC.

So, I disagree that port_is_iwarp() is a non sequitur.

Tom.


2015-04-10 18:05:10

by Jason Gunthorpe

Subject: Re: [PATCH v2 01/17] IB/Verbs: Implement new callback query_transport() for each HW

On Fri, Apr 10, 2015 at 01:38:38PM -0400, ira.weiny wrote:

> Isn't ocrdma an iWarp device?

No, it is RoCEE

> > I was wondering why there are so many checks in the SA code, I know
> > RoCEE doesn't use it, but why are they there?
>
> Which checks are you referring to? I think there are separate calls to query
> the SA when running on IB for both the Route resolution and the Multicast join
> operations. The choice of those calls could be made by a "cap_sa()" helper.

I will point them out next time I look through the patches

> > I think if we look closely we'll find that IPoIB today has a hard
> > requirement on cap_sa being true, so lets use that?
>
> I don't think that is appropriate. You have been advocating that the checks
> be clear as to what support we need.

Right, but this is narrow, and we are not hiding meaning.

Look at the IPoIB ULP, and look at the hard requirements of the code,
then translate those back to our new cap scheme. We see today's IPoIB
will not run without:
- UD support
- IB addressing
- IB multicast
- IB SA
- CM (optional)

It seems perfectly correct for a ULP to say at the very start, I need
all these caps, or I will not run (how could it run?). This is true of
any ULP that has a hard need to use those APIs.

That would seem to be the very essence of the cap scheme. Declare what
you need, not what standard you think you need.
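
As a rough sketch of "declare what you need", assuming hypothetical
capability bits (the CAP_* names and both helpers are made up for
illustration, not taken from the patches):

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-port capability bits, one per item in the list. */
#define CAP_UD		(1u << 0)
#define CAP_IB_ADDR	(1u << 1)
#define CAP_IB_MCAST	(1u << 2)
#define CAP_IB_SA	(1u << 3)
#define CAP_IB_CM	(1u << 4)	/* optional for IPoIB */

/* Everything today's IPoIB cannot run without. */
#define IPOIB_REQUIRED_CAPS \
	(CAP_UD | CAP_IB_ADDR | CAP_IB_MCAST | CAP_IB_SA)

/* True only if every required capability is present on the port. */
static inline bool ipoib_port_usable(uint32_t port_caps)
{
	return (port_caps & IPOIB_REQUIRED_CAPS) == IPOIB_REQUIRED_CAPS;
}

/* Connected mode is an extra, so it is checked separately. */
static inline bool ipoib_cm_usable(uint32_t port_caps)
{
	return ipoib_port_usable(port_caps) && (port_caps & CAP_IB_CM) != 0;
}
```

The ULP would call something like ipoib_port_usable() once when a port
is added, and skip the port entirely when it returns false.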

Hiding meaning is to say 'only run on IB or OPA': WHY are we limited
to those two?

> While currently the IPoIB layer does (for IB and OPA) require an SA
> I think those checks are only appropriate when it is attempting an
> SA query.

That doesn't make any sense unless someone also adds support for
handling the !SA case.

> > It appears we have at least rocee, rocee v2 (udp?), tcp, ib and opa
> > address and AH formats?
>
> Seems that way. But has the rocee v2 been accepted?

Don't know much about it yet, patches exist, it seems to have a
slightly different addressing format.

> > opa would support ib addresses too I guess.
>
> Yes opa address == ib addresses. So there is no need to distinguish them.

The patches you sent showed a different LRH format for OPA (eg 32 bit
LID), so someday we will need to know that the full 32 bit LID is
available.

We can see how this might work in future, let's say OPAv2 *requires* the
32 bit LID, for that case cap_ib_address = 0 cap_opa_address = 1. If
we don't update IPoIB and it uses the tests from above then it
immediately, and correctly, stops running on those OPAv2 devices.

Once patched to support cap_opa_address then it will begin working
again. That seems very sane.
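
A small sketch of that hypothetical, with a made-up caps struct and
function names (nothing here exists in the tree):

```c
#include <stdbool.h>

/* Hypothetical per-port address capabilities, named after the text. */
struct sketch_port_caps {
	bool ib_address;	/* cap_ib_address */
	bool opa_address;	/* cap_opa_address */
};

/* Unpatched IPoIB: only knows how to use IB addressing. */
static inline bool ipoib_can_run(const struct sketch_port_caps *c)
{
	return c->ib_address;
}

/* IPoIB after being taught the OPA address format as well. */
static inline bool ipoib_can_run_patched(const struct sketch_port_caps *c)
{
	return c->ib_address || c->opa_address;
}
```

On the hypothetical OPAv2 port (ib_address = 0, opa_address = 1) the
first test fails and the second passes; on a plain IB port both pass.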

Jason

2015-04-10 18:12:00

by Ira Weiny

Subject: Re: [PATCH v2 01/17] IB/Verbs: Implement new callback query_transport() for each HW

On Fri, Apr 10, 2015 at 01:49:32PM -0400, Doug Ledford wrote:
> On Fri, 2015-04-10 at 13:38 -0400, ira.weiny wrote:
>
>
> > > I think if we look closely we'll find that IPoIB today has a hard
> > > requirement on cap_sa being true, so lets use that?
> >
> > I don't think that is appropriate. You have been advocating that the checks
> > be clear as to what support we need. While currently the IPoIB layer does (for
> > IB and OPA) require an SA I think those checks are only appropriate when it is
> > attempting an SA query.
> >
> > The choice to run IPoIB at all is a different matter.
>
> Appropriately named or not, Jason's choice of words "has a hard
> requirement" is correct ;-)

Agreed. I meant that using "cap_sa" is not appropriate. Not that IPoIB did
not have a hard requirement... :-D

I actually think that _both_ the check for IB link layer and the "cap_sa" is
required. Perhaps not at start up...

Ira


2015-04-10 18:18:09

by Doug Ledford

Subject: Re: [PATCH v2 01/17] IB/Verbs: Implement new callback query_transport() for each HW

On Fri, 2015-04-10 at 13:50 -0400, Tom Talpey wrote:
> On 4/10/2015 1:10 PM, Doug Ledford wrote:
> > As per my above statement, rdma_transport* tests were testing the high
> > level transport type, rdma_port* types were testing link layers. iWARP
> > has an Eth link layer, so technically port_is_iwarp makes no sense. But
> > since all the other types had a check too, I included port_is_iwarp just
> > to be complete, and if you are going to ask if a specific port is iwarp
> > as a link layer, it makes sense to say yes if the transport is iwarp,
> > not if the link layer is eth.
>
> Not wanting to split hairs, but I would not rule out the possibility
> of a future device supporting iWARP on one port and another RDMA
> protocol on another. One could also imagine softiWARP and softROCE
> co-existing atop a single ethernet NIC.
>
> So, I disagree that port_is_iwarp() is a non sequitur.

Agreed, but that wasn't what I was calling non-sense. I was referring
to the fact that in my quick little write up, the rdma_port* functions
were all intended to test link layers, not high level transports. There
is no such thing as an iWARP link layer. It was still a port specific
test, and would work in all the situations you described, it's just that
asking if a port's link layer is iWARP makes no sense, so I returned
true if the transport was iWARP regardless of what the link layer
actually was.
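
The distinction can be sketched roughly as below. This is only an
illustration of the idea, not the code from the earlier write-up; the
sketch_* names and types are hypothetical.

```c
#include <stdbool.h>

enum sketch_transport { SKETCH_TRANSPORT_IB, SKETCH_TRANSPORT_IWARP };
enum sketch_link_layer { SKETCH_LINK_IB, SKETCH_LINK_ETH };

/* Both properties are kept per port, so mixed devices still work. */
struct sketch_port {
	enum sketch_transport transport;
	enum sketch_link_layer link_layer;
};

/* Link-layer tests look at the link layer, as the name suggests. */
static inline bool sketch_port_is_ib(const struct sketch_port *p)
{
	return p->link_layer == SKETCH_LINK_IB;
}

static inline bool sketch_port_is_eth(const struct sketch_port *p)
{
	return p->link_layer == SKETCH_LINK_ETH;
}

/*
 * There is no iWARP link layer, so this one answers from the
 * transport, whatever the link layer actually is.
 */
static inline bool sketch_port_is_iwarp(const struct sketch_port *p)
{
	return p->transport == SKETCH_TRANSPORT_IWARP;
}
```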


--
Doug Ledford <[email protected]>
GPG KeyID: 0E572FDD




2015-04-10 18:25:21

by Doug Ledford

Subject: Re: [PATCH v2 01/17] IB/Verbs: Implement new callback query_transport() for each HW

On Fri, 2015-04-10 at 12:04 -0600, Jason Gunthorpe wrote:
> On Fri, Apr 10, 2015 at 01:38:38PM -0400, ira.weiny wrote:
>
> Hiding meaning is to say 'only run on IB or OPA': WHY are we limited
> to those two?

For something else, I might agree with this. But, for the specific case
of IPoIB, it's pretty fair. IPoIB is more than just an ULP. It's a
spec. And it's very IB specific. It will only work with OPA because
OPA is imitating IB. To run it on another fabric, you would need more
than just to make it work. If the new fabric doesn't have a broadcast
group, or has multicast registration like IB does, you need the
equivalent of IBTA, whatever that may be for this new fabric, buy in on
the pre-defined multicast groups and you might need firmware support in
the switches.

> We can see how this might work in future, let's say OPAv2 *requires* the
> 32 bit LID, for that case cap_ib_address = 0 cap_opa_address = 1. If
> we don't update IPoIB and it uses the tests from above then it
> immediately, and correctly, stops running on those OPAv2 devices.
>
> Once patched to support cap_opa_address then it will begin working
> again. That seems very sane.

It is very sane from an implementation standpoint, but from the larger
interoperability standpoint, you need that spec to be extended to the
new fabric simultaneously.


--
Doug Ledford <[email protected]>
GPG KeyID: 0E572FDD




2015-04-10 19:17:43

by Jason Gunthorpe

Subject: Re: [PATCH v2 01/17] IB/Verbs: Implement new callback query_transport() for each HW

On Fri, Apr 10, 2015 at 02:24:26PM -0400, Doug Ledford wrote:

> IPoIB is more than just an ULP. It's a spec. And it's very IB
> specific. It will only work with OPA because OPA is imitating IB.
> To run it on another fabric, you would need more than just to make
> it work. If the new fabric doesn't have a broadcast group, or has
> multicast registration like IB does, you need the equivalent of
> IBTA, whatever that may be for this new fabric, buy in on the
> pre-defined multicast groups and you might need firmware support in
> the switches.

It feels like the 'cap_ib_addressing' or whatever we call it captures
this very well. The IPoIB RFC is very much concerned with GID's and
MGID's and broadly requires the IBA addressing
scheme. cap_ib_addressing asserts the port uses that scheme.

We wouldn't accept patches to IPoIB to add a new addressing scheme
without seeing proper diligence to the standards work.

Looking away from the standards, using cap_XX seems very sane: We are
building a well defined system of invariants. You can't call into the
sa functions if cap_sa is not set, you can't call into the mcast
functions if cap_mcast is not set, you can't form an AH from IB
GIDs/MGID/LID without cap_ib_addressing.

It makes so much sense for the ULP to directly require the needed caps
for the kernel APIs it intends to call, or not use the RDMA port at
all.
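
A minimal sketch of those invariants, assuming a hypothetical per-port
record and made-up entry points (the real sa/mcast/AH functions have
different signatures):

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical per-port capability record; field names follow the text. */
struct sketch_port {
	bool cap_sa;
	bool cap_mcast;
	bool cap_ib_addressing;
};

/* Each entry point refuses to run on a port lacking the matching cap. */
static inline int sketch_sa_query(const struct sketch_port *p)
{
	if (!p->cap_sa)
		return -EOPNOTSUPP;
	return 0;	/* a real implementation would issue the query here */
}

static inline int sketch_mcast_join(const struct sketch_port *p)
{
	if (!p->cap_mcast)
		return -EOPNOTSUPP;
	return 0;	/* a real implementation would join the group here */
}

static inline int sketch_init_ib_ah(const struct sketch_port *p)
{
	if (!p->cap_ib_addressing)
		return -EOPNOTSUPP;
	return 0;	/* a real implementation would fill in the AH here */
}

/* A ULP checks everything it intends to call once, up front. */
static inline bool sketch_ulp_can_use_port(const struct sketch_port *p)
{
	return p->cap_sa && p->cap_mcast && p->cap_ib_addressing;
}
```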

> > We can see how this might work in future, let's say OPAv2 *requires* the
> > 32 bit LID, for that case cap_ib_address = 0 cap_opa_address = 1. If
> > we don't update IPoIB and it uses the tests from above then it
> > immediately, and correctly, stops running on those OPAv2 devices.
> >
> > Once patched to support cap_opa_address then it will begin working
> > again. That seems very sane.
>
> It is very sane from an implementation standpoint, but from the larger
> interoperability standpoint, you need that spec to be extended to the
> new fabric simultaneously.

I liked the OPAv2 hypothetical because it doesn't actually touch the
IPoIB spec. IPoIB spec has little to say about LIDs or LRHs; it works
entirely at the GID/MGID/GRH level.

Jason

2015-04-10 20:38:16

by Ira Weiny

Subject: Re: [PATCH v2 01/17] IB/Verbs: Implement new callback query_transport() for each HW

On Fri, Apr 10, 2015 at 12:04:55PM -0600, Jason Gunthorpe wrote:
> On Fri, Apr 10, 2015 at 01:38:38PM -0400, ira.weiny wrote:
>
> >
> > I don't think that is appropriate. You have been advocating that the checks
> > be clear as to what support we need.
>
> Right, but this is narrow, and we are not hiding meaning.
>
> Look at the IPoIB ULP, and look at the hard requirements of the code,
> then translate those back to our new cap scheme. We see today's IPoIB
> will not run without:
> - UD support
> - IB addressing
> - IB multicast
> - IB SA
> - CM (optional)
>
> It seems perfectly correct for a ULP to say at the very start, I need
> all these caps, or I will not run (how could it run?). This is true of
> any ULP that has a hard need to use those APIs.

Having IPoIB check for all of these and fail to start if not supported is good.
But the suggestion before was to have "cap_ipoib". I don't think we want that.

>
> That would seem to be the very essence of the cap scheme. Declare what
> you need, not what standard you think you need.

Right.

>
> Hiding meaning is to say 'only run on IB or OPA': WHY are we limited
> to those two?

Because only those 2 support the list of capabilities above.

>
> > While currently the IPoIB layer does (for IB and OPA) require an SA
> > I think those checks are only appropriate when it is attempting an
> > SA query.
>
> That doesn't make any sense unless someone also adds support for
> handling the !SA case.

Fair enough but IPoIB does need to check for SA support now as well as IB
addressing which is currently IB Link Layer (although as Doug said the Port and
Address format go hand in hand. So I'm happy calling it whatever.)

>
> > > It appears we have at least rocee, rocee v2 (udp?), tcp, ib and opa
> > > address and AH formats?
> >
> > Seems that way. But has the rocee v2 been accepted?
>
> Don't know much about it yet, patches exist, it seems to have a
> slightly different addressing format.
>
> > > opa would support ib addresses too I guess.
> >
> > Yes opa address == ib addresses. So there is no need to distinguish them.
>
> The patches you sent showed a different LRH format for OPA (eg 32 bit
> LID), so someday we will need to know that the full 32 bit LID is
> available.
>
> We can see how this might work in future, let's say OPAv2 *requires* the
> 32 bit LID, for that case cap_ib_address = 0 cap_opa_address = 1. If
> we don't update IPoIB and it uses the tests from above then it
> immediately, and correctly, stops running on those OPAv2 devices.

For your hypothetical case, agreed. But I want to make it clear for those who
may be casually reading this thread that OPA addresses are IB addresses right
now.

>
> Once patched to support cap_opa_address then it will begin working
> again. That seems very sane.

Agreed.

Ira


2015-04-10 21:06:56

by Ira Weiny

Subject: Re: [PATCH v2 01/17] IB/Verbs: Implement new callback query_transport() for each HW

On Fri, Apr 10, 2015 at 01:17:23PM -0600, Jason Gunthorpe wrote:
> On Fri, Apr 10, 2015 at 02:24:26PM -0400, Doug Ledford wrote:
>
> > IPoIB is more than just an ULP. It's a spec. And it's very IB
> > specific. It will only work with OPA because OPA is imitating IB.
> > To run it on another fabric, you would need more than just to make
> > it work. If the new fabric doesn't have a broadcast group, or has
> > multicast registration like IB does, you need the equivalent of
> > IBTA, whatever that may be for this new fabric, buy in on the
> > pre-defined multicast groups and you might need firmware support in
> > the switches.
>
> It feels like the 'cap_ib_addressing' or whatever we call it captures
> this very well. The IPoIB RFC is very much concerned with GID's and
> MGID's and broadly requires the IBA addressing
> scheme. cap_ib_addressing asserts the port uses that scheme.
>
> We wouldn't accept patches to IPoIB to add a new addressing scheme
> without seeing proper diligence to the standards work.
>
> Looking away from the standards, using cap_XX seems very sane: We are
> building a well defined system of invariants. You can't call into the
> sa functions if cap_sa is not set, you can't call into the mcast
> functions if cap_mcast is not set, you can't form an AH from IB
> GIDs/MGID/LID without cap_ib_addressing.

Yep.

>
> It makes so much sense for the ULP to directly require the needed caps
> for the kernel APIs it intends to call, or not use the RDMA port at
> all.

Yes.

So trying to sum up.

Have we settled on the following "capabilities"? Helper function names aside.

/* legacy to communicate to userspace */
RDMA_LINK_LAYER_IB = 0x0000000000000001,
RDMA_LINK_LAYER_ETH = 0x0000000000000002,
RDMA_LINK_LAYER_MASK = 0x000000000000000f, /* more bits? */
/* I'm hoping we don't need more bits here */


/* legacy to communicate to userspace */
RDMA_TRANSPORT_IB = 0x0000000000000010,
RDMA_TRANSPORT_IWARP = 0x0000000000000020,
RDMA_TRANSPORT_USNIC = 0x0000000000000040,
RDMA_TRANSPORT_USNIC_UDP = 0x0000000000000080,
RDMA_TRANSPORT_MASK = 0x00000000000000f0, /* more bits? */
/* I'm hoping we don't need more bits here */


/* New flags */

RDMA_MGMT_IB_MAD = 0x0000000000000100, /* ib_mad module support */
RDMA_MGMT_QP0 = 0x0000000000000200, /* ib_mad QP0 support */
RDMA_MGMT_IB_SA = 0x0000000000000400, /* ib_sa module support */
/* NOTE includes IB Mcast */
RDMA_MGMT_IB_CM = 0x0000000000000800, /* ib_cm module support */
RDMA_MGMT_OPA_MAD = 0x0000000000001000, /* ib_mad OPA MAD support */
RDMA_MGMT_MASK = 0x00000000000fff00,

RDMA_ADDR_IB = 0x0000000000100000, /* Port does IB AH, PR, Pkey */
RDMA_ADDR_IBoE = 0x0000000000200000, /* Port does IBoE AH, PR, Pkey */
/* Do we need iWarp (TCP) here? */
RDMA_ADDR_IB_MASK = 0x000000000ff00000,


RDMA_SEPARATE_READ_SGE = 0x0000000010000000,
RDMA_QUIRKS_MASK = 0x000000fff0000000
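
As a rough sketch of how helpers could sit on top of such a flag word:
the values are copied from the list above, while the sketch_* helper
names and the two example per-port flag combinations are assumptions
for illustration only.

```c
#include <stdbool.h>
#include <stdint.h>

/* Subset of the proposed values, copied from the list above. */
#define RDMA_LINK_LAYER_IB	0x0000000000000001ULL
#define RDMA_LINK_LAYER_ETH	0x0000000000000002ULL
#define RDMA_LINK_LAYER_MASK	0x000000000000000fULL
#define RDMA_TRANSPORT_IB	0x0000000000000010ULL
#define RDMA_MGMT_IB_MAD	0x0000000000000100ULL
#define RDMA_MGMT_IB_SA		0x0000000000000400ULL
#define RDMA_MGMT_IB_CM		0x0000000000000800ULL
#define RDMA_ADDR_IB		0x0000000000100000ULL
#define RDMA_ADDR_IBoE		0x0000000000200000ULL

/* Hypothetical helpers: each answers one question about a port. */
static inline bool sketch_cap_ib_sa(uint64_t flags)
{
	return (flags & RDMA_MGMT_IB_SA) != 0;
}

static inline bool sketch_cap_ib_cm(uint64_t flags)
{
	return (flags & RDMA_MGMT_IB_CM) != 0;
}

static inline bool sketch_addr_is_iboe(uint64_t flags)
{
	return (flags & RDMA_ADDR_IBoE) != 0;
}

/* Legacy link layer value, extracted with its mask. */
static inline uint64_t sketch_link_layer(uint64_t flags)
{
	return flags & RDMA_LINK_LAYER_MASK;
}

/* Assumed flag words; real drivers would report their own. */
static const uint64_t SKETCH_IB_PORT =
	RDMA_LINK_LAYER_IB | RDMA_TRANSPORT_IB | RDMA_MGMT_IB_MAD |
	RDMA_MGMT_IB_SA | RDMA_MGMT_IB_CM | RDMA_ADDR_IB;
static const uint64_t SKETCH_IBOE_PORT =
	RDMA_LINK_LAYER_ETH | RDMA_TRANSPORT_IB | RDMA_MGMT_IB_MAD |
	RDMA_MGMT_IB_CM | RDMA_ADDR_IBoE;
```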


>
> > > We can see how this might work in future, let's say OPAv2 *requires* the
> > > 32 bit LID, for that case cap_ib_address = 0 cap_opa_address = 1. If
> > > we don't update IPoIB and it uses the tests from above then it
> > > immediately, and correctly, stops running on those OPAv2 devices.
> > >
> > > Once patched to support cap_opa_address then it will begin working
> > > again. That seems very sane.
> >
> > It is very sane from an implementation standpoint, but from the larger
> > interoperability standpoint, you need that spec to be extended to the
> > new fabric simultaneously.
>
> I liked the OPAv2 hypothetical because it doesn't actually touch the
> IPoIB spec. IPoIB spec has little to say about LIDs or LRHs; it works
> entirely at the GID/MGID/GRH level.

Agreed.

Ira


2015-04-11 00:01:20

by Tom Talpey

Subject: Re: [PATCH v2 01/17] IB/Verbs: Implement new callback query_transport() for each HW

On 4/10/2015 5:06 PM, ira.weiny wrote:
> On Fri, Apr 10, 2015 at 01:17:23PM -0600, Jason Gunthorpe wrote:
>...
> So trying to sum up.
>
> Have we settled on the following "capabilities"? Helper function names aside.
>
> /* legacy to communicate to userspace */
> RDMA_LINK_LAYER_IB = 0x0000000000000001,
> RDMA_LINK_LAYER_ETH = 0x0000000000000002,
> RDMA_LINK_LAYER_MASK = 0x000000000000000f, /* more bits? */
> /* I'm hoping we don't need more bits here */
>
>
> /* legacy to communicate to userspace */
> RDMA_TRANSPORT_IB = 0x0000000000000010,
> RDMA_TRANSPORT_IWARP = 0x0000000000000020,
> RDMA_TRANSPORT_USNIC = 0x0000000000000040,
> RDMA_TRANSPORT_USNIC_UDP = 0x0000000000000080,
> RDMA_TRANSPORT_MASK = 0x00000000000000f0, /* more bits? */
> /* I'm hoping we don't need more bits here */
>
>
> /* New flags */
>
> RDMA_MGMT_IB_MAD = 0x0000000000000100, /* ib_mad module support */
> RDMA_MGMT_QP0 = 0x0000000000000200, /* ib_mad QP0 support */
> RDMA_MGMT_IB_SA = 0x0000000000000400, /* ib_sa module support */
> /* NOTE includes IB Mcast */
> RDMA_MGMT_IB_CM = 0x0000000000000800, /* ib_cm module support */
> RDMA_MGMT_OPA_MAD = 0x0000000000001000, /* ib_mad OPA MAD support */
> RDMA_MGMT_MASK = 0x00000000000fff00,

You explicitly say "userspace" - why would an upper layer need to
know the link, transport and management details? These seem to be
mid-layer matters.

> RDMA_ADDR_IB = 0x0000000000100000, /* Port does IB AH, PR, Pkey */
> RDMA_ADDR_IBoE = 0x0000000000200000, /* Port does IBoE AH, PR, Pkey */
> /* Do we need iWarp (TCP) here? */
> RDMA_ADDR_IB_MASK = 0x000000000ff00000,

I do see a ULP needing to know the address family needed to pass to
rdma_connect and rdma_listen, so I would add "IP", but not "iWARP".

> RDMA_SEPARATE_READ_SGE = 0x0000000010000000,
> RDMA_QUIRKS_MASK = 0x000000fff0000000

This is good, but it also needs an attribute to signal the need for a
remote-writable RDMA Read sink buffer, for today's iWARP.

Tom.

2015-04-13 07:40:20

by Michael Wang

Subject: Re: [PATCH v2 01/17] IB/Verbs: Implement new callback query_transport() for each HW

On 04/10/2015 07:36 PM, Jason Gunthorpe wrote:
> On Fri, Apr 10, 2015 at 01:10:43PM -0400, Doug Ledford wrote:
>
>> documented. I get why you link the address variant, because it pops out
>> all the things that are special about IBoE addressing and calls out that
>> the issues need to be handled. However, saying requires_iboe_addr(),
>> while foreshadowing the work that needs done, doesn't actually document
>> the work that needs done. Whether we call it dev_is_iboe() or
>> requires_iboe_addr(), it would be good if the documentation spelled out
>> those specific requirements for reference sake.
>
> My deep hope for this, was that the test 'requires_iboe_addr' or
> whatever we call it would have a *really good* kdoc.
>
> List all the ways iboe_addr's work, how they differ from IB addresses,
> refer to the specs people should read to understand it, etc.
>
> The patches don't do this, and maybe Michael is the wrong person to
> fill that in, but we can get it done..

That's exactly what I'm thinking ;-)

At first I was just trying to save us some code, but it has now become
a topic far beyond that purpose. I'd like to help commit whatever we have
already settled, and pass the internal reforming work, such as implementing
the bitmask stuff, to experts like you guys ;-)

And I can still help with review, and maybe with testing on mlx4 if I get
access later.

>
> Jason
>
> BTW: Michael, next time you post the series, please trim the CC
> list...

Thanks for the reminder, I'll trim it in v3 :-)

Regards,
Michael Wang

>