2015-04-24 12:34:06

by Michael Wang

Subject: [PATCH v6 00/26] IB/Verbs: IB Management Helpers

Since v5:
* Thanks to Ira, Devesh for the review and testing :-)
* Thanks for the comments from Steve, Tom, Jason, Hal, Devesh, Ira,
Liran, Dave :-) Please remind me if I missed anyone :-P
* Trivial fix for 4#
* Drop the reform on acquiring link-layer in 9#
* Drop cap_ipoib()

There is plenty of lengthy code that checks the transport type of an IB
device, or the link-layer type of its port, when what we actually want to
know is whether a particular management capability/feature is supported by
the device/port.

Thus, instead of inferring, we should have our own mechanism for checking IB
management capabilities/protocols/features. Several proposals are listed below.

This patch set reforms the way the transport type is obtained: we now use
query_transport() instead of inferring it from the transport and the link
layer separately. We also define a new transport type to make the concept
more reasonable.

Mapping List:
driver      node-type   link-layer  old-transport  new-transport
nes         RNIC        ETH         IWARP          IWARP
amso1100    RNIC        ETH         IWARP          IWARP
cxgb3       RNIC        ETH         IWARP          IWARP
cxgb4       RNIC        ETH         IWARP          IWARP
usnic       USNIC_UDP   ETH         USNIC_UDP      USNIC_UDP
ocrdma      IB_CA       ETH         IB             IBOE
mlx4        IB_CA       IB/ETH      IB             IB/IBOE
mlx5        IB_CA       IB          IB             IB
ehca        IB_CA       IB          IB             IB
ipath       IB_CA       IB          IB             IB
mthca       IB_CA       IB          IB             IB
qib         IB_CA       IB          IB             IB

For example:
    if (transport == IB && link-layer == ETH)
will now become:
    if (query_transport() == IBOE)

Thus we will be able to get rid of the separate transport and link-layer
checks, which will make it easier to add new protocols/technologies (like
OPA). With the management helpers introduced here, the IB management logic
will also be clearer and easier to extend.
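
To make the before/after concrete, here is a minimal userspace sketch of the
two styles of check. Everything prefixed with demo_ is a hypothetical stand-in
for the kernel types (it is not the code from this series); the device models
a dual-personality HCA such as mlx4 in the mapping list, with port 1 on IB and
port 2 on Ethernet.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel enums; not the real definitions. */
enum demo_link_layer { DEMO_LL_IB, DEMO_LL_ETH };

enum demo_transport {
	DEMO_TRANSPORT_IB,
	DEMO_TRANSPORT_IWARP,
	DEMO_TRANSPORT_USNIC_UDP,
	DEMO_TRANSPORT_IBOE,
};

struct demo_device {
	enum demo_transport node_transport;      /* old: one value per device */
	enum demo_link_layer port_link_layer[3]; /* ports are 1-based; [0] unused */
	enum demo_transport (*query_transport)(const struct demo_device *dev,
					       unsigned int port_num);
};

/* Per-port callback: the driver reports the technology of each port. */
static enum demo_transport
demo_query_transport(const struct demo_device *dev, unsigned int port_num)
{
	return dev->port_link_layer[port_num] == DEMO_LL_IB ?
		DEMO_TRANSPORT_IB : DEMO_TRANSPORT_IBOE;
}

/* Old style: infer IBoE by combining two separate facts. */
static bool demo_is_iboe_old(const struct demo_device *dev,
			     unsigned int port_num)
{
	return dev->node_transport == DEMO_TRANSPORT_IB &&
	       dev->port_link_layer[port_num] == DEMO_LL_ETH;
}

/* New style: ask the driver directly. */
static bool demo_is_iboe_new(const struct demo_device *dev,
			     unsigned int port_num)
{
	return dev->query_transport(dev, port_num) == DEMO_TRANSPORT_IBOE;
}

/* Builds a device whose port 1 is IB and port 2 is Ethernet. */
static struct demo_device demo_make_device(void)
{
	struct demo_device dev = {
		.node_transport = DEMO_TRANSPORT_IB,
		.port_link_layer = { DEMO_LL_IB, DEMO_LL_IB, DEMO_LL_ETH },
		.query_transport = demo_query_transport,
	};

	return dev;
}
```

Both checks give the same answer on every port of this device; the new form
just moves the knowledge into the driver, so callers no longer need to combine
node type and link layer themselves.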

Highlights:
The patch set covers a wide range of IB code, so suggestions from those who
are familiar with a particular part would be invaluable ;-)

Patches 1#~15# contain all the logic reform, 16#~24# introduce the
management helpers, and 25#~26# do the cleanup.

We would appreciate it if anyone who has the HW could provide a Tested-by :-)

Doug suggested the bitmask mechanism:
https://www.mail-archive.com/[email protected]/msg23765.html
which could be the plan for future reforms; we prefer that to be a separate
series that focuses on semantics and performance.
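
As a rough illustration of what a bitmask mechanism could look like (this is
only a sketch under assumptions, not Doug's actual proposal and not part of
this series): each port carries a set of capability bits, and every helper
becomes a single bit test. All DEMO_ names below are hypothetical. The choice
of which technology gets which bit follows how patches 3 to 5 of this series
pick between rdma_tech_ib() and rdma_ib_or_iboe().

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-port capability bits. */
#define DEMO_CAP_IB_MAD	(1u << 0)
#define DEMO_CAP_IB_SMI	(1u << 1)
#define DEMO_CAP_IB_CM	(1u << 2)
#define DEMO_CAP_IB_SA	(1u << 3)
#define DEMO_CAP_IW_CM	(1u << 4)

/* Hypothetical per-technology masks, assembled once per port. */
#define DEMO_PORT_IB	(DEMO_CAP_IB_MAD | DEMO_CAP_IB_SMI | \
			 DEMO_CAP_IB_CM | DEMO_CAP_IB_SA)
#define DEMO_PORT_IBOE	(DEMO_CAP_IB_MAD | DEMO_CAP_IB_CM)
#define DEMO_PORT_IWARP	(DEMO_CAP_IW_CM)

/* A capability check is a bit test; no transport comparison is needed. */
static inline bool demo_port_has_cap(uint32_t port_caps, uint32_t cap)
{
	return (port_caps & cap) != 0;
}
```

With such a layout, adding a new technology would mean defining one more mask
rather than touching every helper.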

This patch set is somewhat 'bloated' now, and it may be a good time for
staging. I'd like to suggest we focus on improving the existing helpers and
push all further reforms into the next series ;-)

We now have a repository based on the latest infiniband/for-next with this
series applied:
[email protected]:ywang-pb/infiniband-wy.git

Proposals:
Sean:
https://www.mail-archive.com/[email protected]/msg23339.html
Doug:
https://www.mail-archive.com/[email protected]/msg23418.html
https://www.mail-archive.com/[email protected]/msg23765.html
Jason:
https://www.mail-archive.com/[email protected]/msg23425.html

Michael Wang (26):
IB/Verbs: Implement new callback query_transport()
IB/Verbs: Implement raw management helpers
IB/Verbs: Reform IB-core mad/agent/user_mad
IB/Verbs: Reform IB-core cm
IB/Verbs: Reform IB-core sa_query
IB/Verbs: Reform IB-core multicast
IB/Verbs: Reform IB-ulp ipoib
IB/Verbs: Reform IB-ulp xprtrdma
IB/Verbs: Reform IB-core verbs
IB/Verbs: Reform cm related part in IB-core cma/ucm
IB/Verbs: Reform route related part in IB-core cma
IB/Verbs: Reform mcast related part in IB-core cma
IB/Verbs: Reserve legacy transport type in 'dev_addr'
IB/Verbs: Reform cma_acquire_dev()
IB/Verbs: Reform rest part in IB-core cma
IB/Verbs: Use management helper cap_ib_mad()
IB/Verbs: Use management helper cap_ib_smi()
IB/Verbs: Use management helper cap_ib_cm()
IB/Verbs: Use management helper cap_iw_cm()
IB/Verbs: Use management helper cap_ib_sa()
IB/Verbs: Use management helper cap_ib_mcast()
IB/Verbs: Use management helper cap_read_multi_sge()
IB/Verbs: Use management helper cap_af_ib()
IB/Verbs: Use management helper cap_eth_ah()
IB/Verbs: Clean up rdma_ib_or_iboe()
IB/Verbs: Cleanup rdma_node_get_transport()

drivers/infiniband/core/agent.c | 2 +-
drivers/infiniband/core/cm.c | 20 +-
drivers/infiniband/core/cma.c | 282 ++++++++++++---------------
drivers/infiniband/core/device.c | 1 +
drivers/infiniband/core/mad.c | 43 ++--
drivers/infiniband/core/multicast.c | 12 +-
drivers/infiniband/core/sa_query.c | 30 +--
drivers/infiniband/core/ucm.c | 3 +-
drivers/infiniband/core/ucma.c | 25 +--
drivers/infiniband/core/user_mad.c | 26 ++-
drivers/infiniband/core/verbs.c | 31 +--
drivers/infiniband/hw/amso1100/c2_provider.c | 7 +
drivers/infiniband/hw/cxgb3/iwch_provider.c | 7 +
drivers/infiniband/hw/cxgb4/provider.c | 7 +
drivers/infiniband/hw/ehca/ehca_hca.c | 6 +
drivers/infiniband/hw/ehca/ehca_iverbs.h | 3 +
drivers/infiniband/hw/ehca/ehca_main.c | 1 +
drivers/infiniband/hw/ipath/ipath_verbs.c | 7 +
drivers/infiniband/hw/mlx4/main.c | 10 +
drivers/infiniband/hw/mlx5/main.c | 7 +
drivers/infiniband/hw/mthca/mthca_provider.c | 7 +
drivers/infiniband/hw/nes/nes_verbs.c | 6 +
drivers/infiniband/hw/ocrdma/ocrdma_main.c | 1 +
drivers/infiniband/hw/ocrdma/ocrdma_verbs.c | 6 +
drivers/infiniband/hw/ocrdma/ocrdma_verbs.h | 3 +
drivers/infiniband/hw/qib/qib_verbs.c | 7 +
drivers/infiniband/hw/usnic/usnic_ib_main.c | 1 +
drivers/infiniband/hw/usnic/usnic_ib_verbs.c | 6 +
drivers/infiniband/hw/usnic/usnic_ib_verbs.h | 2 +
drivers/infiniband/ulp/ipoib/ipoib_main.c | 15 +-
include/rdma/ib_verbs.h | 169 +++++++++++++++-
net/sunrpc/xprtrdma/svc_rdma_recvfrom.c | 4 +-
net/sunrpc/xprtrdma/svc_rdma_transport.c | 47 ++---
33 files changed, 503 insertions(+), 301 deletions(-)

--
2.1.0


2015-04-24 12:24:00

by Michael Wang

Subject: [PATCH v6 01/26] IB/Verbs: Implement new callback query_transport()

Add a new callback, query_transport(), and implement it for each HW driver.

Mapping List:
driver      node-type   link-layer  old-transport  new-transport
nes         RNIC        ETH         IWARP          IWARP
amso1100    RNIC        ETH         IWARP          IWARP
cxgb3       RNIC        ETH         IWARP          IWARP
cxgb4       RNIC        ETH         IWARP          IWARP
usnic       USNIC_UDP   ETH         USNIC_UDP      USNIC_UDP
ocrdma      IB_CA       ETH         IB             IBOE
mlx4        IB_CA       IB/ETH      IB             IB/IBOE
mlx5        IB_CA       IB          IB             IB
ehca        IB_CA       IB          IB             IB
ipath       IB_CA       IB          IB             IB
mthca       IB_CA       IB          IB             IB
qib         IB_CA       IB          IB             IB

Cc: Hal Rosenstock <[email protected]>
Cc: Steve Wise <[email protected]>
Cc: Tom Talpey <[email protected]>
Cc: Jason Gunthorpe <[email protected]>
Cc: Doug Ledford <[email protected]>
Cc: Ira Weiny <[email protected]>
Cc: Sean Hefty <[email protected]>
Signed-off-by: Michael Wang <[email protected]>
---
drivers/infiniband/core/device.c | 1 +
drivers/infiniband/core/verbs.c | 4 +++-
drivers/infiniband/hw/amso1100/c2_provider.c | 7 +++++++
drivers/infiniband/hw/cxgb3/iwch_provider.c | 7 +++++++
drivers/infiniband/hw/cxgb4/provider.c | 7 +++++++
drivers/infiniband/hw/ehca/ehca_hca.c | 6 ++++++
drivers/infiniband/hw/ehca/ehca_iverbs.h | 3 +++
drivers/infiniband/hw/ehca/ehca_main.c | 1 +
drivers/infiniband/hw/ipath/ipath_verbs.c | 7 +++++++
drivers/infiniband/hw/mlx4/main.c | 10 ++++++++++
drivers/infiniband/hw/mlx5/main.c | 7 +++++++
drivers/infiniband/hw/mthca/mthca_provider.c | 7 +++++++
drivers/infiniband/hw/nes/nes_verbs.c | 6 ++++++
drivers/infiniband/hw/ocrdma/ocrdma_main.c | 1 +
drivers/infiniband/hw/ocrdma/ocrdma_verbs.c | 6 ++++++
drivers/infiniband/hw/ocrdma/ocrdma_verbs.h | 3 +++
drivers/infiniband/hw/qib/qib_verbs.c | 7 +++++++
drivers/infiniband/hw/usnic/usnic_ib_main.c | 1 +
drivers/infiniband/hw/usnic/usnic_ib_verbs.c | 6 ++++++
drivers/infiniband/hw/usnic/usnic_ib_verbs.h | 2 ++
include/rdma/ib_verbs.h | 7 ++++++-
21 files changed, 104 insertions(+), 2 deletions(-)

diff --git a/drivers/infiniband/core/device.c b/drivers/infiniband/core/device.c
index 18c1ece..a9587c4 100644
--- a/drivers/infiniband/core/device.c
+++ b/drivers/infiniband/core/device.c
@@ -76,6 +76,7 @@ static int ib_device_check_mandatory(struct ib_device *device)
} mandatory_table[] = {
IB_MANDATORY_FUNC(query_device),
IB_MANDATORY_FUNC(query_port),
+ IB_MANDATORY_FUNC(query_transport),
IB_MANDATORY_FUNC(query_pkey),
IB_MANDATORY_FUNC(query_gid),
IB_MANDATORY_FUNC(alloc_pd),
diff --git a/drivers/infiniband/core/verbs.c b/drivers/infiniband/core/verbs.c
index f93eb8d..626c9cf 100644
--- a/drivers/infiniband/core/verbs.c
+++ b/drivers/infiniband/core/verbs.c
@@ -133,14 +133,16 @@ enum rdma_link_layer rdma_port_get_link_layer(struct ib_device *device, u8 port_
if (device->get_link_layer)
return device->get_link_layer(device, port_num);

- switch (rdma_node_get_transport(device->node_type)) {
+ switch (device->query_transport(device, port_num)) {
case RDMA_TRANSPORT_IB:
return IB_LINK_LAYER_INFINIBAND;
+ case RDMA_TRANSPORT_IBOE:
case RDMA_TRANSPORT_IWARP:
case RDMA_TRANSPORT_USNIC:
case RDMA_TRANSPORT_USNIC_UDP:
return IB_LINK_LAYER_ETHERNET;
default:
+ BUG();
return IB_LINK_LAYER_UNSPECIFIED;
}
}
diff --git a/drivers/infiniband/hw/amso1100/c2_provider.c b/drivers/infiniband/hw/amso1100/c2_provider.c
index bdf3507..d46bbb0 100644
--- a/drivers/infiniband/hw/amso1100/c2_provider.c
+++ b/drivers/infiniband/hw/amso1100/c2_provider.c
@@ -99,6 +99,12 @@ static int c2_query_port(struct ib_device *ibdev,
return 0;
}

+static enum rdma_transport_type
+c2_query_transport(struct ib_device *device, u8 port_num)
+{
+ return RDMA_TRANSPORT_IWARP;
+}
+
static int c2_query_pkey(struct ib_device *ibdev,
u8 port, u16 index, u16 * pkey)
{
@@ -801,6 +807,7 @@ int c2_register_device(struct c2_dev *dev)
dev->ibdev.dma_device = &dev->pcidev->dev;
dev->ibdev.query_device = c2_query_device;
dev->ibdev.query_port = c2_query_port;
+ dev->ibdev.query_transport = c2_query_transport;
dev->ibdev.query_pkey = c2_query_pkey;
dev->ibdev.query_gid = c2_query_gid;
dev->ibdev.alloc_ucontext = c2_alloc_ucontext;
diff --git a/drivers/infiniband/hw/cxgb3/iwch_provider.c b/drivers/infiniband/hw/cxgb3/iwch_provider.c
index 811b24a..09682e9e 100644
--- a/drivers/infiniband/hw/cxgb3/iwch_provider.c
+++ b/drivers/infiniband/hw/cxgb3/iwch_provider.c
@@ -1232,6 +1232,12 @@ static int iwch_query_port(struct ib_device *ibdev,
return 0;
}

+static enum rdma_transport_type
+iwch_query_transport(struct ib_device *device, u8 port_num)
+{
+ return RDMA_TRANSPORT_IWARP;
+}
+
static ssize_t show_rev(struct device *dev, struct device_attribute *attr,
char *buf)
{
@@ -1385,6 +1391,7 @@ int iwch_register_device(struct iwch_dev *dev)
dev->ibdev.dma_device = &(dev->rdev.rnic_info.pdev->dev);
dev->ibdev.query_device = iwch_query_device;
dev->ibdev.query_port = iwch_query_port;
+ dev->ibdev.query_transport = iwch_query_transport;
dev->ibdev.query_pkey = iwch_query_pkey;
dev->ibdev.query_gid = iwch_query_gid;
dev->ibdev.alloc_ucontext = iwch_alloc_ucontext;
diff --git a/drivers/infiniband/hw/cxgb4/provider.c b/drivers/infiniband/hw/cxgb4/provider.c
index 66bd6a2..a445e0d 100644
--- a/drivers/infiniband/hw/cxgb4/provider.c
+++ b/drivers/infiniband/hw/cxgb4/provider.c
@@ -390,6 +390,12 @@ static int c4iw_query_port(struct ib_device *ibdev, u8 port,
return 0;
}

+static enum rdma_transport_type
+c4iw_query_transport(struct ib_device *device, u8 port_num)
+{
+ return RDMA_TRANSPORT_IWARP;
+}
+
static ssize_t show_rev(struct device *dev, struct device_attribute *attr,
char *buf)
{
@@ -506,6 +512,7 @@ int c4iw_register_device(struct c4iw_dev *dev)
dev->ibdev.dma_device = &(dev->rdev.lldi.pdev->dev);
dev->ibdev.query_device = c4iw_query_device;
dev->ibdev.query_port = c4iw_query_port;
+ dev->ibdev.query_transport = c4iw_query_transport;
dev->ibdev.query_pkey = c4iw_query_pkey;
dev->ibdev.query_gid = c4iw_query_gid;
dev->ibdev.alloc_ucontext = c4iw_alloc_ucontext;
diff --git a/drivers/infiniband/hw/ehca/ehca_hca.c b/drivers/infiniband/hw/ehca/ehca_hca.c
index 9ed4d25..d5a34a6 100644
--- a/drivers/infiniband/hw/ehca/ehca_hca.c
+++ b/drivers/infiniband/hw/ehca/ehca_hca.c
@@ -242,6 +242,12 @@ query_port1:
return ret;
}

+enum rdma_transport_type
+ehca_query_transport(struct ib_device *device, u8 port_num)
+{
+ return RDMA_TRANSPORT_IB;
+}
+
int ehca_query_sma_attr(struct ehca_shca *shca,
u8 port, struct ehca_sma_attr *attr)
{
diff --git a/drivers/infiniband/hw/ehca/ehca_iverbs.h b/drivers/infiniband/hw/ehca/ehca_iverbs.h
index 22f79af..cec945f 100644
--- a/drivers/infiniband/hw/ehca/ehca_iverbs.h
+++ b/drivers/infiniband/hw/ehca/ehca_iverbs.h
@@ -49,6 +49,9 @@ int ehca_query_device(struct ib_device *ibdev, struct ib_device_attr *props);
int ehca_query_port(struct ib_device *ibdev, u8 port,
struct ib_port_attr *props);

+enum rdma_transport_type
+ehca_query_transport(struct ib_device *device, u8 port_num);
+
int ehca_query_sma_attr(struct ehca_shca *shca, u8 port,
struct ehca_sma_attr *attr);

diff --git a/drivers/infiniband/hw/ehca/ehca_main.c b/drivers/infiniband/hw/ehca/ehca_main.c
index cd8d290..60e0a09 100644
--- a/drivers/infiniband/hw/ehca/ehca_main.c
+++ b/drivers/infiniband/hw/ehca/ehca_main.c
@@ -467,6 +467,7 @@ static int ehca_init_device(struct ehca_shca *shca)
shca->ib_device.dma_device = &shca->ofdev->dev;
shca->ib_device.query_device = ehca_query_device;
shca->ib_device.query_port = ehca_query_port;
+ shca->ib_device.query_transport = ehca_query_transport;
shca->ib_device.query_gid = ehca_query_gid;
shca->ib_device.query_pkey = ehca_query_pkey;
/* shca->in_device.modify_device = ehca_modify_device */
diff --git a/drivers/infiniband/hw/ipath/ipath_verbs.c b/drivers/infiniband/hw/ipath/ipath_verbs.c
index 44ea939..58d36e3 100644
--- a/drivers/infiniband/hw/ipath/ipath_verbs.c
+++ b/drivers/infiniband/hw/ipath/ipath_verbs.c
@@ -1638,6 +1638,12 @@ static int ipath_query_port(struct ib_device *ibdev,
return 0;
}

+static enum rdma_transport_type
+ipath_query_transport(struct ib_device *device, u8 port_num)
+{
+ return RDMA_TRANSPORT_IB;
+}
+
static int ipath_modify_device(struct ib_device *device,
int device_modify_mask,
struct ib_device_modify *device_modify)
@@ -2140,6 +2146,7 @@ int ipath_register_ib_device(struct ipath_devdata *dd)
dev->query_device = ipath_query_device;
dev->modify_device = ipath_modify_device;
dev->query_port = ipath_query_port;
+ dev->query_transport = ipath_query_transport;
dev->modify_port = ipath_modify_port;
dev->query_pkey = ipath_query_pkey;
dev->query_gid = ipath_query_gid;
diff --git a/drivers/infiniband/hw/mlx4/main.c b/drivers/infiniband/hw/mlx4/main.c
index 57070c5..1e13cf9 100644
--- a/drivers/infiniband/hw/mlx4/main.c
+++ b/drivers/infiniband/hw/mlx4/main.c
@@ -420,6 +420,15 @@ static int mlx4_ib_query_port(struct ib_device *ibdev, u8 port,
return __mlx4_ib_query_port(ibdev, port, props, 0);
}

+static enum rdma_transport_type
+mlx4_ib_query_transport(struct ib_device *device, u8 port_num)
+{
+ struct mlx4_dev *dev = to_mdev(device)->dev;
+
+ return dev->caps.port_mask[port_num] == MLX4_PORT_TYPE_IB ?
+ RDMA_TRANSPORT_IB : RDMA_TRANSPORT_IBOE;
+}
+
int __mlx4_ib_query_gid(struct ib_device *ibdev, u8 port, int index,
union ib_gid *gid, int netw_view)
{
@@ -2202,6 +2211,7 @@ static void *mlx4_ib_add(struct mlx4_dev *dev)

ibdev->ib_dev.query_device = mlx4_ib_query_device;
ibdev->ib_dev.query_port = mlx4_ib_query_port;
+ ibdev->ib_dev.query_transport = mlx4_ib_query_transport;
ibdev->ib_dev.get_link_layer = mlx4_ib_port_link_layer;
ibdev->ib_dev.query_gid = mlx4_ib_query_gid;
ibdev->ib_dev.query_pkey = mlx4_ib_query_pkey;
diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c
index 57c9809..b6f2f58 100644
--- a/drivers/infiniband/hw/mlx5/main.c
+++ b/drivers/infiniband/hw/mlx5/main.c
@@ -262,6 +262,12 @@ out:
return err;
}

+static enum rdma_transport_type
+mlx5_ib_query_transport(struct ib_device *device, u8 port_num)
+{
+ return RDMA_TRANSPORT_IB;
+}
+
static int mlx5_ib_query_gid(struct ib_device *ibdev, u8 port, int index,
union ib_gid *gid)
{
@@ -1244,6 +1250,7 @@ static void *mlx5_ib_add(struct mlx5_core_dev *mdev)

dev->ib_dev.query_device = mlx5_ib_query_device;
dev->ib_dev.query_port = mlx5_ib_query_port;
+ dev->ib_dev.query_transport = mlx5_ib_query_transport;
dev->ib_dev.query_gid = mlx5_ib_query_gid;
dev->ib_dev.query_pkey = mlx5_ib_query_pkey;
dev->ib_dev.modify_device = mlx5_ib_modify_device;
diff --git a/drivers/infiniband/hw/mthca/mthca_provider.c b/drivers/infiniband/hw/mthca/mthca_provider.c
index 415f8e1..67ac6a4 100644
--- a/drivers/infiniband/hw/mthca/mthca_provider.c
+++ b/drivers/infiniband/hw/mthca/mthca_provider.c
@@ -179,6 +179,12 @@ static int mthca_query_port(struct ib_device *ibdev,
return err;
}

+static enum rdma_transport_type
+mthca_query_transport(struct ib_device *device, u8 port_num)
+{
+ return RDMA_TRANSPORT_IB;
+}
+
static int mthca_modify_device(struct ib_device *ibdev,
int mask,
struct ib_device_modify *props)
@@ -1281,6 +1287,7 @@ int mthca_register_device(struct mthca_dev *dev)
dev->ib_dev.dma_device = &dev->pdev->dev;
dev->ib_dev.query_device = mthca_query_device;
dev->ib_dev.query_port = mthca_query_port;
+ dev->ib_dev.query_transport = mthca_query_transport;
dev->ib_dev.modify_device = mthca_modify_device;
dev->ib_dev.modify_port = mthca_modify_port;
dev->ib_dev.query_pkey = mthca_query_pkey;
diff --git a/drivers/infiniband/hw/nes/nes_verbs.c b/drivers/infiniband/hw/nes/nes_verbs.c
index c0d0296..8df5b61 100644
--- a/drivers/infiniband/hw/nes/nes_verbs.c
+++ b/drivers/infiniband/hw/nes/nes_verbs.c
@@ -606,6 +606,11 @@ static int nes_query_port(struct ib_device *ibdev, u8 port, struct ib_port_attr
return 0;
}

+static enum rdma_transport_type
+nes_query_transport(struct ib_device *device, u8 port_num)
+{
+ return RDMA_TRANSPORT_IWARP;
+}

/**
* nes_query_pkey
@@ -3879,6 +3884,7 @@ struct nes_ib_device *nes_init_ofa_device(struct net_device *netdev)
nesibdev->ibdev.dev.parent = &nesdev->pcidev->dev;
nesibdev->ibdev.query_device = nes_query_device;
nesibdev->ibdev.query_port = nes_query_port;
+ nesibdev->ibdev.query_transport = nes_query_transport;
nesibdev->ibdev.query_pkey = nes_query_pkey;
nesibdev->ibdev.query_gid = nes_query_gid;
nesibdev->ibdev.alloc_ucontext = nes_alloc_ucontext;
diff --git a/drivers/infiniband/hw/ocrdma/ocrdma_main.c b/drivers/infiniband/hw/ocrdma/ocrdma_main.c
index 7a2b59a..9f4d182 100644
--- a/drivers/infiniband/hw/ocrdma/ocrdma_main.c
+++ b/drivers/infiniband/hw/ocrdma/ocrdma_main.c
@@ -244,6 +244,7 @@ static int ocrdma_register_device(struct ocrdma_dev *dev)
/* mandatory verbs. */
dev->ibdev.query_device = ocrdma_query_device;
dev->ibdev.query_port = ocrdma_query_port;
+ dev->ibdev.query_transport = ocrdma_query_transport;
dev->ibdev.modify_port = ocrdma_modify_port;
dev->ibdev.query_gid = ocrdma_query_gid;
dev->ibdev.get_link_layer = ocrdma_link_layer;
diff --git a/drivers/infiniband/hw/ocrdma/ocrdma_verbs.c b/drivers/infiniband/hw/ocrdma/ocrdma_verbs.c
index 8771755..73bace4 100644
--- a/drivers/infiniband/hw/ocrdma/ocrdma_verbs.c
+++ b/drivers/infiniband/hw/ocrdma/ocrdma_verbs.c
@@ -187,6 +187,12 @@ int ocrdma_query_port(struct ib_device *ibdev,
return 0;
}

+enum rdma_transport_type
+ocrdma_query_transport(struct ib_device *device, u8 port_num)
+{
+ return RDMA_TRANSPORT_IBOE;
+}
+
int ocrdma_modify_port(struct ib_device *ibdev, u8 port, int mask,
struct ib_port_modify *props)
{
diff --git a/drivers/infiniband/hw/ocrdma/ocrdma_verbs.h b/drivers/infiniband/hw/ocrdma/ocrdma_verbs.h
index b8f7853..4a81b63 100644
--- a/drivers/infiniband/hw/ocrdma/ocrdma_verbs.h
+++ b/drivers/infiniband/hw/ocrdma/ocrdma_verbs.h
@@ -41,6 +41,9 @@ int ocrdma_query_port(struct ib_device *, u8 port, struct ib_port_attr *props);
int ocrdma_modify_port(struct ib_device *, u8 port, int mask,
struct ib_port_modify *props);

+enum rdma_transport_type
+ocrdma_query_transport(struct ib_device *device, u8 port_num);
+
void ocrdma_get_guid(struct ocrdma_dev *, u8 *guid);
int ocrdma_query_gid(struct ib_device *, u8 port,
int index, union ib_gid *gid);
diff --git a/drivers/infiniband/hw/qib/qib_verbs.c b/drivers/infiniband/hw/qib/qib_verbs.c
index 4a35998..caad665 100644
--- a/drivers/infiniband/hw/qib/qib_verbs.c
+++ b/drivers/infiniband/hw/qib/qib_verbs.c
@@ -1650,6 +1650,12 @@ static int qib_query_port(struct ib_device *ibdev, u8 port,
return 0;
}

+static enum rdma_transport_type
+qib_query_transport(struct ib_device *device, u8 port_num)
+{
+ return RDMA_TRANSPORT_IB;
+}
+
static int qib_modify_device(struct ib_device *device,
int device_modify_mask,
struct ib_device_modify *device_modify)
@@ -2184,6 +2190,7 @@ int qib_register_ib_device(struct qib_devdata *dd)
ibdev->query_device = qib_query_device;
ibdev->modify_device = qib_modify_device;
ibdev->query_port = qib_query_port;
+ ibdev->query_transport = qib_query_transport;
ibdev->modify_port = qib_modify_port;
ibdev->query_pkey = qib_query_pkey;
ibdev->query_gid = qib_query_gid;
diff --git a/drivers/infiniband/hw/usnic/usnic_ib_main.c b/drivers/infiniband/hw/usnic/usnic_ib_main.c
index 0d0f986..03ea9f3 100644
--- a/drivers/infiniband/hw/usnic/usnic_ib_main.c
+++ b/drivers/infiniband/hw/usnic/usnic_ib_main.c
@@ -360,6 +360,7 @@ static void *usnic_ib_device_add(struct pci_dev *dev)

us_ibdev->ib_dev.query_device = usnic_ib_query_device;
us_ibdev->ib_dev.query_port = usnic_ib_query_port;
+ us_ibdev->ib_dev.query_transport = usnic_ib_query_transport;
us_ibdev->ib_dev.query_pkey = usnic_ib_query_pkey;
us_ibdev->ib_dev.query_gid = usnic_ib_query_gid;
us_ibdev->ib_dev.get_link_layer = usnic_ib_port_link_layer;
diff --git a/drivers/infiniband/hw/usnic/usnic_ib_verbs.c b/drivers/infiniband/hw/usnic/usnic_ib_verbs.c
index 53bd6a2..ff9a5f7 100644
--- a/drivers/infiniband/hw/usnic/usnic_ib_verbs.c
+++ b/drivers/infiniband/hw/usnic/usnic_ib_verbs.c
@@ -348,6 +348,12 @@ int usnic_ib_query_port(struct ib_device *ibdev, u8 port,
return 0;
}

+enum rdma_transport_type
+usnic_ib_query_transport(struct ib_device *device, u8 port_num)
+{
+ return RDMA_TRANSPORT_USNIC_UDP;
+}
+
int usnic_ib_query_qp(struct ib_qp *qp, struct ib_qp_attr *qp_attr,
int qp_attr_mask,
struct ib_qp_init_attr *qp_init_attr)
diff --git a/drivers/infiniband/hw/usnic/usnic_ib_verbs.h b/drivers/infiniband/hw/usnic/usnic_ib_verbs.h
index bb864f5..0b1633b 100644
--- a/drivers/infiniband/hw/usnic/usnic_ib_verbs.h
+++ b/drivers/infiniband/hw/usnic/usnic_ib_verbs.h
@@ -27,6 +27,8 @@ int usnic_ib_query_device(struct ib_device *ibdev,
struct ib_device_attr *props);
int usnic_ib_query_port(struct ib_device *ibdev, u8 port,
struct ib_port_attr *props);
+enum rdma_transport_type
+usnic_ib_query_transport(struct ib_device *device, u8 port_num);
int usnic_ib_query_qp(struct ib_qp *qp, struct ib_qp_attr *qp_attr,
int qp_attr_mask,
struct ib_qp_init_attr *qp_init_attr);
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 65994a1..d54f91e 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -75,10 +75,13 @@ enum rdma_node_type {
};

enum rdma_transport_type {
+ /* legacy for users */
RDMA_TRANSPORT_IB,
RDMA_TRANSPORT_IWARP,
RDMA_TRANSPORT_USNIC,
- RDMA_TRANSPORT_USNIC_UDP
+ RDMA_TRANSPORT_USNIC_UDP,
+ /* new transport */
+ RDMA_TRANSPORT_IBOE,
};

__attribute_const__ enum rdma_transport_type
@@ -1501,6 +1504,8 @@ struct ib_device {
int (*query_port)(struct ib_device *device,
u8 port_num,
struct ib_port_attr *port_attr);
+ enum rdma_transport_type (*query_transport)(struct ib_device *device,
+ u8 port_num);
enum rdma_link_layer (*get_link_layer)(struct ib_device *device,
u8 port_num);
int (*query_gid)(struct ib_device *device,
--
2.1.0
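
The device.c hunk in the patch above adds query_transport to the table of
mandatory callbacks, so a driver that does not set it would be rejected at
registration. A minimal userspace sketch of that kind of table-driven check
follows; all demo_ names are hypothetical, and the callbacks are given a
single common type purely to keep the example small (the real ib_device
callbacks have varied signatures).

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical common callback type, used only for this sketch. */
typedef int (*demo_cb)(void);

struct demo_ops {
	demo_cb query_device;
	demo_cb query_port;
	demo_cb query_transport;	/* mandatory once it is in the table */
	demo_cb modify_port;		/* optional, not listed in the table */
};

#define DEMO_MANDATORY_FUNC(x) { offsetof(struct demo_ops, x), #x }

static const struct {
	size_t offset;
	const char *name;
} demo_mandatory_table[] = {
	DEMO_MANDATORY_FUNC(query_device),
	DEMO_MANDATORY_FUNC(query_port),
	DEMO_MANDATORY_FUNC(query_transport),
};

/* Returns the name of the first missing mandatory callback, or NULL. */
static const char *demo_first_missing(const struct demo_ops *ops)
{
	size_t n = sizeof(demo_mandatory_table) /
		   sizeof(demo_mandatory_table[0]);
	size_t i;

	for (i = 0; i < n; i++) {
		demo_cb cb;

		/* Copy the pointer out by offset; all slots share one type. */
		memcpy(&cb, (const char *)ops + demo_mandatory_table[i].offset,
		       sizeof(cb));
		if (!cb)
			return demo_mandatory_table[i].name;
	}

	return NULL;
}

/* Placeholder callback used to fill the slots. */
static int demo_noop(void)
{
	return 0;
}
```

A driver that was not updated shows up as missing "query_transport", which is
why the patch touches every provider in the same commit as the table entry.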

2015-04-24 12:24:04

by Michael Wang

Subject: [PATCH v6 02/26] IB/Verbs: Implement raw management helpers

Add raw helpers:
    rdma_tech_ib
    rdma_tech_iboe
    rdma_tech_iwarp
    rdma_ib_or_iboe (transition, clean up later)
to help us detect which technology a port supports.

Cc: Hal Rosenstock <[email protected]>
Cc: Steve Wise <[email protected]>
Cc: Tom Talpey <[email protected]>
Cc: Jason Gunthorpe <[email protected]>
Cc: Doug Ledford <[email protected]>
Cc: Ira Weiny <[email protected]>
Cc: Sean Hefty <[email protected]>
Signed-off-by: Michael Wang <[email protected]>
---
include/rdma/ib_verbs.h | 25 +++++++++++++++++++++++++
1 file changed, 25 insertions(+)

diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index d54f91e..a12e876 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -1748,6 +1748,31 @@ int ib_query_port(struct ib_device *device,
enum rdma_link_layer rdma_port_get_link_layer(struct ib_device *device,
u8 port_num);

+static inline int rdma_tech_ib(struct ib_device *device, u8 port_num)
+{
+ return device->query_transport(device, port_num)
+ == RDMA_TRANSPORT_IB;
+}
+
+static inline int rdma_tech_iboe(struct ib_device *device, u8 port_num)
+{
+ return device->query_transport(device, port_num)
+ == RDMA_TRANSPORT_IBOE;
+}
+
+static inline int rdma_tech_iwarp(struct ib_device *device, u8 port_num)
+{
+ return device->query_transport(device, port_num)
+ == RDMA_TRANSPORT_IWARP;
+}
+
+static inline int rdma_ib_or_iboe(struct ib_device *device, u8 port_num)
+{
+ enum rdma_transport_type tp = device->query_transport(device, port_num);
+
+ return (tp == RDMA_TRANSPORT_IB || tp == RDMA_TRANSPORT_IBOE);
+}
+
int ib_query_gid(struct ib_device *device,
u8 port_num, int index, union ib_gid *gid);

--
2.1.0

2015-04-24 12:33:48

by Michael Wang

Subject: [PATCH v6 03/26] IB/Verbs: Reform IB-core mad/agent/user_mad

Use raw management helpers to reform IB-core mad/agent/user_mad.

Cc: Hal Rosenstock <[email protected]>
Cc: Steve Wise <[email protected]>
Cc: Tom Talpey <[email protected]>
Cc: Jason Gunthorpe <[email protected]>
Cc: Doug Ledford <[email protected]>
Cc: Ira Weiny <[email protected]>
Cc: Sean Hefty <[email protected]>
Signed-off-by: Michael Wang <[email protected]>
---
drivers/infiniband/core/agent.c | 2 +-
drivers/infiniband/core/mad.c | 43 +++++++++++++++++++-------------------
drivers/infiniband/core/user_mad.c | 26 ++++++++++++++++-------
3 files changed, 41 insertions(+), 30 deletions(-)

diff --git a/drivers/infiniband/core/agent.c b/drivers/infiniband/core/agent.c
index f6d2961..ffdef4d 100644
--- a/drivers/infiniband/core/agent.c
+++ b/drivers/infiniband/core/agent.c
@@ -156,7 +156,7 @@ int ib_agent_port_open(struct ib_device *device, int port_num)
goto error1;
}

- if (rdma_port_get_link_layer(device, port_num) == IB_LINK_LAYER_INFINIBAND) {
+ if (rdma_tech_ib(device, port_num)) {
/* Obtain send only MAD agent for SMI QP */
port_priv->agent[0] = ib_register_mad_agent(device, port_num,
IB_QPT_SMI, NULL, 0,
diff --git a/drivers/infiniband/core/mad.c b/drivers/infiniband/core/mad.c
index 74c30f4..1822932 100644
--- a/drivers/infiniband/core/mad.c
+++ b/drivers/infiniband/core/mad.c
@@ -2938,7 +2938,7 @@ static int ib_mad_port_open(struct ib_device *device,
init_mad_qp(port_priv, &port_priv->qp_info[1]);

cq_size = mad_sendq_size + mad_recvq_size;
- has_smi = rdma_port_get_link_layer(device, port_num) == IB_LINK_LAYER_INFINIBAND;
+ has_smi = rdma_tech_ib(device, port_num);
if (has_smi)
cq_size *= 2;

@@ -3057,9 +3057,6 @@ static void ib_mad_init_device(struct ib_device *device)
{
int start, end, i;

- if (rdma_node_get_transport(device->node_type) != RDMA_TRANSPORT_IB)
- return;
-
if (device->node_type == RDMA_NODE_IB_SWITCH) {
start = 0;
end = 0;
@@ -3069,6 +3066,9 @@ static void ib_mad_init_device(struct ib_device *device)
}

for (i = start; i <= end; i++) {
+ if (!rdma_ib_or_iboe(device, i))
+ continue;
+
if (ib_mad_port_open(device, i)) {
dev_err(&device->dev, "Couldn't open port %d\n", i);
goto error;
@@ -3086,40 +3086,39 @@ error_agent:
dev_err(&device->dev, "Couldn't close port %d\n", i);

error:
- i--;
+ while (--i >= start) {
+ if (!rdma_ib_or_iboe(device, i))
+ continue;

- while (i >= start) {
if (ib_agent_port_close(device, i))
dev_err(&device->dev,
"Couldn't close port %d for agents\n", i);
if (ib_mad_port_close(device, i))
dev_err(&device->dev, "Couldn't close port %d\n", i);
- i--;
}
}

static void ib_mad_remove_device(struct ib_device *device)
{
- int i, num_ports, cur_port;
-
- if (rdma_node_get_transport(device->node_type) != RDMA_TRANSPORT_IB)
- return;
+ int start, end, i;

if (device->node_type == RDMA_NODE_IB_SWITCH) {
- num_ports = 1;
- cur_port = 0;
+ start = 0;
+ end = 0;
} else {
- num_ports = device->phys_port_cnt;
- cur_port = 1;
+ start = 1;
+ end = device->phys_port_cnt;
}
- for (i = 0; i < num_ports; i++, cur_port++) {
- if (ib_agent_port_close(device, cur_port))
+
+ for (i = start; i <= end; i++) {
+ if (!rdma_ib_or_iboe(device, i))
+ continue;
+
+ if (ib_agent_port_close(device, i))
dev_err(&device->dev,
- "Couldn't close port %d for agents\n",
- cur_port);
- if (ib_mad_port_close(device, cur_port))
- dev_err(&device->dev, "Couldn't close port %d\n",
- cur_port);
+ "Couldn't close port %d for agents\n", i);
+ if (ib_mad_port_close(device, i))
+ dev_err(&device->dev, "Couldn't close port %d\n", i);
}
}

diff --git a/drivers/infiniband/core/user_mad.c b/drivers/infiniband/core/user_mad.c
index 928cdd2..aa8b334 100644
--- a/drivers/infiniband/core/user_mad.c
+++ b/drivers/infiniband/core/user_mad.c
@@ -1273,9 +1273,7 @@ static void ib_umad_add_one(struct ib_device *device)
{
struct ib_umad_device *umad_dev;
int s, e, i;
-
- if (rdma_node_get_transport(device->node_type) != RDMA_TRANSPORT_IB)
- return;
+ int count = 0;

if (device->node_type == RDMA_NODE_IB_SWITCH)
s = e = 0;
@@ -1296,21 +1294,33 @@ static void ib_umad_add_one(struct ib_device *device)
umad_dev->end_port = e;

for (i = s; i <= e; ++i) {
+ if (!rdma_ib_or_iboe(device, i))
+ continue;
+
umad_dev->port[i - s].umad_dev = umad_dev;

if (ib_umad_init_port(device, i, umad_dev,
&umad_dev->port[i - s]))
goto err;
+
+ count++;
}

+ if (!count)
+ goto free;
+
ib_set_client_data(device, &umad_client, umad_dev);

return;

err:
- while (--i >= s)
- ib_umad_kill_port(&umad_dev->port[i - s]);
+ while (--i >= s) {
+ if (!rdma_ib_or_iboe(device, i))
+ continue;

+ ib_umad_kill_port(&umad_dev->port[i - s]);
+ }
+free:
kobject_put(&umad_dev->kobj);
}

@@ -1322,8 +1332,10 @@ static void ib_umad_remove_one(struct ib_device *device)
if (!umad_dev)
return;

- for (i = 0; i <= umad_dev->end_port - umad_dev->start_port; ++i)
- ib_umad_kill_port(&umad_dev->port[i]);
+ for (i = 0; i <= umad_dev->end_port - umad_dev->start_port; ++i) {
+ if (rdma_ib_or_iboe(device, i + umad_dev->start_port))
+ ib_umad_kill_port(&umad_dev->port[i]);
+ }

kobject_put(&umad_dev->kobj);
}
--
2.1.0
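
The patch above replaces a device-wide early return with a per-port check
inside each loop. The pattern can be sketched in isolation as follows; this
is a simplified userspace model with hypothetical demo_ names, not the kernel
code. Ports lacking the capability are skipped on the way up, counted ports
decide whether anything is registered, and the unwind path applies the same
filter so that only ports that were actually opened get closed.

```c
#include <stdbool.h>

#define DEMO_MAX_PORTS 8

struct demo_dev {
	int first_port;
	int last_port;
	bool supports_mad[DEMO_MAX_PORTS + 1];	/* indexed by port number */
	bool opened[DEMO_MAX_PORTS + 1];	/* indexed by port number */
	int fail_on_port;			/* port whose open fails, or -1 */
};

static int demo_port_open(struct demo_dev *dev, int port)
{
	if (port == dev->fail_on_port)
		return -1;
	dev->opened[port] = true;
	return 0;
}

static void demo_port_close(struct demo_dev *dev, int port)
{
	dev->opened[port] = false;
}

/* Returns the number of ports opened, or -1 after unwinding on failure. */
static int demo_add_device(struct demo_dev *dev)
{
	int i, count = 0;

	for (i = dev->first_port; i <= dev->last_port; i++) {
		if (!dev->supports_mad[i])
			continue;	/* skip ports without the capability */
		if (demo_port_open(dev, i))
			goto err;
		count++;
	}

	return count;	/* 0 means there is nothing to register */

err:
	while (--i >= dev->first_port) {
		if (!dev->supports_mad[i])
			continue;	/* never opened, nothing to undo */
		demo_port_close(dev, i);
	}
	return -1;
}
```

Note that both loops index the capability array by port number; mixing a
zero-based array index with a port number in such a filter is an easy
mistake to make.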

2015-04-24 12:24:07

by Michael Wang

Subject: [PATCH v6 04/26] IB/Verbs: Reform IB-core cm

Use raw management helpers to reform IB-core cm.

Cc: Hal Rosenstock <[email protected]>
Cc: Steve Wise <[email protected]>
Cc: Tom Talpey <[email protected]>
Cc: Jason Gunthorpe <[email protected]>
Cc: Doug Ledford <[email protected]>
Cc: Ira Weiny <[email protected]>
Cc: Sean Hefty <[email protected]>
Signed-off-by: Michael Wang <[email protected]>
---
drivers/infiniband/core/cm.c | 20 +++++++++++++++++---
1 file changed, 17 insertions(+), 3 deletions(-)

diff --git a/drivers/infiniband/core/cm.c b/drivers/infiniband/core/cm.c
index e28a494..add5e484 100644
--- a/drivers/infiniband/core/cm.c
+++ b/drivers/infiniband/core/cm.c
@@ -3760,11 +3760,9 @@ static void cm_add_one(struct ib_device *ib_device)
};
unsigned long flags;
int ret;
+ int count = 0;
u8 i;

- if (rdma_node_get_transport(ib_device->node_type) != RDMA_TRANSPORT_IB)
- return;
-
cm_dev = kzalloc(sizeof(*cm_dev) + sizeof(*port) *
ib_device->phys_port_cnt, GFP_KERNEL);
if (!cm_dev)
@@ -3783,6 +3781,9 @@ static void cm_add_one(struct ib_device *ib_device)

set_bit(IB_MGMT_METHOD_SEND, reg_req.method_mask);
for (i = 1; i <= ib_device->phys_port_cnt; i++) {
+ if (!rdma_ib_or_iboe(ib_device, i))
+ continue;
+
port = kzalloc(sizeof *port, GFP_KERNEL);
if (!port)
goto error1;
@@ -3809,7 +3810,13 @@ static void cm_add_one(struct ib_device *ib_device)
ret = ib_modify_port(ib_device, i, 0, &port_modify);
if (ret)
goto error3;
+
+ count++;
}
+
+ if (!count)
+ goto free;
+
ib_set_client_data(ib_device, &cm_client, cm_dev);

write_lock_irqsave(&cm.device_lock, flags);
@@ -3825,11 +3832,15 @@ error1:
port_modify.set_port_cap_mask = 0;
port_modify.clr_port_cap_mask = IB_PORT_CM_SUP;
while (--i) {
+ if (!rdma_ib_or_iboe(ib_device, i))
+ continue;
+
port = cm_dev->port[i-1];
ib_modify_port(ib_device, port->port_num, 0, &port_modify);
ib_unregister_mad_agent(port->mad_agent);
cm_remove_port_fs(port);
}
+free:
device_unregister(cm_dev->device);
kfree(cm_dev);
}
@@ -3853,6 +3864,9 @@ static void cm_remove_one(struct ib_device *ib_device)
write_unlock_irqrestore(&cm.device_lock, flags);

for (i = 1; i <= ib_device->phys_port_cnt; i++) {
+ if (!rdma_ib_or_iboe(ib_device, i))
+ continue;
+
port = cm_dev->port[i-1];
ib_modify_port(ib_device, port->port_num, 0, &port_modify);
ib_unregister_mad_agent(port->mad_agent);
--
2.1.0
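
For readers following along, the "count the usable ports, bail out if there are none" pattern that this patch adds to cm_add_one() can be modelled in user space. This is only a minimal sketch, not kernel code: struct fake_device, struct client_state, port_is_ib_or_iboe() and client_add_one() are hypothetical names, with port_is_ib_or_iboe() standing in for rdma_ib_or_iboe().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical per-port technology, standing in for the kernel's view. */
enum port_tech { TECH_IB, TECH_IBOE, TECH_IWARP };

struct fake_device {
	size_t port_cnt;
	const enum port_tech *tech;	/* tech[0] describes port 1 */
};

struct client_state {
	size_t nports;
	bool *active;			/* active[0] describes port 1 */
};

/* Stand-in for rdma_ib_or_iboe(): ports are numbered from 1. */
bool port_is_ib_or_iboe(const struct fake_device *dev, size_t port)
{
	assert(port >= 1 && port <= dev->port_cnt);
	return dev->tech[port - 1] == TECH_IB ||
	       dev->tech[port - 1] == TECH_IBOE;
}

/*
 * Set up per-device state, skipping ports that do not qualify.
 * Returns NULL when no port qualifies, so that no client data
 * is ever attached to such a device.
 */
struct client_state *client_add_one(const struct fake_device *dev)
{
	struct client_state *st = calloc(1, sizeof(*st));
	size_t count = 0;
	size_t p;

	if (!st)
		return NULL;
	st->active = calloc(dev->port_cnt, sizeof(*st->active));
	if (!st->active)
		goto out_free;
	st->nports = dev->port_cnt;

	for (p = 1; p <= dev->port_cnt; p++) {
		if (!port_is_ib_or_iboe(dev, p))
			continue;
		st->active[p - 1] = true;
		count++;
	}

	if (!count)
		goto out_free;
	return st;

out_free:
	free(st->active);
	free(st);
	return NULL;
}
```

With a two-port device whose ports are iWARP and IBoE, client_add_one() returns state with only the second port marked active; with an iWARP-only device it returns NULL.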

2015-04-24 12:33:07

by Michael Wang

Subject: [PATCH v6 05/26] IB/Verbs: Reform IB-core sa_query

Use raw management helpers to reform IB-core sa_query.

Cc: Hal Rosenstock <[email protected]>
Cc: Steve Wise <[email protected]>
Cc: Tom Talpey <[email protected]>
Cc: Jason Gunthorpe <[email protected]>
Cc: Doug Ledford <[email protected]>
Cc: Ira Weiny <[email protected]>
Cc: Sean Hefty <[email protected]>
Signed-off-by: Michael Wang <[email protected]>
---
drivers/infiniband/core/sa_query.c | 30 +++++++++++++++++-------------
1 file changed, 17 insertions(+), 13 deletions(-)

diff --git a/drivers/infiniband/core/sa_query.c b/drivers/infiniband/core/sa_query.c
index c38f030..96adf8c 100644
--- a/drivers/infiniband/core/sa_query.c
+++ b/drivers/infiniband/core/sa_query.c
@@ -450,7 +450,7 @@ static void ib_sa_event(struct ib_event_handler *handler, struct ib_event *event
struct ib_sa_port *port =
&sa_dev->port[event->element.port_num - sa_dev->start_port];

- if (rdma_port_get_link_layer(handler->device, port->port_num) != IB_LINK_LAYER_INFINIBAND)
+ if (WARN_ON(!rdma_tech_ib(handler->device, port->port_num)))
return;

spin_lock_irqsave(&port->ah_lock, flags);
@@ -540,7 +540,7 @@ int ib_init_ah_from_path(struct ib_device *device, u8 port_num,
ah_attr->port_num = port_num;
ah_attr->static_rate = rec->rate;

- force_grh = rdma_port_get_link_layer(device, port_num) == IB_LINK_LAYER_ETHERNET;
+ force_grh = rdma_tech_iboe(device, port_num);

if (rec->hop_limit > 1 || force_grh) {
ah_attr->ah_flags = IB_AH_GRH;
@@ -1153,9 +1153,7 @@ static void ib_sa_add_one(struct ib_device *device)
{
struct ib_sa_device *sa_dev;
int s, e, i;
-
- if (rdma_node_get_transport(device->node_type) != RDMA_TRANSPORT_IB)
- return;
+ int count = 0;

if (device->node_type == RDMA_NODE_IB_SWITCH)
s = e = 0;
@@ -1175,7 +1173,7 @@ static void ib_sa_add_one(struct ib_device *device)

for (i = 0; i <= e - s; ++i) {
spin_lock_init(&sa_dev->port[i].ah_lock);
- if (rdma_port_get_link_layer(device, i + 1) != IB_LINK_LAYER_INFINIBAND)
+ if (!rdma_tech_ib(device, i + 1))
continue;

sa_dev->port[i].sm_ah = NULL;
@@ -1189,8 +1187,13 @@ static void ib_sa_add_one(struct ib_device *device)
goto err;

INIT_WORK(&sa_dev->port[i].update_task, update_sm_ah);
+
+ count++;
}

+ if (!count)
+ goto free;
+
ib_set_client_data(device, &sa_client, sa_dev);

/*
@@ -1204,19 +1207,20 @@ static void ib_sa_add_one(struct ib_device *device)
if (ib_register_event_handler(&sa_dev->event_handler))
goto err;

- for (i = 0; i <= e - s; ++i)
- if (rdma_port_get_link_layer(device, i + 1) == IB_LINK_LAYER_INFINIBAND)
+ for (i = 0; i <= e - s; ++i) {
+ if (rdma_tech_ib(device, i + 1))
update_sm_ah(&sa_dev->port[i].update_task);
+ }

return;

err:
- while (--i >= 0)
- if (rdma_port_get_link_layer(device, i + 1) == IB_LINK_LAYER_INFINIBAND)
+ while (--i >= 0) {
+ if (rdma_tech_ib(device, i + 1))
ib_unregister_mad_agent(sa_dev->port[i].agent);
-
+ }
+free:
kfree(sa_dev);
-
return;
}

@@ -1233,7 +1237,7 @@ static void ib_sa_remove_one(struct ib_device *device)
flush_workqueue(ib_wq);

for (i = 0; i <= sa_dev->end_port - sa_dev->start_port; ++i) {
- if (rdma_port_get_link_layer(device, i + 1) == IB_LINK_LAYER_INFINIBAND) {
+ if (rdma_tech_ib(device, i + 1)) {
ib_unregister_mad_agent(sa_dev->port[i].agent);
if (sa_dev->port[i].sm_ah)
kref_put(&sa_dev->port[i].sm_ah->ref, free_sm_ah);
--
2.1.0

2015-04-24 12:33:04

by Michael Wang

Subject: [PATCH v6 06/26] IB/Verbs: Reform IB-core multicast

Use raw management helpers to reform IB-core multicast.

Cc: Hal Rosenstock <[email protected]>
Cc: Steve Wise <[email protected]>
Cc: Tom Talpey <[email protected]>
Cc: Jason Gunthorpe <[email protected]>
Cc: Doug Ledford <[email protected]>
Cc: Ira Weiny <[email protected]>
Cc: Sean Hefty <[email protected]>
Signed-off-by: Michael Wang <[email protected]>
---
drivers/infiniband/core/multicast.c | 12 +++---------
1 file changed, 3 insertions(+), 9 deletions(-)

diff --git a/drivers/infiniband/core/multicast.c b/drivers/infiniband/core/multicast.c
index fa17b55..24d93f5 100644
--- a/drivers/infiniband/core/multicast.c
+++ b/drivers/infiniband/core/multicast.c
@@ -780,8 +780,7 @@ static void mcast_event_handler(struct ib_event_handler *handler,
int index;

dev = container_of(handler, struct mcast_device, event_handler);
- if (rdma_port_get_link_layer(dev->device, event->element.port_num) !=
- IB_LINK_LAYER_INFINIBAND)
+ if (WARN_ON(!rdma_tech_ib(dev->device, event->element.port_num)))
return;

index = event->element.port_num - dev->start_port;
@@ -808,9 +807,6 @@ static void mcast_add_one(struct ib_device *device)
int i;
int count = 0;

- if (rdma_node_get_transport(device->node_type) != RDMA_TRANSPORT_IB)
- return;
-
dev = kmalloc(sizeof *dev + device->phys_port_cnt * sizeof *port,
GFP_KERNEL);
if (!dev)
@@ -824,8 +820,7 @@ static void mcast_add_one(struct ib_device *device)
}

for (i = 0; i <= dev->end_port - dev->start_port; i++) {
- if (rdma_port_get_link_layer(device, dev->start_port + i) !=
- IB_LINK_LAYER_INFINIBAND)
+ if (!rdma_tech_ib(device, dev->start_port + i))
continue;
port = &dev->port[i];
port->dev = dev;
@@ -863,8 +858,7 @@ static void mcast_remove_one(struct ib_device *device)
flush_workqueue(mcast_wq);

for (i = 0; i <= dev->end_port - dev->start_port; i++) {
- if (rdma_port_get_link_layer(device, dev->start_port + i) ==
- IB_LINK_LAYER_INFINIBAND) {
+ if (rdma_tech_ib(device, dev->start_port + i)) {
port = &dev->port[i];
deref_port(port);
wait_for_completion(&port->comp);
--
2.1.0

2015-04-24 12:24:16

by Michael Wang

Subject: [PATCH v6 07/26] IB/Verbs: Reform IB-ulp ipoib

Use raw management helpers to reform IB-ulp ipoib.

Cc: Hal Rosenstock <[email protected]>
Cc: Steve Wise <[email protected]>
Cc: Tom Talpey <[email protected]>
Cc: Jason Gunthorpe <[email protected]>
Cc: Doug Ledford <[email protected]>
Cc: Ira Weiny <[email protected]>
Cc: Sean Hefty <[email protected]>
Signed-off-by: Michael Wang <[email protected]>
---
drivers/infiniband/ulp/ipoib/ipoib_main.c | 15 ++++++++-------
1 file changed, 8 insertions(+), 7 deletions(-)

diff --git a/drivers/infiniband/ulp/ipoib/ipoib_main.c b/drivers/infiniband/ulp/ipoib/ipoib_main.c
index 7cad4dd..3cfd6a9 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_main.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c
@@ -1680,9 +1680,7 @@ static void ipoib_add_one(struct ib_device *device)
struct net_device *dev;
struct ipoib_dev_priv *priv;
int s, e, p;
-
- if (rdma_node_get_transport(device->node_type) != RDMA_TRANSPORT_IB)
- return;
+ int count = 0;

dev_list = kmalloc(sizeof *dev_list, GFP_KERNEL);
if (!dev_list)
@@ -1699,15 +1697,21 @@ static void ipoib_add_one(struct ib_device *device)
}

for (p = s; p <= e; ++p) {
- if (rdma_port_get_link_layer(device, p) != IB_LINK_LAYER_INFINIBAND)
+ if (!rdma_tech_ib(device, p))
continue;
dev = ipoib_add_port("ib%d", device, p);
if (!IS_ERR(dev)) {
priv = netdev_priv(dev);
list_add_tail(&priv->list, dev_list);
+ count++;
}
}

+ if (!count) {
+ kfree(dev_list);
+ return;
+ }
+
ib_set_client_data(device, &ipoib_client, dev_list);
}

@@ -1716,9 +1720,6 @@ static void ipoib_remove_one(struct ib_device *device)
struct ipoib_dev_priv *priv, *tmp;
struct list_head *dev_list;

- if (rdma_node_get_transport(device->node_type) != RDMA_TRANSPORT_IB)
- return;
-
dev_list = ib_get_client_data(device, &ipoib_client);
if (!dev_list)
return;
--
2.1.0

2015-04-24 12:24:20

by Michael Wang

Subject: [PATCH v6 08/26] IB/Verbs: Reform IB-ulp xprtrdma

Use raw management helpers to reform IB-ulp xprtrdma.

Cc: Hal Rosenstock <[email protected]>
Cc: Steve Wise <[email protected]>
Cc: Tom Talpey <[email protected]>
Cc: Jason Gunthorpe <[email protected]>
Cc: Doug Ledford <[email protected]>
Cc: Ira Weiny <[email protected]>
Cc: Sean Hefty <[email protected]>
Signed-off-by: Michael Wang <[email protected]>
---
net/sunrpc/xprtrdma/svc_rdma_recvfrom.c | 3 +--
net/sunrpc/xprtrdma/svc_rdma_transport.c | 45 +++++++++++++-------------------
2 files changed, 19 insertions(+), 29 deletions(-)

diff --git a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
index f9f13a3..a5bed5b 100644
--- a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
+++ b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
@@ -117,8 +117,7 @@ static void rdma_build_arg_xdr(struct svc_rqst *rqstp,

static int rdma_read_max_sge(struct svcxprt_rdma *xprt, int sge_count)
{
- if (rdma_node_get_transport(xprt->sc_cm_id->device->node_type) ==
- RDMA_TRANSPORT_IWARP)
+ if (rdma_tech_iwarp(xprt->sc_cm_id->device, xprt->sc_cm_id->port_num))
return 1;
else
return min_t(int, sge_count, xprt->sc_max_sge);
diff --git a/net/sunrpc/xprtrdma/svc_rdma_transport.c b/net/sunrpc/xprtrdma/svc_rdma_transport.c
index f609c1c..a09b7a1 100644
--- a/net/sunrpc/xprtrdma/svc_rdma_transport.c
+++ b/net/sunrpc/xprtrdma/svc_rdma_transport.c
@@ -851,7 +851,7 @@ static struct svc_xprt *svc_rdma_accept(struct svc_xprt *xprt)
struct ib_qp_init_attr qp_attr;
struct ib_device_attr devattr;
int uninitialized_var(dma_mr_acc);
- int need_dma_mr;
+ int need_dma_mr = 0;
int ret;
int i;

@@ -985,35 +985,26 @@ static struct svc_xprt *svc_rdma_accept(struct svc_xprt *xprt)
/*
* Determine if a DMA MR is required and if so, what privs are required
*/
- switch (rdma_node_get_transport(newxprt->sc_cm_id->device->node_type)) {
- case RDMA_TRANSPORT_IWARP:
- newxprt->sc_dev_caps |= SVCRDMA_DEVCAP_READ_W_INV;
- if (!(newxprt->sc_dev_caps & SVCRDMA_DEVCAP_FAST_REG)) {
- need_dma_mr = 1;
- dma_mr_acc =
- (IB_ACCESS_LOCAL_WRITE |
- IB_ACCESS_REMOTE_WRITE);
- } else if (!(devattr.device_cap_flags & IB_DEVICE_LOCAL_DMA_LKEY)) {
- need_dma_mr = 1;
- dma_mr_acc = IB_ACCESS_LOCAL_WRITE;
- } else
- need_dma_mr = 0;
- break;
- case RDMA_TRANSPORT_IB:
- if (!(newxprt->sc_dev_caps & SVCRDMA_DEVCAP_FAST_REG)) {
- need_dma_mr = 1;
- dma_mr_acc = IB_ACCESS_LOCAL_WRITE;
- } else if (!(devattr.device_cap_flags &
- IB_DEVICE_LOCAL_DMA_LKEY)) {
- need_dma_mr = 1;
- dma_mr_acc = IB_ACCESS_LOCAL_WRITE;
- } else
- need_dma_mr = 0;
- break;
- default:
+ if (!rdma_tech_iwarp(newxprt->sc_cm_id->device,
+ newxprt->sc_cm_id->port_num) &&
+ !rdma_ib_or_iboe(newxprt->sc_cm_id->device,
+ newxprt->sc_cm_id->port_num))
goto errout;
+
+ if (!(newxprt->sc_dev_caps & SVCRDMA_DEVCAP_FAST_REG) ||
+ !(devattr.device_cap_flags & IB_DEVICE_LOCAL_DMA_LKEY)) {
+ need_dma_mr = 1;
+ dma_mr_acc = IB_ACCESS_LOCAL_WRITE;
+ if (rdma_tech_iwarp(newxprt->sc_cm_id->device,
+ newxprt->sc_cm_id->port_num) &&
+ !(newxprt->sc_dev_caps & SVCRDMA_DEVCAP_FAST_REG))
+ dma_mr_acc |= IB_ACCESS_REMOTE_WRITE;
}

+ if (rdma_tech_iwarp(newxprt->sc_cm_id->device,
+ newxprt->sc_cm_id->port_num))
+ newxprt->sc_dev_caps |= SVCRDMA_DEVCAP_READ_W_INV;
+
/* Create the DMA MR if needed, otherwise, use the DMA LKEY */
if (need_dma_mr) {
/* Register all of physical memory */
--
2.1.0
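
The svc_rdma_accept() hunk above folds a two-case switch into a single condition. A quick way to convince oneself that the DMA MR decision is unchanged is to model both versions in user space and compare them over every input. The sketch below does that; struct mr_decision, decide_old(), decide_new() and count_mismatches() are hypothetical names, the ACC_* values are placeholders rather than the real IB_ACCESS_* constants, and the access mask is modelled as 0 when no DMA MR is needed (the kernel code leaves it unset in that case).

```c
#include <stdbool.h>

/* Placeholder bits; not the real IB_ACCESS_* values. */
#define ACC_LOCAL_WRITE		0x1u
#define ACC_REMOTE_WRITE	0x2u

struct mr_decision {
	bool need_dma_mr;
	unsigned int acc;
};

/* Model of the removed switch on the transport type. */
struct mr_decision decide_old(bool iwarp, bool fast_reg, bool local_dma_lkey)
{
	struct mr_decision d = { false, 0 };

	if (iwarp) {
		if (!fast_reg) {
			d.need_dma_mr = true;
			d.acc = ACC_LOCAL_WRITE | ACC_REMOTE_WRITE;
		} else if (!local_dma_lkey) {
			d.need_dma_mr = true;
			d.acc = ACC_LOCAL_WRITE;
		}
	} else {
		if (!fast_reg || !local_dma_lkey) {
			d.need_dma_mr = true;
			d.acc = ACC_LOCAL_WRITE;
		}
	}
	return d;
}

/* Model of the condition that replaces the switch. */
struct mr_decision decide_new(bool iwarp, bool fast_reg, bool local_dma_lkey)
{
	struct mr_decision d = { false, 0 };

	if (!fast_reg || !local_dma_lkey) {
		d.need_dma_mr = true;
		d.acc = ACC_LOCAL_WRITE;
		if (iwarp && !fast_reg)
			d.acc |= ACC_REMOTE_WRITE;
	}
	return d;
}

/* Compare both models over all eight input combinations. */
int count_mismatches(void)
{
	int mismatches = 0;
	int bits;

	for (bits = 0; bits < 8; bits++) {
		bool iwarp = bits & 1;
		bool fast_reg = bits & 2;
		bool lkey = bits & 4;
		struct mr_decision a = decide_old(iwarp, fast_reg, lkey);
		struct mr_decision b = decide_new(iwarp, fast_reg, lkey);

		if (a.need_dma_mr != b.need_dma_mr || a.acc != b.acc)
			mismatches++;
	}
	return mismatches;
}
```

The model only covers ports that are iWARP or IB/IBoE; anything else takes the errout path in the patch before this logic is reached.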

2015-04-24 12:32:42

by Michael Wang

Subject: [PATCH v6 09/26] IB/Verbs: Reform IB-core verbs

Use raw management helpers to reform IB-core verbs.

Cc: Hal Rosenstock <[email protected]>
Cc: Steve Wise <[email protected]>
Cc: Tom Talpey <[email protected]>
Cc: Jason Gunthorpe <[email protected]>
Cc: Doug Ledford <[email protected]>
Cc: Ira Weiny <[email protected]>
Cc: Sean Hefty <[email protected]>
Signed-off-by: Michael Wang <[email protected]>
---
drivers/infiniband/core/verbs.c | 6 ++----
1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/drivers/infiniband/core/verbs.c b/drivers/infiniband/core/verbs.c
index 626c9cf..7264860 100644
--- a/drivers/infiniband/core/verbs.c
+++ b/drivers/infiniband/core/verbs.c
@@ -200,11 +200,9 @@ int ib_init_ah_from_wc(struct ib_device *device, u8 port_num, struct ib_wc *wc,
u32 flow_class;
u16 gid_index;
int ret;
- int is_eth = (rdma_port_get_link_layer(device, port_num) ==
- IB_LINK_LAYER_ETHERNET);

memset(ah_attr, 0, sizeof *ah_attr);
- if (is_eth) {
+ if (rdma_tech_iboe(device, port_num)) {
if (!(wc->wc_flags & IB_WC_GRH))
return -EPROTOTYPE;

@@ -873,7 +871,7 @@ int ib_resolve_eth_l2_attrs(struct ib_qp *qp,
union ib_gid sgid;

if ((*qp_attr_mask & IB_QP_AV) &&
- (rdma_port_get_link_layer(qp->device, qp_attr->ah_attr.port_num) == IB_LINK_LAYER_ETHERNET)) {
+ (rdma_tech_iboe(qp->device, qp_attr->ah_attr.port_num))) {
ret = ib_query_gid(qp->device, qp_attr->ah_attr.port_num,
qp_attr->ah_attr.grh.sgid_index, &sgid);
if (ret)
--
2.1.0

2015-04-24 12:32:16

by Michael Wang

Subject: [PATCH v6 10/26] IB/Verbs: Reform cm related part in IB-core cma/ucm

Use raw management helpers to reform the cm-related part of IB-core cma/ucm.

A few checks depend on the cm type of the device rather than on the
capability of a port. Passing port 1 directly works for now, but it will
not support devices that mix cm types in the future.

Cc: Hal Rosenstock <[email protected]>
Cc: Steve Wise <[email protected]>
Cc: Tom Talpey <[email protected]>
Cc: Jason Gunthorpe <[email protected]>
Cc: Doug Ledford <[email protected]>
Cc: Ira Weiny <[email protected]>
Cc: Sean Hefty <[email protected]>
Signed-off-by: Michael Wang <[email protected]>
---
drivers/infiniband/core/cma.c | 81 +++++++++++++------------------------------
drivers/infiniband/core/ucm.c | 3 +-
2 files changed, 26 insertions(+), 58 deletions(-)

diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
index d570030..815e41b 100644
--- a/drivers/infiniband/core/cma.c
+++ b/drivers/infiniband/core/cma.c
@@ -735,8 +735,7 @@ int rdma_init_qp_attr(struct rdma_cm_id *id, struct ib_qp_attr *qp_attr,
int ret = 0;

id_priv = container_of(id, struct rdma_id_private, id);
- switch (rdma_node_get_transport(id_priv->id.device->node_type)) {
- case RDMA_TRANSPORT_IB:
+ if (rdma_ib_or_iboe(id->device, id->port_num)) {
if (!id_priv->cm_id.ib || (id_priv->id.qp_type == IB_QPT_UD))
ret = cma_ib_init_qp_attr(id_priv, qp_attr, qp_attr_mask);
else
@@ -745,19 +744,15 @@ int rdma_init_qp_attr(struct rdma_cm_id *id, struct ib_qp_attr *qp_attr,

if (qp_attr->qp_state == IB_QPS_RTR)
qp_attr->rq_psn = id_priv->seq_num;
- break;
- case RDMA_TRANSPORT_IWARP:
+ } else if (rdma_tech_iwarp(id->device, id->port_num)) {
if (!id_priv->cm_id.iw) {
qp_attr->qp_access_flags = 0;
*qp_attr_mask = IB_QP_STATE | IB_QP_ACCESS_FLAGS;
} else
ret = iw_cm_init_qp_attr(id_priv->cm_id.iw, qp_attr,
qp_attr_mask);
- break;
- default:
+ } else
ret = -ENOSYS;
- break;
- }

return ret;
}
@@ -1037,17 +1032,12 @@ void rdma_destroy_id(struct rdma_cm_id *id)
mutex_unlock(&id_priv->handler_mutex);

if (id_priv->cma_dev) {
- switch (rdma_node_get_transport(id_priv->id.device->node_type)) {
- case RDMA_TRANSPORT_IB:
+ if (rdma_ib_or_iboe(id_priv->id.device, 1)) {
if (id_priv->cm_id.ib)
ib_destroy_cm_id(id_priv->cm_id.ib);
- break;
- case RDMA_TRANSPORT_IWARP:
+ } else if (rdma_tech_iwarp(id_priv->id.device, 1)) {
if (id_priv->cm_id.iw)
iw_destroy_cm_id(id_priv->cm_id.iw);
- break;
- default:
- break;
}
cma_leave_mc_groups(id_priv);
cma_release_dev(id_priv);
@@ -1626,7 +1616,7 @@ static void cma_listen_on_dev(struct rdma_id_private *id_priv,
int ret;

if (cma_family(id_priv) == AF_IB &&
- rdma_node_get_transport(cma_dev->device->node_type) != RDMA_TRANSPORT_IB)
+ !rdma_ib_or_iboe(cma_dev->device, 1))
return;

id = rdma_create_id(cma_listen_handler, id_priv, id_priv->id.ps,
@@ -2028,7 +2018,7 @@ static int cma_bind_loopback(struct rdma_id_private *id_priv)
mutex_lock(&lock);
list_for_each_entry(cur_dev, &dev_list, list) {
if (cma_family(id_priv) == AF_IB &&
- rdma_node_get_transport(cur_dev->device->node_type) != RDMA_TRANSPORT_IB)
+ !rdma_ib_or_iboe(cur_dev->device, 1))
continue;

if (!cma_dev)
@@ -2060,7 +2050,7 @@ port_found:
goto out;

id_priv->id.route.addr.dev_addr.dev_type =
- (rdma_port_get_link_layer(cma_dev->device, p) == IB_LINK_LAYER_INFINIBAND) ?
+ (rdma_tech_ib(cma_dev->device, p)) ?
ARPHRD_INFINIBAND : ARPHRD_ETHER;

rdma_addr_set_sgid(&id_priv->id.route.addr.dev_addr, &gid);
@@ -2537,18 +2527,15 @@ int rdma_listen(struct rdma_cm_id *id, int backlog)

id_priv->backlog = backlog;
if (id->device) {
- switch (rdma_node_get_transport(id->device->node_type)) {
- case RDMA_TRANSPORT_IB:
+ if (rdma_ib_or_iboe(id->device, 1)) {
ret = cma_ib_listen(id_priv);
if (ret)
goto err;
- break;
- case RDMA_TRANSPORT_IWARP:
+ } else if (rdma_tech_iwarp(id->device, 1)) {
ret = cma_iw_listen(id_priv, backlog);
if (ret)
goto err;
- break;
- default:
+ } else {
ret = -ENOSYS;
goto err;
}
@@ -2884,20 +2871,15 @@ int rdma_connect(struct rdma_cm_id *id, struct rdma_conn_param *conn_param)
id_priv->srq = conn_param->srq;
}

- switch (rdma_node_get_transport(id->device->node_type)) {
- case RDMA_TRANSPORT_IB:
+ if (rdma_ib_or_iboe(id->device, id->port_num)) {
if (id->qp_type == IB_QPT_UD)
ret = cma_resolve_ib_udp(id_priv, conn_param);
else
ret = cma_connect_ib(id_priv, conn_param);
- break;
- case RDMA_TRANSPORT_IWARP:
+ } else if (rdma_tech_iwarp(id->device, id->port_num))
ret = cma_connect_iw(id_priv, conn_param);
- break;
- default:
+ else
ret = -ENOSYS;
- break;
- }
if (ret)
goto err;

@@ -3000,8 +2982,7 @@ int rdma_accept(struct rdma_cm_id *id, struct rdma_conn_param *conn_param)
id_priv->srq = conn_param->srq;
}

- switch (rdma_node_get_transport(id->device->node_type)) {
- case RDMA_TRANSPORT_IB:
+ if (rdma_ib_or_iboe(id->device, id->port_num)) {
if (id->qp_type == IB_QPT_UD) {
if (conn_param)
ret = cma_send_sidr_rep(id_priv, IB_SIDR_SUCCESS,
@@ -3017,14 +2998,10 @@ int rdma_accept(struct rdma_cm_id *id, struct rdma_conn_param *conn_param)
else
ret = cma_rep_recv(id_priv);
}
- break;
- case RDMA_TRANSPORT_IWARP:
+ } else if (rdma_tech_iwarp(id->device, id->port_num))
ret = cma_accept_iw(id_priv, conn_param);
- break;
- default:
+ else
ret = -ENOSYS;
- break;
- }

if (ret)
goto reject;
@@ -3068,8 +3045,7 @@ int rdma_reject(struct rdma_cm_id *id, const void *private_data,
if (!id_priv->cm_id.ib)
return -EINVAL;

- switch (rdma_node_get_transport(id->device->node_type)) {
- case RDMA_TRANSPORT_IB:
+ if (rdma_ib_or_iboe(id->device, id->port_num)) {
if (id->qp_type == IB_QPT_UD)
ret = cma_send_sidr_rep(id_priv, IB_SIDR_REJECT, 0,
private_data, private_data_len);
@@ -3077,15 +3053,12 @@ int rdma_reject(struct rdma_cm_id *id, const void *private_data,
ret = ib_send_cm_rej(id_priv->cm_id.ib,
IB_CM_REJ_CONSUMER_DEFINED, NULL,
0, private_data, private_data_len);
- break;
- case RDMA_TRANSPORT_IWARP:
+ } else if (rdma_tech_iwarp(id->device, id->port_num)) {
ret = iw_cm_reject(id_priv->cm_id.iw,
private_data, private_data_len);
- break;
- default:
+ } else
ret = -ENOSYS;
- break;
- }
+
return ret;
}
EXPORT_SYMBOL(rdma_reject);
@@ -3099,22 +3072,18 @@ int rdma_disconnect(struct rdma_cm_id *id)
if (!id_priv->cm_id.ib)
return -EINVAL;

- switch (rdma_node_get_transport(id->device->node_type)) {
- case RDMA_TRANSPORT_IB:
+ if (rdma_ib_or_iboe(id->device, id->port_num)) {
ret = cma_modify_qp_err(id_priv);
if (ret)
goto out;
/* Initiate or respond to a disconnect. */
if (ib_send_cm_dreq(id_priv->cm_id.ib, NULL, 0))
ib_send_cm_drep(id_priv->cm_id.ib, NULL, 0);
- break;
- case RDMA_TRANSPORT_IWARP:
+ } else if (rdma_tech_iwarp(id->device, id->port_num)) {
ret = iw_cm_disconnect(id_priv->cm_id.iw, 0);
- break;
- default:
+ } else
ret = -EINVAL;
- break;
- }
+
out:
return ret;
}
diff --git a/drivers/infiniband/core/ucm.c b/drivers/infiniband/core/ucm.c
index f2f6393..70e0ccb 100644
--- a/drivers/infiniband/core/ucm.c
+++ b/drivers/infiniband/core/ucm.c
@@ -1253,8 +1253,7 @@ static void ib_ucm_add_one(struct ib_device *device)
dev_t base;
struct ib_ucm_device *ucm_dev;

- if (!device->alloc_ucontext ||
- rdma_node_get_transport(device->node_type) != RDMA_TRANSPORT_IB)
+ if (!device->alloc_ucontext || !rdma_ib_or_iboe(device, 1))
return;

ucm_dev = kzalloc(sizeof *ucm_dev, GFP_KERNEL);
--
2.1.0
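
The caveat in the commit message about passing port 1 can be illustrated with a small user-space model. This is a sketch under assumptions: enum port_tech, enum cm_kind, cm_for_port(), cm_for_device() and shortcut_mismatches() are hypothetical names, and a device that mixes IB and iWARP ports is a made-up case used only to show where the shortcut would go wrong.

```c
#include <assert.h>
#include <stddef.h>

enum port_tech { TECH_IB, TECH_IBOE, TECH_IWARP, TECH_OTHER };
enum cm_kind { CM_NONE, CM_IB, CM_IW };

/* Per-port answer: which CM would serve an id bound to this port. */
enum cm_kind cm_for_port(const enum port_tech *tech, size_t port)
{
	assert(port >= 1);
	switch (tech[port - 1]) {
	case TECH_IB:
	case TECH_IBOE:
		return CM_IB;
	case TECH_IWARP:
		return CM_IW;
	default:
		return CM_NONE;
	}
}

/* Device-level shortcut: ask port 1 and assume it speaks for all ports. */
enum cm_kind cm_for_device(const enum port_tech *tech)
{
	return cm_for_port(tech, 1);
}

/* Number of ports for which the shortcut picks a different CM. */
size_t shortcut_mismatches(const enum port_tech *tech, size_t nports)
{
	size_t wrong = 0;
	size_t p;

	for (p = 1; p <= nports; p++) {
		if (cm_for_port(tech, p) != cm_for_device(tech))
			wrong++;
	}
	return wrong;
}
```

For an IB/IBoE device such as the mlx4 entry in the mapping list, both ports use the IB CM and the shortcut is harmless; for the hypothetical IB plus iWARP device, the second port would be handed to the wrong CM.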

2015-04-24 12:24:24

by Michael Wang

Subject: [PATCH v6 11/26] IB/Verbs: Reform route related part in IB-core cma

Use raw management helpers to reform the route-related part of IB-core cma.

Cc: Hal Rosenstock <[email protected]>
Cc: Steve Wise <[email protected]>
Cc: Tom Talpey <[email protected]>
Cc: Jason Gunthorpe <[email protected]>
Cc: Doug Ledford <[email protected]>
Cc: Ira Weiny <[email protected]>
Cc: Sean Hefty <[email protected]>
Signed-off-by: Michael Wang <[email protected]>
---
drivers/infiniband/core/cma.c | 31 ++++++++-----------------------
drivers/infiniband/core/ucma.c | 25 ++++++-------------------
2 files changed, 14 insertions(+), 42 deletions(-)

diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
index 815e41b..fa69f34 100644
--- a/drivers/infiniband/core/cma.c
+++ b/drivers/infiniband/core/cma.c
@@ -923,13 +923,9 @@ static inline int cma_user_data_offset(struct rdma_id_private *id_priv)

static void cma_cancel_route(struct rdma_id_private *id_priv)
{
- switch (rdma_port_get_link_layer(id_priv->id.device, id_priv->id.port_num)) {
- case IB_LINK_LAYER_INFINIBAND:
+ if (rdma_tech_ib(id_priv->id.device, id_priv->id.port_num)) {
if (id_priv->query)
ib_sa_cancel_query(id_priv->query_id, id_priv->query);
- break;
- default:
- break;
}
}

@@ -1957,26 +1953,15 @@ int rdma_resolve_route(struct rdma_cm_id *id, int timeout_ms)
return -EINVAL;

atomic_inc(&id_priv->refcount);
- switch (rdma_node_get_transport(id->device->node_type)) {
- case RDMA_TRANSPORT_IB:
- switch (rdma_port_get_link_layer(id->device, id->port_num)) {
- case IB_LINK_LAYER_INFINIBAND:
- ret = cma_resolve_ib_route(id_priv, timeout_ms);
- break;
- case IB_LINK_LAYER_ETHERNET:
- ret = cma_resolve_iboe_route(id_priv);
- break;
- default:
- ret = -ENOSYS;
- }
- break;
- case RDMA_TRANSPORT_IWARP:
+ if (rdma_tech_ib(id->device, id->port_num))
+ ret = cma_resolve_ib_route(id_priv, timeout_ms);
+ else if (rdma_tech_iboe(id->device, id->port_num))
+ ret = cma_resolve_iboe_route(id_priv);
+ else if (rdma_tech_iwarp(id->device, id->port_num))
ret = cma_resolve_iw_route(id_priv, timeout_ms);
- break;
- default:
+ else
ret = -ENOSYS;
- break;
- }
+
if (ret)
goto err;

diff --git a/drivers/infiniband/core/ucma.c b/drivers/infiniband/core/ucma.c
index 45d67e9..7331c6c 100644
--- a/drivers/infiniband/core/ucma.c
+++ b/drivers/infiniband/core/ucma.c
@@ -722,26 +722,13 @@ static ssize_t ucma_query_route(struct ucma_file *file,

resp.node_guid = (__force __u64) ctx->cm_id->device->node_guid;
resp.port_num = ctx->cm_id->port_num;
- switch (rdma_node_get_transport(ctx->cm_id->device->node_type)) {
- case RDMA_TRANSPORT_IB:
- switch (rdma_port_get_link_layer(ctx->cm_id->device,
- ctx->cm_id->port_num)) {
- case IB_LINK_LAYER_INFINIBAND:
- ucma_copy_ib_route(&resp, &ctx->cm_id->route);
- break;
- case IB_LINK_LAYER_ETHERNET:
- ucma_copy_iboe_route(&resp, &ctx->cm_id->route);
- break;
- default:
- break;
- }
- break;
- case RDMA_TRANSPORT_IWARP:
+
+ if (rdma_tech_ib(ctx->cm_id->device, ctx->cm_id->port_num))
+ ucma_copy_ib_route(&resp, &ctx->cm_id->route);
+ else if (rdma_tech_iboe(ctx->cm_id->device, ctx->cm_id->port_num))
+ ucma_copy_iboe_route(&resp, &ctx->cm_id->route);
+ else if (rdma_tech_iwarp(ctx->cm_id->device, ctx->cm_id->port_num))
ucma_copy_iw_route(&resp, &ctx->cm_id->route);
- break;
- default:
- break;
- }

out:
if (copy_to_user((void __user *)(unsigned long)cmd.response,
--
2.1.0
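
The rdma_resolve_route() hunk replaces a nested switch on (transport, link layer) with a flat chain of per-port checks. The sketch below models both forms in user space and checks that they dispatch identically when the pair is mapped the way the cover letter's mapping list describes (IB over Ethernet becomes IBoE). All names here (enum old_transport, enum link_layer, enum port_tech, tech_from_legacy(), resolve_old(), resolve_new(), dispatch_mismatches()) are hypothetical, and the ROUTE_* values merely label which resolver would run.

```c
#include <errno.h>

enum old_transport { OLD_TRANSPORT_IB, OLD_TRANSPORT_IWARP, OLD_TRANSPORT_OTHER };
enum link_layer { LL_INFINIBAND, LL_ETHERNET, LL_UNSPECIFIED };
enum port_tech { TECH_IB, TECH_IBOE, TECH_IWARP, TECH_OTHER };
enum route_handler { ROUTE_IB = 1, ROUTE_IBOE, ROUTE_IW };

/* Old style: outer switch on transport, inner switch on link layer. */
int resolve_old(enum old_transport t, enum link_layer ll)
{
	switch (t) {
	case OLD_TRANSPORT_IB:
		switch (ll) {
		case LL_INFINIBAND:
			return ROUTE_IB;
		case LL_ETHERNET:
			return ROUTE_IBOE;
		default:
			return -ENOSYS;
		}
	case OLD_TRANSPORT_IWARP:
		return ROUTE_IW;
	default:
		return -ENOSYS;
	}
}

/* Mapping from the legacy pair to a single per-port technology. */
enum port_tech tech_from_legacy(enum old_transport t, enum link_layer ll)
{
	if (t == OLD_TRANSPORT_IWARP)
		return TECH_IWARP;
	if (t == OLD_TRANSPORT_IB && ll == LL_INFINIBAND)
		return TECH_IB;
	if (t == OLD_TRANSPORT_IB && ll == LL_ETHERNET)
		return TECH_IBOE;
	return TECH_OTHER;
}

/* New style: one flat chain keyed on the per-port technology. */
int resolve_new(enum port_tech tech)
{
	if (tech == TECH_IB)
		return ROUTE_IB;
	else if (tech == TECH_IBOE)
		return ROUTE_IBOE;
	else if (tech == TECH_IWARP)
		return ROUTE_IW;
	else
		return -ENOSYS;
}

/* Compare both styles over every (transport, link layer) pair. */
int dispatch_mismatches(void)
{
	int mismatches = 0;
	int t, ll;

	for (t = OLD_TRANSPORT_IB; t <= OLD_TRANSPORT_OTHER; t++) {
		for (ll = LL_INFINIBAND; ll <= LL_UNSPECIFIED; ll++) {
			enum old_transport ot = (enum old_transport)t;
			enum link_layer l = (enum link_layer)ll;

			if (resolve_old(ot, l) != resolve_new(tech_from_legacy(ot, l)))
				mismatches++;
		}
	}
	return mismatches;
}
```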

2015-04-24 12:31:59

by Michael Wang

Subject: [PATCH v6 12/26] IB/Verbs: Reform mcast related part in IB-core cma

Use raw management helpers to reform the mcast-related part of IB-core cma.

Cc: Hal Rosenstock <[email protected]>
Cc: Steve Wise <[email protected]>
Cc: Tom Talpey <[email protected]>
Cc: Jason Gunthorpe <[email protected]>
Cc: Doug Ledford <[email protected]>
Cc: Ira Weiny <[email protected]>
Cc: Sean Hefty <[email protected]>
Signed-off-by: Michael Wang <[email protected]>
---
drivers/infiniband/core/cma.c | 56 ++++++++++++++-----------------------------
1 file changed, 18 insertions(+), 38 deletions(-)

diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
index fa69f34..a89c246 100644
--- a/drivers/infiniband/core/cma.c
+++ b/drivers/infiniband/core/cma.c
@@ -997,17 +997,12 @@ static void cma_leave_mc_groups(struct rdma_id_private *id_priv)
mc = container_of(id_priv->mc_list.next,
struct cma_multicast, list);
list_del(&mc->list);
- switch (rdma_port_get_link_layer(id_priv->cma_dev->device, id_priv->id.port_num)) {
- case IB_LINK_LAYER_INFINIBAND:
+ if (rdma_tech_ib(id_priv->cma_dev->device,
+ id_priv->id.port_num)) {
ib_sa_free_multicast(mc->multicast.ib);
kfree(mc);
- break;
- case IB_LINK_LAYER_ETHERNET:
+ } else
kref_put(&mc->mcref, release_mc);
- break;
- default:
- break;
- }
}
}

@@ -3314,24 +3309,13 @@ int rdma_join_multicast(struct rdma_cm_id *id, struct sockaddr *addr,
list_add(&mc->list, &id_priv->mc_list);
spin_unlock(&id_priv->lock);

- switch (rdma_node_get_transport(id->device->node_type)) {
- case RDMA_TRANSPORT_IB:
- switch (rdma_port_get_link_layer(id->device, id->port_num)) {
- case IB_LINK_LAYER_INFINIBAND:
- ret = cma_join_ib_multicast(id_priv, mc);
- break;
- case IB_LINK_LAYER_ETHERNET:
- kref_init(&mc->mcref);
- ret = cma_iboe_join_multicast(id_priv, mc);
- break;
- default:
- ret = -EINVAL;
- }
- break;
- default:
+ if (rdma_tech_iboe(id->device, id->port_num)) {
+ kref_init(&mc->mcref);
+ ret = cma_iboe_join_multicast(id_priv, mc);
+ } else if (rdma_tech_ib(id->device, id->port_num))
+ ret = cma_join_ib_multicast(id_priv, mc);
+ else
ret = -ENOSYS;
- break;
- }

if (ret) {
spin_lock_irq(&id_priv->lock);
@@ -3359,19 +3343,15 @@ void rdma_leave_multicast(struct rdma_cm_id *id, struct sockaddr *addr)
ib_detach_mcast(id->qp,
&mc->multicast.ib->rec.mgid,
be16_to_cpu(mc->multicast.ib->rec.mlid));
- if (rdma_node_get_transport(id_priv->cma_dev->device->node_type) == RDMA_TRANSPORT_IB) {
- switch (rdma_port_get_link_layer(id->device, id->port_num)) {
- case IB_LINK_LAYER_INFINIBAND:
- ib_sa_free_multicast(mc->multicast.ib);
- kfree(mc);
- break;
- case IB_LINK_LAYER_ETHERNET:
- kref_put(&mc->mcref, release_mc);
- break;
- default:
- break;
- }
- }
+
+ BUG_ON(id_priv->cma_dev->device != id->device);
+
+ if (rdma_tech_ib(id->device, id->port_num)) {
+ ib_sa_free_multicast(mc->multicast.ib);
+ kfree(mc);
+ } else if (rdma_tech_iboe(id->device, id->port_num))
+ kref_put(&mc->mcref, release_mc);
+
return;
}
}
--
2.1.0

2015-04-24 12:31:27

by Michael Wang

Subject: [PATCH v6 13/26] IB/Verbs: Reserve legacy transport type in 'dev_addr'

Keep the legacy transport type in the 'transport' member
of 'struct rdma_dev_addr' until we are sure it is no
longer needed.

Cc: Hal Rosenstock <[email protected]>
Cc: Steve Wise <[email protected]>
Cc: Tom Talpey <[email protected]>
Cc: Jason Gunthorpe <[email protected]>
Cc: Doug Ledford <[email protected]>
Cc: Ira Weiny <[email protected]>
Cc: Sean Hefty <[email protected]>
Signed-off-by: Michael Wang <[email protected]>
---
drivers/infiniband/core/cma.c | 25 +++++++++++++++++++++++--
1 file changed, 23 insertions(+), 2 deletions(-)

diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
index a89c246..e47284e 100644
--- a/drivers/infiniband/core/cma.c
+++ b/drivers/infiniband/core/cma.c
@@ -244,14 +244,35 @@ static inline void cma_set_ip_ver(struct cma_hdr *hdr, u8 ip_ver)
hdr->ip_version = (ip_ver << 4) | (hdr->ip_version & 0xF);
}

+static inline void cma_set_legacy_transport(struct rdma_cm_id *id)
+{
+ switch (id->device->node_type) {
+ case RDMA_NODE_IB_CA:
+ case RDMA_NODE_IB_SWITCH:
+ case RDMA_NODE_IB_ROUTER:
+ id->route.addr.dev_addr.transport = RDMA_TRANSPORT_IB;
+ break;
+ case RDMA_NODE_RNIC:
+ id->route.addr.dev_addr.transport = RDMA_TRANSPORT_IWARP;
+ break;
+ case RDMA_NODE_USNIC:
+ id->route.addr.dev_addr.transport = RDMA_TRANSPORT_USNIC;
+ break;
+ case RDMA_NODE_USNIC_UDP:
+ id->route.addr.dev_addr.transport = RDMA_TRANSPORT_USNIC_UDP;
+ break;
+ default:
+ BUG();
+ }
+}
+
static void cma_attach_to_dev(struct rdma_id_private *id_priv,
struct cma_device *cma_dev)
{
atomic_inc(&cma_dev->refcount);
id_priv->cma_dev = cma_dev;
id_priv->id.device = cma_dev->device;
- id_priv->id.route.addr.dev_addr.transport =
- rdma_node_get_transport(cma_dev->device->node_type);
+ cma_set_legacy_transport(&id_priv->id);
list_add_tail(&id_priv->list, &cma_dev->id_list);
}

--
2.1.0

2015-04-24 12:31:05

by Michael Wang

Subject: [PATCH v6 14/26] IB/Verbs: Reform cma_acquire_dev()

Reform cma_acquire_dev() with the management helpers and introduce
cma_validate_port() to make the code cleaner.

Cc: Hal Rosenstock <[email protected]>
Cc: Steve Wise <[email protected]>
Cc: Tom Talpey <[email protected]>
Cc: Jason Gunthorpe <[email protected]>
Cc: Doug Ledford <[email protected]>
Cc: Ira Weiny <[email protected]>
Cc: Sean Hefty <[email protected]>
Signed-off-by: Michael Wang <[email protected]>
---
drivers/infiniband/core/cma.c | 68 +++++++++++++++++++++++++------------------
1 file changed, 40 insertions(+), 28 deletions(-)

diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
index e47284e..b80495c 100644
--- a/drivers/infiniband/core/cma.c
+++ b/drivers/infiniband/core/cma.c
@@ -370,18 +370,35 @@ static int cma_translate_addr(struct sockaddr *addr, struct rdma_dev_addr *dev_a
return ret;
}

+static inline int cma_validate_port(struct ib_device *device, u8 port,
+ union ib_gid *gid, int dev_type)
+{
+ u8 found_port;
+ int ret = -ENODEV;
+
+ if ((dev_type == ARPHRD_INFINIBAND) && !rdma_tech_ib(device, port))
+ return ret;
+
+ if ((dev_type != ARPHRD_INFINIBAND) && rdma_tech_ib(device, port))
+ return ret;
+
+ ret = ib_find_cached_gid(device, gid, &found_port, NULL);
+ if (port != found_port)
+ return -ENODEV;
+
+ return ret;
+}
+
static int cma_acquire_dev(struct rdma_id_private *id_priv,
struct rdma_id_private *listen_id_priv)
{
struct rdma_dev_addr *dev_addr = &id_priv->id.route.addr.dev_addr;
struct cma_device *cma_dev;
- union ib_gid gid, iboe_gid;
+ union ib_gid gid, iboe_gid, *gidp;
int ret = -ENODEV;
- u8 port, found_port;
- enum rdma_link_layer dev_ll = dev_addr->dev_type == ARPHRD_INFINIBAND ?
- IB_LINK_LAYER_INFINIBAND : IB_LINK_LAYER_ETHERNET;
+ u8 port;

- if (dev_ll != IB_LINK_LAYER_INFINIBAND &&
+ if (dev_addr->dev_type != ARPHRD_INFINIBAND &&
id_priv->id.ps == RDMA_PS_IPOIB)
return -EINVAL;

@@ -391,41 +408,36 @@ static int cma_acquire_dev(struct rdma_id_private *id_priv,

memcpy(&gid, dev_addr->src_dev_addr +
rdma_addr_gid_offset(dev_addr), sizeof gid);
- if (listen_id_priv &&
- rdma_port_get_link_layer(listen_id_priv->id.device,
- listen_id_priv->id.port_num) == dev_ll) {
+
+ if (listen_id_priv) {
cma_dev = listen_id_priv->cma_dev;
port = listen_id_priv->id.port_num;
- if (rdma_node_get_transport(cma_dev->device->node_type) == RDMA_TRANSPORT_IB &&
- rdma_port_get_link_layer(cma_dev->device, port) == IB_LINK_LAYER_ETHERNET)
- ret = ib_find_cached_gid(cma_dev->device, &iboe_gid,
- &found_port, NULL);
- else
- ret = ib_find_cached_gid(cma_dev->device, &gid,
- &found_port, NULL);
+ gidp = rdma_tech_iboe(cma_dev->device, port) ?
+ &iboe_gid : &gid;

- if (!ret && (port == found_port)) {
- id_priv->id.port_num = found_port;
+ ret = cma_validate_port(cma_dev->device, port, gidp,
+ dev_addr->dev_type);
+ if (!ret) {
+ id_priv->id.port_num = port;
goto out;
}
}
+
list_for_each_entry(cma_dev, &dev_list, list) {
for (port = 1; port <= cma_dev->device->phys_port_cnt; ++port) {
if (listen_id_priv &&
listen_id_priv->cma_dev == cma_dev &&
listen_id_priv->id.port_num == port)
continue;
- if (rdma_port_get_link_layer(cma_dev->device, port) == dev_ll) {
- if (rdma_node_get_transport(cma_dev->device->node_type) == RDMA_TRANSPORT_IB &&
- rdma_port_get_link_layer(cma_dev->device, port) == IB_LINK_LAYER_ETHERNET)
- ret = ib_find_cached_gid(cma_dev->device, &iboe_gid, &found_port, NULL);
- else
- ret = ib_find_cached_gid(cma_dev->device, &gid, &found_port, NULL);
-
- if (!ret && (port == found_port)) {
- id_priv->id.port_num = found_port;
- goto out;
- }
+
+ gidp = rdma_tech_iboe(cma_dev->device, port) ?
+ &iboe_gid : &gid;
+
+ ret = cma_validate_port(cma_dev->device, port, gidp,
+ dev_addr->dev_type);
+ if (!ret) {
+ id_priv->id.port_num = port;
+ goto out;
}
}
}
--
2.1.0
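
The port validation that the hunk above factors out can be modelled in user space roughly as follows. This is only a sketch under stated assumptions: the model_* names, the toy device layout and the integer GIDs are hypothetical, and the initial value of ret (-ENODEV) is assumed because the start of the helper is not visible in this hunk.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel's ARPHRD_* constants. */
#define MODEL_ARPHRD_ETHER      1
#define MODEL_ARPHRD_INFINIBAND 32

#define MODEL_MAX_PORTS 4

enum model_tech { MODEL_TECH_IB, MODEL_TECH_IBOE, MODEL_TECH_IWARP };

/* Toy device: one technology and one cached GID per port. */
struct model_device {
	int port_cnt;
	enum model_tech tech[MODEL_MAX_PORTS];	/* index = port number - 1 */
	int gid[MODEL_MAX_PORTS];
};

/* Stand-in for ib_find_cached_gid(): report the first port holding gid. */
int model_find_cached_gid(const struct model_device *dev, int gid,
			  int *found_port)
{
	for (int p = 1; p <= dev->port_cnt; p++) {
		if (dev->gid[p - 1] == gid) {
			*found_port = p;
			return 0;
		}
	}
	return -ENOENT;
}

int model_validate_port(const struct model_device *dev, int port, int gid,
			int dev_type)
{
	int found_port = 0;
	int ret = -ENODEV;	/* assumed initial value, not shown in the hunk */
	bool port_is_ib;

	assert(port >= 1 && port <= dev->port_cnt);
	port_is_ib = (dev->tech[port - 1] == MODEL_TECH_IB);

	/* The netdev type and the port technology must agree. */
	if ((dev_type == MODEL_ARPHRD_INFINIBAND) != port_is_ib)
		return ret;

	/* The GID must be cached on exactly the port being validated. */
	ret = model_find_cached_gid(dev, gid, &found_port);
	if (port != found_port)
		return -ENODEV;

	return ret;
}
```

In this model both callers (the listen-id fast path and the device-list walk) only need to pick the GID to look up and test the return value for zero, which is what the reworked cma_acquire_dev() does.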

2015-04-24 12:24:30

by Michael Wang

Subject: [PATCH v6 15/26] IB/Verbs: Reform rest part in IB-core cma

Use the raw management helpers to reform the remaining parts of IB-core cma.

Cc: Hal Rosenstock <[email protected]>
Cc: Steve Wise <[email protected]>
Cc: Tom Talpey <[email protected]>
Cc: Jason Gunthorpe <[email protected]>
Cc: Doug Ledford <[email protected]>
Cc: Ira Weiny <[email protected]>
Cc: Sean Hefty <[email protected]>
Signed-off-by: Michael Wang <[email protected]>
---
drivers/infiniband/core/cma.c | 20 +++++++++-----------
1 file changed, 9 insertions(+), 11 deletions(-)

diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
index b80495c..a20566e 100644
--- a/drivers/infiniband/core/cma.c
+++ b/drivers/infiniband/core/cma.c
@@ -468,10 +468,10 @@ static int cma_resolve_ib_dev(struct rdma_id_private *id_priv)
pkey = ntohs(addr->sib_pkey);

list_for_each_entry(cur_dev, &dev_list, list) {
- if (rdma_node_get_transport(cur_dev->device->node_type) != RDMA_TRANSPORT_IB)
- continue;
-
for (p = 1; p <= cur_dev->device->phys_port_cnt; ++p) {
+ if (!rdma_ib_or_iboe(cur_dev->device, p))
+ continue;
+
if (ib_find_cached_pkey(cur_dev->device, p, pkey, &index))
continue;

@@ -666,10 +666,9 @@ static int cma_modify_qp_rtr(struct rdma_id_private *id_priv,
if (ret)
goto out;

- if (rdma_node_get_transport(id_priv->cma_dev->device->node_type)
- == RDMA_TRANSPORT_IB &&
- rdma_port_get_link_layer(id_priv->id.device, id_priv->id.port_num)
- == IB_LINK_LAYER_ETHERNET) {
+ BUG_ON(id_priv->cma_dev->device != id_priv->id.device);
+
+ if (rdma_tech_iboe(id_priv->id.device, id_priv->id.port_num)) {
ret = rdma_addr_find_smac_by_sgid(&sgid, qp_attr.smac, NULL);

if (ret)
@@ -733,11 +732,10 @@ static int cma_ib_init_qp_attr(struct rdma_id_private *id_priv,
int ret;
u16 pkey;

- if (rdma_port_get_link_layer(id_priv->id.device, id_priv->id.port_num) ==
- IB_LINK_LAYER_INFINIBAND)
- pkey = ib_addr_get_pkey(dev_addr);
- else
+ if (rdma_tech_iboe(id_priv->id.device, id_priv->id.port_num))
pkey = 0xffff;
+ else
+ pkey = ib_addr_get_pkey(dev_addr);

ret = ib_find_cached_pkey(id_priv->id.device, id_priv->id.port_num,
pkey, &qp_attr->pkey_index);
--
2.1.0

2015-04-24 12:24:34

by Michael Wang

Subject: [PATCH v6 16/26] IB/Verbs: Use management helper cap_ib_mad()

Introduce the helper cap_ib_mad() to check whether a port of an
IB device supports Infiniband Management Datagrams.

Cc: Hal Rosenstock <[email protected]>
Cc: Steve Wise <[email protected]>
Cc: Tom Talpey <[email protected]>
Cc: Jason Gunthorpe <[email protected]>
Cc: Doug Ledford <[email protected]>
Cc: Ira Weiny <[email protected]>
Cc: Sean Hefty <[email protected]>
Signed-off-by: Michael Wang <[email protected]>
---
drivers/infiniband/core/mad.c | 6 +++---
drivers/infiniband/core/user_mad.c | 6 +++---
include/rdma/ib_verbs.h | 15 +++++++++++++++
3 files changed, 21 insertions(+), 6 deletions(-)

diff --git a/drivers/infiniband/core/mad.c b/drivers/infiniband/core/mad.c
index 1822932..4315aeb6 100644
--- a/drivers/infiniband/core/mad.c
+++ b/drivers/infiniband/core/mad.c
@@ -3066,7 +3066,7 @@ static void ib_mad_init_device(struct ib_device *device)
}

for (i = start; i <= end; i++) {
- if (!rdma_ib_or_iboe(device, i))
+ if (!cap_ib_mad(device, i))
continue;

if (ib_mad_port_open(device, i)) {
@@ -3087,7 +3087,7 @@ error_agent:

error:
while (--i >= start) {
- if (!rdma_ib_or_iboe(device, i))
+ if (!cap_ib_mad(device, i))
continue;

if (ib_agent_port_close(device, i))
@@ -3111,7 +3111,7 @@ static void ib_mad_remove_device(struct ib_device *device)
}

for (i = start; i <= end; i++) {
- if (!rdma_ib_or_iboe(device, i))
+ if (!cap_ib_mad(device, i))
continue;

if (ib_agent_port_close(device, i))
diff --git a/drivers/infiniband/core/user_mad.c b/drivers/infiniband/core/user_mad.c
index aa8b334..e3ccbf2 100644
--- a/drivers/infiniband/core/user_mad.c
+++ b/drivers/infiniband/core/user_mad.c
@@ -1294,7 +1294,7 @@ static void ib_umad_add_one(struct ib_device *device)
umad_dev->end_port = e;

for (i = s; i <= e; ++i) {
- if (!rdma_ib_or_iboe(device, i))
+ if (!cap_ib_mad(device, i))
continue;

umad_dev->port[i - s].umad_dev = umad_dev;
@@ -1315,7 +1315,7 @@ static void ib_umad_add_one(struct ib_device *device)

err:
while (--i >= s) {
- if (!rdma_ib_or_iboe(device, i))
+ if (!cap_ib_mad(device, i))
continue;

ib_umad_kill_port(&umad_dev->port[i - s]);
@@ -1333,7 +1333,7 @@ static void ib_umad_remove_one(struct ib_device *device)
return;

for (i = 0; i <= umad_dev->end_port - umad_dev->start_port; ++i) {
- if (rdma_ib_or_iboe(device, i))
+ if (cap_ib_mad(device, i))
ib_umad_kill_port(&umad_dev->port[i]);
}

diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index a12e876..624e963 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -1773,6 +1773,21 @@ static inline int rdma_ib_or_iboe(struct ib_device *device, u8 port_num)
return (tp == RDMA_TRANSPORT_IB || tp == RDMA_TRANSPORT_IBOE);
}

+/**
+ * cap_ib_mad - Check if the port of device has the capability Infiniband
+ * Management Datagrams.
+ *
+ * @device: Device to be checked
+ * @port_num: Port number of the device
+ *
+ * Return 0 when the port of the device doesn't support Infiniband
+ * Management Datagrams.
+ */
+static inline int cap_ib_mad(struct ib_device *device, u8 port_num)
+{
+ return rdma_ib_or_iboe(device, port_num);
+}
+
int ib_query_gid(struct ib_device *device,
u8 port_num, int index, union ib_gid *gid);

--
2.1.0
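
The per-port pattern used in ib_mad_init_device() above (skip ports without the capability, and on failure unwind only the ports already handled) can be sketched in user space as below. All model_* names, the fail_port knob and the fixed-size arrays are hypothetical and exist only for illustration; the real code opens MAD and agent ports instead of flipping a flag.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_MAX_PORT 8

struct model_mad_device {
	int start, end;				/* inclusive port-number range */
	bool mad_capable[MODEL_MAX_PORT + 1];	/* indexed by port number */
	bool opened[MODEL_MAX_PORT + 1];
	int fail_port;				/* port whose open fails, 0 = none */
};

/* Stand-in for cap_ib_mad(): a per-port capability query. */
int model_cap_ib_mad(const struct model_mad_device *dev, int port)
{
	assert(port >= dev->start && port <= dev->end);
	return dev->mad_capable[port];
}

int model_port_open(struct model_mad_device *dev, int port)
{
	if (port == dev->fail_port)
		return -1;
	dev->opened[port] = true;
	return 0;
}

void model_port_close(struct model_mad_device *dev, int port)
{
	dev->opened[port] = false;
}

int model_mad_init_device(struct model_mad_device *dev)
{
	int i;

	for (i = dev->start; i <= dev->end; i++) {
		if (!model_cap_ib_mad(dev, i))
			continue;
		if (model_port_open(dev, i))
			goto error;
	}
	return 0;

error:
	/* Unwind only the capable ports that were opened before the failure. */
	while (--i >= dev->start) {
		if (!model_cap_ib_mad(dev, i))
			continue;
		model_port_close(dev, i);
	}
	return -1;
}
```

Because the same predicate guards both the setup loop and the unwind loop, a port that was skipped on the way in is also skipped on the way out.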

2015-04-24 12:30:16

by Michael Wang

Subject: [PATCH v6 17/26] IB/Verbs: Use management helper cap_ib_smi()

Introduce the helper cap_ib_smi() to check whether a port of an
IB device supports Infiniband Subnet Management Interface.

Cc: Hal Rosenstock <[email protected]>
Cc: Steve Wise <[email protected]>
Cc: Tom Talpey <[email protected]>
Cc: Jason Gunthorpe <[email protected]>
Cc: Doug Ledford <[email protected]>
Cc: Ira Weiny <[email protected]>
Cc: Sean Hefty <[email protected]>
Signed-off-by: Michael Wang <[email protected]>
---
drivers/infiniband/core/agent.c | 2 +-
drivers/infiniband/core/mad.c | 2 +-
include/rdma/ib_verbs.h | 15 +++++++++++++++
3 files changed, 17 insertions(+), 2 deletions(-)

diff --git a/drivers/infiniband/core/agent.c b/drivers/infiniband/core/agent.c
index ffdef4d..61471ee 100644
--- a/drivers/infiniband/core/agent.c
+++ b/drivers/infiniband/core/agent.c
@@ -156,7 +156,7 @@ int ib_agent_port_open(struct ib_device *device, int port_num)
goto error1;
}

- if (rdma_tech_ib(device, port_num)) {
+ if (cap_ib_smi(device, port_num)) {
/* Obtain send only MAD agent for SMI QP */
port_priv->agent[0] = ib_register_mad_agent(device, port_num,
IB_QPT_SMI, NULL, 0,
diff --git a/drivers/infiniband/core/mad.c b/drivers/infiniband/core/mad.c
index 4315aeb6..ee3a05e 100644
--- a/drivers/infiniband/core/mad.c
+++ b/drivers/infiniband/core/mad.c
@@ -2938,7 +2938,7 @@ static int ib_mad_port_open(struct ib_device *device,
init_mad_qp(port_priv, &port_priv->qp_info[1]);

cq_size = mad_sendq_size + mad_recvq_size;
- has_smi = rdma_tech_ib(device, port_num);
+ has_smi = cap_ib_smi(device, port_num);
if (has_smi)
cq_size *= 2;

diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 624e963..873b9a6 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -1788,6 +1788,21 @@ static inline int cap_ib_mad(struct ib_device *device, u8 port_num)
return rdma_ib_or_iboe(device, port_num);
}

+/**
+ * cap_ib_smi - Check if the port of device has the capability Infiniband
+ * Subnet Management Interface.
+ *
+ * @device: Device to be checked
+ * @port_num: Port number of the device
+ *
+ * Return 0 when the port of the device doesn't support Infiniband
+ * Subnet Management Interface.
+ */
+static inline int cap_ib_smi(struct ib_device *device, u8 port_num)
+{
+ return rdma_tech_ib(device, port_num);
+}
+
int ib_query_gid(struct ib_device *device,
u8 port_num, int index, union ib_gid *gid);

--
2.1.0
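
As a rough illustration of what the SMI capability changes in ib_mad_port_open(): the completion queue is sized for one send and one receive queue, and doubled when the port also needs an SMI QP. The sketch below models only that arithmetic; the model_* names and the transport enum are hypothetical, and the queue sizes are passed in rather than taken from the module parameters.

```c
#include <assert.h>

enum model_transport { MODEL_IB, MODEL_IBOE, MODEL_IWARP, MODEL_USNIC_UDP };

/* Stand-in for cap_ib_smi(): in this patch only native IB ports have SMI. */
int model_cap_ib_smi(enum model_transport tp)
{
	return tp == MODEL_IB;
}

/* Size the CQ for the GSI QP, and twice that when an SMI QP is also needed. */
int model_mad_cq_size(enum model_transport tp, int sendq_size, int recvq_size)
{
	int cq_size;

	assert(sendq_size > 0 && recvq_size > 0);

	cq_size = sendq_size + recvq_size;
	if (model_cap_ib_smi(tp))
		cq_size *= 2;

	return cq_size;
}
```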

2015-04-24 12:24:41

by Michael Wang

Subject: [PATCH v6 18/26] IB/Verbs: Use management helper cap_ib_cm()

Introduce the helper cap_ib_cm() to check whether a port of an
IB device supports Infiniband Communication Manager.

Cc: Hal Rosenstock <[email protected]>
Cc: Steve Wise <[email protected]>
Cc: Tom Talpey <[email protected]>
Cc: Jason Gunthorpe <[email protected]>
Cc: Doug Ledford <[email protected]>
Cc: Ira Weiny <[email protected]>
Cc: Sean Hefty <[email protected]>
Signed-off-by: Michael Wang <[email protected]>
---
drivers/infiniband/core/cm.c | 6 +++---
drivers/infiniband/core/cma.c | 19 +++++++++----------
drivers/infiniband/core/ucm.c | 2 +-
include/rdma/ib_verbs.h | 15 +++++++++++++++
4 files changed, 28 insertions(+), 14 deletions(-)

diff --git a/drivers/infiniband/core/cm.c b/drivers/infiniband/core/cm.c
index add5e484..3ffaad3 100644
--- a/drivers/infiniband/core/cm.c
+++ b/drivers/infiniband/core/cm.c
@@ -3781,7 +3781,7 @@ static void cm_add_one(struct ib_device *ib_device)

set_bit(IB_MGMT_METHOD_SEND, reg_req.method_mask);
for (i = 1; i <= ib_device->phys_port_cnt; i++) {
- if (!rdma_ib_or_iboe(ib_device, i))
+ if (!cap_ib_cm(ib_device, i))
continue;

port = kzalloc(sizeof *port, GFP_KERNEL);
@@ -3832,7 +3832,7 @@ error1:
port_modify.set_port_cap_mask = 0;
port_modify.clr_port_cap_mask = IB_PORT_CM_SUP;
while (--i) {
- if (!rdma_ib_or_iboe(ib_device, i))
+ if (!cap_ib_cm(ib_device, i))
continue;

port = cm_dev->port[i-1];
@@ -3864,7 +3864,7 @@ static void cm_remove_one(struct ib_device *ib_device)
write_unlock_irqrestore(&cm.device_lock, flags);

for (i = 1; i <= ib_device->phys_port_cnt; i++) {
- if (!rdma_ib_or_iboe(ib_device, i))
+ if (!cap_ib_cm(ib_device, i))
continue;

port = cm_dev->port[i-1];
diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
index a20566e..08d2d78 100644
--- a/drivers/infiniband/core/cma.c
+++ b/drivers/infiniband/core/cma.c
@@ -766,7 +766,7 @@ int rdma_init_qp_attr(struct rdma_cm_id *id, struct ib_qp_attr *qp_attr,
int ret = 0;

id_priv = container_of(id, struct rdma_id_private, id);
- if (rdma_ib_or_iboe(id->device, id->port_num)) {
+ if (cap_ib_cm(id->device, id->port_num)) {
if (!id_priv->cm_id.ib || (id_priv->id.qp_type == IB_QPT_UD))
ret = cma_ib_init_qp_attr(id_priv, qp_attr, qp_attr_mask);
else
@@ -1054,7 +1054,7 @@ void rdma_destroy_id(struct rdma_cm_id *id)
mutex_unlock(&id_priv->handler_mutex);

if (id_priv->cma_dev) {
- if (rdma_ib_or_iboe(id_priv->id.device, 1)) {
+ if (cap_ib_cm(id_priv->id.device, 1)) {
if (id_priv->cm_id.ib)
ib_destroy_cm_id(id_priv->cm_id.ib);
} else if (rdma_tech_iwarp(id_priv->id.device, 1)) {
@@ -1637,8 +1637,7 @@ static void cma_listen_on_dev(struct rdma_id_private *id_priv,
struct rdma_cm_id *id;
int ret;

- if (cma_family(id_priv) == AF_IB &&
- !rdma_ib_or_iboe(cma_dev->device, 1))
+ if (cma_family(id_priv) == AF_IB && !cap_ib_cm(cma_dev->device, 1))
return;

id = rdma_create_id(cma_listen_handler, id_priv, id_priv->id.ps,
@@ -2029,7 +2028,7 @@ static int cma_bind_loopback(struct rdma_id_private *id_priv)
mutex_lock(&lock);
list_for_each_entry(cur_dev, &dev_list, list) {
if (cma_family(id_priv) == AF_IB &&
- !rdma_ib_or_iboe(cur_dev->device, 1))
+ !cap_ib_cm(cur_dev->device, 1))
continue;

if (!cma_dev)
@@ -2538,7 +2537,7 @@ int rdma_listen(struct rdma_cm_id *id, int backlog)

id_priv->backlog = backlog;
if (id->device) {
- if (rdma_ib_or_iboe(id->device, 1)) {
+ if (cap_ib_cm(id->device, 1)) {
ret = cma_ib_listen(id_priv);
if (ret)
goto err;
@@ -2882,7 +2881,7 @@ int rdma_connect(struct rdma_cm_id *id, struct rdma_conn_param *conn_param)
id_priv->srq = conn_param->srq;
}

- if (rdma_ib_or_iboe(id->device, id->port_num)) {
+ if (cap_ib_cm(id->device, id->port_num)) {
if (id->qp_type == IB_QPT_UD)
ret = cma_resolve_ib_udp(id_priv, conn_param);
else
@@ -2993,7 +2992,7 @@ int rdma_accept(struct rdma_cm_id *id, struct rdma_conn_param *conn_param)
id_priv->srq = conn_param->srq;
}

- if (rdma_ib_or_iboe(id->device, id->port_num)) {
+ if (cap_ib_cm(id->device, id->port_num)) {
if (id->qp_type == IB_QPT_UD) {
if (conn_param)
ret = cma_send_sidr_rep(id_priv, IB_SIDR_SUCCESS,
@@ -3056,7 +3055,7 @@ int rdma_reject(struct rdma_cm_id *id, const void *private_data,
if (!id_priv->cm_id.ib)
return -EINVAL;

- if (rdma_ib_or_iboe(id->device, id->port_num)) {
+ if (cap_ib_cm(id->device, id->port_num)) {
if (id->qp_type == IB_QPT_UD)
ret = cma_send_sidr_rep(id_priv, IB_SIDR_REJECT, 0,
private_data, private_data_len);
@@ -3083,7 +3082,7 @@ int rdma_disconnect(struct rdma_cm_id *id)
if (!id_priv->cm_id.ib)
return -EINVAL;

- if (rdma_ib_or_iboe(id->device, id->port_num)) {
+ if (cap_ib_cm(id->device, id->port_num)) {
ret = cma_modify_qp_err(id_priv);
if (ret)
goto out;
diff --git a/drivers/infiniband/core/ucm.c b/drivers/infiniband/core/ucm.c
index 70e0ccb..f7290c8 100644
--- a/drivers/infiniband/core/ucm.c
+++ b/drivers/infiniband/core/ucm.c
@@ -1253,7 +1253,7 @@ static void ib_ucm_add_one(struct ib_device *device)
dev_t base;
struct ib_ucm_device *ucm_dev;

- if (!device->alloc_ucontext || !rdma_ib_or_iboe(device, 1))
+ if (!device->alloc_ucontext || !cap_ib_cm(device, 1))
return;

ucm_dev = kzalloc(sizeof *ucm_dev, GFP_KERNEL);
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 873b9a6..6805e3e 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -1803,6 +1803,21 @@ static inline int cap_ib_smi(struct ib_device *device, u8 port_num)
return rdma_tech_ib(device, port_num);
}

+/**
+ * cap_ib_cm - Check if the port of device has the capability Infiniband
+ * Communication Manager.
+ *
+ * @device: Device to be checked
+ * @port_num: Port number of the device
+ *
+ * Return 0 when the port of the device doesn't support Infiniband
+ * Communication Manager.
+ */
+static inline int cap_ib_cm(struct ib_device *device, u8 port_num)
+{
+ return rdma_ib_or_iboe(device, port_num);
+}
+
int ib_query_gid(struct ib_device *device,
u8 port_num, int index, union ib_gid *gid);

--
2.1.0

2015-04-24 12:24:44

by Michael Wang

Subject: [PATCH v6 19/26] IB/Verbs: Use management helper cap_iw_cm()

Introduce the helper cap_iw_cm() to check whether a port of an
IB device supports IWARP Communication Manager.

Cc: Hal Rosenstock <[email protected]>
Cc: Steve Wise <[email protected]>
Cc: Tom Talpey <[email protected]>
Cc: Jason Gunthorpe <[email protected]>
Cc: Doug Ledford <[email protected]>
Cc: Ira Weiny <[email protected]>
Cc: Sean Hefty <[email protected]>
Signed-off-by: Michael Wang <[email protected]>
---
drivers/infiniband/core/cma.c | 14 +++++++-------
include/rdma/ib_verbs.h | 15 +++++++++++++++
2 files changed, 22 insertions(+), 7 deletions(-)

diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
index 08d2d78..dc88aa5 100644
--- a/drivers/infiniband/core/cma.c
+++ b/drivers/infiniband/core/cma.c
@@ -775,7 +775,7 @@ int rdma_init_qp_attr(struct rdma_cm_id *id, struct ib_qp_attr *qp_attr,

if (qp_attr->qp_state == IB_QPS_RTR)
qp_attr->rq_psn = id_priv->seq_num;
- } else if (rdma_tech_iwarp(id->device, id->port_num)) {
+ } else if (cap_iw_cm(id->device, id->port_num)) {
if (!id_priv->cm_id.iw) {
qp_attr->qp_access_flags = 0;
*qp_attr_mask = IB_QP_STATE | IB_QP_ACCESS_FLAGS;
@@ -1057,7 +1057,7 @@ void rdma_destroy_id(struct rdma_cm_id *id)
if (cap_ib_cm(id_priv->id.device, 1)) {
if (id_priv->cm_id.ib)
ib_destroy_cm_id(id_priv->cm_id.ib);
- } else if (rdma_tech_iwarp(id_priv->id.device, 1)) {
+ } else if (cap_iw_cm(id_priv->id.device, 1)) {
if (id_priv->cm_id.iw)
iw_destroy_cm_id(id_priv->cm_id.iw);
}
@@ -2541,7 +2541,7 @@ int rdma_listen(struct rdma_cm_id *id, int backlog)
ret = cma_ib_listen(id_priv);
if (ret)
goto err;
- } else if (rdma_tech_iwarp(id->device, 1)) {
+ } else if (cap_iw_cm(id->device, 1)) {
ret = cma_iw_listen(id_priv, backlog);
if (ret)
goto err;
@@ -2886,7 +2886,7 @@ int rdma_connect(struct rdma_cm_id *id, struct rdma_conn_param *conn_param)
ret = cma_resolve_ib_udp(id_priv, conn_param);
else
ret = cma_connect_ib(id_priv, conn_param);
- } else if (rdma_tech_iwarp(id->device, id->port_num))
+ } else if (cap_iw_cm(id->device, id->port_num))
ret = cma_connect_iw(id_priv, conn_param);
else
ret = -ENOSYS;
@@ -3008,7 +3008,7 @@ int rdma_accept(struct rdma_cm_id *id, struct rdma_conn_param *conn_param)
else
ret = cma_rep_recv(id_priv);
}
- } else if (rdma_tech_iwarp(id->device, id->port_num))
+ } else if (cap_iw_cm(id->device, id->port_num))
ret = cma_accept_iw(id_priv, conn_param);
else
ret = -ENOSYS;
@@ -3063,7 +3063,7 @@ int rdma_reject(struct rdma_cm_id *id, const void *private_data,
ret = ib_send_cm_rej(id_priv->cm_id.ib,
IB_CM_REJ_CONSUMER_DEFINED, NULL,
0, private_data, private_data_len);
- } else if (rdma_tech_iwarp(id->device, id->port_num)) {
+ } else if (cap_iw_cm(id->device, id->port_num)) {
ret = iw_cm_reject(id_priv->cm_id.iw,
private_data, private_data_len);
} else
@@ -3089,7 +3089,7 @@ int rdma_disconnect(struct rdma_cm_id *id)
/* Initiate or respond to a disconnect. */
if (ib_send_cm_dreq(id_priv->cm_id.ib, NULL, 0))
ib_send_cm_drep(id_priv->cm_id.ib, NULL, 0);
- } else if (rdma_tech_iwarp(id->device, id->port_num)) {
+ } else if (cap_iw_cm(id->device, id->port_num)) {
ret = iw_cm_disconnect(id_priv->cm_id.iw, 0);
} else
ret = -EINVAL;
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 6805e3e..e4999f6 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -1818,6 +1818,21 @@ static inline int cap_ib_cm(struct ib_device *device, u8 port_num)
return rdma_ib_or_iboe(device, port_num);
}

+/**
+ * cap_iw_cm - Check if the port of device has the capability IWARP
+ * Communication Manager.
+ *
+ * @device: Device to be checked
+ * @port_num: Port number of the device
+ *
+ * Return 0 when the port of the device doesn't support IWARP
+ * Communication Manager.
+ */
+static inline int cap_iw_cm(struct ib_device *device, u8 port_num)
+{
+ return rdma_tech_iwarp(device, port_num);
+}
+
int ib_query_gid(struct ib_device *device,
u8 port_num, int index, union ib_gid *gid);

--
2.1.0
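
With cap_ib_cm() and cap_iw_cm() both in place, the connection paths in cma.c dispatch on capability rather than on transport. A minimal user-space sketch of the rdma_connect()-style dispatch follows; the model_* names, the path enum and the reduced QP-type enum are hypothetical, and the return values only label which branch would be taken.

```c
#include <assert.h>
#include <errno.h>

enum model_transport { MODEL_IB, MODEL_IBOE, MODEL_IWARP, MODEL_USNIC_UDP };
enum model_qp_type { MODEL_QPT_RC, MODEL_QPT_UD };

/* Positive labels for the branch taken; errors stay negative errno values. */
enum model_path {
	MODEL_PATH_IB_CONNECT = 1,
	MODEL_PATH_IB_SIDR,
	MODEL_PATH_IW_CONNECT,
};

/* Stand-in for cap_ib_cm(): IB and IBoE ports use the IB CM. */
int model_cap_ib_cm(enum model_transport tp)
{
	return tp == MODEL_IB || tp == MODEL_IBOE;
}

/* Stand-in for cap_iw_cm(): iWARP ports use the iWARP CM. */
int model_cap_iw_cm(enum model_transport tp)
{
	return tp == MODEL_IWARP;
}

int model_connect(enum model_transport tp, enum model_qp_type qpt)
{
	assert(qpt == MODEL_QPT_RC || qpt == MODEL_QPT_UD);

	if (model_cap_ib_cm(tp))
		return qpt == MODEL_QPT_UD ? MODEL_PATH_IB_SIDR
					   : MODEL_PATH_IB_CONNECT;
	if (model_cap_iw_cm(tp))
		return MODEL_PATH_IW_CONNECT;

	/* No connection manager is available for this port. */
	return -ENOSYS;
}
```

A transport that satisfies neither predicate (USNIC_UDP in this model) falls through to -ENOSYS, matching the final else branch in the hunks above.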

2015-04-24 12:24:49

by Michael Wang

Subject: [PATCH v6 20/26] IB/Verbs: Use management helper cap_ib_sa()

Introduce the helper cap_ib_sa() to check whether a port of an
IB device supports Infiniband Subnet Administration.

Cc: Hal Rosenstock <[email protected]>
Cc: Steve Wise <[email protected]>
Cc: Tom Talpey <[email protected]>
Cc: Jason Gunthorpe <[email protected]>
Cc: Doug Ledford <[email protected]>
Cc: Ira Weiny <[email protected]>
Cc: Sean Hefty <[email protected]>
Signed-off-by: Michael Wang <[email protected]>
---
drivers/infiniband/core/cma.c | 4 ++--
drivers/infiniband/core/sa_query.c | 10 +++++-----
drivers/infiniband/core/ucma.c | 2 +-
include/rdma/ib_verbs.h | 15 +++++++++++++++
4 files changed, 23 insertions(+), 8 deletions(-)

diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
index dc88aa5..8484ae3 100644
--- a/drivers/infiniband/core/cma.c
+++ b/drivers/infiniband/core/cma.c
@@ -954,7 +954,7 @@ static inline int cma_user_data_offset(struct rdma_id_private *id_priv)

static void cma_cancel_route(struct rdma_id_private *id_priv)
{
- if (rdma_tech_ib(id_priv->id.device, id_priv->id.port_num)) {
+ if (cap_ib_sa(id_priv->id.device, id_priv->id.port_num)) {
if (id_priv->query)
ib_sa_cancel_query(id_priv->query_id, id_priv->query);
}
@@ -1978,7 +1978,7 @@ int rdma_resolve_route(struct rdma_cm_id *id, int timeout_ms)
return -EINVAL;

atomic_inc(&id_priv->refcount);
- if (rdma_tech_ib(id->device, id->port_num))
+ if (cap_ib_sa(id->device, id->port_num))
ret = cma_resolve_ib_route(id_priv, timeout_ms);
else if (rdma_tech_iboe(id->device, id->port_num))
ret = cma_resolve_iboe_route(id_priv);
diff --git a/drivers/infiniband/core/sa_query.c b/drivers/infiniband/core/sa_query.c
index 96adf8c..f0acc9f 100644
--- a/drivers/infiniband/core/sa_query.c
+++ b/drivers/infiniband/core/sa_query.c
@@ -450,7 +450,7 @@ static void ib_sa_event(struct ib_event_handler *handler, struct ib_event *event
struct ib_sa_port *port =
&sa_dev->port[event->element.port_num - sa_dev->start_port];

- if (WARN_ON(!rdma_tech_ib(handler->device, port->port_num)))
+ if (WARN_ON(!cap_ib_sa(handler->device, port->port_num)))
return;

spin_lock_irqsave(&port->ah_lock, flags);
@@ -1173,7 +1173,7 @@ static void ib_sa_add_one(struct ib_device *device)

for (i = 0; i <= e - s; ++i) {
spin_lock_init(&sa_dev->port[i].ah_lock);
- if (!rdma_tech_ib(device, i + 1))
+ if (!cap_ib_sa(device, i + 1))
continue;

sa_dev->port[i].sm_ah = NULL;
@@ -1208,7 +1208,7 @@ static void ib_sa_add_one(struct ib_device *device)
goto err;

for (i = 0; i <= e - s; ++i) {
- if (rdma_tech_ib(device, i + 1))
+ if (cap_ib_sa(device, i + 1))
update_sm_ah(&sa_dev->port[i].update_task);
}

@@ -1216,7 +1216,7 @@ static void ib_sa_add_one(struct ib_device *device)

err:
while (--i >= 0) {
- if (rdma_tech_ib(device, i + 1))
+ if (cap_ib_sa(device, i + 1))
ib_unregister_mad_agent(sa_dev->port[i].agent);
}
free:
@@ -1237,7 +1237,7 @@ static void ib_sa_remove_one(struct ib_device *device)
flush_workqueue(ib_wq);

for (i = 0; i <= sa_dev->end_port - sa_dev->start_port; ++i) {
- if (rdma_tech_ib(device, i + 1)) {
+ if (cap_ib_sa(device, i + 1)) {
ib_unregister_mad_agent(sa_dev->port[i].agent);
if (sa_dev->port[i].sm_ah)
kref_put(&sa_dev->port[i].sm_ah->ref, free_sm_ah);
diff --git a/drivers/infiniband/core/ucma.c b/drivers/infiniband/core/ucma.c
index 7331c6c..bed7957 100644
--- a/drivers/infiniband/core/ucma.c
+++ b/drivers/infiniband/core/ucma.c
@@ -723,7 +723,7 @@ static ssize_t ucma_query_route(struct ucma_file *file,
resp.node_guid = (__force __u64) ctx->cm_id->device->node_guid;
resp.port_num = ctx->cm_id->port_num;

- if (rdma_tech_ib(ctx->cm_id->device, ctx->cm_id->port_num))
+ if (cap_ib_sa(ctx->cm_id->device, ctx->cm_id->port_num))
ucma_copy_ib_route(&resp, &ctx->cm_id->route);
else if (rdma_tech_iboe(ctx->cm_id->device, ctx->cm_id->port_num))
ucma_copy_iboe_route(&resp, &ctx->cm_id->route);
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index e4999f6..de3a168 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -1833,6 +1833,21 @@ static inline int cap_iw_cm(struct ib_device *device, u8 port_num)
return rdma_tech_iwarp(device, port_num);
}

+/**
+ * cap_ib_sa - Check if the port of device has the capability Infiniband
+ * Subnet Administration.
+ *
+ * @device: Device to be checked
+ * @port_num: Port number of the device
+ *
+ * Return 0 when the port of the device doesn't support Infiniband
+ * Subnet Administration.
+ */
+static inline int cap_ib_sa(struct ib_device *device, u8 port_num)
+{
+ return rdma_tech_ib(device, port_num);
+}
+
int ib_query_gid(struct ib_device *device,
u8 port_num, int index, union ib_gid *gid);

--
2.1.0

2015-04-24 12:24:53

by Michael Wang

Subject: [PATCH v6 21/26] IB/Verbs: Use management helper cap_ib_mcast()

Introduce the helper cap_ib_mcast() to check whether a port of an
IB device supports Infiniband Multicast.

Cc: Hal Rosenstock <[email protected]>
Cc: Steve Wise <[email protected]>
Cc: Tom Talpey <[email protected]>
Cc: Jason Gunthorpe <[email protected]>
Cc: Doug Ledford <[email protected]>
Cc: Ira Weiny <[email protected]>
Cc: Sean Hefty <[email protected]>
Signed-off-by: Michael Wang <[email protected]>
---
drivers/infiniband/core/cma.c | 6 +++---
drivers/infiniband/core/multicast.c | 6 +++---
include/rdma/ib_verbs.h | 15 +++++++++++++++
3 files changed, 21 insertions(+), 6 deletions(-)

diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
index 8484ae3..58ec946 100644
--- a/drivers/infiniband/core/cma.c
+++ b/drivers/infiniband/core/cma.c
@@ -1028,7 +1028,7 @@ static void cma_leave_mc_groups(struct rdma_id_private *id_priv)
mc = container_of(id_priv->mc_list.next,
struct cma_multicast, list);
list_del(&mc->list);
- if (rdma_tech_ib(id_priv->cma_dev->device,
+ if (cap_ib_mcast(id_priv->cma_dev->device,
id_priv->id.port_num)) {
ib_sa_free_multicast(mc->multicast.ib);
kfree(mc);
@@ -3342,7 +3342,7 @@ int rdma_join_multicast(struct rdma_cm_id *id, struct sockaddr *addr,
if (rdma_tech_iboe(id->device, id->port_num)) {
kref_init(&mc->mcref);
ret = cma_iboe_join_multicast(id_priv, mc);
- } else if (rdma_tech_ib(id->device, id->port_num))
+ } else if (cap_ib_mcast(id->device, id->port_num))
ret = cma_join_ib_multicast(id_priv, mc);
else
ret = -ENOSYS;
@@ -3376,7 +3376,7 @@ void rdma_leave_multicast(struct rdma_cm_id *id, struct sockaddr *addr)

BUG_ON(id_priv->cma_dev->device != id->device);

- if (rdma_tech_ib(id->device, id->port_num)) {
+ if (cap_ib_mcast(id->device, id->port_num)) {
ib_sa_free_multicast(mc->multicast.ib);
kfree(mc);
} else if (rdma_tech_iboe(id->device, id->port_num))
diff --git a/drivers/infiniband/core/multicast.c b/drivers/infiniband/core/multicast.c
index 24d93f5..bdc1880 100644
--- a/drivers/infiniband/core/multicast.c
+++ b/drivers/infiniband/core/multicast.c
@@ -780,7 +780,7 @@ static void mcast_event_handler(struct ib_event_handler *handler,
int index;

dev = container_of(handler, struct mcast_device, event_handler);
- if (WARN_ON(!rdma_tech_ib(dev->device, event->element.port_num)))
+ if (WARN_ON(!cap_ib_mcast(dev->device, event->element.port_num)))
return;

index = event->element.port_num - dev->start_port;
@@ -820,7 +820,7 @@ static void mcast_add_one(struct ib_device *device)
}

for (i = 0; i <= dev->end_port - dev->start_port; i++) {
- if (!rdma_tech_ib(device, dev->start_port + i))
+ if (!cap_ib_mcast(device, dev->start_port + i))
continue;
port = &dev->port[i];
port->dev = dev;
@@ -858,7 +858,7 @@ static void mcast_remove_one(struct ib_device *device)
flush_workqueue(mcast_wq);

for (i = 0; i <= dev->end_port - dev->start_port; i++) {
- if (rdma_tech_ib(device, dev->start_port + i)) {
+ if (cap_ib_mcast(device, dev->start_port + i)) {
port = &dev->port[i];
deref_port(port);
wait_for_completion(&port->comp);
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index de3a168..6e354df 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -1848,6 +1848,21 @@ static inline int cap_ib_sa(struct ib_device *device, u8 port_num)
return rdma_tech_ib(device, port_num);
}

+/**
+ * cap_ib_mcast - Check if the port of device has the capability Infiniband
+ * Multicast.
+ *
+ * @device: Device to be checked
+ * @port_num: Port number of the device
+ *
+ * Return 0 when the port of the device doesn't support Infiniband
+ * Multicast.
+ */
+static inline int cap_ib_mcast(struct ib_device *device, u8 port_num)
+{
+ return cap_ib_sa(device, port_num);
+}
+
int ib_query_gid(struct ib_device *device,
u8 port_num, int index, union ib_gid *gid);

--
2.1.0
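
Taken together, the helpers introduced in patches 16 to 21 amount to a small transport-to-capability table. The sketch below encodes that table as a bitmask in user space. It is an illustration only: the MODEL_* flags and model_port_caps() are hypothetical, and the real helpers are separate inline functions that query the device and port.

```c
#include <assert.h>

enum model_transport { MODEL_IB, MODEL_IBOE, MODEL_IWARP, MODEL_USNIC_UDP };

enum {
	MODEL_CAP_IB_MAD   = 1 << 0,
	MODEL_CAP_IB_SMI   = 1 << 1,
	MODEL_CAP_IB_CM    = 1 << 2,
	MODEL_CAP_IW_CM    = 1 << 3,
	MODEL_CAP_IB_SA    = 1 << 4,
	MODEL_CAP_IB_MCAST = 1 << 5,
};

/* Derive the capability set the way the cap_*() helpers above do. */
unsigned int model_port_caps(enum model_transport tp)
{
	unsigned int caps = 0;
	int ib = (tp == MODEL_IB);
	int ib_or_iboe = ib || (tp == MODEL_IBOE);

	assert(tp >= MODEL_IB && tp <= MODEL_USNIC_UDP);

	if (ib_or_iboe)
		caps |= MODEL_CAP_IB_MAD | MODEL_CAP_IB_CM;
	if (ib)
		caps |= MODEL_CAP_IB_SMI | MODEL_CAP_IB_SA;
	if (tp == MODEL_IWARP)
		caps |= MODEL_CAP_IW_CM;

	/* cap_ib_mcast() is defined in terms of cap_ib_sa(). */
	if (caps & MODEL_CAP_IB_SA)
		caps |= MODEL_CAP_IB_MCAST;

	return caps;
}
```

Layering multicast on SA in one place means a future technology only needs its SA answer changed for the multicast answer to follow.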

2015-04-24 12:29:00

by Michael Wang

Subject: [PATCH v6 22/26] IB/Verbs: Use management helper cap_read_multi_sge()

Introduce the helper cap_read_multi_sge() to check whether a port of an
IB device supports RDMA Read Multiple Scatter-Gather Entries.

Cc: Hal Rosenstock <[email protected]>
Cc: Steve Wise <[email protected]>
Cc: Tom Talpey <[email protected]>
Cc: Jason Gunthorpe <[email protected]>
Cc: Doug Ledford <[email protected]>
Cc: Ira Weiny <[email protected]>
Cc: Sean Hefty <[email protected]>
Signed-off-by: Michael Wang <[email protected]>
---
include/rdma/ib_verbs.h | 15 +++++++++++++++
net/sunrpc/xprtrdma/svc_rdma_recvfrom.c | 3 ++-
2 files changed, 17 insertions(+), 1 deletion(-)

diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 6e354df..4229ae2 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -1863,6 +1863,21 @@ static inline int cap_ib_mcast(struct ib_device *device, u8 port_num)
return cap_ib_sa(device, port_num);
}

+/**
+ * cap_read_multi_sge - Check if the port of device has the capability
+ * RDMA Read Multiple Scatter-Gather Entries.
+ *
+ * @device: Device to be checked
+ * @port_num: Port number of the device
+ *
+ * Return 0 when the port of the device doesn't support
+ * RDMA Read Multiple Scatter-Gather Entries.
+ */
+static inline int cap_read_multi_sge(struct ib_device *device, u8 port_num)
+{
+ return !rdma_tech_iwarp(device, port_num);
+}
+
int ib_query_gid(struct ib_device *device,
u8 port_num, int index, union ib_gid *gid);

diff --git a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
index a5bed5b..7711b7a 100644
--- a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
+++ b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
@@ -117,7 +117,8 @@ static void rdma_build_arg_xdr(struct svc_rqst *rqstp,

static int rdma_read_max_sge(struct svcxprt_rdma *xprt, int sge_count)
{
- if (rdma_tech_iwarp(xprt->sc_cm_id->device, xprt->sc_cm_id->port_num))
+ if (!cap_read_multi_sge(xprt->sc_cm_id->device,
+ xprt->sc_cm_id->port_num))
return 1;
else
return min_t(int, sge_count, xprt->sc_max_sge);
--
2.1.0
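
The effect of cap_read_multi_sge() on rdma_read_max_sge() can be checked with a small user-space model. The model_* names and the transport enum are hypothetical; the logic mirrors the hunk above, where a port without the capability is limited to a single SGE per RDMA Read and any other port gets the smaller of the requested count and the transport maximum.

```c
#include <assert.h>

enum model_transport { MODEL_IB, MODEL_IBOE, MODEL_IWARP, MODEL_USNIC_UDP };

/* Stand-in for cap_read_multi_sge(): everything except iWARP qualifies. */
int model_cap_read_multi_sge(enum model_transport tp)
{
	return tp != MODEL_IWARP;
}

/* Number of SGEs to use for one RDMA Read. */
int model_read_max_sge(enum model_transport tp, int sge_count, int max_sge)
{
	assert(sge_count >= 0 && max_sge >= 0);

	if (!model_cap_read_multi_sge(tp))
		return 1;

	return sge_count < max_sge ? sge_count : max_sge;
}
```

Note that the helper is defined by negation, so in this model every non-iWARP transport, including USNIC_UDP, reports the capability.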

2015-04-24 12:28:57

by Michael Wang

Subject: [PATCH v6 23/26] IB/Verbs: Use management helper cap_af_ib()

Introduce the helper cap_af_ib() to check whether a port of an
IB device supports Native Infiniband Address.

Cc: Hal Rosenstock <[email protected]>
Cc: Steve Wise <[email protected]>
Cc: Tom Talpey <[email protected]>
Cc: Jason Gunthorpe <[email protected]>
Cc: Doug Ledford <[email protected]>
Cc: Ira Weiny <[email protected]>
Cc: Sean Hefty <[email protected]>
Signed-off-by: Michael Wang <[email protected]>
---
drivers/infiniband/core/cma.c | 2 +-
include/rdma/ib_verbs.h | 15 +++++++++++++++
2 files changed, 16 insertions(+), 1 deletion(-)

diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
index 58ec946..21d2d1a 100644
--- a/drivers/infiniband/core/cma.c
+++ b/drivers/infiniband/core/cma.c
@@ -469,7 +469,7 @@ static int cma_resolve_ib_dev(struct rdma_id_private *id_priv)

list_for_each_entry(cur_dev, &dev_list, list) {
for (p = 1; p <= cur_dev->device->phys_port_cnt; ++p) {
- if (!rdma_ib_or_iboe(cur_dev->device, p))
+ if (!cap_af_ib(cur_dev->device, p))
continue;

if (ib_find_cached_pkey(cur_dev->device, p, pkey, &index))
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 4229ae2..2d4e6ac 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -1864,6 +1864,21 @@ static inline int cap_ib_mcast(struct ib_device *device, u8 port_num)
}

/**
+ * cap_af_ib - Check if the port of device has the capability
+ * Native Infiniband Address.
+ *
+ * @device: Device to be checked
+ * @port_num: Port number of the device
+ *
+ * Return 0 when the port of the device doesn't support
+ * Native Infiniband Address.
+ */
+static inline int cap_af_ib(struct ib_device *device, u8 port_num)
+{
+ return rdma_ib_or_iboe(device, port_num);
+}
+
+/**
* cap_read_multi_sge - Check if the port of device has the capability
* RDMA Read Multiple Scatter-Gather Entries.
*
--
2.1.0

2015-04-24 12:28:36

by Michael Wang

Subject: [PATCH v6 24/26] IB/Verbs: Use management helper cap_eth_ah()

Introduce the helper cap_eth_ah() to check whether a port of an
IB device supports Ethernet Address Handle.

Cc: Hal Rosenstock <[email protected]>
Cc: Steve Wise <[email protected]>
Cc: Tom Talpey <[email protected]>
Cc: Jason Gunthorpe <[email protected]>
Cc: Doug Ledford <[email protected]>
Cc: Ira Weiny <[email protected]>
Cc: Sean Hefty <[email protected]>
Signed-off-by: Michael Wang <[email protected]>
---
drivers/infiniband/core/cma.c | 2 +-
drivers/infiniband/core/sa_query.c | 2 +-
drivers/infiniband/core/verbs.c | 4 ++--
include/rdma/ib_verbs.h | 15 +++++++++++++++
4 files changed, 19 insertions(+), 4 deletions(-)

diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
index 21d2d1a..572a8a9 100644
--- a/drivers/infiniband/core/cma.c
+++ b/drivers/infiniband/core/cma.c
@@ -732,7 +732,7 @@ static int cma_ib_init_qp_attr(struct rdma_id_private *id_priv,
int ret;
u16 pkey;

- if (rdma_tech_iboe(id_priv->id.device, id_priv->id.port_num))
+ if (cap_eth_ah(id_priv->id.device, id_priv->id.port_num))
pkey = 0xffff;
else
pkey = ib_addr_get_pkey(dev_addr);
diff --git a/drivers/infiniband/core/sa_query.c b/drivers/infiniband/core/sa_query.c
index f0acc9f..1f0d009 100644
--- a/drivers/infiniband/core/sa_query.c
+++ b/drivers/infiniband/core/sa_query.c
@@ -540,7 +540,7 @@ int ib_init_ah_from_path(struct ib_device *device, u8 port_num,
ah_attr->port_num = port_num;
ah_attr->static_rate = rec->rate;

- force_grh = rdma_tech_iboe(device, port_num);
+ force_grh = cap_eth_ah(device, port_num);

if (rec->hop_limit > 1 || force_grh) {
ah_attr->ah_flags = IB_AH_GRH;
diff --git a/drivers/infiniband/core/verbs.c b/drivers/infiniband/core/verbs.c
index 7264860..44e25c1 100644
--- a/drivers/infiniband/core/verbs.c
+++ b/drivers/infiniband/core/verbs.c
@@ -202,7 +202,7 @@ int ib_init_ah_from_wc(struct ib_device *device, u8 port_num, struct ib_wc *wc,
int ret;

memset(ah_attr, 0, sizeof *ah_attr);
- if (rdma_tech_iboe(device, port_num)) {
+ if (cap_eth_ah(device, port_num)) {
if (!(wc->wc_flags & IB_WC_GRH))
return -EPROTOTYPE;

@@ -871,7 +871,7 @@ int ib_resolve_eth_l2_attrs(struct ib_qp *qp,
union ib_gid sgid;

if ((*qp_attr_mask & IB_QP_AV) &&
- (rdma_tech_iboe(qp->device, qp_attr->ah_attr.port_num))) {
+ (cap_eth_ah(qp->device, qp_attr->ah_attr.port_num))) {
ret = ib_query_gid(qp->device, qp_attr->ah_attr.port_num,
qp_attr->ah_attr.grh.sgid_index, &sgid);
if (ret)
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 2d4e6ac..f122df9 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -1879,6 +1879,21 @@ static inline int cap_af_ib(struct ib_device *device, u8 port_num)
}

/**
+ * cap_eth_ah - Check if the port of device has the capability
+ * Ethernet Address Handle.
+ *
+ * @device: Device to be checked
+ * @port_num: Port number of the device
+ *
+ * Return 0 when the port of the device doesn't support
+ * Ethernet Address Handle.
+ */
+static inline int cap_eth_ah(struct ib_device *device, u8 port_num)
+{
+ return rdma_tech_iboe(device, port_num);
+}
+
+/**
* cap_read_multi_sge - Check if the port of device has the capability
* RDMA Read Multiple Scatter-Gather Entries.
*
--
2.1.0

2015-04-24 12:28:09

by Michael Wang

Subject: [PATCH v6 25/26] IB/Verbs: Clean up rdma_ib_or_iboe()

Now that all the cap_XX() helpers have been introduced, the raw helper
rdma_ib_or_iboe() is no longer necessary, so remove it.

Cc: Hal Rosenstock <[email protected]>
Cc: Steve Wise <[email protected]>
Cc: Tom Talpey <[email protected]>
Cc: Jason Gunthorpe <[email protected]>
Cc: Doug Ledford <[email protected]>
Cc: Ira Weiny <[email protected]>
Cc: Sean Hefty <[email protected]>
Signed-off-by: Michael Wang <[email protected]>
---
include/rdma/ib_verbs.h | 19 +++++++++----------
net/sunrpc/xprtrdma/svc_rdma_transport.c | 6 ++++--
2 files changed, 13 insertions(+), 12 deletions(-)

diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index f122df9..0def4e1 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -1766,13 +1766,6 @@ static inline int rdma_tech_iwarp(struct ib_device *device, u8 port_num)
== RDMA_TRANSPORT_IWARP;
}

-static inline int rdma_ib_or_iboe(struct ib_device *device, u8 port_num)
-{
- enum rdma_transport_type tp = device->query_transport(device, port_num);
-
- return (tp == RDMA_TRANSPORT_IB || tp == RDMA_TRANSPORT_IBOE);
-}
-
/**
* cap_ib_mad - Check if the port of device has the capability Infiniband
* Management Datagrams.
@@ -1785,7 +1778,9 @@ static inline int rdma_ib_or_iboe(struct ib_device *device, u8 port_num)
*/
static inline int cap_ib_mad(struct ib_device *device, u8 port_num)
{
- return rdma_ib_or_iboe(device, port_num);
+ enum rdma_transport_type tp = device->query_transport(device, port_num);
+
+ return (tp == RDMA_TRANSPORT_IB || tp == RDMA_TRANSPORT_IBOE);
}

/**
@@ -1815,7 +1810,9 @@ static inline int cap_ib_smi(struct ib_device *device, u8 port_num)
*/
static inline int cap_ib_cm(struct ib_device *device, u8 port_num)
{
- return rdma_ib_or_iboe(device, port_num);
+ enum rdma_transport_type tp = device->query_transport(device, port_num);
+
+ return (tp == RDMA_TRANSPORT_IB || tp == RDMA_TRANSPORT_IBOE);
}

/**
@@ -1875,7 +1872,9 @@ static inline int cap_ib_mcast(struct ib_device *device, u8 port_num)
*/
static inline int cap_af_ib(struct ib_device *device, u8 port_num)
{
- return rdma_ib_or_iboe(device, port_num);
+ enum rdma_transport_type tp = device->query_transport(device, port_num);
+
+ return (tp == RDMA_TRANSPORT_IB || tp == RDMA_TRANSPORT_IBOE);
}

/**
diff --git a/net/sunrpc/xprtrdma/svc_rdma_transport.c b/net/sunrpc/xprtrdma/svc_rdma_transport.c
index a09b7a1..8af6f92 100644
--- a/net/sunrpc/xprtrdma/svc_rdma_transport.c
+++ b/net/sunrpc/xprtrdma/svc_rdma_transport.c
@@ -987,8 +987,10 @@ static struct svc_xprt *svc_rdma_accept(struct svc_xprt *xprt)
*/
if (!rdma_tech_iwarp(newxprt->sc_cm_id->device,
newxprt->sc_cm_id->port_num) &&
- !rdma_ib_or_iboe(newxprt->sc_cm_id->device,
- newxprt->sc_cm_id->port_num))
+ !rdma_tech_ib(newxprt->sc_cm_id->device,
+ newxprt->sc_cm_id->port_num) &&
+ !rdma_tech_iboe(newxprt->sc_cm_id->device,
+ newxprt->sc_cm_id->port_num))
goto errout;

if (!(newxprt->sc_dev_caps & SVCRDMA_DEVCAP_FAST_REG) ||
--
2.1.0

2015-04-24 12:27:30

by Michael Wang

Subject: [PATCH v6 26/26] IB/Verbs: Cleanup rdma_node_get_transport()

We have gotten rid of every use of the legacy rdma_node_get_transport(),
so remove it now.

Cc: Hal Rosenstock <[email protected]>
Cc: Steve Wise <[email protected]>
Cc: Tom Talpey <[email protected]>
Cc: Jason Gunthorpe <[email protected]>
Cc: Doug Ledford <[email protected]>
Cc: Ira Weiny <[email protected]>
Cc: Sean Hefty <[email protected]>
Signed-off-by: Michael Wang <[email protected]>
---
drivers/infiniband/core/verbs.c | 21 ---------------------
include/rdma/ib_verbs.h | 3 ---
2 files changed, 24 deletions(-)

diff --git a/drivers/infiniband/core/verbs.c b/drivers/infiniband/core/verbs.c
index 44e25c1..7dc08b1 100644
--- a/drivers/infiniband/core/verbs.c
+++ b/drivers/infiniband/core/verbs.c
@@ -107,27 +107,6 @@ __attribute_const__ int ib_rate_to_mbps(enum ib_rate rate)
}
EXPORT_SYMBOL(ib_rate_to_mbps);

-__attribute_const__ enum rdma_transport_type
-rdma_node_get_transport(enum rdma_node_type node_type)
-{
- switch (node_type) {
- case RDMA_NODE_IB_CA:
- case RDMA_NODE_IB_SWITCH:
- case RDMA_NODE_IB_ROUTER:
- return RDMA_TRANSPORT_IB;
- case RDMA_NODE_RNIC:
- return RDMA_TRANSPORT_IWARP;
- case RDMA_NODE_USNIC:
- return RDMA_TRANSPORT_USNIC;
- case RDMA_NODE_USNIC_UDP:
- return RDMA_TRANSPORT_USNIC_UDP;
- default:
- BUG();
- return 0;
- }
-}
-EXPORT_SYMBOL(rdma_node_get_transport);
-
enum rdma_link_layer rdma_port_get_link_layer(struct ib_device *device, u8 port_num)
{
if (device->get_link_layer)
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 0def4e1..d051500 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -84,9 +84,6 @@ enum rdma_transport_type {
RDMA_TRANSPORT_IBOE,
};

-__attribute_const__ enum rdma_transport_type
-rdma_node_get_transport(enum rdma_node_type node_type);
-
enum rdma_link_layer {
IB_LINK_LAYER_UNSPECIFIED,
IB_LINK_LAYER_INFINIBAND,
--
2.1.0

2015-04-24 13:13:15

by Michael Wang

Subject: Re: [PATCH v6 00/26] IB/Verbs: IB Management Helpers

Add missing Cc:
Devesh Sharma <[email protected]>
Liran Liss <[email protected]>
Dave Goodell <[email protected]>

Regards,
Michael Wang

On 04/24/2015 02:23 PM, Michael Wang wrote:
> Since v5:
> * Thanks to Ira, Devesh for the review and testing :-)
> * Thanks for the comments from Steve, Tom, Jason, Hal, Devesh, Ira,
> Liran, Jason, Dave :-) Please remind me if anything missed :-P
> * Trivial fix for 4#
> * Drop the reform on acquiring link-layer in 9#
> * Drop cap_ipoib()
>
> There are plenty of lengthy code to check the transport type of IB device,
> or the link layer type of it's port, but actually we are just speculating
> whether a particular management/feature is supported by the device/port.
>
> Thus instead of inferring, we should have our own mechanism for IB management
> capability/protocol/feature checking, several proposals below.
>
> This patch set will reform the method of getting transport type, we will
> now using query_transport() instead of inferring from transport and link
> layer respectively, also we defined the new transport type to make the
> concept more reasonable.
>
> Mapping List:
> node-type link-layer old-transport new-transport
> nes RNIC ETH IWARP IWARP
> amso1100 RNIC ETH IWARP IWARP
> cxgb3 RNIC ETH IWARP IWARP
> cxgb4 RNIC ETH IWARP IWARP
> usnic USNIC_UDP ETH USNIC_UDP USNIC_UDP
> ocrdma IB_CA ETH IB IBOE
> mlx4 IB_CA IB/ETH IB IB/IBOE
> mlx5 IB_CA IB IB IB
> ehca IB_CA IB IB IB
> ipath IB_CA IB IB IB
> mthca IB_CA IB IB IB
> qib IB_CA IB IB IB
>
> For example:
> if (transport == IB) && (link-layer == ETH)
> will now become:
> if (query_transport() == IBOE)
>
> Thus we will be able to get rid of the respective transport and link-layer
> checking, and it will help us to add new protocol/Technology (like OPA) more
> easier, also with the introduced management helpers, IB management logical
> will be more clear and easier for extending.
>
> Highlights:
> The patch set covered a wide range of IB stuff, thus for those who are
> familiar with the particular part, your suggestion would be invaluable ;-)
>
> Patch 1#~15# included all the logical reform, 16#~25# introduced the
> management helpers, 26#~27# do clean up.
>
> we appreciate for those one who have the HW willing to provide Tested-by :-)
>
> Doug suggested the bitmask mechanism:
> https://www.mail-archive.com/[email protected]/msg23765.html
> which could be the plan for future reforming, we prefer that to be another
> series which focus on semantic and performance.
>
> This patch-set is somewhat 'bloated' now and it may be a good timing for
> staging, I'd like to suggest we focus on improving existed helpers and push
> all the further reforms into next series ;-)
>
> We now have a repository based on latest infiniband/for-next with this
> series applied:
> [email protected]:ywang-pb/infiniband-wy.git
>
> Proposals:
> Sean:
> https://www.mail-archive.com/[email protected]/msg23339.html
> Doug:
> https://www.mail-archive.com/[email protected]/msg23418.html
> https://www.mail-archive.com/[email protected]/msg23765.html
> Jason:
> https://www.mail-archive.com/[email protected]/msg23425.html
>
> Michael Wang (26):
> IB/Verbs: Implement new callback query_transport()
> IB/Verbs: Implement raw management helpers
> IB/Verbs: Reform IB-core mad/agent/user_mad
> IB/Verbs: Reform IB-core cm
> IB/Verbs: Reform IB-core sa_query
> IB/Verbs: Reform IB-core multicast
> IB/Verbs: Reform IB-ulp ipoib
> IB/Verbs: Reform IB-ulp xprtrdma
> IB/Verbs: Reform IB-core verbs
> IB/Verbs: Reform cm related part in IB-core cma/ucm
> IB/Verbs: Reform route related part in IB-core cma
> IB/Verbs: Reform mcast related part in IB-core cma
> IB/Verbs: Reserve legacy transport type in 'dev_addr'
> IB/Verbs: Reform cma_acquire_dev()
> IB/Verbs: Reform rest part in IB-core cma
> IB/Verbs: Use management helper cap_ib_mad()
> IB/Verbs: Use management helper cap_ib_smi()
> IB/Verbs: Use management helper cap_ib_cm()
> IB/Verbs: Use management helper cap_iw_cm()
> IB/Verbs: Use management helper cap_ib_sa()
> IB/Verbs: Use management helper cap_ib_mcast()
> IB/Verbs: Use management helper cap_read_multi_sge()
> IB/Verbs: Use management helper cap_af_ib()
> IB/Verbs: Use management helper cap_eth_ah()
> IB/Verbs: Clean up rdma_ib_or_iboe()
> IB/Verbs: Cleanup rdma_node_get_transport()
>
> drivers/infiniband/core/agent.c | 2 +-
> drivers/infiniband/core/cm.c | 20 +-
> drivers/infiniband/core/cma.c | 282 ++++++++++++---------------
> drivers/infiniband/core/device.c | 1 +
> drivers/infiniband/core/mad.c | 43 ++--
> drivers/infiniband/core/multicast.c | 12 +-
> drivers/infiniband/core/sa_query.c | 30 +--
> drivers/infiniband/core/ucm.c | 3 +-
> drivers/infiniband/core/ucma.c | 25 +--
> drivers/infiniband/core/user_mad.c | 26 ++-
> drivers/infiniband/core/verbs.c | 31 +--
> drivers/infiniband/hw/amso1100/c2_provider.c | 7 +
> drivers/infiniband/hw/cxgb3/iwch_provider.c | 7 +
> drivers/infiniband/hw/cxgb4/provider.c | 7 +
> drivers/infiniband/hw/ehca/ehca_hca.c | 6 +
> drivers/infiniband/hw/ehca/ehca_iverbs.h | 3 +
> drivers/infiniband/hw/ehca/ehca_main.c | 1 +
> drivers/infiniband/hw/ipath/ipath_verbs.c | 7 +
> drivers/infiniband/hw/mlx4/main.c | 10 +
> drivers/infiniband/hw/mlx5/main.c | 7 +
> drivers/infiniband/hw/mthca/mthca_provider.c | 7 +
> drivers/infiniband/hw/nes/nes_verbs.c | 6 +
> drivers/infiniband/hw/ocrdma/ocrdma_main.c | 1 +
> drivers/infiniband/hw/ocrdma/ocrdma_verbs.c | 6 +
> drivers/infiniband/hw/ocrdma/ocrdma_verbs.h | 3 +
> drivers/infiniband/hw/qib/qib_verbs.c | 7 +
> drivers/infiniband/hw/usnic/usnic_ib_main.c | 1 +
> drivers/infiniband/hw/usnic/usnic_ib_verbs.c | 6 +
> drivers/infiniband/hw/usnic/usnic_ib_verbs.h | 2 +
> drivers/infiniband/ulp/ipoib/ipoib_main.c | 15 +-
> include/rdma/ib_verbs.h | 169 +++++++++++++++-
> net/sunrpc/xprtrdma/svc_rdma_recvfrom.c | 4 +-
> net/sunrpc/xprtrdma/svc_rdma_transport.c | 47 ++---
> 33 files changed, 503 insertions(+), 301 deletions(-)
>
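The before/after check quoted in the cover letter can be sketched in plain C. This is only an illustration under stated assumptions: the `sketch_*` names, the `struct sketch_port` row type, and the stand-in for query_transport() are hypothetical and are not the kernel's definitions.

```c
#include <stdbool.h>

/* Illustrative stand-ins only; the real enums live in include/rdma/ib_verbs.h. */
enum sketch_transport {
	SKETCH_TRANSPORT_IB,
	SKETCH_TRANSPORT_IWARP,
	SKETCH_TRANSPORT_USNIC_UDP,
	SKETCH_TRANSPORT_IBOE,	/* the new type proposed by this series */
};

enum sketch_link_layer {
	SKETCH_LINK_IB,
	SKETCH_LINK_ETH,
};

/* One row of the mapping list: the two properties the old code inspected. */
struct sketch_port {
	enum sketch_transport old_transport;
	enum sketch_link_layer link_layer;
};

/* Old style: infer IBoE from two separate properties. */
static inline bool sketch_is_iboe_old(const struct sketch_port *port)
{
	return port->old_transport == SKETCH_TRANSPORT_IB &&
	       port->link_layer == SKETCH_LINK_ETH;
}

/* Hypothetical stand-in for query_transport(): the "new-transport" column. */
static inline enum sketch_transport
sketch_query_transport(const struct sketch_port *port)
{
	if (sketch_is_iboe_old(port))
		return SKETCH_TRANSPORT_IBOE;
	return port->old_transport;
}

/* New style: a single comparison against the reported transport. */
static inline bool sketch_is_iboe_new(const struct sketch_port *port)
{
	return sketch_query_transport(port) == SKETCH_TRANSPORT_IBOE;
}
```

In this sketch an ocrdma-like row (IB transport, ETH link layer) is reported as IBOE, while mlx5-like and cxgb4-like rows keep their old transport, matching the mapping list.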

2015-04-24 14:30:00

by Tom Talpey

Subject: Re: [PATCH v6 01/26] IB/Verbs: Implement new callback query_transport()

On 4/24/2015 8:23 AM, Michael Wang wrote:
> Add new callback query_transport() and implement for each HW.
>
> Mapping List:
> node-type link-layer old-transport new-transport
> ...
> mlx4 IB_CA IB/ETH IB IB/IBOE
> mlx5 IB_CA IB IB IB
> ...
> diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c
> index 57c9809..b6f2f58 100644
> --- a/drivers/infiniband/hw/mlx5/main.c
> +++ b/drivers/infiniband/hw/mlx5/main.c
> @@ -262,6 +262,12 @@ out:
> return err;
> }
>
> +static enum rdma_transport_type
> +mlx5_ib_query_transport(struct ib_device *device, u8 port_num)
> +{
> + return RDMA_TRANSPORT_IB;
> +}
> +


Just noticed that mlx5 is not being coded as RoCE-capable like mlx4.
The mlx5 driver is for the new ConnectX-4, which implements all three
of IB, RoCE and RoCEv2, right? Are those last two not supported?

2015-04-24 14:35:50

by Michael Wang

Subject: Re: [PATCH v6 01/26] IB/Verbs: Implement new callback query_transport()



On 04/24/2015 04:29 PM, Tom Talpey wrote:
> On 4/24/2015 8:23 AM, Michael Wang wrote:
[snip]
>> +static enum rdma_transport_type
>> +mlx5_ib_query_transport(struct ib_device *device, u8 port_num)
>> +{
>> + return RDMA_TRANSPORT_IB;
>> +}
>> +
>
>
> Just noticed that mlx5 is not being coded as RoCE-capable like mlx4.
> The mlx5 driver is for the new ConnectX-4, which implements all three
> of IB, RoCE and RoCEv2, right? Are those last two not supported?

I'm not sure about the details of mlx5, but according to the current
implementation, its transport is IB without a link-layer callback,
which means it doesn't support IBoE...

And there is no method to change the port link-layer type as mlx4 did.
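To make the difference concrete, here is a rough sketch of an IB-only callback next to a per-port one. It assumes a hypothetical device structure that records a link layer per port; none of the `sketch_*` names come from the mlx4 or mlx5 drivers.

```c
#include <assert.h>

enum sketch_transport { SKETCH_TRANSPORT_IB, SKETCH_TRANSPORT_IBOE };
enum sketch_link_layer { SKETCH_LINK_IB, SKETCH_LINK_ETH };

#define SKETCH_MAX_PORTS 2

/* Hypothetical device: only the fields this sketch needs. */
struct sketch_device {
	unsigned char phys_port_cnt;
	enum sketch_link_layer port_link_layer[SKETCH_MAX_PORTS];
};

/* IB-only driver: every port reports IB, as in the mlx5 hunk quoted above. */
static inline enum sketch_transport
sketch_ib_only_query_transport(const struct sketch_device *dev,
			       unsigned char port_num)
{
	(void)dev;
	(void)port_num;
	return SKETCH_TRANSPORT_IB;
}

/* Dual-mode driver: the answer depends on how each port is configured. */
static inline enum sketch_transport
sketch_dual_query_transport(const struct sketch_device *dev,
			    unsigned char port_num)
{
	/* Port numbers are 1-based, as in the core code's port loops. */
	assert(port_num >= 1 && port_num <= dev->phys_port_cnt);

	return dev->port_link_layer[port_num - 1] == SKETCH_LINK_ETH ?
		SKETCH_TRANSPORT_IBOE : SKETCH_TRANSPORT_IB;
}
```

A driver that later gains Ethernet support would only need to move from the first shape to the second.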

Regards,
Michael Wang

>
>

2015-04-24 14:46:43

by Tom Talpey

Subject: Re: [PATCH v6 01/26] IB/Verbs: Implement new callback query_transport()

On 4/24/2015 10:35 AM, Michael Wang wrote:
>
>
> On 04/24/2015 04:29 PM, Tom Talpey wrote:
>> On 4/24/2015 8:23 AM, Michael Wang wrote:
> [snip]
>>> +static enum rdma_transport_type
>>> +mlx5_ib_query_transport(struct ib_device *device, u8 port_num)
>>> +{
>>> + return RDMA_TRANSPORT_IB;
>>> +}
>>> +
>>
>>
>> Just noticed that mlx5 is not being coded as RoCE-capable like mlx4.
>> The mlx5 driver is for the new ConnectX-4, which implements all three
>> of IB, RoCE and RoCEv2, right? Are those last two not supported?
>
> I'm not sure about the details of mlx5, but according to the current
> implementation, its transport is IB without a link-layer callback,
> which means it doesn't support IBoE...
>
> And there is no method to change the port link-layer type as mlx4 did.

Hal, is that correct?

From the Mellanox website:

http://www.mellanox.com/related-docs/products/IB_Adapter_card_brochure_c_2_3.pdf

"ConnectX-4
ConnectX-4 adapter cards with Virtual Protocol Interconnect (VPI),
supporting EDR 100Gb/s InfiniBand and 100Gb/s Ethernet connectivity,
provide the...

"Virtual Protocol Interconnect
VPI flexibility enables any standard networking, clustering, storage,
and management protocol to seamlessly operate over any converged network
leveraging a consolidated software stack. Each port can operate on
InfiniBand, Ethernet, or Data Center Bridging (DCB) fabrics, and
supports IP over InfiniBand (IPoIB), Ethernet over InfiniBand (EoIB)
and RDMA over Converged Ethernet (RoCE and RoCEv2)."

2015-04-24 15:12:20

by Liran Liss

Subject: RE: [PATCH v6 01/26] IB/Verbs: Implement new callback query_transport()

> From: [email protected] [mailto:linux-rdma-
>
[snip]
> diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
> index 65994a1..d54f91e 100644
> --- a/include/rdma/ib_verbs.h
> +++ b/include/rdma/ib_verbs.h
> @@ -75,10 +75,13 @@ enum rdma_node_type {
> };
>
> enum rdma_transport_type {
> + /* legacy for users */
> RDMA_TRANSPORT_IB,
> RDMA_TRANSPORT_IWARP,
> RDMA_TRANSPORT_USNIC,
> - RDMA_TRANSPORT_USNIC_UDP
> + RDMA_TRANSPORT_USNIC_UDP,
> + /* new transport */
> + RDMA_TRANSPORT_IBOE,

Remove RDMA_TRANSPORT_IBOE - it is not a transport.
RoCE uses the IBTA transport.

Any code that needs to test for RoCE should invoke a specific helper, e.g., rdma_protocol_iboe().
This is what you currently call "rdma_tech_iboe" in patch 02/26.

I think that pretty much everybody agrees that rdma_protocol_*() is a better name than rdma_tech_*(), right?
So, let's change this.

The semantics are: "check that a link supports a certain wire protocol, or a set of wire protocols", where 'certain'
refers to the specific helper...
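As a rough illustration of those semantics, the helpers could be built from the transport and link layer that are already available, with no new transport value. This is a sketch under that assumption; the `sketch_*` names and the `struct sketch_port` type are hypothetical.

```c
#include <stdbool.h>

/* Stand-ins for the existing transport and link-layer enums. */
enum sketch_transport { SKETCH_TRANSPORT_IB, SKETCH_TRANSPORT_IWARP };
enum sketch_link_layer { SKETCH_LINK_INFINIBAND, SKETCH_LINK_ETHERNET };

/* Hypothetical per-port view of what the core can already query. */
struct sketch_port {
	enum sketch_transport transport;
	enum sketch_link_layer link_layer;
};

/* Native InfiniBand: IB transport over an InfiniBand link. */
static inline bool sketch_protocol_ib(const struct sketch_port *port)
{
	return port->transport == SKETCH_TRANSPORT_IB &&
	       port->link_layer == SKETCH_LINK_INFINIBAND;
}

/* RoCE/IBoE: IB transport over an Ethernet link. */
static inline bool sketch_protocol_iboe(const struct sketch_port *port)
{
	return port->transport == SKETCH_TRANSPORT_IB &&
	       port->link_layer == SKETCH_LINK_ETHERNET;
}

/* iWARP: identified by its transport alone. */
static inline bool sketch_protocol_iwarp(const struct sketch_port *port)
{
	return port->transport == SKETCH_TRANSPORT_IWARP;
}
```

Each helper answers one wire-protocol question, so callers never compare transport or link-layer values themselves.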

2015-04-24 15:16:00

by Liran Liss

Subject: RE: [PATCH v6 02/26] IB/Verbs: Implement raw management helpers

> From: [email protected] [mailto:linux-rdma-
>
> Add raw helpers:
> rdma_tech_ib
> rdma_tech_iboe
> rdma_tech_iwarp
> rdma_ib_or_iboe (transition, clean up later)
> To help us detect which technology the port supported.
>

Replace "rdma_tech_*" with "rdma_protocol_*".

> Cc: Hal Rosenstock <[email protected]>
> Cc: Steve Wise <[email protected]>
> Cc: Tom Talpey <[email protected]>
> Cc: Jason Gunthorpe <[email protected]>
> Cc: Doug Ledford <[email protected]>
> Cc: Ira Weiny <[email protected]>
> Cc: Sean Hefty <[email protected]>
> Signed-off-by: Michael Wang <[email protected]>
> ---
> include/rdma/ib_verbs.h | 25 +++++++++++++++++++++++++
> 1 file changed, 25 insertions(+)
>
> diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
> index d54f91e..a12e876 100644
> --- a/include/rdma/ib_verbs.h
> +++ b/include/rdma/ib_verbs.h
> @@ -1748,6 +1748,31 @@ int ib_query_port(struct ib_device *device,
> enum rdma_link_layer rdma_port_get_link_layer(struct ib_device *device,
> u8 port_num);
>
> +static inline int rdma_tech_ib(struct ib_device *device, u8 port_num)
> +{
> + return device->query_transport(device, port_num)
> + == RDMA_TRANSPORT_IB;
> +}
> +
> +static inline int rdma_tech_iboe(struct ib_device *device, u8 port_num)
> +{
> + return device->query_transport(device, port_num)
> + == RDMA_TRANSPORT_IBOE;

Remove RDMA_TRANSPORT_IBOE.
In the current code, the test should be: (IB transport && Ethernet link layer).

We can later consider each provider declaring the transports directly.

> +}
> +
> +static inline int rdma_tech_iwarp(struct ib_device *device, u8 port_num)
> +{
> + return device->query_transport(device, port_num)
> + == RDMA_TRANSPORT_IWARP;
> +}
> +
> +static inline int rdma_ib_or_iboe(struct ib_device *device, u8 port_num)
> +{
> + enum rdma_transport_type tp = device->query_transport(device, port_num);
> +
> + return (tp == RDMA_TRANSPORT_IB || tp == RDMA_TRANSPORT_IBOE);
> +}

Remove RDMA_TRANSPORT_IBOE.
Just test against RDMA_TRANSPORT_IB.

> +
> int ib_query_gid(struct ib_device *device,
> u8 port_num, int index, union ib_gid *gid);
>

2015-04-24 16:00:39

by Doug Ledford

Subject: Re: [PATCH v6 01/26] IB/Verbs: Implement new callback query_transport()

On Fri, 2015-04-24 at 10:46 -0400, Tom Talpey wrote:
> On 4/24/2015 10:35 AM, Michael Wang wrote:
> >
> >
> > On 04/24/2015 04:29 PM, Tom Talpey wrote:
> >> On 4/24/2015 8:23 AM, Michael Wang wrote:
> > [snip]
> >>> +static enum rdma_transport_type
> >>> +mlx5_ib_query_transport(struct ib_device *device, u8 port_num)
> >>> +{
> >>> + return RDMA_TRANSPORT_IB;
> >>> +}
> >>> +
> >>
> >>
> >> Just noticed that mlx5 is not being coded as RoCE-capable like mlx4.
> >> The mlx5 driver is for the new ConnectX-4, which implements all three
> >> of IB, RoCE and RoCEv2, right? Are those last two not supported?
> >
> > I'm not sure about the details of mlx5, but according to the current
> > implementation, its transport is IB without a link-layer callback,
> > which means it doesn't support IBoE...
> >
> > And there is no method to change the port link-layer type as mlx4 did.
>
> Hal, is that correct?

The mlx5 Ethernet driver has not yet been submitted to the linux kernel.
Until it lands, the driver is IB only. When it does land, it will need
to update this area of code.

--
Doug Ledford <[email protected]>
GPG KeyID: 0E572FDD




2015-04-24 16:29:31

by Jason Gunthorpe

Subject: Re: [PATCH v6 02/26] IB/Verbs: Implement raw management helpers

On Fri, Apr 24, 2015 at 03:15:54PM +0000, Liran Liss wrote:
> > From: [email protected] [mailto:linux-rdma-

> > +static inline int rdma_tech_iboe(struct ib_device *device, u8 port_num)
> > +{
> > + return device->query_transport(device, port_num)
> > + == RDMA_TRANSPORT_IBOE;
>
> Remove RDMA_TRANSPORT_IBOE.
> In the current code, the test should be: (IB transport && Ethernet link layer).

No, if this rdma_tech stuff is to reflect the specification the port
implements, then RoCEE is a valid specification (IBA Annex A16), as is
RoCEEv2 (A17).

This patch set is trying to drop the link layer concept entirely.

Jason

2015-04-27 07:20:10

by Devesh Sharma

Subject: RE: [PATCH v6 00/26] IB/Verbs: IB Management Helpers

Tested-By: Devesh Sharma <[email protected]>

I am still in the process of reviewing the series. Will respond soon.

-Regards
Devesh
> -----Original Message-----
> From: Michael Wang [mailto:[email protected]]
> Sent: Friday, April 24, 2015 6:43 PM
> To: Roland Dreier; Sean Hefty; Hal Rosenstock; [email protected];
> [email protected]
> Cc: Tom Tucker; Steve Wise; Hoang-Nam Nguyen; Christoph Raisch; Mike
> Marciniszyn; Eli Cohen; Faisal Latif; Jack Morgenstein; Or Gerlitz; Haggai Eran;
> Ira Weiny; Tom Talpey; Jason Gunthorpe; Doug Ledford; Devesh Sharma; Liran
> Liss; Dave Goodell
> Subject: Re: [PATCH v6 00/26] IB/Verbs: IB Management Helpers
>
> Add missing Cc:
> Devesh Sharma <[email protected]>
> Liran Liss <[email protected]>
> Dave Goodell <[email protected]>
>
> Regards,
> Michael Wang
>
> On 04/24/2015 02:23 PM, Michael Wang wrote:
> > [snip]

2015-04-27 07:39:16

by Michael Wang

Subject: Re: [PATCH v6 01/26] IB/Verbs: Implement new callback query_transport()



On 04/24/2015 05:12 PM, Liran Liss wrote:
>> From: [email protected] [mailto:linux-rdma-
>>
> [snip]
>> a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h index
>> 65994a1..d54f91e 100644
>> --- a/include/rdma/ib_verbs.h
>> +++ b/include/rdma/ib_verbs.h
>> @@ -75,10 +75,13 @@ enum rdma_node_type { };
>>
>> enum rdma_transport_type {
>> + /* legacy for users */
>> RDMA_TRANSPORT_IB,
>> RDMA_TRANSPORT_IWARP,
>> RDMA_TRANSPORT_USNIC,
>> - RDMA_TRANSPORT_USNIC_UDP
>> + RDMA_TRANSPORT_USNIC_UDP,
>> + /* new transport */
>> + RDMA_TRANSPORT_IBOE,
>
> Remove RDMA_TRANSPORT_IBOE - it is not a transport.
> ROCE uses IBTA transport.
>
> Any code that needs to test for RoCE should invoke a specific helper, e.g., rdma_protocol_iboe().
> This is what you currently call "rdma_tech_iboe" in patch 02/26.
>
> I think that pretty much everybody agrees that rdma_protocol_*() is a better name than rdma_tech_*(), right?
> So, let's change this.

Sure, that sounds reasonable. As for IBOE, we still need it to tell
IB ports from Ethernet ports without checking the link layer, so what
about a new enum for the protocol type?

Like:

enum rdma_protocol {
	RDMA_PROTOCOL_IB,
	RDMA_PROTOCOL_IBOE,
	RDMA_PROTOCOL_IWARP,
	RDMA_PROTOCOL_USNIC_UDP
};

Then we could use query_protocol() to ask the device for its protocol
type, and there would be no more mixing with the legacy transport
type :-)
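A minimal sketch of how that could fit together, assuming query_protocol() becomes a per-device callback taking a port number. The enum is the one proposed above; `struct sketch_device`, the example driver callback, and the `sketch_*` helpers are hypothetical and not part of the posted series.

```c
#include <stdbool.h>

/* The enum as proposed above. */
enum rdma_protocol {
	RDMA_PROTOCOL_IB,
	RDMA_PROTOCOL_IBOE,
	RDMA_PROTOCOL_IWARP,
	RDMA_PROTOCOL_USNIC_UDP
};

/* Hypothetical stand-in for struct ib_device: just the proposed callback. */
struct sketch_device {
	enum rdma_protocol (*query_protocol)(struct sketch_device *device,
					     unsigned char port_num);
};

/* Example callback for an Ethernet-only RoCE device (assumption). */
static inline enum rdma_protocol
sketch_roce_query_protocol(struct sketch_device *device,
			   unsigned char port_num)
{
	(void)device;
	(void)port_num;
	return RDMA_PROTOCOL_IBOE;
}

/* Raw helper in the rdma_protocol_*() naming style. */
static inline bool sketch_protocol_iboe(struct sketch_device *device,
					unsigned char port_num)
{
	return device->query_protocol(device, port_num) == RDMA_PROTOCOL_IBOE;
}

/* Capability helper layered on top, mirroring cap_eth_ah() in patch 24. */
static inline bool sketch_cap_eth_ah(struct sketch_device *device,
				     unsigned char port_num)
{
	return sketch_protocol_iboe(device, port_num);
}
```

With this shape the legacy transport enum stays untouched, and callers only ever see the protocol and capability helpers.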

Regards,
Michael Wang

>
> The semantics are: "check that a link supports a certain wire protocol, or a set of wire protocols", where 'certain'
> refers to the specific helper...
>
>

2015-04-27 07:41:52

by Michael Wang

Subject: Re: [PATCH v6 02/26] IB/Verbs: Implement raw management helpers



On 04/24/2015 06:29 PM, Jason Gunthorpe wrote:
> On Fri, Apr 24, 2015 at 03:15:54PM +0000, Liran Liss wrote:
>>> From: [email protected] [mailto:linux-rdma-
>
>>> +static inline int rdma_tech_iboe(struct ib_device *device, u8 port_num)
>>> +{
>>> + return device->query_transport(device, port_num)
>>> + == RDMA_TRANSPORT_IBOE;
>>
>> Remove RDMA_TRANSPORT_IBOE.
>> In the current code, the test should be: (IB transport && Ethernet link layer).
>
> No, if this rmda_tech stuff is to reflect the specification the port
> implements, then RoCEE is a valid specification (IBA Annex A16), as it
> RoCEEv2 (A17).
>
> This patch set is trying to drop then link layer concept entirely.

I think a new protocol enum could help resolve the conflict here: we
can still get rid of the link layer while leaving the legacy
transport type alone :-)

Regards,
Michael Wang

>
> Jason
>

2015-04-27 07:44:52

by Michael Wang

[permalink] [raw]
Subject: Re: [PATCH v6 00/26] IB/Verbs: IB Management Helpers

On 04/27/2015 09:19 AM, Devesh Sharma wrote:
> Tested-By: Devesh Sharma <[email protected]>
>
> I am still in process of reviewing the series. Will respond soon.

Thanks for the testing :-)

We are now planning another version to resolve the conflict between the
transport and protocol concepts. I think there will be some trivial
naming-related changes in the next version.

Regards,
Michael Wang

>
> -Regards
> Devesh
>> -----Original Message-----
>> From: Michael Wang [mailto:[email protected]]
>> Sent: Friday, April 24, 2015 6:43 PM
>> To: Roland Dreier; Sean Hefty; Hal Rosenstock; [email protected];
>> [email protected]
>> Cc: Tom Tucker; Steve Wise; Hoang-Nam Nguyen; Christoph Raisch; Mike
>> Marciniszyn; Eli Cohen; Faisal Latif; Jack Morgenstein; Or Gerlitz; Haggai Eran;
>> Ira Weiny; Tom Talpey; Jason Gunthorpe; Doug Ledford; Devesh Sharma; Liran
>> Liss; Dave Goodell
>> Subject: Re: [PATCH v6 00/26] IB/Verbs: IB Management Helpers
>>
>> Add missing Cc:
>> Devesh Sharma <[email protected]>"
>> Liran Liss <[email protected]>"
>> Dave Goodell <[email protected]>"
>>
>> Regards,
>> Michael Wang
>>
>> On 04/24/2015 02:23 PM, Michael Wang wrote:
>>> Since v5:
>>> * Thanks to Ira, Devesh for the review and testing :-)
>>> * Thanks for the comments from Steve, Tom, Jason, Hal, Devesh, Ira,
>>> Liran, Jason, Dave :-) Please remind me if anything missed :-P
>>> * Trivial fix for 4#
>>> * Drop the reform on acquiring link-layer in 9#
>>> * Drop cap_ipoib()
>>>
>>> There are plenty of lengthy code to check the transport type of IB
>>> device, or the link layer type of it's port, but actually we are just
>>> speculating whether a particular management/feature is supported by the
>> device/port.
>>>
>>> Thus instead of inferring, we should have our own mechanism for IB
>>> management capability/protocol/feature checking, several proposals below.
>>>
>>> This patch set will reform the method of getting transport type, we
>>> will now using query_transport() instead of inferring from transport
>>> and link layer respectively, also we defined the new transport type to
>>> make the concept more reasonable.
>>>
>>> Mapping List:
>>> node-type link-layer old-transport new-transport
>>> nes RNIC ETH IWARP IWARP
>>> amso1100 RNIC ETH IWARP IWARP
>>> cxgb3 RNIC ETH IWARP IWARP
>>> cxgb4 RNIC ETH IWARP IWARP
>>> usnic USNIC_UDP ETH USNIC_UDP USNIC_UDP
>>> ocrdma IB_CA ETH IB IBOE
>>> mlx4 IB_CA IB/ETH IB IB/IBOE
>>> mlx5 IB_CA IB IB IB
>>> ehca IB_CA IB IB IB
>>> ipath IB_CA IB IB IB
>>> mthca IB_CA IB IB IB
>>> qib IB_CA IB IB IB
>>>
>>> For example:
>>> if (transport == IB) && (link-layer == ETH) will now become:
>>> if (query_transport() == IBOE)
>>>
>>> Thus we will be able to get rid of the respective transport and
>>> link-layer checking, and it will help us to add new
>>> protocol/Technology (like OPA) more easier, also with the introduced
>>> management helpers, IB management logical will be more clear and easier
>> for extending.
>>>
>>> Highlights:
>>> The patch set covered a wide range of IB stuff, thus for those who are
>>> familiar with the particular part, your suggestion would be
>>> invaluable ;-)
>>>
>>> Patch 1#~15# included all the logical reform, 16#~25# introduced the
>>> management helpers, 26#~27# do clean up.
>>>
>>> we appreciate for those one who have the HW willing to provide
>>> Tested-by :-)
>>>
>>> Doug suggested the bitmask mechanism:
>>> https://www.mail-archive.com/linux-
>> [email protected]/msg23765.html
>>> which could be the plan for future reforming, we prefer that to be another
>>> series which focus on semantic and performance.
>>>
>>> This patch-set is somewhat 'bloated' now and it may be a good timing for
>>> staging, I'd like to suggest we focus on improving existed helpers and push
>>> all the further reforms into next series ;-)
>>>
>>> We now have a repository based on latest infiniband/for-next with this
>>> series applied:
>>> [email protected]:ywang-pb/infiniband-wy.git
>>>
>>> Proposals:
>>> Sean:
>>> https://www.mail-archive.com/linux-
>> [email protected]/msg23339.html
>>> Doug:
>>> https://www.mail-archive.com/linux-
>> [email protected]/msg23418.html
>>> https://www.mail-archive.com/linux-
>> [email protected]/msg23765.html
>>> Jason:
>>> https://www.mail-archive.com/linux-
>> [email protected]/msg23425.html
>>>
>>> Michael Wang (26):
>>> IB/Verbs: Implement new callback query_transport()
>>> IB/Verbs: Implement raw management helpers
>>> IB/Verbs: Reform IB-core mad/agent/user_mad
>>> IB/Verbs: Reform IB-core cm
>>> IB/Verbs: Reform IB-core sa_query
>>> IB/Verbs: Reform IB-core multicast
>>> IB/Verbs: Reform IB-ulp ipoib
>>> IB/Verbs: Reform IB-ulp xprtrdma
>>> IB/Verbs: Reform IB-core verbs
>>> IB/Verbs: Reform cm related part in IB-core cma/ucm
>>> IB/Verbs: Reform route related part in IB-core cma
>>> IB/Verbs: Reform mcast related part in IB-core cma
>>> IB/Verbs: Reserve legacy transport type in 'dev_addr'
>>> IB/Verbs: Reform cma_acquire_dev()
>>> IB/Verbs: Reform rest part in IB-core cma
>>> IB/Verbs: Use management helper cap_ib_mad()
>>> IB/Verbs: Use management helper cap_ib_smi()
>>> IB/Verbs: Use management helper cap_ib_cm()
>>> IB/Verbs: Use management helper cap_iw_cm()
>>> IB/Verbs: Use management helper cap_ib_sa()
>>> IB/Verbs: Use management helper cap_ib_mcast()
>>> IB/Verbs: Use management helper cap_read_multi_sge()
>>> IB/Verbs: Use management helper cap_af_ib()
>>> IB/Verbs: Use management helper cap_eth_ah()
>>> IB/Verbs: Clean up rdma_ib_or_iboe()
>>> IB/Verbs: Cleanup rdma_node_get_transport()
>>>
>>> drivers/infiniband/core/agent.c | 2 +-
>>> drivers/infiniband/core/cm.c | 20 +-
>>> drivers/infiniband/core/cma.c | 282 ++++++++++++---------------
>>> drivers/infiniband/core/device.c | 1 +
>>> drivers/infiniband/core/mad.c | 43 ++--
>>> drivers/infiniband/core/multicast.c | 12 +-
>>> drivers/infiniband/core/sa_query.c | 30 +--
>>> drivers/infiniband/core/ucm.c | 3 +-
>>> drivers/infiniband/core/ucma.c | 25 +--
>>> drivers/infiniband/core/user_mad.c | 26 ++-
>>> drivers/infiniband/core/verbs.c | 31 +--
>>> drivers/infiniband/hw/amso1100/c2_provider.c | 7 +
>>> drivers/infiniband/hw/cxgb3/iwch_provider.c | 7 +
>>> drivers/infiniband/hw/cxgb4/provider.c | 7 +
>>> drivers/infiniband/hw/ehca/ehca_hca.c | 6 +
>>> drivers/infiniband/hw/ehca/ehca_iverbs.h | 3 +
>>> drivers/infiniband/hw/ehca/ehca_main.c | 1 +
>>> drivers/infiniband/hw/ipath/ipath_verbs.c | 7 +
>>> drivers/infiniband/hw/mlx4/main.c | 10 +
>>> drivers/infiniband/hw/mlx5/main.c | 7 +
>>> drivers/infiniband/hw/mthca/mthca_provider.c | 7 +
>>> drivers/infiniband/hw/nes/nes_verbs.c | 6 +
>>> drivers/infiniband/hw/ocrdma/ocrdma_main.c | 1 +
>>> drivers/infiniband/hw/ocrdma/ocrdma_verbs.c | 6 +
>>> drivers/infiniband/hw/ocrdma/ocrdma_verbs.h | 3 +
>>> drivers/infiniband/hw/qib/qib_verbs.c | 7 +
>>> drivers/infiniband/hw/usnic/usnic_ib_main.c | 1 +
>>> drivers/infiniband/hw/usnic/usnic_ib_verbs.c | 6 +
>>> drivers/infiniband/hw/usnic/usnic_ib_verbs.h | 2 +
>>> drivers/infiniband/ulp/ipoib/ipoib_main.c | 15 +-
>>> include/rdma/ib_verbs.h | 169 +++++++++++++++-
>>> net/sunrpc/xprtrdma/svc_rdma_recvfrom.c | 4 +-
>>> net/sunrpc/xprtrdma/svc_rdma_transport.c | 47 ++---
>>> 33 files changed, 503 insertions(+), 301 deletions(-)
>>>

2015-04-27 21:52:45

by Ira Weiny

[permalink] [raw]
Subject: Re: [PATCH v6 01/26] IB/Verbs: Implement new callback query_transport()

On Mon, Apr 27, 2015 at 09:39:05AM +0200, Michael Wang wrote:
>
>
> On 04/24/2015 05:12 PM, Liran Liss wrote:
> >> From: [email protected] [mailto:linux-rdma-
> >>
> > [snip]
> >> a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h index
> >> 65994a1..d54f91e 100644
> >> --- a/include/rdma/ib_verbs.h
> >> +++ b/include/rdma/ib_verbs.h
> >> @@ -75,10 +75,13 @@ enum rdma_node_type { };
> >>
> >> enum rdma_transport_type {
> >> + /* legacy for users */
> >> RDMA_TRANSPORT_IB,
> >> RDMA_TRANSPORT_IWARP,
> >> RDMA_TRANSPORT_USNIC,
> >> - RDMA_TRANSPORT_USNIC_UDP
> >> + RDMA_TRANSPORT_USNIC_UDP,
> >> + /* new transport */
> >> + RDMA_TRANSPORT_IBOE,
> >
> > Remove RDMA_TRANSPORT_IBOE - it is not a transport.
> > ROCE uses IBTA transport.
> >
> > If any code should test for ROCE should invoke a specific helper, e.g., rdma_protocol_iboe().
> > This is what you currently call "rdma_tech_iboe" is patch 02/26.
> >
> > I think that pretty much everybody agrees that rdma_protocol_*() is a better name than rdma_tech_*(), right?
> > So, let's change this.
>
> Sure, sounds reasonable now, about the IBOE, we still need it to
> separate the port support IB/ETH without the check on link-layer,
> So what about a new enum on protocol type?
>
> Like:
>
> enum rdma_protocol {
> RDMA_PROTOCOL_IB,
> RDMA_PROTOCOL_IBOE,
> RDMA_PROTOCOL_IWARP,
> RDMA_PROTOCOL_USNIC_UDP
> };
>
> So we could use query_protocol() to ask device provide the protocol
> type, and there will be no mixing with the legacy transport type
> anymore :-)

I'm ok with that. I like introducing a unique namespace which is clearly
different from the previous "transport" one.

Ira

2015-04-28 00:16:32

by Tom Talpey

[permalink] [raw]
Subject: Re: [PATCH v6 01/26] IB/Verbs: Implement new callback query_transport()

On 4/27/2015 2:52 PM, ira.weiny wrote:
> On Mon, Apr 27, 2015 at 09:39:05AM +0200, Michael Wang wrote:
>>
>>
>> On 04/24/2015 05:12 PM, Liran Liss wrote:
>>>> From: [email protected] [mailto:linux-rdma-
>>>>
>>> [snip]
>>>> a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h index
>>>> 65994a1..d54f91e 100644
>>>> --- a/include/rdma/ib_verbs.h
>>>> +++ b/include/rdma/ib_verbs.h
>>>> @@ -75,10 +75,13 @@ enum rdma_node_type { };
>>>>
>>>> enum rdma_transport_type {
>>>> + /* legacy for users */
>>>> RDMA_TRANSPORT_IB,
>>>> RDMA_TRANSPORT_IWARP,
>>>> RDMA_TRANSPORT_USNIC,
>>>> - RDMA_TRANSPORT_USNIC_UDP
>>>> + RDMA_TRANSPORT_USNIC_UDP,
>>>> + /* new transport */
>>>> + RDMA_TRANSPORT_IBOE,
>>>
>>> Remove RDMA_TRANSPORT_IBOE - it is not a transport.
>>> ROCE uses IBTA transport.
>>>
>>> If any code should test for ROCE should invoke a specific helper, e.g., rdma_protocol_iboe().
>>> This is what you currently call "rdma_tech_iboe" is patch 02/26.
>>>
>>> I think that pretty much everybody agrees that rdma_protocol_*() is a better name than rdma_tech_*(), right?
>>> So, let's change this.
>>
>> Sure, sounds reasonable now, about the IBOE, we still need it to
>> separate the port support IB/ETH without the check on link-layer,
>> So what about a new enum on protocol type?
>>
>> Like:
>>
>> enum rdma_protocol {
>> RDMA_PROTOCOL_IB,
>> RDMA_PROTOCOL_IBOE,
>> RDMA_PROTOCOL_IWARP,
>> RDMA_PROTOCOL_USNIC_UDP
>> };
>>
>> So we could use query_protocol() to ask device provide the protocol
>> type, and there will be no mixing with the legacy transport type
>> anymore :-)
>
> I'm ok with that. I like introducing a unique namespace which is clearly
> different from the previous "transport" one.

I agree the word "transport" takes things into the weeds.

But on the topic of naming protocols, I've been wondering, is there
some reason that "IBOE" is being used instead of "RoCE"? The IBOE
protocol used to exist and is not the same as the currently
standardized RoCE, right?

Also wondering, why add "UDP" to USNIC, is there a different USNIC?

Naming multiple layers together seems confusing and maybe in the end
will create more code to deal with the differences. For example, what
token will RoCEv2 take? RoCE_UDP, RoCE_v2 or ... ?

2015-04-28 00:36:16

by Doug Ledford

[permalink] [raw]
Subject: Re: [PATCH v6 01/26] IB/Verbs: Implement new callback query_transport()

On Mon, 2015-04-27 at 17:16 -0700, Tom Talpey wrote:
> On 4/27/2015 2:52 PM, ira.weiny wrote:
> > On Mon, Apr 27, 2015 at 09:39:05AM +0200, Michael Wang wrote:
> >>
> >>
> >> On 04/24/2015 05:12 PM, Liran Liss wrote:
> >>>> From: [email protected] [mailto:linux-rdma-
> >>>>
> >>> [snip]
> >>>> a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h index
> >>>> 65994a1..d54f91e 100644
> >>>> --- a/include/rdma/ib_verbs.h
> >>>> +++ b/include/rdma/ib_verbs.h
> >>>> @@ -75,10 +75,13 @@ enum rdma_node_type { };
> >>>>
> >>>> enum rdma_transport_type {
> >>>> + /* legacy for users */
> >>>> RDMA_TRANSPORT_IB,
> >>>> RDMA_TRANSPORT_IWARP,
> >>>> RDMA_TRANSPORT_USNIC,
> >>>> - RDMA_TRANSPORT_USNIC_UDP
> >>>> + RDMA_TRANSPORT_USNIC_UDP,
> >>>> + /* new transport */
> >>>> + RDMA_TRANSPORT_IBOE,
> >>>
> >>> Remove RDMA_TRANSPORT_IBOE - it is not a transport.
> >>> ROCE uses IBTA transport.
> >>>
> >>> If any code should test for ROCE should invoke a specific helper, e.g., rdma_protocol_iboe().
> >>> This is what you currently call "rdma_tech_iboe" is patch 02/26.
> >>>
> >>> I think that pretty much everybody agrees that rdma_protocol_*() is a better name than rdma_tech_*(), right?
> >>> So, let's change this.
> >>
> >> Sure, sounds reasonable now, about the IBOE, we still need it to
> >> separate the port support IB/ETH without the check on link-layer,
> >> So what about a new enum on protocol type?
> >>
> >> Like:
> >>
> >> enum rdma_protocol {
> >> RDMA_PROTOCOL_IB,
> >> RDMA_PROTOCOL_IBOE,
> >> RDMA_PROTOCOL_IWARP,
> >> RDMA_PROTOCOL_USNIC_UDP
> >> };
> >>
> >> So we could use query_protocol() to ask device provide the protocol
> >> type, and there will be no mixing with the legacy transport type
> >> anymore :-)
> >
> > I'm ok with that. I like introducing a unique namespace which is clearly
> > different from the previous "transport" one.
>
> I agree the word "transport" takes things into the weeds.
>
> But on the topic of naming protocols, I've been wondering, is there
> some reason that "IBOE" is being used instead of "RoCE"?

Because back in the day, when RoCE was accepted into the kernel, I'm
pretty sure it was prior to the IBTA's final stamp of approval and
before the name was set on RoCE, so IBoE was chosen upstream as the more
"correct" name because it properly denoted what it was deemed to truly
be: IB Verbs over Ethernet.

> The IBOE
> protocol used to exist and is not the same as the currently
> standardized RoCE, right?

I don't believe so. To my knowledge, there was never an IBoE except in
linux upstream parlance.

> Also wondering, why add "UDP" to USNIC, is there a different USNIC?

Yes, there are two transports, one a distinct ethertype and one that
encapsulates USNIC in UDP.

> Naming multiple layers together seems confusing and maybe in the end
> will create more code to deal with the differences. For example, what
> token will RoCEv2 take? RoCE_UDP, RoCE_v2 or ... ?

Uncertain as of now.

--
Doug Ledford <[email protected]>
GPG KeyID: 0E572FDD



Attachments:
signature.asc (819.00 B)
This is a digitally signed message part

2015-04-28 00:53:25

by Tom Talpey

[permalink] [raw]
Subject: Re: [PATCH v6 01/26] IB/Verbs: Implement new callback query_transport()

On 4/27/2015 5:36 PM, Doug Ledford wrote:
> On Mon, 2015-04-27 at 17:16 -0700, Tom Talpey wrote:
>> On 4/27/2015 2:52 PM, ira.weiny wrote:
>>> On Mon, Apr 27, 2015 at 09:39:05AM +0200, Michael Wang wrote:
>>>> On 04/24/2015 05:12 PM, Liran Liss wrote:
>>>>> [snip]
>>>>
>>>> Like:
>>>>
>>>> enum rdma_protocol {
>>>> RDMA_PROTOCOL_IB,
>>>> RDMA_PROTOCOL_IBOE,
>>>> RDMA_PROTOCOL_IWARP,
>>>> RDMA_PROTOCOL_USNIC_UDP
>>>> };
>>>>
>>>> So we could use query_protocol() to ask device provide the protocol
>>>> type, and there will be no mixing with the legacy transport type
>>>> anymore :-)
>>>
>>> I'm ok with that. I like introducing a unique namespace which is clearly
>>> different from the previous "transport" one.
>>
>> I agree the word "transport" takes things into the weeds.
>>
>> But on the topic of naming protocols, I've been wondering, is there
>> some reason that "IBOE" is being used instead of "RoCE"?
>
> Because back in the day, when RoCE was accepted into the kernel, I'm
> pretty sure it was prior to the IBTA's final stamp of approval and
> before the name was set on RoCE, so IBoE was chosen upstream as the more
> "correct" name because it properly denoted what it was deemed to truly
> be: IB Verbs over Ethernet.

Well history is all well and good, but it seems weird to not use the
current, standard name in new code. It confuses me, anyway, because
it seems like IBOE could easily mean something else.

>> Also wondering, why add "UDP" to USNIC, is there a different USNIC?
>
> Yes, there are two transports, one a distinct ethertype and one that
> encapsulates USNIC in UDP.

But this new enum isn't about transport, it's about protocol. So is
there one USNIC protocol, with a raw layering and a separate one with
UDP? Or is it one USNIC protocol with two different framings? Seems
there should be at least the USNIC protocol, without the _UDP
decoration, and I don't see it in the enum.

>
>> Naming multiple layers together seems confusing and maybe in the end
>> will create more code to deal with the differences. For example, what
>> token will RoCEv2 take? RoCE_UDP, RoCE_v2 or ... ?
>
> Uncertain as of now.

Ok, but it's imminent, right? What's the preference/guidance?

2015-04-28 01:25:11

by Doug Ledford

[permalink] [raw]
Subject: Re: [PATCH v6 01/26] IB/Verbs: Implement new callback query_transport()

On Mon, 2015-04-27 at 17:53 -0700, Tom Talpey wrote:
> On 4/27/2015 5:36 PM, Doug Ledford wrote:
> > On Mon, 2015-04-27 at 17:16 -0700, Tom Talpey wrote:
> >> On 4/27/2015 2:52 PM, ira.weiny wrote:
> >>> On Mon, Apr 27, 2015 at 09:39:05AM +0200, Michael Wang wrote:
> >>>> On 04/24/2015 05:12 PM, Liran Liss wrote:
> >>>>> [snip]
> >>>>
> >>>> Like:
> >>>>
> >>>> enum rdma_protocol {
> >>>> RDMA_PROTOCOL_IB,
> >>>> RDMA_PROTOCOL_IBOE,
> >>>> RDMA_PROTOCOL_IWARP,
> >>>> RDMA_PROTOCOL_USNIC_UDP
> >>>> };
> >>>>
> >>>> So we could use query_protocol() to ask device provide the protocol
> >>>> type, and there will be no mixing with the legacy transport type
> >>>> anymore :-)
> >>>
> >>> I'm ok with that. I like introducing a unique namespace which is clearly
> >>> different from the previous "transport" one.
> >>
> >> I agree the word "transport" takes things into the weeds.
> >>
> >> But on the topic of naming protocols, I've been wondering, is there
> >> some reason that "IBOE" is being used instead of "RoCE"?
> >
> > Because back in the day, when RoCE was accepted into the kernel, I'm
> > pretty sure it was prior to the IBTA's final stamp of approval and
> > before the name was set on RoCE, so IBoE was chosen upstream as the more
> > "correct" name because it properly denoted what it was deemed to truly
> > be: IB Verbs over Ethernet.
>
> Well history is all well and good, but it seems weird to not use the
> current, standard name in new code. It confuses me, anyway, because
> it seems like IBOE could easily mean something else.

Having some of it refer to things as IBOE and some as ROCE would be
similarly confusing, and switching existing IBOE usage to ROCE would
cause pain to people with out of tree drivers (Lustre is the main one I
know of). There's not a good answer here. There's only less sucky
ones.

> >> Also wondering, why add "UDP" to USNIC, is there a different USNIC?
> >
> > Yes, there are two transports, one a distinct ethertype and one that
> > encapsulates USNIC in UDP.
>
> But this new enum isn't about transport, it's about protocol. So is
> there one USNIC protocol, with a raw layering and a separate one with
> UDP? Or is it one USNIC protocol with two different framings? Seems
> there should be at least the USNIC protocol, without the _UDP
> decoration, and I don't see it in the enum.

Keep in mind that this enum was Liran's response to Michael's original
patch. In the enum in Michael's patch, there was both USNIC and
USNIC_UDP.

> >
> >> Naming multiple layers together seems confusing and maybe in the end
> >> will create more code to deal with the differences. For example, what
> >> token will RoCEv2 take? RoCE_UDP, RoCE_v2 or ... ?
> >
> > Uncertain as of now.
>
> Ok, but it's imminent, right? What's the preference/guidance?

There is a patchset from Devesh Sharma at Emulex. It added the RoCEv2
capability. As I recall, it used a new flag added to the existing port
capabilities bitmask and notably did not modify either the node type or
link layer that are currently used to differentiate between the
different protocols. That's from memory though, so I could be mistaken.

But that patchset was not written with this patchset in mind, and
merging the two may well change that. In any case, there is a proposed
spec to follow, so for now that's the preference/guidance (unless this
rework means that we need to depart from the spec on internals for
implementation reasons).
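
Purely as an illustration of the bitmask approach described above, and not taken from Devesh's patchset: the sketch below assumes a single capability bit on the port, while node type and link layer stay untouched. The flag name SKETCH_PORT_CAP_ROCE_V2, its bit position, and struct sketch_port_attr are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag name and bit position, for illustration only. */
#define SKETCH_PORT_CAP_ROCE_V2 (1u << 0)

/* Stand-in for the port attributes; not the real struct ib_port_attr. */
struct sketch_port_attr {
	int node_type;           /* left unchanged by the flag approach */
	int link_layer;          /* left unchanged by the flag approach */
	uint32_t port_cap_flags; /* existing capability bitmask gains one bit */
};

/* Returns 1 when the port advertises the hypothetical RoCEv2 capability bit. */
static inline int sketch_port_supports_roce_v2(const struct sketch_port_attr *attr)
{
	assert(attr != NULL);
	return (attr->port_cap_flags & SKETCH_PORT_CAP_ROCE_V2) != 0;
}
```

With this shape, a RoCE and a RoCEv2 port would report the same node type and link layer and differ only in the capability bit.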


--
Doug Ledford <[email protected]>
GPG KeyID: 0E572FDD



Attachments:
signature.asc (819.00 B)
This is a digitally signed message part

2015-04-28 01:49:16

by Tom Talpey

[permalink] [raw]
Subject: Re: [PATCH v6 01/26] IB/Verbs: Implement new callback query_transport()

On 4/27/2015 6:24 PM, Doug Ledford wrote:
> On Mon, 2015-04-27 at 17:53 -0700, Tom Talpey wrote:
>> On 4/27/2015 5:36 PM, Doug Ledford wrote:
>>> On Mon, 2015-04-27 at 17:16 -0700, Tom Talpey wrote:
>>>> On 4/27/2015 2:52 PM, ira.weiny wrote:
>>>>> On Mon, Apr 27, 2015 at 09:39:05AM +0200, Michael Wang wrote:
>>>>>> On 04/24/2015 05:12 PM, Liran Liss wrote:
>>>>>>> [snip]
>>>>>>
>>>>>> Like:
>>>>>>
>>>>>> enum rdma_protocol {
>>>>>> RDMA_PROTOCOL_IB,
>>>>>> RDMA_PROTOCOL_IBOE,
>>>>>> RDMA_PROTOCOL_IWARP,
>>>>>> RDMA_PROTOCOL_USNIC_UDP
>>>>>> };
>>>>>>
>>>>>> So we could use query_protocol() to ask device provide the protocol
>>>>>> type, and there will be no mixing with the legacy transport type
>>>>>> anymore :-)
>>>>>
>>>>> I'm ok with that. I like introducing a unique namespace which is clearly
>>>>> different from the previous "transport" one.
>>>>
>>>> I agree the word "transport" takes things into the weeds.
>>>>
>>>> But on the topic of naming protocols, I've been wondering, is there
>>>> some reason that "IBOE" is being used instead of "RoCE"?
>>>
>>> Because back in the day, when RoCE was accepted into the kernel, I'm
>>> pretty sure it was prior to the IBTA's final stamp of approval and
>>> before the name was set on RoCE, so IBoE was chosen upstream as the more
>>> "correct" name because it properly denoted what it was deemed to truly
>>> be: IB Verbs over Ethernet.
>>
>> Well history is all well and good, but it seems weird to not use the
>> current, standard name in new code. It confuses me, anyway, because
>> it seems like IBOE could easily mean something else.
>
> Having some of it refer to things as IBOE and some as ROCE would be
> similarly confusing, and switching existing IBOE usage to ROCE would
> cause pain to people with out of tree drivers (Lustre is the main one I
> know of). There's not a good answer here. There's only less sucky
> ones.

Hrm. Well, avoiding churn is good but legacies can wear ya down.
IMHO it is worth doing since these are new enums/new patches.


>
>>>> Also wondering, why add "UDP" to USNIC, is there a different USNIC?
>>>
>>> Yes, there are two transports, one a distinct ethertype and one that
>>> encapsulates USNIC in UDP.
>>
>> But this new enum isn't about transport, it's about protocol. So is
>> there one USNIC protocol, with a raw layering and a separate one with
>> UDP? Or is it one USNIC protocol with two different framings? Seems
>> there should be at least the USNIC protocol, without the _UDP
>> decoration, and I don't see it in the enum.
>
> Keep in mind that this enum was Liran's response to Michael's original
> patch. In the enum in Michael's patch, there was both USNIC and
> USNIC_UDP.

Right! That's why I'm confused. Seems wrong to drop it, right?

>
>>>
>>>> Naming multiple layers together seems confusing and maybe in the end
>>>> will create more code to deal with the differences. For example, what
>>>> token will RoCEv2 take? RoCE_UDP, RoCE_v2 or ... ?
>>>
>>> Uncertain as of now.
>>
>> Ok, but it's imminent, right? What's the preference/guidance?
>
> There is a patchset from Devesh Sharma at Emulex. It added the RoCEv2
> capability. As I recall, it used a new flag added to the existing port
> capabilities bitmask and notably did not modify either the node type or
> link layer that are currently used to differentiate between the
> different protocols. That's from memory though, so I could be mistaken.
>
> But that patchset was not written with this patchset in mind, and
> merging the two may well change that. In any case, there is a proposed
> spec to follow, so for now that's the preference/guidance (unless this
> rework means that we need to depart from the spec on internals for
> implementation reasons).

Well, if RoCEv2 uses the same protocol enum, that may introduce new
confusion, for example there will be some new CM handling for UDP encap,
source port selection, and of course vlan/tag assignment, etc. But if
there is support under way, and everyone is clear, then, ok.

Thanks.

2015-04-28 06:14:20

by Hefty, Sean

[permalink] [raw]
Subject: RE: [PATCH v6 01/26] IB/Verbs: Implement new callback query_transport()

> > Keep in mind that this enum was Liran's response to Michael's original
> > patch. In the enum in Michael's patch, there was both USNIC and
> > USNIC_UDP.
>
> Right! That's why I'm confused. Seems wrong to drop it, right?

I think the original USNIC protocol is layered directly over Ethernet. The protocol basically stole an Ethertype (the one used for IBoE/RoCE) and implemented a proprietary protocol instead. I have no idea how you resolve that, but I also don't think it's used anymore. USNIC_UDP is just UDP.

> Well, if RoCEv2 uses the same protocol enum, that may introduce new
> confusion, for example there will be some new CM handling for UDP encap,
> source port selection, and of course vlan/tag assignment, etc. But if
> there is support under way, and everyone is clear, then, ok.

RoCEv2/IBoUDP shares the same port space as UDP. It has similar issues to iWarp in sharing state with the main network stack. I'm not aware of any proposal for resolving that. Does it require using a separate IP address? Does it use a port mapper function? Does netdev care for UDP? I'm not sure what USNIC does for this either, but a common solution between USNIC and IBoUDP seems reasonable.



2015-04-28 14:29:26

by Michael Wang

[permalink] [raw]
Subject: Re: [PATCH v6 01/26] IB/Verbs: Implement new callback query_transport()



On 04/28/2015 03:24 AM, Doug Ledford wrote:
[snip]
>>>> Also wondering, why add "UDP" to USNIC, is there a different USNIC?
>>>
>>> Yes, there are two transports, one a distinct ethertype and one that
>>> encapsulates USNIC in UDP.
>>
>> But this new enum isn't about transport, it's about protocol. So is
>> there one USNIC protocol, with a raw layering and a separate one with
>> UDP? Or is it one USNIC protocol with two different framings? Seems
>> there should be at least the USNIC protocol, without the _UDP
>> decoration, and I don't see it in the enum.
>
> Keep in mind that this enum was Liran's response to Michael's original
> patch. In the enum in Michael's patch, there was both USNIC and
> USNIC_UDP.

Yeah, I have not added PROTOCOL_USNIC to the enum since currently
nothing needs it...

The only three cases currently are:
1. transport IB, link layer IB //PROTOCOL_IB
2. transport IB, link layer ETH //PROTOCOL_IBOE
3. transport IWARP //PROTOCOL_IWARP
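
The three cases above can be written as a small mapping function. This is only a sketch under stated assumptions: the sketch_* enums and sketch_legacy_to_protocol() are illustrative names, not kernel identifiers, and USNIC is left out because, as noted, nothing needs it yet.

```c
#include <assert.h>

/* Illustrative stand-ins for the legacy transport and link-layer values. */
enum sketch_transport { SKETCH_TRANSPORT_IB, SKETCH_TRANSPORT_IWARP };
enum sketch_link_layer { SKETCH_LINK_LAYER_IB, SKETCH_LINK_LAYER_ETH };
enum sketch_protocol {
	SKETCH_PROTOCOL_IB,
	SKETCH_PROTOCOL_IBOE,
	SKETCH_PROTOCOL_IWARP
};

/* Map a legacy (transport, link layer) pair onto one protocol value. */
static enum sketch_protocol
sketch_legacy_to_protocol(enum sketch_transport transport,
			  enum sketch_link_layer link_layer)
{
	switch (transport) {
	case SKETCH_TRANSPORT_IWARP:
		/* Case 3: the link layer is not consulted. */
		return SKETCH_PROTOCOL_IWARP;
	case SKETCH_TRANSPORT_IB:
		/* Cases 1 and 2: the link layer picks IB or IBOE. */
		return link_layer == SKETCH_LINK_LAYER_ETH ?
			SKETCH_PROTOCOL_IBOE : SKETCH_PROTOCOL_IB;
	}
	assert(!"unexpected transport");
	return SKETCH_PROTOCOL_IB;
}
```

Once a driver reports the protocol directly, callers no longer need to look at the pair at all.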

Regards,
Michael Wang

>
>>>
>>>> Naming multiple layers together seems confusing and maybe in the end
>>>> will create more code to deal with the differences. For example, what
>>>> token will RoCEv2 take? RoCE_UDP, RoCE_v2 or ... ?
>>>
>>> Uncertain as of now.
>>
>> Ok, but it's imminent, right? What's the preference/guidance?
>
> There is a patchset from Devesh Sharma at Emulex. It added the RoCEv2
> capability. As I recall, it used a new flag added to the existing port
> capabilities bitmask and notably did not modify either the node type or
> link layer that are currently used to differentiate between the
> different protocols. That's from memory though, so I could be mistaken.
>
> But that patchset was not written with this patchset in mind, and
> merging the two may well change that. In any case, there is a proposed
> spec to follow, so for now that's the preference/guidance (unless this
> rework means that we need to depart from the spec on internals for
> implementation reasons).
>
>

2015-04-28 18:56:29

by Jason Gunthorpe

[permalink] [raw]
Subject: Re: [PATCH v6 01/26] IB/Verbs: Implement new callback query_transport()

On Mon, Apr 27, 2015 at 09:24:35PM -0400, Doug Ledford wrote:
> On Mon, 2015-04-27 at 17:53 -0700, Tom Talpey wrote:

> Having some of it refer to things as IBOE and some as ROCE would be
> similarly confusing, and switching existing IBOE usage to ROCE would
> cause pain to people with out of tree drivers (Lustre is the main one I
> know of). There's not a good answer here. There's only less sucky
> ones.

The tide has already turned, we should ditch iboe:

$git grep -i roce_ drivers/infiniband/ | wc -l
91
$git grep -i iboe_ drivers/infiniband/ | wc -l
37

It isn't really mainline's role to be too concerned about out of tree
things like Lustre.

Jason

2015-04-28 19:11:29

by Or Gerlitz

[permalink] [raw]
Subject: Re: [PATCH v6 01/26] IB/Verbs: Implement new callback query_transport()

On Tue, Apr 28, 2015 at 9:56 PM, Jason Gunthorpe
<[email protected]> wrote:
> On Mon, Apr 27, 2015 at 09:24:35PM -0400, Doug Ledford wrote:
>> On Mon, 2015-04-27 at 17:53 -0700, Tom Talpey wrote:
>
>> Having some of it refer to things as IBOE and some as ROCE would be
>> similarly confusing, and switching existing IBOE usage to ROCE would
>> cause pain to people with out of tree drivers (Lustre is the main one I
>> know of). There's not a good answer here. There's only less sucky
>> ones.
>
> The tide has already turned, we should ditch iboe:
>
> $git grep -i roce_ drivers/infiniband/ | wc -l
> 91
> $git grep -i iboe_ drivers/infiniband/ | wc -l
> 37
>
> It isn't really mainline's role to be too concerned about out of tree
> things like Lustre.

FWIW, note that Lustre has been under staging for a while; not sure how
close it is to actual acceptance.

Or.

2015-04-28 19:50:53

by Dave Goodell

[permalink] [raw]
Subject: Re: [PATCH v6 01/26] IB/Verbs: Implement new callback query_transport()

On Apr 28, 2015, at 1:14 AM, Hefty, Sean <[email protected]> wrote:

>>> Keep in mind that this enum was Liran's response to Michael's original
>>> patch. In the enum in Michael's patch, there was both USNIC and
>>> USNIC_UDP.
>>
>> Right! That's why I'm confused. Seems wrong to drop it, right?
>
> I think the original USNIC protocol is layered directly over Ethernet. The protocol basically stole an Ethertype (the one used for IBoE/RoCE) and implemented a proprietary protocol instead. I have no idea how you resolve that, but I also don't think it's used anymore. USNIC_UDP is just UDP.

Sean is correct. The legacy RDMA_TRANSPORT_USNIC code used a proprietary protocol over plain Ethernet frames. The newer RDMA_TRANSPORT_USNIC_UDP code is just standard UDP/IP/Ethernet packets exposed to user space via the uverbs stack. The current kernel module will support both formats, it just depends on which user space requests at create_qp time. From the kernel point of view there is no common protocol between the two TRANSPORTs (other than sharing partially similar Ethernet frames at L2).

I posted last week to clarify some of this: http://marc.info/?l=linux-rdma&m=142972177830718&w=2

>> Well, if RoCEv2 uses the same protocol enum, that may introduce new
>> confusion, for example there will be some new CM handling for UDP encap,
>> source port selection, and of course vlan/tag assignment, etc. But if
>> there is support under way, and everyone is clear, then, ok.
>
> RoCEv2/IBoUDP shares the same port space as UDP. It has similar issues to iWarp in sharing state with the main network stack. I'm not aware of any proposal for resolving that. Does it require using a separate IP address? Does it use a port mapper function? Does netdev care for UDP? I'm not sure what USNIC does for this either, but a common solution between USNIC and IBoUDP seems reasonable.

Is the concern here about CM issues or the UDP ports used by the actual usNIC RQs? CM is not used/supported for usNIC at this time.

-Dave

2015-04-28 19:53:45

by Hefty, Sean

Subject: RE: [PATCH v6 01/26] IB/Verbs: Implement new callback query_transport()

> Is the concern here about CM issues or the UDP ports used by the actual
> usNIC RQs?

UDP port space sharing

2015-04-28 20:03:26

by Doug Ledford

Subject: Re: [PATCH v6 01/26] IB/Verbs: Implement new callback query_transport()

On Tue, 2015-04-28 at 22:11 +0300, Or Gerlitz wrote:
> On Tue, Apr 28, 2015 at 9:56 PM, Jason Gunthorpe
> <[email protected]> wrote:
> > On Mon, Apr 27, 2015 at 09:24:35PM -0400, Doug Ledford wrote:
> >> On Mon, 2015-04-27 at 17:53 -0700, Tom Talpey wrote:
> >
> >> Having some of it refer to things as IBOE and some as ROCE would be
> >> similarly confusing, and switching existing IBOE usage to ROCE would
> >> cause pain to people with out of tree drivers (Lustre is the main one I
> >> know of). There's not a good answer here. There's only less sucky
> >> ones.
> >
> > The tide has already turned, we should ditch iboe:
> >
> > $ git grep -i roce_ drivers/infiniband/ | wc -l
> > 91
> > $ git grep -i iboe_ drivers/infiniband/ | wc -l
> > 37
> >
> > It isn't really mainline's role to be too concerned about out of tree
> > things like Lustre.
>
> FWIW, note that Lustre has been under staging for a while; I'm not sure how
> close it is to actual acceptance.

I thought that was just the client and didn't include the server...


--
Doug Ledford <[email protected]>
GPG KeyID: 0E572FDD




2015-04-28 20:03:50

by Doug Ledford

Subject: Re: [PATCH v6 01/26] IB/Verbs: Implement new callback query_transport()

On Tue, 2015-04-28 at 12:56 -0600, Jason Gunthorpe wrote:
> On Mon, Apr 27, 2015 at 09:24:35PM -0400, Doug Ledford wrote:
> > On Mon, 2015-04-27 at 17:53 -0700, Tom Talpey wrote:
>
> > Having some of it refer to things as IBOE and some as ROCE would be
> > similarly confusing, and switching existing IBOE usage to ROCE would
> > cause pain to people with out of tree drivers (Lustre is the main one I
> > know of). There's not a good answer here. There's only less sucky
> > ones.
>
> The tide has already turned, we should ditch iboe:
>
> $ git grep -i roce_ drivers/infiniband/ | wc -l
> 91
> $ git grep -i iboe_ drivers/infiniband/ | wc -l
> 37
>
> It isn't really mainline's role to be too concerned about out of tree
> things like Lustre.

While I generally agree, one need not be totally callous about out of
tree things either.


--
Doug Ledford <[email protected]>
GPG KeyID: 0E572FDD




2015-04-28 20:26:36

by Dave Goodell

Subject: Re: [PATCH v6 01/26] IB/Verbs: Implement new callback query_transport()

On Apr 28, 2015, at 2:53 PM, Hefty, Sean <[email protected]> wrote:

>> Is the concern here about CM issues or the UDP ports used by the actual
>> usNIC RQs?
>
> UDP port space sharing

For the UDP port used by the usNIC QP, the usnic_verbs kernel driver requires user space to pass a file descriptor of a regular UDP socket down at create_qp time. The reference count on this socket is incremented to make sure that the socket can't disappear out from under us. Then an RX filter is installed in the NIC which matches UDP/IP/Ethernet packets that are destined for the UDP port to which the given socket is already bound. So there is a real UDP socket to make most of the usual things happen in the net stack, but the raw UDP/IP/Ethernet packets get delivered directly to the user space queues by the NIC. E.g., "netstat" and "lsof" show you proper addressing information, though obviously any information related to data-path statistics will not be accurate. At teardown we just reverse the steps.
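The user-space half of that arrangement can be sketched roughly as follows. This is only an illustration under stated assumptions: open_bound_udp_socket() is a hypothetical helper name, not part of usnic_verbs or libibverbs, and the step that actually hands the file descriptor to the driver at create_qp time is not shown because its interface is not described in this thread. The sketch only covers opening a regular UDP socket, binding it, and reading back the port that an RX filter would then match.

```c
#include <assert.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical helper (illustration only): open a regular UDP socket,
 * bind it to an ephemeral port on the loopback address, and report the
 * port the kernel assigned. Returns the socket fd, or -1 on error.
 *
 * In the scheme described above, this fd is what user space would pass
 * down at create_qp time; the port is what the NIC filter would match.
 */
static int open_bound_udp_socket(uint16_t *port_out)
{
	struct sockaddr_in addr;
	socklen_t len = sizeof(addr);
	int fd;

	assert(port_out != NULL);

	fd = socket(AF_INET, SOCK_DGRAM, 0);
	if (fd < 0)
		return -1;

	memset(&addr, 0, sizeof(addr));
	addr.sin_family = AF_INET;
	addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
	addr.sin_port = 0;	/* let the kernel pick a free port */

	/* Bind first, then ask the kernel which port it actually chose. */
	if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
	    getsockname(fd, (struct sockaddr *)&addr, &len) < 0) {
		close(fd);
		return -1;
	}

	*port_out = ntohs(addr.sin_port);
	return fd;
}
```

Because the socket stays open and bound for the lifetime of the QP, the port remains reserved in the normal UDP port space, which is why tools such as "netstat" and "lsof" continue to show sensible addressing information.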

However, I'm not sure if that's the sort of information you were looking for.

-Dave