2006-12-02 22:49:19

by Steve Wise

[permalink] [raw]
Subject: [PATCH v2 00/13] 2.6.20 Chelsio T3 RDMA Driver


Version 2 changes:

- Make code sparse endian clean
- Use IDRs for mapping QP and CQ IDs to structure pointers instead of arrays
- Clean up confusing bitfields
- Use random32() instead of local random function
- Use krefs to track endpoint reference counts
- Misc nits

-----

The following series implements the Chelsio T3 iWARP/RDMA Driver to
be considered for inclusion in 2.6.20. It depends on the Chelsio T3
Ethernet Driver which is also under review now for 2.6.20. See:

http://www.mail-archive.com/[email protected]/msg26619.html

The patches are against 2.6.19.

This patch series can also be pulled from:

http://www.opengridcomputing.com/downloads/iw_cxgb3_patches_v2.tar.bz2

The Chelsio T3 Ethernet Driver patch can be pulled from:

http://service.chelsio.com/kernel.org/cxgb3.patch.bz2

A complete GIT kernel tree with all the T3 drivers can be pulled from:

git://staging.openfabrics.org/~swise/cxgb3.git

Thanks,

Steve.


2006-12-02 22:49:29

by Steve Wise

[permalink] [raw]
Subject: [PATCH v2 01/13] Linux RDMA Core Changes


Support provider-specific data in ib_uverbs_cmd_req_notify_cq().
The Chelsio iwarp provider library needs to pass information to the
kernel verb for re-arming the CQ.

Signed-off-by: Steve Wise <[email protected]>
---

drivers/infiniband/core/uverbs_cmd.c | 9 +++++++--
drivers/infiniband/hw/amso1100/c2.h | 2 +-
drivers/infiniband/hw/amso1100/c2_cq.c | 3 ++-
drivers/infiniband/hw/ehca/ehca_iverbs.h | 3 ++-
drivers/infiniband/hw/ehca/ehca_reqs.c | 3 ++-
drivers/infiniband/hw/ipath/ipath_cq.c | 4 +++-
drivers/infiniband/hw/ipath/ipath_verbs.h | 3 ++-
drivers/infiniband/hw/mthca/mthca_cq.c | 6 ++++--
drivers/infiniband/hw/mthca/mthca_dev.h | 4 ++--
include/rdma/ib_verbs.h | 5 +++--
10 files changed, 28 insertions(+), 14 deletions(-)

diff --git a/drivers/infiniband/core/uverbs_cmd.c b/drivers/infiniband/core/uverbs_cmd.c
index 743247e..5dd1de9 100644
--- a/drivers/infiniband/core/uverbs_cmd.c
+++ b/drivers/infiniband/core/uverbs_cmd.c
@@ -959,6 +959,7 @@ ssize_t ib_uverbs_req_notify_cq(struct i
int out_len)
{
struct ib_uverbs_req_notify_cq cmd;
+ struct ib_udata udata;
struct ib_cq *cq;

if (copy_from_user(&cmd, buf, sizeof cmd))
@@ -968,8 +969,12 @@ ssize_t ib_uverbs_req_notify_cq(struct i
if (!cq)
return -EINVAL;

- ib_req_notify_cq(cq, cmd.solicited_only ?
- IB_CQ_SOLICITED : IB_CQ_NEXT_COMP);
+ INIT_UDATA(&udata, buf + sizeof cmd, 0,
+ in_len - sizeof cmd, 0);
+
+ cq->device->req_notify_cq(cq, cmd.solicited_only ?
+ IB_CQ_SOLICITED : IB_CQ_NEXT_COMP,
+ &udata);

put_cq_read(cq);

diff --git a/drivers/infiniband/hw/amso1100/c2.h b/drivers/infiniband/hw/amso1100/c2.h
index 1b17dcd..716f9dc 100644
--- a/drivers/infiniband/hw/amso1100/c2.h
+++ b/drivers/infiniband/hw/amso1100/c2.h
@@ -519,7 +519,7 @@ extern void c2_free_cq(struct c2_dev *c2
extern void c2_cq_event(struct c2_dev *c2dev, u32 mq_index);
extern void c2_cq_clean(struct c2_dev *c2dev, struct c2_qp *qp, u32 mq_index);
extern int c2_poll_cq(struct ib_cq *ibcq, int num_entries, struct ib_wc *entry);
-extern int c2_arm_cq(struct ib_cq *ibcq, enum ib_cq_notify notify);
+extern int c2_arm_cq(struct ib_cq *ibcq, enum ib_cq_notify notify, struct ib_udata *udata);

/* CM */
extern int c2_llp_connect(struct iw_cm_id *cm_id,
diff --git a/drivers/infiniband/hw/amso1100/c2_cq.c b/drivers/infiniband/hw/amso1100/c2_cq.c
index 05c9154..7ce8bca 100644
--- a/drivers/infiniband/hw/amso1100/c2_cq.c
+++ b/drivers/infiniband/hw/amso1100/c2_cq.c
@@ -217,7 +217,8 @@ int c2_poll_cq(struct ib_cq *ibcq, int n
return npolled;
}

-int c2_arm_cq(struct ib_cq *ibcq, enum ib_cq_notify notify)
+int c2_arm_cq(struct ib_cq *ibcq, enum ib_cq_notify notify,
+ struct ib_udata *udata)
{
struct c2_mq_shared __iomem *shared;
struct c2_cq *cq;
diff --git a/drivers/infiniband/hw/ehca/ehca_iverbs.h b/drivers/infiniband/hw/ehca/ehca_iverbs.h
index 3720e30..566b30c 100644
--- a/drivers/infiniband/hw/ehca/ehca_iverbs.h
+++ b/drivers/infiniband/hw/ehca/ehca_iverbs.h
@@ -135,7 +135,8 @@ int ehca_poll_cq(struct ib_cq *cq, int n

int ehca_peek_cq(struct ib_cq *cq, int wc_cnt);

-int ehca_req_notify_cq(struct ib_cq *cq, enum ib_cq_notify cq_notify);
+int ehca_req_notify_cq(struct ib_cq *cq, enum ib_cq_notify cq_notify,
+ struct ib_udata *udata);

struct ib_qp *ehca_create_qp(struct ib_pd *pd,
struct ib_qp_init_attr *init_attr,
diff --git a/drivers/infiniband/hw/ehca/ehca_reqs.c b/drivers/infiniband/hw/ehca/ehca_reqs.c
index b46bda1..3ed6992 100644
--- a/drivers/infiniband/hw/ehca/ehca_reqs.c
+++ b/drivers/infiniband/hw/ehca/ehca_reqs.c
@@ -634,7 +634,8 @@ poll_cq_exit0:
return ret;
}

-int ehca_req_notify_cq(struct ib_cq *cq, enum ib_cq_notify cq_notify)
+int ehca_req_notify_cq(struct ib_cq *cq, enum ib_cq_notify cq_notify,
+ struct ib_udata *udata)
{
struct ehca_cq *my_cq = container_of(cq, struct ehca_cq, ib_cq);

diff --git a/drivers/infiniband/hw/ipath/ipath_cq.c b/drivers/infiniband/hw/ipath/ipath_cq.c
index 87462e0..27ba4db 100644
--- a/drivers/infiniband/hw/ipath/ipath_cq.c
+++ b/drivers/infiniband/hw/ipath/ipath_cq.c
@@ -307,13 +307,15 @@ int ipath_destroy_cq(struct ib_cq *ibcq)
* ipath_req_notify_cq - change the notification type for a completion queue
* @ibcq: the completion queue
* @notify: the type of notification to request
+ * @udata: user data
*
* Returns 0 for success.
*
* This may be called from interrupt context. Also called by
* ib_req_notify_cq() in the generic verbs code.
*/
-int ipath_req_notify_cq(struct ib_cq *ibcq, enum ib_cq_notify notify)
+int ipath_req_notify_cq(struct ib_cq *ibcq, enum ib_cq_notify notify,
+ struct ib_udata *udata)
{
struct ipath_cq *cq = to_icq(ibcq);
unsigned long flags;
diff --git a/drivers/infiniband/hw/ipath/ipath_verbs.h b/drivers/infiniband/hw/ipath/ipath_verbs.h
index 8039f6e..0d39960 100644
--- a/drivers/infiniband/hw/ipath/ipath_verbs.h
+++ b/drivers/infiniband/hw/ipath/ipath_verbs.h
@@ -716,7 +716,8 @@ struct ib_cq *ipath_create_cq(struct ib_

int ipath_destroy_cq(struct ib_cq *ibcq);

-int ipath_req_notify_cq(struct ib_cq *ibcq, enum ib_cq_notify notify);
+int ipath_req_notify_cq(struct ib_cq *ibcq, enum ib_cq_notify notify,
+ struct ib_udata *udata);

int ipath_resize_cq(struct ib_cq *ibcq, int cqe, struct ib_udata *udata);

diff --git a/drivers/infiniband/hw/mthca/mthca_cq.c b/drivers/infiniband/hw/mthca/mthca_cq.c
index 149b369..ec7bb79 100644
--- a/drivers/infiniband/hw/mthca/mthca_cq.c
+++ b/drivers/infiniband/hw/mthca/mthca_cq.c
@@ -723,7 +723,8 @@ repoll:
return err == 0 || err == -EAGAIN ? npolled : err;
}

-int mthca_tavor_arm_cq(struct ib_cq *cq, enum ib_cq_notify notify)
+int mthca_tavor_arm_cq(struct ib_cq *cq, enum ib_cq_notify notify,
+ struct ib_udata *udata)
{
__be32 doorbell[2];

@@ -740,7 +741,8 @@ int mthca_tavor_arm_cq(struct ib_cq *cq,
return 0;
}

-int mthca_arbel_arm_cq(struct ib_cq *ibcq, enum ib_cq_notify notify)
+int mthca_arbel_arm_cq(struct ib_cq *ibcq, enum ib_cq_notify notify,
+ struct ib_udata *udata)
{
struct mthca_cq *cq = to_mcq(ibcq);
__be32 doorbell[2];
diff --git a/drivers/infiniband/hw/mthca/mthca_dev.h b/drivers/infiniband/hw/mthca/mthca_dev.h
index fe5cecf..6b9ccf6 100644
--- a/drivers/infiniband/hw/mthca/mthca_dev.h
+++ b/drivers/infiniband/hw/mthca/mthca_dev.h
@@ -493,8 +493,8 @@ void mthca_unmap_eq_icm(struct mthca_dev

int mthca_poll_cq(struct ib_cq *ibcq, int num_entries,
struct ib_wc *entry);
-int mthca_tavor_arm_cq(struct ib_cq *cq, enum ib_cq_notify notify);
-int mthca_arbel_arm_cq(struct ib_cq *cq, enum ib_cq_notify notify);
+int mthca_tavor_arm_cq(struct ib_cq *cq, enum ib_cq_notify notify, struct ib_udata *udata);
+int mthca_arbel_arm_cq(struct ib_cq *cq, enum ib_cq_notify notify, struct ib_udata *udata);
int mthca_init_cq(struct mthca_dev *dev, int nent,
struct mthca_ucontext *ctx, u32 pdn,
struct mthca_cq *cq);
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 8eacc35..e3e1a2c 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -941,7 +941,8 @@ struct ib_device {
struct ib_wc *wc);
int (*peek_cq)(struct ib_cq *cq, int wc_cnt);
int (*req_notify_cq)(struct ib_cq *cq,
- enum ib_cq_notify cq_notify);
+ enum ib_cq_notify cq_notify,
+ struct ib_udata *udata);
int (*req_ncomp_notif)(struct ib_cq *cq,
int wc_cnt);
struct ib_mr * (*get_dma_mr)(struct ib_pd *pd,
@@ -1373,7 +1374,7 @@ int ib_peek_cq(struct ib_cq *cq, int wc_
static inline int ib_req_notify_cq(struct ib_cq *cq,
enum ib_cq_notify cq_notify)
{
- return cq->device->req_notify_cq(cq, cq_notify);
+ return cq->device->req_notify_cq(cq, cq_notify, NULL);
}

/**

2006-12-02 22:49:43

by Steve Wise

[permalink] [raw]
Subject: [PATCH v2 02/13] Device Discovery and ULLD Linkage


Code to discover all the T3 devices and register them
with the T3 RDMA Core and the Linux RDMA Core.

Signed-off-by: Steve Wise <[email protected]>
---

drivers/infiniband/hw/cxgb3/iwch.c | 189 ++++++++++++++++++++++++++++++++++++
drivers/infiniband/hw/cxgb3/iwch.h | 175 +++++++++++++++++++++++++++++++++
2 files changed, 364 insertions(+), 0 deletions(-)

diff --git a/drivers/infiniband/hw/cxgb3/iwch.c b/drivers/infiniband/hw/cxgb3/iwch.c
new file mode 100644
index 0000000..acbe449
--- /dev/null
+++ b/drivers/infiniband/hw/cxgb3/iwch.c
@@ -0,0 +1,189 @@
+/*
+ * Copyright (c) 2006 Chelsio, Inc. All rights reserved.
+ * Copyright (c) 2006 Open Grid Computing, Inc. All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses. You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ * - Redistributions of source code must retain the above
+ * copyright notice, this list of conditions and the following
+ * disclaimer.
+ *
+ * - Redistributions in binary form must reproduce the above
+ * copyright notice, this list of conditions and the following
+ * disclaimer in the documentation and/or other materials
+ * provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+#include <linux/module.h>
+#include <linux/moduleparam.h>
+
+#include <rdma/ib_verbs.h>
+
+#include "cxgb3_offload.h"
+#include "iwch_provider.h"
+#include "iwch_user.h"
+#include "iwch.h"
+#include "iwch_cm.h"
+
+#define DRV_VERSION "1.1"
+
+MODULE_AUTHOR("Boyd Faulkner, Steve Wise");
+MODULE_DESCRIPTION("Chelsio T3 RDMA Driver");
+MODULE_LICENSE("Dual BSD/GPL");
+MODULE_VERSION(DRV_VERSION);
+
+cxgb3_cpl_handler_func t3c_handlers[NUM_CPL_CMDS];
+
+static void open_rnic_dev(struct t3cdev *);
+static void close_rnic_dev(struct t3cdev *);
+
+struct cxgb3_client t3c_client = {
+ .name = "iw_cxgb3",
+ .add = open_rnic_dev,
+ .remove = close_rnic_dev,
+ .handlers = t3c_handlers,
+ .redirect = iwch_ep_redirect
+};
+
+static LIST_HEAD(dev_list);
+static DEFINE_MUTEX(dev_mutex);
+
+static void rnic_init(struct iwch_dev *rnicp)
+{
+ PDBG("%s iwch_dev %p\n", __FUNCTION__, rnicp);
+ idr_init(&rnicp->cqidr);
+ idr_init(&rnicp->qpidr);
+ idr_init(&rnicp->mmidr);
+ spin_lock_init(&rnicp->lock);
+
+ rnicp->attr.vendor_id = 0x168;
+ rnicp->attr.vendor_part_id = 7;
+ rnicp->attr.max_qps = T3_MAX_NUM_QP - 32;
+ rnicp->attr.max_wrs = (1UL << 24) - 1;
+ rnicp->attr.max_sge_per_wr = T3_MAX_SGE;
+ rnicp->attr.max_sge_per_rdma_write_wr = T3_MAX_SGE;
+ rnicp->attr.max_cqs = T3_MAX_NUM_CQ - 1;
+ rnicp->attr.max_cqes_per_cq = (1UL << 24) - 1;
+ rnicp->attr.max_mem_regs = cxio_num_stags(&rnicp->rdev);
+ rnicp->attr.max_phys_buf_entries = T3_MAX_PBL_SIZE;
+ rnicp->attr.max_pds = T3_MAX_NUM_PD - 1;
+ rnicp->attr.mem_pgsizes_bitmask = 0x7FFF; /* 4KB-128MB */
+ rnicp->attr.can_resize_wq = 0;
+ rnicp->attr.max_rdma_reads_per_qp = 8;
+ rnicp->attr.max_rdma_read_resources =
+ rnicp->attr.max_rdma_reads_per_qp * rnicp->attr.max_qps;
+ rnicp->attr.max_rdma_read_qp_depth = 8; /* IRD */
+ rnicp->attr.max_rdma_read_depth =
+ rnicp->attr.max_rdma_read_qp_depth * rnicp->attr.max_qps;
+ rnicp->attr.rq_overflow_handled = 0;
+ rnicp->attr.can_modify_ird = 0;
+ rnicp->attr.can_modify_ord = 0;
+ rnicp->attr.max_mem_windows = rnicp->attr.max_mem_regs - 1;
+ rnicp->attr.stag0_value = 1;
+ rnicp->attr.zbva_support = 1;
+ rnicp->attr.local_invalidate_fence = 1;
+ rnicp->attr.cq_overflow_detection = 1;
+ return;
+}
+
+static void open_rnic_dev(struct t3cdev *tdev)
+{
+ struct iwch_dev *rnicp;
+ static int vers_printed;
+
+ PDBG("%s t3cdev %p\n", __FUNCTION__, tdev);
+ if (!vers_printed++)
+ printk(KERN_INFO MOD "Chelsio T3 RDMA Driver - version %s\n",
+ DRV_VERSION);
+ rnicp = (struct iwch_dev *)ib_alloc_device(sizeof(*rnicp));
+ if (!rnicp) {
+ printk(KERN_ERR MOD "Cannot allocate ib device\n");
+ return;
+ }
+ rnicp->rdev.ulp = rnicp;
+ rnicp->rdev.t3cdev_p = tdev;
+
+ if (cxio_rdev_open(&rnicp->rdev)) {
+ printk(KERN_ERR MOD "Unable to open CXIO rdev\n");
+ ib_dealloc_device(&rnicp->ibdev);
+ return;
+ }
+
+ rnic_init(rnicp);
+
+ mutex_lock(&dev_mutex);
+ list_add_tail(&rnicp->entry, &dev_list);
+ mutex_unlock(&dev_mutex);
+
+ if (iwch_register_device(rnicp)) {
+ printk(KERN_ERR MOD "Unable to register device\n");
+ close_rnic_dev(tdev);
+ }
+ printk(KERN_INFO MOD "Initialized device %s\n",
+ pci_name(rnicp->rdev.rnic_info.pdev));
+ return;
+}
+
+static void close_rnic_dev(struct t3cdev *tdev)
+{
+ struct iwch_dev *dev, *tmp;
+ PDBG("%s t3cdev %p\n", __FUNCTION__, tdev);
+ mutex_lock(&dev_mutex);
+ list_for_each_entry_safe(dev, tmp, &dev_list, entry) {
+ if (dev->rdev.t3cdev_p == tdev) {
+ list_del(&dev->entry);
+ iwch_unregister_device(dev);
+ cxio_rdev_close(&dev->rdev);
+ idr_destroy(&dev->cqidr);
+ idr_destroy(&dev->qpidr);
+ idr_destroy(&dev->mmidr);
+ ib_dealloc_device(&dev->ibdev);
+ break;
+ }
+ }
+ mutex_unlock(&dev_mutex);
+}
+
+extern void iwch_ev_dispatch(struct cxio_rdev *rdev_p, struct sk_buff *skb);
+
+static int __init iwch_init_module(void)
+{
+ int err;
+
+ err = cxio_hal_init();
+ if (err)
+ return err;
+ err = iwch_cm_init();
+ if (err)
+ return err;
+ cxio_register_ev_cb(iwch_ev_dispatch);
+ cxgb3_register_client(&t3c_client);
+ return 0;
+}
+
+static void __exit iwch_exit_module(void)
+{
+ cxgb3_unregister_client(&t3c_client);
+ cxio_unregister_ev_cb(iwch_ev_dispatch);
+ iwch_cm_term();
+ cxio_hal_exit();
+}
+
+module_init(iwch_init_module);
+module_exit(iwch_exit_module);
diff --git a/drivers/infiniband/hw/cxgb3/iwch.h b/drivers/infiniband/hw/cxgb3/iwch.h
new file mode 100644
index 0000000..411bfcd
--- /dev/null
+++ b/drivers/infiniband/hw/cxgb3/iwch.h
@@ -0,0 +1,175 @@
+/*
+ * Copyright (c) 2006 Chelsio, Inc. All rights reserved.
+ * Copyright (c) 2006 Open Grid Computing, Inc. All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses. You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ * - Redistributions of source code must retain the above
+ * copyright notice, this list of conditions and the following
+ * disclaimer.
+ *
+ * - Redistributions in binary form must reproduce the above
+ * copyright notice, this list of conditions and the following
+ * disclaimer in the documentation and/or other materials
+ * provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+#ifndef __IWCH_H__
+#define __IWCH_H__
+
+#include <linux/mutex.h>
+#include <linux/list.h>
+#include <linux/spinlock.h>
+#include <linux/idr.h>
+
+#include <rdma/ib_verbs.h>
+
+#include "cxio_hal.h"
+#include "cxgb3_offload.h"
+
+struct iwch_pd;
+struct iwch_cq;
+struct iwch_qp;
+struct iwch_mr;
+
+struct iwch_rnic_attributes {
+ u32 vendor_id;
+ u32 vendor_part_id;
+ u32 max_qps;
+ u32 max_wrs; /* Max for any SQ/RQ */
+ u32 max_sge_per_wr;
+ u32 max_sge_per_rdma_write_wr; /* for RDMA Write WR */
+ u32 max_cqs;
+ u32 max_cqes_per_cq;
+ u32 max_mem_regs;
+ u32 max_phys_buf_entries; /* for phys buf list */
+ u32 max_pds;
+
+ /*
+ * The memory page sizes supported by this RNIC.
+ * Bit position i in bitmap indicates page of
+ * size (4k)^i. Phys block list mode unsupported.
+ */
+ u32 mem_pgsizes_bitmask;
+ u8 can_resize_wq;
+
+ /*
+ * The maximum number of RDMA Reads that can be outstanding
+ * per QP with this RNIC as the target.
+ */
+ u32 max_rdma_reads_per_qp;
+
+ /*
+ * The maximum number of resources used for RDMA Reads
+ * by this RNIC with this RNIC as the target.
+ */
+ u32 max_rdma_read_resources;
+
+ /*
+ * The max depth per QP for initiation of RDMA Read
+ * by this RNIC.
+ */
+ u32 max_rdma_read_qp_depth;
+
+ /*
+ * The maximum depth for initiation of RDMA Read
+ * operations by this RNIC on all QPs
+ */
+ u32 max_rdma_read_depth;
+ u8 rq_overflow_handled;
+ u32 can_modify_ird;
+ u32 can_modify_ord;
+ u32 max_mem_windows;
+ u32 stag0_value;
+ u8 zbva_support;
+ u8 local_invalidate_fence;
+ u32 cq_overflow_detection;
+};
+
+struct iwch_dev {
+ struct ib_device ibdev;
+ struct cxio_rdev rdev;
+ u32 device_cap_flags;
+ struct iwch_rnic_attributes attr;
+ struct idr cqidr;
+ struct idr qpidr;
+ struct idr mmidr;
+ spinlock_t lock;
+ struct list_head entry;
+};
+
+static inline struct iwch_dev *to_iwch_dev(struct ib_device *ibdev)
+{
+ return container_of(ibdev, struct iwch_dev, ibdev);
+}
+
+static inline int t3b_device(struct iwch_dev *rhp)
+{
+ return (rhp->rdev.t3cdev_p->type == T3B);
+}
+
+static inline int t3a_device(struct iwch_dev *rhp)
+{
+ return (rhp->rdev.t3cdev_p->type == T3A);
+}
+
+static inline struct iwch_cq *get_chp(struct iwch_dev *rhp, u32 cqid)
+{
+ return idr_find(&rhp->cqidr, cqid);
+}
+
+static inline struct iwch_qp *get_qhp(struct iwch_dev *rhp, u32 qpid)
+{
+ return idr_find(&rhp->qpidr, qpid);
+}
+
+static inline struct iwch_mr *get_mhp(struct iwch_dev *rhp, u32 mmid)
+{
+ return idr_find(&rhp->mmidr, mmid);
+}
+
+static inline int insert_handle(struct iwch_dev *rhp, struct idr *idr,
+ void *handle, u32 id)
+{
+ int ret;
+ u32 newid;
+
+ do {
+ if (!idr_pre_get(idr, GFP_KERNEL)) {
+ return -ENOMEM;
+ }
+ spin_lock_irq(&rhp->lock);
+ ret = idr_get_new_above(idr, handle, id, &newid);
+ BUG_ON(newid != id);
+ spin_unlock_irq(&rhp->lock);
+ } while (ret == -EAGAIN);
+
+ return ret;
+}
+
+static inline void remove_handle(struct iwch_dev *rhp, struct idr *idr, u32 id)
+{
+ spin_lock_irq(&rhp->lock);
+ idr_remove(idr, id);
+ spin_unlock_irq(&rhp->lock);
+}
+
+extern struct cxgb3_client t3c_client;
+extern cxgb3_cpl_handler_func t3c_handlers[NUM_CPL_CMDS];
+#endif

2006-12-02 22:50:16

by Steve Wise

[permalink] [raw]
Subject: [PATCH v2 03/13] Provider Methods and Data Structures


Provider methods to support the Linux RDMA verbs.

Signed-off-by: Steve Wise <[email protected]>
---

drivers/infiniband/hw/cxgb3/iwch_provider.c | 1170 +++++++++++++++++++++++++++
drivers/infiniband/hw/cxgb3/iwch_provider.h | 362 ++++++++
drivers/infiniband/hw/cxgb3/iwch_user.h | 68 ++
3 files changed, 1600 insertions(+), 0 deletions(-)

diff --git a/drivers/infiniband/hw/cxgb3/iwch_provider.c b/drivers/infiniband/hw/cxgb3/iwch_provider.c
new file mode 100644
index 0000000..4bef081
--- /dev/null
+++ b/drivers/infiniband/hw/cxgb3/iwch_provider.c
@@ -0,0 +1,1170 @@
+/*
+ * Copyright (c) 2006 Chelsio, Inc. All rights reserved.
+ * Copyright (c) 2006 Open Grid Computing, Inc. All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses. You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ * - Redistributions of source code must retain the above
+ * copyright notice, this list of conditions and the following
+ * disclaimer.
+ *
+ * - Redistributions in binary form must reproduce the above
+ * copyright notice, this list of conditions and the following
+ * disclaimer in the documentation and/or other materials
+ * provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+#include <linux/module.h>
+#include <linux/moduleparam.h>
+#include <linux/device.h>
+#include <linux/netdevice.h>
+#include <linux/etherdevice.h>
+#include <linux/delay.h>
+#include <linux/errno.h>
+#include <linux/list.h>
+#include <linux/spinlock.h>
+#include <linux/ethtool.h>
+
+#include <asm/io.h>
+#include <asm/irq.h>
+#include <asm/byteorder.h>
+
+#include <rdma/iw_cm.h>
+#include <rdma/ib_verbs.h>
+#include <rdma/ib_smi.h>
+#include <rdma/ib_user_verbs.h>
+
+#include <cxio_hal.h>
+#include "iwch.h"
+#include "iwch_provider.h"
+#include "iwch_cm.h"
+#include "iwch_user.h"
+
+static int iwch_modify_port(struct ib_device *ibdev,
+ u8 port, int port_modify_mask,
+ struct ib_port_modify *props)
+{
+ return -ENOSYS;
+}
+
+static struct ib_ah *iwch_ah_create(struct ib_pd *pd,
+ struct ib_ah_attr *ah_attr)
+{
+ return ERR_PTR(-ENOSYS);
+}
+
+static int iwch_ah_destroy(struct ib_ah *ah)
+{
+ return -ENOSYS;
+}
+
+static int iwch_multicast_attach(struct ib_qp *ibqp, union ib_gid *gid, u16 lid)
+{
+ return -ENOSYS;
+}
+
+static int iwch_multicast_detach(struct ib_qp *ibqp, union ib_gid *gid, u16 lid)
+{
+ return -ENOSYS;
+}
+
+static int iwch_process_mad(struct ib_device *ibdev,
+ int mad_flags,
+ u8 port_num,
+ struct ib_wc *in_wc,
+ struct ib_grh *in_grh,
+ struct ib_mad *in_mad, struct ib_mad *out_mad)
+{
+ return -ENOSYS;
+}
+
+static int iwch_dealloc_ucontext(struct ib_ucontext *context)
+{
+ struct iwch_dev *rhp = to_iwch_dev(context->device);
+ struct iwch_ucontext *ucontext = to_iwch_ucontext(context);
+ PDBG("%s context %p\n", __FUNCTION__, context);
+ cxio_release_ucontext(&rhp->rdev, &ucontext->uctx);
+ kfree(ucontext);
+ return 0;
+}
+
+static struct ib_ucontext *iwch_alloc_ucontext(struct ib_device *ibdev,
+ struct ib_udata *udata)
+{
+ struct iwch_ucontext *context;
+ struct iwch_dev *rhp = to_iwch_dev(ibdev);
+
+ PDBG("%s ibdev %p\n", __FUNCTION__, ibdev);
+ context = kmalloc(sizeof(*context), GFP_KERNEL);
+ if (!context)
+ return ERR_PTR(-ENOMEM);
+ cxio_init_ucontext(&rhp->rdev, &context->uctx);
+ INIT_LIST_HEAD(&context->mmaps);
+ return &context->ibucontext;
+}
+
+static int iwch_destroy_cq(struct ib_cq *ib_cq)
+{
+ struct iwch_cq *chp;
+
+ PDBG("%s ib_cq %p\n", __FUNCTION__, ib_cq);
+ chp = to_iwch_cq(ib_cq);
+
+ remove_handle(chp->rhp, &chp->rhp->cqidr, chp->cq.cqid);
+ atomic_dec(&chp->refcnt);
+ wait_event(chp->wait, !atomic_read(&chp->refcnt));
+
+ cxio_destroy_cq(&chp->rhp->rdev, &chp->cq);
+ kfree(chp);
+ return 0;
+}
+
+static struct ib_cq *iwch_create_cq(struct ib_device *ibdev, int entries,
+ struct ib_ucontext *context,
+ struct ib_udata *udata)
+{
+ struct iwch_dev *rhp;
+ struct iwch_cq *chp;
+ struct iwch_create_cq_resp uresp;
+
+ PDBG("%s ib_dev %p entries %d\n", __FUNCTION__, ibdev, entries);
+ rhp = to_iwch_dev(ibdev);
+ chp = kzalloc(sizeof(*chp), GFP_KERNEL);
+ if (!chp)
+ return ERR_PTR(-ENOMEM);
+
+ if (t3a_device(rhp)) {
+
+ /*
+ * T3A: Add some fluff to handle extra CQEs inserted
+ * for various errors.
+ * Additional CQE possibilities:
+ * TERMINATE,
+ * incoming RDMA WRITE Failures
+ * incoming RDMA READ REQUEST FAILUREs
+ * NOTE: We cannot ensure the CQ won't overflow.
+ */
+ entries += 16;
+ }
+ entries = roundup_pow_of_two(entries);
+ chp->cq.size_log2 = long_log2(entries);
+
+ if (cxio_create_cq(&rhp->rdev, &chp->cq)) {
+ kfree(chp);
+ return ERR_PTR(-ENOMEM);
+ }
+ chp->rhp = rhp;
+ chp->ibcq.cqe = (1 << chp->cq.size_log2) - 1;
+ spin_lock_init(&chp->lock);
+ atomic_set(&chp->refcnt, 1);
+ init_waitqueue_head(&chp->wait);
+ insert_handle(rhp, &rhp->cqidr, chp, chp->cq.cqid);
+
+ if (context) {
+ struct iwch_mm_entry *mm;
+
+ mm = kmalloc(sizeof *mm, GFP_KERNEL);
+ if (!mm) {
+ iwch_destroy_cq(&chp->ibcq);
+ return ERR_PTR(-ENOMEM);
+ }
+ uresp.cqid = chp->cq.cqid;
+ uresp.size_log2 = chp->cq.size_log2;
+ uresp.physaddr = virt_to_phys(chp->cq.queue);
+ if (ib_copy_to_udata(udata, &uresp, sizeof (uresp))) {
+ kfree(mm);
+ iwch_destroy_cq(&chp->ibcq);
+ return ERR_PTR(-EFAULT);
+ }
+ mm->addr = uresp.physaddr;
+ mm->len = PAGE_ALIGN((1UL << uresp.size_log2) *
+ sizeof (struct t3_cqe));
+ insert_mmap(to_iwch_ucontext(context), mm);
+ }
+ PDBG("created cqid 0x%0x chp %p size 0x%0x, dma_addr 0x%0llx\n",
+ chp->cq.cqid, chp, (1 << chp->cq.size_log2),
+ (u64)chp->cq.dma_addr);
+ return &chp->ibcq;
+}
+
+static int iwch_resize_cq(struct ib_cq *cq, int cqe, struct ib_udata *udata)
+{
+ struct iwch_cq *chp = to_iwch_cq(cq);
+ struct t3_cq oldcq, newcq;
+ int ret;
+
+ PDBG("%s ib_cq %p cqe %d\n", __FUNCTION__, cq, cqe);
+
+ /* We don't downsize... */
+ if (cqe <= cq->cqe)
+ return 0;
+
+ /* create new t3_cq with new size */
+ cqe = roundup_pow_of_two(cqe+1);
+ newcq.size_log2 = long_log2(cqe);
+
+ /* Dont allow resize to less than the current wce count */
+ if (cqe < Q_COUNT(chp->cq.rptr, chp->cq.wptr)) {
+ return -ENOMEM;
+ }
+
+ /* Quiesce all QPs using this CQ */
+ ret = iwch_quiesce_qps(chp);
+ if (ret) {
+ return ret;
+ }
+
+ ret = cxio_create_cq(&chp->rhp->rdev, &newcq);
+ if (ret) {
+ kfree(chp);
+ return ret;
+ }
+
+ /* copy CQEs */
+ memcpy(newcq.queue, chp->cq.queue, (1 << chp->cq.size_log2) *
+ sizeof(struct t3_cqe));
+
+ /* old iwch_qp gets new t3_cq but keeps old cqid */
+ oldcq = chp->cq;
+ chp->cq = newcq;
+ chp->cq.cqid = oldcq.cqid;
+
+ /* resize new t3_cq to update the HW context */
+ ret = cxio_resize_cq(&chp->rhp->rdev, &chp->cq);
+ if (ret) {
+ chp->cq = oldcq;
+ return ret;
+ }
+ chp->ibcq.cqe = (1<<chp->cq.size_log2) - 1;
+
+ /* destroy old t3_cq */
+ oldcq.cqid = newcq.cqid;
+ ret = cxio_destroy_cq(&chp->rhp->rdev, &oldcq);
+ if (ret) {
+ printk(KERN_ERR MOD "%s - cxio_destroy_cq failed %d\n",
+ __FUNCTION__, ret);
+ }
+
+ /* add user hooks here */
+
+ /* resume qps */
+ ret = iwch_resume_qps(chp);
+ return ret;
+}
+
+static int iwch_arm_cq(struct ib_cq *ibcq, enum ib_cq_notify notify,
+ struct ib_udata *udata)
+{
+ struct iwch_dev *rhp;
+ struct iwch_cq *chp;
+ enum t3_cq_opcode cq_op;
+ int err;
+ unsigned long flag;
+ struct iwch_req_notify_cq ucmd;
+
+ chp = to_iwch_cq(ibcq);
+ rhp = chp->rhp;
+ if (notify == IB_CQ_SOLICITED)
+ cq_op = CQ_ARM_SE;
+ else
+ cq_op = CQ_ARM_AN;
+ if (udata && t3b_device(rhp)) {
+ if (ib_copy_from_udata(&ucmd, udata, sizeof ucmd))
+ return -EFAULT;
+ spin_lock_irqsave(&chp->lock, flag);
+ chp->cq.rptr = ucmd.rptr;
+ } else
+ spin_lock_irqsave(&chp->lock, flag);
+ PDBG("%s rptr 0x%x\n", __FUNCTION__, chp->cq.rptr);
+ err = cxio_hal_cq_op(&rhp->rdev, &chp->cq, cq_op, 0);
+ spin_unlock_irqrestore(&chp->lock, flag);
+ if (err)
+ printk(KERN_ERR MOD "Error %d rearming CQID 0x%x\n", err,
+ chp->cq.cqid);
+ return err;
+}
+
+static int iwch_mmap(struct ib_ucontext *context, struct vm_area_struct *vma)
+{
+ int len = vma->vm_end - vma->vm_start;
+ u64 pgaddr = vma->vm_pgoff << PAGE_SHIFT;
+ struct cxio_rdev *rdev_p;
+ int ret = 0;
+ struct iwch_mm_entry *mm;
+ struct iwch_ucontext *ucontext;
+
+ PDBG("%s off 0x%lx addr 0x%llx len %d\n", __FUNCTION__, vma->vm_pgoff,
+ pgaddr, len);
+
+ if (vma->vm_start & (PAGE_SIZE-1)) {
+ return -EINVAL;
+ }
+
+ rdev_p = &(to_iwch_dev(context->device)->rdev);
+ ucontext = to_iwch_ucontext(context);
+
+ mm = remove_mmap(ucontext, pgaddr, len);
+ if (!mm)
+ return -EINVAL;
+ kfree(mm);
+
+ if ((pgaddr >= rdev_p->rnic_info.udbell_physbase) &&
+ (pgaddr < (rdev_p->rnic_info.udbell_physbase +
+ rdev_p->rnic_info.udbell_len))) {
+
+ /*
+ * Map T3 DB register.
+ */
+ if (vma->vm_flags & VM_READ) {
+ return -EPERM;
+ }
+
+ vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);
+ vma->vm_flags |= VM_DONTCOPY | VM_DONTEXPAND;
+ vma->vm_flags &= ~VM_MAYREAD;
+ ret = io_remap_pfn_range(vma, vma->vm_start, vma->vm_pgoff,
+ len, vma->vm_page_prot);
+ } else {
+
+ /*
+ * Map WQ or CQ contig dma memory...
+ */
+ ret = remap_pfn_range(vma, vma->vm_start, vma->vm_pgoff,
+ len, vma->vm_page_prot);
+ }
+
+ return ret;
+}
+
+static int iwch_deallocate_pd(struct ib_pd *pd)
+{
+ struct iwch_dev *rhp;
+ struct iwch_pd *php;
+
+ php = to_iwch_pd(pd);
+ rhp = php->rhp;
+ PDBG("%s ibpd %p pdid 0x%x\n", __FUNCTION__, pd, php->pdid);
+ cxio_hal_put_pdid(rhp->rdev.rscp, php->pdid);
+ kfree(php);
+ return 0;
+}
+
+static struct ib_pd *iwch_allocate_pd(struct ib_device *ibdev,
+ struct ib_ucontext *context,
+ struct ib_udata *udata)
+{
+ struct iwch_pd *php;
+ u32 pdid;
+ struct iwch_dev *rhp;
+
+ PDBG("%s ibdev %p\n", __FUNCTION__, ibdev);
+ rhp = (struct iwch_dev *) ibdev;
+ pdid = cxio_hal_get_pdid(rhp->rdev.rscp);
+ if (!pdid)
+ return ERR_PTR(-EINVAL);
+ php = kzalloc(sizeof(*php), GFP_KERNEL);
+ if (!php) {
+ cxio_hal_put_pdid(rhp->rdev.rscp, pdid);
+ return ERR_PTR(-ENOMEM);
+ }
+ php->pdid = pdid;
+ php->rhp = rhp;
+ if (context) {
+ if (ib_copy_to_udata(udata, &php->pdid, sizeof (__u32))) {
+ iwch_deallocate_pd(&php->ibpd);
+ return ERR_PTR(-EFAULT);
+ }
+ }
+ PDBG("%s pdid 0x%0x ptr 0x%p\n", __FUNCTION__, pdid, php);
+ return &php->ibpd;
+}
+
+static int iwch_dereg_mr(struct ib_mr *ib_mr)
+{
+ struct iwch_dev *rhp;
+ struct iwch_mr *mhp;
+ u32 mmid;
+
+ PDBG("%s ib_mr %p\n", __FUNCTION__, ib_mr);
+ /* There can be no memory windows */
+ if (atomic_read(&ib_mr->usecnt))
+ return -EINVAL;
+
+ mhp = to_iwch_mr(ib_mr);
+ rhp = mhp->rhp;
+ mmid = mhp->attr.stag >> 8;
+ cxio_dereg_mem(&rhp->rdev, mhp->attr.stag, mhp->attr.pbl_size,
+ mhp->attr.pbl_addr);
+ remove_handle(rhp, &rhp->mmidr, mmid);
+ if (mhp->kva)
+ kfree((void *) (unsigned long) mhp->kva);
+ PDBG("%s mmid 0x%x ptr %p\n", __FUNCTION__, mmid, mhp);
+ kfree(mhp);
+ return 0;
+}
+
+static struct ib_mr *iwch_register_phys_mem(struct ib_pd *pd,
+ struct ib_phys_buf *buffer_list,
+ int num_phys_buf,
+ int acc,
+ u64 *iova_start)
+{
+ __be64 *page_list;
+ int shift;
+ u64 total_size;
+ int npages;
+ struct iwch_dev *rhp;
+ struct iwch_pd *php;
+ struct iwch_mr *mhp;
+ int ret;
+
+ PDBG("%s ib_pd %p\n", __FUNCTION__, pd);
+ php = to_iwch_pd(pd);
+ rhp = php->rhp;
+
+ acc = iwch_convert_access(acc);
+
+
+ mhp = kzalloc(sizeof(*mhp), GFP_KERNEL);
+ if (!mhp)
+ return ERR_PTR(-ENOMEM);
+
+ /* First check that we have enough alignment */
+ if ((*iova_start & ~PAGE_MASK) != (buffer_list[0].addr & ~PAGE_MASK)) {
+ ret = -EINVAL;
+ goto err;
+ }
+
+ if (num_phys_buf > 1 &&
+ ((buffer_list[0].addr + buffer_list[0].size) & ~PAGE_MASK)) {
+ ret = -EINVAL;
+ goto err;
+ }
+
+ ret = build_phys_page_list(buffer_list, num_phys_buf, iova_start,
+ &total_size, &npages, &shift, &page_list);
+ if (ret)
+ goto err;
+
+ mhp->rhp = rhp;
+ mhp->attr.pdid = php->pdid;
+ mhp->attr.zbva = 0;
+
+ /* NOTE: TPT perms are backwards from BIND WR perms! */
+ mhp->attr.perms = (acc & 0x1) << 3;
+ mhp->attr.perms |= (acc & 0x2) << 1;
+ mhp->attr.perms |= (acc & 0x4) >> 1;
+ mhp->attr.perms |= (acc & 0x8) >> 3;
+
+ mhp->attr.va_fbo = *iova_start;
+ mhp->attr.page_size = shift - 12;
+
+ mhp->attr.len = (u32) total_size;
+ mhp->attr.pbl_size = npages;
+ ret = iwch_register_mem(rhp, php, mhp, shift, page_list);
+ kfree(page_list);
+ if (ret) {
+ goto err;
+ }
+ return &mhp->ibmr;
+err:
+ kfree(mhp);
+ return ERR_PTR(ret);
+
+}
+
+static int iwch_reregister_phys_mem(struct ib_mr *mr,
+ int mr_rereg_mask,
+ struct ib_pd *pd,
+ struct ib_phys_buf *buffer_list,
+ int num_phys_buf,
+ int acc, u64 * iova_start)
+{
+
+ struct iwch_mr mh, *mhp;
+ struct iwch_pd *php;
+ struct iwch_dev *rhp;
+ int new_acc;
+ __be64 *page_list = NULL;
+ int shift = 0;
+ u64 total_size;
+ int npages;
+ int ret;
+
+ PDBG("%s ib_mr %p ib_pd %p\n", __FUNCTION__, mr, pd);
+
+ /* There can be no memory windows */
+ if (atomic_read(&mr->usecnt))
+ return -EINVAL;
+
+ mhp = to_iwch_mr(mr);
+ rhp = mhp->rhp;
+ php = to_iwch_pd(mr->pd);
+
+ /* make sure we are on the same adapter */
+ if (rhp != php->rhp)
+ return -EINVAL;
+
+ new_acc = mhp->attr.perms;
+
+ memcpy(&mh, mhp, sizeof *mhp);
+
+ if (mr_rereg_mask & IB_MR_REREG_PD)
+ php = to_iwch_pd(pd);
+ if (mr_rereg_mask & IB_MR_REREG_ACCESS)
+ mh.attr.perms = iwch_convert_access(acc);
+ if (mr_rereg_mask & IB_MR_REREG_TRANS)
+ ret = build_phys_page_list(buffer_list, num_phys_buf,
+ iova_start,
+ &total_size, &npages,
+ &shift, &page_list);
+
+ ret = iwch_reregister_mem(rhp, php, &mh, shift, page_list, npages);
+ kfree(page_list);
+ if (ret) {
+ return ret;
+ }
+ if (mr_rereg_mask & IB_MR_REREG_PD)
+ mhp->attr.pdid = php->pdid;
+ if (mr_rereg_mask & IB_MR_REREG_ACCESS)
+ mhp->attr.perms = acc;
+ if (mr_rereg_mask & IB_MR_REREG_TRANS) {
+ mhp->attr.zbva = 0;
+ mhp->attr.va_fbo = *iova_start;
+ mhp->attr.page_size = shift - 12;
+ mhp->attr.len = (u32) total_size;
+ mhp->attr.pbl_size = npages;
+ }
+
+ return 0;
+}
+
+
+struct ib_mr *iwch_reg_user_mr(struct ib_pd *pd, struct ib_umem *region,
+ int acc, struct ib_udata *udata)
+{
+ __be64 *pages;
+ int shift, n, len;
+ int i, j, k;
+ int err = 0;
+ struct ib_umem_chunk *chunk;
+ struct iwch_dev *rhp;
+ struct iwch_pd *php;
+ struct iwch_mr *mhp;
+ struct iwch_reg_user_mr_resp uresp;
+
+ PDBG("%s ib_pd %p\n", __FUNCTION__, pd);
+ shift = ffs(region->page_size) - 1;
+
+ php = to_iwch_pd(pd);
+ rhp = php->rhp;
+ mhp = kzalloc(sizeof(*mhp), GFP_KERNEL);
+ if (!mhp)
+ return ERR_PTR(-ENOMEM);
+
+ n = 0;
+ list_for_each_entry(chunk, &region->chunk_list, list)
+ n += chunk->nents;
+
+ pages = kmalloc(n * sizeof(u64), GFP_KERNEL);
+ if (!pages) {
+ err = -ENOMEM;
+ goto err;
+ }
+
+ acc = iwch_convert_access(acc);
+
+ i = n = 0;
+
+ list_for_each_entry(chunk, &region->chunk_list, list)
+ for (j = 0; j < chunk->nmap; ++j) {
+ len = sg_dma_len(&chunk->page_list[j]) >> shift;
+ for (k = 0; k < len; ++k) {
+ pages[i++] = cpu_to_be64(sg_dma_address(
+ &chunk->page_list[j]) +
+ region->page_size * k);
+ }
+ }
+
+ mhp->rhp = rhp;
+ mhp->attr.pdid = php->pdid;
+ mhp->attr.zbva = 0;
+ mhp->attr.perms = (acc & 0x1) << 3;
+ mhp->attr.perms |= (acc & 0x2) << 1;
+ mhp->attr.perms |= (acc & 0x4) >> 1;
+ mhp->attr.perms |= (acc & 0x8) >> 3;
+ mhp->attr.va_fbo = region->virt_base;
+ mhp->attr.page_size = shift - 12;
+ mhp->attr.len = (u32) region->length;
+ mhp->attr.pbl_size = i;
+ err = iwch_register_mem(rhp, php, mhp, shift, pages);
+ kfree(pages);
+ if (err)
+ goto err;
+
+ if (udata && t3b_device(rhp)) {
+ uresp.pbl_addr = (mhp->attr.pbl_addr -
+ rhp->rdev.rnic_info.pbl_base) >> 3;
+ PDBG("%s user resp pbl_addr 0x%x\n", __FUNCTION__,
+ uresp.pbl_addr);
+
+ if (ib_copy_to_udata(udata, &uresp, sizeof (uresp))) {
+ iwch_dereg_mr(&mhp->ibmr);
+ err = -EFAULT;
+ goto err;
+ }
+ }
+
+ return &mhp->ibmr;
+
+err:
+ kfree(mhp);
+ return ERR_PTR(err);
+}
+
+struct ib_mr *iwch_get_dma_mr(struct ib_pd *pd, int acc)
+{
+ struct ib_phys_buf bl;
+ u64 kva;
+ struct ib_mr *ibmr;
+
+ PDBG("%s ib_pd %p\n", __FUNCTION__, pd);
+
+ /*
+ * T3 only supports 32 bits of size.
+ */
+ bl.size = 0xffffffff;
+ bl.addr = 0;
+ kva = 0;
+ ibmr = iwch_register_phys_mem(pd, &bl, 1, acc, &kva);
+ return ibmr;
+}
+
+struct ib_mw *iwch_alloc_mw(struct ib_pd *pd)
+{
+ struct iwch_dev *rhp;
+ struct iwch_pd *php;
+ struct iwch_mw *mhp;
+ u32 mmid;
+ u32 stag = 0;
+ int ret;
+
+ php = to_iwch_pd(pd);
+ rhp = php->rhp;
+ mhp = kzalloc(sizeof(*mhp), GFP_KERNEL);
+ if (!mhp)
+ return ERR_PTR(-ENOMEM);
+ ret = cxio_allocate_window(&rhp->rdev, &stag, php->pdid);
+ if (ret) {
+ kfree(mhp);
+ return ERR_PTR(ret);
+ }
+ mhp->rhp = rhp;
+ mhp->attr.pdid = php->pdid;
+ mhp->attr.type = TPT_MW;
+ mhp->attr.stag = stag;
+ mmid = (stag) >> 8;
+ insert_handle(rhp, &rhp->mmidr, mhp, mmid);
+ PDBG("%s mmid 0x%x mhp %p stag 0x%x\n", __FUNCTION__, mmid, mhp, stag);
+ return &(mhp->ibmw);
+}
+
+int iwch_dealloc_mw(struct ib_mw *mw)
+{
+ struct iwch_dev *rhp;
+ struct iwch_mw *mhp;
+ u32 mmid;
+
+ mhp = to_iwch_mw(mw);
+ rhp = mhp->rhp;
+ mmid = (mw->rkey) >> 8;
+ cxio_deallocate_window(&rhp->rdev, mhp->attr.stag);
+ remove_handle(rhp, &rhp->mmidr, mmid);
+ kfree(mhp);
+ PDBG("%s ib_mw %p mmid 0x%x ptr %p\n", __FUNCTION__, mw, mmid, mhp);
+ return 0;
+}
+
+static int iwch_destroy_qp(struct ib_qp *ib_qp)
+{
+ struct iwch_dev *rhp;
+ struct iwch_qp *qhp;
+ struct iwch_qp_attributes attrs;
+ struct iwch_ucontext *ucontext;
+
+ qhp = to_iwch_qp(ib_qp);
+ rhp = qhp->rhp;
+
+ if (qhp->attr.state == IWCH_QP_STATE_RTS) {
+ attrs.next_state = IWCH_QP_STATE_ERROR;
+ iwch_modify_qp(rhp, qhp, IWCH_QP_ATTR_NEXT_STATE, &attrs, 0);
+ }
+ wait_event(qhp->wait, !qhp->ep);
+
+ remove_handle(rhp, &rhp->qpidr, qhp->wq.qpid);
+
+ atomic_dec(&qhp->refcnt);
+ wait_event(qhp->wait, !atomic_read(&qhp->refcnt));
+
+ ucontext = ib_qp->uobject ? to_iwch_ucontext(ib_qp->uobject->context)
+ : NULL;
+ cxio_destroy_qp(&rhp->rdev, &qhp->wq,
+ ucontext ? &ucontext->uctx : &rhp->rdev.uctx);
+
+ PDBG("%s ib_qp %p qpid 0x%0x qhp %p\n", __FUNCTION__,
+ ib_qp, qhp->wq.qpid, qhp);
+ kfree(qhp);
+ return 0;
+}
+
+static struct ib_qp *iwch_create_qp(struct ib_pd *pd,
+ struct ib_qp_init_attr *attrs,
+ struct ib_udata *udata)
+{
+ struct iwch_dev *rhp;
+ struct iwch_qp *qhp;
+ struct iwch_pd *php;
+ struct iwch_cq *schp;
+ struct iwch_cq *rchp;
+ struct iwch_create_qp_resp uresp;
+ int wqsize, sqsize, rqsize;
+ struct iwch_ucontext *ucontext;
+
+ PDBG("%s ib_pd %p\n", __FUNCTION__, pd);
+ if (attrs->qp_type != IB_QPT_RC)
+ return ERR_PTR(-EINVAL);
+ php = to_iwch_pd(pd);
+ rhp = php->rhp;
+ schp = get_chp(rhp, ((struct iwch_cq *) attrs->send_cq)->cq.cqid);
+ rchp = get_chp(rhp, ((struct iwch_cq *) attrs->recv_cq)->cq.cqid);
+ if (!schp || !rchp)
+ return ERR_PTR(-EINVAL);
+
+ /* The RQT size must be # of entries + 1 rounded up to a power of two */
+ rqsize = roundup_pow_of_two(attrs->cap.max_recv_wr);
+ if (rqsize == attrs->cap.max_recv_wr)
+ rqsize = roundup_pow_of_two(attrs->cap.max_recv_wr+1);
+
+ /* T3 doesn't support RQT depth < 16 */
+ if (rqsize < 16)
+ rqsize = 16;
+
+ if (rqsize > T3_MAX_RQ_SIZE)
+ return ERR_PTR(-EINVAL);
+
+ /*
+ * NOTE: The SQ and total WQ sizes don't need to be
+ * a power of two. However, all the code assumes
+ * they are. EG: Q_FREECNT() and friends.
+ */
+ sqsize = roundup_pow_of_two(attrs->cap.max_send_wr);
+ wqsize = roundup_pow_of_two(rqsize + sqsize);
+ PDBG("%s wqsize %d sqsize %d rqsize %d\n", __FUNCTION__,
+ wqsize, sqsize, rqsize);
+ qhp = kzalloc(sizeof(*qhp), GFP_KERNEL);
+ if (!qhp)
+ return ERR_PTR(-ENOMEM);
+ qhp->wq.size_log2 = long_log2(wqsize);
+ qhp->wq.rq_size_log2 = long_log2(rqsize);
+ qhp->wq.sq_size_log2 = long_log2(sqsize);
+ ucontext = pd->uobject ? to_iwch_ucontext(pd->uobject->context) : NULL;
+ if (cxio_create_qp(&rhp->rdev, !udata, &qhp->wq,
+ ucontext ? &ucontext->uctx : &rhp->rdev.uctx)) {
+ kfree(qhp);
+ return ERR_PTR(-ENOMEM);
+ }
+ attrs->cap.max_recv_wr = rqsize - 1;
+ attrs->cap.max_send_wr = sqsize;
+ qhp->rhp = rhp;
+ qhp->attr.pd = php->pdid;
+ qhp->attr.scq = ((struct iwch_cq *) attrs->send_cq)->cq.cqid;
+ qhp->attr.rcq = ((struct iwch_cq *) attrs->recv_cq)->cq.cqid;
+ qhp->attr.sq_num_entries = attrs->cap.max_send_wr;
+ qhp->attr.rq_num_entries = attrs->cap.max_recv_wr;
+ qhp->attr.sq_max_sges = attrs->cap.max_send_sge;
+ qhp->attr.sq_max_sges_rdma_write = attrs->cap.max_send_sge;
+ qhp->attr.rq_max_sges = attrs->cap.max_recv_sge;
+ qhp->attr.state = IWCH_QP_STATE_IDLE;
+ qhp->attr.next_state = IWCH_QP_STATE_IDLE;
+
+ /*
+ * XXX - These don't get passed in from the openib user
+ * at create time. The CM sets them via a QP modify.
+ * Need to fix... I think the CM should
+ */
+ qhp->attr.enable_rdma_read = 1;
+ qhp->attr.enable_rdma_write = 1;
+ qhp->attr.enable_bind = 1;
+ qhp->attr.max_ord = 1;
+ qhp->attr.max_ird = 1;
+
+ spin_lock_init(&qhp->lock);
+ init_waitqueue_head(&qhp->wait);
+ atomic_set(&qhp->refcnt, 1);
+ insert_handle(rhp, &rhp->qpidr, qhp, qhp->wq.qpid);
+
+ if (udata) {
+
+ struct iwch_mm_entry *mm1, *mm2;
+
+ mm1 = kmalloc(sizeof *mm1, GFP_KERNEL);
+ if (!mm1) {
+ iwch_destroy_qp(&qhp->ibqp);
+ return ERR_PTR(-ENOMEM);
+ }
+
+ mm2 = kmalloc(sizeof *mm2, GFP_KERNEL);
+ if (!mm2) {
+ kfree(mm1);
+ iwch_destroy_qp(&qhp->ibqp);
+ return ERR_PTR(-ENOMEM);
+ }
+
+ uresp.qpid = qhp->wq.qpid;
+ uresp.size_log2 = qhp->wq.size_log2;
+ uresp.sq_size_log2 = qhp->wq.sq_size_log2;
+ uresp.rq_size_log2 = qhp->wq.rq_size_log2;
+ uresp.physaddr = virt_to_phys(qhp->wq.queue);
+ uresp.doorbell = qhp->wq.udb;
+ if (ib_copy_to_udata(udata, &uresp, sizeof (uresp))) {
+ kfree(mm1);
+ kfree(mm2);
+ iwch_destroy_qp(&qhp->ibqp);
+ return ERR_PTR(-EFAULT);
+ }
+ mm1->addr = uresp.physaddr;
+ mm1->len = PAGE_ALIGN(wqsize * sizeof (union t3_wr));
+ insert_mmap(ucontext, mm1);
+ mm2->addr = uresp.doorbell & PAGE_MASK;
+ mm2->len = PAGE_SIZE;
+ insert_mmap(ucontext, mm2);
+ }
+ qhp->ibqp.qp_num = qhp->wq.qpid;
+ init_timer(&(qhp->timer));
+ PDBG("%s sq_num_entries %d, rq_num_entries %d "
+ "qpid 0x%0x qhp %p dma_addr 0x%llx size %d\n",
+ __FUNCTION__, qhp->attr.sq_num_entries, qhp->attr.rq_num_entries,
+ qhp->wq.qpid, qhp, (u64)qhp->wq.dma_addr, 1 << qhp->wq.size_log2);
+ return (&qhp->ibqp);
+}
+
+static int iwch_ib_modify_qp(struct ib_qp *ibqp, struct ib_qp_attr *attr,
+ int attr_mask, struct ib_udata *udata)
+{
+ struct iwch_dev *rhp;
+ struct iwch_qp *qhp;
+ enum iwch_qp_attr_mask mask = 0;
+ struct iwch_qp_attributes attrs;
+
+ PDBG("%s ib_qp %p\n", __FUNCTION__, ibqp);
+
+ /* iwarp does not support the RTR state */
+ if ((attr_mask & IB_QP_STATE) && (attr->qp_state == IB_QPS_RTR))
+ attr_mask &= ~IB_QP_STATE;
+
+ /* Make sure we still have something left to do */
+ if (!attr_mask)
+ return 0;
+
+ memset(&attrs, 0, sizeof attrs);
+ qhp = to_iwch_qp(ibqp);
+ rhp = qhp->rhp;
+
+ attrs.next_state = iwch_convert_state(attr->qp_state);
+ attrs.enable_rdma_read = (attr->qp_access_flags &
+ IB_ACCESS_REMOTE_READ) ? 1 : 0;
+ attrs.enable_rdma_write = (attr->qp_access_flags &
+ IB_ACCESS_REMOTE_WRITE) ? 1 : 0;
+ attrs.enable_bind = (attr->qp_access_flags & IB_ACCESS_MW_BIND) ? 1 : 0;
+
+
+ mask |= (attr_mask & IB_QP_STATE) ? IWCH_QP_ATTR_NEXT_STATE : 0;
+ mask |= (attr_mask & IB_QP_ACCESS_FLAGS) ?
+ (IWCH_QP_ATTR_ENABLE_RDMA_READ |
+ IWCH_QP_ATTR_ENABLE_RDMA_WRITE |
+ IWCH_QP_ATTR_ENABLE_RDMA_BIND) : 0;
+
+ return iwch_modify_qp(rhp, qhp, mask, &attrs, 0);
+}
+
+void iwch_qp_add_ref(struct ib_qp *qp)
+{
+ PDBG("%s ib_qp %p\n", __FUNCTION__, qp);
+ atomic_inc(&(to_iwch_qp(qp)->refcnt));
+}
+
+void iwch_qp_rem_ref(struct ib_qp *qp)
+{
+ PDBG("%s ib_qp %p\n", __FUNCTION__, qp);
+ if (atomic_dec_and_test(&(to_iwch_qp(qp)->refcnt)))
+ wake_up(&(to_iwch_qp(qp)->wait));
+}
+
+struct ib_qp *iwch_get_qp(struct ib_device *dev, int qpn)
+{
+ PDBG("%s ib_dev %p qpn 0x%x\n", __FUNCTION__, dev, qpn);
+ return (struct ib_qp *)get_qhp(to_iwch_dev(dev), qpn);
+}
+
+
+static int iwch_query_pkey(struct ib_device *ibdev,
+ u8 port, u16 index, u16 * pkey)
+{
+ PDBG("%s ibdev %p\n", __FUNCTION__, ibdev);
+ *pkey = 0;
+ return 0;
+}
+
+static int iwch_query_gid(struct ib_device *ibdev, u8 port,
+ int index, union ib_gid *gid)
+{
+ struct iwch_dev *dev;
+
+ PDBG("%s ibdev %p, port %d, index %d, gid %p\n",
+ __FUNCTION__, ibdev, port, index, gid);
+ dev = to_iwch_dev(ibdev);
+ BUG_ON(port == 0 || port > 2);
+ memset(&(gid->raw[0]), 0, sizeof(gid->raw));
+ memcpy(&(gid->raw[0]), dev->rdev.port_info.lldevs[port-1]->dev_addr, 6);
+ return 0;
+}
+
+static int iwch_query_device(struct ib_device *ibdev,
+ struct ib_device_attr *props)
+{
+
+ struct iwch_dev *dev;
+ PDBG("%s ibdev %p\n", __FUNCTION__, ibdev);
+
+ dev = to_iwch_dev(ibdev);
+ memset(props, 0, sizeof *props);
+ memcpy(&props->sys_image_guid, dev->rdev.t3cdev_p->lldev->dev_addr, 6);
+ props->device_cap_flags = dev->device_cap_flags;
+ props->vendor_id = (u32)dev->rdev.rnic_info.pdev->vendor;
+ props->vendor_part_id = (u32)dev->rdev.rnic_info.pdev->device;
+ props->max_mr_size = ~0ull;
+ props->max_qp = dev->attr.max_qps;
+ props->max_qp_wr = dev->attr.max_wrs;
+ props->max_sge = dev->attr.max_sge_per_wr;
+ props->max_sge_rd = 1;
+ props->max_qp_rd_atom = dev->attr.max_rdma_reads_per_qp;
+ props->max_cq = dev->attr.max_cqs;
+ props->max_cqe = dev->attr.max_cqes_per_cq;
+ props->max_mr = dev->attr.max_mem_regs;
+ props->max_pd = dev->attr.max_pds;
+ props->local_ca_ack_delay = 0;
+
+ return 0;
+}
+
+static int iwch_query_port(struct ib_device *ibdev,
+ u8 port, struct ib_port_attr *props)
+{
+ PDBG("%s ibdev %p\n", __FUNCTION__, ibdev);
+ props->max_mtu = IB_MTU_4096;
+ props->lid = 0;
+ props->lmc = 0;
+ props->sm_lid = 0;
+ props->sm_sl = 0;
+ props->state = IB_PORT_ACTIVE;
+ props->phys_state = 0;
+ props->port_cap_flags =
+ IB_PORT_CM_SUP |
+ IB_PORT_SNMP_TUNNEL_SUP |
+ IB_PORT_REINIT_SUP |
+ IB_PORT_DEVICE_MGMT_SUP |
+ IB_PORT_VENDOR_CLASS_SUP | IB_PORT_BOOT_MGMT_SUP;
+ props->gid_tbl_len = 1;
+ props->pkey_tbl_len = 1;
+ props->qkey_viol_cntr = 0;
+ props->active_width = 2;
+ props->active_speed = 2;
+ props->max_msg_sz = -1;
+
+ return 0;
+}
+
+static ssize_t show_rev(struct class_device *cdev, char *buf)
+{
+ struct iwch_dev *dev = container_of(cdev, struct iwch_dev,
+ ibdev.class_dev);
+ PDBG("%s class dev 0x%p\n", __FUNCTION__, cdev);
+ return sprintf(buf, "%d\n", dev->rdev.t3cdev_p->type);
+}
+
+static ssize_t show_fw_ver(struct class_device *cdev, char *buf)
+{
+ struct iwch_dev *dev = container_of(cdev, struct iwch_dev,
+ ibdev.class_dev);
+ struct ethtool_drvinfo info;
+ struct net_device *lldev = dev->rdev.t3cdev_p->lldev;
+
+ PDBG("%s class dev 0x%p\n", __FUNCTION__, cdev);
+ lldev->ethtool_ops->get_drvinfo(lldev, &info);
+ return sprintf(buf, "%s\n", info.fw_version);
+}
+
+static ssize_t show_hca(struct class_device *cdev, char *buf)
+{
+ struct iwch_dev *dev = container_of(cdev, struct iwch_dev,
+ ibdev.class_dev);
+ struct ethtool_drvinfo info;
+ struct net_device *lldev = dev->rdev.t3cdev_p->lldev;
+
+ PDBG("%s class dev 0x%p\n", __FUNCTION__, cdev);
+ lldev->ethtool_ops->get_drvinfo(lldev, &info);
+ return sprintf(buf, "%s\n", info.driver);
+}
+
+static ssize_t show_board(struct class_device *cdev, char *buf)
+{
+ struct iwch_dev *dev = container_of(cdev, struct iwch_dev,
+ ibdev.class_dev);
+ PDBG("%s class dev 0x%p\n", __FUNCTION__, dev);
+ return sprintf(buf, "%x.%x\n", dev->rdev.rnic_info.pdev->vendor,
+ dev->rdev.rnic_info.pdev->device);
+}
+
+static CLASS_DEVICE_ATTR(hw_rev, S_IRUGO, show_rev, NULL);
+static CLASS_DEVICE_ATTR(fw_ver, S_IRUGO, show_fw_ver, NULL);
+static CLASS_DEVICE_ATTR(hca_type, S_IRUGO, show_hca, NULL);
+static CLASS_DEVICE_ATTR(board_id, S_IRUGO, show_board, NULL);
+
+static struct class_device_attribute *iwch_class_attributes[] = {
+ &class_device_attr_hw_rev,
+ &class_device_attr_fw_ver,
+ &class_device_attr_hca_type,
+ &class_device_attr_board_id
+};
+
+int iwch_register_device(struct iwch_dev *dev)
+{
+ int ret;
+ int i;
+
+ PDBG("%s iwch_dev %p\n", __FUNCTION__, dev);
+ strlcpy(dev->ibdev.name, "cxgb3_%d", IB_DEVICE_NAME_MAX);
+ memset(&dev->ibdev.node_guid, 0, sizeof(dev->ibdev.node_guid));
+ memcpy(&dev->ibdev.node_guid, dev->rdev.t3cdev_p->lldev->dev_addr, 6);
+ dev->ibdev.owner = THIS_MODULE;
+ dev->device_cap_flags =
+ (IB_DEVICE_ZERO_STAG |
+ IB_DEVICE_SEND_W_INV | IB_DEVICE_MEM_WINDOW);
+
+ dev->ibdev.uverbs_cmd_mask =
+ (1ull << IB_USER_VERBS_CMD_GET_CONTEXT) |
+ (1ull << IB_USER_VERBS_CMD_QUERY_DEVICE) |
+ (1ull << IB_USER_VERBS_CMD_QUERY_PORT) |
+ (1ull << IB_USER_VERBS_CMD_ALLOC_PD) |
+ (1ull << IB_USER_VERBS_CMD_DEALLOC_PD) |
+ (1ull << IB_USER_VERBS_CMD_REG_MR) |
+ (1ull << IB_USER_VERBS_CMD_DEREG_MR) |
+ (1ull << IB_USER_VERBS_CMD_CREATE_COMP_CHANNEL) |
+ (1ull << IB_USER_VERBS_CMD_CREATE_CQ) |
+ (1ull << IB_USER_VERBS_CMD_DESTROY_CQ) |
+ (1ull << IB_USER_VERBS_CMD_REQ_NOTIFY_CQ) |
+ (1ull << IB_USER_VERBS_CMD_CREATE_QP) |
+ (1ull << IB_USER_VERBS_CMD_MODIFY_QP) |
+ (1ull << IB_USER_VERBS_CMD_POLL_CQ) |
+ (1ull << IB_USER_VERBS_CMD_DESTROY_QP) |
+ (1ull << IB_USER_VERBS_CMD_POST_SEND) |
+ (1ull << IB_USER_VERBS_CMD_POST_RECV);
+ dev->ibdev.node_type = RDMA_NODE_RNIC;
+ memcpy(dev->ibdev.node_desc, IWCH_NODE_DESC, sizeof(IWCH_NODE_DESC));
+ dev->ibdev.phys_port_cnt = dev->rdev.port_info.nports;
+ dev->ibdev.dma_device = &(dev->rdev.rnic_info.pdev->dev);
+ dev->ibdev.class_dev.dev = &(dev->rdev.rnic_info.pdev->dev);
+ dev->ibdev.query_device = iwch_query_device;
+ dev->ibdev.query_port = iwch_query_port;
+ dev->ibdev.modify_port = iwch_modify_port;
+ dev->ibdev.query_pkey = iwch_query_pkey;
+ dev->ibdev.query_gid = iwch_query_gid;
+ dev->ibdev.alloc_ucontext = iwch_alloc_ucontext;
+ dev->ibdev.dealloc_ucontext = iwch_dealloc_ucontext;
+ dev->ibdev.mmap = iwch_mmap;
+ dev->ibdev.alloc_pd = iwch_allocate_pd;
+ dev->ibdev.dealloc_pd = iwch_deallocate_pd;
+ dev->ibdev.create_ah = iwch_ah_create;
+ dev->ibdev.destroy_ah = iwch_ah_destroy;
+ dev->ibdev.create_qp = iwch_create_qp;
+ dev->ibdev.modify_qp = iwch_ib_modify_qp;
+ dev->ibdev.destroy_qp = iwch_destroy_qp;
+ dev->ibdev.create_cq = iwch_create_cq;
+ dev->ibdev.destroy_cq = iwch_destroy_cq;
+ dev->ibdev.resize_cq = iwch_resize_cq;
+ dev->ibdev.poll_cq = iwch_poll_cq;
+ dev->ibdev.get_dma_mr = iwch_get_dma_mr;
+ dev->ibdev.reg_phys_mr = iwch_register_phys_mem;
+ dev->ibdev.rereg_phys_mr = iwch_reregister_phys_mem;
+ dev->ibdev.reg_user_mr = iwch_reg_user_mr;
+ dev->ibdev.dereg_mr = iwch_dereg_mr;
+ dev->ibdev.alloc_mw = iwch_alloc_mw;
+ dev->ibdev.bind_mw = iwch_bind_mw;
+ dev->ibdev.dealloc_mw = iwch_dealloc_mw;
+
+ dev->ibdev.attach_mcast = iwch_multicast_attach;
+ dev->ibdev.detach_mcast = iwch_multicast_detach;
+ dev->ibdev.process_mad = iwch_process_mad;
+
+ dev->ibdev.req_notify_cq = iwch_arm_cq;
+ dev->ibdev.post_send = iwch_post_send;
+ dev->ibdev.post_recv = iwch_post_receive;
+
+
+ dev->ibdev.iwcm =
+ (struct iw_cm_verbs *) kmalloc(sizeof(struct iw_cm_verbs),
+ GFP_KERNEL);
+ dev->ibdev.iwcm->connect = iwch_connect;
+ dev->ibdev.iwcm->accept = iwch_accept_cr;
+ dev->ibdev.iwcm->reject = iwch_reject_cr;
+ dev->ibdev.iwcm->create_listen = iwch_create_listen;
+ dev->ibdev.iwcm->destroy_listen = iwch_destroy_listen;
+ dev->ibdev.iwcm->add_ref = iwch_qp_add_ref;
+ dev->ibdev.iwcm->rem_ref = iwch_qp_rem_ref;
+ dev->ibdev.iwcm->get_qp = iwch_get_qp;
+
+ ret = ib_register_device(&dev->ibdev);
+ if (ret)
+ goto bail1;
+
+ for (i = 0; i < ARRAY_SIZE(iwch_class_attributes); ++i) {
+ ret = class_device_create_file(&dev->ibdev.class_dev,
+ iwch_class_attributes[i]);
+ if (ret) {
+ goto bail2;
+ }
+ }
+ return 0;
+bail2:
+ ib_unregister_device(&dev->ibdev);
+bail1:
+ return ret;
+}
+
+void iwch_unregister_device(struct iwch_dev *dev)
+{
+ int i;
+
+ PDBG("%s iwch_dev %p\n", __FUNCTION__, dev);
+ for (i = 0; i < ARRAY_SIZE(iwch_class_attributes); ++i)
+ class_device_remove_file(&dev->ibdev.class_dev,
+ iwch_class_attributes[i]);
+ ib_unregister_device(&dev->ibdev);
+ return;
+}
diff --git a/drivers/infiniband/hw/cxgb3/iwch_provider.h b/drivers/infiniband/hw/cxgb3/iwch_provider.h
new file mode 100644
index 0000000..76616ac
--- /dev/null
+++ b/drivers/infiniband/hw/cxgb3/iwch_provider.h
@@ -0,0 +1,362 @@
+/*
+ * Copyright (c) 2006 Chelsio, Inc. All rights reserved.
+ * Copyright (c) 2006 Open Grid Computing, Inc. All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses. You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ * - Redistributions of source code must retain the above
+ * copyright notice, this list of conditions and the following
+ * disclaimer.
+ *
+ * - Redistributions in binary form must reproduce the above
+ * copyright notice, this list of conditions and the following
+ * disclaimer in the documentation and/or other materials
+ * provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+#ifndef __IWCH_PROVIDER_H__
+#define __IWCH_PROVIDER_H__
+
+#include <linux/list.h>
+#include <linux/spinlock.h>
+#include <rdma/ib_verbs.h>
+#include <asm/types.h>
+#include "t3cdev.h"
+#include "iwch.h"
+#include "cxio_wr.h"
+#include "cxio_hal.h"
+
+struct iwch_pd {
+ struct ib_pd ibpd;
+ u32 pdid;
+ struct iwch_dev *rhp;
+};
+
+static inline struct iwch_pd *to_iwch_pd(struct ib_pd *ibpd)
+{
+ return container_of(ibpd, struct iwch_pd, ibpd);
+}
+
+struct tpt_attributes {
+ u32 stag;
+ u32 state:1;
+ u32 type:2;
+ u32 rsvd:1;
+ enum tpt_mem_perm perms;
+ u32 remote_invaliate_disable:1;
+ u32 zbva:1;
+ u32 mw_bind_enable:1;
+ u32 page_size:5;
+
+ u32 pdid;
+ u32 qpid;
+ u32 pbl_addr;
+ u32 len;
+ u64 va_fbo;
+ u32 pbl_size;
+};
+
+struct iwch_mr {
+ struct ib_mr ibmr;
+ struct iwch_dev *rhp;
+ u64 kva;
+ struct tpt_attributes attr;
+};
+
+typedef struct iwch_mw iwch_mw_handle;
+
+static inline struct iwch_mr *to_iwch_mr(struct ib_mr *ibmr)
+{
+ return container_of(ibmr, struct iwch_mr, ibmr);
+}
+
+struct iwch_mw {
+ struct ib_mw ibmw;
+ struct iwch_dev *rhp;
+ u64 kva;
+ struct tpt_attributes attr;
+};
+
+static inline struct iwch_mw *to_iwch_mw(struct ib_mw *ibmw)
+{
+ return container_of(ibmw, struct iwch_mw, ibmw);
+}
+
+struct iwch_cq {
+ struct ib_cq ibcq;
+ struct iwch_dev *rhp;
+ struct t3_cq cq;
+ spinlock_t lock;
+ atomic_t refcnt;
+ wait_queue_head_t wait;
+};
+
+static inline struct iwch_cq *to_iwch_cq(struct ib_cq *ibcq)
+{
+ return container_of(ibcq, struct iwch_cq, ibcq);
+}
+
+enum IWCH_QP_FLAGS {
+ QP_QUIESCED = 0x01
+};
+
+struct iwch_mpa_attributes {
+ u8 recv_marker_enabled;
+ u8 xmit_marker_enabled; /* iWARP: enable inbound Read Resp. */
+ u8 crc_enabled;
+ u8 version; /* 0 or 1 */
+};
+
+struct iwch_qp_attributes {
+ u32 scq;
+ u32 rcq;
+ u32 sq_num_entries;
+ u32 rq_num_entries;
+ u32 sq_max_sges;
+ u32 sq_max_sges_rdma_write;
+ u32 rq_max_sges;
+ u32 state;
+ u8 enable_rdma_read;
+ u8 enable_rdma_write; /* enable inbound Read Resp. */
+ u8 enable_bind;
+ u8 enable_mmid0_fastreg; /* Enable STAG0 + Fast-register */
+ /*
+ * Next QP state. If specify the current state, only the
+ * QP attributes will be modified.
+ */
+ u32 max_ord;
+ u32 max_ird;
+ u32 pd; /* IN */
+ u32 next_state;
+ char terminate_buffer[52];
+ u32 terminate_msg_len;
+ u8 is_terminate_local;
+ struct iwch_mpa_attributes mpa_attr; /* IN-OUT */
+ struct iwch_ep *llp_stream_handle;
+ char *stream_msg_buf; /* Last stream msg. before Idle -> RTS */
+ u32 stream_msg_buf_len; /* Only on Idle -> RTS */
+};
+
+struct iwch_qp {
+ struct ib_qp ibqp;
+ struct iwch_dev *rhp;
+ struct iwch_ep *ep;
+ struct iwch_qp_attributes attr;
+ struct t3_wq wq;
+ spinlock_t lock;
+ atomic_t refcnt;
+ wait_queue_head_t wait;
+ enum IWCH_QP_FLAGS flags;
+ struct timer_list timer;
+};
+
+static inline int qp_quiesced(struct iwch_qp *qhp)
+{
+ return (qhp->flags & QP_QUIESCED);
+}
+
+static inline struct iwch_qp *to_iwch_qp(struct ib_qp *ibqp)
+{
+ return container_of(ibqp, struct iwch_qp, ibqp);
+}
+
+void iwch_qp_add_ref(struct ib_qp *qp);
+void iwch_qp_rem_ref(struct ib_qp *qp);
+struct ib_qp *iwch_get_qp(struct ib_device *dev, int qpn);
+
+struct iwch_ucontext {
+ struct ib_ucontext ibucontext;
+ struct cxio_ucontext uctx;
+ struct list_head mmaps;
+};
+
+static inline struct iwch_ucontext *to_iwch_ucontext(struct ib_ucontext *c)
+{
+ return container_of(c, struct iwch_ucontext, ibucontext);
+}
+
+struct iwch_mm_entry {
+ struct list_head entry;
+ u64 addr;
+ unsigned len;
+};
+
+static inline struct iwch_mm_entry *remove_mmap(struct iwch_ucontext *ucontext,
+ u64 addr, unsigned len)
+{
+ struct list_head *pos, *nxt;
+ struct iwch_mm_entry *mm;
+
+ mutex_lock(&ucontext->uctx.lock);
+ list_for_each_safe(pos, nxt, &ucontext->mmaps) {
+
+ mm = list_entry(pos, struct iwch_mm_entry, entry);
+ if (mm->addr == addr && mm->len == len) {
+ list_del_init(&mm->entry);
+ mutex_unlock(&ucontext->uctx.lock);
+ PDBG("%s addr 0x%llx len %d\n", __FUNCTION__, mm->addr,
+ mm->len);
+ return mm;
+ }
+ }
+ mutex_unlock(&ucontext->uctx.lock);
+ return NULL;
+}
+
+static inline void insert_mmap(struct iwch_ucontext *ucontext,
+ struct iwch_mm_entry *mm)
+{
+ mutex_lock(&ucontext->uctx.lock);
+ PDBG("%s addr 0x%llx len %d\n", __FUNCTION__, mm->addr, mm->len);
+ list_add_tail(&mm->entry, &ucontext->mmaps);
+ mutex_unlock(&ucontext->uctx.lock);
+}
+
+enum iwch_qp_attr_mask {
+ IWCH_QP_ATTR_NEXT_STATE = 1 << 0,
+ IWCH_QP_ATTR_ENABLE_RDMA_READ = 1 << 7,
+ IWCH_QP_ATTR_ENABLE_RDMA_WRITE = 1 << 8,
+ IWCH_QP_ATTR_ENABLE_RDMA_BIND = 1 << 9,
+ IWCH_QP_ATTR_MAX_ORD = 1 << 11,
+ IWCH_QP_ATTR_MAX_IRD = 1 << 12,
+ IWCH_QP_ATTR_LLP_STREAM_HANDLE = 1 << 22,
+ IWCH_QP_ATTR_STREAM_MSG_BUFFER = 1 << 23,
+ IWCH_QP_ATTR_MPA_ATTR = 1 << 24,
+ IWCH_QP_ATTR_QP_CONTEXT_ACTIVATE = 1 << 25,
+ IWCH_QP_ATTR_VALID_MODIFY = (IWCH_QP_ATTR_ENABLE_RDMA_READ |
+ IWCH_QP_ATTR_ENABLE_RDMA_WRITE |
+ IWCH_QP_ATTR_MAX_ORD |
+ IWCH_QP_ATTR_MAX_IRD |
+ IWCH_QP_ATTR_LLP_STREAM_HANDLE |
+ IWCH_QP_ATTR_STREAM_MSG_BUFFER |
+ IWCH_QP_ATTR_MPA_ATTR |
+ IWCH_QP_ATTR_QP_CONTEXT_ACTIVATE)
+};
+
+int iwch_modify_qp(struct iwch_dev *rhp,
+ struct iwch_qp *qhp,
+ enum iwch_qp_attr_mask mask,
+ struct iwch_qp_attributes *attrs,
+ int internal);
+
+enum iwch_qp_state {
+ IWCH_QP_STATE_IDLE,
+ IWCH_QP_STATE_RTS,
+ IWCH_QP_STATE_ERROR,
+ IWCH_QP_STATE_TERMINATE,
+ IWCH_QP_STATE_CLOSING,
+ IWCH_QP_STATE_TOT
+};
+
+static inline int iwch_convert_state(enum ib_qp_state ib_state)
+{
+ switch (ib_state) {
+ case IB_QPS_RESET:
+ case IB_QPS_INIT:
+ return IWCH_QP_STATE_IDLE;
+ case IB_QPS_RTS:
+ return IWCH_QP_STATE_RTS;
+ case IB_QPS_SQD:
+ return IWCH_QP_STATE_CLOSING;
+ case IB_QPS_SQE:
+ return IWCH_QP_STATE_TERMINATE;
+ case IB_QPS_ERR:
+ return IWCH_QP_STATE_ERROR;
+ default:
+ return -1;
+ }
+}
+
+enum iwch_mem_perms {
+ IWCH_MEM_ACCESS_LOCAL_READ = 1 << 0,
+ IWCH_MEM_ACCESS_LOCAL_WRITE = 1 << 1,
+ IWCH_MEM_ACCESS_REMOTE_READ = 1 << 2,
+ IWCH_MEM_ACCESS_REMOTE_WRITE = 1 << 3,
+ IWCH_MEM_ACCESS_ATOMICS = 1 << 4,
+ IWCH_MEM_ACCESS_BINDING = 1 << 5,
+ IWCH_MEM_ACCESS_LOCAL =
+ (IWCH_MEM_ACCESS_LOCAL_READ | IWCH_MEM_ACCESS_LOCAL_WRITE),
+ IWCH_MEM_ACCESS_REMOTE =
+ (IWCH_MEM_ACCESS_REMOTE_WRITE | IWCH_MEM_ACCESS_REMOTE_READ)
+ /* cannot go beyond 1 << 31 */
+} __attribute__ ((packed));
+
+static inline u32 iwch_convert_access(int acc)
+{
+ return (acc & IB_ACCESS_REMOTE_WRITE ? IWCH_MEM_ACCESS_REMOTE_WRITE : 0)
+ | (acc & IB_ACCESS_REMOTE_READ ? IWCH_MEM_ACCESS_REMOTE_READ : 0) |
+ (acc & IB_ACCESS_LOCAL_WRITE ? IWCH_MEM_ACCESS_LOCAL_WRITE : 0) |
+ (acc & IB_ACCESS_MW_BIND ? IWCH_MEM_ACCESS_BINDING : 0) |
+ IWCH_MEM_ACCESS_LOCAL_READ;
+}
+
+enum iwch_mmid_state {
+ IWCH_STAG_STATE_VALID,
+ IWCH_STAG_STATE_INVALID
+};
+
+enum iwch_qp_query_flags {
+ IWCH_QP_QUERY_CONTEXT_NONE = 0x0, /* No ctx; Only attrs */
+ IWCH_QP_QUERY_CONTEXT_GET = 0x1, /* Get ctx + attrs */
+ IWCH_QP_QUERY_CONTEXT_SUSPEND = 0x2, /* Not Supported */
+
+ /*
+ * Quiesce QP context; Consumer
+ * will NOT replay outstanding WR
+ */
+ IWCH_QP_QUERY_CONTEXT_QUIESCE = 0x4,
+ IWCH_QP_QUERY_CONTEXT_REMOVE = 0x8,
+ IWCH_QP_QUERY_TEST_USERWRITE = 0x32 /* Test special */
+};
+
+int iwch_post_send(struct ib_qp *ibqp, struct ib_send_wr *wr,
+ struct ib_send_wr **bad_wr);
+int iwch_post_receive(struct ib_qp *ibqp, struct ib_recv_wr *wr,
+ struct ib_recv_wr **bad_wr);
+int iwch_bind_mw(struct ib_qp *qp,
+ struct ib_mw *mw,
+ struct ib_mw_bind *mw_bind);
+int iwch_poll_cq(struct ib_cq *ibcq, int num_entries, struct ib_wc *wc);
+int iwch_post_terminate(struct iwch_qp *qhp, struct respQ_msg_t *rsp_msg);
+int iwch_register_device(struct iwch_dev *dev);
+void iwch_unregister_device(struct iwch_dev *dev);
+int iwch_quiesce_qps(struct iwch_cq *chp);
+int iwch_resume_qps(struct iwch_cq *chp);
+void stop_read_rep_timer(struct iwch_qp *qhp);
+int iwch_register_mem(struct iwch_dev *rhp, struct iwch_pd *php,
+ struct iwch_mr *mhp,
+ int shift,
+ __be64 *page_list);
+int iwch_reregister_mem(struct iwch_dev *rhp, struct iwch_pd *php,
+ struct iwch_mr *mhp,
+ int shift,
+ __be64 *page_list,
+ int npages);
+int build_phys_page_list(struct ib_phys_buf *buffer_list,
+ int num_phys_buf,
+ u64 *iova_start,
+ u64 *total_size,
+ int *npages,
+ int *shift,
+ __be64 **page_list);
+
+
+#define IWCH_NODE_DESC "cxgb3 Chelsio Communications"
+
+#endif
diff --git a/drivers/infiniband/hw/cxgb3/iwch_user.h b/drivers/infiniband/hw/cxgb3/iwch_user.h
new file mode 100644
index 0000000..4e4b9c9
--- /dev/null
+++ b/drivers/infiniband/hw/cxgb3/iwch_user.h
@@ -0,0 +1,68 @@
+/*
+ * Copyright (c) 2006 Chelsio, Inc. All rights reserved.
+ * Copyright (c) 2006 Open Grid Computing, Inc. All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses. You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ * - Redistributions of source code must retain the above
+ * copyright notice, this list of conditions and the following
+ * disclaimer.
+ *
+ * - Redistributions in binary form must reproduce the above
+ * copyright notice, this list of conditions and the following
+ * disclaimer in the documentation and/or other materials
+ * provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+#ifndef __IWCH_USER_H__
+#define __IWCH_USER_H__
+
+#define IWCH_UVERBS_ABI_VERSION 1
+
+/*
+ * Make sure that all structs defined in this file remain laid out so
+ * that they pack the same way on 32-bit and 64-bit architectures (to
+ * avoid incompatibility between 32-bit userspace and 64-bit kernels).
+ * In particular do not use pointer types -- pass pointers in __u64
+ * instead.
+ */
+
+struct iwch_create_cq_resp {
+ __u64 physaddr;
+ __u32 cqid;
+ __u32 size_log2;
+};
+
+struct iwch_create_qp_resp {
+ __u64 physaddr;
+ __u64 doorbell;
+ __u32 qpid;
+ __u32 size_log2;
+ __u32 sq_size_log2;
+ __u32 rq_size_log2;
+};
+
+struct iwch_reg_user_mr_resp {
+ __u32 pbl_addr;
+};
+
+struct iwch_req_notify_cq {
+ __u32 rptr;
+};
+#endif

2006-12-02 22:51:15

by Steve Wise

[permalink] [raw]
Subject: [PATCH v2 07/13] Async Event Handler


Code to handle async events coming from the T3 RDMA Core.

Signed-off-by: Steve Wise <[email protected]>
---

drivers/infiniband/hw/cxgb3/iwch_ev.c | 228 +++++++++++++++++++++++++++++++++
1 files changed, 228 insertions(+), 0 deletions(-)

diff --git a/drivers/infiniband/hw/cxgb3/iwch_ev.c b/drivers/infiniband/hw/cxgb3/iwch_ev.c
new file mode 100644
index 0000000..bf767b2
--- /dev/null
+++ b/drivers/infiniband/hw/cxgb3/iwch_ev.c
@@ -0,0 +1,228 @@
+/*
+ * Copyright (c) 2006 Chelsio, Inc. All rights reserved.
+ * Copyright (c) 2006 Open Grid Computing, Inc. All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses. You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ * - Redistributions of source code must retain the above
+ * copyright notice, this list of conditions and the following
+ * disclaimer.
+ *
+ * - Redistributions in binary form must reproduce the above
+ * copyright notice, this list of conditions and the following
+ * disclaimer in the documentation and/or other materials
+ * provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+#include <linux/slab.h>
+#include <linux/mman.h>
+#include <net/sock.h>
+#include "iwch_provider.h"
+#include "iwch.h"
+#include "iwch_cm.h"
+#include "cxio_hal.h"
+#include "cxio_wr.h"
+
+static void post_qp_event(struct iwch_dev *rnicp, struct iwch_cq *chp,
+ struct respQ_msg_t *rsp_msg,
+ enum ib_event_type ib_event,
+ int send_term)
+{
+ struct ib_event event;
+ struct iwch_qp_attributes attrs;
+ struct iwch_qp *qhp;
+
+ printk(KERN_ERR "%s - AE qpid 0x%x opcode %d status 0x%x "
+ "type %d wrid.hi 0x%x wrid.lo 0x%x \n", __FUNCTION__,
+ CQE_QPID(rsp_msg->cqe), CQE_OPCODE(rsp_msg->cqe),
+ CQE_STATUS(rsp_msg->cqe), CQE_TYPE(rsp_msg->cqe),
+ CQE_WRID_HI(rsp_msg->cqe), CQE_WRID_LOW(rsp_msg->cqe));
+
+ spin_lock(&rnicp->lock);
+ qhp = get_qhp(rnicp, CQE_QPID(rsp_msg->cqe));
+
+ if (!qhp) {
+ printk(KERN_ERR "%s unaffiliated error 0x%x qpid 0x%x\n",
+ __FUNCTION__, CQE_STATUS(rsp_msg->cqe),
+ CQE_QPID(rsp_msg->cqe));
+ spin_unlock(&rnicp->lock);
+ return;
+ }
+
+ if ((qhp->attr.state == IWCH_QP_STATE_ERROR) ||
+ (qhp->attr.state == IWCH_QP_STATE_TERMINATE)) {
+ PDBG("%s AE received after RTS - "
+ "qp state %d qpid 0x%x status 0x%x\n", __FUNCTION__,
+ qhp->attr.state, qhp->wq.qpid, CQE_STATUS(rsp_msg->cqe));
+ spin_unlock(&rnicp->lock);
+ return;
+ }
+
+ atomic_inc(&qhp->refcnt);
+ spin_unlock(&rnicp->lock);
+
+ event.event = ib_event;
+ event.device = chp->ibcq.device;
+ if (ib_event == IB_EVENT_CQ_ERR)
+ event.element.cq = &chp->ibcq;
+ else
+ event.element.qp = &qhp->ibqp;
+
+ if (qhp->ibqp.event_handler)
+ (*qhp->ibqp.event_handler)(&event, qhp->ibqp.qp_context);
+
+ attrs.next_state = IWCH_QP_STATE_TERMINATE;
+ if (send_term && (qhp->attr.state == IWCH_QP_STATE_RTS) &&
+ !iwch_modify_qp(qhp->rhp, qhp, IWCH_QP_ATTR_NEXT_STATE, &attrs, 1))
+ iwch_post_terminate(qhp, rsp_msg);
+
+ if (atomic_dec_and_test(&qhp->refcnt))
+ wake_up(&qhp->wait);
+}
+
+void iwch_ev_dispatch(struct cxio_rdev *rdev_p, struct sk_buff *skb)
+{
+ struct iwch_dev *rnicp;
+ struct respQ_msg_t *rsp_msg = (struct respQ_msg_t *) skb->data;
+ struct iwch_cq *chp;
+ struct iwch_qp *qhp;
+ u32 cqid = RSPQ_CQID(rsp_msg);
+
+ rnicp = (struct iwch_dev *) rdev_p->ulp;
+ spin_lock(&rnicp->lock);
+ chp = get_chp(rnicp, cqid);
+ qhp = get_qhp(rnicp, CQE_QPID(rsp_msg->cqe));
+ if (!chp || !qhp) {
+ printk(KERN_ERR MOD "BAD AE cqid 0x%x qpid 0x%x opcode %d "
+ "status 0x%x type %d wrid.hi 0x%x wrid.lo 0x%x \n",
+ cqid, CQE_QPID(rsp_msg->cqe),
+ CQE_OPCODE(rsp_msg->cqe), CQE_STATUS(rsp_msg->cqe),
+ CQE_TYPE(rsp_msg->cqe), CQE_WRID_HI(rsp_msg->cqe),
+ CQE_WRID_LOW(rsp_msg->cqe));
+ spin_unlock(&rnicp->lock);
+ goto out;
+ }
+ iwch_qp_add_ref(&qhp->ibqp);
+ atomic_inc(&chp->refcnt);
+ spin_unlock(&rnicp->lock);
+
+ /*
+ * 1) completion of our sending a TERMINATE.
+ * 2) incoming TERMINATE message.
+ */
+ if ((CQE_OPCODE(rsp_msg->cqe) == T3_TERMINATE) &&
+ (CQE_STATUS(rsp_msg->cqe) == 0)) {
+ if (SQ_TYPE(rsp_msg->cqe)) {
+ PDBG("%s QPID 0x%x ep %p disconnecting\n",
+ __FUNCTION__, qhp->wq.qpid, qhp->ep);
+ iwch_ep_disconnect(qhp->ep, 0, GFP_ATOMIC);
+ } else {
+ PDBG("%s post REQ_ERR AE QPID 0x%x\n", __FUNCTION__,
+ qhp->wq.qpid);
+ post_qp_event(rnicp, chp, rsp_msg,
+ IB_EVENT_QP_REQ_ERR, 0);
+ iwch_ep_disconnect(qhp->ep, 0, GFP_ATOMIC);
+ }
+ goto done;
+ }
+
+ /* Bad incoming Read request */
+ if (SQ_TYPE(rsp_msg->cqe) &&
+ (CQE_OPCODE(rsp_msg->cqe) == T3_READ_RESP)) {
+ post_qp_event(rnicp, chp, rsp_msg, IB_EVENT_QP_REQ_ERR, 1);
+ goto done;
+ }
+
+ /* Bad incoming write */
+ if (RQ_TYPE(rsp_msg->cqe) &&
+ (CQE_OPCODE(rsp_msg->cqe) == T3_RDMA_WRITE)) {
+ post_qp_event(rnicp, chp, rsp_msg, IB_EVENT_QP_REQ_ERR, 1);
+ goto done;
+ }
+
+ switch (CQE_STATUS(rsp_msg->cqe)) {
+
+ /* Completion Events */
+ case TPT_ERR_SUCCESS:
+
+ /*
+ * Confirm the destination entry if this is a RECV completion.
+ */
+ if (qhp->ep && SQ_TYPE(rsp_msg->cqe))
+ dst_confirm(qhp->ep->dst);
+ (*chp->ibcq.comp_handler)(&chp->ibcq, chp->ibcq.cq_context);
+ break;
+
+ case TPT_ERR_STAG:
+ case TPT_ERR_PDID:
+ case TPT_ERR_QPID:
+ case TPT_ERR_ACCESS:
+ case TPT_ERR_WRAP:
+ case TPT_ERR_BOUND:
+ case TPT_ERR_INVALIDATE_SHARED_MR:
+ case TPT_ERR_INVALIDATE_MR_WITH_MW_BOUND:
+ printk(KERN_ERR "%s - CQE Err qpid 0x%x opcode %d status 0x%x "
+ "type %d wrid.hi 0x%x wrid.lo 0x%x \n", __FUNCTION__,
+ CQE_QPID(rsp_msg->cqe), CQE_OPCODE(rsp_msg->cqe),
+ CQE_STATUS(rsp_msg->cqe), CQE_TYPE(rsp_msg->cqe),
+ CQE_WRID_HI(rsp_msg->cqe), CQE_WRID_LOW(rsp_msg->cqe));
+ (*chp->ibcq.comp_handler)(&chp->ibcq, chp->ibcq.cq_context);
+ post_qp_event(rnicp, chp, rsp_msg, IB_EVENT_QP_ACCESS_ERR, 1);
+ break;
+
+ /* Device Fatal Errors */
+ case TPT_ERR_ECC:
+ case TPT_ERR_ECC_PSTAG:
+ case TPT_ERR_INTERNAL_ERR:
+ post_qp_event(rnicp, chp, rsp_msg, IB_EVENT_DEVICE_FATAL, 1);
+ break;
+
+ /* QP Fatal Errors */
+ case TPT_ERR_OUT_OF_RQE:
+ case TPT_ERR_PBL_ADDR_BOUND:
+ case TPT_ERR_CRC:
+ case TPT_ERR_MARKER:
+ case TPT_ERR_PDU_LEN_ERR:
+ case TPT_ERR_DDP_VERSION:
+ case TPT_ERR_RDMA_VERSION:
+ case TPT_ERR_OPCODE:
+ case TPT_ERR_DDP_QUEUE_NUM:
+ case TPT_ERR_MSN:
+ case TPT_ERR_TBIT:
+ case TPT_ERR_MO:
+ case TPT_ERR_MSN_GAP:
+ case TPT_ERR_MSN_RANGE:
+ case TPT_ERR_RQE_ADDR_BOUND:
+ case TPT_ERR_IRD_OVERFLOW:
+ post_qp_event(rnicp, chp, rsp_msg, IB_EVENT_QP_FATAL, 1);
+ break;
+
+ default:
+ printk(KERN_ERR MOD "Unknown T3 status 0x%x QPID 0x%x\n",
+ CQE_STATUS(rsp_msg->cqe), qhp->wq.qpid);
+ post_qp_event(rnicp, chp, rsp_msg, IB_EVENT_QP_FATAL, 1);
+ break;
+ }
+done:
+ if (atomic_dec_and_test(&chp->refcnt))
+ wake_up(&chp->wait);
+ iwch_qp_rem_ref(&qhp->ibqp);
+out:
+ dev_kfree_skb_irq(skb);
+}

2006-12-02 22:51:48

by Steve Wise

[permalink] [raw]
Subject: [PATCH v2 13/13] Kconfig/Makefile


Signed-off-by: Steve Wise <[email protected]>
---

drivers/infiniband/Kconfig | 1 +
drivers/infiniband/Makefile | 1 +
drivers/infiniband/hw/cxgb3/Kconfig | 27 +++++++++++++++++++++++++++
drivers/infiniband/hw/cxgb3/Makefile | 12 ++++++++++++
drivers/infiniband/hw/cxgb3/locking.txt | 25 +++++++++++++++++++++++++
5 files changed, 66 insertions(+), 0 deletions(-)

diff --git a/drivers/infiniband/Kconfig b/drivers/infiniband/Kconfig
index 59b3932..06453ab 100644
--- a/drivers/infiniband/Kconfig
+++ b/drivers/infiniband/Kconfig
@@ -38,6 +38,7 @@ source "drivers/infiniband/hw/mthca/Kcon
source "drivers/infiniband/hw/ipath/Kconfig"
source "drivers/infiniband/hw/ehca/Kconfig"
source "drivers/infiniband/hw/amso1100/Kconfig"
+source "drivers/infiniband/hw/cxgb3/Kconfig"

source "drivers/infiniband/ulp/ipoib/Kconfig"

diff --git a/drivers/infiniband/Makefile b/drivers/infiniband/Makefile
index 570b30a..69bdd55 100644
--- a/drivers/infiniband/Makefile
+++ b/drivers/infiniband/Makefile
@@ -3,6 +3,7 @@ obj-$(CONFIG_INFINIBAND_MTHCA) += hw/mt
obj-$(CONFIG_INFINIBAND_IPATH) += hw/ipath/
obj-$(CONFIG_INFINIBAND_EHCA) += hw/ehca/
obj-$(CONFIG_INFINIBAND_AMSO1100) += hw/amso1100/
+obj-$(CONFIG_INFINIBAND_CXGB3) += hw/cxgb3/
obj-$(CONFIG_INFINIBAND_IPOIB) += ulp/ipoib/
obj-$(CONFIG_INFINIBAND_SRP) += ulp/srp/
obj-$(CONFIG_INFINIBAND_ISER) += ulp/iser/
diff --git a/drivers/infiniband/hw/cxgb3/Kconfig b/drivers/infiniband/hw/cxgb3/Kconfig
new file mode 100644
index 0000000..84f0f6e
--- /dev/null
+++ b/drivers/infiniband/hw/cxgb3/Kconfig
@@ -0,0 +1,27 @@
+config INFINIBAND_CXGB3
+ tristate "Chelsio RDMA Driver"
+ depends on CHELSIO_T3 && INFINIBAND
+ select GENERIC_ALLOCATOR
+ ---help---
+ This is an iWARP/RDMA driver for the Chelsio T3 1GbE and
+ 10GbE adapters.
+
+ For general information about Chelsio and our products, visit
+ our website at <http://www.chelsio.com>.
+
+ For customer support, please visit our customer support page at
+ <http://www.chelsio.com/support.htm>.
+
+ Please send feedback to <[email protected]>.
+
+ To compile this driver as a module, choose M here: the module
+ will be called iw_cxgb3.
+
+config INFINIBAND_CXGB3_DEBUG
+ bool "Verbose debugging output"
+ depends on INFINIBAND_CXGB3
+ default n
+ ---help---
+ This option causes the Chelsio RDMA driver to produce copious
+ amounts of debug messages. Select this if you are developing
+ the driver or trying to diagnose a problem.
diff --git a/drivers/infiniband/hw/cxgb3/Makefile b/drivers/infiniband/hw/cxgb3/Makefile
new file mode 100644
index 0000000..0df2b3d
--- /dev/null
+++ b/drivers/infiniband/hw/cxgb3/Makefile
@@ -0,0 +1,12 @@
+EXTRA_CFLAGS += -I$(TOPDIR)/drivers/net/cxgb3 \
+ -I$(TOPDIR)/drivers/infiniband/hw/cxgb3/core
+
+obj-$(CONFIG_INFINIBAND_CXGB3) += iw_cxgb3.o
+
+iw_cxgb3-y := iwch_cm.o iwch_ev.o iwch_cq.o iwch_qp.o iwch_mem.o \
+ iwch_provider.o iwch.o core/cxio_hal.o core/cxio_resource.o
+
+ifdef CONFIG_INFINIBAND_CXGB3_DEBUG
+EXTRA_CFLAGS += -DDEBUG -O1 -g
+iw_cxgb3-y += core/cxio_dbg.o
+endif
diff --git a/drivers/infiniband/hw/cxgb3/locking.txt b/drivers/infiniband/hw/cxgb3/locking.txt
new file mode 100644
index 0000000..e5e9991
--- /dev/null
+++ b/drivers/infiniband/hw/cxgb3/locking.txt
@@ -0,0 +1,25 @@
+cq lock:
+ - spin lock
+ - used to synchronize the t3_cq
+
+qp lock:
+ - spin lock
+ - used to synchronize updates to the qp state, attrs, and the t3_wq.
+ - touched on interrupt and process context
+
+rnicp lock:
+ - spin lock
+ - touched on interrupt and process context
+ - used around lookup tables mapping CQID and QPID to a structure.
+ - used also to bump the refcnt atomically with the lookup.
+
+poll:
+ lock+disable on cq lock
+ lock qp lock for each cqe that is polled around the call
+ to cxio_poll_cq().
+
+post:
+ lock+disable qp lock
+
+global mutex iwch_mutex:
+ used to maintain global device list.

2006-12-02 22:50:31

by Steve Wise

[permalink] [raw]
Subject: [PATCH v2 04/13] Connection Manager


This code implements the iWARP CM provider methods for the Chelsio driver.
The Chelsio ULLD is used to setup and teardown TCP connections, and the
T3 RDMA Core is used to move the connections in and out of RDMA mode.

Signed-off-by: Steve Wise <[email protected]>
---

drivers/infiniband/hw/cxgb3/iwch_cm.c | 2059 +++++++++++++++++++++++++++++++++
drivers/infiniband/hw/cxgb3/iwch_cm.h | 223 ++++
2 files changed, 2282 insertions(+), 0 deletions(-)

diff --git a/drivers/infiniband/hw/cxgb3/iwch_cm.c b/drivers/infiniband/hw/cxgb3/iwch_cm.c
new file mode 100644
index 0000000..5c59396
--- /dev/null
+++ b/drivers/infiniband/hw/cxgb3/iwch_cm.c
@@ -0,0 +1,2059 @@
+/*
+ * Copyright (c) 2006 Chelsio, Inc. All rights reserved.
+ * Copyright (c) 2006 Open Grid Computing, Inc. All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses. You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ * - Redistributions of source code must retain the above
+ * copyright notice, this list of conditions and the following
+ * disclaimer.
+ *
+ * - Redistributions in binary form must reproduce the above
+ * copyright notice, this list of conditions and the following
+ * disclaimer in the documentation and/or other materials
+ * provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+#include <linux/module.h>
+#include <linux/list.h>
+#include <linux/workqueue.h>
+#include <linux/skbuff.h>
+#include <linux/timer.h>
+#include <linux/notifier.h>
+
+#include <net/neighbour.h>
+#include <net/netevent.h>
+#include <net/route.h>
+
+#include "tcb.h"
+#include "cxgb3_offload.h"
+#include "iwch.h"
+#include "iwch_provider.h"
+#include "iwch_cm.h"
+
+char *states[] = {
+ "idle",
+ "listen",
+ "connecting",
+ "mpa_wait_req",
+ "mpa_req_sent",
+ "mpa_req_rcvd",
+ "mpa_rep_sent",
+ "fpdu_mode",
+ "aborting",
+ "closing",
+ "moribund",
+ "dead",
+ NULL,
+};
+
+static int ep_timeout_secs = 10;
+module_param(ep_timeout_secs, int, 0444);
+MODULE_PARM_DESC(ep_timeout_secs, "CM Endpoint operation timeout "
+ "in seconds (default=10)");
+
+static int mpa_rev = 1;
+module_param(mpa_rev, int, 0444);
+MODULE_PARM_DESC(mpa_rev, "MPA Revision, 0 supports amso1100, "
+ "1 is spec compliant. (default=1)");
+
+static int markers_enabled = 0;
+module_param(markers_enabled, int, 0444);
+MODULE_PARM_DESC(markers_enabled, "Enable MPA MARKERS (default(0)=disabled)");
+
+static int crc_enabled = 1;
+module_param(crc_enabled, int, 0444);
+MODULE_PARM_DESC(crc_enabled, "Enable MPA CRC (default(1)=enabled)");
+
+static int rcv_win = 512 * 1024;
+module_param(rcv_win, int, 0444);
+MODULE_PARM_DESC(rcv_win, "TCP receive window in bytes (default=512KB)");
+
+static int snd_win = 512 * 1024;
+module_param(snd_win, int, 0444);
+MODULE_PARM_DESC(snd_win, "TCP send window in bytes (default=512KB)");
+
+static unsigned int nocong = 1;
+module_param(nocong, uint, 0444);
+MODULE_PARM_DESC(nocong, "Turn off congestion control (default=1)");
+
+static void process_work(void *ctx);
+static struct workqueue_struct *workq;
+DECLARE_WORK(skb_work, process_work, NULL);
+
+static struct sk_buff_head rxq;
+static cxgb3_cpl_handler_func work_handlers[NUM_CPL_CMDS];
+
+static struct sk_buff *get_skb(struct sk_buff *skb, int len, gfp_t gfp);
+static void ep_timeout(unsigned long arg);
+static void connect_reply_upcall(struct iwch_ep *ep, int status);
+
+static void start_ep_timer(struct iwch_ep *ep)
+{
+ PDBG("%s ep %p\n", __FUNCTION__, ep);
+ if (timer_pending(&ep->timer)) {
+ PDBG("%s stopped / restarted timer ep %p\n", __FUNCTION__, ep);
+ del_timer_sync(&ep->timer);
+ } else
+ get_ep(&ep->com);
+ ep->timer.expires = jiffies + ep_timeout_secs * HZ;
+ ep->timer.data = (unsigned long)ep;
+ ep->timer.function = ep_timeout;
+ add_timer(&ep->timer);
+}
+
+static void stop_ep_timer(struct iwch_ep *ep)
+{
+ PDBG("%s ep %p\n", __FUNCTION__, ep);
+ del_timer_sync(&ep->timer);
+ put_ep(&ep->com);
+}
+
+static void release_tid(struct t3cdev *tdev, u32 hwtid, struct sk_buff *skb)
+{
+ struct cpl_tid_release *req;
+
+ skb = get_skb(skb, sizeof *req, GFP_KERNEL);
+ if (!skb)
+ return;
+ req = (struct cpl_tid_release *) skb_put(skb, sizeof(*req));
+ req->wr.wr_hi = htonl(V_WR_OP(FW_WROPCODE_FORWARD));
+ OPCODE_TID(req) = htonl(MK_OPCODE_TID(CPL_TID_RELEASE, hwtid));
+ skb->priority = CPL_PRIORITY_SETUP;
+ tdev->send(tdev, skb);
+ return;
+}
+
+int iwch_quiesce_tid(struct iwch_ep *ep)
+{
+ struct cpl_set_tcb_field *req;
+ struct sk_buff *skb = get_skb(NULL, sizeof(*req), GFP_KERNEL);
+
+ if (!skb)
+ return -ENOMEM;
+ req = (struct cpl_set_tcb_field *) skb_put(skb, sizeof(*req));
+ req->wr.wr_hi = htonl(V_WR_OP(FW_WROPCODE_FORWARD));
+ req->wr.wr_lo = htonl(V_WR_TID(ep->hwtid));
+ OPCODE_TID(req) = htonl(MK_OPCODE_TID(CPL_SET_TCB_FIELD, ep->hwtid));
+ req->reply = 0;
+ req->cpu_idx = 0;
+ req->word = htons(W_TCB_RX_QUIESCE);
+ req->mask = cpu_to_be64(1ULL << S_TCB_RX_QUIESCE);
+ req->val = cpu_to_be64(1 << S_TCB_RX_QUIESCE);
+
+ skb->priority = CPL_PRIORITY_DATA;
+ ep->com.tdev->send(ep->com.tdev, skb);
+ return 0;
+}
+
+int iwch_resume_tid(struct iwch_ep *ep)
+{
+ struct cpl_set_tcb_field *req;
+ struct sk_buff *skb = get_skb(NULL, sizeof(*req), GFP_KERNEL);
+
+ if (!skb)
+ return -ENOMEM;
+ req = (struct cpl_set_tcb_field *) skb_put(skb, sizeof(*req));
+ req->wr.wr_hi = htonl(V_WR_OP(FW_WROPCODE_FORWARD));
+ req->wr.wr_lo = htonl(V_WR_TID(ep->hwtid));
+ OPCODE_TID(req) = htonl(MK_OPCODE_TID(CPL_SET_TCB_FIELD, ep->hwtid));
+ req->reply = 0;
+ req->cpu_idx = 0;
+ req->word = htons(W_TCB_RX_QUIESCE);
+ req->mask = cpu_to_be64(1ULL << S_TCB_RX_QUIESCE);
+ req->val = 0;
+
+ skb->priority = CPL_PRIORITY_DATA;
+ ep->com.tdev->send(ep->com.tdev, skb);
+ return 0;
+}
+
+static void set_emss(struct iwch_ep *ep, u16 opt)
+{
+ PDBG("%s ep %p opt %u\n", __FUNCTION__, ep, opt);
+ ep->emss = T3C_DATA(ep->com.tdev)->mtus[G_TCPOPT_MSS(opt)] - 40;
+ if (G_TCPOPT_TSTAMP(opt))
+ ep->emss -= 12;
+ if (ep->emss < 128)
+ ep->emss = 128;
+ PDBG("emss=%d\n", ep->emss);
+}
+
+static int state_comp_exch(struct iwch_ep_common *epc,
+ enum iwch_ep_state comp,
+ enum iwch_ep_state exch)
+{
+ unsigned long flags;
+ int ret;
+
+ spin_lock_irqsave(&epc->lock, flags);
+ ret = (epc->state == comp);
+ if (ret)
+ epc->state = exch;
+ spin_unlock_irqrestore(&epc->lock, flags);
+ return ret;
+}
+
+static enum iwch_ep_state state_read(struct iwch_ep_common *epc)
+{
+ unsigned long flags;
+ enum iwch_ep_state state;
+
+ spin_lock_irqsave(&epc->lock, flags);
+ state = epc->state;
+ spin_unlock_irqrestore(&epc->lock, flags);
+ return state;
+}
+
+static void state_set(struct iwch_ep_common *epc, enum iwch_ep_state new)
+{
+ unsigned long flags;
+
+ spin_lock_irqsave(&epc->lock, flags);
+ PDBG("%s - %s -> %s\n", __FUNCTION__, states[epc->state],
+ states[new]);
+ epc->state = new;
+ spin_unlock_irqrestore(&epc->lock, flags);
+ return;
+}
+
+static void *alloc_ep(int size, gfp_t gfp)
+{
+ struct iwch_ep_common *epc;
+
+ epc = kmalloc(size, gfp);
+ if (epc) {
+ memset(epc, 0, size);
+ kref_init(&epc->kref);
+ spin_lock_init(&epc->lock);
+ init_waitqueue_head(&epc->waitq);
+ }
+ PDBG("%s alloc ep %p\n", __FUNCTION__, epc);
+ return (void *) epc;
+}
+
+void __free_ep(struct kref *kref)
+{
+ struct iwch_ep_common *epc;
+ epc = container_of(kref, struct iwch_ep_common, kref);
+ PDBG("%s ep %p state %s\n", __FUNCTION__, epc, states[state_read(epc)]);
+ kfree(epc);
+}
+
+static void release_ep_resources(struct iwch_ep *ep)
+{
+ PDBG("%s ep %p tid %d\n", __FUNCTION__, ep, ep->hwtid);
+ state_set(&ep->com, DEAD);
+ cxgb3_remove_tid(ep->com.tdev, (void *)ep, ep->hwtid);
+ dst_release(ep->dst);
+ l2t_release(L2DATA(ep->com.tdev), ep->l2t);
+ if (ep->com.tdev->type == T3B)
+ release_tid(ep->com.tdev, ep->hwtid, NULL);
+ put_ep(&ep->com);
+}
+
+static void process_work(void *ctx)
+{
+ struct sk_buff *skb = NULL;
+ void *ep;
+ struct t3cdev *tdev;
+ int ret;
+
+ while ((skb = skb_dequeue(&rxq))) {
+ ep = *((void **) (skb->cb));
+ tdev = *((struct t3cdev **) (skb->cb + sizeof(void *)));
+ ret = work_handlers[G_OPCODE(ntohl((__force __be32)skb->csum))](tdev, skb, ep);
+ if (ret & CPL_RET_BUF_DONE)
+ kfree_skb(skb);
+
+ /*
+ * ep was referenced in sched(), and is freed here.
+ */
+ put_ep((struct iwch_ep_common *)ep);
+ }
+}
+
+static int status2errno(int status)
+{
+ switch (status) {
+ case CPL_ERR_NONE:
+ return 0;
+ case CPL_ERR_CONN_RESET:
+ return -ECONNRESET;
+ case CPL_ERR_ARP_MISS:
+ return -EHOSTUNREACH;
+ case CPL_ERR_CONN_TIMEDOUT:
+ return -ETIMEDOUT;
+ case CPL_ERR_TCAM_FULL:
+ return -ENOMEM;
+ case CPL_ERR_CONN_EXIST:
+ return -EADDRINUSE;
+ default:
+ return -EIO;
+ }
+}
+
+/*
+ * Try and reuse skbs already allocated...
+ */
+static struct sk_buff *get_skb(struct sk_buff *skb, int len, gfp_t gfp)
+{
+ if (skb) {
+ BUG_ON(skb_cloned(skb));
+ skb_trim(skb, 0);
+ skb_get(skb);
+ } else {
+ skb = alloc_skb(len, gfp);
+ }
+ return skb;
+}
+
+static struct rtable *find_route(struct t3cdev *dev, __be32 local_ip,
+ __be32 peer_ip, __be16 local_port,
+ __be16 peer_port, u8 tos)
+{
+ struct rtable *rt;
+ struct flowi fl = {
+ .oif = 0,
+ .nl_u = {
+ .ip4_u = {
+ .daddr = peer_ip,
+ .saddr = local_ip,
+ .tos = tos}
+ },
+ .proto = IPPROTO_TCP,
+ .uli_u = {
+ .ports = {
+ .sport = local_port,
+ .dport = peer_port}
+ }
+ };
+
+ if (ip_route_output_flow(&rt, &fl, NULL, 0))
+ return NULL;
+ return rt;
+}
+
+static unsigned int find_best_mtu(const struct t3c_data *d, unsigned short mtu)
+{
+ int i = 0;
+
+ while (i < d->nmtus - 1 && d->mtus[i + 1] <= mtu)
+ ++i;
+ return i;
+}
+
+static void arp_failure_discard(struct t3cdev *dev, struct sk_buff *skb)
+{
+ PDBG("%s t3cdev %p\n", __FUNCTION__, dev);
+ kfree_skb(skb);
+}
+
+/*
+ * Handle an ARP failure for an active open.
+ */
+static void act_open_req_arp_failure(struct t3cdev *dev, struct sk_buff *skb)
+{
+ printk(KERN_ERR MOD "ARP failure duing connect\n");
+ kfree_skb(skb);
+}
+
+/*
+ * Handle an ARP failure for a CPL_ABORT_REQ. Change it into a no RST variant
+ * and send it along.
+ */
+static void abort_arp_failure(struct t3cdev *dev, struct sk_buff *skb)
+{
+ struct cpl_abort_req *req = cplhdr(skb);
+
+ PDBG("%s t3cdev %p\n", __FUNCTION__, dev);
+ req->cmd = CPL_ABORT_NO_RST;
+ cxgb3_ofld_send(dev, skb);
+}
+
+static int send_halfclose(struct iwch_ep *ep, gfp_t gfp)
+{
+ struct cpl_close_con_req *req;
+ struct sk_buff *skb;
+
+ PDBG("%s ep %p\n", __FUNCTION__, ep);
+ skb = get_skb(NULL, sizeof(*req), gfp);
+ if (!skb) {
+ printk(KERN_ERR MOD "%s - failed to alloc skb\n", __FUNCTION__);
+ return -ENOMEM;
+ }
+ skb->priority = CPL_PRIORITY_DATA;
+ set_arp_failure_handler(skb, arp_failure_discard);
+ req = (struct cpl_close_con_req *) skb_put(skb, sizeof(*req));
+ req->wr.wr_hi = htonl(V_WR_OP(FW_WROPCODE_OFLD_CLOSE_CON));
+ req->wr.wr_lo = htonl(V_WR_TID(ep->hwtid));
+ OPCODE_TID(req) = htonl(MK_OPCODE_TID(CPL_CLOSE_CON_REQ, ep->hwtid));
+ l2t_send(ep->com.tdev, skb, ep->l2t);
+ return 0;
+}
+
+static int send_abort(struct iwch_ep *ep, struct sk_buff *skb, gfp_t gfp)
+{
+ struct cpl_abort_req *req;
+
+ PDBG("%s ep %p\n", __FUNCTION__, ep);
+ skb = get_skb(skb, sizeof(*req), gfp);
+ if (!skb) {
+ printk(KERN_ERR MOD "%s - failed to alloc skb.\n",
+ __FUNCTION__);
+ return -ENOMEM;
+ }
+ skb->priority = CPL_PRIORITY_DATA;
+ set_arp_failure_handler(skb, abort_arp_failure);
+ req = (struct cpl_abort_req *) skb_put(skb, sizeof(*req));
+ req->wr.wr_hi = htonl(V_WR_OP(FW_WROPCODE_OFLD_HOST_ABORT_CON_REQ));
+ req->wr.wr_lo = htonl(V_WR_TID(ep->hwtid));
+ OPCODE_TID(req) = htonl(MK_OPCODE_TID(CPL_ABORT_REQ, ep->hwtid));
+ req->cmd = CPL_ABORT_SEND_RST;
+ l2t_send(ep->com.tdev, skb, ep->l2t);
+ return 0;
+}
+
+static int send_connect(struct iwch_ep *ep)
+{
+ struct cpl_act_open_req *req;
+ struct sk_buff *skb;
+ u32 opt0h, opt0l, opt2;
+ unsigned int mtu_idx;
+ int wscale;
+
+ PDBG("%s ep %p\n", __FUNCTION__, ep);
+
+ skb = get_skb(NULL, sizeof(*req), GFP_KERNEL);
+ if (!skb) {
+ printk(KERN_ERR MOD "%s - failed to alloc skb.\n",
+ __FUNCTION__);
+ return -ENOMEM;
+ }
+ mtu_idx = find_best_mtu(T3C_DATA(ep->com.tdev), dst_mtu(ep->dst));
+ wscale = compute_wscale(rcv_win);
+ opt0h = V_NAGLE(0) |
+ V_NO_CONG(nocong) |
+ V_KEEP_ALIVE(1) |
+ F_TCAM_BYPASS |
+ V_WND_SCALE(wscale) |
+ V_MSS_IDX(mtu_idx) |
+ V_L2T_IDX(ep->l2t->idx) | V_TX_CHANNEL(ep->l2t->smt_idx);
+ opt0l = V_TOS((ep->tos >> 2) & M_TOS) | V_RCV_BUFSIZ(rcv_win>>10);
+ opt2 = V_FLAVORS_VALID(0) | V_CONG_CONTROL_FLAVOR(0);
+ skb->priority = CPL_PRIORITY_SETUP;
+ set_arp_failure_handler(skb, act_open_req_arp_failure);
+
+ req = (struct cpl_act_open_req *) skb_put(skb, sizeof(*req));
+ req->wr.wr_hi = htonl(V_WR_OP(FW_WROPCODE_FORWARD));
+ OPCODE_TID(req) = htonl(MK_OPCODE_TID(CPL_ACT_OPEN_REQ, ep->atid));
+ req->local_port = ep->com.local_addr.sin_port;
+ req->peer_port = ep->com.remote_addr.sin_port;
+ req->local_ip = ep->com.local_addr.sin_addr.s_addr;
+ req->peer_ip = ep->com.remote_addr.sin_addr.s_addr;
+ req->opt0h = htonl(opt0h);
+ req->opt0l = htonl(opt0l);
+ req->params = 0;
+ req->opt2 = htonl(opt2);
+ l2t_send(ep->com.tdev, skb, ep->l2t);
+ return 0;
+}
+
+static void send_mpa_req(struct iwch_ep *ep, struct sk_buff *skb)
+{
+ int mpalen;
+ struct tx_data_wr *req;
+ struct mpa_message *mpa;
+ int len;
+
+ PDBG("%s ep %p pd_len %d\n", __FUNCTION__, ep, ep->plen);
+
+ BUG_ON(skb_cloned(skb));
+
+ mpalen = sizeof(*mpa) + ep->plen;
+ if (skb->data + mpalen + sizeof(*req) > skb->end) {
+ kfree_skb(skb);
+ skb=alloc_skb(mpalen + sizeof(*req), GFP_KERNEL);
+ if (!skb) {
+ connect_reply_upcall(ep, -ENOMEM);
+ return;
+ }
+ }
+ skb_trim(skb, 0);
+ skb_reserve(skb, sizeof(*req));
+ skb_put(skb, mpalen);
+ skb->priority = CPL_PRIORITY_DATA;
+ mpa = (struct mpa_message *) skb->data;
+ memset(mpa, 0, sizeof(*mpa));
+ memcpy(mpa->key, MPA_KEY_REQ, sizeof(mpa->key));
+ mpa->flags = (crc_enabled ? MPA_CRC : 0) |
+ (markers_enabled ? MPA_MARKERS : 0);
+ mpa->private_data_size = htons(ep->plen);
+ mpa->revision = mpa_rev;
+
+ if (ep->plen)
+ memcpy(mpa->private_data, ep->mpa_pkt + sizeof(*mpa), ep->plen);
+
+ /*
+ * Reference the mpa skb. This ensures the data area
+ * will remain in memory until the hw acks the tx.
+ * Function tx_ack() will deref it.
+ */
+ skb_get(skb);
+ set_arp_failure_handler(skb, arp_failure_discard);
+ skb->h.raw = skb->data;
+ len = skb->len;
+ req = (struct tx_data_wr *) skb_push(skb, sizeof(*req));
+ req->wr_hi = htonl(V_WR_OP(FW_WROPCODE_OFLD_TX_DATA));
+ req->wr_lo = htonl(V_WR_TID(ep->hwtid));
+ req->len = htonl(len);
+ req->param = htonl(V_TX_PORT(ep->l2t->smt_idx) |
+ V_TX_SNDBUF(snd_win>>15));
+ req->flags = htonl(F_TX_IMM_ACK|F_TX_INIT);
+ req->sndseq = htonl(ep->snd_seq);
+ BUG_ON(ep->mpa_skb);
+ ep->mpa_skb = skb;
+ l2t_send(ep->com.tdev, skb, ep->l2t);
+ start_ep_timer(ep);
+ state_set(&ep->com, MPA_REQ_SENT);
+ return;
+}
+
+static int send_mpa_reject(struct iwch_ep *ep, const void *pdata, u8 plen)
+{
+ int mpalen;
+ struct tx_data_wr *req;
+ struct mpa_message *mpa;
+ struct sk_buff *skb;
+
+ PDBG("%s ep %p plen %d\n", __FUNCTION__, ep, plen);
+
+ mpalen = sizeof(*mpa) + plen;
+
+ skb = get_skb(NULL, mpalen + sizeof(*req), GFP_KERNEL);
+ if (!skb) {
+ printk(KERN_ERR MOD "%s - cannot alloc skb!\n", __FUNCTION__);
+ return -ENOMEM;
+ }
+ skb_reserve(skb, sizeof(*req));
+ mpa = (struct mpa_message *) skb_put(skb, mpalen);
+ memset(mpa, 0, sizeof(*mpa));
+ memcpy(mpa->key, MPA_KEY_REP, sizeof(mpa->key));
+ mpa->flags = MPA_REJECT;
+ mpa->revision = mpa_rev;
+ mpa->private_data_size = htons(plen);
+ if (plen)
+ memcpy(mpa->private_data, pdata, plen);
+
+ /*
+ * Reference the mpa skb again. This ensures the data area
+ * will remain in memory until the hw acks the tx.
+ * Function tx_ack() will deref it.
+ */
+ skb_get(skb);
+ skb->priority = CPL_PRIORITY_DATA;
+ set_arp_failure_handler(skb, arp_failure_discard);
+ skb->h.raw = skb->data;
+ req = (struct tx_data_wr *) skb_push(skb, sizeof(*req));
+ req->wr_hi = htonl(V_WR_OP(FW_WROPCODE_OFLD_TX_DATA));
+ req->wr_lo = htonl(V_WR_TID(ep->hwtid));
+ req->len = htonl(mpalen);
+ req->param = htonl(V_TX_PORT(ep->l2t->smt_idx) |
+ V_TX_SNDBUF(snd_win>>15));
+ req->flags = htonl(F_TX_IMM_ACK|F_TX_INIT);
+ req->sndseq = htonl(ep->snd_seq);
+ BUG_ON(ep->mpa_skb);
+ ep->mpa_skb = skb;
+ l2t_send(ep->com.tdev, skb, ep->l2t);
+ return 0;
+}
+
+static int send_mpa_reply(struct iwch_ep *ep, const void *pdata, u8 plen)
+{
+ int mpalen;
+ struct tx_data_wr *req;
+ struct mpa_message *mpa;
+ int len;
+ struct sk_buff *skb;
+
+ PDBG("%s ep %p plen %d\n", __FUNCTION__, ep, plen);
+
+ mpalen = sizeof(*mpa) + plen;
+
+ skb = get_skb(NULL, mpalen + sizeof(*req), GFP_KERNEL);
+ if (!skb) {
+ printk(KERN_ERR MOD "%s - cannot alloc skb!\n", __FUNCTION__);
+ return -ENOMEM;
+ }
+ skb->priority = CPL_PRIORITY_DATA;
+ skb_reserve(skb, sizeof(*req));
+ mpa = (struct mpa_message *) skb_put(skb, mpalen);
+ memset(mpa, 0, sizeof(*mpa));
+ memcpy(mpa->key, MPA_KEY_REP, sizeof(mpa->key));
+ mpa->flags = (ep->mpa_attr.crc_enabled ? MPA_CRC : 0) |
+ (markers_enabled ? MPA_MARKERS : 0);
+ mpa->revision = mpa_rev;
+ mpa->private_data_size = htons(plen);
+ if (plen)
+ memcpy(mpa->private_data, pdata, plen);
+
+ /*
+ * Reference the mpa skb. This ensures the data area
+ * will remain in memory until the hw acks the tx.
+ * Function tx_ack() will deref it.
+ */
+ skb_get(skb);
+ set_arp_failure_handler(skb, arp_failure_discard);
+ skb->h.raw = skb->data;
+ len = skb->len;
+ req = (struct tx_data_wr *) skb_push(skb, sizeof(*req));
+ req->wr_hi = htonl(V_WR_OP(FW_WROPCODE_OFLD_TX_DATA));
+ req->wr_lo = htonl(V_WR_TID(ep->hwtid));
+ req->len = htonl(len);
+ req->param = htonl(V_TX_PORT(ep->l2t->smt_idx) |
+ V_TX_SNDBUF(snd_win>>15));
+ req->flags = htonl(F_TX_MORE | F_TX_IMM_ACK | F_TX_INIT);
+ req->sndseq = htonl(ep->snd_seq);
+ ep->mpa_skb = skb;
+ state_set(&ep->com, MPA_REP_SENT);
+ l2t_send(ep->com.tdev, skb, ep->l2t);
+ return 0;
+}
+
+static int act_establish(struct t3cdev *tdev, struct sk_buff *skb, void *ctx)
+{
+ struct iwch_ep *ep = ctx;
+ struct cpl_act_establish *req = cplhdr(skb);
+ unsigned int tid = GET_TID(req);
+
+ PDBG("%s ep %p tid %d\n", __FUNCTION__, ep, tid);
+
+ dst_confirm(ep->dst);
+
+ /* setup the hwtid for this connection */
+ ep->hwtid = tid;
+ cxgb3_insert_tid(ep->com.tdev, &t3c_client, ep, tid);
+
+ ep->snd_seq = ntohl(req->snd_isn);
+
+ set_emss(ep, ntohs(req->tcp_opt));
+
+ /* dealloc the atid */
+ cxgb3_free_atid(ep->com.tdev, ep->atid);
+
+ /* start MPA negotiation */
+ send_mpa_req(ep, skb);
+
+ return 0;
+}
+
+static void abort_connection(struct iwch_ep *ep, struct sk_buff *skb)
+{
+ PDBG("%s ep %p\n", __FILE__, ep);
+ state_set(&ep->com, ABORTING);
+ send_abort(ep, skb, GFP_KERNEL);
+}
+
+static void close_complete_upcall(struct iwch_ep *ep)
+{
+ struct iw_cm_event event;
+
+ PDBG("%s ep %p\n", __FUNCTION__, ep);
+ memset(&event, 0, sizeof(event));
+ event.event = IW_CM_EVENT_CLOSE;
+ if (ep->com.cm_id) {
+ PDBG("close complete delivered ep %p cm_id %p tid %d\n",
+ ep, ep->com.cm_id, ep->hwtid);
+ ep->com.cm_id->event_handler(ep->com.cm_id, &event);
+ ep->com.cm_id->rem_ref(ep->com.cm_id);
+ ep->com.cm_id = NULL;
+ ep->com.qp = NULL;
+ }
+}
+
+static void peer_close_upcall(struct iwch_ep *ep)
+{
+ struct iw_cm_event event;
+
+ PDBG("%s ep %p\n", __FUNCTION__, ep);
+ memset(&event, 0, sizeof(event));
+ event.event = IW_CM_EVENT_DISCONNECT;
+ if (ep->com.cm_id) {
+ PDBG("peer close delivered ep %p cm_id %p tid %d\n",
+ ep, ep->com.cm_id, ep->hwtid);
+ ep->com.cm_id->event_handler(ep->com.cm_id, &event);
+ }
+}
+
+static void peer_abort_upcall(struct iwch_ep *ep)
+{
+ struct iw_cm_event event;
+
+ PDBG("%s ep %p\n", __FUNCTION__, ep);
+ memset(&event, 0, sizeof(event));
+ event.event = IW_CM_EVENT_CLOSE;
+ event.status = -ECONNRESET;
+ if (ep->com.cm_id) {
+ PDBG("abort delivered ep %p cm_id %p tid %d\n", ep,
+ ep->com.cm_id, ep->hwtid);
+ ep->com.cm_id->event_handler(ep->com.cm_id, &event);
+ ep->com.cm_id->rem_ref(ep->com.cm_id);
+ ep->com.cm_id = NULL;
+ ep->com.qp = NULL;
+ }
+}
+
+static void connect_reply_upcall(struct iwch_ep *ep, int status)
+{
+ struct iw_cm_event event;
+
+ PDBG("%s ep %p status %d\n", __FUNCTION__, ep, status);
+ memset(&event, 0, sizeof(event));
+ event.event = IW_CM_EVENT_CONNECT_REPLY;
+ event.status = status;
+ event.local_addr = ep->com.local_addr;
+ event.remote_addr = ep->com.remote_addr;
+
+ if ((status == 0) || (status == -ECONNREFUSED)) {
+ event.private_data_len = ep->plen;
+ event.private_data = ep->mpa_pkt + sizeof(struct mpa_message);
+ }
+ if (ep->com.cm_id) {
+ PDBG("%s ep %p tid %d status %d\n", __FUNCTION__, ep,
+ ep->hwtid, status);
+ ep->com.cm_id->event_handler(ep->com.cm_id, &event);
+ }
+ if (status < 0) {
+ ep->com.cm_id->rem_ref(ep->com.cm_id);
+ ep->com.cm_id = NULL;
+ ep->com.qp = NULL;
+ }
+}
+
+static void connect_request_upcall(struct iwch_ep *ep)
+{
+ struct iw_cm_event event;
+
+ PDBG("%s ep %p tid %d\n", __FUNCTION__, ep, ep->hwtid);
+ memset(&event, 0, sizeof(event));
+ event.event = IW_CM_EVENT_CONNECT_REQUEST;
+ event.local_addr = ep->com.local_addr;
+ event.remote_addr = ep->com.remote_addr;
+ event.private_data_len = ep->plen;
+ event.private_data = ep->mpa_pkt + sizeof(struct mpa_message);
+ event.provider_data = ep;
+ if (state_read(&ep->parent_ep->com) != DEAD)
+ ep->parent_ep->com.cm_id->event_handler(
+ ep->parent_ep->com.cm_id,
+ &event);
+ put_ep(&ep->parent_ep->com);
+ ep->parent_ep = NULL;
+}
+
+static void established_upcall(struct iwch_ep *ep)
+{
+ struct iw_cm_event event;
+
+ PDBG("%s ep %p\n", __FUNCTION__, ep);
+ memset(&event, 0, sizeof(event));
+ event.event = IW_CM_EVENT_ESTABLISHED;
+ if (ep->com.cm_id) {
+ PDBG("%s ep %p tid %d\n", __FUNCTION__, ep, ep->hwtid);
+ ep->com.cm_id->event_handler(ep->com.cm_id, &event);
+ }
+}
+
+static int update_rx_credits(struct iwch_ep *ep, u32 credits)
+{
+ struct cpl_rx_data_ack *req;
+ struct sk_buff *skb;
+
+ PDBG("%s ep %p credits %u\n", __FUNCTION__, ep, credits);
+ skb = get_skb(NULL, sizeof(*req), GFP_KERNEL);
+ if (!skb) {
+ printk(KERN_ERR MOD "update_rx_credits - cannot alloc skb!\n");
+ return 0;
+ }
+
+ req = (struct cpl_rx_data_ack *) skb_put(skb, sizeof(*req));
+ req->wr.wr_hi = htonl(V_WR_OP(FW_WROPCODE_FORWARD));
+ OPCODE_TID(req) = htonl(MK_OPCODE_TID(CPL_RX_DATA_ACK, ep->hwtid));
+ req->credit_dack = htonl(V_RX_CREDITS(credits) | V_RX_FORCE_ACK(1));
+ skb->priority = CPL_PRIORITY_ACK;
+ ep->com.tdev->send(ep->com.tdev, skb);
+ return credits;
+}
+
+static void process_mpa_reply(struct iwch_ep *ep, struct sk_buff *skb)
+{
+ struct mpa_message *mpa;
+ u16 plen;
+ struct iwch_qp_attributes attrs;
+ enum iwch_qp_attr_mask mask;
+ int err;
+
+ PDBG("%s ep %p\n", __FUNCTION__, ep);
+
+ /*
+ * Stop mpa timer. If it expired, then the state is
+ * CLOSING and we bail since ep_timeout already aborted
+ * the connection.
+ */
+ stop_ep_timer(ep);
+ if (state_read(&ep->com) == CLOSING)
+ return;
+ state_set(&ep->com, FPDU_MODE);
+
+ /*
+ * If we get more than the supported amount of private data
+ * then we must fail this connection.
+ */
+ if (ep->mpa_pkt_len + skb->len > sizeof(ep->mpa_pkt)) {
+ err = -EINVAL;
+ goto err;
+ }
+
+ /*
+ * copy the new data into our accumulation buffer.
+ */
+ memcpy(&(ep->mpa_pkt[ep->mpa_pkt_len]), skb->data, skb->len);
+ ep->mpa_pkt_len += skb->len;
+
+ /*
+ * if we don't even have the mpa message, then bail.
+ */
+ if (ep->mpa_pkt_len < sizeof(*mpa))
+ return;
+ mpa = (struct mpa_message *) ep->mpa_pkt;
+
+ /* Validate MPA header. */
+ if (mpa->revision != mpa_rev) {
+ err = -EPROTO;
+ goto err;
+ }
+ if (memcmp(mpa->key, MPA_KEY_REP, sizeof(mpa->key))) {
+ err = -EPROTO;
+ goto err;
+ }
+
+ plen = ntohs(mpa->private_data_size);
+
+ /*
+ * Fail if there's too much private data.
+ */
+ if (plen > MPA_MAX_PRIVATE_DATA) {
+ err = -EPROTO;
+ goto err;
+ }
+
+ /*
+ * If plen does not account for pkt size
+ */
+ if (ep->mpa_pkt_len > (sizeof(*mpa) + plen)) {
+ err = -EPROTO;
+ goto err;
+ }
+
+ ep->plen = (u8) plen;
+
+ /*
+ * If we don't have all the pdata yet, then bail.
+ * We'll continue process when more data arrives.
+ */
+ if (ep->mpa_pkt_len < (sizeof(*mpa) + plen))
+ return;
+
+ if (mpa->flags & MPA_REJECT) {
+ err = -ECONNREFUSED;
+ goto err;
+ }
+
+ /*
+ * If we get here we have accumulated the entire mpa
+ * start reply message including private data. And
+ * the MPA header is valid.
+ */
+
+ ep->mpa_attr.crc_enabled = (mpa->flags & MPA_CRC) | crc_enabled ? 1 : 0;
+ ep->mpa_attr.recv_marker_enabled = markers_enabled;
+ ep->mpa_attr.xmit_marker_enabled = mpa->flags & MPA_MARKERS ? 1 : 0;
+ ep->mpa_attr.version = mpa_rev;
+ PDBG("%s - crc_enabled=%d, recv_marker_enabled=%d, "
+ "xmit_marker_enabled=%d, version=%d\n", __FUNCTION__,
+ ep->mpa_attr.crc_enabled, ep->mpa_attr.recv_marker_enabled,
+ ep->mpa_attr.xmit_marker_enabled, ep->mpa_attr.version);
+
+ attrs.mpa_attr = ep->mpa_attr;
+ attrs.max_ird = ep->ird;
+ attrs.max_ord = ep->ord;
+ attrs.llp_stream_handle = ep;
+ attrs.next_state = IWCH_QP_STATE_RTS;
+
+ mask = IWCH_QP_ATTR_NEXT_STATE |
+ IWCH_QP_ATTR_LLP_STREAM_HANDLE | IWCH_QP_ATTR_MPA_ATTR |
+ IWCH_QP_ATTR_MAX_IRD | IWCH_QP_ATTR_MAX_ORD;
+
+ /* bind QP and TID with INIT_WR */
+ err = iwch_modify_qp(ep->com.qp->rhp,
+ ep->com.qp, mask, &attrs, 1);
+ if (!err)
+ goto out;
+err:
+ abort_connection(ep, skb);
+out:
+ connect_reply_upcall(ep, err);
+ return;
+}
+
+static void process_mpa_request(struct iwch_ep *ep, struct sk_buff *skb)
+{
+ struct mpa_message *mpa;
+ u16 plen;
+
+ PDBG("%s ep %p\n", __FUNCTION__, ep);
+
+ /*
+ * Stop mpa timer. If it expired, then the state is
+ * CLOSING and we bail since ep_timeout already aborted
+ * the connection.
+ */
+ stop_ep_timer(ep);
+ if (state_read(&ep->com) == CLOSING)
+ return;
+
+ /*
+ * If we get more than the supported amount of private data
+ * then we must fail this connection.
+ */
+ if (ep->mpa_pkt_len + skb->len > sizeof(ep->mpa_pkt)) {
+ abort_connection(ep, skb);
+ return;
+ }
+
+ PDBG("%s enter (%s line %u)\n", __FUNCTION__, __FILE__, __LINE__);
+
+ /*
+ * Copy the new data into our accumulation buffer.
+ */
+ memcpy(&(ep->mpa_pkt[ep->mpa_pkt_len]), skb->data, skb->len);
+ ep->mpa_pkt_len += skb->len;
+
+ /*
+ * If we don't even have the mpa message, then bail.
+ * We'll continue process when more data arrives.
+ */
+ if (ep->mpa_pkt_len < sizeof(*mpa))
+ return;
+ PDBG("%s enter (%s line %u)\n", __FUNCTION__, __FILE__, __LINE__);
+ mpa = (struct mpa_message *) ep->mpa_pkt;
+
+ /*
+ * Validate MPA Header.
+ */
+ if (mpa->revision != mpa_rev) {
+ abort_connection(ep, skb);
+ return;
+ }
+
+ if (memcmp(mpa->key, MPA_KEY_REQ, sizeof(mpa->key))) {
+ abort_connection(ep, skb);
+ return;
+ }
+
+ plen = ntohs(mpa->private_data_size);
+
+ /*
+ * Fail if there's too much private data.
+ */
+ if (plen > MPA_MAX_PRIVATE_DATA) {
+ abort_connection(ep, skb);
+ return;
+ }
+
+ /*
+ * If plen does not account for pkt size
+ */
+ if (ep->mpa_pkt_len > (sizeof(*mpa) + plen)) {
+ abort_connection(ep, skb);
+ return;
+ }
+ ep->plen = (u8) plen;
+
+ /*
+ * If we don't have all the pdata yet, then bail.
+ */
+ if (ep->mpa_pkt_len < (sizeof(*mpa) + plen))
+ return;
+
+ /*
+ * If we get here we have accumulated the entire mpa
+ * start reply message including private data.
+ */
+ ep->mpa_attr.crc_enabled = (mpa->flags & MPA_CRC) | crc_enabled ? 1 : 0;
+ ep->mpa_attr.recv_marker_enabled = markers_enabled;
+ ep->mpa_attr.xmit_marker_enabled = mpa->flags & MPA_MARKERS ? 1 : 0;
+ ep->mpa_attr.version = mpa_rev;
+ PDBG("%s - crc_enabled=%d, recv_marker_enabled=%d, "
+ "xmit_marker_enabled=%d, version=%d\n", __FUNCTION__,
+ ep->mpa_attr.crc_enabled, ep->mpa_attr.recv_marker_enabled,
+ ep->mpa_attr.xmit_marker_enabled, ep->mpa_attr.version);
+
+ state_set(&ep->com, MPA_REQ_RCVD);
+
+ /* drive upcall */
+ connect_request_upcall(ep);
+ return;
+}
+
+static int rx_data(struct t3cdev *tdev, struct sk_buff *skb, void *ctx)
+{
+ struct iwch_ep *ep = ctx;
+ struct cpl_rx_data *hdr = cplhdr(skb);
+ unsigned int dlen = ntohs(hdr->len);
+
+ PDBG("%s ep %p dlen %u\n", __FUNCTION__, ep, dlen);
+
+ skb_pull(skb, sizeof(*hdr));
+ skb_trim(skb, dlen);
+
+ switch (state_read(&ep->com)) {
+ case MPA_REQ_SENT:
+ process_mpa_reply(ep, skb);
+ break;
+ case MPA_REQ_WAIT:
+ process_mpa_request(ep, skb);
+ break;
+ case MPA_REP_SENT:
+ break;
+ default:
+ printk(KERN_ERR MOD "%s Unexpected streaming data."
+ " ep %p state %d tid %d\n",
+ __FUNCTION__, ep, state_read(&ep->com), ep->hwtid);
+
+ /*
+ * The ep will timeout and inform the ULP of the failure.
+ * See ep_timeout().
+ */
+ break;
+ }
+
+ /* update RX credits */
+ update_rx_credits(ep, dlen);
+
+ return CPL_RET_BUF_DONE;
+}
+
+/*
+ * Upcall from the adapter indicating data has been transmitted.
+ * For us its just the single MPA request or reply. We can now free
+ * the skb holding the mpa message.
+ */
+static int tx_ack(struct t3cdev *tdev, struct sk_buff *skb, void *ctx)
+{
+ struct iwch_ep *ep = ctx;
+ struct cpl_wr_ack *hdr = cplhdr(skb);
+ unsigned int credits = ntohs(hdr->credits);
+ enum iwch_qp_attr_mask mask;
+
+ PDBG("%s ep %p credits %u\n", __FUNCTION__, ep, credits);
+
+ if (credits == 0)
+ return CPL_RET_BUF_DONE;
+ BUG_ON(credits != 1);
+ BUG_ON(ep->mpa_skb == NULL);
+ kfree_skb(ep->mpa_skb);
+ ep->mpa_skb = NULL;
+ dst_confirm(ep->dst);
+ if (state_read(&ep->com) == MPA_REP_SENT) {
+ struct iwch_qp_attributes attrs;
+
+ /* bind QP to EP and move to RTS */
+ attrs.mpa_attr = ep->mpa_attr;
+ attrs.max_ird = ep->ord;
+ attrs.max_ord = ep->ord;
+ attrs.llp_stream_handle = ep;
+ attrs.next_state = IWCH_QP_STATE_RTS;
+
+ /* bind QP and TID with INIT_WR */
+ mask = IWCH_QP_ATTR_NEXT_STATE |
+ IWCH_QP_ATTR_LLP_STREAM_HANDLE |
+ IWCH_QP_ATTR_MPA_ATTR |
+ IWCH_QP_ATTR_MAX_IRD |
+ IWCH_QP_ATTR_MAX_ORD;
+
+ ep->com.rpl_err = iwch_modify_qp(ep->com.qp->rhp,
+ ep->com.qp, mask, &attrs, 1);
+
+ if (!ep->com.rpl_err) {
+ state_set(&ep->com, FPDU_MODE);
+ established_upcall(ep);
+ }
+
+ ep->com.rpl_done = 1;
+ PDBG("waking up ep %p\n", ep);
+ wake_up(&ep->com.waitq);
+ }
+ return CPL_RET_BUF_DONE;
+}
+
+static int abort_rpl(struct t3cdev *tdev, struct sk_buff *skb, void *ctx)
+{
+ struct iwch_ep *ep = ctx;
+
+ PDBG("%s ep %p\n", __FUNCTION__, ep);
+
+ close_complete_upcall(ep);
+ release_ep_resources(ep);
+ return CPL_RET_BUF_DONE;
+}
+
+static int act_open_rpl(struct t3cdev *tdev, struct sk_buff *skb, void *ctx)
+{
+ struct iwch_ep *ep = ctx;
+ struct cpl_act_open_rpl *rpl = cplhdr(skb);
+
+ PDBG("%s ep %p status %u errno %d\n", __FUNCTION__, ep, rpl->status,
+ status2errno(rpl->status));
+ connect_reply_upcall(ep, status2errno(rpl->status));
+ state_set(&ep->com, DEAD);
+ if (ep->com.tdev->type == T3B)
+ release_tid(ep->com.tdev, GET_TID(rpl), NULL);
+ cxgb3_free_atid(ep->com.tdev, ep->atid);
+ dst_release(ep->dst);
+ l2t_release(L2DATA(ep->com.tdev), ep->l2t);
+ put_ep(&ep->com);
+ return CPL_RET_BUF_DONE;
+}
+
+static int listen_start(struct iwch_listen_ep *ep)
+{
+ struct sk_buff *skb;
+ struct cpl_pass_open_req *req;
+
+ PDBG("%s ep %p\n", __FUNCTION__, ep);
+ skb = get_skb(NULL, sizeof(*req), GFP_KERNEL);
+ if (!skb) {
+ printk(KERN_ERR MOD "t3c_listen_start failed to alloc skb!\n");
+ return -ENOMEM;
+ }
+
+ req = (struct cpl_pass_open_req *) skb_put(skb, sizeof(*req));
+ req->wr.wr_hi = htonl(V_WR_OP(FW_WROPCODE_FORWARD));
+ OPCODE_TID(req) = htonl(MK_OPCODE_TID(CPL_PASS_OPEN_REQ, ep->stid));
+ req->local_port = ep->com.local_addr.sin_port;
+ req->local_ip = ep->com.local_addr.sin_addr.s_addr;
+ req->peer_port = 0;
+ req->peer_ip = 0;
+ req->peer_netmask = 0;
+ req->opt0h = htonl(F_DELACK | F_TCAM_BYPASS);
+ req->opt0l = htonl(V_RCV_BUFSIZ(rcv_win>>10));
+ req->opt1 = htonl(V_CONN_POLICY(CPL_CONN_POLICY_ASK));
+
+ skb->priority = 1;
+ ep->com.tdev->send(ep->com.tdev, skb);
+ return 0;
+}
+
+static int pass_open_rpl(struct t3cdev *tdev, struct sk_buff *skb, void *ctx)
+{
+ struct iwch_listen_ep *ep = ctx;
+ struct cpl_pass_open_rpl *rpl = cplhdr(skb);
+
+ PDBG("%s ep %p status %d error %d\n", __FUNCTION__, ep,
+ rpl->status, status2errno(rpl->status));
+ ep->com.rpl_err = status2errno(rpl->status);
+ ep->com.rpl_done = 1;
+ wake_up(&ep->com.waitq);
+
+ return CPL_RET_BUF_DONE;
+}
+
+static int listen_stop(struct iwch_listen_ep *ep)
+{
+ struct sk_buff *skb;
+ struct cpl_close_listserv_req *req;
+
+ PDBG("%s ep %p\n", __FUNCTION__, ep);
+ skb = get_skb(NULL, sizeof(*req), GFP_KERNEL);
+ if (!skb) {
+ printk(KERN_ERR MOD "%s - failed to alloc skb\n", __FUNCTION__);
+ return -ENOMEM;
+ }
+ req = (struct cpl_close_listserv_req *) skb_put(skb, sizeof(*req));
+ req->wr.wr_hi = htonl(V_WR_OP(FW_WROPCODE_FORWARD));
+ OPCODE_TID(req) = htonl(MK_OPCODE_TID(CPL_CLOSE_LISTSRV_REQ, ep->stid));
+ skb->priority = 1;
+ ep->com.tdev->send(ep->com.tdev, skb);
+ return 0;
+}
+
+static int close_listsrv_rpl(struct t3cdev *tdev, struct sk_buff *skb,
+ void *ctx)
+{
+ struct iwch_listen_ep *ep = ctx;
+ struct cpl_close_listserv_rpl *rpl = cplhdr(skb);
+
+ PDBG("%s ep %p\n", __FUNCTION__, ep);
+ ep->com.rpl_err = status2errno(rpl->status);
+ ep->com.rpl_done = 1;
+ wake_up(&ep->com.waitq);
+ return CPL_RET_BUF_DONE;
+}
+
+static void accept_cr(struct iwch_ep *ep, __be32 peer_ip, struct sk_buff *skb)
+{
+ struct cpl_pass_accept_rpl *rpl;
+ unsigned int mtu_idx;
+ u32 opt0h, opt0l, opt2;
+ int wscale;
+
+ PDBG("%s ep %p\n", __FUNCTION__, ep);
+ BUG_ON(skb_cloned(skb));
+ skb_trim(skb, sizeof(*rpl));
+ skb_get(skb);
+ mtu_idx = find_best_mtu(T3C_DATA(ep->com.tdev), dst_mtu(ep->dst));
+ wscale = compute_wscale(rcv_win);
+ opt0h = V_NAGLE(0) |
+ V_NO_CONG(nocong) |
+ V_KEEP_ALIVE(1) |
+ F_TCAM_BYPASS |
+ V_WND_SCALE(wscale) |
+ V_MSS_IDX(mtu_idx) |
+ V_L2T_IDX(ep->l2t->idx) | V_TX_CHANNEL(ep->l2t->smt_idx);
+ opt0l = V_TOS((ep->tos >> 2) & M_TOS) | V_RCV_BUFSIZ(rcv_win>>10);
+ opt2 = V_FLAVORS_VALID(0) | V_CONG_CONTROL_FLAVOR(0);
+
+ rpl = cplhdr(skb);
+ rpl->wr.wr_hi = htonl(V_WR_OP(FW_WROPCODE_FORWARD));
+ OPCODE_TID(rpl) = htonl(MK_OPCODE_TID(CPL_PASS_ACCEPT_RPL, ep->hwtid));
+ rpl->peer_ip = peer_ip;
+ rpl->opt0h = htonl(opt0h);
+ rpl->opt0l_status = htonl(opt0l | CPL_PASS_OPEN_ACCEPT);
+ rpl->opt2 = htonl(opt2);
+ rpl->rsvd = rpl->opt2; /* workaround for HW bug */
+ skb->priority = CPL_PRIORITY_SETUP;
+ l2t_send(ep->com.tdev, skb, ep->l2t);
+
+ return;
+}
+
+static void reject_cr(struct t3cdev *tdev, u32 hwtid, __be32 peer_ip,
+ struct sk_buff *skb)
+{
+ PDBG("%s t3cdev %p tid %u peer_ip %x\n", __FUNCTION__, tdev, hwtid,
+ peer_ip);
+ BUG_ON(skb_cloned(skb));
+ skb_trim(skb, sizeof(struct cpl_tid_release));
+ skb_get(skb);
+
+ if (tdev->type == T3B)
+ release_tid(tdev, hwtid, skb);
+ else {
+ struct cpl_pass_accept_rpl *rpl;
+
+ rpl = cplhdr(skb);
+ skb->priority = CPL_PRIORITY_SETUP;
+ rpl->wr.wr_hi = htonl(V_WR_OP(FW_WROPCODE_FORWARD));
+ OPCODE_TID(rpl) = htonl(MK_OPCODE_TID(CPL_PASS_ACCEPT_RPL,
+ hwtid));
+ rpl->peer_ip = peer_ip;
+ rpl->opt0h = htonl(F_TCAM_BYPASS);
+ rpl->opt0l_status = htonl(CPL_PASS_OPEN_REJECT);
+ rpl->opt2 = 0;
+ rpl->rsvd = rpl->opt2;
+ tdev->send(tdev, skb);
+ }
+}
+
+static int pass_accept_req(struct t3cdev *tdev, struct sk_buff *skb, void *ctx)
+{
+ struct iwch_ep *child_ep, *parent_ep = ctx;
+ struct cpl_pass_accept_req *req = cplhdr(skb);
+ unsigned int hwtid = GET_TID(req);
+ struct dst_entry *dst;
+ struct l2t_entry *l2t;
+ struct rtable *rt;
+ struct iff_mac tim;
+
+ PDBG("%s parent ep %p tid %u\n", __FUNCTION__, parent_ep, hwtid);
+
+ if (state_read(&parent_ep->com) != LISTEN) {
+ printk(KERN_ERR "%s - listening ep not in LISTEN\n",
+ __FUNCTION__);
+ goto reject;
+ }
+
+ /*
+ * Find the netdev for this connection request.
+ */
+ tim.mac_addr = req->dst_mac;
+ tim.vlan_tag = ntohs(req->vlan_tag);
+ if (tdev->ctl(tdev, GET_IFF_FROM_MAC, &tim) < 0 || !tim.dev) {
+ printk(KERN_ERR
+ "%s bad dst mac %02x %02x %02x %02x %02x %02x\n",
+ __FUNCTION__,
+ req->dst_mac[0],
+ req->dst_mac[1],
+ req->dst_mac[2],
+ req->dst_mac[3],
+ req->dst_mac[4],
+ req->dst_mac[5]);
+ goto reject;
+ }
+
+ /* Find output route */
+ rt = find_route(tdev,
+ req->local_ip,
+ req->peer_ip,
+ req->local_port,
+ req->peer_port, G_PASS_OPEN_TOS(ntohl(req->tos_tid)));
+ if (!rt) {
+ printk(KERN_ERR MOD "%s - failed to find dst entry!\n",
+ __FUNCTION__);
+ goto reject;
+ }
+ dst = &rt->u.dst;
+ l2t = t3_l2t_get(tdev, dst->neighbour, dst->neighbour->dev->if_port);
+ if (!l2t) {
+ printk(KERN_ERR MOD "%s - failed to allocate l2t entry!\n",
+ __FUNCTION__);
+ dst_release(dst);
+ goto reject;
+ }
+ child_ep = alloc_ep(sizeof(*child_ep), GFP_KERNEL);
+ if (!child_ep) {
+ printk(KERN_ERR MOD "%s - failed to allocate ep entry!\n",
+ __FUNCTION__);
+ l2t_release(L2DATA(tdev), l2t);
+ dst_release(dst);
+ goto reject;
+ }
+ state_set(&child_ep->com, CONNECTING);
+ child_ep->com.tdev = tdev;
+ child_ep->com.cm_id = NULL;
+ child_ep->com.local_addr.sin_family = PF_INET;
+ child_ep->com.local_addr.sin_port = req->local_port;
+ child_ep->com.local_addr.sin_addr.s_addr = req->local_ip;
+ child_ep->com.remote_addr.sin_family = PF_INET;
+ child_ep->com.remote_addr.sin_port = req->peer_port;
+ child_ep->com.remote_addr.sin_addr.s_addr = req->peer_ip;
+ get_ep(&parent_ep->com);
+ child_ep->parent_ep = parent_ep;
+ child_ep->tos = G_PASS_OPEN_TOS(ntohl(req->tos_tid));
+ child_ep->l2t = l2t;
+ child_ep->dst = dst;
+ child_ep->hwtid = hwtid;
+ init_timer(&child_ep->timer);
+ cxgb3_insert_tid(tdev, &t3c_client, child_ep, hwtid);
+ accept_cr(child_ep, req->peer_ip, skb);
+ goto out;
+reject:
+ reject_cr(tdev, hwtid, req->peer_ip, skb);
+out:
+ return CPL_RET_BUF_DONE;
+}
+
+static int pass_establish(struct t3cdev *tdev, struct sk_buff *skb, void *ctx)
+{
+ struct iwch_ep *ep = ctx;
+ struct cpl_pass_establish *req = cplhdr(skb);
+
+ PDBG("%s ep %p\n", __FUNCTION__, ep);
+ ep->snd_seq = ntohl(req->snd_isn);
+
+ set_emss(ep, ntohs(req->tcp_opt));
+
+ dst_confirm(ep->dst);
+ state_set(&ep->com, MPA_REQ_WAIT);
+ start_ep_timer(ep);
+
+ return CPL_RET_BUF_DONE;
+}
+
+static int peer_close(struct t3cdev *tdev, struct sk_buff *skb, void *ctx)
+{
+ struct iwch_ep *ep = ctx;
+ struct iwch_qp_attributes attrs;
+ int ret;
+ int abort = 0;
+
+ PDBG("%s ep %p\n", __FUNCTION__, ep);
+ dst_confirm(ep->dst);
+ switch (state_read(&ep->com)) {
+ case MPA_REQ_WAIT:
+ state_set(&ep->com, CLOSING);
+ break;
+ case MPA_REQ_SENT:
+ state_set(&ep->com, CLOSING);
+ connect_reply_upcall(ep, -ECONNRESET);
+ break;
+ case MPA_REQ_RCVD:
+
+ /*
+ * We're gonna mark this puppy DEAD, but keep
+ * the reference on it until the ULP accepts or
+ * rejects the CR.
+ */
+ state_set(&ep->com, CLOSING);
+ get_ep(&ep->com);
+ break;
+ case MPA_REP_SENT:
+ state_set(&ep->com, CLOSING);
+ ep->com.rpl_done = 1;
+ ep->com.rpl_err = -ECONNRESET;
+ PDBG("waking up ep %p\n", ep);
+ wake_up(&ep->com.waitq);
+ break;
+ case FPDU_MODE:
+ state_set(&ep->com, CLOSING);
+ peer_close_upcall(ep);
+ attrs.next_state = IWCH_QP_STATE_CLOSING;
+ ret = iwch_modify_qp(ep->com.qp->rhp,
+ ep->com.qp, IWCH_QP_ATTR_NEXT_STATE,
+ &attrs, 1);
+ if (ret) {
+ printk(KERN_ERR MOD "%s - qp <- closing err!\n",
+ __FUNCTION__);
+ abort = 1;
+ }
+ break;
+ case ABORTING:
+ goto out;
+ case CLOSING:
+ start_ep_timer(ep);
+ state_set(&ep->com, MORIBUND);
+ goto out;
+ case MORIBUND:
+ stop_ep_timer(ep);
+ if (ep->com.cm_id && ep->com.qp) {
+ attrs.next_state = IWCH_QP_STATE_IDLE;
+ iwch_modify_qp(ep->com.qp->rhp,
+ ep->com.qp, IWCH_QP_ATTR_NEXT_STATE,
+ &attrs, 1);
+ }
+ close_complete_upcall(ep);
+ release_ep_resources(ep);
+ goto out;
+ case DEAD:
+ goto out;
+ default:
+ BUG_ON(1);
+ }
+ iwch_ep_disconnect(ep, abort, GFP_KERNEL);
+out:
+ return CPL_RET_BUF_DONE;
+}
+
+/*
+ * Returns whether an ABORT_REQ_RSS message is a negative advice.
+ */
+static inline int is_neg_adv_abort(unsigned int status)
+{
+ return status == CPL_ERR_RTX_NEG_ADVICE ||
+ status == CPL_ERR_PERSIST_NEG_ADVICE;
+}
+
+static int peer_abort(struct t3cdev *tdev, struct sk_buff *skb, void *ctx)
+{
+ struct cpl_abort_req_rss *req = cplhdr(skb);
+ struct iwch_ep *ep = ctx;
+ struct cpl_abort_rpl *rpl;
+ struct sk_buff *rpl_skb;
+ struct iwch_qp_attributes attrs;
+ int ret;
+ int state;
+
+ if (is_neg_adv_abort(req->status)) {
+ PDBG("%s neg_adv_abort ep %p tid %d\n", __FUNCTION__, ep,
+ ep->hwtid);
+ t3_l2t_send_event(ep->com.tdev, ep->l2t);
+ return CPL_RET_BUF_DONE;
+ }
+
+ state = state_read(&ep->com);
+ PDBG("%s ep %p state %u\n", __FUNCTION__, ep, state);
+ switch (state) {
+ case CONNECTING:
+ break;
+ case MPA_REQ_WAIT:
+ break;
+ case MPA_REQ_SENT:
+ connect_reply_upcall(ep, -ECONNRESET);
+ break;
+ case MPA_REP_SENT:
+ ep->com.rpl_done = 1;
+ ep->com.rpl_err = -ECONNRESET;
+ PDBG("waking up ep %p\n", ep);
+ wake_up(&ep->com.waitq);
+ break;
+ case MPA_REQ_RCVD:
+
+ /*
+ * We're gonna mark this puppy DEAD, but keep
+ * the reference on it until the ULP accepts or
+ * rejects the CR.
+ */
+ get_ep(&ep->com);
+ break;
+ case MORIBUND:
+ stop_ep_timer(ep);
+ case FPDU_MODE:
+ case CLOSING:
+ if (ep->com.cm_id && ep->com.qp) {
+ attrs.next_state = IWCH_QP_STATE_ERROR;
+ ret = iwch_modify_qp(ep->com.qp->rhp,
+ ep->com.qp, IWCH_QP_ATTR_NEXT_STATE,
+ &attrs, 1);
+ if (ret)
+ printk(KERN_ERR MOD
+ "%s - qp <- error failed!\n",
+ __FUNCTION__);
+ }
+ peer_abort_upcall(ep);
+ break;
+ case ABORTING:
+ break;
+ case DEAD:
+ PDBG("%s PEER_ABORT IN DEAD STATE!!!!\n", __FUNCTION__);
+ return CPL_RET_BUF_DONE;
+ default:
+ BUG_ON(1);
+ break;
+ }
+ dst_confirm(ep->dst);
+
+ rpl_skb = get_skb(skb, sizeof(*rpl), GFP_KERNEL);
+ if (!rpl_skb) {
+ printk(KERN_ERR MOD "%s - cannot allocate skb!\n",
+ __FUNCTION__);
+ dst_release(ep->dst);
+ l2t_release(L2DATA(ep->com.tdev), ep->l2t);
+ put_ep(&ep->com);
+ return CPL_RET_BUF_DONE;
+ }
+ rpl_skb->priority = CPL_PRIORITY_DATA;
+ rpl = (struct cpl_abort_rpl *) skb_put(rpl_skb, sizeof(*rpl));
+ rpl->wr.wr_hi = htonl(V_WR_OP(FW_WROPCODE_OFLD_HOST_ABORT_CON_RPL));
+ rpl->wr.wr_lo = htonl(V_WR_TID(ep->hwtid));
+ OPCODE_TID(rpl) = htonl(MK_OPCODE_TID(CPL_ABORT_RPL, ep->hwtid));
+ rpl->cmd = CPL_ABORT_NO_RST;
+ ep->com.tdev->send(ep->com.tdev, rpl_skb);
+ if (state != ABORTING)
+ release_ep_resources(ep);
+ return CPL_RET_BUF_DONE;
+}
+
+static int close_con_rpl(struct t3cdev *tdev, struct sk_buff *skb, void *ctx)
+{
+ struct iwch_ep *ep = ctx;
+ struct iwch_qp_attributes attrs;
+
+ PDBG("%s ep %p\n", __FUNCTION__, ep);
+ BUG_ON(!ep);
+
+ /* The cm_id may be null if we failed to connect */
+ switch (state_read(&ep->com)) {
+ case CLOSING:
+ start_ep_timer(ep);
+ state_set(&ep->com, MORIBUND);
+ break;
+ case MORIBUND:
+ stop_ep_timer(ep);
+ if ((ep->com.cm_id) && (ep->com.qp)) {
+ attrs.next_state = IWCH_QP_STATE_IDLE;
+ iwch_modify_qp(ep->com.qp->rhp,
+ ep->com.qp,
+ IWCH_QP_ATTR_NEXT_STATE,
+ &attrs, 1);
+ }
+ close_complete_upcall(ep);
+ release_ep_resources(ep);
+ break;
+ case DEAD:
+ default:
+ BUG_ON(1);
+ break;
+ }
+
+ return CPL_RET_BUF_DONE;
+}
+
+/*
+ * T3A does 3 things when a TERM is received:
+ * 1) send up a CPL_RDMA_TERMINATE message with the TERM packet
+ * 2) generate an async event on the QP with the TERMINATE opcode
+ * 3) post a TERMINATE opcde cqe into the associated CQ.
+ *
+ * For (1), we save the message in the qp for later consumer consumption.
+ * For (2), we move the QP into TERMINATE, post a QP event and disconnect.
+ * For (3), we toss the CQE in cxio_poll_cq().
+ *
+ * terminate() handles case (1)...
+ */
+static int terminate(struct t3cdev *tdev, struct sk_buff *skb, void *ctx)
+{
+ struct iwch_ep *ep = ctx;
+
+ PDBG("%s ep %p\n", __FUNCTION__, ep);
+ skb_pull(skb, sizeof(struct cpl_rdma_terminate));
+ PDBG("%s saving %d bytes of term msg\n", __FUNCTION__, skb->len);
+ memcpy(ep->com.qp->attr.terminate_buffer, skb->data, skb->len);
+ ep->com.qp->attr.terminate_msg_len = skb->len;
+ ep->com.qp->attr.is_terminate_local = 0;
+ return CPL_RET_BUF_DONE;
+}
+
+static int ec_status(struct t3cdev *tdev, struct sk_buff *skb, void *ctx)
+{
+ struct cpl_rdma_ec_status *rep = cplhdr(skb);
+ struct iwch_ep *ep = ctx;
+
+ PDBG("%s ep %p tid %u status %d\n", __FUNCTION__, ep, ep->hwtid,
+ rep->status);
+ if (rep->status) {
+ struct iwch_qp_attributes attrs;
+
+ printk(KERN_ERR MOD "%s BAD CLOSE - Aborting tid %u\n",
+ __FUNCTION__, ep->hwtid);
+ attrs.next_state = IWCH_QP_STATE_ERROR;
+ iwch_modify_qp(ep->com.qp->rhp,
+ ep->com.qp, IWCH_QP_ATTR_NEXT_STATE,
+ &attrs, 1);
+ abort_connection(ep, NULL);
+ }
+ return CPL_RET_BUF_DONE;
+}
+
+static void ep_timeout(unsigned long arg)
+{
+ struct iwch_ep *ep = (struct iwch_ep *)arg;
+ struct iwch_qp_attributes attrs;
+
+ PDBG("%s ep %p tid %u\n", __FUNCTION__, ep, ep->hwtid);
+ if (state_comp_exch(&ep->com, MPA_REQ_SENT, CLOSING)) {
+ struct sk_buff *skb;
+
+ connect_reply_upcall(ep, -ETIMEDOUT);
+ skb = alloc_skb(sizeof(struct cpl_abort_req), GFP_ATOMIC);
+ if (skb)
+ abort_connection(ep, skb);
+ }
+ if (state_comp_exch(&ep->com, MPA_REQ_WAIT, CLOSING)) {
+ struct sk_buff *skb;
+
+ skb = alloc_skb(sizeof(struct cpl_abort_req), GFP_ATOMIC);
+ if (skb)
+ abort_connection(ep, skb);
+ }
+ if (state_comp_exch(&ep->com, MORIBUND, ABORTING)) {
+ struct sk_buff *skb;
+
+ if (ep->com.cm_id && ep->com.qp) {
+ attrs.next_state = IWCH_QP_STATE_ERROR;
+ iwch_modify_qp(ep->com.qp->rhp,
+ ep->com.qp, IWCH_QP_ATTR_NEXT_STATE,
+ &attrs, 1);
+ }
+ skb = alloc_skb(sizeof(struct cpl_abort_req), GFP_ATOMIC);
+ if (skb)
+ abort_connection(ep, skb);
+ }
+ put_ep(&ep->com);
+}
+
+int iwch_reject_cr(struct iw_cm_id *cm_id, const void *pdata, u8 pdata_len)
+{
+ int err;
+ struct iwch_ep *ep = to_ep(cm_id);
+ PDBG("%s ep %p tid %u\n", __FUNCTION__, ep, ep->hwtid);
+
+ if (state_read(&ep->com) == DEAD) {
+ put_ep(&ep->com);
+ return -ECONNRESET;
+ }
+ BUG_ON(state_read(&ep->com) != MPA_REQ_RCVD);
+ state_set(&ep->com, CLOSING);
+ if (mpa_rev == 0)
+ abort_connection(ep, NULL);
+ else {
+ err = send_mpa_reject(ep, pdata, pdata_len);
+ err = send_halfclose(ep, GFP_KERNEL);
+ }
+ return 0;
+}
+
+int iwch_accept_cr(struct iw_cm_id *cm_id, struct iw_cm_conn_param *conn_param)
+{
+ int err;
+ struct iwch_qp_attributes attrs;
+ enum iwch_qp_attr_mask mask;
+ struct iwch_ep *ep = to_ep(cm_id);
+ struct iwch_dev *h = to_iwch_dev(cm_id->device);
+ struct iwch_qp *qp = get_qhp(h, conn_param->qpn);
+
+ PDBG("%s ep %p tid %u\n", __FUNCTION__, ep, ep->hwtid);
+ if (state_read(&ep->com) == DEAD) {
+ put_ep(&ep->com);
+ return -ECONNRESET;
+ }
+
+ BUG_ON(state_read(&ep->com) != MPA_REQ_RCVD);
+ BUG_ON(!qp);
+
+ if ((conn_param->ord > qp->rhp->attr.max_rdma_read_qp_depth) ||
+ (conn_param->ird > qp->rhp->attr.max_rdma_reads_per_qp)) {
+ abort_connection(ep, NULL);
+ return -EINVAL;
+ }
+
+ cm_id->add_ref(cm_id);
+ ep->com.cm_id = cm_id;
+ ep->com.qp = qp;
+
+ ep->com.rpl_done = 0;
+ ep->com.rpl_err = 0;
+ ep->ird = conn_param->ird;
+ ep->ord = conn_param->ord;
+ PDBG("%s %d ird %d ord %d\n", __FUNCTION__, __LINE__, ep->ird, ep->ord);
+ get_ep(&ep->com);
+ err = send_mpa_reply(ep, conn_param->private_data,
+ conn_param->private_data_len);
+ if (err) {
+ ep->com.cm_id = NULL;
+ ep->com.qp = NULL;
+ cm_id->rem_ref(cm_id);
+ abort_connection(ep, NULL);
+ put_ep(&ep->com);
+ return err;
+ }
+
+ /* bind QP to EP and move to RTS */
+ attrs.mpa_attr = ep->mpa_attr;
+ attrs.max_ird = ep->ord;
+ attrs.max_ord = ep->ord;
+ attrs.llp_stream_handle = ep;
+ attrs.next_state = IWCH_QP_STATE_RTS;
+
+ /* bind QP and TID with INIT_WR */
+ mask = IWCH_QP_ATTR_NEXT_STATE |
+ IWCH_QP_ATTR_LLP_STREAM_HANDLE |
+ IWCH_QP_ATTR_MPA_ATTR |
+ IWCH_QP_ATTR_MAX_IRD |
+ IWCH_QP_ATTR_MAX_ORD;
+
+ err = iwch_modify_qp(ep->com.qp->rhp,
+ ep->com.qp, mask, &attrs, 1);
+
+ if (err) {
+ ep->com.cm_id = NULL;
+ ep->com.qp = NULL;
+ cm_id->rem_ref(cm_id);
+ abort_connection(ep, NULL);
+ } else {
+ state_set(&ep->com, FPDU_MODE);
+ established_upcall(ep);
+ }
+ put_ep(&ep->com);
+ return err;
+}
+
+int iwch_connect(struct iw_cm_id *cm_id, struct iw_cm_conn_param *conn_param)
+{
+ int err = 0;
+ struct iwch_dev *h = to_iwch_dev(cm_id->device);
+ struct iwch_ep *ep;
+ struct rtable *rt;
+
+ ep = alloc_ep(sizeof(*ep), GFP_KERNEL);
+ if (!ep) {
+ printk(KERN_ERR MOD "%s - cannot alloc ep.\n", __FUNCTION__);
+ err = -ENOMEM;
+ goto out;
+ }
+ init_timer(&ep->timer);
+ ep->plen = conn_param->private_data_len;
+ if (ep->plen)
+ memcpy(ep->mpa_pkt + sizeof(struct mpa_message),
+ conn_param->private_data, ep->plen);
+ ep->ird = conn_param->ird;
+ ep->ord = conn_param->ord;
+ ep->com.tdev = h->rdev.t3cdev_p;
+
+ cm_id->add_ref(cm_id);
+ ep->com.cm_id = cm_id;
+ ep->com.qp = get_qhp(h, conn_param->qpn);
+ BUG_ON(!ep->com.qp);
+ PDBG("%s qpn 0x%x qp %p cm_id %p\n", __FUNCTION__, conn_param->qpn,
+ ep->com.qp, cm_id);
+
+ /*
+ * Allocate an active TID to initiate a TCP connection.
+ */
+ ep->atid = cxgb3_alloc_atid(h->rdev.t3cdev_p, &t3c_client, ep);
+ if (ep->atid == -1) {
+ printk(KERN_ERR MOD "%s - cannot alloc atid.\n", __FUNCTION__);
+ err = -ENOMEM;
+ goto fail2;
+ }
+
+ /* find a route */
+ rt = find_route(h->rdev.t3cdev_p,
+ cm_id->local_addr.sin_addr.s_addr,
+ cm_id->remote_addr.sin_addr.s_addr,
+ cm_id->local_addr.sin_port,
+ cm_id->remote_addr.sin_port, IPTOS_LOWDELAY);
+ if (!rt) {
+ printk(KERN_ERR MOD "%s - cannot find route.\n", __FUNCTION__);
+ err = -EHOSTUNREACH;
+ goto fail3;
+ }
+ ep->dst = &rt->u.dst;
+
+ /* get a l2t entry */
+ ep->l2t = t3_l2t_get(ep->com.tdev,
+ ep->dst->neighbour,
+ ep->dst->neighbour->dev->if_port);
+ if (!ep->l2t) {
+ printk(KERN_ERR MOD "%s - cannot alloc l2e.\n", __FUNCTION__);
+ err = -ENOMEM;
+ goto fail4;
+ }
+
+ state_set(&ep->com, CONNECTING);
+ ep->tos = IPTOS_LOWDELAY;
+ ep->com.local_addr = cm_id->local_addr;
+ ep->com.remote_addr = cm_id->remote_addr;
+
+ /* send connect request to rnic */
+ err = send_connect(ep);
+ if (!err)
+ goto out;
+
+ l2t_release(L2DATA(h->rdev.t3cdev_p), ep->l2t);
+fail4:
+ dst_release(ep->dst);
+fail3:
+ cxgb3_free_atid(ep->com.tdev, ep->atid);
+fail2:
+ put_ep(&ep->com);
+out:
+ return err;
+}
+
+int iwch_create_listen(struct iw_cm_id *cm_id, int backlog)
+{
+ int err = 0;
+ struct iwch_dev *h = to_iwch_dev(cm_id->device);
+ struct iwch_listen_ep *ep;
+
+
+ might_sleep();
+
+ ep = alloc_ep(sizeof(*ep), GFP_KERNEL);
+ if (!ep) {
+ printk(KERN_ERR MOD "%s - cannot alloc ep.\n", __FUNCTION__);
+ err = -ENOMEM;
+ goto fail1;
+ }
+ PDBG("%s ep %p\n", __FUNCTION__, ep);
+ ep->com.tdev = h->rdev.t3cdev_p;
+ cm_id->add_ref(cm_id);
+ ep->com.cm_id = cm_id;
+ ep->backlog = backlog;
+ ep->com.local_addr = cm_id->local_addr;
+
+ /*
+ * Allocate a server TID.
+ */
+ ep->stid = cxgb3_alloc_stid(h->rdev.t3cdev_p, &t3c_client, ep);
+ if (ep->stid == -1) {
+ printk(KERN_ERR MOD "%s - cannot alloc atid.\n", __FUNCTION__);
+ err = -ENOMEM;
+ goto fail2;
+ }
+
+ state_set(&ep->com, LISTEN);
+ err = listen_start(ep);
+ if (err)
+ goto fail3;
+
+ /* wait for pass_open_rpl */
+ wait_event(ep->com.waitq, ep->com.rpl_done);
+ err = ep->com.rpl_err;
+ if (!err) {
+ cm_id->provider_data = ep;
+ goto out;
+ }
+fail3:
+ cxgb3_free_stid(ep->com.tdev, ep->stid);
+fail2:
+ put_ep(&ep->com);
+fail1:
+out:
+ return err;
+}
+
+int iwch_destroy_listen(struct iw_cm_id *cm_id)
+{
+ int err;
+ struct iwch_listen_ep *ep = to_listen_ep(cm_id);
+
+ PDBG("%s ep %p\n", __FUNCTION__, ep);
+
+ might_sleep();
+ state_set(&ep->com, DEAD);
+ ep->com.rpl_done = 0;
+ ep->com.rpl_err = 0;
+ err = listen_stop(ep);
+ wait_event(ep->com.waitq, ep->com.rpl_done);
+ cxgb3_free_stid(ep->com.tdev, ep->stid);
+ err = ep->com.rpl_err;
+ cm_id->rem_ref(cm_id);
+ put_ep(&ep->com);
+ return err;
+}
+
+int iwch_ep_disconnect(struct iwch_ep *ep, int abrupt, gfp_t gfp)
+{
+ int ret=0;
+ int state;
+
+
+ state = state_read(&ep->com);
+ PDBG("%s ep %p state %s, abrupt %d\n", __FUNCTION__, ep,
+ states[state], abrupt);
+ if (state == DEAD) {
+ PDBG("%s already dead ep %p\n", __FUNCTION__, ep);
+ return 0;
+ }
+ if (abrupt) {
+ if (state != ABORTING) {
+ state_set(&ep->com, ABORTING);
+ ret = send_abort(ep, NULL, gfp);
+ }
+ } else {
+
+ if (state != CLOSING)
+ state_set(&ep->com, CLOSING);
+ else {
+ start_ep_timer(ep);
+ state_set(&ep->com, MORIBUND);
+ }
+
+ ret = send_halfclose(ep, gfp);
+ }
+ return ret;
+}
+
+int iwch_ep_redirect(void *ctx, struct dst_entry *old, struct dst_entry *new,
+ struct l2t_entry *l2t)
+{
+ struct iwch_ep *ep = ctx;
+
+ if (ep->dst != old)
+ return 0;
+
+ PDBG("%s ep %p redirect to dst %p l2t %p\n", __FUNCTION__, ep, new,
+ l2t);
+ dst_hold(new);
+ l2t_release(L2DATA(ep->com.tdev), ep->l2t);
+ ep->l2t = l2t;
+ dst_release(old);
+ ep->dst = new;
+ return 1;
+}
+
+/*
+ * All the CM events are handled on a work queue to have a safe context.
+ */
+static int sched(struct t3cdev *tdev, struct sk_buff *skb, void *ctx)
+{
+ struct iwch_ep_common *epc = ctx;
+
+ get_ep(epc);
+
+ /*
+ * Save ctx and tdev in the skb->cb area.
+ */
+ *((void **) skb->cb) = ctx;
+ *((struct t3cdev **) (skb->cb + sizeof(void *))) = tdev;
+
+ /*
+ * Queue the skb and schedule the worker thread.
+ */
+ skb_queue_tail(&rxq, skb);
+ queue_work(workq, &skb_work);
+ return 0;
+}
+
+int __init iwch_cm_init(void)
+{
+ skb_queue_head_init(&rxq);
+
+ workq = create_singlethread_workqueue("iw_cxgb3");
+ if (!workq)
+ return -ENOMEM;
+
+ /*
+ * All upcalls from the T3 Core go to sched() to
+ * schedule the processing on a work queue.
+ */
+ t3c_handlers[CPL_ACT_ESTABLISH] = sched;
+ t3c_handlers[CPL_ACT_OPEN_RPL] = sched;
+ t3c_handlers[CPL_RX_DATA] = sched;
+ t3c_handlers[CPL_TX_DMA_ACK] = sched;
+ t3c_handlers[CPL_ABORT_RPL_RSS] = sched;
+ t3c_handlers[CPL_ABORT_RPL] = sched;
+ t3c_handlers[CPL_PASS_OPEN_RPL] = sched;
+ t3c_handlers[CPL_CLOSE_LISTSRV_RPL] = sched;
+ t3c_handlers[CPL_PASS_ACCEPT_REQ] = sched;
+ t3c_handlers[CPL_PASS_ESTABLISH] = sched;
+ t3c_handlers[CPL_PEER_CLOSE] = sched;
+ t3c_handlers[CPL_CLOSE_CON_RPL] = sched;
+ t3c_handlers[CPL_ABORT_REQ_RSS] = sched;
+ t3c_handlers[CPL_RDMA_TERMINATE] = sched;
+ t3c_handlers[CPL_RDMA_EC_STATUS] = sched;
+
+ /*
+ * These are the real handlers that are called from a
+ * work queue.
+ */
+ work_handlers[CPL_ACT_ESTABLISH] = act_establish;
+ work_handlers[CPL_ACT_OPEN_RPL] = act_open_rpl;
+ work_handlers[CPL_RX_DATA] = rx_data;
+ work_handlers[CPL_TX_DMA_ACK] = tx_ack;
+ work_handlers[CPL_ABORT_RPL_RSS] = abort_rpl;
+ work_handlers[CPL_ABORT_RPL] = abort_rpl;
+ work_handlers[CPL_PASS_OPEN_RPL] = pass_open_rpl;
+ work_handlers[CPL_CLOSE_LISTSRV_RPL] = close_listsrv_rpl;
+ work_handlers[CPL_PASS_ACCEPT_REQ] = pass_accept_req;
+ work_handlers[CPL_PASS_ESTABLISH] = pass_establish;
+ work_handlers[CPL_PEER_CLOSE] = peer_close;
+ work_handlers[CPL_ABORT_REQ_RSS] = peer_abort;
+ work_handlers[CPL_CLOSE_CON_RPL] = close_con_rpl;
+ work_handlers[CPL_RDMA_TERMINATE] = terminate;
+ work_handlers[CPL_RDMA_EC_STATUS] = ec_status;
+ return 0;
+}
+
+void __exit iwch_cm_term(void)
+{
+ flush_workqueue(workq);
+ destroy_workqueue(workq);
+}
diff --git a/drivers/infiniband/hw/cxgb3/iwch_cm.h b/drivers/infiniband/hw/cxgb3/iwch_cm.h
new file mode 100644
index 0000000..893f9d0
--- /dev/null
+++ b/drivers/infiniband/hw/cxgb3/iwch_cm.h
@@ -0,0 +1,223 @@
+/*
+ * Copyright (c) 2006 Chelsio, Inc. All rights reserved.
+ * Copyright (c) 2006 Open Grid Computing, Inc. All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses. You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ * - Redistributions of source code must retain the above
+ * copyright notice, this list of conditions and the following
+ * disclaimer.
+ *
+ * - Redistributions in binary form must reproduce the above
+ * copyright notice, this list of conditions and the following
+ * disclaimer in the documentation and/or other materials
+ * provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+#ifndef _IWCH_CM_H_
+#define _IWCH_CM_H_
+
+#include <linux/inet.h>
+#include <linux/wait.h>
+#include <linux/spinlock.h>
+#include <linux/kref.h>
+
+#include <rdma/ib_verbs.h>
+#include <rdma/iw_cm.h>
+
+#include "cxgb3_offload.h"
+#include "iwch_provider.h"
+
+#define MPA_KEY_REQ "MPA ID Req Frame"
+#define MPA_KEY_REP "MPA ID Rep Frame"
+
+#define MPA_MAX_PRIVATE_DATA 256
+#define MPA_REV 0 /* XXX - amso1100 uses rev 0 ! */
+#define MPA_REJECT 0x20
+#define MPA_CRC 0x40
+#define MPA_MARKERS 0x80
+#define MPA_FLAGS_MASK 0xE0
+
+#define put_ep(ep) { \
+ PDBG("put_ep (via %s:%u) ep %p refcnt %d\n", __FUNCTION__, __LINE__, \
+ ep, atomic_read(&((ep)->kref.refcount))); \
+ kref_put(&((ep)->kref), __free_ep); \
+}
+
+#define get_ep(ep) { \
+ PDBG("get_ep (via %s:%u) ep %p, refcnt %d\n", __FUNCTION__, __LINE__, \
+ ep, atomic_read(&((ep)->kref.refcount))); \
+ kref_get(&((ep)->kref)); \
+}
+
+struct mpa_message {
+ u8 key[16];
+ u8 flags;
+ u8 revision;
+ __be16 private_data_size;
+ u8 private_data[0];
+};
+
+struct terminate_message {
+ u8 layer_etype;
+ u8 ecode;
+ __be16 hdrct_rsvd;
+ u8 len_hdrs[0];
+};
+
+#define TERM_MAX_LENGTH (sizeof(struct terminate_message) + 2 + 18 + 28)
+
+enum iwch_layers_types {
+ LAYER_RDMAP = 0x00,
+ LAYER_DDP = 0x10,
+ LAYER_MPA = 0x20,
+ RDMAP_LOCAL_CATA = 0x00,
+ RDMAP_REMOTE_PROT = 0x01,
+ RDMAP_REMOTE_OP = 0x02,
+ DDP_LOCAL_CATA = 0x00,
+ DDP_TAGGED_ERR = 0x01,
+ DDP_UNTAGGED_ERR = 0x02,
+ DDP_LLP = 0x03
+};
+
+enum iwch_rdma_ecodes {
+ RDMAP_INV_STAG = 0x00,
+ RDMAP_BASE_BOUNDS = 0x01,
+ RDMAP_ACC_VIOL = 0x02,
+ RDMAP_STAG_NOT_ASSOC = 0x03,
+ RDMAP_TO_WRAP = 0x04,
+ RDMAP_INV_VERS = 0x05,
+ RDMAP_INV_OPCODE = 0x06,
+ RDMAP_STREAM_CATA = 0x07,
+ RDMAP_GLOBAL_CATA = 0x08,
+ RDMAP_CANT_INV_STAG = 0x09,
+ RDMAP_UNSPECIFIED = 0xff
+};
+
+enum iwch_ddp_ecodes {
+ DDPT_INV_STAG = 0x00,
+ DDPT_BASE_BOUNDS = 0x01,
+ DDPT_STAG_NOT_ASSOC = 0x02,
+ DDPT_TO_WRAP = 0x03,
+ DDPT_INV_VERS = 0x04,
+ DDPU_INV_QN = 0x01,
+ DDPU_INV_MSN_NOBUF = 0x02,
+ DDPU_INV_MSN_RANGE = 0x03,
+ DDPU_INV_MO = 0x04,
+ DDPU_MSG_TOOBIG = 0x05,
+ DDPU_INV_VERS = 0x06
+};
+
+enum iwch_mpa_ecodes {
+ MPA_CRC_ERR = 0x02,
+ MPA_MARKER_ERR = 0x03
+};
+
+enum iwch_ep_state {
+ IDLE = 0,
+ LISTEN,
+ CONNECTING,
+ MPA_REQ_WAIT,
+ MPA_REQ_SENT,
+ MPA_REQ_RCVD,
+ MPA_REP_SENT,
+ FPDU_MODE,
+ ABORTING,
+ CLOSING,
+ MORIBUND,
+ DEAD,
+};
+
+struct iwch_ep_common {
+ struct iw_cm_id *cm_id;
+ struct iwch_qp *qp;
+ struct t3cdev *tdev;
+ enum iwch_ep_state state;
+ struct kref kref;
+ spinlock_t lock;
+ struct sockaddr_in local_addr;
+ struct sockaddr_in remote_addr;
+ wait_queue_head_t waitq;
+ int rpl_done;
+ int rpl_err;
+};
+
+struct iwch_listen_ep {
+ struct iwch_ep_common com;
+ unsigned int stid;
+ int backlog;
+};
+
+struct iwch_ep {
+ struct iwch_ep_common com;
+ struct iwch_ep *parent_ep;
+ struct timer_list timer;
+ unsigned int atid;
+ u32 hwtid;
+ u32 snd_seq;
+ struct l2t_entry *l2t;
+ struct dst_entry *dst;
+ struct sk_buff *mpa_skb;
+ struct iwch_mpa_attributes mpa_attr;
+ unsigned int mpa_pkt_len;
+ u8 mpa_pkt[sizeof(struct mpa_message) + MPA_MAX_PRIVATE_DATA];
+ u8 tos;
+ u16 emss;
+ u16 plen;
+ u32 ird;
+ u32 ord;
+};
+
+static inline struct iwch_ep *to_ep(struct iw_cm_id *cm_id)
+{
+ return (struct iwch_ep *)cm_id->provider_data;
+}
+
+static inline struct iwch_listen_ep *to_listen_ep(struct iw_cm_id *cm_id)
+{
+ return (struct iwch_listen_ep *)cm_id->provider_data;
+}
+
+static inline int compute_wscale(int win)
+{
+ int wscale = 0;
+
+ while (wscale < 14 && (65535<<wscale) < win)
+ wscale++;
+ return wscale;
+}
+
+/* CM prototypes */
+
+int iwch_connect(struct iw_cm_id *cm_id, struct iw_cm_conn_param *conn_param);
+int iwch_create_listen(struct iw_cm_id *cm_id, int backlog);
+int iwch_destroy_listen(struct iw_cm_id *cm_id);
+int iwch_reject_cr(struct iw_cm_id *cm_id, const void *pdata, u8 pdata_len);
+int iwch_accept_cr(struct iw_cm_id *cm_id, struct iw_cm_conn_param *conn_param);
+int iwch_ep_disconnect(struct iwch_ep *ep, int abrupt, gfp_t gfp);
+int iwch_quiesce_tid(struct iwch_ep *ep);
+int iwch_resume_tid(struct iwch_ep *ep);
+void __free_ep(struct kref *kref);
+void iwch_rearp(struct iwch_ep *ep);
+int iwch_ep_redirect(void *ctx, struct dst_entry *old, struct dst_entry *new, struct l2t_entry *l2t);
+
+int __init iwch_cm_init(void);
+void __exit iwch_cm_term(void);
+
+#endif /* _IWCH_CM_H_ */

2006-12-02 22:51:25

by Steve Wise

[permalink] [raw]
Subject: [PATCH v2 09/13] Core WQE/CQE Types


T3 WQE and CQE structures, defines, etc...

Signed-off-by: Steve Wise <[email protected]>
---

drivers/infiniband/hw/cxgb3/core/cxio_wr.h | 685 ++++++++++++++++++++++++++++
1 files changed, 685 insertions(+), 0 deletions(-)

diff --git a/drivers/infiniband/hw/cxgb3/core/cxio_wr.h b/drivers/infiniband/hw/cxgb3/core/cxio_wr.h
new file mode 100644
index 0000000..45870be
--- /dev/null
+++ b/drivers/infiniband/hw/cxgb3/core/cxio_wr.h
@@ -0,0 +1,685 @@
+/*
+ * Copyright (c) 2006 Chelsio, Inc. All rights reserved.
+ * Copyright (c) 2006 Open Grid Computing, Inc. All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses. You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ * - Redistributions of source code must retain the above
+ * copyright notice, this list of conditions and the following
+ * disclaimer.
+ *
+ * - Redistributions in binary form must reproduce the above
+ * copyright notice, this list of conditions and the following
+ * disclaimer in the documentation and/or other materials
+ * provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+#ifndef __CXIO_WR_H__
+#define __CXIO_WR_H__
+
+#include <asm/io.h>
+#include <linux/pci.h>
+#include <linux/timer.h>
+#include "firmware_exports.h"
+
+#define T3_MAX_SGE 4
+
+#define Q_EMPTY(rptr,wptr) ((rptr)==(wptr))
+#define Q_FULL(rptr,wptr,size_log2) ( (((wptr)-(rptr))>>(size_log2)) && \
+ ((rptr)!=(wptr)) )
+#define Q_GENBIT(ptr,size_log2) (!(((ptr)>>size_log2)&0x1))
+#define Q_FREECNT(rptr,wptr,size_log2) ((1UL<<size_log2)-((wptr)-(rptr)))
+#define Q_COUNT(rptr,wptr) ((wptr)-(rptr))
+#define Q_PTR2IDX(ptr,size_log2) (ptr & ((1UL<<size_log2)-1))
+
+static inline void ring_doorbell(void __iomem *doorbell, u32 qpid)
+{
+ writel(((1<<31) | qpid), doorbell);
+}
+
+#define SEQ32_GE(x,y) (!( (((u32) (x)) - ((u32) (y))) & 0x80000000 ))
+
+enum t3_wr_flags {
+ T3_COMPLETION_FLAG = 0x01,
+ T3_NOTIFY_FLAG = 0x02,
+ T3_SOLICITED_EVENT_FLAG = 0x04,
+ T3_READ_FENCE_FLAG = 0x08,
+ T3_LOCAL_FENCE_FLAG = 0x10
+} __attribute__ ((packed));
+
+enum t3_wr_opcode {
+ T3_WR_BP = FW_WROPCODE_RI_BYPASS,
+ T3_WR_SEND = FW_WROPCODE_RI_SEND,
+ T3_WR_WRITE = FW_WROPCODE_RI_RDMA_WRITE,
+ T3_WR_READ = FW_WROPCODE_RI_RDMA_READ,
+ T3_WR_INV_STAG = FW_WROPCODE_RI_LOCAL_INV,
+ T3_WR_BIND = FW_WROPCODE_RI_BIND_MW,
+ T3_WR_RCV = FW_WROPCODE_RI_RECEIVE,
+ T3_WR_INIT = FW_WROPCODE_RI_RDMA_INIT,
+ T3_WR_QP_MOD = FW_WROPCODE_RI_MODIFY_QP
+} __attribute__ ((packed));
+
+enum t3_rdma_opcode {
+ T3_RDMA_WRITE, /* IETF RDMAP v1.0 ... */
+ T3_READ_REQ,
+ T3_READ_RESP,
+ T3_SEND,
+ T3_SEND_WITH_INV,
+ T3_SEND_WITH_SE,
+ T3_SEND_WITH_SE_INV,
+ T3_TERMINATE,
+ T3_RDMA_INIT, /* CHELSIO RI specific ... */
+ T3_BIND_MW,
+ T3_FAST_REGISTER,
+ T3_LOCAL_INV,
+ T3_QP_MOD,
+ T3_BYPASS
+} __attribute__ ((packed));
+
+static inline enum t3_rdma_opcode wr2opcode(enum t3_wr_opcode wrop)
+{
+ switch (wrop) {
+ case T3_WR_BP: return T3_BYPASS;
+ case T3_WR_SEND: return T3_SEND;
+ case T3_WR_WRITE: return T3_RDMA_WRITE;
+ case T3_WR_READ: return T3_READ_REQ;
+ case T3_WR_INV_STAG: return T3_LOCAL_INV;
+ case T3_WR_BIND: return T3_BIND_MW;
+ case T3_WR_INIT: return T3_RDMA_INIT;
+ case T3_WR_QP_MOD: return T3_QP_MOD;
+ default: break;
+ }
+ return -1;
+}
+
+
+/* Work request id */
+union t3_wrid {
+ struct {
+ u32 hi;
+ u32 low;
+ } id0;
+ u64 id1;
+};
+
+#define WRID(wrid) (wrid.id1)
+#define WRID_GEN(wrid) (wrid.id0.wr_gen)
+#define WRID_IDX(wrid) (wrid.id0.wr_idx)
+#define WRID_LO(wrid) (wrid.id0.wr_lo)
+
+struct fw_riwrh {
+ __be32 op_seop_flags;
+ __be32 gen_tid_len;
+};
+
+#define S_FW_RIWR_OP 24
+#define M_FW_RIWR_OP 0xff
+#define V_FW_RIWR_OP(x) ((x) << S_FW_RIWR_OP)
+#define G_FW_RIWR_OP(x) ((((x) >> S_FW_RIWR_OP)) & M_FW_RIWR_OP)
+
+#define S_FW_RIWR_SOPEOP 22
+#define M_FW_RIWR_SOPEOP 0x3
+#define V_FW_RIWR_SOPEOP(x) ((x) << S_FW_RIWR_SOPEOP)
+
+#define S_FW_RIWR_FLAGS 8
+#define M_FW_RIWR_FLAGS 0x3fffff
+#define V_FW_RIWR_FLAGS(x) ((x) << S_FW_RIWR_FLAGS)
+#define G_FW_RIWR_FLAGS(x) ((((x) >> S_FW_RIWR_FLAGS)) & M_FW_RIWR_FLAGS)
+
+#define S_FW_RIWR_TID 8
+#define V_FW_RIWR_TID(x) ((x) << S_FW_RIWR_TID)
+
+#define S_FW_RIWR_LEN 0
+#define V_FW_RIWR_LEN(x) ((x) << S_FW_RIWR_LEN)
+
+#define S_FW_RIWR_GEN 31
+#define V_FW_RIWR_GEN(x) ((x) << S_FW_RIWR_GEN)
+
+struct t3_sge {
+ __be32 stag;
+ __be32 len;
+ __be64 to;
+};
+
+/* If num_sgle is zero, flit 5+ contains immediate data.*/
+struct t3_send_wr {
+ struct fw_riwrh wrh; /* 0 */
+ union t3_wrid wrid; /* 1 */
+
+ u8 rdmaop; /* 2 */
+ u8 reserved[3];
+ __be32 rem_stag;
+ __be32 plen; /* 3 */
+ __be32 num_sgle;
+ struct t3_sge sgl[T3_MAX_SGE]; /* 4+ */
+};
+
+struct t3_local_inv_wr {
+ struct fw_riwrh wrh; /* 0 */
+ union t3_wrid wrid; /* 1 */
+ __be32 stag; /* 2 */
+ __be32 reserved3;
+};
+
+struct t3_rdma_write_wr {
+ struct fw_riwrh wrh; /* 0 */
+ union t3_wrid wrid; /* 1 */
+ u8 rdmaop; /* 2 */
+ u8 reserved[3];
+ __be32 stag_sink;
+ __be64 to_sink; /* 3 */
+ __be32 plen; /* 4 */
+ __be32 num_sgle;
+ struct t3_sge sgl[T3_MAX_SGE]; /* 5+ */
+};
+
+struct t3_rdma_read_wr {
+ struct fw_riwrh wrh; /* 0 */
+ union t3_wrid wrid; /* 1 */
+ u8 rdmaop; /* 2 */
+ u8 reserved[3];
+ __be32 rem_stag;
+ __be64 rem_to; /* 3 */
+ __be32 local_stag; /* 4 */
+ __be32 local_len;
+ __be64 local_to; /* 5 */
+};
+
+enum t3_addr_type {
+ T3_VA_BASED_TO = 0x0,
+ T3_ZERO_BASED_TO = 0x1
+} __attribute__ ((packed));
+
+enum t3_mem_perms {
+ T3_MEM_ACCESS_LOCAL_READ = 0x1,
+ T3_MEM_ACCESS_LOCAL_WRITE = 0x2,
+ T3_MEM_ACCESS_REM_READ = 0x4,
+ T3_MEM_ACCESS_REM_WRITE = 0x8
+} __attribute__ ((packed));
+
+struct t3_bind_mw_wr {
+ struct fw_riwrh wrh; /* 0 */
+ union t3_wrid wrid; /* 1 */
+ u16 reserved; /* 2 */
+ u8 type;
+ u8 perms;
+ __be32 mr_stag;
+ __be32 mw_stag; /* 3 */
+ __be32 mw_len;
+ __be64 mw_va; /* 4 */
+ __be32 mr_pbl_addr; /* 5 */
+ u8 reserved2[3];
+ u8 mr_pagesz;
+};
+
+struct t3_receive_wr {
+ struct fw_riwrh wrh; /* 0 */
+ union t3_wrid wrid; /* 1 */
+ u8 pagesz[T3_MAX_SGE];
+ __be32 num_sgle; /* 2 */
+ struct t3_sge sgl[T3_MAX_SGE]; /* 3+ */
+ __be32 pbl_addr[T3_MAX_SGE];
+};
+
+struct t3_bypass_wr {
+ struct fw_riwrh wrh;
+ union t3_wrid wrid; /* 1 */
+};
+
+struct t3_modify_qp_wr {
+ struct fw_riwrh wrh; /* 0 */
+ union t3_wrid wrid; /* 1 */
+ __be32 flags; /* 2 */
+ __be32 quiesce; /* 2 */
+ __be32 max_ird; /* 3 */
+ __be32 max_ord; /* 3 */
+ __be64 sge_cmd; /* 4 */
+ __be64 ctx1; /* 5 */
+ __be64 ctx0; /* 6 */
+};
+
+enum t3_modify_qp_flags {
+ MODQP_QUIESCE = 0x01,
+ MODQP_MAX_IRD = 0x02,
+ MODQP_MAX_ORD = 0x04,
+ MODQP_WRITE_EC = 0x08,
+ MODQP_READ_EC = 0x10,
+};
+
+
+enum t3_mpa_attrs {
+ uP_RI_MPA_RX_MARKER_ENABLE = 0x1,
+ uP_RI_MPA_TX_MARKER_ENABLE = 0x2,
+ uP_RI_MPA_CRC_ENABLE = 0x4,
+ uP_RI_MPA_IETF_ENABLE = 0x8
+} __attribute__ ((packed));
+
+enum t3_qp_caps {
+ uP_RI_QP_RDMA_READ_ENABLE = 0x01,
+ uP_RI_QP_RDMA_WRITE_ENABLE = 0x02,
+ uP_RI_QP_BIND_ENABLE = 0x04,
+ uP_RI_QP_FAST_REGISTER_ENABLE = 0x08,
+ uP_RI_QP_STAG0_ENABLE = 0x10
+} __attribute__ ((packed));
+
+struct t3_rdma_init_attr {
+ u32 tid;
+ u32 qpid;
+ u32 pdid;
+ u32 scqid;
+ u32 rcqid;
+ u32 rq_addr;
+ u32 rq_size;
+ enum t3_mpa_attrs mpaattrs;
+ enum t3_qp_caps qpcaps;
+ u16 tcp_emss;
+ u32 ord;
+ u32 ird;
+ u64 qp_dma_addr;
+ u32 qp_dma_size;
+ u32 flags;
+};
+
+struct t3_rdma_init_wr {
+ struct fw_riwrh wrh; /* 0 */
+ union t3_wrid wrid; /* 1 */
+ __be32 qpid; /* 2 */
+ __be32 pdid;
+ __be32 scqid; /* 3 */
+ __be32 rcqid;
+ __be32 rq_addr; /* 4 */
+ __be32 rq_size;
+ u8 mpaattrs; /* 5 */
+ u8 qpcaps;
+ __be16 ulpdu_size;
+ __be32 flags; /* bits 31-1 - reservered */
+ /* bit 0 - set if RECV posted */
+ __be32 ord; /* 6 */
+ __be32 ird;
+ __be64 qp_dma_addr; /* 7 */
+ __be32 qp_dma_size; /* 8 */
+ u32 rsvd;
+};
+
+struct t3_genbit {
+ u64 flit[15];
+ __be64 genbit;
+};
+
+enum rdma_init_wr_flags {
+ RECVS_POSTED = 1,
+};
+
+union t3_wr {
+ struct t3_send_wr send;
+ struct t3_rdma_write_wr write;
+ struct t3_rdma_read_wr read;
+ struct t3_receive_wr recv;
+ struct t3_local_inv_wr local_inv;
+ struct t3_bind_mw_wr bind;
+ struct t3_bypass_wr bypass;
+ struct t3_rdma_init_wr init;
+ struct t3_modify_qp_wr qp_mod;
+ struct t3_genbit genbit;
+ u64 flit[16];
+};
+
+#define T3_SQ_CQE_FLIT 13
+#define T3_SQ_COOKIE_FLIT 14
+
+#define T3_RQ_COOKIE_FLIT 13
+#define T3_RQ_CQE_FLIT 14
+
+static inline enum t3_wr_opcode fw_riwrh_opcode(struct fw_riwrh *wqe)
+{
+ return G_FW_RIWR_OP(be32_to_cpu(wqe->op_seop_flags));
+}
+
+static inline void build_fw_riwrh(struct fw_riwrh *wqe, enum t3_wr_opcode op,
+ enum t3_wr_flags flags, u8 genbit, u32 tid,
+ u8 len)
+{
+ wqe->op_seop_flags = cpu_to_be32(V_FW_RIWR_OP(op) |
+ V_FW_RIWR_SOPEOP(M_FW_RIWR_SOPEOP) |
+ V_FW_RIWR_FLAGS(flags));
+ wmb();
+ wqe->gen_tid_len = cpu_to_be32(V_FW_RIWR_GEN(genbit) |
+ V_FW_RIWR_TID(tid) |
+ V_FW_RIWR_LEN(len));
+ /* 2nd gen bit... */
+ ((union t3_wr *)wqe)->genbit.genbit = cpu_to_be64(genbit);
+}
+
+/*
+ * T3 ULP2_TX commands
+ */
+enum t3_utx_mem_op {
+ T3_UTX_MEM_READ = 2,
+ T3_UTX_MEM_WRITE = 3
+};
+
+/* T3 MC7 RDMA TPT entry format */
+
+enum tpt_mem_type {
+ TPT_NON_SHARED_MR = 0x0,
+ TPT_SHARED_MR = 0x1,
+ TPT_MW = 0x2,
+ TPT_MW_RELAXED_PROTECTION = 0x3
+};
+
+enum tpt_addr_type {
+ TPT_ZBTO = 0,
+ TPT_VATO = 1
+};
+
+enum tpt_mem_perm {
+ TPT_LOCAL_READ = 0x8,
+ TPT_LOCAL_WRITE = 0x4,
+ TPT_REMOTE_READ = 0x2,
+ TPT_REMOTE_WRITE = 0x1
+};
+
+struct tpt_entry {
+ __be32 valid_stag_pdid;
+ __be32 flags_pagesize_qpid;
+
+ __be32 rsvd_pbl_addr;
+ __be32 len;
+ __be32 va_hi;
+ __be32 va_low_or_fbo;
+
+ __be32 rsvd_bind_cnt_or_pstag;
+ __be32 rsvd_pbl_size;
+};
+
+#define S_TPT_VALID 31
+#define V_TPT_VALID(x) ((x) << S_TPT_VALID)
+#define F_TPT_VALID V_TPT_VALID(1U)
+
+#define S_TPT_STAG_KEY 23
+#define M_TPT_STAG_KEY 0xFF
+#define V_TPT_STAG_KEY(x) ((x) << S_TPT_STAG_KEY)
+#define G_TPT_STAG_KEY(x) (((x) >> S_TPT_STAG_KEY) & M_TPT_STAG_KEY)
+
+#define S_TPT_STAG_STATE 22
+#define V_TPT_STAG_STATE(x) ((x) << S_TPT_STAG_STATE)
+#define F_TPT_STAG_STATE V_TPT_STAG_STATE(1U)
+
+#define S_TPT_STAG_TYPE 20
+#define M_TPT_STAG_TYPE 0x3
+#define V_TPT_STAG_TYPE(x) ((x) << S_TPT_STAG_TYPE)
+#define G_TPT_STAG_TYPE(x) (((x) >> S_TPT_STAG_TYPE) & M_TPT_STAG_TYPE)
+
+#define S_TPT_PDID 0
+#define M_TPT_PDID 0xFFFFF
+#define V_TPT_PDID(x) ((x) << S_TPT_PDID)
+#define G_TPT_PDID(x) (((x) >> S_TPT_PDID) & M_TPT_PDID)
+
+#define S_TPT_PERM 28
+#define M_TPT_PERM 0xF
+#define V_TPT_PERM(x) ((x) << S_TPT_PERM)
+#define G_TPT_PERM(x) (((x) >> S_TPT_PERM) & M_TPT_PERM)
+
+#define S_TPT_REM_INV_DIS 27
+#define V_TPT_REM_INV_DIS(x) ((x) << S_TPT_REM_INV_DIS)
+#define F_TPT_REM_INV_DIS V_TPT_REM_INV_DIS(1U)
+
+#define S_TPT_ADDR_TYPE 26
+#define V_TPT_ADDR_TYPE(x) ((x) << S_TPT_ADDR_TYPE)
+#define F_TPT_ADDR_TYPE V_TPT_ADDR_TYPE(1U)
+
+#define S_TPT_MW_BIND_ENABLE 25
+#define V_TPT_MW_BIND_ENABLE(x) ((x) << S_TPT_MW_BIND_ENABLE)
+#define F_TPT_MW_BIND_ENABLE V_TPT_MW_BIND_ENABLE(1U)
+
+#define S_TPT_PAGE_SIZE 20
+#define M_TPT_PAGE_SIZE 0x1F
+#define V_TPT_PAGE_SIZE(x) ((x) << S_TPT_PAGE_SIZE)
+#define G_TPT_PAGE_SIZE(x) (((x) >> S_TPT_PAGE_SIZE) & M_TPT_PAGE_SIZE)
+
+#define S_TPT_PBL_ADDR 0
+#define M_TPT_PBL_ADDR 0x1FFFFFFF
+#define V_TPT_PBL_ADDR(x) ((x) << S_TPT_PBL_ADDR)
+#define G_TPT_PBL_ADDR(x) (((x) >> S_TPT_PBL_ADDR) & M_TPT_PBL_ADDR)
+
+#define S_TPT_QPID 0
+#define M_TPT_QPID 0xFFFFF
+#define V_TPT_QPID(x) ((x) << S_TPT_QPID)
+#define G_TPT_QPID(x) (((x) >> S_TPT_QPID) & M_TPT_QPID)
+
+#define S_TPT_PSTAG 0
+#define M_TPT_PSTAG 0xFFFFFF
+#define V_TPT_PSTAG(x) ((x) << S_TPT_PSTAG)
+#define G_TPT_PSTAG(x) (((x) >> S_TPT_PSTAG) & M_TPT_PSTAG)
+
+#define S_TPT_PBL_SIZE 0
+#define M_TPT_PBL_SIZE 0xFFFFF
+#define V_TPT_PBL_SIZE(x) ((x) << S_TPT_PBL_SIZE)
+#define G_TPT_PBL_SIZE(x) (((x) >> S_TPT_PBL_SIZE) & M_TPT_PBL_SIZE)
+
+/*
+ * CQE defs
+ */
+struct t3_cqe {
+ __be32 header;
+ __be32 len;
+ union {
+ struct {
+ __be32 stag;
+ __be32 msn;
+ } rcqe;
+ struct {
+ u32 wrid_hi;
+ u32 wrid_low;
+ } scqe;
+ } u;
+};
+
+#define S_CQE_OOO 31
+#define M_CQE_OOO 0x1
+#define G_CQE_OOO(x) ((((x) >> S_CQE_OOO)) & M_CQE_OOO)
+#define V_CEQ_OOO(x) ((x)<<S_CQE_OOO)
+
+#define S_CQE_QPID 12
+#define M_CQE_QPID 0x7FFFF
+#define G_CQE_QPID(x) ((((x) >> S_CQE_QPID)) & M_CQE_QPID)
+#define V_CQE_QPID(x) ((x)<<S_CQE_QPID)
+
+#define S_CQE_SWCQE 11
+#define M_CQE_SWCQE 0x1
+#define G_CQE_SWCQE(x) ((((x) >> S_CQE_SWCQE)) & M_CQE_SWCQE)
+#define V_CQE_SWCQE(x) ((x)<<S_CQE_SWCQE)
+
+#define S_CQE_GENBIT 10
+#define M_CQE_GENBIT 0x1
+#define G_CQE_GENBIT(x) (((x) >> S_CQE_GENBIT) & M_CQE_GENBIT)
+#define V_CQE_GENBIT(x) ((x)<<S_CQE_GENBIT)
+
+#define S_CQE_STATUS 5
+#define M_CQE_STATUS 0x1F
+#define G_CQE_STATUS(x) ((((x) >> S_CQE_STATUS)) & M_CQE_STATUS)
+#define V_CQE_STATUS(x) ((x)<<S_CQE_STATUS)
+
+#define S_CQE_TYPE 4
+#define M_CQE_TYPE 0x1
+#define G_CQE_TYPE(x) ((((x) >> S_CQE_TYPE)) & M_CQE_TYPE)
+#define V_CQE_TYPE(x) ((x)<<S_CQE_TYPE)
+
+#define S_CQE_OPCODE 0
+#define M_CQE_OPCODE 0xF
+#define G_CQE_OPCODE(x) ((((x) >> S_CQE_OPCODE)) & M_CQE_OPCODE)
+#define V_CQE_OPCODE(x) ((x)<<S_CQE_OPCODE)
+
+#define SW_CQE(x) (G_CQE_SWCQE(be32_to_cpu((x).header)))
+#define CQE_OOO(x) (G_CQE_OOO(be32_to_cpu((x).header)))
+#define CQE_QPID(x) (G_CQE_QPID(be32_to_cpu((x).header)))
+#define CQE_GENBIT(x) (G_CQE_GENBIT(be32_to_cpu((x).header)))
+#define CQE_TYPE(x) (G_CQE_TYPE(be32_to_cpu((x).header)))
+#define SQ_TYPE(x) (CQE_TYPE((x)))
+#define RQ_TYPE(x) (!CQE_TYPE((x)))
+#define CQE_STATUS(x) (G_CQE_STATUS(be32_to_cpu((x).header)))
+#define CQE_OPCODE(x) (G_CQE_OPCODE(be32_to_cpu((x).header)))
+
+#define CQE_LEN(x) (be32_to_cpu((x).len))
+
+/* used for RQ completion processing */
+#define CQE_WRID_STAG(x) (be32_to_cpu((x).u.rcqe.stag))
+#define CQE_WRID_MSN(x) (be32_to_cpu((x).u.rcqe.msn))
+
+/* used for SQ completion processing */
+#define CQE_WRID_SQ_WPTR(x) ((x).u.scqe.wrid_hi)
+#define CQE_WRID_WPTR(x) ((x).u.scqe.wrid_low)
+
+/* generic accessor macros */
+#define CQE_WRID_HI(x) ((x).u.scqe.wrid_hi)
+#define CQE_WRID_LOW(x) ((x).u.scqe.wrid_low)
+
+#define TPT_ERR_SUCCESS 0x0
+#define TPT_ERR_STAG 0x1 /* STAG invalid: either the */
+ /* STAG is offlimt, being 0, */
+ /* or STAG_key mismatch */
+#define TPT_ERR_PDID 0x2 /* PDID mismatch */
+#define TPT_ERR_QPID 0x3 /* QPID mismatch */
+#define TPT_ERR_ACCESS 0x4 /* Invalid access right */
+#define TPT_ERR_WRAP 0x5 /* Wrap error */
+#define TPT_ERR_BOUND 0x6 /* base and bounds voilation */
+#define TPT_ERR_INVALIDATE_SHARED_MR 0x7 /* attempt to invalidate a */
+ /* shared memory region */
+#define TPT_ERR_INVALIDATE_MR_WITH_MW_BOUND 0x8 /* attempt to invalidate a */
+ /* shared memory region */
+#define TPT_ERR_ECC 0x9 /* ECC error detected */
+#define TPT_ERR_ECC_PSTAG 0xA /* ECC error detected when */
+ /* reading PSTAG for a MW */
+ /* Invalidate */
+#define TPT_ERR_PBL_ADDR_BOUND 0xB /* pbl addr out of bounds: */
+ /* software error */
+#define TPT_ERR_SWFLUSH 0xC /* SW FLUSHED */
+#define TPT_ERR_CRC 0x10 /* CRC error */
+#define TPT_ERR_MARKER 0x11 /* Marker error */
+#define TPT_ERR_PDU_LEN_ERR 0x12 /* invalid PDU length */
+#define TPT_ERR_OUT_OF_RQE 0x13 /* out of RQE */
+#define TPT_ERR_DDP_VERSION 0x14 /* wrong DDP version */
+#define TPT_ERR_RDMA_VERSION 0x15 /* wrong RDMA version */
+#define TPT_ERR_OPCODE 0x16 /* invalid rdma opcode */
+#define TPT_ERR_DDP_QUEUE_NUM 0x17 /* invalid ddp queue number */
+#define TPT_ERR_MSN 0x18 /* MSN error */
+#define TPT_ERR_TBIT 0x19 /* tag bit not set correctly */
+#define TPT_ERR_MO 0x1A /* MO not 0 for TERMINATE */
+ /* or READ_REQ */
+#define TPT_ERR_MSN_GAP 0x1B
+#define TPT_ERR_MSN_RANGE 0x1C
+#define TPT_ERR_IRD_OVERFLOW 0x1D
+#define TPT_ERR_RQE_ADDR_BOUND 0x1E /* RQE addr out of bounds: */
+ /* software error */
+#define TPT_ERR_INTERNAL_ERR 0x1F /* internal error (opcode */
+ /* mismatch) */
+
+struct t3_swsq {
+ __u64 wr_id;
+ struct t3_cqe cqe;
+ __u32 sq_wptr;
+ __be32 read_len;
+ int opcode;
+ int complete;
+ int signaled;
+};
+
+/*
+ * A T3 WQ implements both the SQ and RQ.
+ */
+struct t3_wq {
+ union t3_wr *queue; /* DMA accessable memory */
+ dma_addr_t dma_addr; /* DMA address for HW */
+ DECLARE_PCI_UNMAP_ADDR(mapping) /* unmap kruft */
+ u32 error; /* 1 once we go to ERROR */
+ u32 qpid;
+ u32 wptr; /* idx to next available WR slot */
+ u32 size_log2; /* total wq size */
+ struct t3_swsq *sq; /* SW SQ */
+ struct t3_swsq *oldest_read; /* tracks oldest pending read */
+ u32 sq_wptr; /* sq_wptr - sq_rptr == count of */
+ u32 sq_rptr; /* pending wrs */
+ u32 sq_size_log2; /* sq size */
+ u64 *rq; /* SW RQ (holds consumer wr_ids */
+ u32 rq_wptr; /* rq_wptr - rq_rptr == count of */
+ u32 rq_rptr; /* pending wrs */
+ u64 *rq_oldest_wr; /* oldest wr on the SW RQ */
+ u32 rq_size_log2; /* rq size */
+ u32 rq_addr; /* rq adapter address */
+ void __iomem *doorbell; /* kernel db */
+ u64 udb; /* user db if any */
+};
+
+struct t3_cq {
+ u32 cqid;
+ u32 rptr;
+ u32 wptr;
+ u32 size_log2;
+ dma_addr_t dma_addr;
+ DECLARE_PCI_UNMAP_ADDR(mapping)
+ struct t3_cqe *queue;
+ struct t3_cqe *sw_queue;
+ u32 sw_rptr;
+ u32 sw_wptr;
+};
+
+#define CQ_VLD_ENTRY(ptr,size_log2,cqe) (Q_GENBIT(ptr,size_log2) == \
+ CQE_GENBIT(*cqe))
+
+static inline void cxio_set_wq_in_error(struct t3_wq *wq)
+{
+ wq->queue->flit[13] = 1;
+}
+
+static inline struct t3_cqe *cxio_next_hw_cqe(struct t3_cq *cq)
+{
+ struct t3_cqe *cqe;
+
+ cqe = cq->queue + (Q_PTR2IDX(cq->rptr, cq->size_log2));
+ if (CQ_VLD_ENTRY(cq->rptr, cq->size_log2, cqe))
+ return cqe;
+ return NULL;
+}
+
+static inline struct t3_cqe *cxio_next_sw_cqe(struct t3_cq *cq)
+{
+ struct t3_cqe *cqe;
+
+ if (!Q_EMPTY(cq->sw_rptr, cq->sw_wptr)) {
+ cqe = cq->sw_queue + (Q_PTR2IDX(cq->sw_rptr, cq->size_log2));
+ return cqe;
+ }
+ return NULL;
+}
+
+static inline struct t3_cqe *cxio_next_cqe(struct t3_cq *cq)
+{
+ struct t3_cqe *cqe;
+
+ if (!Q_EMPTY(cq->sw_rptr, cq->sw_wptr)) {
+ cqe = cq->sw_queue + (Q_PTR2IDX(cq->sw_rptr, cq->size_log2));
+ return cqe;
+ }
+ cqe = cq->queue + (Q_PTR2IDX(cq->rptr, cq->size_log2));
+ if (CQ_VLD_ENTRY(cq->rptr, cq->size_log2, cqe))
+ return cqe;
+ return NULL;
+}
+
+#endif

2006-12-02 22:51:48

by Steve Wise

[permalink] [raw]
Subject: [PATCH v2 11/13] Core Resource Allocation


Core functions to carve up adapter memory, stag, qp, and cq IDs.

Signed-off-by: Steve Wise <[email protected]>
---

drivers/infiniband/hw/cxgb3/core/cxio_resource.c | 331 ++++++++++++++++++++++
drivers/infiniband/hw/cxgb3/core/cxio_resource.h | 70 +++++
2 files changed, 401 insertions(+), 0 deletions(-)

diff --git a/drivers/infiniband/hw/cxgb3/core/cxio_resource.c b/drivers/infiniband/hw/cxgb3/core/cxio_resource.c
new file mode 100644
index 0000000..444df15
--- /dev/null
+++ b/drivers/infiniband/hw/cxgb3/core/cxio_resource.c
@@ -0,0 +1,331 @@
+/*
+ * Copyright (c) 2006 Chelsio, Inc. All rights reserved.
+ * Copyright (c) 2006 Open Grid Computing, Inc. All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses. You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ * - Redistributions of source code must retain the above
+ * copyright notice, this list of conditions and the following
+ * disclaimer.
+ *
+ * - Redistributions in binary form must reproduce the above
+ * copyright notice, this list of conditions and the following
+ * disclaimer in the documentation and/or other materials
+ * provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+/* Crude resource management */
+#include <linux/kernel.h>
+#include <linux/random.h>
+#include <linux/slab.h>
+#include <linux/kfifo.h>
+#include <linux/spinlock.h>
+#include <linux/errno.h>
+#include "cxio_resource.h"
+#include "cxio_hal.h"
+
+static struct kfifo *rhdl_fifo;
+static spinlock_t rhdl_fifo_lock;
+
+#define RANDOM_SIZE 16
+
+static int __cxio_init_resource_fifo(struct kfifo **fifo,
+ spinlock_t *fifo_lock,
+ u32 nr, u32 skip_low,
+ u32 skip_high,
+ int random)
+{
+ u32 i, j, entry = 0, idx;
+ u32 random_bytes;
+ u32 rarray[16];
+ spin_lock_init(fifo_lock);
+
+ *fifo = kfifo_alloc(nr * sizeof(u32), GFP_KERNEL, fifo_lock);
+ if (IS_ERR(*fifo))
+ return -ENOMEM;
+
+ for (i = 0; i < skip_low + skip_high; i++)
+ __kfifo_put(*fifo, (unsigned char *) &entry, sizeof(u32));
+ if (random) {
+ j = 0;
+ random_bytes = random32();
+ for (i = 0; i < RANDOM_SIZE; i++)
+ rarray[i] = i + skip_low;
+ for (i = skip_low + RANDOM_SIZE; i < nr - skip_high; i++) {
+ if (j >= RANDOM_SIZE) {
+ j = 0;
+ random_bytes = random32();
+ }
+ idx = (random_bytes >> (j * 2)) & 0xF;
+ __kfifo_put(*fifo,
+ (unsigned char *) &rarray[idx],
+ sizeof(u32));
+ rarray[idx] = i;
+ j++;
+ }
+ for (i = 0; i < RANDOM_SIZE; i++)
+ __kfifo_put(*fifo,
+ (unsigned char *) &rarray[i],
+ sizeof(u32));
+ } else
+ for (i = skip_low; i < nr - skip_high; i++)
+ __kfifo_put(*fifo, (unsigned char *) &i, sizeof(u32));
+
+ for (i = 0; i < skip_low + skip_high; i++)
+ kfifo_get(*fifo, (unsigned char *) &entry, sizeof(u32));
+ return 0;
+}
+
+static int cxio_init_resource_fifo(struct kfifo **fifo, spinlock_t * fifo_lock,
+ u32 nr, u32 skip_low, u32 skip_high)
+{
+ return (__cxio_init_resource_fifo(fifo, fifo_lock, nr, skip_low,
+ skip_high, 0));
+}
+
+static int cxio_init_resource_fifo_random(struct kfifo **fifo,
+ spinlock_t * fifo_lock,
+ u32 nr, u32 skip_low, u32 skip_high)
+{
+
+ return (__cxio_init_resource_fifo(fifo, fifo_lock, nr, skip_low,
+ skip_high, 1));
+}
+
+static int cxio_init_qpid_fifo(struct cxio_rdev *rdev_p)
+{
+ u32 i;
+
+ spin_lock_init(&rdev_p->rscp->qpid_fifo_lock);
+
+ rdev_p->rscp->qpid_fifo = kfifo_alloc(T3_MAX_NUM_QP * sizeof(u32),
+ GFP_KERNEL,
+ &rdev_p->rscp->qpid_fifo_lock);
+ if (IS_ERR(rdev_p->rscp->qpid_fifo))
+ return -ENOMEM;
+
+ for (i = 16; i < T3_MAX_NUM_QP; i++)
+ if (!(i & rdev_p->qpmask))
+ __kfifo_put(rdev_p->rscp->qpid_fifo,
+ (unsigned char *) &i, sizeof(u32));
+ return 0;
+}
+
+int cxio_hal_init_rhdl_resource(u32 nr_rhdl)
+{
+ return cxio_init_resource_fifo(&rhdl_fifo, &rhdl_fifo_lock, nr_rhdl, 1,
+ 0);
+}
+
+void cxio_hal_destroy_rhdl_resource(void)
+{
+ kfifo_free(rhdl_fifo);
+}
+
+/* nr_* must be power of 2 */
+int cxio_hal_init_resource(struct cxio_rdev *rdev_p,
+ u32 nr_tpt, u32 nr_pbl,
+ u32 nr_rqt, u32 nr_qpid, u32 nr_cqid, u32 nr_pdid)
+{
+ int err = 0;
+ struct cxio_hal_resource *rscp;
+
+ rscp = kmalloc(sizeof(*rscp), GFP_KERNEL);
+ if (!rscp)
+ return -ENOMEM;
+ rdev_p->rscp = rscp;
+ err = cxio_init_resource_fifo_random(&rscp->tpt_fifo,
+ &rscp->tpt_fifo_lock,
+ nr_tpt, 1, 0);
+ if (err)
+ goto tpt_err;
+ err = cxio_init_qpid_fifo(rdev_p);
+ if (err)
+ goto qpid_err;
+ err = cxio_init_resource_fifo(&rscp->cqid_fifo, &rscp->cqid_fifo_lock,
+ nr_cqid, 1, 0);
+ if (err)
+ goto cqid_err;
+ err = cxio_init_resource_fifo(&rscp->pdid_fifo, &rscp->pdid_fifo_lock,
+ nr_pdid, 1, 0);
+ if (err)
+ goto pdid_err;
+ return 0;
+pdid_err:
+ kfifo_free(rscp->cqid_fifo);
+cqid_err:
+ kfifo_free(rscp->qpid_fifo);
+qpid_err:
+ kfifo_free(rscp->tpt_fifo);
+tpt_err:
+ return -ENOMEM;
+}
+
+/*
+ * returns 0 if no resource available
+ */
+static inline u32 cxio_hal_get_resource(struct kfifo *fifo)
+{
+ u32 entry;
+ if (kfifo_get(fifo, (unsigned char *) &entry, sizeof(u32)))
+ return entry;
+ else
+ return 0; /* fifo emptry */
+}
+
+static inline void cxio_hal_put_resource(struct kfifo *fifo, u32 entry)
+{
+ BUG_ON(kfifo_put(fifo, (unsigned char *) &entry, sizeof(u32)) == 0);
+}
+
+u32 cxio_hal_get_rhdl(void)
+{
+ return cxio_hal_get_resource(rhdl_fifo);
+}
+
+void cxio_hal_put_rhdl(u32 rhdl)
+{
+ cxio_hal_put_resource(rhdl_fifo, rhdl);
+}
+
+u32 cxio_hal_get_stag(struct cxio_hal_resource *rscp)
+{
+ return cxio_hal_get_resource(rscp->tpt_fifo);
+}
+
+void cxio_hal_put_stag(struct cxio_hal_resource *rscp, u32 stag)
+{
+ cxio_hal_put_resource(rscp->tpt_fifo, stag);
+}
+
+u32 cxio_hal_get_qpid(struct cxio_hal_resource *rscp)
+{
+ u32 qpid = cxio_hal_get_resource(rscp->qpid_fifo);
+ PDBG("%s qpid 0x%x\n", __FUNCTION__, qpid);
+ return qpid;
+}
+
+void cxio_hal_put_qpid(struct cxio_hal_resource *rscp, u32 qpid)
+{
+ PDBG("%s qpid 0x%x\n", __FUNCTION__, qpid);
+ cxio_hal_put_resource(rscp->qpid_fifo, qpid);
+}
+
+u32 cxio_hal_get_cqid(struct cxio_hal_resource *rscp)
+{
+ return cxio_hal_get_resource(rscp->cqid_fifo);
+}
+
+void cxio_hal_put_cqid(struct cxio_hal_resource *rscp, u32 cqid)
+{
+ cxio_hal_put_resource(rscp->cqid_fifo, cqid);
+}
+
+u32 cxio_hal_get_pdid(struct cxio_hal_resource *rscp)
+{
+ return cxio_hal_get_resource(rscp->pdid_fifo);
+}
+
+void cxio_hal_put_pdid(struct cxio_hal_resource *rscp, u32 pdid)
+{
+ cxio_hal_put_resource(rscp->pdid_fifo, pdid);
+}
+
+void cxio_hal_destroy_resource(struct cxio_hal_resource *rscp)
+{
+ kfifo_free(rscp->tpt_fifo);
+ kfifo_free(rscp->cqid_fifo);
+ kfifo_free(rscp->qpid_fifo);
+ kfifo_free(rscp->pdid_fifo);
+ kfree(rscp);
+}
+
+/*
+ * PBL Memory Manager. Uses Linux generic allocator.
+ */
+
+#define MIN_PBL_SHIFT 8 /* 256B == min PBL size (32 entries) */
+#define PBL_CHUNK 2*1024*1024
+
+u32 cxio_hal_pblpool_alloc(struct cxio_rdev *rdev_p, int size)
+{
+ unsigned long addr = gen_pool_alloc(rdev_p->pbl_pool, size);
+ PDBG("%s addr 0x%x size %d\n", __FUNCTION__, (u32)addr, size);
+ return (u32)addr;
+}
+
+void cxio_hal_pblpool_free(struct cxio_rdev *rdev_p, u32 addr, int size)
+{
+ PDBG("%s addr 0x%x size %d\n", __FUNCTION__, addr, size);
+ gen_pool_free(rdev_p->pbl_pool, (unsigned long)addr, size);
+}
+
+int cxio_hal_pblpool_create(struct cxio_rdev *rdev_p)
+{
+ unsigned long i;
+ rdev_p->pbl_pool = gen_pool_create(MIN_PBL_SHIFT, -1);
+ if (rdev_p->pbl_pool)
+ for (i = rdev_p->rnic_info.pbl_base;
+ i <= rdev_p->rnic_info.pbl_top - PBL_CHUNK + 1;
+ i += PBL_CHUNK)
+ gen_pool_add(rdev_p->pbl_pool, i, PBL_CHUNK, -1);
+ return rdev_p->pbl_pool ? 0 : -ENOMEM;
+}
+
+void cxio_hal_pblpool_destroy(struct cxio_rdev *rdev_p)
+{
+ gen_pool_destroy(rdev_p->pbl_pool);
+}
+
+/*
+ * RQT Memory Manager. Uses Linux generic allocator.
+ */
+
+#define MIN_RQT_SHIFT 10 /* 1KB == mini RQT size (16 entries) */
+#define RQT_CHUNK 2*1024*1024
+
+u32 cxio_hal_rqtpool_alloc(struct cxio_rdev *rdev_p, int size)
+{
+ unsigned long addr = gen_pool_alloc(rdev_p->rqt_pool, size << 6);
+ PDBG("%s addr 0x%x size %d\n", __FUNCTION__, (u32)addr, size << 6);
+ return (u32)addr;
+}
+
+void cxio_hal_rqtpool_free(struct cxio_rdev *rdev_p, u32 addr, int size)
+{
+ PDBG("%s addr 0x%x size %d\n", __FUNCTION__, addr, size << 6);
+ gen_pool_free(rdev_p->rqt_pool, (unsigned long)addr, size << 6);
+}
+
+int cxio_hal_rqtpool_create(struct cxio_rdev *rdev_p)
+{
+ unsigned long i;
+ rdev_p->rqt_pool = gen_pool_create(MIN_RQT_SHIFT, -1);
+ if (rdev_p->rqt_pool)
+ for (i = rdev_p->rnic_info.rqt_base;
+ i <= rdev_p->rnic_info.rqt_top - RQT_CHUNK + 1;
+ i += RQT_CHUNK)
+ gen_pool_add(rdev_p->rqt_pool, i, RQT_CHUNK, -1);
+ return rdev_p->rqt_pool ? 0 : -ENOMEM;
+}
+
+void cxio_hal_rqtpool_destroy(struct cxio_rdev *rdev_p)
+{
+ gen_pool_destroy(rdev_p->rqt_pool);
+}
diff --git a/drivers/infiniband/hw/cxgb3/core/cxio_resource.h b/drivers/infiniband/hw/cxgb3/core/cxio_resource.h
new file mode 100644
index 0000000..a6bbe83
--- /dev/null
+++ b/drivers/infiniband/hw/cxgb3/core/cxio_resource.h
@@ -0,0 +1,70 @@
+/*
+ * Copyright (c) 2006 Chelsio, Inc. All rights reserved.
+ * Copyright (c) 2006 Open Grid Computing, Inc. All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses. You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ * - Redistributions of source code must retain the above
+ * copyright notice, this list of conditions and the following
+ * disclaimer.
+ *
+ * - Redistributions in binary form must reproduce the above
+ * copyright notice, this list of conditions and the following
+ * disclaimer in the documentation and/or other materials
+ * provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+#ifndef __CXIO_RESOURCE_H__
+#define __CXIO_RESOURCE_H__
+
+#include <linux/kernel.h>
+#include <linux/random.h>
+#include <linux/slab.h>
+#include <linux/kfifo.h>
+#include <linux/spinlock.h>
+#include <linux/errno.h>
+#include <linux/genalloc.h>
+#include "cxio_hal.h"
+
+extern int cxio_hal_init_rhdl_resource(u32 nr_rhdl);
+extern void cxio_hal_destroy_rhdl_resource(void);
+extern int cxio_hal_init_resource(struct cxio_rdev *rdev_p,
+ u32 nr_tpt, u32 nr_pbl,
+ u32 nr_rqt, u32 nr_qpid, u32 nr_cqid,
+ u32 nr_pdid);
+extern u32 cxio_hal_get_stag(struct cxio_hal_resource *rscp);
+extern void cxio_hal_put_stag(struct cxio_hal_resource *rscp, u32 stag);
+extern u32 cxio_hal_get_qpid(struct cxio_hal_resource *rscp);
+extern void cxio_hal_put_qpid(struct cxio_hal_resource *rscp, u32 qpid);
+extern u32 cxio_hal_get_cqid(struct cxio_hal_resource *rscp);
+extern void cxio_hal_put_cqid(struct cxio_hal_resource *rscp, u32 cqid);
+extern void cxio_hal_destroy_resource(struct cxio_hal_resource *rscp);
+
+#define PBL_OFF(rdev_p, a) ( (a) - (rdev_p)->rnic_info.pbl_base )
+extern int cxio_hal_pblpool_create(struct cxio_rdev *rdev_p);
+extern void cxio_hal_pblpool_destroy(struct cxio_rdev *rdev_p);
+extern u32 cxio_hal_pblpool_alloc(struct cxio_rdev *rdev_p, int size);
+extern void cxio_hal_pblpool_free(struct cxio_rdev *rdev_p, u32 addr, int size);
+
+#define RQT_OFF(rdev_p, a) ( (a) - (rdev_p)->rnic_info.rqt_base )
+extern int cxio_hal_rqtpool_create(struct cxio_rdev *rdev_p);
+extern void cxio_hal_rqtpool_destroy(struct cxio_rdev *rdev_p);
+extern u32 cxio_hal_rqtpool_alloc(struct cxio_rdev *rdev_p, int size);
+extern void cxio_hal_rqtpool_free(struct cxio_rdev *rdev_p, u32 addr, int size);
+#endif

2006-12-02 22:52:22

by Steve Wise

[permalink] [raw]
Subject: [PATCH v2 10/13] Core HAL


The RDMA Core interfaces with the T3 HW and ULLD providing a low level
RDMA interface.

Signed-off-by: Steve Wise <[email protected]>
---

drivers/infiniband/hw/cxgb3/core/cxio_hal.c | 1302 +++++++++++++++++++++++++++
drivers/infiniband/hw/cxgb3/core/cxio_hal.h | 201 ++++
2 files changed, 1503 insertions(+), 0 deletions(-)

diff --git a/drivers/infiniband/hw/cxgb3/core/cxio_hal.c b/drivers/infiniband/hw/cxgb3/core/cxio_hal.c
new file mode 100644
index 0000000..367c834
--- /dev/null
+++ b/drivers/infiniband/hw/cxgb3/core/cxio_hal.c
@@ -0,0 +1,1302 @@
+/*
+ * Copyright (c) 2006 Chelsio, Inc. All rights reserved.
+ * Copyright (c) 2006 Open Grid Computing, Inc. All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses. You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ * - Redistributions of source code must retain the above
+ * copyright notice, this list of conditions and the following
+ * disclaimer.
+ *
+ * - Redistributions in binary form must reproduce the above
+ * copyright notice, this list of conditions and the following
+ * disclaimer in the documentation and/or other materials
+ * provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+#include <asm/semaphore.h>
+#include <asm/delay.h>
+
+#include <linux/netdevice.h>
+#include <linux/sched.h>
+#include <linux/spinlock.h>
+#include <linux/pci.h>
+
+#include "cxio_resource.h"
+#include "cxio_hal.h"
+#include "cxgb3_offload.h"
+#include "sge_defs.h"
+
+static struct cxio_rdev *rdev_tbl[T3_MAX_NUM_RNIC];
+static cxio_hal_ev_callback_func_t cxio_ev_cb = NULL;
+
+static inline struct cxio_rdev *cxio_hal_find_rdev_by_name(char *dev_name)
+{
+ int i;
+ for (i = 0; i < T3_MAX_NUM_RNIC; i++)
+ if (rdev_tbl[i])
+ if (!strcmp(rdev_tbl[i]->dev_name, dev_name))
+ return rdev_tbl[i];
+ return NULL;
+}
+
+static inline struct cxio_rdev *cxio_hal_find_rdev_by_t3cdev(struct t3cdev
+ *tdev)
+{
+ int i;
+ for (i = 0; i < T3_MAX_NUM_RNIC; i++)
+ if (rdev_tbl[i])
+ if (rdev_tbl[i]->t3cdev_p == tdev)
+ return rdev_tbl[i];
+ return NULL;
+}
+
+static inline int cxio_hal_add_rdev(struct cxio_rdev *rdev_p)
+{
+ int i;
+ for (i = 0; i < T3_MAX_NUM_RNIC; i++)
+ if (!rdev_tbl[i]) {
+ rdev_tbl[i] = rdev_p;
+ break;
+ }
+ return (i == T3_MAX_NUM_RNIC);
+}
+
+static inline void cxio_hal_delete_rdev(struct cxio_rdev *rdev_p)
+{
+ int i;
+ for (i = 0; i < T3_MAX_NUM_RNIC; i++)
+ if (rdev_tbl[i] == rdev_p) {
+ rdev_tbl[i] = NULL;
+ break;
+ }
+}
+
+int cxio_hal_cq_op(struct cxio_rdev *rdev_p, struct t3_cq *cq,
+ enum t3_cq_opcode op, u32 credit)
+{
+ int ret;
+ struct t3_cqe *cqe;
+ u32 rptr;
+
+ struct rdma_cq_op setup;
+ setup.id = cq->cqid;
+ setup.credits = (op == CQ_CREDIT_UPDATE) ? credit : 0;
+ setup.op = op;
+ ret = rdev_p->t3cdev_p->ctl(rdev_p->t3cdev_p, RDMA_CQ_OP, &setup);
+
+ if ((ret < 0) || (op == CQ_CREDIT_UPDATE))
+ return ret;
+
+ /*
+ * If the rearm returned an index other than our current index,
+ * then there might be CQE's in flight (being DMA'd). We must wait
+ * here for them to complete or the consumer can miss a notification.
+ */
+ if (Q_PTR2IDX((cq->rptr), cq->size_log2) != ret) {
+ int i=0;
+
+ rptr = cq->rptr;
+
+ /*
+ * Keep the generation correct by bumping rptr until it
+ * matches the index returned by the rearm - 1.
+ */
+ while (Q_PTR2IDX((rptr+1), cq->size_log2) != ret)
+ rptr++;
+
+ /*
+ * Now rptr is the index for the (last) cqe that was
+ * in-flight at the time the HW rearmed the CQ. We
+ * spin until that CQE is valid.
+ */
+ cqe = cq->queue + Q_PTR2IDX(rptr, cq->size_log2);
+ while (!CQ_VLD_ENTRY(rptr, cq->size_log2, cqe)) {
+ udelay(1);
+ if (i++ > 1000000) {
+ BUG_ON(1);
+ printk(KERN_ERR "%s: stalled rnic\n",
+ rdev_p->dev_name);
+ return -EIO;
+ }
+ }
+ }
+ return 0;
+}
+
+static inline int cxio_hal_clear_cq_ctx(struct cxio_rdev *rdev_p, u32 cqid)
+{
+ struct rdma_cq_setup setup;
+ setup.id = cqid;
+ setup.base_addr = 0; /* NULL address */
+ setup.size = 0; /* disaable the CQ */
+ setup.credits = 0;
+ setup.credit_thres = 0;
+ setup.ovfl_mode = 0;
+ return (rdev_p->t3cdev_p->ctl(rdev_p->t3cdev_p, RDMA_CQ_SETUP, &setup));
+}
+
+int cxio_hal_clear_qp_ctx(struct cxio_rdev *rdev_p, u32 qpid)
+{
+ u64 sge_cmd;
+ struct t3_modify_qp_wr *wqe;
+ struct sk_buff *skb = alloc_skb(sizeof(*wqe), GFP_KERNEL);
+ if (!skb) {
+ PDBG("%s alloc_skb failed\n", __FUNCTION__);
+ return -ENOMEM;
+ }
+ wqe = (struct t3_modify_qp_wr *) skb_put(skb, sizeof(*wqe));
+ memset(wqe, 0, sizeof(*wqe));
+ build_fw_riwrh((struct fw_riwrh *) wqe, T3_WR_QP_MOD, 3, 1, qpid, 7);
+ wqe->flags = cpu_to_be32(MODQP_WRITE_EC);
+ sge_cmd = qpid << 8 | 3;
+ wqe->sge_cmd = cpu_to_be64(sge_cmd);
+ skb->priority = CPL_PRIORITY_CONTROL;
+ return (cxgb3_ofld_send(rdev_p->t3cdev_p, skb));
+}
+
+int cxio_create_cq(struct cxio_rdev *rdev_p, struct t3_cq *cq)
+{
+ struct rdma_cq_setup setup;
+ int size = (1UL << (cq->size_log2)) * sizeof(struct t3_cqe);
+
+ cq->cqid = cxio_hal_get_cqid(rdev_p->rscp);
+ if (!cq->cqid)
+ return -ENOMEM;
+ cq->sw_queue = kzalloc(size, GFP_KERNEL);
+ if (!cq->sw_queue)
+ return -ENOMEM;
+ cq->queue = dma_alloc_coherent(&(rdev_p->rnic_info.pdev->dev),
+ (1UL << (cq->size_log2)) *
+ sizeof(struct t3_cqe),
+ &(cq->dma_addr), GFP_KERNEL);
+ if (!cq->queue) {
+ kfree(cq->sw_queue);
+ return -ENOMEM;
+ }
+ pci_unmap_addr_set(cq, mapping, cq->dma_addr);
+ memset(cq->queue, 0, size);
+ setup.id = cq->cqid;
+ setup.base_addr = (u64) (cq->dma_addr);
+ setup.size = 1UL << cq->size_log2;
+ setup.credits = 65535;
+ setup.credit_thres = 1;
+ if (rdev_p->t3cdev_p->type == T3B)
+ setup.ovfl_mode = 0;
+ else
+ setup.ovfl_mode = 1;
+ return (rdev_p->t3cdev_p->ctl(rdev_p->t3cdev_p, RDMA_CQ_SETUP, &setup));
+}
+
+int cxio_resize_cq(struct cxio_rdev *rdev_p, struct t3_cq *cq)
+{
+ struct rdma_cq_setup setup;
+ setup.id = cq->cqid;
+ setup.base_addr = (u64) (cq->dma_addr);
+ setup.size = 1UL << cq->size_log2;
+ setup.credits = setup.size;
+ setup.credit_thres = setup.size; /* TBD: overflow recovery */
+ setup.ovfl_mode = 1;
+ return (rdev_p->t3cdev_p->ctl(rdev_p->t3cdev_p, RDMA_CQ_SETUP, &setup));
+}
+
+static u32 get_qpid(struct cxio_rdev *rdev_p, struct cxio_ucontext *uctx)
+{
+ struct cxio_qpid_list *entry;
+ u32 qpid;
+ int i;
+
+ mutex_lock(&uctx->lock);
+ if (!list_empty(&uctx->qpids)) {
+ entry = list_entry(uctx->qpids.next, struct cxio_qpid_list,
+ entry);
+ list_del(&entry->entry);
+ qpid = entry->qpid;
+ kfree(entry);
+ } else {
+ qpid = cxio_hal_get_qpid(rdev_p->rscp);
+ if (!qpid)
+ goto out;
+ for (i = qpid+1; i & rdev_p->qpmask; i++) {
+ entry = kmalloc(sizeof *entry, GFP_KERNEL);
+ if (!entry)
+ break;
+ entry->qpid = i;
+ list_add_tail(&entry->entry, &uctx->qpids);
+ }
+ }
+out:
+ mutex_unlock(&uctx->lock);
+ PDBG("%s qpid 0x%x\n", __FUNCTION__, qpid);
+ return qpid;
+}
+
+static void put_qpid(struct cxio_rdev *rdev_p, u32 qpid,
+ struct cxio_ucontext *uctx)
+{
+ struct cxio_qpid_list *entry;
+
+ entry = kmalloc(sizeof *entry, GFP_KERNEL);
+ if (!entry)
+ return;
+ PDBG("%s qpid 0x%x\n", __FUNCTION__, qpid);
+ entry->qpid = qpid;
+ mutex_lock(&uctx->lock);
+ list_add_tail(&entry->entry, &uctx->qpids);
+ mutex_unlock(&uctx->lock);
+}
+
+void cxio_release_ucontext(struct cxio_rdev *rdev_p, struct cxio_ucontext *uctx)
+{
+ struct list_head *pos, *nxt;
+ struct cxio_qpid_list *entry;
+
+ mutex_lock(&uctx->lock);
+ list_for_each_safe(pos, nxt, &uctx->qpids) {
+ entry = list_entry(pos, struct cxio_qpid_list, entry);
+ list_del_init(&entry->entry);
+ if (!(entry->qpid & rdev_p->qpmask))
+ cxio_hal_put_qpid(rdev_p->rscp, entry->qpid);
+ kfree(entry);
+ }
+ mutex_unlock(&uctx->lock);
+}
+
+void cxio_init_ucontext(struct cxio_rdev *rdev_p, struct cxio_ucontext *uctx)
+{
+ INIT_LIST_HEAD(&uctx->qpids);
+ mutex_init(&uctx->lock);
+}
+
+int cxio_create_qp(struct cxio_rdev *rdev_p, u32 kernel_domain,
+ struct t3_wq *wq, struct cxio_ucontext *uctx)
+{
+ int depth = 1UL << wq->size_log2;
+ int rqsize = 1UL << wq->rq_size_log2;
+
+ wq->qpid = get_qpid(rdev_p, uctx);
+ if (!wq->qpid)
+ return -ENOMEM;
+
+ wq->rq = kzalloc(depth * sizeof(u64), GFP_KERNEL);
+ if (!wq->rq)
+ goto err1;
+
+ wq->rq_addr = cxio_hal_rqtpool_alloc(rdev_p, rqsize);
+ if (!wq->rq_addr)
+ goto err2;
+
+ wq->sq = kzalloc(depth * sizeof(struct t3_swsq), GFP_KERNEL);
+ if (!wq->sq)
+ goto err3;
+
+ wq->queue = dma_alloc_coherent(&(rdev_p->rnic_info.pdev->dev),
+ depth * sizeof(union t3_wr),
+ &(wq->dma_addr), GFP_KERNEL);
+ if (!wq->queue)
+ goto err4;
+
+ memset(wq->queue, 0, depth * sizeof(union t3_wr));
+ pci_unmap_addr_set(wq, mapping, wq->dma_addr);
+ wq->doorbell = (void __iomem *)rdev_p->rnic_info.kdb_addr;
+ if (!kernel_domain)
+ wq->udb = (u64)rdev_p->rnic_info.udbell_physbase +
+ (wq->qpid << rdev_p->qpshift);
+ PDBG("%s qpid 0x%x doorbell 0x%p udb 0x%llx\n", __FUNCTION__,
+ wq->qpid, wq->doorbell, wq->udb);
+ return 0;
+err4:
+ kfree(wq->sq);
+err3:
+ cxio_hal_rqtpool_free(rdev_p, wq->rq_addr, rqsize);
+err2:
+ kfree(wq->rq);
+err1:
+ put_qpid(rdev_p, wq->qpid, uctx);
+ return -ENOMEM;
+}
+
+int cxio_destroy_cq(struct cxio_rdev *rdev_p, struct t3_cq *cq)
+{
+ int err;
+ err = cxio_hal_clear_cq_ctx(rdev_p, cq->cqid);
+ kfree(cq->sw_queue);
+ dma_free_coherent(&(rdev_p->rnic_info.pdev->dev),
+ (1UL << (cq->size_log2))
+ * sizeof(struct t3_cqe), cq->queue,
+ pci_unmap_addr(cq, mapping));
+ cxio_hal_put_cqid(rdev_p->rscp, cq->cqid);
+ return err;
+}
+
+int cxio_destroy_qp(struct cxio_rdev *rdev_p, struct t3_wq *wq,
+ struct cxio_ucontext *uctx)
+{
+ dma_free_coherent(&(rdev_p->rnic_info.pdev->dev),
+ (1UL << (wq->size_log2))
+ * sizeof(union t3_wr), wq->queue,
+ pci_unmap_addr(wq, mapping));
+ kfree(wq->sq);
+ cxio_hal_rqtpool_free(rdev_p, wq->rq_addr, (1UL << wq->rq_size_log2));
+ kfree(wq->rq);
+ put_qpid(rdev_p, wq->qpid, uctx);
+ return 0;
+}
+
+static void insert_recv_cqe(struct t3_wq *wq, struct t3_cq *cq)
+{
+ struct t3_cqe cqe;
+
+ PDBG("%s wq %p cq %p sw_rptr 0x%x sw_wptr 0x%x\n", __FUNCTION__,
+ wq, cq, cq->sw_rptr, cq->sw_wptr);
+ memset(&cqe, 0, sizeof(cqe));
+ cqe.header = cpu_to_be32(V_CQE_STATUS(TPT_ERR_SWFLUSH) |
+ V_CQE_OPCODE(T3_SEND) |
+ V_CQE_TYPE(0) |
+ V_CQE_SWCQE(1) |
+ V_CQE_QPID(wq->qpid) |
+ V_CQE_GENBIT(Q_GENBIT(cq->sw_wptr,
+ cq->size_log2)));
+ *(cq->sw_queue + Q_PTR2IDX(cq->sw_wptr, cq->size_log2)) = cqe;
+ cq->sw_wptr++;
+}
+
+void cxio_flush_rq(struct t3_wq *wq, struct t3_cq *cq, int count)
+{
+ u32 ptr;
+
+ PDBG("%s wq %p cq %p\n", __FUNCTION__, wq, cq);
+
+ /* flush RQ */
+ PDBG("%s rq_rptr %u rq_wptr %u skip count %u\n", __FUNCTION__,
+ wq->rq_rptr, wq->rq_wptr, count);
+ ptr = wq->rq_rptr + count;
+ while (ptr++ != wq->rq_wptr)
+ insert_recv_cqe(wq, cq);
+}
+
+static void insert_sq_cqe(struct t3_wq *wq, struct t3_cq *cq,
+ struct t3_swsq *sqp)
+{
+ struct t3_cqe cqe;
+
+ PDBG("%s wq %p cq %p sw_rptr 0x%x sw_wptr 0x%x\n", __FUNCTION__,
+ wq, cq, cq->sw_rptr, cq->sw_wptr);
+ memset(&cqe, 0, sizeof(cqe));
+ cqe.header = cpu_to_be32(V_CQE_STATUS(TPT_ERR_SWFLUSH) |
+ V_CQE_OPCODE(sqp->opcode) |
+ V_CQE_TYPE(1) |
+ V_CQE_SWCQE(1) |
+ V_CQE_QPID(wq->qpid) |
+ V_CQE_GENBIT(Q_GENBIT(cq->sw_wptr,
+ cq->size_log2)));
+ cqe.u.scqe.wrid_hi = sqp->sq_wptr;
+
+ *(cq->sw_queue + Q_PTR2IDX(cq->sw_wptr, cq->size_log2)) = cqe;
+ cq->sw_wptr++;
+}
+
+void cxio_flush_sq(struct t3_wq *wq, struct t3_cq *cq, int count)
+{
+ __u32 ptr;
+ struct t3_swsq *sqp = wq->sq + Q_PTR2IDX(wq->sq_rptr, wq->sq_size_log2);
+
+ ptr = wq->sq_rptr + count;
+ sqp += count;
+ while (ptr != wq->sq_wptr) {
+ insert_sq_cqe(wq, cq, sqp);
+ sqp++;
+ ptr++;
+ }
+}
+
+/*
+ * Move all CQEs from the HWCQ into the SWCQ.
+ */
+void cxio_flush_hw_cq(struct t3_cq *cq)
+{
+ struct t3_cqe *cqe, *swcqe;
+
+ PDBG("%s cq %p cqid 0x%x\n", __FUNCTION__, cq, cq->cqid);
+ cqe = cxio_next_hw_cqe(cq);
+ while (cqe) {
+ PDBG("%s flushing hwcq rptr 0x%x to swcq wptr 0x%x\n",
+ __FUNCTION__, cq->rptr, cq->sw_wptr);
+ swcqe = cq->sw_queue + Q_PTR2IDX(cq->sw_wptr, cq->size_log2);
+ *swcqe = *cqe;
+ swcqe->header |= cpu_to_be32(V_CQE_SWCQE(1));
+ cq->sw_wptr++;
+ cq->rptr++;
+ cqe = cxio_next_hw_cqe(cq);
+ }
+}
+
+static inline int cqe_completes_wr(struct t3_cqe *cqe, struct t3_wq *wq)
+{
+ if (CQE_OPCODE(*cqe) == T3_TERMINATE)
+ return 0;
+
+ if ((CQE_OPCODE(*cqe) == T3_RDMA_WRITE) && RQ_TYPE(*cqe))
+ return 0;
+
+ if ((CQE_OPCODE(*cqe) == T3_READ_RESP) && SQ_TYPE(*cqe))
+ return 0;
+
+ if ((CQE_OPCODE(*cqe) == T3_SEND) && RQ_TYPE(*cqe) &&
+ Q_EMPTY(wq->rq_rptr, wq->rq_wptr))
+ return 0;
+
+ return 1;
+}
+
+void cxio_count_scqes(struct t3_cq *cq, struct t3_wq *wq, int *count)
+{
+ struct t3_cqe *cqe;
+ u32 ptr;
+
+ *count = 0;
+ ptr = cq->sw_rptr;
+ while (!Q_EMPTY(ptr, cq->sw_wptr)) {
+ cqe = cq->sw_queue + (Q_PTR2IDX(ptr, cq->size_log2));
+ if ((SQ_TYPE(*cqe) || (CQE_OPCODE(*cqe) == T3_READ_RESP)) &&
+ (CQE_QPID(*cqe) == wq->qpid))
+ (*count)++;
+ ptr++;
+ }
+ PDBG("%s cq %p count %d\n", __FUNCTION__, cq, *count);
+}
+
+void cxio_count_rcqes(struct t3_cq *cq, struct t3_wq *wq, int *count)
+{
+ struct t3_cqe *cqe;
+ u32 ptr;
+
+ *count = 0;
+ PDBG("%s count zero %d\n", __FUNCTION__, *count);
+ ptr = cq->sw_rptr;
+ while (!Q_EMPTY(ptr, cq->sw_wptr)) {
+ cqe = cq->sw_queue + (Q_PTR2IDX(ptr, cq->size_log2));
+ if (RQ_TYPE(*cqe) && (CQE_OPCODE(*cqe) != T3_READ_RESP) &&
+ (CQE_QPID(*cqe) == wq->qpid) && cqe_completes_wr(cqe, wq))
+ (*count)++;
+ ptr++;
+ }
+ PDBG("%s cq %p count %d\n", __FUNCTION__, cq, *count);
+}
+
+static int cxio_hal_init_ctrl_cq(struct cxio_rdev *rdev_p)
+{
+ struct rdma_cq_setup setup;
+ setup.id = 0;
+ setup.base_addr = 0; /* NULL address */
+ setup.size = 1; /* enable the CQ */
+ setup.credits = 0;
+
+ /* force SGE to redirect to RspQ and interrupt */
+ setup.credit_thres = 0;
+ setup.ovfl_mode = 1;
+ return (rdev_p->t3cdev_p->ctl(rdev_p->t3cdev_p, RDMA_CQ_SETUP, &setup));
+}
+
+static int cxio_hal_init_ctrl_qp(struct cxio_rdev *rdev_p)
+{
+ int err;
+ u64 sge_cmd, ctx0, ctx1;
+ u64 base_addr;
+ struct t3_modify_qp_wr *wqe;
+ struct sk_buff *skb = alloc_skb(sizeof(*wqe), GFP_KERNEL);
+
+
+ if (!skb) {
+ PDBG("%s alloc_skb failed\n", __FUNCTION__);
+ return -ENOMEM;
+ }
+ err = cxio_hal_init_ctrl_cq(rdev_p);
+ if (err) {
+ PDBG("%s err %d initializing ctrl_cq\n", __FUNCTION__, err);
+ return err;
+ }
+ rdev_p->ctrl_qp.workq = dma_alloc_coherent(
+ &(rdev_p->rnic_info.pdev->dev),
+ (1 << T3_CTRL_QP_SIZE_LOG2) *
+ sizeof(union t3_wr),
+ &(rdev_p->ctrl_qp.dma_addr),
+ GFP_KERNEL);
+ if (!rdev_p->ctrl_qp.workq) {
+ PDBG("%s dma_alloc_coherent failed\n", __FUNCTION__);
+ return -ENOMEM;
+ }
+ pci_unmap_addr_set(&rdev_p->ctrl_qp, mapping,
+ rdev_p->ctrl_qp.dma_addr);
+ rdev_p->ctrl_qp.doorbell = (void __iomem *)rdev_p->rnic_info.kdb_addr;
+ memset(rdev_p->ctrl_qp.workq, 0,
+ (1 << T3_CTRL_QP_SIZE_LOG2) * sizeof(union t3_wr));
+
+ init_MUTEX(&rdev_p->ctrl_qp.sem);
+ init_waitqueue_head(&rdev_p->ctrl_qp.waitq);
+
+ /* update HW Ctrl QP context */
+ base_addr = rdev_p->ctrl_qp.dma_addr;
+ base_addr >>= 12;
+ ctx0 = (V_EC_SIZE((1 << T3_CTRL_QP_SIZE_LOG2)) |
+ V_EC_BASE_LO((u32) base_addr & 0xffff));
+ ctx0 <<= 32;
+ ctx0 |= V_EC_CREDITS(FW_WR_NUM);
+ base_addr >>= 16;
+ ctx1 = (u32) base_addr;
+ base_addr >>= 32;
+ ctx1 |= ((u64) (V_EC_BASE_HI((u32) base_addr & 0xf) | V_EC_RESPQ(0) |
+ V_EC_TYPE(0) | V_EC_GEN(1) |
+ V_EC_UP_TOKEN(T3_CTL_QP_TID) | F_EC_VALID)) << 32;
+ wqe = (struct t3_modify_qp_wr *) skb_put(skb, sizeof(*wqe));
+ memset(wqe, 0, sizeof(*wqe));
+ build_fw_riwrh((struct fw_riwrh *) wqe, T3_WR_QP_MOD, 0, 1,
+ T3_CTL_QP_TID, 7);
+ wqe->flags = cpu_to_be32(MODQP_WRITE_EC);
+ sge_cmd = (3ULL << 56) | FW_RI_SGEEC_START << 8 | 3;
+ wqe->sge_cmd = cpu_to_be64(sge_cmd);
+ wqe->ctx1 = cpu_to_be64(ctx1);
+ wqe->ctx0 = cpu_to_be64(ctx0);
+ PDBG("CtrlQP dma_addr 0x%llx workq %p size %d\n",
+ (u64) rdev_p->ctrl_qp.dma_addr, rdev_p->ctrl_qp.workq,
+ 1 << T3_CTRL_QP_SIZE_LOG2);
+ skb->priority = CPL_PRIORITY_CONTROL;
+ return (cxgb3_ofld_send(rdev_p->t3cdev_p, skb));
+}
+
+static int cxio_hal_destroy_ctrl_qp(struct cxio_rdev *rdev_p)
+{
+ dma_free_coherent(&(rdev_p->rnic_info.pdev->dev),
+ (1UL << T3_CTRL_QP_SIZE_LOG2)
+ * sizeof(union t3_wr), rdev_p->ctrl_qp.workq,
+ pci_unmap_addr(&rdev_p->ctrl_qp, mapping));
+ return cxio_hal_clear_qp_ctx(rdev_p, T3_CTRL_QP_ID);
+}
+
+/* write len bytes of data into addr (32B aligned address)
+ * If data is NULL, clear len byte of memory to zero.
+ * caller aquires the sem before the call
+ */
+static int cxio_hal_ctrl_qp_write_mem(struct cxio_rdev *rdev_p, u32 addr,
+ u32 len, void *data, int completion)
+{
+ u32 i, nr_wqe, copy_len;
+ u8 *copy_data;
+ u8 wr_len, utx_len; /* lenght in 8 byte flit */
+ enum t3_wr_flags flag;
+ __be64 *wqe;
+ u64 utx_cmd;
+ addr &= 0x7FFFFFF;
+ nr_wqe = len % 96 ? len / 96 + 1 : len / 96; /* 96B max per WQE */
+ PDBG("%s wptr 0x%x rptr 0x%x len %d, nr_wqe %d data %p addr 0x%0x\n",
+ __FUNCTION__, rdev_p->ctrl_qp.wptr, rdev_p->ctrl_qp.rptr, len,
+ nr_wqe, data, addr);
+ utx_len = 3; /* in 32B unit */
+ for (i = 0; i < nr_wqe; i++) {
+ if (Q_FULL(rdev_p->ctrl_qp.rptr, rdev_p->ctrl_qp.wptr,
+ T3_CTRL_QP_SIZE_LOG2)) {
+ PDBG("%s ctrl_qp full wtpr 0x%0x rptr 0x%0x, "
+ "wait for more space i %d\n", __FUNCTION__,
+ rdev_p->ctrl_qp.wptr, rdev_p->ctrl_qp.rptr, i);
+ if (wait_event_interruptible(rdev_p->ctrl_qp.waitq,
+ !Q_FULL(rdev_p->ctrl_qp.rptr,
+ rdev_p->ctrl_qp.wptr,
+ T3_CTRL_QP_SIZE_LOG2))) {
+ PDBG("%s ctrl_qp workq interrupted\n",
+ __FUNCTION__);
+ return -ERESTARTSYS;
+ }
+ PDBG("%s ctrl_qp wakeup, continue posting work request "
+ "i %d\n", __FUNCTION__, i);
+ }
+ wqe = (__be64 *)(rdev_p->ctrl_qp.workq + (rdev_p->ctrl_qp.wptr %
+ (1 << T3_CTRL_QP_SIZE_LOG2)));
+ flag = 0;
+ if (i == (nr_wqe - 1)) {
+ /* last WQE */
+ flag = completion ? T3_COMPLETION_FLAG : 0;
+ if (len % 32)
+ utx_len = len / 32 + 1;
+ else
+ utx_len = len / 32;
+ }
+
+ /*
+ * Force a CQE to return the credit to the workq in case
+ * we posted more than half the max QP size of WRs
+ */
+ if ((i != 0) &&
+ (i % (((1 << T3_CTRL_QP_SIZE_LOG2)) >> 1) == 0)) {
+ flag = T3_COMPLETION_FLAG;
+ PDBG("%s force completion at i %d\n", __FUNCTION__, i);
+ }
+
+ /* build the utx mem command */
+ wqe += (sizeof(struct t3_bypass_wr) >> 3);
+ utx_cmd = (T3_UTX_MEM_WRITE << 28) | (addr + i * 3);
+ utx_cmd <<= 32;
+ utx_cmd |= (utx_len << 28) | ((utx_len << 2) + 1);
+ *wqe = cpu_to_be64(utx_cmd);
+ wqe++;
+ copy_data = (u8 *) data + i * 96;
+ copy_len = len > 96 ? 96 : len;
+
+ /* clear memory content if data is NULL */
+ if (data)
+ memcpy(wqe, copy_data, copy_len);
+ else
+ memset(wqe, 0, copy_len);
+ if (copy_len % 32)
+ memset(((u8 *) wqe) + copy_len, 0,
+ 32 - (copy_len % 32));
+ wr_len = ((sizeof(struct t3_bypass_wr)) >> 3) + 1 +
+ (utx_len << 2);
+ wqe = (__be64 *)(rdev_p->ctrl_qp.workq + (rdev_p->ctrl_qp.wptr %
+ (1 << T3_CTRL_QP_SIZE_LOG2)));
+
+ /* wptr in the WRID[31:0] */
+ ((union t3_wrid *)(wqe+1))->id0.low = rdev_p->ctrl_qp.wptr;
+
+ /*
+ * This must be the last write with a memory barrier
+ * for the genbit
+ */
+ build_fw_riwrh((struct fw_riwrh *) wqe, T3_WR_BP, flag,
+ Q_GENBIT(rdev_p->ctrl_qp.wptr,
+ T3_CTRL_QP_SIZE_LOG2), T3_CTRL_QP_ID,
+ wr_len);
+ if (flag == T3_COMPLETION_FLAG)
+ ring_doorbell(rdev_p->ctrl_qp.doorbell, T3_CTRL_QP_ID);
+ len -= 96;
+ rdev_p->ctrl_qp.wptr++;
+ }
+ return 0;
+}
+
+/* IN: stag key, pdid, perm, zbva, to, len, page_size, pbl, and pbl_size
+ * OUT: stag index, actual pbl_size, pbl_addr allocated.
+ * TBD: shared memory region support
+ */
+static int __cxio_tpt_op(struct cxio_rdev *rdev_p, u32 reset_tpt_entry,
+ u32 *stag, u8 stag_state, u32 pdid,
+ enum tpt_mem_type type, enum tpt_mem_perm perm,
+ u32 zbva, u64 to, u32 len, u8 page_size, __be64 *pbl,
+ u32 *pbl_size, u32 *pbl_addr)
+{
+ int err;
+ struct tpt_entry tpt;
+ u32 stag_idx;
+ u32 wptr;
+ int rereg = (*stag != T3_STAG_UNSET);
+
+ stag_state = stag_state > 0;
+ stag_idx = (*stag) >> 8;
+
+ if ((!reset_tpt_entry) && !(*stag != T3_STAG_UNSET)) {
+ stag_idx = cxio_hal_get_stag(rdev_p->rscp);
+ if (!stag_idx)
+ return -ENOMEM;
+ *stag = (stag_idx << 8) | ((*stag) & 0xFF);
+ }
+ PDBG("%s stag_state 0x%0x type 0x%0x pdid 0x%0x, stag_idx 0x%x\n",
+ __FUNCTION__, stag_state, type, pdid, stag_idx);
+
+ if (reset_tpt_entry)
+ cxio_hal_pblpool_free(rdev_p, *pbl_addr, *pbl_size << 3);
+ else if (!rereg) {
+ *pbl_addr = cxio_hal_pblpool_alloc(rdev_p, *pbl_size << 3);
+ if (!*pbl_addr) {
+ return -ENOMEM;
+ }
+ }
+
+ down_interruptible(&rdev_p->ctrl_qp.sem);
+
+ /* write PBL first if any - update pbl only if pbl list exist */
+ if (pbl) {
+
+ PDBG("%s *pdb_addr 0x%x, pbl_base 0x%x, pbl_size %d\n",
+ __FUNCTION__, *pbl_addr, rdev_p->rnic_info.pbl_base,
+ *pbl_size);
+ err = cxio_hal_ctrl_qp_write_mem(rdev_p,
+ (*pbl_addr >> 5),
+ (*pbl_size << 3), pbl, 0);
+ if (err)
+ goto ret;
+ }
+
+ /* write TPT entry */
+ if (reset_tpt_entry)
+ memset(&tpt, 0, sizeof(tpt));
+ else {
+ tpt.valid_stag_pdid = cpu_to_be32(F_TPT_VALID |
+ V_TPT_STAG_KEY((*stag) & M_TPT_STAG_KEY) |
+ V_TPT_STAG_STATE(stag_state) |
+ V_TPT_STAG_TYPE(type) | V_TPT_PDID(pdid));
+ BUG_ON(page_size >= 28);
+ tpt.flags_pagesize_qpid = cpu_to_be32(V_TPT_PERM(perm) |
+ F_TPT_MW_BIND_ENABLE |
+ V_TPT_ADDR_TYPE((zbva ? TPT_ZBTO : TPT_VATO)) |
+ V_TPT_PAGE_SIZE(page_size));
+ tpt.rsvd_pbl_addr = reset_tpt_entry ? 0 :
+ cpu_to_be32(V_TPT_PBL_ADDR(PBL_OFF(rdev_p, *pbl_addr)>>3));
+ tpt.len = cpu_to_be32(len);
+ tpt.va_hi = cpu_to_be32((u32) (to >> 32));
+ tpt.va_low_or_fbo = cpu_to_be32((u32) (to & 0xFFFFFFFFULL));
+ tpt.rsvd_bind_cnt_or_pstag = 0;
+ tpt.rsvd_pbl_size = reset_tpt_entry ? 0 :
+ cpu_to_be32(V_TPT_PBL_SIZE((*pbl_size) >> 2));
+ }
+ err = cxio_hal_ctrl_qp_write_mem(rdev_p,
+ stag_idx +
+ (rdev_p->rnic_info.tpt_base >> 5),
+ sizeof(tpt), &tpt, 1);
+
+ /* release the stag index to free pool */
+ if (reset_tpt_entry)
+ cxio_hal_put_stag(rdev_p->rscp, stag_idx);
+ret:
+ wptr = rdev_p->ctrl_qp.wptr;
+ up(&rdev_p->ctrl_qp.sem);
+ if (!err)
+ if (wait_event_interruptible(rdev_p->ctrl_qp.waitq,
+ SEQ32_GE(rdev_p->ctrl_qp.rptr,
+ wptr)))
+ return -ERESTARTSYS;
+ return err;
+}
+
+/* IN : stag key, pdid, pbl_size
+ * Out: stag index, actaul pbl_size, and pbl_addr allocated.
+ */
+int cxio_allocate_stag(struct cxio_rdev *rdev_p, u32 * stag, u32 pdid,
+ enum tpt_mem_perm perm, u32 * pbl_size, u32 * pbl_addr)
+{
+ *stag = T3_STAG_UNSET;
+ return (__cxio_tpt_op(rdev_p, 0, stag, 0, pdid, TPT_NON_SHARED_MR,
+ perm, 0, 0ULL, 0, 0, NULL, pbl_size, pbl_addr));
+}
+
+int cxio_register_phys_mem(struct cxio_rdev *rdev_p, u32 *stag, u32 pdid,
+ enum tpt_mem_perm perm, u32 zbva, u64 to, u32 len,
+ u8 page_size, __be64 *pbl, u32 *pbl_size,
+ u32 *pbl_addr)
+{
+ *stag = T3_STAG_UNSET;
+ return __cxio_tpt_op(rdev_p, 0, stag, 1, pdid, TPT_NON_SHARED_MR, perm,
+ zbva, to, len, page_size, pbl, pbl_size, pbl_addr);
+}
+
+int cxio_reregister_phys_mem(struct cxio_rdev *rdev_p, u32 *stag, u32 pdid,
+ enum tpt_mem_perm perm, u32 zbva, u64 to, u32 len,
+ u8 page_size, __be64 *pbl, u32 *pbl_size,
+ u32 *pbl_addr)
+{
+ return __cxio_tpt_op(rdev_p, 0, stag, 1, pdid, TPT_NON_SHARED_MR, perm,
+ zbva, to, len, page_size, pbl, pbl_size, pbl_addr);
+}
+
+int cxio_dereg_mem(struct cxio_rdev *rdev_p, u32 stag, u32 pbl_size,
+ u32 pbl_addr)
+{
+ return __cxio_tpt_op(rdev_p, 1, &stag, 0, 0, 0, 0, 0, 0ULL, 0, 0, NULL,
+ &pbl_size, &pbl_addr);
+}
+
+int cxio_allocate_window(struct cxio_rdev *rdev_p, u32 * stag, u32 pdid)
+{
+ u32 pbl_size = 0;
+ *stag = T3_STAG_UNSET;
+ return __cxio_tpt_op(rdev_p, 0, stag, 0, pdid, TPT_MW, 0, 0, 0ULL, 0, 0,
+ NULL, &pbl_size, NULL);
+}
+
+int cxio_deallocate_window(struct cxio_rdev *rdev_p, u32 stag)
+{
+ return __cxio_tpt_op(rdev_p, 1, &stag, 0, 0, 0, 0, 0, 0ULL, 0, 0, NULL,
+ NULL, NULL);
+}
+
+int cxio_rdma_init(struct cxio_rdev *rdev_p, struct t3_rdma_init_attr *attr)
+{
+ struct t3_rdma_init_wr *wqe;
+ struct sk_buff *skb = alloc_skb(sizeof(*wqe), GFP_ATOMIC);
+ if (!skb)
+ return -ENOMEM;
+ PDBG("%s rdev_p %p\n", __FUNCTION__, rdev_p);
+ wqe = (struct t3_rdma_init_wr *) __skb_put(skb, sizeof(*wqe));
+ wqe->wrh.op_seop_flags = cpu_to_be32(V_FW_RIWR_OP(T3_WR_INIT));
+ wqe->wrh.gen_tid_len = cpu_to_be32(V_FW_RIWR_TID(attr->tid) |
+ V_FW_RIWR_LEN(sizeof(*wqe) >> 3));
+ wqe->wrid.id1 = 0;
+ wqe->qpid = cpu_to_be32(attr->qpid);
+ wqe->pdid = cpu_to_be32(attr->pdid);
+ wqe->scqid = cpu_to_be32(attr->scqid);
+ wqe->rcqid = cpu_to_be32(attr->rcqid);
+ wqe->rq_addr = cpu_to_be32(attr->rq_addr - rdev_p->rnic_info.rqt_base);
+ wqe->rq_size = cpu_to_be32(attr->rq_size);
+ wqe->mpaattrs = attr->mpaattrs;
+ wqe->qpcaps = attr->qpcaps;
+ wqe->ulpdu_size = cpu_to_be16(attr->tcp_emss);
+ wqe->flags = cpu_to_be32(attr->flags);
+ wqe->ord = cpu_to_be32(attr->ord);
+ wqe->ird = cpu_to_be32(attr->ird);
+ wqe->qp_dma_addr = cpu_to_be64(attr->qp_dma_addr);
+ wqe->qp_dma_size = cpu_to_be32(attr->qp_dma_size);
+ wqe->rsvd = 0;
+ skb->priority = 0; /* 0=>ToeQ; 1=>CtrlQ */
+ return (cxgb3_ofld_send(rdev_p->t3cdev_p, skb));
+}
+
+void cxio_register_ev_cb(cxio_hal_ev_callback_func_t ev_cb)
+{
+ cxio_ev_cb = ev_cb;
+}
+
+void cxio_unregister_ev_cb(cxio_hal_ev_callback_func_t ev_cb)
+{
+ cxio_ev_cb = NULL;
+}
+
+static int cxio_hal_ev_handler(struct t3cdev *t3cdev_p, struct sk_buff *skb)
+{
+ static int cnt;
+ struct cxio_rdev *rdev_p = NULL;
+ struct respQ_msg_t *rsp_msg = (struct respQ_msg_t *) skb->data;
+ PDBG("%d: %s cq_id 0x%x cq_ptr 0x%x genbit %0x overflow %0x an %0x"
+ " se %0x notify %0x cqbranch %0x creditth %0x\n",
+ cnt, __FUNCTION__, RSPQ_CQID(rsp_msg), RSPQ_CQPTR(rsp_msg),
+ RSPQ_GENBIT(rsp_msg), RSPQ_OVERFLOW(rsp_msg), RSPQ_AN(rsp_msg),
+ RSPQ_SE(rsp_msg), RSPQ_NOTIFY(rsp_msg), RSPQ_CQBRANCH(rsp_msg),
+ RSPQ_CREDIT_THRESH(rsp_msg));
+ PDBG("CQE: QPID 0x%0x genbit %0x type 0x%0x status 0x%0x opcode %d "
+ "len 0x%0x wrid_hi_stag 0x%x wrid_low_msn 0x%x\n",
+ CQE_QPID(rsp_msg->cqe), CQE_GENBIT(rsp_msg->cqe),
+ CQE_TYPE(rsp_msg->cqe), CQE_STATUS(rsp_msg->cqe),
+ CQE_OPCODE(rsp_msg->cqe), CQE_LEN(rsp_msg->cqe),
+ CQE_WRID_HI(rsp_msg->cqe), CQE_WRID_LOW(rsp_msg->cqe));
+ rdev_p = (struct cxio_rdev *)t3cdev_p->ulp;
+ if (!rdev_p) {
+ PDBG("%s called by t3cdev %p with null ulp\n", __FUNCTION__,
+ t3cdev_p);
+ return 0;
+ }
+ if (CQE_QPID(rsp_msg->cqe) == T3_CTRL_QP_ID) {
+ rdev_p->ctrl_qp.rptr = CQE_WRID_LOW(rsp_msg->cqe) + 1;
+ wake_up_interruptible(&rdev_p->ctrl_qp.waitq);
+ dev_kfree_skb_irq(skb);
+ } else if (CQE_QPID(rsp_msg->cqe) == 0xfff8)
+ dev_kfree_skb_irq(skb);
+ else if (cxio_ev_cb)
+ (*cxio_ev_cb) (rdev_p, skb);
+ else
+ dev_kfree_skb_irq(skb);
+ cnt++;
+ return 0;
+}
+
+/* Caller takes care of locking if needed */
+int cxio_rdev_open(struct cxio_rdev *rdev_p)
+{
+ struct net_device *netdev_p = NULL;
+ int err = 0;
+ if (strlen(rdev_p->dev_name)) {
+ if (cxio_hal_find_rdev_by_name(rdev_p->dev_name)) {
+ return -EBUSY;
+ }
+ netdev_p = dev_get_by_name(rdev_p->dev_name);
+ if (!netdev_p) {
+ return -EINVAL;
+ }
+ dev_put(netdev_p);
+ } else if (rdev_p->t3cdev_p) {
+ if (cxio_hal_find_rdev_by_t3cdev(rdev_p->t3cdev_p)) {
+ return -EBUSY;
+ }
+ netdev_p = rdev_p->t3cdev_p->lldev;
+ strncpy(rdev_p->dev_name, rdev_p->t3cdev_p->name,
+ T3_MAX_DEV_NAME_LEN);
+ } else {
+ PDBG("%s t3cdev_p or dev_name must be set\n", __FUNCTION__);
+ return -EINVAL;
+ }
+
+ if (cxio_hal_add_rdev(rdev_p))
+ return -ENOMEM;
+
+ PDBG("%s opening rnic dev %s\n", __FUNCTION__, rdev_p->dev_name);
+ memset(&rdev_p->ctrl_qp, 0, sizeof(rdev_p->ctrl_qp));
+ if (!rdev_p->t3cdev_p)
+ rdev_p->t3cdev_p = T3CDEV(netdev_p);
+ rdev_p->t3cdev_p->ulp = (void *) rdev_p;
+ err = rdev_p->t3cdev_p->ctl(rdev_p->t3cdev_p, RDMA_GET_PARAMS,
+ &(rdev_p->rnic_info));
+ if (err) {
+ printk(KERN_ERR "%s t3cdev_p(%p)->ctl returned error %d.\n",
+ __FUNCTION__, rdev_p->t3cdev_p, err);
+ goto err1;
+ }
+ err = rdev_p->t3cdev_p->ctl(rdev_p->t3cdev_p, GET_PORTS,
+ &(rdev_p->port_info));
+ if (err) {
+ printk(KERN_ERR "%s t3cdev_p(%p)->ctl returned error %d.\n",
+ __FUNCTION__, rdev_p->t3cdev_p, err);
+ goto err1;
+ }
+
+ /*
+ * qpshift is the number of bits to shift the qpid left in order
+ * to get the correct address of the doorbell for that qp.
+ */
+ cxio_init_ucontext(rdev_p, &rdev_p->uctx);
+ rdev_p->qpshift = PAGE_SHIFT -
+ long_log2(65536 >>
+ long_log2(rdev_p->rnic_info.udbell_len >>
+ PAGE_SHIFT));
+ rdev_p->qpnr = rdev_p->rnic_info.udbell_len >> PAGE_SHIFT;
+ rdev_p->qpmask = (65536 >> long_log2(rdev_p->qpnr)) - 1;
+ PDBG("%s rnic %s info: tpt_base 0x%0x tpt_top 0x%0x num stags %d "
+ "pbl_base 0x%0x pbl_top 0x%0x rqt_base 0x%0x, rqt_top 0x%0x\n",
+ __FUNCTION__, rdev_p->dev_name, rdev_p->rnic_info.tpt_base,
+ rdev_p->rnic_info.tpt_top, cxio_num_stags(rdev_p),
+ rdev_p->rnic_info.pbl_base,
+ rdev_p->rnic_info.pbl_top, rdev_p->rnic_info.rqt_base,
+ rdev_p->rnic_info.rqt_top);
+ PDBG("udbell_len 0x%0x udbell_physbase 0x%lx kdb_addr %p qpshift %lu "
+ "qpnr %d qpmask 0x%x\n",
+ rdev_p->rnic_info.udbell_len,
+ rdev_p->rnic_info.udbell_physbase, rdev_p->rnic_info.kdb_addr,
+ rdev_p->qpshift, rdev_p->qpnr, rdev_p->qpmask);
+
+ err = cxio_hal_init_ctrl_qp(rdev_p);
+ if (err) {
+ printk(KERN_ERR "%s error %d initializing ctrl_qp.\n",
+ __FUNCTION__, err);
+ goto err1;
+ }
+ err = cxio_hal_init_resource(rdev_p, cxio_num_stags(rdev_p), 0,
+ 0, T3_MAX_NUM_QP, T3_MAX_NUM_CQ,
+ T3_MAX_NUM_PD);
+ if (err) {
+ printk(KERN_ERR "%s error %d initializing hal resources.\n",
+ __FUNCTION__, err);
+ goto err2;
+ }
+ err = cxio_hal_pblpool_create(rdev_p);
+ if (err) {
+ printk(KERN_ERR "%s error %d initializing pbl mem pool.\n",
+ __FUNCTION__, err);
+ goto err3;
+ }
+ err = cxio_hal_rqtpool_create(rdev_p);
+ if (err) {
+ printk(KERN_ERR "%s error %d initializing rqt mem pool.\n",
+ __FUNCTION__, err);
+ goto err4;
+ }
+ return 0;
+err4:
+ cxio_hal_pblpool_destroy(rdev_p);
+err3:
+ cxio_hal_destroy_resource(rdev_p->rscp);
+err2:
+ cxio_hal_destroy_ctrl_qp(rdev_p);
+err1:
+ cxio_hal_delete_rdev(rdev_p);
+ return err;
+}
+
+void cxio_rdev_close(struct cxio_rdev *rdev_p)
+{
+ if (rdev_p) {
+ cxio_hal_pblpool_destroy(rdev_p);
+ cxio_hal_rqtpool_destroy(rdev_p);
+ cxio_hal_delete_rdev(rdev_p);
+ rdev_p->t3cdev_p->ulp = NULL;
+ cxio_hal_destroy_ctrl_qp(rdev_p);
+ cxio_hal_destroy_resource(rdev_p->rscp);
+ }
+}
+
+int __init cxio_hal_init(void)
+{
+ if (cxio_hal_init_rhdl_resource(T3_MAX_NUM_RI))
+ return -ENOMEM;
+ memset(rdev_tbl, 0, T3_MAX_NUM_RNIC * sizeof(void *));
+ t3_register_cpl_handler(CPL_ASYNC_NOTIF, cxio_hal_ev_handler);
+ return 0;
+}
+
+void __exit cxio_hal_exit(void)
+{
+ int i;
+ t3_register_cpl_handler(CPL_ASYNC_NOTIF, NULL);
+ for (i = 0; i < T3_MAX_NUM_RNIC; i++)
+ cxio_rdev_close(rdev_tbl[i]);
+ cxio_hal_destroy_rhdl_resource();
+}
+
+static inline void flush_completed_wrs(struct t3_wq *wq, struct t3_cq *cq)
+{
+ struct t3_swsq *sqp;
+ __u32 ptr = wq->sq_rptr;
+ int count = Q_COUNT(wq->sq_rptr, wq->sq_wptr);
+
+ sqp = wq->sq + Q_PTR2IDX(ptr, wq->sq_size_log2);
+ while (count--)
+ if (!sqp->signaled) {
+ ptr++;
+ sqp = wq->sq + Q_PTR2IDX(ptr, wq->sq_size_log2);
+ } else if (sqp->complete) {
+
+ /*
+ * Insert this completed cqe into the swcq.
+ */
+ PDBG("%s moving cqe into swcq sq idx %ld cq idx %ld\n",
+ __FUNCTION__, Q_PTR2IDX(ptr, wq->sq_size_log2),
+ Q_PTR2IDX(cq->sw_wptr, cq->size_log2));
+ sqp->cqe.header |= htonl(V_CQE_SWCQE(1));
+ *(cq->sw_queue + Q_PTR2IDX(cq->sw_wptr, cq->size_log2))
+ = sqp->cqe;
+ cq->sw_wptr++;
+ sqp->signaled = 0;
+ break;
+ } else
+ break;
+}
+
+static inline void create_read_req_cqe(struct t3_wq *wq,
+ struct t3_cqe *hw_cqe,
+ struct t3_cqe *read_cqe)
+{
+ read_cqe->u.scqe.wrid_hi = wq->oldest_read->sq_wptr;
+ read_cqe->len = wq->oldest_read->read_len;
+ read_cqe->header = htonl(V_CQE_QPID(CQE_QPID(*hw_cqe)) |
+ V_CQE_SWCQE(SW_CQE(*hw_cqe)) |
+ V_CQE_OPCODE(T3_READ_REQ) |
+ V_CQE_TYPE(1));
+}
+
+/*
+ * Return a ptr to the next read wr in the SWSQ or NULL.
+ */
+static inline void advance_oldest_read(struct t3_wq *wq)
+{
+
+ u32 rptr = wq->oldest_read - wq->sq + 1;
+ u32 wptr = Q_PTR2IDX(wq->sq_wptr, wq->sq_size_log2);
+
+ while (Q_PTR2IDX(rptr, wq->sq_size_log2) != wptr) {
+ wq->oldest_read = wq->sq + Q_PTR2IDX(rptr, wq->sq_size_log2);
+
+ if (wq->oldest_read->opcode == T3_READ_REQ)
+ return;
+ rptr++;
+ }
+ wq->oldest_read = NULL;
+}
+
+/*
+ * cxio_poll_cq
+ *
+ * Caller must:
+ * check the validity of the first CQE,
+ * supply the wq assicated with the qpid.
+ *
+ * credit: cq credit to return to sge.
+ * cqe_flushed: 1 iff the CQE is flushed.
+ * cqe: copy of the polled CQE.
+ *
+ * return value:
+ * 0 CQE returned,
+ * -1 CQE skipped, try again.
+ */
+int cxio_poll_cq(struct t3_wq *wq, struct t3_cq *cq, struct t3_cqe *cqe,
+ u8 *cqe_flushed, u64 *cookie, u32 *credit)
+{
+ int ret = 0;
+ struct t3_cqe *hw_cqe, read_cqe;
+
+ *cqe_flushed = 0;
+ *credit = 0;
+ hw_cqe = cxio_next_cqe(cq);
+
+ PDBG("%s CQE OOO %d qpid 0x%0x genbit %d type %d status 0x%0x"
+ " opcode 0x%0x len 0x%0x wrid_hi_stag 0x%x wrid_low_msn 0x%x\n",
+ __FUNCTION__, CQE_OOO(*hw_cqe), CQE_QPID(*hw_cqe),
+ CQE_GENBIT(*hw_cqe), CQE_TYPE(*hw_cqe), CQE_STATUS(*hw_cqe),
+ CQE_OPCODE(*hw_cqe), CQE_LEN(*hw_cqe), CQE_WRID_HI(*hw_cqe),
+ CQE_WRID_LOW(*hw_cqe));
+
+ /*
+ * skip cqe's not affiliated with a QP.
+ */
+ if (wq == NULL) {
+ ret = -1;
+ goto skip_cqe;
+ }
+
+ /*
+ * Gotta tweak READ completions:
+ * 1) the cqe doesn't contain the sq_wptr from the wr.
+ * 2) opcode not reflected from the wr.
+ * 3) read_len not reflected from the wr.
+ * 4) cq_type is RQ_TYPE not SQ_TYPE.
+ */
+ if (RQ_TYPE(*hw_cqe) && (CQE_OPCODE(*hw_cqe) == T3_READ_RESP)) {
+
+ /*
+ * Don't write to the HWCQ, so create a new read req CQE
+ * in local memory.
+ */
+ create_read_req_cqe(wq, hw_cqe, &read_cqe);
+ hw_cqe = &read_cqe;
+ advance_oldest_read(wq);
+ }
+
+ /*
+ * T3A: Discard TERMINATE CQEs.
+ */
+ if (CQE_OPCODE(*hw_cqe) == T3_TERMINATE) {
+ ret = -1;
+ wq->error = 1;
+ goto skip_cqe;
+ }
+
+ if (CQE_STATUS(*hw_cqe) || wq->error) {
+ *cqe_flushed = wq->error;
+ wq->error = 1;
+
+ /*
+ * T3A inserts errors into the CQE. We cannot return
+ * these as work completions.
+ */
+ /* incoming write failures */
+ if ((CQE_OPCODE(*hw_cqe) == T3_RDMA_WRITE)
+ && RQ_TYPE(*hw_cqe)) {
+ ret = -1;
+ goto skip_cqe;
+ }
+ /* incoming read request failures */
+ if ((CQE_OPCODE(*hw_cqe) == T3_READ_RESP) && SQ_TYPE(*hw_cqe)) {
+ ret = -1;
+ goto skip_cqe;
+ }
+
+ /* incoming SEND with no receive posted failures */
+ if ((CQE_OPCODE(*hw_cqe) == T3_SEND) && RQ_TYPE(*hw_cqe) &&
+ Q_EMPTY(wq->rq_rptr, wq->rq_wptr)) {
+ ret = -1;
+ goto skip_cqe;
+ }
+ goto proc_cqe;
+ }
+
+ /*
+ * RECV completion.
+ */
+ if (RQ_TYPE(*hw_cqe)) {
+
+ /*
+ * HW only validates 4 bits of MSN. So we must validate that
+ * the MSN in the SEND is the next expected MSN. If its not,
+ * then we complete this with TPT_ERR_MSN and mark the wq in
+ * error.
+ */
+ if (unlikely((CQE_WRID_MSN(*hw_cqe) != (wq->rq_rptr + 1)))) {
+ wq->error = 1;
+ hw_cqe->header |= htonl(V_CQE_STATUS(TPT_ERR_MSN));
+ goto proc_cqe;
+ }
+ goto proc_cqe;
+ }
+
+ /*
+ * If we get here its a send completion.
+ *
+ * Handle out of order completion. These get stuffed
+ * in the SW SQ. Then the SW SQ is walked to move any
+ * now in-order completions into the SW CQ. This handles
+ * 2 cases:
+ * 1) reaping unsignaled WRs when the first subsequent
+ * signaled WR is completed.
+ * 2) out of order read completions.
+ */
+ if (!SW_CQE(*hw_cqe) && (CQE_WRID_SQ_WPTR(*hw_cqe) != wq->sq_rptr)) {
+ struct t3_swsq *sqp;
+
+ PDBG("%s out of order completion going in swsq at idx %ld\n",
+ __FUNCTION__,
+ Q_PTR2IDX(CQE_WRID_SQ_WPTR(*hw_cqe), wq->sq_size_log2));
+ sqp = wq->sq +
+ Q_PTR2IDX(CQE_WRID_SQ_WPTR(*hw_cqe), wq->sq_size_log2);
+ sqp->cqe = *hw_cqe;
+ sqp->complete = 1;
+ ret = -1;
+ goto flush_wq;
+ }
+
+proc_cqe:
+ *cqe = *hw_cqe;
+
+ /*
+ * Reap the associated WR(s) that are freed up with this
+ * completion.
+ */
+ if (SQ_TYPE(*hw_cqe)) {
+ wq->sq_rptr = CQE_WRID_SQ_WPTR(*hw_cqe);
+ PDBG("%s completing sq idx %ld\n", __FUNCTION__,
+ Q_PTR2IDX(wq->sq_rptr, wq->sq_size_log2));
+ *cookie = (wq->sq +
+ Q_PTR2IDX(wq->sq_rptr, wq->sq_size_log2))->wr_id;
+ wq->sq_rptr++;
+ } else {
+ PDBG("%s completing rq idx %ld\n", __FUNCTION__,
+ Q_PTR2IDX(wq->rq_rptr, wq->rq_size_log2));
+ *cookie = *(wq->rq + Q_PTR2IDX(wq->rq_rptr, wq->rq_size_log2));
+ wq->rq_rptr++;
+ }
+
+flush_wq:
+ /*
+ * Flush any completed cqes that are now in-order.
+ */
+ flush_completed_wrs(wq, cq);
+
+skip_cqe:
+ if (SW_CQE(*hw_cqe)) {
+ PDBG("%s cq %p cqid 0x%x skip sw cqe sw_rptr 0x%x\n",
+ __FUNCTION__, cq, cq->cqid, cq->sw_rptr);
+ ++cq->sw_rptr;
+ } else {
+ PDBG("%s cq %p cqid 0x%x skip hw cqe rptr 0x%x\n",
+ __FUNCTION__, cq, cq->cqid, cq->rptr);
+ ++cq->rptr;
+
+ /*
+ * T3A: compute credits.
+ */
+ if (((cq->rptr - cq->wptr) > (1 << (cq->size_log2 - 1)))
+ || ((cq->rptr - cq->wptr) >= 128)) {
+ *credit = cq->rptr - cq->wptr;
+ cq->wptr = cq->rptr;
+ }
+ }
+ return ret;
+}
diff --git a/drivers/infiniband/hw/cxgb3/core/cxio_hal.h b/drivers/infiniband/hw/cxgb3/core/cxio_hal.h
new file mode 100644
index 0000000..bde5cfb
--- /dev/null
+++ b/drivers/infiniband/hw/cxgb3/core/cxio_hal.h
@@ -0,0 +1,201 @@
+/*
+ * Copyright (c) 2006 Chelsio, Inc. All rights reserved.
+ * Copyright (c) 2006 Open Grid Computing, Inc. All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses. You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ * - Redistributions of source code must retain the above
+ * copyright notice, this list of conditions and the following
+ * disclaimer.
+ *
+ * - Redistributions in binary form must reproduce the above
+ * copyright notice, this list of conditions and the following
+ * disclaimer in the documentation and/or other materials
+ * provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+#ifndef __CXIO_HAL_H__
+#define __CXIO_HAL_H__
+
+#include <linux/list.h>
+#include <linux/mutex.h>
+
+#include "t3_cpl.h"
+#include "t3cdev.h"
+#include "cxgb3_ctl_defs.h"
+#include "cxio_wr.h"
+
+#define T3_CTRL_QP_ID FW_RI_SGEEC_START
+#define T3_CTL_QP_TID FW_RI_TID_START
+#define T3_CTRL_QP_SIZE_LOG2 8
+#define T3_CTRL_CQ_ID 0
+
+/* TBD */
+#define T3_MAX_NUM_RNIC 8
+#define T3_MAX_NUM_RI (1<<15)
+#define T3_MAX_NUM_QP (1<<15)
+#define T3_MAX_NUM_CQ (1<<15)
+#define T3_MAX_NUM_PD (1<<15)
+#define T3_MAX_PBL_SIZE 256
+#define T3_MAX_RQ_SIZE 1024
+#define T3_MAX_NUM_STAG (1<<15)
+
+#define T3_STAG_UNSET 0xffffffff
+
+#define T3_MAX_DEV_NAME_LEN 32
+
+struct cxio_hal_ctrl_qp {
+ u32 wptr;
+ u32 rptr;
+ struct semaphore sem; /* for the wtpr, can sleep */
+ wait_queue_head_t waitq; /* wait for RspQ/CQE msg */
+ union t3_wr *workq; /* the work request queue */
+ dma_addr_t dma_addr; /* pci bus address of the workq */
+ DECLARE_PCI_UNMAP_ADDR(mapping)
+ void __iomem *doorbell;
+};
+
+struct cxio_hal_resource {
+ struct kfifo *tpt_fifo;
+ spinlock_t tpt_fifo_lock;
+ struct kfifo *qpid_fifo;
+ spinlock_t qpid_fifo_lock;
+ struct kfifo *cqid_fifo;
+ spinlock_t cqid_fifo_lock;
+ struct kfifo *pdid_fifo;
+ spinlock_t pdid_fifo_lock;
+};
+
+struct cxio_qpid_list {
+ struct list_head entry;
+ u32 qpid;
+};
+
+struct cxio_ucontext {
+ struct list_head qpids;
+ struct mutex lock;
+};
+
+struct cxio_rdev {
+ char dev_name[T3_MAX_DEV_NAME_LEN];
+ struct t3cdev *t3cdev_p;
+ struct rdma_info rnic_info;
+ struct adap_ports port_info;
+ struct cxio_hal_resource *rscp;
+ struct cxio_hal_ctrl_qp ctrl_qp;
+ void *ulp;
+ unsigned long qpshift;
+ u32 qpnr;
+ u32 qpmask;
+ struct cxio_ucontext uctx;
+ struct gen_pool *pbl_pool;
+ struct gen_pool *rqt_pool;
+};
+
+static inline int cxio_num_stags(struct cxio_rdev *rdev_p)
+{
+ return min((int)T3_MAX_NUM_STAG, (int)((rdev_p->rnic_info.tpt_top - rdev_p->rnic_info.tpt_base) >> 5));
+}
+
+typedef void (*cxio_hal_ev_callback_func_t) (struct cxio_rdev * rdev_p,
+ struct sk_buff * skb);
+
+#define RSPQ_CQID(rsp) (be32_to_cpu(rsp->cq_ptrid) & 0xffff)
+#define RSPQ_CQPTR(rsp) ((be32_to_cpu(rsp->cq_ptrid) >> 16) & 0xffff)
+#define RSPQ_GENBIT(rsp) ((be32_to_cpu(rsp->flags) >> 16) & 1)
+#define RSPQ_OVERFLOW(rsp) ((be32_to_cpu(rsp->flags) >> 17) & 1)
+#define RSPQ_AN(rsp) ((be32_to_cpu(rsp->flags) >> 18) & 1)
+#define RSPQ_SE(rsp) ((be32_to_cpu(rsp->flags) >> 19) & 1)
+#define RSPQ_NOTIFY(rsp) ((be32_to_cpu(rsp->flags) >> 20) & 1)
+#define RSPQ_CQBRANCH(rsp) ((be32_to_cpu(rsp->flags) >> 21) & 1)
+#define RSPQ_CREDIT_THRESH(rsp) ((be32_to_cpu(rsp->flags) >> 22) & 1)
+
+struct respQ_msg_t {
+ __be32 flags; /* flit 0 */
+ __be32 cq_ptrid;
+ __be64 rsvd; /* flit 1 */
+ struct t3_cqe cqe; /* flits 2-3 */
+};
+
+enum t3_cq_opcode {
+ CQ_ARM_AN = 0x2,
+ CQ_ARM_SE = 0x6,
+ CQ_FORCE_AN = 0x3,
+ CQ_CREDIT_UPDATE = 0x7
+};
+
+int cxio_rdev_open(struct cxio_rdev *rdev);
+void cxio_rdev_close(struct cxio_rdev *rdev);
+int cxio_hal_cq_op(struct cxio_rdev *rdev, struct t3_cq *cq,
+ enum t3_cq_opcode op, u32 credit);
+int cxio_hal_clear_qp_ctx(struct cxio_rdev *rdev, u32 qpid);
+int cxio_create_cq(struct cxio_rdev *rdev, struct t3_cq *cq);
+int cxio_destroy_cq(struct cxio_rdev *rdev, struct t3_cq *cq);
+int cxio_resize_cq(struct cxio_rdev *rdev, struct t3_cq *cq);
+void cxio_release_ucontext(struct cxio_rdev *rdev, struct cxio_ucontext *uctx);
+void cxio_init_ucontext(struct cxio_rdev *rdev, struct cxio_ucontext *uctx);
+int cxio_create_qp(struct cxio_rdev *rdev, u32 kernel_domain, struct t3_wq *wq,
+ struct cxio_ucontext *uctx);
+int cxio_destroy_qp(struct cxio_rdev *rdev, struct t3_wq *wq,
+ struct cxio_ucontext *uctx);
+int cxio_peek_cq(struct t3_wq *wr, struct t3_cq *cq, int opcode);
+int cxio_allocate_stag(struct cxio_rdev *rdev, u32 * stag, u32 pdid,
+ enum tpt_mem_perm perm, u32 * pbl_size, u32 * pbl_addr);
+int cxio_register_phys_mem(struct cxio_rdev *rdev, u32 * stag, u32 pdid,
+ enum tpt_mem_perm perm, u32 zbva, u64 to, u32 len,
+ u8 page_size, __be64 *pbl, u32 *pbl_size,
+ u32 *pbl_addr);
+int cxio_reregister_phys_mem(struct cxio_rdev *rdev, u32 * stag, u32 pdid,
+ enum tpt_mem_perm perm, u32 zbva, u64 to, u32 len,
+ u8 page_size, __be64 *pbl, u32 *pbl_size,
+ u32 *pbl_addr);
+int cxio_dereg_mem(struct cxio_rdev *rdev, u32 stag, u32 pbl_size,
+ u32 pbl_addr);
+int cxio_allocate_window(struct cxio_rdev *rdev, u32 * stag, u32 pdid);
+int cxio_deallocate_window(struct cxio_rdev *rdev, u32 stag);
+int cxio_rdma_init(struct cxio_rdev *rdev, struct t3_rdma_init_attr *attr);
+void cxio_register_ev_cb(cxio_hal_ev_callback_func_t ev_cb);
+void cxio_unregister_ev_cb(cxio_hal_ev_callback_func_t ev_cb);
+u32 cxio_hal_get_rhdl(void);
+void cxio_hal_put_rhdl(u32 rhdl);
+u32 cxio_hal_get_pdid(struct cxio_hal_resource *rscp);
+void cxio_hal_put_pdid(struct cxio_hal_resource *rscp, u32 pdid);
+int __init cxio_hal_init(void);
+void __exit cxio_hal_exit(void);
+void cxio_flush_rq(struct t3_wq *wq, struct t3_cq *cq, int count);
+void cxio_flush_sq(struct t3_wq *wq, struct t3_cq *cq, int count);
+void cxio_count_rcqes(struct t3_cq *cq, struct t3_wq *wq, int *count);
+void cxio_count_scqes(struct t3_cq *cq, struct t3_wq *wq, int *count);
+void cxio_flush_hw_cq(struct t3_cq *cq);
+int cxio_poll_cq(struct t3_wq *wq, struct t3_cq *cq, struct t3_cqe *cqe,
+ u8 *cqe_flushed, u64 *cookie, u32 *credit);
+
+#define MOD "iw_cxgb3: "
+#define PDBG(fmt, args...) pr_debug(MOD fmt, ## args)
+
+#ifdef DEBUG
+void cxio_dump_tpt(struct cxio_rdev *rev, u32 stag);
+void cxio_dump_pbl(struct cxio_rdev *rev, u32 pbl_addr, uint len, u8 shift);
+void cxio_dump_wqe(union t3_wr *wqe);
+void cxio_dump_wce(struct t3_cqe *wce);
+void cxio_dump_rqt(struct cxio_rdev *rdev, u32 hwtid, int nents);
+void cxio_dump_tcb(struct cxio_rdev *rdev, u32 hwtid);
+#endif
+
+#endif

2006-12-02 22:52:56

by Steve Wise

[permalink] [raw]
Subject: [PATCH v2 12/13] Core Debug functions


Debug code to dump various data structs, some of which are in
adapter memory.

Signed-off-by: Steve Wise <[email protected]>
---

drivers/infiniband/hw/cxgb3/core/cxio_dbg.c | 205 +++++++++++++++++++++++++++
1 files changed, 205 insertions(+), 0 deletions(-)

diff --git a/drivers/infiniband/hw/cxgb3/core/cxio_dbg.c b/drivers/infiniband/hw/cxgb3/core/cxio_dbg.c
new file mode 100644
index 0000000..22f4f75
--- /dev/null
+++ b/drivers/infiniband/hw/cxgb3/core/cxio_dbg.c
@@ -0,0 +1,205 @@
+/*
+ * Copyright (c) 2006 Chelsio, Inc. All rights reserved.
+ * Copyright (c) 2006 Open Grid Computing, Inc. All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses. You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ * - Redistributions of source code must retain the above
+ * copyright notice, this list of conditions and the following
+ * disclaimer.
+ *
+ * - Redistributions in binary form must reproduce the above
+ * copyright notice, this list of conditions and the following
+ * disclaimer in the documentation and/or other materials
+ * provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+#ifdef DEBUG
+#include <linux/types.h>
+#include "common.h"
+#include "cxgb3_ioctl.h"
+#include "cxio_hal.h"
+#include "cxio_wr.h"
+
+void cxio_dump_tpt(struct cxio_rdev *rdev, u32 stag)
+{
+ struct ch_mem_range *m;
+ u64 *data;
+ int rc;
+ int size = 32;
+
+ m = kmalloc(sizeof(*m) + size, GFP_ATOMIC);
+ if (!m) {
+ PDBG("%s couldn't allocate memory.\n", __FUNCTION__);
+ return;
+ }
+ m->mem_id = MEM_PMRX;
+ m->addr = (stag>>8) * 32 + rdev->rnic_info.tpt_base;
+ m->len = size;
+ PDBG("%s TPT addr 0x%x len %d\n", __FUNCTION__, m->addr, m->len);
+ rc = rdev->t3cdev_p->ctl(rdev->t3cdev_p, RDMA_GET_MEM, m);
+ if (rc) {
+ PDBG("%s toectl returned error %d\n", __FUNCTION__, rc);
+ kfree(m);
+ return;
+ }
+
+ data = (u64 *)m->buf;
+ while (size > 0) {
+ PDBG("TPT %08x: %016llx\n", m->addr, (u64)*data);
+ size -= 8;
+ data++;
+ m->addr += 8;
+ }
+ kfree(m);
+}
+
+void cxio_dump_pbl(struct cxio_rdev *rdev, u32 pbl_addr, uint len, u8 shift)
+{
+ struct ch_mem_range *m;
+ u64 *data;
+ int rc;
+ int size, npages;
+
+ shift += 12;
+ npages = (len + (1ULL << shift) - 1) >> shift;
+ size = npages * sizeof(u64);
+
+ m = kmalloc(sizeof(*m) + size, GFP_ATOMIC);
+ if (!m) {
+ PDBG("%s couldn't allocate memory.\n", __FUNCTION__);
+ return;
+ }
+ m->mem_id = MEM_PMRX;
+ m->addr = pbl_addr;
+ m->len = size;
+ PDBG("%s PBL addr 0x%x len %d depth %d\n",
+ __FUNCTION__, m->addr, m->len, npages);
+ rc = rdev->t3cdev_p->ctl(rdev->t3cdev_p, RDMA_GET_MEM, m);
+ if (rc) {
+ PDBG("%s toectl returned error %d\n", __FUNCTION__, rc);
+ kfree(m);
+ return;
+ }
+
+ data = (u64 *)m->buf;
+ while (size > 0) {
+ PDBG("PBL %08x: %016llx\n", m->addr, (u64)*data);
+ size -= 8;
+ data++;
+ m->addr += 8;
+ }
+ kfree(m);
+}
+
+void cxio_dump_wqe(union t3_wr *wqe)
+{
+ __be64 *data = (__be64 *)wqe;
+ uint size = (uint)(be64_to_cpu(*data) & 0xff);
+
+ if (size == 0)
+ size = 8;
+ while (size > 0) {
+ PDBG("WQE %p: %016llx\n", data, be64_to_cpu(*data));
+ size--;
+ data++;
+ }
+}
+
+void cxio_dump_wce(struct t3_cqe *wce)
+{
+ __be64 *data = (__be64 *)wce;
+ int size = sizeof(*wce);
+
+ while (size > 0) {
+ PDBG("WCE %p: %016llx\n", data, be64_to_cpu(*data));
+ size -= 8;
+ data++;
+ }
+}
+
+void cxio_dump_rqt(struct cxio_rdev *rdev, u32 hwtid, int nents)
+{
+ struct ch_mem_range *m;
+ int size = nents * 64;
+ u64 *data;
+ int rc;
+
+ m = kmalloc(sizeof(*m) + size, GFP_ATOMIC);
+ if (!m) {
+ PDBG("%s couldn't allocate memory.\n", __FUNCTION__);
+ return;
+ }
+ m->mem_id = MEM_PMRX;
+ m->addr = ((hwtid)<<10) + rdev->rnic_info.rqt_base;
+ m->len = size;
+ PDBG("%s RQT addr 0x%x len %d\n", __FUNCTION__, m->addr, m->len);
+ rc = rdev->t3cdev_p->ctl(rdev->t3cdev_p, RDMA_GET_MEM, m);
+ if (rc) {
+ PDBG("%s toectl returned error %d\n", __FUNCTION__, rc);
+ kfree(m);
+ return;
+ }
+
+ data = (u64 *)m->buf;
+ while (size > 0) {
+ PDBG("RQT %08x: %016llx\n", m->addr, (u64)*data);
+ size -= 8;
+ data++;
+ m->addr += 8;
+ }
+ kfree(m);
+}
+
+void cxio_dump_tcb(struct cxio_rdev *rdev, u32 hwtid)
+{
+ struct ch_mem_range *m;
+ int size = TCB_SIZE;
+ u32 *data;
+ int rc;
+
+ m = kmalloc(sizeof(*m) + size, GFP_ATOMIC);
+ if (!m) {
+ PDBG("%s couldn't allocate memory.\n", __FUNCTION__);
+ return;
+ }
+ m->mem_id = MEM_CM;
+ m->addr = hwtid * size;
+ m->len = size;
+ PDBG("%s TCB %d len %d\n", __FUNCTION__, m->addr, m->len);
+ rc = rdev->t3cdev_p->ctl(rdev->t3cdev_p, RDMA_GET_MEM, m);
+ if (rc) {
+ PDBG("%s toectl returned error %d\n", __FUNCTION__, rc);
+ kfree(m);
+ return;
+ }
+
+ data = (u32 *)m->buf;
+ while (size > 0) {
+ printk("%2u: %08x %08x %08x %08x %08x %08x %08x %08x\n",
+ m->addr,
+ *(data+2), *(data+3), *(data),*(data+1),
+ *(data+6), *(data+7), *(data+4), *(data+5));
+ size -= 32;
+ data += 8;
+ m->addr += 32;
+ }
+ kfree(m);
+}
+#endif

2006-12-02 22:51:15

by Steve Wise

[permalink] [raw]
Subject: [PATCH v2 06/13] Completion Queues


Functions to manipulate CQs.

Signed-off-by: Steve Wise <[email protected]>
---

drivers/infiniband/hw/cxgb3/iwch_cq.c | 231 +++++++++++++++++++++++++++++++++
1 files changed, 231 insertions(+), 0 deletions(-)

diff --git a/drivers/infiniband/hw/cxgb3/iwch_cq.c b/drivers/infiniband/hw/cxgb3/iwch_cq.c
new file mode 100644
index 0000000..9d82df4
--- /dev/null
+++ b/drivers/infiniband/hw/cxgb3/iwch_cq.c
@@ -0,0 +1,231 @@
+/*
+ * Copyright (c) 2006 Chelsio, Inc. All rights reserved.
+ * Copyright (c) 2006 Open Grid Computing, Inc. All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses. You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ * - Redistributions of source code must retain the above
+ * copyright notice, this list of conditions and the following
+ * disclaimer.
+ *
+ * - Redistributions in binary form must reproduce the above
+ * copyright notice, this list of conditions and the following
+ * disclaimer in the documentation and/or other materials
+ * provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+#include "iwch_provider.h"
+#include "iwch.h"
+
+/*
+ * Get one cq entry from cxio and map it to openib.
+ *
+ * Returns:
+ * 0 EMPTY;
+ * 1 cqe returned
+ * -EAGAIN caller must try again
+ * any other -errno fatal error
+ */
+int iwch_poll_cq_one(struct iwch_dev *rhp, struct iwch_cq *chp,
+ struct ib_wc *wc)
+{
+ struct iwch_qp *qhp = NULL;
+ struct t3_cqe cqe, *rd_cqe;
+ struct t3_wq *wq;
+ u32 credit = 0;
+ u8 cqe_flushed;
+ u64 cookie;
+ int ret = 1;
+
+ rd_cqe = cxio_next_cqe(&chp->cq);
+
+ if (!rd_cqe)
+ return 0;
+
+ qhp = get_qhp(rhp, CQE_QPID(*rd_cqe));
+ if (!qhp)
+ wq = NULL;
+ else {
+ spin_lock(&qhp->lock);
+ wq = &(qhp->wq);
+ }
+ ret = cxio_poll_cq(wq, &(chp->cq), &cqe, &cqe_flushed, &cookie,
+ &credit);
+ if (t3a_device(chp->rhp) && credit) {
+ PDBG("%s updating %d cq credits on id %d\n", __FUNCTION__,
+ credit, chp->cq.cqid);
+ cxio_hal_cq_op(&rhp->rdev, &chp->cq, CQ_CREDIT_UPDATE, credit);
+ }
+
+ if (ret) {
+ ret = -EAGAIN;
+ goto out;
+ }
+ ret = 1;
+
+ wc->wr_id = cookie;
+ wc->qp_num = qhp->wq.qpid;
+ wc->vendor_err = CQE_STATUS(cqe);
+
+ PDBG("%s qpid 0x%x type %d opcode %d status 0x%x wrid hi 0x%x "
+ "lo 0x%x cookie 0x%llx\n", __FUNCTION__,
+ CQE_QPID(cqe), CQE_TYPE(cqe),
+ CQE_OPCODE(cqe), CQE_STATUS(cqe), CQE_WRID_HI(cqe),
+ CQE_WRID_LOW(cqe), cookie);
+
+ if (CQE_TYPE(cqe) == 0) {
+ if (!CQE_STATUS(cqe))
+ wc->byte_len = CQE_LEN(cqe);
+ else
+ wc->byte_len = 0;
+ wc->opcode = IB_WC_RECV;
+ } else {
+ switch (CQE_OPCODE(cqe)) {
+ case T3_RDMA_WRITE:
+ wc->opcode = IB_WC_RDMA_WRITE;
+ break;
+ case T3_READ_REQ:
+ wc->opcode = IB_WC_RDMA_READ;
+ wc->byte_len = CQE_LEN(cqe);
+ break;
+ case T3_SEND:
+ case T3_SEND_WITH_SE:
+ wc->opcode = IB_WC_SEND;
+ break;
+ case T3_BIND_MW:
+ wc->opcode = IB_WC_BIND_MW;
+ break;
+
+ /* these aren't supported yet */
+ case T3_SEND_WITH_INV:
+ case T3_SEND_WITH_SE_INV:
+ case T3_LOCAL_INV:
+ case T3_FAST_REGISTER:
+ default:
+ printk(KERN_ERR MOD "Unexpected opcode %d "
+ "in the CQE received for QPID=0x%0x\n",
+ CQE_OPCODE(cqe), CQE_QPID(cqe));
+ ret = -EINVAL;
+ goto out;
+ }
+ }
+
+ if (cqe_flushed)
+ wc->status = IB_WC_WR_FLUSH_ERR;
+ else {
+
+ switch (CQE_STATUS(cqe)) {
+ case TPT_ERR_SUCCESS:
+ wc->status = IB_WC_SUCCESS;
+ break;
+ case TPT_ERR_STAG:
+ wc->status = IB_WC_LOC_ACCESS_ERR;
+ break;
+ case TPT_ERR_PDID:
+ wc->status = IB_WC_LOC_PROT_ERR;
+ break;
+ case TPT_ERR_QPID:
+ case TPT_ERR_ACCESS:
+ wc->status = IB_WC_LOC_ACCESS_ERR;
+ break;
+ case TPT_ERR_WRAP:
+ wc->status = IB_WC_GENERAL_ERR;
+ break;
+ case TPT_ERR_BOUND:
+ wc->status = IB_WC_LOC_LEN_ERR;
+ break;
+ case TPT_ERR_INVALIDATE_SHARED_MR:
+ case TPT_ERR_INVALIDATE_MR_WITH_MW_BOUND:
+ wc->status = IB_WC_MW_BIND_ERR;
+ break;
+ case TPT_ERR_CRC:
+ case TPT_ERR_MARKER:
+ case TPT_ERR_PDU_LEN_ERR:
+ case TPT_ERR_OUT_OF_RQE:
+ case TPT_ERR_DDP_VERSION:
+ case TPT_ERR_RDMA_VERSION:
+ case TPT_ERR_DDP_QUEUE_NUM:
+ case TPT_ERR_MSN:
+ case TPT_ERR_TBIT:
+ case TPT_ERR_MO:
+ case TPT_ERR_MSN_RANGE:
+ case TPT_ERR_IRD_OVERFLOW:
+ case TPT_ERR_OPCODE:
+ wc->status = IB_WC_FATAL_ERR;
+ break;
+ case TPT_ERR_SWFLUSH:
+ wc->status = IB_WC_WR_FLUSH_ERR;
+ break;
+ default:
+ printk(KERN_ERR MOD "Unexpected cqe_status 0x%x for "
+ "QPID=0x%0x\n", CQE_STATUS(cqe), CQE_QPID(cqe));
+ ret = -EINVAL;
+ }
+ }
+out:
+ if (wq)
+ spin_unlock(&qhp->lock);
+ return ret;
+}
+
+int iwch_poll_cq(struct ib_cq *ibcq, int num_entries, struct ib_wc *wc)
+{
+ struct iwch_dev *rhp;
+ struct iwch_cq *chp;
+ unsigned long flags;
+ int npolled;
+ int err = 0;
+
+ chp = to_iwch_cq(ibcq);
+ rhp = chp->rhp;
+
+ spin_lock_irqsave(&chp->lock, flags);
+ for (npolled = 0; npolled < num_entries; ++npolled) {
+#ifdef DEBUG
+ int i=0;
+#endif
+
+ /*
+ * Because T3 can post CQEs that are _not_ associated
+ * with a WR, we might have to poll again after removing
+ * one of these.
+ */
+ do {
+ err = iwch_poll_cq_one(rhp, chp, wc + npolled);
+#ifdef DEBUG
+ BUG_ON(++i > 1000);
+#endif
+ } while (err == -EAGAIN);
+ if (err <= 0)
+ break;
+ }
+ spin_unlock_irqrestore(&chp->lock, flags);
+
+ if (err < 0)
+ return err;
+ else {
+ return npolled;
+ }
+}
+
+int iwch_modify_cq(struct ib_cq *cq, int cqe)
+{
+ PDBG("iwch_modify_cq: TBD\n");
+ return 0;
+}

2006-12-02 22:53:47

by Steve Wise

[permalink] [raw]
Subject: [PATCH v2 08/13] Memory Registration


Functions to register memory regions.

Signed-off-by: Steve Wise <[email protected]>
---

drivers/infiniband/hw/cxgb3/iwch_mem.c | 170 ++++++++++++++++++++++++++++++++
1 files changed, 170 insertions(+), 0 deletions(-)

diff --git a/drivers/infiniband/hw/cxgb3/iwch_mem.c b/drivers/infiniband/hw/cxgb3/iwch_mem.c
new file mode 100644
index 0000000..774d11e
--- /dev/null
+++ b/drivers/infiniband/hw/cxgb3/iwch_mem.c
@@ -0,0 +1,170 @@
+/*
+ * Copyright (c) 2006 Chelsio, Inc. All rights reserved.
+ * Copyright (c) 2006 Open Grid Computing, Inc. All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses. You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ * - Redistributions of source code must retain the above
+ * copyright notice, this list of conditions and the following
+ * disclaimer.
+ *
+ * - Redistributions in binary form must reproduce the above
+ * copyright notice, this list of conditions and the following
+ * disclaimer in the documentation and/or other materials
+ * provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+#include <asm/byteorder.h>
+
+#include <rdma/iw_cm.h>
+#include <rdma/ib_verbs.h>
+
+#include "cxio_hal.h"
+#include "iwch.h"
+#include "iwch_provider.h"
+
+int iwch_register_mem(struct iwch_dev *rhp, struct iwch_pd *php,
+ struct iwch_mr *mhp,
+ int shift,
+ __be64 *page_list)
+{
+ u32 stag;
+ u32 mmid;
+
+
+ if (cxio_register_phys_mem(&rhp->rdev,
+ &stag, mhp->attr.pdid,
+ mhp->attr.perms,
+ mhp->attr.zbva,
+ mhp->attr.va_fbo,
+ mhp->attr.len,
+ shift-12,
+ page_list,
+ &mhp->attr.pbl_size, &mhp->attr.pbl_addr))
+ return -ENOMEM;
+ mhp->attr.state = 1;
+ mhp->attr.stag = stag;
+ mmid = stag >> 8;
+ mhp->ibmr.rkey = mhp->ibmr.lkey = stag;
+ insert_handle(rhp, &rhp->mmidr, mhp, mmid);
+ PDBG("%s mmid 0x%x mhp %p\n", __FUNCTION__, mmid, mhp);
+ return 0;
+}
+
+int iwch_reregister_mem(struct iwch_dev *rhp, struct iwch_pd *php,
+ struct iwch_mr *mhp,
+ int shift,
+ __be64 *page_list,
+ int npages)
+{
+ u32 stag;
+ u32 mmid;
+
+
+ /* We could support this... */
+ if (npages > mhp->attr.pbl_size)
+ return -ENOMEM;
+
+ stag = mhp->attr.stag;
+ if (cxio_reregister_phys_mem(&rhp->rdev,
+ &stag, mhp->attr.pdid,
+ mhp->attr.perms,
+ mhp->attr.zbva,
+ mhp->attr.va_fbo,
+ mhp->attr.len,
+ shift-12,
+ page_list,
+ &mhp->attr.pbl_size, &mhp->attr.pbl_addr))
+ return -ENOMEM;
+ mhp->attr.state = 1;
+ mhp->attr.stag = stag;
+ mmid = stag >> 8;
+ mhp->ibmr.rkey = mhp->ibmr.lkey = stag;
+ insert_handle(rhp, &rhp->mmidr, mhp, mmid);
+ PDBG("%s mmid 0x%x mhp %p\n", __FUNCTION__, mmid, mhp);
+ return 0;
+}
+
+int build_phys_page_list(struct ib_phys_buf *buffer_list,
+ int num_phys_buf,
+ u64 *iova_start,
+ u64 *total_size,
+ int *npages,
+ int *shift,
+ __be64 **page_list)
+{
+ u64 mask;
+ int i, j, n;
+
+ mask = 0;
+ *total_size = 0;
+ for (i = 0; i < num_phys_buf; ++i) {
+ if (i != 0 && buffer_list[i].addr & ~PAGE_MASK)
+ return -EINVAL;
+ if (i != 0 && i != num_phys_buf - 1 &&
+ (buffer_list[i].size & ~PAGE_MASK))
+ return -EINVAL;
+ *total_size += buffer_list[i].size;
+ if (i > 0)
+ mask |= buffer_list[i].addr;
+ }
+
+ if (*total_size > 0xFFFFFFFFULL)
+ return -ENOMEM;
+
+ /* Find largest page shift we can use to cover buffers */
+ for (*shift = PAGE_SHIFT; *shift < 27; ++(*shift))
+ if (num_phys_buf > 1) {
+ if ((1ULL << *shift) & mask)
+ break;
+ } else
+ if (1ULL << *shift >=
+ buffer_list[0].size +
+ (buffer_list[0].addr & ((1ULL << *shift) - 1)))
+ break;
+
+ buffer_list[0].size += buffer_list[0].addr & ((1ULL << *shift) - 1);
+ buffer_list[0].addr &= ~0ull << *shift;
+
+ *npages = 0;
+ for (i = 0; i < num_phys_buf; ++i)
+ *npages += (buffer_list[i].size +
+ (1ULL << *shift) - 1) >> *shift;
+
+ if (!*npages)
+ return -EINVAL;
+
+ *page_list = kmalloc(sizeof(u64) * *npages, GFP_KERNEL);
+ if (!*page_list)
+ return -ENOMEM;
+
+ n = 0;
+ for (i = 0; i < num_phys_buf; ++i)
+ for (j = 0;
+ j < (buffer_list[i].size + (1ULL << *shift) - 1) >> *shift;
+ ++j)
+ (*page_list)[n++] = cpu_to_be64(buffer_list[i].addr +
+ ((u64) j << *shift));
+
+ PDBG("%s va 0x%llx mask 0x%llx shift %d len %lld pbl_size %d\n",
+ __FUNCTION__, *iova_start, mask, *shift, *total_size, *npages);
+
+ return 0;
+
+}

2006-12-02 22:53:40

by Steve Wise

[permalink] [raw]
Subject: [PATCH v2 05/13] Queue Pairs


Code to manipulate the QP.

Signed-off-by: Steve Wise <[email protected]>
---

drivers/infiniband/hw/cxgb3/iwch_qp.c | 1007 +++++++++++++++++++++++++++++++++
1 files changed, 1007 insertions(+), 0 deletions(-)

diff --git a/drivers/infiniband/hw/cxgb3/iwch_qp.c b/drivers/infiniband/hw/cxgb3/iwch_qp.c
new file mode 100644
index 0000000..9f6b251
--- /dev/null
+++ b/drivers/infiniband/hw/cxgb3/iwch_qp.c
@@ -0,0 +1,1007 @@
+/*
+ * Copyright (c) 2006 Chelsio, Inc. All rights reserved.
+ * Copyright (c) 2006 Open Grid Computing, Inc. All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses. You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ * - Redistributions of source code must retain the above
+ * copyright notice, this list of conditions and the following
+ * disclaimer.
+ *
+ * - Redistributions in binary form must reproduce the above
+ * copyright notice, this list of conditions and the following
+ * disclaimer in the documentation and/or other materials
+ * provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+#include "iwch_provider.h"
+#include "iwch.h"
+#include "iwch_cm.h"
+#include "cxio_hal.h"
+
+#define NO_SUPPORT -1
+
+static inline int iwch_build_rdma_send(union t3_wr *wqe, struct ib_send_wr *wr,
+ u8 * flit_cnt)
+{
+ int i;
+ u32 plen;
+
+ switch (wr->opcode) {
+ case IB_WR_SEND:
+ case IB_WR_SEND_WITH_IMM:
+ if (wr->send_flags & IB_SEND_SOLICITED)
+ wqe->send.rdmaop = T3_SEND_WITH_SE;
+ else
+ wqe->send.rdmaop = T3_SEND;
+ wqe->send.rem_stag = 0;
+ break;
+#if 0 /* Not currently supported */
+ case TYPE_SEND_INVALIDATE:
+ case TYPE_SEND_INVALIDATE_IMMEDIATE:
+ wqe->send.rdmaop = T3_SEND_WITH_INV;
+ wqe->send.rem_stag = cpu_to_be32(wr->wr.rdma.rkey);
+ break;
+ case TYPE_SEND_SE_INVALIDATE:
+ wqe->send.rdmaop = T3_SEND_WITH_SE_INV;
+ wqe->send.rem_stag = cpu_to_be32(wr->wr.rdma.rkey);
+ break;
+#endif
+ default:
+ break;
+ }
+ if (wr->num_sge > T3_MAX_SGE)
+ return -EINVAL;
+ wqe->send.reserved[0] = 0;
+ wqe->send.reserved[1] = 0;
+ wqe->send.reserved[2] = 0;
+ if (wr->opcode == IB_WR_SEND_WITH_IMM) {
+ plen = 4;
+ wqe->send.sgl[0].stag = wr->imm_data;
+ wqe->send.sgl[0].len = __constant_cpu_to_be32(0);
+ wqe->send.num_sgle = __constant_cpu_to_be32(0);
+ *flit_cnt = 5;
+ } else {
+ plen = 0;
+ for (i = 0; i < wr->num_sge; i++) {
+ if ((plen + wr->sg_list[i].length) < plen) {
+ return -EMSGSIZE;
+ }
+ plen += wr->sg_list[i].length;
+ wqe->send.sgl[i].stag =
+ cpu_to_be32(wr->sg_list[i].lkey);
+ wqe->send.sgl[i].len =
+ cpu_to_be32(wr->sg_list[i].length);
+ wqe->send.sgl[i].to = cpu_to_be64(wr->sg_list[i].addr);
+ }
+ wqe->send.num_sgle = cpu_to_be32(wr->num_sge);
+ *flit_cnt = 4 + ((wr->num_sge) << 1);
+ }
+ wqe->send.plen = cpu_to_be32(plen);
+ return 0;
+}
+
+static inline int iwch_build_rdma_write(union t3_wr *wqe, struct ib_send_wr *wr,
+ u8 *flit_cnt)
+{
+ int i;
+ u32 plen;
+ if (wr->num_sge > T3_MAX_SGE)
+ return -EINVAL;
+ wqe->write.rdmaop = T3_RDMA_WRITE;
+ wqe->write.reserved[0] = 0;
+ wqe->write.reserved[1] = 0;
+ wqe->write.reserved[2] = 0;
+ wqe->write.stag_sink = cpu_to_be32(wr->wr.rdma.rkey);
+ wqe->write.to_sink = cpu_to_be64(wr->wr.rdma.remote_addr);
+
+ if (wr->opcode == IB_WR_RDMA_WRITE_WITH_IMM) {
+ plen = 4;
+ wqe->write.sgl[0].stag = wr->imm_data;
+ wqe->write.sgl[0].len = __constant_cpu_to_be32(0);
+ wqe->write.num_sgle = __constant_cpu_to_be32(0);
+ *flit_cnt = 6;
+ } else {
+ plen = 0;
+ for (i = 0; i < wr->num_sge; i++) {
+ if ((plen + wr->sg_list[i].length) < plen) {
+ return -EMSGSIZE;
+ }
+ plen += wr->sg_list[i].length;
+ wqe->write.sgl[i].stag =
+ cpu_to_be32(wr->sg_list[i].lkey);
+ wqe->write.sgl[i].len =
+ cpu_to_be32(wr->sg_list[i].length);
+ wqe->write.sgl[i].to =
+ cpu_to_be64(wr->sg_list[i].addr);
+ }
+ wqe->write.num_sgle = cpu_to_be32(wr->num_sge);
+ *flit_cnt = 5 + ((wr->num_sge) << 1);
+ }
+ wqe->write.plen = cpu_to_be32(plen);
+ return 0;
+}
+
+static inline int iwch_build_rdma_read(union t3_wr *wqe, struct ib_send_wr *wr,
+ u8 *flit_cnt)
+{
+ if (wr->num_sge > 1)
+ return -EINVAL;
+ wqe->read.rdmaop = T3_READ_REQ;
+ wqe->read.reserved[0] = 0;
+ wqe->read.reserved[1] = 0;
+ wqe->read.reserved[2] = 0;
+ wqe->read.rem_stag = cpu_to_be32(wr->wr.rdma.rkey);
+ wqe->read.rem_to = cpu_to_be64(wr->wr.rdma.remote_addr);
+ wqe->read.local_stag = cpu_to_be32(wr->sg_list[0].lkey);
+ wqe->read.local_len = cpu_to_be32(wr->sg_list[0].length);
+ wqe->read.local_to = cpu_to_be64(wr->sg_list[0].addr);
+ *flit_cnt = sizeof(struct t3_rdma_read_wr) >> 3;
+ return 0;
+}
+
+/*
+ * TBD: this is going to be moved to firmware. Missing pdid/qpid check for now.
+ */
+static inline int iwch_sgl2pbl_map(struct iwch_dev *rhp,
+ struct ib_sge *sg_list, u32 num_sgle,
+ u32 * pbl_addr, u8 * page_size)
+{
+ int i;
+ struct iwch_mr *mhp;
+ u32 offset;
+ for (i = 0; i < num_sgle; i++) {
+
+ mhp = get_mhp(rhp, (sg_list[i].lkey) >> 8);
+ if (!mhp) {
+ PDBG("%s %d\n", __FUNCTION__, __LINE__);
+ return -EIO;
+ }
+ if (!mhp->attr.state) {
+ PDBG("%s %d\n", __FUNCTION__, __LINE__);
+ return -EIO;
+ }
+ if (mhp->attr.zbva) {
+ PDBG("%s %d\n", __FUNCTION__, __LINE__);
+ return -EIO;
+ }
+
+ if (sg_list[i].addr < mhp->attr.va_fbo) {
+ PDBG("%s %d\n", __FUNCTION__, __LINE__);
+ return -EINVAL;
+ }
+ if (sg_list[i].addr + ((u64) sg_list[i].length) <
+ sg_list[i].addr) {
+ PDBG("%s %d\n", __FUNCTION__, __LINE__);
+ return -EINVAL;
+ }
+ if (sg_list[i].addr + ((u64) sg_list[i].length) >
+ mhp->attr.va_fbo + ((u64) mhp->attr.len)) {
+ PDBG("%s %d\n", __FUNCTION__, __LINE__);
+ return -EINVAL;
+ }
+ offset = sg_list[i].addr - mhp->attr.va_fbo;
+ offset += ((u32) mhp->attr.va_fbo) %
+ (1UL << (12 + mhp->attr.page_size));
+ pbl_addr[i] = ((mhp->attr.pbl_addr -
+ rhp->rdev.rnic_info.pbl_base) >> 3) +
+ (offset >> (12 + mhp->attr.page_size));
+ page_size[i] = mhp->attr.page_size;
+ }
+ return 0;
+}
+
+static inline int iwch_build_rdma_recv(struct iwch_dev *rhp,
+ union t3_wr *wqe,
+ struct ib_recv_wr *wr)
+{
+ int i, err = 0;
+ u32 pbl_addr[4];
+ u8 page_size[4];
+ if (wr->num_sge > T3_MAX_SGE)
+ return -EINVAL;
+ err = iwch_sgl2pbl_map(rhp, wr->sg_list, wr->num_sge, pbl_addr,
+ page_size);
+ if (err)
+ return err;
+ wqe->recv.pagesz[0] = page_size[0];
+ wqe->recv.pagesz[1] = page_size[1];
+ wqe->recv.pagesz[2] = page_size[2];
+ wqe->recv.pagesz[3] = page_size[3];
+ wqe->recv.num_sgle = cpu_to_be32(wr->num_sge);
+ for (i = 0; i < wr->num_sge; i++) {
+ wqe->recv.sgl[i].stag = cpu_to_be32(wr->sg_list[i].lkey);
+ wqe->recv.sgl[i].len = cpu_to_be32(wr->sg_list[i].length);
+
+ /* to in the WQE == the offset into the page */
+ wqe->recv.sgl[i].to = cpu_to_be64(((u32) wr->sg_list[i].addr) %
+ (1UL << (12 + page_size[i])));
+
+ /* pbl_addr is the adapters address in the PBL */
+ wqe->recv.pbl_addr[i] = cpu_to_be32(pbl_addr[i]);
+ }
+ for (; i < T3_MAX_SGE; i++) {
+ wqe->recv.sgl[i].stag = 0;
+ wqe->recv.sgl[i].len = 0;
+ wqe->recv.sgl[i].to = 0;
+ wqe->recv.pbl_addr[i] = 0;
+ }
+ return 0;
+}
+
+int iwch_post_send(struct ib_qp *ibqp, struct ib_send_wr *wr,
+ struct ib_send_wr **bad_wr)
+{
+ int err = 0;
+ u8 t3_wr_flit_cnt;
+ enum t3_wr_opcode t3_wr_opcode = 0;
+ enum t3_wr_flags t3_wr_flags;
+ struct iwch_qp *qhp;
+ u32 idx;
+ union t3_wr *wqe;
+ u32 num_wrs;
+ unsigned long flag;
+ struct t3_swsq *sqp;
+
+ qhp = to_iwch_qp(ibqp);
+ spin_lock_irqsave(&qhp->lock, flag);
+ if (qhp->attr.state > IWCH_QP_STATE_RTS) {
+ spin_unlock_irqrestore(&qhp->lock, flag);
+ return -EINVAL;
+ }
+ num_wrs = Q_FREECNT(qhp->wq.sq_rptr, qhp->wq.sq_wptr,
+ qhp->wq.sq_size_log2);
+ if (num_wrs <= 0) {
+ spin_unlock_irqrestore(&qhp->lock, flag);
+ return -ENOMEM;
+ }
+ while (wr) {
+ if (num_wrs == 0) {
+ err = -ENOMEM;
+ *bad_wr = wr;
+ break;
+ }
+ idx = Q_PTR2IDX(qhp->wq.wptr, qhp->wq.size_log2);
+ wqe = (union t3_wr *) (qhp->wq.queue + idx);
+ t3_wr_flags = 0;
+ if (wr->send_flags & IB_SEND_SOLICITED)
+ t3_wr_flags |= T3_SOLICITED_EVENT_FLAG;
+ if (wr->send_flags & IB_SEND_FENCE)
+ t3_wr_flags |= T3_READ_FENCE_FLAG;
+ if (wr->send_flags & IB_SEND_SIGNALED)
+ t3_wr_flags |= T3_COMPLETION_FLAG;
+ sqp = qhp->wq.sq +
+ Q_PTR2IDX(qhp->wq.sq_wptr, qhp->wq.sq_size_log2);
+ switch (wr->opcode) {
+ case IB_WR_SEND:
+ case IB_WR_SEND_WITH_IMM:
+ t3_wr_opcode = T3_WR_SEND;
+ err = iwch_build_rdma_send(wqe, wr, &t3_wr_flit_cnt);
+ break;
+ case IB_WR_RDMA_WRITE:
+ case IB_WR_RDMA_WRITE_WITH_IMM:
+ t3_wr_opcode = T3_WR_WRITE;
+ err = iwch_build_rdma_write(wqe, wr, &t3_wr_flit_cnt);
+ break;
+ case IB_WR_RDMA_READ:
+ t3_wr_opcode = T3_WR_READ;
+ t3_wr_flags = 0; /* T3 reads are always signaled */
+ err = iwch_build_rdma_read(wqe, wr, &t3_wr_flit_cnt);
+ if (err)
+ break;
+ sqp->read_len = wqe->read.local_len;
+ if (!qhp->wq.oldest_read)
+ qhp->wq.oldest_read = sqp;
+ break;
+ default:
+ PDBG("%s post of type=%d TBD!\n", __FUNCTION__,
+ wr->opcode);
+ err = -EINVAL;
+ }
+ if (err) {
+ *bad_wr = wr;
+ break;
+ }
+ wqe->send.wrid.id0.hi = qhp->wq.sq_wptr;
+ sqp->wr_id = wr->wr_id;
+ sqp->opcode = wr2opcode(t3_wr_opcode);
+ sqp->sq_wptr = qhp->wq.sq_wptr;
+ sqp->complete = 0;
+ sqp->signaled = (wr->send_flags & IB_SEND_SIGNALED);
+
+ build_fw_riwrh((void *) wqe, t3_wr_opcode, t3_wr_flags,
+ Q_GENBIT(qhp->wq.wptr, qhp->wq.size_log2),
+ 0, t3_wr_flit_cnt);
+ PDBG("%s cookie 0x%llx wq idx 0x%x swsq idx %ld opcode %d\n",
+ __FUNCTION__, wr->wr_id, idx,
+ Q_PTR2IDX(qhp->wq.sq_wptr, qhp->wq.sq_size_log2),
+ sqp->opcode);
+ wr = wr->next;
+ num_wrs--;
+ ++(qhp->wq.wptr);
+ ++(qhp->wq.sq_wptr);
+ }
+ spin_unlock_irqrestore(&qhp->lock, flag);
+ ring_doorbell(qhp->wq.doorbell, qhp->wq.qpid);
+ return err;
+}
+
+int iwch_post_receive(struct ib_qp *ibqp, struct ib_recv_wr *wr,
+ struct ib_recv_wr **bad_wr)
+{
+ int err = 0;
+ struct iwch_qp *qhp;
+ u32 idx;
+ union t3_wr *wqe;
+ u32 num_wrs;
+ unsigned long flag;
+
+ qhp = to_iwch_qp(ibqp);
+ spin_lock_irqsave(&qhp->lock, flag);
+ if (qhp->attr.state > IWCH_QP_STATE_RTS) {
+ spin_unlock_irqrestore(&qhp->lock, flag);
+ return -EINVAL;
+ }
+ num_wrs = Q_FREECNT(qhp->wq.rq_rptr, qhp->wq.rq_wptr,
+ qhp->wq.rq_size_log2) - 1;
+ if (!wr) {
+ spin_unlock_irqrestore(&qhp->lock, flag);
+ return -EINVAL;
+ }
+ while (wr) {
+ idx = Q_PTR2IDX(qhp->wq.wptr, qhp->wq.size_log2);
+ wqe = (union t3_wr *) (qhp->wq.queue + idx);
+ if (num_wrs)
+ err = iwch_build_rdma_recv(qhp->rhp, wqe, wr);
+ else
+ err = -ENOMEM;
+ if (err) {
+ *bad_wr = wr;
+ break;
+ }
+ qhp->wq.rq[Q_PTR2IDX(qhp->wq.rq_wptr, qhp->wq.rq_size_log2)] =
+ wr->wr_id;
+ build_fw_riwrh((void *) wqe, T3_WR_RCV, T3_COMPLETION_FLAG,
+ Q_GENBIT(qhp->wq.wptr, qhp->wq.size_log2),
+ 0, sizeof(struct t3_receive_wr) >> 3);
+ PDBG("%s cookie 0x%llx idx 0x%x rq_wptr 0x%x rw_rptr 0x%x "
+ "wqe %p \n", __FUNCTION__, wr->wr_id, idx,
+ qhp->wq.rq_wptr, qhp->wq.rq_rptr, wqe);
+ ++(qhp->wq.rq_wptr);
+ ++(qhp->wq.wptr);
+ wr = wr->next;
+ num_wrs--;
+ }
+ spin_unlock_irqrestore(&qhp->lock, flag);
+ ring_doorbell(qhp->wq.doorbell, qhp->wq.qpid);
+ return err;
+}
+
+int iwch_bind_mw(struct ib_qp *qp,
+ struct ib_mw *mw,
+ struct ib_mw_bind *mw_bind)
+{
+ struct iwch_dev *rhp;
+ struct iwch_mw *mhp;
+ struct iwch_qp *qhp;
+ union t3_wr *wqe;
+ u32 pbl_addr;
+ u8 page_size;
+ u32 num_wrs;
+ unsigned long flag;
+ struct ib_sge sgl;
+ int err=0;
+ enum t3_wr_flags t3_wr_flags;
+ u32 idx;
+ struct t3_swsq *sqp;
+
+ qhp = to_iwch_qp(qp);
+ mhp = to_iwch_mw(mw);
+ rhp = qhp->rhp;
+
+ spin_lock_irqsave(&qhp->lock, flag);
+ if (qhp->attr.state > IWCH_QP_STATE_RTS) {
+ spin_unlock_irqrestore(&qhp->lock, flag);
+ return -EINVAL;
+ }
+ num_wrs = Q_FREECNT(qhp->wq.sq_rptr, qhp->wq.sq_wptr,
+ qhp->wq.sq_size_log2);
+ if ((num_wrs) <= 0) {
+ spin_unlock_irqrestore(&qhp->lock, flag);
+ return -ENOMEM;
+ }
+ idx = Q_PTR2IDX(qhp->wq.wptr, qhp->wq.size_log2);
+ PDBG("%s: idx 0x%0x, mw 0x%p, mw_bind 0x%p\n", __FUNCTION__, idx,
+ mw, mw_bind);
+ wqe = (union t3_wr *) (qhp->wq.queue + idx);
+
+ t3_wr_flags = 0;
+ if (mw_bind->send_flags & IB_SEND_SIGNALED)
+ t3_wr_flags = T3_COMPLETION_FLAG;
+
+ sgl.addr = mw_bind->addr;
+ sgl.lkey = mw_bind->mr->lkey;
+ sgl.length = mw_bind->length;
+ wqe->bind.reserved = 0;
+ wqe->bind.type = T3_VA_BASED_TO;
+
+ /* TBD: check perms */
+ wqe->bind.perms = iwch_convert_access(mw_bind->mw_access_flags);
+ wqe->bind.mr_stag = cpu_to_be32(mw_bind->mr->lkey);
+ wqe->bind.mw_stag = cpu_to_be32(mw->rkey);
+ wqe->bind.mw_len = cpu_to_be32(mw_bind->length);
+ wqe->bind.mw_va = cpu_to_be64(mw_bind->addr);
+ err = iwch_sgl2pbl_map(rhp, &sgl, 1, &pbl_addr, &page_size);
+ if (err) {
+ spin_unlock_irqrestore(&qhp->lock, flag);
+ return err;
+ }
+ wqe->send.wrid.id0.hi = qhp->wq.sq_wptr;
+ sqp = qhp->wq.sq + Q_PTR2IDX(qhp->wq.sq_wptr, qhp->wq.sq_size_log2);
+ sqp->wr_id = mw_bind->wr_id;
+ sqp->opcode = T3_BIND_MW;
+ sqp->sq_wptr = qhp->wq.sq_wptr;
+ sqp->complete = 0;
+ sqp->signaled = (mw_bind->send_flags & IB_SEND_SIGNALED);
+ wqe->bind.mr_pbl_addr = cpu_to_be32(pbl_addr);
+ wqe->bind.mr_pagesz = page_size;
+ wqe->flit[T3_SQ_COOKIE_FLIT] = mw_bind->wr_id;
+ build_fw_riwrh((void *)wqe, T3_WR_BIND, t3_wr_flags,
+ Q_GENBIT(qhp->wq.wptr, qhp->wq.size_log2), 0,
+ sizeof(struct t3_bind_mw_wr) >> 3);
+ ++(qhp->wq.wptr);
+ ++(qhp->wq.sq_wptr);
+ spin_unlock_irqrestore(&qhp->lock, flag);
+
+ ring_doorbell(qhp->wq.doorbell, qhp->wq.qpid);
+
+ return err;
+}
+
+static inline void build_term_codes(int t3err, u8 *layer_type, u8 *ecode,
+ int tagged)
+{
+ switch (t3err) {
+ case TPT_ERR_STAG:
+ if (tagged == 1) {
+ *layer_type = LAYER_DDP|DDP_TAGGED_ERR;
+ *ecode = DDPT_INV_STAG;
+ } else if (tagged == 2) {
+ *layer_type = LAYER_RDMAP|RDMAP_REMOTE_PROT;
+ *ecode = RDMAP_INV_STAG;
+ }
+ break;
+ case TPT_ERR_PDID:
+ case TPT_ERR_QPID:
+ case TPT_ERR_ACCESS:
+ if (tagged == 1) {
+ *layer_type = LAYER_DDP|DDP_TAGGED_ERR;
+ *ecode = DDPT_STAG_NOT_ASSOC;
+ } else if (tagged == 2) {
+ *layer_type = LAYER_RDMAP|RDMAP_REMOTE_PROT;
+ *ecode = RDMAP_STAG_NOT_ASSOC;
+ }
+ break;
+ case TPT_ERR_WRAP:
+ *layer_type = LAYER_RDMAP|RDMAP_REMOTE_PROT;
+ *ecode = RDMAP_TO_WRAP;
+ break;
+ case TPT_ERR_BOUND:
+ if (tagged == 1) {
+ *layer_type = LAYER_DDP|DDP_TAGGED_ERR;
+ *ecode = DDPT_BASE_BOUNDS;
+ } else if (tagged == 2) {
+ *layer_type = LAYER_RDMAP|RDMAP_REMOTE_PROT;
+ *ecode = RDMAP_BASE_BOUNDS;
+ } else {
+ *layer_type = LAYER_DDP|DDP_UNTAGGED_ERR;
+ *ecode = DDPU_MSG_TOOBIG;
+ }
+ break;
+ case TPT_ERR_INVALIDATE_SHARED_MR:
+ case TPT_ERR_INVALIDATE_MR_WITH_MW_BOUND:
+ *layer_type = LAYER_RDMAP|RDMAP_REMOTE_OP;
+ *ecode = RDMAP_CANT_INV_STAG;
+ break;
+ case TPT_ERR_ECC:
+ case TPT_ERR_ECC_PSTAG:
+ case TPT_ERR_INTERNAL_ERR:
+ *layer_type = LAYER_RDMAP|RDMAP_LOCAL_CATA;
+ *ecode = 0;
+ break;
+ case TPT_ERR_OUT_OF_RQE:
+ *layer_type = LAYER_DDP|DDP_UNTAGGED_ERR;
+ *ecode = DDPU_INV_MSN_NOBUF;
+ break;
+ case TPT_ERR_PBL_ADDR_BOUND:
+ *layer_type = LAYER_DDP|DDP_TAGGED_ERR;
+ *ecode = DDPT_BASE_BOUNDS;
+ break;
+ case TPT_ERR_CRC:
+ *layer_type = LAYER_MPA|DDP_LLP;
+ *ecode = MPA_CRC_ERR;
+ break;
+ case TPT_ERR_MARKER:
+ *layer_type = LAYER_MPA|DDP_LLP;
+ *ecode = MPA_MARKER_ERR;
+ break;
+ case TPT_ERR_PDU_LEN_ERR:
+ *layer_type = LAYER_DDP|DDP_UNTAGGED_ERR;
+ *ecode = DDPU_MSG_TOOBIG;
+ break;
+ case TPT_ERR_DDP_VERSION:
+ if (tagged) {
+ *layer_type = LAYER_DDP|DDP_TAGGED_ERR;
+ *ecode = DDPT_INV_VERS;
+ } else {
+ *layer_type = LAYER_DDP|DDP_UNTAGGED_ERR;
+ *ecode = DDPU_INV_VERS;
+ }
+ break;
+ case TPT_ERR_RDMA_VERSION:
+ *layer_type = LAYER_RDMAP|RDMAP_REMOTE_OP;
+ *ecode = RDMAP_INV_VERS;
+ break;
+ case TPT_ERR_OPCODE:
+ *layer_type = LAYER_RDMAP|RDMAP_REMOTE_OP;
+ *ecode = RDMAP_INV_OPCODE;
+ break;
+ case TPT_ERR_DDP_QUEUE_NUM:
+ *layer_type = LAYER_DDP|DDP_UNTAGGED_ERR;
+ *ecode = DDPU_INV_QN;
+ break;
+ case TPT_ERR_MSN:
+ case TPT_ERR_MSN_GAP:
+ case TPT_ERR_MSN_RANGE:
+ case TPT_ERR_IRD_OVERFLOW:
+ *layer_type = LAYER_DDP|DDP_UNTAGGED_ERR;
+ *ecode = DDPU_INV_MSN_RANGE;
+ break;
+ case TPT_ERR_TBIT:
+ *layer_type = LAYER_DDP|DDP_LOCAL_CATA;
+ *ecode = 0;
+ break;
+ case TPT_ERR_MO:
+ *layer_type = LAYER_DDP|DDP_UNTAGGED_ERR;
+ *ecode = DDPU_INV_MO;
+ break;
+ default:
+ *layer_type = LAYER_RDMAP|DDP_LOCAL_CATA;
+ *ecode = 0;
+ break;
+ }
+}
+
+/*
+ * This posts a TERMINATE with layer=RDMA, type=catastrophic.
+ */
+int iwch_post_terminate(struct iwch_qp *qhp, struct respQ_msg_t *rsp_msg)
+{
+ union t3_wr *wqe;
+ struct terminate_message *term;
+ int status;
+ int tagged = 0;
+ struct sk_buff *skb;
+
+ PDBG("%s %d\n", __FUNCTION__, __LINE__);
+ skb = alloc_skb(40, GFP_ATOMIC);
+ if (!skb) {
+ printk(KERN_ERR "%s cannot send TERMINATE!\n", __FUNCTION__);
+ return -ENOMEM;
+ }
+ wqe = (union t3_wr *)skb_put(skb, 40);
+ memset(wqe, 0, 40);
+ wqe->send.rdmaop = T3_TERMINATE;
+
+ /* immediate data length */
+ wqe->send.plen = htonl(4);
+
+ /* immediate data starts here. */
+ term = (struct terminate_message *)wqe->send.sgl;
+ if (rsp_msg) {
+ status = CQE_STATUS(rsp_msg->cqe);
+ if (CQE_OPCODE(rsp_msg->cqe) == T3_RDMA_WRITE)
+ tagged = 1;
+ if ((CQE_OPCODE(rsp_msg->cqe) == T3_READ_REQ) ||
+ (CQE_OPCODE(rsp_msg->cqe) == T3_READ_RESP))
+ tagged = 2;
+ } else {
+ status = TPT_ERR_INTERNAL_ERR;
+ }
+ build_term_codes(status, &term->layer_etype, &term->ecode, tagged);
+ build_fw_riwrh((void *)wqe, T3_WR_SEND,
+ T3_COMPLETION_FLAG | T3_NOTIFY_FLAG, 1,
+ qhp->ep->hwtid, 5);
+ skb->priority = CPL_PRIORITY_DATA;
+ return (cxgb3_ofld_send(qhp->rhp->rdev.t3cdev_p, skb));
+}
+
+/*
+ * Assumes qhp lock is held.
+ */
+static void __flush_qp(struct iwch_qp *qhp, unsigned long *flag)
+{
+ struct iwch_cq *rchp, *schp;
+ int count;
+
+ rchp = get_chp(qhp->rhp, qhp->attr.rcq);
+ schp = get_chp(qhp->rhp, qhp->attr.scq);
+
+ PDBG("%s qhp %p rchp %p schp %p\n", __FUNCTION__, qhp, rchp, schp);
+ /* take a ref on the qhp since we must release the lock */
+ atomic_inc(&qhp->refcnt);
+ spin_unlock_irqrestore(&qhp->lock, *flag);
+
+ /* locking heirarchy: cq lock first, then qp lock. */
+ spin_lock_irqsave(&rchp->lock, *flag);
+ spin_lock(&qhp->lock);
+ cxio_flush_hw_cq(&rchp->cq);
+ cxio_count_rcqes(&rchp->cq, &qhp->wq, &count);
+ cxio_flush_rq(&qhp->wq, &rchp->cq, count);
+ spin_unlock(&qhp->lock);
+ spin_unlock_irqrestore(&rchp->lock, *flag);
+
+ /* locking heirarchy: cq lock first, then qp lock. */
+ spin_lock_irqsave(&schp->lock, *flag);
+ spin_lock(&qhp->lock);
+ cxio_flush_hw_cq(&schp->cq);
+ cxio_count_scqes(&schp->cq, &qhp->wq, &count);
+ cxio_flush_sq(&qhp->wq, &schp->cq, count);
+ spin_unlock(&qhp->lock);
+ spin_unlock_irqrestore(&schp->lock, *flag);
+
+ /* deref */
+ if (atomic_dec_and_test(&qhp->refcnt))
+ wake_up(&qhp->wait);
+
+ spin_lock_irqsave(&qhp->lock, *flag);
+}
+
+static inline void flush_qp(struct iwch_qp *qhp, unsigned long *flag)
+{
+ if (t3b_device(qhp->rhp))
+ cxio_set_wq_in_error(&qhp->wq);
+ else
+ __flush_qp(qhp, flag);
+}
+
+
+/*
+ * Return non zero if at least one RECV was pre-posted.
+ */
+static inline int rqes_posted(struct iwch_qp *qhp)
+{
+ return (fw_riwrh_opcode((struct fw_riwrh *)qhp->wq.queue) == T3_WR_RCV);
+}
+
+static int rdma_init(struct iwch_dev *rhp, struct iwch_qp *qhp,
+ enum iwch_qp_attr_mask mask,
+ struct iwch_qp_attributes *attrs)
+{
+ struct t3_rdma_init_attr init_attr;
+ int ret;
+
+ init_attr.tid = qhp->ep->hwtid;
+ init_attr.qpid = qhp->wq.qpid;
+ init_attr.pdid = qhp->attr.pd;
+ init_attr.scqid = qhp->attr.scq;
+ init_attr.rcqid = qhp->attr.rcq;
+ init_attr.rq_addr = qhp->wq.rq_addr;
+ init_attr.rq_size = 1 << qhp->wq.rq_size_log2;
+ init_attr.mpaattrs = uP_RI_MPA_IETF_ENABLE |
+ qhp->attr.mpa_attr.recv_marker_enabled |
+ (qhp->attr.mpa_attr.xmit_marker_enabled << 1) |
+ (qhp->attr.mpa_attr.crc_enabled << 2);
+
+ /*
+ * XXX - The IWCM doesn't quite handle getting these
+ * attrs set before going into RTS. For now, just turn
+ * them on always...
+ */
+#if 0
+ init_attr.qpcaps = qhp->attr.enableRdmaRead |
+ (qhp->attr.enableRdmaWrite << 1) |
+ (qhp->attr.enableBind << 2) |
+ (qhp->attr.enable_stag0_fastreg << 3) |
+ (qhp->attr.enable_stag0_fastreg << 4);
+#else
+ init_attr.qpcaps = 0x1f;
+#endif
+ init_attr.tcp_emss = qhp->ep->emss;
+ init_attr.ord = qhp->attr.max_ord;
+ init_attr.ird = qhp->attr.max_ird;
+ init_attr.qp_dma_addr = qhp->wq.dma_addr;
+ init_attr.qp_dma_size = (1UL << qhp->wq.size_log2);
+ init_attr.flags = rqes_posted(qhp) ? RECVS_POSTED : 0;
+ PDBG("%s init_attr.rq_addr 0x%x init_attr.rq_size = %d "
+ "flags 0x%x qpcaps 0x%x\n", __FUNCTION__,
+ init_attr.rq_addr, init_attr.rq_size,
+ init_attr.flags, init_attr.qpcaps);
+ ret = cxio_rdma_init(&rhp->rdev, &init_attr);
+ PDBG("%s ret %d\n", __FUNCTION__, ret);
+ return ret;
+}
+
+int iwch_modify_qp(struct iwch_dev *rhp, struct iwch_qp *qhp,
+ enum iwch_qp_attr_mask mask,
+ struct iwch_qp_attributes *attrs,
+ int internal)
+{
+ int ret = 0;
+ struct iwch_qp_attributes newattr = qhp->attr;
+ unsigned long flag;
+ int disconnect = 0;
+ int terminate = 0;
+ int abort = 0;
+ int free = 0;
+ struct iwch_ep *ep = NULL;
+
+ PDBG("%s qhp %p qpid 0x%x ep %p state %d -> %d\n", __FUNCTION__,
+ qhp, qhp->wq.qpid, qhp->ep, qhp->attr.state,
+ (mask & IWCH_QP_ATTR_NEXT_STATE) ? attrs->next_state : -1);
+
+ spin_lock_irqsave(&qhp->lock, flag);
+
+ /* Process attr changes if in IDLE */
+ if (mask & IWCH_QP_ATTR_VALID_MODIFY) {
+ if (qhp->attr.state != IWCH_QP_STATE_IDLE) {
+ ret = -EIO;
+ goto out;
+ }
+ if (mask & IWCH_QP_ATTR_ENABLE_RDMA_READ)
+ newattr.enable_rdma_read = attrs->enable_rdma_read;
+ if (mask & IWCH_QP_ATTR_ENABLE_RDMA_WRITE)
+ newattr.enable_rdma_write = attrs->enable_rdma_write;
+ if (mask & IWCH_QP_ATTR_ENABLE_RDMA_BIND)
+ newattr.enable_bind = attrs->enable_bind;
+ if (mask & IWCH_QP_ATTR_MAX_ORD) {
+ if (attrs->max_ord >
+ rhp->attr.max_rdma_read_qp_depth) {
+ ret = -EINVAL;
+ goto out;
+ }
+ newattr.max_ord = attrs->max_ord;
+ }
+ if (mask & IWCH_QP_ATTR_MAX_IRD) {
+ if (attrs->max_ird >
+ rhp->attr.max_rdma_reads_per_qp) {
+ ret = -EINVAL;
+ goto out;
+ }
+ newattr.max_ird = attrs->max_ird;
+ }
+ qhp->attr = newattr;
+ }
+
+ if (!(mask & IWCH_QP_ATTR_NEXT_STATE))
+ goto out;
+ if (qhp->attr.state == attrs->next_state)
+ goto out;
+
+ switch (qhp->attr.state) {
+ case IWCH_QP_STATE_IDLE:
+ switch (attrs->next_state) {
+ case IWCH_QP_STATE_RTS:
+ if (!(mask & IWCH_QP_ATTR_LLP_STREAM_HANDLE)) {
+ ret = -EINVAL;
+ goto out;
+ }
+ if (!(mask & IWCH_QP_ATTR_MPA_ATTR)) {
+ ret = -EINVAL;
+ goto out;
+ }
+ qhp->attr.mpa_attr = attrs->mpa_attr;
+ qhp->attr.llp_stream_handle = attrs->llp_stream_handle;
+ qhp->ep = qhp->attr.llp_stream_handle;
+ qhp->attr.state = IWCH_QP_STATE_RTS;
+
+ /*
+ * Ref the endpoint here and deref when we
+ * disassociate the endpoint from the QP. This
+ * happens in CLOSING->IDLE transition or *->ERROR
+ * transition.
+ */
+ get_ep(&qhp->ep->com);
+ spin_unlock_irqrestore(&qhp->lock, flag);
+ ret = rdma_init(rhp, qhp, mask, attrs);
+ spin_lock_irqsave(&qhp->lock, flag);
+ if (ret)
+ goto err;
+ break;
+ case IWCH_QP_STATE_ERROR:
+ qhp->attr.state = IWCH_QP_STATE_ERROR;
+ flush_qp(qhp, &flag);
+ break;
+ default:
+ ret = -EINVAL;
+ goto out;
+ }
+ break;
+ case IWCH_QP_STATE_RTS:
+ switch (attrs->next_state) {
+ case IWCH_QP_STATE_CLOSING:
+ BUG_ON(atomic_read(&qhp->ep->com.kref.refcount) < 2);
+ qhp->attr.state = IWCH_QP_STATE_CLOSING;
+ if (!internal) {
+ abort=0;
+ disconnect = 1;
+ ep = qhp->ep;
+ }
+ break;
+ case IWCH_QP_STATE_TERMINATE:
+ qhp->attr.state = IWCH_QP_STATE_TERMINATE;
+ if (!internal)
+ terminate = 1;
+ break;
+ case IWCH_QP_STATE_ERROR:
+ qhp->attr.state = IWCH_QP_STATE_ERROR;
+ if (!internal) {
+ abort=1;
+ disconnect = 1;
+ ep = qhp->ep;
+ }
+ goto err;
+ break;
+ default:
+ ret = -EINVAL;
+ goto out;
+ }
+ break;
+ case IWCH_QP_STATE_CLOSING:
+ if (!internal) {
+ ret = -EINVAL;
+ goto out;
+ }
+ switch (attrs->next_state) {
+ case IWCH_QP_STATE_IDLE:
+ qhp->attr.state = IWCH_QP_STATE_IDLE;
+ qhp->attr.llp_stream_handle = NULL;
+ put_ep(&qhp->ep->com);
+ qhp->ep = NULL;
+ wake_up(&qhp->wait);
+ break;
+ case IWCH_QP_STATE_ERROR:
+ goto err;
+ default:
+ ret = -EINVAL;
+ goto err;
+ }
+ break;
+ case IWCH_QP_STATE_ERROR:
+ if (attrs->next_state != IWCH_QP_STATE_IDLE) {
+ ret = -EINVAL;
+ goto out;
+ }
+
+ if (!Q_EMPTY(qhp->wq.sq_rptr, qhp->wq.sq_wptr) ||
+ !Q_EMPTY(qhp->wq.rq_rptr, qhp->wq.rq_wptr)) {
+ ret = -EINVAL;
+ goto out;
+ }
+ qhp->attr.state = IWCH_QP_STATE_IDLE;
+ memset(&qhp->attr, 0, sizeof(qhp->attr));
+ break;
+ case IWCH_QP_STATE_TERMINATE:
+ if (!internal) {
+ ret = -EINVAL;
+ goto out;
+ }
+ goto err;
+ break;
+ default:
+ printk(KERN_ERR "%s in a bad state %d\n",
+ __FUNCTION__, qhp->attr.state);
+ ret = -EINVAL;
+ goto err;
+ break;
+ }
+ goto out;
+err:
+ PDBG("%s disassociating ep %p qpid 0x%x\n", __FUNCTION__, qhp->ep,
+ qhp->wq.qpid);
+
+ /* disassociate the LLP connection */
+ qhp->attr.llp_stream_handle = NULL;
+ ep = qhp->ep;
+ qhp->ep = NULL;
+ qhp->attr.state = IWCH_QP_STATE_ERROR;
+ free=1;
+ wake_up(&qhp->wait);
+ BUG_ON(!ep);
+ flush_qp(qhp, &flag);
+out:
+ spin_unlock_irqrestore(&qhp->lock, flag);
+
+ if (terminate)
+ iwch_post_terminate(qhp, NULL);
+
+ /*
+ * If disconnect is 1, then we need to initiate a disconnect
+ * on the EP. This can be a normal close (RTS->CLOSING) or
+ * an abnormal close (RTS/CLOSING->ERROR).
+ */
+ if (disconnect)
+ iwch_ep_disconnect(ep, abort, GFP_KERNEL);
+
+ /*
+ * If free is 1, then we've disassociated the EP from the QP
+ * and we need to dereference the EP.
+ */
+ if (free)
+ put_ep(&ep->com);
+
+ PDBG("%s exit state %d\n", __FUNCTION__, qhp->attr.state);
+ return ret;
+}
+
+static int quiesce_qp(struct iwch_qp *qhp)
+{
+ spin_lock_irq(&qhp->lock);
+ iwch_quiesce_tid(qhp->ep);
+ qhp->flags |= QP_QUIESCED;
+ spin_unlock_irq(&qhp->lock);
+ return 0;
+}
+
+static int resume_qp(struct iwch_qp *qhp)
+{
+ spin_lock_irq(&qhp->lock);
+ iwch_resume_tid(qhp->ep);
+ qhp->flags &= ~QP_QUIESCED;
+ spin_unlock_irq(&qhp->lock);
+ return 0;
+}
+
+int iwch_quiesce_qps(struct iwch_cq *chp)
+{
+ int i;
+ struct iwch_qp *qhp;
+
+ for (i=0; i < T3_MAX_NUM_QP; i++) {
+ qhp = get_qhp(chp->rhp, i);
+ if (!qhp)
+ continue;
+ if ((qhp->attr.rcq == chp->cq.cqid) && !qp_quiesced(qhp)) {
+ quiesce_qp(qhp);
+ continue;
+ }
+ if ((qhp->attr.scq == chp->cq.cqid) && !qp_quiesced(qhp))
+ quiesce_qp(qhp);
+ }
+ return 0;
+}
+
+int iwch_resume_qps(struct iwch_cq *chp)
+{
+ int i;
+ struct iwch_qp *qhp;
+
+ for (i=0; i < T3_MAX_NUM_QP; i++) {
+ qhp = get_qhp(chp->rhp, i);
+ if (!qhp)
+ continue;
+ if ((qhp->attr.rcq == chp->cq.cqid) && qp_quiesced(qhp)) {
+ resume_qp(qhp);
+ continue;
+ }
+ if ((qhp->attr.scq == chp->cq.cqid) && qp_quiesced(qhp))
+ resume_qp(qhp);
+ }
+ return 0;
+}

2006-12-02 23:16:37

by Francois Romieu

[permalink] [raw]
Subject: Re: [PATCH v2 00/13] 2.6.20 Chelsio T3 RDMA Driver

Steve Wise <[email protected]> :
[...]
> Version 2 changes:
>
> - Make code sparse endian clean
> - Use IDRs for mapping QP and CQ IDs to structure pointers instead of arrays
> - Clean up confusing bitfields
> - Use random32() instead of local random function
> - Use krefs to track endpoint reference counts
> - Misc nits
>
> -----
>
> The following series implements the Chelsio T3 iWARP/RDMA Driver to
> be considered for inclusion in 2.6.20. It depends on the Chelsio T3
> Ethernet Driver which is also under review now for 2.6.20. See:

I understood that Stephen expressed some doubts regarding the inclusion
of TOE enabled features.

Was his point addressed ?

--
Ueimor

2006-12-03 00:28:13

by Stephen Hemminger

[permalink] [raw]
Subject: Re: [PATCH v2 00/13] 2.6.20 Chelsio T3 RDMA Driver

Francois Romieu wrote:
> Steve Wise <[email protected]> :
> [...]
>
>> Version 2 changes:
>>
>> - Make code sparse endian clean
>> - Use IDRs for mapping QP and CQ IDs to structure pointers instead of arrays
>> - Clean up confusing bitfields
>> - Use random32() instead of local random function
>> - Use krefs to track endpoint reference counts
>> - Misc nits
>>
>> -----
>>
>> The following series implements the Chelsio T3 iWARP/RDMA Driver to
>> be considered for inclusion in 2.6.20. It depends on the Chelsio T3
>> Ethernet Driver which is also under review now for 2.6.20. See:
>>
>
> I understood that Stephen expressed some doubts regarding the inclusion
> of TOE enabled features.
>
> Was his point addressed ?
>
>

My comments were about different hardware.

2006-12-03 12:07:26

by Arjan van de Ven

[permalink] [raw]
Subject: Re: [PATCH v2 03/13] Provider Methods and Data Structures

On Sat, 2006-12-02 at 16:49 -0600, Steve Wise wrote:

> +
> +static struct ib_ah *iwch_ah_create(struct ib_pd *pd,
> + struct ib_ah_attr *ah_attr)
> +{
> + return ERR_PTR(-ENOSYS);
> +}


-ENOSYS is just about ALWAYS a bug in that it's guaranteed to be the
wrong error code ;)


2006-12-03 16:29:34

by Jan Engelhardt

[permalink] [raw]
Subject: Re: [PATCH v2 02/13] Device Discovery and ULLD Linkage

Hi,



Some questions,suggestions,:

>+cxgb3_cpl_handler_func t3c_handlers[NUM_CPL_CMDS];

Can it be static'ified? (I suppose not.)

>+struct cxgb3_client t3c_client = {
>+ .name = "iw_cxgb3",
>+ .add = open_rnic_dev,
>+ .remove = close_rnic_dev,
>+ .handlers = t3c_handlers,
>+ .redirect = iwch_ep_redirect
>+};

Can it be const'ified?

>+static void rnic_init(struct iwch_dev *rnicp)
>+{
>+ PDBG("%s iwch_dev %p\n", __FUNCTION__, rnicp);
>+ idr_init(&rnicp->cqidr);
>+ idr_init(&rnicp->qpidr);
>+ idr_init(&rnicp->mmidr);
>+ spin_lock_init(&rnicp->lock);
>+
>+ rnicp->attr.vendor_id = 0x168;
>+ rnicp->attr.vendor_part_id = 7;

Sugg.:

typeof(rnicp->attr) *a = &rnicp->attr; // replace typeof with proper thing
a->vendor_id = 0x168;
a->vendor_part_id = 7;

shortens the lines a bit.

>+ rnicp->attr.max_qps = T3_MAX_NUM_QP - 32;
>+ rnicp->attr.max_wrs = (1UL << 24) - 1;
>+ rnicp->attr.max_sge_per_wr = T3_MAX_SGE;
>+ rnicp->attr.max_sge_per_rdma_write_wr = T3_MAX_SGE;
>+ rnicp->attr.max_cqs = T3_MAX_NUM_CQ - 1;
>+ rnicp->attr.max_cqes_per_cq = (1UL << 24) - 1;
>+ rnicp->attr.max_mem_regs = cxio_num_stags(&rnicp->rdev);
>+ rnicp->attr.max_phys_buf_entries = T3_MAX_PBL_SIZE;
>+ rnicp->attr.max_pds = T3_MAX_NUM_PD - 1;
>+ rnicp->attr.mem_pgsizes_bitmask = 0x7FFF; /* 4KB-128MB */
>+ rnicp->attr.can_resize_wq = 0;
>+ rnicp->attr.max_rdma_reads_per_qp = 8;
>+ rnicp->attr.max_rdma_read_resources =
>+ rnicp->attr.max_rdma_reads_per_qp * rnicp->attr.max_qps;
>+ rnicp->attr.max_rdma_read_qp_depth = 8; /* IRD */
>+ rnicp->attr.max_rdma_read_depth =
>+ rnicp->attr.max_rdma_read_qp_depth * rnicp->attr.max_qps;
>+ rnicp->attr.rq_overflow_handled = 0;
>+ rnicp->attr.can_modify_ird = 0;
>+ rnicp->attr.can_modify_ord = 0;
>+ rnicp->attr.max_mem_windows = rnicp->attr.max_mem_regs - 1;
>+ rnicp->attr.stag0_value = 1;
>+ rnicp->attr.zbva_support = 1;
>+ rnicp->attr.local_invalidate_fence = 1;
>+ rnicp->attr.cq_overflow_detection = 1;
>+ return;
>+}
>+
>--- /dev/null
>+++ b/drivers/infiniband/hw/cxgb3/iwch.h
>+static inline int t3b_device(struct iwch_dev *rhp)
>+{
>+ return (rhp->rdev.t3cdev_p->type == T3B);
>+}
>+
>+static inline int t3a_device(struct iwch_dev *rhp)
>+{
>+ return (rhp->rdev.t3cdev_p->type == T3A);
>+}

These two can be constified for sure: static inline int t3a_device(const
struct iwch_dev *rhp)

>+
>+static inline struct iwch_cq *get_chp(struct iwch_dev *rhp, u32 cqid)
>+{
>+ return idr_find(&rhp->cqidr, cqid);
>+}
>+
>+static inline struct iwch_qp *get_qhp(struct iwch_dev *rhp, u32 qpid)
>+{
>+ return idr_find(&rhp->qpidr, qpid);
>+}
>+
>+static inline struct iwch_mr *get_mhp(struct iwch_dev *rhp, u32 mmid)
>+{
>+ return idr_find(&rhp->mmidr, mmid);
>+}

Here I am not sure.




-`J'
--

2006-12-04 11:12:05

by Evgeniy Polyakov

[permalink] [raw]
Subject: Re: [PATCH v2 04/13] Connection Manager

On Sat, Dec 02, 2006 at 04:49:58PM -0600, Steve Wise ([email protected]) wrote:
> +static int send_halfclose(struct iwch_ep *ep, gfp_t gfp)
> +{
> + struct cpl_close_con_req *req;
> + struct sk_buff *skb;
> +
> + PDBG("%s ep %p\n", __FUNCTION__, ep);
> + skb = get_skb(NULL, sizeof(*req), gfp);
> + if (!skb) {
> + printk(KERN_ERR MOD "%s - failed to alloc skb\n", __FUNCTION__);
> + return -ENOMEM;
> + }
> + skb->priority = CPL_PRIORITY_DATA;
> + set_arp_failure_handler(skb, arp_failure_discard);
> + req = (struct cpl_close_con_req *) skb_put(skb, sizeof(*req));
> + req->wr.wr_hi = htonl(V_WR_OP(FW_WROPCODE_OFLD_CLOSE_CON));
> + req->wr.wr_lo = htonl(V_WR_TID(ep->hwtid));
> + OPCODE_TID(req) = htonl(MK_OPCODE_TID(CPL_CLOSE_CON_REQ, ep->hwtid));
> + l2t_send(ep->com.tdev, skb, ep->l2t);
> + return 0;
> +}
> +
> +static int send_abort(struct iwch_ep *ep, struct sk_buff *skb, gfp_t gfp)
> +{
> + struct cpl_abort_req *req;
> +
> + PDBG("%s ep %p\n", __FUNCTION__, ep);
> + skb = get_skb(skb, sizeof(*req), gfp);
> + if (!skb) {
> + printk(KERN_ERR MOD "%s - failed to alloc skb.\n",
> + __FUNCTION__);
> + return -ENOMEM;
> + }
> + skb->priority = CPL_PRIORITY_DATA;
> + set_arp_failure_handler(skb, abort_arp_failure);
> + req = (struct cpl_abort_req *) skb_put(skb, sizeof(*req));
> + req->wr.wr_hi = htonl(V_WR_OP(FW_WROPCODE_OFLD_HOST_ABORT_CON_REQ));
> + req->wr.wr_lo = htonl(V_WR_TID(ep->hwtid));
> + OPCODE_TID(req) = htonl(MK_OPCODE_TID(CPL_ABORT_REQ, ep->hwtid));
> + req->cmd = CPL_ABORT_SEND_RST;
> + l2t_send(ep->com.tdev, skb, ep->l2t);
> + return 0;
> +}
> +
> +static int send_connect(struct iwch_ep *ep)
> +{
> + struct cpl_act_open_req *req;
> + struct sk_buff *skb;
> + u32 opt0h, opt0l, opt2;
> + unsigned int mtu_idx;
> + int wscale;
> +
> + PDBG("%s ep %p\n", __FUNCTION__, ep);
> +
> + skb = get_skb(NULL, sizeof(*req), GFP_KERNEL);
> + if (!skb) {
> + printk(KERN_ERR MOD "%s - failed to alloc skb.\n",
> + __FUNCTION__);
> + return -ENOMEM;
> + }
> + mtu_idx = find_best_mtu(T3C_DATA(ep->com.tdev), dst_mtu(ep->dst));
> + wscale = compute_wscale(rcv_win);
> + opt0h = V_NAGLE(0) |
> + V_NO_CONG(nocong) |
> + V_KEEP_ALIVE(1) |
> + F_TCAM_BYPASS |
> + V_WND_SCALE(wscale) |
> + V_MSS_IDX(mtu_idx) |
> + V_L2T_IDX(ep->l2t->idx) | V_TX_CHANNEL(ep->l2t->smt_idx);
> + opt0l = V_TOS((ep->tos >> 2) & M_TOS) | V_RCV_BUFSIZ(rcv_win>>10);
> + opt2 = V_FLAVORS_VALID(0) | V_CONG_CONTROL_FLAVOR(0);
> + skb->priority = CPL_PRIORITY_SETUP;
> + set_arp_failure_handler(skb, act_open_req_arp_failure);
> +
> + req = (struct cpl_act_open_req *) skb_put(skb, sizeof(*req));
> + req->wr.wr_hi = htonl(V_WR_OP(FW_WROPCODE_FORWARD));
> + OPCODE_TID(req) = htonl(MK_OPCODE_TID(CPL_ACT_OPEN_REQ, ep->atid));
> + req->local_port = ep->com.local_addr.sin_port;
> + req->peer_port = ep->com.remote_addr.sin_port;
> + req->local_ip = ep->com.local_addr.sin_addr.s_addr;
> + req->peer_ip = ep->com.remote_addr.sin_addr.s_addr;
> + req->opt0h = htonl(opt0h);
> + req->opt0l = htonl(opt0l);
> + req->params = 0;
> + req->opt2 = htonl(opt2);
> + l2t_send(ep->com.tdev, skb, ep->l2t);
> + return 0;
> +}

...

> +static int act_establish(struct t3cdev *tdev, struct sk_buff *skb, void *ctx)
> +{
> + struct iwch_ep *ep = ctx;
> + struct cpl_act_establish *req = cplhdr(skb);
> + unsigned int tid = GET_TID(req);
> +
> + PDBG("%s ep %p tid %d\n", __FUNCTION__, ep, tid);
> +
> + dst_confirm(ep->dst);
> +
> + /* setup the hwtid for this connection */
> + ep->hwtid = tid;
> + cxgb3_insert_tid(ep->com.tdev, &t3c_client, ep, tid);
> +
> + ep->snd_seq = ntohl(req->snd_isn);
> +
> + set_emss(ep, ntohs(req->tcp_opt));
> +
> + /* dealloc the atid */
> + cxgb3_free_atid(ep->com.tdev, ep->atid);
> +
> + /* start MPA negotiation */
> + send_mpa_req(ep, skb);
> +
> + return 0;
> +}
> +
> +static void abort_connection(struct iwch_ep *ep, struct sk_buff *skb)
> +{
> + PDBG("%s ep %p\n", __FILE__, ep);
> + state_set(&ep->com, ABORTING);
> + send_abort(ep, skb, GFP_KERNEL);
> +}

Could you convince network core developers that it is not own TCP
implementation which will mess with existing one?

This and a lot of other changes in this driver definitely says you
implement your own stack of protocols on top of infiniband hardware.

--
Evgeniy Polyakov

2006-12-04 15:46:05

by Roland Dreier

[permalink] [raw]
Subject: Re: [PATCH v2 04/13] Connection Manager

> Could you convince network core developers that it is not own TCP
> implementation which will mess with existing one?

I'm not qualified to comment on this...

> This and a lot of other changes in this driver definitely says you
> implement your own stack of protocols on top of infiniband hardware.

...but I do know this driver is for 10-gig ethernet HW.

- R.

2006-12-04 16:20:54

by Steve Wise

[permalink] [raw]
Subject: Re: [PATCH v2 04/13] Connection Manager

On Mon, 2006-12-04 at 07:45 -0800, Roland Dreier wrote:
> > Could you convince network core developers that it is not own TCP
> > implementation which will mess with existing one?
>
> I'm not qualified to comment on this...
>

I don't understand your question?

> > This and a lot of other changes in this driver definitely says you
> > implement your own stack of protocols on top of infiniband hardware.
>
> ...but I do know this driver is for 10-gig ethernet HW.
>

There is no SW TCP stack in this driver. The HW supports RDMA over
TCP/IP/10GbE in HW and this is required for zero-copy RDMA over Ethernet
(aka iWARP). The device is a 10 GbE device, not Infiniband. The
Ethernet driver, upon which the rdma driver depends, acts both like a
traditional Ethernet NIC for the Linux stack as well as a TCP offload
device for the RDMA driver allowing establishment of RDMA connections.
The Connection Manager (patch 04/13) sends/receives messages from the
Ethernet driver that sets up HW TCP connections for doing RDMA. While
this is indeed implementing TCP offload, it is _not_ integrating it with
the sockets layer nor the linux stack and offloading sockets
connections. Its only supporting offload connections for the RDMA
driver to do iWARP. The Ammasso device is another example of this
(drivers/infiniband/hw/amso1100). Deep iSCSI adapters are another
example of this.


Steve.


2006-12-04 16:24:36

by Steve Wise

[permalink] [raw]
Subject: Re: [PATCH v2 00/13] 2.6.20 Chelsio T3 RDMA Driver

> >>
> >
> > I understood that Stephen expressed some doubts regarding the inclusion
> > of TOE enabled features.
> >
> > Was his point addressed ?
> >
> >
>
> My comments were about different hardware.


Just to clarify:

Stephen is working on the Chelsio T2 HW driver.

The drivers Divy and I are submitting are for the new Chelsio T3
hardware. Two drivers are being submitted: The Ethernet driver
(submitted by Divy) and the RDMA driver (submitted by me) which requires
the Ethernet driver. The RDMA driver will live in
drivers/infiniband/hw/cxgb3 and the Ethernet driver will live in
drivers/net/cxgb3.


Steve.




2006-12-04 16:28:27

by Steve Wise

[permalink] [raw]
Subject: Re: [PATCH v2 03/13] Provider Methods and Data Structures

On Sun, 2006-12-03 at 13:07 +0100, Arjan van de Ven wrote:
> On Sat, 2006-12-02 at 16:49 -0600, Steve Wise wrote:
>
> > +
> > +static struct ib_ah *iwch_ah_create(struct ib_pd *pd,
> > + struct ib_ah_attr *ah_attr)
> > +{
> > + return ERR_PTR(-ENOSYS);
> > +}
>
>
> -ENOSYS is just about ALWAYS a bug in that it's guaranteed to be the
> wrong error code ;)

This is a method that is not supported by the iWARP transport. ENOSYS
indicates this. I _think_ this is SOP for the infinband subsystem.

Roland, I think at one time we were talking about changing the Core to
better handle this? Either with attributes/capabilities that the low
level driver can set, or by set these method ptrs to NULL and the core
should handle it in the wrapper function...


Steve.

2006-12-04 16:45:47

by Roland Dreier

[permalink] [raw]
Subject: Re: [PATCH v2 03/13] Provider Methods and Data Structures

> Roland, I think at one time we were talking about changing the Core to
> better handle this? Either with attributes/capabilities that the low
> level driver can set, or by set these method ptrs to NULL and the core
> should handle it in the wrapper function...

Yes, it would make sense to change the midlayer so we have different
sets of mandatory functions for IB and iWARP drivers. For example,
the iwcm functions probably should be mandatory for iWARP devices, right?

- R.

2006-12-04 16:50:58

by Steve Wise

[permalink] [raw]
Subject: Re: [PATCH v2 03/13] Provider Methods and Data Structures

On Mon, 2006-12-04 at 08:45 -0800, Roland Dreier wrote:
> > Roland, I think at one time we were talking about changing the Core to
> > better handle this? Either with attributes/capabilities that the low
> > level driver can set, or by set these method ptrs to NULL and the core
> > should handle it in the wrapper function...
>
> Yes, it would make sense to change the midlayer so we have different
> sets of mandatory functions for IB and iWARP drivers. For example,
> the iwcm functions probably should be mandatory for iWARP devices, right?
>

Yes. The iWARP devices must all support the iwcm methods for sure.



2006-12-05 05:10:50

by Evgeniy Polyakov

[permalink] [raw]
Subject: Re: [PATCH v2 04/13] Connection Manager

On Mon, Dec 04, 2006 at 07:45:52AM -0800, Roland Dreier ([email protected]) wrote:
> > This and a lot of other changes in this driver definitely says you
> > implement your own stack of protocols on top of infiniband hardware.
>
> ...but I do know this driver is for 10-gig ethernet HW.

It is for iwarp/rdma from description.
If it is 10ge, then why does it parse incomping packet headers and
implements initial tcp state machine?

> - R.

--
Evgeniy Polyakov

2006-12-05 05:14:09

by Roland Dreier

[permalink] [raw]
Subject: Re: [PATCH v2 04/13] Connection Manager

> It is for iwarp/rdma from description.

Yes, iWARP on top of 10G ethernet.

> If it is 10ge, then why does it parse incomping packet headers and
> implements initial tcp state machine?

To establish connections to run RDMA over, I guess. iWARP is RDMA
over TCP.

- R.

2006-12-05 05:14:17

by Evgeniy Polyakov

[permalink] [raw]
Subject: Re: [PATCH v2 04/13] Connection Manager

On Mon, Dec 04, 2006 at 10:20:51AM -0600, Steve Wise ([email protected]) wrote:
> > > This and a lot of other changes in this driver definitely says you
> > > implement your own stack of protocols on top of infiniband hardware.
> >
> > ...but I do know this driver is for 10-gig ethernet HW.
> >
>
> There is no SW TCP stack in this driver. The HW supports RDMA over
> TCP/IP/10GbE in HW and this is required for zero-copy RDMA over Ethernet
> (aka iWARP). The device is a 10 GbE device, not Infiniband. The
> Ethernet driver, upon which the rdma driver depends, acts both like a
> traditional Ethernet NIC for the Linux stack as well as a TCP offload
> device for the RDMA driver allowing establishment of RDMA connections.
> The Connection Manager (patch 04/13) sends/receives messages from the
> Ethernet driver that sets up HW TCP connections for doing RDMA. While
> this is indeed implementing TCP offload, it is _not_ integrating it with
> the sockets layer nor the linux stack and offloading sockets
> connections. Its only supporting offload connections for the RDMA
> driver to do iWARP. The Ammasso device is another example of this
> (drivers/infiniband/hw/amso1100). Deep iSCSI adapters are another
> example of this.

So what will happen when application will create a socket, bind it to
that NIC, and then try to establish a TCP connection? How NIC will
decide that received packets are from socket but not for internal TCP
state machine handled by that device?

As a side note, does all iwarp devices _require_ to have very
limited TCP engine implemented it in its hardware, or it is possible
to work with external SW stack?

> Steve.

--
Evgeniy Polyakov

2006-12-05 05:17:18

by Evgeniy Polyakov

[permalink] [raw]
Subject: Re: [PATCH v2 04/13] Connection Manager

On Mon, Dec 04, 2006 at 09:13:59PM -0800, Roland Dreier ([email protected]) wrote:
> > It is for iwarp/rdma from description.
>
> Yes, iWARP on top of 10G ethernet.
>
> > If it is 10ge, then why does it parse incomping packet headers and
> > implements initial tcp state machine?
>
> To establish connections to run RDMA over, I guess. iWARP is RDMA
> over TCP.

So will each new NIC implement some parts of TCP stack in theirs drivers?

> - R.

--
Evgeniy Polyakov

2006-12-05 05:27:23

by Roland Dreier

[permalink] [raw]
Subject: Re: [PATCH v2 04/13] Connection Manager

> So will each new NIC implement some parts of TCP stack in theirs drivers?

I hope not. The driver we merged (amso1100) did it completely in FW,
with a separate MAC and IP interface for the RDMA connections. I
think we better understand the Chelsio driver pretty well and think it
over carefully before we merge it.

Thanks for pointing this stuff out.

- R.

2006-12-05 10:46:11

by Brice Goglin

[permalink] [raw]
Subject: Re: [PATCH v2 04/13] Connection Manager

Steve Wise wrote:
> There is no SW TCP stack in this driver. The HW supports RDMA over
> TCP/IP/10GbE in HW and this is required for zero-copy RDMA over Ethernet
> (aka iWARP). The device is a 10 GbE device, not Infiniband.

Then, I wonder why the driver goes in drivers/infiniband/ :)

Is there really no way to only keep the actual hw infiniband there, move
iwarp/rdma drivers in drivers/net/something/ and the core stuff in
net/something/ ?

Brice

2006-12-05 15:02:10

by Steve Wise

[permalink] [raw]
Subject: Re: [PATCH v2 04/13] Connection Manager

On Tue, 2006-12-05 at 08:07 +0300, Evgeniy Polyakov wrote:
> On Mon, Dec 04, 2006 at 07:45:52AM -0800, Roland Dreier ([email protected]) wrote:
> > > This and a lot of other changes in this driver definitely says you
> > > implement your own stack of protocols on top of infiniband hardware.
> >
> > ...but I do know this driver is for 10-gig ethernet HW.
>
> It is for iwarp/rdma from description.
> If it is 10ge, then why does it parse incomping packet headers and
> implements initial tcp state machine?
>

Its not implementing the TCP state machine at all. Its implementing the
MPA state machine (see the iWARP internet drafts). These packets are
TCP payload. MPA is used to negotiate RDMA mode on a TCP connection.
This entails an exchange of 2 messages on the TCP connection. Once this
is exchanged and both side agree, the connection is bound to an RDMA QP
and the connection moved into RDMA mode. From that point on, all IO is
done via the post_send() and post_recv().


Steve.


2006-12-05 15:03:36

by Steve Wise

[permalink] [raw]
Subject: Re: [PATCH v2 04/13] Connection Manager

On Mon, 2006-12-04 at 21:13 -0800, Roland Dreier wrote:
> > It is for iwarp/rdma from description.
>
> Yes, iWARP on top of 10G ethernet.
>
> > If it is 10ge, then why does it parse incomping packet headers and
> > implements initial tcp state machine?
>
> To establish connections to run RDMA over, I guess. iWARP is RDMA
> over TCP.
>

The driver uses messages exchanged to and from the HW via the Ethernet
driver to setup TCP connections. No TCP processing is done in the host.
The hardware does all the TCP processing.


Steve.



2006-12-05 15:07:34

by Steve Wise

[permalink] [raw]
Subject: Re: [PATCH v2 04/13] Connection Manager

On Tue, 2006-12-05 at 08:13 +0300, Evgeniy Polyakov wrote:
> On Mon, Dec 04, 2006 at 10:20:51AM -0600, Steve Wise ([email protected]) wrote:
> > > > This and a lot of other changes in this driver definitely says you
> > > > implement your own stack of protocols on top of infiniband hardware.
> > >
> > > ...but I do know this driver is for 10-gig ethernet HW.
> > >
> >
> > There is no SW TCP stack in this driver. The HW supports RDMA over
> > TCP/IP/10GbE in HW and this is required for zero-copy RDMA over Ethernet
> > (aka iWARP). The device is a 10 GbE device, not Infiniband. The
> > Ethernet driver, upon which the rdma driver depends, acts both like a
> > traditional Ethernet NIC for the Linux stack as well as a TCP offload
> > device for the RDMA driver allowing establishment of RDMA connections.
> > The Connection Manager (patch 04/13) sends/receives messages from the
> > Ethernet driver that sets up HW TCP connections for doing RDMA. While
> > this is indeed implementing TCP offload, it is _not_ integrating it with
> > the sockets layer nor the linux stack and offloading sockets
> > connections. Its only supporting offload connections for the RDMA
> > driver to do iWARP. The Ammasso device is another example of this
> > (drivers/infiniband/hw/amso1100). Deep iSCSI adapters are another
> > example of this.
>
> So what will happen when application will create a socket, bind it to
> that NIC, and then try to establish a TCP connection? How NIC will
> decide that received packets are from socket but not for internal TCP
> state machine handled by that device?

The HW knows which TCP connections are offloaded by virtue of the fact
that they were setup via the RDMA subsystem. Any other TCP traffic (and
all other non TCP traffic) gets passed to the host stack.

>
> As a side note, does all iwarp devices _require_ to have very
> limited TCP engine implemented it in its hardware, or it is possible
> to work with external SW stack?

It is possible, but not very interesting.

One could implement an all-software iWARP stack. The iWARP protocols
are just TCP payload and _could_ be implemented in user mode on top of a
socket. However, this isn't very interesting: the goal of iWARP (and
RDMA for that matter) is to allow direct placement of data into user
memory with 0 copies done by the host CPU. low latency.

Steve.


2006-12-05 15:14:37

by Steve Wise

[permalink] [raw]
Subject: Re: [PATCH v2 04/13] Connection Manager

On Mon, 2006-12-04 at 21:27 -0800, Roland Dreier wrote:
> > So will each new NIC implement some parts of TCP stack in theirs drivers?
>
> I hope not. The driver we merged (amso1100) did it completely in FW,
> with a separate MAC and IP interface for the RDMA connections. I
> think we better understand the Chelsio driver pretty well and think it
> over carefully before we merge it.
>

Chelsio doesn't implement TCP stack in the driver. Just like Ammasso,
it sends messages to the HW to setup connections. It differs from
Ammasso in at least 2 ways:

1) Ammasso does the MPA negotiations in FW/HW. Chelsio does it in the
RDMA driver. So there is code in the Chelsio driver to handle MPA
startup negotiation (the exchange of 2 packets over the TCP connection
while its still in streaming more). BTW: This code _could_ be moved
into the core IWCM if we find it could be used by other rnic devices
(don't know yet).

2) Ammasso implments a 100% deep adapter. It does ARP, routing, IP,
TCP, and IWARP protocols all in firmware/hw. It had 2 mac addresses
simulating 2 ethernet ports. One exclusively for RDMA connections, and
one for host stack traffic. Chelsio implements a shallower adapter that
only does TCP in HW. ARP, for instance, is handled by the native stack
and the rdma driver uses netevents to maintain arp tables in the HW for
use by the offloaded TCP connections.

Steve.



2006-12-05 15:23:30

by Evgeniy Polyakov

[permalink] [raw]
Subject: Re: [PATCH v2 04/13] Connection Manager

On Tue, Dec 05, 2006 at 09:02:05AM -0600, Steve Wise ([email protected]) wrote:
> > > > This and a lot of other changes in this driver definitely says you
> > > > implement your own stack of protocols on top of infiniband hardware.
> > >
> > > ...but I do know this driver is for 10-gig ethernet HW.
> >
> > It is for iwarp/rdma from description.
> > If it is 10ge, then why does it parse incomping packet headers and
> > implements initial tcp state machine?
> >
>
> Its not implementing the TCP state machine at all. Its implementing the
> MPA state machine (see the iWARP internet drafts). These packets are
> TCP payload. MPA is used to negotiate RDMA mode on a TCP connection.
> This entails an exchange of 2 messages on the TCP connection. Once this
> is exchanged and both side agree, the connection is bound to an RDMA QP
> and the connection moved into RDMA mode. From that point on, all IO is
> done via the post_send() and post_recv().

And why does rdma require window scaling, keep alive, nagle and other
interesting options from TCP spec?

This really looks like initial implementation of TCP in hardware - you
setup flags like doing the same using setsockopt() and then hardware
manages the flow like network stack manages TCP state machine changes.

According to draft-culley-iwarp-mpa-03.txt this layer can do a lot of
things with valid TCP flow like

5. The TCP sender puts the FPDUs into the TCP stream. If the TCP
Sender is MPA-aware, it segments the TCP stream in such a way
that a TCP Segment boundary is also the boundary of an FPDU.
TCP then passes each segment to the IP layer for transmission.

Phrases like "MPA-aware TCP" rises a lot of questions - briefly saying
that hardware (even if it is called ethernet driver) can create and work
with own TCP flows potentially modified in the way it likes which is seen
in driver. Likely such flows will not be seen by upper layers like OS
network stack according to hardware descriptions.

Is it correct?

> Steve.

--
Evgeniy Polyakov

2006-12-05 15:27:47

by Evgeniy Polyakov

[permalink] [raw]
Subject: Re: [PATCH v2 04/13] Connection Manager

On Tue, Dec 05, 2006 at 09:14:36AM -0600, Steve Wise ([email protected]) wrote:
> Chelsio doesn't implement TCP stack in the driver. Just like Ammasso,
> it sends messages to the HW to setup connections. It differs from
> Ammasso in at least 2 ways:
>
> 1) Ammasso does the MPA negotiations in FW/HW. Chelsio does it in the
> RDMA driver. So there is code in the Chelsio driver to handle MPA
> startup negotiation (the exchange of 2 packets over the TCP connection
> while its still in streaming more). BTW: This code _could_ be moved
> into the core IWCM if we find it could be used by other rnic devices
> (don't know yet).
>
> 2) Ammasso implments a 100% deep adapter. It does ARP, routing, IP,
> TCP, and IWARP protocols all in firmware/hw. It had 2 mac addresses
> simulating 2 ethernet ports. One exclusively for RDMA connections, and
> one for host stack traffic. Chelsio implements a shallower adapter that
> only does TCP in HW. ARP, for instance, is handled by the native stack
> and the rdma driver uses netevents to maintain arp tables in the HW for
> use by the offloaded TCP connections.

So breifly saying - there is TCP stack implementation (including ARP and
routing and other parts) in hardware/firmware/driver which is guaranteed
to not be visible to host other than in form of high-level dataflow.
Am I right here?

> Steve.
>
>
>
> -
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html

--
Evgeniy Polyakov

2006-12-05 15:40:00

by Steve Wise

[permalink] [raw]
Subject: Re: [PATCH v2 04/13] Connection Manager

On Tue, 2006-12-05 at 18:19 +0300, Evgeniy Polyakov wrote:
> On Tue, Dec 05, 2006 at 09:02:05AM -0600, Steve Wise ([email protected]) wrote:
> > > > > This and a lot of other changes in this driver definitely says you
> > > > > implement your own stack of protocols on top of infiniband hardware.
> > > >
> > > > ...but I do know this driver is for 10-gig ethernet HW.
> > >
> > > It is for iwarp/rdma from description.
> > > If it is 10ge, then why does it parse incomping packet headers and
> > > implements initial tcp state machine?
> > >
> >
> > Its not implementing the TCP state machine at all. Its implementing the
> > MPA state machine (see the iWARP internet drafts). These packets are
> > TCP payload. MPA is used to negotiate RDMA mode on a TCP connection.
> > This entails an exchange of 2 messages on the TCP connection. Once this
> > is exchanged and both side agree, the connection is bound to an RDMA QP
> > and the connection moved into RDMA mode. From that point on, all IO is
> > done via the post_send() and post_recv().
>
> And why does rdma require window scaling, keep alive, nagle and other
> interesting options from TCP spec?
>

The connection setup messages sent to the hardware need to have these
parameters so the TCP engine on the HW knows how to do connection
options, windows, etc.

> This really looks like initial implementation of TCP in hardware - you
> setup flags like doing the same using setsockopt() and then hardware
> manages the flow like network stack manages TCP state machine changes.
>
> According to draft-culley-iwarp-mpa-03.txt this layer can do a lot of
> things with valid TCP flow like
>
> 5. The TCP sender puts the FPDUs into the TCP stream. If the TCP
> Sender is MPA-aware, it segments the TCP stream in such a way
> that a TCP Segment boundary is also the boundary of an FPDU.
> TCP then passes each segment to the IP layer for transmission.
>
> Phrases like "MPA-aware TCP" rises a lot of questions - briefly saying
> that hardware (even if it is called ethernet driver) can create and work
> with own TCP flows potentially modified in the way it likes which is seen
> in driver. Likely such flows will not be seen by upper layers like OS
> network stack according to hardware descriptions.
>
> Is it correct?
>

I don't quite get your point about the driver aspect of this?

The HW manages the iWARP connection including data flow. It adheres to
the MPA, RDDP, and RDMAP protocol specification IDs from the IETF. The
HW manages how data gets pushed out in the RDMA stream. The RDMA
Driver just requests a TCP connection and does the MPA exchange. Then
tells the hardware to move the connection into RDMA mode. From that
point on, the driver simply suffles IO work requests from the consumer
application to the hardware and handles asynchronous events while the
connection is up and running.

Steve.




2006-12-05 15:46:19

by Steve Wise

[permalink] [raw]
Subject: Re: [PATCH v2 04/13] Connection Manager

On Tue, 2006-12-05 at 18:27 +0300, Evgeniy Polyakov wrote:
> On Tue, Dec 05, 2006 at 09:14:36AM -0600, Steve Wise ([email protected]) wrote:
> > Chelsio doesn't implement TCP stack in the driver. Just like Ammasso,
> > it sends messages to the HW to setup connections. It differs from
> > Ammasso in at least 2 ways:
> >
> > 1) Ammasso does the MPA negotiations in FW/HW. Chelsio does it in the
> > RDMA driver. So there is code in the Chelsio driver to handle MPA
> > startup negotiation (the exchange of 2 packets over the TCP connection
> > while its still in streaming more). BTW: This code _could_ be moved
> > into the core IWCM if we find it could be used by other rnic devices
> > (don't know yet).
> >
> > 2) Ammasso implments a 100% deep adapter. It does ARP, routing, IP,
> > TCP, and IWARP protocols all in firmware/hw. It had 2 mac addresses
> > simulating 2 ethernet ports. One exclusively for RDMA connections, and
> > one for host stack traffic. Chelsio implements a shallower adapter that
> > only does TCP in HW. ARP, for instance, is handled by the native stack
> > and the rdma driver uses netevents to maintain arp tables in the HW for
> > use by the offloaded TCP connections.
>
> So breifly saying - there is TCP stack implementation (including ARP and
> routing and other parts) in hardware/firmware/driver which is guaranteed
> to not be visible to host other than in form of high-level dataflow.
> Am I right here?

For Ammasso, yes.

2006-12-05 16:02:12

by Steve Wise

[permalink] [raw]
Subject: Re: [PATCH v2 04/13] Connection Manager

On Tue, 2006-12-05 at 11:45 +0100, Brice Goglin wrote:
> Steve Wise wrote:
> > There is no SW TCP stack in this driver. The HW supports RDMA over
> > TCP/IP/10GbE in HW and this is required for zero-copy RDMA over Ethernet
> > (aka iWARP). The device is a 10 GbE device, not Infiniband.
>
> Then, I wonder why the driver goes in drivers/infiniband/ :)

drivers/infiniband support both IB and IWARP transports.

> Is there really no way to only keep the actual hw infiniband there, move
> iwarp/rdma drivers in drivers/net/something/ and the core stuff in
> net/something/ ?
>

Sure, this _could_ be done, but what I think you're missing is that
applications use the interface exported by drivers/infiniband over both
IB -and- IWARP transports. The application can be written to not care
which transport is used. Examples of apps that can run over both
transports using the same common interface:

user mode: MVAPICH2, OMPI, IMPI, HPMPI,
kernel mode: NFS-RDMA, iSER.

Note that the include directory used by drivers/infiniband is now
include/rdma. Perhaps drivers/infiniband should be renamed to
drivers/rdma as well at some point...



Steve.



2006-12-05 16:04:25

by Evgeniy Polyakov

[permalink] [raw]
Subject: Re: [PATCH v2 04/13] Connection Manager

On Tue, Dec 05, 2006 at 09:39:58AM -0600, Steve Wise ([email protected]) wrote:
> > Phrases like "MPA-aware TCP" rises a lot of questions - briefly saying
> > that hardware (even if it is called ethernet driver) can create and work
> > with own TCP flows potentially modified in the way it likes which is seen
> > in driver. Likely such flows will not be seen by upper layers like OS
> > network stack according to hardware descriptions.
> >
> > Is it correct?
> >
>
> I don't quite get your point about the driver aspect of this?
>
> The HW manages the iWARP connection including data flow. It adheres to
> the MPA, RDDP, and RDMAP protocol specification IDs from the IETF. The
> HW manages how data gets pushed out in the RDMA stream. The RDMA
> Driver just requests a TCP connection and does the MPA exchange. Then
> tells the hardware to move the connection into RDMA mode. From that
> point on, the driver simply suffles IO work requests from the consumer
> application to the hardware and handles asynchronous events while the
> connection is up and running.

My main concern about this is the fact, that protocol handling is
splitted into SF and HW parts, and actually until negotiation is
completed those parts are completely unrelated to each other, so
requested TCP connection can leak into main stack and main stack can
send some packets which can be considered as MPA negotiation.

> Steve.

--
Evgeniy Polyakov

2006-12-05 16:12:44

by Steve Wise

[permalink] [raw]
Subject: Re: [PATCH v2 04/13] Connection Manager

On Tue, 2006-12-05 at 18:59 +0300, Evgeniy Polyakov wrote:
> On Tue, Dec 05, 2006 at 09:39:58AM -0600, Steve Wise ([email protected]) wrote:
> > > Phrases like "MPA-aware TCP" rises a lot of questions - briefly saying
> > > that hardware (even if it is called ethernet driver) can create and work
> > > with own TCP flows potentially modified in the way it likes which is seen
> > > in driver. Likely such flows will not be seen by upper layers like OS
> > > network stack according to hardware descriptions.
> > >
> > > Is it correct?
> > >
> >
> > I don't quite get your point about the driver aspect of this?
> >
> > The HW manages the iWARP connection including data flow. It adheres to
> > the MPA, RDDP, and RDMAP protocol specification IDs from the IETF. The
> > HW manages how data gets pushed out in the RDMA stream. The RDMA
> > Driver just requests a TCP connection and does the MPA exchange. Then
> > tells the hardware to move the connection into RDMA mode. From that
> > point on, the driver simply suffles IO work requests from the consumer
> > application to the hardware and handles asynchronous events while the
> > connection is up and running.
>
> My main concern about this is the fact, that protocol handling is
> splitted into SF and HW parts, and actually until negotiation is
> completed those parts are completely unrelated to each other, so
> requested TCP connection can leak into main stack and main stack can
> send some packets which can be considered as MPA negotiation.
>

Ah. Data from an offloaded connection cannot leak into the main stack
nor vice-verse. We can take an active RDMA connection establishment as
an example if you want: Once the message is sent to the HW to "setup a
TCP connection from addr/port a.b to addr/port c.d", then packets on
that connection (that 4-tuple) will always be delivered to the RDMA
driver, not the native stack. If the the packet received after the
connection is setup is -not- an MPA reply (in this example), then the
connection is aborted. Once the connection is aborted. So no leaking
can happen.





2006-12-05 16:17:44

by Steve Wise

[permalink] [raw]
Subject: Re: [openib-general] [PATCH v2 04/13] Connection Manager

On Tue, 2006-12-05 at 10:12 -0600, Steve Wise wrote:
> On Tue, 2006-12-05 at 18:59 +0300, Evgeniy Polyakov wrote:
> > On Tue, Dec 05, 2006 at 09:39:58AM -0600, Steve Wise ([email protected]) wrote:
> > > > Phrases like "MPA-aware TCP" rises a lot of questions - briefly saying
> > > > that hardware (even if it is called ethernet driver) can create and work
> > > > with own TCP flows potentially modified in the way it likes which is seen
> > > > in driver. Likely such flows will not be seen by upper layers like OS
> > > > network stack according to hardware descriptions.
> > > >
> > > > Is it correct?
> > > >
> > >
> > > I don't quite get your point about the driver aspect of this?
> > >
> > > The HW manages the iWARP connection including data flow. It adheres to
> > > the MPA, RDDP, and RDMAP protocol specification IDs from the IETF. The
> > > HW manages how data gets pushed out in the RDMA stream. The RDMA
> > > Driver just requests a TCP connection and does the MPA exchange. Then
> > > tells the hardware to move the connection into RDMA mode. From that
> > > point on, the driver simply suffles IO work requests from the consumer
> > > application to the hardware and handles asynchronous events while the
> > > connection is up and running.
> >
> > My main concern about this is the fact, that protocol handling is
> > splitted into SF and HW parts, and actually until negotiation is
> > completed those parts are completely unrelated to each other, so
> > requested TCP connection can leak into main stack and main stack can
> > send some packets which can be considered as MPA negotiation.
> >
>
> Ah. Data from an offloaded connection cannot leak into the main stack
> nor vice-verse. We can take an active RDMA connection establishment as
> an example if you want: Once the message is sent to the HW to "setup a
> TCP connection from addr/port a.b to addr/port c.d", then packets on
> that connection (that 4-tuple) will always be delivered to the RDMA
> driver, not the native stack. If the the packet received after the
> connection is setup is -not- an MPA reply (in this example), then the
> connection is aborted. Once the connection is aborted.
^ the 4 tuple can
then be reused for rdma or native stack tcp connections.



2006-12-05 16:27:14

by Steve Wise

[permalink] [raw]
Subject: Re: [openib-general] [PATCH v2 04/13] Connection Manager

On Tue, 2006-12-05 at 10:02 -0600, Steve Wise wrote:
> On Tue, 2006-12-05 at 11:45 +0100, Brice Goglin wrote:
> > Steve Wise wrote:
> > > There is no SW TCP stack in this driver. The HW supports RDMA over
> > > TCP/IP/10GbE in HW and this is required for zero-copy RDMA over Ethernet
> > > (aka iWARP). The device is a 10 GbE device, not Infiniband.
> >
> > Then, I wonder why the driver goes in drivers/infiniband/ :)
>
> drivers/infiniband support both IB and IWARP transports.
>
> > Is there really no way to only keep the actual hw infiniband there, move
> > iwarp/rdma drivers in drivers/net/something/ and the core stuff in
> > net/something/ ?
> >
>
> Sure, this _could_ be done, but what I think you're missing is that
> applications use the interface exported by drivers/infiniband over both
> IB -and- IWARP transports. The application can be written to not care
> which transport is used. Examples of apps that can run over both
> transports using the same common interface:
>
> user mode: MVAPICH2, OMPI, IMPI, HPMPI,
> kernel mode: NFS-RDMA, iSER.
>
> Note that the include directory used by drivers/infiniband is now
> include/rdma. Perhaps drivers/infiniband should be renamed to
> drivers/rdma as well at some point...


By the way, FYI: The Chelsio T3 device support is split into 2 driver
modules: the Ethernet driver and the RDMA driver. The Ethernet driver
lives in drivers/net/cxgb3 while the RDMA driver lives in
drivers/infiniband/hw/cxgb3. The Ethernet driver can be used
stand-alone as a 10GbE high-performance NIC driver. The RDMA driver has
a config-time dependency on the Ethernet driver.

The 2nd version of the Ethernet driver was posted yesterday. See:

http://www.spinics.net/lists/netdev/msg20464.html



Steve.

2006-12-05 16:32:21

by Evgeniy Polyakov

[permalink] [raw]
Subject: Re: [PATCH v2 04/13] Connection Manager

On Tue, Dec 05, 2006 at 10:12:42AM -0600, Steve Wise ([email protected]) wrote:
> Ah. Data from an offloaded connection cannot leak into the main stack
> nor vice-verse. We can take an active RDMA connection establishment as
> an example if you want: Once the message is sent to the HW to "setup a
> TCP connection from addr/port a.b to addr/port c.d", then packets on
> that connection (that 4-tuple) will always be delivered to the RDMA
> driver, not the native stack. If the the packet received after the
> connection is setup is -not- an MPA reply (in this example), then the
> connection is aborted. Once the connection is aborted. So no leaking
> can happen.

And if there were a dataflow between addr/port a.b to addr/port c.d
already, it will either terminated?

Considering the following sequence:
handlers->t3c_handlers->sched()->work_queue->work_handlers()->for
example CPL_PASS_ACCEPT_REQ->pass_accept_req() - it just parses incoming
skb and sets port/addr/route and other fields to be used as a base for rdma
connection. What if it just a usual network packet from kernelspace or
userspace with the same payload as should be sent by remote rdma system?

--
Evgeniy Polyakov

2006-12-05 16:47:29

by Steve Wise

[permalink] [raw]
Subject: Re: [PATCH v2 04/13] Connection Manager

On Tue, 2006-12-05 at 19:31 +0300, Evgeniy Polyakov wrote:
> On Tue, Dec 05, 2006 at 10:12:42AM -0600, Steve Wise ([email protected]) wrote:
> > Ah. Data from an offloaded connection cannot leak into the main stack
> > nor vice-verse. We can take an active RDMA connection establishment as
> > an example if you want: Once the message is sent to the HW to "setup a
> > TCP connection from addr/port a.b to addr/port c.d", then packets on
> > that connection (that 4-tuple) will always be delivered to the RDMA
> > driver, not the native stack. If the the packet received after the
> > connection is setup is -not- an MPA reply (in this example), then the
> > connection is aborted. Once the connection is aborted. So no leaking
> > can happen.
>
> And if there were a dataflow between addr/port a.b to addr/port c.d
> already, it will either terminated?
>
> Considering the following sequence:
> handlers->t3c_handlers->sched()->work_queue->work_handlers()->for
> example CPL_PASS_ACCEPT_REQ->pass_accept_req() - it just parses incoming
> skb and sets port/addr/route and other fields to be used as a base for rdma
> connection. What if it just a usual network packet from kernelspace or
> userspace with the same payload as should be sent by remote rdma system?
>

That skb isn't a network packet. Its a CPL_PASS_ACCEPT_REQ message (see
struct cpl_pass_accept_req in the Ethernet driver t3_cpl.h). If the
RDMA driver hadn't registered to listen on that addr/port, it would
never get this skb. Once a connection is established, the MPA messages
(and any TCP payload data) is delivered to the RDMA driver in the form
of skb's containing struct cpl_rx_data. So these skbs aren't just TCP
packets at all. They either control messages or TCP payload. Either way
they are encapsulated in CPL message structures.

Does this make sense?



2006-12-05 17:14:13

by Roland Dreier

[permalink] [raw]
Subject: Re: [PATCH v2 04/13] Connection Manager

> Is there really no way to only keep the actual hw infiniband there, move
> iwarp/rdma drivers in drivers/net/something/ and the core stuff in
> net/something/ ?

It's definitely possible, but rearranging the source tree hasn't been
a high priority (for me at least).

- R.

2006-12-05 17:32:56

by Evgeniy Polyakov

[permalink] [raw]
Subject: Re: [PATCH v2 04/13] Connection Manager

On Tue, Dec 05, 2006 at 08:26:49PM +0300, Evgeniy Polyakov ([email protected]) wrote:
> On Tue, Dec 05, 2006 at 10:47:25AM -0600, Steve Wise ([email protected]) wrote:
> > > And if there were a dataflow between addr/port a.b to addr/port c.d
> > > already, it will either terminated?
> > >
> > > Considering the following sequence:
> > > handlers->t3c_handlers->sched()->work_queue->work_handlers()->for
> > > example CPL_PASS_ACCEPT_REQ->pass_accept_req() - it just parses incoming
> > > skb and sets port/addr/route and other fields to be used as a base for rdma
> > > connection. What if it just a usual network packet from kernelspace or
> > > userspace with the same payload as should be sent by remote rdma system?
> > >
> >
> > That skb isn't a network packet. Its a CPL_PASS_ACCEPT_REQ message (see
> > struct cpl_pass_accept_req in the Ethernet driver t3_cpl.h). If the
> > RDMA driver hadn't registered to listen on that addr/port, it would
> > never get this skb. Once a connection is established, the MPA messages
> > (and any TCP payload data) is delivered to the RDMA driver in the form
> > of skb's containing struct cpl_rx_data. So these skbs aren't just TCP
> > packets at all. They either control messages or TCP payload. Either way
> > they are encapsulated in CPL message structures.
> >
> > Does this make sense?
>
> Almost - except the case about where those skbs are coming from?
> It looks like they are obtained from network, since it is ethernet
> driver, and if they match some set of rules, they are considered as valid
> MPA negotiation protocol.
>
> If it is correct, it means that any packet in the network can be
> potentially 'stolen' by rdma hardware, although it was part of the usual
> dataflow.
> If that packets are not from ethernet network, but from different
> low-level, then there is a question (besides why this driver is called
> ethernet if it manages different hardware) about how connection over
> that different media is being setup and since packets contain perfectly
> valid IP addresses and ports.

It looks like I've answered myself - it is _not_ ethernet driver, but
rdma one, and although it gets all data through skbs from ethernet
driver, the latter gets them not from ethernet network.
And thus addresses and ports and all other information can not be mixed
between the two.

--
Evgeniy Polyakov

2006-12-05 17:34:13

by Evgeniy Polyakov

[permalink] [raw]
Subject: Re: [PATCH v2 04/13] Connection Manager

On Tue, Dec 05, 2006 at 10:47:25AM -0600, Steve Wise ([email protected]) wrote:
> > And if there were a dataflow between addr/port a.b to addr/port c.d
> > already, it will either terminated?
> >
> > Considering the following sequence:
> > handlers->t3c_handlers->sched()->work_queue->work_handlers()->for
> > example CPL_PASS_ACCEPT_REQ->pass_accept_req() - it just parses incoming
> > skb and sets port/addr/route and other fields to be used as a base for rdma
> > connection. What if it just a usual network packet from kernelspace or
> > userspace with the same payload as should be sent by remote rdma system?
> >
>
> That skb isn't a network packet. Its a CPL_PASS_ACCEPT_REQ message (see
> struct cpl_pass_accept_req in the Ethernet driver t3_cpl.h). If the
> RDMA driver hadn't registered to listen on that addr/port, it would
> never get this skb. Once a connection is established, the MPA messages
> (and any TCP payload data) is delivered to the RDMA driver in the form
> of skb's containing struct cpl_rx_data. So these skbs aren't just TCP
> packets at all. They either control messages or TCP payload. Either way
> they are encapsulated in CPL message structures.
>
> Does this make sense?

Almost - except the case about where those skbs are coming from?
It looks like they are obtained from network, since it is ethernet
driver, and if they match some set of rules, they are considered as valid
MPA negotiation protocol.

If it is correct, it means that any packet in the network can be
potentially 'stolen' by rdma hardware, although it was part of the usual
dataflow.
If that packets are not from ethernet network, but from different
low-level, then there is a question (besides why this driver is called
ethernet if it manages different hardware) about how connection over
that different media is being setup and since packets contain perfectly
valid IP addresses and ports.

And, btw, not related question - does postponing the whole skb multiplexing
to work queue result in lower latency and/or higher speed?
Since there are a lot of tricks introduced to minimize gap between
interrupt/napi polling and protocol processing, so such huge postponing
with the whole context switch looks strange.

--
Evgeniy Polyakov

2006-12-05 17:51:41

by Steve Wise

[permalink] [raw]
Subject: Re: [PATCH v2 04/13] Connection Manager

On Tue, 2006-12-05 at 20:26 +0300, Evgeniy Polyakov wrote:
> On Tue, Dec 05, 2006 at 10:47:25AM -0600, Steve Wise ([email protected]) wrote:
> > > And if there were a dataflow between addr/port a.b to addr/port c.d
> > > already, it will either terminated?
> > >
> > > Considering the following sequence:
> > > handlers->t3c_handlers->sched()->work_queue->work_handlers()->for
> > > example CPL_PASS_ACCEPT_REQ->pass_accept_req() - it just parses incoming
> > > skb and sets port/addr/route and other fields to be used as a base for rdma
> > > connection. What if it just a usual network packet from kernelspace or
> > > userspace with the same payload as should be sent by remote rdma system?
> > >
> >
> > That skb isn't a network packet. Its a CPL_PASS_ACCEPT_REQ message (see
> > struct cpl_pass_accept_req in the Ethernet driver t3_cpl.h). If the
> > RDMA driver hadn't registered to listen on that addr/port, it would
> > never get this skb. Once a connection is established, the MPA messages
> > (and any TCP payload data) is delivered to the RDMA driver in the form
> > of skb's containing struct cpl_rx_data. So these skbs aren't just TCP
> > packets at all. They either control messages or TCP payload. Either way
> > they are encapsulated in CPL message structures.
> >
> > Does this make sense?
>
> Almost - except the case about where those skbs are coming from?
> It looks like they are obtained from network, since it is ethernet
> driver, and if they match some set of rules, they are considered as valid
> MPA negotiation protocol.

They come from the Ethernet driver, but that driver manages multiple HW
queues and these packets come from an offload queue, not the NIC queue.
So the HW demultiplexes.

Perhaps Divy or Felix from Chelsio can expand on how the Ethernet driver
manages this?

>
> If it is correct, it means that any packet in the network can be
> potentially 'stolen' by rdma hardware, although it was part of the usual
> dataflow.
> If that packets are not from ethernet network, but from different
> low-level, then there is a question (besides why this driver is called
> ethernet if it manages different hardware) about how connection over
> that different media is being setup and since packets contain perfectly
> valid IP addresses and ports.

The HW has different queues for offload vs native Ethernet frames. I'm
not an expert on the Ethernet driver, so you'll have to consult that
code and ask questions of Divy and/or Felix.

> And, btw, not related question - does postponing the whole skb multiplexing
> to work queue result in lower latency and/or higher speed?
> Since there are a lot of tricks introduced to minimize gap between
> interrupt/napi polling and protocol processing, so such huge postponing
> with the whole context switch looks strange.
>

Neither. The work queue makes the RDMA driver's life easier because it
has context to allocate skbs, for instance. Note all the work queue
stuff is done _only_ for RDMA connection setup and teardown. Once the
connection is in RDMA mode, there's no work queues at all for IO, and CQ
notifications happen in interrupt context. RDMA operations are
submitted to the hardware via iwch_post_send(). Completion notification
is done in the interrupt context via iwch_ev_dispatch(). And completion
entries reaped by the consumer application via iwch_poll_cq().


Steve.

2006-12-05 18:13:37

by Evgeniy Polyakov

[permalink] [raw]
Subject: Re: [PATCH v2 04/13] Connection Manager

On Tue, Dec 05, 2006 at 11:51:40AM -0600, Steve Wise ([email protected]) wrote:
> > Almost - except the case about where those skbs are coming from?
> > It looks like they are obtained from network, since it is ethernet
> > driver, and if they match some set of rules, they are considered as valid
> > MPA negotiation protocol.
>
> They come from the Ethernet driver, but that driver manages multiple HW
> queues and these packets come from an offload queue, not the NIC queue.
> So the HW demultiplexes.

Ok, thanks for explaination.

--
Evgeniy Polyakov

2006-12-06 01:45:53

by Michael Krause

[permalink] [raw]
Subject: Re: [openib-general] [PATCH v2 04/13] Connection Manager


If you require more details on how this all works - it was fully explored
in the IETF RDDP workgroup - may I suggest a reading of the RDMA Security
Considerations draft which goes through many of the issues on how one
relates to a host stack. This complements the MPA spec and supports much
of what Steve has already responded to during this string of e-mails. We
took a great deal of time and debate to insure this can work efficiently
and without confusion in terms of who owns what and when.

Mike



At 10:09 AM 12/5/2006, Evgeniy Polyakov wrote:
>On Tue, Dec 05, 2006 at 11:51:40AM -0600, Steve Wise
>([email protected]) wrote:
> > > Almost - except the case about where those skbs are coming from?
> > > It looks like they are obtained from network, since it is ethernet
> > > driver, and if they match some set of rules, they are considered as
> valid
> > > MPA negotiation protocol.
> >
> > They come from the Ethernet driver, but that driver manages multiple HW
> > queues and these packets come from an offload queue, not the NIC queue.
> > So the HW demultiplexes.
>
>Ok, thanks for explaination.
>
>--
> Evgeniy Polyakov
>
>_______________________________________________
>openib-general mailing list
>[email protected]
>http://openib.org/mailman/listinfo/openib-general
>
>To unsubscribe, please visit
>http://openib.org/mailman/listinfo/openib-general