2006-05-12 23:44:34

by Bryan O'Sullivan

Subject: [PATCH 0 of 53] ipath driver updates for 2.6.17-rc4

Hi, Roland -

Here is a series of patches to bring the ipath driver up to date. You
may already have two of them (I've included them just in case); the
others should all be new.

They apply on top of Linus's current -git.

Cheers,

<b


2006-05-12 23:44:35

by Bryan O'Sullivan

Subject: [PATCH 4 of 53] ipath - cap number of PDs that can be allocated

Put an arbitrary cap on the maximum number of PDs that can be allocated
for a device. This is arbitrary because the number we support
is constrained only by system memory and what kmalloc can give us.
Nevertheless, if we don't have a limit, some third-party OpenIB stress
tests fail. The limit can be changed on the fly using a module parameter.
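
As context for the stress-test remark, here is a rough sketch (not from
this series; exhaust_pds and its logic are hypothetical) of the kind of
loop such a test runs against the verbs API of this era, where
ib_alloc_pd() takes only the device pointer. It allocates PDs until
failure and expects the first failure exactly at the advertised max_pd:

#include <linux/err.h>
#include <linux/slab.h>
#include <rdma/ib_verbs.h>

static int exhaust_pds(struct ib_device *ibdev, int max_pd)
{
	struct ib_pd **pds;
	int i, n = 0, ret = 0;

	pds = kcalloc(max_pd + 1, sizeof(*pds), GFP_KERNEL);
	if (!pds)
		return -ENOMEM;
	for (i = 0; i <= max_pd; i++) {
		pds[i] = ib_alloc_pd(ibdev);
		if (IS_ERR(pds[i]))
			break;		/* expected once i == max_pd */
		n++;
	}
	if (i > max_pd)
		ret = -EINVAL;		/* never failed: cap not enforced */
	while (n--)
		ib_dealloc_pd(pds[n]);
	kfree(pds);
	return ret;
}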

Signed-off-by: Bryan O'Sullivan <[email protected]>

diff -r 5d5e1e641b16 -r 300f0aa6f034 drivers/infiniband/hw/ipath/ipath_verbs.c
--- a/drivers/infiniband/hw/ipath/ipath_verbs.c Fri May 12 15:55:27 2006 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_verbs.c Fri May 12 15:55:27 2006 -0700
@@ -54,6 +54,11 @@ unsigned int ib_ipath_debug; /* debug ma
unsigned int ib_ipath_debug; /* debug mask */
module_param_named(debug, ib_ipath_debug, uint, S_IWUSR | S_IRUGO);
MODULE_PARM_DESC(debug, "Verbs debug mask");
+
+static unsigned int ib_ipath_max_pds = 0xFFFF;
+module_param_named(max_pds, ib_ipath_max_pds, uint, S_IWUSR | S_IRUGO);
+MODULE_PARM_DESC(max_pds,
+ "Maximum number of protection domains to support");

MODULE_LICENSE("GPL");
MODULE_AUTHOR("PathScale <[email protected]>");
@@ -589,7 +594,7 @@ static int ipath_query_device(struct ib_
props->max_cq = 0xffff;
props->max_cqe = 0xffff;
props->max_mr = dev->lk_table.max;
- props->max_pd = 0xffff;
+ props->max_pd = ib_ipath_max_pds;
props->max_qp_rd_atom = 1;
props->max_qp_init_rd_atom = 1;
/* props->max_res_rd_atom */
@@ -743,8 +748,23 @@ static struct ib_pd *ipath_alloc_pd(stru
struct ib_ucontext *context,
struct ib_udata *udata)
{
+ struct ipath_ibdev *dev = to_idev(ibdev);
struct ipath_pd *pd;
struct ib_pd *ret;
+
+ /*
+ * This is actually totally arbitrary. Some correctness tests
+ * assume there's a maximum number of PDs that can be allocated.
+ * We don't actually have this limit, but we fail the test if
+ * we allow allocations of more than we report for this value.
+ */
+
+ if (dev->n_pds_allocated == ib_ipath_max_pds) {
+ ret = ERR_PTR(-ENOMEM);
+ goto bail;
+ }
+
+ dev->n_pds_allocated++;

pd = kmalloc(sizeof *pd, GFP_KERNEL);
if (!pd) {
@@ -764,6 +784,9 @@ static int ipath_dealloc_pd(struct ib_pd
static int ipath_dealloc_pd(struct ib_pd *ibpd)
{
struct ipath_pd *pd = to_ipd(ibpd);
+ struct ipath_ibdev *dev = to_idev(ibpd->device);
+
+ dev->n_pds_allocated--;

kfree(pd);

diff -r 5d5e1e641b16 -r 300f0aa6f034 drivers/infiniband/hw/ipath/ipath_verbs.h
--- a/drivers/infiniband/hw/ipath/ipath_verbs.h Fri May 12 15:55:27 2006 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_verbs.h Fri May 12 15:55:27 2006 -0700
@@ -431,6 +431,7 @@ struct ipath_ibdev {
__be64 sys_image_guid; /* in network order */
__be64 gid_prefix; /* in network order */
__be64 mkey;
+ u32 n_pds_allocated; /* number of PDs allocated for device */
u64 ipath_sword; /* total dwords sent (sample result) */
u64 ipath_rword; /* total dwords received (sample result) */
u64 ipath_spkts; /* total packets sent (sample result) */

2006-05-12 23:44:35

by Bryan O'Sullivan

Subject: [PATCH 3 of 53] ipath - report max MR and QP sizes based on table sizes

Report max MR based on the lkey table size.
Report max QP based on the QP table size.

Signed-off-by: Bryan O'Sullivan <[email protected]>

diff -r 3ab7a7b10bf2 -r 5d5e1e641b16 drivers/infiniband/hw/ipath/ipath_verbs.c
--- a/drivers/infiniband/hw/ipath/ipath_verbs.c Fri May 12 15:55:27 2006 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_verbs.c Fri May 12 15:55:27 2006 -0700
@@ -583,12 +583,12 @@ static int ipath_query_device(struct ib_
props->sys_image_guid = dev->sys_image_guid;

props->max_mr_size = ~0ull;
- props->max_qp = 0xffff;
+ props->max_qp = dev->qp_table.max;
props->max_qp_wr = 0xffff;
props->max_sge = 255;
props->max_cq = 0xffff;
props->max_cqe = 0xffff;
- props->max_mr = 0xffff;
+ props->max_mr = dev->lk_table.max;
props->max_pd = 0xffff;
props->max_qp_rd_atom = 1;
props->max_qp_init_rd_atom = 1;

2006-05-12 23:45:12

by Bryan O'Sullivan

Subject: [PATCH 17 of 53] ipath - fail properly if GID missing

Return -EINVAL if we can't find a multicast GID.

Signed-off-by: Bryan O'Sullivan <[email protected]>

diff -r 176d1f0c26a3 -r c5f3731224bb drivers/infiniband/hw/ipath/ipath_verbs_mcast.c
--- a/drivers/infiniband/hw/ipath/ipath_verbs_mcast.c Fri May 12 15:55:28 2006 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_verbs_mcast.c Fri May 12 15:55:28 2006 -0700
@@ -272,7 +272,7 @@ int ipath_multicast_detach(struct ib_qp
while (1) {
if (n == NULL) {
spin_unlock_irqrestore(&mcast_lock, flags);
- ret = 0;
+ ret = -EINVAL;
goto bail;
}

2006-05-12 23:45:16

by Bryan O'Sullivan

Subject: [PATCH 46 of 53] ipath - enable GPIO interrupt on HT-460
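
In brief (summarized from the diff): rename ipath_get_guid() to
ipath_get_eeprom_info() and call it from the chip-specific early-init
routines, and on later-production HT-460 boards, recognized by serial
numbers starting with 128 rather than 112, enable IPATH_GPIO_INTR and
clear IPATH_POLL_RX_INTR, as on the HT-465.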

Signed-off-by: Bryan O'Sullivan <[email protected]>

diff -r b41e576e5202 -r 04c86dd11b27 drivers/infiniband/hw/ipath/ipath_eeprom.c
--- a/drivers/infiniband/hw/ipath/ipath_eeprom.c Fri May 12 15:55:29 2006 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_eeprom.c Fri May 12 15:55:29 2006 -0700
@@ -505,11 +505,10 @@ static u8 flash_csum(struct ipath_flash
* ipath_get_guid - get the GUID from the i2c device
* @dd: the infinipath device
*
- * When we add the multi-chip support, we will probably have to add
- * the ability to use the number of guids field, and get the guid from
- * the first chip's flash, to use for all of them.
- */
-void ipath_get_guid(struct ipath_devdata *dd)
+ * We have the capability to use the ipath_nguid field, and get
+ * the guid from the first chip's flash, to use for all of them.
+ */
+void ipath_get_eeprom_info(struct ipath_devdata *dd)
{
void *buf;
struct ipath_flash *ifp;
diff -r b41e576e5202 -r 04c86dd11b27 drivers/infiniband/hw/ipath/ipath_ht400.c
--- a/drivers/infiniband/hw/ipath/ipath_ht400.c Fri May 12 15:55:29 2006 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_ht400.c Fri May 12 15:55:29 2006 -0700
@@ -607,7 +607,12 @@ static int ipath_ht_boardname(struct ipa
case 4: /* Ponderosa is one of the bringup boards */
n = "Ponderosa";
break;
- case 5: /* HT-460 original production board */
+ case 5:
+ /*
+ * HT-460 original production board; two production levels, with
+ * different serial number ranges. See ipath_ht_early_init() for
+ * case where we enable IPATH_GPIO_INTR for later serial # range.
+ */
n = "InfiniPath_HT-460";
break;
case 6:
@@ -1520,6 +1525,18 @@ static int ipath_ht_early_init(struct ip
*/
ipath_write_kreg(dd, dd->ipath_kregs->kr_sendctrl,
INFINIPATH_S_ABORT);
+
+ ipath_get_eeprom_info(dd);
+ if(dd->ipath_boardrev == 5 && dd->ipath_serial[0] == '1' &&
+ dd->ipath_serial[1] == '2' && dd->ipath_serial[2] == '8') {
+ /*
+ * Later production HT-460 has same changes as HT-465, so
+ * can use GPIO interrupts. They have serial #'s starting
+ * with 128, rather than 112.
+ */
+ dd->ipath_flags |= IPATH_GPIO_INTR;
+ dd->ipath_flags &= ~IPATH_POLL_RX_INTR;
+ }
return 0;
}

diff -r b41e576e5202 -r 04c86dd11b27 drivers/infiniband/hw/ipath/ipath_init_chip.c
--- a/drivers/infiniband/hw/ipath/ipath_init_chip.c Fri May 12 15:55:29 2006 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_init_chip.c Fri May 12 15:55:29 2006 -0700
@@ -857,7 +857,6 @@ int ipath_init_chip(struct ipath_devdata

done:
if (!ret) {
- ipath_get_guid(dd);
*dd->ipath_statusp |= IPATH_STATUS_CHIP_PRESENT;
if (!dd->ipath_f_intrsetup(dd)) {
/* now we can enable all interrupts from the chip */
diff -r b41e576e5202 -r 04c86dd11b27 drivers/infiniband/hw/ipath/ipath_kernel.h
--- a/drivers/infiniband/hw/ipath/ipath_kernel.h Fri May 12 15:55:29 2006 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_kernel.h Fri May 12 15:55:29 2006 -0700
@@ -646,7 +646,7 @@ void ipath_init_pe800_funcs(struct ipath
void ipath_init_pe800_funcs(struct ipath_devdata *);
/* init HT-400-specific func */
void ipath_init_ht400_funcs(struct ipath_devdata *);
-void ipath_get_guid(struct ipath_devdata *);
+void ipath_get_eeprom_info(struct ipath_devdata *);
u64 ipath_snap_cntr(struct ipath_devdata *, ipath_creg);

/*
diff -r b41e576e5202 -r 04c86dd11b27 drivers/infiniband/hw/ipath/ipath_pe800.c
--- a/drivers/infiniband/hw/ipath/ipath_pe800.c Fri May 12 15:55:29 2006 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_pe800.c Fri May 12 15:55:29 2006 -0700
@@ -1180,6 +1180,8 @@ static int ipath_pe_early_init(struct ip
*/
dd->ipath_rhdrhead_intr_off = 1ULL<<32;

+ ipath_get_eeprom_info(dd);
+
return 0;
}

2006-05-12 23:46:13

by Bryan O'Sullivan

Subject: [PATCH 41 of 53] ipath - disable interrupts while holding spinlock in RWQE get
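
In brief (summarized from the diff): ipath_get_rwqe() drops rq->lock
around the SRQ limit event callback, but the lock is taken with
spin_lock_irqsave(), so it must be released with
spin_unlock_irqrestore() and re-acquired the same way before the code
touches protected state again. A minimal sketch of that rule, with
hypothetical names:

#include <linux/spinlock.h>

static void notify_under_lock(spinlock_t *lock, void (*cb)(void *),
			      void *arg)
{
	unsigned long flags;

	spin_lock_irqsave(lock, flags);
	/* ... inspect protected state, decide to fire the callback ... */
	spin_unlock_irqrestore(lock, flags);	/* not plain spin_unlock() */
	cb(arg);				/* callback runs unlocked */
	spin_lock_irqsave(lock, flags);		/* retake before more state */
	/* ... finish under the lock ... */
	spin_unlock_irqrestore(lock, flags);
}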

Signed-off-by: Bryan O'Sullivan <[email protected]>

diff -r 160a111381ae -r 83f1832c6015 drivers/infiniband/hw/ipath/ipath_ruc.c
--- a/drivers/infiniband/hw/ipath/ipath_ruc.c Fri May 12 15:55:29 2006 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_ruc.c Fri May 12 15:55:29 2006 -0700
@@ -171,12 +171,13 @@ int ipath_get_rwqe(struct ipath_qp *qp,
n = rq->head - rq->tail;
if (n < srq->limit) {
srq->limit = 0;
- spin_unlock(&rq->lock);
+ spin_unlock_irqrestore(&rq->lock, flags);
ev.device = qp->ibqp.device;
ev.element.srq = qp->ibqp.srq;
ev.event = IB_EVENT_SRQ_LIMIT_REACHED;
srq->ibsrq.event_handler(&ev,
srq->ibsrq.srq_context);
+ spin_lock_irqsave(&rq->lock, flags);
}
}
done:

2006-05-12 23:46:12

by Bryan O'Sullivan

Subject: [PATCH 36 of 53] ipath - count local link integrity errors
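
In brief (summarized from the diff): receive ICRC/VCRC errors charge a
per-device counter, each good verbs packet drains it by one, and a
burst that exceeds the PhyErrThreshold field of ibcctrl is recorded as
one local link integrity error; the PMA then reports the count,
clamped to its 4-bit field, relative to the usual zeroing baseline. A
minimal sketch of the counting scheme, with hypothetical names:

#include <linux/types.h>

struct lli_state {
	u32 counter;	/* recent CRC errors, drained by good packets */
	u32 errors;	/* local link integrity errors reported to PMA */
};

static void lli_account(struct lli_state *s, int crc_err, u32 threshold)
{
	if (crc_err) {
		if (++s->counter > threshold) {
			s->counter = 0;
			s->errors++;
		}
	} else if (s->counter) {
		s->counter--;
	}
}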

Signed-off-by: Bryan O'Sullivan <[email protected]>

diff -r e29625bd9050 -r ec1934faf5d1 drivers/infiniband/hw/ipath/ipath_driver.c
--- a/drivers/infiniband/hw/ipath/ipath_driver.c Fri May 12 15:55:28 2006 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_driver.c Fri May 12 15:55:28 2006 -0700
@@ -446,6 +446,8 @@ static int __devinit ipath_init_one(stru
* by ipath_setup_htconfig.
*/
dd->ipath_flags = 0;
+ dd->ipath_lli_counter = 0;
+ dd->ipath_lli_errors = 0;

if (dd->ipath_f_bus(dd, pdev))
ipath_dev_err(dd, "Failed to setup config space; "
@@ -927,6 +929,18 @@ void ipath_kreceive(struct ipath_devdata
"tlen=%x opcode=%x egridx=%x: %s\n",
eflags, l, etype, tlen, bthbytes[0],
ips_get_index((__le32 *) rc), emsg);
+ /* Count local link integrity errors. */
+ if (eflags & (INFINIPATH_RHF_H_ICRCERR |
+ INFINIPATH_RHF_H_VCRCERR)) {
+ u8 n = (dd->ipath_ibcctrl >>
+ INFINIPATH_IBCC_PHYERRTHRESHOLD_SHIFT) &
+ INFINIPATH_IBCC_PHYERRTHRESHOLD_MASK;
+
+ if (++dd->ipath_lli_counter > n) {
+ dd->ipath_lli_counter = 0;
+ dd->ipath_lli_errors++;
+ }
+ }
} else if (etype == RCVHQ_RCV_TYPE_NON_KD) {
int ret = __ipath_verbs_rcv(dd, rc + 1,
ebuf, tlen);
@@ -934,6 +948,8 @@ void ipath_kreceive(struct ipath_devdata
ipath_cdbg(VERBOSE,
"received IB packet, "
"not SMA (QP=%x)\n", qp);
+ if (dd->ipath_lli_counter)
+ dd->ipath_lli_counter--;
} else if (etype == RCVHQ_RCV_TYPE_EAGER) {
if (qp == IPATH_KD_QP &&
bthbytes[0] == ipath_layer_rcv_opcode &&
@@ -1864,19 +1880,19 @@ static void __exit infinipath_cleanup(vo
} else
ipath_dbg("irq is 0, not doing free_irq "
"for unit %u\n", dd->ipath_unit);
+
+ /*
+ * we check for NULL here, because it's outside
+ * the kregbase check, and we need to call it
+ * after the free_irq. Thus it's possible that
+ * the function pointers were never initialized.
+ */
+ if (dd->ipath_f_cleanup)
+ /* clean up chip-specific stuff */
+ dd->ipath_f_cleanup(dd);
+
dd->pcidev = NULL;
}
-
- /*
- * we check for NULL here, because it's outside the kregbase
- * check, and we need to call it after the free_irq. Thus
- * it's possible that the function pointers were never
- * initialized.
- */
- if (dd->ipath_f_cleanup)
- /* clean up chip-specific stuff */
- dd->ipath_f_cleanup(dd);
-
spin_lock_irqsave(&ipath_devs_lock, flags);
}

diff -r e29625bd9050 -r ec1934faf5d1 drivers/infiniband/hw/ipath/ipath_intr.c
--- a/drivers/infiniband/hw/ipath/ipath_intr.c Fri May 12 15:55:28 2006 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_intr.c Fri May 12 15:55:28 2006 -0700
@@ -261,6 +261,7 @@ static void handle_e_ibstatuschanged(str
| IPATH_LINKACTIVE |
IPATH_LINKARMED);
*dd->ipath_statusp &= ~IPATH_STATUS_IB_READY;
+ dd->ipath_lli_counter = 0;
if (!noprint) {
if (((dd->ipath_lastibcstat >>
INFINIPATH_IBCS_LINKSTATE_SHIFT) &
diff -r e29625bd9050 -r ec1934faf5d1 drivers/infiniband/hw/ipath/ipath_kernel.h
--- a/drivers/infiniband/hw/ipath/ipath_kernel.h Fri May 12 15:55:28 2006 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_kernel.h Fri May 12 15:55:28 2006 -0700
@@ -509,6 +509,11 @@ struct ipath_devdata {
u8 ipath_pci_cacheline;
/* LID mask control */
u8 ipath_lmc;
+
+ /* local link integrity counter */
+ u32 ipath_lli_counter;
+ /* local link integrity errors */
+ u32 ipath_lli_errors;
};


diff -r e29625bd9050 -r ec1934faf5d1 drivers/infiniband/hw/ipath/ipath_layer.c
--- a/drivers/infiniband/hw/ipath/ipath_layer.c Fri May 12 15:55:28 2006 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_layer.c Fri May 12 15:55:28 2006 -0700
@@ -1013,6 +1013,11 @@ int ipath_layer_get_counters(struct ipat
ipath_snap_cntr(dd, dd->ipath_cregs->cr_ibsymbolerrcnt);
cntrs->link_error_recovery_counter =
ipath_snap_cntr(dd, dd->ipath_cregs->cr_iblinkerrrecovcnt);
+ /*
+ * The link downed counter counts when the other side downs the
+ * connection. We add in the number of times we downed the link
+ * due to local link integrity errors to compensate.
+ */
cntrs->link_downed_counter =
ipath_snap_cntr(dd, dd->ipath_cregs->cr_iblinkdowncnt);
cntrs->port_rcv_errors =
@@ -1037,6 +1042,8 @@ int ipath_layer_get_counters(struct ipat
ipath_snap_cntr(dd, dd->ipath_cregs->cr_pktsendcnt);
cntrs->port_rcv_packets =
ipath_snap_cntr(dd, dd->ipath_cregs->cr_pktrcvcnt);
+ cntrs->local_link_integrity_errors = dd->ipath_lli_errors;
+ cntrs->excessive_buffer_overrun_errors = 0; /* XXX */

ret = 0;

diff -r e29625bd9050 -r ec1934faf5d1 drivers/infiniband/hw/ipath/ipath_layer.h
--- a/drivers/infiniband/hw/ipath/ipath_layer.h Fri May 12 15:55:28 2006 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_layer.h Fri May 12 15:55:28 2006 -0700
@@ -54,6 +54,8 @@ struct ipath_layer_counters {
u64 port_rcv_data;
u64 port_xmit_packets;
u64 port_rcv_packets;
+ u32 local_link_integrity_errors;
+ u32 excessive_buffer_overrun_errors;
};

/*
diff -r e29625bd9050 -r ec1934faf5d1 drivers/infiniband/hw/ipath/ipath_mad.c
--- a/drivers/infiniband/hw/ipath/ipath_mad.c Fri May 12 15:55:28 2006 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_mad.c Fri May 12 15:55:28 2006 -0700
@@ -646,6 +646,8 @@ struct ib_pma_portcounters {
#define IB_PMA_SEL_PORT_RCV_ERRORS __constant_htons(0x0008)
#define IB_PMA_SEL_PORT_RCV_REMPHYS_ERRORS __constant_htons(0x0010)
#define IB_PMA_SEL_PORT_XMIT_DISCARDS __constant_htons(0x0040)
+#define IB_PMA_SEL_LOCAL_LINK_INTEGRITY_ERRORS __constant_htons(0x0200)
+#define IB_PMA_SEL_EXCESSIVE_BUFFER_OVERRUNS __constant_htons(0x0400)
#define IB_PMA_SEL_PORT_VL15_DROPPED __constant_htons(0x0800)
#define IB_PMA_SEL_PORT_XMIT_DATA __constant_htons(0x1000)
#define IB_PMA_SEL_PORT_RCV_DATA __constant_htons(0x2000)
@@ -893,6 +895,10 @@ static int recv_pma_get_portcounters(str
cntrs.port_rcv_data -= dev->n_port_rcv_data;
cntrs.port_xmit_packets -= dev->n_port_xmit_packets;
cntrs.port_rcv_packets -= dev->n_port_rcv_packets;
+ cntrs.local_link_integrity_errors -=
+ dev->z_local_link_integrity_errors;
+ cntrs.excessive_buffer_overrun_errors -=
+ dev->z_excessive_buffer_overrun_errors;

memset(pmp->data, 0, sizeof(pmp->data));

@@ -930,6 +936,12 @@ static int recv_pma_get_portcounters(str
else
p->port_xmit_discards =
cpu_to_be16((u16)cntrs.port_xmit_discards);
+ if (cntrs.local_link_integrity_errors > 0xFUL)
+ cntrs.local_link_integrity_errors = 0xFUL;
+ if (cntrs.excessive_buffer_overrun_errors > 0xFUL)
+ cntrs.excessive_buffer_overrun_errors = 0xFUL;
+ p->lli_ebor_errors = (cntrs.local_link_integrity_errors << 4) |
+ cntrs.excessive_buffer_overrun_errors;
if (dev->n_vl15_dropped > 0xFFFFUL)
p->vl15_dropped = __constant_cpu_to_be16(0xFFFF);
else
@@ -1028,6 +1040,14 @@ static int recv_pma_set_portcounters(str
if (p->counter_select & IB_PMA_SEL_PORT_XMIT_DISCARDS)
dev->n_port_xmit_discards = cntrs.port_xmit_discards;

+ if (p->counter_select & IB_PMA_SEL_LOCAL_LINK_INTEGRITY_ERRORS)
+ dev->z_local_link_integrity_errors =
+ cntrs.local_link_integrity_errors;
+
+ if (p->counter_select & IB_PMA_SEL_EXCESSIVE_BUFFER_OVERRUNS)
+ dev->z_excessive_buffer_overrun_errors =
+ cntrs.excessive_buffer_overrun_errors;
+
if (p->counter_select & IB_PMA_SEL_PORT_VL15_DROPPED)
dev->n_vl15_dropped = 0;

diff -r e29625bd9050 -r ec1934faf5d1 drivers/infiniband/hw/ipath/ipath_verbs.c
--- a/drivers/infiniband/hw/ipath/ipath_verbs.c Fri May 12 15:55:28 2006 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_verbs.c Fri May 12 15:55:28 2006 -0700
@@ -1046,6 +1046,10 @@ static void *ipath_register_ib_device(in
idev->n_port_rcv_data = cntrs.port_rcv_data;
idev->n_port_xmit_packets = cntrs.port_xmit_packets;
idev->n_port_rcv_packets = cntrs.port_rcv_packets;
+ idev->z_local_link_integrity_errors =
+ cntrs.local_link_integrity_errors;
+ idev->z_excessive_buffer_overrun_errors =
+ cntrs.excessive_buffer_overrun_errors;

/*
* The system image GUID is supposed to be the same for all
diff -r e29625bd9050 -r ec1934faf5d1 drivers/infiniband/hw/ipath/ipath_verbs.h
--- a/drivers/infiniband/hw/ipath/ipath_verbs.h Fri May 12 15:55:28 2006 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_verbs.h Fri May 12 15:55:28 2006 -0700
@@ -459,6 +459,8 @@ struct ipath_ibdev {
u64 n_port_xmit_packets; /* starting count for PMA */
u64 n_port_rcv_packets; /* starting count for PMA */
u32 n_pkey_violations; /* starting count for PMA */
+ u32 z_local_link_integrity_errors; /* starting count for PMA */
+ u32 z_excessive_buffer_overrun_errors; /* starting count for PMA */
u32 n_rc_resends;
u32 n_rc_acks;
u32 n_rc_qacks;

2006-05-12 23:46:12

by Bryan O'Sullivan

Subject: [PATCH 43 of 53] ipath - fix memory leak when creating a QP fails
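
In brief (summarized from the diff): ipath_create_qp() vmalloc's the
send work queue (swq) before anything else, so the later error paths,
the kmalloc of the QP and the vmalloc of the receive queue, must
vfree(swq) before bailing. The usual kernel idiom for keeping such
multi-step allocations leak-free is goto-based unwinding; a minimal
sketch with hypothetical names:

#include <linux/slab.h>
#include <linux/vmalloc.h>

struct foo {
	void *swq;
	void *rwq;
};

static struct foo *alloc_foo(size_t swq_sz, size_t rwq_sz)
{
	struct foo *f;
	void *swq, *rwq;

	swq = vmalloc(swq_sz);
	if (!swq)
		goto fail;
	f = kmalloc(sizeof(*f), GFP_KERNEL);
	if (!f)
		goto fail_swq;
	rwq = vmalloc(rwq_sz);
	if (!rwq)
		goto fail_f;
	f->swq = swq;
	f->rwq = rwq;
	return f;

fail_f:
	kfree(f);
fail_swq:
	vfree(swq);
fail:
	return NULL;
}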

Signed-off-by: Bryan O'Sullivan <[email protected]>

diff -r 0aba84dce506 -r 7634b2f0fc40 drivers/infiniband/hw/ipath/ipath_qp.c
--- a/drivers/infiniband/hw/ipath/ipath_qp.c Fri May 12 15:55:29 2006 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_qp.c Fri May 12 15:55:29 2006 -0700
@@ -680,6 +680,7 @@ struct ib_qp *ipath_create_qp(struct ib_
case IB_QPT_GSI:
qp = kmalloc(sizeof(*qp), GFP_KERNEL);
if (!qp) {
+ vfree(swq);
ret = ERR_PTR(-ENOMEM);
goto bail;
}
@@ -690,6 +691,7 @@ struct ib_qp *ipath_create_qp(struct ib_
qp->r_rq.wq = vmalloc(qp->r_rq.size * sz);
if (!qp->r_rq.wq) {
kfree(qp);
+ vfree(swq);
ret = ERR_PTR(-ENOMEM);
goto bail;
}

2006-05-12 23:45:11

by Bryan O'Sullivan

Subject: [PATCH 9 of 53] ipath - cap number of CQs

Cap the number of CQs that can be created. This is not a real
limitation for us, but the user verbs code expects a finite number;
as with the PD cap, the limit is a module parameter (max_cqs) checked
at CQ creation time and reported by ipath_query_device.

Signed-off-by: Bryan O'Sullivan <[email protected]>

diff -r 1d3e85454b53 -r a89145f4846c drivers/infiniband/hw/ipath/ipath_cq.c
--- a/drivers/infiniband/hw/ipath/ipath_cq.c Fri May 12 15:55:27 2006 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_cq.c Fri May 12 15:55:27 2006 -0700
@@ -157,12 +157,18 @@ struct ib_cq *ipath_create_cq(struct ib_
struct ib_ucontext *context,
struct ib_udata *udata)
{
+ struct ipath_ibdev *dev = to_idev(ibdev);
struct ipath_cq *cq;
struct ib_wc *wc;
struct ib_cq *ret;

if (entries > ib_ipath_max_cqe) {
ret = ERR_PTR(-EINVAL);
+ goto bail;
+ }
+
+ if (dev->n_cqs_allocated == ib_ipath_max_cqs) {
+ ret = ERR_PTR(-ENOMEM);
goto bail;
}

@@ -201,6 +207,8 @@ struct ib_cq *ipath_create_cq(struct ib_

ret = &cq->ibcq;

+ dev->n_cqs_allocated++;
+
bail:
return ret;
}
@@ -215,9 +223,11 @@ bail:
*/
int ipath_destroy_cq(struct ib_cq *ibcq)
{
+ struct ipath_ibdev *dev = to_idev(ibcq->device);
struct ipath_cq *cq = to_icq(ibcq);

tasklet_kill(&cq->comptask);
+ dev->n_cqs_allocated--;
vfree(cq->queue);
kfree(cq);

diff -r 1d3e85454b53 -r a89145f4846c drivers/infiniband/hw/ipath/ipath_verbs.c
--- a/drivers/infiniband/hw/ipath/ipath_verbs.c Fri May 12 15:55:27 2006 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_verbs.c Fri May 12 15:55:27 2006 -0700
@@ -69,6 +69,11 @@ module_param_named(max_cqe, ib_ipath_max
module_param_named(max_cqe, ib_ipath_max_cqe, uint, S_IWUSR | S_IRUGO);
MODULE_PARM_DESC(max_cqe,
"Maximum number of completion queue entries to support");
+
+unsigned int ib_ipath_max_cqs = 0xFFFF;
+module_param_named(max_cqs, ib_ipath_max_cqs, uint, S_IWUSR | S_IRUGO);
+MODULE_PARM_DESC(max_cqs,
+ "Maximum number of completion queues to support");

MODULE_LICENSE("GPL");
MODULE_AUTHOR("PathScale <[email protected]>");
@@ -601,7 +606,7 @@ static int ipath_query_device(struct ib_
props->max_qp = dev->qp_table.max;
props->max_qp_wr = 0xffff;
props->max_sge = 255;
- props->max_cq = 0xffff;
+ props->max_cq = ib_ipath_max_cqs;
props->max_ah = ib_ipath_max_ahs;
props->max_cqe = ib_ipath_max_cqe;
props->max_mr = dev->lk_table.max;
diff -r 1d3e85454b53 -r a89145f4846c drivers/infiniband/hw/ipath/ipath_verbs.h
--- a/drivers/infiniband/hw/ipath/ipath_verbs.h Fri May 12 15:55:27 2006 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_verbs.h Fri May 12 15:55:27 2006 -0700
@@ -433,6 +433,7 @@ struct ipath_ibdev {
__be64 mkey;
u32 n_pds_allocated; /* number of PDs allocated for device */
u32 n_ahs_allocated; /* number of AHs allocated for device */
+ u32 n_cqs_allocated; /* number of CQs allocated for device */
u64 ipath_sword; /* total dwords sent (sample result) */
u64 ipath_rword; /* total dwords received (sample result) */
u64 ipath_spkts; /* total packets sent (sample result) */
@@ -692,6 +693,8 @@ extern unsigned int ib_ipath_lkey_table_

extern unsigned int ib_ipath_max_cqe;

+extern unsigned int ib_ipath_max_cqs;
+
extern const u32 ib_ipath_rnr_table[];

#endif /* IPATH_VERBS_H */

2006-05-12 23:45:54

by Bryan O'Sullivan

Subject: [PATCH 23 of 53] ipath - [TRIVIAL] typo fixes

A few typo fixes.

Signed-off-by: Bryan O'Sullivan <[email protected]>

diff -r 1887e7b3e2a3 -r 8b882bb46a32 drivers/infiniband/hw/ipath/ipath_intr.c
--- a/drivers/infiniband/hw/ipath/ipath_intr.c Fri May 12 15:55:28 2006 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_intr.c Fri May 12 15:55:28 2006 -0700
@@ -753,7 +753,7 @@ irqreturn_t ipath_intr(int irq, void *da
}

/*
- * We try to avoid readint the interrupt status register, since
+ * We try to avoid reading the interrupt status register, since
* that's a PIO read, and stalls the processor for up to about
* ~0.25 usec. The idea is that if we processed a port0 packet,
* we blindly clear the port 0 receive interrupt bits, and nothing
diff -r 1887e7b3e2a3 -r 8b882bb46a32 drivers/infiniband/hw/ipath/ipath_layer.c
--- a/drivers/infiniband/hw/ipath/ipath_layer.c Fri May 12 15:55:28 2006 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_layer.c Fri May 12 15:55:28 2006 -0700
@@ -882,7 +882,7 @@ static void copy_io(u32 __iomem *piobuf,
/**
* ipath_verbs_send - send a packet from the verbs layer
* @dd: the infinipath device
- * @hdrwords: the number of works in the header
+ * @hdrwords: the number of words in the header
* @hdr: the packet header
* @len: the length of the packet in bytes
* @ss: the SGE to send

2006-05-12 23:45:53

by Bryan O'Sullivan

Subject: [PATCH 35 of 53] ipath - some interrelated stability and cleanliness fixes

Made the in-memory rcvhdrq tail update live in dma_alloc'ed memory,
not random user or special kernel memory (needed for powerpc; also
"just the right thing to do").

Some cleanups to make unexpected link transitions less likely to produce
complaints about packet errors, and also to not leave SMA packets stuck
and unable to go out.

Call dma_free_coherent without ipath_mutex held.

A few other random debug and comment cleanups.

Always init rcvhdrq head/tail registers to 0, to avoid race conditions
(should have been that way some time ago).
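
The point of the dma_alloc'ed tail copy is that the driver, and user
processes via the readonly mmap, can poll the chip-updated tail with a
plain cached load instead of a PIO register read. A minimal sketch of
the read side, mirroring the volatile __le64 * type the patch uses:

#include <linux/types.h>
#include <asm/byteorder.h>

/* poll the in-memory tail the chip DMAs to us; no bus read needed */
static u32 read_hdrq_tail(volatile __le64 *tailptr)
{
	return (u32) le64_to_cpu(*tailptr);
}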

Signed-off-by: Bryan O'Sullivan <[email protected]>

diff -r 09077b2f476f -r e29625bd9050 drivers/infiniband/hw/ipath/ipath_common.h
--- a/drivers/infiniband/hw/ipath/ipath_common.h Fri May 12 15:55:28 2006 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_common.h Fri May 12 15:55:28 2006 -0700
@@ -307,6 +307,9 @@ struct ipath_base_info {
__u32 spi_rcv_egrchunksize;
/* total size of mmap to cover full rcvegrbuffers */
__u32 spi_rcv_egrbuftotlen;
+ __u32 spi_filler_for_align;
+ /* address of readonly memory copy of the rcvhdrq tail register. */
+ __u64 spi_rcvhdr_tailaddr;
} __attribute__ ((aligned(8)));


@@ -376,13 +379,7 @@ struct ipath_user_info {
*/
__u32 spu_rcvhdrsize;

- /*
- * cache line aligned (64 byte) user address to
- * which the rcvhdrtail register will be written by infinipath
- * whenever it changes, so that no chip registers are read in
- * the performance path.
- */
- __u64 spu_rcvhdraddr;
+ __u64 spu_unused; /* kept for compatible layout */

/*
* address of struct base_info to write to
diff -r 09077b2f476f -r e29625bd9050 drivers/infiniband/hw/ipath/ipath_driver.c
--- a/drivers/infiniband/hw/ipath/ipath_driver.c Fri May 12 15:55:28 2006 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_driver.c Fri May 12 15:55:28 2006 -0700
@@ -131,14 +131,6 @@ static struct pci_driver ipath_driver =
.id_table = ipath_pci_tbl,
};

-/*
- * This is where port 0's rcvhdrtail register is written back; we also
- * want nothing else sharing the cache line, so make it a cache line
- * in size. Used for all units.
- */
-volatile __le64 *ipath_port0_rcvhdrtail;
-dma_addr_t ipath_port0_rcvhdrtail_dma;
-static int port0_rcvhdrtail_refs;

static inline void read_bars(struct ipath_devdata *dd, struct pci_dev *dev,
u32 *bar0, u32 *bar1)
@@ -171,14 +163,13 @@ static void ipath_free_devdata(struct pc
list_del(&dd->ipath_list);
spin_unlock_irqrestore(&ipath_devs_lock, flags);
}
- dma_free_coherent(&pdev->dev, sizeof(*dd), dd, dd->ipath_dma_addr);
+ vfree(dd);
}

static struct ipath_devdata *ipath_alloc_devdata(struct pci_dev *pdev)
{
unsigned long flags;
struct ipath_devdata *dd;
- dma_addr_t dma_addr;
int ret;

if (!idr_pre_get(&unit_table, GFP_KERNEL)) {
@@ -186,15 +177,13 @@ static struct ipath_devdata *ipath_alloc
goto bail;
}

- dd = dma_alloc_coherent(&pdev->dev, sizeof(*dd), &dma_addr,
- GFP_KERNEL);
-
+ dd = vmalloc(sizeof(*dd));
if (!dd) {
dd = ERR_PTR(-ENOMEM);
goto bail;
}
-
- dd->ipath_dma_addr = dma_addr;
+ memset(dd, 0, sizeof(*dd));
+
dd->ipath_unit = -1;

spin_lock_irqsave(&ipath_devs_lock, flags);
@@ -272,47 +261,6 @@ int ipath_count_units(int *npresentp, in
return nunits;
}

-static int init_port0_rcvhdrtail(struct pci_dev *pdev)
-{
- int ret;
-
- mutex_lock(&ipath_mutex);
-
- if (!ipath_port0_rcvhdrtail) {
- ipath_port0_rcvhdrtail =
- dma_alloc_coherent(&pdev->dev,
- IPATH_PORT0_RCVHDRTAIL_SIZE,
- &ipath_port0_rcvhdrtail_dma,
- GFP_KERNEL);
-
- if (!ipath_port0_rcvhdrtail) {
- ret = -ENOMEM;
- goto bail;
- }
- }
- port0_rcvhdrtail_refs++;
- ret = 0;
-
-bail:
- mutex_unlock(&ipath_mutex);
-
- return ret;
-}
-
-static void cleanup_port0_rcvhdrtail(struct pci_dev *pdev)
-{
- mutex_lock(&ipath_mutex);
-
- if (!--port0_rcvhdrtail_refs) {
- dma_free_coherent(&pdev->dev, IPATH_PORT0_RCVHDRTAIL_SIZE,
- (void *) ipath_port0_rcvhdrtail,
- ipath_port0_rcvhdrtail_dma);
- ipath_port0_rcvhdrtail = NULL;
- }
-
- mutex_unlock(&ipath_mutex);
-}
-
/*
* These next two routines are placeholders in case we don't have per-arch
* code for controlling write combining. If explicit control of write
@@ -337,20 +285,12 @@ static int __devinit ipath_init_one(stru
u32 bar0 = 0, bar1 = 0;
u8 rev;

- ret = init_port0_rcvhdrtail(pdev);
- if (ret < 0) {
- printk(KERN_ERR IPATH_DRV_NAME
- ": Could not allocate port0_rcvhdrtail: error %d\n",
- -ret);
- goto bail;
- }
-
dd = ipath_alloc_devdata(pdev);
if (IS_ERR(dd)) {
ret = PTR_ERR(dd);
printk(KERN_ERR IPATH_DRV_NAME
": Could not allocate devdata: error %d\n", -ret);
- goto bail_rcvhdrtail;
+ goto bail;
}

ipath_cdbg(VERBOSE, "initializing unit #%u\n", dd->ipath_unit);
@@ -562,9 +502,6 @@ bail_devdata:
bail_devdata:
ipath_free_devdata(pdev, dd);

-bail_rcvhdrtail:
- cleanup_port0_rcvhdrtail(pdev);
-
bail:
return ret;
}
@@ -595,7 +532,6 @@ static void __devexit ipath_remove_one(s
pci_disable_device(pdev);

ipath_free_devdata(pdev, dd);
- cleanup_port0_rcvhdrtail(pdev);
}

/* general driver use */
@@ -1372,26 +1308,20 @@ bail:
* @dd: the infinipath device
* @pd: the port data
*
- * this *must* be physically contiguous memory, and for now,
- * that limits it to what kmalloc can do.
+ * this must be contiguous memory (from an i/o perspective), and must be
+ * DMA'able (which means for some systems, it will go through an IOMMU,
+ * or be forced into a low address range).
*/
int ipath_create_rcvhdrq(struct ipath_devdata *dd,
struct ipath_portdata *pd)
{
- int ret = 0, amt;
-
- amt = ALIGN(dd->ipath_rcvhdrcnt * dd->ipath_rcvhdrentsize *
- sizeof(u32), PAGE_SIZE);
+ int ret = 0;
+
if (!pd->port_rcvhdrq) {
- /*
- * not using REPEAT isn't viable; at 128KB, we can easily
- * fail this. The problem with REPEAT is we can block here
- * "forever". There isn't an inbetween, unfortunately. We
- * could reduce the risk by never freeing the rcvhdrq except
- * at unload, but even then, the first time a port is used,
- * we could delay for some time...
- */
+ dma_addr_t phys_hdrqtail;
gfp_t gfp_flags = GFP_USER | __GFP_COMP;
+ int amt = ALIGN(dd->ipath_rcvhdrcnt * dd->ipath_rcvhdrentsize *
+ sizeof(u32), PAGE_SIZE);

pd->port_rcvhdrq = dma_alloc_coherent(
&dd->pcidev->dev, amt, &pd->port_rcvhdrq_phys,
@@ -1404,6 +1334,16 @@ int ipath_create_rcvhdrq(struct ipath_de
ret = -ENOMEM;
goto bail;
}
+ pd->port_rcvhdrtail_kvaddr = dma_alloc_coherent(
+ &dd->pcidev->dev, PAGE_SIZE, &phys_hdrqtail, GFP_KERNEL);
+ if (!pd->port_rcvhdrtail_kvaddr) {
+ ipath_dev_err(dd, "attempt to allocate 1 page "
+ "for port %u rcvhdrqtailaddr failed\n",
+ pd->port_port);
+ ret = -ENOMEM;
+ goto bail;
+ }
+ pd->port_rcvhdrqtailaddr_phys = phys_hdrqtail;

pd->port_rcvhdrq_size = amt;

@@ -1413,20 +1353,28 @@ int ipath_create_rcvhdrq(struct ipath_de
(unsigned long) pd->port_rcvhdrq_phys,
(unsigned long) pd->port_rcvhdrq_size,
pd->port_port);
- } else {
- /*
- * clear for security, sanity, and/or debugging, each
- * time we reuse
- */
- memset(pd->port_rcvhdrq, 0, amt);
- }
+
+ ipath_cdbg(VERBOSE, "port %d hdrtailaddr, %llx physical\n",
+ pd->port_port,
+ (unsigned long long) phys_hdrqtail);
+ }
+ else
+ ipath_cdbg(VERBOSE, "reuse port %d rcvhdrq @%p %llx phys; "
+ "hdrtailaddr@%p %llx physical\n",
+ pd->port_port, pd->port_rcvhdrq,
+ pd->port_rcvhdrq_phys, pd->port_rcvhdrtail_kvaddr,
+ (unsigned long long)pd->port_rcvhdrqtailaddr_phys);
+
+ /* clear for security and sanity on each use */
+ memset(pd->port_rcvhdrq, 0, pd->port_rcvhdrq_size);
+ memset((void *)pd->port_rcvhdrtail_kvaddr, 0, PAGE_SIZE);

/*
* tell chip each time we init it, even if we are re-using previous
- * memory (we zero it at process close)
- */
- ipath_cdbg(VERBOSE, "writing port %d rcvhdraddr as %lx\n",
- pd->port_port, (unsigned long) pd->port_rcvhdrq_phys);
+ * memory (we zero the register at process close)
+ */
+ ipath_write_kreg_port(dd, dd->ipath_kregs->kr_rcvhdrtailaddr,
+ pd->port_port, pd->port_rcvhdrqtailaddr_phys);
ipath_write_kreg_port(dd, dd->ipath_kregs->kr_rcvhdraddr,
pd->port_port, pd->port_rcvhdrq_phys);

@@ -1514,15 +1462,27 @@ void ipath_set_ib_lstate(struct ipath_de
[INFINIPATH_IBCC_LINKCMD_ARMED] = "ARMED",
[INFINIPATH_IBCC_LINKCMD_ACTIVE] = "ACTIVE"
};
+ int linkcmd = (which >> INFINIPATH_IBCC_LINKCMD_SHIFT) &
+ INFINIPATH_IBCC_LINKCMD_MASK;
+
ipath_cdbg(SMA, "Trying to move unit %u to %s, current ltstate "
"is %s\n", dd->ipath_unit,
- what[(which >> INFINIPATH_IBCC_LINKCMD_SHIFT) &
- INFINIPATH_IBCC_LINKCMD_MASK],
+ what[linkcmd],
ipath_ibcstatus_str[
(ipath_read_kreg64
(dd, dd->ipath_kregs->kr_ibcstatus) >>
INFINIPATH_IBCS_LINKTRAININGSTATE_SHIFT) &
INFINIPATH_IBCS_LINKTRAININGSTATE_MASK]);
+ /* flush all queued sends when going to DOWN or INIT, to be sure that
+ * they don't block SMA and other MAD packets */
+ if(!linkcmd || linkcmd == INFINIPATH_IBCC_LINKCMD_INIT) {
+ ipath_write_kreg(dd, dd->ipath_kregs->kr_sendctrl,
+ INFINIPATH_S_ABORT);
+ ipath_disarm_piobufs(dd, dd->ipath_lastport_piobuf,
+ (unsigned)(dd->ipath_piobcnt2k +
+ dd->ipath_piobcnt4k) -
+ dd->ipath_lastport_piobuf);
+ }

ipath_write_kreg(dd, dd->ipath_kregs->kr_ibcctrl,
dd->ipath_ibcctrl | which);
@@ -1670,60 +1630,54 @@ void ipath_shutdown_device(struct ipath_
/**
* ipath_free_pddata - free a port's allocated data
* @dd: the infinipath device
- * @port: the port
- * @freehdrq: free the port data structure if true
- *
- * when closing, free up any allocated data for a port, if the
- * reference count goes to zero
- * Note: this also optionally frees the portdata itself!
- * Any changes here have to be matched up with the reinit case
- * of ipath_init_chip(), which calls this routine on reinit after reset.
- */
-void ipath_free_pddata(struct ipath_devdata *dd, u32 port, int freehdrq)
-{
- struct ipath_portdata *pd = dd->ipath_pd[port];
-
+ * @pd: the portdata structure
+ *
+ * free up any allocated data for a port
+ * This should not touch anything that would affect a simultaneous
+ * re-allocation of port data, because it is called after ipath_mutex
+ * is released (and can be called from reinit as well).
+ * It should never change any chip state, or global driver state.
+ * (The only exception to global state is freeing the port0 port0_skbs.)
+ */
+void ipath_free_pddata(struct ipath_devdata *dd, struct ipath_portdata *pd)
+{
if (!pd)
return;
- if (freehdrq)
- /*
- * only clear and free portdata if we are going to also
- * release the hdrq, otherwise we leak the hdrq on each
- * open/close cycle
- */
- dd->ipath_pd[port] = NULL;
- if (freehdrq && pd->port_rcvhdrq) {
+
+ if (pd->port_rcvhdrq) {
ipath_cdbg(VERBOSE, "free closed port %d rcvhdrq @ %p "
"(size=%lu)\n", pd->port_port, pd->port_rcvhdrq,
(unsigned long) pd->port_rcvhdrq_size);
dma_free_coherent(&dd->pcidev->dev, pd->port_rcvhdrq_size,
pd->port_rcvhdrq, pd->port_rcvhdrq_phys);
pd->port_rcvhdrq = NULL;
- }
- if (port && pd->port_rcvegrbuf) {
- /* always free this */
- if (pd->port_rcvegrbuf) {
- unsigned e;
-
- for (e = 0; e < pd->port_rcvegrbuf_chunks; e++) {
- void *base = pd->port_rcvegrbuf[e];
- size_t size = pd->port_rcvegrbuf_size;
-
- ipath_cdbg(VERBOSE, "egrbuf free(%p, %lu), "
- "chunk %u/%u\n", base,
- (unsigned long) size,
- e, pd->port_rcvegrbuf_chunks);
- dma_free_coherent(
- &dd->pcidev->dev, size, base,
- pd->port_rcvegrbuf_phys[e]);
- }
- vfree(pd->port_rcvegrbuf);
- pd->port_rcvegrbuf = NULL;
- vfree(pd->port_rcvegrbuf_phys);
- pd->port_rcvegrbuf_phys = NULL;
- }
+ if(pd->port_rcvhdrtail_kvaddr) {
+ dma_free_coherent(&dd->pcidev->dev, PAGE_SIZE,
+ (void *)pd->port_rcvhdrtail_kvaddr,
+ pd->port_rcvhdrqtailaddr_phys);
+ pd->port_rcvhdrtail_kvaddr = NULL;
+ }
+ }
+ if(pd->port_port && pd->port_rcvegrbuf) {
+ unsigned e;
+
+ for (e = 0; e < pd->port_rcvegrbuf_chunks; e++) {
+ void *base = pd->port_rcvegrbuf[e];
+ size_t size = pd->port_rcvegrbuf_size;
+
+ ipath_cdbg(VERBOSE, "egrbuf free(%p, %lu), "
+ "chunk %u/%u\n", base,
+ (unsigned long) size,
+ e, pd->port_rcvegrbuf_chunks);
+ dma_free_coherent(&dd->pcidev->dev, size,
+ base, pd->port_rcvegrbuf_phys[e]);
+ }
+ vfree(pd->port_rcvegrbuf);
+ pd->port_rcvegrbuf = NULL;
+ vfree(pd->port_rcvegrbuf_phys);
+ pd->port_rcvegrbuf_phys = NULL;
pd->port_rcvegrbuf_chunks = 0;
- } else if (port == 0 && dd->ipath_port0_skbs) {
+ } else if (pd->port_port == 0 && dd->ipath_port0_skbs) {
unsigned e;
struct sk_buff **skbs = dd->ipath_port0_skbs;

@@ -1735,10 +1689,8 @@ void ipath_free_pddata(struct ipath_devd
dev_kfree_skb(skbs[e]);
vfree(skbs);
}
- if (freehdrq) {
- kfree(pd->port_tid_pg_list);
- kfree(pd);
- }
+ kfree(pd->port_tid_pg_list);
+ kfree(pd);
}

static int __init infinipath_init(void)
@@ -1864,10 +1816,14 @@ static void cleanup_device(struct ipath_

/*
* free any resources still in use (usually just kernel ports)
- * at unload
- */
- for (port = 0; port < dd->ipath_cfgports; port++)
- ipath_free_pddata(dd, port, 1);
+ * at unload; we do for portcnt, not cfgports, because cfgports
+ * could have changed while we were loaded.
+ */
+ for (port = 0; port < dd->ipath_portcnt; port++) {
+ struct ipath_portdata *pd = dd->ipath_pd[port];
+ dd->ipath_pd[port] = NULL;
+ ipath_free_pddata(dd, pd);
+ }
kfree(dd->ipath_pd);
/*
* debuggability, in case some cleanup path tries to use it
@@ -1908,19 +1864,19 @@ static void __exit infinipath_cleanup(vo
} else
ipath_dbg("irq is 0, not doing free_irq "
"for unit %u\n", dd->ipath_unit);
-
- /*
- * we check for NULL here, because it's outside
- * the kregbase check, and we need to call it
- * after the free_irq. Thus it's possible that
- * the function pointers were never initialized.
- */
- if (dd->ipath_f_cleanup)
- /* clean up chip-specific stuff */
- dd->ipath_f_cleanup(dd);
-
dd->pcidev = NULL;
}
+
+ /*
+ * we check for NULL here, because it's outside the kregbase
+ * check, and we need to call it after the free_irq. Thus
+ * it's possible that the function pointers were never
+ * initialized.
+ */
+ if (dd->ipath_f_cleanup)
+ /* clean up chip-specific stuff */
+ dd->ipath_f_cleanup(dd);
+
spin_lock_irqsave(&ipath_devs_lock, flags);
}

diff -r 09077b2f476f -r e29625bd9050 drivers/infiniband/hw/ipath/ipath_file_ops.c
--- a/drivers/infiniband/hw/ipath/ipath_file_ops.c Fri May 12 15:55:28 2006 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_file_ops.c Fri May 12 15:55:28 2006 -0700
@@ -122,6 +122,7 @@ static int ipath_get_base_info(struct ip
* on to yet another method of dealing with this
*/
kinfo->spi_rcvhdr_base = (u64) pd->port_rcvhdrq_phys;
+ kinfo->spi_rcvhdr_tailaddr = (u64)pd->port_rcvhdrqtailaddr_phys;
kinfo->spi_rcv_egrbufs = (u64) pd->port_rcvegr_phys;
kinfo->spi_pioavailaddr = (u64) dd->ipath_pioavailregs_phys;
kinfo->spi_status = (u64) kinfo->spi_pioavailaddr +
@@ -783,11 +784,12 @@ static int ipath_create_user_egr(struct

bail_rcvegrbuf_phys:
for (e = 0; e < pd->port_rcvegrbuf_chunks &&
- pd->port_rcvegrbuf[e]; e++)
+ pd->port_rcvegrbuf[e]; e++) {
dma_free_coherent(&dd->pcidev->dev, size,
pd->port_rcvegrbuf[e],
pd->port_rcvegrbuf_phys[e]);

+ }
vfree(pd->port_rcvegrbuf_phys);
pd->port_rcvegrbuf_phys = NULL;
bail_rcvegrbuf:
@@ -802,10 +804,7 @@ static int ipath_do_user_init(struct ipa
{
int ret = 0;
struct ipath_devdata *dd = pd->port_dd;
- u64 physaddr, uaddr, off, atmp;
- struct page *pagep;
u32 head32;
- u64 head;

/* for now, if major version is different, bail */
if ((uinfo->spu_userversion >> 16) != IPATH_USER_SWMAJOR) {
@@ -829,54 +828,6 @@ static int ipath_do_user_init(struct ipa
}

/* for now we do nothing with rcvhdrcnt: uinfo->spu_rcvhdrcnt */
-
- /* set up for the rcvhdr Q tail register writeback to user memory */
- if (!uinfo->spu_rcvhdraddr ||
- !access_ok(VERIFY_WRITE, (u64 __user *) (unsigned long)
- uinfo->spu_rcvhdraddr, sizeof(u64))) {
- ipath_dbg("Port %d rcvhdrtail addr %llx not valid\n",
- pd->port_port,
- (unsigned long long) uinfo->spu_rcvhdraddr);
- ret = -EINVAL;
- goto done;
- }
-
- off = offset_in_page(uinfo->spu_rcvhdraddr);
- uaddr = PAGE_MASK & (unsigned long) uinfo->spu_rcvhdraddr;
- ret = ipath_get_user_pages_nocopy(uaddr, &pagep);
- if (ret) {
- dev_info(&dd->pcidev->dev, "Failed to lookup and lock "
- "address %llx for rcvhdrtail: errno %d\n",
- (unsigned long long) uinfo->spu_rcvhdraddr, -ret);
- goto done;
- }
- ipath_stats.sps_pagelocks++;
- pd->port_rcvhdrtail_uaddr = uaddr;
- pd->port_rcvhdrtail_pagep = pagep;
- pd->port_rcvhdrtail_kvaddr =
- page_address(pagep);
- pd->port_rcvhdrtail_kvaddr += off;
- physaddr = page_to_phys(pagep) + off;
- ipath_cdbg(VERBOSE, "port %d user addr %llx hdrtailaddr, %llx "
- "physical (off=%llx)\n",
- pd->port_port,
- (unsigned long long) uinfo->spu_rcvhdraddr,
- (unsigned long long) physaddr, (unsigned long long) off);
- ipath_write_kreg_port(dd, dd->ipath_kregs->kr_rcvhdrtailaddr,
- pd->port_port, physaddr);
- atmp = ipath_read_kreg64_port(dd,
- dd->ipath_kregs->kr_rcvhdrtailaddr,
- pd->port_port);
- if (physaddr != atmp) {
- ipath_dev_err(dd,
- "Catastrophic software error, "
- "RcvHdrTailAddr%u written as %llx, "
- "read back as %llx\n", pd->port_port,
- (unsigned long long) physaddr,
- (unsigned long long) atmp);
- ret = -EINVAL;
- goto done;
- }

/* for right now, kernel piobufs are at end, so port 1 is at 0 */
pd->port_piobufs = dd->ipath_piobufbase +
@@ -896,26 +847,18 @@ static int ipath_do_user_init(struct ipa
ret = ipath_create_user_egr(pd);
if (ret)
goto done;
- /* enable receives now */
- /* atomically set enable bit for this port */
- set_bit(INFINIPATH_R_PORTENABLE_SHIFT + pd->port_port,
- &dd->ipath_rcvctrl);

/*
- * set the head registers for this port to the current values
+ * set the eager head register for this port to the current values
* of the tail pointers, since we don't know if they were
* updated on last use of the port.
*/
- head32 = ipath_read_ureg32(dd, ur_rcvhdrtail, pd->port_port);
- head = (u64) head32;
- ipath_write_ureg(dd, ur_rcvhdrhead, head, pd->port_port);
head32 = ipath_read_ureg32(dd, ur_rcvegrindextail, pd->port_port);
ipath_write_ureg(dd, ur_rcvegrindexhead, head32, pd->port_port);
dd->ipath_lastegrheads[pd->port_port] = -1;
dd->ipath_lastrcvhdrqtails[pd->port_port] = -1;
- ipath_cdbg(VERBOSE, "Wrote port%d head %llx, egrhead %x from "
- "tail regs\n", pd->port_port,
- (unsigned long long) head, head32);
+ ipath_cdbg(VERBOSE, "Wrote port%d egrhead %x from tail regs\n",
+ pd->port_port, head32);
pd->port_tidcursor = 0; /* start at beginning after open */
/*
* now enable the port; the tail registers will be written to memory
@@ -924,13 +867,62 @@ static int ipath_do_user_init(struct ipa
* transition from 0 to 1, so clear it first, then set it as part of
* enabling the port. This will (very briefly) affect any other
* open ports, but it shouldn't be long enough to be an issue.
+ * We explictly set the in-memory copy to 0 beforehand, so we don't
+ * have to wait to be sure the DMA update has happened.
*/
+ *pd->port_rcvhdrtail_kvaddr = 0ULL;
+ set_bit(INFINIPATH_R_PORTENABLE_SHIFT + pd->port_port,
+ &dd->ipath_rcvctrl);
ipath_write_kreg(dd, dd->ipath_kregs->kr_rcvctrl,
dd->ipath_rcvctrl & ~INFINIPATH_R_TAILUPD);
ipath_write_kreg(dd, dd->ipath_kregs->kr_rcvctrl,
dd->ipath_rcvctrl);
-
done:
+ return ret;
+}
+
+
+/* common code for the mappings on dma_alloc_coherent mem */
+static int ipath_mmap_mem(struct vm_area_struct *vma,
+ struct ipath_portdata *pd, unsigned len,
+ int write_ok, dma_addr_t addr, char *what)
+{
+ struct ipath_devdata *dd = pd->port_dd;
+ unsigned pfn = (unsigned long)addr >> PAGE_SHIFT;
+ int ret;
+
+ if ((vma->vm_end - vma->vm_start) > len) {
+ dev_info(&dd->pcidev->dev,
+ "FAIL on %s: len %lx > %x\n", what,
+ vma->vm_end - vma->vm_start, len);
+ ret = -EFAULT;
+ goto bail;
+ }
+
+ if(!write_ok) {
+ if (vma->vm_flags & VM_WRITE) {
+ dev_info(&dd->pcidev->dev,
+ "%s must be mapped readonly\n", what);
+ ret = -EPERM;
+ goto bail;
+ }
+
+ /* don't allow them to later change with mprotect */
+ vma->vm_flags &= ~VM_MAYWRITE;
+ }
+
+ ret = remap_pfn_range(vma, vma->vm_start, pfn,
+ len, vma->vm_page_prot);
+ if(ret)
+ dev_info(&dd->pcidev->dev,
+ "%s port%u mmap of %lx, %x bytes r%c failed: %d\n",
+ what, pd->port_port, (unsigned long)addr, len,
+ write_ok?'w':'o', ret);
+ else
+ ipath_cdbg(VERBOSE, "%s port%u mmaped %lx, %x bytes r%c\n",
+ what, pd->port_port, (unsigned long)addr, len,
+ write_ok?'w':'o');
+bail:
return ret;
}

@@ -940,8 +932,11 @@ static int mmap_ureg(struct vm_area_stru
unsigned long phys;
int ret;

- /* it's the real hardware, so io_remap works */
-
+ /*
+ * This is real hardware, so use io_remap. This is the mechanism
+ * for the user process to update the head registers for their port
+ * in the chip.
+ */
if ((vma->vm_end - vma->vm_start) > PAGE_SIZE) {
dev_info(&dd->pcidev->dev, "FAIL mmap userreg: reqlen "
"%lx > PAGE\n", vma->vm_end - vma->vm_start);
@@ -967,10 +962,11 @@ static int mmap_piobufs(struct vm_area_s
int ret;

/*
- * When we map the PIO buffers, we want to map them as writeonly, no
- * read possible.
+ * When we map the PIO buffers in the chip, we want to map them as
+ * writeonly, no read possible. This prevents access to previous
+ * process data, and catches users who might try to read the i/o
+ * space due to a bug.
*/
-
if ((vma->vm_end - vma->vm_start) >
(dd->ipath_pbufsport * dd->ipath_palign)) {
dev_info(&dd->pcidev->dev, "FAIL mmap piobufs: "
@@ -981,11 +977,10 @@ static int mmap_piobufs(struct vm_area_s
}

phys = dd->ipath_physaddr + pd->port_piobufs;
+
/*
- * Do *NOT* mark this as non-cached (PWT bit), or we don't get the
+ * Don't mark this as non-cached, or we don't get the
* write combining behavior we want on the PIO buffers!
- * vma->vm_page_prot =
- * pgprot_noncached(vma->vm_page_prot);
*/

if (vma->vm_flags & VM_READ) {
@@ -997,8 +992,7 @@ static int mmap_piobufs(struct vm_area_s
}

/* don't allow them to later change to readable with mprotect */
-
- vma->vm_flags &= ~VM_MAYWRITE;
+ vma->vm_flags &= ~VM_MAYREAD;
vma->vm_flags |= VM_DONTCOPY | VM_DONTEXPAND;

ret = io_remap_pfn_range(vma, vma->vm_start, phys >> PAGE_SHIFT,
@@ -1016,11 +1010,6 @@ static int mmap_rcvegrbufs(struct vm_are
size_t total_size, i;
dma_addr_t *phys;
int ret;
-
- if (!pd->port_rcvegrbuf) {
- ret = -EFAULT;
- goto bail;
- }

size = pd->port_rcvegrbuf_size;
total_size = pd->port_rcvegrbuf_chunks * size;
@@ -1039,12 +1028,11 @@ static int mmap_rcvegrbufs(struct vm_are
ret = -EPERM;
goto bail;
}
+ /* don't allow them to later change to writeable with mprotect */
+ vma->vm_flags &= ~VM_MAYWRITE;

start = vma->vm_start;
phys = pd->port_rcvegrbuf_phys;
-
- /* don't allow them to later change to writeable with mprotect */
- vma->vm_flags &= ~VM_MAYWRITE;

for (i = 0; i < pd->port_rcvegrbuf_chunks; i++, start += size) {
ret = remap_pfn_range(vma, start, phys[i] >> PAGE_SHIFT,
@@ -1054,78 +1042,6 @@ static int mmap_rcvegrbufs(struct vm_are
}
ret = 0;

-bail:
- return ret;
-}
-
-static int mmap_rcvhdrq(struct vm_area_struct *vma,
- struct ipath_portdata *pd)
-{
- struct ipath_devdata *dd = pd->port_dd;
- size_t total_size;
- int ret;
-
- /*
- * kmalloc'ed memory, physically contiguous; this is from
- * spi_rcvhdr_base; we allow user to map read-write so they can
- * write hdrq entries to allow protocol code to directly poll
- * whether a hdrq entry has been written.
- */
- total_size = ALIGN(dd->ipath_rcvhdrcnt * dd->ipath_rcvhdrentsize *
- sizeof(u32), PAGE_SIZE);
- if ((vma->vm_end - vma->vm_start) > total_size) {
- dev_info(&dd->pcidev->dev,
- "FAIL on rcvhdrq: reqlen %lx > actual %lx\n",
- vma->vm_end - vma->vm_start,
- (unsigned long) total_size);
- ret = -EFAULT;
- goto bail;
- }
-
- ret = remap_pfn_range(vma, vma->vm_start,
- pd->port_rcvhdrq_phys >> PAGE_SHIFT,
- vma->vm_end - vma->vm_start,
- vma->vm_page_prot);
-bail:
- return ret;
-}
-
-static int mmap_pioavailregs(struct vm_area_struct *vma,
- struct ipath_portdata *pd)
-{
- struct ipath_devdata *dd = pd->port_dd;
- int ret;
-
- /*
- * when we map the PIO bufferavail registers, we want to map them as
- * readonly, no write possible.
- *
- * kmalloc'ed memory, physically contiguous, one page only, readonly
- */
-
- if ((vma->vm_end - vma->vm_start) > PAGE_SIZE) {
- dev_info(&dd->pcidev->dev, "FAIL on pioavailregs_dma: "
- "reqlen %lx > actual %lx\n",
- vma->vm_end - vma->vm_start,
- (unsigned long) PAGE_SIZE);
- ret = -EFAULT;
- goto bail;
- }
-
- if (vma->vm_flags & VM_WRITE) {
- dev_info(&dd->pcidev->dev,
- "Can't map pioavailregs as writable (flags=%lx)\n",
- vma->vm_flags);
- ret = -EPERM;
- goto bail;
- }
-
- /* don't allow them to later change with mprotect */
- vma->vm_flags &= ~VM_MAYWRITE;
-
- ret = remap_pfn_range(vma, vma->vm_start,
- dd->ipath_pioavailregs_phys >> PAGE_SHIFT,
- PAGE_SIZE, vma->vm_page_prot);
bail:
return ret;
}
@@ -1149,6 +1065,7 @@ static int ipath_mmap(struct file *fp, s

pd = port_fp(fp);
dd = pd->port_dd;
+
/*
* This is the ipath_do_user_init() code, mapping the shared buffers
* into the user process. The address referred to by vm_pgoff is the
@@ -1158,28 +1075,59 @@ static int ipath_mmap(struct file *fp, s
pgaddr = vma->vm_pgoff << PAGE_SHIFT;

/*
- * note that ureg does *NOT* have the kregvirt as part of it, to be
- * sure that for 32 bit programs, we don't end up trying to map a >
- * 44 address. Has to match ipath_get_base_info() code that sets
- * __spi_uregbase
+ * Must fit in 40 bits for our hardware; some checked elsewhere,
+ * but we'll be paranoid. Check for 0 is mostly in case one of the
+ * allocations failed, but user called mmap anyway. We want to catch
+ * that before it can match.
*/
-
+ if(!pgaddr || pgaddr >= (1ULL<<40)) {
+ ipath_dev_err(dd, "Bad physical address %llx, start %lx, end %lx\n",
+ (unsigned long long)pgaddr, vma->vm_start, vma->vm_end);
+ return -EINVAL;
+ }
+
+ /* just the offset of the port user registers, not physical addr */
ureg = dd->ipath_uregbase + dd->ipath_palign * pd->port_port;

ipath_cdbg(MM, "ushare: pgaddr %llx vm_start=%lx, vmlen %lx\n",
(unsigned long long) pgaddr, vma->vm_start,
vma->vm_end - vma->vm_start);

- if (pgaddr == ureg)
+ if(vma->vm_start & (PAGE_SIZE-1)) {
+ ipath_dev_err(dd,
+ "vm_start not aligned: %lx, end=%lx phys %lx\n",
+ vma->vm_start, vma->vm_end, (unsigned long)pgaddr);
+ ret = -EINVAL;
+ }
+ else if (pgaddr == ureg)
ret = mmap_ureg(vma, dd, ureg);
else if (pgaddr == pd->port_piobufs)
ret = mmap_piobufs(vma, dd, pd);
else if (pgaddr == (u64) pd->port_rcvegr_phys)
ret = mmap_rcvegrbufs(vma, pd);
- else if (pgaddr == (u64) pd->port_rcvhdrq_phys)
- ret = mmap_rcvhdrq(vma, pd);
+ else if (pgaddr == (u64) pd->port_rcvhdrq_phys) {
+ /*
+ * The rcvhdrq itself; readonly except on HT-400 (so have
+ * to allow writable mapping), multiple pages, contiguous
+ * from an i/o perspective.
+ */
+ unsigned total_size =
+ ALIGN(dd->ipath_rcvhdrcnt * dd->ipath_rcvhdrentsize
+ * sizeof(u32), PAGE_SIZE);
+ ret = ipath_mmap_mem(vma, pd, total_size, 1,
+ pd->port_rcvhdrq_phys,
+ "rcvhdrq");
+ }
+ else if (pgaddr == (u64)pd->port_rcvhdrqtailaddr_phys)
+ /* in-memory copy of rcvhdrq tail register */
+ ret = ipath_mmap_mem(vma, pd, PAGE_SIZE, 0,
+ pd->port_rcvhdrqtailaddr_phys,
+ "rcvhdrq tail");
else if (pgaddr == dd->ipath_pioavailregs_phys)
- ret = mmap_pioavailregs(vma, pd);
+ /* in-memory copy of pioavail registers */
+ ret = ipath_mmap_mem(vma, pd, PAGE_SIZE, 0,
+ dd->ipath_pioavailregs_phys,
+ "pioavail registers");
else
ret = -EINVAL;

@@ -1532,14 +1480,6 @@ static int ipath_close(struct inode *in,
}

if (dd->ipath_kregbase) {
- if (pd->port_rcvhdrtail_uaddr) {
- pd->port_rcvhdrtail_uaddr = 0;
- pd->port_rcvhdrtail_kvaddr = NULL;
- ipath_release_user_pages_on_close(
- &pd->port_rcvhdrtail_pagep, 1);
- pd->port_rcvhdrtail_pagep = NULL;
- ipath_stats.sps_pageunlocks++;
- }
ipath_write_kreg_port(
dd, dd->ipath_kregs->kr_rcvhdrtailaddr,
port, 0ULL);
@@ -1576,9 +1516,9 @@ static int ipath_close(struct inode *in,

dd->ipath_f_clear_tids(dd, pd->port_port);

- ipath_free_pddata(dd, pd->port_port, 0);
-
+ dd->ipath_pd[pd->port_port] = NULL; /* before releasing mutex */
mutex_unlock(&ipath_mutex);
+ ipath_free_pddata(dd, pd); /* after releasing the mutex */

return ret;
}
@@ -1908,3 +1848,4 @@ bail:
bail:
return;
}
+
diff -r 09077b2f476f -r e29625bd9050 drivers/infiniband/hw/ipath/ipath_init_chip.c
--- a/drivers/infiniband/hw/ipath/ipath_init_chip.c Fri May 12 15:55:28 2006 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_init_chip.c Fri May 12 15:55:28 2006 -0700
@@ -409,17 +409,8 @@ static int init_pioavailregs(struct ipat
/* and its length */
dd->ipath_freezelen = L1_CACHE_BYTES - sizeof(dd->ipath_statusp[0]);

- if (dd->ipath_unit * 64 > (IPATH_PORT0_RCVHDRTAIL_SIZE - 64)) {
- ipath_dev_err(dd, "unit %u too large for port 0 "
- "rcvhdrtail buffer size\n", dd->ipath_unit);
- ret = -ENODEV;
- }
- else
- ret = 0;
-
- /* so we can get current tail in ipath_kreceive(), per chip */
- dd->ipath_hdrqtailptr = &ipath_port0_rcvhdrtail[
- dd->ipath_unit * (64 / sizeof(*ipath_port0_rcvhdrtail))];
+ ret = 0;
+
done:
return ret;
}
@@ -652,7 +643,7 @@ int ipath_init_chip(struct ipath_devdata
{
int ret = 0, i;
u32 val32, kpiobufs;
- u64 val, atmp;
+ u64 val;
struct ipath_portdata *pd = NULL; /* keep gcc4 happy */

ret = init_housekeeping(dd, &pd, reinit);
@@ -775,24 +766,6 @@ int ipath_init_chip(struct ipath_devdata
goto done;
}

- val = ipath_port0_rcvhdrtail_dma + dd->ipath_unit * 64;
-
- /* verify that the alignment requirement was met */
- ipath_write_kreg_port(dd, dd->ipath_kregs->kr_rcvhdrtailaddr,
- 0, val);
- atmp = ipath_read_kreg64_port(
- dd, dd->ipath_kregs->kr_rcvhdrtailaddr, 0);
- if (val != atmp) {
- ipath_dev_err(dd, "Catastrophic software error, "
- "RcvHdrTailAddr0 written as %llx, "
- "read back as %llx from %x\n",
- (unsigned long long) val,
- (unsigned long long) atmp,
- dd->ipath_kregs->kr_rcvhdrtailaddr);
- ret = -EINVAL;
- goto done;
- }
-
ipath_write_kreg(dd, dd->ipath_kregs->kr_rcvbthqp, IPATH_KD_QP);

/*
@@ -841,12 +814,18 @@ int ipath_init_chip(struct ipath_devdata
* re-init, the simplest way to handle this is to free
* existing, and re-allocate.
*/
- if (reinit)
- ipath_free_pddata(dd, 0, 0);
+ if (reinit) {
+ struct ipath_portdata *pd = dd->ipath_pd[0];
+ dd->ipath_pd[0] = NULL;
+ ipath_free_pddata(dd, pd);
+ }
dd->ipath_f_tidtemplate(dd);
ret = ipath_create_rcvhdrq(dd, pd);
- if (!ret)
+ if (!ret) {
+ dd->ipath_hdrqtailptr =
+ (volatile __le64 *)pd->port_rcvhdrtail_kvaddr;
ret = create_port0_egr(dd);
+ }
if (ret)
ipath_dev_err(dd, "failed to allocate port 0 (kernel) "
"rcvhdrq and/or egr bufs\n");
diff -r 09077b2f476f -r e29625bd9050 drivers/infiniband/hw/ipath/ipath_intr.c
--- a/drivers/infiniband/hw/ipath/ipath_intr.c Fri May 12 15:55:28 2006 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_intr.c Fri May 12 15:55:28 2006 -0700
@@ -36,6 +36,7 @@
#include "ips_common.h"
#include "ipath_layer.h"

+/* These are all rcv-related errors which we want to count for stats */
#define E_SUM_PKTERRS \
(INFINIPATH_E_RHDRLEN | INFINIPATH_E_RBADTID | \
INFINIPATH_E_RBADVERSION | INFINIPATH_E_RHDR | \
@@ -44,12 +45,25 @@
INFINIPATH_E_RFORMATERR | INFINIPATH_E_RUNSUPVL | \
INFINIPATH_E_RUNEXPCHAR | INFINIPATH_E_REBP)

+/* These are all send-related errors which we want to count for stats */
#define E_SUM_ERRS \
(INFINIPATH_E_SPIOARMLAUNCH | INFINIPATH_E_SUNEXPERRPKTNUM | \
INFINIPATH_E_SDROPPEDDATAPKT | INFINIPATH_E_SDROPPEDSMPPKT | \
INFINIPATH_E_SMAXPKTLEN | INFINIPATH_E_SUNSUPVL | \
INFINIPATH_E_SMINPKTLEN | INFINIPATH_E_SPKTLEN | \
INFINIPATH_E_INVALIDADDR)
+
+/*
+ * these are errors that can occur when the link changes state while
+ * a packet is being sent or received. This doesn't cover things
+ * like EBP or VCRC that can be the result of a sending having the
+ * link change state, so we receive a "known bad" packet.
+ */
+#define E_SUM_LINK_PKTERRS \
+ (INFINIPATH_E_SDROPPEDDATAPKT | INFINIPATH_E_SDROPPEDSMPPKT | \
+ INFINIPATH_E_SMINPKTLEN | INFINIPATH_E_SPKTLEN | \
+ INFINIPATH_E_RSHORTPKTLEN | INFINIPATH_E_RMINPKTLEN | \
+ INFINIPATH_E_RUNEXPCHAR)

static u64 handle_e_sum_errs(struct ipath_devdata *dd, ipath_err_t errs)
{
@@ -100,9 +114,7 @@ static u64 handle_e_sum_errs(struct ipat
if (ipath_debug & __IPATH_PKTDBG)
printk("\n");
}
- if ((errs & (INFINIPATH_E_SDROPPEDDATAPKT |
- INFINIPATH_E_SDROPPEDSMPPKT |
- INFINIPATH_E_SMINPKTLEN)) &&
+ if ((errs & E_SUM_LINK_PKTERRS) &&
!(dd->ipath_flags & IPATH_LINKACTIVE)) {
/*
* This can happen when SMA is trying to bring the link
@@ -111,11 +123,9 @@ static u64 handle_e_sum_errs(struct ipat
* valid. We don't want to confuse people, so we just
* don't print them, except at debug
*/
- ipath_dbg("Ignoring pktsend errors %llx, because not "
- "yet active\n", (unsigned long long) errs);
- ignore_this_time = INFINIPATH_E_SDROPPEDDATAPKT |
- INFINIPATH_E_SDROPPEDSMPPKT |
- INFINIPATH_E_SMINPKTLEN;
+ ipath_dbg("Ignoring packet errors %llx, because link not "
+ "ACTIVE\n", (unsigned long long) errs);
+ ignore_this_time = errs & E_SUM_LINK_PKTERRS;
}

return ignore_this_time;
@@ -156,7 +166,29 @@ static void handle_e_ibstatuschanged(str
*/
val = ipath_read_kreg64(dd, dd->ipath_kregs->kr_ibcstatus);
lstate = val & IPATH_IBSTATE_MASK;
- if (lstate == IPATH_IBSTATE_INIT || lstate == IPATH_IBSTATE_ARM ||
+
+ /*
+ * this is confusing enough when it happens that I want to always put it
+ * on the console and in the logs. If it was a requested state change,
+ * we'll have already cleared the flags, so we won't print this warning
+ */
+ if ((lstate != IPATH_IBSTATE_ARM && lstate != IPATH_IBSTATE_ACTIVE)
+ && (dd->ipath_flags & (IPATH_LINKARMED | IPATH_LINKACTIVE))) {
+ dev_info(&dd->pcidev->dev, "Link state changed from %s to %s\n",
+ (dd->ipath_flags & IPATH_LINKARMED) ? "ARM" : "ACTIVE",
+ ib_linkstate(lstate));
+ /*
+ * Flush all queued sends when link went to DOWN or INIT,
+ * to be sure that they don't block SMA and other MAD packets
+ */
+ ipath_write_kreg(dd, dd->ipath_kregs->kr_sendctrl,
+ INFINIPATH_S_ABORT);
+ ipath_disarm_piobufs(dd, dd->ipath_lastport_piobuf,
+ (unsigned)(dd->ipath_piobcnt2k +
+ dd->ipath_piobcnt4k) -
+ dd->ipath_lastport_piobuf);
+ }
+ else if (lstate == IPATH_IBSTATE_INIT || lstate == IPATH_IBSTATE_ARM ||
lstate == IPATH_IBSTATE_ACTIVE) {
/*
* only print at SMA if there is a change, debug if not
@@ -379,6 +411,19 @@ static int handle_errors(struct ipath_de

if (errs & E_SUM_ERRS)
ignore_this_time = handle_e_sum_errs(dd, errs);
+ else if ((errs & E_SUM_LINK_PKTERRS) &&
+ !(dd->ipath_flags & IPATH_LINKACTIVE)) {
+ /*
+ * This can happen when SMA is trying to bring the link
+ * up, but the IB link changes state at the "wrong" time.
+ * The IB logic then complains that the packet isn't
+ * valid. We don't want to confuse people, so we just
+ * don't print them, except at debug
+ */
+ ipath_dbg("Ignoring packet errors %llx, because link not "
+ "ACTIVE\n", (unsigned long long) errs);
+ ignore_this_time = errs & E_SUM_LINK_PKTERRS;
+ }

if (supp_msgs == 250000) {
/*
diff -r 09077b2f476f -r e29625bd9050 drivers/infiniband/hw/ipath/ipath_kernel.h
--- a/drivers/infiniband/hw/ipath/ipath_kernel.h Fri May 12 15:55:28 2006 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_kernel.h Fri May 12 15:55:28 2006 -0700
@@ -61,9 +61,7 @@ struct ipath_portdata {
/* rcvhdrq base, needs mmap before useful */
void *port_rcvhdrq;
/* kernel virtual address where hdrqtail is updated */
- u64 *port_rcvhdrtail_kvaddr;
- /* page * used for uaddr */
- struct page *port_rcvhdrtail_pagep;
+ volatile __le64 *port_rcvhdrtail_kvaddr;
/*
* temp buffer for expected send setup, allocated at open, instead
* of each setup call
@@ -78,11 +76,7 @@ struct ipath_portdata {
dma_addr_t port_rcvegr_phys;
/* mmap of hdrq, must fit in 44 bits */
dma_addr_t port_rcvhdrq_phys;
- /*
- * the actual user address that we ipath_mlock'ed, so we can
- * ipath_munlock it at close
- */
- unsigned long port_rcvhdrtail_uaddr;
+ dma_addr_t port_rcvhdrqtailaddr_phys;
/*
* number of opens on this instance (0 or 1; ignoring forks, dup,
* etc. for now)
@@ -167,7 +161,6 @@ struct ipath_devdata {
* only written to by the chip, not the driver.
*/
volatile __le64 *ipath_hdrqtailptr;
- dma_addr_t ipath_dma_addr;
/* ipath_cfgports pointers */
struct ipath_portdata **ipath_pd;
/* sk_buffs used by port 0 eager receive queue */
@@ -518,10 +511,6 @@ struct ipath_devdata {
u8 ipath_lmc;
};

-extern volatile __le64 *ipath_port0_rcvhdrtail;
-extern dma_addr_t ipath_port0_rcvhdrtail_dma;
-
-#define IPATH_PORT0_RCVHDRTAIL_SIZE PAGE_SIZE

extern struct list_head ipath_dev_list;
extern spinlock_t ipath_devs_lock;
@@ -582,7 +571,7 @@ void ipath_disarm_piobufs(struct ipath_d
unsigned cnt);

int ipath_create_rcvhdrq(struct ipath_devdata *, struct ipath_portdata *);
-void ipath_free_pddata(struct ipath_devdata *, u32, int);
+void ipath_free_pddata(struct ipath_devdata *, struct ipath_portdata *);

int ipath_parse_ushort(const char *str, unsigned short *valp);

2006-05-12 23:45:10

by Bryan O'Sullivan

[permalink] [raw]
Subject: [PATCH 11 of 53] ipath - don't modify QP if changes fail

Make sure modify_qp won't modify the QP if any of the changes failed.
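
The failure mode this guards against, in a minimal sketch with
hypothetical fields a and b and a hypothetical check_b() (the real
checks are the DLID, pkey_index and min_rnr_timer tests in the diff
below): when validation is interleaved with assignment, an early
assignment survives a later failed check, leaving the QP half-modified.

	/* broken: qp->a is already updated when check_b() fails */
	qp->a = attr->a;
	if (!check_b(attr))
		goto inval;
	qp->b = attr->b;

	/* fixed: validate everything first, then apply */
	if (!check_b(attr))
		goto inval;
	qp->a = attr->a;
	qp->b = attr->b;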

Signed-off-by: Bryan O'Sullivan <[email protected]>

diff -r 2fea0d127a41 -r cc6d7f2537b2 drivers/infiniband/hw/ipath/ipath_qp.c
--- a/drivers/infiniband/hw/ipath/ipath_qp.c Fri May 12 15:55:27 2006 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_qp.c Fri May 12 15:55:28 2006 -0700
@@ -427,6 +427,7 @@ int ipath_modify_qp(struct ib_qp *ibqp,
int ipath_modify_qp(struct ib_qp *ibqp, struct ib_qp_attr *attr,
int attr_mask)
{
+ struct ipath_ibdev *dev = to_idev(ibqp->device);
struct ipath_qp *qp = to_iqp(ibqp);
enum ib_qp_state cur_state, new_state;
unsigned long flags;
@@ -443,6 +444,19 @@ int ipath_modify_qp(struct ib_qp *ibqp,
attr_mask))
goto inval;

+ if (attr_mask & IB_QP_AV)
+ if (attr->ah_attr.dlid == 0 ||
+ attr->ah_attr.dlid >= IPS_MULTICAST_LID_BASE)
+ goto inval;
+
+ if (attr_mask & IB_QP_PKEY_INDEX)
+ if (attr->pkey_index >= ipath_layer_get_npkeys(dev->dd))
+ goto inval;
+
+ if (attr_mask & IB_QP_MIN_RNR_TIMER)
+ if (attr->min_rnr_timer > 31)
+ goto inval;
+
switch (new_state) {
case IB_QPS_RESET:
ipath_reset_qp(qp);
@@ -457,13 +471,8 @@ int ipath_modify_qp(struct ib_qp *ibqp,

}

- if (attr_mask & IB_QP_PKEY_INDEX) {
- struct ipath_ibdev *dev = to_idev(ibqp->device);
-
- if (attr->pkey_index >= ipath_layer_get_npkeys(dev->dd))
- goto inval;
+ if (attr_mask & IB_QP_PKEY_INDEX)
qp->s_pkey_index = attr->pkey_index;
- }

if (attr_mask & IB_QP_DEST_QPN)
qp->remote_qpn = attr->dest_qp_num;
@@ -479,12 +488,8 @@ int ipath_modify_qp(struct ib_qp *ibqp,
if (attr_mask & IB_QP_ACCESS_FLAGS)
qp->qp_access_flags = attr->qp_access_flags;

- if (attr_mask & IB_QP_AV) {
- if (attr->ah_attr.dlid == 0 ||
- attr->ah_attr.dlid >= IPS_MULTICAST_LID_BASE)
- goto inval;
+ if (attr_mask & IB_QP_AV)
qp->remote_ah_attr = attr->ah_attr;
- }

if (attr_mask & IB_QP_PATH_MTU)
qp->path_mtu = attr->path_mtu;
@@ -499,11 +504,8 @@ int ipath_modify_qp(struct ib_qp *ibqp,
qp->s_rnr_retry_cnt = qp->s_rnr_retry;
}

- if (attr_mask & IB_QP_MIN_RNR_TIMER) {
- if (attr->min_rnr_timer > 31)
- goto inval;
+ if (attr_mask & IB_QP_MIN_RNR_TIMER)
qp->s_min_rnr_timer = attr->min_rnr_timer;
- }

if (attr_mask & IB_QP_QKEY)
qp->qkey = attr->qkey;

2006-05-12 23:45:11

by Bryan O'Sullivan

[permalink] [raw]
Subject: [PATCH 12 of 53] ipath - reduce overhead of receive interrupts

Somewhat reduce overhead on receive interrupts, and count the number
of interrupts where that works (fastrcvint).
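
The fast path, condensed into a sketch (not the literal handler; see
the full diff below, where port0rbits is defined): if kernel-port
packets are pending, blindly clear just the port 0 receive interrupt
bits and process the queue, and only fall back to the expensive PIO
read of the interrupt status register when that didn't consume the
interrupt.

	if (dd->ipath_port0head !=
	    (u32)le64_to_cpu(*dd->ipath_hdrqtailptr)) {
		u32 oldhead = dd->ipath_port0head;
		/* clear only the port 0 receive bits; no PIO read */
		ipath_write_kreg(dd, dd->ipath_kregs->kr_intclear,
				 port0rbits);
		ipath_kreceive(dd);
		if (oldhead != dd->ipath_port0head) {
			ipath_stats.sps_fastrcvint++;
			goto done;	/* intstat never read */
		}
	}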

Signed-off-by: Bryan O'Sullivan <[email protected]>

diff -r cc6d7f2537b2 -r ab2b013f1f95 drivers/infiniband/hw/ipath/ipath_common.h
--- a/drivers/infiniband/hw/ipath/ipath_common.h Fri May 12 15:55:28 2006 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_common.h Fri May 12 15:55:28 2006 -0700
@@ -96,8 +96,8 @@ struct infinipath_stats {
__u64 sps_hwerrs;
/* number of times IB link changed state unexpectedly */
__u64 sps_iblink;
- /* no longer used; left for compatibility */
- __u64 sps_unused3;
+ /* kernel receive interrupts that didn't read intstat */
+ __u64 sps_fastrcvint;
/* number of kernel (port0) packets received */
__u64 sps_port0pkts;
/* number of "ethernet" packets sent by driver */
diff -r cc6d7f2537b2 -r ab2b013f1f95 drivers/infiniband/hw/ipath/ipath_driver.c
--- a/drivers/infiniband/hw/ipath/ipath_driver.c Fri May 12 15:55:28 2006 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_driver.c Fri May 12 15:55:28 2006 -0700
@@ -936,12 +936,7 @@ void ipath_kreceive(struct ipath_devdata
(u32)le64_to_cpu(*dd->ipath_hdrqtailptr))
goto done;

-gotmore:
- /*
- * read only once at start. If in flood situation, this helps
- * performance slightly. If more arrive while we are processing,
- * we'll come back here and do them
- */
+ /* read only once at start for performance */
hdrqtail = (u32)le64_to_cpu(*dd->ipath_hdrqtailptr);

for (i = 0, l = dd->ipath_port0head; l != hdrqtail; i++) {
@@ -1070,10 +1065,6 @@ gotmore:

dd->ipath_port0head = l;

- if (hdrqtail != (u32)le64_to_cpu(*dd->ipath_hdrqtailptr))
- /* more arrived while we handled first batch */
- goto gotmore;
-
if (pkttot > ipath_stats.sps_maxpkts_call)
ipath_stats.sps_maxpkts_call = pkttot;
ipath_stats.sps_port0pkts += pkttot;
diff -r cc6d7f2537b2 -r ab2b013f1f95 drivers/infiniband/hw/ipath/ipath_intr.c
--- a/drivers/infiniband/hw/ipath/ipath_intr.c Fri May 12 15:55:28 2006 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_intr.c Fri May 12 15:55:28 2006 -0700
@@ -493,10 +493,10 @@ static void handle_errors(struct ipath_d
continue;
if (hd == (tl + 1) ||
(!hd && tl == dd->ipath_hdrqlast)) {
+ if (i == 0)
+ chkerrpkts = 1;
dd->ipath_lastrcvhdrqtails[i] = tl;
pd->port_hdrqfull++;
- if (i == 0)
- chkerrpkts = 1;
}
}
}
@@ -678,7 +678,12 @@ set:
dd->ipath_sendctrl);
}

-static void handle_rcv(struct ipath_devdata *dd, u32 istat)
+/*
+ * Handle receive interrupts for user ports; this means a user
+ * process was waiting for a packet to arrive, and didn't want
+ * to poll
+ */
+static void handle_urcv(struct ipath_devdata *dd, u32 istat)
{
u64 portr;
int i;
@@ -688,22 +693,17 @@ static void handle_rcv(struct ipath_devd
infinipath_i_rcvavail_mask)
| ((istat >> INFINIPATH_I_RCVURG_SHIFT) &
infinipath_i_rcvurg_mask);
- for (i = 0; i < dd->ipath_cfgports; i++) {
+ for (i = 1; i < dd->ipath_cfgports; i++) {
struct ipath_portdata *pd = dd->ipath_pd[i];
- if (portr & (1 << i) && pd &&
- pd->port_cnt) {
- if (i == 0)
- ipath_kreceive(dd);
- else if (test_bit(IPATH_PORT_WAITING_RCV,
- &pd->port_flag)) {
- int rcbit;
- clear_bit(IPATH_PORT_WAITING_RCV,
- &pd->port_flag);
- rcbit = i + INFINIPATH_R_INTRAVAIL_SHIFT;
- clear_bit(1UL << rcbit, &dd->ipath_rcvctrl);
- wake_up_interruptible(&pd->port_wait);
- rcvdint = 1;
- }
+ if (portr & (1 << i) && pd && pd->port_cnt &&
+ test_bit(IPATH_PORT_WAITING_RCV, &pd->port_flag)) {
+ int rcbit;
+ clear_bit(IPATH_PORT_WAITING_RCV,
+ &pd->port_flag);
+ rcbit = i + INFINIPATH_R_INTRAVAIL_SHIFT;
+ clear_bit(1UL << rcbit, &dd->ipath_rcvctrl);
+ wake_up_interruptible(&pd->port_wait);
+ rcvdint = 1;
}
}
if (rcvdint) {
@@ -721,19 +721,66 @@ irqreturn_t ipath_intr(int irq, void *da
struct ipath_devdata *dd = data;
u32 istat;
ipath_err_t estat = 0;
+ irqreturn_t ret;
+ u32 p0bits;
static unsigned unexpected = 0;
- irqreturn_t ret;
+ static const u32 port0rbits = (1U<<INFINIPATH_I_RCVAVAIL_SHIFT) |
+ (1U<<INFINIPATH_I_RCVURG_SHIFT);
+
+ ipath_stats.sps_ints++;

if(!(dd->ipath_flags & IPATH_PRESENT)) {
- /* this is mostly so we don't try to touch the chip while
- * it is being reset */
- /*
- * This return value is perhaps odd, but we do not want the
+ /*
+ * This return value is not great, but we do not want the
* interrupt core code to remove our interrupt handler
* because we don't appear to be handling an interrupt
* during a chip reset.
*/
return IRQ_HANDLED;
+ }
+
+ /*
+ * this needs to be flags&initted, not statusp, so we keep
+ * taking interrupts even after link goes down, etc.
+ * Also, we *must* clear the interrupt at some point, or we won't
+ * take it again, which can be real bad for errors, etc...
+ */
+
+ if (!(dd->ipath_flags & IPATH_INITTED)) {
+ ipath_bad_intr(dd, &unexpected);
+ ret = IRQ_NONE;
+ goto bail;
+ }
+
+ /*
+ * We try to avoid reading the interrupt status register, since
+ * that's a PIO read, and stalls the processor for up to about
+ * 0.25 usec. The idea is that if we processed a port0 packet,
+ * we blindly clear the port 0 receive interrupt bits, and nothing
+ * else, then return. If other interrupts are pending, the chip
+ * will re-interrupt us as soon as we write the intclear register.
+ * We then won't process any more kernel packets (if not the 2nd
+ * time, then the 3rd or 4th) and we'll then handle the other
+ * interrupts. We clear the interrupts first so that we don't
+ * lose intr for later packets that arrive while we are processing.
+ */
+ if (dd->ipath_port0head !=
+ (u32)le64_to_cpu(*dd->ipath_hdrqtailptr)) {
+ u32 oldhead = dd->ipath_port0head;
+ if (dd->ipath_flags & IPATH_GPIO_INTR) {
+ ipath_write_kreg(dd, dd->ipath_kregs->kr_gpio_clear,
+ (u64) (1 << 2));
+ p0bits = port0rbits | INFINIPATH_I_GPIO;
+ }
+ else
+ p0bits = port0rbits;
+ ipath_write_kreg(dd, dd->ipath_kregs->kr_intclear, p0bits);
+ ipath_kreceive(dd);
+ if (oldhead != dd->ipath_port0head) {
+ ipath_stats.sps_fastrcvint++;
+ goto done;
+ }
+ istat = ipath_read_kreg32(dd, dd->ipath_kregs->kr_intstatus);
}

istat = ipath_read_kreg32(dd, dd->ipath_kregs->kr_intstatus);
@@ -749,31 +796,17 @@ irqreturn_t ipath_intr(int irq, void *da
goto bail;
}

- ipath_stats.sps_ints++;
-
- /*
- * this needs to be flags&initted, not statusp, so we keep
- * taking interrupts even after link goes down, etc.
- * Also, we *must* clear the interrupt at some point, or we won't
- * take it again, which can be real bad for errors, etc...
- */
-
- if (!(dd->ipath_flags & IPATH_INITTED)) {
- ipath_bad_intr(dd, &unexpected);
- ret = IRQ_NONE;
- goto bail;
- }
if (unexpected)
unexpected = 0;

- ipath_cdbg(VERBOSE, "intr stat=0x%x\n", istat);
-
- if (istat & ~infinipath_i_bitsextant)
+ if (unlikely(istat & ~infinipath_i_bitsextant))
ipath_dev_err(dd,
"interrupt with unknown interrupts %x set\n",
istat & (u32) ~ infinipath_i_bitsextant);
-
- if (istat & INFINIPATH_I_ERROR) {
+ else
+ ipath_cdbg(VERBOSE, "intr stat=0x%x\n", istat);
+
+ if (unlikely(istat & INFINIPATH_I_ERROR)) {
ipath_stats.sps_errints++;
estat = ipath_read_kreg64(dd,
dd->ipath_kregs->kr_errorstatus);
@@ -791,7 +824,14 @@ irqreturn_t ipath_intr(int irq, void *da
handle_errors(dd, estat);
}

+ p0bits = port0rbits;
if (istat & INFINIPATH_I_GPIO) {
+ /*
+ * Packets are available in the port 0 rcv queue.
+ * Eventually this needs to be generalized to check
+ * IPATH_GPIO_INTR, and the specific GPIO bit, if
+ * GPIO interrupts are used for anything else.
+ */
if (unlikely(!(dd->ipath_flags & IPATH_GPIO_INTR))) {
u32 gpiostatus;
gpiostatus = ipath_read_kreg32(
@@ -805,14 +845,7 @@ irqreturn_t ipath_intr(int irq, void *da
/* Clear GPIO status bit 2 */
ipath_write_kreg(dd, dd->ipath_kregs->kr_gpio_clear,
(u64) (1 << 2));
-
- /*
- * Packets are available in the port 0 rcv queue.
- * Eventually this needs to be generalized to check
- * IPATH_GPIO_INTR, and the specific GPIO bit, if
- * GPIO interrupts are used for anything else.
- */
- ipath_kreceive(dd);
+ p0bits |= INFINIPATH_I_GPIO;
}
}

@@ -825,6 +858,25 @@ irqreturn_t ipath_intr(int irq, void *da
*/
ipath_write_kreg(dd, dd->ipath_kregs->kr_intclear, istat);

+ /*
+ * we check for both transition from empty to non-empty, and urgent
+ * packets (those with the interrupt bit set in the header), and
+ * if enabled, the GPIO bit 2 interrupt used for port0 on some
+ * HT-400 boards.
+ * Do this before checking for pio buffers available, since
+ * receives can overflow; piobuf waiters can afford a few
+ * extra cycles, since they were waiting anyway.
+ */
+ if (istat & p0bits) {
+ ipath_kreceive(dd);
+ istat &= ~port0rbits;
+ }
+ if (istat & ((infinipath_i_rcvavail_mask <<
+ INFINIPATH_I_RCVAVAIL_SHIFT)
+ | (infinipath_i_rcvurg_mask <<
+ INFINIPATH_I_RCVURG_SHIFT)))
+ handle_urcv(dd, istat);
+
if (istat & INFINIPATH_I_SPIOBUFAVAIL) {
clear_bit(IPATH_S_PIOINTBUFAVAIL, &dd->ipath_sendctrl);
ipath_write_kreg(dd, dd->ipath_kregs->kr_sendctrl,
@@ -836,17 +888,7 @@ irqreturn_t ipath_intr(int irq, void *da
handle_layer_pioavail(dd);
}

- /*
- * we check for both transition from empty to non-empty, and urgent
- * packets (those with the interrupt bit set in the header)
- */
-
- if (istat & ((infinipath_i_rcvavail_mask <<
- INFINIPATH_I_RCVAVAIL_SHIFT)
- | (infinipath_i_rcvurg_mask <<
- INFINIPATH_I_RCVURG_SHIFT)))
- handle_rcv(dd, istat);
-
+done:
ret = IRQ_HANDLED;

bail:
diff -r cc6d7f2537b2 -r ab2b013f1f95 drivers/infiniband/hw/ipath/ipath_stats.c
--- a/drivers/infiniband/hw/ipath/ipath_stats.c Fri May 12 15:55:28 2006 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_stats.c Fri May 12 15:55:28 2006 -0700
@@ -185,7 +185,6 @@ static void ipath_qcheck(struct ipath_de
dd->ipath_port0head,
(unsigned long long)
ipath_stats.sps_port0pkts);
- ipath_kreceive(dd);
}
dd->ipath_lastport0rcv_cnt = ipath_stats.sps_port0pkts;
}

2006-05-12 23:45:10

by Bryan O'Sullivan

[permalink] [raw]
Subject: [PATCH 8 of 53] ipath - cap number of CQEs

Cap the number of CQEs per CQ. This is not a real limitation for us,
since we are constrained only by memory, but the verbs code expects a
limit to be reported and enforced. The cap is exposed as the max_cqe
module parameter, writable at runtime.

Signed-off-by: Bryan O'Sullivan <[email protected]>

diff -r e823378bd19c -r 1d3e85454b53 drivers/infiniband/hw/ipath/ipath_cq.c
--- a/drivers/infiniband/hw/ipath/ipath_cq.c Fri May 12 15:55:27 2006 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_cq.c Fri May 12 15:55:27 2006 -0700
@@ -161,6 +161,11 @@ struct ib_cq *ipath_create_cq(struct ib_
struct ib_wc *wc;
struct ib_cq *ret;

+ if (entries > ib_ipath_max_cqe) {
+ ret = ERR_PTR(-EINVAL);
+ goto bail;
+ }
+
/*
* Need to use vmalloc() if we want to support large #s of
* entries.
diff -r e823378bd19c -r 1d3e85454b53 drivers/infiniband/hw/ipath/ipath_verbs.c
--- a/drivers/infiniband/hw/ipath/ipath_verbs.c Fri May 12 15:55:27 2006 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_verbs.c Fri May 12 15:55:27 2006 -0700
@@ -64,6 +64,11 @@ module_param_named(max_ahs, ib_ipath_max
module_param_named(max_ahs, ib_ipath_max_ahs, uint, S_IWUSR | S_IRUGO);
MODULE_PARM_DESC(max_ahs,
"Maximum number of address handles to support");
+
+unsigned int ib_ipath_max_cqe = 0xFFFF;
+module_param_named(max_cqe, ib_ipath_max_cqe, uint, S_IWUSR | S_IRUGO);
+MODULE_PARM_DESC(max_cqe,
+ "Maximum number of completion queue entries to support");

MODULE_LICENSE("GPL");
MODULE_AUTHOR("PathScale <[email protected]>");
@@ -598,7 +603,7 @@ static int ipath_query_device(struct ib_
props->max_sge = 255;
props->max_cq = 0xffff;
props->max_ah = ib_ipath_max_ahs;
- props->max_cqe = 0xffff;
+ props->max_cqe = ib_ipath_max_cqe;
props->max_mr = dev->lk_table.max;
props->max_pd = ib_ipath_max_pds;
props->max_qp_rd_atom = 1;
diff -r e823378bd19c -r 1d3e85454b53 drivers/infiniband/hw/ipath/ipath_verbs.h
--- a/drivers/infiniband/hw/ipath/ipath_verbs.h Fri May 12 15:55:27 2006 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_verbs.h Fri May 12 15:55:27 2006 -0700
@@ -690,6 +690,8 @@ extern const int ib_ipath_state_ops[];

extern unsigned int ib_ipath_lkey_table_size;

+extern unsigned int ib_ipath_max_cqe;
+
extern const u32 ib_ipath_rnr_table[];

#endif /* IPATH_VERBS_H */

2006-05-12 23:45:11

by Bryan O'Sullivan

[permalink] [raw]
Subject: [PATCH 13 of 53] ipath - limit number of SGEs and WRs per QP

We can't create more than a certain number of SGEs or WRs per QP.

Signed-off-by: Bryan O'Sullivan <[email protected]>

diff -r ab2b013f1f95 -r 02a05b853d20 drivers/infiniband/hw/ipath/ipath_cq.c
--- a/drivers/infiniband/hw/ipath/ipath_cq.c Fri May 12 15:55:28 2006 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_cq.c Fri May 12 15:55:28 2006 -0700
@@ -162,7 +162,7 @@ struct ib_cq *ipath_create_cq(struct ib_
struct ib_wc *wc;
struct ib_cq *ret;

- if (entries > ib_ipath_max_cqe) {
+ if (entries > ib_ipath_max_cqes) {
ret = ERR_PTR(-EINVAL);
goto bail;
}
diff -r ab2b013f1f95 -r 02a05b853d20 drivers/infiniband/hw/ipath/ipath_qp.c
--- a/drivers/infiniband/hw/ipath/ipath_qp.c Fri May 12 15:55:28 2006 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_qp.c Fri May 12 15:55:28 2006 -0700
@@ -663,8 +663,10 @@ struct ib_qp *ipath_create_qp(struct ib_
size_t sz;
struct ib_qp *ret;

- if (init_attr->cap.max_send_sge > 255 ||
- init_attr->cap.max_recv_sge > 255) {
+ if (init_attr->cap.max_send_sge > ib_ipath_max_sges ||
+ init_attr->cap.max_recv_sge > ib_ipath_max_sges ||
+ init_attr->cap.max_send_wr > ib_ipath_max_qp_wrs ||
+ init_attr->cap.max_recv_wr > ib_ipath_max_qp_wrs) {
ret = ERR_PTR(-ENOMEM);
goto bail;
}
diff -r ab2b013f1f95 -r 02a05b853d20 drivers/infiniband/hw/ipath/ipath_verbs.c
--- a/drivers/infiniband/hw/ipath/ipath_verbs.c Fri May 12 15:55:28 2006 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_verbs.c Fri May 12 15:55:28 2006 -0700
@@ -62,18 +62,25 @@ MODULE_PARM_DESC(max_pds,

static unsigned int ib_ipath_max_ahs = 0xFFFF;
module_param_named(max_ahs, ib_ipath_max_ahs, uint, S_IWUSR | S_IRUGO);
-MODULE_PARM_DESC(max_ahs,
- "Maximum number of address handles to support");
-
-unsigned int ib_ipath_max_cqe = 0xFFFF;
-module_param_named(max_cqe, ib_ipath_max_cqe, uint, S_IWUSR | S_IRUGO);
-MODULE_PARM_DESC(max_cqe,
+MODULE_PARM_DESC(max_ahs, "Maximum number of address handles to support");
+
+unsigned int ib_ipath_max_cqes = 0xFFFF;
+module_param_named(max_cqes, ib_ipath_max_cqes, uint, S_IWUSR | S_IRUGO);
+MODULE_PARM_DESC(max_cqes,
"Maximum number of completion queue entries to support");

unsigned int ib_ipath_max_cqs = 0xFFFF;
module_param_named(max_cqs, ib_ipath_max_cqs, uint, S_IWUSR | S_IRUGO);
-MODULE_PARM_DESC(max_cqs,
- "Maximum number of completion queues to support");
+MODULE_PARM_DESC(max_cqs, "Maximum number of completion queues to support");
+
+unsigned int ib_ipath_max_qp_wrs = 255;
+module_param_named(max_qp_wrs, ib_ipath_max_qp_wrs, uint,
+ S_IWUSR | S_IRUGO);
+MODULE_PARM_DESC(max_qp_wrs, "Maximum number of QP WRs to support");
+
+unsigned int ib_ipath_max_sges = 255;
+module_param_named(max_sges, ib_ipath_max_sges, uint, S_IWUSR | S_IRUGO);
+MODULE_PARM_DESC(max_sges, "Maximum number of SGEs to support");

MODULE_LICENSE("GPL");
MODULE_AUTHOR("PathScale <[email protected]>");
@@ -604,11 +611,11 @@ static int ipath_query_device(struct ib_

props->max_mr_size = ~0ull;
props->max_qp = dev->qp_table.max;
- props->max_qp_wr = 0xffff;
- props->max_sge = 255;
+ props->max_qp_wr = ib_ipath_max_qp_wrs;
+ props->max_sge = ib_ipath_max_sges;
props->max_cq = ib_ipath_max_cqs;
props->max_ah = ib_ipath_max_ahs;
- props->max_cqe = ib_ipath_max_cqe;
+ props->max_cqe = ib_ipath_max_cqes;
props->max_mr = dev->lk_table.max;
props->max_pd = ib_ipath_max_pds;
props->max_qp_rd_atom = 1;
diff -r ab2b013f1f95 -r 02a05b853d20 drivers/infiniband/hw/ipath/ipath_verbs.h
--- a/drivers/infiniband/hw/ipath/ipath_verbs.h Fri May 12 15:55:28 2006 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_verbs.h Fri May 12 15:55:28 2006 -0700
@@ -691,10 +691,14 @@ extern const int ib_ipath_state_ops[];

extern unsigned int ib_ipath_lkey_table_size;

-extern unsigned int ib_ipath_max_cqe;
+extern unsigned int ib_ipath_max_cqes;

extern unsigned int ib_ipath_max_cqs;

+extern unsigned int ib_ipath_max_qp_wrs;
+
+extern unsigned int ib_ipath_max_sges;
+
extern const u32 ib_ipath_rnr_table[];

#endif /* IPATH_VERBS_H */

2006-05-12 23:45:10

by Bryan O'Sullivan

[permalink] [raw]
Subject: [PATCH 6 of 53] ipath - forbid creation of AH with DLID of 0

Don't allow an AH to be created with a DLID of 0.

Signed-off-by: Bryan O'Sullivan <[email protected]>

diff -r db56c0ab6a64 -r def81ab50644 drivers/infiniband/hw/ipath/ipath_verbs.c
--- a/drivers/infiniband/hw/ipath/ipath_verbs.c Fri May 12 15:55:27 2006 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_verbs.c Fri May 12 15:55:27 2006 -0700
@@ -810,6 +810,11 @@ static struct ib_ah *ipath_create_ah(str
if (ah_attr->dlid >= IPS_MULTICAST_LID_BASE &&
ah_attr->dlid != IPS_PERMISSIVE_LID &&
!(ah_attr->ah_flags & IB_AH_GRH)) {
+ ret = ERR_PTR(-EINVAL);
+ goto bail;
+ }
+
+ if (ah_attr->dlid == 0) {
ret = ERR_PTR(-EINVAL);
goto bail;
}

2006-05-12 23:45:09

by Bryan O'Sullivan

[permalink] [raw]
Subject: [PATCH 5 of 53] ipath - forbid creation of AHs with illegal ports

Don't allow an AH to be created with an illegal port.

Signed-off-by: Bryan O'Sullivan <[email protected]>

diff -r 300f0aa6f034 -r db56c0ab6a64 drivers/infiniband/hw/ipath/ipath_verbs.c
--- a/drivers/infiniband/hw/ipath/ipath_verbs.c Fri May 12 15:55:27 2006 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_verbs.c Fri May 12 15:55:27 2006 -0700
@@ -810,6 +810,12 @@ static struct ib_ah *ipath_create_ah(str
if (ah_attr->dlid >= IPS_MULTICAST_LID_BASE &&
ah_attr->dlid != IPS_PERMISSIVE_LID &&
!(ah_attr->ah_flags & IB_AH_GRH)) {
+ ret = ERR_PTR(-EINVAL);
+ goto bail;
+ }
+
+ if (ah_attr->port_num < 1 ||
+ ah_attr->port_num > pd->device->phys_port_cnt) {
ret = ERR_PTR(-EINVAL);
goto bail;
}

2006-05-12 23:50:05

by Bryan O'Sullivan

[permalink] [raw]
Subject: [PATCH 38 of 53] ipath - SRQ compliance checks

We were not rigorous enough in checking SRQs: cap the number of SRQs
allocated per device, reject work queues with zero WRs, and validate
the WR and SGE limits in both ipath_create_srq() and ipath_modify_srq()
before making any changes.

Signed-off-by: Bryan O'Sullivan <[email protected]>

diff -r f8debae94d44 -r e9306861dc6a drivers/infiniband/hw/ipath/ipath_srq.c
--- a/drivers/infiniband/hw/ipath/ipath_srq.c Fri May 12 15:55:28 2006 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_srq.c Fri May 12 15:55:28 2006 -0700
@@ -125,11 +125,23 @@ struct ib_srq *ipath_create_srq(struct i
struct ib_srq_init_attr *srq_init_attr,
struct ib_udata *udata)
{
+ struct ipath_ibdev *dev = to_idev(ibpd->device);
struct ipath_srq *srq;
u32 sz;
struct ib_srq *ret;

- if (srq_init_attr->attr.max_sge < 1) {
+ if (dev->n_srqs_allocated == ib_ipath_max_srqs) {
+ ret = ERR_PTR(-ENOMEM);
+ goto bail;
+ }
+
+ if (srq_init_attr->attr.max_wr == 0) {
+ ret = ERR_PTR(-EINVAL);
+ goto bail;
+ }
+
+ if ((srq_init_attr->attr.max_sge > ib_ipath_max_srq_sges) ||
+ (srq_init_attr->attr.max_wr > ib_ipath_max_srq_wrs)) {
ret = ERR_PTR(-EINVAL);
goto bail;
}
@@ -164,6 +176,8 @@ struct ib_srq *ipath_create_srq(struct i

ret = &srq->ibsrq;

+ dev->n_srqs_allocated++;
+
bail:
return ret;
}
@@ -181,24 +195,26 @@ int ipath_modify_srq(struct ib_srq *ibsr
unsigned long flags;
int ret;

- if (attr_mask & IB_SRQ_LIMIT) {
- spin_lock_irqsave(&srq->rq.lock, flags);
- srq->limit = attr->srq_limit;
- spin_unlock_irqrestore(&srq->rq.lock, flags);
- }
+ if (attr_mask & IB_SRQ_MAX_WR)
+ if ((attr->max_wr > ib_ipath_max_srq_wrs) ||
+ (attr->max_sge > srq->rq.max_sge)) {
+ ret = -EINVAL;
+ goto bail;
+ }
+
+ if (attr_mask & IB_SRQ_LIMIT)
+ if (attr->srq_limit >= srq->rq.size) {
+ ret = -EINVAL;
+ goto bail;
+ }
+
if (attr_mask & IB_SRQ_MAX_WR) {
- u32 size = attr->max_wr + 1;
struct ipath_rwqe *wq, *p;
- u32 n;
- u32 sz;
-
- if (attr->max_sge < srq->rq.max_sge) {
- ret = -EINVAL;
- goto bail;
- }
+ u32 sz, size, n;

sz = sizeof(struct ipath_rwqe) +
attr->max_sge * sizeof(struct ipath_sge);
+ size = attr->max_wr + 1;
wq = vmalloc(size * sz);
if (!wq) {
ret = -ENOMEM;
@@ -242,6 +258,11 @@ int ipath_modify_srq(struct ib_srq *ibsr
spin_unlock_irqrestore(&srq->rq.lock, flags);
}

+ if (attr_mask & IB_SRQ_LIMIT) {
+ spin_lock_irqsave(&srq->rq.lock, flags);
+ srq->limit = attr->srq_limit;
+ spin_unlock_irqrestore(&srq->rq.lock, flags);
+ }
ret = 0;

bail:
@@ -265,7 +286,9 @@ int ipath_destroy_srq(struct ib_srq *ibs
int ipath_destroy_srq(struct ib_srq *ibsrq)
{
struct ipath_srq *srq = to_isrq(ibsrq);
-
+ struct ipath_ibdev *dev = to_idev(ibsrq->device);
+
+ dev->n_srqs_allocated--;
vfree(srq->rq.wq);
kfree(srq);

diff -r f8debae94d44 -r e9306861dc6a drivers/infiniband/hw/ipath/ipath_verbs.h
--- a/drivers/infiniband/hw/ipath/ipath_verbs.h Fri May 12 15:55:28 2006 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_verbs.h Fri May 12 15:55:28 2006 -0700
@@ -436,6 +436,7 @@ struct ipath_ibdev {
u32 n_pds_allocated; /* number of PDs allocated for device */
u32 n_ahs_allocated; /* number of AHs allocated for device */
u32 n_cqs_allocated; /* number of CQs allocated for device */
+ u32 n_srqs_allocated; /* number of SRQs allocated for device */
u32 n_mcast_grps_allocated; /* number of mcast groups allocated */
u64 ipath_sword; /* total dwords sent (sample result) */
u64 ipath_rword; /* total dwords received (sample result) */

2006-05-12 23:45:09

by Bryan O'Sullivan

[permalink] [raw]
Subject: [PATCH 52 of 53] ipath - register as IB device owner
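
Setting dev->owner lets the IB core take a reference on this module
while userspace has the device open, preventing the module from being
unloaded while it is in use.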

Signed-off-by: Bryan O'Sullivan <[email protected]>

diff -r 5f665c503f0d -r fd9bdeea5b10 drivers/infiniband/hw/ipath/ipath_verbs.c
--- a/drivers/infiniband/hw/ipath/ipath_verbs.c Fri May 12 16:41:45 2006 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_verbs.c Fri May 12 16:42:39 2006 -0700
@@ -1060,6 +1060,7 @@ static void *ipath_register_ib_device(in
idev->dd = dd;

strlcpy(dev->name, "ipath%d", IB_DEVICE_NAME_MAX);
+ dev->owner = THIS_MODULE;
dev->node_guid = ipath_layer_get_guid(dd);
dev->uverbs_abi_ver = IPATH_UVERBS_ABI_VERSION;
dev->uverbs_cmd_mask =

2006-05-12 23:45:09

by Bryan O'Sullivan

[permalink] [raw]
Subject: [PATCH 2 of 53] ipath - purge sps_lid and sps_mlid arrays, and /sys entries

The two arrays only had space for 4 units, so they didn't work for larger
numbers of units. I thought I'd eliminated these before submitting the
original driver patches.

Also fixed error return on ipath_sysfs_unit_write to not set an error
code if the sysfs code reports consuming more chars than we wrote (since
that can include the nul, and the user doesn't have to include the nul
in the write).

Also changed from ipath_set_sps_lid() to ipath_set_lid(); the sps
was a leftover piece of naming.

Signed-off-by: Bryan O'Sullivan <[email protected]>

diff -r 9b9f24aab350 -r 3ab7a7b10bf2 drivers/infiniband/hw/ipath/ipath_common.h
--- a/drivers/infiniband/hw/ipath/ipath_common.h Fri May 12 15:55:27 2006 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_common.h Fri May 12 15:55:27 2006 -0700
@@ -121,8 +121,7 @@ struct infinipath_stats {
__u64 sps_ports;
/* list of pkeys (other than default) accepted (0 means not set) */
__u16 sps_pkeys[4];
- /* lids for up to 4 infinipaths, indexed by infinipath # */
- __u16 sps_lid[4];
+ __u16 sps_unused16[4]; /* available; maintaining compatible layout */
/* number of user ports per chip (not IB ports) */
__u32 sps_nports;
/* not our interrupt, or already handled */
@@ -140,10 +139,8 @@ struct infinipath_stats {
* packets if ipath not configured, sma/mad, etc.)
*/
__u64 sps_krdrops;
- /* mlids for up to 4 infinipaths, indexed by infinipath # */
- __u16 sps_mlid[4];
/* pad for future growth */
- __u64 __sps_pad[45];
+ __u64 __sps_pad[46];
};

/*
diff -r 9b9f24aab350 -r 3ab7a7b10bf2 drivers/infiniband/hw/ipath/ipath_init_chip.c
--- a/drivers/infiniband/hw/ipath/ipath_init_chip.c Fri May 12 15:55:27 2006 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_init_chip.c Fri May 12 15:55:27 2006 -0700
@@ -836,8 +836,6 @@ int ipath_init_chip(struct ipath_devdata
/* clear any interrupts up to this point (ints still not enabled) */
ipath_write_kreg(dd, dd->ipath_kregs->kr_intclear, -1LL);

- ipath_stats.sps_lid[dd->ipath_unit] = dd->ipath_lid;
-
/*
* Set up the port 0 (kernel) rcvhdr q and egr TIDs. If doing
* re-init, the simplest way to handle this is to free
diff -r 9b9f24aab350 -r 3ab7a7b10bf2 drivers/infiniband/hw/ipath/ipath_layer.c
--- a/drivers/infiniband/hw/ipath/ipath_layer.c Fri May 12 15:55:27 2006 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_layer.c Fri May 12 15:55:27 2006 -0700
@@ -299,9 +299,8 @@ bail:

EXPORT_SYMBOL_GPL(ipath_layer_set_mtu);

-int ipath_set_sps_lid(struct ipath_devdata *dd, u32 arg, u8 lmc)
-{
- ipath_stats.sps_lid[dd->ipath_unit] = arg;
+int ipath_set_lid(struct ipath_devdata *dd, u32 arg, u8 lmc)
+{
dd->ipath_lid = arg;
dd->ipath_lmc = lmc;

@@ -315,7 +314,7 @@ int ipath_set_sps_lid(struct ipath_devda
return 0;
}

-EXPORT_SYMBOL_GPL(ipath_set_sps_lid);
+EXPORT_SYMBOL_GPL(ipath_set_lid);

int ipath_layer_set_guid(struct ipath_devdata *dd, __be64 guid)
{
@@ -616,9 +615,9 @@ int ipath_layer_open(struct ipath_devdat

if (*dd->ipath_statusp & IPATH_STATUS_IB_READY)
intval |= IPATH_LAYER_INT_IF_UP;
- if (ipath_stats.sps_lid[dd->ipath_unit])
+ if (dd->ipath_lid)
intval |= IPATH_LAYER_INT_LID;
- if (ipath_stats.sps_mlid[dd->ipath_unit])
+ if (dd->ipath_mlid)
intval |= IPATH_LAYER_INT_BCAST;
/*
* do this on open, in case low level is already up and
diff -r 9b9f24aab350 -r 3ab7a7b10bf2 drivers/infiniband/hw/ipath/ipath_layer.h
--- a/drivers/infiniband/hw/ipath/ipath_layer.h Fri May 12 15:55:27 2006 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_layer.h Fri May 12 15:55:27 2006 -0700
@@ -126,7 +126,7 @@ u32 ipath_layer_get_cr_errpkey(struct ip
u32 ipath_layer_get_cr_errpkey(struct ipath_devdata *dd);
int ipath_layer_set_linkstate(struct ipath_devdata *dd, u8 state);
int ipath_layer_set_mtu(struct ipath_devdata *, u16);
-int ipath_set_sps_lid(struct ipath_devdata *, u32, u8);
+int ipath_set_lid(struct ipath_devdata *, u32, u8);
int ipath_layer_send_hdr(struct ipath_devdata *dd,
struct ether_header *hdr);
int ipath_verbs_send(struct ipath_devdata *dd, u32 hdrwords,
diff -r 9b9f24aab350 -r 3ab7a7b10bf2 drivers/infiniband/hw/ipath/ipath_mad.c
--- a/drivers/infiniband/hw/ipath/ipath_mad.c Fri May 12 15:55:27 2006 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_mad.c Fri May 12 15:55:27 2006 -0700
@@ -341,7 +341,7 @@ static int recv_subn_set_portinfo(struct
/* Must be a valid unicast LID address. */
if (lid == 0 || lid >= IPS_MULTICAST_LID_BASE)
goto err;
- ipath_set_sps_lid(dev->dd, lid, pip->mkeyprot_resv_lmc & 7);
+ ipath_set_lid(dev->dd, lid, pip->mkeyprot_resv_lmc & 7);
event.event = IB_EVENT_LID_CHANGE;
ib_dispatch_event(&event);
}
diff -r 9b9f24aab350 -r 3ab7a7b10bf2 drivers/infiniband/hw/ipath/ipath_sysfs.c
--- a/drivers/infiniband/hw/ipath/ipath_sysfs.c Fri May 12 15:55:27 2006 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_sysfs.c Fri May 12 15:55:27 2006 -0700
@@ -84,98 +84,6 @@ static ssize_t show_num_units(struct dev
ipath_count_units(NULL, NULL, NULL));
}

-#define DRIVER_STAT(name, attr) \
- static ssize_t show_stat_##name(struct device_driver *dev, \
- char *buf) \
- { \
- return scnprintf( \
- buf, PAGE_SIZE, "%llu\n", \
- (unsigned long long) ipath_stats.sps_ ##attr); \
- } \
- static DRIVER_ATTR(name, S_IRUGO, show_stat_##name, NULL)
-
-DRIVER_STAT(intrs, ints);
-DRIVER_STAT(err_intrs, errints);
-DRIVER_STAT(errs, errs);
-DRIVER_STAT(pkt_errs, pkterrs);
-DRIVER_STAT(crc_errs, crcerrs);
-DRIVER_STAT(hw_errs, hwerrs);
-DRIVER_STAT(ib_link, iblink);
-DRIVER_STAT(port0_pkts, port0pkts);
-DRIVER_STAT(ether_spkts, ether_spkts);
-DRIVER_STAT(ether_rpkts, ether_rpkts);
-DRIVER_STAT(sma_spkts, sma_spkts);
-DRIVER_STAT(sma_rpkts, sma_rpkts);
-DRIVER_STAT(hdrq_full, hdrqfull);
-DRIVER_STAT(etid_full, etidfull);
-DRIVER_STAT(no_piobufs, nopiobufs);
-DRIVER_STAT(ports, ports);
-DRIVER_STAT(pkey0, pkeys[0]);
-DRIVER_STAT(pkey1, pkeys[1]);
-DRIVER_STAT(pkey2, pkeys[2]);
-DRIVER_STAT(pkey3, pkeys[3]);
-/* XXX fix the following when dynamic table of devices used */
-DRIVER_STAT(lid0, lid[0]);
-DRIVER_STAT(lid1, lid[1]);
-DRIVER_STAT(lid2, lid[2]);
-DRIVER_STAT(lid3, lid[3]);
-
-DRIVER_STAT(nports, nports);
-DRIVER_STAT(null_intr, nullintr);
-DRIVER_STAT(max_pkts_call, maxpkts_call);
-DRIVER_STAT(avg_pkts_call, avgpkts_call);
-DRIVER_STAT(page_locks, pagelocks);
-DRIVER_STAT(page_unlocks, pageunlocks);
-DRIVER_STAT(krdrops, krdrops);
-/* XXX fix the following when dynamic table of devices used */
-DRIVER_STAT(mlid0, mlid[0]);
-DRIVER_STAT(mlid1, mlid[1]);
-DRIVER_STAT(mlid2, mlid[2]);
-DRIVER_STAT(mlid3, mlid[3]);
-
-static struct attribute *driver_stat_attributes[] = {
- &driver_attr_intrs.attr,
- &driver_attr_err_intrs.attr,
- &driver_attr_errs.attr,
- &driver_attr_pkt_errs.attr,
- &driver_attr_crc_errs.attr,
- &driver_attr_hw_errs.attr,
- &driver_attr_ib_link.attr,
- &driver_attr_port0_pkts.attr,
- &driver_attr_ether_spkts.attr,
- &driver_attr_ether_rpkts.attr,
- &driver_attr_sma_spkts.attr,
- &driver_attr_sma_rpkts.attr,
- &driver_attr_hdrq_full.attr,
- &driver_attr_etid_full.attr,
- &driver_attr_no_piobufs.attr,
- &driver_attr_ports.attr,
- &driver_attr_pkey0.attr,
- &driver_attr_pkey1.attr,
- &driver_attr_pkey2.attr,
- &driver_attr_pkey3.attr,
- &driver_attr_lid0.attr,
- &driver_attr_lid1.attr,
- &driver_attr_lid2.attr,
- &driver_attr_lid3.attr,
- &driver_attr_nports.attr,
- &driver_attr_null_intr.attr,
- &driver_attr_max_pkts_call.attr,
- &driver_attr_avg_pkts_call.attr,
- &driver_attr_page_locks.attr,
- &driver_attr_page_unlocks.attr,
- &driver_attr_krdrops.attr,
- &driver_attr_mlid0.attr,
- &driver_attr_mlid1.attr,
- &driver_attr_mlid2.attr,
- &driver_attr_mlid3.attr,
- NULL
-};
-
-static struct attribute_group driver_stat_attr_group = {
- .name = "stats",
- .attrs = driver_stat_attributes
-};

static ssize_t show_status(struct device *dev,
struct device_attribute *attr,
@@ -272,7 +180,7 @@ static ssize_t store_lid(struct device *
size_t count)
{
struct ipath_devdata *dd = dev_get_drvdata(dev);
- u16 lid;
+ u16 lid = 0; /* gcc thinks it might be used uninitialized */
int ret;

ret = ipath_parse_ushort(buf, &lid);
@@ -284,11 +192,11 @@ static ssize_t store_lid(struct device *
goto invalid;
}

- ipath_set_sps_lid(dd, lid, 0);
+ ipath_set_lid(dd, lid, 0);

goto bail;
invalid:
- ipath_dev_err(dd, "attempt to set invalid LID\n");
+ ipath_dev_err(dd, "attempt to set invalid LID 0x%x\n", lid);
bail:
return ret;
}
@@ -319,7 +227,6 @@ static ssize_t store_mlid(struct device
unit = dd->ipath_unit;

dd->ipath_mlid = mlid;
- ipath_stats.sps_mlid[unit] = mlid;
ipath_layer_intr(dd, IPATH_LAYER_INT_BCAST);

goto bail;
@@ -737,17 +644,12 @@ int ipath_driver_create_group(struct dev
if (ret)
goto bail;

- ret = sysfs_create_group(&drv->kobj, &driver_stat_attr_group);
- if (ret)
- sysfs_remove_group(&drv->kobj, &driver_attr_group);
-
bail:
return ret;
}

void ipath_driver_remove_group(struct device_driver *drv)
{
- sysfs_remove_group(&drv->kobj, &driver_stat_attr_group);
sysfs_remove_group(&drv->kobj, &driver_attr_group);
}

2006-05-12 23:45:09

by Bryan O'Sullivan

[permalink] [raw]
Subject: [PATCH 51 of 53] ipath - fix reporting of vendor ID and a few other trivial bits

Signed-off-by: Bryan O'Sullivan <[email protected]>

diff -r bd1de2e983db -r 5f665c503f0d drivers/infiniband/hw/ipath/ipath_layer.c
--- a/drivers/infiniband/hw/ipath/ipath_layer.c Fri May 12 15:55:29 2006 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_layer.c Fri May 12 16:41:45 2006 -0700
@@ -339,18 +339,26 @@ u32 ipath_layer_get_nguid(struct ipath_d

EXPORT_SYMBOL_GPL(ipath_layer_get_nguid);

-int ipath_layer_query_device(struct ipath_devdata *dd, u32 * vendor,
- u32 * boardrev, u32 * majrev, u32 * minrev)
-{
- *vendor = dd->ipath_vendorid;
- *boardrev = dd->ipath_boardrev;
- *majrev = dd->ipath_majrev;
- *minrev = dd->ipath_minrev;
-
- return 0;
-}
-
-EXPORT_SYMBOL_GPL(ipath_layer_query_device);
+u32 ipath_layer_get_majrev(struct ipath_devdata *dd)
+{
+ return dd->ipath_majrev;
+}
+
+EXPORT_SYMBOL_GPL(ipath_layer_get_majrev);
+
+u32 ipath_layer_get_minrev(struct ipath_devdata *dd)
+{
+ return dd->ipath_minrev;
+}
+
+EXPORT_SYMBOL_GPL(ipath_layer_get_minrev);
+
+u32 ipath_layer_get_pcirev(struct ipath_devdata *dd)
+{
+ return dd->ipath_pcirev;
+}
+
+EXPORT_SYMBOL_GPL(ipath_layer_get_pcirev);

u32 ipath_layer_get_flags(struct ipath_devdata *dd)
{
@@ -372,6 +380,13 @@ u16 ipath_layer_get_deviceid(struct ipat
}

EXPORT_SYMBOL_GPL(ipath_layer_get_deviceid);
+
+u32 ipath_layer_get_vendorid(struct ipath_devdata *dd)
+{
+ return dd->ipath_vendorid;
+}
+
+EXPORT_SYMBOL_GPL(ipath_layer_get_vendorid);

u64 ipath_layer_get_lastibcstat(struct ipath_devdata *dd)
{
diff -r bd1de2e983db -r 5f665c503f0d drivers/infiniband/hw/ipath/ipath_layer.h
--- a/drivers/infiniband/hw/ipath/ipath_layer.h Fri May 12 15:55:29 2006 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_layer.h Fri May 12 16:41:45 2006 -0700
@@ -145,11 +145,13 @@ int ipath_layer_set_guid(struct ipath_de
int ipath_layer_set_guid(struct ipath_devdata *, __be64 guid);
__be64 ipath_layer_get_guid(struct ipath_devdata *);
u32 ipath_layer_get_nguid(struct ipath_devdata *);
-int ipath_layer_query_device(struct ipath_devdata *, u32 * vendor,
- u32 * boardrev, u32 * majrev, u32 * minrev);
+u32 ipath_layer_get_majrev(struct ipath_devdata *);
+u32 ipath_layer_get_minrev(struct ipath_devdata *);
+u32 ipath_layer_get_pcirev(struct ipath_devdata *);
u32 ipath_layer_get_flags(struct ipath_devdata *dd);
struct device *ipath_layer_get_device(struct ipath_devdata *dd);
u16 ipath_layer_get_deviceid(struct ipath_devdata *dd);
+u32 ipath_layer_get_vendorid(struct ipath_devdata *);
u64 ipath_layer_get_lastibcstat(struct ipath_devdata *dd);
u32 ipath_layer_get_ibmtu(struct ipath_devdata *dd);
int ipath_layer_enable_timer(struct ipath_devdata *dd);
diff -r bd1de2e983db -r 5f665c503f0d drivers/infiniband/hw/ipath/ipath_mad.c
--- a/drivers/infiniband/hw/ipath/ipath_mad.c Fri May 12 15:55:29 2006 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_mad.c Fri May 12 16:41:45 2006 -0700
@@ -84,7 +84,7 @@ static int recv_subn_get_nodeinfo(struct
{
struct nodeinfo *nip = (struct nodeinfo *)&smp->data;
struct ipath_devdata *dd = to_idev(ibdev)->dd;
- u32 vendor, boardid, majrev, minrev;
+ u32 vendor, majrev, minrev;

if (smp->attr_mod)
smp->status |= IB_SMP_INVALID_FIELD;
@@ -104,9 +104,11 @@ static int recv_subn_get_nodeinfo(struct
nip->port_guid = nip->sys_guid;
nip->partition_cap = cpu_to_be16(ipath_layer_get_npkeys(dd));
nip->device_id = cpu_to_be16(ipath_layer_get_deviceid(dd));
- ipath_layer_query_device(dd, &vendor, &boardid, &majrev, &minrev);
+ majrev = ipath_layer_get_majrev(dd);
+ minrev = ipath_layer_get_minrev(dd);
nip->revision = cpu_to_be32((majrev << 16) | minrev);
nip->local_port_num = port;
+ vendor = ipath_layer_get_vendorid(dd);
nip->vendor_id[0] = 0;
nip->vendor_id[1] = vendor >> 8;
nip->vendor_id[2] = vendor;
diff -r bd1de2e983db -r 5f665c503f0d drivers/infiniband/hw/ipath/ipath_verbs.c
--- a/drivers/infiniband/hw/ipath/ipath_verbs.c Fri May 12 15:55:29 2006 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_verbs.c Fri May 12 16:41:45 2006 -0700
@@ -620,18 +620,15 @@ static int ipath_query_device(struct ib_
struct ib_device_attr *props)
{
struct ipath_ibdev *dev = to_idev(ibdev);
- u32 vendor, boardrev, majrev, minrev;

memset(props, 0, sizeof(*props));

props->device_cap_flags = IB_DEVICE_BAD_PKEY_CNTR |
IB_DEVICE_BAD_QKEY_CNTR | IB_DEVICE_SHUTDOWN_PORT |
IB_DEVICE_SYS_IMAGE_GUID;
- ipath_layer_query_device(dev->dd, &vendor, &boardrev,
- &majrev, &minrev);
- props->vendor_id = vendor;
- props->vendor_part_id = boardrev;
- props->hw_ver = boardrev << 16 | majrev << 8 | minrev;
+ props->vendor_id = ipath_layer_get_vendorid(dev->dd);
+ props->vendor_part_id = ipath_layer_get_deviceid(dev->dd);
+ props->hw_ver = ipath_layer_get_pcirev(dev->dd);

props->sys_image_guid = dev->sys_image_guid;

@@ -1220,11 +1217,8 @@ static ssize_t show_rev(struct class_dev
{
struct ipath_ibdev *dev =
container_of(cdev, struct ipath_ibdev, ibdev.class_dev);
- int vendor, boardrev, majrev, minrev;
-
- ipath_layer_query_device(dev->dd, &vendor, &boardrev,
- &majrev, &minrev);
- return sprintf(buf, "%d.%d\n", majrev, minrev);
+
+ return sprintf(buf, "%x\n", ipath_layer_get_pcirev(dev->dd));
}

static ssize_t show_hca(struct class_device *cdev, char *buf)
@@ -1253,7 +1247,7 @@ static ssize_t show_stats(struct class_d
len = sprintf(buf,
"RC resends %d\n"
"RC no QACK %d\n"
- "RC ACKs %d\n"
+ "RC ACKs %d\n"
"RC SEQ NAKs %d\n"
"RC RDMA seq %d\n"
"RC RNR NAKs %d\n"
@@ -1263,7 +1257,7 @@ static ssize_t show_stats(struct class_d
"piobuf wait %d\n"
"no piobuf %d\n"
"PKT drops %d\n"
- "WQE errs %d\n",
+ "WQE errs %d\n",
dev->n_rc_resends, dev->n_rc_qacks, dev->n_rc_acks,
dev->n_seq_naks, dev->n_rdma_seq, dev->n_rnr_naks,
dev->n_other_naks, dev->n_timeouts,

2006-05-12 23:45:08

by Bryan O'Sullivan

[permalink] [raw]
Subject: [PATCH 53 of 53] ipath - add memory barrier when waiting for writes
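
Make sure any buffered MMIO writes have actually been issued to the
chip before the flushing read of the scratch register; otherwise the
read can complete while writes are still sitting in the CPU's write
buffers.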

Signed-off-by: Bryan O'Sullivan <[email protected]>

diff -r fd9bdeea5b10 -r f8ebb8c1e436 drivers/infiniband/hw/ipath/ipath_eeprom.c
--- a/drivers/infiniband/hw/ipath/ipath_eeprom.c Fri May 12 16:42:39 2006 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_eeprom.c Fri May 12 16:42:39 2006 -0700
@@ -185,6 +185,7 @@ bail:
*/
static void i2c_wait_for_writes(struct ipath_devdata *dd)
{
+ mb();
(void)ipath_read_kreg32(dd, dd->ipath_kregs->kr_scratch);
}

2006-05-12 23:45:08

by Bryan O'Sullivan

[permalink] [raw]
Subject: [PATCH 44 of 53] ipath - allow diags on any unit

Previously, all diag opens were hardwired to unit 0. Create a separate
ipath_diag<unit> device node per unit, and recover the unit number from
the character device minor at open time.
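
The unit <-> minor mapping, condensed from the pieces of the diff
below:

	/* ipath_diag_add(): one diag device per unit */
	minor = IPATH_DIAG_MINOR_BASE + dd->ipath_unit;

	/* ipath_diag_open(): recover the unit from the minor */
	unit = iminor(in) - IPATH_DIAG_MINOR_BASE;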

Signed-off-by: Bryan O'Sullivan <[email protected]>

diff -r 7634b2f0fc40 -r 28d938eb0463 drivers/infiniband/hw/ipath/ipath_diag.c
--- a/drivers/infiniband/hw/ipath/ipath_diag.c Fri May 12 15:55:29 2006 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_diag.c Fri May 12 15:55:29 2006 -0700
@@ -66,18 +66,20 @@ static struct file_operations diag_file_
.release = ipath_diag_release
};

-static struct cdev *diag_cdev;
-static struct class_device *diag_class_dev;
-
-int ipath_diag_init(void)
-{
- return ipath_cdev_init(IPATH_DIAG_MINOR, "ipath_diag",
- &diag_file_ops, &diag_cdev, &diag_class_dev);
-}
-
-void ipath_diag_cleanup(void)
-{
- ipath_cdev_cleanup(&diag_cdev, &diag_class_dev);
+int ipath_diag_add(struct ipath_devdata *dd)
+{
+ char name[16];
+
+ snprintf(name, sizeof(name), "ipath_diag%d", dd->ipath_unit);
+
+ return ipath_cdev_init(IPATH_DIAG_MINOR_BASE + dd->ipath_unit, name,
+ &diag_file_ops, &dd->diag_cdev,
+ &dd->diag_class_dev);
+}
+
+void ipath_diag_remove(struct ipath_devdata *dd)
+{
+ ipath_cdev_cleanup(&dd->diag_cdev, &dd->diag_class_dev);
}

/**
@@ -101,8 +103,7 @@ static int ipath_read_umem64(struct ipat
int ret;

/* not very efficient, but it works for now */
- if (reg_addr < dd->ipath_kregbase ||
- reg_end > dd->ipath_kregend) {
+ if (reg_addr < dd->ipath_kregbase || reg_end > dd->ipath_kregend) {
ret = -EINVAL;
goto bail;
}
@@ -139,8 +140,7 @@ static int ipath_write_umem64(struct ipa
int ret;

/* not very efficient, but it works for now */
- if (reg_addr < dd->ipath_kregbase ||
- reg_end > dd->ipath_kregend) {
+ if (reg_addr < dd->ipath_kregbase || reg_end > dd->ipath_kregend) {
ret = -EINVAL;
goto bail;
}
@@ -240,59 +240,45 @@ bail:

static int ipath_diag_open(struct inode *in, struct file *fp)
{
+ int unit = iminor(in) - IPATH_DIAG_MINOR_BASE;
struct ipath_devdata *dd;
- int unit = 0; /* XXX this is bogus */
- unsigned long flags;
- int ret;
-
- dd = ipath_lookup(unit);
+ int ret;

mutex_lock(&ipath_mutex);
- spin_lock_irqsave(&ipath_devs_lock, flags);

if (ipath_diag_inuse) {
ret = -EBUSY;
goto bail;
}

- list_for_each_entry(dd, &ipath_dev_list, ipath_list) {
- /*
- * we need at least one infinipath device to be present
- * (don't use INITTED, because we want to be able to open
- * even if device is in freeze mode, which cleared INITTED).
- * There is a small amount of risk to this, which is why we
- * also verify kregbase is set.
- */
-
- if (!(dd->ipath_flags & IPATH_PRESENT) ||
- !dd->ipath_kregbase)
- continue;
-
- ipath_diag_inuse = 1;
- diag_set_link = 0;
- ret = 0;
- goto bail;
- }
-
- ret = -ENODEV;
-
-bail:
- spin_unlock_irqrestore(&ipath_devs_lock, flags);
+ dd = ipath_lookup(unit);
+
+ if (dd == NULL || !(dd->ipath_flags & IPATH_PRESENT) ||
+ !dd->ipath_kregbase) {
+ ret = -ENODEV;
+ goto bail;
+ }
+
+ fp->private_data = dd;
+ ipath_diag_inuse = 1;
+ diag_set_link = 0;
+ ret = 0;

/* Only expose a way to reset the device if we
make it into diag mode. */
- if (ret == 0)
- ipath_expose_reset(&dd->pcidev->dev);
-
+ ipath_expose_reset(&dd->pcidev->dev);
+
+bail:
mutex_unlock(&ipath_mutex);

return ret;
}

-static int ipath_diag_release(struct inode *i, struct file *f)
+static int ipath_diag_release(struct inode *in, struct file *fp)
{
mutex_lock(&ipath_mutex);
ipath_diag_inuse = 0;
+ fp->private_data = NULL;
mutex_unlock(&ipath_mutex);
return 0;
}
@@ -300,16 +286,9 @@ static ssize_t ipath_diag_read(struct fi
static ssize_t ipath_diag_read(struct file *fp, char __user *data,
size_t count, loff_t *off)
{
- int unit = 0; /* XXX provide for reads on other units some day */
- struct ipath_devdata *dd;
+ struct ipath_devdata *dd = fp->private_data;
void __iomem *kreg_base;
ssize_t ret;
-
- dd = ipath_lookup(unit);
- if (!dd) {
- ret = -ENODEV;
- goto bail;
- }

kreg_base = dd->ipath_kregbase;

@@ -329,23 +308,16 @@ static ssize_t ipath_diag_read(struct fi
ret = count;
}

-bail:
return ret;
}

static ssize_t ipath_diag_write(struct file *fp, const char __user *data,
size_t count, loff_t *off)
{
- int unit = 0; /* XXX this is bogus */
- struct ipath_devdata *dd;
+ struct ipath_devdata *dd = fp->private_data;
void __iomem *kreg_base;
ssize_t ret;

- dd = ipath_lookup(unit);
- if (!dd) {
- ret = -ENODEV;
- goto bail;
- }
kreg_base = dd->ipath_kregbase;

if (count == 0)
@@ -364,6 +336,6 @@ static ssize_t ipath_diag_write(struct f
ret = count;
}

-bail:
- return ret;
-}
+ return ret;
+}
+
diff -r 7634b2f0fc40 -r 28d938eb0463 drivers/infiniband/hw/ipath/ipath_driver.c
--- a/drivers/infiniband/hw/ipath/ipath_driver.c Fri May 12 15:55:29 2006 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_driver.c Fri May 12 15:55:29 2006 -0700
@@ -488,6 +488,7 @@ static int __devinit ipath_init_one(stru
ipath_device_create_group(&pdev->dev, dd);
ipathfs_add_device(dd);
ipath_user_add(dd);
+ ipath_diag_add(dd);
ipath_layer_add(dd);

goto bail;
@@ -517,8 +518,9 @@ static void __devexit ipath_remove_one(s
return;

dd = pci_get_drvdata(pdev);
- ipath_layer_del(dd);
- ipath_user_del(dd);
+ ipath_layer_remove(dd);
+ ipath_diag_remove(dd);
+ ipath_user_remove(dd);
ipathfs_remove_device(dd);
ipath_device_remove_group(&pdev->dev, dd);
ipath_cdbg(VERBOSE, "Releasing pci memory regions, dd %p, "
diff -r 7634b2f0fc40 -r 28d938eb0463 drivers/infiniband/hw/ipath/ipath_file_ops.c
--- a/drivers/infiniband/hw/ipath/ipath_file_ops.c Fri May 12 15:55:29 2006 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_file_ops.c Fri May 12 15:55:29 2006 -0700
@@ -1390,16 +1390,16 @@ done:

static int ipath_open(struct inode *in, struct file *fp)
{
- int ret, minor;
+ int ret, user_minor;

mutex_lock(&ipath_mutex);

- minor = iminor(in);
+ user_minor = iminor(in) - IPATH_USER_MINOR_BASE;
ipath_cdbg(VERBOSE, "open on dev %lx (minor %d)\n",
- (long)in->i_rdev, minor);
-
- if (minor)
- ret = find_free_port(minor - 1, fp);
+ (long)in->i_rdev, user_minor);
+
+ if (user_minor)
+ ret = find_free_port(user_minor - 1, fp);
else
ret = find_best_unit(fp);

@@ -1799,19 +1799,13 @@ int ipath_user_add(struct ipath_devdata
"error %d\n", -ret);
goto bail;
}
- ret = ipath_diag_init();
- if (ret < 0) {
- ipath_dev_err(dd, "Unable to set up diag support: "
- "error %d\n", -ret);
- goto bail_sma;
- }

ret = init_cdev(0, "ipath", &ipath_file_ops, &wildcard_cdev,
&wildcard_class_dev);
if (ret < 0) {
ipath_dev_err(dd, "Could not create wildcard "
"minor: error %d\n", -ret);
- goto bail_diag;
+ goto bail_sma;
}

atomic_set(&user_setup, 1);
@@ -1820,31 +1814,28 @@ int ipath_user_add(struct ipath_devdata
snprintf(name, sizeof(name), "ipath%d", dd->ipath_unit);

ret = init_cdev(dd->ipath_unit + 1, name, &ipath_file_ops,
- &dd->cdev, &dd->class_dev);
+ &dd->user_cdev, &dd->user_class_dev);
if (ret < 0)
ipath_dev_err(dd, "Could not create user minor %d, %s\n",
dd->ipath_unit + 1, name);

goto bail;

-bail_diag:
- ipath_diag_cleanup();
bail_sma:
user_cleanup();
bail:
return ret;
}

-void ipath_user_del(struct ipath_devdata *dd)
-{
- cleanup_cdev(&dd->cdev, &dd->class_dev);
+void ipath_user_remove(struct ipath_devdata *dd)
+{
+ cleanup_cdev(&dd->user_cdev, &dd->user_class_dev);

if (atomic_dec_return(&user_count) == 0) {
if (atomic_read(&user_setup) == 0)
goto bail;

cleanup_cdev(&wildcard_cdev, &wildcard_class_dev);
- ipath_diag_cleanup();
user_cleanup();

atomic_set(&user_setup, 0);
diff -r 7634b2f0fc40 -r 28d938eb0463 drivers/infiniband/hw/ipath/ipath_kernel.h
--- a/drivers/infiniband/hw/ipath/ipath_kernel.h Fri May 12 15:55:29 2006 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_kernel.h Fri May 12 15:55:29 2006 -0700
@@ -347,8 +347,10 @@ struct ipath_devdata {
char *ipath_freezemsg;
/* pci access data structure */
struct pci_dev *pcidev;
- struct cdev *cdev;
- struct class_device *class_dev;
+ struct cdev *user_cdev;
+ struct cdev *diag_cdev;
+ struct class_device *user_class_dev;
+ struct class_device *diag_class_dev;
/* timer used to prevent stats overflow, error throttling, etc. */
struct timer_list ipath_stats_timer;
/* check for stale messages in rcv queue */
@@ -531,7 +533,7 @@ extern int __ipath_verbs_rcv(struct ipat
extern int __ipath_verbs_rcv(struct ipath_devdata *, void *, void *, u32);

void ipath_layer_add(struct ipath_devdata *);
-void ipath_layer_del(struct ipath_devdata *);
+void ipath_layer_remove(struct ipath_devdata *);

int ipath_init_chip(struct ipath_devdata *, int);
int ipath_enable_wc(struct ipath_devdata *dd);
@@ -545,14 +547,14 @@ void ipath_cdev_cleanup(struct cdev **cd
void ipath_cdev_cleanup(struct cdev **cdevp,
struct class_device **class_devp);

-int ipath_diag_init(void);
-void ipath_diag_cleanup(void);
+int ipath_diag_add(struct ipath_devdata *);
+void ipath_diag_remove(struct ipath_devdata *);
void ipath_diag_bringup_link(struct ipath_devdata *);

extern wait_queue_head_t ipath_sma_state_wait;

int ipath_user_add(struct ipath_devdata *dd);
-void ipath_user_del(struct ipath_devdata *dd);
+void ipath_user_remove(struct ipath_devdata *dd);

struct sk_buff *ipath_alloc_skb(struct ipath_devdata *dd, gfp_t);

@@ -831,9 +833,10 @@ extern struct mutex ipath_mutex;

#define IPATH_DRV_NAME "ipath_core"
#define IPATH_MAJOR 233
+#define IPATH_USER_MINOR_BASE 0
#define IPATH_SMA_MINOR 128
-#define IPATH_DIAG_MINOR 129
-#define IPATH_NMINORS 130
+#define IPATH_DIAG_MINOR_BASE 129
+#define IPATH_NMINORS 255

#define ipath_dev_err(dd,fmt,...) \
do { \
diff -r 7634b2f0fc40 -r 28d938eb0463 drivers/infiniband/hw/ipath/ipath_layer.c
--- a/drivers/infiniband/hw/ipath/ipath_layer.c Fri May 12 15:55:29 2006 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_layer.c Fri May 12 15:55:29 2006 -0700
@@ -402,7 +402,7 @@ void ipath_layer_add(struct ipath_devdat
mutex_unlock(&ipath_layer_mutex);
}

-void ipath_layer_del(struct ipath_devdata *dd)
+void ipath_layer_remove(struct ipath_devdata *dd)
{
mutex_lock(&ipath_layer_mutex);

2006-05-12 23:45:08

by Bryan O'Sullivan

[permalink] [raw]
Subject: [PATCH 48 of 53] ipath - QP should ignore receive queue size if SRQ specified
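
When a QP is created with an SRQ attached, receive work requests are
posted to the SRQ rather than to the QP, so the QP's own receive queue
is never used: don't allocate one, and ignore the requested receive
queue sizes.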

Signed-off-by: Bryan O'Sullivan <[email protected]>

diff -r a1615956e57f -r 49b446b12f16 drivers/infiniband/hw/ipath/ipath_qp.c
--- a/drivers/infiniband/hw/ipath/ipath_qp.c Fri May 12 15:55:29 2006 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_qp.c Fri May 12 15:55:29 2006 -0700
@@ -684,16 +684,22 @@ struct ib_qp *ipath_create_qp(struct ib_
ret = ERR_PTR(-ENOMEM);
goto bail;
}
- qp->r_rq.size = init_attr->cap.max_recv_wr + 1;
- sz = sizeof(struct ipath_sge) *
- init_attr->cap.max_recv_sge +
- sizeof(struct ipath_rwqe);
- qp->r_rq.wq = vmalloc(qp->r_rq.size * sz);
- if (!qp->r_rq.wq) {
- kfree(qp);
- vfree(swq);
- ret = ERR_PTR(-ENOMEM);
- goto bail;
+ if (init_attr->srq) {
+ qp->r_rq.size = 0;
+ qp->r_rq.max_sge = 0;
+ qp->r_rq.wq = NULL;
+ } else {
+ qp->r_rq.size = init_attr->cap.max_recv_wr + 1;
+ qp->r_rq.max_sge = init_attr->cap.max_recv_sge;
+ sz = (sizeof(struct ipath_sge) * qp->r_rq.max_sge) +
+ sizeof(struct ipath_rwqe);
+ qp->r_rq.wq = vmalloc(qp->r_rq.size * sz);
+ if (!qp->r_rq.wq) {
+ kfree(qp);
+ vfree(swq);
+ ret = ERR_PTR(-ENOMEM);
+ goto bail;
+ }
}

/*
@@ -712,7 +718,6 @@ struct ib_qp *ipath_create_qp(struct ib_
qp->s_wq = swq;
qp->s_size = init_attr->cap.max_send_wr + 1;
qp->s_max_sge = init_attr->cap.max_send_sge;
- qp->r_rq.max_sge = init_attr->cap.max_recv_sge;
qp->s_flags = init_attr->sq_sig_type == IB_SIGNAL_REQ_WR ?
1 << IPATH_S_SIGNAL_REQ_WR : 0;
dev = to_idev(ibpd->device);

2006-05-12 23:45:08

by Bryan O'Sullivan

[permalink] [raw]
Subject: [PATCH 50 of 53] ipath - reduce maximum table sizes

Decrease the number of WRs and SGEs we support from 131071/255 to
16383/96 (0x1FFFF/0xFF down to 0x3FFF/0x60). This decreases our maximum
memory usage per QP from ~1800MB down to about 40MB. This is still a
lot, but it's better than 2GB.
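
The worst case follows from how ipath_create_qp() sizes its work
queues; per queue, roughly (struct sizes are approximate and
platform-dependent):

	sz   = sizeof(struct ipath_rwqe) +
	       max_sge * sizeof(struct ipath_sge);
	size = max_wr + 1;
	wq   = vmalloc(size * sz);	/* per queue, per QP */

Cutting max_wr by 8x and max_sge by roughly 2.7x shrinks that product
by a factor of about 20, consistent with the order-of-magnitude drop
quoted above.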

Signed-off-by: Bryan O'Sullivan <[email protected]>

diff -r 40532fdc53f0 -r bd1de2e983db drivers/infiniband/hw/ipath/ipath_verbs.c
--- a/drivers/infiniband/hw/ipath/ipath_verbs.c Fri May 12 15:55:29 2006 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_verbs.c Fri May 12 15:55:29 2006 -0700
@@ -73,12 +73,12 @@ module_param_named(max_cqs, ib_ipath_max
module_param_named(max_cqs, ib_ipath_max_cqs, uint, S_IWUSR | S_IRUGO);
MODULE_PARM_DESC(max_cqs, "Maximum number of completion queues to support");

-unsigned int ib_ipath_max_qp_wrs = 0x1FFFF;
+unsigned int ib_ipath_max_qp_wrs = 0x3FFF;
module_param_named(max_qp_wrs, ib_ipath_max_qp_wrs, uint,
S_IWUSR | S_IRUGO);
MODULE_PARM_DESC(max_qp_wrs, "Maximum number of QP WRs to support");

-unsigned int ib_ipath_max_sges = 0xFF;
+unsigned int ib_ipath_max_sges = 0x60;
module_param_named(max_sges, ib_ipath_max_sges, uint, S_IWUSR | S_IRUGO);
MODULE_PARM_DESC(max_sges, "Maximum number of SGEs to support");

2006-05-12 23:45:07

by Bryan O'Sullivan

[permalink] [raw]
Subject: [PATCH 45 of 53] ipath - fix memory leak when create of QP fails
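
The leak is in create_port0_egr(): if one of the eager-buffer skb
allocations failed part way through, the already-allocated skbs were
freed, but the skbs pointer array holding them was not.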

Signed-off-by: Bryan O'Sullivan <[email protected]>

diff -r 28d938eb0463 -r b41e576e5202 drivers/infiniband/hw/ipath/ipath_init_chip.c
--- a/drivers/infiniband/hw/ipath/ipath_init_chip.c Fri May 12 15:55:29 2006 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_init_chip.c Fri May 12 15:55:29 2006 -0700
@@ -114,6 +114,7 @@ static int create_port0_egr(struct ipath
"eager TID %u\n", e);
while (e != 0)
dev_kfree_skb(skbs[--e]);
+ vfree(skbs);
ret = -ENOMEM;
goto bail;
}

2006-05-12 23:45:07

by Bryan O'Sullivan

[permalink] [raw]
Subject: [PATCH 39 of 53] ipath - count PE800 receive interrupts on user ports

Fix receive interrupt arming in the user-port poll path so that it
works on the PE-800; it had not previously been updated to match the
PE-800's receive interrupt differences from the HT-400.

Signed-off-by: Bryan O'Sullivan <[email protected]>

diff -r e9306861dc6a -r 5b565c24d62a drivers/infiniband/hw/ipath/ipath_file_ops.c
--- a/drivers/infiniband/hw/ipath/ipath_file_ops.c Fri May 12 15:55:28 2006 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_file_ops.c Fri May 12 15:55:28 2006 -0700
@@ -1172,6 +1172,10 @@ static unsigned int ipath_poll(struct fi

if (tail == head) {
set_bit(IPATH_PORT_WAITING_RCV, &pd->port_flag);
+ if(dd->ipath_rhdrhead_intr_off) /* arm rcv interrupt */
+ (void)ipath_write_ureg(dd, ur_rcvhdrhead,
+ dd->ipath_rhdrhead_intr_off
+ | head, pd->port_port);
poll_wait(fp, &pd->port_wait, pt);

if (test_bit(IPATH_PORT_WAITING_RCV, &pd->port_flag)) {

2006-05-12 23:45:06

by Bryan O'Sullivan

Subject: [PATCH 32 of 53] ipath - fix NULL dereference during cleanup

Fix a NULL dereference caused by dd->pcidev being clobbered before
dd->ipath_f_cleanup() was called.
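
The fix is purely one of ordering: the chip-specific cleanup hook may
still need dd->pcidev, so it has to run before that pointer is cleared.
Schematically (a sketch with a hypothetical device struct, not the
driver's types):

struct dev_sketch {
        void *pcidev;
        void (*f_cleanup)(struct dev_sketch *dd);
};

static void teardown(struct dev_sketch *dd)
{
        /* free_irq() happens first in the real code */
        if (dd->f_cleanup)              /* may never have been set */
                dd->f_cleanup(dd);      /* may still use dd->pcidev */
        dd->pcidev = NULL;              /* only clobber it afterwards */
}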

Signed-off-by: Bryan O'Sullivan <[email protected]>

diff -r 4868daa7f215 -r b9fd1a46c910 drivers/infiniband/hw/ipath/ipath_driver.c
--- a/drivers/infiniband/hw/ipath/ipath_driver.c Fri May 12 15:55:28 2006 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_driver.c Fri May 12 15:55:28 2006 -0700
@@ -1897,19 +1897,19 @@ static void __exit infinipath_cleanup(vo
} else
ipath_dbg("irq is 0, not doing free_irq "
"for unit %u\n", dd->ipath_unit);
+
+ /*
+ * we check for NULL here, because it's outside
+ * the kregbase check, and we need to call it
+ * after the free_irq. Thus it's possible that
+ * the function pointers were never initialized.
+ */
+ if (dd->ipath_f_cleanup)
+ /* clean up chip-specific stuff */
+ dd->ipath_f_cleanup(dd);
+
dd->pcidev = NULL;
}
-
- /*
- * we check for NULL here, because it's outside the kregbase
- * check, and we need to call it after the free_irq. Thus
- * it's possible that the function pointers were never
- * initialized.
- */
- if (dd->ipath_f_cleanup)
- /* clean up chip-specific stuff */
- dd->ipath_f_cleanup(dd);
-
spin_lock_irqsave(&ipath_devs_lock, flags);
}

2006-05-12 23:45:07

by Bryan O'Sullivan

Subject: [PATCH 33 of 53] ipath - clean up some comments

Signed-off-by: Bryan O'Sullivan <[email protected]>

diff -r b9fd1a46c910 -r 5ddaf7c07cdf drivers/infiniband/hw/ipath/ipath_kernel.h
--- a/drivers/infiniband/hw/ipath/ipath_kernel.h Fri May 12 15:55:28 2006 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_kernel.h Fri May 12 15:55:28 2006 -0700
@@ -720,13 +720,8 @@ u64 ipath_read_kreg64_port(const struct
* @port: port number
*
* Return the contents of a register that is virtualized to be per port.
- * Prints a debug message and returns -1 on errors (not distinguishable from
- * valid contents at runtime; we may add a separate error variable at some
- * point).
- *
- * This is normally not used by the kernel, but may be for debugging, and
- * has a different implementation than user mode, which is why it's not in
- * _common.h.
+ * Returns -1 on errors (not distinguishable from valid contents at
+ * runtime; we may add a separate error variable at some point).
*/
static inline u32 ipath_read_ureg32(const struct ipath_devdata *dd,
ipath_ureg regno, int port)

2006-05-12 23:45:06

by Bryan O'Sullivan

Subject: [PATCH 27 of 53] ipath - fix accounting of data packets with bad VLs

For better IB conformance.

Signed-off-by: Bryan O'Sullivan <[email protected]>

diff -r 8e2d63833cf2 -r 551966b88d7c drivers/infiniband/hw/ipath/ipath_layer.c
--- a/drivers/infiniband/hw/ipath/ipath_layer.c Fri May 12 15:55:28 2006 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_layer.c Fri May 12 15:55:28 2006 -0700
@@ -1019,13 +1019,11 @@ int ipath_layer_get_counters(struct ipat
ipath_snap_cntr(dd, dd->ipath_cregs->cr_rxdroppktcnt) +
ipath_snap_cntr(dd, dd->ipath_cregs->cr_rcvovflcnt) +
ipath_snap_cntr(dd, dd->ipath_cregs->cr_portovflcnt) +
- ipath_snap_cntr(dd, dd->ipath_cregs->cr_errrcvflowctrlcnt) +
ipath_snap_cntr(dd, dd->ipath_cregs->cr_err_rlencnt) +
ipath_snap_cntr(dd, dd->ipath_cregs->cr_invalidrlencnt) +
ipath_snap_cntr(dd, dd->ipath_cregs->cr_erricrccnt) +
ipath_snap_cntr(dd, dd->ipath_cregs->cr_errvcrccnt) +
ipath_snap_cntr(dd, dd->ipath_cregs->cr_errlpcrccnt) +
- ipath_snap_cntr(dd, dd->ipath_cregs->cr_errlinkcnt) +
ipath_snap_cntr(dd, dd->ipath_cregs->cr_badformatcnt);
cntrs->port_rcv_remphys_errors =
ipath_snap_cntr(dd, dd->ipath_cregs->cr_rcvebpcnt);
diff -r 8e2d63833cf2 -r 551966b88d7c drivers/infiniband/hw/ipath/ipath_mad.c
--- a/drivers/infiniband/hw/ipath/ipath_mad.c Fri May 12 15:55:28 2006 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_mad.c Fri May 12 15:55:28 2006 -0700
@@ -1316,32 +1316,8 @@ int ipath_process_mad(struct ib_device *
struct ib_wc *in_wc, struct ib_grh *in_grh,
struct ib_mad *in_mad, struct ib_mad *out_mad)
{
- struct ipath_ibdev *dev = to_idev(ibdev);
int ret;

- /*
- * Snapshot current HW counters to "clear" them.
- * This should be done when the driver is loaded except that for
- * some reason we get a zillion errors when brining up the link.
- */
- if (dev->rcv_errors == 0) {
- struct ipath_layer_counters cntrs;
-
- ipath_layer_get_counters(to_idev(ibdev)->dd, &cntrs);
- dev->rcv_errors++;
- dev->n_symbol_error_counter = cntrs.symbol_error_counter;
- dev->n_link_error_recovery_counter =
- cntrs.link_error_recovery_counter;
- dev->n_link_downed_counter = cntrs.link_downed_counter;
- dev->n_port_rcv_errors = cntrs.port_rcv_errors + 1;
- dev->n_port_rcv_remphys_errors =
- cntrs.port_rcv_remphys_errors;
- dev->n_port_xmit_discards = cntrs.port_xmit_discards;
- dev->n_port_xmit_data = cntrs.port_xmit_data;
- dev->n_port_rcv_data = cntrs.port_rcv_data;
- dev->n_port_xmit_packets = cntrs.port_xmit_packets;
- dev->n_port_rcv_packets = cntrs.port_rcv_packets;
- }
switch (in_mad->mad_hdr.mgmt_class) {
case IB_MGMT_CLASS_SUBN_DIRECTED_ROUTE:
case IB_MGMT_CLASS_SUBN_LID_ROUTED:
diff -r 8e2d63833cf2 -r 551966b88d7c drivers/infiniband/hw/ipath/ipath_verbs.c
--- a/drivers/infiniband/hw/ipath/ipath_verbs.c Fri May 12 15:55:28 2006 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_verbs.c Fri May 12 15:55:28 2006 -0700
@@ -981,6 +981,7 @@ static int ipath_verbs_register_sysfs(st
*/
static void *ipath_register_ib_device(int unit, struct ipath_devdata *dd)
{
+ struct ipath_layer_counters cntrs;
struct ipath_ibdev *idev;
struct ib_device *dev;
int ret;
@@ -1030,6 +1031,21 @@ static void *ipath_register_ib_device(in
idev->pma_counter_select[3] = IB_PMA_PORT_RCV_PKTS;
idev->pma_counter_select[5] = IB_PMA_PORT_XMIT_WAIT;
idev->link_width_enabled = 3; /* 1x or 4x */
+
+ /* Snapshot current HW counters to "clear" them. */
+ ipath_layer_get_counters(dd, &cntrs);
+ idev->n_symbol_error_counter = cntrs.symbol_error_counter;
+ idev->n_link_error_recovery_counter =
+ cntrs.link_error_recovery_counter;
+ idev->n_link_downed_counter = cntrs.link_downed_counter;
+ idev->n_port_rcv_errors = cntrs.port_rcv_errors;
+ idev->n_port_rcv_remphys_errors =
+ cntrs.port_rcv_remphys_errors;
+ idev->n_port_xmit_discards = cntrs.port_xmit_discards;
+ idev->n_port_xmit_data = cntrs.port_xmit_data;
+ idev->n_port_rcv_data = cntrs.port_rcv_data;
+ idev->n_port_xmit_packets = cntrs.port_xmit_packets;
+ idev->n_port_rcv_packets = cntrs.port_rcv_packets;

/*
* The system image GUID is supposed to be the same for all

2006-05-12 23:54:56

by Bryan O'Sullivan

Subject: [PATCH 37 of 53] ipath - name zero counter offsets consistently

Name zero counter offsets consistently so it's clear they aren't counters.
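
The convention the renaming clarifies: a PMA "clear" never resets a
hardware register; it re-snapshots it into a z_ ("zero") offset, and
reads report the delta. A minimal userspace sketch of the idiom, with
hw_read_counter() as a made-up stand-in for the register read:

#include <stdint.h>

static uint64_t z_symbol_errors;        /* zero offset, not a counter */

static uint64_t hw_read_counter(void)   /* hypothetical register read */
{
        return 12345;
}

static uint64_t read_symbol_errors(void)
{
        return hw_read_counter() - z_symbol_errors;
}

static void clear_symbol_errors(void)
{
        z_symbol_errors = hw_read_counter();    /* "clear" = snapshot */
}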

Signed-off-by: Bryan O'Sullivan <[email protected]>

diff -r ec1934faf5d1 -r f8debae94d44 drivers/infiniband/hw/ipath/ipath_mad.c
--- a/drivers/infiniband/hw/ipath/ipath_mad.c Fri May 12 15:55:28 2006 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_mad.c Fri May 12 15:55:28 2006 -0700
@@ -251,7 +251,7 @@ static int recv_subn_get_portinfo(struct
/* P_KeyViolations are counted by hardware. */
pip->pkey_violations =
cpu_to_be16((ipath_layer_get_cr_errpkey(dev->dd) -
- dev->n_pkey_violations) & 0xFFFF);
+ dev->z_pkey_violations) & 0xFFFF);
pip->qkey_violations = cpu_to_be16(dev->qkey_violations);
/* Only the hardware GUID is supported for now */
pip->guid_cap = 1;
@@ -425,7 +425,7 @@ static int recv_subn_set_portinfo(struct
* later.
*/
if (pip->pkey_violations == 0)
- dev->n_pkey_violations =
+ dev->z_pkey_violations =
ipath_layer_get_cr_errpkey(dev->dd);

if (pip->qkey_violations == 0)
@@ -883,18 +883,18 @@ static int recv_pma_get_portcounters(str
ipath_layer_get_counters(dev->dd, &cntrs);

/* Adjust counters for any resets done. */
- cntrs.symbol_error_counter -= dev->n_symbol_error_counter;
+ cntrs.symbol_error_counter -= dev->z_symbol_error_counter;
cntrs.link_error_recovery_counter -=
- dev->n_link_error_recovery_counter;
- cntrs.link_downed_counter -= dev->n_link_downed_counter;
+ dev->z_link_error_recovery_counter;
+ cntrs.link_downed_counter -= dev->z_link_downed_counter;
cntrs.port_rcv_errors += dev->rcv_errors;
- cntrs.port_rcv_errors -= dev->n_port_rcv_errors;
- cntrs.port_rcv_remphys_errors -= dev->n_port_rcv_remphys_errors;
- cntrs.port_xmit_discards -= dev->n_port_xmit_discards;
- cntrs.port_xmit_data -= dev->n_port_xmit_data;
- cntrs.port_rcv_data -= dev->n_port_rcv_data;
- cntrs.port_xmit_packets -= dev->n_port_xmit_packets;
- cntrs.port_rcv_packets -= dev->n_port_rcv_packets;
+ cntrs.port_rcv_errors -= dev->z_port_rcv_errors;
+ cntrs.port_rcv_remphys_errors -= dev->z_port_rcv_remphys_errors;
+ cntrs.port_xmit_discards -= dev->z_port_xmit_discards;
+ cntrs.port_xmit_data -= dev->z_port_xmit_data;
+ cntrs.port_rcv_data -= dev->z_port_rcv_data;
+ cntrs.port_xmit_packets -= dev->z_port_xmit_packets;
+ cntrs.port_rcv_packets -= dev->z_port_rcv_packets;
cntrs.local_link_integrity_errors -=
dev->z_local_link_integrity_errors;
cntrs.excessive_buffer_overrun_errors -=
@@ -981,10 +981,10 @@ static int recv_pma_get_portcounters_ext
&rpkts, &xwait);

/* Adjust counters for any resets done. */
- swords -= dev->n_port_xmit_data;
- rwords -= dev->n_port_rcv_data;
- spkts -= dev->n_port_xmit_packets;
- rpkts -= dev->n_port_rcv_packets;
+ swords -= dev->z_port_xmit_data;
+ rwords -= dev->z_port_rcv_data;
+ spkts -= dev->z_port_xmit_packets;
+ rpkts -= dev->z_port_rcv_packets;

memset(pmp->data, 0, sizeof(pmp->data));

@@ -1020,25 +1020,25 @@ static int recv_pma_set_portcounters(str
ipath_layer_get_counters(dev->dd, &cntrs);

if (p->counter_select & IB_PMA_SEL_SYMBOL_ERROR)
- dev->n_symbol_error_counter = cntrs.symbol_error_counter;
+ dev->z_symbol_error_counter = cntrs.symbol_error_counter;

if (p->counter_select & IB_PMA_SEL_LINK_ERROR_RECOVERY)
- dev->n_link_error_recovery_counter =
+ dev->z_link_error_recovery_counter =
cntrs.link_error_recovery_counter;

if (p->counter_select & IB_PMA_SEL_LINK_DOWNED)
- dev->n_link_downed_counter = cntrs.link_downed_counter;
+ dev->z_link_downed_counter = cntrs.link_downed_counter;

if (p->counter_select & IB_PMA_SEL_PORT_RCV_ERRORS)
- dev->n_port_rcv_errors =
+ dev->z_port_rcv_errors =
cntrs.port_rcv_errors + dev->rcv_errors;

if (p->counter_select & IB_PMA_SEL_PORT_RCV_REMPHYS_ERRORS)
- dev->n_port_rcv_remphys_errors =
+ dev->z_port_rcv_remphys_errors =
cntrs.port_rcv_remphys_errors;

if (p->counter_select & IB_PMA_SEL_PORT_XMIT_DISCARDS)
- dev->n_port_xmit_discards = cntrs.port_xmit_discards;
+ dev->z_port_xmit_discards = cntrs.port_xmit_discards;

if (p->counter_select & IB_PMA_SEL_LOCAL_LINK_INTEGRITY_ERRORS)
dev->z_local_link_integrity_errors =
@@ -1052,16 +1052,16 @@ static int recv_pma_set_portcounters(str
dev->n_vl15_dropped = 0;

if (p->counter_select & IB_PMA_SEL_PORT_XMIT_DATA)
- dev->n_port_xmit_data = cntrs.port_xmit_data;
+ dev->z_port_xmit_data = cntrs.port_xmit_data;

if (p->counter_select & IB_PMA_SEL_PORT_RCV_DATA)
- dev->n_port_rcv_data = cntrs.port_rcv_data;
+ dev->z_port_rcv_data = cntrs.port_rcv_data;

if (p->counter_select & IB_PMA_SEL_PORT_XMIT_PACKETS)
- dev->n_port_xmit_packets = cntrs.port_xmit_packets;
+ dev->z_port_xmit_packets = cntrs.port_xmit_packets;

if (p->counter_select & IB_PMA_SEL_PORT_RCV_PACKETS)
- dev->n_port_rcv_packets = cntrs.port_rcv_packets;
+ dev->z_port_rcv_packets = cntrs.port_rcv_packets;

return recv_pma_get_portcounters(pmp, ibdev, port);
}
@@ -1078,16 +1078,16 @@ static int recv_pma_set_portcounters_ext
&rpkts, &xwait);

if (p->counter_select & IB_PMA_SELX_PORT_XMIT_DATA)
- dev->n_port_xmit_data = swords;
+ dev->z_port_xmit_data = swords;

if (p->counter_select & IB_PMA_SELX_PORT_RCV_DATA)
- dev->n_port_rcv_data = rwords;
+ dev->z_port_rcv_data = rwords;

if (p->counter_select & IB_PMA_SELX_PORT_XMIT_PACKETS)
- dev->n_port_xmit_packets = spkts;
+ dev->z_port_xmit_packets = spkts;

if (p->counter_select & IB_PMA_SELX_PORT_RCV_PACKETS)
- dev->n_port_rcv_packets = rpkts;
+ dev->z_port_rcv_packets = rpkts;

if (p->counter_select & IB_PMA_SELX_PORT_UNI_XMIT_PACKETS)
dev->n_unicast_xmit = 0;
diff -r ec1934faf5d1 -r f8debae94d44 drivers/infiniband/hw/ipath/ipath_verbs.c
--- a/drivers/infiniband/hw/ipath/ipath_verbs.c Fri May 12 15:55:28 2006 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_verbs.c Fri May 12 15:55:28 2006 -0700
@@ -700,7 +700,7 @@ static int ipath_query_port(struct ib_de
props->max_msg_sz = 4096;
props->pkey_tbl_len = ipath_layer_get_npkeys(dev->dd);
props->bad_pkey_cntr = ipath_layer_get_cr_errpkey(dev->dd) -
- dev->n_pkey_violations;
+ dev->z_pkey_violations;
props->qkey_viol_cntr = dev->qkey_violations;
props->active_width = IB_WIDTH_4X;
/* See rate_show() */
@@ -1034,18 +1034,18 @@ static void *ipath_register_ib_device(in

/* Snapshot current HW counters to "clear" them. */
ipath_layer_get_counters(dd, &cntrs);
- idev->n_symbol_error_counter = cntrs.symbol_error_counter;
- idev->n_link_error_recovery_counter =
+ idev->z_symbol_error_counter = cntrs.symbol_error_counter;
+ idev->z_link_error_recovery_counter =
cntrs.link_error_recovery_counter;
- idev->n_link_downed_counter = cntrs.link_downed_counter;
- idev->n_port_rcv_errors = cntrs.port_rcv_errors;
- idev->n_port_rcv_remphys_errors =
+ idev->z_link_downed_counter = cntrs.link_downed_counter;
+ idev->z_port_rcv_errors = cntrs.port_rcv_errors;
+ idev->z_port_rcv_remphys_errors =
cntrs.port_rcv_remphys_errors;
- idev->n_port_xmit_discards = cntrs.port_xmit_discards;
- idev->n_port_xmit_data = cntrs.port_xmit_data;
- idev->n_port_rcv_data = cntrs.port_rcv_data;
- idev->n_port_xmit_packets = cntrs.port_xmit_packets;
- idev->n_port_rcv_packets = cntrs.port_rcv_packets;
+ idev->z_port_xmit_discards = cntrs.port_xmit_discards;
+ idev->z_port_xmit_data = cntrs.port_xmit_data;
+ idev->z_port_rcv_data = cntrs.port_rcv_data;
+ idev->z_port_xmit_packets = cntrs.port_xmit_packets;
+ idev->z_port_rcv_packets = cntrs.port_rcv_packets;
idev->z_local_link_integrity_errors =
cntrs.local_link_integrity_errors;
idev->z_excessive_buffer_overrun_errors =
diff -r ec1934faf5d1 -r f8debae94d44 drivers/infiniband/hw/ipath/ipath_verbs.h
--- a/drivers/infiniband/hw/ipath/ipath_verbs.h Fri May 12 15:55:28 2006 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_verbs.h Fri May 12 15:55:28 2006 -0700
@@ -448,17 +448,17 @@ struct ipath_ibdev {
u64 n_unicast_rcv; /* total unicast packets received */
u64 n_multicast_xmit; /* total multicast packets sent */
u64 n_multicast_rcv; /* total multicast packets received */
- u64 n_symbol_error_counter; /* starting count for PMA */
- u64 n_link_error_recovery_counter; /* starting count for PMA */
- u64 n_link_downed_counter; /* starting count for PMA */
- u64 n_port_rcv_errors; /* starting count for PMA */
- u64 n_port_rcv_remphys_errors; /* starting count for PMA */
- u64 n_port_xmit_discards; /* starting count for PMA */
- u64 n_port_xmit_data; /* starting count for PMA */
- u64 n_port_rcv_data; /* starting count for PMA */
- u64 n_port_xmit_packets; /* starting count for PMA */
- u64 n_port_rcv_packets; /* starting count for PMA */
- u32 n_pkey_violations; /* starting count for PMA */
+ u64 z_symbol_error_counter; /* starting count for PMA */
+ u64 z_link_error_recovery_counter; /* starting count for PMA */
+ u64 z_link_downed_counter; /* starting count for PMA */
+ u64 z_port_rcv_errors; /* starting count for PMA */
+ u64 z_port_rcv_remphys_errors; /* starting count for PMA */
+ u64 z_port_xmit_discards; /* starting count for PMA */
+ u64 z_port_xmit_data; /* starting count for PMA */
+ u64 z_port_rcv_data; /* starting count for PMA */
+ u64 z_port_xmit_packets; /* starting count for PMA */
+ u64 z_port_rcv_packets; /* starting count for PMA */
+ u32 z_pkey_violations; /* starting count for PMA */
u32 z_local_link_integrity_errors; /* starting count for PMA */
u32 z_excessive_buffer_overrun_errors; /* starting count for PMA */
u32 n_rc_resends;

2006-05-12 23:56:22

by Bryan O'Sullivan

Subject: [PATCH 47 of 53] ipath - fix problem with lost interrupts on HT-400

On the HT-400 GPIO workaround, clearing the chip interrupt can race
with another interrupt that is about to be delivered, so we can clear
it before it is ever delivered. By adding an extra check for the
in-memory tail register having been updated while we were processing
earlier packets, we "almost" guarantee that case is covered.
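
Schematically, the receive loop gains one bounded re-check of the
in-memory tail; in the real code this only runs when
ipath_rhdrhead_intr_off is zero, i.e. on the HT-400 workaround path.
A sketch with hypothetical helpers:

static unsigned int read_tail(void)                { return 0; }  /* stub */
static unsigned int process_packet(unsigned int h) { return h; }  /* stub */

static void kreceive_sketch(unsigned int head)
{
        unsigned int hdrqtail = read_tail();
        int reloop = 0;

reloop:
        while (head != hdrqtail)
                head = process_packet(head);
        if (!reloop) {
                unsigned int now = read_tail();

                if (now != hdrqtail) {  /* arrived while we worked */
                        hdrqtail = now;
                        reloop = 1;     /* at most one extra pass */
                        goto reloop;
                }
        }
}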

Signed-off-by: Bryan O'Sullivan <[email protected]>

diff -r 04c86dd11b27 -r a1615956e57f drivers/infiniband/hw/ipath/ipath_driver.c
--- a/drivers/infiniband/hw/ipath/ipath_driver.c Fri May 12 15:55:29 2006 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_driver.c Fri May 12 15:55:29 2006 -0700
@@ -858,7 +858,7 @@ void ipath_kreceive(struct ipath_devdata
const u32 maxcnt = dd->ipath_rcvhdrcnt * rsize; /* words */
u32 etail = -1, l, hdrqtail;
struct ips_message_header *hdr;
- u32 eflags, i, etype, tlen, pkttot = 0, updegr=0;
+ u32 eflags, i, etype, tlen, pkttot = 0, updegr=0, reloop=0;
static u64 totcalls; /* stats, may eventually remove */
char emsg[128];

@@ -873,12 +873,11 @@ void ipath_kreceive(struct ipath_devdata
goto bail;

l = dd->ipath_port0head;
- if(l == (u32)le64_to_cpu(*dd->ipath_hdrqtailptr))
+ hdrqtail = (u32)le64_to_cpu(*dd->ipath_hdrqtailptr);
+ if(l == hdrqtail)
goto done;

- /* read only once at start for performance */
- hdrqtail = (u32)le64_to_cpu(*dd->ipath_hdrqtailptr);
-
+reloop:
for (i = 0; l != hdrqtail; i++) {
u32 qp;
u8 *bthbytes;
@@ -1013,16 +1012,34 @@ void ipath_kreceive(struct ipath_devdata
*/
if(l == hdrqtail || (i && !(i&0xf))) {
u64 lval;
- if(l == hdrqtail) /* want interrupt only on last */
+ if(l == hdrqtail) {
+ /* PE-800 interrupt only on last */
lval = dd->ipath_rhdrhead_intr_off | l;
+ }
else
lval = l;
(void)ipath_write_ureg(dd, ur_rcvhdrhead, lval, 0);
if(updegr) {
- (void)ipath_write_ureg(dd, ur_rcvegrindexhead,
+ ipath_write_ureg(dd, ur_rcvegrindexhead,
etail, 0);
updegr = 0;
}
+ }
+ }
+ if(!dd->ipath_rhdrhead_intr_off && !reloop) {
+ /* HT-400 workaround; we can have a race clearing chip
+ * interrupt with another interrupt about to be delivered,
+ * and can clear it before it is delivered on the GPIO
+ * workaround. By doing the extra check here for the
+ * in-memory tail register updating while we were doing
+ * earlier packets, we "almost" guarantee we have covered
+ * that case.
+ */
+ u32 hqtail = (u32)le64_to_cpu(*dd->ipath_hdrqtailptr);
+ if(hqtail != hdrqtail) {
+ hdrqtail = hqtail;
+ reloop = 1; /* loop 1 extra time at most */
+ goto reloop;
}
}

diff -r 04c86dd11b27 -r a1615956e57f drivers/infiniband/hw/ipath/ipath_intr.c
--- a/drivers/infiniband/hw/ipath/ipath_intr.c Fri May 12 15:55:29 2006 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_intr.c Fri May 12 15:55:29 2006 -0700
@@ -761,13 +761,14 @@ static void handle_urcv(struct ipath_dev
}


+
irqreturn_t ipath_intr(int irq, void *data, struct pt_regs *regs)
{
struct ipath_devdata *dd = data;
u32 istat, chk0rcv = 0;
ipath_err_t estat = 0;
irqreturn_t ret;
- u32 p0bits, oldhead;
+ u32 oldhead, curtail;
static unsigned unexpected = 0;
static const u32 port0rbits = (1U<<INFINIPATH_I_RCVAVAIL_SHIFT) |
(1U<<INFINIPATH_I_RCVURG_SHIFT);
@@ -810,15 +811,16 @@ irqreturn_t ipath_intr(int irq, void *da
* lose intr for later packets that arrive while we are processing.
*/
oldhead = dd->ipath_port0head;
- if (oldhead != (u32)le64_to_cpu(*dd->ipath_hdrqtailptr)) {
+ curtail = (u32)le64_to_cpu(*dd->ipath_hdrqtailptr);
+ if (oldhead != curtail) {
if(dd->ipath_flags & IPATH_GPIO_INTR) {
ipath_write_kreg(dd, dd->ipath_kregs->kr_gpio_clear,
(u64) (1 << 2));
- p0bits = port0rbits | INFINIPATH_I_GPIO;
+ istat = port0rbits | INFINIPATH_I_GPIO;
}
else
- p0bits = port0rbits;
- ipath_write_kreg(dd, dd->ipath_kregs->kr_intclear, p0bits);
+ istat = port0rbits;
+ ipath_write_kreg(dd, dd->ipath_kregs->kr_intclear, istat);
ipath_kreceive(dd);
if(oldhead != dd->ipath_port0head) {
ipath_stats.sps_fastrcvint++;
@@ -827,7 +829,6 @@ irqreturn_t ipath_intr(int irq, void *da
}

istat = ipath_read_kreg32(dd, dd->ipath_kregs->kr_intstatus);
- p0bits = port0rbits;

if (unlikely(!istat)) {
ipath_stats.sps_nullintr++;
@@ -890,19 +891,19 @@ irqreturn_t ipath_intr(int irq, void *da
else {
/* Clear GPIO status bit 2 */
ipath_write_kreg(dd, dd->ipath_kregs->kr_gpio_clear,
- (u64) (1 << 2));
- p0bits |= INFINIPATH_I_GPIO;
+ (u64) (1 << 2));
chk0rcv = 1;
}
}
- chk0rcv |= istat & p0bits;
-
- /*
- * clear the ones we will deal with on this round
- * We clear it early, mostly for receive interrupts, so we
- * know the chip will have seen this by the time we process
- * the queue, and will re-interrupt if necessary. The processor
- * itself won't take the interrupt again until we return.
+ chk0rcv |= istat & port0rbits;
+
+ /*
+ * Clear the interrupt bits we found set, unless they are receive
+ * related, in which case we already cleared them above, and don't
+ * want to clear them again, because we might lose an interrupt.
+ * Clear it early, so we "know" the chip will have seen this by
+ * the time we process the queue, and will re-interrupt if necessary.
+ * The processor itself won't take the interrupt again until we return.
*/
ipath_write_kreg(dd, dd->ipath_kregs->kr_intclear, istat);

2006-05-12 23:45:03

by Bryan O'Sullivan

Subject: [PATCH 18 of 53] ipath - make max mcast sizes configurable

Make the max IB mcast sizes configurable.
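
Each limit follows the same three-line pattern the driver already uses
for its other tunables; a minimal kernel-module sketch with an invented
parameter name:

#include <linux/module.h>
#include <linux/moduleparam.h>
#include <linux/stat.h>

static unsigned int example_max_widgets = 16384;
module_param_named(max_widgets, example_max_widgets, uint,
                   S_IWUSR | S_IRUGO);
MODULE_PARM_DESC(max_widgets, "Maximum number of widgets to support");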

Signed-off-by: Bryan O'Sullivan <[email protected]>

diff -r c5f3731224bb -r df954e47ff67 drivers/infiniband/hw/ipath/ipath_verbs.c
--- a/drivers/infiniband/hw/ipath/ipath_verbs.c Fri May 12 15:55:28 2006 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_verbs.c Fri May 12 15:55:28 2006 -0700
@@ -81,6 +81,32 @@ unsigned int ib_ipath_max_sges = 0xFF;
unsigned int ib_ipath_max_sges = 0xFF;
module_param_named(max_sges, ib_ipath_max_sges, uint, S_IWUSR | S_IRUGO);
MODULE_PARM_DESC(max_sges, "Maximum number of SGEs to support");
+
+unsigned int ib_ipath_max_mcast_grps = 16384;
+module_param_named(max_mcast_grps, ib_ipath_max_mcast_grps, uint,
+ S_IWUSR | S_IRUGO);
+MODULE_PARM_DESC(max_mcast_grps,
+ "Maximum number of multicast groups to support");
+
+unsigned int ib_ipath_max_mcast_qp_attached = 16;
+module_param_named(max_mcast_qp_attached, ib_ipath_max_mcast_qp_attached,
+ uint, S_IWUSR | S_IRUGO);
+MODULE_PARM_DESC(max_mcast_qp_attached,
+ "Maximum number of attached QPs to support");
+
+unsigned int ib_ipath_max_srqs = 1024;
+module_param_named(max_srqs, ib_ipath_max_srqs, uint, S_IWUSR | S_IRUGO);
+MODULE_PARM_DESC(max_srqs, "Maximum number of SRQs to support");
+
+unsigned int ib_ipath_max_srq_sges = 128;
+module_param_named(max_srq_sges, ib_ipath_max_srq_sges,
+ uint, S_IWUSR | S_IRUGO);
+MODULE_PARM_DESC(max_srq_sges, "Maximum number of SRQ SGEs to support");
+
+unsigned int ib_ipath_max_srq_wrs = 0x1FFFF;
+module_param_named(max_srq_wrs, ib_ipath_max_srq_wrs,
+ uint, S_IWUSR | S_IRUGO);
+MODULE_PARM_DESC(max_srq_wrs, "Maximum number of SRQ WRs to support");

MODULE_LICENSE("GPL");
MODULE_AUTHOR("PathScale <[email protected]>");
@@ -621,14 +647,14 @@ static int ipath_query_device(struct ib_
props->max_qp_rd_atom = 1;
props->max_qp_init_rd_atom = 1;
/* props->max_res_rd_atom */
- props->max_srq = 0xffff;
- props->max_srq_wr = 0xffff;
- props->max_srq_sge = 255;
+ props->max_srq = ib_ipath_max_srqs;
+ props->max_srq_wr = ib_ipath_max_srq_wrs;
+ props->max_srq_sge = ib_ipath_max_srq_sges;
/* props->local_ca_ack_delay */
props->atomic_cap = IB_ATOMIC_HCA;
props->max_pkeys = ipath_layer_get_npkeys(dev->dd);
- props->max_mcast_grp = 0xffff;
- props->max_mcast_qp_attach = 0xffff;
+ props->max_mcast_grp = ib_ipath_max_mcast_grps;
+ props->max_mcast_qp_attach = ib_ipath_max_mcast_qp_attached;
props->max_total_mcast_qp_attach = props->max_mcast_qp_attach *
props->max_mcast_grp;

diff -r c5f3731224bb -r df954e47ff67 drivers/infiniband/hw/ipath/ipath_verbs.h
--- a/drivers/infiniband/hw/ipath/ipath_verbs.h Fri May 12 15:55:28 2006 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_verbs.h Fri May 12 15:55:28 2006 -0700
@@ -148,6 +148,7 @@ struct ipath_mcast {
struct list_head qp_list;
wait_queue_head_t wait;
atomic_t refcount;
+ int n_attached;
};

/* Memory region */
@@ -434,6 +435,7 @@ struct ipath_ibdev {
u32 n_pds_allocated; /* number of PDs allocated for device */
u32 n_ahs_allocated; /* number of AHs allocated for device */
u32 n_cqs_allocated; /* number of CQs allocated for device */
+ u32 n_mcast_grps_allocated; /* number of mcast groups allocated */
u64 ipath_sword; /* total dwords sent (sample result) */
u64 ipath_rword; /* total dwords received (sample result) */
u64 ipath_spkts; /* total packets sent (sample result) */
@@ -699,6 +701,16 @@ extern unsigned int ib_ipath_max_qp_wrs;

extern unsigned int ib_ipath_max_sges;

+extern unsigned int ib_ipath_max_mcast_grps;
+
+extern unsigned int ib_ipath_max_mcast_qp_attached;
+
+extern unsigned int ib_ipath_max_srqs;
+
+extern unsigned int ib_ipath_max_srq_sges;
+
+extern unsigned int ib_ipath_max_srq_wrs;
+
extern const u32 ib_ipath_rnr_table[];

#endif /* IPATH_VERBS_H */
diff -r c5f3731224bb -r df954e47ff67 drivers/infiniband/hw/ipath/ipath_verbs_mcast.c
--- a/drivers/infiniband/hw/ipath/ipath_verbs_mcast.c Fri May 12 15:55:28 2006 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_verbs_mcast.c Fri May 12 15:55:28 2006 -0700
@@ -92,6 +92,7 @@ static struct ipath_mcast *ipath_mcast_a
INIT_LIST_HEAD(&mcast->qp_list);
init_waitqueue_head(&mcast->wait);
atomic_set(&mcast->refcount, 0);
+ mcast->n_attached = 0;

bail:
return mcast;
@@ -157,7 +158,8 @@ bail:
* the table but the QP was added. Return ESRCH if the QP was already
* attached and neither structure was added.
*/
-static int ipath_mcast_add(struct ipath_mcast *mcast,
+static int ipath_mcast_add(struct ipath_ibdev *dev,
+ struct ipath_mcast *mcast,
struct ipath_mcast_qp *mqp)
{
struct rb_node **n = &mcast_tree.rb_node;
@@ -188,16 +190,28 @@ static int ipath_mcast_add(struct ipath_
/* Search the QP list to see if this is already there. */
list_for_each_entry_rcu(p, &tmcast->qp_list, list) {
if (p->qp == mqp->qp) {
- spin_unlock_irqrestore(&mcast_lock, flags);
ret = ESRCH;
goto bail;
}
}
+ if (tmcast->n_attached == ib_ipath_max_mcast_qp_attached) {
+ ret = ENOMEM;
+ goto bail;
+ }
+
+ tmcast->n_attached++;
+
list_add_tail_rcu(&mqp->list, &tmcast->qp_list);
- spin_unlock_irqrestore(&mcast_lock, flags);
ret = EEXIST;
goto bail;
}
+
+ if (dev->n_mcast_grps_allocated == ib_ipath_max_mcast_grps) {
+ ret = ENOMEM;
+ goto bail;
+ }
+
+ dev->n_mcast_grps_allocated++;

list_add_tail_rcu(&mqp->list, &mcast->qp_list);

@@ -205,17 +219,18 @@ static int ipath_mcast_add(struct ipath_
rb_link_node(&mcast->rb_node, pn, n);
rb_insert_color(&mcast->rb_node, &mcast_tree);

+ ret = 0;
+
+bail:
spin_unlock_irqrestore(&mcast_lock, flags);

- ret = 0;
-
-bail:
return ret;
}

int ipath_multicast_attach(struct ib_qp *ibqp, union ib_gid *gid, u16 lid)
{
struct ipath_qp *qp = to_iqp(ibqp);
+ struct ipath_ibdev *dev = to_idev(ibqp->device);
struct ipath_mcast *mcast;
struct ipath_mcast_qp *mqp;
int ret;
@@ -235,7 +250,7 @@ int ipath_multicast_attach(struct ib_qp
ret = -ENOMEM;
goto bail;
}
- switch (ipath_mcast_add(mcast, mqp)) {
+ switch (ipath_mcast_add(dev, mcast, mqp)) {
case ESRCH:
/* Neither was used: can't attach the same QP twice. */
ipath_mcast_qp_free(mqp);
@@ -245,6 +260,12 @@ int ipath_multicast_attach(struct ib_qp
case EEXIST: /* The mcast wasn't used */
ipath_mcast_free(mcast);
break;
+ case ENOMEM:
+ /* Exceeded the maximum number of mcast groups. */
+ ipath_mcast_qp_free(mqp);
+ ipath_mcast_free(mcast);
+ ret = -ENOMEM;
+ goto bail;
default:
break;
}
@@ -258,6 +279,7 @@ int ipath_multicast_detach(struct ib_qp
int ipath_multicast_detach(struct ib_qp *ibqp, union ib_gid *gid, u16 lid)
{
struct ipath_qp *qp = to_iqp(ibqp);
+ struct ipath_ibdev *dev = to_idev(ibqp->device);
struct ipath_mcast *mcast = NULL;
struct ipath_mcast_qp *p, *tmp;
struct rb_node *n;
@@ -296,6 +318,7 @@ int ipath_multicast_detach(struct ib_qp
* link until we are sure there are no list walkers.
*/
list_del_rcu(&p->list);
+ mcast->n_attached--;

/* If this was the last attached QP, remove the GID too. */
if (list_empty(&mcast->qp_list)) {
@@ -319,6 +342,7 @@ int ipath_multicast_detach(struct ib_qp
atomic_dec(&mcast->refcount);
wait_event(mcast->wait, !atomic_read(&mcast->refcount));
ipath_mcast_free(mcast);
+ dev->n_mcast_grps_allocated--;
}

ret = 0;

2006-05-12 23:56:23

by Bryan O'Sullivan

Subject: [PATCH 49 of 53] ipath - NULL-terminate pci_device_id table

Signed-off-by: Bryan O'Sullivan <[email protected]>
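
The PCI core and the module device-table machinery walk this table
until they hit an all-zero entry, so the sentinel is required. A sketch
with hypothetical vendor/device IDs:

#include <linux/module.h>
#include <linux/pci.h>

static const struct pci_device_id example_pci_tbl[] = {
        { PCI_DEVICE(0x1fc1, 0x000d) }, /* hypothetical IDs */
        { 0 }   /* sentinel: terminates the table walk */
};
MODULE_DEVICE_TABLE(pci, example_pci_tbl);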

diff -r 49b446b12f16 -r 40532fdc53f0 drivers/infiniband/hw/ipath/ipath_driver.c
--- a/drivers/infiniband/hw/ipath/ipath_driver.c Fri May 12 15:55:29 2006 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_driver.c Fri May 12 15:55:29 2006 -0700
@@ -120,6 +120,7 @@ static const struct pci_device_id ipath_
PCI_DEVICE_ID_INFINIPATH_HT)},
{PCI_DEVICE(PCI_VENDOR_ID_PATHSCALE,
PCI_DEVICE_ID_INFINIPATH_PE800)},
+ {0}
};

MODULE_DEVICE_TABLE(pci, ipath_pci_tbl);

2006-05-12 23:57:14

by Bryan O'Sullivan

Subject: [PATCH 40 of 53] ipath - remember to drop spinlock

Signed-off-by: Bryan O'Sullivan <[email protected]>

diff -r 5b565c24d62a -r 160a111381ae drivers/infiniband/hw/ipath/ipath_rc.c
--- a/drivers/infiniband/hw/ipath/ipath_rc.c Fri May 12 15:55:28 2006 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_rc.c Fri May 12 15:55:29 2006 -0700
@@ -1505,8 +1505,10 @@ void ipath_rc_rcv(struct ipath_ibdev *de
ok = ipath_rkey_ok(dev, &qp->s_rdma_sge,
qp->s_rdma_len, vaddr, rkey,
IB_ACCESS_REMOTE_READ);
- if (unlikely(!ok))
+ if (unlikely(!ok)) {
+ spin_unlock_irq(&qp->s_lock);
goto nack_acc;
+ }
/*
* Update the next expected PSN. We add 1 later
* below, so only add the remainder here.

2006-05-12 23:57:11

by Bryan O'Sullivan

Subject: [PATCH 42 of 53] ipath - increment pointers properly when doing diag reads and writes

Signed-off-by: Bryan O'Sullivan <[email protected]>
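
The bug is a pointer-arithmetic stride mismatch: reg_addr is a u64
pointer, so ++ advances it eight bytes, while uaddr is held as a
byte-sized user address, so ++ advanced it only one byte. A
compile-anywhere illustration:

#include <stdio.h>
#include <stdint.h>

int main(void)
{
        uint64_t regs[2];
        uint64_t *reg_addr = regs;
        char *uaddr = (char *)regs;     /* byte-typed, like the
                                           user-space address */

        reg_addr++;                     /* moves sizeof(u64) == 8 bytes */
        uaddr++;                        /* moves 1 byte: the bug */
        printf("reg_addr moved %td bytes, uaddr moved %td\n",
               (char *)reg_addr - (char *)regs, uaddr - (char *)regs);
        return 0;
}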

diff -r 83f1832c6015 -r 0aba84dce506 drivers/infiniband/hw/ipath/ipath_diag.c
--- a/drivers/infiniband/hw/ipath/ipath_diag.c Fri May 12 15:55:29 2006 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_diag.c Fri May 12 15:55:29 2006 -0700
@@ -113,7 +113,7 @@ static int ipath_read_umem64(struct ipat
goto bail;
}
reg_addr++;
- uaddr++;
+ uaddr += sizeof(u64);
}
ret = 0;
bail:
@@ -153,7 +153,7 @@ static int ipath_write_umem64(struct ipa
writeq(data, reg_addr);

reg_addr++;
- uaddr++;
+ uaddr += sizeof(u64);
}
ret = 0;
bail:
@@ -191,7 +191,8 @@ static int ipath_read_umem32(struct ipat
}

reg_addr++;
- uaddr++;
+ uaddr += sizeof(u32);
+
}
ret = 0;
bail:
@@ -230,7 +231,7 @@ static int ipath_write_umem32(struct ipa
writel(data, reg_addr);

reg_addr++;
- uaddr++;
+ uaddr += sizeof(u32);
}
ret = 0;
bail:

2006-05-12 23:57:55

by Bryan O'Sullivan

Subject: [PATCH 25 of 53] ipath - remove some duplicated lines of code

Cosmetic fixes.

Signed-off-by: Bryan O'Sullivan <[email protected]>

diff -r e468ad0bd83e -r 2b7918a7133e drivers/infiniband/hw/ipath/ipath_ht400.c
--- a/drivers/infiniband/hw/ipath/ipath_ht400.c Fri May 12 15:55:28 2006 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_ht400.c Fri May 12 15:55:28 2006 -0700
@@ -1555,7 +1555,6 @@ void ipath_init_ht400_funcs(struct ipath
dd->ipath_f_reset = ipath_setup_ht_reset;
dd->ipath_f_get_boardname = ipath_ht_boardname;
dd->ipath_f_init_hwerrors = ipath_ht_init_hwerrors;
- dd->ipath_f_init_hwerrors = ipath_ht_init_hwerrors;
dd->ipath_f_early_init = ipath_ht_early_init;
dd->ipath_f_handle_hwerrors = ipath_ht_handle_hwerrors;
dd->ipath_f_quiet_serdes = ipath_ht_quiet_serdes;
diff -r e468ad0bd83e -r 2b7918a7133e drivers/infiniband/hw/ipath/ipath_qp.c
--- a/drivers/infiniband/hw/ipath/ipath_qp.c Fri May 12 15:55:28 2006 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_qp.c Fri May 12 15:55:28 2006 -0700
@@ -513,9 +513,6 @@ int ipath_modify_qp(struct ib_qp *ibqp,
if (attr_mask & IB_QP_QKEY)
qp->qkey = attr->qkey;

- if (attr_mask & IB_QP_PKEY_INDEX)
- qp->s_pkey_index = attr->pkey_index;
-
qp->state = new_state;
spin_unlock_irqrestore(&qp->s_lock, flags);

2006-05-12 23:58:00

by Bryan O'Sullivan

Subject: [PATCH 26 of 53] ipath - treat PE800 rev1 and rev2 as similar

Signed-off-by: Bryan O'Sullivan <[email protected]>

diff -r 2b7918a7133e -r 8e2d63833cf2 drivers/infiniband/hw/ipath/ipath_pe800.c
--- a/drivers/infiniband/hw/ipath/ipath_pe800.c Fri May 12 15:55:28 2006 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_pe800.c Fri May 12 15:55:28 2006 -0700
@@ -532,7 +532,7 @@ static int ipath_pe_boardname(struct ipa
if (n)
snprintf(name, namelen, "%s", n);

- if (dd->ipath_majrev != 4 || dd->ipath_minrev != 1) {
+ if (dd->ipath_majrev != 4 || !dd->ipath_minrev || dd->ipath_minrev>2) {
ipath_dev_err(dd, "Unsupported PE-800 revision %u.%u!\n",
dd->ipath_majrev, dd->ipath_minrev);
ret = 1;

2006-05-12 23:58:01

by Bryan O'Sullivan

Subject: [PATCH 21 of 53] ipath - use phys_to_virt instead of bus_to_virt

I think Roland already has this patch.

diff -r 201654fe1962 -r 4e0a07d20868 drivers/infiniband/hw/ipath/ipath_keys.c
--- a/drivers/infiniband/hw/ipath/ipath_keys.c Fri May 12 15:55:28 2006 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_keys.c Fri May 12 15:55:28 2006 -0700
@@ -126,11 +126,11 @@ int ipath_lkey_ok(struct ipath_lkey_tabl
/*
* We use LKEY == zero to mean a physical kmalloc() address.
* This is a bit of a hack since we rely on dma_map_single()
- * being reversible by calling bus_to_virt().
+ * being reversible by calling phys_to_virt().
*/
if (sge->lkey == 0) {
isge->mr = NULL;
- isge->vaddr = bus_to_virt(sge->addr);
+ isge->vaddr = phys_to_virt(sge->addr);
isge->length = sge->length;
isge->sge_length = sge->length;
ret = 1;

2006-05-12 23:58:51

by Bryan O'Sullivan

Subject: [PATCH 19 of 53] ipath - replace uses of LIST_POISON

Per Andrew's request.

Signed-off-by: Bryan O'Sullivan <[email protected]>
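
The replacement idiom: initialize nodes with INIT_LIST_HEAD() so "on a
list" can be tested with list_empty(), and remove with list_del_init()
so the node is immediately testable again. Peeking at LIST_POISON1 only
works after a bare list_del(), which is what the old code relied on.
A kernel-style sketch (hypothetical struct, real list API):

#include <linux/list.h>

struct qp_like {
        struct list_head timerwait;
};

static void qp_init(struct qp_like *qp)
{
        INIT_LIST_HEAD(&qp->timerwait);         /* empty == not queued */
}

static void qp_dequeue(struct qp_like *qp)
{
        if (!list_empty(&qp->timerwait))        /* queued? */
                list_del_init(&qp->timerwait);  /* testable again */
}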

diff -r df954e47ff67 -r 947e92f4b370 drivers/infiniband/hw/ipath/ipath_qp.c
--- a/drivers/infiniband/hw/ipath/ipath_qp.c Fri May 12 15:55:28 2006 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_qp.c Fri May 12 15:55:28 2006 -0700
@@ -375,10 +375,10 @@ static void ipath_error_qp(struct ipath_

spin_lock(&dev->pending_lock);
/* XXX What if its already removed by the timeout code? */
- if (qp->timerwait.next != LIST_POISON1)
- list_del(&qp->timerwait);
- if (qp->piowait.next != LIST_POISON1)
- list_del(&qp->piowait);
+ if (!list_empty(&qp->timerwait))
+ list_del_init(&qp->timerwait);
+ if (!list_empty(&qp->piowait))
+ list_del_init(&qp->piowait);
spin_unlock(&dev->pending_lock);

wc.status = IB_WC_WR_FLUSH_ERR;
@@ -722,10 +722,8 @@ struct ib_qp *ipath_create_qp(struct ib_
init_attr->qp_type == IB_QPT_RC ?
ipath_do_rc_send : ipath_do_uc_send,
(unsigned long)qp);
- qp->piowait.next = LIST_POISON1;
- qp->piowait.prev = LIST_POISON2;
- qp->timerwait.next = LIST_POISON1;
- qp->timerwait.prev = LIST_POISON2;
+ INIT_LIST_HEAD(&qp->piowait);
+ INIT_LIST_HEAD(&qp->timerwait);
qp->state = IB_QPS_RESET;
qp->s_wq = swq;
qp->s_size = init_attr->cap.max_send_wr + 1;
@@ -795,10 +793,10 @@ int ipath_destroy_qp(struct ib_qp *ibqp)

/* Make sure the QP isn't on the timeout list. */
spin_lock_irqsave(&dev->pending_lock, flags);
- if (qp->timerwait.next != LIST_POISON1)
- list_del(&qp->timerwait);
- if (qp->piowait.next != LIST_POISON1)
- list_del(&qp->piowait);
+ if (!list_empty(&qp->timerwait))
+ list_del_init(&qp->timerwait);
+ if (!list_empty(&qp->piowait))
+ list_del_init(&qp->piowait);
spin_unlock_irqrestore(&dev->pending_lock, flags);

/*
@@ -867,10 +865,10 @@ void ipath_sqerror_qp(struct ipath_qp *q

spin_lock(&dev->pending_lock);
/* XXX What if its already removed by the timeout code? */
- if (qp->timerwait.next != LIST_POISON1)
- list_del(&qp->timerwait);
- if (qp->piowait.next != LIST_POISON1)
- list_del(&qp->piowait);
+ if (!list_empty(&qp->timerwait))
+ list_del_init(&qp->timerwait);
+ if (!list_empty(&qp->piowait))
+ list_del_init(&qp->piowait);
spin_unlock(&dev->pending_lock);

ipath_cq_enter(to_icq(qp->ibqp.send_cq), wc, 1);
diff -r df954e47ff67 -r 947e92f4b370 drivers/infiniband/hw/ipath/ipath_rc.c
--- a/drivers/infiniband/hw/ipath/ipath_rc.c Fri May 12 15:55:28 2006 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_rc.c Fri May 12 15:55:28 2006 -0700
@@ -57,7 +57,7 @@ static void ipath_init_restart(struct ip
qp->s_len = wqe->length - len;
dev = to_idev(qp->ibqp.device);
spin_lock(&dev->pending_lock);
- if (qp->timerwait.next == LIST_POISON1)
+ if (list_empty(&qp->timerwait))
list_add_tail(&qp->timerwait,
&dev->pending[dev->pending_index]);
spin_unlock(&dev->pending_lock);
@@ -356,7 +356,7 @@ static inline int ipath_make_rc_req(stru
if ((int)(qp->s_psn - qp->s_next_psn) > 0)
qp->s_next_psn = qp->s_psn;
spin_lock(&dev->pending_lock);
- if (qp->timerwait.next == LIST_POISON1)
+ if (list_empty(&qp->timerwait))
list_add_tail(&qp->timerwait,
&dev->pending[dev->pending_index]);
spin_unlock(&dev->pending_lock);
@@ -726,8 +726,8 @@ void ipath_restart_rc(struct ipath_qp *q
*/
dev = to_idev(qp->ibqp.device);
spin_lock(&dev->pending_lock);
- if (qp->timerwait.next != LIST_POISON1)
- list_del(&qp->timerwait);
+ if (!list_empty(&qp->timerwait))
+ list_del_init(&qp->timerwait);
spin_unlock(&dev->pending_lock);

if (wqe->wr.opcode == IB_WR_RDMA_READ)
@@ -886,8 +886,8 @@ static int do_rc_ack(struct ipath_qp *qp
* just won't find anything to restart if we ACK everything.
*/
spin_lock(&dev->pending_lock);
- if (qp->timerwait.next != LIST_POISON1)
- list_del(&qp->timerwait);
+ if (!list_empty(&qp->timerwait))
+ list_del_init(&qp->timerwait);
spin_unlock(&dev->pending_lock);

/*
@@ -1194,8 +1194,7 @@ static inline void ipath_rc_rcv_resp(str
IB_WR_RDMA_READ))
goto ack_done;
spin_lock(&dev->pending_lock);
- if (qp->s_rnr_timeout == 0 &&
- qp->timerwait.next != LIST_POISON1)
+ if (qp->s_rnr_timeout == 0 && !list_empty(&qp->timerwait))
list_move_tail(&qp->timerwait,
&dev->pending[dev->pending_index]);
spin_unlock(&dev->pending_lock);
diff -r df954e47ff67 -r 947e92f4b370 drivers/infiniband/hw/ipath/ipath_ruc.c
--- a/drivers/infiniband/hw/ipath/ipath_ruc.c Fri May 12 15:55:28 2006 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_ruc.c Fri May 12 15:55:28 2006 -0700
@@ -435,7 +435,7 @@ void ipath_no_bufs_available(struct ipat
unsigned long flags;

spin_lock_irqsave(&dev->pending_lock, flags);
- if (qp->piowait.next == LIST_POISON1)
+ if (list_empty(&qp->piowait))
list_add_tail(&qp->piowait, &dev->piowait);
spin_unlock_irqrestore(&dev->pending_lock, flags);
/*
diff -r df954e47ff67 -r 947e92f4b370 drivers/infiniband/hw/ipath/ipath_verbs.c
--- a/drivers/infiniband/hw/ipath/ipath_verbs.c Fri May 12 15:55:28 2006 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_verbs.c Fri May 12 15:55:28 2006 -0700
@@ -517,7 +517,7 @@ static void ipath_ib_timer(void *arg)
last = &dev->pending[dev->pending_index];
while (!list_empty(last)) {
qp = list_entry(last->next, struct ipath_qp, timerwait);
- list_del(&qp->timerwait);
+ list_del_init(&qp->timerwait);
qp->timer_next = resend;
resend = qp;
atomic_inc(&qp->refcount);
@@ -527,7 +527,7 @@ static void ipath_ib_timer(void *arg)
qp = list_entry(last->next, struct ipath_qp, timerwait);
if (--qp->s_rnr_timeout == 0) {
do {
- list_del(&qp->timerwait);
+ list_del_init(&qp->timerwait);
tasklet_hi_schedule(&qp->s_task);
if (list_empty(last))
break;
@@ -607,7 +607,7 @@ static int ipath_ib_piobufavail(void *ar
while (!list_empty(&dev->piowait)) {
qp = list_entry(dev->piowait.next, struct ipath_qp,
piowait);
- list_del(&qp->piowait);
+ list_del_init(&qp->piowait);
tasklet_hi_schedule(&qp->s_task);
}
spin_unlock_irqrestore(&dev->pending_lock, flags);

2006-05-12 23:58:51

by Bryan O'Sullivan

Subject: [PATCH 24 of 53] ipath - count dropped VL15 packets

We need to count these for IB conformance.
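
The PMA reply carries VL15 drops as a 16-bit field, so the 32-bit
running count is clamped on the way out, as the MAD hunk below does.
A one-function sketch of the saturation:

#include <stdint.h>

/* Report a running count through a saturating 16-bit PMA field. */
static uint16_t saturate16(uint32_t n)
{
        return n > 0xFFFFu ? 0xFFFF : (uint16_t)n;
}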

Signed-off-by: Bryan O'Sullivan <[email protected]>

diff -r 8b882bb46a32 -r e468ad0bd83e drivers/infiniband/hw/ipath/ipath_mad.c
--- a/drivers/infiniband/hw/ipath/ipath_mad.c Fri May 12 15:55:28 2006 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_mad.c Fri May 12 15:55:28 2006 -0700
@@ -646,6 +646,7 @@ struct ib_pma_portcounters {
#define IB_PMA_SEL_PORT_RCV_ERRORS __constant_htons(0x0008)
#define IB_PMA_SEL_PORT_RCV_REMPHYS_ERRORS __constant_htons(0x0010)
#define IB_PMA_SEL_PORT_XMIT_DISCARDS __constant_htons(0x0040)
+#define IB_PMA_SEL_PORT_VL15_DROPPED __constant_htons(0x0800)
#define IB_PMA_SEL_PORT_XMIT_DATA __constant_htons(0x1000)
#define IB_PMA_SEL_PORT_RCV_DATA __constant_htons(0x2000)
#define IB_PMA_SEL_PORT_XMIT_PACKETS __constant_htons(0x4000)
@@ -929,6 +930,10 @@ static int recv_pma_get_portcounters(str
else
p->port_xmit_discards =
cpu_to_be16((u16)cntrs.port_xmit_discards);
+ if (dev->n_vl15_dropped > 0xFFFFUL)
+ p->vl15_dropped = __constant_cpu_to_be16(0xFFFF);
+ else
+ p->vl15_dropped = cpu_to_be16((u16)dev->n_vl15_dropped);
if (cntrs.port_xmit_data > 0xFFFFFFFFUL)
p->port_xmit_data = __constant_cpu_to_be32(0xFFFFFFFF);
else
@@ -1022,6 +1027,9 @@ static int recv_pma_set_portcounters(str

if (p->counter_select & IB_PMA_SEL_PORT_XMIT_DISCARDS)
dev->n_port_xmit_discards = cntrs.port_xmit_discards;
+
+ if (p->counter_select & IB_PMA_SEL_PORT_VL15_DROPPED)
+ dev->n_vl15_dropped = 0;

if (p->counter_select & IB_PMA_SEL_PORT_XMIT_DATA)
dev->n_port_xmit_data = cntrs.port_xmit_data;
diff -r 8b882bb46a32 -r e468ad0bd83e drivers/infiniband/hw/ipath/ipath_ud.c
--- a/drivers/infiniband/hw/ipath/ipath_ud.c Fri May 12 15:55:28 2006 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_ud.c Fri May 12 15:55:28 2006 -0700
@@ -554,7 +554,11 @@ void ipath_ud_rcv(struct ipath_ibdev *de
spin_lock_irqsave(&rq->lock, flags);
if (rq->tail == rq->head) {
spin_unlock_irqrestore(&rq->lock, flags);
- dev->n_pkt_drops++;
+ /* Count VL15 packets dropped due to no receive buffer */
+ if (qp->ibqp.qp_num == 0)
+ dev->n_vl15_dropped++;
+ else
+ dev->n_pkt_drops++;
goto bail;
}
/* Silently drop packets which are too big. */
diff -r 8b882bb46a32 -r e468ad0bd83e drivers/infiniband/hw/ipath/ipath_verbs.h
--- a/drivers/infiniband/hw/ipath/ipath_verbs.h Fri May 12 15:55:28 2006 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_verbs.h Fri May 12 15:55:28 2006 -0700
@@ -468,6 +468,7 @@ struct ipath_ibdev {
u32 n_other_naks;
u32 n_timeouts;
u32 n_pkt_drops;
+ u32 n_vl15_dropped;
u32 n_wqe_errs;
u32 n_rdma_dup_busy;
u32 n_piowait;

2006-05-12 23:58:51

by Bryan O'Sullivan

Subject: [PATCH 29 of 53] ipath - remove redundant register read

Signed-off-by: Bryan O'Sullivan <[email protected]>

diff -r 47f1df66d097 -r 23519e578bf0 drivers/infiniband/hw/ipath/ipath_intr.c
--- a/drivers/infiniband/hw/ipath/ipath_intr.c Fri May 12 15:55:28 2006 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_intr.c Fri May 12 15:55:28 2006 -0700
@@ -780,7 +780,6 @@ irqreturn_t ipath_intr(int irq, void *da
ipath_stats.sps_fastrcvint++;
goto done;
}
- istat = ipath_read_kreg32(dd, dd->ipath_kregs->kr_intstatus);
}

istat = ipath_read_kreg32(dd, dd->ipath_kregs->kr_intstatus);

2006-05-12 23:59:50

by Bryan O'Sullivan

Subject: [PATCH 22 of 53] ipath - fix "many lost ticks" warning

Don't disable interrupts for long, or the kernel gets shirty.
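
The main structural change is lock narrowing: ipath_modify_qp() stops
holding r_rq.lock across the whole state transition, and the flush in
ipath_error_qp() takes it only around the receive-queue walk. A
kernel-style sketch of that shape (hypothetical struct, real spinlock
API):

#include <linux/spinlock.h>

struct qp_sketch {
        spinlock_t s_lock;      /* send-side state */
        spinlock_t r_lock;      /* receive queue */
};

/* Called with s_lock held; takes r_lock only for the brief flush. */
static void error_qp_sketch(struct qp_sketch *qp)
{
        /* ... flush send-side state under s_lock ... */
        spin_lock(&qp->r_lock);
        /* ... drain the receive queue ... */
        spin_unlock(&qp->r_lock);
}

static void modify_qp_sketch(struct qp_sketch *qp)
{
        unsigned long flags;

        spin_lock_irqsave(&qp->s_lock, flags);  /* one lock, held briefly */
        /* ... apply attribute changes ... */
        error_qp_sketch(qp);
        spin_unlock_irqrestore(&qp->s_lock, flags);
}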

Signed-off-by: Bryan O'Sullivan <[email protected]>

diff -r 4e0a07d20868 -r 1887e7b3e2a3 drivers/infiniband/hw/ipath/ipath_keys.c
--- a/drivers/infiniband/hw/ipath/ipath_keys.c Fri May 12 15:55:28 2006 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_keys.c Fri May 12 15:55:28 2006 -0700
@@ -120,6 +120,7 @@ int ipath_lkey_ok(struct ipath_lkey_tabl
struct ib_sge *sge, int acc)
{
struct ipath_mregion *mr;
+ unsigned n, m;
size_t off;
int ret;

@@ -151,20 +152,22 @@ int ipath_lkey_ok(struct ipath_lkey_tabl
}

off += mr->offset;
+ m = 0;
+ n = 0;
+ while (off >= mr->map[m]->segs[n].length) {
+ off -= mr->map[m]->segs[n].length;
+ n++;
+ if (n >= IPATH_SEGSZ) {
+ m++;
+ n = 0;
+ }
+ }
isge->mr = mr;
- isge->m = 0;
- isge->n = 0;
- while (off >= mr->map[isge->m]->segs[isge->n].length) {
- off -= mr->map[isge->m]->segs[isge->n].length;
- isge->n++;
- if (isge->n >= IPATH_SEGSZ) {
- isge->m++;
- isge->n = 0;
- }
- }
- isge->vaddr = mr->map[isge->m]->segs[isge->n].vaddr + off;
- isge->length = mr->map[isge->m]->segs[isge->n].length - off;
+ isge->vaddr = mr->map[m]->segs[n].vaddr + off;
+ isge->length = mr->map[m]->segs[n].length - off;
isge->sge_length = sge->length;
+ isge->m = m;
+ isge->n = n;

ret = 1;

@@ -189,6 +192,7 @@ int ipath_rkey_ok(struct ipath_ibdev *de
struct ipath_lkey_table *rkt = &dev->lk_table;
struct ipath_sge *sge = &ss->sge;
struct ipath_mregion *mr;
+ unsigned n, m;
size_t off;
int ret;

@@ -206,20 +210,22 @@ int ipath_rkey_ok(struct ipath_ibdev *de
}

off += mr->offset;
+ m = 0;
+ n = 0;
+ while (off >= mr->map[m]->segs[n].length) {
+ off -= mr->map[m]->segs[n].length;
+ n++;
+ if (n >= IPATH_SEGSZ) {
+ m++;
+ n = 0;
+ }
+ }
sge->mr = mr;
- sge->m = 0;
- sge->n = 0;
- while (off >= mr->map[sge->m]->segs[sge->n].length) {
- off -= mr->map[sge->m]->segs[sge->n].length;
- sge->n++;
- if (sge->n >= IPATH_SEGSZ) {
- sge->m++;
- sge->n = 0;
- }
- }
- sge->vaddr = mr->map[sge->m]->segs[sge->n].vaddr + off;
- sge->length = mr->map[sge->m]->segs[sge->n].length - off;
+ sge->vaddr = mr->map[m]->segs[n].vaddr + off;
+ sge->length = mr->map[m]->segs[n].length - off;
sge->sge_length = len;
+ sge->m = m;
+ sge->n = n;
ss->sg_list = NULL;
ss->num_sge = 1;

diff -r 4e0a07d20868 -r 1887e7b3e2a3 drivers/infiniband/hw/ipath/ipath_qp.c
--- a/drivers/infiniband/hw/ipath/ipath_qp.c Fri May 12 15:55:28 2006 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_qp.c Fri May 12 15:55:28 2006 -0700
@@ -332,10 +332,11 @@ static void ipath_reset_qp(struct ipath_
qp->remote_qpn = 0;
qp->qkey = 0;
qp->qp_access_flags = 0;
+ clear_bit(IPATH_S_BUSY, &qp->s_flags);
qp->s_hdrwords = 0;
qp->s_psn = 0;
qp->r_psn = 0;
- atomic_set(&qp->msn, 0);
+ qp->r_msn = 0;
if (qp->ibqp.qp_type == IB_QPT_RC) {
qp->s_state = IB_OPCODE_RC_SEND_LAST;
qp->r_state = IB_OPCODE_RC_SEND_LAST;
@@ -344,7 +345,8 @@ static void ipath_reset_qp(struct ipath_
qp->r_state = IB_OPCODE_UC_SEND_LAST;
}
qp->s_ack_state = IB_OPCODE_RC_ACKNOWLEDGE;
- qp->s_nak_state = 0;
+ qp->r_ack_state = IB_OPCODE_RC_ACKNOWLEDGE;
+ qp->r_nak_state = 0;
qp->s_rnr_timeout = 0;
qp->s_head = 0;
qp->s_tail = 0;
@@ -362,10 +364,10 @@ static void ipath_reset_qp(struct ipath_
* @qp: the QP to put into an error state
*
* Flushes both send and receive work queues.
- * QP r_rq.lock and s_lock should be held.
- */
-
-static void ipath_error_qp(struct ipath_qp *qp)
+ * QP s_lock should be held.
+ */
+
+void ipath_error_qp(struct ipath_qp *qp)
{
struct ipath_ibdev *dev = to_idev(qp->ibqp.device);
struct ib_wc wc;
@@ -408,12 +410,14 @@ static void ipath_error_qp(struct ipath_
qp->s_ack_state = IB_OPCODE_RC_ACKNOWLEDGE;

wc.opcode = IB_WC_RECV;
+ spin_lock(&qp->r_rq.lock);
while (qp->r_rq.tail != qp->r_rq.head) {
wc.wr_id = get_rwqe_ptr(&qp->r_rq, qp->r_rq.tail)->wr_id;
if (++qp->r_rq.tail >= qp->r_rq.size)
qp->r_rq.tail = 0;
ipath_cq_enter(to_icq(qp->ibqp.recv_cq), &wc, 1);
}
+ spin_unlock(&qp->r_rq.lock);
}

/**
@@ -433,8 +437,7 @@ int ipath_modify_qp(struct ib_qp *ibqp,
unsigned long flags;
int ret;

- spin_lock_irqsave(&qp->r_rq.lock, flags);
- spin_lock(&qp->s_lock);
+ spin_lock_irqsave(&qp->s_lock, flags);

cur_state = attr_mask & IB_QP_CUR_STATE ?
attr->cur_qp_state : qp->state;
@@ -505,7 +508,7 @@ int ipath_modify_qp(struct ib_qp *ibqp,
}

if (attr_mask & IB_QP_MIN_RNR_TIMER)
- qp->s_min_rnr_timer = attr->min_rnr_timer;
+ qp->r_min_rnr_timer = attr->min_rnr_timer;

if (attr_mask & IB_QP_QKEY)
qp->qkey = attr->qkey;
@@ -514,25 +517,13 @@ int ipath_modify_qp(struct ib_qp *ibqp,
qp->s_pkey_index = attr->pkey_index;

qp->state = new_state;
- spin_unlock(&qp->s_lock);
- spin_unlock_irqrestore(&qp->r_rq.lock, flags);
-
- /*
- * If QP1 changed to the RTS state, try to move to the link to INIT
- * even if it was ACTIVE so the SM will reinitialize the SMA's
- * state.
- */
- if (qp->ibqp.qp_num == 1 && new_state == IB_QPS_RTS) {
- struct ipath_ibdev *dev = to_idev(ibqp->device);
-
- ipath_layer_set_linkstate(dev->dd, IPATH_IB_LINKDOWN);
- }
+ spin_unlock_irqrestore(&qp->s_lock, flags);
+
ret = 0;
goto bail;

inval:
- spin_unlock(&qp->s_lock);
- spin_unlock_irqrestore(&qp->r_rq.lock, flags);
+ spin_unlock_irqrestore(&qp->s_lock, flags);
ret = -EINVAL;

bail:
@@ -566,7 +557,7 @@ int ipath_query_qp(struct ib_qp *ibqp, s
attr->sq_draining = 0;
attr->max_rd_atomic = 1;
attr->max_dest_rd_atomic = 1;
- attr->min_rnr_timer = qp->s_min_rnr_timer;
+ attr->min_rnr_timer = qp->r_min_rnr_timer;
attr->port_num = 1;
attr->timeout = 0;
attr->retry_cnt = qp->s_retry_cnt;
@@ -593,16 +584,12 @@ int ipath_query_qp(struct ib_qp *ibqp, s
* @qp: the queue pair to compute the AETH for
*
* Returns the AETH.
- *
- * The QP s_lock should be held.
*/
__be32 ipath_compute_aeth(struct ipath_qp *qp)
{
- u32 aeth = atomic_read(&qp->msn) & IPS_MSN_MASK;
-
- if (qp->s_nak_state) {
- aeth |= qp->s_nak_state << IPS_AETH_CREDIT_SHIFT;
- } else if (qp->ibqp.srq) {
+ u32 aeth = qp->r_msn & IPS_MSN_MASK;
+
+ if (qp->ibqp.srq) {
/*
* Shared receive queues don't generate credits.
* Set the credit field to the invalid value.
diff -r 4e0a07d20868 -r 1887e7b3e2a3 drivers/infiniband/hw/ipath/ipath_rc.c
--- a/drivers/infiniband/hw/ipath/ipath_rc.c Fri May 12 15:55:28 2006 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_rc.c Fri May 12 15:55:28 2006 -0700
@@ -41,7 +41,7 @@
* @qp: the QP who's SGE we're restarting
* @wqe: the work queue to initialize the QP's SGE from
*
- * The QP s_lock should be held.
+ * The QP s_lock should be held and interrupts disabled.
*/
static void ipath_init_restart(struct ipath_qp *qp, struct ipath_swqe *wqe)
{
@@ -64,7 +64,7 @@ static void ipath_init_restart(struct ip
}

/**
- * ipath_make_rc_ack - construct a response packet (ACK, NAK, or RDMA read)
+ * ipath_make_rc_ack - construct a RDMA read response packet
* @qp: a pointer to the QP
* @ohdr: a pointer to the IB header being constructed
* @pmtu: the path MTU
@@ -76,7 +76,6 @@ u32 ipath_make_rc_ack(struct ipath_qp *q
struct ipath_other_headers *ohdr,
u32 pmtu)
{
- struct ipath_sge_state *ss;
u32 hwords;
u32 len;
u32 bth0;
@@ -90,7 +89,6 @@ u32 ipath_make_rc_ack(struct ipath_qp *q
*/
switch (qp->s_ack_state) {
case OP(RDMA_READ_REQUEST):
- ss = &qp->s_rdma_sge;
len = qp->s_rdma_len;
if (len > pmtu) {
len = pmtu;
@@ -107,7 +105,6 @@ u32 ipath_make_rc_ack(struct ipath_qp *q
qp->s_ack_state = OP(RDMA_READ_RESPONSE_MIDDLE);
/* FALLTHROUGH */
case OP(RDMA_READ_RESPONSE_MIDDLE):
- ss = &qp->s_rdma_sge;
len = qp->s_rdma_len;
if (len > pmtu)
len = pmtu;
@@ -122,44 +119,18 @@ u32 ipath_make_rc_ack(struct ipath_qp *q

case OP(RDMA_READ_RESPONSE_LAST):
case OP(RDMA_READ_RESPONSE_ONLY):
+ default:
/*
* We have to prevent new requests from changing
* the r_sge state while a ipath_verbs_send()
* is in progress.
- * Changing r_state allows the receiver
- * to continue processing new packets.
- * We do it here now instead of above so
- * that we are sure the packet was sent before
- * changing the state.
- */
- qp->r_state = OP(RDMA_READ_RESPONSE_LAST);
+ */
qp->s_ack_state = OP(ACKNOWLEDGE);
bth0 = 0;
goto bail;
-
- case OP(COMPARE_SWAP):
- case OP(FETCH_ADD):
- ss = NULL;
- len = 0;
- qp->r_state = OP(SEND_LAST);
- qp->s_ack_state = OP(ACKNOWLEDGE);
- bth0 = OP(ATOMIC_ACKNOWLEDGE) << 24;
- ohdr->u.at.aeth = ipath_compute_aeth(qp);
- ohdr->u.at.atomic_ack_eth = cpu_to_be64(qp->s_ack_atomic);
- hwords += sizeof(ohdr->u.at) / 4;
- break;
-
- default:
- /* Send a regular ACK. */
- ss = NULL;
- len = 0;
- qp->s_ack_state = OP(ACKNOWLEDGE);
- bth0 = qp->s_ack_state << 24;
- ohdr->u.aeth = ipath_compute_aeth(qp);
- hwords++;
}
qp->s_hdrwords = hwords;
- qp->s_cur_sge = ss;
+ qp->s_cur_sge = &qp->s_rdma_sge;
qp->s_cur_size = len;

bail:
@@ -175,7 +146,7 @@ bail:
* @bth2p: pointer to the BTH PSN word
*
* Return 1 if constructed; otherwise, return 0.
- * Note the QP s_lock must be held.
+ * Note the QP s_lock must be held and interrupts disabled.
*/
int ipath_make_rc_req(struct ipath_qp *qp,
struct ipath_other_headers *ohdr,
@@ -532,11 +503,16 @@ static void send_rc_ack(struct ipath_qp
ohdr = &hdr.u.l.oth;
lrh0 = IPS_LRH_GRH;
}
+ /* read pkey_index w/o lock (its atomic) */
bth0 = ipath_layer_get_pkey(dev->dd, qp->s_pkey_index);
- ohdr->u.aeth = ipath_compute_aeth(qp);
- if (qp->s_ack_state >= OP(COMPARE_SWAP)) {
+ if (qp->r_nak_state)
+ ohdr->u.aeth = (qp->r_msn & IPS_MSN_MASK) |
+ (qp->r_nak_state << IPS_AETH_CREDIT_SHIFT);
+ else
+ ohdr->u.aeth = ipath_compute_aeth(qp);
+ if (qp->r_ack_state >= OP(COMPARE_SWAP)) {
bth0 |= OP(ATOMIC_ACKNOWLEDGE) << 24;
- ohdr->u.at.atomic_ack_eth = cpu_to_be64(qp->s_ack_atomic);
+ ohdr->u.at.atomic_ack_eth = cpu_to_be64(qp->r_atomic_data);
hwords += sizeof(ohdr->u.at.atomic_ack_eth) / 4;
} else
bth0 |= OP(ACKNOWLEDGE) << 24;
@@ -547,13 +523,13 @@ static void send_rc_ack(struct ipath_qp
hdr.lrh[3] = cpu_to_be16(ipath_layer_get_lid(dev->dd));
ohdr->bth[0] = cpu_to_be32(bth0);
ohdr->bth[1] = cpu_to_be32(qp->remote_qpn);
- ohdr->bth[2] = cpu_to_be32(qp->s_ack_psn & IPS_PSN_MASK);
+ ohdr->bth[2] = cpu_to_be32(qp->r_ack_psn & IPS_PSN_MASK);

/*
* If we can send the ACK, clear the ACK state.
*/
if (ipath_verbs_send(dev->dd, hwords, (u32 *) &hdr, 0, NULL) == 0) {
- qp->s_ack_state = OP(ACKNOWLEDGE);
+ qp->r_ack_state = OP(ACKNOWLEDGE);
dev->n_unicast_xmit++;
} else
dev->n_rc_qacks++;
@@ -647,7 +623,7 @@ done:
* @psn: packet sequence number for the request
* @wc: the work completion request
*
- * The QP s_lock should be held.
+ * The QP s_lock should be held and interrupts disabled.
*/
void ipath_restart_rc(struct ipath_qp *qp, u32 psn, struct ib_wc *wc)
{
@@ -711,7 +687,7 @@ bail:
*
* This is called from ipath_rc_rcv_resp() to process an incoming RC ACK
* for the given QP.
- * Called at interrupt level with the QP s_lock held.
+ * Called at interrupt level with the QP s_lock held and interrupts disabled.
* Returns 1 if OK, 0 if current operation should be aborted (NAK).
*/
static int do_rc_ack(struct ipath_qp *qp, u32 aeth, u32 psn, int opcode)
@@ -1125,8 +1101,6 @@ static inline int ipath_rc_rcv_error(str
{
struct ib_reth *reth;

- spin_lock(&qp->s_lock);
-
if (diff > 0) {
/*
* Packet sequence error.
@@ -1134,15 +1108,16 @@ static inline int ipath_rc_rcv_error(str
* Don't queue the NAK if a RDMA read, atomic, or
* NAK is pending though.
*/
- if ((qp->s_ack_state >= OP(RDMA_READ_REQUEST) &&
- qp->s_ack_state != OP(ACKNOWLEDGE)) ||
- qp->s_nak_state != 0)
+ if (qp->s_ack_state != OP(ACKNOWLEDGE) ||
+ qp->r_nak_state != 0)
goto done;
- qp->s_ack_state = OP(SEND_ONLY);
- qp->s_nak_state = IB_NAK_PSN_ERROR;
- /* Use the expected PSN. */
- qp->s_ack_psn = qp->r_psn;
- goto resched;
+ if (qp->r_ack_state < OP(COMPARE_SWAP)) {
+ qp->r_ack_state = OP(SEND_ONLY);
+ qp->r_nak_state = IB_NAK_PSN_ERROR;
+ /* Use the expected PSN. */
+ qp->r_ack_psn = qp->r_psn;
+ }
+ goto send_ack;
}

/*
@@ -1156,30 +1131,29 @@ static inline int ipath_rc_rcv_error(str
* send the earliest so that RDMA reads can be restarted at
* the requester's expected PSN.
*/
- if (qp->s_ack_state != OP(ACKNOWLEDGE) &&
- ipath_cmp24(psn, qp->s_ack_psn) >= 0) {
- if (qp->s_ack_state < OP(RDMA_READ_REQUEST))
- qp->s_ack_psn = psn;
- goto done;
- }
- switch (opcode) {
- case OP(RDMA_READ_REQUEST):
- /*
- * We have to be careful to not change s_rdma_sge
- * while ipath_do_rc_send() is using it and not
- * holding the s_lock.
- */
- if (qp->s_ack_state != OP(ACKNOWLEDGE) &&
- qp->s_ack_state >= OP(RDMA_READ_REQUEST)) {
- dev->n_rdma_dup_busy++;
- goto done;
- }
+ if (opcode == OP(RDMA_READ_REQUEST)) {
/* RETH comes after BTH */
if (!header_in_data)
reth = &ohdr->u.rc.reth;
else {
reth = (struct ib_reth *)data;
data += sizeof(*reth);
+ }
+ /*
+ * If we receive a duplicate RDMA request, it means the
+ * requester saw a sequence error and needs to restart
+ * from an earlier point. We can abort the current
+ * RDMA read send in that case.
+ */
+ spin_lock_irq(&qp->s_lock);
+ if (qp->s_ack_state != OP(ACKNOWLEDGE) &&
+ (qp->s_hdrwords || ipath_cmp24(psn, qp->s_ack_psn) >= 0)) {
+ /*
+ * We are already sending earlier requested data.
+ * Don't abort it to send later out of sequence data.
+ */
+ spin_unlock_irq(&qp->s_lock);
+ goto done;
}
qp->s_rdma_len = be32_to_cpu(reth->length);
if (qp->s_rdma_len != 0) {
@@ -1194,8 +1168,10 @@ static inline int ipath_rc_rcv_error(str
ok = ipath_rkey_ok(dev, &qp->s_rdma_sge,
qp->s_rdma_len, vaddr, rkey,
IB_ACCESS_REMOTE_READ);
- if (unlikely(!ok))
+ if (unlikely(!ok)) {
+ spin_unlock_irq(&qp->s_lock);
goto done;
+ }
} else {
qp->s_rdma_sge.sg_list = NULL;
qp->s_rdma_sge.num_sge = 0;
@@ -1204,8 +1180,30 @@ static inline int ipath_rc_rcv_error(str
qp->s_rdma_sge.sge.length = 0;
qp->s_rdma_sge.sge.sge_length = 0;
}
- break;
-
+ qp->s_ack_state = opcode;
+ qp->s_ack_psn = psn;
+ spin_unlock_irq(&qp->s_lock);
+ tasklet_hi_schedule(&qp->s_task);
+ goto send_ack;
+ }
+
+ /*
+ * A pending RDMA read will ACK anything before it so
+ * ignore earlier duplicate requests.
+ */
+ if (qp->s_ack_state != OP(ACKNOWLEDGE))
+ goto done;
+
+ /*
+ * If an ACK is pending, don't replace the pending ACK
+ * with an earlier one since the later one will ACK the earlier.
+ * Also, if we already have a pending atomic, send it.
+ */
+ if (qp->r_ack_state != OP(ACKNOWLEDGE) &&
+ (ipath_cmp24(psn, qp->r_ack_psn) <= 0 ||
+ qp->r_ack_state >= OP(COMPARE_SWAP)))
+ goto send_ack;
+ switch (opcode) {
case OP(COMPARE_SWAP):
case OP(FETCH_ADD):
/*
@@ -1214,17 +1212,15 @@ static inline int ipath_rc_rcv_error(str
*/
if ((psn & IPS_PSN_MASK) != qp->r_atomic_psn)
goto done;
- qp->s_ack_atomic = qp->r_atomic_data;
break;
}
- qp->s_ack_state = opcode;
- qp->s_nak_state = 0;
- qp->s_ack_psn = psn;
-resched:
+ qp->r_ack_state = opcode;
+ qp->r_nak_state = 0;
+ qp->r_ack_psn = psn;
+send_ack:
return 0;

done:
- spin_unlock(&qp->s_lock);
return 1;
}

@@ -1249,7 +1245,6 @@ void ipath_rc_rcv(struct ipath_ibdev *de
u32 hdrsize;
u32 psn;
u32 pad;
- unsigned long flags;
struct ib_wc wc;
u32 pmtu = ib_mtu_enum_to_int(qp->path_mtu);
int diff;
@@ -1290,10 +1285,8 @@ void ipath_rc_rcv(struct ipath_ibdev *de
opcode <= OP(ATOMIC_ACKNOWLEDGE)) {
ipath_rc_rcv_resp(dev, ohdr, data, tlen, qp, opcode, psn,
hdrsize, pmtu, header_in_data);
- goto bail;
- }
-
- spin_lock_irqsave(&qp->r_rq.lock, flags);
+ goto done;
+ }

/* Compute 24 bits worth of difference. */
diff = ipath_cmp24(psn, qp->r_psn);
@@ -1301,7 +1294,7 @@ void ipath_rc_rcv(struct ipath_ibdev *de
if (ipath_rc_rcv_error(dev, ohdr, data, qp, opcode,
psn, diff, header_in_data))
goto done;
- goto resched;
+ goto send_ack;
}

/* Check for opcode sequence errors. */
@@ -1318,18 +1311,14 @@ void ipath_rc_rcv(struct ipath_ibdev *de
* Don't queue the NAK if a RDMA read, atomic, or NAK
* is pending though.
*/
- spin_lock(&qp->s_lock);
- if (qp->s_ack_state >= OP(RDMA_READ_REQUEST) &&
- qp->s_ack_state != OP(ACKNOWLEDGE)) {
- spin_unlock(&qp->s_lock);
- goto done;
- }
+ if (qp->r_ack_state >= OP(COMPARE_SWAP))
+ goto send_ack;
/* XXX Flush WQEs */
qp->state = IB_QPS_ERR;
- qp->s_ack_state = OP(SEND_ONLY);
- qp->s_nak_state = IB_NAK_INVALID_REQUEST;
- qp->s_ack_psn = qp->r_psn;
- goto resched;
+ qp->r_ack_state = OP(SEND_ONLY);
+ qp->r_nak_state = IB_NAK_INVALID_REQUEST;
+ qp->r_ack_psn = qp->r_psn;
+ goto send_ack;

case OP(RDMA_WRITE_FIRST):
case OP(RDMA_WRITE_MIDDLE):
@@ -1338,20 +1327,6 @@ void ipath_rc_rcv(struct ipath_ibdev *de
opcode == OP(RDMA_WRITE_LAST_WITH_IMMEDIATE))
break;
goto nack_inv;
-
- case OP(RDMA_READ_REQUEST):
- case OP(COMPARE_SWAP):
- case OP(FETCH_ADD):
- /*
- * Drop all new requests until a response has been sent. A
- * new request then ACKs the RDMA response we sent. Relaxed
- * ordering would allow new requests to be processed but we
- * would need to keep a queue of rwqe's for all that are in
- * progress. Note that we can't RNR NAK this request since
- * the RDMA READ or atomic response is already queued to be
- * sent (unless we implement a response send queue).
- */
- goto done;

default:
if (opcode == OP(SEND_MIDDLE) ||
@@ -1361,6 +1336,11 @@ void ipath_rc_rcv(struct ipath_ibdev *de
opcode == OP(RDMA_WRITE_LAST) ||
opcode == OP(RDMA_WRITE_LAST_WITH_IMMEDIATE))
goto nack_inv;
+ /*
+ * Note that it is up to the requester to not send a new
+ * RDMA read or atomic operation before receiving an ACK
+ * for the previous operation.
+ */
break;
}

@@ -1377,16 +1357,12 @@ void ipath_rc_rcv(struct ipath_ibdev *de
* Don't queue the NAK if a RDMA read or atomic
* is pending though.
*/
- spin_lock(&qp->s_lock);
- if (qp->s_ack_state >= OP(RDMA_READ_REQUEST) &&
- qp->s_ack_state != OP(ACKNOWLEDGE)) {
- spin_unlock(&qp->s_lock);
- goto done;
- }
- qp->s_ack_state = OP(SEND_ONLY);
- qp->s_nak_state = IB_RNR_NAK | qp->s_min_rnr_timer;
- qp->s_ack_psn = qp->r_psn;
- goto resched;
+ if (qp->r_ack_state >= OP(COMPARE_SWAP))
+ goto send_ack;
+ qp->r_ack_state = OP(SEND_ONLY);
+ qp->r_nak_state = IB_RNR_NAK | qp->r_min_rnr_timer;
+ qp->r_ack_psn = qp->r_psn;
+ goto send_ack;
}
qp->r_rcv_len = 0;
/* FALLTHROUGH */
@@ -1443,7 +1419,7 @@ void ipath_rc_rcv(struct ipath_ibdev *de
if (unlikely(wc.byte_len > qp->r_len))
goto nack_inv;
ipath_copy_sge(&qp->r_sge, data, tlen);
- atomic_inc(&qp->msn);
+ qp->r_msn++;
if (opcode == OP(RDMA_WRITE_LAST) ||
opcode == OP(RDMA_WRITE_ONLY))
break;
@@ -1487,29 +1463,8 @@ void ipath_rc_rcv(struct ipath_ibdev *de
ok = ipath_rkey_ok(dev, &qp->r_sge,
qp->r_len, vaddr, rkey,
IB_ACCESS_REMOTE_WRITE);
- if (unlikely(!ok)) {
- nack_acc:
- /*
- * A NAK will ACK earlier sends and RDMA
- * writes. Don't queue the NAK if a RDMA
- * read, atomic, or NAK is pending though.
- */
- spin_lock(&qp->s_lock);
- nack_acc1:
- if (qp->s_ack_state >=
- OP(RDMA_READ_REQUEST) &&
- qp->s_ack_state != OP(ACKNOWLEDGE)) {
- spin_unlock(&qp->s_lock);
- goto done;
- }
- /* XXX Flush WQEs */
- qp->state = IB_QPS_ERR;
- qp->s_ack_state = OP(RDMA_WRITE_ONLY);
- qp->s_nak_state =
- IB_NAK_REMOTE_ACCESS_ERROR;
- qp->s_ack_psn = qp->r_psn;
- goto resched;
- }
+ if (unlikely(!ok))
+ goto nack_acc;
} else {
qp->r_sge.sg_list = NULL;
qp->r_sge.sge.mr = NULL;
@@ -1539,16 +1494,7 @@ void ipath_rc_rcv(struct ipath_ibdev *de
if (unlikely(!(qp->qp_access_flags &
IB_ACCESS_REMOTE_READ)))
goto nack_acc;
- /*
- * Ignore request if we already have an
- * RDMA read or ATOMIC pending.
- */
- spin_lock(&qp->s_lock);
- if (qp->s_ack_state != OP(ACKNOWLEDGE) &&
- qp->s_ack_state >= OP(RDMA_READ_REQUEST)) {
- spin_unlock(&qp->s_lock);
- goto done;
- }
+ spin_lock_irq(&qp->s_lock);
qp->s_rdma_len = be32_to_cpu(reth->length);
if (qp->s_rdma_len != 0) {
u32 rkey = be32_to_cpu(reth->rkey);
@@ -1560,7 +1506,7 @@ void ipath_rc_rcv(struct ipath_ibdev *de
qp->s_rdma_len, vaddr, rkey,
IB_ACCESS_REMOTE_READ);
if (unlikely(!ok))
- goto nack_acc1;
+ goto nack_acc;
/*
* Update the next expected PSN. We add 1 later
* below, so only add the remainder here.
@@ -1580,13 +1526,20 @@ void ipath_rc_rcv(struct ipath_ibdev *de
* finish sending the result since a duplicate request would
* increment it more than once.
*/
- atomic_inc(&qp->msn);
+ qp->r_msn++;
+
qp->s_ack_state = opcode;
- qp->s_nak_state = 0;
qp->s_ack_psn = psn;
+ spin_unlock_irq(&qp->s_lock);
+
qp->r_psn++;
qp->r_state = opcode;
- goto rdmadone;
+ qp->r_nak_state = 0;
+
+ /* Call ipath_do_rc_send() in another thread. */
+ tasklet_hi_schedule(&qp->s_task);
+
+ goto done;

case OP(COMPARE_SWAP):
case OP(FETCH_ADD): {
@@ -1615,7 +1568,7 @@ void ipath_rc_rcv(struct ipath_ibdev *de
goto nack_acc;
/* Perform atomic OP and save result. */
sdata = be64_to_cpu(ateth->swap_data);
- spin_lock(&dev->pending_lock);
+ spin_lock_irq(&dev->pending_lock);
qp->r_atomic_data = *(u64 *) qp->r_sge.sge.vaddr;
if (opcode == OP(FETCH_ADD))
*(u64 *) qp->r_sge.sge.vaddr =
@@ -1623,8 +1576,8 @@ void ipath_rc_rcv(struct ipath_ibdev *de
else if (qp->r_atomic_data ==
be64_to_cpu(ateth->compare_data))
*(u64 *) qp->r_sge.sge.vaddr = sdata;
- spin_unlock(&dev->pending_lock);
- atomic_inc(&qp->msn);
+ spin_unlock_irq(&dev->pending_lock);
+ qp->r_msn++;
qp->r_atomic_psn = psn & IPS_PSN_MASK;
psn |= 1 << 31;
break;
@@ -1636,46 +1589,39 @@ void ipath_rc_rcv(struct ipath_ibdev *de
}
qp->r_psn++;
qp->r_state = opcode;
+ qp->r_nak_state = 0;
/* Send an ACK if requested or required. */
if (psn & (1 << 31)) {
/*
* Coalesce ACKs unless there is a RDMA READ or
* ATOMIC pending.
*/
- spin_lock(&qp->s_lock);
- if (qp->s_ack_state == OP(ACKNOWLEDGE) ||
- qp->s_ack_state < OP(RDMA_READ_REQUEST)) {
- qp->s_ack_state = opcode;
- qp->s_nak_state = 0;
- qp->s_ack_psn = psn;
- qp->s_ack_atomic = qp->r_atomic_data;
- goto resched;
- }
- spin_unlock(&qp->s_lock);
- }
+ if (qp->r_ack_state < OP(COMPARE_SWAP)) {
+ qp->r_ack_state = opcode;
+ qp->r_ack_psn = psn;
+ }
+ goto send_ack;
+ }
+ goto done;
+
+nack_acc:
+ /*
+ * A NAK will ACK earlier sends and RDMA writes.
+ * Don't queue the NAK if a RDMA read, atomic, or NAK
+ * is pending though.
+ */
+ if (qp->r_ack_state < OP(COMPARE_SWAP)) {
+ /* XXX Flush WQEs */
+ qp->state = IB_QPS_ERR;
+ qp->r_ack_state = OP(RDMA_WRITE_ONLY);
+ qp->r_nak_state = IB_NAK_REMOTE_ACCESS_ERROR;
+ qp->r_ack_psn = qp->r_psn;
+ }
+send_ack:
+ /* Send ACK right away unless a RDMA read is pending. */
+ if (qp->s_ack_state == OP(ACKNOWLEDGE))
+ send_rc_ack(qp);
+
done:
- spin_unlock_irqrestore(&qp->r_rq.lock, flags);
- goto bail;
-
-resched:
- /*
- * Try to send ACK right away but not if ipath_do_rc_send() is
- * active.
- */
- if (qp->s_hdrwords == 0 &&
- (qp->s_ack_state < IB_OPCODE_RDMA_READ_REQUEST ||
- qp->s_ack_state >= IB_OPCODE_COMPARE_SWAP))
- send_rc_ack(qp);
- else
- dev->n_rc_qacks++;
-
-rdmadone:
- spin_unlock(&qp->s_lock);
- spin_unlock_irqrestore(&qp->r_rq.lock, flags);
-
- /* Call ipath_do_rc_send() in another thread. */
- tasklet_hi_schedule(&qp->s_task);
-
-bail:
return;
}
diff -r 4e0a07d20868 -r 1887e7b3e2a3 drivers/infiniband/hw/ipath/ipath_ruc.c
--- a/drivers/infiniband/hw/ipath/ipath_ruc.c Fri May 12 15:55:28 2006 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_ruc.c Fri May 12 15:55:28 2006 -0700
@@ -112,10 +112,11 @@ void ipath_insert_rnr_queue(struct ipath
*
* Return 0 if no RWQE is available, otherwise return 1.
*
- * Called at interrupt level with the QP r_rq.lock held.
+ * Can be called from interrupt level.
*/
int ipath_get_rwqe(struct ipath_qp *qp, int wr_id_only)
{
+ unsigned long flags;
struct ipath_rq *rq;
struct ipath_srq *srq;
struct ipath_rwqe *wqe;
@@ -123,6 +124,8 @@ int ipath_get_rwqe(struct ipath_qp *qp,

if (!qp->ibqp.srq) {
rq = &qp->r_rq;
+ spin_lock_irqsave(&rq->lock, flags);
+
if (unlikely(rq->tail == rq->head)) {
ret = 0;
goto bail;
@@ -137,15 +140,14 @@ int ipath_get_rwqe(struct ipath_qp *qp,
}
if (++rq->tail >= rq->size)
rq->tail = 0;
- ret = 1;
- goto bail;
+ goto done;
}

srq = to_isrq(qp->ibqp.srq);
rq = &srq->rq;
- spin_lock(&rq->lock);
+ spin_lock_irqsave(&rq->lock, flags);
+
if (unlikely(rq->tail == rq->head)) {
- spin_unlock(&rq->lock);
ret = 0;
goto bail;
}
@@ -175,13 +177,13 @@ int ipath_get_rwqe(struct ipath_qp *qp,
ev.event = IB_EVENT_SRQ_LIMIT_REACHED;
srq->ibsrq.event_handler(&ev,
srq->ibsrq.srq_context);
- } else
- spin_unlock(&rq->lock);
- } else
- spin_unlock(&rq->lock);
+ }
+ }
+done:
ret = 1;

bail:
+ spin_unlock_irqrestore(&rq->lock, flags);
return ret;
}

@@ -247,10 +249,8 @@ again:
wc.imm_data = wqe->wr.imm_data;
/* FALLTHROUGH */
case IB_WR_SEND:
- spin_lock_irqsave(&qp->r_rq.lock, flags);
if (!ipath_get_rwqe(qp, 0)) {
rnr_nak:
- spin_unlock_irqrestore(&qp->r_rq.lock, flags);
/* Handle RNR NAK */
if (qp->ibqp.qp_type == IB_QPT_UC)
goto send_comp;
@@ -262,20 +262,17 @@ again:
sqp->s_rnr_retry--;
dev->n_rnr_naks++;
sqp->s_rnr_timeout =
- ib_ipath_rnr_table[sqp->s_min_rnr_timer];
+ ib_ipath_rnr_table[sqp->r_min_rnr_timer];
ipath_insert_rnr_queue(sqp);
goto done;
}
- spin_unlock_irqrestore(&qp->r_rq.lock, flags);
break;

case IB_WR_RDMA_WRITE_WITH_IMM:
wc.wc_flags = IB_WC_WITH_IMM;
wc.imm_data = wqe->wr.imm_data;
- spin_lock_irqsave(&qp->r_rq.lock, flags);
if (!ipath_get_rwqe(qp, 1))
goto rnr_nak;
- spin_unlock_irqrestore(&qp->r_rq.lock, flags);
/* FALLTHROUGH */
case IB_WR_RDMA_WRITE:
if (wqe->length == 0)
diff -r 4e0a07d20868 -r 1887e7b3e2a3 drivers/infiniband/hw/ipath/ipath_uc.c
--- a/drivers/infiniband/hw/ipath/ipath_uc.c Fri May 12 15:55:28 2006 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_uc.c Fri May 12 15:55:28 2006 -0700
@@ -240,7 +240,6 @@ void ipath_uc_rcv(struct ipath_ibdev *de
u32 hdrsize;
u32 psn;
u32 pad;
- unsigned long flags;
struct ib_wc wc;
u32 pmtu = ib_mtu_enum_to_int(qp->path_mtu);
struct ib_reth *reth;
@@ -278,8 +277,6 @@ void ipath_uc_rcv(struct ipath_ibdev *de
wc.imm_data = 0;
wc.wc_flags = 0;

- spin_lock_irqsave(&qp->r_rq.lock, flags);
-
/* Compare the PSN versus the expected PSN. */
if (unlikely(ipath_cmp24(psn, qp->r_psn) != 0)) {
/*
@@ -536,15 +533,11 @@ void ipath_uc_rcv(struct ipath_ibdev *de

default:
/* Drop packet for unknown opcodes. */
- spin_unlock_irqrestore(&qp->r_rq.lock, flags);
dev->n_pkt_drops++;
- goto bail;
+ goto done;
}
qp->r_psn++;
qp->r_state = opcode;
done:
- spin_unlock_irqrestore(&qp->r_rq.lock, flags);
-
-bail:
return;
}
diff -r 4e0a07d20868 -r 1887e7b3e2a3 drivers/infiniband/hw/ipath/ipath_verbs.h
--- a/drivers/infiniband/hw/ipath/ipath_verbs.h Fri May 12 15:55:28 2006 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_verbs.h Fri May 12 15:55:28 2006 -0700
@@ -306,32 +306,33 @@ struct ipath_qp {
u32 s_next_psn; /* PSN for next request */
u32 s_last_psn; /* last response PSN processed */
u32 s_psn; /* current packet sequence number */
+ u32 s_ack_psn; /* PSN for RDMA_READ */
u32 s_rnr_timeout; /* number of milliseconds for RNR timeout */
- u32 s_ack_psn; /* PSN for next ACK or RDMA_READ */
- u64 s_ack_atomic; /* data for atomic ACK */
+ u32 r_ack_psn; /* PSN for next ACK or atomic ACK */
u64 r_wr_id; /* ID for current receive WQE */
u64 r_atomic_data; /* data for last atomic op */
u32 r_atomic_psn; /* PSN of last atomic op */
u32 r_len; /* total length of r_sge */
u32 r_rcv_len; /* receive data len processed */
u32 r_psn; /* expected rcv packet sequence number */
+ u32 r_msn; /* message sequence number */
u8 state; /* QP state */
u8 s_state; /* opcode of last packet sent */
u8 s_ack_state; /* opcode of packet to ACK */
- u8 s_nak_state; /* non-zero if NAK is pending */
u8 r_state; /* opcode of last packet received */
+ u8 r_ack_state; /* opcode of packet to ACK */
+ u8 r_nak_state; /* non-zero if NAK is pending */
+ u8 r_min_rnr_timer; /* retry timeout value for RNR NAKs */
u8 r_reuse_sge; /* for UC receive errors */
u8 r_sge_inx; /* current index into sg_list */
+ u8 qp_access_flags;
u8 s_max_sge; /* size of s_wq->sg_list */
- u8 qp_access_flags;
u8 s_retry_cnt; /* number of times to retry */
u8 s_rnr_retry_cnt;
- u8 s_min_rnr_timer;
u8 s_retry; /* requester retry counter */
u8 s_rnr_retry; /* requester RNR retry counter */
u8 s_pkey_index; /* PKEY index to use */
enum ib_mtu path_mtu;
- atomic_t msn; /* message sequence number */
u32 remote_qpn;
u32 qkey; /* QKEY for this QP (for UD or RD) */
u32 s_size; /* send work queue size */

2006-05-12 23:59:49

by Bryan O'Sullivan

[permalink] [raw]
Subject: [PATCH 31 of 53] ipath - forbid sending of bad packet sizes

Signed-off-by: Bryan O'Sullivan <[email protected]>

diff -r b098b021b6fd -r 4868daa7f215 drivers/infiniband/hw/ipath/ipath_ud.c
--- a/drivers/infiniband/hw/ipath/ipath_ud.c Fri May 12 15:55:28 2006 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_ud.c Fri May 12 15:55:28 2006 -0700
@@ -273,6 +273,11 @@ int ipath_post_ud_send(struct ipath_qp *
}
len += wr->sg_list[i].length;
ss.num_sge++;
+ }
+ /* Check for invalid packet size. */
+ if (len > ipath_layer_get_ibmtu(dev->dd)) {
+ ret = -EINVAL;
+ goto bail;
}
extra_bytes = (4 - len) & 3;
nwords = (len + extra_bytes) >> 2;

2006-05-13 00:00:51

by Bryan O'Sullivan

[permalink] [raw]
Subject: [PATCH 28 of 53] ipath - forbid setting of invalid MLID

Signed-off-by: Bryan O'Sullivan <[email protected]>

diff -r 551966b88d7c -r 47f1df66d097 drivers/infiniband/hw/ipath/ipath_sysfs.c
--- a/drivers/infiniband/hw/ipath/ipath_sysfs.c Fri May 12 15:55:28 2006 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_sysfs.c Fri May 12 15:55:28 2006 -0700
@@ -221,7 +221,7 @@ static ssize_t store_mlid(struct device
int ret;

ret = ipath_parse_ushort(buf, &mlid);
- if (ret < 0)
+ if (ret < 0 || mlid < 0xc000)
goto invalid;

unit = dd->ipath_unit;

2006-05-13 00:01:42

by Bryan O'Sullivan

[permalink] [raw]
Subject: [PATCH 34 of 53] ipath - fix occasional hangs in SDP

We were updating the head register multiple times in the rcvhdrq
processing loop, and setting the counter on each update. Since the
tail register was therefore ahead of the head for all but the last
update, we would get extra interrupts. The fix is to write the
counter value only on the last update.
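
Below is a minimal standalone C sketch of the update rule the patch
adopts, matching the loop in ipath_kreceive(): write the head register
only after the last packet or every 16th packet, and request an
interrupt only on the final write. It is illustrative only; INTR_FLAG
and write_head_reg() are made-up stand-ins for
dd->ipath_rhdrhead_intr_off and ipath_write_ureg(), not driver code.

#include <stdio.h>

#define INTR_FLAG 0x80000000u /* stand-in for dd->ipath_rhdrhead_intr_off */

static void write_head_reg(unsigned int val)
{
	printf("head register <- 0x%08x\n", val);
}

int main(void)
{
	unsigned int head = 0, tail = 37; /* pretend 37 packets are queued */
	unsigned int i;

	for (i = 0; head != tail; i++) {
		/* ... process the packet at head ... */
		head++;
		/* Update on the last packet, and every 16 packets. */
		if (head == tail || (i && !(i & 0xf)))
			write_head_reg(head == tail ?
				       (INTR_FLAG | head) : head);
	}
	return 0;
}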

Signed-off-by: Bryan O'Sullivan <[email protected]>

diff -r 5ddaf7c07cdf -r 09077b2f476f drivers/infiniband/hw/ipath/ipath_driver.c
--- a/drivers/infiniband/hw/ipath/ipath_driver.c Fri May 12 15:55:28 2006 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_driver.c Fri May 12 15:55:28 2006 -0700
@@ -918,7 +918,7 @@ void ipath_kreceive(struct ipath_devdata
const u32 maxcnt = dd->ipath_rcvhdrcnt * rsize; /* words */
u32 etail = -1, l, hdrqtail;
struct ips_message_header *hdr;
- u32 eflags, i, etype, tlen, pkttot = 0;
+ u32 eflags, i, etype, tlen, pkttot = 0, updegr = 0;
static u64 totcalls; /* stats, may eventually remove */
char emsg[128];

@@ -932,14 +932,14 @@ void ipath_kreceive(struct ipath_devdata
if (test_and_set_bit(0, &dd->ipath_rcv_pending))
goto bail;

- if (dd->ipath_port0head ==
- (u32)le64_to_cpu(*dd->ipath_hdrqtailptr))
+ l = dd->ipath_port0head;
+ if (l == (u32)le64_to_cpu(*dd->ipath_hdrqtailptr))
goto done;

/* read only once at start for performance */
hdrqtail = (u32)le64_to_cpu(*dd->ipath_hdrqtailptr);

- for (i = 0, l = dd->ipath_port0head; l != hdrqtail; i++) {
+ for (i = 0; l != hdrqtail; i++) {
u32 qp;
u8 *bthbytes;

@@ -1050,15 +1050,26 @@ void ipath_kreceive(struct ipath_devdata
l += rsize;
if (l >= maxcnt)
l = 0;
+ if (etype != RCVHQ_RCV_TYPE_EXPECTED)
+ updegr = 1;
/*
- * update for each packet, to help prevent overflows if we
- * have lots of packets.
+ * Update the head registers on the last packet, and every 16
+ * packets, to reduce bus traffic while still trying to prevent
+ * rcvhdrq overflows when the queue is nearly full.
*/
- (void)ipath_write_ureg(dd, ur_rcvhdrhead,
- dd->ipath_rhdrhead_intr_off | l, 0);
- if (etype != RCVHQ_RCV_TYPE_EXPECTED)
- (void)ipath_write_ureg(dd, ur_rcvegrindexhead,
- etail, 0);
+ if (l == hdrqtail || (i && !(i & 0xf))) {
+ u64 lval;
+ if (l == hdrqtail) /* want interrupt only on last */
+ lval = dd->ipath_rhdrhead_intr_off | l;
+ else
+ lval = l;
+ (void)ipath_write_ureg(dd, ur_rcvhdrhead, lval, 0);
+ if (updegr) {
+ (void)ipath_write_ureg(dd, ur_rcvegrindexhead,
+ etail, 0);
+ updegr = 0;
+ }
+ }
}

pkttot += i;
diff -r 5ddaf7c07cdf -r 09077b2f476f drivers/infiniband/hw/ipath/ipath_intr.c
--- a/drivers/infiniband/hw/ipath/ipath_intr.c Fri May 12 15:55:28 2006 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_intr.c Fri May 12 15:55:28 2006 -0700
@@ -350,7 +350,7 @@ static unsigned handle_frequent_errors(s
return supp_msgs;
}

-static void handle_errors(struct ipath_devdata *dd, ipath_err_t errs)
+static int handle_errors(struct ipath_devdata *dd, ipath_err_t errs)
{
char msg[512];
u64 ignore_this_time = 0;
@@ -434,7 +434,7 @@ static void handle_errors(struct ipath_d
INFINIPATH_E_IBSTATUSCHANGED);
}
if (!errs)
- return;
+ return 0;

if (!noprint)
/*
@@ -558,9 +558,7 @@ static void handle_errors(struct ipath_d
wake_up_interruptible(&ipath_sma_state_wait);
}

- if (chkerrpkts)
- /* process possible error packets in hdrq */
- ipath_kreceive(dd);
+ return chkerrpkts;
}

/* this is separate to allow for better optimization of ipath_intr() */
@@ -716,13 +714,14 @@ static void handle_urcv(struct ipath_dev
}
}

+
irqreturn_t ipath_intr(int irq, void *data, struct pt_regs *regs)
{
struct ipath_devdata *dd = data;
- u32 istat;
+ u32 istat, chk0rcv = 0;
ipath_err_t estat = 0;
irqreturn_t ret;
- u32 p0bits;
+ u32 p0bits, oldhead;
static unsigned unexpected = 0;
static const u32 port0rbits = (1U<<INFINIPATH_I_RCVAVAIL_SHIFT) |
(1U<<INFINIPATH_I_RCVURG_SHIFT);
@@ -764,9 +763,8 @@ irqreturn_t ipath_intr(int irq, void *da
* interrupts. We clear the interrupts first so that we don't
* lose intr for later packets that arrive while we are processing.
*/
- if (dd->ipath_port0head !=
- (u32)le64_to_cpu(*dd->ipath_hdrqtailptr)) {
- u32 oldhead = dd->ipath_port0head;
+ oldhead = dd->ipath_port0head;
+ if (oldhead != (u32)le64_to_cpu(*dd->ipath_hdrqtailptr)) {
if(dd->ipath_flags & IPATH_GPIO_INTR) {
ipath_write_kreg(dd, dd->ipath_kregs->kr_gpio_clear,
(u64) (1 << 2));
@@ -783,6 +781,8 @@ irqreturn_t ipath_intr(int irq, void *da
}

istat = ipath_read_kreg32(dd, dd->ipath_kregs->kr_intstatus);
+ p0bits = port0rbits;
+
if (unlikely(!istat)) {
ipath_stats.sps_nullintr++;
ret = IRQ_NONE; /* not our interrupt, or already handled */
@@ -820,10 +820,11 @@ irqreturn_t ipath_intr(int irq, void *da
ipath_dev_err(dd, "Read of error status failed "
"(all bits set); ignoring\n");
else
- handle_errors(dd, estat);
- }
-
- p0bits = port0rbits;
+ if (handle_errors(dd, estat))
+ /* force calling ipath_kreceive() */
+ chk0rcv = 1;
+ }
+
if (istat & INFINIPATH_I_GPIO) {
/*
* Packets are available in the port 0 rcv queue.
@@ -845,8 +846,10 @@ irqreturn_t ipath_intr(int irq, void *da
ipath_write_kreg(dd, dd->ipath_kregs->kr_gpio_clear,
(u64) (1 << 2));
p0bits |= INFINIPATH_I_GPIO;
- }
- }
+ chk0rcv = 1;
+ }
+ }
+ chk0rcv |= istat & p0bits;

/*
* clear the ones we will deal with on this round
@@ -858,18 +861,16 @@ irqreturn_t ipath_intr(int irq, void *da
ipath_write_kreg(dd, dd->ipath_kregs->kr_intclear, istat);

/*
- * we check for both transition from empty to non-empty, and urgent
- * packets (those with the interrupt bit set in the header), and
- * if enabled, the GPIO bit 2 interrupt used for port0 on some
- * HT-400 boards.
- * Do this before checking for pio buffers available, since
- * receives can overflow; piobuf waiters can afford a few
- * extra cycles, since they were waiting anyway.
- */
- if(istat & p0bits) {
+ * Handle port0 receive before checking for pio buffers available,
+ * since receives can overflow; piobuf waiters can afford a few
+ * extra cycles, since they were waiting anyway, and users waiting
+ * for receives are handled at the bottom.
+ */
+ if (chk0rcv) {
ipath_kreceive(dd);
istat &= ~port0rbits;
}
+
if (istat & ((infinipath_i_rcvavail_mask <<
INFINIPATH_I_RCVAVAIL_SHIFT)
| (infinipath_i_rcvurg_mask <<

2006-05-13 00:01:43

by Bryan O'Sullivan

[permalink] [raw]
Subject: [PATCH 20 of 53] ipath - more sharing between RC and UC code

Share more common code between RC and UC protocols.
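
As a rough illustration of the refactoring, the sketch below shows the
dispatch pattern the driver moves to: one shared send routine that
calls a per-protocol request builder, the way ipath_do_ruc_send()
chooses between ipath_make_rc_req() and ipath_make_uc_req(). It is a
compilable toy, not driver code; all types and names below are
simplified stand-ins.

#include <stdio.h>

enum qp_type { QPT_RC, QPT_UC };

struct qp {
	enum qp_type type;
	unsigned int psn; /* next packet sequence number */
};

/* Per-protocol request builders; return 1 if a packet was built. */
static int make_rc_req(struct qp *qp)
{
	printf("RC request, psn %u\n", qp->psn++);
	return 1;
}

static int make_uc_req(struct qp *qp)
{
	printf("UC request, psn %u\n", qp->psn++);
	return 1;
}

/*
 * The common send routine: the send loop and header construction are
 * shared; only building the next request differs between RC and UC.
 */
static void do_ruc_send(struct qp *qp)
{
	if ((qp->type == QPT_RC) ? make_rc_req(qp) : make_uc_req(qp)) {
		/* ... build LRH/BTH and hand the packet to the HCA ... */
	}
}

int main(void)
{
	struct qp rc = { QPT_RC, 0 };
	struct qp uc = { QPT_UC, 0 };

	do_ruc_send(&rc);
	do_ruc_send(&uc);
	return 0;
}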

Signed-off-by: Bryan O'Sullivan <[email protected]>

diff -r 947e92f4b370 -r 201654fe1962 drivers/infiniband/hw/ipath/ipath_qp.c
--- a/drivers/infiniband/hw/ipath/ipath_qp.c Fri May 12 15:55:28 2006 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_qp.c Fri May 12 15:55:28 2006 -0700
@@ -718,9 +718,7 @@ struct ib_qp *ipath_create_qp(struct ib_
spin_lock_init(&qp->r_rq.lock);
atomic_set(&qp->refcount, 0);
init_waitqueue_head(&qp->wait);
- tasklet_init(&qp->s_task,
- init_attr->qp_type == IB_QPT_RC ?
- ipath_do_rc_send : ipath_do_uc_send,
+ tasklet_init(&qp->s_task, ipath_do_ruc_send,
(unsigned long)qp);
INIT_LIST_HEAD(&qp->piowait);
INIT_LIST_HEAD(&qp->timerwait);
@@ -905,9 +903,9 @@ void ipath_get_credit(struct ipath_qp *q
* as many packets as we like. Otherwise, we have to
* honor the credit field.
*/
- if (credit == IPS_AETH_CREDIT_INVAL) {
+ if (credit == IPS_AETH_CREDIT_INVAL)
qp->s_lsn = (u32) -1;
- } else if (qp->s_lsn != (u32) -1) {
+ else if (qp->s_lsn != (u32) -1) {
/* Compute new LSN (i.e., MSN + credit) */
credit = (aeth + credit_table[credit]) & IPS_MSN_MASK;
if (ipath_cmp24(credit, qp->s_lsn) > 0)
diff -r 947e92f4b370 -r 201654fe1962 drivers/infiniband/hw/ipath/ipath_rc.c
--- a/drivers/infiniband/hw/ipath/ipath_rc.c Fri May 12 15:55:28 2006 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_rc.c Fri May 12 15:55:28 2006 -0700
@@ -72,9 +72,9 @@ static void ipath_init_restart(struct ip
* Return bth0 if constructed; otherwise, return 0.
* Note the QP s_lock must be held.
*/
-static inline u32 ipath_make_rc_ack(struct ipath_qp *qp,
- struct ipath_other_headers *ohdr,
- u32 pmtu)
+u32 ipath_make_rc_ack(struct ipath_qp *qp,
+ struct ipath_other_headers *ohdr,
+ u32 pmtu)
{
struct ipath_sge_state *ss;
u32 hwords;
@@ -95,8 +95,7 @@ static inline u32 ipath_make_rc_ack(stru
if (len > pmtu) {
len = pmtu;
qp->s_ack_state = OP(RDMA_READ_RESPONSE_FIRST);
- }
- else
+ } else
qp->s_ack_state = OP(RDMA_READ_RESPONSE_ONLY);
qp->s_rdma_len -= len;
bth0 = qp->s_ack_state << 24;
@@ -135,7 +134,8 @@ static inline u32 ipath_make_rc_ack(stru
*/
qp->r_state = OP(RDMA_READ_RESPONSE_LAST);
qp->s_ack_state = OP(ACKNOWLEDGE);
- return 0;
+ bth0 = 0;
+ goto bail;

case OP(COMPARE_SWAP):
case OP(FETCH_ADD):
@@ -143,7 +143,7 @@ static inline u32 ipath_make_rc_ack(stru
len = 0;
qp->r_state = OP(SEND_LAST);
qp->s_ack_state = OP(ACKNOWLEDGE);
- bth0 = IB_OPCODE_ATOMIC_ACKNOWLEDGE << 24;
+ bth0 = OP(ATOMIC_ACKNOWLEDGE) << 24;
ohdr->u.at.aeth = ipath_compute_aeth(qp);
ohdr->u.at.atomic_ack_eth = cpu_to_be64(qp->s_ack_atomic);
hwords += sizeof(ohdr->u.at) / 4;
@@ -162,6 +162,7 @@ static inline u32 ipath_make_rc_ack(stru
qp->s_cur_sge = ss;
qp->s_cur_size = len;

+bail:
return bth0;
}

@@ -176,9 +177,9 @@ static inline u32 ipath_make_rc_ack(stru
* Return 1 if constructed; otherwise, return 0.
* Note the QP s_lock must be held.
*/
-static inline int ipath_make_rc_req(struct ipath_qp *qp,
- struct ipath_other_headers *ohdr,
- u32 pmtu, u32 *bth0p, u32 *bth2p)
+int ipath_make_rc_req(struct ipath_qp *qp,
+ struct ipath_other_headers *ohdr,
+ u32 pmtu, u32 *bth0p, u32 *bth2p)
{
struct ipath_ibdev *dev = to_idev(qp->ibqp.device);
struct ipath_sge_state *ss;
@@ -257,7 +258,7 @@ static inline int ipath_make_rc_req(stru
break;

case IB_WR_RDMA_WRITE:
- if (newreq)
+ if (newreq && qp->s_lsn != (u32) -1)
qp->s_lsn++;
/* FALLTHROUGH */
case IB_WR_RDMA_WRITE_WITH_IMM:
@@ -283,8 +284,7 @@ static inline int ipath_make_rc_req(stru
else {
qp->s_state =
OP(RDMA_WRITE_ONLY_WITH_IMMEDIATE);
- /* Immediate data comes
- * after RETH */
+ /* Immediate data comes after RETH */
ohdr->u.rc.imm_data = wqe->wr.imm_data;
hwords += 1;
if (wqe->wr.send_flags & IB_SEND_SOLICITED)
@@ -304,7 +304,8 @@ static inline int ipath_make_rc_req(stru
qp->s_state = OP(RDMA_READ_REQUEST);
hwords += sizeof(ohdr->u.rc.reth) / 4;
if (newreq) {
- qp->s_lsn++;
+ if (qp->s_lsn != (u32) -1)
+ qp->s_lsn++;
/*
* Adjust s_next_psn to count the
* expected number of responses.
@@ -335,7 +336,8 @@ static inline int ipath_make_rc_req(stru
wqe->wr.wr.atomic.compare_add);
hwords += sizeof(struct ib_atomic_eth) / 4;
if (newreq) {
- qp->s_lsn++;
+ if (qp->s_lsn != (u32) -1)
+ qp->s_lsn++;
wqe->lpsn = wqe->psn;
}
if (++qp->s_cur == qp->s_size)
@@ -355,6 +357,11 @@ static inline int ipath_make_rc_req(stru
bth2 |= qp->s_psn++ & IPS_PSN_MASK;
if ((int)(qp->s_psn - qp->s_next_psn) > 0)
qp->s_next_psn = qp->s_psn;
+ /*
+ * Put the QP on the pending list so lost ACKs will cause
+ * a retry. More than one request can be pending so the
+ * QP may already be on the dev->pending list.
+ */
spin_lock(&dev->pending_lock);
if (list_empty(&qp->timerwait))
list_add_tail(&qp->timerwait,
@@ -364,8 +371,8 @@ static inline int ipath_make_rc_req(stru

case OP(RDMA_READ_RESPONSE_FIRST):
/*
- * This case can only happen if a send is restarted. See
- * ipath_restart_rc().
+ * This case can only happen if a send is restarted.
+ * See ipath_restart_rc().
*/
ipath_init_restart(qp, wqe);
/* FALLTHROUGH */
@@ -496,176 +503,48 @@ done:
return 0;
}

-static inline void ipath_make_rc_grh(struct ipath_qp *qp,
- struct ib_global_route *grh,
- u32 nwords)
-{
- struct ipath_ibdev *dev = to_idev(qp->ibqp.device);
-
- /* GRH header size in 32-bit words. */
- qp->s_hdrwords += 10;
- qp->s_hdr.u.l.grh.version_tclass_flow =
- cpu_to_be32((6 << 28) |
- (grh->traffic_class << 20) |
- grh->flow_label);
- qp->s_hdr.u.l.grh.paylen =
- cpu_to_be16(((qp->s_hdrwords - 12) + nwords +
- SIZE_OF_CRC) << 2);
- /* next_hdr is defined by C8-7 in ch. 8.4.1 */
- qp->s_hdr.u.l.grh.next_hdr = 0x1B;
- qp->s_hdr.u.l.grh.hop_limit = grh->hop_limit;
- /* The SGID is 32-bit aligned. */
- qp->s_hdr.u.l.grh.sgid.global.subnet_prefix = dev->gid_prefix;
- qp->s_hdr.u.l.grh.sgid.global.interface_id =
- ipath_layer_get_guid(dev->dd);
- qp->s_hdr.u.l.grh.dgid = grh->dgid;
-}
-
/**
- * ipath_do_rc_send - perform a send on an RC QP
- * @data: contains a pointer to the QP
+ * send_rc_ack - Construct an ACK packet and send it
+ * @qp: a pointer to the QP
*
- * Process entries in the send work queue until credit or queue is
- * exhausted. Only allow one CPU to send a packet per QP (tasklet).
- * Otherwise, after we drop the QP s_lock, two threads could send
- * packets out of order.
+ * This is called from ipath_rc_rcv() and only uses the receive
+ * side QP state.
+ * Note that RDMA reads are handled in the send side QP state and tasklet.
*/
-void ipath_do_rc_send(unsigned long data)
-{
- struct ipath_qp *qp = (struct ipath_qp *)data;
- struct ipath_ibdev *dev = to_idev(qp->ibqp.device);
- unsigned long flags;
- u16 lrh0;
- u32 nwords;
- u32 extra_bytes;
- u32 bth0;
- u32 bth2;
- u32 pmtu = ib_mtu_enum_to_int(qp->path_mtu);
- struct ipath_other_headers *ohdr;
-
- if (test_and_set_bit(IPATH_S_BUSY, &qp->s_flags))
- goto bail;
-
- if (unlikely(qp->remote_ah_attr.dlid ==
- ipath_layer_get_lid(dev->dd))) {
- struct ib_wc wc;
-
- /*
- * Pass in an uninitialized ib_wc to be consistent with
- * other places where ipath_ruc_loopback() is called.
- */
- ipath_ruc_loopback(qp, &wc);
- goto clear;
- }
-
- ohdr = &qp->s_hdr.u.oth;
- if (qp->remote_ah_attr.ah_flags & IB_AH_GRH)
- ohdr = &qp->s_hdr.u.l.oth;
-
-again:
- /* Check for a constructed packet to be sent. */
- if (qp->s_hdrwords != 0) {
- /*
- * If no PIO bufs are available, return. An interrupt will
- * call ipath_ib_piobufavail() when one is available.
- */
- _VERBS_INFO("h %u %p\n", qp->s_hdrwords, &qp->s_hdr);
- _VERBS_INFO("d %u %p %u %p %u %u %u %u\n", qp->s_cur_size,
- qp->s_cur_sge->sg_list,
- qp->s_cur_sge->num_sge,
- qp->s_cur_sge->sge.vaddr,
- qp->s_cur_sge->sge.sge_length,
- qp->s_cur_sge->sge.length,
- qp->s_cur_sge->sge.m,
- qp->s_cur_sge->sge.n);
- if (ipath_verbs_send(dev->dd, qp->s_hdrwords,
- (u32 *) &qp->s_hdr, qp->s_cur_size,
- qp->s_cur_sge)) {
- ipath_no_bufs_available(qp, dev);
- goto bail;
- }
- dev->n_unicast_xmit++;
- /* Record that we sent the packet and s_hdr is empty. */
- qp->s_hdrwords = 0;
- }
-
- /*
- * The lock is needed to synchronize between setting
- * qp->s_ack_state, resend timer, and post_send().
- */
- spin_lock_irqsave(&qp->s_lock, flags);
-
- /* Sending responses has higher priority over sending requests. */
- if (qp->s_ack_state != OP(ACKNOWLEDGE) &&
- (bth0 = ipath_make_rc_ack(qp, ohdr, pmtu)) != 0)
- bth2 = qp->s_ack_psn++ & IPS_PSN_MASK;
- else if (!ipath_make_rc_req(qp, ohdr, pmtu, &bth0, &bth2))
- goto done;
-
- spin_unlock_irqrestore(&qp->s_lock, flags);
-
- /* Construct the header. */
- extra_bytes = (4 - qp->s_cur_size) & 3;
- nwords = (qp->s_cur_size + extra_bytes) >> 2;
- lrh0 = IPS_LRH_BTH;
- if (unlikely(qp->remote_ah_attr.ah_flags & IB_AH_GRH)) {
- ipath_make_rc_grh(qp, &qp->remote_ah_attr.grh, nwords);
- lrh0 = IPS_LRH_GRH;
- }
- lrh0 |= qp->remote_ah_attr.sl << 4;
- qp->s_hdr.lrh[0] = cpu_to_be16(lrh0);
- qp->s_hdr.lrh[1] = cpu_to_be16(qp->remote_ah_attr.dlid);
- qp->s_hdr.lrh[2] = cpu_to_be16(qp->s_hdrwords + nwords +
- SIZE_OF_CRC);
- qp->s_hdr.lrh[3] = cpu_to_be16(ipath_layer_get_lid(dev->dd));
- bth0 |= ipath_layer_get_pkey(dev->dd, qp->s_pkey_index);
- bth0 |= extra_bytes << 20;
- ohdr->bth[0] = cpu_to_be32(bth0);
- ohdr->bth[1] = cpu_to_be32(qp->remote_qpn);
- ohdr->bth[2] = cpu_to_be32(bth2);
-
- /* Check for more work to do. */
- goto again;
-
-done:
- spin_unlock_irqrestore(&qp->s_lock, flags);
-clear:
- clear_bit(IPATH_S_BUSY, &qp->s_flags);
-bail:
- return;
-}
-
static void send_rc_ack(struct ipath_qp *qp)
{
struct ipath_ibdev *dev = to_idev(qp->ibqp.device);
u16 lrh0;
u32 bth0;
+ u32 hwords;
+ struct ipath_ib_header hdr;
struct ipath_other_headers *ohdr;

/* Construct the header. */
- ohdr = &qp->s_hdr.u.oth;
+ ohdr = &hdr.u.oth;
lrh0 = IPS_LRH_BTH;
/* header size in 32-bit words LRH+BTH+AETH = (8+12+4)/4. */
- qp->s_hdrwords = 6;
+ hwords = 6;
if (unlikely(qp->remote_ah_attr.ah_flags & IB_AH_GRH)) {
- ipath_make_rc_grh(qp, &qp->remote_ah_attr.grh, 0);
- ohdr = &qp->s_hdr.u.l.oth;
+ hwords += ipath_make_grh(dev, &hdr.u.l.grh,
+ &qp->remote_ah_attr.grh,
+ hwords, 0);
+ ohdr = &hdr.u.l.oth;
lrh0 = IPS_LRH_GRH;
}
bth0 = ipath_layer_get_pkey(dev->dd, qp->s_pkey_index);
ohdr->u.aeth = ipath_compute_aeth(qp);
if (qp->s_ack_state >= OP(COMPARE_SWAP)) {
- bth0 |= IB_OPCODE_ATOMIC_ACKNOWLEDGE << 24;
+ bth0 |= OP(ATOMIC_ACKNOWLEDGE) << 24;
ohdr->u.at.atomic_ack_eth = cpu_to_be64(qp->s_ack_atomic);
- qp->s_hdrwords += sizeof(ohdr->u.at.atomic_ack_eth) / 4;
- }
- else
+ hwords += sizeof(ohdr->u.at.atomic_ack_eth) / 4;
+ } else
bth0 |= OP(ACKNOWLEDGE) << 24;
lrh0 |= qp->remote_ah_attr.sl << 4;
- qp->s_hdr.lrh[0] = cpu_to_be16(lrh0);
- qp->s_hdr.lrh[1] = cpu_to_be16(qp->remote_ah_attr.dlid);
- qp->s_hdr.lrh[2] = cpu_to_be16(qp->s_hdrwords + SIZE_OF_CRC);
- qp->s_hdr.lrh[3] = cpu_to_be16(ipath_layer_get_lid(dev->dd));
+ hdr.lrh[0] = cpu_to_be16(lrh0);
+ hdr.lrh[1] = cpu_to_be16(qp->remote_ah_attr.dlid);
+ hdr.lrh[2] = cpu_to_be16(hwords + SIZE_OF_CRC);
+ hdr.lrh[3] = cpu_to_be16(ipath_layer_get_lid(dev->dd));
ohdr->bth[0] = cpu_to_be32(bth0);
ohdr->bth[1] = cpu_to_be32(qp->remote_qpn);
ohdr->bth[2] = cpu_to_be32(qp->s_ack_psn & IPS_PSN_MASK);
@@ -673,12 +552,93 @@ static void send_rc_ack(struct ipath_qp
/*
* If we can send the ACK, clear the ACK state.
*/
- if (ipath_verbs_send(dev->dd, qp->s_hdrwords, (u32 *) &qp->s_hdr,
- 0, NULL) == 0) {
+ if (ipath_verbs_send(dev->dd, hwords, (u32 *) &hdr, 0, NULL) == 0) {
qp->s_ack_state = OP(ACKNOWLEDGE);
+ dev->n_unicast_xmit++;
+ } else
dev->n_rc_qacks++;
- dev->n_unicast_xmit++;
- }
+}
+
+/**
+ * reset_psn - reset the QP state to send starting from PSN
+ * @qp: the QP
+ * @psn: the packet sequence number to restart at
+ *
+ * This is called from ipath_restart_rc() and do_rc_ack() to set the
+ * send state so that sending resumes starting from the given PSN.
+ * Called at interrupt level with the QP s_lock held.
+ */
+static void reset_psn(struct ipath_qp *qp, u32 psn)
+{
+ u32 n = qp->s_last;
+ struct ipath_swqe *wqe = get_swqe_ptr(qp, n);
+ u32 opcode;
+
+ qp->s_cur = n;
+
+ /*
+ * If we are starting the request from the beginning,
+ * let the normal send code handle initialization.
+ */
+ if (ipath_cmp24(psn, wqe->psn) <= 0) {
+ qp->s_state = OP(SEND_LAST);
+ goto done;
+ }
+
+ /* Find the work request opcode corresponding to the given PSN. */
+ opcode = wqe->wr.opcode;
+ for (;;) {
+ int diff;
+
+ if (++n == qp->s_size)
+ n = 0;
+ if (n == qp->s_tail)
+ break;
+ wqe = get_swqe_ptr(qp, n);
+ diff = ipath_cmp24(psn, wqe->psn);
+ if (diff < 0)
+ break;
+ qp->s_cur = n;
+ /*
+ * If we are starting the request from the beginning,
+ * let the normal send code handle initialization.
+ */
+ if (diff == 0) {
+ qp->s_state = OP(SEND_LAST);
+ goto done;
+ }
+ opcode = wqe->wr.opcode;
+ }
+
+ /*
+ * Set the state to restart in the middle of a request.
+ * Don't change the s_sge, s_cur_sge, or s_cur_size.
+ * See ipath_do_rc_send().
+ */
+ switch (opcode) {
+ case IB_WR_SEND:
+ case IB_WR_SEND_WITH_IMM:
+ qp->s_state = OP(RDMA_READ_RESPONSE_FIRST);
+ break;
+
+ case IB_WR_RDMA_WRITE:
+ case IB_WR_RDMA_WRITE_WITH_IMM:
+ qp->s_state = OP(RDMA_READ_RESPONSE_LAST);
+ break;
+
+ case IB_WR_RDMA_READ:
+ qp->s_state = OP(RDMA_READ_RESPONSE_MIDDLE);
+ break;
+
+ default:
+ /*
+ * This case shouldn't happen since there is
+ * only one PSN per request.
+ */
+ qp->s_state = OP(SEND_LAST);
+ }
+done:
+ qp->s_psn = psn;
}

/**
@@ -693,7 +653,6 @@ void ipath_restart_rc(struct ipath_qp *q
{
struct ipath_swqe *wqe = get_swqe_ptr(qp, qp->s_last);
struct ipath_ibdev *dev;
- u32 n;

/*
* If there are no requests pending, we are done.
@@ -735,130 +694,13 @@ void ipath_restart_rc(struct ipath_qp *q
else
dev->n_rc_resends += (int)qp->s_psn - (int)psn;

- /*
- * If we are starting the request from the beginning, let the normal
- * send code handle initialization.
- */
- qp->s_cur = qp->s_last;
- if (ipath_cmp24(psn, wqe->psn) <= 0) {
- qp->s_state = OP(SEND_LAST);
- qp->s_psn = wqe->psn;
- } else {
- n = qp->s_cur;
- for (;;) {
- if (++n == qp->s_size)
- n = 0;
- if (n == qp->s_tail) {
- if (ipath_cmp24(psn, qp->s_next_psn) >= 0) {
- qp->s_cur = n;
- wqe = get_swqe_ptr(qp, n);
- }
- break;
- }
- wqe = get_swqe_ptr(qp, n);
- if (ipath_cmp24(psn, wqe->psn) < 0)
- break;
- qp->s_cur = n;
- }
- qp->s_psn = psn;
-
- /*
- * Reset the state to restart in the middle of a request.
- * Don't change the s_sge, s_cur_sge, or s_cur_size.
- * See ipath_do_rc_send().
- */
- switch (wqe->wr.opcode) {
- case IB_WR_SEND:
- case IB_WR_SEND_WITH_IMM:
- qp->s_state = OP(RDMA_READ_RESPONSE_FIRST);
- break;
-
- case IB_WR_RDMA_WRITE:
- case IB_WR_RDMA_WRITE_WITH_IMM:
- qp->s_state = OP(RDMA_READ_RESPONSE_LAST);
- break;
-
- case IB_WR_RDMA_READ:
- qp->s_state =
- OP(RDMA_READ_RESPONSE_MIDDLE);
- break;
-
- default:
- /*
- * This case shouldn't happen since its only
- * one PSN per req.
- */
- qp->s_state = OP(SEND_LAST);
- }
- }
+ reset_psn(qp, psn);

done:
tasklet_hi_schedule(&qp->s_task);

bail:
return;
-}
-
-/**
- * reset_psn - reset the QP state to send starting from PSN
- * @qp: the QP
- * @psn: the packet sequence number to restart at
- *
- * This is called from ipath_rc_rcv() to process an incoming RC ACK
- * for the given QP.
- * Called at interrupt level with the QP s_lock held.
- */
-static void reset_psn(struct ipath_qp *qp, u32 psn)
-{
- struct ipath_swqe *wqe;
- u32 n;
-
- n = qp->s_cur;
- wqe = get_swqe_ptr(qp, n);
- for (;;) {
- if (++n == qp->s_size)
- n = 0;
- if (n == qp->s_tail) {
- if (ipath_cmp24(psn, qp->s_next_psn) >= 0) {
- qp->s_cur = n;
- wqe = get_swqe_ptr(qp, n);
- }
- break;
- }
- wqe = get_swqe_ptr(qp, n);
- if (ipath_cmp24(psn, wqe->psn) < 0)
- break;
- qp->s_cur = n;
- }
- qp->s_psn = psn;
-
- /*
- * Set the state to restart in the middle of a
- * request. Don't change the s_sge, s_cur_sge, or
- * s_cur_size. See ipath_do_rc_send().
- */
- switch (wqe->wr.opcode) {
- case IB_WR_SEND:
- case IB_WR_SEND_WITH_IMM:
- qp->s_state = OP(RDMA_READ_RESPONSE_FIRST);
- break;
-
- case IB_WR_RDMA_WRITE:
- case IB_WR_RDMA_WRITE_WITH_IMM:
- qp->s_state = OP(RDMA_READ_RESPONSE_LAST);
- break;
-
- case IB_WR_RDMA_READ:
- qp->s_state = OP(RDMA_READ_RESPONSE_MIDDLE);
- break;
-
- default:
- /*
- * This case shouldn't happen since its only
- * one PSN per req.
- */
- qp->s_state = OP(SEND_LAST);
- }
}

/**
@@ -867,7 +709,7 @@ static void reset_psn(struct ipath_qp *q
* @psn: the packet sequence number of the ACK
* @opcode: the opcode of the request that resulted in the ACK
*
- * This is called from ipath_rc_rcv() to process an incoming RC ACK
+ * This is called from ipath_rc_rcv_resp() to process an incoming RC ACK
* for the given QP.
* Called at interrupt level with the QP s_lock held.
* Returns 1 if OK, 0 if current operation should be aborted (NAK).
@@ -1011,17 +853,7 @@ static int do_rc_ack(struct ipath_qp *qp

dev->n_rc_resends += (int)qp->s_psn - (int)psn;

- /*
- * If we are starting the request from the beginning, let
- * the normal send code handle initialization.
- */
- qp->s_cur = qp->s_last;
- wqe = get_swqe_ptr(qp, qp->s_cur);
- if (ipath_cmp24(psn, wqe->psn) <= 0) {
- qp->s_state = OP(SEND_LAST);
- qp->s_psn = wqe->psn;
- } else
- reset_psn(qp, psn);
+ reset_psn(qp, psn);

qp->s_rnr_timeout =
ib_ipath_rnr_table[(aeth >> IPS_AETH_CREDIT_SHIFT) &
@@ -1182,32 +1014,33 @@ static inline void ipath_rc_rcv_resp(str
goto ack_done;
}
rdma_read:
- if (unlikely(qp->s_state != OP(RDMA_READ_REQUEST)))
- goto ack_done;
- if (unlikely(tlen != (hdrsize + pmtu + 4)))
- goto ack_done;
- if (unlikely(pmtu >= qp->s_len))
- goto ack_done;
- /* We got a response so update the timeout. */
- if (unlikely(qp->s_last == qp->s_tail ||
- get_swqe_ptr(qp, qp->s_last)->wr.opcode !=
- IB_WR_RDMA_READ))
- goto ack_done;
- spin_lock(&dev->pending_lock);
- if (qp->s_rnr_timeout == 0 && !list_empty(&qp->timerwait))
- list_move_tail(&qp->timerwait,
- &dev->pending[dev->pending_index]);
- spin_unlock(&dev->pending_lock);
- /*
- * Update the RDMA receive state but do the copy w/o holding the
- * locks and blocking interrupts. XXX Yet another place that
- * affects relaxed RDMA order since we don't want s_sge modified.
- */
- qp->s_len -= pmtu;
- qp->s_last_psn = psn;
- spin_unlock_irqrestore(&qp->s_lock, flags);
- ipath_copy_sge(&qp->s_sge, data, pmtu);
- goto bail;
+ if (unlikely(qp->s_state != OP(RDMA_READ_REQUEST)))
+ goto ack_done;
+ if (unlikely(tlen != (hdrsize + pmtu + 4)))
+ goto ack_done;
+ if (unlikely(pmtu >= qp->s_len))
+ goto ack_done;
+ /* We got a response so update the timeout. */
+ if (unlikely(qp->s_last == qp->s_tail ||
+ get_swqe_ptr(qp, qp->s_last)->wr.opcode !=
+ IB_WR_RDMA_READ))
+ goto ack_done;
+ spin_lock(&dev->pending_lock);
+ if (qp->s_rnr_timeout == 0 && !list_empty(&qp->timerwait))
+ list_move_tail(&qp->timerwait,
+ &dev->pending[dev->pending_index]);
+ spin_unlock(&dev->pending_lock);
+ /*
+ * Update the RDMA receive state but do the copy w/o
+ * holding the locks and blocking interrupts.
+ * XXX Yet another place that affects relaxed RDMA order
+ * since we don't want s_sge modified.
+ */
+ qp->s_len -= pmtu;
+ qp->s_last_psn = psn;
+ spin_unlock_irqrestore(&qp->s_lock, flags);
+ ipath_copy_sge(&qp->s_sge, data, pmtu);
+ goto bail;

case OP(RDMA_READ_RESPONSE_LAST):
/* ACKs READ req. */
@@ -1230,18 +1063,12 @@ static inline void ipath_rc_rcv_resp(str
* ICRC (4).
*/
if (unlikely(tlen <= (hdrsize + pad + 8))) {
- /*
- * XXX Need to generate an error CQ
- * entry.
- */
+ /* XXX Need to generate an error CQ entry. */
goto ack_done;
}
tlen -= hdrsize + pad + 8;
if (unlikely(tlen != qp->s_len)) {
- /*
- * XXX Need to generate an error CQ
- * entry.
- */
+ /* XXX Need to generate an error CQ entry. */
goto ack_done;
}
if (!header_in_data)
@@ -1254,9 +1081,12 @@ static inline void ipath_rc_rcv_resp(str
if (do_rc_ack(qp, aeth, psn, OP(RDMA_READ_RESPONSE_LAST))) {
/*
* Change the state so we continue
- * processing new requests.
+ * processing new requests and wake up the
+ * tasklet if there are posted sends.
*/
qp->s_state = OP(SEND_LAST);
+ if (qp->s_tail != qp->s_head)
+ tasklet_hi_schedule(&qp->s_task);
}
goto ack_done;
}
@@ -1295,6 +1125,8 @@ static inline int ipath_rc_rcv_error(str
{
struct ib_reth *reth;

+ spin_lock(&qp->s_lock);
+
if (diff > 0) {
/*
* Packet sequence error.
@@ -1302,13 +1134,10 @@ static inline int ipath_rc_rcv_error(str
* Don't queue the NAK if a RDMA read, atomic, or
* NAK is pending though.
*/
- spin_lock(&qp->s_lock);
if ((qp->s_ack_state >= OP(RDMA_READ_REQUEST) &&
- qp->s_ack_state != IB_OPCODE_ACKNOWLEDGE) ||
- qp->s_nak_state != 0) {
- spin_unlock(&qp->s_lock);
+ qp->s_ack_state != OP(ACKNOWLEDGE)) ||
+ qp->s_nak_state != 0)
goto done;
- }
qp->s_ack_state = OP(SEND_ONLY);
qp->s_nak_state = IB_NAK_PSN_ERROR;
/* Use the expected PSN. */
@@ -1327,12 +1156,10 @@ static inline int ipath_rc_rcv_error(str
* send the earliest so that RDMA reads can be restarted at
* the requester's expected PSN.
*/
- spin_lock(&qp->s_lock);
- if (qp->s_ack_state != IB_OPCODE_ACKNOWLEDGE &&
+ if (qp->s_ack_state != OP(ACKNOWLEDGE) &&
ipath_cmp24(psn, qp->s_ack_psn) >= 0) {
- if (qp->s_ack_state < IB_OPCODE_RDMA_READ_REQUEST)
+ if (qp->s_ack_state < OP(RDMA_READ_REQUEST))
qp->s_ack_psn = psn;
- spin_unlock(&qp->s_lock);
goto done;
}
switch (opcode) {
@@ -1343,8 +1170,7 @@ static inline int ipath_rc_rcv_error(str
* holding the s_lock.
*/
if (qp->s_ack_state != OP(ACKNOWLEDGE) &&
- qp->s_ack_state >= IB_OPCODE_RDMA_READ_REQUEST) {
- spin_unlock(&qp->s_lock);
+ qp->s_ack_state >= OP(RDMA_READ_REQUEST)) {
dev->n_rdma_dup_busy++;
goto done;
}
@@ -1383,13 +1209,11 @@ static inline int ipath_rc_rcv_error(str
case OP(COMPARE_SWAP):
case OP(FETCH_ADD):
/*
- * Check for the PSN of the last atomic operations
+ * Check for the PSN of the last atomic operation
* performed and resend the result if found.
*/
- if ((psn & IPS_PSN_MASK) != qp->r_atomic_psn) {
- spin_unlock(&qp->s_lock);
+ if ((psn & IPS_PSN_MASK) != qp->r_atomic_psn)
goto done;
- }
qp->s_ack_atomic = qp->r_atomic_data;
break;
}
@@ -1400,6 +1224,7 @@ resched:
return 0;

done:
+ spin_unlock(&qp->s_lock);
return 1;
}

@@ -1453,11 +1278,6 @@ void ipath_rc_rcv(struct ipath_ibdev *de
} else
psn = be32_to_cpu(ohdr->bth[2]);
}
- /*
- * The opcode is in the low byte when its in network order
- * (top byte when in host order).
- */
- opcode = be32_to_cpu(ohdr->bth[0]) >> 24;

/*
* Process responses (ACKs) before anything else. Note that the
@@ -1465,6 +1285,7 @@ void ipath_rc_rcv(struct ipath_ibdev *de
* queue rather than the expected receive packet sequence number.
* In other words, this QP is the requester.
*/
+ opcode = be32_to_cpu(ohdr->bth[0]) >> 24;
if (opcode >= OP(RDMA_READ_RESPONSE_FIRST) &&
opcode <= OP(ATOMIC_ACKNOWLEDGE)) {
ipath_rc_rcv_resp(dev, ohdr, data, tlen, qp, opcode, psn,
@@ -1492,22 +1313,23 @@ void ipath_rc_rcv(struct ipath_ibdev *de
opcode == OP(SEND_LAST_WITH_IMMEDIATE))
break;
nack_inv:
- /*
- * A NAK will ACK earlier sends and RDMA writes. Don't queue the
- * NAK if a RDMA read, atomic, or NAK is pending though.
- */
- spin_lock(&qp->s_lock);
- if (qp->s_ack_state >= OP(RDMA_READ_REQUEST) &&
- qp->s_ack_state != IB_OPCODE_ACKNOWLEDGE) {
- spin_unlock(&qp->s_lock);
- goto done;
- }
- /* XXX Flush WQEs */
- qp->state = IB_QPS_ERR;
- qp->s_ack_state = OP(SEND_ONLY);
- qp->s_nak_state = IB_NAK_INVALID_REQUEST;
- qp->s_ack_psn = qp->r_psn;
- goto resched;
+ /*
+ * A NAK will ACK earlier sends and RDMA writes.
+ * Don't queue the NAK if a RDMA read, atomic, or NAK
+ * is pending though.
+ */
+ spin_lock(&qp->s_lock);
+ if (qp->s_ack_state >= OP(RDMA_READ_REQUEST) &&
+ qp->s_ack_state != OP(ACKNOWLEDGE)) {
+ spin_unlock(&qp->s_lock);
+ goto done;
+ }
+ /* XXX Flush WQEs */
+ qp->state = IB_QPS_ERR;
+ qp->s_ack_state = OP(SEND_ONLY);
+ qp->s_nak_state = IB_NAK_INVALID_REQUEST;
+ qp->s_ack_psn = qp->r_psn;
+ goto resched;

case OP(RDMA_WRITE_FIRST):
case OP(RDMA_WRITE_MIDDLE):
@@ -1556,9 +1378,8 @@ void ipath_rc_rcv(struct ipath_ibdev *de
* is pending though.
*/
spin_lock(&qp->s_lock);
- if (qp->s_ack_state >=
- OP(RDMA_READ_REQUEST) &&
- qp->s_ack_state != IB_OPCODE_ACKNOWLEDGE) {
+ if (qp->s_ack_state >= OP(RDMA_READ_REQUEST) &&
+ qp->s_ack_state != OP(ACKNOWLEDGE)) {
spin_unlock(&qp->s_lock);
goto done;
}
@@ -1674,10 +1495,10 @@ void ipath_rc_rcv(struct ipath_ibdev *de
* read, atomic, or NAK is pending though.
*/
spin_lock(&qp->s_lock);
+ nack_acc1:
if (qp->s_ack_state >=
OP(RDMA_READ_REQUEST) &&
- qp->s_ack_state !=
- IB_OPCODE_ACKNOWLEDGE) {
+ qp->s_ack_state != OP(ACKNOWLEDGE)) {
spin_unlock(&qp->s_lock);
goto done;
}
@@ -1715,9 +1536,16 @@ void ipath_rc_rcv(struct ipath_ibdev *de
reth = (struct ib_reth *)data;
data += sizeof(*reth);
}
+ if (unlikely(!(qp->qp_access_flags &
+ IB_ACCESS_REMOTE_READ)))
+ goto nack_acc;
+ /*
+ * Ignore request if we already have an
+ * RDMA read or ATOMIC pending.
+ */
spin_lock(&qp->s_lock);
if (qp->s_ack_state != OP(ACKNOWLEDGE) &&
- qp->s_ack_state >= IB_OPCODE_RDMA_READ_REQUEST) {
+ qp->s_ack_state >= OP(RDMA_READ_REQUEST)) {
spin_unlock(&qp->s_lock);
goto done;
}
@@ -1731,10 +1559,8 @@ void ipath_rc_rcv(struct ipath_ibdev *de
ok = ipath_rkey_ok(dev, &qp->s_rdma_sge,
qp->s_rdma_len, vaddr, rkey,
IB_ACCESS_REMOTE_READ);
- if (unlikely(!ok)) {
- spin_unlock(&qp->s_lock);
- goto nack_acc;
- }
+ if (unlikely(!ok))
+ goto nack_acc1;
/*
* Update the next expected PSN. We add 1 later
* below, so only add the remainder here.
@@ -1749,9 +1575,6 @@ void ipath_rc_rcv(struct ipath_ibdev *de
qp->s_rdma_sge.sge.length = 0;
qp->s_rdma_sge.sge.sge_length = 0;
}
- if (unlikely(!(qp->qp_access_flags &
- IB_ACCESS_REMOTE_READ)))
- goto nack_acc;
/*
* We need to increment the MSN here instead of when we
* finish sending the result since a duplicate request would
@@ -1821,7 +1644,7 @@ void ipath_rc_rcv(struct ipath_ibdev *de
*/
spin_lock(&qp->s_lock);
if (qp->s_ack_state == OP(ACKNOWLEDGE) ||
- qp->s_ack_state < IB_OPCODE_RDMA_READ_REQUEST) {
+ qp->s_ack_state < OP(RDMA_READ_REQUEST)) {
qp->s_ack_state = opcode;
qp->s_nak_state = 0;
qp->s_ack_psn = psn;
@@ -1843,6 +1666,8 @@ resched:
(qp->s_ack_state < IB_OPCODE_RDMA_READ_REQUEST ||
qp->s_ack_state >= IB_OPCODE_COMPARE_SWAP))
send_rc_ack(qp);
+ else
+ dev->n_rc_qacks++;

rdmadone:
spin_unlock(&qp->s_lock);
diff -r 947e92f4b370 -r 201654fe1962 drivers/infiniband/hw/ipath/ipath_ruc.c
--- a/drivers/infiniband/hw/ipath/ipath_ruc.c Fri May 12 15:55:28 2006 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_ruc.c Fri May 12 15:55:28 2006 -0700
@@ -31,6 +31,7 @@
*/

#include "ipath_verbs.h"
+#include "ips_common.h"

/*
* Convert the AETH RNR timeout code into the number of milliseconds.
@@ -187,7 +188,6 @@ bail:
/**
* ipath_ruc_loopback - handle UC and RC loopback requests
* @sqp: the loopback QP
- * @wc: the work completion entry
*
* This is called from ipath_do_uc_send() or ipath_do_rc_send() to
* forward a WQE addressed to the same HCA.
@@ -196,13 +196,14 @@ bail:
* receive interrupts since this is a connected protocol and all packets
* will pass through here.
*/
-void ipath_ruc_loopback(struct ipath_qp *sqp, struct ib_wc *wc)
+static void ipath_ruc_loopback(struct ipath_qp *sqp)
{
struct ipath_ibdev *dev = to_idev(sqp->ibqp.device);
struct ipath_qp *qp;
struct ipath_swqe *wqe;
struct ipath_sge *sge;
unsigned long flags;
+ struct ib_wc wc;
u64 sdata;

qp = ipath_lookup_qpn(&dev->qp_table, sqp->remote_qpn);
@@ -233,8 +234,8 @@ again:
wqe = get_swqe_ptr(sqp, sqp->s_last);
spin_unlock_irqrestore(&sqp->s_lock, flags);

- wc->wc_flags = 0;
- wc->imm_data = 0;
+ wc.wc_flags = 0;
+ wc.imm_data = 0;

sqp->s_sge.sge = wqe->sg_list[0];
sqp->s_sge.sg_list = wqe->sg_list + 1;
@@ -242,8 +243,8 @@ again:
sqp->s_len = wqe->length;
switch (wqe->wr.opcode) {
case IB_WR_SEND_WITH_IMM:
- wc->wc_flags = IB_WC_WITH_IMM;
- wc->imm_data = wqe->wr.imm_data;
+ wc.wc_flags = IB_WC_WITH_IMM;
+ wc.imm_data = wqe->wr.imm_data;
/* FALLTHROUGH */
case IB_WR_SEND:
spin_lock_irqsave(&qp->r_rq.lock, flags);
@@ -254,7 +255,7 @@ again:
if (qp->ibqp.qp_type == IB_QPT_UC)
goto send_comp;
if (sqp->s_rnr_retry == 0) {
- wc->status = IB_WC_RNR_RETRY_EXC_ERR;
+ wc.status = IB_WC_RNR_RETRY_EXC_ERR;
goto err;
}
if (sqp->s_rnr_retry_cnt < 7)
@@ -269,8 +270,8 @@ again:
break;

case IB_WR_RDMA_WRITE_WITH_IMM:
- wc->wc_flags = IB_WC_WITH_IMM;
- wc->imm_data = wqe->wr.imm_data;
+ wc.wc_flags = IB_WC_WITH_IMM;
+ wc.imm_data = wqe->wr.imm_data;
spin_lock_irqsave(&qp->r_rq.lock, flags);
if (!ipath_get_rwqe(qp, 1))
goto rnr_nak;
@@ -284,20 +285,20 @@ again:
wqe->wr.wr.rdma.rkey,
IB_ACCESS_REMOTE_WRITE))) {
acc_err:
- wc->status = IB_WC_REM_ACCESS_ERR;
+ wc.status = IB_WC_REM_ACCESS_ERR;
err:
- wc->wr_id = wqe->wr.wr_id;
- wc->opcode = ib_ipath_wc_opcode[wqe->wr.opcode];
- wc->vendor_err = 0;
- wc->byte_len = 0;
- wc->qp_num = sqp->ibqp.qp_num;
- wc->src_qp = sqp->remote_qpn;
- wc->pkey_index = 0;
- wc->slid = sqp->remote_ah_attr.dlid;
- wc->sl = sqp->remote_ah_attr.sl;
- wc->dlid_path_bits = 0;
- wc->port_num = 0;
- ipath_sqerror_qp(sqp, wc);
+ wc.wr_id = wqe->wr.wr_id;
+ wc.opcode = ib_ipath_wc_opcode[wqe->wr.opcode];
+ wc.vendor_err = 0;
+ wc.byte_len = 0;
+ wc.qp_num = sqp->ibqp.qp_num;
+ wc.src_qp = sqp->remote_qpn;
+ wc.pkey_index = 0;
+ wc.slid = sqp->remote_ah_attr.dlid;
+ wc.sl = sqp->remote_ah_attr.sl;
+ wc.dlid_path_bits = 0;
+ wc.port_num = 0;
+ ipath_sqerror_qp(sqp, &wc);
goto done;
}
break;
@@ -373,22 +374,22 @@ again:
goto send_comp;

if (wqe->wr.opcode == IB_WR_RDMA_WRITE_WITH_IMM)
- wc->opcode = IB_WC_RECV_RDMA_WITH_IMM;
+ wc.opcode = IB_WC_RECV_RDMA_WITH_IMM;
else
- wc->opcode = IB_WC_RECV;
- wc->wr_id = qp->r_wr_id;
- wc->status = IB_WC_SUCCESS;
- wc->vendor_err = 0;
- wc->byte_len = wqe->length;
- wc->qp_num = qp->ibqp.qp_num;
- wc->src_qp = qp->remote_qpn;
+ wc.opcode = IB_WC_RECV;
+ wc.wr_id = qp->r_wr_id;
+ wc.status = IB_WC_SUCCESS;
+ wc.vendor_err = 0;
+ wc.byte_len = wqe->length;
+ wc.qp_num = qp->ibqp.qp_num;
+ wc.src_qp = qp->remote_qpn;
/* XXX do we know which pkey matched? Only needed for GSI. */
- wc->pkey_index = 0;
- wc->slid = qp->remote_ah_attr.dlid;
- wc->sl = qp->remote_ah_attr.sl;
- wc->dlid_path_bits = 0;
+ wc.pkey_index = 0;
+ wc.slid = qp->remote_ah_attr.dlid;
+ wc.sl = qp->remote_ah_attr.sl;
+ wc.dlid_path_bits = 0;
/* Signal completion event if the solicited bit is set. */
- ipath_cq_enter(to_icq(qp->ibqp.recv_cq), wc,
+ ipath_cq_enter(to_icq(qp->ibqp.recv_cq), &wc,
wqe->wr.send_flags & IB_SEND_SOLICITED);

send_comp:
@@ -396,19 +397,19 @@ send_comp:

if (!test_bit(IPATH_S_SIGNAL_REQ_WR, &sqp->s_flags) ||
(wqe->wr.send_flags & IB_SEND_SIGNALED)) {
- wc->wr_id = wqe->wr.wr_id;
- wc->status = IB_WC_SUCCESS;
- wc->opcode = ib_ipath_wc_opcode[wqe->wr.opcode];
- wc->vendor_err = 0;
- wc->byte_len = wqe->length;
- wc->qp_num = sqp->ibqp.qp_num;
- wc->src_qp = 0;
- wc->pkey_index = 0;
- wc->slid = 0;
- wc->sl = 0;
- wc->dlid_path_bits = 0;
- wc->port_num = 0;
- ipath_cq_enter(to_icq(sqp->ibqp.send_cq), wc, 0);
+ wc.wr_id = wqe->wr.wr_id;
+ wc.status = IB_WC_SUCCESS;
+ wc.opcode = ib_ipath_wc_opcode[wqe->wr.opcode];
+ wc.vendor_err = 0;
+ wc.byte_len = wqe->length;
+ wc.qp_num = sqp->ibqp.qp_num;
+ wc.src_qp = 0;
+ wc.pkey_index = 0;
+ wc.slid = 0;
+ wc.sl = 0;
+ wc.dlid_path_bits = 0;
+ wc.port_num = 0;
+ ipath_cq_enter(to_icq(sqp->ibqp.send_cq), &wc, 0);
}

/* Update s_last now that we are finished with the SWQE */
@@ -454,11 +455,11 @@ void ipath_no_bufs_available(struct ipat
}

/**
- * ipath_post_rc_send - post RC and UC sends
+ * ipath_post_ruc_send - post RC and UC sends
* @qp: the QP to post on
* @wr: the work request to send
*/
-int ipath_post_rc_send(struct ipath_qp *qp, struct ib_send_wr *wr)
+int ipath_post_ruc_send(struct ipath_qp *qp, struct ib_send_wr *wr)
{
struct ipath_swqe *wqe;
unsigned long flags;
@@ -533,13 +534,149 @@ int ipath_post_rc_send(struct ipath_qp *
qp->s_head = next;
spin_unlock_irqrestore(&qp->s_lock, flags);

- if (qp->ibqp.qp_type == IB_QPT_UC)
- ipath_do_uc_send((unsigned long) qp);
- else
- ipath_do_rc_send((unsigned long) qp);
+ ipath_do_ruc_send((unsigned long) qp);

ret = 0;

bail:
return ret;
}
+
+/**
+ * ipath_make_grh - construct a GRH header
+ * @dev: a pointer to the ipath device
+ * @hdr: a pointer to the GRH header being constructed
+ * @grh: the global route address to send to
+ * @hwords: the number of 32 bit words of header being sent
+ * @nwords: the number of 32 bit words of data being sent
+ *
+ * Return the size of the header in 32 bit words.
+ */
+u32 ipath_make_grh(struct ipath_ibdev *dev, struct ib_grh *hdr,
+ struct ib_global_route *grh, u32 hwords, u32 nwords)
+{
+ hdr->version_tclass_flow =
+ cpu_to_be32((6 << 28) |
+ (grh->traffic_class << 20) |
+ grh->flow_label);
+ hdr->paylen = cpu_to_be16((hwords - 2 + nwords + SIZE_OF_CRC) << 2);
+ /* next_hdr is defined by C8-7 in ch. 8.4.1 */
+ hdr->next_hdr = 0x1B;
+ hdr->hop_limit = grh->hop_limit;
+ /* The SGID is 32-bit aligned. */
+ hdr->sgid.global.subnet_prefix = dev->gid_prefix;
+ hdr->sgid.global.interface_id = ipath_layer_get_guid(dev->dd);
+ hdr->dgid = grh->dgid;
+
+ /* GRH header size in 32-bit words. */
+ return sizeof(struct ib_grh) / sizeof(u32);
+}
+
+/**
+ * ipath_do_ruc_send - perform a send on an RC or UC QP
+ * @data: contains a pointer to the QP
+ *
+ * Process entries in the send work queue until credit or queue is
+ * exhausted. Only allow one CPU to send a packet per QP (tasklet).
+ * Otherwise, after we drop the QP s_lock, two threads could send
+ * packets out of order.
+ */
+void ipath_do_ruc_send(unsigned long data)
+{
+ struct ipath_qp *qp = (struct ipath_qp *)data;
+ struct ipath_ibdev *dev = to_idev(qp->ibqp.device);
+ unsigned long flags;
+ u16 lrh0;
+ u32 nwords;
+ u32 extra_bytes;
+ u32 bth0;
+ u32 bth2;
+ u32 pmtu = ib_mtu_enum_to_int(qp->path_mtu);
+ struct ipath_other_headers *ohdr;
+
+ if (test_and_set_bit(IPATH_S_BUSY, &qp->s_flags))
+ goto bail;
+
+ if (unlikely(qp->remote_ah_attr.dlid ==
+ ipath_layer_get_lid(dev->dd))) {
+ ipath_ruc_loopback(qp);
+ goto clear;
+ }
+
+ ohdr = &qp->s_hdr.u.oth;
+ if (qp->remote_ah_attr.ah_flags & IB_AH_GRH)
+ ohdr = &qp->s_hdr.u.l.oth;
+
+again:
+ /* Check for a constructed packet to be sent. */
+ if (qp->s_hdrwords != 0) {
+ /*
+ * If no PIO bufs are available, return. An interrupt will
+ * call ipath_ib_piobufavail() when one is available.
+ */
+ if (ipath_verbs_send(dev->dd, qp->s_hdrwords,
+ (u32 *) &qp->s_hdr, qp->s_cur_size,
+ qp->s_cur_sge)) {
+ ipath_no_bufs_available(qp, dev);
+ goto bail;
+ }
+ dev->n_unicast_xmit++;
+ /* Record that we sent the packet and s_hdr is empty. */
+ qp->s_hdrwords = 0;
+ }
+
+ /*
+ * The lock is needed to synchronize between setting
+ * qp->s_ack_state, resend timer, and post_send().
+ */
+ spin_lock_irqsave(&qp->s_lock, flags);
+
+ /* Sending responses has higher priority over sending requests. */
+ if (qp->s_ack_state != IB_OPCODE_RC_ACKNOWLEDGE &&
+ (bth0 = ipath_make_rc_ack(qp, ohdr, pmtu)) != 0)
+ bth2 = qp->s_ack_psn++ & IPS_PSN_MASK;
+ else if (!((qp->ibqp.qp_type == IB_QPT_RC) ?
+ ipath_make_rc_req(qp, ohdr, pmtu, &bth0, &bth2) :
+ ipath_make_uc_req(qp, ohdr, pmtu, &bth0, &bth2))) {
+ /*
+ * Clear the busy bit before unlocking to avoid races with
+ * adding new work queue items and then failing to process
+ * them.
+ */
+ clear_bit(IPATH_S_BUSY, &qp->s_flags);
+ spin_unlock_irqrestore(&qp->s_lock, flags);
+ goto bail;
+ }
+
+ spin_unlock_irqrestore(&qp->s_lock, flags);
+
+ /* Construct the header. */
+ extra_bytes = (4 - qp->s_cur_size) & 3;
+ nwords = (qp->s_cur_size + extra_bytes) >> 2;
+ lrh0 = IPS_LRH_BTH;
+ if (unlikely(qp->remote_ah_attr.ah_flags & IB_AH_GRH)) {
+ qp->s_hdrwords += ipath_make_grh(dev, &qp->s_hdr.u.l.grh,
+ &qp->remote_ah_attr.grh,
+ qp->s_hdrwords, nwords);
+ lrh0 = IPS_LRH_GRH;
+ }
+ lrh0 |= qp->remote_ah_attr.sl << 4;
+ qp->s_hdr.lrh[0] = cpu_to_be16(lrh0);
+ qp->s_hdr.lrh[1] = cpu_to_be16(qp->remote_ah_attr.dlid);
+ qp->s_hdr.lrh[2] = cpu_to_be16(qp->s_hdrwords + nwords +
+ SIZE_OF_CRC);
+ qp->s_hdr.lrh[3] = cpu_to_be16(ipath_layer_get_lid(dev->dd));
+ bth0 |= ipath_layer_get_pkey(dev->dd, qp->s_pkey_index);
+ bth0 |= extra_bytes << 20;
+ ohdr->bth[0] = cpu_to_be32(bth0);
+ ohdr->bth[1] = cpu_to_be32(qp->remote_qpn);
+ ohdr->bth[2] = cpu_to_be32(bth2);
+
+ /* Check for more work to do. */
+ goto again;
+
+clear:
+ clear_bit(IPATH_S_BUSY, &qp->s_flags);
+bail:
+ return;
+}
diff -r 947e92f4b370 -r 201654fe1962 drivers/infiniband/hw/ipath/ipath_uc.c
--- a/drivers/infiniband/hw/ipath/ipath_uc.c Fri May 12 15:55:28 2006 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_uc.c Fri May 12 15:55:28 2006 -0700
@@ -61,90 +61,40 @@ static void complete_last_send(struct ip
}

/**
- * ipath_do_uc_send - do a send on a UC queue
- * @data: contains a pointer to the QP to send on
- *
- * Process entries in the send work queue until the queue is exhausted.
- * Only allow one CPU to send a packet per QP (tasklet).
- * Otherwise, after we drop the QP lock, two threads could send
- * packets out of order.
- * This is similar to ipath_do_rc_send() below except we don't have
- * timeouts or resends.
+ * ipath_make_uc_req - construct a request packet (SEND, RDMA write)
+ * @qp: a pointer to the QP
+ * @ohdr: a pointer to the IB header being constructed
+ * @pmtu: the path MTU
+ * @bth0p: pointer to the BTH opcode word
+ * @bth2p: pointer to the BTH PSN word
+ *
+ * Return 1 if constructed; otherwise, return 0.
+ * Note the QP s_lock must be held and interrupts disabled.
*/
-void ipath_do_uc_send(unsigned long data)
+int ipath_make_uc_req(struct ipath_qp *qp,
+ struct ipath_other_headers *ohdr,
+ u32 pmtu, u32 *bth0p, u32 *bth2p)
{
- struct ipath_qp *qp = (struct ipath_qp *)data;
- struct ipath_ibdev *dev = to_idev(qp->ibqp.device);
struct ipath_swqe *wqe;
- unsigned long flags;
- u16 lrh0;
u32 hwords;
- u32 nwords;
- u32 extra_bytes;
u32 bth0;
- u32 bth2;
- u32 pmtu = ib_mtu_enum_to_int(qp->path_mtu);
u32 len;
- struct ipath_other_headers *ohdr;
struct ib_wc wc;

- if (test_and_set_bit(IPATH_S_BUSY, &qp->s_flags))
- goto bail;
-
- if (unlikely(qp->remote_ah_attr.dlid ==
- ipath_layer_get_lid(dev->dd))) {
- /* Pass in an uninitialized ib_wc to save stack space. */
- ipath_ruc_loopback(qp, &wc);
- clear_bit(IPATH_S_BUSY, &qp->s_flags);
- goto bail;
- }
-
- ohdr = &qp->s_hdr.u.oth;
- if (qp->remote_ah_attr.ah_flags & IB_AH_GRH)
- ohdr = &qp->s_hdr.u.l.oth;
-
-again:
- /* Check for a constructed packet to be sent. */
- if (qp->s_hdrwords != 0) {
- /*
- * If no PIO bufs are available, return.
- * An interrupt will call ipath_ib_piobufavail()
- * when one is available.
- */
- if (ipath_verbs_send(dev->dd, qp->s_hdrwords,
- (u32 *) &qp->s_hdr,
- qp->s_cur_size,
- qp->s_cur_sge)) {
- ipath_no_bufs_available(qp, dev);
- goto bail;
- }
- dev->n_unicast_xmit++;
- /* Record that we sent the packet and s_hdr is empty. */
- qp->s_hdrwords = 0;
- }
-
- lrh0 = IPS_LRH_BTH;
+ if (!(ib_ipath_state_ops[qp->state] & IPATH_PROCESS_SEND_OK))
+ goto done;
+
/* header size in 32-bit words LRH+BTH = (8+12)/4. */
hwords = 5;
-
- /*
- * The lock is needed to synchronize between
- * setting qp->s_ack_state and post_send().
- */
- spin_lock_irqsave(&qp->s_lock, flags);
-
- if (!(ib_ipath_state_ops[qp->state] & IPATH_PROCESS_SEND_OK))
- goto done;
-
- bth0 = ipath_layer_get_pkey(dev->dd, qp->s_pkey_index);
-
- /* Send a request. */
+ bth0 = 0;
+
+ /* Get the next send request. */
wqe = get_swqe_ptr(qp, qp->s_last);
switch (qp->s_state) {
default:
/*
- * Signal the completion of the last send (if there is
- * one).
+ * Signal the completion of the last send
+ * (if there is one).
*/
if (qp->s_last != qp->s_tail)
complete_last_send(qp, wqe, &wc);
@@ -257,61 +207,16 @@ again:
}
break;
}
- bth2 = qp->s_next_psn++ & IPS_PSN_MASK;
qp->s_len -= len;
- bth0 |= qp->s_state << 24;
-
- spin_unlock_irqrestore(&qp->s_lock, flags);
-
- /* Construct the header. */
- extra_bytes = (4 - len) & 3;
- nwords = (len + extra_bytes) >> 2;
- if (unlikely(qp->remote_ah_attr.ah_flags & IB_AH_GRH)) {
- /* Header size in 32-bit words. */
- hwords += 10;
- lrh0 = IPS_LRH_GRH;
- qp->s_hdr.u.l.grh.version_tclass_flow =
- cpu_to_be32((6 << 28) |
- (qp->remote_ah_attr.grh.traffic_class
- << 20) |
- qp->remote_ah_attr.grh.flow_label);
- qp->s_hdr.u.l.grh.paylen =
- cpu_to_be16(((hwords - 12) + nwords +
- SIZE_OF_CRC) << 2);
- /* next_hdr is defined by C8-7 in ch. 8.4.1 */
- qp->s_hdr.u.l.grh.next_hdr = 0x1B;
- qp->s_hdr.u.l.grh.hop_limit =
- qp->remote_ah_attr.grh.hop_limit;
- /* The SGID is 32-bit aligned. */
- qp->s_hdr.u.l.grh.sgid.global.subnet_prefix =
- dev->gid_prefix;
- qp->s_hdr.u.l.grh.sgid.global.interface_id =
- ipath_layer_get_guid(dev->dd);
- qp->s_hdr.u.l.grh.dgid = qp->remote_ah_attr.grh.dgid;
- }
qp->s_hdrwords = hwords;
qp->s_cur_sge = &qp->s_sge;
qp->s_cur_size = len;
- lrh0 |= qp->remote_ah_attr.sl << 4;
- qp->s_hdr.lrh[0] = cpu_to_be16(lrh0);
- /* DEST LID */
- qp->s_hdr.lrh[1] = cpu_to_be16(qp->remote_ah_attr.dlid);
- qp->s_hdr.lrh[2] = cpu_to_be16(hwords + nwords + SIZE_OF_CRC);
- qp->s_hdr.lrh[3] = cpu_to_be16(ipath_layer_get_lid(dev->dd));
- bth0 |= extra_bytes << 20;
- ohdr->bth[0] = cpu_to_be32(bth0);
- ohdr->bth[1] = cpu_to_be32(qp->remote_qpn);
- ohdr->bth[2] = cpu_to_be32(bth2);
-
- /* Check for more work to do. */
- goto again;
+ *bth0p = bth0 | (qp->s_state << 24);
+ *bth2p = qp->s_next_psn++ & IPS_PSN_MASK;
+ return 1;

done:
- spin_unlock_irqrestore(&qp->s_lock, flags);
- clear_bit(IPATH_S_BUSY, &qp->s_flags);
-
-bail:
- return;
+ return 0;
}

/**
@@ -535,12 +440,13 @@ void ipath_uc_rcv(struct ipath_ibdev *de
if (qp->r_len != 0) {
u32 rkey = be32_to_cpu(reth->rkey);
u64 vaddr = be64_to_cpu(reth->vaddr);
+ int ok;

/* Check rkey */
- if (unlikely(!ipath_rkey_ok(
- dev, &qp->r_sge, qp->r_len,
- vaddr, rkey,
- IB_ACCESS_REMOTE_WRITE))) {
+ ok = ipath_rkey_ok(dev, &qp->r_sge, qp->r_len,
+ vaddr, rkey,
+ IB_ACCESS_REMOTE_WRITE);
+ if (unlikely(!ok)) {
dev->n_pkt_drops++;
goto done;
}
@@ -558,8 +464,7 @@ void ipath_uc_rcv(struct ipath_ibdev *de
}
if (opcode == OP(RDMA_WRITE_ONLY))
goto rdma_last;
- else if (opcode ==
- OP(RDMA_WRITE_ONLY_WITH_IMMEDIATE))
+ else if (opcode == OP(RDMA_WRITE_ONLY_WITH_IMMEDIATE))
goto rdma_last_imm;
/* FALLTHROUGH */
case OP(RDMA_WRITE_MIDDLE):
@@ -592,9 +497,9 @@ void ipath_uc_rcv(struct ipath_ibdev *de
dev->n_pkt_drops++;
goto done;
}
- if (qp->r_reuse_sge) {
+ if (qp->r_reuse_sge)
qp->r_reuse_sge = 0;
- } else if (!ipath_get_rwqe(qp, 1)) {
+ else if (!ipath_get_rwqe(qp, 1)) {
dev->n_pkt_drops++;
goto done;
}
diff -r 947e92f4b370 -r 201654fe1962 drivers/infiniband/hw/ipath/ipath_verbs.c
--- a/drivers/infiniband/hw/ipath/ipath_verbs.c Fri May 12 15:55:28 2006 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_verbs.c Fri May 12 15:55:28 2006 -0700
@@ -246,7 +246,7 @@ static int ipath_post_send(struct ib_qp
switch (qp->ibqp.qp_type) {
case IB_QPT_UC:
case IB_QPT_RC:
- err = ipath_post_rc_send(qp, wr);
+ err = ipath_post_ruc_send(qp, wr);
break;

case IB_QPT_SMI:
diff -r 947e92f4b370 -r 201654fe1962 drivers/infiniband/hw/ipath/ipath_verbs.h
--- a/drivers/infiniband/hw/ipath/ipath_verbs.h Fri May 12 15:55:28 2006 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_verbs.h Fri May 12 15:55:28 2006 -0700
@@ -585,10 +585,6 @@ void ipath_sqerror_qp(struct ipath_qp *q

void ipath_get_credit(struct ipath_qp *qp, u32 aeth);

-void ipath_do_rc_send(unsigned long data);
-
-void ipath_do_uc_send(unsigned long data);
-
void ipath_cq_enter(struct ipath_cq *cq, struct ib_wc *entry, int sig);

int ipath_rkey_ok(struct ipath_ibdev *dev, struct ipath_sge_state *ss,
@@ -601,7 +597,7 @@ void ipath_copy_sge(struct ipath_sge_sta

void ipath_skip_sge(struct ipath_sge_state *ss, u32 length);

-int ipath_post_rc_send(struct ipath_qp *qp, struct ib_send_wr *wr);
+int ipath_post_ruc_send(struct ipath_qp *qp, struct ib_send_wr *wr);

void ipath_uc_rcv(struct ipath_ibdev *dev, struct ipath_ib_header *hdr,
int has_grh, void *data, u32 tlen, struct ipath_qp *qp);
@@ -683,7 +679,19 @@ void ipath_insert_rnr_queue(struct ipath

int ipath_get_rwqe(struct ipath_qp *qp, int wr_id_only);

-void ipath_ruc_loopback(struct ipath_qp *sqp, struct ib_wc *wc);
+u32 ipath_make_grh(struct ipath_ibdev *dev, struct ib_grh *hdr,
+ struct ib_global_route *grh, u32 hwords, u32 nwords);
+
+void ipath_do_ruc_send(unsigned long data);
+
+u32 ipath_make_rc_ack(struct ipath_qp *qp, struct ipath_other_headers *ohdr,
+ u32 pmtu);
+
+int ipath_make_rc_req(struct ipath_qp *qp, struct ipath_other_headers *ohdr,
+ u32 pmtu, u32 *bth0p, u32 *bth2p);
+
+int ipath_make_uc_req(struct ipath_qp *qp, struct ipath_other_headers *ohdr,
+ u32 pmtu, u32 *bth0p, u32 *bth2p);

extern const enum ib_wc_opcode ib_ipath_wc_opcode[];
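
A note on the ipath_make_grh() hunk above: the first GRH word packs the
IPv6 version, traffic class, and flow label into 32 bits. A worked
example of the packing (illustrative only, not driver code):

        u32 tclass = 0, flow_label = 0;
        /* version (4 bits) | traffic class (8 bits) | flow label (20 bits) */
        u32 vtf = (6 << 28) | (tclass << 20) | (flow_label & 0xFFFFF);
        /* with both fields zero this is 0x60000000; cpu_to_be32() then
         * puts it on the wire in network byte order */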

2006-05-12 23:44:33

by Bryan O'Sullivan

[permalink] [raw]
Subject: [PATCH 1 of 53] ipath - fix spinlock recursion bug

The local loopback path for RC can lock the rkey table lock without
blocking interrupts. The receive interrupt path can then call
ipath_rkey_ok() and deadlock. Since the lock only protects a 64 bit read,
the lock isn't needed.
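
The deadlock pattern is roughly this (a minimal sketch, not the actual
driver code paths):

        /* process context (RC loopback), interrupts still enabled: */
        spin_lock(&rkt->lock);
        /* ... a receive interrupt fires on this CPU; its handler calls
         * ipath_rkey_ok(), which does spin_lock(&rkt->lock) and spins
         * forever, since the lock is held by the code it interrupted */
        mr = rkt->table[index];
        spin_unlock(&rkt->lock);

The usual cure is spin_lock_irqsave(); this patch instead drops the lock
on the grounds that it only guards a single read.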

Signed-off-by: Bryan O'Sullivan <[email protected]>

diff -r 89f7c69a68bf -r 9b9f24aab350 drivers/infiniband/hw/ipath/ipath_keys.c
--- a/drivers/infiniband/hw/ipath/ipath_keys.c Fri May 12 15:55:27 2006 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_keys.c Fri May 12 15:55:27 2006 -0700
@@ -136,9 +136,7 @@ int ipath_lkey_ok(struct ipath_lkey_tabl
ret = 1;
goto bail;
}
- spin_lock(&rkt->lock);
mr = rkt->table[(sge->lkey >> (32 - ib_ipath_lkey_table_size))];
- spin_unlock(&rkt->lock);
if (unlikely(mr == NULL || mr->lkey != sge->lkey)) {
ret = 0;
goto bail;
@@ -184,8 +182,6 @@ bail:
* @acc: access flags
*
* Return 1 if successful, otherwise 0.
- *
- * The QP r_rq.lock should be held.
*/
int ipath_rkey_ok(struct ipath_ibdev *dev, struct ipath_sge_state *ss,
u32 len, u64 vaddr, u32 rkey, int acc)
@@ -196,9 +192,7 @@ int ipath_rkey_ok(struct ipath_ibdev *de
size_t off;
int ret;

- spin_lock(&rkt->lock);
mr = rkt->table[(rkey >> (32 - ib_ipath_lkey_table_size))];
- spin_unlock(&rkt->lock);
if (unlikely(mr == NULL || mr->lkey != rkey)) {
ret = 0;
goto bail;

2006-05-13 00:02:53

by Bryan O'Sullivan

[permalink] [raw]
Subject: [PATCH 15 of 53] ipath - make some maximum values more sane

Increase the limits on some maximum values.

Signed-off-by: Bryan O'Sullivan <[email protected]>

diff -r 5d9fbba3222e -r 480ceff18a88 drivers/infiniband/hw/ipath/ipath_verbs.c
--- a/drivers/infiniband/hw/ipath/ipath_verbs.c Fri May 12 15:55:28 2006 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_verbs.c Fri May 12 15:55:28 2006 -0700
@@ -64,21 +64,21 @@ module_param_named(max_ahs, ib_ipath_max
module_param_named(max_ahs, ib_ipath_max_ahs, uint, S_IWUSR | S_IRUGO);
MODULE_PARM_DESC(max_ahs, "Maximum number of address handles to support");

-unsigned int ib_ipath_max_cqes = 0xFFFF;
+unsigned int ib_ipath_max_cqes = 0x2FFFF;
module_param_named(max_cqes, ib_ipath_max_cqes, uint, S_IWUSR | S_IRUGO);
MODULE_PARM_DESC(max_cqes,
"Maximum number of completion queue entries to support");

-unsigned int ib_ipath_max_cqs = 0xFFFF;
+unsigned int ib_ipath_max_cqs = 0x1FFFF;
module_param_named(max_cqs, ib_ipath_max_cqs, uint, S_IWUSR | S_IRUGO);
MODULE_PARM_DESC(max_cqs, "Maximum number of completion queues to support");

-unsigned int ib_ipath_max_qp_wrs = 255;
+unsigned int ib_ipath_max_qp_wrs = 0x1FFFF;
module_param_named(max_qp_wrs, ib_ipath_max_qp_wrs, uint,
S_IWUSR | S_IRUGO);
MODULE_PARM_DESC(max_qp_wrs, "Maximum number of QP WRs to support");

-unsigned int ib_ipath_max_sges = 255;
+unsigned int ib_ipath_max_sges = 0xFF;
module_param_named(max_sges, ib_ipath_max_sges, uint, S_IWUSR | S_IRUGO);
MODULE_PARM_DESC(max_sges, "Maximum number of SGEs to support");

2006-05-13 00:03:09

by Bryan O'Sullivan

[permalink] [raw]
Subject: [PATCH 16 of 53] ipath - fix reporting of driver version to userspace

Fix the interface version that gets exported to userspace.

Signed-off-by: Bryan O'Sullivan <[email protected]>

diff -r 480ceff18a88 -r 176d1f0c26a3 drivers/infiniband/hw/ipath/ipath_file_ops.c
--- a/drivers/infiniband/hw/ipath/ipath_file_ops.c Fri May 12 15:55:28 2006 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_file_ops.c Fri May 12 15:55:28 2006 -0700
@@ -139,7 +139,7 @@ static int ipath_get_base_info(struct ip
kinfo->spi_piosize = dd->ipath_ibmaxlen;
kinfo->spi_mtu = dd->ipath_ibmaxlen; /* maxlen, not ibmtu */
kinfo->spi_port = pd->port_port;
- kinfo->spi_sw_version = IPATH_USER_SWVERSION;
+ kinfo->spi_sw_version = IPATH_KERN_SWVERSION;
kinfo->spi_hw_version = dd->ipath_revision;

if (copy_to_user(ubase, kinfo, sizeof(*kinfo)))

2006-05-13 00:03:09

by Bryan O'Sullivan

[permalink] [raw]
Subject: [PATCH 14 of 53] ipath - forbid empty MRs

Don't allow zero-length regions to be created.

Signed-off-by: Bryan O'Sullivan <[email protected]>

diff -r 02a05b853d20 -r 5d9fbba3222e drivers/infiniband/hw/ipath/ipath_mr.c
--- a/drivers/infiniband/hw/ipath/ipath_mr.c Fri May 12 15:55:28 2006 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_mr.c Fri May 12 15:55:28 2006 -0700
@@ -168,6 +168,11 @@ struct ib_mr *ipath_reg_user_mr(struct i
struct ib_umem_chunk *chunk;
int n, m, i;
struct ib_mr *ret;
+
+ if (region->length == 0) {
+ ret = ERR_PTR(-EINVAL);
+ goto bail;
+ }

n = 0;
list_for_each_entry(chunk, &region->chunk_list, list)

2006-05-13 00:02:27

by Bryan O'Sullivan

[permalink] [raw]
Subject: [PATCH 30 of 53] ipath - count VL15 packet drops due to bad VL or lack of buffers

Signed-off-by: Bryan O'Sullivan <[email protected]>

diff -r 23519e578bf0 -r b098b021b6fd drivers/infiniband/hw/ipath/ipath_ud.c
--- a/drivers/infiniband/hw/ipath/ipath_ud.c Fri May 12 15:55:28 2006 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_ud.c Fri May 12 15:55:28 2006 -0700
@@ -554,11 +554,16 @@ void ipath_ud_rcv(struct ipath_ibdev *de
spin_lock_irqsave(&rq->lock, flags);
if (rq->tail == rq->head) {
spin_unlock_irqrestore(&rq->lock, flags);
- /* Count VL15 packets dropped due to no receive buffer */
+ /*
+ * Count VL15 packets dropped due to no receive buffer.
+ * Otherwise, count them as buffer overruns since usually,
+ * the HW will be able to receive packets even if there are
+ * no QPs with posted receive buffers.
+ */
if (qp->ibqp.qp_num == 0)
dev->n_vl15_dropped++;
else
- dev->n_pkt_drops++;
+ dev->rcv_errors++;
goto bail;
}
/* Silently drop packets which are too big. */

2006-05-13 00:03:36

by Bryan O'Sullivan

[permalink] [raw]
Subject: [PATCH 10 of 53] ipath - require capabilities when creating a QP

You have to specify some capabilities when creating a QP.
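
For example, a caller would now need something like this (a sketch with
illustrative values; send_cq and recv_cq are assumed to exist):

        struct ib_qp_init_attr init_attr = {
                .qp_type = IB_QPT_RC,
                .send_cq = send_cq,
                .recv_cq = recv_cq,
                .cap = {
                        .max_send_wr  = 16,
                        .max_recv_wr  = 16,
                        .max_send_sge = 1,
                        .max_recv_sge = 1,
                },
        };
        /* all four cap fields zero now gets ERR_PTR(-EINVAL) back */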

Signed-off-by: Bryan O'Sullivan <[email protected]>

diff -r a89145f4846c -r 2fea0d127a41 drivers/infiniband/hw/ipath/ipath_qp.c
--- a/drivers/infiniband/hw/ipath/ipath_qp.c Fri May 12 15:55:27 2006 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_qp.c Fri May 12 15:55:27 2006 -0700
@@ -667,6 +667,14 @@ struct ib_qp *ipath_create_qp(struct ib_
goto bail;
}

+ if (init_attr->cap.max_send_sge +
+ init_attr->cap.max_recv_sge +
+ init_attr->cap.max_send_wr +
+ init_attr->cap.max_recv_wr == 0) {
+ ret = ERR_PTR(-EINVAL);
+ goto bail;
+ }
+
switch (init_attr->qp_type) {
case IB_QPT_UC:
case IB_QPT_RC:

2006-05-13 00:04:34

by Bryan O'Sullivan

[permalink] [raw]
Subject: [PATCH 7 of 53] ipath - cap maximum number of AHs

Cap the maximum number of address handles.

Signed-off-by: Bryan O'Sullivan <[email protected]>

diff -r def81ab50644 -r e823378bd19c drivers/infiniband/hw/ipath/ipath_verbs.c
--- a/drivers/infiniband/hw/ipath/ipath_verbs.c Fri May 12 15:55:27 2006 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_verbs.c Fri May 12 15:55:27 2006 -0700
@@ -59,6 +59,11 @@ module_param_named(max_pds, ib_ipath_max
module_param_named(max_pds, ib_ipath_max_pds, uint, S_IWUSR | S_IRUGO);
MODULE_PARM_DESC(max_pds,
"Maximum number of protection domains to support");
+
+static unsigned int ib_ipath_max_ahs = 0xFFFF;
+module_param_named(max_ahs, ib_ipath_max_ahs, uint, S_IWUSR | S_IRUGO);
+MODULE_PARM_DESC(max_ahs,
+ "Maximum number of address handles to support");

MODULE_LICENSE("GPL");
MODULE_AUTHOR("PathScale <[email protected]>");
@@ -592,6 +597,7 @@ static int ipath_query_device(struct ib_
props->max_qp_wr = 0xffff;
props->max_sge = 255;
props->max_cq = 0xffff;
+ props->max_ah = ib_ipath_max_ahs;
props->max_cqe = 0xffff;
props->max_mr = dev->lk_table.max;
props->max_pd = ib_ipath_max_pds;
@@ -764,13 +770,13 @@ static struct ib_pd *ipath_alloc_pd(stru
goto bail;
}

- dev->n_pds_allocated++;
-
pd = kmalloc(sizeof *pd, GFP_KERNEL);
if (!pd) {
ret = ERR_PTR(-ENOMEM);
goto bail;
}
+
+ dev->n_pds_allocated++;

/* ib_alloc_pd() will initialize pd->ibpd. */
pd->user = udata != NULL;
@@ -805,6 +811,12 @@ static struct ib_ah *ipath_create_ah(str
{
struct ipath_ah *ah;
struct ib_ah *ret;
+ struct ipath_ibdev *dev = to_idev(pd->device);
+
+ if (dev->n_ahs_allocated == ib_ipath_max_ahs) {
+ ret = ERR_PTR(-ENOMEM);
+ goto bail;
+ }

/* A multicast address requires a GRH (see ch. 8.4.1). */
if (ah_attr->dlid >= IPS_MULTICAST_LID_BASE &&
@@ -848,7 +860,10 @@ bail:
*/
static int ipath_destroy_ah(struct ib_ah *ibah)
{
+ struct ipath_ibdev *dev = to_idev(ibah->device);
struct ipath_ah *ah = to_iah(ibah);
+
+ dev->n_ahs_allocated--;

kfree(ah);

diff -r def81ab50644 -r e823378bd19c drivers/infiniband/hw/ipath/ipath_verbs.h
--- a/drivers/infiniband/hw/ipath/ipath_verbs.h Fri May 12 15:55:27 2006 -0700
+++ b/drivers/infiniband/hw/ipath/ipath_verbs.h Fri May 12 15:55:27 2006 -0700
@@ -432,6 +432,7 @@ struct ipath_ibdev {
__be64 gid_prefix; /* in network order */
__be64 mkey;
u32 n_pds_allocated; /* number of PDs allocated for device */
+ u32 n_ahs_allocated; /* number of AHs allocated for device */
u64 ipath_sword; /* total dwords sent (sample result) */
u64 ipath_rword; /* total dwords received (sample result) */
u64 ipath_spkts; /* total packets sent (sample result) */

2006-05-13 01:01:22

by Joshua Hudson

[permalink] [raw]
Subject: Re: [PATCH 1 of 53] ipath - fix spinlock recursion bug

On 5/12/06, Bryan O'Sullivan <[email protected]> wrote:
> The local loopback path for RC can lock the rkey table lock without
> blocking interrupts. The receive interrupt path can then call
> ipath_rkey_ok() and deadlock. Since the lock only protects a 64 bit read,
> the lock isn't needed.
Uhhh, a 64 bit read is not atomic on all architectures. Certainly not i386.

Might want to verify safety of this.

2006-05-15 15:02:36

by Roger Heflin

[permalink] [raw]
Subject: Re: [openib-general] [PATCH 0 of 53] ipath driver updates for 2.6.17-rc4

Bryan O'Sullivan wrote:
> Hi, Roland -
>
> Here is a series of patches to bring the ipath driver up to date. I
> believe you may already have two of them (but I've included them just
> in case), but the others should all be new.
>
> They apply on top of Linus's current -git.
>
> Cheers,
>
> <b


Bryan,

I notice there are several patches in 1-53 that appear to be missing, or
at least I did not get them on the openib list.

They are: 3, 11, 12, 17, 27, 35, 36, 38, 41, 43, 46, 52.

Roger

2006-05-15 15:44:49

by Roland Dreier

[permalink] [raw]
Subject: Re: [PATCH 0 of 53] ipath driver updates for 2.6.17-rc4

Umm... dumping a 53 patch series into the kernel at this stage in the
release cycle isn't going to work. You need to sort out the patches
that need to go into 2.6.17 from patches that can wait. For example,
a 1500+ line patch to factor out common code is clearly not
appropriate now. Pretty much the only patches that should be going in
now are changes that fix crashes or other serious bugs.

(You can send both sets of patches at the same time -- just let me know
which ones are for 2.6.17 and which ones can be queued for 2.6.18)

I have some more specific comments in reply to individual patches,
although I didn't try to review all 53.

- R.

2006-05-15 15:45:52

by Roland Dreier

[permalink] [raw]
Subject: Re: [PATCH 4 of 53] ipath - cap number of PDs that can be allocated

> Put an arbitrary cap on the maximum number of PDs that can be allocated
> for a device. This is arbitrary because the number we support
> is constrained only by system memory and what kmalloc can give us.
> Nevertheless, if we don't have a limit, some third-party OpenIB stress
> tests fail. The limit can be changed on the fly using a module parameter.

Would it make more sense to fix the stress test?

- R.

2006-05-15 15:47:08

by Roland Dreier

[permalink] [raw]
Subject: Re: [PATCH 14 of 53] ipath - forbid empty MRs

> Don't allow zero-length regions to be created.

Why are zero-length regions forbidden?

- R.

2006-05-15 15:48:16

by Roland Dreier

[permalink] [raw]
Subject: Re: [PATCH 15 of 53] ipath - make some maximum values more sane

> -unsigned int ib_ipath_max_cqes = 0xFFFF;
> +unsigned int ib_ipath_max_cqes = 0x2FFFF;

You just added this limit in patch 8/53. How about just fixing that
patch to do what you want?

- R.

2006-05-15 15:50:35

by Roland Dreier

[permalink] [raw]
Subject: Re: [PATCH 21 of 53] ipath - use phys_to_virt instead of bus_to_virt

> I think Roland already has this patch.

> * This is a bit of a hack since we rely on dma_map_single()
> - * being reversible by calling bus_to_virt().
> + * being reversible by calling phys_to_virt().

Actually I NAK'ed this patch. It compiles the same thing on x86_64
but makes the source code wrong -- dma_map_single() returns a bus
address, not a physical address.

- R.

2006-05-15 15:53:46

by Roland Dreier

[permalink] [raw]
Subject: Re: [PATCH 35 of 53] ipath - some interrelated stability and cleanliness fixes

This looks like a pastiche of several patches. Why can't it be split
up into logical pieces?

> Call dma_free_coherent without ipath_mutex held.

Why? Doesn't freeing work with the mutex held?

- R.

2006-05-15 15:55:40

by Roland Dreier

[permalink] [raw]
Subject: Re: [PATCH 41 of 53] ipath - disable interrupts while holding spinlock in RWQE get

> @@ -171,12 +171,13 @@ int ipath_get_rwqe(struct ipath_qp *qp,
> n = rq->head - rq->tail;
> if (n < srq->limit) {
> srq->limit = 0;
> - spin_unlock(&rq->lock);
> + spin_unlock_irqrestore(&rq->lock, flags);
> ev.device = qp->ibqp.device;
> ev.element.srq = qp->ibqp.srq;
> ev.event = IB_EVENT_SRQ_LIMIT_REACHED;
> srq->ibsrq.event_handler(&ev,
> srq->ibsrq.srq_context);
> + spin_lock_irqsave(&rq->lock, flags);

ipath_get_rwqe() in the kernel now doesn't even have a flags
variable. So this looks like a bug introduced earlier in this patch
series. Please roll the fix up into the place where you added the bug.

- R.

2006-05-15 15:57:46

by Roland Dreier

[permalink] [raw]
Subject: Re: [PATCH 53 of 53] ipath - add memory barrier when waiting for writes

> static void i2c_wait_for_writes(struct ipath_devdata *dd)
> {
> + mb();
> (void)ipath_read_kreg32(dd, dd->ipath_kregs->kr_scratch);
> }

This needs a comment explaining why it's needed. A memory barrier
before a readl() looks very strange since readl() should be ordered anyway.

- R.

2006-05-15 16:00:40

by Roland Dreier

[permalink] [raw]
Subject: Re: [PATCH 50 of 53] ipath - reduce maximum table sizes

This is the third patch in the series that changes these -- how about
making up your mind ;)

2006-05-15 16:00:29

by Roger Heflin

[permalink] [raw]
Subject: Re: [openib-general] Re: [PATCH 0 of 53] ipath driver updates for 2.6.17-rc4

Roland Dreier wrote:
> Umm... dumping a 53 patch series into the kernel at this stage in the
> release cycle isn't going to work. You need to sort out the patches
> that need to go into 2.6.17 from patches that can wait. For example,
> a 1500+ line patch to factor out common code is clearly not
> appropriate now. Pretty much the only patches that should be going in
> now are changes that fix crashes or other serious bugs.
>
> (You can send both sets of patches at the same time -- just let me know
> which ones are for 2.6.17 and which ones can be queued for 2.6.18)
>
> I have some more specific comments in reply to individual patches,
> although I didn't try to review all 53.
>
> - R.


Roland,

What should these patches apply against?

I have tried rc4 and a number of them fail, and I have also
tried one of your gits (though maybe not the right one), and
at least some of the same patches seem to fail to apply
there.

If I can get an idea what they should apply to I will apply them
against that and see how things look.

I have at least one of the nasty bugs that I know about.

Roger

2006-05-15 16:04:35

by Roland Dreier

[permalink] [raw]
Subject: Re: [openib-general] Re: [PATCH 0 of 53] ipath driver updates for 2.6.17-rc4

Roger> What should these patches apply against?

No idea. Bryan said they apply against Linus's current git, but I
didn't actually try.

- R.

2006-05-15 20:11:16

by Roger Heflin

[permalink] [raw]
Subject: Re: [openib-general] Re: [PATCH 0 of 53] ipath driver updates for 2.6.17-rc4

Roland Dreier wrote:
> Roger> What should these patches apply against?
>
> No idea. Bryan said they apply against Linus's current git, but I
> didn't actually try.
>
> - R.
>

I checked the rc4 -> git patches (there is only 1 ipath patch in it),
and I get a number of patch failures attempting to apply the patches.
I have the older 5/12/06 patch that was sent, and I also get a number
of failures trying to apply that, though it may be meant to apply to
rc3 and not rc4. But rc4 + older patch + newer patches fails, and
rc4 + git + newer patches fails; looking through the code, there are
a few things in the context diffs whose origin I cannot find.

I did attempt to resolve some of the funniness, but there were things
that I appear to be missing (things in the context diff that I cannot
find in rc4 and cannot find being added by any patch), so
I don't think I can get everything to apply even with manual
adjusting.

Roger

2006-05-15 21:07:00

by Bryan O'Sullivan

[permalink] [raw]
Subject: Re: [PATCH 4 of 53] ipath - cap number of PDs that can be allocated

On Mon, 2006-05-15 at 08:45 -0700, Roland Dreier wrote:

> Would it make more sense to fix the stress test?

I don't think so. Without some kind of limits, it is simple for an
unprivileged user process to cause the kernel to allocate huge wads of
memory and thereby DoS or accidentally OOM the machine.

The test in question should probably be fixed, but this is a much more
fundamental problem. I don't have any specific opinions on what should
be done about it, other than "something".

<b

2006-05-15 21:07:40

by Bryan O'Sullivan

[permalink] [raw]
Subject: Re: [PATCH 0 of 53] ipath driver updates for 2.6.17-rc4

On Mon, 2006-05-15 at 08:44 -0700, Roland Dreier wrote:
> Umm... dumping a 53 patch series into the kernel at this stage in the
> release cycle isn't going to work.

Fair enough.

> Pretty much the only patches that should be going in
> now are changes that fix crashes or other serious bugs.

OK, I'll filter those out and send them separately.

<b

2006-05-15 21:09:20

by Bryan O'Sullivan

[permalink] [raw]
Subject: Re: [openib-general] Re: [PATCH 0 of 53] ipath driver updates for 2.6.17-rc4

On Mon, 2006-05-15 at 15:11 -0500, Roger Heflin wrote:

> I checked the rc4 -> git patches (there is only 1 ipath patch in it),
> and I get a number of patch fails attempting to apply the patches,

I've been using a Mercurial mirror of the git tree, but it should be
basically identical to the git tree.

> I did attempt to resolve some of the funniness but there were things
> that I appear to be missing (things in the context diff that I cannot
> find exist in rc4 and I cannot find being added in any patch), so
> I don't think I can even get everything to apply even with manual
> adjusting.

Please send me some more information off-list, and I'll try to help.

<b

2006-05-15 21:10:48

by Bryan O'Sullivan

[permalink] [raw]
Subject: Re: [PATCH 53 of 53] ipath - add memory barrier when waiting for writes

On Mon, 2006-05-15 at 08:57 -0700, Roland Dreier wrote:
> > static void i2c_wait_for_writes(struct ipath_devdata *dd)
> > {
> > + mb();
> > (void)ipath_read_kreg32(dd, dd->ipath_kregs->kr_scratch);
> > }
>
> This needs a comment explaining why it's needed. A memory barrier
> before a readl() looks very strange since readl() should be ordered anyway.

Yeah. It's actually working around what appears to be a gcc bug if the
kernel is compiled with -Os. Ralph knows the details; he can give a
more complete answer.

<b

2006-05-15 21:17:59

by Bryan O'Sullivan

[permalink] [raw]
Subject: Re: [PATCH 14 of 53] ipath - forbid empty MRs

On Mon, 2006-05-15 at 08:46 -0700, Roland Dreier wrote:
> > Don't allow zero-length regions to be created.
>
> Why are zero-length regions forbidden?

One of the gen2 basic tests checks for zero-length regions and barfs if
someone creates them. There's no language in the IB spec that forbids
zero-length regions (I'll take a look at the spec itself to be sure), so
it's possible that the test is wrong. On the other hand, a zero-length
region doesn't seem terribly useful.

<b

2006-05-15 21:21:25

by Bryan O'Sullivan

[permalink] [raw]
Subject: Re: [PATCH 21 of 53] ipath - use phys_to_virt instead of bus_to_virt

On Mon, 2006-05-15 at 08:50 -0700, Roland Dreier wrote:

> Actually I NAK'ed this patch. It compiles the same thing on x86_64
> but makes the source code wrong -- dma_map_single() returns a bus
> address, not a physical address.

As Segher mentioned, bus_to_virt is unportable, so it's definitely the
wrong thing to use.

I don't recall what you suggested instead, but I seem to recall that the
discussion kind of went "oh, right, the layering is all broken".

Any ideas? Should this turn from a one-liner into a
big-refactor-for-2.6.18 patch?

<b

2006-05-15 21:28:50

by Roland Dreier

[permalink] [raw]
Subject: Re: [PATCH 21 of 53] ipath - use phys_to_virt instead of bus_to_virt

Bryan> As Segher mentioned, bus_to_virt is unportable, so it's
Bryan> definitely the wrong thing to use.

Yes, but at least it says what you're trying to do. asm-powerpc's
io.h has this for phys_to_virt:

* This function does not handle bus mappings for DMA transfers. In
* almost all conceivable cases a device driver should not be using
* this function

so replacing bus_to_virt with that is not a step forward.

Bryan> Any ideas? Should this turn from a one-liner into a
Bryan> big-refactor-for-2.6.18 patch?

I don't think there's a quick way to fix this. What you really want
to do is override the DMA mapping functions for your device so that
you can keep track of the kernel mapping. powerpc can already do this
(cf the ehca driver), and I think patches to do it on x86-64 are
floating around as part of the "Calgary IOMMU" work.

- R.

2006-05-15 23:01:23

by Ralph Campbell

[permalink] [raw]
Subject: Re: [PATCH 53 of 53] ipath - add memory barrier when waiting for writes

> On Mon, 2006-05-15 at 08:57 -0700, Roland Dreier wrote:
>> > static void i2c_wait_for_writes(struct ipath_devdata *dd)
>> > {
>> > + mb();
>> > (void)ipath_read_kreg32(dd, dd->ipath_kregs->kr_scratch);
>> > }
>>
>> This needs a comment explaining why it's needed. A memory barrier
>> before a readl() looks very strange since readl() should be ordered
>> anyway.
>
> Yeah. It's actually working around what appears to be a gcc bug if the
> kernel is compiled with -Os. Ralph knows the details; he can give a
> more complete answer.
>
> <b

I don't have a lot to add to this other than I looked at the
assembly code output for -Os and -O3 and both looked OK.
I put the mb() in to be sure the writes were complete and
I found this to work by experimentation.
Without it, the driver fails to read the EEPROM correctly.

2006-05-15 23:08:07

by Roland Dreier

[permalink] [raw]
Subject: Re: [PATCH 53 of 53] ipath - add memory barrier when waiting for writes

ralphc> I don't have a lot to add to this other than I looked at
ralphc> the assembly code output for -Os and -O3 and both looked
ralphc> OK. I put the mb() in to be sure the writes were complete
ralphc> and I found this to work by experimentation. Without it,
ralphc> the driver fails to read the EEPROM correctly.

Hmm, that doesn't give me a warm fuzzy feeling. Basically on x86-64
you're adding an unneeded mfence instruction to work around
miscompilation?

Is i2c_wait_for_writes miscompiled without the mb() with -Os? What
does the bad assembly look like?

- R.

2006-05-15 23:12:42

by Grant Grundler

[permalink] [raw]
Subject: Re: [openib-general] Re: [PATCH 21 of 53] ipath - use phys_to_virt instead of bus_to_virt

On Mon, May 15, 2006 at 02:28:45PM -0700, Roland Dreier wrote:
> Bryan> Any ideas? Should this turn from a one-liner into a
> Bryan> big-refactor-for-2.6.18 patch?
>
> I don't think there's a quick way to fix this. What you really want
> to do is override the DMA mapping functions for your device so that
> you can keep track of the kernel mapping.

Or figure out which openib.org interface has to change so the
original virt addresses that were registered/handed to the ULP
are passed down to the low level interface driver too.
Seems like a more obvious way to fix the problem.
Someone did suggest this already, right?

> (cf the ehca driver), and I think patches to do it on x86-64 are
> floating around as part of the "Calgary IOMMU" work.

parisc has been using dma_ops for several years.
I don't expect dma_ops to become part of generic code.
DMA support is inherently arch specific.
Because of that, I don't look forward to a low level
device driver that is mucking with dma_ops.

hth,
grant

2006-05-15 23:16:48

by Roland Dreier

[permalink] [raw]
Subject: Re: [openib-general] Re: [PATCH 21 of 53] ipath - use phys_to_virt instead of bus_to_virt

Grant> Or figure out which openib.org interface has to change so
Grant> the original virt addresses that were registered/handed to
Grant> the ULP are passed down to the low level interface driver
Grant> too. Seems like a more obvious way to fix the problem.
Grant> Someone did suggest this already, right?

It's been suggested many times, but no one ever comes up with a way to
handle the fact that RDMA means that addresses come from remote
systems as well as being passed in through an API.

- R.

2006-05-15 23:25:01

by Ralph Campbell

[permalink] [raw]
Subject: Re: [PATCH 53 of 53] ipath - add memory barrier when waiting for writes

> ralphc> I don't have a lot to add to this other than I looked at
> ralphc> the assembly code output for -Os and -O3 and both looked
> ralphc> OK. I put the mb() in to be sure the writes were complete
> ralphc> and I found this to work by experimentation. Without it,
> ralphc> the driver fails to read the EEPROM correctly.
>
> Hmm, that doesn't give me a warm fuzzy feeling. Basically on x86-64
> you're adding an unneeded mfence instruction to work around
> miscompilation?
>
> Is i2c_wait_for_writes miscompiled without the mb() with -Os? What
> does the bad assembly look like?
>
> - R.

We had a power failure here so I'm not able to reproduce the
assembly code at the moment. What I remember from looking
at the code is that the code for ipath_read_kreg32() was
present in i2c_wait_for_writes() when compiled -Os so
it didn't look like a compiler bug. I probably could put the
mb() at the end of i2c_gpio_set() if that makes you more
comfortable. The mb() is definitely needed though.

2006-05-15 23:28:26

by Roland Dreier

[permalink] [raw]
Subject: Re: [openib-general] Re: [PATCH 53 of 53] ipath - add memory barrier when waiting for writes

ralphc> We had a power failure here so I'm not able to reproduce
ralphc> the assembly code at the moment. What I remember from
ralphc> looking at the code is that the code for
ralphc> ipath_read_kreg32() was present in i2c_wait_for_writes()
ralphc> when compiled -Os so it didn't look like a compiler bug.
ralphc> I probably could put the mb() at the end of i2c_gpio_set()
ralphc> if that makes you more comfortable. The mb() is
ralphc> definitely needed though.

Is it the mb()? Or is just a barrier() enough? In other words do you
really need the mfence, or do you just need to stop the compiler from
reordering things?

- R.
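
For reference, the distinction being drawn: barrier() only constrains
the compiler, while mb() also emits a CPU fence. Roughly, per the
x86-64 kernel headers of that era:

        #define barrier() __asm__ __volatile__("" : : : "memory")
        /* compiler may not reorder across this; no instruction emitted */

        #define mb() asm volatile("mfence" : : : "memory")
        /* compiler barrier plus an mfence ordering CPU memory accesses */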

2006-05-15 23:29:47

by Grant Grundler

[permalink] [raw]
Subject: Re: [openib-general] Re: [PATCH 21 of 53] ipath - use phys_to_virt instead of bus_to_virt

On Mon, May 15, 2006 at 04:16:45PM -0700, Roland Dreier wrote:
> Grant> Or figure out which openib.org interface has to change so
> Grant> the original virt addresses that were registered/handed to
> Grant> the ULP are passed down to the low level interface driver
> Grant> too. Seems like a more obvious way to fix the problem.
> Grant> Someone did suggest this already, right?
>
> It's been suggested many times, but no one ever comes up with a way to
> handle the fact that RDMA means that addresses come from remote
> systems as well as being passed in through an API.

Aren't remote addresses handled differently than local ones?
ULP has to map local addresses.
We can't map remote ones (remote host maps it).
The ULP must know the difference and can tell the lower level
driver which is which.

Sorry, I hope my ignorance of RDMA isn't getting in the way again.

thanks,
grant

2006-05-15 23:34:59

by Roland Dreier

[permalink] [raw]
Subject: Re: [openib-general] Re: [PATCH 21 of 53] ipath - use phys_to_virt instead of bus_to_virt

Grant> Aren't remote addresses handled differently than local
Grant> ones? ULP has to map local addresses. We can't map remote
Grant> ones (remote host maps it). The ULP must know the
Grant> difference and can tell the lower level driver which is
Grant> which.

The problem is that RDMA requests have to be handled by the low-level
driver (or hardware) without any ULP involvement. So every device has
to handle getting messages like "send me XXX bytes of data from
address YYY in the memory region corresponding to R_Key ZZZ."

- R.
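
Concretely, the driver has only the wire contents to work with when such
a request arrives -- a sketch of the lookup (field names illustrative;
compare ipath_rkey_ok() in patch 1):

        /* (rkey, vaddr, len) came off the wire from the remote node */
        mr = rkt->table[rkey >> (32 - ib_ipath_lkey_table_size)];
        if (mr == NULL || mr->lkey != rkey ||
            vaddr < mr->iova || vaddr + len > mr->iova + mr->length)
                goto drop;              /* bad or out-of-range key */
        off = vaddr - mr->iova;         /* remote vaddr -> offset in MR */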

2006-05-15 23:38:04

by Ralph Campbell

[permalink] [raw]
Subject: Re: [openib-general] Re: [PATCH 53 of 53] ipath - add memory barrier when waiting for writes

> ralphc> We had a power failure here so I'm not able to reproduce
> ralphc> the assembly code at the moment. What I remember from
> ralphc> looking at the code is that the code for
> ralphc> ipath_read_kreg32() was present in i2c_wait_for_writes()
> ralphc> when compiled -Os so it didn't look like a compiler bug.
> ralphc> I probably could put the mb() at the end of i2c_gpio_set()
> ralphc> if that makes you more comfortable. The mb() is
> ralphc> definitely needed though.
>
> Is it the mb()? Or is just a barrier() enough? In other words do you
> really need the mfence, or do you just need to stop the compiler from
> reordering things?
>
> - R.

I didn't try calling barrier() so I don't know the answer.
When power is restored, I can try it.
My guess is that it's a timing issue and not a code reordering
issue.

2006-05-15 23:47:48

by Roland Dreier

[permalink] [raw]
Subject: Re: [openib-general] Re: [PATCH 53 of 53] ipath - add memory barrier when waiting for writes

ralphc> I didn't try calling barrier() so I don't know the answer.
ralphc> When power is restored, I can try it. My guess is that
ralphc> it's a timing issue and not a code reordering issue.

Hmm, then we really better understand what's going on, because
otherwise you're just going to have trouble again if someone makes a
CPU with a faster mfence instruction...

2006-05-16 20:05:11

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [PATCH 21 of 53] ipath - use phys_to_virt instead of bus_to_virt

On Mon, May 15, 2006 at 02:21:21PM -0700, Bryan O'Sullivan wrote:
> On Mon, 2006-05-15 at 08:50 -0700, Roland Dreier wrote:
>
> > Actually I NAK'ed this patch. It compiles the same thing on x86_64
> > but makes the source code wrong -- dma_map_single() returns a bus
> > address, not a physical address.
>
> As Segher mentioned, bus_to_virt is unportable, so it's definitely the
> wrong thing to use.

phys_to_virt is as bad. please fix your code to do the right thing, that
is to stop pretending to be able to map back from a bus to a virtual address.
The only way to get at the virtual address from a bus one is to store it
away at the time you call the dma mapping function.
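
A minimal sketch of that idea (struct and helper names made up for
illustration; needs <linux/dma-mapping.h>):

        struct mapped_buf {
                void       *cpu_addr;   /* kernel virtual address */
                dma_addr_t  bus_addr;   /* what dma_map_single() returned */
                size_t      len;
        };

        static dma_addr_t map_and_remember(struct device *dev,
                                           struct mapped_buf *b,
                                           void *cpu_addr, size_t len)
        {
                b->cpu_addr = cpu_addr;
                b->len = len;
                b->bus_addr = dma_map_single(dev, cpu_addr, len,
                                             DMA_BIDIRECTIONAL);
                return b->bus_addr;
        }

        /* later: use b->cpu_addr directly; never bus_to_virt(b->bus_addr) */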

2006-05-16 22:44:46

by Arlin Davis

[permalink] [raw]
Subject: Re: [openib-general] [PATCH 15 of 53] ipath - make some maximum values more sane

Bryan O'Sullivan wrote:

>Increase the limits on some maximum values.
>
>
>
I noticed an rdma/message max size limitation of 4096 the last time I ran
some dapl tests. Are there plans to increase it, or did I miss it somewhere
in all the patches?

Here are the max values returned from the ipath ibv_query_device:

query_hca: (ver=20401) ep 65535 ep_q 65535 evd 65535 evd_q 65535
query_hca: msg 4096 rdma 4096 iov 255 lmr 65535 rmr 0
query_hca: dto 65535 iov 255 rdma i1,o1

Thanks,

-arlin


2006-05-17 16:48:06

by Dave Olson

[permalink] [raw]
Subject: Re: [PATCH 35 of 53] ipath - some interrelated stability and cleanliness fixes

On Mon, 15 May 2006, Roland Dreier wrote:

| This looks like a pastiche of several patches. Why can't it be split
| up into logical pieces?
|
| > Call dma_free_coherent without ipath_mutex held.
|
| Why? Doesn't freeing work with the mutex held?

Sure, that's the way the previous code worked.

We are seeing a bug (with both our driver's native MPI processes and mthca
mvapich) where, when 8 processes simultaneously exit, we get watchdogs and/or hangs
in the close routines. Moving the freeing outside the mutex was an attempt
to see if we were running into some VM issues by doing lots of page unlocking
and freeing with the mutex held. It seemed to help somewhat, but not to solve
the problem.

It also allows other processes to open and close in a somewhat more timely
fashion.

Dave Olson
[email protected]
http://www.unixfolk.com/dave

2006-05-17 18:08:12

by Roland Dreier

[permalink] [raw]
Subject: Re: [openib-general] Re: [PATCH 35 of 53] ipath - some interrelated stability and cleanliness fixes

Dave> We are seeing a bug (with both our driver's native MPI
Dave> processes and mthca mvapich) where, when 8 processes
Dave> simultaneously exit, we get watchdogs and/or hangs in the
Dave> close routines. Moving the freeing outside the mutex was an
Dave> attempt to see if we were running into some VM issues by
Dave> doing lots of page unlocking and freeing with the mutex
Dave> held. It seemed to help somewhat, but not to solve the
Dave> problem.

Am I understanding correctly that you see a hang or watchdog timeout
even with the mthca driver?

Is there any possibility of posting the test case to reproduce this?
It doesn't seem likely that ipath changes are going to fix a generic
bug like this...

- R.

2006-05-18 04:14:03

by Dave Olson

[permalink] [raw]
Subject: Re: [openib-general] Re: [PATCH 35 of 53] ipath - some interrelated stability and cleanliness fixes

On Wed, 17 May 2006, Roland Dreier wrote:

| Dave> We are seeing a bug (with both our driver's native MPI
| Dave> processes and mthca mvapich) where, when 8 processes
| Dave> simultaneously exit, we get watchdogs and/or hangs in the
| Dave> close routines. Moving the freeing outside the mutex was an
| Dave> attempt to see if we were running into some VM issues by
| Dave> doing lots of page unlocking and freeing with the mutex
| Dave> held. It seemed to help somewhat, but not to solve the
| Dave> problem.
|
| Am I understanding correctly that you see a hang or watchdog timeout
| even with the mthca driver?

Yes. That is, the symptoms are the same, although the cause
may be different.

| Is there any possibility of posting the test case to reproduce this?

It's the MPI job mpi_multibw (based on the OSU osu_bw, but changed
to do messaging rate), running 8 copies per dual-core 4-socket opteron,
both on InfiniPath MPI, and MVAPICH (built for gen2).

We ship the source with our upcoming release, and will probably make
it available outside our release.

We did discover one possible problem today, which is shared between
our device code and the core openib code, and that's doing some
memory freeing and accounting from a work thread (updating mm->locked_vm
and cleaning up from earlier get_user_pages); the code in our driver
was copied from the openib core code, it's not literally shared.

I have a strong suspicion that at least sometimes, it's executing after
the current->mm has gone away. I'm looking at that more right now.

| It doesn't seem likely that ipath changes are going to fix a generic
| bug like this...

It wasn't an attempt to fix it, so much as to work around it, while
I worked on other higher priority stuff. As I mentioned, it also helps
a bit in allowing multiple processes to be in the open and close code
simultaneously, when you have multiple cpus, so even on that basis,
I'd probably leave it as it now is.

Dave Olson
[email protected]
http://www.unixfolk.com/dave

2006-05-18 04:55:27

by Roland Dreier

[permalink] [raw]
Subject: Re: [openib-general] Re: [PATCH 35 of 53] ipath - some interrelated stability and cleanliness fixes

Dave> We did discover one possible problem today, which is shared
Dave> between our device code and the core openib code, and that's
Dave> doing some memory freeing and accounting from a work thread
Dave> (updating mm->locked_vm and cleaning up from earlier
Dave> get_user_pages); the code in our driver was copied from the
Dave> openib core code, it's not literally shared.

Dave> I have a strong suspicion that at least sometimes, it's
Dave> executing after the current->mm has gone away. I'm looking
Dave> at that more right now.

It doesn't seem likely to me. In uverbs_mem.c,
ib_umem_release_on_close() does get_task_mm() and gives up if it can't
take a reference to the task's mm. The mmput() doesn't happen until
ib_umem_account() runs in the work thread.

I do see obvious bugs in ipath_user_pages.c, though. In
ipath_release_user_pages_on_close(), you have:

        mm = get_task_mm(current);
        if (!mm)
                goto bail;

        work = kmalloc(sizeof(*work), GFP_KERNEL);
        if (!work)
                goto bail_mm;

        goto bail;

        INIT_WORK(&work->work, user_pages_account, work);
        work->mm = mm;
        work->num_pages = num_pages;

bail_mm:
        mmput(mm);
bail:
        return;

So with the "goto bail" you skip the code which does something with
the work you allocate, which means that you leak not only the work
structure but also the reference to the task's mm that you took.

Even without the "goto bail" the code still wouldn't actually schedule
the work, so the work structure would be leaked, although you would do
mmput().

I'm not sure what you were trying to do here.

- R.
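
For comparison, the flow the function was presumably aiming for (a
sketch, assuming user_pages_account() adjusts mm->locked_vm and then
drops the mm reference, as ib_umem_account() does; 2.6.16-era
three-argument INIT_WORK):

        mm = get_task_mm(current);
        if (!mm)
                goto bail;

        work = kmalloc(sizeof(*work), GFP_KERNEL);
        if (!work)
                goto bail_mm;

        INIT_WORK(&work->work, user_pages_account, work);
        work->mm = mm;
        work->num_pages = num_pages;
        schedule_work(&work->work);     /* handler does the mmput() */
        return;

bail_mm:
        mmput(mm);
bail:
        return;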

2006-05-18 05:14:55

by Bryan O'Sullivan

[permalink] [raw]
Subject: Re: [openib-general] Re: [PATCH 35 of 53] ipath - some interrelated stability and cleanliness fixes

On Wed, 2006-05-17 at 21:55 -0700, Roland Dreier wrote:

> So with the "goto bail" you skip the code which does something with
> the work you allocate, which means that you leak not only the work
> structure but also the reference to the task's mm that you took.

Wow. I have no idea where that extra "goto bail" came from. It's not
supposed to be there.

<b

2006-05-18 05:17:52

by Roland Dreier

[permalink] [raw]
Subject: Re: [openib-general] Re: [PATCH 35 of 53] ipath - some interrelated stability and cleanliness fixes

Bryan> Wow. I have no idea where that extra "goto bail" came
Bryan> from. It's not supposed to be there.

Even without it you still leak the work structure, because there's no
schedule_work().

Now that I look at it, in uverbs_mem.c, the mm will be leaked if the
kmalloc fails...

- R.

2006-05-18 05:26:34

by Dave Olson

[permalink] [raw]
Subject: Re: [openib-general] Re: [PATCH 35 of 53] ipath - some interrelated stability and cleanliness fixes

On Wed, 17 May 2006, Dave Olson wrote:

| On Wed, 17 May 2006, Roland Dreier wrote:
|
| | Am I understanding correctly that you see a hang or watchdog timeout
| | even with the mthca driver?
|
| Yes. That is, the symptoms are the same, although the cause
| may be different.
|
| | Is there any possibility of posting the test case to reproduce this?
|
| It's the MPI job mpi_multibw (based on the OSU osu_bw, but changed
| to do messaging rate), running 8 copies per dual-core 4-socket opteron,
| both on InfiniPath MPI, and MVAPICH (built for gen2).

Here's the typical case where the watchdog fires (with infinipath MPI),
on FC4 2.6.16 2108 (without kprobes, with kprobes things are slightly
different, but not much; I'm running without since we were often in
the kprobes code from the exit code, but I think that's just a red herring).

The sysrq p was some seconds prior to the watchdog. It's almost as
though something is looping far too many times during the close cleanup.

The other 7 exiting processes are typically in
sys_exit_group -> do_exit -> __up_read -> __spin_lock_irqsave -> __up_read (or __down_read)
(from what sysrq t prints). They are all runnable on the other 7
processors.

The infinipath driver does mmap both memory and device pages for each of
these processes.

SysRq : Show Regs
CPU 0:
Modules linked in: ib_sdp(U) ib_cm(U) ib_umad(U) ib_uverbs(U) ib_ipath(U) ib_ipoib(U) ib_sa(U) ib_mad(U) ib_core(U) ipath_core(U) nfs(U) nfsd(U) exportfs(U) lockd(U) nfs_acl(U) ipv6(U) autofs4(U) sunrpc(U) video(U) button(U) battery(U) ac(U) i2c_nforce2(U) i2c_core(U) e1000(U) floppy(U) sg(U) dm_snapshot(U) dm_zero(U) dm_mirror(U) ext3(U) jbd(U) dm_mod(U) sata_nv(U) libata(U) aic79xx(U) scsi_transport_spi(U) sd_mod(U) scsi_mod(U)
Pid: 23788, comm: mpi_multibw Not tainted 2.6.16-1.2108_FC4.rootsmp #1
RIP: 0010:[<ffffffff8013c50e>] <ffffffff8013c50e>{__do_softirq+81}
RSP: 0018:ffffffff8048d368 EFLAGS: 00000206
RAX: 0000000000000022 RBX: 0000000000000022 RCX: 0000000000000080
RDX: 0000000000000000 RSI: 00000000000000c0 RDI: ffff81007f1fd0c0
RBP: ffffffff80528f80 R08: 0000000000000200 R09: 0000000000000002
R10: ffffffff804a6a38 R11: 0000000000000000 R12: ffffffff80577c80
R13: 0000000000000000 R14: 000000000000000a R15: 00002aaabba6c000
FS: 00002aaaab32ffa0(0000) GS:ffffffff80511000(0000) knlGS:00000000f7fc86c0
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 000055555565ebe8 CR3: 000000007ac6d000 CR4: 00000000000006e0

Call Trace: <IRQ> <ffffffff8010c076>{call_softirq+30}
<ffffffff8010d82c>{do_softirq+44} <ffffffff8010b9d0>{apic_timer_interrupt+132} <EOI>
<ffffffff80355226>{_write_unlock_irq+14} <ffffffff801659d9>{__set_page_dirty_nobuffers+183}
<ffffffff8016cc80>{unmap_vmas+1042} <ffffffff8016fa78>{exit_mmap+124}
<ffffffff80133f17>{mmput+37} <ffffffff80139783>{do_exit+584}
<ffffffff80142aec>{__dequeue_signal+459} <ffffffff80139f00>{sys_exit_group+0}
<ffffffff80143f03>{get_signal_to_deliver+1568} <ffffffff8010a37a>{do_signal+116}
<ffffffff80197151>{__pollwait+0} <ffffffff80197e9c>{sys_select+934}
<ffffffff8010acb7>{sysret_signal+28} <ffffffff8010afa3>{ptregscall_common+103}

[ perhaps 20 or 30 seconds later, NMI fires; we had already been sort of
stuck for 60 seconds or so when I did the sysrq p above ]

NMI Watchdog detected LOCKUP on CPU 1
CPU 1
Modules linked in: ib_sdp(U) ib_cm(U) ib_umad(U) ib_uverbs(U) ib_ipath(U) ib_ipoib(U) ib_sa(U) ib_mad(U) ib_core(U) ipath_core(U) nfs(U) nfsd(U) exportfs(U) lockd(U) nfs_acl(U) ipv6(U) autofs4(U) sunrpc(U) video(U) button(U) battery(U) ac(U) i2c_nforce2(U) i2c_core(U) e1000(U) floppy(U) sg(U) dm_snapshot(U) dm_zero(U) dm_mirror(U) ext3(U) jbd(U) dm_mod(U) sata_nv(U) libata(U) aic79xx(U) scsi_transport_spi(U) sd_mod(U) scsi_mod(U)
Pid: 23789, comm: mpi_multibw Not tainted 2.6.16-1.2108_FC4.rootsmp #1
RIP: 0010:[<ffffffff80214bd0>] <ffffffff80214bd0>{_raw_write_lock+161}
RSP: 0018:ffff81007c5b5c18 EFLAGS: 00000086
RAX: 000000008f02e600 RBX: ffff810037cec680 RCX: 00000000002c2671
RDX: 0000000000927190 RSI: 0000000000000001 RDI: ffff810037cec680
RBP: ffff810037cec668 R08: ffff810002d6b500 R09: 00000000fffffffa
R10: 0000000000000003 R11: ffffffff80165922 R12: ffff810037cec680
R13: 00002aaaac200000 R14: ffff810002d6b540 R15: 00002aaabba6c000
FS: 00002aaaaaae6080(0000) GS:ffff81011fc466c0(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00000033f38bdaf0 CR3: 000000007c296000 CR4: 00000000000006e0
Process mpi_multibw (pid: 23789, threadinfo ffff81007c5b4000, task ffff8100030557a0)
Stack: ffff810002d6b540 ffffffff8016596b 0000000075ad5067 00002aaaac1b4000
ffff81007d451da0 ffffffff8016cc80 0000000000000000 ffff81007c5b5d38
ffffffffffffffff 0000000000000000
Call Trace: <ffffffff8016596b>{__set_page_dirty_nobuffers+73}
<ffffffff8016cc80>{unmap_vmas+1042} <ffffffff8016fa78>{exit_mmap+124}
<ffffffff80133f17>{mmput+37} <ffffffff80139783>{do_exit+584}
<ffffffff80142aec>{__dequeue_signal+459} <ffffffff80139f00>{sys_exit_group+0}
<ffffffff80143f03>{get_signal_to_deliver+1568} <ffffffff8010a37a>{do_signal+116}
<ffffffff80197151>{__pollwait+0} <ffffffff80197e9c>{sys_select+934}
<ffffffff8010acb7>{sysret_signal+28} <ffffffff8010afa3>{ptregscall_common+103}

Code: 84 c0 75 7f f0 81 03 00 00 00 01 f3 90 48 83 c1 01 48 8b 15
Kernel panic - not syncing: nmi watchdog

2006-05-18 07:04:32

by Dave Olson

[permalink] [raw]
Subject: Re: [openib-general] Re: [PATCH 35 of 53] ipath - some interrelated stability and cleanliness fixes

On Wed, 17 May 2006, Roland Dreier wrote:
| I do see obvious bugs in ipath_user_pages.c, though. In
| ipath_release_user_pages_on_close(), you have:
|
| mm = get_task_mm(current);
| if (!mm)
| goto bail;

It turns out that since this is called from ipath_close(),
mm will always be NULL, so what we do is leak memory, and
possibly leave some locked pages. I've been looking at this
code this evening; fixing it is clearly needed, but doesn't help
the long delays, hangs, and watchdogs, so far.

Dave Olson
[email protected]
http://www.unixfolk.com/dave