2005-01-13 01:18:27

by Roland Dreier

Subject: [PATCH][0/18] InfiniBand: updates for 2.6.11-rc1

Here are updates to drivers/infiniband since the initial merge, taken
from the OpenIB repository. This is a mix of cleanups, bug fixes and
small features. There shouldn't be anything controversial or risky.

Thanks,
Roland


2005-01-12 21:55:48

by Roland Dreier

Subject: [PATCH][14/18] InfiniBand/core: add qp_type to struct ib_qp

Add qp_type to struct ib_qp.

Signed-off-by: Sean Hefty <[email protected]>
Signed-off-by: Roland Dreier <[email protected]>

--- linux/drivers/infiniband/include/ib_verbs.h (revision 1507)
+++ linux/drivers/infiniband/include/ib_verbs.h (revision 1508)
@@ -659,6 +659,7 @@
void (*event_handler)(struct ib_event *, void *);
void *qp_context;
u32 qp_num;
+ enum ib_qp_type qp_type;
};

struct ib_mr {
--- linux/drivers/infiniband/core/verbs.c (revision 1507)
+++ linux/drivers/infiniband/core/verbs.c (revision 1508)
@@ -132,6 +132,7 @@
qp->srq = qp_init_attr->srq;
qp->event_handler = qp_init_attr->event_handler;
qp->qp_context = qp_init_attr->qp_context;
+ qp->qp_type = qp_init_attr->qp_type;
atomic_inc(&pd->usecnt);
atomic_inc(&qp_init_attr->send_cq->usecnt);
atomic_inc(&qp_init_attr->recv_cq->usecnt);

2005-01-12 21:55:49

by Roland Dreier

Subject: [PATCH][15/18] InfiniBand/core: add ib_find_cached_gid function

Add a new function to find a port on a device given a GID by searching
the cached GID tables. Document all cache functions in ib_cache.h.
Rename existing functions to better match format of verb routines.

Signed-off-by: Sean Hefty <[email protected]>
Signed-off-by: Roland Dreier <[email protected]>

--- linux/drivers/infiniband/ulp/ipoib/ipoib_verbs.c (revision 1508)
+++ linux/drivers/infiniband/ulp/ipoib/ipoib_verbs.c (revision 1509)
@@ -49,7 +49,7 @@
if (!qp_attr)
goto out;

- if (ib_cached_pkey_find(priv->ca, priv->port, priv->pkey, &pkey_index)) {
+ if (ib_find_cached_pkey(priv->ca, priv->port, priv->pkey, &pkey_index)) {
clear_bit(IPOIB_PKEY_ASSIGNED, &priv->flags);
ret = -ENXIO;
goto out;
@@ -104,7 +104,7 @@
* The port has to be assigned to the respective IB partition in
* advance.
*/
- ret = ib_cached_pkey_find(priv->ca, priv->port, priv->pkey, &pkey_index);
+ ret = ib_find_cached_pkey(priv->ca, priv->port, priv->pkey, &pkey_index);
if (ret) {
clear_bit(IPOIB_PKEY_ASSIGNED, &priv->flags);
return ret;
--- linux/drivers/infiniband/ulp/ipoib/ipoib_ib.c (revision 1508)
+++ linux/drivers/infiniband/ulp/ipoib/ipoib_ib.c (revision 1509)
@@ -630,7 +630,7 @@
struct ipoib_dev_priv *priv = netdev_priv(dev);
u16 pkey_index = 0;

- if (ib_cached_pkey_find(priv->ca, priv->port, priv->pkey, &pkey_index))
+ if (ib_find_cached_pkey(priv->ca, priv->port, priv->pkey, &pkey_index))
clear_bit(IPOIB_PKEY_ASSIGNED, &priv->flags);
else
set_bit(IPOIB_PKEY_ASSIGNED, &priv->flags);
--- linux/drivers/infiniband/include/ib_cache.h (revision 1508)
+++ linux/drivers/infiniband/include/ib_cache.h (revision 1509)
@@ -37,16 +37,66 @@

#include <ib_verbs.h>

-int ib_cached_gid_get(struct ib_device *device,
- u8 port,
+/**
+ * ib_get_cached_gid - Returns a cached GID table entry
+ * @device: The device to query.
+ * @port_num: The port number of the device to query.
+ * @index: The index into the cached GID table to query.
+ * @gid: The GID value found at the specified index.
+ *
+ * ib_get_cached_gid() fetches the specified GID table entry stored in
+ * the local software cache.
+ */
+int ib_get_cached_gid(struct ib_device *device,
+ u8 port_num,
int index,
union ib_gid *gid);
-int ib_cached_pkey_get(struct ib_device *device_handle,
- u8 port,
+
+/**
+ * ib_find_cached_gid - Returns the port number and GID table index where
+ * a specified GID value occurs.
+ * @device: The device to query.
+ * @gid: The GID value to search for.
+ * @port_num: The port number of the device where the GID value was found.
+ * @index: The index into the cached GID table where the GID was found. This
+ * parameter may be NULL.
+ *
+ * ib_find_cached_gid() searches for the specified GID value in
+ * the local software cache.
+ */
+int ib_find_cached_gid(struct ib_device *device,
+ union ib_gid *gid,
+ u8 *port_num,
+ u16 *index);
+
+/**
+ * ib_get_cached_pkey - Returns a cached PKey table entry
+ * @device: The device to query.
+ * @port_num: The port number of the device to query.
+ * @index: The index into the cached PKey table to query.
+ * @pkey: The PKey value found at the specified index.
+ *
+ * ib_get_cached_pkey() fetches the specified PKey table entry stored in
+ * the local software cache.
+ */
+int ib_get_cached_pkey(struct ib_device *device_handle,
+ u8 port_num,
int index,
u16 *pkey);
-int ib_cached_pkey_find(struct ib_device *device,
- u8 port,
+
+/**
+ * ib_find_cached_pkey - Returns the PKey table index where a specified
+ * PKey value occurs.
+ * @device: The device to query.
+ * @port_num: The port number of the device to search for the PKey.
+ * @pkey: The PKey value to search for.
+ * @index: The index into the cached PKey table where the PKey was found.
+ *
+ * ib_find_cached_pkey() searches the specified PKey table in
+ * the local software cache.
+ */
+int ib_find_cached_pkey(struct ib_device *device,
+ u8 port_num,
u16 pkey,
u16 *index);

--- linux/drivers/infiniband/core/cache.c (revision 1508)
+++ linux/drivers/infiniband/core/cache.c (revision 1509)
@@ -65,8 +65,8 @@
return device->node_type == IB_NODE_SWITCH ? 0 : device->phys_port_cnt;
}

-int ib_cached_gid_get(struct ib_device *device,
- u8 port,
+int ib_get_cached_gid(struct ib_device *device,
+ u8 port_num,
int index,
union ib_gid *gid)
{
@@ -74,12 +74,12 @@
unsigned long flags;
int ret = 0;

- if (port < start_port(device) || port > end_port(device))
+ if (port_num < start_port(device) || port_num > end_port(device))
return -EINVAL;

read_lock_irqsave(&device->cache.lock, flags);

- cache = device->cache.gid_cache[port - start_port(device)];
+ cache = device->cache.gid_cache[port_num - start_port(device)];

if (index < 0 || index >= cache->table_len)
ret = -EINVAL;
@@ -90,10 +90,45 @@

return ret;
}
-EXPORT_SYMBOL(ib_cached_gid_get);
+EXPORT_SYMBOL(ib_get_cached_gid);

-int ib_cached_pkey_get(struct ib_device *device,
- u8 port,
+int ib_find_cached_gid(struct ib_device *device,
+ union ib_gid *gid,
+ u8 *port_num,
+ u16 *index)
+{
+ struct ib_gid_cache *cache;
+ unsigned long flags;
+ int p, i;
+ int ret = -ENOENT;
+
+ *port_num = -1;
+ if (index)
+ *index = -1;
+
+ read_lock_irqsave(&device->cache.lock, flags);
+
+ for (p = 0; p <= end_port(device) - start_port(device); ++p) {
+ cache = device->cache.gid_cache[p];
+ for (i = 0; i < cache->table_len; ++i) {
+ if (!memcmp(gid, &cache->table[i], sizeof *gid)) {
+ *port_num = p;
+ if (index)
+ *index = i;
+ ret = 0;
+ goto found;
+ }
+ }
+ }
+found:
+ read_unlock_irqrestore(&device->cache.lock, flags);
+
+ return ret;
+}
+EXPORT_SYMBOL(ib_find_cached_gid);
+
+int ib_get_cached_pkey(struct ib_device *device,
+ u8 port_num,
int index,
u16 *pkey)
{
@@ -101,12 +136,12 @@
unsigned long flags;
int ret = 0;

- if (port < start_port(device) || port > end_port(device))
+ if (port_num < start_port(device) || port_num > end_port(device))
return -EINVAL;

read_lock_irqsave(&device->cache.lock, flags);

- cache = device->cache.pkey_cache[port - start_port(device)];
+ cache = device->cache.pkey_cache[port_num - start_port(device)];

if (index < 0 || index >= cache->table_len)
ret = -EINVAL;
@@ -117,10 +152,10 @@

return ret;
}
-EXPORT_SYMBOL(ib_cached_pkey_get);
+EXPORT_SYMBOL(ib_get_cached_pkey);

-int ib_cached_pkey_find(struct ib_device *device,
- u8 port,
+int ib_find_cached_pkey(struct ib_device *device,
+ u8 port_num,
u16 pkey,
u16 *index)
{
@@ -129,12 +164,12 @@
int i;
int ret = -ENOENT;

- if (port < start_port(device) || port > end_port(device))
+ if (port_num < start_port(device) || port_num > end_port(device))
return -EINVAL;

read_lock_irqsave(&device->cache.lock, flags);

- cache = device->cache.pkey_cache[port - start_port(device)];
+ cache = device->cache.pkey_cache[port_num - start_port(device)];

*index = -1;

@@ -149,7 +184,7 @@

return ret;
}
-EXPORT_SYMBOL(ib_cached_pkey_find);
+EXPORT_SYMBOL(ib_find_cached_pkey);

static void ib_cache_update(struct ib_device *device,
u8 port)
--- linux/drivers/infiniband/hw/mthca/mthca_av.c (revision 1508)
+++ linux/drivers/infiniband/hw/mthca/mthca_av.c (revision 1509)
@@ -159,7 +159,7 @@
(be32_to_cpu(ah->av->sl_tclass_flowlabel) >> 20) & 0xff;
header->grh.flow_label =
ah->av->sl_tclass_flowlabel & cpu_to_be32(0xfffff);
- ib_cached_gid_get(&dev->ib_dev,
+ ib_get_cached_gid(&dev->ib_dev,
be32_to_cpu(ah->av->port_pd) >> 24,
ah->av->gid_index,
&header->grh.source_gid);
--- linux/drivers/infiniband/hw/mthca/mthca_qp.c (revision 1508)
+++ linux/drivers/infiniband/hw/mthca/mthca_qp.c (revision 1509)
@@ -1190,11 +1190,11 @@
sqp->ud_header.lrh.source_lid = 0xffff;
sqp->ud_header.bth.solicited_event = !!(wr->send_flags & IB_SEND_SOLICITED);
if (!sqp->qp.ibqp.qp_num)
- ib_cached_pkey_get(&dev->ib_dev, sqp->port,
+ ib_get_cached_pkey(&dev->ib_dev, sqp->port,
sqp->pkey_index,
&sqp->ud_header.bth.pkey);
else
- ib_cached_pkey_get(&dev->ib_dev, sqp->port,
+ ib_get_cached_pkey(&dev->ib_dev, sqp->port,
wr->wr.ud.pkey_index,
&sqp->ud_header.bth.pkey);
cpu_to_be16s(&sqp->ud_header.bth.pkey);

2005-01-12 21:59:56

by Roland Dreier

Subject: [PATCH][11/18] InfiniBand/mthca: clean up computation of HCA memory map

Clean up the computation of the HCA context memory map. This serves two purposes:
- make it easier to change the HCA "profile" (e.g. add more QPs)
- make it easier to implement mem-free Arbel support

Signed-off-by: Roland Dreier <[email protected]>

--- linux/drivers/infiniband/hw/mthca/mthca_dev.h (revision 1493)
+++ linux/drivers/infiniband/hw/mthca/mthca_dev.h (revision 1494)
@@ -70,13 +70,16 @@
};

enum {
- MTHCA_MPT_ENTRY_SIZE = 0x40,
MTHCA_EQ_CONTEXT_SIZE = 0x40,
MTHCA_CQ_CONTEXT_SIZE = 0x40,
MTHCA_QP_CONTEXT_SIZE = 0x200,
MTHCA_RDB_ENTRY_SIZE = 0x20,
MTHCA_AV_SIZE = 0x20,
- MTHCA_MGM_ENTRY_SIZE = 0x40
+ MTHCA_MGM_ENTRY_SIZE = 0x40,
+
+ /* Arbel FW gives us these, but we need them for Tavor */
+ MTHCA_MPT_ENTRY_SIZE = 0x40,
+ MTHCA_MTT_SEG_SIZE = 0x40,
};

enum {
--- linux/drivers/infiniband/hw/mthca/mthca_main.c (revision 1493)
+++ linux/drivers/infiniband/hw/mthca/mthca_main.c (revision 1494)
@@ -76,6 +76,20 @@
"ib_mthca: Mellanox InfiniBand HCA driver v"
DRV_VERSION " (" DRV_RELDATE ")\n";

+static struct mthca_profile default_profile = {
+ .num_qp = 1 << 16,
+ .rdb_per_qp = 4,
+ .num_cq = 1 << 16,
+ .num_mcg = 1 << 13,
+ .num_mpt = 1 << 17,
+ .num_mtt = 1 << 20
+};
+
+enum {
+ MTHCA_TAVOR_NUM_UDAV = 1 << 15,
+ MTHCA_ARBEL_UARC_SIZE = 1 << 18
+};
+
static int __devinit mthca_tune_pci(struct mthca_dev *mdev)
{
int cap;
@@ -175,6 +189,7 @@
u8 status;
int err;
struct mthca_dev_lim dev_lim;
+ struct mthca_profile profile;
struct mthca_init_hca_param init_hca;
struct mthca_adapter adapter;

@@ -214,7 +229,11 @@

err = mthca_dev_lim(mdev, &dev_lim);

- err = mthca_make_profile(mdev, &dev_lim, &init_hca);
+ profile = default_profile;
+ profile.num_uar = dev_lim.uar_size / PAGE_SIZE;
+ profile.num_udav = MTHCA_TAVOR_NUM_UDAV;
+
+ err = mthca_make_profile(mdev, &profile, &dev_lim, &init_hca);
if (err)
goto err_out_disable;

--- linux/drivers/infiniband/hw/mthca/mthca_profile.c (revision 1493)
+++ linux/drivers/infiniband/hw/mthca/mthca_profile.c (revision 1494)
@@ -37,31 +37,34 @@

#include "mthca_profile.h"

-static int default_profile[MTHCA_RES_NUM] = {
- [MTHCA_RES_QP] = 1 << 16,
- [MTHCA_RES_EQP] = 1 << 16,
- [MTHCA_RES_CQ] = 1 << 16,
- [MTHCA_RES_EQ] = 32,
- [MTHCA_RES_RDB] = 1 << 18,
- [MTHCA_RES_MCG] = 1 << 13,
- [MTHCA_RES_MPT] = 1 << 17,
- [MTHCA_RES_MTT] = 1 << 20,
- [MTHCA_RES_UDAV] = 1 << 15
-};
-
enum {
- MTHCA_MTT_SEG_SIZE = 64
+ MTHCA_RES_QP,
+ MTHCA_RES_EEC,
+ MTHCA_RES_SRQ,
+ MTHCA_RES_CQ,
+ MTHCA_RES_EQP,
+ MTHCA_RES_EEEC,
+ MTHCA_RES_EQ,
+ MTHCA_RES_RDB,
+ MTHCA_RES_MCG,
+ MTHCA_RES_MPT,
+ MTHCA_RES_MTT,
+ MTHCA_RES_UAR,
+ MTHCA_RES_UDAV,
+ MTHCA_RES_UARC,
+ MTHCA_RES_NUM
};

enum {
+ MTHCA_NUM_EQS = 32,
MTHCA_NUM_PDS = 1 << 15
};

int mthca_make_profile(struct mthca_dev *dev,
+ struct mthca_profile *request,
struct mthca_dev_lim *dev_lim,
struct mthca_init_hca_param *init_hca)
{
- /* just use default profile for now */
struct mthca_resource {
u64 size;
u64 start;
@@ -70,17 +73,18 @@
int log_num;
};

+ u64 mem_base, mem_avail;
u64 total_size = 0;
struct mthca_resource *profile;
struct mthca_resource tmp;
int i, j;

- default_profile[MTHCA_RES_UAR] = dev_lim->uar_size / PAGE_SIZE;
-
profile = kmalloc(MTHCA_RES_NUM * sizeof *profile, GFP_KERNEL);
if (!profile)
return -ENOMEM;

+ memset(profile, 0, MTHCA_RES_NUM * sizeof *profile);
+
profile[MTHCA_RES_QP].size = dev_lim->qpc_entry_sz;
profile[MTHCA_RES_EEC].size = dev_lim->eec_entry_sz;
profile[MTHCA_RES_SRQ].size = dev_lim->srq_entry_sz;
@@ -90,18 +94,38 @@
profile[MTHCA_RES_EQ].size = dev_lim->eqc_entry_sz;
profile[MTHCA_RES_RDB].size = MTHCA_RDB_ENTRY_SIZE;
profile[MTHCA_RES_MCG].size = MTHCA_MGM_ENTRY_SIZE;
- profile[MTHCA_RES_MPT].size = MTHCA_MPT_ENTRY_SIZE;
- profile[MTHCA_RES_MTT].size = MTHCA_MTT_SEG_SIZE;
+ profile[MTHCA_RES_MPT].size = dev_lim->mpt_entry_sz;
+ profile[MTHCA_RES_MTT].size = dev_lim->mtt_seg_sz;
profile[MTHCA_RES_UAR].size = dev_lim->uar_scratch_entry_sz;
profile[MTHCA_RES_UDAV].size = MTHCA_AV_SIZE;
+ profile[MTHCA_RES_UARC].size = request->uarc_size;

+ profile[MTHCA_RES_QP].num = request->num_qp;
+ profile[MTHCA_RES_EQP].num = request->num_qp;
+ profile[MTHCA_RES_RDB].num = request->num_qp * request->rdb_per_qp;
+ profile[MTHCA_RES_CQ].num = request->num_cq;
+ profile[MTHCA_RES_EQ].num = MTHCA_NUM_EQS;
+ profile[MTHCA_RES_MCG].num = request->num_mcg;
+ profile[MTHCA_RES_MPT].num = request->num_mpt;
+ profile[MTHCA_RES_MTT].num = request->num_mtt;
+ profile[MTHCA_RES_UAR].num = request->num_uar;
+ profile[MTHCA_RES_UARC].num = request->num_uar;
+ profile[MTHCA_RES_UDAV].num = request->num_udav;
+
for (i = 0; i < MTHCA_RES_NUM; ++i) {
profile[i].type = i;
- profile[i].num = default_profile[i];
- profile[i].log_num = max(ffs(default_profile[i]) - 1, 0);
- profile[i].size *= default_profile[i];
+ profile[i].log_num = max(ffs(profile[i].num) - 1, 0);
+ profile[i].size *= profile[i].num;
}

+ if (dev->hca_type == ARBEL_NATIVE) {
+ mem_base = 0;
+ mem_avail = dev_lim->hca.arbel.max_icm_sz;
+ } else {
+ mem_base = dev->ddr_start;
+ mem_avail = dev->fw.tavor.fw_start - dev->ddr_start;
+ }
+
/*
* Sort the resources in decreasing order of size. Since they
* all have sizes that are powers of 2, we'll be able to keep
@@ -119,16 +143,14 @@

for (i = 0; i < MTHCA_RES_NUM; ++i) {
if (profile[i].size) {
- profile[i].start = dev->ddr_start + total_size;
+ profile[i].start = mem_base + total_size;
total_size += profile[i].size;
}
- if (total_size > dev->fw.tavor.fw_start - dev->ddr_start) {
+ if (total_size > mem_avail) {
mthca_err(dev, "Profile requires 0x%llx bytes; "
- "won't fit between DDR start at 0x%016llx "
- "and FW start at 0x%016llx.\n",
+ "won't in 0x%llx bytes of context memory.\n",
(unsigned long long) total_size,
- (unsigned long long) dev->ddr_start,
- (unsigned long long) dev->fw.tavor.fw_start);
+ (unsigned long long) mem_avail);
kfree(profile);
return -ENOMEM;
}
@@ -141,10 +163,13 @@
(unsigned long long) profile[i].size);
}

- mthca_dbg(dev, "HCA memory: allocated %d KB/%d KB (%d KB free)\n",
- (int) (total_size >> 10),
- (int) ((dev->fw.tavor.fw_start - dev->ddr_start) >> 10),
- (int) ((dev->fw.tavor.fw_start - dev->ddr_start - total_size) >> 10));
+ if (dev->hca_type == ARBEL_NATIVE)
+ mthca_dbg(dev, "HCA context memory: reserving %d KB\n",
+ (int) (total_size >> 10));
+ else
+ mthca_dbg(dev, "HCA memory: allocated %d KB/%d KB (%d KB free)\n",
+ (int) (total_size >> 10), (int) (mem_avail >> 10),
+ (int) ((mem_avail - total_size) >> 10));

for (i = 0; i < MTHCA_RES_NUM; ++i) {
switch (profile[i].type) {
@@ -203,10 +228,10 @@
break;
case MTHCA_RES_MTT:
dev->limits.num_mtt_segs = profile[i].num;
- dev->limits.mtt_seg_size = MTHCA_MTT_SEG_SIZE;
+ dev->limits.mtt_seg_size = dev_lim->mtt_seg_sz;
dev->mr_table.mtt_base = profile[i].start;
init_hca->mtt_base = profile[i].start;
- init_hca->mtt_seg_sz = ffs(MTHCA_MTT_SEG_SIZE) - 7;
+ init_hca->mtt_seg_sz = ffs(dev_lim->mtt_seg_sz) - 7;
break;
case MTHCA_RES_UAR:
init_hca->uar_scratch_base = profile[i].start;
--- linux/drivers/infiniband/hw/mthca/mthca_cmd.c (revision 1493)
+++ linux/drivers/infiniband/hw/mthca/mthca_cmd.c (revision 1494)
@@ -1,5 +1,5 @@
/*
- * Copyright (c) 2004 Topspin Communications. All rights reserved.
+ * Copyright (c) 2004, 2005 Topspin Communications. All rights reserved.
*
* This software is available to you under a choice of one of two
* licenses. You may choose to be licensed under the terms of the GNU
@@ -959,9 +959,9 @@
MTHCA_GET(field, outbox, QUERY_DEV_LIM_RSZ_SRQ_OFFSET);
dev_lim->hca.arbel.resize_srq = field & 1;
MTHCA_GET(size, outbox, QUERY_DEV_LIM_MTT_ENTRY_SZ_OFFSET);
- dev_lim->hca.arbel.mtt_entry_sz = size;
+ dev_lim->mtt_seg_sz = size;
MTHCA_GET(size, outbox, QUERY_DEV_LIM_MPT_ENTRY_SZ_OFFSET);
- dev_lim->hca.arbel.mpt_entry_sz = size;
+ dev_lim->mpt_entry_sz = size;
MTHCA_GET(field, outbox, QUERY_DEV_LIM_PBL_SZ_OFFSET);
dev_lim->hca.arbel.max_pbl_sz = 1 << (field & 0x3f);
MTHCA_GET(dev_lim->hca.arbel.bmme_flags, outbox,
@@ -987,6 +987,8 @@
} else {
MTHCA_GET(field, outbox, QUERY_DEV_LIM_MAX_AV_OFFSET);
dev_lim->hca.tavor.max_avs = 1 << (field & 0x3f);
+ dev_lim->mtt_seg_sz = MTHCA_MTT_SEG_SIZE;
+ dev_lim->mpt_entry_sz = MTHCA_MPT_ENTRY_SIZE;
}

out:
--- linux/drivers/infiniband/hw/mthca/mthca_profile.h (revision 1493)
+++ linux/drivers/infiniband/hw/mthca/mthca_profile.h (revision 1494)
@@ -38,24 +38,20 @@
#include "mthca_dev.h"
#include "mthca_cmd.h"

-enum {
- MTHCA_RES_QP,
- MTHCA_RES_EEC,
- MTHCA_RES_SRQ,
- MTHCA_RES_CQ,
- MTHCA_RES_EQP,
- MTHCA_RES_EEEC,
- MTHCA_RES_EQ,
- MTHCA_RES_RDB,
- MTHCA_RES_MCG,
- MTHCA_RES_MPT,
- MTHCA_RES_MTT,
- MTHCA_RES_UAR,
- MTHCA_RES_UDAV,
- MTHCA_RES_NUM
+struct mthca_profile {
+ int num_qp;
+ int rdb_per_qp;
+ int num_cq;
+ int num_mcg;
+ int num_mpt;
+ int num_mtt;
+ int num_udav;
+ int num_uar;
+ int uarc_size;
};

int mthca_make_profile(struct mthca_dev *mdev,
+ struct mthca_profile *request,
struct mthca_dev_lim *dev_lim,
struct mthca_init_hca_param *init_hca);

--- linux/drivers/infiniband/hw/mthca/mthca_cmd.h (revision 1493)
+++ linux/drivers/infiniband/hw/mthca/mthca_cmd.h (revision 1494)
@@ -1,5 +1,5 @@
/*
- * Copyright (c) 2004 Topspin Communications. All rights reserved.
+ * Copyright (c) 2004, 2005 Topspin Communications. All rights reserved.
*
* This software is available to you under a choice of one of two
* licenses. You may choose to be licensed under the terms of the GNU
@@ -148,14 +148,14 @@
int cqc_entry_sz;
int srq_entry_sz;
int uar_scratch_entry_sz;
+ int mtt_seg_sz;
+ int mpt_entry_sz;
union {
struct {
int max_avs;
} tavor;
struct {
int resize_srq;
- int mtt_entry_sz;
- int mpt_entry_sz;
int max_pbl_sz;
u8 bmme_flags;
u32 reserved_lkey;

2005-01-12 21:59:57

by Roland Dreier

Subject: [PATCH][12/18] InfiniBand/core: fix handling of 0-hop directed route MADs

Handle outgoing DR 0 hop SMPs properly when provider returns just
SUCCESS to process_mad.

Signed-off-by: Hal Rosenstock <[email protected]>
Signed-off-by: Roland Dreier <[email protected]>

--- linux/drivers/infiniband/core/mad.c (revision 1501)
+++ linux/drivers/infiniband/core/mad.c (revision 1502)
@@ -60,6 +60,9 @@
static int method_in_use(struct ib_mad_mgmt_method_table **method,
struct ib_mad_reg_req *mad_reg_req);
static void remove_mad_reg_req(struct ib_mad_agent_private *priv);
+static struct ib_mad_agent_private *find_mad_agent(
+ struct ib_mad_port_private *port_priv,
+ struct ib_mad *mad, int solicited);
static int ib_mad_post_receive_mads(struct ib_mad_qp_info *qp_info,
struct ib_mad_private *mad);
static void cancel_mads(struct ib_mad_agent_private *mad_agent_priv);
@@ -623,10 +626,12 @@
struct ib_smp *smp,
struct ib_send_wr *send_wr)
{
- int ret, alloc_flags;
+ int ret, alloc_flags, solicited;
unsigned long flags;
struct ib_mad_local_private *local;
struct ib_mad_private *mad_priv;
+ struct ib_mad_port_private *port_priv;
+ struct ib_mad_agent_private *recv_mad_agent = NULL;
struct ib_device *device = mad_agent_priv->agent.device;
u8 port_num = mad_agent_priv->agent.port_num;

@@ -651,6 +656,7 @@
goto out;
}
local->mad_priv = NULL;
+ local->recv_mad_agent = NULL;
mad_priv = kmem_cache_alloc(ib_mad_cache, alloc_flags);
if (!mad_priv) {
ret = -ENOMEM;
@@ -669,19 +675,41 @@
* there is a recv handler
*/
if (solicited_mad(&mad_priv->mad.mad) &&
- mad_agent_priv->agent.recv_handler)
+ mad_agent_priv->agent.recv_handler) {
local->mad_priv = mad_priv;
- else
+ local->recv_mad_agent = mad_agent_priv;
+ /*
+ * Reference MAD agent until receive
+ * side of local completion handled
+ */
+ atomic_inc(&mad_agent_priv->refcount);
+ } else
kmem_cache_free(ib_mad_cache, mad_priv);
break;
case IB_MAD_RESULT_SUCCESS | IB_MAD_RESULT_CONSUMED:
kmem_cache_free(ib_mad_cache, mad_priv);
break;
case IB_MAD_RESULT_SUCCESS:
- kmem_cache_free(ib_mad_cache, mad_priv);
- kfree(local);
- ret = 0;
- goto out;
+ /* Treat like an incoming receive MAD */
+ solicited = solicited_mad(&mad_priv->mad.mad);
+ port_priv = ib_get_mad_port(mad_agent_priv->agent.device,
+ mad_agent_priv->agent.port_num);
+ if (port_priv) {
+ mad_priv->mad.mad.mad_hdr.tid =
+ ((struct ib_mad *)smp)->mad_hdr.tid;
+ recv_mad_agent = find_mad_agent(port_priv,
+ &mad_priv->mad.mad,
+ solicited);
+ }
+ if (!port_priv || !recv_mad_agent) {
+ kmem_cache_free(ib_mad_cache, mad_priv);
+ kfree(local);
+ ret = 0;
+ goto out;
+ }
+ local->mad_priv = mad_priv;
+ local->recv_mad_agent = recv_mad_agent;
+ break;
default:
kmem_cache_free(ib_mad_cache, mad_priv);
kfree(local);
@@ -696,7 +724,7 @@
local->send_wr.next = NULL;
local->tid = send_wr->wr.ud.mad_hdr->tid;
local->wr_id = send_wr->wr_id;
- /* Reference MAD agent until local completion handled */
+ /* Reference MAD agent until send side of local completion handled */
atomic_inc(&mad_agent_priv->refcount);
/* Queue local completion to local list */
spin_lock_irqsave(&mad_agent_priv->lock, flags);
@@ -1997,6 +2025,7 @@
{
struct ib_mad_agent_private *mad_agent_priv;
struct ib_mad_local_private *local;
+ struct ib_mad_agent_private *recv_mad_agent;
unsigned long flags;
struct ib_wc wc;
struct ib_mad_send_wc mad_send_wc;
@@ -2010,6 +2039,13 @@
completion_list);
spin_unlock_irqrestore(&mad_agent_priv->lock, flags);
if (local->mad_priv) {
+ recv_mad_agent = local->recv_mad_agent;
+ if (!recv_mad_agent) {
+ printk(KERN_ERR PFX "No receive MAD agent for local completion\n");
+ kmem_cache_free(ib_mad_cache, local->mad_priv);
+ goto local_send_completion;
+ }
+
/*
* Defined behavior is to complete response
* before request
@@ -2034,15 +2070,19 @@
local->mad_priv->header.recv_wc.recv_buf.grh = NULL;
local->mad_priv->header.recv_wc.recv_buf.mad =
&local->mad_priv->mad.mad;
- if (atomic_read(&mad_agent_priv->qp_info->snoop_count))
- snoop_recv(mad_agent_priv->qp_info,
+ if (atomic_read(&recv_mad_agent->qp_info->snoop_count))
+ snoop_recv(recv_mad_agent->qp_info,
&local->mad_priv->header.recv_wc,
IB_MAD_SNOOP_RECVS);
- mad_agent_priv->agent.recv_handler(
- &mad_agent_priv->agent,
+ recv_mad_agent->agent.recv_handler(
+ &recv_mad_agent->agent,
&local->mad_priv->header.recv_wc);
+ spin_lock_irqsave(&recv_mad_agent->lock, flags);
+ atomic_dec(&recv_mad_agent->refcount);
+ spin_unlock_irqrestore(&recv_mad_agent->lock, flags);
}

+local_send_completion:
/* Complete send */
mad_send_wc.status = IB_WC_SUCCESS;
mad_send_wc.vendor_err = 0;
--- linux/drivers/infiniband/core/mad_priv.h (revision 1501)
+++ linux/drivers/infiniband/core/mad_priv.h (revision 1502)
@@ -127,6 +127,7 @@
struct ib_mad_local_private {
struct list_head completion_list;
struct ib_mad_private *mad_priv;
+ struct ib_mad_agent_private *recv_mad_agent;
struct ib_send_wr send_wr;
struct ib_sge sg_list[IB_MAD_SEND_REQ_MAX_SG];
u64 wr_id; /* client WR ID */

2005-01-12 22:04:37

by Roland Dreier

Subject: [PATCH][5/18] InfiniBand/mthca: add needed rmb() in event queue poll

Add an rmb() between checking the ownership bit of an event queue
entry and reading the contents of the EQE. Without this barrier, the
CPU could read stale EQE contents from before the HW write, with the
read of the ownership bit reordered until after HW finishes writing,
so the driver would process an incorrect event. This was actually
observed when multiple completion queues were in heavy use on an IBM
JS20 PowerPC 970 system.

Also explain the existing rmb() in completion queue poll (there for
the same reason) and slightly improve debugging output.

Signed-off-by: Roland Dreier <[email protected]>

--- linux/drivers/infiniband/hw/mthca/mthca_cq.c (revision 1437)
+++ linux/drivers/infiniband/hw/mthca/mthca_cq.c (revision 1439)
@@ -1,5 +1,5 @@
/*
- * Copyright (c) 2004 Topspin Communications. All rights reserved.
+ * Copyright (c) 2004, 2005 Topspin Communications. All rights reserved.
*
* This software is available to you under a choice of one of two
* licenses. You may choose to be licensed under the terms of the GNU
@@ -391,6 +391,10 @@
if (!next_cqe_sw(cq))
return -EAGAIN;

+ /*
+ * Make sure we read CQ entry contents after we've checked the
+ * ownership bit.
+ */
rmb();

cqe = get_cqe(cq, cq->cons_index);
@@ -768,7 +772,8 @@
u32 *ctx = MAILBOX_ALIGN(mailbox);
int j;

- printk(KERN_ERR "context for CQN %x\n", cq->cqn);
+ printk(KERN_ERR "context for CQN %x (cons index %x, next sw %d)\n",
+ cq->cqn, cq->cons_index, next_cqe_sw(cq));
for (j = 0; j < 16; ++j)
printk(KERN_ERR "[%2x] %08x\n", j * 4, be32_to_cpu(ctx[j]));
}
--- linux/drivers/infiniband/hw/mthca/mthca_eq.c (revision 1437)
+++ linux/drivers/infiniband/hw/mthca/mthca_eq.c (revision 1439)
@@ -1,5 +1,5 @@
/*
- * Copyright (c) 2004 Topspin Communications. All rights reserved.
+ * Copyright (c) 2004, 2005 Topspin Communications. All rights reserved.
*
* This software is available to you under a choice of one of two
* licenses. You may choose to be licensed under the terms of the GNU
@@ -240,6 +240,12 @@
int set_ci = 0;
eqe = get_eqe(eq, eq->cons_index);

+ /*
+ * Make sure we read EQ entry contents after we've
+ * checked the ownership bit.
+ */
+ rmb();
+
switch (eqe->type) {
case MTHCA_EVENT_TYPE_COMP:
disarm_cqn = be32_to_cpu(eqe->event.comp.cqn) & 0xffffff;

2005-01-12 22:04:40

by Roland Dreier

Subject: [PATCH][6/18] InfiniBand/core: remove debug printk

Remove a debug printk that was accidentally included.

Signed-off-by: Tom Duffy <[email protected]>
Signed-off-by: Roland Dreier <[email protected]>

--- linux/drivers/infiniband/core/sysfs.c (revision 1455)
+++ linux/drivers/infiniband/core/sysfs.c (revision 1456)
@@ -188,8 +188,6 @@
case 4: speed = " QDR"; break;
}

- printk(KERN_ERR "width %d speed %d\n", attr.active_width, attr.active_speed);
-
rate = 25 * ib_width_enum_to_int(attr.active_width) * attr.active_speed;
if (rate < 0)
return -EINVAL;

2005-01-12 22:09:19

by Roland Dreier

Subject: [PATCH][3/18] InfiniBand/mthca: support RDMA/atomic attributes in QP modify

Implement setting of the RDMA/atomic enable bits, initiator resources
and responder resources for modify QP in the low-level Mellanox HCA
driver (this should complete the RDMA/atomic implementation).

Signed-off-by: Roland Dreier <[email protected]>

--- linux/drivers/infiniband/hw/mthca/mthca_dev.h (revision 1421)
+++ linux/drivers/infiniband/hw/mthca/mthca_dev.h (revision 1422)
@@ -75,6 +75,7 @@
MTHCA_EQ_CONTEXT_SIZE = 0x40,
MTHCA_CQ_CONTEXT_SIZE = 0x40,
MTHCA_QP_CONTEXT_SIZE = 0x200,
+ MTHCA_RDB_ENTRY_SIZE = 0x20,
MTHCA_AV_SIZE = 0x20,
MTHCA_MGM_ENTRY_SIZE = 0x40
};
@@ -121,7 +122,6 @@
int mtt_seg_size;
int reserved_mtts;
int reserved_mrws;
- int num_rdbs;
int reserved_uars;
int num_mgms;
int num_amgms;
@@ -174,6 +174,8 @@

struct mthca_qp_table {
struct mthca_alloc alloc;
+ u32 rdb_base;
+ int rdb_shift;
int sqp_start;
spinlock_t lock;
struct mthca_array qp;
--- linux/drivers/infiniband/hw/mthca/mthca_provider.h (revision 1421)
+++ linux/drivers/infiniband/hw/mthca/mthca_provider.h (revision 1422)
@@ -162,9 +162,12 @@
spinlock_t lock;
atomic_t refcount;
u32 qpn;
- int transport;
- enum ib_qp_state state;
int is_direct;
+ u8 transport;
+ u8 state;
+ u8 atomic_rd_en;
+ u8 resp_depth;
+
struct mthca_mr mr;

struct mthca_wq rq;
--- linux/drivers/infiniband/hw/mthca/mthca_profile.c (revision 1421)
+++ linux/drivers/infiniband/hw/mthca/mthca_profile.c (revision 1422)
@@ -50,7 +50,6 @@
};

enum {
- MTHCA_RDB_ENTRY_SIZE = 32,
MTHCA_MTT_SEG_SIZE = 64
};

@@ -181,8 +180,13 @@
init_hca->log_num_eqs = profile[i].log_num;
break;
case MTHCA_RES_RDB:
- dev->limits.num_rdbs = profile[i].num;
- init_hca->rdb_base = profile[i].start;
+ for (dev->qp_table.rdb_shift = 0;
+ profile[MTHCA_RES_QP].num << dev->qp_table.rdb_shift <
+ profile[i].num;
+ ++dev->qp_table.rdb_shift)
+ ; /* nothing */
+ dev->qp_table.rdb_base = (u32) profile[i].start;
+ init_hca->rdb_base = profile[i].start;
break;
case MTHCA_RES_MCG:
dev->limits.num_mgms = profile[i].num >> 1;
--- linux/drivers/infiniband/hw/mthca/mthca_qp.c (revision 1421)
+++ linux/drivers/infiniband/hw/mthca/mthca_qp.c (revision 1422)
@@ -146,7 +146,7 @@
MTHCA_QP_OPTPAR_ALT_ADDR_PATH = 1 << 0,
MTHCA_QP_OPTPAR_RRE = 1 << 1,
MTHCA_QP_OPTPAR_RAE = 1 << 2,
- MTHCA_QP_OPTPAR_REW = 1 << 3,
+ MTHCA_QP_OPTPAR_RWE = 1 << 3,
MTHCA_QP_OPTPAR_PKEY_INDEX = 1 << 4,
MTHCA_QP_OPTPAR_Q_KEY = 1 << 5,
MTHCA_QP_OPTPAR_RNR_TIMEOUT = 1 << 6,
@@ -697,14 +697,87 @@
qp_param->opt_param_mask |= cpu_to_be32(MTHCA_QP_OPTPAR_RETRY_COUNT);
}

- /* XXX initiator resources */
+ if (attr_mask & IB_QP_MAX_DEST_RD_ATOMIC) {
+ qp_context->params1 |= cpu_to_be32(min(attr->max_dest_rd_atomic ?
+ ffs(attr->max_dest_rd_atomic) - 1 : 0,
+ 7) << 21);
+ qp_param->opt_param_mask |= cpu_to_be32(MTHCA_QP_OPTPAR_SRA_MAX);
+ }

if (attr_mask & IB_QP_SQ_PSN)
qp_context->next_send_psn = cpu_to_be32(attr->sq_psn);
qp_context->cqn_snd = cpu_to_be32(to_mcq(ibqp->send_cq)->cqn);

- /* XXX RDMA/atomic enable, responder resources */
+ if (attr_mask & IB_QP_ACCESS_FLAGS) {
+ /*
+ * Only enable RDMA/atomics if we have responder
+ * resources set to a non-zero value.
+ */
+ if (qp->resp_depth) {
+ qp_context->params2 |=
+ cpu_to_be32(attr->qp_access_flags & IB_ACCESS_REMOTE_WRITE ?
+ MTHCA_QP_BIT_RWE : 0);
+ qp_context->params2 |=
+ cpu_to_be32(attr->qp_access_flags & IB_ACCESS_REMOTE_READ ?
+ MTHCA_QP_BIT_RRE : 0);
+ qp_context->params2 |=
+ cpu_to_be32(attr->qp_access_flags & IB_ACCESS_REMOTE_ATOMIC ?
+ MTHCA_QP_BIT_RAE : 0);
+ }

+ qp_param->opt_param_mask |= cpu_to_be32(MTHCA_QP_OPTPAR_RWE |
+ MTHCA_QP_OPTPAR_RRE |
+ MTHCA_QP_OPTPAR_RAE);
+
+ qp->atomic_rd_en = attr->qp_access_flags;
+ }
+
+ if (attr_mask & IB_QP_MAX_QP_RD_ATOMIC) {
+ u8 rra_max;
+
+ if (qp->resp_depth && !attr->max_rd_atomic) {
+ /*
+ * Lowering our responder resources to zero.
+ * Turn off RDMA/atomics as responder.
+ * (RWE/RRE/RAE in params2 already zero)
+ */
+ qp_param->opt_param_mask |= cpu_to_be32(MTHCA_QP_OPTPAR_RWE |
+ MTHCA_QP_OPTPAR_RRE |
+ MTHCA_QP_OPTPAR_RAE);
+ }
+
+ if (!qp->resp_depth && attr->max_rd_atomic) {
+ /*
+ * Increasing our responder resources from
+ * zero. Turn on RDMA/atomics as appropriate.
+ */
+ qp_context->params2 |=
+ cpu_to_be32(qp->atomic_rd_en & IB_ACCESS_REMOTE_WRITE ?
+ MTHCA_QP_BIT_RWE : 0);
+ qp_context->params2 |=
+ cpu_to_be32(qp->atomic_rd_en & IB_ACCESS_REMOTE_READ ?
+ MTHCA_QP_BIT_RRE : 0);
+ qp_context->params2 |=
+ cpu_to_be32(qp->atomic_rd_en & IB_ACCESS_REMOTE_ATOMIC ?
+ MTHCA_QP_BIT_RAE : 0);
+
+ qp_param->opt_param_mask |= cpu_to_be32(MTHCA_QP_OPTPAR_RWE |
+ MTHCA_QP_OPTPAR_RRE |
+ MTHCA_QP_OPTPAR_RAE);
+ }
+
+ for (rra_max = 0;
+ 1 << rra_max < attr->max_rd_atomic &&
+ rra_max < dev->qp_table.rdb_shift;
+ ++rra_max)
+ ; /* nothing */
+
+ qp_context->params2 |= cpu_to_be32(rra_max << 21);
+ qp_param->opt_param_mask |= cpu_to_be32(MTHCA_QP_OPTPAR_RRA_MAX);
+
+ qp->resp_depth = attr->max_rd_atomic;
+ }
+
if (qp->rq.policy == IB_SIGNAL_ALL_WR)
qp_context->params2 |= cpu_to_be32(MTHCA_QP_BIT_RSC);
if (attr_mask & IB_QP_MIN_RNR_TIMER) {
@@ -714,7 +787,9 @@
if (attr_mask & IB_QP_RQ_PSN)
qp_context->rnr_nextrecvpsn |= cpu_to_be32(attr->rq_psn);

- /* XXX ra_buff_indx */
+ qp_context->ra_buff_indx = dev->qp_table.rdb_base +
+ ((qp->qpn & (dev->limits.num_qps - 1)) * MTHCA_RDB_ENTRY_SIZE <<
+ dev->qp_table.rdb_shift);

qp_context->cqn_rcv = cpu_to_be32(to_mcq(ibqp->recv_cq)->cqn);

@@ -910,6 +985,8 @@
spin_lock_init(&qp->lock);
atomic_set(&qp->refcount, 1);
qp->state = IB_QPS_RESET;
+ qp->atomic_rd_en = 0;
+ qp->resp_depth = 0;
qp->sq.policy = send_policy;
qp->rq.policy = recv_policy;
qp->rq.cur = 0;
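The rra_max computation in the hunk above finds the smallest exponent whose power of two covers the requested responder resources, clamped by the RDB shift. A standalone userspace sketch of the same loop (function name and parameter names here are illustrative, not the driver's):

```c
#include <assert.h>

/* Smallest rra_max such that (1 << rra_max) >= max_rd_atomic,
 * clamped to rdb_shift, mirroring the empty-for-loop in the patch. */
static unsigned char rra_max_for(int max_rd_atomic, int rdb_shift)
{
	unsigned char rra_max;

	for (rra_max = 0;
	     1 << rra_max < max_rd_atomic &&
	     rra_max < rdb_shift;
	     ++rra_max)
		; /* nothing */

	return rra_max;
}
```

This is the usual ceil(log2(n)) idiom; the clamp keeps the value within what the RDB table can actually index.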

2005-01-12 22:09:18

by Roland Dreier

[permalink] [raw]
Subject: [PATCH][2/18] InfiniBand/mthca: trivial formatting fix

Trivial formatting fix for empty for loops.

Signed-off-by: Roland Dreier <[email protected]>

--- linux/drivers/infiniband/hw/mthca/mthca_mr.c (revision 1420)
+++ linux/drivers/infiniband/hw/mthca/mthca_mr.c (revision 1421)
@@ -197,7 +197,7 @@
for (i = dev->limits.mtt_seg_size / 8, mr->order = 0;
i < list_len;
i <<= 1, ++mr->order)
- /* nothing */ ;
+ ; /* nothing */

mr->first_seg = mthca_alloc_mtt(dev, mr->order);
if (mr->first_seg == -1)
@@ -337,7 +337,7 @@
for (i = 1, dev->mr_table.max_mtt_order = 0;
i < dev->limits.num_mtt_segs;
i <<= 1, ++dev->mr_table.max_mtt_order)
- /* nothing */ ;
+ ; /* nothing */

dev->mr_table.mtt_buddy = kmalloc((dev->mr_table.max_mtt_order + 1) *
sizeof (long *),
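The empty for loops being reformatted compute a log2-style order by shifting until a bound is reached. A minimal userspace sketch of the pattern (the names are illustrative stand-ins for the driver's `mtt_seg_size` arithmetic):

```c
#include <assert.h>

/* Compute the smallest order such that (seg_size << order) covers
 * list_len entries -- the idiom the two reformatted loops implement. */
static int mtt_order(int seg_size, int list_len)
{
	int i, order;

	for (i = seg_size, order = 0;
	     i < list_len;
	     i <<= 1, ++order)
		; /* nothing */

	return order;
}
```

The patch only moves the `; /* nothing */` comment-and-semicolon onto its own line so the empty body is unmistakable.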

2005-01-12 22:04:40

by Roland Dreier

[permalink] [raw]
Subject: [PATCH][7/18] InfiniBand: make more code static

Make needlessly global code static.

Signed-off-by: Adrian Bunk <[email protected]>
Signed-off-by: Roland Dreier <[email protected]>

--- linux/drivers/infiniband/ulp/ipoib/ipoib_main.c (revision 1456)
+++ linux/drivers/infiniband/ulp/ipoib/ipoib_main.c (revision 1457)
@@ -606,7 +606,7 @@
return NETDEV_TX_OK;
}

-struct net_device_stats *ipoib_get_stats(struct net_device *dev)
+static struct net_device_stats *ipoib_get_stats(struct net_device *dev)
{
struct ipoib_dev_priv *priv = netdev_priv(dev);

--- linux/drivers/infiniband/ulp/ipoib/ipoib_multicast.c (revision 1456)
+++ linux/drivers/infiniband/ulp/ipoib/ipoib_multicast.c (revision 1457)
@@ -44,7 +44,7 @@
#include "ipoib.h"

#ifdef CONFIG_INFINIBAND_IPOIB_DEBUG
-int mcast_debug_level;
+static int mcast_debug_level;

module_param(mcast_debug_level, int, 0644);
MODULE_PARM_DESC(mcast_debug_level,
@@ -623,7 +623,7 @@
return 0;
}

-int ipoib_mcast_leave(struct net_device *dev, struct ipoib_mcast *mcast)
+static int ipoib_mcast_leave(struct net_device *dev, struct ipoib_mcast *mcast)
{
struct ipoib_dev_priv *priv = netdev_priv(dev);
struct ib_sa_mcmember_rec rec = {
--- linux/drivers/infiniband/ulp/ipoib/ipoib_ib.c (revision 1456)
+++ linux/drivers/infiniband/ulp/ipoib/ipoib_ib.c (revision 1457)
@@ -357,7 +357,7 @@
}
}

-void __ipoib_reap_ah(struct net_device *dev)
+static void __ipoib_reap_ah(struct net_device *dev)
{
struct ipoib_dev_priv *priv = netdev_priv(dev);
struct ipoib_ah *ah, *tah;
--- linux/drivers/infiniband/core/cache.c (revision 1456)
+++ linux/drivers/infiniband/core/cache.c (revision 1457)
@@ -252,7 +252,7 @@
}
}

-void ib_cache_setup_one(struct ib_device *device)
+static void ib_cache_setup_one(struct ib_device *device)
{
int p;

@@ -295,7 +295,7 @@
kfree(device->cache.gid_cache);
}

-void ib_cache_cleanup_one(struct ib_device *device)
+static void ib_cache_cleanup_one(struct ib_device *device)
{
int p;

@@ -311,7 +311,7 @@
kfree(device->cache.gid_cache);
}

-struct ib_client cache_client = {
+static struct ib_client cache_client = {
.name = "cache",
.add = ib_cache_setup_one,
.remove = ib_cache_cleanup_one

2005-01-12 22:29:18

by Roland Dreier

Subject: [PATCH][4/18] InfiniBand/mthca: clean up allocation mapping of HCA context memory

Clean up the way we allocate and map memory for use as ICM ("InfiniHost Context
Memory") when running in Arbel MemFree mode. This slightly improves the code for
mapping the firmware area and will make future work towards full MemFree
support much easier.

Signed-off-by: Roland Dreier <[email protected]>

--- linux/drivers/infiniband/hw/mthca/mthca_dev.h (revision 1425)
+++ linux/drivers/infiniband/hw/mthca/mthca_dev.h (revision 1426)
@@ -40,7 +40,6 @@
#include <linux/pci.h>
#include <linux/dma-mapping.h>
#include <asm/semaphore.h>
-#include <asm/scatterlist.h>

#include "mthca_provider.h"
#include "mthca_doorbell.h"
@@ -214,7 +213,7 @@
u64 clr_int_base;
u64 eq_arm_base;
u64 eq_set_ci_base;
- struct scatterlist *mem;
+ struct mthca_icm *icm;
u16 fw_pages;
} arbel;
} fw;
--- linux/drivers/infiniband/hw/mthca/mthca_main.c (revision 1425)
+++ linux/drivers/infiniband/hw/mthca/mthca_main.c (revision 1426)
@@ -48,6 +48,7 @@
#include "mthca_config_reg.h"
#include "mthca_cmd.h"
#include "mthca_profile.h"
+#include "mthca_memfree.h"

MODULE_AUTHOR("Roland Dreier");
MODULE_DESCRIPTION("Mellanox InfiniBand HCA low-level driver");
@@ -259,75 +260,26 @@
{
u8 status;
int err;
- int num_ent, num_sg, fw_pages, cur_order;
- int i;

/* FIXME: use HCA-attached memory for FW if present */

- mdev->fw.arbel.mem = kmalloc(sizeof *mdev->fw.arbel.mem *
- mdev->fw.arbel.fw_pages,
- GFP_KERNEL);
- if (!mdev->fw.arbel.mem) {
+ mdev->fw.arbel.icm =
+ mthca_alloc_icm(mdev, mdev->fw.arbel.fw_pages,
+ GFP_HIGHUSER | __GFP_NOWARN);
+ if (!mdev->fw.arbel.icm) {
mthca_err(mdev, "Couldn't allocate FW area, aborting.\n");
return -ENOMEM;
}

- memset(mdev->fw.arbel.mem, 0,
- sizeof *mdev->fw.arbel.mem * mdev->fw.arbel.fw_pages);
-
- fw_pages = mdev->fw.arbel.fw_pages;
- num_ent = 0;
-
- /*
- * We allocate in as big chunks as we can, up to a maximum of
- * 256 KB per chunk.
- */
- cur_order = get_order(1 << 18);
-
- while (fw_pages > 0) {
- while (1 << cur_order > fw_pages)
- --cur_order;
-
- /*
- * We allocate with GFP_HIGHUSER because only the
- * firmware is going to touch these pages, so there's
- * no need for a kernel virtual address. We use
- * __GFP_NOWARN because we'll deal with any allocation
- * failures ourselves.
- */
- mdev->fw.arbel.mem[num_ent].page = alloc_pages(GFP_HIGHUSER | __GFP_NOWARN,
- cur_order);
- mdev->fw.arbel.mem[num_ent].length = PAGE_SIZE << cur_order;
- if (!mdev->fw.arbel.mem[num_ent].page) {
- --cur_order;
- if (cur_order < 0) {
- mthca_err(mdev, "Couldn't allocate FW area, aborting.\n");
- err = -ENOMEM;
- goto err_free;
- }
- } else {
- ++num_ent;
- fw_pages -= 1 << cur_order;
- }
- }
-
- num_sg = pci_map_sg(mdev->pdev, mdev->fw.arbel.mem, num_ent,
- PCI_DMA_BIDIRECTIONAL);
- if (num_sg <= 0) {
- mthca_err(mdev, "Couldn't allocate FW area, aborting.\n");
- err = -ENOMEM;
- goto err_free;
- }
-
- err = mthca_MAP_FA(mdev, num_sg, mdev->fw.arbel.mem, &status);
+ err = mthca_MAP_FA(mdev, mdev->fw.arbel.icm, &status);
if (err) {
mthca_err(mdev, "MAP_FA command failed, aborting.\n");
- goto err_unmap;
+ goto err_free;
}
if (status) {
mthca_err(mdev, "MAP_FA returned status 0x%02x, aborting.\n", status);
err = -EINVAL;
- goto err_unmap;
+ goto err_free;
}
err = mthca_RUN_FW(mdev, &status);
if (err) {
@@ -345,15 +297,8 @@
err_unmap_fa:
mthca_UNMAP_FA(mdev, &status);

-err_unmap:
- pci_unmap_sg(mdev->pdev, mdev->fw.arbel.mem,
- mdev->fw.arbel.fw_pages, PCI_DMA_BIDIRECTIONAL);
err_free:
- for (i = 0; i < mdev->fw.arbel.fw_pages; ++i)
- if (mdev->fw.arbel.mem[i].page)
- __free_pages(mdev->fw.arbel.mem[i].page,
- get_order(mdev->fw.arbel.mem[i].length));
- kfree(mdev->fw.arbel.mem);
+ mthca_free_icm(mdev, mdev->fw.arbel.icm);
return err;
}

@@ -397,13 +342,17 @@
err = mthca_dev_lim(mdev, &dev_lim);
if (err) {
mthca_err(mdev, "QUERY_DEV_LIM command failed, aborting.\n");
- goto err_out_disable;
+ goto err_out_stop_fw;
}

mthca_warn(mdev, "Sorry, native MT25208 mode support is not done, "
"aborting.\n");
err = -ENODEV;

+err_out_stop_fw:
+ mthca_UNMAP_FA(mdev, &status);
+ mthca_free_icm(mdev, mdev->fw.arbel.icm);
+
err_out_disable:
if (!(mdev->mthca_flags & MTHCA_FLAG_NO_LAM))
mthca_DISABLE_LAM(mdev, &status);
@@ -610,22 +559,13 @@
static void mthca_close_hca(struct mthca_dev *mdev)
{
u8 status;
- int i;

mthca_CLOSE_HCA(mdev, 0, &status);

if (mdev->hca_type == ARBEL_NATIVE) {
mthca_UNMAP_FA(mdev, &status);
+ mthca_free_icm(mdev, mdev->fw.arbel.icm);

- pci_unmap_sg(mdev->pdev, mdev->fw.arbel.mem,
- mdev->fw.arbel.fw_pages, PCI_DMA_BIDIRECTIONAL);
-
- for (i = 0; i < mdev->fw.arbel.fw_pages; ++i)
- if (mdev->fw.arbel.mem[i].page)
- __free_pages(mdev->fw.arbel.mem[i].page,
- get_order(mdev->fw.arbel.mem[i].length));
- kfree(mdev->fw.arbel.mem);
-
if (!(mdev->mthca_flags & MTHCA_FLAG_NO_LAM))
mthca_DISABLE_LAM(mdev, &status);
} else
--- /dev/null 1970-01-01 00:00:00.000000000 +0000
+++ linux/drivers/infiniband/hw/mthca/mthca_memfree.h (revision 1426)
@@ -0,0 +1,107 @@
+/*
+ * Copyright (c) 2004 Topspin Communications. All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses. You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ * - Redistributions of source code must retain the above
+ * copyright notice, this list of conditions and the following
+ * disclaimer.
+ *
+ * - Redistributions in binary form must reproduce the above
+ * copyright notice, this list of conditions and the following
+ * disclaimer in the documentation and/or other materials
+ * provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ *
+ * $Id$
+ */
+
+#ifndef MTHCA_MEMFREE_H
+#define MTHCA_MEMFREE_H
+
+#include <linux/list.h>
+#include <linux/pci.h>
+
+#define MTHCA_ICM_CHUNK_LEN \
+ ((512 - sizeof (struct list_head) - 2 * sizeof (int)) / \
+ (sizeof (struct scatterlist)))
+
+struct mthca_icm_chunk {
+ struct list_head list;
+ int npages;
+ int nsg;
+ struct scatterlist mem[MTHCA_ICM_CHUNK_LEN];
+};
+
+struct mthca_icm {
+ struct list_head chunk_list;
+};
+
+struct mthca_icm_iter {
+ struct mthca_icm *icm;
+ struct mthca_icm_chunk *chunk;
+ int page_idx;
+};
+
+struct mthca_dev;
+
+struct mthca_icm *mthca_alloc_icm(struct mthca_dev *dev, int npages,
+ unsigned int gfp_mask);
+void mthca_free_icm(struct mthca_dev *dev, struct mthca_icm *icm);
+
+static inline void mthca_icm_first(struct mthca_icm *icm,
+ struct mthca_icm_iter *iter)
+{
+ iter->icm = icm;
+ iter->chunk = list_empty(&icm->chunk_list) ?
+ NULL : list_entry(icm->chunk_list.next,
+ struct mthca_icm_chunk, list);
+ iter->page_idx = 0;
+}
+
+static inline int mthca_icm_last(struct mthca_icm_iter *iter)
+{
+ return !iter->chunk;
+}
+
+static inline void mthca_icm_next(struct mthca_icm_iter *iter)
+{
+ if (++iter->page_idx >= iter->chunk->nsg) {
+ if (iter->chunk->list.next == &iter->icm->chunk_list) {
+ iter->chunk = NULL;
+ return;
+ }
+
+ iter->chunk = list_entry(iter->chunk->list.next,
+ struct mthca_icm_chunk, list);
+ iter->page_idx = 0;
+ }
+}
+
+static inline dma_addr_t mthca_icm_addr(struct mthca_icm_iter *iter)
+{
+ return sg_dma_address(&iter->chunk->mem[iter->page_idx]);
+}
+
+static inline unsigned long mthca_icm_size(struct mthca_icm_iter *iter)
+{
+ return sg_dma_len(&iter->chunk->mem[iter->page_idx]);
+}
+
+#endif /* MTHCA_MEMFREE_H */
--- linux/drivers/infiniband/hw/mthca/mthca_cmd.c (revision 1425)
+++ linux/drivers/infiniband/hw/mthca/mthca_cmd.c (revision 1426)
@@ -40,6 +40,7 @@
#include "mthca_dev.h"
#include "mthca_config_reg.h"
#include "mthca_cmd.h"
+#include "mthca_memfree.h"

#define CMD_POLL_TOKEN 0xffff

@@ -508,38 +509,38 @@
return mthca_cmd(dev, 0, 0, 0, CMD_SYS_DIS, HZ, status);
}

-int mthca_MAP_FA(struct mthca_dev *dev, int count,
- struct scatterlist *sglist, u8 *status)
+int mthca_MAP_FA(struct mthca_dev *dev, struct mthca_icm *icm, u8 *status)
{
u32 *inbox;
dma_addr_t indma;
+ struct mthca_icm_iter iter;
int lg;
int nent = 0;
- int i, j;
+ int i;
int err = 0;
int ts = 0;

inbox = pci_alloc_consistent(dev->pdev, PAGE_SIZE, &indma);
memset(inbox, 0, PAGE_SIZE);

- for (i = 0; i < count; ++i) {
+ for (mthca_icm_first(icm, &iter); !mthca_icm_last(&iter); mthca_icm_next(&iter)) {
/*
* We have to pass pages that are aligned to their
* size, so find the least significant 1 in the
* address or size and use that as our log2 size.
*/
- lg = ffs(sg_dma_address(sglist + i) | sg_dma_len(sglist + i)) - 1;
+ lg = ffs(mthca_icm_addr(&iter) | mthca_icm_size(&iter)) - 1;
if (lg < 12) {
- mthca_warn(dev, "Got FW area not aligned to 4K (%llx/%x).\n",
- (unsigned long long) sg_dma_address(sglist + i),
- sg_dma_len(sglist + i));
+ mthca_warn(dev, "Got FW area not aligned to 4K (%llx/%lx).\n",
+ (unsigned long long) mthca_icm_addr(&iter),
+ mthca_icm_size(&iter));
err = -EINVAL;
goto out;
}
- for (j = 0; j < sg_dma_len(sglist + i) / (1 << lg); ++j, ++nent) {
+ for (i = 0; i < mthca_icm_size(&iter) / (1 << lg); ++i, ++nent) {
*((__be64 *) (inbox + nent * 4 + 2)) =
- cpu_to_be64((sg_dma_address(sglist + i) +
- (j << lg)) |
+ cpu_to_be64((mthca_icm_addr(&iter) +
+ (i << lg)) |
(lg - 12));
ts += 1 << (lg - 10);
if (nent == PAGE_SIZE / 16) {
--- linux/drivers/infiniband/hw/mthca/mthca_cmd.h (revision 1425)
+++ linux/drivers/infiniband/hw/mthca/mthca_cmd.h (revision 1426)
@@ -219,8 +219,7 @@

int mthca_SYS_EN(struct mthca_dev *dev, u8 *status);
int mthca_SYS_DIS(struct mthca_dev *dev, u8 *status);
-int mthca_MAP_FA(struct mthca_dev *dev, int count,
- struct scatterlist *sglist, u8 *status);
+int mthca_MAP_FA(struct mthca_dev *dev, struct mthca_icm *icm, u8 *status);
int mthca_UNMAP_FA(struct mthca_dev *dev, u8 *status);
int mthca_RUN_FW(struct mthca_dev *dev, u8 *status);
int mthca_QUERY_FW(struct mthca_dev *dev, u8 *status);
--- linux/drivers/infiniband/hw/mthca/Makefile (revision 1425)
+++ linux/drivers/infiniband/hw/mthca/Makefile (revision 1426)
@@ -9,4 +9,4 @@
ib_mthca-y := mthca_main.o mthca_cmd.o mthca_profile.o mthca_reset.o \
mthca_allocator.o mthca_eq.o mthca_pd.o mthca_cq.o \
mthca_mr.o mthca_qp.o mthca_av.o mthca_mcg.o mthca_mad.o \
- mthca_provider.o
+ mthca_provider.o mthca_memfree.o
--- /dev/null 1970-01-01 00:00:00.000000000 +0000
+++ linux/drivers/infiniband/hw/mthca/mthca_memfree.c (revision 1426)
@@ -0,0 +1,133 @@
+/*
+ * Copyright (c) 2004 Topspin Communications. All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses. You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ * - Redistributions of source code must retain the above
+ * copyright notice, this list of conditions and the following
+ * disclaimer.
+ *
+ * - Redistributions in binary form must reproduce the above
+ * copyright notice, this list of conditions and the following
+ * disclaimer in the documentation and/or other materials
+ * provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ *
+ * $Id$
+ */
+
+#include "mthca_memfree.h"
+#include "mthca_dev.h"
+
+void mthca_free_icm(struct mthca_dev *dev, struct mthca_icm *icm)
+{
+ struct mthca_icm_chunk *chunk, *tmp;
+ int i;
+
+ if (!icm)
+ return;
+
+ list_for_each_entry_safe(chunk, tmp, &icm->chunk_list, list) {
+ if (chunk->nsg > 0)
+ pci_unmap_sg(dev->pdev, chunk->mem, chunk->npages,
+ PCI_DMA_BIDIRECTIONAL);
+
+ for (i = 0; i < chunk->npages; ++i)
+ __free_pages(chunk->mem[i].page,
+ get_order(chunk->mem[i].length));
+
+ kfree(chunk);
+ }
+
+ kfree(icm);
+}
+
+struct mthca_icm *mthca_alloc_icm(struct mthca_dev *dev, int npages,
+ unsigned int gfp_mask)
+{
+ struct mthca_icm *icm;
+ struct mthca_icm_chunk *chunk = NULL;
+ int cur_order;
+
+ icm = kmalloc(sizeof *icm, gfp_mask & ~(__GFP_HIGHMEM | __GFP_NOWARN));
+ if (!icm)
+ return icm;
+
+ INIT_LIST_HEAD(&icm->chunk_list);
+
+ /*
+ * We allocate in as big chunks as we can, up to a maximum of
+ * 256 KB per chunk.
+ */
+ cur_order = get_order(1 << 18);
+
+ while (npages > 0) {
+ if (!chunk) {
+ chunk = kmalloc(sizeof *chunk,
+ gfp_mask & ~(__GFP_HIGHMEM | __GFP_NOWARN));
+ if (!chunk)
+ goto fail;
+
+ chunk->npages = 0;
+ chunk->nsg = 0;
+ list_add_tail(&chunk->list, &icm->chunk_list);
+ }
+
+ while (1 << cur_order > npages)
+ --cur_order;
+
+ chunk->mem[chunk->npages].page = alloc_pages(gfp_mask, cur_order);
+ if (chunk->mem[chunk->npages].page) {
+ chunk->mem[chunk->npages].length = PAGE_SIZE << cur_order;
+ chunk->mem[chunk->npages].offset = 0;
+
+ if (++chunk->npages == MTHCA_ICM_CHUNK_LEN) {
+ chunk->nsg = pci_map_sg(dev->pdev, chunk->mem,
+ chunk->npages,
+ PCI_DMA_BIDIRECTIONAL);
+
+ if (chunk->nsg <= 0)
+ goto fail;
+
+ chunk = NULL;
+ }
+
+ npages -= 1 << cur_order;
+ } else {
+ --cur_order;
+ if (cur_order < 0)
+ goto fail;
+ }
+ }
+
+ if (chunk) {
+ chunk->nsg = pci_map_sg(dev->pdev, chunk->mem,
+ chunk->npages,
+ PCI_DMA_BIDIRECTIONAL);
+
+ if (chunk->nsg <= 0)
+ goto fail;
+ }
+
+ return icm;
+
+fail:
+ mthca_free_icm(dev, icm);
+ return NULL;
+}
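The allocation strategy in mthca_alloc_icm() above can be sketched on its own: grab the largest power-of-two block that still fits (up to a cap), and fall back to smaller orders when a large allocation fails. This userspace sketch uses a fake `alloc_fails()` hook and `MAX_ORDER` as stand-ins for `alloc_pages()` and `get_order(1 << 18)`:

```c
#include <assert.h>

#define MAX_ORDER 6

static int alloc_fails(int order)	/* pretend orders above 3 fail */
{
	return order > 3;
}

/* Cover npages using the biggest blocks the (fake) allocator grants,
 * counting how many blocks it took; mirrors the mthca_alloc_icm() loop. */
static int alloc_in_chunks(int npages, int *nblocks)
{
	int cur_order = MAX_ORDER;

	*nblocks = 0;
	while (npages > 0) {
		while (1 << cur_order > npages)
			--cur_order;

		if (alloc_fails(cur_order)) {
			if (--cur_order < 0)
				return -1;	/* out of memory */
		} else {
			++*nblocks;
			npages -= 1 << cur_order;
		}
	}
	return 0;
}
```

Note that a failed order stays lowered for the rest of the loop, just as in the driver: once the system can't give us 2^k pages contiguously, there is no point retrying that order.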


2005-01-13 00:34:27

by Roland Dreier

Subject: [PATCH][13/18] InfiniBand/core: add more parameters to process_mad

Add parameters to the process_mad device method to support full Mellanox
firmware capabilities (pass sufficient information for baseboard
management trap generation, etc).

Signed-off-by: Michael S. Tsirkin <[email protected]>
Signed-off-by: Roland Dreier <[email protected]>

--- linux/drivers/infiniband/include/ib_verbs.h (revision 1502)
+++ linux/drivers/infiniband/include/ib_verbs.h (revision 1504)
@@ -684,9 +684,12 @@
};

struct ib_mad;
+struct ib_grh;

enum ib_process_mad_flags {
- IB_MAD_IGNORE_MKEY = 1
+ IB_MAD_IGNORE_MKEY = 1,
+ IB_MAD_IGNORE_BKEY = 2,
+ IB_MAD_IGNORE_ALL = IB_MAD_IGNORE_MKEY | IB_MAD_IGNORE_BKEY
};

enum ib_mad_result {
@@ -812,7 +815,8 @@
int (*process_mad)(struct ib_device *device,
int process_mad_flags,
u8 port_num,
- u16 source_lid,
+ struct ib_wc *in_wc,
+ struct ib_grh *in_grh,
struct ib_mad *in_mad,
struct ib_mad *out_mad);

--- linux/drivers/infiniband/core/mad.c (revision 1502)
+++ linux/drivers/infiniband/core/mad.c (revision 1504)
@@ -617,6 +617,23 @@
spin_unlock_irqrestore(&qp_info->snoop_lock, flags);
}

+static void build_smp_wc(u64 wr_id, u16 slid, u16 pkey_index, u8 port_num,
+ struct ib_wc *wc)
+{
+ memset(wc, 0, sizeof *wc);
+ wc->wr_id = wr_id;
+ wc->status = IB_WC_SUCCESS;
+ wc->opcode = IB_WC_RECV;
+ wc->pkey_index = pkey_index;
+ wc->byte_len = sizeof(struct ib_mad) + sizeof(struct ib_grh);
+ wc->src_qp = IB_QP0;
+ wc->qp_num = IB_QP0;
+ wc->slid = slid;
+ wc->sl = 0;
+ wc->dlid_path_bits = 0;
+ wc->port_num = port_num;
+}
+
/*
* Return 0 if SMP is to be sent
* Return 1 if SMP was consumed locally (whether or not solicited)
@@ -634,6 +651,7 @@
struct ib_mad_agent_private *recv_mad_agent;
struct ib_device *device = mad_agent_priv->agent.device;
u8 port_num = mad_agent_priv->agent.port_num;
+ struct ib_wc mad_wc;

if (!smi_handle_dr_smp_send(smp, device->node_type, port_num)) {
ret = -EINVAL;
@@ -664,7 +682,12 @@
kfree(local);
goto out;
}
- ret = device->process_mad(device, 0, port_num, smp->dr_slid,
+
+ build_smp_wc(send_wr->wr_id, smp->dr_slid, send_wr->wr.ud.pkey_index,
+ send_wr->wr.ud.port_num, &mad_wc);
+
+ /* No GRH for DR SMP */
+ ret = device->process_mad(device, 0, port_num, &mad_wc, NULL,
(struct ib_mad *)smp,
(struct ib_mad *)&mad_priv->mad);
switch (ret)
@@ -1622,7 +1645,7 @@

ret = port_priv->device->process_mad(port_priv->device, 0,
port_priv->port_num,
- wc->slid,
+ wc, &recv->grh,
&recv->mad.mad,
&response->mad.mad);
if (ret & IB_MAD_RESULT_SUCCESS) {
@@ -2050,19 +2073,10 @@
* Defined behavior is to complete response
* before request
*/
- wc.wr_id = local->wr_id;
- wc.status = IB_WC_SUCCESS;
- wc.opcode = IB_WC_RECV;
- wc.vendor_err = 0;
- wc.byte_len = sizeof(struct ib_mad) +
- sizeof(struct ib_grh);
- wc.src_qp = IB_QP0;
- wc.wc_flags = 0; /* No GRH */
- wc.pkey_index = 0;
- wc.slid = IB_LID_PERMISSIVE;
- wc.sl = 0;
- wc.dlid_path_bits = 0;
- wc.qp_num = IB_QP0;
+ build_smp_wc(local->wr_id, IB_LID_PERMISSIVE,
+ 0 /* pkey index */,
+ recv_mad_agent->agent.port_num, &wc);
+
local->mad_priv->header.recv_wc.wc = &wc;
local->mad_priv->header.recv_wc.mad_len =
sizeof(struct ib_mad);
--- linux/drivers/infiniband/core/sysfs.c (revision 1502)
+++ linux/drivers/infiniband/core/sysfs.c (revision 1504)
@@ -315,8 +315,8 @@

in_mad->data[41] = p->port_num; /* PortSelect field */

- if ((p->ibdev->process_mad(p->ibdev, IB_MAD_IGNORE_MKEY, p->port_num, 0xffff,
- in_mad, out_mad) &
+ if ((p->ibdev->process_mad(p->ibdev, IB_MAD_IGNORE_MKEY,
+ p->port_num, NULL, NULL, in_mad, out_mad) &
(IB_MAD_RESULT_SUCCESS | IB_MAD_RESULT_REPLY)) !=
(IB_MAD_RESULT_SUCCESS | IB_MAD_RESULT_REPLY)) {
ret = -EINVAL;
--- linux/drivers/infiniband/hw/mthca/mthca_dev.h (revision 1502)
+++ linux/drivers/infiniband/hw/mthca/mthca_dev.h (revision 1504)
@@ -381,7 +381,8 @@
int mthca_process_mad(struct ib_device *ibdev,
int mad_flags,
u8 port_num,
- u16 slid,
+ struct ib_wc *in_wc,
+ struct ib_grh *in_grh,
struct ib_mad *in_mad,
struct ib_mad *out_mad);
int mthca_create_agents(struct mthca_dev *dev);
--- linux/drivers/infiniband/hw/mthca/mthca_mad.c (revision 1502)
+++ linux/drivers/infiniband/hw/mthca/mthca_mad.c (revision 1504)
@@ -185,12 +185,14 @@
int mthca_process_mad(struct ib_device *ibdev,
int mad_flags,
u8 port_num,
- u16 slid,
+ struct ib_wc *in_wc,
+ struct ib_grh *in_grh,
struct ib_mad *in_mad,
struct ib_mad *out_mad)
{
int err;
u8 status;
+ u16 slid = in_wc ? in_wc->slid : IB_LID_PERMISSIVE;

/* Forward locally generated traps to the SM */
if (in_mad->mad_hdr.method == IB_MGMT_METHOD_TRAP &&
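The idea behind the new build_smp_wc() helper is that a locally consumed MAD now gets a synthesized work completion, so process_mad() sees the same information (source LID, pkey index, port) it would get for a MAD received off the wire. A simplified userspace sketch of that helper; the struct and constant are stand-ins for the kernel's `struct ib_wc` and `IB_WC_SUCCESS`:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define WC_SUCCESS 0

struct fake_wc {
	uint64_t wr_id;
	int      status;
	uint16_t slid;
	uint16_t pkey_index;
	uint8_t  port_num;
};

/* Synthesize a receive completion for a locally routed SMP, zeroing
 * the fields (sl, dlid_path_bits, ...) that have no wire source. */
static void build_fake_smp_wc(uint64_t wr_id, uint16_t slid,
			      uint16_t pkey_index, uint8_t port_num,
			      struct fake_wc *wc)
{
	memset(wc, 0, sizeof *wc);
	wc->wr_id = wr_id;
	wc->status = WC_SUCCESS;
	wc->slid = slid;
	wc->pkey_index = pkey_index;
	wc->port_num = port_num;
}
```

This is also why mthca_process_mad() can fall back to `IB_LID_PERMISSIVE` when `in_wc` is NULL, as the sysfs caller passes no completion at all.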

2005-01-13 00:34:23

by Roland Dreier

Subject: [PATCH][8/18] InfiniBand/core: set byte_cnt correctly in MAD completion

Integrate Michael Tsirkin's patch to local_completion() to set the WC
byte_len according to the IBA 1.1 spec (include the GRH size
regardless of whether a GRH is present).

Signed-off-by: Hal Rosenstock <[email protected]>
Signed-off-by: Roland Dreier <[email protected]>

--- linux/drivers/infiniband/core/mad.c (revision 1458)
+++ linux/drivers/infiniband/core/mad.c (revision 1459)
@@ -2018,9 +2018,10 @@
wc.status = IB_WC_SUCCESS;
wc.opcode = IB_WC_RECV;
wc.vendor_err = 0;
- wc.byte_len = sizeof(struct ib_mad);
+ wc.byte_len = sizeof(struct ib_mad) +
+ sizeof(struct ib_grh);
wc.src_qp = IB_QP0;
- wc.wc_flags = 0;
+ wc.wc_flags = 0; /* No GRH */
wc.pkey_index = 0;
wc.slid = IB_LID_PERMISSIVE;
wc.sl = 0;

2005-01-13 00:34:25

by Roland Dreier

Subject: [PATCH][10/18] InfiniBand/core: add node_type and phys_state sysfs attrs

Add per-device "node_type" and per-port "phys_state" sysfs attributes
for InfiniBand devices.

Signed-off-by: Roland Dreier <[email protected]>

--- linux/drivers/infiniband/include/ib_verbs.h (revision 1480)
+++ linux/drivers/infiniband/include/ib_verbs.h (revision 1490)
@@ -212,6 +212,7 @@
u8 init_type_reply;
u8 active_width;
u8 active_speed;
+ u8 phys_state;
};

enum ib_device_modify_flags {
--- linux/drivers/infiniband/core/sysfs.c (revision 1480)
+++ linux/drivers/infiniband/core/sysfs.c (revision 1490)
@@ -197,6 +197,29 @@
ib_width_enum_to_int(attr.active_width), speed);
}

+static ssize_t phys_state_show(struct ib_port *p, struct port_attribute *unused,
+ char *buf)
+{
+ struct ib_port_attr attr;
+
+ ssize_t ret;
+
+ ret = ib_query_port(p->ibdev, p->port_num, &attr);
+ if (ret)
+ return ret;
+
+ switch (attr.phys_state) {
+ case 1: return sprintf(buf, "1: Sleep\n");
+ case 2: return sprintf(buf, "2: Polling\n");
+ case 3: return sprintf(buf, "3: Disabled\n");
+ case 4: return sprintf(buf, "4: PortConfigurationTraining\n");
+ case 5: return sprintf(buf, "5: LinkUp\n");
+ case 6: return sprintf(buf, "6: LinkErrorRecovery\n");
+ case 7: return sprintf(buf, "7: Phy Test\n");
+ default: return sprintf(buf, "%d: <unknown>\n", attr.phys_state);
+ }
+}
+
static PORT_ATTR_RO(state);
static PORT_ATTR_RO(lid);
static PORT_ATTR_RO(lid_mask_count);
@@ -204,6 +227,7 @@
static PORT_ATTR_RO(sm_sl);
static PORT_ATTR_RO(cap_mask);
static PORT_ATTR_RO(rate);
+static PORT_ATTR_RO(phys_state);

static struct attribute *port_default_attrs[] = {
&port_attr_state.attr,
@@ -213,6 +237,7 @@
&port_attr_sm_sl.attr,
&port_attr_cap_mask.attr,
&port_attr_rate.attr,
+ &port_attr_phys_state.attr,
NULL
};

@@ -572,6 +597,18 @@
return ret;
}

+static ssize_t show_node_type(struct class_device *cdev, char *buf)
+{
+ struct ib_device *dev = container_of(cdev, struct ib_device, class_dev);
+
+ switch (dev->node_type) {
+ case IB_NODE_CA: return sprintf(buf, "%d: CA\n", dev->node_type);
+ case IB_NODE_SWITCH: return sprintf(buf, "%d: switch\n", dev->node_type);
+ case IB_NODE_ROUTER: return sprintf(buf, "%d: router\n", dev->node_type);
+ default: return sprintf(buf, "%d: <unknown>\n", dev->node_type);
+ }
+}
+
static ssize_t show_sys_image_guid(struct class_device *cdev, char *buf)
{
struct ib_device *dev = container_of(cdev, struct ib_device, class_dev);
@@ -606,10 +643,12 @@
be16_to_cpu(((u16 *) &attr.node_guid)[3]));
}

+static CLASS_DEVICE_ATTR(node_type, S_IRUGO, show_node_type, NULL);
static CLASS_DEVICE_ATTR(sys_image_guid, S_IRUGO, show_sys_image_guid, NULL);
static CLASS_DEVICE_ATTR(node_guid, S_IRUGO, show_node_guid, NULL);

static struct class_device_attribute *ib_class_attributes[] = {
+ &class_device_attr_node_type,
&class_device_attr_sys_image_guid,
&class_device_attr_node_guid
};
--- linux/drivers/infiniband/hw/mthca/mthca_provider.c (revision 1480)
+++ linux/drivers/infiniband/hw/mthca/mthca_provider.c (revision 1490)
@@ -119,6 +119,7 @@
props->sm_lid = be16_to_cpup((u16 *) (out_mad->data + 18));
props->sm_sl = out_mad->data[36] & 0xf;
props->state = out_mad->data[32] & 0xf;
+ props->phys_state = out_mad->data[33] >> 4;
props->port_cap_flags = be32_to_cpup((u32 *) (out_mad->data + 20));
props->gid_tbl_len = to_mdev(ibdev)->limits.gid_table_len;
props->pkey_tbl_len = to_mdev(ibdev)->limits.pkey_table_len;
--- linux/Documentation/infiniband/sysfs.txt (revision 1480)
+++ linux/Documentation/infiniband/sysfs.txt (revision 1490)
@@ -3,6 +3,7 @@
For each InfiniBand device, the InfiniBand drivers create the
following files under /sys/class/linux/drivers/infiniband/<device name>:

+ node_type - Node type (CA, switch or router)
node_guid - Node GUID
sys_image_guid - System image GUID

@@ -25,6 +26,7 @@
sm_lid - Subnet manager LID for port's subnet
sm_sl - Subnet manager SL for port's subnet
state - Port state (DOWN, INIT, ARMED, ACTIVE or ACTIVE_DEFER)
+ phys_state - Port physical state (Sleep, Polling, LinkUp, etc)

There is also a "counters" subdirectory, with files


2005-01-13 00:34:24

by Roland Dreier

Subject: [PATCH][9/18] InfiniBand/core: add QP number to work completion struct

InfiniBand spec rev 1.2 compliance: add local qp number to
work completion structure.

Signed-off-by: Michael S. Tsirkin <[email protected]>
Signed-off-by: Roland Dreier <[email protected]>

--- linux/drivers/infiniband/include/ib_verbs.h (revision 1466)
+++ linux/drivers/infiniband/include/ib_verbs.h (revision 1468)
@@ -352,6 +352,7 @@
u32 vendor_err;
u32 byte_len;
__be32 imm_data;
+ u32 qp_num;
u32 src_qp;
int wc_flags;
u16 pkey_index;
--- linux/drivers/infiniband/core/mad.c (revision 1466)
+++ linux/drivers/infiniband/core/mad.c (revision 1468)
@@ -2026,6 +2026,7 @@
wc.slid = IB_LID_PERMISSIVE;
wc.sl = 0;
wc.dlid_path_bits = 0;
+ wc.qp_num = IB_QP0;
local->mad_priv->header.recv_wc.wc = &wc;
local->mad_priv->header.recv_wc.mad_len =
sizeof(struct ib_mad);
--- linux/drivers/infiniband/hw/mthca/mthca_cq.c (revision 1466)
+++ linux/drivers/infiniband/hw/mthca/mthca_cq.c (revision 1468)
@@ -444,6 +444,8 @@
spin_lock(&(*cur_qp)->lock);
}

+ entry->qp_num = (*cur_qp)->qpn;
+
if (is_send) {
wq = &(*cur_qp)->sq;
wqe_index = ((be32_to_cpu(cqe->wqe) - (*cur_qp)->send_wqe_offset)

2005-01-12 21:55:50

by Roland Dreier

Subject: [PATCH][16/18] InfiniBand: update copyrights for new year

Update copyright line (files were modified in 2005).

Signed-off-by: Hal Rosenstock <[email protected]>
Signed-off-by: Roland Dreier <[email protected]>

--- linux/drivers/infiniband/core/mad.c (revision 1510)
+++ linux/drivers/infiniband/core/mad.c (revision 1512)
@@ -1,5 +1,5 @@
/*
- * Copyright (c) 2004, Voltaire, Inc. All rights reserved.
+ * Copyright (c) 2004, 2005 Voltaire, Inc. All rights reserved.
*
* This software is available to you under a choice of one of two
* licenses. You may choose to be licensed under the terms of the GNU
--- linux/drivers/infiniband/core/mad_priv.h (revision 1510)
+++ linux/drivers/infiniband/core/mad_priv.h (revision 1512)
@@ -1,5 +1,5 @@
/*
- * Copyright (c) 2004, Voltaire, Inc. All rights reserved.
+ * Copyright (c) 2004, 2005, Voltaire, Inc. All rights reserved.
*
* This software is available to you under a choice of one of two
* licenses. You may choose to be licensed under the terms of the GNU
--- linux/drivers/infiniband/core/sysfs.c (revision 1510)
+++ linux/drivers/infiniband/core/sysfs.c (revision 1512)
@@ -1,5 +1,5 @@
/*
- * Copyright (c) 2004 Topspin Communications. All rights reserved.
+ * Copyright (c) 2004, 2005 Topspin Communications. All rights reserved.
*
* This software is available to you under a choice of one of two
* licenses. You may choose to be licensed under the terms of the GNU
--- linux/drivers/infiniband/hw/mthca/mthca_provider.c (revision 1510)
+++ linux/drivers/infiniband/hw/mthca/mthca_provider.c (revision 1512)
@@ -1,5 +1,5 @@
/*
- * Copyright (c) 2004 Topspin Communications. All rights reserved.
+ * Copyright (c) 2004, 2005 Topspin Communications. All rights reserved.
*
* This software is available to you under a choice of one of two
* licenses. You may choose to be licensed under the terms of the GNU

2005-01-13 00:58:07

by Roland Dreier

Subject: [PATCH][17/18] InfiniBand/ipoib: move structs from stack to device private struct

Move the gather list and work request used for posting sends from the
stack in ipoib_send() to the device private structure. This reduces
stack usage in the data path and may speed things up slightly, since
the constant members of these structures no longer need to be
initialized on every send.
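The pattern can be sketched in plain userspace C: invariant fields of the work request (opcode, sg_list, num_sge) are set once at init time, and the per-packet path touches only what varies. The structs here are simplified stand-ins for `struct ib_send_wr` / `struct ib_sge`:

```c
#include <assert.h>

#define OP_SEND 1

struct fake_sge { unsigned lkey; unsigned long addr; unsigned length; };
struct fake_wr  { int opcode; struct fake_sge *sg_list; int num_sge;
		  unsigned long wr_id; };

struct fake_priv {
	struct fake_sge tx_sge;
	struct fake_wr  tx_wr;
};

/* Done once at device init, like ipoib_transport_dev_init(). */
static void setup_tx(struct fake_priv *priv, unsigned lkey)
{
	priv->tx_sge.lkey   = lkey;
	priv->tx_wr.opcode  = OP_SEND;
	priv->tx_wr.sg_list = &priv->tx_sge;
	priv->tx_wr.num_sge = 1;
}

/* Per-send path: only the varying fields are written. */
static void fill_send(struct fake_priv *priv, unsigned long wr_id,
		      unsigned long addr, unsigned length)
{
	priv->tx_sge.addr   = addr;
	priv->tx_sge.length = length;
	priv->tx_wr.wr_id   = wr_id;
}
```

One caveat of the pattern (which the single-threaded ipoib_send() path satisfies): the shared tx_wr/tx_sge must not be filled concurrently from two contexts, which is why they live next to the tx_lock-protected ring state.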

Signed-off-by: Roland Dreier <[email protected]>

--- linux/drivers/infiniband/ulp/ipoib/ipoib_verbs.c (revision 1520)
+++ linux/drivers/infiniband/ulp/ipoib/ipoib_verbs.c (revision 1521)
@@ -1,5 +1,5 @@
/*
- * Copyright (c) 2004 Topspin Communications. All rights reserved.
+ * Copyright (c) 2004, 2005 Topspin Communications. All rights reserved.
*
* This software is available to you under a choice of one of two
* licenses. You may choose to be licensed under the terms of the GNU
@@ -187,7 +187,7 @@

priv->mr = ib_get_dma_mr(priv->pd, IB_ACCESS_LOCAL_WRITE);
if (IS_ERR(priv->mr)) {
- printk(KERN_WARNING "%s: ib_reg_phys_mr failed\n", ca->name);
+ printk(KERN_WARNING "%s: ib_get_dma_mr failed\n", ca->name);
goto out_free_cq;
}

@@ -204,6 +204,13 @@
priv->dev->dev_addr[2] = (priv->qp->qp_num >> 8) & 0xff;
priv->dev->dev_addr[3] = (priv->qp->qp_num ) & 0xff;

+ priv->tx_sge.lkey = priv->mr->lkey;
+
+ priv->tx_wr.opcode = IB_WR_SEND;
+ priv->tx_wr.sg_list = &priv->tx_sge;
+ priv->tx_wr.num_sge = 1;
+ priv->tx_wr.send_flags = IB_SEND_SIGNALED;
+
return 0;

out_free_mr:
--- linux/drivers/infiniband/ulp/ipoib/ipoib.h (revision 1520)
+++ linux/drivers/infiniband/ulp/ipoib/ipoib.h (revision 1521)
@@ -1,5 +1,5 @@
/*
- * Copyright (c) 2004 Topspin Communications. All rights reserved.
+ * Copyright (c) 2004, 2005 Topspin Communications. All rights reserved.
*
* This software is available to you under a choice of one of two
* licenses. You may choose to be licensed under the terms of the GNU
@@ -150,10 +150,12 @@

struct ipoib_buf *rx_ring;

- spinlock_t tx_lock;
+ spinlock_t tx_lock;
struct ipoib_buf *tx_ring;
- unsigned tx_head;
- unsigned tx_tail;
+ unsigned tx_head;
+ unsigned tx_tail;
+ struct ib_sge tx_sge;
+ struct ib_send_wr tx_wr;

struct ib_wc ibwc[IPOIB_NUM_WC];

--- linux/drivers/infiniband/ulp/ipoib/ipoib_multicast.c (revision 1520)
+++ linux/drivers/infiniband/ulp/ipoib/ipoib_multicast.c (revision 1521)
@@ -1,5 +1,5 @@
/*
- * Copyright (c) 2004 Topspin Communications. All rights reserved.
+ * Copyright (c) 2004, 2005 Topspin Communications. All rights reserved.
*
* This software is available to you under a choice of one of two
* licenses. You may choose to be licensed under the terms of the GNU
@@ -213,8 +213,10 @@

/* Set the cached Q_Key before we attach if it's the broadcast group */
if (!memcmp(mcast->mcmember.mgid.raw, priv->dev->broadcast + 4,
- sizeof (union ib_gid)))
+ sizeof (union ib_gid))) {
priv->qkey = be32_to_cpu(priv->broadcast->mcmember.qkey);
+ priv->tx_wr.wr.ud.remote_qkey = priv->qkey;
+ }

if (!test_bit(IPOIB_MCAST_FLAG_SENDONLY, &mcast->flags)) {
if (test_and_set_bit(IPOIB_MCAST_FLAG_ATTACHED, &mcast->flags)) {
--- linux/drivers/infiniband/ulp/ipoib/ipoib_ib.c (revision 1520)
+++ linux/drivers/infiniband/ulp/ipoib/ipoib_ib.c (revision 1521)
@@ -1,5 +1,5 @@
/*
- * Copyright (c) 2004 Topspin Communications. All rights reserved.
+ * Copyright (c) 2004, 2005 Topspin Communications. All rights reserved.
*
* This software is available to you under a choice of one of two
* licenses. You may choose to be licensed under the terms of the GNU
@@ -281,28 +281,16 @@
struct ib_ah *address, u32 qpn,
dma_addr_t addr, int len)
{
- struct ib_sge list = {
- .addr = addr,
- .length = len,
- .lkey = priv->mr->lkey,
- };
- struct ib_send_wr param = {
- .wr_id = wr_id,
- .opcode = IB_WR_SEND,
- .sg_list = &list,
- .num_sge = 1,
- .wr = {
- .ud = {
- .remote_qpn = qpn,
- .remote_qkey = priv->qkey,
- .ah = address
- },
- },
- .send_flags = IB_SEND_SIGNALED,
- };
struct ib_send_wr *bad_wr;

- return ib_post_send(priv->qp, &param, &bad_wr);
+ priv->tx_sge.addr = addr;
+ priv->tx_sge.length = len;
+
+ priv->tx_wr.wr_id = wr_id;
+ priv->tx_wr.wr.ud.remote_qpn = qpn;
+ priv->tx_wr.wr.ud.ah = address;
+
+ return ib_post_send(priv->qp, &priv->tx_wr, &bad_wr);
}

void ipoib_send(struct net_device *dev, struct sk_buff *skb,
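The pattern in this patch (initialize the constant work-request members once at device setup, and touch only the per-packet members on the send hot path) can be sketched in userspace C. The struct layouts and names below are simplified stand-ins for illustration, not the kernel's real ib_sge/ib_send_wr definitions:

```c
#include <assert.h>
#include <stdint.h>

/* Simplified stand-ins for the IB structures used in the patch. */
struct sge { uint64_t addr; uint32_t length; uint32_t lkey; };
struct send_wr {
	uint64_t    wr_id;
	struct sge *sg_list;
	int         num_sge;
	int         opcode;        /* stand-in for IB_WR_SEND */
	int         send_flags;    /* stand-in for IB_SEND_SIGNALED */
	uint32_t    remote_qpn;
};

struct dev_priv {
	struct sge     tx_sge;     /* lives for the device lifetime ... */
	struct send_wr tx_wr;      /* ... instead of on the send path's stack */
};

/* Done once at device init: constant members never change per packet. */
static void init_tx_wr(struct dev_priv *priv, uint32_t lkey)
{
	priv->tx_sge.lkey      = lkey;
	priv->tx_wr.sg_list    = &priv->tx_sge;
	priv->tx_wr.num_sge    = 1;
	priv->tx_wr.opcode     = 1;    /* placeholder for IB_WR_SEND */
	priv->tx_wr.send_flags = 2;    /* placeholder for IB_SEND_SIGNALED */
}

/* Hot path: only the per-packet members are written before posting. */
static struct send_wr *fill_tx_wr(struct dev_priv *priv, uint64_t wr_id,
				  uint64_t addr, uint32_t len, uint32_t qpn)
{
	priv->tx_sge.addr      = addr;
	priv->tx_sge.length    = len;
	priv->tx_wr.wr_id      = wr_id;
	priv->tx_wr.remote_qpn = qpn;
	return &priv->tx_wr;
}
```

Besides shrinking the stack frame of the data-path function, this avoids re-writing five constant fields on every send.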

2005-01-13 01:05:59

by Roland Dreier

Subject: [PATCH][18/18] InfiniBand/core: rename handle_outgoing_smp

Change routine name from handle_outgoing_smp to handle_outgoing_dr_smp.

Signed-off-by: Hal Rosenstock <[email protected]>
Signed-off-by: Roland Dreier <[email protected]>

--- linux/drivers/infiniband/core/mad.c (revision 1449)
+++ linux/drivers/infiniband/core/mad.c (revision 1450)
@@ -619,9 +619,9 @@
* Return 1 if SMP was consumed locally (whether or not solicited)
* Return < 0 if error
*/
-static int handle_outgoing_smp(struct ib_mad_agent_private *mad_agent_priv,
- struct ib_smp *smp,
- struct ib_send_wr *send_wr)
+static int handle_outgoing_dr_smp(struct ib_mad_agent_private *mad_agent_priv,
+ struct ib_smp *smp,
+ struct ib_send_wr *send_wr)
{
int ret, alloc_flags;
unsigned long flags;
@@ -797,7 +797,8 @@

smp = (struct ib_smp *)send_wr->wr.ud.mad_hdr;
if (smp->mgmt_class == IB_MGMT_CLASS_SUBN_DIRECTED_ROUTE) {
- ret = handle_outgoing_smp(mad_agent_priv, smp, send_wr);
+ ret = handle_outgoing_dr_smp(mad_agent_priv, smp,
+ send_wr);
if (ret < 0) /* error */
goto error2;
else if (ret == 1) /* locally consumed */

2005-01-13 01:18:27

by Roland Dreier

Subject: [PATCH][1/18] InfiniBand/IPoIB: use correct static rate in IPoIB

Calculate static rate for IPoIB address handles based on local
width/speed and path rate.

Signed-off-by: Roland Dreier <[email protected]>

--- linux/drivers/infiniband/include/ib_sa.h (revision 1409)
+++ linux/drivers/infiniband/include/ib_sa.h (revision 1410)
@@ -59,6 +59,34 @@
IB_SA_BEST = 3
};

+enum ib_sa_rate {
+ IB_SA_RATE_2_5_GBPS = 2,
+ IB_SA_RATE_5_GBPS = 5,
+ IB_SA_RATE_10_GBPS = 3,
+ IB_SA_RATE_20_GBPS = 6,
+ IB_SA_RATE_30_GBPS = 4,
+ IB_SA_RATE_40_GBPS = 7,
+ IB_SA_RATE_60_GBPS = 8,
+ IB_SA_RATE_80_GBPS = 9,
+ IB_SA_RATE_120_GBPS = 10
+};
+
+static inline int ib_sa_rate_enum_to_int(enum ib_sa_rate rate)
+{
+ switch (rate) {
+ case IB_SA_RATE_2_5_GBPS: return 1;
+ case IB_SA_RATE_5_GBPS: return 2;
+ case IB_SA_RATE_10_GBPS: return 4;
+ case IB_SA_RATE_20_GBPS: return 8;
+ case IB_SA_RATE_30_GBPS: return 12;
+ case IB_SA_RATE_40_GBPS: return 16;
+ case IB_SA_RATE_60_GBPS: return 24;
+ case IB_SA_RATE_80_GBPS: return 32;
+ case IB_SA_RATE_120_GBPS: return 48;
+ default: return -1;
+ }
+}
+
typedef u64 __bitwise ib_sa_comp_mask;

#define IB_SA_COMP_MASK(n) ((__force ib_sa_comp_mask) cpu_to_be64(1ull << n))
--- linux/drivers/infiniband/ulp/ipoib/ipoib_main.c (revision 1410)
+++ linux/drivers/infiniband/ulp/ipoib/ipoib_main.c (revision 1411)
@@ -283,21 +283,21 @@
skb_queue_head_init(&skqueue);

if (!status) {
- /*
- * For now we set static_rate to 0. This is not
- * really correct: we should look at the rate
- * component of the path member record, compare it
- * with the rate of our local port (calculated from
- * the active link speed and link width) and set an
- * inter-packet delay appropriately.
- */
struct ib_ah_attr av = {
.dlid = be16_to_cpu(pathrec->dlid),
.sl = pathrec->sl,
- .static_rate = 0,
.port_num = priv->port
};

+ if (ib_sa_rate_enum_to_int(pathrec->rate) > 0)
+ av.static_rate = (2 * priv->local_rate -
+ ib_sa_rate_enum_to_int(pathrec->rate) - 1) /
+ (priv->local_rate ? priv->local_rate : 1);
+
+ ipoib_dbg(priv, "static_rate %d for local port %dX, path %dX\n",
+ av.static_rate, priv->local_rate,
+ ib_sa_rate_enum_to_int(pathrec->rate));
+
ah = ipoib_create_ah(dev, priv->pd, &av);
}

--- linux/drivers/infiniband/ulp/ipoib/ipoib_multicast.c (revision 1410)
+++ linux/drivers/infiniband/ulp/ipoib/ipoib_multicast.c (revision 1411)
@@ -238,19 +238,10 @@
}

{
- /*
- * For now we set static_rate to 0. This is not
- * really correct: we should look at the rate
- * component of the MC member record, compare it with
- * the rate of our local port (calculated from the
- * active link speed and link width) and set an
- * inter-packet delay appropriately.
- */
struct ib_ah_attr av = {
.dlid = be16_to_cpu(mcast->mcmember.mlid),
.port_num = priv->port,
.sl = mcast->mcmember.sl,
- .static_rate = 0,
.ah_flags = IB_AH_GRH,
.grh = {
.flow_label = be32_to_cpu(mcast->mcmember.flow_label),
@@ -262,6 +253,15 @@

av.grh.dgid = mcast->mcmember.mgid;

+ if (ib_sa_rate_enum_to_int(mcast->mcmember.rate) > 0)
+ av.static_rate = (2 * priv->local_rate -
+ ib_sa_rate_enum_to_int(mcast->mcmember.rate) - 1) /
+ (priv->local_rate ? priv->local_rate : 1);
+
+ ipoib_dbg_mcast(priv, "static_rate %d for local port %dX, mcmember %dX\n",
+ av.static_rate, priv->local_rate,
+ ib_sa_rate_enum_to_int(mcast->mcmember.rate));
+
mcast->ah = ipoib_create_ah(dev, priv->pd, &av);
if (!mcast->ah) {
ipoib_warn(priv, "ib_address_create failed\n");
@@ -506,6 +506,17 @@
else
memcpy(priv->dev->dev_addr + 4, priv->local_gid.raw, sizeof (union ib_gid));

+ {
+ struct ib_port_attr attr;
+
+ if (!ib_query_port(priv->ca, priv->port, &attr)) {
+ priv->local_lid = attr.lid;
+ priv->local_rate = attr.active_speed *
+ ib_width_enum_to_int(attr.active_width);
+ } else
+ ipoib_warn(priv, "ib_query_port failed\n");
+ }
+
if (!priv->broadcast) {
priv->broadcast = ipoib_mcast_alloc(dev, 1);
if (!priv->broadcast) {
@@ -554,15 +565,6 @@
return;
}

- {
- struct ib_port_attr attr;
-
- if (!ib_query_port(priv->ca, priv->port, &attr))
- priv->local_lid = attr.lid;
- else
- ipoib_warn(priv, "ib_query_port failed\n");
- }
-
priv->mcast_mtu = ib_mtu_enum_to_int(priv->broadcast->mcmember.mtu) -
IPOIB_ENCAP_LEN;
dev->mtu = min(priv->mcast_mtu, priv->admin_mtu);
--- linux/drivers/infiniband/ulp/ipoib/ipoib.h (revision 1410)
+++ linux/drivers/infiniband/ulp/ipoib/ipoib.h (revision 1411)
@@ -143,6 +143,7 @@

union ib_gid local_gid;
u16 local_lid;
+ u8 local_rate;

unsigned int admin_mtu;
unsigned int mcast_mtu;
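The static_rate arithmetic in this patch can be tried in isolation. A minimal sketch, assuming rates are in units of 2.5 Gb/s (the values ib_sa_rate_enum_to_int() returns, with local_rate = active_speed * width from ib_query_port()); the function name here is made up for the illustration:

```c
#include <assert.h>

/*
 * Sketch of the patch's static-rate computation.  A non-positive path
 * rate means "unknown" (no inter-packet delay), and the divisor is
 * guarded against a zero local rate exactly as in the patch.
 */
static int ipoib_static_rate(int local_rate, int path_rate)
{
	if (path_rate <= 0)
		return 0;
	return (2 * local_rate - path_rate - 1) /
	       (local_rate ? local_rate : 1);
}
```

For a path rate equal to the local port rate this rounds down to 0 (no delay); a slower path yields a positive delay value.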

2005-01-13 09:45:48

by Michael S. Tsirkin

Subject: Re: [openib-general] [PATCH][5/18] InfiniBand/mthca: add needed rmb() in event queue poll

Hello!
Quoting r. Roland Dreier ([email protected]) "[openib-general] [PATCH][5/18] InfiniBand/mthca: add needed rmb() in event queue poll":
> Add an rmb() between checking the ownership bit of an event queue
> entry and reading the contents of the EQE. Without this barrier, the
> CPU could read stale contents of the EQE before HW writes the EQE but
> have the read of the ownership bit reordered until after HW finishes
> writing, which leads to the driver processing an incorrect event. This
> was actually observed to happen when multiple completion queues are in
> heavy use on an IBM JS20 PowerPC 970 system.
>
> Also explain the existing rmb() in completion queue poll (there for
> the same reason) and slightly improve debugging output.
>
> Signed-off-by: Roland Dreier <[email protected]>
>
> --- linux/drivers/infiniband/hw/mthca/mthca_cq.c (revision 1437)
> +++ linux/drivers/infiniband/hw/mthca/mthca_cq.c (revision 1439)
> @@ -1,5 +1,5 @@
> /*
> - * Copyright (c) 2004 Topspin Communications. All rights reserved.
> + * Copyright (c) 2004, 2005 Topspin Communications. All rights reserved.
> *
> * This software is available to you under a choice of one of two
> * licenses. You may choose to be licensed under the terms of the GNU
> @@ -391,6 +391,10 @@
> if (!next_cqe_sw(cq))
> return -EAGAIN;
>
> + /*
> + * Make sure we read CQ entry contents after we've checked the
> + * ownership bit.
> + */
> rmb();
>
> cqe = get_cqe(cq, cq->cons_index);
> @@ -768,7 +772,8 @@
> u32 *ctx = MAILBOX_ALIGN(mailbox);
> int j;
>
> - printk(KERN_ERR "context for CQN %x\n", cq->cqn);
> + printk(KERN_ERR "context for CQN %x (cons index %x, next sw %d)\n",
> + cq->cqn, cq->cons_index, next_cqe_sw(cq));
> for (j = 0; j < 16; ++j)
> printk(KERN_ERR "[%2x] %08x\n", j * 4, be32_to_cpu(ctx[j]));
> }
> --- linux/drivers/infiniband/hw/mthca/mthca_eq.c (revision 1437)
> +++ linux/drivers/infiniband/hw/mthca/mthca_eq.c (revision 1439)
> @@ -1,5 +1,5 @@
> /*
> - * Copyright (c) 2004 Topspin Communications. All rights reserved.
> + * Copyright (c) 2004, 2005 Topspin Communications. All rights reserved.
> *
> * This software is available to you under a choice of one of two
> * licenses. You may choose to be licensed under the terms of the GNU
> @@ -240,6 +240,12 @@
> int set_ci = 0;
> eqe = get_eqe(eq, eq->cons_index);
>
> + /*
> + * Make sure we read EQ entry contents after we've
> + * checked the ownership bit.
> + */
> + rmb();
> +
> switch (eqe->type) {
> case MTHCA_EVENT_TYPE_COMP:
> disarm_cqn = be32_to_cpu(eqe->event.comp.cqn) & 0xffffff;

Since we are using the eqe here, it seems that read_barrier_depends()
should be sufficient (as in the CQ case)?

However, I see that read_barrier_depends() is a no-op on ppc, and the
comment indicates that problems were seen on a PPC 970.
What gives? Do I misunderstand what a dependency is?

MST

2005-01-13 15:36:15

by Roland Dreier

Subject: Re: [openib-general] [PATCH][5/18] InfiniBand/mthca: add needed rmb() in event queue poll

Michael> Since we are using the eqe here, it seems that
Michael> read_barrier_depends() should be sufficient (as in
Michael> the CQ case)?

Michael> However, I see that read_barrier_depends() is a no-op on
Michael> ppc, and the comment indicates that problems were seen on
Michael> a PPC 970. What gives? Do I misunderstand what a
Michael> dependency is?

There is no dependency between the EQE ownership field and the rest of
the EQE, so read_barrier_depends() is not sufficient. I think you are
misunderstanding what a dependency is. The comments in
asm-i386/system.h or http://lse.sourceforge.net/locking/wmbdd.html may
help clear things up.

- R.
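The distinction can be made concrete with a small sketch. In C11-atomics terms, the kernel's rmb() here corresponds to acquire ordering on the ownership check: the payload fields of the EQE have no data dependency on the ownership flag, so a dependency-only barrier cannot order the reads. This is an illustrative userspace model, not the mthca driver's actual structures:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/*
 * Illustrative EQE: the ownership flag and the payload are separate
 * fields with no data dependency between them, which is why
 * read_barrier_depends() is not enough and a real read barrier
 * (modeled here as an acquire load) is required.
 */
struct eqe {
	atomic_uint owner;   /* hardware flips this when the entry is valid */
	uint32_t    type;    /* payload written by hardware before owner */
};

/*
 * Returns the event type, or -1 if the entry is still hardware-owned.
 * memory_order_acquire plays the role of the patch's rmb(): the read
 * of ->type cannot be reordered before the ownership check.
 */
static int poll_eqe(struct eqe *e)
{
	if (atomic_load_explicit(&e->owner, memory_order_acquire) == 0)
		return -1;              /* still owned by hardware */
	return (int)e->type;            /* safe: ordered after the check */
}
```

With only a dependency barrier (or none), the CPU is free to speculate the load of ->type before the owner load completes, which is exactly the stale-EQE read observed on the PPC 970.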