2013-08-02 17:36:17

by Jon Mason

Subject: [PATCH 00/15] NTB: Bug Fixes and New Features

A fairly major update for NTB. Numerous fixes and new features,
including support for NTB-RP and the use of DMA engines to
transmit/receive data. Reviews are appreciated!

Thanks,
Jon

MAINTAINERS | 2 +
drivers/ntb/Kconfig | 2 +-
drivers/ntb/ntb_hw.c | 483 ++++++++++++++++++++++++++++++++++---------
drivers/ntb/ntb_hw.h | 105 ++++++++--
drivers/ntb/ntb_regs.h | 47 +++--
drivers/ntb/ntb_transport.c | 382 ++++++++++++++++++++++++++--------
6 files changed, 802 insertions(+), 219 deletions(-)


2013-08-02 17:36:30

by Jon Mason

Subject: [PATCH 10/15] NTB: Rename Variables for NTB-RP

Many variable names in the NTB driver refer to the primary or secondary
side. However, these variables will be used to access the reverse case
when in NTB-RP mode. Make these names more generic in anticipation of
NTB-RP support.
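
For reference, the renames boil down to the following mapping (a summary
of the diff below, moving from primary/secondary to local/remote naming;
no new code is introduced):

	/* old name                new name                                 */
	reg_ofs.pdb          ->    reg_ofs.ldb        /* local doorbell     */
	reg_ofs.pdb_mask     ->    reg_ofs.ldb_mask   /* local doorbell mask */
	reg_ofs.sdb          ->    reg_ofs.rdb        /* remote doorbell    */
	reg_ofs.sbar2_xlat   ->    reg_ofs.bar2_xlat
	reg_ofs.sbar4_xlat   ->    reg_ofs.bar4_xlat
	ntb_ring_sdb()       ->    ntb_ring_doorbell()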

Signed-off-by: Jon Mason <[email protected]>
---
drivers/ntb/ntb_hw.c | 83 +++++++++++++++++++++----------------------
drivers/ntb/ntb_hw.h | 12 +++----
drivers/ntb/ntb_transport.c | 2 +-
3 files changed, 48 insertions(+), 49 deletions(-)

diff --git a/drivers/ntb/ntb_hw.c b/drivers/ntb/ntb_hw.c
index 014222c..e7cbb44 100644
--- a/drivers/ntb/ntb_hw.c
+++ b/drivers/ntb/ntb_hw.c
@@ -167,9 +167,9 @@ int ntb_register_db_callback(struct ntb_device *ndev, unsigned int idx,
ndev->db_cb[idx].data = data;

/* unmask interrupt */
- mask = readw(ndev->reg_ofs.pdb_mask);
+ mask = readw(ndev->reg_ofs.ldb_mask);
clear_bit(idx * ndev->bits_per_vector, &mask);
- writew(mask, ndev->reg_ofs.pdb_mask);
+ writew(mask, ndev->reg_ofs.ldb_mask);

return 0;
}
@@ -189,9 +189,9 @@ void ntb_unregister_db_callback(struct ntb_device *ndev, unsigned int idx)
if (idx >= ndev->max_cbs || !ndev->db_cb[idx].callback)
return;

- mask = readw(ndev->reg_ofs.pdb_mask);
+ mask = readw(ndev->reg_ofs.ldb_mask);
set_bit(idx * ndev->bits_per_vector, &mask);
- writew(mask, ndev->reg_ofs.pdb_mask);
+ writew(mask, ndev->reg_ofs.ldb_mask);

ndev->db_cb[idx].callback = NULL;
}
@@ -423,16 +423,16 @@ void ntb_set_mw_addr(struct ntb_device *ndev, unsigned int mw, u64 addr)

switch (MW_TO_BAR(mw)) {
case NTB_BAR_23:
- writeq(addr, ndev->reg_ofs.sbar2_xlat);
+ writeq(addr, ndev->reg_ofs.bar2_xlat);
break;
case NTB_BAR_45:
- writeq(addr, ndev->reg_ofs.sbar4_xlat);
+ writeq(addr, ndev->reg_ofs.bar4_xlat);
break;
}
}

/**
- * ntb_ring_sdb() - Set the doorbell on the secondary/external side
+ * ntb_ring_doorbell() - Set the doorbell on the secondary/external side
* @ndev: pointer to ntb_device instance
* @db: doorbell to ring
*
@@ -441,15 +441,15 @@ void ntb_set_mw_addr(struct ntb_device *ndev, unsigned int mw, u64 addr)
*
* RETURNS: An appropriate -ERRNO error value on error, or zero for success.
*/
-void ntb_ring_sdb(struct ntb_device *ndev, unsigned int db)
+void ntb_ring_doorbell(struct ntb_device *ndev, unsigned int db)
{
dev_dbg(&ndev->pdev->dev, "%s: ringing doorbell %d\n", __func__, db);

if (ndev->hw_type == BWD_HW)
- writeq((u64) 1 << db, ndev->reg_ofs.sdb);
+ writeq((u64) 1 << db, ndev->reg_ofs.rdb);
else
writew(((1 << ndev->bits_per_vector) - 1) <<
- (db * ndev->bits_per_vector), ndev->reg_ofs.sdb);
+ (db * ndev->bits_per_vector), ndev->reg_ofs.rdb);
}

static void bwd_recover_link(struct ntb_device *ndev)
@@ -665,10 +665,10 @@ static int ntb_xeon_setup(struct ntb_device *ndev)
else
ndev->dev_type = NTB_DEV_DSD;

- ndev->reg_ofs.pdb = ndev->reg_base + SNB_PDOORBELL_OFFSET;
- ndev->reg_ofs.pdb_mask = ndev->reg_base + SNB_PDBMSK_OFFSET;
- ndev->reg_ofs.sbar2_xlat = ndev->reg_base + SNB_SBAR2XLAT_OFFSET;
- ndev->reg_ofs.sbar4_xlat = ndev->reg_base + SNB_SBAR4XLAT_OFFSET;
+ ndev->reg_ofs.ldb = ndev->reg_base + SNB_PDOORBELL_OFFSET;
+ ndev->reg_ofs.ldb_mask = ndev->reg_base + SNB_PDBMSK_OFFSET;
+ ndev->reg_ofs.bar2_xlat = ndev->reg_base + SNB_SBAR2XLAT_OFFSET;
+ ndev->reg_ofs.bar4_xlat = ndev->reg_base + SNB_SBAR4XLAT_OFFSET;
ndev->reg_ofs.lnk_cntl = ndev->reg_base + SNB_NTBCNTL_OFFSET;
ndev->reg_ofs.lnk_stat = ndev->reg_base + SNB_LINK_STATUS_OFFSET;
ndev->reg_ofs.spad_read = ndev->reg_base + SNB_SPAD_OFFSET;
@@ -687,7 +687,7 @@ static int ntb_xeon_setup(struct ntb_device *ndev)
ndev->limits.max_mw = SNB_ERRATA_MAX_MW;
ndev->reg_ofs.spad_write = ndev->mw[1].vbase +
SNB_SPAD_OFFSET;
- ndev->reg_ofs.sdb = ndev->mw[1].vbase +
+ ndev->reg_ofs.rdb = ndev->mw[1].vbase +
SNB_PDOORBELL_OFFSET;

/* Set the Limit register to 4k, the minimum size, to
@@ -699,7 +699,7 @@ static int ntb_xeon_setup(struct ntb_device *ndev)
ndev->limits.max_mw = SNB_MAX_MW;
ndev->reg_ofs.spad_write = ndev->reg_base +
SNB_B2B_SPAD_OFFSET;
- ndev->reg_ofs.sdb = ndev->reg_base +
+ ndev->reg_ofs.rdb = ndev->reg_base +
SNB_B2B_DOORBELL_OFFSET;

/* Disable the Limit register, just incase it is set to
@@ -791,21 +791,21 @@ static int ntb_bwd_setup(struct ntb_device *ndev)
if (rc)
return rc;

- ndev->reg_ofs.pdb = ndev->reg_base + BWD_PDOORBELL_OFFSET;
- ndev->reg_ofs.pdb_mask = ndev->reg_base + BWD_PDBMSK_OFFSET;
- ndev->reg_ofs.sbar2_xlat = ndev->reg_base + BWD_SBAR2XLAT_OFFSET;
- ndev->reg_ofs.sbar4_xlat = ndev->reg_base + BWD_SBAR4XLAT_OFFSET;
+ ndev->reg_ofs.ldb = ndev->reg_base + BWD_PDOORBELL_OFFSET;
+ ndev->reg_ofs.ldb_mask = ndev->reg_base + BWD_PDBMSK_OFFSET;
+ ndev->reg_ofs.bar2_xlat = ndev->reg_base + BWD_SBAR2XLAT_OFFSET;
+ ndev->reg_ofs.bar4_xlat = ndev->reg_base + BWD_SBAR4XLAT_OFFSET;
ndev->reg_ofs.lnk_cntl = ndev->reg_base + BWD_NTBCNTL_OFFSET;
ndev->reg_ofs.lnk_stat = ndev->reg_base + BWD_LINK_STATUS_OFFSET;
ndev->reg_ofs.spad_read = ndev->reg_base + BWD_SPAD_OFFSET;
ndev->reg_ofs.spci_cmd = ndev->reg_base + BWD_PCICMD_OFFSET;

if (ndev->conn_type == NTB_CONN_B2B) {
- ndev->reg_ofs.sdb = ndev->reg_base + BWD_B2B_DOORBELL_OFFSET;
+ ndev->reg_ofs.rdb = ndev->reg_base + BWD_B2B_DOORBELL_OFFSET;
ndev->reg_ofs.spad_write = ndev->reg_base + BWD_B2B_SPAD_OFFSET;
ndev->limits.max_spads = BWD_MAX_SPADS;
} else {
- ndev->reg_ofs.sdb = ndev->reg_base + BWD_PDOORBELL_OFFSET;
+ ndev->reg_ofs.rdb = ndev->reg_base + BWD_PDOORBELL_OFFSET;
ndev->reg_ofs.spad_write = ndev->reg_base + BWD_SPAD_OFFSET;
ndev->limits.max_spads = BWD_MAX_COMPAT_SPADS;
}
@@ -885,7 +885,7 @@ static irqreturn_t bwd_callback_msix_irq(int irq, void *data)
*/
ndev->last_ts = jiffies;

- writeq((u64) 1 << db_cb->db_num, ndev->reg_ofs.pdb);
+ writeq((u64) 1 << db_cb->db_num, ndev->reg_ofs.ldb);

return IRQ_HANDLED;
}
@@ -907,7 +907,7 @@ static irqreturn_t xeon_callback_msix_irq(int irq, void *data)
* interrupts.
*/
writew(((1 << ndev->bits_per_vector) - 1) <<
- (db_cb->db_num * ndev->bits_per_vector), ndev->reg_ofs.pdb);
+ (db_cb->db_num * ndev->bits_per_vector), ndev->reg_ofs.ldb);

return IRQ_HANDLED;
}
@@ -925,7 +925,7 @@ static irqreturn_t xeon_event_msix_irq(int irq, void *dev)
dev_err(&ndev->pdev->dev, "Error determining link status\n");

/* bit 15 is always the link bit */
- writew(1 << ndev->limits.max_db_bits, ndev->reg_ofs.pdb);
+ writew(1 << ndev->limits.max_db_bits, ndev->reg_ofs.ldb);

return IRQ_HANDLED;
}
@@ -936,29 +936,28 @@ static irqreturn_t ntb_interrupt(int irq, void *dev)
unsigned int i = 0;

if (ndev->hw_type == BWD_HW) {
- u64 pdb = readq(ndev->reg_ofs.pdb);
+ u64 ldb = readq(ndev->reg_ofs.ldb);

- dev_dbg(&ndev->pdev->dev, "irq %d - pdb = %Lx\n", irq, pdb);
+ dev_dbg(&ndev->pdev->dev, "irq %d - ldb = %Lx\n", irq, ldb);

- while (pdb) {
- i = __ffs(pdb);
- pdb &= pdb - 1;
+ while (ldb) {
+ i = __ffs(ldb);
+ ldb &= ldb - 1;
bwd_callback_msix_irq(irq, &ndev->db_cb[i]);
}
} else {
- u16 pdb = readw(ndev->reg_ofs.pdb);
+ u16 ldb = readw(ndev->reg_ofs.ldb);

- dev_dbg(&ndev->pdev->dev, "irq %d - pdb = %x sdb %x\n", irq,
- pdb, readw(ndev->reg_ofs.sdb));
+ dev_dbg(&ndev->pdev->dev, "irq %d - ldb = %x\n", irq, ldb);

- if (pdb & SNB_DB_HW_LINK) {
+ if (ldb & SNB_DB_HW_LINK) {
xeon_event_msix_irq(irq, dev);
- pdb &= ~SNB_DB_HW_LINK;
+ ldb &= ~SNB_DB_HW_LINK;
}

- while (pdb) {
- i = __ffs(pdb);
- pdb &= pdb - 1;
+ while (ldb) {
+ i = __ffs(ldb);
+ ldb &= ldb - 1;
xeon_callback_msix_irq(irq, &ndev->db_cb[i]);
}
}
@@ -1116,10 +1115,10 @@ static int ntb_setup_interrupts(struct ntb_device *ndev)
* Interrupt. The rest will be unmasked as callbacks are registered.
*/
if (ndev->hw_type == BWD_HW)
- writeq(~0, ndev->reg_ofs.pdb_mask);
+ writeq(~0, ndev->reg_ofs.ldb_mask);
else
writew(~(1 << ndev->limits.max_db_bits),
- ndev->reg_ofs.pdb_mask);
+ ndev->reg_ofs.ldb_mask);

rc = ntb_setup_msix(ndev);
if (!rc)
@@ -1148,9 +1147,9 @@ static void ntb_free_interrupts(struct ntb_device *ndev)

/* mask interrupts */
if (ndev->hw_type == BWD_HW)
- writeq(~0, ndev->reg_ofs.pdb_mask);
+ writeq(~0, ndev->reg_ofs.ldb_mask);
else
- writew(~0, ndev->reg_ofs.pdb_mask);
+ writew(~0, ndev->reg_ofs.ldb_mask);

if (ndev->num_msix) {
struct msix_entry *msix;
diff --git a/drivers/ntb/ntb_hw.h b/drivers/ntb/ntb_hw.h
index ab5f768..4b69f2e 100644
--- a/drivers/ntb/ntb_hw.h
+++ b/drivers/ntb/ntb_hw.h
@@ -124,11 +124,11 @@ struct ntb_device {
unsigned char msix_cnt;
} limits;
struct {
- void __iomem *pdb;
- void __iomem *pdb_mask;
- void __iomem *sdb;
- void __iomem *sbar2_xlat;
- void __iomem *sbar4_xlat;
+ void __iomem *ldb;
+ void __iomem *ldb_mask;
+ void __iomem *rdb;
+ void __iomem *bar2_xlat;
+ void __iomem *bar4_xlat;
void __iomem *spad_write;
void __iomem *spad_read;
void __iomem *lnk_cntl;
@@ -243,7 +243,7 @@ int ntb_read_remote_spad(struct ntb_device *ndev, unsigned int idx, u32 *val);
resource_size_t ntb_get_mw_base(struct ntb_device *ndev, unsigned int mw);
void __iomem *ntb_get_mw_vbase(struct ntb_device *ndev, unsigned int mw);
u64 ntb_get_mw_size(struct ntb_device *ndev, unsigned int mw);
-void ntb_ring_sdb(struct ntb_device *ndev, unsigned int idx);
+void ntb_ring_doorbell(struct ntb_device *ndev, unsigned int idx);
void *ntb_find_transport(struct pci_dev *pdev);

int ntb_transport_init(struct pci_dev *pdev);
diff --git a/drivers/ntb/ntb_transport.c b/drivers/ntb/ntb_transport.c
index 73a35e4..f206842 100644
--- a/drivers/ntb/ntb_transport.c
+++ b/drivers/ntb/ntb_transport.c
@@ -1197,7 +1197,7 @@ static void ntb_tx_copy_callback(void *data)
wmb();
iowrite32(entry->flags | DESC_DONE_FLAG, &hdr->flags);

- ntb_ring_sdb(qp->ndev, qp->qp_num);
+ ntb_ring_doorbell(qp->ndev, qp->qp_num);

/* The entry length can only be zero if the packet is intended to be a
* "link down" or similar. Since no payload is being sent in these
--
1.7.9.5

2013-08-02 17:36:39

by Jon Mason

Subject: [PATCH 13/15] NTB: Comment Fix

Add "data" ntb_register_db_callback parameter description comment and
correct poor spelling.

Signed-off-by: Jon Mason <[email protected]>
---
drivers/ntb/ntb_hw.c | 5 +++--
drivers/ntb/ntb_transport.c | 6 +++---
2 files changed, 6 insertions(+), 5 deletions(-)

diff --git a/drivers/ntb/ntb_hw.c b/drivers/ntb/ntb_hw.c
index 8153bea..95603f8 100644
--- a/drivers/ntb/ntb_hw.c
+++ b/drivers/ntb/ntb_hw.c
@@ -145,6 +145,7 @@ void ntb_unregister_event_callback(struct ntb_device *ndev)
* ntb_register_db_callback() - register a callback for doorbell interrupt
* @ndev: pointer to ntb_device instance
* @idx: doorbell index to register callback, zero based
+ * @data: pointer to be returned to caller with every callback
* @func: callback function to register
*
* This function registers a callback function for the doorbell interrupt
@@ -1222,9 +1223,9 @@ static int ntb_create_callbacks(struct ntb_device *ndev)
{
int i;

- /* Checken-egg issue. We won't know how many callbacks are necessary
+ /* Chicken-egg issue. We won't know how many callbacks are necessary
* until we see how many MSI-X vectors we get, but these pointers need
- * to be passed into the MSI-X register fucntion. So, we allocate the
+ * to be passed into the MSI-X register function. So, we allocate the
* max, knowing that they might not all be used, to work around this.
*/
ndev->db_cb = kcalloc(ndev->limits.max_db_bits,
diff --git a/drivers/ntb/ntb_transport.c b/drivers/ntb/ntb_transport.c
index f206842..3b5941e 100644
--- a/drivers/ntb/ntb_transport.c
+++ b/drivers/ntb/ntb_transport.c
@@ -1000,7 +1000,7 @@ static void ntb_rx_copy_callback(void *data)
hdr->flags = 0;

/* If the callbacks come out of order, the writing of the index to the
- * last completed will be out of order. This may result in the the
+ * last completed will be out of order. This may result in the
* receive stalling forever.
*/
iowrite32(entry->index, &qp->rx_info->entry);
@@ -1551,7 +1551,7 @@ EXPORT_SYMBOL_GPL(ntb_transport_rx_enqueue);
* @len: length of the data buffer
*
* Enqueue a new transmit buffer onto the transport queue from which a NTB
- * payload will be transmitted. This assumes that a lock is behing held to
+ * payload will be transmitted. This assumes that a lock is being held to
* serialize access to the qp.
*
* RETURNS: An appropriate -ERRNO error value on error, or zero for success.
@@ -1609,7 +1609,7 @@ EXPORT_SYMBOL_GPL(ntb_transport_link_up);
*
* Notify NTB transport layer of client's desire to no longer receive data on
* transport queue specified. It is the client's responsibility to ensure all
- * entries on queue are purged or otherwise handled appropraitely.
+ * entries on queue are purged or otherwise handled appropriately.
*/
void ntb_transport_link_down(struct ntb_transport_qp *qp)
{
--
1.7.9.5

2013-08-02 17:36:51

by Jon Mason

Subject: [PATCH 12/15] NTB: Remove References of non-B2B BWD HW

NTB-RP is not a supported configuration on BWD hardware. Remove the
code attempting to set it up.

Signed-off-by: Jon Mason <[email protected]>
---
drivers/ntb/ntb_hw.c | 16 ++++------------
drivers/ntb/ntb_regs.h | 1 -
2 files changed, 4 insertions(+), 13 deletions(-)

diff --git a/drivers/ntb/ntb_hw.c b/drivers/ntb/ntb_hw.c
index 371b65e..8153bea 100644
--- a/drivers/ntb/ntb_hw.c
+++ b/drivers/ntb/ntb_hw.c
@@ -829,7 +829,7 @@ static int ntb_bwd_setup(struct ntb_device *ndev)
break;
case NTB_CONN_RP:
default:
- dev_err(&ndev->pdev->dev, "Only B2B supported at this time\n");
+ dev_err(&ndev->pdev->dev, "Unsupported NTB configuration\n");
return -EINVAL;
}

@@ -846,24 +846,16 @@ static int ntb_bwd_setup(struct ntb_device *ndev)

ndev->reg_ofs.ldb = ndev->reg_base + BWD_PDOORBELL_OFFSET;
ndev->reg_ofs.ldb_mask = ndev->reg_base + BWD_PDBMSK_OFFSET;
+ ndev->reg_ofs.rdb = ndev->reg_base + BWD_B2B_DOORBELL_OFFSET;
ndev->reg_ofs.bar2_xlat = ndev->reg_base + BWD_SBAR2XLAT_OFFSET;
ndev->reg_ofs.bar4_xlat = ndev->reg_base + BWD_SBAR4XLAT_OFFSET;
ndev->reg_ofs.lnk_cntl = ndev->reg_base + BWD_NTBCNTL_OFFSET;
ndev->reg_ofs.lnk_stat = ndev->reg_base + BWD_LINK_STATUS_OFFSET;
ndev->reg_ofs.spad_read = ndev->reg_base + BWD_SPAD_OFFSET;
+ ndev->reg_ofs.spad_write = ndev->reg_base + BWD_B2B_SPAD_OFFSET;
ndev->reg_ofs.spci_cmd = ndev->reg_base + BWD_PCICMD_OFFSET;
-
- if (ndev->conn_type == NTB_CONN_B2B) {
- ndev->reg_ofs.rdb = ndev->reg_base + BWD_B2B_DOORBELL_OFFSET;
- ndev->reg_ofs.spad_write = ndev->reg_base + BWD_B2B_SPAD_OFFSET;
- ndev->limits.max_spads = BWD_MAX_SPADS;
- } else {
- ndev->reg_ofs.rdb = ndev->reg_base + BWD_PDOORBELL_OFFSET;
- ndev->reg_ofs.spad_write = ndev->reg_base + BWD_SPAD_OFFSET;
- ndev->limits.max_spads = BWD_MAX_COMPAT_SPADS;
- }
-
ndev->limits.max_mw = BWD_MAX_MW;
+ ndev->limits.max_spads = BWD_MAX_SPADS;
ndev->limits.max_db_bits = BWD_MAX_DB_BITS;
ndev->limits.msix_cnt = BWD_MSIX_CNT;
ndev->bits_per_vector = BWD_DB_BITS_PER_VEC;
diff --git a/drivers/ntb/ntb_regs.h b/drivers/ntb/ntb_regs.h
index 910d5e9..57aefbb 100644
--- a/drivers/ntb/ntb_regs.h
+++ b/drivers/ntb/ntb_regs.h
@@ -103,7 +103,6 @@

#define BWD_MSIX_CNT 34
#define BWD_MAX_SPADS 16
-#define BWD_MAX_COMPAT_SPADS 16
#define BWD_MAX_DB_BITS 34
#define BWD_DB_BITS_PER_VEC 1
#define BWD_MAX_MW 2
--
1.7.9.5

2013-08-02 17:36:44

by Jon Mason

Subject: [PATCH 07/15] NTB: Update Device IDs

Add support for new Intel NTB devices on upcoming Xeon hardware. Since
the Xeon hardware design is already in place in the driver, all that is
needed is the new device IDs.

Remove the device IDs for NTB devs running in Transparent Bridge mode,
as this driver is not being used for those devices.

Rename the device IDs for NTB devs running in NTB-RP mode to better
identify their usage model: "PS" denotes the primary side of the NTB,
and "SS" denotes the secondary side. The primary side is the interface
exposed to the local system, and the secondary side is the interface
exposed to the remote system.

Signed-off-by: Jon Mason <[email protected]>
---
drivers/ntb/ntb_hw.c | 29 ++++++++++++++++++++---------
drivers/ntb/ntb_hw.h | 15 ++++++++++-----
2 files changed, 30 insertions(+), 14 deletions(-)

diff --git a/drivers/ntb/ntb_hw.c b/drivers/ntb/ntb_hw.c
index d259d41..fc6b19d 100644
--- a/drivers/ntb/ntb_hw.c
+++ b/drivers/ntb/ntb_hw.c
@@ -94,11 +94,17 @@ struct dentry *debugfs_dir;
static DEFINE_PCI_DEVICE_TABLE(ntb_pci_tbl) = {
{PCI_VDEVICE(INTEL, PCI_DEVICE_ID_INTEL_NTB_B2B_BWD)},
{PCI_VDEVICE(INTEL, PCI_DEVICE_ID_INTEL_NTB_B2B_JSF)},
- {PCI_VDEVICE(INTEL, PCI_DEVICE_ID_INTEL_NTB_CLASSIC_JSF)},
- {PCI_VDEVICE(INTEL, PCI_DEVICE_ID_INTEL_NTB_RP_JSF)},
- {PCI_VDEVICE(INTEL, PCI_DEVICE_ID_INTEL_NTB_RP_SNB)},
{PCI_VDEVICE(INTEL, PCI_DEVICE_ID_INTEL_NTB_B2B_SNB)},
- {PCI_VDEVICE(INTEL, PCI_DEVICE_ID_INTEL_NTB_CLASSIC_SNB)},
+ {PCI_VDEVICE(INTEL, PCI_DEVICE_ID_INTEL_NTB_B2B_IVT)},
+ {PCI_VDEVICE(INTEL, PCI_DEVICE_ID_INTEL_NTB_B2B_HSX)},
+ {PCI_VDEVICE(INTEL, PCI_DEVICE_ID_INTEL_NTB_PS_JSF)},
+ {PCI_VDEVICE(INTEL, PCI_DEVICE_ID_INTEL_NTB_PS_SNB)},
+ {PCI_VDEVICE(INTEL, PCI_DEVICE_ID_INTEL_NTB_PS_IVT)},
+ {PCI_VDEVICE(INTEL, PCI_DEVICE_ID_INTEL_NTB_PS_HSX)},
+ {PCI_VDEVICE(INTEL, PCI_DEVICE_ID_INTEL_NTB_SS_JSF)},
+ {PCI_VDEVICE(INTEL, PCI_DEVICE_ID_INTEL_NTB_SS_SNB)},
+ {PCI_VDEVICE(INTEL, PCI_DEVICE_ID_INTEL_NTB_SS_IVT)},
+ {PCI_VDEVICE(INTEL, PCI_DEVICE_ID_INTEL_NTB_SS_HSX)},
{0}
};
MODULE_DEVICE_TABLE(pci, ntb_pci_tbl);
@@ -805,13 +811,18 @@ static int ntb_device_setup(struct ntb_device *ndev)
int rc;

switch (ndev->pdev->device) {
- case PCI_DEVICE_ID_INTEL_NTB_2ND_SNB:
- case PCI_DEVICE_ID_INTEL_NTB_RP_JSF:
- case PCI_DEVICE_ID_INTEL_NTB_RP_SNB:
- case PCI_DEVICE_ID_INTEL_NTB_CLASSIC_JSF:
- case PCI_DEVICE_ID_INTEL_NTB_CLASSIC_SNB:
+ case PCI_DEVICE_ID_INTEL_NTB_SS_JSF:
+ case PCI_DEVICE_ID_INTEL_NTB_SS_SNB:
+ case PCI_DEVICE_ID_INTEL_NTB_SS_IVT:
+ case PCI_DEVICE_ID_INTEL_NTB_SS_HSX:
+ case PCI_DEVICE_ID_INTEL_NTB_PS_JSF:
+ case PCI_DEVICE_ID_INTEL_NTB_PS_SNB:
+ case PCI_DEVICE_ID_INTEL_NTB_PS_IVT:
+ case PCI_DEVICE_ID_INTEL_NTB_PS_HSX:
case PCI_DEVICE_ID_INTEL_NTB_B2B_JSF:
case PCI_DEVICE_ID_INTEL_NTB_B2B_SNB:
+ case PCI_DEVICE_ID_INTEL_NTB_B2B_IVT:
+ case PCI_DEVICE_ID_INTEL_NTB_B2B_HSX:
rc = ntb_xeon_setup(ndev);
break;
case PCI_DEVICE_ID_INTEL_NTB_B2B_BWD:
diff --git a/drivers/ntb/ntb_hw.h b/drivers/ntb/ntb_hw.h
index 3a15d49..591d4ff 100644
--- a/drivers/ntb/ntb_hw.h
+++ b/drivers/ntb/ntb_hw.h
@@ -47,12 +47,17 @@
*/

#define PCI_DEVICE_ID_INTEL_NTB_B2B_JSF 0x3725
-#define PCI_DEVICE_ID_INTEL_NTB_CLASSIC_JSF 0x3726
-#define PCI_DEVICE_ID_INTEL_NTB_RP_JSF 0x3727
-#define PCI_DEVICE_ID_INTEL_NTB_RP_SNB 0x3C08
+#define PCI_DEVICE_ID_INTEL_NTB_PS_JSF 0x3726
+#define PCI_DEVICE_ID_INTEL_NTB_SS_JSF 0x3727
#define PCI_DEVICE_ID_INTEL_NTB_B2B_SNB 0x3C0D
-#define PCI_DEVICE_ID_INTEL_NTB_CLASSIC_SNB 0x3C0E
-#define PCI_DEVICE_ID_INTEL_NTB_2ND_SNB 0x3C0F
+#define PCI_DEVICE_ID_INTEL_NTB_PS_SNB 0x3C0E
+#define PCI_DEVICE_ID_INTEL_NTB_SS_SNB 0x3C0F
+#define PCI_DEVICE_ID_INTEL_NTB_B2B_IVT 0x0E0D
+#define PCI_DEVICE_ID_INTEL_NTB_PS_IVT 0x0E0E
+#define PCI_DEVICE_ID_INTEL_NTB_SS_IVT 0x0E0F
+#define PCI_DEVICE_ID_INTEL_NTB_B2B_HSX 0x2F0D
+#define PCI_DEVICE_ID_INTEL_NTB_PS_HSX 0x2F0E
+#define PCI_DEVICE_ID_INTEL_NTB_SS_HSX 0x2F0F
#define PCI_DEVICE_ID_INTEL_NTB_B2B_BWD 0x0C4E

#define msix_table_size(control) ((control & PCI_MSIX_FLAGS_QSIZE)+1)
--
1.7.9.5

2013-08-02 17:37:01

by Jon Mason

Subject: [PATCH 03/15] NTB: Correct USD/DSD Identification

Due to ambiguous documentation, the USD/DSD identification is backward
when compared to the setting in BIOS. Correct the bits to match the
BIOS setting.

Signed-off-by: Jon Mason <[email protected]>
---
drivers/ntb/ntb_hw.c | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/drivers/ntb/ntb_hw.c b/drivers/ntb/ntb_hw.c
index 3b0ab50..26b808b 100644
--- a/drivers/ntb/ntb_hw.c
+++ b/drivers/ntb/ntb_hw.c
@@ -531,9 +531,9 @@ static int ntb_xeon_setup(struct ntb_device *ndev)
}

if (val & SNB_PPD_DEV_TYPE)
- ndev->dev_type = NTB_DEV_DSD;
- else
ndev->dev_type = NTB_DEV_USD;
+ else
+ ndev->dev_type = NTB_DEV_DSD;

ndev->reg_ofs.pdb = ndev->reg_base + SNB_PDOORBELL_OFFSET;
ndev->reg_ofs.pdb_mask = ndev->reg_base + SNB_PDBMSK_OFFSET;
@@ -647,6 +647,9 @@ static int ntb_device_setup(struct ntb_device *ndev)
if (rc)
return rc;

+ dev_info(&ndev->pdev->dev, "Device Type = %s\n",
+ ndev->dev_type == NTB_DEV_USD ? "USD/DSP" : "DSD/USP");
+
/* Enable Bus Master and Memory Space on the secondary side */
writew(PCI_COMMAND_MEMORY | PCI_COMMAND_MASTER, ndev->reg_ofs.spci_cmd);

--
1.7.9.5

2013-08-02 17:36:59

by Jon Mason

Subject: [PATCH 05/15] NTB: Xeon Errata Workaround

There is a Xeon hardware errata related to writes to SDOORBELL or
B2BDOORBELL in conjunction with inbound access to NTB MMIO Space, which
may hang the system. To work around this issue, use one of the memory
windows to access the interrupt and scratch pad registers on the remote
system. This bypasses the issue, but removes one of the memory windows
from use by the transport. This reduction of MWs necessitates adding
some logic to determine the number of available MWs.

Since some NTB usage models may have unidirectional traffic, the
ability to disable the workaround via a module parameter has been added.
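
For example, assuming the driver is built as the "ntb" module, the
workaround could be disabled at load time with something like:

	modprobe ntb xeon_errata_workaround=0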

See BF113 in
http://www.intel.com/content/dam/www/public/us/en/documents/specification-updates/xeon-c5500-c3500-spec-update.pdf
See BT119 in
http://www.intel.com/content/dam/www/public/us/en/documents/specification-updates/xeon-e5-family-spec-update.pdf

Signed-off-by: Jon Mason <[email protected]>
---
drivers/ntb/ntb_hw.c | 101 +++++++++++++++++++++++++++++++++++++------
drivers/ntb/ntb_hw.h | 39 ++++++++++++++---
drivers/ntb/ntb_regs.h | 22 ++++++----
drivers/ntb/ntb_transport.c | 78 ++++++++++++++++++++-------------
4 files changed, 182 insertions(+), 58 deletions(-)

diff --git a/drivers/ntb/ntb_hw.c b/drivers/ntb/ntb_hw.c
index 97d6704..674bc09 100644
--- a/drivers/ntb/ntb_hw.c
+++ b/drivers/ntb/ntb_hw.c
@@ -62,6 +62,10 @@ MODULE_VERSION(NTB_VER);
MODULE_LICENSE("Dual BSD/GPL");
MODULE_AUTHOR("Intel Corporation");

+static bool xeon_errata_workaround = true;
+module_param(xeon_errata_workaround, bool, 0644);
+MODULE_PARM_DESC(xeon_errata_workaround, "Workaround for the Xeon Errata");
+
enum {
NTB_CONN_CLASSIC = 0,
NTB_CONN_B2B,
@@ -81,7 +85,7 @@ enum {
struct dentry *debugfs_dir;

/* Translate memory window 0,1 to BAR 2,4 */
-#define MW_TO_BAR(mw) (mw * 2 + 2)
+#define MW_TO_BAR(mw) (mw * NTB_MAX_NUM_MW + 2)

static DEFINE_PCI_DEVICE_TABLE(ntb_pci_tbl) = {
{PCI_VDEVICE(INTEL, PCI_DEVICE_ID_INTEL_NTB_B2B_BWD)},
@@ -347,8 +351,8 @@ int ntb_read_remote_spad(struct ntb_device *ndev, unsigned int idx, u32 *val)
*/
void __iomem *ntb_get_mw_vbase(struct ntb_device *ndev, unsigned int mw)
{
- if (mw >= NTB_NUM_MW)
- return NULL;
+ if (mw >= ntb_max_mw(ndev))
+ return 0;

return ndev->mw[mw].vbase;
}
@@ -364,7 +368,7 @@ void __iomem *ntb_get_mw_vbase(struct ntb_device *ndev, unsigned int mw)
*/
resource_size_t ntb_get_mw_size(struct ntb_device *ndev, unsigned int mw)
{
- if (mw >= NTB_NUM_MW)
+ if (mw >= ntb_max_mw(ndev))
return 0;

return ndev->mw[mw].bar_sz;
@@ -382,7 +386,7 @@ resource_size_t ntb_get_mw_size(struct ntb_device *ndev, unsigned int mw)
*/
void ntb_set_mw_addr(struct ntb_device *ndev, unsigned int mw, u64 addr)
{
- if (mw >= NTB_NUM_MW)
+ if (mw >= ntb_max_mw(ndev))
return;

dev_dbg(&ndev->pdev->dev, "Writing addr %Lx to BAR %d\n", addr,
@@ -546,16 +550,84 @@ static int ntb_xeon_setup(struct ntb_device *ndev)
ndev->reg_ofs.spad_read = ndev->reg_base + SNB_SPAD_OFFSET;
ndev->reg_ofs.spci_cmd = ndev->reg_base + SNB_PCICMD_OFFSET;

- if (ndev->conn_type == NTB_CONN_B2B) {
- ndev->reg_ofs.sdb = ndev->reg_base + SNB_B2B_DOORBELL_OFFSET;
- ndev->reg_ofs.spad_write = ndev->reg_base + SNB_B2B_SPAD_OFFSET;
- ndev->limits.max_spads = SNB_MAX_B2B_SPADS;
+ /* There is a Xeon hardware errata related to writes to
+ * SDOORBELL or B2BDOORBELL in conjunction with inbound access
+ * to NTB MMIO Space, which may hang the system. To workaround
+ * this use the second memory window to access the interrupt and
+ * scratch pad registers on the remote system.
+ */
+ if (xeon_errata_workaround) {
+ if (!ndev->mw[1].bar_sz)
+ return -EINVAL;
+
+ ndev->limits.max_mw = SNB_ERRATA_MAX_MW;
+ ndev->reg_ofs.spad_write = ndev->mw[1].vbase +
+ SNB_SPAD_OFFSET;
+ ndev->reg_ofs.sdb = ndev->mw[1].vbase +
+ SNB_PDOORBELL_OFFSET;
+
+ /* Set the Limit register to 4k, the minimum size, to
+ * prevent an illegal access
+ */
+ writeq(ndev->mw[1].bar_sz + 0x1000, ndev->reg_base +
+ SNB_PBAR4LMT_OFFSET);
+ } else {
+ ndev->limits.max_mw = SNB_MAX_MW;
+ ndev->reg_ofs.spad_write = ndev->reg_base +
+ SNB_B2B_SPAD_OFFSET;
+ ndev->reg_ofs.sdb = ndev->reg_base +
+ SNB_B2B_DOORBELL_OFFSET;
+
+ /* Disable the Limit register, just incase it is set to
+ * something silly
+ */
+ writeq(0, ndev->reg_base + SNB_PBAR4LMT_OFFSET);
+ }
+
+ /* The Xeon errata workaround requires setting SBAR Base
+ * addresses to known values, so that the PBAR XLAT can be
+ * pointed at SBAR0 of the remote system.
+ */
+ if (ndev->dev_type == NTB_DEV_USD) {
+ writeq(SNB_MBAR23_DSD_ADDR, ndev->reg_base +
+ SNB_PBAR2XLAT_OFFSET);
+ if (xeon_errata_workaround)
+ writeq(SNB_MBAR01_DSD_ADDR, ndev->reg_base +
+ SNB_PBAR4XLAT_OFFSET);
+ else {
+ writeq(SNB_MBAR45_DSD_ADDR, ndev->reg_base +
+ SNB_PBAR4XLAT_OFFSET);
+ writeq(SNB_MBAR01_DSD_ADDR, ndev->reg_base +
+ SNB_B2B_XLAT_OFFSET);
+ }
+
+ writeq(SNB_MBAR01_USD_ADDR, ndev->reg_base +
+ SNB_SBAR0BASE_OFFSET);
+ writeq(SNB_MBAR23_USD_ADDR, ndev->reg_base +
+ SNB_SBAR2BASE_OFFSET);
+ writeq(SNB_MBAR45_USD_ADDR, ndev->reg_base +
+ SNB_SBAR4BASE_OFFSET);
} else {
- ndev->reg_ofs.sdb = ndev->reg_base + SNB_SDOORBELL_OFFSET;
- ndev->reg_ofs.spad_write = ndev->reg_base + SNB_SPAD_OFFSET;
- ndev->limits.max_spads = SNB_MAX_COMPAT_SPADS;
+ writeq(SNB_MBAR23_USD_ADDR, ndev->reg_base +
+ SNB_PBAR2XLAT_OFFSET);
+ if (xeon_errata_workaround)
+ writeq(SNB_MBAR01_USD_ADDR, ndev->reg_base +
+ SNB_PBAR4XLAT_OFFSET);
+ else {
+ writeq(SNB_MBAR45_USD_ADDR, ndev->reg_base +
+ SNB_PBAR4XLAT_OFFSET);
+ writeq(SNB_MBAR01_USD_ADDR, ndev->reg_base +
+ SNB_B2B_XLAT_OFFSET);
+ }
+ writeq(SNB_MBAR01_DSD_ADDR, ndev->reg_base +
+ SNB_SBAR0BASE_OFFSET);
+ writeq(SNB_MBAR23_DSD_ADDR, ndev->reg_base +
+ SNB_SBAR2BASE_OFFSET);
+ writeq(SNB_MBAR45_DSD_ADDR, ndev->reg_base +
+ SNB_SBAR4BASE_OFFSET);
}

+ ndev->limits.max_spads = SNB_MAX_B2B_SPADS;
ndev->limits.max_db_bits = SNB_MAX_DB_BITS;
ndev->limits.msix_cnt = SNB_MSIX_CNT;
ndev->bits_per_vector = SNB_DB_BITS_PER_VEC;
@@ -614,6 +686,7 @@ static int ntb_bwd_setup(struct ntb_device *ndev)
ndev->limits.max_spads = BWD_MAX_COMPAT_SPADS;
}

+ ndev->limits.max_mw = BWD_MAX_MW;
ndev->limits.max_db_bits = BWD_MAX_DB_BITS;
ndev->limits.msix_cnt = BWD_MSIX_CNT;
ndev->bits_per_vector = BWD_DB_BITS_PER_VEC;
@@ -1053,7 +1126,7 @@ static int ntb_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
goto err2;
}

- for (i = 0; i < NTB_NUM_MW; i++) {
+ for (i = 0; i < NTB_MAX_NUM_MW; i++) {
ndev->mw[i].bar_sz = pci_resource_len(pdev, MW_TO_BAR(i));
ndev->mw[i].vbase =
ioremap_wc(pci_resource_start(pdev, MW_TO_BAR(i)),
@@ -1155,7 +1228,7 @@ static void ntb_pci_remove(struct pci_dev *pdev)
ntb_free_callbacks(ndev);
ntb_device_free(ndev);

- for (i = 0; i < NTB_NUM_MW; i++)
+ for (i = 0; i < NTB_MAX_NUM_MW; i++)
iounmap(ndev->mw[i].vbase);

iounmap(ndev->reg_base);
diff --git a/drivers/ntb/ntb_hw.h b/drivers/ntb/ntb_hw.h
index 6a4f56f..72fcb22 100644
--- a/drivers/ntb/ntb_hw.h
+++ b/drivers/ntb/ntb_hw.h
@@ -68,7 +68,7 @@

#define NTB_HB_TIMEOUT msecs_to_jiffies(1000)

-#define NTB_NUM_MW 2
+#define NTB_MAX_NUM_MW 2

enum ntb_hw_event {
NTB_EVENT_SW_EVENT0 = 0,
@@ -96,11 +96,12 @@ struct ntb_device {
struct pci_dev *pdev;
struct msix_entry *msix_entries;
void __iomem *reg_base;
- struct ntb_mw mw[NTB_NUM_MW];
+ struct ntb_mw mw[NTB_MAX_NUM_MW];
struct {
- unsigned int max_spads;
- unsigned int max_db_bits;
- unsigned int msix_cnt;
+ unsigned char max_mw;
+ unsigned char max_spads;
+ unsigned char max_db_bits;
+ unsigned char msix_cnt;
} limits;
struct {
void __iomem *pdb;
@@ -132,6 +133,32 @@ struct ntb_device {
};

/**
+ * ntb_max_cbs() - return the max callbacks
+ * @ndev: pointer to ntb_device instance
+ *
+ * Given the ntb pointer, return the maximum number of callbacks
+ *
+ * RETURNS: the maximum number of callbacks
+ */
+static inline unsigned char ntb_max_cbs(struct ntb_device *ndev)
+{
+ return ndev->max_cbs;
+}
+
+/**
+ * ntb_max_mw() - return the max number of memory windows
+ * @ndev: pointer to ntb_device instance
+ *
+ * Given the ntb pointer, return the maximum number of memory windows
+ *
+ * RETURNS: the maximum number of memory windows
+ */
+static inline unsigned char ntb_max_mw(struct ntb_device *ndev)
+{
+ return ndev->limits.max_mw;
+}
+
+/**
* ntb_hw_link_status() - return the hardware link status
* @ndev: pointer to ntb_device instance
*
@@ -148,7 +175,7 @@ static inline bool ntb_hw_link_status(struct ntb_device *ndev)
* ntb_query_pdev() - return the pci_dev pointer
* @ndev: pointer to ntb_device instance
*
- * Given the ntb pointer return the pci_dev pointerfor the NTB hardware device
+ * Given the ntb pointer, return the pci_dev pointer for the NTB hardware device
*
* RETURNS: a pointer to the ntb pci_dev
*/
diff --git a/drivers/ntb/ntb_regs.h b/drivers/ntb/ntb_regs.h
index 96209b4..4fe135a 100644
--- a/drivers/ntb/ntb_regs.h
+++ b/drivers/ntb/ntb_regs.h
@@ -58,6 +58,8 @@
/* Reserve the uppermost bit for link interrupt */
#define SNB_MAX_DB_BITS 15
#define SNB_DB_BITS_PER_VEC 5
+#define SNB_MAX_MW 2
+#define SNB_ERRATA_MAX_MW 1

#define SNB_DB_HW_LINK 0x8000

@@ -74,6 +76,9 @@
#define SNB_SBAR2XLAT_OFFSET 0x0030
#define SNB_SBAR4XLAT_OFFSET 0x0038
#define SNB_SBAR0BASE_OFFSET 0x0040
+#define SNB_SBAR0BASE_OFFSET 0x0040
+#define SNB_SBAR2BASE_OFFSET 0x0048
+#define SNB_SBAR4BASE_OFFSET 0x0050
#define SNB_SBAR2BASE_OFFSET 0x0048
#define SNB_SBAR4BASE_OFFSET 0x0050
#define SNB_NTBCNTL_OFFSET 0x0058
@@ -90,11 +95,19 @@
#define SNB_B2B_DOORBELL_OFFSET 0x0140
#define SNB_B2B_XLAT_OFFSET 0x0144

+#define SNB_MBAR01_USD_ADDR 0x000000000000000CULL
+#define SNB_MBAR23_USD_ADDR 0x000000410000000CULL
+#define SNB_MBAR45_USD_ADDR 0x000000810000000CULL
+#define SNB_MBAR01_DSD_ADDR 0x000000000000000CULL
+#define SNB_MBAR23_DSD_ADDR 0x000000400000000CULL
+#define SNB_MBAR45_DSD_ADDR 0x000000800000000CULL
+
#define BWD_MSIX_CNT 34
#define BWD_MAX_SPADS 16
#define BWD_MAX_COMPAT_SPADS 16
#define BWD_MAX_DB_BITS 34
#define BWD_DB_BITS_PER_VEC 1
+#define BWD_MAX_MW 2

#define BWD_PCICMD_OFFSET 0xb004
#define BWD_MBAR23_OFFSET 0xb018
@@ -128,12 +141,3 @@
#define BWD_PPD_INIT_LINK 0x0008
#define BWD_PPD_CONN_TYPE 0x0300
#define BWD_PPD_DEV_TYPE 0x1000
-
-#define BWD_PBAR2XLAT_USD_ADDR 0x0000004000000000
-#define BWD_PBAR4XLAT_USD_ADDR 0x0000008000000000
-#define BWD_MBAR23_USD_ADDR 0x000000410000000C
-#define BWD_MBAR45_USD_ADDR 0x000000810000000C
-#define BWD_PBAR2XLAT_DSD_ADDR 0x0000004100000000
-#define BWD_PBAR4XLAT_DSD_ADDR 0x0000008100000000
-#define BWD_MBAR23_DSD_ADDR 0x000000400000000C
-#define BWD_MBAR45_DSD_ADDR 0x000000800000000C
diff --git a/drivers/ntb/ntb_transport.c b/drivers/ntb/ntb_transport.c
index c308915..f7380e9 100644
--- a/drivers/ntb/ntb_transport.c
+++ b/drivers/ntb/ntb_transport.c
@@ -64,7 +64,7 @@ static unsigned int transport_mtu = 0x401E;
module_param(transport_mtu, uint, 0644);
MODULE_PARM_DESC(transport_mtu, "Maximum size of NTB transport packets");

-static unsigned char max_num_clients = 2;
+static unsigned char max_num_clients;
module_param(max_num_clients, byte, 0644);
MODULE_PARM_DESC(max_num_clients, "Maximum number of NTB transport clients");

@@ -150,7 +150,7 @@ struct ntb_transport {
struct list_head client_devs;

struct ntb_device *ndev;
- struct ntb_transport_mw mw[NTB_NUM_MW];
+ struct ntb_transport_mw *mw;
struct ntb_transport_qp *qps;
unsigned int max_qps;
unsigned long qp_bitmap;
@@ -182,7 +182,7 @@ enum {
MAX_SPAD,
};

-#define QP_TO_MW(qp) ((qp) % NTB_NUM_MW)
+#define QP_TO_MW(ndev, qp) ((qp) % ntb_max_mw(ndev))
#define NTB_QP_DEF_NUM_ENTRIES 100
#define NTB_LINK_DOWN_TIMEOUT 10

@@ -474,19 +474,22 @@ static void ntb_transport_setup_qp_mw(struct ntb_transport *nt,
{
struct ntb_transport_qp *qp = &nt->qps[qp_num];
unsigned int rx_size, num_qps_mw;
- u8 mw_num = QP_TO_MW(qp_num);
+ u8 mw_num, mw_max;
unsigned int i;

+ mw_max = ntb_max_mw(nt->ndev);
+ mw_num = QP_TO_MW(nt->ndev, qp_num);
+
WARN_ON(nt->mw[mw_num].virt_addr == NULL);

- if (nt->max_qps % NTB_NUM_MW && mw_num < nt->max_qps % NTB_NUM_MW)
- num_qps_mw = nt->max_qps / NTB_NUM_MW + 1;
+ if (nt->max_qps % mw_max && mw_num < nt->max_qps % mw_max)
+ num_qps_mw = nt->max_qps / mw_max + 1;
else
- num_qps_mw = nt->max_qps / NTB_NUM_MW;
+ num_qps_mw = nt->max_qps / mw_max;

rx_size = (unsigned int) nt->mw[mw_num].size / num_qps_mw;
qp->remote_rx_info = nt->mw[mw_num].virt_addr +
- (qp_num / NTB_NUM_MW * rx_size);
+ (qp_num / mw_max * rx_size);
rx_size -= sizeof(struct ntb_rx_info);

qp->rx_buff = qp->remote_rx_info + 1;
@@ -630,7 +633,7 @@ static void ntb_transport_link_work(struct work_struct *work)
int rc, i;

/* send the local info, in the opposite order of the way we read it */
- for (i = 0; i < NTB_NUM_MW; i++) {
+ for (i = 0; i < ntb_max_mw(ndev); i++) {
rc = ntb_write_remote_spad(ndev, MW0_SZ_HIGH + (i * 2),
ntb_get_mw_size(ndev, i) >> 32);
if (rc) {
@@ -650,10 +653,10 @@ static void ntb_transport_link_work(struct work_struct *work)
}
}

- rc = ntb_write_remote_spad(ndev, NUM_MWS, NTB_NUM_MW);
+ rc = ntb_write_remote_spad(ndev, NUM_MWS, ntb_max_mw(ndev));
if (rc) {
dev_err(&pdev->dev, "Error writing %x to remote spad %d\n",
- NTB_NUM_MW, NUM_MWS);
+ ntb_max_mw(ndev), NUM_MWS);
goto out;
}

@@ -698,11 +701,11 @@ static void ntb_transport_link_work(struct work_struct *work)
goto out;
}

- if (val != NTB_NUM_MW)
+ if (val != ntb_max_mw(ndev))
goto out;
dev_dbg(&pdev->dev, "Remote number of mws = %d\n", val);

- for (i = 0; i < NTB_NUM_MW; i++) {
+ for (i = 0; i < ntb_max_mw(ndev); i++) {
u64 val64;

rc = ntb_read_remote_spad(ndev, MW0_SZ_HIGH + (i * 2), &val);
@@ -744,7 +747,7 @@ static void ntb_transport_link_work(struct work_struct *work)
return;

out1:
- for (i = 0; i < NTB_NUM_MW; i++)
+ for (i = 0; i < ntb_max_mw(ndev); i++)
ntb_free_mw(nt, i);
out:
if (ntb_hw_link_status(ndev))
@@ -798,7 +801,10 @@ static void ntb_transport_init_queue(struct ntb_transport *nt,
{
struct ntb_transport_qp *qp;
unsigned int num_qps_mw, tx_size;
- u8 mw_num = QP_TO_MW(qp_num);
+ u8 mw_num, mw_max;
+
+ mw_max = ntb_max_mw(nt->ndev);
+ mw_num = QP_TO_MW(nt->ndev, qp_num);

qp = &nt->qps[qp_num];
qp->qp_num = qp_num;
@@ -808,14 +814,14 @@ static void ntb_transport_init_queue(struct ntb_transport *nt,
qp->client_ready = NTB_LINK_DOWN;
qp->event_handler = NULL;

- if (nt->max_qps % NTB_NUM_MW && mw_num < nt->max_qps % NTB_NUM_MW)
- num_qps_mw = nt->max_qps / NTB_NUM_MW + 1;
+ if (nt->max_qps % mw_max && mw_num < nt->max_qps % mw_max)
+ num_qps_mw = nt->max_qps / mw_max + 1;
else
- num_qps_mw = nt->max_qps / NTB_NUM_MW;
+ num_qps_mw = nt->max_qps / mw_max;

tx_size = (unsigned int) ntb_get_mw_size(qp->ndev, mw_num) / num_qps_mw;
qp->rx_info = ntb_get_mw_vbase(nt->ndev, mw_num) +
- (qp_num / NTB_NUM_MW * tx_size);
+ (qp_num / mw_max * tx_size);
tx_size -= sizeof(struct ntb_rx_info);

qp->tx_mw = qp->rx_info + 1;
@@ -862,13 +868,23 @@ int ntb_transport_init(struct pci_dev *pdev)
goto err;
}

- nt->max_qps = min(nt->ndev->max_cbs, max_num_clients);
+ nt->mw = kcalloc(ntb_max_mw(nt->ndev), sizeof(struct ntb_transport_mw),
+ GFP_KERNEL);
+ if (!nt->mw) {
+ rc = -ENOMEM;
+ goto err1;
+ }
+
+ if (max_num_clients)
+ nt->max_qps = min(ntb_max_cbs(nt->ndev), max_num_clients);
+ else
+ nt->max_qps = min(ntb_max_cbs(nt->ndev), ntb_max_mw(nt->ndev));

nt->qps = kcalloc(nt->max_qps, sizeof(struct ntb_transport_qp),
GFP_KERNEL);
if (!nt->qps) {
rc = -ENOMEM;
- goto err1;
+ goto err2;
}

nt->qp_bitmap = ((u64) 1 << nt->max_qps) - 1;
@@ -882,22 +898,24 @@ int ntb_transport_init(struct pci_dev *pdev)
rc = ntb_register_event_callback(nt->ndev,
ntb_transport_event_callback);
if (rc)
- goto err2;
+ goto err3;

INIT_LIST_HEAD(&nt->client_devs);
rc = ntb_bus_init(nt);
if (rc)
- goto err3;
+ goto err4;

if (ntb_hw_link_status(nt->ndev))
schedule_delayed_work(&nt->link_work, 0);

return 0;

-err3:
+err4:
ntb_unregister_event_callback(nt->ndev);
-err2:
+err3:
kfree(nt->qps);
+err2:
+ kfree(nt->mw);
err1:
ntb_unregister_transport(nt->ndev);
err:
@@ -908,6 +926,7 @@ err:
void ntb_transport_free(void *transport)
{
struct ntb_transport *nt = transport;
+ struct ntb_device *ndev = nt->ndev;
struct pci_dev *pdev;
int i;

@@ -924,15 +943,16 @@ void ntb_transport_free(void *transport)

cancel_delayed_work_sync(&nt->link_work);

- ntb_unregister_event_callback(nt->ndev);
+ ntb_unregister_event_callback(ndev);

- pdev = ntb_query_pdev(nt->ndev);
+ pdev = ntb_query_pdev(ndev);

- for (i = 0; i < NTB_NUM_MW; i++)
+ for (i = 0; i < ntb_max_mw(ndev); i++)
ntb_free_mw(nt, i);

kfree(nt->qps);
- ntb_unregister_transport(nt->ndev);
+ kfree(nt->mw);
+ ntb_unregister_transport(ndev);
kfree(nt);
}

--
1.7.9.5

2013-08-02 17:37:26

by Jon Mason

Subject: [PATCH 02/15] NTB: Correct Number of Scratch Pad Registers

The NTB Xeon hardware has 16 scratch pad registers and 16 back-to-back
scratch pad registers. Correct the #define to represent this and update
the variable names to reflect their usage.

Signed-off-by: Jon Mason <[email protected]>
---
drivers/ntb/ntb_hw.c | 2 +-
drivers/ntb/ntb_regs.h | 4 ++--
2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/ntb/ntb_hw.c b/drivers/ntb/ntb_hw.c
index 515099e..3b0ab50 100644
--- a/drivers/ntb/ntb_hw.c
+++ b/drivers/ntb/ntb_hw.c
@@ -547,7 +547,7 @@ static int ntb_xeon_setup(struct ntb_device *ndev)
if (ndev->conn_type == NTB_CONN_B2B) {
ndev->reg_ofs.sdb = ndev->reg_base + SNB_B2B_DOORBELL_OFFSET;
ndev->reg_ofs.spad_write = ndev->reg_base + SNB_B2B_SPAD_OFFSET;
- ndev->limits.max_spads = SNB_MAX_SPADS;
+ ndev->limits.max_spads = SNB_MAX_B2B_SPADS;
} else {
ndev->reg_ofs.sdb = ndev->reg_base + SNB_SDOORBELL_OFFSET;
ndev->reg_ofs.spad_write = ndev->reg_base + SNB_SPAD_OFFSET;
diff --git a/drivers/ntb/ntb_regs.h b/drivers/ntb/ntb_regs.h
index 5bfa8c0..96209b4 100644
--- a/drivers/ntb/ntb_regs.h
+++ b/drivers/ntb/ntb_regs.h
@@ -53,8 +53,8 @@
#define NTB_LINK_WIDTH_MASK 0x03f0

#define SNB_MSIX_CNT 4
-#define SNB_MAX_SPADS 16
-#define SNB_MAX_COMPAT_SPADS 8
+#define SNB_MAX_B2B_SPADS 16
+#define SNB_MAX_COMPAT_SPADS 16
/* Reserve the uppermost bit for link interrupt */
#define SNB_MAX_DB_BITS 15
#define SNB_DB_BITS_PER_VEC 5
--
1.7.9.5

2013-08-02 17:37:35

by Jon Mason

Subject: [PATCH 15/15] MAINTAINERS: Add Website and Git Tree for NTB

Add the website and git tree to the NTB entry in MAINTAINERS.

Signed-off-by: Jon Mason <[email protected]>
---
MAINTAINERS | 2 ++
1 file changed, 2 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index ad7e322..821f468 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -5741,6 +5741,8 @@ F: drivers/scsi/nsp32*
NTB DRIVER
M: Jon Mason <[email protected]>
S: Supported
+W: https://github.com/jonmason/ntb/wiki
+T: git git://github.com/jonmason/ntb.git
F: drivers/ntb/
F: drivers/net/ntb_netdev.c
F: include/linux/ntb.h
--
1.7.9.5

2013-08-02 17:38:05

by Jon Mason

Subject: [PATCH 09/15] NTB: Use DMA Engine to Transmit and Receive

Allocate and use a DMA engine channel to transmit and receive data over
NTB. If none is allocated, fall back to using the CPU to transfer data.
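
A condensed sketch of the transmit-side pattern (simplified from the
diff below; DMA mapping/unmapping and most error handling are trimmed,
and the variable declarations are illustrative):

	struct dma_chan *chan = qp->dma_chan;	/* from dma_find_channel(DMA_MEMCPY) */
	struct dma_async_tx_descriptor *txd;

	if (!chan || len < copy_bytes)
		goto cpu_copy;		/* no channel, or too small to be worth DMA */

	txd = chan->device->device_prep_dma_memcpy(chan, dest, src, len, flags);
	if (!txd)
		goto cpu_copy;

	txd->callback = ntb_tx_copy_callback;	/* completion marks the frame done */
	txd->callback_param = entry;

	if (dma_submit_error(dmaengine_submit(txd)))
		goto cpu_copy;

	dma_async_issue_pending(chan);
	return;

cpu_copy:
	ntb_memcpy_tx(entry, offset);		/* CPU fallback via memcpy_toio() */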

Cc: Dan Williams <[email protected]>
Cc: Vinod Koul <[email protected]>
Cc: Dave Jiang <[email protected]>
Signed-off-by: Jon Mason <[email protected]>
---
drivers/ntb/ntb_hw.c | 17 +++
drivers/ntb/ntb_hw.h | 1 +
drivers/ntb/ntb_transport.c | 285 ++++++++++++++++++++++++++++++++++++-------
3 files changed, 258 insertions(+), 45 deletions(-)

diff --git a/drivers/ntb/ntb_hw.c b/drivers/ntb/ntb_hw.c
index 1d8e551..014222c 100644
--- a/drivers/ntb/ntb_hw.c
+++ b/drivers/ntb/ntb_hw.c
@@ -350,6 +350,23 @@ int ntb_read_remote_spad(struct ntb_device *ndev, unsigned int idx, u32 *val)
}

/**
+ * ntb_get_mw_base() - get addr for the NTB memory window
+ * @ndev: pointer to ntb_device instance
+ * @mw: memory window number
+ *
+ * This function provides the base address of the memory window specified.
+ *
+ * RETURNS: address, or NULL on error.
+ */
+resource_size_t ntb_get_mw_base(struct ntb_device *ndev, unsigned int mw)
+{
+ if (mw >= ntb_max_mw(ndev))
+ return 0;
+
+ return pci_resource_start(ndev->pdev, MW_TO_BAR(mw));
+}
+
+/**
* ntb_get_mw_vbase() - get virtual addr for the NTB memory window
* @ndev: pointer to ntb_device instance
* @mw: memory window number
diff --git a/drivers/ntb/ntb_hw.h b/drivers/ntb/ntb_hw.h
index b03de80..ab5f768 100644
--- a/drivers/ntb/ntb_hw.h
+++ b/drivers/ntb/ntb_hw.h
@@ -240,6 +240,7 @@ int ntb_write_local_spad(struct ntb_device *ndev, unsigned int idx, u32 val);
int ntb_read_local_spad(struct ntb_device *ndev, unsigned int idx, u32 *val);
int ntb_write_remote_spad(struct ntb_device *ndev, unsigned int idx, u32 val);
int ntb_read_remote_spad(struct ntb_device *ndev, unsigned int idx, u32 *val);
+resource_size_t ntb_get_mw_base(struct ntb_device *ndev, unsigned int mw);
void __iomem *ntb_get_mw_vbase(struct ntb_device *ndev, unsigned int mw);
u64 ntb_get_mw_size(struct ntb_device *ndev, unsigned int mw);
void ntb_ring_sdb(struct ntb_device *ndev, unsigned int idx);
diff --git a/drivers/ntb/ntb_transport.c b/drivers/ntb/ntb_transport.c
index f7380e9..73a35e4 100644
--- a/drivers/ntb/ntb_transport.c
+++ b/drivers/ntb/ntb_transport.c
@@ -47,6 +47,7 @@
*/
#include <linux/debugfs.h>
#include <linux/delay.h>
+#include <linux/dmaengine.h>
#include <linux/dma-mapping.h>
#include <linux/errno.h>
#include <linux/export.h>
@@ -68,6 +69,10 @@ static unsigned char max_num_clients;
module_param(max_num_clients, byte, 0644);
MODULE_PARM_DESC(max_num_clients, "Maximum number of NTB transport clients");

+static unsigned int copy_bytes = 1024;
+module_param(copy_bytes, uint, 0644);
+MODULE_PARM_DESC(copy_bytes, "Threshold under which NTB will use the CPU to copy instead of DMA");
+
struct ntb_queue_entry {
/* ntb_queue list reference */
struct list_head entry;
@@ -76,6 +81,13 @@ struct ntb_queue_entry {
void *buf;
unsigned int len;
unsigned int flags;
+
+ struct ntb_transport_qp *qp;
+ union {
+ struct ntb_payload_header __iomem *tx_hdr;
+ struct ntb_payload_header *rx_hdr;
+ };
+ unsigned int index;
};

struct ntb_rx_info {
@@ -86,6 +98,7 @@ struct ntb_transport_qp {
struct ntb_transport *transport;
struct ntb_device *ndev;
void *cb_data;
+ struct dma_chan *dma_chan;

bool client_ready;
bool qp_link;
@@ -99,6 +112,7 @@ struct ntb_transport_qp {
struct list_head tx_free_q;
spinlock_t ntb_tx_free_q_lock;
void __iomem *tx_mw;
+ dma_addr_t tx_mw_raw;
unsigned int tx_index;
unsigned int tx_max_entry;
unsigned int tx_max_frame;
@@ -114,6 +128,7 @@ struct ntb_transport_qp {
unsigned int rx_index;
unsigned int rx_max_entry;
unsigned int rx_max_frame;
+ dma_cookie_t last_cookie;

void (*event_handler) (void *data, int status);
struct delayed_work link_work;
@@ -129,9 +144,14 @@ struct ntb_transport_qp {
u64 rx_err_no_buf;
u64 rx_err_oflow;
u64 rx_err_ver;
+ u64 rx_memcpy;
+ u64 rx_async;
u64 tx_bytes;
u64 tx_pkts;
u64 tx_ring_full;
+ u64 tx_err_no_buf;
+ u64 tx_memcpy;
+ u64 tx_async;
};

struct ntb_transport_mw {
@@ -381,7 +401,7 @@ static ssize_t debugfs_read(struct file *filp, char __user *ubuf, size_t count,
char *buf;
ssize_t ret, out_offset, out_count;

- out_count = 600;
+ out_count = 1000;

buf = kmalloc(out_count, GFP_KERNEL);
if (!buf)
@@ -396,6 +416,10 @@ static ssize_t debugfs_read(struct file *filp, char __user *ubuf, size_t count,
out_offset += snprintf(buf + out_offset, out_count - out_offset,
"rx_pkts - \t%llu\n", qp->rx_pkts);
out_offset += snprintf(buf + out_offset, out_count - out_offset,
+ "rx_memcpy - \t%llu\n", qp->rx_memcpy);
+ out_offset += snprintf(buf + out_offset, out_count - out_offset,
+ "rx_async - \t%llu\n", qp->rx_async);
+ out_offset += snprintf(buf + out_offset, out_count - out_offset,
"rx_ring_empty - %llu\n", qp->rx_ring_empty);
out_offset += snprintf(buf + out_offset, out_count - out_offset,
"rx_err_no_buf - %llu\n", qp->rx_err_no_buf);
@@ -415,8 +439,14 @@ static ssize_t debugfs_read(struct file *filp, char __user *ubuf, size_t count,
out_offset += snprintf(buf + out_offset, out_count - out_offset,
"tx_pkts - \t%llu\n", qp->tx_pkts);
out_offset += snprintf(buf + out_offset, out_count - out_offset,
+ "tx_memcpy - \t%llu\n", qp->tx_memcpy);
+ out_offset += snprintf(buf + out_offset, out_count - out_offset,
+ "tx_async - \t%llu\n", qp->tx_async);
+ out_offset += snprintf(buf + out_offset, out_count - out_offset,
"tx_ring_full - \t%llu\n", qp->tx_ring_full);
out_offset += snprintf(buf + out_offset, out_count - out_offset,
+ "tx_err_no_buf - %llu\n", qp->tx_err_no_buf);
+ out_offset += snprintf(buf + out_offset, out_count - out_offset,
"tx_mw - \t%p\n", qp->tx_mw);
out_offset += snprintf(buf + out_offset, out_count - out_offset,
"tx_index - \t%u\n", qp->tx_index);
@@ -488,11 +518,11 @@ static void ntb_transport_setup_qp_mw(struct ntb_transport *nt,
num_qps_mw = nt->max_qps / mw_max;

rx_size = (unsigned int) nt->mw[mw_num].size / num_qps_mw;
- qp->remote_rx_info = nt->mw[mw_num].virt_addr +
- (qp_num / mw_max * rx_size);
+ qp->rx_buff = nt->mw[mw_num].virt_addr + qp_num / mw_max * rx_size;
rx_size -= sizeof(struct ntb_rx_info);

- qp->rx_buff = qp->remote_rx_info + 1;
+ qp->remote_rx_info = qp->rx_buff + rx_size;
+
/* Due to housekeeping, there must be atleast 2 buffs */
qp->rx_max_frame = min(transport_mtu, rx_size / 2);
qp->rx_max_entry = rx_size / qp->rx_max_frame;
@@ -802,6 +832,7 @@ static void ntb_transport_init_queue(struct ntb_transport *nt,
struct ntb_transport_qp *qp;
unsigned int num_qps_mw, tx_size;
u8 mw_num, mw_max;
+ u64 qp_offset;

mw_max = ntb_max_mw(nt->ndev);
mw_num = QP_TO_MW(nt->ndev, qp_num);
@@ -820,11 +851,12 @@ static void ntb_transport_init_queue(struct ntb_transport *nt,
num_qps_mw = nt->max_qps / mw_max;

tx_size = (unsigned int) ntb_get_mw_size(qp->ndev, mw_num) / num_qps_mw;
- qp->rx_info = ntb_get_mw_vbase(nt->ndev, mw_num) +
- (qp_num / mw_max * tx_size);
+ qp_offset = qp_num / mw_max * tx_size;
+ qp->tx_mw = ntb_get_mw_vbase(nt->ndev, mw_num) + qp_offset;
+ qp->tx_mw_raw = ntb_get_mw_base(qp->ndev, mw_num) + qp_offset;
tx_size -= sizeof(struct ntb_rx_info);
+ qp->rx_info = qp->tx_mw + tx_size;

- qp->tx_mw = qp->rx_info + 1;
/* Due to housekeeping, there must be atleast 2 buffs */
qp->tx_max_frame = min(transport_mtu, tx_size / 2);
qp->tx_max_entry = tx_size / qp->tx_max_frame;
@@ -956,13 +988,22 @@ void ntb_transport_free(void *transport)
kfree(nt);
}

-static void ntb_rx_copy_task(struct ntb_transport_qp *qp,
- struct ntb_queue_entry *entry, void *offset)
+static void ntb_rx_copy_callback(void *data)
{
+ struct ntb_queue_entry *entry = data;
+ struct ntb_transport_qp *qp = entry->qp;
void *cb_data = entry->cb_data;
unsigned int len = entry->len;
+ struct ntb_payload_header *hdr = entry->rx_hdr;
+
+ wmb();
+ hdr->flags = 0;

- memcpy(entry->buf, offset, entry->len);
+ /* If the callbacks come out of order, the writing of the index to the
+ * last completed will be out of order. This may result in the the
+ * receive stalling forever.
+ */
+ iowrite32(entry->index, &qp->rx_info->entry);

ntb_list_add(&qp->ntb_rx_free_q_lock, &entry->entry, &qp->rx_free_q);

@@ -970,6 +1011,80 @@ static void ntb_rx_copy_task(struct ntb_transport_qp *qp,
qp->rx_handler(qp, qp->cb_data, cb_data, len);
}

+static void ntb_memcpy_rx(struct ntb_queue_entry *entry, void *offset)
+{
+ void *buf = entry->buf;
+ size_t len = entry->len;
+
+ memcpy(buf, offset, len);
+
+ ntb_rx_copy_callback(entry);
+}
+
+static void ntb_async_rx(struct ntb_queue_entry *entry, void *offset,
+ size_t len)
+{
+ struct dma_async_tx_descriptor *txd;
+ struct ntb_transport_qp *qp = entry->qp;
+ struct dma_chan *chan = qp->dma_chan;
+ struct dma_device *device;
+ dma_addr_t src, dest;
+ dma_cookie_t cookie;
+ void *buf = entry->buf;
+ unsigned long flags;
+
+ entry->len = len;
+
+ if (!chan)
+ goto err;
+
+ device = chan->device;
+
+ if (len < copy_bytes ||
+ !is_dma_copy_aligned(device, __pa(offset), __pa(buf), len))
+ goto err1;
+
+ dest = dma_map_single(device->dev, buf, len, DMA_FROM_DEVICE);
+ if (dma_mapping_error(device->dev, dest))
+ goto err1;
+
+ src = dma_map_single(device->dev, offset, len, DMA_TO_DEVICE);
+ if (dma_mapping_error(device->dev, src))
+ goto err2;
+
+ flags = DMA_COMPL_DEST_UNMAP_SINGLE | DMA_COMPL_SRC_UNMAP_SINGLE |
+ DMA_PREP_INTERRUPT;
+ txd = device->device_prep_dma_memcpy(chan, dest, src, len, flags);
+ if (!txd)
+ goto err3;
+
+ txd_clear_parent(txd);
+ txd_clear_next(txd);
+ txd->callback = ntb_rx_copy_callback;
+ txd->callback_param = entry;
+
+ cookie = dmaengine_submit(txd);
+ if (dma_submit_error(cookie))
+ goto err3;
+
+ qp->last_cookie = cookie;
+
+ dma_async_issue_pending(chan);
+ qp->rx_async++;
+
+ return;
+
+err3:
+ dma_unmap_single(device->dev, src, len, DMA_TO_DEVICE);
+err2:
+ dma_unmap_single(device->dev, dest, len, DMA_FROM_DEVICE);
+err1:
+ dma_sync_wait(chan, qp->last_cookie);
+err:
+ ntb_memcpy_rx(entry, offset);
+ qp->rx_memcpy++;
+}
+
static int ntb_process_rxc(struct ntb_transport_qp *qp)
{
struct ntb_payload_header *hdr;
@@ -1008,41 +1123,44 @@ static int ntb_process_rxc(struct ntb_transport_qp *qp)
if (hdr->flags & LINK_DOWN_FLAG) {
ntb_qp_link_down(qp);

- ntb_list_add(&qp->ntb_rx_pend_q_lock, &entry->entry,
- &qp->rx_pend_q);
- goto out;
+ goto err;
}

dev_dbg(&ntb_query_pdev(qp->ndev)->dev,
"rx offset %u, ver %u - %d payload received, buf size %d\n",
qp->rx_index, hdr->ver, hdr->len, entry->len);

- if (hdr->len <= entry->len) {
- entry->len = hdr->len;
- ntb_rx_copy_task(qp, entry, offset);
- } else {
- ntb_list_add(&qp->ntb_rx_pend_q_lock, &entry->entry,
- &qp->rx_pend_q);
+ qp->rx_bytes += hdr->len;
+ qp->rx_pkts++;

+ if (hdr->len > entry->len) {
qp->rx_err_oflow++;
dev_dbg(&ntb_query_pdev(qp->ndev)->dev,
"RX overflow! Wanted %d got %d\n",
hdr->len, entry->len);
+
+ goto err;
}

- qp->rx_bytes += hdr->len;
- qp->rx_pkts++;
+ entry->index = qp->rx_index;
+ entry->rx_hdr = hdr;

-out:
- /* Ensure that the data is fully copied out before clearing the flag */
- wmb();
- hdr->flags = 0;
- iowrite32(qp->rx_index, &qp->rx_info->entry);
+ ntb_async_rx(entry, offset, hdr->len);

+out:
qp->rx_index++;
qp->rx_index %= qp->rx_max_entry;

return 0;
+
+err:
+ ntb_list_add(&qp->ntb_rx_pend_q_lock, &entry->entry,
+ &qp->rx_pend_q);
+ wmb();
+ hdr->flags = 0;
+ iowrite32(qp->rx_index, &qp->rx_info->entry);
+
+ goto out;
}

static void ntb_transport_rx(unsigned long data)
@@ -1070,19 +1188,12 @@ static void ntb_transport_rxc_db(void *data, int db_num)
tasklet_schedule(&qp->rx_work);
}

-static void ntb_tx_copy_task(struct ntb_transport_qp *qp,
- struct ntb_queue_entry *entry,
- void __iomem *offset)
+static void ntb_tx_copy_callback(void *data)
{
- struct ntb_payload_header __iomem *hdr;
-
- memcpy_toio(offset, entry->buf, entry->len);
-
- hdr = offset + qp->tx_max_frame - sizeof(struct ntb_payload_header);
- iowrite32(entry->len, &hdr->len);
- iowrite32((u32) qp->tx_pkts, &hdr->ver);
+ struct ntb_queue_entry *entry = data;
+ struct ntb_transport_qp *qp = entry->qp;
+ struct ntb_payload_header __iomem *hdr = entry->tx_hdr;

- /* Ensure that the data is fully copied out before setting the flag */
wmb();
iowrite32(entry->flags | DESC_DONE_FLAG, &hdr->flags);

@@ -1103,15 +1214,79 @@ static void ntb_tx_copy_task(struct ntb_transport_qp *qp,
ntb_list_add(&qp->ntb_tx_free_q_lock, &entry->entry, &qp->tx_free_q);
}

-static int ntb_process_tx(struct ntb_transport_qp *qp,
- struct ntb_queue_entry *entry)
+static void ntb_memcpy_tx(struct ntb_queue_entry *entry, void __iomem *offset)
+{
+ memcpy_toio(offset, entry->buf, entry->len);
+
+ ntb_tx_copy_callback(entry);
+}
+
+static void ntb_async_tx(struct ntb_transport_qp *qp,
+ struct ntb_queue_entry *entry)
{
+ struct dma_async_tx_descriptor *txd;
+ struct dma_chan *chan = qp->dma_chan;
+ struct dma_device *device;
+ dma_addr_t src, dest;
+ dma_cookie_t cookie;
+ struct ntb_payload_header __iomem *hdr;
void __iomem *offset;
+ size_t len = entry->len;
+ void *buf = entry->buf;
+ unsigned long flags;

offset = qp->tx_mw + qp->tx_max_frame * qp->tx_index;
+ hdr = offset + qp->tx_max_frame - sizeof(struct ntb_payload_header);
+ entry->tx_hdr = hdr;
+
+ iowrite32(entry->len, &hdr->len);
+ iowrite32((u32) qp->tx_pkts, &hdr->ver);
+
+ if (!chan)
+ goto err;

- dev_dbg(&ntb_query_pdev(qp->ndev)->dev, "%lld - offset %p, tx %u, entry len %d flags %x buff %p\n",
- qp->tx_pkts, offset, qp->tx_index, entry->len, entry->flags,
+ device = chan->device;
+
+ dest = qp->tx_mw_raw + qp->tx_max_frame * qp->tx_index;
+
+ if (len < copy_bytes ||
+ !is_dma_copy_aligned(device, __pa(buf), dest, len))
+ goto err;
+
+ src = dma_map_single(device->dev, buf, len, DMA_TO_DEVICE);
+ if (dma_mapping_error(device->dev, src))
+ goto err;
+
+ flags = DMA_COMPL_SRC_UNMAP_SINGLE | DMA_PREP_INTERRUPT;
+ txd = device->device_prep_dma_memcpy(chan, dest, src, len, flags);
+ if (!txd)
+ goto err1;
+
+ txd_clear_parent(txd);
+ txd_clear_next(txd);
+ txd->callback = ntb_tx_copy_callback;
+ txd->callback_param = entry;
+
+ cookie = dmaengine_submit(txd);
+ if (dma_submit_error(cookie))
+ goto err1;
+
+ dma_async_issue_pending(chan);
+ qp->tx_async++;
+
+ return;
+err1:
+ dma_unmap_single(device->dev, src, len, DMA_TO_DEVICE);
+err:
+ ntb_memcpy_tx(entry, offset);
+ qp->tx_memcpy++;
+}
+
+static int ntb_process_tx(struct ntb_transport_qp *qp,
+ struct ntb_queue_entry *entry)
+{
+ dev_dbg(&ntb_query_pdev(qp->ndev)->dev, "%lld - tx %u, entry len %d flags %x buff %p\n",
+ qp->tx_pkts, qp->tx_index, entry->len, entry->flags,
entry->buf);
if (qp->tx_index == qp->remote_rx_info->entry) {
qp->tx_ring_full++;
@@ -1127,7 +1302,7 @@ static int ntb_process_tx(struct ntb_transport_qp *qp,
return 0;
}

- ntb_tx_copy_task(qp, entry, offset);
+ ntb_async_tx(qp, entry);

qp->tx_index++;
qp->tx_index %= qp->tx_max_entry;
@@ -1213,11 +1388,16 @@ ntb_transport_create_queue(void *data, struct pci_dev *pdev,
qp->tx_handler = handlers->tx_handler;
qp->event_handler = handlers->event_handler;

+ qp->dma_chan = dma_find_channel(DMA_MEMCPY);
+ if (!qp->dma_chan)
+ dev_info(&pdev->dev, "Unable to allocate private DMA channel, using CPU instead\n");
+
for (i = 0; i < NTB_QP_DEF_NUM_ENTRIES; i++) {
entry = kzalloc(sizeof(struct ntb_queue_entry), GFP_ATOMIC);
if (!entry)
goto err1;

+ entry->qp = qp;
ntb_list_add(&qp->ntb_rx_free_q_lock, &entry->entry,
&qp->rx_free_q);
}
@@ -1227,6 +1407,7 @@ ntb_transport_create_queue(void *data, struct pci_dev *pdev,
if (!entry)
goto err2;

+ entry->qp = qp;
ntb_list_add(&qp->ntb_tx_free_q_lock, &entry->entry,
&qp->tx_free_q);
}
@@ -1272,11 +1453,14 @@ void ntb_transport_free_queue(struct ntb_transport_qp *qp)

pdev = ntb_query_pdev(qp->ndev);

- cancel_delayed_work_sync(&qp->link_work);
+ if (qp->dma_chan)
+ dmaengine_terminate_all(qp->dma_chan);

ntb_unregister_db_callback(qp->ndev, qp->qp_num);
tasklet_disable(&qp->rx_work);

+ cancel_delayed_work_sync(&qp->link_work);
+
while ((entry = ntb_list_rm(&qp->ntb_rx_free_q_lock, &qp->rx_free_q)))
kfree(entry);

@@ -1382,8 +1566,10 @@ int ntb_transport_tx_enqueue(struct ntb_transport_qp *qp, void *cb, void *data,
return -EINVAL;

entry = ntb_list_rm(&qp->ntb_tx_free_q_lock, &qp->tx_free_q);
- if (!entry)
+ if (!entry) {
+ qp->tx_err_no_buf++;
return -ENOMEM;
+ }

entry->cb_data = cb;
entry->buf = data;
@@ -1499,9 +1685,18 @@ EXPORT_SYMBOL_GPL(ntb_transport_qp_num);
*/
unsigned int ntb_transport_max_size(struct ntb_transport_qp *qp)
{
+ unsigned int max;
+
if (!qp)
return 0;

- return qp->tx_max_frame - sizeof(struct ntb_payload_header);
+ if (!qp->dma_chan)
+ return qp->tx_max_frame - sizeof(struct ntb_payload_header);
+
+ /* If DMA engine usage is possible, try to find the max size for that */
+ max = qp->tx_max_frame - sizeof(struct ntb_payload_header);
+ max -= max % (1 << qp->dma_chan->device->copy_align);
+
+ return max;
}
EXPORT_SYMBOL_GPL(ntb_transport_max_size);
--
1.7.9.5

2013-08-02 17:38:39

by Jon Mason

[permalink] [raw]
Subject: [PATCH 11/15] NTB: NTB-RP support

Add support for a Non-Transparent Bridge connected to a PCI-E Root Port on
the remote system (also known as NTB-RP mode). This allows an NTB-enabled
system to be connected to a non-NTB-enabled system/slot.

Modifications to the registers and BARs/MWs on the Secondary side by the
remote system are reflected into registers on the Primary side for the
local system. Similarly, modifications of registers and BARs/MWs on the
Primary side by the local system are reflected into registers on the
Secondary side for the remote system. This allows communication between
the two sides via these registers and BARs/MWs.
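
To make the reflection concrete, here is a minimal sketch of the kind of
scratchpad exchange this enables, using the driver's existing spad
accessors (the index and value below are made up for illustration):

	u32 val;
	int rc;

	/* post a value into the peer's scratchpad space */
	rc = ntb_write_remote_spad(ndev, 0, 0x1);
	if (rc)
		return rc;

	/* read back a value the peer has written into our local scratchpad */
	rc = ntb_read_local_spad(ndev, 0, &val);
	if (rc)
		return rc;

The transport's link negotiation relies on a similar exchange.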

Note: there is no fix in NTB-RP mode for the Xeon Errata (which was already
worked around in NTB-B2B mode). Due to this limitation, NTB-RP will not
work on the Secondary side with the Xeon Errata workaround enabled. To get
around this, disable the workaround via the xeon_errata_workaround=0
module parameter. However, doing so can cause the hang described in the
errata.
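
As a concrete illustration (assuming the driver is built as the "ntb"
module), the workaround could be disabled at load time with something
like:

	modprobe ntb xeon_errata_workaround=0

or, for a built-in driver, by adding ntb.xeon_errata_workaround=0 to the
kernel command line.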

Signed-off-by: Jon Mason <[email protected]>
---
drivers/ntb/ntb_hw.c | 239 +++++++++++++++++++++++++++++-------------------
drivers/ntb/ntb_regs.h | 5 +-
2 files changed, 150 insertions(+), 94 deletions(-)

diff --git a/drivers/ntb/ntb_hw.c b/drivers/ntb/ntb_hw.c
index e7cbb44..371b65e 100644
--- a/drivers/ntb/ntb_hw.c
+++ b/drivers/ntb/ntb_hw.c
@@ -69,7 +69,7 @@ module_param(xeon_errata_workaround, bool, 0644);
MODULE_PARM_DESC(xeon_errata_workaround, "Workaround for the Xeon Errata");

enum {
- NTB_CONN_CLASSIC = 0,
+ NTB_CONN_TRANSPARENT = 0,
NTB_CONN_B2B,
NTB_CONN_RP,
};
@@ -509,7 +509,8 @@ static void ntb_link_event(struct ntb_device *ndev, int link_state)
ndev->link_status = NTB_LINK_UP;
event = NTB_EVENT_HW_LINK_UP;

- if (ndev->hw_type == BWD_HW)
+ if (ndev->hw_type == BWD_HW ||
+ ndev->conn_type == NTB_CONN_TRANSPARENT)
status = readw(ndev->reg_ofs.lnk_stat);
else {
int rc = pci_read_config_word(ndev->pdev,
@@ -649,109 +650,161 @@ static int ntb_xeon_setup(struct ntb_device *ndev)
if (rc)
return rc;

- switch (val & SNB_PPD_CONN_TYPE) {
- case NTB_CONN_B2B:
- ndev->conn_type = NTB_CONN_B2B;
- break;
- case NTB_CONN_CLASSIC:
- case NTB_CONN_RP:
- default:
- dev_err(&ndev->pdev->dev, "Only B2B supported at this time\n");
- return -EINVAL;
- }
-
if (val & SNB_PPD_DEV_TYPE)
ndev->dev_type = NTB_DEV_USD;
else
ndev->dev_type = NTB_DEV_DSD;

- ndev->reg_ofs.ldb = ndev->reg_base + SNB_PDOORBELL_OFFSET;
- ndev->reg_ofs.ldb_mask = ndev->reg_base + SNB_PDBMSK_OFFSET;
- ndev->reg_ofs.bar2_xlat = ndev->reg_base + SNB_SBAR2XLAT_OFFSET;
- ndev->reg_ofs.bar4_xlat = ndev->reg_base + SNB_SBAR4XLAT_OFFSET;
- ndev->reg_ofs.lnk_cntl = ndev->reg_base + SNB_NTBCNTL_OFFSET;
- ndev->reg_ofs.lnk_stat = ndev->reg_base + SNB_LINK_STATUS_OFFSET;
- ndev->reg_ofs.spad_read = ndev->reg_base + SNB_SPAD_OFFSET;
- ndev->reg_ofs.spci_cmd = ndev->reg_base + SNB_PCICMD_OFFSET;
-
- /* There is a Xeon hardware errata related to writes to
- * SDOORBELL or B2BDOORBELL in conjunction with inbound access
- * to NTB MMIO Space, which may hang the system. To workaround
- * this use the second memory window to access the interrupt and
- * scratch pad registers on the remote system.
- */
- if (xeon_errata_workaround) {
- if (!ndev->mw[1].bar_sz)
- return -EINVAL;
-
- ndev->limits.max_mw = SNB_ERRATA_MAX_MW;
- ndev->reg_ofs.spad_write = ndev->mw[1].vbase +
- SNB_SPAD_OFFSET;
- ndev->reg_ofs.rdb = ndev->mw[1].vbase +
- SNB_PDOORBELL_OFFSET;
-
- /* Set the Limit register to 4k, the minimum size, to
- * prevent an illegal access
+ switch (val & SNB_PPD_CONN_TYPE) {
+ case NTB_CONN_B2B:
+ dev_info(&ndev->pdev->dev, "Conn Type = B2B\n");
+ ndev->conn_type = NTB_CONN_B2B;
+ ndev->reg_ofs.ldb = ndev->reg_base + SNB_PDOORBELL_OFFSET;
+ ndev->reg_ofs.ldb_mask = ndev->reg_base + SNB_PDBMSK_OFFSET;
+ ndev->reg_ofs.spad_read = ndev->reg_base + SNB_SPAD_OFFSET;
+ ndev->reg_ofs.bar2_xlat = ndev->reg_base + SNB_SBAR2XLAT_OFFSET;
+ ndev->reg_ofs.bar4_xlat = ndev->reg_base + SNB_SBAR4XLAT_OFFSET;
+ ndev->limits.max_spads = SNB_MAX_B2B_SPADS;
+
+ /* There is a Xeon hardware errata related to writes to
+ * SDOORBELL or B2BDOORBELL in conjunction with inbound access
+ * to NTB MMIO Space, which may hang the system. To workaround
+ * this use the second memory window to access the interrupt and
+ * scratch pad registers on the remote system.
*/
- writeq(ndev->mw[1].bar_sz + 0x1000, ndev->reg_base +
- SNB_PBAR4LMT_OFFSET);
- } else {
- ndev->limits.max_mw = SNB_MAX_MW;
- ndev->reg_ofs.spad_write = ndev->reg_base +
- SNB_B2B_SPAD_OFFSET;
- ndev->reg_ofs.rdb = ndev->reg_base +
- SNB_B2B_DOORBELL_OFFSET;
+ if (xeon_errata_workaround) {
+ if (!ndev->mw[1].bar_sz)
+ return -EINVAL;
+
+ ndev->limits.max_mw = SNB_ERRATA_MAX_MW;
+ ndev->reg_ofs.spad_write = ndev->mw[1].vbase +
+ SNB_SPAD_OFFSET;
+ ndev->reg_ofs.rdb = ndev->mw[1].vbase +
+ SNB_PDOORBELL_OFFSET;
+
+ /* Set the Limit register to 4k, the minimum size, to
+ * prevent an illegal access
+ */
+ writeq(ndev->mw[1].bar_sz + 0x1000, ndev->reg_base +
+ SNB_PBAR4LMT_OFFSET);
+ } else {
+ ndev->limits.max_mw = SNB_MAX_MW;
+ ndev->reg_ofs.spad_write = ndev->reg_base +
+ SNB_B2B_SPAD_OFFSET;
+ ndev->reg_ofs.rdb = ndev->reg_base +
+ SNB_B2B_DOORBELL_OFFSET;
+
+ /* Disable the Limit register, just incase it is set to
+ * something silly
+ */
+ writeq(0, ndev->reg_base + SNB_PBAR4LMT_OFFSET);
+ }

- /* Disable the Limit register, just incase it is set to
- * something silly
+ /* The Xeon errata workaround requires setting SBAR Base
+ * addresses to known values, so that the PBAR XLAT can be
+ * pointed at SBAR0 of the remote system.
*/
- writeq(0, ndev->reg_base + SNB_PBAR4LMT_OFFSET);
- }
+ if (ndev->dev_type == NTB_DEV_USD) {
+ writeq(SNB_MBAR23_DSD_ADDR, ndev->reg_base +
+ SNB_PBAR2XLAT_OFFSET);
+ if (xeon_errata_workaround)
+ writeq(SNB_MBAR01_DSD_ADDR, ndev->reg_base +
+ SNB_PBAR4XLAT_OFFSET);
+ else {
+ writeq(SNB_MBAR45_DSD_ADDR, ndev->reg_base +
+ SNB_PBAR4XLAT_OFFSET);
+ writeq(SNB_MBAR01_DSD_ADDR, ndev->reg_base +
+ SNB_B2B_XLAT_OFFSET);
+ }

- /* The Xeon errata workaround requires setting SBAR Base
- * addresses to known values, so that the PBAR XLAT can be
- * pointed at SBAR0 of the remote system.
- */
- if (ndev->dev_type == NTB_DEV_USD) {
- writeq(SNB_MBAR23_DSD_ADDR, ndev->reg_base +
- SNB_PBAR2XLAT_OFFSET);
- if (xeon_errata_workaround)
+ writeq(SNB_MBAR01_USD_ADDR, ndev->reg_base +
+ SNB_SBAR0BASE_OFFSET);
+ writeq(SNB_MBAR23_USD_ADDR, ndev->reg_base +
+ SNB_SBAR2BASE_OFFSET);
+ writeq(SNB_MBAR45_USD_ADDR, ndev->reg_base +
+ SNB_SBAR4BASE_OFFSET);
+ } else {
+ writeq(SNB_MBAR23_USD_ADDR, ndev->reg_base +
+ SNB_PBAR2XLAT_OFFSET);
+ if (xeon_errata_workaround)
+ writeq(SNB_MBAR01_USD_ADDR, ndev->reg_base +
+ SNB_PBAR4XLAT_OFFSET);
+ else {
+ writeq(SNB_MBAR45_USD_ADDR, ndev->reg_base +
+ SNB_PBAR4XLAT_OFFSET);
+ writeq(SNB_MBAR01_USD_ADDR, ndev->reg_base +
+ SNB_B2B_XLAT_OFFSET);
+ }
writeq(SNB_MBAR01_DSD_ADDR, ndev->reg_base +
- SNB_PBAR4XLAT_OFFSET);
- else {
+ SNB_SBAR0BASE_OFFSET);
+ writeq(SNB_MBAR23_DSD_ADDR, ndev->reg_base +
+ SNB_SBAR2BASE_OFFSET);
writeq(SNB_MBAR45_DSD_ADDR, ndev->reg_base +
- SNB_PBAR4XLAT_OFFSET);
- writeq(SNB_MBAR01_DSD_ADDR, ndev->reg_base +
- SNB_B2B_XLAT_OFFSET);
+ SNB_SBAR4BASE_OFFSET);
}
+ break;
+ case NTB_CONN_RP:
+ dev_info(&ndev->pdev->dev, "Conn Type = RP\n");
+ ndev->conn_type = NTB_CONN_RP;

- writeq(SNB_MBAR01_USD_ADDR, ndev->reg_base +
- SNB_SBAR0BASE_OFFSET);
- writeq(SNB_MBAR23_USD_ADDR, ndev->reg_base +
- SNB_SBAR2BASE_OFFSET);
- writeq(SNB_MBAR45_USD_ADDR, ndev->reg_base +
- SNB_SBAR4BASE_OFFSET);
- } else {
- writeq(SNB_MBAR23_USD_ADDR, ndev->reg_base +
- SNB_PBAR2XLAT_OFFSET);
if (xeon_errata_workaround)
- writeq(SNB_MBAR01_USD_ADDR, ndev->reg_base +
- SNB_PBAR4XLAT_OFFSET);
- else {
- writeq(SNB_MBAR45_USD_ADDR, ndev->reg_base +
- SNB_PBAR4XLAT_OFFSET);
- writeq(SNB_MBAR01_USD_ADDR, ndev->reg_base +
- SNB_B2B_XLAT_OFFSET);
- }
- writeq(SNB_MBAR01_DSD_ADDR, ndev->reg_base +
- SNB_SBAR0BASE_OFFSET);
- writeq(SNB_MBAR23_DSD_ADDR, ndev->reg_base +
- SNB_SBAR2BASE_OFFSET);
- writeq(SNB_MBAR45_DSD_ADDR, ndev->reg_base +
- SNB_SBAR4BASE_OFFSET);
+ return -EINVAL;
+
+ /* Scratch pads need to have exclusive access from the primary
+ * or secondary side. Halve the num spads so that each side can
+ * have an equal amount.
+ */
+ ndev->limits.max_spads = SNB_MAX_COMPAT_SPADS / 2;
+ /* Note: The SDOORBELL is the cause of the errata. You REALLY
+ * don't want to touch it.
+ */
+ ndev->reg_ofs.rdb = ndev->reg_base + SNB_SDOORBELL_OFFSET;
+ ndev->reg_ofs.ldb = ndev->reg_base + SNB_PDOORBELL_OFFSET;
+ ndev->reg_ofs.ldb_mask = ndev->reg_base + SNB_PDBMSK_OFFSET;
+ /* Offset the start of the spads to correspond to whether it is
+ * primary or secondary
+ */
+ ndev->reg_ofs.spad_write = ndev->reg_base + SNB_SPAD_OFFSET +
+ ndev->limits.max_spads * 4;
+ ndev->reg_ofs.spad_read = ndev->reg_base + SNB_SPAD_OFFSET;
+ ndev->reg_ofs.bar2_xlat = ndev->reg_base + SNB_SBAR2XLAT_OFFSET;
+ ndev->reg_ofs.bar4_xlat = ndev->reg_base + SNB_SBAR4XLAT_OFFSET;
+ ndev->limits.max_mw = SNB_MAX_MW;
+ break;
+ case NTB_CONN_TRANSPARENT:
+ dev_info(&ndev->pdev->dev, "Conn Type = TRANSPARENT\n");
+ ndev->conn_type = NTB_CONN_TRANSPARENT;
+ /* Scratch pads need to have exclusive access from the primary
+ * or secondary side. Halve the num spads so that each side can
+ * have an equal amount.
+ */
+ ndev->limits.max_spads = SNB_MAX_COMPAT_SPADS / 2;
+ ndev->reg_ofs.rdb = ndev->reg_base + SNB_PDOORBELL_OFFSET;
+ ndev->reg_ofs.ldb = ndev->reg_base + SNB_SDOORBELL_OFFSET;
+ ndev->reg_ofs.ldb_mask = ndev->reg_base + SNB_SDBMSK_OFFSET;
+ ndev->reg_ofs.spad_write = ndev->reg_base + SNB_SPAD_OFFSET;
+ /* Offset the start of the spads to correspond to whether it is
+ * primary or secondary
+ */
+ ndev->reg_ofs.spad_read = ndev->reg_base + SNB_SPAD_OFFSET +
+ ndev->limits.max_spads * 4;
+ ndev->reg_ofs.bar2_xlat = ndev->reg_base + SNB_PBAR2XLAT_OFFSET;
+ ndev->reg_ofs.bar4_xlat = ndev->reg_base + SNB_PBAR4XLAT_OFFSET;
+
+ ndev->limits.max_mw = SNB_MAX_MW;
+ break;
+ default:
+ /* Most likely caused by the remote NTB-RP device not being
+ * configured
+ */
+ dev_err(&ndev->pdev->dev, "Unknown PPD %x\n", val);
+ return -EINVAL;
}

- ndev->limits.max_spads = SNB_MAX_B2B_SPADS;
+ ndev->reg_ofs.lnk_cntl = ndev->reg_base + SNB_NTBCNTL_OFFSET;
+ ndev->reg_ofs.lnk_stat = ndev->reg_base + SNB_SLINK_STATUS_OFFSET;
+ ndev->reg_ofs.spci_cmd = ndev->reg_base + SNB_PCICMD_OFFSET;
+
ndev->limits.max_db_bits = SNB_MAX_DB_BITS;
ndev->limits.msix_cnt = SNB_MSIX_CNT;
ndev->bits_per_vector = SNB_DB_BITS_PER_VEC;
@@ -855,8 +908,10 @@ static int ntb_device_setup(struct ntb_device *ndev)
dev_info(&ndev->pdev->dev, "Device Type = %s\n",
ndev->dev_type == NTB_DEV_USD ? "USD/DSP" : "DSD/USP");

- /* Enable Bus Master and Memory Space on the secondary side */
- writew(PCI_COMMAND_MEMORY | PCI_COMMAND_MASTER, ndev->reg_ofs.spci_cmd);
+ if (ndev->conn_type == NTB_CONN_B2B)
+ /* Enable Bus Master and Memory Space on the secondary side */
+ writew(PCI_COMMAND_MEMORY | PCI_COMMAND_MASTER,
+ ndev->reg_ofs.spci_cmd);

return 0;
}
@@ -1350,7 +1405,7 @@ static void ntb_pci_remove(struct pci_dev *pdev)

/* Bring NTB link down */
ntb_cntl = readl(ndev->reg_ofs.lnk_cntl);
- ntb_cntl |= NTB_LINK_DISABLE;
+ ntb_cntl |= NTB_CNTL_LINK_DISABLE;
writel(ntb_cntl, ndev->reg_ofs.lnk_cntl);

ntb_transport_free(ndev->ntb_transport);
diff --git a/drivers/ntb/ntb_regs.h b/drivers/ntb/ntb_regs.h
index 8f3f903..910d5e9 100644
--- a/drivers/ntb/ntb_regs.h
+++ b/drivers/ntb/ntb_regs.h
@@ -46,8 +46,6 @@
* Jon Mason <[email protected]>
*/

-#define NTB_LINK_ENABLE 0x0000
-#define NTB_LINK_DISABLE 0x0002
#define NTB_LINK_STATUS_ACTIVE 0x2000
#define NTB_LINK_SPEED_MASK 0x000f
#define NTB_LINK_WIDTH_MASK 0x03f0
@@ -65,6 +63,7 @@

#define SNB_PCICMD_OFFSET 0x0504
#define SNB_DEVCTRL_OFFSET 0x0598
+#define SNB_SLINK_STATUS_OFFSET 0x05A2
#define SNB_LINK_STATUS_OFFSET 0x01A2

#define SNB_PBAR2LMT_OFFSET 0x0000
@@ -146,6 +145,8 @@
#define BWD_LTSSMSTATEJMP_FORCEDETECT (1 << 2)
#define BWD_IBIST_ERR_OFLOW 0x7FFF7FFF

+#define NTB_CNTL_CFG_LOCK (1 << 0)
+#define NTB_CNTL_LINK_DISABLE (1 << 1)
#define NTB_CNTL_BAR23_SNOOP (1 << 2)
#define NTB_CNTL_BAR45_SNOOP (1 << 6)
#define BWD_CNTL_LINK_DOWN (1 << 16)
--
1.7.9.5

2013-08-02 17:38:38

by Jon Mason

[permalink] [raw]
Subject: [PATCH 08/15] NTB: Enable 32bit Support

Correct the issues in NTB that prevented it from working on x86_32, and
modify the Kconfig to permit it to be used in that environment as well.

Signed-off-by: Jon Mason <[email protected]>
---
drivers/ntb/Kconfig | 2 +-
drivers/ntb/ntb_hw.c | 4 ++--
drivers/ntb/ntb_hw.h | 17 ++++++++++++++++-
3 files changed, 19 insertions(+), 4 deletions(-)

diff --git a/drivers/ntb/Kconfig b/drivers/ntb/Kconfig
index 37ee649..f69df793 100644
--- a/drivers/ntb/Kconfig
+++ b/drivers/ntb/Kconfig
@@ -1,7 +1,7 @@
config NTB
tristate "Intel Non-Transparent Bridge support"
depends on PCI
- depends on X86_64
+ depends on X86
help
The PCI-E Non-transparent bridge hardware is a point-to-point PCI-E bus
connecting 2 systems. When configured, writes to the device's PCI
diff --git a/drivers/ntb/ntb_hw.c b/drivers/ntb/ntb_hw.c
index fc6b19d..1d8e551 100644
--- a/drivers/ntb/ntb_hw.c
+++ b/drivers/ntb/ntb_hw.c
@@ -376,7 +376,7 @@ void __iomem *ntb_get_mw_vbase(struct ntb_device *ndev, unsigned int mw)
*
* RETURNS: the size of the memory window or zero on error
*/
-resource_size_t ntb_get_mw_size(struct ntb_device *ndev, unsigned int mw)
+u64 ntb_get_mw_size(struct ntb_device *ndev, unsigned int mw)
{
if (mw >= ntb_max_mw(ndev))
return 0;
@@ -1247,7 +1247,7 @@ static int ntb_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
ioremap_wc(pci_resource_start(pdev, MW_TO_BAR(i)),
ndev->mw[i].bar_sz);
dev_info(&pdev->dev, "MW %d size %llu\n", i,
- pci_resource_len(pdev, MW_TO_BAR(i)));
+ (unsigned long long) ndev->mw[i].bar_sz);
if (!ndev->mw[i].vbase) {
dev_warn(&pdev->dev, "Cannot remap BAR %d\n",
MW_TO_BAR(i));
diff --git a/drivers/ntb/ntb_hw.h b/drivers/ntb/ntb_hw.h
index 591d4ff..b03de80 100644
--- a/drivers/ntb/ntb_hw.h
+++ b/drivers/ntb/ntb_hw.h
@@ -62,6 +62,21 @@

#define msix_table_size(control) ((control & PCI_MSIX_FLAGS_QSIZE)+1)

+#ifndef readq
+static inline u64 readq(void __iomem *addr)
+{
+ return readl(addr) | (((u64) readl(addr + 4)) << 32LL);
+}
+#endif
+
+#ifndef writeq
+static inline void writeq(u64 val, void __iomem *addr)
+{
+ writel(((u32) (val)), (addr));
+ writel(((u32) (val >> 32)), (addr + 4));
+}
+#endif
+
#define NTB_BAR_MMIO 0
#define NTB_BAR_23 2
#define NTB_BAR_45 4
@@ -226,7 +241,7 @@ int ntb_read_local_spad(struct ntb_device *ndev, unsigned int idx, u32 *val);
int ntb_write_remote_spad(struct ntb_device *ndev, unsigned int idx, u32 val);
int ntb_read_remote_spad(struct ntb_device *ndev, unsigned int idx, u32 *val);
void __iomem *ntb_get_mw_vbase(struct ntb_device *ndev, unsigned int mw);
-resource_size_t ntb_get_mw_size(struct ntb_device *ndev, unsigned int mw);
+u64 ntb_get_mw_size(struct ntb_device *ndev, unsigned int mw);
void ntb_ring_sdb(struct ntb_device *ndev, unsigned int idx);
void *ntb_find_transport(struct pci_dev *pdev);

--
1.7.9.5

2013-08-02 17:36:37

by Jon Mason

[permalink] [raw]
Subject: [PATCH 14/15] NTB: Update Version

Update NTB version to 1.0

Signed-off-by: Jon Mason <[email protected]>
---
drivers/ntb/ntb_hw.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/ntb/ntb_hw.c b/drivers/ntb/ntb_hw.c
index 95603f8..9dd69d8 100644
--- a/drivers/ntb/ntb_hw.c
+++ b/drivers/ntb/ntb_hw.c
@@ -57,7 +57,7 @@
#include "ntb_regs.h"

#define NTB_NAME "Intel(R) PCI-E Non-Transparent Bridge Driver"
-#define NTB_VER "0.25"
+#define NTB_VER "1.0"

MODULE_DESCRIPTION(NTB_NAME);
MODULE_VERSION(NTB_VER);
--
1.7.9.5

2013-08-02 17:39:13

by Jon Mason

[permalink] [raw]
Subject: [PATCH 01/15] NTB: Add Error Handling in ntb_device_setup

If an error is encountered in ntb_device_setup, it is possible that the
spci_cmd offset isn't populated. Writing to that offset can then result in
a NULL pointer dereference. This issue is easily encountered by running in
NTB-RP mode, as it is currently not supported and will generate an error.
To get around this issue, return if an error is encountered prior to
attempting to write to the spci_cmd offset.

Signed-off-by: Jon Mason <[email protected]>
---
drivers/ntb/ntb_hw.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/ntb/ntb_hw.c b/drivers/ntb/ntb_hw.c
index 2dacd19..515099e 100644
--- a/drivers/ntb/ntb_hw.c
+++ b/drivers/ntb/ntb_hw.c
@@ -644,10 +644,13 @@ static int ntb_device_setup(struct ntb_device *ndev)
rc = -ENODEV;
}

+ if (rc)
+ return rc;
+
/* Enable Bus Master and Memory Space on the secondary side */
writew(PCI_COMMAND_MEMORY | PCI_COMMAND_MASTER, ndev->reg_ofs.spci_cmd);

- return rc;
+ return 0;
}

static void ntb_device_free(struct ntb_device *ndev)
--
1.7.9.5

2013-08-02 17:39:14

by Jon Mason

[permalink] [raw]
Subject: [PATCH 06/15] NTB: BWD Link Recovery

The BWD NTB device will drop the link if an error is encountered on the
point-to-point PCI bridge. The link will stay down until all errors are
cleared and the link is re-established. On link down, check to see if an
error was detected; if so, do the necessary housekeeping to try to recover
from the error and re-establish the link.

There is a potential race between the 2 NTB devices recovering at the
same time. If the times are synchronized, the link will not recover and the
driver will be stuck in this loop forever. Add a random interval to the
recovery time to prevent this race.

Signed-off-by: Jon Mason <[email protected]>
---
drivers/ntb/ntb_hw.c | 110 ++++++++++++++++++++++++++++++++++++++++++++++--
drivers/ntb/ntb_hw.h | 5 +++
drivers/ntb/ntb_regs.h | 15 +++++++
3 files changed, 127 insertions(+), 3 deletions(-)

diff --git a/drivers/ntb/ntb_hw.c b/drivers/ntb/ntb_hw.c
index 674bc09..d259d41 100644
--- a/drivers/ntb/ntb_hw.c
+++ b/drivers/ntb/ntb_hw.c
@@ -46,10 +46,12 @@
* Jon Mason <[email protected]>
*/
#include <linux/debugfs.h>
+#include <linux/delay.h>
#include <linux/init.h>
#include <linux/interrupt.h>
#include <linux/module.h>
#include <linux/pci.h>
+#include <linux/random.h>
#include <linux/slab.h>
#include "ntb_hw.h"
#include "ntb_regs.h"
@@ -84,6 +86,8 @@ enum {

struct dentry *debugfs_dir;

+#define BWD_LINK_RECOVERY_TIME 500
+
/* Translate memory window 0,1 to BAR 2,4 */
#define MW_TO_BAR(mw) (mw * NTB_MAX_NUM_MW + 2)

@@ -425,6 +429,49 @@ void ntb_ring_sdb(struct ntb_device *ndev, unsigned int db)
(db * ndev->bits_per_vector), ndev->reg_ofs.sdb);
}

+static void bwd_recover_link(struct ntb_device *ndev)
+{
+ u32 status;
+
+ /* Driver resets the NTB ModPhy lanes - magic! */
+ writeb(0xe0, ndev->reg_base + BWD_MODPHY_PCSREG6);
+ writeb(0x40, ndev->reg_base + BWD_MODPHY_PCSREG4);
+ writeb(0x60, ndev->reg_base + BWD_MODPHY_PCSREG4);
+ writeb(0x60, ndev->reg_base + BWD_MODPHY_PCSREG6);
+
+ /* Driver waits 100ms to allow the NTB ModPhy to settle */
+ msleep(100);
+
+ /* Clear AER Errors, write to clear */
+ status = readl(ndev->reg_base + BWD_ERRCORSTS_OFFSET);
+ dev_dbg(&ndev->pdev->dev, "ERRCORSTS = %x\n", status);
+ status &= PCI_ERR_COR_REP_ROLL;
+ writel(status, ndev->reg_base + BWD_ERRCORSTS_OFFSET);
+
+ /* Clear unexpected electrical idle event in LTSSM, write to clear */
+ status = readl(ndev->reg_base + BWD_LTSSMERRSTS0_OFFSET);
+ dev_dbg(&ndev->pdev->dev, "LTSSMERRSTS0 = %x\n", status);
+ status |= BWD_LTSSMERRSTS0_UNEXPECTEDEI;
+ writel(status, ndev->reg_base + BWD_LTSSMERRSTS0_OFFSET);
+
+ /* Clear DeSkew Buffer error, write to clear */
+ status = readl(ndev->reg_base + BWD_DESKEWSTS_OFFSET);
+ dev_dbg(&ndev->pdev->dev, "DESKEWSTS = %x\n", status);
+ status |= BWD_DESKEWSTS_DBERR;
+ writel(status, ndev->reg_base + BWD_DESKEWSTS_OFFSET);
+
+ status = readl(ndev->reg_base + BWD_IBSTERRRCRVSTS0_OFFSET);
+ dev_dbg(&ndev->pdev->dev, "IBSTERRRCRVSTS0 = %x\n", status);
+ status &= BWD_IBIST_ERR_OFLOW;
+ writel(status, ndev->reg_base + BWD_IBSTERRRCRVSTS0_OFFSET);
+
+ /* Releases the NTB state machine to allow the link to retrain */
+ status = readl(ndev->reg_base + BWD_LTSSMSTATEJMP_OFFSET);
+ dev_dbg(&ndev->pdev->dev, "LTSSMSTATEJMP = %x\n", status);
+ status &= ~BWD_LTSSMSTATEJMP_FORCEDETECT;
+ writel(status, ndev->reg_base + BWD_LTSSMSTATEJMP_OFFSET);
+}
+
static void ntb_link_event(struct ntb_device *ndev, int link_state)
{
unsigned int event;
@@ -448,13 +495,16 @@ static void ntb_link_event(struct ntb_device *ndev, int link_state)
if (rc)
return;
}
+
+ ndev->link_width = (status & NTB_LINK_WIDTH_MASK) >> 4;
+ ndev->link_speed = (status & NTB_LINK_SPEED_MASK);
dev_info(&ndev->pdev->dev, "Link Width %d, Link Speed %d\n",
- (status & NTB_LINK_WIDTH_MASK) >> 4,
- (status & NTB_LINK_SPEED_MASK));
+ ndev->link_width, ndev->link_speed);
} else {
dev_info(&ndev->pdev->dev, "Link Down\n");
ndev->link_status = NTB_LINK_DOWN;
event = NTB_EVENT_HW_LINK_DOWN;
+ /* Don't modify link width/speed, we need it in link recovery */
}

/* notify the upper layer if we have an event change */
@@ -494,6 +544,47 @@ static int ntb_link_status(struct ntb_device *ndev)
return 0;
}

+static void bwd_link_recovery(struct work_struct *work)
+{
+ struct ntb_device *ndev = container_of(work, struct ntb_device,
+ lr_timer.work);
+ u32 status32;
+
+ bwd_recover_link(ndev);
+ /* There is a potential race between the 2 NTB devices recovering at the
+ * same time. If the times are the same, the link will not recover and
+ * the driver will be stuck in this loop forever. Add a random interval
+ * to the recovery time to prevent this race.
+ */
+ msleep(BWD_LINK_RECOVERY_TIME + prandom_u32() % BWD_LINK_RECOVERY_TIME);
+
+ status32 = readl(ndev->reg_base + BWD_LTSSMSTATEJMP_OFFSET);
+ if (status32 & BWD_LTSSMSTATEJMP_FORCEDETECT)
+ goto retry;
+
+ status32 = readl(ndev->reg_base + BWD_IBSTERRRCRVSTS0_OFFSET);
+ if (status32 & BWD_IBIST_ERR_OFLOW)
+ goto retry;
+
+ status32 = readl(ndev->reg_ofs.lnk_cntl);
+ if (!(status32 & BWD_CNTL_LINK_DOWN)) {
+ unsigned char speed, width;
+ u16 status16;
+
+ status16 = readw(ndev->reg_ofs.lnk_stat);
+ width = (status16 & NTB_LINK_WIDTH_MASK) >> 4;
+ speed = (status16 & NTB_LINK_SPEED_MASK);
+ if (ndev->link_width != width || ndev->link_speed != speed)
+ goto retry;
+ }
+
+ schedule_delayed_work(&ndev->hb_timer, NTB_HB_TIMEOUT);
+ return;
+
+retry:
+ schedule_delayed_work(&ndev->lr_timer, NTB_HB_TIMEOUT);
+}
+
/* BWD doesn't have link status interrupt, poll on that platform */
static void bwd_link_poll(struct work_struct *work)
{
@@ -509,6 +600,16 @@ static void bwd_link_poll(struct work_struct *work)
if (rc)
dev_err(&ndev->pdev->dev,
"Error determining link status\n");
+
+ /* Check to see if a link error is the cause of the link down */
+ if (ndev->link_status == NTB_LINK_DOWN) {
+ u32 status32 = readl(ndev->reg_base +
+ BWD_LTSSMSTATEJMP_OFFSET);
+ if (status32 & BWD_LTSSMSTATEJMP_FORCEDETECT) {
+ schedule_delayed_work(&ndev->lr_timer, 0);
+ return;
+ }
+ }
}

schedule_delayed_work(&ndev->hb_timer, NTB_HB_TIMEOUT);
@@ -693,6 +794,7 @@ static int ntb_bwd_setup(struct ntb_device *ndev)

/* Since bwd doesn't have a link interrupt, setup a poll timer */
INIT_DELAYED_WORK(&ndev->hb_timer, bwd_link_poll);
+ INIT_DELAYED_WORK(&ndev->lr_timer, bwd_link_recovery);
schedule_delayed_work(&ndev->hb_timer, NTB_HB_TIMEOUT);

return 0;
@@ -733,8 +835,10 @@ static int ntb_device_setup(struct ntb_device *ndev)

static void ntb_device_free(struct ntb_device *ndev)
{
- if (ndev->hw_type == BWD_HW)
+ if (ndev->hw_type == BWD_HW) {
cancel_delayed_work_sync(&ndev->hb_timer);
+ cancel_delayed_work_sync(&ndev->lr_timer);
+ }
}

static irqreturn_t bwd_callback_msix_irq(int irq, void *data)
diff --git a/drivers/ntb/ntb_hw.h b/drivers/ntb/ntb_hw.h
index 72fcb22..3a15d49 100644
--- a/drivers/ntb/ntb_hw.h
+++ b/drivers/ntb/ntb_hw.h
@@ -125,10 +125,15 @@ struct ntb_device {
unsigned char num_msix;
unsigned char bits_per_vector;
unsigned char max_cbs;
+ unsigned char link_width;
+ unsigned char link_speed;
unsigned char link_status;
+
struct delayed_work hb_timer;
unsigned long last_ts;

+ struct delayed_work lr_timer;
+
struct dentry *debugfs_dir;
};

diff --git a/drivers/ntb/ntb_regs.h b/drivers/ntb/ntb_regs.h
index 4fe135a..8f3f903 100644
--- a/drivers/ntb/ntb_regs.h
+++ b/drivers/ntb/ntb_regs.h
@@ -114,6 +114,7 @@
#define BWD_MBAR45_OFFSET 0xb020
#define BWD_DEVCTRL_OFFSET 0xb048
#define BWD_LINK_STATUS_OFFSET 0xb052
+#define BWD_ERRCORSTS_OFFSET 0xb110

#define BWD_SBAR2XLAT_OFFSET 0x0008
#define BWD_SBAR4XLAT_OFFSET 0x0010
@@ -131,6 +132,20 @@
#define BWD_B2B_SPADSEMA_OFFSET 0x80c0
#define BWD_B2B_STKYSPAD_OFFSET 0x80c4

+#define BWD_MODPHY_PCSREG4 0x1c004
+#define BWD_MODPHY_PCSREG6 0x1c006
+
+#define BWD_IP_BASE 0xC000
+#define BWD_DESKEWSTS_OFFSET (BWD_IP_BASE + 0x3024)
+#define BWD_LTSSMERRSTS0_OFFSET (BWD_IP_BASE + 0x3180)
+#define BWD_LTSSMSTATEJMP_OFFSET (BWD_IP_BASE + 0x3040)
+#define BWD_IBSTERRRCRVSTS0_OFFSET (BWD_IP_BASE + 0x3324)
+
+#define BWD_DESKEWSTS_DBERR (1 << 15)
+#define BWD_LTSSMERRSTS0_UNEXPECTEDEI (1 << 20)
+#define BWD_LTSSMSTATEJMP_FORCEDETECT (1 << 2)
+#define BWD_IBIST_ERR_OFLOW 0x7FFF7FFF
+
#define NTB_CNTL_BAR23_SNOOP (1 << 2)
#define NTB_CNTL_BAR45_SNOOP (1 << 6)
#define BWD_CNTL_LINK_DOWN (1 << 16)
--
1.7.9.5

2013-08-02 17:39:53

by Jon Mason

[permalink] [raw]
Subject: [PATCH 04/15] NTB: Correct debugfs to work with more than 1 NTB Device

Debugfs was set up in NTB with only a single debugfs directory. This
resulted in the leaking of debugfs directories and files when multiple
NTB devices were present, because each device stomped on the variables
containing the previous device's values (thus preventing them from being
freed on cleanup). Correct this by creating a per-device subdirectory
named after the PCI BDF of each device present, and nesting the previously
existing information in those directories.

Signed-off-by: Jon Mason <[email protected]>
---
drivers/ntb/ntb_hw.c | 27 +++++++++++++++++++++++++++
drivers/ntb/ntb_hw.h | 16 ++++++++++++++++
drivers/ntb/ntb_transport.c | 17 +++++------------
3 files changed, 48 insertions(+), 12 deletions(-)

diff --git a/drivers/ntb/ntb_hw.c b/drivers/ntb/ntb_hw.c
index 26b808b..97d6704 100644
--- a/drivers/ntb/ntb_hw.c
+++ b/drivers/ntb/ntb_hw.c
@@ -78,6 +78,8 @@ enum {
BWD_HW,
};

+struct dentry *debugfs_dir;
+
/* Translate memory window 0,1 to BAR 2,4 */
#define MW_TO_BAR(mw) (mw * 2 + 2)

@@ -998,6 +1000,28 @@ static void ntb_free_callbacks(struct ntb_device *ndev)
kfree(ndev->db_cb);
}

+static void ntb_setup_debugfs(struct ntb_device *ndev)
+{
+ if (!debugfs_initialized())
+ return;
+
+ if (!debugfs_dir)
+ debugfs_dir = debugfs_create_dir(KBUILD_MODNAME, NULL);
+
+ ndev->debugfs_dir = debugfs_create_dir(pci_name(ndev->pdev),
+ debugfs_dir);
+}
+
+static void ntb_free_debugfs(struct ntb_device *ndev)
+{
+ debugfs_remove_recursive(ndev->debugfs_dir);
+
+ if (debugfs_dir && simple_empty(debugfs_dir)) {
+ debugfs_remove_recursive(debugfs_dir);
+ debugfs_dir = NULL;
+ }
+}
+
static int ntb_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
{
struct ntb_device *ndev;
@@ -1010,6 +1034,7 @@ static int ntb_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
ndev->pdev = pdev;
ndev->link_status = NTB_LINK_DOWN;
pci_set_drvdata(pdev, ndev);
+ ntb_setup_debugfs(ndev);

rc = pci_enable_device(pdev);
if (rc)
@@ -1106,6 +1131,7 @@ err2:
err1:
pci_disable_device(pdev);
err:
+ ntb_free_debugfs(ndev);
kfree(ndev);

dev_err(&pdev->dev, "Error loading %s module\n", KBUILD_MODNAME);
@@ -1135,6 +1161,7 @@ static void ntb_pci_remove(struct pci_dev *pdev)
iounmap(ndev->reg_base);
pci_release_selected_regions(pdev, NTB_BAR_MASK);
pci_disable_device(pdev);
+ ntb_free_debugfs(ndev);
kfree(ndev);
}

diff --git a/drivers/ntb/ntb_hw.h b/drivers/ntb/ntb_hw.h
index 3a3038c..6a4f56f 100644
--- a/drivers/ntb/ntb_hw.h
+++ b/drivers/ntb/ntb_hw.h
@@ -127,6 +127,8 @@ struct ntb_device {
unsigned char link_status;
struct delayed_work hb_timer;
unsigned long last_ts;
+
+ struct dentry *debugfs_dir;
};

/**
@@ -155,6 +157,20 @@ static inline struct pci_dev *ntb_query_pdev(struct ntb_device *ndev)
return ndev->pdev;
}

+/**
+ * ntb_query_debugfs() - return the debugfs pointer
+ * @ndev: pointer to ntb_device instance
+ *
+ * Given the ntb pointer, return the debugfs directory pointer for the NTB
+ * hardware device
+ *
+ * RETURNS: a pointer to the debugfs directory
+ */
+static inline struct dentry *ntb_query_debugfs(struct ntb_device *ndev)
+{
+ return ndev->debugfs_dir;
+}
+
struct ntb_device *ntb_register_transport(struct pci_dev *pdev,
void *transport);
void ntb_unregister_transport(struct ntb_device *ndev);
diff --git a/drivers/ntb/ntb_transport.c b/drivers/ntb/ntb_transport.c
index f8d7081..c308915 100644
--- a/drivers/ntb/ntb_transport.c
+++ b/drivers/ntb/ntb_transport.c
@@ -157,7 +157,6 @@ struct ntb_transport {
bool transport_link;
struct delayed_work link_work;
struct work_struct link_cleanup;
- struct dentry *debugfs_dir;
};

enum {
@@ -824,12 +823,12 @@ static void ntb_transport_init_queue(struct ntb_transport *nt,
qp->tx_max_frame = min(transport_mtu, tx_size / 2);
qp->tx_max_entry = tx_size / qp->tx_max_frame;

- if (nt->debugfs_dir) {
+ if (ntb_query_debugfs(nt->ndev)) {
char debugfs_name[4];

snprintf(debugfs_name, 4, "qp%d", qp_num);
qp->debugfs_dir = debugfs_create_dir(debugfs_name,
- nt->debugfs_dir);
+ ntb_query_debugfs(nt->ndev));

qp->debugfs_stats = debugfs_create_file("stats", S_IRUSR,
qp->debugfs_dir, qp,
@@ -857,11 +856,6 @@ int ntb_transport_init(struct pci_dev *pdev)
if (!nt)
return -ENOMEM;

- if (debugfs_initialized())
- nt->debugfs_dir = debugfs_create_dir(KBUILD_MODNAME, NULL);
- else
- nt->debugfs_dir = NULL;
-
nt->ndev = ntb_register_transport(pdev, nt);
if (!nt->ndev) {
rc = -EIO;
@@ -907,7 +901,6 @@ err2:
err1:
ntb_unregister_transport(nt->ndev);
err:
- debugfs_remove_recursive(nt->debugfs_dir);
kfree(nt);
return rc;
}
@@ -921,16 +914,16 @@ void ntb_transport_free(void *transport)
nt->transport_link = NTB_LINK_DOWN;

/* verify that all the qp's are freed */
- for (i = 0; i < nt->max_qps; i++)
+ for (i = 0; i < nt->max_qps; i++) {
if (!test_bit(i, &nt->qp_bitmap))
ntb_transport_free_queue(&nt->qps[i]);
+ debugfs_remove_recursive(nt->qps[i].debugfs_dir);
+ }

ntb_bus_remove(nt);

cancel_delayed_work_sync(&nt->link_work);

- debugfs_remove_recursive(nt->debugfs_dir);
-
ntb_unregister_event_callback(nt->ndev);

pdev = ntb_query_pdev(nt->ndev);
--
1.7.9.5

2013-08-02 18:04:25

by Joe Perches

[permalink] [raw]
Subject: Re: [PATCH 00/15] NTB: Bug Fixes and New Features

On Fri, 2013-08-02 at 10:35 -0700, Jon Mason wrote:
> A fairly major update for NTB. Numerous fixes and features being added,
> including adding support for NTB-RP and using DMA engines to
> transmit/receive data. Reviews are appreciated!
>
> Thanks,
> Jon
>
> MAINTAINERS | 2 +
> drivers/ntb/Kconfig | 2 +-
> drivers/ntb/ntb_hw.c | 483 ++++++++++++++++++++++++++++++++++---------
> drivers/ntb/ntb_hw.h | 105 ++++++++--
> drivers/ntb/ntb_regs.h | 47 +++--
> drivers/ntb/ntb_transport.c | 382 ++++++++++++++++++++++++++--------
> 6 files changed, 802 insertions(+), 219 deletions(-)

Wouldn't this be better moved to drivers/pci/pcie/ntb?

2013-08-19 09:41:52

by Dan Williams

[permalink] [raw]
Subject: Re: [PATCH 09/15] NTB: Use DMA Engine to Transmit and Receive

On Fri, Aug 2, 2013 at 10:35 AM, Jon Mason <[email protected]> wrote:
> Allocate and use a DMA engine channel to transmit and receive data over
> NTB. If none is allocated, fall back to using the CPU to transfer data.
>
> Cc: Dan Williams <[email protected]>
> Cc: Vinod Koul <[email protected]>
> Cc: Dave Jiang <[email protected]>
> Signed-off-by: Jon Mason <[email protected]>
> ---
> drivers/ntb/ntb_hw.c | 17 +++
> drivers/ntb/ntb_hw.h | 1 +
> drivers/ntb/ntb_transport.c | 285 ++++++++++++++++++++++++++++++++++++-------
> 3 files changed, 258 insertions(+), 45 deletions(-)
>
> diff --git a/drivers/ntb/ntb_hw.c b/drivers/ntb/ntb_hw.c
> index 1d8e551..014222c 100644
> --- a/drivers/ntb/ntb_hw.c
> +++ b/drivers/ntb/ntb_hw.c
> @@ -350,6 +350,23 @@ int ntb_read_remote_spad(struct ntb_device *ndev, unsigned int idx, u32 *val)
> }
>
> /**
> + * ntb_get_mw_base() - get addr for the NTB memory window
> + * @ndev: pointer to ntb_device instance
> + * @mw: memory window number
> + *
> + * This function provides the base address of the memory window specified.
> + *
> + * RETURNS: address, or NULL on error.
> + */
> +resource_size_t ntb_get_mw_base(struct ntb_device *ndev, unsigned int mw)
> +{
> + if (mw >= ntb_max_mw(ndev))
> + return 0;
> +
> + return pci_resource_start(ndev->pdev, MW_TO_BAR(mw));
> +}
> +
> +/**
> * ntb_get_mw_vbase() - get virtual addr for the NTB memory window
> * @ndev: pointer to ntb_device instance
> * @mw: memory window number
> diff --git a/drivers/ntb/ntb_hw.h b/drivers/ntb/ntb_hw.h
> index b03de80..ab5f768 100644
> --- a/drivers/ntb/ntb_hw.h
> +++ b/drivers/ntb/ntb_hw.h
> @@ -240,6 +240,7 @@ int ntb_write_local_spad(struct ntb_device *ndev, unsigned int idx, u32 val);
> int ntb_read_local_spad(struct ntb_device *ndev, unsigned int idx, u32 *val);
> int ntb_write_remote_spad(struct ntb_device *ndev, unsigned int idx, u32 val);
> int ntb_read_remote_spad(struct ntb_device *ndev, unsigned int idx, u32 *val);
> +resource_size_t ntb_get_mw_base(struct ntb_device *ndev, unsigned int mw);
> void __iomem *ntb_get_mw_vbase(struct ntb_device *ndev, unsigned int mw);
> u64 ntb_get_mw_size(struct ntb_device *ndev, unsigned int mw);
> void ntb_ring_sdb(struct ntb_device *ndev, unsigned int idx);
> diff --git a/drivers/ntb/ntb_transport.c b/drivers/ntb/ntb_transport.c
> index f7380e9..73a35e4 100644
> --- a/drivers/ntb/ntb_transport.c
> +++ b/drivers/ntb/ntb_transport.c
> @@ -47,6 +47,7 @@
> */
> #include <linux/debugfs.h>
> #include <linux/delay.h>
> +#include <linux/dmaengine.h>
> #include <linux/dma-mapping.h>
> #include <linux/errno.h>
> #include <linux/export.h>
> @@ -68,6 +69,10 @@ static unsigned char max_num_clients;
> module_param(max_num_clients, byte, 0644);
> MODULE_PARM_DESC(max_num_clients, "Maximum number of NTB transport clients");
>
> +static unsigned int copy_bytes = 1024;
> +module_param(copy_bytes, uint, 0644);
> +MODULE_PARM_DESC(copy_bytes, "Threshold under which NTB will use the CPU to copy instead of DMA");
> +
> struct ntb_queue_entry {
> /* ntb_queue list reference */
> struct list_head entry;
> @@ -76,6 +81,13 @@ struct ntb_queue_entry {
> void *buf;
> unsigned int len;
> unsigned int flags;
> +
> + struct ntb_transport_qp *qp;
> + union {
> + struct ntb_payload_header __iomem *tx_hdr;
> + struct ntb_payload_header *rx_hdr;
> + };
> + unsigned int index;
> };
>
> struct ntb_rx_info {
> @@ -86,6 +98,7 @@ struct ntb_transport_qp {
> struct ntb_transport *transport;
> struct ntb_device *ndev;
> void *cb_data;
> + struct dma_chan *dma_chan;
>
> bool client_ready;
> bool qp_link;
> @@ -99,6 +112,7 @@ struct ntb_transport_qp {
> struct list_head tx_free_q;
> spinlock_t ntb_tx_free_q_lock;
> void __iomem *tx_mw;
> + dma_addr_t tx_mw_raw;
> unsigned int tx_index;
> unsigned int tx_max_entry;
> unsigned int tx_max_frame;
> @@ -114,6 +128,7 @@ struct ntb_transport_qp {
> unsigned int rx_index;
> unsigned int rx_max_entry;
> unsigned int rx_max_frame;
> + dma_cookie_t last_cookie;
>
> void (*event_handler) (void *data, int status);
> struct delayed_work link_work;
> @@ -129,9 +144,14 @@ struct ntb_transport_qp {
> u64 rx_err_no_buf;
> u64 rx_err_oflow;
> u64 rx_err_ver;
> + u64 rx_memcpy;
> + u64 rx_async;
> u64 tx_bytes;
> u64 tx_pkts;
> u64 tx_ring_full;
> + u64 tx_err_no_buf;
> + u64 tx_memcpy;
> + u64 tx_async;
> };
>
> struct ntb_transport_mw {
> @@ -381,7 +401,7 @@ static ssize_t debugfs_read(struct file *filp, char __user *ubuf, size_t count,
> char *buf;
> ssize_t ret, out_offset, out_count;
>
> - out_count = 600;
> + out_count = 1000;
>
> buf = kmalloc(out_count, GFP_KERNEL);
> if (!buf)
> @@ -396,6 +416,10 @@ static ssize_t debugfs_read(struct file *filp, char __user *ubuf, size_t count,
> out_offset += snprintf(buf + out_offset, out_count - out_offset,
> "rx_pkts - \t%llu\n", qp->rx_pkts);
> out_offset += snprintf(buf + out_offset, out_count - out_offset,
> + "rx_memcpy - \t%llu\n", qp->rx_memcpy);
> + out_offset += snprintf(buf + out_offset, out_count - out_offset,
> + "rx_async - \t%llu\n", qp->rx_async);
> + out_offset += snprintf(buf + out_offset, out_count - out_offset,
> "rx_ring_empty - %llu\n", qp->rx_ring_empty);
> out_offset += snprintf(buf + out_offset, out_count - out_offset,
> "rx_err_no_buf - %llu\n", qp->rx_err_no_buf);
> @@ -415,8 +439,14 @@ static ssize_t debugfs_read(struct file *filp, char __user *ubuf, size_t count,
> out_offset += snprintf(buf + out_offset, out_count - out_offset,
> "tx_pkts - \t%llu\n", qp->tx_pkts);
> out_offset += snprintf(buf + out_offset, out_count - out_offset,
> + "tx_memcpy - \t%llu\n", qp->tx_memcpy);
> + out_offset += snprintf(buf + out_offset, out_count - out_offset,
> + "tx_async - \t%llu\n", qp->tx_async);
> + out_offset += snprintf(buf + out_offset, out_count - out_offset,
> "tx_ring_full - \t%llu\n", qp->tx_ring_full);
> out_offset += snprintf(buf + out_offset, out_count - out_offset,
> + "tx_err_no_buf - %llu\n", qp->tx_err_no_buf);
> + out_offset += snprintf(buf + out_offset, out_count - out_offset,
> "tx_mw - \t%p\n", qp->tx_mw);
> out_offset += snprintf(buf + out_offset, out_count - out_offset,
> "tx_index - \t%u\n", qp->tx_index);
> @@ -488,11 +518,11 @@ static void ntb_transport_setup_qp_mw(struct ntb_transport *nt,
> num_qps_mw = nt->max_qps / mw_max;
>
> rx_size = (unsigned int) nt->mw[mw_num].size / num_qps_mw;
> - qp->remote_rx_info = nt->mw[mw_num].virt_addr +
> - (qp_num / mw_max * rx_size);
> + qp->rx_buff = nt->mw[mw_num].virt_addr + qp_num / mw_max * rx_size;
> rx_size -= sizeof(struct ntb_rx_info);
>
> - qp->rx_buff = qp->remote_rx_info + 1;
> + qp->remote_rx_info = qp->rx_buff + rx_size;
> +
> /* Due to housekeeping, there must be atleast 2 buffs */
> qp->rx_max_frame = min(transport_mtu, rx_size / 2);
> qp->rx_max_entry = rx_size / qp->rx_max_frame;
> @@ -802,6 +832,7 @@ static void ntb_transport_init_queue(struct ntb_transport *nt,
> struct ntb_transport_qp *qp;
> unsigned int num_qps_mw, tx_size;
> u8 mw_num, mw_max;
> + u64 qp_offset;
>
> mw_max = ntb_max_mw(nt->ndev);
> mw_num = QP_TO_MW(nt->ndev, qp_num);
> @@ -820,11 +851,12 @@ static void ntb_transport_init_queue(struct ntb_transport *nt,
> num_qps_mw = nt->max_qps / mw_max;
>
> tx_size = (unsigned int) ntb_get_mw_size(qp->ndev, mw_num) / num_qps_mw;
> - qp->rx_info = ntb_get_mw_vbase(nt->ndev, mw_num) +
> - (qp_num / mw_max * tx_size);
> + qp_offset = qp_num / mw_max * tx_size;
> + qp->tx_mw = ntb_get_mw_vbase(nt->ndev, mw_num) + qp_offset;
> + qp->tx_mw_raw = ntb_get_mw_base(qp->ndev, mw_num) + qp_offset;

Just a quibble with the name, why "raw" instead of "phys"?

> tx_size -= sizeof(struct ntb_rx_info);
> + qp->rx_info = qp->tx_mw + tx_size;
>
> - qp->tx_mw = qp->rx_info + 1;
> /* Due to housekeeping, there must be atleast 2 buffs */
> qp->tx_max_frame = min(transport_mtu, tx_size / 2);
> qp->tx_max_entry = tx_size / qp->tx_max_frame;
> @@ -956,13 +988,22 @@ void ntb_transport_free(void *transport)
> kfree(nt);
> }
>
> -static void ntb_rx_copy_task(struct ntb_transport_qp *qp,
> - struct ntb_queue_entry *entry, void *offset)
> +static void ntb_rx_copy_callback(void *data)
> {
> + struct ntb_queue_entry *entry = data;
> + struct ntb_transport_qp *qp = entry->qp;
> void *cb_data = entry->cb_data;
> unsigned int len = entry->len;
> + struct ntb_payload_header *hdr = entry->rx_hdr;
> +
> + wmb();

What is this barrier paired with?

> + hdr->flags = 0;
>
> - memcpy(entry->buf, offset, entry->len);
> + /* If the callbacks come out of order, the writing of the index to the
> + * last completed will be out of order. This may result in the the
> + * receive stalling forever.
> + */

Is this for the case where we are bouncing back and forth between
sync/async? Otherwise I do not see how transactions could get out of
order given you allocate a channel once per queue. Flushing the async
transactions when transitioning to sync is one option. Is this
comment warning about a bug or explaining why we do the iowrite?

> + iowrite32(entry->index, &qp->rx_info->entry);
>
> ntb_list_add(&qp->ntb_rx_free_q_lock, &entry->entry, &qp->rx_free_q);
>
> @@ -970,6 +1011,80 @@ static void ntb_rx_copy_task(struct ntb_transport_qp *qp,
> qp->rx_handler(qp, qp->cb_data, cb_data, len);
> }
>
> +static void ntb_memcpy_rx(struct ntb_queue_entry *entry, void *offset)
> +{
> + void *buf = entry->buf;
> + size_t len = entry->len;
> +
> + memcpy(buf, offset, len);
> +
> + ntb_rx_copy_callback(entry);
> +}
> +
> +static void ntb_async_rx(struct ntb_queue_entry *entry, void *offset,
> + size_t len)
> +{
> + struct dma_async_tx_descriptor *txd;
> + struct ntb_transport_qp *qp = entry->qp;
> + struct dma_chan *chan = qp->dma_chan;
> + struct dma_device *device;
> + dma_addr_t src, dest;
> + dma_cookie_t cookie;
> + void *buf = entry->buf;
> + unsigned long flags;
> +
> + entry->len = len;
> +
> + if (!chan)
> + goto err;
> +
> + device = chan->device;
> +
> + if (len < copy_bytes ||
> + !is_dma_copy_aligned(device, __pa(offset), __pa(buf), len))

__pa()'s necessary here? I don't think the alignment requirements
will ever cross PAGE_OFFSET.

> + goto err1;
> +
> + dest = dma_map_single(device->dev, buf, len, DMA_FROM_DEVICE);
> + if (dma_mapping_error(device->dev, dest))
> + goto err1;
> +
> + src = dma_map_single(device->dev, offset, len, DMA_TO_DEVICE);
> + if (dma_mapping_error(device->dev, src))
> + goto err2;
> +
> + flags = DMA_COMPL_DEST_UNMAP_SINGLE | DMA_COMPL_SRC_UNMAP_SINGLE |
> + DMA_PREP_INTERRUPT;
> + txd = device->device_prep_dma_memcpy(chan, dest, src, len, flags);
> + if (!txd)
> + goto err3;
> +
> + txd_clear_parent(txd);
> + txd_clear_next(txd);

Neither of these txd_clear* calls should be necessary.

> + txd->callback = ntb_rx_copy_callback;
> + txd->callback_param = entry;
> +
> + cookie = dmaengine_submit(txd);
> + if (dma_submit_error(cookie))
> + goto err3;
> +
> + qp->last_cookie = cookie;
> +
> + dma_async_issue_pending(chan);

hmm... can this go in ntb_process_rx() so that the submission is
batched after all the async work is prepped?

> + qp->rx_async++;
> +
> + return;
> +
> +err3:
> + dma_unmap_single(device->dev, src, len, DMA_TO_DEVICE);
> +err2:
> + dma_unmap_single(device->dev, dest, len, DMA_FROM_DEVICE);
> +err1:
> + dma_sync_wait(chan, qp->last_cookie);
> +err:
> + ntb_memcpy_rx(entry, offset);
> + qp->rx_memcpy++;
> +}
> +
> static int ntb_process_rxc(struct ntb_transport_qp *qp)
> {
> struct ntb_payload_header *hdr;
> @@ -1008,41 +1123,44 @@ static int ntb_process_rxc(struct ntb_transport_qp *qp)
> if (hdr->flags & LINK_DOWN_FLAG) {
> ntb_qp_link_down(qp);
>
> - ntb_list_add(&qp->ntb_rx_pend_q_lock, &entry->entry,
> - &qp->rx_pend_q);
> - goto out;
> + goto err;
> }
>
> dev_dbg(&ntb_query_pdev(qp->ndev)->dev,
> "rx offset %u, ver %u - %d payload received, buf size %d\n",
> qp->rx_index, hdr->ver, hdr->len, entry->len);
>
> - if (hdr->len <= entry->len) {
> - entry->len = hdr->len;
> - ntb_rx_copy_task(qp, entry, offset);
> - } else {
> - ntb_list_add(&qp->ntb_rx_pend_q_lock, &entry->entry,
> - &qp->rx_pend_q);
> + qp->rx_bytes += hdr->len;
> + qp->rx_pkts++;
>
> + if (hdr->len > entry->len) {
> qp->rx_err_oflow++;
> dev_dbg(&ntb_query_pdev(qp->ndev)->dev,
> "RX overflow! Wanted %d got %d\n",
> hdr->len, entry->len);
> +
> + goto err;
> }
>
> - qp->rx_bytes += hdr->len;
> - qp->rx_pkts++;
> + entry->index = qp->rx_index;
> + entry->rx_hdr = hdr;
>
> -out:
> - /* Ensure that the data is fully copied out before clearing the flag */
> - wmb();
> - hdr->flags = 0;
> - iowrite32(qp->rx_index, &qp->rx_info->entry);
> + ntb_async_rx(entry, offset, hdr->len);
>
> +out:
> qp->rx_index++;
> qp->rx_index %= qp->rx_max_entry;
>
> return 0;
> +
> +err:
> + ntb_list_add(&qp->ntb_rx_pend_q_lock, &entry->entry,
> + &qp->rx_pend_q);
> + wmb();
> + hdr->flags = 0;
> + iowrite32(qp->rx_index, &qp->rx_info->entry);
> +
> + goto out;
> }
>
> static void ntb_transport_rx(unsigned long data)
> @@ -1070,19 +1188,12 @@ static void ntb_transport_rxc_db(void *data, int db_num)
> tasklet_schedule(&qp->rx_work);
> }
>
> -static void ntb_tx_copy_task(struct ntb_transport_qp *qp,
> - struct ntb_queue_entry *entry,
> - void __iomem *offset)
> +static void ntb_tx_copy_callback(void *data)
> {
> - struct ntb_payload_header __iomem *hdr;
> -
> - memcpy_toio(offset, entry->buf, entry->len);
> -
> - hdr = offset + qp->tx_max_frame - sizeof(struct ntb_payload_header);
> - iowrite32(entry->len, &hdr->len);
> - iowrite32((u32) qp->tx_pkts, &hdr->ver);
> + struct ntb_queue_entry *entry = data;
> + struct ntb_transport_qp *qp = entry->qp;
> + struct ntb_payload_header __iomem *hdr = entry->tx_hdr;
>
> - /* Ensure that the data is fully copied out before setting the flag */
> wmb();
> iowrite32(entry->flags | DESC_DONE_FLAG, &hdr->flags);
>
> @@ -1103,15 +1214,79 @@ static void ntb_tx_copy_task(struct ntb_transport_qp *qp,
> ntb_list_add(&qp->ntb_tx_free_q_lock, &entry->entry, &qp->tx_free_q);
> }
>
> -static int ntb_process_tx(struct ntb_transport_qp *qp,
> - struct ntb_queue_entry *entry)
> +static void ntb_memcpy_tx(struct ntb_queue_entry *entry, void __iomem *offset)
> +{
> + memcpy_toio(offset, entry->buf, entry->len);
> +
> + ntb_tx_copy_callback(entry);
> +}
> +
> +static void ntb_async_tx(struct ntb_transport_qp *qp,
> + struct ntb_queue_entry *entry)
> {
> + struct dma_async_tx_descriptor *txd;
> + struct dma_chan *chan = qp->dma_chan;
> + struct dma_device *device;
> + dma_addr_t src, dest;
> + dma_cookie_t cookie;
> + struct ntb_payload_header __iomem *hdr;
> void __iomem *offset;
> + size_t len = entry->len;
> + void *buf = entry->buf;
> + unsigned long flags;
>
> offset = qp->tx_mw + qp->tx_max_frame * qp->tx_index;
> + hdr = offset + qp->tx_max_frame - sizeof(struct ntb_payload_header);
> + entry->tx_hdr = hdr;
> +
> + iowrite32(entry->len, &hdr->len);
> + iowrite32((u32) qp->tx_pkts, &hdr->ver);
> +
> + if (!chan)
> + goto err;
>
> - dev_dbg(&ntb_query_pdev(qp->ndev)->dev, "%lld - offset %p, tx %u, entry len %d flags %x buff %p\n",
> - qp->tx_pkts, offset, qp->tx_index, entry->len, entry->flags,
> + device = chan->device;
> +
> + dest = qp->tx_mw_raw + qp->tx_max_frame * qp->tx_index;
> +
> + if (len < copy_bytes ||
> + !is_dma_copy_aligned(device, __pa(buf), dest, len))

ditto on the use of __pa here.

> + goto err;
> +
> + src = dma_map_single(device->dev, buf, len, DMA_TO_DEVICE);
> + if (dma_mapping_error(device->dev, src))
> + goto err;
> +
> + flags = DMA_COMPL_SRC_UNMAP_SINGLE | DMA_PREP_INTERRUPT;
> + txd = device->device_prep_dma_memcpy(chan, dest, src, len, flags);
> + if (!txd)
> + goto err1;
> +
> + txd_clear_parent(txd);
> + txd_clear_next(txd);

...again not needed.

> + txd->callback = ntb_tx_copy_callback;
> + txd->callback_param = entry;
> +
> + cookie = dmaengine_submit(txd);
> + if (dma_submit_error(cookie))
> + goto err1;
> +
> + dma_async_issue_pending(chan);
> + qp->tx_async++;
> +
> + return;
> +err1:
> + dma_unmap_single(device->dev, src, len, DMA_TO_DEVICE);
> +err:
> + ntb_memcpy_tx(entry, offset);
> + qp->tx_memcpy++;
> +}
> +
> +static int ntb_process_tx(struct ntb_transport_qp *qp,
> + struct ntb_queue_entry *entry)
> +{
> + dev_dbg(&ntb_query_pdev(qp->ndev)->dev, "%lld - tx %u, entry len %d flags %x buff %p\n",
> + qp->tx_pkts, qp->tx_index, entry->len, entry->flags,
> entry->buf);
> if (qp->tx_index == qp->remote_rx_info->entry) {
> qp->tx_ring_full++;
> @@ -1127,7 +1302,7 @@ static int ntb_process_tx(struct ntb_transport_qp *qp,
> return 0;
> }
>
> - ntb_tx_copy_task(qp, entry, offset);
> + ntb_async_tx(qp, entry);
>
> qp->tx_index++;
> qp->tx_index %= qp->tx_max_entry;
> @@ -1213,11 +1388,16 @@ ntb_transport_create_queue(void *data, struct pci_dev *pdev,
> qp->tx_handler = handlers->tx_handler;
> qp->event_handler = handlers->event_handler;
>
> + qp->dma_chan = dma_find_channel(DMA_MEMCPY);
> + if (!qp->dma_chan)
> + dev_info(&pdev->dev, "Unable to allocate private DMA channel, using CPU instead\n");

You can drop "private" since this is a public allocation.

> +
> for (i = 0; i < NTB_QP_DEF_NUM_ENTRIES; i++) {
> entry = kzalloc(sizeof(struct ntb_queue_entry), GFP_ATOMIC);
> if (!entry)
> goto err1;
>
> + entry->qp = qp;
> ntb_list_add(&qp->ntb_rx_free_q_lock, &entry->entry,
> &qp->rx_free_q);
> }
> @@ -1227,6 +1407,7 @@ ntb_transport_create_queue(void *data, struct pci_dev *pdev,
> if (!entry)
> goto err2;
>
> + entry->qp = qp;
> ntb_list_add(&qp->ntb_tx_free_q_lock, &entry->entry,
> &qp->tx_free_q);
> }
> @@ -1272,11 +1453,14 @@ void ntb_transport_free_queue(struct ntb_transport_qp *qp)
>
> pdev = ntb_query_pdev(qp->ndev);
>
> - cancel_delayed_work_sync(&qp->link_work);
> + if (qp->dma_chan)
> + dmaengine_terminate_all(qp->dma_chan);

Do you need this or can you just wait for all outstanding tx / rx to quiesce?

> ntb_unregister_db_callback(qp->ndev, qp->qp_num);
> tasklet_disable(&qp->rx_work);
>
> + cancel_delayed_work_sync(&qp->link_work);

Soon the ability to ask the driver to unmap for you will go away. You
can get ahead of the game by unmapping manually.
> +
> while ((entry = ntb_list_rm(&qp->ntb_rx_free_q_lock, &qp->rx_free_q)))
> kfree(entry);
>
> @@ -1382,8 +1566,10 @@ int ntb_transport_tx_enqueue(struct ntb_transport_qp *qp, void *cb, void *data,
> return -EINVAL;
>
> entry = ntb_list_rm(&qp->ntb_tx_free_q_lock, &qp->tx_free_q);
> - if (!entry)
> + if (!entry) {
> + qp->tx_err_no_buf++;
> return -ENOMEM;
> + }
>
> entry->cb_data = cb;
> entry->buf = data;
> @@ -1499,9 +1685,18 @@ EXPORT_SYMBOL_GPL(ntb_transport_qp_num);
> */
> unsigned int ntb_transport_max_size(struct ntb_transport_qp *qp)
> {
> + unsigned int max;
> +
> if (!qp)
> return 0;
>
> - return qp->tx_max_frame - sizeof(struct ntb_payload_header);
> + if (!qp->dma_chan)
> + return qp->tx_max_frame - sizeof(struct ntb_payload_header);
> +
> + /* If DMA engine usage is possible, try to find the max size for that */
> + max = qp->tx_max_frame - sizeof(struct ntb_payload_header);
> + max -= max % (1 << qp->dma_chan->device->copy_align);

Unless I missed it you need a dmaengine_get() dmaengine_put() pair in
your driver setup/teardown. dmaengine_get() notifies dmaengine of a
dma_find_channel() user.
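
A rough sketch of what that pairing could look like (hypothetical
placement and function names, not part of the posted patch):

	#include <linux/dmaengine.h>

	static int __init ntb_transport_module_init(void)
	{
		/* tell dmaengine we are a dma_find_channel() client */
		dmaengine_get();
		return 0;
	}

	static void __exit ntb_transport_module_exit(void)
	{
		/* drop the client reference taken at init time */
		dmaengine_put();
	}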

2013-08-19 10:01:58

by Dan Williams

[permalink] [raw]
Subject: Re: [PATCH 09/15] NTB: Use DMA Engine to Transmit and Receive

On Fri, Aug 2, 2013 at 10:35 AM, Jon Mason <[email protected]> wrote:
> Allocate and use a DMA engine channel to transmit and receive data over
> NTB. If none is allocated, fall back to using the CPU to transfer data.
>
> Cc: Dan Williams <[email protected]>
> Cc: Vinod Koul <[email protected]>
> Cc: Dave Jiang <[email protected]>
> Signed-off-by: Jon Mason <[email protected]>
> ---
> drivers/ntb/ntb_hw.c | 17 +++
> drivers/ntb/ntb_hw.h | 1 +
> drivers/ntb/ntb_transport.c | 285 ++++++++++++++++++++++++++++++++++++-------
> 3 files changed, 258 insertions(+), 45 deletions(-)
>
> diff --git a/drivers/ntb/ntb_hw.c b/drivers/ntb/ntb_hw.c
> index 1d8e551..014222c 100644
> --- a/drivers/ntb/ntb_hw.c
> +++ b/drivers/ntb/ntb_hw.c
> @@ -350,6 +350,23 @@ int ntb_read_remote_spad(struct ntb_device *ndev, unsigned int idx, u32 *val)
> }
>
> /**
> + * ntb_get_mw_base() - get addr for the NTB memory window
> + * @ndev: pointer to ntb_device instance
> + * @mw: memory window number
> + *
> + * This function provides the base address of the memory window specified.
> + *
> + * RETURNS: address, or NULL on error.
> + */
> +resource_size_t ntb_get_mw_base(struct ntb_device *ndev, unsigned int mw)
> +{
> + if (mw >= ntb_max_mw(ndev))
> + return 0;
> +
> + return pci_resource_start(ndev->pdev, MW_TO_BAR(mw));
> +}
> +
> +/**
> * ntb_get_mw_vbase() - get virtual addr for the NTB memory window
> * @ndev: pointer to ntb_device instance
> * @mw: memory window number
> diff --git a/drivers/ntb/ntb_hw.h b/drivers/ntb/ntb_hw.h
> index b03de80..ab5f768 100644
> --- a/drivers/ntb/ntb_hw.h
> +++ b/drivers/ntb/ntb_hw.h
> @@ -240,6 +240,7 @@ int ntb_write_local_spad(struct ntb_device *ndev, unsigned int idx, u32 val);
> int ntb_read_local_spad(struct ntb_device *ndev, unsigned int idx, u32 *val);
> int ntb_write_remote_spad(struct ntb_device *ndev, unsigned int idx, u32 val);
> int ntb_read_remote_spad(struct ntb_device *ndev, unsigned int idx, u32 *val);
> +resource_size_t ntb_get_mw_base(struct ntb_device *ndev, unsigned int mw);
> void __iomem *ntb_get_mw_vbase(struct ntb_device *ndev, unsigned int mw);
> u64 ntb_get_mw_size(struct ntb_device *ndev, unsigned int mw);
> void ntb_ring_sdb(struct ntb_device *ndev, unsigned int idx);
> diff --git a/drivers/ntb/ntb_transport.c b/drivers/ntb/ntb_transport.c
> index f7380e9..73a35e4 100644
> --- a/drivers/ntb/ntb_transport.c
> +++ b/drivers/ntb/ntb_transport.c
> @@ -47,6 +47,7 @@
> */
> #include <linux/debugfs.h>
> #include <linux/delay.h>
> +#include <linux/dmaengine.h>
> #include <linux/dma-mapping.h>
> #include <linux/errno.h>
> #include <linux/export.h>
> @@ -68,6 +69,10 @@ static unsigned char max_num_clients;
> module_param(max_num_clients, byte, 0644);
> MODULE_PARM_DESC(max_num_clients, "Maximum number of NTB transport clients");
>
> +static unsigned int copy_bytes = 1024;
> +module_param(copy_bytes, uint, 0644);
> +MODULE_PARM_DESC(copy_bytes, "Threshold under which NTB will use the CPU to copy instead of DMA");
> +
> struct ntb_queue_entry {
> /* ntb_queue list reference */
> struct list_head entry;
> @@ -76,6 +81,13 @@ struct ntb_queue_entry {
> void *buf;
> unsigned int len;
> unsigned int flags;
> +
> + struct ntb_transport_qp *qp;
> + union {
> + struct ntb_payload_header __iomem *tx_hdr;
> + struct ntb_payload_header *rx_hdr;
> + };
> + unsigned int index;
> };
>
> struct ntb_rx_info {
> @@ -86,6 +98,7 @@ struct ntb_transport_qp {
> struct ntb_transport *transport;
> struct ntb_device *ndev;
> void *cb_data;
> + struct dma_chan *dma_chan;
>
> bool client_ready;
> bool qp_link;
> @@ -99,6 +112,7 @@ struct ntb_transport_qp {
> struct list_head tx_free_q;
> spinlock_t ntb_tx_free_q_lock;
> void __iomem *tx_mw;
> + dma_addr_t tx_mw_raw;
> unsigned int tx_index;
> unsigned int tx_max_entry;
> unsigned int tx_max_frame;
> @@ -114,6 +128,7 @@ struct ntb_transport_qp {
> unsigned int rx_index;
> unsigned int rx_max_entry;
> unsigned int rx_max_frame;
> + dma_cookie_t last_cookie;
>
> void (*event_handler) (void *data, int status);
> struct delayed_work link_work;
> @@ -129,9 +144,14 @@ struct ntb_transport_qp {
> u64 rx_err_no_buf;
> u64 rx_err_oflow;
> u64 rx_err_ver;
> + u64 rx_memcpy;
> + u64 rx_async;
> u64 tx_bytes;
> u64 tx_pkts;
> u64 tx_ring_full;
> + u64 tx_err_no_buf;
> + u64 tx_memcpy;
> + u64 tx_async;
> };
>
> struct ntb_transport_mw {
> @@ -381,7 +401,7 @@ static ssize_t debugfs_read(struct file *filp, char __user *ubuf, size_t count,
> char *buf;
> ssize_t ret, out_offset, out_count;
>
> - out_count = 600;
> + out_count = 1000;
>
> buf = kmalloc(out_count, GFP_KERNEL);
> if (!buf)
> @@ -396,6 +416,10 @@ static ssize_t debugfs_read(struct file *filp, char __user *ubuf, size_t count,
> out_offset += snprintf(buf + out_offset, out_count - out_offset,
> "rx_pkts - \t%llu\n", qp->rx_pkts);
> out_offset += snprintf(buf + out_offset, out_count - out_offset,
> + "rx_memcpy - \t%llu\n", qp->rx_memcpy);
> + out_offset += snprintf(buf + out_offset, out_count - out_offset,
> + "rx_async - \t%llu\n", qp->rx_async);
> + out_offset += snprintf(buf + out_offset, out_count - out_offset,
> "rx_ring_empty - %llu\n", qp->rx_ring_empty);
> out_offset += snprintf(buf + out_offset, out_count - out_offset,
> "rx_err_no_buf - %llu\n", qp->rx_err_no_buf);
> @@ -415,8 +439,14 @@ static ssize_t debugfs_read(struct file *filp, char __user *ubuf, size_t count,
> out_offset += snprintf(buf + out_offset, out_count - out_offset,
> "tx_pkts - \t%llu\n", qp->tx_pkts);
> out_offset += snprintf(buf + out_offset, out_count - out_offset,
> + "tx_memcpy - \t%llu\n", qp->tx_memcpy);
> + out_offset += snprintf(buf + out_offset, out_count - out_offset,
> + "tx_async - \t%llu\n", qp->tx_async);
> + out_offset += snprintf(buf + out_offset, out_count - out_offset,
> "tx_ring_full - \t%llu\n", qp->tx_ring_full);
> out_offset += snprintf(buf + out_offset, out_count - out_offset,
> + "tx_err_no_buf - %llu\n", qp->tx_err_no_buf);
> + out_offset += snprintf(buf + out_offset, out_count - out_offset,
> "tx_mw - \t%p\n", qp->tx_mw);
> out_offset += snprintf(buf + out_offset, out_count - out_offset,
> "tx_index - \t%u\n", qp->tx_index);
> @@ -488,11 +518,11 @@ static void ntb_transport_setup_qp_mw(struct ntb_transport *nt,
> num_qps_mw = nt->max_qps / mw_max;
>
> rx_size = (unsigned int) nt->mw[mw_num].size / num_qps_mw;
> - qp->remote_rx_info = nt->mw[mw_num].virt_addr +
> - (qp_num / mw_max * rx_size);
> + qp->rx_buff = nt->mw[mw_num].virt_addr + qp_num / mw_max * rx_size;
> rx_size -= sizeof(struct ntb_rx_info);
>
> - qp->rx_buff = qp->remote_rx_info + 1;
> + qp->remote_rx_info = qp->rx_buff + rx_size;
> +
> /* Due to housekeeping, there must be atleast 2 buffs */
> qp->rx_max_frame = min(transport_mtu, rx_size / 2);
> qp->rx_max_entry = rx_size / qp->rx_max_frame;
> @@ -802,6 +832,7 @@ static void ntb_transport_init_queue(struct ntb_transport *nt,
> struct ntb_transport_qp *qp;
> unsigned int num_qps_mw, tx_size;
> u8 mw_num, mw_max;
> + u64 qp_offset;
>
> mw_max = ntb_max_mw(nt->ndev);
> mw_num = QP_TO_MW(nt->ndev, qp_num);
> @@ -820,11 +851,12 @@ static void ntb_transport_init_queue(struct ntb_transport *nt,
> num_qps_mw = nt->max_qps / mw_max;
>
> tx_size = (unsigned int) ntb_get_mw_size(qp->ndev, mw_num) / num_qps_mw;
> - qp->rx_info = ntb_get_mw_vbase(nt->ndev, mw_num) +
> - (qp_num / mw_max * tx_size);
> + qp_offset = qp_num / mw_max * tx_size;
> + qp->tx_mw = ntb_get_mw_vbase(nt->ndev, mw_num) + qp_offset;
> + qp->tx_mw_raw = ntb_get_mw_base(qp->ndev, mw_num) + qp_offset;

Just a quibble with the name, why "raw" instead of "phys"?

> tx_size -= sizeof(struct ntb_rx_info);
> + qp->rx_info = qp->tx_mw + tx_size;
>
> - qp->tx_mw = qp->rx_info + 1;
> /* Due to housekeeping, there must be atleast 2 buffs */
> qp->tx_max_frame = min(transport_mtu, tx_size / 2);
> qp->tx_max_entry = tx_size / qp->tx_max_frame;
> @@ -956,13 +988,22 @@ void ntb_transport_free(void *transport)
> kfree(nt);
> }
>
> -static void ntb_rx_copy_task(struct ntb_transport_qp *qp,
> - struct ntb_queue_entry *entry, void *offset)
> +static void ntb_rx_copy_callback(void *data)
> {
> + struct ntb_queue_entry *entry = data;
> + struct ntb_transport_qp *qp = entry->qp;
> void *cb_data = entry->cb_data;
> unsigned int len = entry->len;
> + struct ntb_payload_header *hdr = entry->rx_hdr;
> +
> + wmb();

What is this barrier paired with?

> + hdr->flags = 0;
>
> - memcpy(entry->buf, offset, entry->len);
> + /* If the callbacks come out of order, the writing of the index to the
> + * last completed will be out of order. This may result in the the
> + * receive stalling forever.
> + */

Is this for the case where we are bouncing back and forth between
sync/async? Otherwise I do not see how transactions could get out of
order given you allocate a channel once per queue. Is this comment
saying that the iowrite32 is somehow a fix, or is this comment a
FIXME?

> + iowrite32(entry->index, &qp->rx_info->entry);
>
> ntb_list_add(&qp->ntb_rx_free_q_lock, &entry->entry, &qp->rx_free_q);
>
> @@ -970,6 +1011,80 @@ static void ntb_rx_copy_task(struct ntb_transport_qp *qp,
> qp->rx_handler(qp, qp->cb_data, cb_data, len);
> }
>
> +static void ntb_memcpy_rx(struct ntb_queue_entry *entry, void *offset)
> +{
> + void *buf = entry->buf;
> + size_t len = entry->len;
> +
> + memcpy(buf, offset, len);
> +
> + ntb_rx_copy_callback(entry);
> +}
> +
> +static void ntb_async_rx(struct ntb_queue_entry *entry, void *offset,
> + size_t len)
> +{
> + struct dma_async_tx_descriptor *txd;
> + struct ntb_transport_qp *qp = entry->qp;
> + struct dma_chan *chan = qp->dma_chan;
> + struct dma_device *device;
> + dma_addr_t src, dest;
> + dma_cookie_t cookie;
> + void *buf = entry->buf;
> + unsigned long flags;
> +
> + entry->len = len;
> +
> + if (!chan)
> + goto err;
> +
> + device = chan->device;
> +
> + if (len < copy_bytes ||
> + !is_dma_copy_aligned(device, __pa(offset), __pa(buf), len))

__pa()'s necessary here? I don't think the alignment requirements
will ever cross PAGE_OFFSET.

> + goto err1;
> +
> + dest = dma_map_single(device->dev, buf, len, DMA_FROM_DEVICE);
> + if (dma_mapping_error(device->dev, dest))
> + goto err1;
> +
> + src = dma_map_single(device->dev, offset, len, DMA_TO_DEVICE);
> + if (dma_mapping_error(device->dev, src))
> + goto err2;
> +
> + flags = DMA_COMPL_DEST_UNMAP_SINGLE | DMA_COMPL_SRC_UNMAP_SINGLE |
> + DMA_PREP_INTERRUPT;
> + txd = device->device_prep_dma_memcpy(chan, dest, src, len, flags);
> + if (!txd)
> + goto err3;
> +
> + txd_clear_parent(txd);
> + txd_clear_next(txd);

Neither of these should be necessary.

> + txd->callback = ntb_rx_copy_callback;
> + txd->callback_param = entry;
> +
> + cookie = dmaengine_submit(txd);
> + if (dma_submit_error(cookie))
> + goto err3;
> +
> + qp->last_cookie = cookie;
> +
> + dma_async_issue_pending(chan);

hmm... can this go in ntb_process_rx() so that the submission is
batched? Cuts down on mmio.

> + qp->rx_async++;
> +
> + return;
> +
> +err3:
> + dma_unmap_single(device->dev, src, len, DMA_TO_DEVICE);
> +err2:
> + dma_unmap_single(device->dev, dest, len, DMA_FROM_DEVICE);
> +err1:
> + dma_sync_wait(chan, qp->last_cookie);
> +err:
> + ntb_memcpy_rx(entry, offset);
> + qp->rx_memcpy++;
> +}
> +
> static int ntb_process_rxc(struct ntb_transport_qp *qp)
> {
> struct ntb_payload_header *hdr;
> @@ -1008,41 +1123,44 @@ static int ntb_process_rxc(struct ntb_transport_qp *qp)
> if (hdr->flags & LINK_DOWN_FLAG) {
> ntb_qp_link_down(qp);
>
> - ntb_list_add(&qp->ntb_rx_pend_q_lock, &entry->entry,
> - &qp->rx_pend_q);
> - goto out;
> + goto err;
> }
>
> dev_dbg(&ntb_query_pdev(qp->ndev)->dev,
> "rx offset %u, ver %u - %d payload received, buf size %d\n",
> qp->rx_index, hdr->ver, hdr->len, entry->len);
>
> - if (hdr->len <= entry->len) {
> - entry->len = hdr->len;
> - ntb_rx_copy_task(qp, entry, offset);
> - } else {
> - ntb_list_add(&qp->ntb_rx_pend_q_lock, &entry->entry,
> - &qp->rx_pend_q);
> + qp->rx_bytes += hdr->len;
> + qp->rx_pkts++;
>
> + if (hdr->len > entry->len) {
> qp->rx_err_oflow++;
> dev_dbg(&ntb_query_pdev(qp->ndev)->dev,
> "RX overflow! Wanted %d got %d\n",
> hdr->len, entry->len);
> +
> + goto err;
> }
>
> - qp->rx_bytes += hdr->len;
> - qp->rx_pkts++;
> + entry->index = qp->rx_index;
> + entry->rx_hdr = hdr;
>
> -out:
> - /* Ensure that the data is fully copied out before clearing the flag */
> - wmb();
> - hdr->flags = 0;
> - iowrite32(qp->rx_index, &qp->rx_info->entry);
> + ntb_async_rx(entry, offset, hdr->len);
>
> +out:
> qp->rx_index++;
> qp->rx_index %= qp->rx_max_entry;
>
> return 0;
> +
> +err:
> + ntb_list_add(&qp->ntb_rx_pend_q_lock, &entry->entry,
> + &qp->rx_pend_q);
> + wmb();
> + hdr->flags = 0;
> + iowrite32(qp->rx_index, &qp->rx_info->entry);
> +
> + goto out;
> }
>
> static void ntb_transport_rx(unsigned long data)
> @@ -1070,19 +1188,12 @@ static void ntb_transport_rxc_db(void *data, int db_num)
> tasklet_schedule(&qp->rx_work);
> }
>
> -static void ntb_tx_copy_task(struct ntb_transport_qp *qp,
> - struct ntb_queue_entry *entry,
> - void __iomem *offset)
> +static void ntb_tx_copy_callback(void *data)
> {
> - struct ntb_payload_header __iomem *hdr;
> -
> - memcpy_toio(offset, entry->buf, entry->len);
> -
> - hdr = offset + qp->tx_max_frame - sizeof(struct ntb_payload_header);
> - iowrite32(entry->len, &hdr->len);
> - iowrite32((u32) qp->tx_pkts, &hdr->ver);
> + struct ntb_queue_entry *entry = data;
> + struct ntb_transport_qp *qp = entry->qp;
> + struct ntb_payload_header __iomem *hdr = entry->tx_hdr;
>
> - /* Ensure that the data is fully copied out before setting the flag */
> wmb();
> iowrite32(entry->flags | DESC_DONE_FLAG, &hdr->flags);
>
> @@ -1103,15 +1214,79 @@ static void ntb_tx_copy_task(struct ntb_transport_qp *qp,
> ntb_list_add(&qp->ntb_tx_free_q_lock, &entry->entry, &qp->tx_free_q);
> }
>
> -static int ntb_process_tx(struct ntb_transport_qp *qp,
> - struct ntb_queue_entry *entry)
> +static void ntb_memcpy_tx(struct ntb_queue_entry *entry, void __iomem *offset)
> +{
> + memcpy_toio(offset, entry->buf, entry->len);
> +
> + ntb_tx_copy_callback(entry);
> +}
> +
> +static void ntb_async_tx(struct ntb_transport_qp *qp,
> + struct ntb_queue_entry *entry)
> {
> + struct dma_async_tx_descriptor *txd;
> + struct dma_chan *chan = qp->dma_chan;
> + struct dma_device *device;
> + dma_addr_t src, dest;
> + dma_cookie_t cookie;
> + struct ntb_payload_header __iomem *hdr;
> void __iomem *offset;
> + size_t len = entry->len;
> + void *buf = entry->buf;
> + unsigned long flags;
>
> offset = qp->tx_mw + qp->tx_max_frame * qp->tx_index;
> + hdr = offset + qp->tx_max_frame - sizeof(struct ntb_payload_header);
> + entry->tx_hdr = hdr;
> +
> + iowrite32(entry->len, &hdr->len);
> + iowrite32((u32) qp->tx_pkts, &hdr->ver);
> +
> + if (!chan)
> + goto err;
>
> - dev_dbg(&ntb_query_pdev(qp->ndev)->dev, "%lld - offset %p, tx %u, entry len %d flags %x buff %p\n",
> - qp->tx_pkts, offset, qp->tx_index, entry->len, entry->flags,
> + device = chan->device;
> +
> + dest = qp->tx_mw_raw + qp->tx_max_frame * qp->tx_index;
> +
> + if (len < copy_bytes ||
> + !is_dma_copy_aligned(device, __pa(buf), dest, len))

ditto on the use of __pa here.

> + goto err;
> +
> + src = dma_map_single(device->dev, buf, len, DMA_TO_DEVICE);
> + if (dma_mapping_error(device->dev, src))
> + goto err;
> +
> + flags = DMA_COMPL_SRC_UNMAP_SINGLE | DMA_PREP_INTERRUPT;
> + txd = device->device_prep_dma_memcpy(chan, dest, src, len, flags);
> + if (!txd)
> + goto err1;
> +
> + txd_clear_parent(txd);
> + txd_clear_next(txd);

...again not needed.

> + txd->callback = ntb_tx_copy_callback;
> + txd->callback_param = entry;
> +
> + cookie = dmaengine_submit(txd);
> + if (dma_submit_error(cookie))
> + goto err1;
> +
> + dma_async_issue_pending(chan);
> + qp->tx_async++;
> +
> + return;
> +err1:
> + dma_unmap_single(device->dev, src, len, DMA_TO_DEVICE);
> +err:
> + ntb_memcpy_tx(entry, offset);
> + qp->tx_memcpy++;
> +}
> +
> +static int ntb_process_tx(struct ntb_transport_qp *qp,
> + struct ntb_queue_entry *entry)
> +{
> + dev_dbg(&ntb_query_pdev(qp->ndev)->dev, "%lld - tx %u, entry len %d flags %x buff %p\n",
> + qp->tx_pkts, qp->tx_index, entry->len, entry->flags,
> entry->buf);
> if (qp->tx_index == qp->remote_rx_info->entry) {
> qp->tx_ring_full++;
> @@ -1127,7 +1302,7 @@ static int ntb_process_tx(struct ntb_transport_qp *qp,
> return 0;
> }
>
> - ntb_tx_copy_task(qp, entry, offset);
> + ntb_async_tx(qp, entry);
>
> qp->tx_index++;
> qp->tx_index %= qp->tx_max_entry;
> @@ -1213,11 +1388,16 @@ ntb_transport_create_queue(void *data, struct pci_dev *pdev,
> qp->tx_handler = handlers->tx_handler;
> qp->event_handler = handlers->event_handler;
>
> + qp->dma_chan = dma_find_channel(DMA_MEMCPY);
> + if (!qp->dma_chan)
> + dev_info(&pdev->dev, "Unable to allocate private DMA channel, using CPU instead\n");

You can drop "private" since this is a public allocation.

> +
> for (i = 0; i < NTB_QP_DEF_NUM_ENTRIES; i++) {
> entry = kzalloc(sizeof(struct ntb_queue_entry), GFP_ATOMIC);
> if (!entry)
> goto err1;
>
> + entry->qp = qp;
> ntb_list_add(&qp->ntb_rx_free_q_lock, &entry->entry,
> &qp->rx_free_q);
> }
> @@ -1227,6 +1407,7 @@ ntb_transport_create_queue(void *data, struct pci_dev *pdev,
> if (!entry)
> goto err2;
>
> + entry->qp = qp;
> ntb_list_add(&qp->ntb_tx_free_q_lock, &entry->entry,
> &qp->tx_free_q);
> }
> @@ -1272,11 +1453,14 @@ void ntb_transport_free_queue(struct ntb_transport_qp *qp)
>
> pdev = ntb_query_pdev(qp->ndev);
>
> - cancel_delayed_work_sync(&qp->link_work);
> + if (qp->dma_chan)
> + dmaengine_terminate_all(qp->dma_chan);

Do you need this or can you just wait for all outstanding tx / rx to quiesce?

> ntb_unregister_db_callback(qp->ndev, qp->qp_num);
> tasklet_disable(&qp->rx_work);
>
> + cancel_delayed_work_sync(&qp->link_work);
> +
> while ((entry = ntb_list_rm(&qp->ntb_rx_free_q_lock, &qp->rx_free_q)))
> kfree(entry);
>
> @@ -1382,8 +1566,10 @@ int ntb_transport_tx_enqueue(struct ntb_transport_qp *qp, void *cb, void *data,
> return -EINVAL;
>
> entry = ntb_list_rm(&qp->ntb_tx_free_q_lock, &qp->tx_free_q);
> - if (!entry)
> + if (!entry) {
> + qp->tx_err_no_buf++;
> return -ENOMEM;
> + }
>
> entry->cb_data = cb;
> entry->buf = data;
> @@ -1499,9 +1685,18 @@ EXPORT_SYMBOL_GPL(ntb_transport_qp_num);
> */
> unsigned int ntb_transport_max_size(struct ntb_transport_qp *qp)
> {
> + unsigned int max;
> +
> if (!qp)
> return 0;
>
> - return qp->tx_max_frame - sizeof(struct ntb_payload_header);
> + if (!qp->dma_chan)
> + return qp->tx_max_frame - sizeof(struct ntb_payload_header);
> +
> + /* If DMA engine usage is possible, try to find the max size for that */
> + max = qp->tx_max_frame - sizeof(struct ntb_payload_header);
> + max -= max % (1 << qp->dma_chan->device->copy_align);

Unless I missed it you need a dmaengine_get() dmaengine_put() pair in
your driver setup/teardown. dmaengine_get() notifies dmaengine of a
dma_find_channel() user.

2013-08-19 20:37:47

by Jon Mason

[permalink] [raw]
Subject: Re: [PATCH 09/15] NTB: Use DMA Engine to Transmit and Receive

On Mon, Aug 19, 2013 at 03:01:54AM -0700, Dan Williams wrote:
> On Fri, Aug 2, 2013 at 10:35 AM, Jon Mason <[email protected]> wrote:
> > Allocate and use a DMA engine channel to transmit and receive data over
> > NTB. If none is allocated, fall back to using the CPU to transfer data.
> >
> > Cc: Dan Williams <[email protected]>
> > Cc: Vinod Koul <[email protected]>
> > Cc: Dave Jiang <[email protected]>
> > Signed-off-by: Jon Mason <[email protected]>
> > ---
> > drivers/ntb/ntb_hw.c | 17 +++
> > drivers/ntb/ntb_hw.h | 1 +
> > drivers/ntb/ntb_transport.c | 285 ++++++++++++++++++++++++++++++++++++-------
> > 3 files changed, 258 insertions(+), 45 deletions(-)
> >
> > diff --git a/drivers/ntb/ntb_hw.c b/drivers/ntb/ntb_hw.c
> > index 1d8e551..014222c 100644
> > --- a/drivers/ntb/ntb_hw.c
> > +++ b/drivers/ntb/ntb_hw.c
> > @@ -350,6 +350,23 @@ int ntb_read_remote_spad(struct ntb_device *ndev, unsigned int idx, u32 *val)
> > }
> >
> > /**
> > + * ntb_get_mw_base() - get addr for the NTB memory window
> > + * @ndev: pointer to ntb_device instance
> > + * @mw: memory window number
> > + *
> > + * This function provides the base address of the memory window specified.
> > + *
> > + * RETURNS: address, or NULL on error.
> > + */
> > +resource_size_t ntb_get_mw_base(struct ntb_device *ndev, unsigned int mw)
> > +{
> > + if (mw >= ntb_max_mw(ndev))
> > + return 0;
> > +
> > + return pci_resource_start(ndev->pdev, MW_TO_BAR(mw));
> > +}
> > +
> > +/**
> > * ntb_get_mw_vbase() - get virtual addr for the NTB memory window
> > * @ndev: pointer to ntb_device instance
> > * @mw: memory window number
> > diff --git a/drivers/ntb/ntb_hw.h b/drivers/ntb/ntb_hw.h
> > index b03de80..ab5f768 100644
> > --- a/drivers/ntb/ntb_hw.h
> > +++ b/drivers/ntb/ntb_hw.h
> > @@ -240,6 +240,7 @@ int ntb_write_local_spad(struct ntb_device *ndev, unsigned int idx, u32 val);
> > int ntb_read_local_spad(struct ntb_device *ndev, unsigned int idx, u32 *val);
> > int ntb_write_remote_spad(struct ntb_device *ndev, unsigned int idx, u32 val);
> > int ntb_read_remote_spad(struct ntb_device *ndev, unsigned int idx, u32 *val);
> > +resource_size_t ntb_get_mw_base(struct ntb_device *ndev, unsigned int mw);
> > void __iomem *ntb_get_mw_vbase(struct ntb_device *ndev, unsigned int mw);
> > u64 ntb_get_mw_size(struct ntb_device *ndev, unsigned int mw);
> > void ntb_ring_sdb(struct ntb_device *ndev, unsigned int idx);
> > diff --git a/drivers/ntb/ntb_transport.c b/drivers/ntb/ntb_transport.c
> > index f7380e9..73a35e4 100644
> > --- a/drivers/ntb/ntb_transport.c
> > +++ b/drivers/ntb/ntb_transport.c
> > @@ -47,6 +47,7 @@
> > */
> > #include <linux/debugfs.h>
> > #include <linux/delay.h>
> > +#include <linux/dmaengine.h>
> > #include <linux/dma-mapping.h>
> > #include <linux/errno.h>
> > #include <linux/export.h>
> > @@ -68,6 +69,10 @@ static unsigned char max_num_clients;
> > module_param(max_num_clients, byte, 0644);
> > MODULE_PARM_DESC(max_num_clients, "Maximum number of NTB transport clients");
> >
> > +static unsigned int copy_bytes = 1024;
> > +module_param(copy_bytes, uint, 0644);
> > +MODULE_PARM_DESC(copy_bytes, "Threshold under which NTB will use the CPU to copy instead of DMA");
> > +
> > struct ntb_queue_entry {
> > /* ntb_queue list reference */
> > struct list_head entry;
> > @@ -76,6 +81,13 @@ struct ntb_queue_entry {
> > void *buf;
> > unsigned int len;
> > unsigned int flags;
> > +
> > + struct ntb_transport_qp *qp;
> > + union {
> > + struct ntb_payload_header __iomem *tx_hdr;
> > + struct ntb_payload_header *rx_hdr;
> > + };
> > + unsigned int index;
> > };
> >
> > struct ntb_rx_info {
> > @@ -86,6 +98,7 @@ struct ntb_transport_qp {
> > struct ntb_transport *transport;
> > struct ntb_device *ndev;
> > void *cb_data;
> > + struct dma_chan *dma_chan;
> >
> > bool client_ready;
> > bool qp_link;
> > @@ -99,6 +112,7 @@ struct ntb_transport_qp {
> > struct list_head tx_free_q;
> > spinlock_t ntb_tx_free_q_lock;
> > void __iomem *tx_mw;
> > + dma_addr_t tx_mw_raw;
> > unsigned int tx_index;
> > unsigned int tx_max_entry;
> > unsigned int tx_max_frame;
> > @@ -114,6 +128,7 @@ struct ntb_transport_qp {
> > unsigned int rx_index;
> > unsigned int rx_max_entry;
> > unsigned int rx_max_frame;
> > + dma_cookie_t last_cookie;
> >
> > void (*event_handler) (void *data, int status);
> > struct delayed_work link_work;
> > @@ -129,9 +144,14 @@ struct ntb_transport_qp {
> > u64 rx_err_no_buf;
> > u64 rx_err_oflow;
> > u64 rx_err_ver;
> > + u64 rx_memcpy;
> > + u64 rx_async;
> > u64 tx_bytes;
> > u64 tx_pkts;
> > u64 tx_ring_full;
> > + u64 tx_err_no_buf;
> > + u64 tx_memcpy;
> > + u64 tx_async;
> > };
> >
> > struct ntb_transport_mw {
> > @@ -381,7 +401,7 @@ static ssize_t debugfs_read(struct file *filp, char __user *ubuf, size_t count,
> > char *buf;
> > ssize_t ret, out_offset, out_count;
> >
> > - out_count = 600;
> > + out_count = 1000;
> >
> > buf = kmalloc(out_count, GFP_KERNEL);
> > if (!buf)
> > @@ -396,6 +416,10 @@ static ssize_t debugfs_read(struct file *filp, char __user *ubuf, size_t count,
> > out_offset += snprintf(buf + out_offset, out_count - out_offset,
> > "rx_pkts - \t%llu\n", qp->rx_pkts);
> > out_offset += snprintf(buf + out_offset, out_count - out_offset,
> > + "rx_memcpy - \t%llu\n", qp->rx_memcpy);
> > + out_offset += snprintf(buf + out_offset, out_count - out_offset,
> > + "rx_async - \t%llu\n", qp->rx_async);
> > + out_offset += snprintf(buf + out_offset, out_count - out_offset,
> > "rx_ring_empty - %llu\n", qp->rx_ring_empty);
> > out_offset += snprintf(buf + out_offset, out_count - out_offset,
> > "rx_err_no_buf - %llu\n", qp->rx_err_no_buf);
> > @@ -415,8 +439,14 @@ static ssize_t debugfs_read(struct file *filp, char __user *ubuf, size_t count,
> > out_offset += snprintf(buf + out_offset, out_count - out_offset,
> > "tx_pkts - \t%llu\n", qp->tx_pkts);
> > out_offset += snprintf(buf + out_offset, out_count - out_offset,
> > + "tx_memcpy - \t%llu\n", qp->tx_memcpy);
> > + out_offset += snprintf(buf + out_offset, out_count - out_offset,
> > + "tx_async - \t%llu\n", qp->tx_async);
> > + out_offset += snprintf(buf + out_offset, out_count - out_offset,
> > "tx_ring_full - \t%llu\n", qp->tx_ring_full);
> > out_offset += snprintf(buf + out_offset, out_count - out_offset,
> > + "tx_err_no_buf - %llu\n", qp->tx_err_no_buf);
> > + out_offset += snprintf(buf + out_offset, out_count - out_offset,
> > "tx_mw - \t%p\n", qp->tx_mw);
> > out_offset += snprintf(buf + out_offset, out_count - out_offset,
> > "tx_index - \t%u\n", qp->tx_index);
> > @@ -488,11 +518,11 @@ static void ntb_transport_setup_qp_mw(struct ntb_transport *nt,
> > num_qps_mw = nt->max_qps / mw_max;
> >
> > rx_size = (unsigned int) nt->mw[mw_num].size / num_qps_mw;
> > - qp->remote_rx_info = nt->mw[mw_num].virt_addr +
> > - (qp_num / mw_max * rx_size);
> > + qp->rx_buff = nt->mw[mw_num].virt_addr + qp_num / mw_max * rx_size;
> > rx_size -= sizeof(struct ntb_rx_info);
> >
> > - qp->rx_buff = qp->remote_rx_info + 1;
> > + qp->remote_rx_info = qp->rx_buff + rx_size;
> > +
> > /* Due to housekeeping, there must be atleast 2 buffs */
> > qp->rx_max_frame = min(transport_mtu, rx_size / 2);
> > qp->rx_max_entry = rx_size / qp->rx_max_frame;
> > @@ -802,6 +832,7 @@ static void ntb_transport_init_queue(struct ntb_transport *nt,
> > struct ntb_transport_qp *qp;
> > unsigned int num_qps_mw, tx_size;
> > u8 mw_num, mw_max;
> > + u64 qp_offset;
> >
> > mw_max = ntb_max_mw(nt->ndev);
> > mw_num = QP_TO_MW(nt->ndev, qp_num);
> > @@ -820,11 +851,12 @@ static void ntb_transport_init_queue(struct ntb_transport *nt,
> > num_qps_mw = nt->max_qps / mw_max;
> >
> > tx_size = (unsigned int) ntb_get_mw_size(qp->ndev, mw_num) / num_qps_mw;
> > - qp->rx_info = ntb_get_mw_vbase(nt->ndev, mw_num) +
> > - (qp_num / mw_max * tx_size);
> > + qp_offset = qp_num / mw_max * tx_size;
> > + qp->tx_mw = ntb_get_mw_vbase(nt->ndev, mw_num) + qp_offset;
> > + qp->tx_mw_raw = ntb_get_mw_base(qp->ndev, mw_num) + qp_offset;
>
> Just a quibble with the name, why "raw" instead of "phys"?

What's in a name? ;-)

phys is fine.

>
> > tx_size -= sizeof(struct ntb_rx_info);
> > + qp->rx_info = qp->tx_mw + tx_size;
> >
> > - qp->tx_mw = qp->rx_info + 1;
> > /* Due to housekeeping, there must be atleast 2 buffs */
> > qp->tx_max_frame = min(transport_mtu, tx_size / 2);
> > qp->tx_max_entry = tx_size / qp->tx_max_frame;
> > @@ -956,13 +988,22 @@ void ntb_transport_free(void *transport)
> > kfree(nt);
> > }
> >
> > -static void ntb_rx_copy_task(struct ntb_transport_qp *qp,
> > - struct ntb_queue_entry *entry, void *offset)
> > +static void ntb_rx_copy_callback(void *data)
> > {
> > + struct ntb_queue_entry *entry = data;
> > + struct ntb_transport_qp *qp = entry->qp;
> > void *cb_data = entry->cb_data;
> > unsigned int len = entry->len;
> > + struct ntb_payload_header *hdr = entry->rx_hdr;
> > +
> > + wmb();
>
> What is this barrier paired with?

You will see a wmb being removed from ntb_process_rxc below. It is
VERY necessary to be here.

> > + hdr->flags = 0;
> >
> > - memcpy(entry->buf, offset, entry->len);
> > + /* If the callbacks come out of order, the writing of the index to the
> > + * last completed will be out of order. This may result in the the
> > + * receive stalling forever.
> > + */
>
> Is this for the case where we are bouncing back and forth between
> sync/async? Otherwise I do not see how transactions could get out of
> order given you allocate a channel once per queue. Is this comment
> saying that the iowrite32 is somehow a fix, or is this comment a
> FIXME?

There is a case for a mix, the "copy_bytes" variable above switches to
CPU for small transfers (which greatly increases throughput on small
transfers). The caveat to it is the need to flush the DMA engine to
prevent out-of-order completion. This comment is mainly a reminder of this issue.
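
In practice that flush is the err1 path of ntb_async_rx() above. A minimal sketch of the idea (dma_sync_wait() and qp->last_cookie are the driver's own; treating this as the complete fix for the ordering issue is an assumption):

        /* Before falling back to a CPU copy, drain any DMA copies already
         * submitted on this queue so their callbacks (and their writes of
         * the completed index) land first.  Otherwise the CPU copy could
         * advance rx_info->entry ahead of an older, still in-flight DMA
         * completion and stall the receiver.
         */
        if (qp->dma_chan)
                dma_sync_wait(qp->dma_chan, qp->last_cookie);

        ntb_memcpy_rx(entry, offset);
        qp->rx_memcpy++;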

> > + iowrite32(entry->index, &qp->rx_info->entry);
> >
> > ntb_list_add(&qp->ntb_rx_free_q_lock, &entry->entry, &qp->rx_free_q);
> >
> > @@ -970,6 +1011,80 @@ static void ntb_rx_copy_task(struct ntb_transport_qp *qp,
> > qp->rx_handler(qp, qp->cb_data, cb_data, len);
> > }
> >
> > +static void ntb_memcpy_rx(struct ntb_queue_entry *entry, void *offset)
> > +{
> > + void *buf = entry->buf;
> > + size_t len = entry->len;
> > +
> > + memcpy(buf, offset, len);
> > +
> > + ntb_rx_copy_callback(entry);
> > +}
> > +
> > +static void ntb_async_rx(struct ntb_queue_entry *entry, void *offset,
> > + size_t len)
> > +{
> > + struct dma_async_tx_descriptor *txd;
> > + struct ntb_transport_qp *qp = entry->qp;
> > + struct dma_chan *chan = qp->dma_chan;
> > + struct dma_device *device;
> > + dma_addr_t src, dest;
> > + dma_cookie_t cookie;
> > + void *buf = entry->buf;
> > + unsigned long flags;
> > +
> > + entry->len = len;
> > +
> > + if (!chan)
> > + goto err;
> > +
> > + device = chan->device;
> > +
> > + if (len < copy_bytes ||
> > + !is_dma_copy_aligned(device, __pa(offset), __pa(buf), len))
>
> __pa()'s necessary here? I don't think the alignment requirements
> will ever cross PAGE_OFFSET.

The __pa calls cast the void* to an unsigned long, thus keeping
is_dma_copy_aligned happy.

> > + goto err1;
> > +
> > + dest = dma_map_single(device->dev, buf, len, DMA_FROM_DEVICE);
> > + if (dma_mapping_error(device->dev, dest))
> > + goto err1;
> > +
> > + src = dma_map_single(device->dev, offset, len, DMA_TO_DEVICE);
> > + if (dma_mapping_error(device->dev, src))
> > + goto err2;
> > +
> > + flags = DMA_COMPL_DEST_UNMAP_SINGLE | DMA_COMPL_SRC_UNMAP_SINGLE |
> > + DMA_PREP_INTERRUPT;
> > + txd = device->device_prep_dma_memcpy(chan, dest, src, len, flags);
> > + if (!txd)
> > + goto err3;
> > +
> > + txd_clear_parent(txd);
> > + txd_clear_next(txd);
>
> Neither of these should be necessary.

I had crashes in early versions of the code without them, but a quick
test with them removed doesn't show any issues. Good catch.

> > + txd->callback = ntb_rx_copy_callback;
> > + txd->callback_param = entry;
> > +
> > + cookie = dmaengine_submit(txd);
> > + if (dma_submit_error(cookie))
> > + goto err3;
> > +
> > + qp->last_cookie = cookie;
> > +
> > + dma_async_issue_pending(chan);
>
> hmm... can this go in ntb_process_rx() so that the submission is
> batched? Cuts down on mmio.

I moved it down to ntb_transport_rx (after the calls to
ntb_process_rxc), and the performance seems to be roughly the same.
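
Roughly, and assuming ntb_transport_rx() keeps its current loop shape, the batched form looks something like this (a sketch, not the respun patch):

static void ntb_transport_rx(unsigned long data)
{
        struct ntb_transport_qp *qp = (struct ntb_transport_qp *)data;
        int rc;

        /* Submit as many copies as the ring currently allows... */
        do {
                rc = ntb_process_rxc(qp);
        } while (!rc);

        /* ...then kick the channel once, so the descriptors are issued
         * as a batch instead of one MMIO write per frame.
         */
        if (qp->dma_chan)
                dma_async_issue_pending(qp->dma_chan);
}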

> > + qp->rx_async++;
> > +
> > + return;
> > +
> > +err3:
> > + dma_unmap_single(device->dev, src, len, DMA_TO_DEVICE);
> > +err2:
> > + dma_unmap_single(device->dev, dest, len, DMA_FROM_DEVICE);
> > +err1:
> > + dma_sync_wait(chan, qp->last_cookie);
> > +err:
> > + ntb_memcpy_rx(entry, offset);
> > + qp->rx_memcpy++;
> > +}
> > +
> > static int ntb_process_rxc(struct ntb_transport_qp *qp)
> > {
> > struct ntb_payload_header *hdr;
> > @@ -1008,41 +1123,44 @@ static int ntb_process_rxc(struct ntb_transport_qp *qp)
> > if (hdr->flags & LINK_DOWN_FLAG) {
> > ntb_qp_link_down(qp);
> >
> > - ntb_list_add(&qp->ntb_rx_pend_q_lock, &entry->entry,
> > - &qp->rx_pend_q);
> > - goto out;
> > + goto err;
> > }
> >
> > dev_dbg(&ntb_query_pdev(qp->ndev)->dev,
> > "rx offset %u, ver %u - %d payload received, buf size %d\n",
> > qp->rx_index, hdr->ver, hdr->len, entry->len);
> >
> > - if (hdr->len <= entry->len) {
> > - entry->len = hdr->len;
> > - ntb_rx_copy_task(qp, entry, offset);
> > - } else {
> > - ntb_list_add(&qp->ntb_rx_pend_q_lock, &entry->entry,
> > - &qp->rx_pend_q);
> > + qp->rx_bytes += hdr->len;
> > + qp->rx_pkts++;
> >
> > + if (hdr->len > entry->len) {
> > qp->rx_err_oflow++;
> > dev_dbg(&ntb_query_pdev(qp->ndev)->dev,
> > "RX overflow! Wanted %d got %d\n",
> > hdr->len, entry->len);
> > +
> > + goto err;
> > }
> >
> > - qp->rx_bytes += hdr->len;
> > - qp->rx_pkts++;
> > + entry->index = qp->rx_index;
> > + entry->rx_hdr = hdr;
> >
> > -out:
> > - /* Ensure that the data is fully copied out before clearing the flag */
> > - wmb();
> > - hdr->flags = 0;
> > - iowrite32(qp->rx_index, &qp->rx_info->entry);
> > + ntb_async_rx(entry, offset, hdr->len);
> >
> > +out:
> > qp->rx_index++;
> > qp->rx_index %= qp->rx_max_entry;
> >
> > return 0;
> > +
> > +err:
> > + ntb_list_add(&qp->ntb_rx_pend_q_lock, &entry->entry,
> > + &qp->rx_pend_q);
> > + wmb();
> > + hdr->flags = 0;
> > + iowrite32(qp->rx_index, &qp->rx_info->entry);
> > +
> > + goto out;
> > }
> >
> > static void ntb_transport_rx(unsigned long data)
> > @@ -1070,19 +1188,12 @@ static void ntb_transport_rxc_db(void *data, int db_num)
> > tasklet_schedule(&qp->rx_work);
> > }
> >
> > -static void ntb_tx_copy_task(struct ntb_transport_qp *qp,
> > - struct ntb_queue_entry *entry,
> > - void __iomem *offset)
> > +static void ntb_tx_copy_callback(void *data)
> > {
> > - struct ntb_payload_header __iomem *hdr;
> > -
> > - memcpy_toio(offset, entry->buf, entry->len);
> > -
> > - hdr = offset + qp->tx_max_frame - sizeof(struct ntb_payload_header);
> > - iowrite32(entry->len, &hdr->len);
> > - iowrite32((u32) qp->tx_pkts, &hdr->ver);
> > + struct ntb_queue_entry *entry = data;
> > + struct ntb_transport_qp *qp = entry->qp;
> > + struct ntb_payload_header __iomem *hdr = entry->tx_hdr;
> >
> > - /* Ensure that the data is fully copied out before setting the flag */
> > wmb();
> > iowrite32(entry->flags | DESC_DONE_FLAG, &hdr->flags);
> >
> > @@ -1103,15 +1214,79 @@ static void ntb_tx_copy_task(struct ntb_transport_qp *qp,
> > ntb_list_add(&qp->ntb_tx_free_q_lock, &entry->entry, &qp->tx_free_q);
> > }
> >
> > -static int ntb_process_tx(struct ntb_transport_qp *qp,
> > - struct ntb_queue_entry *entry)
> > +static void ntb_memcpy_tx(struct ntb_queue_entry *entry, void __iomem *offset)
> > +{
> > + memcpy_toio(offset, entry->buf, entry->len);
> > +
> > + ntb_tx_copy_callback(entry);
> > +}
> > +
> > +static void ntb_async_tx(struct ntb_transport_qp *qp,
> > + struct ntb_queue_entry *entry)
> > {
> > + struct dma_async_tx_descriptor *txd;
> > + struct dma_chan *chan = qp->dma_chan;
> > + struct dma_device *device;
> > + dma_addr_t src, dest;
> > + dma_cookie_t cookie;
> > + struct ntb_payload_header __iomem *hdr;
> > void __iomem *offset;
> > + size_t len = entry->len;
> > + void *buf = entry->buf;
> > + unsigned long flags;
> >
> > offset = qp->tx_mw + qp->tx_max_frame * qp->tx_index;
> > + hdr = offset + qp->tx_max_frame - sizeof(struct ntb_payload_header);
> > + entry->tx_hdr = hdr;
> > +
> > + iowrite32(entry->len, &hdr->len);
> > + iowrite32((u32) qp->tx_pkts, &hdr->ver);
> > +
> > + if (!chan)
> > + goto err;
> >
> > - dev_dbg(&ntb_query_pdev(qp->ndev)->dev, "%lld - offset %p, tx %u, entry len %d flags %x buff %p\n",
> > - qp->tx_pkts, offset, qp->tx_index, entry->len, entry->flags,
> > + device = chan->device;
> > +
> > + dest = qp->tx_mw_raw + qp->tx_max_frame * qp->tx_index;
> > +
> > + if (len < copy_bytes ||
> > + !is_dma_copy_aligned(device, __pa(buf), dest, len))
>
> ditto on the use of __pa here.
>
> > + goto err;
> > +
> > + src = dma_map_single(device->dev, buf, len, DMA_TO_DEVICE);
> > + if (dma_mapping_error(device->dev, src))
> > + goto err;
> > +
> > + flags = DMA_COMPL_SRC_UNMAP_SINGLE | DMA_PREP_INTERRUPT;
> > + txd = device->device_prep_dma_memcpy(chan, dest, src, len, flags);
> > + if (!txd)
> > + goto err1;
> > +
> > + txd_clear_parent(txd);
> > + txd_clear_next(txd);
>
> ...again not needed.
>
> > + txd->callback = ntb_tx_copy_callback;
> > + txd->callback_param = entry;
> > +
> > + cookie = dmaengine_submit(txd);
> > + if (dma_submit_error(cookie))
> > + goto err1;
> > +
> > + dma_async_issue_pending(chan);
> > + qp->tx_async++;
> > +
> > + return;
> > +err1:
> > + dma_unmap_single(device->dev, src, len, DMA_TO_DEVICE);
> > +err:
> > + ntb_memcpy_tx(entry, offset);
> > + qp->tx_memcpy++;
> > +}
> > +
> > +static int ntb_process_tx(struct ntb_transport_qp *qp,
> > + struct ntb_queue_entry *entry)
> > +{
> > + dev_dbg(&ntb_query_pdev(qp->ndev)->dev, "%lld - tx %u, entry len %d flags %x buff %p\n",
> > + qp->tx_pkts, qp->tx_index, entry->len, entry->flags,
> > entry->buf);
> > if (qp->tx_index == qp->remote_rx_info->entry) {
> > qp->tx_ring_full++;
> > @@ -1127,7 +1302,7 @@ static int ntb_process_tx(struct ntb_transport_qp *qp,
> > return 0;
> > }
> >
> > - ntb_tx_copy_task(qp, entry, offset);
> > + ntb_async_tx(qp, entry);
> >
> > qp->tx_index++;
> > qp->tx_index %= qp->tx_max_entry;
> > @@ -1213,11 +1388,16 @@ ntb_transport_create_queue(void *data, struct pci_dev *pdev,
> > qp->tx_handler = handlers->tx_handler;
> > qp->event_handler = handlers->event_handler;
> >
> > + qp->dma_chan = dma_find_channel(DMA_MEMCPY);
> > + if (!qp->dma_chan)
> > + dev_info(&pdev->dev, "Unable to allocate private DMA channel, using CPU instead\n");
>
> You can drop "private" since this is a public allocation.

Good catch

> > +
> > for (i = 0; i < NTB_QP_DEF_NUM_ENTRIES; i++) {
> > entry = kzalloc(sizeof(struct ntb_queue_entry), GFP_ATOMIC);
> > if (!entry)
> > goto err1;
> >
> > + entry->qp = qp;
> > ntb_list_add(&qp->ntb_rx_free_q_lock, &entry->entry,
> > &qp->rx_free_q);
> > }
> > @@ -1227,6 +1407,7 @@ ntb_transport_create_queue(void *data, struct pci_dev *pdev,
> > if (!entry)
> > goto err2;
> >
> > + entry->qp = qp;
> > ntb_list_add(&qp->ntb_tx_free_q_lock, &entry->entry,
> > &qp->tx_free_q);
> > }
> > @@ -1272,11 +1453,14 @@ void ntb_transport_free_queue(struct ntb_transport_qp *qp)
> >
> > pdev = ntb_query_pdev(qp->ndev);
> >
> > - cancel_delayed_work_sync(&qp->link_work);
> > + if (qp->dma_chan)
> > + dmaengine_terminate_all(qp->dma_chan);
>
> Do you need this or can you just wait for all outstanding tx / rx to quiesce?

I'd prefer not to spin in the shutdown code waiting for the channel to
quiesce. I suppose I could be nice and give it a small time to do so,
before I smash it in the head with a rock.

> > ntb_unregister_db_callback(qp->ndev, qp->qp_num);
> > tasklet_disable(&qp->rx_work);
> >
> > + cancel_delayed_work_sync(&qp->link_work);
> > +
> > while ((entry = ntb_list_rm(&qp->ntb_rx_free_q_lock, &qp->rx_free_q)))
> > kfree(entry);
> >
> > @@ -1382,8 +1566,10 @@ int ntb_transport_tx_enqueue(struct ntb_transport_qp *qp, void *cb, void *data,
> > return -EINVAL;
> >
> > entry = ntb_list_rm(&qp->ntb_tx_free_q_lock, &qp->tx_free_q);
> > - if (!entry)
> > + if (!entry) {
> > + qp->tx_err_no_buf++;
> > return -ENOMEM;
> > + }
> >
> > entry->cb_data = cb;
> > entry->buf = data;
> > @@ -1499,9 +1685,18 @@ EXPORT_SYMBOL_GPL(ntb_transport_qp_num);
> > */
> > unsigned int ntb_transport_max_size(struct ntb_transport_qp *qp)
> > {
> > + unsigned int max;
> > +
> > if (!qp)
> > return 0;
> >
> > - return qp->tx_max_frame - sizeof(struct ntb_payload_header);
> > + if (!qp->dma_chan)
> > + return qp->tx_max_frame - sizeof(struct ntb_payload_header);
> > +
> > + /* If DMA engine usage is possible, try to find the max size for that */
> > + max = qp->tx_max_frame - sizeof(struct ntb_payload_header);
> > + max -= max % (1 << qp->dma_chan->device->copy_align);
>
> Unless I missed it you need a dmaengine_get() dmaengine_put() pair in
> your driver setup/teardown. dmaengine_get() notifies dmaengine of a
> dma_find_channel() user.

Good catch.
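
For reference, the pair is global (no channel argument), so one possible placement is around the existing create/free paths; this is only a sketch, and where it should actually live is an open question:

        /* ntb_transport_create_queue(): become a dma_find_channel() user
         * before asking for a channel.
         */
        dmaengine_get();
        qp->dma_chan = dma_find_channel(DMA_MEMCPY);
        if (!qp->dma_chan)
                dev_info(&pdev->dev, "Unable to allocate DMA channel, using CPU instead\n");

        /* ntb_transport_free_queue(): drop the reference once the queue
         * has quiesced.
         */
        dmaengine_put();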

Thanks for the review! I'll make the changes and do another spin of
the patch.

Thanks,
Jon

2013-08-19 23:36:28

by Dan Williams

[permalink] [raw]
Subject: Re: [PATCH 09/15] NTB: Use DMA Engine to Transmit and Receive



On 8/19/13 1:37 PM, "Jon Mason" <[email protected]> wrote:

>On Mon, Aug 19, 2013 at 03:01:54AM -0700, Dan Williams wrote:
>> On Fri, Aug 2, 2013 at 10:35 AM, Jon Mason <[email protected]> wrote:
>> > Allocate and use a DMA engine channel to transmit and receive data
>>over
>> > NTB. If none is allocated, fall back to using the CPU to transfer
>>data.
>> >
>> > Cc: Dan Williams <[email protected]>
>> > Cc: Vinod Koul <[email protected]>
>> > Cc: Dave Jiang <[email protected]>
>> > Signed-off-by: Jon Mason <[email protected]>
>> > ---
>> > drivers/ntb/ntb_hw.c | 17 +++
>> > drivers/ntb/ntb_hw.h | 1 +
>> > drivers/ntb/ntb_transport.c | 285
>>++++++++++++++++++++++++++++++++++++-------
>> > 3 files changed, 258 insertions(+), 45 deletions(-)
>> >
>> > diff --git a/drivers/ntb/ntb_hw.c b/drivers/ntb/ntb_hw.c
>> > index 1d8e551..014222c 100644
>> > --- a/drivers/ntb/ntb_hw.c
>> > +++ b/drivers/ntb/ntb_hw.c
>> > @@ -350,6 +350,23 @@ int ntb_read_remote_spad(struct ntb_device
>>*ndev, unsigned int idx, u32 *val)
>> > }
>> >
>> > /**
>> > + * ntb_get_mw_base() - get addr for the NTB memory window
>> > + * @ndev: pointer to ntb_device instance
>> > + * @mw: memory window number
>> > + *
>> > + * This function provides the base address of the memory window
>>specified.
>> > + *
>> > + * RETURNS: address, or NULL on error.
>> > + */
>> > +resource_size_t ntb_get_mw_base(struct ntb_device *ndev, unsigned
>>int mw)
>> > +{
>> > + if (mw >= ntb_max_mw(ndev))
>> > + return 0;
>> > +
>> > + return pci_resource_start(ndev->pdev, MW_TO_BAR(mw));
>> > +}

Nothing does error checking on this return value. I think the code should
either be sure that 'mw' is valid (mw_num is passed to the
ntb_get_mw_vbase helper too) and delete the check, or at least make it a
WARN_ONCE. The former seems a tad cleaner to me.


>> > +
>> > +/**
>> > * ntb_get_mw_vbase() - get virtual addr for the NTB memory window
>> > * @ndev: pointer to ntb_device instance
>> > * @mw: memory window number
>> > diff --git a/drivers/ntb/ntb_hw.h b/drivers/ntb/ntb_hw.h
>> > index b03de80..ab5f768 100644
>> > --- a/drivers/ntb/ntb_hw.h
>> > +++ b/drivers/ntb/ntb_hw.h
>> > @@ -240,6 +240,7 @@ int ntb_write_local_spad(struct ntb_device *ndev,
>>unsigned int idx, u32 val);
>> > int ntb_read_local_spad(struct ntb_device *ndev, unsigned int idx,
>>u32 *val);
>> > int ntb_write_remote_spad(struct ntb_device *ndev, unsigned int idx,
>>u32 val);
>> > int ntb_read_remote_spad(struct ntb_device *ndev, unsigned int idx,
>>u32 *val);
>> > +resource_size_t ntb_get_mw_base(struct ntb_device *ndev, unsigned
>>int mw);
>> > void __iomem *ntb_get_mw_vbase(struct ntb_device *ndev, unsigned int
>>mw);
>> > u64 ntb_get_mw_size(struct ntb_device *ndev, unsigned int mw);
>> > void ntb_ring_sdb(struct ntb_device *ndev, unsigned int idx);
>> > diff --git a/drivers/ntb/ntb_transport.c b/drivers/ntb/ntb_transport.c
>> > index f7380e9..73a35e4 100644
>> > --- a/drivers/ntb/ntb_transport.c
>> > +++ b/drivers/ntb/ntb_transport.c
>> > @@ -47,6 +47,7 @@
>> > */
>> > #include <linux/debugfs.h>
>> > #include <linux/delay.h>
>> > +#include <linux/dmaengine.h>
>> > #include <linux/dma-mapping.h>
>> > #include <linux/errno.h>
>> > #include <linux/export.h>
>> > @@ -68,6 +69,10 @@ static unsigned char max_num_clients;
>> > module_param(max_num_clients, byte, 0644);
>> > MODULE_PARM_DESC(max_num_clients, "Maximum number of NTB transport
>>clients");
>> >
>> > +static unsigned int copy_bytes = 1024;
>> > +module_param(copy_bytes, uint, 0644);
>> > +MODULE_PARM_DESC(copy_bytes, "Threshold under which NTB will use the
>>CPU to copy instead of DMA");
>> > +
>> > struct ntb_queue_entry {
>> > /* ntb_queue list reference */
>> > struct list_head entry;
>> > @@ -76,6 +81,13 @@ struct ntb_queue_entry {
>> > void *buf;
>> > unsigned int len;
>> > unsigned int flags;
>> > +
>> > + struct ntb_transport_qp *qp;
>> > + union {
>> > + struct ntb_payload_header __iomem *tx_hdr;
>> > + struct ntb_payload_header *rx_hdr;
>> > + };
>> > + unsigned int index;
>> > };
>> >
>> > struct ntb_rx_info {
>> > @@ -86,6 +98,7 @@ struct ntb_transport_qp {
>> > struct ntb_transport *transport;
>> > struct ntb_device *ndev;
>> > void *cb_data;
>> > + struct dma_chan *dma_chan;
>> >
>> > bool client_ready;
>> > bool qp_link;
>> > @@ -99,6 +112,7 @@ struct ntb_transport_qp {
>> > struct list_head tx_free_q;
>> > spinlock_t ntb_tx_free_q_lock;
>> > void __iomem *tx_mw;
>> > + dma_addr_t tx_mw_raw;
>> > unsigned int tx_index;
>> > unsigned int tx_max_entry;
>> > unsigned int tx_max_frame;
>> > @@ -114,6 +128,7 @@ struct ntb_transport_qp {
>> > unsigned int rx_index;
>> > unsigned int rx_max_entry;
>> > unsigned int rx_max_frame;
>> > + dma_cookie_t last_cookie;
>> >
>> > void (*event_handler) (void *data, int status);
>> > struct delayed_work link_work;
>> > @@ -129,9 +144,14 @@ struct ntb_transport_qp {
>> > u64 rx_err_no_buf;
>> > u64 rx_err_oflow;
>> > u64 rx_err_ver;
>> > + u64 rx_memcpy;
>> > + u64 rx_async;
>> > u64 tx_bytes;
>> > u64 tx_pkts;
>> > u64 tx_ring_full;
>> > + u64 tx_err_no_buf;
>> > + u64 tx_memcpy;
>> > + u64 tx_async;
>> > };
>> >
>> > struct ntb_transport_mw {
>> > @@ -381,7 +401,7 @@ static ssize_t debugfs_read(struct file *filp,
>>char __user *ubuf, size_t count,
>> > char *buf;
>> > ssize_t ret, out_offset, out_count;
>> >
>> > - out_count = 600;
>> > + out_count = 1000;
>> >
>> > buf = kmalloc(out_count, GFP_KERNEL);
>> > if (!buf)
>> > @@ -396,6 +416,10 @@ static ssize_t debugfs_read(struct file *filp,
>>char __user *ubuf, size_t count,
>> > out_offset += snprintf(buf + out_offset, out_count -
>>out_offset,
>> > "rx_pkts - \t%llu\n", qp->rx_pkts);
>> > out_offset += snprintf(buf + out_offset, out_count -
>>out_offset,
>> > + "rx_memcpy - \t%llu\n", qp->rx_memcpy);
>> > + out_offset += snprintf(buf + out_offset, out_count -
>>out_offset,
>> > + "rx_async - \t%llu\n", qp->rx_async);
>> > + out_offset += snprintf(buf + out_offset, out_count -
>>out_offset,
>> > "rx_ring_empty - %llu\n",
>>qp->rx_ring_empty);
>> > out_offset += snprintf(buf + out_offset, out_count -
>>out_offset,
>> > "rx_err_no_buf - %llu\n",
>>qp->rx_err_no_buf);
>> > @@ -415,8 +439,14 @@ static ssize_t debugfs_read(struct file *filp,
>>char __user *ubuf, size_t count,
>> > out_offset += snprintf(buf + out_offset, out_count -
>>out_offset,
>> > "tx_pkts - \t%llu\n", qp->tx_pkts);
>> > out_offset += snprintf(buf + out_offset, out_count -
>>out_offset,
>> > + "tx_memcpy - \t%llu\n", qp->tx_memcpy);
>> > + out_offset += snprintf(buf + out_offset, out_count -
>>out_offset,
>> > + "tx_async - \t%llu\n", qp->tx_async);
>> > + out_offset += snprintf(buf + out_offset, out_count -
>>out_offset,
>> > "tx_ring_full - \t%llu\n",
>>qp->tx_ring_full);
>> > out_offset += snprintf(buf + out_offset, out_count -
>>out_offset,
>> > + "tx_err_no_buf - %llu\n",
>>qp->tx_err_no_buf);
>> > + out_offset += snprintf(buf + out_offset, out_count -
>>out_offset,
>> > "tx_mw - \t%p\n", qp->tx_mw);
>> > out_offset += snprintf(buf + out_offset, out_count -
>>out_offset,
>> > "tx_index - \t%u\n", qp->tx_index);
>> > @@ -488,11 +518,11 @@ static void ntb_transport_setup_qp_mw(struct
>>ntb_transport *nt,
>> > num_qps_mw = nt->max_qps / mw_max;
>> >
>> > rx_size = (unsigned int) nt->mw[mw_num].size / num_qps_mw;
>> > - qp->remote_rx_info = nt->mw[mw_num].virt_addr +
>> > - (qp_num / mw_max * rx_size);
>> > + qp->rx_buff = nt->mw[mw_num].virt_addr + qp_num / mw_max *
>>rx_size;
>> > rx_size -= sizeof(struct ntb_rx_info);
>> >
>> > - qp->rx_buff = qp->remote_rx_info + 1;
>> > + qp->remote_rx_info = qp->rx_buff + rx_size;
>> > +
>> > /* Due to housekeeping, there must be atleast 2 buffs */
>> > qp->rx_max_frame = min(transport_mtu, rx_size / 2);
>> > qp->rx_max_entry = rx_size / qp->rx_max_frame;
>> > @@ -802,6 +832,7 @@ static void ntb_transport_init_queue(struct
>>ntb_transport *nt,
>> > struct ntb_transport_qp *qp;
>> > unsigned int num_qps_mw, tx_size;
>> > u8 mw_num, mw_max;
>> > + u64 qp_offset;
>> >
>> > mw_max = ntb_max_mw(nt->ndev);
>> > mw_num = QP_TO_MW(nt->ndev, qp_num);
>> > @@ -820,11 +851,12 @@ static void ntb_transport_init_queue(struct
>>ntb_transport *nt,
>> > num_qps_mw = nt->max_qps / mw_max;
>> >
>> > tx_size = (unsigned int) ntb_get_mw_size(qp->ndev, mw_num) /
>>num_qps_mw;
>> > - qp->rx_info = ntb_get_mw_vbase(nt->ndev, mw_num) +
>> > - (qp_num / mw_max * tx_size);
>> > + qp_offset = qp_num / mw_max * tx_size;
>> > + qp->tx_mw = ntb_get_mw_vbase(nt->ndev, mw_num) + qp_offset;
>> > + qp->tx_mw_raw = ntb_get_mw_base(qp->ndev, mw_num) + qp_offset;
>>
>> Just a quibble with the name, why "raw" instead of "phys"?
>
>What's in a name? ;-)
>
>phys is fine.
>
>>
>> > tx_size -= sizeof(struct ntb_rx_info);
>> > + qp->rx_info = qp->tx_mw + tx_size;
>> >
>> > - qp->tx_mw = qp->rx_info + 1;
>> > /* Due to housekeeping, there must be atleast 2 buffs */
>> > qp->tx_max_frame = min(transport_mtu, tx_size / 2);
>> > qp->tx_max_entry = tx_size / qp->tx_max_frame;
>> > @@ -956,13 +988,22 @@ void ntb_transport_free(void *transport)
>> > kfree(nt);
>> > }
>> >
>> > -static void ntb_rx_copy_task(struct ntb_transport_qp *qp,
>> > - struct ntb_queue_entry *entry, void
>>*offset)
>> > +static void ntb_rx_copy_callback(void *data)
>> > {
>> > + struct ntb_queue_entry *entry = data;
>> > + struct ntb_transport_qp *qp = entry->qp;
>> > void *cb_data = entry->cb_data;
>> > unsigned int len = entry->len;
>> > + struct ntb_payload_header *hdr = entry->rx_hdr;
>> > +
>> > + wmb();
>>
>> What is this barrier paired with?
>
>You will see a wmb being removed from ntb_process_rxc below. It is
>VERY necessary to be here.

If it's not paired with a read barrier it needs a comment about what it is
ordering. I thought checkpatch would squawk about this? I'm curious why
not a full read back to verify the write completes vs mmiowb()? wmb() is
somewhere in the middle.
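
For illustration, the sort of comment the barrier could carry in ntb_rx_copy_callback(); the wording below is a guess at the intent, not text from the patch:

        /* Make sure the payload has been fully copied out of the ring
         * slot (whether by the CPU or by the DMA engine) before the flag
         * is cleared and the slot is handed back to the peer via
         * rx_info->entry.
         */
        wmb();
        hdr->flags = 0;

        iowrite32(entry->index, &qp->rx_info->entry);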

>
>> > + hdr->flags = 0;
>> >
>> > - memcpy(entry->buf, offset, entry->len);
>> > + /* If the callbacks come out of order, the writing of the
>>index to the
>> > + * last completed will be out of order. This may result in
>>the the
>> > + * receive stalling forever.
>> > + */
>>
>> Is this for the case where we are bouncing back and forth between
>> sync/async? Otherwise I do not see how transactions could get out of
>> order given you allocate a channel once per queue. Is this comment
>> saying that the iowrite32 is somehow a fix, or is this comment a
>> FIXME?
>
>There is a case for a mix, the "copy_bytes" variable above switches to
>CPU for small transfers (which greatly increases throughput on small
>transfers). The caveat to it is the need to flush the DMA engine to
>prevent out-of-order completion. This comment is mainly a reminder of this issue.

So this is going forward with the stall as a known issue? The next patch
should just do the sync to prevent the re-ordering, right?

>
>> > + iowrite32(entry->index, &qp->rx_info->entry);
>> >
>> > ntb_list_add(&qp->ntb_rx_free_q_lock, &entry->entry,
>>&qp->rx_free_q);
>> >
>> > @@ -970,6 +1011,80 @@ static void ntb_rx_copy_task(struct
>>ntb_transport_qp *qp,
>> > qp->rx_handler(qp, qp->cb_data, cb_data, len);
>> > }
>> >
>> > +static void ntb_memcpy_rx(struct ntb_queue_entry *entry, void
>>*offset)
>> > +{
>> > + void *buf = entry->buf;
>> > + size_t len = entry->len;
>> > +
>> > + memcpy(buf, offset, len);
>> > +
>> > + ntb_rx_copy_callback(entry);
>> > +}
>> > +
>> > +static void ntb_async_rx(struct ntb_queue_entry *entry, void *offset,
>> > + size_t len)
>> > +{
>> > + struct dma_async_tx_descriptor *txd;
>> > + struct ntb_transport_qp *qp = entry->qp;
>> > + struct dma_chan *chan = qp->dma_chan;
>> > + struct dma_device *device;
>> > + dma_addr_t src, dest;
>> > + dma_cookie_t cookie;
>> > + void *buf = entry->buf;
>> > + unsigned long flags;
>> > +
>> > + entry->len = len;
>> > +
>> > + if (!chan)
>> > + goto err;
>> > +
>> > + device = chan->device;
>> > +
>> > + if (len < copy_bytes ||
>> > + !is_dma_copy_aligned(device, __pa(offset), __pa(buf),
>>len))
>>
>> __pa()'s necessary here? I don't think the alignment requirements
>> will ever cross PAGE_OFFSET.
>
>The __pa calls cast the void* to an unsigned long, thus keeping
>is_dma_copy_aligned happy.

Then "(unsigned long) offset, (unsigned long) buf? right? Why munge the
value?
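
i.e., something along these lines (sketch):

        if (len < copy_bytes ||
            !is_dma_copy_aligned(device, (unsigned long) offset,
                                 (unsigned long) buf, len))
                goto err1;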

>
>> > + goto err1;
>> > +
>> > + dest = dma_map_single(device->dev, buf, len, DMA_FROM_DEVICE);
>> > + if (dma_mapping_error(device->dev, dest))
>> > + goto err1;
>> > +
>> > + src = dma_map_single(device->dev, offset, len, DMA_TO_DEVICE);
>> > + if (dma_mapping_error(device->dev, src))
>> > + goto err2;
>> > +
>> > + flags = DMA_COMPL_DEST_UNMAP_SINGLE |
>>DMA_COMPL_SRC_UNMAP_SINGLE |
>> > + DMA_PREP_INTERRUPT;
>> > + txd = device->device_prep_dma_memcpy(chan, dest, src, len,
>>flags);
>> > + if (!txd)
>> > + goto err3;
>> > +
>> > + txd_clear_parent(txd);
>> > + txd_clear_next(txd);
>>
>> Neither of these should be necessary.
>
>I had crashes in early versions of the code without them, but a quick
>test with them removed doesn't show any issues. Good catch.
>
>> > + txd->callback = ntb_rx_copy_callback;
>> > + txd->callback_param = entry;
>> > +
>> > + cookie = dmaengine_submit(txd);
>> > + if (dma_submit_error(cookie))
>> > + goto err3;
>> > +
>> > + qp->last_cookie = cookie;
>> > +
>> > + dma_async_issue_pending(chan);
>>
>> hmm... can this go in ntb_process_rx() so that the submission is
>> batched? Cuts down on mmio.
>
>I moved it down to ntb_transport_rx (after the calls to
>ntb_process_rxc), and the performance seems to be roughly the same.

Yeah, not expecting it to be noticeable, but conceptually

submit
submit
submit
submit
issue


Is nicer than:

submit
issue
submit
issue





>
>> > + qp->rx_async++;
>> > +
>> > + return;
>> > +
>> > +err3:
>> > + dma_unmap_single(device->dev, src, len, DMA_TO_DEVICE);
>> > +err2:
>> > + dma_unmap_single(device->dev, dest, len, DMA_FROM_DEVICE);
>> > +err1:
>> > + dma_sync_wait(chan, qp->last_cookie);
>> > +err:
>> > + ntb_memcpy_rx(entry, offset);
>> > + qp->rx_memcpy++;
>> > +}
>> > +
>> > static int ntb_process_rxc(struct ntb_transport_qp *qp)
>> > {
>> > struct ntb_payload_header *hdr;
>> > @@ -1008,41 +1123,44 @@ static int ntb_process_rxc(struct
>>ntb_transport_qp *qp)
>> > if (hdr->flags & LINK_DOWN_FLAG) {
>> > ntb_qp_link_down(qp);
>> >
>> > - ntb_list_add(&qp->ntb_rx_pend_q_lock, &entry->entry,
>> > - &qp->rx_pend_q);
>> > - goto out;
>> > + goto err;
>> > }
>> >
>> > dev_dbg(&ntb_query_pdev(qp->ndev)->dev,
>> > "rx offset %u, ver %u - %d payload received, buf size
>>%d\n",
>> > qp->rx_index, hdr->ver, hdr->len, entry->len);
>> >
>> > - if (hdr->len <= entry->len) {
>> > - entry->len = hdr->len;
>> > - ntb_rx_copy_task(qp, entry, offset);
>> > - } else {
>> > - ntb_list_add(&qp->ntb_rx_pend_q_lock, &entry->entry,
>> > - &qp->rx_pend_q);
>> > + qp->rx_bytes += hdr->len;
>> > + qp->rx_pkts++;
>> >
>> > + if (hdr->len > entry->len) {
>> > qp->rx_err_oflow++;
>> > dev_dbg(&ntb_query_pdev(qp->ndev)->dev,
>> > "RX overflow! Wanted %d got %d\n",
>> > hdr->len, entry->len);
>> > +
>> > + goto err;
>> > }
>> >
>> > - qp->rx_bytes += hdr->len;
>> > - qp->rx_pkts++;
>> > + entry->index = qp->rx_index;
>> > + entry->rx_hdr = hdr;
>> >
>> > -out:
>> > - /* Ensure that the data is fully copied out before clearing
>>the flag */
>> > - wmb();
>> > - hdr->flags = 0;
>> > - iowrite32(qp->rx_index, &qp->rx_info->entry);
>> > + ntb_async_rx(entry, offset, hdr->len);
>> >
>> > +out:
>> > qp->rx_index++;
>> > qp->rx_index %= qp->rx_max_entry;
>> >
>> > return 0;
>> > +
>> > +err:
>> > + ntb_list_add(&qp->ntb_rx_pend_q_lock, &entry->entry,
>> > + &qp->rx_pend_q);
>> > + wmb();
>> > + hdr->flags = 0;
>> > + iowrite32(qp->rx_index, &qp->rx_info->entry);
>> > +
>> > + goto out;
>> > }
>> >
>> > static void ntb_transport_rx(unsigned long data)
>> > @@ -1070,19 +1188,12 @@ static void ntb_transport_rxc_db(void *data,
>>int db_num)
>> > tasklet_schedule(&qp->rx_work);
>> > }
>> >
>> > -static void ntb_tx_copy_task(struct ntb_transport_qp *qp,
>> > - struct ntb_queue_entry *entry,
>> > - void __iomem *offset)
>> > +static void ntb_tx_copy_callback(void *data)
>> > {
>> > - struct ntb_payload_header __iomem *hdr;
>> > -
>> > - memcpy_toio(offset, entry->buf, entry->len);
>> > -
>> > - hdr = offset + qp->tx_max_frame - sizeof(struct
>>ntb_payload_header);
>> > - iowrite32(entry->len, &hdr->len);
>> > - iowrite32((u32) qp->tx_pkts, &hdr->ver);
>> > + struct ntb_queue_entry *entry = data;
>> > + struct ntb_transport_qp *qp = entry->qp;
>> > + struct ntb_payload_header __iomem *hdr = entry->tx_hdr;
>> >
>> > - /* Ensure that the data is fully copied out before setting
>>the flag */
>> > wmb();
>> > iowrite32(entry->flags | DESC_DONE_FLAG, &hdr->flags);
>> >
>> > @@ -1103,15 +1214,79 @@ static void ntb_tx_copy_task(struct
>>ntb_transport_qp *qp,
>> > ntb_list_add(&qp->ntb_tx_free_q_lock, &entry->entry,
>>&qp->tx_free_q);
>> > }
>> >
>> > -static int ntb_process_tx(struct ntb_transport_qp *qp,
>> > - struct ntb_queue_entry *entry)
>> > +static void ntb_memcpy_tx(struct ntb_queue_entry *entry, void
>>__iomem *offset)
>> > +{
>> > + memcpy_toio(offset, entry->buf, entry->len);
>> > +
>> > + ntb_tx_copy_callback(entry);
>> > +}
>> > +
>> > +static void ntb_async_tx(struct ntb_transport_qp *qp,
>> > + struct ntb_queue_entry *entry)
>> > {
>> > + struct dma_async_tx_descriptor *txd;
>> > + struct dma_chan *chan = qp->dma_chan;
>> > + struct dma_device *device;
>> > + dma_addr_t src, dest;
>> > + dma_cookie_t cookie;
>> > + struct ntb_payload_header __iomem *hdr;
>> > void __iomem *offset;
>> > + size_t len = entry->len;
>> > + void *buf = entry->buf;
>> > + unsigned long flags;
>> >
>> > offset = qp->tx_mw + qp->tx_max_frame * qp->tx_index;
>> > + hdr = offset + qp->tx_max_frame - sizeof(struct
>>ntb_payload_header);
>> > + entry->tx_hdr = hdr;
>> > +
>> > + iowrite32(entry->len, &hdr->len);
>> > + iowrite32((u32) qp->tx_pkts, &hdr->ver);
>> > +
>> > + if (!chan)
>> > + goto err;
>> >
>> > - dev_dbg(&ntb_query_pdev(qp->ndev)->dev, "%lld - offset %p, tx
>>%u, entry len %d flags %x buff %p\n",
>> > - qp->tx_pkts, offset, qp->tx_index, entry->len,
>>entry->flags,
>> > + device = chan->device;
>> > +
>> > + dest = qp->tx_mw_raw + qp->tx_max_frame * qp->tx_index;
>> > +
>> > + if (len < copy_bytes ||
>> > + !is_dma_copy_aligned(device, __pa(buf), dest, len))
>>
>> ditto on the use of __pa here.
>>
>> > + goto err;
>> > +
>> > + src = dma_map_single(device->dev, buf, len, DMA_TO_DEVICE);
>> > + if (dma_mapping_error(device->dev, src))
>> > + goto err;
>> > +
>> > + flags = DMA_COMPL_SRC_UNMAP_SINGLE | DMA_PREP_INTERRUPT;
>> > + txd = device->device_prep_dma_memcpy(chan, dest, src, len,
>>flags);
>> > + if (!txd)
>> > + goto err1;
>> > +
>> > + txd_clear_parent(txd);
>> > + txd_clear_next(txd);
>>
>> ...again not needed.
>>
>> > + txd->callback = ntb_tx_copy_callback;
>> > + txd->callback_param = entry;
>> > +
>> > + cookie = dmaengine_submit(txd);
>> > + if (dma_submit_error(cookie))
>> > + goto err1;
>> > +
>> > + dma_async_issue_pending(chan);
>> > + qp->tx_async++;
>> > +
>> > + return;
>> > +err1:
>> > + dma_unmap_single(device->dev, src, len, DMA_TO_DEVICE);
>> > +err:
>> > + ntb_memcpy_tx(entry, offset);
>> > + qp->tx_memcpy++;
>> > +}
>> > +
>> > +static int ntb_process_tx(struct ntb_transport_qp *qp,
>> > + struct ntb_queue_entry *entry)
>> > +{
>> > + dev_dbg(&ntb_query_pdev(qp->ndev)->dev, "%lld - tx %u, entry
>>len %d flags %x buff %p\n",
>> > + qp->tx_pkts, qp->tx_index, entry->len, entry->flags,
>> > entry->buf);
>> > if (qp->tx_index == qp->remote_rx_info->entry) {
>> > qp->tx_ring_full++;
>> > @@ -1127,7 +1302,7 @@ static int ntb_process_tx(struct
>>ntb_transport_qp *qp,
>> > return 0;
>> > }
>> >
>> > - ntb_tx_copy_task(qp, entry, offset);
>> > + ntb_async_tx(qp, entry);
>> >
>> > qp->tx_index++;
>> > qp->tx_index %= qp->tx_max_entry;
>> > @@ -1213,11 +1388,16 @@ ntb_transport_create_queue(void *data, struct
>>pci_dev *pdev,
>> > qp->tx_handler = handlers->tx_handler;
>> > qp->event_handler = handlers->event_handler;
>> >
>> > + qp->dma_chan = dma_find_channel(DMA_MEMCPY);
>> > + if (!qp->dma_chan)
>> > + dev_info(&pdev->dev, "Unable to allocate private DMA
>>channel, using CPU instead\n");
>>
>> You can drop "private" since this is a public allocation.
>
>Good catch
>
>> > +
>> > for (i = 0; i < NTB_QP_DEF_NUM_ENTRIES; i++) {
>> > entry = kzalloc(sizeof(struct ntb_queue_entry),
>>GFP_ATOMIC);
>> > if (!entry)
>> > goto err1;
>> >
>> > + entry->qp = qp;
>> > ntb_list_add(&qp->ntb_rx_free_q_lock, &entry->entry,
>> > &qp->rx_free_q);
>> > }
>> > @@ -1227,6 +1407,7 @@ ntb_transport_create_queue(void *data, struct
>>pci_dev *pdev,
>> > if (!entry)
>> > goto err2;
>> >
>> > + entry->qp = qp;
>> > ntb_list_add(&qp->ntb_tx_free_q_lock, &entry->entry,
>> > &qp->tx_free_q);
>> > }
>> > @@ -1272,11 +1453,14 @@ void ntb_transport_free_queue(struct
>>ntb_transport_qp *qp)
>> >
>> > pdev = ntb_query_pdev(qp->ndev);
>> >
>> > - cancel_delayed_work_sync(&qp->link_work);
>> > + if (qp->dma_chan)
>> > + dmaengine_terminate_all(qp->dma_chan);
>>
>> Do you need this or can you just wait for all outstanding tx / rx to
>>quiesce?
>
>I'd prefer not to spin in the shutdown code waiting for the channel to
>quiesce. I suppose I could be nice and give it a small time to do so,
>before I smash it in the head with a rock.

But ->device_control is not a mandatory operation. You're already sync
waiting for the workqueue to die.
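
A sketch of waiting instead (hedged: in the posted patch last_cookie only tracks the receive side, so using it to quiesce the whole queue is an assumption):

        /* Quiesce by waiting rather than tearing the channel down;
         * dmaengine_terminate_all() goes through ->device_control, which
         * a channel is not required to implement.
         */
        if (qp->dma_chan &&
            dma_sync_wait(qp->dma_chan, qp->last_cookie) != DMA_SUCCESS)
                dev_err(&pdev->dev, "DMA did not quiesce on queue free\n");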

>
>> > ntb_unregister_db_callback(qp->ndev, qp->qp_num);
>> > tasklet_disable(&qp->rx_work);
>> >
>> > + cancel_delayed_work_sync(&qp->link_work);
>> > +
>> > while ((entry = ntb_list_rm(&qp->ntb_rx_free_q_lock,
>>&qp->rx_free_q)))
>> > kfree(entry);
>> >
>> > @@ -1382,8 +1566,10 @@ int ntb_transport_tx_enqueue(struct
>>ntb_transport_qp *qp, void *cb, void *data,
>> > return -EINVAL;
>> >
>> > entry = ntb_list_rm(&qp->ntb_tx_free_q_lock, &qp->tx_free_q);
>> > - if (!entry)
>> > + if (!entry) {
>> > + qp->tx_err_no_buf++;
>> > return -ENOMEM;
>> > + }
>> >
>> > entry->cb_data = cb;
>> > entry->buf = data;
>> > @@ -1499,9 +1685,18 @@ EXPORT_SYMBOL_GPL(ntb_transport_qp_num);
>> > */
>> > unsigned int ntb_transport_max_size(struct ntb_transport_qp *qp)
>> > {
>> > + unsigned int max;
>> > +
>> > if (!qp)
>> > return 0;
>> >
>> > - return qp->tx_max_frame - sizeof(struct ntb_payload_header);
>> > + if (!qp->dma_chan)
>> > + return qp->tx_max_frame - sizeof(struct
>>ntb_payload_header);
>> > +
>> > + /* If DMA engine usage is possible, try to find the max size
>>for that */
>> > + max = qp->tx_max_frame - sizeof(struct ntb_payload_header);
>> > + max -= max % (1 << qp->dma_chan->device->copy_align);
>>
>> Unless I missed it you need a dmaengine_get() dmaengine_put() pair in
>> your driver setup/teardown. dmaengine_get() notifies dmaengine of a
>> dma_find_channel() user.
>
>Good catch.
>
>Thanks for the review! I'll make the changes and do another spin of
>the patch.
>
>Thanks,
>Jon

2013-08-20 00:07:48

by Jon Mason

[permalink] [raw]
Subject: Re: [PATCH 09/15] NTB: Use DMA Engine to Transmit and Receive

On Mon, Aug 19, 2013 at 11:36:13PM +0000, Dan Williams wrote:
>
>
> On 8/19/13 1:37 PM, "Jon Mason" <[email protected]> wrote:
>
> >On Mon, Aug 19, 2013 at 03:01:54AM -0700, Dan Williams wrote:
> >> On Fri, Aug 2, 2013 at 10:35 AM, Jon Mason <[email protected]> wrote:
> >> > Allocate and use a DMA engine channel to transmit and receive data
> >>over
> >> > NTB. If none is allocated, fall back to using the CPU to transfer
> >>data.
> >> >
> >> > Cc: Dan Williams <[email protected]>
> >> > Cc: Vinod Koul <[email protected]>
> >> > Cc: Dave Jiang <[email protected]>
> >> > Signed-off-by: Jon Mason <[email protected]>
> >> > ---
> >> > drivers/ntb/ntb_hw.c | 17 +++
> >> > drivers/ntb/ntb_hw.h | 1 +
> >> > drivers/ntb/ntb_transport.c | 285
> >>++++++++++++++++++++++++++++++++++++-------
> >> > 3 files changed, 258 insertions(+), 45 deletions(-)
> >> >
> >> > diff --git a/drivers/ntb/ntb_hw.c b/drivers/ntb/ntb_hw.c
> >> > index 1d8e551..014222c 100644
> >> > --- a/drivers/ntb/ntb_hw.c
> >> > +++ b/drivers/ntb/ntb_hw.c
> >> > @@ -350,6 +350,23 @@ int ntb_read_remote_spad(struct ntb_device
> >>*ndev, unsigned int idx, u32 *val)
> >> > }
> >> >
> >> > /**
> >> > + * ntb_get_mw_base() - get addr for the NTB memory window
> >> > + * @ndev: pointer to ntb_device instance
> >> > + * @mw: memory window number
> >> > + *
> >> > + * This function provides the base address of the memory window
> >>specified.
> >> > + *
> >> > + * RETURNS: address, or NULL on error.
> >> > + */
> >> > +resource_size_t ntb_get_mw_base(struct ntb_device *ndev, unsigned
> >>int mw)
> >> > +{
> >> > + if (mw >= ntb_max_mw(ndev))
> >> > + return 0;
> >> > +
> >> > + return pci_resource_start(ndev->pdev, MW_TO_BAR(mw));
> >> > +}
>
> Nothing does error checking on this return value. I think the code should
> either be sure that 'mw' is valid (mw_num is passed to the
> ntb_get_mw_vbase helper too) and delete the check, or at least make it a
> WARN_ONCE. The former seems a tad cleaner to me.

Ugh! Thanks.
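
Dropping the check is probably cleanest, since the transport already
validates the window number before calling in. If we keep it, something
along these lines (untested sketch, keeping the existing helper signature)
at least makes a bogus caller visible:

resource_size_t ntb_get_mw_base(struct ntb_device *ndev, unsigned int mw)
{
	/* mw should already be validated by the caller; warn loudly if not */
	if (WARN_ONCE(mw >= ntb_max_mw(ndev), "invalid memory window %u\n", mw))
		return 0;

	return pci_resource_start(ndev->pdev, MW_TO_BAR(mw));
}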

>
>
> >> > +
> >> > +/**
> >> > * ntb_get_mw_vbase() - get virtual addr for the NTB memory window
> >> > * @ndev: pointer to ntb_device instance
> >> > * @mw: memory window number
> >> > diff --git a/drivers/ntb/ntb_hw.h b/drivers/ntb/ntb_hw.h
> >> > index b03de80..ab5f768 100644
> >> > --- a/drivers/ntb/ntb_hw.h
> >> > +++ b/drivers/ntb/ntb_hw.h
> >> > @@ -240,6 +240,7 @@ int ntb_write_local_spad(struct ntb_device *ndev,
> >>unsigned int idx, u32 val);
> >> > int ntb_read_local_spad(struct ntb_device *ndev, unsigned int idx,
> >>u32 *val);
> >> > int ntb_write_remote_spad(struct ntb_device *ndev, unsigned int idx,
> >>u32 val);
> >> > int ntb_read_remote_spad(struct ntb_device *ndev, unsigned int idx,
> >>u32 *val);
> >> > +resource_size_t ntb_get_mw_base(struct ntb_device *ndev, unsigned
> >>int mw);
> >> > void __iomem *ntb_get_mw_vbase(struct ntb_device *ndev, unsigned int
> >>mw);
> >> > u64 ntb_get_mw_size(struct ntb_device *ndev, unsigned int mw);
> >> > void ntb_ring_sdb(struct ntb_device *ndev, unsigned int idx);
> >> > diff --git a/drivers/ntb/ntb_transport.c b/drivers/ntb/ntb_transport.c
> >> > index f7380e9..73a35e4 100644
> >> > --- a/drivers/ntb/ntb_transport.c
> >> > +++ b/drivers/ntb/ntb_transport.c
> >> > @@ -47,6 +47,7 @@
> >> > */
> >> > #include <linux/debugfs.h>
> >> > #include <linux/delay.h>
> >> > +#include <linux/dmaengine.h>
> >> > #include <linux/dma-mapping.h>
> >> > #include <linux/errno.h>
> >> > #include <linux/export.h>
> >> > @@ -68,6 +69,10 @@ static unsigned char max_num_clients;
> >> > module_param(max_num_clients, byte, 0644);
> >> > MODULE_PARM_DESC(max_num_clients, "Maximum number of NTB transport
> >>clients");
> >> >
> >> > +static unsigned int copy_bytes = 1024;
> >> > +module_param(copy_bytes, uint, 0644);
> >> > +MODULE_PARM_DESC(copy_bytes, "Threshold under which NTB will use the
> >>CPU to copy instead of DMA");
> >> > +
> >> > struct ntb_queue_entry {
> >> > /* ntb_queue list reference */
> >> > struct list_head entry;
> >> > @@ -76,6 +81,13 @@ struct ntb_queue_entry {
> >> > void *buf;
> >> > unsigned int len;
> >> > unsigned int flags;
> >> > +
> >> > + struct ntb_transport_qp *qp;
> >> > + union {
> >> > + struct ntb_payload_header __iomem *tx_hdr;
> >> > + struct ntb_payload_header *rx_hdr;
> >> > + };
> >> > + unsigned int index;
> >> > };
> >> >
> >> > struct ntb_rx_info {
> >> > @@ -86,6 +98,7 @@ struct ntb_transport_qp {
> >> > struct ntb_transport *transport;
> >> > struct ntb_device *ndev;
> >> > void *cb_data;
> >> > + struct dma_chan *dma_chan;
> >> >
> >> > bool client_ready;
> >> > bool qp_link;
> >> > @@ -99,6 +112,7 @@ struct ntb_transport_qp {
> >> > struct list_head tx_free_q;
> >> > spinlock_t ntb_tx_free_q_lock;
> >> > void __iomem *tx_mw;
> >> > + dma_addr_t tx_mw_raw;
> >> > unsigned int tx_index;
> >> > unsigned int tx_max_entry;
> >> > unsigned int tx_max_frame;
> >> > @@ -114,6 +128,7 @@ struct ntb_transport_qp {
> >> > unsigned int rx_index;
> >> > unsigned int rx_max_entry;
> >> > unsigned int rx_max_frame;
> >> > + dma_cookie_t last_cookie;
> >> >
> >> > void (*event_handler) (void *data, int status);
> >> > struct delayed_work link_work;
> >> > @@ -129,9 +144,14 @@ struct ntb_transport_qp {
> >> > u64 rx_err_no_buf;
> >> > u64 rx_err_oflow;
> >> > u64 rx_err_ver;
> >> > + u64 rx_memcpy;
> >> > + u64 rx_async;
> >> > u64 tx_bytes;
> >> > u64 tx_pkts;
> >> > u64 tx_ring_full;
> >> > + u64 tx_err_no_buf;
> >> > + u64 tx_memcpy;
> >> > + u64 tx_async;
> >> > };
> >> >
> >> > struct ntb_transport_mw {
> >> > @@ -381,7 +401,7 @@ static ssize_t debugfs_read(struct file *filp,
> >>char __user *ubuf, size_t count,
> >> > char *buf;
> >> > ssize_t ret, out_offset, out_count;
> >> >
> >> > - out_count = 600;
> >> > + out_count = 1000;
> >> >
> >> > buf = kmalloc(out_count, GFP_KERNEL);
> >> > if (!buf)
> >> > @@ -396,6 +416,10 @@ static ssize_t debugfs_read(struct file *filp,
> >>char __user *ubuf, size_t count,
> >> > out_offset += snprintf(buf + out_offset, out_count -
> >>out_offset,
> >> > "rx_pkts - \t%llu\n", qp->rx_pkts);
> >> > out_offset += snprintf(buf + out_offset, out_count -
> >>out_offset,
> >> > + "rx_memcpy - \t%llu\n", qp->rx_memcpy);
> >> > + out_offset += snprintf(buf + out_offset, out_count -
> >>out_offset,
> >> > + "rx_async - \t%llu\n", qp->rx_async);
> >> > + out_offset += snprintf(buf + out_offset, out_count -
> >>out_offset,
> >> > "rx_ring_empty - %llu\n",
> >>qp->rx_ring_empty);
> >> > out_offset += snprintf(buf + out_offset, out_count -
> >>out_offset,
> >> > "rx_err_no_buf - %llu\n",
> >>qp->rx_err_no_buf);
> >> > @@ -415,8 +439,14 @@ static ssize_t debugfs_read(struct file *filp,
> >>char __user *ubuf, size_t count,
> >> > out_offset += snprintf(buf + out_offset, out_count -
> >>out_offset,
> >> > "tx_pkts - \t%llu\n", qp->tx_pkts);
> >> > out_offset += snprintf(buf + out_offset, out_count -
> >>out_offset,
> >> > + "tx_memcpy - \t%llu\n", qp->tx_memcpy);
> >> > + out_offset += snprintf(buf + out_offset, out_count -
> >>out_offset,
> >> > + "tx_async - \t%llu\n", qp->tx_async);
> >> > + out_offset += snprintf(buf + out_offset, out_count -
> >>out_offset,
> >> > "tx_ring_full - \t%llu\n",
> >>qp->tx_ring_full);
> >> > out_offset += snprintf(buf + out_offset, out_count -
> >>out_offset,
> >> > + "tx_err_no_buf - %llu\n",
> >>qp->tx_err_no_buf);
> >> > + out_offset += snprintf(buf + out_offset, out_count -
> >>out_offset,
> >> > "tx_mw - \t%p\n", qp->tx_mw);
> >> > out_offset += snprintf(buf + out_offset, out_count -
> >>out_offset,
> >> > "tx_index - \t%u\n", qp->tx_index);
> >> > @@ -488,11 +518,11 @@ static void ntb_transport_setup_qp_mw(struct
> >>ntb_transport *nt,
> >> > num_qps_mw = nt->max_qps / mw_max;
> >> >
> >> > rx_size = (unsigned int) nt->mw[mw_num].size / num_qps_mw;
> >> > - qp->remote_rx_info = nt->mw[mw_num].virt_addr +
> >> > - (qp_num / mw_max * rx_size);
> >> > + qp->rx_buff = nt->mw[mw_num].virt_addr + qp_num / mw_max *
> >>rx_size;
> >> > rx_size -= sizeof(struct ntb_rx_info);
> >> >
> >> > - qp->rx_buff = qp->remote_rx_info + 1;
> >> > + qp->remote_rx_info = qp->rx_buff + rx_size;
> >> > +
> >> > /* Due to housekeeping, there must be atleast 2 buffs */
> >> > qp->rx_max_frame = min(transport_mtu, rx_size / 2);
> >> > qp->rx_max_entry = rx_size / qp->rx_max_frame;
> >> > @@ -802,6 +832,7 @@ static void ntb_transport_init_queue(struct
> >>ntb_transport *nt,
> >> > struct ntb_transport_qp *qp;
> >> > unsigned int num_qps_mw, tx_size;
> >> > u8 mw_num, mw_max;
> >> > + u64 qp_offset;
> >> >
> >> > mw_max = ntb_max_mw(nt->ndev);
> >> > mw_num = QP_TO_MW(nt->ndev, qp_num);
> >> > @@ -820,11 +851,12 @@ static void ntb_transport_init_queue(struct
> >>ntb_transport *nt,
> >> > num_qps_mw = nt->max_qps / mw_max;
> >> >
> >> > tx_size = (unsigned int) ntb_get_mw_size(qp->ndev, mw_num) /
> >>num_qps_mw;
> >> > - qp->rx_info = ntb_get_mw_vbase(nt->ndev, mw_num) +
> >> > - (qp_num / mw_max * tx_size);
> >> > + qp_offset = qp_num / mw_max * tx_size;
> >> > + qp->tx_mw = ntb_get_mw_vbase(nt->ndev, mw_num) + qp_offset;
> >> > + qp->tx_mw_raw = ntb_get_mw_base(qp->ndev, mw_num) + qp_offset;
> >>
> >> Just a quibble with the name, why "raw" instead of "phys"?
> >
> >What's in a name? ;-)
> >
> >phys is fine.
> >
> >>
> >> > tx_size -= sizeof(struct ntb_rx_info);
> >> > + qp->rx_info = qp->tx_mw + tx_size;
> >> >
> >> > - qp->tx_mw = qp->rx_info + 1;
> >> > /* Due to housekeeping, there must be atleast 2 buffs */
> >> > qp->tx_max_frame = min(transport_mtu, tx_size / 2);
> >> > qp->tx_max_entry = tx_size / qp->tx_max_frame;
> >> > @@ -956,13 +988,22 @@ void ntb_transport_free(void *transport)
> >> > kfree(nt);
> >> > }
> >> >
> >> > -static void ntb_rx_copy_task(struct ntb_transport_qp *qp,
> >> > - struct ntb_queue_entry *entry, void
> >>*offset)
> >> > +static void ntb_rx_copy_callback(void *data)
> >> > {
> >> > + struct ntb_queue_entry *entry = data;
> >> > + struct ntb_transport_qp *qp = entry->qp;
> >> > void *cb_data = entry->cb_data;
> >> > unsigned int len = entry->len;
> >> > + struct ntb_payload_header *hdr = entry->rx_hdr;
> >> > +
> >> > + wmb();
> >>
> >> What is this barrier paired with?
> >
> >You will see a wmb being removed from ntb_process_rxc below. It is
> >VERY necessary to be here.
>
> If it's not paired with a read barrier it needs a comment about what it is
> ordering. I thought checkpatch would squawk about this? I'm curious why
> not a full read back to verify the write completes vs mmiowb()? wmb() is
> somewhere in the middle.

I'm pretty sure I ran all this through checkpatch before sending.
I'll re-add the comment I had before.
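
Roughly what I had in mind for the re-added comment (sketch only, using the
names from the patch above; the barrier orders the buffer copy against
clearing the flag and publishing the index the peer polls):

	/* Ensure that the data is fully copied out before clearing the
	 * flag and advertising the updated index to the remote side.
	 */
	wmb();
	hdr->flags = 0;

	iowrite32(entry->index, &qp->rx_info->entry);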

>
> >
> >> > + hdr->flags = 0;
> >> >
> >> > - memcpy(entry->buf, offset, entry->len);
> >> > + /* If the callbacks come out of order, the writing of the
> >>index to the
> >> > + * last completed will be out of order. This may result in
> >>the the
> >> > + * receive stalling forever.
> >> > + */
> >>
> >> Is this for the case where we are bouncing back and forth between
> >> sync/async? Otherwise I do not see how transactions could get out of
> >> order given you allocate a channel once per queue. Is this comment
> >> saying that the iowrite32 is somehow a fix, or is this comment a
> >> FIXME?
> >
> >There is a case for a mix, the "copy_bytes" variable above switches to
> >CPU for small transfers (which greatly increases throughput on small
> >transfers). The caveat to it is the need to flush the DMA engine to
> >prevent out-of-order. This comment is mainly a reminder of this issue.
>
> So this is going forward with the stall as a known issue? The next patch
> should just do the sync to prevent the re-ordering, right?

There is already a dma_sync_wait in the error path of ntb_async_rx to
enforce the ordering. Do I need to change the comment (or move it) to
make it more obvious what is happening?

> >
> >> > + iowrite32(entry->index, &qp->rx_info->entry);
> >> >
> >> > ntb_list_add(&qp->ntb_rx_free_q_lock, &entry->entry,
> >>&qp->rx_free_q);
> >> >
> >> > @@ -970,6 +1011,80 @@ static void ntb_rx_copy_task(struct
> >>ntb_transport_qp *qp,
> >> > qp->rx_handler(qp, qp->cb_data, cb_data, len);
> >> > }
> >> >
> >> > +static void ntb_memcpy_rx(struct ntb_queue_entry *entry, void
> >>*offset)
> >> > +{
> >> > + void *buf = entry->buf;
> >> > + size_t len = entry->len;
> >> > +
> >> > + memcpy(buf, offset, len);
> >> > +
> >> > + ntb_rx_copy_callback(entry);
> >> > +}
> >> > +
> >> > +static void ntb_async_rx(struct ntb_queue_entry *entry, void *offset,
> >> > + size_t len)
> >> > +{
> >> > + struct dma_async_tx_descriptor *txd;
> >> > + struct ntb_transport_qp *qp = entry->qp;
> >> > + struct dma_chan *chan = qp->dma_chan;
> >> > + struct dma_device *device;
> >> > + dma_addr_t src, dest;
> >> > + dma_cookie_t cookie;
> >> > + void *buf = entry->buf;
> >> > + unsigned long flags;
> >> > +
> >> > + entry->len = len;
> >> > +
> >> > + if (!chan)
> >> > + goto err;
> >> > +
> >> > + device = chan->device;
> >> > +
> >> > + if (len < copy_bytes ||
> >> > + !is_dma_copy_aligned(device, __pa(offset), __pa(buf),
> >>len))
> >>
> >> __pa()'s necessary here? I don't think the alignment requirements
> >> will ever cross PAGE_OFFSET.
> >
> >The __pa calls cast the void* to an unsigned long, thus keeping
> >is_dma_copy_aligned happy.
>
> Then "(unsigned long) offset, (unsigned long) buf² right? Why munge the
> value?

Fair enough.
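
Will switch it to plain casts, roughly (same check as in the patch, just
without the __pa munging):

	if (len < copy_bytes ||
	    !is_dma_copy_aligned(device, (unsigned long)offset,
				 (unsigned long)buf, len))
		goto err1;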

>
> >
> >> > + goto err1;
> >> > +
> >> > + dest = dma_map_single(device->dev, buf, len, DMA_FROM_DEVICE);
> >> > + if (dma_mapping_error(device->dev, dest))
> >> > + goto err1;
> >> > +
> >> > + src = dma_map_single(device->dev, offset, len, DMA_TO_DEVICE);
> >> > + if (dma_mapping_error(device->dev, src))
> >> > + goto err2;
> >> > +
> >> > + flags = DMA_COMPL_DEST_UNMAP_SINGLE |
> >>DMA_COMPL_SRC_UNMAP_SINGLE |
> >> > + DMA_PREP_INTERRUPT;
> >> > + txd = device->device_prep_dma_memcpy(chan, dest, src, len,
> >>flags);
> >> > + if (!txd)
> >> > + goto err3;
> >> > +
> >> > + txd_clear_parent(txd);
> >> > + txd_clear_next(txd);
> >>
> >> Neither of these should be necessary.
> >
> >I had crashes in early versions of the code without them, but a quick
> >test with the removed doesn't show any issues. Good catch.
> >
> >> > + txd->callback = ntb_rx_copy_callback;
> >> > + txd->callback_param = entry;
> >> > +
> >> > + cookie = dmaengine_submit(txd);
> >> > + if (dma_submit_error(cookie))
> >> > + goto err3;
> >> > +
> >> > + qp->last_cookie = cookie;
> >> > +
> >> > + dma_async_issue_pending(chan);
> >>
> >> hmm... can this go in ntb_process_rx() so that the submission is
> >> batched? Cuts down on mmio.
> >
> >I moved it down to ntb_transport_rx (after the calls to
> >ntb_process_rxc), and the performance seems to be roughly the same.
>
> Yeah, not expecting it to be noticeable, but conceptually
>
> submit
> submit
> submit
> submit
> issue
>
>
> Is nicer than:
>
> submit
> issue
> submit
> issue
>
>

I agree, but I liked having all the dma engine awareness
compartmentalized in the ntb_async_* and callbacks.
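
For reference, the batched variant I tried looked roughly like this
(untested sketch; it assumes ntb_transport_rx simply loops on
ntb_process_rxc, as the current tasklet does):

static void ntb_transport_rx(unsigned long data)
{
	struct ntb_transport_qp *qp = (struct ntb_transport_qp *)data;

	while (!ntb_process_rxc(qp))
		;

	/* one MMIO kick covers all the descriptors submitted above */
	if (qp->dma_chan)
		dma_async_issue_pending(qp->dma_chan);
}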

>
>
>
> >
> >> > + qp->rx_async++;
> >> > +
> >> > + return;
> >> > +
> >> > +err3:
> >> > + dma_unmap_single(device->dev, src, len, DMA_TO_DEVICE);
> >> > +err2:
> >> > + dma_unmap_single(device->dev, dest, len, DMA_FROM_DEVICE);
> >> > +err1:
> >> > + dma_sync_wait(chan, qp->last_cookie);
> >> > +err:
> >> > + ntb_memcpy_rx(entry, offset);
> >> > + qp->rx_memcpy++;
> >> > +}
> >> > +
> >> > static int ntb_process_rxc(struct ntb_transport_qp *qp)
> >> > {
> >> > struct ntb_payload_header *hdr;
> >> > @@ -1008,41 +1123,44 @@ static int ntb_process_rxc(struct
> >>ntb_transport_qp *qp)
> >> > if (hdr->flags & LINK_DOWN_FLAG) {
> >> > ntb_qp_link_down(qp);
> >> >
> >> > - ntb_list_add(&qp->ntb_rx_pend_q_lock, &entry->entry,
> >> > - &qp->rx_pend_q);
> >> > - goto out;
> >> > + goto err;
> >> > }
> >> >
> >> > dev_dbg(&ntb_query_pdev(qp->ndev)->dev,
> >> > "rx offset %u, ver %u - %d payload received, buf size
> >>%d\n",
> >> > qp->rx_index, hdr->ver, hdr->len, entry->len);
> >> >
> >> > - if (hdr->len <= entry->len) {
> >> > - entry->len = hdr->len;
> >> > - ntb_rx_copy_task(qp, entry, offset);
> >> > - } else {
> >> > - ntb_list_add(&qp->ntb_rx_pend_q_lock, &entry->entry,
> >> > - &qp->rx_pend_q);
> >> > + qp->rx_bytes += hdr->len;
> >> > + qp->rx_pkts++;
> >> >
> >> > + if (hdr->len > entry->len) {
> >> > qp->rx_err_oflow++;
> >> > dev_dbg(&ntb_query_pdev(qp->ndev)->dev,
> >> > "RX overflow! Wanted %d got %d\n",
> >> > hdr->len, entry->len);
> >> > +
> >> > + goto err;
> >> > }
> >> >
> >> > - qp->rx_bytes += hdr->len;
> >> > - qp->rx_pkts++;
> >> > + entry->index = qp->rx_index;
> >> > + entry->rx_hdr = hdr;
> >> >
> >> > -out:
> >> > - /* Ensure that the data is fully copied out before clearing
> >>the flag */
> >> > - wmb();
> >> > - hdr->flags = 0;
> >> > - iowrite32(qp->rx_index, &qp->rx_info->entry);
> >> > + ntb_async_rx(entry, offset, hdr->len);
> >> >
> >> > +out:
> >> > qp->rx_index++;
> >> > qp->rx_index %= qp->rx_max_entry;
> >> >
> >> > return 0;
> >> > +
> >> > +err:
> >> > + ntb_list_add(&qp->ntb_rx_pend_q_lock, &entry->entry,
> >> > + &qp->rx_pend_q);
> >> > + wmb();
> >> > + hdr->flags = 0;
> >> > + iowrite32(qp->rx_index, &qp->rx_info->entry);
> >> > +
> >> > + goto out;
> >> > }
> >> >
> >> > static void ntb_transport_rx(unsigned long data)
> >> > @@ -1070,19 +1188,12 @@ static void ntb_transport_rxc_db(void *data,
> >>int db_num)
> >> > tasklet_schedule(&qp->rx_work);
> >> > }
> >> >
> >> > -static void ntb_tx_copy_task(struct ntb_transport_qp *qp,
> >> > - struct ntb_queue_entry *entry,
> >> > - void __iomem *offset)
> >> > +static void ntb_tx_copy_callback(void *data)
> >> > {
> >> > - struct ntb_payload_header __iomem *hdr;
> >> > -
> >> > - memcpy_toio(offset, entry->buf, entry->len);
> >> > -
> >> > - hdr = offset + qp->tx_max_frame - sizeof(struct
> >>ntb_payload_header);
> >> > - iowrite32(entry->len, &hdr->len);
> >> > - iowrite32((u32) qp->tx_pkts, &hdr->ver);
> >> > + struct ntb_queue_entry *entry = data;
> >> > + struct ntb_transport_qp *qp = entry->qp;
> >> > + struct ntb_payload_header __iomem *hdr = entry->tx_hdr;
> >> >
> >> > - /* Ensure that the data is fully copied out before setting
> >>the flag */
> >> > wmb();
> >> > iowrite32(entry->flags | DESC_DONE_FLAG, &hdr->flags);
> >> >
> >> > @@ -1103,15 +1214,79 @@ static void ntb_tx_copy_task(struct
> >>ntb_transport_qp *qp,
> >> > ntb_list_add(&qp->ntb_tx_free_q_lock, &entry->entry,
> >>&qp->tx_free_q);
> >> > }
> >> >
> >> > -static int ntb_process_tx(struct ntb_transport_qp *qp,
> >> > - struct ntb_queue_entry *entry)
> >> > +static void ntb_memcpy_tx(struct ntb_queue_entry *entry, void
> >>__iomem *offset)
> >> > +{
> >> > + memcpy_toio(offset, entry->buf, entry->len);
> >> > +
> >> > + ntb_tx_copy_callback(entry);
> >> > +}
> >> > +
> >> > +static void ntb_async_tx(struct ntb_transport_qp *qp,
> >> > + struct ntb_queue_entry *entry)
> >> > {
> >> > + struct dma_async_tx_descriptor *txd;
> >> > + struct dma_chan *chan = qp->dma_chan;
> >> > + struct dma_device *device;
> >> > + dma_addr_t src, dest;
> >> > + dma_cookie_t cookie;
> >> > + struct ntb_payload_header __iomem *hdr;
> >> > void __iomem *offset;
> >> > + size_t len = entry->len;
> >> > + void *buf = entry->buf;
> >> > + unsigned long flags;
> >> >
> >> > offset = qp->tx_mw + qp->tx_max_frame * qp->tx_index;
> >> > + hdr = offset + qp->tx_max_frame - sizeof(struct
> >>ntb_payload_header);
> >> > + entry->tx_hdr = hdr;
> >> > +
> >> > + iowrite32(entry->len, &hdr->len);
> >> > + iowrite32((u32) qp->tx_pkts, &hdr->ver);
> >> > +
> >> > + if (!chan)
> >> > + goto err;
> >> >
> >> > - dev_dbg(&ntb_query_pdev(qp->ndev)->dev, "%lld - offset %p, tx
> >>%u, entry len %d flags %x buff %p\n",
> >> > - qp->tx_pkts, offset, qp->tx_index, entry->len,
> >>entry->flags,
> >> > + device = chan->device;
> >> > +
> >> > + dest = qp->tx_mw_raw + qp->tx_max_frame * qp->tx_index;
> >> > +
> >> > + if (len < copy_bytes ||
> >> > + !is_dma_copy_aligned(device, __pa(buf), dest, len))
> >>
> >> ditto on the use of __pa here.
> >>
> >> > + goto err;
> >> > +
> >> > + src = dma_map_single(device->dev, buf, len, DMA_TO_DEVICE);
> >> > + if (dma_mapping_error(device->dev, src))
> >> > + goto err;
> >> > +
> >> > + flags = DMA_COMPL_SRC_UNMAP_SINGLE | DMA_PREP_INTERRUPT;
> >> > + txd = device->device_prep_dma_memcpy(chan, dest, src, len,
> >>flags);
> >> > + if (!txd)
> >> > + goto err1;
> >> > +
> >> > + txd_clear_parent(txd);
> >> > + txd_clear_next(txd);
> >>
> >> ...again not needed.
> >>
> >> > + txd->callback = ntb_tx_copy_callback;
> >> > + txd->callback_param = entry;
> >> > +
> >> > + cookie = dmaengine_submit(txd);
> >> > + if (dma_submit_error(cookie))
> >> > + goto err1;
> >> > +
> >> > + dma_async_issue_pending(chan);
> >> > + qp->tx_async++;
> >> > +
> >> > + return;
> >> > +err1:
> >> > + dma_unmap_single(device->dev, src, len, DMA_TO_DEVICE);
> >> > +err:
> >> > + ntb_memcpy_tx(entry, offset);
> >> > + qp->tx_memcpy++;
> >> > +}
> >> > +
> >> > +static int ntb_process_tx(struct ntb_transport_qp *qp,
> >> > + struct ntb_queue_entry *entry)
> >> > +{
> >> > + dev_dbg(&ntb_query_pdev(qp->ndev)->dev, "%lld - tx %u, entry
> >>len %d flags %x buff %p\n",
> >> > + qp->tx_pkts, qp->tx_index, entry->len, entry->flags,
> >> > entry->buf);
> >> > if (qp->tx_index == qp->remote_rx_info->entry) {
> >> > qp->tx_ring_full++;
> >> > @@ -1127,7 +1302,7 @@ static int ntb_process_tx(struct
> >>ntb_transport_qp *qp,
> >> > return 0;
> >> > }
> >> >
> >> > - ntb_tx_copy_task(qp, entry, offset);
> >> > + ntb_async_tx(qp, entry);
> >> >
> >> > qp->tx_index++;
> >> > qp->tx_index %= qp->tx_max_entry;
> >> > @@ -1213,11 +1388,16 @@ ntb_transport_create_queue(void *data, struct
> >>pci_dev *pdev,
> >> > qp->tx_handler = handlers->tx_handler;
> >> > qp->event_handler = handlers->event_handler;
> >> >
> >> > + qp->dma_chan = dma_find_channel(DMA_MEMCPY);
> >> > + if (!qp->dma_chan)
> >> > + dev_info(&pdev->dev, "Unable to allocate private DMA
> >>channel, using CPU instead\n");
> >>
> >> You can drop "private" since this is a public allocation.
> >
> >Good catch
> >
> >> > +
> >> > for (i = 0; i < NTB_QP_DEF_NUM_ENTRIES; i++) {
> >> > entry = kzalloc(sizeof(struct ntb_queue_entry),
> >>GFP_ATOMIC);
> >> > if (!entry)
> >> > goto err1;
> >> >
> >> > + entry->qp = qp;
> >> > ntb_list_add(&qp->ntb_rx_free_q_lock, &entry->entry,
> >> > &qp->rx_free_q);
> >> > }
> >> > @@ -1227,6 +1407,7 @@ ntb_transport_create_queue(void *data, struct
> >>pci_dev *pdev,
> >> > if (!entry)
> >> > goto err2;
> >> >
> >> > + entry->qp = qp;
> >> > ntb_list_add(&qp->ntb_tx_free_q_lock, &entry->entry,
> >> > &qp->tx_free_q);
> >> > }
> >> > @@ -1272,11 +1453,14 @@ void ntb_transport_free_queue(struct
> >>ntb_transport_qp *qp)
> >> >
> >> > pdev = ntb_query_pdev(qp->ndev);
> >> >
> >> > - cancel_delayed_work_sync(&qp->link_work);
> >> > + if (qp->dma_chan)
> >> > + dmaengine_terminate_all(qp->dma_chan);
> >>
> >> Do you need this or can you just wait for all outstanding tx / rx to
> >>quiesce?
> >
> >I'd prefer not to spin in the shutdown code waiting for the channel to
> >quiesce. I suppose I could be nice and give it a small time to do so,
> >before I smash it in the head with a rock.
>
> But ->device_control is not a mandatory operation. You're already sync
> waiting for the workqueue to die.

I added it, and I do think it will be a little nicer.
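
Something along these lines in ntb_transport_free_queue (untested sketch;
it keeps new submissions off the channel, then gives the in-flight
descriptors a chance to finish before the terminate, which may be a no-op
if ->device_control isn't implemented):

	if (qp->dma_chan) {
		struct dma_chan *chan = qp->dma_chan;

		/* Stop new entries from using the channel, then let any
		 * outstanding descriptors complete before tearing it down.
		 */
		qp->dma_chan = NULL;
		dma_sync_wait(chan, qp->last_cookie);
		dmaengine_terminate_all(chan);
	}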

Thanks,
Jon

>
> >
> >> > ntb_unregister_db_callback(qp->ndev, qp->qp_num);
> >> > tasklet_disable(&qp->rx_work);
> >> >
> >> > + cancel_delayed_work_sync(&qp->link_work);
> >> > +
> >> > while ((entry = ntb_list_rm(&qp->ntb_rx_free_q_lock,
> >>&qp->rx_free_q)))
> >> > kfree(entry);
> >> >
> >> > @@ -1382,8 +1566,10 @@ int ntb_transport_tx_enqueue(struct
> >>ntb_transport_qp *qp, void *cb, void *data,
> >> > return -EINVAL;
> >> >
> >> > entry = ntb_list_rm(&qp->ntb_tx_free_q_lock, &qp->tx_free_q);
> >> > - if (!entry)
> >> > + if (!entry) {
> >> > + qp->tx_err_no_buf++;
> >> > return -ENOMEM;
> >> > + }
> >> >
> >> > entry->cb_data = cb;
> >> > entry->buf = data;
> >> > @@ -1499,9 +1685,18 @@ EXPORT_SYMBOL_GPL(ntb_transport_qp_num);
> >> > */
> >> > unsigned int ntb_transport_max_size(struct ntb_transport_qp *qp)
> >> > {
> >> > + unsigned int max;
> >> > +
> >> > if (!qp)
> >> > return 0;
> >> >
> >> > - return qp->tx_max_frame - sizeof(struct ntb_payload_header);
> >> > + if (!qp->dma_chan)
> >> > + return qp->tx_max_frame - sizeof(struct
> >>ntb_payload_header);
> >> > +
> >> > + /* If DMA engine usage is possible, try to find the max size
> >>for that */
> >> > + max = qp->tx_max_frame - sizeof(struct ntb_payload_header);
> >> > + max -= max % (1 << qp->dma_chan->device->copy_align);
> >>
> >> Unless I missed it you need a dmaengine_get() dmaengine_put() pair in
> >> your driver setup/teardown. dmaengine_get() notifies dmaengine of a
> >> dma_find_channel() user.
> >
> >Good catch.
> >
> >Thanks for the review! I'll make the changes and do another spin of
> >the patch.
> >
> >Thanks,
> >Jon
>

2013-08-20 00:45:58

by Dan Williams

[permalink] [raw]
Subject: Re: [PATCH 09/15] NTB: Use DMA Engine to Transmit and Receive



On 8/19/13 5:07 PM, "Jon Mason" <[email protected]> wrote:

>> >> Is this for the case where we are bouncing back and forth between
>> >> sync/async? Otherwise I do not see how transactions could get out of
>> >> order given you allocate a channel once per queue. Is this comment
>> >> saying that the iowrite32 is somehow a fix, or is this comment a
>> >> FIXME?
>> >
>> >There is a case for a mix, the "copy_bytes" variable above switches to
>> >CPU for small transfers (which greatly increases throughput on small
>> >transfers). The caveat to it is the need to flush the DMA engine to
>> >prevent out-of-order. This comment is mainly a reminder of this
>>issue.
>>
>> So this is going forward with the stall as a known issue? The next
>>patch
>> should just do the sync to prevent the re-ordering, right?
>
>There is already a dma_sync_wait in the error path of ntb_async_rx to
>enforce the ordering. Do I need to change the comment (or move it) to
>make it more obvious what is happening?

Yeah, I think it just needs to move to the dma_sync_wait(); otherwise it
seems like it's an open issue that needs fixing.
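
I.e. something along these lines on the error path of ntb_async_rx (sketch,
reusing the labels from the patch):

err1:
	/* This frame is about to be copied by the CPU, either because it
	 * is below the copy_bytes threshold or because DMA setup failed.
	 * Flush any DMA copies still in flight so this frame cannot
	 * complete (and advance rx_info->entry) ahead of them.
	 */
	dma_sync_wait(chan, qp->last_cookie);
err:
	ntb_memcpy_rx(entry, offset);
	qp->rx_memcpy++;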

>>>> > + txd->callback = ntb_rx_copy_callback;
>> >> > + txd->callback_param = entry;
>> >> > +
>> >> > + cookie = dmaengine_submit(txd);
>> >> > + if (dma_submit_error(cookie))
>> >> > + goto err3;
>> >> > +
>> >> > + qp->last_cookie = cookie;
>> >> > +
>> >> > + dma_async_issue_pending(chan);
>> >>
>> >> hmm... can this go in ntb_process_rx() so that the submission is
>> >> batched? Cuts down on mmio.
>> >
>> >I moved it down to ntb_transport_rx (after the calls to
>> >ntb_process_rxc), and the performance seems to be roughly the same.
>>
>> Yeah, not expecting it to be noticeable, but conceptually
>>
>> submit
>> submit
>> submit
>> submit
>> issue
>>
>>
>> Is nicer than:
>>
>> submit
>> issue
>> submit
>> issue
>>
>>
>
>I agree, but I liked having all the dma engine awareness
>compartmentalized in the ntb_async_* and callbacks.

Ok, makes sense.

--
Dan