2018-06-09 00:09:13

by Logan Gunthorpe

[permalink] [raw]
Subject: [PATCH 0/8] Fix breakage caused by the NTB multi-port patchset

Hey,

Here are all the fixes required to get ntb_test on switchtec working
again after the multi-port test patches were merged.

I'd appreciate it if future changes can be a) more careful about
not breaking things, b) communicated more clearly so that better
review can be done, and c) not merged until sufficient review actually
is done.

Note, I sent the first patch in this series earlier; please disregard
the earlier one.

Thanks,

Logan

Logan Gunthorpe (8):
NTB: ntb_tool: reading the link file should not end in a NULL byte
NTB: Setup the DMA mask globally for all drivers
NTB: Fix the default port and peer numbers for legacy drivers
NTB: ntb_pingpong: Choose doorbells based on port number
NTB: perf: Don't require one more memory window than number of peers
NTB: perf: Fix support for hardware that doesn't have port numbers
NTB: perf: Fix race condition when run with ntb_test
NTB: ntb_test: Fix bug when counting remote files

drivers/ntb/hw/amd/ntb_hw_amd.c | 4 ----
drivers/ntb/hw/idt/ntb_hw_idt.c | 6 ------
drivers/ntb/hw/intel/ntb_hw_intel.c | 4 ----
drivers/ntb/ntb.c | 22 ++++++++++++++--------
drivers/ntb/test/ntb_perf.c | 22 +++++++++++++++++++---
drivers/ntb/test/ntb_pingpong.c | 14 ++++++--------
drivers/ntb/test/ntb_tool.c | 3 +--
tools/testing/selftests/ntb/ntb_test.sh | 2 +-
8 files changed, 41 insertions(+), 36 deletions(-)

--
2.11.0


2018-06-09 00:09:33

by Logan Gunthorpe

[permalink] [raw]
Subject: [PATCH 4/8] NTB: ntb_pingpong: Choose doorbells based on port number

This commit fixes pingpong support for existing drivers that do not
implement ntb_default_port_number() and ntb_default_peer_port_number().
This is required for hardware (like the crosslink topology of
switchtec) which cannot assign reasonable port numbers to each port due
to its perfect symmetry.

Instead of picking the doorbell to use based on the the index of the
peer, we use the peer's port number. This is a bit clearer and easier
to understand.

Fixes: c7aeb0afdcc2 ("NTB: ntb_pp: Add full multi-port NTB API support")
Signed-off-by: Logan Gunthorpe <[email protected]>
---
drivers/ntb/test/ntb_pingpong.c | 14 ++++++--------
1 file changed, 6 insertions(+), 8 deletions(-)

diff --git a/drivers/ntb/test/ntb_pingpong.c b/drivers/ntb/test/ntb_pingpong.c
index 65865e460ab8..18d00eec7b02 100644
--- a/drivers/ntb/test/ntb_pingpong.c
+++ b/drivers/ntb/test/ntb_pingpong.c
@@ -121,15 +121,14 @@ static int pp_find_next_peer(struct pp_ctx *pp)
link = ntb_link_is_up(pp->ntb, NULL, NULL);

/* Find next available peer */
- if (link & pp->nmask) {
+ if (link & pp->nmask)
pidx = __ffs64(link & pp->nmask);
- out_db = BIT_ULL(pidx + 1);
- } else if (link & pp->pmask) {
+ else if (link & pp->pmask)
pidx = __ffs64(link & pp->pmask);
- out_db = BIT_ULL(pidx);
- } else {
+ else
return -ENODEV;
- }
+
+ out_db = BIT_ULL(ntb_peer_port_number(pp->ntb, pidx));

spin_lock(&pp->lock);
pp->out_pidx = pidx;
@@ -303,7 +302,7 @@ static void pp_init_flds(struct pp_ctx *pp)
break;
}

- pp->in_db = BIT_ULL(pidx);
+ pp->in_db = BIT_ULL(lport);
pp->pmask = GENMASK_ULL(pidx, 0) >> 1;
pp->nmask = GENMASK_ULL(pcnt - 1, pidx);

@@ -435,4 +434,3 @@ static void __exit pp_exit(void)
debugfs_remove_recursive(pp_dbgfs_topdir);
}
module_exit(pp_exit);
-
--
2.11.0


2018-06-09 00:09:55

by Logan Gunthorpe

[permalink] [raw]
Subject: [PATCH 8/8] NTB: ntb_test: Fix bug when counting remote files

When remote files are counted in get_files_count, without using SSH,
the code returns 0 because there is a colon prepended to $LOC. $VPATH
should have been used instead of $LOC.

Fixes: 06bd0407d06c ("NTB: ntb_test: Update ntb_tool Scratchpad tests")
Signed-off-by: Logan Gunthorpe <[email protected]>
---
tools/testing/selftests/ntb/ntb_test.sh | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/testing/selftests/ntb/ntb_test.sh b/tools/testing/selftests/ntb/ntb_test.sh
index 08cbfbbc7029..17ca36403d04 100755
--- a/tools/testing/selftests/ntb/ntb_test.sh
+++ b/tools/testing/selftests/ntb/ntb_test.sh
@@ -250,7 +250,7 @@ function get_files_count()
split_remote $LOC

if [[ "$REMOTE" == "" ]]; then
- echo $(ls -1 "$LOC"/${NAME}* 2>/dev/null | wc -l)
+ echo $(ls -1 "$VPATH"/${NAME}* 2>/dev/null | wc -l)
else
echo $(ssh "$REMOTE" "ls -1 \"$VPATH\"/${NAME}* | \
wc -l" 2> /dev/null)
--
2.11.0


2018-06-09 00:10:20

by Logan Gunthorpe

[permalink] [raw]
Subject: [PATCH 3/8] NTB: Fix the default port and peer numbers for legacy drivers

When the commit adding ntb_default_port_number() and
ntb_default_peer_port_number() entered the kernel there was no
users of it so it was impossible to tell what the API needed.

When a user finally landed a year later (ntb_pingpong) there were
more NTB topologies were created and no consideration was considered
to how other drivers had changed.

Now that there is a user it can be fixed to provide a sensible default
for the legacy drivers that do not implement ntb_{peer_}port_number().
Seeing ntb_pingpong doesn't check error codes returning EINVAL was also
not sensible.

Patches for ntb_pingpong and ntb_perf follow (which are broken
otherwise) to support hardware that doesn't have port numbers. This is
important not only to not break support with existing drivers but for
the cross link topology which, due to its perfect symmetry, cannot
assign unique port numbers to each side.

Fixes: 1e5301196a88 ("NTB: Add indexed ports NTB API")
Signed-off-by: Logan Gunthorpe <[email protected]>
---
drivers/ntb/ntb.c | 9 ++-------
1 file changed, 2 insertions(+), 7 deletions(-)

diff --git a/drivers/ntb/ntb.c b/drivers/ntb/ntb.c
index 93f24440d11d..d955a92a095a 100644
--- a/drivers/ntb/ntb.c
+++ b/drivers/ntb/ntb.c
@@ -225,10 +225,8 @@ int ntb_default_port_number(struct ntb_dev *ntb)
case NTB_TOPO_B2B_DSD:
return NTB_PORT_SEC_DSD;
default:
- break;
+ return 0;
}
-
- return -EINVAL;
}
EXPORT_SYMBOL(ntb_default_port_number);

@@ -251,10 +249,8 @@ int ntb_default_peer_port_number(struct ntb_dev *ntb, int pidx)
case NTB_TOPO_B2B_DSD:
return NTB_PORT_PRI_USD;
default:
- break;
+ return 0;
}
-
- return -EINVAL;
}
EXPORT_SYMBOL(ntb_default_peer_port_number);

@@ -326,4 +322,3 @@ static void __exit ntb_driver_exit(void)
bus_unregister(&ntb_bus);
}
module_exit(ntb_driver_exit);
-
--
2.11.0


2018-06-09 00:11:05

by Logan Gunthorpe

[permalink] [raw]
Subject: [PATCH 5/8] NTB: perf: Don't require one more memory window than peer

NTB Perf should not require more than one memory window per peer. This
was probably an off-by-one error.

Fixes: 5648e56d03fa ("NTB: ntb_perf: Add full multi-port NTB API support")
Signed-off-by: Logan Gunthorpe <[email protected]>
---
drivers/ntb/test/ntb_perf.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/ntb/test/ntb_perf.c b/drivers/ntb/test/ntb_perf.c
index 2a9d6b0d1f19..fe27412ffe91 100644
--- a/drivers/ntb/test/ntb_perf.c
+++ b/drivers/ntb/test/ntb_perf.c
@@ -655,7 +655,7 @@ static int perf_init_service(struct perf_ctx *perf)
{
u64 mask;

- if (ntb_peer_mw_count(perf->ntb) < perf->pcnt + 1) {
+ if (ntb_peer_mw_count(perf->ntb) < perf->pcnt) {
dev_err(&perf->ntb->dev, "Not enough memory windows\n");
return -EINVAL;
}
--
2.11.0


2018-06-09 00:11:15

by Logan Gunthorpe

[permalink] [raw]
Subject: [PATCH 1/8] NTB: ntb_tool: reading the link file should not end in a NULL byte

When running ntb_test this warning is issued:

./ntb_test.sh: line 200: warning: command substitution: ignored null
byte in input

This is caused by the kernel returning one more byte than is necessary
when reading the link file.

Reduce the number of bytes read back to 2 as it was before the
commit that regressed this.

Fixes: 7f46c8b3a552 ("NTB: ntb_tool: Add full multi-port NTB API support")
Signed-off-by: Logan Gunthorpe <[email protected]>
---
drivers/ntb/test/ntb_tool.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/ntb/test/ntb_tool.c b/drivers/ntb/test/ntb_tool.c
index d592c0ffbd19..ec5cf095cdb9 100644
--- a/drivers/ntb/test/ntb_tool.c
+++ b/drivers/ntb/test/ntb_tool.c
@@ -504,7 +504,7 @@ static ssize_t tool_peer_link_read(struct file *filep, char __user *ubuf,
buf[1] = '\n';
buf[2] = '\0';

- return simple_read_from_buffer(ubuf, size, offp, buf, 3);
+ return simple_read_from_buffer(ubuf, size, offp, buf, 2);
}

static TOOL_FOPS_RDWR(tool_peer_link_fops,
@@ -1690,4 +1690,3 @@ static void __exit tool_exit(void)
debugfs_remove_recursive(tool_dbgfs_topdir);
}
module_exit(tool_exit);
-
--
2.11.0


2018-06-09 00:11:26

by Logan Gunthorpe

[permalink] [raw]
Subject: [PATCH 2/8] NTB: Setup the DMA mask globally for all drivers

Commit 417cf39cfea9 ("NTB: Set dma mask and dma coherent mask to NTB
devices") added code to set the DMA mask for the NTB device
to each driver individually. However, it neglected to set it for the
Switchtec driver. So when the monolithic commit 7f46c8b3a552 ("NTB:
ntb_tool: Add full multi-port NTB API support") started allocating
DMA memory against the NTB device it broke the Switchtec driver.

Seeing this is setting up a property of the NTB device, it should be
done by the common NTB code (inside ntb_register_device()) so we can be
sure it's done properly for all drivers. This avoids each driver needing
to duplicate the code and helps prevent us from inadvertently breaking
one of the drivers in the future if we have to make changes in this area.

Fixes: 7f46c8b3a552 ("NTB: ntb_tool: Add full multi-port NTB API support")
Signed-off-by: Logan Gunthorpe <[email protected]>
---
drivers/ntb/hw/amd/ntb_hw_amd.c | 4 ----
drivers/ntb/hw/idt/ntb_hw_idt.c | 6 ------
drivers/ntb/hw/intel/ntb_hw_intel.c | 4 ----
drivers/ntb/ntb.c | 13 ++++++++++++-
4 files changed, 12 insertions(+), 15 deletions(-)

diff --git a/drivers/ntb/hw/amd/ntb_hw_amd.c b/drivers/ntb/hw/amd/ntb_hw_amd.c
index 3cfa46876239..f0788aae05c9 100644
--- a/drivers/ntb/hw/amd/ntb_hw_amd.c
+++ b/drivers/ntb/hw/amd/ntb_hw_amd.c
@@ -1020,10 +1020,6 @@ static int amd_ntb_init_pci(struct amd_ntb_dev *ndev,
goto err_dma_mask;
dev_warn(&pdev->dev, "Cannot DMA consistent highmem\n");
}
- rc = dma_coerce_mask_and_coherent(&ndev->ntb.dev,
- dma_get_mask(&pdev->dev));
- if (rc)
- goto err_dma_mask;

ndev->self_mmio = pci_iomap(pdev, 0, 0);
if (!ndev->self_mmio) {
diff --git a/drivers/ntb/hw/idt/ntb_hw_idt.c b/drivers/ntb/hw/idt/ntb_hw_idt.c
index 8d98872d0983..1918a2db1c43 100644
--- a/drivers/ntb/hw/idt/ntb_hw_idt.c
+++ b/drivers/ntb/hw/idt/ntb_hw_idt.c
@@ -2447,12 +2447,6 @@ static int idt_init_pci(struct idt_ntb_dev *ndev)
dev_warn(&pdev->dev,
"Cannot set consistent DMA highmem bit mask\n");
}
- ret = dma_coerce_mask_and_coherent(&ndev->ntb.dev,
- dma_get_mask(&pdev->dev));
- if (ret != 0) {
- dev_err(&pdev->dev, "Failed to set NTB device DMA bit mask\n");
- return ret;
- }

/*
* Enable the device advanced error reporting. It's not critical to
diff --git a/drivers/ntb/hw/intel/ntb_hw_intel.c b/drivers/ntb/hw/intel/ntb_hw_intel.c
index 156b45cd4a19..341a3d5baa3f 100644
--- a/drivers/ntb/hw/intel/ntb_hw_intel.c
+++ b/drivers/ntb/hw/intel/ntb_hw_intel.c
@@ -2334,10 +2334,6 @@ static int intel_ntb_init_pci(struct intel_ntb_dev *ndev, struct pci_dev *pdev)
goto err_dma_mask;
dev_warn(&pdev->dev, "Cannot DMA consistent highmem\n");
}
- rc = dma_coerce_mask_and_coherent(&ndev->ntb.dev,
- dma_get_mask(&pdev->dev));
- if (rc)
- goto err_dma_mask;

ndev->self_mmio = pci_iomap(pdev, 0, 0);
if (!ndev->self_mmio) {
diff --git a/drivers/ntb/ntb.c b/drivers/ntb/ntb.c
index 2581ab724c34..93f24440d11d 100644
--- a/drivers/ntb/ntb.c
+++ b/drivers/ntb/ntb.c
@@ -100,6 +100,8 @@ EXPORT_SYMBOL(ntb_unregister_client);

int ntb_register_device(struct ntb_dev *ntb)
{
+ int ret;
+
if (!ntb)
return -EINVAL;
if (!ntb->pdev)
@@ -120,7 +122,16 @@ int ntb_register_device(struct ntb_dev *ntb)
ntb->ctx_ops = NULL;
spin_lock_init(&ntb->ctx_lock);

- return device_register(&ntb->dev);
+ device_initialize(&ntb->dev);
+
+ ret = dma_coerce_mask_and_coherent(&ntb->dev,
+ dma_get_mask(&ntb->pdev->dev));
+ if (ret != 0) {
+ dev_err(&ntb->dev, "Failed to set NTB device DMA bit mask\n");
+ return ret;
+ }
+
+ return device_add(&ntb->dev);
}
EXPORT_SYMBOL(ntb_register_device);

--
2.11.0


2018-06-09 00:11:35

by Logan Gunthorpe

[permalink] [raw]
Subject: [PATCH 6/8] NTB: perf: Fix support for hardware that doesn't have port numbers

Legacy drivers do not have port numbers (but is reliably only two ports)
and was broken by the recent commit that added mult-port support to
ntb_perf. This is especially important to support the cross link
topology which is perfectly symmetric and cannot assign unique port
numbers easily.

Hardware that returns zero for both the local port and the peer should
just always use gidx=0 for the only peer.

Fixes: 5648e56d03fa ("NTB: ntb_perf: Add full multi-port NTB API support")
Signed-off-by: Logan Gunthorpe <[email protected]>
---
drivers/ntb/test/ntb_perf.c | 10 ++++++++++
1 file changed, 10 insertions(+)

diff --git a/drivers/ntb/test/ntb_perf.c b/drivers/ntb/test/ntb_perf.c
index fe27412ffe91..6285cb8515ac 100644
--- a/drivers/ntb/test/ntb_perf.c
+++ b/drivers/ntb/test/ntb_perf.c
@@ -1417,6 +1417,16 @@ static int perf_init_peers(struct perf_ctx *perf)
if (perf->gidx == -1)
perf->gidx = pidx;

+ /*
+ * Hardware with only two ports may not have unique port
+ * numbers. In this case, the gidxs should all be zero.
+ */
+ if (perf->pcnt == 1 && ntb_port_number(perf->ntb) == 0 &&
+ ntb_peer_port_number(perf->ntb, 0) == 0) {
+ perf->gidx = 0;
+ perf->peers[0].gidx = 0;
+ }
+
for (pidx = 0; pidx < perf->pcnt; pidx++) {
ret = perf_setup_peer_mw(&perf->peers[pidx]);
if (ret)
--
2.11.0


2018-06-09 00:11:47

by Logan Gunthorpe

[permalink] [raw]
Subject: Re: [PATCH 5/8] NTB: perf: Don't require one more memory window than peer

Oops, sorry, I double posted patch 5. Please disregard the second one.

Logan

On 08/06/18 06:08 PM, Logan Gunthorpe wrote:
> NTB Perf should not require more than one memory window per peer. This
> was probably an off-by-one error.
>
> Fixes: 5648e56d03fa ("NTB: ntb_perf: Add full multi-port NTB API support")
> Signed-off-by: Logan Gunthorpe <[email protected]>
> ---
> drivers/ntb/test/ntb_perf.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/ntb/test/ntb_perf.c b/drivers/ntb/test/ntb_perf.c
> index 2a9d6b0d1f19..fe27412ffe91 100644
> --- a/drivers/ntb/test/ntb_perf.c
> +++ b/drivers/ntb/test/ntb_perf.c
> @@ -655,7 +655,7 @@ static int perf_init_service(struct perf_ctx *perf)
> {
> u64 mask;
>
> - if (ntb_peer_mw_count(perf->ntb) < perf->pcnt + 1) {
> + if (ntb_peer_mw_count(perf->ntb) < perf->pcnt) {
> dev_err(&perf->ntb->dev, "Not enough memory windows\n");
> return -EINVAL;
> }
>

2018-06-09 00:11:48

by Logan Gunthorpe

[permalink] [raw]
Subject: [PATCH 7/8] NTB: perf: Fix race condition when run with ntb_test

When running ntb_test, the script tries to run the ntb_perf test
immediately after probing the modules. Since adding multi-port support,
this fails seeing the new initialization procedure in ntb_perf
can not complete instantly.

To fix this we add a completion which is waited on when a test is
started. In this way, run can be written any time after the module is
loaded and it will wait for the initialization to complete instead of
sending an error.

Fixes: 5648e56d03fa ("NTB: ntb_perf: Add full multi-port NTB API support")
Signed-off-by: Logan Gunthorpe <[email protected]>
---
drivers/ntb/test/ntb_perf.c | 10 ++++++++--
1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/drivers/ntb/test/ntb_perf.c b/drivers/ntb/test/ntb_perf.c
index 6285cb8515ac..8e2b7630ecc9 100644
--- a/drivers/ntb/test/ntb_perf.c
+++ b/drivers/ntb/test/ntb_perf.c
@@ -158,6 +158,8 @@ struct perf_peer {
/* NTB connection setup service */
struct work_struct service;
unsigned long sts;
+
+ struct completion init_comp;
};
#define to_peer_service(__work) \
container_of(__work, struct perf_peer, service)
@@ -549,6 +551,7 @@ static int perf_setup_outbuf(struct perf_peer *peer)

/* Initialization is finally done */
set_bit(PERF_STS_DONE, &peer->sts);
+ complete_all(&peer->init_comp);

return 0;
}
@@ -639,6 +642,7 @@ static void perf_service_work(struct work_struct *work)
perf_setup_outbuf(peer);

if (test_and_clear_bit(PERF_CMD_CLEAR, &peer->sts)) {
+ init_completion(&peer->init_comp);
clear_bit(PERF_STS_DONE, &peer->sts);
if (test_bit(0, &peer->perf->busy_flag) &&
peer == peer->perf->test_peer) {
@@ -1046,8 +1050,9 @@ static int perf_submit_test(struct perf_peer *peer)
struct perf_thread *pthr;
int tidx, ret;

- if (!test_bit(PERF_STS_DONE, &peer->sts))
- return -ENOLINK;
+ ret = wait_for_completion_interruptible(&peer->init_comp);
+ if (ret < 0)
+ return ret;

if (test_and_set_bit_lock(0, &perf->busy_flag))
return -EBUSY;
@@ -1413,6 +1418,7 @@ static int perf_init_peers(struct perf_ctx *perf)
peer->gidx = pidx;
}
INIT_WORK(&peer->service, perf_service_work);
+ init_completion(&peer->init_comp);
}
if (perf->gidx == -1)
perf->gidx = pidx;
--
2.11.0


2018-06-09 00:12:05

by Logan Gunthorpe

[permalink] [raw]
Subject: [PATCH 5/8] NTB: perf: Don't require one more memory window than number of peers

ntb_perf should not require more than one memory window per peer. This
was probably an off-by-one error.

Fixes: 5648e56d03fa ("NTB: ntb_perf: Add full multi-port NTB API support")
Signed-off-by: Logan Gunthorpe <[email protected]>
---
drivers/ntb/test/ntb_perf.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/ntb/test/ntb_perf.c b/drivers/ntb/test/ntb_perf.c
index 2a9d6b0d1f19..fe27412ffe91 100644
--- a/drivers/ntb/test/ntb_perf.c
+++ b/drivers/ntb/test/ntb_perf.c
@@ -655,7 +655,7 @@ static int perf_init_service(struct perf_ctx *perf)
{
u64 mask;

- if (ntb_peer_mw_count(perf->ntb) < perf->pcnt + 1) {
+ if (ntb_peer_mw_count(perf->ntb) < perf->pcnt) {
dev_err(&perf->ntb->dev, "Not enough memory windows\n");
return -EINVAL;
}
--
2.11.0


2018-06-10 01:46:31

by Allen Hubbe

[permalink] [raw]
Subject: Re: [PATCH 0/8] Fix breakage caused by the NTB multi-port patchset

On Fri, Jun 8, 2018 at 8:08 PM, Logan Gunthorpe <[email protected]> wrote:
> Hey,
>
> Here are all the fixes required to get ntb_test on switchtec working
> again after the multi-port test patches were merged.
>
> I'd appreciate it if future changes can be a) more careful about
> not breaking things, b) communicated more clearly so that better
> review can be done, and c) not merged until sufficient review actually
> is done.
>
> Note, I sent the first patch in this series earlier; please disregard
> the earlier one.
>
> Thanks,
>
> Logan
>
> Logan Gunthorpe (8):
> NTB: ntb_tool: reading the link file should not end in a NULL byte
> NTB: Setup the DMA mask globally for all drivers
> NTB: Fix the default port and peer numbers for legacy drivers
> NTB: ntb_pingpong: Choose doorbells based on port number
> NTB: perf: Don't require one more memory window than number of peers
> NTB: perf: Fix support for hardware that doesn't have port numbers
> NTB: perf: Fix race condition when run with ntb_test
> NTB: ntb_test: Fix bug when counting remote files

Thanks Logan.

Series:
Acked-by: Allen Hubbe <[email protected]>

>
> drivers/ntb/hw/amd/ntb_hw_amd.c | 4 ----
> drivers/ntb/hw/idt/ntb_hw_idt.c | 6 ------
> drivers/ntb/hw/intel/ntb_hw_intel.c | 4 ----
> drivers/ntb/ntb.c | 22 ++++++++++++++--------
> drivers/ntb/test/ntb_perf.c | 22 +++++++++++++++++++---
> drivers/ntb/test/ntb_pingpong.c | 14 ++++++--------
> drivers/ntb/test/ntb_tool.c | 3 +--
> tools/testing/selftests/ntb/ntb_test.sh | 2 +-
> 8 files changed, 41 insertions(+), 36 deletions(-)
>
> --
> 2.11.0

2018-06-11 20:55:46

by Dave Jiang

[permalink] [raw]
Subject: Re: [PATCH 2/8] NTB: Setup the DMA mask globally for all drivers



On 06/08/2018 05:08 PM, Logan Gunthorpe wrote:
> Commit 417cf39cfea9 ("NTB: Set dma mask and dma coherent mask to NTB
> devices") added code to set the DMA mask for the NTB device
> to each driver individually. However, it neglected to set it for the
> Switchtec driver. So when the monolithic commit 7f46c8b3a552 ("NTB:
> ntb_tool: Add full multi-port NTB API support") started allocating
> DMA memory against the NTB device it broke the Switchtec driver.
>
> Seeing this is setting up a property of the NTB device, it should be
> done by the common NTB code (inside ntb_register_device()) so we can be
> sure it's done properly for all drivers. This avoids each driver needing
> to duplicate the code and helps prevent us from inadvertently breaking
> one of the drivers in the future if we have to make changes in this area.
>
> Fixes: 7f46c8b3a552 ("NTB: ntb_tool: Add full multi-port NTB API support")
> Signed-off-by: Logan Gunthorpe <[email protected]>

Acked-by: Dave Jiang <[email protected]>
for the Intel parts and the generic parts

> ---
> drivers/ntb/hw/amd/ntb_hw_amd.c | 4 ----
> drivers/ntb/hw/idt/ntb_hw_idt.c | 6 ------
> drivers/ntb/hw/intel/ntb_hw_intel.c | 4 ----
> drivers/ntb/ntb.c | 13 ++++++++++++-
> 4 files changed, 12 insertions(+), 15 deletions(-)
>
> diff --git a/drivers/ntb/hw/amd/ntb_hw_amd.c b/drivers/ntb/hw/amd/ntb_hw_amd.c
> index 3cfa46876239..f0788aae05c9 100644
> --- a/drivers/ntb/hw/amd/ntb_hw_amd.c
> +++ b/drivers/ntb/hw/amd/ntb_hw_amd.c
> @@ -1020,10 +1020,6 @@ static int amd_ntb_init_pci(struct amd_ntb_dev *ndev,
> goto err_dma_mask;
> dev_warn(&pdev->dev, "Cannot DMA consistent highmem\n");
> }
> - rc = dma_coerce_mask_and_coherent(&ndev->ntb.dev,
> - dma_get_mask(&pdev->dev));
> - if (rc)
> - goto err_dma_mask;
>
> ndev->self_mmio = pci_iomap(pdev, 0, 0);
> if (!ndev->self_mmio) {
> diff --git a/drivers/ntb/hw/idt/ntb_hw_idt.c b/drivers/ntb/hw/idt/ntb_hw_idt.c
> index 8d98872d0983..1918a2db1c43 100644
> --- a/drivers/ntb/hw/idt/ntb_hw_idt.c
> +++ b/drivers/ntb/hw/idt/ntb_hw_idt.c
> @@ -2447,12 +2447,6 @@ static int idt_init_pci(struct idt_ntb_dev *ndev)
> dev_warn(&pdev->dev,
> "Cannot set consistent DMA highmem bit mask\n");
> }
> - ret = dma_coerce_mask_and_coherent(&ndev->ntb.dev,
> - dma_get_mask(&pdev->dev));
> - if (ret != 0) {
> - dev_err(&pdev->dev, "Failed to set NTB device DMA bit mask\n");
> - return ret;
> - }
>
> /*
> * Enable the device advanced error reporting. It's not critical to
> diff --git a/drivers/ntb/hw/intel/ntb_hw_intel.c b/drivers/ntb/hw/intel/ntb_hw_intel.c
> index 156b45cd4a19..341a3d5baa3f 100644
> --- a/drivers/ntb/hw/intel/ntb_hw_intel.c
> +++ b/drivers/ntb/hw/intel/ntb_hw_intel.c
> @@ -2334,10 +2334,6 @@ static int intel_ntb_init_pci(struct intel_ntb_dev *ndev, struct pci_dev *pdev)
> goto err_dma_mask;
> dev_warn(&pdev->dev, "Cannot DMA consistent highmem\n");
> }
> - rc = dma_coerce_mask_and_coherent(&ndev->ntb.dev,
> - dma_get_mask(&pdev->dev));
> - if (rc)
> - goto err_dma_mask;
>
> ndev->self_mmio = pci_iomap(pdev, 0, 0);
> if (!ndev->self_mmio) {
> diff --git a/drivers/ntb/ntb.c b/drivers/ntb/ntb.c
> index 2581ab724c34..93f24440d11d 100644
> --- a/drivers/ntb/ntb.c
> +++ b/drivers/ntb/ntb.c
> @@ -100,6 +100,8 @@ EXPORT_SYMBOL(ntb_unregister_client);
>
> int ntb_register_device(struct ntb_dev *ntb)
> {
> + int ret;
> +
> if (!ntb)
> return -EINVAL;
> if (!ntb->pdev)
> @@ -120,7 +122,16 @@ int ntb_register_device(struct ntb_dev *ntb)
> ntb->ctx_ops = NULL;
> spin_lock_init(&ntb->ctx_lock);
>
> - return device_register(&ntb->dev);
> + device_initialize(&ntb->dev);
> +
> + ret = dma_coerce_mask_and_coherent(&ntb->dev,
> + dma_get_mask(&ntb->pdev->dev));
> + if (ret != 0) {
> + dev_err(&ntb->dev, "Failed to set NTB device DMA bit mask\n");
> + return ret;
> + }
> +
> + return device_add(&ntb->dev);
> }
> EXPORT_SYMBOL(ntb_register_device);
>
>

2018-06-12 15:52:07

by Jon Mason

[permalink] [raw]
Subject: Re: [PATCH 2/8] NTB: Setup the DMA mask globally for all drivers

On Fri, Jun 8, 2018 at 8:08 PM, Logan Gunthorpe <[email protected]> wrote:
> Commit 417cf39cfea9 ("NTB: Set dma mask and dma coherent mask to NTB
> devices") added code to set the DMA mask for the NTB device
> to each driver individually. However, it neglected to set it for the
> Switchtec driver. So when the monolithic commit 7f46c8b3a552 ("NTB:
> ntb_tool: Add full multi-port NTB API support") started allocating
> DMA memory against the NTB device it broke the Switchtec driver.
>
> Seeing this is setting up a property of the NTB device, it should be
> done by the common NTB code (inside ntb_register_device()) so we can be
> sure it's done properly for all drivers. This avoids each driver needing
> to duplicate the code and helps prevent us from inadvertently breaking
> one of the drivers in the future if we have to make changes in this area.
>
> Fixes: 7f46c8b3a552 ("NTB: ntb_tool: Add full multi-port NTB API support")
> Signed-off-by: Logan Gunthorpe <[email protected]>
> ---
> drivers/ntb/hw/amd/ntb_hw_amd.c | 4 ----
> drivers/ntb/hw/idt/ntb_hw_idt.c | 6 ------
> drivers/ntb/hw/intel/ntb_hw_intel.c | 4 ----
> drivers/ntb/ntb.c | 13 ++++++++++++-
> 4 files changed, 12 insertions(+), 15 deletions(-)
>
> diff --git a/drivers/ntb/hw/amd/ntb_hw_amd.c b/drivers/ntb/hw/amd/ntb_hw_amd.c
> index 3cfa46876239..f0788aae05c9 100644
> --- a/drivers/ntb/hw/amd/ntb_hw_amd.c
> +++ b/drivers/ntb/hw/amd/ntb_hw_amd.c
> @@ -1020,10 +1020,6 @@ static int amd_ntb_init_pci(struct amd_ntb_dev *ndev,
> goto err_dma_mask;
> dev_warn(&pdev->dev, "Cannot DMA consistent highmem\n");
> }
> - rc = dma_coerce_mask_and_coherent(&ndev->ntb.dev,
> - dma_get_mask(&pdev->dev));
> - if (rc)
> - goto err_dma_mask;
>
> ndev->self_mmio = pci_iomap(pdev, 0, 0);
> if (!ndev->self_mmio) {
> diff --git a/drivers/ntb/hw/idt/ntb_hw_idt.c b/drivers/ntb/hw/idt/ntb_hw_idt.c
> index 8d98872d0983..1918a2db1c43 100644
> --- a/drivers/ntb/hw/idt/ntb_hw_idt.c
> +++ b/drivers/ntb/hw/idt/ntb_hw_idt.c
> @@ -2447,12 +2447,6 @@ static int idt_init_pci(struct idt_ntb_dev *ndev)
> dev_warn(&pdev->dev,
> "Cannot set consistent DMA highmem bit mask\n");
> }
> - ret = dma_coerce_mask_and_coherent(&ndev->ntb.dev,
> - dma_get_mask(&pdev->dev));
> - if (ret != 0) {
> - dev_err(&pdev->dev, "Failed to set NTB device DMA bit mask\n");
> - return ret;
> - }
>
> /*
> * Enable the device advanced error reporting. It's not critical to
> diff --git a/drivers/ntb/hw/intel/ntb_hw_intel.c b/drivers/ntb/hw/intel/ntb_hw_intel.c
> index 156b45cd4a19..341a3d5baa3f 100644
> --- a/drivers/ntb/hw/intel/ntb_hw_intel.c
> +++ b/drivers/ntb/hw/intel/ntb_hw_intel.c
> @@ -2334,10 +2334,6 @@ static int intel_ntb_init_pci(struct intel_ntb_dev *ndev, struct pci_dev *pdev)
> goto err_dma_mask;
> dev_warn(&pdev->dev, "Cannot DMA consistent highmem\n");
> }
> - rc = dma_coerce_mask_and_coherent(&ndev->ntb.dev,
> - dma_get_mask(&pdev->dev));
> - if (rc)
> - goto err_dma_mask;
>
> ndev->self_mmio = pci_iomap(pdev, 0, 0);
> if (!ndev->self_mmio) {
> diff --git a/drivers/ntb/ntb.c b/drivers/ntb/ntb.c
> index 2581ab724c34..93f24440d11d 100644
> --- a/drivers/ntb/ntb.c
> +++ b/drivers/ntb/ntb.c
> @@ -100,6 +100,8 @@ EXPORT_SYMBOL(ntb_unregister_client);
>
> int ntb_register_device(struct ntb_dev *ntb)
> {
> + int ret;
> +
> if (!ntb)
> return -EINVAL;
> if (!ntb->pdev)
> @@ -120,7 +122,16 @@ int ntb_register_device(struct ntb_dev *ntb)
> ntb->ctx_ops = NULL;
> spin_lock_init(&ntb->ctx_lock);
>
> - return device_register(&ntb->dev);
> + device_initialize(&ntb->dev);
> +
> + ret = dma_coerce_mask_and_coherent(&ntb->dev,
> + dma_get_mask(&ntb->pdev->dev));

ntb.c is more of a glue layer, and this is more device specific.
While I like adding it here for more common code, it should probably
reside in the ntb_hw_*.c files to enforce the hw specific code all
reside in that layer. So, this probably needs to be replaced with a
patch which adds the setting of the mask to the switchtec driver.

Thanks,
Jon


> + if (ret != 0) {
> + dev_err(&ntb->dev, "Failed to set NTB device DMA bit mask\n");
> + return ret;
> + }
> +
> + return device_add(&ntb->dev);
> }
> EXPORT_SYMBOL(ntb_register_device);
>
> --
> 2.11.0
>

2018-06-12 16:00:34

by Jon Mason

[permalink] [raw]
Subject: Re: [PATCH 3/8] NTB: Fix the default port and peer numbers for legacy drivers

On Fri, Jun 8, 2018 at 8:08 PM, Logan Gunthorpe <[email protected]> wrote:
> When the commit adding ntb_default_port_number() and
> ntb_default_peer_port_number() entered the kernel there was no
> users of it so it was impossible to tell what the API needed.
>
> When a user finally landed a year later (ntb_pingpong) there were
> more NTB topologies were created and no consideration was considered
> to how other drivers had changed.
>
> Now that there is a user it can be fixed to provide a sensible default
> for the legacy drivers that do not implement ntb_{peer_}port_number().
> Seeing ntb_pingpong doesn't check error codes returning EINVAL was also
> not sensible.
>
> Patches for ntb_pingpong and ntb_perf follow (which are broken
> otherwise) to support hardware that doesn't have port numbers. This is
> important not only to not break support with existing drivers but for
> the cross link topology which, due to its perfect symmetry, cannot
> assign unique port numbers to each side.

This is a very long way of saying "no clients are checking the error
codes, so removing them". :)
I think the history and references to follow-on patches are not
necessary in the commit message and belong more in a 0/X.

This is more of a feature than a bug fix. Can you break this (and the
pingpong and perf changes caused by this) off into a separate series,
as I'll want to apply this to the ntb-next and not bugfixes branch?

Thanks,
Jon

> Fixes: 1e5301196a88 ("NTB: Add indexed ports NTB API")
> Signed-off-by: Logan Gunthorpe <[email protected]>
> ---
> drivers/ntb/ntb.c | 9 ++-------
> 1 file changed, 2 insertions(+), 7 deletions(-)
>
> diff --git a/drivers/ntb/ntb.c b/drivers/ntb/ntb.c
> index 93f24440d11d..d955a92a095a 100644
> --- a/drivers/ntb/ntb.c
> +++ b/drivers/ntb/ntb.c
> @@ -225,10 +225,8 @@ int ntb_default_port_number(struct ntb_dev *ntb)
> case NTB_TOPO_B2B_DSD:
> return NTB_PORT_SEC_DSD;
> default:
> - break;
> + return 0;
> }
> -
> - return -EINVAL;
> }
> EXPORT_SYMBOL(ntb_default_port_number);
>
> @@ -251,10 +249,8 @@ int ntb_default_peer_port_number(struct ntb_dev *ntb, int pidx)
> case NTB_TOPO_B2B_DSD:
> return NTB_PORT_PRI_USD;
> default:
> - break;
> + return 0;
> }
> -
> - return -EINVAL;
> }
> EXPORT_SYMBOL(ntb_default_peer_port_number);
>
> @@ -326,4 +322,3 @@ static void __exit ntb_driver_exit(void)
> bus_unregister(&ntb_bus);
> }
> module_exit(ntb_driver_exit);
> -
> --
> 2.11.0
>

2018-06-12 16:20:31

by Logan Gunthorpe

[permalink] [raw]
Subject: Re: [PATCH 2/8] NTB: Setup the DMA mask globally for all drivers



On 12/06/18 09:48 AM, Jon Mason wrote:
> ntb.c is more of a glue layer, and this is more device specific.
> While I like adding it here for more common code, it should probably
> reside in the ntb_hw_*.c files to enforce the hw specific code all
> reside in that layer. So, this probably needs to be replaced with a
> patch which adds the setting of the mask to the switchtec driver.

I disagree strongly. You can tell it's not device specific seeing we
have 4 devices that need the exact same code. In fact, there is nothing
device specific in those lines of code as the device specific part comes
when a driver sets the PCI parent device's DMA mask. These lines just
initialize the dma_mask for the new NTB device with its parent's mask.
This is just sensible given that nothing now works if it is not done and
trusting driver writers to get it right is not a good idea seeing we
already screwed it up once. Furthermore, it violates DRY (do not repeat
yourself).

If there is something driver specific that must be done (although I
can't actually imagine what this would be) the drivers are free to
change the mask after calling ntb_register but getting the common case
setup in common code is just good design.

Logan

2018-06-12 16:31:47

by Logan Gunthorpe

[permalink] [raw]
Subject: Re: [PATCH 3/8] NTB: Fix the default port and peer numbers for legacy drivers



On 12/06/18 09:59 AM, Jon Mason wrote:
>> Patches for ntb_pingpong and ntb_perf follow (which are broken
>> otherwise) to support hardware that doesn't have port numbers. This is
>> important not only to not break support with existing drivers but for
>> the cross link topology which, due to its perfect symmetry, cannot
>> assign unique port numbers to each side.
>
> This is a very long way of saying "no clients are checking the error
> codes, so removing them". :)

Well, clients not checking the error code made this harder to debug for
sure, but removing the error code is a side effect and not what is
happening here (in fact someone should probably still go back and add
error checking because these functions can still return errors but
that's not really something I have time to do). After the next couple
patches, the clients will use this change to detect that there are no
port numbers and handle things similarly to the way they did before they
were broken by the multiport changes.

> I think the history and references to follow-on patches are not
> necessary in the commit message and belong more in a 0/X.

This is the opposite of what I've ever heard before. Having a commit
message that explains what led up to this commit is a good thing and
allows people debugging in the future to better understand the decisions
made. People debugging commits will never find the 0/X cover letter
which is just intended to introduce the series to reviewers and describe
changes if the series is posted multiple times.

> This is more of a feature than a bug fix. Can you break this (and the
> pingpong and perf changes caused by this) off into a separate series,
> as I'll want to apply this to the ntb-next and not bugfixes branch?

No this is not a feature request. This is fixing a regression that broke
previously working code in the only sensible way I can come up with. If
you have a better way to fix this, I'd be glad to hear it. But this
should *not* be treated as a feature request.

Logan

2018-06-15 19:47:57

by Serge Semin

[permalink] [raw]
Subject: Re: [PATCH 0/8] Fix breakage caused by the NTB multi-port patchset

On Fri, Jun 08, 2018 at 06:08:10PM -0600, Logan Gunthorpe <[email protected]> wrote:

Good day, Logan.
Thanks for the patchset you submitted. My hopefully useful comments are
under the corresponding patches.

Regards,
-Sergey

> Hey,
>
> Here are all the fixes required to get ntb_test on switchtec working
> again after the multi-port test patches were merged.
>
> I'd appreciate it if future changes can be a) more careful about
> not breaking things, b) communicated more clearly so that better
> review can be done, and c) not merged until sufficient review actually
> is done.
>
> Note, I sent the first patch in this series earlier; please disregard
> the earlier one.
>
> Thanks,
>
> Logan
>
> Logan Gunthorpe (8):
> NTB: ntb_tool: reading the link file should not end in a NULL byte
> NTB: Setup the DMA mask globally for all drivers
> NTB: Fix the default port and peer numbers for legacy drivers
> NTB: ntb_pingpong: Choose doorbells based on port number
> NTB: perf: Don't require one more memory window than number of peers
> NTB: perf: Fix support for hardware that doesn't have port numbers
> NTB: perf: Fix race condition when run with ntb_test
> NTB: ntb_test: Fix bug when counting remote files
>
> drivers/ntb/hw/amd/ntb_hw_amd.c | 4 ----
> drivers/ntb/hw/idt/ntb_hw_idt.c | 6 ------
> drivers/ntb/hw/intel/ntb_hw_intel.c | 4 ----
> drivers/ntb/ntb.c | 22 ++++++++++++++--------
> drivers/ntb/test/ntb_perf.c | 22 +++++++++++++++++++---
> drivers/ntb/test/ntb_pingpong.c | 14 ++++++--------
> drivers/ntb/test/ntb_tool.c | 3 +--
> tools/testing/selftests/ntb/ntb_test.sh | 2 +-
> 8 files changed, 41 insertions(+), 36 deletions(-)
>
> --
> 2.11.0
>
> --
> You received this message because you are subscribed to the Google Groups "linux-ntb" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
> To post to this group, send email to [email protected].
> To view this discussion on the web visit https://groups.google.com/d/msgid/linux-ntb/20180609000819.13883-1-logang%40deltatee.com.
> For more options, visit https://groups.google.com/d/optout.

2018-06-15 19:48:31

by Serge Semin

[permalink] [raw]
Subject: Re: [PATCH 1/8] NTB: ntb_tool: reading the link file should not end in a NULL byte

On Fri, Jun 08, 2018 at 06:08:11PM -0600, Logan Gunthorpe <[email protected]> wrote:
> When running ntb_test this warning is issued:
>
> ./ntb_test.sh: line 200: warning: command substitution: ignored null
> byte in input
>

This is weird. Neither me nor the folks' who tested the script saw this warning.
I tried it on my laptop with bash and on a target device with busybox-shell. The
warning never occurred. I even tried a simple command like:
[[ $(echo -ne "\x4e\x0a\00") == "N" ]] && echo "True"

It might be that your bash is more modern than mine. Anyway if this patch solves the
problem you see, that's great. Thanks for it.

-Sergey

> This is caused by the kernel returning one more byte than is necessary
> when reading the link file.
>
> Reduce the number of bytes read back to 2 as it was before the
> commit that regressed this.
>
> Fixes: 7f46c8b3a552 ("NTB: ntb_tool: Add full multi-port NTB API support")
> Signed-off-by: Logan Gunthorpe <[email protected]>
> ---
> drivers/ntb/test/ntb_tool.c | 3 +--
> 1 file changed, 1 insertion(+), 2 deletions(-)
>
> diff --git a/drivers/ntb/test/ntb_tool.c b/drivers/ntb/test/ntb_tool.c
> index d592c0ffbd19..ec5cf095cdb9 100644
> --- a/drivers/ntb/test/ntb_tool.c
> +++ b/drivers/ntb/test/ntb_tool.c
> @@ -504,7 +504,7 @@ static ssize_t tool_peer_link_read(struct file *filep, char __user *ubuf,
> buf[1] = '\n';
> buf[2] = '\0';
>
> - return simple_read_from_buffer(ubuf, size, offp, buf, 3);
> + return simple_read_from_buffer(ubuf, size, offp, buf, 2);
> }
>
> static TOOL_FOPS_RDWR(tool_peer_link_fops,
> @@ -1690,4 +1690,3 @@ static void __exit tool_exit(void)
> debugfs_remove_recursive(tool_dbgfs_topdir);
> }
> module_exit(tool_exit);
> -
> --
> 2.11.0
>

2018-06-15 19:49:40

by Serge Semin

[permalink] [raw]
Subject: Re: [PATCH 3/8] NTB: Fix the default port and peer numbers for legacy drivers

On Fri, Jun 08, 2018 at 06:08:13PM -0600, Logan Gunthorpe <[email protected]> wrote:
> When the commit adding ntb_default_port_number() and
> ntb_default_peer_port_number() entered the kernel there was no
> users of it so it was impossible to tell what the API needed.
>
> When a user finally landed a year later (ntb_pingpong) there were
> more NTB topologies were created and no consideration was considered
> to how other drivers had changed.
>
> Now that there is a user it can be fixed to provide a sensible default
> for the legacy drivers that do not implement ntb_{peer_}port_number().
> Seeing ntb_pingpong doesn't check error codes returning EINVAL was also
> not sensible.
>
> Patches for ntb_pingpong and ntb_perf follow (which are broken
> otherwise) to support hardware that doesn't have port numbers. This is
> important not only to not break support with existing drivers but for
> the cross link topology which, due to its perfect symmetry, cannot
> assign unique port numbers to each side.
>
> Fixes: 1e5301196a88 ("NTB: Add indexed ports NTB API")
> Signed-off-by: Logan Gunthorpe <[email protected]>

As a part of the multi-port NTB API the port-index interface was freshly
introduced. The main idea was to somehow address local/peer domains within one
NTB device, since from now there can be more than one peer domain to send
message to or to set MWs up with. For this we invented the two-spaces interface
which mapped in general non-linear ports space to the locally linear ports
indexes space, and vise-versa. That mapping was implemented by new callbacks:
ntb_port*()/ntb_peer_port*().

Even though it perfectly fitted the IDT NTB functions, the Intel/AMD devices
didn't have explicit ports numbering. Instead we decided to assign the numbers
by using the topology type. So the Primary and B2B US sides got port
NTB_PORT_PRI_USD, Secondary and B2B DS sides got port NTB_PORT_SEC_DSD.
In order to make it being default for all pure two-ports devices like
Intel/AMD the new methods ntb_default_port_number() and
ntb_default_peer_port_number() were developed and utilized in the
ntb_port*()/ntb_peer_port*() API functions (see ntb.h header file).

So to speak the main purpose of the default methods is to assign some unique
port number to the NTB devices based on the topology at current implementation.
Please note, that it is essential for the NTB API to have each port uniquely
enumerated within one device. This is the way the multi-port NTB API has been
designed in the first place. That was the reason we altered the Intel/AMD and
IDT drivers about two years ago.

Based on this I redeveloped the ntb_tool/ntb_perf/ntb_pingpong drivers.
Needless to say that I was sure all the NTB devices followed the API convention
regarding the port numbers. Since the Switchtec driver doesn't provide the
explicit port-index API callbacks, the NTB API internals uses the default
methods, which as you can see don't know anything about SWITCH and CROSSLINK
topologies. That's why the methods return -EINVAL so the test drivers don't
work properly.

Concerning the fix of the discovered issues and fixes introduced by this
patchset. I'd suggest to add the ports-index callbacks to the Switchtec
driver, which identify local and peer ports. After this the current version
of all the test drivers shall perfectly work.

As far as I can see the PFX family switches documentation operates with the
definitions like Ports/Partitions (similar to the IDT switches) as well as
the switchtec management driver. It might be a clue to the switch functionality,
which can be used to find something similar to the ports numbering.

Regards,
-Sergey

> ---
> drivers/ntb/ntb.c | 9 ++-------
> 1 file changed, 2 insertions(+), 7 deletions(-)
>
> diff --git a/drivers/ntb/ntb.c b/drivers/ntb/ntb.c
> index 93f24440d11d..d955a92a095a 100644
> --- a/drivers/ntb/ntb.c
> +++ b/drivers/ntb/ntb.c
> @@ -225,10 +225,8 @@ int ntb_default_port_number(struct ntb_dev *ntb)
> case NTB_TOPO_B2B_DSD:
> return NTB_PORT_SEC_DSD;
> default:
> - break;
> + return 0;
> }
> -
> - return -EINVAL;
> }
> EXPORT_SYMBOL(ntb_default_port_number);
>
> @@ -251,10 +249,8 @@ int ntb_default_peer_port_number(struct ntb_dev *ntb, int pidx)
> case NTB_TOPO_B2B_DSD:
> return NTB_PORT_PRI_USD;
> default:
> - break;
> + return 0;
> }
> -
> - return -EINVAL;
> }
> EXPORT_SYMBOL(ntb_default_peer_port_number);
>
> @@ -326,4 +322,3 @@ static void __exit ntb_driver_exit(void)
> bus_unregister(&ntb_bus);
> }
> module_exit(ntb_driver_exit);
> -
> --
> 2.11.0
>
> --
> You received this message because you are subscribed to the Google Groups "linux-ntb" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
> To post to this group, send email to [email protected].
> To view this discussion on the web visit https://groups.google.com/d/msgid/linux-ntb/20180609000819.13883-4-logang%40deltatee.com.
> For more options, visit https://groups.google.com/d/optout.

2018-06-15 19:50:08

by Serge Semin

[permalink] [raw]
Subject: Re: [PATCH 4/8] NTB: ntb_pingpong: Choose doorbells based on port number

On Fri, Jun 08, 2018 at 06:08:14PM -0600, Logan Gunthorpe <[email protected]> wrote:
> This commit fixes pingpong support for existing drivers that do not
> implement ntb_default_port_number() and ntb_default_peer_port_number().
> This is required for hardware (like the crosslink topology of
> switchtec) which cannot assign reasonable port numbers to each port due
> to its perfect symmetry.
>
> Instead of picking the doorbell to use based on the the index of the
> peer, we use the peer's port number. This is a bit clearer and easier
> to understand.
>

Thanks for the patch. It was the original version of the ping-pong driver,
I was going to submit. But I've decided to develop it a bit different. And
here is why.

My goal was to create the multi-port version of the ping-pong test.
The idea of the new driver was to implement the cyclic port-to-port
ping-pong algorithm. Simply speaking each port selects two partner-ports,
one partner would be used as the source of pings and another one would be
target of pongs sent to with the defined delay.

Since IDT got a global Doorbell register, which is shared between all
the ports, I had to assign an unique doorbell bit to each port. I created a
simple algorithm, which linearised in general non-linear port numbers.
Then I used the globally unique port index to select the corresponding
doorbell bit. pp_init_flds() methods implements the corresponding algorithm,
while pp_find_next_peer() performs the next port selection to convey the
pong to.

Regarding the patch. The idea of using the port number instead of linearised
unique index should also work for Intel/AMD/IDT drivers. But the ports-space
linearization algorithm was created for the case if the real port numbers
would exceed the available Doorbell bits. I thought this might be the case of
multi-ports version of the switchtec driver.

Needless to say, that if Switchtec driver had the ports-index API implementation,
this patch wouldn't be needed.

Regards,
-Sergey

> Fixes: c7aeb0afdcc2 ("NTB: ntb_pp: Add full multi-port NTB API support")
> Signed-off-by: Logan Gunthorpe <[email protected]>
> ---
> drivers/ntb/test/ntb_pingpong.c | 14 ++++++--------
> 1 file changed, 6 insertions(+), 8 deletions(-)
>
> diff --git a/drivers/ntb/test/ntb_pingpong.c b/drivers/ntb/test/ntb_pingpong.c
> index 65865e460ab8..18d00eec7b02 100644
> --- a/drivers/ntb/test/ntb_pingpong.c
> +++ b/drivers/ntb/test/ntb_pingpong.c
> @@ -121,15 +121,14 @@ static int pp_find_next_peer(struct pp_ctx *pp)
> link = ntb_link_is_up(pp->ntb, NULL, NULL);
>
> /* Find next available peer */
> - if (link & pp->nmask) {
> + if (link & pp->nmask)
> pidx = __ffs64(link & pp->nmask);
> - out_db = BIT_ULL(pidx + 1);
> - } else if (link & pp->pmask) {
> + else if (link & pp->pmask)
> pidx = __ffs64(link & pp->pmask);
> - out_db = BIT_ULL(pidx);
> - } else {
> + else
> return -ENODEV;
> - }
> +
> + out_db = BIT_ULL(ntb_peer_port_number(pp->ntb, pidx));
>
> spin_lock(&pp->lock);
> pp->out_pidx = pidx;
> @@ -303,7 +302,7 @@ static void pp_init_flds(struct pp_ctx *pp)
> break;
> }
>
> - pp->in_db = BIT_ULL(pidx);
> + pp->in_db = BIT_ULL(lport);
> pp->pmask = GENMASK_ULL(pidx, 0) >> 1;
> pp->nmask = GENMASK_ULL(pcnt - 1, pidx);
>
> @@ -435,4 +434,3 @@ static void __exit pp_exit(void)
> debugfs_remove_recursive(pp_dbgfs_topdir);
> }
> module_exit(pp_exit);
> -
> --
> 2.11.0
>
> --
> You received this message because you are subscribed to the Google Groups "linux-ntb" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
> To post to this group, send email to [email protected].
> To view this discussion on the web visit https://groups.google.com/d/msgid/linux-ntb/20180609000819.13883-5-logang%40deltatee.com.
> For more options, visit https://groups.google.com/d/optout.

2018-06-15 19:51:33

by Serge Semin

[permalink] [raw]
Subject: Re: [PATCH 5/8] NTB: perf: Don't require one more memory window than number of peers

On Fri, Jun 08, 2018 at 06:08:15PM -0600, Logan Gunthorpe <[email protected]> wrote:
> ntb_perf should not require more than one memory window per peer. This
> was probably an off-by-one error.
>

Good catch. Thanks. IDT got a lot of MWs especially if LookUpTables are
enabled. That's why I didn't find the effect of this error.

Regards,
-Sergey

> Fixes: 5648e56d03fa ("NTB: ntb_perf: Add full multi-port NTB API support")
> Signed-off-by: Logan Gunthorpe <[email protected]>
> ---
> drivers/ntb/test/ntb_perf.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/ntb/test/ntb_perf.c b/drivers/ntb/test/ntb_perf.c
> index 2a9d6b0d1f19..fe27412ffe91 100644
> --- a/drivers/ntb/test/ntb_perf.c
> +++ b/drivers/ntb/test/ntb_perf.c
> @@ -655,7 +655,7 @@ static int perf_init_service(struct perf_ctx *perf)
> {
> u64 mask;
>
> - if (ntb_peer_mw_count(perf->ntb) < perf->pcnt + 1) {
> + if (ntb_peer_mw_count(perf->ntb) < perf->pcnt) {
> dev_err(&perf->ntb->dev, "Not enough memory windows\n");
> return -EINVAL;
> }
> --
> 2.11.0
>
> --
> You received this message because you are subscribed to the Google Groups "linux-ntb" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
> To post to this group, send email to [email protected].
> To view this discussion on the web visit https://groups.google.com/d/msgid/linux-ntb/20180609000819.13883-6-logang%40deltatee.com.
> For more options, visit https://groups.google.com/d/optout.

2018-06-15 19:51:42

by Serge Semin

[permalink] [raw]
Subject: Re: [PATCH 6/8] NTB: perf: Fix support for hardware that doesn't have port numbers

On Fri, Jun 08, 2018 at 06:08:17PM -0600, Logan Gunthorpe <[email protected]> wrote:
> Legacy drivers do not have port numbers (but is reliably only two ports)
> and was broken by the recent commit that added mult-port support to
> ntb_perf. This is especially important to support the cross link
> topology which is perfectly symmetric and cannot assign unique port
> numbers easily.
>

Please, see the comment to the patch 3/8. I explained everything there
including the fact, that the Intel/AMD drivers do have unique port numbers
assigned.

Regards,
-Sergey

> Hardware that returns zero for both the local port and the peer should
> just always use gidx=0 for the only peer.
>
> Fixes: 5648e56d03fa ("NTB: ntb_perf: Add full multi-port NTB API support")
> Signed-off-by: Logan Gunthorpe <[email protected]>
> ---
> drivers/ntb/test/ntb_perf.c | 10 ++++++++++
> 1 file changed, 10 insertions(+)
>
> diff --git a/drivers/ntb/test/ntb_perf.c b/drivers/ntb/test/ntb_perf.c
> index fe27412ffe91..6285cb8515ac 100644
> --- a/drivers/ntb/test/ntb_perf.c
> +++ b/drivers/ntb/test/ntb_perf.c
> @@ -1417,6 +1417,16 @@ static int perf_init_peers(struct perf_ctx *perf)
> if (perf->gidx == -1)
> perf->gidx = pidx;
>
> + /*
> + * Hardware with only two ports may not have unique port
> + * numbers. In this case, the gidxs should all be zero.
> + */
> + if (perf->pcnt == 1 && ntb_port_number(perf->ntb) == 0 &&
> + ntb_peer_port_number(perf->ntb, 0) == 0) {
> + perf->gidx = 0;
> + perf->peers[0].gidx = 0;
> + }
> +
> for (pidx = 0; pidx < perf->pcnt; pidx++) {
> ret = perf_setup_peer_mw(&perf->peers[pidx]);
> if (ret)
> --
> 2.11.0
>
> --
> You received this message because you are subscribed to the Google Groups "linux-ntb" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
> To post to this group, send email to [email protected].
> To view this discussion on the web visit https://groups.google.com/d/msgid/linux-ntb/20180609000819.13883-8-logang%40deltatee.com.
> For more options, visit https://groups.google.com/d/optout.

2018-06-15 19:52:02

by Serge Semin

[permalink] [raw]
Subject: Re: [PATCH 7/8] NTB: perf: Fix race condition when run with ntb_test

On Fri, Jun 08, 2018 at 06:08:18PM -0600, Logan Gunthorpe <[email protected]> wrote:
> When running ntb_test, the script tries to run the ntb_perf test
> immediately after probing the modules. Since adding multi-port support,
> this fails seeing the new initialization procedure in ntb_perf
> can not complete instantly.
>
> To fix this we add a completion which is waited on when a test is
> started. In this way, run can be written any time after the module is
> loaded and it will wait for the initialization to complete instead of
> sending an error.
>

Hmm, this behavior is the feature of the driver and isn't a bug or race to be
fixed. ntb_perf driver returns -ENOLINK until the link is actually established,
when the memory windows are properly initialized so the test can be performed.
What do you think of leaving the algorithm as is, but instead to develop
the polling scheme in the ntb_test.sh script and break the script execution if
the link isn't established after sometime? At least we won't need to wait forever
in case if the peer hanged up or crashed while the NTB link negotiation algorithm
was in-progress.

Regards,
-Sergey

> Fixes: 5648e56d03fa ("NTB: ntb_perf: Add full multi-port NTB API support")
> Signed-off-by: Logan Gunthorpe <[email protected]>
> ---
> drivers/ntb/test/ntb_perf.c | 10 ++++++++--
> 1 file changed, 8 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/ntb/test/ntb_perf.c b/drivers/ntb/test/ntb_perf.c
> index 6285cb8515ac..8e2b7630ecc9 100644
> --- a/drivers/ntb/test/ntb_perf.c
> +++ b/drivers/ntb/test/ntb_perf.c
> @@ -158,6 +158,8 @@ struct perf_peer {
> /* NTB connection setup service */
> struct work_struct service;
> unsigned long sts;
> +
> + struct completion init_comp;
> };
> #define to_peer_service(__work) \
> container_of(__work, struct perf_peer, service)
> @@ -549,6 +551,7 @@ static int perf_setup_outbuf(struct perf_peer *peer)
>
> /* Initialization is finally done */
> set_bit(PERF_STS_DONE, &peer->sts);
> + complete_all(&peer->init_comp);
>
> return 0;
> }
> @@ -639,6 +642,7 @@ static void perf_service_work(struct work_struct *work)
> perf_setup_outbuf(peer);
>
> if (test_and_clear_bit(PERF_CMD_CLEAR, &peer->sts)) {
> + init_completion(&peer->init_comp);
> clear_bit(PERF_STS_DONE, &peer->sts);
> if (test_bit(0, &peer->perf->busy_flag) &&
> peer == peer->perf->test_peer) {
> @@ -1046,8 +1050,9 @@ static int perf_submit_test(struct perf_peer *peer)
> struct perf_thread *pthr;
> int tidx, ret;
>
> - if (!test_bit(PERF_STS_DONE, &peer->sts))
> - return -ENOLINK;
> + ret = wait_for_completion_interruptible(&peer->init_comp);
> + if (ret < 0)
> + return ret;
>
> if (test_and_set_bit_lock(0, &perf->busy_flag))
> return -EBUSY;
> @@ -1413,6 +1418,7 @@ static int perf_init_peers(struct perf_ctx *perf)
> peer->gidx = pidx;
> }
> INIT_WORK(&peer->service, perf_service_work);
> + init_completion(&peer->init_comp);
> }
> if (perf->gidx == -1)
> perf->gidx = pidx;
> --
> 2.11.0
>
> --
> You received this message because you are subscribed to the Google Groups "linux-ntb" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
> To post to this group, send email to [email protected].
> To view this discussion on the web visit https://groups.google.com/d/msgid/linux-ntb/20180609000819.13883-9-logang%40deltatee.com.
> For more options, visit https://groups.google.com/d/optout.

2018-06-15 19:52:17

by Logan Gunthorpe

[permalink] [raw]
Subject: Re: [PATCH 3/8] NTB: Fix the default port and peer numbers for legacy drivers



On 15/06/18 01:48 PM, Serge Semin wrote:
> Concerning the fix of the discovered issues and fixes introduced by this
> patchset. I'd suggest to add the ports-index callbacks to the Switchtec
> driver, which identify local and peer ports. After this the current version
> of all the test drivers shall perfectly work.

Well that will work for the simple switchtec case. The crosslink
topology CAN NOT produce port numbers like you ask. It is perfectly
symmetric and the two hosts cannot reliably figure out which is port 0
and which is port 1. So these patches support this case.

Logan

2018-06-15 19:52:31

by Serge Semin

[permalink] [raw]
Subject: Re: [PATCH 8/8] NTB: ntb_test: Fix bug when counting remote files

On Fri, Jun 08, 2018 at 06:08:19PM -0600, Logan Gunthorpe <[email protected]> wrote:
> When remote files are counted in get_files_count, without using SSH,
> the code returns 0 because there is a colon prepended to $LOC. $VPATH
> should have been used instead of $LOC.
>

Good catch. Thanks for the patch. I discovered this problem myself a few days
before you sent this patchset. So was going to submit the fix, but you were
faster.

I also tested this script in the looped-back setup. It is the case when two
NTB-device ports are available at the same RootComplex. So the NTB can be
configured from the single executional context. In this case the REMOTE_HOST is left
empty, so the colon is left prepended to the corresponding paths and causes multiple
errors including the one fixed by this patch. In order to fix it, we need to discard
the colon for remote-less case, for instance, by the next patch:

@@ -482,7 +495,11 @@ function perf_test()
function ntb_tool_tests()
{
LOCAL_TOOL="$DEBUGFS/ntb_tool/$LOCAL_DEV"
- REMOTE_TOOL="$REMOTE_HOST:$DEBUGFS/ntb_tool/$REMOTE_DEV"
+ if [[ "${REMOTE_HOST}" != "" ]]; then
+ REMOTE_TOOL="$REMOTE_HOST:$DEBUGFS/ntb_tool/$REMOTE_DEV"
+ else
+ REMOTE_TOOL="$DEBUGFS/ntb_tool/$REMOTE_DEV"
+ fi

echo "Starting ntb_tool tests..."

And so on for REMOTE_PP and REMOTE_PERF. It is necessary for NTB devices, which ports
are looped-back to the same Root-Port. Would you be amenable if you resent this patch
together with the fix I suggested?

Regards,
-Sergey

> Fixes: 06bd0407d06c ("NTB: ntb_test: Update ntb_tool Scratchpad tests")
> Signed-off-by: Logan Gunthorpe <[email protected]>
> ---
> tools/testing/selftests/ntb/ntb_test.sh | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/tools/testing/selftests/ntb/ntb_test.sh b/tools/testing/selftests/ntb/ntb_test.sh
> index 08cbfbbc7029..17ca36403d04 100755
> --- a/tools/testing/selftests/ntb/ntb_test.sh
> +++ b/tools/testing/selftests/ntb/ntb_test.sh
> @@ -250,7 +250,7 @@ function get_files_count()
> split_remote $LOC
>
> if [[ "$REMOTE" == "" ]]; then
> - echo $(ls -1 "$LOC"/${NAME}* 2>/dev/null | wc -l)
> + echo $(ls -1 "$VPATH"/${NAME}* 2>/dev/null | wc -l)
> else
> echo $(ssh "$REMOTE" "ls -1 \"$VPATH\"/${NAME}* | \
> wc -l" 2> /dev/null)
> --
> 2.11.0
>

2018-06-15 19:57:18

by Logan Gunthorpe

[permalink] [raw]
Subject: Re: [PATCH 4/8] NTB: ntb_pingpong: Choose doorbells based on port number



On 15/06/18 01:49 PM, Serge Semin wrote:
> Regarding the patch. The idea of using the port number instead of linearised
> unique index should also work for Intel/AMD/IDT drivers. But the ports-space
> linearization algorithm was created for the case if the real port numbers
> would exceed the available Doorbell bits. I thought this might be the case of
> multi-ports version of the switchtec driver.

Well, the switchtec driver splits its 64 doorbells in two sets, one for
each port right now. That will likely have to change when we go to a
multi-port implementation. In that case we will have 64 doorbells and a
maximum 48 ports. So I don't think we have to concern ourselves with
more ports than doorbells.

> Needless to say, that if Switchtec driver had the ports-index API implementation,
> this patch wouldn't be needed.

As I've said, it's impossible to write for the crosslink topology so the
clients must support that case.

Logan

2018-06-15 20:01:26

by Logan Gunthorpe

[permalink] [raw]
Subject: Re: [PATCH 7/8] NTB: perf: Fix race condition when run with ntb_test



On 15/06/18 01:51 PM, Serge Semin wrote:
> On Fri, Jun 08, 2018 at 06:08:18PM -0600, Logan Gunthorpe <[email protected]> wrote:
>> When running ntb_test, the script tries to run the ntb_perf test
>> immediately after probing the modules. Since adding multi-port support,
>> this fails seeing the new initialization procedure in ntb_perf
>> can not complete instantly.
>>
>> To fix this we add a completion which is waited on when a test is
>> started. In this way, run can be written any time after the module is
>> loaded and it will wait for the initialization to complete instead of
>> sending an error.
>>
>
> Hmm, this behavior is the feature of the driver and isn't a bug or race to be
> fixed. ntb_perf driver returns -ENOLINK until the link is actually established,
> when the memory windows are properly initialized so the test can be performed.
> What do you think of leaving the algorithm as is, but instead to develop
> the polling scheme in the ntb_test.sh script and break the script execution if
> the link isn't established after sometime? At least we won't need to wait forever
> in case if the peer hanged up or crashed while the NTB link negotiation algorithm
> was in-progress.

I think polling is really ugly and doesn't really address solve the
issue of waiting forever. It's pretty easy to interrupt out of the wait
and provides a much better clue to whats going on than an error.

If we want to be more explicit, it would be pretty easy to start a timer
in the bash script and use SIGALRM to exit if the link doesn't come up
after 30 seconds or something.

Logan

2018-06-15 20:02:58

by Logan Gunthorpe

[permalink] [raw]
Subject: Re: [PATCH 8/8] NTB: ntb_test: Fix bug when counting remote files



On 15/06/18 01:51 PM, Serge Semin wrote:
> On Fri, Jun 08, 2018 at 06:08:19PM -0600, Logan Gunthorpe <[email protected]> wrote:
>> When remote files are counted in get_files_count, without using SSH,
>> the code returns 0 because there is a colon prepended to $LOC. $VPATH
>> should have been used instead of $LOC.
>>
>
> Good catch. Thanks for the patch. I discovered this problem myself a few days
> before you sent this patchset. So was going to submit the fix, but you were
> faster.

Thanks. It would be good to get Acked-bys or Reviewed-bys on the patches
you approved.

Logan

2018-06-15 20:03:23

by Logan Gunthorpe

[permalink] [raw]
Subject: Re: [PATCH 6/8] NTB: perf: Fix support for hardware that doesn't have port numbers



On 15/06/18 01:50 PM, Serge Semin wrote:
> On Fri, Jun 08, 2018 at 06:08:17PM -0600, Logan Gunthorpe <[email protected]> wrote:
>> Legacy drivers do not have port numbers (but is reliably only two ports)
>> and was broken by the recent commit that added mult-port support to
>> ntb_perf. This is especially important to support the cross link
>> topology which is perfectly symmetric and cannot assign unique port
>> numbers easily.
>>
>
> Please, see the comment to the patch 3/8. I explained everything there
> including the fact, that the Intel/AMD drivers do have unique port numbers
> assigned.

See my comments as to why this is impossible.

Logan

2018-07-20 16:35:03

by Logan Gunthorpe

[permalink] [raw]
Subject: Re: [PATCH 8/8] NTB: ntb_test: Fix bug when counting remote files



On 15/06/18 01:51 PM, Serge Semin wrote:
> I also tested this script in the looped-back setup. It is the case when two
> NTB-device ports are available at the same RootComplex. So the NTB can be
> configured from the single executional context. In this case the REMOTE_HOST is left
> empty, so the colon is left prepended to the corresponding paths and causes multiple
> errors including the one fixed by this patch. In order to fix it, we need to discard
> the colon for remote-less case, for instance, by the next patch:
>
> @@ -482,7 +495,11 @@ function perf_test()
> function ntb_tool_tests()
> {
> LOCAL_TOOL="$DEBUGFS/ntb_tool/$LOCAL_DEV"
> - REMOTE_TOOL="$REMOTE_HOST:$DEBUGFS/ntb_tool/$REMOTE_DEV"
> + if [[ "${REMOTE_HOST}" != "" ]]; then
> + REMOTE_TOOL="$REMOTE_HOST:$DEBUGFS/ntb_tool/$REMOTE_DEV"
> + else
> + REMOTE_TOOL="$DEBUGFS/ntb_tool/$REMOTE_DEV"
> + fi
>
> echo "Starting ntb_tool tests..."
>
> And so on for REMOTE_PP and REMOTE_PERF. It is necessary for NTB devices, which ports
> are looped-back to the same Root-Port. Would you be amenable if you resent this patch
> together with the fix I suggested?
I took a closer look at this and it's not necessary. (Note: I do the
majority of my testing in a looped-back setup).

What you didn't notice is that split_remote() separates the colon
whether there is a host or not. It's not passed to ssh or cat (or
whatever) directly. So the change you propose will actually break the
how it was designed.

Logan