2021-06-22 06:16:08

by Ira Weiny

[permalink] [raw]
Subject: [PATCH 4/4] RDMA/siw: Convert siw_tx_hdt() to kmap_local_page()

From: Ira Weiny <[email protected]>

kmap() is being deprecated and will break uses of device dax after PKS
protection is introduced.[1]

The use of kmap() in siw_tx_hdt() is all thread local therefore
kmap_local_page() is a sufficient replacement and will work with pgmap
protected pages when those are implemented.

kmap_local_page() mappings are tracked in a stack and must be unmapped
in the opposite order they were mapped in.

siw_tx_hdt() tracks pages used in a page_array. It uses that array to
unmap pages which were mapped on function exit. Not all entries in the
array are mapped and this is tracked in kmap_mask.

kunmap_local() takes a mapped address rather than a page. Declare a
mapped address array, page_array_addr, of the same size as the page
array to be used for unmapping.

Use kmap_local_page() instead of kmap() to map pages in the page_array.

Because segments are mapped into the page array in increasing index
order, modify siw_unmap_pages() to unmap pages in decreasing order.

The kmap_mask is no longer needed as the lack of an address in the
address array can indicate no unmap is required.

[1] https://lore.kernel.org/lkml/[email protected]/

Signed-off-by: Ira Weiny <[email protected]>
---
drivers/infiniband/sw/siw/siw_qp_tx.c | 35 +++++++++++++++------------
1 file changed, 20 insertions(+), 15 deletions(-)

diff --git a/drivers/infiniband/sw/siw/siw_qp_tx.c b/drivers/infiniband/sw/siw/siw_qp_tx.c
index db68a10d12cd..e70aba23f6e7 100644
--- a/drivers/infiniband/sw/siw/siw_qp_tx.c
+++ b/drivers/infiniband/sw/siw/siw_qp_tx.c
@@ -396,13 +396,17 @@ static int siw_0copy_tx(struct socket *s, struct page **page,

#define MAX_TRAILER (MPA_CRC_SIZE + 4)

-static void siw_unmap_pages(struct page **pp, unsigned long kmap_mask)
+static void siw_unmap_pages(void **addrs, int len)
{
- while (kmap_mask) {
- if (kmap_mask & BIT(0))
- kunmap(*pp);
- pp++;
- kmap_mask >>= 1;
+ int i;
+
+ /*
+ * Work backwards through the array to honor the kmap_local_page()
+ * ordering requirements.
+ */
+ for (i = (len-1); i >= 0; i--) {
+ if (addrs[i])
+ kunmap_local(addrs[i]);
}
}

@@ -427,13 +431,15 @@ static int siw_tx_hdt(struct siw_iwarp_tx *c_tx, struct socket *s)
struct siw_sge *sge = &wqe->sqe.sge[c_tx->sge_idx];
struct kvec iov[MAX_ARRAY];
struct page *page_array[MAX_ARRAY];
+ void *page_array_addr[MAX_ARRAY];
struct msghdr msg = { .msg_flags = MSG_DONTWAIT | MSG_EOR };

int seg = 0, do_crc = c_tx->do_crc, is_kva = 0, rv;
unsigned int data_len = c_tx->bytes_unsent, hdr_len = 0, trl_len = 0,
sge_off = c_tx->sge_off, sge_idx = c_tx->sge_idx,
pbl_idx = c_tx->pbl_idx;
- unsigned long kmap_mask = 0L;
+
+ memset(page_array_addr, 0, sizeof(page_array_addr));

if (c_tx->state == SIW_SEND_HDR) {
if (c_tx->use_sendpage) {
@@ -498,7 +504,7 @@ static int siw_tx_hdt(struct siw_iwarp_tx *c_tx, struct socket *s)
p = siw_get_upage(mem->umem,
sge->laddr + sge_off);
if (unlikely(!p)) {
- siw_unmap_pages(page_array, kmap_mask);
+ siw_unmap_pages(page_array_addr, MAX_ARRAY);
wqe->processed -= c_tx->bytes_unsent;
rv = -EFAULT;
goto done_crc;
@@ -506,11 +512,10 @@ static int siw_tx_hdt(struct siw_iwarp_tx *c_tx, struct socket *s)
page_array[seg] = p;

if (!c_tx->use_sendpage) {
- iov[seg].iov_base = kmap(p) + fp_off;
- iov[seg].iov_len = plen;
+ page_array_addr[seg] = kmap_local_page(page_array[seg]);

- /* Remember for later kunmap() */
- kmap_mask |= BIT(seg);
+ iov[seg].iov_base = page_array_addr[seg] + fp_off;
+ iov[seg].iov_len = plen;

if (do_crc)
crypto_shash_update(
@@ -518,7 +523,7 @@ static int siw_tx_hdt(struct siw_iwarp_tx *c_tx, struct socket *s)
iov[seg].iov_base,
plen);
} else if (do_crc) {
- kaddr = kmap_local_page(p);
+ kaddr = kmap_local_page(page_array[seg]);
crypto_shash_update(c_tx->mpa_crc_hd,
kaddr + fp_off,
plen);
@@ -542,7 +547,7 @@ static int siw_tx_hdt(struct siw_iwarp_tx *c_tx, struct socket *s)

if (++seg > (int)MAX_ARRAY) {
siw_dbg_qp(tx_qp(c_tx), "to many fragments\n");
- siw_unmap_pages(page_array, kmap_mask);
+ siw_unmap_pages(page_array_addr, MAX_ARRAY);
wqe->processed -= c_tx->bytes_unsent;
rv = -EMSGSIZE;
goto done_crc;
@@ -593,7 +598,7 @@ static int siw_tx_hdt(struct siw_iwarp_tx *c_tx, struct socket *s)
} else {
rv = kernel_sendmsg(s, &msg, iov, seg + 1,
hdr_len + data_len + trl_len);
- siw_unmap_pages(page_array, kmap_mask);
+ siw_unmap_pages(page_array_addr, MAX_ARRAY);
}
if (rv < (int)hdr_len) {
/* Not even complete hdr pushed or negative rv */
--
2.28.0.rc0.12.gb6a658bd00c9


2021-06-22 16:44:36

by Bernard Metzler

[permalink] [raw]
Subject: Re: [PATCH 4/4] RDMA/siw: Convert siw_tx_hdt() to kmap_local_page()

[email protected] wrote: -----

>To: "Jason Gunthorpe" <[email protected]>
>From: [email protected]
>Date: 06/22/2021 08:14AM
>Cc: "Ira Weiny" <[email protected]>, "Mike Marciniszyn"
><[email protected]>, "Dennis Dalessandro"
><[email protected]>, "Doug Ledford"
><[email protected]>, "Faisal Latif" <[email protected]>,
>"Shiraz Saleem" <[email protected]>, "Bernard Metzler"
><[email protected]>, "Kamal Heib" <[email protected]>,
>[email protected], [email protected]
>Subject: [EXTERNAL] [PATCH 4/4] RDMA/siw: Convert siw_tx_hdt() to
>kmap_local_page()
>
>From: Ira Weiny <[email protected]>
>
>kmap() is being deprecated and will break uses of device dax after
>PKS
>protection is introduced.[1]
>
>The use of kmap() in siw_tx_hdt() is all thread local therefore
>kmap_local_page() is a sufficient replacement and will work with
>pgmap
>protected pages when those are implemented.
>
>kmap_local_page() mappings are tracked in a stack and must be
>unmapped
>in the opposite order they were mapped in.
>
>siw_tx_hdt() tracks pages used in a page_array. It uses that array
>to
>unmap pages which were mapped on function exit. Not all entries in
>the
>array are mapped and this is tracked in kmap_mask.
>
>kunmap_local() takes a mapped address rather than a page. Declare a
>mapped address array, page_array_addr, of the same size as the page
>array to be used for unmapping.
>

Hi Ira, thanks for taking care of that!

I think we can avoid introducing another 'page_array_addr[]' array
here, which must be zeroed first and completely searched for
valid mappings during unmap, and also further bloats the
stack size of siw_tx_hdt(). I think we can go away with the
already available iov[].iov_base addresses array, masking addresses
with PAGE_MASK during unmapping to mask any first byte offset.
All kmap_local_page() mapping end up at that list. For unmapping
we can still rely on the kmap_mask bit field, which is more
efficient to initialize and search for valid mappings. Ordering
during unmapping can be guaranteed if we parse the bitmask
in reverse order. Let me know if you prefer me to propose
a change -- that siw_tx_hdt() thing became rather complex I
have to admit!

Best,
Bernard.

>Use kmap_local_page() instead of kmap() to map pages in the
>page_array.
>
>Because segments are mapped into the page array in increasing index
>order, modify siw_unmap_pages() to unmap pages in decreasing order.
>
>The kmap_mask is no longer needed as the lack of an address in the
>address array can indicate no unmap is required.
>
>[1]
>INVALID URI REMOVED
>lkml_20201009195033.3208459-2D59-2Dira.weiny-40intel.com_&d=DwIDAg&c=
>jf_iaSHvJObTbx-siA1ZOg&r=2TaYXQ0T-r8ZO1PP1alNwU_QJcRRLfmYTAgd3QCvqSc&
>m=wnRcc-qyXV_X7kyQfFYL6XPgmmakQxmo44BmjIon-w0&s=Y0aiKJ4EHZY8FJlI-uiPr
>xcBE95kmgn3iEz3p8d5VF4&e=
>
>Signed-off-by: Ira Weiny <[email protected]>
>---
> drivers/infiniband/sw/siw/siw_qp_tx.c | 35
>+++++++++++++++------------
> 1 file changed, 20 insertions(+), 15 deletions(-)
>
>diff --git a/drivers/infiniband/sw/siw/siw_qp_tx.c
>b/drivers/infiniband/sw/siw/siw_qp_tx.c
>index db68a10d12cd..e70aba23f6e7 100644
>--- a/drivers/infiniband/sw/siw/siw_qp_tx.c
>+++ b/drivers/infiniband/sw/siw/siw_qp_tx.c
>@@ -396,13 +396,17 @@ static int siw_0copy_tx(struct socket *s,
>struct page **page,
>
> #define MAX_TRAILER (MPA_CRC_SIZE + 4)
>
>-static void siw_unmap_pages(struct page **pp, unsigned long
>kmap_mask)
>+static void siw_unmap_pages(void **addrs, int len)
> {
>- while (kmap_mask) {
>- if (kmap_mask & BIT(0))
>- kunmap(*pp);
>- pp++;
>- kmap_mask >>= 1;
>+ int i;
>+
>+ /*
>+ * Work backwards through the array to honor the kmap_local_page()
>+ * ordering requirements.
>+ */
>+ for (i = (len-1); i >= 0; i--) {
>+ if (addrs[i])
>+ kunmap_local(addrs[i]);
> }
> }
>
>@@ -427,13 +431,15 @@ static int siw_tx_hdt(struct siw_iwarp_tx
>*c_tx, struct socket *s)
> struct siw_sge *sge = &wqe->sqe.sge[c_tx->sge_idx];
> struct kvec iov[MAX_ARRAY];
> struct page *page_array[MAX_ARRAY];
>+ void *page_array_addr[MAX_ARRAY];
> struct msghdr msg = { .msg_flags = MSG_DONTWAIT | MSG_EOR };
>
> int seg = 0, do_crc = c_tx->do_crc, is_kva = 0, rv;
> unsigned int data_len = c_tx->bytes_unsent, hdr_len = 0, trl_len =
>0,
> sge_off = c_tx->sge_off, sge_idx = c_tx->sge_idx,
> pbl_idx = c_tx->pbl_idx;
>- unsigned long kmap_mask = 0L;
>+
>+ memset(page_array_addr, 0, sizeof(page_array_addr));
>
> if (c_tx->state == SIW_SEND_HDR) {
> if (c_tx->use_sendpage) {
>@@ -498,7 +504,7 @@ static int siw_tx_hdt(struct siw_iwarp_tx *c_tx,
>struct socket *s)
> p = siw_get_upage(mem->umem,
> sge->laddr + sge_off);
> if (unlikely(!p)) {
>- siw_unmap_pages(page_array, kmap_mask);
>+ siw_unmap_pages(page_array_addr, MAX_ARRAY);
> wqe->processed -= c_tx->bytes_unsent;
> rv = -EFAULT;
> goto done_crc;
>@@ -506,11 +512,10 @@ static int siw_tx_hdt(struct siw_iwarp_tx
>*c_tx, struct socket *s)
> page_array[seg] = p;
>
> if (!c_tx->use_sendpage) {
>- iov[seg].iov_base = kmap(p) + fp_off;
>- iov[seg].iov_len = plen;
>+ page_array_addr[seg] = kmap_local_page(page_array[seg]);
>
>- /* Remember for later kunmap() */
>- kmap_mask |= BIT(seg);
>+ iov[seg].iov_base = page_array_addr[seg] + fp_off;
>+ iov[seg].iov_len = plen;
>
> if (do_crc)
> crypto_shash_update(
>@@ -518,7 +523,7 @@ static int siw_tx_hdt(struct siw_iwarp_tx *c_tx,
>struct socket *s)
> iov[seg].iov_base,
> plen);
> } else if (do_crc) {
>- kaddr = kmap_local_page(p);
>+ kaddr = kmap_local_page(page_array[seg]);
> crypto_shash_update(c_tx->mpa_crc_hd,
> kaddr + fp_off,
> plen);
>@@ -542,7 +547,7 @@ static int siw_tx_hdt(struct siw_iwarp_tx *c_tx,
>struct socket *s)
>
> if (++seg > (int)MAX_ARRAY) {
> siw_dbg_qp(tx_qp(c_tx), "to many fragments\n");
>- siw_unmap_pages(page_array, kmap_mask);
>+ siw_unmap_pages(page_array_addr, MAX_ARRAY);
> wqe->processed -= c_tx->bytes_unsent;
> rv = -EMSGSIZE;
> goto done_crc;
>@@ -593,7 +598,7 @@ static int siw_tx_hdt(struct siw_iwarp_tx *c_tx,
>struct socket *s)
> } else {
> rv = kernel_sendmsg(s, &msg, iov, seg + 1,
> hdr_len + data_len + trl_len);
>- siw_unmap_pages(page_array, kmap_mask);
>+ siw_unmap_pages(page_array_addr, MAX_ARRAY);
> }
> if (rv < (int)hdr_len) {
> /* Not even complete hdr pushed or negative rv */
>--
>2.28.0.rc0.12.gb6a658bd00c9
>
>

2021-06-22 20:38:18

by Ira Weiny

[permalink] [raw]
Subject: [PATCH V2] RDMA/siw: Convert siw_tx_hdt() to kmap_local_page()

From: Ira Weiny <[email protected]>

kmap() is being deprecated and will break uses of device dax after PKS
protection is introduced.[1]

The use of kmap() in siw_tx_hdt() is all thread local therefore
kmap_local_page() is a sufficient replacement and will work with pgmap
protected pages when those are implemented.

siw_tx_hdt() tracks pages used in a page_array. It uses that array to
unmap pages which were mapped on function exit. Not all entries in the
array are mapped and this is tracked in kmap_mask.

kunmap_local() takes a mapped address rather than a page. Alter
siw_unmap_pages() to take the iov array to reuse the iov_base address of
each mapping. Use PAGE_MASK to get the proper address for
kunmap_local().

kmap_local_page() mappings are tracked in a stack and must be unmapped
in the opposite order they were mapped in. Because segments are mapped
into the page array in increasing index order, modify siw_unmap_pages()
to unmap pages in decreasing order.

Use kmap_local_page() instead of kmap() to map pages in the page_array.

[1] https://lore.kernel.org/lkml/[email protected]/

Signed-off-by: Ira Weiny <[email protected]>

---
Changes for V2:
From Bernard
Reuse iov[].iov_base rather than declaring another array of
pointers and preserve the use of kmap_mask to know which iov's
were kmapped.

---
drivers/infiniband/sw/siw/siw_qp_tx.c | 32 +++++++++++++++++----------
1 file changed, 20 insertions(+), 12 deletions(-)

diff --git a/drivers/infiniband/sw/siw/siw_qp_tx.c b/drivers/infiniband/sw/siw/siw_qp_tx.c
index db68a10d12cd..fd3b9e6a67d7 100644
--- a/drivers/infiniband/sw/siw/siw_qp_tx.c
+++ b/drivers/infiniband/sw/siw/siw_qp_tx.c
@@ -396,13 +396,20 @@ static int siw_0copy_tx(struct socket *s, struct page **page,

#define MAX_TRAILER (MPA_CRC_SIZE + 4)

-static void siw_unmap_pages(struct page **pp, unsigned long kmap_mask)
+static void siw_unmap_pages(struct kvec *iov, unsigned long kmap_mask, int len)
{
- while (kmap_mask) {
- if (kmap_mask & BIT(0))
- kunmap(*pp);
- pp++;
- kmap_mask >>= 1;
+ int i;
+
+ /*
+ * Work backwards through the array to honor the kmap_local_page()
+ * ordering requirements.
+ */
+ for (i = (len-1); i >= 0; i--) {
+ if (kmap_mask & BIT(i)) {
+ unsigned long addr = (unsigned long)iov[i].iov_base;
+
+ kunmap_local((void *)(addr & PAGE_MASK));
+ }
}
}

@@ -498,7 +505,7 @@ static int siw_tx_hdt(struct siw_iwarp_tx *c_tx, struct socket *s)
p = siw_get_upage(mem->umem,
sge->laddr + sge_off);
if (unlikely(!p)) {
- siw_unmap_pages(page_array, kmap_mask);
+ siw_unmap_pages(iov, kmap_mask, MAX_ARRAY);
wqe->processed -= c_tx->bytes_unsent;
rv = -EFAULT;
goto done_crc;
@@ -506,11 +513,12 @@ static int siw_tx_hdt(struct siw_iwarp_tx *c_tx, struct socket *s)
page_array[seg] = p;

if (!c_tx->use_sendpage) {
- iov[seg].iov_base = kmap(p) + fp_off;
- iov[seg].iov_len = plen;
+ void *kaddr = kmap_local_page(page_array[seg]);

/* Remember for later kunmap() */
kmap_mask |= BIT(seg);
+ iov[seg].iov_base = kaddr + fp_off;
+ iov[seg].iov_len = plen;

if (do_crc)
crypto_shash_update(
@@ -518,7 +526,7 @@ static int siw_tx_hdt(struct siw_iwarp_tx *c_tx, struct socket *s)
iov[seg].iov_base,
plen);
} else if (do_crc) {
- kaddr = kmap_local_page(p);
+ kaddr = kmap_local_page(page_array[seg]);
crypto_shash_update(c_tx->mpa_crc_hd,
kaddr + fp_off,
plen);
@@ -542,7 +550,7 @@ static int siw_tx_hdt(struct siw_iwarp_tx *c_tx, struct socket *s)

if (++seg > (int)MAX_ARRAY) {
siw_dbg_qp(tx_qp(c_tx), "to many fragments\n");
- siw_unmap_pages(page_array, kmap_mask);
+ siw_unmap_pages(iov, kmap_mask, MAX_ARRAY);
wqe->processed -= c_tx->bytes_unsent;
rv = -EMSGSIZE;
goto done_crc;
@@ -593,7 +601,7 @@ static int siw_tx_hdt(struct siw_iwarp_tx *c_tx, struct socket *s)
} else {
rv = kernel_sendmsg(s, &msg, iov, seg + 1,
hdr_len + data_len + trl_len);
- siw_unmap_pages(page_array, kmap_mask);
+ siw_unmap_pages(iov, kmap_mask, MAX_ARRAY);
}
if (rv < (int)hdr_len) {
/* Not even complete hdr pushed or negative rv */
--
2.28.0.rc0.12.gb6a658bd00c9

2021-06-22 20:41:50

by Ira Weiny

[permalink] [raw]
Subject: Re: [PATCH 4/4] RDMA/siw: Convert siw_tx_hdt() to kmap_local_page()

On Tue, Jun 22, 2021 at 04:42:49PM +0000, Bernard Metzler wrote:
> [email protected] wrote: -----
>
> >To: "Jason Gunthorpe" <[email protected]>
> >From: [email protected]
> >Date: 06/22/2021 08:14AM
> >Cc: "Ira Weiny" <[email protected]>, "Mike Marciniszyn"
> ><[email protected]>, "Dennis Dalessandro"
> ><[email protected]>, "Doug Ledford"
> ><[email protected]>, "Faisal Latif" <[email protected]>,
> >"Shiraz Saleem" <[email protected]>, "Bernard Metzler"
> ><[email protected]>, "Kamal Heib" <[email protected]>,
> >[email protected], [email protected]
> >Subject: [EXTERNAL] [PATCH 4/4] RDMA/siw: Convert siw_tx_hdt() to
> >kmap_local_page()
> >
> >From: Ira Weiny <[email protected]>
> >
> >kmap() is being deprecated and will break uses of device dax after
> >PKS
> >protection is introduced.[1]
> >
> >The use of kmap() in siw_tx_hdt() is all thread local therefore
> >kmap_local_page() is a sufficient replacement and will work with
> >pgmap
> >protected pages when those are implemented.
> >
> >kmap_local_page() mappings are tracked in a stack and must be
> >unmapped
> >in the opposite order they were mapped in.
> >
> >siw_tx_hdt() tracks pages used in a page_array. It uses that array
> >to
> >unmap pages which were mapped on function exit. Not all entries in
> >the
> >array are mapped and this is tracked in kmap_mask.
> >
> >kunmap_local() takes a mapped address rather than a page. Declare a
> >mapped address array, page_array_addr, of the same size as the page
> >array to be used for unmapping.
> >
>
> Hi Ira, thanks for taking care of that!
>
> I think we can avoid introducing another 'page_array_addr[]' array
> here, which must be zeroed first and completely searched for
> valid mappings during unmap, and also further bloats the
> stack size of siw_tx_hdt(). I think we can go away with the
> already available iov[].iov_base addresses array, masking addresses
> with PAGE_MASK during unmapping to mask any first byte offset.
> All kmap_local_page() mapping end up at that list. For unmapping
> we can still rely on the kmap_mask bit field, which is more
> efficient to initialize and search for valid mappings. Ordering
> during unmapping can be guaranteed if we parse the bitmask
> in reverse order. Let me know if you prefer me to propose
> a change -- that siw_tx_hdt() thing became rather complex I
> have to admit!

Seems not too bad, V2 sent.

I was concerned with the additional stack size but only 28 pointers (If I did
my math right) did not seem too bad. It is redundant though so lets see if
I've gotten V2 right.

Thanks!
Ira

>
> Best,
> Bernard.
>
> >Use kmap_local_page() instead of kmap() to map pages in the
> >page_array.
> >
> >Because segments are mapped into the page array in increasing index
> >order, modify siw_unmap_pages() to unmap pages in decreasing order.
> >
> >The kmap_mask is no longer needed as the lack of an address in the
> >address array can indicate no unmap is required.
> >
> >[1]
> >INVALID URI REMOVED
> >lkml_20201009195033.3208459-2D59-2Dira.weiny-40intel.com_&d=DwIDAg&c=
> >jf_iaSHvJObTbx-siA1ZOg&r=2TaYXQ0T-r8ZO1PP1alNwU_QJcRRLfmYTAgd3QCvqSc&
> >m=wnRcc-qyXV_X7kyQfFYL6XPgmmakQxmo44BmjIon-w0&s=Y0aiKJ4EHZY8FJlI-uiPr
> >xcBE95kmgn3iEz3p8d5VF4&e=
> >
> >Signed-off-by: Ira Weiny <[email protected]>
> >---
> > drivers/infiniband/sw/siw/siw_qp_tx.c | 35
> >+++++++++++++++------------
> > 1 file changed, 20 insertions(+), 15 deletions(-)
> >
> >diff --git a/drivers/infiniband/sw/siw/siw_qp_tx.c
> >b/drivers/infiniband/sw/siw/siw_qp_tx.c
> >index db68a10d12cd..e70aba23f6e7 100644
> >--- a/drivers/infiniband/sw/siw/siw_qp_tx.c
> >+++ b/drivers/infiniband/sw/siw/siw_qp_tx.c
> >@@ -396,13 +396,17 @@ static int siw_0copy_tx(struct socket *s,
> >struct page **page,
> >
> > #define MAX_TRAILER (MPA_CRC_SIZE + 4)
> >
> >-static void siw_unmap_pages(struct page **pp, unsigned long
> >kmap_mask)
> >+static void siw_unmap_pages(void **addrs, int len)
> > {
> >- while (kmap_mask) {
> >- if (kmap_mask & BIT(0))
> >- kunmap(*pp);
> >- pp++;
> >- kmap_mask >>= 1;
> >+ int i;
> >+
> >+ /*
> >+ * Work backwards through the array to honor the kmap_local_page()
> >+ * ordering requirements.
> >+ */
> >+ for (i = (len-1); i >= 0; i--) {
> >+ if (addrs[i])
> >+ kunmap_local(addrs[i]);
> > }
> > }
> >
> >@@ -427,13 +431,15 @@ static int siw_tx_hdt(struct siw_iwarp_tx
> >*c_tx, struct socket *s)
> > struct siw_sge *sge = &wqe->sqe.sge[c_tx->sge_idx];
> > struct kvec iov[MAX_ARRAY];
> > struct page *page_array[MAX_ARRAY];
> >+ void *page_array_addr[MAX_ARRAY];
> > struct msghdr msg = { .msg_flags = MSG_DONTWAIT | MSG_EOR };
> >
> > int seg = 0, do_crc = c_tx->do_crc, is_kva = 0, rv;
> > unsigned int data_len = c_tx->bytes_unsent, hdr_len = 0, trl_len =
> >0,
> > sge_off = c_tx->sge_off, sge_idx = c_tx->sge_idx,
> > pbl_idx = c_tx->pbl_idx;
> >- unsigned long kmap_mask = 0L;
> >+
> >+ memset(page_array_addr, 0, sizeof(page_array_addr));
> >
> > if (c_tx->state == SIW_SEND_HDR) {
> > if (c_tx->use_sendpage) {
> >@@ -498,7 +504,7 @@ static int siw_tx_hdt(struct siw_iwarp_tx *c_tx,
> >struct socket *s)
> > p = siw_get_upage(mem->umem,
> > sge->laddr + sge_off);
> > if (unlikely(!p)) {
> >- siw_unmap_pages(page_array, kmap_mask);
> >+ siw_unmap_pages(page_array_addr, MAX_ARRAY);
> > wqe->processed -= c_tx->bytes_unsent;
> > rv = -EFAULT;
> > goto done_crc;
> >@@ -506,11 +512,10 @@ static int siw_tx_hdt(struct siw_iwarp_tx
> >*c_tx, struct socket *s)
> > page_array[seg] = p;
> >
> > if (!c_tx->use_sendpage) {
> >- iov[seg].iov_base = kmap(p) + fp_off;
> >- iov[seg].iov_len = plen;
> >+ page_array_addr[seg] = kmap_local_page(page_array[seg]);
> >
> >- /* Remember for later kunmap() */
> >- kmap_mask |= BIT(seg);
> >+ iov[seg].iov_base = page_array_addr[seg] + fp_off;
> >+ iov[seg].iov_len = plen;
> >
> > if (do_crc)
> > crypto_shash_update(
> >@@ -518,7 +523,7 @@ static int siw_tx_hdt(struct siw_iwarp_tx *c_tx,
> >struct socket *s)
> > iov[seg].iov_base,
> > plen);
> > } else if (do_crc) {
> >- kaddr = kmap_local_page(p);
> >+ kaddr = kmap_local_page(page_array[seg]);
> > crypto_shash_update(c_tx->mpa_crc_hd,
> > kaddr + fp_off,
> > plen);
> >@@ -542,7 +547,7 @@ static int siw_tx_hdt(struct siw_iwarp_tx *c_tx,
> >struct socket *s)
> >
> > if (++seg > (int)MAX_ARRAY) {
> > siw_dbg_qp(tx_qp(c_tx), "to many fragments\n");
> >- siw_unmap_pages(page_array, kmap_mask);
> >+ siw_unmap_pages(page_array_addr, MAX_ARRAY);
> > wqe->processed -= c_tx->bytes_unsent;
> > rv = -EMSGSIZE;
> > goto done_crc;
> >@@ -593,7 +598,7 @@ static int siw_tx_hdt(struct siw_iwarp_tx *c_tx,
> >struct socket *s)
> > } else {
> > rv = kernel_sendmsg(s, &msg, iov, seg + 1,
> > hdr_len + data_len + trl_len);
> >- siw_unmap_pages(page_array, kmap_mask);
> >+ siw_unmap_pages(page_array_addr, MAX_ARRAY);
> > }
> > if (rv < (int)hdr_len) {
> > /* Not even complete hdr pushed or negative rv */
> >--
> >2.28.0.rc0.12.gb6a658bd00c9
> >
> >
>

2021-06-23 14:38:12

by Bernard Metzler

[permalink] [raw]
Subject: Re: [PATCH V2] RDMA/siw: Convert siw_tx_hdt() to kmap_local_page()

[email protected] wrote: -----

>To: "Jason Gunthorpe" <[email protected]>
>From: [email protected]
>Date: 06/22/2021 10:35PM
>Cc: "Ira Weiny" <[email protected]>, "Mike Marciniszyn"
><[email protected]>, "Dennis Dalessandro"
><[email protected]>, "Doug Ledford"
><[email protected]>, "Faisal Latif" <[email protected]>,
>"Shiraz Saleem" <[email protected]>, "Bernard Metzler"
><[email protected]>, "Kamal Heib" <[email protected]>,
>[email protected], [email protected]
>Subject: [EXTERNAL] [PATCH V2] RDMA/siw: Convert siw_tx_hdt() to
>kmap_local_page()
>
>From: Ira Weiny <[email protected]>
>
>kmap() is being deprecated and will break uses of device dax after
>PKS
>protection is introduced.[1]
>
>The use of kmap() in siw_tx_hdt() is all thread local therefore
>kmap_local_page() is a sufficient replacement and will work with
>pgmap
>protected pages when those are implemented.
>
>siw_tx_hdt() tracks pages used in a page_array. It uses that array
>to
>unmap pages which were mapped on function exit. Not all entries in
>the
>array are mapped and this is tracked in kmap_mask.
>
>kunmap_local() takes a mapped address rather than a page. Alter
>siw_unmap_pages() to take the iov array to reuse the iov_base address
>of
>each mapping. Use PAGE_MASK to get the proper address for
>kunmap_local().
>
>kmap_local_page() mappings are tracked in a stack and must be
>unmapped
>in the opposite order they were mapped in. Because segments are
>mapped
>into the page array in increasing index order, modify
>siw_unmap_pages()
>to unmap pages in decreasing order.
>
>Use kmap_local_page() instead of kmap() to map pages in the
>page_array.
>
>[1]
>INVALID URI REMOVED
>lkml_20201009195033.3208459-2D59-2Dira.weiny-40intel.com_&d=DwIDAg&c=
>jf_iaSHvJObTbx-siA1ZOg&r=2TaYXQ0T-r8ZO1PP1alNwU_QJcRRLfmYTAgd3QCvqSc&
>m=ujJBVqPLdVdVxXvOu_PlFL3NVC0Znds3FgxyrtWJtwM&s=WZIBAdwlCqPIRjsNOGlly
>gQ6Hsug6ObgrWgO_nvBGyc&e=
>
>Signed-off-by: Ira Weiny <[email protected]>
>
>---
>Changes for V2:
> From Bernard
> Reuse iov[].iov_base rather than declaring another array of
> pointers and preserve the use of kmap_mask to know which iov's
> were kmapped.
>
>---
> drivers/infiniband/sw/siw/siw_qp_tx.c | 32
>+++++++++++++++++----------
> 1 file changed, 20 insertions(+), 12 deletions(-)
>
>diff --git a/drivers/infiniband/sw/siw/siw_qp_tx.c
>b/drivers/infiniband/sw/siw/siw_qp_tx.c
>index db68a10d12cd..fd3b9e6a67d7 100644
>--- a/drivers/infiniband/sw/siw/siw_qp_tx.c
>+++ b/drivers/infiniband/sw/siw/siw_qp_tx.c
>@@ -396,13 +396,20 @@ static int siw_0copy_tx(struct socket *s,
>struct page **page,
>
> #define MAX_TRAILER (MPA_CRC_SIZE + 4)
>
>-static void siw_unmap_pages(struct page **pp, unsigned long
>kmap_mask)
>+static void siw_unmap_pages(struct kvec *iov, unsigned long
>kmap_mask, int len)
> {
>- while (kmap_mask) {
>- if (kmap_mask & BIT(0))
>- kunmap(*pp);
>- pp++;
>- kmap_mask >>= 1;
>+ int i;
>+
>+ /*
>+ * Work backwards through the array to honor the kmap_local_page()
>+ * ordering requirements.
>+ */
>+ for (i = (len-1); i >= 0; i--) {
>+ if (kmap_mask & BIT(i)) {
>+ unsigned long addr = (unsigned long)iov[i].iov_base;
>+
>+ kunmap_local((void *)(addr & PAGE_MASK));
>+ }
> }
> }
>
>@@ -498,7 +505,7 @@ static int siw_tx_hdt(struct siw_iwarp_tx *c_tx,
>struct socket *s)
> p = siw_get_upage(mem->umem,
> sge->laddr + sge_off);
> if (unlikely(!p)) {
>- siw_unmap_pages(page_array, kmap_mask);
>+ siw_unmap_pages(iov, kmap_mask, MAX_ARRAY);
> wqe->processed -= c_tx->bytes_unsent;
> rv = -EFAULT;
> goto done_crc;
>@@ -506,11 +513,12 @@ static int siw_tx_hdt(struct siw_iwarp_tx
>*c_tx, struct socket *s)
> page_array[seg] = p;
>
> if (!c_tx->use_sendpage) {
>- iov[seg].iov_base = kmap(p) + fp_off;
>- iov[seg].iov_len = plen;
>+ void *kaddr = kmap_local_page(page_array[seg]);

we can use 'kmap_local_page(p)' here
>
> /* Remember for later kunmap() */
> kmap_mask |= BIT(seg);
>+ iov[seg].iov_base = kaddr + fp_off;
>+ iov[seg].iov_len = plen;
>
> if (do_crc)
> crypto_shash_update(
>@@ -518,7 +526,7 @@ static int siw_tx_hdt(struct siw_iwarp_tx *c_tx,
>struct socket *s)
> iov[seg].iov_base,
> plen);

This patch does not apply for me. Would I have to install first
your [Patch 3/4] -- since the current patch references kmap_local_page()
already? Maybe it is better to apply if it would be just one siw
related patch in that series?



> } else if (do_crc) {
>- kaddr = kmap_local_page(p);
>+ kaddr = kmap_local_page(page_array[seg]);

using 'kmap_local_page(p)' as you had it is straightforward
and I would prefer it.

> crypto_shash_update(c_tx->mpa_crc_hd,
> kaddr + fp_off,
> plen);
>@@ -542,7 +550,7 @@ static int siw_tx_hdt(struct siw_iwarp_tx *c_tx,
>struct socket *s)
>
> if (++seg > (int)MAX_ARRAY) {
> siw_dbg_qp(tx_qp(c_tx), "to many fragments\n");
>- siw_unmap_pages(page_array, kmap_mask);
>+ siw_unmap_pages(iov, kmap_mask, MAX_ARRAY);

to minimize the iterations over the byte array in 'siw_unmap_pages()',
we may pass seg-1 instead of MAX_ARRAY


> wqe->processed -= c_tx->bytes_unsent;
> rv = -EMSGSIZE;
> goto done_crc;
>@@ -593,7 +601,7 @@ static int siw_tx_hdt(struct siw_iwarp_tx *c_tx,
>struct socket *s)
> } else {
> rv = kernel_sendmsg(s, &msg, iov, seg + 1,
> hdr_len + data_len + trl_len);
>- siw_unmap_pages(page_array, kmap_mask);
>+ siw_unmap_pages(iov, kmap_mask, MAX_ARRAY);

to minimize the iterations over the byte array in 'siw_unmap_pages()',
we may pass seg instead of MAX_ARRAY

> }
> if (rv < (int)hdr_len) {
> /* Not even complete hdr pushed or negative rv */
>--
>2.28.0.rc0.12.gb6a658bd00c9
>
>

2021-06-23 15:36:56

by Ira Weiny

[permalink] [raw]
Subject: Re: [PATCH V2] RDMA/siw: Convert siw_tx_hdt() to kmap_local_page()

On Wed, Jun 23, 2021 at 02:36:45PM +0000, Bernard Metzler wrote:
> [email protected] wrote: -----
>
> >@@ -506,11 +513,12 @@ static int siw_tx_hdt(struct siw_iwarp_tx
> >*c_tx, struct socket *s)
> > page_array[seg] = p;
> >
> > if (!c_tx->use_sendpage) {
> >- iov[seg].iov_base = kmap(p) + fp_off;
> >- iov[seg].iov_len = plen;
> >+ void *kaddr = kmap_local_page(page_array[seg]);
>
> we can use 'kmap_local_page(p)' here

Yes but I actually did this on purpose as it makes the code read clearly that
the mapping is 'seg' element of the array. Do you prefer 'p' because this is a
performant path?

> >
> > /* Remember for later kunmap() */
> > kmap_mask |= BIT(seg);
> >+ iov[seg].iov_base = kaddr + fp_off;
> >+ iov[seg].iov_len = plen;
> >
> > if (do_crc)
> > crypto_shash_update(
> >@@ -518,7 +526,7 @@ static int siw_tx_hdt(struct siw_iwarp_tx *c_tx,
> >struct socket *s)
> > iov[seg].iov_base,
> > plen);
>
> This patch does not apply for me. Would I have to install first
> your [Patch 3/4] -- since the current patch references kmap_local_page()
> already? Maybe it is better to apply if it would be just one siw
> related patch in that series?

Yes the other patch goes first. I split it out to make this more difficult
change more reviewable. I could squash them as it is probably straight forward
enough but I've been careful with this in other subsystems.

Jason, do you have any issue with squashing the 2 patches?

>
>
>
> > } else if (do_crc) {
> >- kaddr = kmap_local_page(p);
> >+ kaddr = kmap_local_page(page_array[seg]);
>
> using 'kmap_local_page(p)' as you had it is straightforward
> and I would prefer it.

OK. I think this reads cleaner but I can see 'p' being more performant.

>
> > crypto_shash_update(c_tx->mpa_crc_hd,
> > kaddr + fp_off,
> > plen);
> >@@ -542,7 +550,7 @@ static int siw_tx_hdt(struct siw_iwarp_tx *c_tx,
> >struct socket *s)
> >
> > if (++seg > (int)MAX_ARRAY) {
> > siw_dbg_qp(tx_qp(c_tx), "to many fragments\n");
> >- siw_unmap_pages(page_array, kmap_mask);
> >+ siw_unmap_pages(iov, kmap_mask, MAX_ARRAY);
>
> to minimize the iterations over the byte array in 'siw_unmap_pages()',
> we may pass seg-1 instead of MAX_ARRAY

Sounds good.

>
>
> > wqe->processed -= c_tx->bytes_unsent;
> > rv = -EMSGSIZE;
> > goto done_crc;
> >@@ -593,7 +601,7 @@ static int siw_tx_hdt(struct siw_iwarp_tx *c_tx,
> >struct socket *s)
> > } else {
> > rv = kernel_sendmsg(s, &msg, iov, seg + 1,
> > hdr_len + data_len + trl_len);
> >- siw_unmap_pages(page_array, kmap_mask);
> >+ siw_unmap_pages(iov, kmap_mask, MAX_ARRAY);
>
> to minimize the iterations over the byte array in 'siw_unmap_pages()',
> we may pass seg instead of MAX_ARRAY

Will do.

Thanks for the review! :-D
Ira

2021-06-23 22:20:11

by Ira Weiny

[permalink] [raw]
Subject: [PATCH V3] RDMA/siw: Convert siw_tx_hdt() to kmap_local_page()

From: Ira Weiny <[email protected]>

kmap() is being deprecated and will break uses of device dax after PKS
protection is introduced.[1]

The use of kmap() in siw_tx_hdt() is all thread local therefore
kmap_local_page() is a sufficient replacement and will work with pgmap
protected pages when those are implemented.

siw_tx_hdt() tracks pages used in a page_array. It uses that array to
unmap pages which were mapped on function exit. Not all entries in the
array are mapped and this is tracked in kmap_mask.

kunmap_local() takes a mapped address rather than a page. Alter
siw_unmap_pages() to take the iov array to reuse the iov_base address of
each mapping. Use PAGE_MASK to get the proper address for
kunmap_local().

kmap_local_page() mappings are tracked in a stack and must be unmapped
in the opposite order they were mapped in. Because segments are mapped
into the page array in increasing index order, modify siw_unmap_pages()
to unmap pages in decreasing order.

Use kmap_local_page() instead of kmap() to map pages in the page_array.

[1] https://lore.kernel.org/lkml/[email protected]/

Signed-off-by: Ira Weiny <[email protected]>

---
Jason, I went ahead and left this a separate patch. Let me know if you really
want this and the other siw squashed.

Changes for V3:
From Bernard
Use 'p' in kmap_local_page()
Use seg as length to siw_unmap_pages()

Changes for V2:
From Bernard
Reuse iov[].iov_base rather than declaring another array
of pointers and preserve the use of kmap_mask to know
which iov's were kmapped.
---
drivers/infiniband/sw/siw/siw_qp_tx.c | 30 +++++++++++++++++----------
1 file changed, 19 insertions(+), 11 deletions(-)

diff --git a/drivers/infiniband/sw/siw/siw_qp_tx.c b/drivers/infiniband/sw/siw/siw_qp_tx.c
index db68a10d12cd..89a5b75f7254 100644
--- a/drivers/infiniband/sw/siw/siw_qp_tx.c
+++ b/drivers/infiniband/sw/siw/siw_qp_tx.c
@@ -396,13 +396,20 @@ static int siw_0copy_tx(struct socket *s, struct page **page,

#define MAX_TRAILER (MPA_CRC_SIZE + 4)

-static void siw_unmap_pages(struct page **pp, unsigned long kmap_mask)
+static void siw_unmap_pages(struct kvec *iov, unsigned long kmap_mask, int len)
{
- while (kmap_mask) {
- if (kmap_mask & BIT(0))
- kunmap(*pp);
- pp++;
- kmap_mask >>= 1;
+ int i;
+
+ /*
+ * Work backwards through the array to honor the kmap_local_page()
+ * ordering requirements.
+ */
+ for (i = (len-1); i >= 0; i--) {
+ if (kmap_mask & BIT(i)) {
+ unsigned long addr = (unsigned long)iov[i].iov_base;
+
+ kunmap_local((void *)(addr & PAGE_MASK));
+ }
}
}

@@ -498,7 +505,7 @@ static int siw_tx_hdt(struct siw_iwarp_tx *c_tx, struct socket *s)
p = siw_get_upage(mem->umem,
sge->laddr + sge_off);
if (unlikely(!p)) {
- siw_unmap_pages(page_array, kmap_mask);
+ siw_unmap_pages(iov, kmap_mask, seg);
wqe->processed -= c_tx->bytes_unsent;
rv = -EFAULT;
goto done_crc;
@@ -506,11 +513,12 @@ static int siw_tx_hdt(struct siw_iwarp_tx *c_tx, struct socket *s)
page_array[seg] = p;

if (!c_tx->use_sendpage) {
- iov[seg].iov_base = kmap(p) + fp_off;
- iov[seg].iov_len = plen;
+ void *kaddr = kmap_local_page(p);

/* Remember for later kunmap() */
kmap_mask |= BIT(seg);
+ iov[seg].iov_base = kaddr + fp_off;
+ iov[seg].iov_len = plen;

if (do_crc)
crypto_shash_update(
@@ -542,7 +550,7 @@ static int siw_tx_hdt(struct siw_iwarp_tx *c_tx, struct socket *s)

if (++seg > (int)MAX_ARRAY) {
siw_dbg_qp(tx_qp(c_tx), "to many fragments\n");
- siw_unmap_pages(page_array, kmap_mask);
+ siw_unmap_pages(iov, kmap_mask, seg-1);
wqe->processed -= c_tx->bytes_unsent;
rv = -EMSGSIZE;
goto done_crc;
@@ -593,7 +601,7 @@ static int siw_tx_hdt(struct siw_iwarp_tx *c_tx, struct socket *s)
} else {
rv = kernel_sendmsg(s, &msg, iov, seg + 1,
hdr_len + data_len + trl_len);
- siw_unmap_pages(page_array, kmap_mask);
+ siw_unmap_pages(iov, kmap_mask, seg+1);
}
if (rv < (int)hdr_len) {
/* Not even complete hdr pushed or negative rv */
--
2.28.0.rc0.12.gb6a658bd00c9

2021-06-24 15:47:48

by Bernard Metzler

[permalink] [raw]
Subject: Re: [PATCH V3] RDMA/siw: Convert siw_tx_hdt() to kmap_local_page()


[email protected] wrote: -----

>To: "Jason Gunthorpe" <[email protected]>
>From: [email protected]
>Date: 06/24/2021 12:16AM
>Cc: "Ira Weiny" <[email protected]>, "Mike Marciniszyn"
><[email protected]>, "Dennis Dalessandro"
><[email protected]>, "Doug Ledford"
><[email protected]>, "Faisal Latif" <[email protected]>,
>"Shiraz Saleem" <[email protected]>, "Bernard Metzler"
><[email protected]>, "Kamal Heib" <[email protected]>,
>[email protected], [email protected]
>Subject: [EXTERNAL] [PATCH V3] RDMA/siw: Convert siw_tx_hdt() to
>kmap_local_page()
>
>From: Ira Weiny <[email protected]>
>
>kmap() is being deprecated and will break uses of device dax after
>PKS
>protection is introduced.[1]
>
>The use of kmap() in siw_tx_hdt() is all thread local therefore
>kmap_local_page() is a sufficient replacement and will work with
>pgmap
>protected pages when those are implemented.
>
>siw_tx_hdt() tracks pages used in a page_array. It uses that array
>to
>unmap pages which were mapped on function exit. Not all entries in
>the
>array are mapped and this is tracked in kmap_mask.
>
>kunmap_local() takes a mapped address rather than a page. Alter
>siw_unmap_pages() to take the iov array to reuse the iov_base address
>of
>each mapping. Use PAGE_MASK to get the proper address for
>kunmap_local().
>
>kmap_local_page() mappings are tracked in a stack and must be
>unmapped
>in the opposite order they were mapped in. Because segments are
>mapped
>into the page array in increasing index order, modify
>siw_unmap_pages()
>to unmap pages in decreasing order.
>
>Use kmap_local_page() instead of kmap() to map pages in the
>page_array.
>
>[1]
>INVALID URI REMOVED
>lkml_20201009195033.3208459-2D59-2Dira.weiny-40intel.com_&d=DwIDAg&c=
>jf_iaSHvJObTbx-siA1ZOg&r=2TaYXQ0T-r8ZO1PP1alNwU_QJcRRLfmYTAgd3QCvqSc&
>m=eI4Db7iSlEKRl4l5pYKwY5rL5WXWWxahhxNciwy2lRA&s=vo11VhOvYbAkABdhV6htX
>TmXgFZeWbBZAFnPDvg7Bzs&e=
>
>Signed-off-by: Ira Weiny <[email protected]>
>
>---
>Jason, I went ahead and left this a separate patch. Let me know if
>you really
>want this and the other siw squashed.
>
>Changes for V3:
> From Bernard
> Use 'p' in kmap_local_page()
> Use seg as length to siw_unmap_pages()
>
>Changes for V2:
> From Bernard
> Reuse iov[].iov_base rather than declaring another array
> of pointers and preserve the use of kmap_mask to know
> which iov's were kmapped.
>---
> drivers/infiniband/sw/siw/siw_qp_tx.c | 30
>+++++++++++++++++----------
> 1 file changed, 19 insertions(+), 11 deletions(-)
>
>diff --git a/drivers/infiniband/sw/siw/siw_qp_tx.c
>b/drivers/infiniband/sw/siw/siw_qp_tx.c
>index db68a10d12cd..89a5b75f7254 100644
>--- a/drivers/infiniband/sw/siw/siw_qp_tx.c
>+++ b/drivers/infiniband/sw/siw/siw_qp_tx.c
>@@ -396,13 +396,20 @@ static int siw_0copy_tx(struct socket *s,
>struct page **page,
>
> #define MAX_TRAILER (MPA_CRC_SIZE + 4)
>
>-static void siw_unmap_pages(struct page **pp, unsigned long
>kmap_mask)
>+static void siw_unmap_pages(struct kvec *iov, unsigned long
>kmap_mask, int len)
> {
>- while (kmap_mask) {
>- if (kmap_mask & BIT(0))
>- kunmap(*pp);
>- pp++;
>- kmap_mask >>= 1;
>+ int i;
>+
>+ /*
>+ * Work backwards through the array to honor the kmap_local_page()
>+ * ordering requirements.
>+ */
>+ for (i = (len-1); i >= 0; i--) {
>+ if (kmap_mask & BIT(i)) {
>+ unsigned long addr = (unsigned long)iov[i].iov_base;
>+
>+ kunmap_local((void *)(addr & PAGE_MASK));
>+ }
> }
> }
>
>@@ -498,7 +505,7 @@ static int siw_tx_hdt(struct siw_iwarp_tx *c_tx,
>struct socket *s)
> p = siw_get_upage(mem->umem,
> sge->laddr + sge_off);
> if (unlikely(!p)) {
>- siw_unmap_pages(page_array, kmap_mask);
>+ siw_unmap_pages(iov, kmap_mask, seg);
> wqe->processed -= c_tx->bytes_unsent;
> rv = -EFAULT;
> goto done_crc;
>@@ -506,11 +513,12 @@ static int siw_tx_hdt(struct siw_iwarp_tx
>*c_tx, struct socket *s)
> page_array[seg] = p;
>
> if (!c_tx->use_sendpage) {
>- iov[seg].iov_base = kmap(p) + fp_off;
>- iov[seg].iov_len = plen;
>+ void *kaddr = kmap_local_page(p);
>
> /* Remember for later kunmap() */
> kmap_mask |= BIT(seg);
>+ iov[seg].iov_base = kaddr + fp_off;
>+ iov[seg].iov_len = plen;
>
> if (do_crc)
> crypto_shash_update(
>@@ -542,7 +550,7 @@ static int siw_tx_hdt(struct siw_iwarp_tx *c_tx,
>struct socket *s)
>
> if (++seg > (int)MAX_ARRAY) {
> siw_dbg_qp(tx_qp(c_tx), "to many fragments\n");
>- siw_unmap_pages(page_array, kmap_mask);
>+ siw_unmap_pages(iov, kmap_mask, seg-1);
> wqe->processed -= c_tx->bytes_unsent;
> rv = -EMSGSIZE;
> goto done_crc;
>@@ -593,7 +601,7 @@ static int siw_tx_hdt(struct siw_iwarp_tx *c_tx,
>struct socket *s)
> } else {
> rv = kernel_sendmsg(s, &msg, iov, seg + 1,
> hdr_len + data_len + trl_len);
>- siw_unmap_pages(page_array, kmap_mask);
>+ siw_unmap_pages(iov, kmap_mask, seg+1);

seg+1 is one to many, since the last segment references the iWarp
trailer (CRC). There are 2 reason for this multi-segment processing
in the transmit path. (1) efficiency and (2) MTU based packet framing.
The iov contains the complete iWarp frame with header, (potentially
multiple) data fragments, and the CRC. It gets pushed to TCP in one
go, praying for iWarp framing stays intact (which most time works).
So the code can collect data form multiple SGE's of a WRITE or
SEND and tries putting those into one frame, if MTU allows, and
adds header and trailer.
The last segment (seg + 1) references the CRC, which is never kmap'ed.

I'll try the code next days, but it looks good otherwise!

Thanks very much!
> }
> if (rv < (int)hdr_len) {
> /* Not even complete hdr pushed or negative rv */
>--
>2.28.0.rc0.12.gb6a658bd00c9
>
>

2021-06-24 17:35:08

by Ira Weiny

[permalink] [raw]
Subject: Re: [PATCH V3] RDMA/siw: Convert siw_tx_hdt() to kmap_local_page()

On Thu, Jun 24, 2021 at 03:45:55PM +0000, Bernard Metzler wrote:
>
> >@@ -593,7 +601,7 @@ static int siw_tx_hdt(struct siw_iwarp_tx *c_tx,
> >struct socket *s)
> > } else {
> > rv = kernel_sendmsg(s, &msg, iov, seg + 1,
> > hdr_len + data_len + trl_len);
> >- siw_unmap_pages(page_array, kmap_mask);
> >+ siw_unmap_pages(iov, kmap_mask, seg+1);
>
> seg+1 is one to many, since the last segment references the iWarp
> trailer (CRC). There are 2 reason for this multi-segment processing
> in the transmit path. (1) efficiency and (2) MTU based packet framing.
> The iov contains the complete iWarp frame with header, (potentially
> multiple) data fragments, and the CRC. It gets pushed to TCP in one
> go, praying for iWarp framing stays intact (which most time works).
> So the code can collect data form multiple SGE's of a WRITE or
> SEND and tries putting those into one frame, if MTU allows, and
> adds header and trailer.
>
> The last segment (seg + 1) references the CRC, which is never kmap'ed.

siw_unmap_pages() take a length and seg is the index...

But ok so a further optimization...

Fair enough.

>
> I'll try the code next days, but it looks good otherwise!

I believe this will work though.

Ira

> Thanks very much!
> > }
> > if (rv < (int)hdr_len) {
> > /* Not even complete hdr pushed or negative rv */
> >--
> >2.28.0.rc0.12.gb6a658bd00c9
> >
> >

2021-06-24 17:50:15

by Ira Weiny

[permalink] [raw]
Subject: [PATCH V4] RDMA/siw: Convert siw_tx_hdt() to kmap_local_page()

From: Ira Weiny <[email protected]>

kmap() is being deprecated and will break uses of device dax after PKS
protection is introduced.[1]

The use of kmap() in siw_tx_hdt() is all thread local therefore
kmap_local_page() is a sufficient replacement and will work with pgmap
protected pages when those are implemented.

siw_tx_hdt() tracks pages used in a page_array. It uses that array to
unmap pages which were mapped on function exit. Not all entries in the
array are mapped and this is tracked in kmap_mask.

kunmap_local() takes a mapped address rather than a page. Alter
siw_unmap_pages() to take the iov array to reuse the iov_base address of
each mapping. Use PAGE_MASK to get the proper address for
kunmap_local().

kmap_local_page() mappings are tracked in a stack and must be unmapped
in the opposite order they were mapped in. Because segments are mapped
into the page array in increasing index order, modify siw_unmap_pages()
to unmap pages in decreasing order.

Use kmap_local_page() instead of kmap() to map pages in the page_array.

[1] https://lore.kernel.org/lkml/[email protected]/

Signed-off-by: Ira Weiny <[email protected]>

---
Changes for V4:
From Bernard
Further optimize siw_unmap_pages() by eliminating the
CRC page from the iov array.

Changes for V3:
From Bernard
Use 'p' in kmap_local_page()
Use seg as length to siw_unmap_pages()

Changes for V2:
From Bernard
Reuse iov[].iov_base rather than declaring another array
of pointers and preserve the use of kmap_mask to know
which iov's were kmapped.
---
drivers/infiniband/sw/siw/siw_qp_tx.c | 30 +++++++++++++++++----------
1 file changed, 19 insertions(+), 11 deletions(-)

diff --git a/drivers/infiniband/sw/siw/siw_qp_tx.c b/drivers/infiniband/sw/siw/siw_qp_tx.c
index db68a10d12cd..1f4e60257700 100644
--- a/drivers/infiniband/sw/siw/siw_qp_tx.c
+++ b/drivers/infiniband/sw/siw/siw_qp_tx.c
@@ -396,13 +396,20 @@ static int siw_0copy_tx(struct socket *s, struct page **page,

#define MAX_TRAILER (MPA_CRC_SIZE + 4)

-static void siw_unmap_pages(struct page **pp, unsigned long kmap_mask)
+static void siw_unmap_pages(struct kvec *iov, unsigned long kmap_mask, int len)
{
- while (kmap_mask) {
- if (kmap_mask & BIT(0))
- kunmap(*pp);
- pp++;
- kmap_mask >>= 1;
+ int i;
+
+ /*
+ * Work backwards through the array to honor the kmap_local_page()
+ * ordering requirements.
+ */
+ for (i = (len-1); i >= 0; i--) {
+ if (kmap_mask & BIT(i)) {
+ unsigned long addr = (unsigned long)iov[i].iov_base;
+
+ kunmap_local((void *)(addr & PAGE_MASK));
+ }
}
}

@@ -498,7 +505,7 @@ static int siw_tx_hdt(struct siw_iwarp_tx *c_tx, struct socket *s)
p = siw_get_upage(mem->umem,
sge->laddr + sge_off);
if (unlikely(!p)) {
- siw_unmap_pages(page_array, kmap_mask);
+ siw_unmap_pages(iov, kmap_mask, seg);
wqe->processed -= c_tx->bytes_unsent;
rv = -EFAULT;
goto done_crc;
@@ -506,11 +513,12 @@ static int siw_tx_hdt(struct siw_iwarp_tx *c_tx, struct socket *s)
page_array[seg] = p;

if (!c_tx->use_sendpage) {
- iov[seg].iov_base = kmap(p) + fp_off;
- iov[seg].iov_len = plen;
+ void *kaddr = kmap_local_page(p);

/* Remember for later kunmap() */
kmap_mask |= BIT(seg);
+ iov[seg].iov_base = kaddr + fp_off;
+ iov[seg].iov_len = plen;

if (do_crc)
crypto_shash_update(
@@ -542,7 +550,7 @@ static int siw_tx_hdt(struct siw_iwarp_tx *c_tx, struct socket *s)

if (++seg > (int)MAX_ARRAY) {
siw_dbg_qp(tx_qp(c_tx), "to many fragments\n");
- siw_unmap_pages(page_array, kmap_mask);
+ siw_unmap_pages(iov, kmap_mask, seg-1);
wqe->processed -= c_tx->bytes_unsent;
rv = -EMSGSIZE;
goto done_crc;
@@ -593,7 +601,7 @@ static int siw_tx_hdt(struct siw_iwarp_tx *c_tx, struct socket *s)
} else {
rv = kernel_sendmsg(s, &msg, iov, seg + 1,
hdr_len + data_len + trl_len);
- siw_unmap_pages(page_array, kmap_mask);
+ siw_unmap_pages(iov, kmap_mask, seg);
}
if (rv < (int)hdr_len) {
/* Not even complete hdr pushed or negative rv */
--
2.28.0.rc0.12.gb6a658bd00c9

2021-06-29 21:35:34

by Bernard Metzler

[permalink] [raw]
Subject: Re: [PATCH V4] RDMA/siw: Convert siw_tx_hdt() to kmap_local_page()

[email protected] wrote: -----

>To: "Jason Gunthorpe" <[email protected]>
>From: [email protected]
>Date: 06/24/2021 07:48PM
>Cc: "Ira Weiny" <[email protected]>, "Mike Marciniszyn"
><[email protected]>, "Dennis Dalessandro"
><[email protected]>, "Doug Ledford"
><[email protected]>, "Faisal Latif" <[email protected]>,
>"Shiraz Saleem" <[email protected]>, "Bernard Metzler"
><[email protected]>, "Kamal Heib" <[email protected]>,
>[email protected], [email protected]
>Subject: [EXTERNAL] [PATCH V4] RDMA/siw: Convert siw_tx_hdt() to
>kmap_local_page()
>
>From: Ira Weiny <[email protected]>
>
>kmap() is being deprecated and will break uses of device dax after
>PKS
>protection is introduced.[1]
>
>The use of kmap() in siw_tx_hdt() is all thread local therefore
>kmap_local_page() is a sufficient replacement and will work with
>pgmap
>protected pages when those are implemented.
>
>siw_tx_hdt() tracks pages used in a page_array. It uses that array
>to
>unmap pages which were mapped on function exit. Not all entries in
>the
>array are mapped and this is tracked in kmap_mask.
>
>kunmap_local() takes a mapped address rather than a page. Alter
>siw_unmap_pages() to take the iov array to reuse the iov_base address
>of
>each mapping. Use PAGE_MASK to get the proper address for
>kunmap_local().
>
>kmap_local_page() mappings are tracked in a stack and must be
>unmapped
>in the opposite order they were mapped in. Because segments are
>mapped
>into the page array in increasing index order, modify
>siw_unmap_pages()
>to unmap pages in decreasing order.
>
>Use kmap_local_page() instead of kmap() to map pages in the
>page_array.
>
>[1]
>INVALID URI REMOVED
>lkml_20201009195033.3208459-2D59-2Dira.weiny-40intel.com_&d=DwIDAg&c=
>jf_iaSHvJObTbx-siA1ZOg&r=2TaYXQ0T-r8ZO1PP1alNwU_QJcRRLfmYTAgd3QCvqSc&
>m=01QnZvj05j7vvgDChewVpHJlDytiIFuttai7VRUdJMs&s=zS4nDlvF_3MDi9wu7GaL6
>qooDhiboqP5ii5ozBeDpLE&e=
>
>Signed-off-by: Ira Weiny <[email protected]>
>
>---
>Changes for V4:
> From Bernard
> Further optimize siw_unmap_pages() by eliminating the
> CRC page from the iov array.
>
>Changes for V3:
> From Bernard
> Use 'p' in kmap_local_page()
> Use seg as length to siw_unmap_pages()
>
>Changes for V2:
> From Bernard
> Reuse iov[].iov_base rather than declaring another array
> of pointers and preserve the use of kmap_mask to know
> which iov's were kmapped.
>---
> drivers/infiniband/sw/siw/siw_qp_tx.c | 30
>+++++++++++++++++----------
> 1 file changed, 19 insertions(+), 11 deletions(-)
>
>diff --git a/drivers/infiniband/sw/siw/siw_qp_tx.c
>b/drivers/infiniband/sw/siw/siw_qp_tx.c
>index db68a10d12cd..1f4e60257700 100644
>--- a/drivers/infiniband/sw/siw/siw_qp_tx.c
>+++ b/drivers/infiniband/sw/siw/siw_qp_tx.c
>@@ -396,13 +396,20 @@ static int siw_0copy_tx(struct socket *s,
>struct page **page,
>
> #define MAX_TRAILER (MPA_CRC_SIZE + 4)
>
>-static void siw_unmap_pages(struct page **pp, unsigned long
>kmap_mask)
>+static void siw_unmap_pages(struct kvec *iov, unsigned long
>kmap_mask, int len)
> {
>- while (kmap_mask) {
>- if (kmap_mask & BIT(0))
>- kunmap(*pp);
>- pp++;
>- kmap_mask >>= 1;
>+ int i;
>+
>+ /*
>+ * Work backwards through the array to honor the kmap_local_page()
>+ * ordering requirements.
>+ */
>+ for (i = (len-1); i >= 0; i--) {
>+ if (kmap_mask & BIT(i)) {
>+ unsigned long addr = (unsigned long)iov[i].iov_base;
>+
>+ kunmap_local((void *)(addr & PAGE_MASK));
>+ }
> }
> }
>
>@@ -498,7 +505,7 @@ static int siw_tx_hdt(struct siw_iwarp_tx *c_tx,
>struct socket *s)
> p = siw_get_upage(mem->umem,
> sge->laddr + sge_off);
> if (unlikely(!p)) {
>- siw_unmap_pages(page_array, kmap_mask);
>+ siw_unmap_pages(iov, kmap_mask, seg);
> wqe->processed -= c_tx->bytes_unsent;
> rv = -EFAULT;
> goto done_crc;
>@@ -506,11 +513,12 @@ static int siw_tx_hdt(struct siw_iwarp_tx
>*c_tx, struct socket *s)
> page_array[seg] = p;
>
> if (!c_tx->use_sendpage) {
>- iov[seg].iov_base = kmap(p) + fp_off;
>- iov[seg].iov_len = plen;
>+ void *kaddr = kmap_local_page(p);
>
> /* Remember for later kunmap() */
> kmap_mask |= BIT(seg);
>+ iov[seg].iov_base = kaddr + fp_off;
>+ iov[seg].iov_len = plen;
>
> if (do_crc)
> crypto_shash_update(
>@@ -542,7 +550,7 @@ static int siw_tx_hdt(struct siw_iwarp_tx *c_tx,
>struct socket *s)
>
> if (++seg > (int)MAX_ARRAY) {
> siw_dbg_qp(tx_qp(c_tx), "to many fragments\n");
>- siw_unmap_pages(page_array, kmap_mask);
>+ siw_unmap_pages(iov, kmap_mask, seg-1);
> wqe->processed -= c_tx->bytes_unsent;
> rv = -EMSGSIZE;
> goto done_crc;
>@@ -593,7 +601,7 @@ static int siw_tx_hdt(struct siw_iwarp_tx *c_tx,
>struct socket *s)
> } else {
> rv = kernel_sendmsg(s, &msg, iov, seg + 1,
> hdr_len + data_len + trl_len);
>- siw_unmap_pages(page_array, kmap_mask);
>+ siw_unmap_pages(iov, kmap_mask, seg);
> }
> if (rv < (int)hdr_len) {
> /* Not even complete hdr pushed or negative rv */
>--
>2.28.0.rc0.12.gb6a658bd00c9
>
>
Sry my misconfigured email attached some HTML crap. So I did not
reach the list.

Tested V4 which works as intended. Thanks, Ira!

Reviewed-by: Bernard Metzler <[email protected]>

2021-06-29 22:18:02

by Ira Weiny

[permalink] [raw]
Subject: Re: [PATCH V4] RDMA/siw: Convert siw_tx_hdt() to kmap_local_page()

> >
> Sry my misconfigured email attached some HTML crap. So I did not
> reach the list.

NP...

>
> Tested V4 which works as intended. Thanks, Ira!
>
> Reviewed-by: Bernard Metzler <[email protected]>

Thanks!
Ira

2021-07-15 18:28:57

by Jason Gunthorpe

[permalink] [raw]
Subject: Re: [PATCH V4] RDMA/siw: Convert siw_tx_hdt() to kmap_local_page()

On Thu, Jun 24, 2021 at 10:48:14AM -0700, [email protected] wrote:
> From: Ira Weiny <[email protected]>
>
> kmap() is being deprecated and will break uses of device dax after PKS
> protection is introduced.[1]
>
> The use of kmap() in siw_tx_hdt() is all thread local therefore
> kmap_local_page() is a sufficient replacement and will work with pgmap
> protected pages when those are implemented.
>
> siw_tx_hdt() tracks pages used in a page_array. It uses that array to
> unmap pages which were mapped on function exit. Not all entries in the
> array are mapped and this is tracked in kmap_mask.
>
> kunmap_local() takes a mapped address rather than a page. Alter
> siw_unmap_pages() to take the iov array to reuse the iov_base address of
> each mapping. Use PAGE_MASK to get the proper address for
> kunmap_local().
>
> kmap_local_page() mappings are tracked in a stack and must be unmapped
> in the opposite order they were mapped in. Because segments are mapped
> into the page array in increasing index order, modify siw_unmap_pages()
> to unmap pages in decreasing order.
>
> Use kmap_local_page() instead of kmap() to map pages in the page_array.
>
> [1] https://lore.kernel.org/lkml/[email protected]/
>
> Signed-off-by: Ira Weiny <[email protected]>
> Reviewed-by: Bernard Metzler <[email protected]>
> ---
> Changes for V4:
> From Bernard
> Further optimize siw_unmap_pages() by eliminating the
> CRC page from the iov array.
>
> Changes for V3:
> From Bernard
> Use 'p' in kmap_local_page()
> Use seg as length to siw_unmap_pages()
>
> Changes for V2:
> From Bernard
> Reuse iov[].iov_base rather than declaring another array
> of pointers and preserve the use of kmap_mask to know
> which iov's were kmapped.
> ---
> drivers/infiniband/sw/siw/siw_qp_tx.c | 30 +++++++++++++++++----------
> 1 file changed, 19 insertions(+), 11 deletions(-)

Applied to for-next, thanks

Jason