2019-03-02 03:28:08

by John Hubbard

[permalink] [raw]
Subject: [PATCH 0/1] RDMA/umem: minor bug fix and cleanup in error handling paths

From: John Hubbard <[email protected]>

Hi,

Ira Weiny alerted me to a couple of places where I'd missed a change from
put_page() to put_user_page(), in my pending patchsets. But when I
attempted to dive more deeply into that code, I ran into things that I
*think* should be fixed up a bit.

I hope I didn't completely miss something. I am not set up to test this
(no Infiniband hardware), so I'm not even sure I should send this out, but
it seems like the best way to ask: "is this code really working the way I
think it does?"

This applies to the latest linux.git tree.

Cc: Ira Weiny <[email protected]>
Cc: Jason Gunthorpe <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Doug Ledford <[email protected]>
Cc: [email protected]
Cc: [email protected]

John Hubbard (1):
RDMA/umem: minor bug fix and cleanup in error handling paths

drivers/infiniband/core/umem_odp.c | 24 +++++++++---------------
1 file changed, 9 insertions(+), 15 deletions(-)

--
2.21.0



2019-03-02 03:28:18

by John Hubbard

[permalink] [raw]
Subject: [PATCH] RDMA/umem: minor bug fix and cleanup in error handling paths

From: John Hubbard <[email protected]>

1. Bug fix: the error handling now releases pages starting
at the first page that experienced an error.

2. Refinement: release_pages() is better than put_page()
in a loop.

3. Dead code removal: the check for (user_virt & ~page_mask)
is checking for a condition that can never happen,
because earlier:

user_virt = user_virt & page_mask;

...so, remove that entire phrase.

4. Minor: As long as I'm here, shorten up a couple of long lines
in the same function, without harming the ability to
grep for the printed error message.

Cc: Ira Weiny <[email protected]>
Cc: Jason Gunthorpe <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Doug Ledford <[email protected]>
Cc: [email protected]
Cc: [email protected]
Signed-off-by: John Hubbard <[email protected]>
---
drivers/infiniband/core/umem_odp.c | 24 +++++++++---------------
1 file changed, 9 insertions(+), 15 deletions(-)

diff --git a/drivers/infiniband/core/umem_odp.c b/drivers/infiniband/core/umem_odp.c
index acb882f279cb..294bf6676947 100644
--- a/drivers/infiniband/core/umem_odp.c
+++ b/drivers/infiniband/core/umem_odp.c
@@ -648,25 +648,17 @@ int ib_umem_odp_map_dma_pages(struct ib_umem_odp *umem_odp, u64 user_virt,

if (npages < 0) {
if (npages != -EAGAIN)
- pr_warn("fail to get %zu user pages with error %d\n", gup_num_pages, npages);
+ pr_warn("fail to get %zu user pages with error %d\n",
+ gup_num_pages, npages);
else
- pr_debug("fail to get %zu user pages with error %d\n", gup_num_pages, npages);
+ pr_debug("fail to get %zu user pages with error %d\n",
+ gup_num_pages, npages);
break;
}

bcnt -= min_t(size_t, npages << PAGE_SHIFT, bcnt);
mutex_lock(&umem_odp->umem_mutex);
for (j = 0; j < npages; j++, user_virt += PAGE_SIZE) {
- if (user_virt & ~page_mask) {
- p += PAGE_SIZE;
- if (page_to_phys(local_page_list[j]) != p) {
- ret = -EFAULT;
- break;
- }
- put_page(local_page_list[j]);
- continue;
- }
-
ret = ib_umem_odp_map_dma_single_page(
umem_odp, k, local_page_list[j],
access_mask, current_seq);
@@ -684,9 +676,11 @@ int ib_umem_odp_map_dma_pages(struct ib_umem_odp *umem_odp, u64 user_virt,
mutex_unlock(&umem_odp->umem_mutex);

if (ret < 0) {
- /* Release left over pages when handling errors. */
- for (++j; j < npages; ++j)
- put_page(local_page_list[j]);
+ /*
+ * Release pages, starting at the first page
+ * that experienced an error.
+ */
+ release_pages(&local_page_list[j], npages - j);
break;
}
}
--
2.21.0


2019-03-02 16:05:10

by kernel test robot

[permalink] [raw]
Subject: Re: [PATCH] RDMA/umem: minor bug fix and cleanup in error handling paths

Hi John,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on rdma/for-next]
[also build test ERROR on v5.0-rc8 next-20190301]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url: https://github.com/0day-ci/linux/commits/john-hubbard-gmail-com/RDMA-umem-minor-bug-fix-and-cleanup-in-error-handling-paths/20190302-233314
base: https://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma.git for-next
config: i386-randconfig-x002-201908 (attached as .config)
compiler: gcc-8 (Debian 8.2.0-20) 8.2.0
reproduce:
# save the attached .config to linux build tree
make ARCH=i386

All errors (new ones prefixed by >>):

drivers/infiniband/core/umem_odp.c: In function 'ib_umem_odp_map_dma_pages':
>> drivers/infiniband/core/umem_odp.c:684:4: error: implicit declaration of function 'release_pages'; did you mean 'release_task'? [-Werror=implicit-function-declaration]
release_pages(&local_page_list[j], npages - j);
^~~~~~~~~~~~~
release_task
cc1: some warnings being treated as errors

vim +684 drivers/infiniband/core/umem_odp.c

559
560 /**
561 * ib_umem_odp_map_dma_pages - Pin and DMA map userspace memory in an ODP MR.
562 *
563 * Pins the range of pages passed in the argument, and maps them to
564 * DMA addresses. The DMA addresses of the mapped pages is updated in
565 * umem_odp->dma_list.
566 *
567 * Returns the number of pages mapped in success, negative error code
568 * for failure.
569 * An -EAGAIN error code is returned when a concurrent mmu notifier prevents
570 * the function from completing its task.
571 * An -ENOENT error code indicates that userspace process is being terminated
572 * and mm was already destroyed.
573 * @umem_odp: the umem to map and pin
574 * @user_virt: the address from which we need to map.
575 * @bcnt: the minimal number of bytes to pin and map. The mapping might be
576 * bigger due to alignment, and may also be smaller in case of an error
577 * pinning or mapping a page. The actual pages mapped is returned in
578 * the return value.
579 * @access_mask: bit mask of the requested access permissions for the given
580 * range.
581 * @current_seq: the MMU notifiers sequance value for synchronization with
582 * invalidations. the sequance number is read from
583 * umem_odp->notifiers_seq before calling this function
584 */
585 int ib_umem_odp_map_dma_pages(struct ib_umem_odp *umem_odp, u64 user_virt,
586 u64 bcnt, u64 access_mask,
587 unsigned long current_seq)
588 {
589 struct ib_umem *umem = &umem_odp->umem;
590 struct task_struct *owning_process = NULL;
591 struct mm_struct *owning_mm = umem_odp->umem.owning_mm;
592 struct page **local_page_list = NULL;
593 u64 page_mask, off;
594 int j, k, ret = 0, start_idx, npages = 0, page_shift;
595 unsigned int flags = 0;
596 phys_addr_t p = 0;
597
598 if (access_mask == 0)
599 return -EINVAL;
600
601 if (user_virt < ib_umem_start(umem) ||
602 user_virt + bcnt > ib_umem_end(umem))
603 return -EFAULT;
604
605 local_page_list = (struct page **)__get_free_page(GFP_KERNEL);
606 if (!local_page_list)
607 return -ENOMEM;
608
609 page_shift = umem->page_shift;
610 page_mask = ~(BIT(page_shift) - 1);
611 off = user_virt & (~page_mask);
612 user_virt = user_virt & page_mask;
613 bcnt += off; /* Charge for the first page offset as well. */
614
615 /*
616 * owning_process is allowed to be NULL, this means somehow the mm is
617 * existing beyond the lifetime of the originating process.. Presumably
618 * mmget_not_zero will fail in this case.
619 */
620 owning_process = get_pid_task(umem_odp->per_mm->tgid, PIDTYPE_PID);
621 if (WARN_ON(!mmget_not_zero(umem_odp->umem.owning_mm))) {
622 ret = -EINVAL;
623 goto out_put_task;
624 }
625
626 if (access_mask & ODP_WRITE_ALLOWED_BIT)
627 flags |= FOLL_WRITE;
628
629 start_idx = (user_virt - ib_umem_start(umem)) >> page_shift;
630 k = start_idx;
631
632 while (bcnt > 0) {
633 const size_t gup_num_pages = min_t(size_t,
634 (bcnt + BIT(page_shift) - 1) >> page_shift,
635 PAGE_SIZE / sizeof(struct page *));
636
637 down_read(&owning_mm->mmap_sem);
638 /*
639 * Note: this might result in redundent page getting. We can
640 * avoid this by checking dma_list to be 0 before calling
641 * get_user_pages. However, this make the code much more
642 * complex (and doesn't gain us much performance in most use
643 * cases).
644 */
645 npages = get_user_pages_remote(owning_process, owning_mm,
646 user_virt, gup_num_pages,
647 flags, local_page_list, NULL, NULL);
648 up_read(&owning_mm->mmap_sem);
649
650 if (npages < 0) {
651 if (npages != -EAGAIN)
652 pr_warn("fail to get %zu user pages with error %d\n",
653 gup_num_pages, npages);
654 else
655 pr_debug("fail to get %zu user pages with error %d\n",
656 gup_num_pages, npages);
657 break;
658 }
659
660 bcnt -= min_t(size_t, npages << PAGE_SHIFT, bcnt);
661 mutex_lock(&umem_odp->umem_mutex);
662 for (j = 0; j < npages; j++, user_virt += PAGE_SIZE) {
663 ret = ib_umem_odp_map_dma_single_page(
664 umem_odp, k, local_page_list[j],
665 access_mask, current_seq);
666 if (ret < 0) {
667 if (ret != -EAGAIN)
668 pr_warn("ib_umem_odp_map_dma_single_page failed with error %d\n", ret);
669 else
670 pr_debug("ib_umem_odp_map_dma_single_page failed with error %d\n", ret);
671 break;
672 }
673
674 p = page_to_phys(local_page_list[j]);
675 k++;
676 }
677 mutex_unlock(&umem_odp->umem_mutex);
678
679 if (ret < 0) {
680 /*
681 * Release pages, starting at the first page
682 * that experienced an error.
683 */
> 684 release_pages(&local_page_list[j], npages - j);
685 break;
686 }
687 }
688
689 if (ret >= 0) {
690 if (npages < 0 && k == start_idx)
691 ret = npages;
692 else
693 ret = k - start_idx;
694 }
695
696 mmput(owning_mm);
697 out_put_task:
698 if (owning_process)
699 put_task_struct(owning_process);
700 free_page((unsigned long)local_page_list);
701 return ret;
702 }
703 EXPORT_SYMBOL(ib_umem_odp_map_dma_pages);
704

---
0-DAY kernel test infrastructure Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all Intel Corporation


Attachments:
(No filename) (6.92 kB)
.config.gz (35.57 kB)

2019-03-02 19:26:28

by kernel test robot

[permalink] [raw]
Subject: Re: [PATCH] RDMA/umem: minor bug fix and cleanup in error handling paths

Hi John,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on rdma/for-next]
[also build test ERROR on v5.0-rc8 next-20190301]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url: https://github.com/0day-ci/linux/commits/john-hubbard-gmail-com/RDMA-umem-minor-bug-fix-and-cleanup-in-error-handling-paths/20190302-233314
base: https://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma.git for-next
config: nds32-allyesconfig (attached as .config)
compiler: nds32le-linux-gcc (GCC) 6.4.0
reproduce:
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# save the attached .config to linux build tree
GCC_VERSION=6.4.0 make.cross ARCH=nds32

All errors (new ones prefixed by >>):

drivers/infiniband/core/umem_odp.c: In function 'ib_umem_odp_map_dma_pages':
>> drivers/infiniband/core/umem_odp.c:684:4: error: implicit declaration of function 'release_pages' [-Werror=implicit-function-declaration]
release_pages(&local_page_list[j], npages - j);
^~~~~~~~~~~~~
cc1: some warnings being treated as errors

vim +/release_pages +684 drivers/infiniband/core/umem_odp.c

[ source context of ib_umem_odp_map_dma_pages() identical to the previous report -- snipped ]

---
0-DAY kernel test infrastructure Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all Intel Corporation


Attachments:
(No filename) (7.04 kB)
.config.gz (48.75 kB)

2019-03-02 20:26:27

by John Hubbard

[permalink] [raw]
Subject: [PATCH v2] RDMA/umem: minor bug fix and cleanup in error handling paths

From: John Hubbard <[email protected]>

1. Bug fix: the error handling now releases pages starting
at the first page that experienced an error.

2. Refinement: release_pages() is better than put_page()
in a loop.

3. Dead code removal: the check for (user_virt & ~page_mask)
is checking for a condition that can never happen,
because earlier:

user_virt = user_virt & page_mask;

...so, remove that entire phrase.

4. Minor: As long as I'm here, shorten up a couple of long lines
in the same function, without harming the ability to
grep for the printed error message.

Cc: Ira Weiny <[email protected]>
Cc: Jason Gunthorpe <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Doug Ledford <[email protected]>
Cc: [email protected]
Cc: [email protected]
Signed-off-by: John Hubbard <[email protected]>
---

v2: Fixes a kbuild test robot reported build failure, by directly
including pagemap.h

drivers/infiniband/core/umem_odp.c | 25 ++++++++++---------------
1 file changed, 10 insertions(+), 15 deletions(-)

diff --git a/drivers/infiniband/core/umem_odp.c b/drivers/infiniband/core/umem_odp.c
index acb882f279cb..83872c1f3f2c 100644
--- a/drivers/infiniband/core/umem_odp.c
+++ b/drivers/infiniband/core/umem_odp.c
@@ -40,6 +40,7 @@
#include <linux/vmalloc.h>
#include <linux/hugetlb.h>
#include <linux/interval_tree_generic.h>
+#include <linux/pagemap.h>

#include <rdma/ib_verbs.h>
#include <rdma/ib_umem.h>
@@ -648,25 +649,17 @@ int ib_umem_odp_map_dma_pages(struct ib_umem_odp *umem_odp, u64 user_virt,

if (npages < 0) {
if (npages != -EAGAIN)
- pr_warn("fail to get %zu user pages with error %d\n", gup_num_pages, npages);
+ pr_warn("fail to get %zu user pages with error %d\n",
+ gup_num_pages, npages);
else
- pr_debug("fail to get %zu user pages with error %d\n", gup_num_pages, npages);
+ pr_debug("fail to get %zu user pages with error %d\n",
+ gup_num_pages, npages);
break;
}

bcnt -= min_t(size_t, npages << PAGE_SHIFT, bcnt);
mutex_lock(&umem_odp->umem_mutex);
for (j = 0; j < npages; j++, user_virt += PAGE_SIZE) {
- if (user_virt & ~page_mask) {
- p += PAGE_SIZE;
- if (page_to_phys(local_page_list[j]) != p) {
- ret = -EFAULT;
- break;
- }
- put_page(local_page_list[j]);
- continue;
- }
-
ret = ib_umem_odp_map_dma_single_page(
umem_odp, k, local_page_list[j],
access_mask, current_seq);
@@ -684,9 +677,11 @@ int ib_umem_odp_map_dma_pages(struct ib_umem_odp *umem_odp, u64 user_virt,
mutex_unlock(&umem_odp->umem_mutex);

if (ret < 0) {
- /* Release left over pages when handling errors. */
- for (++j; j < npages; ++j)
- put_page(local_page_list[j]);
+ /*
+ * Release pages, starting at the first page
+ * that experienced an error.
+ */
+ release_pages(&local_page_list[j], npages - j);
break;
}
}
--
2.21.0


2019-03-03 03:46:20

by Ira Weiny

[permalink] [raw]
Subject: Re: [PATCH v2] RDMA/umem: minor bug fix and cleanup in error handling paths

FWIW I don't have ODP hardware either, so I can't test this.

On Sat, Mar 02, 2019 at 12:24:35PM -0800, [email protected] wrote:
> From: John Hubbard <[email protected]>
>
> 1. Bug fix: the error handling release pages starting
> at the first page that experienced an error.
>
> 2. Refinement: release_pages() is better than put_page()
> in a loop.
>
> 3. Dead code removal: the check for (user_virt & ~page_mask)
> is checking for a condition that can never happen,
> because earlier:
>
> user_virt = user_virt & page_mask;
>
> ...so, remove that entire phrase.
>
> 4. Minor: As long as I'm here, shorten up a couple of long lines
> in the same function, without harming the ability to
> grep for the printed error message.
>
> Cc: Ira Weiny <[email protected]>
> Cc: Jason Gunthorpe <[email protected]>
> Cc: Andrew Morton <[email protected]>
> Cc: Doug Ledford <[email protected]>
> Cc: [email protected]
> Cc: [email protected]
> Signed-off-by: John Hubbard <[email protected]>
> ---
>
> v2: Fixes a kbuild test robot reported build failure, by directly
> including pagemap.h
>
> drivers/infiniband/core/umem_odp.c | 25 ++++++++++---------------
> 1 file changed, 10 insertions(+), 15 deletions(-)
>
> diff --git a/drivers/infiniband/core/umem_odp.c b/drivers/infiniband/core/umem_odp.c
> index acb882f279cb..83872c1f3f2c 100644
> --- a/drivers/infiniband/core/umem_odp.c
> +++ b/drivers/infiniband/core/umem_odp.c
> @@ -40,6 +40,7 @@
> #include <linux/vmalloc.h>
> #include <linux/hugetlb.h>
> #include <linux/interval_tree_generic.h>
> +#include <linux/pagemap.h>
>
> #include <rdma/ib_verbs.h>
> #include <rdma/ib_umem.h>
> @@ -648,25 +649,17 @@ int ib_umem_odp_map_dma_pages(struct ib_umem_odp *umem_odp, u64 user_virt,
>
> if (npages < 0) {
> if (npages != -EAGAIN)
> - pr_warn("fail to get %zu user pages with error %d\n", gup_num_pages, npages);
> + pr_warn("fail to get %zu user pages with error %d\n",
> + gup_num_pages, npages);
> else
> - pr_debug("fail to get %zu user pages with error %d\n", gup_num_pages, npages);
> + pr_debug("fail to get %zu user pages with error %d\n",
> + gup_num_pages, npages);
> break;
> }
>
> bcnt -= min_t(size_t, npages << PAGE_SHIFT, bcnt);
> mutex_lock(&umem_odp->umem_mutex);
> for (j = 0; j < npages; j++, user_virt += PAGE_SIZE) {
> - if (user_virt & ~page_mask) {
> - p += PAGE_SIZE;
> - if (page_to_phys(local_page_list[j]) != p) {
> - ret = -EFAULT;
> - break;
> - }
> - put_page(local_page_list[j]);
> - continue;
> - }
> -

I think this is trying to account for compound pages. (i.e. page_mask could
represent more than PAGE_SIZE, which is what user_virt is being incremented by.)
But putting the page in that case seems to be the wrong thing to do?

Yes this was added by Artemy[1] now cc'ed.
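For example (illustrative numbers only: a 2 MB huge page would give
page_shift == 21, while the loop still steps by PAGE_SIZE):

	/* Standalone demo of the arithmetic; BIT()/PAGE_SIZE redefined locally. */
	#include <stdio.h>
	#define BIT(n)    (1UL << (n))
	#define PAGE_SIZE 4096UL

	int main(void)
	{
		unsigned long page_shift = 21;                     /* 2 MB huge page */
		unsigned long page_mask  = ~(BIT(page_shift) - 1);
		unsigned long user_virt  = 0x7f0000200000UL & page_mask;

		user_virt += PAGE_SIZE;               /* one trip around the loop */
		printf("%#lx\n", user_virt & ~page_mask);        /* prints 0x1000 */
		return 0;
	}

So the branch is not dead for huge pages: it fires for every tail page
inside the compound page.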

> ret = ib_umem_odp_map_dma_single_page(
> umem_odp, k, local_page_list[j],
> access_mask, current_seq);
> @@ -684,9 +677,11 @@ int ib_umem_odp_map_dma_pages(struct ib_umem_odp *umem_odp, u64 user_virt,
> mutex_unlock(&umem_odp->umem_mutex);
>
> if (ret < 0) {
> - /* Release left over pages when handling errors. */
> - for (++j; j < npages; ++j)
> - put_page(local_page_list[j]);
> + /*
> + * Release pages, starting at the first page
> + * that experienced an error.
> + */
> + release_pages(&local_page_list[j], npages - j);

My concern here is that release_pages() handles compound pages, perhaps
differently from the above code, so calling it may not work? But I've not
really spent a lot of time on it...

:-/

Ira

[1]

commit 403cd12e2cf759ead5cbdcb62bf9872b9618d400
Author: Artemy Kovalyov <[email protected]>
Date: Wed Apr 5 09:23:55 2017 +0300

IB/umem: Add contiguous ODP support

Currenlty ODP supports only regular MMU pages.
Add ODP support for regions consisting of physically contiguous chunks
of arbitrary order (huge pages for instance) to improve performance.

Signed-off-by: Artemy Kovalyov <[email protected]>
Signed-off-by: Leon Romanovsky <[email protected]>
Signed-off-by: Doug Ledford <[email protected]>



2019-03-03 09:54:33

by Artemy Kovalyov

[permalink] [raw]
Subject: Re: [PATCH v2] RDMA/umem: minor bug fix and cleanup in error handling paths



On 02/03/2019 21:44, Ira Weiny wrote:
>
> On Sat, Mar 02, 2019 at 12:24:35PM -0800, [email protected] wrote:
>> From: John Hubbard <[email protected]>
>>
>> ...
>> 3. Dead code removal: the check for (user_virt & ~page_mask)
>> is checking for a condition that can never happen,
>> because earlier:
>>
>> user_virt = user_virt & page_mask;
>>
>> ...so, remove that entire phrase.
>>
>>
>> bcnt -= min_t(size_t, npages << PAGE_SHIFT, bcnt);
>> mutex_lock(&umem_odp->umem_mutex);
>> for (j = 0; j < npages; j++, user_virt += PAGE_SIZE) {
>> - if (user_virt & ~page_mask) {
>> - p += PAGE_SIZE;
>> - if (page_to_phys(local_page_list[j]) != p) {
>> - ret = -EFAULT;
>> - break;
>> - }
>> - put_page(local_page_list[j]);
>> - continue;
>> - }
>> -
>
> I think this is trying to account for compound pages. (ie page_mask could
> represent more than PAGE_SIZE which is what user_virt is being incrimented by.)
> But putting the page in that case seems to be the wrong thing to do?
>
> Yes this was added by Artemy[1] now cc'ed.

Right, this is for huge pages, please keep it.
put_page() is needed to decrement the refcount of the head page.


2019-03-03 22:38:30

by John Hubbard

[permalink] [raw]
Subject: Re: [PATCH v2] RDMA/umem: minor bug fix and cleanup in error handling paths

On 3/3/19 1:52 AM, Artemy Kovalyov wrote:
>
>
> On 02/03/2019 21:44, Ira Weiny wrote:
>>
>> On Sat, Mar 02, 2019 at 12:24:35PM -0800, [email protected] wrote:
>>> From: John Hubbard <[email protected]>
>>>
>>> ...
>>> 3. Dead code removal: the check for (user_virt & ~page_mask)
>>> is checking for a condition that can never happen,
>>> because earlier:
>>>
>>>      user_virt = user_virt & page_mask;
>>>
>>> ...so, remove that entire phrase.
>>>
>>>           bcnt -= min_t(size_t, npages << PAGE_SHIFT, bcnt);
>>>           mutex_lock(&umem_odp->umem_mutex);
>>>           for (j = 0; j < npages; j++, user_virt += PAGE_SIZE) {
>>> -            if (user_virt & ~page_mask) {
>>> -                p += PAGE_SIZE;
>>> -                if (page_to_phys(local_page_list[j]) != p) {
>>> -                    ret = -EFAULT;
>>> -                    break;
>>> -                }
>>> -                put_page(local_page_list[j]);
>>> -                continue;
>>> -            }
>>> -
>>
>> I think this is trying to account for compound pages. (ie page_mask could
>> represent more than PAGE_SIZE which is what user_virt is being incrimented by.)
>> But putting the page in that case seems to be the wrong thing to do?
>>
>> Yes this was added by Artemy[1] now cc'ed.
>
> Right, this is for huge pages, please keep it.
> put_page() needed to decrement refcount of the head page.
>

OK, thanks for explaining! Artemy, while you're here, any thoughts about the
release_pages, and the change of the starting point, from the other part of the
patch:

@@ -684,9 +677,11 @@ int ib_umem_odp_map_dma_pages(struct ib_umem_odp *umem_odp,
u64 user_virt,
mutex_unlock(&umem_odp->umem_mutex);

if (ret < 0) {
- /* Release left over pages when handling errors. */
- for (++j; j < npages; ++j)
- put_page(local_page_list[j]);
+ /*
+ * Release pages, starting at the first page
+ * that experienced an error.
+ */
+ release_pages(&local_page_list[j], npages - j);
break;
}
}

?

thanks,
--
John Hubbard
NVIDIA

2019-03-04 00:58:08

by Ira Weiny

[permalink] [raw]
Subject: Re: [PATCH v2] RDMA/umem: minor bug fix and cleanup in error handling paths

On Sun, Mar 03, 2019 at 11:52:41AM +0200, Artemy Kovalyov wrote:
>
>
> On 02/03/2019 21:44, Ira Weiny wrote:
> >
> > On Sat, Mar 02, 2019 at 12:24:35PM -0800, [email protected] wrote:
> > > From: John Hubbard <[email protected]>
> > >
> > > ...
> > > 3. Dead code removal: the check for (user_virt & ~page_mask)
> > > is checking for a condition that can never happen,
> > > because earlier:
> > >
> > > user_virt = user_virt & page_mask;
> > >
> > > ...so, remove that entire phrase.
> > >
> > > bcnt -= min_t(size_t, npages << PAGE_SHIFT, bcnt);
> > > mutex_lock(&umem_odp->umem_mutex);
> > > for (j = 0; j < npages; j++, user_virt += PAGE_SIZE) {
> > > - if (user_virt & ~page_mask) {
> > > - p += PAGE_SIZE;
> > > - if (page_to_phys(local_page_list[j]) != p) {
> > > - ret = -EFAULT;
> > > - break;
> > > - }
> > > - put_page(local_page_list[j]);
> > > - continue;
> > > - }
> > > -
> >
> > I think this is trying to account for compound pages. (ie page_mask could
> > represent more than PAGE_SIZE which is what user_virt is being incrimented by.)
> > But putting the page in that case seems to be the wrong thing to do?
> >
> > Yes this was added by Artemy[1] now cc'ed.
>
> Right, this is for huge pages, please keep it.
> put_page() needed to decrement refcount of the head page.

You mean decrement the refcount of the _non_-head pages?

Ira

>

2019-03-04 06:45:00

by Leon Romanovsky

[permalink] [raw]
Subject: Re: [PATCH v2] RDMA/umem: minor bug fix and cleanup in error handling paths

On Sun, Mar 03, 2019 at 02:37:33PM -0800, John Hubbard wrote:
> On 3/3/19 1:52 AM, Artemy Kovalyov wrote:
> >
> >
> > On 02/03/2019 21:44, Ira Weiny wrote:
> > >
> > > On Sat, Mar 02, 2019 at 12:24:35PM -0800, [email protected] wrote:
> > > > From: John Hubbard <[email protected]>
> > > >
> > > > ...
> > > > 3. Dead code removal: the check for (user_virt & ~page_mask)
> > > > is checking for a condition that can never happen,
> > > > because earlier:
> > > >
> > > > user_virt = user_virt & page_mask;
> > > >
> > > > ...so, remove that entire phrase.
> > > >
> > > > bcnt -= min_t(size_t, npages << PAGE_SHIFT, bcnt);
> > > > mutex_lock(&umem_odp->umem_mutex);
> > > > for (j = 0; j < npages; j++, user_virt += PAGE_SIZE) {
> > > > - if (user_virt & ~page_mask) {
> > > > - p += PAGE_SIZE;
> > > > - if (page_to_phys(local_page_list[j]) != p) {
> > > > - ret = -EFAULT;
> > > > - break;
> > > > - }
> > > > - put_page(local_page_list[j]);
> > > > - continue;
> > > > - }
> > > > -
> > >
> > > I think this is trying to account for compound pages. (ie page_mask could
> > > represent more than PAGE_SIZE which is what user_virt is being incrimented by.)
> > > But putting the page in that case seems to be the wrong thing to do?
> > >
> > > Yes this was added by Artemy[1] now cc'ed.
> >
> > Right, this is for huge pages, please keep it.
> > put_page() needed to decrement refcount of the head page.
> >
>
> OK, thanks for explaining! Artemy, while you're here, any thoughts about the
> release_pages, and the change of the starting point, from the other part
> of the patch:

Your release_pages() code is the right fix; it would be great if you could
prepare a proper standalone patch.

Thanks

>
> @@ -684,9 +677,11 @@ int ib_umem_odp_map_dma_pages(struct ib_umem_odp
> *umem_odp, u64 user_virt,
> mutex_unlock(&umem_odp->umem_mutex);
>
> if (ret < 0) {
> - /* Release left over pages when handling errors. */
> - for (++j; j < npages; ++j)
> - put_page(local_page_list[j]);
> + /*
> + * Release pages, starting at the first page
> + * that experienced an error.
> + */
> + release_pages(&local_page_list[j], npages - j);
> break;
> }
> }
>
> ?
>
> thanks,
> --
> John Hubbard
> NVIDIA
>


Attachments:
(No filename) (2.42 kB)
signature.asc (817.00 B)

2019-03-04 23:11:43

by John Hubbard

[permalink] [raw]
Subject: Re: [PATCH v2] RDMA/umem: minor bug fix and cleanup in error handling paths

On 3/3/19 8:55 AM, Ira Weiny wrote:
> On Sun, Mar 03, 2019 at 11:52:41AM +0200, Artemy Kovalyov wrote:
>>
>>
>> On 02/03/2019 21:44, Ira Weiny wrote:
>>>
>>> On Sat, Mar 02, 2019 at 12:24:35PM -0800, [email protected] wrote:
>>>> From: John Hubbard <[email protected]>
>>>>
>>>> ...
>>>> 3. Dead code removal: the check for (user_virt & ~page_mask)
>>>> is checking for a condition that can never happen,
>>>> because earlier:
>>>>
>>>> user_virt = user_virt & page_mask;
>>>>
>>>> ...so, remove that entire phrase.
>>>>
>>>> bcnt -= min_t(size_t, npages << PAGE_SHIFT, bcnt);
>>>> mutex_lock(&umem_odp->umem_mutex);
>>>> for (j = 0; j < npages; j++, user_virt += PAGE_SIZE) {
>>>> - if (user_virt & ~page_mask) {
>>>> - p += PAGE_SIZE;
>>>> - if (page_to_phys(local_page_list[j]) != p) {
>>>> - ret = -EFAULT;
>>>> - break;
>>>> - }
>>>> - put_page(local_page_list[j]);
>>>> - continue;
>>>> - }
>>>> -
>>>
>>> I think this is trying to account for compound pages. (ie page_mask could
>>> represent more than PAGE_SIZE which is what user_virt is being incrimented by.)
>>> But putting the page in that case seems to be the wrong thing to do?
>>>
>>> Yes this was added by Artemy[1] now cc'ed.
>>
>> Right, this is for huge pages, please keep it.
>> put_page() needed to decrement refcount of the head page.
>
> You mean decrement the refcount of the _non_-head pages?
>
> Ira
>

Actually, I'm sure Artemy means head page, because put_page() always
operates on the head page.
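
Roughly, simplified from include/linux/mm.h of this era, a put_page() on a
tail page resolves to the head page first:

	static inline void put_page(struct page *page)
	{
		page = compound_head(page);	/* tail -> head */

		if (put_page_testzero(page))
			__put_page(page);
	}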

And this reminds me that I have a problem to solve nearby: get_user_pages
on huge pages increments the page->_refcount *for each tail page* as well.
That's a minor problem for my put_user_page()
patchset, because my approach so far assumed that I could just change us
over to:

get_user_page(): increments page->_refcount by a large amount (1024)

put_user_page(): decrements page->_refcount by a large amount (1024)

...and just stop doing the odd (to me) technique of incrementing once for
each tail page. I cannot see any reason why that's actually required, as
opposed to just "raise the page->_refcount enough to avoid losing the head
page too soon".

However, it may be tricky to do this in one pass. Probably at first, I'll have
to do this horrible thing approach:

get_user_page(): increments page->_refcount by a large amount (1024)

put_user_page(): decrements page->_refcount by a large amount (1024) MULTIPLIED
by the number of tail pages. argghhh that's ugly.

thanks,
--
John Hubbard
NVIDIA


2019-03-04 23:36:41

by John Hubbard

[permalink] [raw]
Subject: Re: [PATCH v2] RDMA/umem: minor bug fix and cleanup in error handling paths

On 3/4/19 3:11 PM, John Hubbard wrote:
> On 3/3/19 8:55 AM, Ira Weiny wrote:
>> On Sun, Mar 03, 2019 at 11:52:41AM +0200, Artemy Kovalyov wrote:
>>>
>>>
>>> On 02/03/2019 21:44, Ira Weiny wrote:
>>>>
>>>> On Sat, Mar 02, 2019 at 12:24:35PM -0800, [email protected] wrote:
>>>>> From: John Hubbard <[email protected]>
>>>>>
>>>>> ...
>>>>> 3. Dead code removal: the check for (user_virt & ~page_mask)
>>>>> is checking for a condition that can never happen,
>>>>> because earlier:
>>>>>
>>>>> user_virt = user_virt & page_mask;
>>>>>
>>>>> ...so, remove that entire phrase.
>>>>>
>>>>> bcnt -= min_t(size_t, npages << PAGE_SHIFT, bcnt);
>>>>> mutex_lock(&umem_odp->umem_mutex);
>>>>> for (j = 0; j < npages; j++, user_virt += PAGE_SIZE) {
>>>>> - if (user_virt & ~page_mask) {
>>>>> - p += PAGE_SIZE;
>>>>> - if (page_to_phys(local_page_list[j]) != p) {
>>>>> - ret = -EFAULT;
>>>>> - break;
>>>>> - }
>>>>> - put_page(local_page_list[j]);
>>>>> - continue;
>>>>> - }
>>>>> -
>>>>
>>>> I think this is trying to account for compound pages. (ie page_mask could
>>>> represent more than PAGE_SIZE which is what user_virt is being incrimented by.)
>>>> But putting the page in that case seems to be the wrong thing to do?
>>>>
>>>> Yes this was added by Artemy[1] now cc'ed.
>>>
>>> Right, this is for huge pages, please keep it.
>>> put_page() needed to decrement refcount of the head page.
>>
>> You mean decrement the refcount of the _non_-head pages?
>>
>> Ira
>>
>
> Actually, I'm sure Artemy means head page, because put_page() always
> operates on the head page.
>
> And this reminds me that I have a problem to solve nearby: get_user_pages
> on huge pages increments the page->_refcount *for each tail page* as well.
> That's a minor problem for my put_user_page()
> patchset, because my approach so far assumed that I could just change us
> over to:
>
> get_user_page(): increments page->_refcount by a large amount (1024)
>
> put_user_page(): decrements page->_refcount by a large amount (1024)
>
> ...and just stop doing the odd (to me) technique of incrementing once for
> each tail page. I cannot see any reason why that's actually required, as
> opposed to just "raise the page->_refcount enough to avoid losing the head
> page too soon".
>
> However, it may be tricky to do this in one pass. Probably at first, I'll have
> to do this horrible thing approach:
>
> get_user_page(): increments page->_refcount by a large amount (1024)
>
> put_user_page(): decrements page->_refcount by a large amount (1024) MULTIPLIED
> by the number of tail pages. argghhh that's ugly.
>

I see that this is still not stated quite right.

...to clarify, I mean to leave the existing behavior alone. So it would be
the call sites (not put_user_page as the above says) that would be doing all
that decrementing. The call sites know how many decrements are appropriate.

Unless someone thinks of a clever way to clean this up in one shot. I'm not
really seeing any way.

thanks,
--
John Hubbard
NVIDIA

2019-03-05 00:54:52

by Jason Gunthorpe

[permalink] [raw]
Subject: Re: [PATCH v2] RDMA/umem: minor bug fix and cleanup in error handling paths

On Mon, Mar 04, 2019 at 03:11:05PM -0800, John Hubbard wrote:

> get_user_page(): increments page->_refcount by a large amount (1024)
>
> put_user_page(): decrements page->_refcount by a large amount (1024)
>
> ...and just stop doing the odd (to me) technique of incrementing once for
> each tail page. I cannot see any reason why that's actually required, as
> opposed to just "raise the page->_refcount enough to avoid losing the head
> page too soon".

I'd very much like to see this in the infiniband umem code - the extra
work and cost of touching every page in a huge page is very much
undesired.

Jason

2019-03-05 04:15:56

by Ira Weiny

[permalink] [raw]
Subject: Re: [PATCH v2] RDMA/umem: minor bug fix and cleanup in error handling paths

On Mon, Mar 04, 2019 at 03:11:05PM -0800, John Hubbard wrote:
> On 3/3/19 8:55 AM, Ira Weiny wrote:
> > On Sun, Mar 03, 2019 at 11:52:41AM +0200, Artemy Kovalyov wrote:
> >>
> >>
> >> On 02/03/2019 21:44, Ira Weiny wrote:
> >>>
> >>> On Sat, Mar 02, 2019 at 12:24:35PM -0800, [email protected] wrote:
> >>>> From: John Hubbard <[email protected]>
> >>>>
> >>>> ...
> >>>> 3. Dead code removal: the check for (user_virt & ~page_mask)
> >>>> is checking for a condition that can never happen,
> >>>> because earlier:
> >>>>
> >>>> user_virt = user_virt & page_mask;
> >>>>
> >>>> ...so, remove that entire phrase.
> >>>>
> >>>> bcnt -= min_t(size_t, npages << PAGE_SHIFT, bcnt);
> >>>> mutex_lock(&umem_odp->umem_mutex);
> >>>> for (j = 0; j < npages; j++, user_virt += PAGE_SIZE) {
> >>>> - if (user_virt & ~page_mask) {
> >>>> - p += PAGE_SIZE;
> >>>> - if (page_to_phys(local_page_list[j]) != p) {
> >>>> - ret = -EFAULT;
> >>>> - break;
> >>>> - }
> >>>> - put_page(local_page_list[j]);
> >>>> - continue;
> >>>> - }
> >>>> -
> >>>
> >>> I think this is trying to account for compound pages. (ie page_mask could
> >>> represent more than PAGE_SIZE which is what user_virt is being incrimented by.)
> >>> But putting the page in that case seems to be the wrong thing to do?
> >>>
> >>> Yes this was added by Artemy[1] now cc'ed.
> >>
> >> Right, this is for huge pages, please keep it.
> >> put_page() needed to decrement refcount of the head page.
> >
> > You mean decrement the refcount of the _non_-head pages?
> >
> > Ira
> >
>
> Actually, I'm sure Artemy means head page, because put_page() always
> operates on the head page.
>
> And this reminds me that I have a problem to solve nearby: get_user_pages
> on huge pages increments the page->_refcount *for each tail page* as well.
> That's a minor problem for my put_user_page()
> patchset, because my approach so far assumed that I could just change us
> over to:
>
> get_user_page(): increments page->_refcount by a large amount (1024)
>
> put_user_page(): decrements page->_refcount by a large amount (1024)
>
> ...and just stop doing the odd (to me) technique of incrementing once for
> each tail page. I cannot see any reason why that's actually required, as
> opposed to just "raise the page->_refcount enough to avoid losing the head
> page too soon".

What about splitting a huge page?

From Documentation/vm/transhuge.rst

<quote>
split_huge_page internally has to distribute the refcounts in the head
page to the tail pages before clearing all PG_head/tail bits from the page
structures. It can be done easily for refcounts taken by page table
entries. But we don't have enough information on how to distribute any
additional pins (i.e. from get_user_pages). split_huge_page() fails any
requests to split pinned huge page: it expects page count to be equal to
sum of mapcount of all sub-pages plus one (split_huge_page caller must
have reference for head page).
</quote>

FWIW, I'm not sure why it needs to "store" the reference in the head page for
this. I don't see any check to make sure the ref has been "stored" but I'm not
really familiar with the compound page code yet.

Ira

>
> However, it may be tricky to do this in one pass. Probably at first, I'll have
> to do this horrible thing approach:
>
> get_user_page(): increments page->_refcount by a large amount (1024)
>
> put_user_page(): decrements page->_refcount by a large amount (1024) MULTIPLIED
> by the number of tail pages. argghhh that's ugly.
>
> thanks,
> --
> John Hubbard
> NVIDIA
>

2019-03-05 20:21:58

by John Hubbard

[permalink] [raw]
Subject: Re: [PATCH v2] RDMA/umem: minor bug fix and cleanup in error handling paths

On 3/4/19 12:13 PM, Ira Weiny wrote:
[snip]
>> And this reminds me that I have a problem to solve nearby: get_user_pages
>> on huge pages increments the page->_refcount *for each tail page* as well.
>> That's a minor problem for my put_user_page()
>> patchset, because my approach so far assumed that I could just change us
>> over to:
>>
>> get_user_page(): increments page->_refcount by a large amount (1024)
>>
>> put_user_page(): decrements page->_refcount by a large amount (1024)
>>
>> ...and just stop doing the odd (to me) technique of incrementing once for
>> each tail page. I cannot see any reason why that's actually required, as
>> opposed to just "raise the page->_refcount enough to avoid losing the head
>> page too soon".
>
> What about splitting a huge page?
>
> From Documention/vm/transhuge.rst
>
> <quoute>
> split_huge_page internally has to distribute the refcounts in the head
> page to the tail pages before clearing all PG_head/tail bits from the page
> structures. It can be done easily for refcounts taken by page table
> entries. But we don't have enough information on how to distribute any
> additional pins (i.e. from get_user_pages). split_huge_page() fails any
> requests to split pinned huge page: it expects page count to be equal to
> sum of mapcount of all sub-pages plus one (split_huge_page caller must
> have reference for head page).
> </quote>
>

heh, so in the end, split_huge_page just needs enough information to say
"no" for gup pages. So as long as page->_refcount avoids one particular
value, the code keeps working. :)


> FWIW, I'm not sure why it needs to "store" the reference in the head page for
> this. I don't see any check to make sure the ref has been "stored" but I'm not
> really familiar with the compound page code yet.
>
> Ira
>

Thanks for peeking at this, I'll look deeper too.

thanks,
--
John Hubbard
NVIDIA

2019-03-06 01:32:58

by Artemy Kovalyov

[permalink] [raw]
Subject: Re: [PATCH v2] RDMA/umem: minor bug fix and cleanup in error handling paths



On 04/03/2019 00:37, John Hubbard wrote:
> On 3/3/19 1:52 AM, Artemy Kovalyov wrote:
>>
>>
>> On 02/03/2019 21:44, Ira Weiny wrote:
>>>
>>> On Sat, Mar 02, 2019 at 12:24:35PM -0800, [email protected] wrote:
>>>> From: John Hubbard <[email protected]>
>>>>
>>>> ...
>
> OK, thanks for explaining! Artemy, while you're here, any thoughts about the
> release_pages, and the change of the starting point, from the other part of the
> patch:
>
> @@ -684,9 +677,11 @@ int ib_umem_odp_map_dma_pages(struct ib_umem_odp *umem_odp,
> u64 user_virt,
> mutex_unlock(&umem_odp->umem_mutex);
>
> if (ret < 0) {
> - /* Release left over pages when handling errors. */
> - for (++j; j < npages; ++j)
release_pages() is an optimized batch put_page(), so that part is fine.
But! Releasing starting from the page *after* the one that caused the failure
in ib_umem_odp_map_dma_single_page() is correct, because the failure path of
that function has already called put_page() on it.
So release_pages(&local_page_list[j+1], npages - j - 1) would be correct.
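I.e. the error path would look something like:

	if (ret < 0) {
		/*
		 * local_page_list[j] was already released by the failure
		 * path of ib_umem_odp_map_dma_single_page(), so only drop
		 * the references still held on the pages after it.
		 */
		release_pages(&local_page_list[j + 1], npages - j - 1);
		break;
	}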
> - put_page(local_page_list[j]);
> + /*
> + * Release pages, starting at the first page
> + * that experienced an error.
> + */
> + release_pages(&local_page_list[j], npages - j);
> break;
> }
> }
>
> ?
>
> thanks,
>

2019-03-06 01:38:58

by Jason Gunthorpe

[permalink] [raw]
Subject: Re: [PATCH v2] RDMA/umem: minor bug fix and cleanup in error handling paths

On Wed, Mar 06, 2019 at 03:02:36AM +0200, Artemy Kovalyov wrote:
>
>
> On 04/03/2019 00:37, John Hubbard wrote:
> > On 3/3/19 1:52 AM, Artemy Kovalyov wrote:
> > >
> > >
> > > On 02/03/2019 21:44, Ira Weiny wrote:
> > > >
> > > > On Sat, Mar 02, 2019 at 12:24:35PM -0800, [email protected] wrote:
> > > > > From: John Hubbard <[email protected]>
> > > > >
> > > > > ...
> >
> > OK, thanks for explaining! Artemy, while you're here, any thoughts about the
> > release_pages, and the change of the starting point, from the other part of the
> > patch:
> >
> > @@ -684,9 +677,11 @@ int ib_umem_odp_map_dma_pages(struct ib_umem_odp *umem_odp,
> > u64 user_virt,
> > mutex_unlock(&umem_odp->umem_mutex);
> >
> > if (ret < 0) {
> > - /* Release left over pages when handling errors. */
> > - for (++j; j < npages; ++j)
> release_pages() is an optimized batch put_page() so it's ok.
> but! release starting from page next to one cause failure in
> ib_umem_odp_map_dma_single_page() is correct because failure flow of this
> functions already called put_page().
> So release_pages(&local_page_list[j+1], npages - j-1) would be correct.

Someone send a fixup patch please...

Jason

2019-03-06 01:40:17

by John Hubbard

[permalink] [raw]
Subject: Re: [PATCH v2] RDMA/umem: minor bug fix and cleanup in error handling paths

On 3/5/19 5:32 PM, Jason Gunthorpe wrote:
> On Wed, Mar 06, 2019 at 03:02:36AM +0200, Artemy Kovalyov wrote:
>>
>>
>> On 04/03/2019 00:37, John Hubbard wrote:
>>> On 3/3/19 1:52 AM, Artemy Kovalyov wrote:
>>>>
>>>>
>>>> On 02/03/2019 21:44, Ira Weiny wrote:
>>>>>
>>>>> On Sat, Mar 02, 2019 at 12:24:35PM -0800, [email protected] wrote:
>>>>>> From: John Hubbard <[email protected]>
>>>>>>
>>>>>> ...
>>>
>>> OK, thanks for explaining! Artemy, while you're here, any thoughts about the
>>> release_pages, and the change of the starting point, from the other part of the
>>> patch:
>>>
>>> @@ -684,9 +677,11 @@ int ib_umem_odp_map_dma_pages(struct ib_umem_odp *umem_odp,
>>> u64 user_virt,
>>> mutex_unlock(&umem_odp->umem_mutex);
>>>
>>> if (ret < 0) {
>>> - /* Release left over pages when handling errors. */
>>> - for (++j; j < npages; ++j)
>> release_pages() is an optimized batch put_page() so it's ok.
>> but! release starting from page next to one cause failure in
>> ib_umem_odp_map_dma_single_page() is correct because failure flow of this
>> functions already called put_page().
>> So release_pages(&local_page_list[j+1], npages - j-1) would be correct.
>
> Someone send a fixup patch please...
>
> Jason

Yeah, I'm on it. Just need to double-check that this is the case. But Jason,
you're confirming it already, so that helps too.

Patch coming shortly.

thanks,
--
John Hubbard
NVIDIA

2019-03-06 01:42:43

by John Hubbard

[permalink] [raw]
Subject: Re: [PATCH v2] RDMA/umem: minor bug fix and cleanup in error handling paths

On 3/5/19 5:34 PM, John Hubbard wrote:
[snip]
>>> So release_pages(&local_page_list[j+1], npages - j-1) would be correct.
>>
>> Someone send a fixup patch please...
>>
>> Jason
>
> Yeah, I'm on it. Just need to double-check that this is the case. But Jason,
> you're confirming it already, so that helps too.
>
> Patch coming shortly.
>

Jason, btw, do you prefer a patch that fixes the previous one, or a new
patch that stands alone? (I'm not sure how this tree is maintained, exactly.)

thanks,
--
John Hubbard
NVIDIA

2019-03-06 01:52:28

by Jason Gunthorpe

[permalink] [raw]
Subject: Re: [PATCH v2] RDMA/umem: minor bug fix and cleanup in error handling paths

On Tue, Mar 05, 2019 at 05:37:18PM -0800, John Hubbard wrote:
> On 3/5/19 5:34 PM, John Hubbard wrote:
> [snip]
> >>> So release_pages(&local_page_list[j+1], npages - j-1) would be correct.
> >>
> >> Someone send a fixup patch please...
> >>
> >> Jason
> >
> > Yeah, I'm on it. Just need to double-check that this is the case. But Jason,
> > you're confirming it already, so that helps too.

I didn't look, just assuming Artemy is right since he knows this
code..

> > Patch coming shortly.
> >
>
> Jason, btw, do you prefer a patch that fixes the previous one, or a new
> patch that stands alone? (I'm not sure how this tree is maintained, exactly.)

It is past the point where I should apply a fixup; rdma in general has
approximately a 1 day period after the 'thanks applied' message
where a rebase is possible.

Otherwise the tree is strictly no-rebase.

Jason

2019-03-06 02:38:10

by John Hubbard

[permalink] [raw]
Subject: Re: [PATCH v2] RDMA/umem: minor bug fix and cleanup in error handling paths

On 3/5/19 5:51 PM, Jason Gunthorpe wrote:
> On Tue, Mar 05, 2019 at 05:37:18PM -0800, John Hubbard wrote:
>> On 3/5/19 5:34 PM, John Hubbard wrote:
>> [snip]
>>>>> So release_pages(&local_page_list[j+1], npages - j-1) would be correct.
>>>>
>>>> Someone send a fixup patch please...
>>>>
>>>> Jason
>>>
>>> Yeah, I'm on it. Just need to double-check that this is the case. But Jason,
>>> you're confirming it already, so that helps too.
>
> I didn't look, just assuming Artemy is right since he knows this
> code..
>

OK. And I've confirmed it, too.

>>> Patch coming shortly.
>>>
>>
>> Jason, btw, do you prefer a patch that fixes the previous one, or a new
>> patch that stands alone? (I'm not sure how this tree is maintained, exactly.)
>
> It has past the point when I should apply a fixup, rdma is general has
> about an approximately 1 day period after the 'thanks applied' message
> where rebase is possible
>
> Otherwise the tree is strictly no rebase
>
> Jason
>

Got it, patch is posted now.

thanks,
--
John Hubbard
NVIDIA