2007-09-12 12:39:45

by Joachim Fenkes

[permalink] [raw]
Subject: [PATCH] IB/ehca: Make sure user pages are from hugetlb before using MR large pages

From: Hoang-Nam Nguyen <[email protected]>

...because, on virtualized hardware like System p, we can't be sure that the
physical pages behind them are contiguous.

Signed-off-by: Joachim Fenkes <[email protected]>
---

Another patch for 2.6.24 that will apply cleanly on top of my previous
patchset. Please review and apply. Thanks!

drivers/infiniband/hw/ehca/ehca_classes.h | 8 ++--
drivers/infiniband/hw/ehca/ehca_mrmw.c | 82 +++++++++++++++++++++++++----
2 files changed, 75 insertions(+), 15 deletions(-)

diff --git a/drivers/infiniband/hw/ehca/ehca_classes.h b/drivers/infiniband/hw/ehca/ehca_classes.h
index 206d4eb..c2edd4c 100644
--- a/drivers/infiniband/hw/ehca/ehca_classes.h
+++ b/drivers/infiniband/hw/ehca/ehca_classes.h
@@ -99,10 +99,10 @@ struct ehca_sport {
struct ehca_sma_attr saved_attr;
};

-#define HCA_CAP_MR_PGSIZE_4K 1
-#define HCA_CAP_MR_PGSIZE_64K 2
-#define HCA_CAP_MR_PGSIZE_1M 4
-#define HCA_CAP_MR_PGSIZE_16M 8
+#define HCA_CAP_MR_PGSIZE_4K 0x80000000
+#define HCA_CAP_MR_PGSIZE_64K 0x40000000
+#define HCA_CAP_MR_PGSIZE_1M 0x20000000
+#define HCA_CAP_MR_PGSIZE_16M 0x10000000

struct ehca_shca {
struct ib_device ib_device;
diff --git a/drivers/infiniband/hw/ehca/ehca_mrmw.c b/drivers/infiniband/hw/ehca/ehca_mrmw.c
index 4c8f3b3..1bb9d23 100644
--- a/drivers/infiniband/hw/ehca/ehca_mrmw.c
+++ b/drivers/infiniband/hw/ehca/ehca_mrmw.c
@@ -41,6 +41,8 @@
*/

#include <asm/current.h>
+#include <linux/mm.h>
+#include <linux/hugetlb.h>

#include <rdma/ib_umem.h>

@@ -51,6 +53,7 @@

#define NUM_CHUNKS(length, chunk_size) \
(((length) + (chunk_size - 1)) / (chunk_size))
+
/* max number of rpages (per hcall register_rpages) */
#define MAX_RPAGES 512

@@ -279,6 +282,52 @@ reg_phys_mr_exit0:
} /* end ehca_reg_phys_mr() */

/*----------------------------------------------------------------------*/
+static int ehca_is_mem_hugetlb(unsigned long addr, unsigned long size)
+{
+ struct vm_area_struct **vma_list;
+ unsigned long cur_base;
+ unsigned long npages;
+ int ret, i;
+
+ vma_list = (struct vm_area_struct **) __get_free_page(GFP_KERNEL);
+ if (!vma_list) {
+ ehca_gen_err("Can not alloc vma_list");
+ return -ENOMEM;
+ }
+
+ down_write(&current->mm->mmap_sem);
+ npages = PAGE_ALIGN(size + (addr & ~PAGE_MASK)) >> PAGE_SHIFT;
+ cur_base = addr & PAGE_MASK;
+
+ while (npages) {
+ ret = get_user_pages(current, current->mm, cur_base,
+ min_t(int, npages,
+ PAGE_SIZE / sizeof (*vma_list)),
+ 1, 0, NULL, vma_list);
+
+ if (ret < 0) {
+ ehca_gen_err("get_user_pages() failed "
+ "ret=%x cur_base=%lx", ret, cur_base);
+ goto is_hugetlb_out;
+ }
+
+ for (i = 0; i < ret; ++i)
+ if (!is_vm_hugetlb_page(vma_list[i])) {
+ ret = 0;
+ goto is_hugetlb_out;
+ }
+
+ cur_base += ret * PAGE_SIZE;
+ npages -= ret;
+ }
+ ret = 1;
+
+is_hugetlb_out:
+ up_write(&current->mm->mmap_sem);
+ free_page((unsigned long) vma_list);
+
+ return ret;
+}

struct ib_mr *ehca_reg_user_mr(struct ib_pd *pd, u64 start, u64 length,
u64 virt, int mr_access_flags,
@@ -346,18 +395,29 @@ struct ib_mr *ehca_reg_user_mr(struct ib_pd *pd, u64 start, u64 length,
num_kpages = NUM_CHUNKS((virt % PAGE_SIZE) + length, PAGE_SIZE);
/* select proper hw_pgsize */
if (ehca_mr_largepage &&
- (shca->hca_cap_mr_pgsize & HCA_CAP_MR_PGSIZE_16M)) {
- if (length <= EHCA_MR_PGSIZE4K
- && PAGE_SIZE == EHCA_MR_PGSIZE4K)
- hwpage_size = EHCA_MR_PGSIZE4K;
- else if (length <= EHCA_MR_PGSIZE64K)
- hwpage_size = EHCA_MR_PGSIZE64K;
- else if (length <= EHCA_MR_PGSIZE1M)
- hwpage_size = EHCA_MR_PGSIZE1M;
- else
- hwpage_size = EHCA_MR_PGSIZE16M;
+ shca->hca_cap_mr_pgsize & HCA_CAP_MR_PGSIZE_16M) {
+ ret = ehca_is_mem_hugetlb(virt, length);
+ switch (ret) {
+ case 0: /* mem is not from hugetlb */
+ hwpage_size = PAGE_SIZE;
+ break;
+ case 1:
+ if (length <= EHCA_MR_PGSIZE4K
+ && PAGE_SIZE == EHCA_MR_PGSIZE4K)
+ hwpage_size = EHCA_MR_PGSIZE4K;
+ else if (length <= EHCA_MR_PGSIZE64K)
+ hwpage_size = EHCA_MR_PGSIZE64K;
+ else if (length <= EHCA_MR_PGSIZE1M)
+ hwpage_size = EHCA_MR_PGSIZE1M;
+ else
+ hwpage_size = EHCA_MR_PGSIZE16M;
+ break;
+ default: /* out of mem */
+ ib_mr = ERR_PTR(-ENOMEM);
+ goto reg_user_mr_exit1;
+ }
} else
- hwpage_size = EHCA_MR_PGSIZE4K;
+ hwpage_size = EHCA_MR_PGSIZE4K; /* ehca1 can only 4k */
ehca_dbg(pd->device, "hwpage_size=%lx", hwpage_size);

reg_user_mr_fallback:
--
1.5.2




2007-09-13 04:33:56

by Roland Dreier

[permalink] [raw]
Subject: Re: [PATCH] IB/ehca: Make sure user pages are from hugetlb before using MR large pages

> -#define HCA_CAP_MR_PGSIZE_4K 1
> -#define HCA_CAP_MR_PGSIZE_64K 2
> -#define HCA_CAP_MR_PGSIZE_1M 4
> -#define HCA_CAP_MR_PGSIZE_16M 8
> +#define HCA_CAP_MR_PGSIZE_4K 0x80000000
> +#define HCA_CAP_MR_PGSIZE_64K 0x40000000
> +#define HCA_CAP_MR_PGSIZE_1M 0x20000000
> +#define HCA_CAP_MR_PGSIZE_16M 0x10000000

Not sure I understand what this has to do with things... is this an
unrelated fix?

> +static int ehca_is_mem_hugetlb(unsigned long addr, unsigned long size)

This is rather awful -- another call to get_user_pages() to iterate
over all the vmas...

I would suggest extending ib_umem_get() to check the vmas and adding a
member to struct ib_umem to say whether the memory is entirely covered
by hugetlb pages or not.

> + ret = ehca_is_mem_hugetlb(virt, length);
> + switch (ret) {
> + case 0: /* mem is not from hugetlb */
> + hwpage_size = PAGE_SIZE;
> + break;
> + case 1:
> + if (length <= EHCA_MR_PGSIZE4K
> + && PAGE_SIZE == EHCA_MR_PGSIZE4K)
> + hwpage_size = EHCA_MR_PGSIZE4K;
> + else if (length <= EHCA_MR_PGSIZE64K)
> + hwpage_size = EHCA_MR_PGSIZE64K;
> + else if (length <= EHCA_MR_PGSIZE1M)
> + hwpage_size = EHCA_MR_PGSIZE1M;
> + else
> + hwpage_size = EHCA_MR_PGSIZE16M;
> + break;
> + default: /* out of mem */
> + ib_mr = ERR_PTR(-ENOMEM);
> + goto reg_user_mr_exit1;

It seems like it would be better to just assume the memory is not from
a hugetlb is ehca_is_mem_hugetlb() fails its memory allocation and
fall back to the PAGE_SIZE case rather than failing entirely.

Also if someone runs a kernel with 64K pages on a machine where they
end up being simulated from 4K pages, do you have the same issue with
the hypervisor ganging together non-contiguous pages?

- R.

2007-09-13 09:49:46

by Christoph Raisch

[permalink] [raw]
Subject: Re: [PATCH] IB/ehca: Make sure user pages are from hugetlb before using MR large pages

Roland Dreier wrote on 13.09.2007 06:33:45:

>
> Also if someone runs a kernel with 64K pages on a machine where they
> end up being simulated from 4K pages, do you have the same issue with
> the hypervisor ganging together non-contiguous pages?
With todays hypervisor and todays pagesizes and todays MMUs
we don't have this problem if eHCA is enabled.

It is difficult to make predictions about the future,
but that's not specific to driver development. ;-)

>
> - R.

- Christoph R.

2007-09-13 14:27:18

by Joachim Fenkes

[permalink] [raw]
Subject: Re: [PATCH] IB/ehca: Make sure user pages are from hugetlb before using MR large pages

Roland Dreier <[email protected]> wrote on 13.09.2007 06:33:45:

> > -#define HCA_CAP_MR_PGSIZE_4K 1
> > -#define HCA_CAP_MR_PGSIZE_64K 2
> > -#define HCA_CAP_MR_PGSIZE_1M 4
> > -#define HCA_CAP_MR_PGSIZE_16M 8
> > +#define HCA_CAP_MR_PGSIZE_4K 0x80000000
> > +#define HCA_CAP_MR_PGSIZE_64K 0x40000000
> > +#define HCA_CAP_MR_PGSIZE_1M 0x20000000
> > +#define HCA_CAP_MR_PGSIZE_16M 0x10000000
>
> Not sure I understand what this has to do with things... is this an
> unrelated fix?

Kinda. I can put it into its own patch if you want.

> I would suggest extending ib_umem_get() to check the vmas and adding a
> member to struct ib_umem to say whether the memory is entirely covered
> by hugetlb pages or not.

I like that approach - one patch coming right up! =)

> > + default: /* out of mem */
> > + ib_mr = ERR_PTR(-ENOMEM);
> > + goto reg_user_mr_exit1;
>
> It seems like it would be better to just assume the memory is not from
> a hugetlb is ehca_is_mem_hugetlb() fails its memory allocation and
> fall back to the PAGE_SIZE case rather than failing entirely.

If ehca_is_mem_hugetlb() runs out of memory, ehca_reg_mr() is rather
unlikely to get the memory, but it's worth a try, I'll give you that. I'll
make the umem patch work that way.

Joachim