2019-07-26 00:57:56

by Ralph Campbell

Subject: [PATCH v2 0/7] mm/hmm: more HMM clean up

Here are seven more patches for things I found to clean up.
This was based on top of Christoph's seven patches:
"hmm_range_fault related fixes and legacy API removal v3".
I assume this will go into Jason's tree since there will likely be
more HMM changes in this cycle.

Changes from v1 to v2:

Added AMD GPU to hmm_update removal.
Added 2 patches from Christoph.
Added 2 patches as a result of Jason's suggestions.

Christoph Hellwig (2):
mm/hmm: replace the block argument to hmm_range_fault with a flags
value
mm: merge hmm_range_snapshot into hmm_range_fault

Ralph Campbell (5):
mm/hmm: replace hmm_update with mmu_notifier_range
mm/hmm: a few more C style and comment clean ups
mm/hmm: make full use of walk_page_range()
mm/hmm: remove hugetlbfs check in hmm_vma_walk_pmd
mm/hmm: remove hmm_range vma

Documentation/vm/hmm.rst | 17 +-
drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c | 8 +-
drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 2 +-
drivers/gpu/drm/nouveau/nouveau_svm.c | 13 +-
include/linux/hmm.h | 47 ++--
mm/hmm.c | 340 ++++++++----------------
6 files changed, 150 insertions(+), 277 deletions(-)

--
2.20.1



2019-07-26 00:58:03

by Ralph Campbell

Subject: [PATCH v2 1/7] mm/hmm: replace hmm_update with mmu_notifier_range

The hmm_mirror_ops callback function sync_cpu_device_pagetables() passes
a struct hmm_update which is a simplified version of struct
mmu_notifier_range. This is unnecessary, so replace hmm_update with
mmu_notifier_range directly.

Signed-off-by: Ralph Campbell <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Cc: "Jérôme Glisse" <[email protected]>
Cc: Jason Gunthorpe <[email protected]>
Cc: Ben Skeggs <[email protected]>
---
drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c | 8 +++----
drivers/gpu/drm/nouveau/nouveau_svm.c | 4 ++--
include/linux/hmm.h | 31 ++++----------------------
mm/hmm.c | 13 ++++-------
4 files changed, 14 insertions(+), 42 deletions(-)
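
For reference, a minimal sketch of what a mirror callback looks like after
this change; the function name and invalidate_device_range() below are
illustrative placeholders, not code from any driver:

    static int example_sync_cpu_device_pagetables(struct hmm_mirror *mirror,
                        const struct mmu_notifier_range *update)
    {
        /* The old update->blockable field becomes a helper query. */
        if (!mmu_notifier_range_blockable(update))
            return -EAGAIN;

        /* Invalidate device page tables for [update->start, update->end). */
        invalidate_device_range(mirror, update->start, update->end);
        return 0;
    }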

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c
index 3971c201f320..cf945080dff3 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c
@@ -196,12 +196,12 @@ static void amdgpu_mn_invalidate_node(struct amdgpu_mn_node *node,
* potentially dirty.
*/
static int amdgpu_mn_sync_pagetables_gfx(struct hmm_mirror *mirror,
- const struct hmm_update *update)
+ const struct mmu_notifier_range *update)
{
struct amdgpu_mn *amn = container_of(mirror, struct amdgpu_mn, mirror);
unsigned long start = update->start;
unsigned long end = update->end;
- bool blockable = update->blockable;
+ bool blockable = mmu_notifier_range_blockable(update);
struct interval_tree_node *it;

/* notification is exclusive, but interval is inclusive */
@@ -244,12 +244,12 @@ static int amdgpu_mn_sync_pagetables_gfx(struct hmm_mirror *mirror,
* are restorted in amdgpu_mn_invalidate_range_end_hsa.
*/
static int amdgpu_mn_sync_pagetables_hsa(struct hmm_mirror *mirror,
- const struct hmm_update *update)
+ const struct mmu_notifier_range *update)
{
struct amdgpu_mn *amn = container_of(mirror, struct amdgpu_mn, mirror);
unsigned long start = update->start;
unsigned long end = update->end;
- bool blockable = update->blockable;
+ bool blockable = mmu_notifier_range_blockable(update);
struct interval_tree_node *it;

/* notification is exclusive, but interval is inclusive */
diff --git a/drivers/gpu/drm/nouveau/nouveau_svm.c b/drivers/gpu/drm/nouveau/nouveau_svm.c
index 545100f7c594..79b29c918717 100644
--- a/drivers/gpu/drm/nouveau/nouveau_svm.c
+++ b/drivers/gpu/drm/nouveau/nouveau_svm.c
@@ -252,13 +252,13 @@ nouveau_svmm_invalidate(struct nouveau_svmm *svmm, u64 start, u64 limit)

static int
nouveau_svmm_sync_cpu_device_pagetables(struct hmm_mirror *mirror,
- const struct hmm_update *update)
+ const struct mmu_notifier_range *update)
{
struct nouveau_svmm *svmm = container_of(mirror, typeof(*svmm), mirror);
unsigned long start = update->start;
unsigned long limit = update->end;

- if (!update->blockable)
+ if (!mmu_notifier_range_blockable(update))
return -EAGAIN;

SVMM_DBG(svmm, "invalidate %016lx-%016lx", start, limit);
diff --git a/include/linux/hmm.h b/include/linux/hmm.h
index 9f32586684c9..659e25a15700 100644
--- a/include/linux/hmm.h
+++ b/include/linux/hmm.h
@@ -340,29 +340,6 @@ static inline uint64_t hmm_device_entry_from_pfn(const struct hmm_range *range,

struct hmm_mirror;

-/*
- * enum hmm_update_event - type of update
- * @HMM_UPDATE_INVALIDATE: invalidate range (no indication as to why)
- */
-enum hmm_update_event {
- HMM_UPDATE_INVALIDATE,
-};
-
-/*
- * struct hmm_update - HMM update information for callback
- *
- * @start: virtual start address of the range to update
- * @end: virtual end address of the range to update
- * @event: event triggering the update (what is happening)
- * @blockable: can the callback block/sleep ?
- */
-struct hmm_update {
- unsigned long start;
- unsigned long end;
- enum hmm_update_event event;
- bool blockable;
-};
-
/*
* struct hmm_mirror_ops - HMM mirror device operations callback
*
@@ -383,9 +360,9 @@ struct hmm_mirror_ops {
/* sync_cpu_device_pagetables() - synchronize page tables
*
* @mirror: pointer to struct hmm_mirror
- * @update: update information (see struct hmm_update)
- * Return: -EAGAIN if update.blockable false and callback need to
- * block, 0 otherwise.
+ * @update: update information (see struct mmu_notifier_range)
+ * Return: -EAGAIN if mmu_notifier_range_blockable(update) is false
+ * and callback needs to block, 0 otherwise.
*
* This callback ultimately originates from mmu_notifiers when the CPU
* page table is updated. The device driver must update its page table
@@ -397,7 +374,7 @@ struct hmm_mirror_ops {
* synchronous call.
*/
int (*sync_cpu_device_pagetables)(struct hmm_mirror *mirror,
- const struct hmm_update *update);
+ const struct mmu_notifier_range *update);
};

/*
diff --git a/mm/hmm.c b/mm/hmm.c
index 54b3a4162ae9..4040b4427635 100644
--- a/mm/hmm.c
+++ b/mm/hmm.c
@@ -165,7 +165,6 @@ static int hmm_invalidate_range_start(struct mmu_notifier *mn,
{
struct hmm *hmm = container_of(mn, struct hmm, mmu_notifier);
struct hmm_mirror *mirror;
- struct hmm_update update;
struct hmm_range *range;
unsigned long flags;
int ret = 0;
@@ -173,15 +172,10 @@ static int hmm_invalidate_range_start(struct mmu_notifier *mn,
if (!kref_get_unless_zero(&hmm->kref))
return 0;

- update.start = nrange->start;
- update.end = nrange->end;
- update.event = HMM_UPDATE_INVALIDATE;
- update.blockable = mmu_notifier_range_blockable(nrange);
-
spin_lock_irqsave(&hmm->ranges_lock, flags);
hmm->notifiers++;
list_for_each_entry(range, &hmm->ranges, list) {
- if (update.end < range->start || update.start >= range->end)
+ if (nrange->end < range->start || nrange->start >= range->end)
continue;

range->valid = false;
@@ -198,9 +192,10 @@ static int hmm_invalidate_range_start(struct mmu_notifier *mn,
list_for_each_entry(mirror, &hmm->mirrors, list) {
int rc;

- rc = mirror->ops->sync_cpu_device_pagetables(mirror, &update);
+ rc = mirror->ops->sync_cpu_device_pagetables(mirror, nrange);
if (rc) {
- if (WARN_ON(update.blockable || rc != -EAGAIN))
+ if (WARN_ON(mmu_notifier_range_blockable(nrange) ||
+ rc != -EAGAIN))
continue;
ret = -EAGAIN;
break;
--
2.20.1


2019-07-26 00:58:07

by Ralph Campbell

Subject: [PATCH v2 2/7] mm/hmm: a few more C style and comment clean ups

A few more comments and minor programming style clean ups.
There should be no functional changes.

Signed-off-by: Ralph Campbell <[email protected]>
Cc: "Jérôme Glisse" <[email protected]>
Cc: Jason Gunthorpe <[email protected]>
Cc: Christoph Hellwig <[email protected]>
---
mm/hmm.c | 39 +++++++++++++++++----------------------
1 file changed, 17 insertions(+), 22 deletions(-)

diff --git a/mm/hmm.c b/mm/hmm.c
index 4040b4427635..362944b0fbca 100644
--- a/mm/hmm.c
+++ b/mm/hmm.c
@@ -32,7 +32,7 @@ static const struct mmu_notifier_ops hmm_mmu_notifier_ops;
* hmm_get_or_create - register HMM against an mm (HMM internal)
*
* @mm: mm struct to attach to
- * Returns: returns an HMM object, either by referencing the existing
+ * Return: an HMM object, either by referencing the existing
* (per-process) object, or by creating a new one.
*
* This is not intended to be used directly by device drivers. If mm already
@@ -325,8 +325,8 @@ static int hmm_pfns_bad(unsigned long addr,
}

/*
- * hmm_vma_walk_hole() - handle a range lacking valid pmd or pte(s)
- * @start: range virtual start address (inclusive)
+ * hmm_vma_walk_hole_() - handle a range lacking valid pmd or pte(s)
+ * @addr: range virtual start address (inclusive)
* @end: range virtual end address (exclusive)
* @fault: should we fault or not ?
* @write_fault: write fault ?
@@ -376,9 +376,9 @@ static inline void hmm_pte_need_fault(const struct hmm_vma_walk *hmm_vma_walk,
/*
* So we not only consider the individual per page request we also
* consider the default flags requested for the range. The API can
- * be use in 2 fashions. The first one where the HMM user coalesce
- * multiple page fault into one request and set flags per pfns for
- * of those faults. The second one where the HMM user want to pre-
+ * be used 2 ways. The first one where the HMM user coalesces
+ * multiple page faults into one request and sets flags per pfn for
+ * those faults. The second one where the HMM user wants to pre-
* fault a range with specific flags. For the latter one it is a
* waste to have the user pre-fill the pfn arrays with a default
* flags value.
@@ -388,7 +388,7 @@ static inline void hmm_pte_need_fault(const struct hmm_vma_walk *hmm_vma_walk,
/* We aren't ask to do anything ... */
if (!(pfns & range->flags[HMM_PFN_VALID]))
return;
- /* If this is device memory than only fault if explicitly requested */
+ /* If this is device memory then only fault if explicitly requested */
if ((cpu_flags & range->flags[HMM_PFN_DEVICE_PRIVATE])) {
/* Do we fault on device memory ? */
if (pfns & range->flags[HMM_PFN_DEVICE_PRIVATE]) {
@@ -502,7 +502,7 @@ static int hmm_vma_handle_pmd(struct mm_walk *walk,
hmm_vma_walk->last = end;
return 0;
#else
- /* If THP is not enabled then we should never reach that code ! */
+ /* If THP is not enabled then we should never reach this code ! */
return -EINVAL;
#endif
}
@@ -522,7 +522,6 @@ static int hmm_vma_handle_pte(struct mm_walk *walk, unsigned long addr,
{
struct hmm_vma_walk *hmm_vma_walk = walk->private;
struct hmm_range *range = hmm_vma_walk->range;
- struct vm_area_struct *vma = walk->vma;
bool fault, write_fault;
uint64_t cpu_flags;
pte_t pte = *ptep;
@@ -571,8 +570,7 @@ static int hmm_vma_handle_pte(struct mm_walk *walk, unsigned long addr,
if (fault || write_fault) {
pte_unmap(ptep);
hmm_vma_walk->last = addr;
- migration_entry_wait(vma->vm_mm,
- pmdp, addr);
+ migration_entry_wait(walk->mm, pmdp, addr);
return -EBUSY;
}
return 0;
@@ -620,13 +618,11 @@ static int hmm_vma_walk_pmd(pmd_t *pmdp,
{
struct hmm_vma_walk *hmm_vma_walk = walk->private;
struct hmm_range *range = hmm_vma_walk->range;
- struct vm_area_struct *vma = walk->vma;
uint64_t *pfns = range->pfns;
unsigned long addr = start, i;
pte_t *ptep;
pmd_t pmd;

-
again:
pmd = READ_ONCE(*pmdp);
if (pmd_none(pmd))
@@ -648,7 +644,7 @@ static int hmm_vma_walk_pmd(pmd_t *pmdp,
0, &fault, &write_fault);
if (fault || write_fault) {
hmm_vma_walk->last = addr;
- pmd_migration_entry_wait(vma->vm_mm, pmdp);
+ pmd_migration_entry_wait(walk->mm, pmdp);
return -EBUSY;
}
return 0;
@@ -657,11 +653,11 @@ static int hmm_vma_walk_pmd(pmd_t *pmdp,

if (pmd_devmap(pmd) || pmd_trans_huge(pmd)) {
/*
- * No need to take pmd_lock here, even if some other threads
+ * No need to take pmd_lock here, even if some other thread
* is splitting the huge pmd we will get that event through
* mmu_notifier callback.
*
- * So just read pmd value and check again its a transparent
+ * So just read pmd value and check again it's a transparent
* huge or device mapping one and compute corresponding pfn
* values.
*/
@@ -675,7 +671,7 @@ static int hmm_vma_walk_pmd(pmd_t *pmdp,
}

/*
- * We have handled all the valid case above ie either none, migration,
+ * We have handled all the valid cases above ie either none, migration,
* huge or transparent huge. At this point either it is a valid pmd
* entry pointing to pte directory or it is a bad pmd that will not
* recover.
@@ -795,10 +791,10 @@ static int hmm_vma_walk_hugetlb_entry(pte_t *pte, unsigned long hmask,
pte_t entry;
int ret = 0;

- size = 1UL << huge_page_shift(h);
+ size = huge_page_size(h);
mask = size - 1;
if (range->page_shift != PAGE_SHIFT) {
- /* Make sure we are looking at full page. */
+ /* Make sure we are looking at a full page. */
if (start & mask)
return -EINVAL;
if (end < (start + size))
@@ -809,8 +805,7 @@ static int hmm_vma_walk_hugetlb_entry(pte_t *pte, unsigned long hmask,
size = PAGE_SIZE;
}

-
- ptl = huge_pte_lock(hstate_vma(walk->vma), walk->mm, pte);
+ ptl = huge_pte_lock(hstate_vma(vma), walk->mm, pte);
entry = huge_ptep_get(pte);

i = (start - range->start) >> range->page_shift;
@@ -859,7 +854,7 @@ static void hmm_pfns_clear(struct hmm_range *range,
* @start: start virtual address (inclusive)
* @end: end virtual address (exclusive)
* @page_shift: expect page shift for the range
- * Returns 0 on success, -EFAULT if the address space is no longer valid
+ * Return: 0 on success, -EFAULT if the address space is no longer valid
*
* Track updates to the CPU page table see include/linux/hmm.h
*/
--
2.20.1


2019-07-26 00:58:31

by Ralph Campbell

Subject: [PATCH v2 5/7] mm/hmm: make full use of walk_page_range()

hmm_range_fault() calls find_vma() and walk_page_range() in a loop.
This is unnecessary duplication since walk_page_range() calls find_vma()
in a loop already.
Simplify hmm_range_fault() by defining a test_walk() callback function
to filter unhandled vmas.

Signed-off-by: Ralph Campbell <[email protected]>
Cc: "Jérôme Glisse" <[email protected]>
Cc: Jason Gunthorpe <[email protected]>
Cc: Christoph Hellwig <[email protected]>
---
mm/hmm.c | 130 ++++++++++++++++++++++++-------------------------------
1 file changed, 57 insertions(+), 73 deletions(-)
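
For orientation (a condensed restatement of the diff below, not extra code):
the per-vma loop collapses into a single walker setup, where a test_walk
callback returning 0 walks the vma, a negative value aborts the walk, and 1
would skip the vma:

    struct hmm_vma_walk hmm_vma_walk = {
        .range = range,
        .last  = range->start,
        .flags = flags,
    };
    struct mm_walk mm_walk = {
        .mm            = hmm->mm,
        .private       = &hmm_vma_walk,
        .pud_entry     = hmm_vma_walk_pud,
        .pmd_entry     = hmm_vma_walk_pmd,
        .pte_hole      = hmm_vma_walk_hole,
        .hugetlb_entry = hmm_vma_walk_hugetlb_entry,
        .test_walk     = hmm_vma_walk_test,
    };

    do {
        ret = walk_page_range(start, range->end, &mm_walk);
        start = hmm_vma_walk.last;
    } while (ret == -EBUSY && range->valid);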

diff --git a/mm/hmm.c b/mm/hmm.c
index 1bc014cddd78..838cd1d50497 100644
--- a/mm/hmm.c
+++ b/mm/hmm.c
@@ -840,13 +840,44 @@ static int hmm_vma_walk_hugetlb_entry(pte_t *pte, unsigned long hmask,
#endif
}

-static void hmm_pfns_clear(struct hmm_range *range,
- uint64_t *pfns,
- unsigned long addr,
- unsigned long end)
+static int hmm_vma_walk_test(unsigned long start,
+ unsigned long end,
+ struct mm_walk *walk)
{
- for (; addr < end; addr += PAGE_SIZE, pfns++)
- *pfns = range->values[HMM_PFN_NONE];
+ struct hmm_vma_walk *hmm_vma_walk = walk->private;
+ struct hmm_range *range = hmm_vma_walk->range;
+ struct vm_area_struct *vma = walk->vma;
+
+ /* If range is no longer valid, force retry. */
+ if (!range->valid)
+ return -EBUSY;
+
+ /*
+ * Skip vma ranges that don't have struct page backing them or
+ * map I/O devices directly.
+ * TODO: handle peer-to-peer device mappings.
+ */
+ if (vma->vm_flags & (VM_IO | VM_PFNMAP | VM_MIXEDMAP))
+ return -EFAULT;
+
+ if (is_vm_hugetlb_page(vma)) {
+ if (huge_page_shift(hstate_vma(vma)) != range->page_shift &&
+ range->page_shift != PAGE_SHIFT)
+ return -EINVAL;
+ } else {
+ if (range->page_shift != PAGE_SHIFT)
+ return -EINVAL;
+ }
+
+ /*
+ * If vma does not allow read access, then assume that it does not
+ * allow write access, either. HMM does not support architectures
+ * that allow write without read.
+ */
+ if (!(vma->vm_flags & VM_READ))
+ return -EPERM;
+
+ return 0;
}

/*
@@ -965,82 +996,35 @@ EXPORT_SYMBOL(hmm_range_unregister);
*/
long hmm_range_fault(struct hmm_range *range, unsigned int flags)
{
- const unsigned long device_vma = VM_IO | VM_PFNMAP | VM_MIXEDMAP;
- unsigned long start = range->start, end;
- struct hmm_vma_walk hmm_vma_walk;
+ unsigned long start = range->start;
+ struct hmm_vma_walk hmm_vma_walk = {};
struct hmm *hmm = range->hmm;
- struct vm_area_struct *vma;
- struct mm_walk mm_walk;
+ struct mm_walk mm_walk = {};
int ret;

lockdep_assert_held(&hmm->mm->mmap_sem);

- do {
- /* If range is no longer valid force retry. */
- if (!range->valid)
- return -EBUSY;
+ hmm_vma_walk.range = range;
+ hmm_vma_walk.last = start;
+ hmm_vma_walk.flags = flags;
+ mm_walk.private = &hmm_vma_walk;

- vma = find_vma(hmm->mm, start);
- if (vma == NULL || (vma->vm_flags & device_vma))
- return -EFAULT;
-
- if (is_vm_hugetlb_page(vma)) {
- if (huge_page_shift(hstate_vma(vma)) !=
- range->page_shift &&
- range->page_shift != PAGE_SHIFT)
- return -EINVAL;
- } else {
- if (range->page_shift != PAGE_SHIFT)
- return -EINVAL;
- }
+ mm_walk.mm = hmm->mm;
+ mm_walk.pud_entry = hmm_vma_walk_pud;
+ mm_walk.pmd_entry = hmm_vma_walk_pmd;
+ mm_walk.pte_hole = hmm_vma_walk_hole;
+ mm_walk.hugetlb_entry = hmm_vma_walk_hugetlb_entry;
+ mm_walk.test_walk = hmm_vma_walk_test;

- if (!(vma->vm_flags & VM_READ)) {
- /*
- * If vma do not allow read access, then assume that it
- * does not allow write access, either. HMM does not
- * support architecture that allow write without read.
- */
- hmm_pfns_clear(range, range->pfns,
- range->start, range->end);
- return -EPERM;
- }
+ do {
+ ret = walk_page_range(start, range->end, &mm_walk);
+ start = hmm_vma_walk.last;

- range->vma = vma;
- hmm_vma_walk.pgmap = NULL;
- hmm_vma_walk.last = start;
- hmm_vma_walk.flags = flags;
- hmm_vma_walk.range = range;
- mm_walk.private = &hmm_vma_walk;
- end = min(range->end, vma->vm_end);
-
- mm_walk.vma = vma;
- mm_walk.mm = vma->vm_mm;
- mm_walk.pte_entry = NULL;
- mm_walk.test_walk = NULL;
- mm_walk.hugetlb_entry = NULL;
- mm_walk.pud_entry = hmm_vma_walk_pud;
- mm_walk.pmd_entry = hmm_vma_walk_pmd;
- mm_walk.pte_hole = hmm_vma_walk_hole;
- mm_walk.hugetlb_entry = hmm_vma_walk_hugetlb_entry;
-
- do {
- ret = walk_page_range(start, end, &mm_walk);
- start = hmm_vma_walk.last;
-
- /* Keep trying while the range is valid. */
- } while (ret == -EBUSY && range->valid);
-
- if (ret) {
- unsigned long i;
-
- i = (hmm_vma_walk.last - range->start) >> PAGE_SHIFT;
- hmm_pfns_clear(range, &range->pfns[i],
- hmm_vma_walk.last, range->end);
- return ret;
- }
- start = end;
+ /* Keep trying while the range is valid. */
+ } while (ret == -EBUSY && range->valid);

- } while (start < range->end);
+ if (ret)
+ return ret;

return (hmm_vma_walk.last - range->start) >> PAGE_SHIFT;
}
--
2.20.1


2019-07-26 00:58:35

by Ralph Campbell

Subject: [PATCH v2 7/7] mm/hmm: remove hmm_range vma

Since hmm_range_fault() doesn't use the struct hmm_range vma field,
remove it.

Suggested-by: Jason Gunthorpe <[email protected]>
Signed-off-by: Ralph Campbell <[email protected]>
Cc: "Jérôme Glisse" <[email protected]>
Cc: Christoph Hellwig <[email protected]>
---
drivers/gpu/drm/nouveau/nouveau_svm.c | 7 +++----
include/linux/hmm.h | 1 -
2 files changed, 3 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/nouveau/nouveau_svm.c b/drivers/gpu/drm/nouveau/nouveau_svm.c
index 49b520c60fc5..a74530b5a523 100644
--- a/drivers/gpu/drm/nouveau/nouveau_svm.c
+++ b/drivers/gpu/drm/nouveau/nouveau_svm.c
@@ -496,12 +496,12 @@ nouveau_range_fault(struct hmm_mirror *mirror, struct hmm_range *range)
range->start, range->end,
PAGE_SHIFT);
if (ret) {
- up_read(&range->vma->vm_mm->mmap_sem);
+ up_read(&range->hmm->mm->mmap_sem);
return (int)ret;
}

if (!hmm_range_wait_until_valid(range, HMM_RANGE_DEFAULT_TIMEOUT)) {
- up_read(&range->vma->vm_mm->mmap_sem);
+ up_read(&range->hmm->mm->mmap_sem);
return -EBUSY;
}

@@ -509,7 +509,7 @@ nouveau_range_fault(struct hmm_mirror *mirror, struct hmm_range *range)
if (ret <= 0) {
if (ret == 0)
ret = -EBUSY;
- up_read(&range->vma->vm_mm->mmap_sem);
+ up_read(&range->hmm->mm->mmap_sem);
hmm_range_unregister(range);
return ret;
}
@@ -682,7 +682,6 @@ nouveau_svm_fault(struct nvif_notify *notify)
args.i.p.addr + args.i.p.size, fn - fi);

/* Have HMM fault pages within the fault window to the GPU. */
- range.vma = vma;
range.start = args.i.p.addr;
range.end = args.i.p.addr + args.i.p.size;
range.pfns = args.phys;
diff --git a/include/linux/hmm.h b/include/linux/hmm.h
index f3693dcc8b98..68949cf815f9 100644
--- a/include/linux/hmm.h
+++ b/include/linux/hmm.h
@@ -164,7 +164,6 @@ enum hmm_pfn_value_e {
*/
struct hmm_range {
struct hmm *hmm;
- struct vm_area_struct *vma;
struct list_head list;
unsigned long start;
unsigned long end;
--
2.20.1


2019-07-26 00:58:36

by Ralph Campbell

Subject: [PATCH v2 3/7] mm/hmm: replace the block argument to hmm_range_fault with a flags value

From: Christoph Hellwig <[email protected]>

This allows easier expansion to other flags, and also makes the
callers a little easier to read.

Signed-off-by: Christoph Hellwig <[email protected]>
Signed-off-by: Ralph Campbell <[email protected]>
Cc: "Jérôme Glisse" <[email protected]>
Cc: Jason Gunthorpe <[email protected]>
---
drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 2 +-
drivers/gpu/drm/nouveau/nouveau_svm.c | 2 +-
include/linux/hmm.h | 11 +++-
mm/hmm.c | 74 ++++++++++++-------------
4 files changed, 48 insertions(+), 41 deletions(-)
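
A small usage sketch (illustrative only): callers now pass a flags word
instead of a bool, so

    /* Before: a bool selected blocking behaviour. */
    ret = hmm_range_fault(range, true);

becomes

    /* After: 0 keeps the same blocking behaviour ... */
    ret = hmm_range_fault(range, 0);

    /* ... and non-blocking callers opt in explicitly. */
    ret = hmm_range_fault(range, HMM_FAULT_ALLOW_RETRY);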

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
index e51b48ac48eb..12a59ac83f72 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
@@ -832,7 +832,7 @@ int amdgpu_ttm_tt_get_user_pages(struct amdgpu_bo *bo, struct page **pages)

down_read(&mm->mmap_sem);

- r = hmm_range_fault(range, true);
+ r = hmm_range_fault(range, 0);
if (unlikely(r < 0)) {
if (likely(r == -EAGAIN)) {
/*
diff --git a/drivers/gpu/drm/nouveau/nouveau_svm.c b/drivers/gpu/drm/nouveau/nouveau_svm.c
index 79b29c918717..49b520c60fc5 100644
--- a/drivers/gpu/drm/nouveau/nouveau_svm.c
+++ b/drivers/gpu/drm/nouveau/nouveau_svm.c
@@ -505,7 +505,7 @@ nouveau_range_fault(struct hmm_mirror *mirror, struct hmm_range *range)
return -EBUSY;
}

- ret = hmm_range_fault(range, true);
+ ret = hmm_range_fault(range, 0);
if (ret <= 0) {
if (ret == 0)
ret = -EBUSY;
diff --git a/include/linux/hmm.h b/include/linux/hmm.h
index 659e25a15700..15f1b113be3c 100644
--- a/include/linux/hmm.h
+++ b/include/linux/hmm.h
@@ -406,12 +406,19 @@ int hmm_range_register(struct hmm_range *range,
unsigned long end,
unsigned page_shift);
void hmm_range_unregister(struct hmm_range *range);
+
+/*
+ * Retry fault if non-blocking, drop mmap_sem and return -EAGAIN in that case.
+ */
+#define HMM_FAULT_ALLOW_RETRY (1 << 0)
+
long hmm_range_snapshot(struct hmm_range *range);
-long hmm_range_fault(struct hmm_range *range, bool block);
+long hmm_range_fault(struct hmm_range *range, unsigned int flags);
+
long hmm_range_dma_map(struct hmm_range *range,
struct device *device,
dma_addr_t *daddrs,
- bool block);
+ unsigned int flags);
long hmm_range_dma_unmap(struct hmm_range *range,
struct vm_area_struct *vma,
struct device *device,
diff --git a/mm/hmm.c b/mm/hmm.c
index 362944b0fbca..84f2791d3510 100644
--- a/mm/hmm.c
+++ b/mm/hmm.c
@@ -281,7 +281,7 @@ struct hmm_vma_walk {
struct dev_pagemap *pgmap;
unsigned long last;
bool fault;
- bool block;
+ unsigned int flags;
};

static int hmm_vma_do_fault(struct mm_walk *walk, unsigned long addr,
@@ -293,8 +293,11 @@ static int hmm_vma_do_fault(struct mm_walk *walk, unsigned long addr,
struct vm_area_struct *vma = walk->vma;
vm_fault_t ret;

- flags |= hmm_vma_walk->block ? 0 : FAULT_FLAG_ALLOW_RETRY;
- flags |= write_fault ? FAULT_FLAG_WRITE : 0;
+ if (hmm_vma_walk->flags & HMM_FAULT_ALLOW_RETRY)
+ flags |= FAULT_FLAG_ALLOW_RETRY;
+ if (write_fault)
+ flags |= FAULT_FLAG_WRITE;
+
ret = handle_mm_fault(vma, addr, flags);
if (ret & VM_FAULT_RETRY) {
/* Note, handle_mm_fault did up_read(&mm->mmap_sem)) */
@@ -1012,26 +1015,26 @@ long hmm_range_snapshot(struct hmm_range *range)
}
EXPORT_SYMBOL(hmm_range_snapshot);

-/*
- * hmm_range_fault() - try to fault some address in a virtual address range
- * @range: range being faulted
- * @block: allow blocking on fault (if true it sleeps and do not drop mmap_sem)
- * Return: number of valid pages in range->pfns[] (from range start
- * address). This may be zero. If the return value is negative,
- * then one of the following values may be returned:
+/**
+ * hmm_range_fault - try to fault some address in a virtual address range
+ * @range: range being faulted
+ * @flags: HMM_FAULT_* flags
*
- * -EINVAL invalid arguments or mm or virtual address are in an
- * invalid vma (for instance device file vma).
- * -ENOMEM: Out of memory.
- * -EPERM: Invalid permission (for instance asking for write and
- * range is read only).
- * -EAGAIN: If you need to retry and mmap_sem was drop. This can only
- * happens if block argument is false.
- * -EBUSY: If the the range is being invalidated and you should wait
- * for invalidation to finish.
- * -EFAULT: Invalid (ie either no valid vma or it is illegal to access
- * that range), number of valid pages in range->pfns[] (from
- * range start address).
+ * Return: the number of valid pages in range->pfns[] (from range start
+ * address), which may be zero. On error one of the following status codes
+ * can be returned:
+ *
+ * -EINVAL: Invalid arguments or mm or virtual address is in an invalid vma
+ * (e.g., device file vma).
+ * -ENOMEM: Out of memory.
+ * -EPERM: Invalid permission (e.g., asking for write and range is read
+ * only).
+ * -EAGAIN: A page fault needs to be retried and mmap_sem was dropped.
+ * -EBUSY: The range has been invalidated and the caller needs to wait for
+ * the invalidation to finish.
+ * -EFAULT: Invalid (i.e., either no valid vma or it is illegal to access
+ * that range) number of valid pages in range->pfns[] (from
+ * range start address).
*
* This is similar to a regular CPU page fault except that it will not trigger
* any memory migration if the memory being faulted is not accessible by CPUs
@@ -1040,7 +1043,7 @@ EXPORT_SYMBOL(hmm_range_snapshot);
* On error, for one virtual address in the range, the function will mark the
* corresponding HMM pfn entry with an error flag.
*/
-long hmm_range_fault(struct hmm_range *range, bool block)
+long hmm_range_fault(struct hmm_range *range, unsigned int flags)
{
const unsigned long device_vma = VM_IO | VM_PFNMAP | VM_MIXEDMAP;
unsigned long start = range->start, end;
@@ -1086,7 +1089,7 @@ long hmm_range_fault(struct hmm_range *range, bool block)
hmm_vma_walk.pgmap = NULL;
hmm_vma_walk.last = start;
hmm_vma_walk.fault = true;
- hmm_vma_walk.block = block;
+ hmm_vma_walk.flags = flags;
hmm_vma_walk.range = range;
mm_walk.private = &hmm_vma_walk;
end = min(range->end, vma->vm_end);
@@ -1125,25 +1128,22 @@ long hmm_range_fault(struct hmm_range *range, bool block)
EXPORT_SYMBOL(hmm_range_fault);

/**
- * hmm_range_dma_map() - hmm_range_fault() and dma map page all in one.
- * @range: range being faulted
- * @device: device against to dma map page to
- * @daddrs: dma address of mapped pages
- * @block: allow blocking on fault (if true it sleeps and do not drop mmap_sem)
- * Return: number of pages mapped on success, -EAGAIN if mmap_sem have been
- * drop and you need to try again, some other error value otherwise
+ * hmm_range_dma_map - hmm_range_fault() and dma map page all in one.
+ * @range: range being faulted
+ * @device: device to map page to
+ * @daddrs: array of dma addresses for the mapped pages
+ * @flags: HMM_FAULT_*
*
- * Note same usage pattern as hmm_range_fault().
+ * Return: the number of pages mapped on success (including zero), or any
+ * status return from hmm_range_fault() otherwise.
*/
-long hmm_range_dma_map(struct hmm_range *range,
- struct device *device,
- dma_addr_t *daddrs,
- bool block)
+long hmm_range_dma_map(struct hmm_range *range, struct device *device,
+ dma_addr_t *daddrs, unsigned int flags)
{
unsigned long i, npages, mapped;
long ret;

- ret = hmm_range_fault(range, block);
+ ret = hmm_range_fault(range, flags);
if (ret <= 0)
return ret ? ret : -EBUSY;

--
2.20.1


2019-07-26 00:58:43

by Ralph Campbell

Subject: [PATCH v2 6/7] mm/hmm: remove hugetlbfs check in hmm_vma_walk_pmd

walk_page_range() will only call hmm_vma_walk_hugetlb_entry() for
hugetlbfs pages and doesn't call hmm_vma_walk_pmd() in this case.
Therefore, it is safe to remove the check for vma->vm_flags & VM_HUGETLB
in hmm_vma_walk_pmd().

Signed-off-by: Ralph Campbell <[email protected]>
Cc: "Jérôme Glisse" <[email protected]>
Cc: Jason Gunthorpe <[email protected]>
Cc: Christoph Hellwig <[email protected]>
---
mm/hmm.c | 3 ---
1 file changed, 3 deletions(-)

diff --git a/mm/hmm.c b/mm/hmm.c
index 838cd1d50497..29f322ca5d58 100644
--- a/mm/hmm.c
+++ b/mm/hmm.c
@@ -630,9 +630,6 @@ static int hmm_vma_walk_pmd(pmd_t *pmdp,
if (pmd_none(pmd))
return hmm_vma_walk_hole(start, end, walk);

- if (pmd_huge(pmd) && (range->vma->vm_flags & VM_HUGETLB))
- return hmm_pfns_bad(start, end, walk);
-
if (thp_migration_supported() && is_pmd_migration_entry(pmd)) {
bool fault, write_fault;
unsigned long npages;
--
2.20.1


2019-07-26 00:59:36

by Ralph Campbell

Subject: [PATCH v2 4/7] mm: merge hmm_range_snapshot into hmm_range_fault

From: Christoph Hellwig <[email protected]>

Add a HMM_FAULT_SNAPSHOT flag so that hmm_range_snapshot can be merged
into the almost identical hmm_range_fault function.

Signed-off-by: Christoph Hellwig <[email protected]>
Signed-off-by: Ralph Campbell <[email protected]>
Cc: "Jérôme Glisse" <[email protected]>
Cc: Jason Gunthorpe <[email protected]>
---
Documentation/vm/hmm.rst | 17 ++++----
include/linux/hmm.h | 4 +-
mm/hmm.c | 85 +---------------------------------------
3 files changed, 13 insertions(+), 93 deletions(-)
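
Illustrative only, not part of the patch: the snapshot-style call becomes a
flag on hmm_range_fault(), so callers go from

    ret = hmm_range_snapshot(range);        /* never faults */
    ret = hmm_range_fault(range, 0);        /* faults missing entries */

to a single entry point

    ret = hmm_range_fault(range, HMM_FAULT_SNAPSHOT);
    ret = hmm_range_fault(range, 0);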

diff --git a/Documentation/vm/hmm.rst b/Documentation/vm/hmm.rst
index 710ce1c701bf..ddcb5ca8b296 100644
--- a/Documentation/vm/hmm.rst
+++ b/Documentation/vm/hmm.rst
@@ -192,15 +192,14 @@ read only, or fully unmap, etc.). The device must complete the update before
the driver callback returns.

When the device driver wants to populate a range of virtual addresses, it can
-use either::
+use::

- long hmm_range_snapshot(struct hmm_range *range);
- long hmm_range_fault(struct hmm_range *range, bool block);
+ long hmm_range_fault(struct hmm_range *range, unsigned int flags);

-The first one (hmm_range_snapshot()) will only fetch present CPU page table
+With the HMM_FAULT_SNAPSHOT flag, it will only fetch present CPU page table
entries and will not trigger a page fault on missing or non-present entries.
-The second one does trigger a page fault on missing or read-only entries if
-write access is requested (see below). Page faults use the generic mm page
+Without that flag, it does trigger a page fault on missing or read-only entries
+if write access is requested (see below). Page faults use the generic mm page
fault code path just like a CPU page fault.

Both functions copy CPU page table entries into their pfns array argument. Each
@@ -227,20 +226,20 @@ The usage pattern is::

/*
* Just wait for range to be valid, safe to ignore return value as we
- * will use the return value of hmm_range_snapshot() below under the
+ * will use the return value of hmm_range_fault() below under the
* mmap_sem to ascertain the validity of the range.
*/
hmm_range_wait_until_valid(&range, TIMEOUT_IN_MSEC);

again:
down_read(&mm->mmap_sem);
- ret = hmm_range_snapshot(&range);
+ ret = hmm_range_fault(&range, HMM_FAULT_SNAPSHOT);
if (ret) {
up_read(&mm->mmap_sem);
if (ret == -EBUSY) {
/*
* No need to check hmm_range_wait_until_valid() return value
- * on retry we will get proper error with hmm_range_snapshot()
+ * on retry we will get proper error with hmm_range_fault()
*/
hmm_range_wait_until_valid(&range, TIMEOUT_IN_MSEC);
goto again;
diff --git a/include/linux/hmm.h b/include/linux/hmm.h
index 15f1b113be3c..f3693dcc8b98 100644
--- a/include/linux/hmm.h
+++ b/include/linux/hmm.h
@@ -412,7 +412,9 @@ void hmm_range_unregister(struct hmm_range *range);
*/
#define HMM_FAULT_ALLOW_RETRY (1 << 0)

-long hmm_range_snapshot(struct hmm_range *range);
+/* Don't fault in missing PTEs, just snapshot the current state. */
+#define HMM_FAULT_SNAPSHOT (1 << 1)
+
long hmm_range_fault(struct hmm_range *range, unsigned int flags);

long hmm_range_dma_map(struct hmm_range *range,
diff --git a/mm/hmm.c b/mm/hmm.c
index 84f2791d3510..1bc014cddd78 100644
--- a/mm/hmm.c
+++ b/mm/hmm.c
@@ -280,7 +280,6 @@ struct hmm_vma_walk {
struct hmm_range *range;
struct dev_pagemap *pgmap;
unsigned long last;
- bool fault;
unsigned int flags;
};

@@ -373,7 +372,7 @@ static inline void hmm_pte_need_fault(const struct hmm_vma_walk *hmm_vma_walk,
{
struct hmm_range *range = hmm_vma_walk->range;

- if (!hmm_vma_walk->fault)
+ if (hmm_vma_walk->flags & HMM_FAULT_SNAPSHOT)
return;

/*
@@ -418,7 +417,7 @@ static void hmm_range_need_fault(const struct hmm_vma_walk *hmm_vma_walk,
{
unsigned long i;

- if (!hmm_vma_walk->fault) {
+ if (hmm_vma_walk->flags & HMM_FAULT_SNAPSHOT) {
*fault = *write_fault = false;
return;
}
@@ -936,85 +935,6 @@ void hmm_range_unregister(struct hmm_range *range)
}
EXPORT_SYMBOL(hmm_range_unregister);

-/*
- * hmm_range_snapshot() - snapshot CPU page table for a range
- * @range: range
- * Return: -EINVAL if invalid argument, -ENOMEM out of memory, -EPERM invalid
- * permission (for instance asking for write and range is read only),
- * -EBUSY if you need to retry, -EFAULT invalid (ie either no valid
- * vma or it is illegal to access that range), number of valid pages
- * in range->pfns[] (from range start address).
- *
- * This snapshots the CPU page table for a range of virtual addresses. Snapshot
- * validity is tracked by range struct. See in include/linux/hmm.h for example
- * on how to use.
- */
-long hmm_range_snapshot(struct hmm_range *range)
-{
- const unsigned long device_vma = VM_IO | VM_PFNMAP | VM_MIXEDMAP;
- unsigned long start = range->start, end;
- struct hmm_vma_walk hmm_vma_walk;
- struct hmm *hmm = range->hmm;
- struct vm_area_struct *vma;
- struct mm_walk mm_walk;
-
- lockdep_assert_held(&hmm->mm->mmap_sem);
- do {
- /* If range is no longer valid force retry. */
- if (!range->valid)
- return -EBUSY;
-
- vma = find_vma(hmm->mm, start);
- if (vma == NULL || (vma->vm_flags & device_vma))
- return -EFAULT;
-
- if (is_vm_hugetlb_page(vma)) {
- if (huge_page_shift(hstate_vma(vma)) !=
- range->page_shift &&
- range->page_shift != PAGE_SHIFT)
- return -EINVAL;
- } else {
- if (range->page_shift != PAGE_SHIFT)
- return -EINVAL;
- }
-
- if (!(vma->vm_flags & VM_READ)) {
- /*
- * If vma do not allow read access, then assume that it
- * does not allow write access, either. HMM does not
- * support architecture that allow write without read.
- */
- hmm_pfns_clear(range, range->pfns,
- range->start, range->end);
- return -EPERM;
- }
-
- range->vma = vma;
- hmm_vma_walk.pgmap = NULL;
- hmm_vma_walk.last = start;
- hmm_vma_walk.fault = false;
- hmm_vma_walk.range = range;
- mm_walk.private = &hmm_vma_walk;
- end = min(range->end, vma->vm_end);
-
- mm_walk.vma = vma;
- mm_walk.mm = vma->vm_mm;
- mm_walk.pte_entry = NULL;
- mm_walk.test_walk = NULL;
- mm_walk.hugetlb_entry = NULL;
- mm_walk.pud_entry = hmm_vma_walk_pud;
- mm_walk.pmd_entry = hmm_vma_walk_pmd;
- mm_walk.pte_hole = hmm_vma_walk_hole;
- mm_walk.hugetlb_entry = hmm_vma_walk_hugetlb_entry;
-
- walk_page_range(start, end, &mm_walk);
- start = end;
- } while (start < range->end);
-
- return (hmm_vma_walk.last - range->start) >> PAGE_SHIFT;
-}
-EXPORT_SYMBOL(hmm_range_snapshot);
-
/**
* hmm_range_fault - try to fault some address in a virtual address range
* @range: range being faulted
@@ -1088,7 +1008,6 @@ long hmm_range_fault(struct hmm_range *range, unsigned int flags)
range->vma = vma;
hmm_vma_walk.pgmap = NULL;
hmm_vma_walk.last = start;
- hmm_vma_walk.fault = true;
hmm_vma_walk.flags = flags;
hmm_vma_walk.range = range;
mm_walk.private = &hmm_vma_walk;
--
2.20.1


2019-07-26 06:25:39

by Christoph Hellwig

Subject: Re: [PATCH v2 2/7] mm/hmm: a few more C style and comment clean ups

Note: it seems like you've only CCed me on patches 2-7, but not on the
cover letter and patch 1. I'll try to find them later, but to make Ccs
useful they should normally cover the whole series.

Otherwise this looks fine to me:

Reviewed-by: Christoph Hellwig <[email protected]>

2019-07-26 06:25:53

by Christoph Hellwig

Subject: Re: [PATCH v2 6/7] mm/hmm: remove hugetlbfs check in hmm_vma_walk_pmd

On Thu, Jul 25, 2019 at 05:56:49PM -0700, Ralph Campbell wrote:
> walk_page_range() will only call hmm_vma_walk_hugetlb_entry() for
> hugetlbfs pages and doesn't call hmm_vma_walk_pmd() in this case.
> Therefore, it is safe to remove the check for vma->vm_flags & VM_HUGETLB
> in hmm_vma_walk_pmd().
>
> Signed-off-by: Ralph Campbell <[email protected]>

Looks good,

Reviewed-by: Christoph Hellwig <[email protected]>

2019-07-26 06:27:00

by Christoph Hellwig

Subject: Re: [PATCH v2 5/7] mm/hmm: make full use of walk_page_range()

Looks good:

Reviewed-by: Christoph Hellwig <[email protected]>

2019-07-26 06:27:48

by Christoph Hellwig

Subject: Re: [PATCH v2 7/7] mm/hmm: remove hmm_range vma

On Thu, Jul 25, 2019 at 05:56:50PM -0700, Ralph Campbell wrote:
> Since hmm_range_fault() doesn't use the struct hmm_range vma field,
> remove it.
>
> Suggested-by: Jason Gunthorpe <[email protected]>
> Signed-off-by: Ralph Campbell <[email protected]>

Looks good,

Reviewed-by: Christoph Hellwig <[email protected]>

2019-07-26 15:13:41

by Jason Gunthorpe

Subject: Re: [PATCH v2 5/7] mm/hmm: make full use of walk_page_range()

On Thu, Jul 25, 2019 at 05:56:48PM -0700, Ralph Campbell wrote:
> hmm_range_fault() calls find_vma() and walk_page_range() in a loop.
> This is unnecessary duplication since walk_page_range() calls find_vma()
> in a loop already.
> Simplify hmm_range_fault() by defining a test_walk() callback function
> to filter unhandled vmas.
>
> Signed-off-by: Ralph Campbell <[email protected]>
> Cc: "Jérôme Glisse" <[email protected]>
> Cc: Jason Gunthorpe <[email protected]>
> Cc: Christoph Hellwig <[email protected]>
> mm/hmm.c | 130 ++++++++++++++++++++++++-------------------------------
> 1 file changed, 57 insertions(+), 73 deletions(-)
>
> diff --git a/mm/hmm.c b/mm/hmm.c
> index 1bc014cddd78..838cd1d50497 100644
> +++ b/mm/hmm.c
> @@ -840,13 +840,44 @@ static int hmm_vma_walk_hugetlb_entry(pte_t *pte, unsigned long hmask,
> #endif
> }
>
> -static void hmm_pfns_clear(struct hmm_range *range,
> - uint64_t *pfns,
> - unsigned long addr,
> - unsigned long end)
> +static int hmm_vma_walk_test(unsigned long start,
> + unsigned long end,
> + struct mm_walk *walk)
> {
> - for (; addr < end; addr += PAGE_SIZE, pfns++)
> - *pfns = range->values[HMM_PFN_NONE];
> + struct hmm_vma_walk *hmm_vma_walk = walk->private;
> + struct hmm_range *range = hmm_vma_walk->range;
> + struct vm_area_struct *vma = walk->vma;
> +
> + /* If range is no longer valid, force retry. */
> + if (!range->valid)
> + return -EBUSY;
> +
> + /*
> + * Skip vma ranges that don't have struct page backing them or
> + * map I/O devices directly.
> + * TODO: handle peer-to-peer device mappings.
> + */
> + if (vma->vm_flags & (VM_IO | VM_PFNMAP | VM_MIXEDMAP))
> + return -EFAULT;
> +
> + if (is_vm_hugetlb_page(vma)) {
> + if (huge_page_shift(hstate_vma(vma)) != range->page_shift &&
> + range->page_shift != PAGE_SHIFT)
> + return -EINVAL;
> + } else {
> + if (range->page_shift != PAGE_SHIFT)
> + return -EINVAL;
> + }
> +
> + /*
> + * If vma does not allow read access, then assume that it does not
> + * allow write access, either. HMM does not support architectures
> + * that allow write without read.
> + */
> + if (!(vma->vm_flags & VM_READ))
> + return -EPERM;
> +
> + return 0;
> }
>
> /*
> @@ -965,82 +996,35 @@ EXPORT_SYMBOL(hmm_range_unregister);
> */
> long hmm_range_fault(struct hmm_range *range, unsigned int flags)
> {
> - const unsigned long device_vma = VM_IO | VM_PFNMAP | VM_MIXEDMAP;
> - unsigned long start = range->start, end;
> - struct hmm_vma_walk hmm_vma_walk;
> + unsigned long start = range->start;
> + struct hmm_vma_walk hmm_vma_walk = {};
> struct hmm *hmm = range->hmm;
> - struct vm_area_struct *vma;
> - struct mm_walk mm_walk;
> + struct mm_walk mm_walk = {};
> int ret;
>
> lockdep_assert_held(&hmm->mm->mmap_sem);
>
> - do {
> - /* If range is no longer valid force retry. */
> - if (!range->valid)
> - return -EBUSY;
> + hmm_vma_walk.range = range;
> + hmm_vma_walk.last = start;
> + hmm_vma_walk.flags = flags;
> + mm_walk.private = &hmm_vma_walk;
>
> - vma = find_vma(hmm->mm, start);
> - if (vma == NULL || (vma->vm_flags & device_vma))
> - return -EFAULT;

It is hard to tell what is a confused/wrong and what is deliberate in
this code...

Currently hmm_range_fault() invokes walk_page_range() on a VMA-by-VMA
basis, and the above prevents some cases of walk->vma becoming NULL, but
not all - for instance it doesn't check for start < vma->vm_start.

However, checking whether it can actually tolerate walk->vma == NULL, it
looks like the answer is no:

walk_page_range
find_vma == NULL || start < vm_start -> walk->vma == NULL
__walk_page_range
walk_pgd_range
pte_hole / hmm_vma_walk_hole
hmm_vma_walk_hole_
hmm_vma_do_fault
handle_mm_fault(walk->vma, addr, flags)
vma->vm_mm <-- OOPS

Which kind of suggests the find_vma above was about preventing
walk->vma == NULL? Does something else tricky prevent this?

This patch also changes behavior so that missing VMAs don't always
trigger EFAULT (which is a good thing, but needs to be in the commit
message).

I strongly believe this is the correct direction to go in, and the fact
that this function returns EFAULT if there is no VMA/incompatible VMA
is actually a semantic bug we need to fix before it is a usable API.

Ie consider the user does something like
ptr = mmap(0, PAGE_SIZE ..)
mr = ib_reg_mr(ptr - PAGE_SIZE, ptr + 3*PAGE_SIZE, IBV_ACCESS_ON_DEMAND)

Then in the kernel I want to do hmm_range_fault(HMM_FAULT_SNAPSHOT)
across the MR VA and get a pfns array that says PAGE 0 is FAULT, PAGE
1 is R/W, PAGE 2 is FAULT.

Instead the entire call fails because there is no VMA at the starting
offset, or the VMA had the wrong flags, or something.

What it should do is populate the result with FAULT for the gap part
of the VA range and continue to the next VMA.

The same comment applies to the implementation of the walker test
function: it should return 1 to skip the VMA and fill the pfns with FAULT
when there is a mismatched VMA, not fail entirely.
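
For illustration only (not a proposed patch), roughly what that "skip and
mark" behaviour could look like in the test callback, assuming
range->page_shift == PAGE_SHIFT and using HMM_PFN_ERROR to stand in for
whatever "FAULT" marker would be chosen:

    static int hmm_vma_walk_test(unsigned long start, unsigned long end,
                                 struct mm_walk *walk)
    {
        struct hmm_vma_walk *hmm_vma_walk = walk->private;
        struct hmm_range *range = hmm_vma_walk->range;
        unsigned long i = (start - range->start) >> PAGE_SHIFT;

        if (walk->vma->vm_flags & (VM_IO | VM_PFNMAP | VM_MIXEDMAP)) {
            /* Mark the unhandled pages and skip the vma (return 1)
             * instead of aborting the whole walk with -EFAULT.
             */
            for (; start < end; start += PAGE_SIZE, i++)
                range->pfns[i] = range->values[HMM_PFN_ERROR];
            return 1;
        }
        return 0;
    }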

Perhaps there was some thought that the fault version should fail to
tell the pagefault handler there is nothing to DMA, but even that is
not entirely desirable. I'd like to have 'fault around' semantics: if
we are going to all the work of faulting a few PTEs, let's do a chunk. I
only care if the critical page(s) triggering the fault couldn't be
faulted in; the others can remain as pfn FAULT.

To proceed with this patch we need to confirm/deny the above trace. I
think it probably can be fixed easily (as another patch) by checking
for walk->vma == NULL in the right places.
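
Purely hypothetical, just to make that concrete: one of the "right places"
would be the hole handler, before it tries to fault, e.g.

    /* In hmm_vma_walk_hole_(), before calling hmm_vma_do_fault(): */
    if ((fault || write_fault) && !walk->vma)
        return -EFAULT; /* no vma to hand to handle_mm_fault() */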

I really would like to see a test for this function too :( It has lots
and lots of edge cases that need to be comprehensively explored
before we can call this working.

Jason

2019-07-26 15:46:58

by Jason Gunthorpe

Subject: Re: [PATCH v2 0/7] mm/hmm: more HMM clean up

On Thu, Jul 25, 2019 at 05:56:43PM -0700, Ralph Campbell wrote:
> Here are seven more patches for things I found to clean up.
> This was based on top of Christoph's seven patches:
> "hmm_range_fault related fixes and legacy API removal v3".
> I assume this will go into Jason's tree since there will likely be
> more HMM changes in this cycle.
>
> Changes from v1 to v2:
>
> Added AMD GPU to hmm_update removal.
> Added 2 patches from Christoph.
> Added 2 patches as a result of Jason's suggestions.
>
> Christoph Hellwig (2):
> mm/hmm: replace the block argument to hmm_range_fault with a flags
> value
> mm: merge hmm_range_snapshot into hmm_range_fault
>
> Ralph Campbell (5):
> mm/hmm: replace hmm_update with mmu_notifier_range
> mm/hmm: a few more C style and comment clean ups
> mm/hmm: remove hugetlbfs check in hmm_vma_walk_pmd
> mm/hmm: remove hmm_range vma

For all of these:

Reviewed-by: Jason Gunthorpe <[email protected]>

I've applied this to hmm.git, excluding:

> mm/hmm: make full use of walk_page_range()

Pending further discussion.

Based on last cycle I've decided to move good patches into linux-next
earlier and rely on some rebase if needed. This is to help Andrew's
workflow.

So, if there are more tags/etc please continue to send them, I will
sort it..

Thanks,
Jason

2019-07-26 19:31:10

by Ralph Campbell

Subject: Re: [PATCH v2 2/7] mm/hmm: a few more C style and comment clean ups


On 7/25/19 11:23 PM, Christoph Hellwig wrote:
> Note: it seems like you've only CCed me on patches 2-7, but not on the
> cover letter and patch 1. I'll try to find them later, but to make Ccs
> useful they should normally cover the whole series.
>
> Otherwise this looks fine to me:
>
> Reviewed-by: Christoph Hellwig <[email protected]>
>

Thanks for the review and sorry about the oversight on CCs.