Subject: [PATCH v4 00/13] Add MEMORY_DEVICE_COHERENT for coherent device memory mapping

This is our MEMORY_DEVICE_COHERENT patch series rebased and updated
for current 5.18.0

Changes since the last version:
- Fixed problems with migration during long-term pinning in
get_user_pages
- Open coded vm_normal_lru_pages as suggested in previous code review
- Update hmm_gup_test with more get_user_pages calls, include
hmm_cow_in_device in hmm-test.

This patch series introduces MEMORY_DEVICE_COHERENT, a type of memory
owned by a device that can be mapped into CPU page tables like
MEMORY_DEVICE_GENERIC and can also be migrated like
MEMORY_DEVICE_PRIVATE.

This patch series is mostly self-contained except for a few places where
it needs to update other subsystems to handle the new memory type.

System stability and performance are not affected according to our
ongoing testing, including xfstests.

How it works: The system BIOS advertises the GPU device memory
(aka VRAM) as SPM (special purpose memory) in the UEFI system address
map.

The amdgpu driver registers the memory with devmap as
MEMORY_DEVICE_COHERENT using devm_memremap_pages. The initial user for
this hardware page migration capability is the Frontier supercomputer
project. This functionality is not AMD-specific. We expect other GPU
vendors to find this functionality useful, and possibly other hardware
types in the future.

Our test nodes in the lab are similar to the Frontier configuration,
with .5 TB of system memory plus 256 GB of device memory split across
4 GPUs, all in a single coherent address space. Page migration is
expected to improve application efficiency significantly. We will
report empirical results as they become available.

Coherent device type pages at gup are now migrated back to system
memory if they are being pinned long-term (FOLL_LONGTERM). The reason
is, that long-term pinning would interfere with the device memory
manager owning the device-coherent pages (e.g. evictions in TTM).
These series incorporate Alistair Popple patches to do this
migration from pin_user_pages() calls. hmm_gup_test has been added to
hmm-test to test different get user pages calls.

This series includes handling of device-managed anonymous pages
returned by vm_normal_pages. Although they behave like normal pages
for purposes of mapping in CPU page tables and for COW, they do not
support LRU lists, NUMA migration or THP.

We also introduced a FOLL_LRU flag that adds the same behaviour to
follow_page and related APIs, to allow callers to specify that they
expect to put pages on an LRU list.

v2:
- Rebase to latest 5.18-rc7.
- Drop patch "mm: add device coherent checker to remove migration pte"
and modify try_to_migrate_one, to let DEVICE_COHERENT pages fall
through to normal page path. Based on Alistair Popple's comment.
- Fix comment formatting.
- Reword comment in vm_normal_page about pte_devmap().
- Merge "drm/amdkfd: coherent type as sys mem on migration to ram" to
"drm/amdkfd: add SPM support for SVM".

v3:
- Rebase to latest 5.18.0.
- Patch "mm: handling Non-LRU pages returned by vm_normal_pages"
reordered.
- Add WARN_ON_ONCE for thp device coherent case.

v4:
- Rebase to latest 5.18.0
- Fix consitency between pages with FOLL_LRU flag set and pte_devmap
at follow_page_pte.

Alex Sierra (11):
mm: add zone device coherent type memory support
mm: handling Non-LRU pages returned by vm_normal_pages
mm: add device coherent vma selection for memory migration
drm/amdkfd: add SPM support for SVM
lib: test_hmm add ioctl to get zone device type
lib: test_hmm add module param for zone device type
lib: add support for device coherent type in test_hmm
tools: update hmm-test to support device coherent type
tools: update test_hmm script to support SP config
tools: add hmm gup tests for device coherent type
tools: add selftests to hmm for COW in device memory

Alistair Popple (2):
mm: remove the vma check in migrate_vma_setup()
mm/gup: migrate device coherent pages when pinning instead of failing

drivers/gpu/drm/amd/amdkfd/kfd_migrate.c | 34 ++-
fs/proc/task_mmu.c | 2 +-
include/linux/memremap.h | 19 ++
include/linux/migrate.h | 1 +
include/linux/mm.h | 3 +-
lib/test_hmm.c | 349 +++++++++++++++++------
lib/test_hmm_uapi.h | 22 +-
mm/gup.c | 53 +++-
mm/huge_memory.c | 2 +-
mm/internal.h | 1 +
mm/khugepaged.c | 9 +-
mm/ksm.c | 6 +-
mm/madvise.c | 4 +-
mm/memcontrol.c | 7 +-
mm/memory-failure.c | 8 +-
mm/memory.c | 9 +-
mm/mempolicy.c | 2 +-
mm/memremap.c | 10 +
mm/migrate.c | 4 +-
mm/migrate_device.c | 115 ++++++--
mm/mlock.c | 2 +-
mm/mprotect.c | 2 +-
mm/rmap.c | 5 +-
tools/testing/selftests/vm/hmm-tests.c | 308 ++++++++++++++++++--
tools/testing/selftests/vm/test_hmm.sh | 24 +-
25 files changed, 817 insertions(+), 184 deletions(-)

--
2.32.0



Subject: [PATCH v4 02/13] mm: handling Non-LRU pages returned by vm_normal_pages

With DEVICE_COHERENT, we'll soon have vm_normal_pages() return
device-managed anonymous pages that are not LRU pages. Although they
behave like normal pages for purposes of mapping in CPU page, and for
COW. They do not support LRU lists, NUMA migration or THP.

We also introduced a FOLL_LRU flag that adds the same behaviour to
follow_page and related APIs, to allow callers to specify that they
expect to put pages on an LRU list.

Signed-off-by: Alex Sierra <[email protected]>
Acked-by: Felix Kuehling <[email protected]>
---
fs/proc/task_mmu.c | 2 +-
include/linux/mm.h | 3 ++-
mm/gup.c | 6 +++++-
mm/huge_memory.c | 2 +-
mm/khugepaged.c | 9 ++++++---
mm/ksm.c | 6 +++---
mm/madvise.c | 4 ++--
mm/memory.c | 9 ++++++++-
mm/mempolicy.c | 2 +-
mm/migrate.c | 4 ++--
mm/mlock.c | 2 +-
mm/mprotect.c | 2 +-
12 files changed, 33 insertions(+), 18 deletions(-)

diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index 2d04e3470d4c..2dd8c8a66924 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -1792,7 +1792,7 @@ static struct page *can_gather_numa_stats(pte_t pte, struct vm_area_struct *vma,
return NULL;

page = vm_normal_page(vma, addr, pte);
- if (!page)
+ if (!page || is_zone_device_page(page))
return NULL;

if (PageReserved(page))
diff --git a/include/linux/mm.h b/include/linux/mm.h
index bc8f326be0ce..d3f43908ff8d 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -601,7 +601,7 @@ struct vm_operations_struct {
#endif
/*
* Called by vm_normal_page() for special PTEs to find the
- * page for @addr. This is useful if the default behavior
+ * page for @addr. This is useful if the default behavior
* (using pte_page()) would not find the correct page.
*/
struct page *(*find_special_page)(struct vm_area_struct *vma,
@@ -2934,6 +2934,7 @@ struct page *follow_page(struct vm_area_struct *vma, unsigned long address,
#define FOLL_NUMA 0x200 /* force NUMA hinting page fault */
#define FOLL_MIGRATION 0x400 /* wait for page to replace migration entry */
#define FOLL_TRIED 0x800 /* a retry, previous pass started an IO */
+#define FOLL_LRU 0x1000 /* return only LRU (anon or page cache) */
#define FOLL_REMOTE 0x2000 /* we are working on non-current tsk/mm */
#define FOLL_COW 0x4000 /* internal GUP flag */
#define FOLL_ANON 0x8000 /* don't do file mappings */
diff --git a/mm/gup.c b/mm/gup.c
index 551264407624..48b45bcc8501 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -532,7 +532,11 @@ static struct page *follow_page_pte(struct vm_area_struct *vma,
}

page = vm_normal_page(vma, address, pte);
- if (!page && pte_devmap(pte) && (flags & (FOLL_GET | FOLL_PIN))) {
+ if ((flags & FOLL_LRU) && ((page && is_zone_device_page(page)) ||
+ (!page && pte_devmap(pte)))) {
+ page = ERR_PTR(-EEXIST);
+ goto out;
+ } else if (!page && pte_devmap(pte) && (flags & (FOLL_GET | FOLL_PIN))) {
/*
* Only return device mapping pages in the FOLL_GET or FOLL_PIN
* case since they are only valid while holding the pgmap
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index a77c78a2b6b5..48182c8fe151 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2906,7 +2906,7 @@ static int split_huge_pages_pid(int pid, unsigned long vaddr_start,
}

/* FOLL_DUMP to ignore special (like zero) pages */
- page = follow_page(vma, addr, FOLL_GET | FOLL_DUMP);
+ page = follow_page(vma, addr, FOLL_GET | FOLL_DUMP | FOLL_LRU);

if (IS_ERR(page))
continue;
diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 16be62d493cd..671ac7800e53 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -618,7 +618,7 @@ static int __collapse_huge_page_isolate(struct vm_area_struct *vma,
goto out;
}
page = vm_normal_page(vma, address, pteval);
- if (unlikely(!page)) {
+ if (unlikely(!page) || unlikely(is_zone_device_page(page))) {
result = SCAN_PAGE_NULL;
goto out;
}
@@ -1267,7 +1267,7 @@ static int khugepaged_scan_pmd(struct mm_struct *mm,
writable = true;

page = vm_normal_page(vma, _address, pteval);
- if (unlikely(!page)) {
+ if (unlikely(!page) || unlikely(is_zone_device_page(page))) {
result = SCAN_PAGE_NULL;
goto out_unmap;
}
@@ -1479,7 +1479,8 @@ void collapse_pte_mapped_thp(struct mm_struct *mm, unsigned long addr)
goto abort;

page = vm_normal_page(vma, addr, *pte);
-
+ if (WARN_ON_ONCE(page && is_zone_device_page(page)))
+ page = NULL;
/*
* Note that uprobe, debugger, or MAP_PRIVATE may change the
* page table, but the new page will not be a subpage of hpage.
@@ -1497,6 +1498,8 @@ void collapse_pte_mapped_thp(struct mm_struct *mm, unsigned long addr)
if (pte_none(*pte))
continue;
page = vm_normal_page(vma, addr, *pte);
+ if (WARN_ON_ONCE(page && is_zone_device_page(page)))
+ goto abort;
page_remove_rmap(page, vma, false);
}

diff --git a/mm/ksm.c b/mm/ksm.c
index 54f78c9eecae..400790128102 100644
--- a/mm/ksm.c
+++ b/mm/ksm.c
@@ -474,7 +474,7 @@ static int break_ksm(struct vm_area_struct *vma, unsigned long addr)
do {
cond_resched();
page = follow_page(vma, addr,
- FOLL_GET | FOLL_MIGRATION | FOLL_REMOTE);
+ FOLL_GET | FOLL_MIGRATION | FOLL_REMOTE | FOLL_LRU);
if (IS_ERR_OR_NULL(page))
break;
if (PageKsm(page))
@@ -559,7 +559,7 @@ static struct page *get_mergeable_page(struct rmap_item *rmap_item)
if (!vma)
goto out;

- page = follow_page(vma, addr, FOLL_GET);
+ page = follow_page(vma, addr, FOLL_GET | FOLL_LRU);
if (IS_ERR_OR_NULL(page))
goto out;
if (PageAnon(page)) {
@@ -2307,7 +2307,7 @@ static struct rmap_item *scan_get_next_rmap_item(struct page **page)
while (ksm_scan.address < vma->vm_end) {
if (ksm_test_exit(mm))
break;
- *page = follow_page(vma, ksm_scan.address, FOLL_GET);
+ *page = follow_page(vma, ksm_scan.address, FOLL_GET | FOLL_LRU);
if (IS_ERR_OR_NULL(*page)) {
ksm_scan.address += PAGE_SIZE;
cond_resched();
diff --git a/mm/madvise.c b/mm/madvise.c
index d7b4f2602949..e5637181de1b 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -421,7 +421,7 @@ static int madvise_cold_or_pageout_pte_range(pmd_t *pmd,
continue;

page = vm_normal_page(vma, addr, ptent);
- if (!page)
+ if (!page || is_zone_device_page(page))
continue;

/*
@@ -639,7 +639,7 @@ static int madvise_free_pte_range(pmd_t *pmd, unsigned long addr,
}

page = vm_normal_page(vma, addr, ptent);
- if (!page)
+ if (!page || is_zone_device_page(page))
continue;

/*
diff --git a/mm/memory.c b/mm/memory.c
index 21dadf03f089..30ecbc715e60 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -624,6 +624,13 @@ struct page *vm_normal_page(struct vm_area_struct *vma, unsigned long addr,
if (is_zero_pfn(pfn))
return NULL;
if (pte_devmap(pte))
+/*
+ * NOTE: New uers of ZONE_DEVICE will not set pte_devmap() and will have
+ * refcounts incremented on their struct pages when they are inserted into
+ * PTEs, thus they are safe to return here. Legacy ZONE_DEVICE pages that set
+ * pte_devmap() do not have refcounts. Example of legacy ZONE_DEVICE is
+ * MEMORY_DEVICE_FS_DAX type in pmem or virtio_fs drivers.
+ */
return NULL;

print_bad_pte(vma, addr, pte, NULL);
@@ -4685,7 +4692,7 @@ static vm_fault_t do_numa_page(struct vm_fault *vmf)
pte = pte_modify(old_pte, vma->vm_page_prot);

page = vm_normal_page(vma, vmf->address, pte);
- if (!page)
+ if (!page || is_zone_device_page(page))
goto out_map;

/* TODO: handle PTE-mapped THP */
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index d39b01fd52fe..abc26890fc95 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -523,7 +523,7 @@ static int queue_pages_pte_range(pmd_t *pmd, unsigned long addr,
if (!pte_present(*pte))
continue;
page = vm_normal_page(vma, addr, *pte);
- if (!page)
+ if (!page || is_zone_device_page(page))
continue;
/*
* vm_normal_page() filters out zero pages, but there might
diff --git a/mm/migrate.c b/mm/migrate.c
index e51588e95f57..f7d1b8312631 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -1612,7 +1612,7 @@ static int add_page_for_migration(struct mm_struct *mm, unsigned long addr,
goto out;

/* FOLL_DUMP to ignore special (like zero) pages */
- page = follow_page(vma, addr, FOLL_GET | FOLL_DUMP);
+ page = follow_page(vma, addr, FOLL_GET | FOLL_DUMP | FOLL_LRU);

err = PTR_ERR(page);
if (IS_ERR(page))
@@ -1803,7 +1803,7 @@ static void do_pages_stat_array(struct mm_struct *mm, unsigned long nr_pages,
goto set_status;

/* FOLL_DUMP to ignore special (like zero) pages */
- page = follow_page(vma, addr, FOLL_GET | FOLL_DUMP);
+ page = follow_page(vma, addr, FOLL_GET | FOLL_DUMP | FOLL_LRU);

err = PTR_ERR(page);
if (IS_ERR(page))
diff --git a/mm/mlock.c b/mm/mlock.c
index 716caf851043..b14e929084cc 100644
--- a/mm/mlock.c
+++ b/mm/mlock.c
@@ -333,7 +333,7 @@ static int mlock_pte_range(pmd_t *pmd, unsigned long addr,
if (!pte_present(*pte))
continue;
page = vm_normal_page(vma, addr, *pte);
- if (!page)
+ if (!page || is_zone_device_page(page))
continue;
if (PageTransCompound(page))
continue;
diff --git a/mm/mprotect.c b/mm/mprotect.c
index ba5592655ee3..e034aae2a98b 100644
--- a/mm/mprotect.c
+++ b/mm/mprotect.c
@@ -95,7 +95,7 @@ static unsigned long change_pte_range(struct mmu_gather *tlb,
continue;

page = vm_normal_page(vma, addr, oldpte);
- if (!page || PageKsm(page))
+ if (!page || is_zone_device_page(page) || PageKsm(page))
continue;

/* Also skip shared copy-on-write pages */
--
2.32.0


Subject: [PATCH v4 05/13] mm/gup: migrate device coherent pages when pinning instead of failing

From: Alistair Popple <[email protected]>

Currently any attempts to pin a device coherent page will fail. This is
because device coherent pages need to be managed by a device driver, and
pinning them would prevent a driver from migrating them off the device.

However this is no reason to fail pinning of these pages. These are
coherent and accessible from the CPU so can be migrated just like
pinning ZONE_MOVABLE pages. So instead of failing all attempts to pin
them first try migrating them out of ZONE_DEVICE.

Signed-off-by: Alistair Popple <[email protected]>
Acked-by: Felix Kuehling <[email protected]>
[hch: rebased to the split device memory checks,
moved migrate_device_page to migrate_device.c]
Signed-off-by: Christoph Hellwig <[email protected]>
---
mm/gup.c | 47 +++++++++++++++++++++++++++++++++++-----
mm/internal.h | 1 +
mm/migrate_device.c | 53 +++++++++++++++++++++++++++++++++++++++++++++
3 files changed, 96 insertions(+), 5 deletions(-)

diff --git a/mm/gup.c b/mm/gup.c
index 48b45bcc8501..e6093c31f932 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -1895,9 +1895,43 @@ static long check_and_migrate_movable_pages(unsigned long nr_pages,
continue;
prev_folio = folio;

- if (folio_is_pinnable(folio))
+ /*
+ * Device private pages will get faulted in during gup so it
+ * shouldn't be possible to see one here.
+ */
+ if (WARN_ON_ONCE(folio_is_device_private(folio))) {
+ ret = -EFAULT;
+ goto unpin_pages;
+ }
+
+ /*
+ * Device coherent pages are managed by a driver and should not
+ * be pinned indefinitely as it prevents the driver moving the
+ * page. So when trying to pin with FOLL_LONGTERM instead try
+ * to migrate the page out of device memory.
+ */
+ if (folio_is_device_coherent(folio)) {
+ WARN_ON_ONCE(PageCompound(&folio->page));
+
+ /*
+ * Migration will fail if the page is pinned, so convert
+ * the pin on the source page to a normal reference.
+ */
+ if (gup_flags & FOLL_PIN) {
+ get_page(&folio->page);
+ unpin_user_page(&folio->page);
+ }
+
+ pages[i] = migrate_device_page(&folio->page, gup_flags);
+ if (!pages[i]) {
+ ret = -EBUSY;
+ goto unpin_pages;
+ }
continue;
+ }

+ if (folio_is_pinnable(folio))
+ continue;
/*
* Try to move out any movable page before pinning the range.
*/
@@ -1933,10 +1967,13 @@ static long check_and_migrate_movable_pages(unsigned long nr_pages,
return nr_pages;

unpin_pages:
- if (gup_flags & FOLL_PIN) {
- unpin_user_pages(pages, nr_pages);
- } else {
- for (i = 0; i < nr_pages; i++)
+ for (i = 0; i < nr_pages; i++) {
+ if (!pages[i])
+ continue;
+
+ if (gup_flags & FOLL_PIN)
+ unpin_user_page(pages[i]);
+ else
put_page(pages[i]);
}

diff --git a/mm/internal.h b/mm/internal.h
index c0f8fbe0445b..eeab4ee7a4a3 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -853,6 +853,7 @@ int numa_migrate_prep(struct page *page, struct vm_area_struct *vma,
unsigned long addr, int page_nid, int *flags);

void free_zone_device_page(struct page *page);
+struct page *migrate_device_page(struct page *page, unsigned int gup_flags);

/*
* mm/gup.c
diff --git a/mm/migrate_device.c b/mm/migrate_device.c
index cf9668376c5a..5decd26dd551 100644
--- a/mm/migrate_device.c
+++ b/mm/migrate_device.c
@@ -794,3 +794,56 @@ void migrate_vma_finalize(struct migrate_vma *migrate)
}
}
EXPORT_SYMBOL(migrate_vma_finalize);
+
+/*
+ * Migrate a device coherent page back to normal memory. The caller should have
+ * a reference on page which will be copied to the new page if migration is
+ * successful or dropped on failure.
+ */
+struct page *migrate_device_page(struct page *page, unsigned int gup_flags)
+{
+ unsigned long src_pfn, dst_pfn = 0;
+ struct migrate_vma args;
+ struct page *dpage;
+
+ lock_page(page);
+ src_pfn = migrate_pfn(page_to_pfn(page)) | MIGRATE_PFN_MIGRATE;
+ args.src = &src_pfn;
+ args.dst = &dst_pfn;
+ args.cpages = 1;
+ args.npages = 1;
+ args.vma = NULL;
+ migrate_vma_setup(&args);
+ if (!(src_pfn & MIGRATE_PFN_MIGRATE))
+ return NULL;
+
+ dpage = alloc_pages(GFP_USER | __GFP_NOWARN, 0);
+
+ /*
+ * get/pin the new page now so we don't have to retry gup after
+ * migrating. We already have a reference so this should never fail.
+ */
+ if (dpage && WARN_ON_ONCE(!try_grab_page(dpage, gup_flags))) {
+ __free_pages(dpage, 0);
+ dpage = NULL;
+ }
+
+ if (dpage) {
+ lock_page(dpage);
+ dst_pfn = migrate_pfn(page_to_pfn(dpage));
+ }
+
+ migrate_vma_pages(&args);
+ if (src_pfn & MIGRATE_PFN_MIGRATE)
+ copy_highpage(dpage, page);
+ migrate_vma_finalize(&args);
+ if (dpage && !(src_pfn & MIGRATE_PFN_MIGRATE)) {
+ if (gup_flags & FOLL_PIN)
+ unpin_user_page(dpage);
+ else
+ put_page(dpage);
+ dpage = NULL;
+ }
+
+ return dpage;
+}
--
2.32.0


Subject: [PATCH v4 04/13] mm: remove the vma check in migrate_vma_setup()

From: Alistair Popple <[email protected]>

migrate_vma_setup() checks that a valid vma is passed so that the page
tables can be walked to find the pfns associated with a given address
range. However in some cases the pfns are already known, such as when
migrating device coherent pages during pin_user_pages() meaning a valid
vma isn't required.

Signed-off-by: Alistair Popple <[email protected]>
Acked-by: Felix Kuehling <[email protected]>
Signed-off-by: Christoph Hellwig <[email protected]>
---
mm/migrate_device.c | 34 +++++++++++++++++-----------------
1 file changed, 17 insertions(+), 17 deletions(-)

diff --git a/mm/migrate_device.c b/mm/migrate_device.c
index 18bc6483f63a..cf9668376c5a 100644
--- a/mm/migrate_device.c
+++ b/mm/migrate_device.c
@@ -486,24 +486,24 @@ int migrate_vma_setup(struct migrate_vma *args)

args->start &= PAGE_MASK;
args->end &= PAGE_MASK;
- if (!args->vma || is_vm_hugetlb_page(args->vma) ||
- (args->vma->vm_flags & VM_SPECIAL) || vma_is_dax(args->vma))
- return -EINVAL;
- if (nr_pages <= 0)
- return -EINVAL;
- if (args->start < args->vma->vm_start ||
- args->start >= args->vma->vm_end)
- return -EINVAL;
- if (args->end <= args->vma->vm_start || args->end > args->vma->vm_end)
- return -EINVAL;
if (!args->src || !args->dst)
return -EINVAL;
-
- memset(args->src, 0, sizeof(*args->src) * nr_pages);
- args->cpages = 0;
- args->npages = 0;
-
- migrate_vma_collect(args);
+ if (args->vma) {
+ if (is_vm_hugetlb_page(args->vma) ||
+ (args->vma->vm_flags & VM_SPECIAL) || vma_is_dax(args->vma))
+ return -EINVAL;
+ if (args->start < args->vma->vm_start ||
+ args->start >= args->vma->vm_end)
+ return -EINVAL;
+ if (args->end <= args->vma->vm_start ||
+ args->end > args->vma->vm_end)
+ return -EINVAL;
+ memset(args->src, 0, sizeof(*args->src) * nr_pages);
+ args->cpages = 0;
+ args->npages = 0;
+
+ migrate_vma_collect(args);
+ }

if (args->cpages)
migrate_vma_unmap(args);
@@ -685,7 +685,7 @@ void migrate_vma_pages(struct migrate_vma *migrate)
continue;
}

- if (!page) {
+ if (!page && migrate->vma) {
if (!(migrate->src[i] & MIGRATE_PFN_MIGRATE))
continue;
if (!notified) {
--
2.32.0


Subject: [PATCH v4 08/13] lib: test_hmm add module param for zone device type

In order to configure device coherent in test_hmm, two module parameters
should be passed, which correspond to the SP start address of each
device (2) spm_addr_dev0 & spm_addr_dev1. If no parameters are passed,
private device type is configured.

Signed-off-by: Alex Sierra <[email protected]>
Acked-by: Felix Kuehling <[email protected]>
Reviewed-by: Alistair Poppple <[email protected]>
Signed-off-by: Christoph Hellwig <[email protected]>
---
lib/test_hmm.c | 73 ++++++++++++++++++++++++++++++++-------------
lib/test_hmm_uapi.h | 1 +
2 files changed, 53 insertions(+), 21 deletions(-)

diff --git a/lib/test_hmm.c b/lib/test_hmm.c
index 7a27584484ce..15747f70c5bc 100644
--- a/lib/test_hmm.c
+++ b/lib/test_hmm.c
@@ -37,6 +37,16 @@
#define DEVMEM_CHUNK_SIZE (256 * 1024 * 1024U)
#define DEVMEM_CHUNKS_RESERVE 16

+static unsigned long spm_addr_dev0;
+module_param(spm_addr_dev0, long, 0644);
+MODULE_PARM_DESC(spm_addr_dev0,
+ "Specify start address for SPM (special purpose memory) used for device 0. By setting this Coherent device type will be used. Make sure spm_addr_dev1 is set too. Minimum SPM size should be DEVMEM_CHUNK_SIZE.");
+
+static unsigned long spm_addr_dev1;
+module_param(spm_addr_dev1, long, 0644);
+MODULE_PARM_DESC(spm_addr_dev1,
+ "Specify start address for SPM (special purpose memory) used for device 1. By setting this Coherent device type will be used. Make sure spm_addr_dev0 is set too. Minimum SPM size should be DEVMEM_CHUNK_SIZE.");
+
static const struct dev_pagemap_ops dmirror_devmem_ops;
static const struct mmu_interval_notifier_ops dmirror_min_ops;
static dev_t dmirror_dev;
@@ -455,28 +465,44 @@ static int dmirror_write(struct dmirror *dmirror, struct hmm_dmirror_cmd *cmd)
return ret;
}

-static bool dmirror_allocate_chunk(struct dmirror_device *mdevice,
+static int dmirror_allocate_chunk(struct dmirror_device *mdevice,
struct page **ppage)
{
struct dmirror_chunk *devmem;
- struct resource *res;
+ struct resource *res = NULL;
unsigned long pfn;
unsigned long pfn_first;
unsigned long pfn_last;
void *ptr;
+ int ret = -ENOMEM;

devmem = kzalloc(sizeof(*devmem), GFP_KERNEL);
if (!devmem)
- return false;
+ return ret;

- res = request_free_mem_region(&iomem_resource, DEVMEM_CHUNK_SIZE,
- "hmm_dmirror");
- if (IS_ERR(res))
+ switch (mdevice->zone_device_type) {
+ case HMM_DMIRROR_MEMORY_DEVICE_PRIVATE:
+ res = request_free_mem_region(&iomem_resource, DEVMEM_CHUNK_SIZE,
+ "hmm_dmirror");
+ if (IS_ERR_OR_NULL(res))
+ goto err_devmem;
+ devmem->pagemap.range.start = res->start;
+ devmem->pagemap.range.end = res->end;
+ devmem->pagemap.type = MEMORY_DEVICE_PRIVATE;
+ break;
+ case HMM_DMIRROR_MEMORY_DEVICE_COHERENT:
+ devmem->pagemap.range.start = (MINOR(mdevice->cdevice.dev) - 2) ?
+ spm_addr_dev0 :
+ spm_addr_dev1;
+ devmem->pagemap.range.end = devmem->pagemap.range.start +
+ DEVMEM_CHUNK_SIZE - 1;
+ devmem->pagemap.type = MEMORY_DEVICE_COHERENT;
+ break;
+ default:
+ ret = -EINVAL;
goto err_devmem;
+ }

- devmem->pagemap.type = MEMORY_DEVICE_PRIVATE;
- devmem->pagemap.range.start = res->start;
- devmem->pagemap.range.end = res->end;
devmem->pagemap.nr_range = 1;
devmem->pagemap.ops = &dmirror_devmem_ops;
devmem->pagemap.owner = mdevice;
@@ -497,10 +523,14 @@ static bool dmirror_allocate_chunk(struct dmirror_device *mdevice,
mdevice->devmem_capacity = new_capacity;
mdevice->devmem_chunks = new_chunks;
}
-
ptr = memremap_pages(&devmem->pagemap, numa_node_id());
- if (IS_ERR(ptr))
+ if (IS_ERR_OR_NULL(ptr)) {
+ if (ptr)
+ ret = PTR_ERR(ptr);
+ else
+ ret = -EFAULT;
goto err_release;
+ }

devmem->mdevice = mdevice;
pfn_first = devmem->pagemap.range.start >> PAGE_SHIFT;
@@ -529,15 +559,17 @@ static bool dmirror_allocate_chunk(struct dmirror_device *mdevice,
}
spin_unlock(&mdevice->lock);

- return true;
+ return 0;

err_release:
mutex_unlock(&mdevice->devmem_lock);
- release_mem_region(devmem->pagemap.range.start, range_len(&devmem->pagemap.range));
+ if (res && devmem->pagemap.type == MEMORY_DEVICE_PRIVATE)
+ release_mem_region(devmem->pagemap.range.start,
+ range_len(&devmem->pagemap.range));
err_devmem:
kfree(devmem);

- return false;
+ return ret;
}

static struct page *dmirror_devmem_alloc_page(struct dmirror_device *mdevice)
@@ -562,7 +594,7 @@ static struct page *dmirror_devmem_alloc_page(struct dmirror_device *mdevice)
spin_unlock(&mdevice->lock);
} else {
spin_unlock(&mdevice->lock);
- if (!dmirror_allocate_chunk(mdevice, &dpage))
+ if (dmirror_allocate_chunk(mdevice, &dpage))
goto error;
}

@@ -1244,10 +1276,8 @@ static int dmirror_device_init(struct dmirror_device *mdevice, int id)
if (ret)
return ret;

- /* Build a list of free ZONE_DEVICE private struct pages */
- dmirror_allocate_chunk(mdevice, NULL);
-
- return 0;
+ /* Build a list of free ZONE_DEVICE struct pages */
+ return dmirror_allocate_chunk(mdevice, NULL);
}

static void dmirror_device_remove(struct dmirror_device *mdevice)
@@ -1260,8 +1290,9 @@ static void dmirror_device_remove(struct dmirror_device *mdevice)
mdevice->devmem_chunks[i];

memunmap_pages(&devmem->pagemap);
- release_mem_region(devmem->pagemap.range.start,
- range_len(&devmem->pagemap.range));
+ if (devmem->pagemap.type == MEMORY_DEVICE_PRIVATE)
+ release_mem_region(devmem->pagemap.range.start,
+ range_len(&devmem->pagemap.range));
kfree(devmem);
}
kfree(mdevice->devmem_chunks);
diff --git a/lib/test_hmm_uapi.h b/lib/test_hmm_uapi.h
index 17f842f1aa02..625f3690d086 100644
--- a/lib/test_hmm_uapi.h
+++ b/lib/test_hmm_uapi.h
@@ -68,6 +68,7 @@ enum {
enum {
/* 0 is reserved to catch uninitialized type fields */
HMM_DMIRROR_MEMORY_DEVICE_PRIVATE = 1,
+ HMM_DMIRROR_MEMORY_DEVICE_COHERENT,
};

#endif /* _LIB_TEST_HMM_UAPI_H */
--
2.32.0