2022-01-12 23:02:03

by Alistair Popple

Subject: Re: [PATCH v3 00/10] Add MEMORY_DEVICE_COHERENT for coherent device memory mapping

I have been looking at this in relation to the migration code and noticed we
have the following in try_to_migrate():

	if (is_zone_device_page(page) && !is_device_private_page(page))
		return;

Which, if I'm understanding correctly, means that migration of device coherent
pages will always fail. Given that, I do wonder how hmm-tests are passing, but
I assume you must always be hitting this fast path in
migrate_vma_collect_pmd():

	/*
	 * Optimize for the common case where page is only mapped once
	 * in one process. If we can lock the page, then we can safely
	 * set up a special migration page table entry now.
	 */

Meaning that try_to_migrate() never gets called from migrate_vma_unmap(). So
you will also need some changes to try_to_migrate() and possibly
try_to_migrate_one() to make this reliable.
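
To illustrate the concern, here is a rough userspace model of the check and of
one way it might be relaxed. This is only a sketch, not kernel code: the enum,
the toy struct page, bails_out_today(), bails_out_relaxed() and
model_page_unmapped() are all hypothetical stand-ins, and I am assuming a helper
along the lines of is_device_coherent_page() is available.

```c
#include <stdbool.h>

/* Userspace model only: names mimic the kernel's, definitions do not. */
enum memory_type {
	MEMORY_SYSTEM,		/* hypothetical stand-in for ordinary pages */
	MEMORY_DEVICE_PRIVATE,
	MEMORY_DEVICE_GENERIC,
	MEMORY_DEVICE_COHERENT,
};

struct page {
	enum memory_type type;	/* toy field, not the real struct page */
};

static bool is_zone_device_page(const struct page *page)
{
	return page->type != MEMORY_SYSTEM;
}

static bool is_device_private_page(const struct page *page)
{
	return page->type == MEMORY_DEVICE_PRIVATE;
}

/* Assumed helper for the new memory type. */
static bool is_device_coherent_page(const struct page *page)
{
	return page->type == MEMORY_DEVICE_COHERENT;
}

/* The guard quoted above: true means the early return is taken. */
static bool bails_out_today(const struct page *page)
{
	return is_zone_device_page(page) && !is_device_private_page(page);
}

/* One possible relaxation: also let device coherent pages through. */
static bool bails_out_relaxed(const struct page *page)
{
	return is_zone_device_page(page) &&
	       !is_device_private_page(page) &&
	       !is_device_coherent_page(page);
}

/*
 * Model of the overall outcome. If the collect fast path already installed
 * a migration entry, the guard is never consulted; otherwise the page is
 * only unmapped when the guard does not bail out.
 */
static bool model_page_unmapped(const struct page *page, bool fast_path_taken,
				bool (*bails_out)(const struct page *))
{
	if (fast_path_taken)
		return true;
	return !bails_out(page);
}
```

In this model a coherent page is only unmapped with the current guard when the
fast path is taken, which would be consistent with the tests passing while the
slow path stays broken.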

- Alistair

On Tuesday, 11 January 2022 9:31:51 AM AEDT Alex Sierra wrote:
> This patch series introduces MEMORY_DEVICE_COHERENT, a type of memory
> owned by a device that can be mapped into CPU page tables like
> MEMORY_DEVICE_GENERIC and can also be migrated like
> MEMORY_DEVICE_PRIVATE.
>
> Christoph, the suggestion to incorporate Ralph Campbell’s refcount
> cleanup patch into our hardware page migration patchset originally came
> from you, but it proved impractical to do things in that order because
> the refcount cleanup introduced a bug with wide ranging structural
> implications. Instead, we amended Ralph’s patch so that it could be
> applied after merging the migration work. As we saw from the recent
> discussion, merging the refcount work is going to take some time and
> cooperation between multiple development groups, while the migration
> work is ready now and is needed now. So we propose to merge this
> patchset first and continue to work with Ralph and others to merge the
> refcount cleanup separately, when it is ready.
>
> This patch series is mostly self-contained except for a few places where
> it needs to update other subsystems to handle the new memory type.
> System stability and performance are not affected according to our
> ongoing testing, including xfstests.
>
> How it works: The system BIOS advertises the GPU device memory
> (aka VRAM) as SPM (special purpose memory) in the UEFI system address
> map.
>
> The amdgpu driver registers the memory with devmap as
> MEMORY_DEVICE_COHERENT using devm_memremap_pages. The initial user for
> this hardware page migration capability is the Frontier supercomputer
> project. This functionality is not AMD-specific. We expect other GPU
> vendors to find this functionality useful, and possibly other hardware
> types in the future.
>
> Our test nodes in the lab are similar to the Frontier configuration,
> with 0.5 TB of system memory plus 256 GB of device memory split across
> 4 GPUs, all in a single coherent address space. Page migration is
> expected to improve application efficiency significantly. We will
> report empirical results as they become available.
>
> We extended hmm_test to cover migration of MEMORY_DEVICE_COHERENT. This
> patch set builds on HMM and our SVM memory manager already merged in
> 5.15.
>
> v2:
> - test_hmm is now able to create private and coherent device mirror
> instances in the same driver probe. This adds more usability to the hmm
> test by not having to remove the kernel module for each device type
> test (private/coherent type). This is done by passing the module
> parameters spm_addr_dev0 & spm_addr_dev1. In this case, it will create
> four instances of device_mirror. The first two correspond to private
> device type, the last two to coherent type. Then, they can be easily
> accessed from user space through /dev/hmm_mirror<num_device>. Usually
> num_device 0 and 1 are for private, and 2 and 3 for coherent types.
>
> - Coherent device type pages at gup are now migrated back to system
> memory if they have been long term pinned (FOLL_LONGTERM). The reason
> is these pages could eventually interfere with their own device memory
> manager. A new hmm_gup_test has been added to the hmm-test to test this
> functionality. It makes use of the gup_test module to long term pin
> user pages that have been migrated to device memory first.
>
> - Other patch corrections made by Felix, Alistair and Christoph.
>
> v3:
> - Based on last v2 feedback we got from Alistair, we've decided to
> remove migration logic for FOLL_LONGTERM coherent device type pages at
> gup for now. Ideally, this should be done through the kernel mm,
> instead of calling the device driver to do it. Currently, there's no
> support for migrating device pages based on pfn, mainly because
> migrate_pages() relies on pages being LRU pages. Alistair mentioned he
> has started to work on adding this migrate device pages logic. For now,
> we fail on get_user_pages call with FOLL_LONGTERM for DEVICE_COHERENT
> pages.
>
> - Also, hmm_gup_test has been removed from hmm-test. We plan to include
> it again after this migration work is ready.
>
> - Addressed Liam Howlett's feedback changes.
>
> Alex Sierra (10):
>   mm: add zone device coherent type memory support
>   mm: add device coherent vma selection for memory migration
>   mm/gup: fail get_user_pages for LONGTERM dev coherent type
>   drm/amdkfd: add SPM support for SVM
>   drm/amdkfd: coherent type as sys mem on migration to ram
>   lib: test_hmm add ioctl to get zone device type
>   lib: test_hmm add module param for zone device type
>   lib: add support for device coherent type in test_hmm
>   tools: update hmm-test to support device coherent type
>   tools: update test_hmm script to support SP config
>
>  drivers/gpu/drm/amd/amdkfd/kfd_migrate.c |  34 ++-
>  include/linux/memremap.h                 |   8 +
>  include/linux/migrate.h                  |   1 +
>  include/linux/mm.h                       |  16 ++
>  lib/test_hmm.c                           | 333 +++++++++++++++++------
>  lib/test_hmm_uapi.h                      |  22 +-
>  mm/gup.c                                 |   7 +
>  mm/memcontrol.c                          |   6 +-
>  mm/memory-failure.c                      |   8 +-
>  mm/memremap.c                            |   5 +-
>  mm/migrate.c                             |  30 +-
>  tools/testing/selftests/vm/hmm-tests.c   | 122 +++++++--
>  tools/testing/selftests/vm/test_hmm.sh   |  24 +-
>  13 files changed, 475 insertions(+), 141 deletions(-)
>
>