Subject: [PATCH v1 0/2] Remove extra ZONE_DEVICE page refcount

This series includes Ralph Campbell’s ZONE_DEVICE refcount cleanup, plus
the changes needed to avoid breaking the separately submitted
MEMORY_DEVICE_COHERENT page migration code.

Ralph’s original description:
ZONE_DEVICE struct pages have an extra reference count that complicates
the code for put_page() and several places in the kernel that need to
check the reference count to see that a page is not being used (gup,
compaction, migration, etc.). Clean up the code so the reference count
doesn't need to be treated specially for ZONE_DEVICE.

Following a suggestion by Christoph, we attempted to combine this
cleanup with the device page migration patch series; however, this
caused xfstests 413 to fail with a warning whose root cause has a
larger kernel footprint than just device memory. Fixing this issue
properly will require cooperation between multiple development groups
working across multiple kernel subsystems, as is apparent from the
discussion under the earlier, combined patch submission.

We therefore propose to break this work out separately as its own patch,
so it can receive the cooperative development work it needs. The deep
problem arises from the get_user_pages API, which has proved troublesome
for many years. It is possible that a concerted effort to repair this
particular refcount issue properly will be a step in the direction of
rationalizing this popular and problematic API.

In the larger picture, this API rationalization work probably deserves
an agenda item at the upcoming Linux Storage, Filesystem, MM & BPF
Summit (LSF/MM/BPF):
https://events.linuxfoundation.org/lsfmm/

The wide-ranging discussion following previous iterations of the
migration patchset focused almost exclusively on the refcount cleanup
patch. The thread is here:
https://lore.kernel.org/linux-mm/[email protected]/
and links a number of previous threads. It is apparent that there is a
lot of work in progress by a number of developer groups in parallel,
and that there are issues with the order in which this work should be
attempted and merged.

Jason provided his list of “balls in the air”:
- Joao's compound page support for device_dax and more
- Alex's DEVICE_COHERENT
- The refcount normalization
- Removing the pgmap test from GUP
- Removing the need for the PUD/PMD/PTE special bit
- Removing the need for the PUD/PMD/PTE devmap bit
- Remove PUD/PMD vma_is_special
- folios for fsdax
- shootdown for fsdax

It is not clear that the refcount cleanup in this patch should be the
first item on the list to be merged; however, it has proved to be a good
starting point for a cooperative effort to address the underlying
issues.

Ralph, if you would prefer to take back “ownership” of this patch, it’s
yours; otherwise, we will be happy to keep it in play and get it merged
when some other pieces of the puzzle fall into place.

Ralph Campbell (2):
ext4/xfs: add page refcount helper
mm: remove extra ZONE_DEVICE struct page refcount

arch/powerpc/kvm/book3s_hv_uvmem.c | 2 +-
drivers/gpu/drm/amd/amdkfd/kfd_migrate.c | 3 +-
drivers/gpu/drm/nouveau/nouveau_dmem.c | 2 +-
fs/dax.c | 8 +--
fs/ext4/inode.c | 5 +-
fs/fuse/dax.c | 4 +-
fs/xfs/xfs_file.c | 4 +-
include/linux/dax.h | 10 ++++
include/linux/memremap.h | 7 ++-
include/linux/mm.h | 11 ----
lib/test_hmm.c | 2 +-
mm/internal.h | 7 +++
mm/memcontrol.c | 6 +-
mm/memremap.c | 72 +++++++-----------------
mm/migrate.c | 5 --
mm/page_alloc.c | 3 +
mm/swap.c | 45 ++-------------
17 files changed, 62 insertions(+), 134 deletions(-)

--
2.32.0