From: Ralph Campbell <[email protected]>
I hit a use-after-free bug in hmm_free() with KASAN and then couldn't
stop myself from making a number of documentation and coding style
clean ups as well. So the first two patches are the clean ups and the
last three are the fixes.
Ralph Campbell (5):
mm/hmm: Update HMM documentation
mm/hmm: Clean up some coding style and comments
mm/hmm: Use mm_get_hmm() in hmm_range_register()
mm/hmm: hmm_vma_fault() doesn't always call hmm_range_unregister()
mm/hmm: Fix mm stale reference use in hmm_free()
Documentation/vm/hmm.rst | 139 ++++++++++++++++++-----------------
include/linux/hmm.h | 84 ++++++++++------------
mm/hmm.c | 151 ++++++++++++++++-----------------------
3 files changed, 174 insertions(+), 200 deletions(-)
--
2.20.1
From: Ralph Campbell <[email protected]>
Update the HMM documentation to reflect the latest API and make a few minor
wording changes.
Signed-off-by: Ralph Campbell <[email protected]>
Cc: John Hubbard <[email protected]>
Cc: Ira Weiny <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Cc: Balbir Singh <[email protected]>
Cc: Dan Carpenter <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Souptick Joarder <[email protected]>
Cc: Andrew Morton <[email protected]>
---
Documentation/vm/hmm.rst | 139 ++++++++++++++++++++-------------------
1 file changed, 73 insertions(+), 66 deletions(-)
diff --git a/Documentation/vm/hmm.rst b/Documentation/vm/hmm.rst
index ec1efa32af3c..7c1e929931a0 100644
--- a/Documentation/vm/hmm.rst
+++ b/Documentation/vm/hmm.rst
@@ -10,7 +10,7 @@ of this being specialized struct page for such memory (see sections 5 to 7 of
this document).
HMM also provides optional helpers for SVM (Share Virtual Memory), i.e.,
-allowing a device to transparently access program address coherently with
+allowing a device to transparently access program addresses coherently with
the CPU meaning that any valid pointer on the CPU is also a valid pointer
for the device. This is becoming mandatory to simplify the use of advanced
heterogeneous computing where GPU, DSP, or FPGA are used to perform various
@@ -22,8 +22,8 @@ expose the hardware limitations that are inherent to many platforms. The third
section gives an overview of the HMM design. The fourth section explains how
CPU page-table mirroring works and the purpose of HMM in this context. The
fifth section deals with how device memory is represented inside the kernel.
-Finally, the last section presents a new migration helper that allows lever-
-aging the device DMA engine.
+Finally, the last section presents a new migration helper that allows
+leveraging the device DMA engine.
.. contents:: :local:
@@ -39,20 +39,20 @@ address space. I use shared address space to refer to the opposite situation:
i.e., one in which any application memory region can be used by a device
transparently.
-Split address space happens because device can only access memory allocated
-through device specific API. This implies that all memory objects in a program
+Split address space happens because devices can only access memory allocated
+through a device specific API. This implies that all memory objects in a program
are not equal from the device point of view which complicates large programs
that rely on a wide set of libraries.
-Concretely this means that code that wants to leverage devices like GPUs needs
-to copy object between generically allocated memory (malloc, mmap private, mmap
+Concretely, this means that code that wants to leverage devices like GPUs needs
+to copy objects between generically allocated memory (malloc, mmap private, mmap
share) and memory allocated through the device driver API (this still ends up
with an mmap but of the device file).
For flat data sets (array, grid, image, ...) this isn't too hard to achieve but
-complex data sets (list, tree, ...) are hard to get right. Duplicating a
+for complex data sets (list, tree, ...) it's hard to get right. Duplicating a
complex data set needs to re-map all the pointer relations between each of its
-elements. This is error prone and program gets harder to debug because of the
+elements. This is error prone and programs get harder to debug because of the
duplicate data set and addresses.
Split address space also means that libraries cannot transparently use data
@@ -77,12 +77,12 @@ I/O bus, device memory characteristics
I/O buses cripple shared address spaces due to a few limitations. Most I/O
buses only allow basic memory access from device to main memory; even cache
-coherency is often optional. Access to device memory from CPU is even more
+coherency is often optional. Access to device memory from a CPU is even more
limited. More often than not, it is not cache coherent.
If we only consider the PCIE bus, then a device can access main memory (often
through an IOMMU) and be cache coherent with the CPUs. However, it only allows
-a limited set of atomic operations from device on main memory. This is worse
+a limited set of atomic operations from the device on main memory. This is worse
in the other direction: the CPU can only access a limited range of the device
memory and cannot perform atomic operations on it. Thus device memory cannot
be considered the same as regular memory from the kernel point of view.
@@ -93,20 +93,20 @@ The final limitation is latency. Access to main memory from the device has an
order of magnitude higher latency than when the device accesses its own memory.
Some platforms are developing new I/O buses or additions/modifications to PCIE
-to address some of these limitations (OpenCAPI, CCIX). They mainly allow two-
-way cache coherency between CPU and device and allow all atomic operations the
+to address some of these limitations (OpenCAPI, CCIX). They mainly allow
+two-way cache coherency between CPU and device and allow all atomic operations the
architecture supports. Sadly, not all platforms are following this trend and
some major architectures are left without hardware solutions to these problems.
So for shared address space to make sense, not only must we allow devices to
access any memory but we must also permit any memory to be migrated to device
-memory while device is using it (blocking CPU access while it happens).
+memory while the device is using it (blocking CPU access while it happens).
Shared address space and migration
==================================
-HMM intends to provide two main features. First one is to share the address
+HMM intends to provide two main features. The first one is to share the address
space by duplicating the CPU page table in the device page table so the same
address points to the same physical memory for any valid main memory address in
the process address space.
@@ -121,14 +121,14 @@ why HMM provides helpers to factor out everything that can be while leaving the
hardware specific details to the device driver.
The second mechanism HMM provides is a new kind of ZONE_DEVICE memory that
-allows allocating a struct page for each page of the device memory. Those pages
+allows allocating a struct page for each page of device memory. Those pages
are special because the CPU cannot map them. However, they allow migrating
main memory to device memory using existing migration mechanisms and everything
-looks like a page is swapped out to disk from the CPU point of view. Using a
-struct page gives the easiest and cleanest integration with existing mm mech-
-anisms. Here again, HMM only provides helpers, first to hotplug new ZONE_DEVICE
+looks like a page that is swapped out to disk from the CPU point of view. Using a
+struct page gives the easiest and cleanest integration with existing mm
+mechanisms. Here again, HMM only provides helpers, first to hotplug new ZONE_DEVICE
memory for the device memory and second to perform migration. Policy decisions
-of what and when to migrate things is left to the device driver.
+of what and when to migrate are left to the device driver.
Note that any CPU access to a device page triggers a page fault and a migration
back to main memory. For example, when a page backing a given CPU address A is
@@ -136,8 +136,8 @@ migrated from a main memory page to a device page, then any CPU access to
address A triggers a page fault and initiates a migration back to main memory.
With these two features, HMM not only allows a device to mirror process address
-space and keeping both CPU and device page table synchronized, but also lever-
-ages device memory by migrating the part of the data set that is actively being
+space and keeps both CPU and device page tables synchronized, but also
+leverages device memory by migrating the part of the data set that is actively being
used by the device.
@@ -151,21 +151,27 @@ registration of an hmm_mirror struct::
int hmm_mirror_register(struct hmm_mirror *mirror,
struct mm_struct *mm);
- int hmm_mirror_register_locked(struct hmm_mirror *mirror,
- struct mm_struct *mm);
-
-The locked variant is to be used when the driver is already holding mmap_sem
-of the mm in write mode. The mirror struct has a set of callbacks that are used
+The mirror struct has a set of callbacks that are used
to propagate CPU page tables::
struct hmm_mirror_ops {
+ /* release() - release hmm_mirror
+ *
+ * @mirror: pointer to struct hmm_mirror
+ *
+ * This is called when the mm_struct is being released.
+ * The callback should make sure no references to the mirror occur
+ * after the callback returns.
+ */
+ void (*release)(struct hmm_mirror *mirror);
+
/* sync_cpu_device_pagetables() - synchronize page tables
*
* @mirror: pointer to struct hmm_mirror
- * @update_type: type of update that occurred to the CPU page table
- * @start: virtual start address of the range to update
- * @end: virtual end address of the range to update
+ * @update: update information (see struct mmu_notifier_range)
+     * Return: -EAGAIN if update.blockable is false and the callback needs to
+ * block, 0 otherwise.
*
* This callback ultimately originates from mmu_notifiers when the CPU
* page table is updated. The device driver must update its page table
@@ -176,14 +182,12 @@ to propagate CPU page tables::
* page tables are completely updated (TLBs flushed, etc); this is a
* synchronous call.
*/
- void (*update)(struct hmm_mirror *mirror,
- enum hmm_update action,
- unsigned long start,
- unsigned long end);
+ int (*sync_cpu_device_pagetables)(struct hmm_mirror *mirror,
+ const struct hmm_update *update);
};
The device driver must perform the update action to the range (mark range
-read only, or fully unmap, ...). The device must be done with the update before
+read only, or fully unmap, etc.). The device must complete the update before
the driver callback returns.
When the device driver wants to populate a range of virtual addresses, it can
@@ -194,17 +198,18 @@ use either::
The first one (hmm_range_snapshot()) will only fetch present CPU page table
entries and will not trigger a page fault on missing or non-present entries.
-The second one does trigger a page fault on missing or read-only entry if the
-write parameter is true. Page faults use the generic mm page fault code path
-just like a CPU page fault.
+The second one does trigger a page fault on missing or read-only entries if
+write access is requested (see below). Page faults use the generic mm page
+fault code path just like a CPU page fault.
Both functions copy CPU page table entries into their pfns array argument. Each
entry in that array corresponds to an address in the virtual range. HMM
provides a set of flags to help the driver identify special CPU page table
entries.
-Locking with the update() callback is the most important aspect the driver must
-respect in order to keep things properly synchronized. The usage pattern is::
+Locking within the sync_cpu_device_pagetables() callback is the most important
+aspect the driver must respect in order to keep things properly synchronized.
+The usage pattern is::
int driver_populate_range(...)
{
@@ -243,7 +248,7 @@ respect in order to keep things properly synchronized. The usage pattern is::
return ret;
}
take_lock(driver->update);
- if (!range.valid) {
+ if (!hmm_range_valid(&range)) {
release_lock(driver->update);
up_read(&mm->mmap_sem);
goto again;
@@ -258,8 +263,8 @@ respect in order to keep things properly synchronized. The usage pattern is::
}
The driver->update lock is the same lock that the driver takes inside its
-update() callback. That lock must be held before checking the range.valid
-field to avoid any race with a concurrent CPU page table update.
+sync_cpu_device_pagetables() callback. That lock must be held before calling
+hmm_range_valid() to avoid any race with a concurrent CPU page table update.
HMM implements all this on top of the mmu_notifier API because we wanted a
simpler API and also to be able to perform optimizations latter on like doing
@@ -279,44 +284,46 @@ concurrently).
Leverage default_flags and pfn_flags_mask
=========================================
-The hmm_range struct has 2 fields default_flags and pfn_flags_mask that allows
-to set fault or snapshot policy for a whole range instead of having to set them
-for each entries in the range.
+The hmm_range struct has 2 fields, default_flags and pfn_flags_mask, that specify
+fault or snapshot policy for the whole range instead of having to set them
+for each entry in the pfns array.
+
+For instance, if the device flags for range.flags are::
-For instance if the device flags for device entries are:
- VALID (1 << 63)
- WRITE (1 << 62)
+ range.flags[HMM_PFN_VALID] = (1 << 63);
+ range.flags[HMM_PFN_WRITE] = (1 << 62);
-Now let say that device driver wants to fault with at least read a range then
-it does set:
- range->default_flags = (1 << 63)
+and the device driver wants pages for a range with at least read permission,
+it sets::
+
+ range->default_flags = (1 << 63);
range->pfn_flags_mask = 0;
-and calls hmm_range_fault() as described above. This will fill fault all page
+and calls hmm_range_fault() as described above. This will fault in all pages
in the range with at least read permission.
-Now let say driver wants to do the same except for one page in the range for
-which its want to have write. Now driver set:
+Now let's say the driver wants to do the same except for one page in the range for
+which it wants to have write permission. The driver then sets:
range->default_flags = (1 << 63);
range->pfn_flags_mask = (1 << 62);
range->pfns[index_of_write] = (1 << 62);
-With this HMM will fault in all page with at least read (ie valid) and for the
+With this, HMM will fault in all pages with at least read (i.e., valid) and for the
address == range->start + (index_of_write << PAGE_SHIFT) it will fault with
-write permission ie if the CPU pte does not have write permission set then HMM
+write permission; i.e., if the CPU pte does not have write permission set then HMM
will call handle_mm_fault().
-Note that HMM will populate the pfns array with write permission for any entry
-that have write permission within the CPU pte no matter what are the values set
+Note that HMM will populate the pfns array with write permission for any page
+that is mapped with CPU write permission no matter what values are set
in default_flags or pfn_flags_mask.
Represent and manage device memory from core kernel point of view
=================================================================
-Several different designs were tried to support device memory. First one used
-a device specific data structure to keep information about migrated memory and
-HMM hooked itself in various places of mm code to handle any access to
+Several different designs were tried to support device memory. The first one
+used a device specific data structure to keep information about migrated memory
+and HMM hooked itself in various places of mm code to handle any access to
addresses that were backed by device memory. It turns out that this ended up
replicating most of the fields of struct page and also needed many kernel code
paths to be updated to understand this new kind of memory.
@@ -339,7 +346,7 @@ The hmm_devmem_ops is where most of the important things are::
struct hmm_devmem_ops {
void (*free)(struct hmm_devmem *devmem, struct page *page);
- int (*fault)(struct hmm_devmem *devmem,
+ vm_fault_t (*fault)(struct hmm_devmem *devmem,
struct vm_area_struct *vma,
unsigned long addr,
struct page *page,
@@ -415,9 +422,9 @@ willing to pay to keep all the code simpler.
Memory cgroup (memcg) and rss accounting
========================================
-For now device memory is accounted as any regular page in rss counters (either
+For now, device memory is accounted as any regular page in rss counters (either
anonymous if device page is used for anonymous, file if device page is used for
-file backed page or shmem if device page is used for shared memory). This is a
+file backed page, or shmem if device page is used for shared memory). This is a
deliberate choice to keep existing applications, that might start using device
memory without knowing about it, running unimpacted.
@@ -437,6 +444,6 @@ get more experience in how device memory is used and its impact on memory
resource control.
-Note that device memory can never be pinned by device driver nor through GUP
+Note that device memory can never be pinned by a device driver nor through GUP
and thus such memory is always free upon process exit. Or when last reference
is dropped in case of shared memory or file backed memory.
--
2.20.1
From: Ralph Campbell <[email protected]>
There are no functional changes, just some coding style clean ups and
minor comment changes.
Signed-off-by: Ralph Campbell <[email protected]>
Cc: John Hubbard <[email protected]>
Cc: Ira Weiny <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Cc: Balbir Singh <[email protected]>
Cc: Dan Carpenter <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Souptick Joarder <[email protected]>
Cc: Andrew Morton <[email protected]>
---
include/linux/hmm.h | 71 +++++++++++++++++++++++----------------------
mm/hmm.c | 51 ++++++++++++++++----------------
2 files changed, 62 insertions(+), 60 deletions(-)
diff --git a/include/linux/hmm.h b/include/linux/hmm.h
index 51ec27a84668..35a429621e1e 100644
--- a/include/linux/hmm.h
+++ b/include/linux/hmm.h
@@ -30,8 +30,8 @@
*
* HMM address space mirroring API:
*
- * Use HMM address space mirroring if you want to mirror range of the CPU page
- * table of a process into a device page table. Here, "mirror" means "keep
+ * Use HMM address space mirroring if you want to mirror a range of the CPU
+ * page tables of a process into a device page table. Here, "mirror" means "keep
* synchronized". Prerequisites: the device must provide the ability to write-
* protect its page tables (at PAGE_SIZE granularity), and must be able to
* recover from the resulting potential page faults.
@@ -114,10 +114,11 @@ struct hmm {
* HMM_PFN_WRITE: CPU page table has write permission set
* HMM_PFN_DEVICE_PRIVATE: private device memory (ZONE_DEVICE)
*
- * The driver provide a flags array, if driver valid bit for an entry is bit
- * 3 ie (entry & (1 << 3)) is true if entry is valid then driver must provide
+ * The driver provides a flags array for mapping page protections to device
+ * PTE bits. If the driver valid bit for an entry is bit 3,
+ * i.e., (entry & (1 << 3)), then the driver must provide
* an array in hmm_range.flags with hmm_range.flags[HMM_PFN_VALID] == 1 << 3.
- * Same logic apply to all flags. This is same idea as vm_page_prot in vma
+ * The same logic applies to all flags. This is the same idea as vm_page_prot in a vma
* except that this is per device driver rather than per architecture.
*/
enum hmm_pfn_flag_e {
@@ -138,13 +139,13 @@ enum hmm_pfn_flag_e {
* be mirrored by a device, because the entry will never have HMM_PFN_VALID
* set and the pfn value is undefined.
*
- * Driver provide entry value for none entry, error entry and special entry,
- * driver can alias (ie use same value for error and special for instance). It
- * should not alias none and error or special.
+ * Driver provides values for none entry, error entry, and special entry.
+ * Driver can alias (i.e., use same value) error and special, but
+ * it should not alias none with error or special.
*
* HMM pfn value returned by hmm_vma_get_pfns() or hmm_vma_fault() will be:
* hmm_range.values[HMM_PFN_ERROR] if CPU page table entry is poisonous,
- * hmm_range.values[HMM_PFN_NONE] if there is no CPU page table
+ * hmm_range.values[HMM_PFN_NONE] if there is no CPU page table entry,
* hmm_range.values[HMM_PFN_SPECIAL] if CPU page table entry is a special one
*/
enum hmm_pfn_value_e {
@@ -167,6 +168,7 @@ enum hmm_pfn_value_e {
* @values: pfn value for some special case (none, special, error, ...)
* @default_flags: default flags for the range (write, read, ... see hmm doc)
* @pfn_flags_mask: allows to mask pfn flags so that only default_flags matter
+ * @page_shift: device virtual address shift value (should be >= PAGE_SHIFT)
* @pfn_shifts: pfn shift value (should be <= PAGE_SHIFT)
* @valid: pfns array did not change since it has been fill by an HMM function
*/
@@ -189,7 +191,7 @@ struct hmm_range {
/*
* hmm_range_page_shift() - return the page shift for the range
* @range: range being queried
- * Returns: page shift (page size = 1 << page shift) for the range
+ * Return: page shift (page size = 1 << page shift) for the range
*/
static inline unsigned hmm_range_page_shift(const struct hmm_range *range)
{
@@ -199,7 +201,7 @@ static inline unsigned hmm_range_page_shift(const struct hmm_range *range)
/*
* hmm_range_page_size() - return the page size for the range
* @range: range being queried
- * Returns: page size for the range in bytes
+ * Return: page size for the range in bytes
*/
static inline unsigned long hmm_range_page_size(const struct hmm_range *range)
{
@@ -210,7 +212,7 @@ static inline unsigned long hmm_range_page_size(const struct hmm_range *range)
* hmm_range_wait_until_valid() - wait for range to be valid
* @range: range affected by invalidation to wait on
* @timeout: time out for wait in ms (ie abort wait after that period of time)
- * Returns: true if the range is valid, false otherwise.
+ * Return: true if the range is valid, false otherwise.
*/
static inline bool hmm_range_wait_until_valid(struct hmm_range *range,
unsigned long timeout)
@@ -231,7 +233,7 @@ static inline bool hmm_range_wait_until_valid(struct hmm_range *range,
/*
* hmm_range_valid() - test if a range is valid or not
* @range: range
- * Returns: true if the range is valid, false otherwise.
+ * Return: true if the range is valid, false otherwise.
*/
static inline bool hmm_range_valid(struct hmm_range *range)
{
@@ -242,7 +244,7 @@ static inline bool hmm_range_valid(struct hmm_range *range)
* hmm_device_entry_to_page() - return struct page pointed to by a device entry
* @range: range use to decode device entry value
* @entry: device entry value to get corresponding struct page from
- * Returns: struct page pointer if entry is a valid, NULL otherwise
+ * Return: struct page pointer if entry is valid, NULL otherwise
*
* If the device entry is valid (ie valid flag set) then return the struct page
* matching the entry value. Otherwise return NULL.
@@ -265,7 +267,7 @@ static inline struct page *hmm_device_entry_to_page(const struct hmm_range *rang
* hmm_device_entry_to_pfn() - return pfn value store in a device entry
* @range: range use to decode device entry value
* @entry: device entry to extract pfn from
- * Returns: pfn value if device entry is valid, -1UL otherwise
+ * Return: pfn value if device entry is valid, -1UL otherwise
*/
static inline unsigned long
hmm_device_entry_to_pfn(const struct hmm_range *range, uint64_t pfn)
@@ -285,7 +287,7 @@ hmm_device_entry_to_pfn(const struct hmm_range *range, uint64_t pfn)
* hmm_device_entry_from_page() - create a valid device entry for a page
* @range: range use to encode HMM pfn value
* @page: page for which to create the device entry
- * Returns: valid device entry for the page
+ * Return: valid device entry for the page
*/
static inline uint64_t hmm_device_entry_from_page(const struct hmm_range *range,
struct page *page)
@@ -298,7 +300,7 @@ static inline uint64_t hmm_device_entry_from_page(const struct hmm_range *range,
* hmm_device_entry_from_pfn() - create a valid device entry value from pfn
* @range: range use to encode HMM pfn value
* @pfn: pfn value for which to create the device entry
- * Returns: valid device entry for the pfn
+ * Return: valid device entry for the pfn
*/
static inline uint64_t hmm_device_entry_from_pfn(const struct hmm_range *range,
unsigned long pfn)
@@ -403,7 +405,7 @@ enum hmm_update_event {
};
/*
- * struct hmm_update - HMM update informations for callback
+ * struct hmm_update - HMM update information for callback
*
* @start: virtual start address of the range to update
* @end: virtual end address of the range to update
@@ -436,8 +438,8 @@ struct hmm_mirror_ops {
/* sync_cpu_device_pagetables() - synchronize page tables
*
* @mirror: pointer to struct hmm_mirror
- * @update: update informations (see struct hmm_update)
- * Returns: -EAGAIN if update.blockable false and callback need to
+ * @update: update information (see struct hmm_update)
+ * Return: -EAGAIN if update.blockable is false and the callback needs to
* block, 0 otherwise.
*
* This callback ultimately originates from mmu_notifiers when the CPU
@@ -476,13 +478,13 @@ void hmm_mirror_unregister(struct hmm_mirror *mirror);
/*
* hmm_mirror_mm_is_alive() - test if mm is still alive
* @mirror: the HMM mm mirror for which we want to lock the mmap_sem
- * Returns: false if the mm is dead, true otherwise
+ * Return: false if the mm is dead, true otherwise
*
- * This is an optimization it will not accurately always return -EINVAL if the
- * mm is dead ie there can be false negative (process is being kill but HMM is
- * not yet inform of that). It is only intented to be use to optimize out case
- * where driver is about to do something time consuming and it would be better
- * to skip it if the mm is dead.
+ * This is an optimization, it will not always accurately return false if the
+ * mm is dead; i.e., there can be false negatives (process is being killed but
+ * HMM is not yet informed of that). It is only intended to be used to optimize
+ * out cases where the driver is about to do something time consuming and it
+ * would be better to skip it if the mm is dead.
*/
static inline bool hmm_mirror_mm_is_alive(struct hmm_mirror *mirror)
{
@@ -497,7 +499,6 @@ static inline bool hmm_mirror_mm_is_alive(struct hmm_mirror *mirror)
return true;
}
-
/*
* Please see Documentation/vm/hmm.rst for how to use the range API.
*/
@@ -570,7 +571,7 @@ static inline int hmm_vma_fault(struct hmm_range *range, bool block)
ret = hmm_range_fault(range, block);
if (ret <= 0) {
if (ret == -EBUSY || !ret) {
- /* Same as above drop mmap_sem to match old API. */
+ /* Same as above, drop mmap_sem to match old API. */
up_read(&range->vma->vm_mm->mmap_sem);
ret = -EBUSY;
} else if (ret == -EAGAIN)
@@ -637,7 +638,7 @@ struct hmm_devmem_ops {
* @page: pointer to struct page backing virtual address (unreliable)
* @flags: FAULT_FLAG_* (see include/linux/mm.h)
* @pmdp: page middle directory
- * Returns: VM_FAULT_MINOR/MAJOR on success or one of VM_FAULT_ERROR
+ * Return: VM_FAULT_MINOR/MAJOR on success or one of VM_FAULT_ERROR
* on error
*
* The callback occurs whenever there is a CPU page fault or GUP on a
@@ -645,14 +646,14 @@ struct hmm_devmem_ops {
* page back to regular memory (CPU accessible).
*
* The device driver is free to migrate more than one page from the
- * fault() callback as an optimization. However if device decide to
- * migrate more than one page it must always priotirize the faulting
+ * fault() callback as an optimization. However, if the device decides
+ * to migrate more than one page it must always prioritize the faulting
* address over the others.
*
- * The struct page pointer is only given as an hint to allow quick
+ * The struct page pointer is only given as a hint to allow quick
* lookup of internal device driver data. A concurrent migration
- * might have already free that page and the virtual address might
- * not longer be back by it. So it should not be modified by the
+ * might have already freed that page and the virtual address might
+ * no longer be backed by it. So it should not be modified by the
* callback.
*
* Note that mmap semaphore is held in read mode at least when this
@@ -679,7 +680,7 @@ struct hmm_devmem_ops {
* @ref: per CPU refcount
* @page_fault: callback when CPU fault on an unaddressable device page
*
- * This an helper structure for device drivers that do not wish to implement
+ * This is a helper structure for device drivers that do not wish to implement
* the gory details related to hotplugging new memoy and allocating struct
* pages.
*
diff --git a/mm/hmm.c b/mm/hmm.c
index 0db8491090b8..f6c4c8633db9 100644
--- a/mm/hmm.c
+++ b/mm/hmm.c
@@ -162,9 +162,8 @@ static void hmm_release(struct mmu_notifier *mn, struct mm_struct *mm)
/* Wake-up everyone waiting on any range. */
mutex_lock(&hmm->lock);
- list_for_each_entry(range, &hmm->ranges, list) {
+ list_for_each_entry(range, &hmm->ranges, list)
range->valid = false;
- }
wake_up_all(&hmm->wq);
mutex_unlock(&hmm->lock);
@@ -175,9 +174,10 @@ static void hmm_release(struct mmu_notifier *mn, struct mm_struct *mm)
list_del_init(&mirror->list);
if (mirror->ops->release) {
/*
- * Drop mirrors_sem so callback can wait on any pending
- * work that might itself trigger mmu_notifier callback
- * and thus would deadlock with us.
+ * Drop mirrors_sem so the release callback can wait
+ * on any pending work that might itself trigger a
+ * mmu_notifier callback and thus would deadlock with
+ * us.
*/
up_write(&hmm->mirrors_sem);
mirror->ops->release(mirror);
@@ -232,11 +232,8 @@ static int hmm_invalidate_range_start(struct mmu_notifier *mn,
int ret;
ret = mirror->ops->sync_cpu_device_pagetables(mirror, &update);
- if (!update.blockable && ret == -EAGAIN) {
- up_read(&hmm->mirrors_sem);
- ret = -EAGAIN;
- goto out;
- }
+ if (!update.blockable && ret == -EAGAIN)
+ break;
}
up_read(&hmm->mirrors_sem);
@@ -280,6 +277,7 @@ static const struct mmu_notifier_ops hmm_mmu_notifier_ops = {
*
* @mirror: new mirror struct to register
* @mm: mm to register against
+ * Return: 0 on success, -ENOMEM if no memory, -EINVAL if invalid arguments
*
* To start mirroring a process address space, the device driver must register
* an HMM mirror struct.
@@ -307,7 +305,7 @@ EXPORT_SYMBOL(hmm_mirror_register);
/*
* hmm_mirror_unregister() - unregister a mirror
*
- * @mirror: new mirror struct to register
+ * @mirror: mirror struct to unregister
*
* Stop mirroring a process address space, and cleanup.
*/
@@ -381,7 +379,7 @@ static int hmm_pfns_bad(unsigned long addr,
* @fault: should we fault or not ?
* @write_fault: write fault ?
* @walk: mm_walk structure
- * Returns: 0 on success, -EBUSY after page fault, or page fault error
+ * Return: 0 on success, -EBUSY after page fault, or page fault error
*
* This function will be called whenever pmd_none() or pte_none() returns true,
* or whenever there is no page directory covering the virtual address range.
@@ -924,6 +922,7 @@ int hmm_range_register(struct hmm_range *range,
unsigned page_shift)
{
unsigned long mask = ((1UL << page_shift) - 1UL);
+ struct hmm *hmm;
range->valid = false;
range->hmm = NULL;
@@ -947,18 +946,18 @@ int hmm_range_register(struct hmm_range *range,
return -EFAULT;
}
- /* Initialize range to track CPU page table update */
+ /* Initialize range to track CPU page table updates. */
mutex_lock(&range->hmm->lock);
- list_add_rcu(&range->list, &range->hmm->ranges);
+ list_add_rcu(&range->list, &hmm->ranges);
/*
* If there are any concurrent notifiers we have to wait for them for
* the range to be valid (see hmm_range_wait_until_valid()).
*/
- if (!range->hmm->notifiers)
+ if (!hmm->notifiers)
range->valid = true;
- mutex_unlock(&range->hmm->lock);
+ mutex_unlock(&hmm->lock);
return 0;
}
@@ -973,17 +972,19 @@ EXPORT_SYMBOL(hmm_range_register);
*/
void hmm_range_unregister(struct hmm_range *range)
{
+ struct hmm *hmm = range->hmm;
+
/* Sanity check this really should not happen. */
- if (range->hmm == NULL || range->end <= range->start)
+ if (hmm == NULL || range->end <= range->start)
return;
- mutex_lock(&range->hmm->lock);
+ mutex_lock(&hmm->lock);
list_del_rcu(&range->list);
- mutex_unlock(&range->hmm->lock);
+ mutex_unlock(&hmm->lock);
/* Drop reference taken by hmm_range_register() */
range->valid = false;
- hmm_put(range->hmm);
+ hmm_put(hmm);
range->hmm = NULL;
}
EXPORT_SYMBOL(hmm_range_unregister);
@@ -991,7 +992,7 @@ EXPORT_SYMBOL(hmm_range_unregister);
/*
* hmm_range_snapshot() - snapshot CPU page table for a range
* @range: range
- * Returns: -EINVAL if invalid argument, -ENOMEM out of memory, -EPERM invalid
+ * Return: -EINVAL if invalid argument, -ENOMEM out of memory, -EPERM invalid
* permission (for instance asking for write and range is read only),
* -EAGAIN if you need to retry, -EFAULT invalid (ie either no valid
* vma or it is illegal to access that range), number of valid pages
@@ -1075,7 +1076,7 @@ EXPORT_SYMBOL(hmm_range_snapshot);
* hmm_range_fault() - try to fault some address in a virtual address range
* @range: range being faulted
* @block: allow blocking on fault (if true it sleeps and do not drop mmap_sem)
- * Returns: number of valid pages in range->pfns[] (from range start
+ * Return: number of valid pages in range->pfns[] (from range start
* address). This may be zero. If the return value is negative,
* then one of the following values may be returned:
*
@@ -1193,7 +1194,7 @@ EXPORT_SYMBOL(hmm_range_fault);
* @device: device against to dma map page to
* @daddrs: dma address of mapped pages
* @block: allow blocking on fault (if true it sleeps and do not drop mmap_sem)
- * Returns: number of pages mapped on success, -EAGAIN if mmap_sem have been
+ * Return: number of pages mapped on success, -EAGAIN if mmap_sem have been
* drop and you need to try again, some other error value otherwise
*
* Note same usage pattern as hmm_range_fault().
@@ -1281,7 +1282,7 @@ EXPORT_SYMBOL(hmm_range_dma_map);
* @device: device against which dma map was done
* @daddrs: dma address of mapped pages
* @dirty: dirty page if it had the write flag set
- * Returns: number of page unmapped on success, -EINVAL otherwise
+ * Return: number of page unmapped on success, -EINVAL otherwise
*
* Note that caller MUST abide by mmu notifier or use HMM mirror and abide
* to the sync_cpu_device_pagetables() callback so that it is safe here to
@@ -1404,7 +1405,7 @@ static void hmm_devmem_free(struct page *page, void *data)
* @ops: memory event device driver callback (see struct hmm_devmem_ops)
* @device: device struct to bind the resource too
* @size: size in bytes of the device memory to add
- * Returns: pointer to new hmm_devmem struct ERR_PTR otherwise
+ * Return: pointer to new hmm_devmem struct ERR_PTR otherwise
*
* This function first finds an empty range of physical address big enough to
* contain the new resource, and then hotplugs it as ZONE_DEVICE memory, which
--
2.20.1
From: Ralph Campbell <[email protected]>
The helper function hmm_vma_fault() calls hmm_range_register() but is
missing a call to hmm_range_unregister() in one of the error paths.
This leads to a reference count leak and ultimately a memory leak on
struct hmm.
Always call hmm_range_unregister() if hmm_range_register() succeeded.
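The leak is easiest to see as an unbalanced reference count. The toy userspace model below (purely illustrative names, not the kernel API) shows the invariant this patch enforces in hmm_vma_fault(): once register succeeds, every exit path pairs it with exactly one unregister.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for the struct hmm reference count: registering a range
 * takes a reference, unregistering drops it. If an error path returns
 * without unregistering, the count never returns to zero and the
 * structure leaks. All names here are illustrative, not kernel API. */
static int toy_hmm_refcount;

static int toy_range_register(void)
{
	toy_hmm_refcount++;	/* reference taken on success */
	return 0;
}

static void toy_range_unregister(void)
{
	toy_hmm_refcount--;	/* reference dropped */
}

/* Fault helper with the fix applied: once toy_range_register() has
 * succeeded, every exit path calls toy_range_unregister(). */
static int toy_vma_fault(bool fault_fails)
{
	if (toy_range_register())
		return -1;	/* register failed: nothing to undo */

	if (fault_fails) {
		/* Before the fix, an early error return like this one
		 * skipped the unregister and leaked a reference. */
		toy_range_unregister();
		return -1;
	}

	toy_range_unregister();
	return 0;
}
```

With the pairing in place, the count comes back to zero whether the fault succeeds or fails; before the fix, each failing call would leave it one higher.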
Signed-off-by: Ralph Campbell <[email protected]>
Cc: John Hubbard <[email protected]>
Cc: Ira Weiny <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Cc: Balbir Singh <[email protected]>
Cc: Dan Carpenter <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Souptick Joarder <[email protected]>
Cc: Andrew Morton <[email protected]>
---
include/linux/hmm.h | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/include/linux/hmm.h b/include/linux/hmm.h
index 35a429621e1e..fa0671d67269 100644
--- a/include/linux/hmm.h
+++ b/include/linux/hmm.h
@@ -559,6 +559,7 @@ static inline int hmm_vma_fault(struct hmm_range *range, bool block)
return (int)ret;
if (!hmm_range_wait_until_valid(range, HMM_RANGE_DEFAULT_TIMEOUT)) {
+ hmm_range_unregister(range);
/*
* The mmap_sem was taken by driver we release it here and
* returns -EAGAIN which correspond to mmap_sem have been
@@ -570,13 +571,13 @@ static inline int hmm_vma_fault(struct hmm_range *range, bool block)
ret = hmm_range_fault(range, block);
if (ret <= 0) {
+ hmm_range_unregister(range);
if (ret == -EBUSY || !ret) {
/* Same as above, drop mmap_sem to match old API. */
up_read(&range->vma->vm_mm->mmap_sem);
ret = -EBUSY;
} else if (ret == -EAGAIN)
ret = -EBUSY;
- hmm_range_unregister(range);
return ret;
}
return 0;
--
2.20.1
From: Ralph Campbell <[email protected]>
In hmm_range_register(), the call to hmm_get_or_create() implies that
hmm_range_register() could be called before hmm_mirror_register() when,
in fact, that would violate the HMM API.
Use mm_get_hmm() instead of hmm_get_or_create() to get the HMM structure.
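The difference between the two helpers can be sketched with a toy userspace model (hypothetical names only, not the kernel implementation): a get-or-create style lookup silently allocates the structure on first use, while a plain get style lookup returns NULL unless something standing in for hmm_mirror_register() created it first.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy stand-in for mm->hmm; all names are illustrative. */
struct toy_hmm {
	int refcount;
};

static struct toy_hmm *toy_mm_hmm;

/* hmm_get_or_create()-style lookup: allocates on first use, so a
 * caller that never registered a mirror still silently gets a
 * structure back. */
static struct toy_hmm *toy_get_or_create(void)
{
	if (!toy_mm_hmm) {
		toy_mm_hmm = calloc(1, sizeof(*toy_mm_hmm));
		if (!toy_mm_hmm)
			return NULL;
	}
	toy_mm_hmm->refcount++;
	return toy_mm_hmm;
}

/* mm_get_hmm()-style lookup: only returns an existing structure, so
 * calling it before "mirror registration" has created one fails
 * instead of papering over the API violation. */
static struct toy_hmm *toy_get(void)
{
	if (!toy_mm_hmm)
		return NULL;
	toy_mm_hmm->refcount++;
	return toy_mm_hmm;
}
```

In the model, toy_get() before any toy_get_or_create() returns NULL, mirroring how mm_get_hmm() makes a missing hmm_mirror_register() visible as an error.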
Signed-off-by: Ralph Campbell <[email protected]>
Cc: John Hubbard <[email protected]>
Cc: Ira Weiny <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Cc: Balbir Singh <[email protected]>
Cc: Dan Carpenter <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Souptick Joarder <[email protected]>
Cc: Andrew Morton <[email protected]>
---
mm/hmm.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/mm/hmm.c b/mm/hmm.c
index f6c4c8633db9..2aa75dbed04a 100644
--- a/mm/hmm.c
+++ b/mm/hmm.c
@@ -936,7 +936,7 @@ int hmm_range_register(struct hmm_range *range,
range->start = start;
range->end = end;
- range->hmm = hmm_get_or_create(mm);
+ range->hmm = mm_get_hmm(mm);
if (!range->hmm)
return -EFAULT;
--
2.20.1
On Tue, May 7, 2019 at 5:00 AM <[email protected]> wrote:
>
> From: Ralph Campbell <[email protected]>
>
> The helper function hmm_vma_fault() calls hmm_range_register() but is
> missing a call to hmm_range_unregister() in one of the error paths.
> This leads to a reference count leak and ultimately a memory leak on
> struct hmm.
>
> Always call hmm_range_unregister() if hmm_range_register() succeeded.
How about *Call hmm_range_unregister() in the error path if
hmm_range_register() succeeded*?
>
> Signed-off-by: Ralph Campbell <[email protected]>
> Cc: John Hubbard <[email protected]>
> Cc: Ira Weiny <[email protected]>
> Cc: Dan Williams <[email protected]>
> Cc: Arnd Bergmann <[email protected]>
> Cc: Balbir Singh <[email protected]>
> Cc: Dan Carpenter <[email protected]>
> Cc: Matthew Wilcox <[email protected]>
> Cc: Souptick Joarder <[email protected]>
> Cc: Andrew Morton <[email protected]>
> ---
> include/linux/hmm.h | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/include/linux/hmm.h b/include/linux/hmm.h
> index 35a429621e1e..fa0671d67269 100644
> --- a/include/linux/hmm.h
> +++ b/include/linux/hmm.h
> @@ -559,6 +559,7 @@ static inline int hmm_vma_fault(struct hmm_range *range, bool block)
> return (int)ret;
>
> if (!hmm_range_wait_until_valid(range, HMM_RANGE_DEFAULT_TIMEOUT)) {
> + hmm_range_unregister(range);
> /*
> * The mmap_sem was taken by driver we release it here and
> * returns -EAGAIN which correspond to mmap_sem have been
> @@ -570,13 +571,13 @@ static inline int hmm_vma_fault(struct hmm_range *range, bool block)
>
> ret = hmm_range_fault(range, block);
> if (ret <= 0) {
> + hmm_range_unregister(range);
what is the reason to move it up?
> if (ret == -EBUSY || !ret) {
> /* Same as above, drop mmap_sem to match old API. */
> up_read(&range->vma->vm_mm->mmap_sem);
> ret = -EBUSY;
> } else if (ret == -EAGAIN)
> ret = -EBUSY;
> - hmm_range_unregister(range);
> return ret;
> }
> return 0;
> --
> 2.20.1
>
On 5/7/19 6:15 AM, Souptick Joarder wrote:
> On Tue, May 7, 2019 at 5:00 AM <[email protected]> wrote:
>>
>> From: Ralph Campbell <[email protected]>
>>
>> The helper function hmm_vma_fault() calls hmm_range_register() but is
>> missing a call to hmm_range_unregister() in one of the error paths.
>> This leads to a reference count leak and ultimately a memory leak on
>> struct hmm.
>>
>> Always call hmm_range_unregister() if hmm_range_register() succeeded.
>
> How about * Call hmm_range_unregister() in error path if
> hmm_range_register() succeeded* ?
Sure, sounds good.
I'll include that in v2.
>>
>> Signed-off-by: Ralph Campbell <[email protected]>
>> Cc: John Hubbard <[email protected]>
>> Cc: Ira Weiny <[email protected]>
>> Cc: Dan Williams <[email protected]>
>> Cc: Arnd Bergmann <[email protected]>
>> Cc: Balbir Singh <[email protected]>
>> Cc: Dan Carpenter <[email protected]>
>> Cc: Matthew Wilcox <[email protected]>
>> Cc: Souptick Joarder <[email protected]>
>> Cc: Andrew Morton <[email protected]>
>> ---
>> include/linux/hmm.h | 3 ++-
>> 1 file changed, 2 insertions(+), 1 deletion(-)
>>
>> diff --git a/include/linux/hmm.h b/include/linux/hmm.h
>> index 35a429621e1e..fa0671d67269 100644
>> --- a/include/linux/hmm.h
>> +++ b/include/linux/hmm.h
>> @@ -559,6 +559,7 @@ static inline int hmm_vma_fault(struct hmm_range *range, bool block)
>> return (int)ret;
>>
>> if (!hmm_range_wait_until_valid(range, HMM_RANGE_DEFAULT_TIMEOUT)) {
>> + hmm_range_unregister(range);
>> /*
>> * The mmap_sem was taken by driver we release it here and
>> * returns -EAGAIN which correspond to mmap_sem have been
>> @@ -570,13 +571,13 @@ static inline int hmm_vma_fault(struct hmm_range *range, bool block)
>>
>> ret = hmm_range_fault(range, block);
>> if (ret <= 0) {
>> + hmm_range_unregister(range);
>
> what is the reason to moved it up ?
I moved it up because the normal calling pattern is:
down_read(&mm->mmap_sem)
hmm_vma_fault()
hmm_range_register()
hmm_range_fault()
hmm_range_unregister()
up_read(&mm->mmap_sem)
I don't think it is a bug to unlock mmap_sem and then unregister,
it is just more consistent nesting.
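That nesting can be made concrete with a toy trace (a userspace sketch with illustrative names, not kernel code): the lock/register pair on the way in is mirrored by an unregister/unlock pair on the way out, so each "close" matches the most recent "open".

```c
#include <assert.h>
#include <string.h>

/* Records the order of events so the nesting is checkable. */
static char toy_trace[128];

static void toy_event(const char *name)
{
	strcat(toy_trace, name);
	strcat(toy_trace, ";");
}

static void toy_down_read(void)   { toy_event("lock"); }
static void toy_up_read(void)     { toy_event("unlock"); }
static void toy_register(void)    { toy_event("register"); }
static void toy_fault(void)       { toy_event("fault"); }
static void toy_unregister(void)  { toy_event("unregister"); }

/* The calling pattern described above: the unregister happens before
 * the unlock, giving consistent nesting of the two open/close pairs. */
static void toy_fault_sequence(void)
{
	toy_down_read();
	toy_register();
	toy_fault();
	toy_unregister();
	toy_up_read();
}
```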
>> if (ret == -EBUSY || !ret) {
>> /* Same as above, drop mmap_sem to match old API. */
>> up_read(&range->vma->vm_mm->mmap_sem);
>> ret = -EBUSY;
>> } else if (ret == -EAGAIN)
>> ret = -EBUSY;
>> - hmm_range_unregister(range);
>> return ret;
>> }
>> return 0;
>> --
>> 2.20.1
>>
On Tue, May 7, 2019 at 11:42 PM Ralph Campbell <[email protected]> wrote:
>
>
> On 5/7/19 6:15 AM, Souptick Joarder wrote:
> > On Tue, May 7, 2019 at 5:00 AM <[email protected]> wrote:
> >>
> >> From: Ralph Campbell <[email protected]>
> >>
> >> The helper function hmm_vma_fault() calls hmm_range_register() but is
> >> missing a call to hmm_range_unregister() in one of the error paths.
> >> This leads to a reference count leak and ultimately a memory leak on
> >> struct hmm.
> >>
> >> Always call hmm_range_unregister() if hmm_range_register() succeeded.
> >
> > How about * Call hmm_range_unregister() in error path if
> > hmm_range_register() succeeded* ?
>
> Sure, sounds good.
> I'll include that in v2.
Thanks.
>
> >>
> >> Signed-off-by: Ralph Campbell <[email protected]>
> >> Cc: John Hubbard <[email protected]>
> >> Cc: Ira Weiny <[email protected]>
> >> Cc: Dan Williams <[email protected]>
> >> Cc: Arnd Bergmann <[email protected]>
> >> Cc: Balbir Singh <[email protected]>
> >> Cc: Dan Carpenter <[email protected]>
> >> Cc: Matthew Wilcox <[email protected]>
> >> Cc: Souptick Joarder <[email protected]>
> >> Cc: Andrew Morton <[email protected]>
> >> ---
> >> include/linux/hmm.h | 3 ++-
> >> 1 file changed, 2 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/include/linux/hmm.h b/include/linux/hmm.h
> >> index 35a429621e1e..fa0671d67269 100644
> >> --- a/include/linux/hmm.h
> >> +++ b/include/linux/hmm.h
> >> @@ -559,6 +559,7 @@ static inline int hmm_vma_fault(struct hmm_range *range, bool block)
> >> return (int)ret;
> >>
> >> if (!hmm_range_wait_until_valid(range, HMM_RANGE_DEFAULT_TIMEOUT)) {
> >> + hmm_range_unregister(range);
> >> /*
> >> * The mmap_sem was taken by driver we release it here and
> >> * returns -EAGAIN which correspond to mmap_sem have been
> >> @@ -570,13 +571,13 @@ static inline int hmm_vma_fault(struct hmm_range *range, bool block)
> >>
> >> ret = hmm_range_fault(range, block);
> >> if (ret <= 0) {
> >> + hmm_range_unregister(range);
> >
> > what is the reason to moved it up ?
>
> I moved it up because the normal calling pattern is:
> down_read(&mm->mmap_sem)
> hmm_vma_fault()
> hmm_range_register()
> hmm_range_fault()
> hmm_range_unregister()
> up_read(&mm->mmap_sem)
>
> I don't think it is a bug to unlock mmap_sem and then unregister,
> it is just more consistent nesting.
Ok. I think adding it to the change log will be helpful :)
>
> >> if (ret == -EBUSY || !ret) {
> >> /* Same as above, drop mmap_sem to match old API. */
> >> up_read(&range->vma->vm_mm->mmap_sem);
> >> ret = -EBUSY;
> >> } else if (ret == -EAGAIN)
> >> ret = -EBUSY;
> >> - hmm_range_unregister(range);
> >> return ret;
> >> }
> >> return 0;
> >> --
> >> 2.20.1
> >>
On Tue, May 07, 2019 at 11:12:14AM -0700, Ralph Campbell wrote:
>
> On 5/7/19 6:15 AM, Souptick Joarder wrote:
> > On Tue, May 7, 2019 at 5:00 AM <[email protected]> wrote:
> > >
> > > From: Ralph Campbell <[email protected]>
> > >
> > > The helper function hmm_vma_fault() calls hmm_range_register() but is
> > > missing a call to hmm_range_unregister() in one of the error paths.
> > > This leads to a reference count leak and ultimately a memory leak on
> > > struct hmm.
> > >
> > > Always call hmm_range_unregister() if hmm_range_register() succeeded.
> >
> > How about * Call hmm_range_unregister() in error path if
> > hmm_range_register() succeeded* ?
>
> Sure, sounds good.
> I'll include that in v2.
NAK for the patch see below why
>
> > >
> > > Signed-off-by: Ralph Campbell <[email protected]>
> > > Cc: John Hubbard <[email protected]>
> > > Cc: Ira Weiny <[email protected]>
> > > Cc: Dan Williams <[email protected]>
> > > Cc: Arnd Bergmann <[email protected]>
> > > Cc: Balbir Singh <[email protected]>
> > > Cc: Dan Carpenter <[email protected]>
> > > Cc: Matthew Wilcox <[email protected]>
> > > Cc: Souptick Joarder <[email protected]>
> > > Cc: Andrew Morton <[email protected]>
> > > ---
> > > include/linux/hmm.h | 3 ++-
> > > 1 file changed, 2 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/include/linux/hmm.h b/include/linux/hmm.h
> > > index 35a429621e1e..fa0671d67269 100644
> > > --- a/include/linux/hmm.h
> > > +++ b/include/linux/hmm.h
> > > @@ -559,6 +559,7 @@ static inline int hmm_vma_fault(struct hmm_range *range, bool block)
> > > return (int)ret;
> > >
> > > if (!hmm_range_wait_until_valid(range, HMM_RANGE_DEFAULT_TIMEOUT)) {
> > > + hmm_range_unregister(range);
> > > /*
> > > * The mmap_sem was taken by driver we release it here and
> > > * returns -EAGAIN which correspond to mmap_sem have been
> > > @@ -570,13 +571,13 @@ static inline int hmm_vma_fault(struct hmm_range *range, bool block)
> > >
> > > ret = hmm_range_fault(range, block);
> > > if (ret <= 0) {
> > > + hmm_range_unregister(range);
> >
> > what is the reason to moved it up ?
>
> I moved it up because the normal calling pattern is:
> down_read(&mm->mmap_sem)
> hmm_vma_fault()
> hmm_range_register()
> hmm_range_fault()
> hmm_range_unregister()
> up_read(&mm->mmap_sem)
>
> I don't think it is a bug to unlock mmap_sem and then unregister,
> it is just more consistent nesting.
So this is not the usage pattern; the HMM usage pattern is:
hmm_range_register()
hmm_range_fault()
hmm_range_unregister()
hmm_vma_fault() is gone, so this patch here breaks things.
See https://cgit.freedesktop.org/~glisse/linux/log/?h=hmm-5.2-v3
On Mon, May 06, 2019 at 04:29:37PM -0700, [email protected] wrote:
> From: Ralph Campbell <[email protected]>
>
> I hit a use after free bug in hmm_free() with KASAN and then couldn't
> stop myself from cleaning up a bunch of documentation and coding style
> changes. So the first two patches are clean ups, the last three are
> the fixes.
>
> Ralph Campbell (5):
> mm/hmm: Update HMM documentation
> mm/hmm: Clean up some coding style and comments
> mm/hmm: Use mm_get_hmm() in hmm_range_register()
> mm/hmm: hmm_vma_fault() doesn't always call hmm_range_unregister()
> mm/hmm: Fix mm stale reference use in hmm_free()
This patchset does not seem to be on top of
https://cgit.freedesktop.org/~glisse/linux/log/?h=hmm-5.2-v3
So here we are out of sync, on documentation and code. If you
have any fix for https://cgit.freedesktop.org/~glisse/linux/log/?h=hmm-5.2-v3
then please submit something on top of that.
Cheers,
Jérôme
>
> Documentation/vm/hmm.rst | 139 ++++++++++++++++++-----------------
> include/linux/hmm.h | 84 ++++++++++------------
> mm/hmm.c | 151 ++++++++++++++++-----------------------
> 3 files changed, 174 insertions(+), 200 deletions(-)
>
> --
> 2.20.1
>
On Sun, May 12, 2019 at 11:07:24AM -0400, Jerome Glisse wrote:
> On Tue, May 07, 2019 at 11:12:14AM -0700, Ralph Campbell wrote:
> >
> > On 5/7/19 6:15 AM, Souptick Joarder wrote:
> > > On Tue, May 7, 2019 at 5:00 AM <[email protected]> wrote:
> > > >
> > > > From: Ralph Campbell <[email protected]>
> > > >
> > > > The helper function hmm_vma_fault() calls hmm_range_register() but is
> > > > missing a call to hmm_range_unregister() in one of the error paths.
> > > > This leads to a reference count leak and ultimately a memory leak on
> > > > struct hmm.
> > > >
> > > > Always call hmm_range_unregister() if hmm_range_register() succeeded.
> > >
> > > How about * Call hmm_range_unregister() in error path if
> > > hmm_range_register() succeeded* ?
> >
> > Sure, sounds good.
> > I'll include that in v2.
>
> NAK for the patch see below why
>
> >
> > > >
> > > > Signed-off-by: Ralph Campbell <[email protected]>
> > > > Cc: John Hubbard <[email protected]>
> > > > Cc: Ira Weiny <[email protected]>
> > > > Cc: Dan Williams <[email protected]>
> > > > Cc: Arnd Bergmann <[email protected]>
> > > > Cc: Balbir Singh <[email protected]>
> > > > Cc: Dan Carpenter <[email protected]>
> > > > Cc: Matthew Wilcox <[email protected]>
> > > > Cc: Souptick Joarder <[email protected]>
> > > > Cc: Andrew Morton <[email protected]>
> > > > ---
> > > > include/linux/hmm.h | 3 ++-
> > > > 1 file changed, 2 insertions(+), 1 deletion(-)
> > > >
> > > > diff --git a/include/linux/hmm.h b/include/linux/hmm.h
> > > > index 35a429621e1e..fa0671d67269 100644
> > > > --- a/include/linux/hmm.h
> > > > +++ b/include/linux/hmm.h
> > > > @@ -559,6 +559,7 @@ static inline int hmm_vma_fault(struct hmm_range *range, bool block)
> > > > return (int)ret;
> > > >
> > > > if (!hmm_range_wait_until_valid(range, HMM_RANGE_DEFAULT_TIMEOUT)) {
> > > > + hmm_range_unregister(range);
> > > > /*
> > > > * The mmap_sem was taken by driver we release it here and
> > > > * returns -EAGAIN which correspond to mmap_sem have been
> > > > @@ -570,13 +571,13 @@ static inline int hmm_vma_fault(struct hmm_range *range, bool block)
> > > >
> > > > ret = hmm_range_fault(range, block);
> > > > if (ret <= 0) {
> > > > + hmm_range_unregister(range);
> > >
> > > what is the reason to moved it up ?
> >
> > I moved it up because the normal calling pattern is:
> > down_read(&mm->mmap_sem)
> > hmm_vma_fault()
> > hmm_range_register()
> > hmm_range_fault()
> > hmm_range_unregister()
> > up_read(&mm->mmap_sem)
> >
> > I don't think it is a bug to unlock mmap_sem and then unregister,
> > it is just more consistent nesting.
>
> So this is not the usage pattern with HMM usage pattern is:
>
> hmm_range_register()
> hmm_range_fault()
> hmm_range_unregister()
>
> The hmm_vma_fault() is gonne so this patch here break thing.
>
> See https://cgit.freedesktop.org/~glisse/linux/log/?h=hmm-5.2-v3
Sorry, not enough coffee on Sunday morning. So yeah, this patch
looks good, except that you do not need to move it up.
Note that hmm_vma_fault() is a goner once ODP to HMM is upstream
and I have converted nouveau/amd to the new API; then we can remove
that one.
Cheers,
Jérôme
On 5/12/19 8:07 AM, Jerome Glisse wrote:
> On Tue, May 07, 2019 at 11:12:14AM -0700, Ralph Campbell wrote:
>>
>> On 5/7/19 6:15 AM, Souptick Joarder wrote:
>>> On Tue, May 7, 2019 at 5:00 AM <[email protected]> wrote:
>>>>
>>>> From: Ralph Campbell <[email protected]>
>>>>
>>>> The helper function hmm_vma_fault() calls hmm_range_register() but is
>>>> missing a call to hmm_range_unregister() in one of the error paths.
>>>> This leads to a reference count leak and ultimately a memory leak on
>>>> struct hmm.
>>>>
>>>> Always call hmm_range_unregister() if hmm_range_register() succeeded.
>>>
>>> How about * Call hmm_range_unregister() in error path if
>>> hmm_range_register() succeeded* ?
>>
>> Sure, sounds good.
>> I'll include that in v2.
>
> NAK for the patch see below why
>
>>
>>>>
>>>> Signed-off-by: Ralph Campbell <[email protected]>
>>>> Cc: John Hubbard <[email protected]>
>>>> Cc: Ira Weiny <[email protected]>
>>>> Cc: Dan Williams <[email protected]>
>>>> Cc: Arnd Bergmann <[email protected]>
>>>> Cc: Balbir Singh <[email protected]>
>>>> Cc: Dan Carpenter <[email protected]>
>>>> Cc: Matthew Wilcox <[email protected]>
>>>> Cc: Souptick Joarder <[email protected]>
>>>> Cc: Andrew Morton <[email protected]>
>>>> ---
>>>> include/linux/hmm.h | 3 ++-
>>>> 1 file changed, 2 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/include/linux/hmm.h b/include/linux/hmm.h
>>>> index 35a429621e1e..fa0671d67269 100644
>>>> --- a/include/linux/hmm.h
>>>> +++ b/include/linux/hmm.h
>>>> @@ -559,6 +559,7 @@ static inline int hmm_vma_fault(struct hmm_range *range, bool block)
>>>> return (int)ret;
>>>>
>>>> if (!hmm_range_wait_until_valid(range, HMM_RANGE_DEFAULT_TIMEOUT)) {
>>>> + hmm_range_unregister(range);
>>>> /*
>>>> * The mmap_sem was taken by driver we release it here and
>>>> * returns -EAGAIN which correspond to mmap_sem have been
>>>> @@ -570,13 +571,13 @@ static inline int hmm_vma_fault(struct hmm_range *range, bool block)
>>>>
>>>> ret = hmm_range_fault(range, block);
>>>> if (ret <= 0) {
>>>> + hmm_range_unregister(range);
>>>
>>> what is the reason to moved it up ?
>>
>> I moved it up because the normal calling pattern is:
>> down_read(&mm->mmap_sem)
>> hmm_vma_fault()
>> hmm_range_register()
>> hmm_range_fault()
>> hmm_range_unregister()
>> up_read(&mm->mmap_sem)
>>
>> I don't think it is a bug to unlock mmap_sem and then unregister,
>> it is just more consistent nesting.
>
> So this is not the usage pattern with HMM usage pattern is:
>
> hmm_range_register()
> hmm_range_fault()
> hmm_range_unregister()
>
> The hmm_vma_fault() is gonne so this patch here break thing.
>
> See https://cgit.freedesktop.org/~glisse/linux/log/?h=hmm-5.2-v3
The patch series is on top of v5.1-rc6-mmotm-2019-04-25-16-30.
hmm_vma_fault() is defined there and in your hmm-5.2-v3 branch as
a backward compatibility transition function in include/linux/hmm.h.
So I agree the new API is to use hmm_range_register(), etc.
This is intended to cover the transition period.
Note that hmm_vma_fault() is being called from
drivers/gpu/drm/nouveau/nouveau_svm.c in both trees.
On 5/12/19 8:08 AM, Jerome Glisse wrote:
> On Mon, May 06, 2019 at 04:29:37PM -0700, [email protected] wrote:
>> From: Ralph Campbell <[email protected]>
>>
>> I hit a use after free bug in hmm_free() with KASAN and then couldn't
>> stop myself from cleaning up a bunch of documentation and coding style
>> changes. So the first two patches are clean ups, the last three are
>> the fixes.
>>
>> Ralph Campbell (5):
>> mm/hmm: Update HMM documentation
>> mm/hmm: Clean up some coding style and comments
>> mm/hmm: Use mm_get_hmm() in hmm_range_register()
>> mm/hmm: hmm_vma_fault() doesn't always call hmm_range_unregister()
>> mm/hmm: Fix mm stale reference use in hmm_free()
>
> This patchset does not seems to be on top of
> https://cgit.freedesktop.org/~glisse/linux/log/?h=hmm-5.2-v3
>
> So here we are out of sync, on documentation and code. If you
> have any fix for https://cgit.freedesktop.org/~glisse/linux/log/?h=hmm-5.2-v3
> then please submit something on top of that.
>
> Cheers,
> Jérôme
>
>>
>> Documentation/vm/hmm.rst | 139 ++++++++++++++++++-----------------
>> include/linux/hmm.h | 84 ++++++++++------------
>> mm/hmm.c | 151 ++++++++++++++++-----------------------
>> 3 files changed, 174 insertions(+), 200 deletions(-)
>>
>> --
>> 2.20.1
The patches are based on top of Andrew's mmotm tree
git://git.cmpxchg.org/linux-mmotm.git v5.1-rc6-mmotm-2019-04-25-16-30.
They apply cleanly to that git tag as well as your hmm-5.2-v3 branch
so I guess I am confused where we are out of sync.
On Mon, May 13, 2019 at 10:26:59AM -0700, Ralph Campbell wrote:
>
>
> On 5/12/19 8:08 AM, Jerome Glisse wrote:
> > On Mon, May 06, 2019 at 04:29:37PM -0700, [email protected] wrote:
> > > From: Ralph Campbell <[email protected]>
> > >
> > > I hit a use after free bug in hmm_free() with KASAN and then couldn't
> > > stop myself from cleaning up a bunch of documentation and coding style
> > > changes. So the first two patches are clean ups, the last three are
> > > the fixes.
> > >
> > > Ralph Campbell (5):
> > > mm/hmm: Update HMM documentation
> > > mm/hmm: Clean up some coding style and comments
> > > mm/hmm: Use mm_get_hmm() in hmm_range_register()
> > > mm/hmm: hmm_vma_fault() doesn't always call hmm_range_unregister()
> > > mm/hmm: Fix mm stale reference use in hmm_free()
> >
> > This patchset does not seems to be on top of
> > https://cgit.freedesktop.org/~glisse/linux/log/?h=hmm-5.2-v3
> >
> > So here we are out of sync, on documentation and code. If you
> > have any fix for https://cgit.freedesktop.org/~glisse/linux/log/?h=hmm-5.2-v3
> > then please submit something on top of that.
> >
> > Cheers,
> > Jérôme
> >
> > >
> > > Documentation/vm/hmm.rst | 139 ++++++++++++++++++-----------------
> > > include/linux/hmm.h | 84 ++++++++++------------
> > > mm/hmm.c | 151 ++++++++++++++++-----------------------
> > > 3 files changed, 174 insertions(+), 200 deletions(-)
> > >
> > > --
> > > 2.20.1
>
> The patches are based on top of Andrew's mmotm tree
> git://git.cmpxchg.org/linux-mmotm.git v5.1-rc6-mmotm-2019-04-25-16-30.
> They apply cleanly to that git tag as well as your hmm-5.2-v3 branch
> so I guess I am confused where we are out of sync.
No, disregard my email. I was trying to apply on top of the wrong
branch yesterday morning while catching up on a big backlog of
email. The failure was on my side.
Cheers,
Jérôme
On Mon, May 06, 2019 at 04:29:40PM -0700, [email protected] wrote:
> From: Ralph Campbell <[email protected]>
>
> In hmm_range_register(), the call to hmm_get_or_create() implies that
> hmm_range_register() could be called before hmm_mirror_register() when
> in fact, that would violate the HMM API.
>
> Use mm_get_hmm() instead of hmm_get_or_create() to get the HMM structure.
>
> Signed-off-by: Ralph Campbell <[email protected]>
> Cc: John Hubbard <[email protected]>
> Cc: Ira Weiny <[email protected]>
> Cc: Dan Williams <[email protected]>
> Cc: Arnd Bergmann <[email protected]>
> Cc: Balbir Singh <[email protected]>
> Cc: Dan Carpenter <[email protected]>
> Cc: Matthew Wilcox <[email protected]>
> Cc: Souptick Joarder <[email protected]>
> Cc: Andrew Morton <[email protected]>
> mm/hmm.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/mm/hmm.c b/mm/hmm.c
> index f6c4c8633db9..2aa75dbed04a 100644
> +++ b/mm/hmm.c
> @@ -936,7 +936,7 @@ int hmm_range_register(struct hmm_range *range,
> range->start = start;
> range->end = end;
>
> - range->hmm = hmm_get_or_create(mm);
> + range->hmm = mm_get_hmm(mm);
> if (!range->hmm)
> return -EFAULT;
I looked for documentation saying that hmm_range_register should only
be done inside a hmm_mirror_register and didn't see it. Did I miss it?
Can you add a comment?
It is really good to fix this because it means we can rely on mmap sem
to manage mm->hmm!
If this is true then I also think we should change the signature of
the function to make this dependency relationship clear, and remove
some possible confusing edge cases.
What do you think about something like this? (unfinished)
commit 29098bd59cf481ad1915db40aefc8435dabb8b28
Author: Jason Gunthorpe <[email protected]>
Date: Thu May 23 09:41:19 2019 -0300
mm/hmm: Use hmm_mirror, not mm, as an argument for hmm_range_register
Ralph observes that hmm_range_register() can only be called by a driver
while a mirror is registered. Make this clear in the API by passing
in the mirror structure as a parameter.
This also simplifies understanding the lifetime model for struct hmm,
as the hmm pointer must be valid as part of a registered mirror,
so all we need in hmm_range_register() is a simple kref_get.
Suggested-by: Ralph Campbell <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
diff --git a/include/linux/hmm.h b/include/linux/hmm.h
index 8b91c90d3b88cb..87d29e085a69f7 100644
--- a/include/linux/hmm.h
+++ b/include/linux/hmm.h
@@ -503,7 +503,7 @@ static inline bool hmm_mirror_mm_is_alive(struct hmm_mirror *mirror)
* Please see Documentation/vm/hmm.rst for how to use the range API.
*/
int hmm_range_register(struct hmm_range *range,
- struct mm_struct *mm,
+ struct hmm_mirror *mirror,
unsigned long start,
unsigned long end,
unsigned page_shift);
@@ -539,7 +539,8 @@ static inline bool hmm_vma_range_done(struct hmm_range *range)
}
/* This is a temporary helper to avoid merge conflict between trees. */
-static inline int hmm_vma_fault(struct hmm_range *range, bool block)
+static inline int hmm_vma_fault(struct hmm_mirror *mirror,
+ struct hmm_range *range, bool block)
{
long ret;
@@ -552,7 +553,7 @@ static inline int hmm_vma_fault(struct hmm_range *range, bool block)
range->default_flags = 0;
range->pfn_flags_mask = -1UL;
- ret = hmm_range_register(range, range->vma->vm_mm,
+ ret = hmm_range_register(range, mirror,
range->start, range->end,
PAGE_SHIFT);
if (ret)
diff --git a/mm/hmm.c b/mm/hmm.c
index 824e7e160d8167..fa1b04fcfc2549 100644
--- a/mm/hmm.c
+++ b/mm/hmm.c
@@ -927,7 +927,7 @@ static void hmm_pfns_clear(struct hmm_range *range,
* Track updates to the CPU page table see include/linux/hmm.h
*/
int hmm_range_register(struct hmm_range *range,
- struct mm_struct *mm,
+ struct hmm_mirror *mirror,
unsigned long start,
unsigned long end,
unsigned page_shift)
@@ -935,7 +935,6 @@ int hmm_range_register(struct hmm_range *range,
unsigned long mask = ((1UL << page_shift) - 1UL);
range->valid = false;
- range->hmm = NULL;
if ((start & mask) || (end & mask))
return -EINVAL;
@@ -946,15 +945,12 @@ int hmm_range_register(struct hmm_range *range,
range->start = start;
range->end = end;
- range->hmm = hmm_get_or_create(mm);
- if (!range->hmm)
- return -EFAULT;
-
/* Check if hmm_mm_destroy() was call. */
- if (range->hmm->mm == NULL || range->hmm->dead) {
- hmm_put(range->hmm);
+ if (mirror->hmm->mm == NULL || mirror->hmm->dead)
return -EFAULT;
- }
+
+ range->hmm = mirror->hmm;
+ kref_get(&range->hmm->kref);
/* Initialize range to track CPU page table update */
mutex_lock(&range->hmm->lock);
On 5/23/19 5:51 AM, Jason Gunthorpe wrote:
> On Mon, May 06, 2019 at 04:29:40PM -0700, [email protected] wrote:
>> From: Ralph Campbell <[email protected]>
>>
>> In hmm_range_register(), the call to hmm_get_or_create() implies that
>> hmm_range_register() could be called before hmm_mirror_register() when
>> in fact, that would violate the HMM API.
>>
>> Use mm_get_hmm() instead of hmm_get_or_create() to get the HMM structure.
>>
>> Signed-off-by: Ralph Campbell <[email protected]>
>> Cc: John Hubbard <[email protected]>
>> Cc: Ira Weiny <[email protected]>
>> Cc: Dan Williams <[email protected]>
>> Cc: Arnd Bergmann <[email protected]>
>> Cc: Balbir Singh <[email protected]>
>> Cc: Dan Carpenter <[email protected]>
>> Cc: Matthew Wilcox <[email protected]>
>> Cc: Souptick Joarder <[email protected]>
>> Cc: Andrew Morton <[email protected]>
>> mm/hmm.c | 2 +-
>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/mm/hmm.c b/mm/hmm.c
>> index f6c4c8633db9..2aa75dbed04a 100644
>> +++ b/mm/hmm.c
>> @@ -936,7 +936,7 @@ int hmm_range_register(struct hmm_range *range,
>> range->start = start;
>> range->end = end;
>>
>> - range->hmm = hmm_get_or_create(mm);
>> + range->hmm = mm_get_hmm(mm);
>> if (!range->hmm)
>> return -EFAULT;
>
> I looked for documentation saying that hmm_range_register should only
> be done inside a hmm_mirror_register and didn't see it. Did I miss it?
Yes, hmm_mirror_register() has to be called before hmm_range_register().
Look at Documentation/vm/hmm.rst for "Address space mirroring
implementation and API".
> Can you add a comment?
Sure, although I think your idea to pass hmm_mirror to
hmm_range_register() makes sense and makes it clear.
> It is really good to fix this because it means we can rely on mmap sem
> to manage mm->hmm!
>
> If this is true then I also think we should change the signature of
> the function to make this dependency relationship clear, and remove
> some possible confusing edge cases.
>
> What do you think about something like this? (unfinished)
>
> commit 29098bd59cf481ad1915db40aefc8435dabb8b28
> Author: Jason Gunthorpe <[email protected]>
> Date: Thu May 23 09:41:19 2019 -0300
>
> mm/hmm: Use hmm_mirror not mm as an argument for hmm_register_range
>
> Ralf observes that hmm_register_range() can only be called by a driver
> while a mirror is registered. Make this clear in the API by passing
> in the mirror structure as a parameter.
>
> This also simplifies understanding the lifetime model for struct hmm,
> as the hmm pointer must be valid as part of a registered mirror
> so all we need in hmm_register_range() is a simple kref_get.
>
> Suggested-by: Ralph Campbell <[email protected]>
> Signed-off-by: Jason Gunthorpe <[email protected]>
>
> diff --git a/include/linux/hmm.h b/include/linux/hmm.h
> index 8b91c90d3b88cb..87d29e085a69f7 100644
> --- a/include/linux/hmm.h
> +++ b/include/linux/hmm.h
> @@ -503,7 +503,7 @@ static inline bool hmm_mirror_mm_is_alive(struct hmm_mirror *mirror)
> * Please see Documentation/vm/hmm.rst for how to use the range API.
> */
> int hmm_range_register(struct hmm_range *range,
> - struct mm_struct *mm,
> + struct hmm_mirror *mirror,
> unsigned long start,
> unsigned long end,
> unsigned page_shift);
> @@ -539,7 +539,8 @@ static inline bool hmm_vma_range_done(struct hmm_range *range)
> }
>
> /* This is a temporary helper to avoid merge conflict between trees. */
> -static inline int hmm_vma_fault(struct hmm_range *range, bool block)
> +static inline int hmm_vma_fault(struct hmm_mirror *mirror,
> + struct hmm_range *range, bool block)
> {
> long ret;
>
> @@ -552,7 +553,7 @@ static inline int hmm_vma_fault(struct hmm_range *range, bool block)
> range->default_flags = 0;
> range->pfn_flags_mask = -1UL;
>
> - ret = hmm_range_register(range, range->vma->vm_mm,
> + ret = hmm_range_register(range, mirror,
> range->start, range->end,
> PAGE_SHIFT);
> if (ret)
> diff --git a/mm/hmm.c b/mm/hmm.c
> index 824e7e160d8167..fa1b04fcfc2549 100644
> --- a/mm/hmm.c
> +++ b/mm/hmm.c
> @@ -927,7 +927,7 @@ static void hmm_pfns_clear(struct hmm_range *range,
> * Track updates to the CPU page table see include/linux/hmm.h
> */
> int hmm_range_register(struct hmm_range *range,
> - struct mm_struct *mm,
> + struct hmm_mirror *mirror,
> unsigned long start,
> unsigned long end,
> unsigned page_shift)
> @@ -935,7 +935,6 @@ int hmm_range_register(struct hmm_range *range,
> unsigned long mask = ((1UL << page_shift) - 1UL);
>
> range->valid = false;
> - range->hmm = NULL;
>
> if ((start & mask) || (end & mask))
> return -EINVAL;
> @@ -946,15 +945,12 @@ int hmm_range_register(struct hmm_range *range,
> range->start = start;
> range->end = end;
>
> - range->hmm = hmm_get_or_create(mm);
> - if (!range->hmm)
> - return -EFAULT;
> -
> /* Check if hmm_mm_destroy() was call. */
> - if (range->hmm->mm == NULL || range->hmm->dead) {
> - hmm_put(range->hmm);
> + if (mirror->hmm->mm == NULL || mirror->hmm->dead)
> return -EFAULT;
> - }
> +
> + range->hmm = mirror->hmm;
> + kref_get(&range->hmm->kref);
>
> /* Initialize range to track CPU page table update */
> mutex_lock(&range->hmm->lock);
>
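The call-ordering constraint and the simplified lifetime model in the patch above (a range may only be registered while a mirror is registered, so taking a reference is just a kref_get) can be sketched in plain userspace C. All `fake_*` names below are hypothetical stand-ins for illustration, not part of the HMM API:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for struct hmm, hmm_mirror, hmm_range. */
struct fake_hmm { int kref; int dead; };
struct fake_mirror { struct fake_hmm *hmm; };
struct fake_range { struct fake_hmm *hmm; };

static struct fake_hmm g_hmm;

/* Models hmm_mirror_register(): creates/attaches the hmm with one reference. */
static int fake_mirror_register(struct fake_mirror *m)
{
	g_hmm.kref = 1;
	g_hmm.dead = 0;
	m->hmm = &g_hmm;
	return 0;
}

/* Models the reworked hmm_range_register(): because the caller must hold a
 * registered mirror, mirror->hmm is known valid and a plain reference get
 * replaces the old hmm_get_or_create() lookup from the mm. */
static int fake_range_register(struct fake_range *r, struct fake_mirror *m)
{
	if (m->hmm == NULL || m->hmm->dead)
		return -1;		/* -EFAULT in the real API */
	r->hmm = m->hmm;
	r->hmm->kref++;			/* kref_get(&range->hmm->kref) */
	return 0;
}
```

Registering a range before the mirror is rejected outright, which is exactly the API violation the signature change makes impossible to express.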
On 5/23/19 5:51 AM, Jason Gunthorpe wrote:
> On Mon, May 06, 2019 at 04:29:40PM -0700, [email protected] wrote:
>> From: Ralph Campbell <[email protected]>
>>
>> In hmm_range_register(), the call to hmm_get_or_create() implies that
>> hmm_range_register() could be called before hmm_mirror_register() when
>> in fact, that would violate the HMM API.
>>
>> Use mm_get_hmm() instead of hmm_get_or_create() to get the HMM structure.
>>
>> Signed-off-by: Ralph Campbell <[email protected]>
>> Cc: John Hubbard <[email protected]>
>> Cc: Ira Weiny <[email protected]>
>> Cc: Dan Williams <[email protected]>
>> Cc: Arnd Bergmann <[email protected]>
>> Cc: Balbir Singh <[email protected]>
>> Cc: Dan Carpenter <[email protected]>
>> Cc: Matthew Wilcox <[email protected]>
>> Cc: Souptick Joarder <[email protected]>
>> Cc: Andrew Morton <[email protected]>
>> mm/hmm.c | 2 +-
>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/mm/hmm.c b/mm/hmm.c
>> index f6c4c8633db9..2aa75dbed04a 100644
>> +++ b/mm/hmm.c
>> @@ -936,7 +936,7 @@ int hmm_range_register(struct hmm_range *range,
>> range->start = start;
>> range->end = end;
>>
>> - range->hmm = hmm_get_or_create(mm);
>> + range->hmm = mm_get_hmm(mm);
>> if (!range->hmm)
>> return -EFAULT;
>
> I looked for documentation saying that hmm_range_register should only
> be done inside a hmm_mirror_register and didn't see it. Did I miss it?
> Can you add a comment?
>
> It is really good to fix this because it means we can rely on mmap sem
> to manage mm->hmm!
>
> If this is true then I also think we should change the signature of
> the function to make this dependency relationship clear, and remove
> some possible confusing edge cases.
>
> What do you think about something like this? (unfinished)
I like it...
>
> commit 29098bd59cf481ad1915db40aefc8435dabb8b28
> Author: Jason Gunthorpe <[email protected]>
> Date: Thu May 23 09:41:19 2019 -0300
>
> mm/hmm: Use hmm_mirror not mm as an argument for hmm_register_range
>
> Ralf observes that hmm_register_range() can only be called by a driver
^Ralph
:)
> while a mirror is registered. Make this clear in the API by passing
> in the mirror structure as a parameter.
>
> This also simplifies understanding the lifetime model for struct hmm,
> as the hmm pointer must be valid as part of a registered mirror
> so all we need in hmm_register_range() is a simple kref_get.
>
> Suggested-by: Ralph Campbell <[email protected]>
> Signed-off-by: Jason Gunthorpe <[email protected]>
>
> diff --git a/include/linux/hmm.h b/include/linux/hmm.h
> index 8b91c90d3b88cb..87d29e085a69f7 100644
> --- a/include/linux/hmm.h
> +++ b/include/linux/hmm.h
> @@ -503,7 +503,7 @@ static inline bool hmm_mirror_mm_is_alive(struct hmm_mirror *mirror)
> * Please see Documentation/vm/hmm.rst for how to use the range API.
> */
> int hmm_range_register(struct hmm_range *range,
> - struct mm_struct *mm,
> + struct hmm_mirror *mirror,
> unsigned long start,
> unsigned long end,
> unsigned page_shift);
> @@ -539,7 +539,8 @@ static inline bool hmm_vma_range_done(struct hmm_range *range)
> }
>
> /* This is a temporary helper to avoid merge conflict between trees. */
> -static inline int hmm_vma_fault(struct hmm_range *range, bool block)
> +static inline int hmm_vma_fault(struct hmm_mirror *mirror,
> + struct hmm_range *range, bool block)
> {
> long ret;
>
> @@ -552,7 +553,7 @@ static inline int hmm_vma_fault(struct hmm_range *range, bool block)
> range->default_flags = 0;
> range->pfn_flags_mask = -1UL;
>
> - ret = hmm_range_register(range, range->vma->vm_mm,
> + ret = hmm_range_register(range, mirror,
> range->start, range->end,
> PAGE_SHIFT);
> if (ret)
> diff --git a/mm/hmm.c b/mm/hmm.c
> index 824e7e160d8167..fa1b04fcfc2549 100644
> --- a/mm/hmm.c
> +++ b/mm/hmm.c
> @@ -927,7 +927,7 @@ static void hmm_pfns_clear(struct hmm_range *range,
> * Track updates to the CPU page table see include/linux/hmm.h
> */
> int hmm_range_register(struct hmm_range *range,
> - struct mm_struct *mm,
> + struct hmm_mirror *mirror,
> unsigned long start,
> unsigned long end,
> unsigned page_shift)
> @@ -935,7 +935,6 @@ int hmm_range_register(struct hmm_range *range,
> unsigned long mask = ((1UL << page_shift) - 1UL);
>
> range->valid = false;
> - range->hmm = NULL;
>
> if ((start & mask) || (end & mask))
> return -EINVAL;
> @@ -946,15 +945,12 @@ int hmm_range_register(struct hmm_range *range,
> range->start = start;
> range->end = end;
>
> - range->hmm = hmm_get_or_create(mm);
> - if (!range->hmm)
> - return -EFAULT;
> -
> /* Check if hmm_mm_destroy() was call. */
> - if (range->hmm->mm == NULL || range->hmm->dead) {
> - hmm_put(range->hmm);
> + if (mirror->hmm->mm == NULL || mirror->hmm->dead)
> return -EFAULT;
> - }
> +
> + range->hmm = mirror->hmm;
> + kref_get(&range->hmm->kref);
>
> /* Initialize range to track CPU page table update */
> mutex_lock(&range->hmm->lock);
>
So far, this looks very good to me. Passing the mirror around is an
elegant API solution to the "we must have a valid mirror in order to
call this function" constraint.
thanks,
--
John Hubbard
NVIDIA
On Mon, May 06, 2019 at 04:29:38PM -0700, [email protected] wrote:
> From: Ralph Campbell <[email protected]>
>
> Update the HMM documentation to reflect the latest API and make a few minor
> wording changes.
>
> Signed-off-by: Ralph Campbell <[email protected]>
> Reviewed-by: Jérôme Glisse <[email protected]>
> Cc: John Hubbard <[email protected]>
> Cc: Ira Weiny <[email protected]>
> Cc: Dan Williams <[email protected]>
> Cc: Arnd Bergmann <[email protected]>
> Cc: Balbir Singh <[email protected]>
> Cc: Dan Carpenter <[email protected]>
> Cc: Matthew Wilcox <[email protected]>
> Cc: Souptick Joarder <[email protected]>
> Cc: Andrew Morton <[email protected]>
> Documentation/vm/hmm.rst | 139 ++++++++++++++++++++-------------------
> 1 file changed, 73 insertions(+), 66 deletions(-)
Okay, let's start picking up hmm patches into the new shared hmm.git,
as promised I will take responsibility to send these to Linus. The
tree is here:
https://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma.git/log/?h=hmm
This looks fine to me with one minor comment:
> diff --git a/Documentation/vm/hmm.rst b/Documentation/vm/hmm.rst
> index ec1efa32af3c..7c1e929931a0 100644
> +++ b/Documentation/vm/hmm.rst
>
> @@ -151,21 +151,27 @@ registration of an hmm_mirror struct::
>
> int hmm_mirror_register(struct hmm_mirror *mirror,
> struct mm_struct *mm);
> - int hmm_mirror_register_locked(struct hmm_mirror *mirror,
> - struct mm_struct *mm);
>
> -
> -The locked variant is to be used when the driver is already holding mmap_sem
> -of the mm in write mode. The mirror struct has a set of callbacks that are used
> +The mirror struct has a set of callbacks that are used
> to propagate CPU page tables::
>
> struct hmm_mirror_ops {
> + /* release() - release hmm_mirror
> + *
> + * @mirror: pointer to struct hmm_mirror
> + *
> + * This is called when the mm_struct is being released.
> + * The callback should make sure no references to the mirror occur
> + * after the callback returns.
> + */
This is not quite accurate (at least, as the other series I sent
intends): the struct hmm_mirror is valid up until
hmm_mirror_unregister() is called - specifically, it remains valid
after the release() callback.
I will revise it (and the hmm.h comment it came from) to read the
below. Please let me know if you'd like something else:
/* release() - release hmm_mirror
*
* @mirror: pointer to struct hmm_mirror
*
* This is called when the mm_struct is being released. The callback
* must ensure that all access to any pages obtained from this mirror
* is halted before the callback returns. All future access should
* fault.
*/
The key task for release is to fence off all device access to any
related pages as the mm is about to recycle them and the device must
not cause a use-after-free.
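That fencing contract can be modelled in a few lines of userspace C. This is a hedged sketch of the release() semantics described above, not driver code; `fake_device`, `fake_release`, and `fake_device_access` are hypothetical names:

```c
#include <assert.h>
#include <stdbool.h>

/* Model of the release() contract: once the mm is being torn down, the
 * driver must fence off all device access to mirrored pages so the device
 * cannot cause a use-after-free. */
struct fake_device {
	bool fenced;	/* set by release(): no further page access allowed */
};

/* Models the hmm_mirror_ops release() callback: fence the device so that
 * all future access faults instead of touching recycled pages. */
static void fake_release(struct fake_device *dev)
{
	dev->fenced = true;
}

/* Returns 0 on success, -1 if the access faults because the mirror's mm
 * has been released. */
static int fake_device_access(struct fake_device *dev)
{
	if (dev->fenced)
		return -1;
	return 0;
}
```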
I applied it to hmm.git
Thanks,
Jason
On Mon, May 06, 2019 at 04:29:39PM -0700, [email protected] wrote:
> From: Ralph Campbell <[email protected]>
>
> There are no functional changes, just some coding style clean ups and
> minor comment changes.
>
> Signed-off-by: Ralph Campbell <[email protected]>
> Reviewed-by: Jérôme Glisse <[email protected]>
> Cc: John Hubbard <[email protected]>
> Cc: Ira Weiny <[email protected]>
> Cc: Dan Williams <[email protected]>
> Cc: Arnd Bergmann <[email protected]>
> Cc: Balbir Singh <[email protected]>
> Cc: Dan Carpenter <[email protected]>
> Cc: Matthew Wilcox <[email protected]>
> Cc: Souptick Joarder <[email protected]>
> Cc: Andrew Morton <[email protected]>
> include/linux/hmm.h | 71 +++++++++++++++++++++++----------------------
> mm/hmm.c | 51 ++++++++++++++++----------------
> 2 files changed, 62 insertions(+), 60 deletions(-)
Applied to hmm.git, thanks
Jason
On Thu, Jun 06, 2019 at 11:16:44AM -0300, Jason Gunthorpe wrote:
> On Mon, May 06, 2019 at 04:29:39PM -0700, [email protected] wrote:
> > From: Ralph Campbell <[email protected]>
> >
> > There are no functional changes, just some coding style clean ups and
> > minor comment changes.
> >
> > Signed-off-by: Ralph Campbell <[email protected]>
> > Reviewed-by: Jérôme Glisse <[email protected]>
> > Cc: John Hubbard <[email protected]>
> > Cc: Ira Weiny <[email protected]>
> > Cc: Dan Williams <[email protected]>
> > Cc: Arnd Bergmann <[email protected]>
> > Cc: Balbir Singh <[email protected]>
> > Cc: Dan Carpenter <[email protected]>
> > Cc: Matthew Wilcox <[email protected]>
> > Cc: Souptick Joarder <[email protected]>
> > Cc: Andrew Morton <[email protected]>
> > include/linux/hmm.h | 71 +++++++++++++++++++++++----------------------
> > mm/hmm.c | 51 ++++++++++++++++----------------
> > 2 files changed, 62 insertions(+), 60 deletions(-)
>
> Applied to hmm.git, thanks
Can you hold off? I was already collecting patches and we will
be stepping on each other's toes ... for instance I had
https://cgit.freedesktop.org/~glisse/linux/log/?h=hmm-5.3
But I have been collecting more.
Cheers,
Jérôme
On Mon, May 06, 2019 at 04:29:41PM -0700, [email protected] wrote:
> From: Ralph Campbell <[email protected]>
>
> The helper function hmm_vma_fault() calls hmm_range_register() but is
> missing a call to hmm_range_unregister() in one of the error paths.
> This leads to a reference count leak and ultimately a memory leak on
> struct hmm.
>
> Always call hmm_range_unregister() if hmm_range_register() succeeded.
>
> Signed-off-by: Ralph Campbell <[email protected]>
> Signed-off-by: Jérôme Glisse <[email protected]>
> Cc: John Hubbard <[email protected]>
> Cc: Ira Weiny <[email protected]>
> Cc: Dan Williams <[email protected]>
> Cc: Arnd Bergmann <[email protected]>
> Cc: Balbir Singh <[email protected]>
> Cc: Dan Carpenter <[email protected]>
> Cc: Matthew Wilcox <[email protected]>
> Cc: Souptick Joarder <[email protected]>
> Cc: Andrew Morton <[email protected]>
> ---
> include/linux/hmm.h | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
> diff --git a/include/linux/hmm.h b/include/linux/hmm.h
> index 35a429621e1e..fa0671d67269 100644
> --- a/include/linux/hmm.h
> +++ b/include/linux/hmm.h
> @@ -559,6 +559,7 @@ static inline int hmm_vma_fault(struct hmm_range *range, bool block)
> return (int)ret;
>
> if (!hmm_range_wait_until_valid(range, HMM_RANGE_DEFAULT_TIMEOUT)) {
> + hmm_range_unregister(range);
> /*
> * The mmap_sem was taken by driver we release it here and
> * returns -EAGAIN which correspond to mmap_sem have been
> @@ -570,13 +571,13 @@ static inline int hmm_vma_fault(struct hmm_range *range, bool block)
>
> ret = hmm_range_fault(range, block);
> if (ret <= 0) {
> + hmm_range_unregister(range);
While this seems to be a clear improvement, it seems there is still a
bug in nouveau_svm.c around here as I see it calls hmm_vma_fault() but
never calls hmm_range_unregister() for its on-stack range - and
hmm_vma_fault() still returns with the range registered.
As hmm_vma_fault() is only used by nouveau and is marked as
deprecated, I think we need to fix nouveau, either by dropping
hmm_vma_fault(), or by adding the missing unregister to nouveau in
this patch.
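The leak being fixed is a classic unbalanced-reference bug: hmm_range_register() takes a reference on struct hmm, and every exit path must pair it with hmm_range_unregister(). A minimal userspace sketch of the fixed control flow (all `fake_*` names are hypothetical; a counter stands in for the kref):

```c
#include <assert.h>

/* Counter standing in for the struct hmm reference count. */
static int refcount;

static int fake_range_register(void)
{
	refcount++;
	return 0;
}

static void fake_range_unregister(void)
{
	refcount--;
}

/* Sketch of hmm_vma_fault() after the fix: 'fault_ok' selects the error
 * path that previously leaked the reference. */
static int fake_vma_fault(int fault_ok)
{
	if (fake_range_register())
		return -1;
	if (!fault_ok) {
		/* The missing call this patch adds: drop the reference
		 * before returning from the error path. */
		fake_range_unregister();
		return -1;
	}
	return 0;	/* success: range stays registered for the caller,
			 * who must unregister it when done */
}
```

Note the success path intentionally leaves the reference held, which is why nouveau's missing unregister after a successful hmm_vma_fault() is still a leak.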
Also, I see in linux-next that amdgpu_ttm.c has wrongly copied use of
this deprecated API, including these bugs...
amd folks: Can you please push a patch for your driver to stop using
hmm_vma_fault() and correct the use-after free? Ideally I'd like to
delete this function this merge cycle from hmm.git
Also if you missed it, I'm running a clean hmm.git that you can pull
into the AMD tree, if necessary, to get the changes that will go into
5.3 - if you need/wish to do this please consult with me before making a
merge commit, thanks. See:
https://lore.kernel.org/lkml/[email protected]/
So Ralph, you'll need to resend this.
Thanks,
Jason
On Mon, May 06, 2019 at 04:29:37PM -0700, [email protected] wrote:
> From: Ralph Campbell <[email protected]>
>
> I hit a use after free bug in hmm_free() with KASAN and then couldn't
> stop myself from cleaning up a bunch of documentation and coding style
> changes. So the first two patches are clean ups, the last three are
> the fixes.
>
> Ralph Campbell (5):
> mm/hmm: Update HMM documentation
> mm/hmm: Clean up some coding style and comments
I applied these two to hmm.git
> mm/hmm: hmm_vma_fault() doesn't always call hmm_range_unregister()
This one needs revision
> mm/hmm: Use mm_get_hmm() in hmm_range_register()
> mm/hmm: Fix mm stale reference use in hmm_free()
I believe we all agreed that the approach in the RFC series I sent is
preferred, so these are superseded by that series.
Thanks,
Jason
On Thu, Jun 06, 2019 at 10:27:43AM -0400, Jerome Glisse wrote:
> On Thu, Jun 06, 2019 at 11:16:44AM -0300, Jason Gunthorpe wrote:
> > On Mon, May 06, 2019 at 04:29:39PM -0700, [email protected] wrote:
> > > From: Ralph Campbell <[email protected]>
> > >
> > > There are no functional changes, just some coding style clean ups and
> > > minor comment changes.
> > >
> > > Signed-off-by: Ralph Campbell <[email protected]>
> > > Reviewed-by: Jérôme Glisse <[email protected]>
> > > Cc: John Hubbard <[email protected]>
> > > Cc: Ira Weiny <[email protected]>
> > > Cc: Dan Williams <[email protected]>
> > > Cc: Arnd Bergmann <[email protected]>
> > > Cc: Balbir Singh <[email protected]>
> > > Cc: Dan Carpenter <[email protected]>
> > > Cc: Matthew Wilcox <[email protected]>
> > > Cc: Souptick Joarder <[email protected]>
> > > Cc: Andrew Morton <[email protected]>
> > > include/linux/hmm.h | 71 +++++++++++++++++++++++----------------------
> > > mm/hmm.c | 51 ++++++++++++++++----------------
> > > 2 files changed, 62 insertions(+), 60 deletions(-)
> >
> > Applied to hmm.git, thanks
>
> Can you hold off, i was already collecting patches and we will
> be stepping on each other toe ... for instance i had
I'd really rather not, I have a lot of work to do for this cycle and
this part needs to start to move forward now. I can't do everything
last minute, sorry.
The patches I picked up all look very safe to move ahead.
> https://cgit.freedesktop.org/~glisse/linux/log/?h=hmm-5.3
I'm aware, and am referring to this tree. You can trivially rebase it
on top of hmm.git..
BTW, what were you planning to do with this git branch anyhow?
As we'd already agreed I will send the hmm patches to Linus on a clean
git branch so we can properly collaborate between the various involved
trees.
As a tree-runner I very much prefer to take patches directly from the
mailing list where everything is public. This is the standard kernel
workflow.
> But i have been working on more collection.
We haven't talked about process, but for me, please follow the standard
kernel development process and respond to patches on the list with
comments, ack/review them, etc. I may not have seen every patch, so
I'd appreciate it if you cc me on stuff that needs to be picked up,
thanks.
I am sorting out the changes you made off-list in your .git right now,
but this is very time-consuming. Please try to keep comments &
changes on list.
I don't want to take any thing into hmm.git that is not deemed ready -
so please feel free to continue to use your freedesktop git to
co-ordinate testing.
Thanks,
Jason
On Thu, Jun 06, 2019 at 12:41:29PM -0300, Jason Gunthorpe wrote:
> On Thu, Jun 06, 2019 at 10:27:43AM -0400, Jerome Glisse wrote:
> > On Thu, Jun 06, 2019 at 11:16:44AM -0300, Jason Gunthorpe wrote:
> > > On Mon, May 06, 2019 at 04:29:39PM -0700, [email protected] wrote:
> > > > From: Ralph Campbell <[email protected]>
> > > >
> > > > There are no functional changes, just some coding style clean ups and
> > > > minor comment changes.
> > > >
> > > > Signed-off-by: Ralph Campbell <[email protected]>
> > > > Reviewed-by: Jérôme Glisse <[email protected]>
> > > > Cc: John Hubbard <[email protected]>
> > > > Cc: Ira Weiny <[email protected]>
> > > > Cc: Dan Williams <[email protected]>
> > > > Cc: Arnd Bergmann <[email protected]>
> > > > Cc: Balbir Singh <[email protected]>
> > > > Cc: Dan Carpenter <[email protected]>
> > > > Cc: Matthew Wilcox <[email protected]>
> > > > Cc: Souptick Joarder <[email protected]>
> > > > Cc: Andrew Morton <[email protected]>
> > > > include/linux/hmm.h | 71 +++++++++++++++++++++++----------------------
> > > > mm/hmm.c | 51 ++++++++++++++++----------------
> > > > 2 files changed, 62 insertions(+), 60 deletions(-)
> > >
> > > Applied to hmm.git, thanks
> >
> > Can you hold off, i was already collecting patches and we will
> > be stepping on each other toe ... for instance i had
>
> I'd really rather not, I have a lot of work to do for this cycle and
> this part needs to start to move forward now. I can't do everything
> last minute, sorry.
>
> The patches I picked up all look very safe to move ahead.
I want to post all the patches you need to apply soon. It is really
painful because there are a lot of different branches I have to work
with; if you start pulling patches that differ from the branch below,
you are making things even more difficult for me.
If you hold off, I will post all the patches in one big set so
that you can apply all of them in one go, and it will be a _lot_
easier for me that way.
>
> > https://cgit.freedesktop.org/~glisse/linux/log/?h=hmm-5.3
>
> I'm aware, and am referring to this tree. You can trivially rebase it
> on top of hmm.git..
>
> BTW, what were you planning to do with this git branch anyhow?
This is just something I use for testing and to stack up all the patches.
>
> As we'd already agreed I will send the hmm patches to Linus on a clean
> git branch so we can properly collaborate between the various involved
> trees.
>
> As a tree-runner I very much prefer to take patches directly from the
> mailing list where everything is public. This is the standard kernel
> workflow.
Like I said above, I want to resend all the patches in one big set.
On the process side, it would be easier if we ask Dave/Daniel to merge
hmm within drm this cycle. Merging with Linus will break drm drivers,
and it seems easier to me to fix all this within the drm tree.
But if you want to do everything with Linus, fine.
Cheers,
Jérôme
On Mon, May 06, 2019 at 04:29:39PM -0700, [email protected] wrote:
> @@ -924,6 +922,7 @@ int hmm_range_register(struct hmm_range *range,
> unsigned page_shift)
> {
> unsigned long mask = ((1UL << page_shift) - 1UL);
> + struct hmm *hmm;
>
> range->valid = false;
> range->hmm = NULL;
I was finishing these patches off and noticed that 'hmm' above is
never initialized.
I added the below to this patch:
diff --git a/mm/hmm.c b/mm/hmm.c
index 678873eb21930a..8e7403f081f44a 100644
--- a/mm/hmm.c
+++ b/mm/hmm.c
@@ -932,19 +932,20 @@ int hmm_range_register(struct hmm_range *range,
range->start = start;
range->end = end;
- range->hmm = hmm_get_or_create(mm);
- if (!range->hmm)
+ hmm = hmm_get_or_create(mm);
+ if (!hmm)
return -EFAULT;
/* Check if hmm_mm_destroy() was call. */
- if (range->hmm->mm == NULL || range->hmm->dead) {
- hmm_put(range->hmm);
+ if (hmm->mm == NULL || hmm->dead) {
+ hmm_put(hmm);
return -EFAULT;
}
/* Initialize range to track CPU page table updates. */
- mutex_lock(&range->hmm->lock);
+ mutex_lock(&hmm->lock);
+ range->hmm = hmm;
list_add_rcu(&range->list, &hmm->ranges);
/*
Which I think was the intent of adding the 'struct hmm *'. I prefer
this arrangement as it does not leave an invalid hmm pointer in
the range if there is a failure.
Most probably the later patches fixed this up?
Please confirm, thanks
Regards,
Jason
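The pattern in the hunk above (work through a local pointer and only publish it into the range once every check has passed) generalizes beyond HMM, and can be shown with a small userspace sketch. All `fake_*` names here are hypothetical stand-ins, not the real API:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins. */
struct fake_hmm { int dead; };
struct fake_range { struct fake_hmm *hmm; };

/* Register using a local 'hmm' first, and publish it into the range only
 * after all checks pass, so a failure never leaves range->hmm pointing at
 * a stale or half-validated hmm. */
static int fake_register(struct fake_range *range, struct fake_hmm *candidate)
{
	struct fake_hmm *hmm = candidate;

	range->hmm = NULL;
	if (hmm == NULL || hmm->dead)
		return -1;		/* range->hmm stays NULL on failure */
	range->hmm = hmm;		/* published only on the success path */
	return 0;
}
```

With the original arrangement, the dead-hmm failure path would have left `range->hmm` pointing at the rejected hmm, which is exactly the hazard the local variable avoids.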
On Thu, 2019-06-06 at 11:52 -0400, Jerome Glisse wrote:
> On Thu, Jun 06, 2019 at 12:41:29PM -0300, Jason Gunthorpe wrote:
> > On Thu, Jun 06, 2019 at 10:27:43AM -0400, Jerome Glisse wrote:
> > > On Thu, Jun 06, 2019 at 11:16:44AM -0300, Jason Gunthorpe wrote:
> > > > On Mon, May 06, 2019 at 04:29:39PM -0700, [email protected] wrote:
> > > > > From: Ralph Campbell <[email protected]>
> > > > >
> > > > > There are no functional changes, just some coding style clean ups and
> > > > > minor comment changes.
> > > > >
> > > > > Signed-off-by: Ralph Campbell <[email protected]>
> > > > > Reviewed-by: Jérôme Glisse <[email protected]>
> > > > > Cc: John Hubbard <[email protected]>
> > > > > Cc: Ira Weiny <[email protected]>
> > > > > Cc: Dan Williams <[email protected]>
> > > > > Cc: Arnd Bergmann <[email protected]>
> > > > > Cc: Balbir Singh <[email protected]>
> > > > > Cc: Dan Carpenter <[email protected]>
> > > > > Cc: Matthew Wilcox <[email protected]>
> > > > > Cc: Souptick Joarder <[email protected]>
> > > > > Cc: Andrew Morton <[email protected]>
> > > > > include/linux/hmm.h | 71 +++++++++++++++++++++++----------------------
> > > > > mm/hmm.c | 51 ++++++++++++++++----------------
> > > > > 2 files changed, 62 insertions(+), 60 deletions(-)
> > > >
> > > > Applied to hmm.git, thanks
> > >
> > >
> > > Can you hold off, i was already collecting patches and we will
> > > be stepping on each other toe ... for instance i had
> >
> > I'd really rather not, I have a lot of work to do for this cycle and
> > this part needs to start to move forward now. I can't do everything
> > last minute, sorry.
> >
> > The patches I picked up all look very safe to move ahead.
>
> I want to post all the patch you need to apply soon, it is really
> painful because they are lot of different branches i have to work
> with if you start pulling patches that differ from the below branch
> then you are making thing ever more difficult for me.
>
> If you hold of i will be posting all the patches in one big set so
> that you can apply all of them in one go and it will be a _lot_
> easier for me that way.
Easier for you is not necessarily easier for a community.
Publish early and often.
On 6/6/19 7:50 AM, Jason Gunthorpe wrote:
> On Mon, May 06, 2019 at 04:29:41PM -0700, [email protected] wrote:
>> From: Ralph Campbell <[email protected]>
>>
>> The helper function hmm_vma_fault() calls hmm_range_register() but is
>> missing a call to hmm_range_unregister() in one of the error paths.
>> This leads to a reference count leak and ultimately a memory leak on
>> struct hmm.
>>
>> Always call hmm_range_unregister() if hmm_range_register() succeeded.
>>
>> Signed-off-by: Ralph Campbell <[email protected]>
>> Signed-off-by: Jérôme Glisse <[email protected]>
>> Cc: John Hubbard <[email protected]>
>> Cc: Ira Weiny <[email protected]>
>> Cc: Dan Williams <[email protected]>
>> Cc: Arnd Bergmann <[email protected]>
>> Cc: Balbir Singh <[email protected]>
>> Cc: Dan Carpenter <[email protected]>
>> Cc: Matthew Wilcox <[email protected]>
>> Cc: Souptick Joarder <[email protected]>
>> Cc: Andrew Morton <[email protected]>
>> ---
>> include/linux/hmm.h | 3 ++-
>> 1 file changed, 2 insertions(+), 1 deletion(-)
>
>> diff --git a/include/linux/hmm.h b/include/linux/hmm.h
>> index 35a429621e1e..fa0671d67269 100644
>> --- a/include/linux/hmm.h
>> +++ b/include/linux/hmm.h
>> @@ -559,6 +559,7 @@ static inline int hmm_vma_fault(struct hmm_range *range, bool block)
>> return (int)ret;
>>
>> if (!hmm_range_wait_until_valid(range, HMM_RANGE_DEFAULT_TIMEOUT)) {
>> + hmm_range_unregister(range);
>> /*
>> * The mmap_sem was taken by driver we release it here and
>> * returns -EAGAIN which correspond to mmap_sem have been
>> @@ -570,13 +571,13 @@ static inline int hmm_vma_fault(struct hmm_range *range, bool block)
>>
>> ret = hmm_range_fault(range, block);
>> if (ret <= 0) {
>> + hmm_range_unregister(range);
>
> While this seems to be a clear improvement, it seems there is still a
> bug in nouveau_svm.c around here as I see it calls hmm_vma_fault() but
> never calls hmm_range_unregister() for its on stack range - and
> hmm_vma_fault() still returns with the range registered.
>
> As hmm_vma_fault() is only used by nouveau and is marked as
> deprecated, I think we need to fix nouveau, either by dropping
> hmm_range_fault(), or by adding the missing unregister to nouveau in
> this patch.
I will send a patch for nouveau to use hmm_range_register() and
hmm_range_fault() and do some testing with OpenCL.
I can also send a separate patch to then remove hmm_vma_fault()
but I guess that should be after AMD's changes.
> Also, I see in linux-next that amdgpu_ttm.c has wrongly copied use of
> this deprecated API, including these bugs...
>
> amd folks: Can you please push a patch for your driver to stop using
> hmm_vma_fault() and correct the use-after free? Ideally I'd like to
> delete this function this merge cycle from hmm.git
>
> Also if you missed it, I'm running a clean hmm.git that you can pull
> into the AMD tree, if necessary, to get the changes that will go into
> 5.3 - if you need/wish to do this please consult with me before making a
> merge commit, thanks. See:
>
> https://lore.kernel.org/lkml/[email protected]/
>
> So Ralph, you'll need to resend this.
>
> Thanks,
> Jason
>
On Thu, Jun 06, 2019 at 11:52:13AM -0400, Jerome Glisse wrote:
> On Thu, Jun 06, 2019 at 12:41:29PM -0300, Jason Gunthorpe wrote:
> > On Thu, Jun 06, 2019 at 10:27:43AM -0400, Jerome Glisse wrote:
> > > On Thu, Jun 06, 2019 at 11:16:44AM -0300, Jason Gunthorpe wrote:
> > > > On Mon, May 06, 2019 at 04:29:39PM -0700, [email protected] wrote:
> > > > > From: Ralph Campbell <[email protected]>
> > > > >
> > > > > There are no functional changes, just some coding style clean ups and
> > > > > minor comment changes.
> > > > >
> > > > > Signed-off-by: Ralph Campbell <[email protected]>
> > > > > Reviewed-by: Jérôme Glisse <[email protected]>
> > > > > Cc: John Hubbard <[email protected]>
> > > > > Cc: Ira Weiny <[email protected]>
> > > > > Cc: Dan Williams <[email protected]>
> > > > > Cc: Arnd Bergmann <[email protected]>
> > > > > Cc: Balbir Singh <[email protected]>
> > > > > Cc: Dan Carpenter <[email protected]>
> > > > > Cc: Matthew Wilcox <[email protected]>
> > > > > Cc: Souptick Joarder <[email protected]>
> > > > > Cc: Andrew Morton <[email protected]>
> > > > > include/linux/hmm.h | 71 +++++++++++++++++++++++----------------------
> > > > > mm/hmm.c | 51 ++++++++++++++++----------------
> > > > > 2 files changed, 62 insertions(+), 60 deletions(-)
> > > >
> > > > Applied to hmm.git, thanks
> > >
> > > Can you hold off, i was already collecting patches and we will
> > > be stepping on each other toe ... for instance i had
> >
> > I'd really rather not, I have a lot of work to do for this cycle and
> > this part needs to start to move forward now. I can't do everything
> > last minute, sorry.
> >
> > The patches I picked up all look very safe to move ahead.
>
> I want to post all the patch you need to apply soon, it is really
> painful because they are lot of different branches
I've already handled everything in your hmm-5.3, so I don't think
there is anything for you to do in that regard. Please double check
though!
If you have new patches please post them against something sensible
(and put them in a git branch) and I can usually sort out 'git am'
conflicts pretty quickly.
> If you hold off I will be posting all the patches in one big set so
> that you can apply all of them in one go and it will be a _lot_
> easier for me that way.
You don't need to repost my patches, I can do that myself, but thanks
for all the help getting them ready! Please respond to my v2 with more
reviews/acks/changes/etc. so the series can move toward being
applied.
> On the process side, it would be easier if we asked Dave/Daniel to
> merge hmm within drm this cycle.
Yes, I expect we will do this - probably also to the AMD tree, judging
by things in -next. This is the entire point of running a shared tree.
> Merging with Linus will break drm drivers and it seems easier to me
> to fix all this within the drm tree.
This is the normal process with a shared tree, we merge the tree
*everywhere it is required* so all trees can run concurrently.
I will *also* send it to Linus early so that Linus reviews the hmm
patches in the HMM pull request, not in the DRM or RDMA pull
request. This is best-practice when working across trees like this.
Please just keep me up to date when conflicting things arise and we
will work out the best solution.
Reminder, I still need patches from you for:
- Fix all the kconfig stuff for randconfig failures/etc
- Enable ARM64
- Remove deprecated APIs from hmm.h
Please send them ASAP so it can be tested.
There shouldn't be any patches held back for 5.4 - send them all now.
Thanks,
Jason
On Thu, Jun 06, 2019 at 11:50:15AM -0700, Ralph Campbell wrote:
> Yes, I agree this is better.
>
> Also, I noticed the sample code for hmm_range_register() is wrong.
> If you could merge this minor change into this patch, that
> would be appreciated.
Sure, done thanks
Jason
On Thu, Jun 06, 2019 at 12:44:36PM -0700, Ralph Campbell wrote:
>
> On 6/6/19 7:50 AM, Jason Gunthorpe wrote:
> > On Mon, May 06, 2019 at 04:29:41PM -0700, [email protected] wrote:
> > > From: Ralph Campbell <[email protected]>
> > >
> > > The helper function hmm_vma_fault() calls hmm_range_register() but is
> > > missing a call to hmm_range_unregister() in one of the error paths.
> > > This leads to a reference count leak and ultimately a memory leak on
> > > struct hmm.
> > >
> > > Always call hmm_range_unregister() if hmm_range_register() succeeded.
> > >
> > > Signed-off-by: Ralph Campbell <[email protected]>
> > > Signed-off-by: Jérôme Glisse <[email protected]>
> > > Cc: John Hubbard <[email protected]>
> > > Cc: Ira Weiny <[email protected]>
> > > Cc: Dan Williams <[email protected]>
> > > Cc: Arnd Bergmann <[email protected]>
> > > Cc: Balbir Singh <[email protected]>
> > > Cc: Dan Carpenter <[email protected]>
> > > Cc: Matthew Wilcox <[email protected]>
> > > Cc: Souptick Joarder <[email protected]>
> > > Cc: Andrew Morton <[email protected]>
> > > include/linux/hmm.h | 3 ++-
> > > 1 file changed, 2 insertions(+), 1 deletion(-)
> >
> > > diff --git a/include/linux/hmm.h b/include/linux/hmm.h
> > > index 35a429621e1e..fa0671d67269 100644
> > > +++ b/include/linux/hmm.h
> > > @@ -559,6 +559,7 @@ static inline int hmm_vma_fault(struct hmm_range *range, bool block)
> > > return (int)ret;
> > > if (!hmm_range_wait_until_valid(range, HMM_RANGE_DEFAULT_TIMEOUT)) {
> > > + hmm_range_unregister(range);
> > > /*
> > > * The mmap_sem was taken by driver we release it here and
> > > * returns -EAGAIN which correspond to mmap_sem have been
> > > @@ -570,13 +571,13 @@ static inline int hmm_vma_fault(struct hmm_range *range, bool block)
> > > ret = hmm_range_fault(range, block);
> > > if (ret <= 0) {
> > > + hmm_range_unregister(range);
> >
> > While this seems to be a clear improvement, it seems there is still a
> > bug in nouveau_svm.c around here as I see it calls hmm_vma_fault() but
> > never calls hmm_range_unregister() for its on stack range - and
> > hmm_vma_fault() still returns with the range registered.
> >
> > As hmm_vma_fault() is only used by nouveau and is marked as
> > deprecated, I think we need to fix nouveau, either by dropping
> > hmm_vma_fault(), or by adding the missing unregister to nouveau in
> > this patch.
>
> I will send a patch for nouveau to use hmm_range_register() and
> hmm_range_fault() and do some testing with OpenCL.
Wow, thanks. I'd also really like to send such a thing through
hmm.git - do you know who the nouveau maintainers are so we can
collaborate on planning this patch?
> I can also send a separate patch to then remove hmm_vma_fault()
> but I guess that should be after AMD's changes.
Let us wait to hear back from AMD how they can consume hmm.git - I'd
very much like to get everything done in one kernel cycle!
Regards,
Jason
On 6/6/19 7:02 AM, Jason Gunthorpe wrote:
> On Mon, May 06, 2019 at 04:29:38PM -0700, [email protected] wrote:
>> From: Ralph Campbell <[email protected]>
>>
>> Update the HMM documentation to reflect the latest API and make a few minor
>> wording changes.
>>
>> Signed-off-by: Ralph Campbell <[email protected]>
>> Reviewed-by: Jérôme Glisse <[email protected]>
>> Cc: John Hubbard <[email protected]>
>> Cc: Ira Weiny <[email protected]>
>> Cc: Dan Williams <[email protected]>
>> Cc: Arnd Bergmann <[email protected]>
>> Cc: Balbir Singh <[email protected]>
>> Cc: Dan Carpenter <[email protected]>
>> Cc: Matthew Wilcox <[email protected]>
>> Cc: Souptick Joarder <[email protected]>
>> Cc: Andrew Morton <[email protected]>
>> Documentation/vm/hmm.rst | 139 ++++++++++++++++++++-------------------
>> 1 file changed, 73 insertions(+), 66 deletions(-)
>
> Okay, lets start picking up hmm patches in to the new shared hmm.git,
> as promised I will take responsibility to send these to Linus. The
> tree is here:
>
> https://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma.git/log/?h=hmm
>
> This looks fine to me with one minor comment:
>
>> diff --git a/Documentation/vm/hmm.rst b/Documentation/vm/hmm.rst
>> index ec1efa32af3c..7c1e929931a0 100644
>> +++ b/Documentation/vm/hmm.rst
>>
>> @@ -151,21 +151,27 @@ registration of an hmm_mirror struct::
>>
>> int hmm_mirror_register(struct hmm_mirror *mirror,
>> struct mm_struct *mm);
>> - int hmm_mirror_register_locked(struct hmm_mirror *mirror,
>> - struct mm_struct *mm);
>>
>> -
>> -The locked variant is to be used when the driver is already holding mmap_sem
>> -of the mm in write mode. The mirror struct has a set of callbacks that are used
>> +The mirror struct has a set of callbacks that are used
>> to propagate CPU page tables::
>>
>> struct hmm_mirror_ops {
>> + /* release() - release hmm_mirror
>> + *
>> + * @mirror: pointer to struct hmm_mirror
>> + *
>> + * This is called when the mm_struct is being released.
>> + * The callback should make sure no references to the mirror occur
>> + * after the callback returns.
>> + */
>
> This is not quite accurate (at least, as the other series I sent
> intends): the struct hmm_mirror is valid up until
> hmm_mirror_unregister() is called - specifically it remains valid
> after the release() callback.
>
> I will revise it (and the hmm.h comment it came from) to read the
> below. Please let me know if you'd like something else:
>
> /* release() - release hmm_mirror
> *
> * @mirror: pointer to struct hmm_mirror
> *
> * This is called when the mm_struct is being released. The callback
> * must ensure that all access to any pages obtained from this mirror
> * is halted before the callback returns. All future access should
> * fault.
> */
>
> The key task for release is to fence off all device access to any
> related pages as the mm is about to recycle them and the device must
> not cause a use-after-free.
>
> I applied it to hmm.git
>
> Thanks,
> Jason
>
Yes, I agree this is better.
Also, I noticed the sample code for hmm_range_register() is wrong.
If you could merge this minor change into this patch, that
would be appreciated.
diff --git a/Documentation/vm/hmm.rst b/Documentation/vm/hmm.rst
index dc8fe4241a18..b5fb9bc02aa2 100644
--- a/Documentation/vm/hmm.rst
+++ b/Documentation/vm/hmm.rst
@@ -245,7 +245,7 @@ The usage pattern is::
hmm_range_wait_until_valid(&range, TIMEOUT_IN_MSEC);
goto again;
}
- hmm_mirror_unregister(&range);
+ hmm_range_unregister(&range);
return ret;
}
take_lock(driver->update);
@@ -257,7 +257,7 @@ The usage pattern is::
// Use pfns array content to update device page table
- hmm_mirror_unregister(&range);
+ hmm_range_unregister(&range);
release_lock(driver->update);
up_read(&mm->mmap_sem);
return 0;
On 6/6/19 12:54 PM, Jason Gunthorpe wrote:
> On Thu, Jun 06, 2019 at 12:44:36PM -0700, Ralph Campbell wrote:
>>
>> On 6/6/19 7:50 AM, Jason Gunthorpe wrote:
>>> On Mon, May 06, 2019 at 04:29:41PM -0700, [email protected] wrote:
>>>> From: Ralph Campbell <[email protected]>
>>>>
>>>> The helper function hmm_vma_fault() calls hmm_range_register() but is
>>>> missing a call to hmm_range_unregister() in one of the error paths.
>>>> This leads to a reference count leak and ultimately a memory leak on
>>>> struct hmm.
>>>>
>>>> Always call hmm_range_unregister() if hmm_range_register() succeeded.
>>>>
>>>> Signed-off-by: Ralph Campbell <[email protected]>
>>>> Signed-off-by: Jérôme Glisse <[email protected]>
>>>> Cc: John Hubbard <[email protected]>
>>>> Cc: Ira Weiny <[email protected]>
>>>> Cc: Dan Williams <[email protected]>
>>>> Cc: Arnd Bergmann <[email protected]>
>>>> Cc: Balbir Singh <[email protected]>
>>>> Cc: Dan Carpenter <[email protected]>
>>>> Cc: Matthew Wilcox <[email protected]>
>>>> Cc: Souptick Joarder <[email protected]>
>>>> Cc: Andrew Morton <[email protected]>
>>>> include/linux/hmm.h | 3 ++-
>>>> 1 file changed, 2 insertions(+), 1 deletion(-)
>>>
>>>> diff --git a/include/linux/hmm.h b/include/linux/hmm.h
>>>> index 35a429621e1e..fa0671d67269 100644
>>>> +++ b/include/linux/hmm.h
>>>> @@ -559,6 +559,7 @@ static inline int hmm_vma_fault(struct hmm_range *range, bool block)
>>>> return (int)ret;
>>>> if (!hmm_range_wait_until_valid(range, HMM_RANGE_DEFAULT_TIMEOUT)) {
>>>> + hmm_range_unregister(range);
>>>> /*
>>>> * The mmap_sem was taken by driver we release it here and
>>>> * returns -EAGAIN which correspond to mmap_sem have been
>>>> @@ -570,13 +571,13 @@ static inline int hmm_vma_fault(struct hmm_range *range, bool block)
>>>> ret = hmm_range_fault(range, block);
>>>> if (ret <= 0) {
>>>> + hmm_range_unregister(range);
>>>
>>> While this seems to be a clear improvement, it seems there is still a
>>> bug in nouveau_svm.c around here as I see it calls hmm_vma_fault() but
>>> never calls hmm_range_unregister() for its on stack range - and
>>> hmm_vma_fault() still returns with the range registered.
>>>
>>> As hmm_vma_fault() is only used by nouveau and is marked as
>>> deprecated, I think we need to fix nouveau, either by dropping
>>> hmm_vma_fault(), or by adding the missing unregister to nouveau in
>>> this patch.
>>
>> I will send a patch for nouveau to use hmm_range_register() and
>> hmm_range_fault() and do some testing with OpenCL.
>
> Wow, thanks. I'd also really like to send such a thing through
> hmm.git - do you know who the nouveau maintainers are so we can
> collaborate on planning this patch?
Ben Skeggs <[email protected]> is the maintainer and
[email protected] is the mailing list for changes.
I'll be sure to CC them for the patch.
>> I can also send a separate patch to then remove hmm_vma_fault()
>> but I guess that should be after AMD's changes.
>
> Let us wait to hear back from AMD how they can consume hmm.git - I'd
> very much like to get everything done in one kernel cycle!
>
> Regards,
> Jason
>
On 6/6/19 8:57 AM, Jason Gunthorpe wrote:
> On Mon, May 06, 2019 at 04:29:39PM -0700, [email protected] wrote:
>> @@ -924,6 +922,7 @@ int hmm_range_register(struct hmm_range *range,
>> unsigned page_shift)
>> {
>> unsigned long mask = ((1UL << page_shift) - 1UL);
>> + struct hmm *hmm;
>>
>> range->valid = false;
>> range->hmm = NULL;
>
> I was finishing these patches off and noticed that 'hmm' above is
> never initialized.
>
> I added the below to this patch:
>
> diff --git a/mm/hmm.c b/mm/hmm.c
> index 678873eb21930a..8e7403f081f44a 100644
> --- a/mm/hmm.c
> +++ b/mm/hmm.c
> @@ -932,19 +932,20 @@ int hmm_range_register(struct hmm_range *range,
> range->start = start;
> range->end = end;
>
> - range->hmm = hmm_get_or_create(mm);
> - if (!range->hmm)
> + hmm = hmm_get_or_create(mm);
> + if (!hmm)
> return -EFAULT;
>
> /* Check if hmm_mm_destroy() was call. */
> - if (range->hmm->mm == NULL || range->hmm->dead) {
> - hmm_put(range->hmm);
> + if (hmm->mm == NULL || hmm->dead) {
> + hmm_put(hmm);
> return -EFAULT;
> }
>
> /* Initialize range to track CPU page table updates. */
> - mutex_lock(&range->hmm->lock);
> + mutex_lock(&hmm->lock);
>
> + range->hmm = hmm;
> list_add_rcu(&range->list, &hmm->ranges);
>
> /*
>
> Which I think was the intent of adding the 'struct hmm *'. I prefer
> this arrangement as it does not leave an invalid hmm pointer in the
> range if there is a failure.
>
> Most probably the later patches fixed this up?
>
> Please confirm, thanks
>
> Regards,
> Jason
>
Yes, you understand correctly. That was the intended clean up.
I must have split my original patch set incorrectly.