2019-07-24 06:53:58

by Christoph Hellwig

Subject: hmm_range_fault related fixes and legacy API removal v3

Hi Jérôme, Ben and Jason,

below is a series against the hmm tree which fixes up the mmap_sem
locking in nouveau and while at it also removes leftover legacy HMM APIs
only used by nouveau.

The first 4 patches are a bug fix for nouveau, which I suspect should
go into this merge window even if the code is marked as staging, just
to avoid people copying the breakage.

Changes since v2:
- new patch from Jason to document FAULT_FLAG_ALLOW_RETRY semantics
better
- remove -EAGAIN handling in nouveau earlier

Changes since v1:
- don't return the valid state from hmm_range_unregister
- additional nouveau cleanups


2019-07-24 06:54:05

by Christoph Hellwig

Subject: [PATCH 1/7] mm: always return EBUSY for invalid ranges in hmm_range_{fault,snapshot}

We should not have two different error codes for the same condition. In
addition, this really complicates the code due to the special handling of
EAGAIN, which drops the mmap_sem because of the FAULT_FLAG_ALLOW_RETRY
logic in the core VM.

Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Ralph Campbell <[email protected]>
Reviewed-by: Jason Gunthorpe <[email protected]>
Reviewed-by: Felix Kuehling <[email protected]>
---
Documentation/vm/hmm.rst | 2 +-
mm/hmm.c | 10 ++++------
2 files changed, 5 insertions(+), 7 deletions(-)

diff --git a/Documentation/vm/hmm.rst b/Documentation/vm/hmm.rst
index 7d90964abbb0..710ce1c701bf 100644
--- a/Documentation/vm/hmm.rst
+++ b/Documentation/vm/hmm.rst
@@ -237,7 +237,7 @@ The usage pattern is::
ret = hmm_range_snapshot(&range);
if (ret) {
up_read(&mm->mmap_sem);
- if (ret == -EAGAIN) {
+ if (ret == -EBUSY) {
/*
* No need to check hmm_range_wait_until_valid() return value
* on retry we will get proper error with hmm_range_snapshot()
diff --git a/mm/hmm.c b/mm/hmm.c
index e1eedef129cf..16b6731a34db 100644
--- a/mm/hmm.c
+++ b/mm/hmm.c
@@ -946,7 +946,7 @@ EXPORT_SYMBOL(hmm_range_unregister);
* @range: range
* Return: -EINVAL if invalid argument, -ENOMEM out of memory, -EPERM invalid
* permission (for instance asking for write and range is read only),
- * -EAGAIN if you need to retry, -EFAULT invalid (ie either no valid
+ * -EBUSY if you need to retry, -EFAULT invalid (ie either no valid
* vma or it is illegal to access that range), number of valid pages
* in range->pfns[] (from range start address).
*
@@ -967,7 +967,7 @@ long hmm_range_snapshot(struct hmm_range *range)
do {
/* If range is no longer valid force retry. */
if (!range->valid)
- return -EAGAIN;
+ return -EBUSY;

vma = find_vma(hmm->mm, start);
if (vma == NULL || (vma->vm_flags & device_vma))
@@ -1062,10 +1062,8 @@ long hmm_range_fault(struct hmm_range *range, bool block)

do {
/* If range is no longer valid force retry. */
- if (!range->valid) {
- up_read(&hmm->mm->mmap_sem);
- return -EAGAIN;
- }
+ if (!range->valid)
+ return -EBUSY;

vma = find_vma(hmm->mm, start);
if (vma == NULL || (vma->vm_flags & device_vma))
--
2.20.1
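A minimal userspace sketch of the caller-side retry loop that a single error code makes straightforward. Everything here is hypothetical: `fake_mm.readers` stands in for `mmap_sem`, and `fake_snapshot()` / `fake_wait_until_valid()` only model the contract this patch gives `hmm_range_snapshot()`/`hmm_range_fault()` (an invalidated range returns -EBUSY and leaves the lock held) and the role of `hmm_range_wait_until_valid()`. It is not kernel code.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-ins: readers models mmap_sem held for read. */
struct fake_mm { int readers; };
struct fake_range { bool valid; int invalidations_left; };

static void fake_down_read(struct fake_mm *mm) { mm->readers++; }

static void fake_up_read(struct fake_mm *mm)
{
	assert(mm->readers > 0);	/* catches unlocking a lock we do not hold */
	mm->readers--;
}

/* Models the post-patch contract: -EBUSY for an invalid range, lock untouched. */
static long fake_snapshot(struct fake_mm *mm, const struct fake_range *range)
{
	assert(mm->readers > 0);	/* caller must hold the lock */
	if (!range->valid)
		return -EBUSY;
	return 1;			/* pretend one page was mirrored */
}

/* Models waiting for revalidation: each wait absorbs one invalidation. */
static void fake_wait_until_valid(struct fake_range *range)
{
	if (range->invalidations_left > 0)
		range->invalidations_left--;
	range->valid = (range->invalidations_left == 0);
}

static long mirror_range(struct fake_mm *mm, struct fake_range *range, int *tries)
{
	long ret;

	*tries = 0;
	for (;;) {
		fake_down_read(mm);
		ret = fake_snapshot(mm, range);
		(*tries)++;
		if (ret != -EBUSY)
			break;
		/* One error code, one rule: we still hold the lock, so drop it. */
		fake_up_read(mm);
		fake_wait_until_valid(range);
	}
	/* Lock still held here: the device page table would be updated now. */
	fake_up_read(mm);
	return ret;
}
```

Starting from `{ .valid = false, .invalidations_left = 2 }`, `mirror_range()` returns 1 after three passes and leaves `readers` at 0.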

2019-07-24 06:54:08

by Christoph Hellwig

Subject: [PATCH 2/7] mm: move hmm_vma_range_done and hmm_vma_fault to nouveau

These two functions are marked as legacy APIs to get rid of, but seem
to suit the current nouveau flow. Move them to the only user in
preparation for fixing a locking bug involving caller and callee.
All comments referring to the old API have been removed, as this is now
a driver-private helper.

Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Jason Gunthorpe <[email protected]>
---
drivers/gpu/drm/nouveau/nouveau_svm.c | 46 ++++++++++++++++++++++-
include/linux/hmm.h | 54 ---------------------------
2 files changed, 44 insertions(+), 56 deletions(-)

diff --git a/drivers/gpu/drm/nouveau/nouveau_svm.c b/drivers/gpu/drm/nouveau/nouveau_svm.c
index 8c92374afcf2..6c1b04de0db8 100644
--- a/drivers/gpu/drm/nouveau/nouveau_svm.c
+++ b/drivers/gpu/drm/nouveau/nouveau_svm.c
@@ -475,6 +475,48 @@ nouveau_svm_fault_cache(struct nouveau_svm *svm,
fault->inst, fault->addr, fault->access);
}

+static inline bool
+nouveau_range_done(struct hmm_range *range)
+{
+ bool ret = hmm_range_valid(range);
+
+ hmm_range_unregister(range);
+ return ret;
+}
+
+static int
+nouveau_range_fault(struct hmm_mirror *mirror, struct hmm_range *range,
+ bool block)
+{
+ long ret;
+
+ range->default_flags = 0;
+ range->pfn_flags_mask = -1UL;
+
+ ret = hmm_range_register(range, mirror,
+ range->start, range->end,
+ PAGE_SHIFT);
+ if (ret)
+ return (int)ret;
+
+ if (!hmm_range_wait_until_valid(range, HMM_RANGE_DEFAULT_TIMEOUT)) {
+ up_read(&range->vma->vm_mm->mmap_sem);
+ return -EAGAIN;
+ }
+
+ ret = hmm_range_fault(range, block);
+ if (ret <= 0) {
+ if (ret == -EBUSY || !ret) {
+ up_read(&range->vma->vm_mm->mmap_sem);
+ ret = -EBUSY;
+ } else if (ret == -EAGAIN)
+ ret = -EBUSY;
+ hmm_range_unregister(range);
+ return ret;
+ }
+ return 0;
+}
+
static int
nouveau_svm_fault(struct nvif_notify *notify)
{
@@ -649,10 +691,10 @@ nouveau_svm_fault(struct nvif_notify *notify)
range.values = nouveau_svm_pfn_values;
range.pfn_shift = NVIF_VMM_PFNMAP_V0_ADDR_SHIFT;
again:
- ret = hmm_vma_fault(&svmm->mirror, &range, true);
+ ret = nouveau_range_fault(&svmm->mirror, &range, true);
if (ret == 0) {
mutex_lock(&svmm->mutex);
- if (!hmm_vma_range_done(&range)) {
+ if (!nouveau_range_done(&range)) {
mutex_unlock(&svmm->mutex);
goto again;
}
diff --git a/include/linux/hmm.h b/include/linux/hmm.h
index b8a08b2a10ca..7ef56dc18050 100644
--- a/include/linux/hmm.h
+++ b/include/linux/hmm.h
@@ -484,60 +484,6 @@ long hmm_range_dma_unmap(struct hmm_range *range,
*/
#define HMM_RANGE_DEFAULT_TIMEOUT 1000

-/* This is a temporary helper to avoid merge conflict between trees. */
-static inline bool hmm_vma_range_done(struct hmm_range *range)
-{
- bool ret = hmm_range_valid(range);
-
- hmm_range_unregister(range);
- return ret;
-}
-
-/* This is a temporary helper to avoid merge conflict between trees. */
-static inline int hmm_vma_fault(struct hmm_mirror *mirror,
- struct hmm_range *range, bool block)
-{
- long ret;
-
- /*
- * With the old API the driver must set each individual entries with
- * the requested flags (valid, write, ...). So here we set the mask to
- * keep intact the entries provided by the driver and zero out the
- * default_flags.
- */
- range->default_flags = 0;
- range->pfn_flags_mask = -1UL;
-
- ret = hmm_range_register(range, mirror,
- range->start, range->end,
- PAGE_SHIFT);
- if (ret)
- return (int)ret;
-
- if (!hmm_range_wait_until_valid(range, HMM_RANGE_DEFAULT_TIMEOUT)) {
- /*
- * The mmap_sem was taken by driver we release it here and
- * returns -EAGAIN which correspond to mmap_sem have been
- * drop in the old API.
- */
- up_read(&range->vma->vm_mm->mmap_sem);
- return -EAGAIN;
- }
-
- ret = hmm_range_fault(range, block);
- if (ret <= 0) {
- if (ret == -EBUSY || !ret) {
- /* Same as above, drop mmap_sem to match old API. */
- up_read(&range->vma->vm_mm->mmap_sem);
- ret = -EBUSY;
- } else if (ret == -EAGAIN)
- ret = -EBUSY;
- hmm_range_unregister(range);
- return ret;
- }
- return 0;
-}
-
/* Below are for HMM internal use only! Not to be used by device driver! */
static inline void hmm_mm_init(struct mm_struct *mm)
{
--
2.20.1

2019-07-24 06:54:17

by Christoph Hellwig

Subject: [PATCH 4/7] nouveau: unlock mmap_sem on all errors from nouveau_range_fault

Currently nouveau_svm_fault expects nouveau_range_fault to never unlock
mmap_sem, but the latter unlocks it for a random selection of error
codes. Fix this up by always unlocking mmap_sem for non-zero return
values in nouveau_range_fault, and only unlocking it in the caller
for successful returns.

Signed-off-by: Christoph Hellwig <[email protected]>
---
drivers/gpu/drm/nouveau/nouveau_svm.c | 11 ++++++-----
1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/nouveau/nouveau_svm.c b/drivers/gpu/drm/nouveau/nouveau_svm.c
index e3097492b4ad..a835cebb6d90 100644
--- a/drivers/gpu/drm/nouveau/nouveau_svm.c
+++ b/drivers/gpu/drm/nouveau/nouveau_svm.c
@@ -495,8 +495,10 @@ nouveau_range_fault(struct hmm_mirror *mirror, struct hmm_range *range)
ret = hmm_range_register(range, mirror,
range->start, range->end,
PAGE_SHIFT);
- if (ret)
+ if (ret) {
+ up_read(&range->vma->vm_mm->mmap_sem);
return (int)ret;
+ }

if (!hmm_range_wait_until_valid(range, HMM_RANGE_DEFAULT_TIMEOUT)) {
up_read(&range->vma->vm_mm->mmap_sem);
@@ -505,10 +507,9 @@ nouveau_range_fault(struct hmm_mirror *mirror, struct hmm_range *range)

ret = hmm_range_fault(range, true);
if (ret <= 0) {
- if (ret == -EBUSY || !ret) {
- up_read(&range->vma->vm_mm->mmap_sem);
+ if (ret == 0)
ret = -EBUSY;
- }
+ up_read(&range->vma->vm_mm->mmap_sem);
hmm_range_unregister(range);
return ret;
}
@@ -706,8 +707,8 @@ nouveau_svm_fault(struct nvif_notify *notify)
NULL);
svmm->vmm->vmm.object.client->super = false;
mutex_unlock(&svmm->mutex);
+ up_read(&svmm->mm->mmap_sem);
}
- up_read(&svmm->mm->mmap_sem);

/* Cancel any faults in the window whose pages didn't manage
* to keep their valid bit, or stay writeable when required.
--
2.20.1
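The locking rule this patch establishes can be checked with a small userspace model. All names are hypothetical: `range_fault()` mimics `nouveau_range_fault()`, `svm_fault()` mimics `nouveau_svm_fault()`, and the error code returned for each failing step is illustrative only. Because `fake_up_read()` asserts on an unlock without a matching lock, any path that unlocks twice aborts.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in: readers models mmap_sem held for read. */
struct fake_mm { int readers; };

static void fake_down_read(struct fake_mm *mm) { mm->readers++; }

static void fake_up_read(struct fake_mm *mm)
{
	assert(mm->readers > 0);	/* aborts on a double unlock */
	mm->readers--;
}

/* Which step of the hypothetical range_fault() should fail. */
enum fail_at { FAIL_NONE, FAIL_REGISTER, FAIL_WAIT, FAIL_FAULT };

/* After the patch: every non-zero return has already dropped the lock. */
static int range_fault(struct fake_mm *mm, enum fail_at fail)
{
	switch (fail) {
	case FAIL_REGISTER:		/* models hmm_range_register() failing */
		fake_up_read(mm);
		return -EINVAL;
	case FAIL_WAIT:			/* models the wait for a valid range timing out */
		fake_up_read(mm);
		return -EBUSY;
	case FAIL_FAULT:		/* models hmm_range_fault() failing */
		fake_up_read(mm);
		return -EFAULT;
	case FAIL_NONE:
		break;
	}
	return 0;			/* success: lock still held */
}

/* The caller unlocks only on the success path. */
static int svm_fault(struct fake_mm *mm, enum fail_at fail)
{
	int ret;

	fake_down_read(mm);
	ret = range_fault(mm, fail);
	if (ret == 0) {
		/* ... update the GPU page tables under the lock ... */
		fake_up_read(mm);
	}
	return ret;
}
```

Every call, whichever step fails, leaves `readers` back at 0, which is exactly the balance the patch restores.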

2019-07-24 06:54:30

by Christoph Hellwig

Subject: [PATCH 5/7] nouveau: return -EBUSY when hmm_range_wait_until_valid fails

-EAGAIN has a magic meaning for non-blocking faults, so don't overload
it. Given that the caller doesn't check for specific error codes this
change is purely cosmetic.

Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Jason Gunthorpe <[email protected]>
---
drivers/gpu/drm/nouveau/nouveau_svm.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/nouveau/nouveau_svm.c b/drivers/gpu/drm/nouveau/nouveau_svm.c
index a835cebb6d90..545100f7c594 100644
--- a/drivers/gpu/drm/nouveau/nouveau_svm.c
+++ b/drivers/gpu/drm/nouveau/nouveau_svm.c
@@ -502,7 +502,7 @@ nouveau_range_fault(struct hmm_mirror *mirror, struct hmm_range *range)

if (!hmm_range_wait_until_valid(range, HMM_RANGE_DEFAULT_TIMEOUT)) {
up_read(&range->vma->vm_mm->mmap_sem);
- return -EAGAIN;
+ return -EBUSY;
}

ret = hmm_range_fault(range, true);
--
2.20.1
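As a sketch of why -EAGAIN is worth keeping reserved, the hypothetical helper below encodes what a caller may assume about `mmap_sem` after `hmm_range_fault()` returns, per this series: only -EAGAIN, which arises when a non-blocking fault hits VM_FAULT_RETRY, means the core VM has already dropped the lock. Reusing -EAGAIN for a wait timeout in nouveau_range_fault would blur that meaning, even though the caller does not currently distinguish the codes.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/*
 * Hypothetical helper: may the caller assume it still holds mmap_sem
 * after hmm_range_fault(range, block) returned ret?
 */
static bool lock_still_held(long ret, bool block)
{
	if (ret == -EAGAIN) {
		/*
		 * HMM only sets FAULT_FLAG_ALLOW_RETRY for non-blocking
		 * faults, so only those can see VM_FAULT_RETRY.
		 */
		assert(!block);
		return false;	/* the core VM already dropped the lock */
	}
	/* Page counts, -EBUSY, -EFAULT, -EPERM, ...: lock untouched. */
	return true;
}
```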

2019-07-24 06:55:58

by Christoph Hellwig

Subject: [PATCH 3/7] nouveau: remove the block parameter to nouveau_range_fault

The parameter is always false, so remove it as well as the -EAGAIN
handling that can only happen for the non-blocking case.

Signed-off-by: Christoph Hellwig <[email protected]>
---
drivers/gpu/drm/nouveau/nouveau_svm.c | 10 ++++------
1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/nouveau/nouveau_svm.c b/drivers/gpu/drm/nouveau/nouveau_svm.c
index 6c1b04de0db8..e3097492b4ad 100644
--- a/drivers/gpu/drm/nouveau/nouveau_svm.c
+++ b/drivers/gpu/drm/nouveau/nouveau_svm.c
@@ -485,8 +485,7 @@ nouveau_range_done(struct hmm_range *range)
}

static int
-nouveau_range_fault(struct hmm_mirror *mirror, struct hmm_range *range,
- bool block)
+nouveau_range_fault(struct hmm_mirror *mirror, struct hmm_range *range)
{
long ret;

@@ -504,13 +503,12 @@ nouveau_range_fault(struct hmm_mirror *mirror, struct hmm_range *range,
return -EAGAIN;
}

- ret = hmm_range_fault(range, block);
+ ret = hmm_range_fault(range, true);
if (ret <= 0) {
if (ret == -EBUSY || !ret) {
up_read(&range->vma->vm_mm->mmap_sem);
ret = -EBUSY;
- } else if (ret == -EAGAIN)
- ret = -EBUSY;
+ }
hmm_range_unregister(range);
return ret;
}
@@ -691,7 +689,7 @@ nouveau_svm_fault(struct nvif_notify *notify)
range.values = nouveau_svm_pfn_values;
range.pfn_shift = NVIF_VMM_PFNMAP_V0_ADDR_SHIFT;
again:
- ret = nouveau_range_fault(&svmm->mirror, &range, true);
+ ret = nouveau_range_fault(&svmm->mirror, &range);
if (ret == 0) {
mutex_lock(&svmm->mutex);
if (!nouveau_range_done(&range)) {
--
2.20.1

2019-07-24 06:56:11

by Christoph Hellwig

Subject: [PATCH 7/7] mm: comment on VM_FAULT_RETRY semantics in handle_mm_fault

From: Jason Gunthorpe <[email protected]>

The magic dropping of mmap_sem when handle_mm_fault returns
VM_FAULT_RETRY is rather subtle. Add a comment explaining it.

Signed-off-by: Jason Gunthorpe <[email protected]>
[hch: wrote a changelog]
Signed-off-by: Christoph Hellwig <[email protected]>
---
mm/hmm.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/mm/hmm.c b/mm/hmm.c
index 16b6731a34db..54b3a4162ae9 100644
--- a/mm/hmm.c
+++ b/mm/hmm.c
@@ -301,8 +301,10 @@ static int hmm_vma_do_fault(struct mm_walk *walk, unsigned long addr,
flags |= hmm_vma_walk->block ? 0 : FAULT_FLAG_ALLOW_RETRY;
flags |= write_fault ? FAULT_FLAG_WRITE : 0;
ret = handle_mm_fault(vma, addr, flags);
- if (ret & VM_FAULT_RETRY)
+ if (ret & VM_FAULT_RETRY) {
+ /* Note, handle_mm_fault did up_read(&mm->mmap_sem) */
return -EAGAIN;
+ }
if (ret & VM_FAULT_ERROR) {
*pfn = range->values[HMM_PFN_ERROR];
return -EFAULT;
--
2.20.1
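The behaviour the new comment documents can be modelled in userspace. This is a sketch with hypothetical names: `FAKE_FAULT_FLAG_ALLOW_RETRY` and `FAKE_VM_FAULT_RETRY` are local stand-ins for the real flags, `fake_handle_mm_fault()` simulates only the one path of `handle_mm_fault()` relevant here (a fault that must wait while a retry is allowed), and `fake_do_fault()` mirrors the flag selection in `hmm_vma_do_fault()` shown in the diff.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-ins; the real flags are defined by the kernel. */
#define FAKE_FAULT_FLAG_ALLOW_RETRY	0x1u
#define FAKE_VM_FAULT_RETRY		0x1u

/* readers models mmap_sem held for read. */
struct fake_mm { int readers; };

static void fake_up_read(struct fake_mm *mm)
{
	assert(mm->readers > 0);
	mm->readers--;
}

/*
 * Models handle_mm_fault(): if the fault must wait and the caller allowed
 * a retry, it drops the read lock itself and reports a retry. Otherwise
 * it (notionally) waits with the lock held and succeeds.
 */
static unsigned int fake_handle_mm_fault(struct fake_mm *mm, unsigned int flags,
					 bool must_wait)
{
	if (must_wait && (flags & FAKE_FAULT_FLAG_ALLOW_RETRY)) {
		fake_up_read(mm);
		return FAKE_VM_FAULT_RETRY;
	}
	return 0;
}

/* Models hmm_vma_do_fault(): non-blocking walks set ALLOW_RETRY. */
static int fake_do_fault(struct fake_mm *mm, bool block, bool must_wait)
{
	unsigned int flags = block ? 0 : FAKE_FAULT_FLAG_ALLOW_RETRY;
	unsigned int ret = fake_handle_mm_fault(mm, flags, must_wait);

	if (ret & FAKE_VM_FAULT_RETRY) {
		/* Note: fake_handle_mm_fault() already released the lock. */
		return -EAGAIN;
	}
	return 0;
}
```

A non-blocking fault that must wait returns -EAGAIN with the lock already gone; a blocking one returns 0 with the lock still held.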

2019-07-24 06:56:22

by Christoph Hellwig

Subject: [PATCH 6/7] mm: remove the legacy hmm_pfn_* APIs

Switch the one remaining user in nouveau over to its replacement,
and remove all the wrappers.

Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Ralph Campbell <[email protected]>
Reviewed-by: Jason Gunthorpe <[email protected]>
---
drivers/gpu/drm/nouveau/nouveau_dmem.c | 2 +-
include/linux/hmm.h | 34 --------------------------
2 files changed, 1 insertion(+), 35 deletions(-)

diff --git a/drivers/gpu/drm/nouveau/nouveau_dmem.c b/drivers/gpu/drm/nouveau/nouveau_dmem.c
index 1333220787a1..345c63cb752a 100644
--- a/drivers/gpu/drm/nouveau/nouveau_dmem.c
+++ b/drivers/gpu/drm/nouveau/nouveau_dmem.c
@@ -845,7 +845,7 @@ nouveau_dmem_convert_pfn(struct nouveau_drm *drm,
struct page *page;
uint64_t addr;

- page = hmm_pfn_to_page(range, range->pfns[i]);
+ page = hmm_device_entry_to_page(range, range->pfns[i]);
if (page == NULL)
continue;

diff --git a/include/linux/hmm.h b/include/linux/hmm.h
index 7ef56dc18050..9f32586684c9 100644
--- a/include/linux/hmm.h
+++ b/include/linux/hmm.h
@@ -290,40 +290,6 @@ static inline uint64_t hmm_device_entry_from_pfn(const struct hmm_range *range,
range->flags[HMM_PFN_VALID];
}

-/*
- * Old API:
- * hmm_pfn_to_page()
- * hmm_pfn_to_pfn()
- * hmm_pfn_from_page()
- * hmm_pfn_from_pfn()
- *
- * This are the OLD API please use new API, it is here to avoid cross-tree
- * merge painfullness ie we convert things to new API in stages.
- */
-static inline struct page *hmm_pfn_to_page(const struct hmm_range *range,
- uint64_t pfn)
-{
- return hmm_device_entry_to_page(range, pfn);
-}
-
-static inline unsigned long hmm_pfn_to_pfn(const struct hmm_range *range,
- uint64_t pfn)
-{
- return hmm_device_entry_to_pfn(range, pfn);
-}
-
-static inline uint64_t hmm_pfn_from_page(const struct hmm_range *range,
- struct page *page)
-{
- return hmm_device_entry_from_page(range, page);
-}
-
-static inline uint64_t hmm_pfn_from_pfn(const struct hmm_range *range,
- unsigned long pfn)
-{
- return hmm_device_entry_from_pfn(range, pfn);
-}
-
/*
* Mirroring: how to synchronize device page table with CPU page table.
*
--
2.20.1

2019-07-26 00:17:34

by Jason Gunthorpe

Subject: Re: hmm_range_fault related fixes and legacy API removal v3

On Wed, Jul 24, 2019 at 08:52:51AM +0200, Christoph Hellwig wrote:
> Hi Jérôme, Ben and Jason,
>
> below is a series against the hmm tree which fixes up the mmap_sem
> locking in nouveau and while at it also removes leftover legacy HMM APIs
> only used by nouveau.
>
> The first 4 patches are a bug fix for nouveau, which I suspect should
> go into this merge window even if the code is marked as staging, just
> to avoid people copying the breakage.
>
> Changes since v2:
> - new patch from Jason to document FAULT_FLAG_ALLOW_RETRY semantics
> better
> - remove -EAGAIN handling in nouveau earlier

I don't see Ralph's tested by, do you think it changed enough to
require testing again? If so, Ralph would you be so kind?

In any event, I'm sending this into linux-next and intend to forward
the first four next week.

Thanks,
Jason

2019-07-26 00:56:22

by Ralph Campbell

Subject: Re: hmm_range_fault related fixes and legacy API removal v3


On 7/25/19 5:16 PM, Jason Gunthorpe wrote:
> I don't see Ralph's tested by, do you think it changed enough to
> require testing again? If so, Ralph would you be so kind?
>
> In any event, I'm sending this into linux-next and intend to forward
> the first four next week.
>
> Thanks,
> Jason
>

I have been testing Christoph's v3 with my set of v2 changes, so
feel free to add my Tested-by.

2019-07-26 04:57:59

by Christoph Hellwig

Subject: Re: hmm_range_fault related fixes and legacy API removal v3

On Fri, Jul 26, 2019 at 12:16:30AM +0000, Jason Gunthorpe wrote:
> I don't see Ralph's tested by, do you think it changed enough to
> require testing again? If so, Ralph would you be so kind?

The changes were fairly small, but I didn't feel comfortable carrying it
over given that there were changes after all.