To block fast gup we need to make sure TTM ptes are always special.
With MIXEDMAP we, on architectures that don't support pte_special,
insert normal ptes, but OTOH on those architectures, fast is not
supported.
At the same time, the function documentation to vm_normal_page() suggests
that ptes pointing to system memory pages of MIXEDMAP vmas are always
normal, but that doesn't seem consistent with what's implemented in
vmf_insert_mixed(). I'm thus not entirely sure this patch is actually
needed.
But to make sure and to avoid also normal (non-fast) gup, make all
TTM vmas PFNMAP. With PFNMAP we can't allow COW mappings
anymore so make is_cow_mapping() available and use it to reject
COW mappigs at mmap time.
There was previously a comment in the code that WC mappings together
with x86 PAT + PFNMAP was bad for performance. However from looking at
vmf_insert_mixed() it looks like in the current code PFNMAP and MIXEDMAP
are handled the same for architectures that support pte_special. This
means there should not be a performance difference anymore, but this
needs to be verified.
Cc: Christian Koenig <[email protected]>
Cc: David Airlie <[email protected]>
Cc: Daniel Vetter <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Jason Gunthorpe <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Signed-off-by: Thomas Hellström (Intel) <[email protected]>
---
drivers/gpu/drm/ttm/ttm_bo_vm.c | 22 ++++++++--------------
include/linux/mm.h | 5 +++++
mm/internal.h | 5 -----
3 files changed, 13 insertions(+), 19 deletions(-)
diff --git a/drivers/gpu/drm/ttm/ttm_bo_vm.c b/drivers/gpu/drm/ttm/ttm_bo_vm.c
index 1c34983480e5..708c6fb9be81 100644
--- a/drivers/gpu/drm/ttm/ttm_bo_vm.c
+++ b/drivers/gpu/drm/ttm/ttm_bo_vm.c
@@ -372,12 +372,7 @@ vm_fault_t ttm_bo_vm_fault_reserved(struct vm_fault *vmf,
* at arbitrary times while the data is mmap'ed.
* See vmf_insert_mixed_prot() for a discussion.
*/
- if (vma->vm_flags & VM_MIXEDMAP)
- ret = vmf_insert_mixed_prot(vma, address,
- __pfn_to_pfn_t(pfn, PFN_DEV),
- prot);
- else
- ret = vmf_insert_pfn_prot(vma, address, pfn, prot);
+ ret = vmf_insert_pfn_prot(vma, address, pfn, prot);
/* Never error on prefaulted PTEs */
if (unlikely((ret & VM_FAULT_ERROR))) {
@@ -555,18 +550,14 @@ static void ttm_bo_mmap_vma_setup(struct ttm_buffer_object *bo, struct vm_area_s
* Note: We're transferring the bo reference to
* vma->vm_private_data here.
*/
-
vma->vm_private_data = bo;
/*
- * We'd like to use VM_PFNMAP on shared mappings, where
- * (vma->vm_flags & VM_SHARED) != 0, for performance reasons,
- * but for some reason VM_PFNMAP + x86 PAT + write-combine is very
- * bad for performance. Until that has been sorted out, use
- * VM_MIXEDMAP on all mappings. See freedesktop.org bug #75719
+ * PFNMAP forces us to block COW mappings in mmap(),
+ * and with MIXEDMAP we would incorrectly allow fast gup
+ * on TTM memory on architectures that don't have pte_special.
*/
- vma->vm_flags |= VM_MIXEDMAP;
- vma->vm_flags |= VM_IO | VM_DONTEXPAND | VM_DONTDUMP;
+ vma->vm_flags |= VM_PFNMAP | VM_IO | VM_DONTEXPAND | VM_DONTDUMP;
}
int ttm_bo_mmap(struct file *filp, struct vm_area_struct *vma,
@@ -579,6 +570,9 @@ int ttm_bo_mmap(struct file *filp, struct vm_area_struct *vma,
if (unlikely(vma->vm_pgoff < DRM_FILE_PAGE_OFFSET_START))
return -EINVAL;
+ if (unlikely(is_cow_mapping(vma->vm_flags)))
+ return -EINVAL;
+
bo = ttm_bo_vm_lookup(bdev, vma->vm_pgoff, vma_pages(vma));
if (unlikely(!bo))
return -EINVAL;
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 77e64e3eac80..c6ebf7f9ddbb 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -686,6 +686,11 @@ static inline bool vma_is_accessible(struct vm_area_struct *vma)
return vma->vm_flags & VM_ACCESS_FLAGS;
}
+static inline bool is_cow_mapping(vm_flags_t flags)
+{
+ return (flags & (VM_SHARED | VM_MAYWRITE)) == VM_MAYWRITE;
+}
+
#ifdef CONFIG_SHMEM
/*
* The vma_is_shmem is not inline because it is used only by slow
diff --git a/mm/internal.h b/mm/internal.h
index 9902648f2206..1432feec62df 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -296,11 +296,6 @@ static inline unsigned int buddy_order(struct page *page)
*/
#define buddy_order_unsafe(page) READ_ONCE(page_private(page))
-static inline bool is_cow_mapping(vm_flags_t flags)
-{
- return (flags & (VM_SHARED | VM_MAYWRITE)) == VM_MAYWRITE;
-}
-
/*
* These three helpers classifies VMAs for virtual memory accounting.
*/
--
2.30.2
Am 21.03.21 um 19:45 schrieb Thomas Hellström (Intel):
> To block fast gup we need to make sure TTM ptes are always special.
> With MIXEDMAP we, on architectures that don't support pte_special,
> insert normal ptes, but OTOH on those architectures, fast is not
> supported.
> At the same time, the function documentation to vm_normal_page() suggests
> that ptes pointing to system memory pages of MIXEDMAP vmas are always
> normal, but that doesn't seem consistent with what's implemented in
> vmf_insert_mixed(). I'm thus not entirely sure this patch is actually
> needed.
>
> But to make sure and to avoid also normal (non-fast) gup, make all
> TTM vmas PFNMAP. With PFNMAP we can't allow COW mappings
> anymore so make is_cow_mapping() available and use it to reject
> COW mappigs at mmap time.
I would separate the disallowing of COW mapping from the PFN change. I'm
pretty sure that COW mappings never worked on TTM BOs in the first place.
But either way this patch is Reviewed-by: Christian König
<[email protected]>.
Thanks,
Christian.
>
> There was previously a comment in the code that WC mappings together
> with x86 PAT + PFNMAP was bad for performance. However from looking at
> vmf_insert_mixed() it looks like in the current code PFNMAP and MIXEDMAP
> are handled the same for architectures that support pte_special. This
> means there should not be a performance difference anymore, but this
> needs to be verified.
>
> Cc: Christian Koenig <[email protected]>
> Cc: David Airlie <[email protected]>
> Cc: Daniel Vetter <[email protected]>
> Cc: Andrew Morton <[email protected]>
> Cc: Jason Gunthorpe <[email protected]>
> Cc: [email protected]
> Cc: [email protected]
> Cc: [email protected]
> Signed-off-by: Thomas Hellström (Intel) <[email protected]>
> ---
> drivers/gpu/drm/ttm/ttm_bo_vm.c | 22 ++++++++--------------
> include/linux/mm.h | 5 +++++
> mm/internal.h | 5 -----
> 3 files changed, 13 insertions(+), 19 deletions(-)
>
> diff --git a/drivers/gpu/drm/ttm/ttm_bo_vm.c b/drivers/gpu/drm/ttm/ttm_bo_vm.c
> index 1c34983480e5..708c6fb9be81 100644
> --- a/drivers/gpu/drm/ttm/ttm_bo_vm.c
> +++ b/drivers/gpu/drm/ttm/ttm_bo_vm.c
> @@ -372,12 +372,7 @@ vm_fault_t ttm_bo_vm_fault_reserved(struct vm_fault *vmf,
> * at arbitrary times while the data is mmap'ed.
> * See vmf_insert_mixed_prot() for a discussion.
> */
> - if (vma->vm_flags & VM_MIXEDMAP)
> - ret = vmf_insert_mixed_prot(vma, address,
> - __pfn_to_pfn_t(pfn, PFN_DEV),
> - prot);
> - else
> - ret = vmf_insert_pfn_prot(vma, address, pfn, prot);
> + ret = vmf_insert_pfn_prot(vma, address, pfn, prot);
>
> /* Never error on prefaulted PTEs */
> if (unlikely((ret & VM_FAULT_ERROR))) {
> @@ -555,18 +550,14 @@ static void ttm_bo_mmap_vma_setup(struct ttm_buffer_object *bo, struct vm_area_s
> * Note: We're transferring the bo reference to
> * vma->vm_private_data here.
> */
> -
> vma->vm_private_data = bo;
>
> /*
> - * We'd like to use VM_PFNMAP on shared mappings, where
> - * (vma->vm_flags & VM_SHARED) != 0, for performance reasons,
> - * but for some reason VM_PFNMAP + x86 PAT + write-combine is very
> - * bad for performance. Until that has been sorted out, use
> - * VM_MIXEDMAP on all mappings. See freedesktop.org bug #75719
> + * PFNMAP forces us to block COW mappings in mmap(),
> + * and with MIXEDMAP we would incorrectly allow fast gup
> + * on TTM memory on architectures that don't have pte_special.
> */
> - vma->vm_flags |= VM_MIXEDMAP;
> - vma->vm_flags |= VM_IO | VM_DONTEXPAND | VM_DONTDUMP;
> + vma->vm_flags |= VM_PFNMAP | VM_IO | VM_DONTEXPAND | VM_DONTDUMP;
> }
>
> int ttm_bo_mmap(struct file *filp, struct vm_area_struct *vma,
> @@ -579,6 +570,9 @@ int ttm_bo_mmap(struct file *filp, struct vm_area_struct *vma,
> if (unlikely(vma->vm_pgoff < DRM_FILE_PAGE_OFFSET_START))
> return -EINVAL;
>
> + if (unlikely(is_cow_mapping(vma->vm_flags)))
> + return -EINVAL;
> +
> bo = ttm_bo_vm_lookup(bdev, vma->vm_pgoff, vma_pages(vma));
> if (unlikely(!bo))
> return -EINVAL;
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 77e64e3eac80..c6ebf7f9ddbb 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -686,6 +686,11 @@ static inline bool vma_is_accessible(struct vm_area_struct *vma)
> return vma->vm_flags & VM_ACCESS_FLAGS;
> }
>
> +static inline bool is_cow_mapping(vm_flags_t flags)
> +{
> + return (flags & (VM_SHARED | VM_MAYWRITE)) == VM_MAYWRITE;
> +}
> +
> #ifdef CONFIG_SHMEM
> /*
> * The vma_is_shmem is not inline because it is used only by slow
> diff --git a/mm/internal.h b/mm/internal.h
> index 9902648f2206..1432feec62df 100644
> --- a/mm/internal.h
> +++ b/mm/internal.h
> @@ -296,11 +296,6 @@ static inline unsigned int buddy_order(struct page *page)
> */
> #define buddy_order_unsafe(page) READ_ONCE(page_private(page))
>
> -static inline bool is_cow_mapping(vm_flags_t flags)
> -{
> - return (flags & (VM_SHARED | VM_MAYWRITE)) == VM_MAYWRITE;
> -}
> -
> /*
> * These three helpers classifies VMAs for virtual memory accounting.
> */
Hi!
On 3/22/21 8:47 AM, Christian König wrote:
> Am 21.03.21 um 19:45 schrieb Thomas Hellström (Intel):
>> To block fast gup we need to make sure TTM ptes are always special.
>> With MIXEDMAP we, on architectures that don't support pte_special,
>> insert normal ptes, but OTOH on those architectures, fast is not
>> supported.
>> At the same time, the function documentation to vm_normal_page()
>> suggests
>> that ptes pointing to system memory pages of MIXEDMAP vmas are always
>> normal, but that doesn't seem consistent with what's implemented in
>> vmf_insert_mixed(). I'm thus not entirely sure this patch is actually
>> needed.
>>
>> But to make sure and to avoid also normal (non-fast) gup, make all
>> TTM vmas PFNMAP. With PFNMAP we can't allow COW mappings
>> anymore so make is_cow_mapping() available and use it to reject
>> COW mappigs at mmap time.
>
> I would separate the disallowing of COW mapping from the PFN change.
> I'm pretty sure that COW mappings never worked on TTM BOs in the first
> place.
COW doesn't work with PFNMAP together with non-linear maps, so as a
consequence from moving from MIXEDMAP to PFNMAP we must disallow COW, so
it seems logical to me to do it in one patch.
And working COW was one of the tests I used for huge PMDs/PUDs, so it
has indeed been working, but I can't think of any relevant use-cases.
Did you, BTW, have a chance to test this with WC mappings?
Thanks,
/Thomas
>
> But either way this patch is Reviewed-by: Christian König
> <[email protected]>.
>
> Thanks,
> Christian.
>
>>
>> There was previously a comment in the code that WC mappings together
>> with x86 PAT + PFNMAP was bad for performance. However from looking at
>> vmf_insert_mixed() it looks like in the current code PFNMAP and MIXEDMAP
>> are handled the same for architectures that support pte_special. This
>> means there should not be a performance difference anymore, but this
>> needs to be verified.
>>
>> Cc: Christian Koenig <[email protected]>
>> Cc: David Airlie <[email protected]>
>> Cc: Daniel Vetter <[email protected]>
>> Cc: Andrew Morton <[email protected]>
>> Cc: Jason Gunthorpe <[email protected]>
>> Cc: [email protected]
>> Cc: [email protected]
>> Cc: [email protected]
>> Signed-off-by: Thomas Hellström (Intel) <[email protected]>
>> ---
>> drivers/gpu/drm/ttm/ttm_bo_vm.c | 22 ++++++++--------------
>> include/linux/mm.h | 5 +++++
>> mm/internal.h | 5 -----
>> 3 files changed, 13 insertions(+), 19 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/ttm/ttm_bo_vm.c
>> b/drivers/gpu/drm/ttm/ttm_bo_vm.c
>> index 1c34983480e5..708c6fb9be81 100644
>> --- a/drivers/gpu/drm/ttm/ttm_bo_vm.c
>> +++ b/drivers/gpu/drm/ttm/ttm_bo_vm.c
>> @@ -372,12 +372,7 @@ vm_fault_t ttm_bo_vm_fault_reserved(struct
>> vm_fault *vmf,
>> * at arbitrary times while the data is mmap'ed.
>> * See vmf_insert_mixed_prot() for a discussion.
>> */
>> - if (vma->vm_flags & VM_MIXEDMAP)
>> - ret = vmf_insert_mixed_prot(vma, address,
>> - __pfn_to_pfn_t(pfn, PFN_DEV),
>> - prot);
>> - else
>> - ret = vmf_insert_pfn_prot(vma, address, pfn, prot);
>> + ret = vmf_insert_pfn_prot(vma, address, pfn, prot);
>> /* Never error on prefaulted PTEs */
>> if (unlikely((ret & VM_FAULT_ERROR))) {
>> @@ -555,18 +550,14 @@ static void ttm_bo_mmap_vma_setup(struct
>> ttm_buffer_object *bo, struct vm_area_s
>> * Note: We're transferring the bo reference to
>> * vma->vm_private_data here.
>> */
>> -
>> vma->vm_private_data = bo;
>> /*
>> - * We'd like to use VM_PFNMAP on shared mappings, where
>> - * (vma->vm_flags & VM_SHARED) != 0, for performance reasons,
>> - * but for some reason VM_PFNMAP + x86 PAT + write-combine is very
>> - * bad for performance. Until that has been sorted out, use
>> - * VM_MIXEDMAP on all mappings. See freedesktop.org bug #75719
>> + * PFNMAP forces us to block COW mappings in mmap(),
>> + * and with MIXEDMAP we would incorrectly allow fast gup
>> + * on TTM memory on architectures that don't have pte_special.
>> */
>> - vma->vm_flags |= VM_MIXEDMAP;
>> - vma->vm_flags |= VM_IO | VM_DONTEXPAND | VM_DONTDUMP;
>> + vma->vm_flags |= VM_PFNMAP | VM_IO | VM_DONTEXPAND | VM_DONTDUMP;
>> }
>> int ttm_bo_mmap(struct file *filp, struct vm_area_struct *vma,
>> @@ -579,6 +570,9 @@ int ttm_bo_mmap(struct file *filp, struct
>> vm_area_struct *vma,
>> if (unlikely(vma->vm_pgoff < DRM_FILE_PAGE_OFFSET_START))
>> return -EINVAL;
>> + if (unlikely(is_cow_mapping(vma->vm_flags)))
>> + return -EINVAL;
>> +
>> bo = ttm_bo_vm_lookup(bdev, vma->vm_pgoff, vma_pages(vma));
>> if (unlikely(!bo))
>> return -EINVAL;
>> diff --git a/include/linux/mm.h b/include/linux/mm.h
>> index 77e64e3eac80..c6ebf7f9ddbb 100644
>> --- a/include/linux/mm.h
>> +++ b/include/linux/mm.h
>> @@ -686,6 +686,11 @@ static inline bool vma_is_accessible(struct
>> vm_area_struct *vma)
>> return vma->vm_flags & VM_ACCESS_FLAGS;
>> }
>> +static inline bool is_cow_mapping(vm_flags_t flags)
>> +{
>> + return (flags & (VM_SHARED | VM_MAYWRITE)) == VM_MAYWRITE;
>> +}
>> +
>> #ifdef CONFIG_SHMEM
>> /*
>> * The vma_is_shmem is not inline because it is used only by slow
>> diff --git a/mm/internal.h b/mm/internal.h
>> index 9902648f2206..1432feec62df 100644
>> --- a/mm/internal.h
>> +++ b/mm/internal.h
>> @@ -296,11 +296,6 @@ static inline unsigned int buddy_order(struct
>> page *page)
>> */
>> #define buddy_order_unsafe(page) READ_ONCE(page_private(page))
>> -static inline bool is_cow_mapping(vm_flags_t flags)
>> -{
>> - return (flags & (VM_SHARED | VM_MAYWRITE)) == VM_MAYWRITE;
>> -}
>> -
>> /*
>> * These three helpers classifies VMAs for virtual memory accounting.
>> */
On Sun, Mar 21, 2021 at 07:45:29PM +0100, Thomas Hellstr?m (Intel) wrote:
> To block fast gup we need to make sure TTM ptes are always special.
> With MIXEDMAP we, on architectures that don't support pte_special,
> insert normal ptes, but OTOH on those architectures, fast is not
> supported.
> At the same time, the function documentation to vm_normal_page() suggests
> that ptes pointing to system memory pages of MIXEDMAP vmas are always
> normal, but that doesn't seem consistent with what's implemented in
> vmf_insert_mixed(). I'm thus not entirely sure this patch is actually
> needed.
>
> But to make sure and to avoid also normal (non-fast) gup, make all
> TTM vmas PFNMAP. With PFNMAP we can't allow COW mappings
> anymore so make is_cow_mapping() available and use it to reject
> COW mappigs at mmap time.
>
> There was previously a comment in the code that WC mappings together
> with x86 PAT + PFNMAP was bad for performance. However from looking at
> vmf_insert_mixed() it looks like in the current code PFNMAP and MIXEDMAP
> are handled the same for architectures that support pte_special. This
> means there should not be a performance difference anymore, but this
> needs to be verified.
>
> Cc: Christian Koenig <[email protected]>
> Cc: David Airlie <[email protected]>
> Cc: Daniel Vetter <[email protected]>
> Cc: Andrew Morton <[email protected]>
> Cc: Jason Gunthorpe <[email protected]>
> Cc: [email protected]
> Cc: [email protected]
> Cc: [email protected]
> Signed-off-by: Thomas Hellstr?m (Intel) <[email protected]>
> ---
> drivers/gpu/drm/ttm/ttm_bo_vm.c | 22 ++++++++--------------
> include/linux/mm.h | 5 +++++
> mm/internal.h | 5 -----
> 3 files changed, 13 insertions(+), 19 deletions(-)
>
> diff --git a/drivers/gpu/drm/ttm/ttm_bo_vm.c b/drivers/gpu/drm/ttm/ttm_bo_vm.c
> index 1c34983480e5..708c6fb9be81 100644
> --- a/drivers/gpu/drm/ttm/ttm_bo_vm.c
> +++ b/drivers/gpu/drm/ttm/ttm_bo_vm.c
> @@ -372,12 +372,7 @@ vm_fault_t ttm_bo_vm_fault_reserved(struct vm_fault *vmf,
> * at arbitrary times while the data is mmap'ed.
> * See vmf_insert_mixed_prot() for a discussion.
> */
> - if (vma->vm_flags & VM_MIXEDMAP)
> - ret = vmf_insert_mixed_prot(vma, address,
> - __pfn_to_pfn_t(pfn, PFN_DEV),
> - prot);
> - else
> - ret = vmf_insert_pfn_prot(vma, address, pfn, prot);
> + ret = vmf_insert_pfn_prot(vma, address, pfn, prot);
>
> /* Never error on prefaulted PTEs */
> if (unlikely((ret & VM_FAULT_ERROR))) {
> @@ -555,18 +550,14 @@ static void ttm_bo_mmap_vma_setup(struct ttm_buffer_object *bo, struct vm_area_s
> * Note: We're transferring the bo reference to
> * vma->vm_private_data here.
> */
> -
> vma->vm_private_data = bo;
>
> /*
> - * We'd like to use VM_PFNMAP on shared mappings, where
> - * (vma->vm_flags & VM_SHARED) != 0, for performance reasons,
> - * but for some reason VM_PFNMAP + x86 PAT + write-combine is very
> - * bad for performance. Until that has been sorted out, use
> - * VM_MIXEDMAP on all mappings. See freedesktop.org bug #75719
> + * PFNMAP forces us to block COW mappings in mmap(),
> + * and with MIXEDMAP we would incorrectly allow fast gup
> + * on TTM memory on architectures that don't have pte_special.
> */
> - vma->vm_flags |= VM_MIXEDMAP;
> - vma->vm_flags |= VM_IO | VM_DONTEXPAND | VM_DONTDUMP;
> + vma->vm_flags |= VM_PFNMAP | VM_IO | VM_DONTEXPAND | VM_DONTDUMP;
> }
>
> int ttm_bo_mmap(struct file *filp, struct vm_area_struct *vma,
> @@ -579,6 +570,9 @@ int ttm_bo_mmap(struct file *filp, struct vm_area_struct *vma,
> if (unlikely(vma->vm_pgoff < DRM_FILE_PAGE_OFFSET_START))
> return -EINVAL;
>
> + if (unlikely(is_cow_mapping(vma->vm_flags)))
> + return -EINVAL;
> +
> bo = ttm_bo_vm_lookup(bdev, vma->vm_pgoff, vma_pages(vma));
> if (unlikely(!bo))
> return -EINVAL;
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 77e64e3eac80..c6ebf7f9ddbb 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -686,6 +686,11 @@ static inline bool vma_is_accessible(struct vm_area_struct *vma)
> return vma->vm_flags & VM_ACCESS_FLAGS;
> }
>
> +static inline bool is_cow_mapping(vm_flags_t flags)
Bit a bikeshed, but I wonder whether the public interface shouldn't be
vma_is_cow_mapping. Or whether this shouldn't be rejected somewhere else,
since at least in drivers/gpu we have tons of cases that don't check for
this and get it all kinds of wrong I think.
remap_pfn_range handles this for many cases, but by far not for all.
Anyway patch itself lgtm:
Reviewed-by: Daniel Vetter <[email protected]>
I'll try and find some -mm folks to look at this too.
-Daniel
> +{
> + return (flags & (VM_SHARED | VM_MAYWRITE)) == VM_MAYWRITE;
> +}
> +
> #ifdef CONFIG_SHMEM
> /*
> * The vma_is_shmem is not inline because it is used only by slow
> diff --git a/mm/internal.h b/mm/internal.h
> index 9902648f2206..1432feec62df 100644
> --- a/mm/internal.h
> +++ b/mm/internal.h
> @@ -296,11 +296,6 @@ static inline unsigned int buddy_order(struct page *page)
> */
> #define buddy_order_unsafe(page) READ_ONCE(page_private(page))
>
> -static inline bool is_cow_mapping(vm_flags_t flags)
> -{
> - return (flags & (VM_SHARED | VM_MAYWRITE)) == VM_MAYWRITE;
> -}
> -
> /*
> * These three helpers classifies VMAs for virtual memory accounting.
> */
> --
> 2.30.2
>
--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
Am 22.03.21 um 09:13 schrieb Thomas Hellström (Intel):
> Hi!
>
> On 3/22/21 8:47 AM, Christian König wrote:
>> Am 21.03.21 um 19:45 schrieb Thomas Hellström (Intel):
>>> To block fast gup we need to make sure TTM ptes are always special.
>>> With MIXEDMAP we, on architectures that don't support pte_special,
>>> insert normal ptes, but OTOH on those architectures, fast is not
>>> supported.
>>> At the same time, the function documentation to vm_normal_page()
>>> suggests
>>> that ptes pointing to system memory pages of MIXEDMAP vmas are always
>>> normal, but that doesn't seem consistent with what's implemented in
>>> vmf_insert_mixed(). I'm thus not entirely sure this patch is actually
>>> needed.
>>>
>>> But to make sure and to avoid also normal (non-fast) gup, make all
>>> TTM vmas PFNMAP. With PFNMAP we can't allow COW mappings
>>> anymore so make is_cow_mapping() available and use it to reject
>>> COW mappigs at mmap time.
>>
>> I would separate the disallowing of COW mapping from the PFN change.
>> I'm pretty sure that COW mappings never worked on TTM BOs in the
>> first place.
>
> COW doesn't work with PFNMAP together with non-linear maps, so as a
> consequence from moving from MIXEDMAP to PFNMAP we must disallow COW,
> so it seems logical to me to do it in one patch.
>
> And working COW was one of the tests I used for huge PMDs/PUDs, so it
> has indeed been working, but I can't think of any relevant use-cases.
Ok, going to keep that in mind. I was assuming COW didn't worked before
on TTM pages.
> Did you, BTW, have a chance to test this with WC mappings?
I'm going to give this a full piglit round, but currently I'm busy with
internal testing.
Thanks,
Christian.
>
> Thanks,
> /Thomas
>
>
>
>>
>> But either way this patch is Reviewed-by: Christian König
>> <[email protected]>.
>>
>> Thanks,
>> Christian.
>>
>>>
>>> There was previously a comment in the code that WC mappings together
>>> with x86 PAT + PFNMAP was bad for performance. However from looking at
>>> vmf_insert_mixed() it looks like in the current code PFNMAP and
>>> MIXEDMAP
>>> are handled the same for architectures that support pte_special. This
>>> means there should not be a performance difference anymore, but this
>>> needs to be verified.
>>>
>>> Cc: Christian Koenig <[email protected]>
>>> Cc: David Airlie <[email protected]>
>>> Cc: Daniel Vetter <[email protected]>
>>> Cc: Andrew Morton <[email protected]>
>>> Cc: Jason Gunthorpe <[email protected]>
>>> Cc: [email protected]
>>> Cc: [email protected]
>>> Cc: [email protected]
>>> Signed-off-by: Thomas Hellström (Intel) <[email protected]>
>>> ---
>>> drivers/gpu/drm/ttm/ttm_bo_vm.c | 22 ++++++++--------------
>>> include/linux/mm.h | 5 +++++
>>> mm/internal.h | 5 -----
>>> 3 files changed, 13 insertions(+), 19 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/ttm/ttm_bo_vm.c
>>> b/drivers/gpu/drm/ttm/ttm_bo_vm.c
>>> index 1c34983480e5..708c6fb9be81 100644
>>> --- a/drivers/gpu/drm/ttm/ttm_bo_vm.c
>>> +++ b/drivers/gpu/drm/ttm/ttm_bo_vm.c
>>> @@ -372,12 +372,7 @@ vm_fault_t ttm_bo_vm_fault_reserved(struct
>>> vm_fault *vmf,
>>> * at arbitrary times while the data is mmap'ed.
>>> * See vmf_insert_mixed_prot() for a discussion.
>>> */
>>> - if (vma->vm_flags & VM_MIXEDMAP)
>>> - ret = vmf_insert_mixed_prot(vma, address,
>>> - __pfn_to_pfn_t(pfn, PFN_DEV),
>>> - prot);
>>> - else
>>> - ret = vmf_insert_pfn_prot(vma, address, pfn, prot);
>>> + ret = vmf_insert_pfn_prot(vma, address, pfn, prot);
>>> /* Never error on prefaulted PTEs */
>>> if (unlikely((ret & VM_FAULT_ERROR))) {
>>> @@ -555,18 +550,14 @@ static void ttm_bo_mmap_vma_setup(struct
>>> ttm_buffer_object *bo, struct vm_area_s
>>> * Note: We're transferring the bo reference to
>>> * vma->vm_private_data here.
>>> */
>>> -
>>> vma->vm_private_data = bo;
>>> /*
>>> - * We'd like to use VM_PFNMAP on shared mappings, where
>>> - * (vma->vm_flags & VM_SHARED) != 0, for performance reasons,
>>> - * but for some reason VM_PFNMAP + x86 PAT + write-combine is very
>>> - * bad for performance. Until that has been sorted out, use
>>> - * VM_MIXEDMAP on all mappings. See freedesktop.org bug #75719
>>> + * PFNMAP forces us to block COW mappings in mmap(),
>>> + * and with MIXEDMAP we would incorrectly allow fast gup
>>> + * on TTM memory on architectures that don't have pte_special.
>>> */
>>> - vma->vm_flags |= VM_MIXEDMAP;
>>> - vma->vm_flags |= VM_IO | VM_DONTEXPAND | VM_DONTDUMP;
>>> + vma->vm_flags |= VM_PFNMAP | VM_IO | VM_DONTEXPAND | VM_DONTDUMP;
>>> }
>>> int ttm_bo_mmap(struct file *filp, struct vm_area_struct *vma,
>>> @@ -579,6 +570,9 @@ int ttm_bo_mmap(struct file *filp, struct
>>> vm_area_struct *vma,
>>> if (unlikely(vma->vm_pgoff < DRM_FILE_PAGE_OFFSET_START))
>>> return -EINVAL;
>>> + if (unlikely(is_cow_mapping(vma->vm_flags)))
>>> + return -EINVAL;
>>> +
>>> bo = ttm_bo_vm_lookup(bdev, vma->vm_pgoff, vma_pages(vma));
>>> if (unlikely(!bo))
>>> return -EINVAL;
>>> diff --git a/include/linux/mm.h b/include/linux/mm.h
>>> index 77e64e3eac80..c6ebf7f9ddbb 100644
>>> --- a/include/linux/mm.h
>>> +++ b/include/linux/mm.h
>>> @@ -686,6 +686,11 @@ static inline bool vma_is_accessible(struct
>>> vm_area_struct *vma)
>>> return vma->vm_flags & VM_ACCESS_FLAGS;
>>> }
>>> +static inline bool is_cow_mapping(vm_flags_t flags)
>>> +{
>>> + return (flags & (VM_SHARED | VM_MAYWRITE)) == VM_MAYWRITE;
>>> +}
>>> +
>>> #ifdef CONFIG_SHMEM
>>> /*
>>> * The vma_is_shmem is not inline because it is used only by slow
>>> diff --git a/mm/internal.h b/mm/internal.h
>>> index 9902648f2206..1432feec62df 100644
>>> --- a/mm/internal.h
>>> +++ b/mm/internal.h
>>> @@ -296,11 +296,6 @@ static inline unsigned int buddy_order(struct
>>> page *page)
>>> */
>>> #define buddy_order_unsafe(page) READ_ONCE(page_private(page))
>>> -static inline bool is_cow_mapping(vm_flags_t flags)
>>> -{
>>> - return (flags & (VM_SHARED | VM_MAYWRITE)) == VM_MAYWRITE;
>>> -}
>>> -
>>> /*
>>> * These three helpers classifies VMAs for virtual memory accounting.
>>> */
On Tue, Mar 23, 2021 at 12:47:24PM +0100, Daniel Vetter wrote:
> > +static inline bool is_cow_mapping(vm_flags_t flags)
>
> Bit a bikeshed, but I wonder whether the public interface shouldn't be
> vma_is_cow_mapping. Or whether this shouldn't be rejected somewhere else,
> since at least in drivers/gpu we have tons of cases that don't check for
> this and get it all kinds of wrong I think.
>
> remap_pfn_range handles this for many cases, but by far not for all.
>
> Anyway patch itself lgtm:
>
> Reviewed-by: Daniel Vetter <[email protected]>
I would like it if io_remap_pfn_range() did not allow shared mappings
at all.
IIRC it doesn't work anyway, the kernel can't reliably copy from IO
pages eg the "_copy_from_user_inatomic()" under cow_user_page() will
not work on s390 that requires all IO memory be accessed with special
instructions.
Unfortunately I have no idea what the long ago special case of
allowing COW'd IO mappings is. :\
Jason
On 3/23/21 3:00 PM, Jason Gunthorpe wrote:
> On Sun, Mar 21, 2021 at 07:45:29PM +0100, Thomas Hellström (Intel) wrote:
>> To block fast gup we need to make sure TTM ptes are always special.
>> With MIXEDMAP we, on architectures that don't support pte_special,
>> insert normal ptes, but OTOH on those architectures, fast is not
>> supported.
>> At the same time, the function documentation to vm_normal_page() suggests
>> that ptes pointing to system memory pages of MIXEDMAP vmas are always
>> normal, but that doesn't seem consistent with what's implemented in
>> vmf_insert_mixed(). I'm thus not entirely sure this patch is actually
>> needed.
>>
>> But to make sure and to avoid also normal (non-fast) gup, make all
>> TTM vmas PFNMAP. With PFNMAP we can't allow COW mappings
>> anymore so make is_cow_mapping() available and use it to reject
>> COW mappigs at mmap time.
>>
>> There was previously a comment in the code that WC mappings together
>> with x86 PAT + PFNMAP was bad for performance. However from looking at
>> vmf_insert_mixed() it looks like in the current code PFNMAP and MIXEDMAP
>> are handled the same for architectures that support pte_special. This
>> means there should not be a performance difference anymore, but this
>> needs to be verified.
>>
>> Cc: Christian Koenig <[email protected]>
>> Cc: David Airlie <[email protected]>
>> Cc: Daniel Vetter <[email protected]>
>> Cc: Andrew Morton <[email protected]>
>> Cc: Jason Gunthorpe <[email protected]>
>> Cc: [email protected]
>> Cc: [email protected]
>> Cc: [email protected]
>> Signed-off-by: Thomas Hellström (Intel) <[email protected]>
>> drivers/gpu/drm/ttm/ttm_bo_vm.c | 22 ++++++++--------------
>> include/linux/mm.h | 5 +++++
>> mm/internal.h | 5 -----
>> 3 files changed, 13 insertions(+), 19 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/ttm/ttm_bo_vm.c b/drivers/gpu/drm/ttm/ttm_bo_vm.c
>> index 1c34983480e5..708c6fb9be81 100644
>> +++ b/drivers/gpu/drm/ttm/ttm_bo_vm.c
>> @@ -372,12 +372,7 @@ vm_fault_t ttm_bo_vm_fault_reserved(struct vm_fault *vmf,
>> * at arbitrary times while the data is mmap'ed.
>> * See vmf_insert_mixed_prot() for a discussion.
>> */
>> - if (vma->vm_flags & VM_MIXEDMAP)
>> - ret = vmf_insert_mixed_prot(vma, address,
>> - __pfn_to_pfn_t(pfn, PFN_DEV),
>> - prot);
>> - else
>> - ret = vmf_insert_pfn_prot(vma, address, pfn, prot);
>> + ret = vmf_insert_pfn_prot(vma, address, pfn, prot);
>>
>> /* Never error on prefaulted PTEs */
>> if (unlikely((ret & VM_FAULT_ERROR))) {
>> @@ -555,18 +550,14 @@ static void ttm_bo_mmap_vma_setup(struct ttm_buffer_object *bo, struct vm_area_s
>> * Note: We're transferring the bo reference to
>> * vma->vm_private_data here.
>> */
>> -
>> vma->vm_private_data = bo;
>>
>> /*
>> - * We'd like to use VM_PFNMAP on shared mappings, where
>> - * (vma->vm_flags & VM_SHARED) != 0, for performance reasons,
>> - * but for some reason VM_PFNMAP + x86 PAT + write-combine is very
>> - * bad for performance. Until that has been sorted out, use
>> - * VM_MIXEDMAP on all mappings. See freedesktop.org bug #75719
>> + * PFNMAP forces us to block COW mappings in mmap(),
>> + * and with MIXEDMAP we would incorrectly allow fast gup
>> + * on TTM memory on architectures that don't have pte_special.
>> */
>> - vma->vm_flags |= VM_MIXEDMAP;
>> - vma->vm_flags |= VM_IO | VM_DONTEXPAND | VM_DONTDUMP;
>> + vma->vm_flags |= VM_PFNMAP | VM_IO | VM_DONTEXPAND | VM_DONTDUMP;
>> }
>>
>> int ttm_bo_mmap(struct file *filp, struct vm_area_struct *vma,
>> @@ -579,6 +570,9 @@ int ttm_bo_mmap(struct file *filp, struct vm_area_struct *vma,
>> if (unlikely(vma->vm_pgoff < DRM_FILE_PAGE_OFFSET_START))
>> return -EINVAL;
>>
>> + if (unlikely(is_cow_mapping(vma->vm_flags)))
>> + return -EINVAL;
>> +
>> bo = ttm_bo_vm_lookup(bdev, vma->vm_pgoff, vma_pages(vma));
>> if (unlikely(!bo))
>> return -EINVAL;
>> diff --git a/include/linux/mm.h b/include/linux/mm.h
>> index 77e64e3eac80..c6ebf7f9ddbb 100644
>> +++ b/include/linux/mm.h
>> @@ -686,6 +686,11 @@ static inline bool vma_is_accessible(struct vm_area_struct *vma)
>> return vma->vm_flags & VM_ACCESS_FLAGS;
>> }
>>
>> +static inline bool is_cow_mapping(vm_flags_t flags)
>> +{
>> + return (flags & (VM_SHARED | VM_MAYWRITE)) == VM_MAYWRITE;
>> +}
> Most driver places are just banning VM_SHARED.
>
> I see you copied this from remap_pfn_range(), but that logic is so
> special I'm not sure..
It's actually used all over the place. Both in drivers and also
redefined with
CONFIG_MEM_SOFT_DIRTY which makes me think Daniels idea of
vma_is_cow_mapping() is better since it won't clash and cause
compilation failures...
>
> Can the user mprotect the write back on with the above logic?
No, it's blocked by mprotect.
> Do we
> need VM_DENYWRITE too?
Seems tied to MAP_DENYWRITE which is nowadays ignored according to man
mmap().
Thanks,
Thomas
>
> Jason
On Sun, Mar 21, 2021 at 07:45:29PM +0100, Thomas Hellström (Intel) wrote:
> To block fast gup we need to make sure TTM ptes are always special.
> With MIXEDMAP we, on architectures that don't support pte_special,
> insert normal ptes, but OTOH on those architectures, fast is not
> supported.
> At the same time, the function documentation to vm_normal_page() suggests
> that ptes pointing to system memory pages of MIXEDMAP vmas are always
> normal, but that doesn't seem consistent with what's implemented in
> vmf_insert_mixed(). I'm thus not entirely sure this patch is actually
> needed.
>
> But to make sure and to avoid also normal (non-fast) gup, make all
> TTM vmas PFNMAP. With PFNMAP we can't allow COW mappings
> anymore so make is_cow_mapping() available and use it to reject
> COW mappigs at mmap time.
>
> There was previously a comment in the code that WC mappings together
> with x86 PAT + PFNMAP was bad for performance. However from looking at
> vmf_insert_mixed() it looks like in the current code PFNMAP and MIXEDMAP
> are handled the same for architectures that support pte_special. This
> means there should not be a performance difference anymore, but this
> needs to be verified.
>
> Cc: Christian Koenig <[email protected]>
> Cc: David Airlie <[email protected]>
> Cc: Daniel Vetter <[email protected]>
> Cc: Andrew Morton <[email protected]>
> Cc: Jason Gunthorpe <[email protected]>
> Cc: [email protected]
> Cc: [email protected]
> Cc: [email protected]
> Signed-off-by: Thomas Hellström (Intel) <[email protected]>
> drivers/gpu/drm/ttm/ttm_bo_vm.c | 22 ++++++++--------------
> include/linux/mm.h | 5 +++++
> mm/internal.h | 5 -----
> 3 files changed, 13 insertions(+), 19 deletions(-)
>
> diff --git a/drivers/gpu/drm/ttm/ttm_bo_vm.c b/drivers/gpu/drm/ttm/ttm_bo_vm.c
> index 1c34983480e5..708c6fb9be81 100644
> +++ b/drivers/gpu/drm/ttm/ttm_bo_vm.c
> @@ -372,12 +372,7 @@ vm_fault_t ttm_bo_vm_fault_reserved(struct vm_fault *vmf,
> * at arbitrary times while the data is mmap'ed.
> * See vmf_insert_mixed_prot() for a discussion.
> */
> - if (vma->vm_flags & VM_MIXEDMAP)
> - ret = vmf_insert_mixed_prot(vma, address,
> - __pfn_to_pfn_t(pfn, PFN_DEV),
> - prot);
> - else
> - ret = vmf_insert_pfn_prot(vma, address, pfn, prot);
> + ret = vmf_insert_pfn_prot(vma, address, pfn, prot);
>
> /* Never error on prefaulted PTEs */
> if (unlikely((ret & VM_FAULT_ERROR))) {
> @@ -555,18 +550,14 @@ static void ttm_bo_mmap_vma_setup(struct ttm_buffer_object *bo, struct vm_area_s
> * Note: We're transferring the bo reference to
> * vma->vm_private_data here.
> */
> -
> vma->vm_private_data = bo;
>
> /*
> - * We'd like to use VM_PFNMAP on shared mappings, where
> - * (vma->vm_flags & VM_SHARED) != 0, for performance reasons,
> - * but for some reason VM_PFNMAP + x86 PAT + write-combine is very
> - * bad for performance. Until that has been sorted out, use
> - * VM_MIXEDMAP on all mappings. See freedesktop.org bug #75719
> + * PFNMAP forces us to block COW mappings in mmap(),
> + * and with MIXEDMAP we would incorrectly allow fast gup
> + * on TTM memory on architectures that don't have pte_special.
> */
> - vma->vm_flags |= VM_MIXEDMAP;
> - vma->vm_flags |= VM_IO | VM_DONTEXPAND | VM_DONTDUMP;
> + vma->vm_flags |= VM_PFNMAP | VM_IO | VM_DONTEXPAND | VM_DONTDUMP;
> }
>
> int ttm_bo_mmap(struct file *filp, struct vm_area_struct *vma,
> @@ -579,6 +570,9 @@ int ttm_bo_mmap(struct file *filp, struct vm_area_struct *vma,
> if (unlikely(vma->vm_pgoff < DRM_FILE_PAGE_OFFSET_START))
> return -EINVAL;
>
> + if (unlikely(is_cow_mapping(vma->vm_flags)))
> + return -EINVAL;
> +
> bo = ttm_bo_vm_lookup(bdev, vma->vm_pgoff, vma_pages(vma));
> if (unlikely(!bo))
> return -EINVAL;
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 77e64e3eac80..c6ebf7f9ddbb 100644
> +++ b/include/linux/mm.h
> @@ -686,6 +686,11 @@ static inline bool vma_is_accessible(struct vm_area_struct *vma)
> return vma->vm_flags & VM_ACCESS_FLAGS;
> }
>
> +static inline bool is_cow_mapping(vm_flags_t flags)
> +{
> + return (flags & (VM_SHARED | VM_MAYWRITE)) == VM_MAYWRITE;
> +}
Most driver places are just banning VM_SHARED.
I see you copied this from remap_pfn_range(), but that logic is so
special I'm not sure..
Can the user mprotect the write back on with the above logic? Do we
need VM_DENYWRITE too?
Jason
On 3/23/21 3:04 PM, Jason Gunthorpe wrote:
> On Tue, Mar 23, 2021 at 12:47:24PM +0100, Daniel Vetter wrote:
>
>>> +static inline bool is_cow_mapping(vm_flags_t flags)
>> Bit a bikeshed, but I wonder whether the public interface shouldn't be
>> vma_is_cow_mapping. Or whether this shouldn't be rejected somewhere else,
>> since at least in drivers/gpu we have tons of cases that don't check for
>> this and get it all kinds of wrong I think.
>>
>> remap_pfn_range handles this for many cases, but by far not for all.
>>
>> Anyway patch itself lgtm:
>>
>> Reviewed-by: Daniel Vetter <[email protected]>
> I would like it if io_remap_pfn_range() did not allow shared mappings
> at all.
You mean private mappings?
>
> IIRC it doesn't work anyway, the kernel can't reliably copy from IO
> pages eg the "_copy_from_user_inatomic()" under cow_user_page() will
> not work on s390 that requires all IO memory be accessed with special
> instructions.
>
> Unfortunately I have no idea what the long ago special case of
> allowing COW'd IO mappings is. :\
Me neither, but at some point it must have been important enough to
introduce VM_MIXEDMAP...
/Thomas
> Jason
On Tue, Mar 23, 2021 at 04:46:00PM +0100, Thomas Hellström (Intel) wrote:
> > > +static inline bool is_cow_mapping(vm_flags_t flags)
> > > +{
> > > + return (flags & (VM_SHARED | VM_MAYWRITE)) == VM_MAYWRITE;
> > > +}
> > Most driver places are just banning VM_SHARED.
> >
> > I see you copied this from remap_pfn_range(), but that logic is so
> > special I'm not sure..
>
> It's actually used all over the place. Both in drivers and also redefined
> with
> CONFIG_MEM_SOFT_DIRTY which makes me think Daniels idea of
> vma_is_cow_mapping() is better since it won't clash and cause compilation
> failures...
Well, lets update more mmap fops to use this new helper then?
Searching for VM_SHARED gives a good list, there are several in
drivers/infiniband
Jason