In order to reduce TLB misses and CPU usage this patchset enables huge-
and giant page-table entries for TTM and TTM-enabled graphics drivers.
Patch 1 and 2 introduce a vma_is_special_huge() function to make the mm code
take the same path as DAX when splitting huge- and giant page table entries,
(which currently means zapping the page-table entry and rely on re-faulting).
Patch 3 makes the mm code split existing huge page-table entries
on huge_fault fallbacks. Typically on COW or on buffer-objects that want
write-notify. COW and write-notification is always done on the lowest
page-table level. See the patch log message for additional considerations.
Patch 4 introduces functions to allow the graphics drivers to manipulate
the caching- and encryption flags of huge page-table entries without ugly
hacks.
Patch 5 implements the huge_fault handler in TTM.
This enables huge page-table entries, provided that the kernel is configured
to support transhuge pages, either by default or using madvise().
However, they are unlikely to be inserted unless the kernel buffer object
pfns and user-space addresses align perfectly. There are various options
here, but since buffer objects that reside in system pages typically start
at huge page boundaries if they are backed by huge pages, we try to enforce
buffer object starting pfns and user-space addresses to be huge page-size
aligned if their size exceeds a huge page-size. If pud-size transhuge
("giant") pages are enabled by the arch, the same holds for those.
Patch 6 implements a specialized huge_fault handler for vmwgfx.
The vmwgfx driver may perform dirty-tracking and needs some special code
to handle that correctly.
Patch 7 implements a drm helper to align user-space addresses according
to the above scheme, if possible.
Patch 8 implements a TTM range manager for vmwgfx that does the same for
graphics IO memory. This may later be reused by other graphics drivers
if necessary.
Patch 9 finally hooks up the helpers of patch 7 and 8 to the vmwgfx driver.
A similar change is needed for graphics drivers that want a reasonable
likelyhood of actually using huge page-table entries.
If a buffer object size is not huge-page or giant-page aligned,
its size will NOT be inflated by this patchset. This means that the buffer
object tail will use smaller size page-table entries and thus no memory
overhead occurs. Drivers that want to pay the memory overhead price need to
implement their own scheme to inflate buffer-object sizes.
PMD size huge page-table-entries have been tested with vmwgfx and found to
work well both with system memory backed and IO memory backed buffer objects.
PUD size giant page-table-entries have seen limited (fault and COW) testing
using a modified kernel (to support 1GB page allocations) and a fake vmwgfx
TTM memory type. The vmwgfx driver does otherwise not support 1GB-size IO
memory resources.
Comments and suggestions welcome.
Thomas
Changes since RFC:
* Check for buffer objects present in contigous IO Memory (Christian König)
* Rebased on the vmwgfx emulated coherent memory functionality. That rebase
adds patch 5.
Changes since v1:
* Make the new TTM range manager vmwgfx-specific. (Christian König)
* Minor fixes for configs that don't support or only partially support
transhuge pages.
Changes since v2:
* Minor coding style and doc fixes in patch 5/9 (Christian König)
* Patch 5/9 doesn't touch mm. Remove from the patch title.
Changes since v3:
* Added reviews and acks
* Implemented ugly but generic ttm_pgprot_is_wrprotecting() instead of arch
specific code.
Cc: Andrew Morton <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: "Matthew Wilcox (Oracle)" <[email protected]>
Cc: "Kirill A. Shutemov" <[email protected]>
Cc: Ralph Campbell <[email protected]>
Cc: "Jérôme Glisse" <[email protected]>
Cc: "Christian König" <[email protected]>
Cc: Dan Williams <[email protected]>
From: Thomas Hellstrom <[email protected]>
For VM_PFNMAP and VM_MIXEDMAP vmas that want to support transhuge pages
and -page table entries, introduce vma_is_special_huge() that takes the
same codepaths as vma_is_dax().
The use of "special" follows the definition in memory.c, vm_normal_page():
"Special" mappings do not wish to be associated with a "struct page"
(either it doesn't exist, or it exists but they don't want to touch it)
For PAGE_SIZE pages, "special" is determined per page table entry to be
able to deal with COW pages. But since we don't have huge COW pages,
we can classify a vma as either "special huge" or "normal huge".
Cc: Andrew Morton <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: "Matthew Wilcox (Oracle)" <[email protected]>
Cc: "Kirill A. Shutemov" <[email protected]>
Cc: Ralph Campbell <[email protected]>
Cc: "Jérôme Glisse" <[email protected]>
Cc: "Christian König" <[email protected]>
Cc: Dan Williams <[email protected]>
Signed-off-by: Thomas Hellstrom <[email protected]>
Acked-by: Christian König <[email protected]>
---
include/linux/mm.h | 17 +++++++++++++++++
mm/huge_memory.c | 6 +++---
2 files changed, 20 insertions(+), 3 deletions(-)
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 0157d293935f..d370ce2932a1 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2822,6 +2822,23 @@ extern long copy_huge_page_from_user(struct page *dst_page,
const void __user *usr_src,
unsigned int pages_per_huge_page,
bool allow_pagefault);
+
+/**
+ * vma_is_special_huge - Are transhuge page-table entries considered special?
+ * @vma: Pointer to the struct vm_area_struct to consider
+ *
+ * Whether transhuge page-table entries are considered "special" following
+ * the definition in vm_normal_page().
+ *
+ * Return: true if transhuge page-table entries should be considered special,
+ * false otherwise.
+ */
+static inline bool vma_is_special_huge(const struct vm_area_struct *vma)
+{
+ return vma_is_dax(vma) || (vma->vm_file &&
+ (vma->vm_flags & (VM_PFNMAP | VM_MIXEDMAP)));
+}
+
#endif /* CONFIG_TRANSPARENT_HUGEPAGE || CONFIG_HUGETLBFS */
#ifdef CONFIG_DEBUG_PAGEALLOC
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 41a0fbddc96b..f8d24fc3f4df 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1789,7 +1789,7 @@ int zap_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
orig_pmd = pmdp_huge_get_and_clear_full(tlb->mm, addr, pmd,
tlb->fullmm);
tlb_remove_pmd_tlb_entry(tlb, pmd, addr);
- if (vma_is_dax(vma)) {
+ if (vma_is_special_huge(vma)) {
if (arch_needs_pgtable_deposit())
zap_deposited_table(tlb->mm, pmd);
spin_unlock(ptl);
@@ -2053,7 +2053,7 @@ int zap_huge_pud(struct mmu_gather *tlb, struct vm_area_struct *vma,
*/
pudp_huge_get_and_clear_full(tlb->mm, addr, pud, tlb->fullmm);
tlb_remove_pud_tlb_entry(tlb, pud, addr);
- if (vma_is_dax(vma)) {
+ if (vma_is_special_huge(vma)) {
spin_unlock(ptl);
/* No zero page support yet */
} else {
@@ -2162,7 +2162,7 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd,
*/
if (arch_needs_pgtable_deposit())
zap_deposited_table(mm, pmd);
- if (vma_is_dax(vma))
+ if (vma_is_special_huge(vma))
return;
page = pmd_page(_pmd);
if (!PageDirty(page) && pmd_dirty(_pmd))
--
2.21.1
Andrew, Michal
I'm wondering what's the best way here to get the patches touching mm
reviewed and accepted?
While drm people and VMware internal people have looked at them, I think
the huge_fault() fallback splitting and the introduction of
vma_is_special_huge() needs looking at more thoroughly.
Apart from that, if possible, I think the best way to merge this series
is also through a DRM tree.
Thanks,
Thomas
On 2/20/20 1:27 PM, Thomas Hellström (VMware) wrote:
> In order to reduce TLB misses and CPU usage this patchset enables huge-
> and giant page-table entries for TTM and TTM-enabled graphics drivers.
>
> Patch 1 and 2 introduce a vma_is_special_huge() function to make the mm code
> take the same path as DAX when splitting huge- and giant page table entries,
> (which currently means zapping the page-table entry and rely on re-faulting).
>
> Patch 3 makes the mm code split existing huge page-table entries
> on huge_fault fallbacks. Typically on COW or on buffer-objects that want
> write-notify. COW and write-notification is always done on the lowest
> page-table level. See the patch log message for additional considerations.
>
> Patch 4 introduces functions to allow the graphics drivers to manipulate
> the caching- and encryption flags of huge page-table entries without ugly
> hacks.
>
> Patch 5 implements the huge_fault handler in TTM.
> This enables huge page-table entries, provided that the kernel is configured
> to support transhuge pages, either by default or using madvise().
> However, they are unlikely to be inserted unless the kernel buffer object
> pfns and user-space addresses align perfectly. There are various options
> here, but since buffer objects that reside in system pages typically start
> at huge page boundaries if they are backed by huge pages, we try to enforce
> buffer object starting pfns and user-space addresses to be huge page-size
> aligned if their size exceeds a huge page-size. If pud-size transhuge
> ("giant") pages are enabled by the arch, the same holds for those.
>
> Patch 6 implements a specialized huge_fault handler for vmwgfx.
> The vmwgfx driver may perform dirty-tracking and needs some special code
> to handle that correctly.
>
> Patch 7 implements a drm helper to align user-space addresses according
> to the above scheme, if possible.
>
> Patch 8 implements a TTM range manager for vmwgfx that does the same for
> graphics IO memory. This may later be reused by other graphics drivers
> if necessary.
>
> Patch 9 finally hooks up the helpers of patch 7 and 8 to the vmwgfx driver.
> A similar change is needed for graphics drivers that want a reasonable
> likelyhood of actually using huge page-table entries.
>
> If a buffer object size is not huge-page or giant-page aligned,
> its size will NOT be inflated by this patchset. This means that the buffer
> object tail will use smaller size page-table entries and thus no memory
> overhead occurs. Drivers that want to pay the memory overhead price need to
> implement their own scheme to inflate buffer-object sizes.
>
> PMD size huge page-table-entries have been tested with vmwgfx and found to
> work well both with system memory backed and IO memory backed buffer objects.
>
> PUD size giant page-table-entries have seen limited (fault and COW) testing
> using a modified kernel (to support 1GB page allocations) and a fake vmwgfx
> TTM memory type. The vmwgfx driver does otherwise not support 1GB-size IO
> memory resources.
>
> Comments and suggestions welcome.
> Thomas
>
> Changes since RFC:
> * Check for buffer objects present in contigous IO Memory (Christian König)
> * Rebased on the vmwgfx emulated coherent memory functionality. That rebase
> adds patch 5.
> Changes since v1:
> * Make the new TTM range manager vmwgfx-specific. (Christian König)
> * Minor fixes for configs that don't support or only partially support
> transhuge pages.
> Changes since v2:
> * Minor coding style and doc fixes in patch 5/9 (Christian König)
> * Patch 5/9 doesn't touch mm. Remove from the patch title.
> Changes since v3:
> * Added reviews and acks
> * Implemented ugly but generic ttm_pgprot_is_wrprotecting() instead of arch
> specific code.
>
> Cc: Andrew Morton <[email protected]>
> Cc: Michal Hocko <[email protected]>
> Cc: "Matthew Wilcox (Oracle)" <[email protected]>
> Cc: "Kirill A. Shutemov" <[email protected]>
> Cc: Ralph Campbell <[email protected]>
> Cc: "Jérôme Glisse" <[email protected]>
> Cc: "Christian König" <[email protected]>
> Cc: Dan Williams <[email protected]>
>
>
> _______________________________________________
> dri-devel mailing list
> [email protected]
> https://lists.freedesktop.org/mailman/listinfo/dri-devel
On Fri, 28 Feb 2020 14:08:04 +0100 Thomas Hellstr?m (VMware) <[email protected]> wrote:
> I'm wondering what's the best way here to get the patches touching mm
> reviewed and accepted?
> While drm people and VMware internal people have looked at them, I think
> the huge_fault() fallback splitting and the introduction of
> vma_is_special_huge() needs looking at more thoroughly.
>
> Apart from that, if possible, I think the best way to merge this series
> is also through a DRM tree.
Patches 1-3 look OK to me. I just had a few commenting/changelogging
niggles.
On Fri 28-02-20 14:08:04, Thomas Hellstr?m (VMware) wrote:
> Andrew, Michal
>
> I'm wondering what's the best way here to get the patches touching mm
> reviewed and accepted?
I am sorry, but I am busy with other stuff and unlikely to find time to
review this series.
--
Michal Hocko
SUSE Labs
On 3/1/20 5:04 AM, Andrew Morton wrote:
> On Fri, 28 Feb 2020 14:08:04 +0100 Thomas Hellström (VMware) <[email protected]> wrote:
>
>> I'm wondering what's the best way here to get the patches touching mm
>> reviewed and accepted?
>> While drm people and VMware internal people have looked at them, I think
>> the huge_fault() fallback splitting and the introduction of
>> vma_is_special_huge() needs looking at more thoroughly.
>>
>> Apart from that, if possible, I think the best way to merge this series
>> is also through a DRM tree.
> Patches 1-3 look OK to me. I just had a few commenting/changelogging
> niggles.
Thanks for reviewing, Andrew.
I just updated the series following your comments.
Thanks,
Thomas