These patches include few backported fixes for the 4.4 stable
tree.
I would appreciate if you could kindly consider including them in the
next release.
Ajay
---
[PATCH 1/8]:
Backporting of upstream commit f958d7b528b1:
mm: make page ref count overflow check tighter and more explicit
[PATCH 2/8]:
Backporting of upstream commit 88b1a17dfc3e:
mm: add 'try_get_page()' helper function
[PATCH 3/8]:
Backporting of upstream commit 7aef4172c795:
mm: handle PTE-mapped tail pages in gerneric fast gup implementaiton
[PATCH 4/8]:
Backporting of upstream commit a3e328556d41:
mm, gup: remove broken VM_BUG_ON_PAGE compound check for hugepages
[PATCH 5/8]:
Backporting of upstream commit d63206ee32b6:
mm, gup: ensure real head page is ref-counted when using hugepages
[PATCH 6/8]:
Backporting of upstream commit 8fde12ca79af:
mm: prevent get_user_pages() from overflowing page refcount
[PATCH 7/8]:
Backporting of upstream commit 7bf2d1df8082:
pipe: add pipe_buf_get() helper
[PATCH 8/8]:
Backporting of upstream commit 15fab63e1e57:
fs: prevent page refcount overflow in pipe_buf_get
From: Will Deacon <[email protected]>
commit a3e328556d41bb61c55f9dfcc62d6a826ea97b85 upstream.
When operating on hugepages with DEBUG_VM enabled, the GUP code checks
the compound head for each tail page prior to calling
page_cache_add_speculative. This is broken, because on the fast-GUP
path (where we don't hold any page table locks) we can be racing with a
concurrent invocation of split_huge_page_to_list.
split_huge_page_to_list deals with this race by using page_ref_freeze to
freeze the page and force concurrent GUPs to fail whilst the component
pages are modified. This modification includes clearing the
compound_head field for the tail pages, so checking this prior to a
successful call to page_cache_add_speculative can lead to false
positives: In fact, page_cache_add_speculative *already* has this check
once the page refcount has been successfully updated, so we can simply
remove the broken calls to VM_BUG_ON_PAGE.
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Will Deacon <[email protected]>
Signed-off-by: Punit Agrawal <[email protected]>
Acked-by: Steve Capper <[email protected]>
Acked-by: Kirill A. Shutemov <[email protected]>
Cc: Aneesh Kumar K.V <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Naoya Horiguchi <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Hillf Danton <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Mike Kravetz <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Signed-off-by: Srivatsa S. Bhat (VMware) <[email protected]>
Signed-off-by: Ajay Kaher <[email protected]>
---
mm/gup.c | 3 ---
1 file changed, 3 deletions(-)
diff --git a/mm/gup.c b/mm/gup.c
index 45c544b..6e7cfaa 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -1136,7 +1136,6 @@ static int gup_huge_pmd(pmd_t orig, pmd_t *pmdp, unsigned long addr,
page = head + ((addr & ~PMD_MASK) >> PAGE_SHIFT);
tail = page;
do {
- VM_BUG_ON_PAGE(compound_head(page) != head, page);
pages[*nr] = page;
(*nr)++;
page++;
@@ -1183,7 +1182,6 @@ static int gup_huge_pud(pud_t orig, pud_t *pudp, unsigned long addr,
page = head + ((addr & ~PUD_MASK) >> PAGE_SHIFT);
tail = page;
do {
- VM_BUG_ON_PAGE(compound_head(page) != head, page);
pages[*nr] = page;
(*nr)++;
page++;
@@ -1226,7 +1224,6 @@ static int gup_huge_pgd(pgd_t orig, pgd_t *pgdp, unsigned long addr,
page = head + ((addr & ~PGDIR_MASK) >> PAGE_SHIFT);
tail = page;
do {
- VM_BUG_ON_PAGE(compound_head(page) != head, page);
pages[*nr] = page;
(*nr)++;
page++;
--
2.7.4
From: Punit Agrawal <[email protected]>
commit d63206ee32b6e64b0e12d46e5d6004afd9913713 upstream.
When speculatively taking references to a hugepage using
page_cache_add_speculative() in gup_huge_pmd(), it is assumed that the
page returned by pmd_page() is the head page. Although normally true,
this assumption doesn't hold when the hugepage comprises of successive
page table entries such as when using contiguous bit on arm64 at PTE or
PMD levels.
This can be addressed by ensuring that the page passed to
page_cache_add_speculative() is the real head or by de-referencing the
head page within the function.
We take the first approach to keep the usage pattern aligned with
page_cache_get_speculative() where users already pass the appropriate
page, i.e., the de-referenced head.
Apply the same logic to fix gup_huge_[pud|pgd]() as well.
[[email protected]: fix arm64 ltp failure]
Link: http://lkml.kernel.org/r/[email protected]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Punit Agrawal <[email protected]>
Acked-by: Steve Capper <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: "Kirill A. Shutemov" <[email protected]>
Cc: Aneesh Kumar K.V <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Naoya Horiguchi <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Hillf Danton <[email protected]>
Cc: Mike Kravetz <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Signed-off-by: Ajay Kaher <[email protected]>
Reviewed-by: Srivatsa S. Bhat (VMware) <[email protected]>
---
mm/gup.c | 12 ++++++------
1 file changed, 6 insertions(+), 6 deletions(-)
diff --git a/mm/gup.c b/mm/gup.c
index 6e7cfaa..fae4d1e 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -1132,8 +1132,7 @@ static int gup_huge_pmd(pmd_t orig, pmd_t *pmdp, unsigned long addr,
return 0;
refs = 0;
- head = pmd_page(orig);
- page = head + ((addr & ~PMD_MASK) >> PAGE_SHIFT);
+ page = pmd_page(orig) + ((addr & ~PMD_MASK) >> PAGE_SHIFT);
tail = page;
do {
pages[*nr] = page;
@@ -1142,6 +1141,7 @@ static int gup_huge_pmd(pmd_t orig, pmd_t *pmdp, unsigned long addr,
refs++;
} while (addr += PAGE_SIZE, addr != end);
+ head = compound_head(pmd_page(orig));
if (!page_cache_add_speculative(head, refs)) {
*nr -= refs;
return 0;
@@ -1178,8 +1178,7 @@ static int gup_huge_pud(pud_t orig, pud_t *pudp, unsigned long addr,
return 0;
refs = 0;
- head = pud_page(orig);
- page = head + ((addr & ~PUD_MASK) >> PAGE_SHIFT);
+ page = pud_page(orig) + ((addr & ~PUD_MASK) >> PAGE_SHIFT);
tail = page;
do {
pages[*nr] = page;
@@ -1188,6 +1187,7 @@ static int gup_huge_pud(pud_t orig, pud_t *pudp, unsigned long addr,
refs++;
} while (addr += PAGE_SIZE, addr != end);
+ head = compound_head(pud_page(orig));
if (!page_cache_add_speculative(head, refs)) {
*nr -= refs;
return 0;
@@ -1220,8 +1220,7 @@ static int gup_huge_pgd(pgd_t orig, pgd_t *pgdp, unsigned long addr,
return 0;
refs = 0;
- head = pgd_page(orig);
- page = head + ((addr & ~PGDIR_MASK) >> PAGE_SHIFT);
+ page = pgd_page(orig) + ((addr & ~PGDIR_MASK) >> PAGE_SHIFT);
tail = page;
do {
pages[*nr] = page;
@@ -1230,6 +1229,7 @@ static int gup_huge_pgd(pgd_t orig, pgd_t *pgdp, unsigned long addr,
refs++;
} while (addr += PAGE_SIZE, addr != end);
+ head = compound_head(pgd_page(orig));
if (!page_cache_add_speculative(head, refs)) {
*nr -= refs;
return 0;
--
2.7.4
From: Linus Torvalds <[email protected]>
commit 8fde12ca79aff9b5ba951fce1a2641901b8d8e64 upstream.
If the page refcount wraps around past zero, it will be freed while
there are still four billion references to it. One of the possible
avenues for an attacker to try to make this happen is by doing direct IO
on a page multiple times. This patch makes get_user_pages() refuse to
take a new page reference if there are already more than two billion
references to the page.
Reported-by: Jann Horn <[email protected]>
Acked-by: Matthew Wilcox <[email protected]>
Cc: [email protected]
Signed-off-by: Linus Torvalds <[email protected]>
[ 4.4.y backport notes:
Ajay: Added local variable 'err' with-in follow_hugetlb_page()
from 2be7cfed995e, to resolve compilation error
Srivatsa: Replaced call to get_page_foll() with try_get_page_foll() ]
Signed-off-by: Srivatsa S. Bhat (VMware) <[email protected]>
Signed-off-by: Ajay Kaher <[email protected]>
---
mm/gup.c | 43 ++++++++++++++++++++++++++++++++-----------
mm/hugetlb.c | 16 +++++++++++++++-
2 files changed, 47 insertions(+), 12 deletions(-)
diff --git a/mm/gup.c b/mm/gup.c
index fae4d1e..171b460 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -126,8 +126,12 @@ retry:
}
}
- if (flags & FOLL_GET)
- get_page_foll(page);
+ if (flags & FOLL_GET) {
+ if (unlikely(!try_get_page_foll(page))) {
+ page = ERR_PTR(-ENOMEM);
+ goto out;
+ }
+ }
if (flags & FOLL_TOUCH) {
if ((flags & FOLL_WRITE) &&
!pte_dirty(pte) && !PageDirty(page))
@@ -289,7 +293,10 @@ static int get_gate_page(struct mm_struct *mm, unsigned long address,
goto unmap;
*page = pte_page(*pte);
}
- get_page(*page);
+ if (unlikely(!try_get_page(*page))) {
+ ret = -ENOMEM;
+ goto unmap;
+ }
out:
ret = 0;
unmap:
@@ -1053,6 +1060,20 @@ struct page *get_dump_page(unsigned long addr)
*/
#ifdef CONFIG_HAVE_GENERIC_RCU_GUP
+/*
+ * Return the compund head page with ref appropriately incremented,
+ * or NULL if that failed.
+ */
+static inline struct page *try_get_compound_head(struct page *page, int refs)
+{
+ struct page *head = compound_head(page);
+ if (WARN_ON_ONCE(atomic_read(&head->_count) < 0))
+ return NULL;
+ if (unlikely(!page_cache_add_speculative(head, refs)))
+ return NULL;
+ return head;
+}
+
#ifdef __HAVE_ARCH_PTE_SPECIAL
static int gup_pte_range(pmd_t pmd, unsigned long addr, unsigned long end,
int write, struct page **pages, int *nr)
@@ -1082,9 +1103,9 @@ static int gup_pte_range(pmd_t pmd, unsigned long addr, unsigned long end,
VM_BUG_ON(!pfn_valid(pte_pfn(pte)));
page = pte_page(pte);
- head = compound_head(page);
- if (!page_cache_get_speculative(head))
+ head = try_get_compound_head(page, 1);
+ if (!head)
goto pte_unmap;
if (unlikely(pte_val(pte) != pte_val(*ptep))) {
@@ -1141,8 +1162,8 @@ static int gup_huge_pmd(pmd_t orig, pmd_t *pmdp, unsigned long addr,
refs++;
} while (addr += PAGE_SIZE, addr != end);
- head = compound_head(pmd_page(orig));
- if (!page_cache_add_speculative(head, refs)) {
+ head = try_get_compound_head(pmd_page(orig), refs);
+ if (!head) {
*nr -= refs;
return 0;
}
@@ -1187,8 +1208,8 @@ static int gup_huge_pud(pud_t orig, pud_t *pudp, unsigned long addr,
refs++;
} while (addr += PAGE_SIZE, addr != end);
- head = compound_head(pud_page(orig));
- if (!page_cache_add_speculative(head, refs)) {
+ head = try_get_compound_head(pud_page(orig), refs);
+ if (!head) {
*nr -= refs;
return 0;
}
@@ -1229,8 +1250,8 @@ static int gup_huge_pgd(pgd_t orig, pgd_t *pgdp, unsigned long addr,
refs++;
} while (addr += PAGE_SIZE, addr != end);
- head = compound_head(pgd_page(orig));
- if (!page_cache_add_speculative(head, refs)) {
+ head = try_get_compound_head(pgd_page(orig), refs);
+ if (!head) {
*nr -= refs;
return 0;
}
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index fd932e7..3a1501e 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -3886,6 +3886,7 @@ long follow_hugetlb_page(struct mm_struct *mm, struct vm_area_struct *vma,
unsigned long vaddr = *position;
unsigned long remainder = *nr_pages;
struct hstate *h = hstate_vma(vma);
+ int err = -EFAULT;
while (vaddr < vma->vm_end && remainder) {
pte_t *pte;
@@ -3957,6 +3958,19 @@ long follow_hugetlb_page(struct mm_struct *mm, struct vm_area_struct *vma,
pfn_offset = (vaddr & ~huge_page_mask(h)) >> PAGE_SHIFT;
page = pte_page(huge_ptep_get(pte));
+
+ /*
+ * Instead of doing 'try_get_page_foll()' below in the same_page
+ * loop, just check the count once here.
+ */
+ if (unlikely(page_count(page) <= 0)) {
+ if (pages) {
+ spin_unlock(ptl);
+ remainder = 0;
+ err = -ENOMEM;
+ break;
+ }
+ }
same_page:
if (pages) {
pages[i] = mem_map_offset(page, pfn_offset);
@@ -3983,7 +3997,7 @@ same_page:
*nr_pages = remainder;
*position = vaddr;
- return i ? i : -EFAULT;
+ return i ? i : err;
}
unsigned long hugetlb_change_protection(struct vm_area_struct *vma,
--
2.7.4
On Tue, Jul 23, 2019 at 04:38:23PM +0530, Ajay Kaher wrote:
> These patches include few backported fixes for the 4.4 stable
> tree.
> I would appreciate if you could kindly consider including them in the
> next release.
Why are these needed? From what I remember, the last patch here is only
needed for machines that are "HUGE" and for those, you shouldn't be
using 4.4.y anymore anyway, right? You just end up saving so much more
speed and energy using a newer kernel, why would you want to waste it
using an older one?
So I need a really good reason why to accept these :)
thanks,
greg k-h
On Wed, Jul 24, 2019 at 02:06:27PM +0200, Greg KH wrote:
> On Tue, Jul 23, 2019 at 04:38:23PM +0530, Ajay Kaher wrote:
> > These patches include few backported fixes for the 4.4 stable
> > tree.
> > I would appreciate if you could kindly consider including them in the
> > next release.
>
> Why are these needed? From what I remember, the last patch here is only
> needed for machines that are "HUGE" and for those, you shouldn't be
> using 4.4.y anymore anyway, right? You just end up saving so much more
> speed and energy using a newer kernel, why would you want to waste it
> using an older one?
>
> So I need a really good reason why to accept these :)
It's been a week, so I'm dropping this from my queue now. Please resend
with this information if you still want these in the tree.
thanks,
greg k-h