2007-12-10 21:31:50

by Dave Kleikamp

Subject: Re: [rfc] lockless get_user_pages for dio (and more)


On Mon, 2007-10-15 at 22:25 +1000, Nick Piggin wrote:
> On Monday 15 October 2007 04:19, Siddha, Suresh B wrote:
> > On Sun, Oct 14, 2007 at 11:01:02AM +1000, Nick Piggin wrote:

> > > This is just a really quick hack, untested ATM, but one that
> > > has at least a chance of working (on x86).
> >
> > When we fall back to slow mode, we should decrement the ref counts
> > on the pages we got so far in the fast mode.
>
> Here is something that is actually tested and works (not
> tested with hugepages yet, though).
>
> However it's not 100% secure at the moment. It's actually
> not completely trivial; I think we need to use an extra bit
> in the present pte in order to exclude "not normal" pages,
> if we want fast_gup to work on small page mappings too. I
> think this would be possible to do on most architectures, but
> I haven't done it here obviously.
>
> Still, it should be enough to test the design. I've added
> fast_gup and fast_gup_slow to /proc/vmstat, which count the
> number of times fast_gup was called, and the number of times
> it dropped into the slowpath. It would be interesting to know
> how it performs compared to your granular hugepage ptl...

Nick,
I've played with the fast_gup patch a bit. I was able to find a problem
in follow_hugetlb_page() that Adam Litke fixed. I haven't been brave
enough to implement it on any other architectures, but I did add a
default that takes mmap_sem and calls the normal get_user_pages() if the
architecture doesn't define fast_gup(). I put it in linux/mm.h, for
lack of a better place, but it's a little kludgy since I didn't want
mm.h to have to include sched.h. This patch is against 2.6.24-rc4.
It's not ready for inclusion yet, of course.

I haven't done much benchmarking. The one test I was looking at didn't
show much of a change.

==============================================
Introduce a new "fast_gup" (for want of a better name right now) which
is basically a get_user_pages with a less general API that is more suited
to the common case.

- task and mm are always current and current->mm
- force is always 0
- pages is always non-NULL
- don't pass back vmas

This allows (at least on x86) an optimistic lockless pagetable walk,
without taking any page table locks or even mmap_sem. Page table existence
is guaranteed by turning interrupts off (combined with the fact that we're
always looking up the current mm, which would need an IPI before its
pagetables could be shot down from another CPU).

Many other architectures could do the same thing. Those that don't IPI
could potentially RCU free the page tables and do speculative references
on the pages (a la lockless pagecache) to achieve a lockless fast_gup.
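
As a sketch of what this means at a typical call site (illustrative
variable names; the real conversions are in the hunks below), the
usual pattern

	down_read(&current->mm->mmap_sem);
	ret = get_user_pages(current, current->mm, addr, nr_pages,
			     write, 0, pages, NULL);
	up_read(&current->mm->mmap_sem);

collapses to

	ret = fast_gup(addr, nr_pages, write, pages);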

Originally by Nick Piggin <[email protected]>
---
 arch/x86/lib/Makefile_64     |    2
 arch/x86/lib/gup_64.c        |  188 +++++++++++++++++++++++++++++++++++++++++++
 fs/bio.c                     |    8 -
 fs/block_dev.c               |    5 -
 fs/direct-io.c               |   10 --
 fs/splice.c                  |   38 --------
 include/asm-x86/uaccess_64.h |    4
 include/linux/mm.h           |   26 +++++
 include/linux/vmstat.h       |    1
 mm/vmstat.c                  |    3
 10 files changed, 231 insertions(+), 54 deletions(-)

diff -Nurp linux-2.6.24-rc4/arch/x86/lib/Makefile_64 linux/arch/x86/lib/Makefile_64
--- linux-2.6.24-rc4/arch/x86/lib/Makefile_64 2007-12-04 08:44:34.000000000 -0600
+++ linux/arch/x86/lib/Makefile_64 2007-12-10 15:01:17.000000000 -0600
@@ -10,4 +10,4 @@ obj-$(CONFIG_SMP) += msr-on-cpu.o
lib-y := csum-partial_64.o csum-copy_64.o csum-wrappers_64.o delay_64.o \
usercopy_64.o getuser_64.o putuser_64.o \
thunk_64.o clear_page_64.o copy_page_64.o bitstr_64.o bitops_64.o
-lib-y += memcpy_64.o memmove_64.o memset_64.o copy_user_64.o rwlock_64.o copy_user_nocache_64.o
+lib-y += memcpy_64.o memmove_64.o memset_64.o copy_user_64.o rwlock_64.o copy_user_nocache_64.o gup_64.o
diff -Nurp linux-2.6.24-rc4/arch/x86/lib/gup_64.c linux/arch/x86/lib/gup_64.c
--- linux-2.6.24-rc4/arch/x86/lib/gup_64.c 1969-12-31 18:00:00.000000000 -0600
+++ linux/arch/x86/lib/gup_64.c 2007-12-10 15:01:17.000000000 -0600
@@ -0,0 +1,188 @@
+/*
+ * Lockless fast_gup for x86
+ *
+ * Copyright (C) 2007 Nick Piggin
+ * Copyright (C) 2007 Novell Inc.
+ */
+#include <linux/sched.h>
+#include <linux/mm.h>
+#include <linux/vmstat.h>
+#include <asm/pgtable.h>
+
+static int gup_pte_range(pmd_t pmd, unsigned long addr, unsigned long end,
+ int write, struct page **pages, int *nr)
+{
+ pte_t *ptep;
+
+ /* XXX: this won't work for 32-bit (must map pte) */
+ ptep = (pte_t *)pmd_page_vaddr(pmd) + pte_index(addr);
+ do {
+ pte_t pte = *ptep;
+ unsigned long pfn;
+ struct page *page;
+
+ if ((pte_val(pte) & (_PAGE_PRESENT|_PAGE_USER)) !=
+ (_PAGE_PRESENT|_PAGE_USER))
+ return 0;
+
+ if (write && !pte_write(pte))
+ return 0;
+
+ /* XXX: really need new bit in pte to denote normal page */
+ pfn = pte_pfn(pte);
+ if (unlikely(!pfn_valid(pfn)))
+ return 0;
+
+ page = pfn_to_page(pfn);
+ get_page(page);
+ pages[*nr] = page;
+ (*nr)++;
+
+ } while (ptep++, addr += PAGE_SIZE, addr != end);
+ pte_unmap(ptep - 1);
+
+ return 1;
+}
+
+static inline void get_head_page_multiple(struct page *page, int nr)
+{
+ VM_BUG_ON(page != compound_head(page));
+ VM_BUG_ON(page_count(page) == 0);
+ atomic_add(nr, &page->_count);
+}
+
+static int gup_huge_pmd(pmd_t pmd, unsigned long addr, unsigned long end,
+ int write, struct page **pages, int *nr)
+{
+ pte_t pte = *(pte_t *)&pmd;
+ struct page *head, *page;
+ int refs;
+
+ if ((pte_val(pte) & _PAGE_USER) != _PAGE_USER)
+ return 0;
+
+ BUG_ON(!pfn_valid(pte_pfn(pte)));
+
+ if (write && !pte_write(pte))
+ return 0;
+
+ refs = 0;
+ head = pte_page(pte);
+ page = head + ((addr & ~HPAGE_MASK) >> PAGE_SHIFT);
+ do {
+ pages[*nr] = page;
+ (*nr)++;
+ page++;
+ refs++;
+ } while (addr += PAGE_SIZE, addr != end);
+
+ get_head_page_multiple(head, refs);
+
+ return 1;
+}
+
+static int gup_pmd_range(pud_t pud, unsigned long addr, unsigned long end,
+ int write, struct page **pages, int *nr)
+{
+ static int count = 50;
+ unsigned long next;
+ pmd_t *pmdp;
+
+ pmdp = (pmd_t *)pud_page_vaddr(pud) + pmd_index(addr);
+ do {
+ pmd_t pmd = *pmdp;
+
+ next = pmd_addr_end(addr, end);
+ if (pmd_none(pmd))
+ return 0;
+ if (unlikely(pmd_large(pmd))) {
+ if (!gup_huge_pmd(pmd, addr, next, write, pages, nr)) {
+ if (count) {
+ printk(KERN_ERR
+ "pmd = 0x%lx, addr = 0x%lx\n",
+ pmd.pmd, addr);
+ count--;
+ }
+ return 0;
+ }
+ } else {
+ if (!gup_pte_range(pmd, addr, next, write, pages, nr))
+ return 0;
+ }
+ } while (pmdp++, addr = next, addr != end);
+
+ return 1;
+}
+
+static int gup_pud_range(pgd_t pgd, unsigned long addr, unsigned long end,
+ int write, struct page **pages, int *nr)
+{
+ unsigned long next;
+ pud_t *pudp;
+
+ pudp = (pud_t *)pgd_page_vaddr(pgd) + pud_index(addr);
+ do {
+ pud_t pud = *pudp;
+
+ next = pud_addr_end(addr, end);
+ if (pud_none(pud))
+ return 0;
+ if (!gup_pmd_range(pud, addr, next, write, pages, nr))
+ return 0;
+ } while (pudp++, addr = next, addr != end);
+
+ return 1;
+}
+
+int fast_gup(unsigned long start, int nr_pages, int write, struct page **pages)
+{
+ struct mm_struct *mm = current->mm;
+ unsigned long end = start + (nr_pages << PAGE_SHIFT);
+ unsigned long addr = start;
+ unsigned long next;
+ pgd_t *pgdp;
+ int nr = 0;
+
+ /* XXX: batch / limit 'nr', to avoid huge latency */
+ /*
+ * This doesn't prevent pagetable teardown, but does prevent
+ * the pagetables from being freed on x86-64.
+ *
+ * So long as we atomically load page table pointers versus teardown
+ * (which we do on x86-64), we can follow the address down to
+ * the page.
+ */
+ local_irq_disable();
+ __count_vm_event(FAST_GUP);
+ pgdp = pgd_offset(mm, addr);
+ do {
+ pgd_t pgd = *pgdp;
+
+ next = pgd_addr_end(addr, end);
+ if (pgd_none(pgd))
+ goto slow;
+ if (!gup_pud_range(pgd, addr, next, write, pages, &nr))
+ goto slow;
+ } while (pgdp++, addr = next, addr != end);
+ local_irq_enable();
+
+ BUG_ON(nr != (end - start) >> PAGE_SHIFT);
+ return nr;
+
+slow:
+ {
+ int i, ret;
+
+ __count_vm_event(FAST_GUP_SLOW);
+ local_irq_enable();
+ for (i = 0; i < nr; i++)
+ put_page(pages[i]);
+
+ down_read(&mm->mmap_sem);
+ ret = get_user_pages(current, mm, start,
+ (end - start) >> PAGE_SHIFT, write, 0, pages, NULL);
+ up_read(&mm->mmap_sem);
+
+ return ret;
+ }
+}
diff -Nurp linux-2.6.24-rc4/fs/bio.c linux/fs/bio.c
--- linux-2.6.24-rc4/fs/bio.c 2007-12-04 08:44:49.000000000 -0600
+++ linux/fs/bio.c 2007-12-10 15:01:17.000000000 -0600
@@ -636,13 +636,9 @@ static struct bio *__bio_map_user_iov(st
unsigned long start = uaddr >> PAGE_SHIFT;
const int local_nr_pages = end - start;
const int page_limit = cur_page + local_nr_pages;
-
- down_read(&current->mm->mmap_sem);
- ret = get_user_pages(current, current->mm, uaddr,
- local_nr_pages,
- write_to_vm, 0, &pages[cur_page], NULL);
- up_read(&current->mm->mmap_sem);

+ ret = fast_gup(uaddr, local_nr_pages, write_to_vm,
+ &pages[cur_page]);
if (ret < local_nr_pages) {
ret = -EFAULT;
goto out_unmap;
diff -Nurp linux-2.6.24-rc4/fs/block_dev.c linux/fs/block_dev.c
--- linux-2.6.24-rc4/fs/block_dev.c 2007-12-04 08:44:49.000000000 -0600
+++ linux/fs/block_dev.c 2007-12-10 15:01:17.000000000 -0600
@@ -221,10 +221,7 @@ static struct page *blk_get_page(unsigne
if (pvec->idx == pvec->nr) {
nr_pages = PAGES_SPANNED(addr, count);
nr_pages = min(nr_pages, VEC_SIZE);
- down_read(&current->mm->mmap_sem);
- ret = get_user_pages(current, current->mm, addr, nr_pages,
- rw == READ, 0, pvec->page, NULL);
- up_read(&current->mm->mmap_sem);
+ ret = fast_gup(addr, nr_pages, rw == READ, pvec->page);
if (ret < 0)
return ERR_PTR(ret);
pvec->nr = ret;
diff -Nurp linux-2.6.24-rc4/fs/direct-io.c linux/fs/direct-io.c
--- linux-2.6.24-rc4/fs/direct-io.c 2007-12-04 08:44:49.000000000 -0600
+++ linux/fs/direct-io.c 2007-12-10 15:01:17.000000000 -0600
@@ -150,17 +150,11 @@ static int dio_refill_pages(struct dio *
int nr_pages;

nr_pages = min(dio->total_pages - dio->curr_page, DIO_PAGES);
- down_read(&current->mm->mmap_sem);
- ret = get_user_pages(
- current, /* Task for fault acounting */
- current->mm, /* whose pages? */
+ ret = fast_gup(
dio->curr_user_address, /* Where from? */
nr_pages, /* How many pages? */
dio->rw == READ, /* Write to memory? */
- 0, /* force (?) */
- &dio->pages[0],
- NULL); /* vmas */
- up_read(&current->mm->mmap_sem);
+ &dio->pages[0]); /* Put results here */

if (ret < 0 && dio->blocks_available && (dio->rw & WRITE)) {
struct page *page = ZERO_PAGE(0);
diff -Nurp linux-2.6.24-rc4/fs/splice.c linux/fs/splice.c
--- linux-2.6.24-rc4/fs/splice.c 2007-12-04 08:44:50.000000000 -0600
+++ linux/fs/splice.c 2007-12-10 15:01:17.000000000 -0600
@@ -1174,33 +1174,6 @@ static long do_splice(struct file *in, l
}

/*
- * Do a copy-from-user while holding the mmap_semaphore for reading, in a
- * manner safe from deadlocking with simultaneous mmap() (grabbing mmap_sem
- * for writing) and page faulting on the user memory pointed to by src.
- * This assumes that we will very rarely hit the partial != 0 path, or this
- * will not be a win.
- */
-static int copy_from_user_mmap_sem(void *dst, const void __user *src, size_t n)
-{
- int partial;
-
- pagefault_disable();
- partial = __copy_from_user_inatomic(dst, src, n);
- pagefault_enable();
-
- /*
- * Didn't copy everything, drop the mmap_sem and do a faulting copy
- */
- if (unlikely(partial)) {
- up_read(&current->mm->mmap_sem);
- partial = copy_from_user(dst, src, n);
- down_read(&current->mm->mmap_sem);
- }
-
- return partial;
-}
-
-/*
* Map an iov into an array of pages and offset/length tupples. With the
* partial_page structure, we can map several non-contiguous ranges into
* our ones pages[] map instead of splitting that operation into pieces.
@@ -1213,8 +1186,6 @@ static int get_iovec_page_array(const st
{
int buffers = 0, error = 0;

- down_read(&current->mm->mmap_sem);
-
while (nr_vecs) {
unsigned long off, npages;
struct iovec entry;
@@ -1223,7 +1194,7 @@ static int get_iovec_page_array(const st
int i;

error = -EFAULT;
- if (copy_from_user_mmap_sem(&entry, iov, sizeof(entry)))
+ if (copy_from_user(&entry, iov, sizeof(entry)))
break;

base = entry.iov_base;
@@ -1257,9 +1228,8 @@ static int get_iovec_page_array(const st
if (npages > PIPE_BUFFERS - buffers)
npages = PIPE_BUFFERS - buffers;

- error = get_user_pages(current, current->mm,
- (unsigned long) base, npages, 0, 0,
- &pages[buffers], NULL);
+ error = fast_gup((unsigned long)base, npages, 0,
+ &pages[buffers]);

if (unlikely(error <= 0))
break;
@@ -1298,8 +1268,6 @@ static int get_iovec_page_array(const st
iov++;
}

- up_read(&current->mm->mmap_sem);
-
if (buffers)
return buffers;

diff -Nurp linux-2.6.24-rc4/include/asm-x86/uaccess_64.h linux/include/asm-x86/uaccess_64.h
--- linux-2.6.24-rc4/include/asm-x86/uaccess_64.h 2007-12-04 08:44:54.000000000 -0600
+++ linux/include/asm-x86/uaccess_64.h 2007-12-10 15:01:17.000000000 -0600
@@ -381,4 +381,8 @@ static inline int __copy_from_user_inato
return __copy_user_nocache(dst, src, size, 0);
}

+#define __HAVE_ARCH_FAST_GUP
+struct page;
+int fast_gup(unsigned long start, int nr_pages, int write, struct page **pages);
+
#endif /* __X86_64_UACCESS_H */
diff -Nurp linux-2.6.24-rc4/include/linux/mm.h linux/include/linux/mm.h
--- linux-2.6.24-rc4/include/linux/mm.h 2007-12-04 08:44:56.000000000 -0600
+++ linux/include/linux/mm.h 2007-12-10 15:01:17.000000000 -0600
@@ -12,6 +12,7 @@
#include <linux/prio_tree.h>
#include <linux/debug_locks.h>
#include <linux/mm_types.h>
+#include <linux/uaccess.h> /* for __HAVE_ARCH_FAST_GUP */

struct mempolicy;
struct anon_vma;
@@ -750,6 +751,31 @@ extern int mprotect_fixup(struct vm_area
unsigned long end, unsigned long newflags);

/*
+ * Architecture may implement efficient get_user_pages to avoid having to
+ * take the mmap sem
+ */
+#ifndef __HAVE_ARCH_FAST_GUP
+static inline int __fast_gup(struct mm_struct *mm, unsigned long start,
+ int nr_pages, int write, struct page **pages)
+{
+ int ret;
+
+ down_read(&mm->mmap_sem);
+ ret = get_user_pages(current, mm, start, nr_pages, write,
+ 0, pages, NULL);
+ up_read(&mm->mmap_sem);
+
+ return ret;
+}
+/*
+ * This macro avoids having to include sched.h in this header to
+ * dereference current->mm.
+ */
+#define fast_gup(start, nr_pages, write, pages) \
+ __fast_gup(current->mm, start, nr_pages, write, pages)
+#endif
+
+/*
* A callback you can register to apply pressure to ageable caches.
*
* 'shrink' is passed a count 'nr_to_scan' and a 'gfpmask'. It should
diff -Nurp linux-2.6.24-rc4/include/linux/vmstat.h linux/include/linux/vmstat.h
--- linux-2.6.24-rc4/include/linux/vmstat.h 2007-10-09 15:31:38.000000000 -0500
+++ linux/include/linux/vmstat.h 2007-12-10 15:01:17.000000000 -0600
@@ -37,6 +37,7 @@ enum vm_event_item { PGPGIN, PGPGOUT, PS
FOR_ALL_ZONES(PGSCAN_DIRECT),
PGINODESTEAL, SLABS_SCANNED, KSWAPD_STEAL, KSWAPD_INODESTEAL,
PAGEOUTRUN, ALLOCSTALL, PGROTATED,
+ FAST_GUP, FAST_GUP_SLOW,
NR_VM_EVENT_ITEMS
};

diff -Nurp linux-2.6.24-rc4/mm/vmstat.c linux/mm/vmstat.c
--- linux-2.6.24-rc4/mm/vmstat.c 2007-12-04 08:45:01.000000000 -0600
+++ linux/mm/vmstat.c 2007-12-10 15:01:17.000000000 -0600
@@ -642,6 +642,9 @@ static const char * const vmstat_text[]
"allocstall",

"pgrotated",
+ "fast_gup",
+ "fast_gup_slow",
+
#endif
};


--
David Kleikamp
IBM Linux Technology Center


2007-12-12 04:57:42

by Nick Piggin

Subject: Re: [rfc] lockless get_user_pages for dio (and more)

On Tuesday 11 December 2007 08:30, Dave Kleikamp wrote:
> On Mon, 2007-10-15 at 22:25 +1000, Nick Piggin wrote:
> > On Monday 15 October 2007 04:19, Siddha, Suresh B wrote:
> > > On Sun, Oct 14, 2007 at 11:01:02AM +1000, Nick Piggin wrote:
> > > > This is just a really quick hack, untested ATM, but one that
> > > > has at least a chance of working (on x86).
> > >
> > > When we fall back to slow mode, we should decrement the ref counts
> > > on the pages we got so far in the fast mode.
> >
> > Here is something that is actually tested and works (not
> > tested with hugepages yet, though).
> >
> > However it's not 100% secure at the moment. It's actually
> > not completely trivial; I think we need to use an extra bit
> > in the present pte in order to exclude "not normal" pages,
> > if we want fast_gup to work on small page mappings too. I
> > think this would be possible to do on most architectures, but
> > I haven't done it here obviously.
> >
> > Still, it should be enough to test the design. I've added
> > fast_gup and fast_gup_slow to /proc/vmstat, which count the
> > number of times fast_gup was called, and the number of times
> > it dropped into the slowpath. It would be interesting to know
> > how it performs compared to your granular hugepage ptl...
>
> Nick,
> I've played with the fast_gup patch a bit. I was able to find a problem
> in follow_hugetlb_page() that Adam Litke fixed. I haven't been brave
> enough to implement it on any other architectures, but I did add a
> default that takes mmap_sem and calls the normal get_user_pages() if the
> architecture doesn't define fast_gup(). I put it in linux/mm.h, for
> lack of a better place, but it's a little kludgy since I didn't want
> mm.h to have to include sched.h. This patch is against 2.6.24-rc4.
> It's not ready for inclusion yet, of course.

Hi Dave,

Thanks so much. This makes it a much more complete patch (although
still missing the "normal page" detection).

I think I missed -- or forgot -- what was the follow_hugetlb_page
problem?

Anyway, I am hoping that someone will one day test this and
find it helps their workload, but on the other hand, if it doesn't
help anyone then we don't have to worry about adding it to the
kernel ;) I don't have any real setup that hammers DIO with threads.
I'm guessing DB2 and/or Oracle does?

Thanks,
Nick

> [remainder of quoted message snipped -- it repeats, verbatim, the
> full patch posted above]

2007-12-12 05:12:01

by Dave Kleikamp

Subject: Re: [rfc] lockless get_user_pages for dio (and more)


On Wed, 2007-12-12 at 15:57 +1100, Nick Piggin wrote:
> On Tuesday 11 December 2007 08:30, Dave Kleikamp wrote:

> > Nick,
> > I've played with the fast_gup patch a bit. I was able to find a problem
> > in follow_hugetlb_page() that Adam Litke fixed. I haven't been brave
> > enough to implement it on any other architectures, but I did add a
> > default that takes mmap_sem and calls the normal get_user_pages() if the
> > architecture doesn't define fast_gup(). I put it in linux/mm.h, for
> > lack of a better place, but it's a little kludgy since I didn't want
> > mm.h to have to include sched.h. This patch is against 2.6.24-rc4.
> > It's not ready for inclusion yet, of course.
>
> Hi Dave,
>
> Thanks so much. This makes it a much more complete patch (although
> still missing the "normal page" detection).
>
> I think I missed -- or forgot -- what was the follow_hugetlb_page
> problem?

Badari found a problem running some tests and handed it off to me to
look at. I didn't share it publicly. Anyway, we were finding that
fast_gup was taking the slow path almost all the time with huge pages.
The problem was that follow_hugetlb_page was failing to fault on a
non-writable page when it needed a writable one. So we'd keep seeing a
non-writable page over and over. This is fixed in 2.6.24-rc5.
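
For reference, the heart of that fix (paraphrased from memory, not the
exact committed hunk) is to make follow_hugetlb_page() fault not only
when nothing is mapped, but also when write access is needed and the
existing pte is read-only:

	pte = huge_pte_offset(mm, vaddr);

	/* fault if nothing is mapped, or if we need a writable
	 * mapping but only a read-only one is present */
	if (!pte || pte_none(*pte) || (write && !pte_write(*pte))) {
		int ret;

		ret = hugetlb_fault(mm, vma, vaddr, write);
		...
	}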

> > Anyway, I am hoping that someone will one day test this and
> > find it helps their workload, but on the other hand, if it doesn't
> > help anyone then we don't have to worry about adding it to the
> > kernel ;) I don't have any real setup that hammers DIO with threads.
> I'm guessing DB2 and/or Oracle does?

I'll try to get someone to run a DB2 benchmark and see what it looks
like.
--
David Kleikamp
IBM Linux Technology Center

2007-12-12 05:40:35

by Nick Piggin

Subject: Re: [rfc] lockless get_user_pages for dio (and more)

On Wednesday 12 December 2007 16:11, Dave Kleikamp wrote:
> On Wed, 2007-12-12 at 15:57 +1100, Nick Piggin wrote:
> > On Tuesday 11 December 2007 08:30, Dave Kleikamp wrote:
> > > Nick,
> > > I've played with the fast_gup patch a bit. I was able to find a
> > > problem in follow_hugetlb_page() that Adam Litke fixed. I haven't
> > > been brave enough to implement it on any other architectures, but I did
> > > add a default that takes mmap_sem and calls the normal
> > > get_user_pages() if the architecture doesn't define fast_gup(). I put
> > > it in linux/mm.h, for lack of a better place, but it's a little kludgy
> > > since I didn't want mm.h to have to include sched.h. This patch is
> > > against 2.6.24-rc4. It's not ready for inclusion yet, of course.
> >
> > Hi Dave,
> >
> > Thanks so much. This makes it a much more complete patch (although
> > still missing the "normal page" detection).
> >
> > I think I missed -- or forgot -- what was the follow_hugetlb_page
> > problem?
>
> Badari found a problem running some tests and handed it off to me to
> look at. I didn't share it publicly. Anyway, we were finding that
> fast_gup was taking the slow path almost all the time with huge pages.
> The problem was that follow_hugetlb_page was failing to fault on a
> non-writable page when it needed a writable one. So we'd keep seeing a
> non-writable page over and over. This is fixed in 2.6.24-rc5.

Ah yes, I just saw that fix in the changelog. So not a problem with my
patch as such, but good to get that fixed.


> > Anyway, I am hoping that someone will one day test this and
> > find it helps their workload, but on the other hand, if it doesn't
> > help anyone then we don't have to worry about adding it to the
> > kernel ;) I don't have any real setup that hammers DIO with threads.
> > I'm guessing DB2 and/or Oracle does?
>
> I'll try to get someone to run a DB2 benchmark and see what it looks
> like.

That would be great if you could.

Thanks,
Nick

2008-01-16 19:58:18

by Dave Kleikamp

Subject: Re: [rfc] lockless get_user_pages for dio (and more)


On Wed, 2007-12-12 at 16:40 +1100, Nick Piggin wrote:
> On Wednesday 12 December 2007 16:11, Dave Kleikamp wrote:
> > On Wed, 2007-12-12 at 15:57 +1100, Nick Piggin wrote:

> > > Anyway, I am hoping that someone will one day test this and
> > > find it helps their workload, but on the other hand, if it doesn't
> > > help anyone then we don't have to worry about adding it to the
> > > kernel ;) I don't have any real setup that hammers DIO with threads.
> > > I'm guessing DB2 and/or Oracle does?
> >
> > I'll try to get someone to run a DB2 benchmark and see what it looks
> > like.
>
> That would be great if you could.

We weren't able to get in any runs before the holidays, but we finally
have some good news from our performance team:

"To test the effects of the patch, an OLTP workload was run on an IBM
x3850 M2 server with 2 processors (quad-core Intel Xeon processors at
2.93 GHz) using IBM DB2 v9.5 running a Linux 2.6.24-rc7 kernel. Comparing
runs with and without the patch resulted in an overall performance
benefit of ~9.8%. Correspondingly, oprofile showed that samples from the
__up_read and __down_read routines seen during thread contention
for system resources were reduced from 2.8% down to 0.05%. Monitoring
the /proc/vmstat output from the patched run showed that the counter for
fast_gup contained a very high number while the fast_gup_slow value was
zero."

Great work, Nick!

Thanks,
Shaggy
--
David Kleikamp
IBM Linux Technology Center

2008-01-17 06:34:58

by Nick Piggin

Subject: Re: [rfc] lockless get_user_pages for dio (and more)

On Thursday 17 January 2008 06:58, Dave Kleikamp wrote:
> On Wed, 2007-12-12 at 16:40 +1100, Nick Piggin wrote:
> > On Wednesday 12 December 2007 16:11, Dave Kleikamp wrote:
> > > On Wed, 2007-12-12 at 15:57 +1100, Nick Piggin wrote:
> > > > Anyway, I am hoping that someone will one day test this and
> > > > find it helps their workload, but on the other hand, if it doesn't
> > > > help anyone then we don't have to worry about adding it to the
> > > > kernel ;) I don't have any real setup that hammers DIO with threads.
> > > > I'm guessing DB2 and/or Oracle does?
> > >
> > > I'll try to get someone to run a DB2 benchmark and see what it looks
> > > like.
> >
> > That would be great if you could.
>
> We weren't able to get in any runs before the holidays, but we finally
> have some good news from our performance team:
>
> "To test the effects of the patch, an OLTP workload was run on an IBM
> x3850 M2 server with 2 processors (quad-core Intel Xeon processors at
> 2.93 GHz) using IBM DB2 v9.5 running a Linux 2.6.24-rc7 kernel. Comparing
> runs with and without the patch resulted in an overall performance
> benefit of ~9.8%. Correspondingly, oprofile showed that samples from the
> __up_read and __down_read routines seen during thread contention
> for system resources were reduced from 2.8% down to 0.05%. Monitoring
> the /proc/vmstat output from the patched run showed that the counter for
> fast_gup contained a very high number while the fast_gup_slow value was
> zero."
>
> Great work, Nick!

Ah, excellent. Thanks for getting those numbers Dave. This will
be a great help towards getting the patch merged.

I'm just working on the final required piece for this thing (the
pte_special pte bit, required to distinguish whether or not we
can refcount a page without looking at the vma). It is strictly
just a correctness/security measure, which is why you were able
to run tests without it. And it won't add any significant cost to
the fastpaths, so the numbers remain valid.
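
To give an idea of how that slots in (a sketch of the intended use,
not the final patch): with a pte bit reserved for "special" mappings,
gup_pte_range() can cheaply reject anything it must not take a page
reference on and punt to the slow path, along the lines of:

	pte_t pte = *ptep;

	/* only present, user pages are candidates at all */
	if ((pte_val(pte) & (_PAGE_PRESENT|_PAGE_USER)) !=
			(_PAGE_PRESENT|_PAGE_USER))
		return 0;

	/* "special" means there is no struct page we may refcount
	 * (raw pfn mappings etc.) -- fall back to the slow path */
	if (pte_special(pte))
		return 0;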

FWIW, I cc'ed linux-arch: the lockless get_user_pages patch has
architecture-specific elements, so it will need some attention
there. If other architectures are interested (eg. powerpc or
ia64), then I will be happy to work with maintainers to help
try to devise a way of fitting it into their tlb flushing scheme.
Ping me if you'd like to take up the offer.

Thanks,
Nick

2008-01-24 07:07:18

by Nick Piggin

Subject: Re: [rfc] lockless get_user_pages for dio (and more)

On Thursday 17 January 2008 06:58, Dave Kleikamp wrote:

> We weren't able to get in any runs before the holidays, but we finally
> have some good news from our performance team:
>
> "To test the effects of the patch, an OLTP workload was run on an IBM
> x3850 M2 server with 2 processors (quad-core Intel Xeon processors at
> 2.93 GHz) using IBM DB2 v9.5 running a Linux 2.6.24-rc7 kernel. Comparing
> runs with and without the patch resulted in an overall performance
> benefit of ~9.8%. Correspondingly, oprofile showed that samples from the
> __up_read and __down_read routines seen during thread contention
> for system resources were reduced from 2.8% down to 0.05%. Monitoring
> the /proc/vmstat output from the patched run showed that the counter for
> fast_gup contained a very high number while the fast_gup_slow value was
> zero."

Just for reference, I've attached a more complete patch for x86,
which has to be applied on top of the pte_special patch posted in
another thread.

No need to test anything at this point... the generated code for
this version is actually slightly better than the last one despite
the extra condition being tested for. With a few tweak I was
actually able to reduce the number of tests in the inner loop, and
adding noinline to the leaf functions helps keep them in registers.

I'm currently having a look at an initial powerpc 64 patch,
hopefully we'll see similar improvements there. Will post that when
I get further along with it.

Thanks,
Nick


Attachments:
(No filename) (1.46 kB)
mm-get_user_pages-fast.patch (12.20 kB)