2012-02-16 12:04:54

by Daniel Vetter

[permalink] [raw]
Subject: [PATCH] extend prefault helpers to fault in more than PAGE_SIZE

Hi all,

drm/i915 has write/read paths to upload/download data to/from gpu buffer
objects. For a bunch of reasons we have special fastpaths with decent setup
costs, so when we fall back to the slow-path we don't fully recover to the
fastest fast-path when grabbing our locks again. This is also in parts due to
that we have multiple fallbacks to slower paths, so control-flow in our code
would get really ugly.

As part of a larger rewrite and re-tuning of these functions we've noticed that
the prefault helpers in pagemap.h only prefault up to PAGE_SIZE because that's
all the other users need. The follow-up patch extends this to abritary sizes so
that we can fully exploit our special fastpaths in more cases (we typically see
reads and writes of a few pages up to a few mb).

I'd like to get this in for 3.4. There's no functional dependency between this
patch and the drm/i915 rework (only performance effects), so this can go in
through -mm without causing merge issues.

If this or something similar isn't acceptable, plan B is to hand-roll these 2
functions in drm/i915. But I don't like nih these things in driver code much.

Comments highly welcome.

Yours, Daniel

For reference, the drm/i915 read/write rework is avaialable at:

http://cgit.freedesktop.org/~danvet/drm/log/?h=pwrite-pread

Unfortunately cgit.fd.o is currently on hiatus.

Daniel Vetter (1):
mm: extend prefault helpers to fault in more than PAGE_SIZE

include/linux/pagemap.h | 28 ++++++++++++++++++----------
1 files changed, 18 insertions(+), 10 deletions(-)

--
1.7.7.5


2012-02-16 12:04:56

by Daniel Vetter

[permalink] [raw]
Subject: [PATCH] mm: extend prefault helpers to fault in more than PAGE_SIZE

drm/i915 wants to read/write more than one page in its fastpath
and hence needs to prefault more than PAGE_SIZE bytes.

I've checked the callsites and they all already clamp size when
calling fault_in_pages_* to the same as for the subsequent
__copy_to|from_user and hence don't rely on the implicit clamping
to PAGE_SIZE.

Also kill a copy&pasted spurious space in both functions while at it.

Cc: [email protected]
Signed-off-by: Daniel Vetter <[email protected]>
---
include/linux/pagemap.h | 28 ++++++++++++++++++----------
1 files changed, 18 insertions(+), 10 deletions(-)

diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index cfaaa69..689527d 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -408,6 +408,7 @@ extern void add_page_wait_queue(struct page *page, wait_queue_t *waiter);
static inline int fault_in_pages_writeable(char __user *uaddr, int size)
{
int ret;
+ char __user *end = uaddr + size - 1;

if (unlikely(size == 0))
return 0;
@@ -416,17 +417,20 @@ static inline int fault_in_pages_writeable(char __user *uaddr, int size)
* Writing zeroes into userspace here is OK, because we know that if
* the zero gets there, we'll be overwriting it.
*/
- ret = __put_user(0, uaddr);
+ while (uaddr <= end) {
+ ret = __put_user(0, uaddr);
+ if (ret != 0)
+ return ret;
+ uaddr += PAGE_SIZE;
+ }
if (ret == 0) {
- char __user *end = uaddr + size - 1;
-
/*
* If the page was already mapped, this will get a cache miss
* for sure, so try to avoid doing it.
*/
- if (((unsigned long)uaddr & PAGE_MASK) !=
+ if (((unsigned long)uaddr & PAGE_MASK) ==
((unsigned long)end & PAGE_MASK))
- ret = __put_user(0, end);
+ ret = __put_user(0, end);
}
return ret;
}
@@ -435,17 +439,21 @@ static inline int fault_in_pages_readable(const char __user *uaddr, int size)
{
volatile char c;
int ret;
+ const char __user *end = uaddr + size - 1;

if (unlikely(size == 0))
return 0;

- ret = __get_user(c, uaddr);
+ while (uaddr <= end) {
+ ret = __get_user(c, uaddr);
+ if (ret != 0)
+ return ret;
+ uaddr += PAGE_SIZE;
+ }
if (ret == 0) {
- const char __user *end = uaddr + size - 1;
-
- if (((unsigned long)uaddr & PAGE_MASK) !=
+ if (((unsigned long)uaddr & PAGE_MASK) ==
((unsigned long)end & PAGE_MASK)) {
- ret = __get_user(c, end);
+ ret = __get_user(c, end);
(void)c;
}
}
--
1.7.7.5

2012-02-16 13:32:11

by Hillf Danton

[permalink] [raw]
Subject: Re: [PATCH] mm: extend prefault helpers to fault in more than PAGE_SIZE

On Thu, Feb 16, 2012 at 8:01 PM, Daniel Vetter <[email protected]> wrote:
> @@ -416,17 +417,20 @@ static inline int fault_in_pages_writeable(char __user *uaddr, int size)
>         * Writing zeroes into userspace here is OK, because we know that if
>         * the zero gets there, we'll be overwriting it.
>         */
> -       ret = __put_user(0, uaddr);
> +       while (uaddr <= end) {
> +               ret = __put_user(0, uaddr);
> +               if (ret != 0)
> +                       return ret;
> +               uaddr += PAGE_SIZE;
> +       }

What if
uaddr & ~PAGE_MASK == PAGE_SIZE -3 &&
end & ~PAGE_MASK == 2

2012-02-16 15:14:17

by Daniel Vetter

[permalink] [raw]
Subject: Re: [PATCH] mm: extend prefault helpers to fault in more than PAGE_SIZE

On Thu, Feb 16, 2012 at 09:32:08PM +0800, Hillf Danton wrote:
> On Thu, Feb 16, 2012 at 8:01 PM, Daniel Vetter <[email protected]> wrote:
> > @@ -416,17 +417,20 @@ static inline int fault_in_pages_writeable(char __user *uaddr, int size)
> > ? ? ? ? * Writing zeroes into userspace here is OK, because we know that if
> > ? ? ? ? * the zero gets there, we'll be overwriting it.
> > ? ? ? ? */
> > - ? ? ? ret = __put_user(0, uaddr);
> > + ? ? ? while (uaddr <= end) {
> > + ? ? ? ? ? ? ? ret = __put_user(0, uaddr);
> > + ? ? ? ? ? ? ? if (ret != 0)
> > + ? ? ? ? ? ? ? ? ? ? ? return ret;
> > + ? ? ? ? ? ? ? uaddr += PAGE_SIZE;
> > + ? ? ? }
>
> What if
> uaddr & ~PAGE_MASK == PAGE_SIZE -3 &&
> end & ~PAGE_MASK == 2

I don't quite follow - can you elaborate upon which issue you're seeing?
-Daniel
--
Daniel Vetter
Mail: [email protected]
Mobile: +41 (0)79 365 57 48

2012-02-17 13:06:19

by Hillf Danton

[permalink] [raw]
Subject: Re: [PATCH] mm: extend prefault helpers to fault in more than PAGE_SIZE

On Thu, Feb 16, 2012 at 11:14 PM, Daniel Vetter <[email protected]> wrote:
> On Thu, Feb 16, 2012 at 09:32:08PM +0800, Hillf Danton wrote:
>> On Thu, Feb 16, 2012 at 8:01 PM, Daniel Vetter <[email protected]> wrote:
>> > @@ -416,17 +417,20 @@ static inline int fault_in_pages_writeable(char __user *uaddr, int size)
>> >         * Writing zeroes into userspace here is OK, because we know that if
>> >         * the zero gets there, we'll be overwriting it.
>> >         */
>> > -       ret = __put_user(0, uaddr);
>> > +       while (uaddr <= end) {
>> > +               ret = __put_user(0, uaddr);
>> > +               if (ret != 0)
>> > +                       return ret;
>> > +               uaddr += PAGE_SIZE;
>> > +       }
>>
>> What if
>>              uaddr & ~PAGE_MASK == PAGE_SIZE -3 &&
>>                 end & ~PAGE_MASK == 2
>
> I don't quite follow - can you elaborate upon which issue you're seeing?

I concerned that __put_user(0, end) is missed, but it was added below.

And looks good to me.
Hillf

>        if (ret == 0) {
> -               char __user *end = uaddr + size - 1;
> -
>                /*
>                 * If the page was already mapped, this will get a cache miss
>                 * for sure, so try to avoid doing it.
>                 */
> -               if (((unsigned long)uaddr & PAGE_MASK) !=
> +               if (((unsigned long)uaddr & PAGE_MASK) ==
>                                ((unsigned long)end & PAGE_MASK))
> -                       ret = __put_user(0, end);
> +                       ret = __put_user(0, end);
>        }
>        return ret;
????{.n?+???????+%?????ݶ??w??{.n?+????{??G?????{ay?ʇڙ?,j??f???h?????????z_??(?階?ݢj"???m??????G????????????&???~???iO???z??v?^?m???? ????????I?

2012-02-23 22:37:01

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH] mm: extend prefault helpers to fault in more than PAGE_SIZE

On Thu, 16 Feb 2012 13:01:36 +0100
Daniel Vetter <[email protected]> wrote:

> drm/i915 wants to read/write more than one page in its fastpath
> and hence needs to prefault more than PAGE_SIZE bytes.
>
> I've checked the callsites and they all already clamp size when
> calling fault_in_pages_* to the same as for the subsequent
> __copy_to|from_user and hence don't rely on the implicit clamping
> to PAGE_SIZE.
>
> Also kill a copy&pasted spurious space in both functions while at it.
>
> ...
>
> --- a/include/linux/pagemap.h
> +++ b/include/linux/pagemap.h
> @@ -408,6 +408,7 @@ extern void add_page_wait_queue(struct page *page, wait_queue_t *waiter);
> static inline int fault_in_pages_writeable(char __user *uaddr, int size)
> {
> int ret;
> + char __user *end = uaddr + size - 1;
>
> if (unlikely(size == 0))
> return 0;
> @@ -416,17 +417,20 @@ static inline int fault_in_pages_writeable(char __user *uaddr, int size)
> * Writing zeroes into userspace here is OK, because we know that if
> * the zero gets there, we'll be overwriting it.
> */
> - ret = __put_user(0, uaddr);
> + while (uaddr <= end) {
> + ret = __put_user(0, uaddr);
> + if (ret != 0)
> + return ret;
> + uaddr += PAGE_SIZE;
> + }

The callsites in filemap.c are pretty hot paths, which is why this
thing remains explicitly inlined. I think it would be worth adding a
bit of code here to avoid adding a pointless test-n-branch and larger
cache footprint to read() and write().

A way of doing that is to add another argument to these functions, say
"bool multipage". Change the code to do

if (multipage) {
while (uaddr <= end) {
...
}
}

and change the callsites to pass in constant "true" or "false". Then
compile it up and manually check that the compiler completely removed
the offending code from the filemap.c callsites.

Wanna have a think about that? If it all looks OK then please be sure
to add code comments explaining why we did this.

> if (ret == 0) {
> - char __user *end = uaddr + size - 1;
> -
> /*
> * If the page was already mapped, this will get a cache miss
> * for sure, so try to avoid doing it.
> */
> - if (((unsigned long)uaddr & PAGE_MASK) !=
> + if (((unsigned long)uaddr & PAGE_MASK) ==
> ((unsigned long)end & PAGE_MASK))

Maybe I'm having a dim day, but I don't immediately see why != got
turned into ==.


Once we have this settled I'd suggest that the patch be carried in
whatever-git-tree-needs-it.

2012-02-24 13:34:19

by Daniel Vetter

[permalink] [raw]
Subject: Re: [PATCH] mm: extend prefault helpers to fault in more than PAGE_SIZE

On Thu, Feb 23, 2012 at 02:36:58PM -0800, Andrew Morton wrote:
> On Thu, 16 Feb 2012 13:01:36 +0100
> Daniel Vetter <[email protected]> wrote:
>
> > drm/i915 wants to read/write more than one page in its fastpath
> > and hence needs to prefault more than PAGE_SIZE bytes.
> >
> > I've checked the callsites and they all already clamp size when
> > calling fault_in_pages_* to the same as for the subsequent
> > __copy_to|from_user and hence don't rely on the implicit clamping
> > to PAGE_SIZE.
> >
> > Also kill a copy&pasted spurious space in both functions while at it.
> >
> > ...
> >
> > --- a/include/linux/pagemap.h
> > +++ b/include/linux/pagemap.h
> > @@ -408,6 +408,7 @@ extern void add_page_wait_queue(struct page *page, wait_queue_t *waiter);
> > static inline int fault_in_pages_writeable(char __user *uaddr, int size)
> > {
> > int ret;
> > + char __user *end = uaddr + size - 1;
> >
> > if (unlikely(size == 0))
> > return 0;
> > @@ -416,17 +417,20 @@ static inline int fault_in_pages_writeable(char __user *uaddr, int size)
> > * Writing zeroes into userspace here is OK, because we know that if
> > * the zero gets there, we'll be overwriting it.
> > */
> > - ret = __put_user(0, uaddr);
> > + while (uaddr <= end) {
> > + ret = __put_user(0, uaddr);
> > + if (ret != 0)
> > + return ret;
> > + uaddr += PAGE_SIZE;
> > + }
>
> The callsites in filemap.c are pretty hot paths, which is why this
> thing remains explicitly inlined. I think it would be worth adding a
> bit of code here to avoid adding a pointless test-n-branch and larger
> cache footprint to read() and write().
>
> A way of doing that is to add another argument to these functions, say
> "bool multipage". Change the code to do
>
> if (multipage) {
> while (uaddr <= end) {
> ...
> }
> }
>
> and change the callsites to pass in constant "true" or "false". Then
> compile it up and manually check that the compiler completely removed
> the offending code from the filemap.c callsites.
>
> Wanna have a think about that? If it all looks OK then please be sure
> to add code comments explaining why we did this.

I wasn't really happy with the added branch either, but failed to come up
with a trick to avoid it. Imho adding new _multipage variants of these
functions instead of adding a constant argument is simpler because the
functions don't really share much thanks to the block below. I'll see what
it looks like (and obviously add a comment explaining what's going on).

> > if (ret == 0) {
> > - char __user *end = uaddr + size - 1;
> > -
> > /*
> > * If the page was already mapped, this will get a cache miss
> > * for sure, so try to avoid doing it.
> > */
> > - if (((unsigned long)uaddr & PAGE_MASK) !=
> > + if (((unsigned long)uaddr & PAGE_MASK) ==
> > ((unsigned long)end & PAGE_MASK))
>
> Maybe I'm having a dim day, but I don't immediately see why != got
> turned into ==.

Because of the loop uaddr will now point one page beyond the last
prefaulted page. To check whether end spilled into a new page we therefore
need to check whether uaddr and end are in the same pfn. Before uaddr
wasn't changed and hence the checking for a different pfn worked
correctly.

> Once we have this settled I'd suggest that the patch be carried in
> whatever-git-tree-needs-it.

Thanks for the comments.

Yours, Daniel
--
Daniel Vetter
Mail: [email protected]
Mobile: +41 (0)79 365 57 48

2012-02-24 20:40:06

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH] mm: extend prefault helpers to fault in more than PAGE_SIZE

On Fri, 24 Feb 2012 14:34:31 +0100
Daniel Vetter <[email protected]> wrote:

> > > --- a/include/linux/pagemap.h
> > > +++ b/include/linux/pagemap.h
> > > @@ -408,6 +408,7 @@ extern void add_page_wait_queue(struct page *page, wait_queue_t *waiter);
> > > static inline int fault_in_pages_writeable(char __user *uaddr, int size)
> > > {
> > > int ret;
> > > + char __user *end = uaddr + size - 1;
> > >
> > > if (unlikely(size == 0))
> > > return 0;
> > > @@ -416,17 +417,20 @@ static inline int fault_in_pages_writeable(char __user *uaddr, int size)
> > > * Writing zeroes into userspace here is OK, because we know that if
> > > * the zero gets there, we'll be overwriting it.
> > > */
> > > - ret = __put_user(0, uaddr);
> > > + while (uaddr <= end) {
> > > + ret = __put_user(0, uaddr);
> > > + if (ret != 0)
> > > + return ret;
> > > + uaddr += PAGE_SIZE;
> > > + }
> >
> > The callsites in filemap.c are pretty hot paths, which is why this
> > thing remains explicitly inlined. I think it would be worth adding a
> > bit of code here to avoid adding a pointless test-n-branch and larger
> > cache footprint to read() and write().
> >
> > A way of doing that is to add another argument to these functions, say
> > "bool multipage". Change the code to do
> >
> > if (multipage) {
> > while (uaddr <= end) {
> > ...
> > }
> > }
> >
> > and change the callsites to pass in constant "true" or "false". Then
> > compile it up and manually check that the compiler completely removed
> > the offending code from the filemap.c callsites.
> >
> > Wanna have a think about that? If it all looks OK then please be sure
> > to add code comments explaining why we did this.
>
> I wasn't really happy with the added branch either, but failed to come up
> with a trick to avoid it. Imho adding new _multipage variants of these
> functions instead of adding a constant argument is simpler because the
> functions don't really share much thanks to the block below. I'll see what
> it looks like (and obviously add a comment explaining what's going on).

well... that's just syntactic sugar:

static inline int __fault_in_pages_writeable(char __user *uaddr, int size, bool multipage)
{
...
}

static inline int fault_in_pages_writeable(char __user *uaddr, int size)
{
return __fault_in_pages_writeable(uaddr, size, false);
}

static inline int fault_in_multipages_writeable(char __user *uaddr, int size)
{
return __fault_in_pages_writeable(uaddr, size, true);
}

which I don't think is worth bothering with given the very small number
of callsites.

2012-02-29 14:07:08

by Daniel Vetter

[permalink] [raw]
Subject: [PATCH] mm: extend prefault helpers to fault in more than PAGE_SIZE

drm/i915 wants to read/write more than one page in its fastpath
and hence needs to prefault more than PAGE_SIZE bytes.

I've checked the callsites and they all already clamp size when
calling fault_in_pages_* to the same as for the subsequent
__copy_to|from_user and hence don't rely on the implicit clamping
to PAGE_SIZE.

Also kill a copy&pasted spurious space in both functions while at it.

v2: As suggested by Andrew Morton, add a multipage parameter to both
functions to avoid the additional branch for the pagemap.c hotpath.
My gcc 4.6 here seems to dtrt and indeed reap these branches where not
needed.

Cc: [email protected]
Signed-off-by: Daniel Vetter <[email protected]>
---
drivers/gpu/drm/i915/i915_gem.c | 4 +-
drivers/gpu/drm/i915/i915_gem_execbuffer.c | 2 +-
fs/pipe.c | 4 +-
fs/splice.c | 2 +-
include/linux/pagemap.h | 39 ++++++++++++++++++---------
mm/filemap.c | 4 +-
6 files changed, 34 insertions(+), 21 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 544e528..9b200f4e 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -436,7 +436,7 @@ i915_gem_shmem_pread(struct drm_device *dev,
mutex_unlock(&dev->struct_mutex);

if (!prefaulted) {
- ret = fault_in_pages_writeable(user_data, remain);
+ ret = fault_in_pages_writeable(user_data, remain, true);
/* Userspace is tricking us, but we've already clobbered
* its pages with the prefault and promised to write the
* data up to the first fault. Hence ignore any errors
@@ -823,7 +823,7 @@ i915_gem_pwrite_ioctl(struct drm_device *dev, void *data,
return -EFAULT;

ret = fault_in_pages_readable((char __user *)(uintptr_t)args->data_ptr,
- args->size);
+ args->size, true);
if (ret)
return -EFAULT;

diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index 81687af..5f0b685 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -955,7 +955,7 @@ validate_exec_list(struct drm_i915_gem_exec_object2 *exec,
if (!access_ok(VERIFY_WRITE, ptr, length))
return -EFAULT;

- if (fault_in_pages_readable(ptr, length))
+ if (fault_in_pages_readable(ptr, length, true))
return -EFAULT;
}

diff --git a/fs/pipe.c b/fs/pipe.c
index a932ced..b29f71c 100644
--- a/fs/pipe.c
+++ b/fs/pipe.c
@@ -167,7 +167,7 @@ static int iov_fault_in_pages_write(struct iovec *iov, unsigned long len)
unsigned long this_len;

this_len = min_t(unsigned long, len, iov->iov_len);
- if (fault_in_pages_writeable(iov->iov_base, this_len))
+ if (fault_in_pages_writeable(iov->iov_base, this_len, false))
break;

len -= this_len;
@@ -189,7 +189,7 @@ static void iov_fault_in_pages_read(struct iovec *iov, unsigned long len)
unsigned long this_len;

this_len = min_t(unsigned long, len, iov->iov_len);
- fault_in_pages_readable(iov->iov_base, this_len);
+ fault_in_pages_readable(iov->iov_base, this_len, false);
len -= this_len;
iov++;
}
diff --git a/fs/splice.c b/fs/splice.c
index 1ec0493..e919d78 100644
--- a/fs/splice.c
+++ b/fs/splice.c
@@ -1491,7 +1491,7 @@ static int pipe_to_user(struct pipe_inode_info *pipe, struct pipe_buffer *buf,
* See if we can use the atomic maps, by prefaulting in the
* pages and doing an atomic copy
*/
- if (!fault_in_pages_writeable(sd->u.userptr, sd->len)) {
+ if (!fault_in_pages_writeable(sd->u.userptr, sd->len, false)) {
src = buf->ops->map(pipe, buf, 1);
ret = __copy_to_user_inatomic(sd->u.userptr, src + buf->offset,
sd->len);
diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index cfaaa69..60ac5c5 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -403,11 +403,14 @@ extern void add_page_wait_queue(struct page *page, wait_queue_t *waiter);
* Fault a userspace page into pagetables. Return non-zero on a fault.
*
* This assumes that two userspace pages are always sufficient. That's
- * not true if PAGE_CACHE_SIZE > PAGE_SIZE.
+ * not true if PAGE_CACHE_SIZE > PAGE_SIZE. If more than PAGE_SIZE needs to be
+ * prefaulted, set multipage to true.
*/
-static inline int fault_in_pages_writeable(char __user *uaddr, int size)
+static inline int fault_in_pages_writeable(char __user *uaddr, int size,
+ bool multipage)
{
int ret;
+ char __user *end = uaddr + size - 1;

if (unlikely(size == 0))
return 0;
@@ -416,36 +419,46 @@ static inline int fault_in_pages_writeable(char __user *uaddr, int size)
* Writing zeroes into userspace here is OK, because we know that if
* the zero gets there, we'll be overwriting it.
*/
- ret = __put_user(0, uaddr);
- if (ret == 0) {
- char __user *end = uaddr + size - 1;
+ do {
+ ret = __put_user(0, uaddr);
+ if (ret != 0)
+ return ret;
+ uaddr += PAGE_SIZE;
+ } while (multipage && uaddr <= end);

+ if (ret == 0) {
/*
* If the page was already mapped, this will get a cache miss
* for sure, so try to avoid doing it.
*/
- if (((unsigned long)uaddr & PAGE_MASK) !=
+ if (((unsigned long)uaddr & PAGE_MASK) ==
((unsigned long)end & PAGE_MASK))
- ret = __put_user(0, end);
+ ret = __put_user(0, end);
}
return ret;
}

-static inline int fault_in_pages_readable(const char __user *uaddr, int size)
+static inline int fault_in_pages_readable(const char __user *uaddr, int size,
+ bool multipage)
{
volatile char c;
int ret;
+ const char __user *end = uaddr + size - 1;

if (unlikely(size == 0))
return 0;

- ret = __get_user(c, uaddr);
- if (ret == 0) {
- const char __user *end = uaddr + size - 1;
+ do {
+ ret = __get_user(c, uaddr);
+ if (ret != 0)
+ return ret;
+ uaddr += PAGE_SIZE;
+ } while (multipage && uaddr <= end);

- if (((unsigned long)uaddr & PAGE_MASK) !=
+ if (ret == 0) {
+ if (((unsigned long)uaddr & PAGE_MASK) ==
((unsigned long)end & PAGE_MASK)) {
- ret = __get_user(c, end);
+ ret = __get_user(c, end);
(void)c;
}
}
diff --git a/mm/filemap.c b/mm/filemap.c
index 97f49ed..af2cad5 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -1317,7 +1317,7 @@ int file_read_actor(read_descriptor_t *desc, struct page *page,
* Faults on the destination of a read are common, so do it before
* taking the kmap.
*/
- if (!fault_in_pages_writeable(desc->arg.buf, size)) {
+ if (!fault_in_pages_writeable(desc->arg.buf, size, false)) {
kaddr = kmap_atomic(page, KM_USER0);
left = __copy_to_user_inatomic(desc->arg.buf,
kaddr + offset, size);
@@ -2138,7 +2138,7 @@ int iov_iter_fault_in_readable(struct iov_iter *i, size_t bytes)
{
char __user *buf = i->iov->iov_base + i->iov_offset;
bytes = min(bytes, i->iov->iov_len - i->iov_offset);
- return fault_in_pages_readable(buf, bytes);
+ return fault_in_pages_readable(buf, bytes, false);
}
EXPORT_SYMBOL(iov_iter_fault_in_readable);

--
1.7.7.5

2012-02-29 23:01:49

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH] mm: extend prefault helpers to fault in more than PAGE_SIZE

On Wed, 29 Feb 2012 15:03:31 +0100
Daniel Vetter <[email protected]> wrote:

> drm/i915 wants to read/write more than one page in its fastpath
> and hence needs to prefault more than PAGE_SIZE bytes.
>
> I've checked the callsites and they all already clamp size when
> calling fault_in_pages_* to the same as for the subsequent
> __copy_to|from_user and hence don't rely on the implicit clamping
> to PAGE_SIZE.
>
> Also kill a copy&pasted spurious space in both functions while at it.
>
> v2: As suggested by Andrew Morton, add a multipage parameter to both
> functions to avoid the additional branch for the pagemap.c hotpath.
> My gcc 4.6 here seems to dtrt and indeed reap these branches where not
> needed.
>

I don't think this produced a very good result :(

>
> ...
>
> -static inline int fault_in_pages_writeable(char __user *uaddr, int size)
> +static inline int fault_in_pages_writeable(char __user *uaddr, int size,
> + bool multipage)
> {
> int ret;
> + char __user *end = uaddr + size - 1;
>
> if (unlikely(size == 0))
> return 0;
> @@ -416,36 +419,46 @@ static inline int fault_in_pages_writeable(char __user *uaddr, int size)
> * Writing zeroes into userspace here is OK, because we know that if
> * the zero gets there, we'll be overwriting it.
> */
> - ret = __put_user(0, uaddr);
> - if (ret == 0) {
> - char __user *end = uaddr + size - 1;
> + do {
> + ret = __put_user(0, uaddr);
> + if (ret != 0)
> + return ret;
> + uaddr += PAGE_SIZE;
> + } while (multipage && uaddr <= end);
>
> + if (ret == 0) {
> /*
> * If the page was already mapped, this will get a cache miss
> * for sure, so try to avoid doing it.
> */
> - if (((unsigned long)uaddr & PAGE_MASK) !=
> + if (((unsigned long)uaddr & PAGE_MASK) ==
> ((unsigned long)end & PAGE_MASK))
> - ret = __put_user(0, end);
> + ret = __put_user(0, end);
> }
> return ret;
> }

One effect of this change for the filemap.c callsite is that `uaddr'
now gets incremented by PAGE_SIZE. That happens to not break anything
because we then mask `uaddr' with PAGE_MASK, and if gcc were really
smart, it could remove that addition. But it's a bit ugly.

Ideally the patch would have no effect upon filemap.o size, but with an
allmodconfig config I'm seeing

text data bss dec hex filename
22876 118 7344 30338 7682 mm/filemap.o (before)
22925 118 7392 30435 76e3 mm/filemap.o (after)

so we are adding read()/write() overhead, and bss mysteriously got larger.

Can we improve on this? Even if it's some dumb

static inline int fault_in_pages_writeable(char __user *uaddr, int size,
bool multipage)
{
if (multipage) {
do-this
} else {
do-that
}
}

the code duplication between do-this and do-that is regrettable, but at
least it's all in the same place in the same file, so we won't
accidentally introduce skew later on.

Alternatively, add a separate fault_in_multi_pages_writeable() to
pagemap.h. I have a bad feeling this is what your original patch did!

(But we *should* be able to make this work! Why did this version of
the patch go so wrong?)

2012-02-29 23:14:57

by Daniel Vetter

[permalink] [raw]
Subject: Re: [PATCH] mm: extend prefault helpers to fault in more than PAGE_SIZE

On Wed, Feb 29, 2012 at 03:01:46PM -0800, Andrew Morton wrote:
> On Wed, 29 Feb 2012 15:03:31 +0100
> Daniel Vetter <[email protected]> wrote:
>
> > drm/i915 wants to read/write more than one page in its fastpath
> > and hence needs to prefault more than PAGE_SIZE bytes.
> >
> > I've checked the callsites and they all already clamp size when
> > calling fault_in_pages_* to the same as for the subsequent
> > __copy_to|from_user and hence don't rely on the implicit clamping
> > to PAGE_SIZE.
> >
> > Also kill a copy&pasted spurious space in both functions while at it.
> >
> > v2: As suggested by Andrew Morton, add a multipage parameter to both
> > functions to avoid the additional branch for the pagemap.c hotpath.
> > My gcc 4.6 here seems to dtrt and indeed reap these branches where not
> > needed.
> >
>
> I don't think this produced a very good result :(

And I halfway expected this mail here ;-)

> > ...
> >
> > -static inline int fault_in_pages_writeable(char __user *uaddr, int size)
> > +static inline int fault_in_pages_writeable(char __user *uaddr, int size,
> > + bool multipage)
> > {
> > int ret;
> > + char __user *end = uaddr + size - 1;
> >
> > if (unlikely(size == 0))
> > return 0;
> > @@ -416,36 +419,46 @@ static inline int fault_in_pages_writeable(char __user *uaddr, int size)
> > * Writing zeroes into userspace here is OK, because we know that if
> > * the zero gets there, we'll be overwriting it.
> > */
> > - ret = __put_user(0, uaddr);
> > - if (ret == 0) {
> > - char __user *end = uaddr + size - 1;
> > + do {
> > + ret = __put_user(0, uaddr);
> > + if (ret != 0)
> > + return ret;
> > + uaddr += PAGE_SIZE;
> > + } while (multipage && uaddr <= end);
> >
> > + if (ret == 0) {
> > /*
> > * If the page was already mapped, this will get a cache miss
> > * for sure, so try to avoid doing it.
> > */
> > - if (((unsigned long)uaddr & PAGE_MASK) !=
> > + if (((unsigned long)uaddr & PAGE_MASK) ==
> > ((unsigned long)end & PAGE_MASK))
> > - ret = __put_user(0, end);
> > + ret = __put_user(0, end);
> > }
> > return ret;
> > }
>
> One effect of this change for the filemap.c callsite is that `uaddr'
> now gets incremented by PAGE_SIZE. That happens to not break anything
> because we then mask `uaddr' with PAGE_MASK, and if gcc were really
> smart, it could remove that addition. But it's a bit ugly.

Yep, gcc is not clever enough to reap the addl on uaddr (and change the
check for 'do we need to fault the 2nd page to' from jne to je again).
I've checked that before submitting - maybe should have mentioned this.

> Ideally the patch would have no effect upon filemap.o size, but with an
> allmodconfig config I'm seeing
>
> text data bss dec hex filename
> 22876 118 7344 30338 7682 mm/filemap.o (before)
> 22925 118 7392 30435 76e3 mm/filemap.o (after)
>
> so we are adding read()/write() overhead, and bss mysteriously got larger.
>
> Can we improve on this? Even if it's some dumb
>
> static inline int fault_in_pages_writeable(char __user *uaddr, int size,
> bool multipage)
> {
> if (multipage) {
> do-this
> } else {
> do-that
> }
> }
>
> the code duplication between do-this and do-that is regrettable, but at
> least it's all in the same place in the same file, so we won't
> accidentally introduce skew later on.
>
> Alternatively, add a separate fault_in_multi_pages_writeable() to
> pagemap.h. I have a bad feeling this is what your original patch did!
>
> (But we *should* be able to make this work! Why did this version of
> the patch go so wrong?)

Well, I couldn't reconcile the non-multipage with the multipage versions
of these functions - at least not without changing them slightly (like
this patch here does). Which is why I've asked you whether I should just
add a new multipage version of these. I personally deem your proposal of
using and if (multipage) with no shared code too ugly. But you've shot at
it a bit, so I've figured that this version here is what you want.

I'll redo this patch by adding _multipage versions of these 2 functions
for i915.

Yours, Daniel
--
Daniel Vetter
Mail: [email protected]
Mobile: +41 (0)79 365 57 48

2012-02-29 23:32:19

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH] mm: extend prefault helpers to fault in more than PAGE_SIZE

On Thu, 1 Mar 2012 00:14:53 +0100
Daniel Vetter <[email protected]> wrote:

> I'll redo this patch by adding _multipage versions of these 2 functions
> for i915.

OK, but I hope "for i915" doesn't mean "private to"! Put 'em in
pagemap.h, for maintenance reasons and because they are generic.

Making them inline is a bit sad, but that's OK for now - they have few
callsites.