2020-12-07 23:00:42

by Ira Weiny

[permalink] [raw]
Subject: [PATCH V2 0/2] Lift memcpy_[to|from]_page to core

From: Ira Weiny <[email protected]>

These are based on tip/core/mm. As I was looking at converting the calls to
kmap_local_page() I realized that there were a number of calls in highmem.h
which should also be converted.

So I've added a second prelim patch to convert those.

This is a V2 to get into 5.11 so that we can start to convert all the various
subsystems in 5.12.[1]

I'm sending to Andrew and Thomas but I'm expecting these to go through
tip/core/mm via Thomas if that is ok with Andrew.

[1] https://lore.kernel.org/lkml/[email protected]/

Ira Weiny (2):
mm/highmem: Remove deprecated kmap_atomic
mm/highmem: Lift memcpy_[to|from]_page to core

include/linux/highmem.h | 28 +++++++++++++-------------
include/linux/pagemap.h | 44 +++++++++++++++++++++++++++++++++++++++++
lib/iov_iter.c | 26 +++---------------------
3 files changed, 61 insertions(+), 37 deletions(-)

--
2.28.0.rc0.12.gb6a658bd00c9


2020-12-07 23:00:52

by Ira Weiny

[permalink] [raw]
Subject: [PATCH V2 1/2] mm/highmem: Remove deprecated kmap_atomic

From: Ira Weiny <[email protected]>

kmap_atomic() is being deprecated in favor of kmap_local_page().

Replace the uses of kmap_atomic() within the highmem code.

Signed-off-by: Ira Weiny <[email protected]>
---
include/linux/highmem.h | 28 ++++++++++++++--------------
1 file changed, 14 insertions(+), 14 deletions(-)

diff --git a/include/linux/highmem.h b/include/linux/highmem.h
index f597830f26b4..3bfe6a12d14d 100644
--- a/include/linux/highmem.h
+++ b/include/linux/highmem.h
@@ -146,9 +146,9 @@ static inline void invalidate_kernel_vmap_range(void *vaddr, int size)
#ifndef clear_user_highpage
static inline void clear_user_highpage(struct page *page, unsigned long vaddr)
{
- void *addr = kmap_atomic(page);
+ void *addr = kmap_local_page(page);
clear_user_page(addr, vaddr, page);
- kunmap_atomic(addr);
+ kunmap_local(addr);
}
#endif

@@ -199,16 +199,16 @@ alloc_zeroed_user_highpage_movable(struct vm_area_struct *vma,

static inline void clear_highpage(struct page *page)
{
- void *kaddr = kmap_atomic(page);
+ void *kaddr = kmap_local_page(page);
clear_page(kaddr);
- kunmap_atomic(kaddr);
+ kunmap_local(kaddr);
}

static inline void zero_user_segments(struct page *page,
unsigned start1, unsigned end1,
unsigned start2, unsigned end2)
{
- void *kaddr = kmap_atomic(page);
+ void *kaddr = kmap_local_page(page);

BUG_ON(end1 > PAGE_SIZE || end2 > PAGE_SIZE);

@@ -218,7 +218,7 @@ static inline void zero_user_segments(struct page *page,
if (end2 > start2)
memset(kaddr + start2, 0, end2 - start2);

- kunmap_atomic(kaddr);
+ kunmap_local(kaddr);
flush_dcache_page(page);
}

@@ -241,11 +241,11 @@ static inline void copy_user_highpage(struct page *to, struct page *from,
{
char *vfrom, *vto;

- vfrom = kmap_atomic(from);
- vto = kmap_atomic(to);
+ vfrom = kmap_local_page(from);
+ vto = kmap_local_page(to);
copy_user_page(vto, vfrom, vaddr, to);
- kunmap_atomic(vto);
- kunmap_atomic(vfrom);
+ kunmap_local(vto);
+ kunmap_local(vfrom);
}

#endif
@@ -256,11 +256,11 @@ static inline void copy_highpage(struct page *to, struct page *from)
{
char *vfrom, *vto;

- vfrom = kmap_atomic(from);
- vto = kmap_atomic(to);
+ vfrom = kmap_local_page(from);
+ vto = kmap_local_page(to);
copy_page(vto, vfrom);
- kunmap_atomic(vto);
- kunmap_atomic(vfrom);
+ kunmap_local(vto);
+ kunmap_local(vfrom);
}

#endif
--
2.28.0.rc0.12.gb6a658bd00c9

2020-12-07 23:01:46

by Ira Weiny

[permalink] [raw]
Subject: [PATCH V2 2/2] mm/highmem: Lift memcpy_[to|from]_page to core

From: Ira Weiny <[email protected]>

Working through a conversion to a call such as kmap_thread() revealed
many places where the pattern kmap/memcpy/kunmap occurred.

Eric Biggers, Matthew Wilcox, Christoph Hellwig, Dan Williams, and Al
Viro all suggested putting this code into helper functions. Al Viro
further pointed out that these functions already existed in the iov_iter
code.[1]

Placing these functions in 'highmem.h' is suboptimal especially with the
changes being proposed in the functionality of kmap. From a caller
perspective including/using 'highmem.h' implies that the functions
defined in that header are only required when highmem is in use which is
increasingly not the case with modern processors. Some headers like
mm.h or string.h seem ok but don't really portray the functionality
well. 'pagemap.h', on the other hand, makes sense and is already
included in many of the places we want to convert.

Another alternative would be to create a new header for the promoted
memcpy functions, but it masks the fact that these are designed to copy
to/from pages using the kernel direct mappings and complicates matters
with a new header.

Lift memcpy_to_page() and memcpy_from_page() to pagemap.h.

Remove memzero_page() in favor of zero_user() to zero a page.

Add a memcpy_page(), memmove_page, and memset_page() to cover more
kmap/mem*/kunmap. patterns.

Finally use kmap_local_page() in all the new calls.

[1] https://lore.kernel.org/lkml/[email protected]/
https://lore.kernel.org/lkml/[email protected]/

Cc: Dave Hansen <[email protected]>
Suggested-by: Matthew Wilcox <[email protected]>
Suggested-by: Christoph Hellwig <[email protected]>
Suggested-by: Dan Williams <[email protected]>
Suggested-by: Al Viro <[email protected]>
Suggested-by: Eric Biggers <[email protected]>
Signed-off-by: Ira Weiny <[email protected]>

---
Changes for V2:
From Thomas Gleixner
Change kmap_atomic() to kmap_local_page() after basing
on tip/core/mm
From Joonas Lahtinen
Reverse offset/val in memset_page()
---
include/linux/pagemap.h | 44 +++++++++++++++++++++++++++++++++++++++++
lib/iov_iter.c | 26 +++---------------------
2 files changed, 47 insertions(+), 23 deletions(-)

diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index c77b7c31b2e4..9141e5b7b9df 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -1028,4 +1028,48 @@ unsigned int i_blocks_per_page(struct inode *inode, struct page *page)
{
return thp_size(page) >> inode->i_blkbits;
}
+
+static inline void memcpy_page(struct page *dst_page, size_t dst_off,
+ struct page *src_page, size_t src_off,
+ size_t len)
+{
+ char *dst = kmap_local_page(dst_page);
+ char *src = kmap_local_page(src_page);
+ memcpy(dst + dst_off, src + src_off, len);
+ kunmap_local(src);
+ kunmap_local(dst);
+}
+
+static inline void memmove_page(struct page *dst_page, size_t dst_off,
+ struct page *src_page, size_t src_off,
+ size_t len)
+{
+ char *dst = kmap_local_page(dst_page);
+ char *src = kmap_local_page(src_page);
+ memmove(dst + dst_off, src + src_off, len);
+ kunmap_local(src);
+ kunmap_local(dst);
+}
+
+static inline void memcpy_from_page(char *to, struct page *page, size_t offset, size_t len)
+{
+ char *from = kmap_local_page(page);
+ memcpy(to, from + offset, len);
+ kunmap_local(from);
+}
+
+static inline void memcpy_to_page(struct page *page, size_t offset, const char *from, size_t len)
+{
+ char *to = kmap_local_page(page);
+ memcpy(to + offset, from, len);
+ kunmap_local(to);
+}
+
+static inline void memset_page(struct page *page, size_t offset, int val, size_t len)
+{
+ char *addr = kmap_local_page(page);
+ memset(addr + offset, val, len);
+ kunmap_local(addr);
+}
+
#endif /* _LINUX_PAGEMAP_H */
diff --git a/lib/iov_iter.c b/lib/iov_iter.c
index 1635111c5bd2..8ed1f846fcc3 100644
--- a/lib/iov_iter.c
+++ b/lib/iov_iter.c
@@ -5,6 +5,7 @@
#include <linux/fault-inject-usercopy.h>
#include <linux/uio.h>
#include <linux/pagemap.h>
+#include <linux/highmem.h>
#include <linux/slab.h>
#include <linux/vmalloc.h>
#include <linux/splice.h>
@@ -466,27 +467,6 @@ void iov_iter_init(struct iov_iter *i, unsigned int direction,
}
EXPORT_SYMBOL(iov_iter_init);

-static void memcpy_from_page(char *to, struct page *page, size_t offset, size_t len)
-{
- char *from = kmap_atomic(page);
- memcpy(to, from + offset, len);
- kunmap_atomic(from);
-}
-
-static void memcpy_to_page(struct page *page, size_t offset, const char *from, size_t len)
-{
- char *to = kmap_atomic(page);
- memcpy(to + offset, from, len);
- kunmap_atomic(to);
-}
-
-static void memzero_page(struct page *page, size_t offset, size_t len)
-{
- char *addr = kmap_atomic(page);
- memset(addr + offset, 0, len);
- kunmap_atomic(addr);
-}
-
static inline bool allocated(struct pipe_buffer *buf)
{
return buf->ops == &default_pipe_buf_ops;
@@ -964,7 +944,7 @@ static size_t pipe_zero(size_t bytes, struct iov_iter *i)

do {
size_t chunk = min_t(size_t, n, PAGE_SIZE - off);
- memzero_page(pipe->bufs[i_head & p_mask].page, off, chunk);
+ zero_user(pipe->bufs[i_head & p_mask].page, off, chunk);
i->head = i_head;
i->iov_offset = off + chunk;
n -= chunk;
@@ -981,7 +961,7 @@ size_t iov_iter_zero(size_t bytes, struct iov_iter *i)
return pipe_zero(bytes, i);
iterate_and_advance(i, bytes, v,
clear_user(v.iov_base, v.iov_len),
- memzero_page(v.bv_page, v.bv_offset, v.bv_len),
+ zero_user(v.bv_page, v.bv_offset, v.bv_len),
memset(v.iov_base, 0, v.iov_len)
)

--
2.28.0.rc0.12.gb6a658bd00c9

2020-12-07 23:30:07

by Matthew Wilcox

[permalink] [raw]
Subject: Re: [PATCH V2 2/2] mm/highmem: Lift memcpy_[to|from]_page to core

On Mon, Dec 07, 2020 at 02:57:03PM -0800, [email protected] wrote:
> +static inline void memcpy_page(struct page *dst_page, size_t dst_off,
> + struct page *src_page, size_t src_off,
> + size_t len)
> +{
> + char *dst = kmap_local_page(dst_page);
> + char *src = kmap_local_page(src_page);

I appreciate you've only moved these, but please add:

BUG_ON(dst_off + len > PAGE_SIZE || src_off + len > PAGE_SIZE);

> + memcpy(dst + dst_off, src + src_off, len);
> + kunmap_local(src);
> + kunmap_local(dst);
> +}
> +
> +static inline void memmove_page(struct page *dst_page, size_t dst_off,
> + struct page *src_page, size_t src_off,
> + size_t len)
> +{
> + char *dst = kmap_local_page(dst_page);
> + char *src = kmap_local_page(src_page);

BUG_ON(dst_off + len > PAGE_SIZE || src_off + len > PAGE_SIZE);

> + memmove(dst + dst_off, src + src_off, len);
> + kunmap_local(src);
> + kunmap_local(dst);
> +}
> +
> +static inline void memcpy_from_page(char *to, struct page *page, size_t offset, size_t len)
> +{
> + char *from = kmap_local_page(page);

BUG_ON(offset + len > PAGE_SIZE);

> + memcpy(to, from + offset, len);
> + kunmap_local(from);
> +}
> +
> +static inline void memcpy_to_page(struct page *page, size_t offset, const char *from, size_t len)
> +{
> + char *to = kmap_local_page(page);

BUG_ON(offset + len > PAGE_SIZE);

> + memcpy(to + offset, from, len);
> + kunmap_local(to);
> +}
> +
> +static inline void memset_page(struct page *page, size_t offset, int val, size_t len)
> +{
> + char *addr = kmap_local_page(page);

BUG_ON(offset + len > PAGE_SIZE);

> + memset(addr + offset, val, len);
> + kunmap_local(addr);
> +}
> +
> #endif /* _LINUX_PAGEMAP_H */

2020-12-07 23:39:32

by Dan Williams

[permalink] [raw]
Subject: Re: [PATCH V2 2/2] mm/highmem: Lift memcpy_[to|from]_page to core

On Mon, Dec 7, 2020 at 3:27 PM Matthew Wilcox <[email protected]> wrote:
>
> On Mon, Dec 07, 2020 at 02:57:03PM -0800, [email protected] wrote:
> > +static inline void memcpy_page(struct page *dst_page, size_t dst_off,
> > + struct page *src_page, size_t src_off,
> > + size_t len)
> > +{
> > + char *dst = kmap_local_page(dst_page);
> > + char *src = kmap_local_page(src_page);
>
> I appreciate you've only moved these, but please add:
>
> BUG_ON(dst_off + len > PAGE_SIZE || src_off + len > PAGE_SIZE);

I imagine it's not outside the realm of possibility that some driver
on CONFIG_HIGHMEM=n is violating this assumption and getting away with
it because kmap_atomic() of contiguous pages "just works (TM)".
Shouldn't this WARN rather than BUG so that the user can report the
buggy driver and not have a dead system?

2020-12-07 23:45:00

by Matthew Wilcox

[permalink] [raw]
Subject: Re: [PATCH V2 2/2] mm/highmem: Lift memcpy_[to|from]_page to core

On Mon, Dec 07, 2020 at 03:34:44PM -0800, Dan Williams wrote:
> On Mon, Dec 7, 2020 at 3:27 PM Matthew Wilcox <[email protected]> wrote:
> >
> > On Mon, Dec 07, 2020 at 02:57:03PM -0800, [email protected] wrote:
> > > +static inline void memcpy_page(struct page *dst_page, size_t dst_off,
> > > + struct page *src_page, size_t src_off,
> > > + size_t len)
> > > +{
> > > + char *dst = kmap_local_page(dst_page);
> > > + char *src = kmap_local_page(src_page);
> >
> > I appreciate you've only moved these, but please add:
> >
> > BUG_ON(dst_off + len > PAGE_SIZE || src_off + len > PAGE_SIZE);
>
> I imagine it's not outside the realm of possibility that some driver
> on CONFIG_HIGHMEM=n is violating this assumption and getting away with
> it because kmap_atomic() of contiguous pages "just works (TM)".
> Shouldn't this WARN rather than BUG so that the user can report the
> buggy driver and not have a dead system?

As opposed to (on a HIGHMEM=y system) silently corrupting data that
is on the next page of memory? I suppose ideally ...

if (WARN_ON(dst_off + len > PAGE_SIZE))
len = PAGE_SIZE - dst_off;
if (WARN_ON(src_off + len > PAGE_SIZE))
len = PAGE_SIZE - src_off;

and then we just truncate the data of the offending caller instead of
corrupting innocent data that happens to be adjacent. Although that's
not ideal either ... I dunno, what's the least bad poison to drink here?

2020-12-08 01:17:59

by Dan Williams

[permalink] [raw]
Subject: Re: [PATCH V2 2/2] mm/highmem: Lift memcpy_[to|from]_page to core

On Mon, Dec 7, 2020 at 3:40 PM Matthew Wilcox <[email protected]> wrote:
>
> On Mon, Dec 07, 2020 at 03:34:44PM -0800, Dan Williams wrote:
> > On Mon, Dec 7, 2020 at 3:27 PM Matthew Wilcox <[email protected]> wrote:
> > >
> > > On Mon, Dec 07, 2020 at 02:57:03PM -0800, [email protected] wrote:
> > > > +static inline void memcpy_page(struct page *dst_page, size_t dst_off,
> > > > + struct page *src_page, size_t src_off,
> > > > + size_t len)
> > > > +{
> > > > + char *dst = kmap_local_page(dst_page);
> > > > + char *src = kmap_local_page(src_page);
> > >
> > > I appreciate you've only moved these, but please add:
> > >
> > > BUG_ON(dst_off + len > PAGE_SIZE || src_off + len > PAGE_SIZE);
> >
> > I imagine it's not outside the realm of possibility that some driver
> > on CONFIG_HIGHMEM=n is violating this assumption and getting away with
> > it because kmap_atomic() of contiguous pages "just works (TM)".
> > Shouldn't this WARN rather than BUG so that the user can report the
> > buggy driver and not have a dead system?
>
> As opposed to (on a HIGHMEM=y system) silently corrupting data that
> is on the next page of memory?

Wouldn't it fault in HIGHMEM=y case? I guess not necessarily...

> I suppose ideally ...
>
> if (WARN_ON(dst_off + len > PAGE_SIZE))
> len = PAGE_SIZE - dst_off;
> if (WARN_ON(src_off + len > PAGE_SIZE))
> len = PAGE_SIZE - src_off;
>
> and then we just truncate the data of the offending caller instead of
> corrupting innocent data that happens to be adjacent. Although that's
> not ideal either ... I dunno, what's the least bad poison to drink here?

Right, if the driver was relying on "corruption" for correct operation.

If corruption actual were happening in practice wouldn't there have
been screams by now? Again, not necessarily...

At least with just plain WARN the kernel will start screaming on the
user's behalf, and if it worked before it will keep working.

2020-12-08 12:57:25

by Matthew Wilcox

[permalink] [raw]
Subject: Re: [PATCH V2 2/2] mm/highmem: Lift memcpy_[to|from]_page to core

On Mon, Dec 07, 2020 at 02:57:03PM -0800, [email protected] wrote:
> Placing these functions in 'highmem.h' is suboptimal especially with the
> changes being proposed in the functionality of kmap. From a caller
> perspective including/using 'highmem.h' implies that the functions
> defined in that header are only required when highmem is in use which is
> increasingly not the case with modern processors. Some headers like
> mm.h or string.h seem ok but don't really portray the functionality
> well. 'pagemap.h', on the other hand, makes sense and is already
> included in many of the places we want to convert.

pagemap.h is for the page cache. It's not for "random page
functionality". Yes, I know it's badly named. No, I don't want to
rename it. These helpers should go in highmem.h along with zero_user().

2020-12-08 16:41:50

by Ira Weiny

[permalink] [raw]
Subject: Re: [PATCH V2 2/2] mm/highmem: Lift memcpy_[to|from]_page to core

On Tue, Dec 08, 2020 at 12:23:16PM +0000, Matthew Wilcox wrote:
> On Mon, Dec 07, 2020 at 02:57:03PM -0800, [email protected] wrote:
> > Placing these functions in 'highmem.h' is suboptimal especially with the
> > changes being proposed in the functionality of kmap. From a caller
> > perspective including/using 'highmem.h' implies that the functions
> > defined in that header are only required when highmem is in use which is
> > increasingly not the case with modern processors. Some headers like
> > mm.h or string.h seem ok but don't really portray the functionality
> > well. 'pagemap.h', on the other hand, makes sense and is already
> > included in many of the places we want to convert.
>
> pagemap.h is for the page cache. It's not for "random page
> functionality". Yes, I know it's badly named. No, I don't want to
> rename it. These helpers should go in highmem.h along with zero_user().

I could have sworn you suggested pagemap.h. But I can't find the evidence on
lore. :-/ hehehe...

In the end the code does not care. I have a distaste for highmem.h because it
is no longer for 'high memory'. And I think a number of driver writers who are
targeting 64bit platforms just don't care any more. So as we head toward
memory not being mapped by the kernel for other reasons I think highmem needs
to be 'rebranded' if not renamed.

Ira

2020-12-08 16:44:59

by Matthew Wilcox

[permalink] [raw]
Subject: Re: [PATCH V2 2/2] mm/highmem: Lift memcpy_[to|from]_page to core

On Tue, Dec 08, 2020 at 08:38:14AM -0800, Ira Weiny wrote:
> On Tue, Dec 08, 2020 at 12:23:16PM +0000, Matthew Wilcox wrote:
> > On Mon, Dec 07, 2020 at 02:57:03PM -0800, [email protected] wrote:
> > > Placing these functions in 'highmem.h' is suboptimal especially with the
> > > changes being proposed in the functionality of kmap. From a caller
> > > perspective including/using 'highmem.h' implies that the functions
> > > defined in that header are only required when highmem is in use which is
> > > increasingly not the case with modern processors. Some headers like
> > > mm.h or string.h seem ok but don't really portray the functionality
> > > well. 'pagemap.h', on the other hand, makes sense and is already
> > > included in many of the places we want to convert.
> >
> > pagemap.h is for the page cache. It's not for "random page
> > functionality". Yes, I know it's badly named. No, I don't want to
> > rename it. These helpers should go in highmem.h along with zero_user().
>
> I could have sworn you suggested pagemap.h. But I can't find the evidence on
> lore. :-/ hehehe...
>
> In the end the code does not care. I have a distaste for highmem.h because it
> is no longer for 'high memory'. And I think a number of driver writers who are
> targeting 64bit platforms just don't care any more. So as we head toward
> memory not being mapped by the kernel for other reasons I think highmem needs
> to be 'rebranded' if not renamed.

Rename highmem.h to kmap.h?

2020-12-08 21:37:29

by Ira Weiny

[permalink] [raw]
Subject: Re: [PATCH V2 2/2] mm/highmem: Lift memcpy_[to|from]_page to core

On Mon, Dec 07, 2020 at 03:49:55PM -0800, Dan Williams wrote:
> On Mon, Dec 7, 2020 at 3:40 PM Matthew Wilcox <[email protected]> wrote:
> >
> > On Mon, Dec 07, 2020 at 03:34:44PM -0800, Dan Williams wrote:
> > > On Mon, Dec 7, 2020 at 3:27 PM Matthew Wilcox <[email protected]> wrote:
> > > >
> > > > On Mon, Dec 07, 2020 at 02:57:03PM -0800, [email protected] wrote:
> > > > > +static inline void memcpy_page(struct page *dst_page, size_t dst_off,
> > > > > + struct page *src_page, size_t src_off,
> > > > > + size_t len)
> > > > > +{
> > > > > + char *dst = kmap_local_page(dst_page);
> > > > > + char *src = kmap_local_page(src_page);
> > > >
> > > > I appreciate you've only moved these, but please add:
> > > >
> > > > BUG_ON(dst_off + len > PAGE_SIZE || src_off + len > PAGE_SIZE);
> > >
> > > I imagine it's not outside the realm of possibility that some driver
> > > on CONFIG_HIGHMEM=n is violating this assumption and getting away with
> > > it because kmap_atomic() of contiguous pages "just works (TM)".
> > > Shouldn't this WARN rather than BUG so that the user can report the
> > > buggy driver and not have a dead system?
> >
> > As opposed to (on a HIGHMEM=y system) silently corrupting data that
> > is on the next page of memory?
>
> Wouldn't it fault in HIGHMEM=y case? I guess not necessarily...
>
> > I suppose ideally ...
> >
> > if (WARN_ON(dst_off + len > PAGE_SIZE))
> > len = PAGE_SIZE - dst_off;
> > if (WARN_ON(src_off + len > PAGE_SIZE))
> > len = PAGE_SIZE - src_off;
> >
> > and then we just truncate the data of the offending caller instead of
> > corrupting innocent data that happens to be adjacent. Although that's
> > not ideal either ... I dunno, what's the least bad poison to drink here?
>
> Right, if the driver was relying on "corruption" for correct operation.
>
> If corruption actual were happening in practice wouldn't there have
> been screams by now? Again, not necessarily...
>
> At least with just plain WARN the kernel will start screaming on the
> user's behalf, and if it worked before it will keep working.

So I decided to just sleep on this because I was recently told to not introduce
new WARN_ON's[1]

I don't think that truncating len is worth the effort. The conversions being
done should all 'work' At least corrupting users data in the same way as it
used to... ;-) I'm ok with adding the WARN_ON's and I have modified the patch
to do so while I work through the 0-day issues. (not sure what is going on
there.)

However, are we ok with adding the WARN_ON's given what Greg KH told me? This
is a bit more critical than the PKS API in that it could result in corrupt
data.

Ira

[1] https://lore.kernel.org/linux-doc/[email protected]/

2020-12-08 21:55:44

by Matthew Wilcox

[permalink] [raw]
Subject: Re: [PATCH V2 2/2] mm/highmem: Lift memcpy_[to|from]_page to core

On Tue, Dec 08, 2020 at 01:32:55PM -0800, Ira Weiny wrote:
> On Mon, Dec 07, 2020 at 03:49:55PM -0800, Dan Williams wrote:
> > On Mon, Dec 7, 2020 at 3:40 PM Matthew Wilcox <[email protected]> wrote:
> > >
> > > On Mon, Dec 07, 2020 at 03:34:44PM -0800, Dan Williams wrote:
> > > > On Mon, Dec 7, 2020 at 3:27 PM Matthew Wilcox <[email protected]> wrote:
> > > > >
> > > > > On Mon, Dec 07, 2020 at 02:57:03PM -0800, [email protected] wrote:
> > > > > > +static inline void memcpy_page(struct page *dst_page, size_t dst_off,
> > > > > > + struct page *src_page, size_t src_off,
> > > > > > + size_t len)
> > > > > > +{
> > > > > > + char *dst = kmap_local_page(dst_page);
> > > > > > + char *src = kmap_local_page(src_page);
> > > > >
> > > > > I appreciate you've only moved these, but please add:
> > > > >
> > > > > BUG_ON(dst_off + len > PAGE_SIZE || src_off + len > PAGE_SIZE);
> > > >
> > > > I imagine it's not outside the realm of possibility that some driver
> > > > on CONFIG_HIGHMEM=n is violating this assumption and getting away with
> > > > it because kmap_atomic() of contiguous pages "just works (TM)".
> > > > Shouldn't this WARN rather than BUG so that the user can report the
> > > > buggy driver and not have a dead system?
> > >
> > > As opposed to (on a HIGHMEM=y system) silently corrupting data that
> > > is on the next page of memory?
> >
> > Wouldn't it fault in HIGHMEM=y case? I guess not necessarily...
> >
> > > I suppose ideally ...
> > >
> > > if (WARN_ON(dst_off + len > PAGE_SIZE))
> > > len = PAGE_SIZE - dst_off;
> > > if (WARN_ON(src_off + len > PAGE_SIZE))
> > > len = PAGE_SIZE - src_off;
> > >
> > > and then we just truncate the data of the offending caller instead of
> > > corrupting innocent data that happens to be adjacent. Although that's
> > > not ideal either ... I dunno, what's the least bad poison to drink here?
> >
> > Right, if the driver was relying on "corruption" for correct operation.
> >
> > If corruption actual were happening in practice wouldn't there have
> > been screams by now? Again, not necessarily...
> >
> > At least with just plain WARN the kernel will start screaming on the
> > user's behalf, and if it worked before it will keep working.
>
> So I decided to just sleep on this because I was recently told to not introduce
> new WARN_ON's[1]
>
> I don't think that truncating len is worth the effort. The conversions being
> done should all 'work' At least corrupting users data in the same way as it
> used to... ;-) I'm ok with adding the WARN_ON's and I have modified the patch
> to do so while I work through the 0-day issues. (not sure what is going on
> there.)
>
> However, are we ok with adding the WARN_ON's given what Greg KH told me? This
> is a bit more critical than the PKS API in that it could result in corrupt
> data.

zero_user_segments contains:

BUG_ON(end1 > page_size(page) || end2 > page_size(page));

These should be consistent. I think we've demonstrated that there is
no good option here.

2020-12-08 22:26:39

by Dan Williams

[permalink] [raw]
Subject: Re: [PATCH V2 2/2] mm/highmem: Lift memcpy_[to|from]_page to core

On Tue, Dec 8, 2020 at 1:51 PM Matthew Wilcox <[email protected]> wrote:
>
> On Tue, Dec 08, 2020 at 01:32:55PM -0800, Ira Weiny wrote:
> > On Mon, Dec 07, 2020 at 03:49:55PM -0800, Dan Williams wrote:
> > > On Mon, Dec 7, 2020 at 3:40 PM Matthew Wilcox <[email protected]> wrote:
> > > >
> > > > On Mon, Dec 07, 2020 at 03:34:44PM -0800, Dan Williams wrote:
> > > > > On Mon, Dec 7, 2020 at 3:27 PM Matthew Wilcox <[email protected]> wrote:
> > > > > >
> > > > > > On Mon, Dec 07, 2020 at 02:57:03PM -0800, [email protected] wrote:
> > > > > > > +static inline void memcpy_page(struct page *dst_page, size_t dst_off,
> > > > > > > + struct page *src_page, size_t src_off,
> > > > > > > + size_t len)
> > > > > > > +{
> > > > > > > + char *dst = kmap_local_page(dst_page);
> > > > > > > + char *src = kmap_local_page(src_page);
> > > > > >
> > > > > > I appreciate you've only moved these, but please add:
> > > > > >
> > > > > > BUG_ON(dst_off + len > PAGE_SIZE || src_off + len > PAGE_SIZE);
> > > > >
> > > > > I imagine it's not outside the realm of possibility that some driver
> > > > > on CONFIG_HIGHMEM=n is violating this assumption and getting away with
> > > > > it because kmap_atomic() of contiguous pages "just works (TM)".
> > > > > Shouldn't this WARN rather than BUG so that the user can report the
> > > > > buggy driver and not have a dead system?
> > > >
> > > > As opposed to (on a HIGHMEM=y system) silently corrupting data that
> > > > is on the next page of memory?
> > >
> > > Wouldn't it fault in HIGHMEM=y case? I guess not necessarily...
> > >
> > > > I suppose ideally ...
> > > >
> > > > if (WARN_ON(dst_off + len > PAGE_SIZE))
> > > > len = PAGE_SIZE - dst_off;
> > > > if (WARN_ON(src_off + len > PAGE_SIZE))
> > > > len = PAGE_SIZE - src_off;
> > > >
> > > > and then we just truncate the data of the offending caller instead of
> > > > corrupting innocent data that happens to be adjacent. Although that's
> > > > not ideal either ... I dunno, what's the least bad poison to drink here?
> > >
> > > Right, if the driver was relying on "corruption" for correct operation.
> > >
> > > If corruption actual were happening in practice wouldn't there have
> > > been screams by now? Again, not necessarily...
> > >
> > > At least with just plain WARN the kernel will start screaming on the
> > > user's behalf, and if it worked before it will keep working.
> >
> > So I decided to just sleep on this because I was recently told to not introduce
> > new WARN_ON's[1]
> >
> > I don't think that truncating len is worth the effort. The conversions being
> > done should all 'work' At least corrupting users data in the same way as it
> > used to... ;-) I'm ok with adding the WARN_ON's and I have modified the patch
> > to do so while I work through the 0-day issues. (not sure what is going on
> > there.)
> >
> > However, are we ok with adding the WARN_ON's given what Greg KH told me? This
> > is a bit more critical than the PKS API in that it could result in corrupt
> > data.
>
> zero_user_segments contains:
>
> BUG_ON(end1 > page_size(page) || end2 > page_size(page));
>
> These should be consistent. I think we've demonstrated that there is
> no good option here.

True, but these helpers are being deployed to many new locations where
they were not used before.

2020-12-08 22:26:45

by Dan Williams

[permalink] [raw]
Subject: Re: [PATCH V2 2/2] mm/highmem: Lift memcpy_[to|from]_page to core

On Tue, Dec 8, 2020 at 1:33 PM Ira Weiny <[email protected]> wrote:
>
> On Mon, Dec 07, 2020 at 03:49:55PM -0800, Dan Williams wrote:
> > On Mon, Dec 7, 2020 at 3:40 PM Matthew Wilcox <[email protected]> wrote:
> > >
> > > On Mon, Dec 07, 2020 at 03:34:44PM -0800, Dan Williams wrote:
> > > > On Mon, Dec 7, 2020 at 3:27 PM Matthew Wilcox <[email protected]> wrote:
> > > > >
> > > > > On Mon, Dec 07, 2020 at 02:57:03PM -0800, [email protected] wrote:
> > > > > > +static inline void memcpy_page(struct page *dst_page, size_t dst_off,
> > > > > > + struct page *src_page, size_t src_off,
> > > > > > + size_t len)
> > > > > > +{
> > > > > > + char *dst = kmap_local_page(dst_page);
> > > > > > + char *src = kmap_local_page(src_page);
> > > > >
> > > > > I appreciate you've only moved these, but please add:
> > > > >
> > > > > BUG_ON(dst_off + len > PAGE_SIZE || src_off + len > PAGE_SIZE);
> > > >
> > > > I imagine it's not outside the realm of possibility that some driver
> > > > on CONFIG_HIGHMEM=n is violating this assumption and getting away with
> > > > it because kmap_atomic() of contiguous pages "just works (TM)".
> > > > Shouldn't this WARN rather than BUG so that the user can report the
> > > > buggy driver and not have a dead system?
> > >
> > > As opposed to (on a HIGHMEM=y system) silently corrupting data that
> > > is on the next page of memory?
> >
> > Wouldn't it fault in HIGHMEM=y case? I guess not necessarily...
> >
> > > I suppose ideally ...
> > >
> > > if (WARN_ON(dst_off + len > PAGE_SIZE))
> > > len = PAGE_SIZE - dst_off;
> > > if (WARN_ON(src_off + len > PAGE_SIZE))
> > > len = PAGE_SIZE - src_off;
> > >
> > > and then we just truncate the data of the offending caller instead of
> > > corrupting innocent data that happens to be adjacent. Although that's
> > > not ideal either ... I dunno, what's the least bad poison to drink here?
> >
> > Right, if the driver was relying on "corruption" for correct operation.
> >
> > If corruption actual were happening in practice wouldn't there have
> > been screams by now? Again, not necessarily...
> >
> > At least with just plain WARN the kernel will start screaming on the
> > user's behalf, and if it worked before it will keep working.
>
> So I decided to just sleep on this because I was recently told to not introduce
> new WARN_ON's[1]
>
> I don't think that truncating len is worth the effort. The conversions being
> done should all 'work' At least corrupting users data in the same way as it
> used to... ;-) I'm ok with adding the WARN_ON's and I have modified the patch
> to do so while I work through the 0-day issues. (not sure what is going on
> there.)
>
> However, are we ok with adding the WARN_ON's given what Greg KH told me? This
> is a bit more critical than the PKS API in that it could result in corrupt
> data.

That situation was a bit different in my mind because the default
fallback stub path has typically never had WARN_ON even if it's never
supposed to be called.

2020-12-08 22:35:45

by Matthew Wilcox

[permalink] [raw]
Subject: Re: [PATCH V2 2/2] mm/highmem: Lift memcpy_[to|from]_page to core

On Tue, Dec 08, 2020 at 02:23:10PM -0800, Dan Williams wrote:
> On Tue, Dec 8, 2020 at 1:51 PM Matthew Wilcox <[email protected]> wrote:
> >
> > On Tue, Dec 08, 2020 at 01:32:55PM -0800, Ira Weiny wrote:
> > > On Mon, Dec 07, 2020 at 03:49:55PM -0800, Dan Williams wrote:
> > > > On Mon, Dec 7, 2020 at 3:40 PM Matthew Wilcox <[email protected]> wrote:
> > > > >
> > > > > On Mon, Dec 07, 2020 at 03:34:44PM -0800, Dan Williams wrote:
> > > > > > On Mon, Dec 7, 2020 at 3:27 PM Matthew Wilcox <[email protected]> wrote:
> > > > > > >
> > > > > > > On Mon, Dec 07, 2020 at 02:57:03PM -0800, [email protected] wrote:
> > > > > > > > +static inline void memcpy_page(struct page *dst_page, size_t dst_off,
> > > > > > > > + struct page *src_page, size_t src_off,
> > > > > > > > + size_t len)
> > > > > > > > +{
> > > > > > > > + char *dst = kmap_local_page(dst_page);
> > > > > > > > + char *src = kmap_local_page(src_page);
> > > > > > >
> > > > > > > I appreciate you've only moved these, but please add:
> > > > > > >
> > > > > > > BUG_ON(dst_off + len > PAGE_SIZE || src_off + len > PAGE_SIZE);
> > > > > >
> > > > > > I imagine it's not outside the realm of possibility that some driver
> > > > > > on CONFIG_HIGHMEM=n is violating this assumption and getting away with
> > > > > > it because kmap_atomic() of contiguous pages "just works (TM)".
> > > > > > Shouldn't this WARN rather than BUG so that the user can report the
> > > > > > buggy driver and not have a dead system?
> > > > >
> > > > > As opposed to (on a HIGHMEM=y system) silently corrupting data that
> > > > > is on the next page of memory?
> > > >
> > > > Wouldn't it fault in HIGHMEM=y case? I guess not necessarily...
> > > >
> > > > > I suppose ideally ...
> > > > >
> > > > > if (WARN_ON(dst_off + len > PAGE_SIZE))
> > > > > len = PAGE_SIZE - dst_off;
> > > > > if (WARN_ON(src_off + len > PAGE_SIZE))
> > > > > len = PAGE_SIZE - src_off;
> > > > >
> > > > > and then we just truncate the data of the offending caller instead of
> > > > > corrupting innocent data that happens to be adjacent. Although that's
> > > > > not ideal either ... I dunno, what's the least bad poison to drink here?
> > > >
> > > > Right, if the driver was relying on "corruption" for correct operation.
> > > >
> > > > If corruption actual were happening in practice wouldn't there have
> > > > been screams by now? Again, not necessarily...
> > > >
> > > > At least with just plain WARN the kernel will start screaming on the
> > > > user's behalf, and if it worked before it will keep working.
> > >
> > > So I decided to just sleep on this because I was recently told to not introduce
> > > new WARN_ON's[1]
> > >
> > > I don't think that truncating len is worth the effort. The conversions being
> > > done should all 'work' At least corrupting users data in the same way as it
> > > used to... ;-) I'm ok with adding the WARN_ON's and I have modified the patch
> > > to do so while I work through the 0-day issues. (not sure what is going on
> > > there.)
> > >
> > > However, are we ok with adding the WARN_ON's given what Greg KH told me? This
> > > is a bit more critical than the PKS API in that it could result in corrupt
> > > data.
> >
> > zero_user_segments contains:
> >
> > BUG_ON(end1 > page_size(page) || end2 > page_size(page));
> >
> > These should be consistent. I think we've demonstrated that there is
> > no good option here.
>
> True, but these helpers are being deployed to many new locations where
> they were not used before.

So what's your preferred poison?

1. Corrupt random data in whatever's been mapped into the next page (which
is what the helpers currently do)
2. Copy less data than requested
3. Crash
4. Something else

2020-12-08 22:52:54

by Darrick J. Wong

[permalink] [raw]
Subject: Re: [PATCH V2 2/2] mm/highmem: Lift memcpy_[to|from]_page to core

On Tue, Dec 08, 2020 at 10:32:34PM +0000, Matthew Wilcox wrote:
> On Tue, Dec 08, 2020 at 02:23:10PM -0800, Dan Williams wrote:
> > On Tue, Dec 8, 2020 at 1:51 PM Matthew Wilcox <[email protected]> wrote:
> > >
> > > On Tue, Dec 08, 2020 at 01:32:55PM -0800, Ira Weiny wrote:
> > > > On Mon, Dec 07, 2020 at 03:49:55PM -0800, Dan Williams wrote:
> > > > > On Mon, Dec 7, 2020 at 3:40 PM Matthew Wilcox <[email protected]> wrote:
> > > > > >
> > > > > > On Mon, Dec 07, 2020 at 03:34:44PM -0800, Dan Williams wrote:
> > > > > > > On Mon, Dec 7, 2020 at 3:27 PM Matthew Wilcox <[email protected]> wrote:
> > > > > > > >
> > > > > > > > On Mon, Dec 07, 2020 at 02:57:03PM -0800, [email protected] wrote:
> > > > > > > > > +static inline void memcpy_page(struct page *dst_page, size_t dst_off,
> > > > > > > > > + struct page *src_page, size_t src_off,
> > > > > > > > > + size_t len)
> > > > > > > > > +{
> > > > > > > > > + char *dst = kmap_local_page(dst_page);
> > > > > > > > > + char *src = kmap_local_page(src_page);
> > > > > > > >
> > > > > > > > I appreciate you've only moved these, but please add:
> > > > > > > >
> > > > > > > > BUG_ON(dst_off + len > PAGE_SIZE || src_off + len > PAGE_SIZE);
> > > > > > >
> > > > > > > I imagine it's not outside the realm of possibility that some driver
> > > > > > > on CONFIG_HIGHMEM=n is violating this assumption and getting away with
> > > > > > > it because kmap_atomic() of contiguous pages "just works (TM)".
> > > > > > > Shouldn't this WARN rather than BUG so that the user can report the
> > > > > > > buggy driver and not have a dead system?
> > > > > >
> > > > > > As opposed to (on a HIGHMEM=y system) silently corrupting data that
> > > > > > is on the next page of memory?
> > > > >
> > > > > Wouldn't it fault in HIGHMEM=y case? I guess not necessarily...
> > > > >
> > > > > > I suppose ideally ...
> > > > > >
> > > > > > if (WARN_ON(dst_off + len > PAGE_SIZE))
> > > > > > len = PAGE_SIZE - dst_off;
> > > > > > if (WARN_ON(src_off + len > PAGE_SIZE))
> > > > > > len = PAGE_SIZE - src_off;
> > > > > >
> > > > > > and then we just truncate the data of the offending caller instead of
> > > > > > corrupting innocent data that happens to be adjacent. Although that's
> > > > > > not ideal either ... I dunno, what's the least bad poison to drink here?
> > > > >
> > > > > Right, if the driver was relying on "corruption" for correct operation.
> > > > >
> > > > > If corruption actual were happening in practice wouldn't there have
> > > > > been screams by now? Again, not necessarily...
> > > > >
> > > > > At least with just plain WARN the kernel will start screaming on the
> > > > > user's behalf, and if it worked before it will keep working.
> > > >
> > > > So I decided to just sleep on this because I was recently told to not introduce
> > > > new WARN_ON's[1]
> > > >
> > > > I don't think that truncating len is worth the effort. The conversions being
> > > > done should all 'work' At least corrupting users data in the same way as it
> > > > used to... ;-) I'm ok with adding the WARN_ON's and I have modified the patch
> > > > to do so while I work through the 0-day issues. (not sure what is going on
> > > > there.)
> > > >
> > > > However, are we ok with adding the WARN_ON's given what Greg KH told me? This
> > > > is a bit more critical than the PKS API in that it could result in corrupt
> > > > data.
> > >
> > > zero_user_segments contains:
> > >
> > > BUG_ON(end1 > page_size(page) || end2 > page_size(page));
> > >
> > > These should be consistent. I think we've demonstrated that there is
> > > no good option here.
> >
> > True, but these helpers are being deployed to many new locations where
> > they were not used before.
>
> So what's your preferred poison?
>
> 1. Corrupt random data in whatever's been mapped into the next page (which
> is what the helpers currently do)

Please no.

> 2. Copy less data than requested

This sounds like the germination event for a research paper showing that
63% of callers never notice. ;)

> 3. Crash

Useful as a debug tool?

> 4. Something else

Return bytes copied like we do for writes that didn't quite work?

--D

2020-12-08 23:00:09

by Matthew Wilcox

[permalink] [raw]
Subject: Re: [PATCH V2 2/2] mm/highmem: Lift memcpy_[to|from]_page to core

On Tue, Dec 08, 2020 at 02:45:55PM -0800, Darrick J. Wong wrote:
> On Tue, Dec 08, 2020 at 10:32:34PM +0000, Matthew Wilcox wrote:
> > On Tue, Dec 08, 2020 at 02:23:10PM -0800, Dan Williams wrote:
> > > On Tue, Dec 8, 2020 at 1:51 PM Matthew Wilcox <[email protected]> wrote:
> > > >
> > > > On Tue, Dec 08, 2020 at 01:32:55PM -0800, Ira Weiny wrote:
> > > > > On Mon, Dec 07, 2020 at 03:49:55PM -0800, Dan Williams wrote:
> > > > > > On Mon, Dec 7, 2020 at 3:40 PM Matthew Wilcox <[email protected]> wrote:
> > > > > > >
> > > > > > > On Mon, Dec 07, 2020 at 03:34:44PM -0800, Dan Williams wrote:
> > > > > > > > On Mon, Dec 7, 2020 at 3:27 PM Matthew Wilcox <[email protected]> wrote:
> > > > > > > > >
> > > > > > > > > On Mon, Dec 07, 2020 at 02:57:03PM -0800, [email protected] wrote:
> > > > > > > > > > +static inline void memcpy_page(struct page *dst_page, size_t dst_off,
> > > > > > > > > > + struct page *src_page, size_t src_off,
> > > > > > > > > > + size_t len)
> > > > > > > > > > +{
> > > > > > > > > > + char *dst = kmap_local_page(dst_page);
> > > > > > > > > > + char *src = kmap_local_page(src_page);
> > > > > > > > >
> > > > > > > > > I appreciate you've only moved these, but please add:
> > > > > > > > >
> > > > > > > > > BUG_ON(dst_off + len > PAGE_SIZE || src_off + len > PAGE_SIZE);
> > > > > > > >
> > > > > > > > I imagine it's not outside the realm of possibility that some driver
> > > > > > > > on CONFIG_HIGHMEM=n is violating this assumption and getting away with
> > > > > > > > it because kmap_atomic() of contiguous pages "just works (TM)".
> > > > > > > > Shouldn't this WARN rather than BUG so that the user can report the
> > > > > > > > buggy driver and not have a dead system?
> > > > > > >
> > > > > > > As opposed to (on a HIGHMEM=y system) silently corrupting data that
> > > > > > > is on the next page of memory?
> > > > > >
> > > > > > Wouldn't it fault in HIGHMEM=y case? I guess not necessarily...
> > > > > >
> > > > > > > I suppose ideally ...
> > > > > > >
> > > > > > > if (WARN_ON(dst_off + len > PAGE_SIZE))
> > > > > > > len = PAGE_SIZE - dst_off;
> > > > > > > if (WARN_ON(src_off + len > PAGE_SIZE))
> > > > > > > len = PAGE_SIZE - src_off;
> > > > > > >
> > > > > > > and then we just truncate the data of the offending caller instead of
> > > > > > > corrupting innocent data that happens to be adjacent. Although that's
> > > > > > > not ideal either ... I dunno, what's the least bad poison to drink here?
> > > > > >
> > > > > > Right, if the driver was relying on "corruption" for correct operation.
> > > > > >
> > > > > > If corruption actual were happening in practice wouldn't there have
> > > > > > been screams by now? Again, not necessarily...
> > > > > >
> > > > > > At least with just plain WARN the kernel will start screaming on the
> > > > > > user's behalf, and if it worked before it will keep working.
> > > > >
> > > > > So I decided to just sleep on this because I was recently told to not introduce
> > > > > new WARN_ON's[1]
> > > > >
> > > > > I don't think that truncating len is worth the effort. The conversions being
> > > > > done should all 'work' At least corrupting users data in the same way as it
> > > > > used to... ;-) I'm ok with adding the WARN_ON's and I have modified the patch
> > > > > to do so while I work through the 0-day issues. (not sure what is going on
> > > > > there.)
> > > > >
> > > > > However, are we ok with adding the WARN_ON's given what Greg KH told me? This
> > > > > is a bit more critical than the PKS API in that it could result in corrupt
> > > > > data.
> > > >
> > > > zero_user_segments contains:
> > > >
> > > > BUG_ON(end1 > page_size(page) || end2 > page_size(page));
> > > >
> > > > These should be consistent. I think we've demonstrated that there is
> > > > no good option here.
> > >
> > > True, but these helpers are being deployed to many new locations where
> > > they were not used before.
> >
> > So what's your preferred poison?
> >
> > 1. Corrupt random data in whatever's been mapped into the next page (which
> > is what the helpers currently do)
>
> Please no.
>
> > 2. Copy less data than requested
>
> This sounds like the germination event for a research paper showing that
> 63% of callers never notice. ;)
>
> > 3. Crash
>
> Useful as a debug tool?
>
> > 4. Something else
>
> Return bytes copied like we do for writes that didn't quite work?

... to learn that 87% of callers never check the return value, 10%
of them do the wrong thing and the remainder have never been tested?

2020-12-08 23:45:04

by Dan Williams

[permalink] [raw]
Subject: Re: [PATCH V2 2/2] mm/highmem: Lift memcpy_[to|from]_page to core

On Tue, Dec 8, 2020 at 2:49 PM Darrick J. Wong <[email protected]> wrote:
[..]
> > So what's your preferred poison?
> >
> > 1. Corrupt random data in whatever's been mapped into the next page (which
> > is what the helpers currently do)
>
> Please no.

My assertion is that the kernel can't know it's corruption, it can
only know that the driver is abusing the API. So over-copy and WARN
seems better than violently regress by crashing what might have been
working silently before.

2020-12-09 02:27:19

by Ira Weiny

[permalink] [raw]
Subject: Re: [PATCH V2 2/2] mm/highmem: Lift memcpy_[to|from]_page to core

On Tue, Dec 08, 2020 at 03:40:52PM -0800, Dan Williams wrote:
> On Tue, Dec 8, 2020 at 2:49 PM Darrick J. Wong <[email protected]> wrote:
> [..]
> > > So what's your preferred poison?
> > >
> > > 1. Corrupt random data in whatever's been mapped into the next page (which
> > > is what the helpers currently do)
> >
> > Please no.
>
> My assertion is that the kernel can't know it's corruption, it can
> only know that the driver is abusing the API. So over-copy and WARN
> seems better than violently regress by crashing what might have been
> working silently before.

Right now we have a mixed bag. zero_user() [and it's variants, circa 2008]
does a BUG_ON.[0] While the other ones do nothing; clear_highpage(),
clear_user_highpage(), copy_user_highpage(), and copy_highpage().

While continuing to audit the code I don't see any users who would violating
the API with a simple conversion of the code. The calls which I have worked on
[which is many at this point] all have checks in place which are well aware of
page boundaries.

Therefore, I tend to agree with Dan that if anything is to be done it should be
a WARN_ON() which is only going to throw an error that something has probably
been wrong all along and should be fixed but continue running as before.

BUG_ON() is a very big hammer. And I don't think that Linus is going to
appreciate a BUG_ON here.[1] Callers of this API should be well aware that
they are operating on a page and that specifying parameters beyond the bounds
of a page are going to have bad consequences...

Furthermore, I'm still leery of adding the WARN_ON's because Greg KH says many
people will be converting them to BUG_ON's via panic-on-warn anyway. But at
least that is their choice.

FWIW I think this is a 'bad BUG_ON' use because we are "checking something that
we know we might be getting wrong".[1] And because, "BUG() is only good for
something that never happens and that we really have no other option for".[2]

IMO, These calls are like memcpy/memmove. memcpy/memmove don't validate bounds
and developers have lived with those constructs for a long time.

Ira

[0] BTW, After writing this email, with various URL research, I think this
BUG_ON() is also probably wrong...

[1]
<quote>
...
It's [BUG_ON] not a "let's check that
everybody did things right", it's a "this is a major design rule in
this core code".
...
</quote>
-- Linus (https://lkml.org/lkml/2016/10/4/337)

[2] https://yarchive.net/comp/linux/BUG.html

2020-12-09 04:07:08

by Matthew Wilcox

[permalink] [raw]
Subject: Re: [PATCH V2 2/2] mm/highmem: Lift memcpy_[to|from]_page to core

On Tue, Dec 08, 2020 at 06:22:50PM -0800, Ira Weiny wrote:
> Right now we have a mixed bag. zero_user() [and it's variants, circa 2008]
> does a BUG_ON.[0] While the other ones do nothing; clear_highpage(),
> clear_user_highpage(), copy_user_highpage(), and copy_highpage().

Erm, those functions operate on the entire PAGE_SIZE. There's nothing
for them to check.

> While continuing to audit the code I don't see any users who would violating
> the API with a simple conversion of the code. The calls which I have worked on
> [which is many at this point] all have checks in place which are well aware of
> page boundaries.

Oh good, then this BUG_ON won't trigger.

> Therefore, I tend to agree with Dan that if anything is to be done it should be
> a WARN_ON() which is only going to throw an error that something has probably
> been wrong all along and should be fixed but continue running as before.

Silent data corruption is for ever. Are you absolutely sure nobody has
done:

page = alloc_pages(GFP_HIGHUSER_MOVABLE, 3);
memcpy_to_page(page, PAGE_SIZE * 2, p, PAGE_SIZE * 2);

because that will work fine if the pages come from ZONE_NORMAL and fail
miserably if they came from ZONE_HIGHMEM.

> FWIW I think this is a 'bad BUG_ON' use because we are "checking something that
> we know we might be getting wrong".[1] And because, "BUG() is only good for
> something that never happens and that we really have no other option for".[2]

BUG() is our only option here. Both limiting how much we copy or
copying the requested amount result in data corruption or leaking
information to a process that isn't supposed to see it.

What Linus is railing against is the developers who say "Oh, I don't
know what to do here, I'll just BUG()". That's not the case here.
We've thought about it. We've discussed it. There's NO GOOD OPTION.

Unless you want to do the moral equivalent of this:

http://git.infradead.org/users/willy/pagecache.git/commitdiff/d2417516bd8b3dd1db096a9b040b0264d8052339

I think that would look something like this ...

void memcpy_to_page(struct page *page, size_t offset, const char *from,
size_t len)
{
page += offset / PAGE_SIZE;
offset %= PAGE_SIZE;

while (len) {
char *to = kmap_atomic(page);
size_t bytes = min(len, PAGE_SIZE - offset);
memcpy(to + offset, from, len);
kunmap_atomic(to);
len -= bytes;
offset = 0;
page++;
}
}

Now 32-bit highmem will do the same thing as 64-bit for my example above,
just more slowly. Untested, obviously.

2020-12-09 20:38:42

by Dan Williams

[permalink] [raw]
Subject: Re: [PATCH V2 2/2] mm/highmem: Lift memcpy_[to|from]_page to core

On Tue, Dec 8, 2020 at 8:03 PM Matthew Wilcox <[email protected]> wrote:
>
> On Tue, Dec 08, 2020 at 06:22:50PM -0800, Ira Weiny wrote:
> > Right now we have a mixed bag. zero_user() [and it's variants, circa 2008]
> > does a BUG_ON.[0] While the other ones do nothing; clear_highpage(),
> > clear_user_highpage(), copy_user_highpage(), and copy_highpage().
>
> Erm, those functions operate on the entire PAGE_SIZE. There's nothing
> for them to check.
>
> > While continuing to audit the code I don't see any users who would violating
> > the API with a simple conversion of the code. The calls which I have worked on
> > [which is many at this point] all have checks in place which are well aware of
> > page boundaries.
>
> Oh good, then this BUG_ON won't trigger.
>
> > Therefore, I tend to agree with Dan that if anything is to be done it should be
> > a WARN_ON() which is only going to throw an error that something has probably
> > been wrong all along and should be fixed but continue running as before.
>
> Silent data corruption is for ever. Are you absolutely sure nobody has
> done:
>
> page = alloc_pages(GFP_HIGHUSER_MOVABLE, 3);
> memcpy_to_page(page, PAGE_SIZE * 2, p, PAGE_SIZE * 2);
>
> because that will work fine if the pages come from ZONE_NORMAL and fail
> miserably if they came from ZONE_HIGHMEM.

...and violently regress with the BUG_ON.

The question to me is: which is more likely that any bad usages have
been covered up by being limited to ZONE_NORMAL / 64-bit only, or that
silent data corruption has been occurring with no ill effects?

> > FWIW I think this is a 'bad BUG_ON' use because we are "checking something that
> > we know we might be getting wrong".[1] And because, "BUG() is only good for
> > something that never happens and that we really have no other option for".[2]
>
> BUG() is our only option here. Both limiting how much we copy or
> copying the requested amount result in data corruption or leaking
> information to a process that isn't supposed to see it.

At a minimum I think this should be debated in a follow on patch to
add assertion checking where there was none before. There is no
evidence of a page being overrun in the audit Ira performed.

2020-12-09 20:42:54

by Matthew Wilcox

[permalink] [raw]
Subject: Re: [PATCH V2 2/2] mm/highmem: Lift memcpy_[to|from]_page to core

On Wed, Dec 09, 2020 at 11:47:56AM -0800, Dan Williams wrote:
> On Tue, Dec 8, 2020 at 8:03 PM Matthew Wilcox <[email protected]> wrote:
> > On Tue, Dec 08, 2020 at 06:22:50PM -0800, Ira Weiny wrote:
> > > Therefore, I tend to agree with Dan that if anything is to be done it should be
> > > a WARN_ON() which is only going to throw an error that something has probably
> > > been wrong all along and should be fixed but continue running as before.
> >
> > Silent data corruption is for ever. Are you absolutely sure nobody has
> > done:
> >
> > page = alloc_pages(GFP_HIGHUSER_MOVABLE, 3);
> > memcpy_to_page(page, PAGE_SIZE * 2, p, PAGE_SIZE * 2);
> >
> > because that will work fine if the pages come from ZONE_NORMAL and fail
> > miserably if they came from ZONE_HIGHMEM.
>
> ...and violently regress with the BUG_ON.

... which is what we want, no?

> The question to me is: which is more likely that any bad usages have
> been covered up by being limited to ZONE_NORMAL / 64-bit only, or that
> silent data corruption has been occurring with no ill effects?

I wouldn't be at all surprised to learn that there is silent data
corruption on 32-bit systems with HIGHMEM. Would you? How much testing
do you do on 32-bit HIGHMEM systems?

Actually, I wouldn't be at all surprised if we can hit this problem today.
Look at this:

size_t _copy_from_iter(void *addr, size_t bytes, struct iov_iter *i)
{
char *to = addr;
if (unlikely(iov_iter_is_pipe(i))) {
WARN_ON(1);
return 0;
}
if (iter_is_iovec(i))
might_fault();
iterate_and_advance(i, bytes, v,
copyin((to += v.iov_len) - v.iov_len, v.iov_base, v.iov_len),
memcpy_from_page((to += v.bv_len) - v.bv_len, v.bv_page,
v.bv_offset, v.bv_len),
memcpy((to += v.iov_len) - v.iov_len, v.iov_base, v.iov_len)
)

return bytes;
}
EXPORT_SYMBOL(_copy_from_iter);

There's a lot of macrology in there, so for those following along who
aren't familiar with the iov_iter code, if the iter is operating on a
bvec, then iterate_and_advance() will call memcpy_from_page(), passing
it the bv_page, bv_offset and bv_len stored in the bvec. Since 2019,
Linux has supported multipage bvecs (commit 07173c3ec276). So bv_len
absolutely *can* be > PAGE_SIZE.

Does this ever happen in practice? I have no idea; I don't know whether
any multipage BIOs are currently handed to copy_from_iter(). But I
have no confidence in your audit if you didn't catch this one.

> > > FWIW I think this is a 'bad BUG_ON' use because we are "checking something that
> > > we know we might be getting wrong".[1] And because, "BUG() is only good for
> > > something that never happens and that we really have no other option for".[2]
> >
> > BUG() is our only option here. Both limiting how much we copy or
> > copying the requested amount result in data corruption or leaking
> > information to a process that isn't supposed to see it.
>
> At a minimum I think this should be debated in a follow on patch to
> add assertion checking where there was none before. There is no
> evidence of a page being overrun in the audit Ira performed.

If we put in into a separate patch, someone will suggest backing out the
patch which tells us that there's a problem. You know, like this guy ...
https://lore.kernel.org/linux-mm/CAPcyv4jNVroYmirzKw_=CsEixOEACdL3M1Wc4xjd_TFv3h+o8Q@mail.gmail.com/

2020-12-09 20:45:27

by Dan Williams

[permalink] [raw]
Subject: Re: [PATCH V2 2/2] mm/highmem: Lift memcpy_[to|from]_page to core

On Wed, Dec 9, 2020 at 12:14 PM Matthew Wilcox <[email protected]> wrote:
[..]
> If we put in into a separate patch, someone will suggest backing out the
> patch which tells us that there's a problem. You know, like this guy ...
> https://lore.kernel.org/linux-mm/CAPcyv4jNVroYmirzKw_=CsEixOEACdL3M1Wc4xjd_TFv3h+o8Q@mail.gmail.com/

...and that person, like in this case, will be told 'no' and go on to
find / fix the real problem.

2020-12-10 07:18:35

by Ira Weiny

[permalink] [raw]
Subject: Re: [PATCH V2 2/2] mm/highmem: Lift memcpy_[to|from]_page to core

On Wed, Dec 09, 2020 at 08:14:15PM +0000, Matthew Wilcox wrote:
> On Wed, Dec 09, 2020 at 11:47:56AM -0800, Dan Williams wrote:
> > On Tue, Dec 8, 2020 at 8:03 PM Matthew Wilcox <[email protected]> wrote:
> > > On Tue, Dec 08, 2020 at 06:22:50PM -0800, Ira Weiny wrote:
> > > > Therefore, I tend to agree with Dan that if anything is to be done it should be
> > > > a WARN_ON() which is only going to throw an error that something has probably
> > > > been wrong all along and should be fixed but continue running as before.
> > >
> > > Silent data corruption is for ever. Are you absolutely sure nobody has
> > > done:
> > >
> > > page = alloc_pages(GFP_HIGHUSER_MOVABLE, 3);
> > > memcpy_to_page(page, PAGE_SIZE * 2, p, PAGE_SIZE * 2);
> > >
> > > because that will work fine if the pages come from ZONE_NORMAL and fail
> > > miserably if they came from ZONE_HIGHMEM.
> >
> > ...and violently regress with the BUG_ON.
>
> ... which is what we want, no?
>
> > The question to me is: which is more likely that any bad usages have
> > been covered up by being limited to ZONE_NORMAL / 64-bit only, or that
> > silent data corruption has been occurring with no ill effects?
>
> I wouldn't be at all surprised to learn that there is silent data
> corruption on 32-bit systems with HIGHMEM. Would you? How much testing
> do you do on 32-bit HIGHMEM systems?
>
> Actually, I wouldn't be at all surprised if we can hit this problem today.
> Look at this:
>
> size_t _copy_from_iter(void *addr, size_t bytes, struct iov_iter *i)
> {
> char *to = addr;
> if (unlikely(iov_iter_is_pipe(i))) {
> WARN_ON(1);
> return 0;
> }
> if (iter_is_iovec(i))
> might_fault();
> iterate_and_advance(i, bytes, v,
> copyin((to += v.iov_len) - v.iov_len, v.iov_base, v.iov_len),
> memcpy_from_page((to += v.bv_len) - v.bv_len, v.bv_page,
> v.bv_offset, v.bv_len),
> memcpy((to += v.iov_len) - v.iov_len, v.iov_base, v.iov_len)
> )
>
> return bytes;
> }
> EXPORT_SYMBOL(_copy_from_iter);
>
> There's a lot of macrology in there, so for those following along who
> aren't familiar with the iov_iter code, if the iter is operating on a
> bvec, then iterate_and_advance() will call memcpy_from_page(), passing
> it the bv_page, bv_offset and bv_len stored in the bvec. Since 2019,
> Linux has supported multipage bvecs (commit 07173c3ec276). So bv_len
> absolutely *can* be > PAGE_SIZE.
>
> Does this ever happen in practice? I have no idea; I don't know whether
> any multipage BIOs are currently handed to copy_from_iter(). But I
> have no confidence in your audit if you didn't catch this one.

Ah... This call site has been there since 2014 and is not a new caller I have
been 'auditing'.[1]

>
> > > > FWIW I think this is a 'bad BUG_ON' use because we are "checking something that
> > > > we know we might be getting wrong".[1] And because, "BUG() is only good for
> > > > something that never happens and that we really have no other option for".[2]
> > >
> > > BUG() is our only option here. Both limiting how much we copy or
> > > copying the requested amount result in data corruption or leaking
> > > information to a process that isn't supposed to see it.
> >
> > At a minimum I think this should be debated in a follow on patch to
> > add assertion checking where there was none before. There is no
> > evidence of a page being overrun in the audit Ira performed.
>
> If we put in into a separate patch, someone will suggest backing out the
> patch which tells us that there's a problem. You know, like this guy ...
> https://lore.kernel.org/linux-mm/CAPcyv4jNVroYmirzKw_=CsEixOEACdL3M1Wc4xjd_TFv3h+o8Q@mail.gmail.com/

I'm not following this. Regardless I've already added the BUG_ON's.

Ira

[1]
commit 0dbca9a4b5d69a7e4b8c1d55b98312fcd9aafcf7
Author: Al Viro <[email protected]>
Date: Thu Nov 27 14:26:43 2014 -0500

iov_iter.c: convert copy_from_iter() to iterate_and_advance

Signed-off-by: Al Viro <[email protected]>