2019-08-02 02:32:42

by John Hubbard

[permalink] [raw]
Subject: [PATCH 00/34] put_user_pages(): miscellaneous call sites

From: John Hubbard <[email protected]>

Hi,

These are best characterized as miscellaneous conversions: many (not all)
call sites that don't involve biovec or iov_iter, nor mm/. It also leaves
out a few call sites that require some more work. These are mostly pretty
simple ones.

It's probably best to send all of these via Andrew's -mm tree, assuming
that there are no significant merge conflicts with ongoing work in other
trees (which I doubt, given that these are small changes).

These patches apply to the latest linux.git. Patch #1 is also already in
Andrew's tree, but given the broad non-linux-mm Cc list, I thought it
would be more convenient to just include that patch here, so that people
can use linux.git as the base--even though these are probably destined
for linux-mm.

This is part a tree-wide conversion, as described in commit fc1d8e7cca2d
("mm: introduce put_user_page*(), placeholder versions"). That commit
has an extensive description of the problem and the planned steps to
solve it, but the highlites are:

1) Provide put_user_page*() routines, intended to be used
for releasing pages that were pinned via get_user_pages*().

2) Convert all of the call sites for get_user_pages*(), to
invoke put_user_page*(), instead of put_page(). This involves dozens of
call sites, and will take some time.

3) After (2) is complete, use get_user_pages*() and put_user_page*() to
implement tracking of these pages. This tracking will be separate from
the existing struct page refcounting.

4) Use the tracking and identification of these pages, to implement
special handling (especially in writeback paths) when the pages are
backed by a filesystem.

And a few references, also from that commit:

[1] https://lwn.net/Articles/774411/ : "DMA and get_user_pages()"
[2] https://lwn.net/Articles/753027/ : "The Trouble with get_user_pages()"


Ira Weiny (1):
fs/binfmt_elf: convert put_page() to put_user_page*()

John Hubbard (33):
mm/gup: add make_dirty arg to put_user_pages_dirty_lock()
net/rds: convert put_page() to put_user_page*()
net/ceph: convert put_page() to put_user_page*()
x86/kvm: convert put_page() to put_user_page*()
drm/etnaviv: convert release_pages() to put_user_pages()
drm/i915: convert put_page() to put_user_page*()
drm/radeon: convert put_page() to put_user_page*()
media/ivtv: convert put_page() to put_user_page*()
media/v4l2-core/mm: convert put_page() to put_user_page*()
genwqe: convert put_page() to put_user_page*()
scif: convert put_page() to put_user_page*()
vmci: convert put_page() to put_user_page*()
rapidio: convert put_page() to put_user_page*()
oradax: convert put_page() to put_user_page*()
staging/vc04_services: convert put_page() to put_user_page*()
drivers/tee: convert put_page() to put_user_page*()
vfio: convert put_page() to put_user_page*()
fbdev/pvr2fb: convert put_page() to put_user_page*()
fsl_hypervisor: convert put_page() to put_user_page*()
xen: convert put_page() to put_user_page*()
fs/exec.c: convert put_page() to put_user_page*()
orangefs: convert put_page() to put_user_page*()
uprobes: convert put_page() to put_user_page*()
futex: convert put_page() to put_user_page*()
mm/frame_vector.c: convert put_page() to put_user_page*()
mm/gup_benchmark.c: convert put_page() to put_user_page*()
mm/memory.c: convert put_page() to put_user_page*()
mm/madvise.c: convert put_page() to put_user_page*()
mm/process_vm_access.c: convert put_page() to put_user_page*()
crypt: convert put_page() to put_user_page*()
nfs: convert put_page() to put_user_page*()
goldfish_pipe: convert put_page() to put_user_page*()
kernel/events/core.c: convert put_page() to put_user_page*()

arch/x86/kvm/svm.c | 4 +-
crypto/af_alg.c | 7 +-
drivers/gpu/drm/etnaviv/etnaviv_gem.c | 4 +-
drivers/gpu/drm/i915/gem/i915_gem_userptr.c | 9 +-
drivers/gpu/drm/radeon/radeon_ttm.c | 2 +-
drivers/infiniband/core/umem.c | 5 +-
drivers/infiniband/hw/hfi1/user_pages.c | 5 +-
drivers/infiniband/hw/qib/qib_user_pages.c | 5 +-
drivers/infiniband/hw/usnic/usnic_uiom.c | 5 +-
drivers/infiniband/sw/siw/siw_mem.c | 10 +-
drivers/media/pci/ivtv/ivtv-udma.c | 14 +--
drivers/media/pci/ivtv/ivtv-yuv.c | 10 +-
drivers/media/v4l2-core/videobuf-dma-sg.c | 3 +-
drivers/misc/genwqe/card_utils.c | 17 +--
drivers/misc/mic/scif/scif_rma.c | 17 ++-
drivers/misc/vmw_vmci/vmci_context.c | 2 +-
drivers/misc/vmw_vmci/vmci_queue_pair.c | 11 +-
drivers/platform/goldfish/goldfish_pipe.c | 9 +-
drivers/rapidio/devices/rio_mport_cdev.c | 9 +-
drivers/sbus/char/oradax.c | 2 +-
.../interface/vchiq_arm/vchiq_2835_arm.c | 10 +-
drivers/tee/tee_shm.c | 10 +-
drivers/vfio/vfio_iommu_type1.c | 8 +-
drivers/video/fbdev/pvr2fb.c | 3 +-
drivers/virt/fsl_hypervisor.c | 7 +-
drivers/xen/gntdev.c | 5 +-
drivers/xen/privcmd.c | 7 +-
fs/binfmt_elf.c | 2 +-
fs/binfmt_elf_fdpic.c | 2 +-
fs/exec.c | 2 +-
fs/nfs/direct.c | 4 +-
fs/orangefs/orangefs-bufmap.c | 7 +-
include/linux/mm.h | 5 +-
kernel/events/core.c | 2 +-
kernel/events/uprobes.c | 6 +-
kernel/futex.c | 10 +-
mm/frame_vector.c | 4 +-
mm/gup.c | 115 ++++++++----------
mm/gup_benchmark.c | 2 +-
mm/madvise.c | 2 +-
mm/memory.c | 2 +-
mm/process_vm_access.c | 18 +--
net/ceph/pagevec.c | 8 +-
net/rds/info.c | 5 +-
net/rds/message.c | 2 +-
net/rds/rdma.c | 15 ++-
virt/kvm/kvm_main.c | 4 +-
47 files changed, 151 insertions(+), 266 deletions(-)

--
2.22.0


2019-08-02 02:57:56

by John Hubbard

[permalink] [raw]
Subject: Re: [PATCH 00/34] put_user_pages(): miscellaneous call sites

On 8/1/19 7:16 PM, [email protected] wrote:
> From: John Hubbard <[email protected]>
>
> Hi,
>
> These are best characterized as miscellaneous conversions: many (not all)
> call sites that don't involve biovec or iov_iter, nor mm/. It also leaves
> out a few call sites that require some more work. These are mostly pretty
> simple ones.
>
> It's probably best to send all of these via Andrew's -mm tree, assuming
> that there are no significant merge conflicts with ongoing work in other
> trees (which I doubt, given that these are small changes).
>

In case anyone is wondering, this truncated series is due to a script failure:
git-send-email chokes when it hits email addresses whose names have a
comma in them, as happened here with patch 0003.

Please disregard this set and reply to the other thread.

thanks,
--
John Hubbard
NVIDIA

2019-08-02 09:33:47

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH 00/34] put_user_pages(): miscellaneous call sites

On Thu, Aug 01, 2019 at 07:16:19PM -0700, [email protected] wrote:

> This is part a tree-wide conversion, as described in commit fc1d8e7cca2d
> ("mm: introduce put_user_page*(), placeholder versions"). That commit
> has an extensive description of the problem and the planned steps to
> solve it, but the highlites are:

That is one horridly mangled Changelog there :-/ It looks like it's
partially duplicated.

Anyway; no objections to any of that, but I just wanted to mention that
there are other problems with long term pinning that haven't been
mentioned, notably they inhibit compaction.

A long time ago I proposed an interface to mark pages as pinned, such
that we could run compaction before we actually did the pinning.

2019-08-02 23:20:28

by Jan Kara

[permalink] [raw]
Subject: Re: [PATCH 00/34] put_user_pages(): miscellaneous call sites

On Fri 02-08-19 11:12:44, Michal Hocko wrote:
> On Thu 01-08-19 19:19:31, [email protected] wrote:
> [...]
> > 2) Convert all of the call sites for get_user_pages*(), to
> > invoke put_user_page*(), instead of put_page(). This involves dozens of
> > call sites, and will take some time.
>
> How do we make sure this is the case and it will remain the case in the
> future? There must be some automagic to enforce/check that. It is simply
> not manageable to do it every now and then because then 3) will simply
> be never safe.
>
> Have you considered coccinele or some other scripted way to do the
> transition? I have no idea how to deal with future changes that would
> break the balance though.

Yeah, that's why I've been suggesting at LSF/MM that we may need to create
a gup wrapper - say vaddr_pin_pages() - and track which sites dropping
references got converted by using this wrapper instead of gup. The
counterpart would then be more logically named as unpin_page() or whatever
instead of put_user_page(). Sure this is not completely foolproof (you can
create new callsite using vaddr_pin_pages() and then just drop refs using
put_page()) but I suppose it would be a high enough barrier for missed
conversions... Thoughts?

Honza

--
Jan Kara <[email protected]>
SUSE Labs, CR

2019-08-03 14:21:27

by Jan Kara

[permalink] [raw]
Subject: Re: [PATCH 00/34] put_user_pages(): miscellaneous call sites

On Fri 02-08-19 07:24:43, Matthew Wilcox wrote:
> On Fri, Aug 02, 2019 at 02:41:46PM +0200, Jan Kara wrote:
> > On Fri 02-08-19 11:12:44, Michal Hocko wrote:
> > > On Thu 01-08-19 19:19:31, [email protected] wrote:
> > > [...]
> > > > 2) Convert all of the call sites for get_user_pages*(), to
> > > > invoke put_user_page*(), instead of put_page(). This involves dozens of
> > > > call sites, and will take some time.
> > >
> > > How do we make sure this is the case and it will remain the case in the
> > > future? There must be some automagic to enforce/check that. It is simply
> > > not manageable to do it every now and then because then 3) will simply
> > > be never safe.
> > >
> > > Have you considered coccinele or some other scripted way to do the
> > > transition? I have no idea how to deal with future changes that would
> > > break the balance though.
> >
> > Yeah, that's why I've been suggesting at LSF/MM that we may need to create
> > a gup wrapper - say vaddr_pin_pages() - and track which sites dropping
> > references got converted by using this wrapper instead of gup. The
> > counterpart would then be more logically named as unpin_page() or whatever
> > instead of put_user_page(). Sure this is not completely foolproof (you can
> > create new callsite using vaddr_pin_pages() and then just drop refs using
> > put_page()) but I suppose it would be a high enough barrier for missed
> > conversions... Thoughts?
>
> I think the API we really need is get_user_bvec() / put_user_bvec(),
> and I know Christoph has been putting some work into that. That avoids
> doing refcount operations on hundreds of pages if the page in question is
> a huge page. Once people are switched over to that, they won't be tempted
> to manually call put_page() on the individual constituent pages of a bvec.

Well, get_user_bvec() is certainly a good API for one class of users but
just looking at the above series, you'll see there are *many* places that
just don't work with bvecs at all and you need something for those.

Honza
--
Jan Kara <[email protected]>
SUSE Labs, CR

2019-08-04 01:59:09

by John Hubbard

[permalink] [raw]
Subject: Re: [PATCH 00/34] put_user_pages(): miscellaneous call sites

On 8/2/19 7:52 AM, Jan Kara wrote:
> On Fri 02-08-19 07:24:43, Matthew Wilcox wrote:
>> On Fri, Aug 02, 2019 at 02:41:46PM +0200, Jan Kara wrote:
>>> On Fri 02-08-19 11:12:44, Michal Hocko wrote:
>>>> On Thu 01-08-19 19:19:31, [email protected] wrote:
>>>> [...]
>>>>> 2) Convert all of the call sites for get_user_pages*(), to
>>>>> invoke put_user_page*(), instead of put_page(). This involves dozens of
>>>>> call sites, and will take some time.
>>>>
>>>> How do we make sure this is the case and it will remain the case in the
>>>> future? There must be some automagic to enforce/check that. It is simply
>>>> not manageable to do it every now and then because then 3) will simply
>>>> be never safe.
>>>>
>>>> Have you considered coccinele or some other scripted way to do the
>>>> transition? I have no idea how to deal with future changes that would
>>>> break the balance though.

Hi Michal,

Yes, I've thought about it, and coccinelle falls a bit short (it's not smart
enough to know which put_page()'s to convert). However, there is a debug
option planned: a yet-to-be-posted commit [1] uses struct page extensions
(obviously protected by CONFIG_DEBUG_GET_USER_PAGES_REFERENCES) to add
a redundant counter. That allows:

void __put_page(struct page *page)
{
...
/* Someone called put_page() instead of put_user_page() */
WARN_ON_ONCE(atomic_read(&page_ext->pin_count) > 0);

>>>
>>> Yeah, that's why I've been suggesting at LSF/MM that we may need to create
>>> a gup wrapper - say vaddr_pin_pages() - and track which sites dropping
>>> references got converted by using this wrapper instead of gup. The
>>> counterpart would then be more logically named as unpin_page() or whatever
>>> instead of put_user_page(). Sure this is not completely foolproof (you can
>>> create new callsite using vaddr_pin_pages() and then just drop refs using
>>> put_page()) but I suppose it would be a high enough barrier for missed
>>> conversions... Thoughts?

The debug option above is still a bit simplistic in its implementation (and maybe
not taking full advantage of the data it has), but I think it's preferable,
because it monitors the "core" and WARNs.

Instead of the wrapper, I'm thinking: documentation and the passage of time,
plus the debug option (perhaps enhanced--probably once I post it someone will
notice opportunities), yes?

>>
>> I think the API we really need is get_user_bvec() / put_user_bvec(),
>> and I know Christoph has been putting some work into that. That avoids
>> doing refcount operations on hundreds of pages if the page in question is
>> a huge page. Once people are switched over to that, they won't be tempted
>> to manually call put_page() on the individual constituent pages of a bvec.
>
> Well, get_user_bvec() is certainly a good API for one class of users but
> just looking at the above series, you'll see there are *many* places that
> just don't work with bvecs at all and you need something for those.
>

Yes, there are quite a few places that don't involve _bvec, as we can see
right here. So we need something. Andrew asked for a debug option some time
ago, and several people (Dave Hansen, Dan Williams, Jerome) had the idea
of vmap-ing gup pages separately, so you can definitely tell where each
page came from. I'm hoping not to have to go to that level of complexity
though.


[1] "mm/gup: debug tracking of get_user_pages() references" :
https://github.com/johnhubbard/linux/commit/21ff7d6161ec2a14d3f9d17c98abb00cc969d4d6

thanks,
--
John Hubbard
NVIDIA

2019-08-04 02:13:17

by John Hubbard

[permalink] [raw]
Subject: Re: [PATCH 00/34] put_user_pages(): miscellaneous call sites

On 8/2/19 1:05 AM, Peter Zijlstra wrote:
> On Thu, Aug 01, 2019 at 07:16:19PM -0700, [email protected] wrote:
>
>> This is part a tree-wide conversion, as described in commit fc1d8e7cca2d
>> ("mm: introduce put_user_page*(), placeholder versions"). That commit
>> has an extensive description of the problem and the planned steps to
>> solve it, but the highlites are:
>
> That is one horridly mangled Changelog there :-/ It looks like it's
> partially duplicated.

Yeah. It took so long to merge that I think I was no longer able to
actually see the commit description, after N readings. sigh

>
> Anyway; no objections to any of that, but I just wanted to mention that
> there are other problems with long term pinning that haven't been
> mentioned, notably they inhibit compaction.
>
> A long time ago I proposed an interface to mark pages as pinned, such
> that we could run compaction before we actually did the pinning.
>

This is all heading toward marking pages as pinned, so we should finally
get there. I'll post the RFC for tracking pinned pages shortly.


thanks,
--
John Hubbard
NVIDIA

2019-08-07 08:37:56

by Jan Kara

[permalink] [raw]
Subject: Re: [PATCH 00/34] put_user_pages(): miscellaneous call sites

On Fri 02-08-19 12:14:09, John Hubbard wrote:
> On 8/2/19 7:52 AM, Jan Kara wrote:
> > On Fri 02-08-19 07:24:43, Matthew Wilcox wrote:
> > > On Fri, Aug 02, 2019 at 02:41:46PM +0200, Jan Kara wrote:
> > > > On Fri 02-08-19 11:12:44, Michal Hocko wrote:
> > > > > On Thu 01-08-19 19:19:31, [email protected] wrote:
> > > > > [...]
> > > > > > 2) Convert all of the call sites for get_user_pages*(), to
> > > > > > invoke put_user_page*(), instead of put_page(). This involves dozens of
> > > > > > call sites, and will take some time.
> > > > >
> > > > > How do we make sure this is the case and it will remain the case in the
> > > > > future? There must be some automagic to enforce/check that. It is simply
> > > > > not manageable to do it every now and then because then 3) will simply
> > > > > be never safe.
> > > > >
> > > > > Have you considered coccinele or some other scripted way to do the
> > > > > transition? I have no idea how to deal with future changes that would
> > > > > break the balance though.
>
> Hi Michal,
>
> Yes, I've thought about it, and coccinelle falls a bit short (it's not smart
> enough to know which put_page()'s to convert). However, there is a debug
> option planned: a yet-to-be-posted commit [1] uses struct page extensions
> (obviously protected by CONFIG_DEBUG_GET_USER_PAGES_REFERENCES) to add
> a redundant counter. That allows:
>
> void __put_page(struct page *page)
> {
> ...
> /* Someone called put_page() instead of put_user_page() */
> WARN_ON_ONCE(atomic_read(&page_ext->pin_count) > 0);
>
> > > >
> > > > Yeah, that's why I've been suggesting at LSF/MM that we may need to create
> > > > a gup wrapper - say vaddr_pin_pages() - and track which sites dropping
> > > > references got converted by using this wrapper instead of gup. The
> > > > counterpart would then be more logically named as unpin_page() or whatever
> > > > instead of put_user_page(). Sure this is not completely foolproof (you can
> > > > create new callsite using vaddr_pin_pages() and then just drop refs using
> > > > put_page()) but I suppose it would be a high enough barrier for missed
> > > > conversions... Thoughts?
>
> The debug option above is still a bit simplistic in its implementation
> (and maybe not taking full advantage of the data it has), but I think
> it's preferable, because it monitors the "core" and WARNs.
>
> Instead of the wrapper, I'm thinking: documentation and the passage of
> time, plus the debug option (perhaps enhanced--probably once I post it
> someone will notice opportunities), yes?

So I think your debug option and my suggested renaming serve a bit
different purposes (and thus both make sense). If you do the renaming, you
can just grep to see unconverted sites. Also when someone merges new GUP
user (unaware of the new rules) while you switch GUP to use pins instead of
ordinary references, you'll get compilation error in case of renaming
instead of hard to debug refcount leak without the renaming. And such
conflict is almost bound to happen given the size of GUP patch set... Also
the renaming serves against the "coding inertia" - i.e., GUP is around for
ages so people just use it without checking any documentation or comments.
After switching how GUP works, what used to be correct isn't anymore so
renaming the function serves as a warning that something has really
changed.

Your refcount debug patches are good to catch bugs in the conversions done
but that requires you to be able to excercise the code path in the first
place which may require particular HW or so, and you also have to enable
the debug option which means you already aim at verifying the GUP
references are treated properly.

Honza

--
Jan Kara <[email protected]>
SUSE Labs, CR

2019-08-07 08:47:38

by Michal Hocko

[permalink] [raw]
Subject: Re: [PATCH 00/34] put_user_pages(): miscellaneous call sites

On Wed 07-08-19 10:37:26, Jan Kara wrote:
> On Fri 02-08-19 12:14:09, John Hubbard wrote:
> > On 8/2/19 7:52 AM, Jan Kara wrote:
> > > On Fri 02-08-19 07:24:43, Matthew Wilcox wrote:
> > > > On Fri, Aug 02, 2019 at 02:41:46PM +0200, Jan Kara wrote:
> > > > > On Fri 02-08-19 11:12:44, Michal Hocko wrote:
> > > > > > On Thu 01-08-19 19:19:31, [email protected] wrote:
> > > > > > [...]
> > > > > > > 2) Convert all of the call sites for get_user_pages*(), to
> > > > > > > invoke put_user_page*(), instead of put_page(). This involves dozens of
> > > > > > > call sites, and will take some time.
> > > > > >
> > > > > > How do we make sure this is the case and it will remain the case in the
> > > > > > future? There must be some automagic to enforce/check that. It is simply
> > > > > > not manageable to do it every now and then because then 3) will simply
> > > > > > be never safe.
> > > > > >
> > > > > > Have you considered coccinele or some other scripted way to do the
> > > > > > transition? I have no idea how to deal with future changes that would
> > > > > > break the balance though.
> >
> > Hi Michal,
> >
> > Yes, I've thought about it, and coccinelle falls a bit short (it's not smart
> > enough to know which put_page()'s to convert). However, there is a debug
> > option planned: a yet-to-be-posted commit [1] uses struct page extensions
> > (obviously protected by CONFIG_DEBUG_GET_USER_PAGES_REFERENCES) to add
> > a redundant counter. That allows:
> >
> > void __put_page(struct page *page)
> > {
> > ...
> > /* Someone called put_page() instead of put_user_page() */
> > WARN_ON_ONCE(atomic_read(&page_ext->pin_count) > 0);
> >
> > > > >
> > > > > Yeah, that's why I've been suggesting at LSF/MM that we may need to create
> > > > > a gup wrapper - say vaddr_pin_pages() - and track which sites dropping
> > > > > references got converted by using this wrapper instead of gup. The
> > > > > counterpart would then be more logically named as unpin_page() or whatever
> > > > > instead of put_user_page(). Sure this is not completely foolproof (you can
> > > > > create new callsite using vaddr_pin_pages() and then just drop refs using
> > > > > put_page()) but I suppose it would be a high enough barrier for missed
> > > > > conversions... Thoughts?
> >
> > The debug option above is still a bit simplistic in its implementation
> > (and maybe not taking full advantage of the data it has), but I think
> > it's preferable, because it monitors the "core" and WARNs.
> >
> > Instead of the wrapper, I'm thinking: documentation and the passage of
> > time, plus the debug option (perhaps enhanced--probably once I post it
> > someone will notice opportunities), yes?
>
> So I think your debug option and my suggested renaming serve a bit
> different purposes (and thus both make sense). If you do the renaming, you
> can just grep to see unconverted sites. Also when someone merges new GUP
> user (unaware of the new rules) while you switch GUP to use pins instead of
> ordinary references, you'll get compilation error in case of renaming
> instead of hard to debug refcount leak without the renaming. And such
> conflict is almost bound to happen given the size of GUP patch set... Also
> the renaming serves against the "coding inertia" - i.e., GUP is around for
> ages so people just use it without checking any documentation or comments.
> After switching how GUP works, what used to be correct isn't anymore so
> renaming the function serves as a warning that something has really
> changed.

Fully agreed!

> Your refcount debug patches are good to catch bugs in the conversions done
> but that requires you to be able to excercise the code path in the first
> place which may require particular HW or so, and you also have to enable
> the debug option which means you already aim at verifying the GUP
> references are treated properly.
>
> Honza
>
> --
> Jan Kara <[email protected]>
> SUSE Labs, CR

--
Michal Hocko
SUSE Labs