On Sun, Sep 25, 2016 at 03:50:21PM -0700, Linus Torvalds wrote:
> I'd really like to re-open the "drop FOLL_FORCE entirely" discussion,
> because the thing really is disgusting.
>
> I realize that debuggers etc sometimes would want to punch through
> PROT_NONE protections, and I also realize that right now we only have
> a read/write flag, and we have that whole issue with "what if it's
> executable but not readable", which currently FOLL_FORCE makes a
> non-issue.
So I've experimented with this a little locally, removing FOLL_FORCE altogether
and tracking places where it is used (it seems to be a fair few places
actually.)
I've rather naively replaced the FOLL_FORCE check in check_vma_flags() with a
check against 'tsk && tsk->ptrace && tsk->parent == current', I'm not sure how
valid or sane this is, however, but a quick check against gdb proves that it is
able to do its thing in this configuration. Is this a viable path, or is this
way off the mark here?
The places I've found that have invoked gup functions which eventually result in
FOLL_FORCE being set are:
Calls __get_user_pages():
mm/gup.c: populate_vma_page_range()
mm/gup.c: get_dump_page()
calls get_user_pages_unlocked():
drivers/media/pci/ivtv/ivtv-yuv.c: ivtv_yuv_prep_user_dma()
drivers/media/pci/ivtv/ivtv-udma.c: ivtv_udma_setup()
calls get_user_pages_remote():
mm/memory.c: __access_remote_vm() [ see below for callers ]
fs/exec.c: get_arg_page()
kernel/events/uprobes.c: uprobe_write_opcode()
kernel/events/uprobes.c: is_trap_at_addr()
security/tomoyo/domain.c: tomoyo_dump_page()
calls __access_remote_vm():
mm/memory.c: access_remote_vm() [ see below for callers ]
mm/memory.c: access_process_vm()
access_process_vm() is exclusively used for ptrace, omitting its callers here.
calls access_remote_vm():
fs/proc/base.c: proc_pid_cmdline_read()
fs/proc/base.c: memrw()
fs/proc/base.c: environ_read()
calls get_user_pages():
drivers/infiniband/core/umem.c: ib_umem_get()
drivers/infiniband/hw/qib/qib_user_pages.c: __qib_get_user_pages()
drivers/infiniband/hw/usnic/usnic_uiom.c: usnic_uiom_get_pages()
drivers/media/v4l2-core/videobuf-dma-sg.c: videobuf_dma_init_user_locked()
calls get_vaddr_frames():
drivers/media/v4l2-core/videobuf2-memops.c: vb2_create_framevec()
drivers/gpu/drm/exynos/exynos_drm_g2d.c: g2d_userptr_get_dma_addr()
So it seems the general areas where it is used are tracing, uprobes and DMA
initialisation what what I can tell. I'm thinking some extra provision/careful
checking will be needed in each of these cases to see if an alternative is
possible.
I'm happy to explore this some more if that is useful in any way, though of
course I defer to your expertise as to how a world without FOLL_FORCE might
look!
Cheers, Lorenzo
On Fri, Oct 7, 2016 at 3:07 AM, Lorenzo Stoakes <[email protected]> wrote:
>
> So I've experimented with this a little locally, removing FOLL_FORCE altogether
> and tracking places where it is used (it seems to be a fair few places
> actually.)
I'm actually a bit worried that it is used too much simply because
it's so easy to use.
In paticular, there's a class of functions that (for legacy reasons)
get the "int force" and "int write" arguments, and they both tend to
just use a very non-descript 0/1 in the callers. Then at some point
you have code like
if (write)
flags |= FOLL_WRITE;
if (force)
flags |= FOLL_FORCE;
(I'm paraphrasing from memory), and then subsequent calls use that FOLL_FORCE.
And the problem with that it's that it's *really* hard to see where
people actually use FOLL_FORCE, and it's also really easy to use it
without thinking (because it doesn't say "FOLL_FORCE", it just
randomly says "1" in some random argument list).
So the *first* step I'd do is to actually get rid of all the "write"
and "force" arguments, and just pass in "flags" instead, and use
FOLL_FORCE and FOLL_WRITE explicitly in the caller. That way it
suddenly becomes *much* easier to grep for FOLL_FORCE and see who it
is that actually really wants it.
(In fact, I think some functions get *both* "flags" and the separate
write/force argument).
Would you be willing to look at doing that kind of purely syntactic,
non-semantic cleanup first?
> I've rather naively replaced the FOLL_FORCE check in check_vma_flags() with a
> check against 'tsk && tsk->ptrace && tsk->parent == current', I'm not sure how
> valid or sane this is, however, but a quick check against gdb proves that it is
> able to do its thing in this configuration. Is this a viable path, or is this
> way off the mark here?
I think that if we end up having the FOLL_FORCE semantics, we're
actually better off having an explicit FOLL_FORCE flag, and *not* do
some kind of implicit "under these magical circumstances we'll force
it anyway". The implicit thing is what we used to do long long ago, we
definitely don't want to.
So I'd rather see all the callers that actually use FOLL_FORCE, and
then we can start looking at whether it's really a requirement. Some
of them may not strictly *need* it in actual use, they just have it
because of historical reasons (ie exactly those implicit users that
just got mindlessly converted to maintain same semantics).
Linus
On Fri, Oct 07, 2016 at 08:34:15AM -0700, Linus Torvalds wrote:
> Would you be willing to look at doing that kind of purely syntactic,
> non-semantic cleanup first?
Sure, more than happy to do that! I'll work on a patch for this.
> I think that if we end up having the FOLL_FORCE semantics, we're
> actually better off having an explicit FOLL_FORCE flag, and *not* do
> some kind of implicit "under these magical circumstances we'll force
> it anyway". The implicit thing is what we used to do long long ago, we
> definitely don't want to.
That's a good point, it would definitely be considerably more 'magical', and
expanding the conditions to include uprobes etc. would only add to that.
I wondered about an alternative parameter/flag but it felt like it was
more-or-less FOLL_FORCE in a different form, at which point it may as well
remain FOLL_FORCE :)
On Fri, 7 Oct 2016, Lorenzo Stoakes wrote:
> On Fri, Oct 07, 2016 at 08:34:15AM -0700, Linus Torvalds wrote:
> > Would you be willing to look at doing that kind of purely syntactic,
> > non-semantic cleanup first?
>
> Sure, more than happy to do that! I'll work on a patch for this.
>
> > I think that if we end up having the FOLL_FORCE semantics, we're
> > actually better off having an explicit FOLL_FORCE flag, and *not* do
> > some kind of implicit "under these magical circumstances we'll force
> > it anyway". The implicit thing is what we used to do long long ago, we
> > definitely don't want to.
>
> That's a good point, it would definitely be considerably more 'magical', and
> expanding the conditions to include uprobes etc. would only add to that.
>
> I wondered about an alternative parameter/flag but it felt like it was
> more-or-less FOLL_FORCE in a different form, at which point it may as well
> remain FOLL_FORCE :)
Adding Jan Kara (and Dave Hansen) to the Cc list: I think they were
pursuing get_user_pages() cleanups last year (which would remove the
force option from most callers anyway), and I've lost track of where
that all got to. Lorenzo, please don't expend a lot of effort before
checking with Jan.
Hugh
On Fri, Oct 07, 2016 at 11:16:26AM -0700, Hugh Dickins wrote:
>
> Adding Jan Kara (and Dave Hansen) to the Cc list: I think they were
> pursuing get_user_pages() cleanups last year (which would remove the
> force option from most callers anyway), and I've lost track of where
> that all got to. Lorenzo, please don't expend a lot of effort before
> checking with Jan.
Sure, no problem. I have the callers noted down + the surrounding code in my
wetware 'L3 cache' so it's not a huge effort to get a draft patch written, but
I'll hold off on sending anything until I hear from Jan.
On Mon, Oct 10, 2016 at 09:47:12AM +0200, Jan Kara wrote:
> Yeah, so my cleanups where mostly concerned about mmap_sem locking and
> reducing number of places which cared about those. Regarding flags for
> get_user_pages() / get_vaddr_frames(), I agree that using flags argument
> as Linus suggests will make it easier to see what the callers actually
> want. So I'm for that.
Great, thanks Jan! I have a draft patch that needs a little tweaking/further
testing but isn't too far off.
One thing I am wondering about is whether functions that have write/force
parameters replaced with gup_flags should mask against (FOLL_WRITE | FOLL_FORCE)
to prevent callers from doing unexpected things with other FOLL_* flags?
I'm inclined _not_ to because it adds a rather non-obvious restriction on this
parameter, reduces clarity about which flags are actually being used (which is
the point of the patch in the first place), and the caller ought to know what
they are doing.
I'd be curious to hear people's thoughts on this.
On Fri 07-10-16 11:16:26, Hugh Dickins wrote:
> On Fri, 7 Oct 2016, Lorenzo Stoakes wrote:
> > On Fri, Oct 07, 2016 at 08:34:15AM -0700, Linus Torvalds wrote:
> > > Would you be willing to look at doing that kind of purely syntactic,
> > > non-semantic cleanup first?
> >
> > Sure, more than happy to do that! I'll work on a patch for this.
> >
> > > I think that if we end up having the FOLL_FORCE semantics, we're
> > > actually better off having an explicit FOLL_FORCE flag, and *not* do
> > > some kind of implicit "under these magical circumstances we'll force
> > > it anyway". The implicit thing is what we used to do long long ago, we
> > > definitely don't want to.
> >
> > That's a good point, it would definitely be considerably more 'magical', and
> > expanding the conditions to include uprobes etc. would only add to that.
> >
> > I wondered about an alternative parameter/flag but it felt like it was
> > more-or-less FOLL_FORCE in a different form, at which point it may as well
> > remain FOLL_FORCE :)
>
> Adding Jan Kara (and Dave Hansen) to the Cc list: I think they were
> pursuing get_user_pages() cleanups last year (which would remove the
> force option from most callers anyway), and I've lost track of where
> that all got to. Lorenzo, please don't expend a lot of effort before
> checking with Jan.
Yeah, so my cleanups where mostly concerned about mmap_sem locking and
reducing number of places which cared about those. Regarding flags for
get_user_pages() / get_vaddr_frames(), I agree that using flags argument
as Linus suggests will make it easier to see what the callers actually
want. So I'm for that.
Regarding the FOLL_FORCE I've had a discussion about its use in Infiniband
drivers in 2013 with Roland Dreier. He referenced me to discussion
https://lkml.org/lkml/2012/1/26/7 and summarized that they use FOLL_FORCE
to trigger possible COW early. That avoids the situation when the process
triggers COW of the pages later after DMA buffers are set up which results
in the DMA result to not be visible where the process expects it... I'll
defer to others whether that is a sane or intended use of FOLL_FORCE flag
but I suppose that is the reason why most drivers use it when setting up
DMA buffers in memory passed from userspace.
Honza
--
Jan Kara <[email protected]>
SUSE Labs, CR
On Mon 10-10-16 09:28:28, Lorenzo Stoakes wrote:
> On Mon, Oct 10, 2016 at 09:47:12AM +0200, Jan Kara wrote:
> > Yeah, so my cleanups where mostly concerned about mmap_sem locking and
> > reducing number of places which cared about those. Regarding flags for
> > get_user_pages() / get_vaddr_frames(), I agree that using flags argument
> > as Linus suggests will make it easier to see what the callers actually
> > want. So I'm for that.
>
> Great, thanks Jan! I have a draft patch that needs a little tweaking/further
> testing but isn't too far off.
>
> One thing I am wondering about is whether functions that have write/force
> parameters replaced with gup_flags should mask against (FOLL_WRITE | FOLL_FORCE)
> to prevent callers from doing unexpected things with other FOLL_* flags?
>
> I'm inclined _not_ to because it adds a rather non-obvious restriction on this
> parameter, reduces clarity about which flags are actually being used (which is
> the point of the patch in the first place), and the caller ought to know what
> they are doing.
Yeah, just leave flags as is. There is no strong reason to restrict them.
Honza
--
Jan Kara <[email protected]>
SUSE Labs, CR