2014-10-23 23:25:51

by Miklos Szeredi

[permalink] [raw]
Subject: [GIT PULL] overlay filesystem v25

Linus,

Please pull

git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs.git overlayfs.v25

Plenty of bugs fixed relative to the previous version, thanks to Al Viro's
review and is now officially be BugFree(TM). Also passes the
unionmount-testsuite by David Howells.

Thanks,
Miklos

---
Andy Whitcroft (1):
overlayfs: add statfs support

Erez Zadok (1):
overlayfs: implement show_options

Miklos Szeredi (11):
vfs: add i_op->dentry_open()
vfs: export do_splice_direct() to modules
vfs: export __inode_permission() to modules
vfs: introduce clone_private_mount()
vfs: export check_sticky()
vfs: add whiteout support
vfs: add RENAME_WHITEOUT
ext4: support RENAME_WHITEOUT
shmem: support RENAME_WHITEOUT
overlay filesystem
fs: limit filesystem stacking depth

Neil Brown (1):
overlay: overlay filesystem documentation

---
Documentation/filesystems/Locking | 2 +
Documentation/filesystems/overlayfs.txt | 198 +++++++
Documentation/filesystems/vfs.txt | 7 +
MAINTAINERS | 7 +
fs/Kconfig | 1 +
fs/Makefile | 1 +
fs/btrfs/ioctl.c | 20 +-
fs/ecryptfs/main.c | 7 +
fs/ext4/namei.c | 95 +++-
fs/internal.h | 7 -
fs/namei.c | 41 +-
fs/namespace.c | 27 +
fs/open.c | 23 +-
fs/overlayfs/Kconfig | 10 +
fs/overlayfs/Makefile | 7 +
fs/overlayfs/copy_up.c | 414 ++++++++++++++
fs/overlayfs/dir.c | 921 ++++++++++++++++++++++++++++++++
fs/overlayfs/inode.c | 425 +++++++++++++++
fs/overlayfs/overlayfs.h | 191 +++++++
fs/overlayfs/readdir.c | 587 ++++++++++++++++++++
fs/overlayfs/super.c | 796 +++++++++++++++++++++++++++
fs/splice.c | 1 +
include/linux/fs.h | 39 ++
include/linux/mount.h | 3 +
include/uapi/linux/fs.h | 1 +
mm/shmem.c | 36 +-
26 files changed, 3809 insertions(+), 58 deletions(-)
create mode 100644 Documentation/filesystems/overlayfs.txt
create mode 100644 fs/overlayfs/Kconfig
create mode 100644 fs/overlayfs/Makefile
create mode 100644 fs/overlayfs/copy_up.c
create mode 100644 fs/overlayfs/dir.c
create mode 100644 fs/overlayfs/inode.c
create mode 100644 fs/overlayfs/overlayfs.h
create mode 100644 fs/overlayfs/readdir.c
create mode 100644 fs/overlayfs/super.c


2014-10-24 02:21:00

by Al Viro

[permalink] [raw]
Subject: Re: [GIT PULL] overlay filesystem v25

On Fri, Oct 24, 2014 at 01:25:39AM +0200, Miklos Szeredi wrote:
> Linus,
>
> Please pull
>
> git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs.git overlayfs.v25
>
> Plenty of bugs fixed relative to the previous version, thanks to Al Viro's
> review and is now officially be BugFree(TM). Also passes the
> unionmount-testsuite by David Howells.

*blink*

Why the hell do you hold ->i_mutex across the entire opening of underlying
directory? All you need is to serialize one assignment; the side that loses
the race will simply fput() what it opened...

Oh, well - that goes under "weird pessimisations, easy to fix in followups"...

2014-10-24 03:24:27

by Al Viro

[permalink] [raw]
Subject: Re: [GIT PULL] overlay filesystem v25

On Fri, Oct 24, 2014 at 03:20:55AM +0100, Al Viro wrote:
> Why the hell do you hold ->i_mutex across the entire opening of underlying
> directory? All you need is to serialize one assignment; the side that loses
> the race will simply fput() what it opened...
>
> Oh, well - that goes under "weird pessimisations, easy to fix in followups"...

OK, pulled into vfs.git, followups in question added. Also there: fix for
a long-standing leak in d_splice_alias() failure exits. Guys, could you
check that current vfs.git#for-linus survives your local tests? Seems to
survive here; if I don't hear of any problems by tomorrow morning, to Linus
it goes... FWIW, for that pull request stats would be

Shortlog:
Al Viro (5):
fix inode leaks on d_splice_alias() failure exits
overlayfs: don't hold ->i_mutex over opening the real directory
overlayfs: make ovl_cache_entry->name an array instead of pointer
overlayfs: embed root into overlay_readdir_data
overlayfs: embed middle into overlay_readdir_data

Andy Whitcroft (1):
overlayfs: add statfs support

Erez Zadok (1):
overlayfs: implement show_options

Miklos Szeredi (11):
vfs: add i_op->dentry_open()
vfs: export do_splice_direct() to modules
vfs: export __inode_permission() to modules
vfs: introduce clone_private_mount()
vfs: export check_sticky()
vfs: add whiteout support
vfs: add RENAME_WHITEOUT
ext4: support RENAME_WHITEOUT
shmem: support RENAME_WHITEOUT
overlay filesystem
fs: limit filesystem stacking depth

Neil Brown (1):
overlay: overlay filesystem documentation

Diffstat:
Documentation/filesystems/Locking | 2 +
Documentation/filesystems/overlayfs.txt | 198 +++++++
Documentation/filesystems/vfs.txt | 7 +
MAINTAINERS | 7 +
fs/Kconfig | 1 +
fs/Makefile | 1 +
fs/btrfs/ioctl.c | 20 +-
fs/dcache.c | 2 +
fs/ecryptfs/main.c | 7 +
fs/ext4/namei.c | 95 +++-
fs/internal.h | 7 -
fs/namei.c | 41 +-
fs/namespace.c | 27 +
fs/open.c | 23 +-
fs/overlayfs/Kconfig | 10 +
fs/overlayfs/Makefile | 7 +
fs/overlayfs/copy_up.c | 414 ++++++++++++++
fs/overlayfs/dir.c | 921 +++++++++++++++++++++++++++++++
fs/overlayfs/inode.c | 425 ++++++++++++++
fs/overlayfs/overlayfs.h | 191 +++++++
fs/overlayfs/readdir.c | 589 ++++++++++++++++++++
fs/overlayfs/super.c | 796 ++++++++++++++++++++++++++
fs/splice.c | 1 +
include/linux/fs.h | 39 ++
include/linux/mount.h | 3 +
include/uapi/linux/fs.h | 1 +
mm/shmem.c | 36 +-
27 files changed, 3813 insertions(+), 58 deletions(-)
create mode 100644 Documentation/filesystems/overlayfs.txt
create mode 100644 fs/overlayfs/Kconfig
create mode 100644 fs/overlayfs/Makefile
create mode 100644 fs/overlayfs/copy_up.c
create mode 100644 fs/overlayfs/dir.c
create mode 100644 fs/overlayfs/inode.c
create mode 100644 fs/overlayfs/overlayfs.h
create mode 100644 fs/overlayfs/readdir.c
create mode 100644 fs/overlayfs/super.c

2014-10-24 07:24:48

by Miklos Szeredi

[permalink] [raw]
Subject: Re: [GIT PULL] overlay filesystem v25

On Fri, Oct 24, 2014 at 5:24 AM, Al Viro <[email protected]> wrote:
> On Fri, Oct 24, 2014 at 03:20:55AM +0100, Al Viro wrote:
>> Why the hell do you hold ->i_mutex across the entire opening of underlying
>> directory? All you need is to serialize one assignment; the side that loses
>> the race will simply fput() what it opened...

The reason I didn't do your "fix" is that it

- adds more lines than it takes,

- I wasn't sure at all if the lockless access is actually correct
without the ACCESS_ONCE and all the memory barrier magic that might be
necessary on weird architectures.

The rest of the changes look OK.

Thanks,
Miklos

2014-10-24 07:28:24

by Geert Uytterhoeven

[permalink] [raw]
Subject: Re: [GIT PULL] overlay filesystem v25

Hi Al,

On Fri, Oct 24, 2014 at 5:24 AM, Al Viro <[email protected]> wrote:
> On Fri, Oct 24, 2014 at 03:20:55AM +0100, Al Viro wrote:
>> Why the hell do you hold ->i_mutex across the entire opening of underlying
>> directory? All you need is to serialize one assignment; the side that loses
>> the race will simply fput() what it opened...
>>
>> Oh, well - that goes under "weird pessimisations, easy to fix in followups"...
>
> OK, pulled into vfs.git, followups in question added. Also there: fix for
> a long-standing leak in d_splice_alias() failure exits. Guys, could you
> check that current vfs.git#for-linus survives your local tests? Seems to
> survive here; if I don't hear of any problems by tomorrow morning, to Linus
> it goes... FWIW, for that pull request stats would be

I think you forgot to merge your for-linus branch into for-next?

Anyway, as Stephen announced there will be no linux-next release
today, I'm afraid you'll have to delay your pull request to Tuesday.

Gr{oetje,eeting}s,

Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- [email protected]

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds

2014-10-25 19:17:31

by Miklos Szeredi

[permalink] [raw]
Subject: Re: [GIT PULL] overlay filesystem v25

On Sat, Oct 25, 2014 at 10:18 AM, Al Viro <[email protected]> wrote:
> On Fri, Oct 24, 2014 at 09:24:45AM +0200, Miklos Szeredi wrote:
>
>> The reason I didn't do your "fix" is that it
>>
>> - adds more lines than it takes,
>>
>> - I wasn't sure at all if the lockless access is actually correct
>> without the ACCESS_ONCE and all the memory barrier magic that might be
>> necessary on weird architectures.
>
> _What_ lockless accesses? There is an extremely embarrassing bug in that
> commit, all right, but it has nothing to do with barriers... All
> barrier-related issues are taken care of by ovl_path_upper() (and without
> that you'd have tons of worse problems). Fetching ->upperfile outside of
> ->i_mutex is fine - in the worst case we'll fetch NULL, open the sucker
> grab ->i_mutex and find out that it has already been taken care of.
> In which case we fput() what we'd opened and move on (fput() under
> ->i_mutex is fine - it's going to be delayed until return from syscall
> anyway).

Yes, but it's not about race with copy-up (which the ovl_path_upper()
protects against), but race of two fsync calls with each other. If
there's no synchronization between them, then that od->upperfile does
indeed count as lockless access, no matter that the assignment was
done under lock.

Thanks,
Miklos

2014-10-25 19:51:21

by Al Viro

[permalink] [raw]
Subject: Re: [GIT PULL] overlay filesystem v25

On Sat, Oct 25, 2014 at 11:53:52AM +0200, Miklos Szeredi wrote:

> Yes, but it's not about race with copy-up (which the ovl_path_upper()
> protects against), but race of two fsync calls with each other. If
> there's no synchronization between them, then that od->upperfile does
> indeed count as lockless access, no matter that the assignment was
> done under lock.

p = global;
if (!p) { // outside of lock
p = alloc();
grab lock
if (!global) {
global = p;
} else {
destroy(p);
p = global;
}
drop lock
}
is a very common pattern, especially if you look for cases when lock is
a spinlock and allocation is blocking (in those cases you'll often see
destroy() part done after dropping the lock; that's where what I fucked up in
what I'd originally pushed. And it wasn't even needed - fput() under
->i_mutex is OK...)

2014-10-25 19:51:23

by Al Viro

[permalink] [raw]
Subject: Re: [GIT PULL] overlay filesystem v25

On Fri, Oct 24, 2014 at 09:24:45AM +0200, Miklos Szeredi wrote:

> The reason I didn't do your "fix" is that it
>
> - adds more lines than it takes,
>
> - I wasn't sure at all if the lockless access is actually correct
> without the ACCESS_ONCE and all the memory barrier magic that might be
> necessary on weird architectures.

_What_ lockless accesses? There is an extremely embarrassing bug in that
commit, all right, but it has nothing to do with barriers... All
barrier-related issues are taken care of by ovl_path_upper() (and without
that you'd have tons of worse problems). Fetching ->upperfile outside of
->i_mutex is fine - in the worst case we'll fetch NULL, open the sucker
grab ->i_mutex and find out that it has already been taken care of.
In which case we fput() what we'd opened and move on (fput() under
->i_mutex is fine - it's going to be delayed until return from syscall
anyway).

There was a very dumb braino in there; fixed, force-pushed, passes unionmount
tests, no regressions on LTP syscall ones and xfstests.

2014-10-27 08:06:58

by Miklos Szeredi

[permalink] [raw]
Subject: Re: [GIT PULL] overlay filesystem v25

[Paul McKenney added to CC]

On Sat, Oct 25, 2014 at 7:06 PM, Al Viro <[email protected]> wrote:
> On Sat, Oct 25, 2014 at 11:53:52AM +0200, Miklos Szeredi wrote:
>
>> Yes, but it's not about race with copy-up (which the ovl_path_upper()
>> protects against), but race of two fsync calls with each other. If
>> there's no synchronization between them, then that od->upperfile does
>> indeed count as lockless access, no matter that the assignment was
>> done under lock.
>
> p = global;
> if (!p) { // outside of lock
> p = alloc();
> grab lock
> if (!global) {
> global = p;
> } else {
> destroy(p);
> p = global;
> }
> drop lock
> }
> is a very common pattern, especially if you look for cases when lock is
> a spinlock and allocation is blocking (in those cases you'll often see
> destroy() part done after dropping the lock; that's where what I fucked up in
> what I'd originally pushed. And it wasn't even needed - fput() under
> ->i_mutex is OK...)

Being a very common pattern does not automatically make it correct...

My understanding of these issues is very limited, but it's not clear
to me what will order initialization of members of p with the storing
of p into global. E.g. we start out with global == NULL and p->foo ==
0.

CPU1:
p->foo = 1
grab lock
if (!global)
global = p

CPU1:
p = global
if (p)
q = p->foo

Is it guaranteed that the above sequence (as is, without any barriers
or ACCESS_ONCE() other than the lock acquisition) will result in q ==
1 if p != NULL?

Thanks,
Miklos

2014-10-27 16:02:56

by Paul E. McKenney

[permalink] [raw]
Subject: Re: [GIT PULL] overlay filesystem v25

On Mon, Oct 27, 2014 at 09:06:54AM +0100, Miklos Szeredi wrote:
> [Paul McKenney added to CC]
>
> On Sat, Oct 25, 2014 at 7:06 PM, Al Viro <[email protected]> wrote:
> > On Sat, Oct 25, 2014 at 11:53:52AM +0200, Miklos Szeredi wrote:
> >
> >> Yes, but it's not about race with copy-up (which the ovl_path_upper()
> >> protects against), but race of two fsync calls with each other. If
> >> there's no synchronization between them, then that od->upperfile does
> >> indeed count as lockless access, no matter that the assignment was
> >> done under lock.
> >
> > p = global;
> > if (!p) { // outside of lock
> > p = alloc();
> > grab lock
> > if (!global) {
> > global = p;
> > } else {
> > destroy(p);
> > p = global;
> > }
> > drop lock
> > }
> > is a very common pattern, especially if you look for cases when lock is
> > a spinlock and allocation is blocking (in those cases you'll often see
> > destroy() part done after dropping the lock; that's where what I fucked up in
> > what I'd originally pushed. And it wasn't even needed - fput() under
> > ->i_mutex is OK...)
>
> Being a very common pattern does not automatically make it correct...
>
> My understanding of these issues is very limited, but it's not clear
> to me what will order initialization of members of p with the storing
> of p into global. E.g. we start out with global == NULL and p->foo ==
> 0.
>
> CPU1:
> p->foo = 1
> grab lock
> if (!global)
> global = p
>
> CPU1:

If it is all the same to you, I will call this CPU2 to distinguish it from
the first CPU1. ;-)

> p = global
> if (p)
> q = p->foo
>
> Is it guaranteed that the above sequence (as is, without any barriers
> or ACCESS_ONCE() other than the lock acquisition) will result in q ==
> 1 if p != NULL?

Indeed, life is hard here. Keep in mind that lock acquisition is not
guaranteed to prevent prior operations from being reordered into the
critical section, possibly as follows:

CPU1:
grab lock
if (!global)
global = p;
/* Assume all of CPU2's accesses happen here. */
p->foo = 1;

This clearly allows CPU2 to execute as follows:

CPU2:
p = global; /* gets p */
if (p) /* non-NULL */
q = p->foo; /* might not be 1 */

Not only that, on DEC Alpha, even if CPU1's accesses are ordered, CPU2's
accesses can be misordered. You need rcu_dereference() or the combination
of ACCESS_ONCE() and smp_read_barrier_depends() to avoid this issue.
As always, see http://www.openvms.compaq.com/wizard/wiz_2637.html for
more info.

So no, there is no guarantee. I am assuming that the lock grabbed by
CPU1 guards all assignments to "global", otherwise the code needs further
help. I am further assuming that the memory pointed to by CPU1's "p"
is inaccessible to any other CPU, as in CPU1 just allocated the memory.
Otherwise, the assignment "p->foo = 1" is questionable. And finally,
I am assuming that p->foo stays constant once it has been made
accessible to readers.

But the following should work:

CPU1:
p->foo = 1; /* Assumes p is local. */
smp_mb__before_spinlock();
grab lock
if (!global) /* Assumes lock protects all assignments to global. */
global = p;

CPU2:
p = rcu_dereference(global);
if (p)
q = p->foo; /* Assumes p->foo constant once visible to readers. */
/* Also assumes p and q are local. */

If the assumptions called out in the comments do not hold, you at least
need ACCESS_ONCE(), and perhaps even more synchronization. For more info
on ACCESS_ONCE(), Jon's LWN article is at http://lwn.net/Articles/508991/.

Thanx, Paul

2014-10-27 17:28:09

by Al Viro

[permalink] [raw]
Subject: Re: [GIT PULL] overlay filesystem v25

On Mon, Oct 27, 2014 at 08:59:01AM -0700, Paul E. McKenney wrote:

> Indeed, life is hard here. Keep in mind that lock acquisition is not
> guaranteed to prevent prior operations from being reordered into the
> critical section, possibly as follows:
>
> CPU1:
> grab lock
> if (!global)
> global = p;
> /* Assume all of CPU2's accesses happen here. */
> p->foo = 1;

A bit of context: p is a local pointer to struct file here and alloc is
opening it...

> This clearly allows CPU2 to execute as follows:
>
> CPU2:
> p = global; /* gets p */
> if (p) /* non-NULL */
> q = p->foo; /* might not be 1 */
>
> Not only that, on DEC Alpha, even if CPU1's accesses are ordered, CPU2's
> accesses can be misordered. You need rcu_dereference() or the combination
> of ACCESS_ONCE() and smp_read_barrier_depends() to avoid this issue.
> As always, see http://www.openvms.compaq.com/wizard/wiz_2637.html for
> more info.
>
> So no, there is no guarantee. I am assuming that the lock grabbed by
> CPU1 guards all assignments to "global", otherwise the code needs further
> help. I am further assuming that the memory pointed to by CPU1's "p"
> is inaccessible to any other CPU, as in CPU1 just allocated the memory.
> Otherwise, the assignment "p->foo = 1" is questionable. And finally,
> I am assuming that p->foo stays constant once it has been made
> accessible to readers.
>
> But the following should work:
>
> CPU1:
> p->foo = 1; /* Assumes p is local. */
> smp_mb__before_spinlock();
> grab lock
> if (!global) /* Assumes lock protects all assignments to global. */
> global = p;
>
> CPU2:
> p = rcu_dereference(global);
> if (p)
> q = p->foo; /* Assumes p->foo constant once visible to readers. */
> /* Also assumes p and q are local. */
>
> If the assumptions called out in the comments do not hold, you at least
> need ACCESS_ONCE(), and perhaps even more synchronization. For more info
> on ACCESS_ONCE(), Jon's LWN article is at http://lwn.net/Articles/508991/.

First of all, this "p->foo = 1" is a shorthand for initialization of
struct file done by opening a file. What you are saying is that it
can leak past the point where we stick a pointer to freshly opened
file into a place where another CPU can see it, but not past the
barrier in dropping the lock, right?

And you are suggesting rcu_dereference() as a way to bring the required
barriers in. Ouch... The names are really bad, but there's another
fun issue - rcu_dereference brings in sparse noise. Wouldn't direct use
of smp_read_barrier_depends() be cleaner, anyway?

2014-10-27 17:40:15

by Paul E. McKenney

[permalink] [raw]
Subject: Re: [GIT PULL] overlay filesystem v25

On Mon, Oct 27, 2014 at 05:28:01PM +0000, Al Viro wrote:
> On Mon, Oct 27, 2014 at 08:59:01AM -0700, Paul E. McKenney wrote:
>
> > Indeed, life is hard here. Keep in mind that lock acquisition is not
> > guaranteed to prevent prior operations from being reordered into the
> > critical section, possibly as follows:
> >
> > CPU1:
> > grab lock
> > if (!global)
> > global = p;
> > /* Assume all of CPU2's accesses happen here. */
> > p->foo = 1;
>
> A bit of context: p is a local pointer to struct file here and alloc is
> opening it...

OK, good to know. ;-)

> > This clearly allows CPU2 to execute as follows:
> >
> > CPU2:
> > p = global; /* gets p */
> > if (p) /* non-NULL */
> > q = p->foo; /* might not be 1 */
> >
> > Not only that, on DEC Alpha, even if CPU1's accesses are ordered, CPU2's
> > accesses can be misordered. You need rcu_dereference() or the combination
> > of ACCESS_ONCE() and smp_read_barrier_depends() to avoid this issue.
> > As always, see http://www.openvms.compaq.com/wizard/wiz_2637.html for
> > more info.
> >
> > So no, there is no guarantee. I am assuming that the lock grabbed by
> > CPU1 guards all assignments to "global", otherwise the code needs further
> > help. I am further assuming that the memory pointed to by CPU1's "p"
> > is inaccessible to any other CPU, as in CPU1 just allocated the memory.
> > Otherwise, the assignment "p->foo = 1" is questionable. And finally,
> > I am assuming that p->foo stays constant once it has been made
> > accessible to readers.
> >
> > But the following should work:
> >
> > CPU1:
> > p->foo = 1; /* Assumes p is local. */
> > smp_mb__before_spinlock();
> > grab lock
> > if (!global) /* Assumes lock protects all assignments to global. */
> > global = p;
> >
> > CPU2:
> > p = rcu_dereference(global);
> > if (p)
> > q = p->foo; /* Assumes p->foo constant once visible to readers. */
> > /* Also assumes p and q are local. */
> >
> > If the assumptions called out in the comments do not hold, you at least
> > need ACCESS_ONCE(), and perhaps even more synchronization. For more info
> > on ACCESS_ONCE(), Jon's LWN article is at http://lwn.net/Articles/508991/.
>
> First of all, this "p->foo = 1" is a shorthand for initialization of
> struct file done by opening a file. What you are saying is that it
> can leak past the point where we stick a pointer to freshly opened
> file into a place where another CPU can see it, but not past the
> barrier in dropping the lock, right?

Exactly!

I should also add that smp_mb__before_spinlock() implies smp_wmb(), but
nothing more. But that is OK because smp_wmb() suffices in this case.

> And you are suggesting rcu_dereference() as a way to bring the required
> barriers in. Ouch... The names are really bad, but there's another
> fun issue - rcu_dereference brings in sparse noise. Wouldn't direct use
> of smp_read_barrier_depends() be cleaner, anyway?

Code making direct use of smp_read_barrier_depends() is harder to read,
in my experience, but good point on the sparse noise. Maybe a new
lockless_dereference() primitive? Maybe something like the following?
(Untested, probably does not even build.)

#define lockless_dereference(p) \
({ \
typeof(*p) *_________p1 = ACCESS_ONCE(p); \
smp_read_barrier_depends(); /* Dependency order vs. p above. */ \
_________p1; \
})

Thanx, Paul

2014-10-28 01:12:19

by Al Viro

[permalink] [raw]
Subject: Re: [GIT PULL] overlay filesystem v25

On Mon, Oct 27, 2014 at 10:36:21AM -0700, Paul E. McKenney wrote:
> Code making direct use of smp_read_barrier_depends() is harder to read,
> in my experience, but good point on the sparse noise. Maybe a new
> lockless_dereference() primitive? Maybe something like the following?
> (Untested, probably does not even build.)
>
> #define lockless_dereference(p) \
> ({ \
> typeof(*p) *_________p1 = ACCESS_ONCE(p); \
> smp_read_barrier_depends(); /* Dependency order vs. p above. */ \
> _________p1; \
> })

Hmm... Where would you prefer to put it? rcupdate.h?

2014-10-28 04:27:02

by Paul E. McKenney

[permalink] [raw]
Subject: Re: [GIT PULL] overlay filesystem v25

On Tue, Oct 28, 2014 at 01:12:14AM +0000, Al Viro wrote:
> On Mon, Oct 27, 2014 at 10:36:21AM -0700, Paul E. McKenney wrote:
> > Code making direct use of smp_read_barrier_depends() is harder to read,
> > in my experience, but good point on the sparse noise. Maybe a new
> > lockless_dereference() primitive? Maybe something like the following?
> > (Untested, probably does not even build.)
> >
> > #define lockless_dereference(p) \
> > ({ \
> > typeof(*p) *_________p1 = ACCESS_ONCE(p); \
> > smp_read_barrier_depends(); /* Dependency order vs. p above. */ \
> > _________p1; \
> > })
>
> Hmm... Where would you prefer to put it? rcupdate.h?

Good a place as any, I guess. Please see patch below. Left to myself,
I would send this along for the next merge window, but please let me
know if you would like it sooner.

Thanx, Paul

------------------------------------------------------------------------

rcu: Provide counterpart to rcu_dereference() for non-RCU situations

Although rcu_dereference() and friends can be used in situations where
object lifetimes are being managed by something other than RCU, the
resulting sparse and lockdep-RCU noise can be annoying. This commit
therefore supplies a lockless_dereference(), which provides the
protection for dereferences without the RCU-related debugging noise.

Reported-by: Al Viro <[email protected]>
Signed-off-by: Paul E. McKenney <[email protected]>

diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
index 9575c2d403b5..ed4f5939a452 100644
--- a/include/linux/rcupdate.h
+++ b/include/linux/rcupdate.h
@@ -617,6 +617,21 @@ static inline void rcu_preempt_sleep_check(void)
#define RCU_INITIALIZER(v) (typeof(*(v)) __force __rcu *)(v)

/**
+ * lockless_dereference() - safely load a pointer for later dereference
+ * @p: The pointer to load
+ *
+ * Similar to rcu_dereference(), but for situations where the pointed-to
+ * object's lifetime is managed by something other than RCU. That
+ * "something other" might be reference counting or simple immortality.
+ */
+#define lockless_dereference(p) \
+({ \
+ typeof(p) _________p1 = ACCESS_ONCE(p); \
+ smp_read_barrier_depends(); /* Dependency order vs. p above. */ \
+ (_________p1); \
+})
+
+/**
* rcu_assign_pointer() - assign to RCU-protected pointer
* @p: pointer to assign to
* @v: value to assign (publish)

2014-10-28 22:55:19

by Al Viro

[permalink] [raw]
Subject: Re: [GIT PULL] overlay filesystem v25

On Mon, Oct 27, 2014 at 09:11:27PM -0700, Paul E. McKenney wrote:
> On Tue, Oct 28, 2014 at 01:12:14AM +0000, Al Viro wrote:
> > On Mon, Oct 27, 2014 at 10:36:21AM -0700, Paul E. McKenney wrote:
> > > Code making direct use of smp_read_barrier_depends() is harder to read,
> > > in my experience, but good point on the sparse noise. Maybe a new
> > > lockless_dereference() primitive? Maybe something like the following?
> > > (Untested, probably does not even build.)
> > >
> > > #define lockless_dereference(p) \
> > > ({ \
> > > typeof(*p) *_________p1 = ACCESS_ONCE(p); \
> > > smp_read_barrier_depends(); /* Dependency order vs. p above. */ \
> > > _________p1; \
> > > })
> >
> > Hmm... Where would you prefer to put it? rcupdate.h?
>
> Good a place as any, I guess. Please see patch below. Left to myself,
> I would send this along for the next merge window, but please let me
> know if you would like it sooner.

It's needed sooner, unfortunately. Guys, could you take a look at
vfs.git#for-linus and comment?

2014-10-28 23:29:52

by Paul E. McKenney

[permalink] [raw]
Subject: Re: [GIT PULL] overlay filesystem v25

On Tue, Oct 28, 2014 at 10:55:13PM +0000, Al Viro wrote:
> On Mon, Oct 27, 2014 at 09:11:27PM -0700, Paul E. McKenney wrote:
> > On Tue, Oct 28, 2014 at 01:12:14AM +0000, Al Viro wrote:
> > > On Mon, Oct 27, 2014 at 10:36:21AM -0700, Paul E. McKenney wrote:
> > > > Code making direct use of smp_read_barrier_depends() is harder to read,
> > > > in my experience, but good point on the sparse noise. Maybe a new
> > > > lockless_dereference() primitive? Maybe something like the following?
> > > > (Untested, probably does not even build.)
> > > >
> > > > #define lockless_dereference(p) \
> > > > ({ \
> > > > typeof(*p) *_________p1 = ACCESS_ONCE(p); \
> > > > smp_read_barrier_depends(); /* Dependency order vs. p above. */ \
> > > > _________p1; \
> > > > })
> > >
> > > Hmm... Where would you prefer to put it? rcupdate.h?
> >
> > Good a place as any, I guess. Please see patch below. Left to myself,
> > I would send this along for the next merge window, but please let me
> > know if you would like it sooner.
>
> It's needed sooner, unfortunately. Guys, could you take a look at
> vfs.git#for-linus and comment?

The version in your tree looks good. I will drop my commit in favor
of yours.

The use of lockless_dereference() and smp_mb__before_spinlock() in
d45f00ae43 (overlayfs: barriers for opening upper-layer directory)
looks fine to me. Only nit is lack of space between "=" and
lockless_dereference().

Thanx, Paul