Linus,
Please pull
git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs.git overlayfs.v25
Plenty of bugs fixed relative to the previous version, thanks to Al Viro's
review and is now officially be BugFree(TM). Also passes the
unionmount-testsuite by David Howells.
Thanks,
Miklos
---
Andy Whitcroft (1):
overlayfs: add statfs support
Erez Zadok (1):
overlayfs: implement show_options
Miklos Szeredi (11):
vfs: add i_op->dentry_open()
vfs: export do_splice_direct() to modules
vfs: export __inode_permission() to modules
vfs: introduce clone_private_mount()
vfs: export check_sticky()
vfs: add whiteout support
vfs: add RENAME_WHITEOUT
ext4: support RENAME_WHITEOUT
shmem: support RENAME_WHITEOUT
overlay filesystem
fs: limit filesystem stacking depth
Neil Brown (1):
overlay: overlay filesystem documentation
---
Documentation/filesystems/Locking | 2 +
Documentation/filesystems/overlayfs.txt | 198 +++++++
Documentation/filesystems/vfs.txt | 7 +
MAINTAINERS | 7 +
fs/Kconfig | 1 +
fs/Makefile | 1 +
fs/btrfs/ioctl.c | 20 +-
fs/ecryptfs/main.c | 7 +
fs/ext4/namei.c | 95 +++-
fs/internal.h | 7 -
fs/namei.c | 41 +-
fs/namespace.c | 27 +
fs/open.c | 23 +-
fs/overlayfs/Kconfig | 10 +
fs/overlayfs/Makefile | 7 +
fs/overlayfs/copy_up.c | 414 ++++++++++++++
fs/overlayfs/dir.c | 921 ++++++++++++++++++++++++++++++++
fs/overlayfs/inode.c | 425 +++++++++++++++
fs/overlayfs/overlayfs.h | 191 +++++++
fs/overlayfs/readdir.c | 587 ++++++++++++++++++++
fs/overlayfs/super.c | 796 +++++++++++++++++++++++++++
fs/splice.c | 1 +
include/linux/fs.h | 39 ++
include/linux/mount.h | 3 +
include/uapi/linux/fs.h | 1 +
mm/shmem.c | 36 +-
26 files changed, 3809 insertions(+), 58 deletions(-)
create mode 100644 Documentation/filesystems/overlayfs.txt
create mode 100644 fs/overlayfs/Kconfig
create mode 100644 fs/overlayfs/Makefile
create mode 100644 fs/overlayfs/copy_up.c
create mode 100644 fs/overlayfs/dir.c
create mode 100644 fs/overlayfs/inode.c
create mode 100644 fs/overlayfs/overlayfs.h
create mode 100644 fs/overlayfs/readdir.c
create mode 100644 fs/overlayfs/super.c
On Fri, Oct 24, 2014 at 01:25:39AM +0200, Miklos Szeredi wrote:
> Linus,
>
> Please pull
>
> git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs.git overlayfs.v25
>
> Plenty of bugs fixed relative to the previous version, thanks to Al Viro's
> review and is now officially be BugFree(TM). Also passes the
> unionmount-testsuite by David Howells.
*blink*
Why the hell do you hold ->i_mutex across the entire opening of underlying
directory? All you need is to serialize one assignment; the side that loses
the race will simply fput() what it opened...
Oh, well - that goes under "weird pessimisations, easy to fix in followups"...
On Fri, Oct 24, 2014 at 03:20:55AM +0100, Al Viro wrote:
> Why the hell do you hold ->i_mutex across the entire opening of underlying
> directory? All you need is to serialize one assignment; the side that loses
> the race will simply fput() what it opened...
>
> Oh, well - that goes under "weird pessimisations, easy to fix in followups"...
OK, pulled into vfs.git, followups in question added. Also there: fix for
a long-standing leak in d_splice_alias() failure exits. Guys, could you
check that current vfs.git#for-linus survives your local tests? Seems to
survive here; if I don't hear of any problems by tomorrow morning, to Linus
it goes... FWIW, for that pull request stats would be
Shortlog:
Al Viro (5):
fix inode leaks on d_splice_alias() failure exits
overlayfs: don't hold ->i_mutex over opening the real directory
overlayfs: make ovl_cache_entry->name an array instead of pointer
overlayfs: embed root into overlay_readdir_data
overlayfs: embed middle into overlay_readdir_data
Andy Whitcroft (1):
overlayfs: add statfs support
Erez Zadok (1):
overlayfs: implement show_options
Miklos Szeredi (11):
vfs: add i_op->dentry_open()
vfs: export do_splice_direct() to modules
vfs: export __inode_permission() to modules
vfs: introduce clone_private_mount()
vfs: export check_sticky()
vfs: add whiteout support
vfs: add RENAME_WHITEOUT
ext4: support RENAME_WHITEOUT
shmem: support RENAME_WHITEOUT
overlay filesystem
fs: limit filesystem stacking depth
Neil Brown (1):
overlay: overlay filesystem documentation
Diffstat:
Documentation/filesystems/Locking | 2 +
Documentation/filesystems/overlayfs.txt | 198 +++++++
Documentation/filesystems/vfs.txt | 7 +
MAINTAINERS | 7 +
fs/Kconfig | 1 +
fs/Makefile | 1 +
fs/btrfs/ioctl.c | 20 +-
fs/dcache.c | 2 +
fs/ecryptfs/main.c | 7 +
fs/ext4/namei.c | 95 +++-
fs/internal.h | 7 -
fs/namei.c | 41 +-
fs/namespace.c | 27 +
fs/open.c | 23 +-
fs/overlayfs/Kconfig | 10 +
fs/overlayfs/Makefile | 7 +
fs/overlayfs/copy_up.c | 414 ++++++++++++++
fs/overlayfs/dir.c | 921 +++++++++++++++++++++++++++++++
fs/overlayfs/inode.c | 425 ++++++++++++++
fs/overlayfs/overlayfs.h | 191 +++++++
fs/overlayfs/readdir.c | 589 ++++++++++++++++++++
fs/overlayfs/super.c | 796 ++++++++++++++++++++++++++
fs/splice.c | 1 +
include/linux/fs.h | 39 ++
include/linux/mount.h | 3 +
include/uapi/linux/fs.h | 1 +
mm/shmem.c | 36 +-
27 files changed, 3813 insertions(+), 58 deletions(-)
create mode 100644 Documentation/filesystems/overlayfs.txt
create mode 100644 fs/overlayfs/Kconfig
create mode 100644 fs/overlayfs/Makefile
create mode 100644 fs/overlayfs/copy_up.c
create mode 100644 fs/overlayfs/dir.c
create mode 100644 fs/overlayfs/inode.c
create mode 100644 fs/overlayfs/overlayfs.h
create mode 100644 fs/overlayfs/readdir.c
create mode 100644 fs/overlayfs/super.c
On Fri, Oct 24, 2014 at 5:24 AM, Al Viro <[email protected]> wrote:
> On Fri, Oct 24, 2014 at 03:20:55AM +0100, Al Viro wrote:
>> Why the hell do you hold ->i_mutex across the entire opening of underlying
>> directory? All you need is to serialize one assignment; the side that loses
>> the race will simply fput() what it opened...
The reason I didn't do your "fix" is that it
- adds more lines than it takes,
- I wasn't sure at all if the lockless access is actually correct
without the ACCESS_ONCE and all the memory barrier magic that might be
necessary on weird architectures.
The rest of the changes look OK.
Thanks,
Miklos
Hi Al,
On Fri, Oct 24, 2014 at 5:24 AM, Al Viro <[email protected]> wrote:
> On Fri, Oct 24, 2014 at 03:20:55AM +0100, Al Viro wrote:
>> Why the hell do you hold ->i_mutex across the entire opening of underlying
>> directory? All you need is to serialize one assignment; the side that loses
>> the race will simply fput() what it opened...
>>
>> Oh, well - that goes under "weird pessimisations, easy to fix in followups"...
>
> OK, pulled into vfs.git, followups in question added. Also there: fix for
> a long-standing leak in d_splice_alias() failure exits. Guys, could you
> check that current vfs.git#for-linus survives your local tests? Seems to
> survive here; if I don't hear of any problems by tomorrow morning, to Linus
> it goes... FWIW, for that pull request stats would be
I think you forgot to merge your for-linus branch into for-next?
Anyway, as Stephen announced there will be no linux-next release
today, I'm afraid you'll have to delay your pull request to Tuesday.
Gr{oetje,eeting}s,
Geert
--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- [email protected]
In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds
On Sat, Oct 25, 2014 at 10:18 AM, Al Viro <[email protected]> wrote:
> On Fri, Oct 24, 2014 at 09:24:45AM +0200, Miklos Szeredi wrote:
>
>> The reason I didn't do your "fix" is that it
>>
>> - adds more lines than it takes,
>>
>> - I wasn't sure at all if the lockless access is actually correct
>> without the ACCESS_ONCE and all the memory barrier magic that might be
>> necessary on weird architectures.
>
> _What_ lockless accesses? There is an extremely embarrassing bug in that
> commit, all right, but it has nothing to do with barriers... All
> barrier-related issues are taken care of by ovl_path_upper() (and without
> that you'd have tons of worse problems). Fetching ->upperfile outside of
> ->i_mutex is fine - in the worst case we'll fetch NULL, open the sucker
> grab ->i_mutex and find out that it has already been taken care of.
> In which case we fput() what we'd opened and move on (fput() under
> ->i_mutex is fine - it's going to be delayed until return from syscall
> anyway).
Yes, but it's not about race with copy-up (which the ovl_path_upper()
protects against), but race of two fsync calls with each other. If
there's no synchronization between them, then that od->upperfile does
indeed count as lockless access, no matter that the assignment was
done under lock.
Thanks,
Miklos
On Sat, Oct 25, 2014 at 11:53:52AM +0200, Miklos Szeredi wrote:
> Yes, but it's not about race with copy-up (which the ovl_path_upper()
> protects against), but race of two fsync calls with each other. If
> there's no synchronization between them, then that od->upperfile does
> indeed count as lockless access, no matter that the assignment was
> done under lock.
p = global;
if (!p) { // outside of lock
p = alloc();
grab lock
if (!global) {
global = p;
} else {
destroy(p);
p = global;
}
drop lock
}
is a very common pattern, especially if you look for cases when lock is
a spinlock and allocation is blocking (in those cases you'll often see
destroy() part done after dropping the lock; that's where what I fucked up in
what I'd originally pushed. And it wasn't even needed - fput() under
->i_mutex is OK...)
On Fri, Oct 24, 2014 at 09:24:45AM +0200, Miklos Szeredi wrote:
> The reason I didn't do your "fix" is that it
>
> - adds more lines than it takes,
>
> - I wasn't sure at all if the lockless access is actually correct
> without the ACCESS_ONCE and all the memory barrier magic that might be
> necessary on weird architectures.
_What_ lockless accesses? There is an extremely embarrassing bug in that
commit, all right, but it has nothing to do with barriers... All
barrier-related issues are taken care of by ovl_path_upper() (and without
that you'd have tons of worse problems). Fetching ->upperfile outside of
->i_mutex is fine - in the worst case we'll fetch NULL, open the sucker
grab ->i_mutex and find out that it has already been taken care of.
In which case we fput() what we'd opened and move on (fput() under
->i_mutex is fine - it's going to be delayed until return from syscall
anyway).
There was a very dumb braino in there; fixed, force-pushed, passes unionmount
tests, no regressions on LTP syscall ones and xfstests.
[Paul McKenney added to CC]
On Sat, Oct 25, 2014 at 7:06 PM, Al Viro <[email protected]> wrote:
> On Sat, Oct 25, 2014 at 11:53:52AM +0200, Miklos Szeredi wrote:
>
>> Yes, but it's not about race with copy-up (which the ovl_path_upper()
>> protects against), but race of two fsync calls with each other. If
>> there's no synchronization between them, then that od->upperfile does
>> indeed count as lockless access, no matter that the assignment was
>> done under lock.
>
> p = global;
> if (!p) { // outside of lock
> p = alloc();
> grab lock
> if (!global) {
> global = p;
> } else {
> destroy(p);
> p = global;
> }
> drop lock
> }
> is a very common pattern, especially if you look for cases when lock is
> a spinlock and allocation is blocking (in those cases you'll often see
> destroy() part done after dropping the lock; that's where what I fucked up in
> what I'd originally pushed. And it wasn't even needed - fput() under
> ->i_mutex is OK...)
Being a very common pattern does not automatically make it correct...
My understanding of these issues is very limited, but it's not clear
to me what will order initialization of members of p with the storing
of p into global. E.g. we start out with global == NULL and p->foo ==
0.
CPU1:
p->foo = 1
grab lock
if (!global)
global = p
CPU1:
p = global
if (p)
q = p->foo
Is it guaranteed that the above sequence (as is, without any barriers
or ACCESS_ONCE() other than the lock acquisition) will result in q ==
1 if p != NULL?
Thanks,
Miklos
On Mon, Oct 27, 2014 at 09:06:54AM +0100, Miklos Szeredi wrote:
> [Paul McKenney added to CC]
>
> On Sat, Oct 25, 2014 at 7:06 PM, Al Viro <[email protected]> wrote:
> > On Sat, Oct 25, 2014 at 11:53:52AM +0200, Miklos Szeredi wrote:
> >
> >> Yes, but it's not about race with copy-up (which the ovl_path_upper()
> >> protects against), but race of two fsync calls with each other. If
> >> there's no synchronization between them, then that od->upperfile does
> >> indeed count as lockless access, no matter that the assignment was
> >> done under lock.
> >
> > p = global;
> > if (!p) { // outside of lock
> > p = alloc();
> > grab lock
> > if (!global) {
> > global = p;
> > } else {
> > destroy(p);
> > p = global;
> > }
> > drop lock
> > }
> > is a very common pattern, especially if you look for cases when lock is
> > a spinlock and allocation is blocking (in those cases you'll often see
> > destroy() part done after dropping the lock; that's where what I fucked up in
> > what I'd originally pushed. And it wasn't even needed - fput() under
> > ->i_mutex is OK...)
>
> Being a very common pattern does not automatically make it correct...
>
> My understanding of these issues is very limited, but it's not clear
> to me what will order initialization of members of p with the storing
> of p into global. E.g. we start out with global == NULL and p->foo ==
> 0.
>
> CPU1:
> p->foo = 1
> grab lock
> if (!global)
> global = p
>
> CPU1:
If it is all the same to you, I will call this CPU2 to distinguish it from
the first CPU1. ;-)
> p = global
> if (p)
> q = p->foo
>
> Is it guaranteed that the above sequence (as is, without any barriers
> or ACCESS_ONCE() other than the lock acquisition) will result in q ==
> 1 if p != NULL?
Indeed, life is hard here. Keep in mind that lock acquisition is not
guaranteed to prevent prior operations from being reordered into the
critical section, possibly as follows:
CPU1:
grab lock
if (!global)
global = p;
/* Assume all of CPU2's accesses happen here. */
p->foo = 1;
This clearly allows CPU2 to execute as follows:
CPU2:
p = global; /* gets p */
if (p) /* non-NULL */
q = p->foo; /* might not be 1 */
Not only that, on DEC Alpha, even if CPU1's accesses are ordered, CPU2's
accesses can be misordered. You need rcu_dereference() or the combination
of ACCESS_ONCE() and smp_read_barrier_depends() to avoid this issue.
As always, see http://www.openvms.compaq.com/wizard/wiz_2637.html for
more info.
So no, there is no guarantee. I am assuming that the lock grabbed by
CPU1 guards all assignments to "global", otherwise the code needs further
help. I am further assuming that the memory pointed to by CPU1's "p"
is inaccessible to any other CPU, as in CPU1 just allocated the memory.
Otherwise, the assignment "p->foo = 1" is questionable. And finally,
I am assuming that p->foo stays constant once it has been made
accessible to readers.
But the following should work:
CPU1:
p->foo = 1; /* Assumes p is local. */
smp_mb__before_spinlock();
grab lock
if (!global) /* Assumes lock protects all assignments to global. */
global = p;
CPU2:
p = rcu_dereference(global);
if (p)
q = p->foo; /* Assumes p->foo constant once visible to readers. */
/* Also assumes p and q are local. */
If the assumptions called out in the comments do not hold, you at least
need ACCESS_ONCE(), and perhaps even more synchronization. For more info
on ACCESS_ONCE(), Jon's LWN article is at http://lwn.net/Articles/508991/.
Thanx, Paul
On Mon, Oct 27, 2014 at 08:59:01AM -0700, Paul E. McKenney wrote:
> Indeed, life is hard here. Keep in mind that lock acquisition is not
> guaranteed to prevent prior operations from being reordered into the
> critical section, possibly as follows:
>
> CPU1:
> grab lock
> if (!global)
> global = p;
> /* Assume all of CPU2's accesses happen here. */
> p->foo = 1;
A bit of context: p is a local pointer to struct file here and alloc is
opening it...
> This clearly allows CPU2 to execute as follows:
>
> CPU2:
> p = global; /* gets p */
> if (p) /* non-NULL */
> q = p->foo; /* might not be 1 */
>
> Not only that, on DEC Alpha, even if CPU1's accesses are ordered, CPU2's
> accesses can be misordered. You need rcu_dereference() or the combination
> of ACCESS_ONCE() and smp_read_barrier_depends() to avoid this issue.
> As always, see http://www.openvms.compaq.com/wizard/wiz_2637.html for
> more info.
>
> So no, there is no guarantee. I am assuming that the lock grabbed by
> CPU1 guards all assignments to "global", otherwise the code needs further
> help. I am further assuming that the memory pointed to by CPU1's "p"
> is inaccessible to any other CPU, as in CPU1 just allocated the memory.
> Otherwise, the assignment "p->foo = 1" is questionable. And finally,
> I am assuming that p->foo stays constant once it has been made
> accessible to readers.
>
> But the following should work:
>
> CPU1:
> p->foo = 1; /* Assumes p is local. */
> smp_mb__before_spinlock();
> grab lock
> if (!global) /* Assumes lock protects all assignments to global. */
> global = p;
>
> CPU2:
> p = rcu_dereference(global);
> if (p)
> q = p->foo; /* Assumes p->foo constant once visible to readers. */
> /* Also assumes p and q are local. */
>
> If the assumptions called out in the comments do not hold, you at least
> need ACCESS_ONCE(), and perhaps even more synchronization. For more info
> on ACCESS_ONCE(), Jon's LWN article is at http://lwn.net/Articles/508991/.
First of all, this "p->foo = 1" is a shorthand for initialization of
struct file done by opening a file. What you are saying is that it
can leak past the point where we stick a pointer to freshly opened
file into a place where another CPU can see it, but not past the
barrier in dropping the lock, right?
And you are suggesting rcu_dereference() as a way to bring the required
barriers in. Ouch... The names are really bad, but there's another
fun issue - rcu_dereference brings in sparse noise. Wouldn't direct use
of smp_read_barrier_depends() be cleaner, anyway?
On Mon, Oct 27, 2014 at 05:28:01PM +0000, Al Viro wrote:
> On Mon, Oct 27, 2014 at 08:59:01AM -0700, Paul E. McKenney wrote:
>
> > Indeed, life is hard here. Keep in mind that lock acquisition is not
> > guaranteed to prevent prior operations from being reordered into the
> > critical section, possibly as follows:
> >
> > CPU1:
> > grab lock
> > if (!global)
> > global = p;
> > /* Assume all of CPU2's accesses happen here. */
> > p->foo = 1;
>
> A bit of context: p is a local pointer to struct file here and alloc is
> opening it...
OK, good to know. ;-)
> > This clearly allows CPU2 to execute as follows:
> >
> > CPU2:
> > p = global; /* gets p */
> > if (p) /* non-NULL */
> > q = p->foo; /* might not be 1 */
> >
> > Not only that, on DEC Alpha, even if CPU1's accesses are ordered, CPU2's
> > accesses can be misordered. You need rcu_dereference() or the combination
> > of ACCESS_ONCE() and smp_read_barrier_depends() to avoid this issue.
> > As always, see http://www.openvms.compaq.com/wizard/wiz_2637.html for
> > more info.
> >
> > So no, there is no guarantee. I am assuming that the lock grabbed by
> > CPU1 guards all assignments to "global", otherwise the code needs further
> > help. I am further assuming that the memory pointed to by CPU1's "p"
> > is inaccessible to any other CPU, as in CPU1 just allocated the memory.
> > Otherwise, the assignment "p->foo = 1" is questionable. And finally,
> > I am assuming that p->foo stays constant once it has been made
> > accessible to readers.
> >
> > But the following should work:
> >
> > CPU1:
> > p->foo = 1; /* Assumes p is local. */
> > smp_mb__before_spinlock();
> > grab lock
> > if (!global) /* Assumes lock protects all assignments to global. */
> > global = p;
> >
> > CPU2:
> > p = rcu_dereference(global);
> > if (p)
> > q = p->foo; /* Assumes p->foo constant once visible to readers. */
> > /* Also assumes p and q are local. */
> >
> > If the assumptions called out in the comments do not hold, you at least
> > need ACCESS_ONCE(), and perhaps even more synchronization. For more info
> > on ACCESS_ONCE(), Jon's LWN article is at http://lwn.net/Articles/508991/.
>
> First of all, this "p->foo = 1" is a shorthand for initialization of
> struct file done by opening a file. What you are saying is that it
> can leak past the point where we stick a pointer to freshly opened
> file into a place where another CPU can see it, but not past the
> barrier in dropping the lock, right?
Exactly!
I should also add that smp_mb__before_spinlock() implies smp_wmb(), but
nothing more. But that is OK because smp_wmb() suffices in this case.
> And you are suggesting rcu_dereference() as a way to bring the required
> barriers in. Ouch... The names are really bad, but there's another
> fun issue - rcu_dereference brings in sparse noise. Wouldn't direct use
> of smp_read_barrier_depends() be cleaner, anyway?
Code making direct use of smp_read_barrier_depends() is harder to read,
in my experience, but good point on the sparse noise. Maybe a new
lockless_dereference() primitive? Maybe something like the following?
(Untested, probably does not even build.)
#define lockless_dereference(p) \
({ \
typeof(*p) *_________p1 = ACCESS_ONCE(p); \
smp_read_barrier_depends(); /* Dependency order vs. p above. */ \
_________p1; \
})
Thanx, Paul
On Mon, Oct 27, 2014 at 10:36:21AM -0700, Paul E. McKenney wrote:
> Code making direct use of smp_read_barrier_depends() is harder to read,
> in my experience, but good point on the sparse noise. Maybe a new
> lockless_dereference() primitive? Maybe something like the following?
> (Untested, probably does not even build.)
>
> #define lockless_dereference(p) \
> ({ \
> typeof(*p) *_________p1 = ACCESS_ONCE(p); \
> smp_read_barrier_depends(); /* Dependency order vs. p above. */ \
> _________p1; \
> })
Hmm... Where would you prefer to put it? rcupdate.h?
On Tue, Oct 28, 2014 at 01:12:14AM +0000, Al Viro wrote:
> On Mon, Oct 27, 2014 at 10:36:21AM -0700, Paul E. McKenney wrote:
> > Code making direct use of smp_read_barrier_depends() is harder to read,
> > in my experience, but good point on the sparse noise. Maybe a new
> > lockless_dereference() primitive? Maybe something like the following?
> > (Untested, probably does not even build.)
> >
> > #define lockless_dereference(p) \
> > ({ \
> > typeof(*p) *_________p1 = ACCESS_ONCE(p); \
> > smp_read_barrier_depends(); /* Dependency order vs. p above. */ \
> > _________p1; \
> > })
>
> Hmm... Where would you prefer to put it? rcupdate.h?
Good a place as any, I guess. Please see patch below. Left to myself,
I would send this along for the next merge window, but please let me
know if you would like it sooner.
Thanx, Paul
------------------------------------------------------------------------
rcu: Provide counterpart to rcu_dereference() for non-RCU situations
Although rcu_dereference() and friends can be used in situations where
object lifetimes are being managed by something other than RCU, the
resulting sparse and lockdep-RCU noise can be annoying. This commit
therefore supplies a lockless_dereference(), which provides the
protection for dereferences without the RCU-related debugging noise.
Reported-by: Al Viro <[email protected]>
Signed-off-by: Paul E. McKenney <[email protected]>
diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
index 9575c2d403b5..ed4f5939a452 100644
--- a/include/linux/rcupdate.h
+++ b/include/linux/rcupdate.h
@@ -617,6 +617,21 @@ static inline void rcu_preempt_sleep_check(void)
#define RCU_INITIALIZER(v) (typeof(*(v)) __force __rcu *)(v)
/**
+ * lockless_dereference() - safely load a pointer for later dereference
+ * @p: The pointer to load
+ *
+ * Similar to rcu_dereference(), but for situations where the pointed-to
+ * object's lifetime is managed by something other than RCU. That
+ * "something other" might be reference counting or simple immortality.
+ */
+#define lockless_dereference(p) \
+({ \
+ typeof(p) _________p1 = ACCESS_ONCE(p); \
+ smp_read_barrier_depends(); /* Dependency order vs. p above. */ \
+ (_________p1); \
+})
+
+/**
* rcu_assign_pointer() - assign to RCU-protected pointer
* @p: pointer to assign to
* @v: value to assign (publish)
On Mon, Oct 27, 2014 at 09:11:27PM -0700, Paul E. McKenney wrote:
> On Tue, Oct 28, 2014 at 01:12:14AM +0000, Al Viro wrote:
> > On Mon, Oct 27, 2014 at 10:36:21AM -0700, Paul E. McKenney wrote:
> > > Code making direct use of smp_read_barrier_depends() is harder to read,
> > > in my experience, but good point on the sparse noise. Maybe a new
> > > lockless_dereference() primitive? Maybe something like the following?
> > > (Untested, probably does not even build.)
> > >
> > > #define lockless_dereference(p) \
> > > ({ \
> > > typeof(*p) *_________p1 = ACCESS_ONCE(p); \
> > > smp_read_barrier_depends(); /* Dependency order vs. p above. */ \
> > > _________p1; \
> > > })
> >
> > Hmm... Where would you prefer to put it? rcupdate.h?
>
> Good a place as any, I guess. Please see patch below. Left to myself,
> I would send this along for the next merge window, but please let me
> know if you would like it sooner.
It's needed sooner, unfortunately. Guys, could you take a look at
vfs.git#for-linus and comment?
On Tue, Oct 28, 2014 at 10:55:13PM +0000, Al Viro wrote:
> On Mon, Oct 27, 2014 at 09:11:27PM -0700, Paul E. McKenney wrote:
> > On Tue, Oct 28, 2014 at 01:12:14AM +0000, Al Viro wrote:
> > > On Mon, Oct 27, 2014 at 10:36:21AM -0700, Paul E. McKenney wrote:
> > > > Code making direct use of smp_read_barrier_depends() is harder to read,
> > > > in my experience, but good point on the sparse noise. Maybe a new
> > > > lockless_dereference() primitive? Maybe something like the following?
> > > > (Untested, probably does not even build.)
> > > >
> > > > #define lockless_dereference(p) \
> > > > ({ \
> > > > typeof(*p) *_________p1 = ACCESS_ONCE(p); \
> > > > smp_read_barrier_depends(); /* Dependency order vs. p above. */ \
> > > > _________p1; \
> > > > })
> > >
> > > Hmm... Where would you prefer to put it? rcupdate.h?
> >
> > Good a place as any, I guess. Please see patch below. Left to myself,
> > I would send this along for the next merge window, but please let me
> > know if you would like it sooner.
>
> It's needed sooner, unfortunately. Guys, could you take a look at
> vfs.git#for-linus and comment?
The version in your tree looks good. I will drop my commit in favor
of yours.
The use of lockless_dereference() and smp_mb__before_spinlock() in
d45f00ae43 (overlayfs: barriers for opening upper-layer directory)
looks fine to me. Only nit is lack of space between "=" and
lockless_dereference().
Thanx, Paul