2013-09-10 04:38:16

by Stephen Rothwell

[permalink] [raw]
Subject: linux-next: manual merge of the akpm tree with Linus' tree

Hi Andrew,

Today's linux-next merge of the akpm tree got a conflict in fs/dcache.c
between commit 8aab6a27332b ("vfs: reorganize dput() memory accesses")
from Linus' tree and commit "dcache: convert to use new lru list
infrastructure" from the akpm tree.

/me mutters about development happening during the merge window -
especially when Andrew is absent.

I have no idea if this will be correct, but I just used the version from
the akpm tree (effectively reverting parts of commit 8aab6a27332b) and
can carry the fix as necessary (no action is required).

--
Cheers,
Stephen Rothwell [email protected]


Attachments:
(No filename) (628.00 B)
(No filename) (836.00 B)
Download all attachments

2013-09-10 22:27:56

by Andrew Morton

[permalink] [raw]
Subject: Re: linux-next: manual merge of the akpm tree with Linus' tree

On Tue, 10 Sep 2013 14:38:07 +1000 Stephen Rothwell <[email protected]> wrote:

> Hi Andrew,
>
> Today's linux-next merge of the akpm tree got a conflict in fs/dcache.c
> between commit 8aab6a27332b ("vfs: reorganize dput() memory accesses")
> from Linus' tree and commit "dcache: convert to use new lru list
> infrastructure" from the akpm tree.
>
> /me mutters about development happening during the merge window -
> especially when Andrew is absent.
>
> I have no idea if this will be correct, but I just used the version from
> the akpm tree (effectively reverting parts of commit 8aab6a27332b) and
> can carry the fix as necessary (no action is required).

This is rather a fiasco. "vfs: reorganize dput() memory accesses" made
rather a mess of a 46 patch series which has been under development and
test for two cycles so far.

I reverted it so I could get it all to apply and build with some
confidence that I didn't break anything. Then I went to re-apply "vfs:
reorganize dput() memory accesses" on top but I'm unsure that it's the
right thing to do. ->d_lru is now reused for s_dentry_lru and for the
shrink list and for the dispose list, so simply testing its
list_emptiness doesn't work any more. And it's unobvious that the
problem which "vfs: reorganize dput() memory accesses" addresses still
exists, or whether it was worsened or whatever.

And given that "vfs: reorganize dput() memory accesses" was driven by
careful profiling work, there's not a lot of point in going any further
until the new code is profiled and comparisons are performed.

Am moderately frustrated by all of this. It would be nice if the vfs
maintainer could have, you know, paid some attention to a massive great
vfs patchset instead of blithely wrecking it :(


Dave, can you please eyeball the below and have a think about its
applicability under the new regime? Thanks.

Right now I'm not very confident in all of this. I think what I'll do
is send the patches out for re-re-re-review right now and I'll ask Al
and Linus to please find some time to think them over.




commit 8aab6a27332bbf2abfcb35224738394e784d940b
Author: Linus Torvalds <[email protected]>
AuthorDate: Sun Sep 8 13:26:18 2013 -0700
Commit: Linus Torvalds <[email protected]>
CommitDate: Sun Sep 8 13:26:18 2013 -0700

vfs: reorganize dput() memory accesses

This is me being a bit OCD after all the dentry optimization work this
merge window: profiles end up showing 'dput()' as a rather expensive
operation, and there were two unrelated bad reasons for that.

The first reason was reading d_lockref.count for debugging purposes,
which touches the lockref cacheline (for reads) before really need to.
More importantly, the debugging test in question is _wrong_, and has
hidden bugs. It's true that we can only sleep when the count goes down
to zero, but the test as-is hides the much more subtle bug that happens
if we race with somebody else deleting the file.

Anyway we _will_ touch that cacheline, but let's do it for a write and
in the right routine (ie in "lockref_put_or_lock()") which annotates the
costs better. So remove the misleading debug code.

The other was an unnecessary access to the cacheline that contains the
d_lru list, just to check whether we already were on the LRU list or
not. This is exactly what we have d_flags for, so that we can avoid
touching extra cache lines for the common case. So just add another bit
for "is this dentry on the LRU".

Finally, mark the tests properly likely/unlikely, so that the common
fast-paths are dense in the instruction stream.

This makes the profiles look much saner.

Signed-off-by: Linus Torvalds <[email protected]>

diff --git a/fs/dcache.c b/fs/dcache.c
index 761e31b..bf3c4f9 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -308,8 +308,9 @@ static void dentry_unlink_inode(struct dentry * dentry)
*/
static void dentry_lru_add(struct dentry *dentry)
{
- if (list_empty(&dentry->d_lru)) {
+ if (unlikely(!(dentry->d_flags & DCACHE_LRU_LIST))) {
spin_lock(&dcache_lru_lock);
+ dentry->d_flags |= DCACHE_LRU_LIST;
list_add(&dentry->d_lru, &dentry->d_sb->s_dentry_lru);
dentry->d_sb->s_nr_dentry_unused++;
dentry_stat.nr_unused++;
@@ -320,7 +321,7 @@ static void dentry_lru_add(struct dentry *dentry)
static void __dentry_lru_del(struct dentry *dentry)
{
list_del_init(&dentry->d_lru);
- dentry->d_flags &= ~DCACHE_SHRINK_LIST;
+ dentry->d_flags &= ~(DCACHE_SHRINK_LIST | DCACHE_LRU_LIST);
dentry->d_sb->s_nr_dentry_unused--;
dentry_stat.nr_unused--;
}
@@ -341,6 +342,7 @@ static void dentry_lru_move_list(struct dentry *dentry, struct list_head *list)
{
spin_lock(&dcache_lru_lock);
if (list_empty(&dentry->d_lru)) {
+ dentry->d_flags |= DCACHE_LRU_LIST;
list_add_tail(&dentry->d_lru, list);
dentry->d_sb->s_nr_dentry_unused++;
dentry_stat.nr_unused++;
@@ -509,24 +511,22 @@ relock:
*/
void dput(struct dentry *dentry)
{
- if (!dentry)
+ if (unlikely(!dentry))
return;

repeat:
- if (dentry->d_lockref.count == 1)
- might_sleep();
if (lockref_put_or_lock(&dentry->d_lockref))
return;

- if (dentry->d_flags & DCACHE_OP_DELETE) {
+ /* Unreachable? Get rid of it */
+ if (unlikely(d_unhashed(dentry)))
+ goto kill_it;
+
+ if (unlikely(dentry->d_flags & DCACHE_OP_DELETE)) {
if (dentry->d_op->d_delete(dentry))
goto kill_it;
}

- /* Unreachable? Get rid of it */
- if (d_unhashed(dentry))
- goto kill_it;
-
dentry->d_flags |= DCACHE_REFERENCED;
dentry_lru_add(dentry);

diff --git a/include/linux/dcache.h b/include/linux/dcache.h
index fe50f3d..feaa8d8 100644
--- a/include/linux/dcache.h
+++ b/include/linux/dcache.h
@@ -208,6 +208,7 @@ struct dentry_operations {
#define DCACHE_MANAGED_DENTRY \
(DCACHE_MOUNTED|DCACHE_NEED_AUTOMOUNT|DCACHE_MANAGE_TRANSIT)

+#define DCACHE_LRU_LIST 0x80000
#define DCACHE_DENTRY_KILLED 0x100000

extern seqlock_t rename_lock;

2013-09-10 22:29:28

by Al Viro

[permalink] [raw]
Subject: Re: linux-next: manual merge of the akpm tree with Linus' tree

On Tue, Sep 10, 2013 at 03:27:53PM -0700, Andrew Morton wrote:

> This is rather a fiasco. "vfs: reorganize dput() memory accesses" made
> rather a mess of a 46 patch series which has been under development and
> test for two cycles so far.

Check vfs.git#for-next...

2013-09-10 22:35:07

by Linus Torvalds

[permalink] [raw]
Subject: Re: linux-next: manual merge of the akpm tree with Linus' tree

On Tue, Sep 10, 2013 at 3:27 PM, Andrew Morton
<[email protected]> wrote:
>
> This is rather a fiasco. "vfs: reorganize dput() memory accesses" made
> rather a mess of a 46 patch series which has been under development and
> test for two cycles so far.

Andrew, *please* don't do the insane rebasing you keep on doing.

Nobody else does that, and it adds more work for you, and makes your
patch bombs be untimely.

And I'll be honest, I don't care about the "more work for you" part,
since I don't really see it, but I do care about the "untimely" part.
That's the one that affects me. For example, I'm traveling starting
Friday, so I'll close the merge window on the road (linuxCon US next
week, a weekend of diving before that). It would be much nicer if I
have almost nothing pending before I leave.

And quite frankly, there is absolutely nothing that touches
fs/dcache.c that should be in your tree anyway, as far as I can tell.

But seriously - don't do the constant rebasing. I tell the git people
that, because I hate how it actually dilutes the value of any testing
they did. The fact that you rebase to the day you send the patches is
actually taking *away* value from what you do, and it adds a lot of
work for you.

So I'd (once again) suggest you base your pile of patches on the
previous stable kernel, and that linux-next take it *first* rather
than take it last.

Linus

2013-09-10 22:35:26

by Andrew Morton

[permalink] [raw]
Subject: Re: linux-next: manual merge of the akpm tree with Linus' tree

On Tue, 10 Sep 2013 23:29:24 +0100 Al Viro <[email protected]> wrote:

> On Tue, Sep 10, 2013 at 03:27:53PM -0700, Andrew Morton wrote:
>
> > This is rather a fiasco. "vfs: reorganize dput() memory accesses" made
> > rather a mess of a 46 patch series which has been under development and
> > test for two cycles so far.
>
> Check vfs.git#for-next...

Why? What is in it?

2013-09-10 22:36:27

by Al Viro

[permalink] [raw]
Subject: Re: linux-next: manual merge of the akpm tree with Linus' tree

On Tue, Sep 10, 2013 at 03:35:20PM -0700, Andrew Morton wrote:
> On Tue, 10 Sep 2013 23:29:24 +0100 Al Viro <[email protected]> wrote:
>
> > On Tue, Sep 10, 2013 at 03:27:53PM -0700, Andrew Morton wrote:
> >
> > > This is rather a fiasco. "vfs: reorganize dput() memory accesses" made
> > > rather a mess of a 46 patch series which has been under development and
> > > test for two cycles so far.
> >
> > Check vfs.git#for-next...
>
> Why? What is in it?

Initial attempt to port that very thing on top of current mainline.

2013-09-10 22:39:40

by Al Viro

[permalink] [raw]
Subject: Re: linux-next: manual merge of the akpm tree with Linus' tree

On Tue, Sep 10, 2013 at 11:36:24PM +0100, Al Viro wrote:
> On Tue, Sep 10, 2013 at 03:35:20PM -0700, Andrew Morton wrote:
> > On Tue, 10 Sep 2013 23:29:24 +0100 Al Viro <[email protected]> wrote:
> >
> > > On Tue, Sep 10, 2013 at 03:27:53PM -0700, Andrew Morton wrote:
> > >
> > > > This is rather a fiasco. "vfs: reorganize dput() memory accesses" made
> > > > rather a mess of a 46 patch series which has been under development and
> > > > test for two cycles so far.
> > >
> > > Check vfs.git#for-next...
> >
> > Why? What is in it?
>
> Initial attempt to port that very thing on top of current mainline.

... with considerable breakage around "dcache: convert to use new lru list
infrastructure", indeed. I'll see what I can do there...

2013-09-10 22:41:19

by Andrew Morton

[permalink] [raw]
Subject: Re: linux-next: manual merge of the akpm tree with Linus' tree

On Tue, 10 Sep 2013 23:36:24 +0100 Al Viro <[email protected]> wrote:

> On Tue, Sep 10, 2013 at 03:35:20PM -0700, Andrew Morton wrote:
> > On Tue, 10 Sep 2013 23:29:24 +0100 Al Viro <[email protected]> wrote:
> >
> > > On Tue, Sep 10, 2013 at 03:27:53PM -0700, Andrew Morton wrote:
> > >
> > > > This is rather a fiasco. "vfs: reorganize dput() memory accesses" made
> > > > rather a mess of a 46 patch series which has been under development and
> > > > test for two cycles so far.
> > >
> > > Check vfs.git#for-next...
> >
> > Why? What is in it?
>
> Initial attempt to port that very thing on top of current mainline.

Obtained from where? There are a whole pile of fixes resulting from
review and linux-next testing. Are they included?

2013-09-10 22:44:04

by Andrew Morton

[permalink] [raw]
Subject: Re: linux-next: manual merge of the akpm tree with Linus' tree

On Tue, 10 Sep 2013 15:35:04 -0700 Linus Torvalds <[email protected]> wrote:

> So I'd (once again) suggest you base your pile of patches on the
> previous stable kernel, and that linux-next take it *first* rather
> than take it last.

That's what we're now doing. But this particular patchset was
different because it's changing multiple subsystems, several of which
are concurrently being changed in an uncoordinated fashion.

2013-09-10 22:48:29

by Al Viro

[permalink] [raw]
Subject: Re: linux-next: manual merge of the akpm tree with Linus' tree

On Tue, Sep 10, 2013 at 03:41:16PM -0700, Andrew Morton wrote:

> Obtained from where? There are a whole pile of fixes resulting from
> review and linux-next testing. Are they included?

-next and yes. The trivial ones - folded into the commits they are fixing
(I mean, ones directly following the commit being fixed). The rest included
as individual commits.

I'm looking through the DCACHE_LRU_LIST mess right now...

2013-09-10 22:59:43

by Al Viro

[permalink] [raw]
Subject: Re: linux-next: manual merge of the akpm tree with Linus' tree

On Tue, Sep 10, 2013 at 11:48:23PM +0100, Al Viro wrote:
> On Tue, Sep 10, 2013 at 03:41:16PM -0700, Andrew Morton wrote:
>
> > Obtained from where? There are a whole pile of fixes resulting from
> > review and linux-next testing. Are they included?
>
> -next and yes. The trivial ones - folded into the commits they are fixing
> (I mean, ones directly following the commit being fixed). The rest included
> as individual commits.
>
> I'm looking through the DCACHE_LRU_LIST mess right now...

It's not that bad, actually; I think the variant I've pushed right now
(vfs.git#for-next, head at f5e1dd34561e0fb06400b378d595198918833021) should
be doing the right thing. It ought to cover everything in your branch
in -next from "fs: bump inode and dentry counters to long" on to the
end of queue.

2013-09-10 23:13:06

by Andrew Morton

[permalink] [raw]
Subject: Re: linux-next: manual merge of the akpm tree with Linus' tree

On Tue, 10 Sep 2013 23:59:34 +0100 Al Viro <[email protected]> wrote:

> On Tue, Sep 10, 2013 at 11:48:23PM +0100, Al Viro wrote:
> > On Tue, Sep 10, 2013 at 03:41:16PM -0700, Andrew Morton wrote:
> >
> > > Obtained from where? There are a whole pile of fixes resulting from
> > > review and linux-next testing. Are they included?
> >
> > -next and yes. The trivial ones - folded into the commits they are fixing
> > (I mean, ones directly following the commit being fixed). The rest included
> > as individual commits.
> >
> > I'm looking through the DCACHE_LRU_LIST mess right now...
>
> It's not that bad, actually; I think the variant I've pushed right now
> (vfs.git#for-next, head at f5e1dd34561e0fb06400b378d595198918833021) should
> be doing the right thing. It ought to cover everything in your branch
> in -next from "fs: bump inode and dentry counters to long" on to the
> end of queue.

That's the correct starting point. The end point should be
"staging/lustre/libcfs: cleanup linux-mem.h".

2013-09-10 23:37:22

by Linus Torvalds

[permalink] [raw]
Subject: Re: linux-next: manual merge of the akpm tree with Linus' tree

On Tue, Sep 10, 2013 at 3:59 PM, Al Viro <[email protected]> wrote:
>
> It's not that bad, actually; I think the variant I've pushed right now
> (vfs.git#for-next, head at f5e1dd34561e0fb06400b378d595198918833021) should
> be doing the right thing. It ought to cover everything in your branch
> in -next from "fs: bump inode and dentry counters to long" on to the
> end of queue.

>From a quick look, this looks pretty broken:

if (list_lru_add(&dentry->d_sb->s_dentry_lru, &dentry->d_lru))
this_cpu_inc(nr_dentry_unused);
dentry->d_flags |= DCACHE_LRU_LIST;

because if that list_lru_add() can fail, then we shouldn't set the
DCACHE_LRU_LIST bit either.

That said, I don't see how it can fail. We only do this with the
dentry locked, and when it's not already on the LRU list. So I think
the "if()" is just misleading and unnecessary - but the code works.

Linus

2013-09-10 23:53:56

by Al Viro

[permalink] [raw]
Subject: Re: linux-next: manual merge of the akpm tree with Linus' tree

On Tue, Sep 10, 2013 at 04:37:19PM -0700, Linus Torvalds wrote:
> On Tue, Sep 10, 2013 at 3:59 PM, Al Viro <[email protected]> wrote:
> >
> > It's not that bad, actually; I think the variant I've pushed right now
> > (vfs.git#for-next, head at f5e1dd34561e0fb06400b378d595198918833021) should
> > be doing the right thing. It ought to cover everything in your branch
> > in -next from "fs: bump inode and dentry counters to long" on to the
> > end of queue.
>
> >From a quick look, this looks pretty broken:
>
> if (list_lru_add(&dentry->d_sb->s_dentry_lru, &dentry->d_lru))
> this_cpu_inc(nr_dentry_unused);
> dentry->d_flags |= DCACHE_LRU_LIST;
>
> because if that list_lru_add() can fail, then we shouldn't set the
> DCACHE_LRU_LIST bit either.

list_lru_add() can fail if it's already on the list; leaving the counter
alone should've been conditional on that, setting the flag - no. Said
that, it probably should be WARN_ON(!...); this_cpu_inc(); ... |= ...;

2013-09-10 23:55:21

by Al Viro

[permalink] [raw]
Subject: Re: linux-next: manual merge of the akpm tree with Linus' tree

On Tue, Sep 10, 2013 at 04:13:02PM -0700, Andrew Morton wrote:
> > in -next from "fs: bump inode and dentry counters to long" on to the
> > end of queue.
>
> That's the correct starting point. The end point should be
> "staging/lustre/libcfs: cleanup linux-mem.h".

... which is the end of queue (well, the last one I grabbed, anyway).
And it obviously needed to go before the removal of ->shrinker(),
not after it, so it got moved up.

2013-09-11 00:01:26

by Linus Torvalds

[permalink] [raw]
Subject: Re: linux-next: manual merge of the akpm tree with Linus' tree

On Tue, Sep 10, 2013 at 4:53 PM, Al Viro <[email protected]> wrote:
>
> list_lru_add() can fail if it's already on the list; leaving the counter
> alone should've been conditional on that, setting the flag - no. Said
> that, it probably should be WARN_ON(!...); this_cpu_inc(); ... |= ...;

That WARN_ON_(!..) might indeed be better (maybe just WARN_ON_ONCE())..

That DCACHE_LRU_LIST bit needs to be coherent with "the dentry->d_lru
entry is on _some_ list" (whether it's the dentry one or the shrinker
one), so if that list_lru_add() ever fails, that would be a sign of
badness.

And that whole function is very performance-critical, to the point
where we not only don't want to call down to list_lry_add(), we don't
even want to touch the d_lru list entry itself to even _look_ if it's
empty or not, because that will take a cache miss. Which was obviously
the whole reason for that DCACHE_LRU_LIST bit existing...

Linus

2013-09-11 00:30:20

by Stephen Rothwell

[permalink] [raw]
Subject: Re: linux-next: manual merge of the akpm tree with Linus' tree

Hi Linus,

On Tue, 10 Sep 2013 15:44:00 -0700 Andrew Morton <[email protected]> wrote:
>
> On Tue, 10 Sep 2013 15:35:04 -0700 Linus Torvalds <[email protected]> wrote:
>
> > So I'd (once again) suggest you base your pile of patches on the
> > previous stable kernel, and that linux-next take it *first* rather
> > than take it last.
>
> That's what we're now doing. But this particular patchset was
> different because it's changing multiple subsystems, several of which
> are concurrently being changed in an uncoordinated fashion.

Yeah, Andrew and I came to an arrangement this round so that almost all
of his series is based only on your tree. Currently I have 375 patches
based on v3.11-rc7-14-gfa8218d in a branch that I just merge each day.
The remaining bit (which contains the series that caused this thread) is
now only 38 patches (after removing the stuff that Al has taken) is still
rebased on top of linux-next each day due to dependencies on other trees
in linux-next. (Yes, I know this is not ideal, but it is a work in
progress.)

So, Andrew, you should be able to just about send those 375 patches to
Linus (I know that there may be some fix folding to do before that) and
have him apply them on top of v3.11-rc7-14-gfa8218d in a separate branch
and then merge that branch. I have included the "git diff-tree --cc"
output from my merge of that into linux-next yesterday. Some of it is
not applicable yet (since there are still some other outstanding trees),
but it gives you some idea of how little the merge is a problem.

I hope this is helpful.
--
Cheers,
Stephen Rothwell [email protected]

diff --cc arch/s390/kernel/kprobes.c
index adbbe7f,cb7ac9e..0ce9fb2
--- a/arch/s390/kernel/kprobes.c
+++ b/arch/s390/kernel/kprobes.c
@@@ -100,17 -120,13 +120,16 @@@ static int __kprobes get_fixup_type(kpr
fixup |= FIXUP_RETURN_REGISTER;
break;
case 0xc0:
- if ((insn[0] & 0x0f) == 0x00 || /* larl */
- (insn[0] & 0x0f) == 0x05) /* brasl */
- fixup |= FIXUP_RETURN_REGISTER;
+ if ((insn[0] & 0x0f) == 0x05) /* brasl */
+ fixup |= FIXUP_RETURN_REGISTER;
break;
case 0xeb:
- if ((insn[2] & 0xff) == 0x44 || /* bxhg */
- (insn[2] & 0xff) == 0x45) /* bxleg */
+ switch (insn[2] & 0xff) {
+ case 0x44: /* bxhg */
+ case 0x45: /* bxleg */
fixup = FIXUP_BRANCH_NOT_TAKEN;
+ break;
+ }
break;
case 0xe3: /* bctg */
if ((insn[2] & 0xff) == 0x46)
diff --cc arch/x86/lib/csum-wrappers_64.c
index 7609e0e,aaba241..2528500
--- a/arch/x86/lib/csum-wrappers_64.c
+++ b/arch/x86/lib/csum-wrappers_64.c
@@@ -4,9 -4,9 +4,10 @@@
*
* Wrappers of assembly checksum functions for x86-64.
*/
+ #include <linux/sched.h>
#include <asm/checksum.h>
#include <linux/module.h>
+#include <asm/smap.h>

/**
* csum_partial_copy_from_user - Copy and checksum from user space.
diff --cc drivers/staging/lustre/lustre/llite/remote_perm.c
index dedd56a,ceac936..50de0f0
--- a/drivers/staging/lustre/lustre/llite/remote_perm.c
+++ b/drivers/staging/lustre/lustre/llite/remote_perm.c
@@@ -44,7 -44,9 +44,8 @@@
#define DEBUG_SUBSYSTEM S_LLITE

#include <linux/module.h>
+ #include <linux/sched.h>
#include <linux/types.h>
-#include <linux/version.h>

#include <lustre_lite.h>
#include <lustre_ha.h>
diff --cc fs/fat/file.c
index 33711ff,00b5810..26c8e32
--- a/fs/fat/file.c
+++ b/fs/fat/file.c
@@@ -148,6 -151,22 +151,22 @@@ static long fat_generic_compat_ioctl(st

static int fat_file_release(struct inode *inode, struct file *filp)
{
+
+ struct super_block *sb = inode->i_sb;
+ loff_t mmu_private_ideal;
+
+ /*
+ * Release unwritten fallocated blocks on file release.
+ * Do this only when the last open file descriptor is closed.
+ */
+ mutex_lock(&inode->i_mutex);
+ mmu_private_ideal = round_up(inode->i_size, sb->s_blocksize);
+
+ if (mmu_private_ideal < MSDOS_I(inode)->mmu_private &&
- filp->f_dentry->d_count == 1)
++ d_count(filp->f_dentry) == 1)
+ fat_truncate_blocks(inode, inode->i_size);
+ mutex_unlock(&inode->i_mutex);
+
if ((filp->f_mode & FMODE_WRITE) &&
MSDOS_SB(inode->i_sb)->options.flush) {
fat_flush_inodes(inode->i_sb, inode, NULL);
diff --cc fs/xfs/xfs_mount.c
index 5dcc680,eb9ba15..65dbf17
--- a/fs/xfs/xfs_mount.c
+++ b/fs/xfs/xfs_mount.c
@@@ -15,9 -15,10 +15,10 @@@
* along with this program; if not, write the Free Software Foundation,
* Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
*/
+ #include <linux/sched.h>
#include "xfs.h"
#include "xfs_fs.h"
-#include "xfs_types.h"
+#include "xfs_format.h"
#include "xfs_bit.h"
#include "xfs_log.h"
#include "xfs_inum.h"
diff --cc kernel/fork.c
index fb4406b,04a8c2a..5ede60b
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@@ -1173,13 -1171,15 +1171,16 @@@ static struct task_struct *copy_process
return ERR_PTR(-EINVAL);

/*
- * If the new process will be in a different pid namespace
- * don't allow the creation of threads.
+ * If the new process will be in a different pid or user namespace
+ * do not allow it to share a thread group or signal handlers or
+ * parent with the forking task.
*/
- if ((clone_flags & (CLONE_VM|CLONE_NEWPID)) &&
- (task_active_pid_ns(current) !=
- current->nsproxy->pid_ns_for_children))
- return ERR_PTR(-EINVAL);
+ if (clone_flags & (CLONE_SIGHAND | CLONE_PARENT)) {
+ if ((clone_flags & (CLONE_NEWUSER | CLONE_NEWPID)) ||
- (task_active_pid_ns(current) != current->nsproxy->pid_ns))
++ (task_active_pid_ns(current) !=
++ current->nsproxy->pid_ns_for_children))
+ return ERR_PTR(-EINVAL);
+ }

retval = security_task_create(clone_flags);
if (retval)
diff --cc kernel/watchdog.c
index 51c4f34,410d5bb..373d3e1
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@@ -553,6 -607,14 +607,6 @@@ void __init lockup_detector_init(void
{
set_sample_period();

-#ifdef CONFIG_NO_HZ_FULL
- if (watchdog_user_enabled) {
- watchdog_user_enabled = 0;
- pr_warning("Disabled lockup detectors by default for full dynticks\n");
- pr_warning("You can reactivate it with 'sysctl -w kernel.watchdog=1'\n");
- }
-#endif
-
if (watchdog_user_enabled)
- watchdog_enable_all_cpus();
+ watchdog_enable_all_cpus(false);
}


Attachments:
(No filename) (6.17 kB)
(No filename) (836.00 B)
Download all attachments

2013-09-11 00:39:18

by Dave Chinner

[permalink] [raw]
Subject: Re: linux-next: manual merge of the akpm tree with Linus' tree

On Tue, Sep 10, 2013 at 05:01:23PM -0700, Linus Torvalds wrote:
> On Tue, Sep 10, 2013 at 4:53 PM, Al Viro <[email protected]> wrote:
> >
> > list_lru_add() can fail if it's already on the list; leaving the counter
> > alone should've been conditional on that, setting the flag - no. Said
> > that, it probably should be WARN_ON(!...); this_cpu_inc(); ... |= ...;
>
> That WARN_ON_(!..) might indeed be better (maybe just WARN_ON_ONCE())..
>
> That DCACHE_LRU_LIST bit needs to be coherent with "the dentry->d_lru
> entry is on _some_ list" (whether it's the dentry one or the shrinker
> one), so if that list_lru_add() ever fails, that would be a sign of
> badness.
>
> And that whole function is very performance-critical, to the point
> where we not only don't want to call down to list_lry_add(), we don't
> even want to touch the d_lru list entry itself to even _look_ if it's
> empty or not, because that will take a cache miss. Which was obviously
> the whole reason for that DCACHE_LRU_LIST bit existing...

Guys, I'm about to be out of the office for 4-5 days, so this is
real bad timing for me. When I get back I'll put some effort into
validating that everything still works properly and performs as
expected.

Cheers,

Dave.
--
Dave Chinner
[email protected]

2013-09-11 00:41:31

by Linus Torvalds

[permalink] [raw]
Subject: Re: linux-next: manual merge of the akpm tree with Linus' tree

On Tue, Sep 10, 2013 at 5:30 PM, Stephen Rothwell <[email protected]> wrote:
>
> So, Andrew, you should be able to just about send those 375 patches to
> Linus (I know that there may be some fix folding to do before that) and
> have him apply them on top of v3.11-rc7-14-gfa8218d in a separate branch
> and then merge that branch.

That's how I have been doing Andrew's patch-bombs anyway, since I
started asking him what they were based on (they *usually* apply on
top of my current tip, but during the merge window so much happens
that it wasn't all that unusual that one or two patches didn't quite
apply). So these days I just always do a "git checkout -b akpm
<base-andrew-gives-me>" and apply the patches that way.

It's just that normally the base is within about six hours of my tip
anyway. This would be the first time it's way back.

But I much prefer having more of a merge of well-tested code over
having an easier merge of something that was rebased. That's
especially true if the rebasing ends up delaying me getting the
patches in the first place. I detest having merge windows where the
last few days are busy, I'd really much rather have a couple of days
with very little going on before doing the -rc1 (in the hope that rc1
actually ends up being pretty good).

> I have included the "git diff-tree --cc"
> output from my merge of that into linux-next yesterday. Some of it is
> not applicable yet (since there are still some other outstanding trees),
> but it gives you some idea of how little the merge is a problem.

Yes. We actually seldom have real merge problems. The only really
annoying merges tend to be when somebody did something that isn't
really code-related, like sorting a big list of entries (Makefiles,
Kconfig files, device tree cleanups) and then both sides do a fair
amount of changes on their respective - and very different - sides.

The "multiple people worked on the same code" part is fairly unusual,
and particular if I know the code and can even compile it, it's all
fine.

Linus

2013-09-11 04:31:51

by Stephen Rothwell

[permalink] [raw]
Subject: Re: linux-next: manual merge of the akpm tree with Linus' tree

Hi Andrew,

On Tue, 10 Sep 2013 16:13:02 -0700 Andrew Morton <[email protected]> wrote:
>
> On Tue, 10 Sep 2013 23:59:34 +0100 Al Viro <[email protected]> wrote:
> >
> > It's not that bad, actually; I think the variant I've pushed right now
> > (vfs.git#for-next, head at f5e1dd34561e0fb06400b378d595198918833021) should
> > be doing the right thing. It ought to cover everything in your branch
> > in -next from "fs: bump inode and dentry counters to long" on to the
> > end of queue.
>
> That's the correct starting point. The end point should be
> "staging/lustre/libcfs: cleanup linux-mem.h".

OK, so I have removed that set of patches from the copy of the -mm tree
that is in linux-next today.

--
Cheers,
Stephen Rothwell [email protected]


Attachments:
(No filename) (786.00 B)
(No filename) (836.00 B)
Download all attachments

2013-09-13 00:56:25

by Linus Torvalds

[permalink] [raw]
Subject: Re: linux-next: manual merge of the akpm tree with Linus' tree

On Tue, Sep 10, 2013 at 4:37 PM, Linus Torvalds
<[email protected]> wrote:
>
> From a quick look, this looks pretty broken:
>
> if (list_lru_add(&dentry->d_sb->s_dentry_lru, &dentry->d_lru))
> this_cpu_inc(nr_dentry_unused);
> dentry->d_flags |= DCACHE_LRU_LIST;
>
> because if that list_lru_add() can fail, then we shouldn't set the
> DCACHE_LRU_LIST bit either.
>
> That said, I don't see how it can fail. We only do this with the
> dentry locked, and when it's not already on the LRU list. So I think
> the "if()" is just misleading and unnecessary - but the code works.

So I thought you'd clean this up. Looking again, it still seems really
confused, and I'm finding actual bugs.

You don't clear the DCACHE_LRU_LIST when you remove dentries from the
d_lru list. In other cases (like shrink_dentry_list), you clear just
the DCACHE_SHRINK_LIST.

As a result, the "if ()" isn't necessarily unnecessary, but there are
actual bugs. It looks like the dentry can be removed from the d_lru
lists without the bit ever getting cleared, and if that happens, it
will never be moved back.

The rule for DCACHE_LRU_LIST was - and should be - that the bit is set
IFF the d_lru list is not empty. So it gets set when a dentry is moved
to the LRU lists, but it _stays_ set if the dentry is moved to the
shrink_list. It then gets cleared when the dentry is removed from any
d_lru list (ie "list_del_init()").

I'll walk through the code, it looked suspicious. Maybe there's
something subtle that makes it work, but I don't see it.

Linus

2013-09-13 01:12:28

by Linus Torvalds

[permalink] [raw]
Subject: Re: linux-next: manual merge of the akpm tree with Linus' tree

On Thu, Sep 12, 2013 at 5:56 PM, Linus Torvalds
<[email protected]> wrote:
>
> I'll walk through the code, it looked suspicious. Maybe there's
> something subtle that makes it work, but I don't see it.

Btw, it's not just the DCACHE_LRU_LIST bit. The games with
"nr_dentry_unused" look totally broken too. It's decremented in
dentry_lru_isolate_shrink() for each dentry we remove, and then it is
decremented *again* in shrink_dcache_sb() by the number of dentries we
removed.

Maybe I'm confused, but the code sure looks more confused than I feel.

I would suggest keeping the same semantics for 'nr_dentry_unused'.
Dentries are unused whether they are on the "real" LRU list or have
been tagged with DCACHE_SHRINK_LIST. So moving from one list to the
other does nothing. It's the "list_del_init()" that should trigger
both 'nr_dentry_unused' and DCACHE_LRU_LIST bit-clearing.

In fact, maybe a helper function for _actually_ removing the thing
from all lists, and adding them back. Right now there are
"list_del_init()" and "list_add[_tail]()" calls sprinkled around in
random places, mixed up with the new "list_lru_add()".

Damn, the code is too confused. I have to go to a highschool parent
back-to-school meeting, so I won't get to this until maybe on a plane
tomorrow. Al, can you please give this a look?

Linus

2013-09-13 01:35:59

by Al Viro

[permalink] [raw]
Subject: Re: linux-next: manual merge of the akpm tree with Linus' tree

On Thu, Sep 12, 2013 at 06:12:24PM -0700, Linus Torvalds wrote:
> On Thu, Sep 12, 2013 at 5:56 PM, Linus Torvalds
> <[email protected]> wrote:
> >
> > I'll walk through the code, it looked suspicious. Maybe there's
> > something subtle that makes it work, but I don't see it.
>
> Btw, it's not just the DCACHE_LRU_LIST bit. The games with
> "nr_dentry_unused" look totally broken too. It's decremented in
> dentry_lru_isolate_shrink() for each dentry we remove, and then it is
> decremented *again* in shrink_dcache_sb() by the number of dentries we
> removed.
>
> Maybe I'm confused, but the code sure looks more confused than I feel.
>
> I would suggest keeping the same semantics for 'nr_dentry_unused'.
> Dentries are unused whether they are on the "real" LRU list or have
> been tagged with DCACHE_SHRINK_LIST. So moving from one list to the
> other does nothing. It's the "list_del_init()" that should trigger
> both 'nr_dentry_unused' and DCACHE_LRU_LIST bit-clearing.
>
> In fact, maybe a helper function for _actually_ removing the thing
> from all lists, and adding them back. Right now there are
> "list_del_init()" and "list_add[_tail]()" calls sprinkled around in
> random places, mixed up with the new "list_lru_add()".
>
> Damn, the code is too confused. I have to go to a highschool parent
> back-to-school meeting, so I won't get to this until maybe on a plane
> tomorrow. Al, can you please give this a look?

Will do...

2013-09-13 19:12:23

by Linus Torvalds

[permalink] [raw]
Subject: Re: linux-next: manual merge of the akpm tree with Linus' tree

On Thu, Sep 12, 2013 at 6:12 PM, Linus Torvalds
<[email protected]> wrote:
>
> Damn, the code is too confused. I have to go to a highschool parent
> back-to-school meeting, so I won't get to this until maybe on a plane
> tomorrow. Al, can you please give this a look?

I'm on a plane. I have gogo. Here's my current TOTALLY UNTESTED patch.

It tries to consolidate the dentry LRU stuff into a few helper
functions that right now have anal checking of the flags. Maybe I
overdid it, but the code was really confusing, and I think we got the
free dentry counts wrong, and the bits wrong too, so I tried to be
extra careful.

There are several cases:
- d_lru_add/del: fairly obvious
- d_lru_isolate: this is when the LRU callbacks ask us to remove the
entry from the list. This is different from d_lru_del() only in that
it uses the raw list removal, not the lru list helper function. I'm
not sure that's right, but that's what the code used to do.
- d_lru_shrink_move: move from the "global" lru list to a private shrinker list
- d_shrink_add/del: fairly obvious.

And then "denty_lru_add/del" that actually take the current state into
account and do the right thing. Those we had before, I'm just
explaining the difference from the low-level operations that have
fixed "from this state to that" semantics

Hmm?

Does it work? Who knows.. But *if* it works, I think it has a higher
chance of getting all the rules for bits and free object counting
right.

Somebody not in a plane please double-check my low-oxygen-pressure thinking..

Linus


Attachments:
patch.diff (6.25 kB)

2013-09-13 19:28:10

by Linus Torvalds

[permalink] [raw]
Subject: Re: linux-next: manual merge of the akpm tree with Linus' tree

On Fri, Sep 13, 2013 at 12:12 PM, Linus Torvalds
<[email protected]> wrote:
>
> Does it work? Who knows.. But *if* it works, I think it has a higher
> chance of getting all the rules for bits and free object counting
> right.
>
> Somebody not in a plane please double-check my low-oxygen-pressure thinking..

Ok, it seems to work for me, even when memory pressure causes the
dentries to be shrunk.

I still would like people (ie Al, in particular), to double-check everything.

Linus

2013-09-13 19:54:12

by Linus Torvalds

[permalink] [raw]
Subject: Re: linux-next: manual merge of the akpm tree with Linus' tree

On Fri, Sep 13, 2013 at 12:12 PM, Linus Torvalds
<[email protected]> wrote:
>
> - d_lru_isolate: this is when the LRU callbacks ask us to remove the
> entry from the list. This is different from d_lru_del() only in that
> it uses the raw list removal, not the lru list helper function. I'm
> not sure that's right, but that's what the code used to do.

Yes, verified. The LRU isolation functions are called with the lru
list lock held, so d_lru_isolate() basically just does the
list)_del_init() and our own state management.

Anyway, I fixed a typo in a comment (nd_dentry_unused ->
nr_dentry_unused) but the patch still looks good to me.

Linus

2013-09-13 20:00:08

by Al Viro

[permalink] [raw]
Subject: Re: linux-next: manual merge of the akpm tree with Linus' tree

On Fri, Sep 13, 2013 at 12:12:20PM -0700, Linus Torvalds wrote:

> It tries to consolidate the dentry LRU stuff into a few helper
> functions that right now have anal checking of the flags. Maybe I
> overdid it, but the code was really confusing, and I think we got the
> free dentry counts wrong, and the bits wrong too, so I tried to be
> extra careful.
>
> There are several cases:
> - d_lru_add/del: fairly obvious
> - d_lru_isolate: this is when the LRU callbacks ask us to remove the
> entry from the list. This is different from d_lru_del() only in that
> it uses the raw list removal, not the lru list helper function. I'm
> not sure that's right, but that's what the code used to do.

It is right - for one thing, we are holding the lock on that LRU list,
so list_lru_del() would deadlock right there. For another, the same
list_lru_walk (OK, list_lru_walk_node()) will do ->nr_items decrement
when we return LRU_REMOVED to it, so we don't want to do it twice.
Plain list_del_init() is correct here.

> - d_lru_shrink_move: move from the "global" lru list to a private shrinker list
> - d_shrink_add/del: fairly obvious.
>
> And then "denty_lru_add/del" that actually take the current state into
> account and do the right thing. Those we had before, I'm just
> explaining the difference from the low-level operations that have
> fixed "from this state to that" semantics

Looks sane; FWIW, the variant I'm playing with uses two independent
flags for "shrinker" and "per-sb", but AFAICS that doesn't yield better
code. Feel free to slap my acked-by on it.

2013-09-13 20:18:07

by Al Viro

[permalink] [raw]
Subject: Re: linux-next: manual merge of the akpm tree with Linus' tree

On Fri, Sep 13, 2013 at 09:00:00PM +0100, Al Viro wrote:
> > - d_lru_shrink_move: move from the "global" lru list to a private shrinker list
> > - d_shrink_add/del: fairly obvious.
> >
> > And then "denty_lru_add/del" that actually take the current state into
> > account and do the right thing. Those we had before, I'm just
> > explaining the difference from the low-level operations that have
> > fixed "from this state to that" semantics
>
> Looks sane; FWIW, the variant I'm playing with uses two independent
> flags for "shrinker" and "per-sb", but AFAICS that doesn't yield better
> code.

Actually, it does yield slightly better code... Look - if you take your
patch and replace LRU_LIST | SHRINK_LIST combination with bare SHRINK_LIST
(which can't occur right now). Then all transitions turn into flipping
a single bit, check in dentry_lru_add() becomes if (!(flags & (SHRINK | LRU))
and dentry_lru_del() -- if (... & SHRINK) return ...; if (... & LRU) return ...

It can be done as a followup, anyway - better not mix that with fixes.

2013-09-13 20:23:30

by Al Viro

[permalink] [raw]
Subject: Re: linux-next: manual merge of the akpm tree with Linus' tree

On Fri, Sep 13, 2013 at 09:18:03PM +0100, Al Viro wrote:
> On Fri, Sep 13, 2013 at 09:00:00PM +0100, Al Viro wrote:
> > > - d_lru_shrink_move: move from the "global" lru list to a private shrinker list
> > > - d_shrink_add/del: fairly obvious.
> > >
> > > And then "denty_lru_add/del" that actually take the current state into
> > > account and do the right thing. Those we had before, I'm just
> > > explaining the difference from the low-level operations that have
> > > fixed "from this state to that" semantics
> >
> > Looks sane; FWIW, the variant I'm playing with uses two independent
> > flags for "shrinker" and "per-sb", but AFAICS that doesn't yield better
> > code.
>
> Actually, it does yield slightly better code... Look - if you take your
> patch and replace LRU_LIST | SHRINK_LIST combination with bare SHRINK_LIST
> (which can't occur right now). Then all transitions turn into flipping
^^^
> a single bit,

Obviously not *and* lru->shrink is common ;-/ Anyway, any benefit from
microoptimizing that will drown in noise, so... nevermind that idea.

2013-09-13 20:25:51

by Linus Torvalds

[permalink] [raw]
Subject: Re: linux-next: manual merge of the akpm tree with Linus' tree

On Fri, Sep 13, 2013 at 4:00 PM, Al Viro <[email protected]> wrote:
\>
> It is right - for one thing, we are holding the lock on that LRU list,
> so list_lru_del() would deadlock right there. For another, the same
> list_lru_walk (OK, list_lru_walk_node()) will do ->nr_items decrement
> when we return LRU_REMOVED to it, so we don't want to do it twice.
> Plain list_del_init() is correct here.

Yes. And I found the opposite bug in one place: when we are collecting
dentries by walking the parents etc, we do *not* hold the global RCU
lock, so we cannot use the "d_lru_shrink_list()" thing after all. It's
correct as far as the internal logic of fs/dcache.c goes, but it
violates the global LRU list rules. So I replaced that with a
dentry_lru_del() followed by a d_shrink_add() instead.

Updated patch attached.

Linus


Attachments:
patch.diff (6.48 kB)

2013-09-13 20:31:41

by Linus Torvalds

[permalink] [raw]
Subject: Re: linux-next: manual merge of the akpm tree with Linus' tree

On Fri, Sep 13, 2013 at 4:25 PM, Linus Torvalds
<[email protected]> wrote:
>
> Yes. And I found the opposite bug in one place: when we are collecting
> dentries by walking the parents etc, we do *not* hold the global RCU
> lock, so we cannot use the "d_lru_shrink_list()" thing after all. It's
> correct as far as the internal logic of fs/dcache.c goes, but it
> violates the global LRU list rules. So I replaced that with a
> dentry_lru_del() followed by a d_shrink_add() instead.

Actually, I replaced it with d_lru_del()+d_shrink_add(), because afaik
we should be guaranteed that the dentry in question is on the global
RCU list (we checked that refcount is 0, and it's not on a local
list).

Agreed?

The patch I sent out already had that version, I just "documented" my first one.

Linus

2013-09-13 20:31:55

by Al Viro

[permalink] [raw]
Subject: Re: linux-next: manual merge of the akpm tree with Linus' tree

On Fri, Sep 13, 2013 at 04:25:48PM -0400, Linus Torvalds wrote:
> On Fri, Sep 13, 2013 at 4:00 PM, Al Viro <[email protected]> wrote:
> \>
> > It is right - for one thing, we are holding the lock on that LRU list,
> > so list_lru_del() would deadlock right there. For another, the same
> > list_lru_walk (OK, list_lru_walk_node()) will do ->nr_items decrement
> > when we return LRU_REMOVED to it, so we don't want to do it twice.
> > Plain list_del_init() is correct here.
>
> Yes. And I found the opposite bug in one place: when we are collecting
> dentries by walking the parents etc, we do *not* hold the global RCU
> lock,

??? LRU list lock, presumably?

so we cannot use the "d_lru_shrink_list()" thing after all. It's
> correct as far as the internal logic of fs/dcache.c goes, but it
> violates the global LRU list rules. So I replaced that with a
> dentry_lru_del() followed by a d_shrink_add() instead.

2013-09-13 20:34:39

by Linus Torvalds

[permalink] [raw]
Subject: Re: linux-next: manual merge of the akpm tree with Linus' tree

On Fri, Sep 13, 2013 at 4:31 PM, Al Viro <[email protected]> wrote:
> On Fri, Sep 13, 2013 at 04:25:48PM -0400, Linus Torvalds wrote:
>>
>> Yes. And I found the opposite bug in one place: when we are collecting
>> dentries by walking the parents etc, we do *not* hold the global RCU
>> lock,
>
> ??? LRU list lock, presumably?

Heh, yes. You wouldn't believe how many times I mis-typed "lru" as
"rcu" while doing that patch.

Much of the other dentry work this merge window was all about rcu, so
my fingers end up being confused..

Linus