LinuxLists.cc - [PATCH 0/5 -tip] umount

[permalink] [raw]

Subject: [PATCH 1/5 -tip] 9p: umount_begin BKL pushdown

Signed-off-by: Alessio Igor Bogani <[email protected]>
---
fs/9p/vfs_super.c | 3 +++
1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/fs/9p/vfs_super.c b/fs/9p/vfs_super.c
index 5f8ab8a..2f64462 100644
--- a/fs/9p/vfs_super.c
+++ b/fs/9p/vfs_super.c
@@ -37,6 +37,7 @@
#include <linux/mount.h>
#include <linux/idr.h>
#include <linux/sched.h>
+#include <linux/smp_lock.h>
#include <net/9p/9p.h>
#include <net/9p/client.h>

@@ -232,7 +233,9 @@ v9fs_umount_begin(struct super_block *sb)
{
struct v9fs_session_info *v9ses = sb->s_fs_info;

+ lock_kernel();
v9fs_session_cancel(v9ses);
+ unlock_kernel();
}

static const struct super_operations v9fs_super_ops = {
--
1.6.0.4

2009-04-23 19:13:14

[permalink] [raw]

Subject: [PATCH 2/5 -tip] cifs: umount_begin BKL pushdown

Signed-off-by: Alessio Igor Bogani <[email protected]>
---
fs/cifs/cifsfs.c | 13 +++++++++++--
1 files changed, 11 insertions(+), 2 deletions(-)

diff --git a/fs/cifs/cifsfs.c b/fs/cifs/cifsfs.c
index 38491fd..9517ae1 100644
--- a/fs/cifs/cifsfs.c
+++ b/fs/cifs/cifsfs.c
@@ -35,6 +35,7 @@
#include <linux/delay.h>
#include <linux/kthread.h>
#include <linux/freezer.h>
+#include <linux/smp_lock.h>
#include "cifsfs.h"
#include "cifspdu.h"
#define DECLARE_GLOBALS_HERE
@@ -525,18 +526,26 @@ static void cifs_umount_begin(struct super_block *sb)
struct cifs_sb_info *cifs_sb = CIFS_SB(sb);
struct cifsTconInfo *tcon;

- if (cifs_sb == NULL)
+ lock_kernel();
+
+ if (cifs_sb == NULL) {
+ unlock_kernel();
return;
+ }

tcon = cifs_sb->tcon;
- if (tcon == NULL)
+ if (tcon == NULL) {
+ unlock_kernel();
return;
+ }

read_lock(&cifs_tcp_ses_lock);
if (tcon->tc_count == 1)
tcon->tidStatus = CifsExiting;
read_unlock(&cifs_tcp_ses_lock);

+ unlock_kernel();
+
/* cancel_brl_requests(tcon); */ /* BB mark all brl mids as exiting */
/* cancel_notify_requests(tcon); */
if (tcon->ses && tcon->ses->server) {
--
1.6.0.4

2009-04-23 19:13:33

[permalink] [raw]

Subject: [PATCH 5/5 -tip] vfs: Don-t call umount_begin with BKL held

Every filesystem should acquire BKL or provide an adeguate locking
mechanism.

Signed-off-by: Alessio Igor Bogani <[email protected]>
---
fs/namespace.c | 2 --
1 files changed, 0 insertions(+), 2 deletions(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index c6f54e4..f046f4b 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -1073,9 +1073,7 @@ static int do_umount(struct vfsmount *mnt, int flags)
*/

if (flags & MNT_FORCE && sb->s_op->umount_begin) {
- lock_kernel();
sb->s_op->umount_begin(sb);
- unlock_kernel();
}

/*
--
1.6.0.4

2009-04-23 19:13:48

[permalink] [raw]

Subject: [PATCH 3/5 -tip] fuse: umount_begin BKL pushdown

Signed-off-by: Alessio Igor Bogani <[email protected]>
---
fs/fuse/inode.c | 3 +++
1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
index 459b73d..d1bc4d3 100644
--- a/fs/fuse/inode.c
+++ b/fs/fuse/inode.c
@@ -19,6 +19,7 @@
#include <linux/random.h>
#include <linux/sched.h>
#include <linux/exportfs.h>
+#include <linux/smp_lock.h>

MODULE_AUTHOR("Miklos Szeredi <[email protected]>");
MODULE_DESCRIPTION("Filesystem in Userspace");
@@ -259,7 +260,9 @@ struct inode *fuse_iget(struct super_block *sb, u64 nodeid,

static void fuse_umount_begin(struct super_block *sb)
{
+ lock_kernel();
fuse_abort_conn(get_fuse_conn_super(sb));
+ unlock_kernel();
}

static void fuse_send_destroy(struct fuse_conn *fc)
--
1.6.0.4

2009-04-23 19:14:13

[permalink] [raw]

Subject: [PATCH 4/5 -tip] nfs: umount_begin BKL pushdown

Signed-off-by: Alessio Igor Bogani <[email protected]>
---
fs/nfs/super.c | 7 ++++++-
1 files changed, 6 insertions(+), 1 deletions(-)

diff --git a/fs/nfs/super.c b/fs/nfs/super.c
index 6717200..1679a16 100644
--- a/fs/nfs/super.c
+++ b/fs/nfs/super.c
@@ -683,9 +683,12 @@ static int nfs_show_stats(struct seq_file *m, struct vfsmount *mnt)
*/
static void nfs_umount_begin(struct super_block *sb)
{
- struct nfs_server *server = NFS_SB(sb);
+ struct nfs_server *server;
struct rpc_clnt *rpc;

+ lock_kernel();
+
+ server = NFS_SB(sb);
/* -EIO all pending I/O */
rpc = server->client_acl;
if (!IS_ERR(rpc))
@@ -693,6 +696,8 @@ static void nfs_umount_begin(struct super_block *sb)
rpc = server->client;
if (!IS_ERR(rpc))
rpc_killall_tasks(rpc);
+
+ unlock_kernel();
}

/*
--
1.6.0.4

2009-04-23 19:15:36

[permalink] [raw]

Subject: Re: [PATCH 2/5 -tip] cifs: umount_begin BKL pushdown

On Thu, Apr 23, 2009 at 09:12:02PM +0200, Alessio Igor Bogani wrote:
> struct cifs_sb_info *cifs_sb = CIFS_SB(sb);
> struct cifsTconInfo *tcon;
>
> - if (cifs_sb == NULL)
> + lock_kernel();
> +
> + if (cifs_sb == NULL) {
> + unlock_kernel();
> return;
> + }

Huh? Why would a bleeding *local* *variable* care about BKL at all?

> tcon = cifs_sb->tcon;
> - if (tcon == NULL)
> + if (tcon == NULL) {
> + unlock_kernel();
> return;
> + }
>
> read_lock(&cifs_tcp_ses_lock);
> if (tcon->tc_count == 1)
> tcon->tidStatus = CifsExiting;
> read_unlock(&cifs_tcp_ses_lock);
>
> + unlock_kernel();
> +
> /* cancel_brl_requests(tcon); */ /* BB mark all brl mids as exiting */
> /* cancel_notify_requests(tcon); */

... and why would the rest of it *not* care?

> if (tcon->ses && tcon->ses->server) {
> --
> 1.6.0.4
>

2009-04-23 19:18:38

[permalink] [raw]

Subject: Re: [PATCH 0/5 -tip] umount_begin BKL pushdown

On Thu, Apr 23, 2009 at 09:12:00PM +0200, Alessio Igor Bogani wrote:
> Push the BKL acquisition from vfs to every specific filesystems
> with hope that it can be eliminated in a second moment.
>
> The first 4 patches add BKL lock into umount_begin() functions (for
> the filesystems that have this handler). The last one remove
> lock_kernel()/unlock_kernel() from fs/namespace.c (the only point
> that invoke umount_begin() funtcions).

I'd rather collapse all these patches together; no point doing that
per-fs (for all 4 of them). And CIFS side is bogus.

Another thing: -tip is no place for that. I can put that into VFS
tree, provided that comments above are dealt with.

2009-04-23 19:19:51

by Matthew Wilcox

[permalink] [raw]

Subject: Re: [PATCH 2/5 -tip] cifs: umount_begin BKL pushdown

On Thu, Apr 23, 2009 at 09:12:02PM +0200, Alessio Igor Bogani wrote:
> @@ -525,18 +526,26 @@ static void cifs_umount_begin(struct super_block *sb)
> struct cifs_sb_info *cifs_sb = CIFS_SB(sb);
> struct cifsTconInfo *tcon;
>
> - if (cifs_sb == NULL)
> + lock_kernel();
> +
> + if (cifs_sb == NULL) {
> + unlock_kernel();
> return;

OK, what are you doing here? Either the BKL protects the sb here,
and you need to lock_kernel() before calling CIFS_SB(sb), or the BKL
protects nothing, and the lock_kernel() should be moved down below the
test for cifs_sb.

> + }
>
> tcon = cifs_sb->tcon;
> - if (tcon == NULL)
> + if (tcon == NULL) {
> + unlock_kernel();
> return;
> + }

Likewise -- does cifs rely on the BKL to protect the 'tcon' or not?

--
Matthew Wilcox Intel Open Source Technology Centre
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours. We can't possibly take such
a retrograde step."

2009-04-23 21:33:20

[permalink] [raw]

Subject: Re: [PATCH 0/5 -tip] umount_begin BKL pushdown

* Al Viro <[email protected]> wrote:

> On Thu, Apr 23, 2009 at 09:12:00PM +0200, Alessio Igor Bogani wrote:
> > Push the BKL acquisition from vfs to every specific filesystems
> > with hope that it can be eliminated in a second moment.
> >
> > The first 4 patches add BKL lock into umount_begin() functions
> > (for the filesystems that have this handler). The last one
> > remove lock_kernel()/unlock_kernel() from fs/namespace.c (the
> > only point that invoke umount_begin() funtcions).
>
> I'd rather collapse all these patches together; no point doing
> that per-fs (for all 4 of them). And CIFS side is bogus.
>
> Another thing: -tip is no place for that. I can put that into VFS
> tree, provided that comments above are dealt with.

When that happens, could you please put it into a separate,
append-only branch i could pull (after some initial test-time) into
tip:kill-the-BKL?

Thanks,

Ingo

2009-04-24 01:58:14

by Stephen Rothwell

[permalink] [raw]

Subject: Re: [PATCH 0/5 -tip] umount_begin BKL pushdown

Hi all,

On Thu, 23 Apr 2009 23:32:49 +0200 Ingo Molnar <[email protected]> wrote:
>
> * Al Viro <[email protected]> wrote:
> >
> > Another thing: -tip is no place for that. I can put that into VFS
> > tree, provided that comments above are dealt with.
>
> When that happens, could you please put it into a separate,
> append-only branch i could pull (after some initial test-time) into
> tip:kill-the-BKL?

And just wondering how all this relates to Jonathan's (currently empty)
bkl-removal tree (that is merged into linux-next)?

--
Cheers,
Stephen Rothwell [email protected]
http://www.canb.auug.org.au/~sfr/

Attachments:

(No filename) (643.00 B)
(No filename) (197.00 B)
Download all attachments

2009-04-24 07:07:26

[permalink] [raw]

Subject: [PATCH 0/1] vfs: umount_begin BKL pushdown v2

Push the BKL acquisition from vfs to every specific filesystems
with hope that it can be eliminated in a second moment.

Filesystems, which support umount_begin(), changed by this patch
are 9p, nfs, cifs and fuse.

Changes:
Collapsed all patches into only one as requested by Al Viro.

Moved CIFS_SB() down into BKL protected zone as requested by Matthew Wilcox.

It is hard to say if tcon (into cifs's umount_begin() function) should be
protected by BKL. Up to now umount_begin() is always called with BKL held
so in uncertainty I maintain same logic. So I moved unlock_kernel() to at
the bottom of that function as requested by Al Viro and Matthew Wilcox.

Notes:
About cifs's umount_begin() function I suspect that a deadlock or circular
locking dependency can happens into kill-the-BKL tree if that patch is applied.

2009-04-24 07:07:45

[permalink] [raw]

Subject: [PATCH 1/1] vfs: umount_begin BKL pushdown v2

Signed-off-by: Alessio Igor Bogani <[email protected]>
---
fs/9p/vfs_super.c | 6 +++++-
fs/cifs/cifsfs.c | 15 ++++++++++++---
fs/fuse/inode.c | 3 +++
fs/namespace.c | 2 --
fs/nfs/super.c | 7 ++++++-
5 files changed, 26 insertions(+), 7 deletions(-)

diff --git a/fs/9p/vfs_super.c b/fs/9p/vfs_super.c
index 5f8ab8a..7d23214 100644
--- a/fs/9p/vfs_super.c
+++ b/fs/9p/vfs_super.c
@@ -37,6 +37,7 @@
#include <linux/mount.h>
#include <linux/idr.h>
#include <linux/sched.h>
+#include <linux/smp_lock.h>
#include <net/9p/9p.h>
#include <net/9p/client.h>

@@ -230,9 +231,12 @@ static int v9fs_show_options(struct seq_file *m, struct vfsmount *mnt)
static void
v9fs_umount_begin(struct super_block *sb)
{
- struct v9fs_session_info *v9ses = sb->s_fs_info;
+ struct v9fs_session_info *v9ses;

+ lock_kernel();
+ v9ses = sb->s_fs_info;
v9fs_session_cancel(v9ses);
+ unlock_kernel();
}

static const struct super_operations v9fs_super_ops = {
diff --git a/fs/cifs/cifsfs.c b/fs/cifs/cifsfs.c
index 0d6d8b5..eef568c 100644
--- a/fs/cifs/cifsfs.c
+++ b/fs/cifs/cifsfs.c
@@ -35,6 +35,7 @@
#include <linux/delay.h>
#include <linux/kthread.h>
#include <linux/freezer.h>
+#include <linux/smp_lock.h>
#include "cifsfs.h"
#include "cifspdu.h"
#define DECLARE_GLOBALS_HERE
@@ -520,15 +521,22 @@ static struct quotactl_ops cifs_quotactl_ops = {

static void cifs_umount_begin(struct super_block *sb)
{
- struct cifs_sb_info *cifs_sb = CIFS_SB(sb);
+ struct cifs_sb_info *cifs_sb;
struct cifsTconInfo *tcon;

- if (cifs_sb == NULL)
+ lock_kernel();
+ cifs_sb = CIFS_SB(sb);
+
+ if (cifs_sb == NULL) {
+ unlock_kernel();
return;
+ }

tcon = cifs_sb->tcon;
- if (tcon == NULL)
+ if (tcon == NULL) {
+ unlock_kernel();
return;
+ }

read_lock(&cifs_tcp_ses_lock);
if (tcon->tc_count == 1)
@@ -548,6 +556,7 @@ static void cifs_umount_begin(struct super_block *sb)
}
/* BB FIXME - finish add checks for tidStatus BB */

+ unlock_kernel();
return;
}

diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
index 459b73d..d1bc4d3 100644
--- a/fs/fuse/inode.c
+++ b/fs/fuse/inode.c
@@ -19,6 +19,7 @@
#include <linux/random.h>
#include <linux/sched.h>
#include <linux/exportfs.h>
+#include <linux/smp_lock.h>

MODULE_AUTHOR("Miklos Szeredi <[email protected]>");
MODULE_DESCRIPTION("Filesystem in Userspace");
@@ -259,7 +260,9 @@ struct inode *fuse_iget(struct super_block *sb, u64 nodeid,

static void fuse_umount_begin(struct super_block *sb)
{
+ lock_kernel();
fuse_abort_conn(get_fuse_conn_super(sb));
+ unlock_kernel();
}

static void fuse_send_destroy(struct fuse_conn *fc)
diff --git a/fs/namespace.c b/fs/namespace.c
index 4119620..0d2003f 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -1073,9 +1073,7 @@ static int do_umount(struct vfsmount *mnt, int flags)
*/

if (flags & MNT_FORCE && sb->s_op->umount_begin) {
- lock_kernel();
sb->s_op->umount_begin(sb);
- unlock_kernel();
}

/*
diff --git a/fs/nfs/super.c b/fs/nfs/super.c
index 6717200..1679a16 100644
--- a/fs/nfs/super.c
+++ b/fs/nfs/super.c
@@ -683,9 +683,12 @@ static int nfs_show_stats(struct seq_file *m, struct vfsmount *mnt)
*/
static void nfs_umount_begin(struct super_block *sb)
{
- struct nfs_server *server = NFS_SB(sb);
+ struct nfs_server *server;
struct rpc_clnt *rpc;

+ lock_kernel();
+
+ server = NFS_SB(sb);
/* -EIO all pending I/O */
rpc = server->client_acl;
if (!IS_ERR(rpc))
@@ -693,6 +696,8 @@ static void nfs_umount_begin(struct super_block *sb)
rpc = server->client;
if (!IS_ERR(rpc))
rpc_killall_tasks(rpc);
+
+ unlock_kernel();
}

/*
--
1.6.0.4

2009-04-24 07:13:36

[permalink] [raw]

Subject: Re: [PATCH 1/1] vfs: umount_begin BKL pushdown v2

On Fri, Apr 24, 2009 at 09:06:53AM +0200, Alessio Igor Bogani wrote:
> static void cifs_umount_begin(struct super_block *sb)
> {
> - struct cifs_sb_info *cifs_sb = CIFS_SB(sb);
> + struct cifs_sb_info *cifs_sb;
> struct cifsTconInfo *tcon;
>
> - if (cifs_sb == NULL)
> + lock_kernel();
> + cifs_sb = CIFS_SB(sb);
> +
> + if (cifs_sb == NULL) {
> + unlock_kernel();
> return;
> + }
>
> tcon = cifs_sb->tcon;
> - if (tcon == NULL)
> + if (tcon == NULL) {
> + unlock_kernel();
> return;
> + }

AFAICS, both CIFS_SB(sb) and ->tcon are assign-once, so lock_kernel() should
really go here (if it can't be removed completely, of course, but that's up
to CIFS folks). Applied with such modification.

2009-04-24 07:15:52

[permalink] [raw]

Subject: Re: [PATCH 1/1] vfs: umount_begin BKL pushdown v2

On Fri, Apr 24, 2009 at 08:13:12AM +0100, Al Viro wrote:
> On Fri, Apr 24, 2009 at 09:06:53AM +0200, Alessio Igor Bogani wrote:
> > static void cifs_umount_begin(struct super_block *sb)
> > {
> > - struct cifs_sb_info *cifs_sb = CIFS_SB(sb);
> > + struct cifs_sb_info *cifs_sb;
> > struct cifsTconInfo *tcon;
> >
> > - if (cifs_sb == NULL)
> > + lock_kernel();
> > + cifs_sb = CIFS_SB(sb);
> > +
> > + if (cifs_sb == NULL) {
> > + unlock_kernel();
> > return;
> > + }
> >
> > tcon = cifs_sb->tcon;
> > - if (tcon == NULL)
> > + if (tcon == NULL) {
> > + unlock_kernel();
> > return;
> > + }
>
> AFAICS, both CIFS_SB(sb) and ->tcon are assign-once, so lock_kernel() should
> really go here (if it can't be removed completely, of course, but that's up
> to CIFS folks). Applied with such modification.

PS: I suspect that checks for NULL are actually "what if kernel memory got
corrupted", but I'm too lazy to verify that at the moment; again, up to
CIFS folks.

2009-04-24 07:19:16

[permalink] [raw]

Subject: Re: [PATCH 1/1] vfs: umount_begin BKL pushdown v2

On Fri, Apr 24, 2009 at 08:13:12AM +0100, Al Viro wrote:

> AFAICS, both CIFS_SB(sb) and ->tcon are assign-once, so lock_kernel() should
> really go here (if it can't be removed completely, of course, but that's up
> to CIFS folks). Applied with such modification.

commit 208f6be8f9244f4a3e8de7b4c6ca97069698303a in
git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6.git/

if you want to see the version after that change (or wait for linux-next
to pick it).

2009-04-24 07:41:26

[permalink] [raw]

Subject: Re: [PATCH 1/1] vfs: umount_begin BKL pushdown v2

2009/4/24 Al Viro <[email protected]>:
[...]
> commit 208f6be8f9244f4a3e8de7b4c6ca97069698303a in
> git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6.git/
>
> if you want to see the version after that change (or wait for linux-next
> to pick it).

Thank Sir.

Ciao,
Alessio

2009-04-24 08:07:18

[permalink] [raw]

Subject: Re: [PATCH 1/1] vfs: umount_begin BKL pushdown v2

* Al Viro <[email protected]> wrote:

> On Fri, Apr 24, 2009 at 08:13:12AM +0100, Al Viro wrote:
>
> > AFAICS, both CIFS_SB(sb) and ->tcon are assign-once, so lock_kernel() should
> > really go here (if it can't be removed completely, of course, but that's up
> > to CIFS folks). Applied with such modification.
>
> commit 208f6be8f9244f4a3e8de7b4c6ca97069698303a in
> git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6.git/
>
> if you want to see the version after that change (or wait for
> linux-next to pick it).

You've not replied to my request (attached below) to put these
trivial BKL-pushdown bits into a separate branch/tree and not into
the VFS tree. You've now mixed that commit with other VFS changes.

Had it been in a separate branch, and had we tested it, Linus could
have pulled the trivial BKL pushdown bits out of normal merge order
as well. That is not possible now.

Furthermore, by doing this you are also hindering the
tip:kill-the-BKL effort (which has been ongoing for a year chipping
away at various BKL details) which facilitated these changes.
Alessio did these fixes to fix bugs he can trigger in that tree.

You've also not explained why you have done it this way. It would
cost you almost nothing to apply these bits into a separate branch
and merge that branch into your main tree. Lots of other maintainer
are doing that.

So if you've done this by mistake, i'd like to ask you to reconsider
and put these bits into a separate, stable-commit-ID branch. If
you've done this intentionally, i'd like you to explain the reasons
for it, instead of just doing it silently without explanation.

Anwyay, if there's no resolution, i'll apply Alessio's fixes with a
different commit ID, to not hold up the rather useful work that is
going on in the kill-the-BKL tree. Later on i'll have to rebase that
portion of the tree to avoid duplicate commit IDs. I just wanted to
put it on the record why i have to do that rebase.

Thanks,

Ingo

----- Forwarded message from Ingo Molnar <[email protected]> -----

Date: Thu, 23 Apr 2009 23:32:49 +0200
From: Ingo Molnar <[email protected]>
To: Al Viro <[email protected]>
Subject: Re: [PATCH 0/5 -tip] umount_begin BKL pushdown
Cc: Alessio Igor Bogani <[email protected]>,
Jonathan Corbet <[email protected]>,
Fr??d??ric Weisbecker <[email protected]>,
Peter Zijlstra <[email protected]>,
LKML <[email protected]>,
LFSDEV <[email protected]>

* Al Viro <[email protected]> wrote:

> On Thu, Apr 23, 2009 at 09:12:00PM +0200, Alessio Igor Bogani wrote:
> > Push the BKL acquisition from vfs to every specific filesystems
> > with hope that it can be eliminated in a second moment.
> >
> > The first 4 patches add BKL lock into umount_begin() functions
> > (for the filesystems that have this handler). The last one
> > remove lock_kernel()/unlock_kernel() from fs/namespace.c (the
> > only point that invoke umount_begin() funtcions).
>
> I'd rather collapse all these patches together; no point doing
> that per-fs (for all 4 of them). And CIFS side is bogus.
>
> Another thing: -tip is no place for that. I can put that into VFS
> tree, provided that comments above are dealt with.

When that happens, could you please put it into a separate,
append-only branch i could pull (after some initial test-time) into
tip:kill-the-BKL?

Thanks,

Ingo

2009-04-24 08:48:44

[permalink] [raw]

Subject: Re: [PATCH 1/1] vfs: umount_begin BKL pushdown v2

On Fri, Apr 24, 2009 at 08:15:36AM +0100, Al Viro wrote:
> On Fri, Apr 24, 2009 at 08:13:12AM +0100, Al Viro wrote:
> > On Fri, Apr 24, 2009 at 09:06:53AM +0200, Alessio Igor Bogani wrote:
> > > static void cifs_umount_begin(struct super_block *sb)
> > > {
> > > - struct cifs_sb_info *cifs_sb = CIFS_SB(sb);
> > > + struct cifs_sb_info *cifs_sb;
> > > struct cifsTconInfo *tcon;
> > >
> > > - if (cifs_sb == NULL)
> > > + lock_kernel();
> > > + cifs_sb = CIFS_SB(sb);
> > > +
> > > + if (cifs_sb == NULL) {
> > > + unlock_kernel();
> > > return;
> > > + }
> > >
> > > tcon = cifs_sb->tcon;
> > > - if (tcon == NULL)
> > > + if (tcon == NULL) {
> > > + unlock_kernel();
> > > return;
> > > + }
> >
> > AFAICS, both CIFS_SB(sb) and ->tcon are assign-once, so lock_kernel() should
> > really go here (if it can't be removed completely, of course, but that's up
> > to CIFS folks). Applied with such modification.
>
> PS: I suspect that checks for NULL are actually "what if kernel memory got
> corrupted", but I'm too lazy to verify that at the moment; again, up to
> CIFS folks.

NULL checks are superflous here, but that should be a separate patch.

2009-04-24 08:50:38

[permalink] [raw]

Subject: Re: [PATCH 1/1] vfs: umount_begin BKL pushdown v2

On Fri, Apr 24, 2009 at 10:06:34AM +0200, Ingo Molnar wrote:
>
> You've not replied to my request (attached below) to put these
> trivial BKL-pushdown bits into a separate branch/tree and not into
> the VFS tree. You've now mixed that commit with other VFS changes.
>
> Had it been in a separate branch, and had we tested it, Linus could
> have pulled the trivial BKL pushdown bits out of normal merge order
> as well. That is not possible now.

It shouldn't be pushed out of order. It's a normal VFS locking change
and should be pushed with the next VFS push for 2.6.31.

> Furthermore, by doing this you are also hindering the
> tip:kill-the-BKL effort (which has been ongoing for a year chipping
> away at various BKL details) which facilitated these changes.
> Alessio did these fixes to fix bugs he can trigger in that tree.
>
> You've also not explained why you have done it this way. It would
> cost you almost nothing to apply these bits into a separate branch
> and merge that branch into your main tree. Lots of other maintainer
> are doing that.

Having a separate kill the BKL tree is a stupid idea. Locking changes
need deep subsystem knowledge and should always go through the subsystem
trees.

2009-04-24 09:16:33

by Frederic Weisbecker

[permalink] [raw]

Subject: Re: [PATCH 1/1] vfs: umount_begin BKL pushdown v2

2009/4/24 Christoph Hellwig <[email protected]>:
>> Furthermore, by doing this you are also hindering the
>> tip:kill-the-BKL effort (which has been ongoing for a year chipping
>> away at various BKL details) which facilitated these changes.
>> Alessio did these fixes to fix bugs he can trigger in that tree.
>>
>> You've also not explained why you have done it this way. It would
>> cost you almost nothing to apply these bits into a separate branch
>> and merge that branch into your main tree. Lots of other maintainer
>> are doing that.
>
> Having a separate kill the BKL tree is a stupid idea. Locking changes
> need deep subsystem knowledge and should always go through the subsystem
> trees.

I disagree with you. The kill-the-BKL tree does not only aggregate patches
to turn the BKL into more traditional locks. The Bkl has been
converted to a common mutex in
this tree, making it losing its common horrid properties:

- release/reacquire on schedule
- not preemptable
- can be reacquired recursively by a same task

Such a basis is very useful because we can easily find these places
which won't support a usual lock conversion without reworking the
locking scheme.
This is a necessary preliminary for the Bkl removal.
All the places which have been designed very tightly with Bkl
properties are rapidly detected
with lockdep in this tree and reworked, still using lockdep, code
reviewing and the help of
this Bkl-to-mutex conversion.

The work done with this tree can be merged inside and also on the
matching subsytem tree for
each patchset. That's a very sane workflow IMHO.

Thanks,
Frederic.

2009-04-24 13:58:40

[permalink] [raw]

Subject: Re: [PATCH 1/1] vfs: umount_begin BKL pushdown v2

On Fri, Apr 24, 2009 at 10:06:34AM +0200, Ingo Molnar wrote:

> You've not replied to my request (attached below) to put these
> trivial BKL-pushdown bits into a separate branch/tree and not into
> the VFS tree. You've now mixed that commit with other VFS changes.
>
> Had it been in a separate branch, and had we tested it, Linus could
> have pulled the trivial BKL pushdown bits out of normal merge order
> as well. That is not possible now.
>
> Furthermore, by doing this you are also hindering the
> tip:kill-the-BKL effort (which has been ongoing for a year chipping
> away at various BKL details) which facilitated these changes.
> Alessio did these fixes to fix bugs he can trigger in that tree.
>
> You've also not explained why you have done it this way. It would
> cost you almost nothing to apply these bits into a separate branch
> and merge that branch into your main tree. Lots of other maintainer
> are doing that.
>
> So if you've done this by mistake, i'd like to ask you to reconsider
> and put these bits into a separate, stable-commit-ID branch. If
> you've done this intentionally, i'd like you to explain the reasons
> for it, instead of just doing it silently without explanation.
>
> Anwyay, if there's no resolution, i'll apply Alessio's fixes with a
> different commit ID, to not hold up the rather useful work that is
> going on in the kill-the-BKL tree. Later on i'll have to rebase that
> portion of the tree to avoid duplicate commit IDs. I just wanted to
> put it on the record why i have to do that rebase.

Good grief... You have the commit ID, git fetch + git-cherry-pick would
take two lines to type instead of more than a screenful.

This patch is certainly trivial enough to go into the mainline at any point.
Including "right now". However, the stuff to follow it might get more
convoluted and I wouldn't argue for pushing it before the next merge window.
It's not just the "push BKL down there" - that I could simply do right
now and ACK pushing it to Linus/push myself. Unless I'm mistaken, you
want to pull the subsequent "remove BKL in foofs" bits as well and those
are almost certainly going to get entangled with other stuff.

I'm not particulary against a separate branch for all that stuff (including
the remount changes that'll be needed, etc.). The question is, what merge
time are you aiming for and how much VFS stuff are you willing to tolerate
in that branch?

Details, please...

2009-04-24 14:31:41

by Jonathan Corbet

[permalink] [raw]

Subject: Re: [PATCH 0/5 -tip] umount_begin BKL pushdown

On Fri, 24 Apr 2009 11:57:44 +1000
Stephen Rothwell <[email protected]> wrote:

> > When that happens, could you please put it into a separate,
> > append-only branch i could pull (after some initial test-time) into
> > tip:kill-the-BKL?
>
> And just wondering how all this relates to Jonathan's (currently empty)
> bkl-removal tree (that is merged into linux-next)?

I set up my tree after a request from Linus; it was meant as a place
for near-term BKL-removal stuff to accumulate. In the last couple of
cycles, it's only had a small trickle of stuff that I've been able to
do myself. Ingo's tree is a rather grander vision, including a bunch
of debugging stuff and more.

The time for my tree may be nearing its end, but I'd just as soon keep
it around for a little longer. I may yet find some time to do some
more work in this area.

jon

2009-04-24 17:51:13

[permalink] [raw]

Subject: Re: [PATCH 1/1] vfs: umount_begin BKL pushdown v2

On Fri, Apr 24, 2009 at 11:16:18AM +0200, Fr?d?ric Weisbecker wrote:
> I disagree with you. The kill-the-BKL tree does not only aggregate patches
> to turn the BKL into more traditional locks. The Bkl has been
> converted to a common mutex in
> this tree, making it losing its common horrid properties:
>
> - release/reacquire on schedule
> - not preemptable
> - can be reacquired recursively by a same task
>
> Such a basis is very useful because we can easily find these places
> which won't support a usual lock conversion without reworking the
> locking scheme.
> This is a necessary preliminary for the Bkl removal.
> All the places which have been designed very tightly with Bkl
> properties are rapidly detected
> with lockdep in this tree and reworked, still using lockdep, code
> reviewing and the help of
> this Bkl-to-mutex conversion.
>
> The work done with this tree can be merged inside and also on the
> matching subsytem tree for
> each patchset. That's a very sane workflow IMHO.

Having a working tree for debugging stuff is fine, but the point is
that it should never be pulled into mainline and probably frequently
reabsed to avoid cruft. In that case there's really no point in
creating branches to share pieces of tree history, just apply the patch
locally if you think you want it and merge or rebase once mainline gets
the patch.

Al frequently rebases the vfs tree, btw - so even if it was a separate
branch now there's a fair chance it would end up in mainline with a
different commit id.

2009-04-24 18:55:49

[permalink] [raw]

Subject: Re: [PATCH 1/1] vfs: umount_begin BKL pushdown v2

On Fri, Apr 24, 2009 at 01:50:25PM -0400, Christoph Hellwig wrote:

> Having a working tree for debugging stuff is fine, but the point is
> that it should never be pulled into mainline and probably frequently
> reabsed to avoid cruft. In that case there's really no point in
> creating branches to share pieces of tree history, just apply the patch
> locally if you think you want it and merge or rebase once mainline gets
> the patch.
>
> Al frequently rebases the vfs tree, btw - so even if it was a separate
> branch now there's a fair chance it would end up in mainline with a
> different commit id.

Nah, it's not that. I can hold that in a separate branch and keep it
anchored. The question is, what else will end up there?
* the work inside the methods on BKL _removal_
* things like merging that ->write_super() call into ->put_super(),
etc.
* probably parts of work on s_flags mess and ro (tied to remout)

I agree that getting rid of BKL in that area is a good thing; no arguments
about that. If it had been entirely self-contained, I'd gladly drop that
stuff into a separate branch, let mingo pull it and forgot about the entire
thing.

The things get tricky, though, since we have two more things in the same
area: remount (once Nick comes back with the latest on mnt_write_count,
I'm going to merge that and start on per-sb side, BTW) and stuff around
Jan's sync series.

So let's figure out how do we do that. I have no problem with a single
branch for *all* of that, separate from the rest of VFS stuff. However,
I very much suspect that it's not what mingo et.al. have in mind - too
much stuff alien for them. I can keep a cherry-picked branch with minimal
BKL-affecting backports from that one. It might or might not be OK,
depending on what the hell their workflow is in -tip. I honestly have
no idea how the devil the things are done there, except that it apparently
involves much more merges than I'd be comfortable with, but then I never
had a taste for literal clusterf*cks either.

Could the folks from the other side tell
* what kind of patches do they want in that branch
* what kind of patches can they accept in that branch
* when do they intend to see it merged into mainline
* how much is going to be merged on top of that and how often
(if ever) is it going to be thrown out and re-pulled. I.e. is that for
a devel/debugging tree pulled together from many topic branches on
regular basis, with branches dropped/re-added/etc. (i.e. something a-la
linux-next) or is that something more cast in stone?

Seriously, let's sort that out; flamefests being what they are, there's
a real problem with keeping two streams of development tolerable for
participants. I *do* have very unkind words to say to Ingo, but that's
a matter for private mail and I'm not going to let that anywhere near
development question.

2009-04-24 19:02:24

[permalink] [raw]

Subject: Re: [PATCH 1/1] vfs: umount_begin BKL pushdown v2

On Fri, Apr 24, 2009 at 07:55:24PM +0100, Al Viro wrote:
> Nah, it's not that. I can hold that in a separate branch and keep it
> anchored. The question is, what else will end up there?
> * the work inside the methods on BKL _removal_
> * things like merging that ->write_super() call into ->put_super(),
> etc.
> * probably parts of work on s_flags mess and ro (tied to remout)

* moving lock_super from callers into ->write_super and
->remount_fs. No need to only push one lock down when we
touch them anyway.

2009-04-24 20:43:41

[permalink] [raw]

Subject: Re: [PATCH 1/1] vfs: umount_begin BKL pushdown v2

On Fri, Apr 24, 2009 at 03:02:01PM -0400, Christoph Hellwig wrote:
> On Fri, Apr 24, 2009 at 07:55:24PM +0100, Al Viro wrote:
> > Nah, it's not that. I can hold that in a separate branch and keep it
> > anchored. The question is, what else will end up there?
> > * the work inside the methods on BKL _removal_
> > * things like merging that ->write_super() call into ->put_super(),
> > etc.
> > * probably parts of work on s_flags mess and ro (tied to remout)
>
> * moving lock_super from callers into ->write_super and
> ->remount_fs. No need to only push one lock down when we
> touch them anyway.

Aye. We want lock_super() dead anyway, and the most obvious way to do that
is to shove it down to filesystems and let them kill it one by one.

We might want to switch sync_supers() et.al. to down_write() and do
similar thing in callers of fsync_super/__fsync_super() (i.e. instead
of get_super() use a variant that would lock for write), while we are
at it; OTOH, that'll need careful thinking about - things like
shrink_dcache() currently don't care about write_super done in parallel
and we might add contention where it hurts.

It's worth investigating, though - switching s_umount to mutex is attractive
per se and we might simplify locking e.g. in freeze/thaw on that.

2009-04-24 22:07:44

[permalink] [raw]

Subject: Re: [PATCH 1/1] vfs: umount_begin BKL pushdown v2

* Christoph Hellwig <[email protected]> wrote:

> On Fri, Apr 24, 2009 at 11:16:18AM +0200, Fr?d?ric Weisbecker wrote:
> > I disagree with you. The kill-the-BKL tree does not only aggregate patches
> > to turn the BKL into more traditional locks. The Bkl has been
> > converted to a common mutex in
> > this tree, making it losing its common horrid properties:
> >
> > - release/reacquire on schedule
> > - not preemptable
> > - can be reacquired recursively by a same task
> >
> > Such a basis is very useful because we can easily find these places
> > which won't support a usual lock conversion without reworking the
> > locking scheme.
> > This is a necessary preliminary for the Bkl removal.
> > All the places which have been designed very tightly with Bkl
> > properties are rapidly detected
> > with lockdep in this tree and reworked, still using lockdep, code
> > reviewing and the help of
> > this Bkl-to-mutex conversion.
> >
> > The work done with this tree can be merged inside and also on the
> > matching subsytem tree for
> > each patchset. That's a very sane workflow IMHO.
>
> Having a working tree for debugging stuff is fine, but the point
> is that it should never be pulled into mainline and probably
> frequently reabsed to avoid cruft. In that case there's really no
> point in creating branches to share pieces of tree history, just
> apply the patch locally if you think you want it and merge or
> rebase once mainline gets the patch.
>
> Al frequently rebases the vfs tree, btw [...]

Btw., doing that can (and will) destroy Git history and is pretty
explicitly discouraged.

> [...] - so even if it was a separate branch now there's a fair
> chance it would end up in mainline with a different commit id.

So did i get you right, you are advocating people to rebase their
trees because the VFS tree is rebased often? That's pretty
backwards.

Ingo

2009-04-24 22:19:40

[permalink] [raw]

Subject: Re: [PATCH 1/1] vfs: umount_begin BKL pushdown v2

* Al Viro <[email protected]> wrote:

> On Fri, Apr 24, 2009 at 10:06:34AM +0200, Ingo Molnar wrote:
>
> > You've not replied to my request (attached below) to put these
> > trivial BKL-pushdown bits into a separate branch/tree and not into
> > the VFS tree. You've now mixed that commit with other VFS changes.
> >
> > Had it been in a separate branch, and had we tested it, Linus could
> > have pulled the trivial BKL pushdown bits out of normal merge order
> > as well. That is not possible now.
> >
> > Furthermore, by doing this you are also hindering the
> > tip:kill-the-BKL effort (which has been ongoing for a year chipping
> > away at various BKL details) which facilitated these changes.
> > Alessio did these fixes to fix bugs he can trigger in that tree.
> >
> > You've also not explained why you have done it this way. It would
> > cost you almost nothing to apply these bits into a separate branch
> > and merge that branch into your main tree. Lots of other maintainer
> > are doing that.
> >
> > So if you've done this by mistake, i'd like to ask you to reconsider
> > and put these bits into a separate, stable-commit-ID branch. If
> > you've done this intentionally, i'd like you to explain the reasons
> > for it, instead of just doing it silently without explanation.
> >
> > Anwyay, if there's no resolution, i'll apply Alessio's fixes with a
> > different commit ID, to not hold up the rather useful work that is
> > going on in the kill-the-BKL tree. Later on i'll have to rebase that
> > portion of the tree to avoid duplicate commit IDs. I just wanted to
> > put it on the record why i have to do that rebase.
>
> Good grief... You have the commit ID, git fetch + git-cherry-pick
> would take two lines to type instead of more than a screenful.

I did that before writing this mail - look at the
tip:core/kill-the-BKL tree. That is why i got worried about
'poisonig' that tree with a patch like that.

> This patch is certainly trivial enough to go into the mainline at
> any point. Including "right now". However, the stuff to follow it
> might get more convoluted and I wouldn't argue for pushing it
> before the next merge window. It's not just the "push BKL down
> there" - that I could simply do right now and ACK pushing it to
> Linus/push myself. Unless I'm mistaken, you want to pull the
> subsequent "remove BKL in foofs" bits as well and those are almost
> certainly going to get entangled with other stuff.
>
> I'm not particulary against a separate branch for all that stuff
> (including the remount changes that'll be needed, etc.). The
> question is, what merge time are you aiming for and how much VFS
> stuff are you willing to tolerate in that branch?
>
> Details, please...

As i pointed it out in the first mail: our immediate concern is NFS
(nfs_get_sb() in particular), where we can reproduce real and easy
deadlocks with BKL-as-a-mutex. So if you could push this patch to
Linus (or just not NAK me cherry-picking your precise commit) that
would be enough to continue here.

[ Surprisingly large amount of BKL code gets away with a plain-mutex
BKL - and that's the basis of our experimental tree: we removed
all BKL logic from the scheduler and turned it into a plain mutex
- and are using lockdep and other measures to map out all
'complex' BKL assumptions that trigger in practice - combined with
review efforts such as Frederic's.

Basically lockdep is the 'binocular' that finds the trouble spots,
review is the knife that fixes the problem. ]

Anyway - regarding this commit, doing it via a branch would have
been the most Git-ish way to solve it - and that's what i'm using in
equivalent situations - but if you rebase your tree often as
Christoph Hellwig suggests i can imagine that causing problems.

In fact, you do seem to have rebased a lot of commits just a couple
of days ago:

earth4:~/linux.trees.git> git log --pretty=fuller linus/master..vfs/for-next | grep CommitDate
CommitDate: Fri Apr 24 03:15:47 2009 -0400
CommitDate: Thu Apr 23 19:30:02 2009 -0400
CommitDate: Thu Apr 23 19:30:02 2009 -0400
CommitDate: Tue Apr 21 17:55:57 2009 -0400
CommitDate: Tue Apr 21 17:55:56 2009 -0400
CommitDate: Tue Apr 21 17:55:55 2009 -0400
CommitDate: Tue Apr 21 17:55:54 2009 -0400
CommitDate: Tue Apr 21 17:55:53 2009 -0400
CommitDate: Tue Apr 21 17:55:52 2009 -0400
CommitDate: Tue Apr 21 17:55:51 2009 -0400
CommitDate: Tue Apr 21 17:55:50 2009 -0400
CommitDate: Tue Apr 21 17:55:49 2009 -0400
CommitDate: Tue Apr 21 17:55:48 2009 -0400
CommitDate: Tue Apr 21 17:55:47 2009 -0400
CommitDate: Tue Apr 21 17:55:46 2009 -0400
CommitDate: Tue Apr 21 17:55:45 2009 -0400
CommitDate: Tue Apr 21 17:55:44 2009 -0400
CommitDate: Tue Apr 21 17:55:43 2009 -0400
CommitDate: Tue Apr 21 17:55:42 2009 -0400
CommitDate: Tue Apr 21 17:55:41 2009 -0400
CommitDate: Tue Apr 21 17:55:40 2009 -0400
CommitDate: Tue Apr 21 17:55:39 2009 -0400

So yes, a branch with a stable commit is not possible for you to do.

Would you mind to describe the workflow that leads to such frequent
rebasing? The commit dates are nicely apart:

earth4:~/linux.trees.git> git log --pretty=fuller linus/master..vfs/for-next | grep AuthorDate
AuthorDate: Fri Apr 24 09:06:53 2009 +0200
AuthorDate: Fri Apr 24 01:02:45 2009 +0200
AuthorDate: Fri Apr 24 01:01:56 2009 +0200
AuthorDate: Sat Apr 18 14:06:57 2009 -0400
AuthorDate: Sat Apr 18 13:59:41 2009 -0400
AuthorDate: Sat Apr 18 13:58:15 2009 -0400
AuthorDate: Sat Apr 18 03:28:19 2009 -0400
AuthorDate: Sat Apr 18 03:26:48 2009 -0400
AuthorDate: Sat Apr 18 03:00:46 2009 -0400
AuthorDate: Sat Apr 18 02:42:05 2009 -0400
AuthorDate: Sat Apr 18 02:14:32 2009 -0400
AuthorDate: Sat Apr 18 02:04:46 2009 -0400
AuthorDate: Tue Apr 7 12:21:18 2009 -0400
AuthorDate: Tue Apr 7 11:53:49 2009 -0400
AuthorDate: Tue Apr 7 11:49:53 2009 -0400
AuthorDate: Tue Apr 7 11:44:16 2009 -0400
AuthorDate: Tue Apr 7 11:08:56 2009 -0400
AuthorDate: Mon Apr 6 11:16:22 2009 -0400
AuthorDate: Mon Apr 6 09:38:49 2009 -0400
AuthorDate: Thu Apr 2 21:17:03 2009 -0400
AuthorDate: Tue Apr 7 13:19:18 2009 -0400
AuthorDate: Mon Apr 6 16:43:42 2009 -0700

so these are not commits developed in the same minute as the
commit-date suggests. I.e. the whole tree got rebased at Apr 21
17:55.

Ing

2009-04-24 22:49:53

[permalink] [raw]

Subject: Re: [PATCH 1/1] vfs: umount_begin BKL pushdown v2

* Christoph Hellwig <[email protected]> wrote:

> > You've also not explained why you have done it this way. It
> > would cost you almost nothing to apply these bits into a
> > separate branch and merge that branch into your main tree. Lots
> > of other maintainer are doing that.
>
> Having a separate kill the BKL tree is a stupid idea. Locking
> changes need deep subsystem knowledge and should always go through
> the subsystem trees.

Here you are missing the small inconvenient fact that having the
kill-the-BKL tree is what got this work underway. It is what got
developers interested, it is what is concentrating the effort, and
it is that is producing the patches.

_Nobody_ ever suggested that VFS patches should not go upstream via
the VFS tree. We are _happy_ that BKL removal patches are finally
flowing through the VFS tree.

The _only_ very minimal courtesy i was asking for was to also be
'allowed' to carry those fixes that we WROTE, with the same commit
ID - so that if the kill-the-BKL tree goes upstream sometime in the
(apparently far) future (well after the VFS bits go upstream), it
will look nice and wont have duplicate commits. We are patient, and
we'd like to maintain a tidy tree.

But i didnt even get a _reply_ to that initial request - Al just
committed it straight into the VFS tree and ignored my question
somewhat rudely.

The thing is, for years you never cared about the BKL being deep
embedded in the guts of the VFS. But the minute someone _else_ does
what arguably you should have done long ago, you stand in the way
and hinder that effort by first proclaiming that this tree should
not be doing such changes and then forcing it into an ugly (future)
rebase?

Exactly how does such kind of behavior help Linux, in your opinion?

Ingo

2009-04-25 07:16:48