2001-11-23 21:22:43

by Alexander Viro

[permalink] [raw]
Subject: 2.4.15-pre9 breakage (inode.c)

Sigh... Supposed fix to problems with stale inodes was completely
broken.

What we need is "if we are doing last iput() on fs that is getting
shut, sync it and don't leave it in cache". And yes, we have a similar
path in iput(). Similar, but not quite the same.

Fix is
* new fs flag: "MS_ACTIVE".
* set after normal ->read_super().
* reset after we are done with fsync_super() in kill_super().
* iput() checking that and if it's set - doing write_inode_now() and kicking
it out of hash.

I'll send patch in ~10 minutes.


2001-11-23 21:44:33

by Jeff Merkey

[permalink] [raw]
Subject: Re: 2.4.15-pre9 breakage (inode.c)


Al,

I have seen this as well during testing, except in my case, it results in a
hard hang of sorts that does not respond to signals. I can make it show up
easily by doing a copy accross mount points in low memory conditions, if any
of this helps. Sounds like you've got a good handle on it, though.

Jeff

----- Original Message -----
From: "Alexander Viro" <[email protected]>
To: <[email protected]>
Cc: "Linus Torvalds" <[email protected]>; "Marcelo Tosatti"
<[email protected]>
Sent: Friday, November 23, 2001 2:22 PM
Subject: 2.4.15-pre9 breakage (inode.c)


> Sigh... Supposed fix to problems with stale inodes was completely
> broken.
>
> What we need is "if we are doing last iput() on fs that is getting
> shut, sync it and don't leave it in cache". And yes, we have a similar
> path in iput(). Similar, but not quite the same.
>
> Fix is
> * new fs flag: "MS_ACTIVE".
> * set after normal ->read_super().
> * reset after we are done with fsync_super() in kill_super().
> * iput() checking that and if it's set - doing write_inode_now() and
kicking
> it out of hash.
>
> I'll send patch in ~10 minutes.
>
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/

2001-11-23 21:51:33

by Alexander Viro

[permalink] [raw]
Subject: [PATCH][CFT] Re: 2.4.15-pre9 breakage (inode.c)

Untested fix follows. And please, pass the brown paperbag... ;-/

diff -urN S15/fs/inode.c S15-fix/fs/inode.c
--- S15/fs/inode.c Fri Nov 23 06:45:43 2001
+++ S15-fix/fs/inode.c Fri Nov 23 16:33:33 2001
@@ -1065,24 +1065,27 @@
if (inode->i_state != I_CLEAR)
BUG();
} else {
- if (!list_empty(&inode->i_hash) && sb && sb->s_root) {
+ if (!list_empty(&inode->i_hash) {
if (!(inode->i_state & (I_DIRTY|I_LOCK))) {
list_del(&inode->i_list);
list_add(&inode->i_list, &inode_unused);
}
inodes_stat.nr_unused++;
spin_unlock(&inode_lock);
- return;
- } else {
- list_del_init(&inode->i_list);
+ if (!sb || sb->s_flags & MS_ACTIVE))
+ return;
+ write_inode_now(inode);
+ spin_lock(&inode_lock);
+ inodes_stat.nr_unused--;
list_del_init(&inode->i_hash);
- inode->i_state|=I_FREEING;
- inodes_stat.nr_inodes--;
- spin_unlock(&inode_lock);
- if (inode->i_data.nrpages)
- truncate_inode_pages(&inode->i_data, 0);
- clear_inode(inode);
}
+ list_del_init(&inode->i_list);
+ inode->i_state|=I_FREEING;
+ inodes_stat.nr_inodes--;
+ spin_unlock(&inode_lock);
+ if (inode->i_data.nrpages)
+ truncate_inode_pages(&inode->i_data, 0);
+ clear_inode(inode);
}
destroy_inode(inode);
}
diff -urN S15/fs/super.c S15-fix/fs/super.c
--- S15/fs/super.c Fri Nov 23 06:45:43 2001
+++ S15-fix/fs/super.c Fri Nov 23 16:26:18 2001
@@ -462,6 +462,7 @@
lock_super(s);
if (!type->read_super(s, data, flags & MS_VERBOSE ? 1 : 0))
goto out_fail;
+ s->s_flags |= MS_ACTIVE;
unlock_super(s);
/* tell bdcache that we are going to keep this one */
if (bdev)
@@ -614,6 +615,7 @@
lock_super(s);
if (!fs_type->read_super(s, data, flags & MS_VERBOSE ? 1 : 0))
goto out_fail;
+ s->s_flags |= MS_ACTIVE;
unlock_super(s);
get_filesystem(fs_type);
path_release(&nd);
@@ -695,6 +697,7 @@
lock_super(s);
if (!fs_type->read_super(s, data, flags & MS_VERBOSE ? 1 : 0))
goto out_fail;
+ s->s_flags |= MS_ACTIVE;
unlock_super(s);
get_filesystem(fs_type);
return s;
@@ -739,6 +742,7 @@
dput(root);
fsync_super(sb);
lock_super(sb);
+ sb->s_flags &= ~MS_ACTIVE;
invalidate_inodes(sb); /* bad name - it should be evict_inodes() */
if (sop) {
if (sop->write_super && sb->s_dirt)
diff -urN S15/include/linux/fs.h S15-fix/include/linux/fs.h
--- S15/include/linux/fs.h Fri Nov 23 06:45:44 2001
+++ S15-fix/include/linux/fs.h Fri Nov 23 16:24:44 2001
@@ -110,6 +110,7 @@
#define MS_BIND 4096
#define MS_REC 16384
#define MS_VERBOSE 32768
+#define MS_ACTIVE (1<<30)
#define MS_NOUSER (1<<31)

/*

2001-11-23 22:07:16

by Alexander Viro

[permalink] [raw]
Subject: Re: [PATCH][CFT] Re: 2.4.15-pre9 breakage (inode.c)



On Fri, 23 Nov 2001, Alexander Viro wrote:

> Untested fix follows. And please, pass the brown paperbag... ;-/

... and now for something that really builds:

diff -urN S15/fs/inode.c S15-fix/fs/inode.c
--- S15/fs/inode.c Fri Nov 23 06:45:43 2001
+++ S15-fix/fs/inode.c Fri Nov 23 17:04:05 2001
@@ -1065,24 +1065,27 @@
if (inode->i_state != I_CLEAR)
BUG();
} else {
- if (!list_empty(&inode->i_hash) && sb && sb->s_root) {
+ if (!list_empty(&inode->i_hash)) {
if (!(inode->i_state & (I_DIRTY|I_LOCK))) {
list_del(&inode->i_list);
list_add(&inode->i_list, &inode_unused);
}
inodes_stat.nr_unused++;
spin_unlock(&inode_lock);
- return;
- } else {
- list_del_init(&inode->i_list);
+ if (!sb || sb->s_flags & MS_ACTIVE)
+ return;
+ write_inode_now(inode, 1);
+ spin_lock(&inode_lock);
+ inodes_stat.nr_unused--;
list_del_init(&inode->i_hash);
- inode->i_state|=I_FREEING;
- inodes_stat.nr_inodes--;
- spin_unlock(&inode_lock);
- if (inode->i_data.nrpages)
- truncate_inode_pages(&inode->i_data, 0);
- clear_inode(inode);
}
+ list_del_init(&inode->i_list);
+ inode->i_state|=I_FREEING;
+ inodes_stat.nr_inodes--;
+ spin_unlock(&inode_lock);
+ if (inode->i_data.nrpages)
+ truncate_inode_pages(&inode->i_data, 0);
+ clear_inode(inode);
}
destroy_inode(inode);
}
diff -urN S15/fs/super.c S15-fix/fs/super.c
--- S15/fs/super.c Fri Nov 23 06:45:43 2001
+++ S15-fix/fs/super.c Fri Nov 23 16:56:50 2001
@@ -462,6 +462,7 @@
lock_super(s);
if (!type->read_super(s, data, flags & MS_VERBOSE ? 1 : 0))
goto out_fail;
+ s->s_flags |= MS_ACTIVE;
unlock_super(s);
/* tell bdcache that we are going to keep this one */
if (bdev)
@@ -614,6 +615,7 @@
lock_super(s);
if (!fs_type->read_super(s, data, flags & MS_VERBOSE ? 1 : 0))
goto out_fail;
+ s->s_flags |= MS_ACTIVE;
unlock_super(s);
get_filesystem(fs_type);
path_release(&nd);
@@ -695,6 +697,7 @@
lock_super(s);
if (!fs_type->read_super(s, data, flags & MS_VERBOSE ? 1 : 0))
goto out_fail;
+ s->s_flags |= MS_ACTIVE;
unlock_super(s);
get_filesystem(fs_type);
return s;
@@ -739,6 +742,7 @@
dput(root);
fsync_super(sb);
lock_super(sb);
+ sb->s_flags &= ~MS_ACTIVE;
invalidate_inodes(sb); /* bad name - it should be evict_inodes() */
if (sop) {
if (sop->write_super && sb->s_dirt)
diff -urN S15/include/linux/fs.h S15-fix/include/linux/fs.h
--- S15/include/linux/fs.h Fri Nov 23 06:45:44 2001
+++ S15-fix/include/linux/fs.h Fri Nov 23 17:01:18 2001
@@ -110,6 +110,7 @@
#define MS_BIND 4096
#define MS_REC 16384
#define MS_VERBOSE 32768
+#define MS_ACTIVE (1<<30)
#define MS_NOUSER (1<<31)

/*

2001-11-23 22:35:49

by Phil Sorber

[permalink] [raw]
Subject: Re: [PATCH][CFT] Re: 2.4.15-pre9 breakage (inode.c)

does this affect non-ext2 systems? the only complaints i have seen on
here were related to ext2, i am running reiserfs, should i have to be
worried as well?

On Fri, 2001-11-23 at 17:06, Alexander Viro wrote:
>
>
> On Fri, 23 Nov 2001, Alexander Viro wrote:
>
> > Untested fix follows. And please, pass the brown paperbag... ;-/
>
> ... and now for something that really builds:
>
> diff -urN S15/fs/inode.c S15-fix/fs/inode.c
> --- S15/fs/inode.c Fri Nov 23 06:45:43 2001
> +++ S15-fix/fs/inode.c Fri Nov 23 17:04:05 2001
> @@ -1065,24 +1065,27 @@
> if (inode->i_state != I_CLEAR)
> BUG();
> } else {
> - if (!list_empty(&inode->i_hash) && sb && sb->s_root) {
> + if (!list_empty(&inode->i_hash)) {
> if (!(inode->i_state & (I_DIRTY|I_LOCK))) {
> list_del(&inode->i_list);
> list_add(&inode->i_list, &inode_unused);
> }
> inodes_stat.nr_unused++;
> spin_unlock(&inode_lock);
> - return;
> - } else {
> - list_del_init(&inode->i_list);
> + if (!sb || sb->s_flags & MS_ACTIVE)
> + return;
> + write_inode_now(inode, 1);
> + spin_lock(&inode_lock);
> + inodes_stat.nr_unused--;
> list_del_init(&inode->i_hash);
> - inode->i_state|=I_FREEING;
> - inodes_stat.nr_inodes--;
> - spin_unlock(&inode_lock);
> - if (inode->i_data.nrpages)
> - truncate_inode_pages(&inode->i_data, 0);
> - clear_inode(inode);
> }
> + list_del_init(&inode->i_list);
> + inode->i_state|=I_FREEING;
> + inodes_stat.nr_inodes--;
> + spin_unlock(&inode_lock);
> + if (inode->i_data.nrpages)
> + truncate_inode_pages(&inode->i_data, 0);
> + clear_inode(inode);
> }
> destroy_inode(inode);
> }
> diff -urN S15/fs/super.c S15-fix/fs/super.c
> --- S15/fs/super.c Fri Nov 23 06:45:43 2001
> +++ S15-fix/fs/super.c Fri Nov 23 16:56:50 2001
> @@ -462,6 +462,7 @@
> lock_super(s);
> if (!type->read_super(s, data, flags & MS_VERBOSE ? 1 : 0))
> goto out_fail;
> + s->s_flags |= MS_ACTIVE;
> unlock_super(s);
> /* tell bdcache that we are going to keep this one */
> if (bdev)
> @@ -614,6 +615,7 @@
> lock_super(s);
> if (!fs_type->read_super(s, data, flags & MS_VERBOSE ? 1 : 0))
> goto out_fail;
> + s->s_flags |= MS_ACTIVE;
> unlock_super(s);
> get_filesystem(fs_type);
> path_release(&nd);
> @@ -695,6 +697,7 @@
> lock_super(s);
> if (!fs_type->read_super(s, data, flags & MS_VERBOSE ? 1 : 0))
> goto out_fail;
> + s->s_flags |= MS_ACTIVE;
> unlock_super(s);
> get_filesystem(fs_type);
> return s;
> @@ -739,6 +742,7 @@
> dput(root);
> fsync_super(sb);
> lock_super(sb);
> + sb->s_flags &= ~MS_ACTIVE;
> invalidate_inodes(sb); /* bad name - it should be evict_inodes() */
> if (sop) {
> if (sop->write_super && sb->s_dirt)
> diff -urN S15/include/linux/fs.h S15-fix/include/linux/fs.h
> --- S15/include/linux/fs.h Fri Nov 23 06:45:44 2001
> +++ S15-fix/include/linux/fs.h Fri Nov 23 17:01:18 2001
> @@ -110,6 +110,7 @@
> #define MS_BIND 4096
> #define MS_REC 16384
> #define MS_VERBOSE 32768
> +#define MS_ACTIVE (1<<30)
> #define MS_NOUSER (1<<31)
>
> /*
>
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
>
--
Phil Sorber
AIM: PSUdaemon
IRC: irc.openprojects.net #psulug PSUdaemon
GnuPG: keyserver - pgp.mit.edu


Attachments:
(No filename) (232.00 B)

2001-11-23 22:35:19

by Russell King

[permalink] [raw]
Subject: Re: [PATCH][CFT] Re: 2.4.15-pre9 breakage (inode.c)

On Fri, Nov 23, 2001 at 05:06:46PM -0500, Alexander Viro wrote:
> On Fri, 23 Nov 2001, Alexander Viro wrote:
> > Untested fix follows. And please, pass the brown paperbag... ;-/
>
> ... and now for something that really builds:

I can confirm this passes my boot, shutdown, reboot test that failed with
2.4.15-greased-turkey. 2.4.15-viro1 is currently running my kernel build
tests quite happily thus far.

I think 2.4.15-greased-turkey should be renamed to 2.4.15-dead-duck. 8)

Al, thanks for the fast response. Appologies for the slow build.

--
Russell King ([email protected]) The developer of ARM Linux
http://www.arm.linux.org.uk/personal/aboutme.html

2001-11-23 22:50:12

by Alexander Viro

[permalink] [raw]
Subject: Re: [PATCH][CFT] Re: 2.4.15-pre9 breakage (inode.c)



On 23 Nov 2001, Phil Sorber wrote:

> does this affect non-ext2 systems? the only complaints i have seen on
> here were related to ext2, i am running reiserfs, should i have to be
> worried as well?

Same problem.

Trivial workaround to prevent damage is to do sync before umount. It's
_not_ a real fix, though - just a way to get out of the situation without
fs damage.

For real fix see the last variant posted on l-k.

2001-11-23 23:06:42

by Andreas Dilger

[permalink] [raw]
Subject: Re: [PATCH][CFT] Re: 2.4.15-pre9 breakage (inode.c)

On Nov 23, 2001 17:06 -0500, Alexander Viro wrote:
> On Fri, 23 Nov 2001, Alexander Viro wrote:
> > Untested fix follows. And please, pass the brown paperbag... ;-/
>
> ... and now for something that really builds:

Hey, this gives Linus a good reason to make another 2.4.15 release,
and silence all of the people complaining about -greased-turkey (which,
as we all know, was Linus' prerelease for testing that everything was
working OK before he made an _official_ 2.4.15 release, and a good thing
he did or this bug wouldn't have shown up until the _official_ 2.4.15
release which would have been embarassing ;-).

Cheers, Andreas
--
Andreas Dilger
http://sourceforge.net/projects/ext2resize/
http://www-mddsp.enel.ucalgary.ca/People/adilger/

2001-11-23 23:36:05

by Russell King

[permalink] [raw]
Subject: Re: [PATCH][CFT] Re: 2.4.15-pre9 breakage (inode.c)

On Fri, Nov 23, 2001 at 04:05:36PM -0700, Andreas Dilger wrote:
> Hey, this gives Linus a good reason to make another 2.4.15 release,
> and silence all of the people complaining about -greased-turkey (which,
> as we all know, was Linus' prerelease for testing that everything was
> working OK before he made an _official_ 2.4.15 release, and a good thing
> he did or this bug wouldn't have shown up until the _official_ 2.4.15
> release which would have been embarassing ;-).

Actually, had Linus have waited until tonight before doing the 2.4.15
release, I would've been running 2.4.15-pre9 on the machine which showed
the problem.

>From pre7, pre8 came out, didn't even get to build that before pre9 came
out. Built pre9, slept. Woke up and 2.4.15-dead-duck was out. Had I
not immediately grabbed the dead-duck release, I would've been running
pre9 on the machine, which I understand contains the problem as well.

--
Russell King ([email protected]) The developer of ARM Linux
http://www.arm.linux.org.uk/personal/aboutme.html

2001-11-24 01:54:37

by Linus Torvalds

[permalink] [raw]
Subject: Re: [PATCH][CFT] Re: 2.4.15-pre9 breakage (inode.c)


On Fri, 23 Nov 2001, Andreas Dilger wrote:
>
> Hey, this gives Linus a good reason to make another 2.4.15 release,

Think of it this way: this is a good dry-run for Marcelo ;)

Linus "hates maintenance" Torvalds

2001-11-24 05:48:06

by Andrea Arcangeli

[permalink] [raw]
Subject: Re: 2.4.15-pre9 breakage (inode.c)

On Fri, Nov 23, 2001 at 04:22:17PM -0500, Alexander Viro wrote:
> Sigh... Supposed fix to problems with stale inodes was completely
> broken.
>
> What we need is "if we are doing last iput() on fs that is getting
> shut, sync it and don't leave it in cache". And yes, we have a similar

What's this "stale inode" problem? invalidate_inodes in kill_super will
obviously get rid of all of them or we would be getting the
"selfdestructing in 5 seconds" message all the time.

I'm probably missing something, but actually to me it seems the patch in
pre9 plus this your further fix are completly superflous. And if you
mean that there's a race in the sync+invalidate_inode sequence (I didn't
checked) that should be serialized at the higher dcache/mnt layer,
without messing and slowing down all of the iput()s just for the unmount
case where the fsync+invalidate can be serialized by design at the mnt
layer without iput intervention. I actually believe this is the right
fix and it should work fine as far as anything <= 2.4.15pre8 was working
correctly. I'm going to release a 2.4.15aa1 with the below fix applied
(while waiting your comments of course, thanks!).

--- 2.4.15pre9aa1/fs/inode.c.~1~ Thu Nov 22 20:48:23 2001
+++ 2.4.15pre9aa1/fs/inode.c Sat Nov 24 06:30:20 2001
@@ -1071,7 +1071,7 @@
if (inode->i_state != I_CLEAR)
BUG();
} else {
- if (!list_empty(&inode->i_hash) && sb && sb->s_root) {
+ if (!list_empty(&inode->i_hash)) {
if (!(inode->i_state & (I_DIRTY|I_LOCK))) {
list_del(&inode->i_list);
list_add(&inode->i_list, &inode_unused);

Andrea

2001-11-24 06:05:07

by Alexander Viro

[permalink] [raw]
Subject: Re: 2.4.15-pre9 breakage (inode.c)



On Sat, 24 Nov 2001, Andrea Arcangeli wrote:

> On Fri, Nov 23, 2001 at 04:22:17PM -0500, Alexander Viro wrote:
> > Sigh... Supposed fix to problems with stale inodes was completely
> > broken.
> >
> > What we need is "if we are doing last iput() on fs that is getting
> > shut, sync it and don't leave it in cache". And yes, we have a similar
>
> What's this "stale inode" problem? invalidate_inodes in kill_super will
> obviously get rid of all of them or we would be getting the

First of all, there is ->read_super() side of the things. If it fails
after iget() on root, we have nothing to kick inode out of cache. And
no, we can't call invalidate_inodes() here - too late for calling any
methods.

What's more, for stuff like inodes held by superblock (e.g. fs keeping
block bitmaps in a file - in that case the earliest point that _can_
do iput() on that sucker is ->put_super(); ditto for $BIGNUM similar
cases - journal, other fs structures of that kind - ACLs, etc., etc.)
we get final iput() _after_ invalidate_inodes(). And doing anything
after ->put_super() is again too late.

IOW, we can kick inode out of icache only between successful ->read_super()
and ->put_super(). Any iput() done outside of that range must go immediately
and yes, such cases are not only possible but actually exist.

2001-11-24 06:01:57

by Linus Torvalds

[permalink] [raw]
Subject: Re: 2.4.15-pre9 breakage (inode.c)


On Sat, 24 Nov 2001, Andrea Arcangeli wrote:
>
> --- 2.4.15pre9aa1/fs/inode.c.~1~ Thu Nov 22 20:48:23 2001
> +++ 2.4.15pre9aa1/fs/inode.c Sat Nov 24 06:30:20 2001
> @@ -1071,7 +1071,7 @@
> if (inode->i_state != I_CLEAR)
> BUG();
> } else {
> - if (!list_empty(&inode->i_hash) && sb && sb->s_root) {
> + if (!list_empty(&inode->i_hash)) {
> if (!(inode->i_state & (I_DIRTY|I_LOCK))) {
> list_del(&inode->i_list);
> list_add(&inode->i_list, &inode_unused);

I have to say that I like this patch better myself - the added tests are
not sensible, and just removing them seems to be the right thing.

Linus

2001-11-24 06:09:17

by Alexander Viro

[permalink] [raw]
Subject: Re: 2.4.15-pre9 breakage (inode.c)



On Fri, 23 Nov 2001, Linus Torvalds wrote:

> > - if (!list_empty(&inode->i_hash) && sb && sb->s_root) {
> > + if (!list_empty(&inode->i_hash)) {
> > if (!(inode->i_state & (I_DIRTY|I_LOCK))) {
> > list_del(&inode->i_list);
> > list_add(&inode->i_list, &inode_unused);
>
> I have to say that I like this patch better myself - the added tests are
> not sensible, and just removing them seems to be the right thing.

Test for ->s_root is bogus and had been removed - check the patch I've sent.

However, that variant suffers from the following problem: if ->read_super()
fails after it had done _any_ iget() (root inode, journal, whatever) -
we are screwed. Sure, we do iput(). And then we have inode stuck in icache,
with ->i_sb pointing nowhere. When it finally gets evicted we call
inode->i_sb->s_op->clear_inode(). Oops...

2001-11-24 06:20:50

by Andrea Arcangeli

[permalink] [raw]
Subject: Re: 2.4.15-pre9 breakage (inode.c)

On Sat, Nov 24, 2001 at 01:04:48AM -0500, Alexander Viro wrote:
>
>
> On Sat, 24 Nov 2001, Andrea Arcangeli wrote:
>
> > On Fri, Nov 23, 2001 at 04:22:17PM -0500, Alexander Viro wrote:
> > > Sigh... Supposed fix to problems with stale inodes was completely
> > > broken.
> > >
> > > What we need is "if we are doing last iput() on fs that is getting
> > > shut, sync it and don't leave it in cache". And yes, we have a similar
> >
> > What's this "stale inode" problem? invalidate_inodes in kill_super will
> > obviously get rid of all of them or we would be getting the
>
> First of all, there is ->read_super() side of the things. If it fails
> after iget() on root, we have nothing to kick inode out of cache. And
> no, we can't call invalidate_inodes() here - too late for calling any
> methods.

why should you call any method, the ->read_super just make sure if it
fails you can safely get rid of the logical struct inode. I don't see
any problem in solving this case without special iput cases.

>
> What's more, for stuff like inodes held by superblock (e.g. fs keeping
> block bitmaps in a file - in that case the earliest point that _can_
> do iput() on that sucker is ->put_super(); ditto for $BIGNUM similar
> cases - journal, other fs structures of that kind - ACLs, etc., etc.)
> we get final iput() _after_ invalidate_inodes(). And doing anything
^^^^^^^^^^^^^^
> after ->put_super() is again too late.
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

if (sop) {
if (sop->write_super && sb->s_dirt)
sop->write_super(sb);
if (sop->put_super)
sop->put_super(sb);
^^^^^^^^^^^^^^
}

/* Forget any remaining inodes */
if (invalidate_inodes(sb)) {
^^^^^^^^^^^^^^^^^^^^^
printk("VFS: Busy inodes after unmount. "
"Self-destruct in 5 seconds. Have a nice day...\n");
}

>
> IOW, we can kick inode out of icache only between successful ->read_super()
> and ->put_super(). Any iput() done outside of that range must go immediately
> and yes, such cases are not only possible but actually exist.

If put_user changes an inode and then it does an iput internally, it's
its own business to recall fsync_dev internally if necessary. btw, we could
even move all the invalidate code into the lowlevel callbacks, the point
of having it at the VFS is because most fs are just fine with it, but it
doesn't restrict any functionality.

Andrea

2001-11-24 06:26:39

by Andrea Arcangeli

[permalink] [raw]
Subject: Re: 2.4.15-pre9 breakage (inode.c)

On Sat, Nov 24, 2001 at 01:08:49AM -0500, Alexander Viro wrote:
>
>
> On Fri, 23 Nov 2001, Linus Torvalds wrote:
>
> > > - if (!list_empty(&inode->i_hash) && sb && sb->s_root) {
> > > + if (!list_empty(&inode->i_hash)) {
> > > if (!(inode->i_state & (I_DIRTY|I_LOCK))) {
> > > list_del(&inode->i_list);
> > > list_add(&inode->i_list, &inode_unused);
> >
> > I have to say that I like this patch better myself - the added tests are
> > not sensible, and just removing them seems to be the right thing.
>
> Test for ->s_root is bogus and had been removed - check the patch I've sent.
>
> However, that variant suffers from the following problem: if ->read_super()
> fails after it had done _any_ iget() (root inode, journal, whatever) -
> we are screwed. Sure, we do iput(). And then we have inode stuck in icache,

you are screwed because you were running a broken filesystem: it is its
own business to drop the inodes if it fails, all it needs to do is to
call invalidate_inodes(s) internally before returning from the read_super
in the failure case.

> with ->i_sb pointing nowhere. When it finally gets evicted we call
> inode->i_sb->s_op->clear_inode(). Oops...


Andrea

2001-11-24 06:30:09

by Alexander Viro

[permalink] [raw]
Subject: Re: 2.4.15-pre9 breakage (inode.c)



On Sat, 24 Nov 2001, Andrea Arcangeli wrote:

> > First of all, there is ->read_super() side of the things. If it fails
> > after iget() on root, we have nothing to kick inode out of cache. And
> > no, we can't call invalidate_inodes() here - too late for calling any
> > methods.
>
> why should you call any method, the ->read_super just make sure if it
> fails you can safely get rid of the logical struct inode. I don't see
> any problem in solving this case without special iput cases.

RTFS.

Suppose ->read_super() fails to allocate root dentry. It already has inode
of root directory in cache. It does iput() and returns.

Fine. Now we have an inode in cache. Superblock gets freed. See what
will happen next?

Right, after a while memory pressure will shrink icache. That will call
clear_inode() on our inode, followed by destroy_inode().

And clear_inode() will call inode->i_sb->s_op->clear_inode(). Oops...

That problem exists for almost every filesystem in the tree. Had been there
for quite a while.

As for the invalidate_inodes() after ->put_super() - check what 2.4.15 does
_before_ ->put_super(). All this stuff is not about inodes that need to
be written out after something that ->put_super() had done - I agree that
it's ->put_super() headache. It's about inodes that stay in icache for too
long.

Again, whenever struct inode is freed, we call ->i_sb->s_op->clear_inode()
and we'd better do that while relevant data structures are guaranteed to
be alive.

2001-11-24 06:32:09

by Alexander Viro

[permalink] [raw]
Subject: Re: 2.4.15-pre9 breakage (inode.c)



On Sat, 24 Nov 2001, Andrea Arcangeli wrote:

> you are screwed because you were running a broken filesystem: it is its
> own business to drop the inodes if it fails, all it needs to do is to
> call invalidate_inodes(s) internally before returning from the read_super
> in the failure case.

Cute. Do you realize that _every_ fs would have to do that?

2001-11-24 06:37:59

by Alexander Viro

[permalink] [raw]
Subject: Re: 2.4.15-pre9 breakage (inode.c)



On Sat, 24 Nov 2001, Alexander Viro wrote:

> On Sat, 24 Nov 2001, Andrea Arcangeli wrote:
>
> > you are screwed because you were running a broken filesystem: it is its
> > own business to drop the inodes if it fails, all it needs to do is to
> > call invalidate_inodes(s) internally before returning from the read_super
> > in the failure case.
>
> Cute. Do you realize that _every_ fs would have to do that?

Put it that way:
* if ->read_super() decides to fail, it should evict all inodes
it had put into icache.
* if ->put_super() does any iput(), it should take care to evict
that inode from icache.

IOW,
* if we do iput() while we are outside of (success of ->read_super(),
call of ->put_super()) - we want that inode to be evicted ASAP.

Which is precisely what 2.4.15+patch does.

2001-11-24 06:44:01

by Andrea Arcangeli

[permalink] [raw]
Subject: Re: 2.4.15-pre9 breakage (inode.c)

On Sat, Nov 24, 2001 at 01:31:18AM -0500, Alexander Viro wrote:
>
>
> On Sat, 24 Nov 2001, Andrea Arcangeli wrote:
>
> > you are screwed because you were running a broken filesystem: it is its
> > own business to drop the inodes if it fails, all it needs to do is to
> > call invalidate_inodes(s) internally before returning from the read_super
> > in the failure case.
>
> Cute. Do you realize that _every_ fs would have to do that?

I don't care who has to do it, but who has to do it it has to do it in a
very very very very slow path, and you want to handle it in iput fast
path instead, cute?

Andrea

2001-11-24 06:50:50

by Andrea Arcangeli

[permalink] [raw]
Subject: Re: 2.4.15-pre9 breakage (inode.c)

On Sat, Nov 24, 2001 at 01:37:22AM -0500, Alexander Viro wrote:
>
>
> On Sat, 24 Nov 2001, Alexander Viro wrote:
>
> > On Sat, 24 Nov 2001, Andrea Arcangeli wrote:
> >
> > > you are screwed because you were running a broken filesystem: it is its
> > > own business to drop the inodes if it fails, all it needs to do is to
> > > call invalidate_inodes(s) internally before returning from the read_super
> > > in the failure case.
> >
> > Cute. Do you realize that _every_ fs would have to do that?
>
> Put it that way:
> * if ->read_super() decides to fail, it should evict all inodes
> it had put into icache.
> * if ->put_super() does any iput(), it should take care to evict
> that inode from icache.

exactly.

> IOW,
> * if we do iput() while we are outside of (success of ->read_super(),
> call of ->put_super()) - we want that inode to be evicted ASAP.
>
> Which is precisely what 2.4.15+patch does.

and it's slower and overlay complex compared to the right fix:

--- 2.4.15aa1/fs/ext2/super.c.~1~ Fri Nov 23 08:21:00 2001
+++ 2.4.15aa1/fs/ext2/super.c Sat Nov 24 07:50:19 2001
@@ -643,6 +643,7 @@
printk(KERN_ERR "EXT2-fs: corrupt root inode, run e2fsck\n");
} else
printk(KERN_ERR "EXT2-fs: get root inode failed\n");
+ invalidate_inodes(sb);
goto failed_mount2;
}
ext2_setup_super (sb, es, sb->s_flags & MS_RDONLY);

Andrea

2001-11-24 06:52:20

by Alexander Viro

[permalink] [raw]
Subject: Re: 2.4.15-pre9 breakage (inode.c)



On Sat, 24 Nov 2001, Andrea Arcangeli wrote:

> I don't care who has to do it, but who has to do it it has to do it in a
> very very very very slow path, and you want to handle it in iput fast
> path instead, cute?

Let's count. In the place where before -pre9 iput() had
return;
we have (2.4.15 + patch)
if (!sb || sb->s_flags & MS_ACTIVE)
return;
and on the fast path the first is false and second is true.

Price:
compare with zero
not taken branch
test (register + offset), <bit>
not taken branch
return
instead of
return
Compare with the stuff we had done just before that. Pollution of fast
path sucks, but here it _is_ noise.

I agree that variant in 2.4.15 is crap - hell, it's completely broken.
Please, grab the patch in question, apply to 2.4.15 and look at the
resulting code.

2001-11-24 06:58:41

by Alexander Viro

[permalink] [raw]
Subject: Re: 2.4.15-pre9 breakage (inode.c)



On Sat, 24 Nov 2001, Andrea Arcangeli wrote:

> and it's slower and overlay complex compared to the right fix:

> --- 2.4.15aa1/fs/ext2/super.c.~1~ Fri Nov 23 08:21:00 2001
> +++ 2.4.15aa1/fs/ext2/super.c Sat Nov 24 07:50:19 2001
> @@ -643,6 +643,7 @@
> printk(KERN_ERR "EXT2-fs: corrupt root inode, run e2fsck\n");
> } else
> printk(KERN_ERR "EXT2-fs: get root inode failed\n");
> + invalidate_inodes(sb);
> goto failed_mount2;
> }
> ext2_setup_super (sb, es, sb->s_flags & MS_RDONLY);


It also fixes the problem for good for all filesystems. As
for the speed - see previous posting. It _is_ noise. Comparison
with zero, test bit and two not taken branches. In final iput().

Seriously, check what else happens on that path.

2001-11-24 07:01:31

by Andrea Arcangeli

[permalink] [raw]
Subject: Re: 2.4.15-pre9 breakage (inode.c)

On Sat, Nov 24, 2001 at 07:50:45AM +0100, Andrea Arcangeli wrote:
> On Sat, Nov 24, 2001 at 01:37:22AM -0500, Alexander Viro wrote:
> >
> >
> > On Sat, 24 Nov 2001, Alexander Viro wrote:
> >
> > > On Sat, 24 Nov 2001, Andrea Arcangeli wrote:
> > >
> > > > you are screwed because you were running a broken filesystem: it is its
> > > > own business to drop the inodes if it fails, all it needs to do is to
> > > > call invalidate_inodes(s) internally before returning from the read_super
> > > > in the failure case.
> > >
> > > Cute. Do you realize that _every_ fs would have to do that?
> >
> > Put it that way:
> > * if ->read_super() decides to fail, it should evict all inodes
> > it had put into icache.
> > * if ->put_super() does any iput(), it should take care to evict
> > that inode from icache.
>
> exactly.
>
> > IOW,
> > * if we do iput() while we are outside of (success of ->read_super(),
> > call of ->put_super()) - we want that inode to be evicted ASAP.
> >
> > Which is precisely what 2.4.15+patch does.
>
> and it's slower and overlay complex compared to the right fix:
>
> --- 2.4.15aa1/fs/ext2/super.c.~1~ Fri Nov 23 08:21:00 2001
> +++ 2.4.15aa1/fs/ext2/super.c Sat Nov 24 07:50:19 2001
> @@ -643,6 +643,7 @@
> printk(KERN_ERR "EXT2-fs: corrupt root inode, run e2fsck\n");
> } else
> printk(KERN_ERR "EXT2-fs: get root inode failed\n");
> + invalidate_inodes(sb);
> goto failed_mount2;
> }
> ext2_setup_super (sb, es, sb->s_flags & MS_RDONLY);

this one looks even better (and it doesn't need to propagate the fix to
the lowlevel):

--- 2.4.15aa1/fs/super.c.~1~ Fri Nov 23 08:21:01 2001
+++ 2.4.15aa1/fs/super.c Sat Nov 24 07:58:37 2001
@@ -470,6 +470,7 @@
return s;

out_fail:
+ invalidate_inodes(s);
s->s_dev = 0;
s->s_bdev = 0;
s->s_type = NULL;

Andrea

2001-11-24 07:07:01

by Alexander Viro

[permalink] [raw]
Subject: Re: 2.4.15-pre9 breakage (inode.c)



On Sat, 24 Nov 2001, Andrea Arcangeli wrote:

> this one looks even better (and it doesn't need to propagate the fix to
> the lowlevel):
>
> --- 2.4.15aa1/fs/super.c.~1~ Fri Nov 23 08:21:01 2001
> +++ 2.4.15aa1/fs/super.c Sat Nov 24 07:58:37 2001
> @@ -470,6 +470,7 @@
> return s;
>
> out_fail:
> + invalidate_inodes(s);

Sigh...
a) grep, please.
b) you are triggering method calls for a superblock after failed
->read_super(). Blindly. Care to audit all filesystems and check that
it is legitimate?

2001-11-24 07:12:21

by Andrea Arcangeli

[permalink] [raw]
Subject: Re: 2.4.15-pre9 breakage (inode.c)

On Sat, Nov 24, 2001 at 02:06:32AM -0500, Alexander Viro wrote:
>
>
> On Sat, 24 Nov 2001, Andrea Arcangeli wrote:
>
> > this one looks even better (and it doesn't need to propagate the fix to
> > the lowlevel):
> >
> > --- 2.4.15aa1/fs/super.c.~1~ Fri Nov 23 08:21:01 2001
> > +++ 2.4.15aa1/fs/super.c Sat Nov 24 07:58:37 2001
> > @@ -470,6 +470,7 @@
> > return s;
> >
> > out_fail:
> > + invalidate_inodes(s);
>
> Sigh...
> a) grep, please.
> b) you are triggering method calls for a superblock after failed
> ->read_super(). Blindly. Care to audit all filesystems and check that
> it is legitimate?

if the method or the s_op isn't defined it will do nothing, if it is
defined it'd better do something not wrong because the fs just did an
iget within read_super. I don't see obvious troubles and the above looks
better than making iput more complex (and nitpicking slower 8).

Andrea

2001-11-24 07:30:40

by Alexander Viro

[permalink] [raw]
Subject: Re: 2.4.15-pre9 breakage (inode.c)



On Sat, 24 Nov 2001, Andrea Arcangeli wrote:

> if the method or the s_op isn't defined it will do nothing, if it is
> defined it'd better do something not wrong because the fs just did an
> iget within read_super. I don't see obvious troubles and the above looks
> better than making iput more complex (and nitpicking slower 8).

2.4.15-pre8:

if (!list_empty(&inode->i_hash)) {
FOO
return;
} else {
BAR
}

2.4.15+patch:

if (!list_empty(&inode->i_hash)) {
FOO
if (!sb || sb->s_flags & MS_ACTIVE)
return;
write_inode_now(inode, 1);
spin_lock(&inode_lock);
inodes_stat.nr_unused--;
list_del_init(&inode->i_hash);
}
BAR

Notice that it fixes _all_ problems with stale inodes, with only one rule
for fs code - "don't call iput() when ->clear_inode() doesn't work". Your
variant requires funnier things - "if at some point ->clear_inode()
may stop working make sure to call invalidate_inodes()" in addition to
the rule above.

Frankly, I'd prefer fix that provides less possibilities for fsckup
in fs code and requires less analysis.

2001-11-24 07:45:09

by Andrea Arcangeli

[permalink] [raw]
Subject: Re: 2.4.15-pre9 breakage (inode.c)

On Sat, Nov 24, 2001 at 02:30:15AM -0500, Alexander Viro wrote:
>
>
> On Sat, 24 Nov 2001, Andrea Arcangeli wrote:
>
> > if the method or the s_op isn't defined it will do nothing, if it is
> > defined it'd better do something not wrong because the fs just did an
> > iget within read_super. I don't see obvious troubles and the above looks
> > better than making iput more complex (and nitpicking slower 8).
>
> 2.4.15-pre8:
>
> if (!list_empty(&inode->i_hash)) {
> FOO
> return;
> } else {
> BAR
> }
>
> 2.4.15+patch:
>
> if (!list_empty(&inode->i_hash)) {
> FOO
> if (!sb || sb->s_flags & MS_ACTIVE)
> return;
> write_inode_now(inode, 1);
> spin_lock(&inode_lock);
> inodes_stat.nr_unused--;
> list_del_init(&inode->i_hash);
> }
> BAR
>
> Notice that it fixes _all_ problems with stale inodes, with only one rule
> for fs code - "don't call iput() when ->clear_inode() doesn't work". Your
> variant requires funnier things - "if at some point ->clear_inode()
> may stop working make sure to call invalidate_inodes()" in addition to
> the rule above.

the rule I add is "if ->clear_inode is really needed, just don't clear
s_op before returning null from read_super" and that requirement looks
fine.

>
> Frankly, I'd prefer fix that provides less possibilities for fsckup
> in fs code and requires less analysis.

I think right fix is invalidate_inodes() in read_super fail path so I
think it worth the short analysis, and that fail path was just buggy, so
it shouldn't get worse at least :).

Andrea

2001-11-24 08:05:32

by Alexander Viro

[permalink] [raw]
Subject: Re: 2.4.15-pre9 breakage (inode.c)



On Sat, 24 Nov 2001, Andrea Arcangeli wrote:

> > Notice that it fixes _all_ problems with stale inodes, with only one rule
> > for fs code - "don't call iput() when ->clear_inode() doesn't work". Your
> > variant requires funnier things - "if at some point ->clear_inode()
> > may stop working make sure to call invalidate_inodes()" in addition to
> > the rule above.
>
> the rule I add is "if ->clear_inode is really needed, just don't clear
> s_op before returning null from read_super" and that requirement looks
> fine.

It's not that simple. You may need the per-superblock data structures for
->clear_inode() to work.

In any case, it _is_ additional rule for no good reason. "inode may stay
in icache after iput() only when fs is up and running" is a warranty that
is trivial to provide and that removes a source of hard-to-debug screwups
in fs code.

2001-11-24 08:21:53

by Andrea Arcangeli

[permalink] [raw]
Subject: Re: 2.4.15-pre9 breakage (inode.c)

On Sat, Nov 24, 2001 at 03:05:02AM -0500, Alexander Viro wrote:
>
>
> On Sat, 24 Nov 2001, Andrea Arcangeli wrote:
>
> > > Notice that it fixes _all_ problems with stale inodes, with only one rule
> > > for fs code - "don't call iput() when ->clear_inode() doesn't work". Your
> > > variant requires funnier things - "if at some point ->clear_inode()
> > > may stop working make sure to call invalidate_inodes()" in addition to
> > > the rule above.
> >
> > the rule I add is "if ->clear_inode is really needed, just don't clear
> > s_op before returning null from read_super" and that requirement looks
> > fine.
>
> It's not that simple. You may need the per-superblock data structures for
> ->clear_inode() to work.
>
> In any case, it _is_ additional rule for no good reason. "inode may stay
> in icache after iput() only when fs is up and running" is a warranty that
> is trivial to provide and that removes a source of hard-to-debug screwups
> in fs code.

I don't think it's harder to debug, you need the per-superblock data
structures for ->clear_inode() also if you try to ->clear_inode in iput,
and I cannot see any valid reason for which the fs would be allowed to
screwup the superblock before returning from read_inode. As soon as you
call iget the superblock must be sane and there's no point in screwing
it up afterwards.

Andrea

2001-11-24 08:38:33

by Alexander Viro

[permalink] [raw]
Subject: Re: 2.4.15-pre9 breakage (inode.c)



On Sat, 24 Nov 2001, Andrea Arcangeli wrote:

> I don't think it's harder to debug, you need the per-superblock data
> structures for ->clear_inode() also if you try to ->clear_inode in iput,
> and I cannot see any valid reason for which the fs would be allowed to
> screwup the superblock before returning from read_inode. As soon as you
> call iget the superblock must be sane and there's no point in screwing
> it up afterwards.

Sigh...

set per-sb structures
...
iget()
...
sanity checks
...
normal return
sanity_checks_failed:
iput()
...
free per-sb structures
...
return NULL;

Looks sane, doesn't it? And that's pretty much the only way to go if
we allocate that stuff dynamically. With your variant we _must_ call
invalidate_inodes() here to force eviction from icache. What's more,
not calling it will screw up non-deterministically - it will survive
if inode gets evicted in the right interval and produce whatever damage
it's going to produce if eviction happens too late.

Again, what we really want here is "don't keep inodes dropped during
->read_super() or ->put_super() in icache". You propose to stick
invalidate_inodes() in a bunch of places so that it would kill these
inodes before it's too late. For some filesystems it would be
covered by ones you add in fs/super.c, for other it would need
explicit calls, required positions may depend on the fs internals
and change with them... What I propose is "don't wait, kill them
immediately and forget about the whole thing".

2001-11-24 09:01:37

by Russell King

[permalink] [raw]
Subject: Re: 2.4.15-pre9 breakage (inode.c)

On Fri, Nov 23, 2001 at 09:55:42PM -0800, Linus Torvalds wrote:
> I have to say that I like this patch better myself - the added tests are
> not sensible, and just removing them seems to be the right thing.

Linus,

Could we at least have a working -final kernel? Please?

--
Russell King ([email protected]) The developer of ARM Linux
http://www.arm.linux.org.uk/personal/aboutme.html

2001-11-24 09:39:17

by Andrea Arcangeli

[permalink] [raw]
Subject: Re: 2.4.15-pre9 breakage (inode.c)

On Sat, Nov 24, 2001 at 03:38:07AM -0500, Alexander Viro wrote:
>
>
> On Sat, 24 Nov 2001, Andrea Arcangeli wrote:
>
> > I don't think it's harder to debug, you need the per-superblock data
> > structures for ->clear_inode() also if you try to ->clear_inode in iput,
> > and I cannot see any valid reason for which the fs would be allowed to
> > screwup the superblock before returning from read_inode. As soon as you
> > call iget the superblock must be sane and there's no point in screwing
> > it up afterwards.
>
> Sigh...
>
> set per-sb structures
> ...
> iget()
> ...
> sanity checks
> ...
> normal return
> sanity_checks_failed:
> iput()
> ...
> free per-sb structures
> ...
> return NULL;
>
> Looks sane, doesn't it? And that's pretty much the only way to go if

yes, it's definitely sane.

> we allocate that stuff dynamically. With your variant we _must_ call
> invalidate_inodes() here to force eviction from icache. What's more,
> not calling it will screw up non-deterministically - it will survive
> if inode gets evicted in the right interval and produce whatever damage
> it's going to produce if eviction happens too late.
>
> Again, what we really want here is "don't keep inodes dropped during
> ->read_super() or ->put_super() in icache". You propose to stick
> invalidate_inodes() in a bunch of places so that it would kill these
> inodes before it's too late. For some filesystems it would be

correct.

> covered by ones you add in fs/super.c, for other it would need
> explicit calls, required positions may depend on the fs internals
> and change with them... What I propose is "don't wait, kill them
> immediately and forget about the whole thing".

I don't like bloating iput with code just needed for backwards
compatibility with buggy code and for a non common case (infact it's not
even backwards compatibiltiy, if something it would be instead bugwards
compatibility 8).

OTOH I see the iget/iput semantics within the read_super/put_super (not
the code!) would be cleaner your way, so we don't even need to document
that if a per-sb ->clear_inode needs to access any per-sb special
structures, invalidate_inodes(sb) must be called before freeing the
per-sb structures [always ignoring any possible flush] (plus the fact
sb->s_op must not be clobbered before returning from read_super, so that
the vfs can see the s_op->clear_inode, but nobody should do that
anyways). btw, even your way we should still make sure nobody is
calling iput after freeing the per-sb structures :).

In practice I had a very short look at all the ->clear_inode implemented
and none seems to need the special per-sb structures, so I think also my
patch is just fine for current tree 2.4. (I guess somebody should check
xfs and jfs too)

I think long term (2.5) the right way is to replace all the iput in the
slow fail paths with a iput_not_mounted, that will avoid both the iput
clobbering and the MS_ACTIVE tracking. The differentiation should be
quite self documenting (so people will taste that this iput_not_mounted
is kind of raw thing that will flush + ->clear_inode and destroy the
inode synchronously, so ->clear_inode has to work while recalling
iput_not_mounted). It should be very easy to identify the iputs in the
read_super/put_super paths to replace them with the iput_not_mounted (at
least for the normal fs like ext2/minix etc..).

Andrea

2001-11-24 09:57:07

by Alexander Viro

[permalink] [raw]
Subject: Re: 2.4.15-pre9 breakage (inode.c)



On Sat, 24 Nov 2001, Andrea Arcangeli wrote:

> I think long term (2.5) the right way is to replace all the iput in the
> slow fail paths with a iput_not_mounted, that will avoid both the iput
> clobbering and the MS_ACTIVE tracking. The differentiation should be

Egads... Andrea, think for a bloody minute. What's the point of adding
a new helper that would share a lot of code with the iput() _and_ would
bring additional calling rules? We get
* more code in inode.c
* more code duplications
* a lot of opportunities to fsck up in fs code
* redundant invalidate_inodes() calls in fs/super.c existing just
to catch these fsckups.
* some filesystems getting out with only iput(), some needing new
helper.
* cut'n'paste programming getting one more source of bugs to
introduce.

And it's not even the case when filesystems could use that distinction in
any sane way...

2001-11-24 10:25:58

by Andrea Arcangeli

[permalink] [raw]
Subject: Re: 2.4.15-pre9 breakage (inode.c)

On Sat, Nov 24, 2001 at 04:56:41AM -0500, Alexander Viro wrote:
>
>
> On Sat, 24 Nov 2001, Andrea Arcangeli wrote:
>
> > I think long term (2.5) the right way is to replace all the iput in the
> > slow fail paths with a iput_not_mounted, that will avoid both the iput
> > clobbering and the MS_ACTIVE tracking. The differentiation should be
>
> Egads... Andrea, think for a bloody minute. What's the point of adding
> a new helper that would share a lot of code with the iput() _and_ would
> bring additional calling rules? We get
> * more code in inode.c
> * more code duplications
> * a lot of opportunities to fsck up in fs code

red herring, you can solve all by simply implementing it sanely, for
example make it inline with a parameter (int sync), code duplication is
not the issue, icache waste obviously isn't either since the
_not_mounted version would never be called during prodution.

> * redundant invalidate_inodes() calls in fs/super.c existing just
> to catch these fsckups.

that would obviously go away if we take your way.

> * some filesystems getting out with only iput(), some needing new
> helper.

I don't follow this one, what's the point of "needing new helper"? The
iput_not_mounted would work like yours, but without the need of the
MS_ACTIVE tracking and without slowing down iput, and most important it
would make self documenting the synchronous behaviour and the need of
->clean_inode working.

> * cut'n'paste programming getting one more source of bugs to
> introduce.

See above.

> And it's not even the case when filesystems could use that distinction in
> any sane way...

This could be a valid point against that split of functions, I thought
all iputs were localized and not share with the production code indeed.

Or we could also stick with document that if somebody has the per-sb
things, he has to run invalidate_inodes() before freeing them, that
would be the "lesser code to maintain" and simpler solution (the one I
prefer for 2.4 incidentally). Another cons about this one is that gets
rid of the dirty inodes as well, so the read_super/put_super should also
take care to flush to disk by hand in the put_super/read_super if they
really ever needs. I tend to like to have structures in place only for
the production code, and to be able to share them for the special cases
too with minimal effort, but ok, this way the semantics gets a bit more
complicated for the very unlikely case. But in practice nobody will ever
need the flush or/and the explicit invalidate_inodes into the
put_user/read_super and 2.4.15aa1 should be just unbrekeable.

In short I'm fine either ways.

Andrea

2001-11-24 10:21:44

by Christian Borntraeger

[permalink] [raw]
Subject: Re: 2.4.15-pre9 breakage (inode.c)

Hi everyone,

We have seen on the 2.4.14-loop problem that a lot of people don't read the
LKML or the archives.
But there is still the broken 2.4.15 on kernel.org.

Now I am asking myself wouldn't it be better to mark 2.4.15 as broken and
replace it as soon as Alexander and Andrea have decided which of the patches
is better?

greetings

Christian Borntr?ger