LinuxLists.cc - [RFC PATCH] ext4: increase the protection of drop nlink and ext4 inode destroy

2016-12-26 12:34:17

Subject: [RFC PATCH] ext4: increase the protection of drop nlink and ext4 inode destroy

Because of the disk and hardware issue, the ext4 filesystem have
many errors, the inode->i_nlink of ext4 becomes zero abnormally
but the dentry is still positive, it will cause memory corruption
after the following process:

1) Due to the inode->i_nlink is 0, this inode will be added into
the orhpan list,
2) ext4_rename() cover this inode, and drop_nlink() will reverse
the inode->i_nlink to 0xFFFFFFFF,
3) iput() add this inode to LRU,
4) evict() will call destroy_inode() to destroy this inode but
skip removing it from the orphan list,
5) after this, the inode's memory address space will be used by
other module, when the ext4 filesystem change the orphan list, it will
trample other module's data and then may cause oops.

Although we cannot avoid hardware and disk errors, we can control the
softwore error in the ext4 module, do not affect other modules and
increase the difficulty of locating problems.

This patch avoid inode->i_nlink reverse and remove the inode form the
orphan list when destroy it if the list is not empty.
Signed-off-by: yi zhang <[email protected]>
---
fs/ext4/super.c | 1 +
fs/inode.c | 5 ++++-
2 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index 52b0530..617327e 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -975,6 +975,7 @@ static void ext4_destroy_inode(struct inode *inode)
EXT4_I(inode), sizeof(struct ext4_inode_info),
true);
dump_stack();
+ ext4_orphan_del(NULL, inode);
}
call_rcu(&inode->i_rcu, ext4_i_callback);
}
diff --git a/fs/inode.c b/fs/inode.c
index 88110fd..079d383 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -279,7 +279,10 @@ static void destroy_inode(struct inode *inode)
*/
void drop_nlink(struct inode *inode)
{
- WARN_ON(inode->i_nlink == 0);
+ if (WARN(inode->i_nlink == 0, "inode %lu nlink"
+ " is already 0", inode->i_ino))
+ return;
+
inode->__i_nlink--;
if (!inode->i_nlink)
atomic_long_inc(&inode->i_sb->s_remove_count);
--
2.5.0

2016-12-26 18:32:38

by Andreas Dilger

[permalink] [raw]

Subject: Re: [RFC PATCH] ext4: increase the protection of drop nlink and ext4 inode destroy

On Dec 26, 2016, at 5:34 AM, yi zhang <[email protected]> wrote:
>
> Because of the disk and hardware issue, the ext4 filesystem have
> many errors, the inode->i_nlink of ext4 becomes zero abnormally
> but the dentry is still positive, it will cause memory corruption
> after the following process:
>
> 1) Due to the inode->i_nlink is 0, this inode will be added into
> the orhpan list,
> 2) ext4_rename() cover this inode, and drop_nlink() will reverse
> the inode->i_nlink to 0xFFFFFFFF,
> 3) iput() add this inode to LRU,
> 4) evict() will call destroy_inode() to destroy this inode but
> skip removing it from the orphan list,
> 5) after this, the inode's memory address space will be used by
> other module, when the ext4 filesystem change the orphan list, it will
> trample other module's data and then may cause oops.
>
> Although we cannot avoid hardware and disk errors, we can control the
> softwore error in the ext4 module, do not affect other modules and
> increase the difficulty of locating problems.
>
> This patch avoid inode->i_nlink reverse and remove the inode form the

(typo) s/form/from/

> orphan list when destroy it if the list is not empty.
> Signed-off-by: yi zhang <[email protected]>
> ---
> fs/ext4/super.c | 1 +
> fs/inode.c | 5 ++++-
> 2 files changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/fs/ext4/super.c b/fs/ext4/super.c
> index 52b0530..617327e 100644
> --- a/fs/ext4/super.c
> +++ b/fs/ext4/super.c
> @@ -975,6 +975,7 @@ static void ext4_destroy_inode(struct inode *inode)
> EXT4_I(inode), sizeof(struct ext4_inode_info),
> true);
> dump_stack();
> + ext4_orphan_del(NULL, inode);
> }
> call_rcu(&inode->i_rcu, ext4_i_callback);
> }
> diff --git a/fs/inode.c b/fs/inode.c
> index 88110fd..079d383 100644
> --- a/fs/inode.c
> +++ b/fs/inode.c
> @@ -279,7 +279,10 @@ static void destroy_inode(struct inode *inode)
> */
> void drop_nlink(struct inode *inode)
> {
> - WARN_ON(inode->i_nlink == 0);
> + if (WARN(inode->i_nlink == 0, "inode %lu nlink"
> + " is already 0", inode->i_ino))

(style) the string should be kept on a single line instead of being
split, especially since it can fit easily.

(defect) this needs to have a newline.

if (WARN(inode->i_nlink == 0,
"inode %lu nlink is already 0\n", inode->i_ino))

Cheers, Andreas

> + return;
> +
> inode->__i_nlink--;
> if (!inode->i_nlink)
> atomic_long_inc(&inode->i_sb->s_remove_count);
> --
> 2.5.0
>

Cheers, Andreas

Attachments:

signature.asc (833.00 B)
Message signed with OpenPGP using GPGMail

2016-12-31 22:59:13

by Valdis Klētnieks

[permalink] [raw]

Subject: Re: [RFC PATCH] ext4: increase the protection of drop nlink and ext4 inode destroy

On Mon, 26 Dec 2016 20:34:17 +0800, yi zhang said:
> Because of the disk and hardware issue, the ext4 filesystem have
> many errors, the inode->i_nlink of ext4 becomes zero abnormally
> but the dentry is still positive, it will cause memory corruption
> after the following process:
>
> 1) Due to the inode->i_nlink is 0, this inode will be added into
> the orhpan list,

> + if (WARN(inode->i_nlink == 0, "inode %lu nlink"
> + " is already 0", inode->i_ino))

Can we get the filesystem? Or at least the device major/minor? If a system
has multiple large ext4 filesystems, it would be helpful to know which
one is having the problem.

Attachments:

(No filename) (484.00 B)

2017-01-04 08:29:33

by Zhang Yi

[permalink] [raw]

Subject: Re: [RFC PATCH] ext4: increase the protection of drop nlink and ext4 inode destroy

On 2017/1/1 6:59, [email protected] said:
> On Mon, 26 Dec 2016 20:34:17 +0800, yi zhang said:
>> Because of the disk and hardware issue, the ext4 filesystem have
>> many errors, the inode->i_nlink of ext4 becomes zero abnormally
>> but the dentry is still positive, it will cause memory corruption
>> after the following process:
>>
>> 1) Due to the inode->i_nlink is 0, this inode will be added into
>> the orhpan list,
>
>> + if (WARN(inode->i_nlink == 0, "inode %lu nlink"
>> + " is already 0", inode->i_ino))
>
> Can we get the filesystem? Or at least the device major/minor? If a system
> has multiple large ext4 filesystems, it would be helpful to know which
> one is having the problem.
>

if (WARN(inode->i_nlink == 0,
- "inode %lu nlink is already 0", inode->i_ino))
+ "inode %lu nlink is already 0, dev=%u:%u",
+ inode->i_ino, MAJOR(inode->i_sb->s_dev), MINOR(inode->i_sb->s_dev)))
return;

We can modify as above, it's enough to know which filesystem is having the
problem, what do you think?

yi zhang

2017-01-04 22:00:24

by Andreas Dilger

[permalink] [raw]

Subject: Re: [RFC PATCH] ext4: increase the protection of drop nlink and ext4 inode destroy

On Jan 4, 2017, at 2:54 PM, Darrick J. Wong <[email protected]> wrote:
>
> On Wed, Jan 04, 2017 at 04:29:33PM +0800, zhangyi (F) wrote:
>> On 2017/1/1 6:59, [email protected] said:
>>> On Mon, 26 Dec 2016 20:34:17 +0800, yi zhang said:
>>>> Because of the disk and hardware issue, the ext4 filesystem have
>>>> many errors, the inode->i_nlink of ext4 becomes zero abnormally
>>>> but the dentry is still positive, it will cause memory corruption
>>>> after the following process:
>>>>
>>>> 1) Due to the inode->i_nlink is 0, this inode will be added into
>>>> the orhpan list,
>>>
>>>> + if (WARN(inode->i_nlink == 0, "inode %lu nlink"
>>>> + " is already 0", inode->i_ino))
>>>
>>> Can we get the filesystem? Or at least the device major/minor? If a system
>>> has multiple large ext4 filesystems, it would be helpful to know which
>>> one is having the problem.
>>>
>>
>> if (WARN(inode->i_nlink == 0,
>> - "inode %lu nlink is already 0", inode->i_ino))
>> + "inode %lu nlink is already 0, dev=%u:%u",
>> + inode->i_ino, MAJOR(inode->i_sb->s_dev), MINOR(inode->i_sb->s_dev)))
>> return;
>>
>> We can modify as above, it's enough to know which filesystem is having the
>> problem, what do you think?
>
> Why not:
>
> if (inode->i_nlink == 0) {
> ext4_warning_inode(inode, "nlink is already 0");
> return;
> }

I like this better as well.

Cheers, Andreas

Attachments:

signature.asc (833.00 B)
Message signed with OpenPGP using GPGMail

2017-01-04 22:58:43

by Darrick J. Wong

[permalink] [raw]

Subject: Re: [RFC PATCH] ext4: increase the protection of drop nlink and ext4 inode destroy

On Wed, Jan 04, 2017 at 04:29:33PM +0800, zhangyi (F) wrote:
> On 2017/1/1 6:59, [email protected] said:
> > On Mon, 26 Dec 2016 20:34:17 +0800, yi zhang said:
> >> Because of the disk and hardware issue, the ext4 filesystem have
> >> many errors, the inode->i_nlink of ext4 becomes zero abnormally
> >> but the dentry is still positive, it will cause memory corruption
> >> after the following process:
> >>
> >> 1) Due to the inode->i_nlink is 0, this inode will be added into
> >> the orhpan list,
> >
> >> + if (WARN(inode->i_nlink == 0, "inode %lu nlink"
> >> + " is already 0", inode->i_ino))
> >
> > Can we get the filesystem? Or at least the device major/minor? If a system
> > has multiple large ext4 filesystems, it would be helpful to know which
> > one is having the problem.
> >
>
> if (WARN(inode->i_nlink == 0,
> - "inode %lu nlink is already 0", inode->i_ino))
> + "inode %lu nlink is already 0, dev=%u:%u",
> + inode->i_ino, MAJOR(inode->i_sb->s_dev), MINOR(inode->i_sb->s_dev)))
> return;
>
> We can modify as above, it's enough to know which filesystem is having the
> problem, what do you think?

Why not:

if (inode->i_nlink == 0) {
ext4_warning_inode(inode, "nlink is already 0");
return;
}

?

--D

>
> yi zhang
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html

2017-01-04 23:35:59

by Theodore Ts'o

[permalink] [raw]

Subject: Re: [RFC PATCH] ext4: increase the protection of drop nlink and ext4 inode destroy

On Wed, Jan 04, 2017 at 01:54:24PM -0800, Darrick J. Wong wrote:
>
> if (inode->i_nlink == 0) {
> ext4_warning_inode(inode, "nlink is already 0");
> return;
> }

We can't do that because the place where Zhangyi is proposing to
change is in fs/inode.c:drop_nlink(), so we can't add a call to
ext4_error() or ext4_warning().

So how exactly how did we get into this state? When we read the inode
into memory, if i_nlink is zero, we declare the file system as
corrupted immediately.

So I assume this is happening the on-disk i_links_count (which is read
into inode->i_nlink) was too low. So I think the way we should be
handling this is in unlink and rename, before we let i_nlink drop to
zero, we need to check to see if there are other dcache entries
pointing at the inode. If so, we need to call ext4_error(), and in
the errors=continue case, return EFSCORRUPTED (aka EUCLEAN).

- Ted

2017-01-05 07:24:14

by Zhang Yi

[permalink] [raw]

Subject: Re: [RFC PATCH] ext4: increase the protection of drop nlink and ext4 inode destroy

On 2017/1/5 7:35, Theodore Ts'o wrote:
>
> So how exactly how did we get into this state? When we read the inode
> into memory, if i_nlink is zero, we declare the file system as
> corrupted immediately.
>
> So I assume this is happening the on-disk i_links_count (which is read
> into inode->i_nlink) was too low. So I think the way we should be
> handling this is in unlink and rename, before we let i_nlink drop to
> zero, we need to check to see if there are other dcache entries
> pointing at the inode. If so, we need to call ext4_error(), and in
> the errors=continue case, return EFSCORRUPTED (aka EUCLEAN).
>

diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c
--- a/fs/ext4/namei.c
+++ b/fs/ext4/namei.c
@@ -3662,6 +3662,11 @@ static int ext4_rename(struct inode *old_dir, struct dentry *old_dentry,
}

if (new.inode) {
+ if (new.inode->i_nlink == 0) {
+ ext4_warning_inode(new.inode, "Removing file '%.*s' with no links",
+ new.dentry->d_name.len, new.dentry->d_name.name);
+ set_nlink(new.inode, 1);
+ }
ext4_dec_count(handle, new.inode);
new.inode->i_ctime = ext4_current_time(new.inode);
}

Because the filesystem have many errors, and the reason of i_nlink becomes zero is unknown,
the on-disk i_links_count was too low may be one reason. I think we can add i_nlink check in
ext4_rename just like ext4_unlink did, it can avoid inversion under any case.

2017-01-05 17:38:38

by Darrick J. Wong

[permalink] [raw]

Subject: Re: [RFC PATCH] ext4: increase the protection of drop nlink and ext4 inode destroy

On Thu, Jan 05, 2017 at 03:24:14PM +0800, zhangyi (F) wrote:
>
> On 2017/1/5 7:35, Theodore Ts'o wrote:
> >
> > So how exactly how did we get into this state? When we read the inode
> > into memory, if i_nlink is zero, we declare the file system as
> > corrupted immediately.
> >
> > So I assume this is happening the on-disk i_links_count (which is read
> > into inode->i_nlink) was too low. So I think the way we should be
> > handling this is in unlink and rename, before we let i_nlink drop to

D'oh, /me failed to notice the patch was against fs/inode.c, not
fs/ext4/inode.c. Sorry for the noise.

> > zero, we need to check to see if there are other dcache entries
> > pointing at the inode. If so, we need to call ext4_error(), and in
> > the errors=continue case, return EFSCORRUPTED (aka EUCLEAN).
> >
>
> diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c
> --- a/fs/ext4/namei.c
> +++ b/fs/ext4/namei.c
> @@ -3662,6 +3662,11 @@ static int ext4_rename(struct inode *old_dir, struct dentry *old_dentry,
> }
>
> if (new.inode) {
> + if (new.inode->i_nlink == 0) {
> + ext4_warning_inode(new.inode, "Removing file '%.*s' with no links",
> + new.dentry->d_name.len, new.dentry->d_name.name);
> + set_nlink(new.inode, 1);

Not sure we need to dump the dentry d_name, but I guess it can't hurt.

> + }
> ext4_dec_count(handle, new.inode);
> new.inode->i_ctime = ext4_current_time(new.inode);
> }
>
> Because the filesystem have many errors, and the reason of i_nlink becomes
> zero is unknown, the on-disk i_links_count was too low may be one reason. I
> think we can add i_nlink check in ext4_rename just like ext4_unlink did, it
> can avoid inversion under any case.

Er... yes, you could add the same hunk to ext4_unlink.

--D

>
>
>
>
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html

2017-01-11 09:07:29

by Zhang Yi

[permalink] [raw]

Subject: Re: [RFC PATCH] ext4: increase the protection of drop nlink and ext4 inode destroy

on 2017/1/5 7:35, Theodore Ts'o wrote:
> On Wed, Jan 04, 2017 at 01:54:24PM -0800, Darrick J. Wong wrote:
>>
>> if (inode->i_nlink == 0) {
>> ext4_warning_inode(inode, "nlink is already 0");
>> return;
>> }
>
> We can't do that because the place where Zhangyi is proposing to
> change is in fs/inode.c:drop_nlink(), so we can't add a call to
> ext4_error() or ext4_warning().
>
> So how exactly how did we get into this state? When we read the inode
> into memory, if i_nlink is zero, we declare the file system as
> corrupted immediately.
>
> So I assume this is happening the on-disk i_links_count (which is read
> into inode->i_nlink) was too low. So I think the way we should be
> handling this is in unlink and rename, before we let i_nlink drop to
> zero, we need to check to see if there are other dcache entries
> pointing at the inode. If so, we need to call ext4_error(), and in
> the errors=continue case, return EFSCORRUPTED (aka EUCLEAN).
>
> - Ted
>

Hi Theodore:

The i_nlink underflow and memory corruption problem on ext4fs remains inconclusive.

You suggest we can check dcache entries when i_nlink drop to zero in unlink and
rename. But I think it may still have some problems, assume the following situation:

(1) The file we want to unlink have many hard links, but only one dcache entry in memory.
(2) open this file, but it's inode->i_nlink read from disk was 1 (too low).
(3) some one call rename and drop it's i_nlink to zero.
(4) it's inode is still in use and do not destroy (not closed), at the same time,
some others open it's hard link and create a dcache entry.
(5) call rename again and it's i_nlink will still underflow and cause memory corruption.

For simplicity, I think we can add underflow protection in ext4_rename or
drop_nlink as V2 and V3 patch wrote. What do you think?

yi zhang

2017-01-11 15:34:49

by Theodore Ts'o

[permalink] [raw]

Subject: Re: [RFC PATCH] ext4: increase the protection of drop nlink and ext4 inode destroy

On Wed, Jan 11, 2017 at 05:07:29PM +0800, zhangyi (F) wrote:
>
> (1) The file we want to unlink have many hard links, but only one dcache entry in memory.
> (2) open this file, but it's inode->i_nlink read from disk was 1 (too low).
> (3) some one call rename and drop it's i_nlink to zero.
> (4) it's inode is still in use and do not destroy (not closed), at the same time,
> some others open it's hard link and create a dcache entry.
> (5) call rename again and it's i_nlink will still underflow and cause memory corruption.

Do you have reproducers that make it easy to reproduce situations like
this? (It shouldn't be hard to write, but if you have them already
will save me some effort. :-)

If we ever get passed an inode to ext4_file_open() where i_nlink is
zero, we can declare the file system is corrupt by calling
ext4_error() to report the problem.

Similarly, whenever we are passed a dentry pointing to an inode for
link, unlink, rename, and other methods in the inode_operations
structure, by definition the file system is corrupt, and again we
should report this using ext4_error().

So I don't think we should think of this as adding "underflow
protection"; instead we should think of it as adding more aggressive
detection of file system inconsistencies. If there is dentry which is
valid, and pointing at an inode where n_links is zero, something has
gone seriously wrong. So we should call ext4_error() to report the
file system inconsistency, and then return EFSCORRUPTED (aka EUCLEAN).

Since we would be doing this in a number of places, we should probably
add an inline function:

static inline int ext4_validate_dentry(struct dentry *dentry);

which returns 0 if the dentry is valid, and calls ext4_error_inode()
and returns -EFSCORRUPTED if the dentry points to an inode with a zero
i_nlink.

(Note: it's valid for i_nlinks to be zero if the system call started
with a file descriptor, such as read(2) or write(2) operating on a
file which is still deleted but has open file descriptors. But if the
user has passed a pathname to the system call, such as in the case of
open(), rename(), unlink(), rmdir(), etc, then the dentry had better
be pointing at an inode with a non-zero i_nlink call. We need to be a
bit careful if the method function could be called by both a pathname
and file descriptor variant of the system call --- for example
fsetxattr(2) and setxattr(2); we won't be able to use
ext4_validate_dentry() for those inode_operations calls.)

Cheers,

- Ted

2017-01-12 08:06:56

by Zhang Yi

[permalink] [raw]

Subject: Re: [RFC PATCH] ext4: increase the protection of drop nlink and ext4 inode destroy

2017-01-12 17:03:28

by Theodore Ts'o

[permalink] [raw]

Subject: Re: [RFC PATCH] ext4: increase the protection of drop nlink and ext4 inode destroy

On Thu, Jan 12, 2017 at 04:00:16PM +0800, zhangyi (F) wrote:
>
> At the same time, I think other file systems may have the same problem, do
> you think we should put these detections on the VFS layer? Thus other file
> systems no need to do the same things, but the disadvantage is that we can
> not call ext4_error to report ext4 inconsistency.

There are file systems which don't have inodes per-se where the
i_nlinks could be a something which is simulated by the file system.
So it's not *necessarily* an on-disk inconsistency.

We'll have to see if Al and other file system developers are
agreeable, but one thing that we could do is to do the detection in
the VFS layer (which it is actually easier to do), and if they find an
issue, they can just pass a report via a callback function found in
the struct_operations structure. If there isn't such a function
defined, or the function returns 0, the VFS could just do nothing; if
it returns an error code, then that would get reflected back up to
userspace, plus whatever other action the file system sees fit to do.

- Ted

2017-01-13 03:42:19

by Al Viro

[permalink] [raw]

Subject: Re: [RFC PATCH] ext4: increase the protection of drop nlink and ext4 inode destroy

On Thu, Jan 12, 2017 at 12:03:28PM -0500, Theodore Ts'o wrote:
> On Thu, Jan 12, 2017 at 04:00:16PM +0800, zhangyi (F) wrote:
> >
> > At the same time, I think other file systems may have the same problem, do
> > you think we should put these detections on the VFS layer? Thus other file
> > systems no need to do the same things, but the disadvantage is that we can
> > not call ext4_error to report ext4 inconsistency.
>
> There are file systems which don't have inodes per-se where the
> i_nlinks could be a something which is simulated by the file system.
> So it's not *necessarily* an on-disk inconsistency.
>
> We'll have to see if Al and other file system developers are
> agreeable, but one thing that we could do is to do the detection in
> the VFS layer (which it is actually easier to do), and if they find an
> issue, they can just pass a report via a callback function found in
> the struct_operations structure. If there isn't such a function
> defined, or the function returns 0, the VFS could just do nothing; if
> it returns an error code, then that would get reflected back up to
> userspace, plus whatever other action the file system sees fit to do.

Detection of what? Zero ->i_nlink on inode of dentry that passes e.g.
may_delete()?

2017-01-13 14:26:56

by Theodore Ts'o

[permalink] [raw]

Subject: Re: [RFC PATCH] ext4: increase the protection of drop nlink and ext4 inode destroy

On Fri, Jan 13, 2017 at 03:42:19AM +0000, Al Viro wrote:
> On Thu, Jan 12, 2017 at 12:03:28PM -0500, Theodore Ts'o wrote:
> > On Thu, Jan 12, 2017 at 04:00:16PM +0800, zhangyi (F) wrote:
> > >
> > > At the same time, I think other file systems may have the same problem, do
> > > you think we should put these detections on the VFS layer? Thus other file
> > > systems no need to do the same things, but the disadvantage is that we can
> > > not call ext4_error to report ext4 inconsistency.
> >
> > There are file systems which don't have inodes per-se where the
> > i_nlinks could be a something which is simulated by the file system.
> > So it's not *necessarily* an on-disk inconsistency.
> >
> > We'll have to see if Al and other file system developers are
> > agreeable, but one thing that we could do is to do the detection in
> > the VFS layer (which it is actually easier to do), and if they find an
> > issue, they can just pass a report via a callback function found in
> > the struct_operations structure. If there isn't such a function
> > defined, or the function returns 0, the VFS could just do nothing; if
> > it returns an error code, then that would get reflected back up to
> > userspace, plus whatever other action the file system sees fit to do.
>
> Detection of what? Zero ->i_nlink on inode of dentry that passes e.g.
> may_delete()?

Or other impossible cases where there is a valid dentry pointing at an
inode with zero i_nlink. I am fairly sure this should **never**
happen in the case of unlink(2), rmdir(2), or by the time we call
file_ops->open(), and if it does, it indicates that the underlying
on-disk file system (at least for ext4) is corrupt.

Am I missing a case? And do you have an opinion about whether we
should be trying to do this check at the VFS layer versus inside ext4?

Thanks,

- Ted

2017-01-16 03:24:46

by Zhang Yi

[permalink] [raw]

Subject: Re: [RFC PATCH] ext4: increase the protection of drop nlink and ext4 inode destroy

on 2017/1/11 23:34, Theodore Ts'o wrote:
> On Wed, Jan 11, 2017 at 05:07:29PM +0800, zhangyi (F) wrote:
>>
>> (1) The file we want to unlink have many hard links, but only one dcache entry in memory.
>> (2) open this file, but it's inode->i_nlink read from disk was 1 (too low).
>> (3) some one call rename and drop it's i_nlink to zero.
>> (4) it's inode is still in use and do not destroy (not closed), at the same time,
>> some others open it's hard link and create a dcache entry.
>> (5) call rename again and it's i_nlink will still underflow and cause memory corruption.
>
> Do you have reproducers that make it easy to reproduce situations like
> this? (It shouldn't be hard to write, but if you have them already
> will save me some effort. :-)
>

I make a reproducer, we can do the following steps to reproduce this probrem easily:
1) mount a ext4 file system, and create 3 files and 1 hard link,

#mount /dev/sdax /mnt
#cd /mnt
#touch old_file1 old_file2 new_file
#ln new_file new_link1

2) umount the file system and use the debugfs to change new_file's
links_count value to 1, which is used to simulate the fs inconsistency,

#umount /mnt
#debugfs /dev/sdax -w
set_inode_field new_file links_count 1

3) mount the fs again, and then execute the following program (Note:
do not execute the ls cmd, it will create the second dcache entry),

#define RENAME_OLD_FILE_1 "old_file1"
#define RENAME_OLD_FILE_2 "old_file2"
#define RENAME_NEW_FILE "new_file"
#define NEW_FILE_LINK_1 "new_link1"

int main(int argc, char *argv[])
{
int fd = 0;
int err = 0;

fd = open(RENAME_NEW_FILE, O_RDONLY);
if (fd < 0) {
printf("open error:%d\n", errno);
return -1;
}

err = rename(RENAME_OLD_FILE_1, RENAME_NEW_FILE);
if (err < 0) {
printf("rename error:%d\n", errno);
close(fd);
return -1;
}

err = rename(RENAME_OLD_FILE_2, NEW_FILE_LINK_1);
if (err < 0) {
printf("rename error:%d\n", errno);
close(fd);
return -1;
}

close(fd);
return 0;
}

4) after this, the new_file's inode->i_nlink is underflowed and add to orphan list,
kernel dump like this:

------------[ cut here ]------------
WARNING: CPU: 0 PID: 1814 at fs/inode.c:282 drop_nlink+0x3e/0x50
...
Call Trace:
dump_stack+0x63/0x86
__warn+0xcb/0xf0
warn_slowpath_null+0x1d/0x20
drop_nlink+0x3e/0x50
ext4_rename+0x532/0x8c0
ext4_rename2+0x1d/0x30
vfs_rename+0x728/0x940
? __lookup_hash+0x20/0xa0
SyS_rename+0x3ba/0x3e0
entry_SYSCALL_64_fastpath+0x1a/0xa9
...
---[ end trace b157dacbc891e6e8 ]---

5) then, we trigger mem shrink, this inode will be destroyed but it is still
on the orphan list,

#echo 3 > /proc/sys/vm/drop_caches

kernrl dump:

EXT4-fs (sdb1): Inode 16 (ffff98f4b3285c20): orphan list check failed!
...
ffff98f4b3285d30: fa87e800 ffff98f4 b3285e80 ffff98f4 .........^(.....
ffff98f4b3285d40: b20829d8 ffff98f4 00000010 00000000 .)..............
ffff98f4b3285d50: ffffffff 00000000 00000000 00000000 ................
...
Call Trace:
dump_stack+0x63/0x86
ext4_destroy_inode+0xa0/0xb0
destroy_inode+0x3b/0x60
evict+0x130/0x1c0
dispose_list+0x4d/0x70
prune_icache_sb+0x5a/0x80
super_cache_scan+0x14b/0x1a0
shrink_slab.part.40+0x1f5/0x420
shrink_slab+0x29/0x30
drop_slab_node+0x31/0x60
drop_slab+0x3f/0x70
drop_caches_sysctl_handler+0x71/0xc0
proc_sys_call_handler+0xea/0x110
proc_sys_write+0x14/0x20
__vfs_write+0x37/0x160
? selinux_file_permission+0xd7/0x110
? security_file_permission+0x3b/0xc0
vfs_write+0xb5/0x1a0
SyS_write+0x55/0xc0
entry_SYSCALL_64_fastpath+0x1a/0xa9
...
bash (1594): drop_caches: 3

6) Some time later, if we change the orphan list, it will cause memory corruption.

Thanks.

zhangyi