Hello,
this series of patches fixes possible use-after-free issues resulting from
freed inode being still in writeback list. One partner was able to trigger
this using fsstress on filesystem on pmem device. The first patch fixes
mostly a theoretical issue, the third patch fixes a real issue happening
in the wild and leading to kernel crashing.
Honza
evict() can race with writeback_sb_inodes() and so
list_empty(&inode->i_io_list) check can race with list_move() from
redirty_tail() possibly resulting in list_empty() returning false and
thus we end up leaving freed inode in wb->b_dirty list leading to
use-after-free issues.
Fix the problem by using list_empty_careful() check and add assert that
inode's i_io_list is empty in clear_inode() to catch the problem earlier
in the future.
Signed-off-by: Jan Kara <[email protected]>
---
fs/inode.c | 9 ++++++++-
1 file changed, 8 insertions(+), 1 deletion(-)
diff --git a/fs/inode.c b/fs/inode.c
index 93d9252a00ab..a73c8a7aa71a 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -534,6 +534,7 @@ void clear_inode(struct inode *inode)
BUG_ON(!(inode->i_state & I_FREEING));
BUG_ON(inode->i_state & I_CLEAR);
BUG_ON(!list_empty(&inode->i_wb_list));
+ BUG_ON(!list_empty(&inode->i_io_list));
/* don't need i_lock here, no concurrent mods to i_state */
inode->i_state = I_FREEING | I_CLEAR;
}
@@ -559,7 +560,13 @@ static void evict(struct inode *inode)
BUG_ON(!(inode->i_state & I_FREEING));
BUG_ON(!list_empty(&inode->i_lru));
- if (!list_empty(&inode->i_io_list))
+ /*
+ * We are the only holder of the inode so it cannot be marked dirty.
+ * Flusher thread won't start new writeback but there can be still e.g.
+ * redirty_tail() running from writeback_sb_inodes(). So we have to be
+ * careful to remove inode from dirty/io list in all the cases.
+ */
+ if (!list_empty_careful(&inode->i_io_list))
inode_io_list_del(inode);
inode_sb_list_del(inode);
--
2.16.4
Ext4 needs to remove inode from writeback lists after it is out of
visibility of its journalling machinery (which can still dirty the
inode). Export inode_io_list_del() for it.
Signed-off-by: Jan Kara <[email protected]>
---
fs/fs-writeback.c | 1 +
fs/internal.h | 2 --
include/linux/writeback.h | 1 +
3 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index 76ac9c7d32ec..e58bd5f758d0 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -1126,6 +1126,7 @@ void inode_io_list_del(struct inode *inode)
inode_io_list_del_locked(inode, wb);
spin_unlock(&wb->list_lock);
}
+EXPORT_SYMBOL(inode_io_list_del);
/*
* mark an inode as under writeback on the sb
diff --git a/fs/internal.h b/fs/internal.h
index aa5d45524e87..8819d0d58b03 100644
--- a/fs/internal.h
+++ b/fs/internal.h
@@ -143,8 +143,6 @@ extern int dentry_needs_remove_privs(struct dentry *dentry);
/*
* fs-writeback.c
*/
-extern void inode_io_list_del(struct inode *inode);
-
extern long get_nr_dirty_inodes(void);
extern int invalidate_inodes(struct super_block *, bool);
diff --git a/include/linux/writeback.h b/include/linux/writeback.h
index a19d845dd7eb..902aa317621b 100644
--- a/include/linux/writeback.h
+++ b/include/linux/writeback.h
@@ -197,6 +197,7 @@ void wakeup_flusher_threads(enum wb_reason reason);
void wakeup_flusher_threads_bdi(struct backing_dev_info *bdi,
enum wb_reason reason);
void inode_wait_for_writeback(struct inode *inode);
+void inode_io_list_del(struct inode *inode);
/* writeback.h requires fs.h; it, too, is not included from here. */
static inline void wait_on_inode(struct inode *inode)
--
2.16.4
When we are evicting inode with journalled data, we may race with
transaction commit in the following way:
CPU0 CPU1
jbd2_journal_commit_transaction() evict(inode)
inode_io_list_del()
inode_wait_for_writeback()
process BJ_Forget list
__jbd2_journal_insert_checkpoint()
__jbd2_journal_refile_buffer()
__jbd2_journal_unfile_buffer()
if (test_clear_buffer_jbddirty(bh))
mark_buffer_dirty(bh)
__mark_inode_dirty(inode)
ext4_evict_inode(inode)
frees the inode
This results in use-after-free issues in the writeback code (or
the assertion added in the previous commit triggering).
Fix the problem by removing inode from writeback lists once all the page
cache is evicted and so inode cannot be added to writeback lists again.
Signed-off-by: Jan Kara <[email protected]>
---
fs/ext4/inode.c | 10 ++++++++++
1 file changed, 10 insertions(+)
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index e416096fc081..d8a9d3da678c 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -220,6 +220,16 @@ void ext4_evict_inode(struct inode *inode)
ext4_begin_ordered_truncate(inode, 0);
truncate_inode_pages_final(&inode->i_data);
+ /*
+ * For inodes with journalled data, transaction commit could have
+ * dirtied the inode. Flush worker is ignoring it because of I_FREEING
+ * flag but we still need to remove the inode from the writeback lists.
+ */
+ if (!list_empty_careful(&inode->i_io_list)) {
+ WARN_ON_ONCE(!ext4_should_journal_data(inode));
+ inode_io_list_del(inode);
+ }
+
/*
* Protect us against freezing - iput() caller didn't have to have any
* protection against it
--
2.16.4
hi,
> evict() can race with writeback_sb_inodes() and so
> list_empty(&inode->i_io_list) check can race with list_move() from
> redirty_tail() possibly resulting in list_empty() returning false and
^^^^^^^^^^^^^^^
returning true?
if (!list_empty(&inode->i_io_list))
inode_io_list_del(inode);
so "!list_empty(&inode->i_io_list)" returns false, and will not remove
inode for wb->b_dirty list.
> thus we end up leaving freed inode in wb->b_dirty list leading to
> use-after-free issues.
>
> Fix the problem by using list_empty_careful() check and add assert that
> inode's i_io_list is empty in clear_inode() to catch the problem earlier
> in the future.
From list_empty_careful()'s comments, using list_empty_careful() without
synchronization can only be safe if the only activity that can happen to the
list entry is list_del_init(), but list_move() does not use list_del_init().
static inline void list_move(struct list_head *list, struct list_head *head)
{
__list_del_entry(list);
list_add(list, head);
}
So I wonder whether list_empty(&inode->i_io_list) check in evict() can race with
list_move() from redirty_tail()?
Regards,
Xiaoguang Wang
>
> Signed-off-by: Jan Kara <[email protected]>
> ---
> fs/inode.c | 9 ++++++++-
> 1 file changed, 8 insertions(+), 1 deletion(-)
>
> diff --git a/fs/inode.c b/fs/inode.c
> index 93d9252a00ab..a73c8a7aa71a 100644
> --- a/fs/inode.c
> +++ b/fs/inode.c
> @@ -534,6 +534,7 @@ void clear_inode(struct inode *inode)
> BUG_ON(!(inode->i_state & I_FREEING));
> BUG_ON(inode->i_state & I_CLEAR);
> BUG_ON(!list_empty(&inode->i_wb_list));
> + BUG_ON(!list_empty(&inode->i_io_list));
> /* don't need i_lock here, no concurrent mods to i_state */
> inode->i_state = I_FREEING | I_CLEAR;
> }
> @@ -559,7 +560,13 @@ static void evict(struct inode *inode)
> BUG_ON(!(inode->i_state & I_FREEING));
> BUG_ON(!list_empty(&inode->i_lru));
>
> - if (!list_empty(&inode->i_io_list))
> + /*
> + * We are the only holder of the inode so it cannot be marked dirty.
> + * Flusher thread won't start new writeback but there can be still e.g.
> + * redirty_tail() running from writeback_sb_inodes(). So we have to be
> + * careful to remove inode from dirty/io list in all the cases.
> + */
> + if (!list_empty_careful(&inode->i_io_list))
> inode_io_list_del(inode);
>
> inode_sb_list_del(inode);
>
On Sat 25-04-20 14:42:03, Xiaoguang Wang wrote:
> hi,
>
> > evict() can race with writeback_sb_inodes() and so
> > list_empty(&inode->i_io_list) check can race with list_move() from
> > redirty_tail() possibly resulting in list_empty() returning false and
> ^^^^^^^^^^^^^^^
> returning true?
> if (!list_empty(&inode->i_io_list))
> inode_io_list_del(inode);
> so "!list_empty(&inode->i_io_list)" returns false, and will not remove
> inode for wb->b_dirty list.
Yeah, right. I'll fix the mistake in the changelog. Thanks for noticing.
> > thus we end up leaving freed inode in wb->b_dirty list leading to
> > use-after-free issues.
> >
> > Fix the problem by using list_empty_careful() check and add assert that
> > inode's i_io_list is empty in clear_inode() to catch the problem earlier
> > in the future.
> From list_empty_careful()'s comments, using list_empty_careful() without
> synchronization can only be safe if the only activity that can happen to the
> list entry is list_del_init(), but list_move() does not use list_del_init().
>
> static inline void list_move(struct list_head *list, struct list_head *head)
> {
> __list_del_entry(list);
> list_add(list, head);
> }
>
> So I wonder whether list_empty(&inode->i_io_list) check in evict() can
> race with list_move() from redirty_tail()?
list_empty() check can race with list_move() but I don't think the outcome
of the racy check can ever be that the list is empty... Thinking about it
again, I'm not sure how even the list_empty() check could give false
positive because during the list_move() sequence, I don't think head->next
== head is ever true. So maybe this patch isn't needed at all (except for
the added BUG_ON() which is useful).
Honza
> >
> > Signed-off-by: Jan Kara <[email protected]>
> > ---
> > fs/inode.c | 9 ++++++++-
> > 1 file changed, 8 insertions(+), 1 deletion(-)
> >
> > diff --git a/fs/inode.c b/fs/inode.c
> > index 93d9252a00ab..a73c8a7aa71a 100644
> > --- a/fs/inode.c
> > +++ b/fs/inode.c
> > @@ -534,6 +534,7 @@ void clear_inode(struct inode *inode)
> > BUG_ON(!(inode->i_state & I_FREEING));
> > BUG_ON(inode->i_state & I_CLEAR);
> > BUG_ON(!list_empty(&inode->i_wb_list));
> > + BUG_ON(!list_empty(&inode->i_io_list));
> > /* don't need i_lock here, no concurrent mods to i_state */
> > inode->i_state = I_FREEING | I_CLEAR;
> > }
> > @@ -559,7 +560,13 @@ static void evict(struct inode *inode)
> > BUG_ON(!(inode->i_state & I_FREEING));
> > BUG_ON(!list_empty(&inode->i_lru));
> > - if (!list_empty(&inode->i_io_list))
> > + /*
> > + * We are the only holder of the inode so it cannot be marked dirty.
> > + * Flusher thread won't start new writeback but there can be still e.g.
> > + * redirty_tail() running from writeback_sb_inodes(). So we have to be
> > + * careful to remove inode from dirty/io list in all the cases.
> > + */
> > + if (!list_empty_careful(&inode->i_io_list))
> > inode_io_list_del(inode);
> > inode_sb_list_del(inode);
> >
--
Jan Kara <[email protected]>
SUSE Labs, CR
On Tue, Apr 21, 2020 at 10:54:44AM +0200, Jan Kara wrote:
> Ext4 needs to remove inode from writeback lists after it is out of
> visibility of its journalling machinery (which can still dirty the
> inode). Export inode_io_list_del() for it.
>
> Signed-off-by: Jan Kara <[email protected]>
Applied, thanks.
- Ted
On Tue, Apr 21, 2020 at 10:54:45AM +0200, Jan Kara wrote:
> When we are evicting inode with journalled data, we may race with
> transaction commit in the following way:
>
> CPU0 CPU1
> jbd2_journal_commit_transaction() evict(inode)
> inode_io_list_del()
> inode_wait_for_writeback()
> process BJ_Forget list
> __jbd2_journal_insert_checkpoint()
> __jbd2_journal_refile_buffer()
> __jbd2_journal_unfile_buffer()
> if (test_clear_buffer_jbddirty(bh))
> mark_buffer_dirty(bh)
> __mark_inode_dirty(inode)
> ext4_evict_inode(inode)
> frees the inode
>
> This results in use-after-free issues in the writeback code (or
> the assertion added in the previous commit triggering).
>
> Fix the problem by removing inode from writeback lists once all the page
> cache is evicted and so inode cannot be added to writeback lists again.
>
> Signed-off-by: Jan Kara <[email protected]>
Applied, thanks.
- Ted