2011-09-13 02:57:14

by Valerie Aurora

[permalink] [raw]
Subject: [RFC PATCH 1/3] VFS: Fix s_umount thaw/write deadlock



Attachments:
1 (8.62 kB)

2011-09-14 14:00:48

by Jan Kara

[permalink] [raw]
Subject: Re: [RFC PATCH 1/3] VFS: Fix s_umount thaw/write deadlock

On Mon 12-09-11 19:57:11, Valerie Aurora wrote:
> Val, if you are sending patches as attachments, make them at least
> text/plain please!
>
> diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
> index 04cf3b9..d1dca03 100644
> --- a/fs/fs-writeback.c
> +++ b/fs/fs-writeback.c
> @@ -537,6 +537,9 @@ static long writeback_sb_inodes(struct super_block *sb,
> long write_chunk;
> long wrote = 0; /* count both pages and inodes */
>
> + if (vfs_is_frozen(sb))
> + return 0;
> +
Umm, maybe we could make this more robust by skipping the superblock in
__writeback_inodes_wb() and just explicitely stopping the writeback when
work->sb is set (i.e. writeback is required only for frozen sb) in
wb_writeback()?

> while (!list_empty(&wb->b_io)) {
> struct inode *inode = wb_inode(wb->b_io.prev);
>
> @@ -1238,39 +1241,43 @@ EXPORT_SYMBOL(writeback_inodes_sb);
> * writeback_inodes_sb_if_idle - start writeback if none underway
> * @sb: the superblock
> *
> - * Invoke writeback_inodes_sb if no writeback is currently underway.
> - * Returns 1 if writeback was started, 0 if not.
> + * Invoke writeback_inodes_sb if no writeback is currently underway
> + * and no one else holds the s_umount lock. Returns 1 if writeback
> + * was started, 0 if not.
> */
> int writeback_inodes_sb_if_idle(struct super_block *sb)
> {
> if (!writeback_in_progress(sb->s_bdi)) {
> - down_read(&sb->s_umount);
> - writeback_inodes_sb(sb);
> - up_read(&sb->s_umount);
> - return 1;
> - } else
> - return 0;
> + if (down_read_trylock(&sb->s_umount)) {
What's exactly the deadlock trylock protects from here? Or is it just an
optimization?

> + writeback_inodes_sb(sb);
> + up_read(&sb->s_umount);
> + return 1;
> + }
> + }
> + return 0;
> }
> EXPORT_SYMBOL(writeback_inodes_sb_if_idle);
>
> /**
> - * writeback_inodes_sb_if_idle - start writeback if none underway
> + * writeback_inodes_sb_nr_if_idle - start writeback if none underway
> * @sb: the superblock
> * @nr: the number of pages to write
> *
> - * Invoke writeback_inodes_sb if no writeback is currently underway.
> - * Returns 1 if writeback was started, 0 if not.
> + * Invoke writeback_inodes_sb if no writeback is currently underway
> + * and no one else holds the s_umount lock. Returns 1 if writeback
> + * was started, 0 if not.
> */
> int writeback_inodes_sb_nr_if_idle(struct super_block *sb,
> unsigned long nr)
> {
> if (!writeback_in_progress(sb->s_bdi)) {
> - down_read(&sb->s_umount);
> - writeback_inodes_sb_nr(sb, nr);
> - up_read(&sb->s_umount);
> - return 1;
> - } else
> - return 0;
> + if (down_read_trylock(&sb->s_umount)) {
The same question here...

> + writeback_inodes_sb_nr(sb, nr);
> + up_read(&sb->s_umount);
> + return 1;
> + }
> + }
> + return 0;
> }
> EXPORT_SYMBOL(writeback_inodes_sb_nr_if_idle);
>
> diff --git a/fs/quota/quota.c b/fs/quota/quota.c
> index b34bdb2..993ce22 100644
> --- a/fs/quota/quota.c
> +++ b/fs/quota/quota.c
> @@ -366,6 +366,18 @@ SYSCALL_DEFINE4(quotactl, unsigned int, cmd, const char __user *, special,
> if (IS_ERR(sb))
> return PTR_ERR(sb);
>
> + /*
> + * It's not clear which quota functions may write to the file
> + * system (all?). Check for a frozen fs and bail out now.
> + */
> + if (vfs_is_frozen(sb)) {
> + drop_super(sb);
> + /* XXX Should quotactl_block() error path do this too? */
Yes, it does. Thanks for spotting this.

> + if (pathp && !IS_ERR(pathp))
> + path_put(pathp);
> + return -EBUSY;
> + }
> +
> ret = do_quotactl(sb, type, cmds, id, addr, pathp);
>
> drop_super(sb);
...
> diff --git a/fs/sync.c b/fs/sync.c
> index c98a747..db15b11 100644
> --- a/fs/sync.c
> +++ b/fs/sync.c
> @@ -68,7 +68,7 @@ int sync_filesystem(struct super_block *sb)
> /*
> * No point in syncing out anything if the filesystem is read-only.
> */
> - if (sb->s_flags & MS_RDONLY)
> + if ((sb->s_flags & MS_RDONLY) && !vfs_is_frozen(sb))
^^^^
The check should be: (sb->s_flags & MS_RDONLY || vfs_is_frozen(sb))

> return 0;
>
> ret = __sync_filesystem(sb, 0);

Honza
--...
Jan Kara <[email protected]>
SUSE Labs, CR

2011-09-14 23:53:40

by Valerie Aurora

[permalink] [raw]
Subject: Re: [RFC PATCH 1/3] VFS: Fix s_umount thaw/write deadlock

On Wed, Sep 14, 2011 at 7:00 AM, Jan Kara <[email protected]> wrote:
> On Mon 12-09-11 19:57:11, Valerie Aurora wrote:
>> Val, if you are sending patches as attachments, make them at least
>> text/plain please!

Oops, sorry.

>> diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
>> index 04cf3b9..d1dca03 100644
>> --- a/fs/fs-writeback.c
>> +++ b/fs/fs-writeback.c
>> @@ -537,6 +537,9 @@ static long writeback_sb_inodes(struct super_block *sb,
>> ? ? ? long write_chunk;
>> ? ? ? long wrote = 0; ?/* count both pages and inodes */
>>
>> + ? ? if (vfs_is_frozen(sb))
>> + ? ? ? ? ? ? return 0;
>> +
> ?Umm, maybe we could make this more robust by skipping the superblock in
> __writeback_inodes_wb() and just explicitely stopping the writeback when
> work->sb is set (i.e. writeback is required only for frozen sb) in
> wb_writeback()?

Sorry, I don't quite understand what the goal is here? I'm happy to
make the change, just want to make sure I'm accomplishing what you
want.

>> ? ? ? while (!list_empty(&wb->b_io)) {
>> ? ? ? ? ? ? ? struct inode *inode = wb_inode(wb->b_io.prev);
>>
>> @@ -1238,39 +1241,43 @@ EXPORT_SYMBOL(writeback_inodes_sb);
>> ? * writeback_inodes_sb_if_idle ? ? ? - ? ? ? start writeback if none underway
>> ? * @sb: the superblock
>> ? *
>> - * Invoke writeback_inodes_sb if no writeback is currently underway.
>> - * Returns 1 if writeback was started, 0 if not.
>> + * Invoke writeback_inodes_sb if no writeback is currently underway
>> + * and no one else holds the s_umount lock. ?Returns 1 if writeback
>> + * was started, 0 if not.
>> ? */
>> ?int writeback_inodes_sb_if_idle(struct super_block *sb)
>> ?{
>> ? ? ? if (!writeback_in_progress(sb->s_bdi)) {
>> - ? ? ? ? ? ? down_read(&sb->s_umount);
>> - ? ? ? ? ? ? writeback_inodes_sb(sb);
>> - ? ? ? ? ? ? up_read(&sb->s_umount);
>> - ? ? ? ? ? ? return 1;
>> - ? ? } else
>> - ? ? ? ? ? ? return 0;
>> + ? ? ? ? ? ? if (down_read_trylock(&sb->s_umount)) {
> ?What's exactly the deadlock trylock protects from here? Or is it just an
> optimization?

The trylock is an optimization Dave Chinner suggested. The first
version I wrote acquired the lock and then checked vfs_is_frozen().

This function and the similar following one each have one caller,
btrfs in one case and ext4 in the other, and they are both trying to
get more writes to go out to disk in the hope of freeing up disk
space.

>> + ? ? ? ? ? ? ? ? ? ? writeback_inodes_sb(sb);
>> + ? ? ? ? ? ? ? ? ? ? up_read(&sb->s_umount);
>> + ? ? ? ? ? ? ? ? ? ? return 1;
>> + ? ? ? ? ? ? }
>> + ? ? }
>> + ? ? return 0;
>> ?}
>> ?EXPORT_SYMBOL(writeback_inodes_sb_if_idle);
>>
>> ?/**
>> - * writeback_inodes_sb_if_idle ? ? ? - ? ? ? start writeback if none underway
>> + * writeback_inodes_sb_nr_if_idle ? ?- ? ? ? start writeback if none underway
>> ? * @sb: the superblock
>> ? * @nr: the number of pages to write
>> ? *
>> - * Invoke writeback_inodes_sb if no writeback is currently underway.
>> - * Returns 1 if writeback was started, 0 if not.
>> + * Invoke writeback_inodes_sb if no writeback is currently underway
>> + * and no one else holds the s_umount lock. ?Returns 1 if writeback
>> + * was started, 0 if not.
>> ? */
>> ?int writeback_inodes_sb_nr_if_idle(struct super_block *sb,
>> ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?unsigned long nr)
>> ?{
>> ? ? ? if (!writeback_in_progress(sb->s_bdi)) {
>> - ? ? ? ? ? ? down_read(&sb->s_umount);
>> - ? ? ? ? ? ? writeback_inodes_sb_nr(sb, nr);
>> - ? ? ? ? ? ? up_read(&sb->s_umount);
>> - ? ? ? ? ? ? return 1;
>> - ? ? } else
>> - ? ? ? ? ? ? return 0;
>> + ? ? ? ? ? ? if (down_read_trylock(&sb->s_umount)) {
> ?The same question here...
>
>> + ? ? ? ? ? ? ? ? ? ? writeback_inodes_sb_nr(sb, nr);
>> + ? ? ? ? ? ? ? ? ? ? up_read(&sb->s_umount);
>> + ? ? ? ? ? ? ? ? ? ? return 1;
>> + ? ? ? ? ? ? }
>> + ? ? }
>> + ? ? return 0;
>> ?}
>> ?EXPORT_SYMBOL(writeback_inodes_sb_nr_if_idle);
>>
>> diff --git a/fs/quota/quota.c b/fs/quota/quota.c
>> index b34bdb2..993ce22 100644
>> --- a/fs/quota/quota.c
>> +++ b/fs/quota/quota.c
>> @@ -366,6 +366,18 @@ SYSCALL_DEFINE4(quotactl, unsigned int, cmd, const char __user *, special,
>> ? ? ? if (IS_ERR(sb))
>> ? ? ? ? ? ? ? return PTR_ERR(sb);
>>
>> + ? ? /*
>> + ? ? ?* It's not clear which quota functions may write to the file
>> + ? ? ?* system (all?). ?Check for a frozen fs and bail out now.
>> + ? ? ?*/
>> + ? ? if (vfs_is_frozen(sb)) {
>> + ? ? ? ? ? ? drop_super(sb);
>> + ? ? ? ? ? ? /* XXX Should quotactl_block() error path do this too? */
> ?Yes, it does. Thanks for spotting this.

Fixed, thanks for confirming.

>> + ? ? ? ? ? ? if (pathp && !IS_ERR(pathp))
>> + ? ? ? ? ? ? ? ? ? ? path_put(pathp);
>> + ? ? ? ? ? ? return -EBUSY;
>> + ? ? }
>> +
>> ? ? ? ret = do_quotactl(sb, type, cmds, id, addr, pathp);
>>
>> ? ? ? drop_super(sb);
> ...
>> diff --git a/fs/sync.c b/fs/sync.c
>> index c98a747..db15b11 100644
>> --- a/fs/sync.c
>> +++ b/fs/sync.c
>> @@ -68,7 +68,7 @@ int sync_filesystem(struct super_block *sb)
>> ? ? ? /*
>> ? ? ? ?* No point in syncing out anything if the filesystem is read-only.
>> ? ? ? ?*/
>> - ? ? if (sb->s_flags & MS_RDONLY)
>> + ? ? if ((sb->s_flags & MS_RDONLY) && !vfs_is_frozen(sb))
> ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?^^^^
> The check should be: (sb->s_flags & MS_RDONLY || vfs_is_frozen(sb))

Fixed, and I reviewed the other checks for similar mistakes. Thanks!
I'll resend soon.

-VAL

2011-09-15 16:22:16

by Jan Kara

[permalink] [raw]
Subject: Re: [RFC PATCH 1/3] VFS: Fix s_umount thaw/write deadlock

On Wed 14-09-11 16:53:38, Valerie Aurora wrote:
> On Wed, Sep 14, 2011 at 7:00 AM, Jan Kara <[email protected]> wrote:
> > On Mon 12-09-11 19:57:11, Valerie Aurora wrote:
> >> diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
> >> index 04cf3b9..d1dca03 100644
> >> --- a/fs/fs-writeback.c
> >> +++ b/fs/fs-writeback.c
> >> @@ -537,6 +537,9 @@ static long writeback_sb_inodes(struct super_block *sb,
> >> ? ? ? long write_chunk;
> >> ? ? ? long wrote = 0; ?/* count both pages and inodes */
> >>
> >> + ? ? if (vfs_is_frozen(sb))
> >> + ? ? ? ? ? ? return 0;
> >> +
> > ?Umm, maybe we could make this more robust by skipping the superblock in
> > __writeback_inodes_wb() and just explicitely stopping the writeback when
> > work->sb is set (i.e. writeback is required only for frozen sb) in
> > wb_writeback()?
>
> Sorry, I don't quite understand what the goal is here? I'm happy to
> make the change, just want to make sure I'm accomplishing what you
> want.
My worry is, that when we just bail out from writeback_sb_inodes(), we
leave dirty inodes on b_io list. That is a unique behavior in writeback
code since all other places either put inode to b_more_io list for a quick
retry or redirty the inode to retry it later.

Emptyness of b_io list is used for example to detect whether we should
queue more inodes to b_io list and also busylooping logic kind of doesn't
count with inodes staying at b_io list. So although I don't see an
immediate problem with your solution it does add a new special case.

After some more thought I think the cleanest would be to just call
redirty_tail() if an inode is on frozen superblock in the beginning of the
loop in writeback_sb_inodes(). It won't be as efficient but much more in
line how the rest of the writeback code works.

> >> ? ? ? while (!list_empty(&wb->b_io)) {
> >> ? ? ? ? ? ? ? struct inode *inode = wb_inode(wb->b_io.prev);
> >>
> >> @@ -1238,39 +1241,43 @@ EXPORT_SYMBOL(writeback_inodes_sb);
> >> ? * writeback_inodes_sb_if_idle ? ? ? - ? ? ? start writeback if none underway
> >> ? * @sb: the superblock
> >> ? *
> >> - * Invoke writeback_inodes_sb if no writeback is currently underway.
> >> - * Returns 1 if writeback was started, 0 if not.
> >> + * Invoke writeback_inodes_sb if no writeback is currently underway
> >> + * and no one else holds the s_umount lock. ?Returns 1 if writeback
> >> + * was started, 0 if not.
> >> ? */
> >> ?int writeback_inodes_sb_if_idle(struct super_block *sb)
> >> ?{
> >> ? ? ? if (!writeback_in_progress(sb->s_bdi)) {
> >> - ? ? ? ? ? ? down_read(&sb->s_umount);
> >> - ? ? ? ? ? ? writeback_inodes_sb(sb);
> >> - ? ? ? ? ? ? up_read(&sb->s_umount);
> >> - ? ? ? ? ? ? return 1;
> >> - ? ? } else
> >> - ? ? ? ? ? ? return 0;
> >> + ? ? ? ? ? ? if (down_read_trylock(&sb->s_umount)) {
> > ?What's exactly the deadlock trylock protects from here? Or is it just an
> > optimization?
>
> The trylock is an optimization Dave Chinner suggested. The first
> version I wrote acquired the lock and then checked vfs_is_frozen().
>
> This function and the similar following one each have one caller,
> btrfs in one case and ext4 in the other, and they are both trying to
> get more writes to go out to disk in the hope of freeing up disk
> space.
Ah right. Maybe a short comment that it's just an optimization for
opportunictic writeout would be nice.

> >> + ? ? ? ? ? ? ? ? ? ? writeback_inodes_sb(sb);
> >> + ? ? ? ? ? ? ? ? ? ? up_read(&sb->s_umount);
> >> + ? ? ? ? ? ? ? ? ? ? return 1;
> >> + ? ? ? ? ? ? }
> >> + ? ? }
> >> + ? ? return 0;

Honza
--
Jan Kara <[email protected]>
SUSE Labs, CR

2011-09-18 23:25:36

by Dave Chinner

[permalink] [raw]
Subject: Re: [RFC PATCH 1/3] VFS: Fix s_umount thaw/write deadlock

On Wed, Sep 14, 2011 at 04:53:38PM -0700, Valerie Aurora wrote:
> On Wed, Sep 14, 2011 at 7:00 AM, Jan Kara <[email protected]> wrote:
> > On Mon 12-09-11 19:57:11, Valerie Aurora wrote:
> >> Val, if you are sending patches as attachments, make them at least
> >> text/plain please!
>
> Oops, sorry.
>
> >> diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
> >> index 04cf3b9..d1dca03 100644
> >> --- a/fs/fs-writeback.c
> >> +++ b/fs/fs-writeback.c
> >> @@ -537,6 +537,9 @@ static long writeback_sb_inodes(struct super_block *sb,
> >> ? ? ? long write_chunk;
> >> ? ? ? long wrote = 0; ?/* count both pages and inodes */
> >>
> >> + ? ? if (vfs_is_frozen(sb))
> >> + ? ? ? ? ? ? return 0;
> >> +
> > ?Umm, maybe we could make this more robust by skipping the superblock in
> > __writeback_inodes_wb() and just explicitely stopping the writeback when
> > work->sb is set (i.e. writeback is required only for frozen sb) in
> > wb_writeback()?
>
> Sorry, I don't quite understand what the goal is here? I'm happy to
> make the change, just want to make sure I'm accomplishing what you
> want.
>
> >> ? ? ? while (!list_empty(&wb->b_io)) {
> >> ? ? ? ? ? ? ? struct inode *inode = wb_inode(wb->b_io.prev);
> >>
> >> @@ -1238,39 +1241,43 @@ EXPORT_SYMBOL(writeback_inodes_sb);
> >> ? * writeback_inodes_sb_if_idle ? ? ? - ? ? ? start writeback if none underway
> >> ? * @sb: the superblock
> >> ? *
> >> - * Invoke writeback_inodes_sb if no writeback is currently underway.
> >> - * Returns 1 if writeback was started, 0 if not.
> >> + * Invoke writeback_inodes_sb if no writeback is currently underway
> >> + * and no one else holds the s_umount lock. ?Returns 1 if writeback
> >> + * was started, 0 if not.
> >> ? */
> >> ?int writeback_inodes_sb_if_idle(struct super_block *sb)
> >> ?{
> >> ? ? ? if (!writeback_in_progress(sb->s_bdi)) {
> >> - ? ? ? ? ? ? down_read(&sb->s_umount);
> >> - ? ? ? ? ? ? writeback_inodes_sb(sb);
> >> - ? ? ? ? ? ? up_read(&sb->s_umount);
> >> - ? ? ? ? ? ? return 1;
> >> - ? ? } else
> >> - ? ? ? ? ? ? return 0;
> >> + ? ? ? ? ? ? if (down_read_trylock(&sb->s_umount)) {
> > ?What's exactly the deadlock trylock protects from here? Or is it just an
> > optimization?
>
> The trylock is an optimization Dave Chinner suggested. The first
> version I wrote acquired the lock and then checked vfs_is_frozen().

It's not so much an optimisation, but the general case of avoiding
read-write deadlocks such that freezing can trigger. I think remount
can trigger the same deadlock as freezing, so the trylock avoids both
deadlock cases rather than just working around the freeze problem....

Cheers,

Dave.
--
Dave Chinner
[email protected]

2011-09-20 22:51:17

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [RFC PATCH 1/3] VFS: Fix s_umount thaw/write deadlock

On Mon, Sep 12, 2011 at 07:57:11PM -0700, Valerie Aurora wrote:
>

Can you please resend the series in proper format, that is inlined,
with the subject in the mail header and just the description and the
patch itself in the body?

The current form is not really reviewable.