Hello,

On Wed 11-01-12 08:45:17, Surbhi Palande wrote:
> Isn't dirty data flushed out in "ordered" mode? as
> ext4_jbd2_file_inode() will get called for ordered writes. Thus this
> inode's data is flushed at journal commit time through
> journal_submit_data_buffers()?
Well, not with delayed allocation and also not for example for xfs. So
in some special cases it might happen but we cannot really depend on it.

> However I do see that we will still have a dirty data problem for
> "writeback" and "journalled" mode?
For journalled mode, data is treated as metadata so it's the mode where
the problems are smallest (although we'd still have problems because even
though kjournald writes the data, it clears only buffer dirty bits but not
page dirty bits). For writeback mode you are correct.

Honza

> On Wed, Jan 11, 2012 at 4:10 AM, Jan Kara <[email protected]> wrote:
> > On Tue 10-01-12 21:38:29, Surbhi Palande wrote:
> >> On second thoughts, I fail to see why there is still a race window
> >> after this patch.
> >>
> >> Here are the reasons why i fail to see how the data can be dirtied
> >> when all the operations involve a journal:
> >>
> >> ----------
> >> So here is the problem that we see
> >>       CPU1                                       CPU2
> >>       Task1 (write operation)                    Task2
> >> ---------------------------------------------------------------------------------------
> >> t1    ext4_journal_start()
> >> t2      ext4_journal_start_sb()
> >> t3        vfs_check_frozen                       sb->frozen=SB_FREEZE_WRITE
> >> t4          jbd2_journal_start()                 /* hence forth all processes calling
> >>                                                     vfs_check_frozen will wait */
> >   Note that we call vfs_check_frozen(sb, SB_FREEZE_TRANS) in
> > ext4_journal_start_sb(). Thus we start blocking only when s_frozen ==
> > SB_FREEZE_TRANS and we just ignore s_frozen == SB_FREEZE_WRITE.
> >
> >> Now, our aim is to stop Task1 from dirtying the page cache ie in
> >> starting this transaction. However if it is successful in starting
> >> this transaction, then we want to make sure that this transaction is
> >> flushed out.
> >> Correct?
> >   Not quite. Flushing a journal will flush dirty metadata but we will still
> > have dirty pages (dirty data is not part of any transaction). So in the
> > scenario I describe in
> > http://marc.info/?l=linux-fsdevel&m=132585911925796&w=2
> > all metadata changes will be flushed inside ->freeze_fs (at least for
> > journalling filesystems) but pages will be left dirty. Is it clearer now?
> >
> > But your comment makes me realize that the situation is simpler than I
> > thought by the fact that we only have to protect paths that create dirty
> > data as dirty metadata can be handled by flushing a journal. And there are
> > only a few places creating dirty data. So a reasonably clean solution
> > shouldn't be that complicated after all. I'll tweak my patch and try it in
> > a moment.
> >
> >                                                                 Honza
> > --
> > Jan Kara <[email protected]>
> > SUSE Labs, CR
--
Jan Kara <[email protected]>
SUSE Labs, CR

2012-01-11 20:29:02

by Kamal Mostafa

Subject: Re: [PATCH v2 5/7] VFS: Avoid read-write deadlock in try_to_writeback_inodes_sb

On Fri, 2012-01-06 at 01:35 +0100, Jan Kara wrote:
> On Thu 08-12-11 10:04:35, Kamal Mostafa wrote:
> > From: Valerie Aurora <[email protected]>
> >
> > Use trylock in try_to_writeback_inodes_sb to avoid read-write
> > deadlocks that could be triggered by freeze.

> Christoph asked about what is the exact deadlock this patch tries to fix.
> I don't think you answered that. So can you elaborate please? Is it somehow
> connected with the fact that ext4 calls try_to_writeback_inodes_sb() with
> i_mutex held?
>
> Honza

This was discussed in the thread
http://www.spinics.net/lists/linux-fsdevel/msg48754.html
Summarizing...

Jan> What's exactly the deadlock trylock protects from here?
Jan> Or is it just an optimization?

Val> The trylock is an optimization Dave Chinner suggested. The first
Val> version I wrote acquired the lock and then checked vfs_is_frozen().

Dave> It's not so much an optimisation, but the general case of avoiding
Dave> read-write deadlocks such that freezing can trigger. I think remount
Dave> can trigger the same deadlock as freezing, so the trylock avoids
Dave> both deadlock cases rather than just working around the freeze
Dave> problem....
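
For readers following the summary, here is a minimal userspace sketch of the
control flow the patch introduces. It is not the kernel code: struct toy_rwsem,
struct toy_sb and every toy_* function are hypothetical stand-ins, and the
"writeback" is only a counter. It shows the return contract (1 if writeback was
started, 0 if not) once the blocking read lock is replaced by a trylock.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-ins for the kernel objects; all names here are hypothetical. */
struct toy_rwsem {
    int  readers;   /* number of read holders */
    bool writer;    /* true while held for write */
};

struct toy_sb {
    struct toy_rwsem s_umount;
    bool writeback_in_progress;
    unsigned long pages_written;    /* stand-in for real writeback work */
};

/* Non-blocking read acquire: fails instead of sleeping behind a writer. */
bool toy_down_read_trylock(struct toy_rwsem *sem)
{
    if (sem->writer)
        return false;
    sem->readers++;
    return true;
}

void toy_up_read(struct toy_rwsem *sem)
{
    assert(sem->readers > 0);
    sem->readers--;
}

/* Mirrors the patched control flow: 1 if writeback was started, 0 if not. */
int toy_try_to_writeback(struct toy_sb *sb, unsigned long nr)
{
    if (sb->writeback_in_progress)
        return 0;
    if (!toy_down_read_trylock(&sb->s_umount))
        return 0;               /* someone holds s_umount for write */
    sb->pages_written += nr;    /* placeholder for writeback_inodes_sb*() */
    toy_up_read(&sb->s_umount);
    return 1;
}
```

Calling toy_try_to_writeback() while s_umount.writer is set returns 0 at once
instead of sleeping, which is the behaviour change under discussion. The sketch
says nothing about what happens when the lock is free but the filesystem is
already frozen.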

-Kamal

> > BugLink: https://bugs.launchpad.net/bugs/897421
> > Signed-off-by: Valerie Aurora <[email protected]>
> > Cc: Kamal Mostafa <[email protected]>
> > Tested-by: Peter M. Petrakis <[email protected]>
> > [[email protected]: patch restructure]
> > Signed-off-by: Kamal Mostafa <[email protected]>
> > ---
> > fs/fs-writeback.c | 13 ++++++++-----
> > 1 files changed, 8 insertions(+), 5 deletions(-)
> >
> > diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
> > index ea89b3f..3a80f1b 100644
> > --- a/fs/fs-writeback.c
> > +++ b/fs/fs-writeback.c
> > @@ -1274,8 +1274,9 @@ EXPORT_SYMBOL(writeback_inodes_sb);
> > * try_to_writeback_inodes_sb - start writeback if none underway
> > * @sb: the superblock
> > *
> > - * Invoke writeback_inodes_sb if no writeback is currently underway.
> > - * Returns 1 if writeback was started, 0 if not.
> > + * Invoke writeback_inodes_sb if no writeback is currently underway
> > + * and no one else holds the s_umount lock. Returns 1 if writeback
> > + * was started, 0 if not.
> > */
> > int try_to_writeback_inodes_sb(struct super_block *sb, enum wb_reason reason)
> > {
> > @@ -1288,15 +1289,17 @@ EXPORT_SYMBOL(try_to_writeback_inodes_sb);
> > * @sb: the superblock
> > * @nr: the number of pages to write
> > *
> > - * Invoke writeback_inodes_sb if no writeback is currently underway.
> > - * Returns 1 if writeback was started, 0 if not.
> > + * Invoke writeback_inodes_sb if no writeback is currently underway
> > + * and no one else holds the s_umount lock. Returns 1 if writeback
> > + * was started, 0 if not.
> > */
> > int try_to_writeback_inodes_sb_nr(struct super_block *sb,
> > unsigned long nr,
> > enum wb_reason reason)
> > {
> > if (!writeback_in_progress(sb->s_bdi)) {
> > - down_read(&sb->s_umount);
> > + if (!down_read_trylock(&sb->s_umount))
> > + return 0;
> > if (nr == 0)
> > writeback_inodes_sb(sb, reason);
> > else
> > --
> > 1.7.5.4
> >

2012-01-12 15:54:32

by Mikulas Patocka

Subject: Re: [PATCH v2 5/7] VFS: Avoid read-write deadlock in try_to_writeback_inodes_sb

On Wed, 11 Jan 2012, Kamal Mostafa wrote:

> On Fri, 2012-01-06 at 01:35 +0100, Jan Kara wrote:
> > On Thu 08-12-11 10:04:35, Kamal Mostafa wrote:
> > > From: Valerie Aurora <[email protected]>
> > >
> > > Use trylock in try_to_writeback_inodes_sb to avoid read-write
> > > deadlocks that could be triggered by freeze.
>
> > Christoph asked about what is the exact deadlock this patch tries to fix.
> > I don't think you answered that. So can you elaborate please? Is it somehow
> > connected with the fact that ext4 calls try_to_writeback_inodes_sb() with
> > i_mutex held?
> >
> > Honza
>
> This was discussed in the thread
> http://www.spinics.net/lists/linux-fsdevel/msg48754.html
> Summarizing...
>
> Jan> What's exactly the deadlock trylock protects from here?
> Jan> Or is it just an optimization?
>
> Val> The trylock is an optimization Dave Chinner suggested. The first
> Val> version I wrote acquired the lock and then checked vfs_is_frozen().
>
> Dave> It's not so much an optimisation, but the general case of avoiding
> Dave> read-write deadlocks such that freezing can trigger. I think remount
> Dave> can trigger the same deadlock as freezing, so the trylock avoids
> Dave> both deadlock cases rather than just working around the freeze
> Dave> problem....
>
> -Kamal

As I wrote in
https://www.redhat.com/archives/dm-devel/2011-November/msg00151.html ,
down_read_trylock doesn't fix the freeze deadlock. Think of this sequence:

Process 1 (freezing)
down_write(&sb->s_umount);
set the filesystem to frozen state
up_write(&sb->s_umount);

Process 2 (executing the code from the patch)
down_read_trylock(&sb->s_umount); - succeeds, because s_umount is not held
writeback_inodes_sb(sb, reason); - waits, because the filesystem is frozen

Process 1 (unfreezing)
down_write(&sb->s_umount); - deadlock (process 1 is waiting for process 2
to drop the lock; process 2 is waiting for process 1 to unfreeze).

See the patch at
https://www.redhat.com/archives/dm-devel/2011-November/msg00151.html , it
takes a different approach and avoids the freeze deadlock described above.
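
The three-step sequence above can be replayed in a small single-threaded model.
This is only a sketch under stated assumptions: struct sim_sb and the sim_*
helpers are hypothetical, locks are plain counters, and "waiting" is recorded as
a boolean instead of actually blocking. It is meant to make the ordering
explicit, not to reproduce kernel behaviour.

```c
#include <assert.h>
#include <stdbool.h>

/* Single-threaded model of the interleaving; every name is hypothetical. */
struct sim_sb {
    int  umount_readers;    /* read holders of s_umount */
    bool umount_writer;     /* s_umount held for write */
    bool frozen;            /* filesystem is in the frozen state */
};

static bool sim_read_trylock(struct sim_sb *sb)
{
    if (sb->umount_writer)
        return false;
    sb->umount_readers++;
    return true;
}

static bool sim_write_trylock(struct sim_sb *sb)
{
    if (sb->umount_writer || sb->umount_readers > 0)
        return false;
    sb->umount_writer = true;
    return true;
}

/* Replays the sequence; true means both processes wait on each other. */
bool sim_freeze_then_trylock_deadlocks(void)
{
    struct sim_sb sb = { 0, false, false };

    /* Process 1 (freezing): takes s_umount, freezes, releases s_umount. */
    if (!sim_write_trylock(&sb))
        return false;
    sb.frozen = true;
    sb.umount_writer = false;

    /* Process 2: the trylock succeeds because nobody holds s_umount now. */
    bool p2_holds_read = sim_read_trylock(&sb);
    /* Its writeback then waits for the filesystem to be unfrozen. */
    bool p2_waits_for_thaw = p2_holds_read && sb.frozen;

    /* Process 1 (unfreezing): needs s_umount for write, reader still present. */
    bool p1_waits_for_p2 = !sim_write_trylock(&sb);

    return p2_waits_for_thaw && p1_waits_for_p2;
}
```

sim_freeze_then_trylock_deadlocks() reports true: the trylock only helps while
the freezer still holds s_umount for write, and does nothing once the lock has
been dropped with the filesystem left frozen.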

Mikulas

> > > BugLink: https://bugs.launchpad.net/bugs/897421
> > > Signed-off-by: Valerie Aurora <[email protected]>
> > > Cc: Kamal Mostafa <[email protected]>
> > > Tested-by: Peter M. Petrakis <[email protected]>
> > > [[email protected]: patch restructure]
> > > Signed-off-by: Kamal Mostafa <[email protected]>
> > > ---
> > > fs/fs-writeback.c | 13 ++++++++-----
> > > 1 files changed, 8 insertions(+), 5 deletions(-)
> > >
> > > diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
> > > index ea89b3f..3a80f1b 100644
> > > --- a/fs/fs-writeback.c
> > > +++ b/fs/fs-writeback.c
> > > @@ -1274,8 +1274,9 @@ EXPORT_SYMBOL(writeback_inodes_sb);
> > > * try_to_writeback_inodes_sb - start writeback if none underway
> > > * @sb: the superblock
> > > *
> > > - * Invoke writeback_inodes_sb if no writeback is currently underway.
> > > - * Returns 1 if writeback was started, 0 if not.
> > > + * Invoke writeback_inodes_sb if no writeback is currently underway
> > > + * and no one else holds the s_umount lock. Returns 1 if writeback
> > > + * was started, 0 if not.
> > > */
> > > int try_to_writeback_inodes_sb(struct super_block *sb, enum wb_reason reason)
> > > {
> > > @@ -1288,15 +1289,17 @@ EXPORT_SYMBOL(try_to_writeback_inodes_sb);
> > > * @sb: the superblock
> > > * @nr: the number of pages to write
> > > *
> > > - * Invoke writeback_inodes_sb if no writeback is currently underway.
> > > - * Returns 1 if writeback was started, 0 if not.
> > > + * Invoke writeback_inodes_sb if no writeback is currently underway
> > > + * and no one else holds the s_umount lock. Returns 1 if writeback
> > > + * was started, 0 if not.
> > > */
> > > int try_to_writeback_inodes_sb_nr(struct super_block *sb,
> > > unsigned long nr,
> > > enum wb_reason reason)
> > > {
> > > if (!writeback_in_progress(sb->s_bdi)) {
> > > - down_read(&sb->s_umount);
> > > + if (!down_read_trylock(&sb->s_umount))
> > > + return 0;
> > > if (nr == 0)
> > > writeback_inodes_sb(sb, reason);
> > > else
> > > --
> > > 1.7.5.4