2008-03-06 07:18:15

by Andreas Dilger

Subject: Re: [RFC, PATCH 6/6] ext3: do not write to the disk when mounting a dirty read-only filesystem

On Mar 06, 2008 01:59 +0000, Duane Griffin wrote:
> NOTE: For now I'm simply preventing filesystems requiring recovery from being
> remounted read-write. This breaks booting with an uncleanly unmounted root
> filesystem!

I was going to ask about this - not being able to remount rw is a serious
problem because many users have only the root filesystem and this
limitation basically prevents this patch from being landable.

I'd suggest if the filesystem is going to be remounted read/write that
the journal mapping be discarded and the journal replayed. Depending
on how you do the mapping it may be necessary to invalidate all of the
pages in the cache so that they don't reference the blocks in the journal.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.



2008-03-06 11:19:06

by Duane Griffin

Subject: Re: [RFC, PATCH 6/6] ext3: do not write to the disk when mounting a dirty read-only filesystem

On 06/03/2008, Andreas Dilger wrote:
> On Mar 06, 2008 01:59 +0000, Duane Griffin wrote:
> > NOTE: For now I'm simply preventing filesystems requiring recovery from being
> > remounted read-write. This breaks booting with an uncleanly unmounted root
> > filesystem!
>
> I was going to ask about this - not being able to remount rw is a serious
> problem because many users have only the root filesystem and this
> limitation basically prevents this patch from being landable.

Yep, I agree. I wanted to post this as an RFC now to get feedback on
the overall approach. I'll try and get remount support in there ASAP
and post another version.

> I'd suggest if the filesystem is going to be remounted read/write that
> the journal mapping be discarded and the journal replayed. Depending
> on how you do the mapping it may be necessary to invalidate all of the
> pages in the cache so that they don't reference the blocks in the journal.

That was along the lines I was thinking of, too. Thanks for mentioning
invalidating the page cache -- I'll make sure that doesn't get
overlooked.

> Cheers, Andreas

Cheers,
Duane.

--
"I never could learn to drink that blood and call it wine" - Bob Dylan

2008-03-11 15:11:23

by Jan Kara

Subject: Re: [RFC, PATCH 6/6] ext3: do not write to the disk when mounting a dirty read-only filesystem

> On Mar 06, 2008 01:59 +0000, Duane Griffin wrote:
> > NOTE: For now I'm simply preventing filesystems requiring recovery from being
> > remounted read-write. This breaks booting with an uncleanly unmounted root
> > filesystem!
>
> I was going to ask about this - not being able to remount rw is a serious
> problem because many users have only the root filesystem and this
> limitation basically prevents this patch from being landable.
>
> I'd suggest if the filesystem is going to be remounted read/write that
> the journal mapping be discarded and the journal replayed. Depending
> on how you do the mapping it may be necessary to invalidate all of the
> pages in the cache so that they don't reference the blocks in the journal.

Actually, this is nastier than it looks - currently the fs asks
ext3_sb_getblk() for block 'a' and gets a buffer head with b_blocknr == 'b'
instead. So when remounting you'd have to rewrite these buffers with
the original block numbers, which is not really possible. So I think
remapping will have to be solved differently, e.g. by providing a buffer
head with the correct b_blocknr but taking care, when reading data into
it, to read it from elsewhere. Actually, this has to be done anyway
because JBD escapes data in the journal and you have to do unescaping
when reading data...
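
A minimal userspace sketch of the alternative described above, in which the buffer keeps the block number the caller asked for while its contents are read from the journal and unescaped. Everything here is hypothetical and for illustration only: `struct remap_entry`, `struct toy_buffer`, `remap_lookup()` and `toy_read_block()` are not JBD or ext3 interfaces, and the devices are plain byte arrays. The only detail taken from JBD is the assumption that an escaped block has its first four bytes (the big-endian journal magic 0xC03B3998) zeroed in the journal.

```c
#include <stdint.h>
#include <string.h>

#define TOY_BLOCK_SIZE 16   /* tiny blocks keep the sketch readable */

/* Journal magic number as it appears on disk (big-endian). */
static const unsigned char toy_jbd_magic_be[4] = { 0xC0, 0x3B, 0x39, 0x98 };

/* Hypothetical map entry: fs block has a newer copy in the journal. */
struct remap_entry {
    uint64_t fs_blocknr;
    uint64_t journal_blocknr;
    int escaped;            /* journal copy was logged with the escape flag */
};

/* Hypothetical stand-in for a buffer head. */
struct toy_buffer {
    uint64_t blocknr;       /* always the block the caller asked for */
    unsigned char data[TOY_BLOCK_SIZE];
};

static const struct remap_entry *
remap_lookup(const struct remap_entry *map, size_t n, uint64_t blocknr)
{
    for (size_t i = 0; i < n; i++)
        if (map[i].fs_blocknr == blocknr)
            return &map[i];
    return NULL;
}

/* Fill buf for fs block blocknr, sourcing data from the journal if remapped. */
static void
toy_read_block(struct toy_buffer *buf, uint64_t blocknr,
               const unsigned char *fs_dev, const unsigned char *journal_dev,
               const struct remap_entry *map, size_t n)
{
    const struct remap_entry *e = remap_lookup(map, n, blocknr);

    buf->blocknr = blocknr; /* never the journal block number */
    if (!e) {
        memcpy(buf->data, fs_dev + blocknr * TOY_BLOCK_SIZE, TOY_BLOCK_SIZE);
        return;
    }
    memcpy(buf->data, journal_dev + e->journal_blocknr * TOY_BLOCK_SIZE,
           TOY_BLOCK_SIZE);
    if (e->escaped)         /* unescape: put the magic number back */
        memcpy(buf->data, toy_jbd_magic_be, sizeof(toy_jbd_magic_be));
}
```

Because the buffer never carries the journal block number, nothing would need to be renumbered on a read-write remount in this model; the map could simply be dropped once the journal has been replayed.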


Honza
--
Jan Kara
SuSE CR Labs

2008-03-12 02:42:49

by Duane Griffin

Subject: Re: [RFC, PATCH 6/6] ext3: do not write to the disk when mounting a dirty read-only filesystem

On Tue, Mar 11, 2008 at 04:11:23PM +0100, Jan Kara wrote:
> Actually, this is nastier than it looks - currently the fs asks
> ext3_sb_getblk() for block 'a' and gets a buffer head with b_blocknr == 'b'
> instead.

Note that it will be a different device as well, in the case of an
external journal.

> So when remounting you'd have to rewrite these buffers with
> the original block numbers, which is not really possible. So I think
> remapping will have to be solved differently, e.g. by providing a buffer
> head with the correct b_blocknr but taking care, when reading data into
> it, to read it from elsewhere. Actually, this has to be done anyway
> because JBD escapes data in the journal and you have to do unescaping
> when reading data...

Hmm, I'll think about this and try to get something working. As a quick
proof-of-concept hack, getting both buffers then overwriting the fs
block's data with the unescaped journal data should do the trick, right?
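
The proof-of-concept idea above (take both buffers, then overwrite the fs block's data with the unescaped journal data) might look roughly like this in userspace. This is only a sketch: `toy_escape()` and `toy_overlay_from_journal()` are hypothetical names, the buffers are plain byte arrays rather than buffer heads, and the escaping rule (zero the first four bytes when they equal the big-endian journal magic 0xC03B3998, and remember that in a flag) is assumed to mirror what JBD does when logging a block.

```c
#include <stddef.h>
#include <string.h>

/* Journal magic number as it appears on disk (big-endian). */
static const unsigned char toy_magic_be[4] = { 0xC0, 0x3B, 0x39, 0x98 };

/*
 * Escape a block before it is stored in the journal: if it starts with the
 * journal magic, zero those bytes and report that the block was escaped.
 */
static int toy_escape(unsigned char *journal_data, size_t block_size)
{
    if (block_size < sizeof(toy_magic_be))
        return 0;
    if (memcmp(journal_data, toy_magic_be, sizeof(toy_magic_be)) != 0)
        return 0;
    memset(journal_data, 0, sizeof(toy_magic_be));
    return 1;
}

/*
 * Overwrite the fs block's data with the journal copy, undoing the escape
 * so the caller sees the block exactly as it was logged.
 */
static void toy_overlay_from_journal(unsigned char *fs_data,
                                     const unsigned char *journal_data,
                                     size_t block_size, int escaped)
{
    memcpy(fs_data, journal_data, block_size);
    if (escaped && block_size >= sizeof(toy_magic_be))
        memcpy(fs_data, toy_magic_be, sizeof(toy_magic_be));
}
```

The escape flag has to travel with the journal copy; the zeroed bytes alone cannot distinguish an escaped block from one that genuinely begins with zeros.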

Cheers,
Duane.

--
"I never could learn to drink that blood and call it wine" - Bob Dylan

2008-03-12 10:53:52

by Jan Kara

Subject: Re: [RFC, PATCH 6/6] ext3: do not write to the disk when mounting a dirty read-only filesystem

On Wed 12-03-08 02:42:46, Duane Griffin wrote:
> On Tue, Mar 11, 2008 at 04:11:23PM +0100, Jan Kara wrote:
> > Actually, this is nastier than it looks - currently the fs asks
> > ext3_sb_getblk() for block 'a' and gets a buffer head with b_blocknr == 'b'
> > instead.
>
> Note that it will be a different device as well, in the case of an
> external journal.
>
> > So when remounting you'd have to rewrite these buffers with
> > the original block numbers, which is not really possible. So I think
> > remapping will have to be solved differently, e.g. by providing a buffer
> > head with the correct b_blocknr but taking care, when reading data into
> > it, to read it from elsewhere. Actually, this has to be done anyway
> > because JBD escapes data in the journal and you have to do unescaping
> > when reading data...
>
> Hmm, I'll think about this and try to get something working. As a quick
> proof-of-concept hack, getting both buffers then overwriting the fs
> block's data with the unescaped journal data should do the trick, right?

Yes, it should, but you should watch out for callers that do things like:

getblk(a)
ll_rw_block(READ, a)

or even

getblk(a)
submit_bh(a)

I'm not sure if there are any in ext3/ext4 but it definitely needs
checking.
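
To make the concern concrete, here is a small userspace model of why such callers matter. It is a sketch under loud assumptions: `struct toy_bh`, `struct toy_fs`, `toy_getblk()`, `toy_raw_read()` and `toy_remap_read()` are hypothetical stand-ins, not kernel functions; `toy_raw_read()` only models the effect of a caller issuing the read itself (as with ll_rw_block() or submit_bh()), and escaping is left out for brevity.

```c
#include <stdint.h>
#include <string.h>

#define TOY_BLOCK_SIZE 8

/* Hypothetical stand-in for a buffer head. */
struct toy_bh {
    uint64_t b_blocknr;
    int uptodate;
    unsigned char b_data[TOY_BLOCK_SIZE];
};

/* Hypothetical fs state: a single fs block has a newer copy in the journal. */
struct toy_fs {
    const unsigned char *fs_dev;
    const unsigned char *journal_dev;
    uint64_t remapped_fs_block;
    uint64_t journal_block;
};

/* Like getblk(): hands back a buffer for the block, contents not yet read. */
static void toy_getblk(struct toy_bh *bh, uint64_t blocknr)
{
    memset(bh, 0, sizeof(*bh));
    bh->b_blocknr = blocknr;
}

/* Caller-issued read: goes straight to the fs device by b_blocknr. */
static void toy_raw_read(const struct toy_fs *fs, struct toy_bh *bh)
{
    memcpy(bh->b_data, fs->fs_dev + bh->b_blocknr * TOY_BLOCK_SIZE,
           TOY_BLOCK_SIZE);
    bh->uptodate = 1;
}

/* Remap-aware read: consults the journal mapping before touching the fs. */
static void toy_remap_read(const struct toy_fs *fs, struct toy_bh *bh)
{
    if (bh->b_blocknr != fs->remapped_fs_block) {
        toy_raw_read(fs, bh);
        return;
    }
    memcpy(bh->b_data, fs->journal_dev + fs->journal_block * TOY_BLOCK_SIZE,
           TOY_BLOCK_SIZE);
    bh->uptodate = 1;
}
```

In this model a caller that pairs `toy_getblk()` with `toy_raw_read()` ends up with the stale on-disk contents and a buffer marked up to date, which is the situation any such caller in ext3/ext4 would need to be checked for.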

Honza
--
Jan Kara
SUSE Labs, CR