2006-08-10 06:39:20

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH 1/5] Forking ext4 filesystem from ext3 filesystem

On Wed, 09 Aug 2006 18:17:02 -0700
Mingming Cao <[email protected]> wrote:

> Fork(copy) ext4 filesystem from ext3 filesystem. Rename all functions in ext4 from ext3_xxx() to ext4_xxx().

It would have been nice to spend a few hours cleaning up ext3 and JBD
before doing this. The code isn't toooo bad, but there are number of
coding style problems, whitespace screwups, incorrect comments, missing
comments, poorly-chosen variable names and all of that sort of thing.

One the fs has been copied-and-pasted, it's much harder to address these
things: either need to do it twice, or allow the filesystems to diverge, or
not do it.

Also, -mm presently has two patches pending against fs/jbd/ and nine pending
against fs/ext3/. We should get all those things merged before taking the
copy.

Also, JBD is presently feeding into submit_bh() buffer_heads which span two
machine pages, and some device drivers spit the dummy. It'd be better to
fix that once, rather than twice..


2006-08-10 16:42:05

by Mingming Cao

[permalink] [raw]
Subject: Re: [PATCH 1/5] Forking ext4 filesystem from ext3 filesystem

Andrew Morton wrote:

> On Wed, 09 Aug 2006 18:17:02 -0700
> Mingming Cao <[email protected]> wrote:
>
>
>>Fork(copy) ext4 filesystem from ext3 filesystem. Rename all functions in ext4 from ext3_xxx() to ext4_xxx().
>
>
> It would have been nice to spend a few hours cleaning up ext3 and JBD
> before doing this. The code isn't toooo bad, but there are number of
> coding style problems, whitespace screwups, incorrect comments, missing
> comments, poorly-chosen variable names and all of that sort of thing.
>
> One the fs has been copied-and-pasted, it's much harder to address these
> things: either need to do it twice, or allow the filesystems to diverge, or
> not do it.
>
Andrew, thanks for taking a close look this series of changes.

I agree with you that the timing is right, to do the clean up now rather
than later. I would give it a try. If I could get more help from more
code reviewer, it probably makes the effort a lot easier. For those
issues you pointed out : coding style problem??incorrect comments,
poorly-named variables -- do you have any specific examples in your mind?

> Also, -mm presently has two patches pending against fs/jbd/ and nine pending
> against fs/ext3/. We should get all those things merged before taking the
> copy.
>
So probably the right thing to do is keep the ext4 patches against mm
tree instead of rc three?

> Also, JBD is presently feeding into submit_bh() buffer_heads which span two
> machine pages, and some device drivers spit the dummy. It'd be better to
> fix that once, rather than twice..

Okay, I will look at it.


Thanks,
Mingming

2006-08-10 16:48:51

by Jörn Engel

[permalink] [raw]
Subject: Re: [PATCH 1/5] Forking ext4 filesystem from ext3 filesystem

On Thu, 10 August 2006 09:41:59 -0700, Mingming Cao wrote:
>
> I agree with you that the timing is right, to do the clean up now rather
> than later. I would give it a try. If I could get more help from more
> code reviewer, it probably makes the effort a lot easier. For those
> issues you pointed out : coding style problem??incorrect comments,
> poorly-named variables -- do you have any specific examples in your mind?

For whitespace damage, you can try the following regex:
/\s\+$\| \+\ze\t/

Or if you use vim as an editor, you can add this to your vimrc:
highlight RedundantSpaces ctermbg=red guibg=red
match RedundantSpaces /\s\+$\| \+\ze\t/

For example, it will show 4 cases of trailing whitespace in the quoted
part of your email above. :)

J?rn

--
You ain't got no problem, Jules. I'm on the motherfucker. Go back in
there, chill them niggers out and wait for the Wolf, who should be
coming directly.
-- Marsellus Wallace

2006-08-10 17:44:29

by Theodore Ts'o

[permalink] [raw]
Subject: Re: [Ext2-devel] [PATCH 1/5] Forking ext4 filesystem from ext3 filesystem

On Wed, Aug 09, 2006 at 11:39:14PM -0700, Andrew Morton wrote:
> One the fs has been copied-and-pasted, it's much harder to address these
> things: either need to do it twice, or allow the filesystems to diverge, or
> not do it.

> Also, JBD is presently feeding into submit_bh() buffer_heads which span two
> machine pages, and some device drivers spit the dummy. It'd be better to
> fix that once, rather than twice..

Well, it's harder no matter when we do it, right? Whether we do it
before we submit or after, we still before us have the choice of
whether to let the filesystems diverge (and make it harder to do
maintenance), or not. But Linus and others have spoken pretty clearly
that they don't want ext3 to be touched, and it's not even clear that
people would be happy with cleanups.

If we are going to do the cleanups in both, it would actually be
better to fix up ext3 *first*, and then do the copy...

- Ted

2006-08-10 18:22:39

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH 1/5] Forking ext4 filesystem from ext3 filesystem

On Thu, 10 Aug 2006 09:41:59 -0700
Mingming Cao <[email protected]> wrote:

> Andrew Morton wrote:
>
> > On Wed, 09 Aug 2006 18:17:02 -0700
> > Mingming Cao <[email protected]> wrote:
> >
> >
> >>Fork(copy) ext4 filesystem from ext3 filesystem. Rename all functions in ext4 from ext3_xxx() to ext4_xxx().
> >
> >
> > It would have been nice to spend a few hours cleaning up ext3 and JBD
> > before doing this. The code isn't toooo bad, but there are number of
> > coding style problems, whitespace screwups, incorrect comments, missing
> > comments, poorly-chosen variable names and all of that sort of thing.
> >
> > One the fs has been copied-and-pasted, it's much harder to address these
> > things: either need to do it twice, or allow the filesystems to diverge, or
> > not do it.
> >
> Andrew, thanks for taking a close look this series of changes.
>
> I agree with you that the timing is right, to do the clean up now rather
> than later. I would give it a try. If I could get more help from more
> code reviewer, it probably makes the effort a lot easier. For those
> issues you pointed out : coding style problem___incorrect comments,
> poorly-named variables -- do you have any specific examples in your mind?

Not really, apart from the few things I identified elsewhere (such as the
brelse thing).

It's just that now is the right time for a general spring-cleaning, if we
ever want to do that.

> > Also, -mm presently has two patches pending against fs/jbd/ and nine pending
> > against fs/ext3/. We should get all those things merged before taking the
> > copy.
> >
> So probably the right thing to do is keep the ext4 patches against mm
> tree instead of rc three?

That would drive everyone nuts, I think. What I would suggest is:

- get ext3 into a ready-to-copy state (merge bugfixes, spring-clean, etc)

- do the big copy-n-rename operation in Linus's tree.

- run a quilt or git tree against the resulting Linus tree.

2006-08-10 18:51:41

by Badari Pulavarty

[permalink] [raw]
Subject: Re: [PATCH 1/5] Forking ext4 filesystem from ext3 filesystem

Andrew Morton wrote:
> Also, JBD is presently feeding into submit_bh() buffer_heads which span two
> machine pages, and some device drivers spit the dummy. It'd be better to
> fix that once, rather than twice..
>
Andrew,

I looked at this few days ago. I am not sure how we end up having
multiple pages (especially,
why we end up having buffers with bh_size > pagesize) ? Do you know why ?

Easiest fix would be to fix submit_bh() to deal with multiple vecs -
which is vetoed by
Jens and I agree with him :(

Thanks,
Badari

2006-08-10 19:23:47

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH 1/5] Forking ext4 filesystem from ext3 filesystem

On Thu, 10 Aug 2006 11:51:34 -0700
Badari Pulavarty <[email protected]> wrote:

> Andrew Morton wrote:
> > Also, JBD is presently feeding into submit_bh() buffer_heads which span two
> > machine pages, and some device drivers spit the dummy. It'd be better to
> > fix that once, rather than twice..
> >
> Andrew,
>
> I looked at this few days ago. I am not sure how we end up having
> multiple pages (especially,
> why we end up having buffers with bh_size > pagesize) ? Do you know why ?
>

It's one or both of the jbd_kmalloc(bh->b_size) calls in
fs/jbd/transaction.c. Here we're allocating data to attach to a bh which
later gets fed into submit_bh().

Problem is, with CONFIG_DEBUG_SLAB=y, the data which kmalloc() returns can
be offset by 4 bytes due to redzoning.

Example: if the fs is using a 1k blocksize and we have a 4k pagesize, the
data coming back from kmalloc may have an address of 0xnnnnxc04, so the
data which we later feed into submit_bh() will span two pages.

A simple fix would be to replace kmalloc() with a call to alloc_page().
We'd need to work out how much memory that will worst-case-waste. If "not
much" then OK.

If "quite a lot in the worst case" then we'd need something more elaborate.
I'd suggest that ext3 implement ext3-private slab caches of size 1024,
2048, 4096 and perhaps 8192. Those caches should be kmem_cache_create()d
on-demand at mount-time. They should be created with appropriate slab
options to defeat the redzoning. The transaction.c code should use the
appropriate slab (based on b_size) rather than using kmalloc(). The
up-to-four private slab caches should be destroyed on ext3 rmmod.


2006-08-10 19:49:31

by Dave Kleikamp

[permalink] [raw]
Subject: Re: [Ext2-devel] [PATCH 1/5] Forking ext4 filesystem from ext3 filesystem

On Thu, 2006-08-10 at 12:23 -0700, Andrew Morton wrote:
> On Thu, 10 Aug 2006 11:51:34 -0700
> Badari Pulavarty <[email protected]> wrote:
>
> > Andrew Morton wrote:
> > > Also, JBD is presently feeding into submit_bh() buffer_heads which span two
> > > machine pages, and some device drivers spit the dummy. It'd be better to
> > > fix that once, rather than twice..
> > >
> > Andrew,
> >
> > I looked at this few days ago. I am not sure how we end up having
> > multiple pages (especially,
> > why we end up having buffers with bh_size > pagesize) ? Do you know why ?
> >
>
> It's one or both of the jbd_kmalloc(bh->b_size) calls in
> fs/jbd/transaction.c. Here we're allocating data to attach to a bh which
> later gets fed into submit_bh().
>
> Problem is, with CONFIG_DEBUG_SLAB=y, the data which kmalloc() returns can
> be offset by 4 bytes due to redzoning.
>
> Example: if the fs is using a 1k blocksize and we have a 4k pagesize, the
> data coming back from kmalloc may have an address of 0xnnnnxc04, so the
> data which we later feed into submit_bh() will span two pages.
>
> A simple fix would be to replace kmalloc() with a call to alloc_page().
> We'd need to work out how much memory that will worst-case-waste. If "not
> much" then OK.
>
> If "quite a lot in the worst case" then we'd need something more elaborate.

Would some like this be okay:

#ifdef CONFIG_DEBUG_SLAB
return alloc_page(...
#else
return kmalloc(...
#endif

This keeps it simple, and should be still be efficient in the
non-DEBUG_SLAB case.

> I'd suggest that ext3 implement ext3-private slab caches of size 1024,
> 2048, 4096 and perhaps 8192. Those caches should be kmem_cache_create()d
> on-demand at mount-time. They should be created with appropriate slab
> options to defeat the redzoning. The transaction.c code should use the
> appropriate slab (based on b_size) rather than using kmalloc(). The
> up-to-four private slab caches should be destroyed on ext3 rmmod.
--
David Kleikamp
IBM Linux Technology Center

2006-08-10 19:57:47

by Andrew Morton

[permalink] [raw]
Subject: Re: [Ext2-devel] [PATCH 1/5] Forking ext4 filesystem from ext3 filesystem

On Thu, 10 Aug 2006 14:36:10 -0500
Dave Kleikamp <[email protected]> wrote:

> On Thu, 2006-08-10 at 12:23 -0700, Andrew Morton wrote:
> > On Thu, 10 Aug 2006 11:51:34 -0700
> > Badari Pulavarty <[email protected]> wrote:
> >
> > > Andrew Morton wrote:
> > > > Also, JBD is presently feeding into submit_bh() buffer_heads which span two
> > > > machine pages, and some device drivers spit the dummy. It'd be better to
> > > > fix that once, rather than twice..
> > > >
> > > Andrew,
> > >
> > > I looked at this few days ago. I am not sure how we end up having
> > > multiple pages (especially,
> > > why we end up having buffers with bh_size > pagesize) ? Do you know why ?
> > >
> >
> > It's one or both of the jbd_kmalloc(bh->b_size) calls in
> > fs/jbd/transaction.c. Here we're allocating data to attach to a bh which
> > later gets fed into submit_bh().
> >
> > Problem is, with CONFIG_DEBUG_SLAB=y, the data which kmalloc() returns can
> > be offset by 4 bytes due to redzoning.
> >
> > Example: if the fs is using a 1k blocksize and we have a 4k pagesize, the
> > data coming back from kmalloc may have an address of 0xnnnnxc04, so the
> > data which we later feed into submit_bh() will span two pages.
> >
> > A simple fix would be to replace kmalloc() with a call to alloc_page().
> > We'd need to work out how much memory that will worst-case-waste. If "not
> > much" then OK.
> >
> > If "quite a lot in the worst case" then we'd need something more elaborate.
>
> Would some like this be okay:
>
> #ifdef CONFIG_DEBUG_SLAB
> return alloc_page(...
> #else
> return kmalloc(...
> #endif
>
> This keeps it simple, and should be still be efficient in the
> non-DEBUG_SLAB case.
>

I guess that would work OK. It does appear that a lot of people build and
distribute CONFIG_DEBUG_SLAB kernels though. Fedora, for one.

2006-08-10 20:23:06

by Jeff Garzik

[permalink] [raw]
Subject: Re: [PATCH 1/5] Forking ext4 filesystem from ext3 filesystem

Andrew Morton wrote:
> On Thu, 10 Aug 2006 09:41:59 -0700
> Mingming Cao <[email protected]> wrote:
>
>> Andrew Morton wrote:
>>
>>> On Wed, 09 Aug 2006 18:17:02 -0700
>>> Mingming Cao <[email protected]> wrote:
>>>
>>>
>>>> Fork(copy) ext4 filesystem from ext3 filesystem. Rename all functions in ext4 from ext3_xxx() to ext4_xxx().
>>>
>>> It would have been nice to spend a few hours cleaning up ext3 and JBD
>>> before doing this. The code isn't toooo bad, but there are number of
>>> coding style problems, whitespace screwups, incorrect comments, missing
>>> comments, poorly-chosen variable names and all of that sort of thing.
>>>
>>> One the fs has been copied-and-pasted, it's much harder to address these
>>> things: either need to do it twice, or allow the filesystems to diverge, or
>>> not do it.
>>>
>> Andrew, thanks for taking a close look this series of changes.
>>
>> I agree with you that the timing is right, to do the clean up now rather
>> than later. I would give it a try. If I could get more help from more
>> code reviewer, it probably makes the effort a lot easier. For those
>> issues you pointed out : coding style problem___incorrect comments,
>> poorly-named variables -- do you have any specific examples in your mind?
>
> Not really, apart from the few things I identified elsewhere (such as the
> brelse thing).
>
> It's just that now is the right time for a general spring-cleaning, if we
> ever want to do that.
>
>>> Also, -mm presently has two patches pending against fs/jbd/ and nine pending
>>> against fs/ext3/. We should get all those things merged before taking the
>>> copy.
>>>
>> So probably the right thing to do is keep the ext4 patches against mm
>> tree instead of rc three?
>
> That would drive everyone nuts, I think. What I would suggest is:
>
> - get ext3 into a ready-to-copy state (merge bugfixes, spring-clean, etc)

Presumably bug fixes should go in immediately, regardless of whether
it's before or after "cp -a ext3 ext4".

I strongly disagree that ext3 should be subject to a spring cleaning.
Comments, whitespace, very very minor things, sure. Trying to get rid
of brelse() when _many_ other filesystems also use it? ext4 material.

That detracts from the idea that its the stable counterpart to the devel
filesystem (ext4).

Jeff


2006-08-10 20:14:35

by Jeff Garzik

[permalink] [raw]
Subject: Re: [PATCH 1/5] Forking ext4 filesystem from ext3 filesystem

Badari Pulavarty wrote:
> Andrew Morton wrote:
>> Also, JBD is presently feeding into submit_bh() buffer_heads which
>> span two
>> machine pages, and some device drivers spit the dummy. It'd be better to
>> fix that once, rather than twice..
> Andrew,
>
> I looked at this few days ago. I am not sure how we end up having
> multiple pages (especially,
> why we end up having buffers with bh_size > pagesize) ? Do you know why ?
>
> Easiest fix would be to fix submit_bh() to deal with multiple vecs -
> which is vetoed by
> Jens and I agree with him :(

Yep. The sooner we kill buffer heads and use submit_bio(), the better :)

Jeff



2006-08-10 20:28:53

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH 1/5] Forking ext4 filesystem from ext3 filesystem

On Thu, 10 Aug 2006 16:13:33 -0400
Jeff Garzik <[email protected]> wrote:

> The sooner we kill buffer heads and use submit_bio(), the better :)

A buffer_head is a caching entity and a bio is an IO container. They're
quite separate concepts.

A buffer_head is the kernel's sole abstraction of a disk block.
Filesystems use disk blocks a lot, and they need such an abstraction.

If one was to replace buffer_heads with direct-to-BIO operations then the
filesytem would need to internally track the mapping from

page+offset+length -> disk block

and it would need to internally track the page+offset+length<->disk block
coherency state and it would need to internally perform serialisation of
access to each page+offset+length hunk of pagecache and etc and etc and
etc. Create a data structure with which to do all that and voila,
buffer_heads reinvented.

2006-08-10 20:34:37

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH 1/5] Forking ext4 filesystem from ext3 filesystem

On Thu, 10 Aug 2006 16:22:26 -0400
Jeff Garzik <[email protected]> wrote:

> I strongly disagree that ext3 should be subject to a spring cleaning.
> Comments, whitespace, very very minor things, sure. Trying to get rid
> of brelse() when _many_ other filesystems also use it? ext4 material.

We should seek to minimise the difficulty of cross-porting bugfixes and
enhancements. Putting cleanups in only ext4 works against that.

ext3 will be around for many years yet. We cannot just let it rot due to
some false belief that performing routine maintenance against it will for
some magical reason cause it to break.

2006-08-10 20:14:25

by Jeff Garzik

[permalink] [raw]
Subject: Re: [PATCH 1/5] Forking ext4 filesystem from ext3 filesystem

Andrew Morton wrote:
> On Thu, 10 Aug 2006 11:51:34 -0700
> Badari Pulavarty <[email protected]> wrote:
>
>> Andrew Morton wrote:
>>> Also, JBD is presently feeding into submit_bh() buffer_heads which span two
>>> machine pages, and some device drivers spit the dummy. It'd be better to
>>> fix that once, rather than twice..
>>>
>> Andrew,
>>
>> I looked at this few days ago. I am not sure how we end up having
>> multiple pages (especially,
>> why we end up having buffers with bh_size > pagesize) ? Do you know why ?
>>
>
> It's one or both of the jbd_kmalloc(bh->b_size) calls in
> fs/jbd/transaction.c. Here we're allocating data to attach to a bh which
> later gets fed into submit_bh().
[...]

I respectfully submit that this sort of urge to clean up all the
narstiness (a local term) in ext3 is _precisely_ the reason why we need
ext4 in the tree ASAP.

Once people start pushing a large volume of changes into ext[34], the
"obvious cleanups" start popping up. Look at the ratholes we are
_already_ diving down, in this thread, trying to clean up ext3 before
the copy.

ext4 will exist precisely to enable these sort of rapid cleanups. If
obvious stuff starts popping up, the cleanups can go in immediately
without worry of destabilization.

Just let ext3 sit. Other filesystems use submit_bh(), brelse(), etc.
If ext3 is to stop using those, let it be when others stop using them as
well.

ext4 is the devel tree. ext3 is the stable tree. Just go ahead and "cp
fs/ext3 fs/ext4" immediately, otherwise the cleanups will pile up, and
the branch will take place a year from now.

Jeff, still waiting for submit_bio() to be used in fs's :)



2006-08-10 20:52:55

by Jeff Garzik

[permalink] [raw]
Subject: Re: [PATCH 1/5] Forking ext4 filesystem from ext3 filesystem

Andrew Morton wrote:
> On Thu, 10 Aug 2006 16:22:26 -0400
> Jeff Garzik <[email protected]> wrote:
>
>> I strongly disagree that ext3 should be subject to a spring cleaning.
>> Comments, whitespace, very very minor things, sure. Trying to get rid
>> of brelse() when _many_ other filesystems also use it? ext4 material.
>
> We should seek to minimise the difficulty of cross-porting bugfixes and
> enhancements. Putting cleanups in only ext4 works against that.
>
> ext3 will be around for many years yet. We cannot just let it rot due to
> some false belief that performing routine maintenance against it will for
> some magical reason cause it to break.

Because ext4 is impending, you want to push a bunch of cleanups into
ext3 over a short span of time. That's not routine maintenance at all.
We're not talking about routine maintenance. In your words, we are
talking about spring cleaning.

Why not let the devel/stable system work its magic? If the cleanups are
viable, proving that first in ext4 should give us more confidence to put
them into ext3.

Cross-porting bugfixes and cleanups will _obviously_ be quite easy,
during the first few months of ext4's life.

Just look at ext2->ext3 history. Regardless of when you make the split,
there will be a bunch of stuff people wish to backport after the split
occurs. Given that, it makes more sense to testbed the changes in ext4
first.

Jeff


2006-08-10 21:01:04

by Jeff Garzik

[permalink] [raw]
Subject: Re: [PATCH 1/5] Forking ext4 filesystem from ext3 filesystem

Andrew Morton wrote:
> On Thu, 10 Aug 2006 16:13:33 -0400
> Jeff Garzik <[email protected]> wrote:
>
>> The sooner we kill buffer heads and use submit_bio(), the better :)
>
> A buffer_head is a caching entity and a bio is an IO container. They're
> quite separate concepts.

Yeah, sorry, I meant direct pagecache I/O. I forgot that bio doesn't
handle caching.


> A buffer_head is the kernel's sole abstraction of a disk block.
> Filesystems use disk blocks a lot, and they need such an abstraction.

IMO Al Viro work has shown that you can do pagecache I/O without needing
such a heavyweight system.

Jeff



2006-08-10 21:10:08

by Alex Tomas

[permalink] [raw]
Subject: Re: [Ext2-devel] [PATCH 1/5] Forking ext4 filesystem from ext3 filesystem

>>>>> Jeff Garzik (JG) writes:

>> A buffer_head is the kernel's sole abstraction of a disk block.
>> Filesystems use disk blocks a lot, and they need such an abstraction.

JG> IMO Al Viro work has shown that you can do pagecache I/O without needing
JG> such a heavyweight system.

in delayed allocation patch for ext3 (ontop of extents) we do use
bio's for data.

thanks, Alex

2006-08-10 22:21:13

by Andrew Morton

[permalink] [raw]
Subject: Re: [Ext2-devel] [PATCH 1/5] Forking ext4 filesystem from ext3 filesystem

On Fri, 11 Aug 2006 01:11:23 +0400
Alex Tomas <[email protected]> wrote:

> >> A buffer_head is the kernel's sole abstraction of a disk block.
> >> Filesystems use disk blocks a lot, and they need such an abstraction.
>
> JG> IMO Al Viro work has shown that you can do pagecache I/O without needing
> JG> such a heavyweight system.
>
> in delayed allocation patch for ext3 (ontop of extents) we do use
> bio's for data.

As does ext3 with `-o writeback,nobh'.