LinuxLists.cc - [RFC] do you want jbd2 interface of ext3?

2010-02-16 08:15:49

Subject: [RFC] do you want jbd2 interface of ext3?

Hi.

I will try to change the journaling interface of ext3 from jbd into jbd2.

jbd2 has new features from jbd. For example, it includes the integrity
improvement features. The body of ext3 is already enough quality. If ext3
changes the journaling interface from jbd into jbd2, ext3 filesystem with jbd2
interface may get better integrity than with the jbd interface.
(jbd2 is aggressively being developed now, so I think we are glad if we can
get the effect of the development of jbd2 for ext3.)

And ext3 is as de facto standard filesystem, so jbd2 component will be used
by more people than now if ext3 has the jbd2 interface. If many people used
the jbd2 interface of ext3, the jbd2 component would get more chances to
improve the quality and performance and so on.

Besides, ext3 is now the only user of jbd.
(ocfs2 which was the user of jbd is now the user of jbd2.)

Do you want the jbd2 interface of ext3?
If you want the jbd2 interface, I will try to implement one.

Best regards,
Toshiyuki Okajima

2010-02-16 14:31:46

by Theodore Ts'o

Subject: Re: [RFC] do you want jbd2 interface of ext3?

On Tue, Feb 16, 2010 at 04:41:23PM +0900, Toshiyuki Okajima wrote:
>
> jbd2 has new features from jbd. For example, it includes the
> integrity improvement features. The body of ext3 is already enough
> quality. If ext3 changes the journaling interface from jbd into
> jbd2, ext3 filesystem with jbd2 interface may get better integrity
> than with the jbd interface. (jbd2 is aggressively being developed
> now, so I think we are glad if we can get the effect of the
> development of jbd2 for ext3.)
>
> And ext3 is as de facto standard filesystem, so jbd2 component will
> be used by more people than now if ext3 has the jbd2 interface. If
> many people used the jbd2 interface of ext3, the jbd2 component
> would get more chances to improve the quality and performance and so
> on.

Jbd2 is getting development attention because it is part of ext4. And you
don't get to use the data integrity features of jbd2 without
backporting required changes from ext4 to ext3. At which point, why
not have people use ext4?

Ext4 is format compatible with ext3, and with the proper kernel
configuration options, starting with 2.6.33, it's possible to
seamlessly allow people who use "mount -t ext3 /dev/sda1 /u1" to have
/dev/sda1 mounted using the ext4 file system driver. So we even have
a way that we can seamlessly upgrade existing userspace setups to
using ext4 without having to make any system configuration changes
(except installing a new kernel, of course).

The whole point of creating the ext3/ext4 fork was to not disturb ext3
users while ext4 was under development. This was done by effectively
putting ext3 into a bug-fix-only development mode. Changing ext3 so
it could use jbd2 would seem to violate the stability process that we
have made to the ext3 users; if people want new features and
performance improvements, they can use ext4.

Best regards,

- Ted

2010-02-16 18:54:44

by Jan Kara

Subject: Re: [RFC] do you want jbd2 interface of ext3?

Hello,

On Tue 16-02-10 16:41:23, Toshiyuki Okajima wrote:
> I will try to change the journaling interface of ext3 from jbd into jbd2.
>
> jbd2 has new features from jbd. For example, it includes the integrity
> improvement features. The body of ext3 is already enough quality. If ext3
> changes the journaling interface from jbd into jbd2, ext3 filesystem with jbd2
> interface may get better integrity than with the jbd interface.
> (jbd2 is aggressively being developed now, so I think we are glad if we can
> get the effect of the development of jbd2 for ext3.)
>
> And ext3 is as de facto standard filesystem, so jbd2 component will be used
> by more people than now if ext3 has the jbd2 interface. If many people used
> the jbd2 interface of ext3, the jbd2 component would get more chances to
> improve the quality and performance and so on.
>
> Besides, ext3 is now the only user of jbd.
> (ocfs2 which was the user of jbd is now the user of jbd2.)
>
> Do you want the jbd2 interface of ext3?
> If you want the jbd2 interface, I will try to implement one.
Yes, as Ted pointed out, the main reason why we have a separate codebase for
ext3 and ext4 and similarly jbd and jbd2 is that we didn't want the changes
in ext4/jbd2 to influence (and possibly destabilize) ext3 filesystem. So
switching ext3 to jbd2 would be directly against this logic...

Honza
--
Jan Kara <[email protected]>
SUSE Labs, CR

2010-02-17 08:36:09

by Toshiyuki Okajima

Subject: Re: [RFC] do you want jbd2 interface of ext3?

Hi, Ted and Jan!

[email protected] wrote:
> On Tue, Feb 16, 2010 at 04:41:23PM +0900, Toshiyuki Okajima wrote:
> > >
> > > jbd2 has new features from jbd. For example, it includes the
> > > integrity improvement features. The body of ext3 is already enough
> > > quality. If ext3 changes the journaling interface from jbd into
> > > jbd2, ext3 filesystem with jbd2 interface may get better integrity
> > > than with the jbd interface. (jbd2 is aggressively being developed
> > > now, so I think we are glad if we can get the effect of the
> > > development of jbd2 for ext3.)
> > >
> > > And ext3 is as de facto standard filesystem, so jbd2 component will
> > > be used by more people than now if ext3 has the jbd2 interface. If
> > > many people used the jbd2 interface of ext3, the jbd2 component
> > > would get more chances to improve the quality and performance and so
> > > on.
>
> Jbd2 is getting development attention because it is part of ext4. And you
> don't get to use the data integrity features of jbd2 without
OK. I understand.
(jbd2 is now developing.)

> backporting required changes from ext4 to ext3. At which point, why
> not have people use ext4?

The reason that I wanted to change the journaling interface into jbd2 were:
- the most of my customers use linux for Mission Critical (M.C.).
- M.C. users want the filesystems which have more integrity for their data.
- I think we should not recommend ext4 to M.C. users because
for M.C. users, ext4 is still unstable filesystem.
Therefore I want to let M.C. users use ext3 for the present.
- it is not easy to maintain both jbd and jbd2, so
I thought it was easy to solve it by unifying the journaling interfaces
into ext4.

>
> Ext4 is format compatible with ext3, and with the proper kernel
> configuration options, starting with 2.6.33, it's possible to
> seamlessly allow people who use "mount -t ext3 /dev/sda1 /u1" to have
> /dev/sda1 mounted using the ext4 file system driver. So we even have
> a way that we can seamlessly upgrade existing userspace setups to
> using ext4 without having to make any system configuration changes
> (except installing a new kernel, of course).
I know this feature.
But I wanted not to let M.C. users use it now because this feature is
based on ext4.

>
> The whole point of creating the ext3/ext4 fork was to not disturb ext3
> users while ext4 was under development. This was done by effectively
> putting ext3 into a bug-fix-only development mode. Changing ext3 so
> it could use jbd2 would seem to violate the stability process that we
> have made to the ext3 users; if people want new features and
> performance improvements, they can use ext4.

Jan Kara wrote:
> Hello,
>
> On Tue 16-02-10 16:41:23, Toshiyuki Okajima wrote:
> > > I will try to change the journaling interface of ext3 from jbd into jbd2.
> > >
> > > jbd2 has new features from jbd. For example, it includes the integrity
> > > improvement features. The body of ext3 is already enough quality. If ext3
> > > changes the journaling interface from jbd into jbd2, ext3 filesystem with jbd2
> > > interface may get better integrity than with the jbd interface.
> > > (jbd2 is aggressively being developed now, so I think we are glad if we can
> > > get the effect of the development of jbd2 for ext3.)
> > >
> > > And ext3 is as de facto standard filesystem, so jbd2 component will be used
> > > by more people than now if ext3 has the jbd2 interface. If many people used
> > > the jbd2 interface of ext3, the jbd2 component would get more chances to
> > > improve the quality and performance and so on.
> > >
> > > Besides, ext3 is now the only user of jbd.
> > > (ocfs2 which was the user of jbd is now the user of jbd2.)
> > >
> > > Do you want the jbd2 interface of ext3?
> > > If you want the jbd2 interface, I will try to implement one.
> Yes, as Ted pointed out, the main reason why we have a separate codebase for
> ext3 and ext4 and similarly jbd and jbd2 is that we didn't want the changes
> in ext4/jbd2 to influence (and possibly destabilize) ext3 filesystem. So
> switching ext3 to jbd2 would be directly against this logic...

OK. I see.
(ext3 is already stable filesystem, so, we should not change
ext3 drastically.)

Thanks,
Toshiyuki Okajima

2010-02-17 16:49:45

by Theodore Ts'o

Subject: Re: [RFC] do you want jbd2 interface of ext3?

On Wed, Feb 17, 2010 at 05:36:00PM +0900, Toshiyuki Okajima wrote:
>
> The reason that I wanted to change the journaling interface into jbd2 were:
> - the most of my customers use linux for Mission Critical (M.C.).
> - M.C. users want the filesystems which have more integrity for their data.
> - I think we should not recommend ext4 to M.C. users because
> for M.C. users, ext4 is still unstable filesystem.
> Therefore I want to let M.C. users use ext3 for the present.
> - it is not easy to maintain both jbd and jbd2, so
> I thought it was easy to solve it by unifying the journaling interfaces
> into ext4.

But if they are mission critical users, why would they be willing to
accept changes to the jbd2 layer, and the necessary changes to ext3 so
it can use jbd2? Any time you add changes, you will be causing a
certain amount of instability and risk. So the question is, what are
your users willing to tolerate?

Some important questions to ask:

1) Is the problem psychological? i.e., is the problem that it is
*called* ext4? After all, ext4 is derived from ext3, so if they are
willing to accept new features backported into ext3 (i.e., journal
checksums) and the risks associated with making changes to add new
features, why are they not willing to accept ext4?

2) If it is a question of risk, how many changes are they willing to
accept? I will note that if you don't enable extents, and disable
delayed allocation, you can significantly decrease the risk of using
ext4. (Essentially at that point the only major change is the block
allocation code and the changes to use jbd2.)

3) How much testing do you need to do before it would be considered
acceptable for your Mission Critical users? Or is it a matter of time
to allow other users to be the "guinea pigs"? :-)

> OK. I see.
> (ext3 is already stable filesystem, so, we should not change
> ext3 drastically.)

Well, certainly, *any* change is going to risk destabilizing the file
system. Isn't that the argument why you are concerned about whether
ext4 is ready for your M.C. users? One of the reasons why we forked
jbd2 from jbd was precisely because of these sorts of concerns. So
if you switch ext3 to use jbd2, would that not increase the risk to
your M.C. users?

Best regards,

- Ted

2010-02-17 18:09:36

by Greg Freemyer

Subject: Re: [RFC] do you want jbd2 interface of ext3?

On Wed, Feb 17, 2010 at 11:49 AM, <[email protected]> wrote:
> On Wed, Feb 17, 2010 at 05:36:00PM +0900, Toshiyuki Okajima wrote:
>>
>> The reason that I wanted to change the journaling interface into jbd2 were:
>> - the most of my customers use linux for Mission Critical (M.C.).
>> - M.C. users want the filesystems which have more integrity for their data.
>> - I think we should not recommend ext4 to M.C. users because
>>   for M.C. users, ext4 is still unstable filesystem.
>>   Therefore I want to let M.C. users use ext3 for the present.
>> - it is not easy to maintain both jbd and jbd2, so
>>   I thought it was easy to solve it by unifying the journaling interfaces
>>   into ext4.
>
> But if they are mission critical users, why would they be willing to
> accept changes to the jbd2 layer, and the necessary changes to ext3 so
> it can use jbd2? Any time you add changes, you will be causing a
> certain amount of instability and risk. So the question is, what are
> your users willing to tolerate?
>
> Some important questions to ask:
>
> 1) Is the problem psychological? i.e., is the problem that it is
> *called* ext4? After all, ext4 is derived from ext3, so if they are
> willing to accept new features backported into ext3 (i.e., journal
> checksums) and the risks associated with making changes to add new
> features, why are they not willing to accept ext4?
>
> 2) If it is a question of risk, how many changes are they willing to
> accept? I will note that if you don't enable extents, and disable
> delayed allocation, you can significantly decrease the risk of using
> ext4. (Essentially at that point the only major change is the block
> allocation code and the changes to use jbd2.)
>
> 3) How much testing do you need to do before it would be considered
> acceptable for your Mission Critical users? Or is it a matter of time
> to allow other users to be the "guinea pigs"? :-)
>
>> OK. I see.
>> (ext3 is already stable filesystem, so, we should not change
>> ext3 drastically.)
>
> Well, certainly, *any* change is going to risk destabilizing the file
> system. Isn't that the argument why you are concerned about whether
> ext4 is ready for your M.C. users? One of the reasons why we forked
> jbd2 from jbd was precisely because of these sorts of concerns. So
> if you switch ext3 to use jbd2, would that not increase the risk to
> your M.C. users?
>
> Best regards,
>
> - Ted

Having little knowledge on my part, it sounds like it is time to open
up jbd3 for leading edge development and turn jbd2 into the new stable
platform and move jbd into legacy status.

If it really does only have one user at this time jbd is basically
just legacy at this point anyway.

Greg

--
Greg Freemyer
Head of EDD Tape Extraction and Processing team
Litigation Triage Solutions Specialist
http://www.linkedin.com/in/gregfreemyer
Preservation and Forensic processing of Exchange Repositories White Paper -
<http://www.norcrossgroup.com/forms/whitepapers/tng_whitepaper_fpe.html>

The Norcross Group
The Intersection of Evidence & Technology
http://www.norcrossgroup.com

2010-02-17 19:16:43

by Theodore Ts'o

Subject: Re: [RFC] do you want jbd2 interface of ext3?

On Wed, Feb 17, 2010 at 01:09:30PM -0500, Greg Freemyer wrote:
>
> Having little knowledge on my part, it sounds like it is time to open
> up jbd3 for leading edge development and turn jbd2 into the new stable
> platform and move jbd into legacy status.

There is no "leading edge" development currently planned (nor has
there been any for at least six months or so) for the journaling
block layer.

> If it really does only have one user at this time jbd is basically
> just legacy at this point anyway.

The one user is "ext3", and for a long time jbd has only been used by
ext3. It's only recently that ocfs/ocfs2 started using jbd, and then
more recently they switched over to using jbd2.

I think people are fundamentally confused about the nature of jbd
versus ext3; jbd is a very fundamental part of ext3. Heck, jbd was
originally written *for* ext3. If jbd is "legacy", why not call ext3
"legacy"? And what the heck does "legacy" mean, anyway? If it means,
bug fixes only, that's the state of ext3 and jbd for the most part
anyway.

Yes, there have been a bunch of quota-related ext3 changes, but
they've pretty much all been bug fixes, mainly because more
people are caring about quota now that people are much more seriously
thinking about using quota as part of their virtualization
solution. These bugs have been around in quota for a long time, but
since time sharing is so last century, for a long time it wasn't getting
much attention.

- Ted

2010-02-22 05:44:35

by Toshiyuki Okajima

Subject: Re: [RFC] do you want jbd2 interface of ext3?

Hi Ted,
sorry for my late response.

[email protected] wrote:
> On Wed, Feb 17, 2010 at 05:36:00PM +0900, Toshiyuki Okajima wrote:
> > >
> > > The reason that I wanted to change the journaling interface into jbd2 were:
> > > - the most of my customers use linux for Mission Critical (M.C.).
> > > - M.C. users want the filesystems which have more integrity for their data.
> > > - I think we should not recommend ext4 to M.C. users because
> > > for M.C. users, ext4 is still unstable filesystem.
> > > Therefore I want to let M.C. users use ext3 for the present.
> > > - it is not easy to maintain both jbd and jbd2, so
> > > I thought it was easy to solve it by unifying the journaling interfaces
> > > into ext4.
>
> But if they are mission critical users, why would they be willing to
> accept changes to the jbd2 layer, and the necessary changes to ext3 so
> it can use jbd2? Any time you add changes, you will be causing a
> certain amount of instability and risk. So the question is, what are
> your users willing to tolerate?
>
> Some important questions to ask:
>
> 1) Is the problem psychological? i.e., is the problem that it is
> *called* ext4? After all, ext4 is derived from ext3, so if they are
> willing to accept new features backported into ext3 (i.e., journal
> checksums) and the risks associated with making changes to add new
> features, why are they not willing to accept ext4?
I guess some important basic functions, delayed allocation and quota
seems to be still unstable. At least, if these functions may work
incorrectly, M.C. users cannot use it.

> 2) If it is a question of risk, how many changes are they willing to
> accept? I will note that if you don't enable extents, and disable
> delayed allocation, you can significantly decrease the risk of using
> ext4. (Essentially at that point the only major change is the block
> allocation code and the changes to use jbd2.)
I think M.C. users don't accept ext4 easily because the basic body of ext4
is a little different from ext3 even if the features of delalloc and extents
are unused.
(For example, ext4 has the multi block allocation feature, but ext3 doesn't
have.)

---
Besides, even if we use ext3 and encounter some troubles by ext3/jbd module,
we can avoid these troubles by using ext2 module during repairing
these troubles. (Because ext3 filesystem can mount as ext2 filesystem by ext2
module.)
But even if we use ext4 with "extents" feature and encounter some troubles
by ext4/jbd2 module, we cannot avoid these troubles by ext2/ext3 modules
because ext3 (or ext2) cannot work "extents" feature. Therefore I think
M.C. users demand that the quality of ext4 is the same as ext3 level or
higher.
---

>
> 3) How much testing do you need to do before it would be considered
> acceptable for your Mission Critical users? Or is it a matter of time
> to allow other users to be the "guinea pigs"? :-)
>
I think I also have to test the ext4 features (delalloc, quota, mballoc
and so on).
It may cost about half a year or a year ...

> > > OK. I see.
> > > (ext3 is already stable filesystem, so, we should not change
> > > ext3 drastically.)
>
> Well, certainly, *any* change is going to risk destabilizing the file
> system. Isn't that the argument why you are concerned about whether
> ext4 is ready for your M.C. users? One of the reasons why we forked
> jbd2 from jbd was precisely because of these sorts of concerns. So
> if you switch ext3 to use jbd2, would that not increase the risk to
> your M.C. users?
Of course, M.C. users don't want unstable filesystems.
So, M.C. users don't use ext3 with jbd2 at once even if ext3 with jbd2
is achieved.
But they are sure to want filesystems which have enough integrity features.

Therefore, at first, I was going to try adding the jbd2 journaling interface
while leaving the jbd interface in place.
(But I think M.C. users want to use ext4 in the future because ext4 is:
- max filesystem size is much bigger than ext3
- I/O performance is superior to ext3 with delayed allocation and multi block
allocation.
- and other features which ext3 doesn't have fascinate M.C. users.)

By this change, I thought ext3 and ext4 would mutually achieve a good effect:
- [ext3] jbd2 component can get stable earlier because ext3 joins the jbd2
users, then developers of jbd2 increase.
- [ext3] as the result, if jbd2 becomes enough quality, M.C. users can use
jbd2 interface of ext3.
- [ext4] as the result, if jbd2 becomes enough quality, ext4 becomes better
quality.
- [ext4] at last, if ext4 becomes enough quality, M.C. users can use ext4.

But, I will stop trying to implement jbd2 interface of ext3
because I understand the policy which keeps ext3 stable with ext3 & jbd.

Best Regards,
Toshiyuki Okajima

2010-02-22 13:56:01

by Theodore Ts'o

Subject: Re: [RFC] do you want jbd2 interface of ext3?

On Feb 22, 2010, at 12:44 AM, Toshiyuki Okajima wrote:
> > 1) Is the problem psychological? i.e., is the problem that it is
> > *called* ext4? After all, ext4 is derived from ext3, so if they are
> > willing to accept new features backported into ext3 (i.e., journal
> > checksums) and the risks associated with making changes to add new
> > features, why are they not willing to accept ext4?
> I guess some important basic functions, delayed allocation and quota
> seems to be still unstable. At least, if these functions may work
> incorrectly, M.C. users cannot use it.

I haven't seen a bug reported with respect to delayed allocation in quite a while, actually. That code path is pretty well tested at this point. It's probably one of the more complicated paths, though, which is why if you wanted to be very paranoid, disabling is certainly a valid option. On the other hand, if you eventually want the performance features of delalloc, there's a question of how much testing do you want to do on interim measures --- but that question applies just as much to ext3 modified to use jbd2 as it does using ext4 with extents and delayed allocation disabled.

The main reason why people want to disable delayed allocation is because they have buggy applications which don't use fsync() but which depend on the data being written to disk after a crash. But that's an application issue, not a file system issue --- and I'll note that even with ext3, if you don't use fsync(), there is a chance you will lose data after a power failure. It's not a very large chance, granted --- but the premise of this discussion is that even a small chance of failure is unacceptable for mission critical systems. So I would argue that if application data is *reliably* lost after a power failure, this is actually a good thing in terms of exposing and then fixing application bugs. After all, if there is only a 1% chance of losing data on a buggy, non-fsync()'ing application, that might be OK for desktop users but not for M.C. users --- but trying to find those application bugs when they only result in data loss 1% of the time is very, very difficult. Better to have a system which is much higher performance, but which requires applications to actually do the right thing and use fsync() when they really care about data hitting the disk --- and then doing exhaustive power fail testing of the entire mission critical software stack, and fixing those application bugs.
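A minimal sketch of the application-side discipline described above, written in Python for brevity. The helper name durable_write and the write-to-a-temporary-file-then-rename pattern are illustrative assumptions, not something proposed in this thread; os.fsync() is the standard library wrapper around fsync(2). The directory fsync at the end assumes a POSIX system such as Linux.

```python
import os
import tempfile


def durable_write(path, data):
    """Replace the file at `path` with `data`, forcing it to stable storage.

    Hypothetical helper for illustration only.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # Write to a temporary file in the same directory so the final
    # rename stays within one file system and is atomic.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()              # userspace buffer -> kernel page cache
            os.fsync(f.fileno())   # kernel page cache -> disk
        os.replace(tmp_path, path)  # atomically swap in the new contents
    except BaseException:
        # Clean up the temporary file if anything failed before the rename.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
    # fsync the containing directory so the rename itself is on disk.
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

An application that only calls write() and close() is relying on the file system's writeback timing; one that goes through something like the helper above does not depend on whether delayed allocation is enabled.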

As for quota --- quite seriously --- if you have mission critical users, I'd suggest that they not use quota. Dmitry has been turning up all sorts of bugs in the quota subsystem, many of which are just as applicable to ext3. The real issue is that quota hasn't received as much testing as other file system features --- in any file system, not just ext4.

> Besides, even if we use ext3 and encounter some troubles by ext3/jbd module,
> we can avoid these troubles by using ext2 module during repairing
> these troubles. (Because ext3 filesystem can mount as ext2 filesystem by ext2
> module.)
> But even if we use ext4 with "extents" feature and encounter some troubles
> by ext4/jbd2 module, we cannot avoid these troubles by ext2/ext3 modules
> because ext3 (or ext2) cannot work "extents" feature. Therefore I think
> M.C. users demand that the quality of ext4 is the same as ext3 level or
> higher.

Again, your customers don't have to use extents if they care so much about being able to fall back to ext2. I'm not sure I understand the thinking behind needing to use the ext2 module while repairing problems. If there are file system corruption issues, e2fsck is used to fix the file system consistency issues --- and e2fsck is used to repair ext2, ext3, and ext4 file system issues. Is the concern the hypothetical one of a file system bug which is uncovered which is so terrible that there is a need to completely change the code base to use ext2 while the file system bug in ext4 is repaired? (That is, the concern being over a bug in the file system code, as opposed to a file system corruption issue?)

That seems to be a little far-fetched, since what if the bug is in the generic VM layer, or in a block device driver? Requiring the ability to use an alternate code implementation in case of a failure seems like a very stringent requirement that can't be met in general for most kernel subsystems. Why is the file system singled out as needing this requirement? Also, how big are the disk images used for most mission critical systems? Most of the ones I can think of which are this highly mission critical --- and which can't be addressed by using multiple systems with high availability fallback schemes --- tend to be relatively small, embedded devices (i.e., medical devices and avionics systems), with at best a gigabyte or so of storage. In which case, the amount of effort needed to do a dump, reformat, and restore shouldn't be that big.

> > 3) How much testing do you need to do before it would be considered
> > acceptable for your Mission Critical users? Or is it a matter of time
> > to allow other users to be the "guinea pigs"? :-)
> >
> I think I also have to test the ext4 features (delalloc, quota, mballoc
> and so on).
> It may cost about half a year or a year ...

So let me ask you this --- how much testing do you think it would take before you were confident that ext3+jbd2 combination would be stable? And do you have a specific test suite in mind? (And is that something that can be shared so the rest of the community can help with the testing?) How does that compare with the six month effort that you have estimated?

I will note that in general it's not the amount of features that determine the amount of testing required (although it could make a huge difference in terms of fixing bugs that are found), but rather the combinatorics in terms of the set of options which you need to test. So if you need to test extents vs. extents disabled, delalloc vs. non-delalloc, etc., that's what causes the test matrix to become very large. But in the case of testing for mission critical systems, you don't have to test all of the options. In fact, you may be able to get away with only testing one configuration, or maybe only 2-3 combinations, depending on your customers' requirements. (I doubt, for example, that you did a full exhaustive testing with ext3 and bh vs nobh, and so on.)
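To make the combinatorics concrete, here is a small sketch in Python. The option names and values are illustrative assumptions (the thread names extents, delalloc and bh/nobh; data_mode and quota are added here only as examples), and config_matrix is a hypothetical helper, not an existing test tool.

```python
from itertools import product


def config_matrix(options):
    """Return every combination of option settings as a list of dicts."""
    names = list(options)
    value_lists = [options[name] for name in names]
    return [dict(zip(names, values)) for values in product(*value_lists)]


# Testing every setting of every option multiplies out quickly.
full = {
    "extents": [True, False],
    "delalloc": [True, False],
    "data_mode": ["ordered", "writeback", "journal"],
    "quota": [True, False],
}

# Pinning most options to the one value a customer actually deploys
# collapses the matrix to a handful of configurations.
pinned = {
    "extents": [False],
    "delalloc": [False],
    "data_mode": ["ordered"],
    "quota": [True, False],
}

print(len(config_matrix(full)))    # 2 * 2 * 3 * 2 = 24
print(len(config_matrix(pinned)))  # 1 * 1 * 1 * 2 = 2
```

The number of configurations is the product of the number of values per option, so each option that can be fixed in advance divides the testing effort by that option's value count.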

Best regards,

-- Ted

2010-02-22 18:02:39

by Jan Kara

Subject: Re: [RFC] do you want jbd2 interface of ext3?

On Mon 22-02-10 08:55:53, Theodore Tso wrote:
> As for quota --- quite seriously --- if you have mission critical users,
> I'd suggest that they not use quota. Dmitry has been turning up all
> sorts of bugs in the quota subsystem, many of which are just as
> applicable to ext3. The real issue is that quota hasn't received as much
> testing as other file system features --- in any file system, not just
> ext4.
I don't agree with this. I know about quite a few large customers
depending on quotas on their servers and they run on ext3 / reiserfs quite
happily. Dmitry's patches touching the generic code were mostly cleanups,
the fixes were just in the delayed allocation handling but that never
gets executed for ext3 or reiserfs...
I don't say there cannot be bugs and certainly quota code has less
exposure than other more used filesystem parts. But I don't know about
any serious quota issue on ext3 / reiserfs in last two years or so
(except the one that was caused by Dmitry's fixes ;).

Honza

--
Jan Kara <[email protected]>
SUSE Labs, CR

2010-02-22 18:57:27

by Dmitry Monakhov

Subject: Re: [RFC] do you want jbd2 interface of ext3?

Jan Kara <[email protected]> writes:

> On Mon 22-02-10 08:55:53, Theodore Tso wrote:
>> As for quota --- quite seriously --- if you have mission critical users,
>> I'd suggest that they not use quota. Dmitry has been turning up all
>> sorts of bugs in the quota subsystem, many of which are just as
>> applicable to ext3. The real issue is that quota hasn't received as much
>> testing as other file system features --- in any file system, not just
>> ext4.
> I don't agree with this. I know about quite a few large customers
> depending on quotas on their servers and they run on ext3 / reiserfs quite
> happily. Dmitry's patches touching the generic code were mostly cleanups,
> the fixes were just in the delayed allocation handling but that never
> gets executed for ext3 or reiserfs...
Stability is a relative thing. It quite depends on the use case.
For example, after triggering a bug with a non-empty orphan list in ext3_umount
I started a full revision of the orphan-list management code. And both
ext3/ext4 appear to be almost broken in case of errors.
But nobody seems to have ever caught it in real life. Still,
at that time I triggered:
1) non empty orphan list on umount for both (ext3 and ext4)
2) on_disk linked list corruption for both
3) data blocks beyond i_size
4) bit-difference on fsck for both
Currently I'm working on fixes. It will take a week or so.
So at least i'll reduce "project_id quota" spam flow a bit.
> I don't say there cannot be bugs and certainly quota code has less
> exposure than other more used filesystem parts. But I don't know about
> any serious quota issue on ext3 / reiserfs in last two years or so
> (except the one that was caused by Dmitry's fixes ;).
This time i'll try to give enough test coverage.
>
> Honza

2010-02-26 08:04:15

by Toshiyuki Okajima

Subject: Re: [RFC] do you want jbd2 interface of ext3?

Hi Ted,

On Mon, 22 Feb 2010 08:55:53 -0500
Theodore Tso <[email protected]> wrote:
> On Feb 22, 2010, at 12:44 AM, Toshiyuki Okajima wrote:
> >> > > 1) Is the problem psychological? i.e., is the problem that it is
> >> > > *called* ext4? After all, ext4 is derived from ext3, so if they are
> >> > > willing to accept new features backported into ext3 (i.e., journal
> >> > > checksums) and the risks associated with making changes to add new
> >> > > features, why are they not willing to accept ext4?
> > > I guess some important basic functions, delayed allocation and quota
> > > seems to be still unstable. At least, if these functions may work
> > > incorrectly, M.C. users cannot use it.
>
> I haven't seen a bug reported with respect to delayed allocation in quite a
> while, actually. That code path is pretty well tested at this point.
> It's probably one of the more complicated paths, though, which is why if you
> wanted to be very paranoid, disabling is certainly a valid option. On the
> other hand, if you eventually want the performance features of delalloc,
> there's a question of how much testing do you want to do on interim measures
> --- but that question applies just as much to ext3 modified to use jbd2 as it
> does using ext4 with extents and delayed allocation disabled.
>
> The main reason why people want to disable delayed allocation is because
> they have buggy applications which don't use fsync() but which depend on
> the data being written to disk after a crash. But that's an application
> issue, not a file system issue --- and I'll note that even with ext3, if
> you don't use fsync(), there is a chance you will lose data after a power
> failure. It's not a very large chance, granted --- but the premise of this
> discussion is that even a small chance of failure is unacceptable for mission
> critical systems. So I would argue that if application data is *reliably*
> lost after a power failure, this is actually a good thing in terms of
> exposing and then fixing application bugs. After all, if there is only a
> 1% chance of losing data on a buggy, non-fsync()'ing application, that might
> be OK for desktop users but not for M.C. users --- but trying to find those
> application bugs when they only result in data loss 1% of the time is very,
> very difficult. Better to have a system which is much higher performance,
> but which requires applications to actually do the right thing and use
> fsync() when they really care about data hitting the disk --- and then doing
> exhaustive power fail testing of the entire mission critical software stack,
> and fixing those application bugs.
>
> As for quota --- quite seriously --- if you have mission critical users, I'd
> suggest that they not use quota. Dmitry has been turning up all sorts of
> bugs in the quota subsystem, many of which are just as applicable to ext3.
> The real issue is that quota hasn't received as much testing as other file
> system features --- in any file system, not just ext4.
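
To illustrate the fsync() point in the quoted text, here is a minimal sketch,
in Python, of how an application might make a write durable before relying on
it. The helper name durable_write and the write-to-temporary-then-rename
pattern are assumptions chosen for illustration; nothing in this thread
prescribes them. Only os.fsync(), os.replace() and tempfile.mkstemp() are real
library calls, and calling fsync on a directory descriptor is assumed to
behave as it does on Linux.

```python
import os
import tempfile


def durable_write(path, data):
    """Hypothetical helper: replace `path` with `data` (bytes) durably."""
    directory = os.path.dirname(os.path.abspath(path))
    # Write to a temporary file in the same directory so the final
    # rename stays within one file system.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()             # move Python's buffer into the kernel
            os.fsync(f.fileno())  # ask the kernel to write it to the disk
        os.replace(tmp_path, path)  # atomically swap in the new contents
    except BaseException:
        # Clean up the temporary file if anything failed before the rename.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
    # fsync the directory too, so the rename itself is on disk.
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

An application that skips the two os.fsync() calls may still appear to work on
ext3 most of the time, which is exactly why such bugs are hard to find without
power fail testing.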

First of all, my earlier wording seems to have caused a misunderstanding,
so let me correct it:
By "delayed allocation and quota seems to be still unstable" I meant
"problems may occur when delayed allocation and quota are used
together".

I understand the fsync() problem: if applications are not implemented
correctly, data can be lost after a crash. But I had not thought deeply about
changing the journaling interface of ext3. I thought that it would be easier
to maintain only jbd2 than both jbd and jbd2, and that we could bring the
integrity features of jbd2 to ext3 by changing the journaling interface.
Besides, I thought that moving to jbd2 would be very easy, because the body of
ext3 (without jbd) has been tested enough and jbd2 is almost the same as jbd
(jbd2 was derived from jbd).
So I proposed changing the journaling interface of ext3 because I thought the
risk of introducing problems was low.

However, now that I understand the policy of keeping ext3 stable as
ext3 & jbd, I see that my proposal to change the journaling interface is
meaningless.

BTW, the main reason why I don't recommend ext4 to my users for now is that
we cannot roll back from ext4 (+extents) to ext3.

Though the quality of ext4 is improving day by day, I think it has not yet
reached that of ext3.
(Using delalloc and quota together is still unstable.)

So, once all the ext4 problems that we currently know of are solved, I will
consider letting my customers use ext4.

> > > Besides, even if we use ext3 and encounter some troubles by ext3/jbd module,
> > > we can avoid these troubles by using ext2 module during repairing
> > > these troubles. (Because ext3 filesystem can mount as ext2 filesystem by ext2
> > > module.)
> > > But even if we use ext4 with "extents" feature and encounter some troubles
> > > by ext4/jbd2 module, we cannot avoid these troubles by ext2/ext3 modules
> > > because ext3 (or ext2) cannot work "extents" feature. Therefore I think
> > > M.C. users demand that the quality of ext4 is the same as ext3 level or
> > > higher.
>
> Again, your customers don't have to use extents if they care so much about
> being able to fall back to ext2. I'm not sure I understand the thinking
> behind needing to use the ext2 module while repairing problems. If there are
> file system corruption issues, e2fsck is used to fix the file system
> consistency issues --- and e2fsck is used to repair ext2, ext3, and ext4 file
> system issues. Is the concern the hypothetical one of a file system bug
> which is uncovered which is so terrible that there is a need to completely
> change the code base to use ext2 while the file system bug in ext4 is
> repaired? (That is, the concern being over a bug in the file system code,
> as opposed to a file system corruption issue?)
>
> That seems to be a little far-fetched, since what if the bug is in the
> generic VM layer, or in a block device driver? Requiring the ability to use
> an alternate code implementation in case of a failure seems like a very
> stringent requirement that can't be met in general for most kernel
> subsystems. Why is the file system singled out as needing this requirement?
> Also, how big are the disk images used for most mission critical systems?
> Most of the ones I can think of which are this highly mission critical
> --- and which can't be addressed by using multiple systems with high
> availability fallback schemes --- tend to be relatively small, embedded
> devices (i.e., medical devices and avionics systems), with at best a gigabyte
> or so of storage. In which case, the amount of effort needed to do a dump,
> reformat, and restore shouldn't be that big.
The problem I mentioned concerns not the media (storage) but the code path.
M.C. users tend to keep working in the original state if possible, because
they dislike long system downtime. Therefore they don't like the
backup & restore (+mkfs) operation.
------ steps for backup & restore (+mkfs)
(1) the system goes down
(2) restart the system in single-user mode (=> all services are stopped)
(3) dump the ext4 files from the device that caused the system down to other storage
(4) run mkfs.ext3 on that device
(5) rewrite /etc/fstab
(6) restore from the storage to ext3
(7) restart the system
------
Therefore, at the very least, they ask for workarounds that take effect
immediately.

> >> > > 3) How much testing do you need to do before it would be considered
> >> > > acceptable for your Mission Critical users? Or is it a matter of time
> >> > > to allow other users to be the "guinea pigs"? :-)
> >> > >
> > > I think I also have to test the ext4 features (delalloc, quota, mballoc
> > > and so on).
> > > It may cost about half a year or a year ...
>
> So let me ask you this --- how much testing do you think it would take before
> you were confident that ext3+jbd2 combination would be stable? And do you
> have a specific test suite in mind? (And is that something that can be
> shared so the rest of the community can help with the testing?) How does
> that compare with the six month effort that you have estimated?
>
> I will note that in general it's not the amount of features that determine
> the amount of testing required (although it could make a huge difference in
> terms of fixing bugs that are found), but rather the combinatorics in terms
> of the set of options which you need to test. So if you need to test
> extents vs. extents disabled, delalloc vs. non-delalloc, etc., that's what
> causes the test matrix to become very large. But in the case of testing for
> mission critical systems, you don't have to test all of the options.
> In fact, you may be able to get away with only testing one configuration, or
> maybe only 2-3 combinations, depending on your customers' requirements. (I
> doubt, for example, that you did a full exhaustive testing with ext3 and bh
> vs nobh, and so on.)
I'm sorry. The period I indicated earlier was only a feeling; I have no
theoretical grounds for it.
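
The quoted point about test combinatorics can be sketched as follows. This is
only an illustration in Python: the option names are labels taken loosely from
this thread (they are not exact mount option strings), and the function
configurations is a hypothetical helper. It shows how pinning some options to
the single setting a customer actually uses shrinks the test matrix.

```python
from itertools import product

# Illustrative on/off features; the names are labels, not mount options.
OPTIONS = {
    "extents": (True, False),
    "delalloc": (True, False),
    "quota": (True, False),
    "journal_checksum": (True, False),
}


def configurations(options, pinned=None):
    """Yield every configuration, holding `pinned` options at fixed values."""
    pinned = pinned or {}
    # Only options that are not pinned contribute to the combinatorics.
    free = {name: vals for name, vals in options.items() if name not in pinned}
    names = list(free)
    for values in product(*(free[name] for name in names)):
        config = dict(pinned)
        config.update(zip(names, values))
        yield config


# Every option varies: 2 * 2 * 2 * 2 = 16 configurations.
full_matrix = list(configurations(OPTIONS))

# Extents and delalloc fixed to "disabled": only 2 * 2 = 4 remain.
reduced_matrix = list(
    configurations(OPTIONS, pinned={"extents": False, "delalloc": False})
)
```

Each additional on/off option doubles the full matrix, while each pinned
option halves it again.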

I think the quality of the ext3 + jbd2 combination depends on the quality of
the implementation of ext3's journaling interface.
So I thought it would be enough to raise the quality of the journaling
interface, and that I could improve the quality of ext3+jbd2 much more easily
than that of ext4+jbd2.

However, since I now understand that new features should not be added to
ext3, I will look for other ways to improve the integrity of ext4 further,
so that I can recommend ext4 to my customers sooner.

Thanks,
Toshiyuki Okajima