2014-06-13 10:32:20

by Pavel Machek

[permalink] [raw]
Subject: Re: [RFC] Tux3 for review

Hi!

> > As I said, the glaring omission is proper ENOSPC handling, which is
> > work in progress. I do not view that as an obstacle to merging.
> >
> > After all, Btrfs did not have proper ENOSPC handling when it was
> > merged.
>
> Yup, and that was a big mistake. Hence not having working ENOSPC
> detection is a major strike against merging a new filesystem now.

Hmm, it seems that merging filesystems is getting harder over
time. Soon, it will be impossible to merge new filesystem.

> > The design is here:
>
> So come back when you've implemented it properly and proven that you
> have a sound design and clean implementation.

People submit code early to get feedback... but this is not exactly
helpful feedback, I'm afraid...

Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


2014-06-13 17:49:50

by Daniel Phillips

[permalink] [raw]
Subject: Re: [RFC] Tux3 for review

Hi Pavel, On Friday, June 13, 2014 3:32:16 AM PDT, Pavel Machek wrote:
> Hmm, it seems that merging filesystems is getting harder over
> time. Soon, it will be impossible to merge new filesystem.

My thought exactly, but it carries more weight coming from you.

It is getting more unpleasant to discuss things on LKML in
general, which tends to drive the design process away from
public view, leaving only the dregs of politics and infighting
for the public record. Perhaps some participants prefer it that
way, but I am certainly not one of them.

I thought this issue was going to be addressed at last year's
kernel summit.

Regards,

Daniel

2014-06-13 20:20:43

by Pavel Machek

[permalink] [raw]
Subject: Re: [RFC] Tux3 for review

Hi!

On Fri 2014-06-13 10:49:39, Daniel Phillips wrote:
> Hi Pavel, On Friday, June 13, 2014 3:32:16 AM PDT, Pavel Machek wrote:
> >Hmm, it seems that merging filesystems is getting harder over
> >time. Soon, it will be impossible to merge new filesystem.
>
> My thought exactly, but it carries more weight coming from you.
>
> It is getting more unpleasant to discuss things on LKML in
> general, which tends to drive the design process away from
> public view, leaving only the dregs of politics and infighting
> for the public record. Perhaps some participants prefer it that
> way, but I am certainly not one of them.
>
> I thought this issue was going to be addressed at last year's
> kernel summit.

Actually, would it make sense to have staging/fs/?
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

2014-06-15 21:41:27

by Daniel Phillips

[permalink] [raw]
Subject: Re: [RFC] Tux3 for review

On Friday, June 13, 2014 1:20:39 PM PDT, Pavel Machek wrote:
> Hi!
>
> On Fri 2014-06-13 10:49:39, Daniel Phillips wrote:
>> Hi Pavel, On Friday, June 13, 2014 3:32:16 AM PDT, Pavel Machek wrote:
> ...
>
> Actually, would it make sense to have staging/fs/?

That makes sense to me, if a suitably expert and nonaligned maintainer can
be found to sign up for a ridiculous amount of largely thankless, but
perhaps fascinating work. Any volunteers?

Regards,

Daniel

2014-06-16 15:26:13

by James Bottomley

[permalink] [raw]
Subject: Re: [RFC] Tux3 for review

On Sun, 2014-06-15 at 14:41 -0700, Daniel Phillips wrote:
> On Friday, June 13, 2014 1:20:39 PM PDT, Pavel Machek wrote:
> > Hi!
> >
> > On Fri 2014-06-13 10:49:39, Daniel Phillips wrote:
> >> Hi Pavel, On Friday, June 13, 2014 3:32:16 AM PDT, Pavel Machek wrote:
> > ...
> >
> > Actually, would it make sense to have staging/fs/?
>
> That makes sense to me, if a suitably expert and nonaligned maintainer can
> be found

Really? We're at the passive aggressive implication that everyone's
against you now? Can we get back to the technical discussion, please?

> to sign up for a ridiculous amount of largely thankless, but
> perhaps fascinating work. Any volunteers?

The whole suggestion is a non starter: we can't stage core API changes.
Even if we worked out how to do that, the staging trees mostly don't get
the type of in-depth expert review that you need anyway.

The Cardinal concern has always been the viability page forking and its
impact on writeback ... and since writeback is our most difficult an
performance sensitive area, the bar to changing it is high.

When you presented page forking at LSF/MM in 2013, it didn't even stand
up to basic scrutiny before people found unresolved problems:

http://lwn.net/Articles/548091/

After lots of prodding, you finally coughed up a patch for discussion:

http://thread.gmane.org/gmane.linux.file-systems/85619

But then that petered out again. I can't emphasise enough that
iterating these threads to a conclusion and reposting interface
suggestions is the way to proceed on this ... as far as I can tell from
the discussion, the reviewers were making helpful suggestions, even if
they didn't like the original interface you proposed.

James

2014-06-19 08:21:40

by Pavel Machek

[permalink] [raw]
Subject: Re: [RFC] Tux3 for review

On Mon 2014-06-16 08:25:54, James Bottomley wrote:
> On Sun, 2014-06-15 at 14:41 -0700, Daniel Phillips wrote:
> > On Friday, June 13, 2014 1:20:39 PM PDT, Pavel Machek wrote:
> > > On Fri 2014-06-13 10:49:39, Daniel Phillips wrote:
> > to sign up for a ridiculous amount of largely thankless, but
> > perhaps fascinating work. Any volunteers?
>
> The whole suggestion is a non starter: we can't stage core API changes.
> Even if we worked out how to do that, the staging trees mostly don't get
> the type of in-depth expert review that you need anyway.

Well.. most filesystems do not need any core API changes, right?

> The Cardinal concern has always been the viability page forking and its
> impact on writeback ... and since writeback is our most difficult an
> performance sensitive area, the bar to changing it is high.

And in this particular case, Daniel was flamed for poor coding style, not
for page forking. So staging/ would actually help him -- he could concentrate
on core changes without being distracted by unimportant stuff.

> When you presented page forking at LSF/MM in 2013, it didn't even stand
> up to basic scrutiny before people found unresolved problems:
>
> http://lwn.net/Articles/548091/
>
> After lots of prodding, you finally coughed up a patch for discussion:
>
> http://thread.gmane.org/gmane.linux.file-systems/85619
>
> But then that petered out again. I can't emphasise enough that
> iterating these threads to a conclusion and reposting interface
> suggestions is the way to proceed on this ... as far as I can tell from
> the discussion, the reviewers were making helpful suggestions, even if
> they didn't like the original interface you proposed.

This obviously needs to be solved, first...
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

2014-06-19 09:27:04

by Lukas Czerner

[permalink] [raw]
Subject: Re: [RFC] Tux3 for review

On Thu, 19 Jun 2014, Pavel Machek wrote:

> Date: Thu, 19 Jun 2014 10:21:29 +0200
> From: Pavel Machek <[email protected]>
> To: James Bottomley <[email protected]>
> Cc: Daniel Phillips <[email protected]>, Dave Chinner <[email protected]>,
> [email protected], [email protected],
> Linus Torvalds <[email protected]>,
> Andrew Morton <[email protected]>
> Subject: Re: [RFC] Tux3 for review
>
> On Mon 2014-06-16 08:25:54, James Bottomley wrote:
> > On Sun, 2014-06-15 at 14:41 -0700, Daniel Phillips wrote:
> > > On Friday, June 13, 2014 1:20:39 PM PDT, Pavel Machek wrote:
> > > > On Fri 2014-06-13 10:49:39, Daniel Phillips wrote:
> > > to sign up for a ridiculous amount of largely thankless, but
> > > perhaps fascinating work. Any volunteers?
> >
> > The whole suggestion is a non starter: we can't stage core API changes.
> > Even if we worked out how to do that, the staging trees mostly don't get
> > the type of in-depth expert review that you need anyway.
>
> Well.. most filesystems do not need any core API changes, right?
>
> > The Cardinal concern has always been the viability page forking and its
> > impact on writeback ... and since writeback is our most difficult an
> > performance sensitive area, the bar to changing it is high.
>
> And in this particular case, Daniel was flamed for poor coding style, not
> for page forking. So staging/ would actually help him -- he could concentrate
> on core changes without being distracted by unimportant stuff.

Flamed ? really ? Dave pointed out some serious coding style problems.
Those should be very easy to fix.

Let me remind you some more important problems Dave brought up,
including page forking:

"
The hacks around VFS and MM functionality need to have demonstrated
methods for being removed. We're not going to merge that page
forking stuff (like you were told at LSF 2013 more than a year ago:
http://lwn.net/Articles/548091/) without rigorous design review and
a demonstration of the solutions to all the hard corner cases it
has. The current code doesn't solve them (e.g. direct IO doesn't
work in tux3), and there's no clear patch set we can review that
demonstrates how it is all supposed to work. i.e. you need to
separate out all the page forking code into a separate patchset for
review, independent of the tux3 code and applies to the core mm/
code.
"

"
Then there's all the writeback hacks. You've simply copy-n-pasted
most of fs-writeback.c, including duplicating structures like struct
wb_writeback_work and then hacked in crap (kallsyms lookups!) to be
able to access core structures from kernel module context
(tux3_setup_writeback(), I'm looking at you). This is completely
unacceptible for a merge. Again, you need to separate out all the
writeback changes you need into an independent patchset so that they
can be reviewed independently of the tux3 code that uses it.
"

-Lukas



>
> > When you presented page forking at LSF/MM in 2013, it didn't even stand
> > up to basic scrutiny before people found unresolved problems:
> >
> > http://lwn.net/Articles/548091/
> >
> > After lots of prodding, you finally coughed up a patch for discussion:
> >
> > http://thread.gmane.org/gmane.linux.file-systems/85619
> >
> > But then that petered out again. I can't emphasise enough that
> > iterating these threads to a conclusion and reposting interface
> > suggestions is the way to proceed on this ... as far as I can tell from
> > the discussion, the reviewers were making helpful suggestions, even if
> > they didn't like the original interface you proposed.
>
> This obviously needs to be solved, first...
> Pavel
>

2014-06-19 21:58:34

by Daniel Phillips

[permalink] [raw]
Subject: Re: [RFC] Tux3 for review

On Thursday, June 19, 2014 2:26:48 AM PDT, Lukáš Czerner wrote:
> On Thu, 19 Jun 2014, Pavel Machek wrote:
>
>> Date: Thu, 19 Jun 2014 10:21:29 +0200
>> From: Pavel Machek <[email protected]>
>> To: James Bottomley <[email protected]>
>> Cc: Daniel Phillips <[email protected]>, Dave Chinner
>> <[email protected]>,
>> [email protected], [email protected],
> ...
>
> Flamed ? really ?

Yes, really. There were valid points and there were also unabashed flames.
The latter are not helpful to anybody, even the flamer. But note that there
were no counter flames. The boy scout rule applies: always leave your
campsite cleaner than you found it.

> Dave pointed out some serious coding style problems.
> Those should be very easy to fix.

One needs to be careful about the definition of "fix" so that it does not
turn into "throw the baby out with the bath water". Our kernel code
necessarily has a few __KERNEL__ #ifdefs because the majority of it also
runs in user space. This not a feature to disparage, far from it.

Among other benefits, running in user space supports automated unit testing
at fine granularity. We run make tests as a habit to catch a wide spectrum
of correctness regressions. A successful make tests usually indicates that
the heavyweight kernel stress tests are going to pass. Obviously, there are
occasional exceptions to this. For example user space does not catch SMP
races. In practice, only a handful of those have slipped through and
required kernel level bug chasing.

That said, we will will happily merge any concrete suggestion that reduces
the frequency of __KERNEL__. But please be realistic. There are 32
__KERNEL__ ifdefs in our 18K line code base. That hardly amounts to a
"dog's breakfast".

> Let me remind you some more important problems Dave brought up,
> including page forking:
>
> "
> The hacks around VFS and MM functionality need to have demonstrated
> methods for being removed.

We already removed 450 lines of core kernel workarounds from Tux3 with an
approach that was literally cut and pasted from one of Dave's emails. Then
Dave changed his mind. Now the Tux3 team has been assigned a research
project to improve core kernel writeback instead of simply adapting the
approach that is already proven to work well enough. That is a rather
blatant example of "perfect is the enemy of good enough". Please read the
thread.

> We're not going to merge that page
> forking stuff (like you were told at LSF 2013 more than a year ago:
> http://lwn.net/Articles/548091/) without rigorous design review and
> a demonstration of the solutions to all the hard corner cases it
> has. The current code doesn't solve them (e.g. direct IO doesn't
> work in tux3), and there's no clear patch set we can review that
> demonstrates how it is all supposed to work. i.e. you need to
> separate out all the page forking code into a separate patchset for
> review, independent of the tux3 code and applies to the core mm/
> code.
> "

Direct IO is a spurious issue. To recap: direct IO does not introduce any
new page forking issues. All of the page forking issues already exist with
normal buffered IO and mmap. We have little interest and scant available
time for heading off on a tangent to implement direct IO at this point just
as a precondition for merging.

On the other hand, page forking itself has a number of interesting issues.
Hirofumi is currently preparing a set of core kernel patches for review.
These patches explicitly do not attempt to package page forking up into a
nice and easy API that other filesystems could patch in tomorrow. That
would be an unreasonable research burden on our small development team.
Instead, we show how it works in Tux3, and if other filesystems want to get
those benefits, they can make similar changes. If we (the kernel community)
are lucky enough to find a pattern in it such that substantial parts of the
code can be abstracted into a library, then good. But requiring such a
library to be developed as a precondition to merging Tux3 is unreasonable.

> "
> Then there's all the writeback hacks. You've simply copy-n-pasted
> most of fs-writeback.c, including duplicating structures like struct
> wb_writeback_work and then hacked in crap (kallsyms lookups!) to be
> able to access core structures from kernel module context
> (tux3_setup_writeback(), I'm looking at you). This is completely
> unacceptible for a merge. Again, you need to separate out all the
> writeback changes you need into an independent patchset so that they
> can be reviewed independently of the tux3 code that uses it.
> "

That was already fixed as noted above, and all the relevant changes were
already posted as an independent patch set. After that, some developers
weighed in with half formed ideas about how the same thing could be done
better, but without concrete suggestions. There is nothing wrong with half
formed ideas, except when they turn into a way of blocking forward
progress. See "perfect is the enemy of good enough" above.

It is worth noting that we (the kernel community) have been thrashing away
at the writeback problem for more than twenty years, and the current
solution still leaves much to be desired. It is unfair to expect us, the
Tux3 team, to fix that mess in a week or two, just to merge our filesystem.
We prefer to adapt the existing infrastructure for now, as expressed in the
currently proposed patch set. With that, we allow core to mark our inodes
dirty just as it has always done, and we continue to use the usual inode
writeback lists for writeback sheduling, which work just fine.

Regards,

Daniel

2014-06-21 19:29:06

by James Bottomley

[permalink] [raw]
Subject: Re: [RFC] Tux3 for review

On Thu, 2014-06-19 at 14:58 -0700, Daniel Phillips wrote:
> On Thursday, June 19, 2014 2:26:48 AM PDT, Luk?? Czerner wrote:
> > Let me remind you some more important problems Dave brought up,
> > including page forking:
> >
> > "
> > The hacks around VFS and MM functionality need to have demonstrated
> > methods for being removed.
>
> We already removed 450 lines of core kernel workarounds from Tux3 with an
> approach that was literally cut and pasted from one of Dave's emails. Then
> Dave changed his mind. Now the Tux3 team has been assigned a research
> project to improve core kernel writeback instead of simply adapting the
> approach that is already proven to work well enough. That is a rather
> blatant example of "perfect is the enemy of good enough". Please read the
> thread.

That's a bit disingenuous: the concern has always been how page forking
interacted with writeback. It's not new, it was one of the major things
brought up at LSF 14 months ago, so you weren't just assigned this.

> > We're not going to merge that page
> > forking stuff (like you were told at LSF 2013 more than a year ago:
> > http://lwn.net/Articles/548091/) without rigorous design review and
> > a demonstration of the solutions to all the hard corner cases it
> > has. The current code doesn't solve them (e.g. direct IO doesn't
> > work in tux3), and there's no clear patch set we can review that
> > demonstrates how it is all supposed to work. i.e. you need to
> > separate out all the page forking code into a separate patchset for
> > review, independent of the tux3 code and applies to the core mm/
> > code.
> > "
>
> Direct IO is a spurious issue. To recap: direct IO does not introduce any
> new page forking issues. All of the page forking issues already exist with
> normal buffered IO and mmap. We have little interest and scant available
> time for heading off on a tangent to implement direct IO at this point just
> as a precondition for merging.

The specific concern is that page forking cannot be made to work with
direct io. Asserting that it doesn't cause any additional problems
isn't an answer to that concern. Direct IO isn't actually a huge issue
for most filesystems (I mean even vfat has it). The fact that you think
it is such a huge deal to implement for tux3 tends to lend credence to
this viewpoint.

The point is that if page forking won't work with direct IO at all, then
it's a broken design and there's no point merging it.

> On the other hand, page forking itself has a number of interesting issues.
> Hirofumi is currently preparing a set of core kernel patches for review.
> These patches explicitly do not attempt to package page forking up into a
> nice and easy API that other filesystems could patch in tomorrow. That
> would be an unreasonable research burden on our small development team.
> Instead, we show how it works in Tux3, and if other filesystems want to get
> those benefits, they can make similar changes. If we (the kernel community)
> are lucky enough to find a pattern in it such that substantial parts of the
> code can be abstracted into a library, then good. But requiring such a
> library to be developed as a precondition to merging Tux3 is unreasonable.

OK, can we take a step back and ask why you're so keen to push this into
the tree? The usual reason is ease of maintenance because in-tree
filesystems get updated as the vfs and mm APIs change. However, the
reciprocal side of that is using standard VFS and MM APIs to make this
update and maintenance easy. The reason no-one wants an in-tree
filesystem that implements its own writeback by hacking into the current
writeback system is that it's a huge maintenance burden. Every time
writeback gets tweaked, tux3 will break meaning either we double the
burden on people updating writeback (to try to figure out how to
replicate the change in tux3) or we just accept that tux3 gets broken.
The former is unacceptable to the filesystem and mm people and the
latter would mean there's not really much point merging tux3 if we just
keep breaking it ... it's better to keep it out of tree where the
breakages can be fixed by people who understand them on their own
timescales.

The object of the exercise is *not* for you to convert every filesystem
to tux3, it's to see if there's a way of integrating enough of page
forking into the current writeback code that tux3 uses standard APIs and
doesn't multiply the burden on the people who maintain and update the
writeback code.

> > "
> > Then there's all the writeback hacks. You've simply copy-n-pasted
> > most of fs-writeback.c, including duplicating structures like struct
> > wb_writeback_work and then hacked in crap (kallsyms lookups!) to be
> > able to access core structures from kernel module context
> > (tux3_setup_writeback(), I'm looking at you). This is completely
> > unacceptible for a merge. Again, you need to separate out all the
> > writeback changes you need into an independent patchset so that they
> > can be reviewed independently of the tux3 code that uses it.
> > "
>
> That was already fixed as noted above, and all the relevant changes were
> already posted as an independent patch set. After that, some developers
> weighed in with half formed ideas about how the same thing could be done
> better, but without concrete suggestions. There is nothing wrong with half
> formed ideas, except when they turn into a way of blocking forward
> progress. See "perfect is the enemy of good enough" above.

Could you post the url to the new series, please, I must have missed it;
seeing the patches that implement the API for insertion into the
writeback code would certainly help frame this discussion.

> It is worth noting that we (the kernel community) have been thrashing away
> at the writeback problem for more than twenty years, and the current
> solution still leaves much to be desired. It is unfair to expect us, the
> Tux3 team, to fix that mess in a week or two, just to merge our filesystem.
> We prefer to adapt the existing infrastructure for now, as expressed in the
> currently proposed patch set. With that, we allow core to mark our inodes
> dirty just as it has always done, and we continue to use the usual inode
> writeback lists for writeback sheduling, which work just fine.

So that's a misunderstanding of expectations; the actual expectation is
that you won't make the writeback problem more difficult to tackle.
Reimplementing writeback within your code in a way that's hacked into
the system is fragile and burdensome: as I said above, it becomes double
the code to maintain and tux3 breaks if its not updated.

James

2014-06-22 01:06:18

by Dave Chinner

[permalink] [raw]
Subject: Re: [RFC] Tux3 for review

On Sat, Jun 21, 2014 at 12:29:01PM -0700, James Bottomley wrote:
> On Thu, 2014-06-19 at 14:58 -0700, Daniel Phillips wrote:
> > On Thursday, June 19, 2014 2:26:48 AM PDT, Lukáš Czerner wrote:
> > > Let me remind you some more important problems Dave brought up,
> > > including page forking:
> > >
> > > "
> > > The hacks around VFS and MM functionality need to have demonstrated
> > > methods for being removed.
> >
> > We already removed 450 lines of core kernel workarounds from Tux3 with an
> > approach that was literally cut and pasted from one of Dave's emails. Then
> > Dave changed his mind. Now the Tux3 team has been assigned a research
> > project to improve core kernel writeback instead of simply adapting the
> > approach that is already proven to work well enough. That is a rather
> > blatant example of "perfect is the enemy of good enough". Please read the
> > thread.
>
> That's a bit disingenuous: the concern has always been how page forking
> interacted with writeback. It's not new, it was one of the major things
> brought up at LSF 14 months ago, so you weren't just assigned this.

BTW, it's worth noting that reviewers are *allowed* to change their
mind at any time during a discussion or during review cycles.
Indeed, this occurs quite commonly. It's no different to multiple
reviewers disagreeing on what the best way to make the improvement
is - sometimes it takes an implementation to solidify opinion on the
best approach to solving a problem.

i.e. it took an implementation of the writeback hook tailored
specifically to tux3's requirements to understand the best way to
solve the infrastructure problem for *everyone*. This is how review
is supposed to work - take an idea, and refine it into something
better that works for everyone.

We'd have been stuck way up the creek without a paddle a long time
ago if reviewers weren't allowed to change their minds....

Cheers,

Dave.
--
Dave Chinner
[email protected]

2014-06-22 03:32:17

by Daniel Phillips

[permalink] [raw]
Subject: Re: [RFC] Tux3 for review

On Saturday, June 21, 2014 12:29:01 PM PDT, James Bottomley wrote:
> On Thu, 2014-06-19 at 14:58 -0700, Daniel Phillips wrote:
>> We already removed 450 lines of core kernel workarounds from Tux3 with
an
>> approach that was literally cut and pasted from one of Dave's
>> emails. Then
>> Dave changed his mind. Now the Tux3 team has been assigned a research
>> project to improve core kernel writeback instead of simply adapting the
>> approach that is already proven to work well enough. That is a rather
>> blatant example of "perfect is the enemy of good enough". Please read
the
>> thread.
>
> That's a bit disingenuous: the concern has always been how page forking
> interacted with writeback. It's not new, it was one of the major things
> brought up at LSF 14 months ago, so you weren't just assigned this.

[citation needed]

Regards,

Daniel

2014-06-22 14:43:12

by James Bottomley

[permalink] [raw]
Subject: Re: [RFC] Tux3 for review

On Sat, 2014-06-21 at 20:32 -0700, Daniel Phillips wrote:
> On Saturday, June 21, 2014 12:29:01 PM PDT, James Bottomley wrote:
> > On Thu, 2014-06-19 at 14:58 -0700, Daniel Phillips wrote:
> >> We already removed 450 lines of core kernel workarounds from Tux3 with
> an
> >> approach that was literally cut and pasted from one of Dave's
> >> emails. Then
> >> Dave changed his mind. Now the Tux3 team has been assigned a research
> >> project to improve core kernel writeback instead of simply adapting the
> >> approach that is already proven to work well enough. That is a rather
> >> blatant example of "perfect is the enemy of good enough". Please read
> the
> >> thread.
> >
> > That's a bit disingenuous: the concern has always been how page forking
> > interacted with writeback. It's not new, it was one of the major things
> > brought up at LSF 14 months ago, so you weren't just assigned this.
>
> [citation needed]

Really? I was there; I remember and it's in my notes of the discussion.
However, it's also in Jon's at paragraph 6 if you need to refer to
something to refresh your memory.

However, when it was spotted isn't the issue; how we add tux3 without a
large maintenance burden on writeback is, as I carefully explained in
the rest of the email you cut.

James

2014-06-22 18:35:07

by Theodore Ts'o

[permalink] [raw]
Subject: Re: [RFC] Tux3 for review

On Sat, Jun 21, 2014 at 08:32:03PM -0700, Daniel Phillips wrote:
> >That's a bit disingenuous: the concern has always been how page forking
> >interacted with writeback. It's not new, it was one of the major things
> >brought up at LSF 14 months ago, so you weren't just assigned this.
>
> [citation needed]

http://lwn.net/Articles/548091/

- Ted

2014-06-24 00:19:41

by Daniel Phillips

[permalink] [raw]
Subject: Re: [RFC] Tux3 for review

On Saturday, June 21, 2014 12:29:01 PM PDT, James Bottomley wrote:
> On Thu, 2014-06-19 at 14:58 -0700, Daniel Phillips wrote:
>> On Thursday, June 19, 2014 2:26:48 AM PDT, Lukáš Czerner wrote:
> ...
>
> the concern has always been how page forking interacted with
> writeback.

More accurately, that is just one of several concerns that Tux3
necessarily addresses in order to benefit from this powerful
optimization. We are pleased that the details continue to be of
general interest.

>> Direct IO is a spurious issue. To recap: direct IO does
>> notintroduce any new page forking issues. All of the page forking
>> issues already exist with normal buffered IO and mmap. We have
>> little interest and scant available time for heading off on a
>> tangent to implement direct IO at this point just as a
>> precondition for merging.
> ...
>
> The specific concern is that page forking cannot be made to work
> with direct io. Asserting that it doesn't cause any additional
> problems isn't an answer to that concern.

Yes it is. We are satisfied that direct IO introduces no new issues
with page forking. If you are concerned about a specific issue then
the onus is on you to specify it.

> Direct IO isn't actually a huge issue for most filesystems (I mean
> even vfat has it).

You might consider asking Hirofumi about that (VFAT maintainer).

> ...The fact that you think it is such a huge deal...

(Surely you could have found a less disparaging way to express
yourself...)

> ...to implement for tux3 tends to lend credence to this viewpoint.

It is purely a matter of concentrating on what is actually
important, as opposed to imagined or manufactured. We do not wish
to spend time on direct IO at this point in time. If you have
identified a specific issue then please raise it.

For the record, there is a genuine reason why direct IO requires
extra work for Tux3, which has nothing to do with page forking.
Tux3 has an asynchronous backend, unlike any other local Linux
filesystem (but like Matt Dillon's Hammer, from which we took
inspiration). Direct IO thus requires implementing a new
synchronization mechanism to allow frontend direct IO to use the
backend allocation and writeback mechanisms, because direct IO is
synchronous. There is nothing new, magical or particularly
challenging about that, it is just time consuming work that we do
not intend to do right now because other more important things need
to be done.

In the fullness of time, Tux3 will have direct IO just like VFAT,
however that work is a good candidate for post-merge development.
For example, it could be a good ramp-up project for a new team
member or a student looking to make their mark on the kernel world.

The bottom line is that direct IO has nothing to do with compiling
the kernel or operating a cell phone efficiently, so it is not
interesting to us right now. It will become more interesting when
Tux3 is ready to scale to servers running Oracle and the like.

> The point is that if page forking won't work with direct IO at
> all, then it's a broken design and there's no point merging it.

You can rest assured that direct IO will work with page forking,
given that buffered IO does. We are now discussing details of how
to make core Linux a more hospitable environment for page forking,
not whether page forking can be made to work at all, a question that
was settled by example some time ago.

>> On the other hand, page forking itself has a number of
>> interesting issues. Hirofumi is currently preparing a set of
>> core kernel patches for review. These patches explicitly do
>> not attempt to package page forking up into a nice and easy
>> API that other filesystems could patch in tomorrow. That would
>> be an unreasonable research burden on our small development
>> team.
> ...
>
> OK, can we take a step back and ask why you're so keen to push
> this into the tree?

If you mean, why are we keen to merge Tux3, I should not need to
explain that to you.

If you mean, why are we keen to push page forking per se into
mainline, then the answer is, we are by no means keen to push page
forking into core kernel. Rather, that request comes from other
filesystem developers who recognize it as a plausible way to avoid
the pain of stable pages.

Based on our experience, page forking is properly implemented within
the filesystem, not core kernel, and we are keen only to push the
requisite hooks into core. If somebody disagrees and feels the need
to prove their point by implementing page forking entirely in core,
then they should post patches and we will be the first to applaud.

> The usual reason is ease of maintenance because in-tree
> filesystems get updated as the vfs and mm APIs change. However,
> the reciprocal side of that is using standard VFS and MM APIs to
> make this update and maintenance easy. The reason no-one wants
> an in-tree filesystem that implements its own writeback by
> hacking into the current writeback system is that it's a huge
> maintenance burden.

Every filesystem is a maintenance burden. Core kernel simply must
provide the mechanisms that are required to make the kernel a good
place for filesystems to exist. The fact that some ancient core
hackery needs to be tweaked to better accommodate the requirements
of a modern filesystem is not unusual in any way. Essentially, that
is the entire story of Linux kernel development.

> Every time writeback gets tweaked, tux3 will break meaning either
> we double the burden on people updating writeback (to try to
> figure out how to replicate the change in tux3) or we just accept
> that tux3 gets broken.

No. Tux3 will be less of a burden for writeback maintenance than
other filesystems because it hooks in above the messy writepages
machinery and therefore is not sensitive to subtle changes in that
creaky code.

> The former is unacceptable to the filesystem and mm people and the
> latter would mean there's not really much point merging tux3 if we
> just keep breaking it ... it's better to keep it out of tree
> where the breakages can be fixed by people who understand them on
> their own timescales.

On the face of it you are arguing the case that Tux3 should be
blocked from merging forever, as should every new filesystem, as
Pavel succinctly pointed out. That is less than helpful. But if
your goal is to buttress the public perception that LKML has
become a toxic forum for contributors then you do an admirable
job.

By the way, after reading your polemic an observer might draw the
conclusion that I am not one of the "filesystem and mm people". When
did that change?

>>> ...
>> That was already fixed as noted above, and all the relevant
>> changes were already posted as an independent patch set. After
>> that, some developers weighed in with half formed ideas about
>> how the same thing could be done better, but without concrete
>> suggestions. There is nothing wrong with half formed ideas,
>> except when they turn into a way of blocking forward progress
> ...
>
> Could you post the url to the new series, please, I must have
> missed it; seeing the patches that implement the API for
> insertion into the writeback code would certainly help frame
> this discussion.

We think that our most recently posted patch is the best approach
at this time. Which is to say that it relies on exactly the
existing writeback scheduling heuristics. We think that Dave Chinner
and others are wrong to advocate experimental development of a new
writeback mechanism at this juncture while the current scheme
already works perfectly well for Tux3, either with our writeback
hack or with the new hook.

We further suggest that the new hook is easy to understand and
imposes insignificant new maintenance burden. In any case we will be
happy to assume whatever maintenance burden might arise. Obviously,
that is entirely academic while we are the only user.

>> It is worth noting that we (the kernel community) have been
>> thrashing away at the writeback problem for more than twenty
>> years, and the current solution still leaves much to be
>> desired. It is unfair to expect us, the Tux3 team, to fix that
>> mess in a week or two, just to merge our filesystem. We prefer
>> to adapt the existing infrastructure for now, as expressed in
>> the currently proposed patch set. With that, we allow core to
>> mark our inodes dirty just as it has always done, and we
>> continue to use the usual inode writeback lists for writeback
>> scheduling, which work just fine.
>
> So that's a misunderstanding of expectations...

I did not misunderstand. It is clear from the context you deleted
that we are being pushed to engineer a new core writeback mechanism
instead of adapting the existing one.

> ...the actual expectation is that you won't make the writeback
> problem more difficult to tackle.

We do not make the writeback problem more difficult, which is
obvious from the patch.

> Reimplementing writeback within your code in a way that's hacked
> into the system is fragile and burdensome ... it becomes double
> the code to maintain ... and tux3 breaks if its not updated.

You are preaching to the converted. As you know, we posted a patch
set that eliminates this particular instance of core duplication.
Upcoming patches will eliminate the remaining core duplication. It
is unnecessary to belabor that point further.

Regards,

Daniel

2014-06-24 00:31:58

by Daniel Phillips

[permalink] [raw]
Subject: Re: [RFC] Tux3 for review

On Sunday, June 22, 2014 11:34:50 AM PDT, Theodore Ts'o wrote:
> On Sat, Jun 21, 2014 at 08:32:03PM -0700, Daniel Phillips wrote:
>>> That's a bit disingenuous: the concern has always been how page forking
>>> interacted with writeback. It's not new, it was one of the major
things
>>> brought up at LSF 14 months ago, so you weren't just assigned this.
>>
>> [citation needed]
>
> http://lwn.net/Articles/548091/

Thank you Ted, and also thank you for providing an example worth emulating
of collegial behavior on LKML.

Regards,

Daniel

2014-06-24 11:16:25

by Daniel Phillips

[permalink] [raw]
Subject: Re: [RFC] Tux3 for review

On Saturday, June 21, 2014 6:06:00 PM PDT, Dave Chinner wrote:
> BTW, it's worth noting that reviewers are *allowed* to change their
> mind at any time during a discussion or during review cycles.
> Indeed, this occurs quite commonly. It's no different to multiple
> reviewers disagreeing on what the best way to make the improvement
> is - sometimes it takes an implementation to solidify opinion on the
> best approach to solving a problem.

The issue I have is not that you changed your mind per se, but
that you were right the first time and wrong the second time.
As you know, reviewers are not just allowed to change their
minds but are also allowed to be wrong from time to time.

The reason that you were wrong the second time is not that the
interface you proposed is wrong - I believe that we violently
agree about superblock-based writeback as the correct approach
long term - but that the current, inode based writeback already
works well enough for our needs. It therefore makes exactly zero
sense to go off on a tangent to engineer a new core mechanism
at the same time as merging the filesystem. The correct way to
do it is to get a likely user into kernel first (Tux3) and
then engineer the new interface that will be so all-dancing
that you will immediately feel compelled to adopt it for XFS.

Obviously, with only one user of the imperfect/functional
interface the maintenance overhead of updating it to the new
perfect/amazing interface rounds to zero. Remember, this is an
_internal_ API, so the do-not-break rule simply does not apply.
Instead, the "perfect is the enemy of good enough" rule is
operative.

Just to reiterate for the tl;dr amongst us: you were right the
first time. Go ahead and change your mind, but when you finally
realize that you were wrong the second time, please do let us
know.

Meanwhile, we must concentrate on the upcoming page forking
hooks, which promise to provide even more scope for being both
right and wrong, and smart or stupid about which parts of the
kernel should be deeply re-engineered versus prudently adapted
to evolving needs.

Regards,

Daniel