From: Dave Chinner <[email protected]>
Commit bbdd6808 ("fs: reserve fallocate flag codepoint") changes the
fallocate(2) syscall interface. The flag that is reserved by this
commit is for functionality that has previously been NAKed on the
-fsdevel mailing list, and so exists out-of-tree.
The reserved syscall flag is completely undocumented, the commit
message doesn't tell us why the patches that use it exist out of
tree, or even why the flag needs to be in the kernel code and not
part of the out-of-tree patches. Further, the flag is not
implemented in any in-tree filesystems and probably never will be
due to the truck-sized security hole it opens up. Finally, we don't
change syscalls purely to support out-of-tree patches or kernel
modules.
The change to the syscall API was written and committed directly to
the ext4 tree by the ext4 maintainer, and merged through that tree
via the ext4 merge without review. According to the commit message,
this was discussed at the Plumber's conference but no documentary
evidence of that discussion exists. However, whether or not this
discussion took place is irrelevant as the proper venue for
discussion of this change is linux-fsdevel; discussions at a
conference are no substitute for a full airing of the change on the
appropriate mailing list.
The method of pushing of such a commit (i.e. written, committed and
pushed by a tree maintainer as part of a larger subsystem merge)
could be seen as designed to avoid review and discussion of a
controversial change that is likely to be NAKed. A long-term
subsystem maintainer should know better than to push changes in this
manner.
The lack of formal review and discussion for a syscall API change is
grounds for reverting patch, especially given the controversial
nature of the feature and the previous discussions and NAKs. The way
the change was pushed into mainline borders on an abuse of the trust
we place in maintainers and hence as a matter of principle this
change should be reverted.
Signed-off-by: Dave Chinner <[email protected]>
---
include/uapi/linux/falloc.h | 1 -
1 file changed, 1 deletion(-)
diff --git a/include/uapi/linux/falloc.h b/include/uapi/linux/falloc.h
index 990c4cc..8a7935f 100644
--- a/include/uapi/linux/falloc.h
+++ b/include/uapi/linux/falloc.h
@@ -3,7 +3,6 @@
#define FALLOC_FL_KEEP_SIZE 0x01 /* default is extend size */
#define FALLOC_FL_PUNCH_HOLE 0x02 /* de-allocates range */
-#define FALLOC_FL_NO_HIDE_STALE 0x04 /* reserved codepoint */
#endif /* _UAPI_FALLOC_H_ */
--
1.7.10
On Tue, Nov 20, 2012 at 10:04:27AM +1100, Dave Chinner wrote:
> The method of pushing of such a commit (i.e. written, committed and
> pushed by a tree maintainer as part of a larger subsystem merge)
> could be seen as designed to avoid review and discussion of a
> controversial change that is likely to be NAKed. A long-term
> subsystem maintainer should know better than to push changes in this
> manner.
Agreed, this needs to be reverted.
Reviewed-by: Christoph Hellwig <[email protected]>
fs: revert commit bbdd6808 to fallocate UAPI
From: Dave Chinner <[email protected]>
Commit bbdd6808 ("fs: reserve fallocate flag codepoint") changes the
fallocate(2) syscall interface. The flag that is reserved by this
commit is for functionality that has previously been NAKed on the
-fsdevel mailing list, and so exists out-of-tree.
The reserved syscall flag is completely undocumented, the commit
message doesn't tell us why the patches that use it exist out of
tree, or even why the flag needs to be in the kernel code and not
part of the out-of-tree patches. Further, the flag is not
implemented in any in-tree filesystems and probably never will be
due to the truck-sized security hole it opens up. Finally, we don't
change syscalls purely to support out-of-tree patches or kernel
modules.
The change to the syscall API was written and committed directly to
the ext4 tree by the ext4 maintainer, and merged through that tree
via the ext4 merge without review. According to the commit message,
this was discussed at the Plumber's conference but no documentary
evidence of that discussion exists. However, whether or not this
discussion took place is irrelevant as the proper venue for
discussion of this change is linux-fsdevel; discussions at a
conference are no substitute for a full airing of the change on the
appropriate mailing list.
The method of pushing of such a commit (i.e. written, committed and
pushed by a tree maintainer as part of a larger subsystem merge)
could be seen as designed to avoid review and discussion of a
controversial change that is likely to be NAKed. A long-term
subsystem maintainer should know better than to push changes in this
manner.
The lack of formal review and discussion for a syscall API change is
grounds for reverting patch, especially given the controversial
nature of the feature and the previous discussions and NAKs. The way
the change was pushed into mainline borders on an abuse of the trust
we place in maintainers and hence as a matter of principle this
change should be reverted.
Signed-off-by: Dave Chinner <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
---
Reason for resend:
- No response to orignal posting here:
http://www.spinics.net/lists/linux-fsdevel/msg59937.html
include/uapi/linux/falloc.h | 1 -
1 file changed, 1 deletion(-)
diff --git a/include/uapi/linux/falloc.h b/include/uapi/linux/falloc.h
index 990c4cc..8a7935f 100644
--- a/include/uapi/linux/falloc.h
+++ b/include/uapi/linux/falloc.h
@@ -3,7 +3,6 @@
#define FALLOC_FL_KEEP_SIZE 0x01 /* default is extend size */
#define FALLOC_FL_PUNCH_HOLE 0x02 /* de-allocates range */
-#define FALLOC_FL_NO_HIDE_STALE 0x04 /* reserved codepoint */
#endif /* _UAPI_FALLOC_H_ */
On Mon, Nov 26, 2012 at 11:28:14AM +1100, Dave Chinner wrote:
> fs: revert commit bbdd6808 to fallocate UAPI
>
> From: Dave Chinner <[email protected]>
>
> Commit bbdd6808 ("fs: reserve fallocate flag codepoint") changes the
> fallocate(2) syscall interface. The flag that is reserved by this
> commit is for functionality that has previously been NAKed on the
> -fsdevel mailing list, and so exists out-of-tree.
Hi Linus,
It doesn't change the interface or break anything; it just reserves a
bit so that out-of-tree patches don't collide with future allocations.
There are significant usages of this bit within Google and Tao Bao.
It is true that there has been significant pushback about adding this
functionality on linux-fsdevel; I find it personally frustrating that
in effect, if enough people scream, they can veto an optional feature
that might only be implemented by a single file system.
It's not like there is any shortage of flag bits, so what's the harm
of reserving the bit?
- Ted
Hi Dave,
On 11/26/2012 10:55 AM, Theodore Ts'o wrote:
> On Mon, Nov 26, 2012 at 11:28:14AM +1100, Dave Chinner wrote:
>> fs: revert commit bbdd6808 to fallocate UAPI
>>
>> From: Dave Chinner <[email protected]>
>>
>> Commit bbdd6808 ("fs: reserve fallocate flag codepoint") changes the
>> fallocate(2) syscall interface. The flag that is reserved by this
>> commit is for functionality that has previously been NAKed on the
>> -fsdevel mailing list, and so exists out-of-tree.
>
> Hi Linus,
>
> It doesn't change the interface or break anything; it just reserves a
> bit so that out-of-tree patches don't collide with future allocations.
> There are significant usages of this bit within Google and Tao Bao.
> It is true that there has been significant pushback about adding this
> functionality on linux-fsdevel; I find it personally frustrating that
> in effect, if enough people scream, they can veto an optional feature
> that might only be implemented by a single file system.
Sorry, but we(Tao Bao) should say it explicitly that it is currently
used in our product system. So we are with Ted that there should be no
side effect for reserving just a bit to avoid future conflict?
Thanks
Tao
On Sun, Nov 25, 2012 at 09:55:20PM -0500, Theodore Ts'o wrote:
> On Mon, Nov 26, 2012 at 11:28:14AM +1100, Dave Chinner wrote:
> > fs: revert commit bbdd6808 to fallocate UAPI
> >
> > From: Dave Chinner <[email protected]>
> >
> > Commit bbdd6808 ("fs: reserve fallocate flag codepoint") changes the
> > fallocate(2) syscall interface. The flag that is reserved by this
> > commit is for functionality that has previously been NAKed on the
> > -fsdevel mailing list, and so exists out-of-tree.
>
> Hi Linus,
>
> It doesn't change the interface or break anything; it just reserves a
> bit so that out-of-tree patches don't collide with future allocations.
> There are significant usages of this bit within Google and Tao Bao.
> It is true that there has been significant pushback about adding this
> functionality on linux-fsdevel;
It's not the fact that you want to reserve a bit that is at issue
here - it's the way it's been pushed into the tree that is the
front-and-center issue.
> I find it personally frustrating that
> in effect, if enough people scream, they can veto an optional feature
> that might only be implemented by a single file system.
Having a significant portion of the wider fs development community
disagree with your patches is no reason for subverting the review
process. Besides, that's irrelevant to the issue being discussed,
unless you are describing your motives in an effort to justify your
actions.
In fact, it's even more disturbing if this was your real motive.
That is, is sounds somewhat like you've just admitted that you
pushed this change silently through the ext4 tree to avoid review
and discussion and that you are blaming the rest of the FS community
for forcing you to take such actions.
> It's not like there is any shortage of flag bits, so what's the harm
> of reserving the bit?
The harm has already been done - to the trust we've placed in you as
a maintainer. To argue that the code does no harm is to completely
miss the crux of the issue at hand: principles, process and trust
are far more important in our community than a single line of
code.
Ted, it comes down to trust. If we can't trust you not to push your
own changes to syscall APIs into the mainline tree via backdoor
channels, then how can we trust you not to push the entire
out-of-tree patch into the kernel the same way?
Cheers,
Dave.
--
Dave Chinner
[email protected]
> It's not like there is any shortage of flag bits, so what's the harm
> of reserving the bit?
Why not just reserve a small group of bits for fs private use in that
case - for any fs.
Alan
On Mon, Nov 26, 2012 at 11:53:45AM +0000, Alan Cox wrote:
> > It's not like there is any shortage of flag bits, so what's the harm
> > of reserving the bit?
>
> Why not just reserve a small group of bits for fs private use in that
> case - for any fs.
I think that would be a *fine* idea.
- Ted
On Mon, Nov 26, 2012 at 11:53:45AM +0000, Alan Cox wrote:
> > It's not like there is any shortage of flag bits, so what's the harm
> > of reserving the bit?
>
> Why not just reserve a small group of bits for fs private use in that
> case - for any fs.
Flawed - one bit, one function for all filesystems, otherwise the
same binary could behave very differently on different filesystems.
Besides, we already have a mechanism for adding filesystem specific
interfaces. It's called an ioctl. That's what it's there for - a
free-form extensible interface that can be wholly defined and
contained in the out-of-tree patch.
Most filesystems implement ioctls for their own specific
functionality, including for one-off preallocation semantics (e.g.
XFS_IOC_ZERO_RANGE). There is no reason why ext4 can't do the same
thing and we can drop the whole issue of having to modify a syscall
API with magic, undocumented flag bits with unpredictable
behaviour....
ext4 is not a special snowflake that allows developers to bend rules
whenever they want. If the ext4 developers want to support out of
tree functionality for their filesystem, then they can do it within
their filesystem via ioctls like everyone else does.
Cheers,
Dave.
--
Dave Chinner
[email protected]
Am Montag, 26. November 2012 schrieb Dave Chinner:
> On Mon, Nov 26, 2012 at 11:53:45AM +0000, Alan Cox wrote:
> > > It's not like there is any shortage of flag bits, so what's the
> > > harm of reserving the bit?
> >
> > Why not just reserve a small group of bits for fs private use in that
> > case - for any fs.
>
> Flawed - one bit, one function for all filesystems, otherwise the
> same binary could behave very differently on different filesystems.
>
> Besides, we already have a mechanism for adding filesystem specific
> interfaces. It's called an ioctl. That's what it's there for - a
> free-form extensible interface that can be wholly defined and
> contained in the out-of-tree patch.
>
> Most filesystems implement ioctls for their own specific
> functionality, including for one-off preallocation semantics (e.g.
> XFS_IOC_ZERO_RANGE). There is no reason why ext4 can't do the same
> thing and we can drop the whole issue of having to modify a syscall
> API with magic, undocumented flag bits with unpredictable
> behaviour....
>
> ext4 is not a special snowflake that allows developers to bend rules
> whenever they want. If the ext4 developers want to support out of
> tree functionality for their filesystem, then they can do it within
> their filesystem via ioctls like everyone else does.
I do not develop for the kernel, just test here a bit, there a bit… and I
didn´t read the previous discussions about this patch…
… but I think I can follow this argument of yours, Dave.
And Ted, this does not appear to be screaming to me.
So why no ioctl?
And what functionality is this about anyway? Sometimes I read in some
kernel source and I am happy that sometimes can understood some of it.
Reading "this is used for some magic Google or Tao Bao do with their
filesystem" would decrease my chance to understand whats going on there.
Not quite approbiate for an open source kernel, I think.
Thanks,
--
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA B82F 991B EAAC A599 84C7