2011-02-25 22:14:42

by Steve Rago

Subject: [PATCH] Allow O_SYNC to be set by fcntl(F_SETFL)

This has probably been a problem since day 1 (I ran into this running the 2.4 kernel years ago; finally got around to
fixing it). The problem is that fcntl(fd, F_SETFL, flags|O_SYNC) appears to work, but silently ignores the O_SYNC flag.
Opening the file with O_SYNC works okay, but setting it later on via fcntl doesn't work.


Signed-off-by: Steve Rago <[email protected]>
---
fs/fcntl.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/fs/fcntl.c b/fs/fcntl.c
index cb10261..afd233a 100644
--- a/fs/fcntl.c
+++ b/fs/fcntl.c
@@ -143,7 +143,7 @@ SYSCALL_DEFINE1(dup, unsigned int, fildes)
return ret;
}

-#define SETFL_MASK (O_APPEND | O_NONBLOCK | O_NDELAY | O_DIRECT | O_NOATIME)
+#define SETFL_MASK (O_APPEND | O_NONBLOCK | O_NDELAY | O_DIRECT | O_NOATIME | O_SYNC)

static int setfl(int fd, struct file * filp, unsigned long arg)
{
--
1.7.2.1
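
The behaviour described in this report can be checked from userspace. The following is only a sketch: the helper name try_set_osync() is made up for illustration, and it assumes that F_GETFL reports the flags actually stored on the open file. It asks F_SETFL to add O_SYNC and then reads the flags back. On a kernel without the patch, the expectation is that F_SETFL returns success but the read-back shows O_SYNC missing.

```c
#include <fcntl.h>

/*
 * Hypothetical helper, not part of the patch.
 * Returns  1 if O_SYNC is reported by F_GETFL after the F_SETFL call,
 *          0 if F_SETFL succeeded but the flag did not stick,
 *         -1 if any fcntl() call failed (errno is set).
 */
static int try_set_osync(int fd)
{
	int flags = fcntl(fd, F_GETFL);

	if (flags == -1)
		return -1;
	if (fcntl(fd, F_SETFL, flags | O_SYNC) == -1)
		return -1;

	/* Read the flags back instead of trusting the return value. */
	flags = fcntl(fd, F_GETFL);
	if (flags == -1)
		return -1;

	/* O_SYNC may be more than one bit, so compare against the whole mask. */
	return (flags & O_SYNC) == O_SYNC;
}
```

Calling try_set_osync() on a descriptor that was opened without O_SYNC and getting 0 back would match the "appears to work, but silently ignores" description above.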


2011-04-07 21:37:27

by Andrew Morton

Subject: Re: [PATCH] Allow O_SYNC to be set by fcntl(F_SETFL)

(did I ever reply to this? I meant to ;))

On Fri, 25 Feb 2011 16:52:36 -0500
Steve Rago <[email protected]> wrote:

> This has probably been a problem since day 1 (I ran into this running the 2.4 kernel years ago; finally got around to
> fixing it). The problem is that fcntl(fd, F_SETFL, flags|O_SYNC) appears to work, but silently ignores the O_SYNC flag.
> Opening the file with O_SYNC works okay, but setting it later on via fcntl doesn't work.
>
>
> Signed-off-by: Steve Rago <[email protected]>
> ---
> fs/fcntl.c | 2 +-
> 1 files changed, 1 insertions(+), 1 deletions(-)
>
> diff --git a/fs/fcntl.c b/fs/fcntl.c
> index cb10261..afd233a 100644
> --- a/fs/fcntl.c
> +++ b/fs/fcntl.c
> @@ -143,7 +143,7 @@ SYSCALL_DEFINE1(dup, unsigned int, fildes)
> return ret;
> }
>
> -#define SETFL_MASK (O_APPEND | O_NONBLOCK | O_NDELAY | O_DIRECT | O_NOATIME)
> +#define SETFL_MASK (O_APPEND | O_NONBLOCK | O_NDELAY | O_DIRECT | O_NOATIME | O_SYNC)

Does any standard say that we should do this?
http://pubs.opengroup.org/onlinepubs/007908799/xsh/fcntl.html does, I
guess.

I worry a bit that this change will surprise people. For example, this
person:
http://koders.com/c/fidA34D8D5EE9AA5D0AB0F3C604678E2E935E5B0246.aspx?s=dupa
is going to wonder why his app suddenly got a lot slower!

Sadly, the kernel silently ignores invalid set bits in `arg', so we
have no reliable way of signaling to the user that our behaviour here
changed.

I wonder if we should sync the file when someone sets O_SYNC this way.
If we don't then there is a period during which we have an fd which has
O_SYNC set, but it has pending unwritten data. An O_SYNC fd should
never be in such a state!

Ho hum. Yes, I guess we should apply the patch. But it would have
been better to not have screwed this up in the first place!
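
The window described above (O_SYNC set, but earlier writes still pending) can be closed from userspace regardless of what the kernel does. This is a sketch under assumptions, not what the patch implements: the helper name set_osync_and_flush() is hypothetical, and it approximates "sync the file when someone sets O_SYNC" by calling fsync() right after F_SETFL.

```c
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper: enable O_SYNC on fd, then flush whatever was
 * written before the switch. Returns 0 on success, -1 on error
 * (errno is set by the call that failed).
 */
static int set_osync_and_flush(int fd)
{
	int flags = fcntl(fd, F_GETFL);

	if (flags == -1)
		return -1;
	if (fcntl(fd, F_SETFL, flags | O_SYNC) == -1)
		return -1;

	/* Push out data that was written while the fd was not synchronous. */
	return fsync(fd);
}
```

Doing the flush in the caller keeps the kernel change to a one-line mask update, at the cost of every caller having to remember it. Note that on a kernel that ignores O_SYNC in F_SETFL, the fsync() still runs but later writes stay asynchronous.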

2011-04-08 15:14:57

by Christoph Hellwig

Subject: Re: [PATCH] Allow O_SYNC to be set by fcntl(F_SETFL)

I actually prototyped this patch independently a while ago, and in
addition to the data writeout when removing O_SYNC there are the
following caveats:

- O_SYNC is not actually one flag, but two: O_DSYNC and __O_SYNC.
setfl() needs to make sure __O_SYNC cannot be in f_flags without
O_DSYNC also being present.
- we need to audit all filesystems to make sure they don't do stupid
things when the O_SYNC flags appear or disappear during a write, that
is, make sure they are checked in just one place. The generic write
code is fine in that respect, but I haven't gone through all
filesystems to verify it yet.
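
The first caveat can be sketched as a small flag-normalization step. Everything named here is illustrative: the EX_ constants are local stand-ins whose values mirror the generic definitions of O_DSYNC and __O_SYNC (some architectures use different values), and ex_normalize_sync_flags() is a hypothetical helper, not code from setfl(). The policy shown, adding O_DSYNC whenever __O_SYNC appears alone, is just one way to keep the invariant; rejecting or dropping the lone bit would be others.

```c
/* Illustrative stand-ins for the two bits that make up O_SYNC. */
#define EX_O_DSYNC      00010000u                     /* data integrity bit */
#define EX_O_SYNC_ONLY  04000000u                     /* stands in for __O_SYNC */
#define EX_O_SYNC       (EX_O_SYNC_ONLY | EX_O_DSYNC) /* full O_SYNC mask */

/*
 * Hypothetical helper: make sure the __O_SYNC stand-in never appears
 * in the returned flags without the O_DSYNC stand-in.
 */
static unsigned int ex_normalize_sync_flags(unsigned int flags)
{
	if ((flags & EX_O_SYNC_ONLY) && !(flags & EX_O_DSYNC))
		flags |= EX_O_DSYNC;	/* upgrade rather than drop the request */
	return flags;
}
```

With this policy a request carrying only the upper bit comes out as the full O_SYNC mask, while plain O_DSYNC and unrelated flags pass through unchanged.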

2011-04-08 17:56:07

by Andrew Morton

Subject: Re: [PATCH] Allow O_SYNC to be set by fcntl(F_SETFL)

On Fri, 08 Apr 2011 13:39:16 -0400
Steve Rago <[email protected]> wrote:

> > I wonder if we should sync the file when someone sets O_SYNC this way.
> > If we don't then there is a period during which we have an fd which has
> > O_SYNC set, but it has pending unwritten data. An O_SYNC fd should
> > never be in such a state!
>
> Why not?

Because it's inconsistent. An O_SYNC fd never has outstanding writeout.
Except for in this one new and special time window between a setfl()
and the next write().

It's not a big deal, but it's somewhat ugly and merits thinking about.

> If I write something in non-synchronous mode, then change the file descriptor to synchronous mode, I should
> not make any assumptions about what was written prior to this point. If I care that much, I'll call fsync.

Well. You can call fsync() after every write() too.

> All that
> matters is that the operating system honors the contract as specified by the system call API.

There's a lot more to it than that. Things like
quality-of-implementation and principle-of-least-surprise. We used to
have a particular relationship between an O_SYNC fd and the state of
the inode which it represents. With this patch, that relationship no
longer holds.

As I say: not a big deal IMO, but it should be aired and thought about.

2011-04-08 17:59:53

by Steve Rago

Subject: Re: [PATCH] Allow O_SYNC to be set by fcntl(F_SETFL)

On 04/07/2011 05:37 PM, Andrew Morton wrote:
> (did I ever reply to this? I meant to ;))
>
> On Fri, 25 Feb 2011 16:52:36 -0500
> Steve Rago <[email protected]> wrote:
>
>> This has probably been a problem since day 1 (I ran into this running the 2.4 kernel years ago; finally got around to
>> fixing it). The problem is that fcntl(fd, F_SETFL, flags|O_SYNC) appears to work, but silently ignores the O_SYNC flag.
>> Opening the file with O_SYNC works okay, but setting it later on via fcntl doesn't work.
>>
>>
>> Signed-off-by: Steve Rago <[email protected]>
>> ---
>> fs/fcntl.c | 2 +-
>> 1 files changed, 1 insertions(+), 1 deletions(-)
>>
>> diff --git a/fs/fcntl.c b/fs/fcntl.c
>> index cb10261..afd233a 100644
>> --- a/fs/fcntl.c
>> +++ b/fs/fcntl.c
>> @@ -143,7 +143,7 @@ SYSCALL_DEFINE1(dup, unsigned int, fildes)
>> return ret;
>> }
>>
>> -#define SETFL_MASK (O_APPEND | O_NONBLOCK | O_NDELAY | O_DIRECT | O_NOATIME)
>> +#define SETFL_MASK (O_APPEND | O_NONBLOCK | O_NDELAY | O_DIRECT | O_NOATIME | O_SYNC)
>
> Does any standard say that we should do this?
> http://pubs.opengroup.org/onlinepubs/007908799/xsh/fcntl.html does, I
> guess.

It's required by the Single UNIX Specification (POSIX.1). All other major platforms allow it to be set via fcntl. See
bugzilla.kernel.org bug ID #5994.

>
> I worry a bit that this change will surprise people. For example, this
> person:
> http://koders.com/c/fidA34D8D5EE9AA5D0AB0F3C604678E2E935E5B0246.aspx?s=dupa
> is going to wonder why his app suddenly got a lot slower!
>
> Sadly, the kernel silently ignores invalid set bits in `arg', so we
> have no reliable way of signaling to the user that our behaviour here
> changed.
>
> I wonder if we should sync the file when someone sets O_SYNC this way.
> If we don't then there is a period during which we have an fd which has
> O_SYNC set, but it has pending unwritten data. An O_SYNC fd should
> never be in such a state!

Why not? If I write something in non-synchronous mode, then change the file descriptor to synchronous mode, I should
not make any assumptions about what was written prior to this point. If I care that much, I'll call fsync. All that
matters is that the operating system honors the contract as specified by the system call API.

>
> Ho hum. Yes, I guess we should apply the patch. But it would have
> been better to not have screwed this up in the first place!
>
>

Agreed. Thanks for not letting this fall through the cracks.

Steve

2011-04-08 21:08:55

by Christoph Hellwig

Subject: Re: [PATCH] Allow O_SYNC to be set by fcntl(F_SETFL)

On Fri, Apr 08, 2011 at 10:56:02AM -0700, Andrew Morton wrote:
> Because it's inconsistent. An O_SYNC fd never has outstanding writeout.
> Except for in this one new and special time window between a setfl()
> and the next write().

It might actually have outstanding writes for as long as it eventually
takes the writeback code to push them out. O_SYNC only does a range
writeout for the area that was written.