2020-08-29 07:15:01

by milan.opensource

[permalink] [raw]
Subject: [PATCH] fsync.2: ERRORS: add EIO and ENOSPC

From: Milan Shah <[email protected]>

This Fix addresses Bug 194757.
Ref: https://bugzilla.kernel.org/show_bug.cgi?id=194757
---
man2/fsync.2 | 13 +++++++++++++
1 file changed, 13 insertions(+)

diff --git a/man2/fsync.2 b/man2/fsync.2
index 96401cd..f38b3e4 100644
--- a/man2/fsync.2
+++ b/man2/fsync.2
@@ -186,6 +186,19 @@ In these cases disk caches need to be disabled using
or
.BR sdparm (8)
to guarantee safe operation.
+
+When
+.BR fsync ()
+or
+.BR fdatasync ()
+returns
+.B EIO
+or
+.B ENOSPC
+any error flags on pages in the file mapping are cleared, so subsequent synchronisation attempts
+will return without error. It is
+.I not
+safe to retry synchronisation and assume that a non-error return means prior writes are now on disk.
.SH SEE ALSO
.BR sync (1),
.BR bdflush (2),
--
2.7.4


Subject: Re: [PATCH] fsync.2: ERRORS: add EIO and ENOSPC

[Widening the CC to include Andrew and linux-fsdevel@]
[Milan: thanks for the patch, but it's unclear to me from your commit
message how/if you verified the details.]

Andrew, maybe you (or someone else) can comment, since long ago your

commit f79e2abb9bd452d97295f34376dedbec9686b986
Author: Andrew Morton <[email protected]>
Date: Fri Mar 31 02:30:42 2006 -0800

included a comment that is referred to in stackoverflow discussion
about this topic (that SO discussion is in turn referred to by
https://bugzilla.kernel.org/show_bug.cgi?id=194757).

The essence as I understand it, is this:
(1) fsync() (and similar) may fail EIO or ENOSPC, at which point data
has not been synced.
(2) In this case, the EIO/ENOSPC setting is cleared so that...
(3) A subsequent fsync() might return success, but...
(4) That doesn't mean that the data in (1) landed on the disk.

The proposed manual page patch below wants to document this, but I'd
be happy to have an FS-knowledgeable person comment before I apply.

Thanks,

Michael

On Sat, 29 Aug 2020 at 09:13, <[email protected]> wrote:
>
> From: Milan Shah <[email protected]>
>
> This Fix addresses Bug 194757.
> Ref: https://bugzilla.kernel.org/show_bug.cgi?id=194757
> ---
> man2/fsync.2 | 13 +++++++++++++
> 1 file changed, 13 insertions(+)
>
> diff --git a/man2/fsync.2 b/man2/fsync.2
> index 96401cd..f38b3e4 100644
> --- a/man2/fsync.2
> +++ b/man2/fsync.2
> @@ -186,6 +186,19 @@ In these cases disk caches need to be disabled using
> or
> .BR sdparm (8)
> to guarantee safe operation.
> +
> +When
> +.BR fsync ()
> +or
> +.BR fdatasync ()
> +returns
> +.B EIO
> +or
> +.B ENOSPC
> +any error flags on pages in the file mapping are cleared, so subsequent synchronisation attempts
> +will return without error. It is
> +.I not
> +safe to retry synchronisation and assume that a non-error return means prior writes are now on disk.
> .SH SEE ALSO
> .BR sync (1),
> .BR bdflush (2),
> --
> 2.7.4
>


--
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/

2020-09-08 11:40:03

by Jan Kara

[permalink] [raw]
Subject: Re: [PATCH] fsync.2: ERRORS: add EIO and ENOSPC

Added Jeff to CC since he has written the code...

On Mon 07-09-20 09:11:06, Michael Kerrisk (man-pages) wrote:
> [Widening the CC to include Andrew and linux-fsdevel@]
> [Milan: thanks for the patch, but it's unclear to me from your commit
> message how/if you verified the details.]
>
> Andrew, maybe you (or someone else) can comment, since long ago your
>
> commit f79e2abb9bd452d97295f34376dedbec9686b986
> Author: Andrew Morton <[email protected]>
> Date: Fri Mar 31 02:30:42 2006 -0800
>
> included a comment that is referred to in stackoverflow discussion
> about this topic (that SO discussion is in turn referred to by
> https://bugzilla.kernel.org/show_bug.cgi?id=194757).
>
> The essence as I understand it, is this:
> (1) fsync() (and similar) may fail EIO or ENOSPC, at which point data
> has not been synced.
> (2) In this case, the EIO/ENOSPC setting is cleared so that...
> (3) A subsequent fsync() might return success, but...
> (4) That doesn't mean that the data in (1) landed on the disk.

Correct.

> The proposed manual page patch below wants to document this, but I'd
> be happy to have an FS-knowledgeable person comment before I apply.

Just a small comment below:

> On Sat, 29 Aug 2020 at 09:13, <[email protected]> wrote:
> >
> > From: Milan Shah <[email protected]>
> >
> > This Fix addresses Bug 194757.
> > Ref: https://bugzilla.kernel.org/show_bug.cgi?id=194757
> > ---
> > man2/fsync.2 | 13 +++++++++++++
> > 1 file changed, 13 insertions(+)
> >
> > diff --git a/man2/fsync.2 b/man2/fsync.2
> > index 96401cd..f38b3e4 100644
> > --- a/man2/fsync.2
> > +++ b/man2/fsync.2
> > @@ -186,6 +186,19 @@ In these cases disk caches need to be disabled using
> > or
> > .BR sdparm (8)
> > to guarantee safe operation.
> > +
> > +When
> > +.BR fsync ()
> > +or
> > +.BR fdatasync ()
> > +returns
> > +.B EIO
> > +or
> > +.B ENOSPC
> > +any error flags on pages in the file mapping are cleared, so subsequent synchronisation attempts
> > +will return without error. It is
> > +.I not
> > +safe to retry synchronisation and assume that a non-error return means prior writes are now on disk.
> > .SH SEE ALSO
> > .BR sync (1),
> > .BR bdflush (2),

So the error state isn't really stored "on pages in the file mapping".
Current implementation (since 4.14) is that error state is stored in struct
file (I think this tends to be called "file description" in manpages) and
so EIO / ENOSPC is reported once for each file description of the file that
was open before the error happened. Not sure if we want to be so precise in
the manpages or if it just confuses people. Anyway your takeway that no
error on subsequent fsync() does not mean data was written is correct.

Honza

--
Jan Kara <[email protected]>
SUSE Labs, CR

2020-09-08 16:30:33

by Jeff Layton

[permalink] [raw]
Subject: Re: [PATCH] fsync.2: ERRORS: add EIO and ENOSPC

On Tue, 2020-09-08 at 13:27 +0200, Jan Kara wrote:
> Added Jeff to CC since he has written the code...
>
> On Mon 07-09-20 09:11:06, Michael Kerrisk (man-pages) wrote:
> > [Widening the CC to include Andrew and linux-fsdevel@]
> > [Milan: thanks for the patch, but it's unclear to me from your commit
> > message how/if you verified the details.]
> >
> > Andrew, maybe you (or someone else) can comment, since long ago your
> >
> > commit f79e2abb9bd452d97295f34376dedbec9686b986
> > Author: Andrew Morton <[email protected]>
> > Date: Fri Mar 31 02:30:42 2006 -0800
> >
> > included a comment that is referred to in stackoverflow discussion
> > about this topic (that SO discussion is in turn referred to by
> > https://bugzilla.kernel.org/show_bug.cgi?id=194757).
> >
> > The essence as I understand it, is this:
> > (1) fsync() (and similar) may fail EIO or ENOSPC, at which point data
> > has not been synced.
> > (2) In this case, the EIO/ENOSPC setting is cleared so that...
> > (3) A subsequent fsync() might return success, but...
> > (4) That doesn't mean that the data in (1) landed on the disk.
>
> Correct.
>
> > The proposed manual page patch below wants to document this, but I'd
> > be happy to have an FS-knowledgeable person comment before I apply.
>
> Just a small comment below:
>
> > On Sat, 29 Aug 2020 at 09:13, <[email protected]> wrote:
> > > From: Milan Shah <[email protected]>
> > >
> > > This Fix addresses Bug 194757.
> > > Ref: https://bugzilla.kernel.org/show_bug.cgi?id=194757
> > > ---
> > > man2/fsync.2 | 13 +++++++++++++
> > > 1 file changed, 13 insertions(+)
> > >
> > > diff --git a/man2/fsync.2 b/man2/fsync.2
> > > index 96401cd..f38b3e4 100644
> > > --- a/man2/fsync.2
> > > +++ b/man2/fsync.2
> > > @@ -186,6 +186,19 @@ In these cases disk caches need to be disabled using
> > > or
> > > .BR sdparm (8)
> > > to guarantee safe operation.
> > > +
> > > +When
> > > +.BR fsync ()
> > > +or
> > > +.BR fdatasync ()
> > > +returns
> > > +.B EIO
> > > +or
> > > +.B ENOSPC
> > > +any error flags on pages in the file mapping are cleared, so subsequent synchronisation attempts
> > > +will return without error. It is
> > > +.I not
> > > +safe to retry synchronisation and assume that a non-error return means prior writes are now on disk.
> > > .SH SEE ALSO
> > > .BR sync (1),
> > > .BR bdflush (2),
>
> So the error state isn't really stored "on pages in the file mapping".
> Current implementation (since 4.14) is that error state is stored in struct
> file (I think this tends to be called "file description" in manpages) and
> so EIO / ENOSPC is reported once for each file description of the file that
> was open before the error happened. Not sure if we want to be so precise in
> the manpages or if it just confuses people. Anyway your takeway that no
> error on subsequent fsync() does not mean data was written is correct.
>
> Honza
>

Yep.

My only comment is that there is nothing special about EIO and ENOSPC.
All errors are the same in this regard. Basically, issuing a new fsync
after a failed one doesn't do any good. You need to redirty the pages
first.
--
Jeff Layton <[email protected]>

2020-09-08 19:46:32

by Jeff Layton

[permalink] [raw]
Subject: Re: [PATCH] fsync.2: ERRORS: add EIO and ENOSPC

On Tue, 2020-09-08 at 13:27 +0200, Jan Kara wrote:
> Added Jeff to CC since he has written the code...
>
> On Mon 07-09-20 09:11:06, Michael Kerrisk (man-pages) wrote:
> > [Widening the CC to include Andrew and linux-fsdevel@]
> > [Milan: thanks for the patch, but it's unclear to me from your commit
> > message how/if you verified the details.]
> >
> > Andrew, maybe you (or someone else) can comment, since long ago your
> >
> > commit f79e2abb9bd452d97295f34376dedbec9686b986
> > Author: Andrew Morton <[email protected]>
> > Date: Fri Mar 31 02:30:42 2006 -0800
> >
> > included a comment that is referred to in stackoverflow discussion
> > about this topic (that SO discussion is in turn referred to by
> > https://bugzilla.kernel.org/show_bug.cgi?id=194757).
> >
> > The essence as I understand it, is this:
> > (1) fsync() (and similar) may fail EIO or ENOSPC, at which point data
> > has not been synced.
> > (2) In this case, the EIO/ENOSPC setting is cleared so that...
> > (3) A subsequent fsync() might return success, but...
> > (4) That doesn't mean that the data in (1) landed on the disk.
>
> Correct.
>
> > The proposed manual page patch below wants to document this, but I'd
> > be happy to have an FS-knowledgeable person comment before I apply.
>
> Just a small comment below:
>
> > On Sat, 29 Aug 2020 at 09:13, <[email protected]> wrote:
> > > From: Milan Shah <[email protected]>
> > >
> > > This Fix addresses Bug 194757.
> > > Ref: https://bugzilla.kernel.org/show_bug.cgi?id=194757
> > > ---
> > > man2/fsync.2 | 13 +++++++++++++
> > > 1 file changed, 13 insertions(+)
> > >
> > > diff --git a/man2/fsync.2 b/man2/fsync.2
> > > index 96401cd..f38b3e4 100644
> > > --- a/man2/fsync.2
> > > +++ b/man2/fsync.2
> > > @@ -186,6 +186,19 @@ In these cases disk caches need to be disabled using
> > > or
> > > .BR sdparm (8)
> > > to guarantee safe operation.
> > > +
> > > +When
> > > +.BR fsync ()
> > > +or
> > > +.BR fdatasync ()
> > > +returns
> > > +.B EIO
> > > +or
> > > +.B ENOSPC
> > > +any error flags on pages in the file mapping are cleared, so subsequent synchronisation attempts
> > > +will return without error. It is
> > > +.I not
> > > +safe to retry synchronisation and assume that a non-error return means prior writes are now on disk.
> > > .SH SEE ALSO
> > > .BR sync (1),
> > > .BR bdflush (2),
>
> So the error state isn't really stored "on pages in the file mapping".
> Current implementation (since 4.14) is that error state is stored in struct
> file (I think this tends to be called "file description" in manpages) and
> so EIO / ENOSPC is reported once for each file description of the file that
> was open before the error happened. Not sure if we want to be so precise in
> the manpages or if it just confuses people. Anyway your takeway that no
> error on subsequent fsync() does not mean data was written is correct.
>
>

Thinking about it more, I think we ought to spell this out explicitly as
we can in the manpage. This is a point of confusion for a lot of people
and not understanding this can lead to data integrity bugs. Maybe
something like this in the NOTES section?

'''
When fsync returns an error, the file is considered to be "clean". A
subsequent call to fsync will not result in a reattempt to write out the
data, unless that data has been rewritten. Applications that want to
reattempt writing to the file after a transient error must re-write
their data.
'''

To be clear:

In practice, you'd only have to write enough to redirty each page in
most cases.

Also, it is hard to claim that the above behavior is universally true. A
filesystem could opt to keep the pages dirty for some errors, but the
vast majority just toss out the data whenever there is a writeback
problem.


--
Jeff Layton <[email protected]>

Subject: Re: [PATCH] fsync.2: ERRORS: add EIO and ENOSPC

Hello Jan,

Thank you for jumping in on this thread.

On 9/8/20 1:27 PM, Jan Kara wrote:
> Added Jeff to CC since he has written the code...
>
> On Mon 07-09-20 09:11:06, Michael Kerrisk (man-pages) wrote:
>> [Widening the CC to include Andrew and linux-fsdevel@]
>> [Milan: thanks for the patch, but it's unclear to me from your commit
>> message how/if you verified the details.]
>>
>> Andrew, maybe you (or someone else) can comment, since long ago your
>>
>> commit f79e2abb9bd452d97295f34376dedbec9686b986
>> Author: Andrew Morton <[email protected]>
>> Date: Fri Mar 31 02:30:42 2006 -0800
>>
>> included a comment that is referred to in stackoverflow discussion
>> about this topic (that SO discussion is in turn referred to by
>> https://bugzilla.kernel.org/show_bug.cgi?id=194757).
>>
>> The essence as I understand it, is this:
>> (1) fsync() (and similar) may fail EIO or ENOSPC, at which point data
>> has not been synced.
>> (2) In this case, the EIO/ENOSPC setting is cleared so that...
>> (3) A subsequent fsync() might return success, but...
>> (4) That doesn't mean that the data in (1) landed on the disk.
>
> Correct.

Thanks for the confirmation!

>> The proposed manual page patch below wants to document this, but I'd
>> be happy to have an FS-knowledgeable person comment before I apply.
>
> Just a small comment below:
>
>> On Sat, 29 Aug 2020 at 09:13, <[email protected]> wrote:
>>>
>>> From: Milan Shah <[email protected]>
>>>
>>> This Fix addresses Bug 194757.
>>> Ref: https://bugzilla.kernel.org/show_bug.cgi?id=194757
>>> ---
>>> man2/fsync.2 | 13 +++++++++++++
>>> 1 file changed, 13 insertions(+)
>>>
>>> diff --git a/man2/fsync.2 b/man2/fsync.2
>>> index 96401cd..f38b3e4 100644
>>> --- a/man2/fsync.2
>>> +++ b/man2/fsync.2
>>> @@ -186,6 +186,19 @@ In these cases disk caches need to be disabled using
>>> or
>>> .BR sdparm (8)
>>> to guarantee safe operation.
>>> +
>>> +When
>>> +.BR fsync ()
>>> +or
>>> +.BR fdatasync ()
>>> +returns
>>> +.B EIO
>>> +or
>>> +.B ENOSPC
>>> +any error flags on pages in the file mapping are cleared, so subsequent synchronisation attempts
>>> +will return without error. It is
>>> +.I not
>>> +safe to retry synchronisation and assume that a non-error return means prior writes are now on disk.
>>> .SH SEE ALSO
>>> .BR sync (1),
>>> .BR bdflush (2),
>
> So the error state isn't really stored "on pages in the file mapping".
> Current implementation (since 4.14) is that error state is stored in struct
> file (I think this tends to be called "file description" in manpages) and

(Yes, "open file description" is the POSIX terminology for the thing that
sits between the FD and the inode--struct file in kernel parlance--and I
try to follow POSIX terminology in the manual pages where possible.

> so EIO / ENOSPC is reported once for each file description of the file that
> was open before the error happened. Not sure if we want to be so precise in
> the manpages or if it just confuses people.

Well, people are confused now, so I think more detail would be good.

> Anyway your takeway that no
> error on subsequent fsync() does not mean data was written is correct.

Thanks. (See also my rply to Jeff.)

By the way, a question related to your comments above. In the
errors section, there is this:

EIO An error occurred during synchronization. This error may
relate to data written to some other file descriptor on the
* same file. Since Linux 4.13, errors from write-back will
be reported to all file descriptors that might have written
the data which triggered the error. Some filesystems
(e.g., NFS) keep close track of which data came through
which file descriptor, and give more precise reporting.
Other filesystems (e.g., most local filesystems) will
report errors to all file descriptors that were open on the
* file when the error was recorded.

In the marked (*) lines, we have the word "file". Is this accurate? I mean, I
would normally take "file" in this context to mean the inode ('struct inode').
But I wonder if really what is meant here is "open file description"
('struct file'). In other words, is the EIO being generated for all FDs
connected to the same open file description, or for all FDs for all of the
open file descriptions connected to the inode? Your thoughts?

Cheers,

Michael


--
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/

Subject: Re: [PATCH] fsync.2: ERRORS: add EIO and ENOSPC

Hello Jeff,

On 9/8/20 9:44 PM, Jeff Layton wrote:
> On Tue, 2020-09-08 at 13:27 +0200, Jan Kara wrote:
>> Added Jeff to CC since he has written the code...
>>
>> On Mon 07-09-20 09:11:06, Michael Kerrisk (man-pages) wrote:
>>> [Widening the CC to include Andrew and linux-fsdevel@]
>>> [Milan: thanks for the patch, but it's unclear to me from your commit
>>> message how/if you verified the details.]
>>>
>>> Andrew, maybe you (or someone else) can comment, since long ago your
>>>
>>> commit f79e2abb9bd452d97295f34376dedbec9686b986
>>> Author: Andrew Morton <[email protected]>
>>> Date: Fri Mar 31 02:30:42 2006 -0800
>>>
>>> included a comment that is referred to in stackoverflow discussion
>>> about this topic (that SO discussion is in turn referred to by
>>> https://bugzilla.kernel.org/show_bug.cgi?id=194757).
>>>
>>> The essence as I understand it, is this:
>>> (1) fsync() (and similar) may fail EIO or ENOSPC, at which point data
>>> has not been synced.
>>> (2) In this case, the EIO/ENOSPC setting is cleared so that...
>>> (3) A subsequent fsync() might return success, but...
>>> (4) That doesn't mean that the data in (1) landed on the disk.
>>
>> Correct.
>>
>>> The proposed manual page patch below wants to document this, but I'd
>>> be happy to have an FS-knowledgeable person comment before I apply.
>>
>> Just a small comment below:
>>
>>> On Sat, 29 Aug 2020 at 09:13, <[email protected]> wrote:
>>>> From: Milan Shah <[email protected]>
>>>>
>>>> This Fix addresses Bug 194757.
>>>> Ref: https://bugzilla.kernel.org/show_bug.cgi?id=194757
>>>> ---
>>>> man2/fsync.2 | 13 +++++++++++++
>>>> 1 file changed, 13 insertions(+)
>>>>
>>>> diff --git a/man2/fsync.2 b/man2/fsync.2
>>>> index 96401cd..f38b3e4 100644
>>>> --- a/man2/fsync.2
>>>> +++ b/man2/fsync.2
>>>> @@ -186,6 +186,19 @@ In these cases disk caches need to be disabled using
>>>> or
>>>> .BR sdparm (8)
>>>> to guarantee safe operation.
>>>> +
>>>> +When
>>>> +.BR fsync ()
>>>> +or
>>>> +.BR fdatasync ()
>>>> +returns
>>>> +.B EIO
>>>> +or
>>>> +.B ENOSPC
>>>> +any error flags on pages in the file mapping are cleared, so subsequent synchronisation attempts
>>>> +will return without error. It is
>>>> +.I not
>>>> +safe to retry synchronisation and assume that a non-error return means prior writes are now on disk.
>>>> .SH SEE ALSO
>>>> .BR sync (1),
>>>> .BR bdflush (2),
>>
>> So the error state isn't really stored "on pages in the file mapping".
>> Current implementation (since 4.14) is that error state is stored in struct
>> file (I think this tends to be called "file description" in manpages) and
>> so EIO / ENOSPC is reported once for each file description of the file that
>> was open before the error happened. Not sure if we want to be so precise in
>> the manpages or if it just confuses people. Anyway your takeway that no
>> error on subsequent fsync() does not mean data was written is correct.
>>
>>
>
> Thinking about it more, I think we ought to spell this out explicitly as
> we can in the manpage. This is a point of confusion for a lot of people
> and not understanding this can lead to data integrity bugs. Maybe
> something like this in the NOTES section?
>
> '''
> When fsync returns an error, the file is considered to be "clean". A
> subsequent call to fsync will not result in a reattempt to write out the
> data, unless that data has been rewritten. Applications that want to
> reattempt writing to the file after a transient error must re-write
> their data.
> '''

Thanks. It's incredibly helpful when someone with the needed
domain-specific knowledge suggest a wording!

> To be clear:
>
> In practice, you'd only have to write enough to redirty each page in
> most cases.

Presumably, this could be accomplished by write(2)-ing exactly the
same user space buffers again?

So, I'd like to expand your text a little. How would the following
be:

[[
When fsync() or fdatasync() returns an error, the file is considered
to be "clean", even though the corresponding modified ("dirty") buffer
cache pages may not have been flushed to the storage device. A
subsequent call to fsync() will not result in a reattempt to write out
the data, unless that data has in the meantime been rewritten.
Applications that want to reattempt writing to the file after a
transient error--for example, EIO and ENOSPC can occur because of
transient conditions--must rewrite their data.
]]

How is that text?

> Also, it is hard to claim that the above behavior is universally true. A
> filesystem could opt to keep the pages dirty for some errors, but the
> vast majority just toss out the data whenever there is a writeback
> problem.

I think I won't worry about trying to discuss such variations
in the manual page.

BTW, I added a loosely related question in a reply that I just
sent to Jan. Maybe you have some thoughts there also.

Thanks,

Michael


--
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/

2020-09-09 11:43:05

by Jan Kara

[permalink] [raw]
Subject: Re: [PATCH] fsync.2: ERRORS: add EIO and ENOSPC

On Wed 09-09-20 12:52:48, Michael Kerrisk (man-pages) wrote:
> > So the error state isn't really stored "on pages in the file mapping".
> > Current implementation (since 4.14) is that error state is stored in struct
> > file (I think this tends to be called "file description" in manpages) and
>
> (Yes, "open file description" is the POSIX terminology for the thing that
> sits between the FD and the inode--struct file in kernel parlance--and I
> try to follow POSIX terminology in the manual pages where possible.
>
> > so EIO / ENOSPC is reported once for each file description of the file that
> > was open before the error happened. Not sure if we want to be so precise in
> > the manpages or if it just confuses people.
>
> Well, people are confused now, so I think more detail would be good.
>
> > Anyway your takeway that no
> > error on subsequent fsync() does not mean data was written is correct.
>
> Thanks. (See also my rply to Jeff.)
>
> By the way, a question related to your comments above. In the
> errors section, there is this:
>
> EIO An error occurred during synchronization. This error may
> relate to data written to some other file descriptor on the
> * same file. Since Linux 4.13, errors from write-back will
> be reported to all file descriptors that might have written
> the data which triggered the error. Some filesystems
> (e.g., NFS) keep close track of which data came through
> which file descriptor, and give more precise reporting.
> Other filesystems (e.g., most local filesystems) will
> report errors to all file descriptors that were open on the
> * file when the error was recorded.
>
> In the marked (*) lines, we have the word "file". Is this accurate? I mean, I
> would normally take "file" in this context to mean the inode ('struct inode').
> But I wonder if really what is meant here is "open file description"
> ('struct file'). In other words, is the EIO being generated for all FDs
> connected to the same open file description, or for all FDs for all of the
> open file descriptions connected to the inode? Your thoughts?

The error gets reported once for each "open file description" of the file
(inode) where the error happened. If there are multiple file descriptors
pointing to the same open file description, then only one of those file
descriptors will see the error. This is inevitable consequence of kernel
storing the error state in struct file and clearing it once it is
reported...

Honza
--
Jan Kara <[email protected]>
SUSE Labs, CR

Subject: Re: [PATCH] fsync.2: ERRORS: add EIO and ENOSPC

[CC += Neil, since he wrote the text we're talking about]

Hello Jan,

On 9/9/20 1:21 PM, Jan Kara wrote:
> On Wed 09-09-20 12:52:48, Michael Kerrisk (man-pages) wrote:
>>> So the error state isn't really stored "on pages in the file mapping".
>>> Current implementation (since 4.14) is that error state is stored in struct
>>> file (I think this tends to be called "file description" in manpages) and
>>
>> (Yes, "open file description" is the POSIX terminology for the thing that
>> sits between the FD and the inode--struct file in kernel parlance--and I
>> try to follow POSIX terminology in the manual pages where possible.
>>
>>> so EIO / ENOSPC is reported once for each file description of the file that
>>> was open before the error happened. Not sure if we want to be so precise in
>>> the manpages or if it just confuses people.
>>
>> Well, people are confused now, so I think more detail would be good.
>>
>>> Anyway your takeway that no
>>> error on subsequent fsync() does not mean data was written is correct.
>>
>> Thanks. (See also my rply to Jeff.)
>>
>> By the way, a question related to your comments above. In the
>> errors section, there is this:
>>
>> EIO An error occurred during synchronization. This error may
>> relate to data written to some other file descriptor on the
>> * same file. Since Linux 4.13, errors from write-back will
>> be reported to all file descriptors that might have written
>> the data which triggered the error. Some filesystems
>> (e.g., NFS) keep close track of which data came through
>> which file descriptor, and give more precise reporting.
>> Other filesystems (e.g., most local filesystems) will
>> report errors to all file descriptors that were open on the
>> * file when the error was recorded.
>>
>> In the marked (*) lines, we have the word "file". Is this accurate? I mean, I
>> would normally take "file" in this context to mean the inode ('struct inode').
>> But I wonder if really what is meant here is "open file description"
>> ('struct file'). In other words, is the EIO being generated for all FDs
>> connected to the same open file description, or for all FDs for all of the
>> open file descriptions connected to the inode? Your thoughts?
>
> The error gets reported once for each "open file description" of the file
> (inode) where the error happened. If there are multiple file descriptors
> pointing to the same open file description, then only one of those file
> descriptors will see the error. This is inevitable consequence of kernel
> storing the error state in struct file and clearing it once it is
> reported...

So, the text in wrong two respects, I believe:

* It should be phrased in terms of "open file description", not "file",
in the lines that I marked.

* Where it says "to all file descriptors" (twice), it should rather say
"to any of the file descriptors [that refer to the open file description]"

Do you agree?

Thanks,

Michael

--
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/

2020-09-09 15:58:30

by Jan Kara

[permalink] [raw]
Subject: Re: [PATCH] fsync.2: ERRORS: add EIO and ENOSPC

On Wed 09-09-20 13:58:50, Michael Kerrisk (man-pages) wrote:
> [CC += Neil, since he wrote the text we're talking about]
>
> Hello Jan,
>
> On 9/9/20 1:21 PM, Jan Kara wrote:
> > On Wed 09-09-20 12:52:48, Michael Kerrisk (man-pages) wrote:
> >>> So the error state isn't really stored "on pages in the file mapping".
> >>> Current implementation (since 4.14) is that error state is stored in struct
> >>> file (I think this tends to be called "file description" in manpages) and
> >>
> >> (Yes, "open file description" is the POSIX terminology for the thing that
> >> sits between the FD and the inode--struct file in kernel parlance--and I
> >> try to follow POSIX terminology in the manual pages where possible.
> >>
> >>> so EIO / ENOSPC is reported once for each file description of the file that
> >>> was open before the error happened. Not sure if we want to be so precise in
> >>> the manpages or if it just confuses people.
> >>
> >> Well, people are confused now, so I think more detail would be good.
> >>
> >>> Anyway your takeway that no
> >>> error on subsequent fsync() does not mean data was written is correct.
> >>
> >> Thanks. (See also my rply to Jeff.)
> >>
> >> By the way, a question related to your comments above. In the
> >> errors section, there is this:
> >>
> >> EIO An error occurred during synchronization. This error may
> >> relate to data written to some other file descriptor on the
> >> * same file. Since Linux 4.13, errors from write-back will
> >> be reported to all file descriptors that might have written
> >> the data which triggered the error. Some filesystems
> >> (e.g., NFS) keep close track of which data came through
> >> which file descriptor, and give more precise reporting.
> >> Other filesystems (e.g., most local filesystems) will
> >> report errors to all file descriptors that were open on the
> >> * file when the error was recorded.
> >>
> >> In the marked (*) lines, we have the word "file". Is this accurate? I mean, I
> >> would normally take "file" in this context to mean the inode ('struct inode').
> >> But I wonder if really what is meant here is "open file description"
> >> ('struct file'). In other words, is the EIO being generated for all FDs
> >> connected to the same open file description, or for all FDs for all of the
> >> open file descriptions connected to the inode? Your thoughts?
> >
> > The error gets reported once for each "open file description" of the file
> > (inode) where the error happened. If there are multiple file descriptors
> > pointing to the same open file description, then only one of those file
> > descriptors will see the error. This is inevitable consequence of kernel
> > storing the error state in struct file and clearing it once it is
> > reported...
>
> So, the text in wrong two respects, I believe:
>
> * It should be phrased in terms of "open file description", not "file",
> in the lines that I marked.

No, I believe 'file' is correct on these two lines. I guess I wasn't
precise enough in my explanation of the mechanism :) We actually have two
places where we store error state. There's "error counter" in the inode and
then "last seen error" counter in struct file. Whenever "last seen error"
is less than "inode error counter" we report error from the syscall and set
"last seen error" in the used struct file to the current "inode error
counter". So whenever writeback error happens for the inode, all 'struct
file's will end up reporting the error exactly once.

> * Where it says "to all file descriptors" (twice), it should rather say
> "to any of the file descriptors [that refer to the open file description]"

This is correct but I'm not sure it captures well the fact that each open
file description is guaranteed to get a notification. So maybe I'd rephrase
it like "reported to all file descriptors" -> "reported to all open file
descriptions (if there are multiple file descriptors pointing to the same
open file description, the error is reported only to the first call
regardless of which of the descriptors it uses)"

Honza
--
Jan Kara <[email protected]>
SUSE Labs, CR

2020-09-10 02:20:16

by NeilBrown

[permalink] [raw]
Subject: Re: [PATCH] fsync.2: ERRORS: add EIO and ENOSPC

On Tue, Sep 08 2020, Jeff Layton wrote:

>
> Yep.
>
> My only comment is that there is nothing special about EIO and ENOSPC.

There are two type of errors that fsync can return.
EBADF EROFS EINVAL - these are usage errors.
EIO ENOSPC EDQUOT - these are functional failures.

So I would say there *is* something special about those errors, though
it isn't *very* special, and it isn't *just* those errors. EDQUOT should
be included in the list.

NeilBrown


> All errors are the same in this regard. Basically, issuing a new fsync
> after a failed one doesn't do any good. You need to redirty the pages
> first.
> --
> Jeff Layton <[email protected]>


Attachments:
signature.asc (869.00 B)

2020-09-10 02:29:39

by NeilBrown

[permalink] [raw]
Subject: Re: [PATCH] fsync.2: ERRORS: add EIO and ENOSPC

On Tue, Sep 08 2020, Jeff Layton wrote:

> On Tue, 2020-09-08 at 13:27 +0200, Jan Kara wrote:
>> Added Jeff to CC since he has written the code...
>>
>> On Mon 07-09-20 09:11:06, Michael Kerrisk (man-pages) wrote:
>> > [Widening the CC to include Andrew and linux-fsdevel@]
>> > [Milan: thanks for the patch, but it's unclear to me from your commit
>> > message how/if you verified the details.]
>> >
>> > Andrew, maybe you (or someone else) can comment, since long ago your
>> >
>> > commit f79e2abb9bd452d97295f34376dedbec9686b986
>> > Author: Andrew Morton <[email protected]>
>> > Date: Fri Mar 31 02:30:42 2006 -0800
>> >
>> > included a comment that is referred to in stackoverflow discussion
>> > about this topic (that SO discussion is in turn referred to by
>> > https://bugzilla.kernel.org/show_bug.cgi?id=194757).
>> >
>> > The essence as I understand it, is this:
>> > (1) fsync() (and similar) may fail EIO or ENOSPC, at which point data
>> > has not been synced.
>> > (2) In this case, the EIO/ENOSPC setting is cleared so that...
>> > (3) A subsequent fsync() might return success, but...
>> > (4) That doesn't mean that the data in (1) landed on the disk.
>>
>> Correct.
>>
>> > The proposed manual page patch below wants to document this, but I'd
>> > be happy to have an FS-knowledgeable person comment before I apply.
>>
>> Just a small comment below:
>>
>> > On Sat, 29 Aug 2020 at 09:13, <[email protected]> wrote:
>> > > From: Milan Shah <[email protected]>
>> > >
>> > > This Fix addresses Bug 194757.
>> > > Ref: https://bugzilla.kernel.org/show_bug.cgi?id=194757
>> > > ---
>> > > man2/fsync.2 | 13 +++++++++++++
>> > > 1 file changed, 13 insertions(+)
>> > >
>> > > diff --git a/man2/fsync.2 b/man2/fsync.2
>> > > index 96401cd..f38b3e4 100644
>> > > --- a/man2/fsync.2
>> > > +++ b/man2/fsync.2
>> > > @@ -186,6 +186,19 @@ In these cases disk caches need to be disabled using
>> > > or
>> > > .BR sdparm (8)
>> > > to guarantee safe operation.
>> > > +
>> > > +When
>> > > +.BR fsync ()
>> > > +or
>> > > +.BR fdatasync ()
>> > > +returns
>> > > +.B EIO
>> > > +or
>> > > +.B ENOSPC
>> > > +any error flags on pages in the file mapping are cleared, so subsequent synchronisation attempts
>> > > +will return without error. It is
>> > > +.I not
>> > > +safe to retry synchronisation and assume that a non-error return means prior writes are now on disk.
>> > > .SH SEE ALSO
>> > > .BR sync (1),
>> > > .BR bdflush (2),
>>
>> So the error state isn't really stored "on pages in the file mapping".
>> Current implementation (since 4.14) is that error state is stored in struct
>> file (I think this tends to be called "file description" in manpages) and
>> so EIO / ENOSPC is reported once for each file description of the file that
>> was open before the error happened. Not sure if we want to be so precise in
>> the manpages or if it just confuses people. Anyway your takeway that no
>> error on subsequent fsync() does not mean data was written is correct.
>>
>>
>
> Thinking about it more, I think we ought to spell this out explicitly as
> we can in the manpage. This is a point of confusion for a lot of people
> and not understanding this can lead to data integrity bugs. Maybe
> something like this in the NOTES section?
>
> '''
> When fsync returns an error, the file is considered to be "clean". A
> subsequent call to fsync will not result in a reattempt to write out the
> data, unless that data has been rewritten. Applications that want to
> reattempt writing to the file after a transient error must re-write
> their data.
> '''
>
> To be clear:
>
> In practice, you'd only have to write enough to redirty each page in
> most cases.

Nonononono. In practice you have to repeat the entire write because you
cannot know if the cached page is from before the write failure, or has
since been flushed and reloaded.

>
> Also, it is hard to claim that the above behavior is universally true. A
> filesystem could opt to keep the pages dirty for some errors, but the
> vast majority just toss out the data whenever there is a writeback
> problem.

...and any filesystem that doesn't behave that way is wasting effort,
because nothing else can be assumed.

Regarding your "NOTES" addition, I don't feel comfortable with the
"clean" language. I would prefer something like:

When fsync() reports a failure (EIO, ENOSPC, EDQUOT) it must be assumed
that any write requests initiated since the previous successful fsync
was initiated may have failed, and that any cached data may have been
lost. A future fsync() will not attempt to write out the same data
again. If recovery is possible and desired, the application must
repeat all the writes that may have failed.

If the regions of a file that were written to prior to a failed fsync()
are read, the content reported may not reflect the stored content, and
subsequent reads may revert to the stored content at any time.

NeilBrown


>
>
> --
> Jeff Layton <[email protected]>


Attachments:
signature.asc (869.00 B)

2020-09-10 17:47:54

by Jeff Layton

[permalink] [raw]
Subject: Re: [PATCH] fsync.2: ERRORS: add EIO and ENOSPC

On Thu, 2020-09-10 at 09:04 +1000, NeilBrown wrote:
> On Tue, Sep 08 2020, Jeff Layton wrote:
>
> > On Tue, 2020-09-08 at 13:27 +0200, Jan Kara wrote:
> > > Added Jeff to CC since he has written the code...
> > >
> > > On Mon 07-09-20 09:11:06, Michael Kerrisk (man-pages) wrote:
> > > > [Widening the CC to include Andrew and linux-fsdevel@]
> > > > [Milan: thanks for the patch, but it's unclear to me from your commit
> > > > message how/if you verified the details.]
> > > >
> > > > Andrew, maybe you (or someone else) can comment, since long ago your
> > > >
> > > > commit f79e2abb9bd452d97295f34376dedbec9686b986
> > > > Author: Andrew Morton <[email protected]>
> > > > Date: Fri Mar 31 02:30:42 2006 -0800
> > > >
> > > > included a comment that is referred to in stackoverflow discussion
> > > > about this topic (that SO discussion is in turn referred to by
> > > > https://bugzilla.kernel.org/show_bug.cgi?id=194757).
> > > >
> > > > The essence as I understand it, is this:
> > > > (1) fsync() (and similar) may fail EIO or ENOSPC, at which point data
> > > > has not been synced.
> > > > (2) In this case, the EIO/ENOSPC setting is cleared so that...
> > > > (3) A subsequent fsync() might return success, but...
> > > > (4) That doesn't mean that the data in (1) landed on the disk.
> > >
> > > Correct.
> > >
> > > > The proposed manual page patch below wants to document this, but I'd
> > > > be happy to have an FS-knowledgeable person comment before I apply.
> > >
> > > Just a small comment below:
> > >
> > > > On Sat, 29 Aug 2020 at 09:13, <[email protected]> wrote:
> > > > > From: Milan Shah <[email protected]>
> > > > >
> > > > > This Fix addresses Bug 194757.
> > > > > Ref: https://bugzilla.kernel.org/show_bug.cgi?id=194757
> > > > > ---
> > > > > man2/fsync.2 | 13 +++++++++++++
> > > > > 1 file changed, 13 insertions(+)
> > > > >
> > > > > diff --git a/man2/fsync.2 b/man2/fsync.2
> > > > > index 96401cd..f38b3e4 100644
> > > > > --- a/man2/fsync.2
> > > > > +++ b/man2/fsync.2
> > > > > @@ -186,6 +186,19 @@ In these cases disk caches need to be disabled using
> > > > > or
> > > > > .BR sdparm (8)
> > > > > to guarantee safe operation.
> > > > > +
> > > > > +When
> > > > > +.BR fsync ()
> > > > > +or
> > > > > +.BR fdatasync ()
> > > > > +returns
> > > > > +.B EIO
> > > > > +or
> > > > > +.B ENOSPC
> > > > > +any error flags on pages in the file mapping are cleared, so subsequent synchronisation attempts
> > > > > +will return without error. It is
> > > > > +.I not
> > > > > +safe to retry synchronisation and assume that a non-error return means prior writes are now on disk.
> > > > > .SH SEE ALSO
> > > > > .BR sync (1),
> > > > > .BR bdflush (2),
> > >
> > > So the error state isn't really stored "on pages in the file mapping".
> > > Current implementation (since 4.14) is that error state is stored in struct
> > > file (I think this tends to be called "file description" in manpages) and
> > > so EIO / ENOSPC is reported once for each file description of the file that
> > > was open before the error happened. Not sure if we want to be so precise in
> > > the manpages or if it just confuses people. Anyway your takeway that no
> > > error on subsequent fsync() does not mean data was written is correct.
> > >
> > >
> >
> > Thinking about it more, I think we ought to spell this out explicitly as
> > we can in the manpage. This is a point of confusion for a lot of people
> > and not understanding this can lead to data integrity bugs. Maybe
> > something like this in the NOTES section?
> >
> > '''
> > When fsync returns an error, the file is considered to be "clean". A
> > subsequent call to fsync will not result in a reattempt to write out the
> > data, unless that data has been rewritten. Applications that want to
> > reattempt writing to the file after a transient error must re-write
> > their data.
> > '''
> >
> > To be clear:
> >
> > In practice, you'd only have to write enough to redirty each page in
> > most cases.
>
> Nonononono. In practice you have to repeat the entire write because you
> cannot know if the cached page is from before the write failure, or has
> since been flushed and reloaded.
>

Oh, good point! There's no way for userland to know that, so you really
do have to rewrite the whole thing.

> > Also, it is hard to claim that the above behavior is universally true. A
> > filesystem could opt to keep the pages dirty for some errors, but the
> > vast majority just toss out the data whenever there is a writeback
> > problem.
>
> ...and any filesystem that doesn't behave that way is wasting effort,
> because nothing else can be assumed.
>

Yeah. I only made the point to be pedantic. There's no benefit to
documenting that, I think...

> Regarding your "NOTES" addition, I don't feel comfortable with the
> "clean" language. I would prefer something like:
>
> When fsync() reports a failure (EIO, ENOSPC, EDQUOT) it must be assumed
> that any write requests initiated since the previous successful fsync
> was initiated may have failed, and that any cached data may have been
> lost. A future fsync() will not attempt to write out the same data
> again. If recovery is possible and desired, the application must
> repeat all the writes that may have failed.
>
> If the regions of a file that were written to prior to a failed fsync()
> are read, the content reported may not reflect the stored content, and
> subsequent reads may revert to the stored content at any time.
>

Much nicer.

Should we make a distinction between usage and functional classes of
errors in this? The "usage" errors will probably not result in the pages
being tossed out, but the functional ones almost certainly will...

--
Jeff Layton <[email protected]>

2020-09-16 23:27:16

by NeilBrown

[permalink] [raw]
Subject: Re: [PATCH] fsync.2: ERRORS: add EIO and ENOSPC

On Thu, Sep 10 2020, Jeff Layton wrote:
>
>> Regarding your "NOTES" addition, I don't feel comfortable with the
>> "clean" language. I would prefer something like:
>>
>> When fsync() reports a failure (EIO, ENOSPC, EDQUOT) it must be assumed
>> that any write requests initiated since the previous successful fsync
>> was initiated may have failed, and that any cached data may have been
>> lost. A future fsync() will not attempt to write out the same data
>> again. If recovery is possible and desired, the application must
>> repeat all the writes that may have failed.
>>
>> If the regions of a file that were written to prior to a failed fsync()
>> are read, the content reported may not reflect the stored content, and
>> subsequent reads may revert to the stored content at any time.
>>
>
> Much nicer.

I guess someone should turn it into a patch....

>
> Should we make a distinction between usage and functional classes of
> errors in this? The "usage" errors will probably not result in the pages
> being tossed out, but the functional ones almost certainly will...

Maybe. I think it is a useful distinction, but to be consistent it
would be best to make it in all (section 2) man pages. Maybe not all at
once though. It is really up to Michael if that is a direction he is
interesting in following.

NeilBrown


>
> --
> Jeff Layton <[email protected]>


Attachments:
signature.asc (869.00 B)
Subject: Re: [PATCH] fsync.2: ERRORS: add EIO and ENOSPC

On 9/17/20 1:25 AM, NeilBrown wrote:
> On Thu, Sep 10 2020, Jeff Layton wrote:
>>
>>> Regarding your "NOTES" addition, I don't feel comfortable with the
>>> "clean" language. I would prefer something like:
>>>
>>> When fsync() reports a failure (EIO, ENOSPC, EDQUOT) it must be assumed
>>> that any write requests initiated since the previous successful fsync
>>> was initiated may have failed, and that any cached data may have been
>>> lost. A future fsync() will not attempt to write out the same data
>>> again. If recovery is possible and desired, the application must
>>> repeat all the writes that may have failed.
>>>
>>> If the regions of a file that were written to prior to a failed fsync()
>>> are read, the content reported may not reflect the stored content, and
>>> subsequent reads may revert to the stored content at any time.
>>>
>>
>> Much nicer.
>
> I guess someone should turn it into a patch....

That woud be great.

>> Should we make a distinction between usage and functional classes of
>> errors in this? The "usage" errors will probably not result in the pages
>> being tossed out, but the functional ones almost certainly will...
>
> Maybe. I think it is a useful distinction, but to be consistent it
> would be best to make it in all (section 2) man pages. Maybe not all at
> once though. It is really up to Michael if that is a direction he is
> interesting in following.

I think it's useful, and I'd accept patches that make such
distinctions. Of course, if we said *everything* should get fixed
at the same time, nothing would get fixed :-). So, I think
I'd just take individual patches that made such changes on an
ad hoc basis.

Thanks,

Michael

--
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/