In case of a block device error, written parameter in iomap_end()
is zero as opposed to the amount of submitted I/O.
Filesystems such as btrfs need to account for the I/O in ordered
extents, even if it resulted in an error. Having (incomplete)
submitted bytes in written gives the filesystem the amount of data
which has been submitted before the error occurred, and the
filesystem code can choose how to use it.
The final returned error for iomap_dio_rw() is set by
iomap_dio_complete().
Partial writes in direct I/O are considered an error. So,
->iomap_end() using written == 0 as error must be changed
to written < length. In this case, ext4 is the only user.
Signed-off-by: Goldwyn Rodrigues <[email protected]>
---
fs/ext4/inode.c | 2 +-
fs/iomap/direct-io.c | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index e60aca791d3f..e50e7414351a 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -3475,7 +3475,7 @@ static int ext4_iomap_end(struct inode *inode, loff_t offset, loff_t length,
* the I/O. Any blocks that may have been allocated in preparation for
* the direct I/O will be reused during buffered I/O.
*/
- if (flags & (IOMAP_WRITE | IOMAP_DIRECT) && written == 0)
+ if (flags & (IOMAP_WRITE | IOMAP_DIRECT) && written < length)
return -ENOTBLK;
return 0;
diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
index 41c1e7c20a1f..01865db1bd09 100644
--- a/fs/iomap/direct-io.c
+++ b/fs/iomap/direct-io.c
@@ -264,7 +264,7 @@ iomap_dio_bio_actor(struct inode *inode, loff_t pos, loff_t length,
size_t n;
if (dio->error) {
iov_iter_revert(dio->submit.iter, copied);
- copied = ret = 0;
+ ret = 0;
goto out;
}
--
2.25.0
--
Goldwyn
On 2020/02/26 5:53, Christoph Hellwig wrote:
> On Fri, Feb 21, 2020 at 10:21:04AM +0530, Ritesh Harjani wrote:
>>> if (dio->error) {
>>> iov_iter_revert(dio->submit.iter, copied);
>>> - copied = ret = 0;
>>> + ret = 0;
>>> goto out;
>>> }
>>
>> But if I am seeing this correctly, even after there was a dio->error
>> if you return copied > 0, then the loop in iomap_dio_rw will continue
>> for next iteration as well. Until the second time it won't copy
>> anything since dio->error is set and from there I guess it may return
>> 0 which will break the loop.
>
> In addition to that copied is also iov_iter_reexpand call. We don't
> really need the re-expand in case of errors, and in fact we also
> have the iov_iter_revert call before jumping out, so this will
> need a little bit more of an audit and properly documented in the
> commit log.
>
>>
>> Is this the correct flow? Shouldn't the while loop doing
>> iomap_apply in iomap_dio_rw should also break in case of
>> dio->error? Or did I miss anything?
>
> We'd need something there iff we care about a good number of written
> in case of the error. Goldwyn, can you explain what you need this
> number for in btrfs? Maybe with a pointer to the current code base?
Not sure about btrfs, but for zonefs, getting the partial I/O count done for a
failed large dio would also be useful to avoid having to do the error recovery
dance with report zones for getting the current zone write pointer.
--
Damien Le Moal
Western Digital Research