2018-10-30 22:00:09

by Maximilian Heyne

Subject: [PATCH] fs: fix lost error code in dio_complete

commit e259221763a40403d5bb232209998e8c45804ab8 ("fs: simplify the
generic_write_sync prototype") reworked callers of generic_write_sync(),
and ended up dropping the error return for the directio path. Prior to
that commit, in dio_complete(), an error would be bubbled up the stack,
but after that commit, errors passed on to dio_complete were eaten up.

This was reported on the list earlier, and a fix was proposed in
https://lore.kernel.org/lkml/[email protected]/, but
it was never followed up on. We recently hit this bug in our testing, where
fencing I/O errors that previously failed with EIO were being reported
as successful operations after this commit.

The fix proposed on the list earlier fell a little short -- it would
still have called generic_write_sync() even if `ret` already contained
an error. This fix ensures generic_write_sync() is only called when
there's no pending error in the write.

CC: [email protected]
Reported-by: Ravi Nankani <[email protected]>
Signed-off-by: Maximilian Heyne <[email protected]>
Signed-off-by: Torsten Mehlan <[email protected]>
Signed-off-by: Uwe Dannowski <[email protected]>
Signed-off-by: Amit Shah <[email protected]>
Signed-off-by: David Woodhouse <[email protected]>
---
fs/direct-io.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/direct-io.c b/fs/direct-io.c
index 093fb54cd316..199146036093 100644
--- a/fs/direct-io.c
+++ b/fs/direct-io.c
@@ -325,8 +325,8 @@ static ssize_t dio_complete(struct dio *dio, ssize_t ret, unsigned int flags)
 		 */
 		dio->iocb->ki_pos += transferred;

-		if (dio->op == REQ_OP_WRITE)
-			ret = generic_write_sync(dio->iocb, transferred);
+		if (ret > 0 && dio->op == REQ_OP_WRITE)
+			ret = generic_write_sync(dio->iocb, ret);
 		dio->iocb->ki_complete(dio->iocb, ret, 0);
 	}

--
2.16.2

Amazon Development Center Germany GmbH
Berlin - Dresden - Aachen
main office: Krausenstr. 38, 10117 Berlin
Geschaeftsfuehrer: Dr. Ralf Herbrich, Christian Schlaeger
Ust-ID: DE289237879
Eingetragen am Amtsgericht Charlottenburg HRB 149173 B



2018-10-31 05:47:29

by Christoph Hellwig

Subject: Re: [PATCH] fs: fix lost error code in dio_complete

Looks good,

Reviewed-by: Christoph Hellwig <[email protected]>

2018-10-31 09:25:26

by Shah, Amit

Subject: Re: [PATCH] fs: fix lost error code in dio_complete

On Tue, 2018-10-30 at 21:57 +0000, Maximilian Heyne wrote:
> commit e259221763a40403d5bb232209998e8c45804ab8 ("fs: simplify the
> generic_write_sync prototype") reworked callers of generic_write_sync(),
> and ended up dropping the error return for the directio path. Prior to
> that commit, in dio_complete(), an error would be bubbled up the stack,
> but after that commit, errors passed on to dio_complete were eaten up.
>
> This was reported on the list earlier, and a fix was proposed in
> https://lore.kernel.org/lkml/[email protected]/, but
> never followed up with.  We recently hit this bug in our testing where
> fencing io errors, which were previously erroring out with EIO, were
> being returned as success operations after this commit.
>
> The fix proposed on the list earlier was a little short -- it would have
> still called generic_write_sync() in case `ret` already contained an
> error.  This fix ensures generic_write_sync() is only called when
> there's no pending error in the write.
>
> CC: [email protected]
> Reported-by: Ravi Nankani <[email protected]>
> Signed-off-by: Maximilian Heyne <[email protected]>
> Signed-off-by: Torsten Mehlan <[email protected]>
> Signed-off-by: Uwe Dannowski <[email protected]>
> Signed-off-by: Amit Shah <[email protected]>
> Signed-off-by: David Woodhouse <[email protected]>
> ---
>  fs/direct-io.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/fs/direct-io.c b/fs/direct-io.c
> index 093fb54cd316..199146036093 100644
> --- a/fs/direct-io.c
> +++ b/fs/direct-io.c
> @@ -325,8 +325,8 @@ static ssize_t dio_complete(struct dio *dio, ssize_t ret, unsigned int flags)
>    */
>   dio->iocb->ki_pos += transferred;
>  
> - if (dio->op == REQ_OP_WRITE)
> - ret = generic_write_sync(dio->iocb,  transferred);
> + if (ret > 0 && dio->op == REQ_OP_WRITE)
> + ret = generic_write_sync(dio->iocb, ret);

Is the s/transferred/ret/ change necessary?  Needs explaining, at least.

>   dio->iocb->ki_complete(dio->iocb, ret, 0);
>   }
>  

Thanks,



Amit

2018-11-01 08:16:49

by Maximilian Heyne

Subject: Re: [PATCH] fs: fix lost error code in dio_complete

On 10/31/18 10:24 AM, Shah, Amit wrote:
> On Tue, 2018-10-30 at 21:57 +0000, Maximilian Heyne wrote:
>> [...]
>>
>> diff --git a/fs/direct-io.c b/fs/direct-io.c
>> index 093fb54cd316..199146036093 100644
>> --- a/fs/direct-io.c
>> +++ b/fs/direct-io.c
>> @@ -325,8 +325,8 @@ static ssize_t dio_complete(struct dio *dio, ssize_t ret, unsigned int flags)
>>    */
>>   dio->iocb->ki_pos += transferred;
>>
>> - if (dio->op == REQ_OP_WRITE)
>> - ret = generic_write_sync(dio->iocb,  transferred);
>> + if (ret > 0 && dio->op == REQ_OP_WRITE)
>> + ret = generic_write_sync(dio->iocb, ret);
> Is the s/transferred/ret/ change necessary?  Needs explaining, at least.

On an earlier line of the function, `ret` is set to `transferred`, so the
change is a no-op. In my opinion, though, the construct looks cleaner this way.

>>   dio->iocb->ki_complete(dio->iocb, ret, 0);
>>   }
>>
> Thanks,
>
>
>
> Amit




2018-11-01 09:06:54

by Shah, Amit

Subject: Re: [PATCH] fs: fix lost error code in dio_complete


On Thu, 2018-11-01 at 09:03 +0100, Maximilian Heyne wrote:
> On 10/31/18 10:24 AM, Shah, Amit wrote:
> >
> > On Tue, 2018-10-30 at 21:57 +0000, Maximilian Heyne wrote:
> > >
> > > [...]
> > >
> > > diff --git a/fs/direct-io.c b/fs/direct-io.c
> > > index 093fb54cd316..199146036093 100644
> > > --- a/fs/direct-io.c
> > > +++ b/fs/direct-io.c
> > > @@ -325,8 +325,8 @@ static ssize_t dio_complete(struct dio *dio, ssize_t ret, unsigned int flags)
> > >     */
> > >    dio->iocb->ki_pos += transferred;
> > >   
> > > - if (dio->op == REQ_OP_WRITE)
> > > - ret = generic_write_sync(dio->iocb,  transferred);
> > > + if (ret > 0 && dio->op == REQ_OP_WRITE)
> > > + ret = generic_write_sync(dio->iocb, ret);
> > Is the s/transferred/ret/ change necessary?  Needs explaining, at least.
> In an above code line `ret` is set to `transferred`. So the change is
> a no op. However, in my opinion the construct then looks cleaner.

Yes, it also brings this in line with the other callers, so this is good, thanks.



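For readers who want to see the effect of the change outside the kernel, here is a minimal userspace sketch of the logic discussed in this thread. It is not the kernel code: model_write_sync(), complete_before_fix() and complete_after_fix() are hypothetical names made up for illustration, the sync stand-in is assumed never to fail, and the dio->op check and the ki_complete() callback are left out. The only point is how an error already held in `ret` is treated before and after the patch.

```c
#include <errno.h>
#include <sys/types.h>

/*
 * Hypothetical stand-in for generic_write_sync(). Assumption: in this
 * model it never fails and just hands back the byte count it was given.
 */
static ssize_t model_write_sync(ssize_t count)
{
	return count;
}

/* Pre-patch logic: the sync result unconditionally overwrites ret. */
static ssize_t complete_before_fix(ssize_t ret, ssize_t transferred)
{
	if (ret == 0)
		ret = transferred;

	/* Any error held in ret is lost here. */
	ret = model_write_sync(transferred);
	return ret;
}

/* Post-patch logic: sync only when ret holds a positive byte count. */
static ssize_t complete_after_fix(ssize_t ret, ssize_t transferred)
{
	if (ret == 0)
		ret = transferred;

	/*
	 * ret > 0 implies ret == transferred in this model, so passing
	 * ret instead of transferred does not change behaviour.
	 */
	if (ret > 0)
		ret = model_write_sync(ret);
	return ret;
}
```

In this model, complete_before_fix(-EIO, 0) returns 0, so the caller sees success, while complete_after_fix(-EIO, 0) returns -EIO. For a write without errors, both variants return the number of bytes transferred.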