2018-11-05 15:19:31

by Gregory Shapiro

[permalink] [raw]
Subject: Re: BUG: aio/direct-io data corruption in 4.7

Hello, my name is Gregory Shapiro and I am a newbie on this list.
I recently encountered data corruption: the kernel acknowledged a write
(the "io_getevents" system call returned the correct number of bytes),
but the underlying write to disk had failed.
After investigating the problem I found it is identical to the issue in
direct-io.c discussed in the thread below:
https://lore.kernel.org/lkml/[email protected]/
Is there a reason the proposed patch was never applied to the kernel?
When can I expect it to be applied?
Thanks,
Gregory


2018-11-06 07:29:39

by Jack Wang

[permalink] [raw]
Subject: Re: BUG: aio/direct-io data corruption in 4.7

Gregory Shapiro <[email protected]> wrote on Mon, Nov 5, 2018 at 4:19 PM:
>
> Hello, my name is Gregory Shapiro and I am a newbie on this list.
> I recently encountered data corruption: the kernel acknowledged a write
> (the "io_getevents" system call returned the correct number of bytes),
> but the underlying write to disk had failed.
> After investigating the problem I found it is identical to the issue in
> direct-io.c discussed in the thread below:
> https://lore.kernel.org/lkml/[email protected]/
> Is there a reason the proposed patch was never applied to the kernel?
> When can I expect it to be applied?
> Thanks,
> Gregory

Hi Gregory,

Thanks for the info.
Have you tried a kernel more recent than 4.7? Is the problem still there?

Could you share your test case?

Regards,
Jack Wang

2018-11-06 11:33:13

by Gregory Shapiro

[permalink] [raw]
Subject: Re: BUG: aio/direct-io data corruption in 4.7

Hi Jack,
I tested 4.9.102, and I also checked the latest code on elixir
(versions 4.19 and 4.20); the error is still present there.
More on the scenario and the bug:
I experienced data corruption in my application (NVMe-based storage).
The corruption itself was caused by faulty hardware, but the real
problem is that io_getevents returned the correct number of bytes, so
the application could not recognize the error.
Looking at /var/log/messages, I saw the following errors at the time of
the corruption:

Oct 11 14:55:15 block01-node05 kernel: [19272.951015]
blk_update_request: I/O error, dev nvme2n3, sector 117359360
Oct 11 14:55:15 block01-node05 kernel: [19272.952786]
blk_update_request: I/O error, dev nvme2n3, sector 117359872
Oct 11 14:55:16 block01-node05 kernel: [19273.544374]
blk_update_request: I/O error, dev nvme2n3, sector 117360384
...
So the block level does receive information about the error, but I
don't see it in the application.
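The application uses Linux native AIO (io_submit/io_getevents) with
O_DIRECT. The following is only a stripped-down sketch of the pattern,
using libaio; the device path, buffer size and offset are made up, and
error handling is reduced to the checks that matter here:

/* Sketch only -- not the real application. Build with: gcc dio_write.c -laio */
#define _GNU_SOURCE             /* for O_DIRECT */
#include <fcntl.h>
#include <libaio.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define BUF_SIZE (1024 * 1024)  /* multiple of the device's logical block size */

int main(void)
{
	io_context_t ctx = 0;
	struct iocb cb, *cbs[1] = { &cb };
	struct io_event ev;
	void *buf;
	int fd, ret;

	/* O_DIRECT so the write goes through fs/direct-io.c */
	fd = open("/dev/nvme2n3", O_WRONLY | O_DIRECT);
	if (fd < 0) {
		perror("open");
		return 1;
	}

	/* O_DIRECT requires an aligned buffer */
	if (posix_memalign(&buf, 4096, BUF_SIZE)) {
		fprintf(stderr, "posix_memalign failed\n");
		return 1;
	}
	memset(buf, 0xab, BUF_SIZE);

	ret = io_setup(1, &ctx);
	if (ret < 0) {
		fprintf(stderr, "io_setup: %s\n", strerror(-ret));
		return 1;
	}

	/* submit one 1 MiB write at offset 0 and wait for its completion */
	io_prep_pwrite(&cb, fd, buf, BUF_SIZE, 0);
	ret = io_submit(ctx, 1, cbs);
	if (ret != 1) {
		fprintf(stderr, "io_submit: %s\n", strerror(-ret));
		return 1;
	}

	ret = io_getevents(ctx, 1, 1, &ev, NULL);
	if (ret != 1) {
		fprintf(stderr, "io_getevents: %s\n", strerror(-ret));
		return 1;
	}

	/*
	 * ev.res is where a failed write should show up as a negative errno
	 * or a short byte count. In the scenario above, ev.res == BUF_SIZE
	 * even though the block layer logged I/O errors for the request.
	 */
	if ((long)ev.res != BUF_SIZE)
		fprintf(stderr, "write failed: res=%ld res2=%ld\n",
			(long)ev.res, (long)ev.res2);
	else
		printf("kernel reported success: res=%ld\n", (long)ev.res);

	io_destroy(ctx);
	free(buf);
	return 0;
}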
Running ftrace and reading the code, I found that the dio error status
gets overridden. In dio_complete() the error is propagated via
dio->io_error, but even when dio->io_error is non-zero, in the async
write path the status is overridden by the transferred byte count
(through the generic_write_sync() call):

static ssize_t dio_complete(struct dio *dio, ssize_t ret, bool is_async)
{
...
	/* a pending error (page_errors / io_error) is picked up here */
	if (ret == 0)
		ret = dio->page_errors;
	if (ret == 0)
		ret = dio->io_error;
	if (ret == 0)
		ret = transferred;
...
	if (is_async) {
		/*
		 * generic_write_sync expects ki_pos to have been updated
		 * already, but the submission path only does this for
		 * synchronous I/O.
		 */
		dio->iocb->ki_pos += transferred;

		/*
		 * Problem: for an async write, a negative ret (the I/O error
		 * picked up above) is overwritten here, because
		 * generic_write_sync() returns the transferred byte count
		 * when the sync itself does not fail.
		 */
		if (dio->op == REQ_OP_WRITE)
			ret = generic_write_sync(dio->iocb, transferred);
		dio->iocb->ki_complete(dio->iocb, ret, 0);
	}
...
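As far as I can tell, the kind of change that fixes this (a sketch only,
not necessarily the exact patch that was proposed) is to skip
generic_write_sync() when ret is already negative and to pass ret
instead of transferred:

	if (is_async) {
		dio->iocb->ki_pos += transferred;

		/*
		 * Sketch: only let generic_write_sync() override ret when the
		 * I/O actually succeeded, so a negative ret such as
		 * dio->io_error reaches ki_complete() and, eventually,
		 * io_getevents() in userspace.
		 */
		if (ret > 0 && dio->op == REQ_OP_WRITE)
			ret = generic_write_sync(dio->iocb, ret);
		dio->iocb->ki_complete(dio->iocb, ret, 0);
	}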



To make it easier to follow the flow through the code, I am attaching
the ftrace log:


26) | nvme_complete_rq [nvme_core]() {
26) | blk_mq_end_request() {
26) | blk_update_request() { <---- log is from here
26) 0.563 us | blk_account_io_completion();
26) 0.263 us | bio_advance();
26) | bio_endio() {
26) | dio_bio_end_aio() {
26) | dio_bio_complete() {
26) | bio_check_pages_dirty() {
26) | bio_put() {
26) | bio_free() {
26) | __bio_free() {
26) 0.045 us | bio_disassociate_task();
26) 0.497 us | }
26) 0.042 us | bvec_free();
26) | mempool_free() {
26) | mempool_free_slab() {
26) 0.264 us | kmem_cache_free();
26) 0.606 us | }
26) 1.125 us | }
26) 2.588 us | }
26) 2.920 us | }
26) 3.979 us | }
26) 4.712 us | }
26) 0.040 us | _raw_spin_lock_irqsave();
26) 0.048 us | _raw_spin_unlock_irqrestore();
26) | dio_complete() { <---- dio_complete(dio, 0, true)
26) | aio_complete() { <---- via dio->iocb->ki_complete(dio->iocb, ret, 0); aio_complete(struct kiocb *kiocb, long res, long res2)
26) 0.073 us | _raw_spin_lock_irqsave();
26) 0.114 us | refill_reqs_available();
26) 0.048 us | _raw_spin_unlock_irqrestore();
26) | kiocb_free() {
26) 0.171 us | fput();
26) 0.102 us | kmem_cache_free();
26) 0.902 us | }


}

On Tue, Nov 6, 2018 at 9:29 AM Jack Wang <[email protected]> wrote:
>
> Gregory Shapiro <[email protected]> wrote on Mon, Nov 5, 2018 at 4:19 PM:
> >
> > Hello, my name is Gregory Shapiro and I am a newbie on this list.
> > I recently encountered data corruption: the kernel acknowledged a write
> > (the "io_getevents" system call returned the correct number of bytes),
> > but the underlying write to disk had failed.
> > After investigating the problem I found it is identical to the issue in
> > direct-io.c discussed in the thread below:
> > https://lore.kernel.org/lkml/[email protected]/
> > Is there a reason the proposed patch was never applied to the kernel?
> > When can I expect it to be applied?
> > Thanks,
> > Gregory
>
> Hi Gregory,
>
> Thanks for the info.
> Have you tried a kernel more recent than 4.7? Is the problem still there?
>
> Could you share your test case?
>
> Regards,
> Jack Wang

2018-11-09 15:48:46

by Jack Wang

[permalink] [raw]
Subject: Re: BUG: aio/direct-io data corruption in 4.7

Gregory Shapiro <[email protected]> wrote on Tue, Nov 6, 2018 at 12:31 PM:
>
> Hi Jack,
> I tested 4.9.102, and I also checked the latest code on elixir
> (versions 4.19 and 4.20); the error is still present there.
> More on the scenario and the bug:
> I experienced data corruption in my application (NVMe-based storage).
> The corruption itself was caused by faulty hardware, but the real
> problem is that io_getevents returned the correct number of bytes, so
> the application could not recognize the error.
> Looking at /var/log/messages, I saw the following errors at the time of
> the corruption:
Thanks for the info, Gregory.

I noticed that people from Amazon are pushing a fix upstream:
https://lore.kernel.org/patchwork/patch/1008443/
I hope it will be merged soon.


Regards,
Jack Wang