2019-12-02 10:26:29

by kungf

Subject: [PATCH] bcache: add REQ_FUA to avoid data lost in writeback mode

Data may be lost in writeback mode in the following scenario:
1. client writes data1 to bcache
2. client calls fdatasync
3. bcache flushes the cache set and the backing device
If data1 has not been written back to the backing device at this point, it is only guaranteed to be safe in the cache.
4. the cache then writes data1 back to the backing device with only REQ_OP_WRITE
So data1 is not guaranteed to be in non-volatile storage; it may be lost on power interruption.
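For reference, steps 1 and 2 above look like the following userspace sketch (the device path and I/O size are examples only):

```c
/* Minimal sketch of steps 1-2 above; /dev/bcache0 and the 4 KiB size
 * are examples only.  After fdatasync() returns, the client assumes
 * data1 is durable across a power failure. */
#define _GNU_SOURCE
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
	static char buf[4096] __attribute__((aligned(4096)));
	int fd = open("/dev/bcache0", O_WRONLY | O_DIRECT);

	if (fd < 0)
		return 1;
	memset(buf, 0xab, sizeof(buf));		/* step 1: write data1 */
	if (pwrite(fd, buf, sizeof(buf), 0) != (ssize_t)sizeof(buf))
		return 1;
	if (fdatasync(fd) != 0)			/* step 2: demand durability */
		return 1;
	return close(fd);
}
```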

Signed-off-by: kungf <[email protected]>
---
 drivers/md/bcache/writeback.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/md/bcache/writeback.c b/drivers/md/bcache/writeback.c
index 4a40f9eadeaf..e5cecb60569e 100644
--- a/drivers/md/bcache/writeback.c
+++ b/drivers/md/bcache/writeback.c
@@ -357,7 +357,7 @@ static void write_dirty(struct closure *cl)
 	 */
 	if (KEY_DIRTY(&w->key)) {
 		dirty_init(w);
-		bio_set_op_attrs(&io->bio, REQ_OP_WRITE, 0);
+		bio_set_op_attrs(&io->bio, REQ_OP_WRITE | REQ_FUA, 0);
 		io->bio.bi_iter.bi_sector = KEY_START(&w->key);
 		bio_set_dev(&io->bio, io->dc->bdev);
 		io->bio.bi_end_io	= dirty_endio;
--
2.17.1


2019-12-02 11:11:44

by Coly Li

Subject: Re: [PATCH] bcache: add REQ_FUA to avoid data lost in writeback mode

On 2019/12/2 6:24 PM, kungf wrote:
> Data may be lost in writeback mode in the following scenario:
> 1. client writes data1 to bcache
> 2. client calls fdatasync
> 3. bcache flushes the cache set and the backing device
> If data1 has not been written back to the backing device at this point, it is only guaranteed to be safe in the cache.
> 4. the cache then writes data1 back to the backing device with only REQ_OP_WRITE
> So data1 is not guaranteed to be in non-volatile storage; it may be lost on power interruption.
>

Hi,

Did you encounter this problem with a real workload? With the bcache
journal, I don't see how data could be lost in the scenario you describe.

Correct me if I am wrong.

Coly Li

> Signed-off-by: kungf <[email protected]>
> ---
>  drivers/md/bcache/writeback.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/md/bcache/writeback.c b/drivers/md/bcache/writeback.c
> index 4a40f9eadeaf..e5cecb60569e 100644
> --- a/drivers/md/bcache/writeback.c
> +++ b/drivers/md/bcache/writeback.c
> @@ -357,7 +357,7 @@ static void write_dirty(struct closure *cl)
>  	 */
>  	if (KEY_DIRTY(&w->key)) {
>  		dirty_init(w);
> -		bio_set_op_attrs(&io->bio, REQ_OP_WRITE, 0);
> +		bio_set_op_attrs(&io->bio, REQ_OP_WRITE | REQ_FUA, 0);
>  		io->bio.bi_iter.bi_sector = KEY_START(&w->key);
>  		bio_set_dev(&io->bio, io->dc->bdev);
>  		io->bio.bi_end_io	= dirty_endio;
>

2019-12-02 19:42:36

by Eric Wheeler

Subject: Re: [PATCH] bcache: add REQ_FUA to avoid data lost in writeback mode

On Mon, 2 Dec 2019, Coly Li wrote:
> On 2019/12/2 6:24 PM, kungf wrote:
> > Data may be lost in writeback mode in the following scenario:
> > 1. client writes data1 to bcache
> > 2. client calls fdatasync
> > 3. bcache flushes the cache set and the backing device
> > If data1 has not been written back to the backing device at this point, it is only guaranteed to be safe in the cache.
> > 4. the cache then writes data1 back to the backing device with only REQ_OP_WRITE
> > So data1 is not guaranteed to be in non-volatile storage; it may be lost on power interruption.
> >
>
> Hi,
>
> Did you encounter this problem with a real workload? With the bcache
> journal, I don't see how data could be lost in the scenario you describe.
>
> Correct me if I am wrong.
>
> Coly Li

If this does become necessary, then we should have a sysfs or superblock
flag to disable FUA for those whose RAID controllers have battery-backed units (BBUs).
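Something like the following sketch is what I have in mind. Note that writeback_fua is an invented field here; the sysfs plumbing would presumably follow the pattern of existing attributes such as writeback_running:

```c
/* Hypothetical sketch only: gate FUA on a new per-device switch.
 * "writeback_fua" does not exist in bcache; it stands in for a field
 * that a sysfs attribute (patterned after writeback_running) would
 * toggle. */
if (KEY_DIRTY(&w->key)) {
	dirty_init(w);
	bio_set_op_attrs(&io->bio, REQ_OP_WRITE |
			 (io->dc->writeback_fua ? REQ_FUA : 0), 0);
	io->bio.bi_iter.bi_sector = KEY_START(&w->key);
	bio_set_dev(&io->bio, io->dc->bdev);
	io->bio.bi_end_io	= dirty_endio;
}
```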

--
Eric Wheeler




> > Signed-off-by: kungf <[email protected]>
> > ---
> >  drivers/md/bcache/writeback.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/drivers/md/bcache/writeback.c b/drivers/md/bcache/writeback.c
> > index 4a40f9eadeaf..e5cecb60569e 100644
> > --- a/drivers/md/bcache/writeback.c
> > +++ b/drivers/md/bcache/writeback.c
> > @@ -357,7 +357,7 @@ static void write_dirty(struct closure *cl)
> >  	 */
> >  	if (KEY_DIRTY(&w->key)) {
> >  		dirty_init(w);
> > -		bio_set_op_attrs(&io->bio, REQ_OP_WRITE, 0);
> > +		bio_set_op_attrs(&io->bio, REQ_OP_WRITE | REQ_FUA, 0);
> >  		io->bio.bi_iter.bi_sector = KEY_START(&w->key);
> >  		bio_set_dev(&io->bio, io->dc->bdev);
> >  		io->bio.bi_end_io	= dirty_endio;
> >
>
>

2019-12-03 11:41:10

by kungf

Subject: Re: [PATCH] bcache: add REQ_FUA to avoid data lost in writeback mode

On Mon, 2 Dec 2019 19:08:59 +0800
Coly Li <[email protected]> wrote:

> On 2019/12/2 6:24 PM, kungf wrote:
> > Data may be lost in writeback mode in the following scenario:
> > 1. client writes data1 to bcache
> > 2. client calls fdatasync
> > 3. bcache flushes the cache set and the backing device
> > If data1 has not been written back to the backing device at this point, it is only guaranteed to be safe in the cache.
> > 4. the cache then writes data1 back to the backing device with only REQ_OP_WRITE
> > So data1 is not guaranteed to be in non-volatile storage; it may be lost on power interruption.
> >
>
> Hi,
>
> Did you encounter this problem with a real workload? With the bcache
> journal, I don't see how data could be lost in the scenario you describe.
>
> Correct me if I am wrong.
>
> Coly Li

Hi Coly,

Sorry for the confusion. As far as I can tell, the write_dirty function writes dirty data to the backing device without FUA, and write_dirty_finish then marks the dirty key clean,
which means the data indexed by that key will never be written back again. Am I wrong?
The only place I can find where the backing device gets flushed is when bcache receives a PREFLUSH bio; is there anywhere else, for example in the journal code, where it is flushed?

I ran a test that writes to bcache with dd and then detaches it, running blktrace on the cache and backing devices at the same time.
1. disable writeback
# echo 0 > /sys/block/bcache0/bcache/writeback_running
2. write data with fdatasync
# dd if=/dev/zero of=/dev/bcache0 bs=16k count=1 oflag=direct
3. detach and trigger writeback
# echo b1f40ca5-37a3-4852-9abf-6abed96d71db > /sys/block/bcache0/bcache/detach

The text below is the blkparse result.
From the cache device's blktrace below, we can see the 16 KiB of data written to the cache set and then flushed with op FWFSM (PREFLUSH|WRITE|FUA|SYNC|META):
```
8,160 33 1 0.000000000 222844 A W 630609920 + 32 <- (8,167) 1464320
8,167 33 2 0.000000478 222844 Q W 630609920 + 32 [dd]
8,167 33 3 0.000006167 222844 G W 630609920 + 32 [dd]
8,167 33 5 0.000011385 222844 I W 630609920 + 32 [dd]
8,167 33 6 0.000023890 948 D W 630609920 + 32 [kworker/33:1H]
8,167 33 7 0.000111203 0 C W 630609920 + 32 [0]
8,160 34 1 0.000167029 215616 A FWFSM 629153808 + 8 <- (8,167) 8208
8,167 34 2 0.000167490 215616 Q FWFSM 629153808 + 8 [kworker/34:2]
8,167 34 3 0.000169061 215616 G FWFSM 629153808 + 8 [kworker/34:2]
8,167 34 4 0.000301308 949 D WFSM 629153808 + 8 [kworker/34:1H]
8,167 34 5 0.000348832 0 C WFSM 629153808 + 8 [0]
8,167 34 6 0.000349612 0 C WFSM 629153808 [0]
```

From the backing device's blktrace below, the backing device first gets a flush op FWS (PREFLUSH|WRITE|SYNC) because we stopped writeback, then gets a plain W op after the detach,
when the 16 KiB of data is written back. After this, the backing device never receives another flush op, meaning the 16 KiB of data we wrote is not safe
on the backing device, even though we wrote it with fdatasync.
```
8,144 33 1 0.000000000 222844 Q WSM 8 + 8 [dd]
8,144 33 2 0.000016609 222844 G WSM 8 + 8 [dd]
8,144 33 5 0.000020710 222844 I WSM 8 + 8 [dd]
8,144 33 6 0.000031967 948 D WSM 8 + 8 [kworker/33:1H]
8,144 33 7 0.000152945 88631 C WS 16 + 32 [0]
8,144 34 1 0.000186127 215616 Q FWS [kworker/34:2]
8,144 34 2 0.000187006 215616 G FWS [kworker/34:2]
8,144 33 8 0.000326761 0 C WSM 8 + 8 [0]
8,144 34 3 0.020195027 0 C WS 16 [0]
8,144 34 4 0.020195904 0 C FWS 16 [0]
8,144 23 1 19.415130395 215884 Q W 16 + 32 [kworker/23:2]
8,144 23 2 19.415132072 215884 G W 16 + 32 [kworker/23:2]
8,144 23 3 19.415133134 215884 I W 16 + 32 [kworker/23:2]
8,144 23 4 19.415137776 1215 D W 16 + 32 [kworker/23:1H]
8,144 23 5 19.416607260 0 C W 16 + 32 [0]
8,144 24 1 19.416640754 222593 Q WSM 8 + 8 [bcache_writebac]
8,144 24 2 19.416642698 222593 G WSM 8 + 8 [bcache_writebac]
8,144 24 3 19.416643505 222593 I WSM 8 + 8 [bcache_writebac]
8,144 24 4 19.416650589 1107 D WSM 8 + 8 [kworker/24:1H]
8,144 24 5 19.416865258 0 C WSM 8 + 8 [0]
8,144 24 6 19.416871350 221889 Q WSM 8 + 8 [kworker/24:1]
8,144 24 7 19.416872201 221889 G WSM 8 + 8 [kworker/24:1]
8,144 24 8 19.416872542 221889 I WSM 8 + 8 [kworker/24:1]
8,144 24 9 19.416875458 1107 D WSM 8 + 8 [kworker/24:1H]
8,144 24 10 19.417076935 0 C WSM 8 + 8 [0]
```
>
> > Signed-off-by: kungf <[email protected]>
> > ---
> >  drivers/md/bcache/writeback.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/drivers/md/bcache/writeback.c b/drivers/md/bcache/writeback.c
> > index 4a40f9eadeaf..e5cecb60569e 100644
> > --- a/drivers/md/bcache/writeback.c
> > +++ b/drivers/md/bcache/writeback.c
> > @@ -357,7 +357,7 @@ static void write_dirty(struct closure *cl)
> >  	 */
> >  	if (KEY_DIRTY(&w->key)) {
> >  		dirty_init(w);
> > -		bio_set_op_attrs(&io->bio, REQ_OP_WRITE, 0);
> > +		bio_set_op_attrs(&io->bio, REQ_OP_WRITE | REQ_FUA, 0);
> >  		io->bio.bi_iter.bi_sector = KEY_START(&w->key);
> >  		bio_set_dev(&io->bio, io->dc->bdev);
> >  		io->bio.bi_end_io	= dirty_endio;
> >
>
--
kungf <[email protected]>

2019-12-03 14:10:37

by Coly Li

Subject: Re: [PATCH] bcache: add REQ_FUA to avoid data lost in writeback mode

On 2019/12/3 3:16 PM, kungf wrote:
>
>
> On Mon, 2 Dec 2019 at 19:09, Coly Li <[email protected]> wrote:
>>
>> On 2019/12/2 6:24 PM, kungf wrote:
>> > Data may be lost in writeback mode in the following scenario:
>> > 1. client writes data1 to bcache
>> > 2. client calls fdatasync
>> > 3. bcache flushes the cache set and the backing device
>> > If data1 has not been written back to the backing device at this point, it is only guaranteed to be safe in the cache.
>> > 4. the cache then writes data1 back to the backing device with only REQ_OP_WRITE
>> > So data1 is not guaranteed to be in non-volatile storage; it may be lost on power interruption.
>> >
>>
>> Hi,
>>
>> Did you encounter this problem with a real workload? With the bcache
>> journal, I don't see how data could be lost in the scenario you describe.
>>
>> Correct me if I am wrong.
>>
>> Coly Li
>>
> Hi Coly,
>
> Sorry for the confusion. As far as I can tell, the write_dirty function
> writes dirty data to the backing device without FUA, and write_dirty_finish
> then marks the dirty key clean, which means the data indexed by that key
> will never be written back again. Am I wrong?

Yes, you are right. This is the behavior as designed. We don't guarantee
that the data is always on the platter; this is what most storage systems do.

> The only place I can find where the backing device gets flushed is when
> bcache receives a PREFLUSH bio; is there anywhere else, for example in
> the journal code, where it is flushed?
>

A storage system flushes its buffers when the upper layer requires it; that
means if an application wants its written data to be flushed to the platter,
it should explicitly issue a flush request.

What you observe and test is all as designed, IMHO. The I/O stack does not
guarantee that any data is persistent on the storage media unless an explicit
flush request is received from the upper layer and completed back to it.
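For illustration, at the bio level such an explicit flush looks roughly like this (a sketch against the ~v5.4 block layer APIs, not actual bcache code):

```c
/* Sketch (~v5.4 block layer APIs, not bcache code): an empty bio
 * tagged REQ_PREFLUSH asks the device below to persist its volatile
 * write cache; this is what arrives when an upper layer issues a
 * flush request such as fsync(). */
struct bio *bio = bio_alloc(GFP_KERNEL, 0);

bio_set_dev(bio, bdev);
bio->bi_opf = REQ_OP_WRITE | REQ_PREFLUSH;
submit_bio_wait(bio);	/* returns only after the cache is flushed */
bio_put(bio);
```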

Coly Li

> I ran a test that writes to bcache with dd and then detaches it, running
> blktrace on the cache and backing devices at the same time.
> 1. disable writeback
> # echo 0 > /sys/block/bcache0/bcache/writeback_running
> 2. write data with fdatasync
> # dd if=/dev/zero of=/dev/bcache0 bs=16k count=1 oflag=direct
> 3. detach and trigger writeback
> # echo b1f40ca5-37a3-4852-9abf-6abed96d71db > /sys/block/bcache0/bcache/detach
>
> The text below is the blkparse result.
> From the cache device's blktrace below, we can see the 16 KiB of data
> written to the cache set and then flushed with op FWFSM
> (PREFLUSH|WRITE|FUA|SYNC|META):
> ```
>   8,160 33        1     0.000000000 222844  A   W 630609920 + 32 <- (8,167) 1464320
>   8,167 33        2     0.000000478 222844  Q   W 630609920 + 32 [dd]
>   8,167 33        3     0.000006167 222844  G   W 630609920 + 32 [dd]
>   8,167 33        5     0.000011385 222844  I   W 630609920 + 32 [dd]
>   8,167 33        6     0.000023890    948  D   W 630609920 + 32 [kworker/33:1H]
>   8,167 33        7     0.000111203      0  C   W 630609920 + 32 [0]
>   8,160 34        1     0.000167029 215616  A FWFSM 629153808 + 8 <- (8,167) 8208
>   8,167 34        2     0.000167490 215616  Q FWFSM 629153808 + 8 [kworker/34:2]
>   8,167 34        3     0.000169061 215616  G FWFSM 629153808 + 8 [kworker/34:2]
>   8,167 34        4     0.000301308    949  D WFSM 629153808 + 8 [kworker/34:1H]
>   8,167 34        5     0.000348832      0  C WFSM 629153808 + 8 [0]
>   8,167 34        6     0.000349612      0  C WFSM 629153808 [0]
> ```
>
> From the backing device's blktrace below, the backing device first gets a
> flush op FWS (PREFLUSH|WRITE|SYNC) because we stopped writeback, then gets
> a plain W op after the detach, when the 16 KiB of data is written back.
> After this, the backing device never receives another flush op, meaning
> the 16 KiB of data we wrote is not safe on the backing device, even
> though we wrote it with fdatasync.
> ```
>   8,144 33        1     0.000000000 222844  Q WSM 8 + 8 [dd]
>   8,144 33        2     0.000016609 222844  G WSM 8 + 8 [dd]
>   8,144 33        5     0.000020710 222844  I WSM 8 + 8 [dd]
>   8,144 33        6     0.000031967   948  D WSM 8 + 8 [kworker/33:1H]
>   8,144 33        7     0.000152945 88631  C  WS 16 + 32 [0]
>   8,144 34        1     0.000186127 215616  Q FWS [kworker/34:2]
>   8,144 34        2     0.000187006 215616  G FWS [kworker/34:2]
>   8,144 33        8     0.000326761     0  C WSM 8 + 8 [0]
>   8,144 34        3     0.020195027     0  C  WS 16 [0]
>   8,144 34        4     0.020195904     0  C FWS 16 [0]
>   8,144 23        1    19.415130395 215884  Q   W 16 + 32 [kworker/23:2]
>   8,144 23        2    19.415132072 215884  G   W 16 + 32 [kworker/23:2]
>   8,144 23        3    19.415133134 215884  I   W 16 + 32 [kworker/23:2]
>   8,144 23        4    19.415137776  1215  D   W 16 + 32 [kworker/23:1H]
>   8,144 23        5    19.416607260     0  C   W 16 + 32 [0]
>   8,144 24        1    19.416640754 222593  Q WSM 8 + 8 [bcache_writebac]
>   8,144 24        2    19.416642698 222593  G WSM 8 + 8 [bcache_writebac]
>   8,144 24        3    19.416643505 222593  I WSM 8 + 8 [bcache_writebac]
>   8,144 24        4    19.416650589  1107  D WSM 8 + 8 [kworker/24:1H]
>   8,144 24        5    19.416865258     0  C WSM 8 + 8 [0]
>   8,144 24        6    19.416871350 221889  Q WSM 8 + 8 [kworker/24:1]
>   8,144 24        7    19.416872201 221889  G WSM 8 + 8 [kworker/24:1]
>   8,144 24        8    19.416872542 221889  I WSM 8 + 8 [kworker/24:1]
>   8,144 24        9    19.416875458  1107  D WSM 8 + 8 [kworker/24:1H]
>   8,144 24       10    19.417076935     0  C WSM 8 + 8 [0]
> ```

2019-12-03 14:23:55

by Coly Li

Subject: Re: [PATCH] bcache: add REQ_FUA to avoid data lost in writeback mode

On 2019/12/3 3:34 AM, Eric Wheeler wrote:
> On Mon, 2 Dec 2019, Coly Li wrote:
>> On 2019/12/2 6:24 PM, kungf wrote:
>>> Data may be lost in writeback mode in the following scenario:
>>> 1. client writes data1 to bcache
>>> 2. client calls fdatasync
>>> 3. bcache flushes the cache set and the backing device
>>> If data1 has not been written back to the backing device at this point, it is only guaranteed to be safe in the cache.
>>> 4. the cache then writes data1 back to the backing device with only REQ_OP_WRITE
>>> So data1 is not guaranteed to be in non-volatile storage; it may be lost on power interruption.
>>>
>>
>> Hi,
>>
>> Did you encounter this problem with a real workload? With the bcache
>> journal, I don't see how data could be lost in the scenario you describe.
>>
>> Correct me if I am wrong.
>>
>> Coly Li
>
> If this does become necessary, then we should have a sysfs or superblock
> flag to disable FUA for those whose RAID controllers have battery-backed units (BBUs).

Hi Eric,

I doubt it is necessary to add the FUA tag to all writeback bios. If a
power failure happens after the dirty data is written to the backing
device and the bkey turns clean, a following read request will still go
to the cache device, because the LBA can be indexed no matter whether it
is dirty or clean. Unless the bkey is invalidated from the B+tree, reads
always go to the cache device first in writeback mode. If a power
failure happens before the cached bkey turns from dirty to clean, the
result is just an extra writeback bio flushed from the cache device to
the backing device with identical data. Compared with a FUA tag on every
writeback bio (which would be really slow), the extra writeback I/Os
after a power failure are more acceptable to me.

Coly Li

>
>>> Signed-off-by: kungf <[email protected]>
>>> ---
>>>  drivers/md/bcache/writeback.c | 2 +-
>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/md/bcache/writeback.c b/drivers/md/bcache/writeback.c
>>> index 4a40f9eadeaf..e5cecb60569e 100644
>>> --- a/drivers/md/bcache/writeback.c
>>> +++ b/drivers/md/bcache/writeback.c
>>> @@ -357,7 +357,7 @@ static void write_dirty(struct closure *cl)
>>>  	 */
>>>  	if (KEY_DIRTY(&w->key)) {
>>>  		dirty_init(w);
>>> -		bio_set_op_attrs(&io->bio, REQ_OP_WRITE, 0);
>>> +		bio_set_op_attrs(&io->bio, REQ_OP_WRITE | REQ_FUA, 0);
>>>  		io->bio.bi_iter.bi_sector = KEY_START(&w->key);
>>>  		bio_set_dev(&io->bio, io->dc->bdev);
>>>  		io->bio.bi_end_io	= dirty_endio;
>>>
>>

2019-12-04 10:56:29

by Coly Li

Subject: Re: [PATCH] bcache: add REQ_FUA to avoid data lost in writeback mode

On 2019/12/3 10:21 PM, Coly Li wrote:
> On 2019/12/3 3:34 AM, Eric Wheeler wrote:
>> On Mon, 2 Dec 2019, Coly Li wrote:
>>> On 2019/12/2 6:24 PM, kungf wrote:
>>>> Data may be lost in writeback mode in the following scenario:
>>>> 1. client writes data1 to bcache
>>>> 2. client calls fdatasync
>>>> 3. bcache flushes the cache set and the backing device
>>>> If data1 has not been written back to the backing device at this point, it is only guaranteed to be safe in the cache.
>>>> 4. the cache then writes data1 back to the backing device with only REQ_OP_WRITE
>>>> So data1 is not guaranteed to be in non-volatile storage; it may be lost on power interruption.
>>>>
>>>
>>> Hi,
>>>
>>> Did you encounter this problem with a real workload? With the bcache
>>> journal, I don't see how data could be lost in the scenario you describe.
>>>
>>> Correct me if I am wrong.
>>>
>>> Coly Li
>>
>> If this does become necessary, then we should have a sysfs or superblock
>> flag to disable FUA for those whose RAID controllers have battery-backed units (BBUs).
>
> Hi Eric,
>
> I doubt it is necessary to add the FUA tag to all writeback bios. If a
> power failure happens after the dirty data is written to the backing
> device and the bkey turns clean, a following read request will still go
> to the cache device, because the LBA can be indexed no matter whether it
> is dirty or clean. Unless the bkey is invalidated from the B+tree, reads
> always go to the cache device first in writeback mode. If a power
> failure happens before the cached bkey turns from dirty to clean, the
> result is just an extra writeback bio flushed from the cache device to
> the backing device with identical data. Compared with a FUA tag on every
> writeback bio (which would be really slow), the extra writeback I/Os
> after a power failure are more acceptable to me.

Hi Eric,

I have come to realize what the problem is. Yes, there is a potential
possibility. With FUA the writeback performance will be very low, so it
is quite tricky...

Coly Li

2019-12-06 00:05:19

by Eric Wheeler

Subject: Re: [PATCH] bcache: add REQ_FUA to avoid data lost in writeback mode

On Tue, 3 Dec 2019, Coly Li wrote:

> On 2019/12/3 3:34 AM, Eric Wheeler wrote:
> > On Mon, 2 Dec 2019, Coly Li wrote:
> >> On 2019/12/2 6:24 PM, kungf wrote:
> >>> Data may be lost in writeback mode in the following scenario:
> >>> 1. client writes data1 to bcache
> >>> 2. client calls fdatasync
> >>> 3. bcache flushes the cache set and the backing device
> >>> If data1 has not been written back to the backing device at this point, it is only guaranteed to be safe in the cache.
> >>> 4. the cache then writes data1 back to the backing device with only REQ_OP_WRITE
> >>> So data1 is not guaranteed to be in non-volatile storage; it may be lost on power interruption.
> >>>
> >>
> >> Hi,
> >>
> >> Did you encounter this problem with a real workload? With the bcache
> >> journal, I don't see how data could be lost in the scenario you describe.
> >>
> >> Correct me if I am wrong.
> >>
> >> Coly Li
> >
> > If this does become necessary, then we should have a sysfs or superblock
> > flag to disable FUA for those whose RAID controllers have battery-backed units (BBUs).
>
> Hi Eric,
>
> I doubt it is necessary to add the FUA tag to all writeback bios. If a
> power failure happens after the dirty data is written to the backing
> device and the bkey turns clean, a following read request will still go
> to the cache device, because the LBA can be indexed no matter whether it
> is dirty or clean. Unless the bkey is invalidated from the B+tree, reads
> always go to the cache device first in writeback mode. If a power
> failure happens before the cached bkey turns from dirty to clean, the
> result is just an extra writeback bio flushed from the cache device to
> the backing device with identical data. Compared with a FUA tag on every
> writeback bio (which would be really slow), the extra writeback I/Os
> after a power failure are more acceptable to me.

I agree. FWIW, I just learned about /sys/block/sdX/queue/write_cache from
Nikos Tsironis <[email protected]>. Thus, my flag request for a FUA
bypass isn't necessary anyway, even if you did want FUA there, because
REQ_PREFLUSH/REQ_FUA are stripped when a blockdev's write_cache is set to
"write through" (QUEUE_FLAG_WC cleared).

----------------------------------------------------------------------
This happens in generic_make_request_checks():

	/*
	 * Filter flush bio's early so that make_request based
	 * drivers without flush support don't have to worry
	 * about them.
	 */
	if (op_is_flush(bio->bi_opf) &&
	    !test_bit(QUEUE_FLAG_WC, &q->queue_flags)) {
		bio->bi_opf &= ~(REQ_PREFLUSH | REQ_FUA);
		if (!nr_sectors) {
			status = BLK_STS_OK;
			goto end_io;
		}
	}
----------------------------------------------------------------------
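For anyone who wants to check this from userspace, the attribute can simply be read back; a sketch (substitute the real disk name for sda):

```c
/* Sketch: report a disk's write_cache mode (substitute the real
 * device name).  "write back" means QUEUE_FLAG_WC is set and
 * flush/FUA pass through; "write through" means the check above
 * strips them. */
#include <stdio.h>

int main(void)
{
	char mode[32] = "";
	FILE *f = fopen("/sys/block/sda/queue/write_cache", "r");

	if (!f)
		return 1;
	if (fgets(mode, sizeof(mode), f))
		printf("write_cache: %s", mode);
	fclose(f);
	return 0;
}
```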

-Eric

>
> Coly Li
>
> >
> >>> Signed-off-by: kungf <[email protected]>
> >>> ---
> >>>  drivers/md/bcache/writeback.c | 2 +-
> >>>  1 file changed, 1 insertion(+), 1 deletion(-)
> >>>
> >>> diff --git a/drivers/md/bcache/writeback.c b/drivers/md/bcache/writeback.c
> >>> index 4a40f9eadeaf..e5cecb60569e 100644
> >>> --- a/drivers/md/bcache/writeback.c
> >>> +++ b/drivers/md/bcache/writeback.c
> >>> @@ -357,7 +357,7 @@ static void write_dirty(struct closure *cl)
> >>>  	 */
> >>>  	if (KEY_DIRTY(&w->key)) {
> >>>  		dirty_init(w);
> >>> -		bio_set_op_attrs(&io->bio, REQ_OP_WRITE, 0);
> >>> +		bio_set_op_attrs(&io->bio, REQ_OP_WRITE | REQ_FUA, 0);
> >>>  		io->bio.bi_iter.bi_sector = KEY_START(&w->key);
> >>>  		bio_set_dev(&io->bio, io->dc->bdev);
> >>>  		io->bio.bi_end_io	= dirty_endio;
> >>>
> >>
>