2013-03-22 20:12:05

by Darrick J. Wong

[permalink] [raw]
Subject: [PATCH] dm: dm-cache fails to write the cache device in writethrough mode

The new writethrough strategy for dm-cache issues a bio to the origin device,
remaps the bio to the cache device, and issues the bio to the cache device.
However, the block layer modifies bi_sector and bi_size, so we need to preserve
these or else nothing gets written to the cache (bi_size == 0). This fixes the
problem where someone writes a block through the cache, but a subsequent reread
(from the cache) returns old contents.

Signed-off-by: Darrick J. Wong <[email protected]>
---
drivers/md/dm-cache-target.c | 10 ++++++++++
1 file changed, 10 insertions(+)

diff --git a/drivers/md/dm-cache-target.c b/drivers/md/dm-cache-target.c
index 66120bd..0db0ad2 100644
--- a/drivers/md/dm-cache-target.c
+++ b/drivers/md/dm-cache-target.c
@@ -205,6 +205,8 @@ struct per_bio_data {
struct cache *cache;
dm_cblock_t cblock;
bio_end_io_t *saved_bi_end_io;
+ unsigned int bi_size;
+ sector_t bi_sector;
};

struct dm_cache_migration {
@@ -643,6 +645,8 @@ static void writethrough_endio(struct bio *bio, int err)
return;
}

+ bio->bi_sector = pb->bi_sector;
+ bio->bi_size = pb->bi_size;
remap_to_cache(pb->cache, bio, pb->cblock);

/*
@@ -667,6 +671,12 @@ static void remap_to_origin_then_cache(struct cache *cache, struct bio *bio,
pb->cache = cache;
pb->cblock = cblock;
pb->saved_bi_end_io = bio->bi_end_io;
+ /*
+ * The block layer modifies bi_size and bi_sector, so we must save
+ * them for when we re-issue the bio against the cache device.
+ */
+ pb->bi_size = bio->bi_size;
+ pb->bi_sector = bio->bi_sector;
bio->bi_end_io = writethrough_endio;

remap_to_origin_clear_discard(pb->cache, bio, oblock);


2013-03-22 20:22:07

by Ben Hutchings

[permalink] [raw]
Subject: Re: [PATCH] dm: dm-cache fails to write the cache device in writethrough mode

On Fri, 2013-03-22 at 13:11 -0700, Darrick J. Wong wrote:
> The new writethrough strategy for dm-cache issues a bio to the origin device,
> remaps the bio to the cache device, and issues the bio to the cache device.
> However, the block layer modifies bi_sector and bi_size, so we need to preserve
> these or else nothing gets written to the cache (bi_size == 0). This fixes the
> problem where someone writes a block through the cache, but a subsequent reread
> (from the cache) returns old contents.
>
> Signed-off-by: Darrick J. Wong <[email protected]>
> ---
[...]

This is not the correct way to submit a change to stable. See
Documentation/stable_kernel_rules.txt

Ben.

--
Ben Hutchings
Make three consecutive correct guesses and you will be considered an expert.


Attachments:
signature.asc (828.00 B)
This is a digitally signed message part

2013-03-22 20:50:27

by Darrick J. Wong

[permalink] [raw]
Subject: Re: [PATCH] dm: dm-cache fails to write the cache device in writethrough mode

On Fri, Mar 22, 2013 at 08:21:56PM +0000, Ben Hutchings wrote:
> On Fri, 2013-03-22 at 13:11 -0700, Darrick J. Wong wrote:
> > The new writethrough strategy for dm-cache issues a bio to the origin device,
> > remaps the bio to the cache device, and issues the bio to the cache device.
> > However, the block layer modifies bi_sector and bi_size, so we need to preserve
> > these or else nothing gets written to the cache (bi_size == 0). This fixes the
> > problem where someone writes a block through the cache, but a subsequent reread
> > (from the cache) returns old contents.
> >
> > Signed-off-by: Darrick J. Wong <[email protected]>
> > ---
> [...]
>
> This is not the correct way to submit a change to stable. See
> Documentation/stable_kernel_rules.txt

Frankly, I'm not sure why agk sent the 3.9-fixes pull request to -stable. 7 of
the 10 commits are for dm-cache, and dm-cache is a new feature for 3.9. I
probably could have dropped -stable from the cc: list when I started
complaining about bugs. :)

Sorry about the noise.

--D
>
> Ben.
>
> --
> Ben Hutchings
> Make three consecutive correct guesses and you will be considered an expert.

2013-03-22 21:35:58

by Mike Snitzer

[permalink] [raw]
Subject: Re: dm: dm-cache fails to write the cache device in writethrough mode

On Fri, Mar 22 2013 at 4:50pm -0400,
Darrick J. Wong <[email protected]> wrote:

> On Fri, Mar 22, 2013 at 08:21:56PM +0000, Ben Hutchings wrote:
> > On Fri, 2013-03-22 at 13:11 -0700, Darrick J. Wong wrote:
> > > The new writethrough strategy for dm-cache issues a bio to the origin device,
> > > remaps the bio to the cache device, and issues the bio to the cache device.
> > > However, the block layer modifies bi_sector and bi_size, so we need to preserve
> > > these or else nothing gets written to the cache (bi_size == 0). This fixes the
> > > problem where someone writes a block through the cache, but a subsequent reread
> > > (from the cache) returns old contents.
> > >
> > > Signed-off-by: Darrick J. Wong <[email protected]>
> > > ---
> > [...]
> >
> > This is not the correct way to submit a change to stable. See
> > Documentation/stable_kernel_rules.txt
>
> Frankly, I'm not sure why agk sent the 3.9-fixes pull request to -stable. 7 of
> the 10 commits are for dm-cache, and dm-cache is a new feature for 3.9. I
> probably could have dropped -stable from the cc: list when I started
> complaining about bugs. :)

Alasdair's pull mail was sent to stable because commit f046f89a99c ("dm
thin: fix discard corruption") cc'd stable and was included in the pull
request.

2013-03-22 22:34:38

by Mike Snitzer

[permalink] [raw]
Subject: Re: dm: dm-cache fails to write the cache device in writethrough mode

On Fri, Mar 22 2013 at 4:11pm -0400,
Darrick J. Wong <[email protected]> wrote:

> The new writethrough strategy for dm-cache issues a bio to the origin device,
> remaps the bio to the cache device, and issues the bio to the cache device.
> However, the block layer modifies bi_sector and bi_size, so we need to preserve
> these or else nothing gets written to the cache (bi_size == 0). This fixes the
> problem where someone writes a block through the cache, but a subsequent reread
> (from the cache) returns old contents.

Your writethrough blkid test results are certainly strange. But I'm not
aware of where the block layer would modify bi_size and bi_sector;
please elaborate.

I cannot reproduce your original report. I developed
'test_writethrough_ext4_uuids_match', apologies for the ruby code:

def test_writethrough_ext4_uuids_match
size = meg(10)

# wipe the origin to ensure we don't accidentally have the same
# data on it.
with_standard_linear(:data_size => size) do |origin|
wipe_device(origin)
end

uuid = "deadbeef-cafe-dead-beef-cafedeadbeef"

# format the cache device with a specific uuid
with_standard_cache(:format => true,
:io_mode => :writethrough,
:data_size => size) do |cache|
fs = FS::file_system(:ext4, cache)
fs.format(:uuid => uuid)
FS::assert_fs_uuid(uuid, cache)
end

# origin should have the same uuid as the cache
with_standard_linear(:data_size => size) do |origin|
FS::assert_fs_uuid(uuid, origin)
end
end

This test was committed to the 'devel' branch of my thinp-test-suite
tree: git://github.com/snitm/thinp-test-suite.git

Also the existing 'test_writethrough' test works fine.

So for now:

Nacked-by: Mike Snitzer <[email protected]>

2013-03-22 23:16:13

by Darrick J. Wong

[permalink] [raw]
Subject: Re: [dm-devel] dm: dm-cache fails to write the cache device in writethrough mode

On Fri, Mar 22, 2013 at 06:34:28PM -0400, Mike Snitzer wrote:
> On Fri, Mar 22 2013 at 4:11pm -0400,
> Darrick J. Wong <[email protected]> wrote:
>
> > The new writethrough strategy for dm-cache issues a bio to the origin device,
> > remaps the bio to the cache device, and issues the bio to the cache device.
> > However, the block layer modifies bi_sector and bi_size, so we need to preserve
> > these or else nothing gets written to the cache (bi_size == 0). This fixes the
> > problem where someone writes a block through the cache, but a subsequent reread
> > (from the cache) returns old contents.
>
> Your writethrough blkid test results are certainly strange. But I'm not
> aware of where the block layer would modify bi_size and bi_sector;
> please elaborate.
>
> I cannot reproduce your original report. I developed
> 'test_writethrough_ext4_uuids_match', apologies for the ruby code:

Hmm... I'm building my kernels off 0a7e453103b9718d357688b83bb968ee108cc874 in
Linus' tree (post 3.9-rc3). This is the full output of dmsetup table:

moocache-blocks: 0 1039360 linear 8:16 9088
moocache-metadata: 0 8704 linear 8:16 384
moocache: 0 67108864 cache 253:0 253:1 8:0 512 1 writethrough default 4 random_threshold 4 sequential_threshold 32768

253:0 -> moocache-metadata and 253:1 -> moocache-blocks.

I'm curious what your setup is...

--D

>
> def test_writethrough_ext4_uuids_match
> size = meg(10)
>
> # wipe the origin to ensure we don't accidentally have the same
> # data on it.
> with_standard_linear(:data_size => size) do |origin|
> wipe_device(origin)
> end
>
> uuid = "deadbeef-cafe-dead-beef-cafedeadbeef"
>
> # format the cache device with a specific uuid
> with_standard_cache(:format => true,
> :io_mode => :writethrough,
> :data_size => size) do |cache|
> fs = FS::file_system(:ext4, cache)
> fs.format(:uuid => uuid)
> FS::assert_fs_uuid(uuid, cache)
> end
>
> # origin should have the same uuid as the cache
> with_standard_linear(:data_size => size) do |origin|
> FS::assert_fs_uuid(uuid, origin)
> end
> end
>
> This test was committed to the 'devel' branch of my thinp-test-suite
> tree: git://github.com/snitm/thinp-test-suite.git
>
> Also the existing 'test_writethrough' test works fine.
>
> So for now:
>
> Nacked-by: Mike Snitzer <[email protected]>
>
> --
> dm-devel mailing list
> [email protected]
> https://www.redhat.com/mailman/listinfo/dm-devel

2013-03-23 03:00:50

by Alasdair G Kergon

[permalink] [raw]
Subject: Re: [dm-devel] [PATCH] dm: dm-cache fails to write the cache device in writethrough mode

On Fri, Mar 22, 2013 at 01:11:51PM -0700, Darrick J. Wong wrote:
> The new writethrough strategy for dm-cache issues a bio to the origin device,
> remaps the bio to the cache device, and issues the bio to the cache device.
> However, the block layer modifies bi_sector and bi_size, so we need to preserve
> these or else nothing gets written to the cache (bi_size == 0). This fixes the
> problem where someone writes a block through the cache, but a subsequent reread
> (from the cache) returns old contents.

If this needs doing here, please use dm_bio_record() and dm_bio_restore().
E.g. Look at how dm-raid1.c does something similar.

Alasdair

2013-03-23 03:27:26

by Mike Snitzer

[permalink] [raw]
Subject: Re: dm: dm-cache fails to write the cache device in writethrough mode

On Fri, Mar 22 2013 at 7:16pm -0400,
Darrick J. Wong <[email protected]> wrote:

> On Fri, Mar 22, 2013 at 06:34:28PM -0400, Mike Snitzer wrote:
> > On Fri, Mar 22 2013 at 4:11pm -0400,
> > Darrick J. Wong <[email protected]> wrote:
> >
> > > The new writethrough strategy for dm-cache issues a bio to the origin device,
> > > remaps the bio to the cache device, and issues the bio to the cache device.
> > > However, the block layer modifies bi_sector and bi_size, so we need to preserve
> > > these or else nothing gets written to the cache (bi_size == 0). This fixes the
> > > problem where someone writes a block through the cache, but a subsequent reread
> > > (from the cache) returns old contents.
> >
> > Your writethrough blkid test results are certainly strange. But I'm not
> > aware of where the block layer would modify bi_size and bi_sector;
> > please elaborate.
> >
> > I cannot reproduce your original report. I developed
> > 'test_writethrough_ext4_uuids_match', apologies for the ruby code:
>
> Hmm... I'm building my kernels off 0a7e453103b9718d357688b83bb968ee108cc874 in
> Linus' tree (post 3.9-rc3). This is the full output of dmsetup table:
>
> moocache-blocks: 0 1039360 linear 8:16 9088
> moocache-metadata: 0 8704 linear 8:16 384
> moocache: 0 67108864 cache 253:0 253:1 8:0 512 1 writethrough default 4 random_threshold 4 sequential_threshold 32768
>
> 253:0 -> moocache-metadata and 253:1 -> moocache-blocks.
>
> I'm curious what your setup is...

Here are the tables:
test-dev-238267: 0 8192 linear /dev/stec/metadata 0
test-dev-255913: 0 2097152 linear /dev/stec/metadata 8192
test-dev-655144: 0 20480 linear /dev/spindle/data 0
0 20480 cache /dev/mapper/test-dev-238267 /dev/mapper/test-dev-255913 /dev/mapper/test-dev-655144 512 1 writethrough default 0

And I tweaked 'test_writethrough_ext4_uuids_match' to make sure to use the
same thresholds you're using, full status output:
0 20480 cache 15/1024 0 19 0 0 0 0 0 0 1 writethrough 2 migration_threshold 32768 4 random_threshold 4 sequential_threshold 512

So the big difference is the thinp-test-suite uses intermediate linear
DM layers above the slower sd device (spindle/data) -- whereas in your
setup the origin device is direct to sd (8:0).

I'll re-run with the origin directly on sd in the morning and will
report back.

2013-03-23 03:48:09

by Alasdair G Kergon

[permalink] [raw]
Subject: Re: [dm-devel] dm: dm-cache fails to write the cache device in writethrough mode

On Fri, Mar 22, 2013 at 11:27:16PM -0400, Mike Snitzer wrote:
> So the big difference is the thinp-test-suite uses intermediate linear
> DM layers above the slower sd device (spindle/data) -- whereas in your
> setup the origin device is direct to sd (8:0).

I would think the difference is because DM issues (edited) copies of
incoming bios to the layers underneath, so changes don't propagate back
up.

Darrick: if you retest with a dm linear device, does the problem go
away?

Alasdair

2013-03-23 05:16:06

by Darrick J. Wong

[permalink] [raw]
Subject: Re: [dm-devel] dm: dm-cache fails to write the cache device in writethrough mode

On Fri, Mar 22, 2013 at 11:27:16PM -0400, Mike Snitzer wrote:
> On Fri, Mar 22 2013 at 7:16pm -0400,
> Darrick J. Wong <[email protected]> wrote:
>
> > On Fri, Mar 22, 2013 at 06:34:28PM -0400, Mike Snitzer wrote:
> > > On Fri, Mar 22 2013 at 4:11pm -0400,
> > > Darrick J. Wong <[email protected]> wrote:
> > >
> > > > The new writethrough strategy for dm-cache issues a bio to the origin device,
> > > > remaps the bio to the cache device, and issues the bio to the cache device.
> > > > However, the block layer modifies bi_sector and bi_size, so we need to preserve
> > > > these or else nothing gets written to the cache (bi_size == 0). This fixes the
> > > > problem where someone writes a block through the cache, but a subsequent reread
> > > > (from the cache) returns old contents.
> > >
> > > Your writethrough blkid test results are certainly strange. But I'm not
> > > aware of where the block layer would modify bi_size and bi_sector;
> > > please elaborate.
> > >
> > > I cannot reproduce your original report. I developed
> > > 'test_writethrough_ext4_uuids_match', apologies for the ruby code:
> >
> > Hmm... I'm building my kernels off 0a7e453103b9718d357688b83bb968ee108cc874 in
> > Linus' tree (post 3.9-rc3). This is the full output of dmsetup table:
> >
> > moocache-blocks: 0 1039360 linear 8:16 9088
> > moocache-metadata: 0 8704 linear 8:16 384
> > moocache: 0 67108864 cache 253:0 253:1 8:0 512 1 writethrough default 4 random_threshold 4 sequential_threshold 32768
> >
> > 253:0 -> moocache-metadata and 253:1 -> moocache-blocks.
> >
> > I'm curious what your setup is...
>
> Here are the tables:
> test-dev-238267: 0 8192 linear /dev/stec/metadata 0
> test-dev-255913: 0 2097152 linear /dev/stec/metadata 8192
> test-dev-655144: 0 20480 linear /dev/spindle/data 0
> 0 20480 cache /dev/mapper/test-dev-238267 /dev/mapper/test-dev-255913 /dev/mapper/test-dev-655144 512 1 writethrough default 0
>
> And I tweaked 'test_writethrough_ext4_uuids_match' to make sure to use the
> same thresholds you're using, full status output:
> 0 20480 cache 15/1024 0 19 0 0 0 0 0 0 1 writethrough 2 migration_threshold 32768 4 random_threshold 4 sequential_threshold 512
>
> So the big difference is the thinp-test-suite uses intermediate linear
> DM layers above the slower sd device (spindle/data) -- whereas in your
> setup the origin device is direct to sd (8:0).
>
> I'll re-run with the origin directly on sd in the morning and will
> report back.

Interesting ... if I set up this:

# echo "0 67108864 linear /dev/sda 0" | dmsetup create origin

And then repeat the test, but using /dev/mapper/origin as the origin instead
of /dev/sda, the problem goes away.

--D

2013-03-23 21:09:06

by Mike Snitzer

[permalink] [raw]
Subject: Re: dm: dm-cache fails to write the cache device in writethrough mode

On Sat, Mar 23 2013 at 1:15am -0400,
Darrick J. Wong <[email protected]> wrote:

> On Fri, Mar 22, 2013 at 11:27:16PM -0400, Mike Snitzer wrote:
> > On Fri, Mar 22 2013 at 7:16pm -0400,
> > Darrick J. Wong <[email protected]> wrote:
> >
> > > On Fri, Mar 22, 2013 at 06:34:28PM -0400, Mike Snitzer wrote:
> > > > On Fri, Mar 22 2013 at 4:11pm -0400,
> > > > Darrick J. Wong <[email protected]> wrote:
> > > >
> > > > > The new writethrough strategy for dm-cache issues a bio to the origin device,
> > > > > remaps the bio to the cache device, and issues the bio to the cache device.
> > > > > However, the block layer modifies bi_sector and bi_size, so we need to preserve
> > > > > these or else nothing gets written to the cache (bi_size == 0). This fixes the
> > > > > problem where someone writes a block through the cache, but a subsequent reread
> > > > > (from the cache) returns old contents.
> > > >
> > > > Your writethrough blkid test results are certainly strange. But I'm not
> > > > aware of where the block layer would modify bi_size and bi_sector;
> > > > please elaborate.
> > > >
> > > > I cannot reproduce your original report. I developed
> > > > 'test_writethrough_ext4_uuids_match', apologies for the ruby code:
> > >
> > > Hmm... I'm building my kernels off 0a7e453103b9718d357688b83bb968ee108cc874 in
> > > Linus' tree (post 3.9-rc3). This is the full output of dmsetup table:
> > >
> > > moocache-blocks: 0 1039360 linear 8:16 9088
> > > moocache-metadata: 0 8704 linear 8:16 384
> > > moocache: 0 67108864 cache 253:0 253:1 8:0 512 1 writethrough default 4 random_threshold 4 sequential_threshold 32768
> > >
> > > 253:0 -> moocache-metadata and 253:1 -> moocache-blocks.
> > >
> > > I'm curious what your setup is...
> >
> > Here are the tables:
> > test-dev-238267: 0 8192 linear /dev/stec/metadata 0
> > test-dev-255913: 0 2097152 linear /dev/stec/metadata 8192
> > test-dev-655144: 0 20480 linear /dev/spindle/data 0
> > 0 20480 cache /dev/mapper/test-dev-238267 /dev/mapper/test-dev-255913 /dev/mapper/test-dev-655144 512 1 writethrough default 0
> >
> > And I tweaked 'test_writethrough_ext4_uuids_match' to make sure to use the
> > same thresholds you're using, full status output:
> > 0 20480 cache 15/1024 0 19 0 0 0 0 0 0 1 writethrough 2 migration_threshold 32768 4 random_threshold 4 sequential_threshold 512
> >
> > So the big difference is the thinp-test-suite uses intermediate linear
> > DM layers above the slower sd device (spindle/data) -- whereas in your
> > setup the origin device is direct to sd (8:0).
> >
> > I'll re-run with the origin directly on sd in the morning and will
> > report back.
>
> Interesting ... if I set up this:
>
> # echo "0 67108864 linear /dev/sda 0" | dmsetup create origin
>
> And then repeat the test, but using /dev/mapper/origin as the origin instead
> of /dev/sda, the problem goes away.

Using the extra dm-linear layer is implicitly leveraging the DM core's
bio cloning to restore the original bio that was sent to the linear
target.

But even after having changed my test to use /dev/sdb for the origin
device I cannot reproduce the problem you've reported. Do you have any
further details on how/why the bios are being altered? Are you
reliably hitting partial completions within the origin's driver? If so
how?

Having looked at this for a bit it seems pretty clear writethrough_endio
is missing partial completion handling, e.g.:
if (!bio_flagged(bio, BIO_UPTODATE) && !err)
err = -EIO;

But I haven't yet come to terms with what the partial completion
handling implementation needs to be for the writethrough support.

2013-03-23 21:13:13

by Alasdair G Kergon

[permalink] [raw]
Subject: Re: [dm-devel] dm: dm-cache fails to write the cache device in writethrough mode

On Fri, Mar 22, 2013 at 10:15:49PM -0700, Darrick J. Wong wrote:
> Interesting ... if I set up this:
> # echo "0 67108864 linear /dev/sda 0" | dmsetup create origin
> And then repeat the test, but using /dev/mapper/origin as the origin instead
> of /dev/sda, the problem goes away.

So that looks like an easy and correct workaround for now.

Alasdair

2013-03-23 22:56:59

by Mike Snitzer

[permalink] [raw]
Subject: Re: dm: dm-cache fails to write the cache device in writethrough mode

On Sat, Mar 23 2013 at 5:08pm -0400,
Mike Snitzer <[email protected]> wrote:

> But even after having changed my test to use /dev/sdb for the origin
> device I cannot reproduce the problem you've reported. Do you have any
> further details on how/why the bios are being altered? Are you
> reliably hitting partial completions within the origin's driver? If so
> how?

I can easily see bio->bi_size being 0 in writethrough_endio, here is the
stack trace from a WARN_ON_ONCE(!bio->bi_size); that I added to
writethrough_endio:

Call Trace:
<IRQ> [<ffffffff81042d7f>] warn_slowpath_common+0x7f/0xc0
[<ffffffff81042dda>] warn_slowpath_null+0x1a/0x20
[<ffffffffa072f56f>] writethrough_endio+0x13f/0x150 [dm_cache]
[<ffffffff811a30dd>] bio_endio+0x3d/0x90
[<ffffffff81233853>] req_bio_endio+0xa3/0xe0
[<ffffffff81234f6f>] blk_update_request+0x10f/0x480
[<ffffffff81235307>] blk_update_bidi_request+0x27/0xb0
[<ffffffff8123651f>] blk_end_bidi_request+0x2f/0x80
[<ffffffff812365c0>] blk_end_request+0x10/0x20
[<ffffffff81363680>] scsi_end_request+0x40/0xb0
[<ffffffff81081737>] ? entity_tick+0x97/0x420
[<ffffffff813639ff>] scsi_io_completion+0x9f/0x660
[<ffffffff8104baa9>] ? raise_softirq_irqoff+0x9/0x50
[<ffffffff8135ad89>] scsi_finish_command+0xc9/0x130
[<ffffffff81364127>] scsi_softirq_done+0x147/0x170
[<ffffffff8123ca42>] blk_done_softirq+0x82/0xa0
[<ffffffff8104b697>] __do_softirq+0xe7/0x260
[<ffffffff811a21a5>] ? bio_alloc_bioset+0x65/0x120
[<ffffffff8150fc9c>] call_softirq+0x1c/0x30
[<ffffffff81004415>] do_softirq+0x65/0xa0
[<ffffffff8104b46d>] irq_exit+0xbd/0xe0
[<ffffffff81510396>] do_IRQ+0x66/0xe0
[<ffffffff815061ed>] common_interrupt+0x6d/0x6d

No idea why I was so oblivious to a bio->bi_size of 0 reflecting
completion. So nothing to do with partial completion at all.

Here is a version of the patch you posted that uses
dm_bio_{record,restore} like Alasdair suggested:

diff --git a/drivers/md/dm-cache-target.c b/drivers/md/dm-cache-target.c
index 66120bd..90b1dd2 100644
--- a/drivers/md/dm-cache-target.c
+++ b/drivers/md/dm-cache-target.c
@@ -5,6 +5,7 @@
*/

#include "dm.h"
+#include "dm-bio-record.h"
#include "dm-bio-prison.h"
#include "dm-cache-metadata.h"

@@ -205,6 +206,7 @@ struct per_bio_data {
struct cache *cache;
dm_cblock_t cblock;
bio_end_io_t *saved_bi_end_io;
+ struct dm_bio_details bio_details;
};

struct dm_cache_migration {
@@ -643,6 +645,7 @@ static void writethrough_endio(struct bio *bio, int err)
return;
}

+ dm_bio_restore(&pb->bio_details, bio);
remap_to_cache(pb->cache, bio, pb->cblock);

/*
@@ -668,6 +671,7 @@ static void remap_to_origin_then_cache(struct cache *cache, struct bio *bio,
pb->cblock = cblock;
pb->saved_bi_end_io = bio->bi_end_io;
bio->bi_end_io = writethrough_endio;
+ dm_bio_record(&pb->bio_details, bio);

remap_to_origin_clear_discard(pb->cache, bio, oblock);
}

2013-03-25 10:28:41

by Joe Thornber

[permalink] [raw]
Subject: Re: [PATCH] dm: dm-cache fails to write the cache device in writethrough mode

Hi Darrick,

On Fri, Mar 22, 2013 at 01:11:51PM -0700, Darrick J. Wong wrote:
> The new writethrough strategy for dm-cache issues a bio to the origin device,
> remaps the bio to the cache device, and issues the bio to the cache device.
> However, the block layer modifies bi_sector and bi_size, so we need to preserve
> these or else nothing gets written to the cache (bi_size == 0). This fixes the
> problem where someone writes a block through the cache, but a subsequent reread
> (from the cache) returns old contents.

Thanks for diagnosing this, as you may have gathered none of the
hardware we're testing on does this. I've pushed a patch to dm-devel
that hopefully will make it's way upstream very quickly. I used
dm_bio_record/restore, since there are many more fields that
underlying drivers make tweak in the bio.

- Joe

2013-03-25 23:59:23

by Mike Snitzer

[permalink] [raw]
Subject: [PATCH v2] dm cache: fix writes to cache device in writethrough mode

From: Darrick J. Wong <[email protected]>

The dm-cache writethrough strategy introduced by commit e2e74d617eadc15
("dm cache: fix race in writethrough implementation") issues a bio to
the origin device, remaps and then issues the bio to the cache device.
This more conservative in-series approach was selected to favor
correctness over performance (of the previous parallel writethrough).
However, this in-series implementation that reuses the same bio to write
both the origin and cache device didn't take into account that the block
layer's req_bio_endio() modifies a completing bio's bi_sector and
bi_size. So the new writethrough strategy needs to preserve these bio
fields, and restore them before submission to the cache device,
otherwise nothing gets written to the cache (because bi_size is 0).

This patch adds a struct dm_bio_details field to struct per_bio_data,
and uses dm_bio_record() and dm_bio_restore() to ensure the bio is
restored before reissuing to the cache device. Adding such a large
structure to the per_bio_data is not ideal but we can improve this
later, for now correctness is the important thing.

This problem initially went unnoticed because the dm-cache test-suite
uses a linear DM device for the dm-cache device's origin device.
Writethrough worked as expected because DM submits a *clone* of the
original bio, so the original bio which was reused for the cache was
never touched.

Signed-off-by: Darrick J. Wong <[email protected]>
Signed-off-by: Joe Thornber <[email protected]>
Signed-off-by: Mike Snitzer <[email protected]>
---
drivers/md/dm-cache-target.c | 4 ++++
1 files changed, 4 insertions(+), 0 deletions(-)

v2: revised patch header on Darrick and Joe's behalf

diff --git a/drivers/md/dm-cache-target.c b/drivers/md/dm-cache-target.c
index cf24a46..873f495 100644
--- a/drivers/md/dm-cache-target.c
+++ b/drivers/md/dm-cache-target.c
@@ -6,6 +6,7 @@

#include "dm.h"
#include "dm-bio-prison.h"
+#include "dm-bio-record.h"
#include "dm-cache-metadata.h"

#include <linux/dm-io.h>
@@ -205,6 +206,7 @@ struct per_bio_data {
struct cache *cache;
dm_cblock_t cblock;
bio_end_io_t *saved_bi_end_io;
+ struct dm_bio_details bio_details;
};

struct dm_cache_migration {
@@ -642,6 +644,7 @@ static void writethrough_endio(struct bio *bio, int err)
return;
}

+ dm_bio_restore(&pb->bio_details, bio);
remap_to_cache(pb->cache, bio, pb->cblock);

/*
@@ -665,6 +668,7 @@ static void remap_to_origin_then_cache(struct cache *cache, struct bio *bio,
pb->cache = cache;
pb->cblock = cblock;
pb->saved_bi_end_io = bio->bi_end_io;
+ dm_bio_record(&pb->bio_details, bio);
bio->bi_end_io = writethrough_endio;

remap_to_origin_clear_discard(pb->cache, bio, oblock);