2015-02-09 16:39:01

by Dongsu Park

[permalink] [raw]
Subject: blk-mq crash with dm-multipath in for-3.20/core

Hi Jens,

during testing with the linux-block for-3.20/core branch, I hit a BUG
like below. It's reproducible by running xfstests/xfs/279.

Bisecting showed that the first bad commit is 6d6285c45f5a ("block:
require blk_rq_prep_clone() be given an initialized clone request").
With reverting this commit, the crash disappears.
The linux-dm's branch dm-for-3.20 works fine without crash too.

As pointed out already by Keith Busch in a thread, [1] that commit should
not be there in the first place. Commit 102e38b1030e ("dm: split
request structure out from dm_rq_target_io structure") from linux-dm tree
[2] is going to move the blk_rq_init() call again to __clone_rq().

So that commit 6d6285c45f5a should be either reverted, or moved to
linux-dm tree, doesn't it?

Cheers,
Dongsu

[1] https://www.redhat.com/archives/dm-devel/2015-January/msg00171.html
[2] https://git.kernel.org/cgit/linux/kernel/git/device-mapper/linux-dm.git/commit/?h=dm-for-3.20&id=102e38b1030e883efc022dfdc7b7e7a3de70d1c5

------------[ cut here ]------------
kernel BUG at block/blk-core.c:2333!
RIP: 0010: [<ffffffff814c6858>] blk_dequeue_request+0x78/0x90
Call Trace:
[<ffffffff814c6886>] blk_start_request+0x16/0x70
[<ffffffff8169f9fa>] dm_start_request+0x1a/0x50
[<ffffffff8169fce6>] dm_request_fn+0x2b6/0x3e0
[<ffffffff814c0087>] __blk_run_queue+0x37/0x50
[<ffffffff814c31ed>] queue_unplugged+0x5d/0x230
[<ffffffff814c710c>] blk_flush_plug_list+0x1ac/0x230
[<ffffffff814c7708>] blk_finish_plug+0x18/0x60
[<ffffffff811baea1>] __do_page_cache_readahead+0x2b1/0x320
[<ffffffff811bad55>] ? __do_page_cache_readahead+0x165/0x320
[<ffffffff811baff2>] ondemand_readahead+0xe2/0x480
[<ffffffff811ac3ff>] ? pagecache_get_page+0x2f/0x200
[<ffffffff811bb4c1>] page_cache_sync_readahead+0x31/0x50
[<ffffffff811ad5bc>] generic_file_read_iter+0x51c/0x630
[<ffffffff811dd00e>] ? might_fault+0x5e/0xc0
[<ffffffff81261e37>] blkdev_read_iter+0x37/0x40
[<ffffffff8121fa4e>] new_sync_read+0x7e/0xb0
[<ffffffff81220ce8>] __vfs_read+0x18/0x50
[<ffffffff81220dad>] vfs_read+0x8d/0x150
[<ffffffff81220eb9>] SyS_read+0x49/0xb0
[<ffffffff817dce52>] system_call_fastpath+0x12/0x17
RIP [<ffffffff814c6858>] blk_dequeue_request+0x78/0x90
RSP <ffff88006e1eba68>
---[ end trace dcfc3d438518b1aa ]---


2015-02-09 16:49:09

by Mike Snitzer

[permalink] [raw]
Subject: Re: blk-mq crash with dm-multipath in for-3.20/core

On Mon, Feb 09 2015 at 11:38am -0500,
Dongsu Park <[email protected]> wrote:

> Hi Jens,
>
> during testing with the linux-block for-3.20/core branch, I hit a BUG
> like below. It's reproducible by running xfstests/xfs/279.
>
> Bisecting showed that the first bad commit is 6d6285c45f5a ("block:
> require blk_rq_prep_clone() be given an initialized clone request").
> With reverting this commit, the crash disappears.
> The linux-dm's branch dm-for-3.20 works fine without crash too.
>
> As pointed out already by Keith Busch in a thread, [1] that commit should
> not be there in the first place. Commit 102e38b1030e ("dm: split
> request structure out from dm_rq_target_io structure") from linux-dm tree
> [2] is going to move the blk_rq_init() call again to __clone_rq().
>
> So that commit 6d6285c45f5a should be either reverted, or moved to
> linux-dm tree, doesn't it?
>
> Cheers,
> Dongsu
>
> [1] https://www.redhat.com/archives/dm-devel/2015-January/msg00171.html
> [2] https://git.kernel.org/cgit/linux/kernel/git/device-mapper/linux-dm.git/commit/?h=dm-for-3.20&id=102e38b1030e883efc022dfdc7b7e7a3de70d1c5

Right, we're aware of this typo in 6d6285c45f5a. Sorry about that, but
as you noted, once both the linux-block and linux-dm branches for 3.20
are merged all is back to working.

So we're planning to just leave the block commit as broken and let the
dm commit you noted fix it up. In the end 3.20-rc1 will have a working
dm-multipath.

Mike

2015-02-09 17:08:01

by Keith Busch

[permalink] [raw]
Subject: Re: blk-mq crash with dm-multipath in for-3.20/core

On Mon, 9 Feb 2015, Mike Snitzer wrote:
> On Mon, Feb 09 2015 at 11:38am -0500,
> Dongsu Park <[email protected]> wrote:
>> So that commit 6d6285c45f5a should be either reverted, or moved to
>> linux-dm tree, doesn't it?
>>
>> Cheers,
>> Dongsu
>>
>> [1] https://www.redhat.com/archives/dm-devel/2015-January/msg00171.html
>> [2] https://git.kernel.org/cgit/linux/kernel/git/device-mapper/linux-dm.git/commit/?h=dm-for-3.20&id=102e38b1030e883efc022dfdc7b7e7a3de70d1c5
>
> Right, we're aware of this typo in 6d6285c45f5a. Sorry about that, but
> as you noted, once both the linux-block and linux-dm branches for 3.20
> are merged all is back to working.
>
> So we're planning to just leave the block commit as broken and let the
> dm commit you noted fix it up. In the end 3.20-rc1 will have a working
> dm-multipath.

Oh, we're not going rebase the series with the correction? I'm concerned
someone biscecting a completely unrelated problem might step on this
commit. Up to you guys. It's my fault, so I'll deal with the consequences.

2015-02-09 17:13:27

by Mike Snitzer

[permalink] [raw]
Subject: Re: blk-mq crash with dm-multipath in for-3.20/core

On Mon, Feb 09 2015 at 12:07pm -0500,
Keith Busch <[email protected]> wrote:

> On Mon, 9 Feb 2015, Mike Snitzer wrote:
> >On Mon, Feb 09 2015 at 11:38am -0500,
> >Dongsu Park <[email protected]> wrote:
> >>So that commit 6d6285c45f5a should be either reverted, or moved to
> >>linux-dm tree, doesn't it?
> >>
> >>Cheers,
> >>Dongsu
> >>
> >>[1] https://www.redhat.com/archives/dm-devel/2015-January/msg00171.html
> >>[2] https://git.kernel.org/cgit/linux/kernel/git/device-mapper/linux-dm.git/commit/?h=dm-for-3.20&id=102e38b1030e883efc022dfdc7b7e7a3de70d1c5
> >
> >Right, we're aware of this typo in 6d6285c45f5a. Sorry about that, but
> >as you noted, once both the linux-block and linux-dm branches for 3.20
> >are merged all is back to working.
> >
> >So we're planning to just leave the block commit as broken and let the
> >dm commit you noted fix it up. In the end 3.20-rc1 will have a working
> >dm-multipath.
>
> Oh, we're not going rebase the series with the correction? I'm concerned
> someone biscecting a completely unrelated problem might step on this
> commit. Up to you guys. It's my fault, so I'll deal with the consequences.

Rebasing this late (3.20 merge already opened) is generally not done.
Unfortunately we'll just have to suck it up and deal with the fallout
during bisect -- would wager very few are bisecting with multipath as
the focus of their test.

2015-02-09 17:35:26

by Mike Snitzer

[permalink] [raw]
Subject: [PATCH] dm: fix multipath regression due to initializing wrong request

On Mon, Feb 09 2015 at 12:13P -0500,
Mike Snitzer <[email protected]> wrote:

> On Mon, Feb 09 2015 at 12:07pm -0500,
> Keith Busch <[email protected]> wrote:
>
> > Oh, we're not going rebase the series with the correction? I'm concerned
> > someone biscecting a completely unrelated problem might step on this
> > commit. Up to you guys. It's my fault, so I'll deal with the consequences.
>
> Rebasing this late (3.20 merge already opened) is generally not done.
> Unfortunately we'll just have to suck it up and deal with the fallout
> during bisect -- would wager very few are bisecting with multipath as
> the focus of their test.

Jens and I discussed this further and given that linux-block breaks
dm-multipath it is best to fix linux-block and let Linus resolve the
merge when I send him the linux-dm pull.

Here is the patch to fix the regression:

From: Mike Snitzer <[email protected]>
Date: Mon, 9 Feb 2015 12:21:54 -0500
Subject: [PATCH] dm: fix multipath regression due to initializing wrong request

Commit febf715 ("block: require blk_rq_prep_clone() be given an
initialized clone request") introduced a regression by calling
blk_rq_init() on the original request rather than the clone
request that is passed to setup_clone().

Signed-off-by: Mike Snitzer <[email protected]>
---
drivers/md/dm.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index f251633..71e6b73 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -1719,7 +1719,7 @@ static int setup_clone(struct request *clone, struct request *rq,
{
int r;

- blk_rq_init(NULL, rq);
+ blk_rq_init(NULL, clone);
r = blk_rq_prep_clone(clone, rq, tio->md->bs, GFP_ATOMIC,
dm_rq_bio_constructor, tio);
if (r)
--
1.9.3 (Apple Git-50)

2015-02-09 17:47:26

by Jens Axboe

[permalink] [raw]
Subject: Re: [PATCH] dm: fix multipath regression due to initializing wrong request

On 02/09/2015 10:35 AM, Mike Snitzer wrote:
> On Mon, Feb 09 2015 at 12:13P -0500,
> Mike Snitzer <[email protected]> wrote:
>
>> On Mon, Feb 09 2015 at 12:07pm -0500,
>> Keith Busch <[email protected]> wrote:
>>
>>> Oh, we're not going rebase the series with the correction? I'm concerned
>>> someone biscecting a completely unrelated problem might step on this
>>> commit. Up to you guys. It's my fault, so I'll deal with the consequences.
>>
>> Rebasing this late (3.20 merge already opened) is generally not done.
>> Unfortunately we'll just have to suck it up and deal with the fallout
>> during bisect -- would wager very few are bisecting with multipath as
>> the focus of their test.
>
> Jens and I discussed this further and given that linux-block breaks
> dm-multipath it is best to fix linux-block and let Linus resolve the
> merge when I send him the linux-dm pull.
>
> Here is the patch to fix the regression:

Added, thanks. I don't think this is worth rebasing for, so just added
to the top of for-3.20/core (since that's where the buggy commit was added).

--
Jens Axboe

2015-02-10 10:42:16

by Dongsu Park

[permalink] [raw]
Subject: Re: [PATCH] dm: fix multipath regression due to initializing wrong request

On 09.02.2015 10:47, Jens Axboe wrote:
> On 02/09/2015 10:35 AM, Mike Snitzer wrote:
> >On Mon, Feb 09 2015 at 12:13P -0500,
> >Mike Snitzer <[email protected]> wrote:
> >
> >Jens and I discussed this further and given that linux-block breaks
> >dm-multipath it is best to fix linux-block and let Linus resolve the
> >merge when I send him the linux-dm pull.
> >
> >Here is the patch to fix the regression:
>
> Added, thanks. I don't think this is worth rebasing for, so just added to
> the top of for-3.20/core (since that's where the buggy commit was added).

Thanks a lot. Now the branch for-3.20/core works without hitting the BUG.

Dongsu

> --
> Jens Axboe
>