Hey,
We've managed to track down some hard to hit bugs that were introduced
in two relatively recent commits. The fixes are both one liners.
Thanks,
Logan
--
David Sloan (1):
md/raid5: Remove unnecessary bio_put() in raid5_read_one_chunk()
Logan Gunthorpe (1):
md: Remove extra mddev_get() in md_seq_start()
drivers/md/md.c | 1 -
drivers/md/raid5.c | 1 -
2 files changed, 2 deletions(-)
base-commit: 526bd69b9d330eed1db59b2cf6a7d18caf866847
--
2.30.2
From: David Sloan <[email protected]>
When running chunk-sized reads on disks with badblocks duplicate bio
free/puts are observed:
=============================================================================
BUG bio-200 (Not tainted): Object already free
-----------------------------------------------------------------------------
Allocated in mempool_alloc_slab+0x17/0x20 age=3 cpu=2 pid=7504
__slab_alloc.constprop.0+0x5a/0xb0
kmem_cache_alloc+0x31e/0x330
mempool_alloc_slab+0x17/0x20
mempool_alloc+0x100/0x2b0
bio_alloc_bioset+0x181/0x460
do_mpage_readpage+0x776/0xd00
mpage_readahead+0x166/0x320
blkdev_readahead+0x15/0x20
read_pages+0x13f/0x5f0
page_cache_ra_unbounded+0x18d/0x220
force_page_cache_ra+0x181/0x1c0
page_cache_sync_ra+0x65/0xb0
filemap_get_pages+0x1df/0xaf0
filemap_read+0x1e1/0x700
blkdev_read_iter+0x1e5/0x330
vfs_read+0x42a/0x570
Freed in mempool_free_slab+0x17/0x20 age=3 cpu=2 pid=7504
kmem_cache_free+0x46d/0x490
mempool_free_slab+0x17/0x20
mempool_free+0x66/0x190
bio_free+0x78/0x90
bio_put+0x100/0x1a0
raid5_make_request+0x2259/0x2450
md_handle_request+0x402/0x600
md_submit_bio+0xd9/0x120
__submit_bio+0x11f/0x1b0
submit_bio_noacct_nocheck+0x204/0x480
submit_bio_noacct+0x32e/0xc70
submit_bio+0x98/0x1a0
mpage_readahead+0x250/0x320
blkdev_readahead+0x15/0x20
read_pages+0x13f/0x5f0
page_cache_ra_unbounded+0x18d/0x220
Slab 0xffffea000481b600 objects=21 used=0 fp=0xffff8881206d8940 flags=0x17ffffc0010201(locked|slab|head|node=0|zone=2|lastcpupid=0x1fffff)
CPU: 0 PID: 34525 Comm: kworker/u24:2 Not tainted 6.0.0-rc2-localyes-265166-gf11c5343fa3f #143
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.13.0-1ubuntu1.1 04/01/2014
Workqueue: raid5wq raid5_do_work
Call Trace:
<TASK>
dump_stack_lvl+0x5a/0x78
dump_stack+0x10/0x16
print_trailer+0x158/0x165
object_err+0x35/0x50
free_debug_processing.cold+0xb7/0xbe
__slab_free+0x1ae/0x330
kmem_cache_free+0x46d/0x490
mempool_free_slab+0x17/0x20
mempool_free+0x66/0x190
bio_free+0x78/0x90
bio_put+0x100/0x1a0
mpage_end_io+0x36/0x150
bio_endio+0x2fd/0x360
md_end_io_acct+0x7e/0x90
bio_endio+0x2fd/0x360
handle_failed_stripe+0x960/0xb80
handle_stripe+0x1348/0x3760
handle_active_stripes.constprop.0+0x72a/0xaf0
raid5_do_work+0x177/0x330
process_one_work+0x616/0xb20
worker_thread+0x2bd/0x6f0
kthread+0x179/0x1b0
ret_from_fork+0x22/0x30
</TASK>
The double free is caused by an unnecessary bio_put() in the
if(is_badblock(...)) error path in raid5_read_one_chunk().
The error path was moved ahead of bio_alloc_clone() in c82aa1b76787c
("md/raid5: move checking badblock before clone bio in
raid5_read_one_chunk"). The previous code checked and freed align_bio
which required a bio_put. After he move that is no longer needed as
raid_bio is returned to the control of the common io path which
performs its own endio resulting in a double free on bad device blocks.
Fixes: c82aa1b76787c ("md/raid5: move checking badblock before clone bio in raid5_read_one_chunk")
Signed-off-by: David Sloan <[email protected]>
[[email protected]: minor rewording of commit message]
Signed-off-by: Logan Gunthorpe <[email protected]>
---
drivers/md/raid5.c | 1 -
1 file changed, 1 deletion(-)
diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index 4e6d865a6456..734f92e75f85 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -5538,7 +5538,6 @@ static int raid5_read_one_chunk(struct mddev *mddev, struct bio *raid_bio)
if (is_badblock(rdev, sector, bio_sectors(raid_bio), &first_bad,
&bad_sectors)) {
- bio_put(raid_bio);
rdev_dec_pending(rdev, mddev);
return 0;
}
--
2.30.2
A regression is seen where mddev devices stay permanently after they
are stopped due to an elevated reference count.
This was tracked down to an extra mddev_get() in md_seq_start().
It only happened rarely because most of the time the md_seq_start()
is called with a zero offset. The path with an extra mddev_get() only
happens when it starts with a non-zero offset.
The commit noted below changed an mddev_get() to check its success
but inadevrtantly left the original call in. Remove the extra call.
Fixes: 12a6caf27324 ("md: only delete entries from all_mddevs when the disk is freed")
Signed-off-by: Logan Gunthorpe <[email protected]>
---
drivers/md/md.c | 1 -
1 file changed, 1 deletion(-)
diff --git a/drivers/md/md.c b/drivers/md/md.c
index afaf36b2f6ab..9dc0175280b4 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -8154,7 +8154,6 @@ static void *md_seq_start(struct seq_file *seq, loff_t *pos)
list_for_each(tmp,&all_mddevs)
if (!l--) {
mddev = list_entry(tmp, struct mddev, all_mddevs);
- mddev_get(mddev);
if (!mddev_get(mddev))
continue;
spin_unlock(&all_mddevs_lock);
--
2.30.2
Looks good:
Reviewed-by: Christoph Hellwig <[email protected]>
Dear Logan, dear David,
Thank you for the patch.
Am 02.09.22 um 19:16 schrieb Logan Gunthorpe:
> From: David Sloan <[email protected]>
>
> When running chunk-sized reads on disks with badblocks duplicate bio
> free/puts are observed:
>
> =============================================================================
> BUG bio-200 (Not tainted): Object already free
> -----------------------------------------------------------------------------
> Allocated in mempool_alloc_slab+0x17/0x20 age=3 cpu=2 pid=7504
> __slab_alloc.constprop.0+0x5a/0xb0
[…]
> The double free is caused by an unnecessary bio_put() in the
> if(is_badblock(...)) error path in raid5_read_one_chunk().
>
> The error path was moved ahead of bio_alloc_clone() in c82aa1b76787c
> ("md/raid5: move checking badblock before clone bio in
> raid5_read_one_chunk"). The previous code checked and freed align_bio
> which required a bio_put. After he move that is no longer needed as
*t*he move
> raid_bio is returned to the control of the common io path which
> performs its own endio resulting in a double free on bad device blocks.
>
> Fixes: c82aa1b76787c ("md/raid5: move checking badblock before clone bio in raid5_read_one_chunk")
This commit was added to Linux in v5.14-rc1, so it’s not a recent commit
– or I misunderstood the cover letter.
Kind regards,
Paul
> Signed-off-by: David Sloan <[email protected]>
> [[email protected]: minor rewording of commit message]
> Signed-off-by: Logan Gunthorpe <[email protected]>
> ---
> drivers/md/raid5.c | 1 -
> 1 file changed, 1 deletion(-)
>
> diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
> index 4e6d865a6456..734f92e75f85 100644
> --- a/drivers/md/raid5.c
> +++ b/drivers/md/raid5.c
> @@ -5538,7 +5538,6 @@ static int raid5_read_one_chunk(struct mddev *mddev, struct bio *raid_bio)
>
> if (is_badblock(rdev, sector, bio_sectors(raid_bio), &first_bad,
> &bad_sectors)) {
> - bio_put(raid_bio);
> rdev_dec_pending(rdev, mddev);
> return 0;
> }
Looks good:
Reviewed-by: Christoph Hellwig <[email protected]>
On 9/3/22 1:16 AM, Logan Gunthorpe wrote:
> A regression is seen where mddev devices stay permanently after they
> are stopped due to an elevated reference count.
>
> This was tracked down to an extra mddev_get() in md_seq_start().
>
> It only happened rarely because most of the time the md_seq_start()
> is called with a zero offset. The path with an extra mddev_get() only
> happens when it starts with a non-zero offset.
>
> The commit noted below changed an mddev_get() to check its success
> but inadevrtantly left the original call in. Remove the extra call.
>
> Fixes: 12a6caf27324 ("md: only delete entries from all_mddevs when the disk is freed")
> Signed-off-by: Logan Gunthorpe <[email protected]>
> ---
> drivers/md/md.c | 1 -
> 1 file changed, 1 deletion(-)
>
> diff --git a/drivers/md/md.c b/drivers/md/md.c
> index afaf36b2f6ab..9dc0175280b4 100644
> --- a/drivers/md/md.c
> +++ b/drivers/md/md.c
> @@ -8154,7 +8154,6 @@ static void *md_seq_start(struct seq_file *seq, loff_t *pos)
> list_for_each(tmp,&all_mddevs)
> if (!l--) {
> mddev = list_entry(tmp, struct mddev, all_mddevs);
> - mddev_get(mddev);
> if (!mddev_get(mddev))
> continue;
> spin_unlock(&all_mddevs_lock);
Acked-by: Guoqing Jiang <[email protected]>
Thanks,
Guoqing
On 9/3/22 1:16 AM, Logan Gunthorpe wrote:
> From: David Sloan <[email protected]>
>
> When running chunk-sized reads on disks with badblocks duplicate bio
> free/puts are observed:
>
> =============================================================================
> BUG bio-200 (Not tainted): Object already free
> -----------------------------------------------------------------------------
> Allocated in mempool_alloc_slab+0x17/0x20 age=3 cpu=2 pid=7504
> __slab_alloc.constprop.0+0x5a/0xb0
> kmem_cache_alloc+0x31e/0x330
> mempool_alloc_slab+0x17/0x20
> mempool_alloc+0x100/0x2b0
> bio_alloc_bioset+0x181/0x460
> do_mpage_readpage+0x776/0xd00
> mpage_readahead+0x166/0x320
> blkdev_readahead+0x15/0x20
> read_pages+0x13f/0x5f0
> page_cache_ra_unbounded+0x18d/0x220
> force_page_cache_ra+0x181/0x1c0
> page_cache_sync_ra+0x65/0xb0
> filemap_get_pages+0x1df/0xaf0
> filemap_read+0x1e1/0x700
> blkdev_read_iter+0x1e5/0x330
> vfs_read+0x42a/0x570
> Freed in mempool_free_slab+0x17/0x20 age=3 cpu=2 pid=7504
> kmem_cache_free+0x46d/0x490
> mempool_free_slab+0x17/0x20
> mempool_free+0x66/0x190
> bio_free+0x78/0x90
> bio_put+0x100/0x1a0
> raid5_make_request+0x2259/0x2450
> md_handle_request+0x402/0x600
> md_submit_bio+0xd9/0x120
> __submit_bio+0x11f/0x1b0
> submit_bio_noacct_nocheck+0x204/0x480
> submit_bio_noacct+0x32e/0xc70
> submit_bio+0x98/0x1a0
> mpage_readahead+0x250/0x320
> blkdev_readahead+0x15/0x20
> read_pages+0x13f/0x5f0
> page_cache_ra_unbounded+0x18d/0x220
> Slab 0xffffea000481b600 objects=21 used=0 fp=0xffff8881206d8940 flags=0x17ffffc0010201(locked|slab|head|node=0|zone=2|lastcpupid=0x1fffff)
> CPU: 0 PID: 34525 Comm: kworker/u24:2 Not tainted 6.0.0-rc2-localyes-265166-gf11c5343fa3f #143
> Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.13.0-1ubuntu1.1 04/01/2014
> Workqueue: raid5wq raid5_do_work
> Call Trace:
> <TASK>
> dump_stack_lvl+0x5a/0x78
> dump_stack+0x10/0x16
> print_trailer+0x158/0x165
> object_err+0x35/0x50
> free_debug_processing.cold+0xb7/0xbe
> __slab_free+0x1ae/0x330
> kmem_cache_free+0x46d/0x490
> mempool_free_slab+0x17/0x20
> mempool_free+0x66/0x190
> bio_free+0x78/0x90
> bio_put+0x100/0x1a0
> mpage_end_io+0x36/0x150
> bio_endio+0x2fd/0x360
> md_end_io_acct+0x7e/0x90
> bio_endio+0x2fd/0x360
> handle_failed_stripe+0x960/0xb80
> handle_stripe+0x1348/0x3760
> handle_active_stripes.constprop.0+0x72a/0xaf0
> raid5_do_work+0x177/0x330
> process_one_work+0x616/0xb20
> worker_thread+0x2bd/0x6f0
> kthread+0x179/0x1b0
> ret_from_fork+0x22/0x30
> </TASK>
>
> The double free is caused by an unnecessary bio_put() in the
> if(is_badblock(...)) error path in raid5_read_one_chunk().
>
> The error path was moved ahead of bio_alloc_clone() in c82aa1b76787c
> ("md/raid5: move checking badblock before clone bio in
> raid5_read_one_chunk"). The previous code checked and freed align_bio
> which required a bio_put. After he move that is no longer needed as
> raid_bio is returned to the control of the common io path which
> performs its own endio resulting in a double free on bad device blocks.
>
> Fixes: c82aa1b76787c ("md/raid5: move checking badblock before clone bio in raid5_read_one_chunk")
> Signed-off-by: David Sloan <[email protected]>
> [[email protected]: minor rewording of commit message]
> Signed-off-by: Logan Gunthorpe <[email protected]>
> ---
> drivers/md/raid5.c | 1 -
> 1 file changed, 1 deletion(-)
>
> diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
> index 4e6d865a6456..734f92e75f85 100644
> --- a/drivers/md/raid5.c
> +++ b/drivers/md/raid5.c
> @@ -5538,7 +5538,6 @@ static int raid5_read_one_chunk(struct mddev *mddev, struct bio *raid_bio)
>
> if (is_badblock(rdev, sector, bio_sectors(raid_bio), &first_bad,
> &bad_sectors)) {
> - bio_put(raid_bio);
> rdev_dec_pending(rdev, mddev);
> return 0;
> }
Acked-by: Guoqing Jiang <[email protected]>
Thanks,
Guoqing
On 2022-09-05 00:07, Paul Menzel wrote:
> This commit was added to Linux in v5.14-rc1, so it’s not a recent commit
> – or I misunderstood the cover letter.
Ah, yes, sorry. The commit it fixes is a bit more than a year old. I
guess it depends on how you define "recent". ;)
Logan