2023-12-15 01:41:46

by Li Nan

Subject: [PATCH v2 0/2] md: fix is_mddev_idle()

From: Li Nan <[email protected]>

Changes in v2:
- patch 2: fix a typo.

Li Nan (2):
md: Fix overflow in is_mddev_idle
md: don't account sync_io if iostats of the disk is disabled

drivers/md/md.h | 7 ++++---
include/linux/blkdev.h | 2 +-
drivers/md/md.c | 11 ++++++++---
3 files changed, 13 insertions(+), 7 deletions(-)

--
2.39.2



2023-12-15 01:41:50

by Li Nan

Subject: [PATCH v2 1/2] md: Fix overflow in is_mddev_idle

From: Li Nan <[email protected]>

UBSAN reports this problem:

UBSAN: Undefined behaviour in drivers/md/md.c:8175:15
signed integer overflow:
-2147483291 - 2072033152 cannot be represented in type 'int'
Call trace:
dump_backtrace+0x0/0x310
show_stack+0x28/0x38
dump_stack+0xec/0x15c
ubsan_epilogue+0x18/0x84
handle_overflow+0x14c/0x19c
__ubsan_handle_sub_overflow+0x34/0x44
is_mddev_idle+0x338/0x3d8
md_do_sync+0x1bb8/0x1cf8
md_thread+0x220/0x288
kthread+0x1d8/0x1e0
ret_from_fork+0x10/0x18

'curr_events' will overflow when stat accum or 'sync_io' is greater than
INT_MAX.

Fix it by changing sync_io, last_events and curr_events to 64bit.

Signed-off-by: Li Nan <[email protected]>
---
drivers/md/md.h | 4 ++--
include/linux/blkdev.h | 2 +-
drivers/md/md.c | 7 ++++---
3 files changed, 7 insertions(+), 6 deletions(-)

diff --git a/drivers/md/md.h b/drivers/md/md.h
index ade83af123a2..1a4f976951c1 100644
--- a/drivers/md/md.h
+++ b/drivers/md/md.h
@@ -50,7 +50,7 @@ struct md_rdev {

sector_t sectors; /* Device size (in 512bytes sectors) */
struct mddev *mddev; /* RAID array if running */
- int last_events; /* IO event timestamp */
+ long long last_events; /* IO event timestamp */

/*
* If meta_bdev is non-NULL, it means that a separate device is
@@ -584,7 +584,7 @@ extern void mddev_unlock(struct mddev *mddev);

static inline void md_sync_acct(struct block_device *bdev, unsigned long nr_sectors)
{
- atomic_add(nr_sectors, &bdev->bd_disk->sync_io);
+ atomic64_add(nr_sectors, &bdev->bd_disk->sync_io);
}

static inline void md_sync_acct_bio(struct bio *bio, unsigned long nr_sectors)
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 3f8a21cd9233..d28b98adf457 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -170,7 +170,7 @@ struct gendisk {
struct list_head slave_bdevs;
#endif
struct timer_rand_state *random;
- atomic_t sync_io; /* RAID */
+ atomic64_t sync_io; /* RAID */
struct disk_events *ev;

#ifdef CONFIG_BLK_DEV_ZONED
diff --git a/drivers/md/md.c b/drivers/md/md.c
index c94373d64f2c..1d71b2a9af03 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -8496,14 +8496,15 @@ static int is_mddev_idle(struct mddev *mddev, int init)
{
struct md_rdev *rdev;
int idle;
- int curr_events;
+ long long curr_events;

idle = 1;
rcu_read_lock();
rdev_for_each_rcu(rdev, mddev) {
struct gendisk *disk = rdev->bdev->bd_disk;
- curr_events = (int)part_stat_read_accum(disk->part0, sectors) -
- atomic_read(&disk->sync_io);
+ curr_events =
+ (long long)part_stat_read_accum(disk->part0, sectors) -
+ atomic64_read(&disk->sync_io);
/* sync IO will cause sync_io to increase before the disk_stats
* as sync_io is counted when a request starts, and
* disk_stats is counted when it completes.
--
2.39.2


2023-12-15 01:42:20

by Li Nan

Subject: [PATCH v2 2/2] md: don't account sync_io if iostats of the disk is disabled

From: Li Nan <[email protected]>

If iostats is disabled, disk_stats will not be updated and
part_stat_read_accum() only returns a constant value. In this case,
continuing to count sync_io and to check is_mddev_idle() is no longer
meaningful.

Signed-off-by: Li Nan <[email protected]>
---
drivers/md/md.h | 3 ++-
drivers/md/md.c | 4 ++++
2 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/md/md.h b/drivers/md/md.h
index 1a4f976951c1..75f5c5d04e71 100644
--- a/drivers/md/md.h
+++ b/drivers/md/md.h
@@ -589,7 +589,8 @@ static inline void md_sync_acct(struct block_device *bdev, unsigned long nr_sect

static inline void md_sync_acct_bio(struct bio *bio, unsigned long nr_sectors)
{
- md_sync_acct(bio->bi_bdev, nr_sectors);
+ if (blk_queue_io_stat(bio->bi_bdev->bd_disk->queue))
+ md_sync_acct(bio->bi_bdev, nr_sectors);
}

struct md_personality
diff --git a/drivers/md/md.c b/drivers/md/md.c
index 1d71b2a9af03..18bbceb0ffd6 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -8502,6 +8502,10 @@ static int is_mddev_idle(struct mddev *mddev, int init)
rcu_read_lock();
rdev_for_each_rcu(rdev, mddev) {
struct gendisk *disk = rdev->bdev->bd_disk;
+
+ if (blk_queue_io_stat(mddev->queue))
+ continue;
+
curr_events =
(long long)part_stat_read_accum(disk->part0, sectors) -
atomic64_read(&disk->sync_io);
--
2.39.2


2023-12-15 09:08:32

by Song Liu

Subject: Re: [PATCH v2 2/2] md: don't account sync_io if iostats of the disk is disabled

On Thu, Dec 14, 2023 at 5:41 PM <[email protected]> wrote:
>
> From: Li Nan <[email protected]>
>
> If iostats is disabled, disk_stats will not be updated and
> part_stat_read_accum() only returns a constant value. In this case,
> continuing to count sync_io and to check is_mddev_idle() is no longer
> meaningful.
>
> Signed-off-by: Li Nan <[email protected]>
> ---
> drivers/md/md.h | 3 ++-
> drivers/md/md.c | 4 ++++
> 2 files changed, 6 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/md/md.h b/drivers/md/md.h
> index 1a4f976951c1..75f5c5d04e71 100644
> --- a/drivers/md/md.h
> +++ b/drivers/md/md.h
> @@ -589,7 +589,8 @@ static inline void md_sync_acct(struct block_device *bdev, unsigned long nr_sect
>
> static inline void md_sync_acct_bio(struct bio *bio, unsigned long nr_sectors)
> {
> - md_sync_acct(bio->bi_bdev, nr_sectors);
> + if (blk_queue_io_stat(bio->bi_bdev->bd_disk->queue))
> + md_sync_acct(bio->bi_bdev, nr_sectors);

Do we need the same check for md_sync_acct()?

> }
>
> struct md_personality
> diff --git a/drivers/md/md.c b/drivers/md/md.c
> index 1d71b2a9af03..18bbceb0ffd6 100644
> --- a/drivers/md/md.c
> +++ b/drivers/md/md.c
> @@ -8502,6 +8502,10 @@ static int is_mddev_idle(struct mddev *mddev, int init)
> rcu_read_lock();
> rdev_for_each_rcu(rdev, mddev) {
> struct gendisk *disk = rdev->bdev->bd_disk;
> +
> + if (blk_queue_io_stat(mddev->queue))
> + continue;

This looks weird. Do you mean

if (!blk_queue_io_stat(disk->queue))

Note that I changed two things here: 1) add "!"; 2) check disk, not mddev.

Did I miss something?

Thanks,
Song

> +
> curr_events =
> (long long)part_stat_read_accum(disk->part0, sectors) -
> atomic64_read(&disk->sync_io);
> --
> 2.39.2
>

2023-12-15 09:29:58

by Li Nan

Subject: Re: [PATCH v2 2/2] md: don't account sync_io if iostats of the disk is disabled



On 2023/12/15 17:08, Song Liu wrote:
> On Thu, Dec 14, 2023 at 5:41 PM <[email protected]> wrote:
>>
>> From: Li Nan <[email protected]>
>>
>> If iostats is disabled, disk_stats will not be updated and
>> part_stat_read_accum() only returns a constant value. In this case,
>> continuing to count sync_io and to check is_mddev_idle() is no longer
>> meaningful.
>>
>> Signed-off-by: Li Nan <[email protected]>
>> ---
>> drivers/md/md.h | 3 ++-
>> drivers/md/md.c | 4 ++++
>> 2 files changed, 6 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/md/md.h b/drivers/md/md.h
>> index 1a4f976951c1..75f5c5d04e71 100644
>> --- a/drivers/md/md.h
>> +++ b/drivers/md/md.h
>> @@ -589,7 +589,8 @@ static inline void md_sync_acct(struct block_device *bdev, unsigned long nr_sect
>>
>> static inline void md_sync_acct_bio(struct bio *bio, unsigned long nr_sectors)
>> {
>> - md_sync_acct(bio->bi_bdev, nr_sectors);
>> + if (blk_queue_io_stat(bio->bi_bdev->bd_disk->queue))
>> + md_sync_acct(bio->bi_bdev, nr_sectors);
>
> Do we need the same check for md_sync_acct()?
>
>> }
>>
>> struct md_personality
>> diff --git a/drivers/md/md.c b/drivers/md/md.c
>> index 1d71b2a9af03..18bbceb0ffd6 100644
>> --- a/drivers/md/md.c
>> +++ b/drivers/md/md.c
>> @@ -8502,6 +8502,10 @@ static int is_mddev_idle(struct mddev *mddev, int init)
>> rcu_read_lock();
>> rdev_for_each_rcu(rdev, mddev) {
>> struct gendisk *disk = rdev->bdev->bd_disk;
>> +
>> + if (blk_queue_io_stat(mddev->queue))
>> + continue;
>
> This looks weird. Do you mean
>
> if (!blk_queue_io_stat(disk->queue))
>
> Note that I changed two things here: 1) add "!"; 2) check disk, not mddev.
>
> Did I miss something?
>
> Thanks,
> Song
>

Yeah... It's my bad. I will fix it in v3.

>> +
>> curr_events =
>> (long long)part_stat_read_accum(disk->part0, sectors) -
>> atomic64_read(&disk->sync_io);
>> --
>> 2.39.2
>>
>

--
Thanks,
Nan


2023-12-15 23:21:31

by Song Liu

Subject: Re: [PATCH v2 1/2] md: Fix overflow in is_mddev_idle

On Thu, Dec 14, 2023 at 5:41 PM <[email protected]> wrote:
>
> From: Li Nan <[email protected]>
>
> UBSAN reports this problem:
>
> UBSAN: Undefined behaviour in drivers/md/md.c:8175:15
> signed integer overflow:
> -2147483291 - 2072033152 cannot be represented in type 'int'
> Call trace:
> dump_backtrace+0x0/0x310
> show_stack+0x28/0x38
> dump_stack+0xec/0x15c
> ubsan_epilogue+0x18/0x84
> handle_overflow+0x14c/0x19c
> __ubsan_handle_sub_overflow+0x34/0x44
> is_mddev_idle+0x338/0x3d8
> md_do_sync+0x1bb8/0x1cf8
> md_thread+0x220/0x288
> kthread+0x1d8/0x1e0
> ret_from_fork+0x10/0x18
>
> 'curr_events' will overflow when stat accum or 'sync_io' is greater than
> INT_MAX.
>
> Fix it by changing sync_io, last_events and curr_events to 64bit.
>
> Signed-off-by: Li Nan <[email protected]>
> ---
> drivers/md/md.h | 4 ++--
> include/linux/blkdev.h | 2 +-
> drivers/md/md.c | 7 ++++---
> 3 files changed, 7 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/md/md.h b/drivers/md/md.h
> index ade83af123a2..1a4f976951c1 100644
> --- a/drivers/md/md.h
> +++ b/drivers/md/md.h
> @@ -50,7 +50,7 @@ struct md_rdev {
>
> sector_t sectors; /* Device size (in 512bytes sectors) */
> struct mddev *mddev; /* RAID array if running */
> - int last_events; /* IO event timestamp */
> + long long last_events; /* IO event timestamp */
>
> /*
> * If meta_bdev is non-NULL, it means that a separate device is
> @@ -584,7 +584,7 @@ extern void mddev_unlock(struct mddev *mddev);
>
> static inline void md_sync_acct(struct block_device *bdev, unsigned long nr_sectors)
> {
> - atomic_add(nr_sectors, &bdev->bd_disk->sync_io);
> + atomic64_add(nr_sectors, &bdev->bd_disk->sync_io);
> }
>
> static inline void md_sync_acct_bio(struct bio *bio, unsigned long nr_sectors)
> diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
> index 3f8a21cd9233..d28b98adf457 100644
> --- a/include/linux/blkdev.h
> +++ b/include/linux/blkdev.h
> @@ -170,7 +170,7 @@ struct gendisk {
> struct list_head slave_bdevs;
> #endif
> struct timer_rand_state *random;
> - atomic_t sync_io; /* RAID */
> + atomic64_t sync_io; /* RAID */
> struct disk_events *ev;

As we are on this, I wonder whether we really need this.
AFAICT, is_mddev_idle() is the only consumer of sync_io.
We can probably do the same check in is_mddev_idle()
without sync_io.

Thanks,
Song


>
> #ifdef CONFIG_BLK_DEV_ZONED
> diff --git a/drivers/md/md.c b/drivers/md/md.c
> index c94373d64f2c..1d71b2a9af03 100644
> --- a/drivers/md/md.c
> +++ b/drivers/md/md.c
> @@ -8496,14 +8496,15 @@ static int is_mddev_idle(struct mddev *mddev, int init)
> {
> struct md_rdev *rdev;
> int idle;
> - int curr_events;
> + long long curr_events;
>
> idle = 1;
> rcu_read_lock();
> rdev_for_each_rcu(rdev, mddev) {
> struct gendisk *disk = rdev->bdev->bd_disk;
> - curr_events = (int)part_stat_read_accum(disk->part0, sectors) -
> - atomic_read(&disk->sync_io);
> + curr_events =
> + (long long)part_stat_read_accum(disk->part0, sectors) -
> + atomic64_read(&disk->sync_io);
> /* sync IO will cause sync_io to increase before the disk_stats
> * as sync_io is counted when a request starts, and
> * disk_stats is counted when it completes.
> --
> 2.39.2
>
>

2023-12-16 02:24:44

by Yu Kuai

Subject: Re: [PATCH v2 1/2] md: Fix overflow in is_mddev_idle

Hi,

On 2023/12/16 7:12, Song Liu wrote:
> On Thu, Dec 14, 2023 at 5:41 PM <[email protected]> wrote:
>>
>> From: Li Nan <[email protected]>
>>
>> UBSAN reports this problem:
>>
>> UBSAN: Undefined behaviour in drivers/md/md.c:8175:15
>> signed integer overflow:
>> -2147483291 - 2072033152 cannot be represented in type 'int'
>> Call trace:
>> dump_backtrace+0x0/0x310
>> show_stack+0x28/0x38
>> dump_stack+0xec/0x15c
>> ubsan_epilogue+0x18/0x84
>> handle_overflow+0x14c/0x19c
>> __ubsan_handle_sub_overflow+0x34/0x44
>> is_mddev_idle+0x338/0x3d8
>> md_do_sync+0x1bb8/0x1cf8
>> md_thread+0x220/0x288
>> kthread+0x1d8/0x1e0
>> ret_from_fork+0x10/0x18
>>
>> 'curr_events' will overflow when stat accum or 'sync_io' is greater than
>> INT_MAX.
>>
>> Fix it by changing sync_io, last_events and curr_events to 64bit.
>>
>> Signed-off-by: Li Nan <[email protected]>
>> ---
>> drivers/md/md.h | 4 ++--
>> include/linux/blkdev.h | 2 +-
>> drivers/md/md.c | 7 ++++---
>> 3 files changed, 7 insertions(+), 6 deletions(-)
>>
>> diff --git a/drivers/md/md.h b/drivers/md/md.h
>> index ade83af123a2..1a4f976951c1 100644
>> --- a/drivers/md/md.h
>> +++ b/drivers/md/md.h
>> @@ -50,7 +50,7 @@ struct md_rdev {
>>
>> sector_t sectors; /* Device size (in 512bytes sectors) */
>> struct mddev *mddev; /* RAID array if running */
>> - int last_events; /* IO event timestamp */
>> + long long last_events; /* IO event timestamp */
>>
>> /*
>> * If meta_bdev is non-NULL, it means that a separate device is
>> @@ -584,7 +584,7 @@ extern void mddev_unlock(struct mddev *mddev);
>>
>> static inline void md_sync_acct(struct block_device *bdev, unsigned long nr_sectors)
>> {
>> - atomic_add(nr_sectors, &bdev->bd_disk->sync_io);
>> + atomic64_add(nr_sectors, &bdev->bd_disk->sync_io);
>> }
>>
>> static inline void md_sync_acct_bio(struct bio *bio, unsigned long nr_sectors)
>> diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
>> index 3f8a21cd9233..d28b98adf457 100644
>> --- a/include/linux/blkdev.h
>> +++ b/include/linux/blkdev.h
>> @@ -170,7 +170,7 @@ struct gendisk {
>> struct list_head slave_bdevs;
>> #endif
>> struct timer_rand_state *random;
>> - atomic_t sync_io; /* RAID */
>> + atomic64_t sync_io; /* RAID */
>> struct disk_events *ev;
>
> As we are on this, I wonder whether we really need this.
> AFAICT, is_mddev_idle() is the only consumer of sync_io.
> We can probably do the same check in is_mddev_idle()
> without sync_io.

After reviewing some code, my understanding is the following:

I think 'sync_io' is used to distinguish 'sync io' issued by raid (which
can come from a different array, for example when different partitions
of the same disk are used for different arrays) from other io (any io,
even io that is not from raid). If there is any other IO, sync speed is
limited to min; otherwise it is limited to max.

If we want to keep this behaviour, I can't think of any other way for
now...

>
> Thanks,
> Song
>
>
>>
>> #ifdef CONFIG_BLK_DEV_ZONED
>> diff --git a/drivers/md/md.c b/drivers/md/md.c
>> index c94373d64f2c..1d71b2a9af03 100644
>> --- a/drivers/md/md.c
>> +++ b/drivers/md/md.c
>> @@ -8496,14 +8496,15 @@ static int is_mddev_idle(struct mddev *mddev, int init)
>> {
>> struct md_rdev *rdev;
>> int idle;
>> - int curr_events;
>> + long long curr_events;
>>
>> idle = 1;
>> rcu_read_lock();
>> rdev_for_each_rcu(rdev, mddev) {
>> struct gendisk *disk = rdev->bdev->bd_disk;
>> - curr_events = (int)part_stat_read_accum(disk->part0, sectors) -
>> - atomic_read(&disk->sync_io);
>> + curr_events =
>> + (long long)part_stat_read_accum(disk->part0, sectors) -
>> + atomic64_read(&disk->sync_io);

By the way, I don't think this overflow is problematic. Assume that
sectors accumulates by 'A' and sync_io accumulates by 'B':

((sectors + A) - (sync_io + B)) - (sectors - sync_io) = A - B

No matter whether overflow or truncation happens in the above addition
and subtraction, the result is correct.

And even if io accounting is disabled, which means sectors and A are
always 0, the result will always be -B, which is <= 0, hence
is_mddev_idle() will always return true, and sync_speed will be limited
to max in this case.

Thanks,
Kuai

>> /* sync IO will cause sync_io to increase before the disk_stats
>> * as sync_io is counted when a request starts, and
>> * disk_stats is counted when it completes.
>> --
>> 2.39.2
>>
>>
>


2023-12-17 00:28:38

by Song Liu

Subject: Re: [PATCH v2 1/2] md: Fix overflow in is_mddev_idle

On Fri, Dec 15, 2023 at 6:24 PM Yu Kuai <[email protected]> wrote:
[...]
> >> static inline void md_sync_acct_bio(struct bio *bio, unsigned long nr_sectors)
> >> diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
> >> index 3f8a21cd9233..d28b98adf457 100644
> >> --- a/include/linux/blkdev.h
> >> +++ b/include/linux/blkdev.h
> >> @@ -170,7 +170,7 @@ struct gendisk {
> >> struct list_head slave_bdevs;
> >> #endif
> >> struct timer_rand_state *random;
> >> - atomic_t sync_io; /* RAID */
> >> + atomic64_t sync_io; /* RAID */
> >> struct disk_events *ev;
> >
> > As we are on this, I wonder whether we really need this.
> > AFAICT, is_mddev_idle() is the only consumer of sync_io.
> > We can probably do the same check in is_mddev_idle()
> > without sync_io.
>
> After reviewing some code, my understanding is the following:
>
> I think 'sync_io' is used to distinguish 'sync io' issued by raid (which
> can come from a different array, for example when different partitions
> of the same disk are used for different arrays) from other io (any io,
> even io that is not from raid). If there is any other IO, sync speed is
> limited to min; otherwise it is limited to max.
>
> If we want to keep this behaviour, I can't think of any other way for
> now...

Thanks for looking into this. To keep current behavior, we will need it
in gendisk.

[...]

> >> @@ -8496,14 +8496,15 @@ static int is_mddev_idle(struct mddev *mddev, int init)
> >> {
> >> struct md_rdev *rdev;
> >> int idle;
> >> - int curr_events;
> >> + long long curr_events;
> >>
> >> idle = 1;
> >> rcu_read_lock();
> >> rdev_for_each_rcu(rdev, mddev) {
> >> struct gendisk *disk = rdev->bdev->bd_disk;
> >> - curr_events = (int)part_stat_read_accum(disk->part0, sectors) -
> >> - atomic_read(&disk->sync_io);
> >> + curr_events =
> >> + (long long)part_stat_read_accum(disk->part0, sectors) -
> >> + atomic64_read(&disk->sync_io);
>
> By the way, I don't think this overflow is problematic. Assume that
> sectors accumulates by 'A' and sync_io accumulates by 'B':
>
> ((sectors + A) - (sync_io + B)) - (sectors - sync_io) = A - B
>
> No matter whether overflow or truncation happens in the above addition
> and subtraction, the result is correct.
>
> And even if io accounting is disabled, which means sectors and A are
> always 0, the result will always be -B, which is <= 0, hence
> is_mddev_idle() will always return true, and sync_speed will be limited
> to max in this case.

We only use this for the idle-or-not check, so the behavior is OK (I
think). However, this logic is error-prone.

On 64-bit systems, there is a 4-byte hole behind sync_io. I think we can
just use it for atomic64_t so that we don't have to worry about overflow.

Thanks,
Song

2023-12-18 01:39:47

by Yu Kuai

Subject: Re: [PATCH v2 1/2] md: Fix overflow in is_mddev_idle

Hi,

On 2023/12/17 8:28, Song Liu wrote:
> On Fri, Dec 15, 2023 at 6:24 PM Yu Kuai <[email protected]> wrote:
> [...]
>>>> static inline void md_sync_acct_bio(struct bio *bio, unsigned long nr_sectors)
>>>> diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
>>>> index 3f8a21cd9233..d28b98adf457 100644
>>>> --- a/include/linux/blkdev.h
>>>> +++ b/include/linux/blkdev.h
>>>> @@ -170,7 +170,7 @@ struct gendisk {
>>>> struct list_head slave_bdevs;
>>>> #endif
>>>> struct timer_rand_state *random;
>>>> - atomic_t sync_io; /* RAID */
>>>> + atomic64_t sync_io; /* RAID */
>>>> struct disk_events *ev;
>>>
>>> As we are on this, I wonder whether we really need this.
>>> AFAICT, is_mddev_idle() is the only consumer of sync_io.
>>> We can probably do the same check in is_mddev_idle()
>>> without sync_io.
>>
>> After reviewing some code, my understanding is the following:
>>
>> I think 'sync_io' is used to distinguish 'sync io' issued by raid (which
>> can come from a different array, for example when different partitions
>> of the same disk are used for different arrays) from other io (any io,
>> even io that is not from raid). If there is any other IO, sync speed is
>> limited to min; otherwise it is limited to max.
>>
>> If we want to keep this behaviour, I can't think of any other way for
>> now...
>
> Thanks for looking into this. To keep current behavior, we will need it
> in gendisk.
>
> [...]
>
>>>> @@ -8496,14 +8496,15 @@ static int is_mddev_idle(struct mddev *mddev, int init)
>>>> {
>>>> struct md_rdev *rdev;
>>>> int idle;
>>>> - int curr_events;
>>>> + long long curr_events;
>>>>
>>>> idle = 1;
>>>> rcu_read_lock();
>>>> rdev_for_each_rcu(rdev, mddev) {
>>>> struct gendisk *disk = rdev->bdev->bd_disk;
>>>> - curr_events = (int)part_stat_read_accum(disk->part0, sectors) -
>>>> - atomic_read(&disk->sync_io);
>>>> + curr_events =
>>>> + (long long)part_stat_read_accum(disk->part0, sectors) -
>>>> + atomic64_read(&disk->sync_io);
>>
>> By the way, I don't think this overflow is problematic. Assume that
>> sectors accumulates by 'A' and sync_io accumulates by 'B':
>>
>> ((sectors + A) - (sync_io + B)) - (sectors - sync_io) = A - B
>>
>> No matter whether overflow or truncation happens in the above addition
>> and subtraction, the result is correct.
>>
>> And even if io accounting is disabled, which means sectors and A are
>> always 0, the result will always be -B, which is <= 0, hence
>> is_mddev_idle() will always return true, and sync_speed will be limited
>> to max in this case.
>
> We only use this for idle or not check, the behavior is OK (I think).
> However, this logic is error prone.
>
> On 64-bit systems, there is a 4-byte hole behind sync_io. I think we can
> just use it for atomic64_t so that we don't have to worry about overflow.

I'm not sure about this, because other than this ubsan warning, this
overflow doesn't have any impact on functionality as far as I can see.

If we care about this 'hole', there are lots of holes in gendisk that
can be avoided, for example by moving 'sync_io' next to 'node_id'.

Thanks,
Kuai

>
> Thanks,
> Song
>


2023-12-18 16:13:21

by Song Liu

Subject: Re: [PATCH v2 1/2] md: Fix overflow in is_mddev_idle

On Sun, Dec 17, 2023 at 5:39 PM Yu Kuai <[email protected]> wrote:
>
[...]
> >
> > We only use this for idle or not check, the behavior is OK (I think).
> > However, this logic is error prone.
> >
> > On 64-bit systems, there is a 4-byte hole behind sync_io. I think we can
> > just use it for atomic64_t so that we don't have to worry about overflow.
>
> I'm not sure about this, because other than this ubsan warning, this
> overflow doesn't have any impact on functionality as far as I can see.

Fixing warnings for zero or low cost is always a good idea. It helps boost
the signal when UBSAN (and other debug features) detects real issues.

> If we care about this 'hole', there are lots of holes in gendisk that
> can be avoided, for example by moving 'sync_io' next to 'node_id'.

The point was not "let's fill the hole", but "we can use atomic64_t
without extra memory cost". In general, I don't think we care too
much about holes in "struct gendisk".

Does this make sense?

Thanks,
Song

2023-12-19 01:40:17

by Yu Kuai

Subject: Re: [PATCH v2 1/2] md: Fix overflow in is_mddev_idle

Hi,

On 2023/12/19 0:04, Song Liu wrote:
> On Sun, Dec 17, 2023 at 5:39 PM Yu Kuai <[email protected]> wrote:
>>
> [...]
>>>
>>> We only use this for idle or not check, the behavior is OK (I think).
>>> However, this logic is error prone.
>>>
>>> On 64-bit systems, there is a 4-byte hole behind sync_io. I think we can
>>> just use it for atomic64_t so that we don't have to worry about overflow.
>>
>> I'm not sure about this, because other than this ubsan warning, this
>> overflow doesn't have any impact on functionality as far as I can see.
>
> Fixing warnings for zero or low cost is always a good idea. It helps boost
> the signal when UBSAN (and other debug features) detects real issues.
>
>> If we care about this 'hole', there are lots of holes in gendisk that
>> can be avoided, for example by moving 'sync_io' next to 'node_id'.
>
> The point was not "let's fill the hole", but "we can use atomic64_t
> without extra memory cost". In general, I don't think we care too
> much about holes in "struct gendisk".
>
> Does this make sense?

Of course, I don't have a strong preference on this. Because our
syzkaller reported lots of UBSAN warnings, internally we only fix the
real issues. For upstream, I'm fine with fixing this warning at zero or
low cost.

Thanks,
Kuai

>
> Thanks,
> Song
>