2020-08-30 01:09:38

by OGAWA Hirofumi

[permalink] [raw]
Subject: [PATCH] fat: Avoid oops when bdi->io_pages==0

On one system, there was bdi->io_pages==0. This seems to be the bug of
a driver somewhere, and should fix it though. Anyway, it is better to
avoid the divide-by-zero Oops.

So this check it.

Signed-off-by: OGAWA Hirofumi <[email protected]>
Cc: <[email protected]>
---
fs/fat/fatent.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/fat/fatent.c b/fs/fat/fatent.c
index f7e3304..98a1c4f 100644
--- a/fs/fat/fatent.c 2020-08-30 06:52:47.251564566 +0900
+++ b/fs/fat/fatent.c 2020-08-30 06:54:05.838319213 +0900
@@ -660,7 +660,7 @@ static void fat_ra_init(struct super_blo
if (fatent->entry >= ent_limit)
return;

- if (ra_pages > sb->s_bdi->io_pages)
+ if (sb->s_bdi->io_pages && ra_pages > sb->s_bdi->io_pages)
ra_pages = rounddown(ra_pages, sb->s_bdi->io_pages);
reada_blocks = ra_pages << (PAGE_SHIFT - sb->s_blocksize_bits + 1);

_

--
OGAWA Hirofumi <[email protected]>


2020-08-30 01:24:35

by Matthew Wilcox

[permalink] [raw]
Subject: Re: [PATCH] fat: Avoid oops when bdi->io_pages==0

On Sun, Aug 30, 2020 at 09:59:41AM +0900, OGAWA Hirofumi wrote:
> On one system, there was bdi->io_pages==0. This seems to be the bug of
> a driver somewhere, and should fix it though. Anyway, it is better to
> avoid the divide-by-zero Oops.
>
> So this check it.
>
> Signed-off-by: OGAWA Hirofumi <[email protected]>
> Cc: <[email protected]>
> ---
> fs/fat/fatent.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/fs/fat/fatent.c b/fs/fat/fatent.c
> index f7e3304..98a1c4f 100644
> --- a/fs/fat/fatent.c 2020-08-30 06:52:47.251564566 +0900
> +++ b/fs/fat/fatent.c 2020-08-30 06:54:05.838319213 +0900
> @@ -660,7 +660,7 @@ static void fat_ra_init(struct super_blo
> if (fatent->entry >= ent_limit)
> return;
>
> - if (ra_pages > sb->s_bdi->io_pages)
> + if (sb->s_bdi->io_pages && ra_pages > sb->s_bdi->io_pages)
> ra_pages = rounddown(ra_pages, sb->s_bdi->io_pages);

Wait, rounddown? ->io_pages is supposed to be the maximum number of
pages to readahead. Shouldn't this be max() instead of rounddown()?

2020-08-30 01:57:22

by OGAWA Hirofumi

[permalink] [raw]
Subject: Re: [PATCH] fat: Avoid oops when bdi->io_pages==0

Matthew Wilcox <[email protected]> writes:

> On Sun, Aug 30, 2020 at 09:59:41AM +0900, OGAWA Hirofumi wrote:
>> On one system, there was bdi->io_pages==0. This seems to be the bug of
>> a driver somewhere, and should fix it though. Anyway, it is better to
>> avoid the divide-by-zero Oops.
>>
>> So this check it.
>>
>> Signed-off-by: OGAWA Hirofumi <[email protected]>
>> Cc: <[email protected]>
>> ---
>> fs/fat/fatent.c | 2 +-
>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/fs/fat/fatent.c b/fs/fat/fatent.c
>> index f7e3304..98a1c4f 100644
>> --- a/fs/fat/fatent.c 2020-08-30 06:52:47.251564566 +0900
>> +++ b/fs/fat/fatent.c 2020-08-30 06:54:05.838319213 +0900
>> @@ -660,7 +660,7 @@ static void fat_ra_init(struct super_blo
>> if (fatent->entry >= ent_limit)
>> return;
>>
>> - if (ra_pages > sb->s_bdi->io_pages)
>> + if (sb->s_bdi->io_pages && ra_pages > sb->s_bdi->io_pages)
>> ra_pages = rounddown(ra_pages, sb->s_bdi->io_pages);
>
> Wait, rounddown? ->io_pages is supposed to be the maximum number of
> pages to readahead. Shouldn't this be max() instead of rounddown()?

Hm, io_pages is limited by driver setting too, and io_pages can be lower
than ra_pages, e.g. usb storage.

Assuming ra_pages is user intent of readahead window. So if io_pages is
lower than ra_pages, this try ra_pages to align of io_pages chunk, but
not bigger than ra_pages. Because if block layer splits I/O requests to
hard limit, then I/O is not optimal.

So it is intent, I can be misunderstanding though.

Thanks.
--
OGAWA Hirofumi <[email protected]>

2020-08-30 03:57:48

by Matthew Wilcox

[permalink] [raw]
Subject: Re: [PATCH] fat: Avoid oops when bdi->io_pages==0

On Sun, Aug 30, 2020 at 10:54:35AM +0900, OGAWA Hirofumi wrote:
> Matthew Wilcox <[email protected]> writes:
>
> > On Sun, Aug 30, 2020 at 09:59:41AM +0900, OGAWA Hirofumi wrote:
> >> On one system, there was bdi->io_pages==0. This seems to be the bug of
> >> a driver somewhere, and should fix it though. Anyway, it is better to
> >> avoid the divide-by-zero Oops.
> >>
> >> So this check it.
> >>
> >> Signed-off-by: OGAWA Hirofumi <[email protected]>
> >> Cc: <[email protected]>
> >> ---
> >> fs/fat/fatent.c | 2 +-
> >> 1 file changed, 1 insertion(+), 1 deletion(-)
> >>
> >> diff --git a/fs/fat/fatent.c b/fs/fat/fatent.c
> >> index f7e3304..98a1c4f 100644
> >> --- a/fs/fat/fatent.c 2020-08-30 06:52:47.251564566 +0900
> >> +++ b/fs/fat/fatent.c 2020-08-30 06:54:05.838319213 +0900
> >> @@ -660,7 +660,7 @@ static void fat_ra_init(struct super_blo
> >> if (fatent->entry >= ent_limit)
> >> return;
> >>
> >> - if (ra_pages > sb->s_bdi->io_pages)
> >> + if (sb->s_bdi->io_pages && ra_pages > sb->s_bdi->io_pages)
> >> ra_pages = rounddown(ra_pages, sb->s_bdi->io_pages);
> >
> > Wait, rounddown? ->io_pages is supposed to be the maximum number of
> > pages to readahead. Shouldn't this be max() instead of rounddown()?

Sorry, I meant 'min', not 'max'.

> Hm, io_pages is limited by driver setting too, and io_pages can be lower
> than ra_pages, e.g. usb storage.
>
> Assuming ra_pages is user intent of readahead window. So if io_pages is
> lower than ra_pages, this try ra_pages to align of io_pages chunk, but
> not bigger than ra_pages. Because if block layer splits I/O requests to
> hard limit, then I/O is not optimal.
>
> So it is intent, I can be misunderstanding though.

Looking at this some more, I'm not sure it makes sense to consult ->io_pages
at all. I see how it gets set to 0 -- the admin can write '1' to
/sys/block/<device>/queue/max_sectors_kb and that gets turned into 0
in ->io_pages.

But I'm not sure it makes any sense to respect that. Looking at mm/readahead.c, all it does is limit the size of a read request which exceeds the current readahead window. It's not used to limit the readahead window itself. For
example:

unsigned long max_pages = ra->ra_pages;
...
if (req_size > max_pages && bdi->io_pages > max_pages)
max_pages = min(req_size, bdi->io_pages);

Setting io_pages below ra_pages has no effect. So maybe fat should also
disregard it?

2020-08-30 09:05:24

by OGAWA Hirofumi

[permalink] [raw]
Subject: Re: [PATCH] fat: Avoid oops when bdi->io_pages==0

Matthew Wilcox <[email protected]> writes:

> On Sun, Aug 30, 2020 at 10:54:35AM +0900, OGAWA Hirofumi wrote:
>> Matthew Wilcox <[email protected]> writes:
>>
>> Hm, io_pages is limited by driver setting too, and io_pages can be lower
>> than ra_pages, e.g. usb storage.
>>
>> Assuming ra_pages is user intent of readahead window. So if io_pages is
>> lower than ra_pages, this try ra_pages to align of io_pages chunk, but
>> not bigger than ra_pages. Because if block layer splits I/O requests to
>> hard limit, then I/O is not optimal.
>>
>> So it is intent, I can be misunderstanding though.
>
> Looking at this some more, I'm not sure it makes sense to consult ->io_pages
> at all. I see how it gets set to 0 -- the admin can write '1' to
> /sys/block/<device>/queue/max_sectors_kb and that gets turned into 0
> in ->io_pages.

if (max_sectors_kb > max_hw_sectors_kb || max_sectors_kb < page_kb)
return -EINVAL;

It should not set to 0 via /sys/.../max_sectors_kb. However the default
of bdi->io_pages is 0. So it happened if a driver didn't initialized it.

> But I'm not sure it makes any sense to respect that. Looking at
> mm/readahead.c, all it does is limit the size of a read request which
> exceeds the current readahead window. It's not used to limit the
> readahead window itself. For example:
>
> unsigned long max_pages = ra->ra_pages;
> ...
> if (req_size > max_pages && bdi->io_pages > max_pages)
> max_pages = min(req_size, bdi->io_pages);
>
> Setting io_pages below ra_pages has no effect. So maybe fat should also
> disregard it?

|-----------------------| requested blocks
[before]
ra_pages |===========|===========|===========|
io_pages |---------|---------|---------|
req |---------|-|-------|---|

[after]
ra_pages |=========|=========|=========|
io_pages |---------|---------|---------|
req |---------|---------|---|

This path is known the large sequential read there. Well, anyway, this
intent is to use [after] as 3 req, instead of [before] as 4 req.

Thanks.
--
OGAWA Hirofumi <[email protected]>

2020-08-30 14:07:05

by Sasha Levin

[permalink] [raw]
Subject: Re: [PATCH] fat: Avoid oops when bdi->io_pages==0

Hi

[This is an automated email]

This commit has been processed because it contains a -stable tag.
The stable tag indicates that it's relevant for the following trees: all

The bot has tested the following trees: v5.8.5, v5.4.61, v4.19.142, v4.14.195, v4.9.234, v4.4.234.

v5.8.5: Build OK!
v5.4.61: Failed to apply! Possible dependencies:
898310032b96 ("fat: improve the readahead for FAT entries")
a090a5a7d73f ("fat: fix fat_ra_init() for data clusters == 0")

v4.19.142: Failed to apply! Possible dependencies:
898310032b96 ("fat: improve the readahead for FAT entries")
a090a5a7d73f ("fat: fix fat_ra_init() for data clusters == 0")

v4.14.195: Failed to apply! Possible dependencies:
898310032b96 ("fat: improve the readahead for FAT entries")
a090a5a7d73f ("fat: fix fat_ra_init() for data clusters == 0")
f663b5b38fff ("fat: add FITRIM ioctl for FAT file system")

v4.9.234: Failed to apply! Possible dependencies:
898310032b96 ("fat: improve the readahead for FAT entries")
a090a5a7d73f ("fat: fix fat_ra_init() for data clusters == 0")
f663b5b38fff ("fat: add FITRIM ioctl for FAT file system")

v4.4.234: Failed to apply! Possible dependencies:
898310032b96 ("fat: improve the readahead for FAT entries")
8992de4cec12 ("fat: constify fatent_operations structures")
a090a5a7d73f ("fat: fix fat_ra_init() for data clusters == 0")
f663b5b38fff ("fat: add FITRIM ioctl for FAT file system")


NOTE: The patch will not be queued to stable trees until it is upstream.

How should we proceed with this patch?

--
Thanks
Sasha

2020-08-30 14:19:45

by OGAWA Hirofumi

[permalink] [raw]
Subject: Re: [PATCH] fat: Avoid oops when bdi->io_pages==0

Sasha Levin <[email protected]> writes:

> Hi
>
> [This is an automated email]
>
> This commit has been processed because it contains a -stable tag.
> The stable tag indicates that it's relevant for the following trees: all
>
> The bot has tested the following trees: v5.8.5, v5.4.61, v4.19.142, v4.14.195, v4.9.234, v4.4.234.
>
> v5.8.5: Build OK!

[...]

>
> How should we proceed with this patch?

Only 5.8.x has to apply this patch.

Thanks.
--
OGAWA Hirofumi <[email protected]>

2020-08-31 15:25:57

by Jens Axboe

[permalink] [raw]
Subject: Re: [PATCH] fat: Avoid oops when bdi->io_pages==0

On Sat, Aug 29, 2020 at 7:08 PM OGAWA Hirofumi <[email protected]> wrote:
>
> On one system, there was bdi->io_pages==0. This seems to be the bug of
> a driver somewhere, and should fix it though. Anyway, it is better to
> avoid the divide-by-zero Oops.
>
> So this check it.
>
> Signed-off-by: OGAWA Hirofumi <[email protected]>
> Cc: <[email protected]>
> ---
> fs/fat/fatent.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/fs/fat/fatent.c b/fs/fat/fatent.c
> index f7e3304..98a1c4f 100644
> --- a/fs/fat/fatent.c 2020-08-30 06:52:47.251564566 +0900
> +++ b/fs/fat/fatent.c 2020-08-30 06:54:05.838319213 +0900
> @@ -660,7 +660,7 @@ static void fat_ra_init(struct super_blo
> if (fatent->entry >= ent_limit)
> return;
>
> - if (ra_pages > sb->s_bdi->io_pages)
> + if (sb->s_bdi->io_pages && ra_pages > sb->s_bdi->io_pages)
> ra_pages = rounddown(ra_pages, sb->s_bdi->io_pages);
> reada_blocks = ra_pages << (PAGE_SHIFT - sb->s_blocksize_bits + 1);

I don't think we should work-around this here. What device is this on?
Something like the below may help.

diff --git a/block/blk-core.c b/block/blk-core.c
index d9d632639bd1..10c08ac50697 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -539,6 +539,7 @@ struct request_queue *blk_alloc_queue(int node_id)
goto fail_stats;

q->backing_dev_info->ra_pages = VM_READAHEAD_PAGES;
+ q->backing_dev_info->io_pages = VM_READAHEAD_PAGES;
q->backing_dev_info->capabilities = BDI_CAP_CGROUP_WRITEBACK;
q->node = node_id;

--
Jens Axboe

2020-08-31 16:39:10

by OGAWA Hirofumi

[permalink] [raw]
Subject: Re: [PATCH] fat: Avoid oops when bdi->io_pages==0

Jens Axboe <[email protected]> writes:

> On Sat, Aug 29, 2020 at 7:08 PM OGAWA Hirofumi <[email protected]> wrote:
>>
>> On one system, there was bdi->io_pages==0. This seems to be the bug of
>> a driver somewhere, and should fix it though. Anyway, it is better to
>> avoid the divide-by-zero Oops.
>>
>> So this check it.
>>
>> Signed-off-by: OGAWA Hirofumi <[email protected]>
>> Cc: <[email protected]>
>> ---
>> fs/fat/fatent.c | 2 +-
>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/fs/fat/fatent.c b/fs/fat/fatent.c
>> index f7e3304..98a1c4f 100644
>> --- a/fs/fat/fatent.c 2020-08-30 06:52:47.251564566 +0900
>> +++ b/fs/fat/fatent.c 2020-08-30 06:54:05.838319213 +0900
>> @@ -660,7 +660,7 @@ static void fat_ra_init(struct super_blo
>> if (fatent->entry >= ent_limit)
>> return;
>>
>> - if (ra_pages > sb->s_bdi->io_pages)
>> + if (sb->s_bdi->io_pages && ra_pages > sb->s_bdi->io_pages)
>> ra_pages = rounddown(ra_pages, sb->s_bdi->io_pages);
>> reada_blocks = ra_pages << (PAGE_SHIFT - sb->s_blocksize_bits + 1);
>
> I don't think we should work-around this here. What device is this on?
> Something like the below may help.

The reported bug is from nvme stack, and the below patch (I submitted
same patch to you) fixed the reported case though. But I didn't verify
all possible path, so I'd liked to use safer side.

If block layer can guarantee io_pages!=0 instead, and can apply to
stable branch (5.8+). It would work too.

Thanks.

> diff --git a/block/blk-core.c b/block/blk-core.c
> index d9d632639bd1..10c08ac50697 100644
> --- a/block/blk-core.c
> +++ b/block/blk-core.c
> @@ -539,6 +539,7 @@ struct request_queue *blk_alloc_queue(int node_id)
> goto fail_stats;
>
> q->backing_dev_info->ra_pages = VM_READAHEAD_PAGES;
> + q->backing_dev_info->io_pages = VM_READAHEAD_PAGES;
> q->backing_dev_info->capabilities = BDI_CAP_CGROUP_WRITEBACK;
> q->node = node_id;


--
OGAWA Hirofumi <[email protected]>

2020-08-31 16:42:43

by Jens Axboe

[permalink] [raw]
Subject: Re: [PATCH] fat: Avoid oops when bdi->io_pages==0

On 8/31/20 10:37 AM, OGAWA Hirofumi wrote:
> Jens Axboe <[email protected]> writes:
>
>> On Sat, Aug 29, 2020 at 7:08 PM OGAWA Hirofumi <[email protected]> wrote:
>>>
>>> On one system, there was bdi->io_pages==0. This seems to be the bug of
>>> a driver somewhere, and should fix it though. Anyway, it is better to
>>> avoid the divide-by-zero Oops.
>>>
>>> So this check it.
>>>
>>> Signed-off-by: OGAWA Hirofumi <[email protected]>
>>> Cc: <[email protected]>
>>> ---
>>> fs/fat/fatent.c | 2 +-
>>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/fs/fat/fatent.c b/fs/fat/fatent.c
>>> index f7e3304..98a1c4f 100644
>>> --- a/fs/fat/fatent.c 2020-08-30 06:52:47.251564566 +0900
>>> +++ b/fs/fat/fatent.c 2020-08-30 06:54:05.838319213 +0900
>>> @@ -660,7 +660,7 @@ static void fat_ra_init(struct super_blo
>>> if (fatent->entry >= ent_limit)
>>> return;
>>>
>>> - if (ra_pages > sb->s_bdi->io_pages)
>>> + if (sb->s_bdi->io_pages && ra_pages > sb->s_bdi->io_pages)
>>> ra_pages = rounddown(ra_pages, sb->s_bdi->io_pages);
>>> reada_blocks = ra_pages << (PAGE_SHIFT - sb->s_blocksize_bits + 1);
>>
>> I don't think we should work-around this here. What device is this on?
>> Something like the below may help.
>
> The reported bug is from nvme stack, and the below patch (I submitted
> same patch to you) fixed the reported case though. But I didn't verify
> all possible path, so I'd liked to use safer side.
>
> If block layer can guarantee io_pages!=0 instead, and can apply to
> stable branch (5.8+). It would work too.

We really should ensure that ->io_pages is always set, imho, instead of
having to work-around it in other spots.

--
Jens Axboe

2020-08-31 17:00:20

by Matthew Wilcox

[permalink] [raw]
Subject: Re: [PATCH] fat: Avoid oops when bdi->io_pages==0

On Mon, Aug 31, 2020 at 10:39:26AM -0600, Jens Axboe wrote:
> We really should ensure that ->io_pages is always set, imho, instead of
> having to work-around it in other spots.

Interestingly, there are only three places in the entire kernel which
_use_ bdi->io_pages. FAT, Verity and the pagecache readahead code.

Verity:
unsigned long num_ra_pages =
min_t(unsigned long, num_blocks_to_hash - i,
inode->i_sb->s_bdi->io_pages);

FAT:
if (ra_pages > sb->s_bdi->io_pages)
ra_pages = rounddown(ra_pages, sb->s_bdi->io_pages);

Pagecache:
max_pages = max_t(unsigned long, bdi->io_pages, ra->ra_pages);
and
if (req_size > max_pages && bdi->io_pages > max_pages)
max_pages = min(req_size, bdi->io_pages);

The funny thing is that all three are using it differently. Verity is
taking io_pages to be the maximum amount to readahead. FAT is using
it as the unit of readahead (round down to the previous multiple) and
the pagecache uses it to limit reads that exceed the current per-file
readahead limit (but allows per-file readahead to exceed io_pages,
in which case it has no effect).

So how should it be used? My inclination is to say that the pagecache
is right, by virtue of being the most-used.

2020-08-31 17:02:17

by Jens Axboe

[permalink] [raw]
Subject: Re: [PATCH] fat: Avoid oops when bdi->io_pages==0

On 8/31/20 10:56 AM, Matthew Wilcox wrote:
> On Mon, Aug 31, 2020 at 10:39:26AM -0600, Jens Axboe wrote:
>> We really should ensure that ->io_pages is always set, imho, instead of
>> having to work-around it in other spots.
>
> Interestingly, there are only three places in the entire kernel which
> _use_ bdi->io_pages. FAT, Verity and the pagecache readahead code.
>
> Verity:
> unsigned long num_ra_pages =
> min_t(unsigned long, num_blocks_to_hash - i,
> inode->i_sb->s_bdi->io_pages);
>
> FAT:
> if (ra_pages > sb->s_bdi->io_pages)
> ra_pages = rounddown(ra_pages, sb->s_bdi->io_pages);
>
> Pagecache:
> max_pages = max_t(unsigned long, bdi->io_pages, ra->ra_pages);
> and
> if (req_size > max_pages && bdi->io_pages > max_pages)
> max_pages = min(req_size, bdi->io_pages);
>
> The funny thing is that all three are using it differently. Verity is
> taking io_pages to be the maximum amount to readahead. FAT is using
> it as the unit of readahead (round down to the previous multiple) and
> the pagecache uses it to limit reads that exceed the current per-file
> readahead limit (but allows per-file readahead to exceed io_pages,
> in which case it has no effect).
>
> So how should it be used? My inclination is to say that the pagecache
> is right, by virtue of being the most-used.

When I added ->io_pages, it was for the page cache use case. The others
grew after that...

--
Jens Axboe

2020-08-31 17:18:04

by OGAWA Hirofumi

[permalink] [raw]
Subject: Re: [PATCH] fat: Avoid oops when bdi->io_pages==0

Jens Axboe <[email protected]> writes:

> On 8/31/20 10:37 AM, OGAWA Hirofumi wrote:
>> Jens Axboe <[email protected]> writes:
>>
>>> I don't think we should work-around this here. What device is this on?
>>> Something like the below may help.
>>
>> The reported bug is from nvme stack, and the below patch (I submitted
>> same patch to you) fixed the reported case though. But I didn't verify
>> all possible path, so I'd liked to use safer side.
>>
>> If block layer can guarantee io_pages!=0 instead, and can apply to
>> stable branch (5.8+). It would work too.
>
> We really should ensure that ->io_pages is always set, imho, instead of
> having to work-around it in other spots.

I think it is good too. However, the issue would be how to do it for
stable branch.

If you think that block layer patch is enough and submit to stable
(5.8+) branch instead, I'm fine without fatfs patch. (Or removing
workaround in fatfs with block layer patch later?)

Thanks.
--
OGAWA Hirofumi <[email protected]>

2020-08-31 17:23:43

by Jens Axboe

[permalink] [raw]
Subject: Re: [PATCH] fat: Avoid oops when bdi->io_pages==0

On 8/31/20 11:16 AM, OGAWA Hirofumi wrote:
> Jens Axboe <[email protected]> writes:
>
>> On 8/31/20 10:37 AM, OGAWA Hirofumi wrote:
>>> Jens Axboe <[email protected]> writes:
>>>
>>>> I don't think we should work-around this here. What device is this on?
>>>> Something like the below may help.
>>>
>>> The reported bug is from nvme stack, and the below patch (I submitted
>>> same patch to you) fixed the reported case though. But I didn't verify
>>> all possible path, so I'd liked to use safer side.
>>>
>>> If block layer can guarantee io_pages!=0 instead, and can apply to
>>> stable branch (5.8+). It would work too.
>>
>> We really should ensure that ->io_pages is always set, imho, instead of
>> having to work-around it in other spots.
>
> I think it is good too. However, the issue would be how to do it for
> stable branch.

Agree

> If you think that block layer patch is enough and submit to stable
> (5.8+) branch instead, I'm fine without fatfs patch. (Or removing
> workaround in fatfs with block layer patch later?)

Well, it should catch any block based case as we then set it when
allocating the queue. So should do the trick for all block based
cases at least.

--
Jens Axboe

2020-08-31 17:42:16

by OGAWA Hirofumi

[permalink] [raw]
Subject: Re: [PATCH] fat: Avoid oops when bdi->io_pages==0

Jens Axboe <[email protected]> writes:

> On 8/31/20 10:56 AM, Matthew Wilcox wrote:
>> On Mon, Aug 31, 2020 at 10:39:26AM -0600, Jens Axboe wrote:
>>> We really should ensure that ->io_pages is always set, imho, instead of
>>> having to work-around it in other spots.
>>
>> Interestingly, there are only three places in the entire kernel which
>> _use_ bdi->io_pages. FAT, Verity and the pagecache readahead code.
>>
>> Verity:
>> unsigned long num_ra_pages =
>> min_t(unsigned long, num_blocks_to_hash - i,
>> inode->i_sb->s_bdi->io_pages);
>>
>> FAT:
>> if (ra_pages > sb->s_bdi->io_pages)
>> ra_pages = rounddown(ra_pages, sb->s_bdi->io_pages);
>>
>> Pagecache:
>> max_pages = max_t(unsigned long, bdi->io_pages, ra->ra_pages);
>> and
>> if (req_size > max_pages && bdi->io_pages > max_pages)
>> max_pages = min(req_size, bdi->io_pages);
>>
>> The funny thing is that all three are using it differently. Verity is
>> taking io_pages to be the maximum amount to readahead. FAT is using
>> it as the unit of readahead (round down to the previous multiple) and
>> the pagecache uses it to limit reads that exceed the current per-file
>> readahead limit (but allows per-file readahead to exceed io_pages,
>> in which case it has no effect).
>>
>> So how should it be used? My inclination is to say that the pagecache
>> is right, by virtue of being the most-used.
>
> When I added ->io_pages, it was for the page cache use case. The others
> grew after that...

FAT and pagecache usage would be similar or same purpose. The both is
using io_pages as optimal IO size.

In pagecache case, it uses io_pages if one request size is exceeding
io_pages. In FAT case, there is perfect knowledge about future/total
request size. So FAT divides request by io_pages, and adjust ra_pages
with knowledge.

I don't know about verity.

Thanks.
--
OGAWA Hirofumi <[email protected]>