LinuxLists.cc - [PATCH] libext2fs: readahead for meta

2017-02-20 10:33:02

Subject: [PATCH] libext2fs: readahead for meta_bg

From: Alexey Lyashkov <[email protected]>

There are ~37k of random IOs with meta_bg option on 300T target.
Debugfs requires 20 minutes to be started. Enabling readahead for
group blocks metadata save time dramatically. Only 12s to start.

Signed-off-by: Alexey Lyashkov <[email protected]>
---
lib/ext2fs/openfs.c | 6 ++++++
1 files changed, 6 insertions(+), 0 deletions(-)

diff --git a/lib/ext2fs/openfs.c b/lib/ext2fs/openfs.c
index ba501e6..f158b0a 100644
--- a/lib/ext2fs/openfs.c
+++ b/lib/ext2fs/openfs.c
@@ -399,6 +399,12 @@ errcode_t ext2fs_open2(const char *name, const char *io_options,
#endif
dest += fs->blocksize*first_meta_bg;
}
+
+ for (i = first_meta_bg ; i < fs->desc_blocks; i++) {
+ blk = ext2fs_descriptor_block_loc2(fs, group_block, i);
+ io_channel_cache_readahead(fs->io, blk, 1);
+ }
+
for (i=first_meta_bg ; i < fs->desc_blocks; i++) {
blk = ext2fs_descriptor_block_loc2(fs, group_block, i);
retval = io_channel_read_blk64(fs->io, blk, 1, dest);
--
1.7.1

2017-02-28 16:16:49

by Darrick J. Wong

[permalink] [raw]

Subject: Re: [PATCH] libext2fs: readahead for meta_bg

On Mon, Feb 20, 2017 at 01:03:45PM +0300, Artem Blagodarenko wrote:
> From: Alexey Lyashkov <[email protected]>
>
> There are ~37k of random IOs with meta_bg option on 300T target.

Yikes, 300TB ext4! :)

(Is that really 2.2 million blockgroups?)

> Debugfs requires 20 minutes to be started. Enabling readahead for
> group blocks metadata save time dramatically. Only 12s to start.
>
> Signed-off-by: Alexey Lyashkov <[email protected]>

Looks good,
Reviewed-by: Darrick J. Wong <[email protected]>

--D

> ---
> lib/ext2fs/openfs.c | 6 ++++++
> 1 files changed, 6 insertions(+), 0 deletions(-)
>
> diff --git a/lib/ext2fs/openfs.c b/lib/ext2fs/openfs.c
> index ba501e6..f158b0a 100644
> --- a/lib/ext2fs/openfs.c
> +++ b/lib/ext2fs/openfs.c
> @@ -399,6 +399,12 @@ errcode_t ext2fs_open2(const char *name, const char *io_options,
> #endif
> dest += fs->blocksize*first_meta_bg;
> }
> +
> + for (i = first_meta_bg ; i < fs->desc_blocks; i++) {
> + blk = ext2fs_descriptor_block_loc2(fs, group_block, i);
> + io_channel_cache_readahead(fs->io, blk, 1);
> + }
> +
> for (i=first_meta_bg ; i < fs->desc_blocks; i++) {
> blk = ext2fs_descriptor_block_loc2(fs, group_block, i);
> retval = io_channel_read_blk64(fs->io, blk, 1, dest);
> --
> 1.7.1
>

2017-03-01 01:16:54

by Andreas Dilger

[permalink] [raw]

Subject: Re: [PATCH] libext2fs: readahead for meta_bg

On Feb 20, 2017, at 3:03 AM, Artem Blagodarenko <[email protected]> wrote:
>
> From: Alexey Lyashkov <[email protected]>
>
> There are ~37k of random IOs with meta_bg option on 300T target.
> Debugfs requires 20 minutes to be started. Enabling readahead for
> group blocks metadata save time dramatically. Only 12s to start.
>
> Signed-off-by: Alexey Lyashkov <[email protected]>

This patch looks good by itself.

Reviewed-by: Andreas Dilger <[email protected]>
----

On a related note, I've been wondering if it would make sense to have
a second patch that *only* does the readahead of the group descriptor blocks
in ext2fs_open2(), and move io_channel_read_blk64() to ext2fs_group_desc()
when the group descriptor blocks are actually accessed the first time? This
would allow tools like tune2fs, debugfs, dumpe2fs, etc. that may not access
group descriptors to load _much_ faster than if it loads all of the bitmaps
synchronously at filesystem open time. Even if they _do_ access the GDT it
will at least allow the prefetch more time to run in the background, and the
GDT swabbing happen incrementally upon access rather than all at the start.

A quick look through lib/ext2fs looks like ext2fs_group_desc() is used for
the majority of group descriptor accesses, but there are a few places that
access fs->group_desc directly. The ext2fs_group_desc() code could check
whether the group descriptor is all-zero (ext2fs_open2() should be changed
to use ext2fs_get_array_zero(..., &fs->group_desc)) and if so read the whole
descriptor block into the array and optionally swab it.

Cheers, Andreas

> ---
> lib/ext2fs/openfs.c | 6 ++++++
> 1 files changed, 6 insertions(+), 0 deletions(-)
>
> diff --git a/lib/ext2fs/openfs.c b/lib/ext2fs/openfs.c
> index ba501e6..f158b0a 100644
> --- a/lib/ext2fs/openfs.c
> +++ b/lib/ext2fs/openfs.c
> @@ -399,6 +399,12 @@ errcode_t ext2fs_open2(const char *name, const char *io_options,
> #endif
> dest += fs->blocksize*first_meta_bg;
> }
> +
> + for (i = first_meta_bg ; i < fs->desc_blocks; i++) {
> + blk = ext2fs_descriptor_block_loc2(fs, group_block, i);
> + io_channel_cache_readahead(fs->io, blk, 1);
> + }
> +
> for (i=first_meta_bg ; i < fs->desc_blocks; i++) {
> blk = ext2fs_descriptor_block_loc2(fs, group_block, i);
> retval = io_channel_read_blk64(fs->io, blk, 1, dest);
> --
> 1.7.1
>

Cheers, Andreas

Attachments:

signature.asc (195.00 B)
Message signed with OpenPGP

2017-03-01 02:20:07

by Alexey Lyahkov

[permalink] [raw]

Subject: Re: [PATCH] libext2fs: readahead for meta_bg

Andreas,

we have do it first. But patch was much and much complex due checksums handling in code.
and ext2_flush() will be need to read an all GD in memory to as use a flat array to write a GD to the disk.
If you want we have submit it also, but it have no benefits (like a few seconds), against it simple version.

> 1 марта 2017 г., в 3:10, Andreas Dilger <[email protected]> написал(а):
>
> On Feb 20, 2017, at 3:03 AM, Artem Blagodarenko <[email protected]> wrote:
>>
>> From: Alexey Lyashkov <[email protected]>
>>
>> There are ~37k of random IOs with meta_bg option on 300T target.
>> Debugfs requires 20 minutes to be started. Enabling readahead for
>> group blocks metadata save time dramatically. Only 12s to start.
>>
>> Signed-off-by: Alexey Lyashkov <[email protected]>
>
> This patch looks good by itself.
>
> Reviewed-by: Andreas Dilger <[email protected]>
> ----
>
> On a related note, I've been wondering if it would make sense to have
> a second patch that *only* does the readahead of the group descriptor blocks
> in ext2fs_open2(), and move io_channel_read_blk64() to ext2fs_group_desc()
> when the group descriptor blocks are actually accessed the first time? This
> would allow tools like tune2fs, debugfs, dumpe2fs, etc. that may not access
> group descriptors to load _much_ faster than if it loads all of the bitmaps
> synchronously at filesystem open time. Even if they _do_ access the GDT it
> will at least allow the prefetch more time to run in the background, and the
> GDT swabbing happen incrementally upon access rather than all at the start.
>
> A quick look through lib/ext2fs looks like ext2fs_group_desc() is used for
> the majority of group descriptor accesses, but there are a few places that
> access fs->group_desc directly. The ext2fs_group_desc() code could check
> whether the group descriptor is all-zero (ext2fs_open2() should be changed
> to use ext2fs_get_array_zero(..., &fs->group_desc)) and if so read the whole
> descriptor block into the array and optionally swab it.
>
> Cheers, Andreas
>
>> ---
>> lib/ext2fs/openfs.c | 6 ++++++
>> 1 files changed, 6 insertions(+), 0 deletions(-)
>>
>> diff --git a/lib/ext2fs/openfs.c b/lib/ext2fs/openfs.c
>> index ba501e6..f158b0a 100644
>> --- a/lib/ext2fs/openfs.c
>> +++ b/lib/ext2fs/openfs.c
>> @@ -399,6 +399,12 @@ errcode_t ext2fs_open2(const char *name, const char *io_options,
>> #endif
>> dest += fs->blocksize*first_meta_bg;
>> }
>> +
>> + for (i = first_meta_bg ; i < fs->desc_blocks; i++) {
>> + blk = ext2fs_descriptor_block_loc2(fs, group_block, i);
>> + io_channel_cache_readahead(fs->io, blk, 1);
>> + }
>> +
>> for (i=first_meta_bg ; i < fs->desc_blocks; i++) {
>> blk = ext2fs_descriptor_block_loc2(fs, group_block, i);
>> retval = io_channel_read_blk64(fs->io, blk, 1, dest);
>> --
>> 1.7.1
>>
>
>
> Cheers, Andreas
>
>
>
>
>

2017-03-01 04:59:11

by Alexey Lyahkov

[permalink] [raw]

Subject: Re: [PATCH] libext2fs: readahead for meta_bg

> 1 марта 2017 г., в 5:50, Andreas Dilger <[email protected]> написал(а):
>
> On Feb 28, 2017, at 7:19 PM, Alexey Lyashkov <[email protected]> wrote:
>>
>> Andreas,
>>
>> we have do it first. But patch was much and much complex due checksums handling in code.
>> and ext2_flush() will be need to read an all GD in memory to as use a flat array to write a GD to the disk.
>
> Yes, I saw ext2_flush() was accessing the array directly, and would have a
> problem as you describe. One option would be to skip writing uninitialized
> GDT blocks, but that also becomes more complex to get correct.
checking agains a inode bitmap block number is enough, to see is GD read from disk or not.

>
>> If you want we have submit it also, but it have no benefits (like a few seconds), against it simple version.
>
> I guess a large part of your speedup is because of submitting the GDT reads
> in parallel to a disk array. If the GDT blocks are all mapped to a single
> disk in the array (entirely possible with META_BG, depending on array geometry)
> then the prefetch will have minimal benefits.
yes and no. it will have a benefits because avoid a delay between sending a new requests so io scheduler may optimize it better.
from other view - we have additional delay between submit and access due processing a previously request, while IO subsystem may work in this time.
Both these cases will provide a good benefit.
But again - I may ask an Artem submit a patch to read GD by demand as it ready again, but it don’t like it due a complexity.

>
> Another option would be to change debugfs/tune2fs/dumpe2fs to use the
> EXT2_FLAG_SUPER_ONLY flag to only read the superblock on open if the
> requested operations do not need access to the group descriptors at all?
> For a large filesystem as you describe, 37K GDT blocks is still over 144MB
> of data that needs to be read from disk, vs 4KB for the superblock.

It not an option. If we talk about lustre - debugfs uses to mount data copy from raid to local disk to parse.
It mean we need a GD covers directory inode in memory and full inode read to have an checks passed.
I tries it also, but it need to disable a checks inside of libe2fs.

>
> Cheers, Andreas
>
>>> 1 марта 2017 г., в 3:10, Andreas Dilger <[email protected]> написал(а):
>>>
>>> On Feb 20, 2017, at 3:03 AM, Artem Blagodarenko <[email protected]> wrote:
>>>>
>>>> From: Alexey Lyashkov <[email protected]>
>>>>
>>>> There are ~37k of random IOs with meta_bg option on 300T target.
>>>> Debugfs requires 20 minutes to be started. Enabling readahead for
>>>> group blocks metadata save time dramatically. Only 12s to start.
>>>>
>>>> Signed-off-by: Alexey Lyashkov <[email protected]>
>>>
>>> This patch looks good by itself.
>>>
>>> Reviewed-by: Andreas Dilger <[email protected]>
>>> ----
>>>
>>> On a related note, I've been wondering if it would make sense to have
>>> a second patch that *only* does the readahead of the group descriptor blocks
>>> in ext2fs_open2(), and move io_channel_read_blk64() to ext2fs_group_desc()
>>> when the group descriptor blocks are actually accessed the first time? This
>>> would allow tools like tune2fs, debugfs, dumpe2fs, etc. that may not access
>>> group descriptors to load _much_ faster than if it loads all of the bitmaps
>>> synchronously at filesystem open time. Even if they _do_ access the GDT it
>>> will at least allow the prefetch more time to run in the background, and the
>>> GDT swabbing happen incrementally upon access rather than all at the start.
>>>
>>> A quick look through lib/ext2fs looks like ext2fs_group_desc() is used for
>>> the majority of group descriptor accesses, but there are a few places that
>>> access fs->group_desc directly. The ext2fs_group_desc() code could check
>>> whether the group descriptor is all-zero (ext2fs_open2() should be changed
>>> to use ext2fs_get_array_zero(..., &fs->group_desc)) and if so read the whole
>>> descriptor block into the array and optionally swab it.
>>>
>>> Cheers, Andreas
>>>
>>>> ---
>>>> lib/ext2fs/openfs.c | 6 ++++++
>>>> 1 files changed, 6 insertions(+), 0 deletions(-)
>>>>
>>>> diff --git a/lib/ext2fs/openfs.c b/lib/ext2fs/openfs.c
>>>> index ba501e6..f158b0a 100644
>>>> --- a/lib/ext2fs/openfs.c
>>>> +++ b/lib/ext2fs/openfs.c
>>>> @@ -399,6 +399,12 @@ errcode_t ext2fs_open2(const char *name, const char *io_options,
>>>> #endif
>>>> dest += fs->blocksize*first_meta_bg;
>>>> }
>>>> +
>>>> + for (i = first_meta_bg ; i < fs->desc_blocks; i++) {
>>>> + blk = ext2fs_descriptor_block_loc2(fs, group_block, i);
>>>> + io_channel_cache_readahead(fs->io, blk, 1);
>>>> + }
>>>> +
>>>> for (i=first_meta_bg ; i < fs->desc_blocks; i++) {
>>>> blk = ext2fs_descriptor_block_loc2(fs, group_block, i);
>>>> retval = io_channel_read_blk64(fs->io, blk, 1, dest);
>>>> --
>>>> 1.7.1
>>>>
>>>
>>>
>>> Cheers, Andreas
>>>
>>>
>>>
>>>
>>>
>>
>
>
> Cheers, Andreas
>
>
>
>
>

2017-03-01 05:07:00

by Andreas Dilger

[permalink] [raw]

Subject: Re: [PATCH] libext2fs: readahead for meta_bg

On Feb 28, 2017, at 9:02 PM, Alexey Lyashkov <[email protected]> wrote:
>
>
>> 1 марта 2017 г., в 5:50, Andreas Dilger <[email protected]> написал(а):
>>
>> On Feb 28, 2017, at 7:19 PM, Alexey Lyashkov <[email protected]> wrote:
>>>
>>> Andreas,
>>>
>>> we have do it first. But patch was much and much complex due checksums handling in code.
>>> and ext2_flush() will be need to read an all GD in memory to as use a flat array to write a GD to the disk.
>>
>> Yes, I saw ext2_flush() was accessing the array directly, and would have a
>> problem as you describe. One option would be to skip writing uninitialized
>> GDT blocks, but that also becomes more complex to get correct.
> checking agains a inode bitmap block number is enough, to see is GD read from disk or not.

It would be possible to check only the inode bitmap location (low and high word)
is zero, but memcmp(gdt, zero_gdt, sizeof(*gdt)) just as easy, and safer.

>>> If you want we have submit it also, but it have no benefits (like a few seconds), against it simple version.
>>
>> I guess a large part of your speedup is because of submitting the GDT reads
>> in parallel to a disk array. If the GDT blocks are all mapped to a single
>> disk in the array (entirely possible with META_BG, depending on array
>> geometry) then the prefetch will have minimal benefits.
>
> yes and no. it will have a benefits because avoid a delay between sending a new requests so io scheduler may optimize it better.
> from other view - we have additional delay between submit and access due processing a previously request, while IO subsystem may work in this time.
> Both these cases will provide a good benefit.
> But again - I may ask an Artem submit a patch to read GD by demand as it ready again, but it don’t like it due a complexity.

Sure, it would be good to send it to the list, just to see where complexity
is and discuss how it might be fixed.

My 5-minute investigation showed loading the GDT block in ext2fs_group_desc()
would cover almost all of the use cases, and some of them (e.g. filesystem
clone/copy) could also just do on-demand loading for the GDT copy.

The tricky part is to handle GDT writeout correctly, otherwise the GDT may
be overwritten by zero-filled buffers. The number of cases of direct buffer
access is very small. In fact, most of the users of ext2fs_group_desc()
could lose the fs->group_desc argument and just get this internally, so
that it is clear when it is being used on an external buffer (only in one
or two places).

>> Another option would be to change debugfs/tune2fs/dumpe2fs to use the
>> EXT2_FLAG_SUPER_ONLY flag to only read the superblock on open if the
>> requested operations do not need access to the group descriptors at all?
>> For a large filesystem as you describe, 37K GDT blocks is still over 144MB
>> of data that needs to be read from disk, vs 4KB for the superblock.
>
> It not an option. If we talk about lustre - debugfs uses to mount data copy from raid to local disk to parse. It mean we need a GD covers directory inode
> in memory and full inode read to have an checks passed.

Sure, but for some of the operations (e.g. "debugfs -c -R features", or
"dumpe2fs -h" or "tune2fs [-cCeEgimrTuLM]" and similar) can work with only
the superblock, and possibly just the first GDT block to load the journal
inode or bad blocks inode.

> I tries it also, but it need to disable a checks inside of libe2fs.

Sure, it depends on how much needs to be disabled in order to work. If there
is on-demand loading, then those changes are also minimized. There may need
to be an EXT2_FLAG_LAZY_GDT open flag also, to indicate that the caller knows
that the fs->group_desc table may not be fully loaded, and that a full scan is
needed before writing it out.

Cheers, Andreas

>>>> 1 марта 2017 г., в 3:10, Andreas Dilger <[email protected]> написал(а):
>>>>
>>>> On Feb 20, 2017, at 3:03 AM, Artem Blagodarenko <[email protected]> wrote:
>>>>>
>>>>> From: Alexey Lyashkov <[email protected]>
>>>>>
>>>>> There are ~37k of random IOs with meta_bg option on 300T target.
>>>>> Debugfs requires 20 minutes to be started. Enabling readahead for
>>>>> group blocks metadata save time dramatically. Only 12s to start.
>>>>>
>>>>> Signed-off-by: Alexey Lyashkov <[email protected]>
>>>>
>>>> This patch looks good by itself.
>>>>
>>>> Reviewed-by: Andreas Dilger <[email protected]>
>>>> ----
>>>>
>>>> On a related note, I've been wondering if it would make sense to have
>>>> a second patch that *only* does the readahead of the group descriptor blocks
>>>> in ext2fs_open2(), and move io_channel_read_blk64() to ext2fs_group_desc()
>>>> when the group descriptor blocks are actually accessed the first time? This
>>>> would allow tools like tune2fs, debugfs, dumpe2fs, etc. that may not access
>>>> group descriptors to load _much_ faster than if it loads all of the bitmaps
>>>> synchronously at filesystem open time. Even if they _do_ access the GDT it
>>>> will at least allow the prefetch more time to run in the background, and the
>>>> GDT swabbing happen incrementally upon access rather than all at the start.
>>>>
>>>> A quick look through lib/ext2fs looks like ext2fs_group_desc() is used for
>>>> the majority of group descriptor accesses, but there are a few places that
>>>> access fs->group_desc directly. The ext2fs_group_desc() code could check
>>>> whether the group descriptor is all-zero (ext2fs_open2() should be changed
>>>> to use ext2fs_get_array_zero(..., &fs->group_desc)) and if so read the whole
>>>> descriptor block into the array and optionally swab it.
>>>>
>>>> Cheers, Andreas
>>>>
>>>>> ---
>>>>> lib/ext2fs/openfs.c | 6 ++++++
>>>>> 1 files changed, 6 insertions(+), 0 deletions(-)
>>>>>
>>>>> diff --git a/lib/ext2fs/openfs.c b/lib/ext2fs/openfs.c
>>>>> index ba501e6..f158b0a 100644
>>>>> --- a/lib/ext2fs/openfs.c
>>>>> +++ b/lib/ext2fs/openfs.c
>>>>> @@ -399,6 +399,12 @@ errcode_t ext2fs_open2(const char *name, const char *io_options,
>>>>> #endif
>>>>> dest += fs->blocksize*first_meta_bg;
>>>>> }
>>>>> +
>>>>> + for (i = first_meta_bg ; i < fs->desc_blocks; i++) {
>>>>> + blk = ext2fs_descriptor_block_loc2(fs, group_block, i);
>>>>> + io_channel_cache_readahead(fs->io, blk, 1);
>>>>> + }
>>>>> +
>>>>> for (i=first_meta_bg ; i < fs->desc_blocks; i++) {
>>>>> blk = ext2fs_descriptor_block_loc2(fs, group_block, i);
>>>>> retval = io_channel_read_blk64(fs->io, blk, 1, dest);
>>>>> --
>>>>> 1.7.1
>>>>>
>>>>
>>>>
>>>> Cheers, Andreas
>>>>
>>>>
>>>>
>>>>
>>>>
>>>
>>
>>
>> Cheers, Andreas
>>
>>
>>
>>
>>
>

Cheers, Andreas

Attachments:

signature.asc (195.00 B)
Message signed with OpenPGP

2017-03-01 05:55:46

by Andreas Dilger

[permalink] [raw]

Subject: Re: [PATCH] libext2fs: readahead for meta_bg

On Feb 28, 2017, at 7:19 PM, Alexey Lyashkov <[email protected]> wrote:
>
> Andreas,
>
> we have do it first. But patch was much and much complex due checksums handling in code.
> and ext2_flush() will be need to read an all GD in memory to as use a flat array to write a GD to the disk.

Yes, I saw ext2_flush() was accessing the array directly, and would have a
problem as you describe. One option would be to skip writing uninitialized
GDT blocks, but that also becomes more complex to get correct.

> If you want we have submit it also, but it have no benefits (like a few seconds), against it simple version.

I guess a large part of your speedup is because of submitting the GDT reads
in parallel to a disk array. If the GDT blocks are all mapped to a single
disk in the array (entirely possible with META_BG, depending on array geometry)
then the prefetch will have minimal benefits.

Another option would be to change debugfs/tune2fs/dumpe2fs to use the
EXT2_FLAG_SUPER_ONLY flag to only read the superblock on open if the
requested operations do not need access to the group descriptors at all?
For a large filesystem as you describe, 37K GDT blocks is still over 144MB
of data that needs to be read from disk, vs 4KB for the superblock.

Cheers, Andreas

>> 1 марта 2017 г., в 3:10, Andreas Dilger <[email protected]> написал(а):
>>
>> On Feb 20, 2017, at 3:03 AM, Artem Blagodarenko <[email protected]> wrote:
>>>
>>> From: Alexey Lyashkov <[email protected]>
>>>
>>> There are ~37k of random IOs with meta_bg option on 300T target.
>>> Debugfs requires 20 minutes to be started. Enabling readahead for
>>> group blocks metadata save time dramatically. Only 12s to start.
>>>
>>> Signed-off-by: Alexey Lyashkov <[email protected]>
>>
>> This patch looks good by itself.
>>
>> Reviewed-by: Andreas Dilger <[email protected]>
>> ----
>>
>> On a related note, I've been wondering if it would make sense to have
>> a second patch that *only* does the readahead of the group descriptor blocks
>> in ext2fs_open2(), and move io_channel_read_blk64() to ext2fs_group_desc()
>> when the group descriptor blocks are actually accessed the first time? This
>> would allow tools like tune2fs, debugfs, dumpe2fs, etc. that may not access
>> group descriptors to load _much_ faster than if it loads all of the bitmaps
>> synchronously at filesystem open time. Even if they _do_ access the GDT it
>> will at least allow the prefetch more time to run in the background, and the
>> GDT swabbing happen incrementally upon access rather than all at the start.
>>
>> A quick look through lib/ext2fs looks like ext2fs_group_desc() is used for
>> the majority of group descriptor accesses, but there are a few places that
>> access fs->group_desc directly. The ext2fs_group_desc() code could check
>> whether the group descriptor is all-zero (ext2fs_open2() should be changed
>> to use ext2fs_get_array_zero(..., &fs->group_desc)) and if so read the whole
>> descriptor block into the array and optionally swab it.
>>
>> Cheers, Andreas
>>
>>> ---
>>> lib/ext2fs/openfs.c | 6 ++++++
>>> 1 files changed, 6 insertions(+), 0 deletions(-)
>>>
>>> diff --git a/lib/ext2fs/openfs.c b/lib/ext2fs/openfs.c
>>> index ba501e6..f158b0a 100644
>>> --- a/lib/ext2fs/openfs.c
>>> +++ b/lib/ext2fs/openfs.c
>>> @@ -399,6 +399,12 @@ errcode_t ext2fs_open2(const char *name, const char *io_options,
>>> #endif
>>> dest += fs->blocksize*first_meta_bg;
>>> }
>>> +
>>> + for (i = first_meta_bg ; i < fs->desc_blocks; i++) {
>>> + blk = ext2fs_descriptor_block_loc2(fs, group_block, i);
>>> + io_channel_cache_readahead(fs->io, blk, 1);
>>> + }
>>> +
>>> for (i=first_meta_bg ; i < fs->desc_blocks; i++) {
>>> blk = ext2fs_descriptor_block_loc2(fs, group_block, i);
>>> retval = io_channel_read_blk64(fs->io, blk, 1, dest);
>>> --
>>> 1.7.1
>>>
>>
>>
>> Cheers, Andreas
>>
>>
>>
>>
>>
>

Cheers, Andreas

Attachments:

signature.asc (195.00 B)
Message signed with OpenPGP

2017-03-01 20:37:30

by Theodore Ts'o

[permalink] [raw]

Subject: Re: [PATCH] libext2fs: readahead for meta_bg

On Mon, Feb 20, 2017 at 01:03:45PM +0300, Artem Blagodarenko wrote:
> From: Alexey Lyashkov <[email protected]>
>
> There are ~37k of random IOs with meta_bg option on 300T target.
> Debugfs requires 20 minutes to be started. Enabling readahead for
> group blocks metadata save time dramatically. Only 12s to start.
>
> Signed-off-by: Alexey Lyashkov <[email protected]>

Applied, thanks!

- Ted

2017-03-03 10:35:13

by Artem Blagodarenko

[permalink] [raw]

Subject: [PATCH] libext2fs: preload block group on request

From: Artem Blagodarenko <[email protected]>

I send this for descussion. Requested by Andreas Dilger and Alexey Lyashkov.

> But again - I may ask an Artem submit a patch to read GD by demand as =
>it ready again, but it don=E2=80=99t like it due a complexity.

>Sure, it would be good to send it to the list, just to see where =
>complexity is and discuss how it might be fixed.

---
lib/ext2fs/blknum.c | 47 +++++++++++++++++++++++++++++++++++++++++++++--
lib/ext2fs/ext2_fs.h | 1 +
lib/ext2fs/ext2fs.h | 1 +
lib/ext2fs/openfs.c | 27 ++++++---------------------
4 files changed, 53 insertions(+), 23 deletions(-)

diff --git a/lib/ext2fs/blknum.c b/lib/ext2fs/blknum.c
index ac80849..b171bac 100644
--- a/lib/ext2fs/blknum.c
+++ b/lib/ext2fs/blknum.c
@@ -186,8 +186,51 @@ struct ext2_group_desc *ext2fs_group_desc(ext2_filsys fs,
dgrp_t group)
{
int desc_size = EXT2_DESC_SIZE(fs->super) & ~7;
-
- return (struct ext2_group_desc *)((char *)gdp + group * desc_size);
+ struct ext2_group_desc *ret_gdp;
+ struct ext2_group_desc *tmp_gdp;
+ char *dest;
+ dgrp_t block;
+ blk64_t blk;
+ int retval;
+ unsigned int i;
+ unsigned int groups_per_block = EXT2_DESC_PER_BLOCK(fs->super);
+
+ ret_gdp = (struct ext2_group_desc *)((char *)gdp + group * desc_size);
+
+ if ((ret_gdp->bg_flags & EXT2_BG_READ) == 0) {
+ block = group / groups_per_block;
+ blk = ext2fs_descriptor_block_loc2(fs, fs->group_block, block);
+ dest = (char *) gdp + fs->blocksize * block;
+ retval = io_channel_read_blk64(fs->io, blk, 1, dest);
+ if (retval)
+ return NULL;
+
+ tmp_gdp = (struct ext2_group_desc *)dest;
+
+ for (i=0; i < groups_per_block; i++) {
+ /*
+ * TDB: If recovery is from backup superblock, Clear
+ * _UNININT flags & reset bg_itable_unused to zero
+ */
+#ifdef WORDS_BIGENDIAN
+ ext2fs_swap_group_desc2(fs, tmp_gdp);
+#endif
+ tmp_gdp->bg_flags |= EXT2_BG_READ;
+
+ if (fs->orig_super == 0 &&
+ EXT2_HAS_RO_COMPAT_FEATURE(fs->super,
+ EXT4_FEATURE_RO_COMPAT_GDT_CSUM)) {
+ ext2fs_bg_flags_clear(fs, block * groups_per_block + i, EXT2_BG_BLOCK_UNINIT);
+ ext2fs_bg_flags_clear(fs, block * groups_per_block + i, EXT2_BG_INODE_UNINIT);
+ ext2fs_bg_itable_unused_set(fs, block * groups_per_block + i, 0);
+ // The checksum will be reset later, but fix it here
+ // anyway to avoid printing a lot of spurious errors.
+ ext2fs_group_desc_csum_set(fs, block * groups_per_block + i);
+ }
+ tmp_gdp = (struct ext2_group_desc *)((char *)tmp_gdp + EXT2_DESC_SIZE(fs->super));
+ }
+ }
+ return ret_gdp;
}

/* Do the same but as an ext4 group desc for internal use here */
diff --git a/lib/ext2fs/ext2_fs.h b/lib/ext2fs/ext2_fs.h
index 27a7d3a..4858684 100644
--- a/lib/ext2fs/ext2_fs.h
+++ b/lib/ext2fs/ext2_fs.h
@@ -202,6 +202,7 @@ struct ext4_group_desc
#define EXT2_BG_INODE_UNINIT 0x0001 /* Inode table/bitmap not initialized */
#define EXT2_BG_BLOCK_UNINIT 0x0002 /* Block bitmap not initialized */
#define EXT2_BG_INODE_ZEROED 0x0004 /* On-disk itable initialized to zero */
+#define EXT2_BG_READ 0x0008 /* Block group was read from disk */

/*
* Data structures used by the directory indexing feature
diff --git a/lib/ext2fs/ext2fs.h b/lib/ext2fs/ext2fs.h
index 8ff49ca..0ae5cbe 100644
--- a/lib/ext2fs/ext2fs.h
+++ b/lib/ext2fs/ext2fs.h
@@ -221,6 +221,7 @@ struct struct_ext2_filsys {
int fragsize;
dgrp_t group_desc_count;
unsigned long desc_blocks;
+ blk64_t group_block;
struct opaque_ext2_group_desc * group_desc;
unsigned int inode_blocks_per_group;
ext2fs_inode_bitmap inode_map;
diff --git a/lib/ext2fs/openfs.c b/lib/ext2fs/openfs.c
index ba39e01..5f6cda5 100644
--- a/lib/ext2fs/openfs.c
+++ b/lib/ext2fs/openfs.c
@@ -121,11 +121,9 @@ errcode_t ext2fs_open2(const char *name, const char *io_options,
blk64_t group_block, blk;
char *dest, *cp;
int group_zero_adjust = 0;
-#ifdef WORDS_BIGENDIAN
unsigned int groups_per_block;
struct ext2_group_desc *gdp;
int j;
-#endif

EXT2_CHECK_MAGIC(manager, EXT2_ET_MAGIC_IO_MANAGER);

@@ -412,40 +410,27 @@ errcode_t ext2fs_open2(const char *name, const char *io_options,
first_meta_bg, dest);
if (retval)
goto cleanup;
-#ifdef WORDS_BIGENDIAN
- gdp = (struct ext2_group_desc *) dest;
for (j=0; j < groups_per_block*first_meta_bg; j++) {
gdp = ext2fs_group_desc(fs, fs->group_desc, j);
- ext2fs_swap_group_desc2(fs, gdp);
- }
-#endif
- dest += fs->blocksize*first_meta_bg;
- }
- for (i=first_meta_bg ; i < fs->desc_blocks; i++) {
- blk = ext2fs_descriptor_block_loc2(fs, group_block, i);
- retval = io_channel_read_blk64(fs->io, blk, 1, dest);
- if (retval)
- goto cleanup;
#ifdef WORDS_BIGENDIAN
- for (j=0; j < groups_per_block; j++) {
- gdp = ext2fs_group_desc(fs, fs->group_desc,
- i * groups_per_block + j);
ext2fs_swap_group_desc2(fs, gdp);
- }
#endif
- dest += fs->blocksize;
+ gdp->bg_flags |= EXT2_BG_READ;
+ }
+ dest += fs->blocksize*first_meta_bg;
}

+ fs->group_block = group_block;
fs->stride = fs->super->s_raid_stride;

/*
* If recovery is from backup superblock, Clear _UNININT flags &
* reset bg_itable_unused to zero
*/
- if (superblock > 1 && ext2fs_has_group_desc_csum(fs)) {
+ if (first_meta_bg && superblock > 1 && ext2fs_has_group_desc_csum(fs)) {
dgrp_t group;

- for (group = 0; group < fs->group_desc_count; group++) {
+ for (group = 0; group < groups_per_block*first_meta_bg; group++) {
ext2fs_bg_flags_clear(fs, group, EXT2_BG_BLOCK_UNINIT);
ext2fs_bg_flags_clear(fs, group, EXT2_BG_INODE_UNINIT);
ext2fs_bg_itable_unused_set(fs, group, 0);
--
1.7.1

2017-03-03 19:59:21

by Artem Blagodarenko

[permalink] [raw]

Subject: [PATCH v2] libext2fs: preload block group on request

From: Artem Blagodarenko <[email protected]>

---
lib/ext2fs/blknum.c | 46 ++++++++++++++++++++++++++++++++++++++++++++--
lib/ext2fs/closefs.c | 7 +++++--
lib/ext2fs/ext2_fs.h | 1 +
lib/ext2fs/ext2fs.h | 3 +++
lib/ext2fs/openfs.c | 29 +++++++----------------------
5 files changed, 60 insertions(+), 26 deletions(-)

diff --git a/lib/ext2fs/blknum.c b/lib/ext2fs/blknum.c
index ac80849..a5d5082 100644
--- a/lib/ext2fs/blknum.c
+++ b/lib/ext2fs/blknum.c
@@ -186,8 +186,50 @@ struct ext2_group_desc *ext2fs_group_desc(ext2_filsys fs,
dgrp_t group)
{
int desc_size = EXT2_DESC_SIZE(fs->super) & ~7;
-
- return (struct ext2_group_desc *)((char *)gdp + group * desc_size);
+ struct ext2_group_desc *ret_gdp;
+ struct ext2_group_desc *tmp_gdp;
+ char *dest;
+ dgrp_t block;
+ blk64_t blk;
+ int retval;
+ unsigned int i;
+ unsigned int groups_per_block = EXT2_DESC_PER_BLOCK(fs->super);
+
+ ret_gdp = (struct ext2_group_desc *)((char *)gdp + group * desc_size);
+
+ if (fs->first_meta_desc && group >= fs->first_meta_desc
+ && ret_gdp->bg_block_bitmap == 0) {
+ block = group / groups_per_block;
+ blk = ext2fs_descriptor_block_loc2(fs, fs->group_block, block);
+ dest = (char *) gdp + fs->blocksize * block;
+ retval = io_channel_read_blk64(fs->io, blk, 1, dest);
+ if (retval)
+ return NULL;
+
+ tmp_gdp = (struct ext2_group_desc *)dest;
+
+ for (i=0; i < groups_per_block; i++) {
+ /*
+ * TDB: If recovery is from backup superblock, Clear
+ * _UNININT flags & reset bg_itable_unused to zero
+ */
+#ifdef WORDS_BIGENDIAN
+ ext2fs_swap_group_desc2(fs, tmp_gdp);
+#endif
+ if (fs->orig_super == 0 &&
+ EXT2_HAS_RO_COMPAT_FEATURE(fs->super,
+ EXT4_FEATURE_RO_COMPAT_GDT_CSUM)) {
+ ext2fs_bg_flags_clear(fs, block * groups_per_block + i, EXT2_BG_BLOCK_UNINIT);
+ ext2fs_bg_flags_clear(fs, block * groups_per_block + i, EXT2_BG_INODE_UNINIT);
+ ext2fs_bg_itable_unused_set(fs, block * groups_per_block + i, 0);
+ // The checksum will be reset later, but fix it here
+ // anyway to avoid printing a lot of spurious errors.
+ ext2fs_group_desc_csum_set(fs, block * groups_per_block + i);
+ }
+ tmp_gdp = (struct ext2_group_desc *)((char *)tmp_gdp + EXT2_DESC_SIZE(fs->super));
+ }
+ }
+ return ret_gdp;
}

/* Do the same but as an ext4 group desc for internal use here */
diff --git a/lib/ext2fs/closefs.c b/lib/ext2fs/closefs.c
index b255759..994b841 100644
--- a/lib/ext2fs/closefs.c
+++ b/lib/ext2fs/closefs.c
@@ -285,10 +285,8 @@ errcode_t ext2fs_flush2(ext2_filsys fs, int flags)
__u32 feature_incompat;
struct ext2_super_block *super_shadow = 0;
struct ext2_group_desc *group_shadow = 0;
-#ifdef WORDS_BIGENDIAN
struct ext2_group_desc *gdp;
dgrp_t j;
-#endif
char *group_ptr;
blk64_t old_desc_blocks;
struct ext2fs_numeric_progress_struct progress;
@@ -337,6 +335,11 @@ errcode_t ext2fs_flush2(ext2_filsys fs, int flags)
}
#else
super_shadow = fs->super;
+ /* make sure all in memory */
+ for (j = 0; j < fs->group_desc_count; j++) {
+ gdp = ext2fs_group_desc(fs, group_shadow, j);
+ }
+
group_shadow = ext2fs_group_desc(fs, fs->group_desc, 0);
#endif

diff --git a/lib/ext2fs/ext2_fs.h b/lib/ext2fs/ext2_fs.h
index 27a7d3a..4858684 100644
--- a/lib/ext2fs/ext2_fs.h
+++ b/lib/ext2fs/ext2_fs.h
@@ -202,6 +202,7 @@ struct ext4_group_desc
#define EXT2_BG_INODE_UNINIT 0x0001 /* Inode table/bitmap not initialized */
#define EXT2_BG_BLOCK_UNINIT 0x0002 /* Block bitmap not initialized */
#define EXT2_BG_INODE_ZEROED 0x0004 /* On-disk itable initialized to zero */
+#define EXT2_BG_READ 0x0008 /* Block group was read from disk */

/*
* Data structures used by the directory indexing feature
diff --git a/lib/ext2fs/ext2fs.h b/lib/ext2fs/ext2fs.h
index 8ff49ca..d23bd99 100644
--- a/lib/ext2fs/ext2fs.h
+++ b/lib/ext2fs/ext2fs.h
@@ -221,6 +221,9 @@ struct struct_ext2_filsys {
int fragsize;
dgrp_t group_desc_count;
unsigned long desc_blocks;
+ blk64_t group_block;
+ /* first meta descriptor block */
+ dgrp_t first_meta_desc;
struct opaque_ext2_group_desc * group_desc;
unsigned int inode_blocks_per_group;
ext2fs_inode_bitmap inode_map;
diff --git a/lib/ext2fs/openfs.c b/lib/ext2fs/openfs.c
index ba39e01..12a3935 100644
--- a/lib/ext2fs/openfs.c
+++ b/lib/ext2fs/openfs.c
@@ -121,11 +121,9 @@ errcode_t ext2fs_open2(const char *name, const char *io_options,
blk64_t group_block, blk;
char *dest, *cp;
int group_zero_adjust = 0;
-#ifdef WORDS_BIGENDIAN
unsigned int groups_per_block;
struct ext2_group_desc *gdp;
int j;
-#endif

EXT2_CHECK_MAGIC(manager, EXT2_ET_MAGIC_IO_MANAGER);

@@ -377,7 +375,7 @@ errcode_t ext2fs_open2(const char *name, const char *io_options,
}
fs->desc_blocks = ext2fs_div_ceil(fs->group_desc_count,
EXT2_DESC_PER_BLOCK(fs->super));
- retval = ext2fs_get_array(fs->desc_blocks, fs->blocksize,
+ retval = ext2fs_get_arrayzero(fs->desc_blocks, fs->blocksize,
&fs->group_desc);
if (retval)
goto cleanup;
@@ -412,40 +410,27 @@ errcode_t ext2fs_open2(const char *name, const char *io_options,
first_meta_bg, dest);
if (retval)
goto cleanup;
-#ifdef WORDS_BIGENDIAN
- gdp = (struct ext2_group_desc *) dest;
for (j=0; j < groups_per_block*first_meta_bg; j++) {
gdp = ext2fs_group_desc(fs, fs->group_desc, j);
- ext2fs_swap_group_desc2(fs, gdp);
- }
-#endif
- dest += fs->blocksize*first_meta_bg;
- }
- for (i=first_meta_bg ; i < fs->desc_blocks; i++) {
- blk = ext2fs_descriptor_block_loc2(fs, group_block, i);
- retval = io_channel_read_blk64(fs->io, blk, 1, dest);
- if (retval)
- goto cleanup;
#ifdef WORDS_BIGENDIAN
- for (j=0; j < groups_per_block; j++) {
- gdp = ext2fs_group_desc(fs, fs->group_desc,
- i * groups_per_block + j);
ext2fs_swap_group_desc2(fs, gdp);
- }
#endif
- dest += fs->blocksize;
+ }
+ dest += fs->blocksize*first_meta_bg;
}

+ fs->first_meta_desc = groups_per_block * first_meta_bg;
+ fs->group_block = group_block;
fs->stride = fs->super->s_raid_stride;

/*
* If recovery is from backup superblock, Clear _UNININT flags &
* reset bg_itable_unused to zero
*/
- if (superblock > 1 && ext2fs_has_group_desc_csum(fs)) {
+ if (first_meta_bg && superblock > 1 && ext2fs_has_group_desc_csum(fs)) {
dgrp_t group;

- for (group = 0; group < fs->group_desc_count; group++) {
+ for (group = 0; group < groups_per_block*first_meta_bg; group++) {
ext2fs_bg_flags_clear(fs, group, EXT2_BG_BLOCK_UNINIT);
ext2fs_bg_flags_clear(fs, group, EXT2_BG_INODE_UNINIT);
ext2fs_bg_itable_unused_set(fs, group, 0);
--
1.7.1

2017-03-04 00:45:35

by Andreas Dilger

[permalink] [raw]

Subject: Re: [PATCH, RFC] libext2fs: preload block group on request

On Mar 3, 2017, at 2:15 AM, Artem Blagodarenko <[email protected]> wrote:
>
> From: Artem Blagodarenko <[email protected]>
>
> I send this for descussion. Requested by Andreas Dilger and Alexey Lyashkov.
>
>> But again - I may ask an Artem submit a patch to read GD by demand as =
>> it ready again, but it don=E2=80=99t like it due a complexity.
>
>> Sure, it would be good to send it to the list, just to see where =
>> complexity is and discuss how it might be fixed.

Artem,
thanks for the patch. This looks roughly like I expected. Comments inline:

> diff --git a/lib/ext2fs/blknum.c b/lib/ext2fs/blknum.c
> index ac80849..b171bac 100644
> --- a/lib/ext2fs/blknum.c
> +++ b/lib/ext2fs/blknum.c
> @@ -186,8 +186,51 @@ struct ext2_group_desc *ext2fs_group_desc(ext2_filsys fs,
> struct opaque_ext2_group_desc *gdp,

It might be a good idea to rename this variable to "gdt" so that it is more
clear that this is the pointer to the whole group descriptor table, rather
than just a single entry for the group.

> dgrp_t group)
> {
> int desc_size = EXT2_DESC_SIZE(fs->super) & ~7;
> -
> - return (struct ext2_group_desc *)((char *)gdp + group * desc_size);
> + struct ext2_group_desc *ret_gdp;
> + struct ext2_group_desc *tmp_gdp;

> + char *dest;

> + dgrp_t block;
> + blk64_t blk;

(style) these should be declared locally inside "if" block

> + int retval;
> + unsigned int i;
> + unsigned int groups_per_block = EXT2_DESC_PER_BLOCK(fs->super);
> +

> + ret_gdp = (struct ext2_group_desc *)((char *)gdp + group * desc_size);
> +
> + if ((ret_gdp->bg_flags & EXT2_BG_READ) == 0) {

The main problem I see with storing EXT2_BG_READ in the group descriptor
itself it will be written to disk. It would be equivalent to check the
non-zero status of some or all of the fields, using something like:

if (memcmp(zero_gdt, ret_gdp, sizeof(*ret_gdp)) != 0) {

> + block = group / groups_per_block;
> + blk = ext2fs_descriptor_block_loc2(fs, fs->group_block, block);
> + dest = (char *) gdp + fs->blocksize * block;

(style) no space after typecast

> + retval = io_channel_read_blk64(fs->io, blk, 1, dest);
> + if (retval)
> + return NULL;
> +
> + tmp_gdp = (struct ext2_group_desc *)dest;

(style) we could get rid of "dest" and have only "tmp_gdp", and just use:

tmp_gdp = (char *)gdp + fs->blocksize * block;
retval = io_channel_read_blk64(fs->io, blk, 1, tmp_gdp);

since the argument is "void *data" so no need to cast it.

> + for (i=0; i < groups_per_block; i++) {

(style) spaces around that '='

> + /*
> + * TDB: If recovery is from backup superblock, Clear
> + * _UNININT flags & reset bg_itable_unused to zero

Not sure what "TDB" means, since this is being done below?

> + */
> +#ifdef WORDS_BIGENDIAN
> + ext2fs_swap_group_desc2(fs, tmp_gdp);
> +#endif
> + tmp_gdp->bg_flags |= EXT2_BG_READ;

> + if (fs->orig_super == 0 &&
> + EXT2_HAS_RO_COMPAT_FEATURE(fs->super,
> + EXT4_FEATURE_RO_COMPAT_GDT_CSUM)) {

ext2fs_has_group_desc_csum(fs)

> + ext2fs_bg_flags_clear(fs, block * groups_per_block + i, EXT2_BG_BLOCK_UNINIT);

Would be cleaner to calculate "start_group = block * blocks_per_group" once
at the start.

> + ext2fs_bg_flags_clear(fs, block * groups_per_block + i, EXT2_BG_INODE_UNINIT);
> + ext2fs_bg_itable_unused_set(fs, block * groups_per_block + i, 0);

This is nasty, since these are calling ext2fs_group_desc() recursively.
Better to just do this directly in this function:

tmp_gdp->bg_flags &= ~EXT2_BG_BLOCK_UNINIT;
tmp_gdp->bg_flags &= ~EXT2_BG_INODE_UNINIT;

or have some static inline helper functions in blknum.c that take a single
gdp pointer argument instead of doing the lookup internally each time:

set_bg_itable_unused(tmp_gdp, start_group + i);

> + // The checksum will be reset later, but fix it here
> + // anyway to avoid printing a lot of spurious errors.

No C99 comments.

> + ext2fs_group_desc_csum_set(fs, block * groups_per_block + i);

When recovering from the backup superblocks this needs the call to:

if (fs->flags & EXT2_FLAG_RW)
ext2fs_mark_super_dirty(fs);

> + }
> + tmp_gdp = (struct ext2_group_desc *)((char *)tmp_gdp + EXT2_DESC_SIZE(fs->super));

(style) wrap at 80 columns

> + }
> + }
> + return ret_gdp;
> }
>
> /* Do the same but as an ext4 group desc for internal use here */
> diff --git a/lib/ext2fs/ext2_fs.h b/lib/ext2fs/ext2_fs.h
> index 27a7d3a..4858684 100644
> --- a/lib/ext2fs/ext2_fs.h
> +++ b/lib/ext2fs/ext2_fs.h
> @@ -202,6 +202,7 @@ struct ext4_group_desc
> #define EXT2_BG_INODE_UNINIT 0x0001 /* Inode table/bitmap not initialized */
> #define EXT2_BG_BLOCK_UNINIT 0x0002 /* Block bitmap not initialized */
> #define EXT2_BG_INODE_ZEROED 0x0004 /* On-disk itable initialized to zero */
> +#define EXT2_BG_READ 0x0008 /* Block group was read from disk */
>
> /*
> * Data structures used by the directory indexing feature
> diff --git a/lib/ext2fs/ext2fs.h b/lib/ext2fs/ext2fs.h
> index 8ff49ca..0ae5cbe 100644
> --- a/lib/ext2fs/ext2fs.h
> +++ b/lib/ext2fs/ext2fs.h
> @@ -221,6 +221,7 @@ struct struct_ext2_filsys {
> int fragsize;
> dgrp_t group_desc_count;
> unsigned long desc_blocks;
> + blk64_t group_block;
> struct opaque_ext2_group_desc * group_desc;
> unsigned int inode_blocks_per_group;
> ext2fs_inode_bitmap inode_map;
> diff --git a/lib/ext2fs/openfs.c b/lib/ext2fs/openfs.c
> index ba39e01..5f6cda5 100644
> --- a/lib/ext2fs/openfs.c
> +++ b/lib/ext2fs/openfs.c
> @@ -121,11 +121,9 @@ errcode_t ext2fs_open2(const char *name, const char *io_options,
> blk64_t group_block, blk;
> char *dest, *cp;
> int group_zero_adjust = 0;
> -#ifdef WORDS_BIGENDIAN
> unsigned int groups_per_block;
> struct ext2_group_desc *gdp;
> int j;
> -#endif
>
> EXT2_CHECK_MAGIC(manager, EXT2_ET_MAGIC_IO_MANAGER);
>
> @@ -412,40 +410,27 @@ errcode_t ext2fs_open2(const char *name, const char *io_options,
> first_meta_bg, dest);
> if (retval)
> goto cleanup;
> -#ifdef WORDS_BIGENDIAN
> - gdp = (struct ext2_group_desc *) dest;
> for (j=0; j < groups_per_block*first_meta_bg; j++) {
> gdp = ext2fs_group_desc(fs, fs->group_desc, j);

Isn't this just loading the group descriptor blocks to be read in here?
This should just start the prefetch and nothing else. We might even try
moving the prefetch to the first call to ext2fs_group_desc() when GDT block
#1 is accessed for the first time so that the prefetch isn't started if
the GDT is never accessed. The drawback is that it could lose time when
the blocks could be prefetched from disk, but I suspect that interval would
be very small if the GDT is actually needed so it is probably a net win.
Otherwise, this will read up to 128MB for filesystems without meta_bg.

Possibly always reading in the first GDT block would be good, so that the
special reserved inodes are loaded always.

> - ext2fs_swap_group_desc2(fs, gdp);
> - }
> -#endif
> - dest += fs->blocksize*first_meta_bg;
> - }
> - for (i=first_meta_bg ; i < fs->desc_blocks; i++) {
> - blk = ext2fs_descriptor_block_loc2(fs, group_block, i);
> - retval = io_channel_read_blk64(fs->io, blk, 1, dest);
> - if (retval)
> - goto cleanup;
> #ifdef WORDS_BIGENDIAN
> - for (j=0; j < groups_per_block; j++) {
> - gdp = ext2fs_group_desc(fs, fs->group_desc,
> - i * groups_per_block + j);
> ext2fs_swap_group_desc2(fs, gdp);
> - }
> #endif
> - dest += fs->blocksize;
> + gdp->bg_flags |= EXT2_BG_READ;
> + }
> + dest += fs->blocksize*first_meta_bg;
> }
>
> + fs->group_block = group_block;
> fs->stride = fs->super->s_raid_stride;
>
> /*
> * If recovery is from backup superblock, Clear _UNININT flags &
> * reset bg_itable_unused to zero
> */
> - if (superblock > 1 && ext2fs_has_group_desc_csum(fs)) {
> + if (first_meta_bg && superblock > 1 && ext2fs_has_group_desc_csum(fs)) {
> dgrp_t group;
>
> - for (group = 0; group < fs->group_desc_count; group++) {
> + for (group = 0; group < groups_per_block*first_meta_bg; group++) {
> ext2fs_bg_flags_clear(fs, group, EXT2_BG_BLOCK_UNINIT);
> ext2fs_bg_flags_clear(fs, group, EXT2_BG_INODE_UNINIT);
> ext2fs_bg_itable_unused_set(fs, group, 0);

This is already implemented in the ext2fs_group_desc() access, so could be
removed here?

Cheers, Andreas

Attachments:

signature.asc (195.00 B)
Message signed with OpenPGP