2011-04-03 18:08:04

by Zeev Tarantov

[permalink] [raw]
Subject: problem(?) in ext4 or mke2fs

While testing zram I ran a script that creates a block device,
creates a filesystem on it and untars Qt on that filesystem.
I was surprised to find ext4_mb_scan_aligned near the top of the profile output.
This was evidently because the command "mke2fs -t ext4 -m 0 -I 128 -O
^has_journal,^ext_attr <block device>"
created a filesystem with (output of tune2fs):
RAID stride: 1
RAID stripe width: 1

I thought this strange and removed these values using debugfs:
set_super_value raid_stride 0
set_super_value raid_stripe_width 0

With this "fix" the symbol ext4_mb_scan_aligned disappeared from perf's output:

18.98%   -3.66%  gzip               [.] zip
 0.00%  +14.84%  [kernel.kallsyms]  [k] ext4_mb_scan_aligned
17.91%   -3.44%  gzip               [.] treat_file.part.4.2264
13.73%   -2.47%  [csnappy_compress] [k] snappy_compress_fragment
 3.96%   -0.77%  [kernel.kallsyms]  [k] copy_user_generic_string
 3.05%   -0.41%  libc-2.13.so       [.] __memcpy_ssse3
 0.89%   +1.49%  [kernel.kallsyms]  [k] _raw_spin_lock
 2.63%   -0.49%  [kernel.kallsyms]  [k] __memcpy
 1.61%   -0.17%  [kernel.kallsyms]  [k] __memset
 0.78%   -0.11%  [kernel.kallsyms]  [k] ext4_mark_iloc_dirty
 0.63%   -0.11%  [kernel.kallsyms]  [k] system_call
 0.66%   -0.14%  gzip               [.] treat_stdin.2262
 0.58%   -0.12%  libc-2.13.so       [.] _int_malloc

This is using mke2fs 1.41.14 (22-Dec-2010) on Linux 2.6.38.2.

Is this expected behavior? Do you need me to provide more information?

regards,
-Z.T.


2011-04-03 18:40:41

by Eric Sandeen

[permalink] [raw]
Subject: Re: problem(?) in ext4 or mke2fs

On 4/3/11 11:07 AM, Zeev Tarantov wrote:
> While testing zram I ran a script that creates a block device,
> creates a filesystem on it and untars Qt on that filesystem.
> I was surprised to find ext4_mb_scan_aligned near the top of the profile output.
> This was evidently because the command "mke2fs -t ext4 -m 0 -I 128 -O
> ^has_journal,^ext_attr <block device>"
> created a filesystem with (output of tune2fs):
> RAID stride: 1
> RAID stripe width: 1

mke2fs queries the block device for its geometry, based on what is
reported via sysfs:

/*
 * Sets the geometry of a device (stripe/stride), and returns the
 * device's alignment offset, if any, or a negative error.
 */
static int get_device_geometry( ...
...
        min_io = blkid_topology_get_minimum_io_size(tp);
        opt_io = blkid_topology_get_optimal_io_size(tp);
...

        fs_param->s_raid_stride = min_io / blocksize;
        fs_param->s_raid_stripe_width = opt_io / blocksize;

What does

# blockdev --getiomin --getioopt /dev/<yourdevice>

say for your device?

The device may be reporting odd values, but mke2fs probably
should be smart enough not to set block-sized stripe unit and width...

-Eric



> I thought this strange and removed these values using debugfs:
> set_super_value raid_stride 0
> set_super_value raid_stripe_width 0
>
> With this "fix" the symbol ext4_mb_scan_aligned disappeared from perf's output:
>
> 18.98%   -3.66%  gzip               [.] zip
>  0.00%  +14.84%  [kernel.kallsyms]  [k] ext4_mb_scan_aligned
> 17.91%   -3.44%  gzip               [.] treat_file.part.4.2264
> 13.73%   -2.47%  [csnappy_compress] [k] snappy_compress_fragment
>  3.96%   -0.77%  [kernel.kallsyms]  [k] copy_user_generic_string
>  3.05%   -0.41%  libc-2.13.so       [.] __memcpy_ssse3
>  0.89%   +1.49%  [kernel.kallsyms]  [k] _raw_spin_lock
>  2.63%   -0.49%  [kernel.kallsyms]  [k] __memcpy
>  1.61%   -0.17%  [kernel.kallsyms]  [k] __memset
>  0.78%   -0.11%  [kernel.kallsyms]  [k] ext4_mark_iloc_dirty
>  0.63%   -0.11%  [kernel.kallsyms]  [k] system_call
>  0.66%   -0.14%  gzip               [.] treat_stdin.2262
>  0.58%   -0.12%  libc-2.13.so       [.] _int_malloc
>
> This is using mke2fs 1.41.14 (22-Dec-2010) on Linux 2.6.38.2.
>
> Is this expected behavior? Do you need me to provide more information?
>
> regards,
> -Z.T.
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html


2011-04-03 18:53:18

by Zeev Tarantov

[permalink] [raw]
Subject: Re: problem(?) in ext4 or mke2fs

On Sun, Apr 3, 2011 at 21:40, Eric Sandeen <[email protected]> wrote:
> On 4/3/11 11:07 AM, Zeev Tarantov wrote:
>> While testing zram I ran a script that creates a block device,
>> creates a filesystem on it and untars Qt on that filesystem.
>> I was surprised to find ext4_mb_scan_aligned near the top of the profile output.
>> This was evidently because the command "mke2fs -t ext4 -m 0 -I 128 -O
>> ^has_journal,^ext_attr <block device>"
>> created a filesystem with (output of tune2fs):
>> RAID stride: 1
>> RAID stripe width: 1
>
> mke2fs queries the block device for its geometry, based on what is
> reported via sysfs:
>
> /*
>  * Sets the geometry of a device (stripe/stride), and returns the
>  * device's alignment offset, if any, or a negative error.
>  */
> static int get_device_geometry( ...
> ...
>         min_io = blkid_topology_get_minimum_io_size(tp);
>         opt_io = blkid_topology_get_optimal_io_size(tp);
> ...
>
>         fs_param->s_raid_stride = min_io / blocksize;
>         fs_param->s_raid_stripe_width = opt_io / blocksize;
>
> What does
>
> # blockdev --getiomin --getioopt /dev/<yourdevice>
>
> say for your device?

get logical block (sector) size: 4096
get physical block (sector) size: 4096
get minimum I/O size: 4096
get optimal I/O size: 4096
get alignment offset in bytes: 0
get max sectors per request: 255
get blocksize: 4096
get readahead: 256
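
[Editor's sketch] Plugging the values reported above into the arithmetic quoted from mke2fs shows where "RAID stride: 1" and "RAID stripe width: 1" come from: 4096 / 4096 = 1 in both cases. The struct and both function names below are hypothetical and only mirror the quoted excerpt; the guard is one possible sanity check under that assumption, not the actual e2fsprogs code or fix.

```c
#include <assert.h>

/* Hypothetical stand-in for the two superblock fields in the excerpt. */
struct raid_geometry {
    unsigned int stride;        /* in filesystem blocks */
    unsigned int stripe_width;  /* in filesystem blocks */
};

/* Same arithmetic as the quoted excerpt: size in bytes / blocksize. */
struct raid_geometry geometry_from_topology(unsigned long min_io,
                                            unsigned long opt_io,
                                            unsigned int blocksize)
{
    struct raid_geometry g;

    assert(blocksize > 0);
    g.stride = (unsigned int)(min_io / blocksize);
    g.stripe_width = (unsigned int)(opt_io / blocksize);
    return g;
}

/*
 * Hypothetical guard: a stride and stripe width of at most one block
 * carry no striping information, so report "unset" (0) instead.
 */
struct raid_geometry drop_trivial_geometry(struct raid_geometry g)
{
    if (g.stride <= 1 && g.stripe_width <= 1) {
        g.stride = 0;
        g.stripe_width = 0;
    }
    return g;
}
```

With min_io = opt_io = blocksize = 4096, geometry_from_topology() yields 1 and 1, and the guard resets both to 0, which is the state produced by hand with debugfs earlier in the thread.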

> The device may be reporting odd values, but mke2fs probably
> should be smart enough not to set block-sized stripe unit and width...

If the filesystem created with the default options is slow or has
higher cpu usage, it should be changed.

> -Eric

-Z.T.

2011-04-03 18:56:21

by Eric Sandeen

[permalink] [raw]
Subject: Re: problem(?) in ext4 or mke2fs

On 4/3/11 11:52 AM, Zeev Tarantov wrote:
> On Sun, Apr 3, 2011 at 21:40, Eric Sandeen <[email protected]> wrote:

...

>> What does
>>
>> # blockdev --getiomin --getioopt /dev/<yourdevice>
>>
>> say for your device?
>
> get logical block (sector) size: 4096
> get physical block (sector) size: 4096
> get minimum I/O size: 4096
> get optimal I/O size: 4096
> get alignment offset in bytes: 0
> get max sectors per request: 255
> get blocksize: 4096
> get readahead: 256
>
>> The device may be reporting odd values, but mke2fs probably
>> should be smart enough not to set block-sized stripe unit and width...
>
> If the filesystem created with the default options is slow or has
> higher cpu usage, it should be changed.

I agree. For actual striped storage, this makes it faster, but this
case is a problem; block-sized stripe width is never going to be good.
What device is this, exactly?

-Eric (losing my free airport wifi in about 8 minutes, so I may have
to continue this later...!)

>> -Eric
>
> -Z.T.

2011-04-03 19:02:14

by Zeev Tarantov

[permalink] [raw]
Subject: Re: problem(?) in ext4 or mke2fs

On Sun, Apr 3, 2011 at 21:56, Eric Sandeen <[email protected]> wrote:
> On 4/3/11 11:52 AM, Zeev Tarantov wrote:
>> If the filesystem created with the default options is slow or has
>> higher cpu usage, it should be changed.
>
> I agree. For actual striped storage, this makes it faster, but this
> case is a problem; block-sized stripe width is never going to be good.
> What device is this, exactly?

Look in linux-2.6/drivers/staging/zram/zram.txt

> -Eric (losing my free airport wifi in about 8 minutes, so I may have
> to continue this later...!)
>
>>> -Eric
>>
>> -Z.T.
>

2011-04-04 00:40:46

by Andreas Dilger

[permalink] [raw]
Subject: Re: problem(?) in ext4 or mke2fs



Cheers, Andreas

On 2011-04-03, at 8:56 AM, Eric Sandeen <[email protected]> wrote:

> On 4/3/11 11:52 AM, Zeev Tarantov wrote:
>> On Sun, Apr 3, 2011 at 21:40, Eric Sandeen <[email protected]> wrote:
>
> ...
>
>>> What does
>>>
>>> # blockdev --getiomin --getioopt /dev/<yourdevice>
>>>
>>> say for your device?
>>
>> get logical block (sector) size: 4096
>> get physical block (sector) size: 4096
>> get minimum I/O size: 4096
>> get optimal I/O size: 4096
>> get alignment offset in bytes: 0
>> get max sectors per request: 255
>> get blocksize: 4096
>> get readahead: 256
>>
>>> The device may be reporting odd values, but mke2fs probably
>>> should be smart enough not to set block-sized stripe unit and width...
>>
>> If the filesystem created with the default options is slow or has
>> higher cpu usage, it should be changed.
>
> I agree. For actual striped storage, this makes it faster, but this
> case is a problem; block-sized stripe width is never going to be good.
> What device is this, exactly?
>
> -Eric (losing my free airport wifi in about 8 minutes, so I may have
> to continue this later...!)
>
>>> -Eric
>>
>> -Z.T.

2011-04-04 00:50:20

by Andreas Dilger

[permalink] [raw]
Subject: Re: problem(?) in ext4 or mke2fs

Sorry for the previous empty message...

I was just going to write that it makes sense to have mballoc use a reasonably large stripe size (e.g. 1MB) that is an even multiple of the underlying device blocksize.

Something like the following might work:

        if (ioopt != 0)
                stripe = max(1, (1048576 + ioopt - 1) / ioopt) * ioopt;
        else
                stripe = 0;

And let the kernel decide what to do if unspecified. I'd prefer to leave it unset if there is nothing provided by the device, so we don't confuse the default value with a value specified by the admin.
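
[Editor's sketch] The rounding proposed above can be tried in isolation. The function name and the TARGET_STRIPE_BYTES macro are hypothetical, and the sketch assumes both ioopt and the result are in bytes, which the message does not state. The max(1, ...) from the snippet is omitted because the ceiling division is already at least 1 for any non-zero ioopt.

```c
#include <assert.h>

#define TARGET_STRIPE_BYTES 1048576UL   /* the 1MB target from the message */

/*
 * Round the 1MB target up to the next whole multiple of ioopt.
 * Returns 0 when the device reports no optimal I/O size, so the
 * value stays unset and the kernel can pick its own default.
 */
unsigned long stripe_from_ioopt(unsigned long ioopt)
{
    unsigned long multiples;

    if (ioopt == 0)
        return 0;

    /* Ceiling division: how many ioopt units are needed to cover 1MB. */
    multiples = (TARGET_STRIPE_BYTES + ioopt - 1) / ioopt;
    assert(multiples >= 1);
    return multiples * ioopt;
}
```

For the zram device in this thread (ioopt = 4096) this gives 1048576. An ioopt of 196608 bytes (3 x 64KB) gives 6 x 196608 = 1179648, the first multiple at or above 1MB.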

Cheers, Andreas

On 2011-04-03, at 8:56 AM, Eric Sandeen <[email protected]> wrote:

> On 4/3/11 11:52 AM, Zeev Tarantov wrote:
>> On Sun, Apr 3, 2011 at 21:40, Eric Sandeen <[email protected]> wrote:
>
> ...
>
>>> What does
>>>
>>> # blockdev --getiomin --getioopt /dev/<yourdevice>
>>>
>>> say for your device?
>>
>> get logical block (sector) size: 4096
>> get physical block (sector) size: 4096
>> get minimum I/O size: 4096
>> get optimal I/O size: 4096
>> get alignment offset in bytes: 0
>> get max sectors per request: 255
>> get blocksize: 4096
>> get readahead: 256
>>
>>> The device may be reporting odd values, but mke2fs probably
>>> should be smart enough not to set block-sized stripe unit and width...
>>
>> If the filesystem created with the default options is slow or has
>> higher cpu usage, it should be changed.
>
> I agree. For actual striped storage, this makes it faster, but this
> case is a problem; block-sized stripe width is never going to be good.
> What device is this, exactly?
>
> -Eric (losing my free airport wifi in about 8 minutes, so I may have
> to continue this later...!)
>
>>> -Eric
>>
>> -Z.T.

2011-04-03 19:21:14

by Eric Sandeen

[permalink] [raw]
Subject: Re: problem(?) in ext4 or mke2fs

On 4/3/11 12:01 PM, Zeev Tarantov wrote:
> On Sun, Apr 3, 2011 at 21:56, Eric Sandeen <[email protected]> wrote:
>> On 4/3/11 11:52 AM, Zeev Tarantov wrote:
>>> If the filesystem created with the default options is slow or has
>>> higher cpu usage, it should be changed.
>>
>> I agree. For actual striped storage, this makes it faster, but this
>> case is a problem; block-sized stripe width is never going to be good.
>> What device is this, exactly?
>
> Look in linux-2.6/drivers/staging/zram/zram.txt

OK, so it does:

        /*
         * To ensure that we always get PAGE_SIZE aligned
         * and n*PAGE_SIZED sized I/O requests.
         */
        blk_queue_physical_block_size(zram->disk->queue, PAGE_SIZE);
        blk_queue_logical_block_size(zram->disk->queue,
                                     ZRAM_LOGICAL_BLOCK_SIZE);
        blk_queue_io_min(zram->disk->queue, PAGE_SIZE);
        blk_queue_io_opt(zram->disk->queue, PAGE_SIZE);

These are all documented in Documentation/ABI/testing/sysfs-block.
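
[Editor's sketch] The values set by these calls show up as text files under /sys/block/<disk>/queue/, for example minimum_io_size and optimal_io_size. A minimal reader for one such attribute could look like the following; read_queue_attr is a hypothetical helper name, not part of libblkid or e2fsprogs.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Read one unsigned decimal value from a sysfs-style attribute file.
 * Returns 0 on success and -1 if the file cannot be opened or parsed.
 */
int read_queue_attr(const char *path, unsigned long *value)
{
    FILE *f;
    int ok;

    assert(path != NULL && value != NULL);

    f = fopen(path, "r");
    if (f == NULL)
        return -1;

    ok = (fscanf(f, "%lu", value) == 1);
    fclose(f);
    return ok ? 0 : -1;
}
```

Assuming the device is named zram0, calling it on /sys/block/zram0/queue/optimal_io_size should return the same number that blockdev --getioopt prints.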

I don't think that setting all those values in zram is necessary and/or
sufficient to achieve what is desired in the comment. io_min/io_opt
generally are set only for striped devices.

Still, mke2fs should probably sanity-check for this; I'll make
sure this seems right, and send a patch.

Thanks,
-Eric

>> -Eric (losing my free airport wifi in about 8 minutes, so I may have
>> to continue this later...!)
>>
>>>> -Eric
>>>
>>> -Z.T.
>>