2009-07-08 12:40:26

by Vladislav Bolkhovitin

Subject: Re: [RESEND] [PATCH] readahead:add blk_run_backing_dev

Ronald Moesbergen, on 07/08/2009 12:49 PM wrote:
> 2009/7/7 Vladislav Bolkhovitin <[email protected]>:
>> Ronald Moesbergen, on 07/07/2009 10:49 AM wrote:
>>>>>> I think, most likely, there was some confusion between the tested and
>>>>>> patched versions of the kernel or you forgot to apply the io_context
>>>>>> patch.
>>>>>> Please recheck.
>>>>> The tests above were definitely done right, I just rechecked the
>>>>> patches, and I do see an average increase of about 10MB/s over an
>>>>> unpatched kernel. But overall the performance is still pretty bad.
>>>> Have you rebuild and reinstall SCST after patching kernel?
>>> Yes I have. And the warning about missing io_context patches wasn't
>>> there during the compilation.
>> Can you update to the latest trunk/ and send me the kernel logs from the
>> kernel's boot after one dd with any block size you like >128K and the
>> transfer rate the dd reported, please?
>>
>
> I think I just reproduced the 'wrong' result:
>
> dd if=/dev/sdc of=/dev/null bs=512K count=2000
> 2000+0 records in
> 2000+0 records out
> 1048576000 bytes (1.0 GB) copied, 12.1291 s, 86.5 MB/s
>
> This happens when I do a 'dd' on the device with a mounted filesystem.
> The filesystem mount causes some of the blocks on the device to be
> cached and therefore the results are wrong. This was not the case in
> all the blockdev-perftest run's I did (the filesystem was never
> mounted).

Why do you think the file system (which one, BTW?) has any additional
caching if you did "echo 3 > /proc/sys/vm/drop_caches" before the tests?
All block devices and file systems use the same cache facilities.
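
For reference, dropping the caches right before the dd should make the
result cache-cold regardless of what is mounted (sdc is only an example
device name):

echo 3 > /proc/sys/vm/drop_caches   # drop clean page cache, dentries and inodes
dd if=/dev/sdc of=/dev/null bs=512K count=2000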

I've also long noticed that reading data from block devices is slower
than reading from files on the file systems mounted on those block
devices. Can anybody explain it?

Looks like this is strangeness #2 which we uncovered in our tests (the
first one, earlier in this thread, was why the context RA doesn't work
as well as it should with cooperative I/O threads).

Can you rerun the same 11 tests over a file on the file system, please?

> Ronald.
>


2009-07-10 06:32:38

by Ronald Moesbergen

Subject: Re: [RESEND] [PATCH] readahead:add blk_run_backing_dev

2009/7/8 Vladislav Bolkhovitin <[email protected]>:
> Ronald Moesbergen, on 07/08/2009 12:49 PM wrote:
>>
>> 2009/7/7 Vladislav Bolkhovitin <[email protected]>:
>>>
>>> Ronald Moesbergen, on 07/07/2009 10:49 AM wrote:
>>>>>>>
>>>>>>> I think, most likely, there was some confusion between the tested and
>>>>>>> patched versions of the kernel or you forgot to apply the io_context
>>>>>>> patch.
>>>>>>> Please recheck.
>>>>>>
>>>>>> The tests above were definitely done right, I just rechecked the
>>>>>> patches, and I do see an average increase of about 10MB/s over an
>>>>>> unpatched kernel. But overall the performance is still pretty bad.
>>>>>
>>>>> Have you rebuild and reinstall SCST after patching kernel?
>>>>
>>>> Yes I have. And the warning about missing io_context patches wasn't
>>>> there during the compilation.
>>>
>>> Can you update to the latest trunk/ and send me the kernel logs from the
>>> kernel's boot after one dd with any block size you like >128K and the
>>> transfer rate the dd reported, please?
>>>
>>
>> I think I just reproduced the 'wrong' result:
>>
>> dd if=/dev/sdc of=/dev/null bs=512K count=2000
>> 2000+0 records in
>> 2000+0 records out
>> 1048576000 bytes (1.0 GB) copied, 12.1291 s, 86.5 MB/s
>>
>> This happens when I do a 'dd' on the device with a mounted filesystem.
>> The filesystem mount causes some of the blocks on the device to be
>> cached and therefore the results are wrong. This was not the case in
>> all the blockdev-perftest run's I did (the filesystem was never
>> mounted).
>
> Why do you think the file system (which one, BTW?) has any additional
> caching if you did "echo 3 > /proc/sys/vm/drop_caches" before the tests? All
> block devices and file systems use the same cache facilities.

I didn't drop the caches because I had just restarted both machines and
thought that would be enough. But because of the mounted filesystem
the results were invalid. (The filesystem is OCFS2, but that doesn't
matter.)

> I've also long ago noticed that reading data from block devices is slower
> than from files from mounted on those block devices file systems. Can
> anybody explain it?
>
> Looks like this is strangeness #2 which we uncovered in our tests (the first
> one was earlier in this thread why the context RA doesn't work with
> cooperative I/O threads as good as it should).
>
> Can you rerun the same 11 tests over a file on the file system, please?

I'll see what I can do. Just to be sure: you want me to run
blockdev-perftest on a file on the OCFS2 filesystem which is mounted
on the client over iSCSI, right?

Ronald.

2009-07-10 08:43:57

by Vladislav Bolkhovitin

Subject: Re: [RESEND] [PATCH] readahead:add blk_run_backing_dev


Ronald Moesbergen, on 07/10/2009 10:32 AM wrote:
>> I've also long ago noticed that reading data from block devices is slower
>> than from files from mounted on those block devices file systems. Can
>> anybody explain it?
>>
>> Looks like this is strangeness #2 which we uncovered in our tests (the first
>> one was earlier in this thread why the context RA doesn't work with
>> cooperative I/O threads as good as it should).
>>
>> Can you rerun the same 11 tests over a file on the file system, please?
>
> I'll see what I can do. Just te be sure: you want me to run
> blockdev-perftest on a file on the OCFS2 filesystem which is mounted
> on the client over iScsi, right?

Yes, please.

> Ronald.

2009-07-10 09:27:50

by Vladislav Bolkhovitin

Subject: Re: [RESEND] [PATCH] readahead:add blk_run_backing_dev


Vladislav Bolkhovitin, on 07/10/2009 12:43 PM wrote:
> Ronald Moesbergen, on 07/10/2009 10:32 AM wrote:
>>> I've also long ago noticed that reading data from block devices is slower
>>> than from files from mounted on those block devices file systems. Can
>>> anybody explain it?
>>>
>>> Looks like this is strangeness #2 which we uncovered in our tests (the first
>>> one was earlier in this thread why the context RA doesn't work with
>>> cooperative I/O threads as good as it should).
>>>
>>> Can you rerun the same 11 tests over a file on the file system, please?
>> I'll see what I can do. Just te be sure: you want me to run
>> blockdev-perftest on a file on the OCFS2 filesystem which is mounted
>> on the client over iScsi, right?
>
> Yes, please.

Forgot to mention that you should also configure your backend storage as
a big file on a file system (preferably XFS), not as a direct device
like /dev/vg/db-master.
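
Something along these lines would do (paths and sizes are only an example):

mkfs.xfs /dev/vg/db-master                # XFS on the LV instead of exporting it raw
mount /dev/vg/db-master /mnt/backend
dd if=/dev/zero of=/mnt/backend/storage.img bs=1M count=20480   # ~20 GB backing file

and then export /mnt/backend/storage.img via scst_vdisk instead of
/dev/vg/db-master itself.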

Thanks,
Vlad

2009-07-13 12:12:20

by Ronald Moesbergen

Subject: Re: [RESEND] [PATCH] readahead:add blk_run_backing_dev

2009/7/10 Vladislav Bolkhovitin <[email protected]>:
>
> Vladislav Bolkhovitin, on 07/10/2009 12:43 PM wrote:
>>
>> Ronald Moesbergen, on 07/10/2009 10:32 AM wrote:
>>>>
>>>> I've also long ago noticed that reading data from block devices is
>>>> slower
>>>> than from files from mounted on those block devices file systems. Can
>>>> anybody explain it?
>>>>
>>>> Looks like this is strangeness #2 which we uncovered in our tests (the
>>>> first
>>>> one was earlier in this thread why the context RA doesn't work with
>>>> cooperative I/O threads as good as it should).
>>>>
>>>> Can you rerun the same 11 tests over a file on the file system, please?
>>>
>>> I'll see what I can do. Just te be sure: you want me to run
>>> blockdev-perftest on a file on the OCFS2 filesystem which is mounted
>>> on the client over iScsi, right?
>>
>> Yes, please.
>
> Forgot to mention that you should also configure your backend storage as a
> big file on a file system (preferably, XFS) too, not as direct device, like
> /dev/vg/db-master.

Ok, here are the results:

client kernel: 2.6.26-15lenny3 (debian)
server kernel: 2.6.29.5 with readahead patch

Tests done with XFS on both the target and the initiator. This confirms
your findings: using files instead of block devices is faster, but
only when using the io_context patch.
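
The "64 max_sectors_kb" and "RA 2MB" labels below refer to the usual
block layer tunables, which can be set like this (sdc is just an example
device name):

echo 64 > /sys/block/sdc/queue/max_sectors_kb     # "64 max_sectors_kb"
echo 2048 > /sys/block/sdc/queue/read_ahead_kb    # "RA 2MB"
# or equivalently: blockdev --setra 4096 /dev/sdc  (units of 512-byte sectors)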

Without io_context patch:
1) client: default, server: default
blocksize R R R R(avg, R(std R
(bytes) (s) (s) (s) MB/s) ,MB/s) (IOPS)
67108864 18.327 18.327 17.740 56.491 0.872 0.883
33554432 18.662 18.311 18.116 55.772 0.683 1.743
16777216 18.900 18.421 18.312 55.229 0.754 3.452
8388608 18.893 18.533 18.281 55.156 0.743 6.895
4194304 18.512 18.097 18.400 55.850 0.536 13.963
2097152 18.635 18.313 18.676 55.232 0.486 27.616
1048576 18.441 18.264 18.245 55.907 0.267 55.907
524288 17.773 18.669 18.459 55.980 1.184 111.960
262144 18.580 18.758 17.483 56.091 1.767 224.365
131072 17.224 18.333 18.765 56.626 2.067 453.006
65536 18.082 19.223 18.238 55.348 1.483 885.567
32768 17.719 18.293 18.198 56.680 0.795 1813.766
16384 17.872 18.322 17.537 57.192 1.024 3660.273

2) client: default, server: 64 max_sectors_kb, RA default
blocksize R R R R(avg, R(std R
(bytes) (s) (s) (s) MB/s) ,MB/s) (IOPS)
67108864 18.738 18.435 18.400 55.283 0.451 0.864
33554432 18.046 18.167 17.572 57.128 0.826 1.785
16777216 18.504 18.203 18.377 55.771 0.376 3.486
8388608 22.069 18.554 17.825 53.013 4.766 6.627
4194304 19.211 18.136 18.083 55.465 1.529 13.866
2097152 18.647 17.851 18.511 55.866 1.071 27.933
1048576 19.084 18.177 18.194 55.425 1.249 55.425
524288 18.999 18.553 18.380 54.934 0.763 109.868
262144 18.867 18.273 18.063 55.668 1.020 222.673
131072 17.846 18.966 18.193 55.885 1.412 447.081
65536 18.195 18.616 18.482 55.564 0.530 889.023
32768 17.882 18.841 17.707 56.481 1.525 1807.394
16384 17.073 18.278 17.985 57.646 1.689 3689.369

3) client: default, server: default max_sectors_kb, RA 2MB
blocksize R R R R(avg, R(std R
(bytes) (s) (s) (s) MB/s) ,MB/s) (IOPS)
67108864 18.658 17.830 19.258 55.162 1.750 0.862
33554432 17.193 18.265 18.517 56.974 1.854 1.780
16777216 17.531 17.681 18.776 56.955 1.720 3.560
8388608 18.234 17.547 18.201 56.926 1.014 7.116
4194304 18.057 17.923 17.901 57.015 0.218 14.254
2097152 18.565 17.739 17.658 56.958 1.277 28.479
1048576 18.393 17.433 17.314 57.851 1.550 57.851
524288 18.939 17.835 18.972 55.152 1.600 110.304
262144 18.562 19.005 18.069 55.240 1.141 220.959
131072 19.574 17.562 18.251 55.576 2.476 444.611
65536 19.117 18.019 17.886 55.882 1.647 894.115
32768 18.237 17.415 17.482 57.842 1.200 1850.933
16384 17.760 18.444 18.055 56.631 0.876 3624.391

4) client: default, server: 64 max_sectors_kb, RA 2MB
blocksize R R R R(avg, R(std R
(bytes) (s) (s) (s) MB/s) ,MB/s) (IOPS)
67108864 18.368 17.495 18.524 56.520 1.434 0.883
33554432 18.209 17.523 19.146 56.052 2.027 1.752
16777216 18.765 18.053 18.550 55.497 0.903 3.469
8388608 17.878 17.848 18.389 56.778 0.774 7.097
4194304 18.058 17.683 18.567 56.589 1.129 14.147
2097152 18.896 18.384 18.697 54.888 0.623 27.444
1048576 18.505 17.769 17.804 56.826 1.055 56.826
524288 18.319 17.689 17.941 56.955 0.816 113.910
262144 19.227 17.770 18.212 55.704 1.821 222.815
131072 18.738 18.227 17.869 56.044 1.090 448.354
65536 19.319 18.525 18.084 54.969 1.494 879.504
32768 18.321 17.672 17.870 57.047 0.856 1825.495
16384 18.249 17.495 18.146 57.025 1.073 3649.582

With io_context patch:
5) client: default, server: default
blocksize R R R R(avg, R(std R
(bytes) (s) (s) (s) MB/s) ,MB/s) (IOPS)
67108864 12.393 11.925 12.627 83.196 1.989 1.300
33554432 11.844 11.855 12.191 85.610 1.142 2.675
16777216 12.729 12.602 12.068 82.187 1.913 5.137
8388608 12.245 12.060 14.081 80.419 5.469 10.052
4194304 13.224 11.866 12.110 82.763 3.833 20.691
2097152 11.585 12.584 11.755 85.623 3.052 42.811
1048576 12.166 12.144 12.321 83.867 0.539 83.867
524288 12.019 12.148 12.160 84.568 0.448 169.137
262144 12.014 12.378 12.074 84.259 1.095 337.036
131072 11.840 12.068 11.849 85.921 0.756 687.369
65536 12.098 11.803 12.312 84.857 1.470 1357.720
32768 11.852 12.635 11.887 84.529 2.465 2704.931
16384 12.443 13.110 11.881 82.197 3.299 5260.620

6) client: default, server: 64 max_sectors_kb, RA default
blocksize R R R R(avg, R(std R
(bytes) (s) (s) (s) MB/s) ,MB/s) (IOPS)
67108864 13.033 12.122 11.950 82.911 3.110 1.295
33554432 12.386 13.357 12.082 81.364 3.429 2.543
16777216 12.102 11.542 12.053 86.096 1.860 5.381
8388608 12.240 11.740 11.789 85.917 1.601 10.740
4194304 11.824 12.388 12.042 84.768 1.621 21.192
2097152 11.962 12.283 11.973 84.832 1.036 42.416
1048576 12.639 11.863 12.010 84.197 2.290 84.197
524288 11.809 12.919 11.853 84.121 3.439 168.243
262144 12.105 12.649 12.779 81.894 1.940 327.577
131072 12.441 12.769 12.713 81.017 0.923 648.137
65536 12.490 13.308 12.440 80.414 2.457 1286.630
32768 13.235 11.917 12.300 82.184 3.576 2629.883
16384 12.335 12.394 12.201 83.187 0.549 5323.990

7) client: default, server: default max_sectors_kb, RA 2MB
blocksize R R R R(avg, R(std R
(bytes) (s) (s) (s) MB/s) ,MB/s) (IOPS)
67108864 12.017 12.334 12.151 84.168 0.897 1.315
33554432 12.265 12.200 11.976 84.310 0.864 2.635
16777216 12.356 11.972 12.292 83.903 1.165 5.244
8388608 12.247 12.368 11.769 84.472 1.825 10.559
4194304 11.888 11.974 12.144 85.325 0.754 21.331
2097152 12.433 10.938 11.669 87.911 4.595 43.956
1048576 11.748 12.271 12.498 84.180 2.196 84.180
524288 11.726 11.681 12.322 86.031 2.075 172.062
262144 12.593 12.263 11.939 83.530 1.817 334.119
131072 11.874 12.265 12.441 84.012 1.648 672.093
65536 12.119 11.848 12.037 85.330 0.809 1365.277
32768 12.549 12.080 12.008 83.882 1.625 2684.238
16384 12.369 12.087 12.589 82.949 1.385 5308.766

8) client: default, server: 64 max_sectors_kb, RA 2MB
blocksize R R R R(avg, R(std R
(bytes) (s) (s) (s) MB/s) ,MB/s) (IOPS)
67108864 12.664 11.793 11.963 84.428 2.575 1.319
33554432 11.825 12.074 12.442 84.571 1.761 2.643
16777216 11.997 11.952 10.905 88.311 3.958 5.519
8388608 11.866 12.270 11.796 85.519 1.476 10.690
4194304 11.754 12.095 12.539 84.483 2.230 21.121
2097152 11.948 11.633 11.886 86.628 1.007 43.314
1048576 12.029 12.519 11.701 84.811 2.345 84.811
524288 11.928 12.011 12.049 85.363 0.361 170.726
262144 12.559 11.827 11.729 85.140 2.566 340.558
131072 12.015 12.356 11.587 85.494 2.253 683.952
65536 11.741 12.113 11.931 85.861 1.093 1373.770
32768 12.655 11.738 12.237 83.945 2.589 2686.246
16384 11.928 12.423 11.875 84.834 1.711 5429.381

9) client: 64 max_sectors_kb, default RA. server: 64 max_sectors_kb, RA 2MB
blocksize R R R R(avg, R(std R
(bytes) (s) (s) (s) MB/s) ,MB/s) (IOPS)
67108864 13.570 13.491 14.299 74.326 1.927 1.161
33554432 13.238 13.198 13.255 77.398 0.142 2.419
16777216 13.851 13.199 13.463 75.857 1.497 4.741
8388608 13.339 16.695 13.551 71.223 7.010 8.903
4194304 13.689 13.173 14.258 74.787 2.415 18.697
2097152 13.518 13.543 13.894 75.021 0.934 37.510
1048576 14.119 14.030 13.820 73.202 0.659 73.202
524288 13.747 14.781 13.820 72.621 2.369 145.243
262144 14.168 13.652 14.165 73.189 1.284 292.757
131072 14.112 13.868 14.213 72.817 0.753 582.535
65536 14.604 13.762 13.725 73.045 2.071 1168.728
32768 14.796 15.356 14.486 68.861 1.653 2203.564
16384 13.079 13.525 13.427 76.757 1.111 4912.426

10) client: default max_sectors_kb, 2MB RA. server: 64 max_sectors_kb, RA 2MB
blocksize R R R R(avg, R(std R
(bytes) (s) (s) (s) MB/s) ,MB/s) (IOPS)
67108864 20.372 18.077 17.262 55.411 3.800 0.866
33554432 17.287 17.620 17.828 58.263 0.740 1.821
16777216 16.802 18.154 17.315 58.831 1.865 3.677
8388608 17.510 18.291 17.253 57.939 1.427 7.242
4194304 17.059 17.706 17.352 58.958 0.897 14.740
2097152 17.252 18.064 17.615 58.059 1.090 29.029
1048576 17.082 17.373 17.688 58.927 0.838 58.927
524288 17.129 17.271 17.583 59.103 0.644 118.206
262144 17.411 17.695 18.048 57.808 0.848 231.231
131072 17.937 17.704 18.681 56.581 1.285 452.649
65536 17.927 17.465 17.907 57.646 0.698 922.338
32768 18.494 17.820 17.719 56.875 1.073 1819.985
16384 18.800 17.759 17.575 56.798 1.666 3635.058

11) client: 64 max_sectors_kb, 2MB. RA server: 64 max_sectors_kb, RA 2MB
blocksize R R R R(avg, R(std R
(bytes) (s) (s) (s) MB/s) ,MB/s) (IOPS)
67108864 20.045 21.881 20.018 49.680 2.037 0.776
33554432 20.768 20.291 20.464 49.938 0.479 1.561
16777216 21.563 20.714 20.429 49.017 1.116 3.064
8388608 21.290 21.109 21.308 48.221 0.205 6.028
4194304 22.240 20.662 21.088 48.054 1.479 12.013
2097152 20.282 21.098 20.580 49.593 0.806 24.796
1048576 20.367 19.929 20.252 50.741 0.469 50.741
524288 20.885 21.203 20.684 48.945 0.498 97.890
262144 19.982 21.375 20.798 49.463 1.373 197.853
131072 20.744 21.590 19.698 49.593 1.866 396.740
65536 21.586 20.953 21.055 48.314 0.627 773.024
32768 21.228 20.307 21.049 49.104 0.950 1571.327
16384 21.257 21.209 21.150 48.289 0.100 3090.498

Ronald.

2009-07-13 12:36:39

by Fengguang Wu

Subject: Re: [RESEND] [PATCH] readahead:add blk_run_backing_dev

On Mon, Jul 13, 2009 at 08:12:14PM +0800, Ronald Moesbergen wrote:
> 2009/7/10 Vladislav Bolkhovitin <[email protected]>:
> >
> > Vladislav Bolkhovitin, on 07/10/2009 12:43 PM wrote:
> >>
> >> Ronald Moesbergen, on 07/10/2009 10:32 AM wrote:
> >>>>
> >>>> I've also long ago noticed that reading data from block devices is
> >>>> slower
> >>>> than from files from mounted on those block devices file systems. Can
> >>>> anybody explain it?
> >>>>
> >>>> Looks like this is strangeness #2 which we uncovered in our tests (the
> >>>> first
> >>>> one was earlier in this thread why the context RA doesn't work with
> >>>> cooperative I/O threads as good as it should).
> >>>>
> >>>> Can you rerun the same 11 tests over a file on the file system, please?
> >>>
> >>> I'll see what I can do. Just te be sure: you want me to run
> >>> blockdev-perftest on a file on the OCFS2 filesystem which is mounted
> >>> on the client over iScsi, right?
> >>
> >> Yes, please.
> >
> > Forgot to mention that you should also configure your backend storage as a
> > big file on a file system (preferably, XFS) too, not as direct device, like
> > /dev/vg/db-master.
>
> Ok, here are the results:

Ronald, thanks for the numbers!

> client kernel: 2.6.26-15lenny3 (debian)
> server kernel: 2.6.29.5 with readahead patch

Do you mean the context readahead patch?

> Test done with XFS on both the target and the initiator. This confirms
> your findings, using files instead of block devices is faster, but
> only when using the io_context patch.

It shows that the one that really matters is the io_context patch,
even when context readahead is running. I guess what happened
in the tests is:
- without readahead (or when the readahead algorithm fails to do proper
sequential readahead), the SCST processes will be submitting
small but close-to-each-other IOs. CFQ relies on the io_context
patch to prevent unnecessary idling.
- with proper readahead, the SCST processes will also be submitting
close readahead IOs. For example, one file's 100-102MB pages are
read ahead by process A, while its 102-104MB pages may be
read ahead by process B. In this case CFQ will also idle waiting
for process A to submit the next IO, but in fact that IO is being
submitted by process B. So the io_context patch is still necessary
even when context readahead is working fine. I guess context
readahead does have the added value of possibly enlarging the IO size
(however, this benchmark seems not to be very sensitive to IO size).
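
A rough userspace way to mimic that cooperative pattern (file path is
hypothetical, file should be at least 200 MB) is two readers taking turns
on adjacent 2MB chunks of one file:

for off in $(seq 0 2 98); do dd if=/mnt/xfs/testfile of=/dev/null bs=2M skip=$off count=1 2>/dev/null; done &
for off in $(seq 1 2 99); do dd if=/mnt/xfs/testfile of=/dev/null bs=2M skip=$off count=1 2>/dev/null; done &
wait

Without the io_context/cfq_coop logic, CFQ can idle on the first reader's
queue while the next sequential IO is actually submitted by the second one.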

Thanks,
Fengguang


2009-07-13 12:47:35

by Ronald Moesbergen

Subject: Re: [RESEND] [PATCH] readahead:add blk_run_backing_dev

2009/7/13 Wu Fengguang <[email protected]>:
> On Mon, Jul 13, 2009 at 08:12:14PM +0800, Ronald Moesbergen wrote:
>> 2009/7/10 Vladislav Bolkhovitin <[email protected]>:
>> >
>> > Vladislav Bolkhovitin, on 07/10/2009 12:43 PM wrote:
>> >>
>> >> Ronald Moesbergen, on 07/10/2009 10:32 AM wrote:
>> >>>>
>> >>>> I've also long ago noticed that reading data from block devices is
>> >>>> slower
>> >>>> than from files from mounted on those block devices file systems. Can
>> >>>> anybody explain it?
>> >>>>
>> >>>> Looks like this is strangeness #2 which we uncovered in our tests (the
>> >>>> first
>> >>>> one was earlier in this thread why the context RA doesn't work with
>> >>>> cooperative I/O threads as good as it should).
>> >>>>
>> >>>> Can you rerun the same 11 tests over a file on the file system, please?
>> >>>
>> >>> I'll see what I can do. Just te be sure: you want me to run
>> >>> blockdev-perftest on a file on the OCFS2 filesystem which is mounted
>> >>> on the client over iScsi, right?
>> >>
>> >> Yes, please.
>> >
>> > Forgot to mention that you should also configure your backend storage as a
>> > big file on a file system (preferably, XFS) too, not as direct device, like
>> > /dev/vg/db-master.
>>
>> Ok, here are the results:
>
> Ronald, thanks for the numbers!

You're welcome.

>> client kernel: 2.6.26-15lenny3 (debian)
>> server kernel: 2.6.29.5 with readahead patch
>
> Do you mean the context readahead patch?

No, I meant the blk_run_backing_dev patch. The patch names are
confusing; I'll be sure to clarify them from now on.

Ronald.

2009-07-13 12:52:41

by Fengguang Wu

Subject: Re: [RESEND] [PATCH] readahead:add blk_run_backing_dev

On Mon, Jul 13, 2009 at 08:47:31PM +0800, Ronald Moesbergen wrote:
> 2009/7/13 Wu Fengguang <[email protected]>:
> > On Mon, Jul 13, 2009 at 08:12:14PM +0800, Ronald Moesbergen wrote:
> >> 2009/7/10 Vladislav Bolkhovitin <[email protected]>:
> >> >
> >> > Vladislav Bolkhovitin, on 07/10/2009 12:43 PM wrote:
> >> >>
> >> >> Ronald Moesbergen, on 07/10/2009 10:32 AM wrote:
> >> >>>>
> >> >>>> I've also long ago noticed that reading data from block devices is
> >> >>>> slower
> >> >>>> than from files from mounted on those block devices file systems. Can
> >> >>>> anybody explain it?
> >> >>>>
> >> >>>> Looks like this is strangeness #2 which we uncovered in our tests (the
> >> >>>> first
> >> >>>> one was earlier in this thread why the context RA doesn't work with
> >> >>>> cooperative I/O threads as good as it should).
> >> >>>>
> >> >>>> Can you rerun the same 11 tests over a file on the file system, please?
> >> >>>
> >> >>> I'll see what I can do. Just te be sure: you want me to run
> >> >>> blockdev-perftest on a file on the OCFS2 filesystem which is mounted
> >> >>> on the client over iScsi, right?
> >> >>
> >> >> Yes, please.
> >> >
> >> > Forgot to mention that you should also configure your backend storage as a
> >> > big file on a file system (preferably, XFS) too, not as direct device, like
> >> > /dev/vg/db-master.
> >>
> >> Ok, here are the results:
> >
> > Ronald, thanks for the numbers!
>
> You're welcome.
>
> >> client kernel: 2.6.26-15lenny3 (debian)
> >> server kernel: 2.6.29.5 with readahead patch
> >
> > Do you mean the context readahead patch?
>
> No, I meant the blk_run_backing_dev patch. The patchnames are
> confusing, I'll be sure to clarify them from now on.

That's OK. I did see previous benchmarks were not helped by context
readahead noticeably on CFQ, hehe.

Thanks,
Fengguang

2009-07-14 18:53:00

by Vladislav Bolkhovitin

Subject: Re: [RESEND] [PATCH] readahead:add blk_run_backing_dev


Wu Fengguang, on 07/13/2009 04:36 PM wrote:
>> Test done with XFS on both the target and the initiator. This confirms
>> your findings, using files instead of block devices is faster, but
>> only when using the io_context patch.
>
> It shows that the one really matters is the io_context patch,
> even when context readahead is running. I guess what happened
> in the tests are:
> - without readahead (or readahead algorithm failed to do proper
> sequential readaheads), the SCST processes will be submitting
> small but close to each other IOs. CFQ relies on the io_context
> patch to prevent unnecessary idling.
> - with proper readahead, the SCST processes will also be submitting
> close readahead IOs. For example, one file's 100-102MB pages is
> readahead by process A, while its 102-104MB pages may be
> readahead by process B. In this case CFQ will also idle waiting
> for process A to submit the next IO, but in fact that IO is being
> submitted by process B. So the io_context patch is still necessary
> even when context readahead is working fine. I guess context
> readahead do have the added value of possibly enlarging the IO size
> (however this benchmark seems to not very sensitive to IO size).

Looks like the truth. Although with 2MB RA I'd expect CFQ to idle >10
times less often, which should bring a bigger improvement than a few percent.

For how long does CFQ idle? For HZ/125, i.e. 8 ms with HZ=250?

> Thanks,
> Fengguang

2009-07-14 18:52:49

by Vladislav Bolkhovitin

Subject: Re: [RESEND] [PATCH] readahead:add blk_run_backing_dev

Ronald Moesbergen, on 07/13/2009 04:12 PM wrote:
> 2009/7/10 Vladislav Bolkhovitin <[email protected]>:
>> Vladislav Bolkhovitin, on 07/10/2009 12:43 PM wrote:
>>> Ronald Moesbergen, on 07/10/2009 10:32 AM wrote:
>>>>> I've also long ago noticed that reading data from block devices is
>>>>> slower
>>>>> than from files from mounted on those block devices file systems. Can
>>>>> anybody explain it?
>>>>>
>>>>> Looks like this is strangeness #2 which we uncovered in our tests (the
>>>>> first
>>>>> one was earlier in this thread why the context RA doesn't work with
>>>>> cooperative I/O threads as good as it should).
>>>>>
>>>>> Can you rerun the same 11 tests over a file on the file system, please?
>>>> I'll see what I can do. Just te be sure: you want me to run
>>>> blockdev-perftest on a file on the OCFS2 filesystem which is mounted
>>>> on the client over iScsi, right?
>>> Yes, please.
>> Forgot to mention that you should also configure your backend storage as a
>> big file on a file system (preferably, XFS) too, not as direct device, like
>> /dev/vg/db-master.
>
> Ok, here are the results:
>
> client kernel: 2.6.26-15lenny3 (debian)
> server kernel: 2.6.29.5 with readahead patch
>
> Test done with XFS on both the target and the initiator. This confirms
> your findings, using files instead of block devices is faster, but
> only when using the io_context patch.

Seems correct, except case (2), which is still 10% faster.

> [benchmark tables trimmed; the full results are in Ronald's message above]

The drop with 64 max_sectors_kb on the client is a consequence of how
CFQ works. I can't find the exact code responsible for this, but
from all signs, CFQ stops delaying requests if the number of outstanding
requests exceeds some threshold, which is 2 or 3. With 64 max_sectors_kb
and 5 SCST I/O threads this threshold is exceeded, so CFQ doesn't
recover the order of requests, hence the performance drop. With the default
512 max_sectors_kb and 128K RA the server sees at most 2 requests at a time.

Ronald, can you perform the same tests with 1 and 2 SCST I/O threads,
please?

You can limit the number of SCST I/O threads with the num_threads
parameter of the scst_vdisk module.
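
For example, to reload the target with 2 I/O threads (the exact
unload/reload steps depend on your setup):

rmmod scst_vdisk
modprobe scst_vdisk num_threads=2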

Thanks,
Vlad

2009-07-15 06:30:45

by Vladislav Bolkhovitin

Subject: Re: [RESEND] [PATCH] readahead:add blk_run_backing_dev

Vladislav Bolkhovitin, on 07/14/2009 10:52 PM wrote:
> Ronald Moesbergen, on 07/13/2009 04:12 PM wrote:
>> 2009/7/10 Vladislav Bolkhovitin <[email protected]>:
>>> Vladislav Bolkhovitin, on 07/10/2009 12:43 PM wrote:
>>>> Ronald Moesbergen, on 07/10/2009 10:32 AM wrote:
>>>>>> I've also long ago noticed that reading data from block devices is
>>>>>> slower
>>>>>> than from files from mounted on those block devices file systems. Can
>>>>>> anybody explain it?
>>>>>>
>>>>>> Looks like this is strangeness #2 which we uncovered in our tests (the
>>>>>> first
>>>>>> one was earlier in this thread why the context RA doesn't work with
>>>>>> cooperative I/O threads as good as it should).
>>>>>>
>>>>>> Can you rerun the same 11 tests over a file on the file system, please?
>>>>> I'll see what I can do. Just te be sure: you want me to run
>>>>> blockdev-perftest on a file on the OCFS2 filesystem which is mounted
>>>>> on the client over iScsi, right?
>>>> Yes, please.
>>> Forgot to mention that you should also configure your backend storage as a
>>> big file on a file system (preferably, XFS) too, not as direct device, like
>>> /dev/vg/db-master.
>> Ok, here are the results:
>>
>> client kernel: 2.6.26-15lenny3 (debian)
>> server kernel: 2.6.29.5 with readahead patch
>>
>> Test done with XFS on both the target and the initiator. This confirms
>> your findings, using files instead of block devices is faster, but
>> only when using the io_context patch.
>
> Seems, correct, except case (2), which is still 10% faster.
>
>> [benchmark tables trimmed -- identical to the results quoted earlier in the thread]
>
> The drop with 64 max_sectors_kb on the client is a consequence of how
> CFQ is working. I can't find the exact code responsible for this, but
> from all signs, CFQ stops delaying requests if amount of outstanding
> requests exceeds some threshold, which is 2 or 3. With 64 max_sectors_kb
> and 5 SCST I/O threads this threshold is exceeded, so CFQ doesn't
> recover order of requests, hence the performance drop. With default 512
> max_sectors_kb and 128K RA the server sees at max 2 requests at time.
>
> Ronald, can you perform the same tests with 1 and 2 SCST I/O threads,
> please?

With the context-RA patch, please, in those and future tests, since it
should make RA for cooperative threads much better.

> You can limit amount of SCST I/O threads by num_threads parameter of
> scst_vdisk module.
>
> Thanks,
> Vlad
>

2009-07-15 07:06:36

by Fengguang Wu

Subject: Re: [RESEND] [PATCH] readahead:add blk_run_backing_dev

On Wed, Jul 15, 2009 at 02:52:27AM +0800, Vladislav Bolkhovitin wrote:
>
> Wu Fengguang, on 07/13/2009 04:36 PM wrote:
> >> Test done with XFS on both the target and the initiator. This confirms
> >> your findings, using files instead of block devices is faster, but
> >> only when using the io_context patch.
> >
> > It shows that the one really matters is the io_context patch,
> > even when context readahead is running. I guess what happened
> > in the tests are:
> > - without readahead (or readahead algorithm failed to do proper
> > sequential readaheads), the SCST processes will be submitting
> > small but close to each other IOs. CFQ relies on the io_context
> > patch to prevent unnecessary idling.
> > - with proper readahead, the SCST processes will also be submitting
> > close readahead IOs. For example, one file's 100-102MB pages is
> > readahead by process A, while its 102-104MB pages may be
> > readahead by process B. In this case CFQ will also idle waiting
> > for process A to submit the next IO, but in fact that IO is being
> > submitted by process B. So the io_context patch is still necessary
> > even when context readahead is working fine. I guess context
> > readahead do have the added value of possibly enlarging the IO size
> > (however this benchmark seems to not very sensitive to IO size).
>
> Looks like the truth. Although with 2MB RA I expect CFQ to do idling >10
> times less, which should bring bigger improvement than few %%.
>
> For how long CFQ idles? For HZ/125, i.e. 8 ms with HZ 250?

Yes, 8ms by default. Note that the 8ms idle timer is armed when the
last IO from the current process completes. So it would definitely be a
waste if the cooperative process submitted the next read/readahead
IO within this 8ms idle window (without cfq_coop.patch).
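
On the server side this idle window is CFQ's slice_idle tunable (in
milliseconds); a quick way to inspect or adjust it, assuming CFQ is the
active scheduler there and sdc is only an example device:

  cat /sys/block/sdc/queue/iosched/slice_idle       # 8 by default
  echo 0 > /sys/block/sdc/queue/iosched/slice_idle  # e.g. disable idling for comparison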

Thanks,
Fengguang

2009-07-15 20:51:47

by Kurt Garloff

[permalink] [raw]
Subject: Re: [RESEND] [PATCH] readahead:add blk_run_backing_dev

Hi,

On Wed, Jul 08, 2009 at 04:40:08PM +0400, Vladislav Bolkhovitin wrote:
> I've also long ago noticed that reading data from block devices is slower
> than from files from mounted on those block devices file systems. Can
> anybody explain it?

Brainstorming:
- block size (reads on the block dev might be done with smaller size)
- readahead (do we use the same RA algo for block devs)
- page cache might be better optimized than buffer cache?

Just guesses from someone that has not looked into that area of the
kernel for a while, so take it with a grain of salt.
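
One way to compare the two cases directly, dropping the cache before each
run (device, mountpoint and file name are only examples):

  echo 3 > /proc/sys/vm/drop_caches
  dd if=/dev/sdc of=/dev/null bs=512K count=2000           # raw block device
  echo 3 > /proc/sys/vm/drop_caches
  dd if=/mnt/xfs/testfile of=/dev/null bs=512K count=2000  # file on the fs on that device
  blockdev --getra /dev/sdc    # RA window of the raw device, in 512-byte sectors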

Cheers,
--
Kurt Garloff, VP OPS Partner Engineering -- Novell Inc.



2009-07-16 07:32:51

by Ronald Moesbergen

[permalink] [raw]
Subject: Re: [RESEND] [PATCH] readahead:add blk_run_backing_dev

2009/7/15 Vladislav Bolkhovitin <[email protected]>:
>> The drop with 64 max_sectors_kb on the client is a consequence of how CFQ
>> is working. I can't find the exact code responsible for this, but from all
>> signs, CFQ stops delaying requests if amount of outstanding requests exceeds
>> some threshold, which is 2 or 3. With 64 max_sectors_kb and 5 SCST I/O
>> threads this threshold is exceeded, so CFQ doesn't recover order of
>> requests, hence the performance drop. With default 512 max_sectors_kb and
>> 128K RA the server sees at max 2 requests at time.
>>
>> Ronald, can you perform the same tests with 1 and 2 SCST I/O threads,
>> please?

Ok. Should I still use the file-on-xfs testcase for this, or should I
go back to using a regular block device? File-over-iSCSI is quite
uncommon, I suppose; most people will export a block device over iSCSI,
not a file.

> With context-RA patch, please, in those and future tests, since it should
> make RA for cooperative threads much better.
>
>> You can limit amount of SCST I/O threads by num_threads parameter of
>> scst_vdisk module.

Ok, I'll try that and include the blk_run_backing_dev,
readahead-context and io_context patches.
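
For reference, a sketch of limiting the vdisk threads with the num_threads
parameter mentioned above, assuming the module can simply be reloaded
between runs:

  rmmod scst_vdisk
  modprobe scst_vdisk num_threads=1   # and num_threads=2 for the second series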

Ronald.

2009-07-16 10:36:09

by Vladislav Bolkhovitin

[permalink] [raw]
Subject: Re: [RESEND] [PATCH] readahead:add blk_run_backing_dev


Ronald Moesbergen, on 07/16/2009 11:32 AM wrote:
> 2009/7/15 Vladislav Bolkhovitin <[email protected]>:
>>> The drop with 64 max_sectors_kb on the client is a consequence of how CFQ
>>> is working. I can't find the exact code responsible for this, but from all
>>> signs, CFQ stops delaying requests if amount of outstanding requests exceeds
>>> some threshold, which is 2 or 3. With 64 max_sectors_kb and 5 SCST I/O
>>> threads this threshold is exceeded, so CFQ doesn't recover order of
>>> requests, hence the performance drop. With default 512 max_sectors_kb and
>>> 128K RA the server sees at max 2 requests at time.
>>>
>>> Ronald, can you perform the same tests with 1 and 2 SCST I/O threads,
>>> please?
>
> Ok. Should I still use the file-on-xfs testcase for this, or should I
> go back to using a regular block device?

Yes, please

> The file-over-iscsi is quite
> uncommon I suppose, most people will export a block device over iscsi,
> not a file.

No, files are common. The main reason people use direct block devices is
the unsupported belief that, compared with files, they "have less
overhead" and so "should be faster". But that isn't true and can easily
be checked.

>> With context-RA patch, please, in those and future tests, since it should
>> make RA for cooperative threads much better.
>>
>>> You can limit amount of SCST I/O threads by num_threads parameter of
>>> scst_vdisk module.
>
> Ok, I'll try that and include the blk_run_backing_dev,
> readahead-context and io_context patches.
>
> Ronald.

2009-07-16 10:38:30

by Vladislav Bolkhovitin

[permalink] [raw]
Subject: Re: [RESEND] [PATCH] readahead:add blk_run_backing_dev


Kurt Garloff, on 07/16/2009 12:52 AM wrote:
> Hi,
>
> On Wed, Jul 08, 2009 at 04:40:08PM +0400, Vladislav Bolkhovitin wrote:
>> I've also long ago noticed that reading data from block devices is slower
>> than from files from mounted on those block devices file systems. Can
>> anybody explain it?
>
> Brainstorming:
> - block size (reads on the block dev might be done with smaller size)

As we already found out in this and other threads, a smaller "block
size", i.e. the size of each request, often means better throughput,
sometimes much better.

> - readahead (do we use the same RA algo for block devs)
> - page cache might be better optimized than buffer cache?
>
> Just guesses from someone that has not looked into that area of the
> kernel for a while, so take it with a grain of salt.
>
> Cheers,

2009-07-16 14:54:47

by Ronald Moesbergen

[permalink] [raw]
Subject: Re: [RESEND] [PATCH] readahead:add blk_run_backing_dev

2009/7/16 Vladislav Bolkhovitin <[email protected]>:
>
> Ronald Moesbergen, on 07/16/2009 11:32 AM wrote:
>>
>> 2009/7/15 Vladislav Bolkhovitin <[email protected]>:
>>>>
>>>> The drop with 64 max_sectors_kb on the client is a consequence of how
>>>> CFQ
>>>> is working. I can't find the exact code responsible for this, but from
>>>> all
>>>> signs, CFQ stops delaying requests if amount of outstanding requests
>>>> exceeds
>>>> some threshold, which is 2 or 3. With 64 max_sectors_kb and 5 SCST I/O
>>>> threads this threshold is exceeded, so CFQ doesn't recover order of
>>>> requests, hence the performance drop. With default 512 max_sectors_kb
>>>> and
>>>> 128K RA the server sees at max 2 requests at time.
>>>>
>>>> Ronald, can you perform the same tests with 1 and 2 SCST I/O threads,
>>>> please?
>>
>> Ok. Should I still use the file-on-xfs testcase for this, or should I
>> go back to using a regular block device?
>
> Yes, please

As in: Yes, go back to block device, or Yes use file-on-xfs?

>> The file-over-iscsi is quite
>> uncommon I suppose, most people will export a block device over iscsi,
>> not a file.
>
> No, files are common. The main reason why people use direct block devices is
> a not supported by anything believe that comparing with files they "have
> less overhead", so "should be faster". But it isn't true and can be easily
> checked.

Well, there are other advantages to using a block device: they are
generally more manageable; for instance, you can use LVM for resizing
instead of strange dd magic to extend a file. When using a file you have
to extend the volume that holds the file first, and then the file itself.
And you don't lose disk space to filesystem metadata twice.
Also, I still don't get why reads/writes from a block device are
different in speed than reads/writes from a file on a filesystem. I
for one will not be using files exported over iSCSI, but block devices
(LVM volumes).
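
Roughly what the two resize paths look like (VG/LV names, mountpoint and
sizes are only examples):

  # LV exported directly:
  lvextend -L +10G /dev/vg0/iscsi_vol
  # file exported from a filesystem on an LV: grow the LV, the fs, then the file
  lvextend -L +10G /dev/vg0/data
  xfs_growfs /mnt/data
  truncate -s +10G /mnt/data/vdisk_file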

Ronald.

2009-07-16 16:04:26

by Vladislav Bolkhovitin

[permalink] [raw]
Subject: Re: [RESEND] [PATCH] readahead:add blk_run_backing_dev


Ronald Moesbergen, on 07/16/2009 06:54 PM wrote:
> 2009/7/16 Vladislav Bolkhovitin <[email protected]>:
>> Ronald Moesbergen, on 07/16/2009 11:32 AM wrote:
>>> 2009/7/15 Vladislav Bolkhovitin <[email protected]>:
>>>>> The drop with 64 max_sectors_kb on the client is a consequence of how
>>>>> CFQ
>>>>> is working. I can't find the exact code responsible for this, but from
>>>>> all
>>>>> signs, CFQ stops delaying requests if amount of outstanding requests
>>>>> exceeds
>>>>> some threshold, which is 2 or 3. With 64 max_sectors_kb and 5 SCST I/O
>>>>> threads this threshold is exceeded, so CFQ doesn't recover order of
>>>>> requests, hence the performance drop. With default 512 max_sectors_kb
>>>>> and
>>>>> 128K RA the server sees at max 2 requests at time.
>>>>>
>>>>> Ronald, can you perform the same tests with 1 and 2 SCST I/O threads,
>>>>> please?
>>> Ok. Should I still use the file-on-xfs testcase for this, or should I
>>> go back to using a regular block device?
>> Yes, please
>
> As in: Yes, go back to block device, or Yes use file-on-xfs?

File-on-xfs :)

>>> The file-over-iscsi is quite
>>> uncommon I suppose, most people will export a block device over iscsi,
>>> not a file.
>> No, files are common. The main reason why people use direct block devices is
>> a not supported by anything believe that comparing with files they "have
>> less overhead", so "should be faster". But it isn't true and can be easily
>> checked.
>
> Well, there are other advantages of using a block device: they are
> generally more manageble, for instance you can use LVM for resizing
> instead of strange dd magic to extend a file. When using a file you
> have to extend the volume that holds the file first, and then the file
> itself.

Files also have advantages. For instance, it's easier to back them up and
move them between servers. On modern systems with fallocate() syscall
support you don't have to do "strange dd magic" to resize files and can
make them bigger nearly instantaneously. Also, with pretty simple
modifications scst_vdisk can be improved to make a single virtual device
from several files.
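
For example, growing a file-backed vdisk in one step (path and size are
only examples; this assumes a filesystem with fallocate() support, such
as XFS, and the fallocate(1) utility from util-linux):

  fallocate -l 20G /mnt/xfs/vdisk_file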

> And you don't lose disk space to filesystem metadata twice.

This is negligible (0.05% for XFS)

> Also, I still don't get why reads/writes from a blockdevice are
> different in speed than reads/writes from a file on a filesystem.

Me too, and I'd appreciate it if someone explained it. But I don't want
to introduce one more variable into the task we are solving (how to get
100+ MB/s from iSCSI on your system).

> I
> for one will not be using files exported over iscsi, but blockdevices
> (LVM volumes).

Are you sure?

> Ronald.

2009-07-17 14:15:54

by Ronald Moesbergen

[permalink] [raw]
Subject: Re: [RESEND] [PATCH] readahead:add blk_run_backing_dev

2009/7/16 Vladislav Bolkhovitin <[email protected]>:
>
> Ronald Moesbergen, on 07/16/2009 11:32 AM wrote:
>>
>> 2009/7/15 Vladislav Bolkhovitin <[email protected]>:
>>>>
>>>> The drop with 64 max_sectors_kb on the client is a consequence of how
>>>> CFQ
>>>> is working. I can't find the exact code responsible for this, but from
>>>> all
>>>> signs, CFQ stops delaying requests if amount of outstanding requests
>>>> exceeds
>>>> some threshold, which is 2 or 3. With 64 max_sectors_kb and 5 SCST I/O
>>>> threads this threshold is exceeded, so CFQ doesn't recover order of
>>>> requests, hence the performance drop. With default 512 max_sectors_kb
>>>> and
>>>> 128K RA the server sees at max 2 requests at time.
>>>>
>>>> Ronald, can you perform the same tests with 1 and 2 SCST I/O threads,
>>>> please?
>>
>> Ok. Should I still use the file-on-xfs testcase for this, or should I
>> go back to using a regular block device?
>
> Yes, please
>
>> The file-over-iscsi is quite
>> uncommon I suppose, most people will export a block device over iscsi,
>> not a file.
>
> No, files are common. The main reason why people use direct block devices is
> a not supported by anything believe that comparing with files they "have
> less overhead", so "should be faster". But it isn't true and can be easily
> checked.
>
>>> With context-RA patch, please, in those and future tests, since it should
>>> make RA for cooperative threads much better.
>>>
>>>> You can limit amount of SCST I/O threads by num_threads parameter of
>>>> scst_vdisk module.
>>
>> Ok, I'll try that and include the blk_run_backing_dev,
>> readahead-context and io_context patches.

The results:

client kernel: 2.6.26-15lenny3 (debian)
server kernel: 2.6.29.5 with readahead-context, blk_run_backing_dev
and io_context
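
For reference, the max_sectors_kb and RA values in the cases below map to
sysfs knobs roughly like this (sdc is only an example device; 2MB RA is
2048 KB):

  echo 64 > /sys/block/sdc/queue/max_sectors_kb
  echo 2048 > /sys/block/sdc/queue/read_ahead_kb   # or: blockdev --setra 4096 /dev/sdc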

With one IO thread:

5) client: default, server: default
blocksize R R R R(avg, R(std R
(bytes) (s) (s) (s) MB/s) ,MB/s) (IOPS)
67108864 15.990 15.308 16.689 64.097 2.259 1.002
33554432 15.981 16.064 16.221 63.651 0.392 1.989
16777216 15.841 15.660 16.031 64.635 0.619 4.040

6) client: default, server: 64 max_sectors_kb, RA default
blocksize R R R R(avg, R(std R
(bytes) (s) (s) (s) MB/s) ,MB/s) (IOPS)
67108864 16.035 16.024 16.654 63.084 1.130 0.986
33554432 15.924 15.975 16.359 63.668 0.762 1.990
16777216 16.168 16.104 15.838 63.858 0.571 3.991

7) client: default, server: default max_sectors_kb, RA 2MB
blocksize R R R R(avg, R(std R
(bytes) (s) (s) (s) MB/s) ,MB/s) (IOPS)
67108864 14.895 16.142 15.998 65.398 2.379 1.022
33554432 16.753 16.169 16.067 62.729 1.146 1.960
16777216 16.866 15.912 16.099 62.892 1.570 3.931

8) client: default, server: 64 max_sectors_kb, RA 2MB
blocksize R R R R(avg, R(std R
(bytes) (s) (s) (s) MB/s) ,MB/s) (IOPS)
67108864 15.923 15.716 16.741 63.545 1.715 0.993
33554432 16.010 16.026 16.113 63.802 0.180 1.994
16777216 16.644 16.239 16.143 62.672 0.827 3.917

9) client: 64 max_sectors_kb, default RA. server: 64 max_sectors_kb, RA 2MB
blocksize R R R R(avg, R(std R
(bytes) (s) (s) (s) MB/s) ,MB/s) (IOPS)
67108864 15.753 15.882 15.482 65.207 0.697 1.019
33554432 15.670 16.268 15.669 64.548 1.134 2.017
16777216 15.746 15.519 16.411 64.471 1.516 4.029

10) client: default max_sectors_kb, 2MB RA. server: 64 max_sectors_kb, RA 2MB
blocksize R R R R(avg, R(std R
(bytes) (s) (s) (s) MB/s) ,MB/s) (IOPS)
67108864 13.639 14.360 13.654 73.795 1.758 1.153
33554432 13.584 13.938 14.538 73.095 2.035 2.284
16777216 13.617 13.510 13.803 75.060 0.665 4.691

11) client: 64 max_sectors_kb, 2MB. RA server: 64 max_sectors_kb, RA 2MB
blocksize R R R R(avg, R(std R
(bytes) (s) (s) (s) MB/s) ,MB/s) (IOPS)
67108864 13.428 13.541 14.144 74.760 1.690 1.168
33554432 13.707 13.352 13.462 75.821 0.827 2.369
16777216 14.380 13.504 13.675 73.975 1.991 4.623

With two threads:
5) client: default, server: default
blocksize R R R R(avg, R(std R
(bytes) (s) (s) (s) MB/s) ,MB/s) (IOPS)
67108864 12.453 12.173 13.014 81.677 2.254 1.276
33554432 12.066 11.999 12.960 83.073 2.877 2.596
16777216 13.719 11.969 12.569 80.554 4.500 5.035

6) client: default, server: 64 max_sectors_kb, RA default
blocksize R R R R(avg, R(std R
(bytes) (s) (s) (s) MB/s) ,MB/s) (IOPS)
67108864 12.886 12.201 12.147 82.564 2.198 1.290
33554432 12.344 12.928 12.007 82.483 2.504 2.578
16777216 12.380 11.951 13.119 82.151 3.141 5.134

7) client: default, server: default max_sectors_kb, RA 2MB
blocksize R R R R(avg, R(std R
(bytes) (s) (s) (s) MB/s) ,MB/s) (IOPS)
67108864 12.824 13.485 13.534 77.148 1.913 1.205
33554432 12.084 13.752 12.111 81.251 4.800 2.539
16777216 12.658 13.035 11.196 83.640 5.612 5.227

8) client: default, server: 64 max_sectors_kb, RA 2MB
blocksize R R R R(avg, R(std R
(bytes) (s) (s) (s) MB/s) ,MB/s) (IOPS)
67108864 12.253 12.552 11.773 84.044 2.230 1.313
33554432 13.177 12.456 11.604 82.723 4.316 2.585
16777216 12.471 12.318 13.006 81.324 1.878 5.083

9) client: 64 max_sectors_kb, default RA. server: 64 max_sectors_kb, RA 2MB
blocksize R R R R(avg, R(std R
(bytes) (s) (s) (s) MB/s) ,MB/s) (IOPS)
67108864 14.409 13.311 14.278 73.238 2.624 1.144
33554432 14.665 14.260 14.080 71.455 1.211 2.233
16777216 14.179 14.810 14.640 70.438 1.303 4.402

10) client: default max_sectors_kb, 2MB RA. server: 64 max_sectors_kb, RA 2MB
blocksize R R R R(avg, R(std R
(bytes) (s) (s) (s) MB/s) ,MB/s) (IOPS)
67108864 13.401 14.107 13.549 74.860 1.642 1.170
33554432 14.575 13.221 14.428 72.894 3.236 2.278
16777216 13.771 14.227 13.594 73.887 1.408 4.618

11) client: 64 max_sectors_kb, 2MB. RA server: 64 max_sectors_kb, RA 2MB
blocksize R R R R(avg, R(std R
(bytes) (s) (s) (s) MB/s) ,MB/s) (IOPS)
67108864 10.286 12.272 10.245 94.317 7.690 1.474
33554432 10.241 10.415 13.374 91.624 10.670 2.863
16777216 10.499 10.224 10.792 97.526 2.151 6.095

The last result comes close to 100MB/s!

Ronald.

2009-07-17 18:23:30

by Vladislav Bolkhovitin

[permalink] [raw]
Subject: Re: [RESEND] [PATCH] readahead:add blk_run_backing_dev


Ronald Moesbergen, on 07/17/2009 06:15 PM wrote:
> 2009/7/16 Vladislav Bolkhovitin <[email protected]>:
>> Ronald Moesbergen, on 07/16/2009 11:32 AM wrote:
>>> 2009/7/15 Vladislav Bolkhovitin <[email protected]>:
>>>>> The drop with 64 max_sectors_kb on the client is a consequence of how
>>>>> CFQ
>>>>> is working. I can't find the exact code responsible for this, but from
>>>>> all
>>>>> signs, CFQ stops delaying requests if amount of outstanding requests
>>>>> exceeds
>>>>> some threshold, which is 2 or 3. With 64 max_sectors_kb and 5 SCST I/O
>>>>> threads this threshold is exceeded, so CFQ doesn't recover order of
>>>>> requests, hence the performance drop. With default 512 max_sectors_kb
>>>>> and
>>>>> 128K RA the server sees at max 2 requests at time.
>>>>>
>>>>> Ronald, can you perform the same tests with 1 and 2 SCST I/O threads,
>>>>> please?
>>> Ok. Should I still use the file-on-xfs testcase for this, or should I
>>> go back to using a regular block device?
>> Yes, please
>>
>>> The file-over-iscsi is quite
>>> uncommon I suppose, most people will export a block device over iscsi,
>>> not a file.
>> No, files are common. The main reason why people use direct block devices is
>> a not supported by anything believe that comparing with files they "have
>> less overhead", so "should be faster". But it isn't true and can be easily
>> checked.
>>
>>>> With context-RA patch, please, in those and future tests, since it should
>>>> make RA for cooperative threads much better.
>>>>
>>>>> You can limit amount of SCST I/O threads by num_threads parameter of
>>>>> scst_vdisk module.
>>> Ok, I'll try that and include the blk_run_backing_dev,
>>> readahead-context and io_context patches.
>
> The results:
>
> client kernel: 2.6.26-15lenny3 (debian)
> server kernel: 2.6.29.5 with readahead-context, blk_run_backing_dev
> and io_context
>
> With one IO thread:
>
> 5) client: default, server: default
> blocksize R R R R(avg, R(std R
> (bytes) (s) (s) (s) MB/s) ,MB/s) (IOPS)
> 67108864 15.990 15.308 16.689 64.097 2.259 1.002
> 33554432 15.981 16.064 16.221 63.651 0.392 1.989
> 16777216 15.841 15.660 16.031 64.635 0.619 4.040
>
> 6) client: default, server: 64 max_sectors_kb, RA default
> blocksize R R R R(avg, R(std R
> (bytes) (s) (s) (s) MB/s) ,MB/s) (IOPS)
> 67108864 16.035 16.024 16.654 63.084 1.130 0.986
> 33554432 15.924 15.975 16.359 63.668 0.762 1.990
> 16777216 16.168 16.104 15.838 63.858 0.571 3.991
>
> 7) client: default, server: default max_sectors_kb, RA 2MB
> blocksize R R R R(avg, R(std R
> (bytes) (s) (s) (s) MB/s) ,MB/s) (IOPS)
> 67108864 14.895 16.142 15.998 65.398 2.379 1.022
> 33554432 16.753 16.169 16.067 62.729 1.146 1.960
> 16777216 16.866 15.912 16.099 62.892 1.570 3.931
>
> 8) client: default, server: 64 max_sectors_kb, RA 2MB
> blocksize R R R R(avg, R(std R
> (bytes) (s) (s) (s) MB/s) ,MB/s) (IOPS)
> 67108864 15.923 15.716 16.741 63.545 1.715 0.993
> 33554432 16.010 16.026 16.113 63.802 0.180 1.994
> 16777216 16.644 16.239 16.143 62.672 0.827 3.917
>
> 9) client: 64 max_sectors_kb, default RA. server: 64 max_sectors_kb, RA 2MB
> blocksize R R R R(avg, R(std R
> (bytes) (s) (s) (s) MB/s) ,MB/s) (IOPS)
> 67108864 15.753 15.882 15.482 65.207 0.697 1.019
> 33554432 15.670 16.268 15.669 64.548 1.134 2.017
> 16777216 15.746 15.519 16.411 64.471 1.516 4.029
>
> 10) client: default max_sectors_kb, 2MB RA. server: 64 max_sectors_kb, RA 2MB
> blocksize R R R R(avg, R(std R
> (bytes) (s) (s) (s) MB/s) ,MB/s) (IOPS)
> 67108864 13.639 14.360 13.654 73.795 1.758 1.153
> 33554432 13.584 13.938 14.538 73.095 2.035 2.284
> 16777216 13.617 13.510 13.803 75.060 0.665 4.691
>
> 11) client: 64 max_sectors_kb, 2MB. RA server: 64 max_sectors_kb, RA 2MB
> blocksize R R R R(avg, R(std R
> (bytes) (s) (s) (s) MB/s) ,MB/s) (IOPS)
> 67108864 13.428 13.541 14.144 74.760 1.690 1.168
> 33554432 13.707 13.352 13.462 75.821 0.827 2.369
> 16777216 14.380 13.504 13.675 73.975 1.991 4.623
>
> With two threads:
> 5) client: default, server: default
> blocksize R R R R(avg, R(std R
> (bytes) (s) (s) (s) MB/s) ,MB/s) (IOPS)
> 67108864 12.453 12.173 13.014 81.677 2.254 1.276
> 33554432 12.066 11.999 12.960 83.073 2.877 2.596
> 16777216 13.719 11.969 12.569 80.554 4.500 5.035
>
> 6) client: default, server: 64 max_sectors_kb, RA default
> blocksize R R R R(avg, R(std R
> (bytes) (s) (s) (s) MB/s) ,MB/s) (IOPS)
> 67108864 12.886 12.201 12.147 82.564 2.198 1.290
> 33554432 12.344 12.928 12.007 82.483 2.504 2.578
> 16777216 12.380 11.951 13.119 82.151 3.141 5.134
>
> 7) client: default, server: default max_sectors_kb, RA 2MB
> blocksize R R R R(avg, R(std R
> (bytes) (s) (s) (s) MB/s) ,MB/s) (IOPS)
> 67108864 12.824 13.485 13.534 77.148 1.913 1.205
> 33554432 12.084 13.752 12.111 81.251 4.800 2.539
> 16777216 12.658 13.035 11.196 83.640 5.612 5.227
>
> 8) client: default, server: 64 max_sectors_kb, RA 2MB
> blocksize R R R R(avg, R(std R
> (bytes) (s) (s) (s) MB/s) ,MB/s) (IOPS)
> 67108864 12.253 12.552 11.773 84.044 2.230 1.313
> 33554432 13.177 12.456 11.604 82.723 4.316 2.585
> 16777216 12.471 12.318 13.006 81.324 1.878 5.083
>
> 9) client: 64 max_sectors_kb, default RA. server: 64 max_sectors_kb, RA 2MB
> blocksize R R R R(avg, R(std R
> (bytes) (s) (s) (s) MB/s) ,MB/s) (IOPS)
> 67108864 14.409 13.311 14.278 73.238 2.624 1.144
> 33554432 14.665 14.260 14.080 71.455 1.211 2.233
> 16777216 14.179 14.810 14.640 70.438 1.303 4.402
>
> 10) client: default max_sectors_kb, 2MB RA. server: 64 max_sectors_kb, RA 2MB
> blocksize R R R R(avg, R(std R
> (bytes) (s) (s) (s) MB/s) ,MB/s) (IOPS)
> 67108864 13.401 14.107 13.549 74.860 1.642 1.170
> 33554432 14.575 13.221 14.428 72.894 3.236 2.278
> 16777216 13.771 14.227 13.594 73.887 1.408 4.618
>
> 11) client: 64 max_sectors_kb, 2MB. RA server: 64 max_sectors_kb, RA 2MB
> blocksize R R R R(avg, R(std R
> (bytes) (s) (s) (s) MB/s) ,MB/s) (IOPS)
> 67108864 10.286 12.272 10.245 94.317 7.690 1.474
> 33554432 10.241 10.415 13.374 91.624 10.670 2.863
> 16777216 10.499 10.224 10.792 97.526 2.151 6.095
>
> The last result comes close to 100MB/s!

Good! Although I expected the maximum with a single thread.

Can you do the same set of tests with deadline scheduler on the server?

Thanks,
Vlad

2009-07-20 07:20:48

by Vladislav Bolkhovitin

[permalink] [raw]
Subject: Re: [RESEND] [PATCH] readahead:add blk_run_backing_dev


Vladislav Bolkhovitin, on 07/17/2009 10:23 PM wrote:
> Ronald Moesbergen, on 07/17/2009 06:15 PM wrote:
>> 2009/7/16 Vladislav Bolkhovitin <[email protected]>:
>>> Ronald Moesbergen, on 07/16/2009 11:32 AM wrote:
>>>> 2009/7/15 Vladislav Bolkhovitin <[email protected]>:
>>>>>> The drop with 64 max_sectors_kb on the client is a consequence of how
>>>>>> CFQ
>>>>>> is working. I can't find the exact code responsible for this, but from
>>>>>> all
>>>>>> signs, CFQ stops delaying requests if amount of outstanding requests
>>>>>> exceeds
>>>>>> some threshold, which is 2 or 3. With 64 max_sectors_kb and 5 SCST I/O
>>>>>> threads this threshold is exceeded, so CFQ doesn't recover order of
>>>>>> requests, hence the performance drop. With default 512 max_sectors_kb
>>>>>> and
>>>>>> 128K RA the server sees at max 2 requests at time.
>>>>>>
>>>>>> Ronald, can you perform the same tests with 1 and 2 SCST I/O threads,
>>>>>> please?
>>>> Ok. Should I still use the file-on-xfs testcase for this, or should I
>>>> go back to using a regular block device?
>>> Yes, please
>>>
>>>> The file-over-iscsi is quite
>>>> uncommon I suppose, most people will export a block device over iscsi,
>>>> not a file.
>>> No, files are common. The main reason why people use direct block devices is
>>> a not supported by anything believe that comparing with files they "have
>>> less overhead", so "should be faster". But it isn't true and can be easily
>>> checked.
>>>
>>>>> With context-RA patch, please, in those and future tests, since it should
>>>>> make RA for cooperative threads much better.
>>>>>
>>>>>> You can limit amount of SCST I/O threads by num_threads parameter of
>>>>>> scst_vdisk module.
>>>> Ok, I'll try that and include the blk_run_backing_dev,
>>>> readahead-context and io_context patches.
>> The results:
>>
>> client kernel: 2.6.26-15lenny3 (debian)
>> server kernel: 2.6.29.5 with readahead-context, blk_run_backing_dev
>> and io_context
>>
>> With one IO thread:
>>
>> 5) client: default, server: default
>> blocksize R R R R(avg, R(std R
>> (bytes) (s) (s) (s) MB/s) ,MB/s) (IOPS)
>> 67108864 15.990 15.308 16.689 64.097 2.259 1.002
>> 33554432 15.981 16.064 16.221 63.651 0.392 1.989
>> 16777216 15.841 15.660 16.031 64.635 0.619 4.040
>>
>> 6) client: default, server: 64 max_sectors_kb, RA default
>> blocksize R R R R(avg, R(std R
>> (bytes) (s) (s) (s) MB/s) ,MB/s) (IOPS)
>> 67108864 16.035 16.024 16.654 63.084 1.130 0.986
>> 33554432 15.924 15.975 16.359 63.668 0.762 1.990
>> 16777216 16.168 16.104 15.838 63.858 0.571 3.991
>>
>> 7) client: default, server: default max_sectors_kb, RA 2MB
>> blocksize R R R R(avg, R(std R
>> (bytes) (s) (s) (s) MB/s) ,MB/s) (IOPS)
>> 67108864 14.895 16.142 15.998 65.398 2.379 1.022
>> 33554432 16.753 16.169 16.067 62.729 1.146 1.960
>> 16777216 16.866 15.912 16.099 62.892 1.570 3.931
>>
>> 8) client: default, server: 64 max_sectors_kb, RA 2MB
>> blocksize R R R R(avg, R(std R
>> (bytes) (s) (s) (s) MB/s) ,MB/s) (IOPS)
>> 67108864 15.923 15.716 16.741 63.545 1.715 0.993
>> 33554432 16.010 16.026 16.113 63.802 0.180 1.994
>> 16777216 16.644 16.239 16.143 62.672 0.827 3.917
>>
>> 9) client: 64 max_sectors_kb, default RA. server: 64 max_sectors_kb, RA 2MB
>> blocksize R R R R(avg, R(std R
>> (bytes) (s) (s) (s) MB/s) ,MB/s) (IOPS)
>> 67108864 15.753 15.882 15.482 65.207 0.697 1.019
>> 33554432 15.670 16.268 15.669 64.548 1.134 2.017
>> 16777216 15.746 15.519 16.411 64.471 1.516 4.029
>>
>> 10) client: default max_sectors_kb, 2MB RA. server: 64 max_sectors_kb, RA 2MB
>> blocksize R R R R(avg, R(std R
>> (bytes) (s) (s) (s) MB/s) ,MB/s) (IOPS)
>> 67108864 13.639 14.360 13.654 73.795 1.758 1.153
>> 33554432 13.584 13.938 14.538 73.095 2.035 2.284
>> 16777216 13.617 13.510 13.803 75.060 0.665 4.691
>>
>> 11) client: 64 max_sectors_kb, 2MB. RA server: 64 max_sectors_kb, RA 2MB
>> blocksize R R R R(avg, R(std R
>> (bytes) (s) (s) (s) MB/s) ,MB/s) (IOPS)
>> 67108864 13.428 13.541 14.144 74.760 1.690 1.168
>> 33554432 13.707 13.352 13.462 75.821 0.827 2.369
>> 16777216 14.380 13.504 13.675 73.975 1.991 4.623
>>
>> With two threads:
>> 5) client: default, server: default
>> blocksize R R R R(avg, R(std R
>> (bytes) (s) (s) (s) MB/s) ,MB/s) (IOPS)
>> 67108864 12.453 12.173 13.014 81.677 2.254 1.276
>> 33554432 12.066 11.999 12.960 83.073 2.877 2.596
>> 16777216 13.719 11.969 12.569 80.554 4.500 5.035
>>
>> 6) client: default, server: 64 max_sectors_kb, RA default
>> blocksize R R R R(avg, R(std R
>> (bytes) (s) (s) (s) MB/s) ,MB/s) (IOPS)
>> 67108864 12.886 12.201 12.147 82.564 2.198 1.290
>> 33554432 12.344 12.928 12.007 82.483 2.504 2.578
>> 16777216 12.380 11.951 13.119 82.151 3.141 5.134
>>
>> 7) client: default, server: default max_sectors_kb, RA 2MB
>> blocksize R R R R(avg, R(std R
>> (bytes) (s) (s) (s) MB/s) ,MB/s) (IOPS)
>> 67108864 12.824 13.485 13.534 77.148 1.913 1.205
>> 33554432 12.084 13.752 12.111 81.251 4.800 2.539
>> 16777216 12.658 13.035 11.196 83.640 5.612 5.227
>>
>> 8) client: default, server: 64 max_sectors_kb, RA 2MB
>> blocksize R R R R(avg, R(std R
>> (bytes) (s) (s) (s) MB/s) ,MB/s) (IOPS)
>> 67108864 12.253 12.552 11.773 84.044 2.230 1.313
>> 33554432 13.177 12.456 11.604 82.723 4.316 2.585
>> 16777216 12.471 12.318 13.006 81.324 1.878 5.083
>>
>> 9) client: 64 max_sectors_kb, default RA. server: 64 max_sectors_kb, RA 2MB
>> blocksize R R R R(avg, R(std R
>> (bytes) (s) (s) (s) MB/s) ,MB/s) (IOPS)
>> 67108864 14.409 13.311 14.278 73.238 2.624 1.144
>> 33554432 14.665 14.260 14.080 71.455 1.211 2.233
>> 16777216 14.179 14.810 14.640 70.438 1.303 4.402
>>
>> 10) client: default max_sectors_kb, 2MB RA. server: 64 max_sectors_kb, RA 2MB
>> blocksize R R R R(avg, R(std R
>> (bytes) (s) (s) (s) MB/s) ,MB/s) (IOPS)
>> 67108864 13.401 14.107 13.549 74.860 1.642 1.170
>> 33554432 14.575 13.221 14.428 72.894 3.236 2.278
>> 16777216 13.771 14.227 13.594 73.887 1.408 4.618
>>
>> 11) client: 64 max_sectors_kb, 2MB. RA server: 64 max_sectors_kb, RA 2MB
>> blocksize R R R R(avg, R(std R
>> (bytes) (s) (s) (s) MB/s) ,MB/s) (IOPS)
>> 67108864 10.286 12.272 10.245 94.317 7.690 1.474
>> 33554432 10.241 10.415 13.374 91.624 10.670 2.863
>> 16777216 10.499 10.224 10.792 97.526 2.151 6.095
>>
>> The last result comes close to 100MB/s!
>
> Good! Although I expected maximum with a single thread.
>
> Can you do the same set of tests with deadline scheduler on the server?

The case of 5 I/O threads (the default) will also be interesting, i.e.,
overall, the cases of 1, 2 and 5 I/O threads with the deadline scheduler
on the server.

Thanks,
Vlad

2009-07-22 08:44:27

by Ronald Moesbergen

[permalink] [raw]
Subject: Re: [RESEND] [PATCH] readahead:add blk_run_backing_dev

2009/7/20 Vladislav Bolkhovitin <[email protected]>:
>>>
>>> The last result comes close to 100MB/s!
>>
>> Good! Although I expected maximum with a single thread.
>>
>> Can you do the same set of tests with deadline scheduler on the server?
>
> Case of 5 I/O threads (default) will also be interesting. I.e., overall,
> cases of 1, 2 and 5 I/O threads with deadline scheduler on the server.

Ok. The results:

CFQ seems to perform better than deadline in this case.

client kernel: 2.6.26-15lenny3 (debian)
server kernel: 2.6.29.5 with readahead-context, blk_run_backing_dev
and io_context
server scheduler: deadline
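
The scheduler can be switched at run time, e.g. (sdb is only an example
device; the active scheduler is shown in brackets):

  echo deadline > /sys/block/sdb/queue/scheduler
  cat /sys/block/sdb/queue/scheduler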

With one IO thread:
5) client: default, server: default
blocksize R R R R(avg, R(std R
(bytes) (s) (s) (s) MB/s) ,MB/s) (IOPS)
67108864 16.067 16.883 16.096 62.669 1.427 0.979
33554432 16.034 16.564 16.050 63.161 0.948 1.974
16777216 16.045 15.086 16.709 64.329 2.715 4.021

6) client: default, server: 64 max_sectors_kb, RA default
blocksize R R R R(avg, R(std R
(bytes) (s) (s) (s) MB/s) ,MB/s) (IOPS)
67108864 15.851 15.348 16.652 64.271 2.147 1.004
33554432 16.182 16.104 16.170 63.397 0.135 1.981
16777216 15.952 16.085 16.258 63.613 0.493 3.976

7) client: default, server: default max_sectors_kb, RA 2MB
blocksize R R R R(avg, R(std R
(bytes) (s) (s) (s) MB/s) ,MB/s) (IOPS)
67108864 15.814 16.222 16.650 63.126 1.327 0.986
33554432 16.113 15.962 16.340 63.456 0.610 1.983
16777216 16.149 16.098 15.895 63.815 0.438 3.988

8) client: default, server: 64 max_sectors_kb, RA 2MB
blocksize R R R R(avg, R(std R
(bytes) (s) (s) (s) MB/s) ,MB/s) (IOPS)
67108864 16.032 17.163 15.864 62.695 2.161 0.980
33554432 16.163 15.499 16.466 63.870 1.626 1.996
16777216 16.067 16.133 16.710 62.829 1.099 3.927

9) client: 64 max_sectors_kb, default RA. server: 64 max_sectors_kb, RA 2MB
blocksize R R R R(avg, R(std R
(bytes) (s) (s) (s) MB/s) ,MB/s) (IOPS)
67108864 15.498 15.474 15.195 66.547 0.599 1.040
33554432 15.729 15.636 15.758 65.192 0.214 2.037
16777216 15.656 15.481 15.724 65.557 0.430 4.097

10) client: default max_sectors_kb, 2MB RA. server: 64 max_sectors_kb, RA 2MB
blocksize R R R R(avg, R(std R
(bytes) (s) (s) (s) MB/s) ,MB/s) (IOPS)
67108864 13.480 14.125 13.648 74.497 1.466 1.164
33554432 13.584 13.518 14.272 74.293 1.806 2.322
16777216 13.511 13.585 13.552 75.576 0.170 4.723

11) client: 64 max_sectors_kb, 2MB. RA server: 64 max_sectors_kb, RA 2MB
blocksize R R R R(avg, R(std R
(bytes) (s) (s) (s) MB/s) ,MB/s) (IOPS)
67108864 13.356 13.079 13.488 76.960 0.991 1.203
33554432 13.713 13.038 13.030 77.268 1.834 2.415
16777216 13.895 13.032 13.128 76.758 2.178 4.797

With two threads:
5) client: default, server: default
blocksize R R R R(avg, R(std R
(bytes) (s) (s) (s) MB/s) ,MB/s) (IOPS)
67108864 12.661 12.773 13.654 78.681 2.622 1.229
33554432 12.709 12.693 12.459 81.145 0.738 2.536
16777216 12.657 14.055 13.237 77.038 3.292 4.815

6) client: default, server: 64 max_sectors_kb, RA default
blocksize R R R R(avg, R(std R
(bytes) (s) (s) (s) MB/s) ,MB/s) (IOPS)
67108864 13.300 12.877 13.705 77.078 1.964 1.204
33554432 13.025 14.404 12.833 76.501 3.855 2.391
16777216 13.172 13.220 12.997 77.995 0.570 4.875

7) client: default, server: default max_sectors_kb, RA 2MB
blocksize R R R R(avg, R(std R
(bytes) (s) (s) (s) MB/s) ,MB/s) (IOPS)
67108864 13.365 13.168 12.835 78.053 1.308 1.220
33554432 13.518 13.122 13.366 76.799 0.942 2.400
16777216 13.177 13.146 13.839 76.534 1.797 4.783

8) client: default, server: 64 max_sectors_kb, RA 2MB
blocksize R R R R(avg, R(std R
(bytes) (s) (s) (s) MB/s) ,MB/s) (IOPS)
67108864 14.308 12.669 13.520 76.045 3.788 1.188
33554432 12.586 12.897 13.221 79.405 1.596 2.481
16777216 13.766 12.583 14.176 76.001 3.903 4.750

9) client: 64 max_sectors_kb, default RA. server: 64 max_sectors_kb, RA 2MB
blocksize R R R R(avg, R(std R
(bytes) (s) (s) (s) MB/s) ,MB/s) (IOPS)
67108864 14.454 12.537 15.058 73.509 5.893 1.149
33554432 15.871 14.201 13.846 70.194 4.083 2.194
16777216 14.721 13.346 14.434 72.410 3.104 4.526

10) client: default max_sectors_kb, 2MB RA. server: 64 max_sectors_kb, RA 2MB
blocksize R R R R(avg, R(std R
(bytes) (s) (s) (s) MB/s) ,MB/s) (IOPS)
67108864 13.262 13.308 13.416 76.828 0.371 1.200
33554432 13.915 13.182 13.065 76.551 2.114 2.392
16777216 13.223 14.133 13.317 75.596 2.232 4.725

11) client: 64 max_sectors_kb, 2MB. RA server: 64 max_sectors_kb, RA 2MB
blocksize R R R R(avg, R(std R
(bytes) (s) (s) (s) MB/s) ,MB/s) (IOPS)
67108864 18.277 17.743 17.534 57.380 0.997 0.897
33554432 18.018 17.728 17.343 57.879 0.907 1.809
16777216 17.600 18.466 17.645 57.223 1.253 3.576

With five threads:
5) client: default, server: default
blocksize R R R R(avg, R(std R
(bytes) (s) (s) (s) MB/s) ,MB/s) (IOPS)
67108864 12.915 13.643 12.572 78.598 2.654 1.228
33554432 12.716 12.970 13.283 78.858 1.403 2.464
16777216 14.372 13.282 13.122 75.461 3.002 4.716

6) client: default, server: 64 max_sectors_kb, RA default
blocksize R R R R(avg, R(std R
(bytes) (s) (s) (s) MB/s) ,MB/s) (IOPS)
67108864 13.372 13.205 12.468 78.750 2.421 1.230
33554432 13.489 13.352 12.883 77.363 1.533 2.418
16777216 13.127 12.653 14.252 76.928 3.785 4.808

7) client: default, server: default max_sectors_kb, RA 2MB
blocksize R R R R(avg, R(std R
(bytes) (s) (s) (s) MB/s) ,MB/s) (IOPS)
67108864 13.135 13.031 13.824 76.872 1.994 1.201
33554432 13.079 13.590 13.730 76.076 1.600 2.377
16777216 12.707 12.951 13.805 77.942 2.735 4.871

8) client: default, server: 64 max_sectors_kb, RA 2MB
blocksize R R R R(avg, R(std R
(bytes) (s) (s) (s) MB/s) ,MB/s) (IOPS)
67108864 13.030 12.947 13.538 77.772 1.524 1.215
33554432 12.826 12.973 13.805 77.649 2.482 2.427
16777216 12.751 13.007 12.986 79.295 0.718 4.956

9) client: 64 max_sectors_kb, default RA. server: 64 max_sectors_kb, RA 2MB
blocksize R R R R(avg, R(std R
(bytes) (s) (s) (s) MB/s) ,MB/s) (IOPS)
67108864 13.236 13.349 13.833 76.034 1.445 1.188
33554432 13.481 14.259 13.582 74.389 1.836 2.325
16777216 14.394 13.922 13.943 72.712 1.111 4.545

10) client: default max_sectors_kb, 2MB RA. server: 64 max_sectors_kb, RA 2MB
blocksize R R R R(avg, R(std R
(bytes) (s) (s) (s) MB/s) ,MB/s) (IOPS)
67108864 18.245 18.690 17.342 56.654 1.779 0.885
33554432 17.744 18.122 17.577 57.492 0.731 1.797
16777216 18.280 18.564 17.846 56.186 0.914 3.512

11) client: 64 max_sectors_kb, 2MB. RA server: 64 max_sectors_kb, RA 2MB
blocksize R R R R(avg, R(std R
(bytes) (s) (s) (s) MB/s) ,MB/s) (IOPS)
67108864 15.241 16.894 15.853 64.131 2.705 1.002
33554432 14.858 16.904 15.588 65.064 3.435 2.033
16777216 16.777 15.939 15.034 64.465 2.893 4.029

Ronald.

2009-07-27 13:11:12

by Vladislav Bolkhovitin

[permalink] [raw]
Subject: Re: [RESEND] [PATCH] readahead:add blk_run_backing_dev



Ronald Moesbergen, on 07/22/2009 12:44 PM wrote:
> 2009/7/20 Vladislav Bolkhovitin <[email protected]>:
>>>> The last result comes close to 100MB/s!
>>> Good! Although I expected maximum with a single thread.
>>>
>>> Can you do the same set of tests with deadline scheduler on the server?
>> Case of 5 I/O threads (default) will also be interesting. I.e., overall,
>> cases of 1, 2 and 5 I/O threads with deadline scheduler on the server.
>
> Ok. The results:
>
> Cfq seems to perform better in this case.
>
> client kernel: 2.6.26-15lenny3 (debian)
> server kernel: 2.6.29.5 with readahead-context, blk_run_backing_dev
> and io_context
> server scheduler: deadline
>
> With one IO thread:
> 5) client: default, server: default
> blocksize R R R R(avg, R(std R
> (bytes) (s) (s) (s) MB/s) ,MB/s) (IOPS)
> 67108864 16.067 16.883 16.096 62.669 1.427 0.979
> 33554432 16.034 16.564 16.050 63.161 0.948 1.974
> 16777216 16.045 15.086 16.709 64.329 2.715 4.021
>
> 6) client: default, server: 64 max_sectors_kb, RA default
> blocksize R R R R(avg, R(std R
> (bytes) (s) (s) (s) MB/s) ,MB/s) (IOPS)
> 67108864 15.851 15.348 16.652 64.271 2.147 1.004
> 33554432 16.182 16.104 16.170 63.397 0.135 1.981
> 16777216 15.952 16.085 16.258 63.613 0.493 3.976
>
> 7) client: default, server: default max_sectors_kb, RA 2MB
> blocksize R R R R(avg, R(std R
> (bytes) (s) (s) (s) MB/s) ,MB/s) (IOPS)
> 67108864 15.814 16.222 16.650 63.126 1.327 0.986
> 33554432 16.113 15.962 16.340 63.456 0.610 1.983
> 16777216 16.149 16.098 15.895 63.815 0.438 3.988
>
> 8) client: default, server: 64 max_sectors_kb, RA 2MB
> blocksize R R R R(avg, R(std R
> (bytes) (s) (s) (s) MB/s) ,MB/s) (IOPS)
> 67108864 16.032 17.163 15.864 62.695 2.161 0.980
> 33554432 16.163 15.499 16.466 63.870 1.626 1.996
> 16777216 16.067 16.133 16.710 62.829 1.099 3.927
>
> 9) client: 64 max_sectors_kb, default RA. server: 64 max_sectors_kb, RA 2MB
> blocksize R R R R(avg, R(std R
> (bytes) (s) (s) (s) MB/s) ,MB/s) (IOPS)
> 67108864 15.498 15.474 15.195 66.547 0.599 1.040
> 33554432 15.729 15.636 15.758 65.192 0.214 2.037
> 16777216 15.656 15.481 15.724 65.557 0.430 4.097
>
> 10) client: default max_sectors_kb, 2MB RA. server: 64 max_sectors_kb, RA 2MB
> blocksize R R R R(avg, R(std R
> (bytes) (s) (s) (s) MB/s) ,MB/s) (IOPS)
> 67108864 13.480 14.125 13.648 74.497 1.466 1.164
> 33554432 13.584 13.518 14.272 74.293 1.806 2.322
> 16777216 13.511 13.585 13.552 75.576 0.170 4.723
>
> 11) client: 64 max_sectors_kb, 2MB. RA server: 64 max_sectors_kb, RA 2MB
> blocksize R R R R(avg, R(std R
> (bytes) (s) (s) (s) MB/s) ,MB/s) (IOPS)
> 67108864 13.356 13.079 13.488 76.960 0.991 1.203
> 33554432 13.713 13.038 13.030 77.268 1.834 2.415
> 16777216 13.895 13.032 13.128 76.758 2.178 4.797
>
> With two threads:
> 5) client: default, server: default
> blocksize R R R R(avg, R(std R
> (bytes) (s) (s) (s) MB/s) ,MB/s) (IOPS)
> 67108864 12.661 12.773 13.654 78.681 2.622 1.229
> 33554432 12.709 12.693 12.459 81.145 0.738 2.536
> 16777216 12.657 14.055 13.237 77.038 3.292 4.815
>
> 6) client: default, server: 64 max_sectors_kb, RA default
> blocksize R R R R(avg, R(std R
> (bytes) (s) (s) (s) MB/s) ,MB/s) (IOPS)
> 67108864 13.300 12.877 13.705 77.078 1.964 1.204
> 33554432 13.025 14.404 12.833 76.501 3.855 2.391
> 16777216 13.172 13.220 12.997 77.995 0.570 4.875
>
> 7) client: default, server: default max_sectors_kb, RA 2MB
> blocksize R R R R(avg, R(std R
> (bytes) (s) (s) (s) MB/s) ,MB/s) (IOPS)
> 67108864 13.365 13.168 12.835 78.053 1.308 1.220
> 33554432 13.518 13.122 13.366 76.799 0.942 2.400
> 16777216 13.177 13.146 13.839 76.534 1.797 4.783
>
> 8) client: default, server: 64 max_sectors_kb, RA 2MB
> blocksize R R R R(avg, R(std R
> (bytes) (s) (s) (s) MB/s) ,MB/s) (IOPS)
> 67108864 14.308 12.669 13.520 76.045 3.788 1.188
> 33554432 12.586 12.897 13.221 79.405 1.596 2.481
> 16777216 13.766 12.583 14.176 76.001 3.903 4.750
>
> 9) client: 64 max_sectors_kb, default RA. server: 64 max_sectors_kb, RA 2MB
> blocksize R R R R(avg, R(std R
> (bytes) (s) (s) (s) MB/s) ,MB/s) (IOPS)
> 67108864 14.454 12.537 15.058 73.509 5.893 1.149
> 33554432 15.871 14.201 13.846 70.194 4.083 2.194
> 16777216 14.721 13.346 14.434 72.410 3.104 4.526
>
> 10) client: default max_sectors_kb, 2MB RA. server: 64 max_sectors_kb, RA 2MB
> blocksize R R R R(avg, R(std R
> (bytes) (s) (s) (s) MB/s) ,MB/s) (IOPS)
> 67108864 13.262 13.308 13.416 76.828 0.371 1.200
> 33554432 13.915 13.182 13.065 76.551 2.114 2.392
> 16777216 13.223 14.133 13.317 75.596 2.232 4.725
>
> 11) client: 64 max_sectors_kb, 2MB. RA server: 64 max_sectors_kb, RA 2MB
> blocksize R R R R(avg, R(std R
> (bytes) (s) (s) (s) MB/s) ,MB/s) (IOPS)
> 67108864 18.277 17.743 17.534 57.380 0.997 0.897
> 33554432 18.018 17.728 17.343 57.879 0.907 1.809
> 16777216 17.600 18.466 17.645 57.223 1.253 3.576
>
> With five threads:
> 5) client: default, server: default
> blocksize R R R R(avg, R(std R
> (bytes) (s) (s) (s) MB/s) ,MB/s) (IOPS)
> 67108864 12.915 13.643 12.572 78.598 2.654 1.228
> 33554432 12.716 12.970 13.283 78.858 1.403 2.464
> 16777216 14.372 13.282 13.122 75.461 3.002 4.716
>
> 6) client: default, server: 64 max_sectors_kb, RA default
> blocksize R R R R(avg, R(std R
> (bytes) (s) (s) (s) MB/s) ,MB/s) (IOPS)
> 67108864 13.372 13.205 12.468 78.750 2.421 1.230
> 33554432 13.489 13.352 12.883 77.363 1.533 2.418
> 16777216 13.127 12.653 14.252 76.928 3.785 4.808
>
> 7) client: default, server: default max_sectors_kb, RA 2MB
> blocksize R R R R(avg, R(std R
> (bytes) (s) (s) (s) MB/s) ,MB/s) (IOPS)
> 67108864 13.135 13.031 13.824 76.872 1.994 1.201
> 33554432 13.079 13.590 13.730 76.076 1.600 2.377
> 16777216 12.707 12.951 13.805 77.942 2.735 4.871
>
> 8) client: default, server: 64 max_sectors_kb, RA 2MB
> blocksize R R R R(avg, R(std R
> (bytes) (s) (s) (s) MB/s) ,MB/s) (IOPS)
> 67108864 13.030 12.947 13.538 77.772 1.524 1.215
> 33554432 12.826 12.973 13.805 77.649 2.482 2.427
> 16777216 12.751 13.007 12.986 79.295 0.718 4.956
>
> 9) client: 64 max_sectors_kb, default RA. server: 64 max_sectors_kb, RA 2MB
> blocksize R R R R(avg, R(std R
> (bytes) (s) (s) (s) MB/s) ,MB/s) (IOPS)
> 67108864 13.236 13.349 13.833 76.034 1.445 1.188
> 33554432 13.481 14.259 13.582 74.389 1.836 2.325
> 16777216 14.394 13.922 13.943 72.712 1.111 4.545
>
> 10) client: default max_sectors_kb, 2MB RA. server: 64 max_sectors_kb, RA 2MB
> blocksize R R R R(avg, R(std R
> (bytes) (s) (s) (s) MB/s) ,MB/s) (IOPS)
> 67108864 18.245 18.690 17.342 56.654 1.779 0.885
> 33554432 17.744 18.122 17.577 57.492 0.731 1.797
> 16777216 18.280 18.564 17.846 56.186 0.914 3.512
>
> 11) client: 64 max_sectors_kb, 2MB. RA server: 64 max_sectors_kb, RA 2MB
> blocksize R R R R(avg, R(std R
> (bytes) (s) (s) (s) MB/s) ,MB/s) (IOPS)
> 67108864 15.241 16.894 15.853 64.131 2.705 1.002
> 33554432 14.858 16.904 15.588 65.064 3.435 2.033
> 16777216 16.777 15.939 15.034 64.465 2.893 4.029

Hmm, it's really weird that the case of 2 threads is faster. There must
be some command reordering somewhere in SCST which I'm missing, like a
list_add() instead of a list_add_tail().

Can you apply the attached patch and repeat tests 5, 8 and 11 with 1 and
2 threads, please? The patch enables forced command order protection,
i.e. with it all the commands will be executed in exactly the same order
as they were received.

Thanks,
Vlad


Attachments:
forced_order.diff (747.00 B)

2009-07-28 09:51:05

by Ronald Moesbergen

[permalink] [raw]
Subject: Re: [RESEND] [PATCH] readahead:add blk_run_backing_dev

2009/7/27 Vladislav Bolkhovitin <[email protected]>:
>
> Hmm, it's really weird, why the case of 2 threads is faster. There must be
> some commands reordering somewhere in SCST, which I'm missing, like
> list_add() instead of list_add_tail().
>
> Can you apply the attached patch and repeat tests 5, 8 and 11 with 1 and 2
> threads, please. The patch will enable forced commands order protection,
> i.e. with it all the commands will be executed in exactly the same order as
> they were received.

The patched source doesn't compile. I changed the code to this:

@ line 3184:

case SCST_CMD_QUEUE_UNTAGGED:
#if 1 /* left for future performance investigations */
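/* force untagged commands down the 'ordered' path, so they are executed in arrival order */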
goto ordered;
#endif

The results:

Overall performance seems lower.

client kernel: 2.6.26-15lenny3 (debian)
server kernel: 2.6.29.5 with readahead-context, blk_run_backing_dev
and io_context, forced_order

With one IO thread:
5) client: default, server: default (cfq)
blocksize R R R R(avg, R(std R
(bytes) (s) (s) (s) MB/s) ,MB/s) (IOPS)
67108864 16.484 16.417 16.068 62.741 0.706 0.980
33554432 15.684 16.348 16.011 63.961 1.083 1.999
16777216 16.044 16.239 15.938 63.710 0.493 3.982

8) client: default, server: 64 max_sectors_kb, RA 2MB (cfq)
blocksize R R R R(avg, R(std R
(bytes) (s) (s) (s) MB/s) ,MB/s) (IOPS)
67108864 16.127 15.784 16.210 63.847 0.740 0.998
33554432 16.103 16.072 16.106 63.627 0.061 1.988
16777216 16.637 16.058 16.154 62.902 0.970 3.931

11) client: 64 max_sectors_kb, 2MB. RA server: 64 max_sectors_kb, RA 2MB (cfq)
blocksize R R R R(avg, R(std R
(bytes) (s) (s) (s) MB/s) ,MB/s) (IOPS)
67108864 13.417 15.219 13.912 72.405 3.785 1.131
33554432 13.868 13.789 14.110 73.558 0.718 2.299
16777216 13.691 13.784 10.280 82.898 11.822 5.181

11) client: 64 max_sectors_kb, 2MB. RA server: 64 max_sectors_kb, RA
2MB (deadline)
blocksize R R R R(avg, R(std R
(bytes) (s) (s) (s) MB/s) ,MB/s) (IOPS)
67108864 13.604 13.532 13.978 74.733 1.055 1.168
33554432 13.523 13.166 13.504 76.443 0.945 2.389
16777216 13.434 13.409 13.632 75.902 0.557 4.744

With two threads:
5) client: default, server: default (cfq)
blocksize R R R R(avg, R(std R
(bytes) (s) (s) (s) MB/s) ,MB/s) (IOPS)
67108864 16.206 16.001 15.908 63.851 0.493 0.998
33554432 16.927 16.033 15.991 62.799 1.631 1.962
16777216 16.566 15.968 16.212 63.035 0.950 3.940

8) client: default, server: 64 max_sectors_kb, RA 2MB (cfq)
blocksize R R R R(avg, R(std R
(bytes) (s) (s) (s) MB/s) ,MB/s) (IOPS)
67108864 16.017 15.849 15.748 64.521 0.450 1.008
33554432 16.652 15.542 16.259 63.454 1.823 1.983
16777216 16.456 16.071 15.943 63.392 0.849 3.962

11) client: 64 max_sectors_kb, 2MB. RA server: 64 max_sectors_kb, RA 2MB (cfq)
blocksize R R R R(avg, R(std R
(bytes) (s) (s) (s) MB/s) ,MB/s) (IOPS)
67108864 14.109 9.985 13.548 83.572 13.478 1.306
33554432 13.698 14.236 13.754 73.711 1.267 2.303
16777216 13.610 12.090 14.136 77.458 5.244 4.841

11) client: 64 max_sectors_kb, 2MB. RA server: 64 max_sectors_kb, RA
2MB (deadline)
blocksize R R R R(avg, R(std R
(bytes) (s) (s) (s) MB/s) ,MB/s) (IOPS)
67108864 13.542 13.975 13.978 74.049 1.110 1.157
33554432 9.921 13.272 13.321 85.746 12.349 2.680
16777216 13.850 13.600 13.344 75.324 1.144 4.708

Ronald.

2009-07-28 19:08:45

by Vladislav Bolkhovitin

[permalink] [raw]
Subject: Re: [RESEND] [PATCH] readahead:add blk_run_backing_dev


Ronald Moesbergen, on 07/28/2009 01:51 PM wrote:
> 2009/7/27 Vladislav Bolkhovitin <[email protected]>:
>> Hmm, it's really weird, why the case of 2 threads is faster. There must be
>> some commands reordering somewhere in SCST, which I'm missing, like
>> list_add() instead of list_add_tail().
>>
>> Can you apply the attached patch and repeat tests 5, 8 and 11 with 1 and 2
>> threads, please. The patch will enable forced commands order protection,
>> i.e. with it all the commands will be executed in exactly the same order as
>> they were received.
>
> The patched source doesn't compile. I changed the code to this:
>
> @ line 3184:
>
> case SCST_CMD_QUEUE_UNTAGGED:
> #if 1 /* left for future performance investigations */
> goto ordered;
> #endif
>
> The results:
>
> Overall performance seems lower.
>
> client kernel: 2.6.26-15lenny3 (debian)
> server kernel: 2.6.29.5 with readahead-context, blk_run_backing_dev
> and io_context, forced_order
>
> With one IO thread:
> 5) client: default, server: default (cfq)
> blocksize R R R R(avg, R(std R
> (bytes) (s) (s) (s) MB/s) ,MB/s) (IOPS)
> 67108864 16.484 16.417 16.068 62.741 0.706 0.980
> 33554432 15.684 16.348 16.011 63.961 1.083 1.999
> 16777216 16.044 16.239 15.938 63.710 0.493 3.982
>
> 8) client: default, server: 64 max_sectors_kb, RA 2MB (cfq)
> blocksize R R R R(avg, R(std R
> (bytes) (s) (s) (s) MB/s) ,MB/s) (IOPS)
> 67108864 16.127 15.784 16.210 63.847 0.740 0.998
> 33554432 16.103 16.072 16.106 63.627 0.061 1.988
> 16777216 16.637 16.058 16.154 62.902 0.970 3.931
>
> 11) client: 64 max_sectors_kb, 2MB. RA server: 64 max_sectors_kb, RA 2MB (cfq)
> blocksize R R R R(avg, R(std R
> (bytes) (s) (s) (s) MB/s) ,MB/s) (IOPS)
> 67108864 13.417 15.219 13.912 72.405 3.785 1.131
> 33554432 13.868 13.789 14.110 73.558 0.718 2.299
> 16777216 13.691 13.784 10.280 82.898 11.822 5.181
>
> 11) client: 64 max_sectors_kb, 2MB. RA server: 64 max_sectors_kb, RA
> 2MB (deadline)
> blocksize R R R R(avg, R(std R
> (bytes) (s) (s) (s) MB/s) ,MB/s) (IOPS)
> 67108864 13.604 13.532 13.978 74.733 1.055 1.168
> 33554432 13.523 13.166 13.504 76.443 0.945 2.389
> 16777216 13.434 13.409 13.632 75.902 0.557 4.744
>
> With two threads:
> 5) client: default, server: default (cfq)
> blocksize R R R R(avg, R(std R
> (bytes) (s) (s) (s) MB/s) ,MB/s) (IOPS)
> 67108864 16.206 16.001 15.908 63.851 0.493 0.998
> 33554432 16.927 16.033 15.991 62.799 1.631 1.962
> 16777216 16.566 15.968 16.212 63.035 0.950 3.940
>
> 8) client: default, server: 64 max_sectors_kb, RA 2MB (cfq)
> blocksize R R R R(avg, R(std R
> (bytes) (s) (s) (s) MB/s) ,MB/s) (IOPS)
> 67108864 16.017 15.849 15.748 64.521 0.450 1.008
> 33554432 16.652 15.542 16.259 63.454 1.823 1.983
> 16777216 16.456 16.071 15.943 63.392 0.849 3.962
>
> 11) client: 64 max_sectors_kb, 2MB. RA server: 64 max_sectors_kb, RA 2MB (cfq)
> blocksize R R R R(avg, R(std R
> (bytes) (s) (s) (s) MB/s) ,MB/s) (IOPS)
> 67108864 14.109 9.985 13.548 83.572 13.478 1.306
> 33554432 13.698 14.236 13.754 73.711 1.267 2.303
> 16777216 13.610 12.090 14.136 77.458 5.244 4.841
>
> 11) client: 64 max_sectors_kb, 2MB. RA server: 64 max_sectors_kb, RA
> 2MB (deadline)
> blocksize R R R R(avg, R(std R
> (bytes) (s) (s) (s) MB/s) ,MB/s) (IOPS)
> 67108864 13.542 13.975 13.978 74.049 1.110 1.157
> 33554432 9.921 13.272 13.321 85.746 12.349 2.680
> 16777216 13.850 13.600 13.344 75.324 1.144 4.708

Can you perform tests 5 and 8 with deadline? I asked for deadline...

What I/O scheduler do you use on the initiator? Can you check if
changing it to deadline or noop makes any difference?
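
Checking and switching it on the initiator works the same way as on the
server (sdc is only an example device):

  cat /sys/block/sdc/queue/scheduler    # the active one is shown in brackets
  echo noop > /sys/block/sdc/queue/scheduler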

Thanks,
Vlad

2009-07-29 12:48:13

by Ronald Moesbergen

[permalink] [raw]
Subject: Re: [RESEND] [PATCH] readahead:add blk_run_backing_dev

2009/7/28 Vladislav Bolkhovitin <[email protected]>:
>
> Can you perform the tests 5 and 8 the deadline? I asked for deadline..
>
> What I/O scheduler do you use on the initiator? Can you check if changing it
> to deadline or noop makes any difference?
>

client kernel: 2.6.26-15lenny3 (debian)
server kernel: 2.6.29.5 with readahead-context, blk_run_backing_dev
and io_context, forced_order

With one IO thread:
5) client: default, server: default (server deadline, client cfq)
blocksize R R R R(avg, R(std R
(bytes) (s) (s) (s) MB/s) ,MB/s) (IOPS)
67108864 15.739 15.339 16.511 64.613 1.959 1.010
33554432 15.411 12.384 15.400 71.876 7.646 2.246
16777216 16.564 15.569 16.279 63.498 1.667 3.969

5) client: default, server: default (server deadline, client deadline)
blocksize R R R R(avg, R(std R
(bytes) (s) (s) (s) MB/s) ,MB/s) (IOPS)
67108864 17.578 20.051 18.010 55.395 3.111 0.866
33554432 19.247 12.607 17.930 63.846 12.390 1.995
16777216 14.587 19.631 18.032 59.718 7.650 3.732

8) client: default, server: 64 max_sectors_kb, RA 2MB (server
deadline, client deadline)
blocksize R R R R(avg, R(std R
(bytes) (s) (s) (s) MB/s) ,MB/s) (IOPS)
67108864 17.418 19.520 22.050 52.564 5.043 0.821
33554432 21.263 17.623 17.782 54.616 4.571 1.707
16777216 17.896 18.335 19.407 55.278 1.864 3.455

8) client: default, server: 64 max_sectors_kb, RA 2MB (server
deadline, client cfq)
blocksize R R R R(avg, R(std R
(bytes) (s) (s) (s) MB/s) ,MB/s) (IOPS)
67108864 16.639 15.216 16.035 64.233 2.365 1.004
33554432 15.750 16.511 16.092 63.557 1.224 1.986
16777216 16.390 15.866 15.331 64.604 1.763 4.038

11) client: 2MB RA, 64 max_sectors_kb, server: 64 max_sectors_kb, RA
2MB (server deadline, client deadline)
blocksize R R R R(avg, R(std R
(bytes) (s) (s) (s) MB/s) ,MB/s) (IOPS)
67108864 14.117 13.610 13.558 74.435 1.347 1.163
33554432 13.450 10.344 13.556 83.555 10.918 2.611
16777216 13.408 13.319 13.239 76.867 0.398 4.804

With two threads:
5) client: default, server: default (server deadline, client cfq)
blocksize R R R R(avg, R(std R
(bytes) (s) (s) (s) MB/s) ,MB/s) (IOPS)
67108864 15.723 16.535 16.189 63.438 1.312 0.991
33554432 16.152 16.363 15.782 63.621 0.954 1.988
16777216 15.174 16.084 16.682 64.178 2.516 4.011

5) client: default, server: default (server deadline, client deadline)
blocksize R R R R(avg, R(std R
(bytes) (s) (s) (s) MB/s) ,MB/s) (IOPS)
67108864 18.087 18.082 17.639 57.099 0.674 0.892
33554432 18.377 15.750 17.551 59.694 3.912 1.865
16777216 18.490 15.553 18.778 58.585 5.143 3.662

8) client: default, server: 64 max_sectors_kb, RA 2MB (server
deadline, client deadline)
blocksize R R R R(avg, R(std R
(bytes) (s) (s) (s) MB/s) ,MB/s) (IOPS)
67108864 18.140 19.114 17.442 56.244 2.103 0.879
33554432 17.183 17.233 21.367 55.646 5.461 1.739
16777216 19.813 17.965 18.132 55.053 2.393 3.441

8) client: default, server: 64 max_sectors_kb, RA 2MB (server
deadline, client cfq)
blocksize R R R R(avg, R(std R
(bytes) (s) (s) (s) MB/s) ,MB/s) (IOPS)
67108864 15.753 16.085 16.522 63.548 1.239 0.993
33554432 13.502 15.912 15.507 68.743 5.065 2.148
16777216 16.584 16.171 15.959 63.077 1.003 3.942

11) client: 2MB RA, 64 max_sectors_kb, server: 64 max_sectors_kb, RA
2MB (server deadline, client deadline)
blocksize R R R R(avg, R(std R
(bytes) (s) (s) (s) MB/s) ,MB/s) (IOPS)
67108864 14.051 13.427 13.498 75.001 1.510 1.172
33554432 13.397 14.008 13.453 75.217 1.503 2.351
16777216 13.277 9.942 14.318 83.882 13.712 5.243


Ronald.

2009-07-31 18:32:19

by Vladislav Bolkhovitin

[permalink] [raw]
Subject: Re: [RESEND] [PATCH] readahead:add blk_run_backing_dev


Ronald Moesbergen, on 07/29/2009 04:48 PM wrote:
> 2009/7/28 Vladislav Bolkhovitin <[email protected]>:
>> Can you perform the tests 5 and 8 the deadline? I asked for deadline..
>>
>> What I/O scheduler do you use on the initiator? Can you check if changing it
>> to deadline or noop makes any difference?
>>
>
> client kernel: 2.6.26-15lenny3 (debian)
> server kernel: 2.6.29.5 with readahead-context, blk_run_backing_dev
> and io_context, forced_order
>
> With one IO thread:
> 5) client: default, server: default (server deadline, client cfq)
> blocksize R R R R(avg, R(std R
> (bytes) (s) (s) (s) MB/s) ,MB/s) (IOPS)
> 67108864 15.739 15.339 16.511 64.613 1.959 1.010
> 33554432 15.411 12.384 15.400 71.876 7.646 2.246
> 16777216 16.564 15.569 16.279 63.498 1.667 3.969
>
> 5) client: default, server: default (server deadline, client deadline)
> blocksize R R R R(avg, R(std R
> (bytes) (s) (s) (s) MB/s) ,MB/s) (IOPS)
> 67108864 17.578 20.051 18.010 55.395 3.111 0.866
> 33554432 19.247 12.607 17.930 63.846 12.390 1.995
> 16777216 14.587 19.631 18.032 59.718 7.650 3.732
>
> 8) client: default, server: 64 max_sectors_kb, RA 2MB (server
> deadline, client deadline)
> blocksize R R R R(avg, R(std R
> (bytes) (s) (s) (s) MB/s) ,MB/s) (IOPS)
> 67108864 17.418 19.520 22.050 52.564 5.043 0.821
> 33554432 21.263 17.623 17.782 54.616 4.571 1.707
> 16777216 17.896 18.335 19.407 55.278 1.864 3.455
>
> 8) client: default, server: 64 max_sectors_kb, RA 2MB (server
> deadline, client cfq)
> blocksize R R R R(avg, R(std R
> (bytes) (s) (s) (s) MB/s) ,MB/s) (IOPS)
> 67108864 16.639 15.216 16.035 64.233 2.365 1.004
> 33554432 15.750 16.511 16.092 63.557 1.224 1.986
> 16777216 16.390 15.866 15.331 64.604 1.763 4.038
>
> 11) client: 2MB RA, 64 max_sectors_kb, server: 64 max_sectors_kb, RA
> 2MB (server deadline, client deadline)
> blocksize R R R R(avg, R(std R
> (bytes) (s) (s) (s) MB/s) ,MB/s) (IOPS)
> 67108864 14.117 13.610 13.558 74.435 1.347 1.163
> 33554432 13.450 10.344 13.556 83.555 10.918 2.611
> 16777216 13.408 13.319 13.239 76.867 0.398 4.804
>
> With two threads:
> 5) client: default, server: default (server deadline, client cfq)
> blocksize R R R R(avg, R(std R
> (bytes) (s) (s) (s) MB/s) ,MB/s) (IOPS)
> 67108864 15.723 16.535 16.189 63.438 1.312 0.991
> 33554432 16.152 16.363 15.782 63.621 0.954 1.988
> 16777216 15.174 16.084 16.682 64.178 2.516 4.011
>
> 5) client: default, server: default (server deadline, client deadline)
> blocksize R R R R(avg, R(std R
> (bytes) (s) (s) (s) MB/s) ,MB/s) (IOPS)
> 67108864 18.087 18.082 17.639 57.099 0.674 0.892
> 33554432 18.377 15.750 17.551 59.694 3.912 1.865
> 16777216 18.490 15.553 18.778 58.585 5.143 3.662
>
> 8) client: default, server: 64 max_sectors_kb, RA 2MB (server
> deadline, client deadline)
> blocksize R R R R(avg, R(std R
> (bytes) (s) (s) (s) MB/s) ,MB/s) (IOPS)
> 67108864 18.140 19.114 17.442 56.244 2.103 0.879
> 33554432 17.183 17.233 21.367 55.646 5.461 1.739
> 16777216 19.813 17.965 18.132 55.053 2.393 3.441
>
> 8) client: default, server: 64 max_sectors_kb, RA 2MB (server
> deadline, client cfq)
> blocksize R R R R(avg, R(std R
> (bytes) (s) (s) (s) MB/s) ,MB/s) (IOPS)
> 67108864 15.753 16.085 16.522 63.548 1.239 0.993
> 33554432 13.502 15.912 15.507 68.743 5.065 2.148
> 16777216 16.584 16.171 15.959 63.077 1.003 3.942
>
> 11) client: 2MB RA, 64 max_sectors_kb, server: 64 max_sectors_kb, RA
> 2MB (server deadline, client deadline)
> blocksize R R R R(avg, R(std R
> (bytes) (s) (s) (s) MB/s) ,MB/s) (IOPS)
> 67108864 14.051 13.427 13.498 75.001 1.510 1.172
> 33554432 13.397 14.008 13.453 75.217 1.503 2.351
> 16777216 13.277 9.942 14.318 83.882 13.712 5.243

OK, as I expected, on the SCST level everything is clear and the forced
ordering change didn't change anything.

But still, a single read stream should be fastest with a single thread.
Otherwise, there's something wrong somewhere in the I/O path: the block
layer, RA, or the I/O scheduler. And, apparently, that is what we have,
so we should find out the cause.

Can you check if noop on the target and/or initiator makes any
difference? Case 5 with 1 and 2 threads will be sufficient.

Thanks,
Vlad