On Mon, 12 Aug 2013 10:18:03 +0800 Shaohua Li <[email protected]> wrote:
> Neil,
>
> This is another attempt to make raid5 stripe handling multi-threaded.
> The recent workqueue improvements for unbound workqueues look very promising
> for raid5 usage; details are in the first patch.
>
> The patches are against your tree with 'raid5: make release_stripe lockless'
> and 'raid5: fix stripe release order' applied, but without 'raid5: create
> multiple threads to handle stripes'.
>
> My test setup has 7 PCIe SSDs, chunk size 8k, stripe_cache_size 2048. When
> multi-threading is enabled, group_thread_cnt is set to 8.
>
> randwrite   throughput ratio         request size (sectors)
>             (unpatched/patched)      (unpatched/patched)
> 4k          1/5.9                    8/8
> 8k          1/5.5                    16/13
> 16k         1/4.8                    16/13
> 32k         1/4.6                    18/14
> 64k         1/4.7                    17/13
> 128k        1/4.2                    23/16
> 256k        1/3.5                    41/21
> 512k        1/3                      75/28
> 1M          1/2.6                    134/34
>
> For example, in the 1M randwrite test the patched kernel is 2.6x faster, but
> the average request sent to each disk drops from 134 sectors to 34 sectors.
>
> Currently the biggest problem is that the request size drops, because
> multiple threads are dispatching requests. This suggests multi-threading may
> not be suitable for every setup, so I have disabled it by default in this
> version. But since throughput is largely improved, I don't think this is a
> blocking issue. I'm still working on improving it, perhaps by scheduling the
> stripes from one block plug as a whole.
>
> Thanks,
> Shaohua
Thanks. I like this a lot more than the previous version.
It doesn't seem to apply cleanly to my current 'for-next' - probably because
I have moved things around and have a different set of patches applied :-(
If you could rebase it on my current for-next, I'll apply it and probably
submit it for the next merge window.
A couple of little changes I'd like made:
1/ alloc_thread_groups needs to use GFP_NOIO, at least when called from
   raid5_store_group_thread_cnt. At that point IO to the RAID5 is stalled,
   so if the allocation needs to reclaim memory it might wait for writeout
   to the RAID5 and so deadlock. GFP_NOIO prevents that.
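
   Roughly what I have in mind - only a sketch, and the struct and field
   names here are my guesses at what your patch uses, so adjust to taste:

        static int alloc_thread_groups(struct r5conf *conf, int cnt)
        {
                /* This can run while IO to the array is stalled, so the
                 * allocations must not wait for writeout to the array:
                 * hence GFP_NOIO rather than GFP_KERNEL. */
                struct r5worker_group *groups;
                struct r5worker *workers;
                int group_cnt = num_possible_nodes();

                if (cnt == 0) {
                        conf->worker_cnt_per_group = 0;
                        conf->worker_groups = NULL;
                        return 0;
                }
                workers = kcalloc(group_cnt * cnt, sizeof(*workers), GFP_NOIO);
                groups = kcalloc(group_cnt, sizeof(*groups), GFP_NOIO);
                if (!workers || !groups) {
                        kfree(workers);
                        kfree(groups);
                        return -ENOMEM;
                }
                /* ... per-group and per-worker initialisation as in your
                 * patch ... */
                conf->group_cnt = group_cnt;
                conf->worker_cnt_per_group = cnt;
                conf->worker_groups = groups;
                return 0;
        }
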
2/ could we move the

+               if (!cpu_online(cpu)) {
+                       cpu = cpumask_any(cpu_online_mask);
+                       sh->cpu = cpu;
+               }

   inside raid5_wakeup_stripe_thread()?
   It isn't a perfect fit, but I think it is a better place for it.
   It could check list_empty(&sh->lru) and, if the list is empty, add the
   stripe to the appropriate group->handle_list (rough sketch below).
   The code in do_release_stripe would become

                else {
                        clear_bit(STRIPE_DELAYED, &sh->state);
                        clear_bit(STRIPE_BIT_DELAY, &sh->state);
+                       if (conf->worker_cnt_per_group == 0) {
+                               list_add_tail(&sh->lru, &conf->handle_list);
+                       } else {
+                               raid5_wakeup_stripe_thread(sh);
+                               return;
+                       }
                }
                md_wakeup_thread(conf->mddev->thread);
??
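
Put together, raid5_wakeup_stripe_thread() might then look roughly like the
sketch below. I'm guessing at the workqueue and group names from your patches
(cpu_to_group(), raid5_wq, the workers array), and assuming it is called under
conf->device_lock as do_release_stripe is, so treat it as illustrative only:

        static void raid5_wakeup_stripe_thread(struct stripe_head *sh)
        {
                /* Caller is assumed to hold conf->device_lock. */
                struct r5conf *conf = sh->raid_conf;
                struct r5worker_group *group;
                int cpu = sh->cpu;

                /* Fall back to any online CPU if the recorded one went away. */
                if (!cpu_online(cpu)) {
                        cpu = cpumask_any(cpu_online_mask);
                        sh->cpu = cpu;
                }

                /* cpu_to_group(): however the patch maps a CPU to its group. */
                group = conf->worker_groups + cpu_to_group(cpu);
                if (list_empty(&sh->lru))
                        list_add_tail(&sh->lru, &group->handle_list);

                /* Kick a worker for this group on that CPU. */
                queue_work_on(cpu, raid5_wq, &group->workers[0].work);
        }
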
NeilBrown