2019-03-07 05:21:16

by Shile Zhang

[permalink] [raw]
Subject: [PATCH] bcache: add cond_resched() in __bch_cache_cmp()

From: Shile Zhang <[email protected]>

Read /sys/fs/bcache/<uuid>/cacheN/priority_stats can take very long
time with huge cache after long run.

Signed-off-by: Shile Zhang <[email protected]>
---
drivers/md/bcache/sysfs.c | 1 +
1 file changed, 1 insertion(+)

diff --git a/drivers/md/bcache/sysfs.c b/drivers/md/bcache/sysfs.c
index 557a8a3..028fea1 100644
--- a/drivers/md/bcache/sysfs.c
+++ b/drivers/md/bcache/sysfs.c
@@ -897,6 +897,7 @@ static void bch_cache_set_internal_release(struct kobject *k)

static int __bch_cache_cmp(const void *l, const void *r)
{
+ cond_resched();
return *((uint16_t *)r) - *((uint16_t *)l);
}

--
1.8.3.1



2019-03-07 10:35:54

by Coly Li

[permalink] [raw]
Subject: Re: [PATCH] bcache: add cond_resched() in __bch_cache_cmp()

On 2019/3/7 1:15 下午, [email protected] wrote:
> From: Shile Zhang <[email protected]>
>
> Read /sys/fs/bcache/<uuid>/cacheN/priority_stats can take very long
> time with huge cache after long run.
>
> Signed-off-by: Shile Zhang <[email protected]>

Hi Shile,

Do you test your change ? It will be helpful with more performance data
(what problem that you improved).

Thanks.

Coly Li

> ---
> drivers/md/bcache/sysfs.c | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/drivers/md/bcache/sysfs.c b/drivers/md/bcache/sysfs.c
> index 557a8a3..028fea1 100644
> --- a/drivers/md/bcache/sysfs.c
> +++ b/drivers/md/bcache/sysfs.c
> @@ -897,6 +897,7 @@ static void bch_cache_set_internal_release(struct kobject *k)
>
> static int __bch_cache_cmp(const void *l, const void *r)
> {
> + cond_resched();
> return *((uint16_t *)r) - *((uint16_t *)l);
> }
>
>


--

Coly Li

2019-03-07 15:09:46

by Shile Zhang

[permalink] [raw]
Subject: Re: [PATCH] bcache: add cond_resched() in __bch_cache_cmp()


On 2019/3/7 18:34, Coly Li wrote:
> On 2019/3/7 1:15 下午, [email protected] wrote:
>> From: Shile Zhang <[email protected]>
>>
>> Read /sys/fs/bcache/<uuid>/cacheN/priority_stats can take very long
>> time with huge cache after long run.
>>
>> Signed-off-by: Shile Zhang <[email protected]>
> Hi Shile,
>
> Do you test your change ? It will be helpful with more performance data
> (what problem that you improved).

In case of 960GB SSD cache device, once read of the 'priority_stats'
costs about 600ms in our test environment.

The perf tool shown that near 50% CPU time consumed by 'sort()', this
means once sort will hold the CPU near 300ms.

In our case, the statistics collector reads the 'priority_stats'
periodically, it will trigger the schedule latency jitters of the

task which shared same CPU core.

>
> Thanks.
>
> Coly Li
>
>> ---
>> drivers/md/bcache/sysfs.c | 1 +
>> 1 file changed, 1 insertion(+)
>>
>> diff --git a/drivers/md/bcache/sysfs.c b/drivers/md/bcache/sysfs.c
>> index 557a8a3..028fea1 100644
>> --- a/drivers/md/bcache/sysfs.c
>> +++ b/drivers/md/bcache/sysfs.c
>> @@ -897,6 +897,7 @@ static void bch_cache_set_internal_release(struct kobject *k)
>>
>> static int __bch_cache_cmp(const void *l, const void *r)
>> {
>> + cond_resched();
>> return *((uint16_t *)r) - *((uint16_t *)l);
>> }
>>
>>
>

2019-03-07 15:37:32

by Coly Li

[permalink] [raw]
Subject: Re: [PATCH] bcache: add cond_resched() in __bch_cache_cmp()

On 2019/3/7 11:06 下午, Shile Zhang wrote:
>
> On 2019/3/7 18:34, Coly Li wrote:
>> On 2019/3/7 1:15 下午, [email protected] wrote:
>>> From: Shile Zhang <[email protected]>
>>>
>>> Read /sys/fs/bcache/<uuid>/cacheN/priority_stats can take very long
>>> time with huge cache after long run.
>>>
>>> Signed-off-by: Shile Zhang <[email protected]>
>> Hi Shile,
>>
>> Do you test your change ? It will be helpful with more performance data
>> (what problem that you improved).
>
> In case of 960GB SSD cache device, once read of the 'priority_stats'
> costs about 600ms in our test environment.
>

After the fix, how much time it takes ?


> The perf tool shown that near 50% CPU time consumed by 'sort()', this
> means once sort will hold the CPU near 300ms.
>
> In our case, the statistics collector reads the 'priority_stats'
> periodically, it will trigger the schedule latency jitters of the
>
> task which shared same CPU core.
>

Hmm, it seems you just make the sort slower, and nothing more changes.
Am I right ?


Coly Li


>>
>> Thanks.
>>
>> Coly Li
>>
>>> ---
>>>   drivers/md/bcache/sysfs.c | 1 +
>>>   1 file changed, 1 insertion(+)
>>>
>>> diff --git a/drivers/md/bcache/sysfs.c b/drivers/md/bcache/sysfs.c
>>> index 557a8a3..028fea1 100644
>>> --- a/drivers/md/bcache/sysfs.c
>>> +++ b/drivers/md/bcache/sysfs.c
>>> @@ -897,6 +897,7 @@ static void bch_cache_set_internal_release(struct
>>> kobject *k)
>>>     static int __bch_cache_cmp(const void *l, const void *r)
>>>   {
>>> +    cond_resched();
>>>       return *((uint16_t *)r) - *((uint16_t *)l);
>>>   }
>>>  
>>


--

Coly Li

2019-03-07 15:46:51

by Vojtech Pavlik

[permalink] [raw]
Subject: Re: [PATCH] bcache: add cond_resched() in __bch_cache_cmp()

On Thu, Mar 07, 2019 at 11:36:18PM +0800, Coly Li wrote:
> On 2019/3/7 11:06 下午, Shile Zhang wrote:
> >
> > On 2019/3/7 18:34, Coly Li wrote:
> >> On 2019/3/7 1:15 下午, [email protected] wrote:
> >>> From: Shile Zhang <[email protected]>
> >>>
> >>> Read /sys/fs/bcache/<uuid>/cacheN/priority_stats can take very long
> >>> time with huge cache after long run.
> >>>
> >>> Signed-off-by: Shile Zhang <[email protected]>
> >> Hi Shile,
> >>
> >> Do you test your change ? It will be helpful with more performance data
> >> (what problem that you improved).
> >
> > In case of 960GB SSD cache device, once read of the 'priority_stats'
> > costs about 600ms in our test environment.
> >
>
> After the fix, how much time it takes ?
>
>
> > The perf tool shown that near 50% CPU time consumed by 'sort()', this
> > means once sort will hold the CPU near 300ms.
> >
> > In our case, the statistics collector reads the 'priority_stats'
> > periodically, it will trigger the schedule latency jitters of the
> >
> > task which shared same CPU core.
> >
>
> Hmm, it seems you just make the sort slower, and nothing more changes.
> Am I right ?

Well, it has to make the sort slower, but it'll also avoid hogging the
CPU (on a non-preemptible kernel), avoiding a potential soft lockup
warning and allowing other tasks to run.

--
Vojtech Pavlik
VP SUSE Labs

2019-03-08 00:36:12

by Shile Zhang

[permalink] [raw]
Subject: Re: [PATCH] bcache: add cond_resched() in __bch_cache_cmp()


On 2019/3/7 23:44, Vojtech Pavlik wrote:
> On Thu, Mar 07, 2019 at 11:36:18PM +0800, Coly Li wrote:
>> On 2019/3/7 11:06 下午, Shile Zhang wrote:
>>> On 2019/3/7 18:34, Coly Li wrote:
>>>> On 2019/3/7 1:15 下午, [email protected] wrote:
>>>>> From: Shile Zhang <[email protected]>
>>>>>
>>>>> Read /sys/fs/bcache/<uuid>/cacheN/priority_stats can take very long
>>>>> time with huge cache after long run.
>>>>>
>>>>> Signed-off-by: Shile Zhang <[email protected]>
>>>> Hi Shile,
>>>>
>>>> Do you test your change ? It will be helpful with more performance data
>>>> (what problem that you improved).
>>> In case of 960GB SSD cache device, once read of the 'priority_stats'
>>> costs about 600ms in our test environment.
>>>
>> After the fix, how much time it takes ?
>>
>>
>>> The perf tool shown that near 50% CPU time consumed by 'sort()', this
>>> means once sort will hold the CPU near 300ms.
>>>
>>> In our case, the statistics collector reads the 'priority_stats'
>>> periodically, it will trigger the schedule latency jitters of the
>>>
>>> task which shared same CPU core.
>>>
>> Hmm, it seems you just make the sort slower, and nothing more changes.
>> Am I right ?
> Well, it has to make the sort slower, but it'll also avoid hogging the
> CPU (on a non-preemptible kernel), avoiding a potential soft lockup
> warning and allowing other tasks to run.
>
Yes, there is a risk that other tasks have no chance to run due to sort
hogging the CPU, it is harmful to some schedule-latency sensitive tasks.
This change just try to reduce the impact of sort, but not a performance
improvement of it. I'm not sure if a better way can handle it more
efficiency.

Thanks,

Shile



2019-03-08 02:20:01

by Coly Li

[permalink] [raw]
Subject: Re: [PATCH] bcache: add cond_resched() in __bch_cache_cmp()

On 2019/3/8 8:35 上午, Shile Zhang wrote:
>
> On 2019/3/7 23:44, Vojtech Pavlik wrote:
>> On Thu, Mar 07, 2019 at 11:36:18PM +0800, Coly Li wrote:
>>> On 2019/3/7 11:06 下午, Shile Zhang wrote:
>>>> On 2019/3/7 18:34, Coly Li wrote:
>>>>> On 2019/3/7 1:15 下午, [email protected] wrote:
>>>>>> From: Shile Zhang <[email protected]>
>>>>>>
>>>>>> Read /sys/fs/bcache/<uuid>/cacheN/priority_stats can take very long
>>>>>> time with huge cache after long run.
>>>>>>
>>>>>> Signed-off-by: Shile Zhang <[email protected]>
>>>>> Hi Shile,
>>>>>
>>>>> Do you test your change ? It will be helpful with more performance
>>>>> data
>>>>> (what problem that you improved).
>>>> In case of 960GB SSD cache device, once read of the 'priority_stats'
>>>> costs about 600ms in our test environment.
>>>>
>>> After the fix, how much time it takes ?
>>>
>>>
>>>> The perf tool shown that near 50% CPU time consumed by 'sort()', this
>>>> means once sort will hold the CPU near 300ms.
>>>>
>>>> In our case, the statistics collector reads the 'priority_stats'
>>>> periodically, it will trigger the schedule latency jitters of the
>>>>
>>>> task which shared same CPU core.
>>>>
>>> Hmm, it seems you just make the sort slower, and nothing more changes.
>>> Am I right ?
>> Well, it has to make the sort slower, but it'll also avoid hogging the
>> CPU (on a non-preemptible kernel), avoiding a potential soft lockup
>> warning and allowing other tasks to run.
>>
> Yes, there is a risk that other tasks have no chance to run due to sort
> hogging the CPU, it is harmful to some schedule-latency sensitive tasks.
> This change just try to reduce the impact of sort, but not a performance
> improvement of it. I'm not sure if a better way can handle it more
> efficiency.
I know the above concept, since I would expect when people talking about
performance improvement, it would be better to provide real performance
number. Under some conditions it might be hard, but in this exact
example, it won't.

Could you please provide some data that on your configuration, how slow
'sort' becomes ?

AND, reading priority_stats in period for performance monitoring is not
good idea. The problem is not from 'sort', it is from
mutex_lock(&ca->set->bucket_lock) lines above the sort in sysfs.c, when
you have a quite large or very busy cache device, you may see normal I/O
performance drops due to too much time holding bucket_lock here.

Anyway, this is just my suggestion. Back to this patch, please provide
performance number.

Thanks.

Coly Li

2019-08-14 14:24:02

by Heitor Alves de Siqueira

[permalink] [raw]
Subject: Re: [PATCH] bcache: add cond_resched() in __bch_cache_cmp()

Hi Coly,

We've had users impacted by system stalls and were able to trace it back to the
bcache priority_stats query. After investigating a bit further, it seems that
the sorting step in the quantiles calculation can cause heavy CPU
contention. This has a severe performance impact on any task that is running in
the same CPU as the sysfs query, and caused issues even for non-bcache
workloads.

We did some test runs with fio to get a better picture of the impact on
read/write workloads while a priority_stats query is running, and came up with
some interesting results. The bucket locking doesn't seem to have that
much performance impact even in full-write workloads, but the 'sort' can affect
bcache device throughput and latency pretty heavily (and any other tasks that
are "unlucky" to be scheduled together with it). In some of our tests, there was
a performance reduction of almost 90% in random read IOPS to the bcache device
(refer to the comparison graph at [0]). There's a few more details on the
Launchpad bug [1] we've created to track this, together with the complete fio
results + comparison graphs.

The cond_resched() patch suggested by Shile Zhang actually improved performance
a lot, and eliminated the stalls we've observed during the priority_stats
query. Even though it may cause the sysfs query to take a bit longer, it seems
like a decent tradeoff for general performance when running that query on a
system under heavy load. It's also a cheap short-term solution until we can look
into a more complex re-write of the priority_stats calculation (perhaps one that
doesn't require the locking?). Could we revisit Shile's patch, and consider
merging it?

Thanks!
Heitor

[0] https://people.canonical.com/~halves/priority_stats/read/4k-iops-2Dsmooth.png
[1] https://bugs.launchpad.net/bugs/1840043

2019-08-14 16:52:38

by Coly Li

[permalink] [raw]
Subject: Re: [PATCH] bcache: add cond_resched() in __bch_cache_cmp()

On 2019/8/14 10:23 下午, Heitor Alves de Siqueira wrote:
> Hi Coly,
>
> We've had users impacted by system stalls and were able to trace it back to the
> bcache priority_stats query. After investigating a bit further, it seems that
> the sorting step in the quantiles calculation can cause heavy CPU
> contention. This has a severe performance impact on any task that is running in
> the same CPU as the sysfs query, and caused issues even for non-bcache
> workloads.
>
> We did some test runs with fio to get a better picture of the impact on
> read/write workloads while a priority_stats query is running, and came up with
> some interesting results. The bucket locking doesn't seem to have that
> much performance impact even in full-write workloads, but the 'sort' can affect
> bcache device throughput and latency pretty heavily (and any other tasks that
> are "unlucky" to be scheduled together with it). In some of our tests, there was
> a performance reduction of almost 90% in random read IOPS to the bcache device
> (refer to the comparison graph at [0]). There's a few more details on the
> Launchpad bug [1] we've created to track this, together with the complete fio
> results + comparison graphs.
>
> The cond_resched() patch suggested by Shile Zhang actually improved performance
> a lot, and eliminated the stalls we've observed during the priority_stats
> query. Even though it may cause the sysfs query to take a bit longer, it seems
> like a decent tradeoff for general performance when running that query on a
> system under heavy load. It's also a cheap short-term solution until we can look
> into a more complex re-write of the priority_stats calculation (perhaps one that
> doesn't require the locking?). Could we revisit Shile's patch, and consider
> merging it?
>
> Thanks!
> Heitor
>
> [0] https://people.canonical.com/~halves/priority_stats/read/4k-iops-2Dsmooth.png
> [1] https://bugs.launchpad.net/bugs/1840043
>

Hi Heitor,

With your very detailed explanation I come to understand why people
cares about performance impact of pririty_stats. In the case of system
monitoring, how long priority_stats returns is less important than
overall system throughput.

Yes I agree with your opinion and the benchmark chart makes me confident
with the original patch. I will add this patch to v5.4 window with
tested-by: Heitor Alves de Siqueira <[email protected]>

Thanks for your detailed information. And thanks for Shile Zhang
originally composing this patch.

--

Coly Li