2023-01-06 18:27:47

by Tejun Heo

[permalink] [raw]
Subject: Re: [External] Re: [PATCH v3] blk-throtl: Introduce sync and async queues for blk-throtl

Hello,

On Sat, Jan 07, 2023 at 02:07:38AM +0800, hanjinke wrote:
> In our internal scenario, iocost has been deployed as the main io isolation
> method and is gradually spreading.

Ah, glad to hear. If you don't mind sharing, how are you configuring iocost
currently? How do you derive the parameters?

> But for some specific scenarios with old kernel versions, blk-throtl
> is also needed. The scenario described in my email is in the early stage of

Yeah, I think we use blk-throttle in very limited cases currently but might
need to deploy hard limits more in the future.

> research and extensive testing for it. During this period, some priority
> inversion issues among cgroups or within one cgroup have been observed. So I
> sent this patch to try to fix or mitigate some of these issues.

blk-throttle has a lot of issues which may be difficult to address. Even the
way it's configured is pretty difficult to scale across different hardware /
application combinations and we've neglected its control performance and
behavior (like handling of shared IOs) for quite a while.

While iocost's work-conserving control does address a lot of the use cases
we see today, it's likely that we'll need hard limits more in the future
too. I've been thinking about implementing io.max on top of iocost. There
are some challenges around dynamic vrate adj semantics but it's kinda
attractive because iocost already has the concept of total device capacity.
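
For reference, hard limits today come from blk-throttle via the cgroup2
io.max file; a minimal sketch of how it is typically set (the device
number, cgroup name, and values are illustrative):

```shell
# Cap reads on device 8:0 at 2 MB/s and writes at 120 IOPS for one
# cgroup; writing "max" for a field removes that limit again.
echo "8:0 rbps=2097152 wiops=120" > /sys/fs/cgroup/mygroup/io.max
```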

Thanks.

--
tejun


2023-01-07 05:13:26

by hanjinke

[permalink] [raw]
Subject: Re: [External] Re: [PATCH v3] blk-throtl: Introduce sync and async queues for blk-throtl



On 2023/1/7 2:15 AM, Tejun Heo wrote:
> Hello,
>
> On Sat, Jan 07, 2023 at 02:07:38AM +0800, hanjinke wrote:
>> In our internal scenario, iocost has been deployed as the main io isolation
>> method and is gradually spreading.
>
> Ah, glad to hear. If you don't mind sharing, how are you configuring iocost
> currently? How do you derive the parameters?
>

For the cost.model setting, we first use the tools iocost provides to
benchmark the model parameters of the different disk types online, and
then save these benchmark parameters to a parametric model table. During
deployment, the corresponding model parameters are pulled and applied
according to the type of disk.
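
As a rough sketch, such saved parameters would be applied through the
root cgroup's io.cost.model file; the device number and values below
are illustrative stand-ins for entries from such a model table:

```shell
# Switch device 8:0 to user-supplied linear model parameters
# (rbps/wbps in bytes/s, the *iops fields in IOs/s).
echo "8:0 ctrl=user model=linear rbps=174019176 rseqiops=41708" \
     "rrandiops=370556 wbps=178075866 wseqiops=42705 wrandiops=26100" \
     > /sys/fs/cgroup/io.cost.model
```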

The cost.qos setting needs a bit more consideration; we have to
compromise between overall disk throughput and IO latency. The average
utilization of the entire disk for a given service, and the RLA (if the
service is IO sensitive) of key services, are taken as important inputs.
cost.qos is then dynamically fine-tuned according to health-status
monitoring of the key services.
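
The knob being fine-tuned here is the root cgroup's io.cost.qos file; a
sketch with illustrative values (latency targets in microseconds,
min/max bounding the vrate in percent):

```shell
# Enable iocost on 8:0: aim for 95th-percentile read latency <= 10ms
# and write latency <= 20ms, with vrate allowed to range 50%-150%.
echo "8:0 enable=1 ctrl=user rpct=95.00 rlat=10000 wpct=95.00" \
     "wlat=20000 min=50.00 max=150.00" > /sys/fs/cgroup/io.cost.qos
```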

For the cost.weight setting, high-priority services gain a greater
advantage through their weights, letting them absorb a large number of
IO requests in a short period of time. This works fine, as iocost's
work-conservation behaves well according to our observation.
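
A sketch of such a weight setup via the cgroup2 io.weight files (cgroup
names and values are illustrative; weights are relative shares in the
1-10000 range, default 100):

```shell
# Give the high-priority service 4x the share of the batch tier.
echo "default 400" > /sys/fs/cgroup/high-prio/io.weight
echo "default 100" > /sys/fs/cgroup/batch/io.weight
```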

These practices could certainly be improved, and I look forward to your
suggestions.


> blk-throttle has a lot of issues which may be difficult to address. Even the
> way it's configured is pretty difficult to scale across different hardware /
> application combinations and we've neglected its control performance and
> behavior (like handling of shared IOs) for quite a while.
>
> While iocost's work-conserving control does address a lot of the use cases
> we see today, it's likely that we'll need hard limits more in the future
> too. I've been thinking about implementing io.max on top of iocost. There
> are some challenges around dynamic vrate adj semantics but it's kinda
> attractive because iocost already has the concept of total device capacity.

Indeed, in our multi-tenant scenario, hard limits are necessary.

Thanks.

Jinke

2023-01-09 18:21:02

by Tejun Heo

[permalink] [raw]
Subject: Re: [External] Re: [PATCH v3] blk-throtl: Introduce sync and async queues for blk-throtl

Hello,

On Sat, Jan 07, 2023 at 12:44:35PM +0800, hanjinke wrote:
> For cost.model setting, We first use the tools iocost provided to test the
> benchmark model parameters of different types of disks online, and then save
> these benchmark parameters to a parametric Model Table. During the
> deployment process, pull and set the corresponding model parameters
> according to the type of disk.
>
> The setting of cost.qos should be considered slightly more; we need to make
> some compromises between overall disk throughput and io latency.
> The average disk utilization of the entire disk on a specific business and
> the RLA(if it is io sensitive) of key businesses will be taken as
> important input considerations. The cost.qos will be dynamically fine-tuned
> according to the health status monitoring of key businesses.

Ah, I see. Do you use the latency targets and min/max ranges or just fixate
the vrate by setting min == max?

> For cost.weight setting, high-priority services will gain greater
> advantages through weight settings to deal with a large number of io
> requests in a short period of time. It works fine as work-conservation
> of iocost works well according to our observation.

Glad to hear.

> These practices can be done better and I look forward to your better
> suggestions.

It's still in progress but resctl-bench's iocost-tune benchmark is what
we're starting to use:

https://github.com/facebookexperimental/resctl-demo/blob/main/resctl-bench/doc/iocost-tune.md

The benchmark takes about six hours: it probes the whole vrate range
looking for behavior inflection points in a scenario where a
latency-sensitive workload is protected against a memory leak. On
completion, it provides several solutions based on the observed behavior.
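
A sketch of an invocation (the flags and subcommands here are my best
recollection of the resctl-bench docs and may differ by version; the
result path is illustrative):

```shell
# Run the ~6h tuning benchmark, recording results to a JSON file.
resctl-bench -r ~/iocost-tune.json run iocost-tune
# Pretty-print the solutions derived from the recorded run.
resctl-bench -r ~/iocost-tune.json format iocost-tune
```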

The benchmark is destructive (to the contents of the target SSD) and can
be tricky to set up. There's an installable image to help with setting
up and running the benchmark:

https://github.com/iocost-benchmark/resctl-demo-image-recipe/actions

The eventual goal is collecting these benchmark results in the following git
repo:

https://github.com/iocost-benchmark/iocost-benchmarks

which generates hwdb files describing all the found solutions and makes
systemd apply the appropriate configuration on boot automatically.

It's still all a work in progress but hopefully we should be able to
configure iocost reasonably on boot on most SSDs.

Thanks.

--
tejun

2023-01-10 13:23:31

by hanjinke

[permalink] [raw]
Subject: Re: [External] Re: [PATCH v3] blk-throtl: Introduce sync and async queues for blk-throtl



On 2023/1/10 2:08 AM, Tejun Heo wrote:
> Hello,
>
> On Sat, Jan 07, 2023 at 12:44:35PM +0800, hanjinke wrote:
>> For cost.model setting, We first use the tools iocost provided to test the
>> benchmark model parameters of different types of disks online, and then save
>> these benchmark parameters to a parametric Model Table. During the
>> deployment process, pull and set the corresponding model parameters
>> according to the type of disk.
>>
>> The setting of cost.qos should be considered slightly more; we need to make
>> some compromises between overall disk throughput and io latency.
>> The average disk utilization of the entire disk on a specific business and
>> the RLA(if it is io sensitive) of key businesses will be taken as
>> important input considerations. The cost.qos will be dynamically fine-tuned
>> according to the health status monitoring of key businesses.
>
> Ah, I see. Do you use the latency targets and min/max ranges or just fixate
> the vrate by setting min == max?

Currently we use the former.

>
>> For cost.weight setting, high-priority services will gain greater
>> advantages through weight settings to deal with a large number of io
>> requests in a short period of time. It works fine as work-conservation
>> of iocost works well according to our observation.
>
> Glad to hear.
>
>> These practices can be done better and I look forward to your better
>> suggestions.
>
> It's still in progress but resctl-bench's iocost-tune benchmark is what
> we're starting to use:
>
> https://github.com/facebookexperimental/resctl-demo/blob/main/resctl-bench/doc/iocost-tune.md
>
> The benchmark takes like 6 hours and what it does is probing the whole vrate
> range looking for behavior inflection points given the scenario of
> protecting a latency sensitive workload against memory leak. On completion,
> it provides several solutions based on the behavior observed.
>
> The benchmark is destructive (to the content on the target ssd) and can be
> tricky to set up. There's an installable image to help setting up and running
> the benchmark:
>
> https://github.com/iocost-benchmark/resctl-demo-image-recipe/actions
>
> The eventual goal is collecting these benchmark results in the following git
> repo:
>
> https://github.com/iocost-benchmark/iocost-benchmarks
>
> which generates hwdb files describing all the found solutions and makes
> systemd apply the appropriate configuration on boot automatically.
>
> It's still all a work in progress but hopefully we should be able to
> configure iocost reasonably on boot on most SSDs.
>
> Thanks.
>

These methodologies are well worth studying and will definitely help our
future deployment of iocost.

Thanks a lot.