2021-12-14 03:17:00

by Yajun Deng

[permalink] [raw]
Subject: [PATCH] lib/raid6: fix abnormally high latency

We found an abnormally high latency when executing modprobe raid6_pq, the
latency is greater than 1.2s when CONFIG_PREEMPT_VOLUNTARY=y, greater than
67ms when CONFIG_PREEMPT=y, and greater than 16ms when CONFIG_PREEMPT_RT=y.
This is caused by disable the preemption, this time is too long and
unreasonable. We just need to disable migration. so used migrate_disable()/
migrate_enable() instead of preempt_disable()/preempt_enable(). This is
beneficial for CONFIG_PREEMPT=y or CONFIG_PREEMPT_RT=y, but no effect for
CONFIG_PREEMPT_VOLUNTARY=y.

Fixes: fe5cbc6e06c7 ("md/raid6 algorithms: delta syndrome functions")
Signed-off-by: Yajun Deng <[email protected]>
---
lib/raid6/algos.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/lib/raid6/algos.c b/lib/raid6/algos.c
index 6d5e5000fdd7..21611d05c34c 100644
--- a/lib/raid6/algos.c
+++ b/lib/raid6/algos.c
@@ -162,7 +162,7 @@ static inline const struct raid6_calls *raid6_choose_gen(

perf = 0;

- preempt_disable();
+ migrate_disable();
j0 = jiffies;
while ((j1 = jiffies) == j0)
cpu_relax();
@@ -171,7 +171,7 @@ static inline const struct raid6_calls *raid6_choose_gen(
(*algo)->gen_syndrome(disks, PAGE_SIZE, *dptrs);
perf++;
}
- preempt_enable();
+ migrate_enable();

if (perf > bestgenperf) {
bestgenperf = perf;
@@ -186,7 +186,7 @@ static inline const struct raid6_calls *raid6_choose_gen(

perf = 0;

- preempt_disable();
+ migrate_disable();
j0 = jiffies;
while ((j1 = jiffies) == j0)
cpu_relax();
@@ -196,7 +196,7 @@ static inline const struct raid6_calls *raid6_choose_gen(
PAGE_SIZE, *dptrs);
perf++;
}
- preempt_enable();
+ migrate_enable();

if (best == *algo)
bestxorperf = perf;
--
2.32.0



2021-12-14 17:28:11

by Song Liu

[permalink] [raw]
Subject: Re: [PATCH] lib/raid6: fix abnormally high latency

On Mon, Dec 13, 2021 at 7:17 PM Yajun Deng <[email protected]> wrote:
>
> We found an abnormally high latency when executing modprobe raid6_pq, the
> latency is greater than 1.2s when CONFIG_PREEMPT_VOLUNTARY=y, greater than
> 67ms when CONFIG_PREEMPT=y, and greater than 16ms when CONFIG_PREEMPT_RT=y.
> This is caused by disable the preemption, this time is too long and
> unreasonable. We just need to disable migration. so used migrate_disable()/
> migrate_enable() instead of preempt_disable()/preempt_enable(). This is
> beneficial for CONFIG_PREEMPT=y or CONFIG_PREEMPT_RT=y, but no effect for
> CONFIG_PREEMPT_VOLUNTARY=y.
>
> Fixes: fe5cbc6e06c7 ("md/raid6 algorithms: delta syndrome functions")
> Signed-off-by: Yajun Deng <[email protected]>

We measure the speed of different RAID algorithms.If we don't disable
preempt, the result may be inaccurate, right? IIUC, we only disable preempt
for 16 jiffies. Why do we see 1.2 second delay?

Thanks,
Song

> ---
> lib/raid6/algos.c | 8 ++++----
> 1 file changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/lib/raid6/algos.c b/lib/raid6/algos.c
> index 6d5e5000fdd7..21611d05c34c 100644
> --- a/lib/raid6/algos.c
> +++ b/lib/raid6/algos.c
> @@ -162,7 +162,7 @@ static inline const struct raid6_calls *raid6_choose_gen(
>
> perf = 0;
>
> - preempt_disable();
> + migrate_disable();
> j0 = jiffies;
> while ((j1 = jiffies) == j0)
> cpu_relax();
> @@ -171,7 +171,7 @@ static inline const struct raid6_calls *raid6_choose_gen(
> (*algo)->gen_syndrome(disks, PAGE_SIZE, *dptrs);
> perf++;
> }
> - preempt_enable();
> + migrate_enable();
>
> if (perf > bestgenperf) {
> bestgenperf = perf;
> @@ -186,7 +186,7 @@ static inline const struct raid6_calls *raid6_choose_gen(
>
> perf = 0;
>
> - preempt_disable();
> + migrate_disable();
> j0 = jiffies;
> while ((j1 = jiffies) == j0)
> cpu_relax();
> @@ -196,7 +196,7 @@ static inline const struct raid6_calls *raid6_choose_gen(
> PAGE_SIZE, *dptrs);
> perf++;
> }
> - preempt_enable();
> + migrate_enable();
>
> if (best == *algo)
> bestxorperf = perf;
> --
> 2.32.0
>

2021-12-15 02:14:50

by Yajun Deng

[permalink] [raw]
Subject: Re: [PATCH] lib/raid6: fix abnormally high latency

December 15, 2021 1:27 AM, "Song Liu" <[email protected]> wrote:

> On Mon, Dec 13, 2021 at 7:17 PM Yajun Deng <[email protected]> wrote:
>
>> We found an abnormally high latency when executing modprobe raid6_pq, the
>> latency is greater than 1.2s when CONFIG_PREEMPT_VOLUNTARY=y, greater than
>> 67ms when CONFIG_PREEMPT=y, and greater than 16ms when CONFIG_PREEMPT_RT=y.
>> This is caused by disable the preemption, this time is too long and
>> unreasonable. We just need to disable migration. so used migrate_disable()/
>> migrate_enable() instead of preempt_disable()/preempt_enable(). This is
>> beneficial for CONFIG_PREEMPT=y or CONFIG_PREEMPT_RT=y, but no effect for
>> CONFIG_PREEMPT_VOLUNTARY=y.
>>
>> Fixes: fe5cbc6e06c7 ("md/raid6 algorithms: delta syndrome functions")
>> Signed-off-by: Yajun Deng <[email protected]>
>
> We measure the speed of different RAID algorithms.If we don't disable
> preempt, the result may be inaccurate, right? IIUC, we only disable preempt
> for 16 jiffies. Why do we see 1.2 second delay?

Here are the command of my test:
Execute "sudo cyclictest -S -p 95 -d 0 -i 1000 -D 24h -m" in one terminal and "sudo modprobe raid6_pq" in the other terminal.

Here are the results of my test:
CONFIG_PREEMPT_VOLUNTARY=y,CONFIG_HZ_250=y
T: 0 ( 3092) P:95 I:1000 C: 8514 Min: 1 Act: 2 Avg: 1 Max: 6
T: 1 ( 3093) P:95 I:1000 C: 8511 Min: 1 Act: 2 Avg: 1 Max: 14
T: 2 ( 3094) P:95 I:1000 C: 8509 Min: 1 Act: 2 Avg: 1 Max: 5
T: 3 ( 3095) P:95 I:1000 C: 8506 Min: 1 Act: 2 Avg: 1 Max: 2
T: 4 ( 3096) P:95 I:1000 C: 8504 Min: 1 Act: 2 Avg: 1 Max: 3
T: 5 ( 3097) P:95 I:1000 C: 8502 Min: 1 Act: 2 Avg: 1 Max: 3
T: 6 ( 3098) P:95 I:1000 C: 8499 Min: 1 Act: 2 Avg: 1 Max: 3
T: 7 ( 3099) P:95 I:1000 C: 8497 Min: 1 Act: 1 Avg: 1 Max: 16
T: 8 ( 3100) P:95 I:1000 C: 8495 Min: 1 Act: 2 Avg: 1 Max: 5
T: 9 ( 3101) P:95 I:1000 C: 8493 Min: 1 Act: 2 Avg: 1 Max: 4
T:10 ( 3102) P:95 I:1000 C: 8490 Min: 1 Act: 2 Avg: 1 Max: 13
T:11 ( 3103) P:95 I:1000 C: 8488 Min: 1 Act: 2 Avg: 1 Max: 8
T:12 ( 3104) P:95 I:1000 C: 8486 Min: 1 Act: 2 Avg: 1 Max: 10
T:13 ( 3105) P:95 I:1000 C: 8484 Min: 1 Act: 2 Avg: 2 Max: 6
T:14 ( 3106) P:95 I:1000 C: 8482 Min: 1 Act: 2 Avg: 1 Max: 3
T:15 ( 3107) P:95 I:1000 C: 8479 Min: 1 Act: 2 Avg: 1 Max: 5
T:16 ( 3108) P:95 I:1000 C: 8477 Min: 1 Act: 2 Avg: 2 Max: 137
T:17 ( 3109) P:95 I:1000 C: 7255 Min: 1 Act: 2 Avg: 170 Max: 1220832
T:18 ( 3110) P:95 I:1000 C: 8473 Min: 1 Act: 2 Avg: 1 Max: 5
T:19 ( 3111) P:95 I:1000 C: 8471 Min: 1 Act: 2 Avg: 1 Max: 4
T:20 ( 3112) P:95 I:1000 C: 8469 Min: 1 Act: 2 Avg: 1 Max: 4
T:21 ( 3113) P:95 I:1000 C: 8466 Min: 1 Act: 2 Avg: 1 Max: 3
T:22 ( 3114) P:95 I:1000 C: 8464 Min: 1 Act: 2 Avg: 1 Max: 8
T:23 ( 3115) P:95 I:1000 C: 8462 Min: 1 Act: 2 Avg: 1 Max: 10
T:24 ( 3116) P:95 I:1000 C: 8457 Min: 1 Act: 2 Avg: 1 Max: 9
T:25 ( 3117) P:95 I:1000 C: 8458 Min: 1 Act: 2 Avg: 1 Max: 4
T:26 ( 3118) P:95 I:1000 C: 8456 Min: 1 Act: 2 Avg: 1 Max: 5
T:27 ( 3119) P:95 I:1000 C: 8454 Min: 1 Act: 2 Avg: 1 Max: 2
T:28 ( 3120) P:95 I:1000 C: 8452 Min: 1 Act: 2 Avg: 1 Max: 3
T:29 ( 3121) P:95 I:1000 C: 8450 Min: 2 Act: 2 Avg: 2 Max: 3
T:30 ( 3122) P:95 I:1000 C: 8447 Min: 1 Act: 2 Avg: 1 Max: 2
T:31 ( 3123) P:95 I:1000 C: 8445 Min: 1 Act: 2 Avg: 1 Max: 12

CONFIG_PREEMPT=y,CONFIG_HZ_250=y
T: 0 ( 2460) P:95 I:1000 C: 49161 Min: 1 Act: 2 Avg: 2 Max: 8
T: 1 ( 2461) P:95 I:1000 C: 49158 Min: 1 Act: 2 Avg: 2 Max: 9
T: 2 ( 2462) P:95 I:1000 C: 49024 Min: 1 Act: 2 Avg: 4 Max: 67888
T: 3 ( 2463) P:95 I:1000 C: 49153 Min: 1 Act: 2 Avg: 2 Max: 4
T: 4 ( 2464) P:95 I:1000 C: 48882 Min: 1 Act: 2 Avg: 7 Max: 67726
T: 5 ( 2465) P:95 I:1000 C: 49148 Min: 1 Act: 2 Avg: 2 Max: 5
T: 6 ( 2466) P:95 I:1000 C: 49146 Min: 1 Act: 2 Avg: 1 Max: 6
T: 7 ( 2467) P:95 I:1000 C: 49143 Min: 2 Act: 2 Avg: 2 Max: 6
T: 8 ( 2468) P:95 I:1000 C: 49140 Min: 1 Act: 2 Avg: 2 Max: 929
T: 9 ( 2469) P:95 I:1000 C: 49139 Min: 1 Act: 2 Avg: 2 Max: 8
T:10 ( 2470) P:95 I:1000 C: 49136 Min: 1 Act: 2 Avg: 2 Max: 8
T:11 ( 2471) P:95 I:1000 C: 48866 Min: 1 Act: 2 Avg: 7 Max: 67417
T:12 ( 2472) P:95 I:1000 C: 49132 Min: 1 Act: 2 Avg: 2 Max: 6
T:13 ( 2473) P:95 I:1000 C: 49063 Min: 1 Act: 2 Avg: 3 Max: 67854
T:14 ( 2474) P:95 I:1000 C: 49127 Min: 1 Act: 2 Avg: 2 Max: 5
T:15 ( 2475) P:95 I:1000 C: 49125 Min: 1 Act: 2 Avg: 2 Max: 7
T:16 ( 2476) P:95 I:1000 C: 49123 Min: 1 Act: 2 Avg: 2 Max: 5
T:17 ( 2477) P:95 I:1000 C: 49121 Min: 1 Act: 2 Avg: 2 Max: 6
T:18 ( 2478) P:95 I:1000 C: 49052 Min: 1 Act: 2 Avg: 3 Max: 67717
T:19 ( 2479) P:95 I:1000 C: 49116 Min: 1 Act: 2 Avg: 2 Max: 6
T:20 ( 2480) P:95 I:1000 C: 48913 Min: 1 Act: 2 Avg: 6 Max: 67291
T:21 ( 2481) P:95 I:1000 C: 49112 Min: 1 Act: 2 Avg: 1 Max: 4
T:22 ( 2482) P:95 I:1000 C: 49110 Min: 1 Act: 2 Avg: 2 Max: 5
T:23 ( 2483) P:95 I:1000 C: 49108 Min: 1 Act: 2 Avg: 1 Max: 4
T:24 ( 2484) P:95 I:1000 C: 49105 Min: 1 Act: 2 Avg: 2 Max: 6
T:25 ( 2485) P:95 I:1000 C: 49103 Min: 1 Act: 2 Avg: 2 Max: 7
T:26 ( 2486) P:95 I:1000 C: 49101 Min: 1 Act: 2 Avg: 2 Max: 6
T:27 ( 2487) P:95 I:1000 C: 48898 Min: 1 Act: 2 Avg: 6 Max: 67004
T:28 ( 2488) P:95 I:1000 C: 49097 Min: 1 Act: 2 Avg: 2 Max: 7
T:29 ( 2489) P:95 I:1000 C: 49095 Min: 1 Act: 2 Avg: 1 Max: 3
T:30 ( 2490) P:95 I:1000 C: 49092 Min: 1 Act: 2 Avg: 1 Max: 4
T:31 ( 2491) P:95 I:1000 C: 49090 Min: 1 Act: 2 Avg: 2 Max: 6

CONFIG_PREEMPT_RT=y,CONFIG_HZ_1000=y
T: 0 ( 2555) P:95 I:1000 C: 25336 Min: 2 Act: 2 Avg: 2 Max: 11
T: 1 ( 2556) P:95 I:1000 C: 25332 Min: 2 Act: 4 Avg: 3 Max: 10
T: 2 ( 2557) P:95 I:1000 C: 25329 Min: 3 Act: 3 Avg: 3 Max: 7
T: 3 ( 2558) P:95 I:1000 C: 25326 Min: 3 Act: 3 Avg: 3 Max: 9
T: 4 ( 2559) P:95 I:1000 C: 25322 Min: 3 Act: 3 Avg: 3 Max: 19
T: 5 ( 2560) P:95 I:1000 C: 25319 Min: 3 Act: 3 Avg: 3 Max: 9
T: 6 ( 2561) P:95 I:1000 C: 25284 Min: 3 Act: 3 Avg: 4 Max: 16593
T: 7 ( 2562) P:95 I:1000 C: 25265 Min: 3 Act: 3 Avg: 5 Max: 16375
T: 8 ( 2563) P:95 I:1000 C: 25293 Min: 3 Act: 3 Avg: 3 Max: 16138
T: 9 ( 2564) P:95 I:1000 C: 25290 Min: 3 Act: 5 Avg: 5 Max: 16902
T:10 ( 2565) P:95 I:1000 C: 25287 Min: 3 Act: 3 Avg: 4 Max: 16696
T:11 ( 2566) P:95 I:1000 C: 25300 Min: 3 Act: 3 Avg: 3 Max: 7
T:12 ( 2567) P:95 I:1000 C: 25297 Min: 2 Act: 2 Avg: 2 Max: 9
T:13 ( 2568) P:95 I:1000 C: 25277 Min: 3 Act: 3 Avg: 3 Max: 16179
T:14 ( 2569) P:95 I:1000 C: 25274 Min: 3 Act: 3 Avg: 3 Max: 16043
T:15 ( 2570) P:95 I:1000 C: 25287 Min: 3 Act: 3 Avg: 3 Max: 8
T:16 ( 2571) P:95 I:1000 C: 25284 Min: 2 Act: 2 Avg: 2 Max: 5
T:17 ( 2572) P:95 I:1000 C: 25281 Min: 3 Act: 3 Avg: 3 Max: 7
T:18 ( 2573) P:95 I:1000 C: 25278 Min: 3 Act: 3 Avg: 3 Max: 7
T:19 ( 2574) P:95 I:1000 C: 25275 Min: 3 Act: 3 Avg: 3 Max: 6
T:20 ( 2575) P:95 I:1000 C: 25272 Min: 3 Act: 3 Avg: 3 Max: 7
T:21 ( 2576) P:95 I:1000 C: 25269 Min: 3 Act: 4 Avg: 3 Max: 10
T:22 ( 2577) P:95 I:1000 C: 25234 Min: 2 Act: 3 Avg: 4 Max: 16244
T:23 ( 2578) P:95 I:1000 C: 25231 Min: 2 Act: 4 Avg: 4 Max: 16200
T:24 ( 2579) P:95 I:1000 C: 25259 Min: 3 Act: 3 Avg: 3 Max: 6
T:25 ( 2580) P:95 I:1000 C: 25256 Min: 3 Act: 3 Avg: 3 Max: 8
T:26 ( 2581) P:95 I:1000 C: 25237 Min: 3 Act: 3 Avg: 3 Max: 16078
T:27 ( 2582) P:95 I:1000 C: 25250 Min: 3 Act: 3 Avg: 3 Max: 7
T:28 ( 2583) P:95 I:1000 C: 25247 Min: 3 Act: 3 Avg: 3 Max: 8
T:29 ( 2584) P:95 I:1000 C: 25214 Min: 2 Act: 3 Avg: 4 Max: 15985
T:30 ( 2585) P:95 I:1000 C: 25225 Min: 2 Act: 2 Avg: 3 Max: 16988
T:31 ( 2586) P:95 I:1000 C: 25238 Min: 3 Act: 3 Avg: 3 Max: 12


>
> Thanks,
> Song
>
>> ---
>> lib/raid6/algos.c | 8 ++++----
>> 1 file changed, 4 insertions(+), 4 deletions(-)
>>
>> diff --git a/lib/raid6/algos.c b/lib/raid6/algos.c
>> index 6d5e5000fdd7..21611d05c34c 100644
>> --- a/lib/raid6/algos.c
>> +++ b/lib/raid6/algos.c
>> @@ -162,7 +162,7 @@ static inline const struct raid6_calls *raid6_choose_gen(
>>
>> perf = 0;
>>
>> - preempt_disable();
>> + migrate_disable();
>> j0 = jiffies;
>> while ((j1 = jiffies) == j0)
>> cpu_relax();
>> @@ -171,7 +171,7 @@ static inline const struct raid6_calls *raid6_choose_gen(
>> (*algo)->gen_syndrome(disks, PAGE_SIZE, *dptrs);
>> perf++;
>> }
>> - preempt_enable();
>> + migrate_enable();
>>
>> if (perf > bestgenperf) {
>> bestgenperf = perf;
>> @@ -186,7 +186,7 @@ static inline const struct raid6_calls *raid6_choose_gen(
>>
>> perf = 0;
>>
>> - preempt_disable();
>> + migrate_disable();
>> j0 = jiffies;
>> while ((j1 = jiffies) == j0)
>> cpu_relax();
>> @@ -196,7 +196,7 @@ static inline const struct raid6_calls *raid6_choose_gen(
>> PAGE_SIZE, *dptrs);
>> perf++;
>> }
>> - preempt_enable();
>> + migrate_enable();
>>
>> if (best == *algo)
>> bestxorperf = perf;
>> --
>> 2.32.0

2021-12-15 16:52:27

by Song Liu

[permalink] [raw]
Subject: Re: [PATCH] lib/raid6: fix abnormally high latency

On Tue, Dec 14, 2021 at 6:14 PM <[email protected]> wrote:
>
> December 15, 2021 1:27 AM, "Song Liu" <[email protected]> wrote:
>
> > On Mon, Dec 13, 2021 at 7:17 PM Yajun Deng <[email protected]> wrote:
> >
> >> We found an abnormally high latency when executing modprobe raid6_pq, the
> >> latency is greater than 1.2s when CONFIG_PREEMPT_VOLUNTARY=y, greater than
> >> 67ms when CONFIG_PREEMPT=y, and greater than 16ms when CONFIG_PREEMPT_RT=y.
> >> This is caused by disable the preemption, this time is too long and
> >> unreasonable. We just need to disable migration. so used migrate_disable()/
> >> migrate_enable() instead of preempt_disable()/preempt_enable(). This is
> >> beneficial for CONFIG_PREEMPT=y or CONFIG_PREEMPT_RT=y, but no effect for
> >> CONFIG_PREEMPT_VOLUNTARY=y.
> >>
> >> Fixes: fe5cbc6e06c7 ("md/raid6 algorithms: delta syndrome functions")
> >> Signed-off-by: Yajun Deng <[email protected]>
> >
> > We measure the speed of different RAID algorithms.If we don't disable
> > preempt, the result may be inaccurate, right? IIUC, we only disable preempt
> > for 16 jiffies. Why do we see 1.2 second delay?
>
> Here are the command of my test:
> Execute "sudo cyclictest -S -p 95 -d 0 -i 1000 -D 24h -m" in one terminal and "sudo modprobe raid6_pq" in the other terminal.
>
> Here are the results of my test:
> CONFIG_PREEMPT_VOLUNTARY=y,CONFIG_HZ_250=y
> T: 0 ( 3092) P:95 I:1000 C: 8514 Min: 1 Act: 2 Avg: 1 Max: 6
> T: 1 ( 3093) P:95 I:1000 C: 8511 Min: 1 Act: 2 Avg: 1 Max: 14

I am not very familiar with the RT work, so please forgive me for some
rookie questions.

From the result, I think the CONFIG_PREEMPT_VOLUNTARY=y and the
CONFIG_PREEMPT=y cases failed to preempt during the preempt enabled period in
raid6_choose_gen(). Is this expected?

OTOH, the 16ms latency with CONFIG_PREEMPT_RT=y is more or less expected.
Is this acceptable? If not, is 1ms latency acceptable?

Thanks,
Song


[...]

2021-12-16 02:15:15

by Yajun Deng

[permalink] [raw]
Subject: Re: [PATCH] lib/raid6: fix abnormally high latency

December 16, 2021 12:52 AM, "Song Liu" <[email protected]> wrote:

> On Tue, Dec 14, 2021 at 6:14 PM <[email protected]> wrote:
>
>> December 15, 2021 1:27 AM, "Song Liu" <[email protected]> wrote:
>>
>> On Mon, Dec 13, 2021 at 7:17 PM Yajun Deng <[email protected]> wrote:
>>
>> We found an abnormally high latency when executing modprobe raid6_pq, the
>> latency is greater than 1.2s when CONFIG_PREEMPT_VOLUNTARY=y, greater than
>> 67ms when CONFIG_PREEMPT=y, and greater than 16ms when CONFIG_PREEMPT_RT=y.
>> This is caused by disable the preemption, this time is too long and
>> unreasonable. We just need to disable migration. so used migrate_disable()/
>> migrate_enable() instead of preempt_disable()/preempt_enable(). This is
>> beneficial for CONFIG_PREEMPT=y or CONFIG_PREEMPT_RT=y, but no effect for
>> CONFIG_PREEMPT_VOLUNTARY=y.
>>
>> Fixes: fe5cbc6e06c7 ("md/raid6 algorithms: delta syndrome functions")
>> Signed-off-by: Yajun Deng <[email protected]>
>>
>> We measure the speed of different RAID algorithms.If we don't disable
>> preempt, the result may be inaccurate, right? IIUC, we only disable preempt
>> for 16 jiffies. Why do we see 1.2 second delay?
>>
>> Here are the command of my test:
>> Execute "sudo cyclictest -S -p 95 -d 0 -i 1000 -D 24h -m" in one terminal and "sudo modprobe
>> raid6_pq" in the other terminal.
>>
>> Here are the results of my test:
>> CONFIG_PREEMPT_VOLUNTARY=y,CONFIG_HZ_250=y
>> T: 0 ( 3092) P:95 I:1000 C: 8514 Min: 1 Act: 2 Avg: 1 Max: 6
>> T: 1 ( 3093) P:95 I:1000 C: 8511 Min: 1 Act: 2 Avg: 1 Max: 14
>
> I am not very familiar with the RT work, so please forgive me for some
> rookie questions.
>
> From the result, I think the CONFIG_PREEMPT_VOLUNTARY=y and the
> CONFIG_PREEMPT=y cases failed to preempt during the preempt enabled period in
> raid6_choose_gen(). Is this expected?
>

No, This is due to disable preemption causing ksoftirqd fail to schedule, we can use bcc tools see that.

> OTOH, the 16ms latency with CONFIG_PREEMPT_RT=y is more or less expected.
> Is this acceptable? If not, is 1ms latency acceptable?
>

Here are the test results after adding patch:
CONFIG_PREEMPT=y or CONFIG_PREEMPT_RT=y
T: 0 ( 3167) P:95 I:1000 C: 13958 Min: 1 Act: 2 Avg: 1 Max: 5
T: 1 ( 3168) P:95 I:1000 C: 13956 Min: 1 Act: 2 Avg: 1 Max: 7
T: 2 ( 3169) P:95 I:1000 C: 13946 Min: 1 Act: 2 Avg: 1 Max: 12
T: 3 ( 3170) P:95 I:1000 C: 13951 Min: 1 Act: 2 Avg: 1 Max: 5
T: 4 ( 3171) P:95 I:1000 C: 13949 Min: 1 Act: 2 Avg: 1 Max: 3
T: 5 ( 3172) P:95 I:1000 C: 13947 Min: 1 Act: 2 Avg: 1 Max: 16
T: 6 ( 3173) P:95 I:1000 C: 13945 Min: 1 Act: 2 Avg: 2 Max: 7
T: 7 ( 3174) P:95 I:1000 C: 13942 Min: 1 Act: 2 Avg: 1 Max: 3
T: 8 ( 3175) P:95 I:1000 C: 13940 Min: 1 Act: 2 Avg: 1 Max: 3
T: 9 ( 3176) P:95 I:1000 C: 13938 Min: 1 Act: 1 Avg: 1 Max: 3
T:10 ( 3177) P:95 I:1000 C: 13936 Min: 1 Act: 2 Avg: 1 Max: 6
T:11 ( 3178) P:95 I:1000 C: 13933 Min: 1 Act: 2 Avg: 1 Max: 3
T:12 ( 3179) P:95 I:1000 C: 13931 Min: 1 Act: 2 Avg: 1 Max: 4
T:13 ( 3180) P:95 I:1000 C: 13929 Min: 1 Act: 2 Avg: 1 Max: 7
T:14 ( 3181) P:95 I:1000 C: 13927 Min: 1 Act: 2 Avg: 1 Max: 6
T:15 ( 3182) P:95 I:1000 C: 13925 Min: 1 Act: 2 Avg: 1 Max: 4
T:16 ( 3183) P:95 I:1000 C: 13923 Min: 1 Act: 2 Avg: 1 Max: 5
T:17 ( 3184) P:95 I:1000 C: 13921 Min: 1 Act: 2 Avg: 1 Max: 5
T:18 ( 3185) P:95 I:1000 C: 13919 Min: 1 Act: 2 Avg: 1 Max: 4
T:19 ( 3186) P:95 I:1000 C: 13916 Min: 1 Act: 2 Avg: 1 Max: 4
T:20 ( 3187) P:95 I:1000 C: 13914 Min: 1 Act: 2 Avg: 1 Max: 4
T:21 ( 3188) P:95 I:1000 C: 13912 Min: 1 Act: 2 Avg: 1 Max: 10
T:22 ( 3189) P:95 I:1000 C: 13910 Min: 1 Act: 2 Avg: 1 Max: 5
T:23 ( 3190) P:95 I:1000 C: 13908 Min: 1 Act: 2 Avg: 1 Max: 5
T:24 ( 3191) P:95 I:1000 C: 13906 Min: 1 Act: 2 Avg: 1 Max: 18
T:25 ( 3192) P:95 I:1000 C: 13904 Min: 1 Act: 2 Avg: 1 Max: 7
T:26 ( 3193) P:95 I:1000 C: 13902 Min: 1 Act: 2 Avg: 1 Max: 5
T:27 ( 3194) P:95 I:1000 C: 13900 Min: 1 Act: 1 Avg: 1 Max: 11
T:28 ( 3195) P:95 I:1000 C: 13898 Min: 1 Act: 2 Avg: 1 Max: 3
T:29 ( 3196) P:95 I:1000 C: 13896 Min: 1 Act: 2 Avg: 1 Max: 5
T:30 ( 3197) P:95 I:1000 C: 13894 Min: 1 Act: 2 Avg: 1 Max: 4
T:31 ( 3198) P:95 I:1000 C: 13892 Min: 1 Act: 2 Avg: 1 Max: 3

we can see the latency will not greater than 100us,so 1ms latency is also too long for CONFIG_PREEMPT=y or CONFIG_PREEMPT_RT=y.
use migrate_disable()/migrate_enable() instead of preempt_disable()/preempt_enable() will not affect the speed of different RAID algorithms and the latency can be reduced to a reasonable range.


> Thanks,
> Song
>
> [...]

2021-12-16 06:39:18

by Song Liu

[permalink] [raw]
Subject: Re: [PATCH] lib/raid6: fix abnormally high latency

On Wed, Dec 15, 2021 at 6:15 PM <[email protected]> wrote:
>
> December 16, 2021 12:52 AM, "Song Liu" <[email protected]> wrote:
>
> > On Tue, Dec 14, 2021 at 6:14 PM <[email protected]> wrote:
> >
> >> December 15, 2021 1:27 AM, "Song Liu" <[email protected]> wrote:
> >>
> >> On Mon, Dec 13, 2021 at 7:17 PM Yajun Deng <[email protected]> wrote:
> >>
> >> We found an abnormally high latency when executing modprobe raid6_pq, the
> >> latency is greater than 1.2s when CONFIG_PREEMPT_VOLUNTARY=y, greater than
> >> 67ms when CONFIG_PREEMPT=y, and greater than 16ms when CONFIG_PREEMPT_RT=y.
> >> This is caused by disable the preemption, this time is too long and
> >> unreasonable. We just need to disable migration. so used migrate_disable()/
> >> migrate_enable() instead of preempt_disable()/preempt_enable(). This is
> >> beneficial for CONFIG_PREEMPT=y or CONFIG_PREEMPT_RT=y, but no effect for
> >> CONFIG_PREEMPT_VOLUNTARY=y.
> >>
> >> Fixes: fe5cbc6e06c7 ("md/raid6 algorithms: delta syndrome functions")
> >> Signed-off-by: Yajun Deng <[email protected]>
> >>
> >> We measure the speed of different RAID algorithms.If we don't disable
> >> preempt, the result may be inaccurate, right? IIUC, we only disable preempt
> >> for 16 jiffies. Why do we see 1.2 second delay?
> >>
> >> Here are the command of my test:
> >> Execute "sudo cyclictest -S -p 95 -d 0 -i 1000 -D 24h -m" in one terminal and "sudo modprobe
> >> raid6_pq" in the other terminal.
> >>
> >> Here are the results of my test:
> >> CONFIG_PREEMPT_VOLUNTARY=y,CONFIG_HZ_250=y
> >> T: 0 ( 3092) P:95 I:1000 C: 8514 Min: 1 Act: 2 Avg: 1 Max: 6
> >> T: 1 ( 3093) P:95 I:1000 C: 8511 Min: 1 Act: 2 Avg: 1 Max: 14
> >
> > I am not very familiar with the RT work, so please forgive me for some
> > rookie questions.
> >
> > From the result, I think the CONFIG_PREEMPT_VOLUNTARY=y and the
> > CONFIG_PREEMPT=y cases failed to preempt during the preempt enabled period in
> > raid6_choose_gen(). Is this expected?
> >
>
> No, This is due to disable preemption causing ksoftirqd fail to schedule, we can use bcc tools see that.
>
> > OTOH, the 16ms latency with CONFIG_PREEMPT_RT=y is more or less expected.
> > Is this acceptable? If not, is 1ms latency acceptable?
> >
>
> Here are the test results after adding patch:
> CONFIG_PREEMPT=y or CONFIG_PREEMPT_RT=y
> T: 0 ( 3167) P:95 I:1000 C: 13958 Min: 1 Act: 2 Avg: 1 Max: 5
> T: 1 ( 3168) P:95 I:1000 C: 13956 Min: 1 Act: 2 Avg: 1 Max: 7
> T: 2 ( 3169) P:95 I:1000 C: 13946 Min: 1 Act: 2 Avg: 1 Max: 12
> T: 3 ( 3170) P:95 I:1000 C: 13951 Min: 1 Act: 2 Avg: 1 Max: 5
> T: 4 ( 3171) P:95 I:1000 C: 13949 Min: 1 Act: 2 Avg: 1 Max: 3
> T: 5 ( 3172) P:95 I:1000 C: 13947 Min: 1 Act: 2 Avg: 1 Max: 16
> T: 6 ( 3173) P:95 I:1000 C: 13945 Min: 1 Act: 2 Avg: 2 Max: 7
> T: 7 ( 3174) P:95 I:1000 C: 13942 Min: 1 Act: 2 Avg: 1 Max: 3
> T: 8 ( 3175) P:95 I:1000 C: 13940 Min: 1 Act: 2 Avg: 1 Max: 3
> T: 9 ( 3176) P:95 I:1000 C: 13938 Min: 1 Act: 1 Avg: 1 Max: 3
> T:10 ( 3177) P:95 I:1000 C: 13936 Min: 1 Act: 2 Avg: 1 Max: 6
> T:11 ( 3178) P:95 I:1000 C: 13933 Min: 1 Act: 2 Avg: 1 Max: 3
> T:12 ( 3179) P:95 I:1000 C: 13931 Min: 1 Act: 2 Avg: 1 Max: 4
> T:13 ( 3180) P:95 I:1000 C: 13929 Min: 1 Act: 2 Avg: 1 Max: 7
> T:14 ( 3181) P:95 I:1000 C: 13927 Min: 1 Act: 2 Avg: 1 Max: 6
> T:15 ( 3182) P:95 I:1000 C: 13925 Min: 1 Act: 2 Avg: 1 Max: 4
> T:16 ( 3183) P:95 I:1000 C: 13923 Min: 1 Act: 2 Avg: 1 Max: 5
> T:17 ( 3184) P:95 I:1000 C: 13921 Min: 1 Act: 2 Avg: 1 Max: 5
> T:18 ( 3185) P:95 I:1000 C: 13919 Min: 1 Act: 2 Avg: 1 Max: 4
> T:19 ( 3186) P:95 I:1000 C: 13916 Min: 1 Act: 2 Avg: 1 Max: 4
> T:20 ( 3187) P:95 I:1000 C: 13914 Min: 1 Act: 2 Avg: 1 Max: 4
> T:21 ( 3188) P:95 I:1000 C: 13912 Min: 1 Act: 2 Avg: 1 Max: 10
> T:22 ( 3189) P:95 I:1000 C: 13910 Min: 1 Act: 2 Avg: 1 Max: 5
> T:23 ( 3190) P:95 I:1000 C: 13908 Min: 1 Act: 2 Avg: 1 Max: 5
> T:24 ( 3191) P:95 I:1000 C: 13906 Min: 1 Act: 2 Avg: 1 Max: 18
> T:25 ( 3192) P:95 I:1000 C: 13904 Min: 1 Act: 2 Avg: 1 Max: 7
> T:26 ( 3193) P:95 I:1000 C: 13902 Min: 1 Act: 2 Avg: 1 Max: 5
> T:27 ( 3194) P:95 I:1000 C: 13900 Min: 1 Act: 1 Avg: 1 Max: 11
> T:28 ( 3195) P:95 I:1000 C: 13898 Min: 1 Act: 2 Avg: 1 Max: 3
> T:29 ( 3196) P:95 I:1000 C: 13896 Min: 1 Act: 2 Avg: 1 Max: 5
> T:30 ( 3197) P:95 I:1000 C: 13894 Min: 1 Act: 2 Avg: 1 Max: 4
> T:31 ( 3198) P:95 I:1000 C: 13892 Min: 1 Act: 2 Avg: 1 Max: 3
>
> we can see the latency will not greater than 100us,so 1ms latency is also too long for CONFIG_PREEMPT=y or CONFIG_PREEMPT_RT=y.
> use migrate_disable()/migrate_enable() instead of preempt_disable()/preempt_enable() will not affect the speed of different RAID algorithms and the latency can be reduced to a reasonable range.

I think allowing preempt may still affect the speed comparison. But
such discrepancy
should be acceptable. I will apply this to md-next.

Thanks,
Song

2021-12-16 07:04:43

by Yajun Deng

[permalink] [raw]
Subject: Re: [PATCH] lib/raid6: fix abnormally high latency

December 16, 2021 2:39 PM, "Song Liu" <[email protected]> wrote:

> On Wed, Dec 15, 2021 at 6:15 PM <[email protected]> wrote:
>
>> December 16, 2021 12:52 AM, "Song Liu" <[email protected]> wrote:
>>
>> On Tue, Dec 14, 2021 at 6:14 PM <[email protected]> wrote:
>>
>> December 15, 2021 1:27 AM, "Song Liu" <[email protected]> wrote:
>>
>> On Mon, Dec 13, 2021 at 7:17 PM Yajun Deng <[email protected]> wrote:
>>
>> We found an abnormally high latency when executing modprobe raid6_pq, the
>> latency is greater than 1.2s when CONFIG_PREEMPT_VOLUNTARY=y, greater than
>> 67ms when CONFIG_PREEMPT=y, and greater than 16ms when CONFIG_PREEMPT_RT=y.
>> This is caused by disable the preemption, this time is too long and
>> unreasonable. We just need to disable migration. so used migrate_disable()/
>> migrate_enable() instead of preempt_disable()/preempt_enable(). This is
>> beneficial for CONFIG_PREEMPT=y or CONFIG_PREEMPT_RT=y, but no effect for
>> CONFIG_PREEMPT_VOLUNTARY=y.
>>
>> Fixes: fe5cbc6e06c7 ("md/raid6 algorithms: delta syndrome functions")
>> Signed-off-by: Yajun Deng <[email protected]>
>>
>> We measure the speed of different RAID algorithms.If we don't disable
>> preempt, the result may be inaccurate, right? IIUC, we only disable preempt
>> for 16 jiffies. Why do we see 1.2 second delay?
>>
>> Here are the command of my test:
>> Execute "sudo cyclictest -S -p 95 -d 0 -i 1000 -D 24h -m" in one terminal and "sudo modprobe
>> raid6_pq" in the other terminal.
>>
>> Here are the results of my test:
>> CONFIG_PREEMPT_VOLUNTARY=y,CONFIG_HZ_250=y
>> T: 0 ( 3092) P:95 I:1000 C: 8514 Min: 1 Act: 2 Avg: 1 Max: 6
>> T: 1 ( 3093) P:95 I:1000 C: 8511 Min: 1 Act: 2 Avg: 1 Max: 14
>>
>> I am not very familiar with the RT work, so please forgive me for some
>> rookie questions.
>>
>> From the result, I think the CONFIG_PREEMPT_VOLUNTARY=y and the
>> CONFIG_PREEMPT=y cases failed to preempt during the preempt enabled period in
>> raid6_choose_gen(). Is this expected?
>>
>> No, This is due to disable preemption causing ksoftirqd fail to schedule, we can use bcc tools see
>> that.
>>
>> OTOH, the 16ms latency with CONFIG_PREEMPT_RT=y is more or less expected.
>> Is this acceptable? If not, is 1ms latency acceptable?
>>
>> Here are the test results after adding patch:
>> CONFIG_PREEMPT=y or CONFIG_PREEMPT_RT=y
>> T: 0 ( 3167) P:95 I:1000 C: 13958 Min: 1 Act: 2 Avg: 1 Max: 5
>> T: 1 ( 3168) P:95 I:1000 C: 13956 Min: 1 Act: 2 Avg: 1 Max: 7
>> T: 2 ( 3169) P:95 I:1000 C: 13946 Min: 1 Act: 2 Avg: 1 Max: 12
>> T: 3 ( 3170) P:95 I:1000 C: 13951 Min: 1 Act: 2 Avg: 1 Max: 5
>> T: 4 ( 3171) P:95 I:1000 C: 13949 Min: 1 Act: 2 Avg: 1 Max: 3
>> T: 5 ( 3172) P:95 I:1000 C: 13947 Min: 1 Act: 2 Avg: 1 Max: 16
>> T: 6 ( 3173) P:95 I:1000 C: 13945 Min: 1 Act: 2 Avg: 2 Max: 7
>> T: 7 ( 3174) P:95 I:1000 C: 13942 Min: 1 Act: 2 Avg: 1 Max: 3
>> T: 8 ( 3175) P:95 I:1000 C: 13940 Min: 1 Act: 2 Avg: 1 Max: 3
>> T: 9 ( 3176) P:95 I:1000 C: 13938 Min: 1 Act: 1 Avg: 1 Max: 3
>> T:10 ( 3177) P:95 I:1000 C: 13936 Min: 1 Act: 2 Avg: 1 Max: 6
>> T:11 ( 3178) P:95 I:1000 C: 13933 Min: 1 Act: 2 Avg: 1 Max: 3
>> T:12 ( 3179) P:95 I:1000 C: 13931 Min: 1 Act: 2 Avg: 1 Max: 4
>> T:13 ( 3180) P:95 I:1000 C: 13929 Min: 1 Act: 2 Avg: 1 Max: 7
>> T:14 ( 3181) P:95 I:1000 C: 13927 Min: 1 Act: 2 Avg: 1 Max: 6
>> T:15 ( 3182) P:95 I:1000 C: 13925 Min: 1 Act: 2 Avg: 1 Max: 4
>> T:16 ( 3183) P:95 I:1000 C: 13923 Min: 1 Act: 2 Avg: 1 Max: 5
>> T:17 ( 3184) P:95 I:1000 C: 13921 Min: 1 Act: 2 Avg: 1 Max: 5
>> T:18 ( 3185) P:95 I:1000 C: 13919 Min: 1 Act: 2 Avg: 1 Max: 4
>> T:19 ( 3186) P:95 I:1000 C: 13916 Min: 1 Act: 2 Avg: 1 Max: 4
>> T:20 ( 3187) P:95 I:1000 C: 13914 Min: 1 Act: 2 Avg: 1 Max: 4
>> T:21 ( 3188) P:95 I:1000 C: 13912 Min: 1 Act: 2 Avg: 1 Max: 10
>> T:22 ( 3189) P:95 I:1000 C: 13910 Min: 1 Act: 2 Avg: 1 Max: 5
>> T:23 ( 3190) P:95 I:1000 C: 13908 Min: 1 Act: 2 Avg: 1 Max: 5
>> T:24 ( 3191) P:95 I:1000 C: 13906 Min: 1 Act: 2 Avg: 1 Max: 18
>> T:25 ( 3192) P:95 I:1000 C: 13904 Min: 1 Act: 2 Avg: 1 Max: 7
>> T:26 ( 3193) P:95 I:1000 C: 13902 Min: 1 Act: 2 Avg: 1 Max: 5
>> T:27 ( 3194) P:95 I:1000 C: 13900 Min: 1 Act: 1 Avg: 1 Max: 11
>> T:28 ( 3195) P:95 I:1000 C: 13898 Min: 1 Act: 2 Avg: 1 Max: 3
>> T:29 ( 3196) P:95 I:1000 C: 13896 Min: 1 Act: 2 Avg: 1 Max: 5
>> T:30 ( 3197) P:95 I:1000 C: 13894 Min: 1 Act: 2 Avg: 1 Max: 4
>> T:31 ( 3198) P:95 I:1000 C: 13892 Min: 1 Act: 2 Avg: 1 Max: 3
>>
>> we can see the latency will not greater than 100us,so 1ms latency is also too long for
>> CONFIG_PREEMPT=y or CONFIG_PREEMPT_RT=y.
>> use migrate_disable()/migrate_enable() instead of preempt_disable()/preempt_enable() will not
>> affect the speed of different RAID algorithms and the latency can be reduced to a reasonable range.
>
> I think allowing preempt may still affect the speed comparison. But
> such discrepancy
> should be acceptable. I will apply this to md-next.

Thank you, would you mind backport this patch to all LTS kernel? As you can see, the latency is not a little higher, but too much higher, it seems unreasonable, I think this may be a bug. We are using the 5.10 kernel and it would be very convenient for us if that was done.

> Thanks,
> Song

2021-12-16 08:09:32

by Paul Menzel

[permalink] [raw]
Subject: Re: [PATCH] lib/raid6: fix abnormally high latency

Dear Song, dear Yajun,


Am 16.12.21 um 07:39 schrieb Song Liu:
> On Wed, Dec 15, 2021 at 6:15 PM <[email protected]> wrote:
>>
>> December 16, 2021 12:52 AM, "Song Liu" <[email protected]> wrote:
>>
>>> On Tue, Dec 14, 2021 at 6:14 PM <[email protected]> wrote:
>>>
>>>> December 15, 2021 1:27 AM, "Song Liu" <[email protected]> wrote:
>>>>
>>>> On Mon, Dec 13, 2021 at 7:17 PM Yajun Deng <[email protected]> wrote:
>>>>
>>>> We found an abnormally high latency when executing modprobe raid6_pq, the
>>>> latency is greater than 1.2s when CONFIG_PREEMPT_VOLUNTARY=y, greater than
>>>> 67ms when CONFIG_PREEMPT=y, and greater than 16ms when CONFIG_PREEMPT_RT=y.
>>>> This is caused by disable the preemption, this time is too long and
>>>> unreasonable. We just need to disable migration. so used migrate_disable()/
>>>> migrate_enable() instead of preempt_disable()/preempt_enable(). This is
>>>> beneficial for CONFIG_PREEMPT=y or CONFIG_PREEMPT_RT=y, but no effect for
>>>> CONFIG_PREEMPT_VOLUNTARY=y.
>>>>
>>>> Fixes: fe5cbc6e06c7 ("md/raid6 algorithms: delta syndrome functions")
>>>> Signed-off-by: Yajun Deng <[email protected]>
>>>>
>>>> We measure the speed of different RAID algorithms.If we don't disable
>>>> preempt, the result may be inaccurate, right? IIUC, we only disable preempt
>>>> for 16 jiffies. Why do we see 1.2 second delay?
>>>>
>>>> Here are the command of my test:
>>>> Execute "sudo cyclictest -S -p 95 -d 0 -i 1000 -D 24h -m" in one terminal and "sudo modprobe
>>>> raid6_pq" in the other terminal.
>>>>
>>>> Here are the results of my test:
>>>> CONFIG_PREEMPT_VOLUNTARY=y,CONFIG_HZ_250=y
>>>> T: 0 ( 3092) P:95 I:1000 C: 8514 Min: 1 Act: 2 Avg: 1 Max: 6
>>>> T: 1 ( 3093) P:95 I:1000 C: 8511 Min: 1 Act: 2 Avg: 1 Max: 14
>>>
>>> I am not very familiar with the RT work, so please forgive me for some
>>> rookie questions.
>>>
>>> From the result, I think the CONFIG_PREEMPT_VOLUNTARY=y and the
>>> CONFIG_PREEMPT=y cases failed to preempt during the preempt enabled period in
>>> raid6_choose_gen(). Is this expected?
>>>
>>
>> No, This is due to disable preemption causing ksoftirqd fail to schedule, we can use bcc tools see that.
>>
>>> OTOH, the 16ms latency with CONFIG_PREEMPT_RT=y is more or less expected.
>>> Is this acceptable? If not, is 1ms latency acceptable?
>>>
>>
>> Here are the test results after adding patch:
>> CONFIG_PREEMPT=y or CONFIG_PREEMPT_RT=y
>> T: 0 ( 3167) P:95 I:1000 C: 13958 Min: 1 Act: 2 Avg: 1 Max: 5
>> T: 1 ( 3168) P:95 I:1000 C: 13956 Min: 1 Act: 2 Avg: 1 Max: 7
>> T: 2 ( 3169) P:95 I:1000 C: 13946 Min: 1 Act: 2 Avg: 1 Max: 12
>> T: 3 ( 3170) P:95 I:1000 C: 13951 Min: 1 Act: 2 Avg: 1 Max: 5
>> T: 4 ( 3171) P:95 I:1000 C: 13949 Min: 1 Act: 2 Avg: 1 Max: 3
>> T: 5 ( 3172) P:95 I:1000 C: 13947 Min: 1 Act: 2 Avg: 1 Max: 16
>> T: 6 ( 3173) P:95 I:1000 C: 13945 Min: 1 Act: 2 Avg: 2 Max: 7
>> T: 7 ( 3174) P:95 I:1000 C: 13942 Min: 1 Act: 2 Avg: 1 Max: 3
>> T: 8 ( 3175) P:95 I:1000 C: 13940 Min: 1 Act: 2 Avg: 1 Max: 3
>> T: 9 ( 3176) P:95 I:1000 C: 13938 Min: 1 Act: 1 Avg: 1 Max: 3
>> T:10 ( 3177) P:95 I:1000 C: 13936 Min: 1 Act: 2 Avg: 1 Max: 6
>> T:11 ( 3178) P:95 I:1000 C: 13933 Min: 1 Act: 2 Avg: 1 Max: 3
>> T:12 ( 3179) P:95 I:1000 C: 13931 Min: 1 Act: 2 Avg: 1 Max: 4
>> T:13 ( 3180) P:95 I:1000 C: 13929 Min: 1 Act: 2 Avg: 1 Max: 7
>> T:14 ( 3181) P:95 I:1000 C: 13927 Min: 1 Act: 2 Avg: 1 Max: 6
>> T:15 ( 3182) P:95 I:1000 C: 13925 Min: 1 Act: 2 Avg: 1 Max: 4
>> T:16 ( 3183) P:95 I:1000 C: 13923 Min: 1 Act: 2 Avg: 1 Max: 5
>> T:17 ( 3184) P:95 I:1000 C: 13921 Min: 1 Act: 2 Avg: 1 Max: 5
>> T:18 ( 3185) P:95 I:1000 C: 13919 Min: 1 Act: 2 Avg: 1 Max: 4
>> T:19 ( 3186) P:95 I:1000 C: 13916 Min: 1 Act: 2 Avg: 1 Max: 4
>> T:20 ( 3187) P:95 I:1000 C: 13914 Min: 1 Act: 2 Avg: 1 Max: 4
>> T:21 ( 3188) P:95 I:1000 C: 13912 Min: 1 Act: 2 Avg: 1 Max: 10
>> T:22 ( 3189) P:95 I:1000 C: 13910 Min: 1 Act: 2 Avg: 1 Max: 5
>> T:23 ( 3190) P:95 I:1000 C: 13908 Min: 1 Act: 2 Avg: 1 Max: 5
>> T:24 ( 3191) P:95 I:1000 C: 13906 Min: 1 Act: 2 Avg: 1 Max: 18
>> T:25 ( 3192) P:95 I:1000 C: 13904 Min: 1 Act: 2 Avg: 1 Max: 7
>> T:26 ( 3193) P:95 I:1000 C: 13902 Min: 1 Act: 2 Avg: 1 Max: 5
>> T:27 ( 3194) P:95 I:1000 C: 13900 Min: 1 Act: 1 Avg: 1 Max: 11
>> T:28 ( 3195) P:95 I:1000 C: 13898 Min: 1 Act: 2 Avg: 1 Max: 3
>> T:29 ( 3196) P:95 I:1000 C: 13896 Min: 1 Act: 2 Avg: 1 Max: 5
>> T:30 ( 3197) P:95 I:1000 C: 13894 Min: 1 Act: 2 Avg: 1 Max: 4
>> T:31 ( 3198) P:95 I:1000 C: 13892 Min: 1 Act: 2 Avg: 1 Max: 3
>>
>> we can see the latency will not greater than 100us,so 1ms latency is also too long for CONFIG_PREEMPT=y or CONFIG_PREEMPT_RT=y.
>> use migrate_disable()/migrate_enable() instead of preempt_disable()/preempt_enable() will not affect the speed of different RAID algorithms and the latency can be reduced to a reasonable range.
>
> I think allowing preempt may still affect the speed comparison. But
> such discrepancy should be acceptable. I will apply this to md-next.

Could the commit message please be extended, how to reproduce this? No
idea, where to find `cyclictest` for example. Was `initcall_debug` used
to measure the execution time of the init method?

Lastly, only one Fixes: tag is added, but the mentioned one only added
one of the changed `preempt_enabled()`/`preempt_disable()`. Should all
be listed?

The commit message could also say something like:

> Reduce high latency by using migrate instead of preempt


Kind regards,

Paul

2021-12-16 08:27:35

by Yajun Deng

[permalink] [raw]
Subject: Re: [PATCH] lib/raid6: fix abnormally high latency

December 16, 2021 4:09 PM, "Paul Menzel" <[email protected]> wrote:

> Dear Song, dear Yajun,
>
> Am 16.12.21 um 07:39 schrieb Song Liu:
>
>> On Wed, Dec 15, 2021 at 6:15 PM <[email protected]> wrote:
>>> December 16, 2021 12:52 AM, "Song Liu" <[email protected]> wrote:
>>
>> On Tue, Dec 14, 2021 at 6:14 PM <[email protected]> wrote:
>>
>> December 15, 2021 1:27 AM, "Song Liu" <[email protected]> wrote:
>>
>> On Mon, Dec 13, 2021 at 7:17 PM Yajun Deng <[email protected]> wrote:
>>
>> We found an abnormally high latency when executing modprobe raid6_pq, the
>> latency is greater than 1.2s when CONFIG_PREEMPT_VOLUNTARY=y, greater than
>> 67ms when CONFIG_PREEMPT=y, and greater than 16ms when CONFIG_PREEMPT_RT=y.
>> This is caused by disable the preemption, this time is too long and
>> unreasonable. We just need to disable migration. so used migrate_disable()/
>> migrate_enable() instead of preempt_disable()/preempt_enable(). This is
>> beneficial for CONFIG_PREEMPT=y or CONFIG_PREEMPT_RT=y, but no effect for
>> CONFIG_PREEMPT_VOLUNTARY=y.
>>
>> Fixes: fe5cbc6e06c7 ("md/raid6 algorithms: delta syndrome functions")
>> Signed-off-by: Yajun Deng <[email protected]>
>>
>> We measure the speed of different RAID algorithms.If we don't disable
>> preempt, the result may be inaccurate, right? IIUC, we only disable preempt
>> for 16 jiffies. Why do we see 1.2 second delay?
>>
>> Here are the command of my test:
>> Execute "sudo cyclictest -S -p 95 -d 0 -i 1000 -D 24h -m" in one terminal and "sudo modprobe
>> raid6_pq" in the other terminal.
>>
>> Here are the results of my test:
>> CONFIG_PREEMPT_VOLUNTARY=y,CONFIG_HZ_250=y
>> T: 0 ( 3092) P:95 I:1000 C: 8514 Min: 1 Act: 2 Avg: 1 Max: 6
>> T: 1 ( 3093) P:95 I:1000 C: 8511 Min: 1 Act: 2 Avg: 1 Max: 14
>>
>> I am not very familiar with the RT work, so please forgive me for some
>> rookie questions.
>>
>> From the result, I think the CONFIG_PREEMPT_VOLUNTARY=y and the
>> CONFIG_PREEMPT=y cases failed to preempt during the preempt enabled period in
>> raid6_choose_gen(). Is this expected?
>>
>>> No, This is due to disable preemption causing ksoftirqd fail to schedule, we can use bcc tools see
>>> that.
>>
>> OTOH, the 16ms latency with CONFIG_PREEMPT_RT=y is more or less expected.
>> Is this acceptable? If not, is 1ms latency acceptable?
>>
>>> Here are the test results after adding patch:
>>> CONFIG_PREEMPT=y or CONFIG_PREEMPT_RT=y
>>> T: 0 ( 3167) P:95 I:1000 C: 13958 Min: 1 Act: 2 Avg: 1 Max: 5
>>> T: 1 ( 3168) P:95 I:1000 C: 13956 Min: 1 Act: 2 Avg: 1 Max: 7
>>> T: 2 ( 3169) P:95 I:1000 C: 13946 Min: 1 Act: 2 Avg: 1 Max: 12
>>> T: 3 ( 3170) P:95 I:1000 C: 13951 Min: 1 Act: 2 Avg: 1 Max: 5
>>> T: 4 ( 3171) P:95 I:1000 C: 13949 Min: 1 Act: 2 Avg: 1 Max: 3
>>> T: 5 ( 3172) P:95 I:1000 C: 13947 Min: 1 Act: 2 Avg: 1 Max: 16
>>> T: 6 ( 3173) P:95 I:1000 C: 13945 Min: 1 Act: 2 Avg: 2 Max: 7
>>> T: 7 ( 3174) P:95 I:1000 C: 13942 Min: 1 Act: 2 Avg: 1 Max: 3
>>> T: 8 ( 3175) P:95 I:1000 C: 13940 Min: 1 Act: 2 Avg: 1 Max: 3
>>> T: 9 ( 3176) P:95 I:1000 C: 13938 Min: 1 Act: 1 Avg: 1 Max: 3
>>> T:10 ( 3177) P:95 I:1000 C: 13936 Min: 1 Act: 2 Avg: 1 Max: 6
>>> T:11 ( 3178) P:95 I:1000 C: 13933 Min: 1 Act: 2 Avg: 1 Max: 3
>>> T:12 ( 3179) P:95 I:1000 C: 13931 Min: 1 Act: 2 Avg: 1 Max: 4
>>> T:13 ( 3180) P:95 I:1000 C: 13929 Min: 1 Act: 2 Avg: 1 Max: 7
>>> T:14 ( 3181) P:95 I:1000 C: 13927 Min: 1 Act: 2 Avg: 1 Max: 6
>>> T:15 ( 3182) P:95 I:1000 C: 13925 Min: 1 Act: 2 Avg: 1 Max: 4
>>> T:16 ( 3183) P:95 I:1000 C: 13923 Min: 1 Act: 2 Avg: 1 Max: 5
>>> T:17 ( 3184) P:95 I:1000 C: 13921 Min: 1 Act: 2 Avg: 1 Max: 5
>>> T:18 ( 3185) P:95 I:1000 C: 13919 Min: 1 Act: 2 Avg: 1 Max: 4
>>> T:19 ( 3186) P:95 I:1000 C: 13916 Min: 1 Act: 2 Avg: 1 Max: 4
>>> T:20 ( 3187) P:95 I:1000 C: 13914 Min: 1 Act: 2 Avg: 1 Max: 4
>>> T:21 ( 3188) P:95 I:1000 C: 13912 Min: 1 Act: 2 Avg: 1 Max: 10
>>> T:22 ( 3189) P:95 I:1000 C: 13910 Min: 1 Act: 2 Avg: 1 Max: 5
>>> T:23 ( 3190) P:95 I:1000 C: 13908 Min: 1 Act: 2 Avg: 1 Max: 5
>>> T:24 ( 3191) P:95 I:1000 C: 13906 Min: 1 Act: 2 Avg: 1 Max: 18
>>> T:25 ( 3192) P:95 I:1000 C: 13904 Min: 1 Act: 2 Avg: 1 Max: 7
>>> T:26 ( 3193) P:95 I:1000 C: 13902 Min: 1 Act: 2 Avg: 1 Max: 5
>>> T:27 ( 3194) P:95 I:1000 C: 13900 Min: 1 Act: 1 Avg: 1 Max: 11
>>> T:28 ( 3195) P:95 I:1000 C: 13898 Min: 1 Act: 2 Avg: 1 Max: 3
>>> T:29 ( 3196) P:95 I:1000 C: 13896 Min: 1 Act: 2 Avg: 1 Max: 5
>>> T:30 ( 3197) P:95 I:1000 C: 13894 Min: 1 Act: 2 Avg: 1 Max: 4
>>> T:31 ( 3198) P:95 I:1000 C: 13892 Min: 1 Act: 2 Avg: 1 Max: 3
>>>
>>> we can see the latency will not greater than 100us,so 1ms latency is also too long for
>>> CONFIG_PREEMPT=y or CONFIG_PREEMPT_RT=y.
>>> use migrate_disable()/migrate_enable() instead of preempt_disable()/preempt_enable() will not
>>> affect the speed of different RAID algorithms and the latency can be reduced to a reasonable range.
>>
>> I think allowing preempt may still affect the speed comparison. But
>> such discrepancy should be acceptable. I will apply this to md-next.
>
> Could the commit message please be extended, how to reproduce this? No idea, where to find
> `cyclictest` for example. Was `initcall_debug` used to measure the execution time of the init
> method?
>
> Lastly, only one Fixes: tag is added, but the mentioned one only added one of the changed
> `preempt_enabled()`/`preempt_disable()`. Should all be listed?
>
> The commit message could also say something like:
>
>> Reduce high latency by using migrate instead of preempt
>
> Kind regards,
>
> Paul

Thank you for your suggestion, I will submit another patch, extending comment message.

2021-12-17 21:51:52

by Daniel Vacek

[permalink] [raw]
Subject: Re: [PATCH] lib/raid6: fix abnormally high latency

On Thu, Dec 16, 2021 at 1:19 PM Song Liu <[email protected]> wrote:
> I think allowing preempt may still affect the speed comparison. But
> such discrepancy
> should be acceptable. I will apply this to md-next.

No, please. This could eventually result with people starting to
report 'sometimes the storage does not perform' or 'random high CPU
usage on some boots' kind of issues.

We should not touch this. See my other email with detailed explanation.

--nX

> Thanks,
> Song