2023-03-09 21:38:07

by Nathan Huckleberry

[permalink] [raw]
Subject: [PATCH] fsverity: Remove WQ_UNBOUND from fsverity read workqueue

WQ_UNBOUND causes significant scheduler latency on ARM64/Android. This
is problematic for latency sensitive workloads like I/O post-processing.

Removing WQ_UNBOUND gives a 96% reduction in fsverity workqueue related
scheduler latency and improves app cold startup times by ~30ms.

This code was tested by running Android app startup benchmarks and
measuring how long the fsverity workqueue spent in the ready queue.

Before
Total workqueue scheduler latency: 553800us
After
Total workqueue scheduler latency: 18962us

Change-Id: I693efee541757851ed6d229430111cd763d39067
Signed-off-by: Nathan Huckleberry <[email protected]>
---
fs/verity/verify.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/verity/verify.c b/fs/verity/verify.c
index f50e3b5b52c9..e8ec37774d63 100644
--- a/fs/verity/verify.c
+++ b/fs/verity/verify.c
@@ -395,7 +395,7 @@ int __init fsverity_init_workqueue(void)
* which blocks reads from completing, over regular application tasks.
*/
fsverity_read_workqueue = alloc_workqueue("fsverity_read_queue",
- WQ_UNBOUND | WQ_HIGHPRI,
+ WQ_HIGHPRI,
num_online_cpus());
if (!fsverity_read_workqueue)
return -ENOMEM;
--
2.40.0.rc1.284.g88254d51c5-goog



2023-03-10 05:12:10

by Eric Biggers

[permalink] [raw]
Subject: Re: [PATCH] fsverity: Remove WQ_UNBOUND from fsverity read workqueue

On Thu, Mar 09, 2023 at 01:37:41PM -0800, Nathan Huckleberry wrote:
> WQ_UNBOUND causes significant scheduler latency on ARM64/Android. This
> is problematic for latency sensitive workloads like I/O post-processing.
>
> Removing WQ_UNBOUND gives a 96% reduction in fsverity workqueue related
> scheduler latency and improves app cold startup times by ~30ms.

Maybe mention that WQ_UNBOUND was recently removed from the dm-verity workqueue
too, for the same reason?

I'm still amazed that it's such a big improvement! I don't really need it to
apply this patch, but it would be very interesting to know exactly why the
latency is so bad with WQ_UNBOUND.

>
> This code was tested by running Android app startup benchmarks and
> measuring how long the fsverity workqueue spent in the ready queue.
>
> Before
> Total workqueue scheduler latency: 553800us
> After
> Total workqueue scheduler latency: 18962us
>
> Change-Id: I693efee541757851ed6d229430111cd763d39067

No Change-Id in upstream patches, please.

> diff --git a/fs/verity/verify.c b/fs/verity/verify.c
> index f50e3b5b52c9..e8ec37774d63 100644
> --- a/fs/verity/verify.c
> +++ b/fs/verity/verify.c
> @@ -395,7 +395,7 @@ int __init fsverity_init_workqueue(void)
> * which blocks reads from completing, over regular application tasks.
> */
> fsverity_read_workqueue = alloc_workqueue("fsverity_read_queue",
> - WQ_UNBOUND | WQ_HIGHPRI,
> + WQ_HIGHPRI,
> num_online_cpus());

There's a comment just above here that explains why WQ_UNBOUND is being used.
It needs to be updated to explain why WQ_UNBOUND is *not* being used.

- Eric

2023-03-10 19:10:52

by Nathan Huckleberry

[permalink] [raw]
Subject: Re: [PATCH] fsverity: Remove WQ_UNBOUND from fsverity read workqueue

On Fri, Mar 10, 2023 at 12:01 AM Hillf Danton <[email protected]> wrote:
>
> On 9 Mar 2023 21:11:47 -0800 Eric Biggers <[email protected]>
> > On Thu, Mar 09, 2023 at 01:37:41PM -0800, Nathan Huckleberry wrote:
> > > WQ_UNBOUND causes significant scheduler latency on ARM64/Android. This
> > > is problematic for latency sensitive workloads like I/O post-processing.
> > >
> > > Removing WQ_UNBOUND gives a 96% reduction in fsverity workqueue related
> > > scheduler latency and improves app cold startup times by ~30ms.
> >
> > Maybe mention that WQ_UNBOUND was recently removed from the dm-verity workqueue
> > too, for the same reason?
> >
> > I'm still amazed that it's such a big improvement! I don't really need it to
> > apply this patch, but it would be very interesting to know exactly why the
> > latency is so bad with WQ_UNBOUND.

My current guess for the root cause is excessing saving/restoring of
the FPSIMD state.

> >
> > > This code was tested by running Android app startup benchmarks and
> > > measuring how long the fsverity workqueue spent in the ready queue.
> > >
> > > Before
> > > Total workqueue scheduler latency: 553800us
> > > After
> > > Total workqueue scheduler latency: 18962us
>
> Given the gap between data above and the 15253 us in diagram[1], and
> the SHA instructions[2], could you specify a bit on your test?

The test I'm running opens the Android messaging APK which is
validated with fsverity. It opens the messaging app 25 times, dropping
caches each time. The benchmark produces a Perfetto trace which we use
to compute the scheduler latency. We sum up the amount of time that
each fsverity worker spent in the ready state. The test in [1] is
similar, but may be using a different APK. These tests are not in
AOSP, so I can't share a link to them, but I would expect that fio on
a ramdisk would produce similarly good results.

>
> [1] https://lore.kernel.org/linux-erofs/[email protected]/
> [2] https://lore.kernel.org/lkml/CAJkfWY490-m6wNubkxiTPsW59sfsQs37Wey279LmiRxKt7aQYg@mail.gmail.com/