by NeilBrown

Subject: Re: [PATCH v3 5/9] SUNRPC: Count pool threads that were awoken but found no work to do

2023-07-19 13:18:02

by Chuck Lever

Subject: Re: [PATCH v3 5/9] SUNRPC: Count pool threads that were awoken but found no work to do

> On Jul 18, 2023, at 8:39 PM, NeilBrown <[email protected]> wrote:
>
> On Tue, 11 Jul 2023, Chuck Lever III wrote:
>>
>>> On Jul 10, 2023, at 6:29 PM, NeilBrown <[email protected]> wrote:
>>>
>>> On Tue, 11 Jul 2023, Chuck Lever wrote:
>>>> From: Chuck Lever <[email protected]>
>>>>
>>>> Measure a source of thread scheduling inefficiency -- count threads
>>>> that were awoken but found that the transport queue had already been
>>>> emptied.
>>>>
>>>> An empty transport queue is possible when threads that run between
>>>> the wake_up_process() call and the woken thread returning from the
>>>> scheduler have pulled all remaining work off the transport queue
>>>> using the first svc_xprt_dequeue() in svc_get_next_xprt().
>>>
>>> I'm in two minds about this. The data being gathered here is
>>> potentially useful
>>
>> It's actually pretty shocking: I've measured more than
>> 15% of thread wake-ups find no work to do.
>
> I'm now wondering if that is a reliable statistic.
>
> This counter as implemented doesn't count "pool threads that were awoken
> but found no work to do". Rather, it counts "pool threads that found no
> work to do, either after having been awoken, or having just completed
> some other request".

In the current code, the only way to get to "return -EAGAIN;" is if the
thread calls schedule_timeout() (i.e., it actually sleeps), then the
svc_rqst was specifically selected and awoken, and the schedule_timeout()
did not time out.

I don't see a problem.
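
To make the race from the patch description concrete, here is a toy userspace model. It is only a sketch, not the kernel code: ToyPool and its methods are hypothetical names, and the scheduling order is scripted by hand. It shows a thread that was selected and woken finding the queue already drained by a thread that was still running.

```python
from collections import deque


class ToyPool:
    """Hypothetical stand-in for a thread pool with one shared work queue."""

    def __init__(self, nthreads):
        self.queue = deque()                   # pending work items
        self.sleeping = list(range(nthreads))  # ids of idle threads
        self.woken = deque()                   # woken, but not yet running
        self.wakeups = 0                       # threads selected and woken
        self.starved = 0                       # woken threads that found no work

    def enqueue(self, item):
        """Queue an item and wake one sleeping thread, if there is one."""
        self.queue.append(item)
        if self.sleeping:
            self.woken.append(self.sleeping.pop())
            self.wakeups += 1

    def busy_thread_dequeue(self):
        """A thread that is already running takes the next item, if any."""
        return self.queue.popleft() if self.queue else None

    def run_woken(self):
        """The woken thread finally runs and looks for work."""
        tid = self.woken.popleft()
        if self.queue:
            return tid, self.queue.popleft()
        self.starved += 1          # the "awoken but found no work" case
        self.sleeping.append(tid)  # nothing to do, go back to sleep
        return tid, None


def demo():
    pool = ToyPool(nthreads=2)
    pool.enqueue("req-A")        # wakes one thread
    pool.run_woken()             # that thread picks up req-A and is now busy
    pool.enqueue("req-B")        # wakes the other thread
    pool.busy_thread_dequeue()   # the busy thread finishes and takes req-B first
    pool.run_woken()             # the second thread wakes to an empty queue
    return pool.wakeups, pool.starved
```

In this scripted run, demo() counts two wake-ups, one of which finds nothing to do. The ratio is an artifact of the script and says nothing about the 15% measured above.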

> And it doesn't even really count that, as it doesn't notice the lockd
> "retry blocked" work (or the NFSv4.1 callback work, but we don't count
> that at all I think).
>
> Maybe we should only update the count if we had actually been woken up
> recently.

So this one can be dropped for now since it's currently of value only
for working on the scheduler changes. If the thread scheduler were to
change such that a work item was actually assigned to a thread before
it is awoken (as in a work queue model) then this counter would be
truly meaningless. I think we can wait for a bit.

--
Chuck Lever

2023-07-19 23:50:31

by NeilBrown

Subject: Re: [PATCH v3 5/9] SUNRPC: Count pool threads that were awoken but found no work to do

On Wed, 19 Jul 2023, Chuck Lever III wrote:
>
> > On Jul 18, 2023, at 8:39 PM, NeilBrown <[email protected]> wrote:
> >
> > On Tue, 11 Jul 2023, Chuck Lever III wrote:
> >>
> >>> On Jul 10, 2023, at 6:29 PM, NeilBrown <[email protected]> wrote:
> >>>
> >>> On Tue, 11 Jul 2023, Chuck Lever wrote:
> >>>> From: Chuck Lever <[email protected]>
> >>>>
> >>>> Measure a source of thread scheduling inefficiency -- count threads
> >>>> that were awoken but found that the transport queue had already been
> >>>> emptied.
> >>>>
> >>>> An empty transport queue is possible when threads that run between
> >>>> the wake_up_process() call and the woken thread returning from the
> >>>> scheduler have pulled all remaining work off the transport queue
> >>>> using the first svc_xprt_dequeue() in svc_get_next_xprt().
> >>>
> >>> I'm in two minds about this. The data being gathered here is
> >>> potentially useful
> >>
> >> It's actually pretty shocking: I've measured more than
> >> 15% of thread wake-ups find no work to do.
> >
> > I'm now wondering if that is a reliable statistic.
> >
> > This counter as implemented doesn't count "pool threads that were awoken
> > but found no work to do". Rather, it counts "pool threads that found no
> > work to do, either after having been awoken, or having just completed
> > some other request".
>
> In the current code, the only way to get to "return -EAGAIN;" is if the
> thread calls schedule_timeout() (i.e., it actually sleeps), then the
> svc_rqst was specifically selected and awoken, and the schedule_timeout()
> did not time out.
>
> I don't see a problem.
>

Yeah - I don't either any more. Sorry for the noise.

>
> > And it doesn't even really count that, as it doesn't notice the lockd
> > "retry blocked" work (or the NFSv4.1 callback work, but we don't count
> > that at all I think).
> >
> > Maybe we should only update the count if we had actually been woken up
> > recently.
>
> So this one can be dropped for now since it's currently of value only
> for working on the scheduler changes. If the thread scheduler were to
> change such that a work item was actually assigned to a thread before
> it is awoken (as in a work queue model) then this counter would be
> truly meaningless. I think we can wait for a bit.
>

We used to assign a work item to a thread before it was woken. I find
that model to be aesthetically pleasing.
Trond changed that in
commit 22700f3c6df5 ("SUNRPC: Improve ordering of transport processing"),
claiming that the wake-up time for a sleeping thread could result in
poorer throughput. No data was given, but I find the reasoning quite
credible.
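
The trade-off between the two models can be sketched with a toy calculation. Everything here is an assumption made for illustration: the function names, the Outcome type, and the time values are hypothetical and are not taken from the kernel or from that commit.

```python
from dataclasses import dataclass


@dataclass
class Outcome:
    start_time: int      # when processing of the item begins
    wasted_wakeup: bool  # True if the woken thread found nothing to do


def assigned_before_wake(enqueue_t, wake_latency):
    """The item is handed to a sleeping thread, then that thread is woken.

    The woken thread always has work, but the item waits out the full
    wake-up latency even if another thread becomes free sooner.
    """
    return Outcome(enqueue_t + wake_latency, False)


def shared_queue(enqueue_t, wake_latency, busy_free_at):
    """The item sits on a shared queue; whichever thread is ready first takes it."""
    woken_ready = enqueue_t + wake_latency
    busy_ready = max(enqueue_t, busy_free_at)
    if busy_ready < woken_ready:
        # The already-running thread wins, so the woken thread finds no work.
        return Outcome(busy_ready, True)
    return Outcome(woken_ready, False)
```

With a made-up wake latency of 50 time units and a busy thread that frees up at t=10, assigned_before_wake(0, 50) starts the item at t=50 with no wasted wake-up, while shared_queue(0, 50, 10) starts it at t=10 but leaves the woken thread with nothing to do.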

Thanks,
NeilBrown

>
> --
> Chuck Lever
>
>
>