Currently we cap the batch count at max(32, 2*nr_online_cpus), which these
days is kind of silly as systems have gotten much bigger than in 2009 when
this heuristic was introduced.
Bump it to capping it at 256 instead. This has a noticeable improvement
for certain io_uring workloads, as io_uring tracks per-task inflight count
using percpu counters.
Signed-off-by: Jens Axboe <[email protected]>
---
diff --git a/lib/percpu_counter.c b/lib/percpu_counter.c
index 00f666d94486..c3a9af5462ba 100644
--- a/lib/percpu_counter.c
+++ b/lib/percpu_counter.c
@@ -188,7 +188,7 @@ static int compute_batch_value(unsigned int cpu)
{
int nr = num_online_cpus();
- percpu_counter_batch = max(32, nr*2);
+ percpu_counter_batch = max(256, nr*2);
return 0;
}
--
Jens Axboe
On Thu, 18 Feb 2021 14:36:31 -0700 Jens Axboe <[email protected]> wrote:
> Currently we cap the batch count at max(32, 2*nr_online_cpus), which these
> days is kind of silly as systems have gotten much bigger than in 2009 when
> this heuristic was introduced.
>
> Bump it to capping it at 256 instead. This has a noticeable improvement
> for certain io_uring workloads, as io_uring tracks per-task inflight count
> using percpu counters.
>
It will also make percpu_counter_read() and
percpu_counter_read_positive() more inaccurate than at present. Any
effects from this will take a while to discover.
But yes, worth trying - I'll add it to the post-rc1 pile.
On 2/18/21 4:16 PM, Andrew Morton wrote:
> On Thu, 18 Feb 2021 14:36:31 -0700 Jens Axboe <[email protected]> wrote:
>
>> Currently we cap the batch count at max(32, 2*nr_online_cpus), which these
>> days is kind of silly as systems have gotten much bigger than in 2009 when
>> this heuristic was introduced.
>>
>> Bump it to capping it at 256 instead. This has a noticeable improvement
>> for certain io_uring workloads, as io_uring tracks per-task inflight count
>> using percpu counters.
>>
>
> It will also make percpu_counter_read() and
> percpu_counter_read_positive() more inaccurate than at present. Any
> effects from this will take a while to discover.
It will, but the value of 32 is very low, especially when you are potentially
doing millions of these per second. So I do think it should track the times
a bit.
> But yes, worth trying - I'll add it to the post-rc1 pile.
Thanks!
--
Jens Axboe
On Thu, 18 Feb 2021, Jens Axboe wrote:
> On 2/18/21 4:16 PM, Andrew Morton wrote:
> > On Thu, 18 Feb 2021 14:36:31 -0700 Jens Axboe <[email protected]> wrote:
> >
> >> Currently we cap the batch count at max(32, 2*nr_online_cpus), which these
> >> days is kind of silly as systems have gotten much bigger than in 2009 when
> >> this heuristic was introduced.
> >>
> >> Bump it to capping it at 256 instead. This has a noticeable improvement
> >> for certain io_uring workloads, as io_uring tracks per-task inflight count
> >> using percpu counters.
I want to quibble with the word "capping" here, it's misleading -
but I'm sorry I cannot think of the right word.
The macro is max() not min(): you're making an improvement for
certain io_uring workloads on machines with 1 to 15 cpus, right?
Does "bigger than in 2009" apply to those?
Though, io_uring could as well use percpu_counter_add_batch() instead?
(Yeah, this has nothing to do with me really, but I was looking at
percpu_counter_compare() just now, for tmpfs reasons, so took more
interest. Not objecting to a change, but the wording leaves me
wondering if the patch does what you think - or, not for the
first time, I'm confused.)
Hugh
> >>
> >
> > It will also make percpu_counter_read() and
> > percpu_counter_read_positive() more inaccurate than at present. Any
> > effects from this will take a while to discover.
>
> It will, but the value of 32 is very low, especially when you are potentially
> doing millions of these per second. So I do think it should track the times
> a bit.
>
> > But yes, worth trying - I'll add it to the post-rc1 pile.
>
> Thanks!
>
> --
> Jens Axboe
On 2/22/21 2:31 PM, Hugh Dickins wrote:
> On Thu, 18 Feb 2021, Jens Axboe wrote:
>> On 2/18/21 4:16 PM, Andrew Morton wrote:
>>> On Thu, 18 Feb 2021 14:36:31 -0700 Jens Axboe <[email protected]> wrote:
>>>
>>>> Currently we cap the batch count at max(32, 2*nr_online_cpus), which these
>>>> days is kind of silly as systems have gotten much bigger than in 2009 when
>>>> this heuristic was introduced.
>>>>
>>>> Bump it to capping it at 256 instead. This has a noticeable improvement
>>>> for certain io_uring workloads, as io_uring tracks per-task inflight count
>>>> using percpu counters.
>
> I want to quibble with the word "capping" here, it's misleading -
> but I'm sorry I cannot think of the right word.
Agree, it's not the best wording. And if you can't think of a better
one, then I'm at a loss too :-)
> The macro is max() not min(): you're making an improvement for
> certain io_uring workloads on machines with 1 to 15 cpus, right?
> Does "bigger than in 2009" apply to those?
Right, that actually had me confused. The box in question has 64 threads,
so my effective count was 128, or 256 with the patch.
> Though, io_uring could as well use percpu_counter_add_batch() instead?
That might be a simpler/better choice!
> (Yeah, this has nothing to do with me really, but I was looking at
> percpu_counter_compare() just now, for tmpfs reasons, so took more
> interest. Not objecting to a change, but the wording leaves me
> wondering if the patch does what you think - or, not for the
> first time, I'm confused.)
I don't think you're confused, and honestly I think using the batch
version instead would likely improve our situation without potentially
changing behavior for everyone else. So it's likely the right way to go.
Thanks Hugh!
--
Jens Axboe
On Mon, 22 Feb 2021, Jens Axboe wrote:
> On 2/22/21 2:31 PM, Hugh Dickins wrote:
> > On Thu, 18 Feb 2021, Jens Axboe wrote:
> >> On 2/18/21 4:16 PM, Andrew Morton wrote:
> >>> On Thu, 18 Feb 2021 14:36:31 -0700 Jens Axboe <[email protected]> wrote:
> >>>
> >>>> Currently we cap the batch count at max(32, 2*nr_online_cpus), which these
> >>>> days is kind of silly as systems have gotten much bigger than in 2009 when
> >>>> this heuristic was introduced.
> >>>>
> >>>> Bump it to capping it at 256 instead. This has a noticeable improvement
> >>>> for certain io_uring workloads, as io_uring tracks per-task inflight count
> >>>> using percpu counters.
> >
> > I want to quibble with the word "capping" here, it's misleading -
> > but I'm sorry I cannot think of the right word.
>
> Agree, it's not the best wording. And if you can't think of a better
> one, then I'm at a loss too :-)
>
> > The macro is max() not min(): you're making an improvement for
> > certain io_uring workloads on machines with 1 to 15 cpus, right?
> > Does "bigger than in 2009" apply to those?
>
> Right, that actually had me confused. The box in question has 64 threads,
> so my effective count was 128, or 256 with the patch.
Ah, yes, so there I *was* confused in saying "1 to 15",
the improvement was for "1 to 127" of course - thanks.
>
> > Though, io_uring could as well use percpu_counter_add_batch() instead?
>
> That might be a simpler/better choice!
>
> > (Yeah, this has nothing to do with me really, but I was looking at
> > percpu_counter_compare() just now, for tmpfs reasons, so took more
> > interest. Not objecting to a change, but the wording leaves me
> > wondering if the patch does what you think - or, not for the
> > first time, I'm confused.)
>
> I don't think you're confused, and honestly I think using the batch
> version instead would likely improve our situation without potentially
> changing behavior for everyone else. So it's likely the right way to go.
You're too polite! But yes, if percpu_counter_add_batch() suits, great.
>
> Thanks Hugh!
>
> --
> Jens Axboe