On Tue, Sep 12, 2023 at 4:03 AM Michal Hocko <[email protected]> wrote:
>
> On Mon 11-09-23 10:21:24, Tejun Heo wrote:
> > Hello,
> >
> > On Mon, Sep 11, 2023 at 01:01:25PM -0700, Wei Xu wrote:
> > > Yes, it is the same test (10K contending readers). The kernel change
> > > is to remove stats_user_flush_mutex from mem_cgroup_user_flush_stats()
> > > so that concurrent mem_cgroup_user_flush_stats() requests directly
> > > contend on cgroup_rstat_lock in cgroup_rstat_flush().
> >
> > I don't think it'd be a good idea to twist rstat and other kernel-internal
> > code to accommodate 10K parallel readers.
>
> I didn't mean to suggest optimizing for this specific scenario. I was
> mostly curious whether the pathological case of unbounded high latency
> due to lock dropping is easy to trigger with a huge number of readers.
> It seems it is not, and the mutex might not really be needed as a
> preventive measure.
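For context, the serialization in question looks roughly like the sketch
below. The identifiers match the names mentioned in the thread, but the
body is illustrative rather than the exact upstream code:

#include <linux/memcontrol.h>
#include <linux/mutex.h>

static DEFINE_MUTEX(stats_user_flush_mutex);

static void mem_cgroup_user_flush_stats(struct mem_cgroup *memcg)
{
	/*
	 * Serialize userspace readers so that at most one of them
	 * contends on cgroup_rstat_lock at a time. Wei's test removes
	 * this mutex, so all 10K readers hit the rstat lock directly.
	 */
	mutex_lock(&stats_user_flush_mutex);
	cgroup_rstat_flush(memcg->css.cgroup);
	mutex_unlock(&stats_user_flush_mutex);
}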
>
> > If we want to support that, let's do so
> > explicitly by implementing better batching in the read path.
>
> Well, we need to be able to handle those situations because stat files
> are generally readable and we do not want unrelated workloads to
> influence each other heavily through this path.
I am working on a complete rework of this series based on the feedback
I got from Wei and the discussions here. I think I have something
simpler and more generic that doesn't proliferate the flushing variants
we already have. I am running some tests right now and will share it as
soon as I can.
It should address the high-concurrency use case without adding a lot
of complexity. It basically involves a fast path where we flush only
the needed subtree if there is no contention, and a slow path where we
coalesce all flushing requests and everyone just waits for a single
flush to complete (without spinning on or contending for any locks); a
rough sketch follows below. I am trying to use this generic mechanism
for both userspace reads and in-kernel flushers, and I am making sure
the in-kernel flushers do not regress.
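To make the fast/slow path split concrete, here is a minimal sketch;
the function and mutex names are hypothetical placeholders, not the
actual series:

#include <linux/cgroup.h>
#include <linux/mutex.h>

static DEFINE_MUTEX(stats_flush_mutex);

static void stats_flush(struct cgroup *cgrp)
{
	if (mutex_trylock(&stats_flush_mutex)) {
		/* Fast path: no contention, flush only the needed subtree. */
		cgroup_rstat_flush(cgrp);
		mutex_unlock(&stats_flush_mutex);
		return;
	}

	/*
	 * Slow path: a flush is already in flight. Sleep on the mutex
	 * until it completes instead of piling onto cgroup_rstat_lock,
	 * then return without flushing ourselves. This coalesces all
	 * concurrent requests behind one flush; for the result to be
	 * reusable, the in-flight flush must cover our subtree (e.g. by
	 * flushing from the root whenever there is contention).
	 */
	mutex_lock(&stats_flush_mutex);
	mutex_unlock(&stats_flush_mutex);
}

The key property is that waiters sleep instead of contending on
cgroup_rstat_lock, so 10K concurrent readers cost roughly one flush
plus wakeups.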
>
> [...]
>
> > When you have that many concurrent readers, most of them won't need to
> > actually flush.
>
> Agreed!
> --
> Michal Hocko
> SUSE Labs