Date: Tue, 28 Feb 2023 19:17:05 -0800
From: Jakub Kicinski
To: Thomas Gleixner
Cc: Eric Dumazet, LKML, Linus Torvalds, x86@kernel.org, Wangyang Guo,
 Arjan van De Ven, "David S. Miller", Paolo Abeni,
 netdev@vger.kernel.org, Will Deacon, Peter Zijlstra, Boqun Feng,
 Mark Rutland, Marc Zyngier
Subject: Re: [patch 0/3] net, refcount: Address dst_entry reference count scalability issues
Message-ID: <20230228191705.3bc8bed6@kernel.org>
In-Reply-To: <871qm9mgkb.ffs@tglx>
References: <20230228132118.978145284@linutronix.de>
 <87h6v5n3su.ffs@tglx>
 <871qm9mgkb.ffs@tglx>

FWIW looks good to me, especially the refcount part. We do see 10%
jitter in microbenchmarks due to random cache effects, so forgive the
questioning. But again, the refcount change seems like an obvious win
to my noob eyes.

While I have you, it would be remiss of me not to mention my ksoftirq
change, which makes a large difference in production workloads:
https://lore.kernel.org/all/20221222221244.1290833-3-kuba@kernel.org/

Is Peter's "rework" of softirq going in for 6.3?

On Wed, 01 Mar 2023 02:00:20 +0100 Thomas Gleixner wrote:
> >> We looked at this because the reference count operations stood out in
> >> perf top and we analyzed it down to the false sharing _and_ the
> >> non-scalability of atomic_inc_not_zero().
> >
> > Please share your recipe and perf results.
>
> Sorry for not being explicit enough about this, but I was under the
> impression that explicitly mentioning memcached and memtier would be
> enough of a hint for people familiar with this matter.

I think the disconnect may be that we are indeed familiar with the
workloads, but these exact workloads don't hit the issue in production
(I don't work at Google, but at a similarly large corp). My initial
reaction was also to see if I could find the issue in prod, not to
question the result, but in the hope that I could indeed find a repro
and make this series an easy sell.
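To make the contention mechanism concrete: the non-scalability Thomas
points at comes from the compare-and-exchange loop behind
atomic_inc_not_zero(). Roughly, in userspace C11 terms (a sketch of the
pattern, not the kernel's actual implementation):

  #include <stdatomic.h>
  #include <stdbool.h>

  /* Increment *v unless it is zero; return true if we incremented.
   * Mirrors the semantics of the kernel's atomic_inc_not_zero(). */
  static bool inc_not_zero(atomic_int *v)
  {
          int old = atomic_load_explicit(v, memory_order_relaxed);

          do {
                  if (old == 0)
                          return false;
                  /* The CAS fails and retries whenever another CPU
                   * modified the counter in between; with many CPUs
                   * taking references on one hot dst_entry, the
                   * retries plus cacheline ping-pong dominate. */
          } while (!atomic_compare_exchange_weak(v, &old, old + 1));

          return true;
  }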
> Run memcached with -t $N and memtier_benchmark with -t $M and
> --ratio=1:100 on the same machine. localhost connections obviously
> amplify the problem.
>
> Start with the defaults for $N and $M and increase them. Depending on
> your machine this will tank at some point. But even in reasonably small
> $N, $M scenarios the refcount operations and the resulting false
> sharing fallout become visible in perf top. At some point this becomes
> the dominating issue while the machine still has capacity...
>
> > We must have been very lucky to not see this at Google.
>
> There _is_ a world outside of Google? :)
>
> Seriously. The point is that even if you @google cannot observe this as
> a major issue and it just gives your use case a minimal 0.X% gain, it
> still is contributing to the overall performance, no?
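To make sure I run the same thing, I read the recipe above as roughly
the following (the port, protocol flag, and localhost address are my
assumptions, not from your mail):

  # server: $N worker threads on the default port
  memcached -t $N -p 11211 &

  # client on the same machine: $M threads, 1 SET per 100 GETs
  memtier_benchmark -s 127.0.0.1 -p 11211 -P memcache_text \
          -t $M --ratio=1:100

Then bump $N and $M from the defaults until the refcount ops show up
in perf top, per your description.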