Date: Wed, 14 Jun 2017 13:45:30 +0900
From: Joonsoo Kim
To: Laura Abbott
Cc: Christoph Lameter, Pekka Enberg, David Rientjes, Andrew Morton, linux-mm@kvack.org, linux-kernel@vger.kernel.org, Kees Cook
Subject: Re: [RFC][PATCH] slub: Introduce 'alternate' per cpu partial lists
Message-ID: <20170614044528.GA5924@js1304-desktop>
In-Reply-To: <1496965984-21962-1-git-send-email-labbott@redhat.com>

On Thu, Jun 08, 2017 at 04:53:04PM -0700, Laura Abbott wrote:
> SLUB debugging features (poisoning, red zoning etc.) skip the fast path
> completely. This ensures there is a single place to do all checks and
> take any locks that may be necessary for debugging. The overhead of some
> of the debugging features (e.g. poisoning) ends up being comparatively
> small vs the overhead of not using the fast path.
>
> We don't want to impose any kind of overhead on the fast path so
> introduce the notion of an alternate fast path. This is essentially the
> same idea as the existing fast path (store partially used pages on the
> per-cpu list) but it happens after the real fast path. Debugging that
> doesn't require locks (poisoning/red zoning) can happen on this path to
> avoid the penalty of always needing to go for the slow path.
>
> Signed-off-by: Laura Abbott
> ---
> This is a follow up to my previous proposal to speed up slub_debug=P
> https://marc.info/?l=linux-mm&m=145920558822958&w=2 . The current approach
> is hopelessly slow and can't really be used outside of limited debugging.
> The goal is to make slub_debug=P more usable for general use.
>
> Joonsoo Kim pointed out that my previous attempt still wouldn't scale
> as it still involved taking the list_lock for every allocation. He suggested
> adding per-cpu support, as did Christoph Lameter in a separate thread. This
> proposal adds a separate per-cpu list for use when poisoning is enabled.
> For this version, I'm mostly looking for general feedback about how reasonable
> this approach is before trying to clean it up more.
>
> - Some of this code is redundant and can probably be combined.
> - The fast path is very sensitive and it was suggested I leave it alone. The
>   approach I took means the fastpath cmpxchg always fails before trying the
>   alternate cmpxchg. From some of my profiling, the cmpxchg seemed to be fairly
>   expensive.

It looks better to modify the fastpath for non-debugging poisoning. If we use
a jump label, it doesn't cause any overhead in the fastpath for users who
don't use this feature. It really makes things simpler. Only a few more lines
will be needed in the fastpath.

Christoph, any opinion?

Thanks.
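
To make the jump-label suggestion above concrete, here is a minimal userspace
sketch. It is not SLUB code and not taken from the patch. In the kernel the
switch would presumably be a static key (DEFINE_STATIC_KEY_FALSE(),
static_branch_unlikely(), static_branch_enable()), which patches the branch
out of the instruction stream while the feature is off. The sketch stands in
for that with a plain bool and __builtin_expect() (GCC/Clang), so it only
approximates the cost model. The names poison_enabled,
poison_branch_unlikely(), struct object, fast_free() and fast_alloc() are
hypothetical; the freelist is a simple single-threaded list, not the per-cpu
cmpxchg fastpath.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define POISON_FREE 0x6b	/* byte the kernel writes into freed objects */

/* Hypothetical stand-in for a static key; off by default. */
static bool poison_enabled;

/* Userspace approximation of static_branch_unlikely(): only a branch hint. */
#define poison_branch_unlikely() __builtin_expect(poison_enabled, 0)

struct object {
	struct object *next;		/* freelist link */
	unsigned char payload[24];	/* user-visible data */
};

static struct object *freelist;	/* single-threaded toy freelist */

/* Fast-path free: the poisoning costs "only a few more lines". */
static void fast_free(struct object *obj)
{
	assert(obj != NULL);

	if (poison_branch_unlikely())
		memset(obj->payload, POISON_FREE, sizeof(obj->payload));

	obj->next = freelist;
	freelist = obj;
}

/* Fast-path alloc: pop the head of the freelist, or NULL if it is empty. */
static struct object *fast_alloc(void)
{
	struct object *obj = freelist;

	if (!obj)
		return NULL;

	freelist = obj->next;
	return obj;
}
```

With poison_enabled left false, fast_free() leaves the payload untouched and
the only extra work is one predictable branch. With a real static key that
branch would be a patched-out no-op, which is why the approach adds no
overhead for users who keep poisoning off.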