Subject: Re: [RFC][PATCH] mm/slub: Skip CPU slab activation when debugging
From: Laura Abbott
To: Joonsoo Kim
Cc: Christoph Lameter, Pekka Enberg, David Rientjes, Andrew Morton, Laura Abbott, linux-mm@kvack.org, linux-kernel@vger.kernel.org, Kees Cook
Date: Fri, 1 Apr 2016 15:15:20 -0700
Message-ID: <56FEF2F8.7010508@redhat.com>
In-Reply-To: <20160401023533.GB13179@js1304-P5Q-DELUXE>
References: <1459205581-4605-1-git-send-email-labbott@fedoraproject.org> <20160401023533.GB13179@js1304-P5Q-DELUXE>
List-ID: linux-kernel.vger.kernel.org

On 03/31/2016 07:35 PM, Joonsoo Kim wrote:
> On Mon, Mar 28, 2016 at 03:53:01PM -0700, Laura Abbott wrote:
>> The per-cpu slab is designed to be the primary path for allocation in
>> SLUB, since it is assumed that allocations will go through the fast
>> path if possible. When debugging is enabled, the fast path is disabled
>> and per-cpu allocations are not used. The current debugging code path
>> still activates the cpu slab for allocations and then immediately
>> deactivates it. This is useless work. When a slab is enabled for
>> debugging, skip cpu slab activation.
>>
>> Signed-off-by: Laura Abbott
>> ---
>> This is a follow-on to the optimization of the debug paths for
>> poisoning. With this I get a ~2 second drop on hackbench -g 20 -l 1000
>> with slub_debug=P, and no noticeable change with slub_debug=- .
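For readers outside mm/, a toy user-space model may help show what the commit message means by the fast path being disabled under debugging. All names and sizes below are made up for illustration; this is not the kernel code:

```c
#include <stdbool.h>
#include <stdlib.h>

/* Toy model, hypothetical names: a "cpu slab" freelist serves the fast
 * path; when debug checking is on, the cache is bypassed so every
 * allocation takes the slow path, where consistency checks would run. */
struct toy_cache {
	void *freelist[64];  /* stands in for the per-cpu slab freelist */
	int   nfree;
	bool  debug;         /* stands in for kmem_cache_debug() */
	long  slow_allocs;   /* slow-path hit counter, for the demo */
};

static void *slow_alloc(struct toy_cache *c)
{
	c->slow_allocs++;    /* this is where debug checks would happen */
	return malloc(32);   /* "allocate from the node" stand-in */
}

void *toy_alloc(struct toy_cache *c)
{
	if (!c->debug && c->nfree > 0)
		return c->freelist[--c->nfree];  /* lockless fast path */
	return slow_alloc(c);                    /* checked slow path */
}
```

With debug on, every call lands in the slow path, which is why activating and then immediately deactivating a cpu slab there is pure overhead.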
>
> I'd like to know the performance difference between slub_debug=P and
> slub_debug=- with this change.
>

With the hackbench benchmark:

slub_debug=-  6.834
slub_debug=P  8.059

so a ~1.2 second difference.

> Although this patch increases hackbench performance, I'm not sure it's
> sufficient for a production system. Concurrent slab allocation requests
> will contend on the node lock in every allocation attempt. So, there
> would be other use-cases where the performance drop due to slub_debug=P
> cannot be accepted even if it is a security feature.
>

Hmmm, I hadn't considered that :-/

> How about allowing the cpu partial list for debug cases?
> It will not hurt the fast path and will make for less contention on the
> node lock.
>

That helps more than this patch! It brings slub_debug=P down to 7.535 with
the same relaxing of the CMPXCHG restrictions (allow the partial lists with
poisoning or redzoning, restrict otherwise). It still seems unfortunate
that deactivate_slab takes up so much of __slab_alloc's time. I'll give
some more thought to trying to skip the CPU slab activation with the cpu
partial list.

> Thanks.
>

Thanks,
Laura
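As an illustration of why the cpu partial list suggestion reduces node lock contention, here is a toy user-space model (hypothetical names, not the SLUB implementation): refilling a per-cpu batch under one lock acquisition amortizes the lock cost across many objects, versus taking the lock once per allocation as the debug slow path does today.

```c
/* Toy model, hypothetical names and batch size, user-space only. */
enum { BATCH = 8 };

struct toy_node { long lock_acquisitions; };

static void take_node_lock(struct toy_node *n) { n->lock_acquisitions++; }

/* Debug slow path today: every allocation grabs the node lock. */
long allocs_per_object_lock(struct toy_node *n, int allocs)
{
	for (int i = 0; i < allocs; i++)
		take_node_lock(n);
	return n->lock_acquisitions;
}

/* With a cpu partial list: one lock acquisition refills BATCH objects. */
long allocs_with_partial_list(struct toy_node *n, int allocs)
{
	int cached = 0;

	for (int i = 0; i < allocs; i++) {
		if (cached == 0) {
			take_node_lock(n);
			cached = BATCH;
		}
		cached--;
	}
	return n->lock_acquisitions;
}
```

For 64 allocations the batched version takes the lock 8 times instead of 64, which matches the intuition that the partial list helps most under concurrent allocation pressure.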