From: js1304@gmail.com
X-Google-Original-From: iamjoonsoo.kim@lge.com
To: Andrew Morton
Cc: Christoph Lameter, Pekka Enberg, David Rientjes, Jesper Dangaard Brouer,
    linux-mm@kvack.org, linux-kernel@vger.kernel.org, Joonsoo Kim
Subject: [PATCH v2 00/11] mm/slab: reduce lock contention in alloc path
Date: Tue, 12 Apr 2016 13:50:55 +0900
Message-Id: <1460436666-20462-1-git-send-email-iamjoonsoo.kim@lge.com>
X-Mailer: git-send-email 1.9.1

From: Joonsoo Kim

Major changes from v1
o hold node lock instead of slab_mutex in kmem_cache_shrink()
o fix suspend-to-ram issue reported by Nishanth
o use synchronize_sched() instead of kick_all_cpus_sync()

While processing concurrent allocations, SLAB can be contended heavily
because it does a lot of work while holding a lock. This patchset tries
to shrink the critical sections in order to reduce lock contention. The
major changes are a lockless decision to allocate more slabs and a
lockless cpu cache refill from the newly allocated slab. (A rough sketch
of this pattern is appended at the end of this mail.)

Below are the results of concurrent allocation/free in the slab
allocation benchmark that Christoph made a long time ago. I simplified
the output. The numbers are the cycle counts for alloc/free
respectively, so lower is better.

* Before
Kmalloc N*alloc N*free(32): Average=365/806
Kmalloc N*alloc N*free(64): Average=452/690
Kmalloc N*alloc N*free(128): Average=736/886
Kmalloc N*alloc N*free(256): Average=1167/985
Kmalloc N*alloc N*free(512): Average=2088/1125
Kmalloc N*alloc N*free(1024): Average=4115/1184
Kmalloc N*alloc N*free(2048): Average=8451/1748
Kmalloc N*alloc N*free(4096): Average=16024/2048

* After
Kmalloc N*alloc N*free(32): Average=344/792
Kmalloc N*alloc N*free(64): Average=347/882
Kmalloc N*alloc N*free(128): Average=390/959
Kmalloc N*alloc N*free(256): Average=393/1067
Kmalloc N*alloc N*free(512): Average=683/1229
Kmalloc N*alloc N*free(1024): Average=1295/1325
Kmalloc N*alloc N*free(2048): Average=2513/1664
Kmalloc N*alloc N*free(4096): Average=4742/2172

The results show that allocation performance improves greatly (roughly
more than 50%) for object classes larger than 128 bytes; for example, at
256 bytes the alloc cost drops from 1167 to 393 cycles, and at 4096
bytes from 16024 to 4742.

Joonsoo Kim (11):
  mm/slab: fix the theoretical race by holding proper lock
  mm/slab: remove BAD_ALIEN_MAGIC again
  mm/slab: drain the free slab as much as possible
  mm/slab: factor out kmem_cache_node initialization code
  mm/slab: clean-up kmem_cache_node setup
  mm/slab: don't keep free slabs if free_objects exceeds free_limit
  mm/slab: racy access/modify the slab color
  mm/slab: make cache_grow() handle the page allocated on arbitrary node
  mm/slab: separate cache_grow() to two parts
  mm/slab: refill cpu cache through a new slab without holding a node lock
  mm/slab: lockless decision to grow cache

 mm/slab.c | 562 +++++++++++++++++++++++++++++++++-----------------------------
 1 file changed, 295 insertions(+), 267 deletions(-)

-- 
1.9.1
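
Appended sketch: the idea behind the "lockless decision to grow" and
"lockless refill" changes is to do the expensive grow work without
holding the per-node list lock and to take the lock only to publish the
result. Below is a minimal userspace C illustration of that general
pattern; it is not the actual mm/slab.c code, and struct block,
grow_locked() and grow_lockless() are made-up names used only for the
example.

#include <pthread.h>
#include <stdlib.h>

struct block {
	struct block *next;
	char payload[256];
};

static pthread_spinlock_t list_lock;
static struct block *free_list;

/* Old pattern: the whole grow path, including the allocation itself,
 * runs inside the critical section. */
static struct block *grow_locked(void)
{
	struct block *b;

	pthread_spin_lock(&list_lock);
	b = malloc(sizeof(*b));		/* slow work done under the lock */
	if (b) {
		b->next = free_list;
		free_list = b;
	}
	pthread_spin_unlock(&list_lock);
	return b;
}

/* New pattern: allocate without the lock, then take the lock only long
 * enough to link the new block into the shared list. */
static struct block *grow_lockless(void)
{
	struct block *b = malloc(sizeof(*b));	/* no lock held here */

	if (!b)
		return NULL;
	pthread_spin_lock(&list_lock);
	b->next = free_list;
	free_list = b;
	pthread_spin_unlock(&list_lock);
	return b;
}

int main(void)
{
	pthread_spin_init(&list_lock, PTHREAD_PROCESS_PRIVATE);
	grow_locked();
	grow_lockless();
	pthread_spin_destroy(&list_lock);
	return 0;
}

Build with something like "gcc -O2 sketch.c -lpthread". The shorter the
critical section, the less the lock is contended when many threads hit
the grow path at once, which is likely why the improvement in the
benchmark above grows with object size: larger objects mean fewer
objects per slab, so a new slab has to be allocated more often.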
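
On the v1 -> v2 change from kick_all_cpus_sync() to synchronize_sched():
both wait until every CPU has passed a synchronization point after an
update, but kick_all_cpus_sync() does it by sending an IPI to every
online CPU, while synchronize_sched() waits for an RCU-sched grace
period, i.e. for every preemption-disabled region that was running when
the call was made to finish. A rough kernel-style fragment (not taken
from the patch; the surrounding update and field name are hypothetical)
showing the shape of such a switch:

	/* publish the updated cache state (hypothetical field) */
	WRITE_ONCE(cachep->hypothetical_state, new_state);

	/*
	 * v1 used kick_all_cpus_sync(): IPI every online CPU and wait for
	 * each of them to run an empty function, forcing a barrier.
	 *
	 * v2 uses synchronize_sched(): wait for an RCU-sched grace period,
	 * so every preemption-disabled section that started before the
	 * update has finished, without interrupting idle CPUs.
	 */
	synchronize_sched();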