From: js1304@gmail.com
X-Google-Original-From: iamjoonsoo.kim@lge.com
To: Andrew Morton
Cc: Christoph Lameter, Pekka Enberg, David Rientjes, Jesper Dangaard Brouer, linux-mm@kvack.org, linux-kernel@vger.kernel.org, Joonsoo Kim
Subject: mm/slab: reduce lock contention in alloc path
Date: Mon, 28 Mar 2016 14:26:50 +0900
Message-Id: <1459142821-20303-1-git-send-email-iamjoonsoo.kim@lge.com>
X-Mailer: git-send-email 1.9.1

From: Joonsoo Kim

Under concurrent allocation, SLAB suffers a lot of lock contention because much of the work is done while holding the node lock. This patchset tries to shrink those critical sections to reduce the contention. The major changes are a lockless decision on whether to grow the cache with a new slab, and a lockless cpu cache refill from the newly allocated slab.

Below are results from the concurrent allocation/free test in the slab allocation benchmark that Christoph wrote a long time ago. I simplified the output. The numbers are cycle counts for alloc/free respectively, so lower is better.

* Before
Kmalloc N*alloc N*free(32): Average=365/806
Kmalloc N*alloc N*free(64): Average=452/690
Kmalloc N*alloc N*free(128): Average=736/886
Kmalloc N*alloc N*free(256): Average=1167/985
Kmalloc N*alloc N*free(512): Average=2088/1125
Kmalloc N*alloc N*free(1024): Average=4115/1184
Kmalloc N*alloc N*free(2048): Average=8451/1748
Kmalloc N*alloc N*free(4096): Average=16024/2048

* After
Kmalloc N*alloc N*free(32): Average=344/792
Kmalloc N*alloc N*free(64): Average=347/882
Kmalloc N*alloc N*free(128): Average=390/959
Kmalloc N*alloc N*free(256): Average=393/1067
Kmalloc N*alloc N*free(512): Average=683/1229
Kmalloc N*alloc N*free(1024): Average=1295/1325
Kmalloc N*alloc N*free(2048): Average=2513/1664
Kmalloc N*alloc N*free(4096): Average=4742/2172

Allocation performance improves greatly (by roughly 50% or more) for object classes larger than 128 bytes.

Thanks.

Joonsoo Kim (11):
  mm/slab: hold a slab_mutex when calling __kmem_cache_shrink()
  mm/slab: remove BAD_ALIEN_MAGIC again
  mm/slab: drain the free slab as much as possible
  mm/slab: factor out kmem_cache_node initialization code
  mm/slab: clean-up kmem_cache_node setup
  mm/slab: don't keep free slabs if free_objects exceeds free_limit
  mm/slab: racy access/modify the slab color
  mm/slab: make cache_grow() handle the page allocated on arbitrary node
  mm/slab: separate cache_grow() to two parts
  mm/slab: refill cpu cache through a new slab without holding a node lock
  mm/slab: lockless decision to grow cache

 mm/slab.c        | 495 ++++++++++++++++++++++++++++---------------------------
 mm/slab_common.c |   4 +
 2 files changed, 255 insertions(+), 244 deletions(-)

-- 
1.9.1
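
For anyone skimming the series, the general idea behind the last two patches can be sketched in plain userspace C as below. This is not code from the patches: the names (toy_node, toy_slab, toy_grow), the lock type, and the object count per slab are made up for illustration, and the real code also refills the cpu cache and re-checks state under the node lock. It only shows the shape of the change, i.e. making the grow decision without the lock and doing the expensive allocation outside the critical section.

/*
 * Illustrative sketch only (not mm/slab.c code): decide that a new slab is
 * needed and allocate its backing memory without holding the node lock,
 * then take the lock just long enough to publish the new slab.
 */
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

struct toy_slab {
	struct toy_slab *next;
	void *mem;			/* backing storage for objects */
};

struct toy_node {
	pthread_mutex_t lock;		/* stands in for the kmem_cache_node lock */
	struct toy_slab *free_slabs;
	unsigned long free_objects;
};

/* Slow path: grow the cache by one slab. */
static int toy_grow(struct toy_node *n, size_t slab_size)
{
	struct toy_slab *slab;

	/*
	 * Racy, lockless check: if another thread already refilled the node,
	 * skip the expensive allocation.  A stale read here only costs an
	 * unnecessary (or skipped) grow attempt; consistency of the lists is
	 * guaranteed by the short locked section below.
	 */
	if (n->free_objects)
		return 0;

	/* Expensive part, done with no lock held (page allocation in SLAB). */
	slab = malloc(sizeof(*slab));
	if (!slab)
		return -1;
	slab->mem = malloc(slab_size);
	if (!slab->mem) {
		free(slab);
		return -1;
	}

	/* Short critical section: just publish the new slab. */
	pthread_mutex_lock(&n->lock);
	slab->next = n->free_slabs;
	n->free_slabs = slab;
	n->free_objects += 16;		/* pretend each slab holds 16 objects */
	pthread_mutex_unlock(&n->lock);

	return 0;
}

int main(void)
{
	struct toy_node node = {
		.lock = PTHREAD_MUTEX_INITIALIZER,
		.free_slabs = NULL,
		.free_objects = 0,
	};

	if (toy_grow(&node, 4096))
		return 1;
	printf("free_objects after grow: %lu\n", node.free_objects);
	return 0;
}

The point of the shape above is that a stale lockless check can only cause a harmless extra or skipped grow attempt, while the node lock is held only for the cheap list update rather than for the whole allocation.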