Date: Sat, 4 Jun 2022 11:05:02 +0000
From: Hyeonggon Yoo <42.hyeyoo@gmail.com>
To: Muchun Song
Cc: Rongwei Wang <rongwei.wang@linux.alibaba.com>, akpm@linux-foundation.org,
    vbabka@suse.cz, roman.gushchin@linux.dev, iamjoonsoo.kim@lge.com,
    rientjes@google.com, penberg@kernel.org, cl@linux.com,
    linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH 1/3] mm/slub: fix the race between validate_slab and slab_free
References: <20220529081535.69275-1-rongwei.wang@linux.alibaba.com>

On Tue, May 31, 2022 at 11:47:22AM +0800, Muchun Song wrote:
> On Sun, May 29, 2022 at 11:37:06AM +0000, Hyeonggon Yoo wrote:
> > On Sun, May 29, 2022 at 04:15:33PM +0800, Rongwei Wang wrote:
> > > In use cases where slabs are allocated and freed frequently, some
> > > error messages, such as "Left Redzone overwritten" and "First byte
> > > 0xbb instead of 0xcc", are printed when validating slabs. That is
> > > because an object has been filled with SLAB_RED_INACTIVE but has
> > > not yet been added to the slab's freelist, and slab validation is
> > > likely to run between these two states.
> > >
> > > This does not mean the slab cannot work stably, but these
> > > confusing messages disturb slab debugging.
> > >
> > > Signed-off-by: Rongwei Wang <rongwei.wang@linux.alibaba.com>
> >
> > Have you observed this, or is it from code inspection?
> > > > > --- > > > mm/slub.c | 40 +++++++++++++++++----------------------- > > > 1 file changed, 17 insertions(+), 23 deletions(-) > > > > > > diff --git a/mm/slub.c b/mm/slub.c > > > index ed5c2c03a47a..310e56d99116 100644 > > > --- a/mm/slub.c > > > +++ b/mm/slub.c > > > @@ -1374,15 +1374,12 @@ static noinline int free_debug_processing( > > > void *head, void *tail, int bulk_cnt, > > > unsigned long addr) > > > { > > > - struct kmem_cache_node *n = get_node(s, slab_nid(slab)); > > > void *object = head; > > > int cnt = 0; > > > - unsigned long flags, flags2; > > > + unsigned long flags; > > > int ret = 0; > > > > > > - spin_lock_irqsave(&n->list_lock, flags); > > > - slab_lock(slab, &flags2); > > > - > > > + slab_lock(slab, &flags); > > > if (s->flags & SLAB_CONSISTENCY_CHECKS) { > > > if (!check_slab(s, slab)) > > > goto out; > > > @@ -1414,8 +1411,7 @@ static noinline int free_debug_processing( > > > slab_err(s, slab, "Bulk freelist count(%d) invalid(%d)\n", > > > bulk_cnt, cnt); > > > > > > - slab_unlock(slab, &flags2); > > > - spin_unlock_irqrestore(&n->list_lock, flags); > > > + slab_unlock(slab, &flags); > > > if (!ret) > > > slab_fix(s, "Object at 0x%p not freed", object); > > > return ret; > > > @@ -3304,7 +3300,7 @@ static void __slab_free(struct kmem_cache *s, struct slab *slab, > > > > > > { > > > void *prior; > > > - int was_frozen; > > > + int was_frozen, to_take_off = 0; > > > struct slab new; > > > unsigned long counters; > > > struct kmem_cache_node *n = NULL; > > > @@ -3315,15 +3311,19 @@ static void __slab_free(struct kmem_cache *s, struct slab *slab, > > > if (kfence_free(head)) > > > return; > > > > > > + n = get_node(s, slab_nid(slab)); > > > + spin_lock_irqsave(&n->list_lock, flags); > > > + > > > > Oh please don't do this. > > > > SLUB free slowpath can be hit a lot depending on workload. > > > > __slab_free() try its best not to take n->list_lock. currently takes n->list_lock > > only when the slab need to be taken from list. > > > > Unconditionally taking n->list_lock will degrade performance. > > > > I can confirm you are right. We have encountered this issue in practise. > We have deployed somen HDFS instance on a 256-CPU machine. When there > are lots of IO requests from users, we can see lots of threads are contended > on n->list_lock. Lots of call traces are like following: > > ffffffff810dfbb4 native_queued_spin_lock_slowpath+0x1a4 > ffffffff81780ffb _raw_spin_lock+0x1b > ffffffff8127327e get_partial_node.isra.81+0x5e > ffffffff812752d3 ___slab_alloc+0x2f3 > ffffffff8127559c __slab_alloc+0x1c > ffffffff81275828 kmem_cache_alloc+0x278 > ffffffff812e9e3d alloc_buffer_head+0x1d > ffffffff812e9f74 alloc_page_buffers+0xa4 > ffffffff812eb0e9 create_empty_buffers+0x19 > ffffffff812eb37d create_page_buffers+0x7d > ffffffff812ed78d __block_write_begin_int+0x9d > > I thought it was because there are lots of threads which consume local > CPU slab cache quickly and then both of them try to get a new slab > from node partial list. Since there are 256 CPUs, the contention > is more competitive and easy to be visible. > > Any thoughts on this issue (e.e. how to ease contention)? Comments > are welcome. How does increasing number of partial slabs affect your situation? (increasing /sys/slab//cpu_partial) > Thanks. > >