Subject: Re: [PATCH 1/3] mm/slub: fix the race between validate_slab and slab_free
From: Rongwei Wang <rongwei.wang@linux.alibaba.com>
To: Vlastimil Babka, Hyeonggon Yoo <42.hyeyoo@gmail.com>
Cc: David Rientjes, songmuchun@bytedance.com, akpm@linux-foundation.org,
 roman.gushchin@linux.dev, iamjoonsoo.kim@lge.com, penberg@kernel.org,
 linux-mm@kvack.org, linux-kernel@vger.kernel.org
Date: Fri, 15 Jul 2022 18:51:54 +0800
Message-ID: <23a79337-a5eb-a959-a764-1296a3e8e7c1@linux.alibaba.com>
References: <20220529081535.69275-1-rongwei.wang@linux.alibaba.com>
 <9794df4f-3ffe-4e99-0810-a1346b139ce8@linux.alibaba.com>
 <29723aaa-5e28-51d3-7f87-9edf0f7b9c33@linux.alibaba.com>

On 7/15/22 6:33 PM, Vlastimil Babka wrote:
> On 7/15/22 10:05, Rongwei Wang wrote:
>> On 6/17/22 5:40 PM, Vlastimil Babka wrote:
>>> On 6/8/22 14:23, Christoph Lameter wrote:
>>>> On Wed, 8 Jun 2022, Rongwei Wang wrote:
>>>>
>>>>> If acceptable, I think documenting the issue and warning that this
>>>>> behavior is incorrect is OK. But it still prints a large number of
>>>>> confusing messages, which disturbs us.
>>>>
>>>> Correct, it would be great if you could fix this in a way that does
>>>> not impact performance.
>>>>
>>>>>> are current operations on the slab being validated.
>>>>> And I am trying to fix it in the following way. In short, these
>>>>> changes only take effect under slub debug mode, and do not affect
>>>>> the normal mode (I'm not sure).
>>>>> It does not look elegant enough, and if everyone approves of this
>>>>> approach, I can submit the next version.
>>>>
>>>>>
>>>>> Anyway, thanks for your time :).
>>>>> -wrw
>>>>>
>>>>> @@ -3304,7 +3300,7 @@ static void __slab_free(struct kmem_cache *s, struct slab *slab,
>>>>>
>>>>>  {
>>>>>         void *prior;
>>>>> -       int was_frozen;
>>>>> +       int was_frozen, to_take_off = 0;
>>>>>         struct slab new;
>>>>
>>>> to_take_off has the role of !n ? Why is that needed?
>>>>
>>>>> -       do {
>>>>> -               if (unlikely(n)) {
>>>>> +               spin_lock_irqsave(&n->list_lock, flags);
>>>>> +               ret = free_debug_processing(s, slab, head, tail, cnt, addr);
>>>>
>>>> Ok, so the idea is to take the lock only if kmem_cache_debug. That
>>>> looks ok. But it still adds a number of new branches etc. to the
>>>> free loop.
>>>
>> Hi, Vlastimil, sorry for missing your message for so long.
>
> Hi, no problem.
>
>>> It also further complicates the already tricky code. I wonder if we
>>> should take more advantage of the fact that for kmem_cache_debug()
>>> caches we don't leave any slabs on percpu or percpu partial lists, and
>>> also that in free_debug_processing() we already take both list_lock
>>> and slab_lock. If we just did the freeing immediately there under
>>> those locks, we would be protected against other freeing cpus by that
>>> list_lock and don't need the double cmpxchg tricks.
>> Hmm, I'm not sure I completely understand what you mean by "don't need
>> the double cmpxchg tricks". Do you mean replacing cmpxchg_double_slab()
>> here with the following code when kmem_cache_debug(s)?
>>
>> __slab_lock(slab);
>> if (slab->freelist == freelist_old && slab->counters == counters_old) {
>>     slab->freelist = freelist_new;
>>     slab->counters = counters_new;
>>     __slab_unlock(slab);
>>     local_irq_restore(flags);
>>     return true;
>> }
>> __slab_unlock(slab);
>
> Pretty much, but it's more complicated.

Yes. Actually, I think reusing cmpxchg_double_slab() here is more concise
in code. I have already finished this part of the code, but I am
hesitating over whether to replace cmpxchg_double_slab().

>
>> If I have misunderstood your words, please let me know.
>> Thanks!
>>>
>>> What about against allocating cpus? More tricky, as those will
>>> currently end up privatizing the freelist via get_partial(), only to
>>> deactivate it again, so our list_lock+slab_lock in the freeing path
>>> would not protect in the meanwhile. But the allocation is currently
>>> very inefficient for debug caches, as in get_partial() it will take
>>> the list_lock to take the slab from the partial list, and then in most
>>> cases take it again in deactivate_slab() to return it.
>> It seems that I need to spend some time digesting these words. Anyway,
>> thanks.
>>>
>>> If instead the allocation path for a kmem_cache_debug() cache would
>>> take a single object from the partial list (not the whole freelist)
>>> under list_lock, it would be ultimately more efficient, and would
>>> protect against freeing using list_lock. Sounds like an idea worth
>>> trying to me?
>>
>> Hyeonggon had similar advice: split the freeing and allocating paths
>> out of the debugging case, like below:
>>
>> __slab_alloc() {
>>     if (kmem_cache_debug(s))
>>         slab_alloc_debug()
>>     else
>>         ___slab_alloc()
>> }
>>
>> I guess the above code aims to solve the problem (idea) you mentioned?
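
To make the allocation side concrete: the slab_alloc_debug() above could
take a single object straight off a partial slab while holding
n->list_lock, along the lines Vlastimil describes. A rough sketch only,
not a real patch: node selection and the failure paths are simplified,
and the helper calls are the existing SLUB ones as far as I understand
them:

static void *slab_alloc_debug(struct kmem_cache *s,
                              struct kmem_cache_node *n, unsigned long addr)
{
        struct slab *slab;
        unsigned long flags;
        void *object = NULL;

        spin_lock_irqsave(&n->list_lock, flags);

        slab = list_first_entry_or_null(&n->partial, struct slab, slab_list);
        if (slab) {
                /* Take one object, not the whole freelist. */
                object = slab->freelist;
                slab->freelist = get_freepointer(s, object);
                slab->inuse++;

                if (!alloc_debug_processing(s, slab, object, addr)) {
                        /* Consistency checks failed; give up on this object. */
                        object = NULL;
                } else if (!slab->freelist) {
                        /* Slab became fully used; take it off the partial list. */
                        remove_partial(n, slab);
                }
        }

        spin_unlock_irqrestore(&n->list_lock, flags);
        return object;
}

Since the object is taken and checked entirely under list_lock, a
concurrent free or validate_slab() (which also take list_lock) can never
observe a half-updated freelist, and there is no privatized freelist to
race with.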
>>
>> slab_free() {
>>     if (kmem_cache_debug(s))
>>         slab_free_debug()
>>     else
>>         __do_slab_free()
>> }
>>
>> Currently, I only modify the slab freeing code, to fix the confusing
>> messages from "slabinfo -v". If you agree, I can try to implement the
>> slab_alloc_debug() code mentioned above. It may also be a challenge
>> for me.
>
> I already started working on this approach and hope to post an RFC soon.

OK, that's great.

>
>> Thanks for your time.
>>
>>> And of course we would stop creating the 'validate' sysfs files for
>>> non-debug caches.
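
For the freeing side, a slab_free_debug() in the same spirit could do the
whole free under n->list_lock, so no cmpxchg_double_slab() is needed at
all. Again a rough sketch rather than real code: it assumes
free_debug_processing() is changed to be called with list_lock already
held (as in the diff earlier in this mail), and slab_lock handling and
discarding of completely empty slabs are omitted:

static void slab_free_debug(struct kmem_cache *s, struct slab *slab,
                            void *object, unsigned long addr)
{
        struct kmem_cache_node *n = get_node(s, slab_nid(slab));
        unsigned long flags;
        bool was_full;

        spin_lock_irqsave(&n->list_lock, flags);

        /* head == tail == object, cnt == 1: freeing a single object. */
        if (!free_debug_processing(s, slab, object, object, 1, addr))
                goto out;

        was_full = !slab->freelist;

        /* A plain update suffices: list_lock excludes all other writers. */
        set_freepointer(s, object, slab->freelist);
        slab->freelist = object;
        slab->inuse--;

        /* A previously full slab becomes partial again. */
        if (was_full)
                add_partial(n, slab, DEACTIVATE_TO_TAIL);
out:
        spin_unlock_irqrestore(&n->list_lock, flags);
}

With validate_slab() also running under list_lock, it can then no longer
catch a slab between the consistency checks and the freelist update, which
is the kind of inconsistency behind the confusing "slabinfo -v" messages
discussed here.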