Subject: Re: [PATCH 1/3] mm/slub: fix the race between validate_slab and slab_free
From: Rongwei Wang <rongwei.wang@linux.alibaba.com>
Date: Fri, 17 Jun 2022 15:55:35 +0800
To: Christoph Lameter
Cc: David Rientjes, songmuchun@bytedance.com, Hyeonggon Yoo <42.hyeyoo@gmail.com>,
    akpm@linux-foundation.org, vbabka@suse.cz, roman.gushchin@linux.dev,
    iamjoonsoo.kim@lge.com, penberg@kernel.org, linux-mm@kvack.org,
    linux-kernel@vger.kernel.org
Message-ID: <5085437c-adc9-b6a3-dbd8-91dc0856cf19@linux.alibaba.com>
References: <20220529081535.69275-1-rongwei.wang@linux.alibaba.com>
 <9794df4f-3ffe-4e99-0810-a1346b139ce8@linux.alibaba.com>
 <29723aaa-5e28-51d3-7f87-9edf0f7b9c33@linux.alibaba.com>

On 6/13/22 9:50 PM, Christoph Lameter wrote:
> On Sat, 11 Jun 2022, Rongwei Wang wrote:
>
>>> Ok so the idea is to take the lock only if kmem_cache_debug. That looks
>>> ok. But it still adds a number of new branches etc to the free loop.
>>>
>>> Some performance tests would be useful.
>> Hi Christoph
>>
>> Thanks for your time!
>> Do you have any advice on which benchmarks I should run? I see that
>> hackbench and lkp have been used frequently in mm/slub.c commits [1,2],
>> but I have no idea how to use these two benchmarks to cover the above
>> changes. Can you give some examples? Thanks very much!
>
> Hi Rongwei,
>
> Well, run hackbench with and without the change.
>
> There are also synthetic benchmarks available at
> https://gentwo.org/christoph/slub/tests/
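For context, the change discussed above is roughly of the following shape:
take the per-node list_lock in the free path only when the cache has
debugging enabled, so that validate_slab() (which walks the node's slab
lists under that lock) cannot race with the debug-side free checks. This is
only a sketch written against mm/slub.c internals (kmem_cache_debug(),
struct kmem_cache_node and its list_lock); it is not the actual patch:

static void slab_free_debug_sketch(struct kmem_cache *s,
				   struct kmem_cache_node *n,
				   void *object)
{
	unsigned long flags;

	if (!kmem_cache_debug(s)) {
		/* Production caches keep the existing lockless free path. */
		return;
	}

	/*
	 * Debug caches are slow anyway: serialize the free-side
	 * consistency checks against validate_slab(), which holds the
	 * same per-node lock while walking the slab lists.
	 */
	spin_lock_irqsave(&n->list_lock, flags);
	/* ... debug checks and list manipulation for 'object' ... */
	spin_unlock_irqrestore(&n->list_lock, flags);
}

The extra kmem_cache_debug() branch in the hot path is what the numbers
below try to quantify.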
Christoph,

I followed [1] to collect the data below. The slub_test cases are the same
ones you provided, and here are the results ("baseline" is the upstream
kernel, "fix" is the patched kernel).

My test environment: an arm64 VM with 32 cores and 128G of memory. I
removed 'slub_debug=UFPZ' from the cmdline before collecting the following
two groups of data.

[1] https://lore.kernel.org/linux-mm/20200527103545.4348ac10@carbon/

Single thread testing

1. Kmalloc: Repeatedly allocate then free test

                         before (baseline)          fix
                         kmalloc      kfree         kmalloc      kfree
10000 times 8            7 cycles     8 cycles      5 cycles     7 cycles
10000 times 16           4 cycles     8 cycles      3 cycles     6 cycles
10000 times 32           4 cycles     8 cycles      3 cycles     6 cycles
10000 times 64           3 cycles     8 cycles      3 cycles     6 cycles
10000 times 128          3 cycles     8 cycles      3 cycles     6 cycles
10000 times 256          12 cycles    8 cycles      11 cycles    7 cycles
10000 times 512          27 cycles    10 cycles     23 cycles    11 cycles
10000 times 1024         18 cycles    9 cycles      20 cycles    10 cycles
10000 times 2048         54 cycles    12 cycles     54 cycles    12 cycles
10000 times 4096         105 cycles   20 cycles     105 cycles   25 cycles
10000 times 8192         210 cycles   35 cycles     212 cycles   39 cycles
10000 times 16384        133 cycles   45 cycles     119 cycles   46 cycles

2. Kmalloc: alloc/free test

                                      before (baseline)   fix
10000 times kmalloc(8)/kfree          3 cycles            3 cycles
10000 times kmalloc(16)/kfree         3 cycles            3 cycles
10000 times kmalloc(32)/kfree         3 cycles            3 cycles
10000 times kmalloc(64)/kfree         3 cycles            3 cycles
10000 times kmalloc(128)/kfree        3 cycles            3 cycles
10000 times kmalloc(256)/kfree        3 cycles            3 cycles
10000 times kmalloc(512)/kfree        3 cycles            3 cycles
10000 times kmalloc(1024)/kfree       3 cycles            3 cycles
10000 times kmalloc(2048)/kfree       3 cycles            3 cycles
10000 times kmalloc(4096)/kfree       3 cycles            3 cycles
10000 times kmalloc(8192)/kfree       3 cycles            3 cycles
10000 times kmalloc(16384)/kfree      33 cycles           33 cycles

Concurrent allocs

                                      before (baseline)   fix
Kmalloc N*alloc N*free(8)             Average=17/18       Average=11/11
Kmalloc N*alloc N*free(16)            Average=15/49       Average=9/11
Kmalloc N*alloc N*free(32)            Average=15/40       Average=9/11
Kmalloc N*alloc N*free(64)            Average=15/44       Average=9/10
Kmalloc N*alloc N*free(128)           Average=15/42       Average=10/10
Kmalloc N*alloc N*free(256)           Average=128/28      Average=71/22
Kmalloc N*alloc N*free(512)           Average=206/34      Average=178/26
Kmalloc N*alloc N*free(1024)          Average=762/37      Average=369/27
Kmalloc N*alloc N*free(2048)          Average=327/58      Average=339/33
Kmalloc N*alloc N*free(4096)          Average=2255/128    Average=1813/64

                                      before (baseline)   fix
Kmalloc N*(alloc free)(8)             Average=3           Average=3
Kmalloc N*(alloc free)(16)            Average=3           Average=3
Kmalloc N*(alloc free)(32)            Average=3           Average=3
Kmalloc N*(alloc free)(64)            Average=3           Average=3
Kmalloc N*(alloc free)(128)           Average=3           Average=3
Kmalloc N*(alloc free)(256)           Average=3           Average=3
Kmalloc N*(alloc free)(512)           Average=3           Average=3
Kmalloc N*(alloc free)(1024)          Average=3           Average=3
Kmalloc N*(alloc free)(2048)          Average=3           Average=3
Kmalloc N*(alloc free)(4096)          Average=3           Average=3

According to the data above, there is no significant performance
degradation with the patched kernel. Also, in the concurrent allocs test,
e.g. Kmalloc N*alloc N*free(1024), the 'fix' column looks better than the
baseline (I assume lower is better; please let me know if I am reading it
wrong). If you have other suggestions, I can test more configurations.

Thanks for your time!
-wrw
> These measure the cycles that slab operations take. However, they are a
> bit old and I think Pekka may have a newer version of these patches.
>
> Greetings,
>     Christoph
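P.S. For anyone who wants to reproduce the single-thread numbers without
digging out the original slub_test patches, the measurement boils down to
timing tight kmalloc/kfree loops with get_cycles() and reporting the
per-operation average. A rough, self-contained module sketch (not the
actual slub_test code; the object size and iteration count here are
arbitrary):

#include <linux/kernel.h>
#include <linux/module.h>
#include <linux/slab.h>
#include <linux/timex.h>	/* get_cycles() */

#define TEST_COUNT 10000
#define TEST_SIZE  1024

static void *objs[TEST_COUNT];

static int __init slub_cycles_init(void)
{
	cycles_t t0, t1, t2;
	int i;

	t0 = get_cycles();
	for (i = 0; i < TEST_COUNT; i++)
		objs[i] = kmalloc(TEST_SIZE, GFP_KERNEL);
	t1 = get_cycles();
	for (i = 0; i < TEST_COUNT; i++)
		kfree(objs[i]);	/* kfree(NULL) is a no-op, so failed allocs are harmless */
	t2 = get_cycles();

	pr_info("%d times %d: kmalloc %llu cycles, kfree %llu cycles\n",
		TEST_COUNT, TEST_SIZE,
		(unsigned long long)(t1 - t0) / TEST_COUNT,
		(unsigned long long)(t2 - t1) / TEST_COUNT);

	/* Fail the insmod on purpose so the module does not stay loaded. */
	return -EAGAIN;
}

module_init(slub_cycles_init);
MODULE_LICENSE("GPL");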