Message-ID: <380fa11e-f15d-da1a-51f7-70e14ed58ffc@bytedance.com>
Date: Thu, 19 May 2022 11:12:48 +0800
Subject: Re: [External] Re: [PATCH] bpf: avoid grabbing spin_locks of all cpus when no free elems
From: Feng Zhou
To: Yonghong Song, Alexei Starovoitov
Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko, Martin KaFai Lau, Song Liu, John Fastabend, KP Singh, Network Development, bpf, LKML, Xiongchun Duan, Muchun Song, Dongdong Wang, Cong Wang, Chengming Zhou
References: <20220518062715.27809-1-zhoufeng.zf@bytedance.com> <6ae715b3-96b1-2b42-4d1a-5267444d586b@bytedance.com> <9c0c3e0b-33bc-51a7-7916-7278f14f308e@fb.com>
In-Reply-To: <9c0c3e0b-33bc-51a7-7916-7278f14f308e@fb.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On 2022/5/19 4:39 AM, Yonghong Song wrote:
>
>
> On 5/17/22 11:57 PM, Feng Zhou wrote:
>> On 2022/5/18 2:32 PM, Alexei Starovoitov wrote:
>>> On Tue, May 17, 2022 at 11:27 PM Feng zhou
>>> wrote:
>>>> From: Feng Zhou
>>>>
>>>> We encountered bad case on big system with 96 CPUs that
>>>> alloc_htab_elem() would last for 1ms. The reason is that after the
>>>> prealloc hashtab has no free elems, when trying to update, it will
>>>> still grab spin_locks of all cpus. If there are multiple update users,
>>>> the competition is very serious.
>>>>
>>>> So this patch add is_empty in pcpu_freelist_head to check freelist
>>>> having free or not. If having, grab spin_lock, or check next cpu's
>>>> freelist.
>>>>
>>>> Before patch: hash_map performance
>>>> ./map_perf_test 1
>
> could you explain what parameter '1' means here?

The code is in:
samples/bpf/map_perf_test_user.c
samples/bpf/map_perf_test_kern.c

Parameter '1' is the test-case flag that measures hash_map performance.
Parameter '2048' measures hash_map performance when free=0. I added
test-case flag '2048' myself to reproduce the problem.

>
>>>> 0:hash_map_perf pre-alloc 975345 events per sec
>>>> 4:hash_map_perf pre-alloc 855367 events per sec
>>>> 12:hash_map_perf pre-alloc 860862 events per sec
>>>> 8:hash_map_perf pre-alloc 849561 events per sec
>>>> 3:hash_map_perf pre-alloc 849074 events per sec
>>>> 6:hash_map_perf pre-alloc 847120 events per sec
>>>> 10:hash_map_perf pre-alloc 845047 events per sec
>>>> 5:hash_map_perf pre-alloc 841266 events per sec
>>>> 14:hash_map_perf pre-alloc 849740 events per sec
>>>> 2:hash_map_perf pre-alloc 839598 events per sec
>>>> 9:hash_map_perf pre-alloc 838695 events per sec
>>>> 11:hash_map_perf pre-alloc 845390 events per sec
>>>> 7:hash_map_perf pre-alloc 834865 events per sec
>>>> 13:hash_map_perf pre-alloc 842619 events per sec
>>>> 1:hash_map_perf pre-alloc 804231 events per sec
>>>> 15:hash_map_perf pre-alloc 795314 events per sec
>>>>
>>>> hash_map the worst: no free
>>>> ./map_perf_test 2048
>>>> 6:worse hash_map_perf pre-alloc 28628 events per sec
>>>> 5:worse hash_map_perf pre-alloc 28553 events per sec
>>>> 11:worse hash_map_perf pre-alloc 28543 events per sec
>>>> 3:worse hash_map_perf pre-alloc 28444 events per sec
>>>> 1:worse hash_map_perf pre-alloc 28418 events per sec
>>>> 7:worse hash_map_perf pre-alloc 28427 events per sec
>>>> 13:worse hash_map_perf pre-alloc 28330 events per sec
>>>> 14:worse hash_map_perf pre-alloc 28263 events per sec
>>>> 9:worse hash_map_perf pre-alloc 28211 events per sec
>>>> 15:worse hash_map_perf pre-alloc 28193 events per sec
>>>> 12:worse hash_map_perf pre-alloc 28190 events per sec
>>>> 10:worse hash_map_perf pre-alloc 28129 events per sec
>>>> 8:worse hash_map_perf pre-alloc 28116 events per sec
>>>> 4:worse hash_map_perf pre-alloc 27906 events per sec
>>>> 2:worse hash_map_perf pre-alloc 27801 events per sec
>>>> 0:worse hash_map_perf pre-alloc 27416 events per sec
>>>> 3:worse hash_map_perf pre-alloc 28188 events per sec
>>>>
>>>> ftrace trace
>>>>
>>>> 0)               |  htab_map_update_elem() {
>>>> 0)   0.198 us    |    migrate_disable();
>>>> 0)               |    _raw_spin_lock_irqsave() {
>>>> 0)   0.157 us    |      preempt_count_add();
>>>> 0)   0.538 us    |    }
>>>> 0)   0.260 us    |    lookup_elem_raw();
>>>> 0)               |    alloc_htab_elem() {
>>>> 0)               |      __pcpu_freelist_pop() {
>>>> 0)               |        _raw_spin_lock() {
>>>> 0)   0.152 us    |          preempt_count_add();
>>>> 0)   0.352 us    |          native_queued_spin_lock_slowpath();
>>>> 0)   1.065 us    |        }
>>>>                  |        ...
>>>> 0)               |        _raw_spin_unlock() {
>>>> 0)   0.254 us    |          preempt_count_sub();
>>>> 0)   0.555 us    |        }
>>>> 0) + 25.188 us   |      }
>>>> 0) + 25.486 us   |    }
>>>> 0)               |    _raw_spin_unlock_irqrestore() {
>>>> 0)   0.155 us    |      preempt_count_sub();
>>>> 0)   0.454 us    |    }
>>>> 0)   0.148 us    |    migrate_enable();
>>>> 0) + 28.439 us   |  }
>>>>
>>>> The test machine is 16C, trying to get spin_lock 17 times, in addition
>>>> to 16c, there is an extralist.
>>> Is this with small max_entries and a large number of cpus?
>>>
>>> If so, probably better to fix would be to artificially
>>> bump max_entries to be 4x of num_cpus.
>>> Racy is_empty check still wastes the loop.
>>
>> This hash_map worst testcase with 16 CPUs, map's max_entries is 1000.
>>
>> This is the test case I constructed, it is to fill the map on
>> purpose, and then
>>
>> continue to update, just to reproduce the problem phenomenon.
>>
>> The bad case we encountered with 96 CPUs, map's max_entries is 10240.
>
> For such cases, most likely the map is *almost* full. What is the
> performance if we increase map size, e.g., from 10240 to 16K(16192)?

Yes, increasing max_entries works around the problem temporarily, but
once all 16K entries are used up, the same problem appears again. This
patch tries to fix that corner case.
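
To make the idea easier to discuss, here is a minimal userspace sketch of
the "peek before locking" approach. It is not the kernel patch itself and
makes several assumptions: NR_LISTS stands in for the per-CPU lists, the
spinlock is a toy built on C11 atomic_flag, and the names freelist_init(),
freelist_push(), freelist_pop() and the lock_acquisitions counter are
hypothetical, added only to show how many locks are taken.

```c
#include <stdatomic.h>
#include <stddef.h>

#define NR_LISTS 4 /* hypothetical stand-in for the per-CPU lists */

struct node {
	struct node *next;
};

struct freelist_head {
	atomic_flag lock;                /* toy spinlock, not the kernel's */
	_Atomic(struct node *) first;    /* NULL when the list is empty */
	unsigned long lock_acquisitions; /* instrumentation for this sketch */
};

static struct freelist_head lists[NR_LISTS];

static void freelist_init(void)
{
	for (int i = 0; i < NR_LISTS; i++) {
		atomic_flag_clear(&lists[i].lock);
		atomic_init(&lists[i].first, (struct node *)NULL);
		lists[i].lock_acquisitions = 0;
	}
}

static void list_lock(struct freelist_head *h)
{
	while (atomic_flag_test_and_set_explicit(&h->lock, memory_order_acquire))
		; /* spin until the flag is released */
	h->lock_acquisitions++;
}

static void list_unlock(struct freelist_head *h)
{
	atomic_flag_clear_explicit(&h->lock, memory_order_release);
}

static void freelist_push(int idx, struct node *n)
{
	struct freelist_head *h = &lists[idx];

	list_lock(h);
	n->next = atomic_load_explicit(&h->first, memory_order_relaxed);
	atomic_store_explicit(&h->first, n, memory_order_relaxed);
	list_unlock(h);
}

static struct node *freelist_pop(int start)
{
	for (int i = 0; i < NR_LISTS; i++) {
		struct freelist_head *h = &lists[(start + i) % NR_LISTS];
		struct node *n;

		/* Racy lockless peek: skip lists that look empty. */
		if (!atomic_load_explicit(&h->first, memory_order_relaxed))
			continue;

		list_lock(h);
		/* Re-check under the lock; the list may have been emptied. */
		n = atomic_load_explicit(&h->first, memory_order_relaxed);
		if (n)
			atomic_store_explicit(&h->first, n->next,
					      memory_order_relaxed);
		list_unlock(h);
		if (n)
			return n;
	}
	return NULL;
}

static unsigned long total_lock_acquisitions(void)
{
	unsigned long sum = 0;

	for (int i = 0; i < NR_LISTS; i++)
		sum += lists[i].lock_acquisitions;
	return sum;
}
```

In this sketch, a pop on a completely empty set of lists takes no lock at
all, which is the case the "no free elems" test exercises. The peek is
racy by design, so the pointer is read again once the lock is held.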