Message-ID: <6ae715b3-96b1-2b42-4d1a-5267444d586b@bytedance.com>
Date: Wed, 18 May 2022 14:57:35 +0800
Subject: Re: [External] Re: [PATCH] bpf: avoid grabbing spin_locks of all cpus when no free elems
From: Feng Zhou
To: Alexei Starovoitov
Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko, Martin KaFai Lau,
    Song Liu, Yonghong Song, John Fastabend, KP Singh, Network Development,
    bpf, LKML, Xiongchun Duan, Muchun Song, Dongdong Wang, Cong Wang,
    Chengming Zhou
References: <20220518062715.27809-1-zhoufeng.zf@bytedance.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On 2022/5/18 2:32 PM, Alexei Starovoitov wrote:
> On Tue, May 17, 2022 at 11:27 PM Feng zhou wrote:
>> From: Feng Zhou
>>
>> We encountered bad case on big system with 96 CPUs that
>> alloc_htab_elem() would last for 1ms. The reason is that after the
>> prealloc hashtab has no free elems, when trying to update, it will still
>> grab spin_locks of all cpus. If there are multiple update users, the
>> competition is very serious.
>>
>> So this patch add is_empty in pcpu_freelist_head to check freelist
>> having free or not. If having, grab spin_lock, or check next cpu's
>> freelist.
>>
>> Before patch: hash_map performance
>> ./map_perf_test 1
>> 0:hash_map_perf pre-alloc 975345 events per sec
>> 4:hash_map_perf pre-alloc 855367 events per sec
>> 12:hash_map_perf pre-alloc 860862 events per sec
>> 8:hash_map_perf pre-alloc 849561 events per sec
>> 3:hash_map_perf pre-alloc 849074 events per sec
>> 6:hash_map_perf pre-alloc 847120 events per sec
>> 10:hash_map_perf pre-alloc 845047 events per sec
>> 5:hash_map_perf pre-alloc 841266 events per sec
>> 14:hash_map_perf pre-alloc 849740 events per sec
>> 2:hash_map_perf pre-alloc 839598 events per sec
>> 9:hash_map_perf pre-alloc 838695 events per sec
>> 11:hash_map_perf pre-alloc 845390 events per sec
>> 7:hash_map_perf pre-alloc 834865 events per sec
>> 13:hash_map_perf pre-alloc 842619 events per sec
>> 1:hash_map_perf pre-alloc 804231 events per sec
>> 15:hash_map_perf pre-alloc 795314 events per sec
>>
>> hash_map the worst: no free
>> ./map_perf_test 2048
>> 6:worse hash_map_perf pre-alloc 28628 events per sec
>> 5:worse hash_map_perf pre-alloc 28553 events per sec
>> 11:worse hash_map_perf pre-alloc 28543 events per sec
>> 3:worse hash_map_perf pre-alloc 28444 events per sec
>> 1:worse hash_map_perf pre-alloc 28418 events per sec
>> 7:worse hash_map_perf pre-alloc 28427 events per sec
>> 13:worse hash_map_perf pre-alloc 28330 events per sec
>> 14:worse hash_map_perf pre-alloc 28263 events per sec
>> 9:worse hash_map_perf pre-alloc 28211 events per sec
>> 15:worse hash_map_perf pre-alloc 28193 events per sec
>> 12:worse hash_map_perf pre-alloc 28190 events per sec
>> 10:worse hash_map_perf pre-alloc 28129 events per sec
>> 8:worse hash_map_perf pre-alloc 28116 events per sec
>> 4:worse hash_map_perf pre-alloc 27906 events per sec
>> 2:worse hash_map_perf pre-alloc 27801 events per sec
>> 0:worse hash_map_perf pre-alloc 27416 events per sec
>> 3:worse hash_map_perf pre-alloc 28188 events per sec
>>
>> ftrace trace
>>
>>  0)               |  htab_map_update_elem() {
>>  0)   0.198 us    |    migrate_disable();
>>  0)               |    _raw_spin_lock_irqsave() {
>>  0)   0.157 us    |      preempt_count_add();
>>  0)   0.538 us    |    }
>>  0)   0.260 us    |    lookup_elem_raw();
>>  0)               |    alloc_htab_elem() {
>>  0)               |      __pcpu_freelist_pop() {
>>  0)               |        _raw_spin_lock() {
>>  0)   0.152 us    |          preempt_count_add();
>>  0)   0.352 us    |          native_queued_spin_lock_slowpath();
>>  0)   1.065 us    |        }
>>                   |        ...
>>  0)               |        _raw_spin_unlock() {
>>  0)   0.254 us    |          preempt_count_sub();
>>  0)   0.555 us    |        }
>>  0) + 25.188 us   |      }
>>  0) + 25.486 us   |    }
>>  0)               |    _raw_spin_unlock_irqrestore() {
>>  0)   0.155 us    |      preempt_count_sub();
>>  0)   0.454 us    |    }
>>  0)   0.148 us    |    migrate_enable();
>>  0) + 28.439 us   |  }
>>
>> The test machine is 16C, trying to get spin_lock 17 times, in addition
>> to 16c, there is an extralist.
> Is this with small max_entries and a large number of cpus?
>
> If so, probably better to fix would be to artificially
> bump max_entries to be 4x of num_cpus.
> Racy is_empty check still wastes the loop.

This hash_map worst-case test ran on 16 CPUs with the map's max_entries
set to 1000. I constructed the test case to fill the map on purpose and
then keep updating it, purely to reproduce the problem.

The bad case we actually encountered was on a system with 96 CPUs, where
the map's max_entries is 10240.
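
For illustration, the is_empty idea from the patch description can be
sketched in userspace C roughly as follows. This is not the kernel code:
NCPU, struct freelist_head, lock_taken, and the toy atomic_flag spinlock
are all assumptions made for the sketch, standing in for the real
pcpu_freelist structures and raw_spinlock_t. The flag is only a hint that
is read without the lock, so the check is racy in the way discussed above:
a pop may skip a list that becomes non-empty just after the check.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define NCPU 4 /* hypothetical CPU count, fixed for the sketch */

struct node {
	struct node *next;
};

struct freelist_head {
	struct node *first;
	atomic_flag lock;         /* toy spinlock, stands in for raw_spinlock_t */
	atomic_bool is_empty;     /* hint, may be read without holding the lock */
	unsigned long lock_taken; /* instrumentation: number of lock acquisitions */
};

struct freelist {
	struct freelist_head heads[NCPU];
};

static void head_lock(struct freelist_head *h)
{
	while (atomic_flag_test_and_set_explicit(&h->lock, memory_order_acquire))
		; /* spin until the flag was previously clear */
	h->lock_taken++; /* updated only while the lock is held */
}

static void head_unlock(struct freelist_head *h)
{
	atomic_flag_clear_explicit(&h->lock, memory_order_release);
}

void freelist_init(struct freelist *fl)
{
	for (int i = 0; i < NCPU; i++) {
		fl->heads[i].first = NULL;
		atomic_flag_clear(&fl->heads[i].lock);
		atomic_init(&fl->heads[i].is_empty, true);
		fl->heads[i].lock_taken = 0;
	}
}

void freelist_push(struct freelist *fl, int cpu, struct node *n)
{
	struct freelist_head *h;

	assert(cpu >= 0 && cpu < NCPU);
	h = &fl->heads[cpu];
	head_lock(h);
	n->next = h->first;
	h->first = n;
	atomic_store(&h->is_empty, false);
	head_unlock(h);
}

/* Pop starting at the caller's list, then walk the other lists. */
struct node *freelist_pop(struct freelist *fl, int cpu)
{
	assert(cpu >= 0 && cpu < NCPU);
	for (int i = 0; i < NCPU; i++) {
		struct freelist_head *h = &fl->heads[(cpu + i) % NCPU];
		struct node *n;

		/* Skip lists marked empty without touching their lock. */
		if (atomic_load(&h->is_empty))
			continue;

		head_lock(h);
		n = h->first;
		if (n) {
			h->first = n->next;
			if (!h->first)
				atomic_store(&h->is_empty, true);
		}
		head_unlock(h);
		if (n)
			return n;
	}
	return NULL;
}
```

With every list empty, freelist_pop() in this sketch returns NULL without
acquiring any lock, which is the contended path the ftrace output above
shows. It still walks all NCPU heads, which is the wasted loop the review
comment points at.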