From: Alexei Starovoitov
Date: Tue, 17 May 2022 23:32:35 -0700
Subject: Re: [PATCH] bpf: avoid grabbing spin_locks of all cpus when no free elems
To: Feng zhou
Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko, Martin KaFai Lau,
    Song Liu, Yonghong Song, John Fastabend, KP Singh, Network Development,
    bpf, LKML, Xiongchun Duan, Muchun Song, Dongdong Wang, Cong Wang,
    Chengming Zhou
In-Reply-To: <20220518062715.27809-1-zhoufeng.zf@bytedance.com>
References: <20220518062715.27809-1-zhoufeng.zf@bytedance.com>

On Tue, May 17, 2022 at 11:27 PM Feng zhou wrote:
>
> From: Feng Zhou
>
> We encountered a bad case on a big system with 96 CPUs where
> alloc_htab_elem() would take up to 1ms. The reason is that once the
> prealloc hashtab has no free elems, an update will still grab the
> spin_locks of all cpus. With multiple concurrent updaters, the lock
> contention is severe.
>
> So this patch adds an is_empty flag to pcpu_freelist_head to record
> whether that freelist has any free elems. If it does, grab its
> spin_lock; otherwise skip to the next cpu's freelist.
>
> Before patch: hash_map performance
> ./map_perf_test 1
> 0:hash_map_perf pre-alloc 975345 events per sec
> 4:hash_map_perf pre-alloc 855367 events per sec
> 12:hash_map_perf pre-alloc 860862 events per sec
> 8:hash_map_perf pre-alloc 849561 events per sec
> 3:hash_map_perf pre-alloc 849074 events per sec
> 6:hash_map_perf pre-alloc 847120 events per sec
> 10:hash_map_perf pre-alloc 845047 events per sec
> 5:hash_map_perf pre-alloc 841266 events per sec
> 14:hash_map_perf pre-alloc 849740 events per sec
> 2:hash_map_perf pre-alloc 839598 events per sec
> 9:hash_map_perf pre-alloc 838695 events per sec
> 11:hash_map_perf pre-alloc 845390 events per sec
> 7:hash_map_perf pre-alloc 834865 events per sec
> 13:hash_map_perf pre-alloc 842619 events per sec
> 1:hash_map_perf pre-alloc 804231 events per sec
> 15:hash_map_perf pre-alloc 795314 events per sec
>
> hash_map worst case: no free elems
> ./map_perf_test 2048
> 6:worse hash_map_perf pre-alloc 28628 events per sec
> 5:worse hash_map_perf pre-alloc 28553 events per sec
> 11:worse hash_map_perf pre-alloc 28543 events per sec
> 3:worse hash_map_perf pre-alloc 28444 events per sec
> 1:worse hash_map_perf pre-alloc 28418 events per sec
> 7:worse hash_map_perf pre-alloc 28427 events per sec
> 13:worse hash_map_perf pre-alloc 28330 events per sec
> 14:worse hash_map_perf pre-alloc 28263 events per sec
> 9:worse hash_map_perf pre-alloc 28211 events per sec
> 15:worse hash_map_perf pre-alloc 28193 events per sec
> 12:worse hash_map_perf pre-alloc 28190 events per sec
> 10:worse hash_map_perf pre-alloc 28129 events per sec
> 8:worse hash_map_perf pre-alloc 28116 events per sec
> 4:worse hash_map_perf pre-alloc 27906 events per sec
> 2:worse hash_map_perf pre-alloc 27801 events per sec
> 0:worse hash_map_perf pre-alloc 27416 events per sec
> 3:worse hash_map_perf pre-alloc 28188 events per sec
>
> ftrace trace:
>
> 0)               |  htab_map_update_elem() {
> 0)   0.198 us    |    migrate_disable();
> 0)               |    _raw_spin_lock_irqsave() {
> 0)   0.157 us    |      preempt_count_add();
> 0)   0.538 us    |    }
> 0)   0.260 us    |    lookup_elem_raw();
> 0)               |    alloc_htab_elem() {
> 0)               |      __pcpu_freelist_pop() {
> 0)               |        _raw_spin_lock() {
> 0)   0.152 us    |          preempt_count_add();
> 0)   0.352 us    |          native_queued_spin_lock_slowpath();
> 0)   1.065 us    |        }
>                  |        ...
> 0)               |        _raw_spin_unlock() {
> 0)   0.254 us    |          preempt_count_sub();
> 0)   0.555 us    |        }
> 0) + 25.188 us   |      }
> 0) + 25.486 us   |    }
> 0)               |    _raw_spin_unlock_irqrestore() {
> 0)   0.155 us    |      preempt_count_sub();
> 0)   0.454 us    |    }
> 0)   0.148 us    |    migrate_enable();
> 0) + 28.439 us   |  }
>
> The test machine has 16 CPUs, so the pop path tries to grab the
> spin_lock 17 times: the 16 per-cpu freelists plus the extralist.

Is this with small max_entries and a large number of cpus?
If so, it would probably be better to artificially bump max_entries
to be 4x of num_cpus.
A racy is_empty check still wastes the loop.
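
[Editor's note] Below is a minimal sketch of the is_empty idea the quoted
commit message describes, written against the structures used by
kernel/bpf/percpu_freelist.c. It is not the actual patch; the is_empty
field, where it is updated, and the READ_ONCE/WRITE_ONCE placement are
illustrative assumptions.

/*
 * Sketch only, not the actual patch: the is_empty flag and its handling
 * are assumptions illustrating the idea described in the quoted commit
 * message, layered on top of the existing pcpu_freelist structures.
 */
struct pcpu_freelist_node {
	struct pcpu_freelist_node *next;
};

struct pcpu_freelist_head {
	struct pcpu_freelist_node *first;
	raw_spinlock_t lock;
	bool is_empty;		/* assumed new flag: true when 'first' is NULL */
};

struct pcpu_freelist {
	struct pcpu_freelist_head __percpu *freelist;
	struct pcpu_freelist_head extralist;
};

/* Caller holds head->lock. */
static void pcpu_freelist_push_node(struct pcpu_freelist_head *head,
				    struct pcpu_freelist_node *node)
{
	node->next = head->first;
	head->first = node;
	if (head->is_empty)
		WRITE_ONCE(head->is_empty, false);
}

static struct pcpu_freelist_node *___pcpu_freelist_pop(struct pcpu_freelist *s)
{
	struct pcpu_freelist_head *head;
	struct pcpu_freelist_node *node;
	int cpu;

	for_each_cpu_wrap(cpu, cpu_possible_mask, raw_smp_processor_id()) {
		head = per_cpu_ptr(s->freelist, cpu);
		/*
		 * The point of the patch: skip lists that look empty without
		 * taking their lock.  The check is racy, so a list that a
		 * concurrent push just refilled may be skipped this round.
		 */
		if (READ_ONCE(head->is_empty))
			continue;
		raw_spin_lock(&head->lock);
		node = head->first;
		if (node) {
			head->first = node->next;
			if (!head->first)
				WRITE_ONCE(head->is_empty, true);
			raw_spin_unlock(&head->lock);
			return node;
		}
		WRITE_ONCE(head->is_empty, true);
		raw_spin_unlock(&head->lock);
	}
	/* All per-cpu lists looked empty; fall back to the extralist. */
	raw_spin_lock(&s->extralist.lock);
	node = s->extralist.first;
	if (node)
		s->extralist.first = node->next;
	raw_spin_unlock(&s->extralist.lock);
	return node;
}

The tradeoff is the one raised in the reply: the unlocked check can miss
an element that a concurrent push just added, which is presumably
acceptable only for a map that is effectively full, and whether this
complexity beats simply sizing max_entries larger (the 4x num_cpus
suggestion above) is exactly the open question in this thread.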