Subject: Re: [net-next] bpf: avoid hashtab deadlock with try_lock
From: Hou Tao
Date: Thu, 1 Dec 2022 10:53:37 +0800
Message-ID: <20b8ad93-7a90-dc8c-581b-491d543423a5@huaweicloud.com>
To: Tonghao Zhang
Cc: Hao Luo, Waiman Long, Peter Zijlstra, Ingo Molnar, Will Deacon, netdev@vger.kernel.org, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend, KP Singh, Stanislav Fomichev, Jiri Olsa, bpf, "houtao1@huawei.com", LKML, Boqun Feng
X-Mailing-List: linux-kernel@vger.kernel.org
Hi,

On 11/30/2022 1:55 PM, Tonghao Zhang wrote:
> On Wed, Nov 30, 2022 at 12:13 PM Hou Tao wrote:
>> Hi,
>>
>> On 11/30/2022 10:47 AM, Tonghao Zhang wrote:
>>> On Wed, Nov 30, 2022 at 9:50 AM Hou Tao wrote:
>>>> Hi Hao,
>>>>
>>>> On 11/30/2022 3:36 AM, Hao Luo wrote:
>>>>> On Tue, Nov 29, 2022 at 9:32 AM Boqun Feng wrote:
>>>>>> Just to be clear, I meant to refactor htab_lock_bucket() into a try
>>>>>> lock pattern. Also after a second thought, the below suggestion doesn't
>>>>>> work. I think the proper way is to make htab_lock_bucket() as a
>>>>>> raw_spin_trylock_irqsave().
>>>>>>
>>>>>> Regards,
>>>>>> Boqun
>>>>>>
>>>>> The potential deadlock happens when the lock is contended from the
>>>>> same cpu. When the lock is contended from a remote cpu, we would like
>>>>> the remote cpu to spin and wait, instead of giving up immediately. As
>>>>> this gives better throughput. So replacing the current
>>>>> raw_spin_lock_irqsave() with trylock sacrifices this performance gain.
>>>>>
>>>>> I suspect the source of the problem is the 'hash' that we used in
>>>>> htab_lock_bucket(). The 'hash' is derived from the 'key', I wonder
>>>>> whether we should use a hash derived from 'bucket' rather than from
>>>>> 'key'. For example, from the memory address of the 'bucket'. Because,
>>>>> different keys may fall into the same bucket, but yield different
>>>>> hashes. If the same bucket can never have two different 'hashes' here,
>>>>> the map_locked check should behave as intended. Also because
>>>>> ->map_locked is per-cpu, execution flows from two different cpus can
>>>>> both pass.
>>>> The lockdep warning occurs because bucket lock A is first used in a
>>>> non-NMI context and then the same bucket lock is used in an NMI context, so
>>> Yes, I tested lockdep too, we can't use the lock in NMI(but only
>>> try_lock work fine) context if we use them no-NMI context. otherwise
>>> the lockdep prints the warning.
>>> * for the dead-lock case: we can use the
>>> 1. hash & min(HASHTAB_MAP_LOCK_MASK, htab->n_buckets -1)
>>> 2. or hash bucket address.
>> Using the computed hash would be better than the hash bucket address, because
>> the hash buckets are allocated sequentially.
>>> * for lockdep warning, we should use in_nmi check with map_locked.
>>>
>>> BTW, the patch doesn't work, so we can remove the lock_key
>>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=c50eb518e262fa06bd334e6eec172eaf5d7a5bd9
>>>
>>> static inline int htab_lock_bucket(const struct bpf_htab *htab,
>>> 				   struct bucket *b, u32 hash,
>>> 				   unsigned long *pflags)
>>> {
>>> 	unsigned long flags;
>>>
>>> 	hash = hash & min(HASHTAB_MAP_LOCK_MASK, htab->n_buckets -1);
>>>
>>> 	preempt_disable();
>>> 	if (unlikely(__this_cpu_inc_return(*(htab->map_locked[hash])) != 1)) {
>>> 		__this_cpu_dec(*(htab->map_locked[hash]));
>>> 		preempt_enable();
>>> 		return -EBUSY;
>>> 	}
>>>
>>> 	if (in_nmi()) {
>>> 		if (!raw_spin_trylock_irqsave(&b->raw_lock, flags))
>>> 			return -EBUSY;
>> The only purpose of the trylock here is to make lockdep happy, and it may lead
>> to unnecessary -EBUSY errors for htab operations in NMI context. I still prefer
>> to add a virtual lock-class for map_locked to fix the lockdep warning. So could you use
> Hi, what is virtual lock-class ? Can you give me an example of what you mean?

When LOCKDEP is enabled, raw_spinlock_t embeds a dep_map in its definition,
and the lock and unlock paths call lock_acquire() and lock_release() so that
lockdep can check for deadlocks. map_locked is not a lock, but it behaves like
raw_spin_trylock(), so we need to add a dep_map for it manually, call
lock_acquire(trylock=1) before incrementing map_locked, and call
lock_release() before decrementing it. You can refer to the implementation of
raw_spin_trylock() and raw_spin_unlock() for more details.

>> separate patches to fix the potential deadlock and the lockdep warning? It
>> would be better if you could also add a bpf selftest for the deadlock problem,
>> as said before.
>>
>> Thanks,
>> Tao
>>> 	} else {
>>> 		raw_spin_lock_irqsave(&b->raw_lock, flags);
>>> 	}
>>>
>>> 	*pflags = flags;
>>> 	return 0;
>>> }
>>>
>>>
>>>> lockdep deduces that this may be a deadlock. I have already tried to use the
>>>> same map_locked for keys in the same bucket; the deadlock is gone, but I
>>>> still got the lockdep warning.
>>>>> Hao
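To make the masking proposal in the thread concrete, here is a small userspace model (not kernel code). It assumes, as in kernel/bpf/hashtab.c around the time of this thread, that HASHTAB_MAP_LOCK_COUNT is 8 and that n_buckets is a power of two, so a key's bucket is hash & (n_buckets - 1). The helper names are hypothetical. It checks the property Hao Luo points out: two hashes that land in the same bucket should also land in the same map_locked slot, otherwise the per-CPU re-entrancy check cannot see the conflict.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: mirrors kernel/bpf/hashtab.c as of this thread. */
#define HASHTAB_MAP_LOCK_COUNT 8
#define HASHTAB_MAP_LOCK_MASK (HASHTAB_MAP_LOCK_COUNT - 1)

/* Bucket selection for a power-of-two n_buckets: hash & (n_buckets - 1). */
static uint32_t bucket_index(uint32_t hash, uint32_t n_buckets)
{
	return hash & (n_buckets - 1);
}

/* Current scheme: map_locked slot taken from the key hash alone. */
static uint32_t lock_slot_old(uint32_t hash)
{
	return hash & HASHTAB_MAP_LOCK_MASK;
}

/* Proposed scheme: hash & min(HASHTAB_MAP_LOCK_MASK, n_buckets - 1). */
static uint32_t lock_slot_new(uint32_t hash, uint32_t n_buckets)
{
	uint32_t mask = n_buckets - 1;

	if (mask > HASHTAB_MAP_LOCK_MASK)
		mask = HASHTAB_MAP_LOCK_MASK;
	return hash & mask;
}

/*
 * Brute-force check over hashes in [0, limit): does sharing a bucket
 * always imply sharing a map_locked slot?
 */
static bool same_bucket_implies_same_slot(uint32_t n_buckets, bool use_new,
					  uint32_t limit)
{
	for (uint32_t a = 0; a < limit; a++) {
		for (uint32_t b = 0; b < limit; b++) {
			if (bucket_index(a, n_buckets) != bucket_index(b, n_buckets))
				continue;
			uint32_t sa = use_new ? lock_slot_new(a, n_buckets) : lock_slot_old(a);
			uint32_t sb = use_new ? lock_slot_new(b, n_buckets) : lock_slot_old(b);
			if (sa != sb)
				return false;
		}
	}
	return true;
}
```

With n_buckets = 2, hashes 1 and 3 share bucket 1 but get map_locked slots 1 and 3 under the current mask, so a same-CPU re-entry into that bucket is not detected; clamping the mask to n_buckets - 1 removes that case. For n_buckets >= 8 the low three bits are already part of the bucket index, so both schemes agree.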
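The htab_lock_bucket() sketch quoted in the thread returns -EBUSY from the in_nmi() trylock branch without decrementing map_locked or calling preempt_enable(), so a failed trylock would leave the slot marked busy. The userspace model below (not kernel code) sketches one way to keep every early return balanced. It rests on loudly stated assumptions: a thread-local array stands in for the per-CPU map_locked counters, a flag named fake_in_nmi stands in for in_nmi(), a C11 atomic_flag stands in for raw_spinlock_t, and preemption and IRQ-flag handling are only marked in comments. All toy_* and fake_* names are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define HASHTAB_MAP_LOCK_COUNT 8
#define HASHTAB_MAP_LOCK_MASK (HASHTAB_MAP_LOCK_COUNT - 1)

/* Hypothetical stand-ins: per-thread counters model per-CPU map_locked,
 * and fake_in_nmi models in_nmi(). */
static _Thread_local int map_locked[HASHTAB_MAP_LOCK_COUNT];
static _Thread_local bool fake_in_nmi;

struct bucket {
	atomic_flag raw_lock; /* toy spinlock standing in for raw_spinlock_t */
};

static bool spin_trylock(struct bucket *b)
{
	return !atomic_flag_test_and_set_explicit(&b->raw_lock, memory_order_acquire);
}

static void spin_lock(struct bucket *b)
{
	while (!spin_trylock(b))
		; /* busy-wait, as a remote CPU would */
}

static void spin_unlock(struct bucket *b)
{
	atomic_flag_clear_explicit(&b->raw_lock, memory_order_release);
}

/* hash & min(HASHTAB_MAP_LOCK_MASK, n_buckets - 1), as proposed in the thread. */
static uint32_t lock_slot(uint32_t hash, uint32_t n_buckets)
{
	uint32_t mask = n_buckets - 1;

	if (mask > HASHTAB_MAP_LOCK_MASK)
		mask = HASHTAB_MAP_LOCK_MASK;
	return hash & mask;
}

/* Every early return undoes exactly what was done before it. */
static int toy_lock_bucket(struct bucket *b, uint32_t hash, uint32_t n_buckets)
{
	uint32_t slot = lock_slot(hash, n_buckets);

	/* preempt_disable() would go here in the kernel. */
	if (++map_locked[slot] != 1) {
		--map_locked[slot];
		/* preempt_enable() */
		return -EBUSY; /* re-entered on this "CPU": refuse instead of deadlocking */
	}
	if (fake_in_nmi) {
		if (!spin_trylock(b)) {
			--map_locked[slot]; /* the undo missing from the quoted sketch */
			/* preempt_enable() */
			return -EBUSY;
		}
	} else {
		spin_lock(b);
	}
	return 0;
}

static void toy_unlock_bucket(struct bucket *b, uint32_t hash, uint32_t n_buckets)
{
	uint32_t slot = lock_slot(hash, n_buckets);

	spin_unlock(b);
	--map_locked[slot];
	/* preempt_enable() would go here in the kernel. */
}
```

As Hou Tao notes in the thread, the in_nmi() trylock mainly serves lockdep and can return -EBUSY unnecessarily, for example when another CPU briefly holds the bucket lock; the model only illustrates the counter bookkeeping, not which fix the thread settled on.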