From: jinyiting
Subject: [PATCH] bonding: 3ad: Fix the conflict between bond_update_slave_arr and the state machine
Date: Wed, 21 Apr 2021 16:38:21 +0800
Message-ID: <1618994301-1186-1-git-send-email-jinyiting@huawei.com>
X-Mailer: git-send-email 1.7.12.4
X-Mailing-List: linux-kernel@vger.kernel.org

The bond works in mode 4 (802.3ad). After down/up operations on a bond
that has negotiated normally, bond->slave_arr can end up NULL.
Test commands:
    ifconfig bond1 down
    ifconfig bond1 up

The conflict occurs in the following process:

 __dev_open (CPU A)
 --bond_open
   --queue_delayed_work(bond->wq,&bond->ad_work,0);
   --bond_update_slave_arr
     --bond_3ad_get_active_agg_info

 ad_work (CPU B)
 --bond_3ad_state_machine_handler
   --ad_agg_selection_logic

ad_work runs on CPU B. In ad_agg_selection_logic, all agg->is_active
flags are cleared. If bond_3ad_get_active_agg_info runs on CPU A before
the new active aggregator has been selected on CPU B, it fails and
bond->slave_arr is set to NULL, even though the best aggregator chosen
by ad_agg_selection_logic has not changed and there is no need to
update the slave array.

The conflict is that ad_agg_selection_logic clears agg->is_active under
mode_lock, while bond_open -> bond_update_slave_arr inspects
agg->is_active outside the lock. Take mode_lock around
bond_3ad_get_active_agg_info to close this race.

Also, bond_update_slave_arr may sleep when allocating memory, so
replace the open-coded lockdep WARN_ON with a call to might_sleep().

Signed-off-by: jinyiting
---
Previous versions:
 * https://lore.kernel.org/netdev/612b5e32-ea11-428e-0c17-e2977185f045@huawei.com/

 drivers/net/bonding/bond_main.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 74cbbb2..83ef62d 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -4391,9 +4391,7 @@ int bond_update_slave_arr(struct bonding *bond, struct slave *skipslave)
 	int agg_id = 0;
 	int ret = 0;
 
-#ifdef CONFIG_LOCKDEP
-	WARN_ON(lockdep_is_held(&bond->mode_lock));
-#endif
+	might_sleep();
 
 	usable_slaves = kzalloc(struct_size(usable_slaves, arr,
 					    bond->slave_cnt), GFP_KERNEL);
@@ -4406,7 +4404,9 @@ int bond_update_slave_arr(struct bonding *bond, struct slave *skipslave)
 	if (BOND_MODE(bond) == BOND_MODE_8023AD) {
 		struct ad_info ad_info;
 
+		spin_lock_bh(&bond->mode_lock);
 		if (bond_3ad_get_active_agg_info(bond, &ad_info)) {
+			spin_unlock_bh(&bond->mode_lock);
 			pr_debug("bond_3ad_get_active_agg_info failed\n");
 			/* No active aggragator means it's not safe to use
 			 * the previous array.
@@ -4414,6 +4414,7 @@ int bond_update_slave_arr(struct bonding *bond, struct slave *skipslave)
 			bond_reset_slave_arr(bond);
 			goto out;
 		}
+		spin_unlock_bh(&bond->mode_lock);
 		agg_id = ad_info.aggregator_id;
 	}
 	bond_for_each_slave(bond, slave, iter) {
-- 
1.7.12.4
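
For illustration only, below is a small stand-alone user-space model of
the race described above. It is not kernel code: mode_lock,
agg_is_active, slave_arr_valid, ad_work and update_slave_arr are
stand-ins for bond->mode_lock, agg->is_active, bond->slave_arr and the
two code paths. With the mutex held around the check, as the patch does
in the real code, the updater can never observe the transient "no
active aggregator" state; reading the flag outside the lock is what
opens the window.

/*
 * Stand-alone model of the race (illustration only, not kernel code).
 * Build: cc -O2 -pthread race_model.c -o race_model
 */
#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>

static pthread_mutex_t mode_lock = PTHREAD_MUTEX_INITIALIZER; /* bond->mode_lock */
static bool agg_is_active = true;   /* models agg->is_active          */
static bool slave_arr_valid = true; /* models bond->slave_arr != NULL */

/* CPU B: ad_work -> ad_agg_selection_logic re-selects the aggregator */
static void *ad_work(void *arg)
{
	(void)arg;
	for (int i = 0; i < 1000000; i++) {
		pthread_mutex_lock(&mode_lock);
		agg_is_active = false;  /* all is_active flags cleared    */
		agg_is_active = true;   /* same best aggregator re-chosen */
		pthread_mutex_unlock(&mode_lock);
	}
	return NULL;
}

/* CPU A: bond_open -> bond_update_slave_arr looks for an active agg.
 * Holding mode_lock around the check is what the patch adds; without
 * it this thread could see the transient all-inactive state and drop
 * a still-valid slave array.
 */
static void *update_slave_arr(void *arg)
{
	(void)arg;
	for (int i = 0; i < 1000000; i++) {
		pthread_mutex_lock(&mode_lock);
		if (!agg_is_active)
			slave_arr_valid = false; /* bond_reset_slave_arr() */
		pthread_mutex_unlock(&mode_lock);
	}
	return NULL;
}

int main(void)
{
	pthread_t a, b;

	pthread_create(&b, NULL, ad_work, NULL);
	pthread_create(&a, NULL, update_slave_arr, NULL);
	pthread_join(a, NULL);
	pthread_join(b, NULL);

	printf("slave_arr still valid: %s\n", slave_arr_valid ? "yes" : "no");
	return 0;
}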