Received: by 2002:a05:6a10:413:0:0:0:0 with SMTP id 19csp1637987pxp; Thu, 17 Mar 2022 13:18:12 -0700 (PDT) X-Google-Smtp-Source: ABdhPJwcSQucU/HYWcS3HbSYXWQtBVHSruECuTOTCIEeipm3MhFcuLr+qvaYxeiCQJ91lBK77mL2 X-Received: by 2002:a17:90b:4c08:b0:1c6:40e4:776c with SMTP id na8-20020a17090b4c0800b001c640e4776cmr7317569pjb.237.1647548292290; Thu, 17 Mar 2022 13:18:12 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1647548292; cv=none; d=google.com; s=arc-20160816; b=Nf7j/n0/ojKekHvwPpsN+66LjavBIuNFPiTtHhvznj2Wekk3YlYO1LRy/VMhNkhlUV gedoMzQG1EWvgMwbVU42OrwhvdjotLk73SrYJmxb/D054iyoDLdZILz2O8CLzunDfVa3 +o9MKn6B14Yjt6X6rmtWRugjBy33PuUudF/Ibvo8gwFYLsCkvb+VuKzIX0pjAR0yVMw5 ckuv/ny4wFK3ii6J573z5PK6+DFkBICc96yo/Clu9nGPKroaTg3xJ9j2tTaex20dB2Fd QwJqKGv2xCvYhEC2d5/pSXUNBbBeggPNKDwrCidI+5kD0fvnHTBcpOihg9RTR9CxOWDz 7d7g== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :user-agent:references:in-reply-to:message-id:date:subject:cc:to :from:dkim-signature; bh=qoD8ZvmqrJsNk+2ce8aDTG5tqn+i+fD8ZKXwR8RqlxY=; b=dBNMS656YSWoLwp7ECNikD/24ALxr/zoWrygcYzebSpHvNvc/6syHAZ3hNFvgnAMTz wUUm67SCi9cGCfVTm7b6kfoMoUt4/IaeRPdcLjdJkgY9mT9Yq2s41yaqEIq9CvVUq0lG uv2zBnWz9eSps0nF/fO82PAfWu1VyNNDGjPdDqhgmO6zIIycvHfms4B1PxajO3V0MsIF 2ajqihznn56Y895JpgslAp8bSivMK2fvclklipOYeT7Qu6Z2Cqchh+598keczk0dupn+ ACacwZWD43EZCE99+mUJ6WKfRm6p1OmNawwiSJA5xYVzIjq+wVtD7GvqLKRfaFkTfXWA aiqg== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@linuxfoundation.org header.s=korg header.b=QPDURMUM; spf=softfail (google.com: domain of transitioning linux-kernel-owner@vger.kernel.org does not designate 23.128.96.19 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=linuxfoundation.org Return-Path: Received: from lindbergh.monkeyblade.net (lindbergh.monkeyblade.net. [23.128.96.19]) by mx.google.com with ESMTPS id h7-20020a056a001a4700b004e1bb6519e7si6242946pfv.76.2022.03.17.13.18.12 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 17 Mar 2022 13:18:12 -0700 (PDT) Received-SPF: softfail (google.com: domain of transitioning linux-kernel-owner@vger.kernel.org does not designate 23.128.96.19 as permitted sender) client-ip=23.128.96.19; Authentication-Results: mx.google.com; dkim=pass header.i=@linuxfoundation.org header.s=korg header.b=QPDURMUM; spf=softfail (google.com: domain of transitioning linux-kernel-owner@vger.kernel.org does not designate 23.128.96.19 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=linuxfoundation.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by lindbergh.monkeyblade.net (Postfix) with ESMTP id B218B14FB90; Thu, 17 Mar 2022 12:59:56 -0700 (PDT) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S234370AbiCQM5S (ORCPT + 99 others); Thu, 17 Mar 2022 08:57:18 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:39666 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S234914AbiCQMyC (ORCPT ); Thu, 17 Mar 2022 08:54:02 -0400 Received: from dfw.source.kernel.org (dfw.source.kernel.org [139.178.84.217]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id D26C41F0CB7; Thu, 17 Mar 2022 05:52:25 -0700 (PDT) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dfw.source.kernel.org (Postfix) with ESMTPS id 6E2CB614F0; Thu, 17 Mar 2022 12:52:25 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 56D1FC340ED; Thu, 17 Mar 2022 12:52:24 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=linuxfoundation.org; s=korg; t=1647521544; bh=BCx64xO3Wbry2wCPfTZARKe6NTiar7S30FYEF5kz4GU=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=QPDURMUMwy71KDAbVCC7pgGMX+xEoPVjU+RTZVFpqjksuGOWR22k8+pX+bop0q2ZC 1DGmhbsDtaCGR9erSrD0w59sLBOg/EnNF47KvyMkKTuxZeuQpyjDr0I53ZSLvmx5go asaQPH8C62dEuBMZSx7OH1e3kT+TEShNtS4v/tig= From: Greg Kroah-Hartman To: linux-kernel@vger.kernel.org Cc: Greg Kroah-Hartman , stable@vger.kernel.org, Ivan Vecera , Petr Oros , Dave Ertman , Jakub Kicinski Subject: [PATCH 5.15 25/25] ice: Fix race condition during interface enslave Date: Thu, 17 Mar 2022 13:46:12 +0100 Message-Id: <20220317124527.024081334@linuxfoundation.org> X-Mailer: git-send-email 2.35.1 In-Reply-To: <20220317124526.308079100@linuxfoundation.org> References: <20220317124526.308079100@linuxfoundation.org> User-Agent: quilt/0.66 MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit X-Spam-Status: No, score=-3.5 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS, MAILING_LIST_MULTI,RDNS_NONE,SPF_HELO_NONE,T_SCC_BODY_TEXT_LINE autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org From: Ivan Vecera commit 5cb1ebdbc4342b1c2ce89516e19808d64417bdbc upstream. Commit 5dbbbd01cbba83 ("ice: Avoid RTNL lock when re-creating auxiliary device") changes a process of re-creation of aux device so ice_plug_aux_dev() is called from ice_service_task() context. This unfortunately opens a race window that can result in dead-lock when interface has left LAG and immediately enters LAG again. Reproducer: ``` #!/bin/sh ip link add lag0 type bond mode 1 miimon 100 ip link set lag0 for n in {1..10}; do echo Cycle: $n ip link set ens7f0 master lag0 sleep 1 ip link set ens7f0 nomaster done ``` This results in: [20976.208697] Workqueue: ice ice_service_task [ice] [20976.213422] Call Trace: [20976.215871] __schedule+0x2d1/0x830 [20976.219364] schedule+0x35/0xa0 [20976.222510] schedule_preempt_disabled+0xa/0x10 [20976.227043] __mutex_lock.isra.7+0x310/0x420 [20976.235071] enum_all_gids_of_dev_cb+0x1c/0x100 [ib_core] [20976.251215] ib_enum_roce_netdev+0xa4/0xe0 [ib_core] [20976.256192] ib_cache_setup_one+0x33/0xa0 [ib_core] [20976.261079] ib_register_device+0x40d/0x580 [ib_core] [20976.266139] irdma_ib_register_device+0x129/0x250 [irdma] [20976.281409] irdma_probe+0x2c1/0x360 [irdma] [20976.285691] auxiliary_bus_probe+0x45/0x70 [20976.289790] really_probe+0x1f2/0x480 [20976.298509] driver_probe_device+0x49/0xc0 [20976.302609] bus_for_each_drv+0x79/0xc0 [20976.306448] __device_attach+0xdc/0x160 [20976.310286] bus_probe_device+0x9d/0xb0 [20976.314128] device_add+0x43c/0x890 [20976.321287] __auxiliary_device_add+0x43/0x60 [20976.325644] ice_plug_aux_dev+0xb2/0x100 [ice] [20976.330109] ice_service_task+0xd0c/0xed0 [ice] [20976.342591] process_one_work+0x1a7/0x360 [20976.350536] worker_thread+0x30/0x390 [20976.358128] kthread+0x10a/0x120 [20976.365547] ret_from_fork+0x1f/0x40 ... [20976.438030] task:ip state:D stack: 0 pid:213658 ppid:213627 flags:0x00004084 [20976.446469] Call Trace: [20976.448921] __schedule+0x2d1/0x830 [20976.452414] schedule+0x35/0xa0 [20976.455559] schedule_preempt_disabled+0xa/0x10 [20976.460090] __mutex_lock.isra.7+0x310/0x420 [20976.464364] device_del+0x36/0x3c0 [20976.467772] ice_unplug_aux_dev+0x1a/0x40 [ice] [20976.472313] ice_lag_event_handler+0x2a2/0x520 [ice] [20976.477288] notifier_call_chain+0x47/0x70 [20976.481386] __netdev_upper_dev_link+0x18b/0x280 [20976.489845] bond_enslave+0xe05/0x1790 [bonding] [20976.494475] do_setlink+0x336/0xf50 [20976.502517] __rtnl_newlink+0x529/0x8b0 [20976.543441] rtnl_newlink+0x43/0x60 [20976.546934] rtnetlink_rcv_msg+0x2b1/0x360 [20976.559238] netlink_rcv_skb+0x4c/0x120 [20976.563079] netlink_unicast+0x196/0x230 [20976.567005] netlink_sendmsg+0x204/0x3d0 [20976.570930] sock_sendmsg+0x4c/0x50 [20976.574423] ____sys_sendmsg+0x1eb/0x250 [20976.586807] ___sys_sendmsg+0x7c/0xc0 [20976.606353] __sys_sendmsg+0x57/0xa0 [20976.609930] do_syscall_64+0x5b/0x1a0 [20976.613598] entry_SYSCALL_64_after_hwframe+0x65/0xca 1. Command 'ip link ... set nomaster' causes that ice_plug_aux_dev() is called from ice_service_task() context, aux device is created and associated device->lock is taken. 2. Command 'ip link ... set master...' calls ice's notifier under RTNL lock and that notifier calls ice_unplug_aux_dev(). That function tries to take aux device->lock but this is already taken by ice_plug_aux_dev() in step 1 3. Later ice_plug_aux_dev() tries to take RTNL lock but this is already taken in step 2 4. Dead-lock The patch fixes this issue by following changes: - Bit ICE_FLAG_PLUG_AUX_DEV is kept to be set during ice_plug_aux_dev() call in ice_service_task() - The bit is checked in ice_clear_rdma_cap() and only if it is not set then ice_unplug_aux_dev() is called. If it is set (in other words plugging of aux device was requested and ice_plug_aux_dev() is potentially running) then the function only clears the bit - Once ice_plug_aux_dev() call (in ice_service_task) is finished the bit ICE_FLAG_PLUG_AUX_DEV is cleared but it is also checked whether it was already cleared by ice_clear_rdma_cap(). If so then aux device is unplugged. Signed-off-by: Ivan Vecera Co-developed-by: Petr Oros Signed-off-by: Petr Oros Reviewed-by: Dave Ertman Link: https://lore.kernel.org/r/20220310171641.3863659-1-ivecera@redhat.com Signed-off-by: Jakub Kicinski Signed-off-by: Greg Kroah-Hartman --- drivers/net/ethernet/intel/ice/ice.h | 11 ++++++++++- drivers/net/ethernet/intel/ice/ice_main.c | 12 +++++++++++- 2 files changed, 21 insertions(+), 2 deletions(-) --- a/drivers/net/ethernet/intel/ice/ice.h +++ b/drivers/net/ethernet/intel/ice/ice.h @@ -703,7 +703,16 @@ static inline void ice_set_rdma_cap(stru */ static inline void ice_clear_rdma_cap(struct ice_pf *pf) { - ice_unplug_aux_dev(pf); + /* We can directly unplug aux device here only if the flag bit + * ICE_FLAG_PLUG_AUX_DEV is not set because ice_unplug_aux_dev() + * could race with ice_plug_aux_dev() called from + * ice_service_task(). In this case we only clear that bit now and + * aux device will be unplugged later once ice_plug_aux_device() + * called from ice_service_task() finishes (see ice_service_task()). + */ + if (!test_and_clear_bit(ICE_FLAG_PLUG_AUX_DEV, pf->flags)) + ice_unplug_aux_dev(pf); + clear_bit(ICE_FLAG_RDMA_ENA, pf->flags); clear_bit(ICE_FLAG_AUX_ENA, pf->flags); } --- a/drivers/net/ethernet/intel/ice/ice_main.c +++ b/drivers/net/ethernet/intel/ice/ice_main.c @@ -2143,9 +2143,19 @@ static void ice_service_task(struct work return; } - if (test_and_clear_bit(ICE_FLAG_PLUG_AUX_DEV, pf->flags)) + if (test_bit(ICE_FLAG_PLUG_AUX_DEV, pf->flags)) { + /* Plug aux device per request */ ice_plug_aux_dev(pf); + /* Mark plugging as done but check whether unplug was + * requested during ice_plug_aux_dev() call + * (e.g. from ice_clear_rdma_cap()) and if so then + * plug aux device. + */ + if (!test_and_clear_bit(ICE_FLAG_PLUG_AUX_DEV, pf->flags)) + ice_unplug_aux_dev(pf); + } + if (test_and_clear_bit(ICE_FLAG_MTU_CHANGED, pf->flags)) { struct iidc_event *event;