References: <20231211130426.1500427-1-neelx@redhat.com> <20231211130426.1500427-2-neelx@redhat.com> <20231211134542.GG4870@unreal>
In-Reply-To: <20231211134542.GG4870@unreal>
From: Daniel Vacek
Date: Mon, 11 Dec 2023 15:25:39 +0100
Subject: Re: [PATCH 1/2] IB/ipoib: Fix mcast list locking
To: Leon Romanovsky
Cc: Jason Gunthorpe, linux-rdma@vger.kernel.org, linux-kernel@vger.kernel.org, Yuya Fujita-bishamonten

On Mon, Dec 11, 2023 at 2:45 PM Leon Romanovsky wrote:
>
> On Mon, Dec 11, 2023 at 02:04:24PM +0100, Daniel Vacek wrote:
> > We need an additional protection against list removal between ipoib_mcast_join_task()
> > and ipoib_mcast_dev_flush() in case the &priv->lock needs to be dropped while
> > iterating the &priv->multicast_list in ipoib_mcast_join_task().
> > If the mcast
> > is removed while the lock was dropped, the for loop spins forever, resulting
> > in a hard lockup (as was reported on the RHEL 4.18.0-372.75.1.el8_6 kernel):
> >
> > Task A (kworker/u72:2 below)          | Task B (kworker/u72:0 below)
> > --------------------------------------+--------------------------------------
> > ipoib_mcast_join_task(work)           | ipoib_ib_dev_flush_light(work)
> >   spin_lock_irq(&priv->lock)          |   __ipoib_ib_dev_flush(priv, ...)
> >   list_for_each_entry(mcast,          |     ipoib_mcast_dev_flush(dev = priv->dev)
> >     &priv->multicast_list, list)      |       mutex_lock(&priv->mcast_mutex)
> >     ipoib_mcast_join(dev, mcast)      |
> >       spin_unlock_irq(&priv->lock)    |
> >                                       |       spin_lock_irqsave(&priv->lock, flags)
> >                                       |       list_for_each_entry_safe(mcast, tmcast,
> >                                       |                  &priv->multicast_list, list)
> >                                       |         list_del(&mcast->list);
> >                                       |         list_add_tail(&mcast->list, &remove_list)
> >                                       |       spin_unlock_irqrestore(&priv->lock, flags)
> >       spin_lock_irq(&priv->lock)      |
> >                                       |       ipoib_mcast_remove_list(&remove_list)
> > (Here, mcast is no longer on the      |         list_for_each_entry_safe(mcast, tmcast,
> > &priv->multicast_list and we keep     |                        remove_list, list)
> > spinning on the &remove_list of the   | >>>       wait_for_completion(&mcast->done)
> > other thread, which is blocked and    |
> > the list is still valid on its        |
> > stack.)                               |       mutex_unlock(&priv->mcast_mutex)
> >
> > Fix this by adding mutex_lock(&priv->mcast_mutex) to ipoib_mcast_join_task().
>
> I don't entirely understand the issue and the proposed solution.
> There is only one spin_unlock_irq() in the middle of
> list_for_each_entry(mcast, &priv->multicast_list, list)
> and it is right before the return statement which will break the loop. So
> how will the loop spin forever?

There's another unlock/lock pair around the ib_sa_join_multicast() call in
ipoib_mcast_join(), regardless of the outcome of the condition.
ib_sa_join_multicast() cannot be called with the lock held because its
GFP_KERNEL allocation may sleep.
That's what's causing the issue.

Actually, if you check the code, the lock is released and re-acquired only
when the mentioned condition is false (i.e. when the loop is not broken by
returning from ipoib_mcast_join_task()), creating the window for
ipoib_ib_dev_flush_light()/ipoib_mcast_dev_flush() to break the list.

The vmcore data confirms that clearly.

--nX

> Thanks
>
> > Unfortunately we could not reproduce the lockup and confirm this fix but
> > based on the code review I think this fix should address such lockups.
> >
> > crash> bc 31
> > PID: 747     TASK: ff1c6a1a007e8000     CPU: 31    COMMAND: "kworker/u72:2"
> > --
> > [exception RIP: ipoib_mcast_join_task+0x1b1]
> > RIP: ffffffffc0944ac1  RSP: ff646f199a8c7e00  RFLAGS: 00000002
> > RAX: 0000000000000000  RBX: ff1c6a1a04dc82f8  RCX: 0000000000000000
> >                        work (&priv->mcast_task{,.work})
> > RDX: ff1c6a192d60ac68  RSI: 0000000000000286  RDI: ff1c6a1a04dc8000
> >      &mcast->list
> > RBP: ff646f199a8c7e90  R8:  ff1c699980019420  R9:  ff1c6a1920c9a000
> > R10: ff646f199a8c7e00  R11: ff1c6a191a7d9800  R12: ff1c6a192d60ac00
> >                                               mcast
> > R13: ff1c6a1d82200000  R14: ff1c6a1a04dc8000  R15: ff1c6a1a04dc82d8
> >      dev                    priv (&priv->lock)     &priv->multicast_list (aka head)
> > ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
> > --- ---
> > #5 [ff646f199a8c7e00] ipoib_mcast_join_task+0x1b1 at ffffffffc0944ac1 [ib_ipoib]
> > #6 [ff646f199a8c7e98] process_one_work+0x1a7 at ffffffff9bf10967
> >
> > crash> rx ff646f199a8c7e68
> > ff646f199a8c7e68: ff1c6a1a04dc82f8 <<< work = &priv->mcast_task.work
> >
> > crash> list -hO ipoib_dev_priv.multicast_list ff1c6a1a04dc8000
> > (empty)
> >
> > crash> ipoib_dev_priv.mcast_task.work.func,mcast_mutex.owner.counter ff1c6a1a04dc8000
> >   mcast_task.work.func = 0xffffffffc0944910 ,
> >   mcast_mutex.owner.counter = 0xff1c69998efec000
> >
> > crash> b 8
> > PID: 8       TASK: ff1c69998efec000     CPU: 33    COMMAND: "kworker/u72:0"
> > --
> > #3 [ff646f1980153d50] wait_for_completion+0x96 at ffffffff9c7d7646
> > #4 [ff646f1980153d90] ipoib_mcast_remove_list+0x56 at ffffffffc0944dc6 [ib_ipoib]
> > #5 [ff646f1980153de8] ipoib_mcast_dev_flush+0x1a7 at ffffffffc09455a7 [ib_ipoib]
> > #6 [ff646f1980153e58] __ipoib_ib_dev_flush+0x1a4 at ffffffffc09431a4 [ib_ipoib]
> > #7 [ff646f1980153e98] process_one_work+0x1a7 at ffffffff9bf10967
> >
> > crash> rx ff646f1980153e68
> > ff646f1980153e68: ff1c6a1a04dc83f0 <<< work = &priv->flush_light
> >
> > crash> ipoib_dev_priv.flush_light.func,broadcast ff1c6a1a04dc8000
> >   flush_light.func = 0xffffffffc0943820 ,
> >   broadcast = 0x0,
> >
> > The mcast(s) on the &remove_list (the remaining part of the ex &priv->multicast_list):
> >
> > crash> list -s ipoib_mcast.done.done ipoib_mcast.list -H ff646f1980153e10 | paste - -
> > ff1c6a192bd0c200  done.done = 0x0,
> > ff1c6a192d60ac00  done.done = 0x0,
> >
> > Reported-by: Yuya Fujita-bishamonten
> > Signed-off-by: Daniel Vacek
> > ---
> >  drivers/infiniband/ulp/ipoib/ipoib_multicast.c | 3 +++
> >  1 file changed, 3 insertions(+)
> >
> > diff --git a/drivers/infiniband/ulp/ipoib/ipoib_multicast.c b/drivers/infiniband/ulp/ipoib/ipoib_multicast.c
> > index 5b3154503bf4..8e4f2c8839be 100644
> > --- a/drivers/infiniband/ulp/ipoib/ipoib_multicast.c
> > +++ b/drivers/infiniband/ulp/ipoib/ipoib_multicast.c
> > @@ -580,6 +580,7 @@ void ipoib_mcast_join_task(struct work_struct *work)
> >       }
> >       netif_addr_unlock_bh(dev);
> >
> > +     mutex_lock(&priv->mcast_mutex);
> >       spin_lock_irq(&priv->lock);
> >       if (!test_bit(IPOIB_FLAG_OPER_UP, &priv->flags))
> >               goto out;
> > @@ -634,6 +635,7 @@ void ipoib_mcast_join_task(struct work_struct *work)
> >                       /* Found the next unjoined group */
> >                       if (ipoib_mcast_join(dev, mcast)) {
> >                               spin_unlock_irq(&priv->lock);
> > +                             mutex_unlock(&priv->mcast_mutex);
> >                               return;
> >                       }
> >               } else if (!delay_until ||
> > @@ -655,6 +657,7 @@ void ipoib_mcast_join_task(struct work_struct *work)
> >               ipoib_mcast_join(dev, mcast);
> >
> >       spin_unlock_irq(&priv->lock);
> > +     mutex_unlock(&priv->mcast_mutex);
> >  }
> >
> >  void ipoib_mcast_start_thread(struct net_device *dev)
> > --
> > 2.43.0
> >