Date: Wed, 29 Nov 2023 12:14:33 +0200
From: Leon Romanovsky
To: Shifeng Li
Cc: "Ismail, Mustafa", "Saleem, Shiraz", "jgg@ziepe.ca", "linux-rdma@vger.kernel.org", "linux-kernel@vger.kernel.org", "Ding, Hui"
Subject: Re: [PATCH] RDMA/irdma: Avoid free the non-cqp_request scratch
Message-ID: <20231129101433.GC6535@unreal>
References: <20231120083122.78532-1-lishifeng1992@126.com> <15f80347-e9e9-49f9-bcab-784974856332@126.com>
In-Reply-To: <15f80347-e9e9-49f9-bcab-784974856332@126.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Fri, Nov 24, 2023 at 10:28:58PM +0800, Shifeng Li wrote:
> On 2023/11/22 1:25, Ismail, Mustafa wrote:
> > > -----Original Message-----
> > > From: Shifeng Li
> > > Sent: Monday, November 20, 2023 2:31 AM
> > > To: Ismail, Mustafa
> > > Cc: Saleem, Shiraz; jgg@ziepe.ca;
> > > leon@kernel.org; linux-rdma@vger.kernel.org; linux-kernel@vger.kernel.org;
> > > Ding, Hui; Shifeng Li
> > > Subject: [PATCH] RDMA/irdma: Avoid free the non-cqp_request scratch
> > >
> > > When creating ceq_0 during probing irdma, cqp.sc_cqp will be sent as a
> > > cqp_request to cqp->sc_cqp.sq_ring.
> > > If the request is pending when removing the irdma driver or unplugging
> > > its aux device, cqp.sc_cqp will be dereferenced as wrong struct in
> > > irdma_free_pending_cqp_request().
> > >
> > > crash> bt 3669
> > > PID: 3669  TASK: ffff88aef892c000  CPU: 28  COMMAND: "kworker/28:0"
> > >  #0 [fffffe0000549e38] crash_nmi_callback at ffffffff810e3a34
> > >  #1 [fffffe0000549e40] nmi_handle at ffffffff810788b2
> > >  #2 [fffffe0000549ea0] default_do_nmi at ffffffff8107938f
> > >  #3 [fffffe0000549eb8] do_nmi at ffffffff81079582
> > >  #4 [fffffe0000549ef0] end_repeat_nmi at ffffffff82e016b4
> > >     [exception RIP: native_queued_spin_lock_slowpath+1291]
> > >     RIP: ffffffff8127e72b  RSP: ffff88aa841ef778  RFLAGS: 00000046
> > >     RAX: 0000000000000000  RBX: ffff88b01f849700  RCX: ffffffff8127e47e
> > >     RDX: 0000000000000000  RSI: 0000000000000004  RDI: ffffffff83857ec0
> > >     RBP: ffff88afe3e4efc8   R8: ffffed15fc7c9dfa   R9: ffffed15fc7c9dfa
> > >     R10: 0000000000000001  R11: ffffed15fc7c9df9  R12: 0000000000740000
> > >     R13: ffff88b01f849708  R14: 0000000000000003  R15: ffffed1603f092e1
> > >     ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0000
> > > --- ---
> > >  #5 [ffff88aa841ef778] native_queued_spin_lock_slowpath at ffffffff8127e72b
> > >  #6 [ffff88aa841ef7b0] _raw_spin_lock_irqsave at ffffffff82c22aa4
> > >  #7 [ffff88aa841ef7c8] __wake_up_common_lock at ffffffff81257363
> > >  #8 [ffff88aa841ef888] irdma_free_pending_cqp_request at ffffffffa0ba12cc [irdma]
> > >  #9 [ffff88aa841ef958] irdma_cleanup_pending_cqp_op at ffffffffa0ba1469 [irdma]
> > > #10 [ffff88aa841ef9c0] irdma_ctrl_deinit_hw at ffffffffa0b2989f [irdma]
> > > #11 [ffff88aa841efa28] irdma_remove at ffffffffa0b252df [irdma]
> > > #12 [ffff88aa841efae8] auxiliary_bus_remove at ffffffff8219afdb
> > > #13 [ffff88aa841efb00] device_release_driver_internal at ffffffff821882e6
> > > #14 [ffff88aa841efb38] bus_remove_device at ffffffff82184278
> > > #15 [ffff88aa841efb88] device_del at ffffffff82179d23
> > > #16 [ffff88aa841efc48] ice_unplug_aux_dev at ffffffffa0eb1c14 [ice]
> > > #17 [ffff88aa841efc68] ice_service_task at ffffffffa0d88201 [ice]
> > > #18 [ffff88aa841efde8] process_one_work at ffffffff811c589a
> > > #19 [ffff88aa841efe60] worker_thread at ffffffff811c71ff
> > > #20 [ffff88aa841eff10] kthread at ffffffff811d87a0
> > > #21 [ffff88aa841eff50] ret_from_fork at ffffffff82e0022f
> > >
> > > Fixes: 44d9e52977a1 ("RDMA/irdma: Implement device initialization definitions")
> > > Signed-off-by: Shifeng Li
> > > ---
> > >  drivers/infiniband/hw/irdma/utils.c | 2 +-
> > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > >
> > > diff --git a/drivers/infiniband/hw/irdma/utils.c b/drivers/infiniband/hw/irdma/utils.c
> > > index 445e69e86409..222ec1f761a1 100644
> > > --- a/drivers/infiniband/hw/irdma/utils.c
> > > +++ b/drivers/infiniband/hw/irdma/utils.c
> > > @@ -541,7 +541,7 @@ void irdma_cleanup_pending_cqp_op(struct irdma_pci_f *rf)
> > >  	for (i = 0; i < pending_work; i++) {
> > >  		cqp_request = (struct irdma_cqp_request *)(unsigned long)
> > >  				      cqp->scratch_array[wqe_idx];
> > > -		if (cqp_request)
> > > +		if (cqp_request && cqp_request != (struct irdma_cqp_request *)&cqp->sc_cqp)
> > >  			irdma_free_pending_cqp_request(cqp, cqp_request);
> > >  		wqe_idx = (wqe_idx + 1) % IRDMA_RING_SIZE(cqp->sc_cqp.sq_ring);
> > >  	}
> > > --
> > > 2.25.1
> >
> > Hi Li,
> >
> > Could you describe how you hit this issue? It seems like the probe would not
> > have completed successfully if the create cceq request was still pending.
> >
> > A better fix might be to set the scratch to 0 in this case: in irdma_create_ceq(),
> > pass 0 for scratch in the call to irdma_sc_cceq_create(&iwceq->sc_ceq, 0), since
> > it doesn't appear to be used in this case. Is it possible to test this fix? Thanks!
>
> 1. The firmware of the network card, or the hardware itself, may enter some kind
> of error state. Using crash, I observed that the sq_ring of sc_cqp was full.
> This means that all 2048 cqp_requests were pending.
>
> crash> struct irdma_cqp.sc_cqp ffff88afe3e4eee8
>   sc_cqp = {
>     ...
>     sq_ring = {
>       head = 91,    (The queue head caught up with the queue tail.)
>       tail = 92,
>       size = 2048
>     },
>     ...
>   }
>
> 2. The issue is not reproducible, but I have tested the suggested solution of
> passing 0 for scratch in the call to irdma_sc_cceq_create(&iwceq->sc_ceq, 0)
> in irdma_create_ceq(), and it works well.

Will you submit v2?

Thanks

>
> Thanks!
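For illustration only, the cleanup walk discussed in this thread can be modelled in plain userspace C. This is a minimal sketch under stated assumptions, not the irdma code: struct ring, struct cqp_request, ring_used(), cleanup_pending() and demo_cleanup() are hypothetical names, heap allocations stand in for real cqp_request objects, and the used-slot formula assumes a ring that always leaves one slot empty. Under that assumption, the sq_ring state reported above (head = 91, tail = 92, size = 2048) works out to 2047 occupied slots, which is what a full ring looks like. The sketch shows why passing 0 as the scratch value is the simpler fix: the existing loop already skips zero entries, so no extra comparison against &cqp->sc_cqp is needed.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the driver structures; names and fields are
 * illustrative only, not the real irdma definitions. */
struct ring {
	uint32_t head; /* next slot the producer will fill (assumed meaning) */
	uint32_t tail; /* oldest slot not yet completed (assumed meaning) */
	uint32_t size; /* number of slots */
};

struct cqp_request {
	uint32_t slot; /* slot the request was posted to */
};

/* Assumed convention: one slot stays empty, so head == tail means empty
 * and a used count of size - 1 means full. */
uint32_t ring_used(const struct ring *r)
{
	return (r->head + r->size - r->tail) % r->size;
}

/* Walk every pending slot from tail toward head, wrapping at size, and free
 * only slots whose scratch value is non-zero, i.e. a real cqp_request.
 * Returns the number of requests freed. */
uint32_t cleanup_pending(const struct ring *r, uint64_t *scratch)
{
	uint32_t pending = ring_used(r);
	uint32_t idx = r->tail;
	uint32_t freed = 0;

	for (uint32_t i = 0; i < pending; i++) {
		struct cqp_request *req =
			(struct cqp_request *)(uintptr_t)scratch[idx];

		if (req) {
			free(req);
			scratch[idx] = 0;
			freed++;
		}
		idx = (idx + 1) % r->size;
	}
	return freed;
}

/* Builds an 8-slot ring with tail = 6 and head = 3, so slots 6, 7, 0, 1
 * and 2 are pending. Slot 7 models a WQE posted with scratch = 0 (as with
 * the suggested irdma_sc_cceq_create(&iwceq->sc_ceq, 0)); every other
 * pending slot holds a heap-allocated request. Returns the freed count. */
uint32_t demo_cleanup(void)
{
	enum { SIZE = 8 };
	uint64_t scratch[SIZE] = {0};
	struct ring r = { .head = 3, .tail = 6, .size = SIZE };
	uint32_t idx = r.tail;

	assert(ring_used(&r) == 5);
	for (uint32_t i = 0; i < ring_used(&r); i++) {
		if (idx != 7) {
			struct cqp_request *req = malloc(sizeof(*req));

			if (!req)
				break; /* out of memory: free what was posted */
			req->slot = idx;
			scratch[idx] = (uint64_t)(uintptr_t)req;
		}
		idx = (idx + 1) % SIZE;
	}
	return cleanup_pending(&r, scratch);
}
```

In this model demo_cleanup() returns 4: five slots are pending, and the slot posted with scratch 0 is skipped instead of being freed as a request. The v1 patch reaches the same result by comparing each pointer against &cqp->sc_cqp; passing 0 instead keeps every non-zero scratch value a genuine cqp_request, so the cleanup loop needs no special case.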