From: 许春光
Date: Fri, 9 Jun 2023 11:23:02 +0800
Subject: Re: [RFC PATCH 0/4] nvme-tcp: fix hung issues for deleting
To: Ming Lei
Cc: kbusch@kernel.org, axboe@kernel.dk, hch@lst.de, sagi@grimberg.me,
 linux-nvme@lists.infradead.org, linux-kernel@vger.kernel.org

On Thu, Jun 8, 2023 at 21:51, Ming Lei wrote:
>
> On Thu, Jun 08, 2023 at 10:48:50AM +0800, 许春光 wrote:
> > On Thu, Jun 8, 2023 at 08:56, Ming Lei wrote:
> > >
> > > On Wed, Jun 07, 2023 at 12:09:17PM +0800, 许春光 wrote:
> > > > Hi Ming:
> > > >
> > > > On Tue, Jun 6, 2023 at 23:15, Ming Lei wrote:
> > > > >
> > > > > Hello Chunguang,
> > > > >
> > > > > On Mon, May 29, 2023 at 06:59:22PM +0800, brookxu.cn wrote:
> > > > > > From: Chunguang Xu
> > > > > >
> > > > > > We found that nvme_remove_namespaces() may hang in flush_work(&ctrl->scan_work)
> > > > > > while removing ctrl. The root cause may be that the state of ctrl changed to
> > > > > > NVME_CTRL_DELETING while removing ctrl, which interrupts nvme_tcp_error_recovery_work()/
> > > > > > nvme_reset_ctrl_work()/nvme_tcp_reconnect_or_remove(). At this time, ctrl is
> > > > >
> > > > > I didn't dig into ctrl state check in these error handlers yet, but error
> > > > > handling is supposed to provide forward progress for any controller state.
> > > > >
> > > > > Can you explain a bit how switching to DELETING interrupts the above
> > > > > error handling and breaks the forward progress guarantee?
> > > >
> > > > Here we have frozen ctrl. If the ctrl state has changed to DELETING or
> > > > DELETING_NOIO (by nvme disconnect), we bail out and leave ctrl
> > > > frozen, so nvme_remove_namespaces() hangs.
> > > >
> > > > static void nvme_tcp_error_recovery_work(struct work_struct *work)
> > > > {
> > > > 	...
> > > > 	if (!nvme_change_ctrl_state(ctrl, NVME_CTRL_CONNECTING)) {
> > > > 		/* state change failure is ok if we started ctrl delete */
> > > > 		WARN_ON_ONCE(ctrl->state != NVME_CTRL_DELETING &&
> > > > 			     ctrl->state != NVME_CTRL_DELETING_NOIO);
> > > > 		return;
> > > > 	}
> > > >
> > > > 	nvme_tcp_reconnect_or_remove(ctrl);
> > > > }
> > > >
> > > >
> > > > On another path, we check the ctrl state while reconnecting. If it has changed to
> > > > DELETING or DELETING_NOIO, we bail out and leave ctrl frozen and the
> > > > queues quiesced (through the reset path); as a result, a hang occurs.
> > > >
> > > > static void nvme_tcp_reconnect_or_remove(struct nvme_ctrl *ctrl)
> > > > {
> > > > 	/* If we are resetting/deleting then do nothing */
> > > > 	if (ctrl->state != NVME_CTRL_CONNECTING) {
> > > > 		WARN_ON_ONCE(ctrl->state == NVME_CTRL_NEW ||
> > > > 			     ctrl->state == NVME_CTRL_LIVE);
> > > > 		return;
> > > > 	}
> > > > 	...
> > > > }
> > > >
> > > > > > frozen and the queue is quiesced. Since scan_work may continue to issue IOs to
> > > > > > load the partition table, it gets blocked, which leads to nvme_tcp_error_recovery_work()
> > > > > > hanging in flush_work(&ctrl->scan_work).
> > > > > >
> > > > > > After analysis, we found that there are mainly two cases:
> > > > > > 1. Since ctrl is frozen, scan_work hangs in __bio_queue_enter() while it issues
> > > > > > new IO to load the partition table.
> > > > >
> > > > > Yeah, nvme freeze usage is fragile, and I suggested moving
> > > > > nvme_start_freeze() from nvme_tcp_teardown_io_queues to
> > > > > nvme_tcp_configure_io_queues(), such as the posted change on rdma:
> > > > >
> > > > > https://lore.kernel.org/linux-block/CAHj4cs-4gQHnp5aiekvJmb6o8qAcb6nLV61uOGFiisCzM49_dg@mail.gmail.com/T/#ma0d6bbfaa0c8c1be79738ff86a2fdcf7582e06b0
> > > >
> > > > While the driver is reconnecting, I think we should freeze ctrl or quiesce the queues,
> > > > otherwise nvme_fail_nonready_command() may return BLK_STS_RESOURCE,
> > > > and the IOs may retry frequently. So I think we had better freeze ctrl
> > > > on entering error_recovery/reconnect, but we need to unfreeze it on exit.
> > >
> > > quiescing is always done in error handling, and freeze is actually
> > > not a must, and it is easier to cause race by calling freeze & unfreeze
> > > from different contexts.
> >
> > I think if we do not freeze ctrl, IO that has already been submitted (just
> > queued to hctx->dispatch) may stay pending for a long time and trigger
> > new hung task issues, whereas freezing ctrl may avoid these hung tasks.
>
> How can the freeze make the difference? If driver/device can't move on,
> any request is stuck, so the IO path waits in either submit_bio() or
> upper layer after returning from submit_bio().
>

Currently error_recovery and reset ctrl are handled somewhat differently:

1. error_recovery freezes the controller, but later unquiesces the queues
   so that pending IO fails fast; otherwise that IO may cause hung tasks
   during the reconnect. So when the error_recovery work is interrupted,
   ctrl is left frozen and the queues are unquiesced. Thinking about it
   more carefully, new IO will still hang in enter_queue, so it seems
   this solution still does not work well. If we try to remove the freeze
   from nvme_tcp_teardown_io_queues(), I think we may also need to
   refactor error_recovery.

2. Reset ctrl freezes the controller and quiesces the queues at the same
   time, so when reset is interrupted, ctrl is left frozen and the queues
   are quiesced.

I think I get your point now. The proposal in
https://lore.kernel.org/linux-block/CAHj4cs-4gQHnp5aiekvJmb6o8qAcb6nLV61uOGFiisCzM49_dg@mail.gmail.com/T/#ma0d6bbfaa0c8c1be79738ff86a2fdcf7582e06b0
seems better.

> Thanks,
> Ming
>
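
To make the two interrupted paths described in this reply easier to compare, here is a minimal user-space sketch. It is only a toy model under stated assumptions, not driver code: every toy_* name is hypothetical, the state machine is reduced to four states, and CONNECTING is assumed to be reachable only from RESETTING. It just records which flags each path leaves set when a delete races in before the transition to CONNECTING.

```c
#include <stdbool.h>

/*
 * Toy user-space model. Every name here is hypothetical and none of it
 * is kernel code; it only tracks which flags each path leaves behind.
 */
enum toy_state { TOY_RESETTING, TOY_CONNECTING, TOY_LIVE, TOY_DELETING };

struct toy_ctrl {
	enum toy_state state;
	bool frozen;	/* new IO would block when entering the queue */
	bool quiesced;	/* already queued IO is not dispatched */
};

/* Assumption: CONNECTING can only be entered from RESETTING. */
static bool toy_change_state(struct toy_ctrl *c, enum toy_state next)
{
	if (next == TOY_CONNECTING && c->state != TOY_RESETTING)
		return false;
	c->state = next;
	return true;
}

/* Case 1: freeze, unquiesce so pending IO fails fast, then reconnect. */
static bool toy_error_recovery(struct toy_ctrl *c, bool delete_races)
{
	c->frozen = true;
	c->quiesced = false;
	if (delete_races)
		c->state = TOY_DELETING;	/* concurrent disconnect */
	if (!toy_change_state(c, TOY_CONNECTING))
		return false;			/* early return keeps the freeze */
	c->frozen = false;			/* reconnect done: unfreeze */
	return toy_change_state(c, TOY_LIVE);
}

/* Case 2: freeze and quiesce together, then reconnect. */
static bool toy_reset(struct toy_ctrl *c, bool delete_races)
{
	c->frozen = true;
	c->quiesced = true;
	if (delete_races)
		c->state = TOY_DELETING;	/* concurrent disconnect */
	if (!toy_change_state(c, TOY_CONNECTING))
		return false;			/* both flags stay set */
	c->quiesced = false;
	c->frozen = false;
	return toy_change_state(c, TOY_LIVE);
}

/* New IO (for example from scan_work) blocks while the ctrl is frozen. */
static bool toy_new_io_blocks(const struct toy_ctrl *c)
{
	return c->frozen;
}
```

In this model, both interrupted paths return with frozen still set, so toy_new_io_blocks() reports that new IO would block in either case; the only difference is whether quiesced is also left set.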