Subject: Re: [PATCH v2] nvme-multipath: Early exit if no path is available
To: Hannes Reinecke, Sagi Grimberg, Daniel Wagner
CC: Jens Axboe, Keith Busch, Christoph Hellwig
References: <20210127103033.15318-1-dwagner@suse.de> <20210128075837.u5u56t23fq5gu6ou@beryllium.lan> <69575290-200e-b4a1-4269-c71e4c2cc37b@huawei.com> <20210128094004.erwnszjqcxlsi2kd@beryllium.lan> <675d3cf7-1ae8-adc5-b6d0-359fe10f6b23@grimberg.me> <59cd053e-46cb-0235-141f-4ce919c93f48@huawei.com> <65392653-6b03-9195-f686-5fe4b3290bd2@suse.de> <81b22bbf-4dd3-6161-e63a-9699690a4e4f@huawei.com>
 <715dd943-0587-be08-2840-e0948cf0bc62@suse.de> <6ceff3cb-c9e9-7e74-92f0-dd745987c943@huawei.com> <114751ac-1f7d-ce5e-12c5-7d6303bdb999@suse.de>
From: Chao Leng
Date: Mon, 1 Feb 2021 16:47:50 +0800
In-Reply-To: <114751ac-1f7d-ce5e-12c5-7d6303bdb999@suse.de>
X-Mailing-List: linux-kernel@vger.kernel.org

On 2021/2/1 15:29, Hannes Reinecke wrote:
> On 2/1/21 3:16 AM, Chao Leng wrote:
>>
>> On 2021/1/29 17:20, Hannes Reinecke wrote:
>>> On 1/29/21 9:46 AM, Chao Leng wrote:
>>>>
>>>> On 2021/1/29 16:33, Hannes Reinecke wrote:
>>>>> On 1/29/21 8:45 AM, Chao Leng wrote:
>>>>>>
>>>>>> On 2021/1/29 15:06, Hannes Reinecke wrote:
>>>>>>> On 1/29/21 4:07 AM, Chao Leng wrote:
>>>>>>>>
>>>>>>>> On 2021/1/29 9:42, Sagi Grimberg wrote:
>>>>>>>>>
>>>>>>>>>>> You can't see exactly where it dies, but I followed the assembly to
>>>>>>>>>>> nvme_round_robin_path(). Maybe it's not the initial nvme_next_ns(head,
>>>>>>>>>>> old) which returns NULL, but nvme_next_ns() eventually returning NULL
>>>>>>>>>>> (list_next_or_null_rcu()).
>>>>>>>>>> So some other bug is making nvme_next_ns() misbehave.
>>>>>>>>>> Reviewing the code around head->list and head->current_path, I found
>>>>>>>>>> two bugs that may cause this.
>>>>>>>>>> First, one I have already sent a patch for, see:
>>>>>>>>>> https://lore.kernel.org/linux-nvme/20210128033351.22116-1-lengchao@huawei.com/
>>>>>>>>>> Second, in nvme_ns_remove(), list_del_rcu() runs before
>>>>>>>>>> nvme_mpath_clear_current_path(). So "old" can already be deleted from
>>>>>>>>>> the "head" list while it is still in use.
>>>>>>>>>> I'm not sure whether there is any other
>>>>>>>>>> consideration here; I will check it and try to fix it.
>>>>>>>>>
>>>>>>>>> The reason why we first remove from head->list and only then clear
>>>>>>>>> current_path is that the other way around there is no way to
>>>>>>>>> guarantee that the ns won't be assigned as current_path again
>>>>>>>>> (because it is still in head->list).
>>>>>>>> ok, I see.
>>>>>>>>>
>>>>>>>>> nvme_ns_remove() fences the rest of the deletion of the ns by
>>>>>>>>> synchronizing the srcu, so that the current_path clearance is
>>>>>>>>> guaranteed to be visible.
>>>>>>>> The list will then look like this:
>>>>>>>> head->next = ns1;
>>>>>>>> ns1->next = head;
>>>>>>>> old->next = ns1;
>>>>>>>
>>>>>>> Where does 'old' point to?
>>>>>>>
>>>>>>>> This may cause an infinite loop in nvme_round_robin_path():
>>>>>>>> for (ns = nvme_next_ns(head, old);
>>>>>>>>      ns != old;
>>>>>>>>      ns = nvme_next_ns(head, ns))
>>>>>>>> The ns will always be ns1, so the loop never ends.
>>>>>>>
>>>>>>> No. nvme_next_ns() will return NULL.
>>>>>> If there is just one path (the "old" one) and "old" is deleted,
>>>>>> nvme_next_ns() will return NULL. The list then looks like this:
>>>>>> head->next = head;
>>>>>> old->next = head;
>>>>>> If there are two or more paths and "old" is deleted, the "for" loop
>>>>>> becomes infinite, because nvme_next_ns() keeps returning paths that
>>>>>> are still on the list, never "old", so the loop condition stays true
>>>>>> forever.
>>>>>
>>>>> But that will be caught by the statement above:
>>>>>
>>>>> if (list_is_singular(&head->list))
>>>>>
>>>>> no?
>>>> Two paths was just a simple example.
>>>> With exactly two paths we enter that branch, which may report no path
>>>> even though one path actually remains; it falsely assumes that "old"
>>>> has not been deleted.
>>>> With more than two paths, the loop is infinite.
>>> So you mean we'll need something like this?
>>>
>>> diff --git a/drivers/nvme/host/multipath.c b/drivers/nvme/host/multipath.c
>>> index 71696819c228..8ffccaf9c19a 100644
>>> --- a/drivers/nvme/host/multipath.c
>>> +++ b/drivers/nvme/host/multipath.c
>>> @@ -202,10 +202,12 @@ static struct nvme_ns *__nvme_find_path(struct nvme_ns_head *head, int node)
>>>   static struct nvme_ns *nvme_next_ns(struct nvme_ns_head *head,
>>>                  struct nvme_ns *ns)
>>>   {
>>> -       ns = list_next_or_null_rcu(&head->list, &ns->siblings, struct nvme_ns,
>>> -                       siblings);
>>> -       if (ns)
>>> -               return ns;
>>> +       if (ns) {
>>> +               ns = list_next_or_null_rcu(&head->list, &ns->siblings,
>>> +                                          struct nvme_ns, siblings);
>>> +               if (ns)
>>> +                       return ns;
>>> +       }
>> No, in this scenario ns should not be NULL.
>
> Why not? 'ns == NULL' is precisely the corner case this is trying to fix...
>
>> Maybe we can do it like this:
>>
>> diff --git a/drivers/nvme/host/multipath.c b/drivers/nvme/host/multipath.c
>> index 282b7a4ea9a9..b895011a2cbd 100644
>> --- a/drivers/nvme/host/multipath.c
>> +++ b/drivers/nvme/host/multipath.c
>> @@ -199,30 +199,24 @@ static struct nvme_ns *__nvme_find_path(struct nvme_ns_head *head, int node)
>>          return found;
>>   }
>>
>> -static struct nvme_ns *nvme_next_ns(struct nvme_ns_head *head,
>> -               struct nvme_ns *ns)
>> -{
>> -       ns = list_next_or_null_rcu(&head->list, &ns->siblings, struct nvme_ns,
>> -                       siblings);
>> -       if (ns)
>> -               return ns;
>> -       return list_first_or_null_rcu(&head->list, struct nvme_ns, siblings);
>> -}
>> +#define nvme_next_ns_condition(head, current, condition) \
>> +({ \
>> +       struct nvme_ns *__ptr = list_next_or_null_rcu(&(head)->list, \
>> +               &(current)->siblings, struct nvme_ns, siblings); \
>> +       __ptr ? __ptr : (condition) ? ((condition) = false, \
>> +               list_first_or_null_rcu(&(head)->list, struct nvme_ns, \
>> +                       siblings)) : NULL; \
>> +})
>>
> Urgh. Please, no. That is well-nigh impossible to debug.
> Can you open-code it to demonstrate where it differs from the current (and my fixed) version?
> I'm still not clear where the problem is once we have applied both patches.
For example, assume the list has three paths and none of them is NVME_ANA_OPTIMIZED:
head->next = ns1;
ns1->next = ns2;
ns2->next = head;
old->next = ns2;
My patch's flow:
nvme_next_ns_condition(head, old, true) returns ns2;
nvme_next_ns_condition(head, ns2, true) sets the condition to false and returns ns1;
nvme_next_ns_condition(head, ns1, false) returns ns2;
nvme_next_ns_condition(head, ns2, false) returns NULL;
and then the loop ends.
Your patch's flow:
nvme_next_ns(head, old) returns ns2;
nvme_next_ns(head, ns2) returns ns1;
nvme_next_ns(head, ns1) returns ns2;
nvme_next_ns(head, ns2) returns ns1;
nvme_next_ns(head, ns1) returns ns2;
and then the loop is infinite.
>
> Cheers,
>
> Hannes