Received: by 2002:a05:7412:b101:b0:e2:908c:2ebd with SMTP id az1csp3285407rdb; Thu, 16 Nov 2023 05:46:42 -0800 (PST) X-Google-Smtp-Source: AGHT+IGF5vmDPB1MbTcAu544aPIK8n/Kmm9hBxPzFJgl3HvGwE0nIj5n+AVkp07MZpYaR8DwJzB7 X-Received: by 2002:a05:6e02:216a:b0:357:a049:91c4 with SMTP id s10-20020a056e02216a00b00357a04991c4mr17121272ilv.22.1700142401981; Thu, 16 Nov 2023 05:46:41 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1700142401; cv=none; d=google.com; s=arc-20160816; b=AOa5EgLtlOkxZdpqmUss4V9nLJeBf4G0/ZiitCrBwTQEo9t0c6MNjy2WgQLKPeIyXu u2pZIRUrGHZkWPrvzH3hJnUOFR6p7wB7ksktnrGp3oGvGxM61/sXxghefHhYLuiKIkam ir4wJf4RFufnvoAVHoX/rBEr7Cveu79pNhycmpCqe3ImRTiPm+G2zdtc+p36Qe7GbrXA kH/PjRicKiSjzLxdP8WvLIUPn5ZV0bSot1+q8e0QaPRlzjr6dOVK+sMho5GKgJyP0PXD cmSZvRU3oVYqx/zVhjvFQuXRpuGEi0UI3/8bDBSlWmf+Eq+641+FsMXjcOKyerO61xkT 7rnQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:in-reply-to:from :references:cc:to:content-language:subject:user-agent:mime-version :date:message-id; bh=oegHGaQQ0b6ZmMmK9aXWemPt833hquS0Uuxi76IUH9k=; fh=FMUtMVcC97y9iqAdjmGS6ZYlnJVKN4YLZDGNjnXxe0A=; b=kPUHMyrAvVGaafRysFAYy6QuNdSO2dSTAOnBwhmY57PG0tGJExtXM6NmbkcYWpPIM9 n+dfmLY2B4eyU7frBsqJQ1h+Ev3zzmjiQwCzp2A7ObwevT6FrbFOW928f8qZdLZvPsFH i7lD+x+lR8srvXv4T0gy/pzqJC+WhfDDP3n4sQ8Pv6DPncMZxW4N8d0Ikkv3Z1JHaloj 1MQPZyO9nJ36wT7XmPIGO9Aqk+I9QkgYIQF3825EofTEyUean0Ps2ip3UgjKXSM9iQEf Q6PUrgacBIPzge1M+lbZIS50DHZiX+1l4jUtLx3hNsY0S+fWwm3fX4wyShky19ygu9AQ 3POQ== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::3:3 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=QUARANTINE sp=QUARANTINE dis=NONE) header.from=huawei.com Return-Path: Received: from lipwig.vger.email (lipwig.vger.email. [2620:137:e000::3:3]) by mx.google.com with ESMTPS id c21-20020a056a00249500b006c3482c4e4dsi12514306pfv.289.2023.11.16.05.46.41 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 16 Nov 2023 05:46:41 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::3:3 as permitted sender) client-ip=2620:137:e000::3:3; Authentication-Results: mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::3:3 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=QUARANTINE sp=QUARANTINE dis=NONE) header.from=huawei.com Received: from out1.vger.email (depot.vger.email [IPv6:2620:137:e000::3:0]) by lipwig.vger.email (Postfix) with ESMTP id 40A9B80279D6; Thu, 16 Nov 2023 05:46:39 -0800 (PST) X-Virus-Status: Clean X-Virus-Scanned: clamav-milter 0.103.11 at lipwig.vger.email Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1345241AbjKPNqM (ORCPT + 99 others); Thu, 16 Nov 2023 08:46:12 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:37450 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1345244AbjKPNqH (ORCPT ); Thu, 16 Nov 2023 08:46:07 -0500 Received: from szxga08-in.huawei.com (szxga08-in.huawei.com [45.249.212.255]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 632B5CE; Thu, 16 Nov 2023 05:46:02 -0800 (PST) Received: from dggpemd100001.china.huawei.com (unknown [172.30.72.57]) by szxga08-in.huawei.com (SkyGuard) with ESMTP id 4SWLnW1mGbz1P8Gy; Thu, 16 Nov 2023 21:42:39 +0800 (CST) Received: from [10.67.120.108] (10.67.120.108) by dggpemd100001.china.huawei.com (7.185.36.94) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.2.1258.23; Thu, 16 Nov 2023 21:46:00 +0800 Message-ID: <9cc196d7-0c4f-ef09-53b8-362d5eb599a6@huawei.com> Date: Thu, 16 Nov 2023 21:45:59 +0800 MIME-Version: 1.0 User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:91.0) Gecko/20100101 Thunderbird/91.3.1 Subject: Re: [PATCH v3] scsi: libsas: Fix set zero-address when device-type != NO_DEVICE Content-Language: en-CA To: John Garry , , , , CC: , , , , , References: <20231116035241.13730-1-yangxingui@huawei.com> From: yangxingui In-Reply-To: Content-Type: text/plain; charset="UTF-8"; format=flowed Content-Transfer-Encoding: 8bit X-Originating-IP: [10.67.120.108] X-ClientProxiedBy: dggpemm500007.china.huawei.com (7.185.36.183) To dggpemd100001.china.huawei.com (7.185.36.94) X-CFilter-Loop: Reflected X-Spam-Status: No, score=-3.0 required=5.0 tests=HEADER_FROM_DIFFERENT_DOMAINS, MAILING_LIST_MULTI,NICE_REPLY_A,SPF_HELO_NONE,SPF_PASS, T_SCC_BODY_TEXT_LINE autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lipwig.vger.email Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-Greylist: Sender passed SPF test, not delayed by milter-greylist-4.6.4 (lipwig.vger.email [0.0.0.0]); Thu, 16 Nov 2023 05:46:39 -0800 (PST) Hi, John Thanks for you reply. On 2023/11/16 18:13, John Garry wrote: > On 16/11/2023 03:52, Xingui Yang wrote: > > I think that patch title can be improved, but I would need to know more > about the problem before suggesting an improvement. How about "Fix port add phy failed" ? > >> Firstly, when ex_phy is added to the parent port, ex_phy->port is not >> set. > > That seems correct, but why mention this now? > >> As a result, sas_port_delete_phy() won't be called in >> sas_unregister_devs_sas_addr(), and although ex_phy's sas_address is >> zero, >> it is not deleted from the parent port's phy_list. > > I am not sure why you mention this now either. You seem to be describing > how the problem occurs without actually mentioning what the problem is. > >> >> Secondly, phy->attached_sas_addr will be set to a zero-address when >> phy->linkrate < SAS_LINK_RATE_1_5_GBPS and device-type != NO_DEVICE >> during >> device registration, such as stp. It will create a new port and all other >> ex_phys whose addresses are zero will be added to the new port in >> sas_ex_get_linkrate(), and it may trigger BUG() as follows: > > I think that it would be better to first mention this crash, i.e. the > problem, how you recreate it, and then describe how and why it happens, > and then tell us how you will fix it How about follows: The following processes trigger a BUG(). A new port port-7:7:0 that created by a new zero-address sata device tries to add phy-7:7:19 had the same zero-address, but phy-7:7:19 is already part of another port. [562240.051046] sas: phy19 part of wide port with phy16 [562240.051197] sas: ex 500e004aaaaaaa1f phy19:U:0 attached: 0000000000000000 (no device) [562240.051203] sas: done REVALIDATING DOMAIN on port 0, pid:435909, res 0x0 [562240.062536] sas: ex 500e004aaaaaaa1f phy0 new device attached [562240.062616] sas: ex 500e004aaaaaaa1f phy00:U:5 attached: 0000000000000000 (stp) [562240.062680] port-7:7:0: trying to add phy phy-7:7:19 fails: it's already part of another port [562240.085064] ------------[ cut here ]------------ [562240.096612] kernel BUG at drivers/scsi/scsi_transport_sas.c:1083! [562240.109611] Internal error: Oops - BUG: 0 [#1] SMP [562240.343518] Process kworker/u256:3 (pid: 435909, stack limit = 0x0000000003bcbebf) [562240.421714] Workqueue: 0000:b4:02.0_disco_q sas_revalidate_domain [libsas] [562240.437173] pstate: 40c00009 (nZcv daif +PAN +UAO) [562240.450478] pc : sas_port_add_phy+0x13c/0x168 [scsi_transport_sas] [562240.465283] lr : sas_port_add_phy+0x13c/0x168 [scsi_transport_sas] [562240.479751] sp : ffff0000300cfa70 [562240.674822] Call trace: [562240.682709] sas_port_add_phy+0x13c/0x168 [scsi_transport_sas] [562240.694013] sas_ex_get_linkrate.isra.5+0xcc/0x128 [libsas] [562240.704957] sas_ex_discover_end_dev+0xfc/0x538 [libsas] [562240.715508] sas_ex_discover_dev+0x3cc/0x4b8 [libsas] [562240.725634] sas_ex_discover_devices+0x9c/0x1a8 [libsas] [562240.735855] sas_ex_revalidate_domain+0x2f0/0x450 [libsas] [562240.746123] sas_revalidate_domain+0x158/0x160 [libsas] [562240.756014] process_one_work+0x1b4/0x448 [562240.764548] worker_thread+0x54/0x468 [562240.772562] kthread+0x134/0x138 [562240.779989] ret_from_fork+0x10/0x18 We found that phy-7:7:19's port is not set when added to the parent port,then it hadn't be deleted from the parent port's phy_list when call sas_unregister_devs_sas_addr(), and the link rate of the new attached sata device is 5 which is less then 1.5G/s, then the sata device's sas_address was set to a zero-address. Fix the problem as follows: Firstly, set ex_phy->port when ex_phy is added to the parent port. And set ex_dev->parent_port to NULL when the number of PHYs of the parent port becomes 0. Secondly, don't set a zero-address for phy->attached_sas_addr when phy->attached_dev_type != NO_DEVICE. Thanks, Xingui > >> >> [562240.051046] sas: phy19 part of wide port with phy16 >> [562240.051197] sas: ex 500e004aaaaaaa1f phy19:U:0 attached: >> 0000000000000000 (no device) >> [562240.051203] sas: done REVALIDATING DOMAIN on port 0, pid:435909, >> res 0x0 >> >> [562240.062536] sas: ex 500e004aaaaaaa1f phy0 new device attached >> [562240.062616] sas: ex 500e004aaaaaaa1f phy00:U:5 attached: >> 0000000000000000 (stp) >> [562240.062680]  port-7:7:0: trying to add phy phy-7:7:19 fails: it's >> already part of another port >> [562240.085064] ------------[ cut here ]------------ >> [562240.096612] kernel BUG at drivers/scsi/scsi_transport_sas.c:1083! >> [562240.109611] Internal error: Oops - BUG: 0 [#1] SMP >> [562240.343518] Process kworker/u256:3 (pid: 435909, stack limit = >> 0x0000000003bcbebf) >> [562240.421714] Workqueue: 0000:b4:02.0_disco_q sas_revalidate_domain >> [libsas] >> [562240.437173] pstate: 40c00009 (nZcv daif +PAN +UAO) >> [562240.450478] pc : sas_port_add_phy+0x13c/0x168 [scsi_transport_sas] >> [562240.465283] lr : sas_port_add_phy+0x13c/0x168 [scsi_transport_sas] >> [562240.479751] sp : ffff0000300cfa70 >> [562240.674822] Call trace: >> [562240.682709]  sas_port_add_phy+0x13c/0x168 [scsi_transport_sas] >> [562240.694013]  sas_ex_get_linkrate.isra.5+0xcc/0x128 [libsas] >> [562240.704957]  sas_ex_discover_end_dev+0xfc/0x538 [libsas] >> [562240.715508]  sas_ex_discover_dev+0x3cc/0x4b8 [libsas] >> [562240.725634]  sas_ex_discover_devices+0x9c/0x1a8 [libsas] >> [562240.735855]  sas_ex_revalidate_domain+0x2f0/0x450 [libsas] >> [562240.746123]  sas_revalidate_domain+0x158/0x160 [libsas] >> [562240.756014]  process_one_work+0x1b4/0x448 >> [562240.764548]  worker_thread+0x54/0x468 >> [562240.772562]  kthread+0x134/0x138 >> [562240.779989]  ret_from_fork+0x10/0x18 >> >> We've done the following to solve this problem: > > I'd use "Fix the problem as follows:"" > >> Firstly, set ex_phy->port when ex_phy is added to the parent port. And >> set >> ex_dev->parent_port to NULL when the number of PHYs of the parent port >> becomes 0. > > Thanks, > John > > .