Subject: Re: [RFC PATCH] scsi: libsas: fix WARN on device removal
From: John Garry
To: wangyijing, Dan Williams
CC: "Martin K. Petersen", linux-scsi, "linux-kernel@vger.kernel.org", Tejun Heo
Date: Fri, 11 Nov 2016 08:23:40 +0000
Message-ID: <93ae84f6-75a2-f576-808e-f98c6256b6a6@huawei.com>
In-Reply-To: <58257D52.6090507@huawei.com>
References: <1478185120-5509-1-git-send-email-john.garry@huawei.com> <9870e7bc-a472-1913-1930-ac022e8ad5e8@huawei.com> <58257D52.6090507@huawei.com>

On 11/11/2016 08:12, wangyijing wrote:
>>
>> They're not the same. I don't see how your solution properly deals
>> with remote sas_port deletion.
>>
>> When we unplug a device connected to an expander, can't the sas_port
>> be deleted twice, in sas_unregister_devs_sas_addr() from domain
>> revalidation and also now in sas_destruct_devices()? I think that
>> this gives a NULL dereference.
>> And we still get the WARN, as the sas_port has still been deleted
>> before the device.
>>
>> In my solution, we should always delete the sas_port after the
>> attached device.
>>
>>>>
>>>> i.e. it moves the port destruction to the workqueue and still suffers
>>>> from the flutter problem:
>>>>
>>>> http://marc.info/?l=linux-scsi&m=143801026028006&w=2
>>>> http://marc.info/?l=linux-scsi&m=143801971131073&w=2
>>>>
>>>> Perhaps we instead need to quiet this warning?
>>>>
>>>> http://marc.info/?l=linux-scsi&m=143802229932175&w=2
>>
>> I have not seen the flutter issue. I am just trying to solve the
>> horrible WARN dump.
>> However, I do understand that there may be an issue related to how we
>> queue the events; there was a recent attempt to fix this, but it came
>> to nothing:
>> https://www.spinics.net/lists/linux-scsi/msg99991.html
>
> We found several problems with libsas hotplug:
> 1. sysfs warning calltrace (like the case you found);

Maybe you can then review my patch.

> 2. hot-add and hot-remove work events may be processed out of order;
> 3. in some extreme cases, libsas may miss some events, if the same
>    event is still pending in the workqueue.
>

Can you tell me how to recreate #2 and #3?

> It's a complex issue. We posted two patches to try to fix these
> issues, but now few people are interested in them :(
>

IIRC, you sent them as an RFC and got a "reviewed-by" from Hannes, so
I'm not sure what else you want. BTW, I thought that the changes were
quite drastic.

John

>>
>>>
>>> Alternatively we need a mechanism to cancel in-flight port shutdown
>>> requests when we start re-attaching devices before queued port
>>> destruction events have run.
>>>