Subject: Re: [RFC PATCH] scsi: libsas: fix WARN on device removal
From: John Garry
Date: Thu, 17 Nov 2016 15:23:32 +0000
To: , "Martin K. Petersen"
CC: wangyijing , Dan Williams , linux-scsi , , "linux-kernel@vger.kernel.org" , , , Tejun Heo ,
Message-ID: <9bdd2ca5-aa72-6a18-b66d-8e791e4852c7@huawei.com>
In-Reply-To: <58258631.1090203@huawei.com>
References: <1478185120-5509-1-git-send-email-john.garry@huawei.com> <9870e7bc-a472-1913-1930-ac022e8ad5e8@huawei.com> <58257D52.6090507@huawei.com> <93ae84f6-75a2-f576-808e-f98c6256b6a6@huawei.com> <58258631.1090203@huawei.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On 11/11/2016 08:49, wangyijing wrote:
>>>> I have not seen the flutter issue. I am just trying to solve the horrible WARN dump.
>>>> However, I do understand that there may be an issue related to how we queue the events; there was a recent attempt to fix this, but it came to nothing:
>>>> https://www.spinics.net/lists/linux-scsi/msg99991.html
>>>
>>> We found several problems with libsas hotplug:
>>> 1. sysfs warning call trace (like the case you found);
>>
>> Maybe you can then review my patch.
>
> I did. I think your solution to the sysfs call trace issue is OK; what I am worried about is that we still need to fix
> the remaining issues.
> So it's better if we could fix all the issues at once.
>

@Maintainers, would you be willing to accept this patch as an interim fix for the dastardly WARN while we try to fix the flutter issue?

>>
>>> 2. hot-add and hot-remove work events may be processed out of order;
>>> 3. in some extreme cases, libsas may miss events if the same event is still pending in the workqueue.
>>>
>>
>> Can you tell me how to recreate #2 and #3?
>
> Qilin Chen and Yousong He helped me reproduce it; I told them to reply to this mail with the test steps.
> In some of our tests we made the SAS phy link flutter, so the hardware would post phy down and phy up events in sequence.
>
> 1. The SCSI host workqueue receives phy down and phy up events. in process new added
> 2. sas_deform_port posts a new destruct event to the SCSI host workqueue, so the workqueue looks like [phy down ----- phy up ----- destruct]
>
> So the phy down handling is split by the phy up; it is not atomic and not safe, and something unexpected can happen.
>
> For case 3, we made the hardware post many phy up/phy down pairs in a burst. If libsas is processing a phy up event, the next
> phy up event cannot be queued to the SCSI host workqueue again, so it is lost, which is not what we expect.
>
>>
>>> It's a complex issue; we posted two patches trying to fix these issues, but so far few people are interested in them :(
>>>
>>
>> IIRC, you sent them as an RFC and got a "reviewed-by" from Hannes, so I'm not sure what else you want. BTW, I thought that the changes were quite drastic.
>
> I agree, the changes seem somewhat drastic. But I think the current libsas hotplug framework has a big flaw.
>
>>
>> John
>>
>>>>
>>>>>
>>>>> Alternatively we need a mechanism to cancel in-flight port shutdown
>>>>> requests when we start re-attaching devices before queued port
>>>>> destruction events have run.
>>>>> >>>> >>>> >>>> _______________________________________________ >>>> linuxarm mailing list >>>> linuxarm@huawei.com >>>> http://rnd-openeuler.huawei.com/mailman/listinfo/linuxarm >>>> >>>> . >>>> >>> >>> >>> . >>> >> >> >> >> . >> > > > . >