From: Dan Williams
Date: Mon, 21 Nov 2016 09:13:06 -0800
Subject: Re: [RFC PATCH] scsi: libsas: fix WARN on device removal
To: John Garry
Cc: jejb@linux.vnet.ibm.com, "Martin K. Petersen", wangyijing, linux-scsi, John Garry, linux-kernel@vger.kernel.org, linuxarm@huawei.com, lindar_liu@usish.com, Tejun Heo, Jinpu Wang

On Mon, Nov 21, 2016 at 7:16 AM, John Garry wrote:
>>>>> @Maintainers, would you be willing to accept this patch as an interim
>>>>> fix for the dastardly WARN while we try to fix the flutter issue?
>>>>
>>>> To me this adds a bug to quiet a benign, albeit noisy, warning.
>>>>
>>>
>>> What is the bug which is being added?
>>
>>
>> The bug where we queue a port teardown, but see a port formation event
>> in the meantime.
>
> As I understand, this vulnerability already exists:
> http://marc.info/?l=linux-scsi&m=143801026028006&w=2
>
> I actually don't understand how libsas dealt with flutter (which I take to
> mean a burst of up and down events) before these changes, as it can only
> queue simultaneously one up and one down event per port. So, if we get a
> flutter, then the events are lost and we get indeterminate state.
>

The events are not lost. The new problem this patch introduces is
delaying SAS port deletion where it was previously immediate. So we
can now get into a situation where the port has gone down and we
start processing a port-up event before the previous deletion work
has run.

>>
>>> And it's a very noisy warning, as in 6K lines on the console when an
>>> expander is unplugged.
>>
>>
>> Does something like this modulate the failure?

I'm curious whether we simply need to fix the double deletion of the
sas_port bsg queue. Could you try the changes below?

>>
>> diff --git a/drivers/scsi/scsi_transport_sas.c b/drivers/scsi/scsi_transport_sas.c
>> index 60b651bfaa01..11401e5c88ba 100644
>> --- a/drivers/scsi/scsi_transport_sas.c
>> +++ b/drivers/scsi/scsi_transport_sas.c
>> @@ -262,9 +262,10 @@ static void sas_bsg_remove(struct Scsi_Host *shost, struct sas_rphy *rphy
>>  {
>>  	struct request_queue *q;
>>  
>> -	if (rphy)
>> +	if (rphy) {
>>  		q = rphy->q;
>> -	else
>> +		rphy->q = NULL;
>> +	} else
>>  		q = to_sas_host_attrs(shost)->q;
>>  
>>  	if (!q)