Subject: Re: [PATCH v2 1/2] libsas: Don't process sas events in static works
To: John Garry, Johannes Thumshirn
References: <1497425597-18799-1-git-send-email-wangyijing@huawei.com>
 <1497425597-18799-2-git-send-email-wangyijing@huawei.com>
 <692abe7a-149f-c1bf-5f28-3e36cad81b5a@suse.de>
 <5940FC1C.5050000@huawei.com>
 <00f4b3f1-ada0-d07d-2640-d902a437b24e@huawei.com>
 <59423956.6070905@huawei.com>
 <1f9d5190-97f2-f98e-c7c4-80e259346e91@huawei.com>
CC: Yousong He
From: wangyijing
Message-ID: <594243A1.7050200@huawei.com>
Date: Thu, 15 Jun 2017 16:21:53 +0800
In-Reply-To: <1f9d5190-97f2-f98e-c7c4-80e259346e91@huawei.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On 2017/6/15 16:00, John Garry wrote:
> On 15/06/2017 08:37, wangyijing wrote:
>>
>>
>> On 2017/6/14 21:08, John Garry wrote:
>>> On 14/06/2017 10:04, wangyijing wrote:
>>>>>> static void notify_ha_event(struct sas_ha_struct *sas_ha, enum ha_event event)
>>>>>>>> {
>>>>>>>> +    struct sas_ha_event *ev;
>>>>>>>> +
>>>>>>>>      BUG_ON(event >= HA_NUM_EVENTS);
>>>>>>>>
>>>>>>>> -    sas_queue_event(event, &sas_ha->pending,
>>>>>>>> -        &sas_ha->ha_events[event].work, sas_ha);
>>>>>>>> +    ev = kzalloc(sizeof(*ev), GFP_ATOMIC);
>>>>>>>> +    if (!ev)
>>>>>>>> +        return;
>>>>>> GFP_ATOMIC allocations can fail and then no events will be queued *and* we
>>>>>> don't report the error back to the caller.
>>>>>>
>>>> Yes, it's really a problem, but I haven't found a better solution. Do you have any suggestions?
>>>>
>>>
>>> Dan raised an issue with this approach, regarding a malfunctioning PHY which spews out events. I still don't think we're handling it safely. Here's the suggestion:
>>> - each asd_sas_phy owns a finite-sized pool of events
>>> - when the event pool becomes exhausted, libsas stops queuing events (obviously) and disables the PHY in the LLDD
>>> - upon attempting to re-enable the PHY from sysfs, libsas first checks that the pool is still not exhausted
>>>
>>> If you cannot find a good solution, then let us know and we can help.
>>
>> Hi John and Dan, which event did you see on the malfunctioning PHY? If it is PORTE_BROADCAST_RCVD, then since
>> libsas always calls sas_revalidate_domain() for every PORTE_BROADCAST_RCVD, what about keeping one broadcast waiting (not queued in the workqueue)
>> and discarding the others? If the event is of another type, things may become knotty.
>>
>
> As I mentioned in the v1 series discussion, I found a poorly connected expander PHY was spewing out PHY up and loss of signal events continuously. This is the sort of situation we should protect against. Current solution is ok, as it uses a static event per port/PHY/HA.
>
> The point is that we cannot allow a PHY to continuously send events to libsas, which may lead to memory exhaustion.

The current solution won't cause memory exhaustion, but it is still not OK: the root of this issue is that it may lose normal events.
If we cannot identify the abnormal PHY, I think your memory pool idea is a candidate solution.

>
> John
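
For illustration only, the "keep one broadcast waiting and discard the others" idea quoted above could be sketched in plain user-space C as below. The names struct port, port_notify_broadcast() and port_handle_broadcast() are hypothetical and are not libsas code; the counter merely stands in for the call to sas_revalidate_domain(). A kernel version would also need an atomic test-and-set on the flag, which this sketch leaves out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-port state; not the real libsas structure. */
struct port {
    bool bcast_pending;   /* one revalidation is already waiting */
    int  revalidations;   /* stand-in for calls to sas_revalidate_domain() */
};

/*
 * Called for each incoming broadcast event.
 * Returns true if the caller should queue work, false if the event
 * was folded into the one that is already waiting.
 */
static bool port_notify_broadcast(struct port *port)
{
    if (port->bcast_pending)
        return false;
    port->bcast_pending = true;
    return true;
}

/* Called from the worker once the queued broadcast is processed. */
static void port_handle_broadcast(struct port *port)
{
    assert(port->bcast_pending);
    /* Clear the flag first so a broadcast arriving now is not lost. */
    port->bcast_pending = false;
    port->revalidations++;
}
```

With this shape a burst of broadcasts costs one queued item and one revalidation, and needs no allocation at all. It does not help for event types that cannot be merged.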
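
As a rough sketch of the memory pool idea discussed in this thread (a finite pool per PHY, disable the PHY when the pool runs out, refuse to re-enable while it is still exhausted), something like the following user-space C might serve as a starting point. Everything here is an assumption made for illustration: struct phy, PHY_EVENT_POOL_SIZE, phy_notify() and the other helpers are hypothetical names, not part of libsas or of the patch, and locking is omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define PHY_EVENT_POOL_SIZE 4   /* hypothetical limit, chosen arbitrarily */

struct phy_event {
    int  type;     /* event code, e.g. PHY up or loss of signal */
    bool in_use;
};

/* Hypothetical stand-in for a PHY that owns a fixed pool of events. */
struct phy {
    struct phy_event pool[PHY_EVENT_POOL_SIZE];
    bool enabled;
};

static void phy_init(struct phy *phy)
{
    for (size_t i = 0; i < PHY_EVENT_POOL_SIZE; i++)
        phy->pool[i].in_use = false;
    phy->enabled = true;
}

/* Take a free slot, or return NULL when the pool is exhausted. */
static struct phy_event *phy_event_get(struct phy *phy)
{
    for (size_t i = 0; i < PHY_EVENT_POOL_SIZE; i++) {
        if (!phy->pool[i].in_use) {
            phy->pool[i].in_use = true;
            return &phy->pool[i];
        }
    }
    return NULL;
}

/* Return a slot to the pool once the worker has processed the event. */
static void phy_event_put(struct phy_event *ev)
{
    assert(ev->in_use);
    ev->in_use = false;
}

/*
 * Queue an event. Returns 0 on success and -1 if the event was dropped,
 * so the caller learns about the failure. Exhaustion disables the PHY.
 */
static int phy_notify(struct phy *phy, int type)
{
    struct phy_event *ev;

    if (!phy->enabled)
        return -1;
    ev = phy_event_get(phy);
    if (!ev) {
        phy->enabled = false;
        return -1;
    }
    ev->type = type;
    return 0;
}

/* Re-enable request: succeeds only if at least one slot is free again. */
static bool phy_try_enable(struct phy *phy)
{
    for (size_t i = 0; i < PHY_EVENT_POOL_SIZE; i++) {
        if (!phy->pool[i].in_use) {
            phy->enabled = true;
            return true;
        }
    }
    return false;
}
```

Because the pool is preallocated, the notify path never allocates memory, and the return value gives the caller the error report that the GFP_ATOMIC version lacks. How large the pool should be is left open here.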