Subject: Re: [PATCH] pci-error-recover: doc cleanup
To: linasvepstas@gmail.com, Cao jin
References: <1481184974-12505-1-git-send-email-caoj.fnst@cn.fujitsu.com> <20161208070539.0f00ce71@lwn.net> <58496AA4.5030602@cn.fujitsu.com>
Cc: Jonathan Corbet, "linux-pci@vger.kernel.org", linux-doc@vger.kernel.org, "linux-kernel@vger.kernel.org", Bjorn Helgaas
From: Andrew Donnellan
Date: Fri, 9 Dec 2016 17:50:17 +1100
Message-Id: <3ed3151c-eeef-940c-8a9c-49cf53a51d49@au1.ibm.com>

On 09/12/16 17:24, Linas Vepstas wrote:
> I suppose I'm confused, but I recall that link resets are non-fatal.
> Fatal errors typically require that the pci adapter be completely
> reset, any adapter firmware to be reloaded from scratch, the device
> driver has to kill all device state and start from scratch. It's huge.

Is there a difference in terminology between an AER fatal error and what
EEH/IBM people think of as a fatal error?

> If the fatal error is on a pci device that is under a block device
> holding a file system, then (usually) there is no way to recover,
> because the block layer (and file system) cannot deal with a block
> device that disappeared and then reappeared some few seconds later.
> (maybe some future zfs or lvm or btrfs might be able to deal with
> this, but not today)

Is this still true? I'm not at all familiar with the block device side
of it, but the cxlflash driver has reasonably full EEH support,
including surviving a full PHB fence and complete reset.

--
Andrew Donnellan              OzLabs, ADL Canberra
andrew.donnellan@au1.ibm.com  IBM Australia Limited