From: "Kani, Toshimitsu"
To: "dan.j.williams@intel.com"
CC: "linux-kernel@vger.kernel.org", "linux-nvdimm@lists.01.org", "vishal.l.verma@intel.com"
Subject: Re: [PATCH] pmem: report error on clear poison failure
Date: Thu, 13 Oct 2016 18:16:29 +0000
Message-ID: <1476382494.20881.58.camel@hpe.com>
References: <1476374061-9080-1-git-send-email-toshi.kani@hpe.com> <1476374787.20881.34.camel@hpe.com>

On Thu, 2016-10-13 at 10:22 -0700, Dan Williams wrote:
> On Thu, Oct 13, 2016 at 9:08 AM, Kani, Toshimitsu wrote:
> >
> > On Thu, 2016-10-13 at 09:01 -0700, Dan Williams wrote:
> > >
> > > On Thu, Oct 13, 2016 at 8:54 AM, Toshi Kani wrote:
> > > >
> > > > ACPI Clear Uncorrectable Error DSM function may fail or may be
> > > > unsupported on a platform.  pmem_clear_poison() returns without
> > > > clearing badblocks in such cases, which leads to a silent data
> > > > corruption.
> > > >
> > > > Change pmem_do_bvec() and pmem_clear_poison() to return -EIO
> > > > so that filesystem can log an error message.
> > >
> > > What's the silent data corruption scenario?  If the clear poison
> > > fails I'm assuming that the poison will still be notified on the
> > > next read.
> >
> > I agree that the data is eventually read, but there is no guarantee
> > that it is read soon enough, i.e. the user might not access the
> > data for a long time.
>
> ...but that's the same behavior for errors that we don't yet know
> about.  That said, we indeed know that the write failed.  I'd feel
> better about this patch if the justification / impact was clearer in
> the changelog, because "silent data corruption" is not the impact.

Agreed.  How about the following description?

===
The ACPI Clear Uncorrectable Error DSM function may fail or may be
unsupported on a platform.  pmem_clear_poison() returns without
clearing badblocks in such cases.  This failure is detected only at
the next read (-EIO).

This behavior can lead to an issue when a user keeps writing but
does not read immediately.  For instance, a flight recorder file
may be read only when it is needed for troubleshooting.

Change pmem_do_bvec() and pmem_clear_poison() to return -EIO
so that the filesystem can log an error message on a write error.
===

Thanks,
-Toshi
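
For illustration only, here is a minimal userspace sketch of the behavior the proposed description asks for: a write that hits a known bad block and cannot clear the poison reports -EIO right away, rather than leaving the failure to be found at the next read. This is not the kernel code. The struct and the model_* functions are hypothetical stand-ins, and whether the data is still copied after a failed clear is an assumption of this sketch.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define NSECT     8
#define SECT_SIZE 512

/* Hypothetical model of a pmem device; not the kernel's data structure. */
struct model_pmem {
	unsigned char data[NSECT][SECT_SIZE];
	bool bad[NSECT];       /* stands in for the badblocks list */
	bool clear_supported;  /* does the platform's clear-error call work? */
};

/* Try to clear poison on one sector; -EIO if the platform cannot do it. */
static int model_clear_poison(struct model_pmem *p, size_t sect)
{
	if (!p->clear_supported)
		return -EIO;       /* the bad-block entry stays in place */
	p->bad[sect] = false;
	return 0;
}

/* Write one sector and report a failed poison clear to the caller. */
static int model_write(struct model_pmem *p, size_t sect, const void *buf)
{
	int rc = 0;

	if (sect >= NSECT)
		return -EINVAL;
	if (p->bad[sect])
		rc = model_clear_poison(p, sect);
	/* Assumption of this sketch: the data is copied either way. */
	memcpy(p->data[sect], buf, SECT_SIZE);
	return rc;
}

/* Read one sector; a sector still marked bad fails with -EIO. */
static int model_read(const struct model_pmem *p, size_t sect, void *buf)
{
	if (sect >= NSECT)
		return -EINVAL;
	if (p->bad[sect])
		return -EIO;
	memcpy(buf, p->data[sect], SECT_SIZE);
	return 0;
}
```

With clear_supported set to false, model_write() on a bad sector returns -EIO immediately, so a caller that only ever writes (the flight recorder case) still learns about the problem.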