Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1750971AbcD2EBK (ORCPT ); Fri, 29 Apr 2016 00:01:10 -0400 Received: from mail-oi0-f48.google.com ([209.85.218.48]:33031 "EHLO mail-oi0-f48.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750718AbcD2EBH (ORCPT ); Fri, 29 Apr 2016 00:01:07 -0400 MIME-Version: 1.0 In-Reply-To: <1461859893.1169.49.camel@decadent.org.uk> References: <1461859893.1169.49.camel@decadent.org.uk> From: Rafael David Tinoco Date: Fri, 29 Apr 2016 01:00:37 -0300 Message-ID: Subject: Re: [PATCH 3.16 106/217] sd: disable discard_zeroes_data for UNMAP To: Ben Hutchings Cc: linux-kernel@vger.kernel.org, stable@vger.kernel.org, akpm@linux-foundation.org, Paolo Bonzini , "Martin K. Petersen" , Christoph Hellwig Content-Type: text/plain; charset=UTF-8 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 3152 Lines: 73 Actually, It was an objection. Knowing that WRITESAME(16), used as the discard mechanism, can cause storage servers to misbehave (like QEMU's SCSI WRITESAME implementation, workaround-ed by commit e461338b6cd4) and those storage servers can't rely on LBPRZ flag to opt out from WRITESAME as discard mechanism (like QEMU does) since it is out of spec... I have also seen storage servers miss-behaving with this specific change (when changing from kernel 3.13 to 3.19, for example): [21354.827291] Write same(16): 93 08 00 00 00 00 00 00 80 00 00 40 00 00 00 00 ... [21420.471648] sd 0:0:2:1: [sdw] FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE [21420.471665] sd 0:0:2:1: [sdw] Sense Key : Illegal Request [current] [21420.471670] sd 0:0:2:1: [sdw] Add. Sense: Invalid field in cdb And this happened because the storage in question didn't set properly "max_ws_blocks" (it was 0) from VPD 0xb0 (max_ws_blocks is calculated using it). Anyway, 2 examples of disk servers that had problems after this change. IMHO the change is good for regular kernel development, and it does guarantee further READ CMDs to read zeros from LBAs, but, it jeopardises already functioning storage servers. If that argument isn't enough, Without properly setting NDOB bit in WRITESAME(16), the data buffer will be read on every SCSI WRITESAME(16) command and that will impact the "discard method" performance (will probably be slower than regular UNMAP command). So far, I'm seeing 2 motives why it shouldn't be on older kernels. On Thu, Apr 28, 2016 at 1:11 PM, Ben Hutchings wrote: > On Wed, 2016-04-27 at 17:43 -0300, Rafael David Tinoco wrote: >> It seems that changing discard method from UNMAP to WRITE SAME(16) >> without using NDOB bit (as first described in sbc3r35b.pdf) can cause >> performance problems on big discards (since data-out buffer will be >> checked for every WRITE SAME command). I think this is happening after >> this commit, since NDOB bit wasn't implemented with this change >> (afaik, iirc). > > Is that an objection, or just a comment? > > I only picked this commit for backporting because it was referenced by > later fixes (commits 397737223c59, f4327a95dd08) and I read the commit > message as saying that it fixes data corruption (sd claims to be > writing zeroes but the whole area might not read back as zeroes). Is > my understanding correct? > > Ben. > >> From the spec: >> """ >> To ensure that subsequent read operations return all zeros in a >> logical block, use the WRITE SAME (16) >> command with the NDOB bit set to one. If the UNMAP bit is set to one, >> then the device server may unmap the logical blocks specified by the >> WRITE SAME (16) >> """ >> >> And there were some problems with this change (specifically QEMU SCSI >> WRITE SAME implementation). So the change (commit e461338b6cd4) was >> made to guarantee that if LBPRZ=0, after VPD 0xB2, UNMAP is still >> picked. WRITESAME(16) is picked only if LBPRZ=1. This last commit >> violated spec in favor of a WRITE SAME "optout" approach for QEMU. >> >> I wonder if this should be taken to previous versions ... >> >> -Rafael Tinoco