Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753822Ab3GXOWt (ORCPT ); Wed, 24 Jul 2013 10:22:49 -0400 Received: from shadbolt.e.decadent.org.uk ([88.96.1.126]:45741 "EHLO shadbolt.e.decadent.org.uk" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751951Ab3GXOGn (ORCPT ); Wed, 24 Jul 2013 10:06:43 -0400 Content-Type: text/plain; charset="UTF-8" Content-Disposition: inline Content-Transfer-Encoding: 8bit MIME-Version: 1.0 From: Ben Hutchings To: linux-kernel@vger.kernel.org, stable@vger.kernel.org CC: akpm@linux-foundation.org, "Ric Wheeler" , "James Bottomley" Date: Wed, 24 Jul 2013 15:02:45 +0100 Message-ID: X-Mailer: LinuxStableQueue (scripts by bwh) Subject: [43/85] [SCSI] sd: fix array cache flushing bug causing performance problems In-Reply-To: X-SA-Exim-Connect-IP: 192.168.4.101 X-SA-Exim-Mail-From: ben@decadent.org.uk X-SA-Exim-Scanned: No (on shadbolt.decadent.org.uk); SAEximRunCond expanded to false Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 3577 Lines: 106 3.2.49-rc1 review patch. If anyone has any objections, please let me know. ------------------ From: James Bottomley commit 39c60a0948cc06139e2fbfe084f83cb7e7deae3b upstream. Some arrays synchronize their full non volatile cache when the sd driver sends a SYNCHRONIZE CACHE command. Unfortunately, they can have Terrabytes of this and we send a SYNCHRONIZE CACHE for every barrier if an array reports it has a writeback cache. This leads to massive slowdowns on journalled filesystems. The fix is to allow userspace to turn off the writeback cache setting as a temporary measure (i.e. without doing the MODE SELECT to write it back to the device), so even though the device reported it has a writeback cache, the user, knowing that the cache is non volatile and all they care about is filesystem correctness, can turn that bit off in the kernel and avoid the performance ruinous (and safety irrelevant) SYNCHRONIZE CACHE commands. The way you do this is add a 'temporary' prefix when performing the usual cache setting operations, so echo temporary write through > /sys/class/scsi_disk//cache_type Reported-by: Ric Wheeler Signed-off-by: James Bottomley Signed-off-by: Ben Hutchings --- drivers/scsi/sd.c | 20 ++++++++++++++++++++ drivers/scsi/sd.h | 1 + 2 files changed, 21 insertions(+) --- a/drivers/scsi/sd.c +++ b/drivers/scsi/sd.c @@ -138,6 +138,7 @@ sd_store_cache_type(struct device *dev, char *buffer_data; struct scsi_mode_data data; struct scsi_sense_hdr sshdr; + const char *temp = "temporary "; int len; if (sdp->type != TYPE_DISK) @@ -146,6 +147,13 @@ sd_store_cache_type(struct device *dev, * it's not worth the risk */ return -EINVAL; + if (strncmp(buf, temp, sizeof(temp) - 1) == 0) { + buf += sizeof(temp) - 1; + sdkp->cache_override = 1; + } else { + sdkp->cache_override = 0; + } + for (i = 0; i < ARRAY_SIZE(sd_cache_types); i++) { len = strlen(sd_cache_types[i]); if (strncmp(sd_cache_types[i], buf, len) == 0 && @@ -158,6 +166,13 @@ sd_store_cache_type(struct device *dev, return -EINVAL; rcd = ct & 0x01 ? 1 : 0; wce = ct & 0x02 ? 1 : 0; + + if (sdkp->cache_override) { + sdkp->WCE = wce; + sdkp->RCD = rcd; + return count; + } + if (scsi_mode_sense(sdp, 0x08, 8, buffer, sizeof(buffer), SD_TIMEOUT, SD_MAX_RETRIES, &data, NULL)) return -EINVAL; @@ -2037,6 +2052,10 @@ sd_read_cache_type(struct scsi_disk *sdk int old_rcd = sdkp->RCD; int old_dpofua = sdkp->DPOFUA; + + if (sdkp->cache_override) + return; + first_len = 4; if (sdp->skip_ms_page_8) { if (sdp->type == TYPE_RBC) @@ -2518,6 +2537,7 @@ static void sd_probe_async(void *data, a sdkp->capacity = 0; sdkp->media_present = 1; sdkp->write_prot = 0; + sdkp->cache_override = 0; sdkp->WCE = 0; sdkp->RCD = 0; sdkp->ATO = 0; --- a/drivers/scsi/sd.h +++ b/drivers/scsi/sd.h @@ -64,6 +64,7 @@ struct scsi_disk { u8 protection_type;/* Data Integrity Field */ u8 provisioning_mode; unsigned ATO : 1; /* state of disk ATO bit */ + unsigned cache_override : 1; /* temp override of WCE,RCD */ unsigned WCE : 1; /* state of disk WCE bit */ unsigned RCD : 1; /* state of disk RCD bit, unused */ unsigned DPOFUA : 1; /* state of disk DPOFUA bit */ -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/