From: Eric Seppanen
Date: Fri, 4 Oct 2013 11:12:34 -0700
Subject: Re: Drivers: scsi: FLUSH timeout
To: emilne@redhat.com
Cc: "Nicholas A. Bellinger", KY Srinivasan, "linux-kernel@vger.kernel.org", "devel@linuxdriverproject.org", "linux-scsi@vger.kernel.org"
In-Reply-To: <1380889136.4010.313.camel@localhost.localdomain>
References: <1379705547-15028-1-git-send-email-kys@microsoft.com> <20130920203222.GA14306@kroah.com> <524180B7.7090307@gmail.com> <3413dbd7fa254fd380a84fe6d9cd87e1@SN2PR03MB061.namprd03.prod.outlook.com> <5241C9E7.2000404@cs.wisc.edu> <1380802143.19256.95.camel@haakon3.risingtidesystems.com> <1380889136.4010.313.camel@localhost.localdomain>
X-Mailing-List: linux-kernel@vger.kernel.org

On Fri, Oct 4, 2013 at 5:18 AM, Ewan Milne wrote:
> On Thu, 2013-10-03 at 13:48 -0700, Eric Seppanen wrote:
>> Do I/O timeouts and flush timeouts need to be independently adjusted?
>> If you're having trouble with slow operations, it seems likely to be
>> across the board.
>>
>> Flush timeout could be defined as 2x the read/write timeout.  Any
>> other command-specific timeouts could be scaled the same way.
>
> It seems to me that there isn't any reason to expect that the maximum
> amount of time a device might take to perform various operations are
> related by any coefficient.  And, an HBA (particularly iSCSI or FC)
> could very well have different device types connected at different
> target IDs.
> So I think the flush timeout should be adjustable on
> a per-device basis.  It's probably related more to the cache size
> on the device than anything else...

There are two possible delays: how long the device might possibly
take, and how long the storage fabric might take.  On a local device,
only the first matters.  But there are environments where the second
dominates (e.g. a virtual machine, where the hypervisor's storage uses
multipath with a long failover delay).

If somebody wants to set flush timeouts > 60 seconds, I would like to
know if they're trying to address a slow device or a slow fabric.  If
it's the fabric, then it's kind of silly to make them set three
different timeouts to address the same problem.

An alternate way of handling long fabric delays would be to have a
fabric_timeout that gets added to all the other timeouts... could be a
scsi_host parameter but that's probably overengineering the problem.

There are already VM vendors that tell customers to adjust the current
sysfs timeout, so the least amount of work would be to make all of the
other timeouts track that one in some way (additive or
multiplicative).
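
To make the additive and multiplicative options concrete, here is a rough
sketch in Python.  It is purely illustrative: the function names and the
30-second read/write default are assumptions made for the example, not
existing kernel interfaces, and the 60-second flush figure is simply the
value mentioned above.

```python
# Illustrative only: every name here is hypothetical, not a kernel interface.
DEFAULT_RW_TIMEOUT = 30      # seconds; assumed default of the sysfs timeout
DEFAULT_FLUSH_TIMEOUT = 60   # seconds; the flush figure discussed above


def multiplicative(rw_timeout, flush_factor=2):
    """Flush timeout scales with the adjustable read/write timeout."""
    return rw_timeout * flush_factor


def additive(rw_timeout,
             base_flush=DEFAULT_FLUSH_TIMEOUT,
             base_rw=DEFAULT_RW_TIMEOUT):
    """Treat any increase over the default read/write timeout as fabric
    delay, and add that same delay to the default flush timeout."""
    fabric_delay = max(0, rw_timeout - base_rw)
    return base_flush + fabric_delay


if __name__ == "__main__":
    # Compare both schemes for a few read/write timeout settings.
    for rw in (30, 60, 180):
        print(rw, multiplicative(rw), additive(rw))
```

With the read/write timeout raised to 180 seconds, the multiplicative
scheme gives a 360-second flush timeout and the additive one gives 210.
The additive form matches the "slow fabric" case more closely, since the
extra delay is the same for every command type.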