From: Jeff Garzik
Date: Mon, 30 Mar 2009 15:41:34 -0400
To: Rik van Riel
Cc: Linus Torvalds, Ric Wheeler, "Andreas T.Auer", Alan Cox,
    Theodore Tso, Mark Lord, Stefan Richter, Matthew Garrett,
    Andrew Morton, David Rees, Jesper Krogh,
    Linux Kernel Mailing List
Subject: Re: Linux 2.6.29
Message-ID: <49D1206E.7090809@garzik.org>
In-Reply-To: <49D11BDD.70702@redhat.com>

Rik van Riel wrote:
> Linus Torvalds wrote:
>> And my point is, IT MAKES SENSE to just do the elevator barrier,
>> _without_ the drive command.
>
> No argument there.  I have seen NCQ starvation on SATA disks,
> with some requests sitting in the drive for seconds, while
> the drive was busy handling hundreds of requests/second
> elsewhere...

If certain requests hang around in the drive's write-back cache
longer than others, it becomes more likely that the ordering the
filesystem requires -- and the elevator provides -- is skewed once
requests are passed to drive firmware.

The sad, sucky fact is that NCQ starvation makes FLUSH CACHE more
important than ever, if filesystems want to get ordering right.

IDEALLY, according to the SATA protocol spec, we could issue up to 32
NCQ commands to a SATA drive, each marked with the FUA bit to force
the write to reach permanent media before the command completes.  In
theory, this NCQ+FUA mode gives the drive maximum freedom to optimize
parallel in-progress commands, decoupling command completion from
command issue -- while also giving the OS complete control of
ordering by virtue of emptying the SATA tagged command queue.

In practice, NCQ+FUA flat out did not work on early drives, and
performance was well below what you would expect for parallel
write-through command execution.  I haven't benchmarked NCQ+FUA in a
few years; it might be worth revisiting.

	Jeff
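To make the wire format concrete, here is a minimal, untested
user-space sketch that lays out the 20-byte SATA Register
Host-to-Device FIS for WRITE FPDMA QUEUED (0x61): for NCQ the sector
count moves into the FEATURES registers, the 5-bit queue tag sits in
count bits 7:3, and FUA is bit 7 of the device register.  The helper
name build_ncq_write_fis() is made up for illustration; the program
only prints the FIS bytes and never touches a drive.

/*
 * Sketch: build a SATA Register H2D FIS for WRITE FPDMA QUEUED.
 * Illustration only -- prints the 20 bytes, no hardware involved.
 */
#include <stdio.h>
#include <stdint.h>
#include <string.h>

#define FIS_TYPE_REG_H2D	0x27
#define ATA_CMD_FPDMA_WRITE	0x61		/* WRITE FPDMA QUEUED */
#define ATA_DEV_LBA		(1 << 6)	/* device reg: LBA mode */
#define ATA_DEV_FUA		(1 << 7)	/* device reg: FUA (NCQ) */

static void build_ncq_write_fis(uint8_t fis[20], uint8_t tag,
				uint64_t lba, uint16_t nsect, int fua)
{
	memset(fis, 0, 20);
	fis[0] = FIS_TYPE_REG_H2D;
	fis[1] = 0x80;			/* C bit: command register update */
	fis[2] = ATA_CMD_FPDMA_WRITE;
	/* NCQ: sector count is carried in the FEATURES registers */
	fis[3]  = nsect & 0xff;		/* features 7:0 */
	fis[11] = (nsect >> 8) & 0xff;	/* features 15:8 */
	fis[4]  = lba & 0xff;		/* lba 7:0 */
	fis[5]  = (lba >> 8) & 0xff;	/* lba 15:8 */
	fis[6]  = (lba >> 16) & 0xff;	/* lba 23:16 */
	fis[8]  = (lba >> 24) & 0xff;	/* lba 31:24 */
	fis[9]  = (lba >> 32) & 0xff;	/* lba 39:32 */
	fis[10] = (lba >> 40) & 0xff;	/* lba 47:40 */
	fis[7]  = ATA_DEV_LBA | (fua ? ATA_DEV_FUA : 0);
	/* queue tag (0-31) lives in count bits 7:3 */
	fis[12] = (tag & 0x1f) << 3;
}

int main(void)
{
	uint8_t fis[20];
	int i;

	/* tag 5, arbitrary LBA, 8 sectors, FUA set */
	build_ncq_write_fis(fis, 5, 0x12345678ULL, 8, 1);
	for (i = 0; i < 20; i++)
		printf("%02x%c", fis[i], (i % 8 == 7) ? '\n' : ' ');
	printf("\n");
	return 0;
}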