Return-Path: linux-nfs-owner@vger.kernel.org
Received: from mailgw1.uni-kl.de ([131.246.120.220]:39241 "EHLO
    mailgw1.uni-kl.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
    with ESMTP id S1752898Ab2AaTQh (ORCPT );
    Tue, 31 Jan 2012 14:16:37 -0500
Message-ID: <4F283E0E.7010407@itwm.fraunhofer.de>
Date: Tue, 31 Jan 2012 20:16:30 +0100
From: Bernd Schubert
MIME-Version: 1.0
To: James Bottomley
CC: "Martin K. Petersen", Chuck Lever,
    lsf-pc@lists.linux-foundation.org, linux-fsdevel,
    Linux NFS Mailing List, linux-scsi@vger.kernel.org, Sven Breuner
Subject: Re: [LSF/MM TOPIC] end-to-end data and metadata corruption detection
References: <38C050B3-2AAD-4767-9A25-02C33627E427@oracle.com>
    <4F2147BA.6030607@itwm.fraunhofer.de>
    <4F217F0C.6030105@itwm.fraunhofer.de>
    <1327620104.6151.23.camel@dabdike.int.hansenpartnership.com>
In-Reply-To: <1327620104.6151.23.camel@dabdike.int.hansenpartnership.com>
Content-Type: text/plain; charset=UTF-8; format=flowed
Sender: linux-nfs-owner@vger.kernel.org
List-ID:

On 01/27/2012 12:21 AM, James Bottomley wrote:
> On Thu, 2012-01-26 at 17:27 +0100, Bernd Schubert wrote:
>> On 01/26/2012 03:53 PM, Martin K. Petersen wrote:
>>>>>>>> "Bernd" == Bernd Schubert writes:
>>>
>>> Bernd> We from the Fraunhofer FhGFS team would like to also see the T10
>>> Bernd> DIF/DIX API exposed to user space, so that we could make use of
>>> Bernd> it for our FhGFS file system. And I think this feature is not
>>> Bernd> only useful for file systems, but in general, scientific
>>> Bernd> applications, databases, etc also would benefit from insurance of
>>> Bernd> data integrity.
>>>
>>> I'm attending a SNIA meeting today to discuss a (cross-OS) data
>>> integrity aware API. We'll see what comes out of that.
>>>
>>> With the Linux hat on I'm still mainly interested in pursuing the
>>> sys_dio interface Joel and I proposed last year.
>>> We have good experience with that I/O model and it suits applications
>>> that want to interact with the protection information well. libaio is
>>> also on my list.
>>>
>>> But obviously any help and input is appreciated...
>>>
>>
>> I guess you are referring to the interface described here
>>
>> http://www.spinics.net/lists/linux-mm/msg14512.html
>>
>> Hmm, direct IO would mean we could not use the page cache. As we are
>> using it, that would not really suit us. libaio then might be another
>> option then.
>
> Are you really sure you want protection information and the page cache?
> The reason for using DIO is that no-one could really think of a valid
> page cache based use case. What most applications using protection
> information want is to say: This is my data and this is the integrity
> verification, send it down and assure me you wrote it correctly. If you
> go via the page cache, we have all sorts of problems, like our
> granularity is a page (not a block) so you'd have to guarantee to write
> a page at a time (a mechanism for combining subpage units of protection
> information sounds like a nightmare). The write becomes mark page dirty
> and wait for the system to flush it, and we can update the page in the
> meantime. How do we update the page and its protection information
> atomically. What happens if the page gets updated but no protection
> information is supplied and so on ... The can of worms just gets more
> squirmy. Doing DIO only avoids all of this.

Well, purely direct I/O will not work anyway: FhGFS is a parallel network
file system, so data are sent from clients to servers and the path is no
longer entirely direct. The problem with direct I/O on the server-side
storage is that it is too slow for several workloads. I guess the write
performance could mostly be solved somehow, but the read cache would
still be missing entirely. From Lustre history I know that a server-side
read cache improved application performance at several sites.
So I really wouldn't like to disable it for FhGFS...

I guess if we couldn't use the page cache, we probably wouldn't attempt
to use the DIF/DIX interface, but would instead calculate our own
checksums once we start working on the data integrity feature on our
side.

Cheers,
Bernd