From: Neil Brown
Subject: Re: Atomic non-durable file write API
Date: Sun, 26 Dec 2010 08:40:07 +1100
Message-ID: <20101226084007.7939aabc@notabene.brown>
References: <4D0A7278.3080506@gmail.com> <1292710543.17128.14.camel@nayuki> <20101224085126.2a7ff187@notabene.brown>
To: Olaf van der Spek
Cc: linux-fsdevel@vger.kernel.org, linux-ext4@vger.kernel.org
List-Id: linux-ext4.vger.kernel.org

On Fri, 24 Dec 2010 12:17:46 +0100 Olaf van der Spek wrote:

> On Thu, Dec 23, 2010 at 10:51 PM, Neil Brown wrote:
> > You are asking for something that doesn't exist, which is why no-one can tell
> > you what the answer is.
>
> It seems like a very common and basic operation. If it doesn't exist
> IMO it should be created.
>
> > The only mechanism for synchronising different filesystem operations is
> > fsync.  You should use that.
> >
> > If it is too slow, use data journalling, and place your journal on a
> > small low-latency device (NVRAM??)
>
> This isn't about some DB-like app, it's about normal file writes, like
> archive extractions, compiling, editors, etc.
>

Yes, it might be nice to have a very low cost way to make those safer against
corruption during a crash.
It would have to be *very* low cost, as in most cases the cost of cleaning up
after the crash instead (e.g. 'make clean') is quite low.  But people do
sometimes edit /etc/init.d files with an ordinary editor, and it would be
rather embarrassing if a crash at just the wrong time left some critical file
incomplete.  Then again, maybe it would be easier to teach editors to fsync
before rename for files in /etc .....

So what would this mechanism really look like?  I think the proposal is to
delay committing the rename until the writeout of the file is complete,
without accelerating the writeout.
That would probably require delaying all updates to the directory until the
writeout was complete, as trying to reason about which changes were dependent
and which were independent is unlikely to be easy.

So as soon as you rename a file, you create a dependency between the file and
the directory such that no update for the directory may be written while any
page in the file is dirty.  Conversely, any fsync of the directory would
fsync the file as well.

Any write to the file should probably break the dependency, as you can no
longer be sure what exactly the rename was supposed to protect.

I suspect that much of the infrastructure for this could be implemented in
the VFS/VM.  Certainly the dependency linkage between inodes (created on
rename; destroyed on write, on fsync, or when writeout on the inode
completes) and the fsync dependency could be common code.  Preventing
writeout of directories with dependent files would need some fs interaction.
You could probably prototype in ext2 quite easily to do some testing and
collect some numbers on overhead.

I think this would be an interesting project for someone to do and I would be
happy to review any patches.  Whether it ever got further than an interesting
project would depend very much on how intrusive it was to other filesystems,
how much overhead it caused, and what actual benefits resulted.

If anyone wanted to pursue this idea, they would certainly need to address
each of those in their final proposal.

I think there could be room for improved transactional semantics in Linux
filesystems.  This might be what they should look like ... don't know yet.

NeilBrown
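
The dependency rules described above (created on rename; destroyed on write,
fsync, or completed writeout; directory writeout blocked while a dependent
file is dirty) can be sketched as a toy user-space model.  This is only an
illustration under stated assumptions: the names Inode, DepTracker and all of
their methods are hypothetical and do not correspond to any VFS/VM interface,
and the choice to skip the dependency for an already-clean file is an
assumption not stated in the mail.

```python
class Inode:
    """Toy stand-in for an in-memory inode (file or directory)."""

    def __init__(self, name):
        self.name = name
        self.dirty = False  # True while data or metadata awaits writeout


class DepTracker:
    """Hypothetical model of the rename dependency rules; not a VFS API."""

    def __init__(self):
        # directory -> set of files whose writeout must complete before
        # the directory update may be written
        self.deps = {}

    def _drop(self, file):
        # Remove `file` from every directory's dependency set.
        for files in self.deps.values():
            files.discard(file)

    def write(self, file):
        # A new write breaks the dependency: it is no longer clear what
        # exactly the rename was supposed to protect.
        file.dirty = True
        self._drop(file)

    def rename(self, file, directory):
        # Renaming dirties the directory and ties its writeout to the file.
        directory.dirty = True
        if file.dirty:  # assumption: a clean file needs no dependency
            self.deps.setdefault(directory, set()).add(file)

    def writeout_complete(self, file):
        # Ordinary background writeout finished; dependency is satisfied.
        file.dirty = False
        self._drop(file)

    def fsync(self, inode):
        # fsync of a directory first flushes every dependent file.
        for file in list(self.deps.get(inode, ())):
            self.writeout_complete(file)
        inode.dirty = False
        self._drop(inode)  # fsync of a file also destroys its dependency

    def can_write_directory(self, directory):
        # The directory may be written only with no outstanding dependents.
        return not self.deps.get(directory)
```

In this model, writing a file and then renaming it into a directory makes
can_write_directory() return False for that directory until the file's
writeout completes, the file is written again, or either inode is fsynced.
Nothing in the model speeds up the file's writeout, which matches the "without
accelerating the writeout" part of the proposal.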