Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id ; Fri, 4 Oct 2002 12:51:09 -0400 Received: (majordomo@vger.kernel.org) by vger.kernel.org id ; Fri, 4 Oct 2002 12:51:09 -0400 Received: from dell-paw-3.cambridge.redhat.com ([195.224.55.237]:47613 "EHLO executor.cambridge.redhat.com") by vger.kernel.org with ESMTP id ; Fri, 4 Oct 2002 12:51:06 -0400 To: David Howells , linux-kernel@vger.kernel.org Subject: Re: [PATCH] AFS filesystem for Linux (2/2) In-Reply-To: Message from Jan Harkes of "Fri, 04 Oct 2002 12:07:08 EDT." <20021004160708.GA14737@ravel.coda.cs.cmu.edu> User-Agent: EMH/1.14.1 SEMI/1.14.3 (Ushinoya) FLIM/1.14.3 (=?ISO-8859-4?Q?Unebigory=F2mae?=) APEL/10.3 Emacs/21.2 (i686-pc-linux-gnu) MULE/5.0 (SAKAKI) MIME-Version: 1.0 (generated by SEMI 1.14.3 - "Ushinoya") Content-Type: text/plain; charset=US-ASCII Date: Fri, 04 Oct 2002 17:56:36 +0100 Message-ID: <27504.1033750596@warthog.cambridge.redhat.com> From: David Howells Sender: linux-kernel-owner@vger.kernel.org X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 6351 Lines: 142 Hi Jan, > We automatically resolve directory conflict, file conflicts are 'punted' to > userspace. There is a manual repair tool, where a user can pick the version > that he believes is the most appropriate, or he can substitute a replacement > or merged copy. > > There is also a framework for application specific resolvers, these can try > to automatically merge f.i. calendar files, or email boxes, or > remove/recompile object files when there is a conflict, etc. I see. So users have to know how to resolve conflicts themselves... > For AFS you don't have to care, it's semantics without locking are that > the 'last writer to close it's fd' wins. Is that a requirement for AFS? I'm looking at trying to pass writes to the server as soon as possible. > I thought you were implementing AFS, how can you choose your model? Where there's leeway, I can choose. > How about dirty data, that was locally modified? In some ways, what I'd ideally like to do is have write() contact the server and actually send the data direct, then assuming that wasn't rejected, update the local cache. Any bits of the write that don't overlap pages that are cached _can_ just be discarded once written to the server, as I can always retrieve them from the server later, complete with their surrounding data. Alternatively, I can use prepare-write to make sure I hold copies of all the pages I want to modify, and then use commit-write to write them back. The problem with doing that is that I potentially have bits either side that may end up reverting a write made to the server by someone else. The third option - using writepage() - has the same problems as the second. However, I could get around this by doing a binary "diff" of the page in the local cache against the modified page in memory and just send the changes to the server. Or, I could then retrieve a fresh copy of the page and apply the differences to that. The problem there is that I can't detect changes that have made no difference. > There is a network latency that comes into play here. If you want last > writer wins you should not zap dirty data. I have to zap pages that some other client has modified. The server has told me to invalidate it, and so I need to reread it from the server. This is probably only going to be a problem with shared-writable mappings, and the problem is due to AFS, not how I do my caching. > So other clients will read inconsistent data if they don't have that > chunk cached. And if the server generates callbacks on the write of a > chunk, all clients see the update? Not with prepare-/commit-write if commit-write is synchronous. In any case, if they don't have that chunk cached they'll have to fetch it from the server. > Ehh, access permissions should already be checked when the file is > opened. NFS is dealing with the same problems here. AFS fileservers don't have a concept of an open file. Every time you do a fetch or a store (be for it data or metadata) you get a short-term "lease" on the file that the server breaks if some other client modifies it. Furthermore, every operation a client issues is checked at the RxRPC level. This way a client can have several users holding a file open in the local VFS and just label the operations sent to the server with tokens representing the users doing the access. How does Coda do it? Does it get "open" the file on the server with a particular set of credentials for each user that wants to use it? > > How does Coda deal with this security problem? > > What problem? For instance: you have one file in your cache, several people write to it, but one of the people in the middle had permission taken away as far as the server is concerned. Also, when writing the file back, if several people have changed it, whose credentials do you present to the server as having made the change? > If you don't have write permission you can't write to the file. If you do > have write permissions you can write. When the last writer closes it's > filehandle the data is sent to the server with his permissions. What if someone changes the permissions on the server's copy of the file whilst you have the file open? > Now if the ACL is changed on the servers before the close so that this last > writer loses write access we get a 'conflict' that is punted to userspace, > similar to the case when writing to an already updated file. This is what I was getting at. The main issue is how do you deal with a file in the cache that has been written to by several people, some of whom don't have write permission any more. > So I store it in VM on the disk and you store it in your private FS on > disk. Looks like the same thing to me, except I get to enjoy the benefits of > the page/swapcache keeping the data in memory when it is frequently used > without having to do anything smart. And I get to enjoy the benefits of the pagecache keeping the data in memory when it is frequently used, and I can also do smart stuff and try to optimise access. > Virtual Memory. A private mmap of a file. I thought you might have meant a "Volume Manager" rather than the kernel's VM subsystem. > Updates dirty the pages, so they end up in swap, but we log the same update > in a logfile (journal), when the log fills up, the changes written directly > to the underlying file. So you keep everything in triplicate whilst running? (or at least duplicate plus a frequently emptied journal). > Optionally, when we know that the swap copy and the file copy of the page > are identical, the page is remapped to reduce swap usage. That is where we > store all the metadata and the Coda file -> cache file mappings so that they > can survive a reboot or system crash. I see... you can have dirty data residing in the cache over a reboot, and thus you can't just nuke your cache partially or wholly, and you have to keep careful account of all the entries in there... Hmmm... I'll have to think about that, but I don't think it'll be a problem, provided I don't let close() complete until all the dirty data from a file is written to the server. David - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/