Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S932694AbYARWW6 (ORCPT ); Fri, 18 Jan 2008 17:22:58 -0500 Received: (majordomo@vger.kernel.org) by vger.kernel.org id S1761350AbYARWWs (ORCPT ); Fri, 18 Jan 2008 17:22:48 -0500 Received: from smtp2.linux-foundation.org ([207.189.120.14]:55610 "EHLO smtp2.linux-foundation.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1757778AbYARWWr (ORCPT ); Fri, 18 Jan 2008 17:22:47 -0500 Date: Fri, 18 Jan 2008 14:21:40 -0800 (PST) From: Linus Torvalds To: Anton Salikhmetov cc: Miklos Szeredi , peterz@infradead.org, linux-mm@kvack.org, jakob@unthought.net, linux-kernel@vger.kernel.org, valdis.kletnieks@vt.edu, riel@redhat.com, ksm@42.dk, staubach@redhat.com, jesper.juhl@gmail.com, akpm@linux-foundation.org, protasnb@gmail.com, r.e.wolff@bitwizard.nl, hidave.darkstar@gmail.com, hch@infradead.org Subject: Re: [PATCH -v6 2/2] Updating ctime and mtime for memory-mapped files In-Reply-To: <4df4ef0c0801181404m186bb847sd556e031e908b0b6@mail.gmail.com> Message-ID: References: <12006091182260-git-send-email-salikhmetov@gmail.com> <4df4ef0c0801181158s3f783beaqead3d7049d4d3fa7@mail.gmail.com> <4df4ef0c0801181303o6656832g8b63d2a119a86a9c@mail.gmail.com> <4df4ef0c0801181404m186bb847sd556e031e908b0b6@mail.gmail.com> User-Agent: Alpine 1.00 (LFD 882 2007-12-20) MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 3020 Lines: 81 On Sat, 19 Jan 2008, Anton Salikhmetov wrote: > > The page_check_address() function is called from the > page_mkclean_one() routine as follows: .. and the page_mkclean_one() function is totally different. Lookie here, this is the correct and complex sequence: > entry = ptep_clear_flush(vma, address, pte); > entry = pte_wrprotect(entry); > entry = pte_mkclean(entry); > set_pte_at(mm, address, pte, entry); That's a rather expensive sequence, but it's done exactly because it has to be done that way. What it does is to - *atomically* load the pte entry _and_ clear the old one in memory. That's the entry = ptep_clear_flush(vma, address, pte); thing, and it basically means that it's doing some architecture-specific magic to make sure that another CPU that accesses the PTE at the same time will never actually modify the pte (because it's clear and not valid) - it then - while the page table is actually clear and invalid - takes the old value and turns it into the new one: entry = pte_wrprotect(entry); entry = pte_mkclean(entry); - and finally, it replaces the entry with the new one: set_pte_at(mm, address, pte, entry); which takes care to write the new entry in some specific way that is atomic wrt other CPU's (ie on 32-bit x86 with a 64-bit page table entry it writes the high word first, see the write barriers in "native_set_pte()" in include/asm-x86/pgtable-3level.h Now, compare that subtle and correct thing with what is *not* correct: if (pte_dirty(*pte) && pte_write(*pte)) *pte = pte_wrprotect(*pte); which makes no effort at all to make sure that it's safe in case another CPU updates the accessed bit. Now, arguably it's unlikely to cause horrible problems at least on x86, because: - we only do this if the pte is already marked dirty, so while we can lose the accessed bit, we can *not* lose the dirty bit. And the accessed bit isn't such a big deal. - it's not doing any of the "be careful about" ordering things, but since the really important bits aren't changing, ordering probably won't practically matter. But the problem is that we have something like 24 different architectures, it's hard to make sure that none of them have issues. In other words: it may well work in practice. But when these things go subtly wrong, they are *really* nasty to find, and the unsafe sequence is really not how it's supposed to be done. For example, you don't even flush the TLB, so even if there are no cross-CPU issues, there's probably going to be writable entries in the TLB that now don't match the page tables. Will it matter? Again, probably impossible to see in practice. But ... Linus -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/