Date: Sun, 24 Dec 2006 23:21:13 +0200
From: "Michael S. Tsirkin"
To: Linus Torvalds
Cc: Andrei Popa, Peter Zijlstra, Andrew Morton, Gordon Farquharson, Martin Michlmayr, Hugh Dickins, Nick Piggin, Arjan van de Ven, openib-general@openib.org, Linux Kernel Mailing List
Subject: Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)
Message-ID: <20061224212113.GA31813@mellanox.co.il>

> Quoting Linus Torvalds:
> Subject: Re: [PATCH] mm: fix page_mkclean_one (was: 2.6.19 file content corruption on ext3)
>
> Peter, tell me I'm crazy, but with the new rules, the following condition
> is a bug:
>
>  - shared mapping
>  - writable
>  - not already marked dirty in the PTE
>
> because that combination means that the hardware can mark the PTE dirty
> without us even realizing (and thus not marking the "struct page *"
> dirty). Er.

Sorry about bumping in, and I'm not sure I understand all of the discussion,
but this reminded me of an old issue with COW that created what looks like a
vaguely similar data corruption on infiniband. We solved this for infiniband
with MADV_DONTFORK, but I always wondered why it does not affect other parts
of the kernel.

Small reminder from that discussion:

	down mmap sem
	get user pages
	up mmap sem
	page becomes shared, and COW (e.g. fork)
	process writes to first byte of page <----- gets a copy

Now we had a problem: the struct page that we got from get user pages no
longer points to the correct page in our process. For example:

	if at some point we map this page for DMA, and hardware writes to
	last byte of page -----> process does not see this data.

So for infiniband, what we do is a combination of
- prevent page from becoming COW while hardware might DMA to this page, and
- ask users not to write to page if hardware might DMA to same page
  (even if it's using different bytes).

I just wondered - is there some chance something like this could be
happening in the fs code?

HTH,

-- 
MST
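
For reference, here is a rough user-space sketch of the MADV_DONTFORK part of
the workaround described above. It is only an illustration, not the infiniband
code: the helper names probe_readable() and child_sees_page() are made up for
this example, and it assumes Linux with a kernel that has MADV_DONTFORK. The
idea is that a range marked MADV_DONTFORK is simply not present in the child
after fork(), so the parent's page never becomes shared and never gets COWed
away from under a pinned struct page.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <errno.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: returns 1 if one byte at addr is readable, 0 if the
 * address is not mapped (EFAULT), -1 on any other error. It asks the kernel
 * to read the byte via write() into a pipe, so an unmapped address shows up
 * as an error return instead of a SIGSEGV.
 */
int probe_readable(const void *addr)
{
	int fds[2];
	ssize_t n;
	int saved_errno;

	if (pipe(fds) != 0)
		return -1;
	n = write(fds[1], addr, 1);
	saved_errno = errno;
	close(fds[0]);
	close(fds[1]);
	if (n == 1)
		return 1;
	return saved_errno == EFAULT ? 0 : -1;
}

/*
 * Hypothetical helper: maps one anonymous page, optionally marks it
 * MADV_DONTFORK, forks, and reports what the child saw at that address:
 * 1 = page is mapped in the child, 0 = not mapped, -1 = error.
 */
int child_sees_page(int use_dontfork)
{
	size_t len = (size_t)sysconf(_SC_PAGESIZE);
	char *buf;
	pid_t pid;
	int status;

	buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
		   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (buf == MAP_FAILED)
		return -1;
	buf[0] = 'x';	/* touch the page so it is really allocated */

	if (use_dontfork && madvise(buf, len, MADV_DONTFORK) != 0) {
		munmap(buf, len);
		return -1;
	}

	pid = fork();
	if (pid < 0) {
		munmap(buf, len);
		return -1;
	}
	if (pid == 0) {
		/* Child: report through the exit code (2 = probe error). */
		int r = probe_readable(buf);
		_exit(r < 0 ? 2 : r);
	}

	if (waitpid(pid, &status, 0) != pid) {
		munmap(buf, len);
		return -1;
	}
	munmap(buf, len);
	if (!WIFEXITED(status) || WEXITSTATUS(status) > 1)
		return -1;
	return WEXITSTATUS(status);
}
```

Without the madvise() call the child inherits the page (and both sides are
then subject to COW); with it the child has no mapping there at all. This
covers only the first half of the workaround; not writing to a page that
hardware may DMA into is still up to the user.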