Date: Mon, 8 Jun 2009 00:02:25 +0800
From: Wu Fengguang
To: Nai Xia
Cc: Andi Kleen, Nick Piggin, hugh@veritas.com, riel@redhat.com,
    akpm@linux-foundation.org, chris.mason@oracle.com,
    linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [PATCH] [13/16] HWPOISON: The high level memory error handler in the VM v3
Message-ID: <20090607160225.GA24315@localhost>
References: <200905271012.668777061@firstfloor.org>
    <20090527201239.C2C9C1D0294@basil.firstfloor.org>
    <20090528082616.GG6920@wotan.suse.de>
    <20090528093141.GD1065@one.firstfloor.org>
    <20090528120854.GJ6920@wotan.suse.de>
    <20090528134520.GH1065@one.firstfloor.org>
    <20090528145021.GA5503@localhost>

On Thu, Jun 04, 2009 at 02:25:24PM +0800, Nai Xia wrote:
> On Thu, May 28, 2009 at 10:50 PM, Wu Fengguang wrote:
> > On Thu, May 28, 2009 at 09:45:20PM +0800, Andi Kleen wrote:
> >> On Thu, May 28, 2009 at 02:08:54PM +0200, Nick Piggin wrote:
> >
> > [snip]
> >
> >> >
> >> > BTW. I don't know if you are checking for PG_writeback often enough?
> >> > You can't remove a PG_writeback page from pagecache. The normal
> >> > pattern is lock_page(page); wait_on_page_writeback(page); which I
> >>
> >> So pages can be in writeback without being locked? I still
> >> wasn't able to find such a case (in fact unless I'm misreading
> >> the code badly the writeback bit is only used by NFS and a few
> >> obscure cases)
> >
> > Yes the writeback page is typically not locked. Only read IO requires
> > to be exclusive. Read IO is in fact page *writer*, while writeback IO
> > is page *reader* :-)
>
> Sorry for maybe somewhat a little bit off topic,
> I am trying to get a good understanding of PG_writeback & PG_locked ;)
>
> So you are saying PG_writeback & PG_locked are acting like a read/write lock?
> I notice wait_on_page_writeback(page) seems always called with page locked --

No. Note that pages are not locked in wait_on_page_writeback_range().

> that is the semantics of a writer waiting to get the lock while it's
> acquired by some reader: The caller (e.g. truncate_inode_pages_range() and
> invalidate_inode_pages2_range()) are the writers waiting for
> writeback readers (as you clarified) to finish their job, right?

Sorry if my metaphor confused you. These are not typical reader/writer
problems; they are more about data integrity.

Pages have to be "not under writeback" when truncated.
Otherwise data loss is possible:

1) create a file with one page (page A)
2) truncate page A while it is under writeback
3) write to the file, which creates page B
4) sync the file, which sends page B to disk quickly

Now if page B reaches disk before A, the new data will be overwritten
by the truncated old data, which corrupts the file.

> So do you think the idea is sane to group the two bits together
> to form a real read/write lock, which does not care about the _number_
> of readers ?

We don't care about the number of readers here. So please forget about it.

Thanks,
Fengguang

> > The writeback bit is _widely_ used. test_set_page_writeback() is
> > directly used by NFS/AFS etc. But its main user is in fact
> > set_page_writeback(), which is called in 26 places.
> >
> >> > think would be safest
> >>
> >> Okay. I'll just add it after the page lock.
> >>
> >> > (then you never have to bother with the writeback bit again)
> >>
> >> Until Fengguang does something fancy with it.
> >
> > Yes I'm going to do it without wait_on_page_writeback().
> >
> > The reason truncate_inode_pages_range() has to wait on writeback page
> > is to ensure data integrity. Otherwise if there comes two events:
> >        truncate page A at offset X
> >        populate page B at offset X
> > If A and B are all writeback pages, then B can hit disk first and then
> > be overwritten by A. Which corrupts the data at offset X from user's POV.
> >
> > But for hwpoison, there are no such worries. If A is poisoned, we do
> > our best to isolate it as well as intercepting its IO. If the interception
> > fails, it will trigger another machine check before hitting the disk.
> >
> > After all, poisoned A means the data at offset X is already corrupted.
> > It doesn't matter if there comes another B page.
> >
> > Thanks,
> > Fengguang
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > Please read the FAQ at  http://www.tux.org/lkml/
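
For readers who want to see the ordering problem concretely, the four-step
scenario from the reply (truncate page A under writeback, then write and sync
page B) can be played out in a toy user-space model. This is only a rough
sketch and not kernel code: the "disk" is a dict, an in-flight writeback is a
tuple in a list, and the function name simulate_truncate_race() and its
wait_for_writeback flag are made up for illustration. It assumes the worst
case, where the newer I/O completes before the older one.

```python
def simulate_truncate_race(wait_for_writeback):
    """Toy model of truncating page A at offset X, then writing page B there.

    Hypothetical illustration only: 'disk' maps offsets to data, and
    'in_flight' holds writebacks that were submitted but have not landed yet.
    """
    disk = {}
    offset_x = 0
    in_flight = []

    # 1) Page A holds the old data and is already under writeback.
    in_flight.append(("A", offset_x, "old data"))

    # 2) Truncate page A.
    if wait_for_writeback:
        # Stand-in for waiting on writeback before dropping the page:
        # let A's I/O land first, then discard the data at offset X.
        for _name, off, data in in_flight:
            disk[off] = data
        in_flight.clear()
        disk.pop(offset_x, None)
    # Without the wait, A's I/O is still in flight after the truncate.

    # 3) A new write creates page B; 4) sync submits B right away.
    in_flight.append(("B", offset_x, "new data"))

    # Assumed worst-case completion order: the newest I/O lands first.
    for _name, off, data in reversed(in_flight):
        disk[off] = data

    return disk[offset_x]


if __name__ == "__main__":
    print("no wait:  ", simulate_truncate_race(False))
    print("with wait:", simulate_truncate_race(True))
```

Without the wait, A's stale I/O lands after B and offset X ends up holding the
old data. With the wait, A is drained before the truncate, so only B remains
in flight and the new data survives. The hwpoison case discussed in the quoted
text differs because the data at offset X is already considered corrupted.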