Date: Mon, 23 Aug 2010 18:24:49 +0900
From: Naoya Horiguchi
To: Wu Fengguang
Cc: Andi Kleen, Andrew Morton, Christoph Lameter, Mel Gorman, "Jun'ichi Nomura", linux-mm, LKML
Subject: Re: [PATCH 1/9] HWPOISON, hugetlb: move PG_HWPoison bit check
Message-ID: <20100823092449.GC3769@spritzera.linux.bs1.fc.nec.co.jp>
References: <1281432464-14833-1-git-send-email-n-horiguchi@ah.jp.nec.com> <1281432464-14833-2-git-send-email-n-horiguchi@ah.jp.nec.com> <20100818001842.GC6928@localhost> <20100819075543.GA4125@spritzera.linux.bs1.fc.nec.co.jp> <20100819092828.GA20863@localhost>
In-Reply-To: <20100819092828.GA20863@localhost>

On Thu, Aug 19, 2010 at 05:28:28PM +0800, Wu Fengguang wrote:
> On Thu, Aug 19, 2010 at 03:55:43PM +0800, Naoya Horiguchi wrote:
> > On Wed, Aug 18, 2010 at 08:18:42AM +0800, Wu Fengguang wrote:
> > > On Tue, Aug 10, 2010 at 05:27:36PM +0800, Naoya Horiguchi wrote:
> > > > In order to handle metadata correctly, we should check whether the hugepage
> > > > we are going to access is HWPOISONed *before* incrementing mapcount,
> > > > adding the hugepage into pagecache or constructing anon_vma.
> > > > This patch also adds retry code when there is a race between
> > > > alloc_huge_page() and memory failure.
> > >
> > > This duplicates the PageHWPoison() test into 3 places without really
> > > addressing any problem.
> > > For example, there are still _unavoidable_ races
> > > between PageHWPoison() and add_to_page_cache().
> > >
> > > What's the problem you are trying to resolve here? If there is
> > > data structure corruption, we may need to do it in some other ways.
> >
> > The problem I tried to resolve in this patch is the corruption of
> > data structures when memory failure occurs between alloc_huge_page()
> > and lock_page().
> > The corruption occurs because a page fault can fail with metadata changes
> > left in place (such as refcount, mapcount, etc.)
> > Since the PageHWPoison() check exists to keep a hwpoisoned page that
> > remains in the pagecache from being mapped into the process, it should
> > be done in the "found in pagecache" branch, not in the common path.
> > This patch moves the check to the "found in pagecache" branch.
>
> That's good stuff to put in the changelog.

OK.

> > In addition to that, I added 2 PageHWPoison checks in the "new allocation" branches
> > to improve the chance of recovering from memory failures on pages under allocation.
> > But it's a different point from the original one, so I will drop these retry checks.
>
> So you'll remove the first two chunks and retain the 3rd chunk?

Yes.

> That makes it a small bug-fix patch suitable for 2.6.36 and I'll
> happily ACK it :)

Thank you!
Naoya Horiguchi