Date: Thu, 19 Aug 2010 17:28:28 +0800
From: Wu Fengguang
To: Naoya Horiguchi
Cc: Andi Kleen, Andrew Morton, Christoph Lameter, Mel Gorman,
    "Jun'ichi Nomura", linux-mm, LKML
Subject: Re: [PATCH 1/9] HWPOISON, hugetlb: move PG_HWPoison bit check

On Thu, Aug 19, 2010 at 03:55:43PM +0800, Naoya Horiguchi wrote:
> On Wed, Aug 18, 2010 at 08:18:42AM +0800, Wu Fengguang wrote:
> > On Tue, Aug 10, 2010 at 05:27:36PM +0800, Naoya Horiguchi wrote:
> > > In order to handle metadata correctly, we should check whether the hugepage
> > > we are going to access is HWPOISONed *before* incrementing mapcount,
> > > adding the hugepage into pagecache or constructing anon_vma.
> > > This patch also adds retry code when there is a race between
> > > alloc_huge_page() and memory failure.
> >
> > This duplicates the PageHWPoison() test into 3 places without really
> > addressing any problem. For example, there are still _unavoidable_ races
> > between PageHWPoison() and add_to_page_cache().
> >
> > What's the problem you are trying to resolve here? If there is
> > data structure corruption, we may need to do it in some other ways.
>
> The problem I tried to resolve in this patch is the corruption of
> data structures when memory failure occurs between alloc_huge_page()
> and lock_page().
> The corruption occurs because page fault can fail with metadata changes
> remaining (such as refcount, mapcount, etc.)
> Since the PageHWPoison() check is for avoiding a hwpoisoned page remaining
> in pagecache being mapped to the process, it should be done in the
> "found in pagecache" branch, not in the common path.
> This patch moves the check to the "found in pagecache" branch.

That's good stuff to put in the changelog.

> In addition to that, I added 2 PageHWPoison checks in "new allocation" branches
> to enhance the possibility to recover from memory failures on pages under allocation.
> But it's a different point from the original one, so I drop these retry checks.

So you'll remove the first two chunks and retain the 3rd chunk?
That makes it a small bug-fix patch suitable for 2.6.36 and I'll
happily ACK it :)

Thanks,
Fengguang
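
For readers following the thread, here is a minimal userspace sketch of the
ordering discussed above: the poison check sits in the "found in pagecache"
branch and runs before any mapping metadata is updated, so a failed fault
leaves the counters as it found them. This is not the kernel code or the
patch itself. struct toy_page, struct toy_cache, toy_fault() and the
TOY_FAULT_* values are hypothetical names invented for illustration, and the
single-slot cache is an assumption made to keep the example short.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for struct page: only the fields this sketch needs. */
struct toy_page {
    bool hwpoison;  /* models the PG_HWPoison bit */
    int refcount;
    int mapcount;
};

/* Toy page cache holding at most one page (an assumption for brevity). */
struct toy_cache {
    struct toy_page *slot;
};

enum toy_fault_result { TOY_FAULT_OK, TOY_FAULT_HWPOISON };

/*
 * Hypothetical fault path. "fresh" plays the role of a newly allocated
 * page and is only used when the cache is empty.
 */
enum toy_fault_result toy_fault(struct toy_cache *cache,
                                struct toy_page *fresh)
{
    struct toy_page *page;

    assert(cache != NULL);
    page = cache->slot;

    if (page) {
        /* "found in pagecache" branch: the lookup takes a reference. */
        page->refcount++;
        if (page->hwpoison) {
            /* Drop the lookup reference; nothing else was touched. */
            page->refcount--;
            return TOY_FAULT_HWPOISON;
        }
    } else {
        /* "new allocation" branch: no poison check in this sketch. */
        page = fresh;
        page->refcount++;
        cache->slot = page;
    }

    /* Mapping metadata is updated only after the checks have passed. */
    page->mapcount++;
    return TOY_FAULT_OK;
}
```

Calling toy_fault() on a cache whose slot holds a poisoned page returns
TOY_FAULT_HWPOISON with refcount and mapcount unchanged, which is the
property the changelog text above asks for. The model does not try to
represent the race with add_to_page_cache() mentioned earlier in the thread.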