Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1755941AbcLBKDY (ORCPT ); Fri, 2 Dec 2016 05:03:24 -0500 Received: from mail-wj0-f193.google.com ([209.85.210.193]:34250 "EHLO mail-wj0-f193.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750917AbcLBKDW (ORCPT ); Fri, 2 Dec 2016 05:03:22 -0500 Date: Fri, 2 Dec 2016 11:03:19 +0100 From: Michal Hocko To: Mel Gorman Cc: Andrew Morton , Christoph Lameter , Vlastimil Babka , Johannes Weiner , Jesper Dangaard Brouer , Linux-MM , Linux-Kernel Subject: Re: [PATCH 1/2] mm, page_alloc: Keep pcp count and list contents in sync if struct page is corrupted Message-ID: <20161202100318.GF6830@dhcp22.suse.cz> References: <20161202002244.18453-1-mgorman@techsingularity.net> <20161202002244.18453-2-mgorman@techsingularity.net> <20161202081216.GA6830@dhcp22.suse.cz> <20161202094933.jxcgvtth2poqdm3n@techsingularity.net> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20161202094933.jxcgvtth2poqdm3n@techsingularity.net> User-Agent: Mutt/1.6.0 (2016-04-01) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 1727 Lines: 41 On Fri 02-12-16 09:49:33, Mel Gorman wrote: > On Fri, Dec 02, 2016 at 09:12:17AM +0100, Michal Hocko wrote: > > On Fri 02-12-16 00:22:43, Mel Gorman wrote: > > > Vlastimil Babka pointed out that commit 479f854a207c ("mm, page_alloc: > > > defer debugging checks of pages allocated from the PCP") will allow the > > > per-cpu list counter to be out of sync with the per-cpu list contents > > > if a struct page is corrupted. This patch keeps the accounting in sync. > > > > > > Fixes: 479f854a207c ("mm, page_alloc: defer debugging checks of pages allocated from the PCP") > > > Signed-off-by: Mel Gorman > > > cc: stable@vger.kernel.org [4.7+] > > > > I am trying to think about what would happen if we did go out of sync > > and cannot spot a problem. Vlastimil has mentioned something about > > free_pcppages_bulk looping for ever but I cannot see it happening right > > now. > > free_pcppages_bulk can infinite loop if the page count is positive and > there are no pages. While I've only seen this during development, a > corrupted count loops here > > do { > batch_free++; > if (++pindex == NR_PCP_LISTS) > pindex = 0; > list = &pcp->lists[pindex]; > } while (list_empty(list)); > > It would only be seen in a situation where struct page corruption was > detected so it's rare. OK, I was apparently sleeping when responding. I focused on t he outer loop and that should just converge. But it is true that this inner loop can just runaway... Could you add that to the changelog please? This definitely warrants stable backport. Thanks! -- Michal Hocko SUSE Labs