Subject: Re: [RFC PATCH 1/3] mm: make start_isolate_page_range() fail if already isolated
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: Michal Hocko, Christopher Lameter, Guy Shattah, Anshuman Khandual, Michal Nazarewicz, Vlastimil Babka, David Nellans, Laura Abbott, Pavel Machek, Dave Hansen
References: <20180212222056.9735-1-mike.kravetz@oracle.com> <20180212222056.9735-2-mike.kravetz@oracle.com>
From: Mike Kravetz
Date: Thu, 15 Feb 2018 16:40:28 -0800
In-Reply-To: <20180212222056.9735-2-mike.kravetz@oracle.com>
Sender: linux-kernel-owner@vger.kernel.org
X-Mailing-List: linux-kernel@vger.kernel.org

On 02/12/2018 02:20 PM, Mike Kravetz wrote:
> start_isolate_page_range() is used to set the migrate type of a
> page block to MIGRATE_ISOLATE while attempting to start a
> migration operation. It is assumed that only one thread is
> attempting such an operation, and due to the limited number of
> callers this is generally the case. However, there are no
> guarantees and it is 'possible' for two threads to operate on
> the same range.

I confirmed my suspicion that this is possible today. As a test, I created
a large CMA area at boot time. I wrote some code to exercise large
allocations and frees via cma_alloc()/cma_release(). At the same time, I
allocated and freed gigantic pages via the sysfs interface. After a
little bit of running, 'free memory' on the system went to zero. After
stopping the tests, I observed that most zone normal page blocks were
marked as MIGRATE_ISOLATE, and hence not available.

As mentioned in the commit message, I doubt we will see this in normal
operation. But my testing confirms that it is possible. Therefore, we
should consider a patch like this or some other form of mitigation even
if we don't move forward with adding the new interface.

--
Mike Kravetz

>
> Since start_isolate_page_range() is called at the beginning of
> such operations, have it return -EBUSY if MIGRATE_ISOLATE is
> already set.
>
> This will allow start_isolate_page_range to serve as a
> synchronization mechanism and will allow for more general use
> of callers making use of these interfaces.
>
> Signed-off-by: Mike Kravetz
> ---
>  mm/page_alloc.c     |  8 ++++----
>  mm/page_isolation.c | 10 +++++++++-
>  2 files changed, 13 insertions(+), 5 deletions(-)
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 76c9688b6a0a..064458f317bf 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -7605,11 +7605,11 @@ static int __alloc_contig_migrate_range(struct compact_control *cc,
>   * @gfp_mask: GFP mask to use during compaction
>   *
>   * The PFN range does not have to be pageblock or MAX_ORDER_NR_PAGES
> - * aligned, however it's the caller's responsibility to guarantee that
> - * we are the only thread that changes migrate type of pageblocks the
> - * pages fall in.
> + * aligned. The PFN range must belong to a single zone.
>   *
> - * The PFN range must belong to a single zone.
> + * The first thing this routine does is attempt to MIGRATE_ISOLATE all
> + * pageblocks in the range. Once isolated, the pageblocks should not
> + * be modified by others.
>   *
>   * Returns zero on success or negative error code. On success all
>   * pages which PFN is in [start, end) are allocated for the caller and
> diff --git a/mm/page_isolation.c b/mm/page_isolation.c
> index 165ed8117bd1..e815879d525f 100644
> --- a/mm/page_isolation.c
> +++ b/mm/page_isolation.c
> @@ -28,6 +28,13 @@ static int set_migratetype_isolate(struct page *page, int migratetype,
>
>  	spin_lock_irqsave(&zone->lock, flags);
>
> +	/*
> +	 * We assume we are the only ones trying to isolate this block.
> +	 * If MIGRATE_ISOLATE already set, return -EBUSY
> +	 */
> +	if (is_migrate_isolate_page(page))
> +		goto out;
> +
>  	pfn = page_to_pfn(page);
>  	arg.start_pfn = pfn;
>  	arg.nr_pages = pageblock_nr_pages;
> @@ -166,7 +173,8 @@ __first_valid_page(unsigned long pfn, unsigned long nr_pages)
>   * future will not be allocated again.
>   *
>   * start_pfn/end_pfn must be aligned to pageblock_order.
> - * Returns 0 on success and -EBUSY if any part of range cannot be isolated.
> + * Returns 0 on success and -EBUSY if any part of range cannot be isolated
> + * or any part of the range is already set to MIGRATE_ISOLATE.
>   */
>  int start_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn,
>  			     unsigned migratetype, bool skip_hwpoisoned_pages)
>
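
For anyone following along, the idea in the quoted patch can be illustrated
with a small user-space model. This is only a sketch and not kernel code:
the blocks[] array, NR_BLOCKS, isolate_block(), start_isolate_range() and
undo_isolate_range() are hypothetical stand-ins invented for illustration.
The model has no locking; in the patch the check and the update both happen
under zone->lock, which is what makes the test-and-set safe between threads.
The rollback on failure is part of the sketch, chosen so that a losing
caller never clears a block another caller isolated.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical user-space model of pageblock migrate types. */
enum migratetype { MIGRATE_MOVABLE, MIGRATE_ISOLATE };

#define NR_BLOCKS 8

/* Zero-initialised, so every block starts as MIGRATE_MOVABLE. */
static enum migratetype blocks[NR_BLOCKS];

/*
 * Models the check added by the patch: refuse to isolate a block that
 * is already MIGRATE_ISOLATE. In the kernel this test and the update
 * are done while holding zone->lock; this model is single-threaded.
 */
static int isolate_block(size_t i)
{
	if (blocks[i] == MIGRATE_ISOLATE)
		return -EBUSY;
	blocks[i] = MIGRATE_ISOLATE;
	return 0;
}

/* Return blocks in [start, end) to MIGRATE_MOVABLE. */
static void undo_isolate_range(size_t start, size_t end)
{
	for (size_t i = start; i < end; i++)
		blocks[i] = MIGRATE_MOVABLE;
}

/*
 * Isolate every block in [start, end). If any block is already
 * isolated, roll back only the blocks this call isolated and
 * return -EBUSY, leaving the other owner's blocks untouched.
 */
static int start_isolate_range(size_t start, size_t end)
{
	assert(start <= end && end <= NR_BLOCKS);

	for (size_t i = start; i < end; i++) {
		if (isolate_block(i)) {
			undo_isolate_range(start, i);
			return -EBUSY;
		}
	}
	return 0;
}
```

With this model, a first caller that isolates blocks 4..7 succeeds, and a
second caller asking for 2..5 gets -EBUSY with blocks 2 and 3 returned to
MIGRATE_MOVABLE. Without the check, the second caller would also "succeed",
and whichever caller finished first would clear MIGRATE_ISOLATE underneath
the other.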