From: Pavel Tatashin <pasha.tatashin@oracle.com>
To: steven.sistare@oracle.com, daniel.m.jordan@oracle.com,
    pasha.tatashin@oracle.com, m.mizuma@jp.fujitsu.com,
    akpm@linux-foundation.org, mhocko@suse.com, catalin.marinas@arm.com,
    takahiro.akashi@linaro.org, gi-oh.kim@profitbricks.com,
    heiko.carstens@de.ibm.com, baiyaowei@cmss.chinamobile.com,
    richard.weiyang@gmail.com, paul.burton@mips.com,
    miles.chen@mediatek.com, vbabka@suse.cz, mgorman@suse.de,
    hannes@cmpxchg.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: [v6 1/2] mm: disable interrupts while initializing deferred pages
Date: Tue, 13 Mar 2018 14:23:54 -0400
Message-Id: <20180313182355.17669-2-pasha.tatashin@oracle.com>
X-Mailer: git-send-email 2.16.2
In-Reply-To: <20180313182355.17669-1-pasha.tatashin@oracle.com>
References: <20180313182355.17669-1-pasha.tatashin@oracle.com>

Vlastimil Babka reported a window in which allocations may fail: deferred
pages are still being initialized, yet the current round of on-demand
initialization has already finished. While this scenario is highly
unlikely, since such an allocation request must be large and must come
from an interrupt handler, we still want to cover it.

We solve this by initializing deferred pages with interrupts disabled,
holding the node_size_lock spinlock while pages in the node are being
initialized. The on-demand deferred page initialization that comes later
will take the same lock, and thus synchronize with
deferred_init_memmap().

It is unlikely that the threads initializing deferred pages will be
interrupted: they run soon after smp_init(), before modules are
initialized, and long before user space programs start. This is why
running them with interrupts disabled has no adverse effect.

Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
---
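Note (illustration only, not part of the applied patch): the locking
contract introduced below can be sketched as follows. The function
example_use_deferred() is hypothetical, a stand-in for the on-demand
initialization path added in the next patch of this series; only
pgdat_resize_lock()/pgdat_resize_unlock() are interfaces this patch
actually touches.

	/*
	 * Illustrative sketch: a late consumer of deferred struct pages
	 * serializes with deferred_init_memmap() by taking node_size_lock,
	 * which deferred_init_memmap() now holds, with interrupts disabled,
	 * for the whole of the node's deferred initialization.
	 */
	static bool example_use_deferred(pg_data_t *pgdat)
	{
		unsigned long flags;
		bool done;

		/* Disables interrupts and takes pgdat->node_size_lock */
		pgdat_resize_lock(pgdat, &flags);

		/* first_deferred_pfn cannot change while the lock is held */
		done = (pgdat->first_deferred_pfn == ULONG_MAX);

		/* ... initialize more deferred pages here if needed ... */

		/* Drops the lock and restores the saved interrupt state */
		pgdat_resize_unlock(pgdat, &flags);

		return done;
	}

Because the lock is held for the whole deferred run, such a caller, even
when entered from interrupt context on another CPU, waits for
deferred_init_memmap() to finish before touching the same ranges.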
 include/linux/memory_hotplug.h | 53 ++++++++++++++++++++++--------------------
 include/linux/mmzone.h         |  5 ++--
 mm/page_alloc.c                | 19 ++++++++-------
 3 files changed, 42 insertions(+), 35 deletions(-)

diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
index aba5f86eb038..2b0265265c28 100644
--- a/include/linux/memory_hotplug.h
+++ b/include/linux/memory_hotplug.h
@@ -51,24 +51,6 @@ enum {
 	MMOP_ONLINE_MOVABLE,
 };
 
-/*
- * pgdat resizing functions
- */
-static inline
-void pgdat_resize_lock(struct pglist_data *pgdat, unsigned long *flags)
-{
-	spin_lock_irqsave(&pgdat->node_size_lock, *flags);
-}
-static inline
-void pgdat_resize_unlock(struct pglist_data *pgdat, unsigned long *flags)
-{
-	spin_unlock_irqrestore(&pgdat->node_size_lock, *flags);
-}
-static inline
-void pgdat_resize_init(struct pglist_data *pgdat)
-{
-	spin_lock_init(&pgdat->node_size_lock);
-}
 /*
  * Zone resizing functions
  *
@@ -246,13 +228,6 @@ extern void clear_zone_contiguous(struct zone *zone);
 	___page;				\
 })
 
-/*
- * Stub functions for when hotplug is off
- */
-static inline void pgdat_resize_lock(struct pglist_data *p, unsigned long *f) {}
-static inline void pgdat_resize_unlock(struct pglist_data *p, unsigned long *f) {}
-static inline void pgdat_resize_init(struct pglist_data *pgdat) {}
-
 static inline unsigned zone_span_seqbegin(struct zone *zone)
 {
 	return 0;
@@ -293,6 +268,34 @@ static inline bool movable_node_is_enabled(void)
 }
 #endif /* ! CONFIG_MEMORY_HOTPLUG */
 
+#if defined(CONFIG_MEMORY_HOTPLUG) || defined(CONFIG_DEFERRED_STRUCT_PAGE_INIT)
+/*
+ * pgdat resizing functions
+ */
+static inline
+void pgdat_resize_lock(struct pglist_data *pgdat, unsigned long *flags)
+{
+	spin_lock_irqsave(&pgdat->node_size_lock, *flags);
+}
+static inline
+void pgdat_resize_unlock(struct pglist_data *pgdat, unsigned long *flags)
+{
+	spin_unlock_irqrestore(&pgdat->node_size_lock, *flags);
+}
+static inline
+void pgdat_resize_init(struct pglist_data *pgdat)
+{
+	spin_lock_init(&pgdat->node_size_lock);
+}
+#else /* !(CONFIG_MEMORY_HOTPLUG || CONFIG_DEFERRED_STRUCT_PAGE_INIT) */
+/*
+ * Stub functions for when hotplug is off
+ */
+static inline void pgdat_resize_lock(struct pglist_data *p, unsigned long *f) {}
+static inline void pgdat_resize_unlock(struct pglist_data *p, unsigned long *f) {}
+static inline void pgdat_resize_init(struct pglist_data *pgdat) {}
+#endif /* !(CONFIG_MEMORY_HOTPLUG || CONFIG_DEFERRED_STRUCT_PAGE_INIT) */
+
 #ifdef CONFIG_MEMORY_HOTREMOVE
 extern bool is_mem_section_removable(unsigned long pfn,
 					unsigned long nr_pages);
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 7522a6987595..d14168da66a7 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -633,14 +633,15 @@ typedef struct pglist_data {
 #ifndef CONFIG_NO_BOOTMEM
 	struct bootmem_data *bdata;
 #endif
-#ifdef CONFIG_MEMORY_HOTPLUG
+#if defined(CONFIG_MEMORY_HOTPLUG) || defined(CONFIG_DEFERRED_STRUCT_PAGE_INIT)
 	/*
 	 * Must be held any time you expect node_start_pfn, node_present_pages
 	 * or node_spanned_pages stay constant. Holding this will also
 	 * guarantee that any pfn_valid() stays that way.
 	 *
 	 * pgdat_resize_lock() and pgdat_resize_unlock() are provided to
-	 * manipulate node_size_lock without checking for CONFIG_MEMORY_HOTPLUG.
+	 * manipulate node_size_lock without checking for CONFIG_MEMORY_HOTPLUG
+	 * or CONFIG_DEFERRED_STRUCT_PAGE_INIT.
 	 *
 	 * Nests above zone->lock and zone->span_seqlock
 	 */
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 3d974cb2a1a1..cada509e2176 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1506,7 +1506,7 @@ static void __init deferred_free_pages(int nid, int zid, unsigned long pfn,
 	} else if (!(pfn & nr_pgmask)) {
 		deferred_free_range(pfn - nr_free, nr_free);
 		nr_free = 1;
-		cond_resched();
+		touch_nmi_watchdog();
 	} else {
 		nr_free++;
 	}
@@ -1535,7 +1535,7 @@ static unsigned long __init deferred_init_pages(int nid, int zid,
 			continue;
 		} else if (!page || !(pfn & nr_pgmask)) {
 			page = pfn_to_page(pfn);
-			cond_resched();
+			touch_nmi_watchdog();
 		} else {
 			page++;
 		}
@@ -1552,23 +1552,25 @@ static int __init deferred_init_memmap(void *data)
 	int nid = pgdat->node_id;
 	unsigned long start = jiffies;
 	unsigned long nr_pages = 0;
-	unsigned long spfn, epfn;
+	unsigned long spfn, epfn, first_init_pfn, flags;
 	phys_addr_t spa, epa;
 	int zid;
 	struct zone *zone;
-	unsigned long first_init_pfn = pgdat->first_deferred_pfn;
 	const struct cpumask *cpumask = cpumask_of_node(pgdat->node_id);
 	u64 i;
 
+	/* Bind memory initialisation thread to a local node if possible */
+	if (!cpumask_empty(cpumask))
+		set_cpus_allowed_ptr(current, cpumask);
+
+	pgdat_resize_lock(pgdat, &flags);
+	first_init_pfn = pgdat->first_deferred_pfn;
 	if (first_init_pfn == ULONG_MAX) {
+		pgdat_resize_unlock(pgdat, &flags);
 		pgdat_init_report_one_done();
 		return 0;
 	}
 
-	/* Bind memory initialisation thread to a local node if possible */
-	if (!cpumask_empty(cpumask))
-		set_cpus_allowed_ptr(current, cpumask);
-
 	/* Sanity check boundaries */
 	BUG_ON(pgdat->first_deferred_pfn < pgdat->node_start_pfn);
 	BUG_ON(pgdat->first_deferred_pfn > pgdat_end_pfn(pgdat));
@@ -1598,6 +1600,7 @@ static int __init deferred_init_memmap(void *data)
 		epfn = min_t(unsigned long, zone_end_pfn(zone), PFN_DOWN(epa));
 		deferred_free_pages(nid, zid, spfn, epfn);
 	}
+	pgdat_resize_unlock(pgdat, &flags);
 
 	/* Sanity check that the next zone really is unpopulated */
 	WARN_ON(++zid < MAX_NR_ZONES && populated_zone(++zone));
-- 
2.16.2