From: Pavel Tatashin
To: steven.sistare@oracle.com, daniel.m.jordan@oracle.com,
    pasha.tatashin@oracle.com, m.mizuma@jp.fujitsu.com,
    akpm@linux-foundation.org, mhocko@suse.com, catalin.marinas@arm.com,
    takahiro.akashi@linaro.org, gi-oh.kim@profitbricks.com,
    heiko.carstens@de.ibm.com, baiyaowei@cmss.chinamobile.com,
    richard.weiyang@gmail.com, paul.burton@mips.com,
    miles.chen@mediatek.com, vbabka@suse.cz, mgorman@suse.de,
    hannes@cmpxchg.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: [v6 0/2] initialize pages on demand during boot
Date: Tue, 13 Mar 2018 14:23:53 -0400
Message-Id: <20180313182355.17669-1-pasha.tatashin@oracle.com>
X-Mailer: git-send-email 2.16.2
X-Mailing-List: linux-kernel@vger.kernel.org

Change log:
    v5 - v6
    - Fixed issues found by Andrew Morton: replaced cond_resched() with
      touch_nmi_watchdog(), instead of simply deleting it.
    - Removed useless pgdata_resize_lock_irq(), as regular
      pgdata_resize_lock() does exactly what is needed.
    - Included fixes to comments by Andrew from
      mm-initialize-pages-on-demand-during-boot-v5-fix.patch.

    v4 - v5
    - Fix issue reported by Vlastimil Babka:
      > I've noticed that this function first disables the
      > on-demand initialization, and then runs the kthreads.
      > Doesn't that leave a window where allocations can fail? The
      > chances are probably small, but I think it would be better
      > to avoid it completely, rare failures suck.
      >
      > Fixing that probably means rethinking the whole
      > synchronization more dramatically though :/
    - Introduced a new patch that uses the node resize lock to
      synchronize on-demand deferred page initialization and regular
      deferred page initialization.

    v3 - v4
    - Fix !CONFIG_NUMA issue.

    v2 - v3
    Andrew Morton's comments:
    - Moved read of pgdat->first_deferred_pfn into
      deferred_zone_grow_lock, thus got rid of READ_ONCE()/WRITE_ONCE()
    - Replaced spin_lock() with spin_lock_irqsave() in
      deferred_grow_zone
    - Updated comments for deferred_zone_grow_lock
    - Updated comment before deferred_grow_zone() explaining return
      value, and also noinline specifier.
    - Fixed comment before _deferred_grow_zone().

    v1 - v2
    Added Tested-by: Masayoshi Mizuma

This change helps for three reasons:

1. Insufficient amount of reserved memory due to arguments provided by
   the user. The user may request some buffers, increased hash table
   sizes, etc. Currently, the machine panics during boot if it can't
   allocate memory due to an insufficient amount of reserved memory.
   With this change, it will be able to grow the zone before deferred
   pages are initialized.

   One observed example is described in the linked discussion [1],
   where Mel Gorman writes:

   "
   Yasuaki Ishimatsu reported a premature OOM when trace_buf_size=100m
   was specified on a machine with many CPUs. The kernel tried to
   allocate 38.4GB but only 16GB was available due to deferred memory
   initialisation.
   "

   The allocations in the above scenario happen per-cpu in smp_init(),
   and before deferred pages are initialized.
   So, there is no way to predict how much memory we should put aside
   to boot successfully with the deferred page initialization feature
   compiled in.

2. The second reason is future-proofing. The kernel's memory
   requirements may change, and we do not want to constantly update
   reset_deferred_meminit() to satisfy the new requirements. In
   addition, this function is currently in common code, but would
   potentially need to be split into arch-specific variants as more
   arches start taking advantage of the deferred page initialization
   feature.

3. On-demand initialization of reserved pages guarantees that, early
   in boot, a single thread initializes only as many pages as are
   needed; the rest are initialized efficiently in parallel.

[1] https://www.spinics.net/lists/linux-mm/msg139087.html

Pavel Tatashin (2):
  mm: disable interrupts while initializing deferred pages
  mm: initialize pages on demand during boot

 include/linux/memblock.h       |  10 --
 include/linux/memory_hotplug.h |  53 ++++++-----
 include/linux/mmzone.h         |   5 +-
 mm/memblock.c                  |  23 -----
 mm/page_alloc.c                | 202 +++++++++++++++++++++++++++++++----------
 5 files changed, 186 insertions(+), 107 deletions(-)

-- 
2.16.2
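
For readers who want a feel for reason 3, here is a rough userspace toy model of the idea, not the code from the patches. Every name in it (toy_zone, toy_grow_zone, toy_alloc, toy_deferred_init, GROW_CHUNK) is hypothetical, the chunk size is an arbitrary assumption, and the locking that the series adds around zone growth is left out entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: these names are hypothetical, not the kernel's. */
#define GROW_CHUNK 4	/* pages initialized per on-demand step (assumed) */

struct toy_zone {
	size_t spanned;		/* total pages in the zone */
	size_t initialized;	/* pages initialized so far */
	size_t allocated;	/* pages already handed out */
};

/* Initialize up to GROW_CHUNK more pages; false if none are left. */
static bool toy_grow_zone(struct toy_zone *z)
{
	size_t left = z->spanned - z->initialized;

	if (left == 0)
		return false;
	z->initialized += left < GROW_CHUNK ? left : GROW_CHUNK;
	return true;
}

/* Allocate n pages, growing the zone on demand instead of failing early. */
static bool toy_alloc(struct toy_zone *z, size_t n)
{
	while (z->initialized - z->allocated < n) {
		if (!toy_grow_zone(z))
			return false;	/* the zone really is exhausted */
	}
	z->allocated += n;
	assert(z->allocated <= z->initialized);
	return true;
}

/* Stand-in for the later parallel pass: initialize whatever remains. */
static size_t toy_deferred_init(struct toy_zone *z)
{
	size_t done = z->spanned - z->initialized;

	z->initialized = z->spanned;
	return done;
}
```

In this model, an early allocation larger than the initially prepared pages succeeds by initializing just enough extra chunks, and toy_deferred_init() then covers only what early boot did not already touch.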