Subject: Re: [v5 1/2] mm: disable interrupts while initializing deferred pages
To: Andrew Morton
Cc: steven.sistare@oracle.com, daniel.m.jordan@oracle.com,
    m.mizuma@jp.fujitsu.com, mhocko@suse.com, catalin.marinas@arm.com,
    takahiro.akashi@linaro.org, gi-oh.kim@profitbricks.com,
    heiko.carstens@de.ibm.com, baiyaowei@cmss.chinamobile.com,
    richard.weiyang@gmail.com, paul.burton@mips.com,
    miles.chen@mediatek.com, vbabka@suse.cz, mgorman@suse.de,
    hannes@cmpxchg.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org
References: <20180309220807.24961-1-pasha.tatashin@oracle.com>
    <20180309220807.24961-2-pasha.tatashin@oracle.com>
    <20180312130410.e2fce8e5e38bc2086c7fd924@linux-foundation.org>
    <20180313160430.hbjnyiazadt3jwa6@xakep.localdomain>
    <20180313115549.7badec1c6b85eb5a1cf21eb6@linux-foundation.org>
    <20180313194546.k62tni4g4gnds2nx@xakep.localdomain>
    <20180313131156.f156abe1822a79ec01c4800a@linux-foundation.org>
From: Pavel Tatashin
Date: Tue, 13 Mar 2018 16:43:47 -0400
In-Reply-To: <20180313131156.f156abe1822a79ec01c4800a@linux-foundation.org>

> Soft lockup: kernel has run for too long without rescheduling
> Hard lockup: kernel has run for too long with interrupts disabled
>
> Both of these are detected by the NMI watchdog handler.
>
> 9b6e63cbf85b89b2d fixes a soft lockup by adding a manual rescheduling
> point.  Replacing that with touch_nmi_watchdog() won't work (I think).
> Presumably calling touch_softlockup_watchdog() will "work", in that it
> suppresses the warning.  But it won't fix the thing which the warning
> is actually warning about: starvation of the CPU scheduler.  That's
> what the cond_resched() does.

But, unlike memmap_init_zone(), which can be used after boot, here we do
not need to worry about the kernel running for too long. This is because
we are booting, and no user programs are running. So, it is acceptable
to have a long uninterruptible span, as long as we are making useful
progress. BTW, the boot CPU still has interrupts enabled during this
span.

The comment in include/linux/nmi.h states:

 * If the architecture supports the NMI watchdog, touch_nmi_watchdog()
 * may be used to reset the timeout - for code which intentionally
 * disables interrupts for a long time. This call is stateless.

This is exactly what we are trying to do here, now that these threads
run with interrupts disabled. Before, they were running with interrupts
enabled, and cond_resched() was enough to avoid soft lockups.

>
> I'm not sure what to suggest, really.  Your changelog isn't the best:
> "Vlastimil Babka reported about a window issue during which when
> deferred pages are initialized, and the current version of on-demand
> initialization is finished, allocations may fail".  Well...
> where is this mysterious window?  Without such detail it's hard for
> others to suggest alternative approaches.

Here is hopefully a better description of the problem:

Currently, during boot we preinitialize some number of struct pages to
satisfy all boot allocations, even those that happen while we initialize
the rest of the deferred pages in page_alloc_init_late(). The problem is
that we do not know how much the kernel will need, and it also depends
on various options. So, with this work, we are changing this behavior to
initialize struct pages on-demand, only when allocations happen.

During boot, when we try to allocate memory, the on-demand struct page
initialization code takes care of it. But, once the deferred pages are
being initialized in:

page_alloc_init_late()
   for_each_node_state(nid, N_MEMORY)
      kthread_run(deferred_init_memmap())

we cannot use on-demand initialization, as these threads resize pgdat.

This whole thing is to take care of this window of time.

My first version of on-demand deferred page initialization would simply
fail to allocate memory during this period of time. But this new version
waits for the threads to finish initializing deferred memory, and then
successfully performs the allocation. This is because an interrupt
handler would wait for the pgdat resize lock.

Thank you,
Pavel