Date: Tue, 13 Mar 2018 14:24:12 -0700
From: Andrew Morton
To: Pavel Tatashin
Cc: steven.sistare@oracle.com, daniel.m.jordan@oracle.com,
 m.mizuma@jp.fujitsu.com, mhocko@suse.com, catalin.marinas@arm.com,
 takahiro.akashi@linaro.org, gi-oh.kim@profitbricks.com,
 heiko.carstens@de.ibm.com, baiyaowei@cmss.chinamobile.com,
 richard.weiyang@gmail.com, paul.burton@mips.com, miles.chen@mediatek.com,
 vbabka@suse.cz, mgorman@suse.de, hannes@cmpxchg.org,
 linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [v5 1/2] mm: disable interrupts while initializing deferred pages
Message-Id: <20180313142412.d373318b81164c4cb4b864b3@linux-foundation.org>
In-Reply-To:
References: <20180309220807.24961-1-pasha.tatashin@oracle.com>
 <20180309220807.24961-2-pasha.tatashin@oracle.com>
 <20180312130410.e2fce8e5e38bc2086c7fd924@linux-foundation.org>
 <20180313160430.hbjnyiazadt3jwa6@xakep.localdomain>
 <20180313115549.7badec1c6b85eb5a1cf21eb6@linux-foundation.org>
 <20180313194546.k62tni4g4gnds2nx@xakep.localdomain>
 <20180313131156.f156abe1822a79ec01c4800a@linux-foundation.org>
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, 13 Mar 2018 16:43:47 -0400 Pavel Tatashin wrote:

> > Soft lockup: kernel has run for too long without rescheduling
> > Hard lockup: kernel has run for too long with interrupts disabled
> >
> > Both of these are detected by the NMI watchdog handler.
> >
> > 9b6e63cbf85b89b2d fixes a soft lockup by adding a manual rescheduling
> > point. Replacing that with touch_nmi_watchdog() won't work (I think).
> > Presumably calling touch_softlockup_watchdog() will "work", in that it
> > suppresses the warning. But it won't fix the thing which the warning
> > is actually warning about: starvation of the CPU scheduler. That's
> > what the cond_resched() does.
>
> But, unlike memmap_init_zone(), which can be used after boot, here we do
> not worry about the kernel running for too long. This is because we are
> booting, and no user programs are running.
>
> So, it is acceptable to have a long uninterruptible span, as long
> as we are making useful progress. BTW, the boot CPU still has
> interrupts enabled during this span.
>
> Comment in: include/linux/nmi.h, states:
>
>  * If the architecture supports the NMI watchdog, touch_nmi_watchdog()
>  * may be used to reset the timeout - for code which intentionally
>  * disables interrupts for a long time. This call is stateless.
>
> Which is exactly what we are trying to do here, now that these threads
> run with interrupts disabled.
>
> Before, when they were running with interrupts enabled,
> cond_resched() was enough to satisfy the soft lockup detector.

hm, maybe.
But I'm not sure that touch_nmi_watchdog() will hold off a soft lockup
warning. Maybe it will. And please let's get the above thoughts into
the changelog.

> > I'm not sure what to suggest, really. Your changelog isn't the best:
> > "Vlastimil Babka reported about a window issue during which when
> > deferred pages are initialized, and the current version of on-demand
> > initialization is finished, allocations may fail". Well... where is
> > this mysterious window? Without such detail it's hard for others to
> > suggest alternative approaches.
>
> Here is hopefully a better description of the problem:
>
> Currently, during boot we preinitialize some number of struct pages to
> satisfy all boot allocations, even if these allocations happen while we
> initialize the rest of the deferred pages in page_alloc_init_late().
> The problem is that we do not know how much the kernel will need, and
> it also depends on various options.
>
> So, with this work, we are changing this behavior to initialize struct
> pages on-demand, only when allocations happen.
>
> During boot, when we try to allocate memory, the on-demand struct page
> initialization code takes care of it. But, once the deferred pages are
> initializing in:
>
> page_alloc_init_late()
>   for_each_node_state(nid, N_MEMORY)
>     kthread_run(deferred_init_memmap())
>
> We cannot use on-demand initialization, as these threads resize pgdat.
>
> This whole thing is to take care of this time.
>
> My first version of on-demand deferred page initialization would simply
> fail to allocate memory during this period of time. But, this new
> version waits for threads to finish initializing deferred memory, and
> successfully performs the allocation.
>
> Because the interrupt handler would wait for the pgdat resize lock.

OK, thanks. Please also add to changelog.
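
The distinction drawn above, between suppressing a soft lockup warning
and actually giving the scheduler a chance to run, can be illustrated
with a toy user-space model. This is only a sketch and not kernel code:
the tick counts, TOY_THRESHOLD, TOY_BATCH, struct toy_result and
run_long_loop() are all hypothetical names invented for illustration.
The TOUCH_WATCHDOG strategy stands in for a call that merely refreshes
the watchdog timestamp, and COND_RESCHED for a real rescheduling point.
The model assumes that letting the scheduler run also refreshes the
timestamp, which is a simplification.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy parameters, not taken from the kernel. */
#define TOY_THRESHOLD 20  /* ticks without a refresh before the toy watchdog warns */
#define TOY_BATCH      8  /* loop iterations between strategy calls */

enum toy_strategy { DO_NOTHING, TOUCH_WATCHDOG, COND_RESCHED };

struct toy_result {
	bool warned;      /* did the toy watchdog fire? */
	int  max_starved; /* longest run of ticks with no scheduling point */
};

/* Simulate a long-running loop of `ticks` iterations under one strategy. */
struct toy_result run_long_loop(int ticks, enum toy_strategy s)
{
	struct toy_result r = { false, 0 };
	int since_touch = 0; /* ticks since the watchdog timestamp was refreshed */
	int since_sched = 0; /* ticks since the scheduler last had a chance to run */

	assert(ticks >= 0);

	for (int t = 1; t <= ticks; t++) {
		since_touch++;
		since_sched++;
		if (since_sched > r.max_starved)
			r.max_starved = since_sched;
		if (since_touch > TOY_THRESHOLD)
			r.warned = true;

		if (t % TOY_BATCH != 0)
			continue;

		if (s == TOUCH_WATCHDOG) {
			/* Warning is held off, but the scheduler is still starved. */
			since_touch = 0;
		} else if (s == COND_RESCHED) {
			/* Scheduler runs; assume this also refreshes the timestamp. */
			since_sched = 0;
			since_touch = 0;
		}
	}
	return r;
}
```

In this model, a 100-tick loop with DO_NOTHING warns; with
TOUCH_WATCHDOG it does not warn but still reports 100 starved ticks;
with COND_RESCHED it neither warns nor starves for longer than one
batch of 8 ticks. Whether long starvation is acceptable during boot is
the question the thread is debating; the model takes no position on it.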