Received: by 2002:a25:c205:0:0:0:0:0 with SMTP id s5csp3602842ybf; Tue, 3 Mar 2020 08:54:55 -0800 (PST) X-Google-Smtp-Source: ADFU+vu1bsaL0pOxDHEYxty3hyE4ukuJiqbFjvmMXHpHs4Z8p8uEMXlFxgiKUQ1Uf8sxxSNCiEbo X-Received: by 2002:a9d:76c9:: with SMTP id p9mr375501otl.135.1583254494702; Tue, 03 Mar 2020 08:54:54 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1583254494; cv=none; d=google.com; s=arc-20160816; b=NDA+evkPpm86n/1DHizOeZ2cZvLk67xBpcqYp3rtbAxOobI9umgsja6QDSZXY5eqMq FYIIgf0P6e3h+PpRrBRHvAx+7hnXTOFX2DDW2oRjXyrnIh/fMSIJXdgdGoI6cSQpEG9g NqDIVfxqjokc12XLJRKE/b54gxLUb1mwKA9KrXuuUoBOCE+mkuXjiuua2MelKIEpSdw2 AlzCKg5lBe787C7HGTOsSPx7qNtGLUPeonZIiS1VkRDzMa7E3KR9mEo1FcLQV57lleLA uRrtIv4GxN3CjEr8ZNudwBLsBEz0jw6DplKdje4ggiwizyFj0e+DUkSx3qAmdSUaC5mZ rAJg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:content-transfer-encoding :content-language:in-reply-to:mime-version:user-agent:date :message-id:from:references:cc:to:subject; bh=wUcmKwmAgutjacDL/ZAQm9LhSxhb/sumO4Jyr1eQA/4=; b=t2lvlvvvELD7XHv4YeYh9NqJ6mGueGkdAkLe6hUfS/dY1smdS2iqLJNzt+opEpGrro oPQPyLjtmIRIxMJM4fsBH5ZBvfqa5pgy+D68c+uKtXHaGgwpyQA4/EsYUqUulE1mA/k8 ys5UuGOSCJnDtoXmAk7EoVOyGg5Bngrdc5nSA8SXHXQs3JevjkKr0cFMgTgKvgqif/ag YBv0v//Mz69I5Ob2OlZ7HchSIyAsX66lD43mxkP928ZLot9jiZaF0rl8eqCRZ3lAkEvO RxXHkzyRXa+VJ8iCCKt3MUVPu4OFNCTbjOpOXM8GBsfyfKbQsxk7l/6s6s03dynBBHKO XqxQ== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=virtuozzo.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id v18si8486990oth.1.2020.03.03.08.54.41; Tue, 03 Mar 2020 08:54:54 -0800 (PST) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=virtuozzo.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1730417AbgCCQxB (ORCPT + 99 others); Tue, 3 Mar 2020 11:53:01 -0500 Received: from relay.sw.ru ([185.231.240.75]:46376 "EHLO relay.sw.ru" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1729755AbgCCQxB (ORCPT ); Tue, 3 Mar 2020 11:53:01 -0500 Received: from dhcp-172-16-24-104.sw.ru ([172.16.24.104]) by relay.sw.ru with esmtp (Exim 4.92.3) (envelope-from ) id 1j9Amn-0006yJ-94; Tue, 03 Mar 2020 19:52:49 +0300 Subject: Re: [PATCH v2 1/1] mm: fix interrupt disabled long time inside deferred_init_memmap() To: Shile Zhang , Andrew Morton , Pavel Tatashin Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Vlastimil Babka , Michal Hocko References: <20200303161551.132263-1-shile.zhang@linux.alibaba.com> <20200303161551.132263-2-shile.zhang@linux.alibaba.com> From: Kirill Tkhai Message-ID: Date: Tue, 3 Mar 2020 19:52:47 +0300 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Thunderbird/68.5.0 MIME-Version: 1.0 In-Reply-To: <20200303161551.132263-2-shile.zhang@linux.alibaba.com> Content-Type: text/plain; charset=utf-8 Content-Language: en-US Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 03.03.2020 19:15, Shile Zhang wrote: > When 'CONFIG_DEFERRED_STRUCT_PAGE_INIT' is set, 'pgdatinit' kthread will > initialise the deferred pages with local interrupts disabled. It is > introduced by commit 3a2d7fa8a3d5 ("mm: disable interrupts while > initializing deferred pages"). > > The local interrupt will be disabled long time inside > deferred_init_memmap(), depends on memory size. > On machine with NCPUS <= 2, the 'pgdatinit' kthread could be pined on > boot CPU, then the tick timer will stuck long time, which caused the > system wall time inaccuracy. > > For example, the dmesg shown that: > > [ 0.197975] node 0 initialised, 32170688 pages in 1ms > > Obviously, 1ms is unreasonable. > Now, fix it by restore in the pending interrupts inside the while loop. > The reasonable demsg shown likes: > > [ 1.069306] node 0 initialised, 32203456 pages in 894ms The way I understand the original problem, that Pavel fixed: we need disable irqs in deferred_init_memmap() since this function may be called in parallel with deferred_grow_zone() called from interrupt handler. So, Pavel added lock to fix the race. In case of we temporary unlock the lock, interrupt still be possible, so my previous proposition returns the problem back. Now thought again, I think we have to just add: pgdat_resize_unlock(); pgdat_resize_lock(); instead of releasing interrupts, since in case of we just release them with lock held, a call of interrupt->deferred_grow_zone() bring us to a deadlock. So, unlock the lock is must. > Signed-off-by: Shile Zhang > --- > mm/page_alloc.c | 6 +++++- > 1 file changed, 5 insertions(+), 1 deletion(-) > > diff --git a/mm/page_alloc.c b/mm/page_alloc.c > index 3c4eb750a199..d3f337f2e089 100644 > --- a/mm/page_alloc.c > +++ b/mm/page_alloc.c > @@ -1809,8 +1809,12 @@ static int __init deferred_init_memmap(void *data) > * that we can avoid introducing any issues with the buddy > * allocator. > */ > - while (spfn < epfn) > + while (spfn < epfn) { > nr_pages += deferred_init_maxorder(&i, zone, &spfn, &epfn); > + /* let in any pending interrupts */ > + local_irq_restore(flags); > + local_irq_save(flags); > + } > zone_empty: > pgdat_resize_unlock(pgdat, &flags); I think we need here something like below (untested): diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 79e950d76ffc..323afa9a4db5 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -1828,7 +1828,7 @@ static int __init deferred_init_memmap(void *data) { pg_data_t *pgdat = data; const struct cpumask *cpumask = cpumask_of_node(pgdat->node_id); - unsigned long spfn = 0, epfn = 0, nr_pages = 0; + unsigned long spfn = 0, epfn = 0, nr_pages = 0, prev_nr_pages = 0; unsigned long first_init_pfn, flags; unsigned long start = jiffies; struct zone *zone; @@ -1869,8 +1869,18 @@ static int __init deferred_init_memmap(void *data) * that we can avoid introducing any issues with the buddy * allocator. */ - while (spfn < epfn) + while (spfn < epfn) { nr_pages += deferred_init_maxorder(&i, zone, &spfn, &epfn); + /* + * Release interrupts every 1Gb to give a possibility + * a timer to advance jiffies. + */ + if (nr_pages - prev_nr_pages > (1UL << (30 - PAGE_SHIFT))) { + prev_nr_pages = nr_pages; + pgdat_resize_unlock(pgdat, &flags); + pgdat_resize_lock(pgdat, &flags); + } + } zone_empty: pgdat_resize_unlock(pgdat, &flags); (I believe the comment may be improved more).