Received: by 2002:a25:6193:0:0:0:0:0 with SMTP id v141csp737163ybb; Wed, 1 Apr 2020 08:43:37 -0700 (PDT) X-Google-Smtp-Source: ADFU+vs0xUYwMzISlqH/Wk5VTHGaW7SBjpMbdB0Itjjj2RVFdRz+SYaXKwlrgBpmSeZXLNFHKir3 X-Received: by 2002:a9d:128:: with SMTP id 37mr16896518otu.270.1585755816865; Wed, 01 Apr 2020 08:43:36 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1585755816; cv=none; d=google.com; s=arc-20160816; b=GZyWSKdnrLbkBON2GIEInQpj/QiqYM8VcevSt5i4x1Re9hZpPDAgXMc9ZVQnHhZrHw u8USs5pCbqxcb4Ek3zEk1Nx8wndf6AY5cPm0qFZBIfC1BOatrXVan7pBIVeP8KAtsRPc Fyr6fbIHNZEKXTpCOUCnyy8GKBjtZsh+WwxUTnX4dzFyvYJx3bdIqyvDLmcDTgfqLfoq wbYIopx/Ex1sfI2H9mVP2LzoMtZm5pN+rjGcaeIbkpF5ah4Lji1S7Cl51Ubt4XNYhfnN 7uBVF34ih/ZWJoK2AUBu5G+jl6lbZI7BcJ+qoOkH9+7Zel/8jIc+6FMjGb8wEe4Iw61n cxTQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:in-reply-to:content-disposition :mime-version:references:message-id:subject:cc:to:from:date; bh=Qh51pqUcxSK1vy6HHAY3PK3fGGZoLe1l0UQDBNVaGW8=; b=MZwpI6LsOxX6vAVvOoiJr5v4+xz1+AezoRBye5GbYRW5SPIj4VTQ/jkoYD8P9aBRP4 wdG+8+2q7lz2ciU7X5TAGXf+OsMDWOZN4QgvmCzKGsp85z2UeyYx8fiFjQaxXDOzbdEv G87LIN3i20bPeP8Vt/xAhiYSM5GZxGFhHcpK72vf36MZsDeXSxGxO8r0WLQtC8d+HZot JU6MW6kD6yVPHPy9Q5A9A0ibREMD7fseCVdEc/5ItFL/jMc7jxQoidj33Ex5hZxIF7/x 53+e1wg8P2CprP24paiVu0r1KqSPXNqU1WVKNe8cADIKQvuMizOjwIX+fV01jAtclD/N aOyg== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id w18si917876otp.73.2020.04.01.08.43.23; Wed, 01 Apr 2020 08:43:36 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1733100AbgDAPmW (ORCPT + 99 others); Wed, 1 Apr 2020 11:42:22 -0400 Received: from mail-wm1-f68.google.com ([209.85.128.68]:38099 "EHLO mail-wm1-f68.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1732849AbgDAPmW (ORCPT ); Wed, 1 Apr 2020 11:42:22 -0400 Received: by mail-wm1-f68.google.com with SMTP id f6so215603wmj.3 for ; Wed, 01 Apr 2020 08:42:20 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to; bh=Qh51pqUcxSK1vy6HHAY3PK3fGGZoLe1l0UQDBNVaGW8=; b=MgqrVMHsgPreWBYx9E0FR9r9m2pNuofHguq7cWRktHqq8yLtiHNUFO1oq3xPCyWSRr X8Sya023KPrnSPpKknolCwtzZPbx0CBduJ14CCUZjtFs7VsRLGJcQ/bxjgvx3g2l5pO6 IUdRojgcMEPygmhC0UnHKJZbRRA0uj03NQbhg/buh3f1LG3yUnTN2bAibohrvgcDbsSZ 0gm1bO3WRo8CHes/nw+Uk58iAYyXpKl6ayd7QMDQqLbo68x0exqt4r84f49qXHJs80gg Hr0PCXHia4VpOBdW1E79Z0rkoVz/bKK00pU5gmAE0U4ok3aNfNE/EhLWCO02UQysLGbe QbBg== X-Gm-Message-State: AGi0PuZITlsLLnT6J6ZRr912O9ZPmLk1hJZ/Ch+vYxB2lbGgh8SyeB6w FzvSxQHt1oXMslZQUqFjQ8g= X-Received: by 2002:a7b:c050:: with SMTP id u16mr5284608wmc.68.1585755740307; Wed, 01 Apr 2020 08:42:20 -0700 (PDT) Received: from localhost (ip-37-188-180-223.eurotel.cz. [37.188.180.223]) by smtp.gmail.com with ESMTPSA id g2sm3322729wrs.42.2020.04.01.08.42.18 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 01 Apr 2020 08:42:19 -0700 (PDT) Date: Wed, 1 Apr 2020 17:42:17 +0200 From: Michal Hocko To: Shile Zhang Cc: Andrew Morton , Kirill Tkhai , Pavel Tatashin , linux-mm@kvack.org, linux-kernel@vger.kernel.org Subject: Re: [PATCH v3] mm: fix tick timer stall during deferred page init Message-ID: <20200401154217.GQ22681@dhcp22.suse.cz> References: <20200311123848.118638-1-shile.zhang@linux.alibaba.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20200311123848.118638-1-shile.zhang@linux.alibaba.com> Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org I am sorry but I have completely missed this patch. On Wed 11-03-20 20:38:48, Shile Zhang wrote: > When 'CONFIG_DEFERRED_STRUCT_PAGE_INIT' is set, 'pgdatinit' kthread will > initialise the deferred pages with local interrupts disabled. It is > introduced by commit 3a2d7fa8a3d5 ("mm: disable interrupts while > initializing deferred pages"). > > On machine with NCPUS <= 2, the 'pgdatinit' kthread could be bound to > the boot CPU, which could caused the tick timer long time stall, system > jiffies not be updated in time. > > The dmesg shown that: > > [ 0.197975] node 0 initialised, 32170688 pages in 1ms > > Obviously, 1ms is unreasonable. > > Now, fix it by restore in the pending interrupts for every 32*1204 pages > (128MB) initialized, give the chance to update the systemd jiffies. > The reasonable demsg shown likes: > > [ 1.069306] node 0 initialised, 32203456 pages in 894ms > > Fixes: 3a2d7fa8a3d5 ("mm: disable interrupts while initializing deferred pages"). I dislike this solution TBH. It effectivelly conserves the current code and just works around the problem. Why do we hold the IRQ lock here in the first place? This is an early init code and a very limited code is running at this stage. Certainly nothing memory hotplug related which should be the only path really interested in the resize lock AFAIR. This needs a double checking but I strongly believe that the lock can be simply dropped in this path. > Co-developed-by: Kirill Tkhai > Signed-off-by: Kirill Tkhai > Signed-off-by: Shile Zhang > --- > mm/page_alloc.c | 25 ++++++++++++++++++++++--- > 1 file changed, 22 insertions(+), 3 deletions(-) > > diff --git a/mm/page_alloc.c b/mm/page_alloc.c > index 3c4eb750a199..a3a47845e150 100644 > --- a/mm/page_alloc.c > +++ b/mm/page_alloc.c > @@ -1763,12 +1763,17 @@ deferred_init_maxorder(u64 *i, struct zone *zone, unsigned long *start_pfn, > return nr_pages; > } > > +/* > + * Release the pending interrupts for every TICK_PAGE_COUNT pages. > + */ > +#define TICK_PAGE_COUNT (32 * 1024) > + > /* Initialise remaining memory on a node */ > static int __init deferred_init_memmap(void *data) > { > pg_data_t *pgdat = data; > const struct cpumask *cpumask = cpumask_of_node(pgdat->node_id); > - unsigned long spfn = 0, epfn = 0, nr_pages = 0; > + unsigned long spfn = 0, epfn = 0, nr_pages = 0, prev_nr_pages = 0; > unsigned long first_init_pfn, flags; > unsigned long start = jiffies; > struct zone *zone; > @@ -1779,6 +1784,7 @@ static int __init deferred_init_memmap(void *data) > if (!cpumask_empty(cpumask)) > set_cpus_allowed_ptr(current, cpumask); > > +again: > pgdat_resize_lock(pgdat, &flags); > first_init_pfn = pgdat->first_deferred_pfn; > if (first_init_pfn == ULONG_MAX) { > @@ -1790,7 +1796,6 @@ static int __init deferred_init_memmap(void *data) > /* Sanity check boundaries */ > BUG_ON(pgdat->first_deferred_pfn < pgdat->node_start_pfn); > BUG_ON(pgdat->first_deferred_pfn > pgdat_end_pfn(pgdat)); > - pgdat->first_deferred_pfn = ULONG_MAX; > > /* Only the highest zone is deferred so find it */ > for (zid = 0; zid < MAX_NR_ZONES; zid++) { > @@ -1809,9 +1814,23 @@ static int __init deferred_init_memmap(void *data) > * that we can avoid introducing any issues with the buddy > * allocator. > */ > - while (spfn < epfn) > + while (spfn < epfn) { > nr_pages += deferred_init_maxorder(&i, zone, &spfn, &epfn); > + /* > + * Release the interrupts for every TICK_PAGE_COUNT pages > + * (128MB) to give tick timer the chance to update the > + * system jiffies. > + */ > + if ((nr_pages - prev_nr_pages) > TICK_PAGE_COUNT) { > + prev_nr_pages = nr_pages; > + pgdat->first_deferred_pfn = spfn; > + pgdat_resize_unlock(pgdat, &flags); > + goto again; > + } > + } > + > zone_empty: > + pgdat->first_deferred_pfn = ULONG_MAX; > pgdat_resize_unlock(pgdat, &flags); > > /* Sanity check that the next zone really is unpopulated */ > -- > 2.24.0.rc2 -- Michal Hocko SUSE Labs