Subject: Re: [PATCH RESEND] mm: fix tick_sched timer blocked by pgdat_resize_lock
From: Kirill Tkhai
To: Shile Zhang, Andrew Morton, Pavel Tatashin
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Date: Fri, 10 Jan 2020 14:42:02 +0300
Message-ID: <1ee6088c-9e72-8824-3a9a-fc099d196faf@virtuozzo.com>
In-Reply-To: <20200110093053.34777-1-shile.zhang@linux.alibaba.com>
References: <20200110082510.172517-2-shile.zhang@linux.alibaba.com> <20200110093053.34777-1-shile.zhang@linux.alibaba.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On 10.01.2020 12:30, Shile Zhang wrote:
> When 'CONFIG_DEFERRED_STRUCT_PAGE_INIT' is set,
> 'pgdat_resize_lock' will be called inside 'pgdatinit' kthread to
> initialise the deferred pages with local interrupts disabled. Which is
> introduced by commit 3a2d7fa8a3d5 ("mm: disable interrupts while
> initializing deferred pages").
>
> But 'pgdatinit' kthread is possible be pined on the boot CPU (CPU#0 by
> default), especially in small system with NRCPUS <= 2. In this case, the
> interrupts are disabled on boot CPU during memory initialising, which
> caused the tick_sched timer be blocked, leading to wall clock stuck.
>
> Fixes: commit 3a2d7fa8a3d5 ("mm: disable interrupts while initializing
> deferred pages")
>
> Signed-off-by: Shile Zhang
> ---
>  include/linux/memory_hotplug.h | 16 ++++++++++++++--
>  1 file changed, 14 insertions(+), 2 deletions(-)
>
> diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
> index ba0dca6aac6e..be69a6dc4fee 100644
> --- a/include/linux/memory_hotplug.h
> +++ b/include/linux/memory_hotplug.h
> @@ -6,6 +6,8 @@
>  #include
>  #include
>  #include
> +#include
> +#include
>
>  struct page;
>  struct zone;
> @@ -282,12 +284,22 @@ static inline bool movable_node_is_enabled(void)
>  static inline
>  void pgdat_resize_lock(struct pglist_data *pgdat, unsigned long *flags)
>  {
> -	spin_lock_irqsave(&pgdat->node_size_lock, *flags);
> +	/*
> +	 * Disable local interrupts on boot CPU will stop the tick_sched
> +	 * timer, which will block jiffies(wall clock) update.
> +	 */
> +	if (current->cpu != get_boot_cpu_id())
> +		spin_lock_irqsave(&pgdat->node_size_lock, *flags);
> +	else
> +		spin_lock(&pgdat->node_size_lock);
>  }
>  static inline
>  void pgdat_resize_unlock(struct pglist_data *pgdat, unsigned long *flags)
>  {
> -	spin_unlock_irqrestore(&pgdat->node_size_lock, *flags);
> +	if (current->cpu != get_boot_cpu_id())
> +		spin_unlock_irqrestore(&pgdat->node_size_lock, *flags);
> +	else
> +		spin_unlock(&pgdat->node_size_lock);
>  }
>  static inline
>  void pgdat_resize_init(struct pglist_data *pgdat)

1) The Linux kernel is *preemptible*. A kernel built with CONFIG_PREEMPT_RT
may even preempt *kernel* code in the middle of a function. While you are
executing code that contains pgdat_resize_lock() and pgdat_resize_unlock(),
the process may migrate to another CPU between the two calls:

boot cpu                        another cpu
----------------------------------------------------------
pgdat_resize_lock()
  spin_lock()
  --> migrate to another cpu
                                pgdat_resize_unlock()
                                  spin_unlock_irqrestore()

(Yes, with CONFIG_PREEMPT_RT the process is preemptible even after the
spin_lock() call.)

These look like bad helpers, and we should not introduce such a design.

2) I don't think the problem this patch solves really exists. Do we really
need this statistic? Can't we simply remove the print message from
deferred_init_memmap() and solve it that way?

Also, you may try to check whether sched_clock() gives better results with
interrupts disabled (on x86 it uses rdtsc when possible, but on some
hardware it may fall back to a jiffies-based clock, which also won't
advance with interrupts disabled).

Kirill
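
The mismatch described in point 1) can be sketched in user space. This is
a toy model under stated assumptions, not kernel code: struct sim_state and
the sim_* functions are hypothetical names that do not exist in the kernel,
the spinlock itself is not modelled, and only the choice between the
irqsave and plain variants is tracked. The start and end CPUs are inputs to
the model; it makes no claim about when the scheduler can actually migrate
a task.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy constants: CPU 0 plays the role of the boot CPU. */
enum { SIM_BOOT_CPU = 0, SIM_NR_CPUS = 2 };

struct sim_state {
	int cpu;                    /* CPU the task currently runs on */
	bool irqs_on[SIM_NR_CPUS];  /* per-CPU "interrupts enabled" bit */
	bool flags_valid;           /* set once the irqsave path stored flags */
	bool flags;                 /* saved interrupt state */
};

/* Mirrors the patched pgdat_resize_lock(): variant depends on the CPU. */
static void sim_resize_lock(struct sim_state *s)
{
	assert(s->cpu >= 0 && s->cpu < SIM_NR_CPUS);
	if (s->cpu != SIM_BOOT_CPU) {
		/* models spin_lock_irqsave(): save state, disable IRQs */
		s->flags = s->irqs_on[s->cpu];
		s->flags_valid = true;
		s->irqs_on[s->cpu] = false;
	}
	/* boot CPU: models plain spin_lock(), flags are never written */
}

/*
 * Mirrors the patched pgdat_resize_unlock(). Returns true when the unlock
 * variant matches the variant chosen at lock time.
 */
static bool sim_resize_unlock(struct sim_state *s)
{
	assert(s->cpu >= 0 && s->cpu < SIM_NR_CPUS);
	if (s->cpu != SIM_BOOT_CPU) {
		/* models spin_unlock_irqrestore(): consumes the saved flags */
		bool matched = s->flags_valid;
		s->irqs_on[s->cpu] = s->flags;
		s->flags_valid = false;
		return matched;
	}
	/* boot CPU: models plain spin_unlock(); saved flags would be dropped */
	return !s->flags_valid;
}

/* Lock on start_cpu, "migrate" to end_cpu, unlock there. */
static bool sim_lock_unlock_pair_matches(int start_cpu, int end_cpu)
{
	struct sim_state s = { .cpu = start_cpu, .irqs_on = { true, true } };

	sim_resize_lock(&s);
	s.cpu = end_cpu;
	return sim_resize_unlock(&s);
}
```

In this model, sim_lock_unlock_pair_matches(0, 1) corresponds to the
diagram in point 1): the lock side takes the plain variant on the boot CPU,
and the unlock side restores flags that were never saved.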