Message-ID: <59AD15B6.7080304@huawei.com>
Date: Mon, 4 Sep 2017 16:58:30 +0800
From: Xishi Qiu
To: Michal Hocko
CC: Andrew Morton, KAMEZAWA Hiroyuki, Reza Arbab, Yasuaki Ishimatsu, Igor Mammedov, Vitaly Kuznetsov, LKML, Michal Hocko
Subject: Re: [PATCH 2/2] mm, memory_hotplug: remove timeout from __offline_memory
References: <20170904082148.23131-1-mhocko@kernel.org> <20170904082148.23131-3-mhocko@kernel.org>
In-Reply-To: <20170904082148.23131-3-mhocko@kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org

On 2017/9/4 16:21, Michal Hocko wrote:
> From: Michal Hocko
>
> We have a hardcoded 120s timeout after which the memory offline fails
> basically since the hot remove has been introduced. This is essentially
> a policy implemented in the kernel. Moreover there is no way to adjust
> the timeout and so we are sometimes facing memory offline failures if
> the system is under a heavy memory pressure or very intensive CPU
> workload on large machines.
>
> It is not very clear what purpose the timeout actually serves.
> The offline operation is interruptible by a signal so if userspace wants
> some timeout based termination this can be done trivially by sending a
> signal.

Hi Michal,

If the user knows what to do when migration takes a long time, that is OK,
but I don't think all users know about this operation (e.g. Ctrl+C) and its
effect.

Thanks,
Xishi Qiu

>
> If there is a strong usecase to do this from the kernel then we should
> do it properly and have it tunable from the userspace with the timeout
> disabled by default along with the explanation who uses it and for what
> purpose.
>
> Signed-off-by: Michal Hocko
> ---
>  mm/memory_hotplug.c | 10 +++-------
>  1 file changed, 3 insertions(+), 7 deletions(-)
>
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index c9dcbe6d2ac6..b8a85c11360e 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -1593,9 +1593,9 @@ static void node_states_clear_node(int node, struct memory_notify *arg)
>  }
>  
>  static int __ref __offline_pages(unsigned long start_pfn,
> -		  unsigned long end_pfn, unsigned long timeout)
> +		  unsigned long end_pfn)
>  {
> -	unsigned long pfn, nr_pages, expire;
> +	unsigned long pfn, nr_pages;
>  	long offlined_pages;
>  	int ret, node;
>  	unsigned long flags;
> @@ -1633,12 +1633,8 @@ static int __ref __offline_pages(unsigned long start_pfn,
>  		goto failed_removal;
>  
>  	pfn = start_pfn;
> -	expire = jiffies + timeout;
>  repeat:
>  	/* start memory hot removal */
> -	ret = -EBUSY;
> -	if (time_after(jiffies, expire))
> -		goto failed_removal;
>  	ret = -EINTR;
>  	if (signal_pending(current))
>  		goto failed_removal;
> @@ -1711,7 +1707,7 @@ static int __ref __offline_pages(unsigned long start_pfn,
>  /* Must be protected by mem_hotplug_begin() or a device_lock */
>  int offline_pages(unsigned long start_pfn, unsigned long nr_pages)
>  {
> -	return __offline_pages(start_pfn, start_pfn + nr_pages, 120 * HZ);
> +	return __offline_pages(start_pfn, start_pfn + nr_pages);
>  }
>  #endif /* CONFIG_MEMORY_HOTREMOVE */
>  
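As a rough illustration of the userspace alternative discussed in this thread, a tool could impose its own deadline with alarm() instead of relying on a kernel timeout. This is a minimal sketch, not code from the patch: the helper names write_with_timeout() and offline_memory_block() and the memory block number are hypothetical. It assumes that writing "offline" to /sys/devices/system/memory/memoryN/state ends up in offline_pages(), where the signal_pending() check kept by the patch turns a pending SIGALRM into an EINTR return from write().

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* SIGALRM handler: does nothing; its only job is to interrupt write(). */
static void on_alarm(int sig)
{
	(void)sig;
}

/*
 * Hypothetical helper: write the string `data` to `fd`, giving up after
 * `seconds`. Returns 0 on success, -1 with errno set on failure; errno ==
 * EINTR usually means the deadline expired. Replaces any pending alarm().
 */
int write_with_timeout(int fd, const char *data, unsigned int seconds)
{
	struct sigaction sa, old;
	size_t len = strlen(data);
	ssize_t n;
	int saved_errno;

	assert(seconds > 0);	/* alarm(0) would mean "no deadline" */

	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = on_alarm;
	sigemptyset(&sa.sa_mask);
	/* No SA_RESTART: an interrupted write() fails with EINTR, not restarted. */
	sa.sa_flags = 0;
	if (sigaction(SIGALRM, &sa, &old) < 0)
		return -1;

	alarm(seconds);
	n = write(fd, data, len);
	saved_errno = errno;
	alarm(0);			/* cancel the timer if write() finished first */
	sigaction(SIGALRM, &old, NULL);	/* restore the caller's disposition */

	if (n < 0) {
		errno = saved_errno;
		return -1;
	}
	if ((size_t)n != len) {
		errno = EIO;		/* treat a short write as a failure */
		return -1;
	}
	return 0;
}

/*
 * Hypothetical helper: offline one memory block via sysfs, e.g.
 * "/sys/devices/system/memory/memory32/state" (block number made up).
 * Requires root and a kernel with CONFIG_MEMORY_HOTREMOVE.
 */
int offline_memory_block(const char *state_path, unsigned int seconds)
{
	int fd, ret, saved_errno;

	fd = open(state_path, O_WRONLY);
	if (fd < 0)
		return -1;
	ret = write_with_timeout(fd, "offline", seconds);
	saved_errno = errno;
	close(fd);
	errno = saved_errno;
	return ret;
}
```

For example, offline_memory_block("/sys/devices/system/memory/memory32/state", 120) would reproduce the old 120 s limit, but as a per-caller policy that a tool can tune or drop. It also reflects Xishi's concern: the deadline exists only if the tool chooses to set one.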