Date: Wed, 1 Aug 2018 13:20:39 +0200
From: Michal Hocko
To: Michael Ellerman
Cc: John Allen, n-horiguchi@ah.jp.nec.com, linuxppc-dev@lists.ozlabs.org, linux-kernel@vger.kernel.org, kamezawa.hiroyu@jp.fujitsu.com, mgorman@suse.de
Subject: Re: Infinite looping observed in __offline_pages
Message-ID: <20180801112039.GJ16767@dhcp22.suse.cz>
References: <20180725181115.hmlyd3tmnu3mn3sf@p50.austin.ibm.com> <20180725200336.GP28386@dhcp22.suse.cz> <87r2jifimk.fsf@concordia.ellerman.id.au>
In-Reply-To: <87r2jifimk.fsf@concordia.ellerman.id.au>

On Wed 01-08-18 21:09:39, Michael Ellerman wrote:
> Michal Hocko writes:
> > On Wed 25-07-18 13:11:15, John Allen wrote:
> > [...]
> >> Does a failure in do_migrate_range indicate that the range is unmigratable
> >> and the loop in __offline_pages should terminate and goto failed_removal? Or
> >> should we allow a certain number of retries before we
> >> give up on migrating the range?
> >
> > Unfortunately not. Migration code doesn't tell a difference between
> > ephemeral and permanent failures.
>
> What's to stop an ephemeral failure happening repeatedly?

If there is a short term pin on the page that prevents the migration
then the holder of the pin should release it and the next retry will
succeed. If the page gets freed on the way then it will not be
reallocated because the pages are isolated already. I can only see a
complete OOM as the reason for the allocation of the target page, and
hence the migration, to fail. That is highly unlikely and would sooner
or later trigger the oom killer and release some memory.

The biggest problem here is that we cannot tell ephemeral and long term
pins apart...
--
Michal Hocko
SUSE Labs
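
The retry question raised in this thread can be illustrated with a toy user-space model. This is only a sketch and not kernel code: `struct page_model`, `migrate_pass` and `offline_range_bounded` are hypothetical names invented for the illustration, and the bounded retry count is the option John Allen asks about, not something the thread adopts. The model assumes a short term pin goes away after a fixed number of failed passes, while a long term pin never does. The retry loop only sees "some pages failed", which mirrors the point that the caller cannot tell the two kinds of failure apart.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of a page in the range being offlined.
 * pins_left > 0: number of migration passes that will still fail
 *                (short term pin that is eventually released).
 * pins_left == 0: page migrates successfully.
 * pins_left < 0: pinned forever (long term pin). */
struct page_model {
    int pins_left;
};

/* One migration pass over the range.
 * Returns how many pages failed to migrate in this pass. */
static size_t migrate_pass(struct page_model *pages, size_t n)
{
    size_t failed = 0;

    for (size_t i = 0; i < n; i++) {
        if (pages[i].pins_left == 0)
            continue;               /* nothing blocks this page */
        failed++;
        if (pages[i].pins_left > 0)
            pages[i].pins_left--;   /* holder drops a short term pin */
    }
    return failed;
}

/* Retry until a pass has no failures or max_retries passes have run.
 * Returns 0 on success and -1 when giving up. The number of passes
 * actually performed is stored in *passes_out. */
static int offline_range_bounded(struct page_model *pages, size_t n,
                                 unsigned int max_retries,
                                 unsigned int *passes_out)
{
    unsigned int passes = 0;

    assert(pages != NULL || n == 0);
    assert(passes_out != NULL);

    while (passes < max_retries) {
        passes++;
        if (migrate_pass(pages, n) == 0) {
            *passes_out = passes;
            return 0;
        }
    }
    *passes_out = passes;
    return -1;
}
```

In this model a range whose pages have `pins_left` values of 0, 2 and 1 succeeds on the third pass, so a bound of 10 is harmless but a bound of 2 gives up on a failure that was only ephemeral. A range containing a page with a negative `pins_left` never succeeds, and only the bound ends the loop. Both failure cases return the same -1, so the bound trades an endless loop for the risk of aborting an offline that would have completed.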