Date: Wed, 16 Sep 2020 16:06:43 +0200
From: Oscar Salvador
To: HORIGUCHI NAOYA(堀口 直也)
Cc: Aristeu Rozanski, akpm@linux-foundation.org, mhocko@kernel.org, tony.luck@intel.com, cai@lca.pw, linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [PATCH v3 0/5] HWpoison: further fixes and cleanups
Message-ID: <20200916140624.GA17833@linux>
References: <20200914101559.17103-1-osalvador@suse.de> <20200915212222.GA18315@cathedrallabs.org> <20200916134215.GA30407@hori.linux.bs1.fc.nec.co.jp>
In-Reply-To: <20200916134215.GA30407@hori.linux.bs1.fc.nec.co.jp>
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, Sep 16, 2020 at 01:42:15PM +0000, HORIGUCHI NAOYA(堀口 直也) wrote:
> On Tue, Sep 15, 2020 at 05:22:22PM -0400, Aristeu Rozanski wrote:
> I reproduced the similar -EBUSY with small average x86 VM, where it seems to me
> a race between page_take_off_buddy() and page allocation. Oscar's debug patch
> shows the following kernel messages:
>
> [ 627.357009] Soft offlining pfn 0x235018 at process virtual address 0x7fd112140000
> [ 627.358747] __get_any_page: 0x235018 free buddy page
> [ 627.359875] page:00000000038b52c9 refcount:0 mapcount:-128 mapping:0000000000000000 index:0x1 pfn:0x235018
> [ 627.362002] flags: 0x57ffe000000000()
> [ 627.362841] raw: 0057ffe000000000 fffff84648d12688 ffff955abffd1dd0 0000000000000000
> [ 627.364555] raw: 0000000000000001 0000000000000000 00000000ffffff7f 0000000000000000
> [ 627.366258] page dumped because: page_handle_poison
> [ 627.367357] page->mem_cgroup:ffff9559b6912000
> [ 627.368342] page_handle_poison: hugepage_or_freepage failed\xb8n
> [ 627.368344] soft_offline_free_page: page_handle_poison -EBUSY
> [ 627.370901] page:00000000038b52c9 refcount:6 mapcount:3 mapping:000000001226bf89 index:0x2710 pfn:0x235018
> [ 627.373048] aops:ext4_da_aops ino:c63f3 dentry name:"system.journal"
> [ 627.374526] flags: 0x57ffe00000201c(uptodate|dirty|lru|private)
> [ 627.375865] raw: 0057ffe00000201c fffff84648d300c8 ffff955ab8c3f020 ffff955aba5f4ee0
> [ 627.377586] raw: 0000000000002710 ffff9559b811fc98 0000000500000002 ffff9559b6912000
> [ 627.379308] page dumped because: soft_offline_free_page
> [ 627.380480] page->mem_cgroup:ffff9559b6912000
>
>   CPU 0                                      CPU 1
>
>   get_any_page // returns 0 (free buddy path)
>     soft_offline_free_page
>                                              the page is allocated
>       page_handle_poison -> fail
>         return -EBUSY
>
> I'm still not sure why this issue is invisible before rework patch,
> but setting migrate type to MIGRATE_ISOLATE during offlining could affect
> the behavior sensitively.

Well, this is very timing-dependent.
AFAICS, before the rework patchset, we could still race with an allocation,
as the page could have been allocated between the call to get_any_page() and
the call to set_hwpoison_free_buddy_page(), which takes the zone->lock to
prevent that.
Maybe we just happen to take longer now to reach take_page_off_buddy(), so
the race window is bigger.

AFAICS, this has nothing to do with MIGRATE_ISOLATE, because here we are
dealing with pages that are already free (part of the buddy system).

The only thing that comes to my mind right off the bat might be to do a
"retry" in soft_offline_page() in case soft_offline_free_page() returns
-EBUSY, so we can call get_any_page() again and try to handle the new type
of page.
Something like (untested):

@@ -1923,6 +1977,7 @@ int soft_offline_page(unsigned long pfn, int flags)
 {
 	int ret;
 	struct page *page;
+	bool try_again = true;
 
 	if (!pfn_valid(pfn))
 		return -ENXIO;
@@ -1938,6 +1993,7 @@ int soft_offline_page(unsigned long pfn, int flags)
 		return 0;
 	}
 
+retry:
 	get_online_mems();
 	ret = get_any_page(page, pfn, flags);
 	put_online_mems();
@@ -1945,7 +2001,10 @@ int soft_offline_page(unsigned long pfn, int flags)
 
 	if (ret > 0)
 		ret = soft_offline_in_use_page(page);
 	else if (ret == 0)
-		ret = soft_offline_free_page(page);
+		if (soft_offline_free_page(page) && try_again) {
+			try_again = false;
+			goto retry;
+		}
 
 	return ret;

-- 
Oscar Salvador
SUSE L3
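For readers who want to see the retry-once control flow outside the kernel, here is a minimal userspace sketch in C. It is only a model under stated assumptions: struct fake_page, enum page_state, and every model_* function are hypothetical stand-ins invented for illustration, not kernel APIs, and the "race" is simulated by a counter rather than by a real concurrent allocation. Unlike the hunk as quoted, the sketch stores the result of the free-page attempt in ret so that a second failure is reported to the caller; that is a choice of this sketch.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical page states for this userspace model; not kernel types. */
enum page_state { PAGE_FREE, PAGE_IN_USE };

struct fake_page {
	enum page_state state;      /* what the classifier currently sees */
	enum page_state after_race; /* state the page is left in after a lost race */
	int races_left;             /* how many free-page attempts will lose the race */
	int classify_calls;         /* how often the page was (re)classified */
};

/* Stand-in for get_any_page(): > 0 for an in-use page, 0 for a free page. */
static int model_get_any_page(struct fake_page *p)
{
	p->classify_calls++;
	return p->state == PAGE_IN_USE ? 1 : 0;
}

/* Stand-in for soft_offline_free_page(): -EBUSY if the page changed under us. */
static int model_offline_free_page(struct fake_page *p)
{
	if (p->races_left > 0) {
		p->races_left--;
		p->state = p->after_race; /* simulated concurrent allocation */
		return -EBUSY;
	}
	return 0;
}

/* Stand-in for soft_offline_in_use_page(): always succeeds in this model. */
static int model_offline_in_use_page(struct fake_page *p)
{
	(void)p;
	return 0;
}

/* Retry once: on a failed free-page attempt, classify the page again. */
int model_soft_offline_page(struct fake_page *p)
{
	bool try_again = true;
	int ret;

retry:
	ret = model_get_any_page(p);
	if (ret > 0) {
		ret = model_offline_in_use_page(p);
	} else if (ret == 0) {
		ret = model_offline_free_page(p);
		if (ret && try_again) {
			try_again = false;
			goto retry;
		}
	}
	return ret;
}
```

In this model, a page that gets "allocated" during the first attempt is reclassified and handled through the in-use path on the second pass, while a page that loses the race twice returns -EBUSY after exactly two classifications, so the loop cannot spin.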