Subject: Re: [PATCH] mm/memory_hotplug: drain per-cpu pages again during memory offline
From: Vlastimil Babka
To: Pavel Tatashin, Michal Hocko
Cc: LKML, Andrew Morton, linux-mm, Mel Gorman
Date: Wed, 2 Sep 2020 16:49:19 +0200
References: <20200901124615.137200-1-pasha.tatashin@soleen.com>
 <20200902140116.GI4617@dhcp22.suse.cz>
 <20200902141057.GK4617@dhcp22.suse.cz>

On 9/2/20 4:31 PM, Pavel Tatashin wrote:
>> > > The fix is to try to drain per-cpu lists again after
>> > > check_pages_isolated_cb() fails.
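
[ For context: IIUC the retried drain sits in the isolation re-check
  loop of offline_pages(). A rough, untested sketch against v5.8-era
  mm/memory_hotplug.c -- not the actual patch -- would be:

	do {
		/* are all pages in the range free and isolated? */
		ret = walk_system_ram_range(start_pfn, end_pfn - start_pfn,
					    NULL, check_pages_isolated_cb);
		/*
		 * A racing free may have put a page from this range on a
		 * pcplist after start_isolate_page_range() drained, so
		 * flush the pcplists once more and re-check instead of
		 * failing the offline. (The real loop also bails out on
		 * pending signals.)
		 */
		if (ret)
			drain_all_pages(zone);
	} while (ret);
]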
>>
>> Still trying to wrap my head around this but I think this is not a
>> proper fix. It should be the page isolation to make sure no races are
>> possible with the page freeing path.
>
> As Bharata B Rao found in another thread, the problem was introduced
> by this change:
>
>   c52e75935f8d: mm: remove extra drain pages on pcp list
>
> So, the drain used to be tried every time with lru_add_drain_all(),
> which, I think, is excessive, as we start a thread per cpu to try to
> drain and catch a rare race condition. With the proposed change we
> drain again only when we find such a condition. Fixing it in
> start_isolate_page_range() means that we must somehow synchronize it
> with release_pages(), which adds costs to runtime code instead of
> to hot-remove code.

Agreed. Isolation was always racy wrt freeing to pcplists, and it was
simply acceptable to do some extra drains if needed. Removing that race
would indeed be acceptable only if it didn't affect the alloc/free
fastpaths.

> Pasha
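
P.S. FWIW, my reading of the window being discussed, against v5.8-era
mm/page_alloc.c (abbreviated; the comments are my annotation, not in
the source):

void free_unref_page(struct page *page)
{
	unsigned long flags;
	unsigned long pfn = page_to_pfn(page);

	/*
	 * Caches the pageblock migratetype on the page without taking
	 * zone->lock, so it can still be the pre-isolation value.
	 */
	if (!free_unref_page_prepare(page, pfn))
		return;

	/*
	 * If start_isolate_page_range() + drain_all_pages() complete
	 * right here, the drain misses this page, and the stale
	 * (non-ISOLATE) migratetype still routes it onto a pcplist
	 * below -- which is what the extra drain then catches.
	 */
	local_irq_save(flags);
	free_unref_page_commit(page, pfn);
	local_irq_restore(flags);
}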