From: Pavel Tatashin
Date: Wed, 2 Sep 2020 08:51:06 -0400
Subject: Re: [PATCH v2 00/28] The new cgroup slab memory controller
In-Reply-To: <20200902112624.GC4617@dhcp22.suse.cz>
To: Michal Hocko
Cc: Vlastimil Babka, Roman Gushchin, Bharata B Rao, "linux-mm@kvack.org",
    Andrew Morton, Johannes Weiner, Shakeel Butt, Vladimir Davydov,
    "linux-kernel@vger.kernel.org", Kernel Team, Yafang Shao, stable,
    Linus Torvalds, Sasha Levin, Greg Kroah-Hartman, David Hildenbrand

> > > Thread #1: memory hot-remove systemd service
> > > Loops indefinitely, because if there is something still to be migrated
> > > this loop never terminates. However, this loop can be terminated via
> > > signal from systemd after timeout.
> > > __offline_pages()
> > >     do {
> > >         pfn = scan_movable_pages(pfn, end_pfn);
> > >             # Returns 0, meaning there is nothing available to
> > >             # migrate, no page is PageLRU(page)
> > >         ...
> > >         ret = walk_system_ram_range(start_pfn, end_pfn - start_pfn,
> > >                                     NULL, check_pages_isolated_cb);
> > >             # Returns -EBUSY, meaning there is at least one PFN that
> > >             # still has to be migrated.
> > >     } while (ret);
>

Hi Michal,

> This shouldn't really happen. What does prevent from this to proceed?
> Did you manage to catch the specific pfn and what is it used for?

I did.

> start_isolate_page_range and scan_movable_pages should fail if there is
> any memory that cannot be migrated permanently. This is something that
> we should focus on when debugging.

I was hitting this issue:
mm/memory_hotplug: drain per-cpu pages again during memory offline
https://lore.kernel.org/lkml/20200901124615.137200-1-pasha.tatashin@soleen.com

Once the pcp drain race is fixed, this particular deadlock becomes
irrelevant.

The cgroup_mutex -> mem_hotplug_lock lock ordering, however, is bad,
and the first race condition that I was hitting, described above, is
still present. For now I have added a temporary workaround: saving the
core to a file instead of piping it during shutdown. I am glad mainline
is fixed, but the stable trees should also get some kind of fix for
this problem.

Pasha
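
To make the lock-ordering concern concrete, here is a minimal user-space
sketch of how an inverted acquisition order between two locks can be
detected. It is not kernel code and not how lockdep is implemented. The
names LOCK_CGROUP, LOCK_MEM_HOTPLUG, checked_acquire(), checked_release()
and take_in_order() are hypothetical, and the labels only stand in for
cgroup_mutex and mem_hotplug_lock. That a second path takes the two locks
in the opposite order is an assumption made for illustration. The sketch
only catches direct two-lock inversions, not longer cycles.

```c
#include <stdbool.h>

/* Hypothetical labels; they only stand in for the two kernel locks. */
enum lock_id { LOCK_CGROUP, LOCK_MEM_HOTPLUG, NR_LOCKS };

/* order_seen[a][b]: b was acquired at least once while a was held. */
static bool order_seen[NR_LOCKS][NR_LOCKS];
static bool held[NR_LOCKS];

/* Record the acquisition; return false if it inverts a recorded order. */
static bool checked_acquire(enum lock_id id)
{
	bool ok = true;

	for (int h = 0; h < NR_LOCKS; h++) {
		if (!held[h] || h == (int)id)
			continue;
		if (order_seen[id][h])
			ok = false;	/* id -> h was seen earlier, now h -> id */
		order_seen[h][id] = true;
	}
	held[id] = true;
	return ok;
}

static void checked_release(enum lock_id id)
{
	held[id] = false;
}

/* Take `first`, then `second`; report whether the order was consistent. */
bool take_in_order(enum lock_id first, enum lock_id second)
{
	bool ok = checked_acquire(first);

	ok = checked_acquire(second) && ok;
	checked_release(second);
	checked_release(first);
	return ok;
}
```

Calling take_in_order(LOCK_CGROUP, LOCK_MEM_HOTPLUG) first records that
order as the established one. A later call with the arguments swapped
returns false, which is the inversion that can deadlock when the two
paths run concurrently.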