From: Pavel Tatashin
Date: Wed, 2 Sep 2020 10:20:24 -0400
Subject: Re: [PATCH v2 00/28] The new cgroup slab memory controller
To: Michal Hocko
Cc: David Hildenbrand, Vlastimil Babka, Roman Gushchin, Bharata B Rao,
    "linux-mm@kvack.org", Andrew Morton, Johannes Weiner, Shakeel Butt,
    Vladimir Davydov, "linux-kernel@vger.kernel.org", Kernel Team,
    Yafang Shao, stable, Linus Torvalds, Sasha Levin, Greg Kroah-Hartman,
    David Hildenbrand
References: <6469324e-afa2-18b4-81fb-9e96466c1bf3@suse.cz> <20200902135018.GF4617@dhcp22.suse.cz>
In-Reply-To: <20200902135018.GF4617@dhcp22.suse.cz>
X-Mailing-List: linux-kernel@vger.kernel.org

> > This is how we are using it at Microsoft: there is a very large
> > number of small memory machines (8G each) with low downtime
> > requirements (reboot must be under a second). There is also a large
> > state ~2G of memory that we need to transfer during reboot, otherwise
> > it is very expensive to recreate the state. We have 2G of system
> > memory reserved as a pmem in the device tree, and use it to
> > pass information across reboots. Once the information is not needed we
> > hot-add that memory and use it during runtime, before shutdown we
> > hot-remove the 2G, save the program state on it, and do the reboot.
>
> I still do not get it. So what does guarantee that the memory is
> offlineable in the first place?

It is in a movable zone, and we have more than 2G of free memory for
successful migrations.

> Also what is the difference between
> offlining and simply shutting the system down so that the memory is not
> used in the first place. In other words what kind of difference
> hotremove makes?

For performance reasons, we do not erase memory content during system
updates/reboots. The memory content is erased only on a power cycle,
which we do not do in production. Once we hot-remove the memory, we
convert it back into a DAXFS PMEM device, format it as EXT4, mount it as
a DAX file system, and allow programs to serialize their state to it so
they can read it back after the reboot. During startup we mount the
pmem, programs read their state back, and after that we hotplug the PMEM
DAX as a movable zone. This way, during normal runtime we have 8G
available to programs.

Pasha
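
For illustration only, the serialize-and-read-back step described above could look roughly like the sketch below. It is not the code we run: the file layout (magic, length, CRC32), the names save_state/load_state, and the use of a temporary directory in place of the DAX mount point are all assumptions made for the example.

```python
import mmap
import os
import struct
import tempfile
import zlib

# Hypothetical file layout: 4-byte magic, 8-byte payload length, CRC32 of payload.
HEADER = struct.Struct("<4sQI")
MAGIC = b"STAT"


def save_state(path: str, payload: bytes) -> None:
    """Store payload in a file through a shared mapping and flush it."""
    size = HEADER.size + len(payload)
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        os.ftruncate(fd, size)
        with mmap.mmap(fd, size) as m:
            m[HEADER.size:size] = payload
            # The CRC in the header lets the reader detect a torn or stale write.
            m[:HEADER.size] = HEADER.pack(MAGIC, len(payload), zlib.crc32(payload))
            m.flush()
    finally:
        os.close(fd)


def load_state(path: str) -> bytes:
    """Read the payload back and verify magic, length and checksum."""
    with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
        if len(m) < HEADER.size:
            raise ValueError("state file is too short")
        magic, length, crc = HEADER.unpack(m[:HEADER.size])
        payload = m[HEADER.size:HEADER.size + length]
        if magic != MAGIC or len(payload) != length or zlib.crc32(payload) != crc:
            raise ValueError("state file is corrupt")
        return payload


if __name__ == "__main__":
    # A temporary directory stands in for the mounted DAX file system.
    with tempfile.TemporaryDirectory() as mount_point:
        state_file = os.path.join(mount_point, "service.state")
        save_state(state_file, b"example program state")   # before reboot
        restored = load_state(state_file)                  # after reboot
        print(len(restored), "bytes restored")
```

On a real DAX mount the mapping would go directly to the persistent memory; the checksum is there so that a reader after reboot can tell a complete state image from a partial one and fall back to recreating the state.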