Received: by 2002:a25:8b12:0:0:0:0:0 with SMTP id i18csp4657089ybl; Mon, 26 Aug 2019 13:53:29 -0700 (PDT) X-Google-Smtp-Source: APXvYqwhQN8eRU63I+5mw/c3lGtQLkEkMccUXcrfSMsgHOpD2ACI5TzD9BmvqjorbY2GSox6KdH7 X-Received: by 2002:a17:90a:1b0a:: with SMTP id q10mr21783298pjq.91.1566852808975; Mon, 26 Aug 2019 13:53:28 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1566852808; cv=none; d=google.com; s=arc-20160816; b=tAOgThYA+dAbUSer8ROmOXmQCe5qGFF1kzk/EMptIKFeKWqUc9Rf5JZ1mA9LsnaPWn qpYeakOfjgAtPvgweKE2pWEQqhLL9NRrJ58XnDMHA0JGYqOrqer305mbQTV+ET3ttNnn TG/jdclTljnM4NgKNYVNFyQmpYKvz2TGL2q7+E+aQjiIkmeN242MTn9BcDCUPtOY3U9K ZFKWeDoDkHvIDy7vhQkKhzw6JMYiOSxMbsNIUJP1skiEr4qzRVI2YzM3Lr3nif7hVZKq lwwnBl0EtbLHs1Dgq1PgpQK5vIUDouYBMd9QA2jSK4fLAgfoVlhDOsRR3QcKi1v1NecN xljg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:user-agent:in-reply-to :content-disposition:mime-version:references:message-id:subject:cc :to:from:date:dkim-signature; bh=1Fse6KfUwQK5pIfL54PeFWHLZR7X3tE0GyhbPlYXq/0=; b=GKitsR+Ny/HSh68Z0OHkTjTUenWeazIIcEcsgVU7QWUFyddgl6r22G928/SmG418I/ eegMRI8ReKv4y+u0ldB3dz+pZo1sWNpjfzeka9PXLmNU+eqg+QA7XZo9wMbWQaZOl0wJ 5VfeiRJ20Qenhjb5HcAIov30fqkQT2JDQEjxlrAjHmRGNghBMaJD5nRVH0+pv2uN8BZV OxoSH4hvMl+cgWne5lODvbfwjaT+QXmVY4b8tyGd/32P2v9MRg6v6pAH0U4KGbahiL3J 1nprlrDgCJ7dV7K+OjuqrAAqOTN4u4BKPggj+bZuiuwNKaJTRsfsgGfpL1/g5qKMzu5Y 9tBA== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@gmail.com header.s=20161025 header.b=eg5LfXDw; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=QUARANTINE dis=NONE) header.from=gmail.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id s21si10119368pgg.168.2019.08.26.13.53.12; Mon, 26 Aug 2019 13:53:28 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; dkim=pass header.i=@gmail.com header.s=20161025 header.b=eg5LfXDw; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=QUARANTINE dis=NONE) header.from=gmail.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1729682AbfHZUoe (ORCPT + 99 others); Mon, 26 Aug 2019 16:44:34 -0400 Received: from mail-pg1-f193.google.com ([209.85.215.193]:46807 "EHLO mail-pg1-f193.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727041AbfHZUoe (ORCPT ); Mon, 26 Aug 2019 16:44:34 -0400 Received: by mail-pg1-f193.google.com with SMTP id m3so11268189pgv.13 for ; Mon, 26 Aug 2019 13:44:34 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to:user-agent; bh=1Fse6KfUwQK5pIfL54PeFWHLZR7X3tE0GyhbPlYXq/0=; b=eg5LfXDwuRDQSPuSCfWN0+tMFtzeJMtHuaG+vg7hSeaFFi8e6uT5NZjXWhEbElNCIm wQoMIFey3azHElIyf1v5hUZ8vDABYzU0+csFOrfao1YDcNwxazAuK1ne8XoZr9BVu+jC LK+ArB2snj9xb9Da862ypDop4sCG1nfUpVXRUfq9hiLj6M8s8El4KfzHP+btClnTEOSE 3dy3vZa8fUTD55e2BSnv4dafcrYUcmNbQOlkfWdDZ5fwKh/BtoLeNmowCU8q+zk2Zy+q e6/7InzJiwBJ4Prto1OonGBcGRd4aK6orEwmlsMPg63DcMIVNIkI0PA8DflKwjE4DAoK /w1g== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to:user-agent; bh=1Fse6KfUwQK5pIfL54PeFWHLZR7X3tE0GyhbPlYXq/0=; b=MziWe7TFV0xn/XUI2/KQ5HifQHZ/xkf/ja1sp+hsxzmjJTM2ctZXiE9bX9CPT1BZsl m8oNqJaGvTr4VcnP73uSVPm1nwfQSBsWmT+1Zg+5lkiOrX4LMhvgdQNBOuRiPyEbXrTI vJ9jo9wGvsUCu6k5E9Are5bMWiz+gpBwBPW8Xdn3gxM20NZUWClv8jwKM4Fz0GoxmK0S xOxxhFCxo37j2tVTHLUuL7TclIgwaa/knCSAkcFcIX6XVNbmZf8ouTYhLDMzx0DtjUBW W8CNZrfGJjnKHynRBHz9iCadxJBCYLgmGq0cvZa+IGWZngeELVNltiGm4dxEjISUT6IG JtQQ== X-Gm-Message-State: APjAAAXIycMzNBr3iQobtXl1oqDXp0mRGklg+fn1Q+ODM74U7zJnms9R fGFZxnQ9YDqP2wr4woGcxMem60Kr X-Received: by 2002:a63:9d8a:: with SMTP id i132mr7952728pgd.410.1566852273588; Mon, 26 Aug 2019 13:44:33 -0700 (PDT) Received: from bharath12345-Inspiron-5559 ([103.110.42.34]) by smtp.gmail.com with ESMTPSA id u7sm11140563pgr.94.2019.08.26.13.44.24 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Mon, 26 Aug 2019 13:44:33 -0700 (PDT) Date: Tue, 27 Aug 2019 02:14:20 +0530 From: Bharath Vedartham To: Michal Hocko Cc: Khalid Aziz , akpm@linux-foundation.org, vbabka@suse.cz, mgorman@techsingularity.net, dan.j.williams@intel.com, osalvador@suse.de, richard.weiyang@gmail.com, hannes@cmpxchg.org, arunks@codeaurora.org, rppt@linux.vnet.ibm.com, jgg@ziepe.ca, amir73il@gmail.com, alexander.h.duyck@linux.intel.com, linux-mm@kvack.org, linux-kernel-mentees@lists.linuxfoundation.org, linux-kernel@vger.kernel.org Subject: Re: [RFC PATCH 0/2] Add predictive memory reclamation and compaction Message-ID: <20190826204420.GA16800@bharath12345-Inspiron-5559> References: <20190813014012.30232-1-khalid.aziz@oracle.com> <20190813140553.GK17933@dhcp22.suse.cz> <3cb0af00-f091-2f3e-d6cc-73a5171e6eda@oracle.com> <20190814085831.GS17933@dhcp22.suse.cz> <20190815170215.GQ9477@dhcp22.suse.cz> <2668ad2e-ee52-8c88-22c0-1952243af5a1@oracle.com> <20190821140632.GI3111@dhcp22.suse.cz> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20190821140632.GI3111@dhcp22.suse.cz> User-Agent: Mutt/1.5.24 (2015-08-30) Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Hi Michal, Here are some of my thoughts, On Wed, Aug 21, 2019 at 04:06:32PM +0200, Michal Hocko wrote: > On Thu 15-08-19 14:51:04, Khalid Aziz wrote: > > Hi Michal, > > > > The smarts for tuning these knobs can be implemented in userspace and > > more knobs added to allow for what is missing today, but we get back to > > the same issue as before. That does nothing to make kernel self-tuning > > and adds possibly even more knobs to userspace. Something so fundamental > > to kernel memory management as making free pages available when they are > > needed really should be taken care of in the kernel itself. Moving it to > > userspace just means the kernel is hobbled unless one installs and tunes > > a userspace package correctly. > > From my past experience the existing autotunig works mostly ok for a > vast variety of workloads. A more clever tuning is possible and people > are doing that already. Especially for cases when the machine is heavily > overcommited. There are different ways to achieve that. Your new > in-kernel auto tuning would have to be tested on a large variety of > workloads to be proven and riskless. So I am quite skeptical to be > honest. Could you give some references to such works regarding tuning the kernel? Essentially, Our idea here is to foresee potential memory exhaustion. This foreseeing is done by observing the workload, observing the memory usage of the workload. Based on this observations, we make a prediction whether or not memory exhaustion could occur. If memory exhaustion occurs, we reclaim some more memory. kswapd stops reclaim when hwmark is reached. hwmark is usually set to a fairly low percentage of total memory, in my system for zone Normal hwmark is 13% of total pages. So there is scope for reclaiming more pages to make sure system does not suffer from a lack of pages. Since we are "predicting", there could be mistakes in our prediction. The question is how bad are the mistakes? How much does a wrong prediction cost? A right prediction would be a win. We rightfully predict that there could be exhaustion, this would lead to us reclaiming more memory(than hwmark)/compacting memory beforehand(unlike kcompactd which does it on demand). A wrong prediction on the other hand can be categorized into 2 situations: (i) We foresee memory exhaustion but there is no memory exhaustion in the future. In this case, we would be reclaiming more memory for not a lot of use. This situation is not entirely bad but we definitly waste a few clock cycles. (ii) We don't foresee memory exhaustion but there is memory exhaustion in the future. This is a bad case where we may end up going into direct compaction/reclaim. But it could be the case that the memory exhaustion is far in the future and even though we didnt see it, kswapd could have reclaimed that memory or drop_cache occured. How often we hit wrong predictions of type (ii) would really determine our efficiency. Coming to your situation of provisioning vms. A situation where our work will come to good is when there is a cloud burst. When the demand for vms is super high, our algorithm could adapt to the increase in demand for these vms and reclaim more memory/compact more memory to reduce allocation stalls and improve performance. > Therefore I would really focus on discussing whether we have sufficient > APIs to tune the kernel to do the right thing when needed. That requires > to identify gaps in that area. One thing that comes to my mind is based on the issue Khalid mentioned earlier on how his desktop took more than 30secs to boot up because of the caches using up a lot of memory. Rather than allowing any unused memory to be the page cache, would it be a good idea to fix a size for the caches and elastically change the size based on the workload? Thank you Bharath > -- > Michal Hocko > SUSE Labs >