Received: by 2002:a05:7412:98c1:b0:fa:551:50a7 with SMTP id kc1csp309303rdb; Fri, 5 Jan 2024 10:29:16 -0800 (PST) X-Google-Smtp-Source: AGHT+IE5EapFL/883LXxh3DzZzhb3/Hg3mnPotSe8h3V3BZOUOvcy7MR3XjGiMn7NTFZyl03MuBW X-Received: by 2002:a9d:6d81:0:b0:6dc:27df:ec75 with SMTP id x1-20020a9d6d81000000b006dc27dfec75mr2624022otp.23.1704479356289; Fri, 05 Jan 2024 10:29:16 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1704479356; cv=none; d=google.com; s=arc-20160816; b=0m/eKP3zVTr28UsYWqDFB7Uih9VG0dmy5C756E3aUeolzZgvDsDjgle8Ae7OVL1/+M lnT2w0AchgpxhgPCFcPpmcQZho7oQgmq15dTRzbrweRTrZKniD1BowzO0KXeNeotBn5O tNrNSfXBg3rNoAY/XW4kNBoa11A6AyiXwppZpT4U3A07PewzFVmaEvUBfogSVqgQf96/ 6fHiZR2PkFhKniUFNBnxXijoYpP813nDhKIwg2jf3CxWSwgto2xGKTJh0bmfcxIldKsL oeydeltHaYDqcgglKNdq34GOtvqTK9+oqTWpF+Z8yyEqiyYd6wPF+koGSju1a4wWFLjs dh7Q== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=content-transfer-encoding:in-reply-to:autocrypt:from:references:cc :to:content-language:subject:user-agent:mime-version :list-unsubscribe:list-subscribe:list-id:precedence:date:message-id :dkim-signature; bh=+XWVgfxg2yP0CcdclE5GlUr++EjiHn7Khirff1S4+8w=; fh=6Bu/dQftbuEFDSeNT53OfBY+VriwYZCnXYdTlXeFAsU=; b=EgZk4c6m8uJwAMDdBn9uktjXbQA7rhw/zWiYa6+7wmoTojphkydRh5coZxMmsXoVFr CD7rg7iopiSnEnqyeDohRJOlL3Cukg+S96r6f6BxBD19Cj3zmh0B6O6yegZGg/xbTueH TgeUGxttFt8RfSWkgRBAW3spWqWgSASm1uP+BGaK2H0FJ8Ua8cNMp72n0yy+pSyCzuT8 9NlPHFlgBeyG/eKaVETqbmReWbkI1+vRZbDoHMfd2hTRGo7sWYtQnKvE707dbSfavfmT czeKGrKaO3/AVMUGufheaNsdGaWbq3EzXGfkW47OKYJQZq/FlEGJnhgEvWUHcFDmbMD+ ubNw== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=CBs5OSGq; spf=pass (google.com: domain of linux-kernel+bounces-18216-linux.lists.archive=gmail.com@vger.kernel.org designates 147.75.199.223 as permitted sender) smtp.mailfrom="linux-kernel+bounces-18216-linux.lists.archive=gmail.com@vger.kernel.org"; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Return-Path: Received: from ny.mirrors.kernel.org (ny.mirrors.kernel.org. [147.75.199.223]) by mx.google.com with ESMTPS id r1-20020a05622a034100b004297d4f552csi709486qtw.599.2024.01.05.10.29.16 for (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 05 Jan 2024 10:29:16 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel+bounces-18216-linux.lists.archive=gmail.com@vger.kernel.org designates 147.75.199.223 as permitted sender) client-ip=147.75.199.223; Authentication-Results: mx.google.com; dkim=pass header.i=@intel.com header.s=Intel header.b=CBs5OSGq; spf=pass (google.com: domain of linux-kernel+bounces-18216-linux.lists.archive=gmail.com@vger.kernel.org designates 147.75.199.223 as permitted sender) smtp.mailfrom="linux-kernel+bounces-18216-linux.lists.archive=gmail.com@vger.kernel.org"; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: from smtp.subspace.kernel.org (wormhole.subspace.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ny.mirrors.kernel.org (Postfix) with ESMTPS id 0D1BB1C236A3 for ; Fri, 5 Jan 2024 18:29:16 +0000 (UTC) Received: from localhost.localdomain (localhost.localdomain [127.0.0.1]) by smtp.subspace.kernel.org (Postfix) with ESMTP id 1D9F23529A; Fri, 5 Jan 2024 18:29:10 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="CBs5OSGq" X-Original-To: linux-kernel@vger.kernel.org Received: from mgamail.intel.com (mgamail.intel.com [192.198.163.10]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 3F30235887; Fri, 5 Jan 2024 18:29:06 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=intel.com DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1704479347; x=1736015347; h=message-id:date:mime-version:subject:to:cc:references: from:in-reply-to:content-transfer-encoding; bh=B1cGgk2x4GOJQVRsb0t9LqqD4V45rFbkA/2x1daK9v0=; b=CBs5OSGqs5zxXGVgwHGgrVR0Ydx6SPYLlt/6J5TS1h5IFsc5Kz7C4Mdy nbqJv3AqqT8bU8969wkJwsqMh+xHTrV5USskbEFy/ctjDjtCqS2ZMxNMx ztM6Ttu4xcyn30y7gr8VHmnfP5Uxj30F4gvu9bXtymZ2ks/Ld5EmQWaaC 9WxjuUU1OtNgZ635f/MYvna6Bhr+HEvhR80TGzz+XHlJxE9o9WZ9xlU5l Aj04PME1PWEOrxRtl4ui51ZDqbmVKc65mKFZaYpNmQwDcNoS0PQoEcBHJ WIjkIdqeusWXhTFX1qV93mNvDefOTruyZL6ubcyPw00f4Ao0oPZvkGkW9 A==; X-IronPort-AV: E=McAfee;i="6600,9927,10944"; a="4925947" X-IronPort-AV: E=Sophos;i="6.04,334,1695711600"; d="scan'208";a="4925947" Received: from orsmga005.jf.intel.com ([10.7.209.41]) by fmvoesa104.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 05 Jan 2024 10:29:06 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10944"; a="954018550" X-IronPort-AV: E=Sophos;i="6.04,334,1695711600"; d="scan'208";a="954018550" Received: from sbidasar-mobl.amr.corp.intel.com (HELO [10.209.42.45]) ([10.209.42.45]) by orsmga005-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 05 Jan 2024 10:29:05 -0800 Message-ID: <772cb101-33e3-4b55-a311-142b22575e7c@intel.com> Date: Fri, 5 Jan 2024 10:29:05 -0800 Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 User-Agent: Mozilla Thunderbird Subject: Re: [PATCH v6 00/12] Add Cgroup support for SGX EPC memory Content-Language: en-US To: Haitao Huang , jarkko@kernel.org, dave.hansen@linux.intel.com, tj@kernel.org, mkoutny@suse.com, linux-kernel@vger.kernel.org, linux-sgx@vger.kernel.org, x86@kernel.org, cgroups@vger.kernel.org, tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, hpa@zytor.com, sohil.mehta@intel.com Cc: zhiquan1.li@intel.com, kristen@linux.intel.com, seanjc@google.com, zhanb@microsoft.com, anakrish@microsoft.com, mikko.ylinen@linux.intel.com, yangjie@microsoft.com References: <20231030182013.40086-1-haitao.huang@linux.intel.com> From: Dave Hansen Autocrypt: addr=dave.hansen@intel.com; keydata= xsFNBE6HMP0BEADIMA3XYkQfF3dwHlj58Yjsc4E5y5G67cfbt8dvaUq2fx1lR0K9h1bOI6fC oAiUXvGAOxPDsB/P6UEOISPpLl5IuYsSwAeZGkdQ5g6m1xq7AlDJQZddhr/1DC/nMVa/2BoY 2UnKuZuSBu7lgOE193+7Uks3416N2hTkyKUSNkduyoZ9F5twiBhxPJwPtn/wnch6n5RsoXsb ygOEDxLEsSk/7eyFycjE+btUtAWZtx+HseyaGfqkZK0Z9bT1lsaHecmB203xShwCPT49Blxz VOab8668QpaEOdLGhtvrVYVK7x4skyT3nGWcgDCl5/Vp3TWA4K+IofwvXzX2ON/Mj7aQwf5W iC+3nWC7q0uxKwwsddJ0Nu+dpA/UORQWa1NiAftEoSpk5+nUUi0WE+5DRm0H+TXKBWMGNCFn c6+EKg5zQaa8KqymHcOrSXNPmzJuXvDQ8uj2J8XuzCZfK4uy1+YdIr0yyEMI7mdh4KX50LO1 pmowEqDh7dLShTOif/7UtQYrzYq9cPnjU2ZW4qd5Qz2joSGTG9eCXLz5PRe5SqHxv6ljk8mb ApNuY7bOXO/A7T2j5RwXIlcmssqIjBcxsRRoIbpCwWWGjkYjzYCjgsNFL6rt4OL11OUF37wL QcTl7fbCGv53KfKPdYD5hcbguLKi/aCccJK18ZwNjFhqr4MliQARAQABzUVEYXZpZCBDaHJp c3RvcGhlciBIYW5zZW4gKEludGVsIFdvcmsgQWRkcmVzcykgPGRhdmUuaGFuc2VuQGludGVs LmNvbT7CwXgEEwECACIFAlQ+9J0CGwMGCwkIBwMCBhUIAgkKCwQWAgMBAh4BAheAAAoJEGg1 lTBwyZKwLZUP/0dnbhDc229u2u6WtK1s1cSd9WsflGXGagkR6liJ4um3XCfYWDHvIdkHYC1t MNcVHFBwmQkawxsYvgO8kXT3SaFZe4ISfB4K4CL2qp4JO+nJdlFUbZI7cz/Td9z8nHjMcWYF IQuTsWOLs/LBMTs+ANumibtw6UkiGVD3dfHJAOPNApjVr+M0P/lVmTeP8w0uVcd2syiaU5jB aht9CYATn+ytFGWZnBEEQFnqcibIaOrmoBLu2b3fKJEd8Jp7NHDSIdrvrMjYynmc6sZKUqH2 I1qOevaa8jUg7wlLJAWGfIqnu85kkqrVOkbNbk4TPub7VOqA6qG5GCNEIv6ZY7HLYd/vAkVY E8Plzq/NwLAuOWxvGrOl7OPuwVeR4hBDfcrNb990MFPpjGgACzAZyjdmYoMu8j3/MAEW4P0z F5+EYJAOZ+z212y1pchNNauehORXgjrNKsZwxwKpPY9qb84E3O9KYpwfATsqOoQ6tTgr+1BR CCwP712H+E9U5HJ0iibN/CDZFVPL1bRerHziuwuQuvE0qWg0+0SChFe9oq0KAwEkVs6ZDMB2 P16MieEEQ6StQRlvy2YBv80L1TMl3T90Bo1UUn6ARXEpcbFE0/aORH/jEXcRteb+vuik5UGY 5TsyLYdPur3TXm7XDBdmmyQVJjnJKYK9AQxj95KlXLVO38lczsFNBFRjzmoBEACyAxbvUEhd GDGNg0JhDdezyTdN8C9BFsdxyTLnSH31NRiyp1QtuxvcqGZjb2trDVuCbIzRrgMZLVgo3upr MIOx1CXEgmn23Zhh0EpdVHM8IKx9Z7V0r+rrpRWFE8/wQZngKYVi49PGoZj50ZEifEJ5qn/H Nsp2+Y+bTUjDdgWMATg9DiFMyv8fvoqgNsNyrrZTnSgoLzdxr89FGHZCoSoAK8gfgFHuO54B lI8QOfPDG9WDPJ66HCodjTlBEr/Cwq6GruxS5i2Y33YVqxvFvDa1tUtl+iJ2SWKS9kCai2DR 3BwVONJEYSDQaven/EHMlY1q8Vln3lGPsS11vSUK3QcNJjmrgYxH5KsVsf6PNRj9mp8Z1kIG qjRx08+nnyStWC0gZH6NrYyS9rpqH3j+hA2WcI7De51L4Rv9pFwzp161mvtc6eC/GxaiUGuH BNAVP0PY0fqvIC68p3rLIAW3f97uv4ce2RSQ7LbsPsimOeCo/5vgS6YQsj83E+AipPr09Caj 0hloj+hFoqiticNpmsxdWKoOsV0PftcQvBCCYuhKbZV9s5hjt9qn8CE86A5g5KqDf83Fxqm/ vXKgHNFHE5zgXGZnrmaf6resQzbvJHO0Fb0CcIohzrpPaL3YepcLDoCCgElGMGQjdCcSQ+Ci FCRl0Bvyj1YZUql+ZkptgGjikQARAQABwsFfBBgBAgAJBQJUY85qAhsMAAoJEGg1lTBwyZKw l4IQAIKHs/9po4spZDFyfDjunimEhVHqlUt7ggR1Hsl/tkvTSze8pI1P6dGp2XW6AnH1iayn yRcoyT0ZJ+Zmm4xAH1zqKjWplzqdb/dO28qk0bPso8+1oPO8oDhLm1+tY+cOvufXkBTm+whm +AyNTjaCRt6aSMnA/QHVGSJ8grrTJCoACVNhnXg/R0g90g8iV8Q+IBZyDkG0tBThaDdw1B2l asInUTeb9EiVfL/Zjdg5VWiF9LL7iS+9hTeVdR09vThQ/DhVbCNxVk+DtyBHsjOKifrVsYep WpRGBIAu3bK8eXtyvrw1igWTNs2wazJ71+0z2jMzbclKAyRHKU9JdN6Hkkgr2nPb561yjcB8 sIq1pFXKyO+nKy6SZYxOvHxCcjk2fkw6UmPU6/j/nQlj2lfOAgNVKuDLothIxzi8pndB8Jju KktE5HJqUUMXePkAYIxEQ0mMc8Po7tuXdejgPMwgP7x65xtfEqI0RuzbUioFltsp1jUaRwQZ MTsCeQDdjpgHsj+P2ZDeEKCbma4m6Ez/YWs4+zDm1X8uZDkZcfQlD9NldbKDJEXLIjYWo1PH hYepSffIWPyvBMBTW2W5FRjJ4vLRrJSUoEfJuPQ3vW9Y73foyo/qFoURHO48AinGPZ7PC7TF vUaNOTjKedrqHkaOcqB185ahG2had0xnFsDPlx5y In-Reply-To: <20231030182013.40086-1-haitao.huang@linux.intel.com> Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit There's very little about how the LRU design came to be in this cover letter. Let's add some details. How's this? Writing this up, I'm a lot more convinced that this series is, in general, taking the right approach. I honestly don't see any other alternatives. As much as I'd love to do something stupidly simple like just killing enclaves at the moment they hit the limit, that would be a horrid experience for users _and_ a departure from what the existing reclaim support does. That said, there's still a lot of work do to do refactor this series. It's in need of some love to make it more clear what is going on and to making the eventual switch over to per-cgroup LRUs more gradual. Each patch in the series is still doing way too much, _especially_ in patch 10. == The existing EPC memory management aims to be a miniature version of the core VM where EPC memory can be overcommitted and reclaimed. EPC allocations can wait for reclaim. The alternative to waiting would have been to send a signal and let the enclave die. This series attempts to implement that same logic for cgroups, for the same reasons: it's preferable to wait for memory to become available and let reclaim happen than to do things that are fatal to enclaves. There is currently a global reclaimable page SGX LRU list. That list (and the existing scanning algorithm) is essentially useless for doing reclaim when a cgroup hits its limit because the cgroup's pages are scattered around that LRU. It is unspeakably inefficient to scan a linked list with millions of entries for what could be dozens of pages from a cgroup that needs reclaim. Even if unspeakably slow reclaim was accepted, the existing scanning algorithm only picks a few pages off the head of the global LRU. It would either need to hold the list locks for unreasonable amounts of time, or be taught to scan the list in pieces, which has its own challenges. tl;dr: An cgroup hitting its limit should be as similar as possible to the system running out of EPC memory. The only two choices to implement that are nasty changes the existing LRU scanning algorithm, or to add new LRUs. The result: Add a new LRU for each cgroup and scans those instead. Replace the existing global cgroup with the root cgroup's LRU (only when this new support is compiled in, obviously).