Date: Tue, 10 Oct 2023 17:01:25 -0700
References: <20230923030657.16148-1-haitao.huang@linux.intel.com>
 <20230923030657.16148-13-haitao.huang@linux.intel.com>
 <1b265d0c9dfe17de2782962ed26a99cc9d330138.camel@intel.com>
 <548d2ab828307f7d1c6d7f707e587cd27b0e7fe4.camel@intel.com>
Subject: Re: [PATCH v5 12/18] x86/sgx: Add EPC OOM path to forcefully reclaim EPC
From: Sean Christopherson
To: Haitao Huang
Cc: Kai Huang, Bo Zhang, "linux-sgx@vger.kernel.org",
 "cgroups@vger.kernel.org", "yangjie@microsoft.com", Zhiquan1 Li,
 "dave.hansen@linux.intel.com", "linux-kernel@vger.kernel.org",
 "mingo@redhat.com", "tglx@linutronix.de", "tj@kernel.org",
 "anakrish@microsoft.com", "jarkko@kernel.org", "hpa@zytor.com",
 "mikko.ylinen@linux.intel.com", Sohil Mehta, "bp@alien8.de",
 "x86@kernel.org", "kristen@linux.intel.com"

On Tue, Oct 10, 2023, Haitao Huang wrote:
> On Mon, 09 Oct 2023 21:23:12 -0500, Huang, Kai wrote:
>
> > On Mon, 2023-10-09 at 20:42 -0500, Haitao Huang wrote:
> > > Hi Sean
> > >
> > > On Mon, 09 Oct 2023 19:23:04 -0500, Sean Christopherson wrote:
> > > > I can see userspace wanting to explicitly terminate the VM instead of
> > > > "silently" the VM's enclaves, but that seems like it should be a knob
> > > > in the virtual EPC code.
> > >
> > > If my understanding above is correct and understanding your statement
> > > above correctly, then don't see we really need separate knob for vEPC
> > > code. Reaching a cgroup limit by a running guest (assuming dynamic
> > > allocation implemented) should not translate automatically killing
> > > the VM.
> > > Instead, it's user space job to work with guest to handle allocation
> > > failure. Guest could page and kill enclaves.
> > >
> >
> > IIUC Sean was talking about changing misc.max _after_ you launch SGX VMs:
> >
> > 1) misc.max = 100M
> > 2) Launch VMs with total virtual EPC size = 100M <- success
> > 3) misc.max = 50M
> >
> > 3) will also succeed, but nothing will happen, the VMs will be still
> > holding 100M EPC.
> >
> > You need to somehow track virtual EPC and kill VM instead.
> >
> > (or somehow fail to do 3) if it is also an acceptable option.)
> >
> Thanks for explaining it.
>
> There is an error code to return from max_write. I can add that too to the
> callback definition and fail it when it can't be enforced for any reason.
> Would like some community feedback if this is acceptable though.

That likely isn't acceptable.  E.g. create a cgroup with both a host enclave and
virtual EPC, set the hard limit to 100MiB.  Virtual EPC consumes 50MiB, and the
host enclave consumes 50MiB.  Userspace lowers the limit to 49MiB.  The cgroup
code would reclaim all of the enclave's reclaimable EPC, and then kill the
enclave because it's still over the limit.  And then fail the max_write because
the cgroup is *still* over the limit.  So in addition to burning a lot of
cycles, from userspace's perspective its enclave was killed for no reason, as
the new limit wasn't actually set.

> I think to solve it ultimately, we need be able to adjust 'capacity' of VMs
> not to just kill them, which is basically the same as dynamic allocation
> support for VMs (being able to increase/decrease epc size when it is
> running). For now, we only have static allocation so max can't be enforced
> once it is launched.

No, reclaiming virtual EPC is not a requirement.  VMM EPC oversubscription is
insanely complex, and I highly doubt any users actually want to oversubscribe
VMs.

There are use cases for cgroups beyond oversubscribing/swapping, e.g. privileged
userspace may set limits on a container to ensure the container doesn't
*accidentally* consume more EPC than it was allotted, e.g. due to a
configuration bug that created a VM with more EPC than it was supposed to have.

My comments on virtual EPC vs. cgroups are much more about having sane,
well-defined behavior, not about saying the kernel actually needs to support
oversubscribing EPC for KVM guests.
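
To make the 100MiB -> 49MiB sequence above concrete, here is a rough toy model
in Python.  It is only a sketch of the "fail max_write when the limit can't be
enforced" semantics being discussed, not kernel code: the class, its fields and
the method name are all made up for illustration, and it assumes virtual EPC is
never reclaimed and that all of the enclave's 50MiB is reclaimable.

```python
from dataclasses import dataclass

MIB = 1 << 20


@dataclass
class ToyEpcCgroup:
    """Hypothetical toy model of one cgroup's EPC usage (not a kernel structure)."""
    limit: int                # current hard limit (misc.max), in bytes
    enclave_reclaimable: int  # host enclave EPC that could be swapped out
    vepc: int                 # virtual EPC; assumed to be unreclaimable
    enclave_alive: bool = True

    def usage(self) -> int:
        return self.enclave_reclaimable + self.vepc

    def max_write(self, new_limit: int):
        """Hypothetical semantics: reclaim, then kill, then fail the write."""
        events = []
        if self.usage() > new_limit and self.enclave_reclaimable:
            self.enclave_reclaimable = 0      # swap out everything reclaimable
            events.append("reclaimed enclave EPC")
        if self.usage() > new_limit and self.enclave_alive:
            self.enclave_alive = False        # OOM path kills the enclave
            events.append("killed enclave")
        if self.usage() > new_limit:
            events.append("max_write failed")  # vEPC alone exceeds the limit
            return False, events
        self.limit = new_limit
        return True, events


cg = ToyEpcCgroup(limit=100 * MIB, enclave_reclaimable=50 * MIB, vepc=50 * MIB)
ok, events = cg.max_write(49 * MIB)
print(ok, events, cg.limit // MIB, cg.enclave_alive)
```

In this model the write fails and the limit stays at 100MiB, yet the enclave has
already been reclaimed and killed along the way, which is the "killed for no
reason" outcome from userspace's perspective.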