X-Mailing-List: linux-kernel@vger.kernel.org
References: <20240401232946.1837665-1-jthoughton@google.com>
 <20240401232946.1837665-6-jthoughton@google.com>
From: Yu Zhao
Date: Sat, 20 Apr 2024 18:19:28 -0600
Subject: Re: [PATCH v3 5/7] KVM: x86: Participate in bitmap-based PTE aging
To: James Houghton
Cc: David Matlack, Andrew Morton, Paolo Bonzini, Marc Zyngier,
 Oliver Upton, Sean Christopherson, Jonathan Corbet, James Morse,
 Suzuki K Poulose, Zenghui Yu, Catalin Marinas, Will Deacon,
 Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
 "H. Peter Anvin", Steven Rostedt, Masami Hiramatsu, Mathieu Desnoyers,
 Shaoqin Huang, Gavin Shan, Ricardo Koller, Raghavendra Rao Ananta,
 Ryan Roberts, David Rientjes, Axel Rasmussen,
 linux-kernel@vger.kernel.org, linux-arm-kernel@lists.infradead.org,
 kvmarm@lists.linux.dev, kvm@vger.kernel.org, linux-mm@kvack.org,
 linux-trace-kernel@vger.kernel.org
Content-Type: text/plain; charset="UTF-8"

On Fri, Apr 19, 2024 at 3:48 PM James Houghton wrote:
>
> On Fri, Apr 19, 2024 at 2:07 PM David Matlack wrote:
> >
> > On 2024-04-19 01:47 PM, James Houghton wrote:
> > > On Thu, Apr 11, 2024 at 10:28 AM David Matlack wrote:
> > > > On 2024-04-11 10:08 AM, David Matlack wrote:
> > > > bool kvm_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range)
> > > > {
> > > >         bool young = false;
> > > >
> > > >         if (!range->arg.metadata->bitmap && kvm_memslots_have_rmaps(kvm))
> > > >                 young = kvm_handle_gfn_range(kvm, range, kvm_age_rmap);
> > > >
> > > >         if (tdp_mmu_enabled)
> > > >                 young |= kvm_tdp_mmu_age_gfn_range(kvm, range);
> > > >
> > > >         return young;
> > > > }
> > > >
> > > > bool kvm_test_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range)
> > > > {
> > > >         bool young = false;
> > > >
> > > >         if (!range->arg.metadata->bitmap && kvm_memslots_have_rmaps(kvm))
> > > >                 young = kvm_handle_gfn_range(kvm, range, kvm_test_age_rmap);
> > > >
> > > >         if (tdp_mmu_enabled)
> > > >                 young |= kvm_tdp_mmu_test_age_gfn(kvm, range);
> > > >
> > > >         return young;
> > > >
> > >
> > >
> > > Yeah I think this is the right thing to do. Given your other
> > > suggestions (on patch 3), I think this will look something like this
> > > -- let me know if I've misunderstood something:
> > >
> > > bool check_rmap = !bitmap && kvm_memslots_have_rmaps(kvm);
> > >
> > > if (check_rmap)
> > >         KVM_MMU_LOCK(kvm);
> > >
> > > rcu_read_lock(); // perhaps only do this when we don't take the MMU lock?
> > >
> > > if (check_rmap)
> > >         kvm_handle_gfn_range(/* ... */ kvm_test_age_rmap)
> > >
> > > if (tdp_mmu_enabled)
> > >         kvm_tdp_mmu_test_age_gfn() // modified to be RCU-safe
> > >
> > > rcu_read_unlock();
> > > if (check_rmap)
> > >         KVM_MMU_UNLOCK(kvm);
> >
> > I was thinking a little different. If you follow my suggestion to first
> > make the TDP MMU aging lockless, you'll end up with something like this
> > prior to adding bitmap support (note: the comments are just for
> > demonstrative purposes):
> >
> > bool kvm_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range)
> > {
> >         bool young = false;
> >
> >         /* Shadow MMU aging holds write-lock. */
> >         if (kvm_memslots_have_rmaps(kvm)) {
> >                 write_lock(&kvm->mmu_lock);
> >                 young = kvm_handle_gfn_range(kvm, range, kvm_age_rmap);
> >                 write_unlock(&kvm->mmu_lock);
> >         }
> >
> >         /* TDP MMU aging is lockless. */
> >         if (tdp_mmu_enabled)
> >                 young |= kvm_tdp_mmu_age_gfn_range(kvm, range);
> >
> >         return young;
> > }
> >
> > Then when you add bitmap support it would look something like this:
> >
> > bool kvm_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range)
> > {
> >         unsigned long *bitmap = range->arg.metadata->bitmap;
> >         bool young = false;
> >
> >         /* Shadow MMU aging holds write-lock and does not support bitmap. */
> >         if (kvm_memslots_have_rmaps(kvm) && !bitmap) {
> >                 write_lock(&kvm->mmu_lock);
> >                 young = kvm_handle_gfn_range(kvm, range, kvm_age_rmap);
> >                 write_unlock(&kvm->mmu_lock);
> >         }
> >
> >         /* TDP MMU aging is lockless and supports bitmap. */
> >         if (tdp_mmu_enabled)
> >                 young |= kvm_tdp_mmu_age_gfn_range(kvm, range);
> >
> >         return young;
> > }
> >
> > rcu_read_lock/unlock() would be called in kvm_tdp_mmu_age_gfn_range().
>
> Oh yes this is a lot better. I hope I would have seen this when it
> came time to actually update this patch. Thanks.
>
> >
> > That brings up a question I've been wondering about. If KVM only
> > advertises support for the bitmap lookaround when shadow roots are not
> > allocated, does that mean MGLRU will be blind to accesses made by L2
> > when nested virtualization is enabled? And does that mean the Linux MM
> > will think all L2 memory is cold (i.e. good candidate for swapping)
> > because it isn't seeing accesses made by L2?
>
> Yes, I think so (for both questions). That's better than KVM not
> participating in MGLRU aging at all, which is the case today (IIUC --
> also ignoring the case where KVM accesses guest memory directly). We
> could have MGLRU always invoke the mmu notifiers, but frequently
> taking the MMU lock for writing might be worse than evicting when we
> shouldn't. Maybe Yu tried this at some point, but I can't find any
> results for this.

No, in this case only the fast path (page table scanning) is disabled.
MGLRU still sees the A-bit from L2 using the rmap, i.e., the slow path
calling folio_check_references().
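
To make the fast-path/slow-path distinction concrete, here is a rough
userspace toy model. It is only a sketch: every name in it (toy_vm,
toy_age_range_fast, toy_age_page_slow, TOY_NR_PAGES) is made up for
illustration and none of it is kernel code. It assumes, as discussed in
this thread, that the batched bitmap lookaround is refused while shadow
roots exist, and that the per-page path still reads the accessed bits set
through shadow roots, at the cost of a simulated mmu_lock write
acquisition.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_NR_PAGES 8

/* Hypothetical toy model; none of these types or functions are kernel APIs. */
struct toy_vm {
	bool has_shadow_roots;              /* stands in for kvm_memslots_have_rmaps() */
	bool tdp_accessed[TOY_NR_PAGES];    /* A-bits set through the TDP MMU */
	bool shadow_accessed[TOY_NR_PAGES]; /* A-bits set through shadow roots (L2) */
	unsigned int write_locks_taken;     /* simulated mmu_lock write acquisitions */
};

/* Return the old value of one accessed bit and clear it. */
static bool toy_test_and_clear(bool *abit)
{
	bool young = *abit;

	*abit = false;
	return young;
}

/*
 * Fast path (assumed behavior): batched aging of pages [start, end) into a
 * bitmap, with no lock taken. Returns false, leaving the bitmap untouched,
 * when shadow roots exist, so the caller has to fall back to the slow path.
 */
static bool toy_age_range_fast(struct toy_vm *vm, size_t start, size_t end,
			       unsigned long *bitmap)
{
	size_t i;

	assert(start <= end && end <= TOY_NR_PAGES);

	if (vm->has_shadow_roots)
		return false;

	for (i = start; i < end; i++)
		if (toy_test_and_clear(&vm->tdp_accessed[i]))
			*bitmap |= 1UL << (i - start);

	return true;
}

/*
 * Slow path (assumed behavior): age a single page, as an rmap-driven
 * notifier call would. Shadow-root A-bits are only visited under the
 * simulated write lock; the TDP side needs no lock in this model.
 */
static bool toy_age_page_slow(struct toy_vm *vm, size_t idx)
{
	bool young = false;

	assert(idx < TOY_NR_PAGES);

	if (vm->has_shadow_roots) {
		vm->write_locks_taken++;
		young = toy_test_and_clear(&vm->shadow_accessed[idx]);
	}

	young |= toy_test_and_clear(&vm->tdp_accessed[idx]);
	return young;
}
```

With has_shadow_roots set and shadow_accessed[3] marked, the fast path
returns false and reports nothing, while toy_age_page_slow(&vm, 3) still
returns true. That is the sense in which L2 accesses stay visible in the
toy: later through the per-page path, rather than not at all.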