Date: Thu, 4 Apr 2024 15:02:50 -0700
References: <20240320005024.3216282-1-seanjc@google.com> <4d04b010-98f3-4eae-b320-a7dd6104b0bf@redhat.com> <42dbf562-5eab-4f82-ad77-5ee5b8c79285@redhat.com>
Subject: Re: [RFC PATCH 0/4] KVM: x86/mmu: Rework marking folios dirty/accessed
From: Sean Christopherson
To: David Hildenbrand
Cc: David Matlack, Paolo Bonzini, kvm@vger.kernel.org, linux-kernel@vger.kernel.org, David Stevens, Matthew Wilcox

On Thu, Apr 04, 2024, David Hildenbrand wrote:
> On 04.04.24 19:31, Sean Christopherson wrote:
> > On Thu, Apr 04, 2024, David Hildenbrand wrote:
> > > On 04.04.24 00:19, Sean Christopherson wrote:
> > > > Hmm, we essentially already have an mmu_notifier today, since secondary MMUs need
> > > > to be invalidated before consuming dirty status. Isn't the end result essentially
> > > > a sane FOLL_TOUCH?
> > >
> > > Likely. As stated in my first mail, FOLL_TOUCH is a bit of a mess right now.
> > >
> > > Having something that makes sure the writable PTE/PMD is dirty (or
> > > alternatively sets it dirty), paired with MMU notifiers notifying on any
> > > mkclean would be one option that would leave handling how to handle dirtying
> > > of folios completely to the core. It would behave just like a CPU writing to
> > > the page table, which would set the pte dirty.
> > >
> > > Of course, if frequent clearing of the dirty PTE/PMD bit would be a problem
> > > (like we discussed for the accessed bit), that would not be an option. But
> > > from what I recall, only clearing the PTE/PMD dirty bit is rather rare.
> >
> > And AFAICT, all cases already invalidate secondary MMUs anyways, so if anything
> > it would probably be a net positive, e.g. the notification could more precisely
> > say that SPTEs need to be read-only, not blasted away completely.
>
> As discussed, I think at least madvise_free_pte_range() wouldn't do that.

I'm getting a bit turned around. Are you talking about what madvise_free_pte_range()
would do in this future world, or what madvise_free_pte_range() does today? Because
today, unless I'm really misreading the code, secondary MMUs are invalidated before
the dirty bit is cleared.

	mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, mm,
				range.start, range.end);

	lru_add_drain();
	tlb_gather_mmu(&tlb, mm);
	update_hiwater_rss(mm);

	mmu_notifier_invalidate_range_start(&range);
	tlb_start_vma(&tlb, vma);
	walk_page_range(vma->vm_mm, range.start, range.end,
			&madvise_free_walk_ops, &tlb);
	tlb_end_vma(&tlb, vma);
	mmu_notifier_invalidate_range_end(&range);

KVM (or any other secondary MMU) can re-establish mapping with W=1,D=0 in the
PTE, but the costly invalidation (zap+flush+fault) still happens.

> Notifiers would only get called later when actually zapping the folio.

And in case we're talking about a hypothetical future, I was thinking the above
could do MMU_NOTIFY_WRITE_PROTECT instead of MMU_NOTIFY_CLEAR.

> So at least for some time, you would have the PTE not dirty, but the SPTE
> writable or even dirty. So you'd have to set the page dirty when zapping the
> SPTE ... and IMHO that is what we should maybe try to avoid :)
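
To make the difference between the two notification types concrete, here is a
rough userspace model, not kernel code. It assumes a secondary MMU that hands
any dirty state back to the core on invalidation, and that a write-protect
style event (the hypothetical MMU_NOTIFY_WRITE_PROTECT mentioned above) would
let it keep the mapping read-only instead of zapping it. Every toy_* name,
the TOY_NOTIFY_* values, and the counters are invented for illustration and do
not correspond to real KVM or mmu_notifier symbols.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy event types. Only the "clear" case mirrors what the code above does
 * today; the write-protect event is hypothetical. */
enum toy_event { TOY_NOTIFY_CLEAR, TOY_NOTIFY_WRITE_PROTECT };

/* Toy stand-in for a secondary-MMU PTE (e.g. a KVM SPTE). */
struct toy_spte {
	bool present;
	bool writable;
	bool dirty;
};

#define TOY_NR_PAGES 8

struct toy_mmu {
	struct toy_spte spte[TOY_NR_PAGES];
	bool page_dirty[TOY_NR_PAGES];	/* stand-in for core-mm dirty state */
	unsigned int zaps;		/* mappings torn down (refault needed) */
	unsigned int wrprotects;	/* mappings downgraded to read-only */
};

/* Install present, writable, clean mappings for pages [start, end). */
void toy_map_writable(struct toy_mmu *mmu, size_t start, size_t end)
{
	assert(start <= end && end <= TOY_NR_PAGES);
	for (size_t i = start; i < end; i++) {
		mmu->spte[i].present = true;
		mmu->spte[i].writable = true;
		mmu->spte[i].dirty = false;
	}
}

/* Model of the invalidation callback for pages [start, end). */
void toy_invalidate_range(struct toy_mmu *mmu, enum toy_event event,
			  size_t start, size_t end)
{
	assert(start <= end && end <= TOY_NR_PAGES);
	for (size_t i = start; i < end; i++) {
		struct toy_spte *s = &mmu->spte[i];

		if (!s->present)
			continue;

		/* Hand dirty state to the core before the primary MMU
		 * consumes or clears its own dirty bit. */
		if (s->dirty) {
			mmu->page_dirty[i] = true;
			s->dirty = false;
		}

		if (event == TOY_NOTIFY_WRITE_PROTECT) {
			/* Keep the mapping, but force a fault on the next write. */
			if (s->writable) {
				s->writable = false;
				mmu->wrprotects++;
			}
		} else {
			/* Blast the mapping away; any access must refault. */
			s->present = false;
			s->writable = false;
			mmu->zaps++;
		}
	}
}
```

In this model a write-protect event leaves reads working and only forces a
fault on the next write, whereas the clear event forces a full refault for any
access. Whether real secondary MMUs could rely on such an event is exactly the
open question in this thread.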