Date: Wed, 23 Nov 2022 09:11:30 -0500
From: Peter Xu
To: Muhammad Usama Anjum
Cc: Michał Mirosław, Andrei Vagin, Danylo Mocherniuk, Alexander Viro, Andrew Morton, Suren Baghdasaryan, Greg KH, Christian Brauner, Yang Shi, Vlastimil Babka, Zach O'Keefe, "Matthew Wilcox (Oracle)", "Gustavo A. R. Silva", Dan Williams, kernel@collabora.com, Gabriel Krisman Bertazi, David Hildenbrand, Peter Enderborg, "open list:KERNEL SELFTEST FRAMEWORK", Shuah Khan, open list, "open list:PROC FILESYSTEM", "open list:MEMORY MANAGEMENT", Paul Gofman, Andrea Arcangeli
Subject: Re: [PATCH v6 0/3] Implement IOCTL to get and/or the clear info about PTEs
In-Reply-To: <20221109102303.851281-1-usama.anjum@collabora.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, Nov 09, 2022 at 03:23:00PM +0500, Muhammad Usama Anjum wrote:
> Soft-dirty PTE bit of the memory pages can be read by using the pagemap
> procfs file. The soft-dirty PTE bit for the whole memory range of the
> process can be cleared by writing to the clear_refs file. There are other
> methods to mimic this information entirely in userspace with poor
> performance:
> - The mprotect syscall and SIGSEGV handler for bookkeeping
> - The userfaultfd syscall with the handler for bookkeeping

Userfaultfd is definitely slow in this case because it needs a synchronous messaging round trip between two different threads, so it costs at least more scheduling effort than even mprotect.

I saw the other patch on vma merging with SOFTDIRTY. I didn't look deeper there, but IIUC it won't really help much: if the other commit (34228d47) can't be reverted, it doesn't seem to help at all.
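
For reference, the pagemap/clear_refs interface described in the quoted cover letter can be exercised from userspace roughly as follows. This is a minimal Python sketch, not part of the patch set: the bit positions (55 soft-dirty, 57 uffd-wp, 62 swapped, 63 present) follow the kernel's pagemap documentation, the helper names are hypothetical, and reading live entries assumes a Linux kernel built with CONFIG_MEM_SOFT_DIRTY.

```python
import os
import struct

# Bit layout of a /proc/PID/pagemap entry (Documentation/admin-guide/mm/pagemap.rst).
PM_SOFT_DIRTY = 1 << 55
PM_UFFD_WP = 1 << 57
PM_SWAPPED = 1 << 62
PM_PRESENT = 1 << 63
ENTRY_SIZE = 8  # one native-endian u64 per virtual page


def decode_pagemap_entry(entry: int) -> dict:
    """Hypothetical helper: pick out the flag bits relevant to dirty tracking."""
    return {
        "present": bool(entry & PM_PRESENT),
        "swapped": bool(entry & PM_SWAPPED),
        "soft_dirty": bool(entry & PM_SOFT_DIRTY),
        "uffd_wp": bool(entry & PM_UFFD_WP),
    }


def read_pagemap_entry(addr: int, pid: str = "self") -> int:
    """Read the raw 64-bit pagemap entry for the page containing addr."""
    page_size = os.sysconf("SC_PAGE_SIZE")
    with open(f"/proc/{pid}/pagemap", "rb") as f:
        f.seek((addr // page_size) * ENTRY_SIZE)
        (entry,) = struct.unpack("=Q", f.read(ENTRY_SIZE))
    return entry


def clear_soft_dirty(pid: str = "self") -> None:
    """Writing "4" to clear_refs clears the soft-dirty bits of the whole process."""
    with open(f"/proc/{pid}/clear_refs", "w") as f:
        f.write("4")
```

A tracker would call `clear_soft_dirty()`, let the workload run, and then check `decode_pagemap_entry(read_pagemap_entry(addr))["soft_dirty"]` for each page of interest. Note that the clear step is process-wide, which is the coarseness the quoted text points out.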

Reverting that commit does look risky, though, because the same commit mentions the case where one can clear refs right before a vma merge. So it's definitely worth more thought and testing, and I agree with you there.

I'm wondering whether the vma issue can be avoided entirely, for example by providing an async version of uffd-wp. Currently uffd-wp must be synchronous, which makes it slow, but it serves specific purposes.

This is definitely not the first time any of us has thought about making uffd-wp async; it's just that we need to solve the problem of where to store the dirty information. We could also use some other form of storage, but so far I haven't thought of anything that's easy and clean. The current soft-dirty bit has its defects too (e.g. the need to take the mmap lock and walk the pgtables), but that part would stay the same as soft-dirty for now.

Now I'm wildly wondering whether we can just reuse the soft-dirty bit already defined in the ptes. The GET interface could be similar to what's proposed here, or at least that's a separate issue.

So _maybe_ we can have a uffd feature (bound to the uffd context) that enables async uffd-wp. In that mode the wr-protect fault no longer sends (or enqueues) any message; instead it sets the soft-dirty bit, immediately resolves the write bit, and continues the fault.

Clearing the soft-dirty bit would then need to be done in UFFDIO_WRITEPROTECT, alongside clearing the uffd-wp bit. So for that part, the current GET+CLEAR interface for pagemap may need to be replaced. And frankly, it feels weird to me to allow pagemap ioctls to change the mm layout. With this approach we can keep the pagemap interface fetch-only, like before.

A major benefit of using uffd is that uffd is pte-based by nature, so there's no need to fiddle with vmas at all. First, there's no need to worry about merging vmas with tons of false positives. Second, one can easily wr-protect at page-size granularity.
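
To make the async uffd-wp idea concrete, here is a toy Python model. It is purely illustrative: `Pte`, `UffdWpModel` and its methods are hypothetical names, and the model only captures the state transitions described in the reply (a write fault on a wr-protected pte either queues a message in sync mode, or sets soft-dirty and restores write access in async mode; `writeprotect()`, standing in for UFFDIO_WRITEPROTECT, re-arms wp and clears soft-dirty together). It ignores real kernel concerns such as locking and TLB flushes.

```python
class Pte:
    """Per-page state in the model; no vma flags are involved at all."""

    def __init__(self) -> None:
        self.writable = True
        self.uffd_wp = False
        self.soft_dirty = False


class UffdWpModel:
    """Hypothetical model of sync vs. async uffd-wp over a range of pages."""

    def __init__(self, npages: int, async_wp: bool) -> None:
        self.ptes = [Pte() for _ in range(npages)]
        self.async_wp = async_wp
        self.messages = []  # fault messages a sync-mode monitor thread would read

    def writeprotect(self, start: int, end: int) -> None:
        """Stand-in for UFFDIO_WRITEPROTECT: arm wp and clear soft-dirty together."""
        for pte in self.ptes[start:end]:
            pte.uffd_wp = True
            pte.writable = False
            pte.soft_dirty = False

    def write(self, page: int) -> bool:
        """Simulate a write; return True if it completed without waiting on the monitor."""
        pte = self.ptes[page]
        if pte.writable:
            return True
        if not self.async_wp:
            # Sync mode: enqueue a message; the faulting thread blocks until resolved.
            self.messages.append(page)
            return False
        # Async mode: no message, just record dirtiness and let the write proceed.
        pte.soft_dirty = True
        pte.uffd_wp = False
        pte.writable = True
        return True

    def dirty_pages(self) -> list:
        """The GET side: pages written since the last writeprotect()."""
        return [i for i, pte in enumerate(self.ptes) if pte.soft_dirty]
```

Because the state lives per pte, `writeprotect(3, 4)` arms exactly one page, and there is no vma merge that could mark untouched pages dirty.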

Also, wr-protection is no longer governed by a vma flag but by the uffd-wp flag, so there's no extra overhead on any page the monitor is not interested in.

There's already infrastructure code for persisting the uffd-wp bit, so it would naturally work the same way for an async mode, if one ever comes into being. It's just that we'd also need to consider exclusive use of the bit: we'd need to fail clear_refs on vmas that have VM_UFFD_WP with the async feature enabled. I'd hope that's very rare, but its side effects are worth thinking about. The same would need to apply to UFFDIO_REGISTER in async wp mode when soft-dirty is enabled: we'd need to bail out there too.

That said, this is not a proposal for a new design, just something that came to mind while reading this, which I quickly wrote down.

Thanks,

-- 
Peter Xu
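
The exclusivity rule from the reply can likewise be sketched as two checks. Again this is a hypothetical Python model, not kernel code: `Vma`, its boolean fields and both function names are invented for illustration, treating "soft-dirty enabled" as a per-vma flag is an assumption, failing the whole clear_refs write (rather than skipping only the affected vmas) is a choice the reply leaves open, and `-EBUSY` is an arbitrary errno since the reply only says these calls should fail.

```python
import errno
from dataclasses import dataclass


@dataclass
class Vma:
    """Hypothetical vma stand-in holding only the flags this rule cares about."""
    uffd_wp: bool = False            # registered with VM_UFFD_WP
    uffd_wp_async: bool = False      # proposed async feature enabled
    soft_dirty_in_use: bool = False  # assumption: soft-dirty use recorded per vma


def clear_refs_soft_dirty(vmas: list) -> int:
    """Model of writing 4 to clear_refs: fail if async uffd-wp owns the bit anywhere."""
    if any(v.uffd_wp and v.uffd_wp_async for v in vmas):
        return -errno.EBUSY  # errno value is an arbitrary choice for the sketch
    for v in vmas:
        v.soft_dirty_in_use = True
    return 0


def uffdio_register_async_wp(vma: Vma) -> int:
    """Model of UFFDIO_REGISTER in async wp mode: bail out if soft-dirty is in use."""
    if vma.soft_dirty_in_use:
        return -errno.EBUSY
    vma.uffd_wp = True
    vma.uffd_wp_async = True
    return 0
```

Whichever user claims the bit first wins, and the other path is rejected, so the bit never has two owners.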