Date: Wed, 24 May 2023 09:55:59 -0400
From: Peter Xu
To: Muhammad Usama Anjum
Cc: linux-mm@kvack.org, Paul Gofman, Alexander Viro, Shuah Khan,
    Christian Brauner, Yang Shi, Vlastimil Babka, "Liam R . Howlett",
    Yun Zhou, Cyrill Gorcunov, Michał Mirosław, Andrew Morton,
    Suren Baghdasaryan, Andrei Vagin, Alex Sierra, Matthew Wilcox,
    Pasha Tatashin, Danylo Mocherniuk, Axel Rasmussen,
    "Gustavo A . R . Silva", David Hildenbrand, Dan Williams,
    linux-kernel@vger.kernel.org, Mike Rapoport,
    linux-fsdevel@vger.kernel.org, linux-kselftest@vger.kernel.org,
    Greg KH, kernel@collabora.com, Nadav Amit
Subject: Re: [PATCH RESEND v15 2/5] fs/proc/task_mmu: Implement IOCTL to get and optionally clear info about PTEs
References: <20230420060156.895881-1-usama.anjum@collabora.com>
    <20230420060156.895881-3-usama.anjum@collabora.com>
    <0edfaf12-66f2-86d3-df1c-f5dff10fb743@collabora.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, May 24, 2023 at 04:26:33PM +0500, Muhammad Usama Anjum wrote:
> On 5/24/23 12:43 AM, Peter Xu wrote:
> > Hi, Muhammad,
> >
> > On Mon, May 22, 2023 at 04:26:07PM +0500, Muhammad Usama Anjum wrote:
> >> On 5/22/23 3:24 PM, Muhammad Usama Anjum wrote:
> >>> On 4/26/23 7:13 PM, Peter Xu wrote:
> >>>> Hi, Muhammad,
> >>>>
> >>>> On Wed, Apr 26, 2023 at 12:06:23PM +0500, Muhammad Usama Anjum wrote:
> >>>>> On 4/20/23 11:01 AM, Muhammad Usama Anjum wrote:
> >>>>>> +/* Supported flags */
> >>>>>> +#define PM_SCAN_OP_GET (1 << 0)
> >>>>>> +#define PM_SCAN_OP_WP (1 << 1)
> >>>>> We have only these flag options available in PAGEMAP_SCAN IOCTL.
> >>>>> PM_SCAN_OP_GET must always be specified for this IOCTL.
> >>>>> PM_SCAN_OP_WP can be specified as need. But PM_SCAN_OP_WP cannot be
> >>>>> specified without PM_SCAN_OP_GET. (This was removed after you had asked
> >>>>> me to not duplicate functionality which can be achieved by
> >>>>> UFFDIO_WRITEPROTECT.)
> >>>>>
> >>>>> 1) PM_SCAN_OP_GET | PM_SCAN_OP_WP
> >>>>> vs
> >>>>> 2) UFFDIO_WRITEPROTECT
> >>>>>
> >>>>> After removing the usage of uffd_wp_range() from PAGEMAP_SCAN IOCTL, we are
> >>>>> getting really good performance which is comparable just like we are
> >>>>> depending on SOFT_DIRTY flags in the PTE. But when we want to perform wp,
> >>>>> PM_SCAN_OP_GET | PM_SCAN_OP_WP is more desirable than UFFDIO_WRITEPROTECT
> >>>>> performance and behavior wise.
> >>>>>
> >>>>> I've got the results from someone else that UFFDIO_WRITEPROTECT block
> >>>>> pagefaults somehow which PAGEMAP_IOCTL doesn't. I still need to verify this
> >>>>> as I don't have tests comparing them one-to-one.
> >>>>>
> >>>>> What are your thoughts about it? Have you thought about making
> >>>>> UFFDIO_WRITEPROTECT perform better?
> >>>>>
> >>>>> I'm sorry to mention the word "performance" here. Actually we want better
> >>>>> performance to emulate Windows syscall. That is why we are adding this
> >>>>> functionality. So either we need to see what can be improved in
> >>>>> UFFDIO_WRITEPROTECT or can I please add only PM_SCAN_OP_WP back in
> >>>>> pagemap_ioctl?
> >>>>
> >>>> I'm fine if you want to add it back if it works for you. Though before
> >>>> that, could you remind me why there can be a difference on performance?
> >>> I've looked at the code again and I think I've found something. Lets look
> >>> at exact performance numbers:
> >>>
> >>> I've run 2 different tests. In first test UFFDIO_WRITEPROTECT is being used
> >>> for engaging WP. In second test PM_SCAN_OP_WP is being used.
> >>> I've measured the average write time to the same memory which is being
> >>> WP-ed and total time of execution of these APIs:
> >
> > What is the steps of the test? Is it as simple as "writeprotect",
> > "unprotect", then write all pages in a single thread?
> >
> > Is UFFDIO_WRITEPROTECT sent in one range covering all pages?
> >
> > Maybe you can attach the test program here too.
>
> I'd not attached the test earlier as I thought that you wouldn't be
> interested in running the test. I've attached it now. The test has multiple

Thanks. I have no plan to run it; I just want to make sure I understand why
there is such a difference.

> threads where one thread tries to get status of flags and reset them, while
> other threads write to that memory. In main(), we call the pagemap_scan
> ioctl to get status of flags and reset the memory area as well. While in N
> threads, the memory is written.
>
> I usually run the test by following where memory area is of 100000 * pages:
> ./win2_linux 8 100000 1 1 0
>
> I'm running tests on real hardware. The results are pretty consistent. I'm
> also testing only on x86_64. PM_SCAN_OP_WP wins every time as compared to
> UFFDIO_WRITEPROTECT.

If it's a multi-threaded test, especially one where the ioctl runs together
with the writers, then I'd assume the slowdown is caused by the writers
frequently needing to flush the tlb (when they write during
UFFDIO_WRITEPROTECT). The flush target could potentially also include the
core running the main thread, which is also trying to reprotect, because
they all run on the same mm.

This makes me think that your current test case is probably the worst case
for Nadav's patch 6ce64428d6, because (1) the UFFDIO_WRITEPROTECT covers a
super large range, and (2) there are a _lot_ of concurrent writers during
the ioctl, so all of them will need to trigger a tlb flush, and that tlb
flush will further slow down the ioctl sender.
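
To make point (1) concrete, a single ioctl(UFFDIO_WRITEPROTECT) covering the
whole area could be issued roughly as below. This is only a sketch, not the
attached test program: wp_request() and wp_range() are hypothetical helper
names, and it assumes a userfaultfd that has already been registered on the
range with UFFDIO_REGISTER_MODE_WP.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/userfaultfd.h>

/*
 * Hypothetical helper: describe ONE request that write-protects
 * (protect != 0) or unprotects (protect == 0) npages pages at addr.
 */
struct uffdio_writeprotect wp_request(void *addr, size_t npages,
				      size_t page_size, int protect)
{
	struct uffdio_writeprotect wp;

	assert(page_size > 0);
	memset(&wp, 0, sizeof(wp));
	wp.range.start = (uintptr_t)addr;
	wp.range.len = (uint64_t)npages * page_size;
	/* Mode 0 drops the protection; MODE_WP installs it. */
	wp.mode = protect ? UFFDIO_WRITEPROTECT_MODE_WP : 0;
	return wp;
}

/*
 * Hypothetical helper: send the request on uffd, which is assumed to be
 * registered on this range with UFFDIO_REGISTER_MODE_WP.
 * Returns 0 on success, -1 with errno set on failure.
 */
int wp_range(int uffd, void *addr, size_t npages, size_t page_size,
	     int protect)
{
	struct uffdio_writeprotect wp =
		wp_request(addr, npages, page_size, protect);

	return ioctl(uffd, UFFDIO_WRITEPROTECT, &wp);
}
```

With the parameters quoted above, that would be one wp_range() call with
npages = 100000 while the writer threads keep touching the same range.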

I still think that having minimal tlb flushing in
ioctl(UFFDIO_WRITEPROTECT) is sometimes the optimal case, so maybe it makes
sense elsewhere, where there are not that many concurrent writers. I'll need
to rethink all of this a bit to find out whether we can have a good way for
both.

For now, if your workload looks mostly like your test case, maybe you can
keep your pagemap version of the WP-only op, making sure the tlb flush
happens within the pgtable lock critical section (so you should be safe even
without Nadav's patch).

If so, I'd appreciate it if you could add a comment somewhere about the
difference between the pagemap WP-only op and ioctl(UFFDIO_WRITEPROTECT),
though. In short, functionally they should be the same, with a minor
difference in performance details that is still TBD (maybe one day we can
have a good approach for all cases and make them aligned again, but that
doesn't need to block your work).

--
Peter Xu