Message-ID: <9e73ad2f-198c-4ab5-a462-2e238edd9b34@arm.com>
Date: Tue, 23 Apr 2024 09:49:05 +0100
X-Mailing-List: linux-kernel@vger.kernel.org
Subject: Re: [PATCH v1 0/5] arm64/mm: uffd write-protect and soft-dirty tracking
To: David Hildenbrand, Mike Rapoport
Cc: Shivansh Vij, Catalin Marinas, Will Deacon, Andrew Morton, Shuah Khan, Joey Gouly, Ard Biesheuvel, Mark Rutland, Anshuman Khandual, "linux-kernel@vger.kernel.org", "linux-arm-kernel@lists.infradead.org", "linux-kselftest@vger.kernel.org"
References:
<20240419074344.2643212-1-ryan.roberts@arm.com> <24999e38-e4f7-4616-8eae-dfdeba327558@arm.com>
From: Ryan Roberts
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

On 19/04/2024 18:12, David Hildenbrand wrote:
> On 19.04.24 18:30, Mike Rapoport wrote:
>> On Fri, Apr 19, 2024 at 11:45:14AM +0200, David Hildenbrand wrote:
>>> On 19.04.24 10:33, Shivansh Vij wrote:
>>>>> On 19/04/2024 08:43, Ryan Roberts wrote:
>>>>>> Hi All,
>>>>>>
>>>>>> This series adds uffd write-protect and soft-dirty tracking support for
>>>>>> arm64. I consider the soft-dirty support (patches 3 and 4) as RFC - see
>>>>>> rationale below.
>>>>>>
>>>>>> That said, these are the last 2 SW bits and we may want to keep 1 bit in
>>>>>> reserve for future use. soft-dirty is only used for CRIU to my knowledge,
>>>>>> and it is thought that their use case could be solved with the more
>>>>>> generic uffd-wp. So unless somebody makes a clear case for the inclusion
>>>>>> of soft-dirty support, we are probably better off dropping patches 3 and
>>>>>> 4 and keeping bit 63 for future use. Although note that the most recent
>>>>>> attempt to add soft-dirty for arm64 was last month [1] so I'd like to
>>>>>> give Shivansh Vij the opportunity to make the case.
>>>>
>>>> Appreciate the opportunity to provide input here.
>>>>
>>>> I picked option one (dirty tracking in arm) because it seems to be the
>>>> simplest way to move forward, whereas it would be a relatively heavy
>>>> effort to add uffd-wp support to CRIU.
>>>>
>>>> From a performance perspective I am also a little worried that uffd
>>>> will be slower than just tracking the dirty bits asynchronously with
>>>> sw dirty, but maybe that's not as much of a concern with the addition
>>>> of uffd-wp async.
>>>>
>>>> With all this being said, I'll defer to the wisdom of the crowd about
>>>> which approach makes more sense - after all, with this patch we should
>>>> get uffd-wp support on arm so at least there will be _a_ way forward
>>>> for CRIU (albeit one requiring slightly more work).
>>>
>>> Ccing Mike and Peter. In 2017, Mike gave a presentation "Memory tracking for
>>> iterative container migration"[1] at LPC
>>>
>>> Some key points are still true I think:
>>> (1) More flexible and robust than soft-dirty
>>> (2) May obsolete soft-dirty
>>>
>>> We further recently added a new UFFD_FEATURE_WP_ASYNC feature as part of
>>> [2], because getting soft-dirty return reliable results in some cases turned
>>> out rather hard to fix.

But it sounds like the current soft-dirty semantics are sufficient for CRIU on
other arches? If I understood correctly from my brief scan of the linked post,
the problem is that soft-dirty can sometimes report false positives? So it
could result in unnecessary copying, but never in lost data?

>>>
>>> We might still have to optimize that approach for some very sparse large
>>> VMAs, but that should be solvable.
>>>
>>>   "The major defect of this approach of dirty tracking is we need to
>>>   populate the pgtables when tracking starts. Soft-dirty doesn't do it
>>>   like that. It's unwanted in the case where the range of memory to track
>>>   is huge and unpopulated (e.g., tracking updates on a 10G file with
>>>   mmap() on top, without having any page cache installed yet). One way to
>>>   improve this is to allow pte markers exist for larger than PTE level
>>>   for PMD+. That will not change the interface if to implemented, so we
>>>   can leave that for later.")[3]
>>>
>>>
>>> If we can avoid adding soft-dirty on arm64 that would be great. This will
>>> require work on the CRIU side. One downside of uffd-wp is that it is
>>> currently not as available on architectures as soft-dirty.
>>
>> Using uffd-wp instead of soft-dirty in CRIU will require quite some work on
>> CRIU side and probably on the kernel side too.
>>
>> And as of now we'll anyway have to maintain soft-dirty because powerpc and
>> s390 don't have uffd-wp.
>>
>> With UFFD_FEATURE_WP_ASYNC the concern that uffd-wp will be slower than
>> soft-dirty probably doesn't exist, but we won't know for sure until
>> somebody will try.
>>
>> But there were other limitations, the most prominent was checkpointing an
>> application that uses uffd. If CRIU is to use uffd-wp for tracking of the
>> dirty pages, there should be some support for multiple uffd contexts for a
>> VMA and that's surely a lot of work.
>
> Is it even already supported to checkpoint an application that is using uffd?
> Hard to believe, what if the monitor is running in a completely different
> process than the one being checkpointed?

Shivansh, do you speak for CRIU? Are you able to comment on whether CRIU
supports checkpointing an app that uses uffd?

>
> Further ... isn't CRIU already using uffd in some cases? ...documentation
> mentions [1] that it is used for "lazy (or post-copy) restore in CRIU". At least
> if the documentation is correct and it's actually implemented.
>
> [1] https://criu.org/Userfaultfd

Shivansh, same question - do you know the current CRIU status/plans for using
uffd-wp instead of soft-dirty? If CRIU doesn't currently implement it and has
no current plans to, how can we gauge interest in making a plan?

>
>>
>>> But I'll throw in another idea: do we really need soft-dirty and uffd-wp to
>>> exist at the same time in the same process (or the VMA?). In theory, we

My instinct is that MUXing a PTE bit like this will lead to some subtle
problems that won't appear on arches that support either one or both of the
features independently and unconditionally. Surely better to limit ourselves
to either "arm64 will only support uffd-wp" or "arm64 will support both
uffd-wp and soft-dirty".
That way, we could move ahead with reviewing/merging the uffd-wp support
independently of deciding whether we want to support soft-dirty.

>>
>> For instance to have dirty memory tracking in CRIU for an application that
>> uses uffd-wp :)
>>
>
> Hah! Not a concern for applications on architectures where uffd-wp does not
> exist yet! Well, initially, until these applications exist and make use of
> it :P
>
> Also, I'm not sure if CRIU can checkpoint each and every application ... I
> suspect one has to draw a line what can be supported and what not.
>
> Case in point: how should CRIU checkpoint an application that is using
> softdirty tracking itself? If I'm not missing something important, that
> might not work ....
>
> If the answer is "no other application is using soft-dirty tracking", then
> it's really a shame we have to carry this baggage (+waste precious PTE bits)
> only for one application ...
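
For readers less familiar with the two mechanisms being compared, here is a
minimal sketch (not part of the series) of how a userspace tracker could tell
soft-dirty and uffd-wp state apart when reading /proc/<pid>/pagemap. It assumes
the bit layout given in the kernel's pagemap documentation (bit 55 soft-dirty,
bit 57 uffd-wp, bit 62 swapped, bit 63 present); the function names are
hypothetical and this is not how CRIU is known to implement it.

```python
import struct

# Bit positions in a 64-bit pagemap entry, as documented in
# Documentation/admin-guide/mm/pagemap.rst (assumed unchanged here).
PM_SOFT_DIRTY = 1 << 55  # PTE is soft-dirty
PM_UFFD_WP = 1 << 57     # PTE is uffd-wp write-protected
PM_SWAPPED = 1 << 62     # page is swapped out
PM_PRESENT = 1 << 63     # page is present in RAM


def parse_pagemap(raw: bytes) -> list:
    """Split raw pagemap bytes into native-endian 64-bit entries."""
    return [value for (value,) in struct.iter_unpack("=Q", raw)]


def decode_pagemap_entry(entry: int) -> dict:
    """Extract the flags relevant to dirty tracking from one entry."""
    return {
        "present": bool(entry & PM_PRESENT),
        "swapped": bool(entry & PM_SWAPPED),
        "soft_dirty": bool(entry & PM_SOFT_DIRTY),
        "uffd_wp": bool(entry & PM_UFFD_WP),
    }


def dirty_pages(entries) -> list:
    """Indices of pages a soft-dirty based tracker would copy again.

    A spurious soft-dirty bit only adds an index to this list, which
    costs an extra copy but cannot lose data.
    """
    return [i for i, entry in enumerate(entries) if entry & PM_SOFT_DIRTY]
```

On a live system the entries would come from reading /proc/<pid>/pagemap after
writing "4" to /proc/<pid>/clear_refs to reset the soft-dirty bits; the helpers
above only decode values and do not touch /proc themselves.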