Date: Wed, 23 Dec 2020 18:55:11 -0500
From: Andrea Arcangeli
To: Nadav Amit
Cc: Yu Zhao, Peter Zijlstra, Minchan Kim, Linus Torvalds, Peter Xu,
    linux-mm, lkml, Pavel Emelyanov, Mike Kravetz, Mike Rapoport, stable,
    Andy Lutomirski, Will Deacon
Subject: Re: [PATCH] mm/userfaultfd: fix memory corruption due to writeprotect

On Wed, Dec 23, 2020 at 02:45:59PM -0800, Nadav Amit wrote:
> I think it may be reasonable.

Whatever solution is used, there will be 2 users of it: uffd-wp will use
whatever technique clear_refs_write uses to avoid the mmap_write_lock.

My favorite is now Yu's patch, not the group lock anymore. The con is
that it changes the VM rules (which kind of reminds me of my initial
proposal of adding a spurious TLB flush if mm_tlb_flush_pending is set,
except I didn't correctly specify that it'd need to go in the page
fault), but it still appears to be the simplest.

> Just a proposal: At some point we can also ask ourselves whether the
> "artificial" limitation of the number of software bits per PTE should really
> limit us, or do we want to hold some additional metadata per-PTE by either
> putting it in an adjacent page (holding 64-bits of additional software-bits
> per PTE) or by finding some place in the page-struct to link to this
> metadata (and have the liberty of number of bits per PTE). One of the PTE
> software-bits can be repurposed to say whether there is "extra-metadata"
> associated with the PTE.
>
> I am fully aware that there will be some overhead associated, but it
> can be limited to less-common use-cases.

That's a good point. So far we didn't run out, so it's not an immediate
concern (as opposed to page->flags, where we did run out and PG_tail
went to some LSB).

In general, kicking the can down the road sounds like the best thing to
do for those bit-shortage matters, at least until we can't anymore.

There's no gain to the kernel runtime in doing something generically
good here (again, see where PG_tail rightfully went). So before spending
RAM and CPU, we'd need to find a more compact encoding with the bits we
already have available.

This reminds me again that we could double check whether VM_UFFD_WP can
be made mutually exclusive with VM_SOFTDIRTY. I wasn't sure whether
using both at the same time could ever happen in a legit way (CRIU on an
app using uffd-wp for its own internal mm management?).

Theoretically it's already too late for that, but VM_UFFD_WP is
relatively new: if we're sure it cannot ever happen in a legit way, it
would be possible to at least evaluate/discuss it. This is an immediate
matter.

What we'll do if we later run out is not an immediate matter, because
resolving it now won't make our life any simpler.

Thanks,
Andrea
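
A toy userspace sketch of one possible reading of the "more compact encoding" idea above: if VM_UFFD_WP and VM_SOFTDIRTY could never be set on the same VMA, a single PTE software bit might in principle serve both, with the VMA flag deciding how to interpret it. Everything below is hypothetical. The DEMO_* names, the bit positions and the helper functions are made up for illustration and are not the kernel's definitions or anything proposed in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins; NOT the kernel's real flag or bit values. */
#define DEMO_VM_SOFTDIRTY  (1u << 0)
#define DEMO_VM_UFFD_WP    (1u << 1)
#define DEMO_PTE_SW_SHARED (1ull << 58)	/* one software bit, two meanings */

/* Reject a VMA flag set that asks for both trackers at once. */
bool demo_vma_flags_valid(uint32_t vm_flags)
{
	uint32_t both = DEMO_VM_SOFTDIRTY | DEMO_VM_UFFD_WP;

	return (vm_flags & both) != both;
}

/* The shared bit means "soft dirty" only on a soft-dirty VMA. */
bool demo_pte_soft_dirty(uint32_t vm_flags, uint64_t pte)
{
	return (vm_flags & DEMO_VM_SOFTDIRTY) && (pte & DEMO_PTE_SW_SHARED);
}

/* The same bit means "uffd write-protected" only on a uffd-wp VMA. */
bool demo_pte_uffd_wp(uint32_t vm_flags, uint64_t pte)
{
	return (vm_flags & DEMO_VM_UFFD_WP) && (pte & DEMO_PTE_SW_SHARED);
}
```

The sketch only works if the validity check holds for every VMA, which is exactly the open question: whether any legit user (e.g. CRIU on an app that uses uffd-wp itself) needs both at the same time.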