Date: Wed, 23 Dec 2020 16:39:00 -0500
From: Andrea Arcangeli
To: Yu Zhao
Cc: Andy Lutomirski, Linus Torvalds, Peter Xu, Nadav Amit, linux-mm, lkml, Pavel Emelyanov, Mike Kravetz, Mike Rapoport, stable, Minchan Kim, Will Deacon, Peter Zijlstra
Subject: Re: [PATCH] mm/userfaultfd: fix memory corruption due to writeprotect
References: <1FCC8F93-FF29-44D3-A73A-DF943D056680@gmail.com> <20201221223041.GL6640@xz-x1>
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, Dec 22, 2020 at 08:36:04PM -0700, Yu Zhao wrote:
> Thanks for the details.

I hope we can find a way to put the page_mapcount back where there's a page_count right now.

If you're so worried about having to maintain a well defined, well documented (or to be documented even better if you ACK it) marker/catcher for userfaultfd_writeprotect, I can't see how you could consider maintaining the page fault safe against any random code leaving too permissive TLB entries out of sync with the more restrictive pte permissions, as was happening with clear_refs_write, which worked by luck until page_mapcount was changed to page_count.

page_count is far from optimal, but it is a feature in that it finally allowed us to notice that various code (clear_refs_write included, apparently even after the fix) leaves stale, too permissive TLB entries when it shouldn't.

The question is only which way you prefer to fix clear_refs_write, and I don't think we can deviate from those 3 methods that already exist today.
So clear_refs_write will have to pick one of those, and currently it's not falling in the same category as mprotect even after the fix.

I think only if clear_refs_write starts to take the mmap_write_lock and really starts to operate like mprotect can we consider making userfaultfd_writeprotect also operate like mprotect.

Even then I'd hope we can at least be allowed to make it operate like KSM write_protect_page for len <= HPAGE_PMD_SIZE, something that clear_refs_write cannot do since it works in O(N) and tends to scan everything at once, so there would be no point in optimizing it to not defer the flush for a process with a tiny amount of virtual memory mapped.

vm86 also should be fixed to fall in the same category as mprotect, since performance there is irrelevant.

Thanks,
Andrea