From: Peter Xu
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: David Hildenbrand, Hugh Dickins, Maya Gokhale, Jerome Glisse,
    Pavel Emelyanov, Johannes Weiner, peterx@redhat.com, Martin Cracauer,
    Denis Plotnikov, Shaohua Li, Andrea Arcangeli, Mike Kravetz,
    Marty McFadden, Mike Rapoport, Mel Gorman, "Kirill A. Shutemov",
    "Dr. David Alan Gilbert"
David Alan Gilbert" Subject: [PATCH v5 07/25] userfaultfd: wp: hook userfault handler to write protection fault Date: Thu, 20 Jun 2019 10:19:50 +0800 Message-Id: <20190620022008.19172-8-peterx@redhat.com> In-Reply-To: <20190620022008.19172-1-peterx@redhat.com> References: <20190620022008.19172-1-peterx@redhat.com> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Scanned-By: MIMEDefang 2.84 on 10.5.11.22 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.47]); Thu, 20 Jun 2019 02:21:53 +0000 (UTC) Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org From: Andrea Arcangeli There are several cases write protection fault happens. It could be a write to zero page, swaped page or userfault write protected page. When the fault happens, there is no way to know if userfault write protect the page before. Here we just blindly issue a userfault notification for vma with VM_UFFD_WP regardless if app write protects it yet. Application should be ready to handle such wp fault. v1: From: Shaohua Li v2: Handle the userfault in the common do_wp_page. If we get there a pagetable is present and readonly so no need to do further processing until we solve the userfault. In the swapin case, always swapin as readonly. This will cause false positive userfaults. We need to decide later if to eliminate them with a flag like soft-dirty in the swap entry (see _PAGE_SWP_SOFT_DIRTY). hugetlbfs wouldn't need to worry about swapouts but and tmpfs would be handled by a swap entry bit like anonymous memory. The main problem with no easy solution to eliminate the false positives, will be if/when userfaultfd is extended to real filesystem pagecache. When the pagecache is freed by reclaim we can't leave the radix tree pinned if the inode and in turn the radix tree is reclaimed as well. The estimation is that full accuracy and lack of false positives could be easily provided only to anonymous memory (as long as there's no fork or as long as MADV_DONTFORK is used on the userfaultfd anonymous range) tmpfs and hugetlbfs, it's most certainly worth to achieve it but in a later incremental patch. v3: Add hooking point for THP wrprotect faults. CC: Shaohua Li Signed-off-by: Andrea Arcangeli [peterx: don't conditionally drop FAULT_FLAG_WRITE in do_swap_page] Reviewed-by: Mike Rapoport Reviewed-by: Jerome Glisse Signed-off-by: Peter Xu --- mm/memory.c | 10 +++++++++- 1 file changed, 9 insertions(+), 1 deletion(-) diff --git a/mm/memory.c b/mm/memory.c index ddf20bd0c317..05bcd741855b 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -2579,6 +2579,11 @@ static vm_fault_t do_wp_page(struct vm_fault *vmf) { struct vm_area_struct *vma = vmf->vma; + if (userfaultfd_wp(vma)) { + pte_unmap_unlock(vmf->pte, vmf->ptl); + return handle_userfault(vmf, VM_UFFD_WP); + } + vmf->page = vm_normal_page(vma, vmf->address, vmf->orig_pte); if (!vmf->page) { /* @@ -3794,8 +3799,11 @@ static inline vm_fault_t create_huge_pmd(struct vm_fault *vmf) /* `inline' is required to avoid gcc 4.1.2 build error */ static inline vm_fault_t wp_huge_pmd(struct vm_fault *vmf, pmd_t orig_pmd) { - if (vma_is_anonymous(vmf->vma)) + if (vma_is_anonymous(vmf->vma)) { + if (userfaultfd_wp(vmf->vma)) + return handle_userfault(vmf, VM_UFFD_WP); return do_huge_pmd_wp_page(vmf, orig_pmd); + } if (vmf->vma->vm_ops->huge_fault) return vmf->vma->vm_ops->huge_fault(vmf, PE_SIZE_PMD); -- 2.21.0