Received: by 2002:a05:6a10:1287:0:0:0:0 with SMTP id d7csp681199pxv; Thu, 15 Jul 2021 13:19:50 -0700 (PDT) X-Google-Smtp-Source: ABdhPJyQYYKJXotQbcLNGlF3blHMHZGwD6OmX2//s3S4YLMhIIt3Y1X9WHW6DBlBAoZbUgzsIYwm X-Received: by 2002:a05:6402:1692:: with SMTP id a18mr9662288edv.344.1626380390610; Thu, 15 Jul 2021 13:19:50 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1626380390; cv=none; d=google.com; s=arc-20160816; b=g/e6JHqetWNzXGPoqcHJf0RS+UpjGcvfRJOdJWT2ADTt9LiPFIaRD1sRC/v6rrJss/ H4Cfacw03q9fQ1gMSyp+IpLpHYj8Y4MvsmwQ+3jPZVkiyGBPo3xGuHOG0OxLQXhW2bm3 gJFXRLMgR42eMmnEH28AIISwCFiCYKIOifu4jg+dHgPFljACy6jlgeOORfnO7B/2pfyX QtMzx1oCdMCoByOnUoB6tbsqpLV6pQwp3Y8cZzxVvGsXsvv/pfLs1H+yI/QZ0nxVezQN YtUCC+88cPzQSg3hJdKeiunQbIl8eBYH1Mew9pUF5xFg98LnsKE0r9opvS4agCVU3GoL Jhlw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=8MXCY9EtHu8l00OeGEX9KSWIeSxJZ91wjW9WQpZLBho=; b=F0Z13z/K/rXPr6P+5+lrrNxYxUwlHwkoUDsnTldkct294S4HYzIm0WPDV1vl81pqCT syQml6vAnYXC46sd08DiHpzs52XZgMcguC3eVGIvDVDhub8qSr6yl9GvD4p63OBnoY/s P4xkYHF4xlCp3vUzfvuUqNi9yrFj2SDs9teyeTZ+wv+MgI6FK/lU0AJkL760kQDTl4qH rEQfZlcRfPEsLmEo1H960aETl24Br+ABHKkjV3pHR3YCwSd4GAskVi7rcN2gOvFBOHMT 3rfj2J3ST7Ep4MHZF2nh4Kz5tERasOEUCF6cuoHjGfLzAbcNyqwVJXUKXcxuc964RwNz AEvg== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=C3swhgjX; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id mp10si5176588ejc.391.2021.07.15.13.19.26; Thu, 15 Jul 2021 13:19:50 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=C3swhgjX; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1343681AbhGOUU7 (ORCPT + 99 others); Thu, 15 Jul 2021 16:20:59 -0400 Received: from us-smtp-delivery-124.mimecast.com ([216.205.24.124]:33664 "EHLO us-smtp-delivery-124.mimecast.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S244476AbhGOUS5 (ORCPT ); Thu, 15 Jul 2021 16:18:57 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1626380163; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=8MXCY9EtHu8l00OeGEX9KSWIeSxJZ91wjW9WQpZLBho=; b=C3swhgjXGtkiR+bvTqDo8b8pWCLqSxPCi0R49QXG5yM+PZCfPi+h3GQx8e2raHEu9Grmjs rniO3U89LHlAX4G2sRz+pw54vZJVWtsSnW2B8XqMTVkdpxYFrNlegC6qniXG1aTpIKQ4o9 +AChKYofxIeLi1QfeigJg1Ns/Asjnmo= Received: from mail-qt1-f198.google.com (mail-qt1-f198.google.com [209.85.160.198]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-395-rCIPxT3VPlKTXlItnBpTvQ-1; Thu, 15 Jul 2021 16:16:02 -0400 X-MC-Unique: rCIPxT3VPlKTXlItnBpTvQ-1 Received: by mail-qt1-f198.google.com with SMTP id t6-20020ac80dc60000b029024e988e8277so4899691qti.23 for ; Thu, 15 Jul 2021 13:16:02 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=8MXCY9EtHu8l00OeGEX9KSWIeSxJZ91wjW9WQpZLBho=; b=TQesIiXtzt70cxVm6Kfw4XBeQYQ4tMy8y7iF136eJLBVrypM1mNM3b5Nb0vI7WmbiV 2lTO1UMUBkbjz0go3m47GekZfwh7VdIv8jIQrz/61/q1RCAUt1+mW77eAgF7QStnMoyN HvPg2ejUslUi9g+6Gr16NehuJrV92rcn27n9SmRJIQL9OBYPwI47h35m2p0+WHD19YUe ZrJ4PmQknwiT+8uxrJUDE66U95hWr/pSn0Fv1ikXoDyUdXIxtQprp5wNsOnDhMpGnDkM S3eoxDUKMOpLVJK86qVw8Mtlwf5kv4C7IW5PaJt6CAwd9mQLzA8vLOAPXdO99SvQ6Xzk 1SmA== X-Gm-Message-State: AOAM530kLI0fyJB6m9Es/WS2tivt4+GFly7Nn6aVGUufErmqC51wDzat 3g9xMbaHtQEAotoRg3aOtm3BJESL3P8TD+x3XmMPe0ksBX6YDsyB4GSPukll5P/mHMTaurfPxyY lHI5wd+oZ9ECdO3kDUjQkH4JLA0uyFmAJ+Qcn61HuxF5wdxNvhhU+PSO6B5LDtEAvff8gcIBi6g == X-Received: by 2002:a0c:a997:: with SMTP id a23mr6444410qvb.48.1626380161875; Thu, 15 Jul 2021 13:16:01 -0700 (PDT) X-Received: by 2002:a0c:a997:: with SMTP id a23mr6444371qvb.48.1626380161571; Thu, 15 Jul 2021 13:16:01 -0700 (PDT) Received: from localhost.localdomain (bras-base-toroon474qw-grc-65-184-144-111-238.dsl.bell.ca. [184.144.111.238]) by smtp.gmail.com with ESMTPSA id x15sm2931686qkm.66.2021.07.15.13.16.00 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 15 Jul 2021 13:16:01 -0700 (PDT) From: Peter Xu To: linux-kernel@vger.kernel.org, linux-mm@kvack.org Cc: Axel Rasmussen , Nadav Amit , Jerome Glisse , "Kirill A . Shutemov" , Jason Gunthorpe , Alistair Popple , Andrew Morton , David Hildenbrand , peterx@redhat.com, Andrea Arcangeli , Matthew Wilcox , Mike Kravetz , Tiberiu Georgescu , Hugh Dickins , Miaohe Lin , Mike Rapoport Subject: [PATCH v5 11/26] shmem/userfaultfd: Allow wr-protect none pte for file-backed mem Date: Thu, 15 Jul 2021 16:15:58 -0400 Message-Id: <20210715201558.211445-1-peterx@redhat.com> X-Mailer: git-send-email 2.31.1 In-Reply-To: <20210715201422.211004-1-peterx@redhat.com> References: <20210715201422.211004-1-peterx@redhat.com> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org File-backed memory differs from anonymous memory in that even if the pte is missing, the data could still resides either in the file or in page/swap cache. So when wr-protect a pte, we need to consider none ptes too. We do that by installing the uffd-wp special swap pte as a marker. So when there's a future write to the pte, the fault handler will go the special path to first fault-in the page as read-only, then report to userfaultfd server with the wr-protect message. On the other hand, when unprotecting a page, it's also possible that the pte got unmapped but replaced by the special uffd-wp marker. Then we'll need to be able to recover from a uffd-wp special swap pte into a none pte, so that the next access to the page will fault in correctly as usual when trigger the fault handler next time, rather than sending a uffd-wp message. Special care needs to be taken throughout the change_protection_range() process. Since now we allow user to wr-protect a none pte, we need to be able to pre-populate the page table entries if we see !anonymous && MM_CP_UFFD_WP requests, otherwise change_protection_range() will always skip when the pgtable entry does not exist. Note that this patch only covers the small pages (pte level) but not covering any of the transparent huge pages yet. But this will be a base for thps too. Signed-off-by: Peter Xu --- mm/mprotect.c | 48 ++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 48 insertions(+) diff --git a/mm/mprotect.c b/mm/mprotect.c index 4b743394afbe..8ec85b276975 100644 --- a/mm/mprotect.c +++ b/mm/mprotect.c @@ -29,6 +29,7 @@ #include #include #include +#include #include #include #include @@ -186,6 +187,32 @@ static unsigned long change_pte_range(struct vm_area_struct *vma, pmd_t *pmd, set_pte_at(vma->vm_mm, addr, pte, newpte); pages++; } + } else if (unlikely(is_swap_special_pte(oldpte))) { + if (uffd_wp_resolve && !vma_is_anonymous(vma) && + pte_swp_uffd_wp_special(oldpte)) { + /* + * This is uffd-wp special pte and we'd like to + * unprotect it. What we need to do is simply + * recover the pte into a none pte; the next + * page fault will fault in the page. + */ + pte_clear(vma->vm_mm, addr, pte); + pages++; + } + } else { + /* It must be an none page, or what else?.. */ + WARN_ON_ONCE(!pte_none(oldpte)); + if (unlikely(uffd_wp && !vma_is_anonymous(vma))) { + /* + * For file-backed mem, we need to be able to + * wr-protect even for a none pte! Because + * even if the pte is null, the page/swap cache + * could exist. + */ + set_pte_at(vma->vm_mm, addr, pte, + pte_swp_mkuffd_wp_special(vma)); + pages++; + } } } while (pte++, addr += PAGE_SIZE, addr != end); arch_leave_lazy_mmu_mode(); @@ -219,6 +246,25 @@ static inline int pmd_none_or_clear_bad_unless_trans_huge(pmd_t *pmd) return 0; } +/* + * File-backed vma allows uffd wr-protect upon none ptes, because even if pte + * is missing, page/swap cache could exist. When that happens, the wr-protect + * information will be stored in the page table entries with the marker (e.g., + * PTE_SWP_UFFD_WP_SPECIAL). Prepare for that by always populating the page + * tables to pte level, so that we'll install the markers in change_pte_range() + * where necessary. + * + * Note that we only need to do this in pmd level, because if pmd does not + * exist, it means the whole range covered by the pmd entry (of a pud) does not + * contain any valid data but all zeros. Then nothing to wr-protect. + */ +#define change_protection_prepare(vma, pmd, addr, cp_flags) \ + do { \ + if (unlikely((cp_flags & MM_CP_UFFD_WP) && pmd_none(*pmd) && \ + !vma_is_anonymous(vma))) \ + WARN_ON_ONCE(pte_alloc(vma->vm_mm, pmd)); \ + } while (0) + static inline unsigned long change_pmd_range(struct vm_area_struct *vma, pud_t *pud, unsigned long addr, unsigned long end, pgprot_t newprot, unsigned long cp_flags) @@ -237,6 +283,8 @@ static inline unsigned long change_pmd_range(struct vm_area_struct *vma, next = pmd_addr_end(addr, end); + change_protection_prepare(vma, pmd, addr, cp_flags); + /* * Automatic NUMA balancing walks the tables with mmap_lock * held for read. It's possible a parallel update to occur -- 2.31.1