Received: by 2002:a05:6a10:413:0:0:0:0 with SMTP id 19csp125870pxp; Wed, 16 Mar 2022 01:58:09 -0700 (PDT) X-Google-Smtp-Source: ABdhPJz9FDm7hhhoH8nKOTyo8GSA9swVAx1pqiEsbuAgyZFfuj0YyeCpKLsbZm9ZT12evWXcdNdU X-Received: by 2002:aa7:cb18:0:b0:413:3a7a:b5d6 with SMTP id s24-20020aa7cb18000000b004133a7ab5d6mr28232469edt.254.1647421089199; Wed, 16 Mar 2022 01:58:09 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1647421089; cv=none; d=google.com; s=arc-20160816; b=XXr5vjOiLzh7KsSdDpzU0uBL5ZNUt1WK1Bal0YsP0F1E+xSTr+QaE3PNNUZpnMUYOh HVdyW1AXJ8pGoZcqCn5tIHEh60wvVlIi3u3GbH5v9lW+zJzVVxK8C/pkOUEqnJg3C4Gd lm7k8LwVZTBemoJpdHBItn5HTQpOMpj06UCNhkZMty4LfvvvggVVWcIVrsszAHJN6G7C bwBGy+MFx3MfMRf/NqakKe04JykNPFi1BPHWI/Y7BwGJf4G05klbrvFAUpP5N3ABJgvf NeODefPIzP5xwfwrJTZhW5rbMGw6FHbK0d5aKkT9Uh2BSX/AQnavyErCbAegBMCWnAFP jObQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:cc:to:subject:message-id:date:from:in-reply-to :references:mime-version:dkim-signature; bh=O2sXHURm0NoY8IkGl4zp8hYr0GiRSSumlHJx+8DBDR8=; b=fioAg2wSdlihrZjTHKnb6bSdFHck9LW6XDnSgKkNKLSFCW5xiBaEFHJBzib9ZOH/eo Z5S1vlfICegvnvBK8KAgPTR7VR2/3g7lFvapQEAzbBVhXh9l1U0KRC7iDdb7iHzVUq4Q G9b95Jg7pHfW4pR0KqIqgMXW/U19wQo6j8nPzAbwdlSfPVxwH/VcK1hJ/qbtx2HS/9z6 clEcrmO+XimrZBXWnyyt5f1rERkmtcKJn6CI5HesmaskoFmme5q4owsZyITiAezjzvD/ ZDmpHYo0bQHXt8mQaZ7bMmXIXcbqIsObfU8kfSlYRzPcjLzmsN2fWX778GOzu2Q7E6I0 NQWQ== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@intel-com.20210112.gappssmtp.com header.s=20210112 header.b=VhtWPU5h; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=intel.com Return-Path: Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id eb7-20020a0564020d0700b0041845189396si948367edb.598.2022.03.16.01.57.43; Wed, 16 Mar 2022 01:58:09 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@intel-com.20210112.gappssmtp.com header.s=20210112 header.b=VhtWPU5h; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S237283AbiCNUvj (ORCPT + 99 others); Mon, 14 Mar 2022 16:51:39 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:51342 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S234806AbiCNUvi (ORCPT ); Mon, 14 Mar 2022 16:51:38 -0400 Received: from mail-pj1-x1029.google.com (mail-pj1-x1029.google.com [IPv6:2607:f8b0:4864:20::1029]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 2CAA63C49A for ; Mon, 14 Mar 2022 13:50:27 -0700 (PDT) Received: by mail-pj1-x1029.google.com with SMTP id m11-20020a17090a7f8b00b001beef6143a8so500637pjl.4 for ; Mon, 14 Mar 2022 13:50:27 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=intel-com.20210112.gappssmtp.com; s=20210112; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc; bh=O2sXHURm0NoY8IkGl4zp8hYr0GiRSSumlHJx+8DBDR8=; b=VhtWPU5hfae5y3UUUMJ53v9z+E9cDVJIzdnbdhqMLcZjn/y1tL1g4Ip4yO+C4Fok2K bkA2PkP51EJHCCddXQKObdt27Fru7/62k24TcydvfkiIVLHPqli3aLjUtpSqhf6fdvm8 1tdvjT2lcpLsz9HkJYzRiUDg0oCZ5KAIkOCZllpiAF6fb+esenRjlDpqT7+FW9d7AchY mGVomHGFT+saFC/IwCLVOpGVGM13KdQCU94o195Z/wWPW30GzVIhaRLhaD0BjuDsiejE 24QuSkzXgDPKVuG/LFy/ynbPOQDZfiRQ0Ri1H2+DhkMdX+72GpsZkfJDWYWMHLKiSbnT /mxA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=O2sXHURm0NoY8IkGl4zp8hYr0GiRSSumlHJx+8DBDR8=; b=3vsInfFwLCHj8nGxSRiaXM671OdkrWd96q1j+B7fxECB2dr8PM2RKtDPrBlGFdY8ta o8HozaupAaLHw7e0LcqdlzZwPzmI5tyubML7rcC84S1AFi7uJh7VRNS0kgSNH/9z9w6Y 3VgVa8fC1ny/vvo7G91loAo5tPQboF5eFFFpw2A4f5DmRviRz6Q94duKmvHPPFwhxYfc 7JqLbnhw1/kD5pejCSGzGNEKgVzgxsdcT1R4YBLIsH8al0zvTfZkJ3GhE2G/YCQ6/bQK qljh7O8gYsQ3F9PpUzKWNYl/B+6Mt9VK/ju11VzuzjPS4/XekM76Ecd/L3kxOwMtQXEg Fe0g== X-Gm-Message-State: AOAM530HvZhfXG86Yj/qXZ8pUowkgfnLJUD/+g2U6iwXaZSh5d3uUCk5 oR9DWVnrzhRTw7/wCDcrgnEfWaADXwxwGMGC1p9OTA== X-Received: by 2002:a17:902:7296:b0:14b:4bc6:e81 with SMTP id d22-20020a170902729600b0014b4bc60e81mr25125235pll.132.1647291026682; Mon, 14 Mar 2022 13:50:26 -0700 (PDT) MIME-Version: 1.0 References: <20220302082718.32268-1-songmuchun@bytedance.com> <20220302082718.32268-6-songmuchun@bytedance.com> In-Reply-To: From: Dan Williams Date: Mon, 14 Mar 2022 13:50:16 -0700 Message-ID: Subject: Re: [PATCH v4 5/6] dax: fix missing writeprotect the pte entry To: Muchun Song Cc: Matthew Wilcox , Jan Kara , Al Viro , Andrew Morton , Alistair Popple , Yang Shi , Ralph Campbell , Hugh Dickins , Xiyu Yang , "Kirill A. Shutemov" , Ross Zwisler , Christoph Hellwig , linux-fsdevel , Linux NVDIMM , Linux Kernel Mailing List , Linux MM , Xiongchun duan , Muchun Song Content-Type: text/plain; charset="UTF-8" X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,RCVD_IN_DNSWL_NONE,SPF_HELO_NONE,SPF_NONE, T_SCC_BODY_TEXT_LINE autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Fri, Mar 11, 2022 at 1:06 AM Muchun Song wrote: > > On Thu, Mar 10, 2022 at 8:59 AM Dan Williams wrote: > > > > On Wed, Mar 2, 2022 at 12:30 AM Muchun Song wrote: > > > > > > Currently dax_mapping_entry_mkclean() fails to clean and write protect > > > the pte entry within a DAX PMD entry during an *sync operation. This > > > can result in data loss in the following sequence: > > > > > > 1) process A mmap write to DAX PMD, dirtying PMD radix tree entry and > > > making the pmd entry dirty and writeable. > > > 2) process B mmap with the @offset (e.g. 4K) and @length (e.g. 4K) > > > write to the same file, dirtying PMD radix tree entry (already > > > done in 1)) and making the pte entry dirty and writeable. > > > 3) fsync, flushing out PMD data and cleaning the radix tree entry. We > > > currently fail to mark the pte entry as clean and write protected > > > since the vma of process B is not covered in dax_entry_mkclean(). > > > 4) process B writes to the pte. These don't cause any page faults since > > > the pte entry is dirty and writeable. The radix tree entry remains > > > clean. > > > 5) fsync, which fails to flush the dirty PMD data because the radix tree > > > entry was clean. > > > 6) crash - dirty data that should have been fsync'd as part of 5) could > > > still have been in the processor cache, and is lost. > > > > Excellent description. > > > > > > > > Just to use pfn_mkclean_range() to clean the pfns to fix this issue. > > > > So the original motivation for CONFIG_FS_DAX_LIMITED was for archs > > that do not have spare PTE bits to indicate pmd_devmap(). So this fix > > can only work in the CONFIG_FS_DAX_LIMITED=n case and in that case it > > seems you can use the current page_mkclean_one(), right? > > I don't know the history of CONFIG_FS_DAX_LIMITED. > page_mkclean_one() need a struct page associated with > the pfn, do the struct pages exist when CONFIG_FS_DAX_LIMITED > and ! FS_DAX_PMD? CONFIG_FS_DAX_LIMITED was created to preserve some DAX use for S390 which does not have CONFIG_ARCH_HAS_PTE_DEVMAP. Without PTE_DEVMAP then get_user_pages() for DAX mappings fails. To your question, no, there are no pages at all in the CONFIG_FS_DAX_LIMITED=y case. So page_mkclean_one() could only be deployed for PMD mappings, but I think it is reasonable to just disable PMD mappings for the CONFIG_FS_DAX_LIMITED=y case. Going forward the hope is to remove the ARCH_HAS_PTE_DEVMAP requirement for DAX, and use PTE_SPECIAL for the S390 case. However, that still wants to have 'struct page' availability as an across the board requirement. > If yes, I think you are right. But I don't > see this guarantee. I am not familiar with DAX code, so what am > I missing here? Perhaps I missed a 'struct page' dependency? I thought the bug you are fixing only triggers in the presence of PMDs. The CONFIG_FS_DAX_LIMITED=y case can still use the current "page-less" mkclean path for PTEs.