Received: by 2002:a05:6a10:413:0:0:0:0 with SMTP id 19csp3714859pxp; Tue, 15 Mar 2022 05:01:21 -0700 (PDT) X-Google-Smtp-Source: ABdhPJzbIJRz4qispAxp1vl3mDvdUpHmwtyCI766pqySKpwVtU/lkkaKy+GvVXW6JazlcW4/xl86 X-Received: by 2002:a17:907:72d0:b0:6db:4788:66a9 with SMTP id du16-20020a17090772d000b006db478866a9mr23111032ejc.516.1647345681445; Tue, 15 Mar 2022 05:01:21 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1647345681; cv=none; d=google.com; s=arc-20160816; b=cc24P2QKXbFHpbK13JCaHETCI+GfTyWWsz21e+dfdDueJrzL6FXBw2CyWRoYIJkEgW /NguaJYfa+iNCvFT8M9JrF36mPmOVum9gbXm9RZ3a5EPR66E9GAKZE3sgn49cRlhEJoo q0l2cKRASZyevmy/Kb5nhU79ew2ySB5kB6/ajWuTBe1vXWFRtYH/+T/UO5Dt4ClGqtkR GIkCp4TVDHYgfhWCAbsaNDtnqJxsjde11Mb+jDyDfpMb8Mqgjf1wzOInFLXe/uN0jWst 9enG8A0O365NQFfkvDRrWAwFsdvamug5W2o3K0nZva3wqqCo7evSjjH77WRpTm1wL+nK y+mQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:cc:to:subject:message-id:date:from:in-reply-to :references:mime-version:dkim-signature; bh=9fx9f/dGI0jG5qMZR80BsY4AEIW3adegKwssDhY/VxU=; b=m+Hrw1i7vKNshSjc2ARCr0Q70Piu6nwktKVbhh8Qon70WAx729gp/IbTXVpc/Zagp3 b93MS+5piuuA1l1VE+VKwyoPl2wMKVqYol+raTtdQJH521LcQe4BEC6eoET1RugdsLZk pLbgndxfJVxnNQahIr5ZsCe8cMX3ezQDAm9cgpgdQ+YhbqGq1eENyz67q76gEN0Kz1Uy uipfscvObt//dSZHcYFagVmpheoq8s/XeKpqMdaCZRiX/rreaOAu62Sdwdau6OpcAXnp pZFR3mjRArd4khEheVJ0xFB4+n2copZvpWtDHwdyxmmNgrgap/uGHjHSy/I2CKjTCpNL 6sBQ== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@bytedance-com.20210112.gappssmtp.com header.s=20210112 header.b=X3kXd8Ig; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=bytedance.com Return-Path: Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id jr8-20020a170906a98800b006d0b4d921dfsi9973230ejb.515.2022.03.15.05.00.55; Tue, 15 Mar 2022 05:01:21 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@bytedance-com.20210112.gappssmtp.com header.s=20210112 header.b=X3kXd8Ig; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=bytedance.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S242545AbiCOHye (ORCPT + 99 others); Tue, 15 Mar 2022 03:54:34 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:59392 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S245033AbiCOHyd (ORCPT ); Tue, 15 Mar 2022 03:54:33 -0400 Received: from mail-yb1-xb33.google.com (mail-yb1-xb33.google.com [IPv6:2607:f8b0:4864:20::b33]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 80C2BE0C9 for ; Tue, 15 Mar 2022 00:53:21 -0700 (PDT) Received: by mail-yb1-xb33.google.com with SMTP id y142so1396161ybe.11 for ; Tue, 15 Mar 2022 00:53:21 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=bytedance-com.20210112.gappssmtp.com; s=20210112; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc; bh=9fx9f/dGI0jG5qMZR80BsY4AEIW3adegKwssDhY/VxU=; b=X3kXd8IgVsb6phSuLDtCPMN6ZSj5LyxjrsIZNHgQ1y74uWx+8ke/pSvk5/wrAfN+AV lwpxSb95g8tsy1IKUKROLcWljPhTXP8G8CdfQvYCYBCD4WKDaNM1W7gd26JURACTcKTK U3+872+E2nf2ZnX6YFLLKzHo1LqglmwIRHhnzsh/9xpK3F76XwIov/1a/2/lc3AjvK3y WJ3CsWyLfDSnxyYDTEkXFcjTEtAy0l249WxSCqLrNslHL2R5ohpXwQGVELyaSdGfbjGW r8S2HA/zPdzeDW0xO6HOu5NqHdJUdjPlrWCCjkMuxlduawfSy2iACKrnAHZi4WI2RqB9 NLBg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=9fx9f/dGI0jG5qMZR80BsY4AEIW3adegKwssDhY/VxU=; b=2xIKuvJ6N0rcRjGCxNPG4BKnOlfLGV91f6S2r7+JBCgGVClLPDI81YAERw8WbagWfY Y3VcTr37ZupbSWmtiKNf4SmF6OC6hKvrjt0EmyJC2EglU3P5oBAmKoEhNTnD+9qRR7ao HLyGRWgB6rcfumYZ6mxpPIISexl1X8hPtWWv8ttV/zw9JKz+uVzksoH6BfCVx34SSfvV lkU0vpU21kEQYmkbTCQGihjUUWNubJToKHZheRCQZfWpxO3PnaV6gXrnAvtj/hiRhIwD y4fxXVXIlUMYn4bsf7dJg0hCq3z2QK/thdPNVle9OymC6K14NO1N3NNpWNhJf4tx+agB i8oQ== X-Gm-Message-State: AOAM530mfYOaJv5Mkv9256z33qKcwJCrqg8yCvj1TXz4EPFGSXhjXdvK Uwwaz6k1yx6zJeZiYEreG1diGbOKzKCtisK3YtqBgXZZQNVWhnDM X-Received: by 2002:a25:dc4:0:b0:629:2337:f9ea with SMTP id 187-20020a250dc4000000b006292337f9eamr21531864ybn.6.1647330800437; Tue, 15 Mar 2022 00:53:20 -0700 (PDT) MIME-Version: 1.0 References: <20220302082718.32268-1-songmuchun@bytedance.com> <20220302082718.32268-6-songmuchun@bytedance.com> In-Reply-To: From: Muchun Song Date: Tue, 15 Mar 2022 15:51:31 +0800 Message-ID: Subject: Re: [PATCH v4 5/6] dax: fix missing writeprotect the pte entry To: Dan Williams Cc: Matthew Wilcox , Jan Kara , Al Viro , Andrew Morton , Alistair Popple , Yang Shi , Ralph Campbell , Hugh Dickins , Xiyu Yang , "Kirill A. Shutemov" , Ross Zwisler , Christoph Hellwig , linux-fsdevel , Linux NVDIMM , Linux Kernel Mailing List , Linux MM , Xiongchun duan , Muchun Song Content-Type: text/plain; charset="UTF-8" X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,RCVD_IN_DNSWL_NONE,SPF_HELO_NONE,SPF_PASS, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Tue, Mar 15, 2022 at 4:50 AM Dan Williams wrote: > > On Fri, Mar 11, 2022 at 1:06 AM Muchun Song wrote: > > > > On Thu, Mar 10, 2022 at 8:59 AM Dan Williams wrote: > > > > > > On Wed, Mar 2, 2022 at 12:30 AM Muchun Song wrote: > > > > > > > > Currently dax_mapping_entry_mkclean() fails to clean and write protect > > > > the pte entry within a DAX PMD entry during an *sync operation. This > > > > can result in data loss in the following sequence: > > > > > > > > 1) process A mmap write to DAX PMD, dirtying PMD radix tree entry and > > > > making the pmd entry dirty and writeable. > > > > 2) process B mmap with the @offset (e.g. 4K) and @length (e.g. 4K) > > > > write to the same file, dirtying PMD radix tree entry (already > > > > done in 1)) and making the pte entry dirty and writeable. > > > > 3) fsync, flushing out PMD data and cleaning the radix tree entry. We > > > > currently fail to mark the pte entry as clean and write protected > > > > since the vma of process B is not covered in dax_entry_mkclean(). > > > > 4) process B writes to the pte. These don't cause any page faults since > > > > the pte entry is dirty and writeable. The radix tree entry remains > > > > clean. > > > > 5) fsync, which fails to flush the dirty PMD data because the radix tree > > > > entry was clean. > > > > 6) crash - dirty data that should have been fsync'd as part of 5) could > > > > still have been in the processor cache, and is lost. > > > > > > Excellent description. > > > > > > > > > > > Just to use pfn_mkclean_range() to clean the pfns to fix this issue. > > > > > > So the original motivation for CONFIG_FS_DAX_LIMITED was for archs > > > that do not have spare PTE bits to indicate pmd_devmap(). So this fix > > > can only work in the CONFIG_FS_DAX_LIMITED=n case and in that case it > > > seems you can use the current page_mkclean_one(), right? > > > > I don't know the history of CONFIG_FS_DAX_LIMITED. > > page_mkclean_one() need a struct page associated with > > the pfn, do the struct pages exist when CONFIG_FS_DAX_LIMITED > > and ! FS_DAX_PMD? > > CONFIG_FS_DAX_LIMITED was created to preserve some DAX use for S390 > which does not have CONFIG_ARCH_HAS_PTE_DEVMAP. Without PTE_DEVMAP > then get_user_pages() for DAX mappings fails. > > To your question, no, there are no pages at all in the > CONFIG_FS_DAX_LIMITED=y case. So page_mkclean_one() could only be > deployed for PMD mappings, but I think it is reasonable to just > disable PMD mappings for the CONFIG_FS_DAX_LIMITED=y case. > > Going forward the hope is to remove the ARCH_HAS_PTE_DEVMAP > requirement for DAX, and use PTE_SPECIAL for the S390 case. However, > that still wants to have 'struct page' availability as an across the > board requirement. Got it. Thanks for your patient explanation. > > > If yes, I think you are right. But I don't > > see this guarantee. I am not familiar with DAX code, so what am > > I missing here? > > Perhaps I missed a 'struct page' dependency? I thought the bug you are > fixing only triggers in the presence of PMDs. The Right. > CONFIG_FS_DAX_LIMITED=y case can still use the current "page-less" > mkclean path for PTEs. But I think introducing pfn_mkclean_range() could make the code simple and easy to maintain here since it could handle both PTE and PMD mappings. And page_vma_mapped_walk() could work on PFNs since commit [1], which is the case here, we do not need extra code to handle the page-less case here. What do you think? [1] https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?id=b786e44a4dbfe64476e7120ec7990b89a37be37d