Received: by 2002:a05:6a10:af89:0:0:0:0 with SMTP id iu9csp2741351pxb; Mon, 31 Jan 2022 03:21:46 -0800 (PST) X-Google-Smtp-Source: ABdhPJzc28ImXcb/17EE0MT0nr8VuapZl2u2+iPUj1+4Hvm6kZ83BNMyPp5q5tr2nEaocfK+P1GA X-Received: by 2002:a17:907:94d0:: with SMTP id dn16mr17246286ejc.95.1643628106377; Mon, 31 Jan 2022 03:21:46 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1643628106; cv=none; d=google.com; s=arc-20160816; b=EvPstQJToISPe/0/DYPu5JPTYefMriNddZfrmtjpDSETFF/AkdffnulLJPcahnsQFK BZzIcpFHFyBafBMqukYn68z3TMnOi94bZWHc35bI2v/aAhw9+oGtRsrNAbxlO65pbL3r EjRij7EE2bvuwYr/QHk0hUqFjSpUxnSYkhbny8BuOmyYkTime1fy/C+q3ZV0iIR3xqCv 1GdR2UOUvTE20qffuGdAfA7ZIdPtl+Z3r6A1Mk7LdWRGxj62KOY1M7bRdAH0kLH52Xu6 CSlpVOFlMSNe4V2e8yn0HD0fVgwPmp45srludE30JQTZkK17vRE47VQuRwZ5CzNPyDBZ JDeg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:cc:to:subject:message-id:date:from:in-reply-to :references:mime-version:dkim-signature; bh=BLJFT4AhBocD6kHHT+suS0eJkn3pPtgeNyCXLqLIpak=; b=phwFBKsfmJvIZolWkdUzsT8gPlYyifxciYLF/Tw54ovG13t9CGOq/PgE7bKo9kVIf5 6b9n9B5+tMbSxeeZwSoHgiZAsIMeR8Fwc8mNABSzw6J0zxrxlrSuJr2UQBYJ00VQ+mD3 eZaD1hgG6siKYE239CCRxZoN6Ee8Knsh9cfRz5mAr3d2ay/AXoMo5FPw6snDzsFefLrU h5gPHCEYWNZObCm/E6E2rAgrmfUMYCx1lUv1R0h4brcNZp4bPmTtBx3jG2hfqkm1VgNC w0B2vQ3O0DFVukr4VQZQ6uTujilO2oMrZBuypDUhxoy+KH612dcNy/V5s/sDiRdGA6FW NE2Q== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@gmail.com header.s=20210112 header.b=eXWnQezl; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=QUARANTINE dis=NONE) header.from=gmail.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id s1si7659507ejn.756.2022.01.31.03.21.21; Mon, 31 Jan 2022 03:21:46 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; dkim=pass header.i=@gmail.com header.s=20210112 header.b=eXWnQezl; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=QUARANTINE dis=NONE) header.from=gmail.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1350335AbiA1QxW (ORCPT + 99 others); Fri, 28 Jan 2022 11:53:22 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:48494 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1350319AbiA1QxT (ORCPT ); Fri, 28 Jan 2022 11:53:19 -0500 Received: from mail-ej1-x62a.google.com (mail-ej1-x62a.google.com [IPv6:2a00:1450:4864:20::62a]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 07C27C061714; Fri, 28 Jan 2022 08:53:19 -0800 (PST) Received: by mail-ej1-x62a.google.com with SMTP id d10so17895482eje.10; Fri, 28 Jan 2022 08:53:18 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc; bh=BLJFT4AhBocD6kHHT+suS0eJkn3pPtgeNyCXLqLIpak=; b=eXWnQezlzjgM85qsZszg2oBXIDeFvSRt24xoZNoEeaT02DL98kW6vYQmfrS77uf+ck 73j/FMVrXvBV0t9/NOXyQild2tI4WGX8LDLVemr0lyO49SRXaerhke2s3D69BJNjvCE6 EkeqgY43aO97+R6xHwHtbrc9e11d3/+UMkt6snjEgmt0taXrCSuA2GuPd4pHy1hNB7I+ JTGxPnrpmZS0GWmVAx+zeXJM22AXimC3Ot09+82+1KRP1Boz6rvwfZKcSeUJ3JefUBHG ODRGdKMPEWfT57hTNdkNbbhNDpIZ4q6j3j//lxvpy07jrisIO7uQj0TNdx6y12/EJM16 S38A== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=BLJFT4AhBocD6kHHT+suS0eJkn3pPtgeNyCXLqLIpak=; b=X8x8o6T+8UhkbXC5dW4xqqpLRkzPr0pYvoi2/CmWnuPUnMInFvmSm1wU/z2gkcr1SG 4BE7+DIxeJs1U2CDuAHKqZl/Ax5cAae4lDCJIY07I4zMPptlhYz/SGa8vEre5ZvkcIAX xNhOCpRKyYHK3GWjp0EwT5GVseJ+5Vb3G/I91sD2hPWglNEmlRQknMp0WpX7xSEfI9NV n20Ov1y+vzNciDevtnk34lEoUQzNor4u8Sx3IyguNXdmSGjxuKiEcTSW5EpHHlFcnHyV OkhxnUns26X51gariCBBDlK98rMLC0RAM3iqachxXjpuFWuYXcFglIRYBzWqApz6NLrD 4Ueg== X-Gm-Message-State: AOAM5311qG/pzKWZURnCGfyzaxqwWQlHI86QCC8/rElsxcf7mNImnFH8 rnnZ7LVVyeP0Osr+UjFTxMOp5EOJETiH5Lt/8p8= X-Received: by 2002:a17:907:3d94:: with SMTP id he20mr7753163ejc.637.1643388797473; Fri, 28 Jan 2022 08:53:17 -0800 (PST) MIME-Version: 1.0 References: <20220120202805.3369-1-shy828301@gmail.com> <5b4e2c29-8f1a-5a68-d243-a30467cc02d4@redhat.com> <5a565d5a-0540-4041-ce63-a8fd5d1bb340@redhat.com> <2a1c5bd2-cb8c-b93b-68af-de620438d19a@redhat.com> <8c0430e5-fecb-3eda-3d40-e94caa8cbd78@redhat.com> In-Reply-To: <8c0430e5-fecb-3eda-3d40-e94caa8cbd78@redhat.com> From: Yang Shi Date: Fri, 28 Jan 2022 08:53:05 -0800 Message-ID: Subject: Re: [v2 PATCH] fs/proc: task_mmu.c: don't read mapcount for migration entry To: David Hildenbrand Cc: Jann Horn , "Kirill A. Shutemov" , Matthew Wilcox , Andrew Morton , Linux MM , Linux Kernel Mailing List , stable Content-Type: text/plain; charset="UTF-8" Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Fri, Jan 28, 2022 at 12:02 AM David Hildenbrand wrote: > > On 27.01.22 22:16, Yang Shi wrote: > > On Wed, Jan 26, 2022 at 10:54 AM David Hildenbrand wrote: > >> > >>>>> Just page lock or elevated page refcount could serialize against THP > >>>>> split AFAIK. > >>>>> > >>>>>> > >>>>>> But yeah, using the mapcount of a page that is not even mapped > >>>>>> (migration entry) is clearly wrong. > >>>>>> > >>>>>> To summarize: reading the mapcount on an unlocked page will easily > >>>>>> return a wrong result and the result should not be relied upon. reading > >>>>>> the mapcount of a migration entry is dangerous and certainly wrong. > >>>>> > >>>>> Depends on your usecase. Some just want to get a snapshot, just like > >>>>> smaps, they don't care. > >>>> > >>>> Right, but as discussed, even the snapshot might be slightly wrong. That > >>>> might be just fine for smaps (and I would have enjoyed a comment in the > >>>> code stating that :) ). > >>> > >>> I think that is documented already, see Documentation/filesystems/proc.rst: > >>> > >>> Note: reading /proc/PID/maps or /proc/PID/smaps is inherently racy (consistent > >>> output can be achieved only in the single read call). > >> > >> Right, but I think there is a difference between > >> > >> * Atomic values that change immediately afterwards ("this value used to > >> be true at one point in time") > >> * Values that are unstable because we cannot read them atomically ("this > >> value never used to be true") > >> > >> I'd assume with the documented race we actually talk about the first > >> point, but I might be just wrong. > >> > >>> > >>> Of course, if the extra note is preferred in the code, I could try to > >>> add some in a separate patch. > >> > >> When staring at the (original) code I would have hoped to find something > >> like: > >> > >> /* > >> * We use page_mapcount() to get a snapshot of the mapcount. Without > >> * holding the page lock this snapshot can be slightly wrong as we > >> * cannot always read the mapcount atomically. As long we hold the PT > >> * lock, the page cannot get unmapped and it's at safe to call > >> * page_mapcount(). > >> */ > >> > >> With the addition of > >> > >> "... For unmapped pages (e.g., migration entries) we cannot guarantee > >> that, so treat the mapcount as being 1." > > > > It seems a little bit confusing to me, it is not safe to call with PTL > > held either, right? I'd like to rephrase the note to: > > The implication that could have been spelled out is that only a mapped > page can get unmapped. (I know, there are some weird migration entries > nowadays ...) Yes, I see your point. Just felt "only a mapped page can get unmapped" sounds not that straightforward (just my personal feeling). How's about "It is not safe to call page_mapcount() even with PTL held if the page is not mapped, especially for migration entries". > > /* > * We use page_mapcount() to get a snapshot of the mapcount. Without > * holding the page lock this snapshot can be slightly wrong as we > * cannot always read the mapcount atomically. As long we hold the PT > * lock, a mapped page cannot get unmapped and it's at safe to call > * page_mapcount(). Especially for migration entries, it's not safe to > * call page_mapcount(), so we treat the mapcount as being 1. > */ > > > > -- > Thanks, > > David / dhildenb >