Date: Sat, 15 Oct 2022 01:55:54 +0300
From: "Kirill A.
 Shutemov"
To: Jann Horn
Cc: Andy Lutomirski, Linux-MM, Mel Gorman, Rik van Riel, kernel list,
 Kees Cook, Ingo Molnar, Sasha Levin, Andrew Morton, Will Deacon,
 Peter Zijlstra, Linus Torvalds
Subject: Re: [BUG?] X86 arch_tlbbatch_flush() seems to be lacking
 mm_tlb_flush_nested() integration
Message-ID: <20221014225554.q6lxvc2ffp5drqvs@box.shutemov.name>
References: <20221014222346.n337tvkbyr33dsdx@box.shutemov.name>
X-Mailing-List: linux-kernel@vger.kernel.org

On Sat, Oct 15, 2022 at 12:29:57AM +0200, Jann Horn wrote:
> On Sat, Oct 15, 2022 at 12:23 AM Kirill A. Shutemov wrote:
> > On Fri, Oct 14, 2022 at 08:19:42PM +0200, Jann Horn wrote:
> > > Hi!
> > >
> > > I haven't actually managed to reproduce this behavior, so maybe I'm
> > > just misunderstanding how this works; but I think the
> > > arch_tlbbatch_flush() path for batched TLB flushing in vmscan ought to
> > > have some kind of integration with mm_tlb_flush_nested().
> > >
> > > I think that currently, the following race could happen:
> > >
> > > [initial situation: page P is mapped into a page table of task B, but
> > > the page is not referenced, the PTE's A/D bits are clear]
> > > A: vmscan begins
> > > A: vmscan looks at P and P's PTEs, and concludes that P is not currently in use
> > > B: reads from P through the PTE, setting the Accessed bit and creating
> > > a TLB entry
> > > A: vmscan enters try_to_unmap_one()
> > > A: try_to_unmap_one() calls should_defer_flush(), which returns true
> > > A: try_to_unmap_one() removes the PTE and queues a TLB flush
> > > (arch_tlbbatch_add_mm())
> > > A: try_to_unmap_one() returns, try_to_unmap() returns to shrink_folio_list()
> > > B: calls munmap() on the VMA that mapped P
> > > B: no PTEs are removed, so no TLB flush happens
> > > B: munmap() returns
> >
> > I think here we will serialize against anon_vma/i_mmap lock in
> > __do_munmap() -> unmap_region() -> free_pgtables() that A also holds.
> >
> > So I believe munmap() is safe, but MADV_DONTNEED (and its flavours) is not.
>
> shrink_folio_list() is not in a context that is operating on a
> specific MM; it is operating on a list of pages that might be mapped
> into different processes all over the system.

s/specific MM/specific page/

> So A has temporarily held those locks somewhere inside
> try_to_unmap_one(), but it will drop them before it reaches the point

inside try_to_unmap(), which handles all mappings of the page.

> where it issues the batched TLB flush.
> And this batched TLB flush potentially covers multiple MMs at once; it
> is not targeted towards a specific MM, but towards all of the CPUs on
> which any of the touched MMs might be active.

But, yes, you are right. I thought that try_to_unmap_flush() was called
inside try_to_unmap() under the lock.

-- 
  Kiryl Shutsemau / Kirill A. Shutemov
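
For anyone following along, the interleaving discussed in this thread can be
replayed in a small toy model. This is only a sketch in Python, not kernel
code: ToyMM, ToyCPU, the flush_deferred flag and the check_deferred switch are
all made up for illustration. Locking, several CPUs per mm and the real
arch_tlbbatch_*() data structures are ignored. The check_deferred=True variant
merely shows the general idea of munmap() noticing a pending deferred flush; it
does not claim to be how mm_tlb_flush_nested() works or how a fix should look.

```python
# Toy model only: names echo the thread, behaviour is heavily simplified.

class ToyMM:
    """One address space: a page table plus a 'flush still owed' flag."""
    def __init__(self):
        self.page_table = {}         # virtual address -> page name
        self.flush_deferred = False  # hypothetical flag, not a kernel field


class ToyCPU:
    """One CPU with a TLB caching translations of the mm it runs."""
    def __init__(self, mm):
        self.mm = mm
        self.tlb = {}

    def read(self, vaddr):
        # A TLB hit never consults the page table, which is why a stale
        # entry keeps the old page reachable.
        if vaddr not in self.tlb:
            self.tlb[vaddr] = self.mm.page_table[vaddr]
        return self.tlb[vaddr]

    def flush(self):
        self.tlb.clear()


def reclaim_unmap(mm, vaddr, batch):
    """A: remove the PTE but only queue the TLB flush."""
    del mm.page_table[vaddr]
    mm.flush_deferred = True
    batch.append(mm)


def reclaim_flush(batch, cpus):
    """A: issue the batched flush for every mm queued so far."""
    for cpu in cpus:
        if cpu.mm in batch:
            cpu.flush()
    for mm in batch:
        mm.flush_deferred = False
    batch.clear()


def munmap(mm, vaddr, cpus, check_deferred):
    """B: flush if a PTE was removed, or (optionally) if a flush is owed."""
    removed = mm.page_table.pop(vaddr, None) is not None
    if removed or (check_deferred and mm.flush_deferred):
        for cpu in cpus:
            if cpu.mm is mm:
                cpu.flush()


def stale_after_munmap(check_deferred):
    """Replay the interleaving; report whether B's TLB still maps P
    at the moment munmap() has returned."""
    mm_b = ToyMM()
    mm_b.page_table[0x1000] = "P"
    cpu_b = ToyCPU(mm_b)
    batch = []

    cpu_b.read(0x1000)                             # B: creates a TLB entry
    reclaim_unmap(mm_b, 0x1000, batch)             # A: PTE gone, flush queued
    munmap(mm_b, 0x1000, [cpu_b], check_deferred)  # B: finds no PTE
    stale = 0x1000 in cpu_b.tlb
    reclaim_flush(batch, [cpu_b])                  # A: batched flush, late
    return stale
```

Calling stale_after_munmap(check_deferred=False) models the sequence from the
report, where munmap() returns while the TLB entry for P is still present;
with check_deferred=True the toy munmap() flushes before returning.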