Date: Tue, 1 Aug 2017 11:59:24 +0100
From: Mel Gorman
To: Minchan Kim
Cc: Andrew Morton, linux-kernel@vger.kernel.org, linux-mm@kvack.org,
    kernel-team, Ingo Molnar, Russell King, Tony Luck, Martin Schwidefsky,
    "David S. Miller", Heiko Carstens, Yoshinori Sato, Jeff Dike,
    linux-arch@vger.kernel.org, Nadav Amit
Subject: Re: [PATCH v2 3/4] mm: fix MADV_[FREE|DONTNEED] TLB flush miss problem
Message-ID: <20170801105924.h4u4ocplofdpylh5@techsingularity.net>

On Tue, Aug 01, 2017 at 02:56:16PM +0900, Minchan Kim wrote:
> Nadav reported parallel MADV_DONTNEED on same range has a stale TLB
> problem and Mel fixed it[1] and found same problem on MADV_FREE[2].
>
> Quote from Mel Gorman
>
> "The race in question is CPU 0 running madv_free and updating some PTEs
> while CPU 1 is also running madv_free and looking at the same PTEs.
> CPU 1 may have writable TLB entries for a page but fail the pte_dirty
> check (because CPU 0 has updated it already) and potentially fail to flush.
> Hence, when madv_free on CPU 1 returns, there are still potentially writable
> TLB entries and the underlying PTE is still present so that a subsequent write
> does not necessarily propagate the dirty bit to the underlying PTE any more.
> Reclaim at some unknown time in the future may then see that the PTE is still
> clean and discard the page even though a write has happened in the meantime.
> I think this is possible but I could have missed some protection in madv_free
> that prevents it happening."
>
> This patch aims to solve both problems at once and is ready for the
> other problem with KSM, MADV_FREE and the soft-dirty story[3].
>
> The TLB batch API (tlb_[gather|finish]_mmu) uses [inc|dec]_tlb_flush_pending
> and mmu_tlb_flush_pending so that when tlb_finish_mmu is called, we can detect
> that parallel threads are operating on the same mm. In that case, forcibly
> flush the TLB to prevent the user from accessing memory via a stale TLB entry,
> even though this thread failed to gather any page table entry.
>
> I confirmed this patch works with the test program [4] Nadav gave, so this
> patch supersedes "mm: Always flush VMA ranges affected by zap_page_range v2"
> in current mmotm.
>
> NOTE:
> This patch modifies the arch-specific TLB gathering interface (x86, ia64,
> s390, sh, um). Most architectures seem straightforward, but s390 needs
> care because tlb_flush_mmu works only if mm->context.flush_mm is set to
> non-zero, which happens only when a pte entry really is cleared by
> ptep_get_and_clear and friends. However, this problem never changes the
> pte entries but still needs a flush to prevent memory access via a stale
> TLB entry.
>
> Any thoughts?
>

Acked-by: Mel Gorman

-- 
Mel Gorman
SUSE Labs
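
For readers following the thread, the pending-flush idea in the quoted
changelog can be modelled in a few lines of userspace C. This is only a
sketch under stated assumptions, not the patch itself: every model_* name
is hypothetical, the "TLB flush" is just a counter, and the two "CPUs" are
interleaved by hand in a single thread. The actual patch works on
tlb_gather_mmu/tlb_finish_mmu and the mm's pending-flush accounting.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the per-mm state; NOT the kernel's mm_struct. */
struct model_mm {
    atomic_int tlb_flush_pending; /* TLB batches currently in flight */
    int flushes;                  /* simulated TLB flushes issued so far */
};

/* Hypothetical stand-in for the per-batch gather state. */
struct model_gather {
    struct model_mm *mm;
    bool need_flush; /* true if this batch itself modified a PTE */
};

/* Start a batch: announce that a flush may be pending on this mm. */
static void model_tlb_gather(struct model_gather *tlb, struct model_mm *mm)
{
    tlb->mm = mm;
    tlb->need_flush = false;
    atomic_fetch_add(&mm->tlb_flush_pending, 1);
}

/* Finish a batch. Returns true if a (simulated) flush was issued. */
static bool model_tlb_finish(struct model_gather *tlb)
{
    struct model_mm *mm = tlb->mm;
    int pending = atomic_load(&mm->tlb_flush_pending);
    bool flushed = false;

    assert(pending >= 1); /* our own batch must still be counted */

    /*
     * If another batch is in flight, it may already have updated PTEs
     * that this batch looked at and skipped, so flush even if nothing
     * was gathered here.
     */
    if (tlb->need_flush || pending > 1) {
        mm->flushes++;
        flushed = true;
    }
    atomic_fetch_sub(&mm->tlb_flush_pending, 1);
    return flushed;
}

/* One caller that finds only clean PTEs: no flush is needed. */
static int model_solo_clean_call(void)
{
    struct model_mm mm;
    struct model_gather tlb;

    atomic_init(&mm.tlb_flush_pending, 0);
    mm.flushes = 0;
    model_tlb_gather(&tlb, &mm);
    model_tlb_finish(&tlb);
    return mm.flushes;
}

/* Two overlapping callers, interleaved as in the race described above. */
static int model_parallel_calls(void)
{
    struct model_mm mm;
    struct model_gather cpu0, cpu1;

    atomic_init(&mm.tlb_flush_pending, 0);
    mm.flushes = 0;
    model_tlb_gather(&cpu0, &mm);
    model_tlb_gather(&cpu1, &mm);
    cpu0.need_flush = true;  /* CPU 0 updated the PTE */
    /* CPU 1 sees the already-updated PTE and gathers nothing. */
    model_tlb_finish(&cpu1); /* forced flush: another batch is pending */
    model_tlb_finish(&cpu0); /* normal flush: it modified a PTE */
    return mm.flushes;
}
```

In this model, model_parallel_calls() counts two flushes (the forced one
on "CPU 1" and the ordinary one on "CPU 0"), while model_solo_clean_call()
counts none. The s390 concern in the NOTE is outside what this sketch
covers.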