From: "Huang, Ying"
To: Byungchul Park
Cc: Dave Hansen
Subject: Re: [PATCH v10 00/12] LUF(Lazy Unmap Flush) reducing tlb numbers over 90%
In-Reply-To: <20240530093306.GA35610@system.software.com> (Byungchul Park's message of "Thu, 30 May 2024 18:33:07 +0900")
References: <982317c0-7faa-45f0-82a1-29978c3c9f4d@intel.com>
	<20240527015732.GA61604@system.software.com>
	<8734q46jc8.fsf@yhuang6-desk2.ccr.corp.intel.com>
	<44e4f2fd-e76e-445d-b618-17a6ec692812@intel.com>
	<20240529050046.GB20307@system.software.com>
	<961f9533-1e0c-416c-b6b0-d46b97127de2@intel.com>
	<20240530005026.GA47476@system.software.com>
	<87a5k814tq.fsf@yhuang6-desk2.ccr.corp.intel.com>
	<20240530071847.GA15344@system.software.com>
	<871q5j1zdf.fsf@yhuang6-desk2.ccr.corp.intel.com>
	<20240530093306.GA35610@system.software.com>
Date: Fri, 31 May 2024 09:45:33 +0800
Message-ID: <87wmnazrcy.fsf@yhuang6-desk2.ccr.corp.intel.com>
User-Agent: Gnus/5.13 (Gnus v5.13)
X-Mailing-List: linux-kernel@vger.kernel.org
MIME-Version: 1.0
Content-Type: text/plain; charset=ascii

Byungchul Park writes:

> On Thu, May 30, 2024 at 04:24:12PM +0800, Huang, Ying wrote:
>> Byungchul Park writes:
>>
>> > On Thu, May 30, 2024 at 09:11:45AM +0800, Huang, Ying wrote:
>> >> Byungchul Park writes:
>> >>
>> >> > On Wed, May 29, 2024 at 09:41:22AM -0700, Dave Hansen wrote:
>> >> >> On 5/28/24 22:00, Byungchul Park wrote:
>> >> >> > All the code updating ptes already performs the TLB flush needed in a
>> >> >> > safe way if it's inevitable, e.g. munmap.  LUF, which controls when to
>> >> >> > flush at a higher level than arch code, just leaves stale ro tlb
>> >> >> > entries that are currently supposed to be in use.  Could you give a
>> >> >> > scenario that you are concerned about?
>> >> >>
>> >> >> Let's go back to this scenario:
>> >> >>
>> >> >> 	fd = open("/some/file", O_RDONLY);
>> >> >> 	ptr1 = mmap(-1, size, PROT_READ, ..., fd, ...);
>> >> >> 	foo1 = *ptr1;
>> >> >>
>> >> >> There's a read-only PTE at 'ptr1'.  Right?  The page being pointed to
>> >> >> is eligible for LUF via the try_to_unmap() paths.  In other words, the
>> >> >> page might be reclaimed at any time.  If it is reclaimed, the PTE will
>> >> >> be cleared.
>> >> >>
>> >> >> Then, the user might do:
>> >> >>
>> >> >> 	munmap(ptr1, PAGE_SIZE);
>> >> >>
>> >> >> Which will _eventually_ wind up in the zap_pte_range() loop.  But that
>> >> >> loop will only see pte_none().  It doesn't do _anything_ to the
>> >> >> 'struct mmu_gather'.
>> >> >>
>> >> >> The munmap() then lands in tlb_flush_mmu_tlbonly() where it looks at
>> >> >> the 'struct mmu_gather':
>> >> >>
>> >> >> 	if (!(tlb->freed_tables || tlb->cleared_ptes ||
>> >> >> 	      tlb->cleared_pmds || tlb->cleared_puds ||
>> >> >> 	      tlb->cleared_p4ds))
>> >> >> 		return;
>> >> >>
>> >> >> But since there were no cleared PTEs (or anything else) during the
>> >> >> unmap, this just returns and doesn't flush the TLB.
>> >> >>
>> >> >> We now have an address space with a stale TLB entry at 'ptr1' and not
>> >> >> even a VMA there.  There's nothing to stop a new VMA from going in,
>> >> >> installing a *new* PTE, but getting data from the stale TLB entry that
>> >> >> still hasn't been flushed.
>> >> >
>> >> > Thank you for the explanation.  I got you.
>> >> > I think I could handle the case through a new flag in the vma, or
>> >> > something indicating that LUF has deferred a necessary TLB flush for
>> >> > it during unmapping, so that the mmu_gather mechanism can be aware of
>> >> > it.  Of course, the performance change should be checked again.
>> >> > Thoughts?
>> >>
>> >> I suggest you start with the simple case.  That is, only support page
>> >> reclaiming and migration.  A TLB flush can be enforced during unmap
>> >> with something similar to flush_tlb_batched_pending().
>> >
>> > While reading flush_tlb_batched_pending(mm), I found it already performs
>> > a TLB flush for the target mm if set_tlb_ubc_flush_pending(mm) has been
>> > hit at least once since the last flush_tlb_batched_pending(mm).
>> >
>> > Since LUF also relies on set_tlb_ubc_flush_pending(mm), it's going to
>> > perform the required TLB flush in flush_tlb_batched_pending(mm) during
>> > munmap().  So it looks safe to me with regard to munmap() already.
>> >
>> > Is there something that I'm missing?
>> >
>> > JFYI, regarding mmap(), I have reworked the fault handler to give up
>> > luf when needed in a better way.
>>
>> If a TLB flush is always enforced during munmap(), then your solution
>> can only avoid TLB flushing for page reclaiming and migration, not for
>> unmap.
>
> I'm not sure I understand what you meant.  Could you explain it in more
> detail?
>
> LUF works only for the *unmapping* that happens during page reclaiming
> and migration.  Unmappings other than page reclaiming and migration are
> not what LUF works for.  That's why I thought flush_tlb_batched_pending()
> could handle the pending TLB flushes in that case.
>
> It'd be appreciated if you could explain what you meant in more detail.

In the following email, you claimed that LUF can avoid TLB flushing for
munmap()/mmap():

https://lore.kernel.org/linux-mm/20240527015732.GA61604@system.software.com/

Now, you say it can only avoid TLB flushing for page reclaiming and
migration.
So, to avoid confusion, I suggest you send out a new series and make it
explicit that it can only optimize page reclaiming and migration, but
not munmap().  It would also be good to add some words about how it
interacts with other TLB flushing mechanisms.

--
Best Regards,
Huang, Ying