Date: Wed, 25 Apr 2018 09:30:14 -0700
From: Matthew Wilcox
To: Eric Dumazet
Cc: Christoph Hellwig, Eric Dumazet, "David S . Miller", netdev,
    Andy Lutomirski, linux-kernel, linux-mm, Soheil Hassas Yeganeh
Subject: Re: [PATCH net-next 1/2] tcp: add TCP_ZEROCOPY_RECEIVE support for zerocopy receive
Message-ID: <20180425163014.GD8546@bombadil.infradead.org>
References: <20180425052722.73022-1-edumazet@google.com>
    <20180425052722.73022-2-edumazet@google.com>
    <20180425062859.GA23914@infradead.org>
    <5cd31eba-63b5-9160-0a2e-f441340df0d3@gmail.com>
    <20180425160413.GC8546@bombadil.infradead.org>
    <8ce78bd6-8142-2937-11fd-2e4a2b22d90c@gmail.com>
In-Reply-To: <8ce78bd6-8142-2937-11fd-2e4a2b22d90c@gmail.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, Apr 25, 2018 at 09:20:55AM -0700, Eric Dumazet wrote:
> On 04/25/2018 09:04 AM, Matthew Wilcox wrote:
> > If you don't zap the page range, any of the CPUs in the system where
> > any thread in this task have ever run may have a TLB entry pointing to
> > this page ... if the page is being recycled into the page allocator,
> > then that page might end up as a slab page or page table or page cache
> > while the other CPU still have access to it.
>
> Yes, this makes sense.
>
> >
> > You could hang onto the page until you've built up a sufficiently large
> > batch, then bulk-invalidate all of the TLB entries, but we start to get
> > into weirdnesses on different CPU architectures.
> >
>
> zap_page_range() is already doing a bulk-invalidate,
> so maybe vm_replace_page() wont bring serious improvement if we end-up doing same dance.

Sorry, I was unclear. zap_page_range() bulk-invalidates all pages that
were torn down as part of this call.
What I was trying to say was that we could have a whole new API which
put page after page into the same address, and bumped the refcount on
them to prevent them from actually being freed. Once we get to a batch
limit, we invalidate all of the pages which were mapped at those
addresses and can then free the pages back to the allocator.

I don't think you can implement this scheme on s390 because it requires
the userspace address to still be mapped to that page on shootdown (?)
but I think we could implement it on x86.

Another possibility is if we had some way to insert the TLB entry into
the local CPU's page tables only, we wouldn't need to broadcast-invalidate
the TLB entry; we could just do it locally which is relatively quick.
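
To make the batching idea above concrete, here is a rough user-space model of it, not kernel code. Everything in it is hypothetical and exists only for illustration: the toy_page and toy_batch types, the toy_* helpers, and the BATCH_LIMIT value. A counter stands in for the TLB shootdown, and a flag stands in for handing the page back to the allocator. The point it tries to show is only the ordering: pin the old page, drop the mapping's reference, and release the pin after one bulk invalidate covers the whole batch.

```c
#include <assert.h>
#include <stddef.h>

#define BATCH_LIMIT 4  /* hypothetical batch size, chosen only for illustration */

/* Toy stand-in for a physical page; this is NOT the kernel's struct page. */
struct toy_page {
    int refcount;  /* 1 while the page is mapped at the user address */
    int freed;     /* set once the last reference is dropped */
};

/* Pages that were unmapped but may still be cached in some CPU's TLB. */
struct toy_batch {
    struct toy_page *pending[BATCH_LIMIT];
    size_t count;
    unsigned long flushes;  /* number of simulated bulk invalidates */
};

/* Drop one reference; the page counts as freed when the count hits zero. */
static void toy_put_page(struct toy_page *p)
{
    assert(p->refcount > 0);
    if (--p->refcount == 0)
        p->freed = 1;
}

/* One simulated bulk invalidate covers every pending page, after which
 * the pins can be dropped and the pages can really be freed. */
static void toy_batch_flush(struct toy_batch *b)
{
    size_t i;

    if (b->count == 0)
        return;
    b->flushes++;
    for (i = 0; i < b->count; i++)
        toy_put_page(b->pending[i]);
    b->count = 0;
}

/* Model of replacing the page at a user address: pin the old page so it
 * cannot be recycled, drop the mapping's own reference, and defer the
 * invalidate until the batch is full. */
static void toy_replace_page(struct toy_batch *b, struct toy_page *old)
{
    assert(b->count < BATCH_LIMIT);
    old->refcount++;               /* pin: stale TLB entries may still exist */
    b->pending[b->count++] = old;
    toy_put_page(old);             /* the mapping's reference goes away */
    if (b->count == BATCH_LIMIT)
        toy_batch_flush(b);
}
```

In this model, calling toy_replace_page() on pages that each start with refcount 1 leaves them unfreed until the fourth call, at which point a single simulated invalidate releases all four. A caller would also need a final toy_batch_flush() for a partial batch. The s390 concern and the local-only TLB insert are not modelled here.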