Date: Wed, 25 Apr 2018 09:04:13 -0700
From: Matthew Wilcox
To: Eric Dumazet
Cc: Christoph Hellwig, Eric Dumazet, "David S. Miller", netdev, Andy Lutomirski, linux-kernel, linux-mm, Soheil Hassas Yeganeh
Subject: Re: [PATCH net-next 1/2] tcp: add TCP_ZEROCOPY_RECEIVE support for zerocopy receive
Message-ID: <20180425160413.GC8546@bombadil.infradead.org>
References: <20180425052722.73022-1-edumazet@google.com> <20180425052722.73022-2-edumazet@google.com> <20180425062859.GA23914@infradead.org> <5cd31eba-63b5-9160-0a2e-f441340df0d3@gmail.com>
In-Reply-To: <5cd31eba-63b5-9160-0a2e-f441340df0d3@gmail.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, Apr 25, 2018 at 06:01:02AM -0700, Eric Dumazet wrote:
> On 04/24/2018 11:28 PM, Christoph Hellwig wrote:
> > On Tue, Apr 24, 2018 at 10:27:21PM -0700, Eric Dumazet wrote:
> >> When adding tcp mmap() implementation, I forgot that socket lock
> >> had to be taken before current->mm->mmap_sem. syzbot eventually caught
> >> the bug.
> >>
> >> Since we can not lock the socket in tcp mmap() handler we have to
> >> split the operation in two phases.
> >>
> >> 1) mmap() on a tcp socket simply reserves VMA space, and nothing else.
> >>    This operation does not involve any TCP locking.
> >>
> >> 2) setsockopt(fd, IPPROTO_TCP, TCP_ZEROCOPY_RECEIVE, ...) implements
> >>    the transfert of pages from skbs to one VMA.
> >>    This operation only uses down_read(&current->mm->mmap_sem) after
> >>    holding TCP lock, thus solving the lockdep issue.
> >>
> >> This new implementation was suggested by Andy Lutomirski with great details.
> >
> > Thanks, this looks much more sensible to me.
> >
>
> Thanks Christoph
>
> Note the high cost of zap_page_range(), needed to avoid -EBUSY being returned
> from vm_insert_page() the second time TCP_ZEROCOPY_RECEIVE is used on one VMA.
>
> Ideally a vm_replace_page() would avoid this cost ?

If you don't zap the page range, any of the CPUs in the system where any
thread in this task has ever run may have a TLB entry pointing to this
page ... if the page is being recycled into the page allocator, then that
page might end up as a slab page or page table or page cache while the
other CPU still has access to it.

You could hang onto the page until you've built up a sufficiently large
batch, then bulk-invalidate all of the TLB entries, but we start to get
into weirdnesses on different CPU architectures.
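
The hazard and the batching idea described above can be illustrated with a toy userspace model. This is only a sketch, not kernel code: `ToyMachine` and all of its methods are hypothetical names invented for illustration, each TLB is modelled as a plain dictionary, and the architecture-specific behaviour mentioned above is ignored entirely.

```python
from dataclasses import dataclass, field


@dataclass
class ToyMachine:
    """Toy model (hypothetical): per-CPU TLBs caching vaddr -> page for one task."""
    ncpus: int
    page_table: dict = field(default_factory=dict)  # vaddr -> page id
    tlbs: list = field(default_factory=list)        # one dict per CPU
    free_pages: set = field(default_factory=set)    # pages the allocator may reuse
    pending: list = field(default_factory=list)     # unmapped but not yet flushed
    flushes: int = 0                                # number of invalidation passes

    def __post_init__(self):
        self.tlbs = [dict() for _ in range(self.ncpus)]

    def map_page(self, vaddr, page):
        self.page_table[vaddr] = page

    def touch(self, cpu, vaddr):
        """A thread on `cpu` accesses vaddr; returns the page it reaches."""
        tlb = self.tlbs[cpu]
        if vaddr not in tlb:
            # TLB miss: walk the page table (a KeyError here models a fault).
            tlb[vaddr] = self.page_table[vaddr]
        return tlb[vaddr]

    def unmap_no_flush(self, vaddr):
        """Unsafe: drop the mapping and free the page, leaving TLBs untouched."""
        self.free_pages.add(self.page_table.pop(vaddr))

    def unmap_deferred(self, vaddr):
        """Batched: drop the mapping but hold the page until flush_batch()."""
        self.pending.append((vaddr, self.page_table.pop(vaddr)))

    def flush_batch(self):
        """One invalidation pass over every CPU, then release the held pages."""
        for tlb in self.tlbs:
            for vaddr, _ in self.pending:
                tlb.pop(vaddr, None)
        self.flushes += 1
        self.free_pages.update(page for _, page in self.pending)
        self.pending.clear()

    def stale_entries(self):
        """TLB entries that still reach a page the allocator considers free."""
        return [(cpu, vaddr, page)
                for cpu, tlb in enumerate(self.tlbs)
                for vaddr, page in tlb.items()
                if page in self.free_pages]


# Case 1: the page is recycled without invalidating TLBs.
unsafe = ToyMachine(ncpus=2)
unsafe.map_page(0x1000, page=7)
unsafe.touch(cpu=1, vaddr=0x1000)   # CPU 1 caches the translation
unsafe.unmap_no_flush(0x1000)
print(unsafe.stale_entries())       # CPU 1 can still reach freed page 7

# Case 2: pages are held back and invalidated in one batch.
batched = ToyMachine(ncpus=2)
for i in range(4):
    vaddr = 0x1000 * (i + 1)
    batched.map_page(vaddr, page=i)
    batched.touch(cpu=i % 2, vaddr=vaddr)
    batched.unmap_deferred(vaddr)
batched.flush_batch()
print(batched.stale_entries(), batched.flushes)
```

In the first case a CPU keeps a translation to a page the allocator already considers free, which is the situation the zap is there to prevent. In the second case the pages stay out of the allocator until a single invalidation pass has covered all of them, so the cost of the flush is paid once per batch rather than once per page.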