Subject: Re: [PATCH net-next 1/2] tcp: add TCP_ZEROCOPY_RECEIVE support for zerocopy receive
To: Christoph Hellwig, Eric Dumazet
Cc: "David S. Miller", netdev, Andy Lutomirski, linux-kernel, linux-mm, Soheil Hassas Yeganeh
References: <20180425052722.73022-1-edumazet@google.com> <20180425052722.73022-2-edumazet@google.com> <20180425062859.GA23914@infradead.org>
From: Eric Dumazet
Message-ID: <5cd31eba-63b5-9160-0a2e-f441340df0d3@gmail.com>
Date: Wed, 25 Apr 2018 06:01:02 -0700
In-Reply-To: <20180425062859.GA23914@infradead.org>
X-Mailing-List: linux-kernel@vger.kernel.org

On 04/24/2018 11:28 PM, Christoph Hellwig wrote:
> On Tue, Apr 24, 2018 at 10:27:21PM -0700, Eric Dumazet wrote:
>> When adding tcp mmap() implementation, I forgot that socket lock
>> had to be taken before current->mm->mmap_sem. syzbot eventually caught
>> the bug.
>>
>> Since we can not lock the socket in tcp mmap() handler we have to
>> split the operation in two phases.
>>
>> 1) mmap() on a tcp socket simply reserves VMA space, and nothing else.
>>    This operation does not involve any TCP locking.
>>
>> 2) setsockopt(fd, IPPROTO_TCP, TCP_ZEROCOPY_RECEIVE, ...) implements
>>    the transfer of pages from skbs to one VMA.
>>    This operation only uses down_read(&current->mm->mmap_sem) after
>>    holding TCP lock, thus solving the lockdep issue.
>>
>> This new implementation was suggested by Andy Lutomirski with great details.
>
> Thanks, this looks much more sensible to me.
>

Thanks Christoph

Note the high cost of zap_page_range(), needed to avoid -EBUSY being returned
from vm_insert_page() the second time TCP_ZEROCOPY_RECEIVE is used on one VMA.

Ideally a vm_replace_page() would avoid this cost?

     6.51%  tcp_mmap  [kernel.kallsyms]  [k] unmap_page_range
     5.90%  tcp_mmap  [kernel.kallsyms]  [k] vm_insert_page
     4.85%  tcp_mmap  [kernel.kallsyms]  [k] _raw_spin_lock
     4.50%  tcp_mmap  [kernel.kallsyms]  [k] mark_page_accessed
     3.51%  tcp_mmap  [kernel.kallsyms]  [k] page_remove_rmap
     2.99%  tcp_mmap  [kernel.kallsyms]  [k] page_add_file_rmap
     2.53%  tcp_mmap  [kernel.kallsyms]  [k] release_pages
     2.38%  tcp_mmap  [kernel.kallsyms]  [k] put_page
     2.37%  tcp_mmap  [kernel.kallsyms]  [k] smp_call_function_single
     2.28%  tcp_mmap  [kernel.kallsyms]  [k] __get_locked_pte
     2.25%  tcp_mmap  [kernel.kallsyms]  [k] do_tcp_setsockopt.isra.35
     2.21%  tcp_mmap  [kernel.kallsyms]  [k] page_clear_age