From: NeilBrown
To: Chuck Lever
Cc: Linux NFS Mailing List
Date: Mon, 01 Feb 2021 10:45:31 +1100
Subject: Re: releasing result pages in svc_xprt_release()
In-Reply-To: <597824E7-3942-4F11-958F-A6E247330A9E@oracle.com>
References: <811BE98B-F196-4EC1-899F-6B62F313640C@oracle.com> <87im7ffjp0.fsf@notabene.neil.brown.name> <597824E7-3942-4F11-958F-A6E247330A9E@oracle.com>
Message-ID: <878s88fz6s.fsf@notabene.neil.brown.name>
X-Mailing-List: linux-nfs@vger.kernel.org

On Fri, Jan 29 2021, Chuck Lever wrote:

>> On Jan 29, 2021, at 5:43 PM, NeilBrown wrote:
>>
>> On Fri, Jan 29 2021, Chuck Lever wrote:
>>
>>> Hi Neil-
>>>
>>> I'd like to reduce the amount of page allocation that NFSD does,
>>> and was wondering about the release and reset of pages in
>>> svc_xprt_release(). This logic was added when the socket transport
>>> was converted to use kernel_sendpage() back in 2002. Do you
>>> remember why releasing the result pages is necessary?
>>>
>>
>> Hi Chuck,
>> as I recall, kernel_sendpage() (or sock->ops->sendpage() as it was
>> then) takes a reference to the page and will hold that reference until
>> the content has been sent and ACKed. nfsd has no way to know when the
>> ACK comes, so cannot know when the page can be re-used, so it must
>> release the page and allocate a new one.
>>
>> This is the price we pay for zero-copy, and I acknowledge that it is a
>> real price. I wouldn't be surprised if the trade-offs between
>> zero-copy and single-copy change over time, and between different
>> hardware.
>
> Very interesting, thanks for the history! Two observations:
>
> - I thought without MSG_DONTWAIT, the sendpage operation would be
> totally synchronous -- when the network layer was done with retransmissions,
> it would unblock the caller. But that's likely a mistaken assumption
> on my part. That could be why sendmsg is so much slower than sendpage
> in this particular application.
>

On the "send" side, I think MSG_DONTWAIT is primarily about memory
allocation. send_msg() can only return when the message is queued. If
it needs to allocate memory (or wait for space in a restricted queue),
then MSG_DONTWAIT says "fail instead". It certainly doesn't wait for
successful xmit and ack.
On the "recv" side it is quite different of course.

> - IIUC, nfsd_splice_read() replaces anonymous pages in rq_pages with
> actual page cache pages. Those of course cannot be used to construct
> subsequent RPC Replies, so that introduces a second release requirement.

Yep. I wonder if those pages are protected against concurrent updates
.. so that a computed checksum will remain accurate.

>
> So I have a way to make the first case unnecessary for RPC/RDMA. It
> has a reliable Send completion mechanism. Sounds like releasing is
> still necessary for TCP, though; maybe that could be done in the
> xpo_release_rqst callback.

It isn't clear to me what particular cost you are trying to reduce. Is
handing a page back from RDMA to nfsd cheaper than nfsd calling
alloc_page(), or do you hope to keep batches of pages together to avoid
multi-page overheads, or is this about cache-hot pages, or ???

>
> As far as nfsd_splice_read(), I had thought of moving those pages to
> a separate array which would always be released. That would need to
> deal with the transport requirements above.
>
> If nothing else, I would like to add mention of these requirements
> somewhere in the code too.

Strongly agree with that.

>
> What's your opinion?

To form a coherent opinion, I would need to know what that problem is.
I certainly accept that there could be performance problems in releasing
and re-allocating pages which might be resolved by batching, or by
copying, or by better tracking. But without knowing what hot-spot you
want to cool down, I cannot think about how that fits into the big
picture.
So: what exactly is the problem that you see?
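
A small userspace illustration of the MSG_DONTWAIT point above. This is
only a sketch, assuming Linux and Python 3; it is not kernel code, it uses
an AF_UNIX socketpair rather than TCP, and the helper name
fill_until_would_block is made up for the example. The peer never reads,
yet send() keeps succeeding, because success only means the data was
queued. Once the queue is full, MSG_DONTWAIT turns what would be a wait
for space into an immediate failure.

```python
import socket


def fill_until_would_block(chunk_size=4096, max_chunks=100000):
    """Queue data on a socket whose peer never reads.

    Returns (bytes_queued, hit_would_block). Hypothetical helper,
    for illustration only.
    """
    sender, idle_peer = socket.socketpair()  # connected AF_UNIX stream pair
    queued = 0
    try:
        for _ in range(max_chunks):
            try:
                # A successful return means "queued", not "received":
                # idle_peer never calls recv() in this function.
                queued += sender.send(b"x" * chunk_size, socket.MSG_DONTWAIT)
            except BlockingIOError:
                # No queue space left. Without MSG_DONTWAIT this call
                # would block here; with it, the call fails instead.
                return queued, True
        return queued, False
    finally:
        sender.close()
        idle_peer.close()


if __name__ == "__main__":
    queued, blocked = fill_until_would_block()
    print("queued", queued, "bytes; would-block reached:", blocked)
```

The number of bytes queued depends on the local socket buffer sizes, so no
particular figure should be expected.
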
Thanks,
NeilBrown

>
>
> --
> Chuck Lever