Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-4.0 required=3.0 tests=HEADER_FROM_DIFFERENT_DOMAINS, MAILING_LIST_MULTI,SIGNED_OFF_BY,SPF_PASS,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 69314C43441 for ; Thu, 29 Nov 2018 19:28:52 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 2701A20868 for ; Thu, 29 Nov 2018 19:28:52 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 2701A20868 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=arm.com Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=linux-nfs-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1725928AbeK3GfU (ORCPT ); Fri, 30 Nov 2018 01:35:20 -0500 Received: from foss.arm.com ([217.140.101.70]:43870 "EHLO foss.arm.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1725788AbeK3GfT (ORCPT ); Fri, 30 Nov 2018 01:35:19 -0500 Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.72.51.249]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id A8FD880D; Thu, 29 Nov 2018 11:28:49 -0800 (PST) Received: from [10.1.197.50] (e120937-lin.cambridge.arm.com [10.1.197.50]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id DE7723F73F; Thu, 29 Nov 2018 11:28:48 -0800 (PST) Subject: Re: [PATCH v3 40/44] SUNRPC: Simplify TCP receive code by switching to using iterators To: Catalin Marinas , Trond Myklebust Cc: linux-nfs@vger.kernel.org References: <20180917130335.112832-32-trond.myklebust@hammerspace.com> <20180917130335.112832-33-trond.myklebust@hammerspace.com> <20180917130335.112832-34-trond.myklebust@hammerspace.com> <20180917130335.112832-35-trond.myklebust@hammerspace.com> <20180917130335.112832-36-trond.myklebust@hammerspace.com> <20180917130335.112832-37-trond.myklebust@hammerspace.com> <20180917130335.112832-38-trond.myklebust@hammerspace.com> <20180917130335.112832-39-trond.myklebust@hammerspace.com> <20180917130335.112832-40-trond.myklebust@hammerspace.com> <20180917130335.112832-41-trond.myklebust@hammerspace.com> <20181109111930.idve77thgcnsg5u3@localhost> From: Cristian Marussi Message-ID: Date: Thu, 29 Nov 2018 19:28:47 +0000 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Thunderbird/60.2.1 MIME-Version: 1.0 In-Reply-To: <20181109111930.idve77thgcnsg5u3@localhost> Content-Type: text/plain; charset=utf-8 Content-Language: en-US Content-Transfer-Encoding: 7bit Sender: linux-nfs-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-nfs@vger.kernel.org Hi Trond, Catalin On 09/11/2018 11:19, Catalin Marinas wrote: > Hi Trond, > > On Mon, Sep 17, 2018 at 09:03:31AM -0400, Trond Myklebust wrote: >> Most of this code should also be reusable with other socket types. >> >> Signed-off-by: Trond Myklebust >> --- >> include/linux/sunrpc/xprtsock.h | 19 +- >> include/trace/events/sunrpc.h | 15 +- >> net/sunrpc/xprtsock.c | 694 +++++++++++++++----------------- >> 3 files changed, 335 insertions(+), 393 deletions(-) > > With latest mainline (24ccea7e102d, it includes Al Viro's iov_iter > fixup) I started hitting some severe slowdown and systemd timeouts with > nfsroot on arm64 machines (physical or guests under KVM). Interestingly, > it only happens when the client kernel is configured with 64K pages, the > 4K pages configuration runs fine. It also runs fine if I add rsize=65536 > to the nfsroot= argument. > > Bisecting led me to commit 277e4ab7d530 ("SUNRPC: Simplify TCP receive > code by switching to using iterators"). Prior to this commit, it works > fine. > > Some more info: > > - defconfig with CONFIG_ARM64_64K_PAGES enabled > > - kernel cmdline arg: nfsroot=:/srv/nfs/debian-arm64,tcp,v4 > > - if it matters, the server is also an arm64 machine running 4.19 with > 4K pages configuration > > I haven't figured out what's wrong or even how to debug this as I'm not > familiar with the sunrpc code. Any suggestion? > > Thanks. > I've done a bit of experiments/observations with this since it was seriously impacting all form of testing on arm64 with a 64K pages configuration. I can confirm rsize=65536 workaround above mentioned by Catalin is effective also for me, as it is to reset back before the commit mentioned in the subject. In the following I tested instead with: - linus arm64 v4.20-rc1 64K pages + "Debug Lockups and Hangs" Enabled - hw Juno-r2 - fully NFS mounted rootfs (Debian 9) - NO rsize workaround - NFS Client config(nfsstat -m) Flags: rw,relatime,vers=4.0,rsize=4096,wsize=4096,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=192.168.0.1,local_lock=none,addr=192.168.0.254 Observations: 1. despite some general boot slow down (not so evident in my setup), I hit the issue when simply trying to launch LTP or LKP tests (all is in NFS mounted rootfs): the immediately observable behavior is just that the application gets 'apparently' stuck straight away (NO output whatsoever). Waiting some seconds yields no progress or result NOR any Lockup is detected by Kernel. A good deal of effort is needed to kill the process at this point...BUT it is feasible (many SIGSTOP + KILL)...and the system is back alive. 2. running again LKP via 'strace' we can observe the process apparently starting fine but then suddenly hanging randomly multiple times: at first on an execve() and then on some read() while trying to load its own file components; each hang lasts 30-45 seconds approximately. In LKP as an example: $ strace lkp run ./dbench-100%.yaml .... newfstatat(AT_FDCWD, "/opt/lkp-tests/bin/run-local", {st_mode=S_IFREG|0755, st_size=4367, ...}, 0) = 0 faccessat(AT_FDCWD, "/opt/lkp-tests/bin/run-local", X_OK) = 0 execve("/opt/lkp-tests/bin/run-local", ["/opt/lkp-tests/bin/run-local", "./dbench-100%.yaml"], [/* 12 vars */] <<< HANGGG ... ... 30-40 secs.. .... openat(AT_FDCWD, "/usr/lib/ruby/2.3.0/rubygems.rb", O_RDONLY|O_NONBLOCK|O_CLOEXEC) = 7 fstat(7, {st_mode=S_IFREG|0644, st_size=33018, ...}) = 0 close(7) = 0 getuid() = 0 geteuid() = 0 getgid() = 0 getegid() = 0 oopenat(AT_FDCWD, "/usr/lib/ruby/2.3.0/rubygems.rb", O_RDONLY|O_NONBLOCK|O_CLOEXEC) = 7 fcntl(7, F_SETFL, O_RDONLY) = 0 fstat(7, {st_mode=S_IFREG|0644, st_size=33018, ...}) = 0 fstat(7, {st_mode=S_IFREG|0644, st_size=33018, ...}) = 0 ioctl(7, TCGETS, 0xffffdd0895d8) = -1 ENOTTY (Inappropriate ioctl for device) read(7, <<