Date: Mon, 9 Aug 2021 13:30:59 -0500 (CDT)
From: Timothy Pearson
To: hedrick
Cc: Chuck Lever, "J. Bruce Fields", linux-nfs
Message-ID: <1065010667.1047836.1628533859535.JavaMail.zimbra@raptorengineeringinc.com>
Subject: Re: CPU stall, eventual host hang with BTRFS + NFS under heavy load
X-Mailing-List: linux-nfs@vger.kernel.org

FWIW that's *exactly* what we see. Eventually, if the server is left alone
for enough time, even the login system stops responding -- it's as if the
I/O subsystem degrades and eventually blocks entirely.

----- Original Message -----
> From: "hedrick"
> To: "Chuck Lever"
> Cc: "Timothy Pearson", "J. Bruce Fields", "linux-nfs"
> Sent: Monday, August 9, 2021 1:29:30 PM
> Subject: Re: CPU stall, eventual host hang with BTRFS + NFS under heavy load

> Evidence is ambiguous. It seems that NFS activity hangs. The first time this
> occurred I saw a process at 100% running rpciod. I tried to do a “sync” and
> reboot, but the sync hung.
>
> The last time I couldn’t get data, but the kernel was running and responding to
> ping. An ssh session responded to CR but when I tried to sudo it hung. Attempt
> to login hung. Oddly, even though the ssh session responded to CR, syslog
> entries on the local system stopped until the reboot. However we also send
> syslog entries to a central server. Those continued and showed a continuing set
> of mounts and unmounts happening through the reboot.
>
> I was going to get a stack trace of the 100% process if that happened again, but
> last time I wasn’t in a situation to do that. I don’t think users will put up
> with further attempts to debug, so for the moment I’m going to try disabling
> delegations.
>
>> On Aug 9, 2021, at 1:37 PM, Chuck Lever III wrote:
>>
>> Then when you say "server hangs" you mean that the entire NFS server
>> system deadlocks. It's not just unresponsive on one or more exports.