From: Indivar Nair
Date: Thu, 11 Jul 2019 12:19:21 +0530
Subject: rpc.statd dies because of pacemaker monitoring
To: linux-nfs@vger.kernel.org

Hi ...,

I have a 2-node Pacemaker cluster built on CentOS 7.6.1810. It serves files over NFS and Samba.

Every 15-20 minutes, the rpc.statd service fails and the whole NFS service is restarted.

After investigating, we found that the service fails after a few rounds of monitoring by Pacemaker.
Pacemaker's monitoring script runs the following commands to check whether all the services are running:
---------------------------------------------------------------------------------------------------------------------------------------
rpcinfo > /dev/null 2>&1
rpcinfo -t localhost 100005 > /dev/null 2>&1
nfs_exec status nfs-idmapd > $fn 2>&1
rpcinfo -t localhost 100024 > /dev/null 2>&1
---------------------------------------------------------------------------------------------------------------------------------------
The script is scheduled to run every 20 seconds.
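To check whether these probes alone are enough to make rpc.statd fail, the same RPC checks could be run by hand outside Pacemaker. The following is a minimal sketch, assuming `rpcinfo` is on the PATH; `probe` and `watch_probes` are hypothetical helper names, not part of Pacemaker or the resource agent, and the nfs-idmapd check is left out because `nfs_exec` is internal to the agent.

```shell
#!/bin/sh
# Hypothetical helpers that mirror the RPC probes of the nfsserver monitor.
# They are NOT part of Pacemaker or the resource agent.

# probe NAME CMD [ARGS...]: run CMD with all output discarded; on failure,
# print a timestamped line naming the probe and its exit status.
# Returns CMD's exit status.
probe() {
    local name rc
    name=$1
    shift
    "$@" > /dev/null 2>&1
    rc=$?
    if [ "$rc" -ne 0 ]; then
        printf '%s probe failed: %s (rc=%s)\n' \
            "$(date '+%Y-%m-%d %H:%M:%S')" "$name" "$rc"
    fi
    return "$rc"
}

# watch_probes [INTERVAL]: repeat the rpcbind, mountd (100005) and
# statd (100024) probes forever, sleeping INTERVAL seconds between rounds
# (default 20, matching the monitor interval described above).
watch_probes() {
    local interval
    interval=${1:-20}
    while :; do
        probe rpcbind rpcinfo
        probe mountd rpcinfo -t localhost 100005
        probe statd rpcinfo -t localhost 100024
        sleep "$interval"
    done
}
```

After sourcing the file, `watch_probes 20` could be left running on the active node while cluster monitoring is disabled. If the statd probe starts failing here as well, that would point at the probes themselves rather than at Pacemaker.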
These are the messages we get in the logs:
-------------------------------------------------------------------------------------------------------------------------------------
Jul 09 07:33:56 virat-nd01 rpc.mountd[51641]: check_default: access by 127.0.0.1 ALLOWED
Jul 09 07:33:56 virat-nd01 rpc.mountd[51641]: Received NULL request from 127.0.0.1
Jul 09 07:33:56 virat-nd01 rpc.mountd[51641]: check_default: access by 127.0.0.1 ALLOWED (cached)
Jul 09 07:33:56 virat-nd01 rpc.mountd[51641]: Received NULL request from 127.0.0.1
Jul 09 07:33:56 virat-nd01 rpc.mountd[51641]: check_default: access by 127.0.0.1 ALLOWED (cached)
Jul 09 07:33:56 virat-nd01 rpc.mountd[51641]: Received NULL request from 127.0.0.1
-------------------------------------------------------------------------------------------------------------------------------------

About 13 seconds later, we get this message:
-------------------------------------------------------------------------------------------------------------------------------------
Jul 09 07:34:09 virat-nd01 nfsserver(virat-nfs-daemon)[54087]: ERROR: rpc-statd is not running
-------------------------------------------------------------------------------------------------------------------------------------

Once we get this error, the NFS service is automatically restarted.

The "ERROR: rpc-statd is not running" message comes from Pacemaker's monitoring script. I have pasted that part of the script below.

I disabled monitoring, and everything has been working fine since then. But I can't keep cluster monitoring disabled forever.

Kindly help.

Regards,

Indivar Nair

Part of the Pacemaker script that does the monitoring (/usr/lib/ocf/resource.d/heartbeat/nfsserver)
=======================================================================
nfsserver_systemd_monitor()
{
	local threads_num
	local rc
	local fn

	ocf_log debug "Status: rpcbind"
	rpcinfo > /dev/null 2>&1
	rc=$?
	if [ "$rc" -ne "0" ]; then
		ocf_exit_reason "rpcbind is not running"
		return $OCF_NOT_RUNNING
	fi

	ocf_log debug "Status: nfs-mountd"
	rpcinfo -t localhost 100005 > /dev/null 2>&1
	rc=$?
	if [ "$rc" -ne "0" ]; then
		ocf_exit_reason "nfs-mountd is not running"
		return $OCF_NOT_RUNNING
	fi

	ocf_log debug "Status: nfs-idmapd"
	fn=`mktemp`
	nfs_exec status nfs-idmapd > $fn 2>&1
	rc=$?
	ocf_log debug "$(cat $fn)"
	rm -f $fn
	if [ "$rc" -ne "0" ]; then
		ocf_exit_reason "nfs-idmapd is not running"
		return $OCF_NOT_RUNNING
	fi

	ocf_log debug "Status: rpc-statd"
	rpcinfo -t localhost 100024 > /dev/null 2>&1
	rc=$?
	if [ "$rc" -ne "0" ]; then
		ocf_exit_reason "rpc-statd is not running"
		return $OCF_NOT_RUNNING
	fi

	nfs_exec is-active nfs-server
	rc=$?

	# Now systemctl is-active can't detect the failure of kernel process like nfsd.
	# So, if the return value of systemctl is-active is 0, check the threads number
	# to make sure the process is running really.
	# /proc/fs/nfsd/threads has the numbers of the nfsd threads.
	if [ $rc -eq 0 ]; then
		threads_num=`cat /proc/fs/nfsd/threads 2>/dev/null`
		if [ $? -eq 0 ]; then
			if [ $threads_num -gt 0 ]; then
				return $OCF_SUCCESS
			else
				return 3
			fi
		else
			return $OCF_ERR_GENERIC
		fi
	fi

	return $rc
}
=======================================================================
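The monitor reports "rpc-statd is not running" whenever the single `rpcinfo -t localhost 100024` call fails, which by itself does not show that the rpc.statd process has exited. A minimal sketch for telling the two cases apart when the error appears, assuming `rpcinfo` and `pgrep` are installed; `classify_statd` and `check_statd` are hypothetical names, not part of the agent.

```shell
#!/bin/sh
# Hypothetical diagnostic, NOT part of the nfsserver resource agent.

# classify_statd PROBE_RC PROC_RC: turn two exit statuses into a label.
#   PROBE_RC: exit status of the RPC probe the monitor uses
#   PROC_RC:  exit status of a process-table check for rpc.statd
classify_statd() {
    if [ "$1" -eq 0 ]; then
        echo "answering"
    elif [ "$2" -eq 0 ]; then
        echo "alive-but-not-answering"
    else
        echo "not-running"
    fi
}

# check_statd: run both checks once and print a timestamped label.
# If pgrep itself is missing, its non-zero status is reported as "not-running".
check_statd() {
    local probe_rc proc_rc
    rpcinfo -t localhost 100024 > /dev/null 2>&1
    probe_rc=$?
    pgrep -x rpc.statd > /dev/null 2>&1
    proc_rc=$?
    printf '%s rpc.statd: %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" \
        "$(classify_statd "$probe_rc" "$proc_rc")"
}
```

Running `check_statd` in a loop alongside the monitor would indicate whether statd is actually exiting ("not-running") or only failing to answer the TCP probe ("alive-but-not-answering"). In the former case, `journalctl -u rpc-statd` may record why it exited.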