From: "NeilBrown"
To: "Donald Buczek"
Cc: "Linux Kernel Mailing List", linux-fsdevel@vger.kernel.org
Subject: Re: Heisenbug: I/O freeze can be resolved by cat $task/cmdline of unrelated process
Date: Mon, 06 Nov 2023 08:51:11 +1100
Message-id: <169922107188.24305.7903791112230110428@noble.neil.brown.name>

On Sun, 05 Nov 2023, Donald Buczek wrote:
....
>
> for task in /proc/*/task/*; do
>     echo "# # $task: $(cat $task/comm) : $(cat $task/cmdline | xargs -0 echo)"
>     cmd cat $task/stack
> done
>
> which can further be reduced to
>
> for task in /proc/*/task/*; do echo $task $(cat $task/cmdline | xargs -0 echo); done
>
> This is absolutely reproducible. Above line unblocks the system reliably.
>
> Another remarkable thing: We've modified above code to do the
> processes slowly one by one and checking after each step if I/O
> resumed. And each time we've tested that, it was one of the 64 nfsd
> processes (but not the very first one tried). While the systems
> exports filesystems, we have absolutely no reason to assume, that any
> client actually tries to access this nfs server. Additionally, when
> the full script is run, the stack traces show all nfsd tasks in their
> normal idle state ( [<0>] svc_recv+0x7bd/0x8d0 [sunrpc] ).
>
> Does anybody have an idea, how a `cat /proc/PID/cmdline` on a specific
> assumed-to-be-idle nfsd thread could have such an "healing" effect?

/proc/PID/cmdline for an nfsd thread is empty. So it probably isn't
accessing 'cmdline' specifically that unblocks the system; any (or almost
any) proc file for the process might have the same effect.

You say that *after* accessing cmdline, the "stack" file shows a normal
stack trace. It would be interesting to see whether that same stack is
present *before* accessing cmdline.
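A minimal sketch of that before/after comparison, assuming it runs as root (reading a task's `stack` file normally requires it) and that the nfsd threads can be picked out by a `comm` of exactly `nfsd`. `check_task` is a hypothetical helper name, not an existing tool. One caveat: reading `stack` is itself a proc-file access, so if any proc access can unblock the system, the first read may already have the effect and the "before" stack may look normal too.

```shell
#!/bin/sh
# Sketch (assumes root): for each nfsd thread, record its kernel stack,
# read its cmdline, record the stack again, and report whether it changed.

# check_task is a hypothetical helper, not part of any existing tool.
check_task() {
    task=$1                                  # e.g. /proc/1234/task/1234
    before=$(cat "$task/stack" 2>/dev/null)  # stack before touching cmdline
    cat "$task/cmdline" >/dev/null 2>&1      # the access that seemed to unblock I/O
    after=$(cat "$task/stack" 2>/dev/null)   # stack afterwards
    if [ "$before" = "$after" ]; then
        echo "$task: stack unchanged"
    else
        echo "$task: stack changed"
        printf 'before:\n%s\nafter:\n%s\n' "$before" "$after"
    fi
}

# Visit nfsd threads one at a time, as in the one-by-one experiment.
for task in /proc/*/task/*; do
    [ "$(cat "$task/comm" 2>/dev/null)" = nfsd ] || continue
    check_task "$task"
done
```

Running it once while I/O is frozen and saving the output would show, per thread, whether the stack seen before the cmdline read already matched the idle `svc_recv` trace.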
But my guess is that nfsd is mostly a distraction. It would help to see
the full "echo t > /proc/sysrq-trigger" list of all process stacks. That
should reveal where the blockage is.

NeilBrown

>
> I'm well aware, that, for example, a hardware problem might result in
> just anything and that the question might not be answerable at all.
> If so: please excuse the noise.
>
> Thanks
>   Donald
> --
> Donald Buczek
> buczek@molgen.mpg.de
> Tel: +49 30 8413 1433