Date: Wed, 9 Nov 2022 10:06:29 -0800
From: "Darrick J. Wong"
To: Filipe Manana
Cc: Shinichiro Kawasaki, Zorro Lang, "fstests@vger.kernel.org", Linux NFS Mailing List, Chuck Lever III, "djwong@vger.kernel.org", "linux-xfs@vger.kernel.org"
Subject: Re: generic/650 makes v6.0-rc client unusable
References: <3E21DFEA-8DF7-484B-8122-D578BFF7F9E0@oracle.com> <20220904131553.bqdsfbfhmdpuujd3@zlang-mailbox> <20221109041951.wlgxac3buutvettq@shindev>
X-Mailing-List: linux-nfs@vger.kernel.org

On Wed, Nov 09, 2022 at 10:36:04AM +0000, Filipe Manana wrote:
> On Wed, Nov 9, 2022 at 4:22 AM Shinichiro Kawasaki
> wrote:
> >
> > On Sep 04, 2022 / 21:15, Zorro Lang wrote:
> > > On Sat, Sep 03, 2022 at 06:43:29PM +0000, Chuck Lever III wrote:
> > > > While investigating some of the other issues that have been
> > > > reported lately, I've found that my v6.0-rc3 NFS/TCP client
> > > > goes off the rails often (but not always) during generic/650.
> > > >
> > > > This is the test that runs a workload while offlining and
> > > > onlining CPUs. My test client has 12 physical cores.
> > > >
> > > > The test appears to start normally, but then after a bit
> > > > the NFS server workload drops to zero and the NFS mount
> > > > disappears. I can't run programs (sudo, for example) on
> > > > the client. Can't log in, even on the console. The console
> > > > has a constant stream of "can't rotate log: Input/Output
> > > > error" type messages.
> >
> > I also observe this failure when I ran fstests using btrfs on my HDDs.
> > The failure is recreated almost always.
>
> I'm wondering what do you get in dmesg, any traces?
>
> I've excluded the test from my runs for over an year now, due to some
> crash that I reported
> to the mm and cpu hotplug people here:
>
> https://lore.kernel.org/linux-mm/CAL3q7H4AyrZ5erimDyO7mOVeppd5BeMw3CS=wGbzrMZrp56ktA@mail.gmail.com/
>
> Unfortunately I had no reply from anyone who works or maintains those
> subsystems.
>
> It didn't happen very often, and I haven't tested again with recent kernels.

I've been testing with xfs/btrfs/ext4 nightly, and haven't seen any
problems with the last two. There's some very infrequent log accounting
problem that is probably a regression from Dave's recent round of log
refactorings, so once we're clear of the write race corruption problem,
I intend to inquire about that.

Granted I also don't have hundreds-of-cpus machines to test this kind of
stuff, so I don't know how well hotplug mania fares on a big iron.

I don't think it's valid to remove a test from the auto group because it
uncovers bugs. If test runner folks want to put it in their own exclude
lists for their own convenience, that's fine with me.

--D

> >
> > > >
> > > > I haven't looked further into this yet. Actually I'm not
> > > > quite sure where to start looking.
> > > >
> > > > I recently switched this client from a local /home to an
> > > > NFS-mounted one, and that's where the xfstests are built
> > > > and run from, fwiw.
> > >
> > > If most of users complain generic/650, I'd like to exclude g/650 from the
> > > "auto" default run group. Any more points?
> >
> > +1. I wish to remove it from the "auto" group. Since I can not login to the test
> > machine after the failure, I suggest to put it in the "dangerous" group.
> >
> > --
> > Shin'ichiro Kawasaki
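
Illustration only, not from anyone on the thread: a minimal sketch of the
"own exclude list" idea discussed above, where a test runner drops
generic/650 locally while the test stays in the "auto" group. The file
format (one test name per line, '#' for comments) and the function names
are assumptions made for this example; this is not how fstests itself
implements exclusion.

```python
def parse_exclude_list(text):
    """Return the set of test names found in an exclude file's text.

    Assumed (hypothetical) format: one test name per line; anything
    after '#' is a comment; blank lines are ignored.
    """
    excluded = set()
    for line in text.splitlines():
        name = line.split("#", 1)[0].strip()
        if name:
            excluded.add(name)
    return excluded


def filter_tests(tests, excluded):
    """Keep the original order, dropping any test named in `excluded`."""
    return [t for t in tests if t not in excluded]


# Hypothetical slice of the "auto" group and a runner-local exclude file.
auto_group = ["generic/001", "generic/650", "generic/651"]
local_excludes = parse_exclude_list(
    "# makes my NFS client unusable, skip locally\n"
    "generic/650\n"
)
print(filter_tests(auto_group, local_excludes))
```

The point of keeping the filter on the runner's side is that the group
membership stays untouched for everyone else, so the test keeps running
wherever it does not take the machine down.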