From: Song Liu
Date: Thu, 7 Dec 2023 09:37:32 -0800
Subject: Re: md raid6 oops in 6.6.4 stable
To: Genes Lists
Cc: Guoqing Jiang, Bagas Sanjaya, snitzer@kernel.org, yukuai3@huawei.com, axboe@kernel.dk, mpatocka@redhat.com, heinzm@redhat.com, Linux Kernel Mailing List, Linux RAID, Linux Regressions, Bhanu Victor DiCara <00bvd0+linux@gmail.com>, Xiao Ni, Greg Kroah-Hartman

On Thu, Dec 7, 2023 at 7:58 AM Genes Lists wrote:
>
> On 12/7/23 09:42, Guoqing Jiang wrote:
> > Hi,
> >
> > On 12/7/23 21:55, Genes Lists wrote:
> >> On 12/7/23 08:30, Bagas Sanjaya wrote:
> >>> On Thu, Dec 07, 2023 at 08:10:04AM -0500, Genes Lists wrote:
> >>>> I have not had chance to git bisect this but since it happened in
> >>>> stable I thought it was important to share sooner than later.
> >>>>
> >>>> One possibly relevant commit between 6.6.3 and 6.6.4 could be:
> >>>>
> >>>>   commit 2c975b0b8b11f1ffb1ed538609e2c89d8abf800e
> >>>>   Author: Song Liu
> >>>>   Date:   Fri Nov 17 15:56:30 2023 -0800
> >>>>
> >>>>       md: fix bi_status reporting in md_end_clone_io
> >>>>
> >>>> log attached shows page_fault_oops.
> >>>> Machine was up for 3 days before crash happened.
> >
> > Could you decode the oops (I can't find it in lore for some reason)
> > ([1])? And can it be reproduced reliably? If so, pls share the
> > reproduce step.
> >
> > [1]. https://lwn.net/Articles/592724/
> >
> > Thanks,
> > Guoqing
>
> - reproducing
>   An rsync runs 2 x / day. It copies to this server from another. The
>   copy is from a (large) top level directory. On the 3rd day after booting
>   6.6.4, the second of these rysnc's triggered the oops. I need to do
>   more testing to see if I can reliably reproduce. I have not seen this
>   oops on earlier stable kernels.
>
> - decoding oops with scripts/decode_stacktrace.sh had errors :
>   readelf: Error: Not an ELF file - it has the wrong magic bytes at
>   the start
>
>   It appears that the decode script doesn't handle compressed modules.
>   I changed the readelf line to decompress first. This fixes the above
>   script complaint and the result is attached.

I probably missed something, but I really don't think the commit
(2c975b0b8b11f1ffb1ed538609e2c89d8abf800e) could trigger this issue.

From the trace:

kernel: RIP: 0010:update_io_ticks+0x2c/0x60
=>
  2a:*  f0 48 0f b1 77 28    lock cmpxchg %rsi,0x28(%rdi)   << trapped here.
[...]
kernel: Call Trace:
kernel:
kernel:  ? __die+0x23/0x70
kernel:  ? page_fault_oops+0x171/0x4e0
kernel:  ? exc_page_fault+0x175/0x180
kernel:  ? asm_exc_page_fault+0x26/0x30
kernel:  ? update_io_ticks+0x2c/0x60
kernel:  bdev_end_io_acct+0x63/0x160
kernel:  md_end_clone_io+0x75/0xa0    <<< change in md_end_clone_io

The commit only changes how we update bi_status. But bi_status is not
used or checked anywhere between md_end_clone_io and the trap (the
lock cmpxchg in update_io_ticks). Did I miss something?

Given that the issue takes a very long time to reproduce, maybe it was
already present before 6.6.4?

Thanks,
Song
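
For anyone following the trace above: each frame has the form
symbol+0xoffset/0xsize, and a leading "?" marks an entry the unwinder
found on the stack but could not verify as part of the call chain. A
minimal Python sketch for pulling those fields apart is below. The names
parse_frame and FRAME_RE are hypothetical, chosen for illustration; this
is not part of scripts/decode_stacktrace.sh and does not resolve source
lines.

```python
import re

# Hypothetical helper: matches frames such as
#   "kernel: ? update_io_ticks+0x2c/0x60"
#   "kernel: md_end_clone_io+0x75/0xa0"
FRAME_RE = re.compile(
    r"(?P<unreliable>\?\s+)?"          # optional "? " marker
    r"(?P<symbol>[A-Za-z_][\w.]*)"     # function name
    r"\+0x(?P<offset>[0-9a-f]+)"       # offset into the function
    r"/0x(?P<size>[0-9a-f]+)"          # total size of the function
)


def parse_frame(line):
    """Return the fields of one trace frame, or None if the line has no frame."""
    m = FRAME_RE.search(line)
    if m is None:
        return None
    return {
        "symbol": m.group("symbol"),
        "offset": int(m.group("offset"), 16),
        "size": int(m.group("size"), 16),
        "reliable": m.group("unreliable") is None,
    }


if __name__ == "__main__":
    for text in (
        "kernel: ? update_io_ticks+0x2c/0x60",
        "kernel: bdev_end_io_acct+0x63/0x160",
        "kernel: md_end_clone_io+0x75/0xa0",
    ):
        print(parse_frame(text))
```

In the trace above, only bdev_end_io_acct and md_end_clone_io come out
as reliable frames; the entries from __die through update_io_ticks carry
the "?" marker.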