Date: Sat, 18 Nov 2023 16:36:45 +0100
From: Aurelien Jarno
To: Jiaxun Yang, linux-mips@vger.kernel.org, linux-kernel@vger.kernel.org, tsbogend@alpha.franken.de, syq@debian.org, stable@vger.kernel.org
Subject: Re: [PATCH] MIPS: process: Remove lazy context flags for new kernel thread
References: <20231026111715.1281728-1-jiaxun.yang@flygoat.com>
X-Mailing-List: linux-kernel@vger.kernel.org

Hi,

On 2023-10-27 16:58, Aurelien Jarno wrote:
> On 2023-10-26 12:17, Jiaxun Yang wrote:
> > We received a report from debian infra team, says their build machine
> > crashes regularly with:
> >
> > [ 4066.698500] do_cpu invoked from kernel context![#1]:
> > [ 4066.703455] CPU: 1 PID: 76608 Comm: iou-sqp-76326 Not tainted 5.10.0-21-loongson-3 #1 Debian 5.10.162-1
> > [ 4066.712793] Hardware name: Loongson Lemote-3A4000-7A-1w-V1.00-A1901/Lemote-3A4000-7A-1w-V1.00-A1901, BIOS Loongson-PMON-V3.3-20201222 12/22/2020
> > [ 4066.725672] $ 0 : 0000000000000000 ffffffff80bf2e48 0000000000000001 9800000200804000
> > [ 4066.733642] $ 4 : 9800000105115280 ffffffff80db4728 0000000000000008 0000020080000200
> > [ 4066.741607] $ 8 : 0000000000000001 0000000000000001 0000000000000000 0000000002e85400
> > [ 4066.749571] $12 : 000000005400cce0 ffffffff80199c00 000000000000036f 000000000000036f
> > [ 4066.757536] $16 : 980000010025c080 ffffffff80ec4740 0000000000000000 980000000234b8c0
> > [ 4066.765501] $20 : ffffffff80ec5ce0 9800000105115280 98000001051158a0 0000000000000000
> > [ 4066.773466] $24 : 0000000000000028 9800000200807e58
> > [ 4066.781431] $28 : 9800000200804000 9800000200807d40 980000000234b8c0 ffffffff80bf3074
> > [ 4066.789395] Hi : 00000000000002fb
> > [ 4066.792943] Lo : 00000000428f6816
> > [ 4066.796500] epc : ffffffff802177c0 _save_fp+0x10/0xa0
> > [ 4066.801695] ra : ffffffff80bf3074 __schedule+0x804/0xe08
> > [ 4066.807230] Status: 5400cce2 KX SX UX KERNEL EXL
> > [ 4066.811917] Cause : 1000002c (ExcCode 0b)
> > [ 4066.815899] PrId : 0014c004 (ICT Loongson-3)
> > [ 4066.820228] Modules linked in: asix usbnet mii sg ip6t_REJECT nf_reject_ipv6 ip6table_filter ip6_tables nfnetlink_log nfnetlink xt_hashlimit ipt_REJECT nf_reject_ipv4 xt_NFLOG xt_multiport xt_tcpudp xt_state xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c iptable_filter sch_fq tcp_bbr fuse drm drm_panel_orientation_quirks configfs ip_tables x_tables autofs4 ext4 crc16 mbcache jbd2 crc32c_generic ohci_pci dm_mod r8169 realtek mdio_devres ohci_hcd ehci_pci of_mdio xhci_pci fixed_phy xhci_hcd ehci_hcd libphy usbcore usb_common
> > [ 4066.868085] Process iou-sqp-76326 (pid: 76608, threadinfo=0000000056dd346c, task=000000001209ac62, tls=000000fff18298e0)
> > [ 4066.878897] Stack : ffffffff80ec0000 0000000000000000 ffffffff80ec0000 980000010db34100
> > [ 4066.886867]         9800000100000004 d253a55201683fdc 9800000105115280 0000000000000000
> > [ 4066.894832]         0000000000000000 0000000000000001 980000010db340e8 0000000000000001
> > [ 4066.902796]         0000000000000004 0000000000000000 980000010db33d28 ffffffff80bf36d0
> > [ 4066.910761]         980000010db340e8 980000010db34100 980000010db340c8 ffffffff8070d740
> > [ 4066.918726]         980000010946cc80 9800000104b56c80 980000010db340c0 0000000000000000
> > [ 4066.926690]         ffffffff80ec0000 980000010db340c8 980000010025c080 ffffffff80ec5ce0
> > [ 4066.934654]         0000000000000000 9800000105115280 ffffffff802c59b8 980000010db34108
> > [ 4066.942619]         980000010db34108 2d7071732d756f69 ffff003632333637 d253a55201683fdc
> > [ 4066.950585]         ffffffff8070d1c8 980000010db340c0 98000001092276c8 000000007400cce0
> > [ 4066.958552]         ...
> > [ 4066.960981] Call Trace:
> > [ 4066.963414] [] _save_fp+0x10/0xa0
> > [ 4066.968270] [] __schedule+0x804/0xe08
> > [ 4066.973462] [] schedule+0x58/0x150
> > [ 4066.978397] [] io_sq_thread+0x578/0x5a0
> > [ 4066.983764] [] ret_from_kernel_thread+0x14/0x1c
> > [ 4066.989823]
> > [ 4066.991297] Code: 000c6940 05a10011 00000000 f4830b10 f4850b30 f4870b50 f4890b70 f48b0b90
> >
> > It seems like kernel is trying to save a FP context for a kthread.
> > Since we don't use FPU in kernel for now, TIF_USEDFPU must be set
> > accidentally for that kthread.
> >
> > Inspecting the code it seems like create_io_thread may be invoked
> > from threads that have FP context alive, causing TIF_USEDFPU to be
> > copied from that context to kthread unexpectedly.
> >
> > Move around code blocks to ensure flags regarding lazy hardware
> > context get cleared for kernel threads as well.
> >
> > Cc: stable@vger.kernel.org
> > Reported-by: Aurelien Jarno
> > Signed-off-by: Jiaxun Yang
>
> Thanks for the patch. In the meantime we have found that the problem is
> reproducible by building the kitinerary package. The crash happens when
> cmake starts the build. It's not impossible that other packages are able
> to also trigger the crash, but we haven't identified them yet.
>
> Anyway, I have been able to test a backport of the patch onto the 5.10
> kernel (with minor adjustments) and I confirm it fixes the reported
> issue.
>
> Tested-by: Aurelien Jarno

It seems that this patch hasn't been merged yet, either in Linus' tree
or in the MIPS tree. Is there anything blocking it?

Regards
Aurelien

-- 
Aurelien Jarno                          GPG: 4096R/1DDD8C9B
aurelien@aurel32.net                 http://aurel32.net