Date: Thu, 31 Mar 2022 21:59:58 +0200
From: Willy Tarreau
To: Ben Greear
Cc: Linux Kernel Mailing List
Subject: Re: Kernel panic - not syncing: Attempted to kill init! exitcode=0x00007f00 since 5.17
Message-ID: <20220331195958.GA26284@1wt.eu>

On Thu, Mar 31, 2022 at 11:36:17AM -0700, Ben Greear wrote:
> On 3/30/22 8:43 PM, Willy Tarreau wrote:
> > Hi Ben,
> >
> > On Wed, Mar 30, 2022 at 02:27:56PM -0700, Ben Greear wrote:
> > > Run /init as init process
> > > Kernel panic - not syncing: Attempted to kill init! exitcode=0x00007f00
> > > /init: error whitsc: Refined TSC clocksource calibration: 2903.996 MHz
> > > le loading shareCPU: 2 PID: 1 Comm: init Not tainted 5.17.0+ #12
> > > d libraries: libclocksource: tsc: mask: 0xffffffffffffffff max_cycles: 0x29dc020bb13, max_idle_ns: 440795273180 ns
> > > rt.so.1: cannot Hardware name: Default string Default string/SKYBAY, BIOS 5.12 08/04/2020
> > > open shared objeCall Trace:
> > > ct file: No such
> > > file or directo dump_stack_lvl+0x47/0x5c
> > > ry
> >
> > The coincidence between this error about your userland "libclocksource"
> > and the messages about the clock sources being refined makes me wonder
> > if there could be an error experienced during this lib's initialization
> > at a moment where the list of clocksources appears empty or opening one
> > of the /sys file is temporarily refused.
> > I suspect that making a much
> > larger or much smaller initrd could change the initialization order
> > enough to prevent such an event from happening, but that sounds a bit
> > odd :-/
> >
> > Willy
>
> For whatever reason, it was quite reproducible yesterday. I notice that it
> often (50+% of the time) failed on soft reboot, but I don't think it failed
> a single time when I then went and powered it down fully and powered it back on.
>
> So possibly it is some un-initialized memory somewhere that is exacerbating
> some problem.
>
> I will keep a watch on these errors and see if they always related to libclocksource.
> Looks like 'rt.so.1' is what it cannot find though? So maybe nothing particular to
> do with /sys?

I really don't know, not knowing these libs. We could imagine that the
former only falls back on the latter when it fails to find what it needs
in /sys. Of course I'm not trying to invent a scenario, just to find a few
rational explanations for what you're observing, as it's certainly strange.

Maybe other tests could involve trying the reset button and also kexec, to
introduce some variation in the reboot methods.

It sort of reminds me of a BIOS bug I faced a decade ago, where a server
would randomly hang on soft reboot, and I finally figured out that the BIOS
couldn't handle reboots triggered by CPUs other than CPU0. The workaround I
proposed then was to always use "taskset -c 0 reboot", and that one would
never fail. Maybe you're observing some variation of this, where some
devices are not correctly initialized or your initrd gets partly corrupted
when loaded from the wrong CPU. Just guessing.

Cheers,
Willy
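For reference, "taskset -c 0 reboot" simply restricts the CPU affinity of the command before running it. Below is a minimal sketch of the same idea in Python, assuming a Linux system where os.sched_setaffinity() is available; the helper names pin_to_cpu and run_pinned are hypothetical, not part of any existing tool. The affinity only applies to the launched process itself: if the local "reboot" merely asks the init system to do the work (as on systemd-based distributions), the final reboot(2) call is not necessarily issued from CPU 0.

```python
import os


def pin_to_cpu(cpu=0):
    """Restrict the caller to a single CPU, like "taskset -c <cpu>".

    pid 0 means "the caller". OSError is raised if the CPU does not
    exist, is offline, or is outside the caller's allowed cpuset.
    """
    os.sched_setaffinity(0, {cpu})


def run_pinned(argv, cpu=0):
    """Pin to `cpu`, then replace this process with `argv` (does not return).

    The affinity mask is preserved across execve(), so the new program
    starts on the chosen CPU and stays there unless it changes its own mask.
    """
    pin_to_cpu(cpu)
    os.execvp(argv[0], argv)
```

Calling run_pinned(["reboot"]) would then be roughly equivalent to "taskset -c 0 reboot".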