Date: Thu, 29 Sep 2022 12:12:39 +0200
From: Petr Mladek
To: Conor Dooley
Cc: Thorsten Leemhuis, Conor Dooley, senozhatsky@chromium.org,
    rostedt@goodmis.org, john.ogness@linutronix.de,
    linux-kernel@vger.kernel.org, regressions@lists.linux.dev
Subject: Re: [resend][bug] low-probability console lockups since 5.19
References: <98f62903-3d6f-30b4-82ef-3b0460824907@leemhuis.info>
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu 2022-09-29 10:29:05, Conor Dooley wrote:
> On Thu, Sep 29, 2022 at 11:06:01AM +0200, Thorsten Leemhuis wrote:
> > Hi Conor
> >
> > On 28.09.22 18:55, Conor Dooley wrote:
> > > On Fri, Sep 23, 2022 at 05:24:17PM +0100, Conor Dooley wrote:
> > >>
> > >> Been bisecting a bug that is causing a boot failure in my CI & have
> > >> ended up here.. The bug in question is a low(ish) probability lock up
> > >> of the serial console, I would estimate about 1-in-5 chance on the
> > >> boards I could actually trigger it on which it has taken me so long
> > >> to realise that this was an actual problem.
> > >> Thinking back on it, there
> > >> were other failures that I would retroactively attribute to this
> > >> problem too, but I had earlycon disabled
> >
> > There is one thing I wonder when skimming this thread: was there maybe
> > some other change somewhere in the kernel between the introduction and
> > the revert of the printk console kthreads patches that is the real
> > culprit here that makes existing, older races easier to hit? But I guess
> > in the end that would be very hard to find and it's easier to fix the
> > problem in the console driver... :-/
>
> Entirely possible that something arrived in the middle, yeah. I've done
> 100s of reboots on that interim section, albeit with the threaded
> printers enabled, as I restarted the bisection several times & never hit
> this failure then.

Interesting. I wonder if the console in use was fixed during the window
when the kthreads were enabled.

> I don't know anything about console/printk/serial drivers unfortunately
> so I will almost certainly not be able to find the problem by
> inspection. I'd rather submit patches than send reports, but I really
> really need some help here. I looked at the two patterns Petr suggested,
> but the former I am not sure applies since the issue is present even
> when earlycon is disabled & the latter appears (to my untrained eye) to
> be accounted for in the 8250 driver.

The problem with the missing port->lock is visible only when the early
console is enabled, but it is really hard to hit without the kthreads.

The problem with enabled IRQs was visible only with kthreads. The
original code called the console->write() callback with IRQs already
disabled. The kthreads called the console->write() callback with IRQs
enabled. That made sense: IRQs need to be disabled only when really
needed, and the tested drivers did this correctly.

Best Regards,
Petr