Received: by 2002:a25:1985:0:0:0:0:0 with SMTP id 127csp2785670ybz; Mon, 27 Apr 2020 04:35:16 -0700 (PDT) X-Google-Smtp-Source: APiQypL/QMNK0FpzvblP9m4aXbbS432jtDtfYxl4aoAIrmS6QjsEKJviOTuwUfvfYxQFg6zN94om X-Received: by 2002:a17:906:3709:: with SMTP id d9mr20002957ejc.94.1587987315951; Mon, 27 Apr 2020 04:35:15 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1587987315; cv=none; d=google.com; s=arc-20160816; b=uz5VWaAzFXWbGGOYTN1RfmLyW1GZpvuj5sjjCEawNn4VKqBOT2j5XfhEwlqZT7lkVq TGRy4uaChGvJLbd4sP9wdKYW3GRaQb63vAj9AzzA0b8NmksH2uW1DQFXfYmehmn1I2mF ylc4T/XxTXj4y4Y9JF4eTcSbdmvI6Db73R40FnE4qo7VGOsX5TKXBWOrAnSWHT/q3tmZ 7YRSOwFaigKhQBK14bGiDUxXIlk9ChswaYnumsMAifS8WrBRIpKNvnsOYcsfqf0RCaKk W7qYMQVHKo899tif+gqvevwUv7xC3c8rnF+aGen/85Wrs4hnoOGzfPhbFIvVqOPmiRhG Ae0g== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:in-reply-to:content-disposition :mime-version:references:message-id:subject:cc:to:from:date :dkim-signature; bh=c8lcvtEYC9y6uEniKrRVr+DemGbYY65JfTRYgvU5zbs=; b=eDR7f8WNF2riIgX2sIFN9vOtNBHNnAgW3+Qn3VK6242tMQ5SeSnUbER+tCp1ycCEza dzAuLYDyD0oK8KLD5llsTFQlx4m4wi50styFgf2d9P845jj5MGUgRBxhlp3Wm5jJcPhs kQbnF22Tzq1PgeCQFu6vPPIKvZssr5vtTWQ5C5Vt9lYDecFvT358H1dYVwtXksZx85Jz ya2ekGAFPVHFYAbGkyr7TuHOu/uuZD8VhepnHw15FII7IcXY6db8TLIj8ZLgH/pa1wZ2 dVRzTsupnsC5iIflZw6YmHAcRDpx8Hn4BjeidWZ8KvSP+YMbJvYqNMwRODkljEjK5n6M JmbA== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@kernel.org header.s=default header.b=lJfkm9oi; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id p25si8080085eji.408.2020.04.27.04.34.48; Mon, 27 Apr 2020 04:35:15 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; dkim=pass header.i=@kernel.org header.s=default header.b=lJfkm9oi; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727103AbgD0Lc0 (ORCPT + 99 others); Mon, 27 Apr 2020 07:32:26 -0400 Received: from mail.kernel.org ([198.145.29.99]:40294 "EHLO mail.kernel.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727074AbgD0LcX (ORCPT ); Mon, 27 Apr 2020 07:32:23 -0400 Received: from localhost (unknown [213.57.247.131]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPSA id 0128E206A4; Mon, 27 Apr 2020 11:32:21 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=default; t=1587987142; bh=HyZEmG6wmcdniR2Viwm5NPHIbC+E2pVAsWUXqBU5HWg=; h=Date:From:To:Cc:Subject:References:In-Reply-To:From; b=lJfkm9oiIt/XWlTLQ0TOgPbLC3asB2J0JzUMyvTUz9w8/BsLgMnd/hm0lF+RZhRKl 5CTU+CitEMpThKsklh5bwutbGd0kBFxqV+4UfKTtbmnmE00Vzy6vGQcsrqxCQVpIxv j6Qc4YCqMSfXRpIdePLZGmxtAa1S0y3RRIyBamSU= Date: Mon, 27 Apr 2020 14:32:18 +0300 From: Leon Romanovsky To: Thomas Gleixner Cc: Ingo Molnar , Ingo Molnar , Borislav Petkov , "H. Peter Anvin" , x86 , Suresh Siddha , linux-kernel@vger.kernel.org Subject: Re: [PATCH] x86/apic: Fix circular locking dependency between console and hrtimer locks Message-ID: <20200427113218.GB134660@unreal> References: <20200407170925.1775019-1-leon@kernel.org> <20200414054836.GA956407@unreal> <20200414062454.GA84326@gmail.com> <87tv15qj5u.fsf@nanos.tec.linutronix.de> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <87tv15qj5u.fsf@nanos.tec.linutronix.de> Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Mon, Apr 27, 2020 at 01:09:49PM +0200, Thomas Gleixner wrote: > Ingo Molnar writes: > > * Leon Romanovsky wrote: > > The fix definitely looks legit, lockdep is right that we shouldn't take > > the console_sem.lock even under trylock. > > > > It's only a printk_once(), yet I'm wondering why in the last ~8 years > > this never triggered. Nobody ever ran lockdep and debug console level > > enabled on such hardware, or did something else change? > > > > One possibility would be that apic_check_deadline_errata() marked almost > > all Intel systems as broken and the TSC-deadline hardware never actually > > got activated. In that case you have triggered rarely tested code and > > might see other weirdnesses. Just saying. :-) > > > > Or a bootup with "debug" specified is much more rare in production > > systems, hence the 8 years old bug. > > None of this makes any sense at all. > > The local APIC timer (in this case the TSC deadline timer) is set up > during early boot on the boot CPU (before SMP setup) with this call > chain: > > smp_prepare_cpus() > native_smp_prepare_cpus() > x86_init.timers.setup_percpu_clockev() > setup_boot_APIC_clock() > setup_APIC_timer() > clockevents_config_and_register() > tick_check_new_device() > tick_setup_device() > tick_setup_oneshot() > clockevents_switch_state() > lapic_timer_set_oneshot() > __setup_APIC_LVTT() > printk_once(...) > > Nothing holds hrtimer.base_lock in this call chain. Can't printk hold that lock through console/netconsole? > > But the lockdep splat clearly says: > > [ 735.324357] stack backtrace: > [ 735.324360] CPU: 3 PID: 0 Comm: swapper/3 Not tainted 5.6.0-for-upstream-dbg-2020-04-03_10-44-43-70 #1 > > ... > > So how can that be the first invocation of that printk_once()? > > While the patch looks innocent, it papers over the underlying problem > and wild theories are not really helping here. > > Here is a boot log excerpt with lockdep enabled and 'debug' on the > command line: > > [ 0.000000] Linux version 5.7.0-rc3 ... > ... > [ 3.992125] TSC deadline timer enabled > [ 3.995820] smpboot: CPU0: Intel(R) .... > ... > [ 4.050766] smp: Bringing up secondary CPUs ... > > No splat nothing. The real question is WHY this triggers on Leons > machine 735 seconds after boot and on CPU3. I want to believe that the timestamp are not correct, have no clue if it is even possible. But let's talk about facts: 1. It is started after -rc1 (we don't test linux-next). 2. This workaround helped to eliminate the splat. 3. My machine experiences the extra splat all the time https://lore.kernel.org/lkml/20200414070502.GR334007@unreal/ Unfortunately, I can't bisect because the failure mentioned in the commit message because it doesn't happen on one machine all the time, but when we are talking about night run regression, at least one of the runners hits such lockdep prints. I can add to our regression any debug patch and get results day after, if it helps. Thanks > > Thanks, > > tglx