Received: by 2002:a25:4158:0:0:0:0:0 with SMTP id o85csp2179515yba; Mon, 6 May 2019 00:50:24 -0700 (PDT) X-Google-Smtp-Source: APXvYqzaqiP/R1/9wMtSRHCjCaJk5odcSRUKSSIhBNrwheqDW2LPBZu/oecARex3/C6PE1uwT1ir X-Received: by 2002:a65:63d5:: with SMTP id n21mr30752650pgv.330.1557129024882; Mon, 06 May 2019 00:50:24 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1557129024; cv=none; d=google.com; s=arc-20160816; b=x7rjlH6KG+4LkRMB1d77MdBBgYKV3k/W1L3xJPzXs2q5FD4/yshOyijM3poUZ0E04X K/qPG2kyym1Hspj2lAKEYsEP7EZ2Icv6qdLMt4GhcYozMi4/AtXb3oesCDCx0QITKEmo uE7HlLQ/U7Z4p+ud0mqCZIPQc0mFN7pxnR9LzckHktwUZFm44fZQ2KFcoAN9gOY3oGNy IxZdj9IV5BGASCpFzXWzHyL2lmVJgQOvOdPDv90XGlKerTk8qTnU7tmej5OLkVmy6GaO Kx1aulPERcERW+imKazp0yHk+qiJA0ZdN5GNyOHpI4VvP1CoS3mjh1ro9W0OWASxgQ4S eqFA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:user-agent:in-reply-to :content-disposition:mime-version:references:message-id:subject:cc :to:from:date; bh=rWV04ycEOLQzSYbsYnQxkKoCBhd/XTNstF66uwoIH60=; b=dveQ6sj4Qx9akw1ZW56AkiZQJDZMVf1GQc8hPr0nYg5TmF95ZTtzPuWoqhVpS8WM+s 89LWo4rK5cuIQrODx0AOBg8vMHdsACDegLmZRhfL08iNFXq2OQS/pZnAWhoImgUpfkfd uljlhV9vlRQnLvbMRWz5VbCQT8EhCMOc9QyRmXq6VNyUv7QSReZqgS0AStHJHjxNAZWx OCiz/lCrHiqK96jDrhOAne8t3oCEzZGTqL4CmUYqoBzPDZUwfzedN7w2u0yL1a/OnCUc 3rp4CICpRKWPro6EkVmUhsRxvcdiKt7dEocLevu9HKoTE6ktrrg3BfcqMFTCZppy5Ob9 Tm/Q== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id s16si14219982plr.369.2019.05.06.00.50.09; Mon, 06 May 2019 00:50:24 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726470AbfEFHsM (ORCPT + 99 others); Mon, 6 May 2019 03:48:12 -0400 Received: from mx2.suse.de ([195.135.220.15]:53992 "EHLO mx1.suse.de" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1726438AbfEFHsL (ORCPT ); Mon, 6 May 2019 03:48:11 -0400 X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (unknown [195.135.220.254]) by mx1.suse.de (Postfix) with ESMTP id C5F40AE27; Mon, 6 May 2019 07:48:09 +0000 (UTC) Date: Mon, 6 May 2019 09:48:09 +0200 From: Petr Mladek To: Daniel Vetter Cc: DRI Development , Intel Graphics Development , Daniel Vetter , Peter Zijlstra , Ingo Molnar , Will Deacon , Sergey Senozhatsky , Steven Rostedt , John Ogness , Linux Kernel Mailing List Subject: Re: [PATCH] RFC: console: hack up console_trylock more Message-ID: <20190506074809.huawsdaynyci5kwz@pathway.suse.cz> References: <20190502141643.21080-1-daniel.vetter@ffwll.ch> <20190503151437.dc2ty2mnddabrz4r@pathway.suse.cz> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: NeoMutt/20170912 (1.9.0) Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Mon 2019-05-06 09:11:37, Daniel Vetter wrote: > On Fri, May 3, 2019 at 5:14 PM Petr Mladek wrote: > > On Thu 2019-05-02 16:16:43, Daniel Vetter wrote: > > > console_trylock, called from within printk, can be called from pretty > > > much anywhere. Including try_to_wake_up. Note that this isn't common, > > > usually the box is in pretty bad shape at that point already. But it > > > really doesn't help when then lockdep jumps in and spams the logs, > > > potentially obscuring the real backtrace we're really interested in. > > > One case I've seen (slightly simplified backtrace): > > > > > > Call Trace: > > > > > > console_trylock+0xe/0x60 > > > vprintk_emit+0xf1/0x320 > > > printk+0x4d/0x69 > > > __warn_printk+0x46/0x90 > > > native_smp_send_reschedule+0x2f/0x40 > > > check_preempt_curr+0x81/0xa0 > > > ttwu_do_wakeup+0x14/0x220 > > > try_to_wake_up+0x218/0x5f0 > > > pollwake+0x6f/0x90 > > > credit_entropy_bits+0x204/0x310 > > > add_interrupt_randomness+0x18f/0x210 > > > handle_irq+0x67/0x160 > > > do_IRQ+0x5e/0x130 > > > common_interrupt+0xf/0xf > > > > > > > > > This alone isn't a problem, but the spinlock in the semaphore is also > > > still held while waking up waiters (up() -> __up() -> try_to_wake_up() > > > callchain), which then closes the runqueue vs. semaphore.lock loop, > > > and upsets lockdep, which issues a circular locking splat to dmesg. > > > Worse it upsets developers, since we don't want to spam dmesg with > > > clutter when the machine is dying already. > > > > > > Fix this by creating a __down_trylock which only trylocks the > > > semaphore.lock. This isn't correct in full generality, but good enough > > > for console_lock: > > > > > > - there's only ever one console_lock holder, we won't fail spuriously > > > because someone is doing a down() or up() while there's still room > > > (unlike other semaphores with count > 1). > > > > > > - console_unlock() has one massive retry loop, which will catch anyone > > > who races the trylock against the up(). This makes sure that no > > > printk lines will get lost. Making the trylock more racy therefore > > > has no further impact. > > > > To be honest, I do not see how this could solve the problem. > > > > The circular dependency is still there. If the new __down_trylock() > > succeeds then console_unlock() will get called in the same context > > and it will still need to call up() -> try_to_wake_up(). > > > > Note that there are many other console_lock() callers that might > > happen in parallel and might appear in the wait queue. > > Hm right. It's very rare we hit this in our CI and I don't know how to > repro otherwise, so just threw this out at the wall to see if it > sticks. I'll try and come up with a new trick then. Single messages are printed from scheduler via printk_deferred(). WARN() might be solved by introducing printk deferred context, see the per-cpu variable printk_context. Best Regards, Petr