Date: Tue, 23 Jan 2018 09:21:53 -0800
From: Tejun Heo
To: Steven Rostedt
Cc: Sergey Senozhatsky, Sergey Senozhatsky, Petr Mladek,
    akpm@linux-foundation.org, linux-mm@kvack.org, Cong Wang,
    Dave Hansen, Johannes Weiner, Mel Gorman, Michal Hocko,
    Vlastimil Babka, Peter Zijlstra, Linus Torvalds, Jan Kara,
    Mathieu Desnoyers, Tetsuo Handa, rostedt@home.goodmis.org,
    Byungchul Park, Pavel Machek, linux-kernel@vger.kernel.org
Subject: Re: [PATCH v5 0/2] printk: Console owner and waiter logic cleanup
Message-ID: <20180123172153.GF1771050@devbig577.frc2.facebook.com>
In-Reply-To: <20180123111330.4356ec8d@gandalf.local.home>

Hey,

On Tue, Jan 23, 2018 at 11:13:30AM -0500, Steven Rostedt wrote:
> From what I understand is that there's an issue with one of the printk
> consoles, due to memory pressure or whatnot. Then a printk happens
> within a printk recursively. It gets put into the safe buffer and an
> irq is sent to printk this printk.
>
> The issue you are saying is that when the printk enables interrupts,
> the irq work triggers and loads the log buffer with the safe buffer, and
> then the printk sees the new data added and continues to print, and
> hence never leaves this printk.

I'm not sure it's irq or the same calling context, but yeah whatever
it may be, it keeps adding new data.

> Your solution is to delay the flushing of the safe buffer to another
> thread (work queue), which I also have issues with, because you break
> the "get printks out ASAP mantra". Then the work queue comes in and
> flushes the printks. And since the printks cause printks, we continue
> to spam the machine, but hey, we are making forward progress.

I'm not sure "get printks out ASAP mantra" is the overriding concern
after spending 20s flushing in an unknown context. I'm honestly
curious. Would that still matter that much at that point?

I went through the recent common crashes in the fleet earlier today
and a good number of them are printk taking too long unnecessarily
escalating the situation (most commonly triggering NMI watchdog).

I'm not saying that this should override other concerns but it seems
clear to me that we're pretty badly exposed on this front.

> Again, this is treating the symptom and not solving the problem.

Or adding a safety net when things go south, but this isn't what I was
trying to argue. I mostly thought your understanding of what I
reported wasn't accurate and wanted to clear that up.

> I really hate delaying printks to another thread, unless we can
> guarantee that that thread is ready to go immediately (basically
> spinning on a run queue waiting to print). Because if the system is
> having issues (which is the main reason for printks to happen), there's
> no guarantee that a work queue or another thread will ever schedule,
> and the safe printk buffer never gets out to the consoles.
>
> I much rather have throttling when recursive printks are detected.
> Make it a 100 lines to print if you want, but then throttle. Because
> once you have 100 lines or so, you will know that printks are causing
> printks, and you don't give a crap about the repeated process. Allow
> one flushing of the printk safe buffers, and then if it happens again,
> throttle it.
>
> Both methods can lose important data. I believe the throttling of
> recursive printks, after 100 prints or whatever, will be the least
> likely to lose important data, because printks caused by printks will
> just keep repeating the same data, and we don't care about repeats. But
> delaying the flushing could very well lose important data that caused
> a lockup.

Hmmm... what you're suggesting still seems more fragile - i.e. when
does that 100 count get reset? OOM prints quite a few lines and if
we're resetting on each line, that two orders of magnitude explosion
of messages can still be really really bad.

And issues like that seem to suggest that the root problem to handle
here is avoiding locking up a context in flushing for too long. Your
approach is trying to avoid causing that but it's a symptom which can
be reached in many different ways.

Thanks.

--
tejun
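
The reset question above can be made concrete with a rough back-of-the-envelope simulation. This is only a sketch and not kernel code: the function name `emitted_lines`, the 50-line report, and the worst-case assumption that every original line triggers an unbounded stream of recursive messages are all hypothetical. Only the budget of 100 comes from the thread.

```python
def emitted_lines(report_lines, budget, reset_per_line):
    """Count lines reaching the console for one multi-line report.

    Worst-case assumption (hypothetical): every original line triggers
    an unbounded stream of recursive messages, so only the throttle
    budget stops the flood.
    """
    total = 0
    remaining = budget
    for _ in range(report_lines):
        if reset_per_line:
            remaining = budget  # hypothetical policy: fresh budget per line
        total += 1              # the original line itself
        total += remaining      # recursive lines let through before throttling
        remaining = 0           # the flood uses up whatever budget was left
    return total


# A 50-line report (e.g. an OOM dump) with a budget of 100 recursive lines.
per_line = emitted_lines(report_lines=50, budget=100, reset_per_line=True)
per_report = emitted_lines(report_lines=50, budget=100, reset_per_line=False)
print(per_line, per_report)
```

With a per-line reset, the output grows with the product of report length and budget; with a single budget for the whole report, it grows with their sum. When the budget resets is therefore what decides whether the throttle bounds the time spent flushing.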