Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S932203AbcCNS4B (ORCPT ); Mon, 14 Mar 2016 14:56:01 -0400 Received: from g4t3428.houston.hp.com ([15.201.208.56]:57767 "EHLO g4t3428.houston.hp.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753642AbcCNSz6 (ORCPT ); Mon, 14 Mar 2016 14:55:58 -0400 Message-ID: <1457984870.6393.271.camel@hpe.com> Subject: Re: [PATCH 2/2] x86/mtrr: Refactor PAT initialization code From: Toshi Kani To: Ingo Molnar Cc: "bp@suse.de" , "hpa@zytor.com" , "tglx@linutronix.de" , "mcgrof@suse.com" , "jgross@suse.com" , "paul.gortmaker@windriver.com" , "x86@kernel.org" , "linux-kernel@vger.kernel.org" Date: Mon, 14 Mar 2016 13:47:50 -0600 In-Reply-To: <20160312161804.GA12688@gmail.com> References: <1457671546-13486-1-git-send-email-toshi.kani@hpe.com> <1457671546-13486-3-git-send-email-toshi.kani@hpe.com> <20160311090140.GA4873@gmail.com> <20160311091301.GA11595@gmail.com> <1457721266.6393.123.camel@hpe.com> <20160312161804.GA12688@gmail.com> Content-Type: text/plain; charset="UTF-8" X-Mailer: Evolution 3.18.4 (3.18.4-1.fc23) Mime-Version: 1.0 Content-Transfer-Encoding: 8bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 5044 Lines: 127 On Sat, 2016-03-12 at 17:18 +0100, Ingo Molnar wrote: > * Toshi Kani wrote: > > > On Fri, 2016-03-11 at 09:13 +0000, Ingo Molnar wrote: > > > * Ingo Molnar wrote: > > > > > > > > > > > * Toshi Kani wrote: > > > > > > > > > MTRR manages PAT initialization as it implements a rendezvous > > > > > handler that initializes PAT as part of MTRR initialization. > > > > > > > > > > When CPU does not support MTRR, ex. qemu32 virtual CPU, MTRR > > > > > simply skips PAT init, which causes PAT left enabled without > > > > > initialization. [...] > > > > > > > > What practical effects does this have to the user? Does the kernel > > > > crash? > > > > > > Btw., I find this omission _highly_ annoying: describing what > > > negative effects a bug _causes in practice_ is the most important > > > part of a changelog. How on earth can an experienced contributor omit > > > such an important component from a patch description? > > > > > > Most readers of changelogs couldn't care less about technical details > > > of how the bug is fixed (of course others will read it so it's nice > > > to have too), but what symptoms a bug causes, how serious is it, > > > whether it should be backported are like super important compared to > > > everything else you wrote - and both the description and the > > > changelogs are totally silent on those topics ... > > > > > > I've seen this in other PAT patches - please try to improve this. > > > > My apology. I agree the importance of describing the negative effect of > > the issue. This case is complicated to describe thoroughly, but here is > > a summary. > > The new changelog looks very good, thanks! > > > The issue was reported as a regression caused by 'commit 9cd25aac1f44 > > ("x86/mm/pat: Emulate PAT when it is disabled")'. So, the goal of this > > patchset is to fix this regression. > > https://lkml.org/lkml/2016/3/3/828 > > So one thing that matters more than anything else in the changelog, the > title! Right now the title is: > >   x86/mtrr: Refactor PAT initialization code > > ... that's a nice title for a true refactoring of the code, but this > isn't really that, the purpose of this fix is to fix a bad Xorg crash for > Qemu users. > > The principle you need to remember is that readers of your changelogs > will be _very happy_ about 'negative' phrases like: > >   bad bug >   Xorg crash >   boot failure >   kernel crash >   NULL dereference > > I.e. the 'best' title for a bug fix is to characterize it in the most > negative truthful fashion in the changelog. It sounds a bit > counterintuitive but it's true.  > > So in this case the best changelog title would be something like: > >   x86/pat: Fix Xorg crashes in Qemu sessions > > People will absolutely _love_ such titles, because: > >   - users who are trying to find mysterious Xorg failures can grep for it > and might find it before it hits a stable kernel they are using > >   - maintainers (like me) are able to see it at a glance that this fix > should go to Linus more urgently than other fixes. (and definitely more > urgently than feature patches.) > >   - stable kernel maintainers and distro backporters can see it > immediately at a glance that they really want this fix. > > So by being intentionally and maximally negative in the title, you are > being very helpful to your fellow developers and users! > > Now consider the original title: > >   x86/mtrr: Refactor PAT initialization code > > 99% of people will glance over such a title, which is not good. > Furhermore, maintainers like me will get _annoyed_ at such titles, > because this neutrally formulated title, while very polite, actively > hides the important detail that these patches fix real negative bugs for > real users. > > Okay? Thanks for all the explanation and guidance! That's very helpful. Yes, I will keep this in mind. > And please also note that in the Linux kernel no-one ever 'blames' other > people for bugs. Bugs are part of the human condition and they happen all > the time as long as they are not introduced by carelessness. So in the > typical case you cannot possibly socially embarrass any good kernel > developer by reporting and fixing a bug he introduced. The typical > reaction you will get is 'oh great, one bug less to worry about!', so > socially you can be absolutely honest and 'impolite' about the negative > effects of bugs. Understood. > > The negative effects of the issue were two failures in Xorg on qemu32 > > env, which was triggered by the fact that its virtual CPU does not > > support MTRR. > > https://lkml.org/lkml/2016/3/4/775 > >  #1. copy_process() failed in the check in reserve_pfn_range() > >  #2. error path in copy_process() then hit WARN_ON_ONCE in > > untrack_pfn(). > > Yeah, it's nice to quote actual crash signatures as well (in a short > form) - because people hitting the crashes often do a google search and > might find the fix based on such patterns. Will do. Thanks! -Toshi