2020-10-01 08:55:24

by Thorsten Leemhuis

Subject: [RFC PATCH v1 00/26] Make reporting-bugs easier to grasp and yet more detailed

This series rewrites the "how to report bugs to the Linux kernel maintainers"
document to make it more straightforward and its essence easier to grasp. At
the same time, it makes the text provide a lot more detail about the process in
the form of a reference section, so users who want or need the details have them
at hand.

The goal of this rewrite: improve the quality of the bug reports and reduce the
number of reports that get ignored. This was motivated by many reports of poor
quality the main author of the rewrite stumbled upon when he was tracking
regressions.

For the curious, this is how the text looks in the end:
https://gitlab.com/knurd42/linux/-/raw/reporting-bugs-rfc/Documentation/admin-guide/reporting-bugs.rst

For comparison, here you can find the old text and the commits to it and its
predecessor:
https://www.kernel.org/doc/html/latest/admin-guide/reporting-bugs.html
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/plain/Documentation/admin-guide/reporting-bugs.rst
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/log/Documentation/admin-guide/reporting-bugs.rst
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/log/REPORTING-BUGS

This is an early RFC and likely has some spelling and grammatical mistakes.
Sorry for that, the main author is not a native English speaker and makes too
many of those mistakes even in his mother tongue. He used hunspell and
LanguageTool to find errors, but noticed those tools miss quite a few mistakes.
Hopefully it's not too bad.

The main author of the rewrite is also fully aware the text got quite long in
the end. That happened as he tried to make users avoid many of the problems he
noticed in bug reports, which needed quite a bit of space to describe.
Nevertheless, he tried to make sure the text uses a structure where only those
that want to know all the details have to read it. That's mainly realized with
the help of the TL;DR and the short guide at the top of the document. Those
should be good enough for a lot of situations.

There are a few points that will need to be discussed. The comments in the
individual patches will point some of those out; that for example includes
things like "dual licensing under CC-BY 4.0", "are we asking too much from users
when telling them to test mainline?", and "CC LKML or something else on all
reports?". But a few points are best raised here:

* The old and the new reporting-bugs text take a totally different approach to
bugzilla.kernel.org. The old mentions it as the place to file your issue if
you don't know where to go. The new one mentions it rarely and most of the
time warns users that it's often the wrong place to go. This approach was
chosen as the main author noticed quite a few users (or even a lot?) get no
reply to the bugs they file in bugzilla. That's kind of expected, as quite a
few (many? most?) of the maintainers don't even get notified when reports for
their subsystem get filed there. Anyway: not getting a reply is something
that is just annoying for users and might make them angry. Improving bugzilla
would be an option, but at the Kernel and Maintainers Summit 2017 (sorry it
took so long) it was agreed to first go this route, as it's easier to
reach and less controversial, as many maintainers likely are unwilling to
deal with bugzilla.

* The text states "see above" or "see below" in a few places. Should those be
proper links? But then some anchors will need to be placed manually in a few
places, which slightly hurts readability of the plain text. Could RST or
autosectionlabel help here somewhat (without changing the line
"autosectionlabel_maxdepth = 2" in Documentation/conf.py, which likely is
unwanted)?

* The new text avoids the word "bug" and uses "issues" instead, as users face
issues which might or might not be caused by bugs. Due to this approach it
might make sense to rename the document to "reporting-issues". But for now
everything is left as it is, as changing the name of a well known file has
downsides; but maybe at least the document's headline should get the
s/bugs/issues/ treatment.

* How to make sure everybody that cares gets a chance to review this? As this is
an early RFC, the author chose to send it only to the docs maintainer,
linux-docs and LKML, to see how well this approach is received in general.
Once it is agreed that this is the route forward, a lot of other people need
to be CCed to review it; the stable maintainers for example should check if
the section on handling issues with stable and longterm kernels is acceptable
for them. In the end it's something a lot of maintainers might want to take
at least a quick look at, as they will be dealing with the reports. But there
is no easy way to contact all of them (apart from CCing all of them), as most
of them likely don't read LKML anymore. Should the author maybe abuse
ksummit-discuss, as this likely will reach all the major stakeholders? Side
note: maybe it would be good to have a list for things like this on vger...

The patch series is against docs-next and can also be found on gitlab:
git://[email protected]:knurd42/linux.git reporting-bugs-rfc

Strictly speaking this series is not bisectable, as the old text is left in
place and removed slowly by the patches in the series when they add new text
that covers the same aspect. Thus, both old and new text are incomplete or
inconsistent (and thus would not build, if we were talking about code). But that is
only relevant for those that read the text before the series is fully applied.
That seemed like an acceptable downside in this case, as this makes it easier to
compare the old and new approach.

Note: The main author is not a developer, so he will have gotten a few things in
the procedure wrong. Let him know if you spot something where things are off.

Thorsten Leemhuis (26):
docs: reporting-bugs: temporary markers for licensing and diff reasons
docs: reporting-bugs: Create a TLDR how to report issues
docs: reporting-bugs: step-by-step guide on how to report issues
docs: reporting-bugs: step-by-step guide for issues in stable &
longterm
docs: reporting-bugs: begin reference section providing details
docs: reporting-bugs: point out we only care about fresh vanilla
kernels
docs: reporting-bugs: let users classify their issue
docs: reporting-bugs: make readers check the taint flag
docs: reporting-bugs: help users find the proper place for their
report
docs: reporting-bugs: remind people to look for existing reports
docs: reporting-bugs: remind people to back up their data
docs: reporting-bugs: tell users to disable DKMS et al.
docs: reporting-bugs: point out the environment might be causing issue
docs: reporting-bugs: make users write notes, one for each issue
docs: reporting-bugs: make readers test mainline, but leave a loophole
docs: reporting-bugs: let users check taint status again
docs: reporting-bugs: explain options if reproducing on mainline fails
docs: reporting-bugs: let users optimize their notes
docs: reporting-bugs: decode failure messages [need help]
docs: reporting-bugs: instructions for handling regressions
docs: reporting-bugs: details on writing and sending the report
docs: reporting-bugs: explain what users should do once the report got
out
docs: reporting-bugs: details for issues specific to stable and
longterm
docs: reporting-bugs: explain why users might get neither reply nor
fix
docs: reporting-bugs: explain things could be easier
docs: reporting-bugs: add SPDX tag and license hint, remove markers

Documentation/admin-guide/bug-bisect.rst | 2 +
Documentation/admin-guide/reporting-bugs.rst | 1586 +++++++++++++++--
Documentation/admin-guide/tainted-kernels.rst | 2 +
scripts/ver_linux | 81 -
4 files changed, 1441 insertions(+), 230 deletions(-)
delete mode 100755 scripts/ver_linux


base-commit: e0bc9cf0a7d527ff140f851f6f1a815cc5c48fea
--
2.26.2


2020-10-01 08:57:55

by Thorsten Leemhuis

Subject: [RFC PATCH v1 04/26] docs: reporting-bugs: step-by-step guide for issues in stable & longterm

Handle stable and longterm kernels in a subsection, as dealing with them
directly in the main part of the step-by-step guide turned out to make
it messy and hard to follow: it looked a bit like code with a large
number of if-then-else sections to handle special cases, which made the
default code-flow hard to understand.

Yet again each step will later be repeated in a reference section and
described in more detail.

Signed-off-by: Thorsten Leemhuis <[email protected]>
---
Documentation/admin-guide/reporting-bugs.rst | 49 ++++++++++++++++++++
1 file changed, 49 insertions(+)

diff --git a/Documentation/admin-guide/reporting-bugs.rst b/Documentation/admin-guide/reporting-bugs.rst
index 203df36af55f..e0a6f4328e87 100644
--- a/Documentation/admin-guide/reporting-bugs.rst
+++ b/Documentation/admin-guide/reporting-bugs.rst
@@ -156,6 +156,55 @@ After these preparations you'll now enter the main part:
yourself, if you don't get any help or if it is unsatisfying.


+Reporting issues only occurring in older kernel version lines
+-------------------------------------------------------------
+
+This section is for you, if you tried the latest mainline kernel as outlined
+above, but failed to reproduce your issue there; at the same time you want to
+see the issue fixed in older version lines or a vendor kernel that's regularly
+rebased on new stable or longterm releases. In that case follow these steps:
+
+ * Prepare yourself for the possibility that going through the next few steps
+ might not get the issue solved in older releases: the fix might be too big or
+ risky to get backported there.
+
+ * Check if the kernel developers still maintain the Linux kernel version line
+ you care about: go to `the front-page of kernel.org <https://kernel.org>`_
+ and make sure it mentions the latest release of the particular version line
+ without an '[EOL]' tag.
+
+ * Check the `archives of the Linux stable mailing list
+ <https://lore.kernel.org/stable/>`_ for existing reports.
+
+ * Install the latest release from the particular version line as a vanilla
+ kernel. Ensure this kernel is not tainted and still shows the problem, as the
+ issue might have already been fixed there.
+
+ * Search the Linux kernel version control system for the change that fixed
+ the issue in mainline, as its commit message might tell you if the fix is
+ scheduled for backporting already. If you don't find anything that way,
+ search the appropriate mailing lists for posts that discuss such an issue or
+ peer-review possible fixes. That might lead you to the commit with the fix
+ or tell you if it's unsuitable for backporting. If backporting was not
+ considered at all, join the newest discussion, asking if it's in the cards.
+
+ * Check if you're dealing with a regression that was never present in
+ mainline by installing the first release of the version line you care about.
+ If the issue doesn't show up with it, you basically need to report the issue
+ with this version like you would report a problem with mainline (see above).
+ This ideally includes a bisection followed by a search for existing reports
+ on the net; with the help of the subject and the two relevant commit-ids. If
+ that doesn't turn up anything, write the report; CC or forward the report to
+ the stable maintainers, the stable mailing list, and those that authored the
+ change. Include the shortened commit-id if you found the change that causes
+ it.
+
+ * One of the former steps should lead to a solution. If that doesn't work out,
+ ask the maintainers for the subsystem that seems to be causing the issue for
+ advice; CC the mailing list for the particular subsystem as well as the
+ stable mailing list.
+
+
.. ############################################################################
.. Temporary marker added while this document is rewritten. Sections above
.. are new and dual-licensed under GPLv2+ and CC-BY 4.0, those below are old.
--
2.26.2
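
A minimal sketch, not part of the patch, of the "is this version line still
maintained?" check from the second step above. It assumes kernel.org publishes
a machine-readable feed at https://www.kernel.org/releases.json whose entries
carry a "version" string and an "iseol" flag; that URL, those field names, and
the helper names below are assumptions for illustration, and the front page
remains the reference the text points to.

```python
import json
from urllib.request import urlopen

# Assumed location of a machine-readable release feed (not named in the patch).
RELEASES_URL = "https://www.kernel.org/releases.json"


def version_line(version: str) -> str:
    """Return the version line ('5.8') for a release such as '5.8.12'."""
    return ".".join(version.split(".")[:2])


def line_status(feed: dict, line: str) -> str:
    """Classify a version line as 'maintained', 'eol' or 'not listed'.

    `feed` is expected to look like {"releases": [{"version": ...,
    "iseol": ...}, ...]}; this shape is an assumption about the feed.
    """
    for release in feed.get("releases", []):
        # Pre-releases such as '5.9-rc7' map to '5.9-rc7' and never match.
        if version_line(release.get("version", "")) != line:
            continue
        return "eol" if release.get("iseol") else "maintained"
    return "not listed"


def fetch_feed(url: str = RELEASES_URL) -> dict:
    """Download and parse the feed; needs network access."""
    with urlopen(url, timeout=10) as response:
        return json.load(response)
```

Calling `line_status(fetch_feed(), version_line("5.8.12"))` would then report
whether the 5.8 line is still listed without an end-of-life marker. A line
that is 'not listed' is most likely long unmaintained.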

2020-10-01 08:58:01

by Thorsten Leemhuis

Subject: [RFC PATCH v1 03/26] docs: reporting-bugs: step-by-step guide on how to report issues

Add a more detailed section on how to report bugs to the Linux kernel
developers that nevertheless still is shorter, more straight-forward and
thus easier to grasp than the old text. It should provide enough details
for most users, even if it still leaves a lot of things unexplained.
Some of them can be important, that's why later patches will add a
reference section describing each of the steps and the motivation for it
in more detail. The text of the particular step will be repeated there
as introduction.

The order of the steps was chosen in the interest of the users: make
sure they get the basics right before they do more complicated,
time-consuming, and dangerous tasks. Some of them also explain a few
basics that might seem natural to kernel developers, but are things that
people often get wrong.

Signed-off-by: Thorsten Leemhuis <[email protected]>
---
Documentation/admin-guide/reporting-bugs.rst | 103 +++++++++++++++++++
1 file changed, 103 insertions(+)

diff --git a/Documentation/admin-guide/reporting-bugs.rst b/Documentation/admin-guide/reporting-bugs.rst
index 7bde6f32ff72..203df36af55f 100644
--- a/Documentation/admin-guide/reporting-bugs.rst
+++ b/Documentation/admin-guide/reporting-bugs.rst
@@ -53,6 +53,109 @@ Security issues are typically best report privately; also CC the security team
or forward your report there.


+Step-by-step guide how to report issues to the kernel maintainers
+=================================================================
+
+The TL;DR above outlines roughly how to report issues to the Linux kernel
+developers. It might be all that's needed for people already familiar with
+reporting issues to Free/Libre & Open Source Software (FLOSS) projects. For
+everyone else there is this section. It is more detailed and uses a
+step-by-step approach. It still tries to be brief for readability; if it's too
+brief for you, look up the details in the reference section below, where each
+of the steps is explained in more detail.
+
+Note, this section covers a few more aspects than the TL;DR and does things in a
+slightly different order. That's in your interest, to make sure you notice early
+if an issue that looks like a Linux kernel problem is actually caused by
+something else. These steps thus help to ensure the time you invest in this
+process won't feel wasted in the end:
+
+ * Stop reading this document and report the problem to your vendor instead,
+ unless you are running a vanilla mainline kernel already or are willing to
+ install it.
+
+ * See if the issue you are dealing with qualifies as regression, security
+ issue, or a really severe problem: those are 'issues of high priority' that
+ need special handling in some steps that are about to follow.
+
+ * Check if your kernel was 'tainted' when the issue occurred, as the event that
+ made the kernel set this flag might be causing the issue you face.
+
+ * Locate the driver or kernel subsystem that seems to be causing the issue.
+ Find out how and where its developers expect reports. Note: most of the time
+ this won't be `bugzilla.kernel.org <https://bugzilla.kernel.org/>`_, as issues
+ typically need to be sent by mail to a maintainer and a public mailing list.
+
+ * Search the archives of the bug tracker or mailing list in question
+ thoroughly for reports that might match your issue. Also check if you find
+ something with your favorite internet search engine or in the `Linux Kernel
+ Mailing List (LKML) archives <https://lore.kernel.org/lkml/>`_. If you find
+ anything, join the discussion instead of sending a new report.
+
+ * Create a fresh backup and put system repair and restore tools at hand.
+
+ * Ensure your system does not enhance its kernels by building additional
+ kernel modules on-the-fly locally, which solutions like DKMS might be doing
+ without your knowledge.
+
+ * Make sure it's not the kernel's surroundings that are causing the issue you
+ face.
+
+ * Write down coarsely how to reproduce the issue. If you deal with multiple
+ issues at once, create separate notes for each of them and make sure they
+ work independently on a freshly booted system. That's needed, as each issue
+ needs to get reported to the kernel developers separately, unless they are
+ strongly entangled.
+
+After these preparations you'll now enter the main part:
+
+ * Install the latest Linux mainline kernel: that's where all issues get fixed
+ first, because it's the version line the kernel developers mainly care about.
+ Testing and reporting with the latest Linux stable kernel can be an
+ acceptable alternative in some situations, but is best avoided.
+
+ * Ensure the kernel you just installed does not 'taint' itself when running.
+
+ * Reproduce the issue with the kernel you just installed. If it doesn't show up
+ there, head over to the instructions for issues only happening with stable
+ and longterm kernels if you want to see it fixed there.
+
+ * Optimize your notes: try to find and write the most straightforward way to
+ reproduce your issue. Make sure the end result has all the important details,
+ and at the same time is easy to read and understand for others that hear
+ about it for the first time. And if you learned something in this process,
+ consider searching again for existing reports about the issue.
+
+ * If the failure includes a stack dump, like an Oops does, consider decoding it
+ to find the offending line of code.
+
+ * If your problem is a regression, try to narrow down when the issue was
+ introduced as much as possible.
+
+ * Start to compile the report by writing a detailed description about the
+ issue. Always mention a few things: the latest kernel version you installed
+ for reproducing, the Linux Distribution used, and your notes how to
+ reproduce the issue. Ideally, make the kernel's build configuration (.config)
+ and the output from ``dmesg`` available somewhere on the net and link to it.
+ Include or upload all other information that might be relevant, like the
+ output/screenshot of an Oops or the output from ``lspci``. Once you
+ wrote this main part insert a normal length paragraph on top of it outlining
+ the issue and the impact quickly. On top of this add one sentence that
+ briefly describes the problem and gets people to read on. Now give the thing
+ a descriptive title or subject that yet again is shorter. Then you're ready
+ to send or file the report like the `MAINTAINERS file
+ <https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/plain/MAINTAINERS>`_
+ told you, unless you are dealing with one of those 'issues of high priority':
+ they need special care which is explained in 'Special handling for high
+ priority issues' below.
+
+ * Wait for reactions and keep the thing rolling until you can accept the
+ outcome in one way or the other. Thus react publicly and in a timely manner
+ to any inquiries. Test proposed fixes. Do proactive testing when a new rc1
+ gets released. Send friendly reminders if things stall. And try to help
+ yourself, if you don't get any help or if it is unsatisfying.
+
+
.. ############################################################################
.. Temporary marker added while this document is rewritten. Sections above
.. are new and dual-licensed under GPLv2+ and CC-BY 4.0, those below are old.
--
2.26.2
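
A minimal sketch, not part of the patch, of the report layout the "Start to
compile the report" step describes: a short subject, one teaser sentence, a
normal-length summary paragraph, then the detailed part with kernel version,
distribution, reproduction notes and links. The `Report` class and its field
names are hypothetical and only illustrate the ordering.

```python
from dataclasses import dataclass, field


@dataclass
class Report:
    """Hypothetical container for the pieces the guide asks for."""

    subject: str             # descriptive title, shorter than the teaser
    teaser: str              # one sentence that gets people to read on
    summary: str             # paragraph outlining issue and impact
    kernel_version: str      # latest kernel used for reproducing
    distribution: str        # Linux distribution in use
    reproduction_steps: list
    links: dict = field(default_factory=dict)  # e.g. .config, dmesg output

    def render(self) -> str:
        """Return the report text, shortest parts first, details last."""
        steps = "\n".join(
            f"{number}. {step}"
            for number, step in enumerate(self.reproduction_steps, 1)
        )
        parts = [
            self.teaser,
            self.summary,
            f"Kernel version: {self.kernel_version}\n"
            f"Distribution: {self.distribution}",
            "Steps to reproduce:\n" + steps,
        ]
        if self.links:
            parts.append(
                "\n".join(f"{label}: {url}" for label, url in self.links.items())
            )
        return f"Subject: {self.subject}\n\n" + "\n\n".join(parts) + "\n"
```

The guide suggests writing these parts in the opposite order (details first,
subject last); the sketch only fixes the order in which a reader sees them.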

2020-10-01 08:58:13

by Thorsten Leemhuis

Subject: [RFC PATCH v1 08/26] docs: reporting-bugs: make readers check the taint flag

Tell users early in the process to check the taint flag, as that will
prevent them from investing time into a report that might be worthless.
That way users for example will notice that the issue they face is in
fact caused by an add-on kernel module or an Oops that happened
earlier.

This approach has a downside: users will later have to check the flag
again with the mainline kernel the guide tells them to install. But that
is an acceptable trade-off here, as checking only takes a few seconds
and can easily prevent wasting time in useless testing and debugging.

Signed-off-by: Thorsten Leemhuis <[email protected]>
---

= RFC =

Should "disable DKMS" come before this step? But then the backup step right
before that one would need to be moved as well, as disabling DKMS can mix things
up.
---
Documentation/admin-guide/reporting-bugs.rst | 59 +++++++++++++++++++
Documentation/admin-guide/tainted-kernels.rst | 2 +
2 files changed, 61 insertions(+)

diff --git a/Documentation/admin-guide/reporting-bugs.rst b/Documentation/admin-guide/reporting-bugs.rst
index 430a0c3ee0ad..61b6592ddf74 100644
--- a/Documentation/admin-guide/reporting-bugs.rst
+++ b/Documentation/admin-guide/reporting-bugs.rst
@@ -311,6 +311,65 @@ fatal error where the kernels stop itself) with a 'Oops' (a recoverable error),
as the kernel remains running after an 'Oops'.


+Check 'taint' flag
+------------------
+
+ *Check if your kernel was 'tainted' when the issue occurred, as the event
+ that made the kernel set this flag might be causing the issue you face.*
+
+The kernel marks itself with a 'taint' flag when something happens that might
+lead to follow-up errors that look totally unrelated. The issue you face might
+be such an error if your kernel is tainted. That's why it's in your interest to
+rule this out early before investing more time into this process. This is the
+only reason why this step is here, as this process later will tell you to
+install the latest mainline kernel and check its taint flag, as that's the
+kernel the report will be mainly about.
+
+On a running system it is easy to check if the kernel tainted itself: it's not
+tainted if ``cat /proc/sys/kernel/tainted`` returns '0'. Checking that file is
+impossible in some situations, that's why the kernel also mentions the taint
+status when it reports an internal problem (a 'kernel bug'), a recoverable
+error (a 'kernel Oops') or a non-recoverable error before halting operation (a
+'kernel panic'). Look near the top of the error messages printed when one of
+these occurs and search for a line starting with 'CPU:'. It should end with
+'Not tainted' if the kernel was not tainted beforehand; it was tainted if you
+see 'Tainted:' followed by a few spaces and some letters.
+
+If your kernel is tainted study
+:ref:`Documentation/admin-guide/tainted-kernels.rst <taintedkernels>` to find
+out why and try to eliminate the reason. Often it's because a recoverable error
+(a 'kernel Oops') occurred and the kernel tainted itself, as the kernel knows
+it might misbehave in strange ways after that point. In that case check your
+kernel or system log and look for a section that starts with this::
+
+ Oops: 0000 [#1] SMP
+
+That's the first Oops since boot-up, as the '#1' between the brackets shows.
+Every Oops and any other problem that happens after that point might be a
+follow-up problem to that first Oops, even if they look totally unrelated. Try
+to rule this out by getting rid of that Oops and reproducing the issue
+afterwards. Sometimes simply restarting will be enough, sometimes a change to
+the configuration followed by a reboot can eliminate the Oops. But don't invest
+too much time into this at this point of the process, as the cause for the Oops
+might already be fixed in the newer Linux kernel version you are going to
+install later in this process.
+
+Quite a few kernels are also tainted because an unsuitable kernel module was
+loaded. This for example is the case if you use Nvidia's proprietary graphics
+driver, VirtualBox, or other software that installs its own kernel modules: you
+will have to remove these modules and reboot the system, as they might in fact
+be causing the issue you face.
+
+The kernel also taints itself when it's loading a module that resides in the
+staging tree of the Linux kernel source. That's a special area for code (mostly
+drivers) that does not yet fulfill the normal Linux kernel quality standards.
+When you report an issue with such a module it's obviously okay if the kernel is
+tainted, just make sure the module in question is the only reason for the taint.
+If the issue happens in an unrelated area, reboot and temporarily block the
+module from being loaded by specifying ``foo.blacklist=1`` as kernel parameter
+(replace 'foo' with the name of the module in question).
+
+
.. ############################################################################
.. Temporary marker added while this document is rewritten. Sections above
.. are new and dual-licensed under GPLv2+ and CC-BY 4.0, those below are old.
diff --git a/Documentation/admin-guide/tainted-kernels.rst b/Documentation/admin-guide/tainted-kernels.rst
index abf804719890..2900f477f42f 100644
--- a/Documentation/admin-guide/tainted-kernels.rst
+++ b/Documentation/admin-guide/tainted-kernels.rst
@@ -1,3 +1,5 @@
+.. _taintedkernels:
+
Tainted kernels
---------------

--
2.26.2
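
A minimal sketch, not part of the patch, of the three checks the new section
describes: reading /proc/sys/kernel/tainted, looking for the taint status on
the 'CPU:' line of an error message, and spotting the counter in an 'Oops:'
line. The function names are hypothetical, and the exact layout of the 'CPU:'
line is assumed to contain either 'Not tainted' or 'Tainted:' as the text says.

```python
import re
from pathlib import Path

TAINT_FILE = Path("/proc/sys/kernel/tainted")

# Matches the counter in lines such as 'Oops: 0000 [#1] SMP'.
OOPS_COUNTER = re.compile(r"Oops: .*\[#(\d+)\]")


def is_tainted(raw: str) -> bool:
    """Interpret the content of the taint file: '0' means not tainted."""
    return int(raw.strip()) != 0


def taint_status(log_line: str):
    """Return True/False for a 'CPU:' line, or None for any other line."""
    _, marker, rest = log_line.partition("CPU:")
    if not marker:
        return None
    if "Not tainted" in rest:
        return False
    if "Tainted:" in rest:
        return True
    return None


def oops_number(log_line: str):
    """Return the Oops counter (1 for the first Oops since boot) or None."""
    match = OOPS_COUNTER.search(log_line)
    return int(match.group(1)) if match else None
```

On a running Linux system `is_tainted(TAINT_FILE.read_text())` covers the
first case; the other two helpers are meant to be fed individual lines from a
saved kernel or system log.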

2020-10-01 08:58:22

by Thorsten Leemhuis

Subject: [RFC PATCH v1 05/26] docs: reporting-bugs: begin reference section providing details

Provide an introduction to the reference section that will provide more
details how to report an issue. Mention a few general things here. Those
are not strictly needed, but likely wise to write down somewhere.

Signed-off-by: Thorsten Leemhuis <[email protected]>
---

= RFC =

Should we keep the links to
https://www.chiark.greenend.org.uk/~sgtatham/bugs.html and
http://www.catb.org/esr/faqs/smart-questions.html? Are they worth it? Or is
there anything similar or better that's a bit fresher and ideally still
maintained?
---
Documentation/admin-guide/reporting-bugs.rst | 46 +++++++++++++++++---
1 file changed, 40 insertions(+), 6 deletions(-)

diff --git a/Documentation/admin-guide/reporting-bugs.rst b/Documentation/admin-guide/reporting-bugs.rst
index e0a6f4328e87..be1bce8d43aa 100644
--- a/Documentation/admin-guide/reporting-bugs.rst
+++ b/Documentation/admin-guide/reporting-bugs.rst
@@ -205,6 +205,46 @@ rebased on new stable or longterm releases. If that case follow these steps:
stable mailing list.


+Reference section: Reporting issues to the kernel maintainers
+=============================================================
+
+The detailed guides above outline all the major steps in brief fashion, which
+should be enough for most people. But sometimes there are situations where even
+experienced users might wonder how to actually do one of those steps. That's
+what this section is for, as it will provide a lot more details on each of the
+steps. Consider this a reference documentation: it's possible to read it from
+top to bottom, but it's more meant to be skimmed and a place to look up details
+in case you need them.
+
+A few words of general advice before digging into the details:
+
+ * The Linux kernel developers are well aware this process is complicated and
+ demands more than other FLOSS projects. We'd love to make it simpler, but
+ that would require work in various places as well as infrastructure that
+ would need constant maintenance; nobody has stepped up to do that work, so
+ that's just how things are for now.
+
+ * A warranty or support contract with some vendor doesn't entitle you to
+ request fixes from developers in the upstream Linux kernel community: such
+ contracts are completely outside the scope of the Linux kernel, its
+ development community, and this document. That's why you can't demand
+ anything such a contract guarantees in this context, not even if the
+ developer handling the issue works for the vendor in question. If you want to
+ claim your rights, use the vendor's support channel instead. When doing so,
+ you might want to mention you'd like to see the issue fixed in the upstream
+ Linux kernel; motivate them by saying it's the only way to ensure the fix in
+ the end will get incorporated in all Linux distributions.
+
+ * If you never reported an issue to a FLOSS project before you should consider
+ reading `How to Report Bugs Effectively
+ <https://www.chiark.greenend.org.uk/~sgtatham/bugs.html>`_
+ and `How To Ask Questions The Smart Way
+ <http://www.catb.org/esr/faqs/smart-questions.html>`_.
+
+With that off the table, find below the details on how to properly report issues
+to the Linux kernel developers.
+
+
.. ############################################################################
.. Temporary marker added while this document is rewritten. Sections above
.. are new and dual-licensed under GPLv2+ and CC-BY 4.0, those below are old.
@@ -281,12 +321,6 @@ http://vger.kernel.org/lkml/).
Tips for reporting bugs
-----------------------

-If you haven't reported a bug before, please read:
-
- https://www.chiark.greenend.org.uk/~sgtatham/bugs.html
-
- http://www.catb.org/esr/faqs/smart-questions.html
-
It's REALLY important to report bugs that seem unrelated as separate email
threads or separate bugzilla entries. If you report several unrelated
bugs at once, it's difficult for maintainers to tease apart the relevant
--
2.26.2

2020-10-01 08:58:37

by Thorsten Leemhuis

Subject: [RFC PATCH v1 07/26] docs: reporting-bugs: let users classify their issue

Explicitly outline that some issues are more important than others and
thus need to be handled differently in some steps that are about to
follow. This makes things explicit and easy to find if you need to look
up what issues actually qualify as "regression" or a "severe problem".

The alternative would have been: explain each of the three types in the
place where it requires special handling for the first time. But that
makes it quite easy to miss and harder to find when you need to look it
up.

Signed-off-by: Thorsten Leemhuis <[email protected]>
---
Documentation/admin-guide/reporting-bugs.rst | 39 ++++++++++++++++++++
1 file changed, 39 insertions(+)

diff --git a/Documentation/admin-guide/reporting-bugs.rst b/Documentation/admin-guide/reporting-bugs.rst
index 434e1a890dfe..430a0c3ee0ad 100644
--- a/Documentation/admin-guide/reporting-bugs.rst
+++ b/Documentation/admin-guide/reporting-bugs.rst
@@ -272,6 +272,45 @@ you want to circumvent it consider installing the mainline kernel yourself; just
make sure it's the latest one (see below).


+Issue of high priority?
+-----------------------
+
+ *See if the issue you are dealing with qualifies as regression, security
+ issue, or a really severe problem: those are 'issues of high priority' that
+ need special handling in some steps that are about to follow.*
+
+Linus Torvalds and the leading Linux kernel developers want to see some issues
+fixed as soon as possible, hence these 'issues of high priority' get handled
+slightly differently in the reporting process. Three types of cases qualify:
+regressions, security issues, and really severe problems.
+
+You deal with a 'regression' if something that worked with an older version of
+the Linux kernel does not work with a newer one or somehow works worse with it.
+It thus is a regression when a Wi-Fi driver that did a fine job with Linux 5.7
+somehow misbehaves with 5.8 or doesn't work at all. It's also a regression if
+an application shows erratic behavior with a newer kernel, which might happen
+due to incompatible changes in the interface between the kernel and the
+userland (like procfs and sysfs). Significantly reduced performance or
+increased power consumption also qualify as regression. But keep in mind: the
+new kernel needs to be built with a configuration that is similar to the one
+from the old kernel (see below how to achieve that). That's because
+progress is sometimes only possible by doing incompatible changes; but to avoid
+regression such changes have to be enabled explicitly during build time
+configuration.
+
+What qualifies as security issue is left to your judgment. Consider reading
+:ref:`Documentation/admin-guide/security-bugs.rst <securitybugs>` before
+proceeding.
+
+An issue is a 'really severe problem' when something totally unacceptably bad
+happens. That's for example the case when a Linux kernel corrupts the data it's
+handling or damages hardware it's running on. You're also dealing with a severe
+issue when the kernel suddenly stops working with an error message ('kernel
+panic') or without any farewell note at all. Note: do not confuse a 'panic' (a
+fatal error where the kernel stops itself) with an 'Oops' (a recoverable
+error), as the kernel remains running after an 'Oops'.
+
+
.. ############################################################################
.. Temporary marker added while this document is rewritten. Sections above
.. are new and dual-licensed under GPLv2+ and CC-BY 4.0, those below are old.
--
2.26.2
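
The regression definition in the patch above depends on the new kernel being built with a configuration similar to the old one. As a rough illustration of how such a comparison could be done, here is a minimal Python sketch that lists the options whose values differ between two kernel `.config` files. The function names `parse_config` and `diff_configs` are made up for this example and are not part of the patch; the kernel tree ships `scripts/diffconfig` for the same job, which is likely the better tool in practice.

```python
import re

# A .config file contains "CONFIG_FOO=value" lines for enabled options and
# "# CONFIG_FOO is not set" comment lines for disabled ones.
_SET = re.compile(r"^(CONFIG_\w+)=(.*)$")
_UNSET = re.compile(r"^# (CONFIG_\w+) is not set$")


def parse_config(text):
    """Return a dict mapping option names to values ('n' for unset options)."""
    options = {}
    for line in text.splitlines():
        line = line.strip()
        match = _SET.match(line)
        if match:
            options[match.group(1)] = match.group(2)
            continue
        match = _UNSET.match(line)
        if match:
            options[match.group(1)] = "n"
    return options


def diff_configs(old_text, new_text):
    """Return {option: (old_value, new_value)} for every option that differs.

    An option missing from one side is reported with None on that side.
    """
    old, new = parse_config(old_text), parse_config(new_text)
    return {
        name: (old.get(name), new.get(name))
        for name in sorted(old.keys() | new.keys())
        if old.get(name) != new.get(name)
    }
```

Feeding it the contents of the old and the new `.config` gives a quick overview of what changed; a long list suggests the two kernels are not comparable enough to call a difference in behavior a regression.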

2020-10-01 08:59:20

by Thorsten Leemhuis

[permalink] [raw]
Subject: [RFC PATCH v1 06/26] docs: reporting-bugs: point out we only care about fresh vanilla kernels

Point out more explicitly than the old text that the Linux kernel
developers don't care about vendor kernels. That is obvious to Linux
kernel developers, but something a lot of users fail to grasp, as quite a
few (maybe a lot?) reports on bugzilla.kernel.org show; most of them get
silently ignored, which is frustrating for people that invested time in
preparing and writing the report. Try to minimize that and explain it
properly, as some users will think "why do kernel devs make things so
complicated for me and force me to install a fresh vanilla kernel".

Signed-off-by: Thorsten Leemhuis <[email protected]>
---

= RFC =

Should we accept reports for issues with kernel images that are pretty close to
vanilla? But when are they close enough and how to put that line in words? Maybe
something like this (any major distributions missing?):

```Note: Some Linux kernel developers accept reports from vendor kernels that
are known to be close to upstream. That for example is often the case for the
kernels that Debian GNU/Linux Sid or Fedora Rawhide ship, which are close to
mainline. Additionally, Arch Linux, other Fedora releases, and openSUSE
Tumbleweed often use recent stable kernels that are pretty close to upstream,
too. So a report with one of these might be acceptable. But it depends heavily
on the issue in question and some developers nevertheless will ignore reports
from these kernels, that's why installing the latest mainline vanilla kernel is
the safe bet.```
---
Documentation/admin-guide/reporting-bugs.rst | 33 ++++++++++++++++----
1 file changed, 27 insertions(+), 6 deletions(-)

diff --git a/Documentation/admin-guide/reporting-bugs.rst b/Documentation/admin-guide/reporting-bugs.rst
index be1bce8d43aa..434e1a890dfe 100644
--- a/Documentation/admin-guide/reporting-bugs.rst
+++ b/Documentation/admin-guide/reporting-bugs.rst
@@ -245,6 +245,33 @@ With that of the table, find below the details on how to properly report issues
to the Linux kernel developers.


+Make sure you're using the upstream Linux kernel
+------------------------------------------------
+
+ *Stop reading this document and report the problem to your vendor instead,
+ unless you are running a vanilla mainline kernel already or are willing to
+ install it.*
+
+Like most programmers, Linux kernel developers don't like to spend time dealing
+with reports for issues that don't even happen with the source code they
+maintain: it's just a waste of everybody's time, yours included. That's why
+you later will have to test your issue with the latest 'vanilla kernel': a
+kernel that was built using the Linux sources taken straight from
+`kernel.org <https://kernel.org/>`_ and not modified or enhanced in any way.
+
+Almost all kernels used in devices (computers, laptops, smartphones, routers,
+…) and most kernels shipped by Linux distributors are ancient in terms of
+kernel development and heavily modified. They thus do not qualify for reporting
+an issue to the Linux kernel developers: the issue you face with such a kernel
+might be fixed already or caused by the changes or additions, even if they look
+small or totally unrelated. That's why issues with such kernels need to be
+reported to the vendor that distributed it. Its developers should look into the
+report and, in case it turns out to be an upstream issue, fix it directly
+upstream or report it there. If the company or project is uncooperative or if
+you want to circumvent it consider installing the mainline kernel yourself; just
+make sure it's the latest one (see below).
+
+
.. ############################################################################
.. Temporary marker added while this document is rewritten. Sections above
.. are new and dual-licensed under GPLv2+ and CC-BY 4.0, those below are old.
@@ -262,12 +289,6 @@ Please see https://www.kernel.org/ for a list of supported kernels. Any
kernel marked with [EOL] is "end of life" and will not have any fixes
backported to it.

-If you've found a bug on a kernel version that isn't listed on kernel.org,
-contact your Linux distribution or embedded vendor for support.
-Alternatively, you can attempt to run one of the supported stable or -rc
-kernels, and see if you can reproduce the bug on that. It's preferable
-to reproduce the bug on the latest -rc kernel.
-

How to report Linux kernel bugs
===============================
--
2.26.2
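
The old text kept as context in this patch tells readers to look for the '[EOL]' marker on kernel.org to see whether a version line still gets fixes. A minimal Python sketch of that check follows. It assumes a release feed shaped like the JSON that kernel.org publishes, with a `releases` list whose entries carry a `version` string and an `iseol` flag; that layout is an assumption here and should be verified against the real feed. The helper names `version_line` and `is_supported` are invented for illustration.

```python
def version_line(version):
    """Reduce a release string to its version line.

    "5.8.12" -> "5.8", "5.9-rc7" -> "5.9", "4.19.148" -> "4.19"
    """
    base = version.split("-", 1)[0]          # drop suffixes such as "-rc7"
    return ".".join(base.split(".")[:2])     # keep major.minor only


def is_supported(feed, line):
    """Return True if `line` is listed in the feed and not flagged end-of-life.

    `feed` is the already parsed JSON document (a dict). A version line that
    is not listed at all is treated as unsupported.
    """
    for release in feed.get("releases", []):
        if version_line(release["version"]) == line:
            return not release.get("iseol", False)
    return False
```

With a feed that lists 5.8.12 as current and 5.7.19 as end-of-life, `is_supported(feed, "5.8")` is true while `is_supported(feed, "5.7")` and `is_supported(feed, "3.2")` are false; in the latter two cases the report belongs with the vendor, or the issue needs to be reproduced on a supported kernel first.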

2020-10-01 09:00:26

by Thorsten Leemhuis

[permalink] [raw]
Subject: [RFC PATCH v1 01/26] docs: reporting-bugs: temporary markers for licensing and diff reasons

Add two temporary markers for the transition to the rewritten document.

Both tell users that the document is incomplete and partly inconsistent
until all patches from the series are applied. They also point out that
the new text is dual-licensed under GPLv2+ and CC-BY 4.0. The latter is
better for documentation in general. It's also more liberal, which is a
nice-to-have for a document like this, as that makes it possible for
websites and books to republish it or build upon it.

The second marker separates old and new text, which makes diffs a lot
more readable. It's also there for licensing reasons, as it makes it
obvious that all old text is gone in the end. Then a proper SPDX license
tag will get added as well.

Signed-off-by: Thorsten Leemhuis <[email protected]>
---

= RFC =

Let me know if you think dual-licensing was a bad idea or if CC-BY-4.0 is a bad
choice here.
---
Documentation/admin-guide/reporting-bugs.rst | 13 +++++++++++++
1 file changed, 13 insertions(+)

diff --git a/Documentation/admin-guide/reporting-bugs.rst b/Documentation/admin-guide/reporting-bugs.rst
index 42481ea7b41d..4bbb9132782b 100644
--- a/Documentation/admin-guide/reporting-bugs.rst
+++ b/Documentation/admin-guide/reporting-bugs.rst
@@ -3,6 +3,19 @@
Reporting bugs
++++++++++++++

+.. ############################################################################
+.. Temporary marker added while this document is rewritten. The sections below
+.. up to a second marker of this kind are new and dual-licensed under GPLv2+
+.. and CC-BY 4.0. Both sections are incomplete as of now and thus might be
+.. inconsistent/not make sense before all patches of the rewrite got applied.
+.. ###########################################################################
+
+.. ############################################################################
+.. Temporary marker added while this document is rewritten. Sections above
+.. are new and dual-licensed under GPLv2+ and CC-BY 4.0, those below are old.
+.. Both sections are incomplete as of now and thus sometimes inconsistent.
+.. ###########################################################################
+
Background
==========

--
2.26.2

2020-10-01 09:01:02

by Thorsten Leemhuis

[permalink] [raw]
Subject: [RFC PATCH v1 02/26] docs: reporting-bugs: Create a TLDR how to report issues

Get straight to the point in a few paragraphs instead of forcing users
to read quite a bit of text, like the old approach did.

Everything normally needed fits into the first two paragraphs. The third
is dedicated to issues only happening in stable and longterm kernels, as
things otherwise get hard to follow. At the end explicitly spell out
that some issues need to be handled slightly differently.

This TLDR naturally leaves lots of details out. But it will be good
enough in some situations, for example for users that recently reported
an issue or are familiar with reporting issues to FLOSS projects.

Signed-off-by: Thorsten Leemhuis <[email protected]>
---
Documentation/admin-guide/reporting-bugs.rst | 43 ++++++++++++++++++++
1 file changed, 43 insertions(+)

diff --git a/Documentation/admin-guide/reporting-bugs.rst b/Documentation/admin-guide/reporting-bugs.rst
index 4bbb9132782b..7bde6f32ff72 100644
--- a/Documentation/admin-guide/reporting-bugs.rst
+++ b/Documentation/admin-guide/reporting-bugs.rst
@@ -10,6 +10,49 @@ Reporting bugs
.. inconsistent/not make sense before all patches of the rewrite got applied.
.. ###########################################################################

+
+The short guide (aka TL;DR)
+===========================
+
+This is how you report issues with the Linux kernel to its developers:
+
+If you deal with multiple issues at once, process each of them separately. Try
+your best to guess which area of the kernel might be responsible for your
+issue. Check the `MAINTAINERS file
+<https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/plain/MAINTAINERS>`_
+for how developers of that particular area expect to be told about issues;
+note, it's rarely `bugzilla.kernel.org <https://bugzilla.kernel.org/>`_, as
+most subsystems expect reports by mail sent to their maintainers and their
+public mailing list!
+
+Check the archives of the determined destination thoroughly for existing
+reports; also search the LKML archives and the internet as a whole. If you can't
+find any, install the `latest Linux mainline version <https://kernel.org/>`_.
+Make sure to use a vanilla kernel and avoid any externally developed add-on
+kernel modules; also ensure the kernel is running in a healthy environment and
+does not 'taint' itself before the issue occurs. If you can reproduce it, write
+a report to the destination you determined earlier. Afterwards keep the ball
+rolling by proactive testing, a status update now and then, and helping where
+you can.
+
+You can't reproduce an issue with mainline you want to see fixed in older
+version lines? Then make sure the line you care about still gets support.
+Install its latest release as a vanilla kernel. If you can reproduce the issue
+there, try to find the commit that fixed it in mainline or any discussion
+preceding it: those will often mention if backporting is planned or impossible;
+if not, ask for it. In case you don't find anything, check if it's a regression
+specific to the version line that needs to be bisected and report just like a
+problem in mainline with the stable mailing list CCed. If you reached this point
+without a solution, ask for advice by mailing the subsystem maintainer with the
+subsystem and stable mailing list in CC.
+
+If you deal with a regression, bisect it to find the culprit and CC or forward
+your report to its developers.
+
+Security issues are typically best reported privately; also CC the security
+team or forward your report there.
+
+
.. ############################################################################
.. Temporary marker added while this document is rewritten. Sections above
.. are new and dual-licensed under GPLv2+ and CC-BY 4.0, those below are old.
--
2.26.2
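
The short guide above asks readers to make sure the kernel does not 'taint' itself before the issue occurs. The kernel exposes its taint state as a bitmask in `/proc/sys/kernel/tainted`, where 0 means untainted. The following Python sketch decodes a few of the bits. The table is abridged and written from memory of `Documentation/admin-guide/tainted-kernels.rst`, so check it against that file before relying on it; the names `TAINT_FLAGS`, `decode_taint`, and `read_taint` exist only in this example. The kernel tree also carries the script `tools/debugging/kernel-chktaint` for a complete decode.

```python
# (bit, letter, meaning): abridged; see tainted-kernels.rst for the full list.
TAINT_FLAGS = [
    (0, "P", "proprietary module was loaded"),
    (1, "F", "module was force loaded"),
    (7, "D", "kernel died recently (Oops or BUG)"),
    (9, "W", "kernel issued a warning"),
    (10, "C", "staging driver was loaded"),
    (12, "O", "externally built (out-of-tree) module was loaded"),
    (13, "E", "unsigned module was loaded"),
]


def decode_taint(value):
    """Return (letter, meaning) pairs for the known bits set in `value`."""
    return [
        (letter, meaning)
        for bit, letter, meaning in TAINT_FLAGS
        if value & (1 << bit)
    ]


def read_taint(path="/proc/sys/kernel/tainted"):
    """Read the taint bitmask of the running kernel (Linux only)."""
    with open(path) as handle:
        return int(handle.read().strip())
```

On a running Linux system, `decode_taint(read_taint())` returns an empty list for an untainted kernel. A 'P' or 'O' entry points to add-on modules that should be removed before reproducing and reporting the issue.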

2020-10-02 02:34:14

by Randy Dunlap

[permalink] [raw]
Subject: Re: [RFC PATCH v1 02/26] docs: reporting-bugs: Create a TLDR how to report issues

On 10/1/20 1:39 AM, Thorsten Leemhuis wrote:
> Get straight to the point in a few paragraphs instead of forcing users
> to read quite a bit of text, like the old approach did.
>
> All normally needed fits into the first two paragraphs. The third is
> dedicated to issues only happening in stable and longterm kernels, as
> things otherwise get hard to follow. At the end explicitly spell out
> that some issues need to be handled slightly different.
>
> This TLDR naturally leaves lots of details out. But it will be good
> enough in some situations, for example for users that recently reported
> an issue or are familiar with reporting issues to FLOSS projects.
>
> Signed-off-by: Thorsten Leemhuis <[email protected]>
> ---
> Documentation/admin-guide/reporting-bugs.rst | 43 ++++++++++++++++++++
> 1 file changed, 43 insertions(+)
>
> diff --git a/Documentation/admin-guide/reporting-bugs.rst b/Documentation/admin-guide/reporting-bugs.rst
> index 4bbb9132782b..7bde6f32ff72 100644
> --- a/Documentation/admin-guide/reporting-bugs.rst
> +++ b/Documentation/admin-guide/reporting-bugs.rst
> @@ -10,6 +10,49 @@ Reporting bugs
> .. inconsistent/not make sense before all patches of the rewrite got applied.
> .. ###########################################################################
>
> +
> +The short guide (aka TL;DR)
> +===========================
> +
> +This is how you report issues with the Linux kernel to its developers:
> +
> +If you deal with multiple issues at once, process each of them separately. Try
> +your best guess which area of the kernel might be responsible for your issue.
> +Check the `MAINTAINERS file
> +<https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/plain/MAINTAINERS>`_
> +how developers of that particular area expect to be told about issues; note,

for how
?

> +it's rarely `bugzilla.kernel.org <https://bugzilla.kernel.org/>`_, as most
> +subsystems expect reports by mail sent to their maintainers and their public
> +mailing list!
> +
> +Check the archives of the determined destination thoroughly for existing
> +reports; also search the LKML archives and the internet as a whole. If you can't
> +find any, install the `latest Linux mainline version <https://kernel.org/>`_.
> +Make sure to use a vanilla kernel and avert any add-on kernel modules externally
> +developed; also ensure the kernel is running in a healthy environment and does
> +not 'taint' itself before the issue occurs. If you can reproduce it, write a

I don't care for "does not 'taint' itself". How about
and is not
already tainted before the issue occurs.

> +report to the destination you determined earlier. Afterwards keep the ball
> +rolling by proactive testing, a status update now and then, and helping where
> +you can.
> +
> +You can't reproduce an issue with mainline you want to see fixed in older
> +version lines? Then make sure the line you care about still gets support.
> +Install its latest release as vanilla kernel. If you can reproduce the issue

Is "vanilla" well understood?

> +there, try to find the commit that fixed it in mainline or any discussion
> +preceding it: those will often mention if backporting is planed or impossible;
> +if not, ask for it. In case you don't find anything, check if it's a regression
> +specific to the version line that need to be bisected and report just like a

that needs

> +problem in mainline with the stable mailing list CCed. If you reached this point
> +without a solution, ask for advice by mailing the subsystem maintainer with the
> +subsystem and stable mailing list in CC.
> +
> +If you deal with a regression, bisect it to find the culprit and CC or forward
> +your report to its developers.
> +
> +Security issues are typically best report privately; also CC the security team

reported

> +or forward your report there.
> +
> +
> .. ############################################################################
> .. Temporary marker added while this document is rewritten. Sections above
> .. are new and dual-licensed under GPLv2+ and CC-BY 4.0, those below are old.
>


--
~Randy

2020-10-02 03:04:08

by Randy Dunlap

[permalink] [raw]
Subject: Re: [RFC PATCH v1 03/26] docs: reporting-bugs: step-by-step guide on how to report issues

On 10/1/20 1:39 AM, Thorsten Leemhuis wrote:
>
> Signed-off-by: Thorsten Leemhuis <[email protected]>
> ---
> Documentation/admin-guide/reporting-bugs.rst | 103 +++++++++++++++++++
> 1 file changed, 103 insertions(+)
>
> diff --git a/Documentation/admin-guide/reporting-bugs.rst b/Documentation/admin-guide/reporting-bugs.rst
> index 7bde6f32ff72..203df36af55f 100644
> --- a/Documentation/admin-guide/reporting-bugs.rst
> +++ b/Documentation/admin-guide/reporting-bugs.rst
> @@ -53,6 +53,109 @@ Security issues are typically best report privately; also CC the security team
> or forward your report there.
>
>
> +Step-by-step guide how to report issues to the kernel maintainers
> +=================================================================
> +
> +Above TL;DR outlines roughly how to report issues to the Linux kernel

The above

> +developers. It might be all that's needed for people already familiar with
> +reporting issues to Free/Libre & Open Source Software (FLOSS) projects. For
> +everyone else there is this section. It is more detailed and uses a
> +step-by-step approach. It still tries to be brief for readability; if it's to

too

> +brief for you, look up the details in the reference section below, where each
> +of the steps is explained in more detail.
> +
> +Note, this section covers a few more aspects than the TL;DR and does things in a

Note:

> +slightly different order. That's in your interest, to make sure you notice early
> +if an issue that looks like a Linux kernel problem is actually caused by
> +something else. These steps thus help to ensure the time you invest in this
> +process won't feel wasted in the end:
> +
> + * Stop reading this document and report the problem to your vendor instead,
> + unless you are running a vanilla mainline kernel already or are willing to
> + install it.
> +
> + * See if the issue you are dealing with qualifies as regression, security
> + issue, or a really severe problem: those are 'issues of high priority' that
> + need special handling in some steps that are about to follow.
> +
> + * Check if your kernel was 'tainted' when the issue occurred, as the event that
> + made the kernel set this flag might be causing the issue you face.
> +
> + * Locate the driver or kernel subsystem that seems to be causing the issue.
> + Find out how and where its developers expect reports. Note: most of the time
> + this won't be `bugzilla.kernel.org <https://bugzilla.kernel.org/>`_, as issues
> + typically need to be sent by mail to a maintainer and a public mailing list.
> +
> + * Search the archives of the bug tracker or mailing list in question
> + thoroughly for reports that might match your issue. Also check if you find
> + something with your favorite internet search engine or in the `Linux Kernel
> + Mailing List (LKML) archives <https://lore.kernel.org/lkml/>`_. If you find
> + anything, join the discussion instead of sending a new report.
> +
> + * Create a fresh backup and put system repair and restore tools at hand.
> +
> + * Ensure your system does not enhance its kernels by building additional
> + kernel modules on-the-fly locally, which solutions like DKMS might be doing
> + without your knowledge.
> +
> + * Make sure it's not the kernels surroundings that are causing the issue you

kernel's

> + face.
> +
> + * Write down coarsely how to reproduce the issue. If you deal with multiple
> + issue at once, create separate notes for each of them and make sure they

issues

> + work independently on a freshly booted system. That's needed, as each issue
> + needs to get reported to the kernel developers separately, unless they are
> + strongly entangled.
> +
> +After these preparations you'll now enter the main part:
> +
> + * Install the latest Linux mainline kernel: that's where all issue get fixed
> + first, because it's the version line the kernel developers mainly care about.
> + Testing and reporting with the latest Linux stable kernel can be acceptable

can be an acceptable

> + alternative in some situations, but is best avoided.
> +
> + * Ensure the kernel you just installed does not 'taint' itself when running.
> +
> + * Reproduce the issue with the kernel you just installed. If it doesn't show up
> + there, head over to the instructions for issues only happening with stable
> + and longterm kernels if you want to see it fixed there.

Can you link (reference) to that section?

> +
> + * Optimize your notes: try to find and write the most straightforward way to
> + reproduce your issue. Make sure the end result has all the important details,
> + and at the same time is easy to read and understand for others that hear
> + about it for the first time. And if you learned something in this process,
> + consider searching again for existing reports about the issue.
> +
> + * If the failure includes a stack dump, like an Oops does, consider decoding it
> + to find the offending line of code.

Refer to scripts/decodecode ?
or is that done elsewhere?

> +
> + * If your problem is a regression, try to narrow down when the issue was
> + introduced as much as possible.
> +
> + * Start to compile the report by writing a detailed description about the
> + issue. Always mentions a few things: the latest kernel version you installed
> + for reproducing, the Linux Distribution used, and your notes how to

I would say: notes on how to
Maybe it's just me.

> + reproduce the issue. Ideally, make the kernels build configuration (.config)

kernel's

> + and the output from ``dmesg`` available somewhere on the net and link to it.
> + Include or upload all other information that might be relevant, like the
> + output/screenshot of an Oops or the output from ``lspci``. Once you
> + wrote this main part insert a normal length paragraph on top of it outlining

part, insert

> + the issue and the impact quickly. On top of this add one sentence that
> + briefly describes the problem and gets people to read on. Now give the thing
> + a descriptive title or subject that yet again is shorter. Then you're ready
> + to send or file the report like the `MAINTAINERS file
> + <https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/plain/MAINTAINERS>`_
> + told you, unless you are dealing with one of those 'issues of high priority':

tells you,

OK, I like present tense as much as possible.

> + they need special care which is explained in 'Special handling for high
> + priority issues' below.

Can we provide a link to that section here?

> +
> + * Wait for reactions and keep the thing rolling until you can accept the
> + outcome in one way or the other. Thus react publicly and in a timely manner
> + to any inquiries. Test proposed fixes. Do proactive testing when a new rc1

when a new -rc
(release candidate) is released. Send

> + gets released. Sent friendly reminders if things stall. And try to help
> + yourself, if you don't get any help or if it is unsatisfying.
> +
> +
> .. ############################################################################
> .. Temporary marker added while this document is rewritten. Sections above
> .. are new and dual-licensed under GPLv2+ and CC-BY 4.0, those below are old.
>


--
~Randy

2020-10-02 03:28:35

by Randy Dunlap

[permalink] [raw]
Subject: Re: [RFC PATCH v1 04/26] docs: reporting-bugs: step-by-step guide for issues in stable & longterm

On 10/1/20 1:39 AM, Thorsten Leemhuis wrote:
> Handle stable and longterm kernels in a subsection, as dealing with them
> directly in the main part of the step-by-step guide turned out to make
> it messy and hard to follow: it looked a bit like code with a large
> amount of if-then-else section to handle special cases, which made the
> default code-flow hard to understand.
>
> Yet again each step will later be repeated in a reference section and
> described in more detail.
>
> Signed-off-by: Thorsten Leemhuis <[email protected]>
> ---
> Documentation/admin-guide/reporting-bugs.rst | 49 ++++++++++++++++++++
> 1 file changed, 49 insertions(+)
>
> diff --git a/Documentation/admin-guide/reporting-bugs.rst b/Documentation/admin-guide/reporting-bugs.rst
> index 203df36af55f..e0a6f4328e87 100644
> --- a/Documentation/admin-guide/reporting-bugs.rst
> +++ b/Documentation/admin-guide/reporting-bugs.rst
> @@ -156,6 +156,55 @@ After these preparations you'll now enter the main part:
> yourself, if you don't get any help or if it is unsatisfying.
>
>
> +Reporting issues only occurring in older kernel version lines
> +-------------------------------------------------------------
> +
> +This section is for you, if you tried the latest mainline kernel as outlined
> +above, but failed to reproduce your issue there; at the same time you want to
> +see the issue fixed in older version lines or a vendor kernel that's regularly
> +rebased on new stable or longterm releases. If that case follow these steps:
> +
> + * Prepare yourself for the possibility that going through the next few steps
> + might not get the issue solved in older releases: the fix might be too big or
> + risky to get backported there.
> +
> + * Check if the kernel developers still maintain the Linux kernel version line
> + you care about: go to `the front-page of kernel.org <https://kernel.org>`_
> + and make sure it mentions the latest release of the particular version line
> + without an '[EOL]' tag.

Explain somewhere that EOL = End Of Life (in parens).

> +
> + * Check the `archives of the Linux stable mailing list
> + <https://lore.kernel.org/stable/>`_ for existing reports.
> +
> + * Install the latest release from the particular version line as a vanilla
> + kernel. Ensure this kernel is not tainted and still shows the problem, as the
> + issue might have already been fixed there.
> +
> + * Search the Linux kernel version control system for the change that fixed
> + the issue in mainline, as its commit message might tell you if the fix is
> + scheduled for backporting already. If you don't find anything that way,
> + search the appropriate mailing lists for posts that discuss such an issue or
> + peer-review possible fixes. That might lead you to the commit with the fix
> + or tell you if it's unsuitable for backporting. If backporting was not
> + considered at all, join the newest discussion, asking if its in the cards.

it's

> +
> + * Check if you're dealing with a regression that was never present in
> + mainline by installing the first release of the version line you care about.
> + If the issue doesn't show up with it, you basically need to report the issue
> + with this version like you would report a problem with mainline (see above).
> + This ideally includes a bisection followed by a search for existing reports
> + on the net; with the help of the subject and the two relevant commit-ids. If
> + that doesn't turn up anything, write the report; CC or forward the report to
> + the stable maintainers, the stable mailing list, and those that authored the

those who (?)

> + change. Include the shortened commit-id if you found the change that causes
> + it.
> +
> + * One of the former steps should lead to a solution. If that doesn't work out,
> + ask the maintainers for the subsystem that seems to be causing the issue for
> + advice; CC the mailing list for the particular subsystem as well as the
> + stable mailing list.
> +
> +
> .. ############################################################################
> .. Temporary marker added while this document is rewritten. Sections above
> .. are new and dual-licensed under GPLv2+ and CC-BY 4.0, those below are old.
>


--
~Randy

2020-10-02 16:53:30

by Randy Dunlap

[permalink] [raw]
Subject: Re: [RFC PATCH v1 05/26] docs: reporting-bugs: begin reference section providing details

Hi--

On 10/1/20 1:39 AM, Thorsten Leemhuis wrote:
> Provide an introduction to the reference section that will provide more
> details how to report an issue. Mention a few general things here. Those
> are not strictly needed, but likely wise to write down somewhere.
>
> Signed-off-by: Thorsten Leemhuis <[email protected]>
> ---
>
> = RFC =
>
> Should we keep the links to
> https://www.chiark.greenend.org.uk/~sgtatham/bugs.html and
> http://www.catb.org/esr/faqs/smart-questions.html? Are they worth it? Or is
> there anything similar or better that's a bit fresher and ideally still
> maintained?

Dunno. They are interesting but outdated.

> ---
> Documentation/admin-guide/reporting-bugs.rst | 46 +++++++++++++++++---
> 1 file changed, 40 insertions(+), 6 deletions(-)
>
> diff --git a/Documentation/admin-guide/reporting-bugs.rst b/Documentation/admin-guide/reporting-bugs.rst
> index e0a6f4328e87..be1bce8d43aa 100644
> --- a/Documentation/admin-guide/reporting-bugs.rst
> +++ b/Documentation/admin-guide/reporting-bugs.rst
> @@ -205,6 +205,46 @@ rebased on new stable or longterm releases. If that case follow these steps:
> stable mailing list.
>
>
> +Reference section: Reporting issues to the kernel maintainers
> +=============================================================
> +
> +The detailed guides above outlines all the mayor steps in brief fashion, which

outline major

> +should be enough for most people. But sometimes there are situations where even
> +experienced users might wonder how to actually do one of those steps. That's
> +what this section is for, as it will provide a lot more details on each of the
> +steps. Consider this a reference documentation: it's possible to read it from

as

> +top to bottom, but more meant to skim over and a place to look up details in
> +case you need them.
> +
> +A few words of general advice before digging into the details:
> +
> + * The Linux kernel developers are well aware this process is complicated and
> + demands more than other FLOSS projects. We'd love to make it simpler, but
> + that would require work in various places as well as infrastructure that
> + would need constant maintenance; nobody has stepped up to do that work, so
> + that's just how things are for now.
> +
> + * A warranty or support contract with some vendor doesn't entitle you to
> + request fixes from developers in the upstream Linux kernel community: such
> + contracts are completely outside the scope of the Linux kernel, its
> + development community, and this document. That's why you can't demand
> + anything such a contract guarantees in this context, not even if the
> + developer handling the issue works for the vendor in question. If you want to
> + claim your rights, use the vendors support channel instead. When doing so,

vendor's

> + you might want to mention you'd like to see the issue fixed in the upstream
> + Linux kernel; motivate them by saying it's the only way to ensure the fix in
> + the end will get incorporated in all Linux distributions.
> +
> + * If you never reported an issue to a FLOSS project before you should consider
> + reading `How to Report Bugs Effectively
> + <https://www.chiark.greenend.org.uk/~sgtatham/bugs.html>`_
> + and `How To Ask Questions The Smart Way
> + <http://www.catb.org/esr/faqs/smart-questions.html>`_.
> +
> +With that of the table, find below the details on how to properly report issues

off

> +to the Linux kernel developers.
> +
> +
> .. ############################################################################
> .. Temporary marker added while this document is rewritten. Sections above
> .. are new and dual-licensed under GPLv2+ and CC-BY 4.0, those below are old.
> @@ -281,12 +321,6 @@ http://vger.kernel.org/lkml/).
> Tips for reporting bugs
> -----------------------
>
> -If you haven't reported a bug before, please read:
> -
> - https://www.chiark.greenend.org.uk/~sgtatham/bugs.html
> -
> - http://www.catb.org/esr/faqs/smart-questions.html
> -
> It's REALLY important to report bugs that seem unrelated as separate email
> threads or separate bugzilla entries. If you report several unrelated
> bugs at once, it's difficult for maintainers to tease apart the relevant
>


--
~Randy

2020-10-02 17:02:36

by Randy Dunlap

[permalink] [raw]
Subject: Re: [RFC PATCH v1 07/26] docs: reporting-bugs: let users classify their issue

On 10/1/20 1:39 AM, Thorsten Leemhuis wrote:
> Explicitly outline that some issues are more important than others and
> thus need to be handled differently in some steps that are about to
> follow. This makes things explicit and easy to find if you need to look
> up what issues actually qualify as "regression" or a "severe problem".
>
> The alternative would have been: explain each of the three types in the
> place where it requires special handling for the first time. But that
> makes it quite easy to miss and harder to find when you need to look it
> up.
>
> Signed-off-by: Thorsten Leemhuis <[email protected]>
> ---
> Documentation/admin-guide/reporting-bugs.rst | 39 ++++++++++++++++++++
> 1 file changed, 39 insertions(+)
>
> diff --git a/Documentation/admin-guide/reporting-bugs.rst b/Documentation/admin-guide/reporting-bugs.rst
> index 434e1a890dfe..430a0c3ee0ad 100644
> --- a/Documentation/admin-guide/reporting-bugs.rst
> +++ b/Documentation/admin-guide/reporting-bugs.rst
> @@ -272,6 +272,45 @@ you want to circumvent it consider installing the mainline kernel yourself; just
> make sure it's the latest one (see below).
>
>
> +Issue of high priority?
> +-----------------------
> +
> + *See if the issue you are dealing with qualifies as regression, security
> + issue, or a really severe problem: those are 'issues of high priority' that
> + need special handling in some steps that are about to follow.*
> +
> +Linus Torvalds and the leading Linux kernel developers want to see some issues
> +fixed as soon as possible, hence these 'issues of high priority' get handled
> +slightly different in the reporting process. Three type of cases qualify:

differently
at least that's what I would say. :)

> +regressions, security issues, and really severe problems.
> +
> +You deal with a 'regression' if something that worked with an older version of
> +the Linux kernel does not work with a newer one or somehow works worse with it.
> +It thus is a regression when a Wi-Fi driver that did a fine job with Linux 5.7
> +somehow misbehaves with 5.8 or doesn't work at all. It's also a regression if
> +an application shows erratic behavior with a newer kernel, which might happen
> +due to incompatible changes in the interface between the kernel and the
> +userland (like procfs and sysfs). Significantly reduced performance or
> +increased power consumption also qualify as regression. But keep in mind: the
> +new kernel needs to be build with a configuration that is similar to the one

built

> +from the old kernel (see below how to archive that). That's because

achieve

> +process is sometimes only possible by doing incompatible changes; but to avoid

eh? That's because ... ???

> +regression such changes have to be enabled explicitly during build time
> +configuration.
> +
> +What qualifies as security issue is left to your judgment. Consider reading
> +:ref:`Documentation/admin-guide/security-bugs.rst <securitybugs>` before
> +proceeding.
> +
> +An issue is a 'really severe problem' when something totally unacceptable bad

unacceptably

> +happens. That's for example the case when a Linux kernel corrupts the data it's
> +handling or damages hardware it's running on. You're also dealing with a severe
> +issue when the kernel suddenly stops working with an error message ('kernel
> +panic') or without any farewell note at all. Note: do not confused a 'panic' (a

confuse

> +fatal error where the kernels stop itself) with a 'Oops' (a recoverable error),
> +as the kernel remains running after an 'Oops'.
> +
> +
> .. ############################################################################
> .. Temporary marker added while this document is rewritten. Sections above
> .. are new and dual-licensed under GPLv2+ and CC-BY 4.0, those below are old.
>


--
~Randy
Reported-by: Randy Dunlap <[email protected]>

2020-10-02 17:12:39

by Randy Dunlap

Subject: Re: [RFC PATCH v1 08/26] docs: reporting-bugs: make readers check the taint flag

On 10/1/20 1:39 AM, Thorsten Leemhuis wrote:
> Tell users early in the process to check the taint flag, as that will
> prevent them from investing time into a report that might be worthless.
> That way users for example will notice that the issue they face is in
> fact caused by an add-on kernel module or and Oops that happened
> earlier.
>
> This approach has a downside: users will later have to check the flag
> again with the mainline kernel the guide tells them to install. But that
> is an acceptable trade-off here, as checking only takes a few seconds
> and can easily prevent wasting time in useless testing and debugging.
>
> Signed-off-by: Thorsten Leemhuis <[email protected]>
> ---
>
> = RFC =
>
> Should "disable DKMS" come before this step? But then the backup step right
> before that one would need to be moved as well, as disabling DKMS can mix things
> up.
> ---
> Documentation/admin-guide/reporting-bugs.rst | 59 +++++++++++++++++++
> Documentation/admin-guide/tainted-kernels.rst | 2 +
> 2 files changed, 61 insertions(+)
>
> diff --git a/Documentation/admin-guide/reporting-bugs.rst b/Documentation/admin-guide/reporting-bugs.rst
> index 430a0c3ee0ad..61b6592ddf74 100644
> --- a/Documentation/admin-guide/reporting-bugs.rst
> +++ b/Documentation/admin-guide/reporting-bugs.rst
> @@ -311,6 +311,65 @@ fatal error where the kernels stop itself) with a 'Oops' (a recoverable error),
> as the kernel remains running after an 'Oops'.
>
>
> +Check 'taint' flag
> +------------------
> +
> + *Check if your kernel was 'tainted' when the issue occurred, as the event
> + that made the kernel set this flag might be causing the issue you face.*
> +
> +The kernel marks itself with a 'taint' flag when something happens that might
> +lead to follow-up errors that look totally unrelated. The issue you face might
> +be such an error if your kernel is tainted. That's why it's in your interest to
> +rule this out early before investing more time into this process. This is the
> +only reason why this step is here, as this process later will tell you to
> +install the latest mainline kernel and check its taint flag, as that's the
> +kernel the report will be mainly about.
> +
> +On a running system is easy to check if the kernel tainted itself: it's not
> +tainted if ``cat /proc/sys/kernel/tainted`` returns '0'. Checking that file is
> +impossible in some situations, that's why the kernel also mentions the taint

situations;

> +status when it reports an internal problem (a 'kernel bug'), a recoverable
> +error (a 'kernel Oops') or a non-recoverable error before halting operation (a
> +'kernel panic'). Look near the top of the error messages printed when one of
> +these occurs and search for a line starting with 'CPU:'. It should end with
> +'Not tainted' if the kernel was not tainted beforehand; it was tainted if you
> +see 'Tainted:' followed by a few spaces and some letters.
> +
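
A rough sketch of the two checks described in the paragraph above, in case it helps readers picture them. The bit positions are an assumption taken from my reading of tainted-kernels.rst and cover only a subset of the flags; the function names are made up for illustration.

```python
from typing import List, Optional

# Subset of taint bits; assumed to match the table in tainted-kernels.rst.
TAINT_BITS = {
    0: "P: proprietary module was loaded",
    7: "D: kernel died recently (Oops or BUG)",
    10: "C: staging driver was loaded",
    12: "O: externally built (out-of-tree) module was loaded",
}


def decode_taint(value: int) -> List[str]:
    """Return descriptions for the known bits set in a taint value."""
    return [text for bit, text in sorted(TAINT_BITS.items()) if value & (1 << bit)]


def cpu_line_tainted(line: str) -> Optional[bool]:
    """Inspect a 'CPU:' line from an Oops, bug, or panic message.

    Returns None if the line is not a 'CPU:' line, True if it carries a
    'Tainted:' marker, and False otherwise (for example 'Not tainted').
    """
    if "CPU:" not in line:
        return None
    return "Tainted:" in line


if __name__ == "__main__":
    # Only works on a running Linux system.
    with open("/proc/sys/kernel/tainted") as handle:
        value = int(handle.read().strip())
    if value == 0:
        print("not tainted")
    else:
        print(f"tainted ({value}):")
        for reason in decode_taint(value):
            print("  " + reason)
```

Bits missing from the table are silently ignored here, so a non-zero value with no decoded reasons still means the kernel is tainted.
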
> +If your kernel is tainted study

tainted, study

> +:ref:`Documentation/admin-guide/tainted-kernels.rst <taintedkernels>` to find
> +out why and try to eliminate the reason. Often it's because a recoverable error
> +(a 'kernel Oops') occurred and the kernel tainted itself, as the kernel knows
> +it might misbehave in strange ways after that point. In that case check your
> +kernel or system log and look for a section that starts with this::
> +
> + Oops: 0000 [#1] SMP
> +
> +That's the first Oops since boot-up, as the '#1' between the brackets shows.
> +Every Oops and any other problem that happen after that point might be a
> +follow-up problem to that first Oops, even if they look totally unrelated. Try
> +to rule this out by getting rid of that Oops and reproducing the issue
> +afterwards. Sometimes simply restarting will be enough, sometimes a change to
> +the configuration followed by a reboot can eliminate the Oops. But don't invest
> +too much time into this at this point of the process, as the cause for the Oops
> +might already be fixed in the newer Linux kernel version you are going to
> +install later in this process.
> +
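
For what it's worth, finding that first Oops in a saved log could look roughly like the sketch below. The regular expression is an assumption based on the example line in the patch; other architectures or kernel versions may print the header differently, and the function name is made up.

```python
import re
from typing import Iterable, List, Tuple

# Matches headers such as 'Oops: 0000 [#1] SMP'; the number after '#'
# counts the Oopses since boot-up.
OOPS_RE = re.compile(r"Oops: [0-9a-fA-F]+ \[#(\d+)\]")


def find_oopses(log_lines: Iterable[str]) -> List[Tuple[int, int]]:
    """Return (line number, Oops counter) pairs, in log order."""
    hits = []
    for lineno, line in enumerate(log_lines, start=1):
        match = OOPS_RE.search(line)
        if match:
            hits.append((lineno, int(match.group(1))))
    return hits


if __name__ == "__main__":
    sample = [
        "[   10.000000] usb 1-1: new high-speed USB device",
        "[   42.100000] Oops: 0000 [#1] SMP",
        "[   99.900000] Oops: 0002 [#2] SMP",
    ]
    for lineno, counter in find_oopses(sample):
        print(f"line {lineno}: Oops #{counter}")
```

Anything logged after the entry with counter 1 might be a follow-up problem, so that entry is the one to look at first.
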
> +Quite a few kernels are also tainted because an unsuitable kernel modules was

module

> +loaded. This for example is the case if you use Nvidias proprietary graphics

Nvidia's

> +driver, VirtualBox, or other software that installs its own kernel modules: you
> +will have to remove these modules and reboot the system, as they might in fact
> +be causing the issue you face.

You will need to reboot the system and try to reproduce the issue without loading
any of these proprietary modules.

> +
> +The kernel also taints itself when it's loading a module that resists in the

resides

> +staging tree of the Linux kernel source. That's a special area for code (mostly
> +drivers) that does not yet fulfill the normal Linux kernel quality standards.
> +When you report an issue with such a module it's obviously okay if the kernel is
> +tainted, just make sure the module in question is the only reason for the taint.

tainted;

> +If the issue happens in an unrelated area reboot and temporary block the module

temporarily

> +from being loaded by specifying ``foo.blacklist=1`` as kernel parameter (replace
> +'foo' with the name of the module in question).
> +
> +
> .. ############################################################################
> .. Temporary marker added while this document is rewritten. Sections above
> .. are new and dual-licensed under GPLv2+ and CC-BY 4.0, those below are old.
> diff --git a/Documentation/admin-guide/tainted-kernels.rst b/Documentation/admin-guide/tainted-kernels.rst
> index abf804719890..2900f477f42f 100644
> --- a/Documentation/admin-guide/tainted-kernels.rst
> +++ b/Documentation/admin-guide/tainted-kernels.rst
> @@ -1,3 +1,5 @@
> +.. _taintedkernels:
> +
> Tainted kernels
> ---------------
>
>


--
~Randy

2020-10-03 07:28:57

by Thorsten Leemhuis

Subject: Re: [RFC PATCH v1 02/26] docs: reporting-bugs: Create a TLDR how to report issues

Randy, many thanks for looking through this; your feedback is much
appreciated! Consider all the obvious spelling and grammatical mistakes
you pointed out fixed. I won't mention all of them in this reply, to keep
things easier to follow.

On 02.10.20 at 04:32, Randy Dunlap wrote:
> On 10/1/20 1:39 AM, Thorsten Leemhuis wrote:
> […]
>> +<https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/plain/MAINTAINERS>`_
>> +how developers of that particular area expect to be told about issues; note,
> for how
> ?

Not sure myself, but I guess you're right and thus followed your advice :-D

> […]
>> +Make sure to use a vanilla kernel and avert any add-on kernel modules externally
>> +developed; also ensure the kernel is running in a healthy environment and does
>> +not 'taint' itself before the issue occurs. If you can reproduce it, write a
>
> I don't care for "does not 'taint' itself". How about
> and is not
> already tainted before the issue occurs.

Hmmm, what I wanted to get across: the kernel is not tainted when it
arrives; it taints itself after it was started. Your suggestion removes
that intention, but now that I read my text again I notice it wasn't
really good at it either. Ohh well, I guess I'll go with your suggestion,
as it seems getting that point across is asking for too much here.

> […]
>> +You can't reproduce an issue with mainline you want to see fixed in older
>> +version lines? Then make sure the line you care about still gets support.
>> +Install its latest release as vanilla kernel. If you can reproduce the issue
>
> Is "vanilla" well understood?

I'd say for the TLDR using it without an explanation is fine. But the
main section didn't prominently mention it; that's why I adjusted the
first step slightly and added this:

This kernel must not be modified or enhanced in any way and thus be
'vanilla'.

Ciao, Thorsten

2020-10-03 08:08:26

by Thorsten Leemhuis

Subject: Re: [RFC PATCH v1 03/26] docs: reporting-bugs: step-by-step guide on how to report issues

Many thx for your comments. Consider all the obvious spelling and
grammatical mistakes you pointed out fixed. I won't mention all of them
in this reply, to keep things easier to follow.

On 02.10.20 at 05:02, Randy Dunlap wrote:
> On 10/1/20 1:39 AM, Thorsten Leemhuis wrote:
> […]
>> +brief for you, look up the details in the reference section below, where each
>> +of the steps is explained in more detail.
>> +
>> +Note, this section covers a few more aspects than the TL;DR and does things in a
> Note:

Ohh, really? LanguageTool once suggested using a comma when I forgot
a colon, so I assumed it was okay. Uhhps.


>> + * Reproduce the issue with the kernel you just installed. If it doesn't show up
>> + there, head over to the instructions for issues only happening with stable
>> + and longterm kernels if you want to see it fixed there.
> Can you link (reference) to that section?

I raised that problem in the cover letter, as this is not the only place
where it would make sense. Hoping for input from Jonathan here on how to
do that without adding lots of anchors...

>> + * Optimize your notes: try to find and write the most straightforward way to
>> + reproduce your issue. Make sure the end result has all the important details,
>> + and at the same time is easy to read and understand for others that hear
>> + about it for the first time. And if you learned something in this process,
>> + consider searching again for existing reports about the issue.
>> +
>> + * If the failure includes a stack dump, like an Oops does, consider decoding it
>> + to find the offending line of code.
> Refer to scripts/decodecode ?
> or is that done elsewhere?

Elsewhere. This step and that document likely need to be heavily
updated anyway, as pointed out in a later patch :-/

>> +
>> + * If your problem is a regression, try to narrow down when the issue was
>> + introduced as much as possible.
>> +
>> + * Start to compile the report by writing a detailed description about the
>> + issue. Always mentions a few things: the latest kernel version you installed
>> + for reproducing, the Linux Distribution used, and your notes how to
>
> I would say: notes on how to
> Maybe it's just me.

Googled a bit, and to me as a non-native English speaker it looks like
you're correct.

>> + reproduce the issue. Ideally, make the kernels build configuration (.config)
> kernel's

Uggh, sorry, this mistake will show up a few more times, looks like I
applied German grammar rules to English. :-/

>> + the issue and the impact quickly. On top of this add one sentence that
>> + briefly describes the problem and gets people to read on. Now give the thing
>> + a descriptive title or subject that yet again is shorter. Then you're ready
>> + to send or file the report like the `MAINTAINERS file
>> + <https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/plain/MAINTAINERS>`_
>> + told you, unless you are dealing with one of those 'issues of high priority':
> tells you,
> OK, I like present tense as much as possible.

Hmmm. Normally I'd agree, but I used past tense here because it refers
to something the reader did in an earlier step.

>> + * Wait for reactions and keep the thing rolling until you can accept the
>> + outcome in one way or the other. Thus react publicly and in a timely manner
>> + to any inquiries. Test proposed fixes. Do proactive testing when a new rc1
> when a new -rc
> (release candidate) is released. Send

I only meant "rc1" here, not every rc. More about this in a later patch.

Regarding explaining "rc" as "release candidate": my stupid brain has a
really hard time following that suggestion, as it still remembers some
words someone named Linus Torvalds wrote many many years ago:
```
I'll just use "-rc", and we can all agree that it stands for "Ridiculous
Count" rather than "Release Candidate".
```
https://lore.kernel.org/lkml/[email protected]/


I'll go and try to find some pills to force my brain into compliance.
;-) Once they start to work it hopefully can agree to this:

Do proactive testing: retest with at least every first release candidate
(RC) of a new mainline version and report your results.

Ciao, Thorsten

2020-10-03 08:26:11

by Thorsten Leemhuis

Subject: Re: [RFC PATCH v1 04/26] docs: reporting-bugs: step-by-step guide for issues in stable & longterm

Many thx for your comments. Consider all the obvious spelling and
grammatical mistakes you pointed out fixed. I won't mention them in this
reply, to keep things easier to follow.

On 02.10.20 at 05:25, Randy Dunlap wrote:
> On 10/1/20 1:39 AM, Thorsten Leemhuis wrote:

>> + * Check if the kernel developers still maintain the Linux kernel version line
>> + you care about: go to `the front-page of kernel.org <https://kernel.org>`_
>> + and make sure it mentions the latest release of the particular version line
>> + without an '[EOL]' tag.
> Explain somewhere that EOL = End Of Life (in parens).

The section that describes this step in more detail explains the
acronym. To keep this section short I'd like to omit the explanation
here, as it's a pretty well known term anyway. Hope that's okay for you.

Ciao, Thorsten

2020-10-03 08:29:17

by Thorsten Leemhuis

Subject: Re: [RFC PATCH v1 05/26] docs: reporting-bugs: begin reference section providing details

On 02.10.20 at 18:49, Randy Dunlap wrote:
> On 10/1/20 1:39 AM, Thorsten Leemhuis wrote:

Many thx for your comments, all suggestions implemented.

Ciao, Thorsten

2020-10-03 09:43:37

by Thorsten Leemhuis

Subject: Re: [RFC PATCH v1 07/26] docs: reporting-bugs: let users classify their issue

Many thx for your comments. Consider all the obvious spelling and
grammatical mistakes you pointed out fixed. I won't mention all of them
in this reply, to keep things easier to follow.

On 02.10.20 at 18:59, Randy Dunlap wrote:
> On 10/1/20 1:39 AM, Thorsten Leemhuis wrote:

>> +Linus Torvalds and the leading Linux kernel developers want to see some issues
>> +fixed as soon as possible, hence these 'issues of high priority' get handled
>> +slightly different in the reporting process. Three type of cases qualify:
>
> differently
> at least that's what I would say. :)

/me googles

Yeah, seems you are right, thx.

>> +from the old kernel (see below how to archive that). That's because
> achieve
>> +process is sometimes only possible by doing incompatible changes; but to avoid
> eh? That's because ... ???

Argh, that was a last minute change :-/ Now reads:

That's because incompatible changes sometimes cannot be avoided when
big improvements are implemented; but to avoid

>> +regression such changes have to be enabled explicitly during build time
regressions
>> +configuration.
>> +

Ciao, Thorsten

2020-10-03 09:57:40

by Thorsten Leemhuis

Subject: Re: [RFC PATCH v1 08/26] docs: reporting-bugs: make readers check the taint flag

Many thx for your comments. Consider all the obvious spelling and
grammatical mistakes you pointed out fixed. I won't mention all of them
in this reply, to keep things easier to follow.

On 02.10.20 at 19:08, Randy Dunlap wrote:
> On 10/1/20 1:39 AM, Thorsten Leemhuis wrote:

>> +driver, VirtualBox, or other software that installs its own kernel modules: you
>> +will have to remove these modules and reboot the system, as they might in fact
>> +be causing the issue you face.
> You will need to reboot the system and try to reproduce the issue without loading
> any of these proprietary modules.

Hmmm. Preventing the Nvidia module from loading without disabling or
uninstalling the other parts of the graphics driver can easily lead to a
situation where the GUI is not starting. And blacklisting all modules
that VirtualBox needs on the host requires quite a bit of typing at the
boot loader iirc. So how about this:

Quite a few kernels are also tainted because an unsuitable kernel module
was loaded. This for example is the case if you use Nvidia's proprietary
graphics driver, VirtualBox, or other software that installs its own
kernel modules. Such modules might be causing the issue you face, so you
have to prevent them from loading for the reporting process. Most of the
time the easiest way to do that is: temporarily uninstall such software,
including any modules it might have installed. Afterwards reboot.
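
To double-check after the reboot that no such module is loaded anymore, something like the following sketch might do. It assumes that /proc/modules appends a module's taint letters in parentheses at the end of its line (for example '(POE)'); I have not verified this for every kernel version, and the function name is made up.

```python
from typing import Dict


def tainting_modules(proc_modules_text: str) -> Dict[str, str]:
    """Map module names to their taint letters, for modules that have any."""
    result = {}
    for line in proc_modules_text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue
        last = fields[-1]
        # Assumed format: taint letters in parentheses as the last field.
        if last.startswith("(") and last.endswith(")"):
            result[fields[0]] = last.strip("()")
    return result


if __name__ == "__main__":
    # Only works on a running Linux system.
    with open("/proc/modules") as handle:
        for name, flags in tainting_modules(handle.read()).items():
            print(f"{name}: {flags}")
```

An empty result would suggest that none of the currently loaded modules taints the kernel.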

Ciao, Thorsten

2020-10-03 17:48:57

by Randy Dunlap

Subject: Re: [RFC PATCH v1 08/26] docs: reporting-bugs: make readers check the taint flag

On 10/3/20 2:56 AM, Thorsten Leemhuis wrote:
> Many thx for your comments. Consider all the obvious spelling and
> grammatical mistakes you pointed out fixed. I won't mention all of them
> in this reply, to keep things easier to follow.
>
> On 02.10.20 at 19:08, Randy Dunlap wrote:
>> On 10/1/20 1:39 AM, Thorsten Leemhuis wrote:
>
>>> +driver, VirtualBox, or other software that installs its own kernel modules: you
>>> +will have to remove these modules and reboot the system, as they might in fact
>>> +be causing the issue you face.
>> You will need to reboot the system and try to reproduce the issue without loading
>> any of these proprietary modules.
>
> Hmmm. Preventing the Nvidia module from loading without disabling or
> uninstalling the other parts of the graphics driver can easily lead to a
> situation where the GUI is not starting. And blacklisting all modules
> that VirtualBox needs on the host requires quite a bit of typing at the
> boot loader iirc. So how about this:
>
> Quite a few kernels are also tainted because an unsuitable kernel module
> was loaded. This for example is the case if you use Nvidia's proprietary
> graphics driver, VirtualBox, or other software that installs its own
> kernel modules. Such modules might be causing the issue you face, so you
> have to prevent them from loading for the reporting process. Most of the
> time the easiest way to do that is: temporarily uninstall such software,
> including any modules it might have installed. Afterwards reboot.

Sure, OK.

thanks.
--
~Randy

2020-11-09 11:06:13

by Thorsten Leemhuis

Subject: Re: [RFC PATCH v1 00/26] Make reporting-bugs easier to grasp and yet more detailed

Lo!

On 01.10.20 at 10:39, Thorsten Leemhuis wrote:
> This series rewrites the "how to report bugs to the Linux kernel maintainers"
> document to make it more straight forward and the essence easier to grasp. At
> the same time make the text provide a lot more details about the process in form
> of a reference section, so users that want or need to know them have them at
> hand.
>
> The goal of this rewrite: improve the quality of the bug reports and reduce the
> number of reports that get ignored. This was motivated by many reports of poor
> quality the main author of the rewrite stumped upon when he was tracking
> regressions.

So, now that those weeks with the merge window, the OSS & ELC Europe,
and this US election thing are behind us, it seems like a good time to ask:

How to move on with this?

@Jon: I'd really appreciate hearing your thoughts on this.

@Randy: Thx again for all the suggestions and for pointing out many
spelling mistakes; that helped a lot! You didn't reply to some of the
patches, which made me wonder: did you not look at those (which is totally
fine) or was there nothing to point out? And what I'd really like to know:
what do you think about the whole thing?

@Everyone: Yes, I know, the length of the text is a bit intimidating,
but the structure was carefully chosen to get everything crucial across
at the top quickly, to make sure impatient readers quickly find what
they need -- and find the details in later sections, in case they
need them. Yes, sure, that is not easy to achieve, but I think having
all the relevant information close together benefits the
readers. Keeping details out that a significant share of readers will
likely need sounds a bit like saying "we don't take that patch, it's for
the embedded use case and we only care about desktops and servers" to me
(something which we don't do for good reasons and which has served us
quite well afaics)[¹].

Ohh, and btw: I'm still looking for any input on what to write in the
"decode stack trace" section (see patch 19, which you'll also find here
https://lore.kernel.org/lkml/fc63c021e58106559717fe1ecbbd24163e1c152d.1601541165.git.linux@leemhuis.info/
). Has anyone seen a blog post or article on the current state of the art
that might get me started?

Ciao, Thorsten

[¹] Side note: I noticed I even forgot to describe one thing: how to
join an existing mailing list discussion without breaking threading.
That's something that even experienced users sometimes have trouble with,
afaics.


> For the curious, this is how the text looks in the end:
> https://gitlab.com/knurd42/linux/-/raw/reporting-bugs-rfc/Documentation/admin-guide/reporting-bugs.rst
>
> For comparison, here you can find the old text and the commits to it and its
> predecessor:
> https://www.kernel.org/doc/html/latest/admin-guide/reporting-bugs.html
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/plain/Documentation/admin-guide/reporting-bugs.rst
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/log/Documentation/admin-guide/reporting-bugs.rst
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/log/REPORTING-BUGS
>
> This is an early RFC and likely has some spelling and grammatical mistakes.
> Sorry for that, the main author is not a native English speaker and makes too
> many of those mistakes even in his mother tongue. He used hunspell and
> LanguageTool to find errors, but noticed those tools miss quite a few mistakes.
> Hopefully it's not too bad.
>
> The main author of the rewrite is also fully aware the text got quite long in
> the end. That happened as he tried to make users avoid many of the problem he
> noticed in bug report, which needed quite a bit of space to describe.
> Nevertheless, he tried to make sure the text uses a structure where only those
> that want to know all the details have to read it. That's mainly realized with
> the help of the TL;DR and the short guide at the top of the document. Those
> should be good enough for a lot of situations.
>
> There are a few points that will need to be discussed. The comment in the
> individual patches will point some of those out; that for example includes
> things like "dual licensing under CC-BY 4.0", "are we asking too much from users
> when telling them to test mainline?", and "CC LKML or something else on all
> reports?". But a few points are best raised here:
>
> * The old and the new reporting-bugs text take a totally different approach to
> bugzilla.kernel.org. The old mentions it as the place to file your issue if
> you don't know where to go. The new one mentions it rarely and most of the
> time warn users that it's often the wrong place to go. This approach was
> chosen as the main author noticed quite a few users (or even a lot?) get no
> reply to the bugs they file in bugzilla. That's kind of expected, as quite a
> few (many? most?) of the maintainers don't even get notified when reports for
> their subsystem get filed there. Anyway: not getting a reply is something
> that is just annoying for users and might make them angry. Improving bugzilla
> would be an option, but on the kernel and maintainers summit 2017 (sorry it
> took so long) it was agreed on to first go this route, as it's easier to
> reach and less controversial, as many maintainers likely are unwilling to
> deal with bugzilla.
>
> * The text states "see above" or "see below" in a few places. Should those be
> proper links? But then some anchors will need to be placed manually in a few
> places, which slightly hurt readability of the plain text. Could RST or
> autosectionlabel help here somewhat (without changing the line
> "autosectionlabel_maxdepth = 2" in Documentation/conf.py, which likely is
> unwanted)?
>
> * The new text avoids the word "bug" and uses "issues" instead, as users face
> issues which might or might not be caused by bugs. Due to this approach it
> might make sense to rename the document to "reporting-issues". But for now
> everything is left as it is, as changing the name of a well known file has
> downsides; but maybe at least the documents headline should get the
> s/bugs/issues/ treatment.
>
> * How to make sure everybody that cares get a chance to review this? As this is
> an early RFC, the author chose to sent it only to the docs maintainer,
> linux-docs and LKML, to see how well this approach is received in general.
> Once it is agreed that this is the route forward, a lot of other people need
> to be CCed to review it; the stable maintainers for example should check if
> the section on handling issues with stable and longterm kernels is acceptable
> for them. In the end it's something a lot of maintainers might want to take
> at least a quick look at, as they will be dealing with the reports. But there
> is no easy way to contact all of them (apart from CCing all of them), as most
> of them likely don't read LKML anymore. Should the author maybe abuse
> ksummit-discuss, as this likely will reach all the major stakeholders Side
> note: maybe it would be good to have a list for things like this on vger...
>
> The patch series is against docs-next and can also be found on gitlab:
> git://[email protected]:knurd42/linux.git reporting-bugs-rfc
>
> Strictly speaking this series is not bisectable, as the old text it left in
> place and removed slowly by the patches in the series when they add new text
> that covers the same aspect. Thus, both old and new text are incomplete or
> inconsistent (and thus would not build, if we'd talked about code). But that is
> only relevant for those that read the text before the series is fully applied.
> That seemed like an acceptable downside in this case, as this makes it easier to
> compare the old and new approach.
>
> Note: The main autor is not a developer, so he will have gotten a few things in
> the procedure wrong. Let him know if you spot something where things are off.
>
> Thorsten Leemhuis (26):
> docs: reporting-bugs: temporary markers for licensing and diff reasons
> docs: reporting-bugs: Create a TLDR how to report issues
> docs: reporting-bugs: step-by-step guide on how to report issues
> docs: reporting-bugs: step-by-step guide for issues in stable &
> longterm
> docs: reporting-bugs: begin reference section providing details
> docs: reporting-bugs: point out we only care about fresh vanilla
> kernels
> docs: reporting-bugs: let users classify their issue
> docs: reporting-bugs: make readers check the taint flag
> docs: reporting-bugs: help users find the proper place for their
> report
> docs: reporting-bugs: remind people to look for existing reports
> docs: reporting-bugs: remind people to back up their data
> docs: reporting-bugs: tell users to disable DKMS et al.
> docs: reporting-bugs: point out the environment might be causing issue
> docs: reporting-bugs: make users write notes, one for each issue
> docs: reporting-bugs: make readers test mainline, but leave a loophole
> docs: reporting-bugs: let users check taint status again
> docs: reporting-bugs: explain options if reproducing on mainline fails
> docs: reporting-bugs: let users optimize their notes
> docs: reporting-bugs: decode failure messages [need help]
> docs: reporting-bugs: instructions for handling regressions
> docs: reporting-bugs: details on writing and sending the report
> docs: reporting-bugs: explain what users should do once the report got
> out
> docs: reporting-bugs: details for issues specific to stable and
> longterm
> docs: reporting-bugs: explain why users might get neither reply nor
> fix
> docs: reporting-bugs: explain things could be easier
> docs: reporting-bugs: add SPDX tag and license hint, remove markers
>
> Documentation/admin-guide/bug-bisect.rst | 2 +
> Documentation/admin-guide/reporting-bugs.rst | 1586 +++++++++++++++--
> Documentation/admin-guide/tainted-kernels.rst | 2 +
> scripts/ver_linux | 81 -
> 4 files changed, 1441 insertions(+), 230 deletions(-)
> delete mode 100755 scripts/ver_linux
>
>
> base-commit: e0bc9cf0a7d527ff140f851f6f1a815cc5c48fea
>

2020-11-09 18:24:08

by Jonathan Corbet

Subject: Re: [RFC PATCH v1 00/26] Make reporting-bugs easier to grasp and yet more detailed

On Mon, 9 Nov 2020 12:01:56 +0100
Thorsten Leemhuis wrote:

> @Jon: I'd really appreciate hearing your thoughts on this.

Seems like it's time to post a new version with all of your feedback so
far reflected, and we'll go from there?

Thanks,

jon

2020-11-10 03:26:00

by Randy Dunlap

Subject: Re: [RFC PATCH v1 00/26] Make reporting-bugs easier to grasp and yet more detailed

On 11/9/20 3:01 AM, Thorsten Leemhuis wrote:
> Lo!
>
> Am 01.10.20 um 10:39 schrieb Thorsten Leemhuis:
>> This series rewrites the "how to report bugs to the Linux kernel maintainers"
>> document to make it more straightforward and the essence easier to grasp. At
>> the same time make the text provide a lot more details about the process in form
>> of a reference section, so users that want or need to know them have them at
>> hand.
>>
>> The goal of this rewrite: improve the quality of the bug reports and reduce the
>> number of reports that get ignored. This was motivated by many reports of poor
>> quality the main author of the rewrite stumbled upon when he was tracking
>> regressions.
>
> So, now that those weeks with the merge window, the OSS & ELC Europe, and this US election thing are behind us it seems like a good time to ask:
>
> How to move on with this?
>
> @Jon: I'd really appreciate hearing your thoughts on this.
>
> @Randy: Thx again for all suggestions and pointing out many spelling mistakes, that helped a lot! You didn't reply to some of the patches, which made me wonder: did you not look at those (which is totally fine) or was there nothing to point out? And what I'd really like to know: what are you thinking about the whole thing?

Hi,

I looked at all of the patches in the series but did not have any comments
on the ones where I didn't reply.


thanks.

--
~Randy

2020-11-10 12:03:39

by Thorsten Leemhuis

Subject: Re: [RFC PATCH v1 00/26] Make reporting-bugs easier to grasp and yet more detailed

Am 09.11.20 um 19:21 schrieb Jonathan Corbet:
> On Mon, 9 Nov 2020 12:01:56 +0100
> Thorsten Leemhuis wrote:
>
>> @Jon: I'd really appreciate hearing your thoughts on this.
>
> Seems like it's time to post a new version with all of your feedback so
> far reflected, and we'll go from there?

Will do, just give me a day or two.

Ciao, Thorsten

P.S.: BTW, @Randy, thx for yesterday's clarification in another mail of
this subthread!

2020-11-11 15:26:51

by Thorsten Leemhuis

Subject: Re: [RFC PATCH v1 02/26] docs: reporting-bugs: Create a TLDR how to report issues

Am 03.10.20 um 09:27 schrieb Thorsten Leemhuis:
> Randy, many thanks for looking through this, your feedback is much
> appreciated! Consider all the obvious spelling and grammatical mistakes
> you pointed out fixed, I won't mention all of them in this reply to keep
> things easier to follow.
>
> Am 02.10.20 um 04:32 schrieb Randy Dunlap:
>> On 10/1/20 1:39 AM, Thorsten Leemhuis wrote:
>> […]
>>> +<https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/plain/MAINTAINERS>`_
>>> +how developers of that particular area expect to be told about issues; note,
>> for how
>> ?
> Not sure myself, but I guess you're right and thus followed your advice :-D

I'm preparing to send v2 and was a bit unhappy with this and another
section when seeing it again after weeks. In the end I reshuffled and
rewrote significant parts of it, see below.

Randy, would be great if you could take another look, but no pressure:
just ignore it, if you lack the time or energy.

```
The short guide (aka TL;DR)
===========================

If you're facing multiple issues with the Linux kernel at once, report
each separately to its developers. Try your best to guess which kernel part
might be causing the issue. Check the :ref:`MAINTAINERS <maintainers>`
file for how its developers expect to be told about issues. Note, it's
rarely `bugzilla.kernel.org <https://bugzilla.kernel.org/>`_, as in
almost all cases the report needs to be sent by email!

Check the destination thoroughly for existing reports; also search the
LKML archives and the web. Join existing discussions if you find matches.
If you don't find any, install `the latest Linux mainline kernel
<https://kernel.org/>`_. Make sure it's vanilla, thus is not patched or
using add-on kernel modules. Also ensure the kernel is running in a
healthy environment and is not already tainted before the issue occurs.

If you can reproduce your issue with the mainline kernel, send a report
to the destination you determined earlier. Make sure it includes all
relevant information, which in case of a regression should mention the
change that's causing it, which can often be found with a bisection.
Also ensure the report reaches all people that need to know about it,
for example the security team, the stable maintainers or the developers
of the patch that causes a regression. Once the report is out, answer
any questions that might be raised and help where you can. That includes
keeping the ball rolling: every time a new rc1 mainline kernel is
released, check if the issue is still happening there and attach a
status update to your initial report.

If you cannot reproduce the issue with the mainline kernel, consider
sticking with it; if you'd like to use an older version line and want to
see it fixed there, first make sure it's still supported. Install its
latest release as vanilla kernel. If you cannot reproduce the issue
there, try to find the commit that fixed it in mainline or any
discussion preceding it: those will often mention if backporting is
planned or considered impassable. If backporting was not discussed, ask
if it's in the cards. In case you don't find any commits or a preceding
discussion, see the Linux-stable mailing list archives for existing
reports, as it might be a regression specific to the version line. If it
is, it round about needs to be reported like a problem in mainline
(including the bisection).

If you reached this point without a solution, ask for advice on the
subsystem's mailing list.
```

Ciao, Thorsten

2020-11-12 05:41:28

by Randy Dunlap

Subject: Re: [RFC PATCH v1 02/26] docs: reporting-bugs: Create a TLDR how to report issues

On 11/11/20 7:24 AM, Thorsten Leemhuis wrote:
> Am 03.10.20 um 09:27 schrieb Thorsten Leemhuis:
>> Randy, many thanks for looking through this, your feedback is much
>> appreciated! Consider all the obvious spelling and grammatical mistakes
>> you pointed out fixed, I won't mention all of them in this reply to keep
>> things easier to follow.
>>
>> Am 02.10.20 um 04:32 schrieb Randy Dunlap:
>>> On 10/1/20 1:39 AM, Thorsten Leemhuis wrote:
>>> […]
>>>> +<https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/plain/MAINTAINERS>`_
>>>> +how developers of that particular area expect to be told about issues; note,
>>>     for how
>>> ?
>> Not sure myself, but I guess you're right and thus followed your advice :-D
>
> I'm preparing to send v2 and was a bit unhappy with this and another section when seeing it again after weeks. In the end I reshuffled and rewrote significant parts of it, see below.
>
> Randy, would be great if you could take another look, but no pressure: just ignore it, if you lack the time or energy.
>
> ```
> The short guide (aka TL;DR)
> ===========================
>
> If you're facing multiple issues with the Linux kernel at once, report each separately to its developers. Try your best to guess which kernel part might be causing the issue. Check the :ref:`MAINTAINERS <maintainers>` file for how its developers expect to be told about issues. Note, it's rarely `bugzilla.kernel.org <https://bugzilla.kernel.org/>`_, as in almost all cases the report needs to be sent by email!
>
> Check the destination thoroughly for existing reports; also search the LKML archives and the web. Join existing discussion if you find matches. If you don't find any, install `the latest Linux mainline kernel <https://kernel.org/>`_. Make sure it's vanilla, thus is not patched or using add-on kernel modules. Also ensure the kernel is running in a healthy environment and is not already tainted before the issue occurs.
>
> If you can reproduce your issue with the mainline kernel, send a report to the destination you determined earlier. Make sure it includes all relevant information, which in case of a regression should mention the change that's causing it, which can often be found with a bisection. Also ensure the report reaches all people that need to know about it, for example the security team, the stable maintainers or the developers of the patch that causes a regression. Once the report is out, answer any questions that might be raised and help where you can. That includes keeping the ball rolling: every time a new rc1 mainline kernel is released, check if the issue is still happening there and attach a status update to your initial report.
>
> If you cannot reproduce the issue with the mainline kernel, consider sticking with it; if you'd like to use an older version line and want to see it fixed there, first make sure it's still supported. Install its latest release as vanilla kernel. If you cannot reproduce the issue there, try to find the commit that fixed it in mainline or any discussion preceding it: those will often mention if backporting is planned or considered impassable. If backporting was not discussed, ask if it's in the cards. In case you don't find

impossible. ??

> any commits or a preceding discussion, see the Linux-stable mailing list archives for existing reports, as it might be a regression specific to the version line. If it is, it round about needs to be reported like a problem in mainline (including the bisection).

maybe: it still needs to be reported like

>
> If you reached this point without a solution, ask for advice on the subsystem's mailing list.
> ```

Otherwise it looks good to me.

thanks.
--
~Randy

2020-11-12 05:44:04

by Thorsten Leemhuis

Subject: Re: [RFC PATCH v1 02/26] docs: reporting-bugs: Create a TLDR how to report issues

Am 12.11.20 um 04:33 schrieb Randy Dunlap:
> On 11/11/20 7:24 AM, Thorsten Leemhuis wrote:
>> Am 03.10.20 um 09:27 schrieb Thorsten Leemhuis:
>>> Am 02.10.20 um 04:32 schrieb Randy Dunlap:
>>>> On 10/1/20 1:39 AM, Thorsten Leemhuis wrote: […]
> […]

Sorry for the mail with those overly long lines, seems Thunderbird does
not behave as it used to (or I did something stupid) :-/

>> I'm preparing to send v2 and was a bit unhappy with this and
>> another section when seeing it again after weeks. In the end I
>> reshuffled and rewrote significant parts of it, see below.
>>
>> […]
>> If you cannot reproduce the issue with the mainline kernel,
>> consider sticking with it; if you'd like to use an older version
>> line and want to see it fixed there, first make sure it's still
>> supported. Install its latest release as vanilla kernel. If you
>> cannot reproduce the issue there, try to find the commit that fixed
>> it in mainline or any discussion preceding it: those will often
>> mention if backporting is planned or considered impassable. If
>> backporting was not discussed, ask if it's in the cards. In case
>> you don't find
> impossible. ??

Hmmm, I didn't want to use "impossible" as it often is possible, but
considered too hard/too much work. But I guess my dict sent me the wrong way.

I guess I'll switch to "considered too complex".

>> any commits or a preceding discussion, see the Linux-stable mailing
>> list archives for existing reports, as it might be a regression
>> specific to the version line. If it is, it round about needs to be
>> reported like a problem in mainline (including the bisection).
> maybe: it still needs to be reported like

Went with:

If it is, report it like you would report a problem in mainline
(including the bisection).

>> If you reached this point without a solution, ask for advice on
>> the subsystem's mailing list. ```
> Otherwise it looks good to me.

Many thanks for looking at it, much appreciated!

Ciao, Thorsten