2024-03-26 12:22:37

by Thorsten Leemhuis

Subject: [RFC PATCH v1 0/2] docs: reporting-issues: rework while involving the 'verify bugs' text

This is an RFC with two WIP patches that basically rewrite the detailed
step-by-step guide and the TLDR of
Documentation/admin-guide/reporting-issues.rst. Those two patches cover
all main changes I currently plan to do in those areas of the text, but
the explanations in the reference section are not yet updated to match
the changed step-by-step guide.

I'm nevertheless posting this now as RFC so people get a chance to
express things like "Thorsten, you are crazy, go away and find a hobby"
or "you are on the wrong path, this makes things worse, and would also
create a lot of trouble for translators for a questionable gain".
Getting such feedback now would be good: I'd prefer to not waste time on
updating the reference section if something like the two patches posted
here has no chance to be merged.

That being said: I (obviously) think these changes are worth it, as they
make both the TLDR and the guide easier to follow and fix a few things
that didn't work too well. It also offers users a new fast track to
inquire if a regression is known already. The step-by-step guide
furthermore is now a bit more verbose, so users have to consult the
reference section less -- this felt appropriate, now that the TLDR uses
a step-by-step approach as well that is quite similar.

In the end it looks like a rewrite, even if many things remained
similar. And all in all those changes sadly make both sections larger:

TLDR:
- before: 374 words, 2332 characters;
- after: 491 words, 3085 characters

Step-by-step guide:
- before: 1058 words, 6279 characters (excluding a section that becomes
obsolete)
- before: 1332 words, 8048 characters (including a section that becomes
obsolete)
- after: 1491 words, 9015 characters;
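
To put the figures above in perspective, they can be turned into absolute
and relative growth. The following is a minimal sketch in Python; the
`growth` helper and the layout of `COUNTS` are illustrative assumptions and
not part of the patches, only the numbers are taken from the list above.

```python
# Word and character counts as quoted in the cover letter: (words, characters).
COUNTS = {
    "TLDR": ((374, 2332), (491, 3085)),
    "guide, excluding obsolete section": ((1058, 6279), (1491, 9015)),
    "guide, including obsolete section": ((1332, 8048), (1491, 9015)),
}


def growth(before: int, after: int) -> tuple[int, float]:
    """Return the absolute growth and the growth in percent (one decimal)."""
    delta = after - before
    return delta, round(100 * delta / before, 1)


if __name__ == "__main__":
    for name, (before, after) in COUNTS.items():
        words, words_pct = growth(before[0], after[0])
        chars, chars_pct = growth(before[1], after[1])
        print(f"{name}: +{words} words ({words_pct}%), "
              f"+{chars} characters ({chars_pct}%)")
```

Which of the two 'before' baselines is the fairer one for the step-by-step
guide depends on whether the section that becomes obsolete is counted as
part of the old guide.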

Note, the changes to the reference section should not turn out to be as
extensive as these two patches, as many of the steps in the new detailed
step-by-step guide had equivalents in the older one; many sections in
the reference section will thus only need small changes or maybe none at
all; a few things are also unnecessary now, so the reference section
should get shorter.

To ease reviewing and translation, I plan to submit the changes to
the reference section in two steps. The first patch will perform all
changes, but will add newlines before significant changes, which will
wrap at 120 characters or so: both things should make it easier to see
the actual changes with ordinary diff. A second patch then will just
rewrap the text to the usual 80-character boundary.
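
The point of the two-step plan is that the second patch is purely
mechanical: it changes line breaks, not words. A rough sketch of that
property in Python follows, using the standard `textwrap` module; the
`rewrap` helper and the sample paragraph are assumptions made for
illustration, and the real patches would of course be produced with an
editor and git rather than a script like this.

```python
import textwrap


def rewrap(paragraph: str, width: int) -> str:
    """Re-fill one paragraph to `width` columns without altering its words."""
    words = paragraph.split()
    return textwrap.fill(
        " ".join(words),
        width=width,
        break_long_words=False,   # never split a word (e.g. a long path)
        break_on_hyphens=False,   # keep 'up-to-date' in one piece
    )


# Hypothetical paragraph as it might look after the first patch (wide lines).
WIDE = rewrap(
    "Verify the bug occurs with an up-to-date kernel. For regressions within "
    "a still supported stable or longterm series this means the latest "
    "release from that series.",
    120,
)

# What the second patch would do: same words, usual 80-column wrapping.
NARROW = rewrap(WIDE, 80)

if __name__ == "__main__":
    print(NARROW)
    # A reviewer or translator can confirm that only whitespace changed.
    print("identical wording:", WIDE.split() == NARROW.split())
```

Comparing the word sequences, as in the last line, is roughly what
`git diff --word-diff` lets a reviewer do on the real patches.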

Side note: the two patches submitted now could and maybe should be
merged into one, but I decided to keep them separate for now to have
section-specific diffstats.

Thorsten Leemhuis (2):
docs: reporting-issue: rework the detailed guide
docs: reporting-issue: rework the TLDR

.../admin-guide/reporting-issues.rst | 497 ++++++++++--------
1 file changed, 273 insertions(+), 224 deletions(-)


base-commit: b8cfda5c9065cd619a97c17da081cbfab3b1e756
--
2.44.0



2024-03-26 12:22:53

by Thorsten Leemhuis

Subject: [RFC PATCH v1 1/2] docs: reporting-issue: rework the detailed guide

Rework the detailed step-by-step guide for various reasons:

* Simplify the search with the help of lore.kernel.org/all/, which did
not exist when the text was written.

* Make use of the recently added document
Documentation/admin-guide/verify-bugs-and-bisect-regressions.rst,
which covers many of the steps this text partly covered, and does so
far better.

* The 'quickly report a stable regression to the stable team' approach
hardly worked out: most of the time the regression was not known yet.
Try a different approach using the regressions list.

* Reports about stable/longterm regressions most of the time were
greeted with a brief reply along the lines of 'Is mainline affected as
well?'; this is needed to determine who is responsible, so we might as
well make the reporter check that before sending the report (which
verify-bugs-and-bisect-regressions.rst already tells them to do, too).

* A lot of fine tuning after seeing what people were struggling with.

FIXME: adjust the entries in the reference section to match these
changes.

Not-signed-off-by: Thorsten Leemhuis <[email protected]>
---
.../admin-guide/reporting-issues.rst | 391 ++++++++++--------
1 file changed, 210 insertions(+), 181 deletions(-)

diff --git a/Documentation/admin-guide/reporting-issues.rst b/Documentation/admin-guide/reporting-issues.rst
index 2fd5a030235ad0..e6083946c146e8 100644
--- a/Documentation/admin-guide/reporting-issues.rst
+++ b/Documentation/admin-guide/reporting-issues.rst
@@ -48,187 +48,216 @@ Once the report is out, answer any questions that come up and help where you
can. That includes keeping the ball rolling by occasionally retesting with newer
releases and sending a status update afterwards.

-Step-by-step guide how to report issues to the kernel maintainers
-=================================================================
-
-The above TL;DR outlines roughly how to report issues to the Linux kernel
-developers. It might be all that's needed for people already familiar with
-reporting issues to Free/Libre & Open Source Software (FLOSS) projects. For
-everyone else there is this section. It is more detailed and uses a
-step-by-step approach. It still tries to be brief for readability and leaves
-out a lot of details; those are described below the step-by-step guide in a
-reference section, which explains each of the steps in more detail.
-
-Note: this section covers a few more aspects than the TL;DR and does things in
-a slightly different order. That's in your interest, to make sure you notice
-early if an issue that looks like a Linux kernel problem is actually caused by
-something else. These steps thus help to ensure the time you invest in this
-process won't feel wasted in the end:
-
- * Are you facing an issue with a Linux kernel a hardware or software vendor
- provided? Then in almost all cases you are better off to stop reading this
- document and reporting the issue to your vendor instead, unless you are
- willing to install the latest Linux version yourself. Be aware the latter
- will often be needed anyway to hunt down and fix issues.
-
- * Perform a rough search for existing reports with your favorite internet
- search engine; additionally, check the archives of the `Linux Kernel Mailing
- List (LKML) <https://lore.kernel.org/lkml/>`_. If you find matching reports,
- join the discussion instead of sending a new one.
-
- * See if the issue you are dealing with qualifies as regression, security
- issue, or a really severe problem: those are 'issues of high priority' that
- need special handling in some steps that are about to follow.
-
- * Make sure it's not the kernel's surroundings that are causing the issue
- you face.
-
- * Create a fresh backup and put system repair and restore tools at hand.
-
- * Ensure your system does not enhance its kernels by building additional
- kernel modules on-the-fly, which solutions like DKMS might be doing locally
- without your knowledge.
-
- * Check if your kernel was 'tainted' when the issue occurred, as the event
- that made the kernel set this flag might be causing the issue you face.
-
- * Write down coarsely how to reproduce the issue. If you deal with multiple
- issues at once, create separate notes for each of them and make sure they
- work independently on a freshly booted system. That's needed, as each issue
- needs to get reported to the kernel developers separately, unless they are
- strongly entangled.
-
- * If you are facing a regression within a stable or longterm version line
- (say something broke when updating from 5.10.4 to 5.10.5), scroll down to
- 'Dealing with regressions within a stable and longterm kernel line'.
-
- * Locate the driver or kernel subsystem that seems to be causing the issue.
- Find out how and where its developers expect reports. Note: most of the
- time this won't be bugzilla.kernel.org, as issues typically need to be sent
- by mail to a maintainer and a public mailing list.
-
- * Search the archives of the bug tracker or mailing list in question
- thoroughly for reports that might match your issue. If you find anything,
- join the discussion instead of sending a new report.
-
-After these preparations you'll now enter the main part:
-
- * Unless you are already running the latest 'mainline' Linux kernel, better
- go and install it for the reporting process. Testing and reporting with
- the latest 'stable' Linux can be an acceptable alternative in some
- situations; during the merge window that actually might be even the best
- approach, but in that development phase it can be an even better idea to
- suspend your efforts for a few days anyway. Whatever version you choose,
- ideally use a 'vanilla' build. Ignoring these advices will dramatically
- increase the risk your report will be rejected or ignored.
-
- * Ensure the kernel you just installed does not 'taint' itself when
- running.
-
- * Reproduce the issue with the kernel you just installed. If it doesn't show
- up there, scroll down to the instructions for issues only happening with
- stable and longterm kernels.
-
- * Optimize your notes: try to find and write the most straightforward way to
- reproduce your issue. Make sure the end result has all the important
- details, and at the same time is easy to read and understand for others
- that hear about it for the first time. And if you learned something in this
- process, consider searching again for existing reports about the issue.
-
- * If your failure involves a 'panic', 'Oops', 'warning', or 'BUG', consider
- decoding the kernel log to find the line of code that triggered the error.
-
- * If your problem is a regression, try to narrow down when the issue was
- introduced as much as possible.
-
- * Start to compile the report by writing a detailed description about the
- issue. Always mention a few things: the latest kernel version you installed
- for reproducing, the Linux Distribution used, and your notes on how to
- reproduce the issue. Ideally, make the kernel's build configuration
- (.config) and the output from ``dmesg`` available somewhere on the net and
- link to it. Include or upload all other information that might be relevant,
- like the output/screenshot of an Oops or the output from ``lspci``. Once
- you wrote this main part, insert a normal length paragraph on top of it
- outlining the issue and the impact quickly. On top of this add one sentence
- that briefly describes the problem and gets people to read on. Now give the
- thing a descriptive title or subject that yet again is shorter. Then you're
- ready to send or file the report like the MAINTAINERS file told you, unless
- you are dealing with one of those 'issues of high priority': they need
- special care which is explained in 'Special handling for high priority
- issues' below.
-
- * Wait for reactions and keep the thing rolling until you can accept the
- outcome in one way or the other. Thus react publicly and in a timely manner
- to any inquiries. Test proposed fixes. Do proactive testing: retest with at
- least every first release candidate (RC) of a new mainline version and
- report your results. Send friendly reminders if things stall. And try to
- help yourself, if you don't get any help or if it's unsatisfying.
-
-
-Reporting regressions within a stable and longterm kernel line
---------------------------------------------------------------
-
-This subsection is for you, if you followed above process and got sent here at
-the point about regression within a stable or longterm kernel version line. You
-face one of those if something breaks when updating from 5.10.4 to 5.10.5 (a
-switch from 5.9.15 to 5.10.5 does not qualify). The developers want to fix such
-regressions as quickly as possible, hence there is a streamlined process to
-report them:
-
- * Check if the kernel developers still maintain the Linux kernel version
- line you care about: go to the `front page of kernel.org
- <https://kernel.org/>`_ and make sure it mentions
- the latest release of the particular version line without an '[EOL]' tag.
-
- * Check the archives of the `Linux stable mailing list
- <https://lore.kernel.org/stable/>`_ for existing reports.
-
- * Install the latest release from the particular version line as a vanilla
- kernel. Ensure this kernel is not tainted and still shows the problem, as
- the issue might have already been fixed there. If you first noticed the
- problem with a vendor kernel, check a vanilla build of the last version
- known to work performs fine as well.
-
- * Send a short problem report to the Linux stable mailing list
- ([email protected]) and CC the Linux regressions mailing list
- ([email protected]); if you suspect the cause in a particular
- subsystem, CC its maintainer and its mailing list. Roughly describe the
- issue and ideally explain how to reproduce it. Mention the first version
- that shows the problem and the last version that's working fine. Then
- wait for further instructions.
-
-The reference section below explains each of these steps in more detail.
-
-
-Reporting issues only occurring in older kernel version lines
--------------------------------------------------------------
-
-This subsection is for you, if you tried the latest mainline kernel as outlined
-above, but failed to reproduce your issue there; at the same time you want to
-see the issue fixed in a still supported stable or longterm series or vendor
-kernels regularly rebased on those. If that the case, follow these steps:
-
- * Prepare yourself for the possibility that going through the next few steps
- might not get the issue solved in older releases: the fix might be too big
- or risky to get backported there.
-
- * Perform the first three steps in the section "Dealing with regressions
- within a stable and longterm kernel line" above.
-
- * Search the Linux kernel version control system for the change that fixed
- the issue in mainline, as its commit message might tell you if the fix is
- scheduled for backporting already. If you don't find anything that way,
- search the appropriate mailing lists for posts that discuss such an issue
- or peer-review possible fixes; then check the discussions if the fix was
- deemed unsuitable for backporting. If backporting was not considered at
- all, join the newest discussion, asking if it's in the cards.
-
- * One of the former steps should lead to a solution. If that doesn't work
- out, ask the maintainers for the subsystem that seems to be causing the
- issue for advice; CC the mailing list for the particular subsystem as well
- as the stable mailing list.
-
-The reference section below explains each of these steps in more detail.
+The detailed step-by-step guide on reporting Linux kernel issues
+================================================================
+
+The short guide above might be all that is needed for people already familiar
+with reporting issues to Free/Libre & Open Source Software projects. For
+everyone else there is this more detailed step-by-step guide. It still tries to
+be brief and leaves a lot of occasionally relevant details to a reference
+section, which holds additional information for almost all of the steps.
+
+Note: this step-by-step guide covers more aspects than the short guide above and
+does things in a slightly different order; that is done in your interest, to
+make sure you notice early on when you are on the wrong track.
+
+* Be aware you must have or install a fresh vanilla mainline kernel for
+ reporting; you furthermore must remove any software that builds or relies on
+ externally developed kernel modules possibly installed. There is also a decent
+ chance you will have to build a patched kernel yourself to help resolve the
+ issue.
+
+ In case that sounds too demanding to you, better report the issue to the vendor
+ who built your kernel (usually your Linux distributor or hardware manufacturer).
+
+* Skim the output of ``journalctl -k`` for any indicators of problems that might
+ lead to your bug.
+
+* Check if the kernel was already 'tainted' when the issue first occurred: the
+ event that led to this flag being set might cause your issue, even if it looks
+ totally unrelated.
+
+* Consider whether a glitch in your kernel's environment makes it misbehave, like
+ a hardware defect, a mis-configured system firmware, an overclocked component,
+ a broken initramfs, an inconsistent file system, broken firmware files,
+ a pre-release compiler, or a malfunctioning/misconfigured Linux distribution.
+
+* If you deal with multiple issues at once, process them separately from now on.
+ If there is even a small chance they are related, briefly mention the other
+ issues in each of the reports later, ideally while linking to the others.
+
+* Search for fixes and earlier reports referring to an issue like yours. Start
+ by checking `lore <https://lore.kernel.org/all/>`_. Then perform a general
+ internet search. Consult :ref:`MAINTAINERS <maintainers>` to determine where
+ developers of the affected code expect bugs to be submitted to; if in doubt,
+ use your best guess to determine the driver or kernel subsystem. If its
+ developers have a dedicated mailing list not archived on lore, search its
+ archives; when they are among the few that use one of
+ various bug trackers, search it as well. Note, bugzilla.kernel.org
+ is the right place to file bugs only for a small percentage of the kernel; if
+ you submit bugs for other code there it most likely will be ignored.
+
+ If you find fixes, try them. If you find matching reports, evaluate which
+ is wiser: joining the discussion or reporting the problem anew. In the latter
+ case mention and link to the related report you found; after you submit it,
+ add a note to the related report along the lines of 'I have a problem that
+ might be the same or related, for details see <link_to_your_report>'.
+
+* Are you facing a regression? One still occurring with a less than two
+ (ideally: one) weeks old kernel from the affected series? A kernel that is
+ vanilla or close to it? Then send a brief (one or two short paragraphs) email
+ to <[email protected]> asking if the problem is known already.
+ Consider proceeding with this guide immediately to confine the problem and
+ report it properly; definitely do so, if you don't receive any helpful
+ answer within three days.
+
+* Evaluate if the issue you are dealing with qualifies as regression, security
+ issue, or a really severe problem: those need special handling in some of the
+ following steps.
+
+* Write down coarsely how to reproduce the issue on a freshly booted system.
+
+* Verify the bug and potentially bisect any regression as described in
+ Documentation/admin-guide/verify-bugs-and-bisect-regressions.rst;
+ alternatively handle the tasks it covers on your own:
+
+ * Verify the bug occurs with an up-to-date kernel. For regressions within a
+ still supported stable or longterm series this means the latest release from
+ that series. In all other cases, this means a mainline release, pre-release, or
+ snapshot ideally less than one week old and two at maximum; the latest release
+ from the newest stable series might work as well, especially if the series
+ is based on a mainline version released in the past two weeks.
+
+ * In case of a regression, consider bisecting it. If it is one within a stable
+ or longterm series, you must verify if current mainline is affected as well.
+
+ * All kernels used for verifying and reporting bugs must be free of externally
+ developed modules (like Nvidia's graphics drivers, OpenZFS, or VirtualBox's
+ host drivers). The kernels also should be built from pristine (aka 'vanilla')
+ Linux sources, but lightly patched might work, too. The kernels furthermore
+ should not be 'tainted' when the issue occurs.
+
+ Note, don't skip this step or take its demands lightly, as there is a
+ decent chance your report otherwise will be ignored or welcomed brusquely.
+
+* If you learned anything new about the bug while following this guide so far,
+ consider searching once more for earlier reports and fixes.
+
+* Were you unable to reproduce a bug with a current mainline kernel you want to
+ see fixed in a stable or longterm series? A bug that is not a regression? Then
+ move over to 'Resolving non-regressions only occurring in stable or longterm
+ kernels'.
+
+* Optional: if your failure involves a 'panic', 'Oops', 'warning', or 'BUG',
+ ideally decode the included stack trace.
+
+* Prepare the report by writing a detailed description of the issue.
+
+ Always mention the Linux distribution and the kernel version used for the
+ verification; also include your notes on how to reproduce the issue. If your
+ failure involves a 'panic', 'Oops', 'warning', or 'BUG', include a copy or
+ photo of it.
+
+ Most of the time you also want to describe relevant aspects of your
+ environment, like the machine's model name, the relevant hardware components,
+ or the version of related userspace drivers. Often you want to also save the
+ output of ``journalctl -k`` to a file you later attach to your report or
+ upload somewhere and link to.
+
+ If other aspects of the environment are likely relevant, attach or upload &
+ link detailed information about them as well, like the output from commands
+ such as ``lsblk``, ``lspci``, ``lsusb.py`` and
+ ``grep -s '' /sys/class/dmi/id/*``.
+
+ If anything in the attached or linked files is certainly relevant, ensure
+ to copy that part to the body of the report to make it easily accessible.
+ Furthermore make sure to not overload the report with many or huge
+ attachments: developers will ask for additional data when needed.
+
+ Ensure both the subject and the first sentence of the report outline the core
+ of the problem and get people interested enough to read on.
+
+ When finished, review and optimize the report once more to make it as
+ straightforward as possible and the core of the problem easy to grasp.
+
+* Submit your report in the appropriate way, which depends on the outcome of the
+ verification:
+
+ * In case you deal with a security issue, follow the instructions in
+ Documentation/process/security-bugs.rst.
+
+ * Are you facing a regression within a stable or longterm kernel series you
+ were unable to reproduce with a fresh mainline kernel? Then report it by
+ email to the stable team while CCing the regressions list (To:
+ Greg Kroah-Hartman <[email protected]>,
+ Sasha Levin <[email protected]>; CC: [email protected],
+ [email protected]).
+
+ * In all other cases, submit the report as specified in MAINTAINERS. In case
+ of a regression you have to report by mail, CC the regressions list
+ ([email protected]); when you know the culprit, also CC everyone
+ in its 'Signed-off-by' chain. In case of a regression you had to file in a
+ bug tracker, write a short heads-up email with a link to the report to the
+ list and everyone that signed the patch off, if the culprit is known.
+
+ Did you send the brief inquiry about a regression mentioned earlier? Then in
+ both of these cases keep it involved: either send your report as a reply to
+ the earlier inquiry while adding relevant recipients or send a quick note
+ with a link to the proper report.
+
+* Wait for reactions and keep the ball rolling until you can accept the outcome
+ in one way or the other. Among other things, that means:
+
+ * React publicly and in a timely manner to any inquiries.
+
+ * Try to quickly test proposed fixes.
+
+ * Perform proactive testing: retest with at least every first release
+ candidate (e.g. -rc1) of a new mainline version and report your findings in
+ a reply to your report.
+
+ * If things stall for more than three or four weeks, check if that happened
+ due to an inadequate report of yours; if not, send a friendly inquiry.
+
+ * Be aware that nobody is obliged to help you, unless it is a recent
+ regression, a security issue, or a really severe problem; hence try to help
+ yourself, if you don't receive any or only unsatisfying help.
+
+Resolving non-regressions only occurring in stable or longterm kernels
+----------------------------------------------------------------------
+
+Are you facing an issue in a still supported stable or longterm series you were
+unable to reproduce with a fresh mainline kernel? An issue that is also not a
+regression and still happens in the series' latest release? In that case follow
+these steps:
+
+* Prepare yourself for the possibility that trying to get the issue resolved
+ in the affected stable or longterm series might not work out: the fix might be
+ too big or risky to include there.
+
+* Search Linux' mainline Git repository or lore for the change that resolved the
+ issue; when unsuccessful, consider using a bisection to find it. Then check
+ the description of the fix for a 'stable tag', e.g., a line like
+ 'Cc: <[email protected]>':
+
+ * In case there is such a tag the change is already scheduled for backporting.
+ Usually it will be picked up within two or three weeks after being merged to
+ mainline. Note, a version number after the tag might limit backporting to a
+ series that is newer than the one you care for; plans to backport a change
+ sometimes are also discarded. In such cases search lore or contact the
+ involved developers for details, but you likely are out of luck.
+
+ * If there was no stable tag, check the mailing list archives if backporting
+ nevertheless is in the works. If not, search for the review of the fix and
+ check if backporting to stable and longterm kernels is planned or was
+ rejected. If it's neither, send a reply asking the developers if backporting
+ to the series is an option. Note, they might greenlight it, but be unwilling to
+ handle the job themselves -- in that case consider testing and submitting the
+ fix and everything it depends on as explained in
+ Documentation/process/stable-kernel-rules.rst.
+
+ In case you have trouble locating the fix or the discussion about it, consider
+ asking the maintainers and developers of the affected subsystem for advice.


Reference section: Reporting issues to the kernel maintainers
--
2.44.0
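
The 'stable tag' check described in the last section of the patch above
lends itself to a small illustration. The following Python sketch scans a
commit message for such a trailer and extracts the optional note after the
'#', which is where a version limit usually sits. Everything here is an
assumption made for illustration: the regular expression, the helper name
`find_stable_tags`, and the sample commit message are hypothetical. The
archived text above shows the address in redacted form; the sample uses the
address given in Documentation/process/stable-kernel-rules.rst.

```python
import re

# Matches trailer lines like 'Cc: <stable@...> # 6.1+' (note is optional).
# Only spaces/tabs are allowed between the parts so a match stays on one line.
STABLE_TAG = re.compile(
    r"^Cc:[ \t]*<?stable@[^\s>]+>?[ \t]*(?:#[ \t]*(?P<note>.*))?$",
    re.IGNORECASE | re.MULTILINE,
)


def find_stable_tags(commit_message: str) -> list[str]:
    """Return one entry per stable tag; the entry is the note after '#'."""
    return [
        (match.group("note") or "").strip()
        for match in STABLE_TAG.finditer(commit_message)
    ]


# Hypothetical commit message, made up for this example.
SAMPLE = """\
foo: fix use-after-free in bar teardown

Explanation of the problem and the fix.

Fixes: 123456789abc ("foo: add bar")
Cc: <stable@vger.kernel.org> # 6.1+
Signed-off-by: A Developer <dev@example.com>
"""

if __name__ == "__main__":
    tags = find_stable_tags(SAMPLE)
    if tags:
        print("stable tag found, notes:", tags)
    else:
        print("no stable tag; check the review thread on lore")
```

In practice the commit message would come from something like
`git log -1 --format=%B <commit>`. A note such as '6.1+' is exactly the
case the patch warns about: it might limit backporting to series newer
than the one the reporter cares about.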


2024-03-26 12:26:51

by Thorsten Leemhuis

Subject: [RFC PATCH v1 2/2] docs: reporting-issue: rework the TLDR

Rework the TLDR (aka the short guide) for various reasons:

* People had to read it entirely and then act upon what they learned,
which from feedback I got was apparently somewhat hard and confusing
given everything we expect from bug reporters; this partly was because
the first paragraph covered a special case (regression in
stable/longterm kernel) and not the main aspect most people cared
about when they came to the document.

Use a step-by-step approach to avoid this.

* Make use of
Documentation/admin-guide/verify-bugs-and-bisect-regressions.rst

* The 'quickly report a stable regression to the stable team' approach
hardly worked out: most of the time the regression was not known yet.
Try a different approach using the regressions list.

* Reports about stable/longterm regressions most of the time were
greeted with a brief reply along the lines of 'Is mainline affected as
well?'; this is needed to determine who is responsible, so we might as
well make the reporter check that before sending the report (which
verify-bugs-and-bisect-regressions.rst already tells them to do, too).

Not-signed-off-by: Thorsten Leemhuis <[email protected]>
---
.../admin-guide/reporting-issues.rst | 104 +++++++++++-------
1 file changed, 62 insertions(+), 42 deletions(-)

diff --git a/Documentation/admin-guide/reporting-issues.rst b/Documentation/admin-guide/reporting-issues.rst
index e6083946c146e8..5f3c840ab94524 100644
--- a/Documentation/admin-guide/reporting-issues.rst
+++ b/Documentation/admin-guide/reporting-issues.rst
@@ -5,48 +5,68 @@ Reporting issues
++++++++++++++++


-The short guide (aka TL;DR)
-===========================
-
-Are you facing a regression with vanilla kernels from the same stable or
-longterm series? One still supported? Then search the `LKML
-<https://lore.kernel.org/lkml/>`_ and the `Linux stable mailing list
-<https://lore.kernel.org/stable/>`_ archives for matching reports to join. If
-you don't find any, install `the latest release from that series
-<https://kernel.org/>`_. If it still shows the issue, report it to the stable
-mailing list ([email protected]) and CC the regressions list
-([email protected]); ideally also CC the maintainer and the mailing
-list for the subsystem in question.
-
-In all other cases try your best guess which kernel part might be causing the
-issue. Check the :ref:`MAINTAINERS <maintainers>` file for how its developers
-expect to be told about problems, which most of the time will be by email with a
-mailing list in CC. Check the destination's archives for matching reports;
-search the `LKML <https://lore.kernel.org/lkml/>`_ and the web, too. If you
-don't find any to join, install `the latest mainline kernel
-<https://kernel.org/>`_. If the issue is present there, send a report.
-
-The issue was fixed there, but you would like to see it resolved in a still
-supported stable or longterm series as well? Then install its latest release.
-If it shows the problem, search for the change that fixed it in mainline and
-check if backporting is in the works or was discarded; if it's neither, ask
-those who handled the change for it.
-
-**General remarks**: When installing and testing a kernel as outlined above,
-ensure it's vanilla (IOW: not patched and not using add-on modules). Also make
-sure it's built and running in a healthy environment and not already tainted
-before the issue occurs.
-
-If you are facing multiple issues with the Linux kernel at once, report each
-separately. While writing your report, include all information relevant to the
-issue, like the kernel and the distro used. In case of a regression, CC the
-regressions mailing list ([email protected]) to your report. Also try
-to pin-point the culprit with a bisection; if you succeed, include its
-commit-id and CC everyone in the sign-off-by chain.
-
-Once the report is out, answer any questions that come up and help where you
-can. That includes keeping the ball rolling by occasionally retesting with newer
-releases and sending a status update afterwards.
+Reporting issues
+++++++++++++++++
+
+The short guide on reporting Linux kernel issues (aka "the TL;DR")
+==================================================================
+
+Rule out that something external makes your kernel misbehave: skim the output of
+``journalctl -k``; make sure the kernel is not tainted; consider a glitch in the
+environment (hardware, firmware, initramfs, distribution, file system, ...).
+
+If you deal with multiple issues, process each separately.
+
+Search `lore <https://lore.kernel.org/all/>`_ and then the wider internet for
+earlier reports and fixes. Consult :ref:`MAINTAINERS <maintainers>` to determine
+how bugs for the affected driver or subsystem must be submitted. This is usually
+by mail and rarely bugzilla.kernel.org; if the driver or subsystem uses an
+externally archived list or one of various bug trackers, search those as well.
+
+In case you deal with a regression still occurring in a less than two (ideally:
+one) weeks old kernel that is vanilla or close to it: send a brief email to
+[email protected] asking if the regression is known; consider
+continuing with this guide right afterwards, but definitely do so if you do not
+receive a positive reply within three days.
+
+Verify the bug and in case of a regression potentially bisect it as described in
+Documentation/admin-guide/verify-bugs-and-bisect-regressions.rst; alternatively
+perform these tasks through different measures as outlined in the more detailed
+step-by-step guide below.
+
+Were you unable to reproduce a bug with a mainline kernel you want to see fixed
+in a stable or longterm series? A bug that is not a regression? Then move over
+to 'Resolving non-regressions only occurring in stable or longterm kernels'.
+
+Compile a report with all important details. This always includes the
+distribution and kernel version used. Most of the time you also want to describe
+relevant aspects of your system and make the kernel's log messages available; do
+the same for everything else most likely relevant. In case of a regression, make
+that aspect obvious in the title; also specify the last working and first broken
+version early in the body.
+
+Submit your report in the appropriate way, which depends on the outcome of the
+verification:
+
+* Are you facing a regression within a stable or longterm kernel series you were
+ unable to reproduce in a mainline kernel? Then report it by email to the
+ stable team while CCing the regressions list (e.g. To: Greg Kroah-Hartman
+ <[email protected]>, Sasha Levin <[email protected]>;
+ CC: [email protected], [email protected]) and everyone in the
+ culprit's 'Signed-off-by' chain.
+
+* In all other cases, submit the report as specified in MAINTAINERS. In case of
+ a regression you have to report by mail, CC the regressions list
+ ([email protected]); when you know the culprit, also CC everyone in
+ its 'Signed-off-by' chain. In case of a regression you have to file in a bug
+ tracker, write a short heads-up email with a link to the report to the list
+ once you have done so -- if the culprit is known, CC everyone that signed the
+ culprit off, too.
+
+Answer any questions in a timely manner and help where you can to resolve the
+issue. Retest with at least every first release candidate (-rc1) of a new
+mainline version and report your findings.
+

The detailed step-by-step guide on reporting Linux kernel issues
================================================================
--
2.44.0

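To illustrate the "make sure the kernel is not tainted" part of the first step in the short guide above, here is a minimal Python sketch. It assumes the taint state is exposed as an integer bitmask in /proc/sys/kernel/tainted, where 0 means "not tainted". The helper names (decode_taint, read_taint) are made up for this example, and the flag table is deliberately partial; Documentation/admin-guide/tainted-kernels.rst remains the authoritative list.

```python
from pathlib import Path

# Partial table (assumption: only a few well-known flags are listed here):
# bit position -> (letter, meaning).
KNOWN_TAINT_FLAGS = {
    0: ("P", "proprietary module was loaded"),
    7: ("D", "kernel died recently (oops or BUG)"),
    9: ("W", "kernel issued a warning"),
    12: ("O", "externally-built (out-of-tree) module was loaded"),
    13: ("E", "unsigned module was loaded"),
}


def decode_taint(value):
    """Return a list of (bit, letter, meaning) for every bit set in value."""
    flags = []
    bit = 0
    while value >> bit:
        if value & (1 << bit):
            letter, meaning = KNOWN_TAINT_FLAGS.get(
                bit, ("?", "not in this partial table")
            )
            flags.append((bit, letter, meaning))
        bit += 1
    return flags


def read_taint(path="/proc/sys/kernel/tainted"):
    """Read the taint bitmask; return None if it cannot be read or parsed."""
    try:
        return int(Path(path).read_text().strip())
    except (OSError, ValueError):
        return None


if __name__ == "__main__":
    value = read_taint()
    if value is None:
        print("could not read the taint state")
    elif value == 0:
        print("kernel is not tainted")
    else:
        print(f"kernel is tainted (value {value}):")
        for bit, letter, meaning in decode_taint(value):
            print(f"  bit {bit} ({letter}): {meaning}")
```

Whether a given flag actually matters for a report depends on what set it; the sketch only makes the set bits visible so they can be checked against the documentation.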

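The two submission cases at the end of the short guide above can also be written down as a small decision function. The following Python sketch is only one reading of those two bullet points, not something the patch itself contains: the Finding class, plan_submission() and all address parameters are hypothetical, and the caller has to supply the real addresses from the document and from MAINTAINERS.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Finding:
    is_regression: bool
    in_stable_or_longterm: bool      # seen in a stable or longterm series
    reproducible_in_mainline: bool   # outcome of the verification step
    tracker_required: bool = False   # MAINTAINERS points to a bug tracker
    culprit_signers: List[str] = field(default_factory=list)


def plan_submission(f, stable_team, stable_list, regressions_list,
                    maintainers_target):
    """Return who gets the report ('to', 'cc') and any separate heads-up mail."""
    # Case 1: stable/longterm regression that mainline does not show.
    if (f.is_regression and f.in_stable_or_longterm
            and not f.reproducible_in_mainline):
        return {
            "to": list(stable_team),
            "cc": [regressions_list, stable_list] + f.culprit_signers,
            "heads_up": None,
        }

    # Case 2: everything else goes wherever MAINTAINERS says.
    plan = {"to": [maintainers_target], "cc": [], "heads_up": None}
    if f.is_regression:
        if f.tracker_required:
            # Filed in a tracker: send a short mail with a link afterwards.
            plan["heads_up"] = {
                "to": [regressions_list],
                "cc": list(f.culprit_signers),
            }
        else:
            plan["cc"] = [regressions_list] + f.culprit_signers
    return plan
```

For example, a bisected regression that only shows up in a longterm series would be planned with placeholder addresses like this:

```python
plan = plan_submission(
    Finding(True, True, False, culprit_signers=["author@example.org"]),
    stable_team=["stable-maintainer@example.org"],
    stable_list="stable-list@example.org",
    regressions_list="regressions-list@example.org",
    maintainers_target="subsystem-list@example.org",
)
```
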
2024-04-10 20:59:06

by Jonathan Corbet

Subject: Re: [RFC PATCH v1 1/2] docs: reporting-issue: rework the detailed guide

Thorsten Leemhuis <[email protected]> writes:

> Rework the detailed step-by-step guide for various reasons:
>
> * Simplify the search with the help of lore.kernel.org/all/, which did
> not exist when the text was written.
>
> * Make use of the recently added document
> Documentation/admin-guide/verify-bugs-and-bisect-regressions.rst,
> which covers many steps this text partly covered way better.
>
> * The 'quickly report a stable regression to the stable team' approach
> hardly worked out: most of the time the regression was not known yet.
> Try a different approach using the regressions list.
>
> * Reports about stable/longterm regressions most of the time were
> greeted with a brief reply along the lines of 'Is mainline affected as
> well?'; this is needed to determine who is responsible, so we might as
> well make the reporter check that before sending the report (which
> verify-bugs-and-bisect-regressions.rst already tells them to do, too).
>
> * A lot of fine tuning after seeing what people were struggling with.

So I have read through this, and don't find anything objectionable. I
will point out that each of those bullet items above might be better
handled in a separate patch; the result might be easier to review.

Thanks,

jon

2024-04-10 21:00:15

by Jonathan Corbet

Subject: Re: [RFC PATCH v1 2/2] docs: reporting-issue: rework the TLDR

Thorsten Leemhuis <[email protected]> writes:

> Rework the TLDR (aka the short guide) for various reasons:
>
> * People had to read it entirely and then act upon what they learned,
> which from feedback I got was apparently somewhat hard and confusing
> given everything we expect from bug reporters; this partly was because
> the first paragraph covered a special case (regression in
> stable/longterm kernel) and not the main aspect most people cared
> about when they came to the document.
>
> Use a step-by-step approach to avoid this.
>
> * Make use of
> Documentation/admin-guide/verify-bugs-and-bisect-regressions.rst
>
> * The 'quickly report a stable regression to the stable team' approach
> hardly worked out: most of the time the regression was not known yet.
> Try a different approach using the regressions list.
>
> * Reports about stable/longterm regressions most of the time were
> greeted with a brief reply along the lines of 'Is mainline affected as
> well?'; this is needed to determine who is responsible, so we might as
> well make the reporter check that before sending the report (which
> verify-bugs-and-bisect-regressions.rst already tells them to do, too).
>
> Not-signed-off-by: Thorsten Leemhuis <[email protected]>
> ---
> .../admin-guide/reporting-issues.rst | 104 +++++++++++-------
> 1 file changed, 62 insertions(+), 42 deletions(-)

From a quick read, no objections here.

jon