2020-11-12 18:03:16

by Thorsten Leemhuis

[permalink] [raw]
Subject: [RFC PATCH v2 20/26] docs: reporting-bugs: instructions for handling regressions

Describe what users will have to do if they deal with a regression.
Point out that bisection is really important.

While at it explicitly mention the .config files for the newer kernel
needs to be similar to the old kernel, as that's an important detail
quite a few people seem to miss sometimes.

Signed-off-by: Thorsten Leemhuis <[email protected]>
---
Documentation/admin-guide/bug-bisect.rst | 2 +
Documentation/admin-guide/reporting-bugs.rst | 54 ++++++++++++++++++++
2 files changed, 56 insertions(+)

diff --git a/Documentation/admin-guide/bug-bisect.rst b/Documentation/admin-guide/bug-bisect.rst
index 59567da344e8..38d9dbe7177d 100644
--- a/Documentation/admin-guide/bug-bisect.rst
+++ b/Documentation/admin-guide/bug-bisect.rst
@@ -1,3 +1,5 @@
+.. _bugbisect:
+
Bisecting a bug
+++++++++++++++

diff --git a/Documentation/admin-guide/reporting-bugs.rst b/Documentation/admin-guide/reporting-bugs.rst
index be866dd1e6b6..6646d393f21a 100644
--- a/Documentation/admin-guide/reporting-bugs.rst
+++ b/Documentation/admin-guide/reporting-bugs.rst
@@ -851,6 +851,60 @@ sometimes needs to get decoded to be readable, which is explained in
admin-guide/bug-hunting.rst.


+Special care for regressions
+----------------------------
+
+ *If your problem is a regression, try to narrow down when the issue was
+ introduced as much as possible.*
+
+Linux lead developer Linus Torvalds insists that the Linux kernel never
+worsens, that's why he deems regressions as unacceptable and wants to see them
+fixed quickly. That's why changes that introduced a regression are often
+promptly reverted if the issue they cause can't get solved quickly any other
+way. Reporting a regression is thus a bit like playing a kind of trump card to
+get something quickly fixed. But for that to happen the change that's causing
+the regression needs to be known. Normally it's up to the reporter to track
+down the culprit, as maintainers often won't have the time or setup at hand to
+reproduce it themselves.
+
+To find the change there is a process called 'bisection' which the document
+:ref:`Documentation/admin-guide/bug-bisect.rst <bugbisect>` describes in
+detail. That process will often require you to build about ten to twenty kernel
+images, trying to reproduce the issue with each of them before building the
+next. Yes, that takes some time, but don't worry, it works a lot quicker than
+most people assume. Thanks to a 'binary search' this will lead you to the one
+commit in the source code management system that's causing the regression. Once
+you find it, search the net for the subject of the change, its commit id and
+the shortened commit id (the first 12 characters of the commit id). This will
+lead you to existing reports about it, if there are any.
+
+Note, a bisection needs a bit of know-how, which not everyone has, and quite a
+bit of effort, which not everyone is willing to invest. Nevertheless, it's
+highly recommended performing a bisection yourself. If you really can't or
+don't want to go down that route at least find out which mainline kernel
+introduced the regression. If something for example breaks when switching from
+5.5.15 to 5.8.4, then try at least all the mainline releases in that area (5.6,
+5.7 and 5.8) to check when it first showed up. Unless you're trying to find a
+regression in a stable or longterm kernel, avoid testing versions which number
+has three sections (5.6.12, 5.7.8), as that makes the outcome hard to
+interpret, which might render your testing useless. Once you found the major
+version which introduced the regression, feel free to move on in the reporting
+process. But keep in mind: it depends on the issue at hand if the developers
+will be able to help without knowing the culprit. Sometimes they might
+recognize from the report want went wrong and can fix it; other times they will
+be unable to help unless you perform a bisection.
+
+When dealing with regressions make sure the issue you face is really caused by
+the kernel and not by something else, as outlined above already.
+
+In the whole process keep in mind: an issue only qualifies as regression if the
+older and the newer kernel got built with a similar configuration. The best way
+to archive this: copy the configuration file (``.config``) from the old working
+kernel freshly to each newer kernel version you try. Afterwards run ``make
+oldnoconfig`` to adjust it for the needs of the new version without enabling
+any new feature, as those are allowed to cause regressions.
+
+
.. ############################################################################
.. Temporary marker added while this document is rewritten. Sections above
.. are new and dual-licensed under GPLv2+ and CC-BY 4.0, those below are old.
--
2.28.0