2023-08-01 12:17:52

by Ricardo Cañuelo

Subject: Kernel regression tracking/reporting initiatives and KCIDB

Hi all,

I'm Ricardo from Collabora. In the past months, we’ve been analyzing the
current status of CI regression reporting and tracking in the Linux
kernel: assessing the existing tools, testing their functionalities,
collecting ideas about desirable features that aren’t available yet and
sketching some of them.

As part of this effort, we wrote a Regression Tracker tool [1] as a
proof of concept. It’s a rather simple tool that takes existing
regression data and reports and uses them to show more context on each
reported regression, as well as highlighting the relationships between
them, whether they might be caused by an infrastructure error, and other
metadata about their current status. We’ve been using it
mostly as a playground for us to explore the current status of the
functionalities provided by CI systems and to test ideas about new
features.

We’re also checking other tools and services provided by the community,
such as regzbot [2], collaborating with them when possible and thinking
about how to combine multiple scattered efforts by different people
towards the same common goal. As a first step, we’ve contributed to
regzbot and partially integrated its results into the Regression Tracker
tool.

So far, we’ve been using the KernelCI regression data and reports as a
data source; we're now wondering if we could tackle the problem with a
more general approach by building on top of what KCIDB already provides.

In general, CI systems tend to define regressions as a low-level concept
which is rather static: a snapshot of a test result at a certain point
in time. When it comes to reporting them to developers, there's much
more info that could be added to them. In particular, the context of it
and the fact that a reported regression has a life cycle:

- did this test also fail on other hardware targets or with other kernel
configurations?
- is it possible that the test failed because of an infrastructure
error?
- does the test fail consistently since that commit or does it show
unstable results?
- does the test output show any traces of already known bugs?
- has this regression been bisected and reported anywhere?
- was the regression reported by anyone? If so, is there someone already
working on it?

Many of these info points can be extracted from the CI results databases
and processed to provide additional regression data. That’s what we’re
trying to do with the Regression Tracker tool, and we think it’d be
interesting to start experimenting with the data in KCIDB to see how
this could be improved and what would be the right way to integrate this
type of functionality.
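
To illustrate the kind of processing we mean, here's a rough Python
sketch (the field names are made up for the example and don't match any
particular CI database schema) that gathers extra context for a failing
test from a set of stored results:

    from collections import defaultdict

    # Rough sketch: 'results' is assumed to be a list of dicts with
    # hypothetical fields (test, status, platform, config, commit);
    # real CI databases each have their own schema.
    def regression_context(results, test_name, commit):
        """Summarize where else a test failed for a given kernel commit."""
        failures = defaultdict(set)
        for r in results:
            if (r["test"] == test_name and r["commit"] == commit
                    and r["status"] == "FAIL"):
                failures["platforms"].add(r["platform"])
                failures["configs"].add(r["config"])
        return {
            "test": test_name,
            "commit": commit,
            "failing_platforms": sorted(failures["platforms"]),
            "failing_configs": sorted(failures["configs"]),
        }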

Please let us know if that's a possibility and if you'd like to add
anything to the ideas proposed above.

Cheers,
Ricardo

[1] https://kernel.pages.collabora.com/kernelci-regressions-tracker/
[2] https://linux-regtracking.leemhuis.info/regzbot/all/


2023-08-02 08:44:26

by Thorsten Leemhuis

Subject: Re: Kernel regression tracking/reporting initiatives and KCIDB

On 01.08.23 13:47, Ricardo Cañuelo wrote:
>
> I'm Ricardo from Collabora. In the past months, we’ve been analyzing the
> current status of CI regression reporting and tracking in the Linux
> kernel: assessing the existing tools, testing their functionalities,
> collecting ideas about desirable features that aren’t available yet and
> sketching some of them.

Thx for your mail.

Side note: I wonder how many people are interested in this topic and if
it would be a good idea to have a session about it at the kernel summit
at this year's LPC; afaik there are still a few days left to submit a
session about this; or did somebody already do this?

But FWIW, I'm currently going back and forth on whether I want to make the trip
to the US; as things stand currently I likely won't go. Anyway, back to
your mail.

> As part of this effort, we wrote a Regression Tracker tool [1] as a
> proof of concept. It’s a rather simple tool that takes existing
> regression data and reports and uses them to show more context on each
> reported regression, as well as highlighting the relationships between
> them, whether they might be caused by an infrastructure error, and other
> metadata about their current status. We’ve been using it
> mostly as a playground for us to explore the current status of the
> functionalities provided by CI systems and to test ideas about new
> features.

/me wanted to say "might be good to call it KernelCI regression tracker
instead to avoid confusion", but then saw that the page behind the
url "[1]" above already calls it exactly that :-D

> We’re also checking other tools and services provided by the community,
> such as regzbot [2], collaborating with them when possible and thinking
> about how to combine multiple scattered efforts by different people
> towards the same common goal. As a first step, we’ve contributed to
> regzbot and partially integrated its results into the Regression Tracker
> tool.

thx again for those contributions!

> So far, we’ve been using the KernelCI regression data and reports as a
> data source; we're now wondering if we could tackle the problem with a
> more general approach by building on top of what KCIDB already provides.

That's more your area of expertise, but I have to wonder: doesn't that
mainly depend on what the people/projects that feed their test results
into KCIDB want? I had expected some of them might already have
something to stay on top of regressions found by their systems, to at
least ensure they notice and fix tests that broke for external reasons
-- e.g. a test script going sideways, faulty hardware, a network
misconfiguration or other things which naturally will occur in this
line of work.

> In general, CI systems tend to define regressions as a low-level concept
> which is rather static: a snapshot of a test result at a certain point
> in time. When it comes to reporting them to developers, there's much
> more info that could be added to them.

I wonder if it should be s/could/should/ here, as *if I* were
running CI systems I'd fear that developers sooner or later might start
ignoring more and more of the reports my systems send when too many of
them turn out to be bogus/misleading -- which naturally will happen for
various reasons you outlined below yourself (broken
hardware/test/network/...) (and seems to happen regularly, as mentioned
in https://lwn.net/Articles/939538/ ).

That doesn't mean that I think each failed test should be judged by a
human before it's sent to the developers. Compile errors, for example,
will often be helpful right away, especially for stable-rc.

> In particular, the context of it
> and the fact that a reported regression has a life cycle:
>
> - did this test also fail on other hardware targets or with other kernel
> configurations?
> - is it possible that the test failed because of an infrastructure
> error?
> - does the test fail consistently since that commit or does it show
> unstable results?
> - does the test output show any traces of already known bugs?
> - has this regression been bisected and reported anywhere?
> - was the regression reported by anyone? If so, is there someone already
> working on it?
>
> Many of these info points can be extracted from the CI results databases
> and processed to provide additional regression data. That’s what we’re
> trying to do with the Regression Tracker tool, and we think it’d be
> interesting to start experimenting with the data in KCIDB to see how
> this could be improved and what would be the right way to integrate this
> type of functionality.

I (with my likely somewhat biased view due to regzbot and my work with
it) wonder if we have two aspects here that might be wise to keep separated:

* tests suddenly failing in one or multiple CI systems, which might be
due to something going sideways in the tests or a real kernel regression

* regressions found by individuals or CI systems where a human with some
knowledge about the kernel did a sanity check (and also looked for
duplicates) to ensure this most likely is a regression that should be
acted upon -- and thus is also something that definitely should not be
forgotten.

Your regression tracking tool could be the former, regzbot the latter
(which could feed the outcome back to the CI regression tracking
system). But as I said, my view is obviously biased, so maybe I'm too
blinded to see a better solution.

> [1] https://kernel.pages.collabora.com/kernelci-regressions-tracker/
> [2] https://linux-regtracking.leemhuis.info/regzbot/all/

Ciao, Thorsten

2023-08-04 18:06:49

by Nikolai Kondrashov

Subject: Re: Kernel regression tracking/reporting initiatives and KCIDB

Hi Ricardo,

On 8/1/23 14:47, Ricardo Cañuelo wrote:
> Hi all,
>
> I'm Ricardo from Collabora. In the past months, we’ve been analyzing the
> current status of CI regression reporting and tracking in the Linux
> kernel: assessing the existing tools, testing their functionalities,
> collecting ideas about desirable features that aren’t available yet and
> sketching some of them.
>
> As part of this effort, we wrote a Regression Tracker tool [1] as a
> proof of concept. It’s a rather simple tool that takes existing
> regression data and reports and uses them to show more context on each
> reported regression, as well as highlighting the relationships between
> them, whether they might be caused by an infrastructure error, and other
> metadata about their current status. We’ve been using it
> mostly as a playground for us to explore the current status of the
> functionalities provided by CI systems and to test ideas about new
> features.
>
> We’re also checking other tools and services provided by the community,
> such as regzbot [2], collaborating with them when possible and thinking
> about how to combine multiple scattered efforts by different people
> towards the same common goal. As a first step, we’ve contributed to
> regzbot and partially integrated its results into the Regression Tracker
> tool.

Nicely done!

Especially cooperating with regzbot, something I haven't seen so far.

Various other kernel CI systems have been building similar things for a while,
and it's nice to see the trend growing. It means we're getting somewhere.

I tried to review these efforts last year:
https://archive.fosdem.org/2022/schedule/event/masking_known_issues_across_six_kernel_ci_systems/

> So far, we’ve been using the KernelCI regression data and reports as a
> data source; we're now wondering if we could tackle the problem with a
> more general approach by building on top of what KCIDB already provides.

Yes, I would love to work with you on a KCIDB implementation/integration.

I've been exploring and implementing a solution for tracking regressions (or
"known issues" as I usually call them) based on what I researched (and
presented above).

At this moment KCIDB submitters can send data linking a particular test or
build result to an issue and its category (kernel/test/framework). We can
generate notifications on e.g. a new issue being found by a CI system
(maintainers) in a particular repo/branch. There's no support on dashboards
yet, and I have yet to push for integration with particular CI systems.

Here's the full announcement with examples:
https://lore.kernel.org/kernelci/[email protected]/
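
Conceptually, the linkage looks something like the sketch below (in
Python, with field names invented purely for illustration; the actual
schema and real examples are in the announcement above):

    # Illustrative only: these dicts are not the real KCIDB schema, just
    # the concept of an "issue" plus an "incident" linking it to a result.
    issue = {
        "id": "example_ci:nfs-mount-timeout",          # hypothetical issue id
        "report_url": "https://example.org/bug/1234",  # where the bug is tracked
        "culprit": "test",                             # kernel / test / framework
    }

    incident = {
        "issue_id": issue["id"],     # the known issue this result exhibits
        "test_id": "example_ci:42",  # hypothetical test result id
        "present": True,             # the issue was observed in this result
    }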

Admittedly it's not much, and you're much further along (as are many other CI
systems), but we have further plans (more on that below).

> In general, CI systems tend to define regressions as a low-level concept
> which is rather static: a snapshot of a test result at a certain point
> in time. When it comes to reporting them to developers, there's much
> more info that could be added to them. In particular, the context of it
> and the fact that a reported regression has a life cycle:
>
> - did this test also fail on other hardware targets or with other kernel
> configurations?
> - is it possible that the test failed because of an infrastructure
> error?
> - does the test fail consistently since that commit or does it show
> unstable results?
> - does the test output show any traces of already known bugs?
> - has this regression been bisected and reported anywhere?
> - was the regression reported by anyone? If so, is there someone already
> working on it?
>
> Many of these info points can be extracted from the CI results databases
> and processed to provide additional regression data. That’s what we’re
> trying to do with the Regression Tracker tool, and we think it’d be
> interesting to start experimenting with the data in KCIDB to see how
> this could be improved and what would be the right way to integrate this
> type of functionality.

These all are very useful insights to extract from the data, nicely done!

Here's how they map to KCIDB:

> - did this test also fail on other hardware targets or with other kernel
> configurations?

KCIDB doesn't have a schema for identifying hardware at this moment. We can
work on that, but meanwhile KCIDB dashboards wouldn't be able to show this.

> - is it possible that the test failed because of an infrastructure
> error?

Not sure how to approach this in KCIDB. How do you (plan to) do it?

> - does the test fail consistently since that commit or does it show
> unstable results?

This is a difficult thing to properly figure out in KCIDB, because it
aggregates data from multiple CI systems. A single CI system can assume that
earlier results for a branch correspond to earlier commits. However, because
different CI systems test at different speeds, KCIDB cannot make that
assumption. Commits and their testing results can come in any order. So we
cannot draw these kinds of conclusions based on time alone.

The only way KCIDB can make this work correctly is by correlating with actual
git history, following the commit graph. I did some research into graph
databases and while they can potentially help us do it, their performance with
the actual Linux kernel git history turned out to be abysmal, due to a large
number of nodes and edges, and the lack of optimization for DAGs:

https://fosdem.org/2023/schedule/event/graph_case_for_dag/

I got an optimistic promise from Neo4j folks to have this possibly working by
next FOSDEM, but I wouldn't hold my breath for that. The fallback plan is to
hack something together using libgit2 and/or the git command-line tools.
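
Even something as simple as shelling out to git would give us the
ordering we need, e.g. (rough sketch, assuming we keep a local clone of
the tree around):

    import subprocess

    def is_ancestor(repo_path, commit_a, commit_b):
        """Return True if commit_a is an ancestor of commit_b in the repo."""
        # 'git merge-base --is-ancestor' exits with 0 when commit_a is an
        # ancestor of commit_b, and with 1 when it is not.
        res = subprocess.run(
            ["git", "-C", repo_path, "merge-base", "--is-ancestor",
             commit_a, commit_b],
            capture_output=True,
        )
        return res.returncode == 0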

Before that happens, I think we can still do other things based on time alone, to
help us along.

E.g. on the dashboard of a particular test result, display graphs of this
test's results over time: overall, and for
architecture/compiler/config/repo/branch of this test run. And something
similar for test views on revision/checkout/build dashboards.

BTW, a couple Mercurial folks approached me after the talk above, saying that
they're working on supporting storing test results in history so they could do
a similar kind of correlation and reasoning. So the idea is in the air.

> - does the test output show any traces of already known bugs?
> - was the regression reported by anyone? If so, is there someone already
> working on it?

This is what the KCIDB issue-linking support described above is working
towards. Next step is to build a triaging system linking issues to build/test
results automatically, based on patterns submitted by both CI systems, via the
regular submission interface, and humans, via a dedicated UI.

Patterns would specify which issue (bug URL) they're matching and include
basic things like test name, architecture, hardware, and so on, but also
patterns to find in e.g. test output files, logs, or in dmesg.

That should answer questions of whether a test or a build exhibits a particular
issue seen before.
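
To make that concrete, a pattern and its matching could look roughly
like this (the format and names are invented for the example; the actual
triaging system doesn't exist yet):

    import re

    # Hypothetical pattern: the issue it maps to plus basic matching criteria.
    pattern = {
        "issue_url": "https://example.org/bug/4321",
        "test_name": "baseline.dmesg",
        "architecture": "arm64",
        "log_regex": re.compile(r"Unable to handle kernel NULL pointer"),
    }

    def matches(pattern, result, log_text):
        """Check whether a test result and its log match a known-issue pattern."""
        return (result.get("test_name") == pattern["test_name"]
                and result.get("architecture") == pattern["architecture"]
                and pattern["log_regex"].search(log_text) is not None)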

> - has this regression been bisected and reported anywhere?

Once we have the history correlation mentioned above, we would be able to
find the PASS/FAIL boundaries between commits for particular issues, already
based on just issue-linking reported by CI systems (even before implementing
triaging).

This would be a way to detect bisections, among other things. I.e. detecting
if two adjacent commits both have results of a particular test, and they are
different. This would, of course, also detect cases when the results just
happened to appear in adjacent commits, not only because of bisection.
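
Once results can be laid out along the commit graph, the boundary
detection itself is simple, e.g. (sketch, assuming 'outcomes' is a list
of (commit, result) pairs already ordered by history):

    def find_boundaries(outcomes):
        """Return pairs of adjacent commits where the test outcome flips."""
        boundaries = []
        for (commit_a, res_a), (commit_b, res_b) in zip(outcomes, outcomes[1:]):
            if res_a != res_b:                      # e.g. "PASS" -> "FAIL"
                boundaries.append((commit_a, commit_b))
        return boundaries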

I think this could be done more generally via frequency domain analysis (FFT)
of test outcomes over git history, which would also detect cases of a flaky test
changing failure frequency. But here I'm getting waaay ahead of myself :D

Anyway, these are my ideas for KCIDB. I would love to hear your ideas as well
as feedback on the above. Email, IRC, Slack, or a video call would all do :D

--

One comment regarding the prototype you shared is that it's quite verbose and
it's difficult to put together a feeling of what's been happening from
the overabundance of textual information. I think a visual touch could help here.

E.g. drawing a timeline of test results, pointing particular events (first
failed, first passed, stability and so on) along its length.

So instead of this:

> first failed: today (2023-08-02)
>
> kernel: chromeos-stable-20230802.0
> commit: 5c04267bed569d41aea3940402c7ce8cf975a5fe
>
> most recent fail: today (2023-08-02)
>
> kernel: chromeos-stable-20230802.0
> commit: 5c04267bed569d41aea3940402c7ce8cf975a5fe
>
> last passed: 1 day ago (2023-08-01)
>
> kernel: chromeos-stable-20230801.1
> commit: cd496545d91d820441277cd6a855b9af725fdb8a

Something like this (roughly):

|
2023-08-02 F - last FAIL
F
|
P
F
|
2023-08-02 F - first FAIL
|
2023-08-01 P - last PASS
|
P

And e.g. have the commit and other extra info pop up as needed when hovering
over the status (F/P) letters/icons.

And in general try to express information more visually, so it could be
absorbed at a glance, without needing to read much text, and tuck away
information that's not immediately necessary into more on-hover popups.

---

Hope this helps, and thanks for reading through :D
Nick

2023-08-07 09:02:01

by Nikolai Kondrashov

Subject: Re: Kernel regression tracking/reporting initiatives and KCIDB

Hi Thorsten,

On 8/2/23 11:07, Thorsten Leemhuis wrote:
> On 01.08.23 13:47, Ricardo Cañuelo wrote:
>> So far, we’ve been using the KernelCI regression data and reports as a
>> data source; we're now wondering if we could tackle the problem with a
>> more general approach by building on top of what KCIDB already provides.
>
> That's more your area of expertise, but I have to wonder: doesn't that
> mainly depend on what the people/projects that feed their test results
> into KCIDB want? I had expected some of them might already have
> something to stay on top of regressions found by their systems, to at
> least ensure they notice and fix tests that broke for external reasons
> -- e.g. a test script going sideways, faulty hardware, a network
> misconfiguration or other things which naturally will occur in this
> line of work.

Yes, some of this is already done by some CI systems submitting results to
KCIDB. Syzbot is doing a very good job deduplicating crashes they have found,
0day is looking for outcome differences, AFAIK, and CKI has its known-issue
tracking system, which handles problems of various origins.

>> In general, CI systems tend to define regressions as a low-level concept
>> which is rather static: a snapshot of a test result at a certain point
>> in time. When it comes to reporting them to developers, there's much
>> more info that could be added to them.
>
> I wonder if it should be s/could/should/ here, as *if I* were
> running CI systems I'd fear that developers sooner or later might start
> ignoring more and more of the reports my systems send when too many of
> them turn out to be bogus/misleading -- which naturally will happen for
> various reasons you outlined below yourself (broken
> hardware/test/network/...) (and seems to happen regularly, as mentioned
> in https://lwn.net/Articles/939538/ ).

Yes, this is a constant struggle.

> That doesn't mean that I think each failed test should be judged by a
> human before it's sent to the developers. Compile errors, for example,
> will often be helpful right away, especially for stable-rc.

Ehhh, KCIDB gets build failures all the time (in merged code) and it takes
a while before a fix propagates across all the trees.

For example, the recent v6.5-rc5 has got 14 build failures (out of 865 builds
received):

https://kcidb.kernelci.org/d/revision/revision?orgId=1&var-git_commit_hash=52a93d39b17dc7eb98b6aa3edb93943248e03b2f&var-patchset_hash=

I suspect that someone somewhere is already working on these, or that a fix
has even been merged somewhere already, but the CI just keeps failing in the
meantime.

>> In particular, the context of it
>> and the fact that a reported regression has a life cycle:
>>
>> - did this test also fail on other hardware targets or with other kernel
>> configurations?
>> - is it possible that the test failed because of an infrastructure
>> error?
>> - does the test fail consistently since that commit or does it show
>> unstable results?
>> - does the test output show any traces of already known bugs?
>> - has this regression been bisected and reported anywhere?
>> - was the regression reported by anyone? If so, is there someone already
>> working on it?
>>
>> Many of these info points can be extracted from the CI results databases
>> and processed to provide additional regression data. That’s what we’re
>> trying to do with the Regression Tracker tool, and we think it’d be
>> interesting to start experimenting with the data in KCIDB to see how
>> this could be improved and what would be the right way to integrate this
>> type of functionality.
>
> I (with my likely somewhat biased view due to regzbot and my work with
> it) wonder if we have two aspects here that might be wise to keep separated:
>
> * tests suddenly failing in one or multiple CI systems, which might be
> due to something going sideways in the tests or a real kernel regression
>
> * regressions found by individuals or CI systems where a human with some
> knowledge about the kernel did a sanity check (and also looked for
> duplicates) to ensure this most likely is a regression that should be
> acted upon -- and thus is also something that definitely should not be
> forgotten.
>
> Your regression tracking tool could be the former, regzbot the latter
> (which could feed the outcome back to the CI regression tracking
> system). But as I said, my view is obviously biased, so maybe I'm too
> blinded to see a better solution.

I agree that a human would be trusted more most of the time, and it would be
beneficial to give the results of human review a boost. However, ultimately,
automatic error detection is also built by humans, and it doesn't get tired,
can detect harder-to-spot problems, and can catch problems happening en masse,
as you mention.

If we consider applying patterns defined by humans to find already-known
issues in other test results, we get a combination of the two. I think
training an AI on the manually-detected issues, and those picked up by those
patterns, could help us find completely new issues, and would further blur the
line between manual and automatic issue detection. Something that I'm looking
forward to exploring.

Regardless, I think we need both, and, in general, every trick in the book to
get Linux quality control on track.

Nick

2023-08-08 17:47:01

by Ricardo Cañuelo

Subject: Re: Kernel regression tracking/reporting initiatives and KCIDB

Hi Nikolai,

Thanks for the comprehensive answer, see my comments below,

On Fri, Aug 04 2023 at 19:06:26, Nikolai Kondrashov <[email protected]> wrote:
> I tried to review these efforts last year:
> https://archive.fosdem.org/2022/schedule/event/masking_known_issues_across_six_kernel_ci_systems/

That's a good summary, and it's nice to see an unbiased analysis of the
current state of the art and what could be improved. I think we agree on
most of the points.

> At this moment KCIDB submitters can send data linking a particular test or
> build result to an issue and its category (kernel/test/framework). We can
> generate notifications on e.g. a new issue being found by a CI system
> (maintainers) in a particular repo/branch. There's no support on dashboards
> yet, and I have yet to push for integration with particular CI systems.
>
> Here's the full announcement with examples:
> https://lore.kernel.org/kernelci/[email protected]/

What's the current status of this? Have any of the CI systems provided a
set of issues and incidents yet?

I like the solution of bringing CI-specific patterns and definitions into a
CI-agnostic aggregator like KCIDB. I'm not familiar with the KCIDB
schemas so I'm not aware of the different ways that each individual CI
system chooses to define their test results.

A different approach could be to define a standard way to report CI test
results that every CI system could adhere to, but I guess that task
could become way too expensive. I think defining some common
abstractions is definitely possible and the way you did it is a good
start. Hopefully it can be extended if needed.

> > - is it possible that the test failed because of an infrastructure
> > error?
>
> Not sure how to approach this in KCIDB. How do you (plan to) do it?

In general, we can tell certain common errors very clearly by parsing
the test log. We had prior experience running tests on different
platforms in Collabora's LAVA lab so we were familiar with the most
frequent types of infrastructure errors seen there: assorted network
errors, failure to mount an NFS volume, failure to find the ramdisk,
etc.

We define a set of strings to match against:
https://gitlab.collabora.com/kernel/kernelci-regressions-tracker/-/blob/main/configs/logs-string-match.yaml
and we keep adding to the list when we find new patterns.

At some point we'd like to use a similar approach, although automated,
to detect frequent patterns in test outputs so that a regression
analyzer can match test results with known issues to avoid excessive
report duplication and also to do a first-stage tagging and triaging of
regressions.
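
As a first automated step, something as simple as counting normalized
error lines across a set of failure logs could surface candidate
patterns for that list. A rough Python sketch of the idea (not what the
tool does today; the heuristics are just placeholders):

    import re
    from collections import Counter

    def frequent_error_lines(logs, min_count=3):
        """Count normalized error-looking lines across failure logs to
        surface candidate patterns for the known-issues list."""
        counter = Counter()
        for log in logs:
            seen = set()
            for line in log.splitlines():
                if not re.search(r"error|fail|timeout|panic", line, re.IGNORECASE):
                    continue
                # Mask volatile parts (hex addresses, numbers) so similar
                # lines from different runs collapse into one pattern.
                seen.add(re.sub(r"0x[0-9a-f]+|\d+", "N", line).strip())
            counter.update(seen)
        return [(line, n) for line, n in counter.most_common() if n >= min_count]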

> > - does the test fail consistently since that commit or does it show
> > unstable results?
>
> This is a difficult thing to properly figure out in KCIDB, because it
> aggregates data from multiple CI systems. A single CI system can assume that
> earlier results for a branch correspond to earlier commits. However, because
> different CI systems test at different speeds, KCIDB cannot make that
> assumption. Commits and their testing results can come in any order. So we
> cannot draw these kinds of conclusions based on time alone.

Couldn't the results be filtered by CI-system origin so they can be
analyzed without the noise of results from other origins? I was thinking
about filtering the results by CI origin and then sorting them by
date. But, as I said, I'm not familiar with the KCIDB schema so I
wouldn't really know how to navigate through the results.

I can check the code to find the schema definitions but I was wondering
if there's a quick way to visualize the raw test results data and the
relationships between them.

> The only way KCIDB can make this work correctly is by correlating with actual
> git history, following the commit graph.

I think this might depend on the context that we have for every test
result. That could be different depending on the data origin, but in our
case, when checking the recent history of a KernelCI test we took
advantage of the fact that KernelCI builds progress linearly (new builds
are always newer versions of old builds, AFAICT), so we built the
queries around that and we don't have to do any processing of the git
repos to fetch the list of test runs of a particular test configuration
after the last failed run:
https://gitlab.collabora.com/kernel/kernelci-regressions-tracker/-/blob/main/common/analysis.py#L27

For other systems this will be different, of course. But the key idea is
that we might be able to get to the same point using different paths,
maybe simpler.
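
For instance, filtering by origin and sorting by date (a sketch with
made-up field names, not the actual KCIDB schema) would already
reproduce what we do for KernelCI today:

    def runs_since_last_fail(results, origin, test_name):
        """Filter results from one CI origin, sort them by date, and return
        the runs of a test from its most recent failure onwards (field
        names are hypothetical)."""
        runs = sorted(
            (r for r in results
             if r["origin"] == origin and r["test"] == test_name),
            key=lambda r: r["start_time"],
        )
        last_fail = max(
            (i for i, r in enumerate(runs) if r["status"] == "FAIL"),
            default=None,
        )
        return runs if last_fail is None else runs[last_fail:]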

> E.g. on the dashboard of a particular test result, display graphs of this
> test's results over time: overall, and for
> architecture/compiler/config/repo/branch of this test run. And something
> similar for test views on revision/checkout/build dashboards.

Yes, we have similar visions about the dashboard features. Unfortunately,
a proper dashboard would be a key part of the solution, but it looks like
it's still pretty far away in the future.

> BTW, a couple Mercurial folks approached me after the talk above, saying that
> they're working on supporting storing test results in history so they could do
> a similar kind of correlation and reasoning. So the idea is in the air.

This is very interesting. I think one thing we're missing from git is
a standard way to keep commit metadata that won't be tied to specific
refs (i.e. metadata that could be linked to a commit even if it gets
rebased).
There's plenty of commit metadata scattered all over the kernel commit
logs in the form of tags, and then anyone can write tools on top of that
to find relationships between commits, patches and tests, but that's
kind of suboptimal. Having proper support for that in the repo would
make a huge difference in how the repo data is used, beyond tracking code
changes.

> This is what the KCIDB issue-linking support described above is working
> towards. Next step is to build a triaging system linking issues to build/test
> results automatically, based on patterns submitted by both CI systems, via the
> regular submission interface, and humans, via a dedicated UI.
>
> Patterns would specify which issue (bug URL) they're matching and include
> basic things like test name, architecture, hardware, and so on, but also
> patterns to find in e.g. test output files, logs, or in dmesg.
>
> That should answer questions of whether a test or a build exhibits a particular
> issue seen before.

This is a good approach, and it could tie in with what I mentioned above
about log parsing and identifying similar issues from their test log
outputs.

> One comment regarding the prototype you shared is that it's quite verbose and
> it's difficult to put together a feeling of what's been happening from
> the overabundance of textual information. I think a visual touch could help here.

I agree, this started as an experiment and it shows. Having a nice and
flexible HTML output was never the main requirement.

> E.g. drawing a timeline of test results, pointing particular events (first
> failed, first passed, stability and so on) along its length.
>
> So instead of this:
>
> > first failed: today (2023-08-02)
> >
> > kernel: chromeos-stable-20230802.0
> > commit: 5c04267bed569d41aea3940402c7ce8cf975a5fe
> >
> > most recent fail: today (2023-08-02)
> >
> > kernel: chromeos-stable-20230802.0
> > commit: 5c04267bed569d41aea3940402c7ce8cf975a5fe
> >
> > last passed: 1 day ago (2023-08-01)
> >
> > kernel: chromeos-stable-20230801.1
> > commit: cd496545d91d820441277cd6a855b9af725fdb8a
>
> Something like this (roughly):
>
> |
> 2023-08-02 F - last FAIL
> F
> |
> P
> F
> |
> 2023-08-02 F - first FAIL
> |
> 2023-08-01 P - last PASS
> |
> P

That's absolutely right. We have the data, I just wish I had the webdev
chops to do that XD. Although we could probably give it a try now that
the feature set is mostly stable.

Thanks for the detailed answers and your insights. It's great to see we
share many points of view. Now that we know there's a long road ahead
it's time to draw an initial roadmap and see how far it can get us.

So, what would you recommend we start with for KCIDB? Would there be a
way for us to consume the data, even if only partially, to
experiment and get some ideas about what to do next? Would you rather
have these features implemented as part of KCIDB itself?

Cheers,
Ricardo

2023-08-20 07:33:50

by Ricardo Cañuelo

Subject: Re: Kernel regression tracking/reporting initiatives and KCIDB

Hi,

On Thu, Aug 17 2023 at 15:32:21, Guillaume Tucker <[email protected]> wrote:
> With the new API, data is owned by the users who submit it so we can
> effectively provide a solution for grouping data from multiple CI
> systems like KCIDB does.
>
> The key thing here is that KernelCI as a project will be
> providing a database with regression information collected from
> any public CI system.

Does this mean that KernelCI will replace KCIDB? Or will they both keep
working separately?

> So the topic of tracking regressions for the whole kernel is already
> part of the roadmap for KernelCI, and if just waiting for CI systems
> to push data is not enough we can have services that actively go and
> look for regressions to feed them into the database under a particular
> category (or user).
> It would be good to align ideas you may have with KernelCI's
> plans

Our ideas start by studying the required features and needs for
regression analysis, reporting and tracking in a general and
system-agnostic way. First the concepts, then the implementation. I
think that analyzing the problem from the specific perspective of
KernelCI (or any other CI system in particular) would be too
restrictive. If we start with a general approach we can always
specialize it later to a particular implementation, but starting with a
restricted design in mind, tailored to a specific system, will probably
tie it to that system permanently.

IMO the work we want to do with regressions should be higher-level,
based on the data produced by a CI system (any of them) and not
dependent on any particular implementation.

> also please take into account the fact that the current
> Regression tracker you've created relies on the legacy system
> which is going to be retired in the coming months.

That's correct. The regression tracker started as a proof of concept to
explore ideas and we based it on KernelCI test data. We're aware that
the legacy system will be retired soon; that's why we want to look into
KCIDB as a data source.

>> - did this test also fail on other hardware targets or with other kernel
>> configurations?
>> - is it possible that the test failed because of an infrastructure
>> error?
>
> This should be treated as a false-positive failing test rather
> than a "regression". But yes of course we need to deal with
> them, it's just slightly off-topic here I think.

Not regressions, that's right, but I don't think these should be simply
categorized as false-positives. If we treated these two particular cases
as false positives we would be hiding and missing important results:

- If the same test case on the same kernel version failed with different
configurations or on other boards, highlighting that information could
help narrow down the investigation or point it in the right
direction. There's definitely a failure (probably not a regression)
but the thing to fix might not be a kernel code commit but the
configuration used for the test. This can be submitted to the test
authors or the maintainers of the CI system running the test.

- If the test failed because of an infrastructure error, that's
something that can be reported to the lab maintainers to fix. This can
be done automatically.

>> - does the test fail consistently since that commit or does it show
>> unstable results?
>> - does the test output show any traces of already known bugs?
>> - has this regression been bisected and reported anywhere?
>> - was the regression reported by anyone? If so, is there someone already
>> working on it?
>
> These are all part of the post-regression checks we've been
> discussing to run as part of KernelCI. Basically, extending from
> the current automated bisection jobs we have and also taking into
> account the notion of dynamic scheduling. However, when
> collecting data from other CI systems I don't think there is much
> we can do if the data is not there. But we might be able to
> create collaborations to run extra post-regression checks in
> other CI systems to tackle this.

This is why I think handling this at a higher level, once all the test
data from multiple CI systems has been collected, could be the right
strategy. Can't these post-regression checks be applied to a common DB
with results aggregated from different CI systems? As long as the
results are collected in a common and standard way, I mean. We could
have those checks implemented only once, in a centralized and generic
way, instead of having a different implementation of the same process in
each of the data sources.

> Experimenting with KCIDB now may be interesting, but depending on
> the outcome of the discussions around having one central database
> for KernelCI it might not be the optimal way to do it.

Why not? Sorry, I might not have the full context; can you or Nikolai
give a bit more insight about the possible future status of KCIDB and
KernelCI and the relationship between them?

Thanks,
Ricardo

2023-08-21 15:53:05

by Ricardo Cañuelo

Subject: Re: Kernel regression tracking/reporting initiatives and KCIDB

On Fri, Aug 18 2023 at 22:11:52, Guillaume Tucker <[email protected]> wrote:
> KernelCI is not any CI, it's designed to be the main system for
> the upstream kernel. So it already took the high-level approach
> to look at all this after becoming an LF project and we came up
> with KCIDB and now the new API as the community still needs
> an "active" system and not just a database for collecting data
> from other systems.

That sounds good and I think that's the way to go, but does that mean
that, in theory, most or all current CI systems (0-day, CKI, etc.) will
"push" their results to the new KernelCI in the future?

> Right, except you might hit another deprecation hurdle if we
> start changing how things are designed around KCIDB and the new
> API. There's no doubt KCIDB will be supported for a long time,
> but taking into considerations all the new developments can save
> you a lot of trouble.

So, if using KCIDB as a data source is not a good idea right now, do you
have any suggestions on how to keep contributing to the improvement of
regression analysis?

If the new KernelCI API is already working with a large enough
regression database maybe this analysis work can be plugged into the
pipeline and we can start working on that.

> My point here is that KernelCI started tackling this issue of
> reporting kernel bugs several years ago at a very high level and
> we've come up with some carefully engineered solutions for it, so
> it looks like you're walking in our footsteps now. The new web
> dashboard, new API & Pipeline and KCIDB which pioneered working
> outside the native realm of KernelCI provide some answers to the
> challenges you're currently investigating. So maybe it is
> actually the best strategy for you to carry on doing things
> independently, but it would seem to me like due diligence for
> each of us to know what others are doing.

I surely must have missed most of those discussions but I couldn't find
any traces of the functionalities I listed either in a design document
or implemented anywhere. We certainly wouldn't have started this stream
of work if we had known this was already a work in progress. If there are
already concrete plans and some kind of design for this, let me know so
we can contribute to it.

If these functionalities are still unplanned in the solutions that have been
engineered so far, then I agree it'll be better to keep improving on this
independently. But in order to do that we'd need to be able to use other
data sources (KCIDB). Then, once the new KernelCI is ready to implement
these functionalities we can try to move them there after they've been
tested independently.

Cheers,
Ricardo