2013-08-19 15:52:10

by Anton Arapov

Subject: Re: [ATTEND] oops.kernel.org prospect

On Mon, Aug 19, 2013 at 11:39:39AM -0400, Theodore Ts'o wrote:
> On Mon, Aug 19, 2013 at 05:16:43PM +0200, Anton Arapov wrote:
> > > Why not just do that through email? You'll reach a much wider group of
> > > people than the tiny 80 developers at the conference.
> >
> > Ouch! Having someone take it as a replacement for email is the last thing I wanted.
> > It will go the email way in either case.
> >
> > These tiny 80 may give the most valuable feedback on the topic. And often
> > it is the most difficult to get their attention, especially via email.
> > If it fits the conference, it could dilute the heavy topics.
>
> Usually the best thing to do is to start the discussion on the
> mailing list (and we can do that on ksummit-2013-discuss, but this is
> always why it's sometimes useful to cc lkml on topic proposals, so we
> can jump start the discussion), and see if it's controversial or not.

Oh well... I didn't have time for this right now, and the project is
not exactly in the state I'm willing to show (mostly webui)

// CC'd: lkml (please don't complain on styles yet, focus on functionality)

> I suspect the biggest issue is one of "who will bell the cat". As in,
> if no one creates a new oops.kernel.org, and if we can't get the
> community distributions to be willing to set up their systems to
> automatically submit oops reports to the server, it's going to be
> somewhat pointless to have the discussion at the kernel summit.

It is created and distributions are willing to submit oopses
automatically. Debian, Fedora, Ubuntu are already sending reports.

The above is exactly what I would be able to share at the conference in
more detail.

> Note that main value of oops.kernel.org is the fact that we find out
> which bugs are most likely hitting "real users", as opposed to the
> very valuable testing done by developers on the linux-next and -rc
> kernels. Its main drawback is that for privacy reasons, we don't get
> much more information than the stack trace. So if it's just
> linux-next testers and developers using oops.kernel.org, who are the
> people who are more likely able to submit a detailed bug report, the
> oops.kernel.org server probably won't add much value. In fact, if the
> testers depend on the sending of an anonymous stack trace, and don't
> send us more detailed reproduction information, the existence of
> oops.kernel.org could actually make things worse.

And the above is exactly the kind of feedback I want to see, as well as
ideas on how to overcome it.

> This means, realistically, it needs someone from Fedora or Open SuSE
> or Debian or Ubuntu being willing to sponsor the uploading of
> information to oops.kernel.org by default.
...up and running, now...

hth,
--
Anton


2013-08-19 21:25:28

by Dave Jones

Subject: Re: [ATTEND] oops.kernel.org prospect

On Mon, Aug 19, 2013 at 05:52:02PM +0200, Anton Arapov wrote:
> On Mon, Aug 19, 2013 at 11:39:39AM -0400, Theodore Ts'o wrote:
> > On Mon, Aug 19, 2013 at 05:16:43PM +0200, Anton Arapov wrote:
> > > > Why not just do that through email? You'll reach a much wider group of
> > > > people than the tiny 80 developers at the conference.
> > >
> > > Ouch! Having someone take it as a replacement for email is the last thing I wanted.
> > > It will go the email way in either case.
> > >
> > > These tiny 80 may give the most valuable feedback on the topic. And often
> > > it is the most difficult to get their attention, especially via email.
> > > If it fits the conference, it could dilute the heavy topics.
> >
> > Usually the best thing to do is to start the discussion on the
> > mailing list (and we can do that on ksummit-2013-discuss, but this is
> > always why it's sometimes useful to cc lkml on topic proposals, so we
> > can jump start the discussion), and see if it's controversial or not.
>
> Oh well... I didn't have time for this right now, and the project is
> not exactly in the state I'm willing to show (mostly webui)
>
> // CC'd: lkml (please don't complain on styles yet, focus on functionality)

I stumbled across this a week or so ago, and had some thoughts back then,
but didn't mail them anywhere because I wasn't sure who ran it, and couldn't
tell how far along it was.

Quick brain dump

* Visiting it with chromium gets an annoying warning about the https server
identifying as a different server. (does it even need https?)

* There's a lot of tainted kernel traces in there. 99% of kernel developers
will never care about these in my experience. You can adjust this on a per-query
basis it seems, but better would be to turn them off globally, and have them
available just for people who want to search for 'all' (tainted or untainted) oopses.

- That the tainted oopses are counted as 'regular' oopses is skewing the 'top bugs'
on the front page.

- As well as proprietary, take care of 'out of tree' tainted modules in the same way.

* I clicked through some of the debian oopses, and saw these:
https://oops.kernel.org/browse-reports/oops-detail/?id=30497
https://oops.kernel.org/browse-reports/oops-detail/?id=30499
It would be useful to know if this was the same user. (It seems likely, but
there's no way to know for sure). You don't need identifying info other than
"These came from the same system" side-stepping any privacy concerns.

* In the Linked modules section, if there's an out-of-tree/proprietary module,
we annotate those in oopses with (O), or (P). This seems to be lost in your UI.
(Bonus points for making them stand out)

* The traces by default lack a lot of information, forcing clicking of the 'show raw oops'
in every case. Missing useful info (at least): EIP/RIP, other registers.

* 'Show raw oops' doesn't. (At least on chromium)

* This bug last seen: 2013-08-17
Also useful here would be something like:
Seen on: 3.2-rc2, 3.10-rc10 (You can probably just list earliest/latest rather than
every single kernel it's been seen on, unless you want a 'show all' button)

* Instead of summaries like "general protection fault: 4000 [#1] SMP"
Decode the EIP/RIP, and call it "general protection fault in i915_gem_do_execbuffer".
Not only does it make reading summaries easier, it should allow you to detect
dupes better. (Sidenote, abrt needs this too, when it files bugzillas)

* Looking over the summaries at https://oops.kernel.org/browse-reports/?distro=Fedora&search=submit
The first thing that comes to mind is "There's a lot of soft lockup bugs here"
Some means of grouping similar looking bugs would be useful.
(In bugzilla, clicking 'sort by summary' kinda gives this, but it still sucks).

* When Arjan ran kerneloops, he would periodically mail out a "top 10 oopses" report
on the latest tree. That seems like something that would be worth doing again,
but only after filtering out the tainted stuff as mentioned above.

* Some kind of "find similar bugs in other bug trackers" feature would be really awesome.

* There's a bunch of bugs in there that have been tainted 'W'. These are almost never useful,
because we're already deep in "bad shit happened" land at that point.
It'll also mean you could get flooded with oopses from a single crash if something
keeps on spewing traces. Just give up after filing the first oops.

* Take for example: https://oops.kernel.org/browse-reports/oops-detail/?id=30410
This is a 2.6.27.5 kernel bug, that was filed *last week*.
I'd bet dollars to donuts no-one is going to give a crap about that bug.
I'm not sure if it's better here to never file 'ancient' bugs, or to periodically
archive/delete ones that have been in the db more than a few years.

* Looking at https://oops.kernel.org/browse-reports/?function=ironlake_crtc_disable&search=submit
It seems the hashing algorithm for detecting dupes could use some work.
Many of these traces are probably exactly the same problem.
Are you hashing symbols in the trace beginning with '? ' ? If so, you probably shouldn't be.
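
To make the last point concrete, here is a minimal sketch in Python of a duplicate-detection signature that skips the '? ' frames (the ones the unwinder is not sure about) before hashing. This is not how oops.kernel.org hashes traces, which is not shown in this thread; the function names, the regular expression, and the depth of 5 frames are all assumptions made for illustration.

```python
import hashlib
import re

# Matches one x86 call-trace line, for example:
#   " [<ffffffff8105f2a1>] ? warn_slowpath_common+0x7f/0xc0"
# Group 1 is the optional "? " marker, group 2 is the symbol name.
FRAME_RE = re.compile(
    r"\[<[0-9a-f]+>\]\s+(\?\s+)?([A-Za-z_][\w.]*)\+0x[0-9a-f]+/0x[0-9a-f]+"
)


def reliable_frames(trace_text):
    """Return the symbols of all frames that are NOT marked with '? '."""
    frames = []
    for line in trace_text.splitlines():
        match = FRAME_RE.search(line)
        if match and match.group(1) is None:
            frames.append(match.group(2))
    return frames


def oops_signature(trace_text, depth=5):
    """Hash the top `depth` reliable frames (hypothetical dedup key).

    Addresses and offsets are left out on purpose, so the same call chain
    from differently built kernels still produces the same key.
    """
    top = reliable_frames(trace_text)[:depth]
    return hashlib.sha1("\n".join(top).encode("utf-8")).hexdigest()
```

With something like this, two traces that differ only in stale '? ' entries left on the stack end up with the same signature, which is the grouping asked for above.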

Dave

2013-08-20 08:02:53

by Anton Arapov

Subject: Re: [ATTEND] oops.kernel.org prospect

On Mon, Aug 19, 2013 at 05:25:12PM -0400, Dave Jones wrote:
> On Mon, Aug 19, 2013 at 05:52:02PM +0200, Anton Arapov wrote:
> > On Mon, Aug 19, 2013 at 11:39:39AM -0400, Theodore Ts'o wrote:
> > > On Mon, Aug 19, 2013 at 05:16:43PM +0200, Anton Arapov wrote:
> > > > > Why not just do that through email? You'll reach a much wider group of
> > > > > people than the tiny 80 developers at the conference.
> > > >
> > > > Ouch! Having someone take it as a replacement for email is the last thing I wanted.
> > > > It will go the email way in either case.
> > > >
> > > > These tiny 80 may give the most valuable feedback on the topic. And often
> > > > it is the most difficult to get their attention, especially via email.
> > > > If it fits the conference, it could dilute the heavy topics.
> > >
> > > Usually the best thing to do is to start the discussion on the
> > > mailing list (and we can do that on ksummit-2013-discuss, but this is
> > > always why it's sometimes useful to cc lkml on topic proposals, so we
> > > can jump start the discussion), and see if it's controversial or not.
> >
> > Oh well... I didn't have time for this right now, and the project is
> > not exactly in the state I'm willing to show (mostly webui)
> >
> > // CC'd: lkml (please don't complain on styles yet, focus on functionality)
>
> I stumbled across this a week or so ago, and had some thoughts back then,
> but didn't mail them anywhere because I wasn't sure who ran it, and couldn't
> tell how far along it was.
>
> Quick brain dump
>
> * Visiting it with chromium gets an annoying warning about the https server
> ...
[snip]
> ...
> Dave

Thanks, Dave! Will be fixed and improved.

Anton.

2013-08-20 08:22:21

by Borislav Petkov

Subject: Re: [ATTEND] oops.kernel.org prospect

On Tue, Aug 20, 2013 at 10:02:43AM +0200, Anton Arapov wrote:
> > * Visiting it with chromium gets an annoying warning about the https server
> > ...
> [snip]
> > ...
> > Dave
>
> Thanks, Dave! Will be fixed and improved.

Yeah, collecting oopses is a good idea, so +1.

However, we probably want to think about what exactly we're going to
do with that information. For example, if I want to address an issue,
I probably want to know how I can reproduce the oops - maybe something
like allowing the reporter to add a free-text note to the oops.

And yes, as tytso already said, we are very often going to need more
info about a system causing the oops (dmesg, lspci, dmidecode, etc,
etc). I'm not sure how we're going to collect that without sacrificing
some privacy. Or maybe, we could be able to ask people to open a bug on
bugzilla.kernel.org where further debugging can take place...

Which reminds me: maybe connecting bug reports on bugzilla.kernel.org
with your stats could also be a way to connect bug reports with
reporters...

Thanks.

--
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

2013-08-20 12:38:30

by Dave Jones

Subject: Re: [ATTEND] oops.kernel.org prospect

On Tue, Aug 20, 2013 at 10:22:16AM +0200, Borislav Petkov wrote:
> On Tue, Aug 20, 2013 at 10:02:43AM +0200, Anton Arapov wrote:
> > > * Visiting it with chromium gets an annoying warning about the https server
> > > ...
> > [snip]
> > > ...
> > > Dave
> >
> > Thanks, Dave! Will be fixed and improved.
>
> Yeah, collecting oopses is a good idea, so +1.
>
> However, we probably want to think about what exactly we're going to
> do with that information. For example, if I want to address an issue,
> I probably want to know how I can reproduce the oops - maybe something
> like allowing the reporter to add a free-text note to the oops.

abrt used to have a free-form entry like this.
What happened is users have no idea what to type in there, so you end up
with bugs containing things like "don't know" or worse, some crazy moon
language you can't even read.

> And yes, as tytso already said, we are very often going to need more
> info about a system causing the oops (dmesg, lspci, dmidecode, etc,
> etc). I'm not sure how we're going to collect that without sacrificing
> some privacy. Or maybe, we could be able to ask people to open a bug on
> bugzilla.kernel.org where further debugging can take place...

Two things worth noting here, are 1) the original kerneloops also didn't
collect anything like this, and was still very useful, and 2) for the more
common issues (which let's face it, are going to be the only things
people really look at) chances are pretty high that there's going to be
someone also reporting it on lkml, or in a distro bug tracker.

What might be useful however, is collecting things like dmi/lspci/lsusb etc
and _asking_ the user if they're ok with including them at time of filing.
We might scare off some of the more paranoid OMGMYSECRETDATAS users, but
chances are high most people won't care. This requires the client to have
a UI though, which aiui, it currently doesn't. Anton?

We might also ask if they want to provide an email address for feedback,
but that leads to a bunch of questions about how we expose that to developers
without exposing it to spambots.

Dave

2013-08-20 13:22:29

by Anton Arapov

Subject: Re: [ATTEND] oops.kernel.org prospect

On Tue, Aug 20, 2013 at 08:37:48AM -0400, Dave Jones wrote:
> On Tue, Aug 20, 2013 at 10:22:16AM +0200, Borislav Petkov wrote:
> > On Tue, Aug 20, 2013 at 10:02:43AM +0200, Anton Arapov wrote:
> > > > * Visiting it with chromium gets an annoying warning about the https server
> > > > ...
> > > [snip]
> > > > ...
> > > > Dave
> > >
> > > Thanks, Dave! Will be fixed and improved.
> >
> > Yeah, collecting oopses is a good idea, so +1.
> >
> > However, we probably want to think about what exactly we're going to
> > do with that information. For example, if I want to address an issue,
> > I probably want to know how I can reproduce the oops - maybe something
> > like allowing the reporter to add a free-text note to the oops.
> abrt used to have a free-form entry like this.
> What happened is users have no idea what to type in there, so you end up
> with bugs containing things like "don't know" or worse, some crazy moon
> language you can't even read.
Agree.

> > And yes, as tytso already said, we are very often going to need more
> > info about a system causing the oops (dmesg, lspci, dmidecode, etc,
> > etc). I'm not sure how we're going to collect that without sacrificing
> > some privacy. Or maybe, we could be able to ask people to open a bug on
> > bugzilla.kernel.org where further debugging can take place...
> Two things worth noting here, are 1) the original kerneloops also didn't
> collect anything like this, and was still very useful, and 2) for the more
> common issues (which let's face it, are going to be the only things
> people really look at) chances are pretty high that there's going to be
> someone also reporting it on lkml, or in a distro bug tracker.
>
> What might be useful however, is collecting things like dmi/lspci/lsusb etc
> and _asking_ the user if they're ok with including them at time of filing.
> We might scare off some of the more paranoid OMGMYSECRETDATAS users, but
> chances are high most people won't care. This requires the client to have
> a UI though, which aiui, it currently doesn't. Anton?
The above is possible with abrt/libreport-kerneloops: it does have a UI
and can include the dmi/lspci/lsusb output in the message to
oops.kernel.org.

Some distros are still using the old reporting tool written by Arjan, which
doesn't have a UI.

I am going to research what distros are using nowadays and how, and get in
touch with people/distro_maintainers in order to align the process as
well as gather their views and concerns on sharing anything other than
just a stacktrace, and on doing so unconditionally (w/o user intervention).
oops.kernel.org can sanitize the 'private' data, as it already does for
oopses.

Will be keeping lkml posted on my progress.

> We might also ask if they want to provide an email address for feedback,
> but that leads to a bunch of questions about how we expose that to developers
> without exposing it to spambots.
I'd rather not ask the user about anything. In Fedora, abrt ended up
this way: abrt asks the user to review the report and whether they are
willing to send it to Bugzilla and oops.kernel.org. The user can also
check a "don't ask me in the future for this kind of issue, just
send reports" checkbox. This is what I was able to get from the abrt
folks so far.


Anton

2013-08-20 15:20:13

by Dave Hansen

Subject: Re: [Ksummit-2013-discuss] [ATTEND] oops.kernel.org prospect

On 08/19/2013 02:25 PM, Dave Jones wrote:
> * This bug last seen: 2013-08-17
> Also useful here would be something like:
> Seen on: 3.2-rc2, 3.10-rc10 (You can probably just list earliest/latest rather than
> every single kernel it's been seen on, unless you want a 'show all' button)

Once you have the "seen on" stuff sorted out, it would also be really
nice to be able to easily select bugs only seen on "versions after 3.8",
just so we can filter out some of the older stuff. The kernel version
in the filter is useful, but would be much more so if we had ranges,
even if it was just "newer than $foo".
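
A rough sketch of what a "newer than $foo" filter has to do, in Python. Nothing here reflects the actual oops.kernel.org code; the function names are made up, and suffixes such as "-rc2" or a distro revision are simply ignored, which is a simplification.

```python
import re


def parse_kernel_version(release):
    """Turn '3.10.1-1-amd64' or '3.2-rc2' into a comparable tuple.

    Only the leading major.minor[.patch] numbers are used; a missing
    patch level counts as 0 and anything after it is ignored.
    """
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if match is None:
        raise ValueError("unrecognised kernel release: %r" % release)
    return tuple(int(part) if part else 0 for part in match.groups())


def newer_than(release, floor):
    """True if `release` is strictly newer than `floor` (hypothetical filter)."""
    return parse_kernel_version(release) > parse_kernel_version(floor)
```

The point of parsing into integer tuples is that a plain string comparison gets this wrong: as text, "3.10" sorts before "3.8".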

When you go to "Show Raw Oops", it usually doesn't show up for me in
Chrome. There's a javascript error:

> The page at https://oops.kernel.org/browse-reports/oops-detail/?id=30565# displayed insecure content from http://oops.kernel.org/get-raw.php?id=30565&token=a94733d146ae15f8cec871d4f238956da80d7c5322f6253f9b49d4c5cc8fe8a1.

If I go over and load the page as plain old http, it works fine. Also,
when those oopses show up, the font is variable-width. It would
probably be a wee bit easier to read if it were displayed in a
fixed-width font.

This is a much more minor nit, but the source code links (like clicking
on bad_page from here:
https://oops.kernel.org/browse-reports/oops-detail/?id=30565#) from the
traces for the distribution kernels link over to mainline kernel source.
This means that the line numbers don't _quite_ line up. It would be
really cool if it was able to dump you right over to a copy of the
Debian-specific source in that bug's case. But, this is a generic
problem that folks have who work across lots of distros: you don't
always have the right source in front of you for any given kernel.

Anyway... cool stuff. I always forget that oops.kernel.org is there,
but it's always fun to poke around when I run across it. :)

2013-08-20 15:48:15

by Guenter Roeck

Subject: Re: [Ksummit-2013-discuss] [ATTEND] oops.kernel.org prospect

On Tue, Aug 20, 2013 at 08:20:04AM -0700, Dave Hansen wrote:
> On 08/19/2013 02:25 PM, Dave Jones wrote:
> > * This bug last seen: 2013-08-17
> > Also useful here would be something like:
> > Seen on: 3.2-rc2, 3.10-rc10 (You can probably just list earliest/latest rather than
> > every single kernel it's been seen on, unless you want a 'show all' button)
>
> Once you have the "seen on" stuff sorted out, it would also be really
> nice to be able to easily select bugs only seen on "versions after 3.8",
> just so we can filter out some of the older stuff. The kernel version
> in the filter is useful, but would be much more so if we had ranges,
> even if it was just "newer than $foo".
>
A per subsystem filter would be nice to have too.

Guenter

2013-08-20 17:03:08

by Anton Arapov

Subject: Re: [Ksummit-2013-discuss] [ATTEND] oops.kernel.org prospect

On Tue, Aug 20, 2013 at 08:20:04AM -0700, Dave Hansen wrote:
> On 08/19/2013 02:25 PM, Dave Jones wrote:
> > * This bug last seen: 2013-08-17
> > Also useful here would be something like:
> > Seen on: 3.2-rc2, 3.10-rc10 (You can probably just list earliest/latest rather than
> > every single kernel it's been seen on, unless you want a 'show all' button)
>
> Once you have the "seen on" stuff sorted out, it would also be really
> nice to be able to easily select bugs only seen on "versions after 3.8",
> just so we can filter out some of the older stuff. The kernel version
> in the filter is useful, but would be much more so if we had ranges,
> even if it was just "newer than $foo".
This is a good idea.


> When you go to "Show Raw Oops", it usually doesn't show up for me in
> Chrome. There's a javascript error:
> > The page at https://oops.kernel.org/browse-reports/oops-detail/?id=30565# displayed insecure content from http://oops.kernel.org/get-raw.php?id=30565&token=a94733d146ae15f8cec871d4f238956da80d7c5322f6253f9b49d4c5cc8fe8a1.
> If I go over and load the page as plain old http, it works fine. Also,
> when those oopses show up, the font is variable-width. It would
> probably be a wee bit easier to read if it were displayed in a
> fixed-width font.
The latest version of Firefox should have the same issue: fetching or
showing insecure content over a secure connection is now blocked.
I will most probably redirect any https request to http
automatically to avoid this issue.


> This is a much more minor nit, but the source code links (like clicking
> on bad_page from here:
> https://oops.kernel.org/browse-reports/oops-detail/?id=30565#) from the
> traces for the distribution kernels link over to mainline kernel source.
> This means that the line numbers don't _quite_ line up. It would be
> really cool if it was able to dump you right over to a copy of the
> Debian-specific source in that bug's case. But, this is a generic
> problem that folks have who work across lots of distros: you don't
> always have the right source in front of you for any given kernel.

Will put it on the todo list; it might happen someday...

> Anyway... cool stuff. I always forget that oops.kernel.org is there,
> but it's always fun to poke around when I run across it. :)

Thanks,
Anton.

Requests accumulated so far:
http://trello.com/b/ZvLKCkJX/oops-kernel-org-support-and-development

2013-08-20 17:06:37

by Anton Arapov

Subject: Re: [ATTEND] oops.kernel.org prospect

On Mon, Aug 19, 2013 at 05:25:12PM -0400, Dave Jones wrote:
> On Mon, Aug 19, 2013 at 05:52:02PM +0200, Anton Arapov wrote:
> > On Mon, Aug 19, 2013 at 11:39:39AM -0400, Theodore Ts'o wrote:
> > > On Mon, Aug 19, 2013 at 05:16:43PM +0200, Anton Arapov wrote:
> > > > > Why not just do that through email? You'll reach a much wider group of
> > > > > people than the tiny 80 developers at the conference.
> > > >
> > > > Ouch! Having someone take it as a replacement for email is the last thing I wanted.
> > > > It will go the email way in either case.
> > > >
> > > > These tiny 80 may give the most valuable feedback on the topic. And often
> > > > it is the most difficult to get their attention, especially via email.
> > > > If it fits the conference, it could dilute the heavy topics.
> > >
> > > Usually the best thing to do is to start the discussion on the
> > > mailing list (and we can do that on ksummit-2013-discuss, but this is
> > > always why it's sometimes useful to cc lkml on topic proposals, so we
> > > can jump start the discussion), and see if it's controversial or not.
> >
> > Oh well... I didn't have time for this right now, and the project is
> > not exactly in the state I'm willing to show (mostly webui)
> >
> > // CC'd: lkml (please don't complain on styles yet, focus on functionality)
>
> I stumbled across this a week or so ago, and had some thoughts back then,
> but didn't mail them anywhere because I wasn't sure who ran it, and couldn't
> tell how far along it was.
>
> Quick brain dump
>
> * Visiting it with chromium gets an annoying warning about the https server
> identifying as a different server. (does it even need https?)
>
> * There's a lot of tainted kernel traces in there. 99% of kernel developers
> will never care about these in my experience. You can adjust this on a per-query
> basis it seems, but better would be to turn them off globally, and have them
> available just for people who want to search for 'all' (tainted or untainted) oopses.
>
> - That the tainted oopses are counted as 'regular' oopses is skewing the 'top bugs'
> on the front page.
>
> - As well as proprietary, take care of 'out of tree' tainted modules in the same way.
>
> * I clicked through some of the debian oopses, and saw these:
> https://oops.kernel.org/browse-reports/oops-detail/?id=30497
> https://oops.kernel.org/browse-reports/oops-detail/?id=30499
> It would be useful to know if this was the same user. (It seems likely, but
> there's no way to know for sure). You don't need identifying info other than
> "These came from the same system" side-stepping any privacy concerns.
>
> * In the Linked modules section, if there's an out-of-tree/proprietary module,
> we annotate those in oopses with (O), or (P). This seems to be lost in your UI.
> (Bonus points for making them stand out)
>
> * The traces by default lack a lot of information, forcing clicking of the 'show raw oops'
> in every case. Missing useful info (at least): EIP/RIP, other registers.
>
> * 'Show raw oops' doesn't. (At least on chromium)
>
> * This bug last seen: 2013-08-17
> Also useful here would be something like:
> Seen on: 3.2-rc2, 3.10-rc10 (You can probably just list earliest/latest rather than
> every single kernel it's been seen on, unless you want a 'show all' button)
>
> * Instead of summaries like "general protection fault: 4000 [#1] SMP"
> Decode the EIP/RIP, and call it "general protection fault in i915_gem_do_execbuffer".
> Not only does it make reading summaries easier, it should allow you to detect
> dupes better. (Sidenote, abrt needs this too, when it files bugzillas)
>
> * Looking over the summaries at https://oops.kernel.org/browse-reports/?distro=Fedora&search=submit
> The first thing that comes to mind is "There's a lot of soft lockup bugs here"
> Some means of grouping similar looking bugs would be useful.
> (In bugzilla, clicking 'sort by summary' kinda gives this, but it still sucks).
>
> * When Arjan ran kerneloops, he would periodically mail out a "top 10 oopses" report
> on the latest tree. That seems like something that would be worth doing again,
> but only after filtering out the tainted stuff as mentioned above.
>
> * Some kind of "find similar bugs in other bug trackers" feature would be really awesome.
>
> * There's a bunch of bugs in there that have been tainted 'W'. These are almost never useful,
> because we're already deep in "bad shit happened" land at that point.
> It'll also mean you could get flooded with oopses from a single crash if something
> keeps on spewing traces. Just give up after filing the first oops.
>
> * Take for example: https://oops.kernel.org/browse-reports/oops-detail/?id=30410
> This is a 2.6.27.5 kernel bug, that was filed *last week*.
> I'd bet dollars to donuts no-one is going to give a crap about that bug.
> I'm not sure if it's better here to never file 'ancient' bugs, or to periodically
> archive/delete ones that have been in the db more than a few years.
>
> * Looking at https://oops.kernel.org/browse-reports/?function=ironlake_crtc_disable&search=submit
> It seems the hashing algorithm for detecting dupes could use some work.
> Many of these traces are probably exactly the same problem.
> Are you hashing symbols in the trace beginning with '? ' ? If so, you probably shouldn't be.

Dave,

FYI,
I've put all of the above, hopefully with nothing missed, on the list that is
available here:
http://trello.com/b/ZvLKCkJX/oops-kernel-org-support-and-development

Will keep lkml posted on progress though.

Anton.

2013-08-20 20:58:30

by Ben Hutchings

Subject: Re: [Ksummit-2013-discuss] [ATTEND] oops.kernel.org prospect

On Tue, Aug 20, 2013 at 08:20:04AM -0700, Dave Hansen wrote:
[...]
> This is a much more minor nit, but the source code links (like clicking
> on bad_page from here:
> https://oops.kernel.org/browse-reports/oops-detail/?id=30565#) from the
> traces for the distribution kernels link over to mainline kernel source.
> This means that the line numbers don't _quite_ line up. It would be
> really cool if it was able to dump you right over to a copy of the
> Debian-specific source in that bug's case. But, this is a generic
> problem that folks have who work across lots of distros: you don't
> always have the right source in front of you for any given kernel.
[...]

For Debian kernels this should be quite easy. The sources are
browseable at:

http://sources.debian.net/src/linux/$PACKAGE_VERSION/$FILE#L$LINE

The package version is not the same as the kernel release string,
but appears at the end of the same line in oops messages, e.g. for
<http://oops.kernel.org/browse-reports/oops-detail/?id=30218> the
package version is 3.10.1-1.

This doesn't work for versions older than 3.2, or those with the RT
patchset, as sources.debian.net can't show the patched source for
these.
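
As a sketch of how the link could be built, here is the URL template above in Python. The helper names are hypothetical, the sample oops line used to pull out the package version is an assumed format (only "at the end of the same line" is stated above), and no check is made for the pre-3.2 or RT cases that sources.debian.net cannot show.

```python
import re

# URL template taken from the message above.
SOURCES_URL = "http://sources.debian.net/src/linux/{version}/{path}#L{line}"


def debian_package_version(oops_line):
    """Return the token after 'Debian' at the end of the line, or None.

    Assumes a line shaped like '... 3.10-1-amd64 #1 Debian 3.10.1-1'.
    """
    match = re.search(r"\bDebian\s+(\S+)\s*$", oops_line)
    return match.group(1) if match else None


def debian_source_url(package_version, path, line):
    """Fill in the template; `path` is relative to the kernel source root."""
    return SOURCES_URL.format(
        version=package_version, path=path.lstrip("/"), line=int(line)
    )
```

For example, debian_source_url("3.10.1-1", "mm/page_alloc.c", 300) gives http://sources.debian.net/src/linux/3.10.1-1/mm/page_alloc.c#L300 (the line number here is just a placeholder).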

Ben.

--
Ben Hutchings
We get into the habit of living before acquiring the habit of thinking.
- Albert Camus

2013-08-20 21:39:07

by Anton Arapov

Subject: Re: [Ksummit-2013-discuss] [ATTEND] oops.kernel.org prospect

On Tue, Aug 20, 2013 at 09:58:12PM +0100, Ben Hutchings wrote:
> On Tue, Aug 20, 2013 at 08:20:04AM -0700, Dave Hansen wrote:
> [...]
> > This is a much more minor nit, but the source code links (like clicking
> > on bad_page from here:
> > https://oops.kernel.org/browse-reports/oops-detail/?id=30565#) from the
> > traces for the distribution kernels link over to mainline kernel source.
> > This means that the line numbers don't _quite_ line up. It would be
> > really cool if it was able to dump you right over to a copy of the
> > Debian-specific source in that bug's case. But, this is a generic
> > problem that folks have who work across lots of distros: you don't
> > always have the right source in front of you for any given kernel.
> [...]
>
> For Debian kernels this should be quite easy. The sources are
> browseable at:
>
> http://sources.debian.net/src/linux/$PACKAGE_VERSION/$FILE#L$LINE
>
> The package version is not the same as the kernel release string,
> but appears at the end of the same line in oops messages, e.g. for
> <http://oops.kernel.org/browse-reports/oops-detail/?id=30218> the
> package version is 3.10.1-1.
>
> This doesn't work for versions older than 3.2, or those with the RT
> patchset, as sources.debian.net can't show the patched source for
> these.


Thanks, Ben!

Anton

2013-08-21 19:35:08

by Borislav Petkov

Subject: Re: [ATTEND] oops.kernel.org prospect

On Tue, Aug 20, 2013 at 08:37:48AM -0400, Dave Jones wrote:
> abrt used to have a free-form entry like this. What happened is users
> have no idea what to type in there, so you end up with bugs containing
> things like "don't know" or worse, some crazy moon language you can't
> even read.

Prepend the entry with an informative question maybe:

"Enter bug reproduction information here:"

> Two things worth noting here, are 1) the original kerneloops also
> didn't collect anything like this, and was still very useful, and
> 2) for the more common issues (which let's face it, are going to be
> the only things people really look at) chances are pretty high that
> there's going to be someone also reporting it on lkml, or in a distro
> bug tracker.

Ok.

> What might be useful however, is collecting things like
> dmi/lspci/lsusb etc and _asking_ the user if they're ok with including
> them at time of filing. We might scare off some of the more paranoid
> OMGMYSECRETDATAS users, but chances are high most people won't care.
> This requires the client to have a UI though, which aiui, it currently
> doesn't. Anton?

Definitely a step in the right direction.

> We might also ask if they want to provide an email address for
> feedback, but that leads to a bunch of questions about how we expose
> that to developers without exposing it to spambots.

Right.

2013-08-21 19:44:12

by Francois Romieu

Subject: Re: [ATTEND] oops.kernel.org prospect

Anton Arapov <[email protected]> :
[...]
> Oh well,... I didn't have a time for this right now, nor project is
> not exactly in the state I'm willing to show (mostly webui)

I have sorted the r8169 oopses by kernel revision to start with the most
recent kernels. I don't get why the r8169 driver appears in the
"Caused by:" field when
- the bug is about "scheduling while atomic: Xorg/3042/0x00000001"
- the kernel is PDWO with fglrx
- r8169 appears in the module list, nowhere else (not even the oops)

I tried :

http://oops.kernel.org/browse-reports/?c=1&d=1&oopsclass=default&oopstype=default&distro=default&module=&driver=r8169&function=&file=&bugline=&kernel=&tainted=true&search=submit

(0 answers if Stack, Registers or Disassembled code is added)

("tainted=true" while "Untainted only" was asked for, huh ?)

The answers contain:

http://oops.kernel.org/browse-reports/oops-detail/?id=29778

and

http://oops.kernel.org/browse-reports/oops-detail/?id=29856

I can't even find the "r8169" word in those.

Is this the currently expected behavior?

--
Ueimor

2013-08-22 10:44:16

by Anton Arapov

Subject: Re: [ATTEND] oops.kernel.org prospect

On Wed, Aug 21, 2013 at 09:43:57PM +0200, Francois Romieu wrote:
> Anton Arapov <[email protected]> :
> [...]
> > Oh well,... I didn't have a time for this right now, nor project is
> > not exactly in the state I'm willing to show (mostly webui)
>
> I have sorted the r8169 oopses by kernel revision to start with the most
> recent kernels. I don't get why the r8169 driver appears in the
> "Caused by:" field when
> - the bug is about "scheduling while atomic: Xorg/3042/0x00000001"
> - the kernel is PDWO with fglrx
> - r8169 appears in the module list, nowhere else (not even the oops)
>
> I tried :
> http://oops.kernel.org/browse-reports/?c=1&d=1&oopsclass=default&oopstype=default&distro=default&module=&driver=r8169&function=&file=&bugline=&kernel=&tainted=true&search=submit
>
> (0 answer if Stack, Registers or Disassembled code is added)
> ("tainted=true" while "Untainted only" was asked for, huh ?)
>
> The answers contain:
> http://oops.kernel.org/browse-reports/oops-detail/?id=29778
> and
> http://oops.kernel.org/browse-reports/oops-detail/?id=29856
>
> I can't even find the "r8169" word in those.
> Is it the currently expected behavior ?

Thanks for the report. I will check it.

Anton.

2013-10-04 08:54:04

by Anton Arapov

Subject: Re: [ATTEND] oops.kernel.org prospect

On Mon, Aug 19, 2013 at 05:25:12PM -0400, Dave Jones wrote:
> On Mon, Aug 19, 2013 at 05:52:02PM +0200, Anton Arapov wrote:
> > On Mon, Aug 19, 2013 at 11:39:39AM -0400, Theodore Ts'o wrote:
> > > On Mon, Aug 19, 2013 at 05:16:43PM +0200, Anton Arapov wrote:
> > > > > Why not just do that through email? You'll reach a much wider group of
> > > > > people than the tiny 80 developers at the conference.
> > > >
> > > > Ouch! Someone to take it as replacement of email - the least I wanted. It will
> > > > go email-way in either case.
> > > >
> > > > These tiny 80 may give the most valuable feedback on the topic. And often
> > > > it is the most difficult to get attention of them, especially via email.
> > > > In case it fits the conference, it could dilute the heavy topics.
> > >
> > > > Usually the best thing to do is to start the discussion on the
> > > mailing list (and we can do that on ksummit-2013-discuss, but this is
> > > always why it's sometimes useful to cc lkml on topic proposals, so we
> > > can jump start the discussion), and see if it's controversial or not.
> >
> > Oh well,... I didn't have a time for this right now, nor project is
> > not exactly in the state I'm willing to show (mostly webui)
> >
> > // CC'd: lkml (please don't complain on styles yet, focus on functionality)
>
> I stumbled across this a week or so ago, and had some thoughts back then,
> but didn't mail them anywhere because I wasn't sure who ran it, and couldn't
> tell how far along it was.
>
> Quick brain dump
>
> * Visiting it with chromium gets an annoying warning about the https server
> identifying as a different server. (does it even need https?)

It was an openshift+chromium issue; it should be resolved as per
https://bugzilla.redhat.com/show_bug.cgi?id=908417


> * There's a lot of tainted kernel traces in there. 99% of kernel developers
> will never care about these in my experience. You can adjust this on a per-query
> basis it seems, but better would be to turn them off globally, and have them
> available just for people who want to search for 'all' (tainted or untainted) oopses.
>
> - That the tainted oopses are counted as 'regular' oopses is skewing the 'top bugs'
> on the front page.
>
> - As well as proprietary, take care of 'out of tree' tainted modules in the same way.

It is possible to filter out tainted reports now.


> * I clicked through some of the debian oopses, and saw these:
> https://oops.kernel.org/browse-reports/oops-detail/?id=30497
> https://oops.kernel.org/browse-reports/oops-detail/?id=30499
> It would be useful to know if this was the same user. (It seems likely, but
> there's no way to know for sure). You don't need identifying info other than
> "These came from the same system" side-stepping any privacy concerns.

Watching oopses from a single source is still on the to-do list.

But you can now see "Total count: 14 (from 7 unique sources)" per
oops, for example:
http://oops.kernel.org/oops/warning-at-net-ipv4-tcp_input-c2776-tcp_fastretrans_alert0xc21-0xc60-6/

> * In the Linked modules section, if there's an out-of-tree/proprietary module,
> we annotate those in oopses with (O), or (P). This seems to be lost in your UI.
> (Bonus points for making them stand out)

implemented.
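
For illustration, the (O)/(P) annotations can be recovered from the "Modules linked in:" line roughly as follows. This is a sketch under assumptions, not the site's code: the function names are made up here, and the line format (module names optionally followed by flag letters in parentheses, plus an optional "[last unloaded: ...]" tail) is what such lines commonly look like.

```python
import re

# A module token: a name, optionally followed by flags in parentheses,
# e.g. "fglrx(PO)". Flag letters such as P and O are the ones mentioned above.
TOKEN = re.compile(r"([\w-]+)(?:\(([A-Z+-]*)\))?")


def parse_modules(line):
    """Return a list of (name, flags) from a 'Modules linked in:' line."""
    _, _, rest = line.partition("Modules linked in:")
    modules = []
    for token in rest.split():
        m = TOKEN.fullmatch(token)
        if m is None:
            continue  # skips the "[last unloaded: ...]" tail and other noise
        modules.append((m.group(1), m.group(2) or ""))
    return modules


def flagged_modules(modules):
    """Names of modules marked proprietary (P) or out-of-tree (O)."""
    return [name for name, flags in modules if "P" in flags or "O" in flags]


if __name__ == "__main__":
    mods = parse_modules("Modules linked in: fglrx(PO) r8169 mii [last unloaded: foo]")
    print(flagged_modules(mods))
```

A UI could then render the names returned by flagged_modules() in a different style so they stand out.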


> * The traces by default lack a lot of information, forcing clicking of the 'show raw oops'
> in every case. Missing useful info (at least): EIP/RIP, other registers.

should be improved now.

> * 'Show raw oops' doesn't. (At least on chromium)
>
> * This bug last seen: 2013-08-17
> Also useful here would be something like:
> Seen on: 3.2-rc2, 3.10-rc10 (You can probably just list earliest/latest rather than
> every single kernel it's been seen on, unless you want a 'show all' button)

implemented.
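
Picking the earliest and latest kernel needs a version-aware comparison, since a plain string sort would place 3.10 before 3.2. A minimal sketch, assuming version strings of the form "X.Y[.Z][-rcN]" (the helper names are hypothetical and this is not the site's implementation):

```python
import re


def version_key(version):
    """Sort key for strings like '3.2-rc2', '3.9.4' or '3.10-rc10'."""
    m = re.match(r"(\d+(?:\.\d+)*)(?:-rc(\d+))?", version)
    if m is None:
        raise ValueError("unrecognised kernel version: %r" % version)
    numbers = tuple(int(part) for part in m.group(1).split("."))
    # A release candidate sorts before the final release of the same version.
    rc = int(m.group(2)) if m.group(2) else float("inf")
    return (numbers, rc)


def seen_range(versions):
    """Return (earliest, latest) of the kernel versions a bug was seen on."""
    ordered = sorted(versions, key=version_key)
    return ordered[0], ordered[-1]


if __name__ == "__main__":
    print("Seen on: %s, %s" % seen_range(["3.10-rc10", "3.2-rc2", "3.9.4"]))
```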

> * Instead of summaries like "general protection fault: 4000 [#1] SMP"
> Decode the EIP/RIP, and call it "general protection fault in i915_gem_do_execbuffer".
> Not only does it make reading summaries easier, it should allow you to detect
> dupes better. (Sidenote, abrt needs this too, when it files bugzillas)

fixed.
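
The kind of summary Dave asks for could be derived along these lines. This is a hedged sketch, not the actual oops.kernel.org code: it assumes the oops text has no log-timestamp prefixes, that the headline looks like "<description>: NNNN [#N]", and that the faulting symbol appears on an "RIP:" or "EIP is at" line.

```python
import re

# First "symbol+0xoff/0xsize" on the RIP/EIP line.
SYMBOL = re.compile(
    r"(?:RIP:|EIP is at)\s.*?([A-Za-z_][\w.]*)\+0x[0-9a-f]+/0x[0-9a-f]+")
# Headline such as "general protection fault: 0000 [#1] SMP".
HEADLINE = re.compile(r"^(.+?):\s+[0-9a-f]{4}\s+\[#\d+\]", re.M)


def summarize(oops_text):
    """Return e.g. 'general protection fault in i915_gem_do_execbuffer'."""
    head = HEADLINE.search(oops_text)
    title = head.group(1).strip() if head else "oops"
    sym = SYMBOL.search(oops_text)
    if sym is None:
        return title  # no decodable instruction pointer, keep the plain title
    return "%s in %s" % (title, sym.group(1))


if __name__ == "__main__":
    sample = (
        "general protection fault: 0000 [#1] SMP\n"
        "RIP: 0010:[<ffffffffa00b1c5e>]  [<ffffffffa00b1c5e>] "
        "i915_gem_do_execbuffer+0x4e/0x1100 [i915]\n"
    )
    print(summarize(sample))
```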

> * Looking over the summaries at https://oops.kernel.org/browse-reports/?distro=Fedora&search=submit
> The first thing that comes to mind is "There's a lot of soft lockup bugs here"
> Some means of grouping similar looking bugs would be useful.
> (In bugzilla, clicking 'sort by summary' kinda gives this, but it still sucks).

improved && fixed

> * When Arjan ran kerneloops, he would periodically mail out a "top 10 oopses" report
> on the latest tree. That seems like something that would be worth doing again,
> but only after filtering out the tainted stuff as mentioned above.

I will start doing that.

> * Some kind of "find similar bugs in other bug trackers" feature would be really awesome.

still on the to-do list.

> * There's a bunch of bugs in there that have been tainted 'W'. These are almost never useful,
> because we're already deep in "bad shit happened" land at that point.
> It'll also mean you could get flooded with oopses from a single crash if something
> keeps on spewing traces. Just give up after filing the first oops.
>
> * Take for example: https://oops.kernel.org/browse-reports/oops-detail/?id=30410
> This is a 2.6.27.5 kernel bug, that was filed *last week*.
> I'd bet dollars to donuts no-one is going to give a crap about that bug.
> I'm not sure if it's better here to never file 'ancient' bugs, or to periodically
> archive/delete ones that have been in the db more than a few years.
>
> * Looking at https://oops.kernel.org/browse-reports/?function=ironlake_crtc_disable&search=submit
> It seems the hashing algorithm for detecting dupes could use some work.
> Many of these traces are probably exactly the same problem.
> Are you hashing symbols in the trace beginning with '? ' ? If so, you probably shouldn't be.

hash function improved.
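
To illustrate the suggestion about '? ' symbols: a duplicate-detection signature can be computed from the reliable frames only, ignoring offsets and addresses. This is a sketch of that idea under an assumed trace line format, not the hash function actually used by the site; the function names are hypothetical.

```python
import hashlib
import re

# A backtrace frame: optional "? " marker (unreliable frame), then
# "symbol+0xoffset/0xsize".
FRAME = re.compile(r"(\?\s+)?([A-Za-z_][\w.]*)\+0x[0-9a-f]+/0x[0-9a-f]+")


def reliable_symbols(trace_lines):
    """Symbols of the frames that are not marked with '? '."""
    symbols = []
    for line in trace_lines:
        m = FRAME.search(line)
        if m is None or m.group(1):
            continue  # not a frame, or an unreliable '? ' frame
        symbols.append(m.group(2))
    return symbols


def trace_signature(trace_lines):
    """Hash only the reliable symbol names, so offsets do not matter."""
    joined = "\n".join(reliable_symbols(trace_lines))
    return hashlib.sha1(joined.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    trace = [
        " [<ffffffff8100c0a6>] ? warn_slowpath_common+0x7f/0xc0",
        " [<ffffffffa0123456>] ironlake_crtc_disable+0x5a/0x900 [i915]",
    ]
    print(reliable_symbols(trace), trace_signature(trace))
```

Two traces that differ only in their '? ' entries or in offsets then hash to the same value.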


Thanks for this feedback. There are still a number of improvements
planned, mostly cosmetic ones. I will keep you posted.


Anton.