2003-01-01 19:31:58

by Larry McVoy

Subject: Raw data from dedicated kernel bug database

What are the chances that the raw data from the kernel bugdb could be
made available? I bet Bradford wants it and I know I do. We are
working on an integrated bugdb for BitKeeper and it would be cool if
we could track the real db at osdl.

The advantage of having the data available is that for the BK kernel
users we could give them access to the bugdb while they are doing
checkins so that the developers link the changes to the bugs as they
do the fixes, which is a good time to do it.

To calm any fears that we are trying to take over the bugdb, we're not.
We just want to track it. Any changes made in a BK bugdb are trivially
exportable to an external format and if the need arises we'll work with
IBM/OSDL to make that happen. In fact, we can automate it.


Getting back to the data, the ideal raw data format for us would be

for bug in `cat list-o-bugs`
do
    for field in `cat list-o-fields`
    do
        extract $field from $bug into $bug.$field
        set-timestamp \
            of $bug.$field \
            to date that $bug.$field was created/updated
    done
done
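The loop above can be made runnable as a small sketch. Everything about the input is an assumption: it reads a hypothetical tab-separated dump with one `bug<TAB>field<TAB>timestamp<TAB>value` record per line, and `explode_dump` is a made-up name. It relies on GNU `touch -d` to stamp each file with the time the field was created or updated.

```shell
# explode_dump DUMP OUTDIR
# Sketch only: DUMP is a hypothetical tab-separated file with one record per
# line: bug id, field name, last-change timestamp, value.  Each record becomes
# the file OUTDIR/<bug>.<field>, whose mtime is set to the field's timestamp.
explode_dump() {
	dump=$1
	outdir=$2
	mkdir -p "$outdir" || return 1
	tab=$(printf '\t')
	while IFS="$tab" read -r bug field stamp value; do
		printf '%s\n' "$value" > "$outdir/$bug.$field" || return 1
		# GNU touch: -d takes a date string and sets the file times to it
		touch -d "$stamp" "$outdir/$bug.$field" || return 1
	done < "$dump"
}
```

Because a tab is a whitespace `IFS` character, empty fields collapse and multi-line values are not handled; a real export would need an escaping scheme for those.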

I'm not sure if the fields are all self-contained. For example, each update
is made by someone; is that someone part of the data, or is it "metadata"?
What about state transitions (open -> closed)?

The other alternative is to make the whole infrastructure available as
tarball, the mysql db et al, so that someone could slurp that down and
poke at it locally.

Any chance of this? I'd much appreciate it.
--
---
Larry McVoy lm at bitmover.com http://www.bitmover.com/lm


2003-01-01 19:58:48

by John Bradford

Subject: Re: Raw data from dedicated kernel bug database

> What are the chances that the raw data from the kernel bugdb could be
> made available? I bet Bradford wants it and I know I do.
                        ^^^^^^^^

I do have a first name, you know :-).

It would be nice to have some more bugs listed in my new database - at
the moment there is only one, but I was reluctant to spend time
manually filling it, when that time could be better spent improving
the system itself.

Importing the existing data into my database isn't going to
automatically give you all of its advantages, because its ability to
search via config options and track version information obviously
relies on that information being present, but it would be simple
enough for someone (me, if nobody else is interested) to add it once
the database is populated.

I'm working on adding more features at the moment, but if you've got
feedback, (positive or negative), please let me hear it - traffic to
it has been pretty high, but I'm not getting much mail about it.

Personally, I think the ability to upload your .config file, and have
it say, "OK, the following bugs are known to be triggered by those
options", and to have a colour-coded table of
working/broken/untested/can't test kernel versions for each bug could
potentially save us all loads of work, but if you disagree, just let
me know.
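The .config lookup described above could work roughly like the sketch below. The only real format here is the kernel .config itself, where enabled options appear as `CONFIG_FOO=y` or `CONFIG_FOO=m` and disabled ones as `# CONFIG_FOO is not set`; the map file of `bug-id CONFIG_OPTION` pairs and the function name `bugs_for_config` are hypothetical.

```shell
# bugs_for_config CONFIG MAP
# Sketch: print the ids of bugs whose triggering option is enabled in CONFIG.
# MAP is a hypothetical file with one "bug-id CONFIG_OPTION" pair per line.
bugs_for_config() {
	config=$1
	map=$2
	awk '
		FILENAME == ARGV[1] {
			# .config: remember options built in (=y) or modular (=m);
			# "# CONFIG_FOO is not set" lines have no "=" and are skipped
			if (split($0, kv, "=") == 2 && (kv[2] == "y" || kv[2] == "m"))
				enabled[kv[1]] = 1
			next
		}
		# map: print each bug once if its option is enabled
		($2 in enabled) && !seen[$1]++ { print $1 }
	' "$config" "$map"
}
```

A bug triggered only by a combination of options would need a richer map than one option per line.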

> To calm any fears that we are trying to take over the bugdb, we're not.
> We just want to track it. Any changes made in a BK bugdb are trivially
> exportable to an external format and if the need arises we'll work with
> IBM/OSDL to make that happen. In fact, we can automate it.

Let me know if I can add some kind of export function to my DB that
will help BK users, Larry^WMcVoy, and I'll consider it.

John.

2003-01-01 21:22:37

by Martin J. Bligh

Subject: Re: Raw data from dedicated kernel bug database

> What are the chances that the raw data from the kernel bugdb could be
> made available? I bet Bradford wants it and I know I do. We are
> working on an integrated bugdb for BitKeeper and it would be cool if
> we could track the real db at osdl.
>
> The advantage of having the data available is that for the BK kernel
> users we could give them access to the bugdb while they are doing
> checkins so that the developers link the changes to the bugs as they
> do the fixes, which is a good time to do it.
>
> To calm any fears that we are trying to take over the bugdb, we're not.
> We just want to track it. Any changes made in a BK bugdb are trivially
> exportable to an external format and if the need arises we'll work with
> IBM/OSDL to make that happen. In fact, we can automate it.

This needs some more thought about how this would work, and exactly what
we'd achieve by doing it. I think this is only going to work if it's a
2-way link, fully automated - I'm not keen on creating a situation
with two disjoint sets of data. Having access to bug data at checkin
time sounds useful to me, but not if that data doesn't get fed back up
to the main database.

Fully automated links would have to be done very carefully to avoid
screwing things up - having an agent going wild logging stuff at high
rates doesn't sound fun to me. If you only needed to edit some separate
BK field we added to the database for a reference, that would seem at
first thought to be a lot easier and safer.

I'm not sure how other people feel about exporting stuff into bitkeeper
type-licensed products ... if the non-BK people like Alan and Andrew, and
the other people who've done lots of the work in the DB like Dave and
Randy, and the OSDL are OK with it, then I'd be willing to help. If they object,
then probably not. Personally, I'm not opposed to using non-free software,
but I find the fact that BK licensing changed after it had been adopted
extremely unsettling ... the merge / diff-viewer tool looks cool though ;-)

M.

2003-01-01 22:06:45

by Larry McVoy

Subject: Re: Raw data from dedicated kernel bug database

> This needs some more thought about how this would work, and exactly what
> we'd achieve by doing it. I think this is only going to work if it's a
> 2-way link, fully automated - I'm not keen on creating a situation
> with two disjoint sets of data. Having access to bug data at checkin
> time sounds useful to me, but not if that data doesn't get fed back up
> to the main database.

That's fine with me, I had assumed that is what you wanted.

> I'm not sure how other people feel about exporting stuff into bitkeeper
> type-licensed products ... if the non-BK people like Alan and Andrew, and
> the other people who've done lots of the work in the DB like Dave and
> Randy,
> and the OSDL are OK with it, then I'd be willing to help. If they object,
> then probably not.

This raises the question of who owns the data in the bug database
hosted by OSDL/IBM. It seems questionable that you would even consider
restricting access to it. Not even SourceForge does that: you can ftp
the CVS tarballs, which they build nightly. If you are going to restrict
access to the data for political reasons I find it difficult to believe
that your database is going to gain any real traction.

We're not proposing to replace the OSDL bug database, we're proposing
to mirror it. Exactly the same way as people mirror the BK trees into
patches, CVS, SVN, whatever. Doesn't it seem strange to you that the
so-called non-free BK data is freely available but the free bug data
is not freely available? Sounds pretty unfree to me but I'll leave
it to the community to decide whether they want the data locked up in
politically correct tools.

Some clarification is probably in order. Does IBM or OSDL think they
have a copyright on the bug data? I don't remember any discussion of
that and http://www.osdl.org/legal/ip_policy.html seems to suggest that
the bug data is not considered OSDL IP. If you're not going to make
the data easily available it's not that hard to make a one way bridge
using wget but if you think that data is copyrighted I'm sure that I'm
not the only one who would like to know about that.

A bug database that only IBM can really use, at the DB level, seems
like a pretty unopen and unfree thing. If it's supposed to support the
open source developers shouldn't they all be able to get at the data?
In any way they find useful? What if Red Hat wants access to the SQL
data because it is too big to manage through web forms? Do they get it?
What about all the other companies? If the position is that you get to
pick and choose who gets at the real data then I can promise you someone
else will start hosting a database with far more open rules. I'll do
it if no one else does. The data should be freely accessible in the
most reasonable form. The GPL was very careful not to allow people to
pretend to conform by providing the source in anything but the most
useful form; it would seem that you should conform to those rules.
If you aren't going to, please tell us.

2003-01-02 00:24:47

by Martin J. Bligh

Subject: Re: Raw data from dedicated kernel bug database

>> I'm not sure how other people feel about exporting stuff into bitkeeper
>> type-licensed products ... if the non-BK people like Alan and Andrew, and
>> the other people who've done lots of the work in the DB like Dave and
>> Randy,
>> and the OSDL are OK with it, then I'd be willing to help. If they object,
>> then probably not.
>
> This raises the question of who owns the data in the bug database
> hosted by OSDL/IBM. It seems questionable that you would even consider
> restricting access to it. Not even sourceforge does that, you can ftp
> the CVS tarballs, they build them nightly. If you are going to restrict
> access to the data for political reasons I find it difficult to believe
> that your database is going to gain any real traction.

I didn't say I was going to restrict access to it. I said that, depending
on the feeling of the community, I would or would not help you automate
things. Moreover, I think I was reasonably clear that it wasn't really my
own opinion on the subject I was going on, but that of the community.

Who owns the data is indeed an interesting question. I'd say it's owned
by the community, but that's not very helpful in practice. If we have to
get all explicit about things, it should probably fall under some GPL type
license. My first gut-feeling impression of a general principle is that
if you copy the data and make changes to it, you should have to make
those changes available back to people (without using non-GPL (or
similar) tools). But that's not explicitly stated at the moment - if it
needs to be, then I guess I'll try to get some consensus on exactly what
people want.

> We're not proposing to replace the OSDL bug database, we're proposing
> to mirror it. Exactly the same way as people mirror the BK trees into
> patches, CVS, SVN, whatever. Doesn't it seem strange to you that the
> so-called non-free BK data is freely available but the free bug data
> is not freely available? Sounds pretty unfree to me but I'll leave
> it to the community to decide whether they want the data locked up in
> politically correct tools.

My objection is more pragmatic actually - I don't want to do anything
that pisses off the people who won't touch BK enough that they feel
the need to create their own database, and split the effort in twain.
As I said before, I don't want to end up with multiple databases with
distinct sets of data in them ... I'll do what I can to avoid that.

I thought the data in BK was always free; I never personally called
that into question - the minute that is no longer true, I think we
have a much larger problem on our hands. In reply to the "sounds pretty
unfree to me" comment, you could apply much the same argument to the
GPL - you can't copy that code into anywhere you please and do anything
you like with it. But ultimately I agree with your final sentence, it's
up to the community.

> Some clarification is probably in order. Does IBM or OSDL think they
> have a copyright on the bug data? I don't remember any discussion of
> that and http://www.osdl.org/legal/ip_policy.html seems to suggest that
> the bug data is not considered OSDL IP. If you're not going to make
> the data easily available it's not that hard to make a one way bridge
> using wget but if you think that data is copyrighted I'm sure that I'm
> not the only one who would like to know about that.

I don't know of any copyright claims on the current data, nor would I
like to see one that restricted future data to either IBM and/or OSDL.

> A bug database that only IBM can really use, at the DB level, seems
> like a pretty unopen and unfree thing. If it's supposed to support the
> open source developers shouldn't they all be able to get at the data?
> In any way they find useful? What if Red Hat wants access to the SQL
> data because it is too big to manage through web forms? Do they get it?
> What about all the other companies? If the position is that you get to
> pick and choose who gets at the real data then I can promise you someone
> else will start hosting a database with far more open rules. I'll do
> it if noone else does. The data should be freely accessible in the
> most reasonable form. The GPL was very careful to not allow people to
> pretend to conform by providing the source in anything but the most
> useful form, it would seem that you should conform to those rules.
> If you aren't going to, please tell us.

Errm ... now you're getting way off base. For one, this has very little,
if anything at all, to do with IBM. Anything I've posted thus far on
this topic is my own personal opinion, and I've repeatedly made it clear
that I'm looking for community opinion on the matter. This is a community
project, not an IBM database, and I've made that very clear to people
inside as well as outside of IBM. If other people hate the idea of
exporting the data they filed, processed and managed to BK, then I'm
not going to spend my time helping you.

Given there's no real licensing on the current data, I'm not going to
stop you doing a wget of the current data. If the community seems to
want it, I'll look at putting some sort of license agreement on future
data. I don't feel desperately obliged to export things for you in a
form that happens to be more convenient to you than the access everybody
else has, but if people in the community that I respect tell me they think
I should, I'll probably try.

However, before everyone summons a gaggle of lawyers to start bickering
about details of all this ... can we see if we can just work out how to
get along just fine? Assuming people don't object massively to you
pulling the data into BK, what added value can we get from that integration,
and how are we going to make it available back to the community? What data
are you looking at appending to each bug, and at what stage in the process?
Do you want to add information when a (supposed) fix is checked in? Anything
else?

M.

2003-01-02 01:12:39

by Alan

Subject: Re: Raw data from dedicated kernel bug database

On Wed, 2003-01-01 at 22:15, Larry McVoy wrote:
> > I'm not sure how other people feel about exporting stuff into bitkeeper
> > type-licensed products ... if the non-BK people like Alan and Andrew, and
> > the other people who've done lots of the work in the DB like Dave and
> > Randy,

I don't care. I care that people have the ability to take the data and
do clever stuff with it. I don't care what tools they use so long as
they can choose what tools they use.


2003-01-02 01:33:58

by Timothy D. Witham

Subject: Re: Raw data from dedicated kernel bug database

The data is there for everybody. As long as we can automate the
extraction I don't see any issue with multiple people extracting
and using it with other tools. Data and manure only work if you
can spread it around.

My opinion is that the more uses of the data the better. So
the question is, "What does Larry need to make this happen?".


Tim

--
Timothy D. Witham - Lab Director - [email protected]
Open Source Development Lab Inc - A non-profit corporation
15275 SW Koll Parkway - Suite H - Beaverton OR, 97006
(503)-626-2455 x11 (office) (503)-702-2871 (cell)
(503)-626-2436 (fax)

2003-01-02 02:08:17

by Larry McVoy

Subject: Re: Raw data from dedicated kernel bug database

Thanks Tim!

On Wed, Jan 01, 2003 at 04:38:58PM -0800, Timothy D. Witham wrote:
> The data is there for everybody. As long as we can automate the
> extraction I don't see any issue with multiple people extracting
> and using with other tools. Data and manure only work if you
> can spread it around.

That is a great quote, mind if I stick it on my quotes page?

> My opinion is that the more uses of the data the better. So
> the question is, "What does Larry need to make this happen?".

Since I'm the one asking you to do something for me, if your guys are
too busy to figure out how to do this, how about they give me a snapshot
of the DBs? I'll get one of my guys to tinker with it enough to get the
data out, and then we'll provide a script to do this on an ongoing basis.
Then you could run

cd /home/bugme
make export

out of cron and it would serve up a tarball that anyone could eat.
Anyone else who is interested in the data can contact me with their
desired export format and I'll merge-sort the requests. If nobody
cares, then what I'd create is a directory tree that looks like:

bugdb/
    bugs/
        MM-YYYY/
            bug1.field1
            bug1.field2
            ...
            bug1.fieldN
            bug2.field1
            bug2.field2
            ...
            bug2.fieldN
            ...
    users/
        user1.field1
        ...
        user1.fieldN
        user2.field1
        ...
        user2.fieldN
        ...

In other words, a zillion little files, a cluster of files per bugid,
with each file in the cluster representing a field in the bug. That
way there are no parse/unparse issues (if we used XML then we'd need to
unXML it to get it into some other DB). Each MM-YYYY directory is
used to store all bugs created in that month (so we don't end up with
one directory with 10 million files in it).
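One way to compute where a field file lands in the tree above, as a sketch: `bug_field_path` is a hypothetical helper, it assumes GNU `date -d` to parse the creation date, and it takes the `bug<ID>` file prefix literally from the example tree (the message does not say whether that prefix is part of the name).

```shell
# bug_field_path ROOT BUG-ID CREATED FIELD
# Print ROOT/bugs/MM-YYYY/bug<ID>.<FIELD>, where MM-YYYY is the month the
# bug was created.  CREATED is any date string GNU date -d understands.
bug_field_path() {
	root=$1 id=$2 created=$3 field=$4
	month=$(date -d "$created" '+%m-%Y') || return 1
	printf '%s/bugs/%s/bug%s.%s\n' "$root" "$month" "$id" "$field"
}
```

Bucketing by creation month means a bug's files never move between directories, since a bug's creation date does not change.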

It wastes tons of space because there will be zillions of these files
but it's a tarball and it's only for import/export. And it has to be
the most neutral format.
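A hypothetical `make export` target run from cron might finish with something like the sketch below. The function name, the dated tarball name and the crontab line are assumptions; the point is to write the tarball under a temporary name and rename it into place so nobody downloads a half-written file.

```shell
# export_tarball SRCDIR DESTDIR
# Pack SRCDIR into DESTDIR/bugdb-YYYYMMDD.tar.gz and print its path.
# The archive is written under a temporary name and then renamed, so a
# download never sees a half-written file (rename is atomic within one
# filesystem, and both names are in DESTDIR).
export_tarball() {
	src=$1 dest=$2
	name=bugdb-$(date '+%Y%m%d').tar.gz
	partial="$dest/.$name.tmp"
	tar -czf "$partial" -C "$(dirname "$src")" "$(basename "$src")" ||
		{ rm -f "$partial"; return 1; }
	mv "$partial" "$dest/$name" || return 1
	printf '%s\n' "$dest/$name"
}

# Hypothetical crontab entry running the export nightly at 03:00:
# 0 3 * * * cd /home/bugme && make export
```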

How's that sound?

2003-01-02 02:31:11

by Martin J. Bligh

Subject: Re: Raw data from dedicated kernel bug database

Alan:
> I don't care. I care that people have the ability to take the data and
> do clever stuff with it. I don't care what tools they use so long as
> they can choose what tools they use.

Tim:
> The data is there for everybody. As long as we can automate the
> extraction I don't see any issue with multiple people extracting
> and using with other tools. Data and manure only work if you
> can spread it around.

OK - cool. Sounds like people are happy ;-)

Larry, can I presume that you'll reciprocate, and export whatever you
do to the data in BK in some argument-free format (probably the same
one we export to you)? That's what I was getting at by talking about
GPL style licenses ... perhaps not particularly coherently ;-)

I think the concerns I had about tools going wild are actually fairly
easy to resolve by making it a pull-pull interchange ... don't know
why I was thinking of push models.
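In a pull-pull interchange each side periodically fetches the other's published export and decides for itself whether to import it, so the puller controls the rate. A minimal sketch of one side, assuming the export has already been fetched to a local file (for example with `wget`) and that the importer is a hypothetical command passed in by name:

```shell
# pull_and_import EXPORT-FILE STATE-FILE IMPORT-CMD
# EXPORT-FILE is the other side's export, already fetched locally;
# STATE-FILE remembers the checksum of the last export imported;
# IMPORT-CMD is a hypothetical importer command.
# Prints "imported" or "unchanged"; returns non-zero on failure.
pull_and_import() {
	export_file=$1 state=$2 import_cmd=$3
	new=$(cksum < "$export_file") || return 1
	old=$(cat "$state" 2>/dev/null)
	if [ "$new" = "$old" ]; then
		echo unchanged
		return 0
	fi
	# Record the checksum only after the import succeeds, so a failed
	# import is retried on the next pull.
	"$import_cmd" "$export_file" || return 1
	printf '%s\n' "$new" > "$state" || return 1
	echo imported
}
```

Running this from each side's own cron, at whatever interval that side chooses, keeps either side from flooding the other.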

Thanks,

M.



2003-01-02 02:47:47

by Larry McVoy

Subject: Re: Raw data from dedicated kernel bug database

> Larry, can I presume that you'll reciprocate, and export whatever you
> do to the data in BK in some argument-free format (probably the same
> one we export to you)?

Yup. A BK database is actually a BK repository with an SQL layer on
top of it. So all of the stuff you can do with BK you can do with
BK/Database. We can export changes as patches, as flat files, as
associative arrays in perl, take your pick.
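This is not how BK/Database produces its patches; it is only a sketch showing that, with the one-file-per-field layout proposed earlier in the thread, the changes between two exported snapshots are an ordinary recursive unified diff. `snapshot_patch` is a hypothetical name.

```shell
# snapshot_patch OLD-TREE NEW-TREE
# Print a unified diff between two exported snapshots of the bug tree.
# diff exits 0 (no changes) or 1 (changes found); only 2 means trouble.
snapshot_patch() {
	diff -ruN "$1" "$2"
	[ $? -le 1 ]
}
```

Applying the result to another copy of the old tree is then a job for the standard `patch` tool.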

> I think the concerns I had about tools going wild are actually fairly
> easy to resolve by making it a pull-pull interchange ... don't know
> why I was thinking of push models.

Cool. I've already tracked down an SQL hacker who is willing to contract
with us to write the scripts to get the data out of your Bugzilla database.
He said that I need to ask you to do this:

shut down the mysql database
grab all the MySQL files and stuff them in a tarball
turn on the mysql database again

Then he can set up a mysql instance here and start hacking on the scripts.
How's that sound?
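The three steps above, as a sketch. The commands that stop and start MySQL vary by installation, so they are left as assumptions in `MYSQL_STOP` and `MYSQL_START` (for example `mysqladmin shutdown` and the distribution's init script); the data directory is whatever that server uses, and `cold_snapshot` is a hypothetical name.

```shell
# cold_snapshot DATADIR TARBALL
# Stop MySQL, tar up its data directory, then start MySQL again.
# MYSQL_STOP and MYSQL_START are assumptions: set them to the commands that
# stop and start the server on that machine.
cold_snapshot() {
	datadir=$1 tarball=$2
	if [ -z "${MYSQL_STOP:-}" ] || [ -z "${MYSQL_START:-}" ]; then
		echo "cold_snapshot: set MYSQL_STOP and MYSQL_START" >&2
		return 1
	fi
	sh -c "$MYSQL_STOP" || return 1
	tar -czf "$tarball" -C "$(dirname "$datadir")" "$(basename "$datadir")"
	status=$?
	# Restart even if tar failed, so the bug database is never left down.
	sh -c "$MYSQL_START" || status=1
	return $status
}
```

Copying the raw data files is only safe while the server is stopped, which is why the stop comes first; `mysqldump` is the usual alternative that writes out the data as SQL without shutting the server down.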

2003-01-02 05:04:18

by Martin J. Bligh

Subject: Re: Raw data from dedicated kernel bug database

> Yup. A BK database is actually a BK repostory with an SQL layer on
> top of it. So all of the stuff you can do with BK you can do with
> BK/Database. We can export changes as patches, as flat files, as
> associative arrays in perl, take your pick.

OK, something like that sounds good to me.

> Cool. I've already tracked down an SQL hacker who is willing to contract
> with us to write the scripts to get the data out of your Bugzilla
> database. He said that I need to ask you to do this:
>
> shut down the mysql database
> grab all the MySQL files and stuff them in a tarball
> turn on the mysql database again
>
> Then he can set up a mysql instance here and start hacking on the scripts.
> How's that sound?

I'll leave the details to the database guys at OSDL, but I presume they
do backups in a similar fashion already, so ...

M.

2003-01-02 17:10:47

by Timothy D. Witham

Subject: Re: Raw data from dedicated kernel bug database

Sorry for the slow response I'm on vacation and sharing
the computer with 4 teenagers who are all addicted
to IRC. But I think that Martin can get this done.

Martin if you need help from my folks please contact
them and get this done.

Tim
