2013-04-12 20:22:15

by D M German

Subject: helping with tracking commits across repos


Hi Everybody,

I am a professor of computer science at the University of Victoria
(Canada).

During the last year and a half, we have been trying to track
commits as they move through the entire ecosystem of Linux git
repositories. We have amassed a good amount of data that tell us, for
every commit (and in fact for every unique patch inside a commit), where
it has been and whether it has reached Linus's repository or not (or any
other repository, for that matter).

Please look at the following URLs:

http://o.cs.uvic.ca:20810/perl/cid.pl?cid=9753dfe19a85e7e45a34a56f4cb2048bb4f50e27

http://o.cs.uvic.ca:20810/perl/cid.pl?cid=55345fb9ff68e2e5c0259c814542e72aec972c02

or

http://o.cs.uvic.ca:20810/perl/cid.pl?cid=e59bcdae87ec116dde25da6d725f79fefb253693

They will give you an idea of the data we have. You can also track other
commits using the input box, if you are interested.

I wonder if this information is of use to any of you. If you have
specific needs on how you think this info (and some more we have) can be
of use, please let me know.

By the way, I'll be at the Linux Collaboration Summit next week (I am
involved with the development of SPDX). If any of you are interested in
meeting, please let me know.


--daniel german
[email protected]
http://turingmachine.org


2013-04-12 20:31:51

by Greg Kroah-Hartman

Subject: Re: helping with tracking commits across repos

On Fri, Apr 12, 2013 at 01:22:09PM -0700, D M German wrote:
>
> Hi Everybody,
>
> I am professor of computer science at the University of Victoria
> (Canada).
>
> During the last year and a half, we have been trying to track the
> commits as they move in the entire linux git repos ecosystem. We have
> amassed a good amount of data that tell us for every commit (and in fact
> for every unique patch inside a commit) where it has been and whether it
> has reached linus or not ---or any other repository, as a matter of
> fact.
>
> please look at the following URLs:
>
> http://o.cs.uvic.ca:20810/perl/cid.pl?cid=9753dfe19a85e7e45a34a56f4cb2048bb4f50e27
>
> http://o.cs.uvic.ca:20810/perl/cid.pl?cid=55345fb9ff68e2e5c0259c814542e72aec972c02
>
> or
>
> http://o.cs.uvic.ca:20810/perl/cid.pl?cid=e59bcdae87ec116dde25da6d725f79fefb253693
>
> it will give you an idea of the data we have. You can also track other
> commits using the input box, if you are interested.

Very interesting, thanks for the links.

> I wonder if this information is of use to any of you. If you have
> specific needs on how you think this info (and some more we have) can be
> of use, please let me know.

For stable releases, I can't think of anything, other than the tracking
of commits from a stable tree into a distro tree, as you do show
happening above into the openSUSE kernel, which is really nice.

But, for the linux-next stuff, that could be very interesting. We
always like seeing what commits in a -rc1 release did NOT previously
show up in linux-next. Stephen has some tools to do this; it would be
interesting to see if your tools could do something like that to track
down the "rogue" commits that don't get community testing.

> By the way, I'll be at the Linux Collaboration Summit next week (I am
> involved with the development of SPDX). If any of you is interested to
> meet, please let me know,

I'll be there as well, if you want to talk about this in person, just
grab me if you see me around.

thanks,

greg k-h

2013-04-13 16:48:41

by Vinod Koul

Subject: Re: helping with tracking commits across repos

On Fri, 2013-04-12 at 13:22 -0700, D M German wrote:
> Hi Everybody,
>
> I am professor of computer science at the University of Victoria
> (Canada).
>
> During the last year and a half, we have been trying to track the
> commits as they move in the entire linux git repos ecosystem. We have
> amassed a good amount of data that tell us for every commit (and in fact
> for every unique patch inside a commit) where it has been and whether it
> has reached linus or not ---or any other repository, as a matter of
> fact.
I see some of the commits shown as not in Linus's tree, although they
are... perhaps a bug?
http://o.cs.uvic.ca:20810/perl/cid.pl?cid=765024697807ad1e1cac332aa891253ca4a339da

It shows the same for Linus's merge!
http://o.cs.uvic.ca:20810/perl/cid.pl?cid=cfb63bafdb87bbcdc5d6dbbca623d3f69475f118

--
Vinod Koul
Intel Corp.

2013-04-13 18:01:28

by D M German

Subject: Re: helping with tracking commits across repos

vinod>
vinod>
vinod>
vinod> On Fri, 2013-04-12 at 13:22 -0700, D M German wrote:
vinod> > Hi Everybody,
vinod> >
vinod> > I am professor of computer science at the University of Victoria
vinod> > (Canada).
vinod> >
vinod> > During the last year and a half, we have been trying to track the
vinod> > commits as they move in the entire linux git repos ecosystem. We have
vinod> > amassed a good amount of data that tell us for every commit (and in fact
vinod> > for every unique patch inside a commit) where it has been and whether it
vinod> > has reached linus or not ---or any other repository, as a matter of
vinod> > fact.
vinod> i see some of the commits shown not in linus tree, although they are...
vinod> perhaps a bug?
vinod> http://o.cs.uvic.ca:20810/perl/cid.pl?cid=765024697807ad1e1cac332aa891253ca4a339da
vinod>
vinod> It shows the same for linus's merge!
vinod> http://o.cs.uvic.ca:20810/perl/cid.pl?cid=cfb63bafdb87bbcdc5d6dbbca623d3f69475f118

Hi Vinod,

The tracking of the path-to-Linus is not done automatically yet (I have
to start the process manually, as there are some issues I need to
verify; it is a heuristic), but I plan to run it automatically.

Even then, it might run only once a day, so the commits of the day will
always be slightly behind.

One thing that would help me: if any of you feel I am not tracking
your repository, please send me an email with its address.

thank you!

--daniel


--
Daniel M. German "Don't try to be like Jackie.
There is only one Jackie...
Jackie Chan -> Study computers instead"
http://turingmachine.org/
http://silvernegative.com/
dmg (at) uvic (dot) ca
replace (at) with @ and (dot) with .

2013-04-13 18:57:01

by D M German

Subject: Re: helping with tracking commits across repos

D M German twisted the bytes to say:

dmg> One thing that will help me is that if any of you feel I am not tracking
dmg> your repository, please send me an email with its address.

dmg> thank you!

dmg> --daniel

I have now listed all the repositories I am tracking:

http://o.cs.uvic.ca:20810/perl/repos.pl

If your repo is not in the list, please let me know. I spend a fair
amount of time tracking them down, so that would really help.

Thanks!



--
Daniel M. German "Mathematics belong to God."
Donald Knuth
http://turingmachine.org/
http://silvernegative.com/
dmg (at) uvic (dot) ca
replace (at) with @ and (dot) with .

2013-04-14 04:01:49

by Ben Hutchings

Subject: Re: helping with tracking commits across repos

I notice that where a commit is cherry-picked cleanly on a stable
branch, like 6b90466cfec2a2fe027187d675d8d14217c12d82, your script finds
the corresponding commit on the stable branch. This is useful.

But where some backporting changes are needed, such as for
f01fc1a82c2ee68726b400fadb156bd623b5f2f1, which became
8ebfe28181b02766ac41d9d841801c146e6161c1 on the 3.2.y branch, the
corresponding commit isn't found.

It should be possible to find such backported commits based on a simple
regex search over the commit message:

    my $upstream;
    for (<$body>) {
        # Each line still ends in "\n"; $1 is the referenced mainline hash.
        if (/^commit (.*) upstream\.\n/) {
            $upstream = $1;
        } elsif (/^\[ Upstream commit (.*) \]\n/) {
            $upstream = $1;
        } elsif (/^\(cherry picked from commit (.*)\)\n/) {
            $upstream = $1;
        }
    }

This covers all formats in current use to show a direct correspondence
between a single mainline and stable branch commit. (Really we should
settle on just one format...)

Ben.

--
Ben Hutchings
It is impossible to make anything foolproof because fools are so ingenious.



2013-04-15 18:21:35

by D M German

Subject: Re: helping with tracking commits across repos

Hi Ben,

On 4/13/13, Ben Hutchings <[email protected]> wrote:
> I notice that where a commit is cherry-picked cleanly on a stable
> branch, like 6b90466cfec2a2fe027187d675d8d14217c12d82, your script finds
> the corresponding commit on the stable branch. This is useful.
>
> But where some backporting changes are needed, such as for
> f01fc1a82c2ee68726b400fadb156bd623b5f2f1, which became
> 8ebfe28181b02766ac41d9d841801c146e6161c1 on the 3.2.y branch, the
> corresponding commit isn't found.
>
> It should be possible to find such backported commits based on a simple
> regex search over the commit message:

I took a more aggressive approach. I decided to look for 40-character
hashes in the commit message and the patch of each commit.

If the hash is in the database of commits I maintain, then I display
it. If not, it is ignored.
You can revisit the output of that commit:

http://o.cs.uvic.ca:20810/perl/cid2.pl?cid=8ebfe28181b02766ac41d9d841801c146e6161c1

This means I am able to show both the commits that reference a given
commit and the commits that it references.

Hopefully this solves the use case you described.
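
As an illustration only (this is not the script behind the pages above),
the hash lookup described here could be sketched in Python roughly as
follows. The function name, the `known_commits` set standing in for the
commit database, and the choice to match only lowercase hex are all
assumptions made for the example.

```python
import re

# A full 40-character lowercase hex string that is not part of a longer hex run.
HASH_RE = re.compile(r"(?<![0-9a-f])[0-9a-f]{40}(?![0-9a-f])")


def referenced_commits(commit_id, text, known_commits):
    """Return the known commit ids mentioned in `text`.

    `text` is the commit message plus the patch; `known_commits` is a
    hypothetical stand-in for the database of tracked commits. Hashes that
    are not in the database are ignored, as is the commit's own id.
    Results keep the order of first appearance.
    """
    found = []
    for candidate in HASH_RE.findall(text):
        if candidate == commit_id:
            continue
        if candidate in known_commits and candidate not in found:
            found.append(candidate)
    return found


def referenced_by(commits, known_commits):
    """Invert the relation: map each referenced id to the ids that mention it.

    `commits` maps a commit id to its message-plus-patch text.
    """
    index = {}
    for cid, text in commits.items():
        for target in referenced_commits(cid, text, known_commits):
            index.setdefault(target, []).append(cid)
    return index
```

Scanning the patch as well as the message can pick up hashes that merely
appear in file contents, which is why filtering against the set of known
commits matters in this sketch.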

The commit logs are being scanned as I write this. It should be done
in a couple of hours.

--dmg

---
Daniel M. German
http://turingmachine.org

2013-04-15 18:39:14

by Ben Hutchings

Subject: Re: helping with tracking commits across repos

On Mon, Apr 15, 2013 at 11:21:32AM -0700, dmg wrote:
> Hi Ben,
>
> On 4/13/13, Ben Hutchings <[email protected]> wrote:
[...]
> > It should be possible to find such backported commits based on a simple
> > regex search over the commit message:
>
> I took a more aggressive approach. I decided to look for 40 length
> character hashes in the
> comment and patch of the commit.
[...]
> hopefully this solves the use-case you described.

That's great, thank you.

> The commit logs are being scanned as I write this. It should be done
> in a couple of hours.

By the way, I noticed that you're not HTML-escaping the commit
header.
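
For illustration, a minimal Python sketch of what escaping the header
fields before output could look like. The function name and the HTML
layout are made up for the example; only `html.escape` is a real
standard-library call.

```python
from html import escape


def render_commit_header(author, subject):
    """Render two commit header fields as HTML, escaping markup characters.

    Author fields typically contain "<" and ">" around the email address,
    so inserting them unescaped would break the page.
    """
    return "<p><b>{}</b>: {}</p>".format(escape(author), escape(subject))
```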

Ben.

--
Ben Hutchings
We get into the habit of living before acquiring the habit of thinking.
- Albert Camus

2013-04-15 21:49:41

by D M German

Subject: Re: helping with tracking commits across repos



Greg KH twisted the bytes to say:

Greg> But, for the linux-next stuff, that could be very interesting. We
Greg> always like seeing what commits in a -rc1 release did NOT previously
Greg> show up in linux-next. Stephen has some tools on how to do this, it
Greg> would be interesting to see if your tools could do something like that
Greg> to track down the "rouge" commits that don't get community testing.


Hi Greg,

Let me see if I understood you.

I looked into commits created during 2013 that satisfy the following
conditions:

* We observed them in Linus's repo _before_ linux-next (that does not
necessarily mean they didn't appear in linux-next before Linus's repo)
* They are not merge commits
* They were committed in 2013 (no point in showing you 2012, I guess).

They are listed here (it takes a couple of minutes to create the list,
so it is not a realtime list, but that can be fixed if it is useful):

http://o.cs.uvic.ca:20810/perl/next.pl
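
The selection criteria above can be sketched in Python as follows. This
is only an illustration under assumed data: the `CommitRecord` fields
(first-observation timestamps per repository and a parent count) are
hypothetical and not taken from the actual database.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class CommitRecord:
    cid: str
    committed: datetime                    # commit date
    parent_count: int                      # more than 1 means a merge commit
    first_seen_linus: Optional[datetime]   # None if never observed there
    first_seen_next: Optional[datetime]    # None if never observed there


def seen_in_linus_before_next(records, year=2013):
    """Return ids of non-merge commits from `year` that were observed in
    Linus's repo before they were observed in linux-next (or that were
    never observed in linux-next at all)."""
    selected = []
    for r in records:
        if r.first_seen_linus is None:
            continue  # has not reached Linus's repo
        if r.parent_count > 1:
            continue  # merge commit
        if r.committed.year != year:
            continue
        if r.first_seen_next is None or r.first_seen_linus < r.first_seen_next:
            selected.append(r.cid)
    return selected
```

Because the comparison uses observation times, a commit can be selected
simply because linux-next was polled later than Linus's repo.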

Is this what you had in mind? Obviously Linus's commits appear in his
repo before linux-next, so I could drop him from the report.

I have also added the commit that merges each commit, which is probably
useful too. If it is empty, either we haven't updated the data or the
commit was made straight into Linus's repo (as in
3e2e0d2c222bdf5bafd722dec1618fa6073ef372).

--daniel


--
Daniel M. German
http://turingmachine.org/
http://silvernegative.com/
dmg (at) uvic (dot) ca
replace (at) with @ and (dot) with .

2013-04-15 23:09:34

by Greg KH

Subject: Re: helping with tracking commits across repos

On Mon, Apr 15, 2013 at 02:49:34PM -0700, D M German wrote:
>
>
> Greg KH twisted the bytes to say:
>
> Greg> But, for the linux-next stuff, that could be very interesting. We
> Greg> always like seeing what commits in a -rc1 release did NOT previously
> Greg> show up in linux-next. Stephen has some tools on how to do this, it
> Greg> would be interesting to see if your tools could do something like that
> Greg> to track down the "rouge" commits that don't get community testing.
>
>
> Hi Greg,
>
> Let me see if I understood you.
>
> I looked into commits created during 2013 that satisfy the following
> condition:
>
> * We observed them in Linus repo _before_ Linux-next (that does not
> necessarily mean they didn't appear in linus before linux next)
> * Are not merge commits
> * Were committed in 2013 (no point of showing you 2012, i guess).
>
> they are listed here (it takes a couple of minutes to create the list... so
> it is not a realtime list, but I that can be fixed
> if it is useful):
>
> http://o.cs.uvic.ca:20810/perl/next.pl

Yes, that's a great thing. Maybe the ability to see the subject: line
of the commit somewhere easier than having to click through to the patch
would be nice, so we can just glance at the report and say, "Look at all
of the btrfs patches that showed up out of nowhere, what happened?"

Oh, and if you could do it for a specific kernel release, not a date
range, that would be nice (i.e. report for 3.9-rc1, 3.8-rc1, 3.7-rc1,
etc.)

> is this what you had in mind? obviously Linus commits appear in his repo
> before Next, so I could drop him from the report.

That's just a tiny number so it's probably not needed.

> I have also added the commit that merges each commit, which is probably
> useful too. If it is empty either we haven't update the data or it
> was done straight into linus repo (as in 3e2e0d2c222bdf5bafd722dec1618fa6073ef372).

Yes, that is useful, thanks.

greg k-h

2013-04-16 00:13:51

by D M German

Subject: Re: helping with tracking commits across repos



Greg KH twisted the bytes to say:

>> http://o.cs.uvic.ca:20810/perl/next.pl

Greg> Yes, that's a great thing. Maybe the ability to see the subject: line
Greg> of the commit somewhere easier than having to click through to the patch
Greg> would be nice, so we can just glance at the report and say, "Look at all
Greg> of the btrfs patches that showed up out of nowhere, what happened?"

Greg> Oh, and if you could do it for a specific kernel release, not a date
Greg> range, that would be nice (i.e. report for 3.9-rc1, 3.8-rc1, 3.7-rc1,
Greg> etc.)

What would be the simplest approach to getting the date? I suspect that
it can be done with some command-line magic in Linus's git repo.

>> is this what you had in mind? obviously Linus commits appear in his repo
>> before Next, so I could drop him from the report.

Greg> That's just a tiny number so it's probably not needed.

>> I have also added the commit that merges each commit, which is probably
>> useful too. If it is empty either we haven't update the data or it
>> was done straight into linus repo (as in 3e2e0d2c222bdf5bafd722dec1618fa6073ef372).

It's not difficult to do. I'll take care of it.

--dmg


--
Daniel M. German "Trying is the first step
Homer Simpson -> towards failure."
http://turingmachine.org/
http://silvernegative.com/
dmg (at) uvic (dot) ca
replace (at) with @ and (dot) with .

2013-04-16 12:50:57

by Luis Henriques

Subject: Re: helping with tracking commits across repos

Hi Daniel,

On Sat, Apr 13, 2013 at 11:01:24AM -0700, D M German wrote:
> vinod>
> vinod>
> vinod>
> vinod> On Fri, 2013-04-12 at 13:22 -0700, D M German wrote:
> vinod> > Hi Everybody,
> vinod> >
> vinod> > I am professor of computer science at the University of Victoria
> vinod> > (Canada).
> vinod> >
> vinod> > During the last year and a half, we have been trying to track the
> vinod> > commits as they move in the entire linux git repos ecosystem. We have
> vinod> > amassed a good amount of data that tell us for every commit (and in fact
> vinod> > for every unique patch inside a commit) where it has been and whether it
> vinod> > has reached linus or not ---or any other repository, as a matter of
> vinod> > fact.
> vinod> i see some of the commits shown not in linus tree, although they are...
> vinod> perhaps a bug?
> vinod> http://o.cs.uvic.ca:20810/perl/cid.pl?cid=765024697807ad1e1cac332aa891253ca4a339da
> vinod>
> vinod> It shows the same for linus's merge!
> vinod> http://o.cs.uvic.ca:20810/perl/cid.pl?cid=cfb63bafdb87bbcdc5d6dbbca623d3f69475f118
>
> Hi Vinod,
>
> the tracking of the path-to-linus is something that is not done
> automatically yet (I have to start the process manually, as there are
> some issues I need to verify--it is a heuristic), but I plan to run it
> automatically.
>
> Nonetheless, it might be run once a day, so the commits of the day will
> always be slightly behind.
>
> One thing that will help me is that if any of you feel I am not tracking
> your repository, please send me an email with its address.

While looking at the repos list, I realised you are tracking some Ubuntu
git trees that are not actually useful, namely:

git://kernel.ubuntu.com/roc/linux-2.6
git://kernel.ubuntu.com/rtg/net-next

I would like to ask if you could remove these two and add instead the
linux-3.5.y branch in the git://kernel.ubuntu.com/ubuntu/linux.git repo
(I'm not sure if you track the branches separately).

Cheers,
--
Luis

2013-04-16 16:33:17

by Greg KH

Subject: Re: helping with tracking commits across repos

On Mon, Apr 15, 2013 at 05:13:45PM -0700, D M German wrote:
>
>
> Greg KH twisted the bytes to say:
>
> >> http://o.cs.uvic.ca:20810/perl/next.pl
>
> Greg> Yes, that's a great thing. Maybe the ability to see the subject: line
> Greg> of the commit somewhere easier than having to click through to the patch
> Greg> would be nice, so we can just glance at the report and say, "Look at all
> Greg> of the btrfs patches that showed up out of nowhere, what happened?"
>
> Greg> Oh, and if you could do it for a specific kernel release, not a date
> Greg> range, that would be nice (i.e. report for 3.9-rc1, 3.8-rc1, 3.7-rc1,
> Greg> etc.)
>
> What would be the simplest approach to getting the date? I suspect that
> it can be done by doing some command line magic in Linus git repo.

You want to look at the commits from the last major release (e.g. 3.8)
to the -rc1 release (e.g. 3.9-rc1). You can't look at the dates; they
are not going to reflect when the patch landed in Linus's branch.

So a simple:
	git rev-list --no-merges v3.8..v3.9-rc1
will give you the commits needed.
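
As a rough sketch of how that command could feed the report, the Python
below wraps the same `git rev-list` invocation and then subtracts the
commits already seen in linux-next. The function names and the
`seen_in_next` collection are assumptions for illustration; matching on
commit ids alone would miss patches that were rebased before being
merged.

```python
import subprocess


def rev_list_args(previous_release, rc1):
    """Build the argv for the command shown above."""
    return ["git", "rev-list", "--no-merges", f"{previous_release}..{rc1}"]


def commits_in_range(repo_path, previous_release, rc1):
    """Run git rev-list inside `repo_path` and return the commit ids."""
    result = subprocess.run(
        rev_list_args(previous_release, rc1),
        cwd=repo_path,
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout.split()


def not_seen_in_next(release_commits, seen_in_next):
    """Keep only the commits that never showed up in linux-next.

    `seen_in_next` is a hypothetical collection of commit ids recorded
    from linux-next snapshots.
    """
    seen = set(seen_in_next)
    return [c for c in release_commits if c not in seen]
```

For example, `commits_in_range("/path/to/linus.git", "v3.8", "v3.9-rc1")`
would list the non-merge commits for the 3.9-rc1 report, where the path
is a placeholder.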

As for storage issues, look at the -s option to 'git clone'. If you
read the man page, it talks about the .git/objects/info/alternates file
you can use to save on diskspace.

Use Linus's repo as a full clone, and then every other git repo you pull
from should have an alternates file pointing to Linus's tree. That way,
only the commits that are not in Linus's tree will be stored in that
repo's directory; the shared ones will not be duplicated.

You can do this today on your existing repos by creating the alternates
file, and then doing 'git gc' on the repo to clean out the duplicates.
That should save you a _lot_ of storage space.
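
A minimal Python sketch of creating that alternates file for an existing
bare repo is shown below. The function name and the paths are
hypothetical; the file location (objects/info/alternates inside the git
directory) and its content (the path to the other repository's objects
directory) follow the git documentation.

```python
import os


def point_at_reference(repo_git_dir, reference_git_dir):
    """Make `repo_git_dir` borrow objects from `reference_git_dir`.

    Both arguments are git directories (the repo itself for a bare repo,
    or its .git directory otherwise). Any existing alternates file is
    overwritten. Returns the path of the file written.
    """
    reference_objects = os.path.abspath(
        os.path.join(reference_git_dir, "objects")
    )
    info_dir = os.path.join(repo_git_dir, "objects", "info")
    os.makedirs(info_dir, exist_ok=True)
    alternates = os.path.join(info_dir, "alternates")
    with open(alternates, "w") as fh:
        fh.write(reference_objects + "\n")
    return alternates
```

After writing the file, the duplicates still have to be cleaned out of
the borrowing repo as described above. Objects must never be pruned from
the reference repo while other repos depend on it.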

You should also keep only --bare repos; you don't need the checked-out
trees on your disks, right?

Hope this helps,

greg k-h

2013-04-24 06:26:41

by D M German

Subject: Re: helping with tracking commits across repos



Greg> On Mon, Apr 15, 2013 at 05:13:45PM -0700, D M German wrote:
>>
>>
>> Greg KH twisted the bytes to say:
>>
>> >> http://o.cs.uvic.ca:20810/perl/next.pl
>>
Greg> Yes, that's a great thing. Maybe the ability to see the subject: line
Greg> of the commit somewhere easier than having to click through to the patch
Greg> would be nice, so we can just glance at the report and say, "Look at all
Greg> of the btrfs patches that showed up out of nowhere, what happened?"
>>
Greg> Oh, and if you could do it for a specific kernel release, not a date
Greg> range, that would be nice (i.e. report for 3.9-rc1, 3.8-rc1, 3.7-rc1,
Greg> etc.)
>>
>> What would be the simplest approach to getting the date? I suspect that
>> it can be done by doing some command line magic in Linus git repo.

Greg> You want to look at the commits from the last major release (i.e. 3.8)
Greg> to the -rc1 release, (i.e. 3.9-rc1). You can't look at the dates,
Greg> that's not going to reflect when the patch landed in Linus's branch.

Hi Greg,

It took me longer than expected, but I finally got it working.

I have a heuristic to estimate when a commit is merged by Linus. It
seems to work well for commits since 2008.

The commits that mark the releases are nicely labeled by Linus. Since I
know in which commit any commit is merged by Linus I can determine what
release the commit is part of.
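
To make the idea concrete, here is one way the mapping from merge point
to release could be sketched in Python. It is not the heuristic itself:
it assumes each commit along Linus's history has already been given an
integer position (oldest first), which is an assumption made purely for
the example.

```python
from bisect import bisect_left


def release_for(merge_position, tagged_positions):
    """Return the first release tag at or after `merge_position`.

    `tagged_positions` is a list of (position, tag) pairs sorted by
    position, one per release commit. Returns None when the merge is
    newer than the latest tagged release.
    """
    positions = [position for position, _ in tagged_positions]
    i = bisect_left(positions, merge_position)
    if i == len(positions):
        return None
    return tagged_positions[i][1]
```

With tags at positions 100 (v3.8) and 250 (v3.9-rc1), a commit merged at
position 101 would be reported as part of v3.9-rc1.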

Take a look:

http://o.cs.uvic.ca:20810/perl/next.pl

It only covers 2013, but if needed, I can expand the range of
dates. I suspect older commits are not that interesting any more.

I still have to "cron" the update of some data to fully do this report
automatically. I hope to do that very soon.

It will help to have some extra eyes. If anybody finds a bug please let
me know.

I also improved some of the other reports to include the log of the
commit whenever it makes sense.

I haven't tried the suggestions on how to reduce space... that is my
next goal.

--daniel

--
Daniel M. German "Beauty is the first test; there is no
permanent place in the world for ugly
G. H. Hardy -> mathematics."
http://turingmachine.org/
http://silvernegative.com/
dmg (at) uvic (dot) ca
replace (at) with @ and (dot) with .