2005-09-11 16:02:47

by Peter Osterlund

Subject: What's up with the GIT archive on www.kernel.org?

Since about 20 hours ago, it seems the
linux/kernel/git/torvalds/linux-2.6.git/ archive on http://www.kernel.org
alternates between at least two different HEAD commits. First it was

40 hours ago [PATCH] md: fix BUG when raid10 rebuilds without
enough drives

then it changed to

15 hours ago Merge
master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6

then it changed back to the raid10 commit. It looks like it has
flipped back and forth quite a few times. Currently it seems to happen
once every couple of minutes or so.

This affects both gitweb and rsync, but the rsync flipping is not
synchronized with the gitweb flipping.

Does anyone else see this? "host www.kernel.org" gives me two IP
addresses:

www.kernel.org is an alias for zeus-pub.kernel.org.
zeus-pub.kernel.org has address 204.152.191.5
zeus-pub.kernel.org has address 204.152.191.37

Is it possible that one of those computers hasn't received the latest
changes for some reason?

--
Peter Osterlund - [email protected]
http://web.telia.com/~u89404340
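
One way to check whether the two front ends disagree is to ask each address for the same file. The following is only a sketch: it assumes `curl` is installed, and the ref path inside `compare_front_ends` is a guess at the repository layout that nothing in this thread confirms.

```shell
# Extract the IPv4 addresses from `host` output read on stdin.
parse_addrs() {
    awk '/ has address / { print $NF }'
}

# Fetch one file from a specific front end by pinning the hostname
# to the given address with curl's --resolve option.
# $1: IP address, $2: path below http://www.kernel.org/
fetch_from() {
    curl -s --resolve "www.kernel.org:80:$1" "http://www.kernel.org/$2"
}

# Print what each front end currently serves for one ref.
# The ref path below is an assumption, not taken from the thread.
compare_front_ends() {
    ref=pub/scm/linux/kernel/git/torvalds/linux-2.6.git/refs/heads/master
    host www.kernel.org | parse_addrs | while read -r ip; do
        printf '%s %s\n' "$ip" "$(fetch_from "$ip" "$ref")"
    done
}
```

Running `compare_front_ends` a few times and seeing different hashes next to the two addresses would point at a stale mirror rather than at a client-side problem.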


2005-09-11 16:12:48

by Linus Torvalds

Subject: Re: What's up with the GIT archive on www.kernel.org?



On Sun, 11 Sep 2005, Peter Osterlund wrote:
>
> Since about 20 hours ago, it seems the
> linux/kernel/git/torvalds/linux-2.6.git/ archive on http://www.kernel.org
> alternates between at least two different HEAD commits.

Are there perhaps two different front-end machines? And mirroring
problems?

> Does anyone else see this? "host www.kernel.org" gives me two IP
> addresses:
>
> www.kernel.org is an alias for zeus-pub.kernel.org.
> zeus-pub.kernel.org has address 204.152.191.5
> zeus-pub.kernel.org has address 204.152.191.37
>
> Is it possible that one of those computers hasn't received the latest
> changes for some reason?

Absolutely. The mirroring has been slow again lately. I've packed my
archive, but I suspect others should much more aggressively now be using
the "objects/info/alternates" information to point to my tree, so that
they don't even need to have their objects at all (no packing
even necessary - just running "git prune-packed" on people's archives
would get rid of any duplicate objects when I pack mine).

Linus

2005-09-11 18:55:21

by Sam Ravnborg

Subject: Re: What's up with the GIT archive on www.kernel.org?

>
> Absolutely. The mirroring has been slow again lately. I've packed my
> archive, but I suspect others should much more aggressively now be using
> the "objects/info/alternates" information to point to my tree, so that
> they don't even need to have their objects at all (no packing
> even necessary - just running "git prune-packed" on people's archives
> would get rid of any duplicate objects when I pack mine).

Can you post a small description how to utilize this method?


What I've done lately has been to cp -al your .git archive.
This works well when I get everything merged up and has been my lazy
method to avoid doing merges yet (being cogito user I do not trust merge
atm. because I have mixed up older cogito with newest git).

Sam

2005-09-11 19:07:08

by Linus Torvalds

Subject: Re: What's up with the GIT archive on www.kernel.org?



On Sun, 11 Sep 2005, Sam Ravnborg wrote:
> >
> > Absolutely. The mirroring has been slow again lately. I've packed my
> > archive, but I suspect others should much more aggressively now be using
> > the "objects/info/alternates" information to point to my tree, so that
> > they don't even need to have their objects at all (no packing
> > even necessary - just running "git prune-packed" on people's archives
> > would get rid of any duplicate objects when I pack mine).
>
> Can you post a small description how to utilize this method?

Just do

echo /pub/scm/linux/kernel/git/torvalds/linux-2.6.git/objects > objects/info/alternates

in your tree, and that will tell git that your tree can use my object
directory as an "alternate" source of objects. At that point, you can
remove all objects that I have.

However, that only works with a local directory - you can't say that the
alternate object directory is over the network (unless you use NFS or
similar, of course ;).

Another potential problem is that while the above makes git understand to
pick the objects from my directory, it can in theory cause problems for
mirrors etc - since they mirror things to a different location and/or may
not mirror all of it anyway.

Anyway, modulo those caveats, you can then just do

git prune-packed

and it will remove all unpacked objects in your git archive that can be
reached through a pack-file - including any packfiles in _my_ directory.

Then you never need to pack your own objects any more. Just leave
everything unpacked, and rely on me packing every once in a while, and
just do "git prune-packed" when I do.

That allows a site like kernel.org to effectively share 99% of all
objects, and do it efficiently.

Linus
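
The two steps in the message above can be wrapped as small helpers. This is a sketch, not something posted in the thread: the function names `use_alternate` and `prune_duplicates` are made up for illustration, and the alternate path is assumed to be an absolute path to a local directory.

```shell
# Point a repository at another repository's object directory.
# $1: git directory of the borrowing repository (e.g. kbuild.git)
# $2: absolute path of the local object directory to borrow from
use_alternate() {
    git_dir=$1
    alt_objects=$2
    # An alternate only works as a local directory, so refuse anything else.
    [ -d "$alt_objects" ] || { echo "no such directory: $alt_objects" >&2; return 1; }
    mkdir -p "$git_dir/objects/info"
    echo "$alt_objects" > "$git_dir/objects/info/alternates"
}

# Remove unpacked objects that are already available from a pack file.
# $1: git directory of the borrowing repository
prune_duplicates() {
    GIT_DIR=$1 git prune-packed
}
```

For example, `use_alternate kbuild.git /pub/scm/linux/kernel/git/torvalds/linux-2.6.git/objects` followed by `prune_duplicates kbuild.git` corresponds to the commands given above.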

2005-09-11 19:44:39

by Sam Ravnborg

Subject: Re: What's up with the GIT archive on www.kernel.org?

> >
> > Can you post a small description how to utilize this method?
>
> Just do
>
> echo /pub/scm/linux/kernel/git/torvalds/linux-2.6.git/objects > objects/info/alternates
>
> in your tree, and that will tell git that your tree can use my object
> directory as an "alternate" source of objects. At that point, you can
> remove all objects that I have.

OK - what I did:

cd /pub/scm/linux/kernel/git/sam
rm -rf kbuild.git
git clone /pub/scm/linux/kernel/git/torvalds/linux-2.6.git kbuild.git
rename .git to kbuild.git

I had to specify both GIT_DIR and GIT_OBJECT_DIRECTORY to make
git-prune-packed behave as expected. I assume this is normal when I
rename the .git directory like in this case.

I will see if any pullers complain (mostly/only Andrew I think).

Sam

2005-09-11 19:56:19

by Linus Torvalds

Subject: Re: What's up with the GIT archive on www.kernel.org?



On Sun, 11 Sep 2005, Sam Ravnborg wrote:
>
> I had to specify both GIT_DIR and GIT_OBJECT_DIRECTORY to make
> git-prune-packed behave as expected. I assume this is normal when I
> rename the .git directory like in this case.

You should only need to specify GIT_DIR - it should figure out that the
object directory follows GIT_DIR on its own.

Also, I forget what version of git is installed on kernel.org. The
"alternates" support has been around for a while, and looking at the date
of "/usr/bin/git" it _seems_ recent (Sep 7), but I haven't seen any
announcement of updating since the last one (which was git-0.99.4, which
is too old).

You can try removing all the packs in your .git/objects/pack directory.
Everything _should_ still work fine.

Famous last words.

Linus

2005-09-11 21:09:37

by Roland Dreier

Subject: Re: What's up with the GIT archive on www.kernel.org?

Linus> You can try removing all the packs in your
Linus> .git/objects/pack directory. Everything _should_ still
Linus> work fine.

Does "everything" include someone doing

git clone rsync://rsync.kernel.org/pub/scm/linux/kernel/git/roland/whatever.git

How about http:// instead of rsync://?

In other words, is the git network transport smart enough to handle
the alternates path?

Or is the idea that everyone will clone your tree and then pull extra
stuff from other trees?

- R.

2005-09-11 21:24:55

by Linus Torvalds

Subject: Re: What's up with the GIT archive on www.kernel.org?



On Sun, 11 Sep 2005, Roland Dreier wrote:
>
> Does "everything" include someone doing
>
> git clone rsync://rsync.kernel.org/pub/scm/linux/kernel/git/roland/whatever.git

Nope. Only server-side smart protocols will handle this.

There is such an anonymous server, btw: "git-daemon" implements anonymous
access much more efficiently than rsync/http. Sadly, kernel.org still
doesn't offer it (but it's now used in the wild, ie I've done a couple of
merges with people running the git daemon).

> In other words, is the git network transport smart enough to handle
> the alternates path?

The _git_ network transport is. rsync and http aren't.

Linus

2005-09-11 21:34:27

by Linus Torvalds

Subject: Re: What's up with the GIT archive on www.kernel.org?



On Sun, 11 Sep 2005, Linus Torvalds wrote:
>
> The _git_ network transport is. rsync and http aren't.

Btw, there's no reason why a client-side thing couldn't just parse the
"alternates" thing, and if it doesn't find the objects in the main object
directory, go and fetch them from the alternates itself.

IOW, this is not a fundamental problem with alternates, it's just that
since there is no server-side smarts to handle it (ie just raw file access
with rsync/http), it needs to be handled at the client side instead.

Linus

2005-09-11 22:13:14

by Andrew Morton

Subject: Re: What's up with the GIT archive on www.kernel.org?

Linus Torvalds <[email protected]> wrote:
>
> On Sun, 11 Sep 2005, Peter Osterlund wrote:
> >
> > Since about 20 hours ago, it seems the
> > linux/kernel/git/torvalds/linux-2.6.git/ archive on http://www.kernel.org
> > alternates between at least two different HEAD commits.
>
> Are there perhaps two different front-end machines? And mirroring
> problems?

I think so. Yesterday I was wgetting files from Greg's directory on
kernel.org and kept on getting two totally different sets of files between
successive identical wget invocations.

> Does anyone else see this? "host www.kernel.org" gives me two IP
> addresses:
>
> www.kernel.org is an alias for zeus-pub.kernel.org.
> zeus-pub.kernel.org has address 204.152.191.5
> zeus-pub.kernel.org has address 204.152.191.37
>
> Is it possible that one of those computers hasn't received the latest
> changes for some reason?

Yes, I'd say that's the problem.

2005-09-11 22:49:43

by Alex Riesen

Subject: Re: What's up with the GIT archive on www.kernel.org?

On 9/12/05, Andrew Morton <[email protected]> wrote:
> > Does anyone else see this? "host www.kernel.org" gives me two IP
> > addresses:
> >
> > www.kernel.org is an alias for zeus-pub.kernel.org.
> > zeus-pub.kernel.org has address 204.152.191.5
> > zeus-pub.kernel.org has address 204.152.191.37
> >
> > Is it possible that one of those computers hasn't received the latest
> > changes for some reason?
>
> Yes, I'd say that's the problem.

Could this be the reason I'm getting this from cogito trying to update git:

Applying changes...
error: unable to find 720d150c48fc35fca13c6dfb3c76d60e4ee83b87
fatal: git-cat-file 720d150c48fc35fca13c6dfb3c76d60e4ee83b87: bad file
usage: git-cat-file [-t | -s | <type>] <sha1>
Invalid commit id: 720d150c48fc35fca13c6dfb3c76d60e4ee83b87

2005-09-12 01:39:53

by Junio C Hamano

Subject: Re: What's up with the GIT archive on www.kernel.org?

Linus Torvalds <[email protected]> writes:

> Btw, there's no reason why a client-side thing couldn't just parse the
> "alternates" thing, and if it doesn't find the objects in the main object
> directory, go and fetch them from the alternates itself.

There is.

For kernel.org, you could say '/pub/scm/blah' in your alternates
and expect it to work, only because http://kernel.org/pub
hierarchy happens to match the absolute path /pub on the
filesystem, but for most people's default HTTP server
installation, they would need to say /var/www/scm/blah to have
alternate work locally, but somebody has to know that the named
directory is served as http://machine.xz/pub/scm/blah somewhere.

Client side smarts need some help from the user here to know
that '/var/www/scm/blah' read off of objects/info/alternates
match that URL.
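
To illustrate the kind of help a client would need, here is a sketch of a mapping step in which the user supplies both prefixes. The function name `alternates_to_urls` and its two parameters are hypothetical; no existing git tool is claimed to work this way.

```shell
# Map each alternates entry (read on stdin) from a filesystem path to a URL.
# $1: filesystem prefix as seen on the server (supplied by the user)
# $2: URL prefix under which that directory is served (supplied by the user)
alternates_to_urls() {
    fs_prefix=$1
    url_prefix=$2
    while IFS= read -r path; do
        case $path in
            # Replace the filesystem prefix with the URL prefix.
            "$fs_prefix"/*) printf '%s%s\n' "$url_prefix" "${path#"$fs_prefix"}" ;;
            # Anything outside the known prefix cannot be translated.
            *) printf 'cannot map: %s\n' "$path" >&2 ;;
        esac
    done
}
```

With the example above, `alternates_to_urls /var/www http://machine.xz/pub < objects/info/alternates` would turn `/var/www/scm/blah` into `http://machine.xz/pub/scm/blah`. Without the two arguments the client has nothing to derive the URL from, which is the point being made.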

2005-09-12 02:45:40

by Dmitry Torokhov

Subject: Re: What's up with the GIT archive on www.kernel.org?

On Sunday 11 September 2005 20:39, Junio C Hamano wrote:
> Linus Torvalds <[email protected]> writes:
>
> > Btw, there's no reason why a client-side thing couldn't just parse the
> > "alternates" thing, and if it doesn't find the objects in the main object
> > directory, go and fetch them from the alternates itself.
>
> There is.
>
> For kernel.org, you could say '/pub/scm/blah' in your alternates
> and expect it to work, only because http://kernel.org/pub
> hierarchy happens to match the absolute path /pub on the
> filesystem, but for most people's default HTTP server
> installation, they would need to say /var/www/scm/blah to have
> alternate work locally, but somebody has to know that the named
> directory is served as http://machine.xz/pub/scm/blah somewhere.
>

Call me brain-dead but all of this just makes me rsync my tree to
kernel.org and then manually do "ln -f" for all the packs that Linus
has. This way I am sure that the tree is what I have plus and it is
"pullable".


--
Dmitry
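
The manual "ln -f" step described above could look roughly like this. It is a sketch under assumptions: the helper name `link_packs` is invented, both directories are assumed to be object directories on the same filesystem (hard links cannot cross filesystems), and packs are assumed to live in a `pack` subdirectory as `*.pack` and `*.idx` files.

```shell
# Hard-link every pack (and its index) from one object directory into another.
# $1: source object directory, $2: destination object directory
link_packs() {
    src=$1
    dst=$2
    mkdir -p "$dst/pack"
    for f in "$src"/pack/*.pack "$src"/pack/*.idx; do
        # Skip the unexpanded glob when no files match.
        [ -e "$f" ] || continue
        # -f replaces an existing copy with a link to the source file.
        ln -f "$f" "$dst/pack/"
    done
}
```

Unlike the alternates approach, the resulting tree is self-contained from the point of view of rsync and http clients, at the cost of repeating the step whenever the source is repacked.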

2005-09-12 03:40:01

by Linus Torvalds

Subject: Re: What's up with the GIT archive on www.kernel.org?



On Sun, 11 Sep 2005, Junio C Hamano wrote:
>
> For kernel.org, you could say '/pub/scm/blah' in your alternates
> and expect it to work, only because http://kernel.org/pub
> hierarchy happens to match the absolute path /pub on the
> filesystem, but for most people's default HTTP server
> installation, they would need to say /var/www/scm/blah to have
> alternate work locally, but somebody has to know that the named
> directory is served as http://machine.xz/pub/scm/blah somewhere.

Yes. We should probably have some well-defined meaning for relative paths
in there regardless (eg just define that they are always relative to the
main GIT_OBJECT_DIRECTORY or something).

That would also allow mirrors to mirror the git archives in different
places, without upsetting the result (as long as they are mirrored
together).

Linus
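
The proposed rule for relative entries can be sketched as a small resolver. This only illustrates the suggestion in the message above; `resolve_alternate` is a hypothetical name, and the sketch does not claim to describe what any particular git version does.

```shell
# Resolve one alternates entry against the main object directory.
# $1: the repository's own object directory
# $2: one line from objects/info/alternates
resolve_alternate() {
    objdir=$1
    entry=$2
    case $entry in
        # Absolute entries are used as they are.
        /*) printf '%s\n' "$entry" ;;
        # Relative entries are interpreted relative to the object directory.
        *)  printf '%s/%s\n' "$objdir" "$entry" ;;
    esac
}
```

Under this rule an entry such as `../../torvalds/linux-2.6.git/objects` keeps working when a mirror copies both repositories to a different root, because only their relative position matters.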

2005-09-12 17:11:37

by H. Peter Anvin

Subject: Re: What's up with the GIT archive on www.kernel.org?

Junio C Hamano wrote:
>
> For kernel.org, you could say '/pub/scm/blah' in your alternates
> and expect it to work, only because http://kernel.org/pub
> hierarchy happens to match the absolute path /pub on the
> filesystem...
>

Actually it doesn't. /pub in the root directory on kernel.org is just a
convenience symlink.

-hpa

2005-09-12 18:23:13

by Tony Luck

Subject: Re: What's up with the GIT archive on www.kernel.org?

On 9/11/05, Linus Torvalds <[email protected]> wrote:
> There is such an anonymous server, btw: "git-daemon" implements anonymous
> access much more efficiently than rsync/http. Sadly, kernel.org still
> doesn't offer it (but it's now used in the wild, ie I've done a couple of
> merges with people running the git daemon).

Should the git daemon take a look at objects/info/alternates to check
that if it exists, it points to a repository that also has a
"git-daemon-export-ok" file? I don't see that this could be used for
anything nasty, but it does provide a loophole where the daemon may
open files outside the initial repository ... so a sanity check seems
in order.

-Tony

2005-09-12 18:37:39

by Linus Torvalds

Subject: Re: What's up with the GIT archive on www.kernel.org?



On Mon, 12 Sep 2005, Tony Luck wrote:
>
> Should the git daemon take a look at objects/info/alternates to check
> that if it exists, it points to a repository that also has a
> "git-daemon-export-ok" file?

I considered it, but decided against the complexity. I just don't see the
point. The "git-daemon-export-ok" is not so much about security as about
_accidental_ exposure.

Remember: the security is in the writing. If you allow "bad people" enough
capabilities that they can create their own git archive and can read the
target archive, those "bad people" could just export the target archive
some other way in the first place (ie they could have just copied the
files over to their own area).

And there are actually real downsides to requiring "git-daemon-export-ok"
from a security standpoint. In particular, imagine that a company has a
"master archive", and wants to export just a particular "public branch"
from that master archive. The way you can do that right now is to create a
dummy git archive, that is empty except for having one head (symlink to
the public branch head in the master) and an "alternates" pointer to the
master.

See? You don't actually want to expose the master archive itself: so
requiring that one to also have "git-daemon-export-ok" would actually
_defeat_ the security in the system.

So the git approach to security is that you secure the writing side.
That's where you use ssh. And even if you happen to run git-daemon, it
will never export anything that you didn't explicitly mark for export, so
it defaults to a "nothing exported" mode. But once you mark a project for
public export, the branches exposed there really are public.

(And the branches _not_ exposed there are private. Sure, if you can guess
the SHA1 ID's, you can make git-daemon export them, but the point is that
git-daemon will never expose any SHA1's from other projects unless they
have the "git-daemon-export-ok" flag set. And the thing is, if you know
the SHA1's, you already know the contents and you had a leak some other
way, so..).

Linus
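
The "dummy archive" layout described above might be set up along these lines. This is a sketch with several assumptions: `make_export_repo` is an invented helper, the master path must be absolute, and the `HEAD` file is written in the textual `ref:` form, which is a choice made here rather than something stated in the thread.

```shell
# Create a minimal repository that exposes one branch of a master archive.
# $1: master git directory (absolute path)
# $2: branch name to expose
# $3: git directory of the new export repository
make_export_repo() {
    master=$1
    branch=$2
    export_dir=$3
    mkdir -p "$export_dir/objects/info" "$export_dir/refs/heads"
    # Borrow all objects from the master archive.
    echo "$master/objects" > "$export_dir/objects/info/alternates"
    # The single exposed head is a symlink to the master's branch head.
    ln -s "$master/refs/heads/$branch" "$export_dir/refs/heads/$branch"
    # HEAD names the exposed branch (assumed textual form).
    echo "ref: refs/heads/$branch" > "$export_dir/HEAD"
    # Mark only the export repository, never the master, for git-daemon.
    touch "$export_dir/git-daemon-export-ok"
}
```

The master archive never receives a "git-daemon-export-ok" file, so only the export repository is reachable by path through the daemon.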

2005-09-12 18:42:40

by Ryan Anderson

Subject: Re: What's up with the GIT archive on www.kernel.org?

On Sun, Sep 11, 2005 at 09:45:33PM -0500, Dmitry Torokhov wrote:
>
> Call me brain-dead but all of this just makes me rsync my tree to
> kernel.org and then manually do "ln -f" for all the packs that Linus
> has. This way I am sure that the tree is what I have plus and it is
> "pullable".

If you have access to make hardlinks, you should be able to use
git-relink to do the hard work for you.

From memory:
git relink my_dir1 my_dir2 ... master_dir

or:
git relink my-kernel-tree /pub/scm/.../torvalds/linux.git/

(I think that will work - via a bug in my initial attempt to write
git-relink, I look to make sure the path ends in ".git/" not "/.git/".
So the above should work. I think.)

--

Ryan Anderson
sometimes Pug Majere