LinuxLists.cc - Re: ia64 git pull

2005-04-21 21:56:04

Subject: Re: ia64 git pull

Adding linux-kernel to Cc: list, as I'm sure Linus wants to hear from
all maintainers, not just those that hang out on the linux-ia64 list.

>On Thu, 21 Apr 2005, Linus Torvalds wrote:
>Btw, just in case it wasn't obvious anyway: I based pretty much _all_ of
>the git design on three basic goals: performance, distribution and
>integrity checking. Everything else pretty much flows from those three
>things.

>But I really tried to make sure that git ends up not just working as a
>system for me to apply patches, but as a system for me to merge other
>people's work. And I think technically, git does that merge thing very
>well, and it's certainly living up to my expectations.

>However, I kind of knew what my expectations _were_ in the first place,
>and as such the thing is very much designed for what I think I needed to
>have to work with you guys. Making it fit the old workflow was obviously a
>big deal, but I don't know how people really worked on the "other end",
>so...

>In other words, what this digression is leading up to is just me saying
>that if you have feedback on how this whole git thing is working for
>_you_, please don't feel shy. I realize that it's a bit raw and rough
>right now, and people are working on making better interfaces, but if
>you have some particular worry or issue, don't feel like git was forced
>upon you as a fait accompli, but complain and tell me what your biggest
>problems are.

>I may not be able to do a lot about them (the unmentioned fourth basic
>goal was obviously: simple enough that I could actually implement the dang
>thing), but still..

>So if there's some feature (or _lack_ of one) that really bugs you, speak
>up.

I can't quite see how to manage multiple "heads" in git. I notice that in
your tree on kernel.org that .git/HEAD is a symlink to heads/master ...
perhaps that is a clue.

I'd like to have at least two, or perhaps even three "HEADS" active in my
tree at all times. One would correspond to my old "release" tree ... pointing
to changes that I think are ready to go into the Linus tree. A second would
be the "testing" tree ... ready for Andrew to pull into "-mm", but not ready
for the base. The third (which might only exist in my local tree) would be
for changes that I'm playing around with.

I can see how git can easily do this ... but I don't know how to set up my
public tree so that you and Andrew can pull from the right HEAD.

-Tony

2005-04-21 22:29:17

by Linus Torvalds

Subject: Re: ia64 git pull

On Thu, 21 Apr 2005 [email protected] wrote:
>
> I can't quite see how to manage multiple "heads" in git. I notice that in
> your tree on kernel.org that .git/HEAD is a symlink to heads/master ...
> perhaps that is a clue.

It's mainly a clue to bad practice, in my opinion. I personally like the
"one repository, one head" approach, and if you want multiple heads you
just do multiple repositories (and you can then mix just the object
database - set your SHA1_FILE_DIRECTORY environment variable to point to
the shared object database, and you're good to go).

But yes, if you want to mix heads in the same repo, you can do so, but
it's a bit dangerous to switch between them, since you'll have to blow any
dirty state away, or you end up having checked-out state that is
inconsistent with the particular head you're working on.

Switching a head is pretty easy from a _technical_ perspective: make sure
your tree is clean (ie fully checked in), and switch the .git/HEAD symlink
to point to a new head. Then just do

read-tree .git/HEAD
checkout-cache -f -a

and you're done. Assuming most checked-out state matches, it should be
basically instantaneous.

Oh, and you may want to check that you didn't have any files that are now
stale, using "show-files --others" to show what files are _not_ being
tracked in the head you just switched to.
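
The switch described above can be sketched as a small shell helper. This is only an illustration under assumptions: switch_head is a hypothetical name, heads are assumed to live in .git/heads/<name> with .git/HEAD a symlink into that directory, and the commit ids are made up. The read-tree and checkout-cache steps are left as comments so the sketch runs without the git tools installed.

```shell
#!/bin/sh
# Hypothetical helper: repoint .git/HEAD at another head.
switch_head() {
    name=$1
    if [ ! -f ".git/heads/$name" ]; then
        echo "no such head: $name" >&2
        return 1
    fi
    # Replace the symlink itself rather than writing through it.
    rm -f .git/HEAD
    ln -s "heads/$name" .git/HEAD
    # With the git tools on PATH, refresh the index and working tree next:
    #   read-tree .git/HEAD
    #   checkout-cache -f -a
    # and list files the new head does not track with:
    #   show-files --others
}

# Demonstration in a scratch directory with made-up commit ids.
scratch=$(mktemp -d)
cd "$scratch" || exit 1
mkdir -p .git/heads
echo 1111111111111111111111111111111111111111 > .git/heads/master
echo 2222222222222222222222222222222222222222 > .git/heads/testing
ln -s heads/master .git/HEAD

switch_head testing
cat .git/HEAD    # now shows the id stored in .git/heads/testing
```

Refusing to switch to a head that does not exist is a choice of this sketch; it avoids leaving HEAD as a dangling symlink.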

I think Pasky has helper functions for doing this, but since I think
multiple heads is really too confusing for mere mortals like me, I've not
looked at it.

> I'd like to have at least two, or perhaps even three "HEADS" active in my
> tree at all times. One would correspond to my old "release" tree ... pointing
> to changes that I think are ready to go into the Linus tree. A second would
> be the "testing" tree ... ready for Andrew to pull into "-mm", but not ready
> for the base. The third (which might only exist in my local tree) would be
> for changes that I'm playing around with.

I _really_ suggest having separate directories and not playing with heads.

As shown above, git can technically support what you're doing very
efficiently, but the problem is not technology - it's user confusion. It's
just too damn easy to do the wrong thing if you end up using the same
directory for all your heads, and switch between them.

In contrast, with separate directories, all with their own individual
heads, you are much more likely to know what you're up to by just getting
the hints from the environment.

Linus

2005-04-21 22:53:38

by Petr Baudis

Subject: Re: ia64 git pull

Dear diary, on Thu, Apr 21, 2005 at 11:55:43PM CEST, I got a letter
where [email protected] told me that...
> I can't quite see how to manage multiple "heads" in git. I notice that in
> your tree on kernel.org that .git/HEAD is a symlink to heads/master ...
> perhaps that is a clue.
>
> I'd like to have at least two, or perhaps even three "HEADS" active in my
> tree at all times. One would correspond to my old "release" tree ... pointing
> to changes that I think are ready to go into the Linus tree. A second would
> be the "testing" tree ... ready for Andrew to pull into "-mm", but not ready
> for the base. The third (which might only exist in my local tree) would be
> for changes that I'm playing around with.

To set that up, go into your "master" working directory, and do:

git fork release ~/release
git fork testing ~/testing

Then in ~/release or ~/testing respectively, you have working trees for
the respective branches.

> I can see how git can easily do this ... but I don't know how to set up my
> public tree so that you and Andrew can pull from the right HEAD.

Currently, git pull will _never_ care about your internal heads
structure in the remote repository. It will just take HEAD at the given
rsync URI, and update the remote branch's head in your repository to
that commit ID. This actually seems to be one of the very common
pitfalls for people.

The way to work around that is to set up separate rsync URIs for each of
the trees. ;-) I think I will make git-pasky (Cogito) also accept URIs
of the form

rsync://host/path!branchname

which will allow you to select the particular branch in the given
repository, defaulting to the "master" branch.

Would that work for you?

--
Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
C++: an octopus made by nailing extra legs onto a dog. -- Steve Taylor

2005-04-21 22:59:06

by Petr Baudis

Subject: Re: ia64 git pull

Dear diary, on Fri, Apr 22, 2005 at 12:29:07AM CEST, I got a letter
where Linus Torvalds <[email protected]> told me that...
> On Thu, 21 Apr 2005 [email protected] wrote:
> >
> > I can't quite see how to manage multiple "heads" in git. I notice that in
> > your tree on kernel.org that .git/HEAD is a symlink to heads/master ...
> > perhaps that is a clue.
>
> It's mainly a clue to bad practice, in my opinion. I personally like the
> "one repository, one head" approach, and if you want multiple heads you
> just do multiple repositories (and you can then mix just the object
> database - set your SHA1_FILE_DIRECTORY environment variable to point to
> the shared object database, and you're good to go).

There are two points regarding this:

(i) You need to track heads of remote branches anyway.

(ii) I want an easy way for the user to refer to the head of another working
tree on the same repository (so that you can do just git merge testing).

This is why the multiple heads are there, and it's actually everything
they _do_. There's nothing more behind it.

--
Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
C++: an octopus made by nailing extra legs onto a dog. -- Steve Taylor

2005-04-21 23:01:45

by Tony Luck

Subject: Re: ia64 git pull

> It's mainly a clue to bad practice, in my opinion. I personally like the
> "one repository, one head" approach, and if you want multiple heads you
> just do multiple repositories (and you can then mix just the object
> database - set your SHA1_FILE_DIRECTORY environment variable to point to
> the shared object database, and you're good to go).

Maybe I just have a terminology problem?

I want to have one "shared objects database" which I keep locally and
mirror publicly at kernel.org/pub/scm/...

I will have several "repositories" locally for various grades of patches,
each of which uses SHA1_FILE_DIRECTORY to point to my single repository.
So I never have to worry about getting all the git commands to switch
context ... I just use "cd ../testing", and "cd ../release".
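
A minimal sketch of that layout, with a scratch directory standing in for the real trees. The directory names objects, release and testing are illustrative placeholders; only the SHA1_FILE_DIRECTORY variable comes from the discussion above.

```shell
#!/bin/sh
# One object database shared by several working directories.
# All paths are placeholders created under a scratch directory.
base=$(mktemp -d)
mkdir -p "$base/objects" "$base/release/.git" "$base/testing/.git"

# Tools that honour SHA1_FILE_DIRECTORY will use this object store
# no matter which working directory they are started from.
SHA1_FILE_DIRECTORY=$base/objects
export SHA1_FILE_DIRECTORY

# Switching context is then just a change of directory.
cd "$base/release" || exit 1
cd ../testing || exit 1
pwd
```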

But now I need a way to indicate to consumers of the public shared object
data base which HEAD to use.

Perhaps I should just say "merge 821376bf15e692941f9235f13a14987009fd0b10
from rsync://rsync.kernel.org/pub/scm/linux/kernel/git/aegl/linux-2.6.git"?

That works for interacting with you, because you only pull from me when
I tell you there is something there to pull.

But Andrew had a cron job or something to keep polling every day. So he
needs to see what the HEAD is.

Does this make sense ... or am I still missing the point?

-Tony

2005-04-21 23:18:02

by Linus Torvalds

Subject: Re: ia64 git pull

On Thu, 21 Apr 2005 [email protected] wrote:
>
> I want to have one "shared objects database" which I keep locally and
> mirror publicly at kernel.org/pub/scm/...

Ahh, ok. That's easy.

Just set up one repository. Then, make SHA1_FILE_DIRECTORY point to that
repository, and everybody will automatically share all file objects.

HOWEVER. And this is a big however - I don't think you want to do this at
this point.

Why? Because I'm still using the stupid "get all objects" thing when I
pull. That's not a fundamental design decision, but basically not doing so
requires that the other end be "git aware", and have some server that is
trustworthy that you can tell "get me all objects I need".

In the absence of that kind of git-aware server, anybody pulling from you
would have to pull _every_ object you have, regardless of whether they
wanted to use them or not. I don't think that's very nice.

So in the long run this issue goes away - we'll just have synchronization
tools that won't get any unnecessary pollution. But in the short run I
actually check my git archive religiously for being clean, and I do

fsck-cache --unreachable $(cat .git/HEAD)

quite often - not because git has been flaky, but simply because I'm anal.
And getting objects from other branches would mess with that..

> But now I need a way to indicate to consumers of the public shared object
> data base which HEAD to use.

Yes. You'd just need to document where you put those heads. As you say,
you can make it be part of an announcement:

> Perhaps I should just say "merge 821376bf15e692941f9235f13a14987009fd0b10
> from rsync://rsync.kernel.org/pub/scm/linux/kernel/git/aegl/linux-2.6.git"?

..but that doesn't actually work very well even for me, because I'd much
rather automate pulling from you, rather than having to cut-and-paste the
sha names.

So I think you could just define a head name, and say something like:

rsync://rsync.kernel.org/pub/scm/linux/kernel/git/aegl/linux-2.6.git/HEAD.for-linus

and we're all done. Give andrew a different filename, and you're done. The
above syntax is trivial for me to automate.
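
One way to publish such per-consumer head files, as a sketch: copy the commit id of a local head into a file named for its consumer. publish_head is a hypothetical helper, the .git/heads/<name> layout and the for-andrew suffix are assumptions, and the ids are made up.

```shell
#!/bin/sh
# Hypothetical helper: expose a local head under a consumer-specific name.
# usage: publish_head <local-head> <consumer-suffix> <public-dir>
publish_head() {
    cp ".git/heads/$1" "$3/HEAD.$2"
}

# Demonstration in a scratch directory with made-up commit ids.
scratch=$(mktemp -d)
cd "$scratch" || exit 1
mkdir -p .git/heads public
echo 1111111111111111111111111111111111111111 > .git/heads/release
echo 2222222222222222222222222222222222222222 > .git/heads/testing

publish_head release for-linus public
publish_head testing for-andrew public
ls public
```

Each consumer then only needs to fetch one small, fixed-name file to learn which commit to pull, which is what makes the scheme easy to automate.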

Linus

2005-04-22 00:23:05

by Petr Baudis

Subject: Re: ia64 git pull

Dear diary, on Fri, Apr 22, 2005 at 01:19:47AM CEST, I got a letter
where Linus Torvalds <[email protected]> told me that...
> So in the long run this issue goes away - we'll just have synchronization
> tools that won't get any unnecessary pollution. But in the short run I
> actually check my git archive religiously for being clean, and I do
>
> fsck-cache --unreachable $(cat .git/HEAD)
>
> quite often - not because git has been flaky, but simply because I'm anal.
> And getting objects from other branches would mess with that..

(Note that when using git-pasky, you need to do

fsck-cache --unreachable $(cat .git/heads/*)

instead.)

--
Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
C++: an octopus made by nailing extra legs onto a dog. -- Steve Taylor

2005-04-22 00:23:05

by Bernd Eckenfels

Subject: Re: ia64 git pull

In article <[email protected]> you wrote:
> Why? Because I'm still using the stupid "get all objects" thing when I
> pull.

one could do a symlink/hardlink parallel tree for a specific snapshot with
GIT tools, and then only poll that with git-unaware copy tools.

I guess this would make sense for the most common kernel development lines.

Another improvement would be to add a "secondary storage fetch" script, so
the git tools can, if they can't find an object by hash, just query an external
pool. In combination with the above this will allow users to compare progress.

Greetings
Bernd

2005-04-22 01:25:54

by Petr Baudis

Subject: Re: ia64 git pull

Dear diary, on Fri, Apr 22, 2005 at 01:01:13AM CEST, I got a letter
where [email protected] told me that...
> But now I need a way to indicate to consumers of the public shared object
> data base which HEAD to use.
>
> Perhaps I should just say "merge 821376bf15e692941f9235f13a14987009fd0b10
> from rsync://rsync.kernel.org/pub/scm/linux/kernel/git/aegl/linux-2.6.git"?

I've just added to git-pasky a possibility to refer to branches inside
of repositories by a fragment identifier:

rsync://rsync.kernel.org/pub/scm/linux/kernel/git/aegl/linux-2.6.git#testing

will refer to your testing branch in that repository.
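
A sketch of how a script might split such a URL, assuming everything after the first '#' names the branch and a missing fragment means "master". split_branch_url is a hypothetical helper written for illustration, not git-pasky's actual code.

```shell
#!/bin/sh
# Hypothetical helper: print "<repository-url> <branch>" for a URL that may
# carry a '#branch' fragment. Falls back to "master" when there is none.
split_branch_url() {
    url=$1
    case $url in
        *'#'*)
            repo=${url%%#*}     # drop the first '#' and everything after it
            branch=${url#*#}    # drop everything up to the first '#'
            ;;
        *)
            repo=$url
            branch=master
            ;;
    esac
    printf '%s %s\n' "$repo" "$branch"
}

split_branch_url 'rsync://rsync.kernel.org/pub/scm/linux/kernel/git/aegl/linux-2.6.git#testing'
split_branch_url 'rsync://rsync.kernel.org/pub/scm/linux/kernel/git/aegl/linux-2.6.git'
```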

HTH,

--
Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
C++: an octopus made by nailing extra legs onto a dog. -- Steve Taylor

2005-04-22 01:48:48

by Inaky Perez-Gonzalez

Subject: Re: ia64 git pull

>>>>> Petr Baudis <[email protected]> writes:

> I've just added to git-pasky a possibility to refer to branches
> inside of repositories by a fragment identifier:

> rsync://rsync.kernel.org/pub/scm/linux/kernel/git/aegl/linux-2.6.git#testing

> will refer to your testing branch in that repository.

Can we use something other than #? We'll have to escape it all the time
in the shell (or at least also allow some other char, something
without special meta-character meaning in the shell, like %).

--

Inaky

2005-04-22 01:53:46

by Petr Baudis

Subject: Re: ia64 git pull

Dear diary, on Fri, Apr 22, 2005 at 03:48:35AM CEST, I got a letter
where Inaky Perez-Gonzalez <[email protected]> told me that...
> >>>>> Petr Baudis <[email protected]> writes:
>
> > I've just added to git-pasky a possibility to refer to branches
> > inside of repositories by a fragment identifier:
>
> > rsync://rsync.kernel.org/pub/scm/linux/kernel/git/aegl/linux-2.6.git#testing
>
> > will refer to your testing branch in that repository.
>
> Can we use something other than #? We'll have to escape it all the time
> in the shell (or at least also allow some other char, something
> without special meta-character meaning in the shell, like %).

Remember that it's a URL (so you can't use '%'), and '#' is the
canonical URL fragment identifier delimiter. (And fragments are perfect
for this kind of thing, if you look at the RFC, BTW.)

Still, why would you escape it? My shell will not take # as a comment
start if it is immediately after an alphanumeric character.

--
Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
C++: an octopus made by nailing extra legs onto a dog. -- Steve Taylor

2005-04-22 02:06:47

by Inaky Perez-Gonzalez

Subject: Re: ia64 git pull

>>>>> Petr Baudis <[email protected]> writes:

> Remember that it's a URL (so you can't use '%'), and '#' is the
> canonical URL fragment identifier delimiter. (And fragments are
> perfect for this kind of thing, if you look at the RFC, BTW.)

Ouch, true--rule out %...

> Still, why would you escape it? My shell will not take # as a
> comment start if it is immediately after an alphanumeric character.

Well, you just taught me something about bash I didn't know....

/me goes back to his hole with some more knowledge.

Thanks,

--

Inaky

2005-04-22 03:28:29

by David A. Wheeler

Subject: Re: ia64 git pull

Petr Baudis <[email protected]> writes:
>>Still, why would you escape it? My shell will not take # as a
>>comment start if it is immediately after an alphanumeric character.

I guess there MIGHT be some command shell implementation
that stupidly _DID_ accept "#" as a comment character,
even immediately after an alphanumeric.
If that's true, then using # there would be a pain for portability.

But I think that's highly improbable. A quick peek
at the Single Unix Specification as posted by the Open Group
seems to say that, according to the standards, that's NOT okay:
http://www.opengroup.org/onlinepubs/009695399/utilities/xcu_chap02.html#tag_02
Basically, the command shell is supposed to tokenize, and "#"
only means comment if it's at the beginning of a token.

And as far as I can tell, it's not an issue in practice either.
I did a few quick tests on Fedora Core 3 and OpenBSD.
On Fedora Core 3, I can say that bash, ash & csh all do NOT
consider "#" as a comment start if an alpha precedes it.
The same is true for OpenBSD /bin/sh, /bin/csh, and /bin/rksh.
If such different shells do the same thing (this stuff isn't even
legal C-shell text!), it's likely others do too.
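
The tokenizing rule can be checked with a short sketch for a POSIX-style shell. The function names and the rsync://host/repo.git URL are placeholders chosen for the illustration.

```shell
#!/bin/sh
# '#' in the middle of a word is an ordinary character, so the whole
# URL, fragment included, reaches echo as a single argument.
fragment_kept() {
    echo rsync://host/repo.git#testing
}

# '#' at the start of a word begins a comment, so echo only ever sees
# the URL and the rest of the line is ignored.
fragment_dropped() {
    echo rsync://host/repo.git #testing
}

fragment_kept
fragment_dropped
```

Quoting the URL remains harmless and guards against any shell that behaves differently.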

--- David A. Wheeler