It is quite obvious that comparison of programs of given type (SCM) on some program site (Bazaar-NG) is usually biased towards said program, perhaps unconsciously: by emphasizing the features which were important For example simple namespace for git: you can use shortened sha1 (even to only 6 characters, although usually 8 are used), you can use tags, you can use ref^m~n syntax. I'm not sure about "No" in "Supports Repository". Git supports multiple branches in one repository, and what's better supports development using multiple branches, but cannot for example do a diff or a cherry-pick between repositories (well, you can use git-format-patch/git-am to cherry-pick changes between repositories...). About "checkouts", i.e. working directories with repository elsewhere: you can use the GIT_DIR environment variable or the "git --git-dir" option, or symlinks, and if Nguyen Thai Ngoc Duy's proposal to have a .gitdir/.git "symref"-like file to point to the repository passes, we can use that. Partial checkouts are only partially supported as of now; it means you have to do some low-level stuff to do a partial checkout, and be careful when committing. BTW it depends what you mean by partial checkout, but they are somewhat incompatible with atomic commits to a snapshot based repository. Git supports renames in its own way; it doesn't use file ids, nor remember renames (the new "note" header for use e.g. by porcelains didn't pass if I remember correctly). But it does *detect* moving _contents_, and even *copying* _contents_ when requested. And of course it detects renames in merges. Git doesn't have some "plugin framework", but because it has many "plumbing" commands, it is easy to add new commands, and also new merge strategies, using shell scripts, Perl, Python and of course C. So the answer would be "Somewhat", as git has pluggable merge strategies, Gaah, subscribe-to-post mailing list! -- Jakub Narebski Warsaw, Poland ShadeHawk on #git -
I believe they mean checking out only the latest few revisions instead of copying the whole repo. This issue is a problem for Mozilla. If you want to change a line in the git version you have to download the I believe partial checkout means being able to check one directory tree out of the repo and work on it while ignoring what is happening in the rest of the repo. This is another issue for Mozilla which has -- Jon Smirl jonsmirl@gmail.com -
From http://bazaar-vcs.org/RcsComparisons A "Checkout" is a working tree that points elsewhere for its RCS data. You can always do like Linux kernel did, splitting repository into current and historical part (which would contain also dead branches), and creating and publishing current-historical graft file, to join So split different projects into different repositories. There was some helper program (git-splitrepo or something like that) for that posted on git mailing list. And use "superrepository" to gather all projects together (see last discussion about subprojects on git mailing list). -- Jakub Narebski Poland -
Bazaar's namespace is "simple" because all branches can be named by a URL, and all revisions can be named by a URL + a number. If that's true of Git, then it certainly has a simple namespace. Using That sounds right. So those branches are persistent, and can be worked It sounds like the .gitdir/.git proposal would give Git "checkouts", by Yes, I'm very much aware of that tension. It will be fun when Bazaar You'll note we referred to that behavior on the page. We don't think what Git does is the same as supporting renames. AIUI, some Git users It sounds like you're saying it's extensible, not that it supports plugins. Plugins have very simple installation requirements. They can provide merge strategies, repository types, internet protocols, new commands, etc., all seamlessly integrated. What you're describing actually sounds like the Arch approach to extensibility: provide a whole bunch of basic commands and let users build an RCS on top of that. As the author of two different Arch front-ends, I can say I haven't found that approach satisfactory. Invoking multiple commands tends to re-invoke the same validation routines over and over, killing efficiency, and diagnostics tend to be pretty poorly integrated. Aaron -
In my experience there are two key features to rename support. The first is that files move about efficiently, i.e. we don't have to carry a different copy of the same file for each name it has had; this git handles nicely. The second is the seamless following of history 'back'; this git does not do trivially (when limited to specific files). git log on a renamed file pretty much stops at the rename point and you have to deal with it yourself. I would love to see someone respond with a pickaxe-like command line which would list each and every change and its origin through merges and the like. Hmmm. -apw -
Well, all refs (branches and tags) are named by [relative] path. So for example we can have 'master', 'next', 'jc/diff' branches, 'v1.4.0' and 'examples/tag' tags. Cogito for example uses <repository URL>#<branch> Well, <ref>~<n> means <n>-th _parent_ of a given ref, which for branches (which constantly change) is a moving target. There was a proposal to add some kind of serial number to git (like Subversion revision numbers) and even a solution how to do this... but one must realize that any serial number must be _local_ to the repository. One cannot have universally valid revision numbers (even only per branch) in distributed development. Subversion can do that only because it is a centralized SCM. Global numbering and distributed nature don't mix... hence contents based sha1 as commit identifiers. But this doesn't matter much, because you can have really lightweight tags in git (especially now with packed refs support). So you can have Branches are persistent, have a _separate_ (!) namespace (are not incorporated in the repository URL according to some kind of convention like in Subversion), can be worked on independently, and you can easily switch between branches in one working directory. Branches are cheap in git (notion of topic branches). I wonder if any SCM other than git has an easy way to "rebase" a branch, i.e. cut a branch at its branching point, and transplant it to the tip of another branch. For example you work on the 'xx/topic' topic branch, and want to have the changes in that branch but applied to current work, not to the version some time ago when you started working on said feature. What your comparison matrix lacks for example is whether a given SCM saves information about branching points and merges, so you can get where two branches diverged, and when one branch was merged into Actually it is better to work with a clone of the repository, perhaps either symlinking the object database, or by the alternates mechanism (with alternates repositories would share old history, but gather new ...
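A minimal sketch of why a ref~n style name is a moving target on a branch, assuming a toy commit graph. The commit ids c1..c4 and the helper nth_ancestor are made up for illustration and are not git internals; in git itself, ~n follows the first parent n times.

```python
# Toy commit graph: commit id -> list of parents (first parent first).
parents = {
    "c1": [],
    "c2": ["c1"],
    "c3": ["c2"],
    "c4": ["c3"],
}

def nth_ancestor(commit, n):
    """Follow the first parent n times, in the spirit of <ref>~<n>."""
    for _ in range(n):
        commit = parents[commit][0]
    return commit

# A branch is just a name pointing at a tip commit.
branch = {"master": "c3"}
before = nth_ancestor(branch["master"], 2)

# A new commit lands on the branch, so the tip moves ...
branch["master"] = "c4"
# ... and the very same expression now names a different commit.
after = nth_ancestor(branch["master"], 2)
```

The same relative name resolves to c1 before the new commit and to c2 after it, which is why such names only make sense relative to a fixed point such as a tag or a full sha1.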
I agree. Each Git repository is designed to work with one working directory. Using .gitdir/.git proposal, you are likely to checkout two working directories from one repo. -- Duy -
Ah. Bazaar uses negative numbers to refer to <n>th parents, and positive numbers to refer to the number of commits that have been made Sure. Our UI approach is that unique identifiers can usefully be abstracted away with a combination of URL + number, in the vast majority The nice thing about revision numbers is that they're implicit-- no one If I understand correctly, in Bazaar, you'd just merge the current work I'm not sure what you mean about divergence. For example, Bazaar records the complete ancestry of each branch, and determining the point of divergence is as simple as finding the last common ancestor. But are you considering only the initial divergence? Or if the branches merge and then diverge again, would you consider that the point of divergence? merge-point tracking is a prerequisite for Smart Merge, which does I'm not sure what you mean by API, unless you mean the commandline. If that's what you mean, surely all unix commands are extensible in that regard. Aaron -
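The "last common ancestor" idea can be sketched on a toy ancestry graph. This is only an illustration under assumed names (ancestors, merge_bases, and the commit ids are hypothetical); it is not how either Bazaar or git implements the search.

```python
def ancestors(parents, commit):
    """All commits reachable from `commit`, including itself."""
    seen, stack = set(), [commit]
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(parents[c])
    return seen

def merge_bases(parents, a, b):
    """Common ancestors of a and b that are not ancestors of another common ancestor."""
    common = ancestors(parents, a) & ancestors(parents, b)
    return {
        c for c in common
        if not any(c != d and c in ancestors(parents, d) for d in common)
    }

# Branch A: root -> a1 -> a2.  Branch B: root -> b1 -> b2 -> b3,
# where b2 is a merge commit that pulled in a1.
parents = {
    "root": [],
    "a1": ["root"], "a2": ["a1"],
    "b1": ["root"], "b2": ["b1", "a1"], "b3": ["b2"],
}

initial = merge_bases(parents, "a2", "b1")      # before B merged anything from A
after_merge = merge_bases(parents, "a2", "b3")  # after B merged a1
```

With recorded merges, the point of divergence moves forward from root to a1 once branch B has merged a1, which is one possible answer to the "merge and then diverge again" question.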
But this only works when the URL is public. In Git I can just look up the unique SHA1 for a revision in my private repository and toss it into an email with a quick copy and paste. With Bazaar it sounds like I'd have to do that relative to some known public repository, which just sounds like more work to me. But I don't want to see this otherwise interesting thread devolve into Git has two approaches: - merge: The two independent lines of development are merged together under a new single graph node. This is a merge commit and has two parent pointers, one for each independent line of development which was combined into one. Up to 16 independent lines can be merged at once, though 12 is the record. - rebase: The commits from one line of development are replayed onto a totally different line of development. This is often used to reapply your changes onto the upstream branch after the upstream has changed but before you send your changes upstream. It can often generate more readable commit history. I believe what you are talking about in Bazaar is the former (merge) I believe you hit the nail on the head about what Jakub was talking about. And yes, I noticed it's in your matrix but it's not very clear. I think that some additional explanation there may help other readers. -- Shawn. -
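The difference between the two approaches can be sketched with a toy object store in which a commit id is a hash of its parents and its change. Everything here (store, commit, merge, rebase, first_parent_log) is a hypothetical model for illustration, not git's actual code or object format.

```python
import hashlib

store = {}  # commit id -> (parents tuple, change description)

def commit(parents, change):
    """Create a commit whose id depends on its parents and its change."""
    cid = hashlib.sha1(repr((tuple(parents), change)).encode()).hexdigest()[:8]
    store[cid] = (tuple(parents), change)
    return cid

def merge(ours, theirs):
    """Merge: one new graph node with two parent pointers."""
    return commit((ours, theirs), "merge")

def rebase(commits, onto):
    """Rebase: replay the changes of `commits` (oldest first) on top of `onto`."""
    tip = onto
    for cid in commits:
        tip = commit((tip,), store[cid][1])
    return tip

def first_parent_log(tip):
    """List the changes reached by following first parents from `tip`."""
    log = []
    while tip is not None:
        parents, change = store[tip]
        log.append(change)
        tip = parents[0] if parents else None
    return log

base = commit((), "base")
upstream = commit((base,), "upstream change")
t1 = commit((base,), "topic 1")
t2 = commit((t1,), "topic 2")

merged = merge(upstream, t2)          # keeps both lines, joined by one node
rebased = rebase([t1, t2], upstream)  # new commits, linear history
```

In this model the rebased commits get new ids because their parents changed, while the merge keeps the original topic commits and adds a single two-parent node on top.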
Yes, but then people need to know how to get it out of your private repository. For stuff that goes into well-known repositories I suppose You can also name a revision using its UUID, in which case things will For the 'rebase' operation in Bazaar you can use 'bzr graft': http://spacepants.org/src/bzrgraft/ -- Martin -
What do you do once a branch has been thrown away, or has had 20 other branches merged into it? Does the offset-number change for the revision merge != rebase though, although they are indeed similar. Let's take the example of a 'master' branch and topic branch topicA. If you rebase topicA onto 'master', development will appear to have been serial. If you instead merge them, it will either register as a real merge or, if the branch tip of 'master' is the branch start-point of topicA, it will result in a "fast-forward" where 'master' is just updated to the I'm fairly certain he's talking about the API in the sense it's being talked about in every other application. Extensive work has been made to libify a lot of the git code, which means that most git commands are made up of less than 400 lines of C code, where roughly 80% of the code is command-specific (i.e., argument parsing and presentation). -- Andreas Ericsson andreas.ericsson@op5.se OP5 AB www.op5.se Tel: +46 8-230225 Fax: +46 8-230231 -
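A small sketch of the fast-forward case described above, assuming a toy graph of commit ids. The names is_ancestor and merge_branch and the synthetic merge id are made up for illustration.

```python
def is_ancestor(parents, old, new):
    """True if `old` is reachable from `new` by following parent links."""
    stack, seen = [new], set()
    while stack:
        c = stack.pop()
        if c == old:
            return True
        if c not in seen:
            seen.add(c)
            stack.extend(parents[c])
    return False

def merge_branch(parents, branches, target, source):
    """Merge `source` into `target`, fast-forwarding when possible."""
    ours, theirs = branches[target], branches[source]
    if is_ancestor(parents, theirs, ours):
        return "already up to date"
    if is_ancestor(parents, ours, theirs):
        branches[target] = theirs          # just move the branch pointer
        return "fast-forward"
    merge_id = "merge(%s,%s)" % (ours, theirs)
    parents[merge_id] = [ours, theirs]     # a real merge: two parents
    branches[target] = merge_id
    return "merge commit"

# Case 1: the tip of master is the start point of topicA.
parents = {"a": [], "b": ["a"], "t1": ["b"], "t2": ["t1"]}
branches = {"master": "b", "topicA": "t2"}
ff = merge_branch(parents, branches, "master", "topicA")

# Case 2: master has moved on since topicA branched off.
parents2 = {"a": [], "b": ["a"], "c": ["b"], "t1": ["b"]}
branches2 = {"master": "c", "topicA": "t1"}
real = merge_branch(parents2, branches2, "master", "topicA")
```

In the first case no new commit is created at all; in the second a two-parent node records that the lines were joined.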
We always track the number of parents since the initial commit in the Ah, now I see what you mean, and the "graft" plugin mentioned by others Ah, okay. So it sounds to me like git is extensible, though not as thoroughly as bzr. Aaron -
While this I think is quite reliable (there was idea to store "generation number" with each commit, e.g. using not implemented "note" header, or commit-id to generation number "database" as a better heuristic than timestamp for revision ordering in git-rev-list output), and probably independent on repository (it is global property of commit history, and commit history is included in sha1 of its parents), numbering branching Very useful as a kind of poor-man's-Quilt (or StGit). You develop some feature step by step, commit by commit in your repository cooking it in topic branch. Then before sending it to mailing list or maintainer as a series of patches (using git-format-patch and git-send-email) you rebase it on top of current work (current state), to ensure that Fast-forward is a really good idea. Perhaps you could implement it, I think having good API for C, shell and Perl (and to lesser extent for any scripting language) means that it is extensible more. Git is not as of yet libified; when it would be we could think about bindings for other programming languages (there is preliminary Java binding/interface). -- Jakub Narebski Poland -
We support it as 'pull', but merge doesn't do it automatically, because we'd rather have merge behave the same all the time, and because 'pull' I guess it's a value judgement on which is more important to extensibility: Git has more language support. Bzr has plugin autoloading, Protocol plugins, Repository format plugins, and more. Because Python supports monkey-patching, a plugin can change absolutely anything. Aaron -
We want linear history, not polluted by merges. For example you cannot send a merge commit via email. Another problem is that you want to send a _series_ of patches, a string of commits (revisions), creating the feature part by part, with clean history; with merge you get the _final result_ which will apply cleanly, with rebase you would get that series I smell yet another terminology conflict (although this time the fault is on the git side), namely that in git terminology "pull" is "fetch" (i.e. getting changes done in the remote repository since the last "fetch" Which is _not_ a good idea. Git is created in such a way that the repository is abstracted away (introduction of the pack format, and improving the pack format, can be and was done "behind the scenes", not changing any porcelain-ish (user) commands), but we don't want any change that would change this abstraction. Changing the repository format is not a good idea for "dumb" protocols; the native protocol is quite extensible (for example the multi-ack extension was introduced for better downloading of multiple branches with a smaller number of objects in the pack sent; even earlier thin packs were introduced), and does a kind of feature detection between client and server. Adding cURL based FTP read-only support to existing HTTP support was a matter of a few lines, if I remember correctly. Besides, if monkey-patching is something akin to advices, I guess that performance might suffer. To make a perhaps not that good analogy: in git adding new commands is like adding a new filesystem to the Linux kernel using the existing VFS interface, or the existing FUSE/LUFS interface. In Bazaar adding a new command is like writing new filesystem support (plugin) in a microkernel like L4/Mach. (And please take note of what project git was created for :-)) -- Jakub Narebski ShadeHawk on #git Poland -
Yes, that's something that I'd heard about the kernel development methodology-- that a series of small patches is preferred to one patch that makes the whole change. That's not the way we operate. We like to review all the changes at once. But because bundles are applied with a 'merge' command, not a 'patch' command, an old bundle will tend to apply more cleanly than an I'm not sure what you think Bazaar does. In Bazaar, a repository format plugin implements the same API that a native repository format does. I can't parse this. Repository formats and protocols are different I was meaning dumb protocol extension. I can't say how extensible the We support read and write over native, ftp and WebDAV (a plugin). We No, monkey-patched code executes at the same speed as unpatched code. There are arguments against monkey-patching, but speed is not one of them. Aaron -
Perhaps it would be nice to have "bundles" in git too. As of now we can save an arbitrary part of history in a pack, but it is a binary, not textual, representation. Some of the git workflow stems from the old, pre-SCM Linux kernel workflow of sending _patches_ via email. By the way, are bzr "bundles" compatible with ordinary patch? git-format-patch patches are. They have additional metainfo, But if I remember correctly Subversion does not remember merge points (merge commits), so how can you provide full Bazaar-NG compatibility with a Subversion repository as backend? Some repository formats lack some features. Besides, as I said the repository database and stuff is quite well abstracted away. In git we have import tools (most of them capable of incremental import), a few exchange tools like git-cvsexportcommit, git-cvsserver, and "Dumb" protocols in git are protocols for which the server provides access to the contents of the git repository plus some additional info (usually generated using hooks). The client (be it git-fetch or git-push) discovers which files to download or what to upload, but it can only download the repository "as is". So if the server repository was created with a repository format plugin, Native git protocol (git:// and git+ssh://) does feature discovery, then negotiates what contents have to be sent, and finally tries to send minimal Git has read-only access over the git:// protocol (served by git-daemon on port 9418), read-write access over the git+ssh:// protocol (you can limit exposure using git-shell), read-only access via HTTP, HTTPS, FTP "dumb" protocols, read-write access via the WebDAV "dumb" protocol. Git is open-source, we don't need plugins ;-) -- Jakub Narebski ShadeHawk on #git Poland -
And the deprecated rsync:// "dumb" protocol: read-only (I think), and suggested for use only when cloning. -- Jakub Narebski Poland -
Actually, the reason to _not_ have bundles very much stems from the fact that BK did have bundles, and they were pretty horrid. It would be easy to send the exact same data as the native git protocol sends over ssh (or the git port) as an email encoding. We did that a few times with BK (there it's called "bk send" and "bk receive" to pack and unpack those things), and after doing it about five times, I absolutely refused to ever do it again. There's just no point, except to make your mailbox grow without bounds, and it was really annoying. So sending things as patches is just a lot more convenient if you want emails. And if you want to sync two repos directly, I think we've gotten sufficiently past the old UUCP days when you want to use email as a packetization medium. That said, "bundles" certainly wouldn't be _hard_ to do. And as long as nobody tries to send _me_ any of them, I won't mind ;) Linus -
I never used BK, but my understanding is that it was based on changesets, so a bundle was a group of changesets. Because a git commit represents the entire tree state, how can we avoid sending the entire tree in each bundle? The interactive protocols can ask "what do you have?" but an email bundle is presumably meant to work without a round trip. We could always make a guess ("git send --remote-has master~10") but that seems awfully error-prone. I assume a changeset-oriented system would implicitly keep some concept of "I think Linus is at master~10" and do it automatically. -Peff -
We could always anchor at a well known point ("git send v2.6.18.."). If you as the recipient do not have the preimage, the "bundle" would identify what the assumed common ancestor is and you can fetch it before proceeding. -
That's not the problem. That's easy to handle - and we already do. That's the whole point of the wire-transfer protocol (ie sending deltas, and only Right, but they can do exactly what bk did: you have to have a reference to what the other side has. In git, that's usually even simpler: you'd do git send origin.. and that "origin" is what the other end is expected to already have. Of course, if you send an unconnected bundle (ie you give an origin that the other end _doesn't_ have), you're screwed. In other words, to get such a pack, we'd _literally_ just do something like git-rev-list --objects-edge origin.. | git-pack-objects --stdout | uuencode and that would be it. You'd still need to add a "diffstat" to the thing, and tell the other end what the current HEAD is (so that it knows what it's supposed to fast-forward to), but it _literally_ is that simple. "plug-in architecture" my ass. "I recognize this - it's UNIX!". Linus -
Dear diary, on Wed, Oct 18, 2006 at 04:52:25PM CEST, I got a letter Took me exactly an hour from mkdir cogito-bundle to cg-push to kernel.org. :-) cogito-bundle is an example on how to create third-party addons or plugins adding own commands to Cogito and using Cogito's infrastructure. It's not _that_ easy currently since you have to replicate large part of the build infrastructure locally; that could be fixed by installing some "library makefiles" and asciidoc toolkit to /usr/share or something, if there would be a real demand for such an addon API. cg-help and the cg wrapper will pick up the newly installed commands automagically. The only thing missing is updating cogito(7) to list the addon commands, which would take a bit more work. Though it's an example, it's actually supposed to be useful, by doing exactly what is outlined above - l - it lets you exchange commits over mail by so-called "bundles", similar to e.g. Bazaar bundles - basically, it is like push or fetch, but over email, and the commit ids are preserved when transferred in bundles (if you just send patches, the commit ids will end up different). The provided cg-bundle and cg-unbundle commands are rather crude and don't support many things - they don't actually include a diff, only a diffstat, etc. The uuencoded bundle is inlined in the mail, which I suspect isn't very useful; perhaps it would be more practical to just attach it binarily. Feel free to send patches (or bundles ;). An example bundle is available at http://pasky.or.cz/~pasky/cp/example-bundle.txt as generated by cogito.master$ cg-bundle -r v0.18 -m"Subject is this" \ -m"And some body now..." --stdout and cogito-bundle is available at git://git.kernel.org/pub/scm/cogito/cogito-bundle.git/ (gitweb http://kernel.org/git/?p=cogito/cogito-bundle.git) -- Petr "Pasky" Baudis Stuff: http://pasky.or.cz/ #!/bin/perl -sp0777i<X+d*lMLa^*lN%0]dsXx++lMlN/dsM0<j]dsj $/=unpack('H*',$_);$_=`echo ...
Dear diary, on Wed, Oct 18, 2006 at 08:52:25PM CEST, I got a letter By the way, originally I just wanted to index and save the pack, but when trying to feed it to git-index-pack, I kept getting fatal: packfile '.git/objects/pack/pack-b2ab684daebea5b9c5a6492fa732e0d2e1799c8e.pack' has unresolved deltas while feeding it to git-unpack-objects works fine. Any idea what's wrong? (BTW, I got the id by sha1summing the pack file; is there an existing way to name a pack properly if I have it lying around, unnamed? sha1sum seems to be specific to a fairly new GNU coreutils version.) -- Petr "Pasky" Baudis Stuff: http://pasky.or.cz/ #!/bin/perl -sp0777i<X+d*lMLa^*lN%0]dsXx++lMlN/dsM0<j]dsj $/=unpack('H*',$_);$_=`echo 16dio\U$k"SK$/SM$n\EsN0p[lN*1 lK[d2%Sa2/d0$^Ixp"|dc`;s/\W//g;$_=pack('H*',/((..)*)$/) -
Yes. You told the pipeline, with --objects-edge, to create a thin pack. By definition that is _not_ indexable. -
Ah true. I missed the "thin" pack. Any idea why we should still prevent this? It is not like it was a technical limitation. Nicolas -
It still is in sha1-file.c; or at least the last time I looked at that code. The base is always resolved from the same pack/index as the delta. If you fix sha1-file.c sure, I don't see why you can't allow indexing thin packs. -- Shawn. -
If there are advantages to do so then maybe. That would be for another day though, as I've been burned a bit with packs recently. Nicolas -
I guess its my turn then to work in the mmap window code, huh? :-) -- Shawn. -
There are bigger reasons to _never_ allow packs to contain deltas to outside of themselves: - there's no point. If you have many small packs, you're doing something wrong. The whole _point_ of packs is to put things into the same file, so that you can avoid the filesystem overhead. And once packs are big and few, the advantage of having deltas to outside the pack is basically zero. - it's a bad design. Self-sufficient packs means that a pack is a "safe" thing. When the index says that it contains an object, then it damn well contains it. In contrast, if you had packs that only contained a delta, and the pack needed some _other_ pack (or loose object) to actually generate that object, then it's not safe any more. You could end up with a situation where you get two packs from two different sources, and they contain deltas to _each_other_, and you have no way of actually generating the object itself any more. (Or you end up having to have rules to figure out when you have a loop, and stop looking just in the packed files, and start looking for loose objects instead) In other words, it has potentially _serious_ downsides. So DAMMIT! Stop looking to make the data structures worse. The fact is, the git data structures are FINE. They are well-designed. They work well. There's no _point_ in changing them, especially since changing them seems to be all about making things less reliable for dubious gain. One of the advantages of git is that you can explain things with object relationships, and that the file format is stable as _hell_. That's a GOOD thing. Please realize that if you want to change the file formats, you'd have to have a hell of a better reason for it than "just because I can". Please. Really. So next time somebody suggests a new pack-format, ask yourself: - does it save disk-space by 50% or more? - does it drop memory usage by 50% or more? - does it improve performance by 50% ...
That and all of the other reasons you cited in your message are why I haven't finished trying to use some sort of dictionary based compression for packing objects. On the other hand we've already seen how packs >1.5 GiB in size (certainly well within the 4 GiB limitation in the current index file format) cannot be repacked by git-repack-objects on a 32 bit address space as the entire pack file is mmap'd on one shot. After the kernel space of ~1 GiB and the pack file at ~1.5 GiB there's very little address space left for the application code. My comment that you quoted was about mmap'ing the pack files in large chunks (around 64-128 MiB at a time, but configurable from .git/config) rather than as an entire massive mapping. It had absolutely nothing to do about changing the pack file format, the index format, or any other on disk format. Although it would add a new pair of configuration options to .git/config. Is that change too radical? :-) With such a change the Git and Linux kernel repositories would both still mmap in one chunk but much larger projects like Mozilla or very large pack files coming out of git-fastimport would actually be usable on 32 bit architectures without running into address space limitations so quickly. Git would also be slightly more usable for some people who have a lot of very uncompressable data stored in Git. Unless of course you are actively working on a fix for the Linux kernel so that we can actually have all 4 GiB of virtual address space available for the userspace git-repack-objects process. Or have some sort of secret plan to upgrade everyone who uses Git to 64 bit processors which support 64 bit address spaces... -- Shawn. -
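The windowed mapping idea can be sketched in a few lines: instead of mapping a whole file, map one aligned window at a time and read through it. This is an illustration only; read_through_window and its parameters are hypothetical names, the 32 MiB default is an arbitrary assumption, and the real change under discussion would be C code inside git.

```python
import mmap
import os

def read_through_window(path, offset, length, window_size=32 * 1024 * 1024):
    """Read `length` bytes at `offset`, mapping only one aligned window at a time."""
    gran = mmap.ALLOCATIONGRANULARITY
    # Window starts must be multiples of the allocation granularity.
    window_size = max(gran, (window_size // gran) * gran)
    file_size = os.path.getsize(path)
    out = bytearray()
    with open(path, "rb") as f:
        while length > 0 and offset < file_size:
            win_start = (offset // window_size) * window_size
            win_len = min(window_size, file_size - win_start)
            with mmap.mmap(f.fileno(), win_len,
                           access=mmap.ACCESS_READ, offset=win_start) as window:
                rel = offset - win_start
                chunk = window[rel:rel + length]
            out += chunk
            offset += len(chunk)
            length -= len(chunk)
    return bytes(out)
```

At most one window's worth of address space is mapped at any moment, regardless of the file size. A real implementation would presumably keep several windows open and reuse them rather than remapping on every read.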
I wonder what you would need the configuration options for. If mmap() pack works well, it works well, and if it is broken nobody has reason to enable it. The code should be able to adjust the mmap window to appropriate size itself and its automatic adjustment does not even have to be the absolute optimum (since the user would not know what the optimum would be anyway), so maybe your configuration options would not be "enable" nor "window-size" -- and I am puzzled as to what they are. -
All very true. However what do we do about the case where we mmap over 1 GiB worth of pack data (because the mmap succeeds and we have at least that much in .pack and .idx files) and then the application starts to demand a lot of memory via malloc? At some point malloc will return NULL, xmalloc will die(), and that's the end of the program. If the user was able to set the maximum threshold of how much data we mmap then they could initially prevent us from mmap'ing over 1 GiB; instead using a smaller upper limit like 512 MiB. Of course as I write this I think the better solution to this problem is to simply modify xmalloc (and friends) so that if the underlying malloc returned NULL and we have a large amount of stuff mmap'd from packs we try releasing some of the unused pack windows and retry the malloc before die()'ing. The other configuration option is the size of the mmap window. This should by default be at least 32 MiB, probably closer to 128 MiB. But it's nice to be able to force it as low as a single system page to set up test cases in the t/ directory for the mmap window code. Earlier this summer we discussed this exact issue and said this value probably needs to be configurable if only to facilitate the unit tests. -- Shawn. -
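The "release unused windows and retry before dying" idea can be modeled as follows. This is a simulation under stated assumptions: FakeAddressSpace and alloc_with_retry are hypothetical stand-ins, and MemoryError plays the role of malloc returning NULL.

```python
class FakeAddressSpace:
    """Simulated address space shared by heap allocations and mapped pack windows."""

    def __init__(self, limit, unused_windows):
        self.limit = limit
        self.windows = list(unused_windows)  # sizes of currently unused pack windows
        self.heap = 0

    def alloc(self, size):
        if self.heap + sum(self.windows) + size > self.limit:
            raise MemoryError("out of address space")
        self.heap += size
        return size

    def release_unused_windows(self):
        """Unmap all unused windows; return how much was freed."""
        freed = sum(self.windows)
        self.windows.clear()
        return freed

def alloc_with_retry(alloc, release_unused_windows, size):
    """Try to allocate; on failure free unused windows once and retry."""
    try:
        return alloc(size)
    except MemoryError:
        if release_unused_windows() == 0:
            raise  # nothing left to give back: this is the die() case
        return alloc(size)

space = FakeAddressSpace(limit=100, unused_windows=[40, 40])
got = alloc_with_retry(space.alloc, space.release_unused_windows, 50)
```

The first attempt fails because the windows occupy 80 of 100 units; after releasing them the retry succeeds. A second oversized request would fail for good, since no windows remain to release.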
I see. So you are allowing users to control individual window size and total mmap memory. That makes sense. -
Sure. I agree that we should do that, if only because it's clearly getting hard to handle large pack-files on a 32-bit architecture. You just seemed to say that in the _context_ of wanting to support having multiple pack-files open (in order to allow deltas to refer to things outside their own pack-file). I just wanted to head that particular idea off at the pass. I think thin packs have been a good idea, and they certainly cut the amount of data sent over the network down by a large amount (much more than 50%), so I think thin packs are a great idea. Just _not_ when indexed. So I don't object to mmap windows at all. I object to them only in the context of "they would allow us to use deltas between two different packs" discussion ;) Linus -
Having mmap windows or not has no impact on using deltas between packs. We already map multiple packs at once. We just don't do delta resolution between them, for the reasons you have already given. The two are totally unrelated. I apologize for somehow making you (and others) think they are. -- Shawn. -
Ah, I feel quite behind. I was about to say "oh have you been pushing with --thin option?", and then realized that we made it default since late March this year. I need to run memtest86 on myself X-<. -
Remember what I said earlier: "If there are advantages to do so then To me this is the real killer. Shawn was talking about a different issue though. Nicolas -
Actually there is a point to storing thin packs. When I pull from a remote repo (or push to a remote repo) a huge number of objects and the target disk that is about to receive that huge number of loose objects is slooooooooow I would rather just store the thin pack than store the loose objects. Ideally that thin pack would be repacked (along with the other existing packs) as quickly as possible into a self-contained pack. But that of course is unlikely to happen in practice; especially Yes, it does. But it could also be useful when you fetch 20k+ objects onto a Windows system or push 1k+ objects onto the slowest NFS system I have ever seen... where writing file data (aka packs) is reasonable but creating or deleting files takes nearly 1 second per file. I don't want to kill the better part of an hour waiting for a push to complete! -- Shawn. -
I'm really nervous about keeping thin packs around. But a possibly good (and fairly simple) alternative would be to just create a non-thin pack on the receiving side. Right now we unpack into a lot of loose objects, but it should be possible to instead "unpack" into a non-thin pack. In other words, we could easily still use the thin pack for communication, we'd just "fill it out" on the receiving side. Linus -
Funny, I had the same thought. :-) We already know how many objects are coming in on a thin pack; it's right there in the header. We could just have some threshold at which we start writing a full pack rather than unpacking. Writing such a full pack would be a simple matter of copying the input stream out to a temporary pack, but sticking any delta bases into a table in memory. At the end of the data stream, if we have any delta bases which weren't actually in that pack, then find them and copy them onto the end, update the header and recompute the checksum. git-fastimport does some of that already, though it's trivial code... Worst case scenario would be the incoming thin pack is 100% deltas, as we would need to copy in a base object for every object mentioned in the pack. -- Shawn. -
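The bookkeeping Shawn describes (remember every delta base, then append the ones that never showed up) can be sketched roughly as below. This is only an illustration: `missing_bases`, the `(object_id, base_id)` tuples and the `have_locally` set are hypothetical names, not git code, and the sketch assumes the id of every object in the pack is already known, which the rest of the thread shows is the hard part for delta objects.

```python
def missing_bases(entries, have_locally):
    """Return delta bases that must be appended to complete a thin pack.

    entries: iterable of (object_id, base_id) in pack order; base_id is
             None for a full (non-delta) object.  (Hypothetical layout.)
    have_locally: set of object ids available in the receiving repository.
    """
    in_pack = set()
    wanted = []  # delta bases, in first-seen order, without duplicates
    for object_id, base_id in entries:
        in_pack.add(object_id)
        if base_id is not None and base_id not in wanted:
            wanted.append(base_id)

    # Bases that never appeared in the pack itself must come from elsewhere.
    missing = [base for base in wanted if base not in in_pack]

    # A base that is neither in the pack nor in the repository is an error.
    dangling = [base for base in missing if base not in have_locally]
    if dangling:
        raise ValueError("unresolvable delta bases: %r" % dangling)
    return missing
```

In the worst case Shawn mentions (a pack that is 100% deltas against outside objects), the returned list has one entry per distinct base.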
It should not be hard to write another program that generates a packfile like pack-objects does but taking a thin pack as its input. Then receive-pack can drive it instead of unpack-objects. -
Give me half an hour. It should be trivial to make "unpack-objects" write the "unpacked" objects into a pack-file instead. Linus -
Heh, three people having the same idea that goes in the same direction at the same time is not necessarily a good sign of efficient project management... I am currently fighting with FC5 so please go ahead. -
Or maybe it is just a sign of a good way to resolve the issue I was raising. :-) -- Shawn. -
If you use builtin-unpack-objects.c from next, you'll be able to generate the pack index pretty easily as well, as all the needed info is stored in the obj_list array. Just need to append objects remaining on the delta_list array to the end of the pack, sort the obj_list by sha1 and write the index. Pretty trivial indeed. Nicolas -
Hi, Easy! You take all the fun out of it! Ciao, Dscho -
Actually, I've hit an impasse. The index isn't the problem. The problem is actually writing the resultant pack-file itself in one go. The silly thing is, the pack-file contains the number of entries in the header. That's a silly problem, because the _natural_ way to turn a thin pack into a normal pack would be to just add the missing objects from the local store into the resulting pack. But we don't _know_ how many such missing objects there are, until we've gone through the whole source pack. So you can't easily do a streaming "write the result as you go along" version using that approach. So there's _another_ way of fixing a thin pack: it's to expand the objects without a base into non-delta objects, and keeping the number of objects in the pack the same. But _again_, we don't actually know which ones to expand until it's too late. The end result? I can expand them all (I have a patch that does that). Or I could leave as deltas the ones I have already seen the base for in the pack-file (I don't have that yet, but that should be a SMOP). But I'm not very happy with even the latter choice, because it really potentially expands things that didn't _need_ expansion, they just got expanded because we hadn't seen the base object yet. So I'll happily send my patches to anybody who wants to try (I don't write the index file yet, but it should be easy to add), but I'm getting the feeling that "builtin-unpack-objects.c" is the wrong tool to use for this, because it's very much designed for streaming. It would probably be better to start from "index-pack.c" instead, which is already a multi-pass thing, and wouldn't have had any of the problems I hit. So it's conceptually totally trivial to rewrite a pack-file as another pack-file, but at least so far, it's turned out to be less trivial in practice (or at least in a single pass, without holding everything in memory, which I definitely do _not_ want to do). So I'm leaving this for today, and ...
A potentially even simpler way would probably be to literally just use "git-pack-objects" directly, and just have a very special mode that allows mapping the thin pack as if it was a real pack (ie basically pre-populating a fake pack entry, where the fake part comes from adding the missing objects by hand to the mapping). So many ways to do it, so little real motivation ;) Linus -
Hi, You do not write this to stdout, right? Why not just come back and correct the number of objects? Of course, the SHA1 has to be calculated _after_ that. Ciao, Dscho -
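What Dscho suggests (go back, patch the object count, and only then compute the checksum) might look roughly like this. It is a sketch on in-memory bytes rather than a seek on a real file; `build_pack` and `fix_up_count` are made-up helper names, and the pack body is treated as opaque bytes rather than real object encodings. The only format facts relied on are the 12-byte header ("PACK", version, object count, both big-endian 32-bit) and the trailing 20-byte SHA-1 over everything before it.

```python
import hashlib
import struct


def build_pack(body, count, version=2):
    """Assemble header + body and append the SHA-1 trailer."""
    header = b"PACK" + struct.pack(">II", version, count)
    data = header + body
    return data + hashlib.sha1(data).digest()


def fix_up_count(pack, new_count, appended=b""):
    """Append extra object data, patch the count, recompute the trailer."""
    if pack[:4] != b"PACK":
        raise ValueError("not a pack file")
    version, _old_count = struct.unpack(">II", pack[4:12])
    body = pack[12:-20] + appended  # drop the stale trailer
    return build_pack(body, new_count, version)
```

The cost is the one Linus points out in his reply: the checksum can no longer be computed while streaming, so the whole file has to be read again after the header is patched.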
That's the issue. I wanted the pack-file thing to look as similar to the
old code as possible. And that means using the "sha1write()" interfaces,
which calculate the SHA1 checksum _as_ we write.
So yes, I wanted to do it all in one phase.
Anyway, if anybody is interested, here's a series of four patches that do
something that _almost_ works. I save away the SHA1's and the offsets so
that I could write an index too, but I didn't actually do that part.
But with this, I can rewrite a pack-file "in flight", and the end result
can then have "git index-pack" run on it, and used as a pack. It's just
that there are no deltas left because of some of the silly problems I
outlined (the code to write out deltas is actually there and just
commented out - it works, but it leaves the end result with unsatisfied
deltas again).
Linus
---
commit 4efd9b0f44635b3075c9aad6d1cc8830e3abded3
Author: Linus Torvalds <torvalds@osdl.org>
Date: Wed Oct 18 17:22:04 2006 -0700
Fix up csum-file interfaces
Add "const" where appropriate
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
diff --git a/csum-file.c b/csum-file.c
index b7174c6..3237228 100644
--- a/csum-file.c
+++ b/csum-file.c
@@ -47,7 +47,7 @@ int sha1close(struct sha1file *f, unsign
return 0;
}
-int sha1write(struct sha1file *f, void *buf, unsigned int count)
+int sha1write(struct sha1file *f, const void *buf, unsigned int count)
{
while (count) {
unsigned offset = f->offset;
@@ -115,7 +115,7 @@ struct sha1file *sha1fd(int fd, const ch
return f;
}
-int sha1write_compressed(struct sha1file *f, void *in, unsigned int size)
+int sha1write_compressed(struct sha1file *f, const void *in, unsigned int size)
{
z_stream stream;
unsigned long maxsize;
@@ -127,7 +127,7 @@ int sha1write_compressed(struct sha1file
out = xmalloc(maxsize);
/* Compress it */
- stream.next_in = in;
+ stream.next_in = (void *) in;
stream.avail_in = size;
stream.next_out = ...
-
Hmmm.... unpack-objects receives a (possibly thin) pack over its stdin. That part has to be streamed. But its output is currently always written to multiple files as separate objects. So, while the input comes from a stream, the output doesn't have to. In that case, why not just write the input directly to a temporary file, append the missing objects, seek back to adjust the object number, and finally run a SHA1_Update() on the whole thing? This forces you to write everything and then read everything back, but this should not be too bad, especially since the written data is likely to still be cached. Once its final sha1sum is written then it just needs to be moved with the Most base objects, well all of them nowadays, are written before their deltas. So in practice the only objects that will get expanded are the But index-pack is totally incompatible with any streaming. It mmap()s the whole pack and happily performs random accesses. So you'd need to write the entire thin pack to disk anyway before it could work on it. This is not really better than the unpack-objects option. At least unpack-objects is structured to perform work on the fly as data is I'll have a look at your patches tomorrow as well. I have many ideas brewing, including rendering index-pack obsolete, since actually unpack-objects could do it all already (both tools have many concepts in common). Nicolas -
pack-objects.c::write_one() makes sure that we write out base immediately after delta if we haven't written out its base yet, so I suspect if you buffer one delta you should be Ok, no? -
If we create full packs out of thin packs the base objects will end up at the end of the pack so this assumption is a bad one to rely upon if we want to make things robust (like being able to feed such a pack back). Nicolas -
It doesn't matter. I realized that my bogus patch to unpack-objects was more seriously broken anyway: even the "un-deltify every single object" was broken. And that's despite the fact that I _tested_ it, and verified the end result by hand. Why? Because I tested it within one repo, by just piping the output of git-pack-objects --stdout directly to the repacker. That seemed to be a good way to test it without setting up anything bigger. But it turns out that it misses one of the big problems: if you don't unpack the objects in a way that later phases can read, none of the streaming code works at all, and you have to buffer up _everything_ in memory just to be able to read any previous _non_delta objects too. So my patch-series works - but it only works in a repo that already has all the objects in question, because then it can look up the objects in the original database. Which makes it useless. Duh. So forget about unpack-objects. It's designed to be streaming (and it's a _good_ design for what it does), but repacking really cannot be done that way. Repacking needs to be done by saving the thin pack to disk, and then doing a multi-pass over it (like git-index-pack does, for example). Just throw my patch away. It's not even useful as a basis for anything else, unless you want to use it as a way to keep all the objects in memory and use the "unpack-objects" logic to just _parse_ the incoming pack. I suspect using "index-pack" is saner (since it already has the multi-pass logic), or just doing something that maps all the objects in memory, and then calls builtin-pack-objects once it has set up the new thin pack so that others can see/use the new objects without realizing that they aren't in the canonical pack-format. Linus -
You are correct that it is not possible to create a pack with all objects expanded in a single pass. But that doesn't mean that a single pass conversion to a full pack is impossible. If we find a delta against a base that is not found in our repository we can keep it as a delta, the base should show up later on in the thin-pack. Whenever we find a delta against a base that we haven't seen in the received part of the thin pack, but is available from the repository we should expand it because there is a chance we may not see About that patch series, is there a simple way to import the series into a local repository? git-am doesn't like it, even after splitting it into separate files on the linebreaks. I guess git-mailinfo could be taught to recognise the git-log headers. Or have I missed some useful git apply trick. Jan -
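The per-delta decision Jan proposes here can be written down as a small function. This is a sketch of the rule as stated in the message, not of the actual patch; `decide`, `seen_in_pack` and `in_repository` are illustrative names.

```python
def decide(base_id, seen_in_pack, in_repository):
    """Decide what to do with a delta while streaming a thin pack.

    seen_in_pack:  ids of objects already received in this pack.
    in_repository: ids of objects the receiving repository already has.
    """
    if base_id in seen_in_pack:
        # The base is already part of the output pack: the delta is safe.
        return "keep"
    if base_id in in_repository:
        # The base may never arrive in this pack, so store a full object.
        return "expand"
    # Not here and not seen yet: the base has to show up later in the
    # pack, otherwise the delta is dangling and the pack is broken.
    return "keep-pending"
```

A caller would still have to check, once the stream ends, that every "keep-pending" base did arrive; the function only covers the streaming decision.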
Yes, indeed. We can also have another heuristic: if we find a delta, and we haven't seen the object it deltas against, we can still keep it as a delta IF WE ALSO DON'T ALREADY HAVE THE BASE OBJECT. Because then we know that the base object has to be there later in the pack (or we have a dangling delta, which we'll just consider an error). So yeah, maybe my patch-series is something we can still save.
However, the thing that makes me suspect that it is _not_ saveable, is this:
- let's assume we have a nice thin pack, with object A B C D (in that order), which is actually a good pack in itself (ie it _might_ be thin, but it's actually self-sufficient)
- let A be a full object, and B be packed as a delta off A, C as a delta off B, and D as a delta off C.
- Try to repack it as a streaming thing (the end result _should_ obviously be exactly the same as the input, since it turns out to be self-sufficient)
Looks trivial, no? The answer is: no. It's not trivial. Or rather, it _is_ trivial, but you have to _remember_ all of the actual data for A, B, C and D all the way to the end, because only if you have that data in memory can you actually _recreate_ B, C and D even enough to get their SHA1's (which you need, just in order to know that the pack is complete, much less to be able to create a non-delta version in case it hadn't been). So we can definitely do the one-pass creation, but it requires that we keep track of everything we've expanded so far in memory (because we won't have the data available any other way - we don't have them as objects in our object database, and we don't have a good new pack yet).
No, you've not missed anything. I didn't really expect anybody to want to seriously play with it, so I didn't bother to do things properly. Especially since I hadn't even written very good commit messages. Anyway, I just pushed the "rewrite-pack" branch to my git repo on kernel.org, so once it mirrors out, if you ...
It looks like you were really close. When we cannot resolve a delta, we just write it to the packfile and we don't queue it. If it can be resolved we write it as a full object. The only thing that cannot be reliably tracked is the pack index information. The offsets are trivial, but we cannot calculate the SHA1 for a delta without applying it to its base; if the base comes later the existing code could do it, but if it has already been written to the pack we can't easily track back. And why add all the extra complexity? Running git-index-pack after git-unpack-objects --repack not only generates the correct index without a problem, it also serves as an extra consistency check and we keep this code isolated from any possible future changes to the index file format. I'll try to follow this up with 2 patches, one is an almost trivial change to your code that makes it write out a pack with all full objects and resolvable deltas converted to full objects; any unresolved deltas are expected to be relative to some other object in the same pack. The rewritten pack is indexed correctly even when I run git-unpack-objects in a repository that does not contain any of the objects in the thin-pack. Of course it also works when the objects are available, but the resulting full pack is considerably bigger since we can find a suitable base for Only if you want to build the index at the same time, we don't need to I think I still left quite a bit of the mess unfixed. Jan -
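Why the SHA1 of a delta cannot be had without its base follows from how git names objects: the id is the SHA-1 of a small header plus the full, undeltified content. The sketch below shows that; `git_object_id` follows the well-known "type size NUL content" scheme, while `apply_toy_delta` is a deliberately simplified stand-in (append-only) and not git's real delta encoding.

```python
import hashlib


def git_object_id(obj_type, data):
    """SHA-1 of '<type> <size>\\0' followed by the full object content."""
    header = ("%s %d" % (obj_type, len(data))).encode("ascii") + b"\0"
    return hashlib.sha1(header + data).hexdigest()


def apply_toy_delta(base, suffix):
    """Toy delta: copy the whole base, then append some bytes."""
    return base + suffix


def delta_object_id(obj_type, base, suffix):
    # The id can only be computed once the base content is in hand.
    return git_object_id(obj_type, apply_toy_delta(base, suffix))
```

So for the A, B, C, D chain discussed above, naming D requires the expanded content of C, which requires B, which requires A, all kept around or re-read from somewhere.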
If I understand correctly, if we see an unresolvable delta, we are just making the assumption that its base has arrived (or will arrive) in the same pack (without checking). This means that we could end up with a corrupted repository if the sender gives us a bad pack. I believe that git's network interaction has been designed specifically to avoid such possibilities (e.g., verifying completeness and integrity of downloaded objects). -Peff -
The resulting pack should be correct if we have the base somewhere else in
the received pack, if we didn't have the base the received pack would be
faulty and can't be unpacked as loose objects either.
The internal pack index information is not updated correctly anymore.
Signed-off-by: Jan Harkes <jaharkes@cs.cmu.edu>
---
builtin-unpack-objects.c | 5 ++++-
1 files changed, 4 insertions(+), 1 deletions(-)
diff --git a/builtin-unpack-objects.c b/builtin-unpack-objects.c
index f139308..b95c93c 100644
--- a/builtin-unpack-objects.c
+++ b/builtin-unpack-objects.c
@@ -246,7 +246,10 @@ static void unpack_delta_entry(unsigned
}
if (!has_sha1_file(base_sha1)) {
- add_delta_to_list(base_sha1, delta_data, delta_size);
+ if (pack_file)
+ write_pack_delta(base_sha1, delta_data, delta_size);
+ else
+ add_delta_to_list(base_sha1, delta_data, delta_size);
return;
}
base = read_sha1_file(base_sha1, type, &base_size);
--
1.4.2.1
-
Tracking the offsets is not that hard, but calculating the sha1 for the
deltas is tricky, we may have already seen and written out the base we
need. So it is actually easier to avoid the complexity altogether and
rely on git-index-pack to rebuild the index. The indexing step is also a
useful validation whether the final pack contains a base for every delta.
Signed-off-by: Jan Harkes <jaharkes@cs.cmu.edu>
---
builtin-unpack-objects.c | 57 +++++++++++-----------------------------------
1 files changed, 14 insertions(+), 43 deletions(-)
diff --git a/builtin-unpack-objects.c b/builtin-unpack-objects.c
index b95c93c..3df7938 100644
--- a/builtin-unpack-objects.c
+++ b/builtin-unpack-objects.c
@@ -89,29 +89,6 @@ static void *get_data(unsigned long size
}
static struct sha1file *pack_file;
-static unsigned long pack_file_offset;
-
-struct index_entry {
- unsigned long offset;
- unsigned char sha1[20];
-};
-
-static unsigned int index_nr, index_alloc;
-static struct index_entry **index_array;
-
-static void add_pack_index(unsigned char *sha1)
-{
- struct index_entry *entry;
- int nr = index_nr;
- if (nr >= index_alloc) {
- index_alloc = (index_alloc + 64) * 3 / 2;
- index_array = xrealloc(index_array, index_alloc * sizeof(*index_array));
- }
- entry = xmalloc(sizeof(*entry));
- entry->offset = pack_file_offset;
- hashcpy(entry->sha1, sha1);
- index_array[nr++] = entry;
-}
static void write_pack_delta(const unsigned char *base, const void *delta, unsigned long delta_size)
{
@@ -122,11 +99,9 @@ static void write_pack_delta(const unsig
sha1write(pack_file, header, hdrlen);
sha1write(pack_file, base, 20);
datalen = sha1write_compressed(pack_file, delta, delta_size);
-
- pack_file_offset += hdrlen + 20 + datalen;
}
-static void write_pack_object(const char *type, const unsigned char *sha1, const void *buf, unsigned long size)
+static void write_pack_object(const void *buf, unsigned long size, const char *type, const unsigned char *sha1)
{
...
-
I don't think it is a good idea. After looking at the problem for a while I should side with Linus. unpack-objects is not the proper tool for the job. The way to go is to make input to index-pack streamable. This patch in particular creates additional restrictions on pack files that were not present before. And I don't think this is a good thing. This patch imposes an ordering on REF_DELTA objects that doesn't need to exist. Say for example that an OFS_DELTA depends on an object which is a REF_DELTA object. With this patch any pack with the base for that REF_DELTA stored after the OFS_DELTA object will be broken. And to really do thin pack fixing properly we really want to just append missing base objects at the end of the pack, which falls in the broken case above. Nicolas -
I agree. By the way, it is rather rare for us to see a NAK on this list. I'd welcome to see more of them ;-). -
I don't see where it imposes any ordering. If we see a complete object it will remain complete. If we find a delta, and we have the base in the current repository, it will be expanded to a complete object. When we get a delta that doesn't have a base in the current repository it will remain unresolved and is written out as a delta. So the output pack will never contain more deltas than the input. btw. I don't really know what OFS_DELTA and REF_DELTA objects are, I grepped the source and found no references to either. I can only find an OBJ_DELTA. But if any of the deltas depend on an object that is not in the thin pack, the base has to be available in the current repository and as such it will be expanded to a full object, replacing the possibly external delta reference with an internal base object. If the base is not found in the current repository the base has to be another object in the original thin pack so we can write out the delta as is. There is no before or after decision here. We don't look back in the thin pack, and we don't have to look forward either. So I don't understand why your example would break or not depending on if the base I guess I'll grep through the mailing lists to try to figure out what these OFS and REF deltas are and why they behave so differently depending on their order in the pack. Jan -
It's been cooking in "next" branch for quite a while. -
Ah yes, just went through the thread about the git-index-pack breaking on 64-bit systems and the back and forth about the possible complexity of ... I guess one of these must be false. But clearly this patch breaks those offset-based deltas when we expand random deltas in place. Jan -
But the point of the whole exercise is actually to avoid unresolved deltas. And you know if you have unresolved deltas only when the whole pack has been processed. If the base object is not in the repository but it is in the pack _after_ the delta that needs it, you won't have resolved it. If this is a thin pack with missing base objects for whatever reason you're screwed. If the delta has its base object in both the repository _and_ in the pack but after the delta then you will have expanded the delta needlessly. So your solution is suboptimal. The optimal solution really consists of appending missing base objects to a thin pack in order to make it complete, or error out if those cannot be found. Nicolas -
We've tried this already, and I shelved the patch for 64-index for now due to exactly the same reasoning as yours (and it would have conflicted heavily with Shawn's windowed-mmap() patch). It involved updating just the index file format, so you are right on both counts. But you are always right anyway, so it may not be news at all ;-). -
It is a technical limitation. We have never assumed that the virtual address space is big enough to hold more than one whole pack mmapped at the same time. Lifting this needs the piecemeal mmap() change somebody was talking about. I might bite the bullet and do that myself but I've been hoping to get an appliable patch from somewhere else ;-). -
Even though it's not big enough for some larger packs on a 32 I might be able to do it this weekend. I'll try to spend some time on it. You'll either see a patch series, or you won't. ;-) -- Shawn. -
Did you really manage to miss the "heads-up: git-index-pack in "next" is
broken" thread?
The fix:
diff --git a/index-pack.c b/index-pack.c
index fffddd2..56c590e 100644
--- a/index-pack.c
+++ b/index-pack.c
@@ -23,6 +23,12 @@ union delta_base {
unsigned long offset;
};
+/*
+ * Even if sizeof(union delta_base) == 24 on 64-bit archs, we really want
+ * to memcmp() only the first 20 bytes.
+ */
+#define UNION_BASE_SZ 20
+
struct delta_entry
{
struct object_entry *obj;
@@ -211,7 +217,7 @@ static int find_delta(const union delta_
struct delta_entry *delta = &deltas[next];
int cmp;
- cmp = memcmp(base, &delta->base, sizeof(*base));
+ cmp = memcmp(base, &delta->base, UNION_BASE_SZ);
if (!cmp)
return next;
if (cmp < 0) {
@@ -232,9 +238,9 @@ static int find_delta_childs(const union
if (first < 0)
return -1;
- while (first > 0 && !memcmp(&deltas[first - 1].base, base, sizeof(*base)))
+ while (first > 0 && !memcmp(&deltas[first - 1].base, base, UNION_BASE_SZ))
--first;
- while (last < end && !memcmp(&deltas[last + 1].base, base, sizeof(*base)))
+ while (last < end && !memcmp(&deltas[last + 1].base, base, UNION_BASE_SZ))
++last;
*first_index = first;
*last_index = last;
@@ -312,7 +318,7 @@ static int compare_delta_entry(const voi
{
const struct delta_entry *delta_a = a;
const struct delta_entry *delta_b = b;
- return memcmp(&delta_a->base, &delta_b->base, sizeof(union delta_base));
+ return memcmp(&delta_a->base, &delta_b->base, UNION_BASE_SZ);
}
static void parse_pack_objects(void)
Nicolas
-
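The bug behind this fix can be reproduced in miniature: a union of a 20-byte hash and an 8-byte offset is typically padded to 24 bytes on 64-bit systems, so comparing `sizeof` bytes also compares padding that nobody initialised. The sketch below models that with `ctypes`; `DeltaBase`, `make_base` and `same_base` are illustrative names, and `c_uint64` is used as an assumed stand-in for the C code's `unsigned long` so the layout does not depend on the platform's `long`.

```python
import ctypes


class DeltaBase(ctypes.Union):
    # Stand-in for "union delta_base": a 20-byte SHA-1 or a pack offset.
    _fields_ = [("sha1", ctypes.c_ubyte * 20),
                ("offset", ctypes.c_uint64)]


def make_base(sha1, pad_byte):
    """Build a DeltaBase holding sha1; any padding is filled with pad_byte."""
    size = ctypes.sizeof(DeltaBase)
    raw = sha1 + bytes([pad_byte]) * (size - len(sha1))
    return DeltaBase.from_buffer_copy(raw)


def same_base(a, b, nbytes=20):
    """Compare only the first nbytes, like memcmp(..., UNION_BASE_SZ)."""
    return bytes(a)[:nbytes] == bytes(b)[:nbytes]
```

Two values built from the same SHA-1 but with different padding compare equal over 20 bytes and, where the union is padded, unequal over `ctypes.sizeof(DeltaBase)` bytes, which is the lookup failure the patch avoids.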
Since you created a "thin" pack (that's what the "--objects-edge" means), the pack actually contains deltas to objects that are _not_ in the pack. In other words, it's not a valid stand-alone pack, it's only a valid thin pack, useful to transfer data to the other end (and the other end had better have the objects that the deltas are against already). As a result, index-pack refuses to index it: it cannot be used as a stand-alone pack, it's _only_ useful as a transfer medium. So don't even _try_ to use it as a standalone pack-file. It won't work. (If you want something that actually works as a stand-alone pack-file, change the "--objects-edge" flag to just "--objects" - that makes the pack-file self-sufficient, and doesn't try to delta against "edge" A properly named _standalone_ pack gets named not by its actual contents, but by the SHA1-sum of the sorted list of objects it contains. That's so that a pack-file will be named the same thing regardless of how the contents are actually packed. A thin pack cannot be named that way at all, for the same reason you cannot index it: it has a set of objects it enumerates (so you could name it by them), but it _also_ has a set of objects outside of it that it depends on. That said, even a thin pack internally has a SHA1 checksum of its contents: the last 20 bytes should be the SHA1-sum of all preceding bytes. So if you just want _some_ kind of name, you can use the last 20 bytes of a pack, which is just its internal integrity-checksum (but that is _different_ from the "pack-xxxxxx.idx"/"pack-xxxxxx.pack" naming). Linus -
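The naming rule Linus describes, a name derived from the sorted list of contained objects rather than from the packed bytes, can be sketched as follows. The exact byte layout fed to SHA-1 (raw 20-byte ids, concatenated in sorted order) is an assumption made for illustration, and `pack_name` is not a git function.

```python
import hashlib


def pack_name(object_ids_hex):
    """Name a standalone pack from the sorted ids of the objects in it."""
    digest = hashlib.sha1()
    for raw_id in sorted(bytes.fromhex(oid) for oid in object_ids_hex):
        digest.update(raw_id)
    return "pack-" + digest.hexdigest()
```

Because the ids are sorted first, the name does not change with the order in which objects were written or with how they were deltified, which is the property described above; the trailing 20-byte checksum of the file, by contrast, changes with every byte.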
On Wed, 18 Oct 2006 20:52:25 +0200 Couldn't these just as easily have been written as git-bundle and Not sure if it would be useful, but it shouldn't be too hard to have Think you're right about making it an attachment instead. Sean -
Petr Baudis wrote: You probably miss the main idea of bzr bundles. It's not just a way to send part of a repository via e-mail or another appropriate transport. It was primarily designed to be human readable like a usual diff (i.e. a patch). It was designed to solve two things simultaneously: - be informative for a human, like a usual patch - be consistent for a machine. -- Alexander -
On Thu, 19 Oct 2006 09:46:32 +0300 Petr already mentioned that the data currently shown in the email text isn't really useful. But it's simple to make it an attachment and show a combined diff instead. Although that might just make the email bigger for not a lot of gain. It's easy to use the git command line and gui tools to inspect the bundle after importing it into your repository. And just as easy to expunge the bundle afterward if it isn't up to grade. Sean -
In Bazaar bundles, the text of the diff is an integral part of the data. It is used to generate the text of all the files in the revision. Bazaar bundles were designed to be used on mailing lists. So you can review the changes from the diff, comment on them, and if it seems It's my understanding that most changes discussed on lkml are provided as a series of patches. Bazaar bundles are intended as a direct replacement for patches in that use case. Aaron -
On Fri, 20 Oct 2006 10:03:16 -0400 Perhaps I missed something in the earlier mails about this feature. As I understood it, the email sent has a combined diff that shows the net effect of all the commits included in the bundle. (Whereas the current Cogito version only shows a diffstat) If the recipient of such a bundle is unable to extract the diff of each separate commit included in the bundle then I can't see any value in the feature at all. But showing a combined diff in the email may have marginal value, so long as when the bundle is imported into the recipient repository the individual commits A combined diff of a bunch of changes would usually be most _unwelcome_ for review on lkml. The constant refrain is to ask people to split their changes up into smallish individual patches for review. Sean -
OK, that was how I was envisioning it, as well, but I was concerned about the "screwed" part. But I'm not sure how often that would be an issue in practice (after all, patches require some matchup of the base, though not as strict as SHA1s). Thanks for the explanation. -Peff -
That's true. We support merge points in a way that's compatible with svk. Subversion allows revisions to have arbitrary properties, and svk Bzr's subversion support is quite nice. You can commit, merge, run history viewers. There are screenshots and stuff here: http://bazaar-vcs.org/BzrForeignBranches/Subversion Aaron -
Sounds a bit like [PATCH 0/8] would have the output of git diff $(git merge-base master topic-branch)..topic-branch for any given patch-series. It might be easier to review the whole patch-series in some cases. Especially with patch-series where more than one patch touches the same part of the code. -- Andreas Ericsson andreas.ericsson@op5.se OP5 AB www.op5.se Tel: +46 8-230225 Fax: +46 8-230231 -
AIUI, the initial claim was that after a rebase, git can do a fast-forward, but Aaron has missed the /after a rebase/ part. And yes, in the bzr terminology, bzr can do a "pull" after a "graft". I don't think there's a fundamental difference here. -- Matthieu -
On Tue, 17 Oct 2006 17:27:44 -0400 But really why does any of that matter? This is the open source world. We don't need plugins to extend features, we just add the feature to the source. The example I asked about earlier is a case in point. Apparently in bzr "bisect" was implemented as a plugin, yet in Git it was implemented as a command without any issue at all, no plugins needed, and it's compiled and runs at machine speed. Sean -
That can lead to feature bloat. Some plugins are not useful to everyone, e.g. Mercurial repository support. Some plugins introduce additional dependencies that we don't want to have in the core (e.g. the rsync, baz-import and graph-ancestry commands). Plugins also don't have Bazaar's rigid release cycle, testing requirements and coding conventions, so they are a convenient way to try out an idea, before committing to the effort of getting it merged into The bisect plugin is just as performant as any other bzr command. (The whole VCS is in Python.) Most people don't use it, so we don't ship it as part of the base install, but anyone who wants it can have it. Aaron -
On Tue, 17 Oct 2006 18:44:11 -0400 Shrug, it's really not that tough to do in regular ole source code. On Fedora for instance you have your choice of which rpms you want Hmm.. It's pretty easy to test out Git ideas too. People do it all the time, and without plugins. Junio maintains several such trees for instance. Dunno.. I just think plugs _sounds_ good to developers Sure, and anyone who wants to use StGit on top of Git can download and use it as well. Sean -
/me too post ;-) git-core, git-email, git-arch, git-cvs, git-svn, gitk (and git-debuginfo). gitk and gitweb were developed in their own repositories, but some time ago got incorporated into the git repository. We have the contrib/ area. Thanks to many low-level (plumbing in git-speak) commands it is very easy to prototype (write, actually) a new command in a language suitable for fast prototyping, i.e. shell or Perl (or Python, too). Then if it is performance critical, or if it gets troublesome to manage the shell script version, it gets rewritten in C as a builtin command. -- Jakub Narebski Poland -
Example time! There's a plugin for Bzr which adds support for Cygwin-compatible symlink support on Windows. (IIRC, this involves monkey-patching some of the Python standard library bits). Now, this is something which is *proposed* as a feature to be merged into upstream bzr, and it may happen at some point. That said, when I have a Windows-using coworker who wants to check out a repository that has symlinks in it (with his win32-native, no-cygwin-required bzr upstream binary), I don't need to tell him to go download and build bzr from a third party; instead, I just need to tell him to run a single command to check out the plugin in question into the bzr plugins folder. From an end-user convenience perspective, it's a pretty significant win. -
On Wed, 18 Oct 2006 16:04:52 -0500 You'll need a better example than that. Git has supported a version of Cygwin-compatible symlink support on Windows for quite some time. And no plugins were needed. Sean -
The win32-compatible symlink support is not, in and of itself, the point. The point is that core, pervasive functionality can be modified at runtime, with no recompilation or installation of tools not included in the bzr package itself, simply by dropping a directory into place. This means that folks who don't have the skillset to merge three branches together (say, upstream plus two different trees adding extra functionality) and run a build can still install a few plugins to enhance their copy of bzr (which was installed by their IT staff, or a shiny click-through idiot-friendly Windows installer, etc). And yes, there are people like that who are part of bzr's target audience. Think (of the lower end of the set of) DBAs, QA folk and such. Granted, I'm speaking with my IT hat on here rather than my developer hat -- but plugins are a pretty clear usability win. -
Hi, Please note that this is not welcome here. I _need_ to trust my SCM. And _that_ means that no strange non-mainline beast can be allowed to change core features. So, the wonderful upside of plugins you described here is actually the reason I will never, _never_ use bzr with plugins. Ciao, Dscho -- It's not paranoia. It's called experience. -
I presume that for this reason you will also never, _never_ use a non-mainline branch of git -- even if its actual code only touches UI enhancements or something similarly non-core -- because third-party branches have the ability, in theory, to make changes to the core of the revision control system. And that you will never, _never_ use third-party wrappers because they might play LD_PRELOAD tricks. Or run any software with root privileges you haven't personally written. Or... Sean's point that plugins are a comparatively minor win made inexpensive on account of bzr's use of Python is reasonable (though we may choose to differ on what level of value we attach to the utility). The claim that an extensibility mechanism should be rejected wholesale on account of being excessively powerful, on the other hand, is just silly. (If you couldn't write a plugin that *didn't* touch the core, this would be a different story. This is, however, very much not the case). -
Hi, you neatly clipped the most important part of my email: I quoted you NO! The point was that I will not gladly run anything which could change the core. If I know it touches only the UI, there is no problem. If I get a shell script using git-core programs to do its job, I _know_ that my repository will not be fscked afterwards. Most of it comes down to trust. And yes, you are correct, I will not run git with some obscure module LD_PRELOADed that some guy from some planet sent me. You might have missed my argument being about the SCM, and not the Oh, but NO! An extensibility mechanism which allows for a fragile system _is_ silly. Not my rejection of it. Just take an example (illustrating that once again, one should not attribute everything to malevolence...): I write a plugin for bzr. It does really wonderful things, it even cooks you dinner. Only that I happened to make a small mistake (if you followed some threads on the git list, you'd know that small mistakes are a hobby of mine), and by this mistake, your repository is ... gone. Small mistake, big consequence. That is what is wrong with such a powerful system which caters for developers, who are human after all. Note that such a small mistake would be much more likely caught in git: if it touches the core, plenty of eyes look at it. Ciao, Dscho -
If you're willing to look at the source of a branch to know that it touches only the UI, why would you not be willing to look at the source It's a silly point. If you're willing to look at what your shell script does and validate that it doesn't do LD_PRELOAD tricks or swap out git core pieces, why wouldn't you be willing to accept a plugin after a similar level of review, rather than stating outright that you would Shell scripts allow for a fragile system because they could include C code snippets which they then compile and LD_PRELOAD. Sure, they "allow for" a fragile system -- but the author has to go out of their way to make it so. Similarly, folks writing bzr plugins need to take explicit actions to monkeypatch existing code (as opposed to adding a new transport/storage format/command/etc but leaving the old ones alone). If you trust the author of your shell script not to build their own LD_PRELOAD at runtime, why don't you trust the author of your bzr plugin not to monkeypatch in replacements to core code if they say they aren't? -
Hi, That is why I said I'd be gladly using a shell-script using git-core programs. It is typically no more than 20 lines, and I can review that Well, I do not expect people to misbehave. You do not compile a nasty C-program from a shell script _by mistake_. I also expect people not to constantly miss my point. It could be that I am not as proficient in the English language as I thought. In that case, I'll better shut up. Ciao, Dscho -
You also don't replace bzrlib functionality (in your terms, plumbing) in I think your point is predicated on a misunderstanding of how plugins work. -
On Wed, 18 Oct 2006 18:31:32 -0500 Sure they can be. But their value I think is overstated, especially in an open source project where anyone can grab a copy of the source and update it with a trial feature. This updated copy can be wrapped in a nice GUI installer just as easily as any plugin. Now, I suppose plugins let end users mix and match trial features slightly easier, but hopefully your base package isn't so devoid of features that this is honestly necessary. As Petr pointed out, all this comes to Bzr essentially for free since it's a part of python. So be it, but I've yet to hear an example where plugins were anything more than a minor convenience rather than a fundamental win over the way Git is developing. For an example, just look how few lines of git were needed to implement the essential features of the bzr bundle feature. With no plugins or monkey business needed ;o) Sean -
The plugin vs. core feature question is not a technical problem. The code for a plugin and for a core functionality will roughly be the same, but in a different file. There can be many reasons why you want to implement something as a plugin:
* This is project-specific, upstream is not interested (for example, bzr has a plugin to submit a merge request to a robot, it will probably never come in the core).
* The feature is not mature enough, so you don't want to merge it in upstream, but you want to make it available to people without patching (for example, "bzr uncommit" was once in the bzrtools plugin, and finally landed in upstream).
* The feature you're adding is only of use to a small subset of users. You don't want to pollute "bzr help commands" in particular with it, especially not to disturb beginners. I've been arguing in favor of a configuration option to hide commands from "bzr help commands" instead, but nobody seemed interested.
* Explicit divergent points of view between the implementor of the plugin and upstream. That avoids a fork. I don't remember any such case with bzr.
I'd compare bzr's plugins to Firefox extensions. Geeks used to like the big Mozilla-with-tons-of-config-options, but Firefox-with-only-the-most-relevant-features is the one which allowed a wide adoption by non-geeks. Still, geeks can customize their browser, and add features without having to wait for Mozilla Foundation to incorporate them in upstream. Now, I don't know git enough to know whether the way it is extensible allows all of the above, but bzr's plugin system is quite good at that. At the time git was almost exclusively used by the kernel, you didn't have all those problems since you targeted only one community, but I guess you already had some needs for flexibility. -- Matthieu -
So, bzr's plug-in architecture provides a 'protocol' for communicating with bzr? Or is it functionally the same as a Python module which is loaded after being named on the bzr command-line (or placed in a special folder) then executed along with all the other plug-ins? I'm trying to understand if writing a plug-in is any simpler than understanding the bzr source code. Can I ask the git folks what Sean meant in the above about a 'command'. Are you talking about shell scripts? Is 'git' the only program you need? AFAIK, 'bzr' is the sole program in Bazaar, and everything is done with command line options to bzr. Is that true of git? To what extent is git tied to a [programmable] shell? I've heard someone say there's no Windows version of git for some reason, can someone elaborate? Ta, Loki -
'git' is actually two things:
1) It's a wrapper command which executes 'git-foo' if you call it
with 'foo' as its first parameter. It searches for 'git-foo'
in the GIT_EXEC_PATH environment variable, which has a default
set at compile time, usually to the directory you are going to
install Git into.
2) It's most of the core Git plumbing. There are currently around 48
'builtin' commands. These are things which 'git' knows how to do
without executing another program. If you look at the installation
these 48 builtin commands are just hardlinks back to 'git'. For
example 'git-update-index' is really just a hardlink back to 'git'
and 'git' knows to perform the update index logic when it's called
as either 'git-update-index' or as 'git update-index'.
We're moving more towards #2, but there are still a large number
No. In Git at least half of the things Git can do are not builtin to
'git' and thus require exec()'ing an external program (e.g. git-fetch).
However these often appear as though they are command line options to
'git' as 'git fetch' just means exec 'git-fetch' (by #1 above).
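
The two roles described above can be sketched in a few lines of shell. This is only a toy model, not git's actual code: "mytool", "mytool-hello" and MYTOOL_EXEC_PATH are made-up names standing in for 'git', 'git-foo' and GIT_EXEC_PATH, and the real wrapper is written in C.

```shell
#!/bin/sh
# Toy dispatcher modelled on the description above. "mytool" and
# MYTOOL_EXEC_PATH are hypothetical names; this is not git's real code.

# Directory searched for external subcommands (stand-in for GIT_EXEC_PATH).
exec_path=${MYTOOL_EXEC_PATH:-$(mktemp -d)}

# An external subcommand is just an executable named mytool-<name>.
cat >"$exec_path/mytool-hello" <<'EOF'
#!/bin/sh
echo "external hello: $*"
EOF
chmod +x "$exec_path/mytool-hello"

mytool() {
    sub=$1
    shift
    case $sub in
        version)
            # Role #2: a "builtin", handled without running another program.
            echo "builtin version 0.1" ;;
        *)
            # Role #1: fall back to running mytool-<sub> from the exec path.
            if [ -x "$exec_path/mytool-$sub" ]; then
                "$exec_path/mytool-$sub" "$@"
            else
                echo "mytool: '$sub' is not a mytool command" >&2
                return 1
            fi ;;
    esac
}

builtin_out=$(mytool version)
external_out=$(mytool hello world)
echo "$builtin_out"
echo "$external_out"
```

Moving a command from the second branch into the first is roughly what "becoming a builtin" means here: the user-visible name stays the same, only the place where the work happens changes.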
On the other hand there are a wide range of tools which are more or
less the same thing, just with different options applied to them.
All of the diff programs, log, whatchanged, show - these are all
just variations on a theme. Their individual implementations are
Git is still very much tied to a shell. For example 'git commit'
is really the shell script 'git-commit'. This is a rather long
shell script and it does a lot of things for the user; not having
it would make Git useless for most people. It also has not been
rewritten in C. There is a roadmap however to convert it to C to
help remove the programmable shell requirement and people have been
Git runs on Cygwin. But there's no native Win32 (without Cygwin)
version of Git because:
- Git uses POSIX APIs and expects POSIX behavior from the OS it's
running on. Without ...

Historically, "git" was _only_ a wrapper program. When you did
git log
it just executed the real program called "git-log", which was often a
shell-script. That was just so that things could easily be extended, and
you could use shell-script for simple one-liner things, and native C for
more "core" stuff.
For example, "git log" used to be a one-line shell-script that just did
git-rev-list --pretty HEAD | LESS=-S ${PAGER:-less}
but it ended up being a lot more capable, and eventually just rewritten
as an internal command.
These days, most of the simple things like "git log" are all built into
the "git" program, although for anything not built in, it still acts as
just a wrapper, which allows not only random functionality to still be
written in shell (or sometimes perl), but also ends up being the simplest
possible plug-in mechanism: you can define your own commands by just
writing a shell-script thing, calling it "git-mycommand", installing it in
the proper place, and it ends up being accessible as "git mycommand".
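
A minimal sketch of that plug-in mechanism follows. The command name "git-hello" is made up, and the sketch assumes a git that also runs git-<name> executables found on PATH (rather than only in its exec path); it skips the git call if git is not installed.

```shell
#!/bin/sh
# Sketch: add a "git hello" command by dropping a script named git-hello
# into a directory on PATH. "git-hello" is a hypothetical name.

bindir=$(mktemp -d)
cat >"$bindir/git-hello" <<'EOF'
#!/bin/sh
echo "hello from a custom git command: $*"
EOF
chmod +x "$bindir/git-hello"

if command -v git >/dev/null 2>&1; then
    # The wrapper finds git-hello and runs it with the remaining arguments.
    hello_out=$(PATH="$bindir:$PATH" git hello world)
else
    hello_out="(git not installed)"
fi
echo "$hello_out"
```

Nothing inside git has to know about the new command in advance, which is why this works as a plug-in mechanism without any registration step.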
Almost all of "core" git is pure C, which unlike something like python or
perl obviously tends to have a fair amount of system issues. That said,
much of it really is fairly portable, so doing the built-in git stuff
should _largely_ work even natively under Windows with some effort.
The problem ends up being that few enough people seem to develop under
Windows, and the cygwin port works better (because it handles a number of
the portability issues and also handles the scripts that are still shell).
Those two issues seem to mean that not a lot of effort has been put into
aiming for a native windows binary (or into moving away from shell
scripts).
Most of the shell scripts really are fairly simple. So if somebody
_really_ wanted to, it would probably not be hard to spend some effort to
either just write them as C and turn them into built-ins, or porting them
to some other scripting language.
Of course, most Windows users ...

Some of the internal commands that have been coded in C are actually much better handled by the shell in the first place. It's much simpler to write and extend as well as being much more traceable for runtime problems. The shell commands that would be used for most of these git routines have options for requesting it to be more verbose so the user actually has a lot more power over reporting and/or logging. In addition it tends to be more portable and the amount of code is drastically reduced in a script style of programming. The criticisms against such use of shell scripting tend to be a matter of personal taste. People believe, for some reason or another, that it is a lower-class type of programming that is less robust and is harder to understand. Seldom have there been cogent arguments for coding such features in C as opposed to shell scripting, especially in the case of git where the shell becomes a very powerful ally. David -
Yes. However, from a portability (to Windows) standpoint, shell is just about the worst choice. Not that perl/python/etc really help - unless the _whole_ program is one perl/python thing. Windows just doesn't like pipelines etc very much. So I'd like all the _common_ programs to be built-ins.. Linus -
And I would prefer the opposite because we're talking about git. As an information manager, it should be seen and not heard. Nobody is going to spend their time to become a git or CVS or perforce expert. As an individual primarily interested in development, I should not be required to learn command lines for dozens of different git-specific commands to do my job quickly and effectively. I would opt for a much simpler approach and deal with shell scripting for many of these commands because I'm familiar with them and I can pipe any command with the options I already know and have used before to any other command. As a developer on Linux based systems, I should not need to deal with code in a revision control system that is longer and less traceable because the authors of that system decided they wanted to support Windows too. Moving away from the functionality that the shell provides is a mistake for a system such as git where it could be so advantageous because of the inherent nature of git as an information manager. This is the reason why I was a fan of git long ago and used it for my own needs before tons of unnecessary features and unneeded complexity were added on. David -
I don't understand how converting shell scripts to C has any impact whatsoever on the usage of git. The plumbing shell scripts didn't go away; you can still call them and they behave identically. Some shell->C conversions may have made the code "longer and less traceable." However, many of those conversions caused the code to be shorter (because communication between C functions is simpler than going over pipes, and because anything involving a data structure more complex than a string is difficult in the shell) and more robust (fewer opportunities for quoting/parsing errors, and none of the shell gotchas like missing the error code in "foo | bar"). Do you have any specific reason to believe that the git code is of worse Is there something you used to do with git that you no longer can? Is there a reason you can't ignore the newer commands? -Peff -
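
The "foo | bar" gotcha mentioned above is easy to reproduce. The sketch below uses a made-up producer function in place of a real git command; it shows that the pipeline reports success even though its first stage failed, and one portable (if clumsy) way to recover the lost status.

```shell
#!/bin/sh
# Demonstrates losing the exit status of the first command in a pipeline.
# "producer" is a hypothetical stand-in for something like a plumbing command.

producer() {
    echo "partial output"
    return 1            # simulate a failure after writing some data
}

# The status of a pipeline is the status of its LAST command.
producer | cat >/dev/null
pipeline_status=$?
echo "pipeline status: $pipeline_status"

# Portable workaround: run the producer on its own, keep its status,
# and feed the consumer from a temporary file.
tmp=$(mktemp)
if producer >"$tmp"; then
    producer_status=0
else
    producer_status=$?
fi
cat "$tmp" >/dev/null
rm -f "$tmp"
echo "producer status: $producer_status"
```

Shells such as bash also offer `set -o pipefail`, which makes the pipeline itself fail when any stage fails, but a script has to opt in to it and it was not available in every /bin/sh. In C the same failure would simply be a return value checked at the call site.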
No, my criticism is against the added complexity which makes the modification of git increasingly difficult with every new release. It's a pretty limited use case of the entire package, I'm sure, but one of the major advantages that I saw in git early on was the ability to tailor it to your own personal needs very easily with some simple shell knowledge You're ignoring the advantageous nature of the shell with regard to git. The shell is so much better prepared to deal with information managers by nature than the C programming language. It's not a matter of shorter code, per se, it's about the developer's ability to make small changes to the operation of the information manager on demand to tailor to his or her _current_ needs. For any experienced shell programmer it is so much easier to go in and change an option or pipe to a different command or comment out a simple shell command in a .sh file than editing the C code. And sometimes it's necessary to have several different variations of that command which is very easy with slightly renamed .sh files instead of adding on more and more flags to commands that have become so complex at this point that it's difficult to know the basics of how to manage a project. This all became very obvious when the tutorials came out on "how to use git in 20 commands or less" effectively. These tutorials shouldn't need to exist with an information manager that started as a quick, efficient, and _simple_ project. You're treating git development in the same light as you treat Linux development; let's be honest and say that 99% of the necessary git functionality was there almost a year ago and ever since nothing of absolute necessity has been added that serious developers care about in a revision control system. Look at LKML, nobody is waiting on these new releases and upgrading to them when they're announced. And this is the community that git has _targeted_. Most other projects don't care about the syntactics ...
OK, you seemed to imply problems for end users in your first paragraph, Yes, it's true that some operations might be easier to play with in the shell. However, does it actually come up that you want to modify existing git programs? The more common usage seems to be gluing the plumbing together in interesting ways, and that is still very much You can do the same thing in C. In fact, look at how similar git-whatchanged, git-log, and git-diff are. I don't understand how a shell->C conversion has anything to do with options being added. If you look at all of the conversions, they Sorry, I don't see how this is related to the programming language _at all_. Are you arguing that the interface of git should be simplified so that such tutorials aren't necessary? If so, then please elaborate, as I'm sure many here would like to hear proposals for improvements. If you're arguing that git now has too many features, then which features I don't agree with this. There are tons of enhancements that I find useful (e.g., '...' rev syntax, rebasing with 3-way merge, etc) that I think other developers ARE using. There are scalability and performance improvements. And there are new things on the way (Junio's pickaxe work) that will hopefully make git even more useful than it already is. If you don't think recent git versions are worthwhile, then why don't you run an old version? You can even use git to cherry-pick patches onto I don't agree, but since you haven't provided anything specific enough Can you name one customization that you would like to perform now that you feel can't be easily done (and presumably that would have been easier in the past)? -Peff -
Indeed. I still use my old git-send-patch script whenever I want to send patches, simply because I don't like git-send-email and its defaults much. The interface hasn't changed one bit since I wrote it. That's pretty stable, since send-patch was created couple of hours before git.c was submitted to the list, as I wrote the "send-patch" script to send the patch that did the rewriting. I'm personally all for a rewrite of the necessary commands in C ("commit" comes to mind), but as many others, I have no personal interest in doing the actual work. I'm fairly certain that once we get it working natively on windows with some decent performance, windows hackers will pick up the ball and write "wingit", which will be a log viewer and GUI thing for fetching/merging/committing/reverting/rebasing/sending patches and whatnot. Possibly it will have hooks to Visual C++ or some other IDE. I don't know how that sort of thing works, but I'm sure someone clever and bored enough will want to investigate the possibilities. -- Andreas Ericsson andreas.ericsson@op5.se OP5 AB www.op5.se Tel: +46 8-230225 Fax: +46 8-230231 -
a quick lesson on program naming
^^^^^^
how many other people read this as 'wing it' rather than 'win git'? ;-)
David Lang
-
Yes, that's certainly a less than optimal name... What about gitk? Is it "gi tk" or "git k" ? This has actually been the source of much local debate. :-) -- Shawn. -
in this case I think it's both, (or technically git tk with the double t's combined to save typing) David Lang -
Yes, it does. I'll give you an example from six months ago: there was a need for the group that I work with to support a faster type of hashing function for whatever reason. This would have been simple with previous versions of git, but if you've ever looked at the SHA1 code in git, you'll realize that you're probably better off never trying to touch it. There is absolutely _no_ abstraction of it at all and the code is so deeply coupled in the source that abstracting it away is a pain. Likewise, there is always room for personal or organizational tweaks on the part of the developer. Things like distributed pulling and merging should actually be pretty simple to implement if the complexity wasn't so high in the merge-* family. This is something I implemented after an enormous headache because we were dealing with very large projects: yes, larger than the Linux kernel. And this is _exactly_ where piping would help; we have implementations of distributed grep over very No you can't. Making a one line addition, commenting out a line, or changing a simple flag in a shell script is much easier. And like I already said, you can save multiple versions for your common use if you work on a specific project much of the time and change how it operates depending on the needs of that one project so you never need to do it again or you can _distribute_ that shell file to your colleagues so that everybody is doing their work via the same method. This makes it so you can just say "type X, then type Y, then type Z" and everybody is operating It's not, it's related to the original vision of git which was meant for efficiency and simplicity. A year ago it was very easy to pick up the package and start using it effectively within a couple hours. Keep in mind that this was without tutorials, it was just reading man pages. Today it would be very difficult to know what the essential commands are and how to use them simply to get the job done, unless you use the ...
First off, thanks for giving examples. I was having trouble seeing where Is this really an artifact of the C code versus the shell code? A lot of parts of the system need to touch SHA1 hashes, and I think it has been sprinkled throughout the code from the beginning. In fact, I think the libification of git-rev-list has made the code a lot _cleaner_ (and shorter), in that the C programs can all use the same nice interface. The external interface is still there, but now there is consistency among programs when using rev syntax (ISTR issues in the distant past where program X didn't understand syntax because the parsing was all I guess I don't see how this was ever any easier. Do you mean that when we called an external grep, it was easier to plug in your distributed The "same thing" I referred to was changing behavior trivially based on Sure, shell can be easier to modify (though in well-written C, you're likely just commenting out a few lines or a function call -- maybe you can argue whether or not git is well-written). However, I remain unconvinced that this is a common use case, or that it is something that should weigh heavily when compared with portability, efficiency, or Simplicity is fine if all you want is plumbing. But normal people want to _use_ git without hacking their own shell scripts, so it makes sense to provide the scripts that other people have hacked together (as shell, perl, C, or whatever). Do I want to use git-send-email? Hell no, the interface is terrible to me. But do the plumbing commands still exist so that I can use the scripts I hacked together? Absolutely. I can take Was it? The most common complaint I've heard about git, starting a year ago, was the lack of documentation and tutorials and the complexity of I think this has been the case for a long time. It's just that there No, it illustrates a lack of simplicity that currently exists; it says There has been work on scaling to larger repositories (e.g., mozilla and xorg prompting ...
Compared to today's version, original git was neither efficient nor simple. Unless you mean "some random version along the way where git had everything *I* need and not the useless cruft that other people use", in Have you tried "git --help"? It shows the most common commands and a short description of what they do. It's a very good pointer to which man-pages you need to read, and I imagine this would actually be one of the very first commands that new git users try. If they don't but just expect things to work according to some premade mental model they have No it hasn't. The ten or so commands that Linus first introduced when announcing git still work pretty much the same. Nobody in their right mind would ever claim that those ten commands made up anything that even remotely resembled a complete scm, but they were something to build on by anyone who wanted to extend it. So far, ~220 people have wanted to extend it in ways that others thought useful, because their patches are Well, my head hurt when I tried to learn CVS without a tutorial, and mercurial and darcs and svn as well. I didn't pick up the functionality of the 'ls' command completely without reading the man-page for it. If you want something that works for everyone without having to read any documentation whatsoever, buy Lego, cause computers ain't for you, my Actually, I don't see why git shouldn't be perfectly capable of handling a repo containing several terabytes of data, provided you don't expect it to turn up the full history for the project in a couple of seconds and you don't actually *change* that amount of data in each revision. If you want a vcs that handles that amount with any kind of speed, I think you'll find rsync and raw rvs a suitable solution. On the other hand, you fellas at google don't really use git to store the data from the search database, do you? I mean, it's written for source control management. People that tried to keep their mboxes in git failed ...
actually, there are some real problems in this area. the git pack format can't be larger than 4G, and I wouldn't be surprised if there were other issues with files larger than 4G (these all boil down to 32 bit limits). once these limits are dealt with then you will be right. David Lang -
There is no such limit on the pack format. A pack itself can be as large as you want. The 4G limit is in the tool not the format. The actual pack limits are as follows:
- a pack can have infinite size
- a pack cannot have more than 4294967296 objects
- each non-delta object can be of infinite size
- delta objects can be of infinite size themselves but...
- current delta encoding can use base objects no larger than 4G
The _code_ is currently limited to 4G though, especially on 32-bit architectures. The delta issue could be resolved in a backward compatible way but it hasn't been formalized yet. The pack index is actually limited to 32-bits meaning it can cope with packs no larger than 4G. But the pack index is a local matter and not part of the protocol so it is not a big issue to define a new index format and automatically convert existing indexes at that point. Nicolas -
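
The "4G" figure follows directly from the 32-bit offsets in the index. A quick check of the arithmetic, assuming the shell evaluates $(( )) with at least 64-bit integers (true for the common shells on current systems):

```shell
#!/bin/sh
# Arithmetic behind the 4G limit of a 32-bit offset field.
# Assumes shell arithmetic is at least 64 bits wide.

max_offset=$(( (1 << 32) - 1 ))                     # largest value in 32 unsigned bits
span_gib=$(( (1 << 32) / (1024 * 1024 * 1024) ))    # bytes addressable, in GiB

echo "largest 32-bit offset: $max_offset bytes"
echo "addressable span:      $span_gib GiB"
```

An object whose starting position in the pack lies beyond that offset cannot be recorded in such an index, whatever the pack file itself is able to hold.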
the offset within a pack for the starting location of an object cannot be larger than 4G. David Lang -
Well, strictly speaking, even that isn't actually a limit on the _pack_ format itself. It's really just the (totally separate) index that currently uses 32-bit offsets. For example, you can actually use the pack-file to transfer more than 4GB of data over the network. You'd not need to change the format at all. Only the local _index_ of the result needs to change - but we never transfer that at all (it's always generated locally), so that's really a separate issue. It's not even hard to fix. It's just that right now, the biggest repository that we know about (mozilla) is not even close to the limit. And it took them ten years to get there. So if the mozilla people switch to git, and keep going at the same rate, we have about 70 years left before we need to fix the indexing ;) (Of course, other projects, like the kernel, seem to grow faster, so it might be "only" a decade or two - but since the index format is a local thing, even that won't be too painful, since we don't really need a global flag-day once we decide to start supporting larger offsets in the index) Linus -
To be more exact, yes. But I don't think we'll ever consider use scenarios with packs > 4G with the current index format. There is simply no point. Nicolas -
That's also what I wondered, but I also can understand where David is coming from, and I agree with him to a certain degree. When I learned git, I learned a lot from trying to piece my own plumbing together, since there wasn't much Porcelain to speak of back then. Then we had many usability enhancements before the 1.0 release to add Porcelainish done as shell scripts. This had two positive effects, aside from adding usability. Interested people had more shell scripts to learn from. The scripts were easy to adjust to feature requests from the list, and as we learned from user experience based on these scripts it was definitely quicker to codify the best current practice workflow in them than if they were written in C. It would have taken us a lot more effort to add "git commit -o paths" vs "git commit -i paths" if it were already converted to C, for example. This continued and our Porcelainish scripts matured quickly. Then 1.3 series started to move some of the mature ones into C. As many people already have pointed out, being written in C and not doing pipe() has two advantages (better portability to platforms with awkward pipe support and one less process usually means better performance). git-log family with path limiting had a real boost in performance because the path limiting can be done in the revision traversal side not diff-tree that used to be on the downstream side of the pipe. So this overall was the right thing to do. One thing we lost during the process, however, is a ready access to the pool of "sample scripts" when people would want to scratch their own itches. Linus's original tutorial talked about "this pattern of pipe is so useful that we have a three liner shell script wrapper that is called git-foo", and interested people can easily look at how the plumbing commands fit together. The plumbing is still there, and I and people who already know git would still script around git-rev-list when we need to (by the way, scripting around git-log is a ...
I think this is part of the complication of discussion I'm having with David. There are really two sets of users for git: people who want to hack scripts based on plumbing, and people who want everything to "just work." I think it's a good point that as the system matures (movement Housing historical implementations seems like it would just lead to I think this is a better approach. I think it also makes sense to let people know that it's an acceptable approach to start new features as shell and then have them mature to C (looking at the current codebase, and some of Dscho's rantings, one might get the impression that git isn't accepting new shell scripts). -Peff -
I agree. Although that ought to be rare in principle, given that one advertised feature of git is that the plumbing is supposed to be stable, we occasionally had to subtly break things to improve plumbing and at the same time run around to make sure that all the script users (both in-tree and
New commands like pickaxe and for-each-ref were easier to code in C, and the cherry rewrite in C was really about how crufty the shell script version was from the beginning (and there weren't any in-tree users of it left, so it was not maintained at all, but thanks to plumbing being stable it just kept working, perhaps correctly but still horribly).
-
Isn't this how git has been developed since day one, more or less? If a command is missing, it gets added as a shell-script.
I agree with you that the "pipes from this sent here does this, and look how useful it is" lectures are gone since many commands were rewritten. Otoh, they're gone because they now instead provide examples on how to interface with the libified parts of git, so it's not a loss per se, just a switch in what it teaches.
I also agree with David that shell is much more fun to muck around with and prototype in, because you see results so much faster. However, since our plumbing is so rock-solid (and getting extended with --stdin options to more and more commands), I see no reason why we shouldn't have a "how to extend git" with the old shell-based porcelain scripts up somewhere on the web. Perhaps it would kill two birds with one stone and increase the addition of new utilities to git, while at the same time keeping the already rewritten commands in C.
Btw, the old shell-versions still work with the new plumbing (well, mostly anyways). They just have problems with filenames and revisions with spaces and special chars and things like that, same as they've always had.
--
Andreas Ericsson andreas.ericsson@op5.se
OP5 AB www.op5.se
Tel: +46 8-230225 Fax: +46 8-230231
-
Others have answered this, but the thing is, it was a _wonderful_ way to prototype things, and to add obvious (and nice) early UI issues that made git much more usable. But no, things are not better handled in shell. Shell tends to make some things really _hard_ to do.
A fair chunk of the rewrite was because core functionality made things easier. For example, the whole internal revision parsing library is really actually a lot more capable than we could easily expose as a simple pipeline: the original "git log" pipeline worked very well, and you can actually still use those kinds of pipelines for a lot of work, but at the same time, some things really just work better when you have "deeper" interfaces. For example, the revision parsing library not only makes "git log" trivial as C, it's also needed for an efficient "git annotate/blame/pickaxe" kind of thing.
There are also things that are just ludicrously hard to do in shell-script, like exclusive and atomic file operations. We used perl and python for some things, but finding people who know them tends to be problematic, and python in particular was also a dependency problem, so the fact that the default recursive merge was python wasn't wonderful.
So I think the shell-scripts are great (and some of them quite likely will remain around for the foreseeable future) for prototyping, but for core functionality they were not wonderful. They are sometimes good examples of how powerful a scripting language git can be, though. Scripting is still very important, even though a lot of the core stuff doesn't necessarily depend on being scripts itself.
But error handling in scripting is very hard or inconvenient, especially in pipelines. So some things were actively problematic (ie "git-rev-list --all --objects | git-pack-objects") and moving it to use the internal library interface was simply technically the right thing to do. Others had real performance issues, eg the new merge in C is a lot ...
Excuse me? What does that "throws away your local commit ordering" mean? A fast-forward does no such thing. It leaves the local commit ordering alone, it just appends other things on top of it. It's the only sane thing you can do, since the work you merged was already based on your top commit. So generating an extra "merge" commit would be actively wrong, and adds "history" that is not history at all. It also means that if people merge back and forth from each other, you get into an endless loop of useless merge commits. What's the point? They only clutter up the history, and they mean that you can never agree on a common state. There's no reason _ever_ to not just fast-forward if one repository is a strict superset of the other. You must be doing something wrong. Is it just that people want to pee in the snow and leave their mark? Linus -
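The "endless loop of useless merge commits" can be sketched in a few lines of Python. This is only a toy model under stated assumptions, not how git or bzr is implemented: a history is a dict mapping a commit name to a tuple of its parents, and `ancestors`, `merge` and `ping_pong` are hypothetical helper names made up for this illustration. Two repositories start from the divergent tips c and d and then merge from each other repeatedly.

```python
import itertools


def ancestors(dag, tip):
    """Return every commit reachable from tip (tip included)."""
    seen, stack = set(), [tip]
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(dag[commit])
    return seen


def merge(dag, mine, theirs, fast_forward, names):
    """Merge `theirs` into `mine` and return the new tip of `mine`."""
    if theirs in ancestors(dag, mine):
        return mine                      # already up to date: nothing to do
    if fast_forward and mine in ancestors(dag, theirs):
        return theirs                    # strict superset: just move the tip
    new = next(names)                    # otherwise record a merge commit
    dag[new] = (mine, theirs)
    return new


def ping_pong(fast_forward, rounds):
    """A and B merge from each other `rounds` times; return tips and size."""
    dag = {"a": (), "b": ("a",), "c": ("b",), "d": ("b",)}
    names = (f"m{i}" for i in itertools.count(1))
    tip_a, tip_b = "c", "d"
    for _ in range(rounds):
        tip_a = merge(dag, tip_a, tip_b, fast_forward, names)
        tip_b = merge(dag, tip_b, tip_a, fast_forward, names)
    return tip_a, tip_b, len(dag)


print(ping_pong(fast_forward=True, rounds=3))   # ('m1', 'm1', 5)
print(ping_pong(fast_forward=False, rounds=3))  # ('m5', 'm6', 10)
```

With fast-forwarding, both sides settle on the single real merge m1 and stay there. Without it, every exchange adds another commit that changes no content, and the two tips never agree.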
Say this is the ordering in branch A:
    a
    |
    b
    |
    c
Say this is the ordering in branch B:
    a
    |
    b
    |\
    d c
    |/
    e
When A pulls B, it gets the same ordering as B has. If B did not have e
It's not a tree change, but it records the fact that one branch merged
You can pull if you don't want that. We haven't found that people are
Maybe not in Git.
Aaron
-
Sure. But that doesn't throw away any local commit ordering. The original order (a->b->c) is still very much there. The fact that there was a branch off 'b' and there is also (a->b->d) and a merge of the two at 'e' doesn't But that's a totally specious "record". It has no meaning in a distributed SCM. There is absolutely zero semantic information in it. The fact that you _locally_ want to remember where you were is a total non-issue for a true distributed system. You shouldn't force everybody else to see your local view - since it has no relevance to them, and I don't think there is any in bzr either. Can you explain? In other words, the empty merge is totally semantically empty even in the bazaar world. Why does it exist? Linus -
After the pull, it's no longer the mainline ordering for the branch. c is represented as a revision that was merged into the branch, while d is
It means that the order that revisions are shown in log commands changes,
It records the committer, the date, the commit message, the parent
It exists because it is useful. Because it makes the behavior of bzr merge uniform. Because in some workflows, commits show that a person has signed off on a change.
It's not something special-- it's just another commit, like regular commits, and merge commits. It would be harder to forbid than it is to permit.
Aaron
-
Well, that is another example while generation number is/can be global,
...but that means that revision numbers are totally, absolutely useless.
Unless by some miracle of engineering, or adding namespace, they can be
All totally empty information. What should the commit message be? "I have
fetched changes from remote repository"? You can remove one of the parents
(the one pointing to the tip before the fast-forward "merge") without
changing reachability.
---------
/ \
But if you record "fast-forward merge", you force all people pulling
from your repository to have this purely local and without any significant
Signing off the fact of fetching changes? For true merge you are signing
off the fact that there were no conflicts, or you sign off your conflict
Actually the check is very easy. And you have to do a similar check when
fetching/pushing to ensure that you don't clobber your changes.
--
Jakub Narebski
Poland
-
No. The numbering always follows the leftmost parent. So each revision
No, because no one pulls unless they're trying to maintain a mirror of
Even if I agreed that the revision was meaningless, the cost of such a
You sign off on the contents of the revision you fetched. You say "I
Agreed. It's just that not checking is easier still.
Aaron
-
Aaron, thanks for carrying this thread along and helping to bridge some communication gaps. For example, when I saw your original two diagrams I was totally mystified how you were claiming that appending a couple of nodes and edges to a DAG could change the "order" of the DAG.
I think I understand what you're describing with the leftmost-parent ordering now. But it's definitely an ordering that I would describe as local-only. That is, the ordering has meaning only with respect to a particular linearization of the DAG and that linearization is
If in practice, nobody does the mirroring "pull" operation then how are the numbers useful? For example, given your examples above, if I'm understanding the concepts and terminology correctly, then if A and B both "merge" from each other (and don't "pull") then they will each end up with identical DAGs for the revision history but totally distinct numbers. Correct?
So in that situation the numbers will not help A and B determine that they have identical history or even identical working trees. So what good are the numbers?
I can see that the numbers would have applicability with reference to a single repository, (or equivalently a mirror of that repository), but no utility as soon as there is any distributed development happening.
-Carl
Well, the linearization for any particular head is well-defined, but
The DAGs will be different. If A merges B, we get:
    a
    |
    b
    |\
    c d
    |\|
    | e
    |/
    f
If B merges A before this, nothing happens, because B is already a superset of A. If B merges afterward, we get this:
    a
    |
    b
    |\
    d c
    |/|
    e |
    |\|
    | f
    |/
    g
They are good for naming mainline revisions that introduced particular
Well, there's distributed, and then there's *DISTRIBUTED*. We don't
quasi-randomly merge each others' branches. We have a star topology around bzr.dev. So when we refer to revnos, they're usually in bzr.dev.
Aaron
-
Seems like an awful lot of merge commits. In git, I think these trees would be identical (actually both to bazaar and to each other), with the exception that the 'g' commit wouldn't exist, since git does fast-forward and relies on dependency-chain only to present the graph instead of mucking around with info in external files (recording of As explained above, they would be identical in git. The fact that you register a fast-forward as a merge makes them not so, but this is So in essence, the revnos work wonderfully so long as there is a central server to make them immutable? Doesn't this mean that one of your key features doesn't actually work in a completely distributed setup (i.e., each dev has his own repo, there is no mother-ship, everyone pulls from each other)? I can see the six-line hook that lays the groundwork for this in git before me right now. I'll happily refuse to write it down anywhere. I get the feeling that sha's are easier to handle in the long run, while revno's might be good to use in development work. In git, we have <branch/tag/"committish">~<number> syntax for this. In my experience, finding the revision sha of an old bug is what takes time. Copy-paste is just as fast with 20 bytes as with 4 bytes. Honestly now, do you actually remember the revno for a bug that you stopped working on three weeks ago, or do you have to go look it up? If someone wants to notify you about the revision a bug was introduced, do they not communicate the revno to you by email/irc/somesuch? -- Andreas Ericsson andreas.ericsson@op5.se OP5 AB www.op5.se Tel: +46 8-230225 Fax: +46 8-230231 -
Ok. This I don't get. Let me recapitulate:
Branch A
    a
    |
    b
    |
    c
Branch B
    a
    |
    b
    | \
    d  c
    | /
    e
In branch A, if you merge branch B (git pull B), you get branch B as the result, because A fast-forwards to B and you don't get a merge commit f.
In branch B, if you merge branch A (git pull A), the result is branch B, because we are already up to date.
You _never_ have a commit f or g.
-Peter
-
Revnos were supposed to be superior to using sha1 (or shortened sha1)
as commit identifiers because of two key features:
1. They were simpler than sha1, therefore easier to use
2. Given two revisions related by lineage (i.e. one is an ancestor of
the other) you can tell at a glance which revision was earlier
But the details invalidated 1.: for complicated history, for a large
project, with many contributors and nonlinear development we have
www.repository.com:127.2.31.57 vs 988859a (7 chars shortcut of sha1)
to have immutable revno. And we have to use _immutable_ (up to few
years) revision identifiers, unless we want our "simple ids" scheme
to make a mess...
And I'm not sure if 2. is true, if even for revisions with direct
lineage we don't have to compare 127.15.2.16 with 210.2.20.3 for
example. Having a generation number would solve 2.; as of now git
checks for the fast-forward case by checking if the merge-base of the
two revisions is one of the revisions.
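The merge-base check mentioned above can be sketched as follows. This is a toy illustration in Python, not git's actual implementation: the history is assumed to be a dict from commit name to a tuple of parents, and `ancestors`, `merge_bases` and `can_fast_forward` are hypothetical names. A "merge base" here is a common ancestor that is not itself an ancestor of another common ancestor.

```python
def ancestors(dag, tip):
    """Return every commit reachable from tip (tip included)."""
    seen, stack = set(), [tip]
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(dag[commit])
    return seen


def merge_bases(dag, x, y):
    """Best common ancestors of x and y."""
    common = ancestors(dag, x) & ancestors(dag, y)
    # Drop any common ancestor that is reachable from another one.
    return {c for c in common
            if not any(c in ancestors(dag, other)
                       for other in common if other != c)}


def can_fast_forward(dag, tip, other):
    """True if `tip` can simply be moved forward to `other`."""
    return merge_bases(dag, tip, other) == {tip}


# The history from earlier in the thread: c and d branch off b, e merges them.
dag = {"a": (), "b": ("a",), "c": ("b",), "d": ("b",), "e": ("d", "c")}

print(can_fast_forward(dag, "c", "e"))  # True: c is an ancestor of e
print(can_fast_forward(dag, "c", "d"))  # False: they diverged at b
print(merge_bases(dag, "c", "d"))       # {'b'}
```

No revision numbers are needed for this test; only the parent links are consulted.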
--
Jakub Narebski
Poland
-
On Wed, Oct 18, 2006 at 10:39:32AM +0200 I heard the voice of It seems from my somewhat detached perspective that there's a lot of conflation of 'conventions' with 'capabilities' around this thread... With a single linear branch, revnos work wonderfully, and are probably much more useful than any sort of UUID. It would be silly in this day and age to design a VCS aimed specifically for this use case, of course. That doesn't mean a VCS shouldn't make it easy, though. With a star config, revnos are useful locally and with reference to the "main" branch[es]. And, most of the world is star configs of one sort or another. Actually, one might say that practically ALL the world outside of linux-kernel is star-configs ;) In many cases in the star setup, a revno (particularly along the 'trunk') is more directly useful than a UUID; consider particularly the case of somebody who's just mirroring/following, not actively developing. In some cases, the UUID is more useful. Certainly, using a revno in a case where the UUID is more appropriate is Bad, but that's just a matter of using the right tool. With a uber-distributed full-mesh setup, revnos may be basically useless for anything except local lookups (which boils down to "useless for most anything you'd identify a revision for"). For that case, you'd practically always use the UUID, and pretend revnos don't exist. The merge revno forms (123.5.2.17 and the like), I'm somewhat ambivalent about in many ways. But, you don't have to use them any more than you have to use "top-level" revnos. If either form of revno is Wrong for your case (whether it be because "I hate numbers wholesale", or because "Numbers don't cover this case usefully"), then you just use the UUID and pretend the number isn't there. If you wanted them completely out of sight, I wouldn't expect it to be very hard to talk bzr into never showing the revnos and just showing the UUID ("revid"). [ I don't speak for bzr, despite the fact that ...
That might be the case today. However, since we introduced git at the office, mini-projects are cropping up like mad, and pieces of toy-code are being pushed around among the employees. When something is found to be useful enough to attract management attention, it's given a spot at the "master site". It doesn't need one. It's just that we have this one place where gitweb is installed, which management likes whereas devs don't have that on their laptop. It's also convenient to have one place to find all changes rather than pulling from 1-to-N different people just to have a look at what they've done. The point I'm trying to make here is that the star config might be the most common case today because a) old scm's enforced this use case and it is therefor the most common way just out of habit. b) projects you actually *see* have gotten past the "Joe made some cool I can easily imagine the use case Linus pointed out with BK. Because revnos work wonderfully 80% of the time, people get confused, frustrated But they *do* exist, and they *usually* work, so people are bound to try them first. Teaching them when they work and when they don't (or rather, when they should and when they shouldn't, cause they will work by accident sometimes too) is bound to be a lot harder than sending them a So what's the point in having them? You can't seriously tell me that you think of 123.5.2.17 as something you can easily remember, do you? Count Not really. It's just that case 3 is the most flexible of them all. It's trivial to enforce linear development in git. Just add a hook that forbids merge commits. Set up a "master repo" and put the hook there and you've turned it into CVS with off-line log-browsing (more or less). Set up a master-server and enable the reflog there and you've turned it into bazaar, more or less. In git, the mothership repo is there for conveniance, because it's nice to have one place to set up mailing-list hooks, gitweb, git-daemon and ...
On Wed, Oct 18, 2006 at 01:19:10PM +0200 I heard the voice of c) Stars work well as a mental model for humans. Heck, in large, Linux is star-ish. There s "2.6.1", "2.6.2", etc; that's a trunk. Any time you have releases, you're establishing a "master" branch. For most people using Linux, there's a trunk, whether it's the kernel.org trunk, or the "What Redhat ships" trunk, etc. The closer you drill to the day-to-day work on the kernel, the farther it gets from trunks, but if it were full-mesh at all levels I don't think it would be nearly as usable for regular computing tasks as it is. Perhaps someday a heavy full-mesh setup will be the common case for VCS usage. I find that very difficult to buy for various reasons, but it could happen. If it does, bzr may well revisit the choice and decide revnos contribute little enough marginal value as to be a loss, Perhaps, for some projects. And in those cases, perhaps you'd want to flip a hypothetical "dump those numbers in the bin" switch. That doesn't mean every project wants to, or that those projects who don't and have no trouble and discernible gain from revno usage are No, I don't. But I don't use merge revnos for various reasons, one of the primary ones being that they don't currently intuitively follow from me (and that intuitiveness is the major attraction of revnos in the first place). I rarely refer to non-mainline revisions at all, in fact. And I use revnos for mainline revisions regularly. Heck, I communicate revnos _verbally_; people handle that easily with numbers, not so easily with hex strings. The vast majority of my branches are simple cases, and I like simple tools that match simple mental models for them. For the more intricate cases, revids provide a more rigorous tool, and I WANT a VCS that lets me choose which is appropriate. If I wanted a Yes, but this doesn't necessarily mean everything you seem to try and cover with it. The more rigorous tool will cover the simplest case (those ...
On Wed, 18 Oct 2006 07:43:20 -0500 Just to be clear here, Git is also able to supports this model if you so choose. It's quite easy for a server to generate Git tags for every commit it gets. It's just that this is basically a non issue in the Git world. People who use Git aren't crying out for salvation from sha1 numbers. So I think this entire discussion is a bit overblown. But just to be clear, there is nothing in the Git model that prohibits tagging every commit with something you find less objectionable than sha1's. They can appear in the log listings and in gitk etc, and everyone who pulls from the central server will get them. In fact, for some imports of other VCS into Git, exactly that is done; so every commit can be referenced by its sha1 _or_ the "friendly" number it was known by in its original VCS. Sean -
I really don't think that's even true. Most projects do tend to have a star-like setup, but I think that's largely due to historical tools, not mental models. For example, I used CVS professionally for too long a few years ago, and the thing I _really_ hated was exactly how it forced people who were working on "experimental stuff" to be so tightly organized around the central repository (and how they had to do things that were visible and annoying to the mainline). And I think that's where the "star-like" situation breaks down: when you have a group of people who go off to do something experimental. Suddenly the "mainline" in that case isn't the central and most important repository any more, and instead you really have another second (and third, fourth etc) "centerpoint" that another group works around. Now, what does that mean? It means that whenever you look at a big project from the outside, you tend to see a star-like thing: there's the "big common thing", and you won't even be _seeing_ the off-shoots, because they tend to be used by developers to try out new ideas etc. So it looks like a star, but it really isn't, and shouldn't be. An SCM should support the _developers_, not the users. The users don't need an SCM, they just need a place to fetch the "standard" thing (preferably with a vendor that supports them or at least makes them feel comfy). But an SCM really should support the off-shoots, because that's where the exciting stuff happens. Btw, this is also why distribution is so fundamentally important: Most of the off-shoots tend to be failures, but that is as it should be. Again, this is where SVN and CVS and other centralized models fail _miserably_. Because branches are in a centralized repository, the cost of failure is visible to all, and thus people don't like creating branches for things that don't look "obviously viable" to the people around the central repository. In contrast, in a truly distributed environmen, a ...
Wow. Thanks for elucidating---again I was making some incorrect assumptions about the system, so your answer was surprising and appreciated.
So, am I correct in my understanding now that it's impossible for two users to establish identical code history on both sides through merge? If the two kept merging back and forth the history would pick up a new commit each time even though there were no code changes. Right?
That's a startling property. I'm surprised to learn that the generally-used mechanism for getting new changes doesn't have a mode where it says "you're already up to date---doing nothing".
I do understand that there's a separate "pull" that does allow for correct synchronization of a local repository with a remote repository, and it does have the "up to date---doing nothing" behavior. But as you already said, it's often avoided specifically because it destroys locally-created revision numbers.
Another way of describing bzr's "pull" is that it establishes a master-slave relationship between the remote and local repository, (his numbers are more important than mine, so I'll throw mine away). I think Linus already provided a good argument in this thread about why that kind of asymmetry is bad for software projects and why tools should not provide it.
So there are some aspects of the bzr design that rob from its ability to function as a distributed version control system. It really does bias itself toward centralization, (the so called "star topology" as opposed to something "fully" distributed).
And by the way, some people seem to have the opinion that there's something unique about the way the linux kernel is developed that allows it to benefit from a fully distributed system. The assumption seems to be that projects with a central tree won't benefit the same way, and don't really need the full set of features of a distributed system.
That's not true in my experience. With cairo, for example, we had been using cvs. Obviously, it imposes a centralized ...
On Wed, Oct 18, 2006 at 08:38:24AM -0700 I heard the voice of I think this has the causality backward. It's avoided because it changes the ancestry of the branch in question, by rearranging the left parents; this ties into Linus' assertion that all parents ought to be treated equally, which I'm beginning to think is the base lynchpin of this whole dissension. Without a differentiation of the parents, there's no such creature as a "mainline" on a branch, so it's hard to find anything to base revnos on from the get-go; the whole discussion becomes meaningless and incomprehensible then. With the differentiation, numbering along the leftmost 'mainline' makes sense, and fits the way people tend to work. "I did this, then I did this, then I merged in Joe's stuff, then I did this", and the numbering follows along that. And as long as it's the same branch, those revnos will always be the same; I can't go back and add something in between my first and second commits. THAT'S where revnos are useful; referring to a point on given branch. Certainly, they're of no (or extremely limited) use when referring to _different_ branches. And when you change the arrangement of parents on a branch, you create a different branch. That's why bzr (the project, not the program) tends toward trunks that are merged into, rather than ephemeral trunks that are merged from and then replaced with the new trunk, and has its UI optimized by default for that case; because the ordering of the parents IS considered important and to be preserved. Ancestry changes aren't avoided because it would screw up the revnos; the revnos don't get screwed up because the ancestry changes are avoided for their OWN sake, and it's BECAUSE of that pre-existing tendancy that the revnos could come into being in the first place. If you need to refer to a specific revision in a vacuum, a revno is the *WRONG* tool for the job. Revnos exist to refer to points along a branch. And in cases where there's a meaningful ...
You, and others, keep saying "leftmost". What on earth does left or right have to do with anything? Or rather, how do you determine which
So long as the given branch is, in git-speak, "master"? I think I'm starting to see how this would work, but I still fail to see how you can then come up with revnos such as 2343.1.14.7.19, since the only ones that seem to actually make any sense are the ones that track the strictly linear development. In git, this can be accomplished by auto-tagging each update of any branch with a tag named numerically and incrementally, although no-one really bothers with it.
Let's say you have the following graph, where A is the root commit, B introduces the base for a couple of new features that three separate coders start to work on in their own repositories. The feature started on in D is logically coded as a two-stage change. F fixes a bug introduced in D. I is the result of an octopus merge of all three branches, where the three features are implemented and all bugs are fixed (this is btw by far the most common pattern we have in our repos here at work).
        A
        |
        B
       /|\
      C | D
      | | |\
      | | E F
      | | |/
      | | G
      | H /
       \|/
        I
Now a couple of questions arise.
- How do I get to C, D, E, F, G and H?
- When these get merged, which one will be considered the "left" parent,
I'm sure it's supported. The question is whether or not bazaar makes it easy for those developers to exchange valuable information (revids, since their revnos will be mixed up) so they can communicate detailed info about "commit X introduced a bug in foo_diddle(). I fixed it in commit Y, so if you merge it we can release".
If revids are always printed anyways, I see even less need for revnos. If it's hard to get the revids I wouldn't consider the truly distributed workflow supported any more than I consider CVS file rename support
Not sure it's the same in git, but in bzr, a new revision is always created by a commit (it can be "fetched" by other commands though). If you "merge", then you have to commit afterwards.
What people call the "leftmost ancestor" is the revision which used to be the tip at the time you committed. For example, if you do "bzr diff; bzr commit", the diff shown before is the same as the one you get with "bzr diff -r last:1" right after the commit.
I believe this doesn't make a difference for merge algorithms, but in the UI, it's there when you say, e.g.:
    bzr diff -r last:12..before:revid:foo@bar-auents987aue
(once in "last:", and once in "before:")
--
Matthieu
-
Dear diary, on Thu, Oct 19, 2006 at 02:04:14PM CEST, I got a letter
The lack of parents ordering in Git is directly connected with
fast-forwarding.
Consider
repo1 repo2
a a
/ /
b c
Now repo2 merges with repo1:
repo1 repo2
a a
/ / \
b c b
\ /
m
repo1 tip ('b') is not ancestor of repo2 tip ('c') so a three-way merge
is done and a new 'm' merge commit is created.
And now repo1 merges with repo2:
repo1 repo2
a a
/ \ / \
c b c b
\ / \ /
m m
Because previous repo1 tip ('b') was ancestor of repo2 tip ('m'), a
fast-forward happened and repo1 tip simply moved to 'm'. But this
"flipped" the development from repo1 POV - you cannot assume anymore
that the first ("leftmost") parent is special.
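The scenario above can be replayed in a small Python sketch. It is a toy model under assumptions, not git internals: the history is a dict from commit name to a tuple of parents (first parent first), and `merge` and `first_parent_chain` are hypothetical helpers written for this illustration.

```python
def ancestors(dag, tip):
    """Return every commit reachable from tip (tip included)."""
    seen, stack = set(), [tip]
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(dag[commit])
    return seen


def merge(dag, mine, theirs, new_name):
    """Merge with fast-forward; the old tip becomes the first parent."""
    if theirs in ancestors(dag, mine):
        return mine                      # already up to date
    if mine in ancestors(dag, theirs):
        return theirs                    # fast-forward, no new commit
    dag[new_name] = (mine, theirs)       # true merge
    return new_name


def first_parent_chain(dag, tip):
    """Walk back from tip, always taking the first ("leftmost") parent."""
    chain = [tip]
    while dag[chain[-1]]:
        chain.append(dag[chain[-1]][0])
    return chain


dag = {"a": (), "b": ("a",), "c": ("a",)}
repo1, repo2 = "b", "c"

repo2 = merge(dag, repo2, repo1, "m")       # true merge: m = (c, b)
repo1 = merge(dag, repo1, repo2, "unused")  # fast-forward to m

print(repo1, repo2)                    # m m
print(first_parent_chain(dag, repo1))  # ['m', 'c', 'a']
```

After the fast-forward, following first parents from repo1's tip goes through c, a commit repo1 never made, and skips b, which used to be repo1's own tip.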
--
Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
#!/bin/perl -sp0777i<X+d*lMLa^*lN%0]dsXx++lMlN/dsM0<j]dsj
$/=unpack('H*',$_);$_=`echo 16dio\U$k"SK$/SM$n\EsN0p[lN*1
lK[d2%Sa2/d0$^Ixp"|dc`;s/\W//g;$_=pack('H*',/((..)*)$/)
-
Yes, bzr has a similar thing too. AIUI, the difference is that git does it automatically, while bzr has two commands in its UI, "merge" and "pull".
In your case, the "leftmost ancestor" of m is b, because at the time it was created, it was committed from b.
One problem with that approach is that from revision m and looking backward in history (say, running "bzr log"), you have two ways to go backward:
1) Take the history of _your_ commits, and your pull till the point where you've branched.
2) Follow the history taking the leftmost ancestor at each step.
In bzr, the notion of "branch" corresponds to a succession of revisions, which are explicitly stored in a file (ls .bzr/branch/revision-history), which is what commands like "log" follow, and what is used for revision numbering. And this succession of revisions must obey (at most) one of the above.
In the past, it was 1), which means that "pull" (i.e. fast-forward) was only adding revisions to a branch. In your scenario, repo1 would get a revision history of "a c m" while repo2 would have had "a b m" with the same tip.
Today, the revision history follows the leftmost ancestor. One good property of this is that the revision history is unique for a given revision. But the terrible drawback is that "pull" and "push" do not /add/ revisions to your revision history, they rewrite the target one with the source one. That means I can have
    $ bzr log --line
    1: some upstream stuff
    2: started my work
    3: continued my work
    # upstream merges.
    $ bzr pull
    $ bzr log --line
    1: some upstream stuff
    2: some other upstream stuff ...
    3: ... committed while I was working
    4: merged from Matthieu this terrible feature
--
Matthieu -- definitely curious to give a real try to git ;-)
-
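The renumbering described above can be reproduced with a short Python sketch. It is a toy model under assumptions, not bzr's code: revisions live in a dict from revision name to a tuple of parents with the leftmost parent first, the names u1, w1 and so on are invented for the example, and `revision_history` and `revnos` are hypothetical helpers.

```python
def revision_history(dag, tip):
    """Oldest-first list of revisions found by following leftmost parents."""
    chain = [tip]
    while dag[chain[-1]]:
        chain.append(dag[chain[-1]][0])
    chain.reverse()
    return chain


def revnos(dag, tip):
    """Map revision number -> revision for the branch whose tip is `tip`."""
    return dict(enumerate(revision_history(dag, tip), start=1))


dag = {
    "u1": (),            # some upstream stuff
    "w1": ("u1",),       # started my work
    "w2": ("w1",),       # continued my work
    "u2": ("u1",),       # upstream work done in the meantime
    "u3": ("u2",),
    "m": ("u3", "w2"),   # upstream merges my branch; its old tip u3 is leftmost
}

before_pull = revnos(dag, "w2")
after_pull = revnos(dag, "m")    # a pull moves my tip to upstream's tip

print(before_pull)  # {1: 'u1', 2: 'w1', 3: 'w2'}
print(after_pull)   # {1: 'u1', 2: 'u2', 3: 'u3', 4: 'm'}
```

Revision numbers 2 and 3 name different revisions before and after the pull, even though no revision was changed or lost; w1 and w2 are still reachable from m, just no longer on the numbered mainline.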
Yes. We're identifying the core underlying technical difference behind
the recent discussion. Namely bzr treats one parent as special, (the
parent that was the branch tip previously). And this special treatment
eliminates the ability to fast-forward, adds merge commits that
wouldn't exist with fast forwarding, and is able to make its revision
There's a bit more to it than that though. The git command named
"pull" will perform a fast-forward if possible, but will create a
merge commit if necessary. For example:
  a                 a                            a
  |      pulls      |   and fast-forwards to     |
  b                 b                            b
                    |                            |
                    c                            c

whereas:

  a                 a                            a
  |      pulls      |   and creates a merge     / \
  b                 c                          b   c
                                                \ /
                                                 m
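
The two cases in the diagrams come down to an ancestry test. Here is a rough Python sketch, assuming a toy DAG stored as a dict of parent lists; pull_action and ancestors are illustrative names, not git functions, and real git does considerably more than this.

```python
def ancestors(tip, parents):
    """Return tip plus everything reachable from it through parent links."""
    seen, stack = set(), [tip]
    while stack:
        rev = stack.pop()
        if rev not in seen:
            seen.add(rev)
            stack.extend(parents[rev])
    return seen


def pull_action(local_tip, remote_tip, parents):
    """Classify what a git-style pull would have to do (toy model)."""
    if remote_tip in ancestors(local_tip, parents):
        return "up to date"
    if local_tip in ancestors(remote_tip, parents):
        return "fast-forward"
    return "merge"


linear = {"a": [], "b": ["a"], "c": ["b"]}      # first diagram
diverged = {"a": [], "b": ["a"], "c": ["a"]}    # second diagram
print(pull_action("b", "c", linear))    # fast-forward
print(pull_action("b", "c", diverged))  # merge
```

In the diverged case a new commit m with parents b and c would be created; in the linear case the branch pointer simply moves from b to c.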
So I'm curious. What does bzr pull do in the case of divergence like
It should be mentioned that git can, (annoyingly not by default), save
a file detailing the history of a branch, (time and revision ID for
every time the branch tip moved). This is the "reflog" support and
provides the same information that bzr is encoding in its "leftmost
ancestor" branches.
Importantly, though, git's reflog is entirely local and is not
Uhm, don't you really have to follow both? And the only ambiguity is
OK. With git the two reflogs on the two machines would also have "a c
m" and "a b m". But is this the only kind of log that exists? If I
had code history as above and wanted to ask questions about what led
to commit m, then I would want to know about both b and c which
contribute to it.
And that's what "git log" provides. It lists all the commits that are
reachable from a given commit by following parent links. Surely bzr
has a way to view the complete history that way?
Meanwhile, I suggest that there really is no significance to which
parent of a commit used to have the branch head pointing at ...

No. bzr could trivially do fast-forward too. It's an explicit design

They don't exist either with "pull".

The difference between bzr and git is smaller than you think on this

The bzr command "pull" will do a fast-forward if possible, but will refuse to continue and ask you to create the merge commit with other

Here, bzr will refuse to pull. It will say "branches have diverged" and tell you to use merge. Then, you'll do

  $ bzr merge
  # optionally "bzr status"
  $ bzr commit -m "merged such or such thing"

So, "git pull" seems roughly equivalent to something like

Not yet. The "numbers will be changed" is if b pulls, right after.

Then, one other difference is in the UI. bzr shows you commits in a kind of hierarchical manner, like (fictive example, that's not the real exact format).

  $ bzr log
  committer: upstream@maintainer.com
  message: merged the work on a feature
      ------
      committer: contributor@site.com
      message: prepared for feature X
      ------
      committer: contributor@site.com
      message: implemented feature X
      ------
      committer: contributor@site.com
      message: added testcase for feature X
  ------
  committer: upstream@maintainer.com
  message: something else

No big difference in the model either, but it probably reveals a different vision of what "history" means.

-- Matthieu
-
I have lost somewhere among many emails in this thread the email I wanted to reply to, the one mentioning for the first time the lack of parents ordering in GIT, but this one should do.

There are exactly _two_ places where Git treats first parent specially (correct me if I'm wrong).

First, <commit-ish>^ is a shortcut for <commit-ish>^1, i.e. for the first parent of the commit. <commit-ish>~<n> is a shortcut for <commit-ish>^^...^ (n-times '^'), which means that <commit-ish>~<n> is the n-th generation ancestor of <commit-ish> in its 1st-parent lineage. But you can always use names like for example next~12^2^^2~2.

Second, git-diff with only one <commit-ish> generates diff to first parent. But you can always use '-c' or '--cc' combined diff format or '-m' with default diff format to compare to _all_ parents.

-- Jakub Narebski Poland
-
I stand corrected: git-diff refuses to show anything if provided with only one commit, and commit has more than one parent. So it does not treat first parent specially. -- Jakub Narebski Poland -
Yes, it seems you have found the needle. :-) In git, history is a DAG; a commit has a _set_ of parents, so by definition they are not ordered. This has a number of consequences. For example, you can't really answer the question "Which branch was this commit on?". All you can say is that "This commit is reachable from (and therefore part of) branches X, Y, and Z." In all other SCMs I have seen, a "branch" is conceptually an ordered series of commits (some of which may be merges). In git, a "branch" is a pointer to a commit, period. The commit knows its set of parents, so all its history is there, but there is fundamentally no way to tell which branch a commit was "on" when it was created. This is an important point; it means there is no concept of "my" or "your" branch. Every participant is adding commits to the same DAG, and may at any point decide to share her additions with someone else, or keep them private forever. And because "branches" don't really exist, every commit really is created equal. Really, every commit. Not even the initial commit of a project is special -- it's just a commit with an empty parent set. And, it's perfectly possible to make a (merge) commit whose parents belong to previously disconnected parts of the DAG. This of course means that it's not even possible to differentiate commits based on which project they're part of, since one can create a commit whose parents belong to different projects. All commits are _really_ born equal! There's just one great DAG of all git commits that could possibly exist. (This has been done in git's own history; the graphical viewer gitk was originally a separate project, with its own initial commit, but that initial commit is now reachable from all commits currently being made to git -- that is, it has been merged.) This structure of things may seem complex, since it's different, but mathematically it's quite simple, and that's what counts in the end if you want to do nontrivial things. -- Karl Hasselstr
On Thu, Oct 19, 2006 at 01:46:39PM +0200 I heard the voice of
By default, merge will refuse to do its thing if there are uncommitted
changes in the working tree, whether those changes are something
you've done, or the pending results of a previous merge. A '--force'
arg to merge will make it go forward though, so yes, you can merge
multiple other branches in one merge if you want to.
Actually, I can kill 2 birds here. Quick little bictopus merge:
% bzr log --show-ids
------------------------------------------------------------
revno: 2
revision-id: fullermd@over-yonder.net-20061019151856-c3b406b8bcdfb537
parent: fullermd@over-yonder.net-20061019151437-5b99dff6ed1d76cd
parent: fullermd@over-yonder.net-20061019151800-2fe41e4949f5e237
parent: fullermd@over-yonder.net-20061019151807-3d7047e387edcad9
committer: Matthew Fuller <fullermd@over-yonder.net>
branch nick: a
timestamp: Thu 2006-10-19 10:18:56 -0500
message:
merge
------------------------------------------------------------
revno: 1.2.1
merged: fullermd@over-yonder.net-20061019151800-2fe41e4949f5e237
parent: fullermd@over-yonder.net-20061019151437-5b99dff6ed1d76cd
committer: Matthew Fuller <fullermd@over-yonder.net>
branch nick: b
timestamp: Thu 2006-10-19 10:18:00 -0500
message:
bar
------------------------------------------------------------
revno: 1.1.1
merged: fullermd@over-yonder.net-20061019151807-3d7047e387edcad9
parent: fullermd@over-yonder.net-20061019151437-5b99dff6ed1d76cd
committer: Matthew Fuller <fullermd@over-yonder.net>
branch nick: c
timestamp: Thu 2006-10-19 10:18:07 -0500
message:
baz
------------------------------------------------------------
revno: 1
revision-id: fullermd@over-yonder.net-20061019151437-5b99dff6ed1d76cd
committer: Matthew Fuller <fullermd@over-yonder.net>
branch nick: a
timestamp: Thu 2006-10-19 10:14:37 -0500
message:
...On Thu, Oct 19, 2006 at 11:01:03AM -0500 I heard the voice of
Let me elaborate a little on this.
for the previously discussed merge, basically duplicating
'fast-forward' behavior. It doesn't currently, but it could just as
well without disturbing the attributes it gains from assigning meaning
to the left-most parent. The choice to create E is the result of an
independent decision from the choice to treat the left path as
special.
What the leftmost discussion impacts is the case of
a-.
|\ \
| b c
|/ /
D-'
vs
a-.-.
\ \ \
b c |
/ / /
D-'-'
Now, the branches are distinct to bzr, but they're not different. If
you try to merge one from the other, merge will quite rightly tell you
there's nothing to do, since you both have all the same revs. git
doesn't recognize the distinction at all, of course. The difference
is mostly cosmetic. But, it's a cosmetic difference that bzr devs
(and users, I venture) find _useful_, which is why it's fought for.
And everything else seems to follow from that.
If you don't think the distinction is meaningful or useful, you can
ignore it, and the tool should work just fine. The main place the
distinction would show up is in the cosmetics of how "log" looks (and
probably similarly in any tool that graphically describes ancestry),
and a custom log output formatter could probably be very easily
written to obviate even that.
--
Matthew Fuller (MF4839) | fullermd@over-yonder.net
Systems/Network Administrator | http://www.over-yonder.net/~fullermd/
On the Internet, nobody can hear you scream.
-
Right. You have to do it your way, because of the "simple revision numbers".

Which gets us back to where we started: "simple" is in the eye of the beholder. I personally think that git revision naming is a lot simpler, exactly because it doesn't impose arbitrary rules on users.

For example, what happens is that:

 - you like the simple revision numbers
 - that in turn means that you can never allow a mainline-merge to be done by anybody else than the main maintainer
 - that in turn means that the whole situation is no longer distributed, it's more like a "disconnected access to a central repository"

The "main trunk matters" mentality (which has deep roots in CVS - don't get me wrong, I don't think you're the first one to do this) is fundamentally antithetical to a truly distributed system, because it basically assumes that some maintainer is "more important" than others. That special maintainer is the maintainer whose merge-trunk is followed, and whose revision numbers don't change when they are merged back.

That may even be _true_ in many cases. But please do realize that it's a real issue, and that it has real impact - it does two things:

 - it impacts the technology and workflow directly itself: "pull" and "merge" are different: a central maintainer would tend to do a "merge", and one more in the outskirts would tend to do more of a "pull", expecting his work to then be merged back to the "trunk" at some later point
 - it will result in _psychological_ damage, in the sense that there's always one group that is the "trunk" group, and while you can pass the baton around (like the perl people do), it's always clear who sits centrally.

Maybe this is fine. It's certainly how most projects tend to work.

I'll just point out that one of my design goals for git was to make every single repository 100% equal. That means that there MUST NOT be a "trunk", or a special line of development. There is no "vendor branch". ...
That's not true of bzr development. The "main maintainer" that runs the bzr.dev is an email bot. It's not an integrator -- its work is purely mechanical. It can't resolve merge conflicts. Most of the merge work is done in integration branches run by the core developers.

Although Martin is our project leader, lays out ground rules, and makes design decisions, he doesn't have to be involved in any

Linus, if you got hit by a bus, it would still be a shock, and it would still take time for the Linux world to recover. Your insights and talent, both technical and social, make you the most important kernel developer. And it stays that way because you deserve it. Projects with good leadership don't fork, or if they do, the fork withers and dies pretty quickly.

It is fine to say all branches are equal from a technical perspective. From a social perspective, it's just not true.

The scale of Bazaar development is much smaller than the scale of kernel development, so it doesn't make sense to maintain long-term divergent branches like the mm tree. We do occasionally have long-lived feature

As I mentioned earlier, there are four people who each run their own

I think you're implying that on a technical level, bzr doesn't support this. But it does. Every published repository has unique identifiers for every revision on its mainline, and it's exceedingly uncommon for these to change.

There are special procedures to maintain bzr.dev, but there's nothing technically unique about it. People develop against bzr.dev rather than my integration branch, because they have non-technical reasons for wanting their changes to be merged into

On an actively-developed bzr branch, the first parent *is* special:

 - it's a revision that you committed
 - the diff between a revision and its first parent is the same as the

I don't think your analysis holds together completely, because all actively-maintained branches have very stable ...
That's actually a very important insight, but supporting the wrong conclusion.

In a healthy situation, the only thing that makes a branch special are social issues, such as you describe. That's how it should be.

But think about your favorite example of an unhealthy social situation around a software project and a big, nasty fork. Every example I can think of involves some technical distinction that makes one branch more special than another. Now, those situations also involve social problems, and those are even more significant. But the technical blessing of one branch does not help. And I think it contributes to the social problems in many cases.

So, I think the technical thing that is distributed version control is an extremely important thing for us to use to help maintain healthy social software projects. Reducing the technical hurdle of a fork, (to where continual forking is actually a totally expected part of the process), is a very healthy thing.

Now, both bzr and git are distributed systems, and either one will help a great deal in the respects I'm talking about compared to something like cvs.

As far as the revision numbers, my impression is that the numbers would be confusing or worthless if I were to use bzr the way I'm

Which just says to me that the bzr developers really are sticking to a centralized model. That's fine, but it does have impacts, and the tool

Every argument you make for the number change being uncommon just strengthens the argument that it will be all that more confusing/frustrating when the numbers do change.

In cairo, for example, we've made a habit of including a revision identifier in our bug tracking system for every commit that resolves a bug. I like having the assurance that those numbers will survive forever. And it doesn't matter if the repository moves, or the project is forked, or anything else. Those numbers cannot change.

I understand that bzr also has unique identifiers, but it sounds like the tools try to hide ...
There is a mix of

 - Just giving the overall tarball version number, which is most meaningful to users (and not related to bzr versions)
 - Giving a mainline revision number, which will never revert because we never pull (fast-forward) that branch. That has the substantial (imo) benefit that you can immediately compare these numbers by eye, and they are easy to quote.
 - Giving a unique id, which is obviously most definitive and appropriate if you're talking about something which is not on the mainline or a well known branch.

The launchpad.net bug tracker links branches to bugs and does this through revision ids.

-- Martin
-
I'm not as familiar with those details. The one fork that I know a lot about, when Baz (the old Bazaar architecture) forked off from Arch, showed me that for each developer branch, one branch must be special. This is just because it is hard to maintain a branch that applies cleanly to two diverging codebases. So each developer must develop against the fork that they want to merge their code into. If they want their code to be applied to the other fork, someone must port it.

So I really do feel that special branches are inescapable. With bzr, you have the freedom to choose which branch you consider special, and change your mind at any time. There are no technical

They would remain stable if you only used pull to update your origin

I don't see why you're reaching that conclusion. I'd like to understand that better, because Linus seems to be concluding the same thing, and it

That doesn't follow. Just because something is arguably true doesn't make it bad. And in this case, I'm not arguing that it's true, I'm

We do it the other way around: we put a bug number in the commit message. And I personally have been developing a bugtracker that is distributed in the same way bzr is; it stores bug data in the source

Yes, we put revnos in our bug trackers. No, we can't prove that they will always be valid. But there are significant disincentives to changing them, so I am quite comfortable assuming they will not change. And the older a revno gets, the less likely it is to change.

On the other hand, I think your revision identifiers are not as permanent as you think. In the first place, it seems fairly common in the Git community to rebase. This process throws away old revisions and creates new revisions that are morally equivalent[1]. I don't know whether Git fetches unreferenced revisions, but bzr's policy is to fetch only revisions referenced in the ancestry DAG of the branch. In the second place, one must ...
First, I want to point out that I think we're having a delightfully enlightening conversation here, and I'm glad for that.

Let me provide a couple of hypothetical situations to try to demonstrate my thinking here. The first is far-fetched but perhaps easier to understand the implications. But the second is the real, everyday situation that is much more important.

Far-fetched
-----------
Let's imagine there's a complete fork in the bzr codebase tomorrow. We need not suppose any acrimony, just an amiable split as two subsets of the team start taking the code in different directions.

Now, at the time of the fork, all published revision numbers apply equally well to either team's codebase, (obviously, since they are identical). But as the projects diverge they each start publishing revision numbers with respect to their own repositories in their own bug trackers, etc. Obviously, each project has its own "mainline" so these new revision numbers are only unique within each project and not between the two.

Time passes...

Finally the two teams (who had remained good friends after the breakup) find a unifying theory that will let them work on a single tool that will meet the needs of both user bases. So they want to merge their code together.

After the merge, there can be only one mainline, so one team or the other will have to concede to give up the numbers they had generated and published during the fork. That is, the numbers will not be usable within the new, merged repository.

Everyday
--------
Now, the above scenario is just silly. It's not likely to ever happen, so it's really not worth considering as a motivating case. But, what does (and should) happen everyday is exactly the same. So here's a realistic situation that is worth considering:

An individual takes the bzr codebase and starts working on it. It's experimental stuff, so it's not pushed back into the central repository yet. But our coder isn't a total recluse, so his friends help him with the code ...
I don't think this is true. The abandoned mainline does not need to be destroyed. It can be kept at the same location that it always was, with the numbers that it always had. So the number + URL combo stays meaningful.

Additionally, the new mainline can keep a mirror of the abandoned mainline in its repository, because there are virtually no

They certainly can. The coder says "I've put up a branch at http://example.com/bzr/feature. In revision 5, I started work on feature A. I finished work in revision 6. But then I had to fix a related bug in revision 7."

As long as that coder is active, they'll keep their repository at the same location. And because branches are cheap (even cheaper than delta-compressed revisions), there's no reason to delete old branches.

This is true, but his code is likely to all land in the mainline at once. Since his own revnos are more fine-grained, he's not likely to want

I felt that you were mischaracterizing my _statement_ that "it's exceedingly uncommon for [revnos] to change" as an _argument_ "it's exceedingly uncommon for [revnos] to change". The reality is that we keep saying revnos don't change because git users keep saying "but what

If you're interested, it's called "Bugs Everywhere" and it's available here: http://panoramicfeedback.com/opensource/

So actually, not all branches are treated equally by Git users. Public branches are treated as append-only, but private branches are treated as

Same here.

Aaron
-
Sure that's possible, but it gets rather unwieldy the more repositories you have involved. I've been arguing that bzr really does encourage centralized, not distributed development, and you were having trouble seeing how I came to that conclusion. Do you see how "maintain an independent URL namespace for every distributed branch" doesn't

And this part I don't understand. I can understand the mainline storing the revisions, but I don't understand how it could make them accessible by the published revision numbers of the "abandoned"

...which is what you just said there yourself.

On the other hand, git names really do live forever, regardless of where the code is hosted or how it moves around.

When I'm talking about historical stability, I'm talking about being able to publish numbers that live forever. It sounds like bzr has numbers like this inside it, (but not nearly as simple as the ones that git has), but that users aren't in the practice of communicating with them. Instead, users communicate with the unstable numbers. And that's a shame from an historical

What I'd like to be able to do, is advertise a temporary repository, and while using it, publish names for revisions that will still be valid when the code gets pushed out to the mainline. That is supporting distributed development, and everything I'm hearing says

OK. The original claim that sparked the discussion was that bzr has a "simple namespace" while git does not. We've been talking for quite a while here, and I still don't fully understand how these numbers are generated or what I can expect to happen to the numbers associated with a given revision as that revision moves from one repository to another. It's really not a simple scheme.

Meanwhile, I have been arguing that the "simple" revision numbers that bzr advertises have restrictions on their utility, (they can only be used with reference to a specific repository, or with reference to another that treats it as canonical). I _think_ I understand ...
I understand your argument now. It's nothing to do with numbers per se,

I meant that the active branch and a mirror of the abandoned branch could be stored in the same repository, for ease of access.

Bazaar encourages you to stick lots and lots of branches in your repository. They don't even have to be related. For example, my repo

I can see where you're coming from, but to me, the trade-off seems worthwhile. Because historical data gets less and less valuable the older it gets. By the time the URL for a branch goes dark, there's

When you create a new branch from scratch, the number starts at zero. If you copy a branch, you copy its number, too. Every time you commit, the number is incremented. If you pull, your numbers are adjusted to be identical to those of the branch you pulled from.

Sure. It's the "favors centralization" thing that I don't agree with,

In my experience, users who don't understand distributed systems don't

What's nice is being able to see the revno 753 and knowing that "diff -r 752..753" will show the changes it introduced. Checking the revno on a branch mirror and knowing how out-of-date it is.

Aaron
-
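The four numbering rules above (start at zero, copy copies, commit increments, pull adopts the source's numbers) can be modelled in a few lines. This is a toy Python sketch and not how bzr is implemented: the Branch class, its method names, and the merged_from parameter are all invented for illustration, and revision ids are just strings.

```python
class Branch:
    """Toy model: the mainline is an ordered list, revno = position in it."""

    def __init__(self, mainline=(), ancestry=()):
        self.mainline = list(mainline)                 # new branch: revno 0
        self.ancestry = set(ancestry) | set(mainline)  # everything merged in

    def revno(self):
        return len(self.mainline)

    def copy(self):
        # Copying a branch copies its numbers too.
        return Branch(self.mainline, self.ancestry)

    def commit(self, rev_id, merged_from=None):
        # Each commit adds one mainline revision, so the revno goes up by one.
        self.mainline.append(rev_id)
        self.ancestry.add(rev_id)
        if merged_from is not None:
            self.ancestry |= merged_from.ancestry

    def pull(self, other):
        # Only allowed when the other branch already contains our tip.
        if self.mainline and self.mainline[-1] not in other.ancestry:
            raise ValueError("branches have diverged")
        # Our numbers are adjusted to be identical to the source's.
        self.mainline = list(other.mainline)
        self.ancestry = set(other.ancestry)

    def lookup(self, revno):
        return self.mainline[revno - 1]


upstream = Branch()
upstream.commit("some upstream stuff")
mine = upstream.copy()
mine.commit("started my work")
mine.commit("continued my work")
upstream.commit("some other upstream stuff")
upstream.commit("committed while I was working")
upstream.commit("merged from Matthieu", merged_from=mine)

print(mine.lookup(2))   # started my work
mine.pull(upstream)
print(mine.lookup(2))   # some other upstream stuff
```

The last two lines show the effect discussed earlier in the thread: after the pull, revno 2 on the contributor's branch names a different revision than it did before.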
I don't know if that is what Carl's problem is, but yes, to somebody from the git world, it's totally insane to have the _same_ commit have ten different names just depending on which branch it was in. In git-land, the name of a commit is the same in every branch. Do you have something like gitk --all in your graphical viewers? That one shows _all_ the branches of a repository, and how they relate to each other in git. How do you name your commits in such a viewer, since every branch has a _different_ name for the same commit? Linus -
I've been following the git-vs-bzr discussion, and I'd like to ask a question (being new to both bzr and git). How does git disambiguate SHA1 hash collisions? I think git has an alternative way to name revisions (can someone please explain it in more detail, I've seen <ref>~<n> mentioned only in passing in this thread). It seems to me collisions are a good argument in favour of having two independent naming schemes, so that you're not solely relying on hashes being unique. A strong argument is that a global namespace based on hashes of data is ideal because the names are generated from the data being named, and therefore are immutable. Same data => same name for that data, always and forever, which is desirable when merging named data from many sources. But the converse isn't true: one name does not necessarily map to only that data. Have I misunderstood? Is this a problem? Ta, Loki -
Hi, It does not. You can fully expect the universe to go down before that happens. The only reasonable worry is about SHA-1 being broken some time in the future, i.e. being able to construct a malign version of some source code _which has the same hash_. There were plenty of discussions about that; please search the mailing list. (The consensus was that those do not matter, because an existing object will _never_ be overwritten by a fetch, so you would not get that invalid object anyway.) Hth, Dscho -
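For readers new to the idea of names derived from content, here is a small Python sketch of how git names a file's contents (a "blob"): it hashes a short header followed by the bytes themselves. The function name git_blob_id is made up for this example; the header layout is git's documented blob format.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Compute the SHA-1 name git gives to a blob with this content."""
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


# The empty blob has the same name in every git repository.
print(git_blob_id(b""))  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
```

The same bytes always produce the same name, on any machine and in any repository, which is the "same data => same name" property asked about above. The converse question (two different contents with one name) is the collision case discussed in the replies.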
Hi, Dear diary, on Fri, Oct 20, 2006 at 10:38:48AM CEST, I got a letter well, that's somewhat a bold statement, since when you have a way to fabricate malicious objects, you probably can socially engineer to have it distributed to a large portion of repositories if you try hard enough. Or you hack kernel.org and replace the object. Who knows. But the thing is that no one has come any closer to this kind of attack at all. Currently known attacks are that you can relatively fast (which doesn't mean "5 minutes"; I think that in case of SHA1 the complexity is still huge, just smaller than intended, but I may remember wrong; you can get a MD5 collision of this kind within one minute on a standard notebook) create a _pair_ of objects sharing the same hash, where both objects contain a big binary blob. So you would first have to engineer to have one of those objects accepted officially, then engineer the malicious one getting in. Generating an object that hashes to a predetermined value is a much harder problem and AFAIK there's not much progress in breaking this. -- Petr "Pasky" Baudis Stuff: http://pasky.or.cz/ #!/bin/perl -sp0777i<X+d*lMLa^*lN%0]dsXx++lMlN/dsM0<j]dsj $/=unpack('H*',$_);$_=`echo 16dio\U$k"SK$/SM$n\EsN0p[lN*1 lK[d2%Sa2/d0$^Ixp"|dc`;s/\W//g;$_=pack('H*',/((..)*)$/) -
Dear diary, on Fri, Oct 20, 2006 at 09:47:16AM CEST, I got a letter This is just a notion that lets you point to revisions relative to a given id. <id>~<n> means n-th ancestor of the given commit. -- Petr "Pasky" Baudis Stuff: http://pasky.or.cz/ #!/bin/perl -sp0777i<X+d*lMLa^*lN%0]dsXx++lMlN/dsM0<j]dsj $/=unpack('H*',$_);$_=`echo 16dio\U$k"SK$/SM$n\EsN0p[lN*1 lK[d2%Sa2/d0$^Ixp"|dc`;s/\W//g;$_=pack('H*',/((..)*)$/) -
If you want a pretty name, you tag it. Tags are exchanged during fetch/push operations. And you can have pretty names of revisions
> 752..753" will show the changes it introduced.
How does git choose which ancestor to use if this revision has more than one in this case? -- Matthieu -
Well if there is more than one parent, then there is more than one diff. For instance this is a merge commit which I asked to 'see'. This gets shown in the combined diff format, showing the results of the conflict resolution.

diff --cc this
index fbbafbf,10c8337..43b7af0
--- a/this
+++ b/this
@@@ -1,3 -1,3 +1,4 @@@
  1
+ 2a
 +2b
  3

If you want to know each individual diff in a more 'standard' form you can ask about the parents specifically.

apw@pinky$ git diff HEAD^1..
diff --git a/this b/this
index fbbafbf..43b7af0 100644
--- a/this
+++ b/this
@@ -1,3 +1,4 @@
 1
+2a
 2b
 3
apw@pinky$ git diff HEAD^2..
diff --git a/bar b/bar
new file mode 100644
index 0000000..8dc5f23
--- /dev/null
+++ b/bar
@@ -0,0 +1 @@
+this that other
diff --git a/this b/this
index 10c8337..43b7af0 100644
--- a/this
+++ b/this
@@ -1,3 +1,4 @@
 1
 2a
+2b
 3
-
If a revision has multiple parents, what does it diff against in this case? Do you get one diff against each parent revision? James. -
If a revision has multiple parents (is a merge commit), git-diff
(which is used by git-show) does not show differences (unless you
give two revisions in the git-diff case).
You can either use the '-m' option to show differences from all its
parents, or '-c'/'--cc' to show a combined diff ('--cc' shows a more
compact diff).
--
Jakub Narebski
Poland
-
I was accustomed to doing such things in CVS, but I find the git way much more pleasant, since I don't have to do any arithmetic: diff d8a60^..d8a60 (Yes, I am capable of performing subtraction in my head, but I find that a "parent-of" operator matches my cognitive model better, especially when you get into things like d8a60^2~3). Does bzr have a similar shorthand for mentioning relative commits? -Peff -
By the way "diff d8a60" also works (unless d8a60 is merge commit, in
By the way, git has the following extended SHA1 syntax for <commit-ish>
(documented in git-rev-parse(1)):
* full SHA1 (40-chars hexadecimal string) or abbreviation unique for
repository
* symbolic ref name. E.g. 'master' typically means commit object referenced
by $GIT_DIR/refs/heads/master; 'v1.4.1' means commit object referenced
[indirectly] by $GIT_DIR/refs/tags/v1.4.1. You can say 'heads/master'
and 'tags/master' if you have both head (branch) and tag named 'master',
but don't do that. HEAD means current branch (and is usually default).
* <ref>@{<date>} or <ref>@{<n>} to specify value of <ref> (usually branch)
at given point of time, or n changes to ref back. Available only if you
have reflog for given ref.
* <commit-ish>^<n> means n-th parent of given revision. <commit-ish>^0
means commit itself. <commit-ish>^ is a shortcut for <commit-ish>^1.
<commit-ish>~<n> is shortcut for <commit-ish>^^..^ with n*'^', for
example rev~3 is equivalent to rev^^^, which in turn is equivalent
to rev^1^1^1
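
The ^ and ~ suffixes described above can be sketched as a tiny resolver over a toy DAG. This is an illustrative Python sketch only: PARENTS and resolve are made-up names, the commit names are single letters rather than SHA1s, and none of the other forms from the list (refs, reflog entries, abbreviations) are handled.

```python
import re

# Toy DAG with ordered parent lists: m merges b (1st parent) and c (2nd).
PARENTS = {
    "a": [],
    "b": ["a"],
    "c": ["a"],
    "m": ["b", "c"],
    "n": ["m"],
}


def resolve(spec, parents=PARENTS):
    """Resolve names like 'n~2' or 'n^^2' against the toy DAG."""
    match = re.match(r"[^~^]+", spec)
    rev = match.group(0)
    for op, num in re.findall(r"([~^])(\d*)", spec[match.end():]):
        count = int(num) if num else 1
        if op == "~":
            # rev~<n>: follow the first parent n times.
            for _ in range(count):
                rev = parents[rev][0]
        elif count != 0:
            # rev^<n>: the n-th parent; rev^0 is the commit itself.
            rev = parents[rev][count - 1]
    return rev


print(resolve("n^"))     # m
print(resolve("n^^2"))   # c
print(resolve("n~3"))    # a
```

Here resolve("n~3") and resolve("n^^^") give the same answer, matching the equivalence of rev~3, rev^^^ and rev^1^1^1 stated in the list.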
Additionally it has following undocumented extended SHA1 syntax to refer
to trees (directories) and blobs (file contents)
* <revision>:<filename> gives SHA1 of tree or blob at given revision
* :<stage>:<filename> (I think for blobs only) gives SHA1 for different
versions of file during unresolved merge conflict.
I'm not enumerating here all the ways to specify part of DAG of history,
except that it includes "A ^B" meaning "all from A", "exclude all from B",
"B..A" meaning "^B A", "A...B" meaning "A B --not $(git merge-base A B)",
and of course "A -- path" meaning "all from A", "limit to changes in path".
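
The range forms just mentioned are set operations on reachable commits. A rough Python sketch, assuming a toy DAG as a dict of parent lists; the helper names are invented, and the three-dot form is modelled as a plain symmetric difference, which is a simplification of what git computes.

```python
def ancestors(tip, parents):
    """Return tip plus everything reachable from it through parent links."""
    seen, stack = set(), [tip]
    while stack:
        rev = stack.pop()
        if rev not in seen:
            seen.add(rev)
            stack.extend(parents[rev])
    return seen


def two_dot(b, a, parents):
    """B..A, i.e. "^B A": everything from A, excluding everything from B."""
    return ancestors(a, parents) - ancestors(b, parents)


def three_dot(a, b, parents):
    """A...B: reachable from A or from B, but not from both."""
    return ancestors(a, parents) ^ ancestors(b, parents)


# a is the common root; one side has b then d, the other side has c.
parents = {"a": [], "b": ["a"], "c": ["a"], "d": ["b"]}
print(sorted(two_dot("c", "d", parents)))    # ['b', 'd']
print(sorted(three_dot("d", "c", parents)))  # ['b', 'c', 'd']
```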
What about _your_ SMC? ;-)
--
Jakub Narebski
Poland
-
Hi, I could be wrong, but I have the impression (even after actually testing it) that "git diff d8a60" is equivalent to "git diff d8a60..HEAD", _not_ "git diff d8a60^..d8a60". IIRC we had a "-p" flag to denote "parent" once upon a time, but that no longer works... "git-show" is definitely what you want. Ciao, Dscho -
Ooops, I mixed up git-diff-tree (which behaves as mentioned above) with
git-diff, which according to the documentation compares with the working tree
(and not HEAD) if only one <tree-ish> is given.
git-diff(1):
When one <tree-ish> is given, the working tree and the named tree are
compared, using git-diff-index. The option --cached can be given to
compare the index file and the named tree.
git-diff-tree(1):
If there is only one <tree-ish> given, the commit is compared with its
parents (see --stdin below).
--
Jakub Narebski
Poland
-
Yes, you could e.g. do: bzr diff -r before:753..753 Aaron -
What about grandparent of commit (d8a60^^ or d8a60~2 in git), or choosing one of the parents in merge commit (d8a60^2 is second parent of a commit)? before:before:753 ? -- Jakub Narebski Poland -
Yes, "before:" can take any revision specifier, including "before:something-else". -- Matthieu -
Well, I'm glad to know we each feel like we are communicating at The entire discussion is about how to name things in a distributed system. The premise that Linus has put forth in a very compelling way, is that attempting to use sequential numbers for names in a distributed system will break down. The breakdown could be that the names are not stable, or that the system is used in a centralized way to avoid the instability of the names. Now, that causality might not accurately describe the way bzr has developed. It may be that the centralization bias was determined by other reasons, and that given those, using sequential numbers for names makes perfect sense. But it really is fundamental and unavoidable that sequential numbers Granted, everything can be stored in one repository. But that still doesn't change what I was trying to say with my example. One of the repositories would "win" (the names it published during the fork would still be valid). And the other repository would "lose" (the names it published would be not valid anymore). Right? Now, maybe there's some "simple" mapping from old names to new names for the losing repository, (something like adding a prefix of "losers/" to the beginning of the names or something or adding a "15." prefix or whatever). The point is that the old names are invalidated. And there's no way to guarantee this kind of change won't happen in the future, (no matter how old a project is). I constructed that example to show that the naming has a social impact in forcing a distinction between winners and losers in the merge, (or mainline and side branch, or whatever you want to name the distinction). The two re-joining projects could be really amiable, create a new virgin mainline and treat both histories as side branches. In this version, everyone loses as all the old names are Git allows this just fine. And lots of branches belonging to a single project is definitely the common usage. 
It is not common (nor encouraged) for unrelated ...
On Fri, Oct 20, 2006 at 02:48:52PM -0700 I heard the voice of I think we're getting into scratched-record-mode on this.
Git: Revnos aren't globally unique or persistent.
Bzr: Yes, we know.
G: Therefore they're useless.
B: No, they're very useful in [situation] and [situation], and we deal with [situation] all the time, and they work great for that.
G: But they fall apart totally in [situation].
B: Yes, so use revids there.
G: So use revids everywhere.
B: Revnos are handier tools for [situation] and [situation] for [reason] and [reason].
*brrrrrrrrrrrrrrrrip!!!* *skip back to start*
I'm not sure there's any unturned stone left along this line, so I'm not sure how productive it really is to keep walking down it. So, to make something productive of it, I'm going to put it onto my todo list to spend some time with bzr trying to use revids for stuff. I'm fairly certain that, due to the bzr cultural tendency to use revnos where possible, there are some rough edges in the UI when using revids that should be filed down (though I think it much less likely to turn I think it's more accurately describable as a branch-identity bias. The git claim seems to be that the two statements are identical, but I The term is somewhat overloaded, which is why it's causing you trouble (and did me). It refers both to the conceptual entity ("a line of development" roughly, much like what 'branch' means in git and VCS in general), and to the physical location (directory, URL) where that branch is stored, and where it'll often have a working tree. Branches Then all branches stored under that 'bzrtest' dir will use the bzrtest/.bzr/ dir for storing the revisions, and shared revisions will only exist once saving the space/time for multiple copies. Probably, you'd actually want 'init-repo --trees' in this case, because repos default to being [working]tree-less. In a tree-less setup, you'd create a [lightweight] checkout of the branch(es) you wanted to work on ...
This is wrong. There are two kinds of checkouts: lightweight and "normal/heavyweight". I think you are getting this a little wrong, and I think the reason is that you are thinking of repositories, while in bzr you normally think of branches. For example, I think (correct me if I'm wrong) that if I have a git repository of an upstream linux-repo (Linus' for example), I guess I'll use "pull" to keep my copy up to date with the upstream repo? If I then would like to hack something special, I would "clone" the repo and get a new repo and that's where I do my work. Is that correct? In bzr you never (well...) clone a full repository, but you clone one line-of-development (a branch). So "bzr branch" is always a "one-branch-only "clone" in git or cg". "bzr checkout" is a "bzr branch" followed by a setting saying "whenever you commit here, commit in the master branch also". "bzr checkout --lightweight" is a way to get only a snapshot of the working tree out of a branch. Whenever you commit, it's done in the remote branch. /Erik -
Note: instead of symlinking .git/objects/ objects database, you can simply set and export GIT_OBJECT_DIRECTORY environment variable. -- Jakub Narebski Poland -
On Sat, Oct 21, 2006 at 04:08:18PM +0200 I heard the voice of This is obviously some new meaning of "centralization" bearing no resemblance whatsoever to how I understand the word. In git, apparently, you don't give a crap about a branch's identity (alternately expressible as "it has none"), and so you throw it away all the time. Given that, revnos even if git had them would never be of ANY use to you, so it's no wonder you have no use for the notion. I DO give a crap about my branches' identities. I WANT them to retain them. If I have 8 branches, they have 8 identities. When I merge one into another, I don't WANT it to lose its identity. When I merge a branch that's a strict superset of second into that second, I don't WANT the second branch to turn into a copy of the first. If I wanted that, I'd just use the second branch, or make another copy of it. I don't WANT to copy it. I just want to merge the changes in, and keep on with my branch's current identity. Maybe that's what you mean by 'centralization'; each branch is central to itself. That seems a pretty useless definition, though. In my mind, actually, it's MORE distributed; my branch remains my branch, and your branch remains your branch, and the difference doesn't keep us from working together and moving changes back and forth. Forcing my branch to become your branch sounds a lot more "centralized" to me. Now, we can discuss THAT distinction. I'm not _opposed_ to git's model per se, and I can think of a lot of cases where it'd be really handy. But those aren't most of my cases. And as long as we don't agree on branch identity, it's completely pointless to keep yakking about revnos, because they're a direct CONSEQUENCE of that difference in mental model. See? They're an EFFECT, not a CAUSE. If bzr didn't have revnos, I'd STILL want my branch to keep its identity. You could name the mainline revisions after COLORS if you wanted, and I'd still want my branch to keep its identity. Aren't we ...
OK, let's discuss. :) I think the concept of "my" branch doesn't make any sense in git. Everyone is working collectively on a DAG of the history, and we all have pointers into the DAG. Something is "my" branch in the sense that I have a repository with a pointer into the DAG, but then again, so do N other people. I control my pointer, but that's it. So don't think of it as "git throws away branch identity" as much as "git never cared about branch identity in the first place, and doesn't think it's relevant." Now, there are presumably advantages and disadvantages to these approaches. I like the fact that I can prepare a repository from scratch, import it from cvs, copy it, push it, or do whatever I like, and the end result is always exactly the same (revids included). With your model, on the other hand, it seems the advantages are that in many The difference, I think, is that it's easier in git to move the upstream around: you simply start fetching from a different place. I'm not clear on how that works in bzr (if it invalidates revnos or has other side effects). -Peff -
That's a good example of a fully distributed approach. I can fetch directly (actually, I cannot) from Junio's private repository, I can fetch from the public git.git repository, either using the git:// or http:// protocol, I can fetch from somebody else's clone of the git repository: intermixing those fetches, and revids (commit-ids) remain constant and unchanged. -- Jakub Narebski Poland -
Moving upstream around does not invalidate revnos. Switching to a different
upstream (i.e. the head revisions are different) does. And this may
happen by doing a merge with the previous mainline as non-first parent
-- revnos are simply short aliases for revids, not persistent unique
So they (revids) do in bzr.
--------------------------------------------------------------------------------
- Jan Hudec `Bulb' <bulb@ucw.cz>
-
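To make the "revnos are short aliases for revids" point concrete, here is a sketch that models a revno as the 1-based position of a revision on the first-parent path from the root to the branch head, as the thread describes it. This is an illustrative model, not bzr's code; the history, mainline() and revnos() are hypothetical.

```python
def mainline(head, parents):
    """Follow first parents from head back to the root; return oldest first."""
    path = []
    while head is not None:
        path.append(head)
        ps = parents[head]
        head = ps[0] if ps else None
    return path[::-1]

def revnos(head, parents):
    """Map revno (1-based position on the mainline) -> revid."""
    return dict(enumerate(mainline(head, parents), start=1))

# Hypothetical history: the old mainline is base, m1, m2. The new head "n"
# merges that old mainline in as its *second* parent.
PARENTS = {
    "base": [],
    "m1": ["base"],
    "m2": ["m1"],
    "x1": ["base"],
    "x2": ["x1"],
    "n": ["x2", "m2"],
}

old = revnos("m2", PARENTS)   # numbering seen from the old head
new = revnos("n", PARENTS)    # numbering seen from the new head
```

In this model revno 2 names m1 when counted from the old head and x1 when counted from the new one, while the revids themselves never change.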
This is nice for a couple of situations:
- if some particular machine is down, nobody really cares. It doesn't really change the workflow at all if "master.kernel.org" were to be off-line due to some trouble - it just happens to be a machine with good bandwidth that a number of kernel (and git) developers have access to, but if you want to sync with something else, go wild. We could just sync directly between developers, although most people tend to have firewalls (I certainly have a very anal one - not even ssh gets in) making it usually easier to go through some - any - public place. But in git, the "public place" really is just an intermediary. It has nothing to do with anything history-wise, and its revision IDs are a non-issue. It's just a temporary staging area (although re-using the same repo over and over for pushing things out obviously means you can do just incremental updates, so most everybody does that)
- sometimes you have multiple branches in the same tree that have very _different_ sources. For example, you might start out cloning my tree, but if you _also_ want to track the stable tree, you just do so: you can just do git fetch <repo> <remote-branch-name>:<local-branch-name> at any time, and you now have a new branch that tracks a different repository entirely (to make it easier to keep track of them, you'd probably want to make note of this in your .config file or your remote tracking data, but that's a small "usability detail", not a real conceptual issue).
- the same "multi-source" thing is true for pushing things out too, not just fetching: I still have my personal git.git repository on kernel.org for historical reasons, even though Junio maintains the normal one. So when I did some experimental (and broken) stuff for "git unpack-objects" in a local branch, and others were interested in fixing it, I just pushed it out to my git repo as a ...
On Sat, Oct 21, 2006 at 03:19:49PM -0400 I heard the voice of This is as I understand it. But in my mind, it does make sense. I fundamentally DO think of "my commits" differently from "revisions I've merged", and I want the tool to preserve that for me. "My commits" tend to be steps along a path, "merges" tend to be completed paths. I usually use bzr's "log --short" for looking at logs, which doesn't show merged revs at all. That works, because most of the time I don't care about them; I know if I merged something, it's a completed piece, which I described in the log message; it's not a PART of a task like my commits usually are. So, just the message for my merge rev tells me what I need to know, and if I need to drill down into it, I can use the regular (--long) log output to look at the revision in it. This lets me know, for instance, that if I want to re-check something I did 3 commits ago, and I just merged another branch, the commit I'm interested in is the 4th commit back on the mainline; I don't need to grub through a bunch of revisions that aren't mine to try and find it. So, if me and Bob are working on different bits of the same project in parallel, finish up, and merge back and forth to sync up (ignoring for the moment the "empty merge commit" bit), even though we now both have the 'same' stuff, we have the same head rev with all the same parents, the parents are in a different order, and my 'mainline' (the path of left-most parents, or 'first' as I understand git calls them) is different than his; my mainline is my commits, his mainline is his. If one of us were to 'pull' the other, our branch would become a duplicate of his and so adopt his 'mainline', which we want to avoid because then it doesn't fit the mental model of "what I did", which is what I think of my branch as. Obviously, this is a totally foreign mentality to git, and that's great because it seems to work for you. I can see advantages to it, and I can conceive of situations where I ...
On Sat, 21 Oct 2006 16:46:29 -0500 It's not completely foreign, it's one of the things you can use the git reflog feature to record. It's just that it's utterly clear in Git that this is a local feature and is never replicated as part This is where the git model is clearly superior and allows a true distributed model. Because there is no concept of a "mainline" (except locally via reflog) you can always merge with anyone participating in the DAG without having to overwrite or lose ordering. Sean -
I don't think so. Recently, I've been trying to track a particular patch in the kernel. It was done as a series of commits, and probably would have been its own branch in bzr, but when I was trying to group the commits together to analyze them as a group, the easiest way to do that was by the original committer's name. Now, there's probably a better way to hunt that stuff down, but in this case hunting the user down worked for me. (It may have made a difference that I was using gitweb instead of a local clone.) And the case of hunting down your own commits is just a degenerate case of hunting down someone else's. -
As far as "its own branch in bzr" would such a branch remain available Vast, huge, gaping, cosmic difference. Almost none of the power of git is exposed by gitweb. It's really not worth comparing. (Now a gitweb-alike that provided all the kinds of very easy browsing and filtering of the history like gitk and git might be nice to have.) -Carl
Yes, in the sense that you can recreate the branch by using that branch's last commit. But not in the git sense that there's a branch ID pointing at the commit in question. You know what? It occurs to me that much of the problem with git branches vs. bzr branches might be solved when bzr gets proper tagging support. Because, after all, aren't branches more like special tags in So, very probably, I would have had a far easier time of it if I had been able to really use git to do the work, instead of gitweb. I still don't think, though, that it's a sign of a small project to be concerned about one's own branches more than others. -
Both branches _and_ tags in git are 100% the same thing: they're just shorthand for the commit name. That's _literally_ all they are. They are a symbolic name for a 160-bit SHA1 hash. So yes, you can say that branches are like special tags, or that (unsigned) tags are like special branches. There's no real "technical" difference: in both cases, it's just an arbitrary name for the top commit. However, there are some purely UI differences between tags and branches, which really don't affect any of the "name->SHA1" translation at all, but which affect how you can _use_ a tag-name vs a branch-name. - A branch is always a pointer to a _commit_ object. In contrast, a tag can point to anything. It can point to a tree (and that means that you can do _diff_ between a tag and a branch, but such a tree doesn't have any "history" associated with it - it's purely about a certain "state", so you cannot say that it has a parent or anything like that). A tag can also point to a single file object ("blob": pure file content), which is something that the git.git repository uses to point to the GPG public key that Junio uses to sign things, for example. But perhaps more commonly, a tag can also point to a special "tag" object, which is just a form of indirection that can optionally contain an explanation and a digitally signed verification. When I cut a kernel release, for example, my tags don't point to the commit that is the release commit, they point to a GPG-signed tag-object that in turn points to the commit. With those signed tags, people can verify (if they get my public key) that a particular release was something I did. And due to the cryptographic nature of the hash, trusting the tag object also means that you can trust the commit it points to, and the whole history that points to. So while from a _revision_lookup_ standpoint a "branch" and a "tag" do 100% the same thing, we put ...
Dear diary, on Sun, Oct 22, 2006 at 01:49:04AM CEST, I got a letter http://repo.or.cz/git-browser/by-commit.html?r=linux-2.6.git It could use plenty of improvement, though. -- Petr "Pasky" Baudis Stuff: http://pasky.or.cz/ -
There was one, but it got discontinued due to performance issues. Shame that, because it would have been nice to have to show "foreign" visitors how gitk/qgit works. It would especially show the way git thinks about branches and stuff like that. -- Andreas Ericsson andreas.ericsson@op5.se OP5 AB www.op5.se Tel: +46 8-230225 Fax: +46 8-230231 -
Perhaps I'd better use "star topology bias" instead of "centralization bias". In git branches are lightweight. Branch names are local to repository. Repositories have identity. Bzr "branch" is a strange mix of a one-branch git repository and a git branch. Git's main workflow is a fully decentralized workflow. All clones of the same repository are created equal. In bzr the suggested workflow (with revnos) forces one (or more) branches to be mainline (use "merge", get empty-merges, revnos don't change) and leaf (use "pull", revnos change). I don't understand. If I merge 'next' branch into 'master' in git, I still have two branches: 'master' and 'next'. And I don't understand why you are so hung up on branch identities. Yes, if somebody clones your 'repo' repository, he can have your 'master' branch (refs/heads/master) named 'repo' (refs/heads/repo) or 'repo/master' (refs/remotes/repo/master), but why does that matter to you? It is _his_ For revnos to work you MUST have one "branch" to be considered special, the hub in star topology. This very much precludes fully distributed development. BTW. I get that you can use revids in revnos in bzr for fully distributed and not star-topology geared development. But Bazaar-NG revids are uglier than Git commit-ids. In git I can fetch your changes but I don't need to merge them. Take for example Junio's 'pu' (proposed updates) branch: this is the branch you shouldn't merge as its history is constantly being rewritten. If you don't want for your WIP to be publicly available, you don't publish it. For example as far as I understand Junio works on Git in his private repository, with many, many feature branches, but he does push to public [bare] repository only some subset of branches, and we can fetch/pull only those. But still, if I am impatient I can pull from Junio every hour, and I don't get 24 totally useless empty merge messages if he took day But please, have you realized that in this workflow the two clones of the same ...
I think you missed the point. Speaking for myself, I want to maintain the identity of _my_ branches. If you clone one of them, I _don't_ care. That's your branch. Branch identity as presented here is not intended to OK, just to clarify what you are saying here:
1. revnos don't work because they don't serve the same purpose as revids or git's SHA1 commit ids.
2. bzr does not support fully distributed development because revnos "don't work" as stated in #1.
3. Ok, bzr does support distributed development, I just say it doesn't because I think revids are ugly. Thus, revids are ugly.
Is this really the argument you want to be making? I'm not disagreeing with you; it's just that I'm not sure it's relevant. Can we just put the whole "revnos don't work" thing to rest? Revnos are only intended to be significant relative to a given branch. They are not intended to serve as an absolute, global identifier. Revnos + a url _are_ globally significant, but are not static except in certain topologies. Revids are globally significant and static in any topology. If a user does not like or cannot use revnos, they may use revids. Revnos are not a tool to be used for every job. In no way does that mean that they are broken. If a given developer or group of developers primarily use revnos or revids, it _may_ indicate that _they_ have a bias towards central (or star) or distributed development, but does not necessarily have any I think that when I attempt to pull from one branch to another, if they are identical, neither branch changes. Merging + pulling results in identical history, causing revnos on the pulling branch to change. Just merging maintains divergent views of the same history. Perhaps bzr has a central bias in the view that each developer has the option of seeing their own branch as the central focus of his/her development. This view would be the same from each branch; each developer views his/her own branch as special. If the developer does not want ...
Branches in bzr are both one-source (one head) DAG (of parents), and the "mainline" i.e. track of commits commited in this branch-as-place. Bazaar-NG tries to keep both information in DAG by using first parent to mark commits on current branch-as-place. Additionally bzr by default uses revnos, numbering commits on branch, which needs maintaining mainline identity for revnos not to change even for one branch-as-place. This leads to the need to use "merge" if you want to maintain revnos unchanged, and "pull" if you are not interested in that. Git correctly realizes that mainline identity is local information, and instead of trying to save local information in DAG which is shared, it uses reflog. That is the EFFECT of preferring fast-forward over preserving "first parent is my branch" property. So the RESULT is that shared history is identically ordered. -- Jakub Narebski Poland -
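The fast-forward preference mentioned above can be sketched as an ancestry test: if my head is already an ancestor of the other head, the branch pointer simply moves and no merge commit is recorded. This is a toy model with hypothetical names (is_ancestor(), merge(), the sample history), not git's merge machinery.

```python
def is_ancestor(anc, desc, parents):
    """True if anc is reachable from desc by following parent links."""
    stack, seen = [desc], set()
    while stack:
        commit = stack.pop()
        if commit == anc:
            return True
        if commit not in seen:
            seen.add(commit)
            stack.extend(parents[commit])
    return False

def merge(mine, theirs, parents, merge_name):
    """Return the new head of my branch after merging theirs into it."""
    if is_ancestor(theirs, mine, parents):
        return mine                          # already up to date
    if is_ancestor(mine, theirs, parents):
        return theirs                        # fast-forward: just move the pointer
    parents[merge_name] = [mine, theirs]     # true merge: my head is first parent
    return merge_name

# Hypothetical history: I committed m1, Bob committed b1 and then merged m1.
history = {
    "base": [],
    "m1": ["base"],
    "b1": ["base"],
    "bm": ["b1", "m1"],
}
after_ff = merge("m1", "bm", history, "unused")   # m1 is an ancestor of bm
after_merge = merge("m1", "b1", history, "mm")    # m1 and b1 have diverged
```

After the fast-forward both sides point at the same commit with the same parent order, which is why the shared history ends up identically ordered; the "which commits were mine" view then has to live somewhere local, such as a reflog.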
Revnos work only locally, or in a star-topology configuration. They have some consequences: treating first parent specially, need for merges instead of fast-forward even if fast-forward would be applicable, two different "fetch" operators: "pull" (which uses revids on the Bazaar is biased towards centralized/star-topology development if we want to use revids. In fully distributed configuration there is no I think that bzr revids are uglier than git commit-ids. If on the pros side of bzr is "simple namespace", you must remember that it is a simple namespace only for not fully distributed development. The pros of "simple namespace" with cons of "merge" vs "pull" and centralization required for uniqueness of revids. -- Jakub Narebski Poland -
s/revids/revnos/g but yes, I think I said this later in my previous So revnos aren't globally meaningful in fully distributed settings. So what? I don't see how this translates into bias. There is a lot of functionality provided by bazaar that doesn't really apply to my use I think you've switched revids and revnos, but I get what you are saying. In fact, I think I said pretty much the same thing in the email you are replying to. I don't think that anyone is disagreeing about anything other than the assertion that bzr is biased because revnos are used to simplify cases where it is possible to do so. In any case, Matthew Fuller & Carl Worth cover this in greater detail in emails further down in this thread (or one of its siblings), so I think I'll stop here. -davidc -- gpg-key: http://www.zettazebra.com/files/key.gpg
First, bzr is biased towards using revnos: bzr commands use revnos by default to provide revision (you have to use revid: prefix/operator to use revision identifiers), bzr commands output revids only when requested, examples of usage use revision numbers. In order to use revnos as _global_ identifiers in distributed development, you need central "branch", mainline, to provide those revnos. You have either to have access to this "revno server" and refer to revisions by "revno server" URL and revision number, or designate one branch as holding revision numbers ("revno server") and preserve revnos on "revno server" by using bzr "merge", while copying revnos when fetching by using bzr "pull" for leaf branches. In short: for revnos to be global identifiers you need star-topology. Even if you use revnos only locally, you need to know which revisions are "yours", i.e. beside branch as DAG of history of given revision you need "ordered series of revisions" (to quote Bazaar-NG wiki Glossary), or path through this diagram from given revision to one of the roots (initial, parentless revisions). Because bzr does that by preserving mentioned path as first-parent path (treating first parent specially), i.e. storing local information in a DAG (which is shared), to preserve revnos you need to use "merge" instead of "pull", which means that you get empty-merge in clearly fast-forward case. This means "local changes bias", which some might take as not being fully distributed. Sidenote 1: Why does Bazaar-NG try to store "branch as ordered series of revisions"/"branch as path through revisions DAG" in DAG instead of storing it separately (like reflog stores history of tip of branch, which is roughly equivalent of "branch as path" in bzr)? It needs some kind of cache of mapping from revno to the revision itself anyway (unless performance doesn't matter for bzr developers ;-)! All that is left is to propagate this mapping on "pull"... Sidenote 2: "Fringe" developer using default git ...
As has been said before, you can set an alias to always show revision Why do you continue to repeat this argument? No one is claiming that a revision number by itself, as Bazaar uses them, is a global identifier. In fact, we keep on saying that they only have meaning in the context of a branch. If you want to use a revision number as part of a globally unique identifier, it needs to be in combination with I won't dispute that Bazaar has features that make it easier to work with the revisions in the line of development of the branch you're working on in comparison to the revisions from merges. But given that every Bazaar branch has this same bias towards their own main line of development, how can that affect whether or not it is distributed? James. -
And, unlike git, Bazaar branches are all independent entities[1], and they each have a URL. So: http://code.aaronbentley.com/bzrrepo/bzr.ab 1695 is a name for abentley@panoramicfeedback.com-20060927202832-9795d0528e311e31 And it does not depend on any other branch, especially not bzr.dev Since: 1. anyone with write access to the urls can create them 2. anyone with read access to the urls can read them 3. the maintainers of the mainline have no control over them (except as provided by 1) these identifiers are not centralized. Aaron [1] The fact that they may share storage is not important to the model. -
If you don't use centralized numbers (i.e. always referring to bzr.dev, either by always using (bzr.dev URL, revno), or by using "merge" for bzr.dev and "pull" for the rest), the numbers are volatile. If the URL vanishes, then the (URL, revno) to revid mapping is no longer valid. Yeah, I know, cool URIs don't change... Besides, you need [constant] network access for this mapping. -- Jakub Narebski Poland -
I _think_ that Aaron was trying to say that abentley@panoramicfeedback.com-20060927202832-9795d0528e311e31 is always constant, so you can use that. Of course, nobody will ever do that, because in practice they're not shown, the same way the "true" BK revision names were never shown and thus never really used. Linus -
By the way, I wonder if accidentally identical revisions (see example for accidental clean merge on revctrl.org) would get the same revision id in bzr. In git they would. -- Jakub Narebski Poland -
They won't. The revision id is made up of the committer's email address, a timestamp and a bunch of random data. It wouldn't be hard to switch to using checksums as revids instead, but I don't think there are any plans in that direction. Cheers, Jelmer -- Jelmer Vernooij <jelmer@samba.org> - http://samba.org/~jelmer/
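A sketch of an identifier built the way this message describes (committer email, timestamp, random data). The exact layout is an assumption inferred from the example revid quoted earlier in the thread, and make_revid() is a hypothetical helper, not bzr's code.

```python
import secrets
import time

def make_revid(email, when=None):
    """Build '<email>-<YYYYMMDDHHMMSS>-<16 hex digits>' (layout is an assumption)."""
    stamp = time.strftime("%Y%m%d%H%M%S", time.gmtime(when))
    return f"{email}-{stamp}-{secrets.token_hex(8)}"   # 8 random bytes = 64 bits

# Two commits with the same committer and the same timestamp still get
# different identifiers, because the random suffix differs.
first = make_revid("dev@example.com", when=0)
second = make_revid("dev@example.com", when=0)
```

Because the identifier does not depend on the revision's content, two accidentally identical revisions end up with different names, which is the behaviour the answer above describes.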
The place for timestamp and committer info is in the revision metadata (in the commit object in git). Not in the revision id. Unless you think that "accidentally the same" doesn't happen... -- Jakub Narebski Poland -
The revision id isn't parsed by bzr. It's just a unique identifier that is generated at commit-time and is currently created by concatenating those three fields. It can be anything you like. The bzr-svn plugin for example creates revision ids in the form svn:REVNUM@REPOS_UUID-BRANCHPATH and bzr-git uses git:GITREVID. Nothing will break if bzr would start using a different format. Cheers, Jelmer -- Jelmer Vernooij <jelmer@samba.org> - http://samba.org/~jelmer/
Well, git and bzr really do share the same "stable" revision naming, although in git it's more indirect, and thus "covers" more. In git, the revision name indirectly includes the commit comments too (and git obviously also distinguishes between "committer" and "author", and those end up being indirectly credited in the name of the commit too). But in a very real sense, the bzr stable ("real") revision name does effectively contain the same things as a git ID: it's just that it's a small subset (only committer+date+random number) of what git includes in its names. So you could more easily _fake_ a commit name in bzr, and depending on how things are done it might be more open to malicious attacks for that reason (or unintentionally - if two people apply the exact same patch from an email, and take the author/date info from the email like git does, you might have clashes. But with a 64-bit random number, that's probably unlikely, unless you also hit some other bad luck like having the pseudo-random sequence seeded by "time()", and people just _happen_ to apply the email at the exact same second). The git use of hashes and parenthood information make any accidental clashes like that a non-issue: if you have exactly the same information, it really _is_ the same commit, since the hash includes the parenthood too. So you're left with just malicious attacks, and those currently look practically impossible too, of course. So I don't think bzr and git differ in this respect. I think you can _trust_ stable git names a lot more, but that's a separate issue. Linus -
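A sketch of the "same information means same commit" property: a git object name is the SHA-1 of a short type-and-length header followed by the object's bytes, and a commit's bytes include its tree, parents, author, committer and message. The commit_body() helper below builds a simplified commit body, and the identities and timestamps are made up for illustration.

```python
import hashlib

def object_id(kind, body):
    """SHA-1 over '<kind> <length>\\0' followed by the object bytes."""
    header = f"{kind} {len(body)}\0".encode()
    return hashlib.sha1(header + body).hexdigest()

def commit_body(tree, parents, author, committer, message):
    """Simplified commit text: tree, parents, author, committer, blank line, message."""
    lines = [f"tree {tree}"]
    lines += [f"parent {p}" for p in parents]
    lines += [f"author {author}", f"committer {committer}", "", message]
    return "\n".join(lines).encode()

EMPTY_TREE = object_id("tree", b"")
WHO = "A U Thor <author@example.com> 1161500000 +0000"   # hypothetical identity

parent_a = object_id("commit", commit_body(EMPTY_TREE, [], WHO, WHO, "start a\n"))
parent_b = object_id("commit", commit_body(EMPTY_TREE, [], WHO, WHO, "start b\n"))

# Same tree, author, date and message applied on top of the same parent...
one = object_id("commit", commit_body(EMPTY_TREE, [parent_a], WHO, WHO, "fix\n"))
two = object_id("commit", commit_body(EMPTY_TREE, [parent_a], WHO, WHO, "fix\n"))
# ...versus the identical change applied on top of a different parent.
other = object_id("commit", commit_body(EMPTY_TREE, [parent_b], WHO, WHO, "fix\n"))
```

Here one and two are the same name, while other differs only because its parent differs, so an identical patch applied to different histories never collides.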
There are no requirements on what a revid is in bzr. It's a unique identifier, nothing more. It can be whatever you like, as long as it's unique for that specific commit. The committer+date+random number is Bzr stores a checksum of the commit separately from the revision id in the metadata of a revision. The revision is not used by itself to check the integrity of a revision. Cheers, Jelmer -- Jelmer Vernooij <jelmer@samba.org> - http://samba.org/~jelmer/
I think Linus' original point here was that if you communicate the revision id to another person and they fetch that revision there is no assurance that the commit they have received is the exact same commit you had. In Git that assurance is implicitly present as the unique identification you communicated to the other person is also that integrity verification. Therefore it's nearly impossible to spoof. -- Shawn. -
In an unpacked git repository the commit-id is also the commit address. Pack files add another level of indirection via the pack index file. And it functions as a checksum. P.S. I'm interested in what the bzr equivalents of git's different types of objects are: commits (revision info) and what is stored in there besides commit message and "snapshot"; trees/manifest i.e. how files are gathered together to form a given revision; blob i.e. what is the storage format and how it is divided: changeset-like as in Arch, or file "buckets" as in Mercurial and CVS, or something yet different altogether. Is there an equivalent of git tags and tag objects? -- Jakub Narebski Poland -
That wasn't what I was trying to aim at - the problem is that the bzr revision ID isn't "safe" in itself. Anybody can create a revision with the same names - and they may both have checksums that match their own revision, but you have no idea which one is "correct". So you just have to trust the person that generates the name, to use a proper name generation algorithm. You have to _trust_ that your 64-bit random number really is random, for example. And that nobody is trying to mess with your repo. This isn't a problem in normal behaviour, but it's a problem in an attack scenario: imagine somebody hacking the central server, and replacing the repository with something that had all the same commit names, but one of the revisions was changed to introduce a nasty backdoor problem. Change all the checksums to match too. It would _look_ fine to somebody who fetches an update, and the maintainer might not ever even notice (because he wouldn't send the _old_ revision again, and _his_ tree would be fine, so he'd happily continue to send out new revisions on top of the bad one on the public site, never even realizing that people are fetching something that doesn't match what he is pushing). In contrast, in git, if you replace something in a git repository, the name changes, and if I were to try to push an update on top of a broken repo like that, it simply wouldn't work - I couldn't fast-forward my own branch, because it's no longer a proper subset of what I'm trying to send. So in git, you can _trust_ the names. They actually self-verify. You can't have maliciously made-up names that point to something else than what they are. [ Also, as a result, and related to this same issue: the git protocol actually never sends object names when sending the object itself. It just sends the object data, and the _recipient_ generates the name from that. So you can't do the _other_ kind of spoofing, and make a repository that _claims_ to have ...
git can have no "accidentally identical revisions". They'd have to be purposefully done, but yes, they'd obviously (on purpose) get the same revision name if that's the case. You may think of tree (not commit) identity, where git on purpose names trees the same regardless of how you got to them. So on a _tree_ level, you are always supposed to get the same result regardless of how you import things (ie two people importing the same tar-ball should always get exactly the same tree ID). But the actual commit names are identical only if the same people are claimed to have authored (and committed) them at the same time - so it's definitely not "accidental" if the commits are called the same: they really _are_ the same. Btw, I think you misunderstand the term "accidental clean merge". It means that two identical changes on two branches will merge without conflicts being reported. A merge algorithm that doesn't do "accidental clean merge" is totally broken. The accidental clean merge is a usability requirement for pretty much anything - you often have two branches doing the same thing (possibly for different reasons - two people independently found the same bug that showed itself in two different ways - so they may even think that they are fixing different issues, and may have written totally different changelogs to explain the bug, but the solution is identical and should obviously merge cleanly). So "accidental clean merge" may _sound_ like something bad, but it's actually a seriously good property (it's really just a special case of "convergence" - again, that's a good thing). Linus -
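The "accidental clean merge" property described above can be sketched with a toy three-way merge that works on whole file contents. This is only an illustration under simplifying assumptions (files are plain strings in a dict, no line-level merging); it is not git's merge algorithm, and the names `merge_path`, `merge_trees` and `MergeConflict` are made up for the example.

```python
class MergeConflict(Exception):
    """Raised when both sides changed the same path in different ways."""


def merge_path(base, ours, theirs):
    """Three-way merge of one path's content (None means the path is absent)."""
    if ours == theirs:
        # Both sides agree, including when both made the identical change:
        # this is the "accidental clean merge" case.
        return ours
    if ours == base:
        return theirs  # only their side changed
    if theirs == base:
        return ours  # only our side changed
    raise MergeConflict("both sides changed this path differently")


def merge_trees(base, ours, theirs):
    """Merge three {path: content} snapshots path by path."""
    merged = {}
    for path in sorted(set(base) | set(ours) | set(theirs)):
        content = merge_path(base.get(path), ours.get(path), theirs.get(path))
        if content is not None:
            merged[path] = content
    return merged


# Two people independently apply the same fix to usb.c; one also edits README.
base = {"usb.c": "buggy", "README": "v1"}
ours = {"usb.c": "fixed", "README": "v1"}
theirs = {"usb.c": "fixed", "README": "v2"}
merged = merge_trees(base, ours, theirs)
```

The identical change to `usb.c` merges without a conflict even though both branches touched the file, which is the convergence behaviour the message argues for.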
Sorry, I don't understand this statement. How are git branches not independent? Sure, they tend to exist in repositories with other branches, but there's no need to (it simply allows the sharing of object storage). There's no reason I can't move any branch from any repo into its own repo, or vice versa move any unrelated branch into a repo with other branches. It all Just Works because there _isn't_ any branch information. It's simply a pointer into the DAG, so if I have the right parts of the DAG (which git is careful to make sure of), I can just make a pointer, and I In cogito, branches can each have a URL, but git-clone doesn't have a way (that I know of) to clone only a subset of branches. It would be The git analog is of course: http://kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git v2.6.18 as a name for e478bec0ba0a83a48a0f6982934b6de079e7e6b3 The difference being that Linus assigned the "local" name of v2.6.18 Of course. For me, the above commit is actually ssh://peff.net/home/peff/git/linux-2.6 v2.6.18 but once it is in my local repository, it's indistinguishable from one I pulled directly from kernel.org. And I wonder if THAT is at the root of this discussion. bzr isn't "centralized" in the sense that you have to talk to a central server, or rely on it for doing any operations. But you actually CARE about where your commits come from, and git fundamentally doesn't. -Peff -
By the way, git repositories (remember that working area in bzr is associated with branch, and in git with repository) can share storage, either sharing only immutable "old history" (part of DAG) via $GIT_DIR/objects/info/alternates file or GIT_ALTERNATE_OBJECT_DIRECTORIES environment variable, or via having shared commit object database via symlinking $GIT_DIR/objects directory or via setting GIT_OBJECT_DIRECTORY variable. Git doesn't support latter fully out of the box (you must be careful with prune) but on the other side bzr doesn't support cloning whole Well, with exception of reflog, which is local to repository On the other side Cogito doesn't have way to clone all the branches. -- Jakub Narebski Poland -
Agreed. Of course, I want the simplest case to be the simplest. When working on my own branch, regardless if it is a standalone project or part of a distributed one, I don't want to have to type SHA hashes or revids. Numbers serve my purposes best in this case. When I communicate Ok. Let's not repeat this again. I think I said this once, and you've said it in two following emails. It's a given. Assume that we all know "local changes bias" I can buy that. I even like it. I don't even care if that makes bazaar "not fully distributed." I don't think the distinction between "fully" and "almost, except for some technicality" distributed is one that has much practical value. -davidc -- gpg-key: http://www.zettazebra.com/files/key.gpg
I apologize if I've come across as beating a dead horse on this. I've really tried to only respond where I'm still confused, or there are explicit indications that the reader hasn't understood what I was saying, ("I don't understand how you've come to that conclusion", etc.). I'll be even more careful about that below, labeling paragraphs I'm missing something: I still haven't seen strong examples for this last claim. When are they handier? I asked a couple of messages back and two people replied that given one revno it's trivial to compute the revno of its parent. But that's no win over git's revision specifications, Maybe I wasn't clear: There's no doubt that there has been semantic confusion over the term branch that has been confounding communication on both sides. Here's my attempt to describe the situation, (which only became this clear recently as I started playing with bzr more). This is not an attempt at a complete description, but is hopefully accurate, neutral, and sufficient for the current discussion:

Abstract: In a distributed VCS we are using a distributed process to create a DAG, (nodes are associated with revisions and point to parent nodes). The distributed nature means that the collective DAG will have multiple source nodes, (often termed heads or tips).

Git: A subset of the DAG is stored in a "repository". The DAG in the repository may have many source nodes. A "branch" is a named reference to a node (whether or not a source). Multiple local repositories may share storage for common objects. There are inter-repository commands for copying revisions and adjusting branch references, but basically all other operations act within a single repository.

Bzr: A subset of the DAG is stored in a "branch". The DAG in the branch has a single source node. Multiple local branches may share storage for common objects through a "repository". Basically all operations (where applicable) can act between branches.

Let me know if I ...
git-show-branch also shows git-name-rev like names. BTW. git-show-branch has somewhat strange, and different from other git commands UI. You can think of it as text version of gitk/qgit history viewer (although you can use tig for CLI (ncurses) graph). -- Jakub Narebski Poland -
Having used both (though my familiarity with git is less), in my opinion the biggest win is the obvious one: sequential numbers work in the head better than SHA1 checksums. "But it's not a problem in practice!" is a good retort, except that I wonder whether the set of "practices" you're using includes anyone who decided to pass on git in favor of something else--perhaps because they saw a few SHAs float by and ran in terror. Beware of self-selection bias. Put another way, "strength" of example is often in the eye of the beholder. That we continue to give you the same "weak" examples may be evidence that we have a different impression of their strengths, and that your analysis of their strengths isn't convincing to us. I suppose this line of conversation still has value if you don't see any benefit at all, but OTOH if you really don't see how sequential numbers are easier to work with in the head than SHA sums with modifiers, I'm I wonder if part of the problem is that the revno scheme we've been talking about (the x.y.z... format) doesn't technically exist in any released version of bzr that I know of. Previous to 0.12, bzr revnos were absolutely a local thing; revisions from merges didn't even have revnos (except for the merge commit itself). If you merged a branch and you later wanted to recreate that branch, or see a diff from that branch, etc., you had to use revids. So when you talk of a "centralization bias" in bzr, a lot of us get confused, defensive, etc., because from our perspective, bzr and git weren't all that much different until just recently. Now it may be that you're right that "global" revnos like bzr has now introduce a bias in favor of centralization. If that's true, I'm not sure that totally vindicates the git model. We have to ask if the bias is a good thing, but so do you; after all, we may have done so because of user demand, and if our users want it, maybe yours will want it too someday. (I say "may" because I haven't been paying ...
On Sat, 21 Oct 2006 19:07:10 -0400 There is no need to speculate, the numbers will only be reliable on a local basis. So yes you can force a single repository like bzr.dev to always "win" any conflict and force the other guy to change ie. a central repo model. But they can not be maintained consistently in a truly distributed system. As Linus pointed out that is fact, not opinion. Now the opinion of the bzr people is that it doesn't matter and that for all important cases it works well enough. If all the people who don't like the look of sha1's self select bzr, so be it, but that doesn't change the fundamental argument. But just to reiterate, the design of Git is flexible enough to where you can automatically generate "revno" tags for every commit in your repo _today_. You'd end up with the exact same problems that bzr will eventually hit, but Git already has everything you need today to refer to every commit in your repo as r1 r2 r3 r4 etc... Sean -
[ Time to trim up CC's a bit ] On Sat, Oct 21, 2006 at 01:47:08PM -0700 I heard the voice of Oh, I don't mean the whole topic in general. It's just that there are only so many ways one can say "revnos are only valid in certain situations", and I really think we must have hit them all by now. We all agree on that; we just disagree (probably highly based on This seems correct; at least, it's correct enough to work from until Rather, unless you can one way or another access the branch the number I think it's using that 'c' word there that's causing contention here; we're ascribing different meanings to it. Revnos only apply to a specific "branch" (in this usage, I'm talking about branch abstractly and somewhat specifically; more in a moment), and so except by wild coincidence are only useful in talking about that branch. One of the two cases (the second discussed later) where that's useful is when you have long-lived branches. In git, apparently, you don't have long-lived "branches" in this particular meaning of the word, but the way people use bzr they do. Perhaps this is what you mean by 'centralization'. That long-lived branch doesn't have to be any sort of "trunk", though it usually is; it could as easily be something totally peripheral. Now, details of that use of "branch". In mathematical terms, a branch may be defined purely by its head rev (and the graph built up by recursing through all the parents), but in [bzr] UI and mental model terms, a "branch" is that plus its mainline[0]; the left-most or first line of descent, which colloquially is the difference between 'things I commit' and 'things I merge'. Let me try flexing my git-expression muscles here. Given a branch at a specific point in time, you point at the head rev, and there's a subset we call 'mainline' of the whole set of parents, which is expressed by following the 'first' parent pointers back to a single origin (there can be 50 origins in the whole graph, of course, but only one of ...
I would say that: revnos are handier tools than revids...etc I think that since G: was making a statement about revids, B: was making an implicit comparison with them.

    bzr log -r before:1

being handier than

    bzr log -r before:revid:david@zettazebra.com-20061022175244-4b85cb5f0cbc79ad

-davidc -- gpg-key: http://www.zettazebra.com/files/key.gpg
This is new to me. At work, we merge our toy repositories back and forth between devs only. There is no central repo at all. Does this mean that each merge would add one extra commit per time the one I'm merging with has merged with me? -- Andreas Ericsson andreas.ericsson@op5.se OP5 AB www.op5.se Tel: +46 8-230225 Fax: +46 8-230231 -
From what I understand, "bzr merge" will create one extra commit to preserve the "first parent is my branch" feature. "bzr pull" will do fast-forward if your DAG is proper subset of pulled branch/repository DAG, but at the cost that it would change your revno to revision mapping to those of the pulled repository. That's a consequence of preserving branch as "my work" i.e. as path through "branch DAG" in the DAG using first parent as special, instead of saving it outside DAG. -- Jakub Narebski Poland -
Actually, "bzr merge" does not create any commits on the branch -- you need to run "bzr commit" afterwards (possibly after resolving conflicts). The control files for the working tree record a pending merge, which gets recorded when you get round to the commit. So you can easily check if there were any tree changes resulting from the merge. If there aren't, or you made the merge by mistake, you can make a call to "bzr revert" to clean things up without ever having created a new revision. James. -
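The "first parent is my branch" idea discussed in the last few messages can be sketched with a toy DAG. This is a simplified model, not bzr's implementation: revisions are short made-up ids, each mapped to its parent list with the first parent first, and revnos are just positions along the first-parent chain. The helper names `mainline` and `revnos` are hypothetical.

```python
def mainline(dag, head):
    """Follow first-parent links from head back to the origin, oldest first."""
    line = []
    rev = head
    while rev is not None:
        line.append(rev)
        parents = dag[rev]
        rev = parents[0] if parents else None
    line.reverse()
    return line


def revnos(dag, head):
    """Number the mainline revisions 1, 2, 3, ... as seen from this head."""
    return {rev: n for n, rev in enumerate(mainline(dag, head), start=1)}


# Alice commits a then b. Bob branches from a and commits x.
# Alice merges Bob's work and commits the merge as m (first parent b).
dag = {"a": [], "b": ["a"], "x": ["a"], "m": ["b", "x"]}
alice = revnos(dag, "m")

# Bob then merges Alice's branch and commits n (first parent x).
dag["n"] = ["x", "m"]
bob = revnos(dag, "n")
```

In this model revno 2 is `b` on Alice's branch and `x` on Bob's, so the number only means something once you say which branch it is counted on.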
One result of this approach is that developers of different trees don't necessarily have common revision IDs to compare. Imagine a question like: When you ran that test did you have the same code I've got? In git, the answer would be determined by comparing revision IDs. In bzr, the only answer I'm hearing is attempting a merge to see if it introduces any changes. (I'm deliberately avoiding "pull" since we're talking about distributed cases here). And to comment on something mentioned earlier in the thread, there's no need for "wildly complex" distributed scenarios. All of these issues are present with developers working together as peers, (and each considering their own repository as canonical). A harder question (for bzr) is: Do you have all of the history I've got? (The problem being that when one developer is missing some history and merges it in, she necessarily creates new history, so there's never a stable point for both sides to agree on.) -Carl
Can you really just rely on equal revision IDs meaning you have the same code though? Let's say that I clone your git repository, and then we both merge the same diverged branch. Will our head revision IDs match? From a quick look at the logs of cairo, it seems that the commits generated for such a merge include the date and author, so the two commits would have different SHA1 sums (and hence different revision IDs). So I'd have a revision you don't have and vice versa, even though the Or run "bzr missing". If the sole missing revision is a merge (and not the revisions introduced by the merge), you could assume that you Why does it matter if they create a new revision? They can still tell if they've got all the history you had. James. -
If you two have the same commit that is a guarantee that you two have identical trees. The reverse is not true as logic 101 would teach ;-). Doing fast-forward instead of doing a "useless" merge helps somewhat but not in cases like two people merging the same branches the same way or two people applying the same patch on top of the same commit. You need to compare tree object IDs for Is it "you could assume" or "it is guaranteed"? If former, what kind of corner cases could invalidate that assumption? -
That was the point I was trying to make. Carl asserted that in git you could tell if you had the same tree as someone else based on revision IDs, which doesn't seem to be the case all the time. The reverse assertion (that if you have the same revision ID, you have Sure, you can do the same in Bazaar by comparing the inventories for The merge revision will also include any manual conflict resolution. If the other person resolved the conflicts differently. James. -
If you have the same revision (commit IDs), you have the same tree (at the same time, by the same committer, etc). If you have a different revision (commit), you may or may not have the same tree. You can then check the tree id, which will either be the same (you have the same tree) or differ (you don't). Thus, in the converse, if you have the same tree, you _will_ have the same tree id. You may or may not have the same commit id. -Peff -
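The rule Peff states can be written down as a small decision function. This is only a sketch: `Head` and `compare_heads` are hypothetical names, and the ids are placeholders rather than real object names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Head:
    commit_id: str  # name of the commit at the tip
    tree_id: str    # name of the tree that commit records


def compare_heads(mine, theirs):
    """Answer "do we have the same code?" from ids alone."""
    if mine.commit_id == theirs.commit_id:
        # Same commit implies same tree, same parents, same metadata.
        return "same commit"
    if mine.tree_id == theirs.tree_id:
        # Files are identical, but history or commit metadata differ.
        return "same tree, different commit"
    return "different trees"


# Two developers made the same merge independently: the trees match, the commits do not.
alice = Head(commit_id="c1", tree_id="t1")
bob = Head(commit_id="c2", tree_id="t1")
carol = Head(commit_id="c3", tree_id="t2")
```

The commit id is checked first because it is the stronger statement; the tree id is the fallback when only file contents matter.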
>>>>> "Jeff" == Jeff King <peff@peff.net> writes:
Jeff> On Thu, Oct 26, 2006 at 05:57:20PM +0800, James Henstridge wrote:
>> >If you two have the same commit that is a guarantee that you two
>> >have identical trees. The reverse is not true as logic 101
>> >would teach ;-).
>>
>> That was the point I was trying to make. Carl asserted that in git
>> you could tell if you had the same tree as someone else based on
>> revision IDs, which doesn't seem to be the case all the time.
Jeff> If you have the same revision (commit IDs), you have
Jeff> the same tree (at the same time, by the same committer,
Jeff> etc).
Jeff> If you have a different revision (commit), you may or
Jeff> may not have the same tree. You can then check the tree
Jeff> id, which will either be the same (you have the same
Jeff> tree) or differ (you don't).
Jeff> Thus, in the converse, if you have the same tree, you
Jeff> _will_ have the same tree id. You may or may not have
Jeff> the same commit id.
Ok, so git makes a distinction between the commit (code created by
someone) and the tree (code only).
Commits are defined by their parents.
Trees are defined by their content only ?
If that's the case, how do you proceed ?
Calculate a sha1 representing the content (or the content of the
diff from parent) of all the files and dirs in the tree ? Or
from the sha1s of the files and dirs themselves recursively based
on sha1s of the files and dirs they contain ?
I ask because the latter seems to provide some nice effects
similar to what makes BDD
(http://en.wikipedia.org/wiki/Binary_decision_diagram) so
efficient: you can compare graphs of any complexity or size in
O(1) by just comparing their signatures.
Vincent
-
Yes (a commit is a tree, zero or more parents, commit message, and Recursively. Each tree is an ordered list of 4-tuples: pathname, type, sha1, mode. If the type is "blob" then the sha1 is the hash of the file contents. If the type is "tree" then the sha1 is the id of a sub-tree. Yes, if two trees' hashes compare equal, they contain the same data. I believe we are not currently using this optimization to find merge differences, but there was some discussion earlier this week about doing so. -Peff -
Sorry, I should clarify: a commit is a _tree id_, zero or more _parent ids_, commit message, etc. -Peff -
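The recursive naming Peff describes can be sketched in a few lines. The sketch below assumes git's object framing as I understand it (a `"<type> <size>\0"` header hashed together with the body, and tree entries of the form `"<mode> <name>\0"` followed by the raw 20-byte id, sorted by name with directories compared as if they ended in `/`). It simplifies by treating every file as a regular non-executable file, and `object_id` and `tree_id` are names invented for the example.

```python
import hashlib


def object_id(kind: str, body: bytes) -> bytes:
    """SHA-1 over a "<kind> <size>\\0" header followed by the body."""
    header = f"{kind} {len(body)}".encode() + b"\0"
    return hashlib.sha1(header + body).digest()


def tree_id(tree: dict) -> bytes:
    """Name a nested {name: bytes-or-dict} structure recursively."""
    entries = []
    for name, value in tree.items():
        raw_name = name.encode()
        if isinstance(value, dict):
            # A sub-tree is named by its own tree id (the recursion step).
            mode, sha, sort_key = b"40000", tree_id(value), raw_name + b"/"
        else:
            # A file is named by the hash of its contents (a blob).
            mode, sha, sort_key = b"100644", object_id("blob", value), raw_name
        entries.append((sort_key, mode + b" " + raw_name + b"\0" + sha))
    body = b"".join(entry for _, entry in sorted(entries))
    return object_id("tree", body)


project = {"README": b"hello\n", "src": {"main.c": b"int main(void) { return 0; }\n"}}
same_project = {"src": {"main.c": b"int main(void) { return 0; }\n"}, "README": b"hello\n"}
changed = {"README": b"hello\n", "src": {"main.c": b"int main(void) { return 1; }\n"}}
```

Because a change deep in `src/` changes the id of `src` and therefore the id of the root, comparing two root ids is enough to tell whether anything underneath differs.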
Commits are defined by a _combination_ of:
- the tree they commit (which is recursive, so the commit name indirectly
includes information about EVERY SINGLE BIT in the whole tree, in every
single file)
- the parent(s) if any (which is also recursive, so the commit name
indirectly includes information about EVERY SINGLE BIT in not just the
current tree, but every tree in the history, and every commit that is
reachable from it)
- the author, committer, and dates of each (and committer is actually
very often different from author)
- the actual commit message
So a commit really names - uniquely and authoritatively - not just the
commit itself, but everything ever associated with it.
Where "contents" does include names and permissions/types (eg execute bit
and symlink etc).
If you compare the commit name, and they are equal, you automatically know
- the trees are 100% identical
- the histories are 100% identical
If you only care about the actual tree, you compare the tree name for
equality, ie you can do
    git-rev-parse commit1^{tree} commit2^{tree}
and compare the two: if and only if they are equal are the actual contents
This is exactly what git does. You can compare entire trees (and
subdirectories are just other trees) by just comparing 20 bytes of
information.
How do you think we can do a diff between two arbitrary kernel revisions
so fast? Why do you think we can afford to do a
    git log drivers/usb include/linux/usb*
that literally picks out the history (by comparing state) for every commit
in the tree?
I can do the above log-generation in less than ten _seconds_ for the last
year and a half of the kernel. That's 20k+ lines of logs of commits that
only touch those files and directories. And I _need_ it to be fast,
because that's literally one of the most common operations I do.
And the reason it's fast is that we can compare 20,000 files (names,
contents, permissions) by just comparing a _single_ 20-byte SHA1.
In git, revision names (and _everything_ has a revision name: commits, ...
>>>>> "Linus" == Linus Torvalds <torvalds@osdl.org> writes:
Linus> On Thu, 26 Oct 2006, Vincent Ladeuil wrote:
>>
>> Ok, so git make a distinction between the commit (code created by
>> someone) and the tree (code only).
>>
>> Commits are defined by their parents.
Linus> Commits are defined by a _combination_ of:
Linus> - the tree they commit (which is recursive, so the
Linus> commit name indirectly includes information EVERY
Linus> SINGLE BIT in the whole tree, in every single file)
And here you keep that separate from any SCM related info,
right ?
Linus> - the parent(s) if any (which is also recursive, so
Linus> the commit name indirectly includes information about
Linus> EVERY SINGLE BIT in not just the current tree, but
Linus> every tree in the history, and every commit that is
Linus> reachable from it)
Linus> - the author, committer, and dates of each (and
Linus> committer is actually very often different from
Linus> author)
Linus> - the actual commit message
Linus> So a commit really names - uniquely and authoratively
Linus> - not just the commit itself, but everything ever
Linus> associated with it.
Thanks for the clarification. But no need to shout about EVERY
SINGLE BIT, the pointer to BDDs was already talking a bit about
bits :)
But I agree, this is the important point that may be missed.
>> Trees are defined by their content only ?
Linus> Where "contents" does include names and
Linus> permissions/types (eg execute bit and symlink etc).
Which can also be expressed as: "Everything the user can
manipulate outside the SCM context", right ?
>> If that's the case, how do you proceed ?
Linus> If you compare the commit name, and they are equal,
Linus> you automatically know
Linus> - the trees are 100% identical
Linus> - the histories are 100% identical
And that's the only info you can get, no ...
I don't understand that question.
The commits contain the tree information. A raw commit in git (this is the
true contents of the current top commit in my kernel tree, just added
indentation and an empty line between the command I used to generate it
and the output, to make it stand out better in the email) looks something
like this:
    [torvalds@g5 linux]$ git-cat-file commit HEAD

    tree ba1ed8c744654ca91ee2b71b7cdee149c8edbef1
    parent 2a4f739dfc59edd52eaa37d63af1bd830ea42318
    parent 012d64ff68f304df1c35ce5902f5023dc14b643f
    author Linus Torvalds <torvalds@g5.osdl.org> 1161873881 -0700
    committer Linus Torvalds <torvalds@g5.osdl.org> 1161873881 -0700

    Merge master.kernel.org:/pub/scm/linux/kernel/git/davem/sparc-2.6

    * master.kernel.org:/pub/scm/linux/kernel/git/davem/sparc-2.6:
      [SPARC64]: Fix memory corruption in pci_4u_free_consistent().
      [SPARC64]: Fix central/FHC bus handling on Ex000 systems.
where the _name_ of the commit is
    [torvalds@g5 linux]$ git-rev-parse HEAD
    e80391500078b524083ba51c3df01bbaaecc94bb
ie the commit itself contains the exact tree name (and the name of the
parents), and the name of the commit is literally the SHA1 of the contents
Again, I'm not sure what you mean by that. The SCM does not track
_everything_. It does not track user names and inode numbers, so in a
sense a developer can change things that the SCM simply doesn't _care_
about and never tracks. But yes, the tree contents uniquely identify the
No, there is ordering there too. But yes, the ordering is not in the name
itself, you have to go look at the actual commit history to see it.
No.
If the signatures are equal, the contents are equal, and vice versa. It
No. Don't even think that way. That just confuses you. The hash is
cryptographic, and large enough, that you really can equate the contents
with the hash. Anything else is just not even interesting.
Linus
-
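Linus's point that the commit name is derived from the commit's contents can be sketched as follows. The sketch assumes the raw layout shown in the `git-cat-file` output above (tree line, parent lines, author, committer, blank line, message) and a `"commit <size>\0"` header in front of it before hashing. The identities, timestamps and parent ids are made up, and `build_commit` and `commit_id` are hypothetical helper names.

```python
import hashlib


def build_commit(tree, parents, author, committer, message):
    """Assemble raw commit text in the layout shown by git-cat-file."""
    lines = [f"tree {tree}"]
    lines += [f"parent {p}" for p in parents]
    lines += [f"author {author}", f"committer {committer}", "", message]
    return "\n".join(lines).encode()


def commit_id(raw: bytes) -> str:
    """Name the commit by hashing a "commit <size>\\0" header plus its contents."""
    header = f"commit {len(raw)}".encode() + b"\0"
    return hashlib.sha1(header + raw).hexdigest()


# Hypothetical inputs: the tree id is the well-known empty tree, the rest is invented.
tree = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"
who = "A U Thor <author@example.com> 1161873881 -0700"
first = build_commit(tree, [], who, who, "Initial commit\n")
first_id = commit_id(first)

# A child commit names its parent, so the parent's id feeds into the child's id.
child = build_commit(tree, [first_id], who, who, "Second commit\n")
forged_parent = "0" * 40
forged = build_commit(tree, [forged_parent], who, who, "Second commit\n")
```

Swapping the parent (or any other field) yields a different id even though the tree and message are unchanged, which is why a rewritten history cannot keep the old names.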
Hello all, Following the very interesting debate about the differences between bzr and git, I thought it was about time I tried to learn properly about git and how to use it. I've been using bzr for a good while now, although since I'm not a serious developer I only use it for simple purposes, keeping track of code I write on my own for academic projects. So, a few questions about differences I don't understand... First off a really dumb one: how do I identify myself to git, i.e. give it a name and email address? Currently it uses my system identity, My Name <username@computer.(none)>. I haven't found any equivalent of the bzr whoami command. Now to more serious business. One of the main operational differences I see as a new user is that bzr defaults to setting up branches in different locations, whereas git by default creates a repository where branches are different versions of the directory contents and switching branches *changes* the directory contents. bzr branch seems to be closer to git-clone than git-branch (N.B. I have never used bzr repos so might not be making a fair comparison). With this in mind, is there any significance to the "master" branch (is it intended e.g. to indicate a git repository's "stable" version according to the owner?), or is this just a convenient default name? Could I delete or rename it? Using bzr I would normally give the central branch(*) the name of the project. (* Central or main on my own system. Not intended to be central in the sense of a CVS-style version control setup:-) Any other useful comments that can be made to a bzr user about working with this difference, positive or negative aspects of it? Next question ... one of the reasons I started seriously thinking about git was that in the VCS comparison discussion, it was noted that git is a lot more flexible than bzr in terms of how it can track data (e.g. the git pickaxe command, although I understand that's not in the released version [1.4.4.1] yet?). A ...
On Tue, 28 Nov 2006 01:01:46 +0100 Assuming you have a recent version of git, then:

    $ git repo-config --global user.email "you@email.com"
    $ git repo-config --global user.name "Your Name"

Will set up a ~/.gitconfig in your home directory; these settings will apply in any repo you use. Drop the "--global" to set them It's just a common convention and carries no special significance; Don't be afraid to git-clone your local repo, especially with the -l and -s options. That will get you a separate repo/working directory while not taking up much extra disk space (objects from your first repo will be shared with the second). Once you get comfortable with multiple branches in a single repo/ working directory, it often is much better than the alternatives. The Git cherry-pick command lets you grab specific commits from other branches in your repo. But cherry-pick works at the commit level, there is no easy way to grab a single function for instance and merge just its history into another branch. However, you can merge an entire separate project into yours even though they don't share a base commit. This has been done several times in the history of Git itself. For instance you can see two separate "initial" commits in the Git repo with a command like "gitk README gitk" which gives a graphical history of the "gitk" and "README" files and shows each started life in a separate initial commit. Use "git show 5569b" to see Linus bragging on Don't think a direct bridge between the two has been written yet. Cheers, Sean -
Depending on whether you like editing config files by hand or not, you would either just edit your ~/.gitconfig file and add a section like:

    [user]
        name = My Name Goes Here
        email = myemail@work.com

or you would use "git repo-config" to do it for you. Personally, I find it easier to just edit the .gitconfig file directly, since the config file syntax is actually rather pleasant, but if you want to do it with a git command, you'd do

    git repo-config --global user.name "Joseph Wakeling"
    git repo-config --global user.email joseph.wakeling@webdrake.net

(where the "--global" just tells repo-config to use the user-global ~/.gitconfig file - you can also do this on a per-repository basis in the repository .git/config file if you want to have different identities for You can do either, it's almost purely a matter of taste. Using a local branch and switching between them in place has some advantages once you get used to it: most notably you can trivially use git commands that work on data from different branches at the same time. So with that kind of setup it's very natural to do things like "show me everything that is in branch 'x', but _not_ in branch 'y'", and once you get used to that, you really appreciate it. But at the same time, if you want to actually keep several branches checked out at the same time, and prefer to work on them that way, just use "git clone" to create the other branch instead. It really is just a matter of taste. I suspect that most people tend to end up using the "multiple branches in the same directory and switching between them" approach after a time, but that's really just an unsubstantiated feeling, and it certainly isn't It's just a convenient default name, and it has no real meaning otherwise. Feel free to rename it any way you want (just make sure to edit HEAD to There should be no difference, although since everybody seems to use "master" by default, the documentation is probably geared towards it, ...
Thanks to everyone for your very detailed responses. :-)
On the subject of blame and pulling patches from unrelated branches,
So ... if I understand correctly, I can get patches from somewhere else,
but in the branch history, I will not be able to tell the difference
from having simply newly created them?
With regards to git blame/pickaxe/annotate, the idea of tracking *code*
rather than files was one thing that really excited me when I read about
it in the earlier discussion, and is probably the main reason I'm trying
out git. I'd like to understand this properly so is there a simple
exercise I can do to demonstrate its capabilities? I tried an
experiment where I created one file with two lines, then cut one of the
lines, pasted it into a new file, and committed both changes at the same
time. But git blame -C on the second file just gives me the
time/date/sha1 of its creation, and no indication that the line was
taken from elsewhere.
Back to the more basic queries ... one more difference I've observed
from bzr, after playing around for a while, involves the commands to
undo changes and commits. It looks like git reset combines the
capabilities of both bzr uncommit and bzr revert: I can undo changes
since the last commit by resetting to HEAD, and I can undo commits by
resetting to HEAD^ or earlier.
Some things here I'm not quite sure about:
(1) the difference between git reset --soft and git reset --mixed,
probably because I don't understand the way the index works, the
difference between changed, updated and committed.
(2) How to remove changes made to an individual file since the last commit.
Last, could someone explain the git merge command? git pull seems to do
many things which I would need to use bzr merge for---I can "pull"
between branches which have diverged, for example. I don't understand
quite what git merge does that's different, and when to use one or the
other.
Many thanks again to everyone,
-- Joe
-
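For questions (1) and (2) in the message above, a small experiment along these lines may help. It is only a sketch, assuming a current git; the file name and commit messages are invented. It compares what "git status" reports after each kind of reset, and then discards the changes to one file.

```shell
# Throwaway identity so commits work in a clean environment (illustrative values).
export GIT_AUTHOR_NAME="Your Name" GIT_AUTHOR_EMAIL="you@email.com"
export GIT_COMMITTER_NAME="Your Name" GIT_COMMITTER_EMAIL="you@email.com"

repo=$(mktemp -d)
cd "$repo"
git init -q

echo one > file;  git add file; git commit -q -m first
echo two >> file; git add file; git commit -q -m second

# --soft: only the branch moves back. The index still holds the "second"
# content, so the change shows up as staged ("M" in the first column).
git reset -q --soft 'HEAD^'
soft_status=$(git status --porcelain)

# Re-create the commit so the other mode can be tried from the same state.
git commit -q -m "second again"

# --mixed (the default): the branch and the index move back. The working
# file keeps its content, so the change is unstaged ("M" in the second column).
git reset -q --mixed 'HEAD^'
mixed_status=$(git status --porcelain)

# (2) Throw away the changes to a single file since the last commit.
git checkout HEAD -- file
clean_status=$(git status --porcelain)

printf 'soft:  [%s]\nmixed: [%s]\nclean: [%s]\n' \
    "$soft_status" "$mixed_status" "$clean_status"
```

In both reset modes the working file itself is untouched; only "git reset --hard" (or the per-file checkout in the last step) rewrites it.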
Think of it this way: if the _patch_ looks like it's a code movement, then "git blame" will show it as a code movement. Ie, if the patch (to a human) looks like it's moving a function from one file into another (which in a patch will obviously be a question of removing it from one file, and adding it to another), then git will also see it that way, and then "git blame" will also follow its history as it moved.

But if somebody sends you a patch that just adds a new function that didn't exist in that context at all, then "git blame" won't ever realize

Actually, I think you found a bug.

Now, with small changes, "git blame -C" will just ignore copies entirely, so your particular test might not have even been supposed to work, but trying with a new git repo with two bigger files checked in at the initial commit, I'm actually not seeing "git blame -C" do the right thing even for real code movement.

And the problem seems to go to the "root commit": if the file existed in the root, the logic in "git blame" to diff against the (nonexistent) parent of the root commit won't do the right thing, and that just confuses git blame entirely.

I think Junio screwed up at some point. I'll send him a bug-report once I've triaged this a bit more, but I can recreate your breakage if I start a new git database and create two files in the root, and move data between them in the second commit (but if I instead create the second file in the second commit, and do the movement in the third commit, git blame -C works

I'm not quite sure what "bzr revert" does. Git does have a "revert" too, but it will append a _new_ commit that actually undoes the commit you're asking to revert.

If you want to just "undo history" (whether it's one commit or many - I don't see why it would be different) then yes, "git reset" is the thing to use.

I _suspect_ that bzr people use "uncommit" to undo a commit in order to fix it up. In git, you could do that with "git reset" and a new ...
Obvious when I think about it, otherwise every 'int i;' in the kernel
Actually my setup was like the latter situation you describe, so blame
was probably working fine and just ignoring the small change. But
serendipity is a wonderful thing. :-)
-- Joe
-
Indeed. We didn't do that heuristic originally, and the most common sequence that was "blamed" on being copied from somewhere else was something like the string

  "<tab><tab><tab>}<nl><tab><tab>}<nl><tab>}<nl>"

which is obviously very common in C, especially when you have coding

Yeah. As it turns out, the bug was really that "git blame" ended up just not showing the filenames (that it had followed correctly), because it had decided (incorrectly) that they weren't interesting because it all came from the same commit, and it had already shown that commit (just not that _file_ in that commit).

So it's fixed now, and probably would never trigger except for the stupid special case that was "let's just show an example of this" ;)

Linus
-
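The experiment discussed in this sub-thread can be set up as below. It is a sketch only, assuming a current git: the file contents and function names are invented, and the moved function is deliberately longer than a line or two because, as noted above, "git blame -C" ignores very small chunks. Following the description earlier in the thread, the second file is created in the second commit and the movement happens in the third.

```shell
# Throwaway identity so commits work in a clean environment (illustrative values).
export GIT_AUTHOR_NAME="Your Name" GIT_AUTHOR_EMAIL="you@email.com"
export GIT_COMMITTER_NAME="Your Name" GIT_COMMITTER_EMAIL="you@email.com"

repo=$(mktemp -d)
cd "$repo"
git init -q

# First commit: a.c holds two functions (lines 1-5 and 7-11).
cat > a.c <<'EOF'
int add_numbers(int left, int right)
{
    int result = left + right;
    return result;
}

int multiply_numbers(int left, int right)
{
    int result = left * right;
    return result;
}
EOF
git add a.c
git commit -q -m "add a.c"

# Second commit: b.c appears with unrelated content.
cat > b.c <<'EOF'
int subtract_numbers(int left, int right)
{
    int result = left - right;
    return result;
}
EOF
git add b.c
git commit -q -m "add b.c"

# Third commit: move multiply_numbers from a.c to the end of b.c.
{ echo; sed -n '7,11p' a.c; } >> b.c
sed -n '1,5p' a.c > a.c.new && mv a.c.new a.c
git add a.c b.c
git commit -q -m "move multiply_numbers to b.c"

# With -C, blame also looks in other files changed by the same commit.
blame_out=$(git blame -C b.c)
printf '%s\n' "$blame_out"
```

Comparing this output with plain "git blame b.c" is the interesting part: without -C the moved lines are attributed to the third commit, while with -C they should be traced back to a.c and the commit that introduced them there.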
I'm very happy my stupidity could help. ;-)
On a related note ...
I do think that bzr has quite an intuitive set of commands, and it is
easy to learn, though at this point I don't feel git is really *that*
much more difficult in itself. Although the terminal output for some
problems could be improved, most of my difficulties are stemming from
overlap of command names when the commands themselves do different
things, and the fact that git's documentation is somewhat more technical
than bzr's.
What would be nice would be to have in the documentation a whole bunch
of stupid examples for the main commands, something where someone can
create a repo from scratch, create and modify some simple files
according to instructions, and see the particular command in action.
The tutorials do this, of course, but only for a few cases, when to be
honest it's the more complex commands that most need such explanation.
For beginners, especially less technically skilled ones, it would be
good to have a lot more of, "Do this, here's what git will respond, this
is what it means, here's how to fix it...."
As a relatively non-technical user, perhaps I should keep track of my
difficulties (and others') and try to write something up.
-- Joe
-
100% agreed. A lot of the man-pages etc have been written to be about the technology, not about the _use_ of it.

I encouraged people at some point to add an "Examples" section to some of the functions to show what it all _means_, so for "man git-log", I think some of the most useful stuff is that examples section that shows the combination of revision naming and path-name limiting, for example. I personally think that that is a much better way of teaching people what the commands actually do than by mentioning the arguments one by one.

But that only exists for a couple of man-pages, and mostly for the simple ones at that. And a lot of the real examples would need "real data" to work on, so it can't easily be done as a trivial example in a man-page, it really needs a tutorial to "build up" to the situation where you can then

Yeah. The git "tutorial.txt" should be extended, and preferably be a whole nice set of "follow along with the bouncing ball" kind of web-page sequence.

So I absolutely agree. It's just that at least me personally, I just can't write documentation. I wrote some of the original tutorial, I've written some of the original tech docs, but I just can't get into the whole "document it" mindset, especially not from a user perspective. It doesn't float my boat, and judging by a lot of the discussions, I obviously also don't even see why something could _possibly_ cause confusion.

To make things worse, a lot of the docs (and by that I also mean some of the error messages and helpful hints) tend to be old. The whole fact that "git commit" mentions "git update-index" is exactly that kind of thing: it's largely a legacy message. You'd almost never actually _use_ git-update-index itself these days, and it's much more convenient to just list the files you want to commit to "git commit" directly (or just use the -a flag, if that is what you want to do).

But that message exists, because it was written in an earlier age.

Linus
-
Here's a crazy idea. How about a "git tutorial" builtin or "git example" or something that would create a repository into some useful state for demonstrating something. I know that I'm regularly putting stuff into emails like:

  mkdir gittest
  cd gittest
  git init-db
  echo hello > hello
  git add hello
  git commit -m "add hello"
  git checkout -b other
  echo other > other
  git add other
  git commit -m "add other"
  git checkout master
  # OK, that was just setup, here's what I want to demonstrate
  git pull . other
  ...

So maybe if there was a command to setup a standard example repository, ("git boilerplate", "git sandbox", "git playground" ?), then the documentation could use that to have full-fledged examples without having to duplicate similar setup each time.

And then there could be a way for this command to also spit out the commands it is using to reach some state so it could even serve as a sort of self-documenting tutorial of some sort.

Anyone interested in exploring something like that?

-Carl
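One way the proposal above could look is a plain shell function that builds the example repository, so each demonstration only has to show the interesting command. This is a sketch, not an existing git feature: the name git_sandbox is hypothetical, "git init" stands in for the older "git init-db", and "git merge other" is used for the final step where the message uses "git pull . other".

```shell
# Throwaway identity so commits work in a clean environment (illustrative values).
export GIT_AUTHOR_NAME="Your Name" GIT_AUTHOR_EMAIL="you@email.com"
export GIT_COMMITTER_NAME="Your Name" GIT_COMMITTER_EMAIL="you@email.com"

# Hypothetical helper: "git sandbox" is not a real git command.
# Creates a repository in directory $1 with branches "master" and "other",
# where "other" is one commit ahead, and leaves the shell inside it on master.
git_sandbox() {
    mkdir -p "$1" && cd "$1" || return 1
    git init -q
    # Pin the initial branch name regardless of local configuration.
    git symbolic-ref HEAD refs/heads/master
    echo hello > hello
    git add hello
    git commit -q -m "add hello"
    git checkout -q -b other
    echo other > other
    git add other
    git commit -q -m "add other"
    git checkout -q master
}

sandbox=$(mktemp -d)
git_sandbox "$sandbox/gittest"

# OK, that was just setup; here is the step being demonstrated.
git merge -q other
git log --format=%s
```

Since master has no commits of its own in this state, the merge is a fast-forward and no merge commit is created.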
Hi, That sounds fine! Actually, it should be very simple to turn the tutorial into such a script, displaying the command with an explanation, and executing the command. It could even call gitk from time to time, so the user can form a mental model of the ancestor graph. Ciao, Dscho -
Currently tutorial.txt doesn't work like that--there are places where it just tells the user to edit a file, or make a few commits, without listing commands to do so. It also isn't linear. That could all be "fixed", but I think the result would just make it more tedious. But I agree that a "git tutorial" command to set up a canonical example repository might be fun. --b. -
Hi, ;-) I did not forget... t1200-tutorial.sh But it serves a different purpose: it makes sure that we did not break the commands in the tutorial. (I fear that the script and the tutorial have diverged a little bit, though). git-tutorial should not test that, rather it should show the user what is possible, and encourage playing with git. Ciao, Dscho -
  usage: bzr annotate FILENAME
  aliases: ann, blame, praise

  Show the origin of each line in a file.

/Erik
-
I also have a basic question about git regarding its content tracking and merging. Does this mean if I have, for example, a large C++ file with a bunch of methods in it and I move one of the methods from the bottom of the file to the top and in another branch someone makes a change to that method that when I merge their changes git will merge their changes into the method at the top of the file where I have moved it? If so that would be really quite impressive! Cheers, Nick -
Hi, As for now, no, it does not. This is a shortcoming of RCS merge which does the heavy-lifting. Having said that, stay tuned for new developments: the functionality of merge is being integrated in git. This opens the door to make use of the code tracking support in git, to do exactly what you just proposed. Ciao, Dscho -
Right now (and in the near future), nope. "git blame" will track the changes (so the pure movement wasn't just an addition of new code, but you'll see it track it all the way down to the original), but "git merge" is still file-based.

In other words, "git merge" does use a data similarity analysis that could be used for smaller chunks than a whole file, but at least for now it does it on a file granularity only (and then passes it off to the standard RCS three-way merge on a file-by-file basis).

That said, if the movement happens _within_ a file, then just about any SCM could do what you ask for, by just using something smarter than the standard 3-way merge. So that part isn't even about tracking data across files - it's just about a per-file merge strategy.

The "track data, not files" thing becomes more interesting when you factor out a file into two or more files, and can continue to merge across such a code re-filing event. Git can do it for "annotate", but doesn't do it for

Indeed, and it's one of the potential future goals that was discussed very early in the git design phase. The point of _not_ doing file ID tracking is exactly that you can actually do better than that by just tracking the data.

So some day, we may do it. And not just within one file, but even between files. Because file renames really is just a very specific special case of data movement, and I don't think it's even the most common case.

That said, there are several reasons why you might not actually _ever_ want it in practice, and why I say "potential future goal" and "we may do it". I think this is going to be a matter of not just writing the code (which we haven't done), but also deciding if it's really worth it.

Because merges are things where you may not want too much smarts:

 - Quite often, a failed merge that needs manual fixup may even be _preferable_ to a successful merge that did the merge "technically correctly", but in an unexpected ...
Yes. Because each commit contains parent revision id's, which in turn contain *their* parent revision id's, which in turn..., you know you have exactly the same revision, code, and history leading up to that revision. You may have other revisions on top or on other branches, but all commits, including merge-points and whatnot, leading to that

Merges preserve author and commit info. You may need to create a new branch (a git branch, the cheap kind which is a 41-byte file) and fetch "his" into "yours". This will be very cheap if you both have the same code but not the same history, as everything but a few commit-objects will be shared.

A more likely scenario though is this:

Bob writes a feature that doesn't work as per spec. He doesn't know why. He asks Alice to have a look, so he communicates the commits to her by "please pull this branch from here", or by sending patches and telling Alice the branch-point revision to apply them to.

Alice creates the "bobs-bugs/nr1232" branch at the branch-point and fetches Bob's branch into that, or applies the patches on top of that (in the fetch scenario she wouldn't need to know the branch point, since git would figure this out for her). She knows this should create a revision named 00123989aaddeddad39, so if it doesn't, she doesn't have the same code.

I imagine this works roughly the same in bazaar, although the original case where tests have already been done and the testers wanted to know

"assume" != "know", or was that just sloppy phrasing?

--
Andreas Ericsson andreas.ericsson@op5.se
OP5 AB www.op5.se
Tel: +46 8-230225 Fax: +46 8-230231
-
Two things differ in bzr and git, here:

* bzr doesn't do "autocommit" after a merge. So, new revisions are created only if you use "commit".

* bzr has two commands, "pull" and "merge". "pull" just does what the git people call "fast-forward", and only this (it refuses to do anything if the branches diverged). In particular, you never have to commit after a pull (well, except if you had some local, uncommitted changes). "merge" changes your working directory, and you have to commit afterwards. "merge" will never do a fast-forward, it will never change the revision that your working tree refers to, and it's your option to commit or not afterwards (if you see that it introduces no changes, you might not want to commit).

The final rule in bzr would be "you create an extra commit each time you commit" ;-).

As a side-note, it could be interesting to have a git-like merge command (choosing automatically between merge and pull), probably not in the core, but as a plugin.

--
Matthieu
-
So I'd say that revnos without the context of a location can only refer to the current branch that the user is working on. They don't refer to the mainline, which typically has its own numbers that don't match the user's.

If you're saying that bzr is "centralized" in that the user's current

Right. You need something guaranteed to be unique. It's the revno + url combo that is unique. That may not be permanent, but anyone can

No. It would be silly for the losing side to publish a mirror of the winning branch at the same location where they had previously published their own branch. So the old number + URL combination would remain valid.

If the losing faction decided to maintain their own branch after the merge, they'd have two options:

 1. continue to develop against the losing "branch", without updating its numbers from the "winning" branch. It would be hard to tell who had won or lost in this case.

 2. create a new mirror of the "winning" branch and develop against that. I'm not sure what the point of this would be.

I think the most realistic thing in this scenario is that they leave the "losing" branch exactly where it was, and develop against the "winning"

Right. This is a difference between Bazaar and Git that I'd characterize as being "branch-oriented" vs "repository-oriented". We'll

I got the impression there was also a local ordering of revisions. Is that wrong?

A Bazaar branch is a directory inside a repository that contains:
 - a name referencing a particular revision
 - (optional) the location of the default branch to pull/merge from
 - (optional) the location of the default branch to push to
 - (optional) the policy for GPG signing
 - (optional) an alternate committer-id to use for this branch
 - (optional) a nickname for the branch
 - other configuration options

A Bazaar branch doesn't contain any commit objects ("revisions" in Bazaar parlance). Those are retrieved from the ...
No, there is no such thing as a local ordering of revisions. Each revision (commit) has a link to its parent(s). A branch technically is just a reference to a particular commit object. The commit itself gives us a sub-DAG of the DAG of the whole history: the DAG of all ancestors of said commit. Such a lineage of the commit pointed to by a branch is conceptually a branch; i.e. a branch is a DAG of development (not a line of development, as there is no special meaning of first parent).

You can also have (in a git repository) a reflog, which records values of branch-as-reference, or branch tip of branch-as-named-lineage. But for example fetch and fast-forward 5 commits in history is

Erm, wasn't revno to revid mapping also part of bzr "branch"?

We store configuration per repository, not per branch, although there is some branch specific configuration.

Workingtree: ~/

Gaah, it's even more inconvenient. Certainly more than using name

Is there a command to list all branches in bzr? Is there a command

That's the opposite of the git view. In git, the working area is associated with the repository (clone of repository), not the branch. We copy whole repositories

Which shells? If I understand it, '^' was chosen (for example as the NOT operator for specifying a sub-DAG, instead of '!') because it causes no problems with shell expansion. And considering that many git commands are/were written in shell, one certainly would notice that.

--
Jakub Narebski
Poland
-
It's not part of the conceptual model. The revno-to-revid mapping is done using the DAG. The branch just tracks the head.

The .bzr/branch/revision-history file is from an earlier model in which branches had a local ordering. Nowadays, it can be treated as:
 - a reference to the head revision

The notation was that ~/repo would contain the .git directory for the

Of course if you have a copy of bzr.dev on your computer, you don't need to type the full URL. It's just like the 'merge ../b' above.

But how can you use the branch name of a branch that isn't on your computer? I suspect git requires a separate 'clone' step to get it onto

Sorry, it's been quite a long time since people complained at me for using ^, so I don't remember. Perhaps Edgar is right about it being the pipe character in old shells.

Aaron
-
No. You can merge a branch from a remote repository in a single step:

  git pull http://example.com/git/repo branch-of-interest

But if you want to do something besides (or before) a merge, (for example, just explore its history, do some diffs etc.) then you would fetch it instead, assigning it a local branch name in the process:

  git fetch http://example.com/git/repo branch-of-interest:local-name

After which "local-name" is all one would need to use. So after a fetch like the above, the equivalent of "bzr missing --theirs-only" would be:

  git log ..local-name

[This shows some of the expressive power of git revision specifications. There's no need for a separate "missing" command. It's just one case of viewing a particular subset of the DAG. And the specification language makes almost all interesting subsets easy. The --mine-only specification would be "local-name.."]

And beyond what bzr missing does (I believe) it's easy to also see the patch content of each commit with:

  git log -p ..local-name

And then if everything is happy, one could merge that branch in:

  git pull . local-name

(And, yes, it is the case that "pull" with a repository URL of "." is how merging is done. It's bizarre to me that this is not "git merge local-name" instead. There actually _is_ a "git merge" command that could be used here, but it is somewhat awkward to use, (requiring both a commit message (without the -m of git-commit(!)) and an explicit mention of the current branch). So using it would be something like:

  git merge "merge of local-name" HEAD local-name

I've never claimed that git is completely free of its UI warts---though there are fewer now than when I started using it.)

But, yes, the notion in git is to bring things in to the current repository and then work with them locally. This has an advantage that network traffic is spent only once if doing multiple operations, (say the three steps shown above: 1) investigate commit messages, 2) investigate patch content, ...
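The fetch, inspect, merge sequence described above can be tried without a network by using a second local repository in place of the URL. This is a sketch under that assumption; the repository names (upstream, mine) and file contents are invented, and the last step uses "git merge local-name", which current git accepts directly, instead of "git pull . local-name".

```shell
# Throwaway identity so commits work in a clean environment (illustrative values).
export GIT_AUTHOR_NAME="Your Name" GIT_AUTHOR_EMAIL="you@email.com"
export GIT_COMMITTER_NAME="Your Name" GIT_COMMITTER_EMAIL="you@email.com"

work=$(mktemp -d)
cd "$work"

# "upstream" stands in for http://example.com/git/repo.
git init -q upstream
( cd upstream &&
  git symbolic-ref HEAD refs/heads/master &&
  echo one > file && git add file && git commit -q -m "one" )

git clone -q upstream mine

# Upstream grows a branch that "mine" has never seen.
( cd upstream &&
  git checkout -q -b branch-of-interest &&
  echo two >> file && git commit -q -a -m "two" &&
  echo three >> file && git commit -q -a -m "three" )

cd mine

# Fetch the remote branch under a local name, without merging anything.
git fetch -q ../upstream branch-of-interest:local-name

# What they have that we do not (like "bzr missing --theirs-only"):
git log --format=%s ..local-name
theirs_only=$(git rev-list --count ..local-name)

# What we have that they do not (the "--mine-only" direction):
mine_only=$(git rev-list --count local-name..)

# The same commits with their patch content:
git log -p ..local-name > fetched.patch

# If everything looks fine, merge the fetched branch.
git merge -q local-name
```

All three inspection commands run against local data, so the network (here, the other directory) is touched only once, by the fetch.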
In git the DAG is a DAG of parents. There are no "child" links. So it is natural to refer to the n-th ancestor of a given commit (in git <ref>~<n>, in bzr -<m>).

To have incrementing (from 1 for the first revision on a given branch) revision numbers you either have to have links to "children", which automatically means that revisions cannot be immutable to allow for branching at an arbitrary revision, or to traverse the DAG there and back again (perhaps with a cache of the revno-to-revid mapping to help performance).

Additionally, to have incrementing revision numbers you have to remember which part of the DAG is our branch; which parent in a merge to choose to follow. Bazaar-NG decides here to distinguish the first parent; to have the first parent immutable it doesn't use fast-forward and always uses merge, sometimes

The default layout of a "clothed" repository is

  Repository: ~/repo/.git/
  Branches: ~/repo/.git/refs/heads/
  Workingtree:

No, as it was said in other messages in this thread, you can fetch a branch (branches), even from a repository other than the one you cloned from, into a given branch (branches). For git it would be

  $ git fetch <URL> <remotebranch>:<localbranch>

You probably would want to save the above info in the remotes file or in config. For cg (Cogito) it would be

  $ cg branch-add <localbranch> <URL>#<remotebranch>
  $ cg fetch <localbranch>

In git you always use names like 'master', 'next', 'HEAD' (meaning current branch) and also HEAD^, next~5 when comparing branches, viewing history, merging branches, switching to branch etc. Not '../master'...

--
Jakub Narebski
Poland
-
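The ancestor notation mentioned above (HEAD^, next~5) can be explored with a short linear history. A sketch assuming a current git; the branch name "next" is borrowed from the message and the commit messages are invented. The revision arguments are quoted so that shells which treat '^' specially leave them alone.

```shell
# Throwaway identity so commits work in a clean environment (illustrative values).
export GIT_AUTHOR_NAME="Your Name" GIT_AUTHOR_EMAIL="you@email.com"
export GIT_COMMITTER_NAME="Your Name" GIT_COMMITTER_EMAIL="you@email.com"

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/next   # name the initial branch "next"

# Four commits in a row: "commit 1" ... "commit 4".
for n in 1 2 3 4; do
    echo "$n" > file
    git add file
    git commit -q -m "commit $n"
done

# "parent of the parent" and "second ancestor" name the same commit.
parent_of_parent=$(git rev-parse 'HEAD^^')
two_back=$(git rev-parse 'HEAD~2')

# Ancestors can be counted from a branch name as well as from HEAD.
two_back_subject=$(git log -1 --format=%s 'HEAD~2')
oldest_subject=$(git log -1 --format=%s 'next~3')

echo "HEAD~2 is '$two_back_subject', next~3 is '$oldest_subject'"
```

Only parent links are followed here, which is why counting backwards from a tip is cheap while counting forwards from the first revision needs extra bookkeeping.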
On Sat, 21 Oct 2006 16:05:18 -0400 Of course it works as long as you accept the implicit requirements of supporting them and ignore the cases where they change out from underneath the user. But as soon as users want to embrace distributive models where there isn't a central shared repo, at best revno's are unhelpful and at worst they are counterproductive. The proof of this is that if revno's were sufficient bzr wouldn't need revid's. Since the utility provided by revno's seems so minimal even in the case where they do work, Git simply doesn't bother with them. And "our" experience is that Git really does work well without them. Sean -
Yes. This really is what it boils down to.

The _only_ time you actually use revision numbers (as opposed to branch-names or tag-names) is when you want a _stable_ number. It's that simple. You never really need a revision number otherwise. In other situations, you do things like

  git log --since=2.days.ago
  gitk v2.6.18..
  git diff --stat --summary ORIG_HEAD..

or whatever. It's clearly not "stable", but it's also clearly not a revision number from a UI perspective.

When you want a revision number is _exactly_ when you're moving things between branches, or reporting a bug to somebody else, or similar. And that's also _exactly_ when you want the number to be stable and meaningful (ie the other end should be able to rely on the number).

And if you need to refer to a central repository to do that, it's clearly not distributed. Not needing such a central reference point is what the word "distributed" _means_ in computer science for chrissake!

Linus
-
But it is *not* *distributed*. The definition of a distributed system,
among other things, requires that resource identifiers are independent of
the location of the resources. So only using the revision-ids is really
I regularly use bzr and I never used git. But I'd not hesitate a second
to pull --overwrite over the old location. Because the url has a meaning
"the base I develop against" for me and I'd want to preserve that
This is one of things I on the other hand like better on bzr than git.
Because it is really branches and not repositories that I usually care
about.
--------------------------------------------------------------------------------
- Jan Hudec `Bulb' <bulb@ucw.cz>
-
Why not? I think it really does. And due to the fact that merges are merges and will show up as such, I think it's very suitable for feature branches. In fact, in the development of bzr itself, all commits are done in feature branches and then merged into bzr.dev (the main "trunk" of bzr) when they are considered stable.

Consider the following:

  bzr branch mainline featureA
  cd featureA
  hack hack; bzr commit -m 'f1'; hack hack
  bzr commit -m f2; etc

Now I want to merge in mainline again:

  bzr merge ../mainline; bzr commit -m merge
  hack hack; bzr commit -m f3; hack hack
  bzr commit -m f4; etc

Right now, I would have something like this in the branch log:

-----------------------------------------------------------------
committer: Erik B
I think I haven't properly explained what "feature branch" means. A "feature branch" is a short (or medium) lived branch, created for development of one isolated feature. When the feature is in a stable stage, we merge the feature branch and forget about it. We are not interested in the fact that a given feature was developed on a given branch. BTW, in the published git.git repository, for example, feature branches are only available in the form of the "digest" 'pu' (proposed updates) branch.

I guess what you are talking about are long lived "development branches" (like git.git 'maint', 'master', 'next' and 'pu' branches), or perhaps a long lived clone of a given git repository belonging to another user.

Git considers clones of a given repository totally equivalent, and considers having the fast-forward property more important than remembering "which branch (which clone) did this commit come from" or at least "this commit is from this (current) branch-clone". You have graphical history viewers (bzr has its own: bzr-gtk), committer and author info, and reflog if enabled if you really,

Which if I remember correctly (at least by default) needs and generates

As it was clarified during this long discussion, bzr "branches" are something between git branches and one-branch [local] clones. Can you for example create a branch starting from an arbitrary revision, not only the tip of a branch?

The above sequence of operations can be done in (at least) two different ways in git. Less used:

  $ cd /somewhere/else
  $ git clone -l -s <mainrepo>/.git featureA
  $ cd featureA
  $ hack; hack; git commit -a -m "f1"; hack; hack; git commit -a -m "f2"; etc
  $ cd <mainrepo>
  $ git pull /somewhere/else/featureA/.git
  (this does commit and merge)

But more commonly used is:

  $ git branch featureA mainline
  $ git checkout featureA
  $ hack; hack; git commit -a -m "f1"; hack; hack; git commit -a -m "f2"; etc
  $ git checkout mainline
  $ git pull . featureA

The automatic merge message takes care of this, if we enable the merge.summary config option. For ...
At Sun, 22 Oct 2006 11:56:32 +0200, "Erik Bågfors"

Thanks for sharing this example. I think when we look at concrete things that the tools actually let you do, we have a better conversation. Plus, this example highlights some very interesting differences between the tools.

So here is a complete sequence of git commands to construct the scenario (even the extra hacking in mainline):

  mkdir gittest; cd gittest
  git init-db
  touch mainline; git add mainline; git commit -m "Initial commit of mainline"
  git checkout -b featureA
  touch f1; git add f1; git commit -m f1
  touch f2; git add f2; git commit -m f2
  git checkout -b mainline master
  touch sd; git add sd; git commit -m "something done in mainline";
  touch se; git add se; git commit -m "something else done in mainline";
  git checkout featureA
  git pull . mainline
  touch f3; git add f3; git commit -m f3
  touch f4; git add f4; git commit -m f4

For reference, here's the same with bzr:

  mkdir bzrtest; cd bzrtest
  bzr init-repo . --trees
  bzr init mainline; cd mainline
  touch mainline; bzr add mainline; bzr commit -m "Initial commit of mainline"
  cd ..; bzr branch mainline featureA; cd featureA
  touch f1; bzr add f1; bzr commit -m f1
  touch f2; bzr add f2; bzr commit -m f2
  cd ../mainline/
  touch sd; bzr add sd; bzr commit -m "something done in mainline"
  touch se; bzr add se; bzr commit -m "something else done in mainline"
  cd ../featureA
  bzr merge ../mainline/; bzr commit -m "merge"
  touch f3; bzr add f3; bzr commit -m f3
  touch f4; bzr add f4; bzr commit -m f4

[As has recently been pointed out, the tools really are more the same

OK. So here is a difference in the tools. With git, you don't get the indentation for the "non-mainline" commits. This is because git doesn't recognize any branch in the DAG to be more significant than any other. Instead, git provides a flat, and (heuristically) time-sorted view of the commits. (It's heuristic in that git just uses the time stamps in ...
Thanks for this mail, this makes me happy to see. The tools are pretty much the same but have some different view on how to do things..

If I understand you correctly, you'll get the same thing with "bzr missing".

  $ bzr missing ../mainline/
  You have 1 extra revision(s):
  ------------------------------------------------------------
  revno: 2
  committer: Erik B
On Sun, Oct 22, 2006 at 07:25:41AM -0700 I heard the voice of

This throws me a little. I'd expect it to Just Do It when it's fast-forwarding, but if it's doing a merge, I'd prefer it to stop and wait before creating the commit, even if there are no textual conflicts. I realize you can just look at it afterward and back out

Every branch has a nickname, settable with 'bzr nick' (defaulting to whatever the directory it's in is), and that's stored as a text field in each commit. It's mostly cosmetic, but it's handy to see at a

From what I can gather from this, though, that means that when I merge stuff from featureA into mainline (and keep on with other stuff in featureA), I'll no longer be able to see those older commits from this command. And I'll see merged revisions from branches other than mainline (until they themselves get merged into mainline), correct? It sounds more like a 'bzr missing --mine-only' than looking down a

The branch: (head) and ancestor: (latest common rev) revspecs let you refer to the respective bits of other branches, which I think would

Well, what would be the fun in that? 8-}

--
Matthew Fuller (MF4839) | fullermd@over-yonder.net
Systems/Network Administrator | http://www.over-yonder.net/~fullermd/
On the Internet, nobody can hear you scream.
-
Or you can use the --no-commit option to git pull, and commit later. But it is true that you can always amend the commit with

If I remember correctly Linus argued against it, because a branch name is something local to the repository (the most common example is "my 'master' is your 'origin'"). There was a proposal for a "note" header for notes like merge algorithm used, or branch name, visible only in 'raw' mode, but it wasn't

That's true. That is what history viewers (gitk, qgit, tig, gitview, git-show-branch, git-browser) are for. And there is always reflog (if you enable it, of course).

--
Jakub Narebski
Poland
-
one thing you are missing: 'mainline' in this git command is not saying 'everything that's in the 'main' published branch'. it's saying 'everything reachable by the tag 'mainline''. so when you branched off for your feature development you could set a tag that says 'branchpoint' and no matter what gets merged in mainline after that you can always do branchpoint..featureA and find what you've done.

that being said, mainline..featureA is also extremely useful, it tells you what development stuff you have done that has not yet been merged into mainline

David Lang
-
The thing that the bzr people don't seem to realize is that their choice of revision naming has serious side effects, some of them really technical, and limiting.

I already brought this up once, and I suspect that the bzr people simply DID NOT UNDERSTAND the question:

 - how do you do the git equivalent of "gitk --all"

which is just another reason why "branch-local" revision naming is simply stupid and has real _technical_ problems.

I really suspect that a lot of people can't see further than their own feet, and don't understand the subtle indirect problems that branch-local naming causes.

For example, how long does it take to do an arbitrary "undo" (ie forcing a branch to an earlier state) in a project with tens of thousands of commits? That's actually a really important operation, and yes, performance does matter. It's something that you do a lot when you do things like "bisect" (which I used to approximate with BK by hand, and yes, re-weaving the branch history was apparently a big part of why it took _minutes_ to do sometimes).

Again, this is something that people don't expect to have _anything_ to do with revision numbering, but the fact is, it's a big part of the picture. If you have branch-local revision numbering, you need to renumber all revisions on events like this, and even if it is "just" re-creating the revno->"real ID" cache, it's actually an expensive operation exactly because it's going to be at least linear in history.

One of the git design requirements was that no operation should _ever_ need to be linear in history size, because it becomes a serious limiter of scalability at some point. We were seeing some of those issues with BK, which is why I cared.

So in git, doing things like jumping back and forth in history is O(1). Always (with a really low constant cost too). Of course, checking out the end result is then roughly O(n), but even there "n" is the size of the _changes_, not number of revisions or number of ...
On Mon, Oct 23, 2006 at 10:29:53AM -0700 I heard the voice of I for one simply DO NOT UNDERSTAND the question, because I don't know what that is or what I'd be trying to accomplish by doing it. The I don't understand the thrust of this, either. As I understand the operation you're talking about, it doesn't have anything to do with a branch; you'd just be whipping the working tree around to different I agree, and I currently find a number of places bzr doesn't hit the level of performance I think it should. I'm not convinced, however, that any notable proportion of that has to do with the abstract model behind it. And insofar as it has to do with the physical storage model, that can easily be (and I'm confident will be, considering it's I consider it a _technical_ sign of a way of thinking about branches I prefer 8-} -- Matthew Fuller (MF4839) | fullermd@over-yonder.net Systems/Network Administrator | http://www.over-yonder.net/~fullermd/ On the Internet, nobody can hear you scream. -
on many modern VCS systems it's O(n) on the number of changes (start from where you are and apply the patch to change it to rev -1, then apply the patch to change it to rev -2, etc) on git it's O(1) (write the new files into place) David Lang -
gitk (and all other logging functions) can take as its argument a set of arbitrary revision expressions. That means, for example, that you can give it a list of branches and tags, and it will generate the combined log for all of them. "--all" is just shorthand for that, but it's really just a special case of the generic facility. This is _invaluable_ when you want to actually look at how the branches are related. The whole _point_ of having branches is that they tend to have common state. For example, let's say that you have a branch called "development", and a branch called "experimental", and a branch called "mainline". Now, _obviously_ all of these are related, but if you want to see how, what would you do? In git, one natural thing would be, for example, to do gitk development experimental ^mainline (where instead of "gitk" you can use any of the history listing things - gitk is just the visually more clear one) which will show you what exists in the branches "development" and "experimental", but it will _subtract_ out anything in "mainline" (which is sensible - you may want to see _just_ the stuff that is getting worked on - and the stuff in mainline is thus uninteresting). See? When you visualize multiple branches together, HAVING PER-BRANCH REVISION NUMBERS IS INSANE! Yet, clearly, it's a valid and interesting operation to do. An equally interesting thing to ask is: I've got two branches, show me the differences between them, but not the stuff in common. Again, very simple. In git, you'd literally just write gitk a...b (where "..." is "symmetric difference"). Or, if you want to see what is in "a" but _not_ in "b", you'd do gitk b..a (now ".." is regular set difference, and the above is really identical to the "a ^b" syntax). And trust me, these are all very valid things to do, even though you're talking about different branches. No. If you "undo", you'd undo the whole history too. And if you undo to a point ...
On Mon, Oct 23, 2006 at 03:44:13PM -0700 I heard the voice of

I have zero problem believing that. It seems from all accounts a wonderful swiss-army chainsaw, and while none of that power is useful to me personally in anything I'm VCS'ing at the moment, I'd feel awful shiny knowing it was sitting there waiting for me. All else being equal, I'd think more highly of a VCS with those capabilities than one without.

bzr-the-program doesn't have a lot of that capability, and what it does have is rather more verbose to access. Perhaps some attribute of bzr-the-current-storage-model would make some bit of that significantly more expensive than it has to be (I don't know of any, and can't think offhand of anywhere it might hide, but that's way off my turf).

But I don't understand how bzr-the-abstract-data-model makes such things impossible, or even significantly different than doing so in git. In git, you're just chopping off one DAG where another one intersects it (or similar operations). To do it in bzr, you'd do... exactly the same thing. The revnos, or the mainline, are completely useless in such an operation of course, but they don't hurt it; the tool would just ignore them like it does the SHA-1 of files in

I wouldn't be so absolutist about it, but certainly they're of extremely limited utility if of any at all in such cases. And yes, it can be an interesting operation. But what does that have to do with using revnos in other cases? You keep saying "having" where I would

Well, I guess in this particular case I still don't see why you'd generally undo big hunks of a branch versus just flipping your working tree to different versions. But contrived examples are still examples, and even if so, truncate()'ing a list of numbers is a constant time operation. And even if you had to renumber totally... my $DEITY, I'd expect my old 200MHz PPro to renumber a hundred

Quite frankly, I just don't think you understand that I WANT to care about first parents. No, ...
one key difference is that with bzr you have to do this chopping by creating the branches at the time changes are done, with git you do this chopping after the fact when you are displaying the results. As such you can chop and compare things in ways that were never contemplated by

nobody is saying that the bzr approach is invalid for your workflow. what people are saying is that it doesn't easily support a truly distributed workflow. this is a very different statement.

your workflow isn't truly distributed so bzr's model works well for you. no problem, just don't claim that because you haven't run into any problems with your workflow that there are no problems with bzr with other workflows.

David Lang
-
On Tue, Oct 24, 2006 at 08:58:56AM -0700 I heard the voice of

HUH? Why on earth do you think that?

To do this in a git data model, you point at 2 (or 3, or 4, or...) revisions, anywhere in the revision-space universe. You derive back a DAG of the history from each of them by recursing over parent links. You figure out where (if anywhere) those DAGs intersect. And based on that, you alter what and how you display; including or excluding certain revs, changing the angles of lines or columnation of dots in a graph, etc.

To do it in a bzr data model, you would follow *EXACTLY* the same

And it's one that carries around a lot of unstated assumptions about what "truly distributed" means, which *I*'m certainly not understanding, because any meaning I can apply to the term doesn't lead me to the conclusions it does you.

Certainly, depending on your workflow, certain parts of the UI are of lesser utility than they are in mine, down to and including zero. And it's probably certain that some parts of the UI aren't up to handling various workflows, too, including OUR workflow. That's kinda what "in development" means... But that's a very different statement from the claim that they CAN'T be without changes to the conceptual model underneath.

Just because a UI is built around maintaining the fiction of a mainline doesn't mean the system requires it. All you'd have to do to abandon it is write a different log formatter that didn't show revnos and didn't nest merge commits, and change (or add an option to) 'merge' to fast-forward if possible. The difference between the views on how the pieces should fit together really IS just that fine.

--
Matthew Fuller (MF4839) | fullermd@over-yonder.net
Systems/Network Administrator | http://www.over-yonder.net/~fullermd/
On the Internet, nobody can hear you scream.
-
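The procedure described above (walk parent links back from each chosen revision, then compare the resulting sets) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the toy history, the revision names and the helper functions are all hypothetical, and neither git nor bzr is claimed to implement it this way.

```python
# Hypothetical toy history: each revision id maps to its list of parents.
PARENTS = {
    "root": [],
    "m1": ["root"],
    "m2": ["m1"],      # mainline tip
    "d1": ["m1"],
    "d2": ["d1"],      # development tip
    "e1": ["m2"],      # experimental tip
}

def ancestors(parents, tip):
    """Return every revision reachable from tip, tip included."""
    seen = set()
    stack = [tip]
    while stack:
        rev = stack.pop()
        if rev in seen:
            continue
        seen.add(rev)
        stack.extend(parents[rev])
    return seen

def only_in(parents, include, exclude):
    """Revisions reachable from the include tips but from none of the
    exclude tips (the idea behind "a ^b" or "b..a")."""
    inc = set().union(*(ancestors(parents, t) for t in include))
    exc = set().union(*(ancestors(parents, t) for t in exclude))
    return inc - exc

def symmetric(parents, a, b):
    """Revisions reachable from exactly one of a and b (the idea behind "a...b")."""
    return ancestors(parents, a) ^ ancestors(parents, b)

work_in_progress = only_in(PARENTS, ["d2", "e1"], ["m2"])
divergence = symmetric(PARENTS, "d2", "e1")
```

Nothing in the sketch refers to a revno or a mainline; it only needs revision ids and parent links, which both data models have.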
it sounded like you were saying that the way to get the slices of the DAG was to use branches in bzr. to do this you need to create the branches with the correct info on each branch. this is only practical if the branches are created as the changes are made, if you try to do this after the fact you need to create the changes in the branch before you do the slicing. with git you can look at the DAG and pick any arbitrary points in it as points

the claim isn't that bzr can't be modified to support these other workflows (it sounds as if just changing the tools to use the internal refid's rather than the current refno's would come very close to solving this problem), it's that the current refno's (use of which is strongly encouraged by the current UI) cannot support some workflows, and therefore the claim that it supports fully distributed workflows as well as git is false

remember that this entire thing started with a feature comparison checklist, the definitions of some of the items on the checklist are being questioned. after that there's the issue of if the VCS in question has the feature.

this discussion started with two topologies

1. Centralized: all commits must go to one repository, connectivity required to check-in
2. Distributed: everything else

since then one additional topology has been defined, and one has been redefined

1. Centralized: all commits must go to one repository, connectivity required to check-in
2. Star: one repository is 'special' or 'primary' and all other repositories sync to this, but development can take place against local repositories, connectivity is only required when syncing the repositories. as updates take place the history is defined by the primary repository, and can overwrite or change the history as defined by local repositories.
3. Distributed: all repositories are equal (any definition of 'primary' is a matter of convention, not a requirement of the tool) development can take place against local ...
On Tue, Oct 24, 2006 at 11:03:20AM -0700 I heard the voice of I'm not entirely sure I understand what you mean here, but I think you're saying "Nobody's written the code in bzr to show arbitrary I think this statement arouses so much grumbling because (a) bzr does support such a lot better than often seems implied, (b) where it doesn't, the changes needed to do so are relatively minor (often merely cosmetic), and (c) disagreement over whether some of the I think there's a real intent for bzr TO support at least all common topologies. I'll buy that current development has focused more on [relatively] simple topologies than the more wildly complex ones. I look forward to more addressing of the less common cases as the tool matures, and I think a lot of this thread will be good material to work with as that happens. It's just the suggestion that providing fruit for simple topologies _necessarily_ prejudices against complex That's a good enough reason for me. Before this thread, I wasn't interested in using git. I'm still not, but now I understand much better /why/ I'm not. And when (I'm sure it'll happen sooner or later) some project I follow picks up using git, I'll have enough grounding in the tool's mental model to work with it when I have to. -- Matthew Fuller (MF4839) | fullermd@over-yonder.net Systems/Network Administrator | http://www.over-yonder.net/~fullermd/ On the Internet, nobody can hear you scream. -
I think we are talking past each other here.
what I think was said was
G 'one feature of git is that you can view arbitrary slices trivially'
B 'bzr can do this too, you just use branches to define the slices'
G 'but this limits you because branches are defined as code is developed, git
lets you define slices at viewing time'
by the way, I think it's more than just saying 'well, the code could be written
to do this in $VCS' some decisions and standard ways of doing things can impact
how hard it is to implement a feature, and some decisions can make it
one concern that the git people are voicing is that the things that work for
simple topologies (revno's) can't be used with the more complex ones (where you
need the refid's). especially the fact that users need to do things
significantly different when there are fairly subtle changes to the topology.
the scenario that came up elsewhere today where you have
      Master
      /    \
   dev1    dev2
and then dev1 and dev2 both start working on the same thing (without knowing
it), then discover they are working on the same thing. they now have three
options
1. merge their stuff up to the master so that they can both pull it down.
but this puts broken, experimental stuff up in the master
2. declare one of the dev trees to be the master
this changes the topology to
Master--dev1--dev2
3. pull from each other frequently to keep in sync.
this changes the topology to
      Master
      /    \
   dev1----dev2
if they do this with bzr then the revno's break, they each get extra commits
showing up (so they can never show the same history).
in git this is a non-issue, they can pull back and forth and the only new
history to show up will be changes.
this is the situation that the kernel developers are in frequently. it sounds as
if you haven't needed to do this yet, so you haven't encountered the problems.
David Lang
-
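To make the "revno's break" point concrete, here is a small Python sketch. It assumes, for illustration only, that a revno is simply the position of a revision on the chain of first (leftmost) parents leading back from a branch tip; real bzr also gives merged revisions dotted numbers, which the sketch ignores. All revision names are hypothetical.

```python
# Hypothetical history: dev1 commits A, dev2 commits B, then each merges
# the other. The first parent is always the revision the developer was on.
PARENTS = {
    "M": [],
    "A": ["M"],        # dev1's work
    "B": ["M"],        # dev2's work
    "X": ["A", "B"],   # dev1 merges dev2
    "Y": ["B", "X"],   # dev2 merges dev1
}

def mainline(parents, tip):
    """Follow first parents from tip back to the root; return oldest first."""
    path = []
    rev = tip
    while rev is not None:
        path.append(rev)
        ps = parents[rev]
        rev = ps[0] if ps else None
    path.reverse()
    return path

def revnos(parents, tip):
    """Map each mainline revision of the branch at tip to a 1-based number."""
    return {rev: n for n, rev in enumerate(mainline(parents, tip), start=1)}

dev1_numbers = revnos(PARENTS, "X")
dev2_numbers = revnos(PARENTS, "Y")
```

Under these assumptions both branches number the root as 1, but number 2 means A on dev1 and B on dev2, even though dev2 holds everything dev1 does. The ids M, A, B, X name the same revisions on both sides.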
On Wed, Oct 25, 2006 at 03:40:00PM -0700 I heard the voice of
Ah. This is more like "bzr [mostly] only does this now in terms of a
single branch (or some point back along it)". The slices that go
between branches are very limited ('missing' gives you one view;
'branch:' and 'ancestor:' revision specifications give you another).
bzrk/'visualize' gives an interface similar to gitk, but also only in
the context of a single branch/head looking backward through its
previous tree AFAIK. Any random DAG-slicing of what you have in the
revision store can be done, somebody would just have to write the code
for it. Nothing about 'the workflow preserves parents' would make
that any harder than writing the code for git was.
Much of this is probably a result of the 'branch'-centric (rather than
'repository'-centric) view of the world; similarly to the fact that
branches are referred to by location (local ../otherbranch, or remote
http/sftp/etc) rather than by a name. This is one of the bits of bzr
These two are either/or, not and; either they pull (in which case
their old mainline is no longer meaningful), or they merge (in which
In git, this is a non-issue because you don't get to CHOOSE which way
to work. You always (if you can) pull and obliterate your local
mainline. In bzr, it's only an 'issue' because you CAN choose, and
CAN maintain your local mainline. You CAN choose, right now, to do as
git does, pulling back and forth so that only new history shows up as
changes, by creating a 'bzr-pull' shell script that does a 'bzr pull || bzr merge'
(though you'd be a lot better off adding a '--fast-forward-if-you-can'
option to merge and aliasing that over).
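The "fast-forward if you can, otherwise merge" policy can be sketched on a toy DAG as follows. This is a hedged illustration in Python, not the behaviour of any real bzr or git command: the function names, the dict-based history and the merge id are all hypothetical.

```python
def is_ancestor(parents, maybe_ancestor, tip):
    """True if maybe_ancestor is reachable from tip via parent links."""
    seen = set()
    stack = [tip]
    while stack:
        rev = stack.pop()
        if rev == maybe_ancestor:
            return True
        if rev in seen:
            continue
        seen.add(rev)
        stack.extend(parents[rev])
    return False

def pull_or_merge(parents, local, other, merge_id):
    """Return the new local tip after bringing in other.

    If local is already contained in other, just move the tip (fast-forward).
    If other is already contained in local, there is nothing to do.
    Otherwise record a merge revision whose first parent is local.
    """
    if is_ancestor(parents, local, other):
        return other
    if is_ancestor(parents, other, local):
        return local
    parents[merge_id] = [local, other]
    return merge_id

# Hypothetical usage: M is the shared base, A and B are independent commits.
history = {"M": [], "A": ["M"], "B": ["M"]}
tip_after_ff = pull_or_merge(history, "M", "A", "X")      # fast-forward to A
tip_after_merge = pull_or_merge(history, "A", "B", "X")   # diverged: creates X
```

The design choice being argued over in the thread is only the first branch of that function: whether it is taken automatically or left to the user.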
More basically, though, I don't think that "histories become exactly
equivalent" is a necessary pass-word to enter the Hallowed City of
Truly Distributed Development. And I certainly see no reason to
believe we'll agree on it this time any more than We (in broad) have
the last 6 times it came up in the thread.
--
Matthew ...

Yes they do. They can (and in this case probably will) create a topic-branch named "the-other-dev/featureX" and keep it solely for tracking the other peer's changes, keeping their own topic-branch for their own changes, and another branch where they merge both changes in, or cherry-pick from each branch to get to the desired result fast.

This works easily because in git

a) branches are as cheap as I can ever imagine an SCM making them.
b) the "slice the DAG and view anything you like from any branch you like any time you like and mix them however you want" approach of the visualizers makes it trivial for a 10-year old fledgling programmer to see what changes what, and where, and by whom, and why.

The "b" above was a feature I didn't know I needed until it became available to me. Thanks to Paul Mackerras (spelling?) for creating the wonderful gitk tool, and to Marco Costalba for making a faster and, imo,

Git puts emphasis on code. Bazaar puts emphasis on developers and branch-structure. Depending on your preference, I imagine one suits some people better.

I really, really, really don't care if my branch-tip gets moved because I hadn't made any changes to it while the other dev hacked away or if it causes a merge because we had decided to work on different parts of the feature. Perhaps this is a result of the insanely good visualizers (kudos again to Paul and Marco) that easily let me see who did what when and where anyways. What I *do* care about is being able to easily make sure all the devs have the same code to work and

The only issue I have with bzr's revno's and truly distributed setup is that, by looking at the table, it seems to claim that you have found some miraculous way to make revnos work without a central server. Since everyone agrees that they don't, this should IMO be listed as mutually exclusive features.

On a side-note, git has made my life easier, so I childishly want to defend it and see it on top of every list in the world. ...
Haha, I feel the same way about bzr. Some of the features that bazaar has, such as how it preserves the leftmost parent and treats that specially in some cases, are things that I REALLY love and don't want to live without.

All in all, I feel that git and bazaar are both excellent products, what will happen in the future will be interesting to see.

/Erik
--
google talk/jabber. zindar@gmail.com
SIP-phones: sip:erik_bagfors@gizmoproject.com sip:17476714687@proxy01.sipphone.com
-
On Thu, Oct 26, 2006 at 12:13:39PM +0200 I heard the voice of Not where I was going with that section of the mail; I was looking at just the merge vs fast-forward distinction. In git, you don't get to choose; in bzr you do. -- Matthew Fuller (MF4839) | fullermd@over-yonder.net Systems/Network Administrator | http://www.over-yonder.net/~fullermd/ On the Internet, nobody can hear you scream. -
The "simple namespace" is both a URL and a revno. And therefore, it's just as distributed and decentralized as the web. There is very little difference between this:

    http://example.com/mywebpage#5

And this:

    http://example.com/mybranch 5

In fact, we've been planning to unify them into one identifier.

Aaron
-
Since a bzr branch is, and is ONLY, a pointer to a revision, I don't see any design decision that would make this harder in bzr. The UI was only

The more I read this thread I actually think bzr does support distributed topology as well as git. The whole difference is that bzr makes a distinction between the first and other parents of a revision, while git does not.

This distinction is done in two places:

1. The log shows the first parent and then, as an indented subsection, the ancestry of other parents until the point where the ancestries meet again. This actually captures a pattern people usually use. When you merge, you usually put in the log something along the lines: "merged X, which bars and fixes foo." when you actually merge M, which you consider a "mainline" and therefore not worth mentioning, and X. Linus does it this way too -- he actually posted a log message as an example, that showed exactly this.
2. Assigns revision aliases in this same order (except the "major" number for the subsection is based on the common ancestor, not on the merge point). They are not a special thing that is generated at commit time; they are inferred from the shape of the DAG (and cached for performance reasons).

And the only issue I think is, that the bzr UI and documentation pushes forward these aliases (revnos) more than appropriate for fully distributed case and hides the real revision names (revids) too much for

That's a deficiency of merge not telling that a merge is pointless. Actually I think that bzr merge *should* reduce to pull in all cases:

 - If the common ancestor is on the leftmost path of the other branch, then the existing revnos as seen on this branch will not change in any case, only more than one is added. I think it's safe for merge to reduce to pull in this case and consider it a bug in bzr that it does not.
 - If the common ancestor is not on the leftmost path on the other branch, then it is because the branch was ...
There are two things to do:

* Mark the tree as corresponding to a different revision in the past. This is roughly "echo 'revision@id-123' > .bzr/checkout/last-revision" in bzr. Obviously, writing the file is O(1), but to compute the revision identifier when you say "bzr switch -r 42" (I'm not sure switch accepts this BTW), you have to load the revision history. Indeed, bzr would load it anyway to make sure that the revision you switch to is in the revision history. In bzr, you have .bzr/branch/revision-history for each branch, which is a newline-separated list of revision-identifiers. In the case of bzr.dev, for example, this file is 112KB as of now. This is O(history), with "history" being the length of the path from HEAD to the initial commit, following the leftmost ancestor (i.e. number of revisions in a centralized workflow, and less than this otherwise). That said, the constant factor is very small. For example, on bzr.dev, I did "grep -n some-rev-id" (which does revid-to-revno), it takes 0.004 seconds (Vs 0.003 seconds to grep in /dev/null instead ;-) ), so you'd need many orders of magnitude before this becomes a limitation. Linus's point AIUI is that this will _never_ be a limitation of git.

* Then, do the "merge" to make your tree up to date. You can hardly do faster than git and its unpacked format, but this is at the cost of disk space. But as you say, in almost any modern VCS, that's O(diff). In a space-efficient format, that's just the tradeoff you make between full copies of a file and delta-compression.

--
Matthieu
-
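The revid-to-revno lookup mentioned above (the "grep -n" trick) amounts to a linear scan over the history list. A minimal Python sketch, assuming only what the mail states, namely that the history is a newline-separated list of revision identifiers, oldest first; the identifiers and function names are hypothetical and this is not bzr's code:

```python
# Hypothetical contents of a revision-history file: one revision id per line.
HISTORY = "rev-a\nrev-b\nrev-c\n"

def revid_to_revno(revision_history, revid):
    """Linear scan: cost grows with the length of the mainline history."""
    for revno, candidate in enumerate(revision_history.splitlines(), start=1):
        if candidate == revid:
            return revno
    return None  # the revision is not on this branch's mainline

def revno_to_revid(revision_history, revno):
    """Index into the list; the list still has to be loaded first."""
    lines = revision_history.splitlines()
    if 1 <= revno <= len(lines):
        return lines[revno - 1]
    return None
```

Both directions touch the whole list at least once, which is the O(history) cost being discussed; the argument in the thread is about whether the small constant factor makes that matter in practice.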
With this sort of setup, I would publish my branches in a directory
tree like this:
    /repo
        /branch1
        /branch2
I make "/repo" a Bazaar repository so that it stores the revision data
for all branches contained in the directory (the tree contents,
revision meta data, etc).
The "/repo/branch1" essentially just contains a list of mainline
revision IDs that identify the branch. This could probably just
store the head revision ID, but there are some optimisations that make
use of the linear history here.
If the ancestry of "/repo/branch2" is a subset of branch1 (as it might
be in the case of forked then merged projects), then all its
revision data will already be in the repository when branch1 was
imported. The only cost of keeping the branch around (and publishing
it) is the list of revision IDs in its mainline history.
For similar reasons, the cost of publishing 20 related Bazaar branches
on my web server is generally not 20 times the cost of publishing a
single branch.
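The storage argument can be illustrated with a toy model in Python: a shared store keyed by revision ID, and branches that only contribute the revisions the store does not already hold. The dictionaries and the `import_branch` helper are hypothetical and say nothing about Bazaar's actual on-disk format.

```python
def import_branch(store, branch_revisions):
    """Add a branch's revisions to the shared store; return how many were new."""
    new = 0
    for revid, data in branch_revisions.items():
        if revid not in store:
            store[revid] = data
            new += 1
    return new

# Hypothetical branches: branch2's ancestry is a subset of branch1's.
branch1 = {"r1": "tree-1", "r2": "tree-2", "r3": "tree-3", "r4": "tree-4"}
branch2 = {"r1": "tree-1", "r2": "tree-2", "r3": "tree-3"}

repo = {}
added_by_branch1 = import_branch(repo, branch1)
added_by_branch2 = import_branch(repo, branch2)
```

In this model the second branch adds no revision data at all; what remains per branch is only its own list of mainline revision IDs.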
I understand that you get similar benefits by a GIT repository with
With the repository structure mentioned above, the cost of publishing
multiple branches is quite low. If I continue to work on the project,
then there are no particular bandwidth or disk space reasons for me to
cut off access to my old branches.
For similar reasons, it doesn't cost me much to mirror other people's
If you need that level of stability then you want the revision
identifier in both the GIT and Bazaar cases.
As for simplicity, note that Bazaar doesn't extract any special
meaning from the "$email-$date-$random" format of the revision
identifiers. The only property it cares about is that they are
globally unique. For example, revision identifiers generated by the
Arch -> Bazaar importer have a different format and are handled the
That is correct. The revision numbers assigned to particular
revisions in the context of one branch won't necessarily be the same
I can't say anything ...

And here we have a feature which is as far as I see unique to git, namely to have persistent branches with _separate namespace_. It means that we can have hierarchical branch names (including names like "remotes/<remotename>/<branch of remote>", or "jc/diff"), and we don't have to guess where repository name ends and branch name begins.

The idea of "branches (and tags) as directories" was if I understand it correctly introduced by Subversion, and from what can be seen from troubles with git-svn (stemming from the fact that division between project name and branch name is the matter of _convention_) at least

You can get similar benefits by a GIT repository with shared object database using alternates mechanism. And that is usually preferred over storing unrelated branches, i.e. branches pointing to disconnected DAG (separate trees in BK terminology) of revision, if that is what you mean by multiple head revisions (because in GIT there is no notion of "mainline"

But the revision number in this case _changes_. It is from 7 to branch:7 but still it changes somewhat.

Emphasis on _potential_. SHA1 id abbreviated to 6 characters might be not unique in larger project, but for example the chance that SHA1 id abbreviated to 7 or 8 characters is not unique is really low.

Yet another analogy: SHA1 identifiers of commits (and not only commits) can be compared to Message-Ids of Usenet messages, while revision numbers can be compared to Xref number of Usenet message which if I understand correctly is unique only for given news server. But Message-Ids cannot be shortened meaningfully like SHA1 ids can; nevertheless they are used in communication without any problems. Even if namespace is not simple ;-)

--
Jakub Narebski
Poland
-
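The trade-off with abbreviated ids can be sketched in Python: a prefix is usable only while it matches exactly one known id, and the shortest safe length depends on which ids happen to exist. The helper names and the synthetic ids below are hypothetical; this is an illustration of the idea, not git's abbreviation logic.

```python
import hashlib

def resolve(ids, prefix):
    """Return the single id starting with prefix, or None if it is
    unknown or ambiguous."""
    matches = [i for i in ids if i.startswith(prefix)]
    if len(matches) == 1:
        return matches[0]
    return None

def shortest_unique_prefix_len(ids, minimum=4):
    """Smallest length n >= minimum at which all ids have distinct prefixes."""
    ids = set(ids)
    longest = max(len(i) for i in ids)
    for n in range(minimum, longest + 1):
        if len({i[:n] for i in ids}) == len(ids):
            return n
    return longest

# Synthetic SHA-1 ids standing in for the objects of a small project.
sample_ids = [hashlib.sha1(f"commit {n}".encode()).hexdigest() for n in range(1000)]
safe_len = shortest_unique_prefix_len(sample_ids)
```

A prefix of `safe_len` characters resolves every id in the sample, but only relative to that sample: adding more ids later can make a previously unique prefix ambiguous, which is the "potential" clash being discussed.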
With the above layout, I would just type:
bzr branch http://server/repo/branch1
This command behaves identically whether the repository data is in
/repo or in /repo/branch1. Someone pulling from the branch doesn't
have to care what the repository structure is. Having a separate
namespace for branch names only really makes sense if the user needs
to care about it.
As for hierarchical names, there is nothing stopping you from using
deeper directory structures with Bazaar too. Bazaar just checks each
I think you are a bit confused about how Bazaar works here. A Bazaar
repository is a store of trees and revision metadata. A Bazaar branch
is just a pointer to a head revision in the repository. As you can
probably guess, the data for the branch is a lot smaller than the data
for the repository.
You can store the repository and branch in the same directory to get a
standalone branch. The layout I described above has a repository in a
parent directory, shared by multiple branches.
If you are comparing Subversion and Bazaar, a Bazaar branch shares
more properties with a full Subversion repository rather than a
I may have got the git terminology wrong. I was trying to draw
parallels between the .git/refs/... files in a git repository and the
way multiple branches can be stored in a Bazaar repository.
I am not claiming that you'll get bandwidth or disk space benefits for
storing unrelated branches in a single Bazaar repository. But if the
branches are related, then there will be space savings (which is what
A revision number only has meaning in the context of a branch. If
I mirror a branch, the revision numbers in the context of each will
refer to the same revision IDs.
My point was that by shortening the IDs with GIT, you are trading
global uniqueness (i.e. the identifier may clash with one found in a
different context) for the convenience of shorter identifiers.
Provided you know that the tradeoff is being made, it isn't generally
much of a ...With Cogito (you can think of it either as alternate Git UI, or as SCM built on top of Git) you would use $ cg clone http://server/repo#branch for example $ cg clone git://git.kernel.org/pub/scm/git/git.git#next to clone _single_ branch (in bzr terminology, "heavy checkout" of branch). But you can also clone _whole_ repository, _all_ published branches with $ cg clone git://git.kernel.org/pub/scm/git/git.git With core Git it is the same, but we don't have the above shortcut for checking only one branch; branches to checkout are in separate arguments to git-clone. In bzr it seems that you cannot distinguish (at least not only from URL) where repository ends and branch begins. *Sidenote:* In current version of gitweb you can get file in given repository in given branch using the following notation: http://path/to/gitweb.cgi/repo/sitory/branch/name:file/name gitweb can detect where branch name ends and repository name begins; usually (by convention) "bare" git repositories uses <project>.git name, "clothed" git repositories uses <project>/.git Oh, that explained yet another difference between Bazaar-NG (and other SCM which uses similar model) and Git. In Git branch is just a pointer to head (top) commit (hence they are stored under .git/refs/heads/) in given line of development. Git also stores information (in .git/HEAD) about which branch we are currently on, which means on which branch git puts new commits. Nothing more (well, there can be log of changes to head in .git/logs/refs/heads/ but that is optional and purely local information). In Bazaar-NG you have to store (if I understand it correctly) mapping from revnos to revisions. By default (it means for example default behavior of git-clone, if we don't use --bare option) git repository is _embedded_ in working area. We have .git/ .git/HEAD ... .git/refs/heads/ ... <working area files, e.g.> So repo/branch wouldn't work, because 'branch' would conflict ...
Dear diary, on Fri, Oct 20, 2006 at 03:17:26PM CEST, I got a letter Nope, cg clone will in this case clone the master branch (or whatever the remote HEAD points at). cg clone -a is planned but not implemented You don't need to, you can switch your working tree between various branches. I think Linus said he does that (or was it Junio?), and I do that as well, as well as many others. A good question would be "when to create another branch and when to clone the repository". And I don't think there's any good answer, except "when you are comfortable with it". :-) Both approaches have pros/cons. -- Petr "Pasky" Baudis Stuff: http://pasky.or.cz/ #!/bin/perl -sp0777i<X+d*lMLa^*lN%0]dsXx++lMlN/dsM0<j]dsj $/=unpack('H*',$_);$_=`echo 16dio\U$k"SK$/SM$n\EsN0p[lN*1 lK[d2%Sa2/d0$^Ixp"|dc`;s/\W//g;$_=pack('H*',/((..)*)$/) -
That's probably because Cogito still uses obsolete branches/ $ git clone git://git.kernel.org/pub/scm/git/git.git clones _whole_ repository, all the branches and tags, and saves information I should have said: bring working area to state given by some revision (instead of "populate working area"). -- Jakub Narebski Poland -
My understanding of git is that this would be equivalent to the "bzr branch" command. A checkout (heavy or lightweight) has the property
I suppose that'd be useful if you want a copy of all the branches at
Two points:
(1) if we are publishing branches, we wouldn't include working trees -- they are not needed to pull or merge from such a branch.
(2) if we did have working trees, they'd be rooted at /repo/branch1 and /repo/branch2 -- not at /repo (since /repo is not a branch).
In case (2) there is a potential for conflicts if you nest branches, but people don't generally trigger this problem with the way they use
That is fairly similar to the default mode of operation with Bazaar: you have a repository, branch and working tree all rooted in the same directory. If you have separated working trees and branches, then
The layout of a standalone branch would be:
.bzr/repository/ -- storage of trees and metadata
.bzr/branch/ -- branch metadata (e.g. pointer to the head revision)
.bzr/checkout/ -- working tree book-keeping files
source code
If we use a shared repository, the contained branches would lack the .bzr/repository/ directory. The parent directory would instead have a .bzr/repository/, but usually wouldn't have .bzr/branch/ (unless there is a branch rooted at the base of the repository).
If we are publishing a branch to a web server, we'd skip the working tree, so the source code and .bzr/checkout/ directory would be missing.
In the case of a checkout, the .bzr/branch/ directory has a special format and acts as a pointer to the original branch. If the checkout is lightweight, the .bzr/repository/ directory would be missing, and
Okay. So using Bazaar terminology, this seems to be an issue of the working tree being associated with the repository rather than the branch?
Well, a branch can easily have multiple URLs even if there is only one copy of it. I might write to it via local file access or sftp (which would be a file: or sftp: ...
Not exactly (my mistake in explaining it). "cg clone git://host/repo@branch" clones only the part of the history DAG of commits reachable from the given branch. Still, it is a full repository. You can add branches to it later with
That is _very_ useful. And that is the default option for Git. For example, with the git.git repository I'm interested both in the 'master' branch (main line of development) and in the 'next' branch (development branch). For example, I send some patches based on 'master'; they get accepted, but in 'next' (to cook for a while, for example), and if I want to do further work in this direction I have to base my new work on the 'next' branch.
It looks like the Bazaar-NG "branches" are the equivalent of the one-branch clone of Git. And if there is no command to clone the whole repository, how do you do a public repository?
See below.
Same with Git. Public repositories are usually "bare" clones, i.e. without a working directory. We can clone/fetch from a "clothed" repo
There is no problem in Git with having a git repository nested within the working area: of course you had better ignore the .git directory; you can ignore files in this embedded repository or not.
A git repository (git clone, as it is the equivalent of bzr branch) has the following layout:
.git/objects/ -- repository objects database
.git/refs/ -- heads (branches) and tags
.git/index -- staging area for commit (adding files, merge resolving)
.git/HEAD -- which branch is the current branch
The equivalent of a shared repository would be having .git/objects/ be a symlink to some directory which would serve as a common area to store the object database. You can use the alternates file: .git/objects/info/alternates can have a list of absolute pathnames (one per line) where objects can be found instead. If I understand correctly, new objects get committed to the current repository's object database, therefore to have the equivalent of symlinking the .git/objects directory you would have for every repository which you want to share object database to have ...
Dear diary, on Sat, Oct 21, 2006 at 12:50:31AM CEST, I got a letter It's not exactly convenient, but you can do xpasky@machine[0:0]~/git$ GIT_ALTERNATE_OBJECT_DIRECTORIES=../cogito/.git/objects cg-diff -r `GIT_DIR=../cogito/.git cg-object-id -c HEAD`..HEAD I don't personally think it's worth a special UI, but there're no boundaries for initiative... :-) -- Petr "Pasky" Baudis Stuff: http://pasky.or.cz/ #!/bin/perl -sp0777i<X+d*lMLa^*lN%0]dsXx++lMlN/dsM0<j]dsj $/=unpack('H*',$_);$_=`echo 16dio\U$k"SK$/SM$n\EsN0p[lN*1 lK[d2%Sa2/d0$^Ixp"|dc`;s/\W//g;$_=pack('H*',/((..)*)$/) -
First, I want to point out that I think we're having a delightfully enlightening conversation here, and I'm glad for that. Let me provide a couple of hypothetical situations to try to demonstrate my thinking here. The first is far-fetched but perhaps easier to understand the implications. But the second is the real, everyday situation that is much more important. Far-fetched ----------- Let's imagine there's a complete fork in the bzr codebase tomorrow. We need not suppose any acrimony, just an amiable split as two subsets of the team start taking the code in different directions. Now, at the time of the fork, all published revision numbers apply equally well to either team's codebase, (obviously, since they are identical). But as the projects diverge they each start publishing revision numbers with respect to their own repositories in their own bug trackers, etc. Obviously, each project has its own "mainline" so these new revision numbers are only unique within each project and not between the two. Time passes... Finally the two teams (who had remained good friends after the breakup) find a unifying theory that will let them work on a single tool that will meet the needs of both user bases. So they want to merge their code together. After the merge, there can be only one mainline, so one team or the other will have to concede to give up the numbers they had generated and published during the fork. That is, the numbers will not be usable within the new, merged repository. Everyday -------- Now, the above scenario is just silly. It's not likely to ever happen, so it's really not worth considering as a motivating case. But, what does (and should) happen everyday is exactly the same. So here's a realistic situation that is worth considering: An individual takes the bzr codebase and starts working on it. It's experimental stuff, so it's not pushed back into the central repository yet. But our coder isn't a total recluse, so his friends help him with the code ...
Note that the id's are still permanent in this case; they will never (modulo some assumptions about the crypto) be reused. So a given id points at one and only one object, for all time; it's just that we may
So in this case you can certainly lose the launch codes. But you have forever granted everyone a way to determine whether a given guess at the launch codes is correct. (Again, assuming some stuff about SHA1).
--b.
-
In what sense? Yes, you can make a guess if you have stored the SHA1 that contained the launch codes. But the point is that that particular SHA1 is no longer part of the repository. Keeping that SHA1 is no easier than just keeping the launch codes in the first place. -Peff -
Well, I thought the discussion was about what meaning references have after branches were modified or removed. In which case the interesting situation is one where an object is gone but someone somewhere still holds a reference (because the SHA1 was mentioned in a bug report or an Could be. Anyway, the important difference between the SHA1 references and small integers is that there's no aliasing in the former case. Which is important--I'd rather have a reference to nothing than a reference to the wrong thing.... --b. -
Git tries very hard to make sure you don't have a reference to something that doesn't exist. But yes, you could have a reference to the SHA1 in another, non-git source, and try to guess the data from it. However, there's a bit of a two-step procedure, since the SHA1 will likely be of the commit. You have to guess the commit author, date, message, and the contents of the rest of the tree to make a correct guess. In practice I think most "launch code" scenarios are less about guessable confidentiality, and more about ceasing to publish things you shouldn't be (like copyright or patent encumbered code). -Peff -
bzr seems to use the classic UUID format, and it's funny how much it looks like a real BK ChangeSet revision number ("key"). Here's the quoted bzr "true" revision ID: Matthieu.Moy@imag.fr-20061017152029-4c5a2861bcf23b7d and here's a BK "ChangeSet Key": adi@zaphod.bitmover.com|ChangeSet|20031031183805|57296 (I don't have BK installed anywhere, so I had to google for changeset keys, and this was just some random key in the BK bugzilla ;) Looks very similar, don't they? And yes, the true revision ID is stable over time (at least it was in BK, and I assume it is in bzr too). The biggest difference seems to be that in bzr, the final checksum is 64-bit, while for BK, it was just a 16-bit checksum/unique number (the rest is just user-name/machine-name and date: I assume that the bzr commit was done at 10/17/2006 3:20:29PM, and the example BK ChangeSet was created 10/31/2003 6:38:50PM - it looks like _exactly_ the same date format). With BK, you can also use a "md5 key", and I don't actually know how they work. They may just be the md5 hash of the ChangeSet key, I think that may be how those things are indexed. So in bkcvs, you'll see a line like this: BKrev: 42516681VmgTWL0bkLcltPGiI6Yk5Q which is the BK md5 key for my last kernel revision in BK (2.6.12-rc2). Again, these numbers are stable, unlike the simple revisions. Note that from a usability standpoint, the UUID's look more readable to a human, but are actually much worse than the md5 keys (or the SHA1's that git uses). At least with a hash, the first few digits are likely to be unique, so you can do things like auto-completion (or just short names). With the email+date+random number kind of UUID, you don't have that. (Pure hashes obviously also tend to just all have the same length, and are easier to parse automatically, so from a programmatic standpoint they are a lot easier too - but the surprising thing is how they are actually easier on humans too, even if the UUID's look more ...
On Thu, Oct 19, 2006 at 08:25:26AM -0700 I heard the voice of Actually, as best I know, it's not a checksum, just random bits (a This I agree with, at least in part. -- Matthew Fuller (MF4839) | fullermd@over-yonder.net Systems/Network Administrator | http://www.over-yonder.net/~fullermd/ On the Internet, nobody can hear you scream. -
Ahh. They may be that even in BK. I know BK had various 16-bit CRC checksums, but they were probably on the actual _file_ contents, not in the key itself. Linus -
Btw, I do believe that bzr seems to be acting a lot like BK, at least when it comes to versioning. I suspect that is not entirely random either, and I suspect it's been a conscious effort to some degree. Which is fine, in the sense that there are certainly much worse things to try to copy. That said, at least BK was up-front about the versions changing, and didn't try to do anything to hinder it. It still confused some people, and it wasn't a great naming system, but it did work. In the big picture, the version naming between BK and git hasn't been an issue for anybody in practice, I suspect. So if you want to look at features that actually matter more, try out something like gitk drivers/scsi include/scsi on the kernel archive (I assume that somebody has tried importing the kernel git tree into bzr - quite frankly, if bzr cannot handle that size tree without problems, you have much bigger issues!). In other words, being able to look at history of more than a single file has been a _huge_ bonus. The other big difference is being able to do merges in seconds. The biggest cost of doing a big merge these days seems to literally be generating the diffstat of the changes at the end (which is purely a UI issue, but one that I find so important that I'll happily take the extra few seconds for that, even if it sometimes effectively doubles the overhead). Looking at the dates of the merges yesterday, they're literally half a minute apart, and that's not me _scripting_ them - that's me actually looking up the emails, typing in the "git pull " and pasting the source repository, and git fetching the data over the network and merging it, and checking out the result (and me verifying that the resulting diffstat matches what the email says). Doing four of those in a row in less than two minutes is actually a really big deal. At some point, "performance" is just more than a question of how fast things are, it becomes a big part of ...
Out of curiosity, how would you compare git and BitKeeper, on a purely technical basis? (not asking for a detailed comparison, but an "X is globally/much/terribly/not better than Y" kind of statement ;-) ) -- Matthieu -
I think git is better for kernel work these days, but a large portion of that is that a lot of the features have literally been tweaked for us (for very obvious reasons). For example, the whole "rebase" thing (or explicitly making cherry-picking easy) is something that a number of kernel people do, and even if I have to admit to not liking the practice very much (it kind of hides the "true" development history), it does have huge advantages, and it makes history a lot easier to read. Similarly, I often used the single-file graphical history viewing in BK ("revtool"), but being able to follow the history of multiple files as one "entity" really is something that once you get used to, it's really really hard going back, and "gitk" does generate a much more readable graph. And I think the git way of doing branches is just simply superior. Git always did branches in the sense that the way merges happened you _always_ had several heads, but actually making them available and switching between them was something that wasn't my idea, and that I even was a bit apprehensive about. I was wrong. Git branches are branches done right. I just don't see how you _could_ do them better. That said, a lot of the features I like and _I_ consider really important are possibly not that important to others. For example, maybe nobody else really cares about viewing the history of a particular subsystem, the way I do. For a lot of people, single-file is probably ok. For example, while git now does "annotate" (or "blame"), it's not lightning fast, and I simply don't care. Doing a git blame kernel/sched.c takes about three seconds for me, and that's on a pretty good machine (and on the kernel tree, which for me is always in the cache ;). Quite frankly, if I cared deeply about that kind of annotation, I'd probably be upset about it. There are basically _no_ other git operations that take that long. I can get the _full_ log of the last 18 months of the kernel much ...
ll.6041-6091 of that file is blamed to arch/ia64/kernel/domain.c by pickaxe -C (attributed to commit 2.6.12-rc2) while blame says they are brought in by commit 9c1cfa, which says "Move the ia64 domain setup code to the generic code". I am slowly realizing that comparing the output from blame and pickaxe might be a good way to study the project history. -
Having used both in a past job setting (simultaneously even), BitKeeper was a huge win over CVS, but after a while, some of its tools were just very frustrating in comparison with comparable Git interfaces, and I had actually written a terribly slow BK -> Git converter just so I could incrementally import our BK tree, then use Git's history-viewing because it was so much more pleasant to work with. For small projects (~5 people), they weren't hugely different, but Git just felt more comfortable after a while. (It was actually possible to do a commit from the command line in a single command, without getting annoyed by the interface, for a trivial example.) -
An interesting effect on this is when people have a column for merge performance in a SCM comparison table, they would include time to run the diffstat as part of the time spent for merging when they fill in the number for git, but not for any other SCM. I know you won't misunderstand me but for the sake of others, I should add this: I am not saying diffstat should be optional. -
The point here is, that because of using the bot, the revnos on bzr.dev
are indeed stable (and many of the merges are in fact pointless merges
(i.e. merges of a revision and its ancestor)). But if you don't use the
bot, then doing:
bzr merge mainline
bzr push mainline
makes your revision the leftmost parent, not the one
from "mainline". The fact that bzr treats the leftmost parent somewhat
specially makes people replace the above with
bzr branch mainline
cd mainline
bzr merge feature-branch
bzr push
which is, well, more complicated (but you see it's not about main
maintainer -- anybody with write access can push).
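To make the leftmost-parent point concrete, here is a minimal sketch in Python. It is a toy model, not bzr's implementation: it assumes a revision's number is simply its position along the chain of leftmost parents, and the names `Rev`, `mainline` and `revnos` are made up for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Rev:
    """Toy revision: an id plus an ordered tuple of parents (leftmost first)."""
    rev_id: str
    parents: tuple = ()

def mainline(tip):
    """Follow leftmost parents from the tip back to the root, oldest first."""
    chain = []
    rev = tip
    while rev is not None:
        chain.append(rev)
        rev = rev.parents[0] if rev.parents else None
    return list(reversed(chain))

def revnos(tip):
    """Map rev_id -> revno for the revisions on the leftmost-parent chain."""
    return {rev.rev_id: n for n, rev in enumerate(mainline(tip), start=1)}

# Shared history m1 <- m2; then mainline adds m3 while I add f1 on top of m2.
m1 = Rev("m1")
m2 = Rev("m2", (m1,))
m3 = Rev("m3", (m2,))
f1 = Rev("f1", (m2,))

# "bzr merge mainline" in my branch: my revision f1 is the leftmost parent.
merged_in_feature = Rev("merge", (f1, m3))
# "bzr merge feature-branch" in a copy of mainline: m3 is the leftmost parent.
merged_in_mainline = Rev("merge", (m3, f1))

print(revnos(merged_in_feature))   # m3 is no longer on the numbered chain
print(revnos(merged_in_mainline))  # m3 keeps its number
```

In this model, pushing the first merge over mainline takes m3 off the numbered chain, while the second sequence of commands leaves every existing mainline number alone.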
--------------------------------------------------------------------------------
- Jan Hudec `Bulb' <bulb@ucw.cz>
-
I'd like to point out that the same thing has happened in bzr-land. Back in the "pre-bot" days, only Martin put things in "his branch", where most people got bzr from (same as Linus' git branch), but he was away for a few weeks and during this time there were 3 (or perhaps 4) other branches, called integration branches, that were being used. They were all maintained by different people. Everyone learned really quickly to use them instead of Martin's branch. When Martin came back, he just pulled/merged these branches and everything was back to normal. I'd say in this case, bzr was even more "without a trunk" than in the example Linus gives above.
What seems to be one interesting thing in this discussion is that, because people use bzr and git in slightly different ways, they think that one or the other cannot be used in another way. bzr's use of revision numbers doesn't mean it hasn't got unique revision identifiers, and I can't see any reason why it couldn't be used in the same way as git.
Both are excellent tools, and since git is more specialized (built to support the exact workflow used in kernel development), it's more suited for that exact use. bzr tries to take a broader view; for example, it does support a centralized workflow if you want one. Most people don't, but a few might. Because of this, it probably fits kernel development less well than git. That's fine, I think! It happens to fit my workflow better than git does :)
Regards,
Erik
-
Close to 200 posts on the bzr-git war! Is this the right place (the git mailing list) to discuss future features of bzr? -- Christian -
Perhaps not, but the tone is friendly (mostly), the patience of the bazaar people seems infinite and lots of people seem to be having fun while at the same time learning a thing or two about a different SCM. Best case scenario, both git and bazaar come out of the discussion as better tools. If there would never be any cross-pollination, git wouldn't have half the features it has today. -- Andreas Ericsson andreas.ericsson@op5.se OP5 AB www.op5.se Tel: +46 8-230225 Fax: +46 8-230231 -
I second this. I'm a bzr user and occasional developer, and I learnt a lot about git in the discussion. I hope I could also explain some of the features of bzr well to some git guys; it's always interesting to understand why other people do things in a different way, or why they do them in the same way. -- Matthieu -
Thanks everyone for taking time to explain details. However, I don't use SCM for code development. I use it for collaborative documentation, whiteboarding and tracking configurations. In fact, in my company no one uses SCM for code development. Everyone here uses it for collaborative documentation and whiteboarding. Only I use SCM for tracking configurations.
I think of SCMs in terms of an SCM core and SCM tools. First I want to say every SCM I know of sucks when it comes to tracking configurations, simply because they don't record or restore file metadata, like perms, ownership, and ACLs. I don't see recording or restoring file metadata as part of the SCM core. I do however feel an SCM core needs to have provisions for extended file inventory information. The problem with extended file inventory information is that it is fs specific. For this reason I feel it is essential that the SCM core allow multiple sets of extended file inventory information. The SCM tools are responsible, based on the local config, for recording metadata and creating extended file inventory, translating file metadata of one file system.
When tracking configurations, octopus merges are surprisingly common. If a configuration change is not signed off by a responsible person, it cannot be accepted. Doing otherwise is simply an invitation to attackers and makes troubleshooting far too difficult. Also, configuration files in one directory will most often not be members of the same repo. For example, each file in the etc directory would be a member of a different repo according to its associated application/pkg.
Some things I would like the SCM tools to handle. Personally I would like the SCM tools to be platform independent. This would ensure that the correct things happen on ext3 mounted on Windows. I don't think the execute bit belongs in the basic file inventory information. Instead I would like to see it replaced by a filter in the extended file inventory indicating what file metadata if any should be recorded or ...
Arch supports that kind of metadata. I believe SVN supports recording arbitrary file properties, so it's just
Our choices have been predicated on producing the best SCM we can for the purpose of developing software. We find that the execute bit is very useful for build scripts and other incidental scripts. The other attributes didn't seem useful for software development, so
An XML diff/patch or merge will not handle ODF properly. There's too
The bzr "webserve" plugin provides rss feeds.
Aaron
-
Yes, svn has arbitrary properties which can be manipulated. They are not really intended for permissions, ownership, and ACLs. Using the svn properties for this requires adding SCM tools. Also svn does not allow files in the same directory to live in
I have only experimented with XML diffs on ODF files. From my experience XML diffs work fine on SVG files. For more information, please refer to
Yes, multiple merge sources are handy for collaborative document editing
-
Agreed. I think it's okay to require extra work to set the scm up to
It would surprise me if many SCMs that support atomic commit also
That's something I'd like for software development, too.
Aaron
-
In fact I think svk would. You would have to switch them by setting
an environment variable, but it's probably doable. That is because
unlike other version control systems, it does not store the information
about checkout in the checkout, but in the central directory and that
can be set. I don't know git well enough to tell whether git could do
the same by setting GIT_DIR.
--------------------------------------------------------------------------------
- Jan Hudec `Bulb' <bulb@ucw.cz>
-
That's not a simple matter.
Tracking ownership hardly makes sense as soon as you have two
developers on the same project. What does it mean to checkout a file
belonging to user foo and group bar on a system not having such user
and group?
Just restoring the complete user/group/other rwx permission is already
a mess. In my experience (GNU Arch did this):
1) It sucks ;-). I work with umask 022 so that my colleagues can
   "cp -r" from me; working on a project with people having umask 077,
   I got some files readable and some not: a mess. *I* have set
   my umask, and *I* want my tools to obey.
2) It's a security hole. If you work with people having umask=002 (not
   indecent if your default group contains just you), you end up with
   group-writable files in your ${HOME}.
That said, it can be interesting to have it, but disabled by default.
The 'x' bit, OTOH, is definitely useful.
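The umask arithmetic behind both points can be worked out directly. This is a small sketch in Python, assuming files are created with the usual requested mode 0666; the function names are made up for illustration and do not correspond to anything in GNU Arch or bzr.

```python
import stat

def created_mode(umask, requested=0o666):
    """Mode a new file gets when the creating process obeys `umask`."""
    return requested & ~umask

def show(mode):
    """Render permission bits in the familiar 'rw-r--r--' form."""
    return stat.filemode(stat.S_IFREG | mode)[1:]

mine = created_mode(0o022)        # my umask: colleagues can read my files
private = created_mode(0o077)     # a collaborator with umask 077
group_user = created_mode(0o002)  # a collaborator with umask 002

print(show(mine))        # rw-r--r--
print(show(private))     # rw-------
print(show(group_user))  # rw-rw-r--
```

A tool that restores the committer's recorded mode verbatim hands me the second or third result regardless of my own umask: either files my colleagues cannot read, or files writable by everyone in whatever group owns them on my machine.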
--
Matthieu
-
I fully agree with Andreas: I am just a bzr user (not even a bzr developer) and when looking for a decentralized VCS I also looked at git and a few others. I think I am learning quite a bit about bzr, git, and VCS in general. -- Ramon Diaz-Uriarte Statistical Computing Team Structural Biology and Biocomputing Programme Spanish National Cancer Centre (CNIO) http://ligarto.org/rdiaz -
Dear diary, on Thu, Oct 19, 2006 at 09:02:16AM CEST, I got a letter There is perhaps no "technical" reason, but it's also what the user interface is designed around - most probably, using UUIDs instead of revnos would be a lot less convenient for bzr people because you probably primarily show revnos everywhere and UUIDs only in few special places and/or when asked specifically through a command (correct me if I'm wrong). Also, do you support "UUID autocompletion" so that you can I think they are in fact just as flexible (+-epsilon). Git can support centralized workflow as well - you have some central repository somewhere and all the developers clone it, then pull from it and push to it in basically the same way they would use CVS. And it is perhaps currently even more used in practice than the "single-man" workflow nowadays, as more project are using Git. -- Petr "Pasky" Baudis Stuff: http://pasky.or.cz/ #!/bin/perl -sp0777i<X+d*lMLa^*lN%0]dsXx++lMlN/dsM0<j]dsj $/=unpack('H*',$_);$_=`echo 16dio\U$k"SK$/SM$n\EsN0p[lN*1 lK[d2%Sa2/d0$^Ixp"|dc`;s/\W//g;$_=pack('H*',/((..)*)$/) -
[ trim back CC a bit ]
On Thu, Oct 19, 2006 at 01:37:31PM +0200 I heard the voice of
The primary place you'd see either is in 'log'. To show the UUID, you'd add a "--show-ids" arg to it (and via per-user config aliasing, you could just alias 'log' to 'log --show-ids' if you always wanted to see them, so you wouldn't have to type it). The output looks something like:
revno: 1
revision-id: fullermd@over-yonder.net-20061019151437-5b99dff6ed1d76cd
committer: Matthew Fuller <fullermd@over-yonder.net>
branch nick: a
timestamp: Thu 2006-10-19 10:14:37 -0500
message:
  Foo
(without --show-ids, it's the same, except not showing the
With the form of bzr UUID's, that's not particularly useful, since you're probably into the minutes/seconds of the timestamp before it becomes unique, at which point you're close to 2/3 of the way through the whole string.
--
Matthew Fuller (MF4839) | fullermd@over-yonder.net
Systems/Network Administrator | http://www.over-yonder.net/~fullermd/
On the Internet, nobody can hear you scream.
-
So? It makes no sense to me to cater only to "successful projects"... most
Yes, but what matters here is the principle... if branches aren't equal, it makes some things unnecessarily hard (i.e., forking, passing maintainership over, ...). Sure, they aren't activities that should be actively
"Very rare" != "never". The "very rare" cases /will/ come back to bite you, once you grow accustomed to "hasn't ever happened"
What makes a "published repository" special, as opposed to my local
Are they different among repositories, even though they came from another
OK.
--
Dr. Horst H. von Brand User #22616 counter.li.org
Departamento de Informatica Fono: +56 32 2654431
Universidad Tecnica Federico Santa Maria +56 32 2654239
Casilla 110-V, Valparaiso, Chile Fax: +56 32 2797513
-
Funny. I actually read another post from Linus, and when I "merge" with your post (understand: bisect), the following comes out:
- git is the fastest scm around
- git has the smallest scm footprint
- git is also aimed at small(ish) projects
My personal proof of concept on the last point is that I'm an IC design engineer who threw away other scm in favor of git since git-1.4.2 and regret now the years wasted on _other_ scm. But your mileage may vary.
-- Christian
-
In the Git world that happens via "git tag -s", i.e, a cryptographically strong "signoff". (There's also the secondary convention of appending Signed-off-by: to email-applied patches, but that's something that would translate effectively to any other system, since it's outside the SCM.) -
For non-git people (and maybe even git people who didn't follow some of
the "reflog" work):
- git does actually have "local view" support, but it is very much
_defined_ to be local. It does not pollute any history as seen by
anybody else. It's called "reflog" (where "ref" is just the git name
for any reference into a tree, and the "log" part is hopefully obvious)
So each git repository can have (if you enable it) a full log of all the
changes to each branch. But it's not in the core git datastructures that
get replicated - because the local view of how the branches have changed
really _is_ just a local view. It's just a local log to each repository
(actually, one per branch).
It's what allows a git person to say
git diff "master@{5.hours.ago}"
because while "5 hours ago" is _not_ well-defined in a distributed
environment (five hours ago for _whom_?) it's perfectly well-defined in a
purely _local_ sense of one particular branch.
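As an illustration of why a time-based lookup only makes sense locally, here is a toy per-branch log in Python. It is a sketch of the idea, not git's on-disk reflog format, and `RefLog`, `record` and `at` are hypothetical names.

```python
from datetime import datetime, timedelta

class RefLog:
    """Toy per-branch log of (timestamp, commit id) entries, local to one repository."""

    def __init__(self):
        self.entries = []  # kept in the order the branch tip changed

    def record(self, when, commit_id):
        """Note that the branch tip became `commit_id` at local time `when`."""
        self.entries.append((when, commit_id))

    def at(self, when):
        """Return the commit the branch pointed at, in this repository, at time `when`."""
        current = None
        for stamp, commit_id in self.entries:
            if stamp > when:
                break
            current = commit_id
        return current

now = datetime(2006, 10, 20, 18, 0)
master = RefLog()
master.record(now - timedelta(hours=9), "aaa111")
master.record(now - timedelta(hours=6), "bbb222")
master.record(now - timedelta(hours=1), "ccc333")

# Rough analogue of "master@{5.hours.ago}": where was the tip five hours ago, here?
print(master.at(now - timedelta(hours=5)))  # bbb222
```

Two clones of the same project would record different timestamps for the same commits, so the answer to the query belongs to the repository holding the log, not to the shared history.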
So there's no need for a fakey "merge" that isn't a real merge and that
doesn't make sense for anybody else because it doesn't actually add any
real knowledge about the _history_ of the tree (only about a single
repository). If you want to see how the history of a particular repository
has evolved, you can just look at the reflog (although admittedly, common
tools like "gitk" don't even show it - the data is there if they would
want to, but the most common usage is the above kind of "show me what
happened in the last five hours in my current branch").
Linus
-
How does that work with branching points, and with merges? For example,
in the case depicted below, how do you refer to the commit marked by X?
---- time --->
--*--*--*--*--*--*--*--*--*-- <branch>
\ /
\-*--X--*--/
The branch it used to be on is gone...
Besides, in git a commit object has pointers (in the form of sha1 ids)
to all its parents. So <ref>^ (parent of <ref>), or <ref>^<m> (m-th
parent of <ref>), or <ref>~<n> (n-th ancestor in the first-parent lineage
of <ref>) are natural, and fast. <ref>+<n> (which would add yet another
character forbidden in branch names) would need either a serial number
(per repository or per branch) to commit id database, or getting the full
history and looking it up there.
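A minimal sketch of why the <ref>^<m> and <ref>~<n> forms are cheap: each step is just a lookup in the parent list stored in the commit. The graph, the ref table and the function name `resolve` below are hypothetical, and the parser handles only the forms named above, not git's full revision syntax.

```python
import re

# Toy commit graph: commit id -> ordered list of parent ids (first parent first).
PARENTS = {
    "c1": [],
    "c2": ["c1"],
    "s1": ["c1"],        # side branch forked from c1
    "s2": ["s1"],
    "m": ["c2", "s2"],   # merge commit: first parent c2, second parent s2
}
REFS = {"master": "m"}


def resolve(spec):
    """Resolve '<ref>', '<ref>^', '<ref>^<m>' and '<ref>~<n>' chains to a commit id."""
    match = re.match(r"[^~^]+", spec)
    name = match.group(0)
    commit = REFS.get(name, name)
    for op, num in re.findall(r"([~^])(\d*)", spec[match.end():]):
        n = int(num) if num else 1
        if op == "^":
            if n > 0:                      # '^0' means the commit itself
                commit = PARENTS[commit][n - 1]
        else:
            for _ in range(n):             # n steps along first parents
                commit = PARENTS[commit][0]
    return commit


side_commit = resolve("master^2~1")  # second parent of the merge, then one step back
```

In this toy graph a commit that lived on a merged side branch stays reachable from the surviving branch name, even though the side branch itself no longer has a name.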
Branches in git are remembered not by their starting points, but by
Git could do that too, by having a file (or files) with a serial number
or branch/tag+serial number to commit id mapping. But this would
have to be a local matter. And this would take some disk space, and
would seriously affect fetch performance (now git just downloads
what it doesn't have and dumps it into the repository database).
BTW, what if the repository is moved from one URL to another, for example
by moving to a different host? All "abstracted away" identifiers get
Two words: post-commit hook. You can automate action of adding tags
(especially now with packed refs, which means that we can have huge number
That is the alternate solution, but this would mean that the merge would be
recorded (unless you squash it). And for published branches (like 'next',
for example) it is the better solution, because rebase is in fact rewriting
history.
But rebase means that you had
          A---B---C topic
         /
    D---E---F---G master
Rebasing the 'topic' branch on top of master would mean that you would get
                  A'--B'--C' topic
                 /
    D---E---F---G master
where A', B', C' represent the same changeset as A, ...
-
In bzr 0.12 this is: 2.1.2 (assuming the first * is numbered '1'). These numbers are fairly stable; in particular, everything's number in the mainline will be the same number in all the branches created from it at that point in time, but a branch that initially creates a revision or obtains it before the mainline will have a different number until they synchronise with the mainline via pull. -Rob -- GPG key available at: <http://www.robertcollins.net/keys.txt>.
So basically anyone can pull/push from/to each other but only so long as they decide upon a common master that handles synchronizing of the number part of the url+number revision short-hands? One thing that's been nagging me is how you actually find out the url+number where the desired revision exists. That is, after you've synced with master, or merged the mothership's master-branch into one of your experimental branches where you've done some work that went before mothership's master's current tip, do you have to have access to the mothership's repo (as in, do you have to be online) to find out the number part of url+number shorthand, or can you determine it solely from what you have on your laptop? -- Andreas Ericsson andreas.ericsson@op5.se OP5 AB www.op5.se Tel: +46 8-230225 Fax: +46 8-230231 -
I can't say for bzr 0.>12 which do not exist ;-) For previous versions, it didn't have that "simple" number, and you had to use the rev-id. -- Matthieu -
Anyone can push and pull from each other - full stop. Whenever they 'pull' in bzr terms, they get fast-forward happening (if I understand the git fast-forward behaviour correctly). After a fast-forward, the dotted decimal revision numbers in the two branches are identical - and they remain immutable until another fast forward occurs. Push always fast forwards, so the public copy of one's own repository that others pull or merge from is identical to your own. In a 'collection of branches with no mainline' scenario, people usually have fast forward occur from time to time, keeping the numbers consistent from the point
You can determine it locally - if you know any of the mothership's revisions locally, we can generate the dotted-revnos that the mothership's master-branch would have from the local data - and the last merge of mothership you did will have given you those details. I don't think we have a ui command to spit this out just yet, but it will be trivial to whip one up. More commonly though, like git users have 'origin' and 'master' branches, bzr users tend to have a branch that is the 'origin' (for bzr itself this is usually called bzr.dev), as well as N other branches for their own work, which is probably why we haven't seen the need to have a ui command to spit out the revnos for an arbitrary branch. -Rob -- GPG key available at: <http://www.robertcollins.net/keys.txt>.
This is where it breaks down for me. "until another fast forward occurs" To me, this means bazaar isn't distributed at all and I could achieve much the same distributedness(?) by rsyncing an SVN repo, working against that and then rsyncing it back with some fancy merging. In other words, bazaar requires there to be one Lord of the Code, or some of the key features break down. -- Andreas Ericsson andreas.ericsson@op5.se OP5 AB www.op5.se Tel: +46 8-230225 Fax: +46 8-230231 -
Dear diary, on Wed, Oct 18, 2006 at 10:53:16AM CEST, I got a letter Well as far as I understand, the Lord of the Code is whoever you pulled from the last time. It's just a different focus here. If I understood everything in this thread correctly, both Git and Bazaar have persistent (SHA1, UUID) and volatile (revspec, revision number) revision ids. The only difference is that Git primarily presents the user with the SHA1 ids while Bazaar primarily presents the user with a revision number (and that revspecs change after every commit while revision numbers change only after a merge). -- Petr "Pasky" Baudis Stuff: http://pasky.or.cz/ #!/bin/perl -sp0777i<X+d*lMLa^*lN%0]dsXx++lMlN/dsM0<j]dsj $/=unpack('H*',$_);$_=`echo 16dio\U$k"SK$/SM$n\EsN0p[lN*1 lK[d2%Sa2/d0$^Ixp"|dc`;s/\W//g;$_=pack('H*',/((..)*)$/) -
You mis-understand. git doesn't have a "ui command to spit out the revnos for an arbitrary branch" either. Normally, you'd just use the branch-name. Nobody ever uses the SHA1's directly. What git does (and does very well) is to be _scriptable_. It was designed that way. I'm a UNIX guy. I think piping is very powerful. And when you script things, your scripts pass SHA1's around internally. So for example, to repack a git archive, you'd normally do git repack -a -d and you don't have any "UI" with SHA1 numbers. But internally, this used to be git-rev-list --all --objects | git-pack-objects where "git-rev-list" is the one that lists all object names (which are the SHA1 numbers), and "git-pack-objects" is the one that takes a list of objects and packs them. (These days, since our internal C libraries have become so much better, the object traversal is done internally to packing, so we don't actually use the pipe any more for repacking an archive, but that's just an implementation detail) You seem to think that we use SHA1 names as _humans_. We don't. The SHA1 names are used internally, and humans just use the branch names. The only case you'd (as a human) use the SHA1 name is when you want to pass it on to another person that may have a different archive (ie you mail somebody a revision that is problematic). It would obviously be totally unworkable to say "it's the grand-parent of my current HEAD commit", since that's a local description. So instead, you'd say "it's commit 9550e59c4587f637d9aa34689e32eea460e6f50c". So I think people (totally incorrectly) think that git users use a lot of SHA1 names, just because they see the git users on the kernel mailing list sending each others SHA1 names. But that's because you see only the case where you _want_ to communicate a stable revision name to another side. Sending a number like 1.57.8.312 to describe what commit broke would be a _bug_, because a person who has a differently ...
With the exception of sometimes having commit-ids in the commit messages, for example "Fixes bug introduced by aabbcc00" (although usually you just write "Fixes bug in some_function in some_file"), and the automatically generated "This reverts d119e3de13ea1493107bd57381d0ce9c9dd90976 commit." (in addition to 'Revert "<Commit title>"') for git-revert generated commit messages. And it is true that you usually use branchname, or branchname~n syntax. Git even has git-name-rev to convert from sha1 to temporary, local ref^m~n... syntax. By the way, git has a very powerful syntax to get revisions, and revision lists. For example "git-rev-list foo bar ^baz" means "list all the commits which are included in foo and bar lineage, but not in baz", or more useful "git log origin..next". How's that in bzr? -- Jakub Narebski Poland -
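The "foo bar ^baz" selection described above is set arithmetic on reachability, which a short sketch can make concrete. The graph and the helper names `ancestors` and `rev_list` are hypothetical, and real git returns an ordered list rather than a set.

```python
def ancestors(parents, tip):
    """All commits reachable from `tip`, including `tip` itself."""
    seen, stack = set(), [tip]
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(parents[commit])
    return seen


def rev_list(parents, include, exclude):
    """Commits reachable from any `include` tip but from no `exclude` tip."""
    wanted = set()
    for tip in include:
        wanted |= ancestors(parents, tip)
    for tip in exclude:
        wanted -= ancestors(parents, tip)
    return wanted


# Toy history: a - b - c (origin), with d - e (next) forked from b.
parents = {"a": [], "b": ["a"], "c": ["b"], "d": ["b"], "e": ["d"]}

# Equivalent in spirit to "origin..next": what 'next' has that 'origin' lacks.
only_in_next = rev_list(parents, include=["e"], exclude=["c"])
```

In this model "origin..next" is just shorthand for including `next` and excluding `origin`.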
Yes. But in both cases, that's usually because you literally ended up having the commit name because somebody else (which _can_ be you) searched for it (with something like "bisect") and gave it to you. So even that case is really about communicating a stable name from one place (the "find the bug") to another (the "revert the buggy commit"). So yes, _communication_ should always happen by full SHA1's, because those are the only thing that always remain stable. (The fact that "gitk" and I think "gitweb" can then turn them into hyperlinks in the commit message is obviously one reason we then tend to give them such prominent visibility - they actually end up being very useful later on). In bzr, either you don't get the hyperlinks, or you need to use the non-simple name in the commit messages, since the simple names don't actually work. Either way, it's an inferior setup. Linus -
And here, by "fairly stable", you really mean "totally idiotic", don't you? Guys, let's be blunt here, and just say you're wrong. The fact is, I've used a system that uses the same naming bzr does, and I've used it likely longer and with a bigger project than anybody has likely _ever_ used bzr for. It sounds like bzr is doing _exactly_ what bitkeeper did. Those "simple" numbers are totally idiotic. And when I say "totally idiotic", please go back up a few sentences, and read those again. I know what I'm talking about. I know probably better than anybody in the bzr camp. Those "simple" numbers are anything but. They may be short, most of the time, but when you bandy things like "-r 56" around, what you're ignoring is that for a _real_ project you actually get numbers like "1.517.3.57", which isn't really any simpler or shorter than saying "7786ce19". You still want to cut-and-paste it. And the "simple" numbers have a real downside, which is that THEY CHANGE. What happens is that somebody else started _another_ branch at revision 2, and did important work, and and they also had a "2.1.2" revision, and then they merged your work, and you merged their merge back, that "simple" revision number changed, didn't it? Suddenly "2.1.2" means something different for one of the users. We had people in the bitkeeper world that _never_ actually understood that the numbers changed. The "simple" numbers were stable enough that a lot of people thought they were real revisions, and then they were really _really_ confused when a number like "1.517.3.57" suddenly went away after a merge, and became something else instead. And yes, bitkeeper had a "real key" internally too. If you actually wanted to give a real revision, you had to give something that looked a lot like what the bzr internal revision numbers look like. Of course, most users didn't even _know_ or understand those revision numbers, so as a result, you had tons of people who used the ...
Be as blunt as you want. You're expressing an opinion, and that's fine. I happen to think that we're right: users appear to really appreciate this bit of the UI, and I've not yet seen any evidence of confusion about it - though I will admit there is the possibility of that occurring. I think it's completely ok that git and bzr have made different choices in this regard, but I *don't* think our choice is in any regard 'totally idiotic'. [snip examples that are clearly predicated on how bk worked, not on how bzr works]. -Rob -- GPG key available at: <http://www.robertcollins.net/keys.txt>.
On Wed, 18 Oct 2006 08:27:58 +1000 Yeah, but it's an opinion that is based on a huge real world project with hundreds of developers. If Bazaar is ever used in a project of that size it may just see the same type of issues as Bk. As has been mentioned elsewhere, Git users really appreciate the short forms it provides for referencing commits, so much so that there is no reason to invent a new (unstable) numbering system or attempt to hide the true underlying commit identities. Just out of curiosity is there a Bazaar repo of the Linux kernel available somewhere? Sean -
Yup. The new command will also automagically appear in the "git help -a" output. Those two functions have been available since the C wrapper was born, although "git help -a" was the only available output for "command not found" until someone introduced the more newbie-friendly list that pops up nowadays. -- Andreas Ericsson andreas.ericsson@op5.se OP5 AB www.op5.se Tel: +46 8-230225 Fax: +46 8-230231 -
Precisely how does this rebase operate in git? Does it preserve revision ids for the existing work, or do they all change? bzr has a graft plugin which walks one branch applying all its changes to another preserving the users metadata but changing the uuids for revisions. -Rob -- GPG key available at: <http://www.robertcollins.net/keys.txt>.
On Tue, 17 Oct 2006 19:37:45 +1000 git rebase does exactly the same as you describe, including changing the sha1 for each commit it moves. Sean -
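Why a rebase has to change every id it moves can be sketched with a toy content-addressed scheme. This is a deliberate simplification and an assumption of the sketch: a real git commit id also covers the tree, author, committer, timestamps and message, and `commit_id` is a made-up helper.

```python
import hashlib


def commit_id(parent_ids, change):
    """Toy content-addressed id: hash of the parent ids plus the change itself."""
    payload = "\n".join(list(parent_ids) + [change]).encode()
    return hashlib.sha1(payload).hexdigest()


# Before the rebase: topic (A, B) forks from E; master continues E -> F -> G.
E = commit_id([], "E")
F = commit_id([E], "F")
G = commit_id([F], "G")
A = commit_id([E], "A")
B = commit_id([A], "B")

# After the rebase: the same changes replayed on top of G.
A_rebased = commit_id([G], "A")
B_rebased = commit_id([A_rebased], "B")

# Same change, different parent, therefore a different id all the way up the chain.
ids_changed = (A != A_rebased) and (B != B_rebased)
```

Because each id depends on its parent's id, changing the base of the first replayed commit changes the id of every commit after it.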
Hey, "simple" is in the eye of the beholder. You can always just define Bazaar's naming convention to be simple. I pretty much _guarantee_ that a "number" is not a valid way to uniquely name a revision in a distributed environment, though. I bet the "number" really only names a revision in one _single_ repository, right? Which means that it's actually not a "name" of the revision at all. It's just a local shorthand that has no meaning, and the exact same revision will be called something different when in somebody else's repository. I wouldn't call that "simple". I'd call it "insane". In contrast, in git, a revision is a revision is a revision. If you give the SHA1 name, it's well-defined even between different repositories, and you can tell somebody that "revision XYZ is when the problem started", and they'll know _exactly_ which revision it is, even if they don't have your particular repository. Now _that_ is true simplicity. It does automatically mean that the names are a bit longer, but in this case, "longer" really _does_ mean "simpler". If you want a short, human-readable name, you _tag_ it. It takes all of a
Well, in the git world, it's really just one shared repository that has separate branch-namespaces, and separate working trees (aka "checkouts"). So yes, it probably matches what bazaar would call a checkout. Almost nobody seems to actually use it that way in git - it's mostly more efficient to just have five different branches in the same working tree, and switch between them. When you switch between branches in git, git only rewrites the part of your working tree that actually changed, so switching is extremely efficient even with a large repo. So there is seldom any real need or reason to actually have multiple
The fact is, git supports renames better than just about anybody else. It just does them technically differently. The fact that it happens to be the _right_ way, and everybody else is incompetent, is not my fault ...
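The claim at the start of the message above, that a bare number can only name a revision inside one repository, can be illustrated with a deliberately naive model. This is not bzr's numbering algorithm (the thread itself describes that one as assigning numbers relative to a mainline and re-aligning them on pull). The `Repo` class and the commit names are hypothetical and only show how a locally assigned sequence number can differ for the same revision.

```python
class Repo:
    """Toy repository that numbers commits in local arrival order."""

    def __init__(self):
        self.order = []  # stable commit ids, in the order this repo received them

    def add(self, commit):
        if commit not in self.order:
            self.order.append(commit)

    def pull(self, other):
        # Take every commit the other repository has that we lack.
        for commit in other.order:
            self.add(commit)

    def number_of(self, commit):
        return self.order.index(commit) + 1


alice, bob = Repo(), Repo()
for repo in (alice, bob):
    repo.add("base")
alice.add("fix-a")   # Alice commits locally
bob.add("fix-b")     # Bob commits locally
alice.pull(bob)
bob.pull(alice)

# Same revision, same stable id, but a different local number in each repository.
numbers_for_fix_a = (alice.number_of("fix-a"), bob.number_of("fix-a"))
```

The stable id ("fix-a" here, a SHA1 name in git, a revision-id in bzr) is what both sides can exchange safely.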
Unless you have branch(es) with totally different contents, like git.git
But without .git being either a symlink, or a .git/.gitdir "symref"-link, you have to remember what to set GIT_DIR to, or the parameter for the --git-dir option. I'd like to mention once again that in Git branches and tags have a namespace totally separate from the repository namespace. -- Jakub Narebski Poland -
Hi, But I _do_ work with it! I just don't need to "checkout" it! Example: git -p cat-file -p todo:TODO You'd just use alternates for that. But as Linus mentioned in another email, you mostly can use the _same_ working directory. If you want to work on another branch, which is not all that different from the current branch (say, you have a bug fix branch on top of an upstream branch), you just _switch_ to it. Git recognizes those files which are changed, and updates only these. Therefore, if you have something like a Makefile system to build the project, you actually save (compile) time as compared to the multiple-checkout scenario. I use this system a lot, since I maintain a few bugfixes for a few projects until the bugfixes are applied upstream. BTW the multiple-branches-in-one-working-directory workflow was propagated by Jeff a long time ago, and it really changed my way of working. Thanks, Jeff! Ciao, Dscho -
Ok, if there ever was an example of a strange git command-line, that was
Well, you can just add
    [alias]
        cat=-p cat-file -p
to your ~/.gitconfig file, and you're there.
[ For all the non-git people here: the first "-p" is shorthand for "--paginate", and means that git will automatically start a pager for the output. The second "-p" is shorthand for "pretty" (there's no long-format command line switch for it, though), and means that git cat-file will show the result in a human-readable way, regardless of whether it's just a text-file, or a git directory ]
So then you can do just
    git cat todo:TODO
and you're done.
[ So for the non-git people, what that will actually _do_ is to show the TODO file in the "todo" branch - regardless of whether it is checked out or not, and start a pager for you. ]
I actually do this sometimes, but I've never done it for branches (and I do it seldom enough that I haven't added the alias). I do it for things like
    git cat v2.6.16:Makefile
to see what a file looked like in a certain tagged release. People sometimes find the git command line confusing, but I have to say, the thing is _damn_ expressive. I've never seen anybody else do things like the above that git does really naturally, with not that much confusion really. Even that "alias" file is quite readable, although I'd suggest writing out the switches in full, ie
    [alias]
        cat=--paginate cat-file -p
instead. That kind of helps explain what the alias does and avoids the question of why there are two "-p" switches. Linus -
Hi, Ha! I have that for a long time! Although I named it "s", since "git s todo:TODO" is two letters shorter... Ciao, Dscho P.S.: BTW a certain person complained about ~/.gitconfig not being documented, but evidently the itch was not big enough for that person to document it himself... -
This very useful syntax (<ent>:<path>) didn't get documented "officially" anywhere. It was actually documented in commit log v1.4.1^0~255^2. Maybe someone should copy and paste it to git documentation? Maybe core-tutorial.txt or git-rev-parse.txt, is there any better place? -- Duy -
Yes. I have to say, that's likely a fairly odd case, and I wouldn't be surprised if other VCS's don't support that mode of operation at _all_. The fact that git branches can be independent of each other is very I'd strongly suggest that people who do this should actually do git clone -l instead of actually playing games with symlinking .git/ itself or using GIT_DIR. It means that the two checkouts get separate branch namespaces, but that's really what you'd want most of the time. You _can_ share the whole branch namespace and do the symlink of .git (or just set GIT_DIR - but that's pretty inconvenient), and it might end up being "closer" to what some other VCS would do. But the natural thing to do with git is to just share some of the objects through local "slaving" of the repositories, and consider them otherwise entirely independent. Linus -
Bazaar also supports multiple unrelated branches in a repository, as do CVS, SVN (depending how you squint), Arch, and probably Monotone. Aaron -
It does work, very well at that.
I have a directory for each separate branch and simply use
cd(1) to change the current working directory to that branch.
So, instead of "git checkout <branch>", I do "cd ../<branch>".
One only needs to watch out when one updates the repository.
If there had been updates in those branches, then one needs
to git-reset the "branch" directory... (you know what I mean)
(For example when I come to work in the morning and sync up
with home from my usb key...)
The script is called:
Usage: git-mkdir-of-branch <original-directory> <branch> <new-directory>
where <branch> is the name of an existing branch in <original-directory>/.git/refs/heads
and uses simple symbolic links and some git plumbing to do the
job. It can be found in my git trees. I never bothered to send
it out to Junio, since it could be considered heretic. ;-)
Luben
-
Right. That's why I said all revisions can be named by a URL + a number, because it's the combination of the URL + a number that is
I agree that a revision is a revision, but I don't think that's a
When two people have copies of the same revision, it's usually because they are each pulling from a common branch, and so the revision in that branch can be named. Bazaar does use unique ids internally, but it's
But tags have local meaning only, unless someone has access to your
The key thing about a checkout is that it's stored in a different location from its repository. This provides a few benefits:
- you can publish a repository without publishing its working tree, possibly using standard mirroring tools like rsync.
- you can have working trees on local systems while having the repository on a remote system. This makes it easy to work on one logical branch from multiple locations, without getting out of sync.
- you can use a checkout to maintain a local mirror of a read-only
You can operate that way in bzr too, but I find it nicer to have one checkout for each active branch, plus a checkout of bzr.dev. Our switch command also rewrites only the changed part of the working tree. Aaron -
The revision will change between different repos though, so random-contributor A that doesn't have his repo publicised needs to send patches and can't log his exact problem revision somewhere, which makes it hard for random contributor B that runs into a similar problem but on a different project sometime later to find the offending code. I prefer the git way, but I'm a git user and probably biased. That said, it shouldn't be impossible to add fixed, user-friendly bazaar-like revision numbers for git. We just have to reverse the <committish>[^~]<number> syntax to also accept <committish>+<number>. This would work marvelously with serial development but breaks horribly with merges unless the first (or last) commit on each new branch gets given a tag or some such. Either way, I'm fairly certain both bazaar and git needs to distribute information to the user in need of finding the revision (which url and which number vs which sha). I also imagine that the bazaar users, just like the git users, are sufficiently apt copy-paste people to never Well, if two people have the same revision in git, you *know* they have pulled from each other, because ALL objects are immutable. The point of I imagine the bazaar-names with url+number only has local meaning unless someone has access to your repository too. One of the great benefits of git is that each revision is *always exactly the same* no matter in which repository it appears. This includes file-content, filesystem This I'm not so sure about. Anyone wanna fill out how shallow clones and Check. Well, actually, you just clone it as usual but with the --bare Works in git as well, but each "checkout" (actually, locally referenced repository clone) gets a separate branch/tag namespace. -- Andreas Ericsson andreas.ericsson@op5.se OP5 AB www.op5.se Tel: +46 8-230225 Fax: +46 8-230231 -
No, you don't. They may have each pulled from a different repository. Take revision 00aabbcc, created by Linus. Linus has it because he committed it. I have it because I pulled Linus' repository. You have it because Andrew Morton pulled Linus' repository, and you pulled Andrew
In Bazaar, a revision id always refers to the same logical entity, but
With most SCMs that store the repository in the root of the tree, disentangling the tree and repository requires care. OTOH, this is just
In our terminology, if it can diverge from the original, it's a branch, not a checkout. Aaron -
On Tue, 17 Oct 2006 10:05:41 -0400 Well his point was that they have pulled from each other directly or indirectly. You can safely say that rev 00aabbcc.. in _any_ repository is the same rev. This discussion started because of doubt expressed by some here on the list that the "simple" numbering scheme used by bzr can offer the same guarantee. That is, rev 1.2.1 may be completely Why? Uncommitted changes shouldn't be propagated. Once you have cloned the repo, you can checkout your own copy of the working tree files. Sean -
I realized it as I read it now. What I meant was that you know you have This I don't understand. Let's say Alice has revision-154 in her repo, located at alice.example.com. Let's say that commit is accessible with the url "alice.example.com:revision-154". Bob pulls from her repo into his own, which is located at bob.example.com. Lots of questions here, so I'll split them up. Feel free to delete the non-applicable ones. Will the commit in Bob's repo be accessible at "bob.example.com:revision-154"? If it's not, how can you backtrack from old bugreports and find the error being discussed? If it is, how does that work if Bob suddenly wants to commit things before Alice is done working with her changes? Also, suppose they both push to a master-repo where Caesar has pushed his changes and nicked the slot for revision-154. Does the master repo re-organize everything and then invalidate Bob's and Alice's changes, or does it tell Alice and Bob that they need to update and then reorganize their repos before they're allowed to push? I really can't get my head around the usefulness of revision-numbers hopping around which is probably why I'm having such a trouble groking You get the working tree files by default. Use --bare if you don't want them to be checked out (i.e. written to the working tree) after the This clears things up immensely. bazaar checkout != git checkout. I still fail to see how a local copy you can't commit to is useful, but it doesn't really matter to me as I've already found a tool that does everything I want wrt scm needs. -- Andreas Ericsson andreas.ericsson@op5.se OP5 AB www.op5.se Tel: +46 8-230225 Fax: +46 8-230231 -
Another equation can help. Revision Identity != Revision Number.
$ bzr log --show-ids
------------------------------------------------------------
revno: 1
revision-id: Matthieu.Moy@imag.fr-20061017152029-4c5a2861bcf23b7d
committer: Matthieu Moy <Matthieu.Moy@imag.fr>
branch nick: foo
timestamp: Tue 2006-10-17 17:20:29 +0200
message:
  some message
See, bzr has this unique revision identifier (not based on a hashsum). The design choice of bzr is to hide it as much as possible from the user interface. Then, if I'm in the branch in which I typed this command, I can refer to this revision with simply
    bzr whatever -r 1
In the general case, I can access it with
    bzr whatever -r revid:Matthieu.Moy@imag.fr-20061017152029-4c5a2861bcf23b7d
(There's currently a lack in the UI to specify a remote revision-id, but that's not a problem in the model itself.) Internally, bzr uses revision IDs almost exclusively (ancestry information is all about revision ids), and revnos are a UI layered on top of them. I don't have strong needs in revision control, but I actually never encountered a case where I had to access a revision by providing its ID. So, for people like me, revision numbers are sufficient, and they are simple (for example, I can tell without running any command that revision 42 is older than revision 56 in a particular branch). -- Matthieu -
bzr differentiates between pull and merge. Pull is a mirroring command. So with pull, yes revision-154 will be accessible at bob.example.com:revision-154. With merge, it won't. Bob can refer to it as "154:alice.example.com",
I don't see how this applies. You can always commit in a branch. If alice and bob both commit, then they are diverged and can't pull. If
My bzr is run from a local copy I can't commit to. To get the latest changes from http://bazaar-vcs.org, I can run "bzr update ~/bzr/dev". To merge the latest changes into my branch, I can run "bzr merge ~/bzr/dev". It's also convenient for applying other people's patches to. Aaron -
Dear diary, on Tue, Oct 17, 2006 at 09:44:37PM CEST, I got a letter The question is, why is it useful to enforce the "no commit" rule? Git can work exactly the same, it just doesn't _enforce_ the rule. And is the capability of enforcing such a rule important enough to warrant its own column in the comparison table? -- Petr "Pasky" Baudis Stuff: http://pasky.or.cz/ #!/bin/perl -sp0777i<X+d*lMLa^*lN%0]dsXx++lMlN/dsM0<j]dsj $/=unpack('H*',$_);$_=`echo 16dio\U$k"SK$/SM$n\EsN0p[lN*1 lK[d2%Sa2/d0$^Ixp"|dc`;s/\W//g;$_=pack('H*',/((..)*)$/) -
Sure. Aaron -
Tags are propagated during clone, and during fetch/pull (getting changes from repository). So in that sense they are global. If you don't publish your repository, then neither tags, nor <URL>+<rev no> In git we usually use "git clone --local" (with repository database hardlinked) or "git clone --shared"/"git clone --reference <repository>" (which automatically sets alternates, i.e. file pointing to alternate repository database) for that. This way one gets his/her own refs namespace, so two people can work on different branches simultaneously. Alternate solution would be to symlink .git, or .git/objects (i.e. In git you can access contents _without_ checkout/working area. For example gitweb (one of git's web interfaces) uses only repository Luben (IIRC) works this way. -- Jakub Narebski Poland -
Bazaar can do this too. For example, "bzr cat http://something -r some-revision" gets the content of a file at a given revision. But that's not what Aaron was referring to. In Bazaar, checkouts can be two things:
1) a working tree without any history information, pointing to some other location for the history itself (a la svn/CVS/...). (this is "light checkout")
2) a bound branch. It's not _very_ different from a normal branch, but mostly "commit" behaves differently:
   - it commits both on the local and the remote branch (equivalent to "commit" + "push", but in a transactional way).
   - it refuses to commit if you're out of date with the branch you're bound to.
   (this is "heavy checkout")
In both cases, this has the side effect that you can't commit if the "upstream" branch is read-only. That's not fundamental, but handy. I use it for example to have several "checkouts" of the same branch on different machines. When I commit, bzr tells me "hey, boss, you're out of date, why don't you update first" if I'm out of date. And if commit succeeds, I'm sure it is already committed to the main branch. I'm sure I won't pollute my history with merges which would only be the result of forgetting to update. Once more, that's not fundamental, but handy. The more fundamental thing I suppose is that it allows people to work in a centralized way (checkout/commit/update/...), and Bazaar was designed to allow several different workflows, including the centralized one. -- Matthieu -
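The "heavy checkout" commit rules described above can be modelled in a few lines. This is a toy sketch built only from that description, not Bazaar's code: `Branch`, `BoundBranch` and their methods are hypothetical names, and the two appends below are not genuinely transactional.

```python
class Branch:
    def __init__(self, revisions=None, read_only=False):
        self.revisions = list(revisions or [])
        self.read_only = read_only


class BoundBranch(Branch):
    """Toy model of a bound branch: a local branch tied to a master branch."""

    def __init__(self, master):
        super().__init__(master.revisions)
        self.master = master

    def update(self):
        # Bring the local copy level with the master.
        self.revisions = list(self.master.revisions)

    def commit(self, revision):
        if self.master.read_only:
            raise PermissionError("master branch is read-only")
        if self.revisions != self.master.revisions:
            raise RuntimeError("out of date with master; update first")
        # Record on the master first, then locally ("commit" + "push" in one step).
        self.master.revisions.append(revision)
        self.revisions.append(revision)


master = Branch(["r1"])
laptop, desktop = BoundBranch(master), BoundBranch(master)

laptop.commit("r2")          # goes to master immediately
try:
    desktop.commit("r3")     # desktop has not seen r2 yet
    refused = False
except RuntimeError:
    refused = True
desktop.update()
desktop.commit("r3")         # now succeeds, history stays linear
```

The refusal is what keeps the history free of merges caused only by forgetting to update.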
On Tue, 17 Oct 2006 13:19:08 +0200
Git can do this from a local repository, it just can't do it from a remote repo (at least over the git native protocol). However, over gitweb you can grab and unpack a tarball from a remote repo.
This doesn't sound right, at least in the spirit of git. Git really wants to have a local commit which you may or may not push to a remote repo at a later time. There is no upside to forcing it all to happen in one step, and a lot of downsides. Git's focus is to support distributed offline development, not requiring a remote repo to be
Again this seems really anti-git. There is no reason for your local branch to be marked read only just because some upstream branch is
This is exactly the same in Git. You really only ever push upstream when your local changes fast forward the remote (i.e. you're up to date).
While Git really isn't meant to work in a centralized way there's nothing preventing such a work flow. It just requires the use of some surrounding infrastructure.
Sean
-
Anyway, given the price of disk space today, this only makes sense if you have fast access to the repository (otherwise, you consider your local repository as a cache, and you're ready to pay the disk space price to save your bandwidth). In this case, it's often in your
I lied in my above description ;-). I should have said "by default" ... but you have "commit --local" if you want to have a local commit on a bound branch (at this point, I should remind you that not all branches are "bound branches". "bzr branch"
Well, take the example of my bzr setup. I have one repository, say, $repo. In it, I have one branch "$repo/bzr.dev" which is an exact mirror of http://bazaar-vcs.org's branch. I also have branches for patches (occasional in my case) that I'll send upstream. Say $repo/feature1, $repo/feature2, ... If, by mistake, I start hacking on bzr.dev itself, I'll be warned at commit time, create a branch, and commit in this new branch. I believe git manages this in a different way, allowing you to commit in this branch, and creating the branch next time you pull. But you know this
Yes, but you will have to do a merge at some point, right? While I'm keeping a purely linear history (not that it is good in the general case, but for "projects" on which I'm the only developer, I find it good. For example, my ${HOME}/etc/).
But don't get me wrong, I also prefer the decentralized way in most cases. And I'm happy that bzr and git work like this by default. It's just that at least *I* have cases where a centralized approach suits me better, and then I'm happy with that particular feature of bzr.
--
Matthieu
-
On Tue, 17 Oct 2006 14:03:21 +0200
This is most likely the reason that people using Git don't clamor more for the ability to work without a local repository. Disk is cheap and it just makes sense the vast majority of the time to have a complete copy of the repository yourself. There are a lot of powerful things you can do once you have all that information in your repo. Not the least of which is performing any and all operations while flying on a plane
Well, with Git the default is to only commit locally. Of course, you could set your post commit hook to always push it to a remote if
Well, it's just a slight difference in perspective rather than any big issue here. Git treats all repositories as peers, so it would never assume that just because one other particular repo has a branch marked as read only that it should be marked read only locally. It lets you commit to it, and then push to say a third and fourth repo that are writable as well. In practice this doesn't really cause any
Well, if you're committing changes from multiple different machines, how is that different from having say 3 different developers committing changes to the central repo? How does bzr avoid a merge when you're pushing changes from 3 separate machines?
You mentioned that if you try to push and you're not up to date you'll be prompted to update (i.e. pull from the upstream repo). When you do such a pull, do your local changes get rebased on top or is there a merge? By your comments I guess you're saying they're rebased rather than merged, and this is how you keep a linear history. Git can do this easily, but it's not done by default.
Sean
-
The workflow is different.
If I commit broken changes on a repository shared by multiple
developers, they'll insult me, and they'll be right. While I find
nothing wrong in committing broken changes to my ${HOME}/etc/ when
Err, the same way people have been doing for years ;-). If you don't
have local commits, "bzr update" will work in the same way as "cvs
update", it keeps your local changes, without recording history. Like
"git pull" does if you have uncommitted changes, I think.
--
Matthieu
-
On Tue, 17 Oct 2006 15:44:36 +0200
Ah, okay. Well, Git can definitely manage this. It just means you have to rebase any local changes before pushing. This will keep the history linear and make sure that no merges are needed in the case you were asking about.
So far, it sounds to me like bazaar and git are more alike than they are different. Each has a few commands the other doesn't, but all in all they sound very similar. But I'm a Git fanboy so I ain't switching now ;o)
Sean
-
Sure. As I said before, the little add-on of checkouts is that you say once "I don't want to do local commits here", and bzr reminds you of this each time you commit.
Well, where it can make a difference is that it does it in a transactional way, that is, you don't have that little window between the time you pull and the time you push your next
Sure. And at least, if you want to prove that your decentralized SCM is the best, you'd better look at features other than the ability to commit on a local branch ;-). If you want a _real_ flamewar, better talk about rename management or revision identity.
The thing is that most people migrated from CVS/svn, so they found their new SCM to be incredibly better than the existing one. But it's generally not _so_ much better than the other modern alternatives ;-). (and don't forget to thank Darcs and Monotone who brought most of the good
Probably not going to switch either, but that might happen.
--
Matthieu
-
On Tue, 17 Oct 2006 16:19:46 +0200
Yeah, it would be bad luck, but Git wouldn't actually let the push succeed if someone had changed the upstream repo in that small window. It would complain that your push wasn't a fast forward and ask you
Heh, true enough. And the fact is they're all "borrowing" the best ideas from one another. All of a sudden the others are all getting git-like bisect and gitk GUIs. And of course Linus has said that he got quite a bit of inspiration from Monotone originally.
Beyond the distributed offline nature of using Git, the killer "feature" for me is its raw speed and flexibility[1]. It's really nice to be able to branch in under a second and try out a line of development etc. Maybe this is just as easy in Bazaar but it's not true of, say, Mercurial. Honestly, I just can't imagine any other SCM meeting my needs better than Git. So I have a hard time taking complaints about rename management or revision identity seriously.
While they don't affect my usage, IMHO the two biggest failings of Git are its lack of a shallow clone and its reliance on shell and other scripting languages, so there is no native Windows version. I'm sure both of these areas are handled better by Bazaar and/or some of the other new SCMs where they'd be a better choice than Git.
Sean
[1] As an aside, I don't understand why bazaar pushes the idea of "plugins". For instance someone mentioned that bazaar has a bisect "plugin". Well, Git was able to add a bisect "command" without needing a plugin architecture, so I'm at a loss as to why plugins are seen as an advantage.
-
Dear diary, on Tue, Oct 17, 2006 at 02:03:21PM CEST, I got a letter (In rich countries. This may still be very different in poorer countries. E.g. some actual mplayer developer(s) from Turkey opposed transition to a distributed version control system simply because they have trouble affording the required additional diskspace for the full history. SVN is already very space-hungry for them. (It stores basically two complete checkouts in parallel.)) But the much bigger practical problem is bandwidth, plenty of people still have internet connections where downloading several tens/hundreds of megabytes of the complete history is quite a big thing, and the servers ain't gonna be happy from that either, nor those paying the bandwidth bills. ;-) And this is one of the big problems the Mozilla guys have - having everyone download 450M worth of the full CVS-imported history (and I'll bet no other VCS will beat that size) seems to be not So how is the light checkout actually implemented? Do you grab the complete new snapshot each time the remote repository is updated? Do all the (at least read-only, like "log" and "diff", perhaps "status") commands work on such a light checkout? This is something sorely missing in Git but if it's really only "we just provide bandwidth-expensive way to keep your tree up-to-date and that's all," that would not be hard at all to implement in Git too, using git-archive --remote. -- Petr "Pasky" Baudis Stuff: http://pasky.or.cz/ #!/bin/perl -sp0777i<X+d*lMLa^*lN%0]dsXx++lMlN/dsM0<j]dsj $/=unpack('H*',$_);$_=`echo 16dio\U$k"SK$/SM$n\EsN0p[lN*1 lK[d2%Sa2/d0$^Ixp"|dc`;s/\W//g;$_=pack('H*',/((..)*)$/) -
No, the lightweight checkouts store very little. They have
- a copy of tree shape (filenames, paths, sha1 sums) from the last commit.
- a copy of tree shape for the current working directory
Yes. And if you check out from a read-write branch, all write commands work, too.
Aaron
-
Dear diary, on Wed, Oct 18, 2006 at 02:38:37AM CEST, I got a letter I see, I guess that means "the index file and tree objects for the last Ok, one last question - do you do most of the work locally, fetching bits of data as you need, or remotely, only taking input/producing output over the network (the pserver model)? -- Petr "Pasky" Baudis -
Personally, I do not do remote commits over slow links. At home, I use a single machine, and mirror my repository to a public machine using rsync. At work, I store my repository on an NFS server, and push my repository to a public machine using rsync. Aaron -
Dear diary, on Wed, Oct 18, 2006 at 02:50:54AM CEST, I got a letter I meant the work of the commands (bzr log and such), not your personal workflow. :-) Sorry for being unclear. -- Petr "Pasky" Baudis -
When using the native network protocol, work can happen remotely. (But the native protocol is quite new, and support for "smart" operations is currently limited.) When using the dumb protocols, data is fetched from the remote system and processed locally. Light checkouts are not recommended when the server is on a slow link, but heavyweight checkouts are quite suitable in that situation. Aaron -
Ah. So in git terminology it stores index and working directory (and perhaps the name of branch). -- Jakub Narebski Poland -
Dear diary, on Tue, Oct 17, 2006 at 02:03:21PM CEST, I got a letter
In fact, in Git the branch is actually created at the moment you clone. For simplicity's sake, let's say you cloned just a single branch, not the whole repository (or imagine a repository with a single branch). Then, in your local repository, two branches will be created: 'origin' and 'master'. The origin branch is considered read-only (though Git does not enforce it) and only mirrors the branch in the remote repository. The master branch is the branch you do your work on, and it corresponds to the contents of your working tree.
Thus, when you are "updating" your repository (we also call that "pull"), what happens is that new commits are _fetched_ from the remote repository to your 'origin' branch and then the 'origin' branch is _merged_ to the 'master' branch. (You can even separate those two steps and do them manually. So you can e.g. periodically fetch but just check diffs with your master branch and never actually merge, or whatever.)
If you never do any local commits on the repository, every time you merge, the 'master' branch is an ancestor of the 'origin' branch and only a so-called fast-forward merge happens: the 'master' branch is updated to point at the same commit as the 'origin' branch. If you _did_ do some local commits, a real merge of the two branches happens and a new merge commit tying the current master and origin history together is recorded on the merge branch.
--
Petr "Pasky" Baudis
-
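The fast-forward versus real-merge distinction described above can be illustrated on a toy commit graph. This is only a sketch, not how Git is implemented: the `parents` dictionary (commit id mapped to a list of parent ids), the function names and the `merge_id` parameter are all assumptions made for the example.

```python
def ancestors(parents, commit):
    """Return the set containing `commit` and every commit reachable from it."""
    seen, stack = set(), [commit]
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(parents.get(c, []))
    return seen


def pull(parents, master, origin, merge_id="M"):
    """Merge the tip `origin` into the tip `master`; return the new master tip."""
    if origin in ancestors(parents, master):
        return master  # already up to date: nothing to merge
    if master in ancestors(parents, origin):
        return origin  # fast-forward: master simply moves to origin
    # Both sides have commits the other lacks: record a two-parent merge commit.
    parents[merge_id] = [master, origin]
    return merge_id
```

Without local commits, `master` is always an ancestor of `origin`, so only the fast-forward branch is taken and no merge commit ever appears in the history.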
It will quietly accept the commit. Later when you attempt to run `git fetch` to download any changes from the remote repository to your local origin branch the fetch command will fail as it won't be a strict fast-forward due to there being changes in origin which aren't in the remote repository being downloaded. The user can force those changes to be thrown away with `git fetch --force`, though they probably would want to first examine the branch with `git log origin` to see what commits (if any) should be saved, and either extract them to patches for reapplication or create a holder branch via `git branch holder origin` to allow them to later merge the holder branch (or parts thereof) after the fetch has forced origin to match the remote repository. So in short by default Git stops and tells the user something fishy is going on, but the error message isn't obvious about what that is and how they can resolve it easily. There has been discussion about marking these branches that we know the user fetches into as read-only, to prevent `git commit` from actually committing to such a branch (we also have the same case with the special bisect branch), but I don't think anyone has stepped forward with the complete implementation of that yet. Like anything I think people get used to the idea that those branches are strictly for fetching and shouldn't be used for anything else. There's really no reason to checkout a fetched into branch anyway; temporary branches are less than 1 second away with `git checkout -b tmp origin` (for example). -- Shawn. -
Git cannot do that remotely (with the exception of git-tar-tree/git-archive, which has the --remote option), yet. But you can get the contents of a file (with "git cat-file -p [<revision>:|:<stage>:]<filename>"), list a directory (with "git ls-tree <tree-ish>") and compare files or directories (git diff family of commands) without the need for a working directory. AFAICT a working area is required _only_ to resolve conflicts during
In git, by default, in the top directory of the working area you have a .git directory which contains the whole repository (object database, refs (i.e. branches and tags), information about which branch is current, index aka. gitcache, configuration, etc.). You can share the object database locally (which includes network filesystems). You can have a .git (usually <project>.git then) directory without a working area.
There was a proposal to allow tracking branches to be marked read-only, but it has not been implemented yet. But git has the reverse check: it forbids (unless forced by the user) fetching into a branch which has local changes (does not fast-forward). This makes sure that no information is lost.
The idea is that you fetch changes into a tracking branch (e.g. the 'master' branch of some parent remote repository into the 'origin' or 'remotes/<repository name>/master' branch); you don't commit changes to such a branch. You do your own work either on the 'master' branch, then merge (typically using "git pull") the corresponding 'origin' tracking branch, or use a separate private feature branch and rebase after fetch.
Git is designed for distributed workflows, not for a centralized one. All repositories are created equal :-)
--
Jakub Narebski
ShadeHawk on #git and #revctl
Poland
-
And you can use GIT_DIR environmental variable or --git-dir option to git wrapper. -- Jakub Narebski Poland -
On Tue, 17 Oct 2006 13:45:31 +0200 Interesting, I didn't know about the --remote option. So in fact as long as the remote has enabled upload-tar then anyone can do a "light checkout". However, it appears that kernel.org for instance doesn't enable this feature. Sean -
Same as bzr then I believe. "bzr pull" will suggest you to use "merge" Note that "bound branches" and "other branches" in bzr are not so different. The "master" (the one you make a checkout of) doesn't have to know it has checkouts, and the "checkout" just has one file pointing to the "master", and you can switch from one flow to the other with "bzr bind/unbind". So, in Bazaar, all repositories are /almost/ created equal ;-). -- Matthieu -
What about 3) getting the repo with all the history while still not having to be online to actually commit to *your* copy of the repo. When you later get online, you can send all your changes in a big hunk, or let bazaar email
It appears we have different ideas of what's handy. Perhaps it's just a difference in workflow, or lack of "email-commits-as-patches" tools in bazaar, but the ability to commit to whatever branch I like in my local repo and then just send the diffs by email or please-pull requests to upstream authors is what makes git work so well for me. I can of course also pull the changes to another branch, or cherry-pick them one by one, or...
OTOH, if by "commit" you mean "send your changes back to central server", and bazaar'ish for "register my current set of changes in the local clone of the repo" is called something else, it sounds very
Centralized works in git too, after a fashion. Most projects have a master repo hidden somewhere that frequently gets pushed out for publishing and which most (all?) contributors sync against from time to time, but it's by no means a certainty. What *is* a certainty is that the published branches are exactly identical to the ones in the master repo, and all the downstream authors will get a history where they can easily track master's development. For git, I suppose Junio has the hidden master repo which he publishes at kernel.org. Linus does the same with the Linux repo.
On a side-note, it sounds as though the "bound branch" scenario encourages making a big change as one mega-diff, so long as it implements one feature, whereas the git workflow with topic branches that eventually get merged to master allows changes to sort of accumulate up to a feature in the steps one actually has to take to make the feature work.
Side-note 2: Three really great things that have made work a lot easier and more enjoyable since we changed from cvs to git and that aren't mentioned in the comparison table:
* ...
Well, the discussion was about checkouts, so I was talking about checkouts ;-). What you mention is the default behavior of Bazaar when you use "bzr branch" or "bzr get". BTW, it's also possible to do this with a
You have "bzr bundle" in Bazaar, and there was work to have it actually send the email ( http://bazaar-vcs.org/SubmitByMail ), but I don't think it's finished yet. And yes, this is a great feature, the first time I used it was with Darcs, and I was impressed by how easily I could submit a patch without any setup and with a 5-line tutorial. Even wiki seems complex after
Sure. Once again, Bazaar does it this way too. There's an _additional feature_ called checkout which allows you to work in another way, though. Like most "features", it's not useful to everybody.
Sure. And regarding this, hopefully, most modern VCSes go in the same direction.
> * Dependency/history graph display tools
In bzr, the "bundle" appears like a patch, but it actually contains the same information as the revision(s) it contains (I believe this applies to hg and Darcs too). A bundle can be used almost like a branch. That's a key point, since revision identity is not based on a hash of the content, so applying a patch is very different from merging a
That's the key point, but patch review for non-accidental developers
Bazaar's bundles use base64 encoding for binaries. I don't think that's an efficient binary diff (xdelta-like) though. Aaron has been fighting quite a lot with MUAs and MTAs mixing up the patches (line endings in particular) ...
--
Matthieu
-
The patch generated by git-format-patch has author information (in the "From:" header), original commit date (in the "Date:" header), commit message (first line in "Subject:", rest in the message body), a place for comments which are not to be included in the commit message, a diffstat for easier patch review, and a git extended diff (with information about rename detection, mode changes, and 7-character abbreviations of file content identifiers).
It does not record parent information, the original committer and committer date, which branch we are on etc. You can quite easily provide ordering of patches. Sending patches via email prohibits the first line of the commit message from being enclosed in brackets (the subject usually is "[PATCH] Commit description" or "[PATCH n/m] Commit description") and enforces the git convention that a commit message consists of a first line describing the commit shortly, separated by an empty line from the longer description and signoff lines.
If I remember correctly the git binary diff format is xdiff based, and uses a kind of ascii85 encoding (PostScript).
--
Jakub Narebski
Poland
-
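To illustrate why a commit summary that itself starts with a bracketed word does not survive the trip through email, here is a simplified sketch of subject-line parsing. It is not Git's actual code: `parse_subject` is a hypothetical helper, and the assumption that every leading "[...]" group is stripped is a simplification of what the mail-applying tools do by default.

```python
import re

# A leading bracketed group such as "[PATCH 2/6]".
_BRACKET = re.compile(r"^\s*\[([^\]]*)\]")
# The "n/m" numbering inside a bracketed group containing the word PATCH.
_SERIES = re.compile(r"PATCH\s+(\d+)/(\d+)")


def parse_subject(subject):
    """Return (n, m, summary); n and m are None when the patch is not numbered."""
    n = m = None
    while True:
        match = _BRACKET.match(subject)
        if not match:
            break
        series = _SERIES.search(match.group(1))
        if series:
            n, m = int(series.group(1)), int(series.group(2))
        # Drop the bracketed group and keep looking at what follows it.
        subject = subject[match.end():]
    return n, m, subject.strip()
```

Under this assumption, a subject such as "[PATCH] [core] tidy" comes back with the summary "tidy": the author's own "[core]" prefix is indistinguishable from mailing-list decoration.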
Dear diary, on Tue, Oct 17, 2006 at 04:41:02PM CEST, I got a letter
It should be noted that there's no user interface for sending/receiving that and I suspect no reasonably usable user interface for creating it. How frequently are the bundles used in practice?
It's a cultural difference, I suspect. Git comes from an environment based on intensive exchanges of patches and patch series and an environment not mandating developers to use any tool besides diff/patch, so Git is very focused on good support for applying patches and there simply has been no big conscious demand for bundle support given this.
Another aspect of this is that Git (Linus ;) is very focused on getting the history right, nice and clean (though it does not _mandate_ it and you can just wildly do one commit after another; it just provides tools to easily do it). This means that the downstream maintainers have to rebase patches, possibly reorder them, and update the changesets with bugfixes instead of stacking the bugfixes upon them in separate changes. Then Linus merges the patches and only at that point are they "etched" forever. This means that the history will contain a neatly laid out record of how $FEATURE was achieved, but of course it also means more work for downstream maintainers.
--
Petr "Pasky" Baudis
-
Many times each day. Most submissions to the bzr mainline are done with
Yes, rebasing is very uncommon in the bzr community. We would rather evaluate the complete change than walk through its history. (Bundles only show the changes you made, not the changes you merged from the mainline.) In an earlier form, bundles contained a patch for every revision, and people *hated* reading them. So there's definitely a cultural difference there.
Aaron
-
Dear diary, on Wed, Oct 18, 2006 at 02:30:14AM CEST, I got a letter BTW, I think what describes the Git's (kernel's) stance very nicely is what I call the Al Viro's "homework problem": http://lkml.org/lkml/2005/4/7/176 If I understand you right, the bzr approach is what's described as "the dumbest kind" there? (No offense meant!) -- Petr "Pasky" Baudis -
Yes and no. The bundle includes both the full final thing, and each step along the way. Each step along the way is something you'll get when you merge it. Once merged, it will be "next one" in the description above.
It would typically look something like this in "bzr log" (shortened). In this example, doing C requires doing A and B as well...
committer: foobar@foobar.com
message: merged in C
-------
committer: bar@bar.com
message: opps, fix bug in A
-------
committer: bar@bar.com
message: implement B
-------
committer: bar@bar.com
message: implement A
So, you'll get full history, including errors made :) You can also see who approved it to this branch (foobar) and who did the actual work (bar).
/Erik
-
Dear diary, on Wed, Oct 18, 2006 at 11:28:32AM CEST, I got a letter
I see, that's what I've been missing, thanks. So it's the middle path
(as any other commonly used VCS for that matter, except maybe darcs?;
patch queues and rebasing count but it's a hack, not something properly
supported by the design of Git, since at this point the development
cannot be fully distributed).
I also assume that given this is the case, the big diff does really not
serve any purpose besides human review?
But somewhere else in the thread it's been said that bundles can also
contain merges. Does that mean that bundles can look like:
    1
   / \
  2   4
  |   |   _
  3   5    |
   \ /     | a bundle
    6      |
           ~
In that case, against what is the big diff from 6 done? 2? 4? Or even 1?
--
Petr "Pasky" Baudis
-
When you run the "bundle" command, you can tell it what you want the bundle to be created against. So, if I just committed 5, I can run "bzr bundle -r-1" to get the bundle against 4, or I can do "bzr bundle path/to/other/branch" to get a bundle that relates to it. To merge a bundle into a branch, the parent of the first revision in the bundle has to exist in the branch it's being merged into. (Well, unless you use patch, but that's outside of bzr, and bzr wouldn't know about each revision in them.)
This command will find a common root and create a bundle that corresponds to it.
The "big diff", as you call it, would be the changes between the point where the branch was created and the last commit.
In the case of just committing 5, when you want to create a bundle that can be merged back at point 6, the "big diff" would be against 1 since that's the branch point.
/Erik
--
google talk/jabber. zindar@gmail.com
SIP-phones: sip:erik_bagfors@gizmoproject.com sip:17476714687@proxy01.sipphone.com
-
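The "find a common root" step can be sketched on the numbered graph from the question (1 is the root, 2-3 and 4-5 are the two lines of development, 6 merges them). This is a toy model and not bzr's algorithm: `branch_point` and `bundle_revisions` are hypothetical names, and graphs with several equally near common ancestors are not handled specially.

```python
from collections import deque


def ancestors(parents, tip):
    """Return the set containing `tip` and every revision reachable from it."""
    seen, stack = set(), [tip]
    while stack:
        rev = stack.pop()
        if rev not in seen:
            seen.add(rev)
            stack.extend(parents.get(rev, []))
    return seen


def branch_point(parents, tip, other_tip):
    """Nearest ancestor of `tip` that is also in the history of `other_tip`."""
    other = ancestors(parents, other_tip)
    queue, seen = deque([tip]), set()
    while queue:
        rev = queue.popleft()
        if rev in other:
            return rev
        if rev not in seen:
            seen.add(rev)
            queue.extend(parents.get(rev, []))
    return None


def bundle_revisions(parents, tip, other_tip):
    """Revisions a bundle of `tip` would have to carry for `other_tip`."""
    return ancestors(parents, tip) - ancestors(parents, other_tip)


# The graph from the question: revision id -> list of parent ids.
GRAPH = {1: [], 2: [1], 3: [2], 4: [1], 5: [4], 6: [3, 5]}
```

Here `branch_point(GRAPH, 5, 3)` is 1, matching the statement that the "big diff" for 5 is taken against 1, while `bundle_revisions(GRAPH, 5, 3)` is the set of revisions 4 and 5 that the other branch does not yet have.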
Take for example "[PATCH 0/6] ref deletion and D/F conflict avoidance with packed-refs." Isn't it easier to review than "bundle", aka. mega-patch? -- Jakub Narebski Poland -
There are even more important reasons to prefer a series of micro-commits over a mega-patch than just ease of merging.
In the cairo project, I've often reviewed a single patch and said: "This all looks like perfectly good code and I'd be happy to have it all in the tree. But please rebuild this as a series of independent patches (perhaps along the lines of a, b, c, ...)"
I do that not just to make the history "look nice" but because code history is something we _use_ a lot and separate commits for separate actions just make the history so much more usable.
We have great tools like bisect to identify commits that introduce bugs. I know that I'd be delighted to see bisect come back pointing at some minimal commit as causing a bug (which would make finding the bug so much easier). But it's also been my experience that the largest commits are also the most likely to be the things returned by bisect. Big commits really do introduce bugs more frequently than small commits.
Finally, if someone had gone through the useful work to create small, independent changes (and likely finding and fixing bugs in the process), what a horrible shame it would be to throw away that work and merge it as a single patch (welcome to the pain of CVS branch merging).
Now, I do admit that it is often useful to take the overall view of a patch series being submitted. This is often the case when a patch series is in some sub-module of the code for which I don't have as much direct involvement. In cases like that I will often review only the diff between the tips of the mainline and the branch of interest (or if I trust the maintainer enough, perhaps just the diffstat between the two). But I'm still very glad that what lands in the history is the series of independent changes, and not one mega commit.
-Carl
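The bisect argument above can be made concrete with a minimal sketch of bisection over a linear history. It is a plain binary search and not Git's implementation: the `bisect` function, the `is_bad` callback and the assumption that the first commit is good and the last one bad are all illustrative.

```python
def bisect(commits, is_bad):
    """Return the first commit for which `is_bad` is true.

    `commits` is ordered oldest to newest. The first commit is assumed
    to be good and the last one bad; neither of them is tested.
    """
    good, bad = 0, len(commits) - 1
    while bad - good > 1:
        mid = (good + bad) // 2
        if is_bad(commits[mid]):
            bad = mid   # the bug was introduced at mid or earlier
        else:
            good = mid  # the bug was introduced after mid
    return commits[bad]
```

With a hundred small commits the search needs only a handful of tests and ends on one small change. With a history of two entries, a good base and one mega-commit, it can only point at the mega-commit, leaving the real search still to be done by hand.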
A bundle isn't a mega-patch. It contains all the source revisions. So when you merge or pull it, you get all the original revisions in your Bisect should work equally well with revisions pulled or merged from a The number of changes shown in the diff has nothing to do with the So the difference here is that bundles preserve the original commits the changes came from, so even though it's presented as an overview, you still have a series of independent changes in your history. Aaron -
But what the patch reviewer sees is a mega-patch showing the changeset of a whole "bundle", isn't it? I think it is much better to review a series of patches commit by commit; besides, it allows correcting some inner patches before applying the whole series, or dropping one of the patches in the series (and this has happened from time to time on the git mailing list).
So if git introduces bundles, I think they would take the form of a series of "patch" mails + an introductory email with the series description (currently it is not saved anywhere), shortlog, diffstat and perhaps more metainfo like the bundle parent (which I think should really be the email form of a branch), tags introduced etc.
--
Jakub Narebski
Poland
-
Yes. Carl was saying that, aside from the issue of what a reviewer sees, a bundle is bad for other reasons. I am saying those other reasons don't apply. I wasn't addressing the issue of what a reviewer sees.
To me, seeing the individual patches is like reading a book where every page has a different word on it, and so it's hard to put it together into a full sentence. I'm not saying my way is The Right Way, just my personal preference.
For larger pieces of work, we try to split them up into logical units, and merge those units independently.
The Bundle format can also support a patch-by-patch output, but we don't
It's important to remember that bundles represent revisions, not patches. When you merge a bundle, you
1. install those revisions into your repository. These revisions are latent, as though they were on another branch.
2. merge the head revision of the bundle into your branch.
Virtually any merge selection process that works with branches would also work with bundles. So tweaking before merging is really a matter
The parent in a bundle revision is the revision-id of the parent of that revision in the branch. I don't think it's possible to change that parent id into something else, without changing the meaning of a bundle.
Aaron
-
As for what the reviewer wants to see, I think it depends on what kind
of code it is. Kernel code is complex and does not have (at least I have
not heard of) unit-tests, so short patches are preferable for review.
And since C is one of the more verbose languages, short patches mean
splitting them up into several pieces.
On the other hand bzr has unit-tests and python is less verbose, so the
single patch for a feature is not so big and is manageable. The patches
to bzr still come in logical steps, but usually one step per feature is
enough.
Also programmers usually don't develop even the single logical step as a
single commit. Instead they also commit to back up their work,
when they try something they think they may return to in future, when they
need to continue on another computer and so on. And these commits are
generally not logical steps. Also the steps are often not in a logical
order. Therefore showing diff for each commit in the bundle often does
not make sense.
So there is one bundle per logical step, and it therefore has a summary
diff. Individual bundles for individual steps are preferable anyway,
since the maintainer may decide to accept just some of them. A tool to
generate a series of bundles (either each with just one commit or each
with several commits) would be possible, just no one was interested
enough to do it yet.
--------------------------------------------------------------------------------
- Jan Hudec `Bulb' <bulb@ucw.cz>
-
In git you can back up your work on a temporary branch; besides there That is why, before sending a patch series based on some feature branch, you should at least rebase the branch on top of current work, to ensure that the series will apply cleanly. If the feature branch/patch series needs cleanup (going from "answer" to "solution", http://lkml.org/lkml/2005/4/7/176), i.e. patch (commit) reordering, joining two patches into one, or patch splitting, you can use a combination of git-cherry-pick, git-cherry-pick --no-commit and git commit --amend, or git-format-patch, patch editing and reordering, and git-am. Or just use StGit or pg. -- Jakub Narebski Poland -
You did. The plugin is largely based on my experiences with the git version, and explicitly gives credit in the comments. -
Differences in nomenclature are really messing this discussion up. In git, a "checkout" is the act of pulling objects from the object database Now I'm really confused. Does bazaar have both "clone" (git-style fetching a full repo and all the branches) and "checkout" (cvs-style >> * Dependency/history graph display tools
Yes, it has both. That's "bzr branch" (git clone) and "bzr checkout" (cvs checkout). Difference between "bzr branch" and "git clone" is that bzr doesn't fetch all the branches. It fetches one "branch" (succession of revisions) with all the ancestors of the revisions of the branch. -- Matthieu -
You're not telling us bzr still follows the utterly stupid update-before-commit model, right? Right? OG. -
One last time: bzr _CAN_ follow the utterly stupid update-before-commit model. It doesn't force you to do so, obviously. -- Matthieu -
Dear diary, on Tue, Oct 17, 2006 at 01:19:08PM CEST, I got a letter It isn't very nice because it enforces the update-before-commit workflow, which was a complaint of many CVS users and I can remember it being one of the selling points of the distributed VCSes in 2001 or so, although it is not so emphasized lately. (I understand that this is something optional in Bazaar.) BTW, merge commits aren't bad. They reflect what really happened, explicitly record the merge resolution taken, if there was any, and protect you from accidentally losing or damaging [any portion of] your changes. And they aren't cluttery either since we hide them from non-graphical history listings by default. Still, I can recognize that in some scenarios, people might find it useful, and I can remember some people asking for it in the past. So I couldn't resist and implemented it in Cogito as cg-commit --push. Pushed out now. Took me about 5 minutes implementing it and 10 minutes documenting it. ;-) P.S.: A general note for bleeding-edge Cogito users, I've rewritten the local changes handling so that we always do a three-way merge now instead of that braindead patches diffing/applying, but it's not completely stable yet, some testcases still fail. So be a bit careful when updating/uncommitting/switching/... with uncommitted changes in the working tree. -- Petr "Pasky" Baudis Stuff: http://pasky.or.cz/ #!/bin/perl -sp0777i<X+d*lMLa^*lN%0]dsXx++lMlN/dsM0<j]dsj $/=unpack('H*',$_);$_=`echo 16dio\U$k"SK$/SM$n\EsN0p[lN*1 lK[d2%Sa2/d0$^Ixp"|dc`;s/\W//g;$_=pack('H*',/((..)*)$/) -
On Tue, 17 Oct 2006 00:24:15 -0400 Yeah, even in git you typically don't publish your working tree when making it available for cloning. In fact the native git network That is a very nice feature. Git would be improved if it could I'm not sure what you mean here. A bzr checkout doesn't have any history does it? So it's not a mirror of a branch, but just a checkout of the branch head? If so, Git can export a tarball of a branch (actually a snapshot as at any given commit) which can be mirrored out. Sean -
Hi, It would also make things slow as hell. How do you deal with something like annotate in such a setup? Ciao, Dscho -
On Tue, 17 Oct 2006 12:30:27 +0200 (CEST) Some commands like annotate might not make any sense in such a set up. But one way to get the same (perhaps even better) feature into git would be to support shallow clones, in which case even annotate would continue to work even if somewhat crippled by the lack of a complete history. Sean -
Hi, You'd probably have to do all processing server-side (git log, blame, merges... like in subversion, where you can merge and rename/move files remotely, IIRC). Of course, all the things which make git really useful for me (gitk, git log with all its arguments etc.) would not be available. Cheap checkouts would be made possible easily that way at the cost of higher server load and an abstraction layer over network for object access. I don't know if that sounds reasonable at all. Matthias -
For the particular case of annotate, bzr is designed to store annotations at commit time. So annotate should require remote access to a small amount of data from two files-- not a great cost. But our default form of checkout contains a local copy of all history data, so that readonly operations happen at local speed. Aaron -
Sure, and so can bzr. But using a checkout of the branch head means: - No one has to do anything special to provide a working tree of a given revision - I can still run any readonly operations I desire - I can update to the latest version of bzr.dev with one command. Aaron -
If I can add some clarification: There is a lightweight checkout and
heavyweight checkout. The former contains no history and does everything
(except status and I am not sure about diff) by accessing the remote
data. The latter contains a mirror of the history data and does
write-through on commit (and otherwise behaves like a normal branch with
a repository).
What would be really useful would be a checkout, or even a branch (i.e.
with the ability to commit locally), that would only contain history data
since some point. This would allow downloading very little data when
branching, but then working locally as with a normal repository clone.
In bzr this was already discussed and the storage supports so called
"ghost" revisions, whose existence is known, but not their data. There
are even repositories around that contain them (created by converting
data from arch), but to the best of my knowledge there is no user interface to
create branches or checkouts with partial data.
--------------------------------------------------------------------------------
- Jan Hudec `Bulb' <bulb@ucw.cz>
-
On Sat, 21 Oct 2006 20:58:25 +0200 In Git the same functionality could be achieved with so-called shallow clones. Unfortunately, they've only been discussed and not yet implemented. Sean -
There are two forms of checkout: a normal checkout, which contains the complete history of the branch, and a lightweight checkout, which just has a pointer back to the original location of the history. In both cases, a "bzr commit" invocation will commit changes to the remote location. In general, you only want to use a lightweight checkout when there is a fast, reliable connection to the branch (e.g. if it is on the local file system, or local network). Aaron would be talking about a normal (heavyweight) checkout here. With a heavyweight checkout, you can do pretty much anything without access to the branch. In contrast, almost all operations on a lightweight checkout need access to the branch. James. -
So the "lightweight checkout" is the equivalent of the "lazy clone" we have discussed at length on the git mailing list (without any resulting code, unfortunately). The sticking point was how to do this fast, without needing a fast, reliable connection to the repository it was cloned from; for example, whether to leave fetched objects in some kind of cache, or even in the "lightweight checkout"/"lazy clone" repository database. If the repository we do the "lightweight checkout"/"lazy clone" from is on the local file system (perhaps a network file system), then we can use the alternates mechanism (git clone -l -s). That's why "lazy clone" was We have a terminology conflict here: Bazaar-NG "pull" and "merge" vs. GIT "fetch", "pull" and "merge"; Bazaar-NG "checkout" vs. GIT "clone" and "checkout". In GIT, "clone" is what is used to copy a whole repository, and "checkout" is what is used to extract a given/current branch to a [given] working area. -- Jakub Narebski Poland -
In bzr there are two different kinds of checkouts. One is called a lightweight checkout, and that's really a "normal" checkout in the way svn, for example, does it. In this mode, you have the branch remotely and only the working tree locally. So it's just a checkout of the branch head (or any other revision if using -r when doing the checkout). Then there are non-lightweight checkouts: heavyweight checkouts. These are the default type. A heavyweight checkout is in fact a full branch locally, but it is "bound" to the remote branch. What this means is that all commands such as diff/status/log/etc can be done locally. So it's really quick. It acts the same as a lightweight checkout in most regards, so when I run "bzr update" it actually pulls from the remote branch, and when I run "bzr commit" it commits the same revision in both the remote branch and the local branch. It does this in one transaction so one can't work and the other fail (they would both fail in that case). What this also gives you is that when you want to clone the branch, you don't need to go to the remote branch to get the revisions and also, when offline, you can commit locally. Committing locally is a very cool feature in my mind. If you work in a centralized manner with checkouts, you normally commit directly to the central branch, but when you are offline, that will fail (of course :) ). So what you can do then is to run "bzr commit --local" to commit only to your local checkout branch, then when you get online again you can run "bzr update". In this case the update will take any new commits that have been made while you were away, pull them into your local branch, and make your local commits into something that has been merged into the "checkout". I find this REALLY useful. Don't know if that made sense, here it is in commands.
$ bzr checkout t p
$ cd p
$ echo hej >> hosts
$ bzr commit --local -m 'offline'
$ echo hej >> hosts
$ bzr commit --local -m 'offline 2'
Now I get ...
Ehh. Exactly like the bzr numbers? You have to have access to the original
repo to name it.
So your point is?
If you do
git log v2.6.17
in a kernel repository, you'll see exactly what I see - because you'll
have gotten the tags, aka the "easy revision names".
Now, I'm obviously biased, but the thing is, git really does do this
right. No meaningless numbers. You give _meaningful_ revision names, and
they can be extremely powerful.
And no, it's not just tags or the raw SHA1 numbers. You can do
relationships like
git log HEAD~5..
which means "show the log for everything since five parents ago" (which is
_not_ the same as "show the last five revisions", because one of them may
have been a merge, and brought in a lot more of new commits).
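To make the difference between "since five parents ago" and "the last five revisions" concrete, here is a minimal sketch on a toy commit graph. The commit names and the graph itself are invented for illustration, and the helper functions are hypothetical stand-ins for what git computes, not git's own code. The "~n" step follows first parents only, while the range covers everything reachable through any parent.

```python
# Toy commit graph: each commit maps to its list of parents (first parent first).
# All commit names here are invented for the illustration.
PARENTS = {
    "A": [],
    "B": ["A"],
    "C": ["B"],
    "D": ["C"],
    "E": ["D"],
    "F": ["E"],
    "S1": ["B"],       # side branch forked from B
    "S2": ["S1"],
    "S3": ["S2"],
    "M": ["F", "S3"],  # merge of the side branch into the mainline
    "G": ["M"],        # current head
}


def nth_first_parent(commit, n):
    """Follow the first parent n times, like the ~n suffix."""
    for _ in range(n):
        commit = PARENTS[commit][0]
    return commit


def ancestors(commit):
    """Return the commit itself plus everything reachable through any parent."""
    seen = set()
    stack = [commit]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(PARENTS[current])
    return seen


def since(head, n):
    """Commits reachable from head but not from head~n (the 'head~n..' range)."""
    return ancestors(head) - ancestors(nth_first_parent(head, n))


if __name__ == "__main__":
    print(nth_first_parent("G", 5))  # the commit five first-parent steps back
    print(sorted(since("G", 5)))     # includes the merged-in side branch
```

In this toy graph the range holds eight commits rather than five, because the merge M pulls in the three side-branch commits as well.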
Or, you can say
git diff mybranch@{2.days.ago}..nextbranch
which says exactly what you'd read it as: show the diff between what
"mybranch" looked like 2 days ago and what "nextbranch" looks like right
now.
Or, since the namespace is the same for commit history _and_ for actual
file contents, and since some commands don't need commits, you can decide
to name not a revision, but a specific file or subdirectory in a revision,
and do things like
git -p grep -1 request_irq v2.6.17~2:drivers/char
where the "revision" is not a commit revision at all, it's a _tree_
revision, because we've looked up the revision for "v2.6.17~2" (which
means "the grandparent of the tag 2.6.17"), and then within that commit we
looked up the tree "drivers/char", and then we grepped (recursively) for
the string "request_irq" within that subtree (with one line of context),
and then we paginated the output through "less" (or whatever your pager is
set to).
In other words, yes, the above does _exactly_ what you'd expect it to do.
The fact is, nobody ever uses the SHA1 names directly in their normal
work. You'd use the branch names, tag-names, or some relationship operator
like "this long ago" or "the parent of" or ...
Hi Aaron, How should this cope with a distributed project? IOW how does it deal with "this revision and that revision are exactly the same"? If I understand you correctly, you are claiming that you are not really identifying a revision, but a revision _at a certain place with a place-dependent number_. This conflicts with my understanding of a It depends on your usage. If you want to do anything interesting, like assure that you have the correct version, or assure that two different people's tags actually tag the same revision, there is no simpler Of course! Persistence (and reliability) are the number one goal of git. Performance is the next one. As an example of completely independent branches, look at the "next" and the "todo" branch of git. They are _completely_ independent, i.e. not even Oh, we start another flamewar again? Honestly, if you want to record renames, why don't you also support (with a command for each of those purposes) code copying? And refactoring? And copyright year bumps? _put your favourite here_ If you really, really think about it: it makes much more sense to record your intention in the commit message. So, instead of recording for _every_ _single_ file in folder1/ that it was moved to folder2/, it is better to say that you moved folder1/ to folder2/ _because of some special reason_! Same goes for all other thinkable examples. If you want to track code, then let the tracker do its work, i.e. let git-pickaxe figure where your code came from. It is likely being more It is more like the Unix way. Let each command do _one_ thing, but let it Welcome to git! Git's commands are very efficient, and you can even pipe them efficiently! And now that we have GIT_TRACE, diagnostics are no concern. Ciao, Dscho -
Hi! Dear diary, on Tue, Oct 17, 2006 at 01:45:34AM CEST, I got a letter I think Aaron rather meant that in case of an error, the error messages may seem incoherent from the perspective of a porcelain user if they've been generated by the plumbing. And I had that problem in Cogito as well a few times in the past, but I think most of those are reasonable now (I can't think of a counter-example off the top of my head). Calling multiple git commands _is_ a problem, especially in a loop, but I think it's more the inherent fork()+execve() overhead than whatever happens over and over when main() takes over. Many git commands got adjusted so that you can call them just once and then feed from/to them over a longer time period. -- Petr "Pasky" Baudis Stuff: http://pasky.or.cz/ -
There are two answers here. One is that the URL + number is UI, not internals. A unique ID is used internally, so that can be compared. But to fully ensure that there are no differences, i.e. that no one has No, I am claiming that a revision at a certain place with a place-dependent number is one name for a revision, but it may have other I can use the 'bzr missing' command to check whether my branch is in sync with a remote branch. Or I can use the 'pull' command to update my You'd be surprised. When we last spoke to the Mercurial team, Mercurial didn't support multiple persistent branches in one repository. Pulling from a remote repository could join two branches into one. I'm told I'd hope not. It sounds as though you feel that supporting renames in the data representation is *wrong*, and therefore it should be an insult to you if we said that Git fully supported renames. Aaron -
I think you missed the simplicity of the git naming here. With git, I can receive a bug report that specifies a bug that appears in a revision such as: 71037f3612da9d11431567c05c17807499ab1746 And since I have a commit object in my repository with that same name I have a strong assurance that I am testing the identical software as the bug reporter without me ever needing any access to pull from the reporter's repository. And this works in an entirely distributed fashion. Any two users can be certain they are working with identical software on both ends by exchanging and comparing a few bytes, (in email, irc, bugzilla, what have you), without any need to refer to a common repository which both users have access to. -Carl
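The property described above, that the name is derived from the content and so two people can compare names without sharing a repository, can be sketched in a few lines. Git names an object by the SHA-1 of a short header (object type, a space, the body length, a NUL byte) followed by the body; commit objects follow the same scheme with "commit" as the type. The function name git_object_id and the sample data are made up for this sketch.

```python
import hashlib


def git_object_id(kind: str, body: bytes) -> str:
    """Return the hex object name for a body of the given object type.

    The name is the SHA-1 of "<kind> <length>\\0" followed by the body.
    """
    header = f"{kind} {len(body)}".encode("ascii") + b"\0"
    return hashlib.sha1(header + body).hexdigest()


if __name__ == "__main__":
    # Two users who hold byte-identical content derive the same name
    # independently; comparing 40 hex characters is enough to confirm it.
    reporter = git_object_id("blob", b"int main(void) { return 0; }\n")
    developer = git_object_id("blob", b"int main(void) { return 0; }\n")
    print(reporter == developer)

    # Any change to the content yields a different name.
    patched = git_object_id("blob", b"int main(void) { return 1; }\n")
    print(reporter == patched)
```

Because a commit body includes the ids of its tree and its parents, matching commit names imply matching content and matching history, which is what makes the bug-report scenario work without a common repository.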
It would seem that the majority of folks on the Git list feel that way, myself among them. I don't know that we'd find it an insult to say Git fully supports renames but I do think we have had better results from *not* recording them and looking for them after the fact with smart tools. Junio's recent work with git-pickaxe (or whatever its name finally settles out to be) is a perfect example of this. Despite not having "recorded renames" git-pickaxe is able to fairly accurately detect blocks of code moving between files, of which renaming files is just a special case. This provides some fairly accurate blame reporting pointing to exactly which commit/author/datetime put a given line of code into the project. No additional metadata required. All existing repositories can immediately benefit from the new tool. Rather slick if you ask me. -- Shawn. -
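As a rough illustration of finding renames after the fact instead of recording them, the sketch below compares two snapshots and pairs each vanished path with the most similar newly appeared path. This is not git's algorithm: it swaps in Python's difflib.SequenceMatcher as the similarity measure, and the function name detect_renames, the 0.5 threshold and the sample data are all assumptions made for the example.

```python
from difflib import SequenceMatcher


def detect_renames(old, new, threshold=0.5):
    """Pair deleted paths with added paths whose contents are similar.

    old and new map path -> file text for two snapshots. Returns a list of
    (old_path, new_path, similarity) tuples. No rename metadata is consulted;
    everything is inferred from content alone.
    """
    deleted = {path: text for path, text in old.items() if path not in new}
    added = {path: text for path, text in new.items() if path not in old}
    renames = []
    for old_path, old_text in sorted(deleted.items()):
        best_path, best_score = None, threshold
        for new_path, new_text in sorted(added.items()):
            score = SequenceMatcher(
                None, old_text.splitlines(), new_text.splitlines()
            ).ratio()
            if score >= best_score:
                best_path, best_score = new_path, score
        if best_path is not None:
            renames.append((old_path, best_path, best_score))
            del added[best_path]  # each added path is matched at most once
    return renames


if __name__ == "__main__":
    before = {"a.c": "int x;\nint y;\nint z;\n", "keep.c": "k\n"}
    after = {
        "b.c": "int x;\nint y;\nint z;\nint w;\n",
        "keep.c": "k\n",
        "new.c": "totally\ndifferent\n",
    }
    for old_path, new_path, score in detect_renames(before, after):
        print(f"{old_path} -> {new_path} ({score:.2f})")
```

Since the detection runs over snapshots that already exist, it applies to old history without any conversion, which is the point made above about existing repositories benefiting immediately.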
Not recording and not supporting are quite different things. What we don't do is to _record_ renames in the data structure. I personally would not use a word as strong as _wrong_ (and Linus may disagree), but (1) we can support renames without recording them just fine, (2) recording renames would not help to tell users about line movements across files which we would want to do, and (3) we are getting closer to coming up with a way to do even (2) without recording renames. Given these, perhaps I might say recording renames is _pointless_ when I am in a good mood. -
Yes. There's a risk of confusing a feature with an implementation detail. From http://bazaar-vcs.org/RcsComparisons: "If a user can rename a file in the RCS without loosing the RCS history for a file, then renames are considered supported. If the operation resultes in a delete/add (aka "DA pair"), then renames are not considered supported. If the operation results in a copy/delete pair, renames are considered "somewhat" supported. The problem with copy support is that it is hard to define sane merge semantics for copies." The first sentence sounds like a description of a user-visible feature. The rest of it sounds like implementation. And git probably has some deficiencies here, but it'd be more useful to identify them in terms of things a user can't do. --b. -
On Tue, 17 Oct 2006 01:08:59 -0400 The "bzr missing" command sounds like a handy one. Someone on the xorg mailing list was recently lamenting that git does not have an easy way to compare a local branch to a remote one. While this turns out to not be a big problem in git, it might be nice to have such a command. Sean -
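Conceptually, a "missing"-style comparison boils down to a two-way set difference over revision identifiers. The sketch below is only an illustration of that idea; it does not reflect how bzr implements the command, and the function name missing and the revision ids are hypothetical.

```python
def missing(local_revs, remote_revs):
    """Report which revisions each side has that the other lacks.

    Both arguments are lists of revision ids in history order. The result
    keeps that order so the output reads like a short log.
    """
    local_set, remote_set = set(local_revs), set(remote_revs)
    return {
        "local_only": [rev for rev in local_revs if rev not in remote_set],
        "remote_only": [rev for rev in remote_revs if rev not in local_set],
    }


if __name__ == "__main__":
    mine = ["r1", "r2", "r3"]
    theirs = ["r1", "r2", "r4", "r5"]
    report = missing(mine, theirs)
    print("You have extra revisions:", report["local_only"])
    print("You are missing revisions:", report["remote_only"])
```

With globally unique revision ids, the comparison needs only the two lists of ids, not the revision contents.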
Just a small nit here: bzr does /not/ record the move of every file: it records the rename of folder1 to folder2. One piece of data is all that's recorded - no new manifest for the subdirectory is needed. Of course, a user can choose to move all the contents of a folder and not the folder itself - it's up to the user. By recording the folder rename rather than the contents rename, new files added to folder1 in other branches are merged into folder2 automatically, without needing to do arbitrarily deep history processing to determine that. This also does not prevent us doing history analysis as well, to determine other interesting things - such as cross file 'blame' as has been mentioned in this thread
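The effect described above can be sketched with a toy inventory in which every file and directory has a stable id and stores only its parent's id and its own name. This is a simplified model written for illustration, not bzr's actual data structures; the ids, the path_of and merge helpers, and the naive merge rule (take whatever the other side added or changed, with no conflict or deletion handling) are all assumptions.

```python
def path_of(tree, file_id):
    """Build the full path of an entry by walking up through parent ids."""
    parts = []
    while file_id is not None:
        parent_id, name = tree[file_id]
        parts.append(name)
        file_id = parent_id
    return "/".join(reversed(parts))


def merge(base, ours, theirs):
    """Naive three-way merge of inventories keyed by stable ids.

    Starts from ours and applies every entry theirs added or changed
    relative to base. Conflicts and deletions are deliberately ignored.
    """
    merged = dict(ours)
    for file_id, entry in theirs.items():
        if file_id not in base or base[file_id] != entry:
            merged[file_id] = entry
    return merged


# Inventory format: id -> (parent directory id, name).
base = {
    "dir-1": (None, "folder1"),
    "file-a": ("dir-1", "a.txt"),
}

# One branch renames the directory: exactly one record changes.
renamed = dict(base)
renamed["dir-1"] = (None, "folder2")

# Another branch adds a file to the directory it still knows as folder1.
extended = dict(base)
extended["file-b"] = ("dir-1", "b.txt")

merged = merge(base, renamed, extended)

if __name__ == "__main__":
    for file_id in sorted(merged):
        print(file_id, "->", path_of(merged, file_id))
```

The new file lands under the renamed directory because it refers to its parent by id rather than by path, so no history walk is needed to work out where it belongs.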
