Hey all,

I just wanted to let those of you who are interested know that I've been making a lot of progress on the Git Community Book (http://book.git-scm.com). I was wondering if anyone was interested in helping me with a few parts.

For one, there are some sections that I personally have very little experience with, and I am looking for some notes/blog posts/personal experiences on them, namely Advanced History Modification (filter-branch, advanced rebasing, etc.), Corruption Recovery, Branch Tracking, Subversion Integration, Git with Perl/Python/PHP, and Using Git with Editors (especially NetBeans/Eclipse).

Also, the last section of the book is on some of the plumbing - mostly stuff I've found difficult to pick up with the existing documentation while re-implementing stuff in Ruby. I would really appreciate it if someone could proofread some of these chapters for errors:

http://book.git-scm.com/7_the_packfile.html
http://book.git-scm.com/7_raw_git.html
http://book.git-scm.com/7_transfer_protocols.html

Just to keep everyone up to date on where I'm going with the project: some of the next things I'm interested in producing are a cookbook-style guide and some searching tools for all the online documentation. Also, there is now a simple downloadable PDF version of the book, which is kept up to date with the HTML version.

Thanks,
Scott
--
Hello Scott!

Nice book, I just started reading it and I have a recommendation to make. At "Chapter 4: Git Treeishes" you write:

---------
http://book.git-scm.com/4_git_treeishes.html

Range

Finally, you can specify a range of commits with the range spec. This will give you all the commits between 7b593b5 and 51bea1 (where 51bea1 is most recent), excluding 7b593b5 but including 51bea1:

7b593b5..51bea1

This will include every commit since 7b593b:

7b593b..
---------

This is not quite correct. "Commits between A and B" cannot really apply here, because history is not necessarily linear. I believe that "commits reachable from B and not from A" is more precise. Actually, you are already using the "reachability" explanation at the start of "Chapter 3: Basic usage". This issue is also described in the rev-parse man page.

Apart from that, you could also include the "a...b" syntax for completeness.

-christos
--
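The reachability definition suggested in the message above can be illustrated with a small sketch. The commit graph and its commit names below are hypothetical and exist only for this example; the functions model the set arithmetic behind "A..B" and "A...B" and are not how git itself walks history.

```python
# Toy commit graph: each commit maps to the list of its parents.
# All commit names here are made up for illustration.
PARENTS = {
    "root": [],
    "a": ["root"],
    "b": ["a"],
    "side": ["root"],
    "merge": ["b", "side"],
}


def reachable(commit, parents=PARENTS):
    """Return the set of commits reachable from `commit`, including itself."""
    seen = set()
    stack = [commit]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents[current])
    return seen


def two_dot(a, b):
    """A..B: commits reachable from B but not from A."""
    return reachable(b) - reachable(a)


def three_dot(a, b):
    """A...B: commits reachable from A or from B, but not from both."""
    return reachable(a) ^ reachable(b)
```

In this toy graph, `two_dot("a", "merge")` contains `side` even though `side` does not lie "between" `a` and `merge` on any single line of history, which is why the reachability wording is the more precise one.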
Hi,

I just had a very quick look over the PDF, meaning only looking at pictures and headlines.

Just nitpicking about one thing: I was wondering if "Stash Queue" is the right headline, because I usually use

    git stash save
    # oh, an interrupt, have to do something else now

and after this is done:

    git stash pop
    # back to the real work

And if you are interrupted in an interrupt, you want the last stash to be the first one to pop, which is stack-like (last in, first out) behavior.

Of course, there may be cases where you want the queuing behavior that you advertise in the book. I use it rather seldom. But perhaps it is just me :-)

Regards,
Stephan

-- Stephan Beyer <s-beyer@gmx.net>, PGP 0x6EDDD207FCC5040F
--
The checksums in the index file "trailers" are all claiming to be 4 bytes, and that's wrong - they're full SHA1 sums at 20 bytes each.

The v2 pack-file index _also_ has per-object CRCs, and those are indeed just 4 bytes each, and are correctly listed as such.

The pack-file itself also has a few more things there, it's not just the "PACK" string and then the objects. It has two more 32-bit words: a pack file version number and the number of entries in the pack-file (all network byte order). It also has its own checksum at the end (20-byte SHA1 again).

But it looks good otherwise from a quick look.

Linus
--
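The pack-file framing described in the message above (the "PACK" string, two 32-bit words in network byte order, and a 20-byte SHA1 at the end) can be sketched as follows. The function names are hypothetical, and the example builds only an empty pack with no object entries; it is a minimal illustration rather than a real pack writer.

```python
import hashlib
import struct


def build_empty_pack(version=2):
    """Build a pack with zero objects: 12-byte header plus 20-byte SHA-1 trailer."""
    # ">II" packs two unsigned 32-bit integers in network (big-endian) byte order:
    # the pack version and the number of entries.
    header = b"PACK" + struct.pack(">II", version, 0)
    # The trailer is the SHA-1 of everything that precedes it.
    return header + hashlib.sha1(header).digest()


def parse_pack_header(data):
    """Return (version, entry_count) read from the first 12 bytes of a pack."""
    signature, version, count = struct.unpack(">4sII", data[:12])
    if signature != b"PACK":
        raise ValueError("not a pack file")
    return version, count


def verify_pack_trailer(data):
    """Check that the last 20 bytes are the SHA-1 of all preceding bytes."""
    return hashlib.sha1(data[:-20]).digest() == data[-20:]
```

A reader for real packs would go on to decode `entry_count` object entries between the header and the trailer; that part is omitted here.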
Nice pictures.

You might also want to know that code for reading pack idx version 2 was backported to 1.4.4.5 for people who are stuck on the 1.4.4 series for whatever reason.

What is the target audience of this section? If it is written for the merely curious, or if it is written to give "here is the general idea, for more details read the source", the level of detail here would be OK.

If you are writing for people who want to (re)implement something that produces these files, you might want to at least say that the offset/sha1[] table is sorted by sha1[] values (this is to allow binary search of this table), and that the fanout[] table points at the offset/sha1[] table in a specific way (so that the part of the latter table covering all hashes that start with a given byte can be found directly, saving 8 iterations of the binary search).

The <data> part is just a zlib stream for non-delta object types; for the two delta object representations, the <data> portion contains something that identifies which base object this delta representation depends on, and the delta to apply on the base object to resurrect this object. ref-delta uses the 20-byte hash of the base object at the beginning of <data>, while ofs-delta stores an offset within the same packfile to identify the base object. In either case, two important constraints a reimplementor must adhere to are:

* delta representation must be based on some other object within the same packfile;
* the base object must be of the same underlying type (blob, tree, commit

I am guessing this is for Porcelain writers who use plumbing. Please don't teach echoing into .git/refs/... but DO teach using update-ref with the -m option. We do not want people's random Porcelains flipping the tip of branches without leaving a trail in the reflog for users to use to recover from mistakes.
--
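The fanout-plus-binary-search lookup mentioned in the message above can be sketched like this. It assumes the usual fanout convention, where entry b holds the number of objects whose first hash byte is less than or equal to b; the function names are hypothetical, and the sketch works on an in-memory list of hashes rather than a real idx file.

```python
import bisect
import hashlib


def build_fanout(sorted_shas):
    """Return 256 cumulative counts: fanout[b] = number of hashes with first byte <= b."""
    counts = [0] * 256
    for sha in sorted_shas:
        counts[sha[0]] += 1
    fanout = []
    total = 0
    for count in counts:
        total += count
        fanout.append(total)
    return fanout


def find_index(sorted_shas, fanout, sha):
    """Return the position of `sha` in the sorted table, or None if absent."""
    first = sha[0]
    # The fanout table narrows the search to hashes sharing the first byte.
    lo = fanout[first - 1] if first > 0 else 0
    hi = fanout[first]
    # Binary search only inside that slice of the sorted table.
    i = bisect.bisect_left(sorted_shas, sha, lo, hi)
    if i < hi and sorted_shas[i] == sha:
        return i
    return None
```

In a real index the position returned here would then be used to read the matching pack offset from the parallel offset table.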
I've implemented all of these and Linus's fixes and suggestions. Thanks for the feedback. To answer your earlier question, these docs are basically for people working on bindings/re-implementations in other languages, since there is no real linked library available yet, as a primer before they dig into the source, or possibly so they don't have to. I'm not fantastic at C, so it took me a while in some cases - figuring out that the size listed in the object header was not the actual size of the data, but the size of it when expanded, for example, was not very easy to do. I've been doing a lot of work on re-implementations in Ruby and ObjC because I can't easily make real bindings, so I thought I would add things that I could not easily find in the docs for others that are trying in other languages. If you want, I could create a patch for any of this stuff to Documentation/ (that goes for the whole book), but someone will have to tell me which parts might be useful to add. Thanks again for taking the time! Scott --
OK, time for me to throw in comments. ;-)

I do like this book, it's organized and concise. Thanks for doing it.

http://book.git-scm.com/7_how_git_stores_objects.html:

The loose object formatting of

    header = "#{type} #{size}#body"
    store = header + content

I can't read Ruby so I'm not sure what the header value computes out to here. #body should be a \0. I'm also not sure that the prior line setting size = content.length.to_s is very clear for the non-Ruby people to understand how a size is formatted.

If the code shown here is the Ruby implementation I'm a little concerned about it writing directly into the loose object. If the write is partial then you have a partial object which is at the right name, but is unusable. That can give you corruption that is difficult to track down and fix. C Git and JGit both write to temporary files then atomically move the temporary file into position under its proper name only after it has been fully written. Anyone writing an implementation should be offered this advice, probably right here in this section of the book.

"When objects are written to disk, it is often in the loose format, since that format is less expensive to access."

I'm not sure that statement is true. Access from packs tends to scream compared to access from loose objects. The overhead of opening and closing the file descriptors, even on Linux, is what kills performance for data access. However, Git writes to loose objects first and packs later for _safety_, not efficiency. It is admittedly a lot more efficient to write a 2 KB loose object than to rewrite a 50 MB pack, but it's also less likely to fail and make you lose your work.

http://book.git-scm.com/7_the_git_index.html:

I wouldn't say that the index stores permissions. More like it stores the "class" or "type" of the thing located at that path. There are 4 major classes:

- regular file
- executable file
- symbolic link
- git submodule

The 5th class is the s...
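The loose-object advice in the message above (a header of type, size and a \0 byte, followed by the content, written to a temporary file and then moved into place) can be sketched as follows. This is a minimal illustration under stated assumptions, not the C Git or JGit code: the function names and the `tmp_obj_` prefix are hypothetical, and the size is taken to be the decimal byte length of the content.

```python
import hashlib
import os
import tempfile
import zlib


def loose_object_bytes(obj_type, content):
    """Return the uncompressed store: "<type> <size>" + NUL byte + content."""
    # The size is the content length in bytes, written as decimal ASCII.
    header = f"{obj_type} {len(content)}".encode("ascii") + b"\0"
    return header + content


def write_loose_object(objects_dir, obj_type, content):
    """Write one loose object atomically and return its hex SHA-1."""
    store = loose_object_bytes(obj_type, content)
    sha = hashlib.sha1(store).hexdigest()
    target_dir = os.path.join(objects_dir, sha[:2])
    target = os.path.join(target_dir, sha[2:])
    os.makedirs(target_dir, exist_ok=True)
    if os.path.exists(target):
        return sha  # same name means same content, nothing to write
    # Write to a temporary file on the same filesystem first, so a partial
    # write can never appear under the object's final name.
    fd, tmp_path = tempfile.mkstemp(dir=objects_dir, prefix="tmp_obj_")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(zlib.compress(store))
            tmp.flush()
            os.fsync(tmp.fileno())
        os.replace(tmp_path, target)  # atomic rename into position
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
    return sha
```

Keeping the temporary file inside the objects directory is a deliberate choice here: a rename is only atomic when source and destination are on the same filesystem.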
Thanks a ton for this, I'll incorporate all of this.

Sorry, the markdown thingy is translating all the '\0's to '#body' for some freaking reason unless I write it as '\\0'. I'll fix this - it's difficult for me to find these sometimes. As for the rest of the ruby

That is a good idea - I don't do it that way and I certainly will change the implementation to do so and modify these docs to reflect

Thanks for the clarification. I write to loose objects first largely because it's so much easier to do. But also because I don't mmap objects, so packfile access is not faster for implementations that can't do that very well. Also, I had originally meant "less expensive

Interesting. This documentation is actually from the User Manual - I'll update this chapter first and if it looks better, I'll submit a

I'm an idiot. I say this because I actually implemented a bunch of this stuff (in Ruby) and ran into most of these issues when trying to implement it. So I knew these things not 3 weeks ago, but I still wrote it this way. Dur. Thanks for the corrections, I'll update

Thanks again for all the time it must have taken to review all of this - I'll make sure it gets into the book, and where appropriate, back into the UM or other internal git docs.

Scott
--
I have experience mixing C and Ruby code if you are interested, it's actually quite easy. I also think a shared library would make sense. Keep up the good work ;) -- Felipe Contreras --
I'm going to bite and ask the obvious questions:

1. How does what you're producing differ from the current Git Users' Manual?
2. Is this project of yours aiming to obsolete the Git Users' Manual with "official" sanctioning from people involved with Git?
3. Assuming 2 is a "no", patches to the Users' Guide would be nice. :)

-- Thomas Adam
--
Just for reference, a lot of this was discussed here a while back:

http://thread.gmane.org/gmane.comp.version-control.git/90653

I'm going for a different audience with this project. I'd like for it to be a lot more user-friendly, easily digestible, and to include

I think there will be people who prefer the Users Manual format, who think screencasts are wussy :) Also, I'm not sure an "official" sanctioning would do much of anything - because of the images and screencasts, this will never be included in the git source like the UM is, but it's also open source so if

I would love to do this, but I don't know what exactly the community thinks is missing/lacking. My ideas about what is helpful are rarely the same as the git list's :) However, if someone pointed to one of the chapters I wrote and said "that would be great in the UM", I would happily convert it.

Scott
--
