Git Community Book

Previous thread: [PATCH] avoid gitweb uninitialized value warning by Joey Hess on Friday, September 5, 2008 - 2:26 pm. (2 messages)

Next thread: [PATCH] Add --dirstat-by-file diff option by Heikki Orsila on Friday, September 5, 2008 - 3:27 pm. (3 messages)
To: git list <git@...>
Date: Friday, September 5, 2008 - 3:08 pm

Hey all,

I just wanted to let those of you who are interested know that I've
been making a lot of progress on the Git Community Book
(http://book.git-scm.com).  I was wondering if anyone was interested in
helping me with a few parts.  For one, there are some sections that I
personally have very little experience with, and was looking for some
notes/blog posts/personal experiences on, namely Advanced History
Modification (filter-branch, advanced rebasing, etc), Corruption
Recovery, Branch Tracking, Subversion Integration, Git with
Perl/Python/PHP, and Using Git with Editors (especially
NetBeans/Eclipse).

Also, the last section of the book is on some of the plumbing - mostly
stuff I've found difficult to pick up with the existing documentation
while re-implementing stuff in Ruby.  I would really appreciate it if
someone could proofread some of these chapters for errors:

http://book.git-scm.com/7_the_packfile.html
http://book.git-scm.com/7_raw_git.html
http://book.git-scm.com/7_transfer_protocols.html

Some of the next things I'm interested in producing are a cookbook
style guide and some searching tools for all the online documentation,
just to keep everyone up to date on where I'm going with the project.
Also, there is now a simple PDF downloadable version of the book
available and being kept up to date with the html version.

Thanks,
Scott
--
To: Scott Chacon <schacon@...>
Cc: git list <git@...>
Date: Saturday, September 6, 2008 - 2:26 pm

Hello Scott!

Nice book, I just started reading it and I have a recommendation to
make, at "Chapter 4: Git Treeishes" you write

---------
http://book.git-scm.com/4_git_treeishes.html
Range

Finally, you can specify a range of commits with the range spec. This
will give you all the commits between 7b593b5 and 51bea1 (where 51bea1
is most recent), excluding 7b593b5 but including 51bea1:

7b593b5..51bea1

This will include every commit since 7b593b:

7b593b..
---------

This is not quite correct. "commits between A and B" cannot really
apply here. I believe that "commits reachable from B and not from A"
is more precise. Actually you are already using the "reachability"
explanation at the start of "Chapter 3: Basic usage".

This issue is also described at the rev-parse man page.

Apart from that, you could also include "a...b" syntax for completeness.
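
The reachability reading of "A..B" and "A...B" can be sketched in a few
lines of Python. This is only an illustration on a made-up history (the
commit names and the PARENTS map are hypothetical), not how git walks
revisions internally. Note that A is not an ancestor of B here, so
"between A and B" has no sensible meaning, yet both range forms are
well defined.

```python
def reachable(parents, tip):
    """Return the set of commits reachable from tip via parent links."""
    seen = set()
    stack = [tip]
    while stack:
        commit = stack.pop()
        if commit in seen:
            continue
        seen.add(commit)
        stack.extend(parents.get(commit, []))
    return seen


def two_dot(parents, a, b):
    """A..B: commits reachable from B but not from A."""
    return reachable(parents, b) - reachable(parents, a)


def three_dot(parents, a, b):
    """A...B: commits reachable from A or B, but not from both."""
    return reachable(parents, a) ^ reachable(parents, b)


# Hypothetical history with two diverged branches:
#   root <- x <- A
#   root <- y <- B
PARENTS = {"root": [], "x": ["root"], "A": ["x"], "y": ["root"], "B": ["y"]}

print(sorted(two_dot(PARENTS, "A", "B")))
print(sorted(three_dot(PARENTS, "A", "B")))
```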

-christos
--
To: Scott Chacon <schacon@...>
Cc: git list <git@...>
Date: Friday, September 5, 2008 - 8:48 pm

Hi,


I just had a very quick look over the PDF, meaning only looking at
pictures and headlines.

Just nitpicking about one thing:
I was wondering if "Stash Queue" is the right headline, because I
usually use

	git stash save	# oh, an interrupt, have to do something else now

and after this is done:

	git stash pop	# back to the real work

And if you are interrupted in an interrupt, you want the last stash
to be the first one popped, which is stack-like (last in, first out)
behavior.

Of course, there may be cases where you want the queuing behavior that
you advertise in the book.
I use it rather seldom. But perhaps it is just me :-)
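
The difference between the two orderings can be shown with a small
Python sketch. It only models the order in which stashed work comes
back; the function names and the work labels are made up for the
illustration and have nothing to do with git's own implementation.

```python
from collections import deque


def resume_order_stack(interrupted_work):
    """Last in, first out: the most recently stashed work comes back first."""
    stash = list(interrupted_work)
    return [stash.pop() for _ in range(len(stash))]


def resume_order_queue(interrupted_work):
    """First in, first out: the oldest stashed work comes back first."""
    stash = deque(interrupted_work)
    return [stash.popleft() for _ in range(len(stash))]


# "feature" was stashed first, then "bugfix" was stashed while handling
# the first interruption.
print(resume_order_stack(["feature", "bugfix"]))
print(resume_order_queue(["feature", "bugfix"]))
```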

Regards,
  Stephan

-- 
Stephan Beyer <s-beyer@gmx.net>, PGP 0x6EDDD207FCC5040F
--
To: Scott Chacon <schacon@...>
Cc: git list <git@...>
Date: Friday, September 5, 2008 - 4:27 pm

The checksums in the index file "trailers" are all claiming to be 4 bytes, 
and that's wrong - they're full SHA1 sums at 20 bytes each.

The v2 pack-file _also_ has per-object CRC's, and those are indeed just 4 
bytes each, and are correctly listed as such.

The pack-file itself also has a few more things there, it's not just the 
"PACK" string and then the objects. It has two more 32-bit words: a pack 
file version number and the number of entries in the pack-file (all 
network byte order). It also has its own checksum at the end (20-byte SHA1 
again).
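
A minimal Python sketch of the pack-file framing described above: the
"PACK" string, two 32-bit words in network byte order (version and
entry count), and a 20-byte SHA1 at the end. The function names are
hypothetical. That the trailing checksum covers every byte before it is
an assumption taken from general knowledge of the format, not from this
message. The example builds a pack with zero entries so that it stays
self-contained.

```python
import hashlib
import struct


def build_empty_pack(version=2):
    """Build a pack with no objects: 12-byte header plus SHA1 trailer."""
    header = b"PACK" + struct.pack(">II", version, 0)
    return header + hashlib.sha1(header).digest()


def parse_pack_framing(data):
    """Return (version, entry_count) after checking magic and trailer."""
    magic, version, count = struct.unpack(">4sII", data[:12])
    if magic != b"PACK":
        raise ValueError("not a pack-file")
    body, trailer = data[:-20], data[-20:]
    # Assumption: the trailer is the SHA1 of everything that precedes it.
    if hashlib.sha1(body).digest() != trailer:
        raise ValueError("pack checksum mismatch")
    return version, count


pack = build_empty_pack()
print(len(pack), parse_pack_framing(pack))
```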

But looks good otherwise from a quick look.

		Linus
--
To: Scott Chacon <schacon@...>
Cc: git list <git@...>
Date: Friday, September 5, 2008 - 3:41 pm

Nice pictures.  You might also want to know that code for reading pack idx
version 2 was backported to 1.4.4.5 for people who are stuck on 1.4.4
series for whatever reason.

What is the target audience of this section?  If it is written for a mere
curious type, or if it is written to give "here is the general idea, for
more details read the source", the level of detail here would be Ok.

If you are writing for people who want to (re)implement something that
produces these files, you might want to at least say that offset/sha1[]
table is sorted by sha1[] values (this is to allow binary search of this
table), and fanout[] table points at the offset/sha1[] table in a specific
way (so that part of the latter table that covers all hashes that start
with a given byte can be found to avoid 8 iterations of the binary
search).
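
The lookup described above might look roughly like this in Python. It
is a sketch under one stated assumption: fanout[b] holds the number of
objects whose first hash byte is less than or equal to b, so the slice
for a given first byte runs from fanout[b - 1] to fanout[b]. The
function names are hypothetical, and the hashes are generated only to
have something sorted to search.

```python
import bisect
import hashlib


def build_fanout(sorted_shas):
    """Cumulative counts: fanout[b] = number of hashes with first byte <= b."""
    fanout = [0] * 256
    for sha in sorted_shas:
        fanout[sha[0]] += 1
    for i in range(1, 256):
        fanout[i] += fanout[i - 1]
    return fanout


def find(sorted_shas, fanout, sha):
    """Return the index of sha in the sorted table, or None if absent."""
    first = sha[0]
    lo = fanout[first - 1] if first > 0 else 0
    hi = fanout[first]
    # Binary search only within the slice that shares the first byte.
    i = bisect.bisect_left(sorted_shas, sha, lo, hi)
    if i < hi and sorted_shas[i] == sha:
        return i
    return None


shas = sorted(hashlib.sha1(bytes([n])).digest() for n in range(50))
fanout = build_fanout(shas)
print(find(shas, fanout, shas[7]))
```

The index returned is the position in the sorted sha1 table, which is
also the position to use in the parallel offset table.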

<data> part is just zlib stream for non-delta object types; for the two
delta object representations, the <data> portion contains something that
identifies which base object this delta representation depends on, and the
delta to apply on the base object to resurrect this object.  ref-delta
uses 20-byte hash of the base object at the beginning of <data>, while
ofs-delta stores an offset within the same packfile to identify the base
object.  In either case, two important constraints a reimplementor must
adhere to are:

 * delta representation must be based on some other object within the same
   packfile;

 * the base object must be of the same underlying type (blob, tree, commit

I am guessing this is for Porcelain writers who use plumbing.  Please
don't teach echoing into .git/refs/...  but DO teach using update-ref with
the -m option.  We do not want people's random Porcelains flipping the tip
of branches without leaving a trail in the reflog for users to use to
recover from mistakes.
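
For a Porcelain written in a scripting language, that advice could be
followed with something like the Python sketch below. It shells out to
"git update-ref -m <reason> <ref> <newvalue> [<oldvalue>]"; the wrapper
names, the ref, and the reason text are made up for the example.
Passing the old value is optional and makes git refuse the update if
the ref has moved in the meantime.

```python
import subprocess


def update_ref_command(ref, new_sha, reason, old_sha=None):
    """Build the argv for git update-ref with a reflog message."""
    cmd = ["git", "update-ref", "-m", reason, ref, new_sha]
    if old_sha is not None:
        # Only update if the ref currently points at old_sha.
        cmd.append(old_sha)
    return cmd


def update_ref(repo_dir, ref, new_sha, reason, old_sha=None):
    """Run the update inside repo_dir; raises CalledProcessError on failure."""
    subprocess.run(
        update_ref_command(ref, new_sha, reason, old_sha),
        cwd=repo_dir,
        check=True,
    )


print(update_ref_command("refs/heads/master", "a" * 40, "myporcelain: reset"))
```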




--
To: Junio C Hamano <gitster@...>
Cc: git list <git@...>
Date: Friday, September 5, 2008 - 5:34 pm

I've implemented all of these and Linus's fixes and suggestions.
Thanks for the feedback.

To answer your earlier question, these docs are basically for people
working on bindings/re-implementations in other languages, since there
is no real linked library available yet, as a primer before they dig
into the source, or possibly so they don't have to.

I'm not fantastic at C, so it took me a while in some cases - figuring
out that the size listed in the object header was not the actual size
of the data, but the size of it when expanded, for example, was not
very easy to do.  I've been doing a lot of work on re-implementations
in Ruby and ObjC because I can't easily make real bindings, so I
thought I would add things that I could not easily find in the docs
for others that are trying in other languages.
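
The point about the header size can be illustrated with a short Python
sketch. It assumes the packed object header layout as commonly
documented (type in bits 4-6 of the first byte, size in the low 4 bits,
then 7 more size bits per continuation byte while the high bit is set);
that layout is not spelled out in this thread, and the function name is
hypothetical. The size decoded from the header matches the inflated
data, not the number of compressed bytes that follow.

```python
import zlib


def parse_object_header(data, pos=0):
    """Decode (type, expanded_size, next_pos) from a packed object header."""
    byte = data[pos]
    pos += 1
    obj_type = (byte >> 4) & 0x07
    size = byte & 0x0F
    shift = 4
    while byte & 0x80:  # high bit set: another size byte follows
        byte = data[pos]
        pos += 1
        size |= (byte & 0x7F) << shift
        shift += 7
    return obj_type, size, pos


# Hand-built entry: type 3 (blob), size 1000, followed by the zlib stream.
payload = b"a" * 1000
entry = bytes([0xB8, 0x3E]) + zlib.compress(payload)

obj_type, size, pos = parse_object_header(entry)
inflated = zlib.decompress(entry[pos:])
print(obj_type, size, len(entry) - pos, len(inflated))
```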

If you want, I could create a patch for any of this stuff to
Documentation/ (that goes for the whole book), but someone will have
to tell me which parts might be useful to add.

Thanks again for taking the time!
Scott
--
To: Scott Chacon <schacon@...>
Cc: Junio C Hamano <gitster@...>, git list <git@...>
Date: Saturday, September 6, 2008 - 2:33 am

OK, time for me to throw in comments.  ;-)

I do like this book, it's organized and concise.  Thanks for doing it.


http://book.git-scm.com/7_how_git_stores_objects.html:

The loose object formatting of

 header = "#{type} #{size}#body"
 store = header + content

I can't read Ruby so I'm not sure what the header value computes
out to here.  #body should be a \0.  I'm also not sure that the
prior line setting size = content.length.to_s is very clear for
the non-Ruby people to understand how a size is formatted.

If the code shown here is the Ruby implementation I'm a little
concerned about it writing directly into the loose object.  If the
write is partial then you have a partial object which is at the
right name, but is unusable.  That can give you corruption that
is difficult to track down and fix.  C Git and JGit both write
to temporary files then atomically move the temporary file into
position under its proper name only after it has been fully written.

Anyone implementing this should be offered this advice, probably
right here in this section of the book.
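
A rough Python sketch of both points: the header is the type, a space,
the content length as decimal ASCII, and a \0 byte, and the object is
written to a temporary file that is only moved into place once it is
complete. The function names and the "tmp_obj_" prefix are made up, and
the rename step is a simplification that is not claimed to match what
C Git or JGit do in detail.

```python
import hashlib
import os
import tempfile
import zlib


def encode_loose(obj_type, content):
    """Return (sha1_hex, compressed_bytes) for a loose object."""
    header = f"{obj_type} {len(content)}".encode("ascii") + b"\0"
    store = header + content
    return hashlib.sha1(store).hexdigest(), zlib.compress(store)


def write_loose(objects_dir, obj_type, content):
    """Write the object under objects_dir without exposing a partial file."""
    sha, compressed = encode_loose(obj_type, content)
    final_dir = os.path.join(objects_dir, sha[:2])
    final_path = os.path.join(final_dir, sha[2:])
    os.makedirs(final_dir, exist_ok=True)
    if os.path.exists(final_path):
        return sha  # same name means same content; nothing to do
    # Temp file lives on the same filesystem so the rename is atomic.
    fd, tmp_path = tempfile.mkstemp(dir=objects_dir, prefix="tmp_obj_")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(compressed)
        os.replace(tmp_path, final_path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
    return sha


objects_dir = tempfile.mkdtemp()
print(write_loose(objects_dir, "blob", b"hello\n"))
```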

"When objects are written to disk, it is often in the loose format,
since that format is less expensive to access."

I'm not sure that statement is true.  Access from packs tends
to scream compared to access from loose objects.  The overhead
of opening and closing the file descriptors, even on Linux, is
what kills performance for data access.  However, Git writes to
loose objects first and packs later for _safety_, not efficiency.
It is a lot more efficient to write a 2 KB loose object and avoid
rewriting a 50 MB pack, but it's also less likely to fail and make
you lose your work.


http://book.git-scm.com/7_the_git_index.html:

I wouldn't say that the index stores permissions.  More like it
stores the "class" or "type" of the thing located at that path.
There are 4 major classes:

	- regular file
	- executable file
	- symbolic link
	- git submodule

The 5th class is the s...
To: Shawn O. Pearce <spearce@...>
Cc: Junio C Hamano <gitster@...>, git list <git@...>
Date: Saturday, September 6, 2008 - 2:14 pm

Thanks a ton for this, I'll incorporate all of this.


Sorry, the markdown thingy is translating all the '\0's to '#body' for
some freaking reason unless I write it as '\\0'.  I'll fix this - it's
difficult for me to find these sometimes.  As for the rest of the ruby

That is a good idea - I don't do it that way and I certainly will
change the implementation to do so and modify these docs to reflect

Thanks for the clarification.  I write to loose objects first largely
because it's so much easier to do.  But also because I don't mmap
objects, so packfile access is not faster for implementations that
can't do that very well.  Also, I had originally meant "less expensive

Interesting.  This documentation is actually from the User Manual -
I'll update this chapter first and if it looks better, I'll submit a


I'm an idiot.  I say this because I actually implemented a bunch of
this stuff (in Ruby) and ran into most of these issues when trying to
implement it.  So I knew these things not 3 weeks ago, but I still
wrote it this way.  Dur.  Thanks for the corrections, I'll update


Thanks again for all the time it must have taken to review all of this
- I'll make sure it gets into the book, and where appropriate, back
into the UM or other internal git docs.

Scott
--
To: Scott Chacon <schacon@...>
Cc: Junio C Hamano <gitster@...>, git list <git@...>
Date: Friday, September 5, 2008 - 6:09 pm

I have experience mixing C and Ruby code, if you are interested; it's
actually quite easy.

I also think a shared library would make sense.

Keep up the good work ;)

-- 
Felipe Contreras
--
To: Scott Chacon <schacon@...>
Cc: git list <git@...>
Date: Friday, September 5, 2008 - 3:15 pm

I'm going to bite and ask the obvious questions:

1.  How does what you're producing differ from the current Git Users' Manual?
2.  Is this project of yours aiming to obsolete the Git Users' Manual
with "official" sanctioning from people involved with Git?
3.  Assuming 2 is a "no", patches to the Users' Guide would be nice.  :)

-- Thomas Adam
--
To: Thomas Adam <thomas.adam22@...>
Cc: git list <git@...>
Date: Friday, September 5, 2008 - 4:45 pm

Just for reference, a lot of this was discussed here a while back:

http://thread.gmane.org/gmane.comp.version-control.git/90653


I'm going for a different audience with this project.  I'd like for it
to be a lot more user-friendly, easily digestible, and to include

I think there will be people who prefer the Users Manual format, who
think screencasts are wussy :)
Also, I'm not sure an "official" sanctioning would do much of anything
- because of the images and screencasts, this will never be included
in the git source like the UM is, but it's also open source so if

I would love to do this, but I don't know what exactly the community
thinks is missing/lacking.  My ideas about what is helpful are rarely
the same as the git list's :)  However, if someone pointed to one of
the chapters I wrote and said "that would be great in the UM", I would
happily convert it.

Scott
--