Re: Switching from CVS to GIT

From: Benoit SIGOURE
Date: Sunday, October 14, 2007 - 10:10 am

Context: GNU make seems to be willing to switch from CVS to ...  
something else.


I think the best thing to do is to ask directly on the Git ML.

Someone already pointed out that he'd like to use Git on Windows but  
doesn't want to install either Cygwin or MSYS.  Is this possible, or  
will it be possible in the near future?  Is it possible to use one of  
the various GUIs (git-gui, gitk, qgit) on Windows without requiring a  
POSIXish shell etc.?

When will the librarification of Git be finished?  (if Git is  
available as a library, and if this library works on Windows, it will  
greatly help truly native Windows ports).

Not that I like Windows in any way, right, but it's legitimate for  
people working on Windows ports of various software to be willing to  
have a truly native port of Git for Windows.

-- 
Benoit Sigoure aka Tsuna
EPITA Research and Development Laboratory


From: Marco Costalba
Date: Sunday, October 14, 2007 - 11:06 am

qgit-2.0 works natively under Windows

http://sourceforge.net/project/showfiles.php?group_id=139897

Check the README for how to install.


Marco
-

From: Johannes Schindelin
Date: Sunday, October 14, 2007 - 11:20 am

Hi,


There is msysGit.  This project is nearing its first beta, and has been 
self-hosted since mid-August IIRC.

It is a port of Git to MinGW, using parts of MSys as long as we have 
dependencies on bash and perl.

I have no doubt that we'll manage to finish version 0.3 of the installer 
this week, though we have not yet decided whether it is still alpha or already beta.

There are some issues with using msysGit, none of them really serious, but 
you had better be ready to ask questions on this list or #git in case 
something crops up.  msysGit is young.

Having said that, IMHO msysGit is already quite usable, and should be 
pretty stable within a few weeks (if it is not already).

Ciao,
Dscho

-

From: Martin Langhoff
Date: Sunday, October 14, 2007 - 10:35 pm

I've been using it recently, and I have to say it's pretty impressive -
you can use it from cmd.exe or from a bash window (courtesy of the
msys environment included). The GUIs that ship with git are there
(git-gui, gitk).

I use gitk extensively, and it works *great*. My work style is a
shell window for status/diff/commit actions and one or more gitk
windows for browsing project history. You can use git-gui for a visual
status/diff/commit workflow, or qgit. qgit is more integrated, and
might feel more "at home" for users that expect something more
MDI-ish.

cheers.


martin
-

From: Andreas Ericsson
Date: Sunday, October 14, 2007 - 11:27 am

It is sort of possible. Without cygwin he'll be in the black for the few
features that are still implemented as shell-scripts, but perhaps he/she

qgit is possible to use natively, if one installs the qgit4 libraries for
windows, but it's more of a viewer than an action gui. git-gui and gitk
are usable if you have the windows TCL port. I haven't tried it, but
there are installers available, so testing it out (with all dependencies)

When someone gets around to doing it ;-)

For a real answer, I'll have to defer to others. Everything works to my
satisfaction where I'm using it, so I'm not very inclined to fiddle with

Yup. I believe the primary reason for libification is to easier support

Naturally. Amazingly few of those stuck with windows have so far
volunteered for helping out though, and since many of us on this list
don't even have a windows system for testing, it's kinda slow going :-/

I'd imagine getting in touch with Dscho to get a list of what's needed,
or reading the biweekly msys.git herald on this list, is the best way
of finding out the porting project's current priorities.

-- 
Andreas Ericsson                   andreas.ericsson@op5.se
OP5 AB                             www.op5.se
Tel: +46 8-230225                  Fax: +46 8-230231
-

From: Johannes Schindelin
Date: Sunday, October 14, 2007 - 11:39 am

Hi,


Umm.  There are quite a few shell scripts still _necessary_ to run git: 
git-commit, git-fetch and git-merge being the most prominent ones.  The 
first two are in the process of being rewritten _right_ _now_, but no 
official git release has them yet.

And I have to disagree strongly with the "black": In msysGit (which brings 

FWIW msysGit comes with Tcl.  You can run git gui and gitk without any 

There has been a GSoC project, and it has a nice small API which can be 
called from Python, for example.

Funnily enough, the first user is qgit as far as I know, which is written 

Why?

I do not see any reason why libification helps the user experience on 
Windows.

Ciao,
Dscho

-

From: Andreas Ericsson
Date: Sunday, October 14, 2007 - 12:09 pm

Oh? I didn't know that. Windows and its unixifying toolboxes is unknown

Yes, my phrasing there was a bit obscure. I meant that all dependencies

I was under the impression that the windows port suffers from Windows'
lack of a proper fork() and friends and that a proper library would
help solving those problems. Perhaps I was misinformed.

-

From: Johannes Schindelin
Date: Sunday, October 14, 2007 - 1:14 pm

Hi,


It suffered.  Until Hannes Sixt did a very fine job which culminated in the 
patch series he posted yesterday.  Of course, this work is the reason 
msysGit is functional.

Ciao,
Dscho

-

From: Alex Riesen
Date: Sunday, October 14, 2007 - 3:14 pm

Re "functional": I have to remind you of a few things (besides fork):

Filesystem:

- no proper VFS (can't do anything with files opened elsewhere, and we
  have not enough error handling and diagnostic output to detect the
  problems)

- no proper filename semantics (case-insensitivity and stupid rules for
  allowed characters in filenames, like ":" in filenames in
  cross-platform projects)

- no acceptable level of performance in filesystem and VFS (readdir,
  stat, open and read/write are annoyingly slow)

- it is the only OS in the world with multi-root (/a/b/c and /a/b/c
  can refer to different files, depending on what the current "drive"
  is) and multi-cwd, which hasn't turned into a problem yet, but
  surely will

- no real "mmap" (which kills performance and complicates code)

Interprocess communication:

- no reliable text environment (I'm programming in the damn thing for
  10 years and I still don't know how to pass an environment variable
  _for_sure_)

- it has only one argument (limited in size) passed to started
  programs, which means that there is no possible way to safely pass
  file and text arguments on command line (more than one, that is)
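Alex's last point can be made concrete. A Windows process receives its arguments as one command-line string, and each program's C runtime splits that string back into argv, so the caller has to quote. The sketch below leans on `subprocess.list2cmdline`, an internal, lightly documented CPython helper that applies the Microsoft C runtime quoting rules; the wrapper name `flatten_argv` is made up for illustration, and nothing here is what git itself does.

```python
import subprocess


def flatten_argv(args: list[str]) -> str:
    """Join an argument list into the single string a Windows child receives.

    subprocess.list2cmdline quotes arguments containing spaces or tabs (and
    empty ones) and backslash-escapes embedded double quotes. The child's
    runtime has to undo exactly these rules for the round trip to be lossless.
    """
    return subprocess.list2cmdline(args)


args = ["git", "commit", "-m", 'fix "quoted" name', "dir with spaces\\", ""]
print(flatten_argv(args))
# git commit -m "fix \"quoted\" name" "dir with spaces\\" ""
```

The trailing backslash of the fifth argument has to be doubled because it now precedes a closing quote, one example of the rules that make passing several arguments safely harder than it looks.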


-

From: Eli Zaretskii
Date: Sunday, October 14, 2007 - 3:41 pm

That's a 20-20 hindsight: if you deliberately write a program to rely
heavily on Posix-isms, don't be surprised when you discover that it

I'm not sure what you are talking about.  What VFS do you use on

There's a flag on Windows to open files case-sensitively, if you need
that.  In any case, I don't see how this can be of any real relevance
to porting GIT.  As for ":" in file names, simply don't use it, like
you don't use white space or characters below 32 decimal: it's

With what libraries?  Native `stat' and `readdir' are quite fast.
Perhaps you mean the ported glibc (libgw32c), where `readdir' is

So what? On Unix "a/b/c" can also refer to different files.  Both cases are simply
not complete file names, that's all.  No one said there must be a
single root for all volumes, it's the Posix jingoism creeping in



Not enough context, so I cannot talk intelligently about this.  Why do
you need interprocess communication in the first place? why not simply
give birth to a subsidiary process and pass it a command line (which
can be up to 32KB)?
-

From: Johannes Schindelin
Date: Sunday, October 14, 2007 - 4:45 pm

Hi,


The problem is that on Windows, you cannot keep a file open and delete it 
at the same time.  This is an issue in Windows' equivalent of VFS.

A neat trick to work with temporary files without permission issues is to 
open the file and delete it right after that.  This does not work on 

The problem is not so much opening, but determining if an existing file 
and a file in the index have the same name.

For example, "README" in the index, but "readme" in the working directory, 
will be handled as "deleted/untracked" by the current machinery.  IOW git 
will not know that what it gets from readdir() as "readme" really is the 

No, native.

Once you experienced the performance of git on Linux, then rebooted into 
Windows on the same box, you will grow a beard while waiting for trivial 
operations.

Sure, git kicks ass on Windows, but only as compared to other programs _on 

I think Alex means this: you can have C:\a\b\c and D:\a\b\c.  So depending 
on which drive you are, you mean one or the other.  Just comparing the 

Yes.  And we rely on the performance very much.

Hth,
Dscho

-

From: Eli Zaretskii
Date: Sunday, October 14, 2007 - 9:06 pm

That is no longer true, for quite some time.  NT4 and later versions

Maybe GIT assumes too much about `readdir' and `stat', and should

What _I_ meant is that the C: part is part of the full file name,

There's no need for mmap to get memory performance, except if sbrk and
friends are too slow.
-

From: Eli Zaretskii
Date: Sunday, October 14, 2007 - 10:56 pm

That's because you think file names are simple strings and can be
compared by simple string comparison.  This naïve view is not true
even on POSIX systems: "foo/bar" and "/a/b/foo/bar" can be the same
file, as well as "/a/b/c/d" and "/x/y/z", given the right symlinks.
But for some reason that eludes me, people who are accustomed to POSIX
stop right there and in effect say "file names are strings, if we only
make them absolute and resolve links".  Instead, recognize that file
names are not strings (although they inherit some of the strings'
traits), and think in terms of "file-name comparison" abstraction;

Can you show a test case where this penalty is clearly visible?  I'm
curious to see the numbers.  TIA
-

From: Johannes Schindelin
Date: Monday, October 15, 2007 - 1:44 am

Hi,




... yes!  There you have it.  Absolute filenames, resolved by readlink() 
are assumed to be the unique (!) identifiers for the contents.

_Note:_ absolute paths _without_ readlink() resolving are _still_ unique 
identifiers; this time for files/symlinks.

Things like this utter rubbish, that two different file names (which are 
the keys in the keystore that a filesystem really is) may refer to the same 
file, are what make Windows' filesystem operations so slow.

I wonder when Windows heads will realise that this "convenience" is just 
another reason why Windows is easily outperformed by other OSes (yes, the 

No, I cannot.  I will not go and buy a copy of Windows just to show you 
the numbers.

For quite some time now I have only run Linux on my machine(s), and the 
reason was a very unscientific experiment: I kept the OS that did not 
freeze and leave me unable to do anything for more than a second.

Now, that is my _personal_ decision.  If _you_ have no problem with 
Windows, just stick with it.  (I always thought this goes without saying, 
but Windows users tend to be very religious about this issue, thinking 
just because I hate Windows that I want to make them switch.  Hahaha, no.)

Ciao,
Dscho

-

From: David Kastrup
Date: Monday, October 15, 2007 - 1:57 am

They aren't.  One can mount the same file system several times in
different places.  In Linux, one can even mount directories and files
to several places at once.  Most Unices also support some
case-insensitive file systems, and readlink does not canonicalize the

Not even that.  A unique identifier for files would imply that
touching the file does not affect, say, the access times of files with
other unique identifiers.

-- 
David Kastrup

-

From: Alex Riesen
Date: Monday, October 15, 2007 - 10:49 am

They tend to be so exactly because they know how pathetic they are.
They just want to have something where they don't suck and do
everything to find it. And fail. Then they resort to graphics and
user-friendly interface.

-

From: Dave Korn
Date: Monday, October 15, 2007 - 11:25 am

Translation:  "I feel that I am superior to other people.  This post has no
content apart from me shooting my mouth off in an attempt to prove how much
cleverer I am than anyone else.  However apart from my self-love I have no
contribution to make to the discussion."

  This isn't slashdot.  A computer is just a tool, and it's really *you* who
are being pathetic, because you confuse a choice of mass-manufactured consumer
product with a statement about personal identity.  Loyalty to your favourite
brand is a game of one-upmanship suitable only for kids.  You need to grow up.

    cheers,
      DaveK
-- 
Can't think of a witty .sigline today....

-

From: Johannes Schindelin
Date: Monday, October 15, 2007 - 11:34 am

Hi,


I sense a classical Stockholm Syndrome here ;-)

Ciao,
Dscho

-

From: Alex Riesen
Date: Monday, October 15, 2007 - 12:34 pm

Yep. Will do.

-

From: Alex Riesen
Date: Monday, October 15, 2007 - 10:53 am

Not really. I meant that "/a/b/c" and "/a/b/c" can be different files.
Note the leading slash. On windoze it is _NOT_ an absolute path. It is
relative to the root of the current drive.
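Alex's point can be checked without a Windows box, because Python's `ntpath` module implements Windows path rules on any platform. A rooted path such as `\a\b\c` carries no drive letter, so joining it onto a current directory keeps that directory's drive. The wrapper name is hypothetical.

```python
import ntpath  # Windows path semantics, usable on any OS


def resolve_on_drive(current_dir: str, path: str) -> str:
    """Join path onto current_dir following Windows path rules.

    A path that starts with a backslash but has no drive letter keeps the
    drive of current_dir, so one string can name files on different drives.
    """
    return ntpath.join(current_dir, path)


print(resolve_on_drive("C:\\work", "\\a\\b\\c"))  # C:\a\b\c
print(resolve_on_drive("D:\\src", "\\a\\b\\c"))   # D:\a\b\c
print(ntpath.splitdrive("\\a\\b\\c"))             # ('', '\\a\\b\\c')
```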

-

From: Andreas Ericsson
Date: Sunday, October 14, 2007 - 4:55 pm

True. It was originally developed because Linux kernel development came
to a stand-still and needed an scm quickly. Since the original design
worked out nicely, nobody bothered (then) about possible future porting
issues. Windows is still a second class citizen, but that's true for
pretty much every unix-born application out there, so I'm not all that


Because having

	Path/foo
	path/Foo
	PATH
	path/foo

is possible in git's native playground, but not on windows, so it can
quite seriously hamper cross-platform cooperation. When that happens,
users usually start blaming the tools in use. Browse the list archives
for HFS and you'll see what I mean, although come to think of it, the
HFS problems might actually be worse, since HFS reports case-changes

It's still a real problem because sooner or later someone will use that,

Not really. mmap() provides a real performance boost when reading large
repos, due to the sliding window code that handles pack-files. mmap
was invented for occasions like that, and was allowed to endure because
it was a much better solution than simply read(fd, buf, st.st_size) and


Because some of the commands operate on large data-sets that are best
passed as a stream. It's ridiculously easy to set that up on unix, but

I believe work is in progress that will run things as threads rather
than using fork()+execve(). 32KiB of data is nowhere near enough to
sustain many of the more data-hungry commands. Or rather, it won't be
once the repository has grown past 50-odd revisions.
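Where fork() is missing, one common substitute is to run the producing side of a pipe in a thread of the same process and read the other end as a stream. The sketch below uses hypothetical names and an arbitrary `rev-N` line format; it shows the shape of the idea, not git's code.

```python
import os
import threading


def produce(write_fd: int, count: int) -> None:
    """Writer side: emit `count` lines, then close the pipe to signal EOF."""
    with os.fdopen(write_fd, "wb") as out:
        for i in range(count):
            out.write(b"rev-%d\n" % i)


def count_lines_via_thread(count: int) -> int:
    """Reader side: consume the pipe in the calling thread until EOF."""
    read_fd, write_fd = os.pipe()
    worker = threading.Thread(target=produce, args=(write_fd, count))
    worker.start()
    lines = 0
    with os.fdopen(read_fd, "rb") as stream:
        for _ in stream:
            lines += 1
    worker.join()
    return lines


print(count_lines_via_thread(100_000))  # 100000
```

Because the data flows through the pipe while it is being produced, its total size is not bounded by a command-line limit such as the 32KB mentioned earlier in the thread.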


All that being said, welcome to the git mailing list. Hopefully you
can help iron out the wrinkles on windows. You seem to have a fairly
good grasp of what's available there, and I'm sure the msys team would
be pretty happy to get a few patches to speed them on their way.

-

From: Daniel Barkalow
Date: Monday, October 15, 2007 - 5:45 pm

Responding only to those portions where I think Windows experience and a 
Windows perspective would be helpful...


I believe the hassle is that readdir doesn't necessarily report a README in 
a directory which is supposed to have a README, when it has a readme 
instead. I think we want O(n) comparison of sorted lists, which doesn't 
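Daniel's O(n) comparison of sorted lists is a merge walk. The sketch below (hypothetical names) takes the comparison key as a parameter; with `str.casefold` as the key it pairs README with readme, but only if both inputs were sorted by that same key. Git keeps its index in byte order, where every uppercase ASCII letter sorts before every lowercase one, so a case-insensitive comparison would also need a matching sort order.

```python
def merge_walk(index_sorted, disk_sorted, key=lambda name: name):
    """One O(n + m) pass over two lists that are already sorted by `key`.

    Yields ("both" | "deleted" | "untracked", name). Feeding lists sorted in
    a different order than `key` silently mis-pairs entries.
    """
    i = j = 0
    while i < len(index_sorted) and j < len(disk_sorted):
        a, b = key(index_sorted[i]), key(disk_sorted[j])
        if a == b:
            yield "both", index_sorted[i]
            i += 1
            j += 1
        elif a < b:
            yield "deleted", index_sorted[i]
            i += 1
        else:
            yield "untracked", disk_sorted[j]
            j += 1
    for name in index_sorted[i:]:
        yield "deleted", name
    for name in disk_sorted[j:]:
        yield "untracked", name


index = ["README", "a.c"]
disk = ["a.c", "readme"]
fold = str.casefold
print(list(merge_walk(sorted(index), sorted(disk))))
# [('deleted', 'README'), ('both', 'a.c'), ('untracked', 'readme')]
print(list(merge_walk(sorted(index, key=fold), sorted(disk, key=fold), key=fold)))
# [('both', 'a.c'), ('both', 'README')]
```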

We want gathering stat info (using readdir to figure out which files exist) 
for 106083 files in 1603 directories with a hot cache to take under 1s; 
otherwise "git status" takes a noticeable amount of time with a medium-big 
project, and we want people to be able to get info on what's changed 
effectively instantly. My impression is that Windows' native stat and 
readdir are plenty fast for what normal Windows programs want, but we 
actually expect reasonable performance on an unreasonably-big 
metadata-heavy input. AFAICT, nothing but Linux is optimized for this, but 
we're used to being able to find out if there's any change to a large 
directory structure in practically no time. On the other hand, we really 
just want to beat users' expectations for this operation, not our own 
expectations, so this may only be a problem for people benchmarking 

I believe the need here is quick setup and fast access to sparse portions 
of several 100M files. It's hard to beat a page fault for read speed.
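Daniel's "fast access to sparse portions" is what a read-only memory map gives: map the file once and let page faults bring in only the pages a lookup actually touches. A minimal sketch with a hypothetical helper name; real pack access in git is C code and more involved.

```python
import mmap
import os
import tempfile


def read_slices(path: str, spans: list[tuple[int, int]]) -> list[bytes]:
    """Map path read-only and copy out only the requested (offset, length) spans.

    No read() of the whole file happens; pages are faulted in on first access,
    so untouched regions of a large file typically never leave the disk.
    """
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
            return [m[off:off + length] for off, length in spans]


with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "big.bin")
    with open(path, "wb") as f:
        f.write(bytes(range(256)) * 4096)  # 1 MiB of repeating bytes
    print(read_slices(path, [(0, 4), (1_000_000, 3)]))
```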

We also expect to be able to make a sequence of file system operations 
such that programs starting at any time see the same database as the files 
containing the database get restructured. My impression is that this is 
very hard or impossible with Windows, and also that it doesn't matter for 
Windows users, because they'll only have one program at a time accessing 
the repository. A lot of our filesystem demands are about making a wide 
variety of race conditions give the same result regardless of how the race 
goes, and we're just being overly careful for a Windows environment 
(although not necessarily for users with a UNIX background using Windows 

A ...
From: Eli Zaretskii
Date: Monday, October 15, 2007 - 9:30 pm

Sorry I'm asking potentially stupid questions out of ignorance: why

Your comparison function should be case-insensitive on Windows, or am I

If that's the issue, then it's not a good idea to call `stat' and
`readdir' on Windows at all.  `stat' is a single system call on Posix
systems, while on Windows it usually needs to go out of its way
calling half a dozen system services to gather the `struct stat' info.
You need to call something like FindFirstFile, which can do the job of
`stat' and `readdir' together (and of `fnmatch', if you need to filter
only some files) in one go.  I don't know whether this will scan 100K
files under one second (maybe I will try it one of these days), but it
will definitely be faster than `readdir'+`stat' by maybe as much as an

If you need memory-mapped files, they are available on Windows.  I
thought the original comment about `mmap' was because it was used to

Sorry, I don't understand this; please tell more about the operations,
``the same database'' issue (what database?) and what do you mean by

Windows supports pipelines with almost 100% the same functionality as
Posix.  Again, perhaps I'm missing something.
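Pipes do work on Windows, and they address Andreas's concern about data sets too large for a command line. The sketch streams many 41-byte lines (a stand-in for 40-character object names) to a child process over stdin; the child script and the function name are hypothetical.

```python
import subprocess
import sys

# Hypothetical child: count the lines that arrive on stdin and print the total.
CHILD = "import sys; print(sum(1 for _ in sys.stdin.buffer))"


def stream_lines(n: int) -> int:
    """Send n lines to a child over a pipe instead of on its command line."""
    with subprocess.Popen([sys.executable, "-c", CHILD],
                          stdin=subprocess.PIPE,
                          stdout=subprocess.PIPE) as proc:
        for i in range(n):
            proc.stdin.write(b"%040d\n" % i)  # 40 digits + newline
        proc.stdin.close()                   # EOF lets the child finish
        out = proc.stdout.read()
    return int(out)


print(stream_lines(50_000))  # 50000 lines, about 2 MB, far beyond 32KB
```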
-

From: Andreas Ericsson
Date: Monday, October 15, 2007 - 10:14 pm

Because it might have been checked in as README, and since git is case
sensitive that is what it'll think should be there when it reads the
directories. If it's not, users get to see

	removed: README
	untracked: readme

and there's really no easy way out of this one, since users on a case-
sensitive filesystem might be involved in this project too, so it
could be an intentional rename, but we don't know for sure. Just
clobbering the in-git file is wrong, but overwriting a file on disk

To be honest though, there are so many places which do the readdir+stat
that I don't think it'd be worth factoring it out, especially since it
*works* on windows. It's just slow, and only slow compared to various
unices. I *think* (correct me if I'm wrong) that git is still faster
than a whole bunch of other scm's on windows, but to one who's used to
its performance on Linux that waiting several seconds to scan 10k files


/* I'm out on a limb here. Nicolas Pitre knows the git packfile format, so
 * perhaps he'll be kind enough to correct me if I'm wrong */

The mmap() stuff is primarily convenient when reading huge packfiles. As
far as I understand it, they're ordered by some sort of delta similarity
score, so mmap()'ing 100MiB or so of a certain packfile will most likely
mean we have a couple of thousand "connected" revisions in memory. That
database gets sort of restructured as the memory-chunk that's mmap()'ed
gets moved to read in the next couple of thousand revisions.
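Andreas's description of a mapped chunk that moves through a packfile can be sketched as a reader that keeps one window mapped and remaps only when a request falls outside it. This is a hypothetical class written for illustration, not a description of git's actual pack window code; the one hard rule it follows is that an mmap offset must be a multiple of `mmap.ALLOCATIONGRANULARITY`.

```python
import mmap
import os
import tempfile


class WindowedReader:
    """Serve (offset, length) reads through a single sliding mmap window."""

    def __init__(self, path: str, window: int = 1 << 20):
        gran = mmap.ALLOCATIONGRANULARITY
        self.window = max(gran, -(-window // gran) * gran)  # round up
        self.file = open(path, "rb")
        self.size = os.fstat(self.file.fileno()).st_size
        self.start = 0
        self.map = None
        self.remaps = 0  # how often the window had to slide

    def read(self, offset: int, length: int) -> bytes:
        if offset < 0 or length < 0 or offset + length > self.size:
            raise ValueError("read outside the file")
        if length == 0:
            return b""
        end = offset + length
        if self.map is None or offset < self.start or end > self.start + len(self.map):
            start = offset - offset % mmap.ALLOCATIONGRANULARITY  # aligned
            if end - start > self.window:
                raise ValueError("read does not fit in one window")
            if self.map is not None:
                self.map.close()
            self.map = mmap.mmap(self.file.fileno(),
                                 min(self.window, self.size - start),
                                 access=mmap.ACCESS_READ, offset=start)
            self.start = start
            self.remaps += 1
        rel = offset - self.start
        return self.map[rel:rel + length]

    def close(self) -> None:
        if self.map is not None:
            self.map.close()
        self.file.close()


with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "fake.pack")
    with open(path, "wb") as f:
        f.write(os.urandom(4 * mmap.ALLOCATIONGRANULARITY))
    reader = WindowedReader(path, window=mmap.ALLOCATIONGRANULARITY)
    reader.read(10, 20)
    reader.read(100, 20)                                 # same window
    reader.read(3 * mmap.ALLOCATIONGRANULARITY + 5, 20)  # window slides
    print(reader.remaps)  # 2
    reader.close()
```

Compared with reading a whole packfile with a single read(), only one window of address space is in use at a time, whatever the file size.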

In all honesty, this doesn't matter much for already fully packed projects
unless they're significantly larger than the Linux kernel, since git is so
amazingly good at compressing large repos to a small size. Linux is ~180
MiB fully packed, and most developers' systems could just read() that
entire packfile into memory without much problem. But then again, no-one's
ever had problems supporting the "normal" cases.

-- 
Andreas Ericsson                   andreas.ericsson@op5.se
OP5 AB                             ...
From: Eli Zaretskii
Date: Monday, October 15, 2007 - 11:25 pm

This is a non-issue, then: Windows filesystems are case-preserving, so
if `README' became `readme', someone deliberately renamed it, in which

It _must_ have been an intentional rename.  While years ago there used
to be old DOS programs that could cause such a rename as a side effect
of modifying a file, that time is long gone.  There's no longer a need
to cater to such programs, as even DOS programs can support

Something for Windows users to decide, I guess.  It's not hard to

I think only the Linux filesystem is as fast as you say.  But I may be

Unless that 10K is a typo and you really meant 100K, I don't think 10K
files should take several seconds to scan on Windows.  I just tried
"find -print" on a directory with 32K files in 4K subdirectories, and
it took 8 sec elapsed with a hot cache.  So 10K files should take at
most 2 seconds, even without optimizing file traversal code.  Doing
the same with native Windows system calls ("dir /s") brings that down
to 4 seconds for 32K files.

On the other hand, what packages have 100K files?  If there's only one
-- the Linux kernel -- then I think this kind of performance is for
all practical purposes unimportant on Windows, because while it is
reasonable to assume that someone would like to use git on Windows,
assuming that someone will develop the Linux kernel on Windows is --
how should I put it -- _really_ far-fetched ;-)

As for speed of file ops ``just feeling wrong'': it's not limited to
git in any way.  You will see the same with "tar -x", with "find" and
even with "cp -r", when you compare Linux filesystems, especially on a
fast 64-bit machine, with comparable Windows operations.  A Windows
user who occasionally works on GNU/Linux already knows that, so seeing
the same in git will not come as a surprise.  Again, I wonder how this
compares with other free OSes, like FreeBSD (unless they use the same
filesystem), and with proprietary Unices, like AIX and Solaris.
-

From: Daniel Barkalow
Date: Tuesday, October 16, 2007 - 12:07 am

I'm partially worried about cases where checking out a "README" fails to 

I think you're right (nothing else can compete with Linux for doing half a 
million trivial syscalls), but other unixes aren't terrible, either. 
IIRC, on OS X, we had problems when we were doing 4 times as many syscalls 

Actually, there are a number of projects much bigger than the Linux 
kernel; I think KDE was considering using git, and wanted Windows support, 
and KDE is insanely huge, mostly as a result of having one big repository 

For most things, Unix filesystems are fast enough that the bulk of the 
time is spent elsewhere. "git status" without any changes and a hot cache 
is unusual in being both a common operation and entirely trivial syscalls 
if the filesystem makes it efficient.

The problem we've had is that Linux users who occasionally work on Windows 
say git seems impossibly slow on Windows.

	-Daniel
*This .sig left intentionally blank*
-

From: Johannes Schindelin
Date: Tuesday, October 16, 2007 - 5:29 am

Hi,

[by explicit request culling make-w32 from the Cc list]


No, it is not.  On FAT filesystems, for example, I experienced Windows 
happily naming a file "head" which was created under the name "HEAD".

This is the single reason why I cannot have non-bare repositories on a USB 

No.  It can also be the output of a program which deletes the file first, 
and then (since the filesystem is so "conveniently" case insensitive) 
creates it again, with a lowercase filename.

And don't you tell me that there are no such programs.  I have to use 
them, and they are closed source.



On Linux, I would have hit Control-C already.  Such an operation typically 

Mozilla, KDE, OpenOffice.org, X.org, ....

Ciao,
Dscho

-

From: Peter Karlsson
Date: Tuesday, October 16, 2007 - 5:38 am

If you create a file name with only capital letters, I believe Explorer
and the file browser will display the name with an initial capital, and
the rest lowercase, or in all lowercase. IIRC, this is because such a
file is saved with only an MS-DOS name and no LFN entry, and those have
special rules to avoid them being displayed in all-uppercase.

I believe it is possible to create a LFN entry for such a file, but I
can't remember right now how to do it.

-- 
\\// Peter - http://www.softwolves.pp.se/
-

From: Eli Zaretskii
Date: Tuesday, October 16, 2007 - 6:04 am

That's true, but the names are only displayed like that, what's on

I don't think this is true anymore in modern versions of Windows, but I
might be mistaken.  In any case, the reason for the Explorer behavior
is immaterial for us, what matters is that file names on disk preserve
the lettercase of the program that created them, at least AFAIK.
-

From: Eli Zaretskii
Date: Tuesday, October 16, 2007 - 5:53 am

What program did that, and how did you see that the file was named
"head" instead of "HEAD"?  (The latter question is because Explorer,
for example, does not show the file names exactly like they are
written in the directory, it capitalises them.  But this is
application-level code; in the directory the file names are written
like you gave them in the argument to whatever "create file" API you



We were not comparing Linux with Windows, we were talking about
Windows user experience.  On Windows 4 seconds is not too long.
-

From: Johannes Schindelin
Date: Tuesday, October 16, 2007 - 6:15 am

Hi,




Well, I was talking about user experience.  In this case of a user who 
happens to be on Windows, but knows Linux' speed.

Ciao,
Dscho

-

From: Dave Korn
Date: Tuesday, October 16, 2007 - 8:47 am

Hi there!  Did someone call?

  Cross-development in general isn't what I'd call "far-fetched", and there's
no law of cross-development that says the host has to be the same platform as
the target.  :-)[*]

    cheers,
      DaveK

[*] - this smiley sponsored by the Department of the Bleedin' Obvious.
-
From: David Brown
Date: Tuesday, October 16, 2007 - 8:56 am

Oh, I wish others could think this clearly.  Quoting a serious line off of
a task list at an unnamed company:

   - Make Linux kernel compile under windows.

I don't think it will move past just being a wish list item, but there seem
to be people that think it should be done.

Admittedly, they don't want developers doing it on windows, but want to
integrate kernel building into a windows-heavy build and release process.

David
-

From: Nicolas Pitre
Date: Tuesday, October 16, 2007 - 9:04 am

Linux is compilable on Windows, and has been for a long time already.
With Cygwin it is pretty trivial to do.  I prefer native Linux though.


Nicolas
-

From: Dave Korn
Date: Tuesday, October 16, 2007 - 9:23 am

Do that kind of thing here all the time, hence my previous post.  Apart from
the netfilter stuff with the filenames-that-match-in-all-but-case, no real
problems, took me a couple of hours one afternoon.

  Cygwin is a good match for linux dev work.

    cheers,
      DaveK

-

From: Christopher Faylor
Date: Tuesday, October 16, 2007 - 11:06 am

Ditto.

Coincidentally enough this is the reason I wrote managed mode for cygwin's
mount.

But, we're pretty far off-topic aren't we?

cgf
-

From: Andreas Ericsson
Date: Tuesday, October 16, 2007 - 9:59 am

But it's most definitely not. The *huge* projects that have looked at
git have sometimes turned it down simply because they're either cross-
platform (Mozilla) or they have translators that use windows exclusively
(KDE and Mozilla, just to mention two).

Both Mozilla and KDE repos are *much* larger than the Linux repo.

-

From: Steffen Prohaska
Date: Tuesday, October 16, 2007 - 12:14 am

Maybe we need a configuration similar to core.autocrlf (which controls
newline conversion) to control filename comparison and normalization?

Most obviously for the case (in-)sensitivity on Windows, but I also
remember the unicode normalization happening on Mac's HFS filesystem
that caused trouble in the past.

	Steffen
-

From: Johannes Schindelin
Date: Tuesday, October 16, 2007 - 5:33 am

Hi,

[culled make-w32 list by explicit request]


Robin Rosenberg has some preliminary code for that.  The idea is to wrap 
all filesystem operations in cache.h, and do a filename normalisation 
first.
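The normalisation step Johannes mentions matters for the HFS case Steffen raised: HFS+ stores names in a decomposed form (a variant of Unicode NFD), while a name typed on Linux or Windows usually arrives precomposed (NFC). The helper below is a hypothetical sketch that normalises to NFC before names are compared; it does nothing about case.

```python
import unicodedata


def normalize_name(name: str) -> str:
    """Bring a file name to NFC so both Unicode spellings compare equal."""
    return unicodedata.normalize("NFC", name)


composed = "caf\u00e9"     # e-acute as one code point (U+00E9)
decomposed = "cafe\u0301"  # e followed by COMBINING ACUTE ACCENT (U+0301)
print(composed == decomposed)                                  # False
print(normalize_name(composed) == normalize_name(decomposed))  # True
```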

Ciao,
Dscho

-

From: Steffen Prohaska
Date: Tuesday, October 16, 2007 - 6:16 am

At that point we could add a safety check. Paths that differ only by
case, or whitespace, or ... (add general and project specific rules
here) should be denied. This would guarantee that tree objects can
always be checked out, even if the filesystem capabilities are limited.

Robin, what do you think?

	Steffen
-

From: Johannes Schindelin
Date: Tuesday, October 16, 2007 - 6:21 am

Hi,


This would be an independent change.  The method I talked about only ever 
looks at one filename, never what is already there.

What you want would probably be all too easy with a pre-commit hook.  No 
need to clutter the git-core with code that is usually not needed (you'd 
only ever activate it on Linux when other developers use Windows or 
MacOSX).
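Steffen's safety check, run from a pre-commit hook as Johannes suggests, could start from something like the sketch below. It flags every path, and every leading directory, whose spelling differs from another only by case, since a file PATH and a directory path collide just as Path/foo and path/foo do. The function names are hypothetical, and `str.casefold()` only approximates any particular filesystem's case mapping.

```python
import subprocess


def case_collisions(paths):
    """Map each case-folded name to its spellings, keeping only real clashes.

    Every leading directory is checked too, because on a case-insensitive
    filesystem "PATH" (a file) and "path/foo" (inside a directory) collide.
    """
    spellings = {}
    for path in paths:
        parts = path.split("/")
        for i in range(1, len(parts) + 1):
            prefix = "/".join(parts[:i])
            spellings.setdefault(prefix.casefold(), set()).add(prefix)
    return {key: sorted(names) for key, names in spellings.items()
            if len(names) > 1}


def index_paths():
    """Paths currently in the index, read with `git ls-files -z` (inside a repo)."""
    out = subprocess.run(["git", "ls-files", "-z"],
                         capture_output=True, check=True).stdout
    return [p for p in out.decode("utf-8", "surrogateescape").split("\0") if p]


print(case_collisions(["Path/foo", "path/Foo", "PATH", "path/foo"]))
# {'path': ['PATH', 'Path', 'path'], 'path/foo': ['Path/foo', 'path/Foo', 'path/foo']}
```

A hook would exit non-zero when `case_collisions(index_paths())` is not empty. Projects that rely on such names on purpose (Johannes mentions the Linux kernel and Perl) would trip it.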

Ciao,
Dscho


-

From: Steffen Prohaska
Date: Tuesday, October 16, 2007 - 6:50 am

Personally, I'd be very happy if git enforced the minimal consensus
between (supported) filesystems and provided a system to guarantee that
I can only create tree objects that can be checked out on all
(supported) filesystems.

I'd _always_ switch on such a mechanism. I think the idea of relying on
filenames that only differ by whitespace or case is insane independent
of the capabilities of the filesystem used. Humans hardly see such
differences. There may be other characters that should be avoided
purely for technical reasons. If git checked this, too, I'd be happy.

An update hook is only very loosely coupled to git. I'd prefer a tighter
integration. 'git add <something>' should immediately report the
problem. But, maybe I'll try a commit hook first.

	Steffen
-

From: Johannes Schindelin
Date: Tuesday, October 16, 2007 - 7:14 am

Hi,


This will not happen.  In the Linux kernel, there were exactly such cases, 
where the filenames differed only in case.

Also, some projects I checked out (notably Perl) assume that Makefile is 
different from makefile.

So I think this will always be something Windows users would wish to 
impose onto others, while Linux users would always refuse.

Ciao,
Dscho
-

From: Steffen Prohaska
Date: Tuesday, October 16, 2007 - 7:36 am

and Mac users, who also need to deal with a case-preserving,

maybe Linux kernel developers. When I work on Linux, I'd be happy
if git saved me from creating directories containing Readme and
readme at the same time.

	Steffen
-

From: Eli Zaretskii
Date: Tuesday, October 16, 2007 - 8:12 am

Here is one Windows user that will never try to impose that ;-)

However, it's possible that an option could be supported to do that
when the user particularly wants that in her database.  Just a
thought...
-

From: Robin Rosenberg
Date: Wednesday, October 17, 2007 - 12:33 pm

My code only normalizes filenames to UTF-8 inside git, which isn't the same 
thing. I think that can be extended to handling MacOSX normalized UTF-8 and
Windows UTF-16, so when you check out a thing from git there will be no 
surprises. Case insensitivity is another dimension. I have no idea as to the
performance of the code; it's more like a proof-that-it-can-be-done.

The code cannot "fail", it always does something reasonable, like not 
converting when that is not possible. Something else has to be done for 
validation.

The UTF-16 that Windows uses is not a current issue, because git only uses 
the local code page. Jgit does, but it isn't very smart either, because git 
doesn't say anything about filename encoding, while Windows/MacOSX/CIFS and 
other filesystems do.

The fact that git uses eight-bit file names may also be one reason performance 
is worse on Windows, because the eight-bit Win32 API converts all strings 
and filenames to the native UTF-16 encoding on *every* system call, in and 
out; that's a lot of work when you do it thousands of times. If git itself 
did the conversion, it might be made smarter, better suited to git's purposes, 
and, most importantly, faster. I have no idea about the performance hit. One
has to measure something.

I notice that a number of SCMs out there, including one with a \$\d{4} price 
tag, get you into trouble if you rename a file from Foo to FOO on Windows.

-- robin
-

From: Daniel Barkalow
Date: Monday, October 15, 2007 - 10:56 pm

Say the project upstream has the file being "README", but, for some 
reason, it has ended up checked out as "readme" in your directory. Since 
your filesystem is case insensitive, it's supposed to be the same file, 
but when git goes through the list of files in the directory, it sees 
"readme", and there's nothing between reachable.h and read-cache.c in the 
list of tracked files. We've got a sorted list of filenames we're tracking 
along with their most-recently-seen content, and we want to merge the 
results of readdir with them, and this is obviously more straightforward 
if the filename that's the match for "README" is provided byte-for-byte 

We want both lists sorted, so that we can step through the pair together 
and always reach matches together. This requires that the equivalent names 

Ah, that's helpful. We don't actually care too much about the particular 
info in stat; we just want to know quickly if the file has changed, so we 
can hash only the ones that have been touched and get the actual content 
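
The stat-cache idea can be sketched as follows: remember a few stat() fields per file and rehash only the files whose fields no longer match. This sketch assumes a POSIX stat(). `struct cached_stat`, `record_stat()` and `looks_modified()` are hypothetical, and git's actual index records more fields than these three:

```c
#include <assert.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <time.h>

/* Hypothetical subset of what a stat cache might keep per file. */
struct cached_stat {
    time_t mtime;
    off_t size;
    mode_t mode;
};

/* Fill *out from stat(); returns 0 on success, -1 if stat() fails. */
static int record_stat(const char *path, struct cached_stat *out)
{
    struct stat st;
    assert(out != NULL);
    if (stat(path, &st) != 0)
        return -1;
    out->mtime = st.st_mtime;
    out->size = st.st_size;
    out->mode = st.st_mode;
    return 0;
}

/*
 * Return 1 if the file looks changed (or has vanished) relative to the
 * cached entry, 0 if it looks untouched.  Only files reported as changed
 * would need to be read and hashed again.
 */
static int looks_modified(const char *path, const struct cached_stat *old)
{
    struct cached_stat now;
    if (record_stat(path, &now) != 0)
        return 1;
    return now.mtime != old->mtime || now.size != old->size ||
           now.mode != old->mode;
}
```

One caveat: mtime has limited resolution, so a file changed twice within one timestamp tick can still look untouched. Git's technical documentation discusses this as the "racy git" problem.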

No, we get our memory with malloc like normal people. The mmap is because 
we want to feed files and parts of files to zlib, and mmap makes that 

Git is built around a database of objects, which includes "blobs" (file 
content), "trees" (directory structure), "commits" (history linkage), and 
"tags" (additional annotations). Each of these objects gets hashed, and is 
referenced by hash. So we need to be able to get the object with a given 
hash quickly, and write an object and take its hash (ideally, stream the 
write and find out the hash at the end, with the database key set at that 
point). Also, this database should be compressed effectively, because it 
ought to compress really well, since a lot of the blobs and trees are only 
slightly different from other blobs or trees (by whatever changes were 
made between that revision and other revisions).

The current implementation of the persistent storage of this database is a 
bit complicated, with the goal being ...
-

From: Eli Zaretskii
Date: Tuesday, October 16, 2007 - 12:03 am

As I wrote in my other message, using native APIs improves performance

Is this because another user might be accessing the database, or are
there other popular use cases that cause this?  If the former, then
this is not terribly important on Windows, since situations where
more than one user is logged in and actively working are quite rare,
basically limited to some scheduled task (the equivalent of a cron
job) running for some user while another one is logged in
interactively.


Perhaps mmap introduces complications (I simply don't know), but in
general, as I show elsewhere in this thread, you can do similar things
on Windows, if you use native APIs (as opposed to emulations of Posix,
like `open'), although you may need to rename the old file to get it
out of the way of the new one with the same name, because otherwise
the old file will still be seen, even if deleted, as long as it's open
in some process.
-

From: Johannes Schindelin
Date: Tuesday, October 16, 2007 - 5:39 am

Hi,

[culled make-w32, as per explicit request]


Somehow this does not appeal to my "portability is good" side.  You know, 
if we had to do such trickeries for every platform we support, we'd soon 
be as big as Subversion *cough*.

For me, this is the most annoying part about programming Win32.  They went 
out of their way to make it incompatible with everything else, and as a 

Quite to the contrary.  Explorer often accesses files it should not lock.  
On the machine I test msysGit on, this is the most common reason for a 
test case to fail: it cannot delete the temporary directory, which 
_should_ be unused.  Indeed, a second after that, it _is_ unused.

Ciao,
Dscho

-

From: Eli Zaretskii
Date: Tuesday, October 16, 2007 - 6:16 am

You have to decide whether you care about performance enough to do
that or not.  If you do, then introducing file I/O abstractions at
higher level than the normal ``use-library-functions'' method is not
such a hard problem, and doesn't make the binary larger because each
platform gets only its own backend.  In practice, I have found that in
most cases a few well-designed and strategically placed macros is all

Portability is a two-way street.  A program that wasn't designed to be
portable will by definition be hard to port.  To me, what's annoying
is a program that was designed around a single-OS model of APIs.

Cross-platform programs are not that hard if you design them to be
like that from the ground up.  I'm working for a firm that does that
for a living: we develop software that compiles and runs on Windows

One more reason not to launch Explorer, if you ask me ;-)  But maybe
you have valid reasons to do that.  All I can say is that I never saw
such problems, but then I don't usually run programs that rewrite
files in a frenzy.
-

From: Johannes Schindelin
Date: Tuesday, October 16, 2007 - 6:24 am

Hi,


Yes, I know that we'll have to use more special casing of Windows for 
performance reasons.  I was only lamenting that it would not need to be 


Funny.  Last time I checked the toolbar went away, as well as the desktop, 
when I killed explorer.exe.

Ciao,
Dscho

-

From: Eli Zaretskii
Date: Tuesday, October 16, 2007 - 8:02 am

That's a ``feature'': Explorer is the parent of all the desktop
display.  Kinda like the login shell on Unix: if you kill it, there
goes your whole session.  Except that on Windows, the OS pays
attention and restarts Explorer right away to get you back in
business.  (In the first versions of Windows, there was no restarting of
Explorer, so if you killed it, you needed to reboot :-()
-

From: Johannes Schindelin
Date: Tuesday, October 16, 2007 - 8:18 am

Hi,


I kinda knew that.  But what's now with your recommendation to never run 
Explorer?

Ciao,
Dscho

-

From: Eli Zaretskii
Date: Tuesday, October 16, 2007 - 8:43 am

I meant not to open "My Computer" and use the GUI for browsing the
directories.  If you meant that the touching of files is done even if
you don't open the GUI, then just ignore my advice: Explorer cannot be
killed.  I'm surprised that it touches files and directories,
though...
-

From: Daniel Barkalow
Date: Tuesday, October 16, 2007 - 10:04 am

I think that it would be a worthwhile project, from the point of view of 
making the code easier to follow and making the internal APIs clearer, to 
organize git's source to abstract the object database to read_sha1_file(), 
has_sha1_file(), hash_sha1_file(), and write_sha1_file() as the arbiters 
of what is in the local database (with other functions public as support 
for over-the-wire protocols, which may, by not-really-coincidence, be used 
for local storage as well); then Windows could have an entirely different 
storage mechanism that doesn't rely on filesystem metadata speed.
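
One way such an abstraction could look in C is a table of function pointers, so that each platform can plug in its own storage. This is a minimal sketch only. `struct odb_backend` and the in-memory `mem_*` backend are hypothetical, and the signatures are simplified; they do not match git's real read_sha1_file() and friends. FNV-1a stands in for SHA-1 purely to keep the example short:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical 64-bit object key; git itself uses 20-byte SHA-1 names. */
typedef uint64_t obj_key;

/* Hypothetical backend interface loosely mirroring the hash/has/write/read
 * split; each platform could supply its own implementation. */
struct odb_backend {
    obj_key (*hash)(const void *data, size_t len);
    int (*has)(void *self, obj_key key);
    int (*write)(void *self, const void *data, size_t len, obj_key *key);
    const void *(*read)(void *self, obj_key key, size_t *len);
    void *self;                         /* backend-private state */
};

/* FNV-1a (64-bit): a stand-in for SHA-1, NOT suitable for real use. */
static obj_key fnv1a(const void *data, size_t len)
{
    const unsigned char *p = data;
    uint64_t h = 14695981039346656037ULL;
    for (size_t i = 0; i < len; i++) {
        h ^= p[i];
        h *= 1099511628211ULL;
    }
    return h;
}

/* Toy in-memory backend: a fixed-size table of objects. */
#define MEM_MAX 64
struct mem_store {
    size_t count;
    obj_key keys[MEM_MAX];
    void *data[MEM_MAX];
    size_t lens[MEM_MAX];
};

static int mem_find(struct mem_store *s, obj_key key)
{
    for (size_t i = 0; i < s->count; i++)
        if (s->keys[i] == key)
            return (int)i;
    return -1;
}

static int mem_has(void *self, obj_key key)
{
    return mem_find(self, key) >= 0;
}

static int mem_write(void *self, const void *data, size_t len, obj_key *key)
{
    struct mem_store *s = self;
    assert(key != NULL);
    *key = fnv1a(data, len);
    if (mem_find(s, *key) >= 0)
        return 0;                       /* content-addressed: already stored */
    if (s->count == MEM_MAX)
        return -1;
    s->data[s->count] = malloc(len ? len : 1);
    if (s->data[s->count] == NULL)
        return -1;
    if (len > 0)
        memcpy(s->data[s->count], data, len);
    s->keys[s->count] = *key;
    s->lens[s->count] = len;
    s->count++;
    return 0;
}

static const void *mem_read(void *self, obj_key key, size_t *len)
{
    struct mem_store *s = self;
    int i = mem_find(s, key);
    if (i < 0)
        return NULL;
    *len = s->lens[i];
    return s->data[i];
}
```

Because objects are content-addressed, writing the same content twice yields the same key and stores it once. Callers only ever see the four operations, so a Windows backend could, for example, keep objects in one large file instead of one file per object.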

It would also be worthwhile to untangle the index's stat cache aspects and 
its tree-object-related aspects, so that there can be a platform- and 
repository-specific concept of how to handle the working area, and then 
Windows could do different stuff for the default case of setting up a 
directory on the local filesystem.

	-Daniel
*This .sig left intentionally blank*
-

From: David Kastrup
Date: Monday, October 15, 2007 - 11:06 pm

Well, are "I" and "i" the same letters?  What about "İ" and "i"?  Or
"I" and "ı"?  What about Greek where uppercasing loses accents
(actually not unusual in literate French, either).  And what about
German ß and SS/SZ?

"case-insensitive" is a simple word, but the devil is in the details,
and that means basically requiring a system-provided sorting function.
And actually the _killer_ detail here is that git _must_ have the same
sorting order on every platform, since the order of files in a
directory tree affects its SHA-1 sum.  So a system-dependent sorting
order breaks git interoperability.
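
A short illustration of why the ordering has to be byte-wise: sorting the same names with strcmp() and with even the simplest (ASCII-only) case-insensitive comparison gives different orders, and would therefore give different tree hashes. This is a minimal sketch; `sort_names()` is a hypothetical helper, and git's real tree ordering adds a twist for directory entries that this ignores:

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Byte-wise order: strcmp() compares bytes as unsigned char and ignores
 * the locale, so every platform agrees on the result. */
static int cmp_bytes(const void *x, const void *y)
{
    return strcmp(*(const char *const *)x, *(const char *const *)y);
}

/* ASCII-only case-insensitive order (assumes the default "C" locale),
 * shown for contrast. */
static int cmp_ascii_nocase(const void *x, const void *y)
{
    const char *a = *(const char *const *)x;
    const char *b = *(const char *const *)y;
    for (;; a++, b++) {
        int ca = tolower((unsigned char)*a);
        int cb = tolower((unsigned char)*b);
        if (ca != cb || ca == '\0')
            return ca - cb;
    }
}

/* Hypothetical helper: sort names in place, byte-wise or case-folded. */
static void sort_names(const char **names, size_t n, int fold_case)
{
    assert(names != NULL || n == 0);
    qsort(names, n, sizeof(*names), fold_case ? cmp_ascii_nocase : cmp_bytes);
}
```

With {"b.c", "README", "a.c"}, byte order puts "README" first, because uppercase letters sort before lowercase ones in ASCII. The case-insensitive comparison puts it last. Locale-aware folding (Turkish dotless i, German ß) would diverge further still.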

-- 
David Kastrup, Kriemhildstr. 15, 44793 Bochum
-

From: Johannes Sixt
Date: Monday, October 15, 2007 - 11:42 pm

Thanks to Marius Storm-Olsen we already have a stat replacement that's twice 
as fast as msvcrt's stat. It calls only one API function 
(GetFileAttributesEx, but of course I don't know what's going on under its 
hood), because we need only a small part of struct stat filled in correctly.

-- Hannes

-

From: Eli Zaretskii
Date: Tuesday, October 16, 2007 - 12:17 am

Yes, I've seen that.  What I'm saying is that you can combine
`readdir' with `stat' in one API call (FindFirstFile/FindNextFile),
which will both read the directory and return you the attributes you
get from `stat'.  Think about `readdir' that brings you mode bits and
modification time together with the name, as some modern systems do.
-

From: Dave Korn
Date: Sunday, October 14, 2007 - 3:59 pm

Whuh?

http://msdn2.microsoft.com/en-us/library/y5zz48s1(VS.80).aspx


    cheers,
      DaveK
-- 
Can't think of a witty .sigline today....

-

From: Johannes Schindelin
Date: Sunday, October 14, 2007 - 5:01 pm

Hi,


It does have an exec() call, yes, since that is required by the C 
standard.  But internally, it converts that into one single command line.

In corner cases, you find problems with that.

Hth,
Dscho

-

From: Alex Riesen
Date: Monday, October 15, 2007 - 10:36 am

Like: "damn, it is just IMPOSSIBLE to implement without them corner
cases."

-

From: David Brown
Date: Sunday, October 14, 2007 - 5:03 pm

The MS exec* calls just concatenate all of the argv arguments, separating
them with a space into a single buffer.

Look at the general _exec* page:

   http://msdn2.microsoft.com/en-us/library/431x4c1w(VS.80).aspx

and read the first "Note" section.

If you know what the library on the other end is doing to re-parse the
arguments back into separate strings, it might be possible to quote things
enough to handle names with spaces, but it is hard.
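
The re-quoting David mentions can be sketched as a single function. Below is a minimal sketch in C that quotes one argument so the Microsoft C runtime's argv parser (which follows the same rules as CommandLineToArgvW) recovers it unchanged. `quote_windows_arg()` is a hypothetical name, and the sketch assumes the command line goes straight to CreateProcess rather than through cmd.exe:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Append one byte if there is room (keeping space for the NUL);
 * always count it, so the caller learns the full length needed. */
static void put_char(char *out, size_t cap, size_t *pos, char c)
{
    if (*pos + 1 < cap)
        out[*pos] = c;
    (*pos)++;
}

/*
 * Quote arg so that the MS C runtime splits it back into exactly one,
 * unchanged argument.  Rules: backslashes are literal unless they precede
 * a double quote; before a quote (or the closing quote) they are doubled.
 * Writes a NUL-terminated (possibly truncated) result into out and
 * returns the full length needed, excluding the NUL.
 */
static size_t quote_windows_arg(const char *arg, char *out, size_t cap)
{
    size_t pos = 0;
    assert(arg != NULL);
    if (*arg != '\0' && strpbrk(arg, " \t\n\v\"") == NULL) {
        for (const char *p = arg; *p; p++)      /* nothing special */
            put_char(out, cap, &pos, *p);
    } else {
        put_char(out, cap, &pos, '"');
        for (const char *p = arg;; p++) {
            size_t backslashes = 0;
            while (*p == '\\') {
                backslashes++;
                p++;
            }
            if (*p == '\0') {
                /* Double them so the closing quote stays a quote. */
                for (size_t i = 0; i < 2 * backslashes; i++)
                    put_char(out, cap, &pos, '\\');
                break;
            } else if (*p == '"') {
                /* Escape the backslashes and the quote itself. */
                for (size_t i = 0; i < 2 * backslashes + 1; i++)
                    put_char(out, cap, &pos, '\\');
                put_char(out, cap, &pos, '"');
            } else {
                /* Backslashes not before a quote are literal. */
                for (size_t i = 0; i < backslashes; i++)
                    put_char(out, cap, &pos, '\\');
                put_char(out, cap, &pos, *p);
            }
        }
        put_char(out, cap, &pos, '"');
    }
    if (cap > 0)
        out[pos < cap ? pos : cap - 1] = '\0';
    return pos;
}
```

The quoted arguments are then joined with single spaces to form the command line. cmd.exe metacharacters such as `&`, `|` and `^` need a separate layer of escaping. The program name (argv[0]) is also parsed by slightly different rules, so both are outside this sketch.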

David
-

From: Eli Zaretskii
Date: Sunday, October 14, 2007 - 11:08 pm

It's not hard, it's just a bit of work.  And it needs to be done
exactly once.
-

From: Andreas Ericsson
Date: Monday, October 15, 2007 - 3:16 am

Before someone beats me to it: "Patches welcome" ;-)

-- 
Andreas Ericsson                   andreas.ericsson@op5.se
OP5 AB                             www.op5.se
Tel: +46 8-230225                  Fax: +46 8-230231
-

From: Johannes Sixt
Date: Monday, October 15, 2007 - 3:38 am

From: Andreas Ericsson
Date: Monday, October 15, 2007 - 3:52 am

Yup. Although it was more in the nature of "whoever wrote it surely knows
he/she did it and where to find the patch", so I expect this wasn't much
of a timesink for you. My apologies if I was incorrect.

-- 
Andreas Ericsson                   andreas.ericsson@op5.se
OP5 AB                             www.op5.se
Tel: +46 8-230225                  Fax: +46 8-230231
-

From: Dave Korn
Date: Monday, October 15, 2007 - 4:16 am

                            /* Thanks, Bill. You'll burn in hell for that. */


  ;-)


    cheers,
      DaveK
-- 
Can't think of a witty .sigline today....

-

From: Peter Karlsson
Date: Tuesday, October 16, 2007 - 4:13 am

It's not the only OS with drive letters (although I don't see Git
coming to my Symbian OS phone any time soon), but there is only one
root. The problem is that it isn't addressable in the file system, and
that the concept of what is the root is different depending on what you
ask (either it's above the drive letters, or "My Computer").

You can create a search path rooted in "My Computer" if you want (using
shell APIs), but you probably can't get a readable text representation

Well, there are many other ways of passing arguments than on the
command line, but they are probably difficult to access from console
applications (things like DDE or whatever the current implementation is
called).

-- 
\\// Peter - http://www.softwolves.pp.se/
-

From: Martin Langhoff
Date: Sunday, October 14, 2007 - 10:43 pm

I'm a unix-head too. For the last couple of weeks I had to work on a Windows
server, and installed msysGit. Very impressed - all the needed

I think msys' DLLs might be doing what cygwin does, an emulated fork.

A quite surprising thing is that msysgit manages to be very fast. Not
as fast as the same git on the same hw running a recent Linux, but fast
enough to be pretty usable for a tree with a few thousand files.
Earlier/other git ports to win32 are pretty slow (still faster than svn
and friends, but slooow).

cheers,


m
-

From: Johannes Sixt
Date: Sunday, October 14, 2007 - 11:39 pm

FWIW, I'm using the MinGW port from cmd.exe, i.e. not from a posix shell, on 
a *production* repository. gitk and git-gui work. Not all operations that I 
regularly use are available[*] via the GUIs, like git-rebase or 
non-fast-forwarding push, so the use of the command line is needed from time 
to time.

Unfortunately, "Fetch" does not yet work[*] from within git-gui, so you have 
to fall back to git-fetch on the command line.

Of course, the Posix toolset, including a shell, must still be installed 
(and in my setup they are in the PATH), but you don't have to use it.

[*] Note the distinction between "not available" and "does not work".

-- Hannes

-

From: Shawn O. Pearce
Date: Monday, October 15, 2007 - 4:12 pm

Rebase in git-gui is starting to be developed.  But it's still not even
close to something I can use, let alone something I would be willing to
ship to another person for testing.

Force push (non-fast-forwarding push) is in git-gui.git's master
branch now as part of the 0.9.x series.  There's a new checkbox
option in the push dialog to trigger adding --force to git-push

What's broken?  Is this that Git protocol dump showing up in
git-gui's console window thing?

Are you using the C based fetch that is in git.git's next branch,
or the shell script based one that is in master?  Which Tcl/Tk
version are you using to run git-gui?

-- 
Shawn.
-

From: Johannes Sixt
Date: Monday, October 15, 2007 - 11:10 pm

It's the scripted fetch that does not work. The symptom is that the output 
of at least one of the commands (upload-pack, I think, because what I see is 
wire protocol) goes to a newly spawned console instead of wherever it was 
redirected to.

I didn't bother reporting since builtin-fetch is on the way (which will 
hopefully make this a moot point) and our team here is comfortable with 
calling git fetch on the command line.

-- Hannes

-

From: Shawn O. Pearce
Date: Monday, October 15, 2007 - 11:21 pm

Hmm.  The way the builtin-fetch works this shouldn't happen, but
I'd appreciate it if you could test and report back before that
topic merges into master.

-- 
Shawn.
-

From: Johannes Sixt
Date: Monday, October 15, 2007 - 11:29 pm

This happens with git 1.5.3 plus the git-gui that comes with that.

FWIW, I'm in the process of merging master of git.git into git/mingw.git, 
and then the builtin-fetch series (because on top of that there is my 
fork/exec removal series, which I'd like to adjust for Windows). And *then* 
I'll be able to report back to you.

-- Hannes

-

From: Johannes Schindelin
Date: Tuesday, October 16, 2007 - 8:16 am

Hi,


Note that Issue 57 on msysgit.googlecode.com talks about exactly the same 
issue.

Ciao,
Dscho

-
