Re: [test failure] Re: t4114 binary file becomes symlink

Previous thread: Build Git with Visual Studio 2008 by Frank Li on Saturday, July 18, 2009 - 6:35 am. (1 message)

Next thread: git svn .git/svn/*/index files taking up huge amounts of disk space by Robert Zeh on Saturday, July 18, 2009 - 7:58 am. (2 messages)
From: Nicolas Sebrecht
Date: Saturday, July 18, 2009 - 6:45 am

Hi,

Running 'sh t4114-apply-typechange.sh --verbose --debug' fails since its
introduction by b67b9612e1a90ae093445abeaeff930e9f4cf936 with this
output:


   * expecting success: 
   	git checkout -f foo-becomes-binary &&
   	git diff-tree -p --binary HEAD foo-symlinked-to-bar > patch &&
   	git apply --index < patch
   	
   ./test-lib.sh: line 234: 26816 Segmentation fault      git checkout -f
   foo-becomes-binary
   * FAIL 8: binary file becomes symlink
   	
   		git checkout -f foo-becomes-binary &&
   		git diff-tree -p --binary HEAD foo-symlinked-to-bar > patch &&
   		git apply --index < patch
   		
   
   diff --git a/foo b/foo
   deleted file mode 120000
   index ba0e162..0000000
   --- a/foo
   +++ /dev/null
   @@ -1 +0,0 @@
   -bar
   \ No newline at end of file
   diff --git a/foo b/foo
   new file mode 100644
   index 0000000..ab26de5
   --- /dev/null
   +++ b/foo
   @@ -0,0 +1 @@
   +how far is the sun?


The filesystem used is ext3 (2.6.26-gentoo-r4-v7).

I guess it's not really expected. So, I'll start my own research but I
don't know this code.

I can provide more tests if needed.

-- 
Nicolas Sebrecht
--

From: Jeff King
Date: Saturday, July 18, 2009 - 6:56 am

Sorry, I can't reproduce here (I tried v1.6.3 and the current
'next'). The tests pass just fine with --debug (which, IIRC, doesn't
actually do much). What is the exact commit you're seeing it fail on?

Can you try running it under gdb to get a stack trace? If you have
valgrind installed, can you run the test script with --valgrind?

-Peff
--

From: Nicolas Sebrecht
Date: Saturday, July 18, 2009 - 7:16 am

It fails on:
  - next
  - v1.6.3
  - b67b9612e1a90ae093445abeaeff930e9f4cf936

$ sh t4114-apply-typechange.sh --valgrind

<snip>

* expecting success: 
	git checkout -f foo-becomes-binary &&
	git diff-tree -p --binary HEAD foo-symlinked-to-bar > patch &&
	git apply --index < patch
	
==10807== Invalid read of size 1
==10807==    at 0x4C22349: strlen (in /usr/lib64/valgrind/amd64-linux/vgpreload_memcheck.so)
==10807==    by 0x5616ED6: vfprintf (in /lib64/libc-2.8.so)
==10807==    by 0x563C159: vsnprintf (in /lib64/libc-2.8.so)
==10807==    by 0x495E90: git_vsnprintf (snprintf.c:38)
==10807==    by 0x48917B: strbuf_addf (strbuf.c:203)
==10807==    by 0x412AA0: cmd_checkout (builtin-checkout.c:364)
==10807==    by 0x404222: handle_internal_command (git.c:243)
==10807==    by 0x404466: main (git.c:483)
==10807==  Address 0x1 is not stack'd, malloc'd or (recently) free'd
{
   <insert a suppression name here>
   Memcheck:Addr1
   fun:strlen
   fun:vfprintf
   fun:vsnprintf
   fun:git_vsnprintf
   fun:strbuf_addf
   fun:cmd_checkout
   fun:handle_internal_command
   fun:main
}
==10807== 
==10807== Process terminating with default action of signal 11 (SIGSEGV)
==10807==  Access not within mapped region at address 0x1
==10807==    at 0x4C22349: strlen (in /usr/lib64/valgrind/amd64-linux/vgpreload_memcheck.so)
==10807==    by 0x5616ED6: vfprintf (in /lib64/libc-2.8.so)
==10807==    by 0x563C159: vsnprintf (in /lib64/libc-2.8.so)
==10807==    by 0x495E90: git_vsnprintf (snprintf.c:38)
==10807==    by 0x48917B: strbuf_addf (strbuf.c:203)
==10807==    by 0x412AA0: cmd_checkout (builtin-checkout.c:364)
==10807==    by 0x404222: handle_internal_command (git.c:243)
==10807==    by 0x404466: main (git.c:483)
==10807==  If you believe this happened as a result of a stack overflow in your
==10807==  program's main thread (unlikely but possible), you can try to increase
==10807==  the size of the main thread stack using the --main-stacksize= flag.
==10807==  The main thread ...
From: Jeff King
Date: Saturday, July 18, 2009 - 8:31 am

Hmm. So it is clearly reproducible on your system, but not on mine. I
wonder what the difference could be.

Are you compiling with any special options? I usually compile with just
"-g -Wall -Werror", but I also tried with "-O2" and couldn't reproduce.

Looking at that strbuf_addf call, we presumably have a bogus pointer
either in old->name or new->name. Which is odd, since reading the code,
both get memset() to zero, and then assigned from something which should
be sane.

At this point, I would try either running it under gdb or putting in
some debugging printfs into update_refs_for_switch to try to isolate
where the bogus value is coming from (valgrind sees it as cmd_checkout,
but presumably that is because it inlines the static
update_refs_for_switch).

Can you try that? Otherwise, I'm not sure how to proceed because I can't
reproduce it on my box.

-Peff
--

From: Junio C Hamano
Date: Saturday, July 18, 2009 - 11:46 am

Me neither, but I found an unrelated anomaly in the output from the test
nearby.

    Switched to branch 'foo-baz-renamed-from-foo'
    *   ok 3: file renamed from foo/baz to foo

    * expecting success:
            git checkout -f foo-becomes-a-directory &&
            git diff-tree -p HEAD initial > patch &&
            git apply --index < patch

    error: Invalid path ''
    Switched to branch 'foo-becomes-a-directory'
    *   ok 4: directory becomes file

Notice the "error"?

This is coming from the oneway_merge() codepath in unpack-trees.c.

When we switch branches with "checkout -f", unpack_trees() feeds two 
cache_entries to oneway_merge() function in its src[] array argument.  The
zeroth entry comes from the current index, and the first entry represents
what the merge result should be, taken from the tree recorded in the
commit we are switching to.

When we have a blob (either regular file or a symlink) in the index and in
the work tree at path "foo", and the switched-to tree has "foo/bar",
i.e. "foo" becomes a directory, src[0] is obviously that blob currently
registered at "foo".  Even though we do not have anything at "foo" in the
switched-to tree, src[1] is _not_ NULL.

The unpack_trees() machinery places a special marker df_conflict_entry
to signal that no blob exists at "foo", but it will become a directory
that may have somthing underneath it, namely "foo/bar".

Passing that df_conflict_entry marker to merged_entry() happens to remove
the "foo" in the end because the df_conflict_entry does not have any name
(hence the "error" message) and its addition in add_index_entry() is
rejected, but it is wrong.

 unpack-trees.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/unpack-trees.c b/unpack-trees.c
index 48d862d..720f7a1 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -999,7 +999,7 @@ int oneway_merge(struct cache_entry **src, struct unpack_trees_options *o)
 		return error("Cannot do a oneway merge of %d trees",
 ...
From: Nicolas Sebrecht
Date: Saturday, July 18, 2009 - 1:39 pm

The Makefile have

  215:CFLAGS = -g -O2 -Wall
  217:ALL_CFLAGS = $(CFLAGS)

and it should be OK. I know that -Werror will fail on

  libgit.a(mkdtemp.o): In function `gitmkdtemp':
    /home/nicolas/dev/official_packages/git/compat/mkdtemp.c:5: warning: the
    use of `mktemp' is dangerous, better use `mkstemp'

but I think this is an unrelated problem.

$ gcc --version

I couldn't reproduce this error here with

  sh t4114-apply-typechange.sh --verbose
and
  sh t4114-apply-typechange.sh --verbose --debug

on the current next.

With this patch, I still fail on the test.

-- 
Nicolas Sebrecht
--

From: Linus Torvalds
Date: Saturday, July 18, 2009 - 4:18 pm

Ack. This looks like a really old bug.

That whole 'df_conflict_entry' always confused me.

		Linus
--

From: Johannes Sixt
Date: Saturday, July 18, 2009 - 12:06 pm

amd64-linux, and you build with SNPRINTF_RETURNS_BOGUS? Why do you have this 
option set?

-- Hannes
--

From: Nicolas Sebrecht
Date: Saturday, July 18, 2009 - 1:17 pm

I don't tune any flags myself and

  $ printf "$CC\n $CFLAGS\n $LDFLAGS\n $LIBS\n $CPPFLAGS\n $CPP\n"

is empty. Is there another way to accidentally set a flag other than
environment variables or options to 'make'?

I've tried

  make
  make install

and

  ./configure --prefix=/home/nicolas/bin
  make
  make install

I don't know if it changes anything but both failed with the same error.


-- 
Nicolas Sebrecht
--

From: Nicolas Sebrecht
Date: Saturday, July 18, 2009 - 2:13 pm

Hum, 'rm configure ; make configure ; ./configure' give

 checking whether snprintf() and/or vsnprintf() return bogus value... yes

WTF?

-- 
Nicolas Sebrecht
--

From: Jakub Narebski
Date: Sunday, July 19, 2009 - 3:33 am

It looks like some recent bug in configure.ac, as I have run
./configure without "make configure" and it had

  NO_LIBGEN_H=@NO_LIBGEN_H@
  NEEDS_RESOLV=@NEEDS_RESOLV@

  SNPRINTF_RETURNS_BOGUS=

and when I did "rm configure ; make configure ; ./configure"
it gave me

  NO_LIBGEN_H=
  NEEDS_RESOLV=

  SNPRINTF_RETURNS_BOGUS=UnfortunatelyYes

I have tried to find which commit introduced this regression.

 $ git bisect start origin v1.6.3 v1.6.3.2 -- configure.ac config.mak.in
 $ git bisect run ~/git/test.sh

finds ecc395c (Makefile: add NEEDS_LIBGEN to optionally add -lgen to
compile arguments, 2009-07-10) as a first bad commit.  But I don't see
how it could have changed it... Strange...

CC-ed Brandon Casey, author of blamed changeset, and David Syzdek who
offered at some time help with maintaining autoconf.


P.S. Perhaps it is time for creating MAINTAINERS file for git?

-- >8 --
#!/bin/bash

rm -f configure &&
make configure  &&
./configure -q  &&
grep -q "^SNPRINTF_RETURNS_BOGUS=$" config.mak.autogen

# end of test.sh
-- 
Jakub Narebski
Poland
ShadeHawk on #git
--

From: Nicolas Sebrecht
Date: Sunday, July 19, 2009 - 5:48 am

Thank you, I did the same investigation here. :-)


AC_CHECK_LIB may fail to check the good libraries. Use
AC_SEARCH_LIBS instead as pointed out by the autotool
documentation.

http://www.gnu.org/software/autoconf/manual/html_node/Libraries.html#Libraries

Signed-off-by: Nicolas Sebrecht <nicolas.s.dev@gmx.fr>
---

 configure.ac |   16 ++++++++--------
 1 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/configure.ac b/configure.ac
index 74d0af5..48dd5f3 100644
--- a/configure.ac
+++ b/configure.ac
@@ -339,9 +339,9 @@ AC_MSG_NOTICE([CHECKS for libraries])
 
 GIT_STASH_FLAGS($OPENSSLDIR)
 
-AC_CHECK_LIB([crypto], [SHA1_Init],
+AC_SEARCH_LIBS([SHA1_Init], [crypto],
 [NEEDS_SSL_WITH_CRYPTO=],
-[AC_CHECK_LIB([ssl], [SHA1_Init],
+[AC_SEARCH_LIBS([SHA1_Init], [ssl],
  [NEEDS_SSL_WITH_CRYPTO=YesPlease
   NEEDS_SSL_WITH_CRYPTO=],
  [NO_OPENSSL=YesPlease])])
@@ -358,7 +358,7 @@ AC_SUBST(NO_OPENSSL)
 
 GIT_STASH_FLAGS($CURLDIR)
 
-AC_CHECK_LIB([curl], [curl_global_init],
+AC_SEARCH_LIBS([curl_global_init], [curl],
 [NO_CURL=],
 [NO_CURL=YesPlease])
 
@@ -372,7 +372,7 @@ AC_SUBST(NO_CURL)
 
 GIT_STASH_FLAGS($EXPATDIR)
 
-AC_CHECK_LIB([expat], [XML_ParserCreate],
+AC_SEARCH_LIBS([XML_ParserCreate], [expat],
 [NO_EXPAT=],
 [NO_EXPAT=YesPlease])
 
@@ -469,7 +469,7 @@ AC_SUBST(NO_DEFLATE_BOUND)
 #
 # Define NEEDS_SOCKET if linking with libc is not enough (SunOS,
 # Patrick Mauritz).
-AC_CHECK_LIB([c], [socket],
+AC_SEARCH_LIBS([socket], [c],
 [NEEDS_SOCKET=],
 [NEEDS_SOCKET=YesPlease])
 AC_SUBST(NEEDS_SOCKET)
@@ -479,13 +479,13 @@ test -n "$NEEDS_SOCKET" && LIBS="$LIBS -lsocket"
 # Define NEEDS_RESOLV if linking with -lnsl and/or -lsocket is not enough.
 # Notably on Solaris hstrerror resides in libresolv and on Solaris 7
 # inet_ntop and inet_pton additionally reside there.
-AC_CHECK_LIB([resolv], [hstrerror],
+AC_SEARCH_LIBS([hstrerror], [resolv],
 [NEEDS_RESOLV=],
 [NEEDS_RESOLV=YesPlease])
 AC_SUBST(NEEDS_RESOLV)
 test -n "$NEEDS_RESOLV" && ...
From: Nicolas Sebrecht
Date: Sunday, July 19, 2009 - 6:14 am

The wrong check of lib gen added a flag ' -lgen' to $LIBS (in configure).
This wrong flag then gave:

 [...]/bin/ld: cannot find -lgen collect2: ld returned 1 exit status


I wonder if we could have some feedbacks on this patch. The change on
the whole file may make more bad than good. Works as expected here on
Linux (glibc 2.9).

Otherwise, the following lines are sufficient to correct the original

-- 
Nicolas Sebrecht
--

From: Junio C Hamano
Date: Sunday, July 19, 2009 - 9:13 am

Thanks.

The autoconf documentation does not give much confidence either way:

	AC_CHECK_LIB requires some care in usage, and should be avoided in
	some common cases.

It is unclear what these "some" are, and is sufficiently unclear for us to
decide if our situation falls into these "some common cases."  The only
problem it cites is that alternative libraries may contain a variant
implementation of the same function found in a more common library, in
which case you may not want to use such an alternative implementation, and
I do not think "basename() in -lgen" falls into that category.

It does say

	These days it is normally better to use AC_SEARCH_LIBS(...)
	instead of AC_CHECK_LIB(...).

but without any further justification, which does not help building
confidence in the suggestion.

It is very tempting to revert the configure.ac part of commit ecc395c
especially in an -rc freeze, but if some autoconf experts can help us
decide by clarifying the situation, and explain why AC_SEARCH_LIBS() is
indeed the better way these days (which I suspect it is), I'd say we can
apply your patch to favor AC_SEARCH_LIBS() over AC_CHECK_LIB() with more
confidence.

If autoconf people say AC_SEARCH_LIBS() should be preferred over
AC_CHECK_LIB(), it probably is the right thing to do.  But I want to have
a warmer-and-fuzzier feeling that we understand why it went wrong, rather
than blindly doing what the documentation says because the documentation
tells us to.
--

From: Eric Blake
Date: Sunday, July 19, 2009 - 3:53 pm

As autoconf maintainer (and hoping to release autoconf 2.64 soon), yes, I can
definitively say that AC_SEARCH_LIBS is nicer than AC_CHECK_LIB.

-- 
Eric Blake


--

From: Brandon Casey
Date: Tuesday, July 21, 2009 - 8:04 am

After actually reading the autoconf documentation, don't some of the
tests have the [action-if-found] and [action-if-not-found] actions
backwards?

AC_CHECK_LIB has the form:

   AC_CHECK_LIB (library, function, [action-if-found], [action-if-not-found], [other-libraries])

The test I added (which I blindly copied from the NEEDS_RESOLV test) looks like:

   AC_CHECK_LIB([gen], [basename], [NEEDS_LIBGEN=], [NEEDS_LIBGEN=YesPlease])
   AC_SUBST(NEEDS_LIBGEN)
   test -n "$NEEDS_LIBGEN" && LIBS="$LIBS -lgen"

Won't this check whether a program which calls basename() successfully links with -lgen?
If it successfully links, then it will perform NEED_LIBGEN= and if not it will set
NEEDS_LIBGEN=YesPlease, right?  Isn't that the opposite of what should be done?

If that is correct, then the NEEDS_RESOLV and NEEDS_LIBGEN tests are wrong and they
may still be wrong even if AC_SEARCH_LIBS is used instead of AC_CHECK_LIB.


A question about AC_SEARCH_LIBS:

With AC_SEARCH_LIBS, which of [action-if-found] or [action-if-not-found]
is executed if the function is found in the standard c library i.e. "calling
`AC_LINK_IFELSE([AC_LANG_CALL([], [function])])' first with no libraries"?
Is the answer neither?  If the answer is [action-if-found], won't the
NEEDS_LIBGEN=YesPlease be set when the function is found in the c library?
Assuming the NEEDS_LIBGEN test is made to look like this:

   AC_SEARCH_LIBS([basename], [gen], [NEEDS_LIBGEN=YesPlease], [NEEDS_LIBGEN=])

Depending on the answer to that question, we either will want to use
AC_SEARCH_LIBS, or stick with AC_CHECK_LIB but correct the [action] fields,
or maybe even stick with AC_CHECK_LIB but rework the NEEDS_RESOLV and
NEEDS_LIBGEN tests to look like the NEEDS_SOCKET test.

-brandon
--

From: Brandon Casey
Date: Tuesday, July 21, 2009 - 8:12 am

btw, the "`AC_LINK_IFELSE(..." line is from the autoconf documentation.  It is
the first thing that AC_SEARCH_LIBS does.  It tries to link without any of the
specified libraries first, then it iterates trying to link using each specified
library, stopping when it is successful.

If you didn't recognize that that line was from the documentation, it may be
confusing.

-brandon
--

From: Paolo Bonzini
Date: Tuesday, July 21, 2009 - 8:20 am

It evaluates the action-if-found and adds nothing to LIBS.  Instead, if 
it is found in a library, it evaluates the action-if-found after adding 
(actually prepending) -lBLAH to LIBS.

Paolo
--

From: Brandon Casey
Date: Tuesday, July 21, 2009 - 8:34 am

That's what I suspected.  It means we can't have NEEDS_SOMETHING=true in
the action-if-found parameter when using AC_SEARCH_LIBS since that may cause
the Makefile to append a library requirement when none is necessary.

-brandon
--

From: Brandon Casey
Date: Tuesday, July 21, 2009 - 1:23 pm

From: Brandon Casey <drafnel@gmail.com>

The "action" parameters for these two tests were supplied incorrectly for
the way the tests were implemented.  The tests check whether a program
which calls hstrerror() or basename() successfully links when -lresolv or
-lgen are used, respectively.  A successful linking would result in
NEEDS_RESOLV or NEEDS_LIBGEN being unset, and failure would result in
setting the respective variable.

Aside from that issue, the tests did not handle the case where neither
library was necessary for accessing the functions in question.  So solve
both of these issues by re-working the two tests so that their form is like
the NEEDS_SOCKET test which attempts to link with just the c library, and
if it fails then assumes that the additional library is necessary and sets
the appropriate variable.

Signed-off-by: Brandon Casey <drafnel@gmail.com>
---


Maybe this is the appropriate thing to do?

-brandon


 configure.ac |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/configure.ac b/configure.ac
index 74d0af5..ba44cf2 100644
--- a/configure.ac
+++ b/configure.ac
@@ -479,13 +479,13 @@ test -n "$NEEDS_SOCKET" && LIBS="$LIBS -lsocket"
 # Define NEEDS_RESOLV if linking with -lnsl and/or -lsocket is not enough.
 # Notably on Solaris hstrerror resides in libresolv and on Solaris 7
 # inet_ntop and inet_pton additionally reside there.
-AC_CHECK_LIB([resolv], [hstrerror],
+AC_CHECK_LIB([c], [hstrerror],
 [NEEDS_RESOLV=],
 [NEEDS_RESOLV=YesPlease])
 AC_SUBST(NEEDS_RESOLV)
 test -n "$NEEDS_RESOLV" && LIBS="$LIBS -lresolv"
 
-AC_CHECK_LIB([gen], [basename],
+AC_CHECK_LIB([c], [basename],
 [NEEDS_LIBGEN=],
 [NEEDS_LIBGEN=YesPlease])
 AC_SUBST(NEEDS_LIBGEN)
-- 
1.6.3.1.24.g152f4

--

From: Junio C Hamano
Date: Tuesday, July 21, 2009 - 1:33 pm

Thanks.

Instead of saying that "hstrerror not in -lc means we do have -lresolv and
the function will be found there" blindly, we may want to have a nested
check.

	AC_CHECK_LIB([c], [hstrerror], [NEEDS_RESOLV=],
       	    AC_CHECK_LIB([resolv], [hstrerror], [NEEDS_RESOLV=YesPlease]))

But we do not have any provision for the case where -lc does not have it
and -lresolv does not have it either (or -lresolv does not exist) anyway,
so we might as well go with your patch.

I take it that swapping [if-found][if-not-found] parameters is what the
autoconf documentation warns against?  That is, both -lc and -lresolv may
have it but -lresolv one may be a specialized one you would not normally
--

From: Brandon Casey
Date: Wednesday, July 22, 2009 - 7:59 am

Yeah, I understood how the NEEDS_SSL_WITH_CRYPTO test worked like this, but
I didn't make the tests for NEEDS_RESOLV and NEEDS_LIBGEN any more complex

That's what I assume, and that's probably why AC_SEARCH_LIBS was invented which
attempts to link without using any of the specified libraries first.  It just
doesn't seem to fit our needs in this case.

-brandon
--

From: Brandon Casey
Date: Wednesday, July 22, 2009 - 3:15 pm

From: Brandon Casey <drafnel@gmail.com>

An entry in the config.mak.in file is necessary for the NEEDS_LIBGEN variable
to appear in the config.mak.autogen file with the value assigned by the
configure script.

Signed-off-by: Brandon Casey <drafnel@gmail.com>
---


Junio,

You probably want to apply or squash this somewhere.

As I noted in the original patch, the autoconf part was untested.
I actually did some light testing on this one.  I created the configure file
on linux and ran it on Solaris and IRIX.  Both produce an error which looks
something like:

   configure[1234]: syntax error at line 4806 : `;' unexpected

And line 4806 looks like:

   for ac_lib in ; do
      ...

It works with bash though, and it works with /bin/sh on Solaris 10.  On
Solaris 10, the configure script correctly detects that hstrerror cannot be
used without -lresolv, and basename can be used without -lgen.  In
config.mak.autogen, NEEDS_RESOLV is set to 'YesPlease' and NEEDS_LIBGEN is
unset.  On my Solaris 7, bash is not available, but the informational messages
indicate the same results as for Solaris 10.  On IRIX, hstrerror() can be used
without -lresolv and basename cannot be used without -lgen.  In
config.mak.autogen, NEEDS_RESOLV is unset, and NEEDS_LIBGEN is set to
'YesPlease'.

So, I think this patch should be the final one.

-brandon


 config.mak.in |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/config.mak.in b/config.mak.in
index dd60451..67b12f7 100644
--- a/config.mak.in
+++ b/config.mak.in
@@ -34,6 +34,7 @@ NO_LIBGEN_H=@NO_LIBGEN_H@
 NEEDS_LIBICONV=@NEEDS_LIBICONV@
 NEEDS_SOCKET=@NEEDS_SOCKET@
 NEEDS_RESOLV=@NEEDS_RESOLV@
+NEEDS_LIBGEN=@NEEDS_LIBGEN@
 NO_SYS_SELECT_H=@NO_SYS_SELECT_H@
 NO_D_INO_IN_DIRENT=@NO_D_INO_IN_DIRENT@
 NO_D_TYPE_IN_DIRENT=@NO_D_TYPE_IN_DIRENT@
-- 
1.6.3.1.24.g152f4

--

From: Junio C Hamano
Date: Wednesday, July 22, 2009 - 3:35 pm

Thanks.
--

From: Brandon Casey
Date: Thursday, July 23, 2009 - 9:22 am

Junio,

I see you squashed this and applied it to master (a1142892), but
since my email was not clear, the commit message is not accurate
(in an unimportant way).

You said

    Without [the addition of NEEDS_LIBGEN to config.mak.in], the
    generated shell script would contain a snippet like this:

       for ac_lib in ; do
         ...

    which is incorrect.

Actually, the "for ac_lib in ; do" snippet is produced with or without
my patch.  I didn't mean to imply that there was a change in that
respect.  It's just that that snippet is what prevents the configure
script from completing successfully on IRIX and Solaris 7 when executing
it with /bin/sh or /bin/ksh.  When bash is used, the configure script
can execute successfully, and then it can be verified that the patched
configure.ac produces a configure script that operates correctly.

Sorry for the confusion.

-brandon



--

From: Johannes Sixt
Date: Saturday, July 18, 2009 - 3:03 pm

If you are on Linux and only want a special prefix, you don't 
need ./configure; just:

   echo prefix=/home/nicolas/bin > config.mak
   make && make install

-- Hannes
--

From: Jeff King
Date: Saturday, July 18, 2009 - 3:29 pm

Ah, that's what I was missing. I can reproduce it by setting
SNPRINTF_RETURNS_BOGUS. I think the problem is in the git_vsnprintf
code, and it just by coincidence triggers in this test because of the
exact string we are trying to format.

Look at compat/snprintf.c. In git_vsnprintf, we are passed a "va_list
ap", which we then repeatedly call vsnprintf on, checking the return to
make sure we have enough space. But using a va_list repeatedly without a
va_end and va_start in the middle invokes undefined behavior. So we need
to va_copy it and use the copy.

A patch is below, which fixes the problem for me. However, va_copy is
C99, so we would generally try to avoid it. But I don't think there is a
portable way of writing this function without it. And most systems
shouldn't need to use our snprintf at all, so maybe it is portable
enough. I dunno.

---
diff --git a/compat/snprintf.c b/compat/snprintf.c
index 6c0fb05..e862db6 100644
--- a/compat/snprintf.c
+++ b/compat/snprintf.c
@@ -18,9 +18,12 @@ int git_vsnprintf(char *str, size_t maxsize, const char *format, va_list ap)
 {
 	char *s;
 	int ret = -1;
+	va_list cp;
 
 	if (maxsize > 0) {
-		ret = vsnprintf(str, maxsize-SNPRINTF_SIZE_CORR, format, ap);
+		va_copy(cp, ap);
+		ret = vsnprintf(str, maxsize-SNPRINTF_SIZE_CORR, format, cp);
+		va_end(cp);
 		if (ret == maxsize-1)
 			ret = -1;
 		/* Windows does not NUL-terminate if result fills buffer */
@@ -39,7 +42,9 @@ int git_vsnprintf(char *str, size_t maxsize, const char *format, va_list ap)
 		if (! str)
 			break;
 		s = str;
-		ret = vsnprintf(str, maxsize-SNPRINTF_SIZE_CORR, format, ap);
+		va_copy(cp, ap);
+		ret = vsnprintf(str, maxsize-SNPRINTF_SIZE_CORR, format, cp);
+		va_end(cp);
 		if (ret == maxsize-1)
 			ret = -1;
 	}
--

From: Nicolas Sebrecht
Date: Saturday, July 18, 2009 - 3:51 pm

My investigations made me realize I was building a 64-bits git version
in a 32-bits userland (gentoo flag multilib set) which is not the best
thing to do. So, another possible fix is to export CFLAGS with '-m32'.
Mixing 32 and 64-bits applications is bad. :-)

I confirm this patch does fix the failure for the 64 bits version. Thank
you all.

Now, I wonder if it is safe to run a 32-bits git version on repositories
built with a 64-bits version. It should be safe but do you think it
actually is?

-- 
Nicolas Sebrecht
--

From: Johannes Sixt
Date: Sunday, July 19, 2009 - 4:01 am

Problem is, snprintf was made for very old systems, which typically do not 
have va_copy. (E.g. Windows, but there the situation *might* have changed 
with the switch to gcc 4.)

The rationale not to use va_copy is that this function is to be used *only* if 
necessary, i.e. portability is already lacking, and if it can be verified 
that this function works as is. Portability and correct-by-the-law C code are 
*not* a goal here. If the function does not work as is, don't use it.

-- Hannes
--

From: Jeff King
Date: Monday, July 20, 2009 - 2:09 am

OK, I guess I can buy the "don't use this unless you need it" rationale.
But two questions:

  1. _Are_ we sure it works under Windows?  That is, do we know for a
     fact that using a va_list twice is OK there, or are we just going
     on the fact that nobody has reported the bug?

     If we're not sure, then you might want to try running the recipe
     below which consistently produces a segfault for me on Linux amd64
     (but not i386, which seems to use a different representation for
     va_lists).

  2. In this case, using SNPRINTF_RETURNS_BOGUS was a mistake. But
     unfortunately using it erroneously doesn't simply cause the
     compilation to barf, or to use a slower implementation; instead it
     introduces a very subtle and hard to diagnose bug on some
     platforms. Is there anything simple we can do to protect people
     from that?

     I can't really think of anything simple, because such a mechanism
     would basically involve compiling a test program and seeing if it
     segfaults.

Anyway, bug-reproducing recipe is below.

-- >8 --
cat <<'EOF' >test-vsnprintf.c
#define SNPRINTF_RETURNS_BOGUS
#include "git-compat-util.h"
int main() {
        char buf[16];
        /* this 8 may need to be tweaked depending on
         * the system's vsnprintf return value; the goal
         * is to get git_vsnprintf to have to look at
         * it's va_list twice */
        git_snprintf(buf, 8, "%s %s", "foo", "bar");
        return 0;
}
EOF
make test-vsnprintf.o compat/snprintf.o
gcc -o test-vsnprintf test-vsnprintf.o compat/snprintf.o
./test-vsnprintf ;# or valgrind test-vsnprintf
--

From: Johannes Sixt
Date: Monday, July 20, 2009 - 1:51 pm

We are sure (well, I am sure ;) that it worked on Windows with gcc 3. It 
certainly is a reasonable workaround. It remains to confirm that the 
workaround works as expected on the other systems that use it (IRIX, 
HP/UX).  'git branch -v' is a test that calls the system provided vsnprintf 
twice (as long as the branch head commits have moderatly long summary lines).

On Windows, however, today everybody who compiles git is most likely using 
msysgit's gcc 4.4. This version has C99 vsnprintf, and the workaround should 
not be used anymore, although it does not hurt.

-- Hannes
--

From: Linus Torvalds
Date: Monday, July 20, 2009 - 2:56 pm

Not only that, but it's literally how people used to historically do 
things.

Sure, some places do require 'va_copy()', and there's a reason why it got 
added in C99. But there's also a reason why it wasn't added earlier - 
because it didn't exist!

So I agree with Johannes: we should _not_ make our compat version of 
'snprintf()' use va_copy, for the simple reason that

 - modern C library versions will have a working snprintf already

 - old setups that don't have a working snprintf quite likely don't have a 
   va_copy either.

So the set of systems that need both the va_copy _and_ our compat version 
of snprintf is likely much smaller than the set of systems that would 
actively break compilation if they don't have va_copy.

Of course, we could also add a new "NEEDS_VA_COPY" thing, and do

	#ifdef NEEDS_VA_COPY
	  #define va_copy(dst,src) (dst) = (src)
	#endif

or something like that.

		Linus
--

Previous thread: Build Git with Visual Studio 2008 by Frank Li on Saturday, July 18, 2009 - 6:35 am. (1 message)

Next thread: git svn .git/svn/*/index files taking up huge amounts of disk space by Robert Zeh on Saturday, July 18, 2009 - 7:58 am. (2 messages)