Re: [PATCH v2 04/13] Teach rebase interactive the mark command

Previous thread: git-send-email: Skipping - not found. by Timur Tabi on Monday, April 28, 2008 - 2:15 pm. (3 messages)

Next thread: Re: Dumb idea on git library for script languages by Eric Hanchrow on Monday, April 28, 2008 - 9:06 pm. (1 message)
From: Junio C Hamano
Date: Monday, April 28, 2008 - 5:25 pm

No, read the message again and think for 5 minutes.

Picking the same commit twice does not make any sense, neither does
picking the resulting commit from an earlier operation in the same
sequencer run.  Which means that the commit object name for 'pick' can
mean _only_ the pre-rewritten commit object, not 'the result of an earlier
operation that used that commit'.  And you always pick on top of the
current (detached) HEAD.

Reset is different.  You can reset either to the named commit to start
building from a known state that existed before the sequencer run started,
or reset to the result of pick (or merge) of the named commit, and your
proposal breaks down here, because you cannot tell between the two.

To rebuild this history on top of a commit O' elsewhere:

        O---A---B
             \   \
              D---E---F---G
                         / 
                        X

you would need to:

	pick A
        pick B
        reset <<to the state after "pick A">>
        pick D
        merge <<the state after "pick B">>
        pick F
        merge X (taken from somebody else)

and the syntax proposed to express <<the above part>> can either be your
"the result of the last operation that used the named commit", which is
simple in some cases, or "named commit, be it with mark or standard sha-1
expression".

Introducing a 'mark' insn to mark the previous result you may want to go
back to is one way to solve this without ambiguity.  Then abbreviated
object name won't have to be mapped as in your proposal.
--

From: Johannes Schindelin
Date: Monday, April 28, 2008 - 5:39 pm

Hi,


pick abcdefg
pick pqrstuv

Now imagine that pqrstuv is a unique commit name _before_ cherry-picking 
abcdefg, but not _after_ it.  Unlikely?  Yes.  Impossible?  No.

Ciao,
Dscho

--

From: Junio C Hamano
Date: Monday, April 28, 2008 - 10:17 pm

No, I am still being misunderstood.  That is not the uniqueness issue I
was talking about.

As I tried to describe in a message that used a phrase "the beauty of your
approach" (which I suspect you stopped reading after that phrase), your
rule "an object name names the commit after the last operation that used
the commit, unless it is the first appearance in the sequence, in which
case it names the original commit" works most of the time, but becomes
cumbersome and unwieldy when the history you try to reproduce becomes more
elaborate.  And the reason I suggested 'mark' was because I was seeing
this in the context of the sequencer, not just "rebase -i".

This is just a minor syntax issue and I am not sure why we got into this
misunderstanding, but let's try again.  Suppose you want to recreate this
history on top of a different O'.  For merges, upper parents are earlier
ones:

     A         reset O'
    / \        pick  B
   /   X       reset O'
  /   / \      pick A
 O---B   Z     merge B -- recreate X
  \   \ /      reset O'
   \   Y       pick C
    \ /        merge B? -- recreate Y
     C         reset B -- go back to recreated X
               merge B? -- recreate Z

The above sequence does not work.  As you already used B to create X',
when you want to recreate Y, the last operation used B is now X' and you
cannot name B' to merge that to C' to produce Y' (and the above sequence
is wrong in that it attempts to recreate Y' in a wrong order --- we should
reset to the recreated B and merge recreated C to produce Y', but for that
you need to be able to say "reset to B'", but you've already used B to
recreate X' so you cannot do that).

You could (because you are intelligent human) reorder things to take
advantage of the fact that pick and merge uses the "current detached HEAD"
to avoid refering to B in this particular case, like this:

        reset O'
        pick A
        reset O'
        pick B	
        reset O'
        pick C
        reset B	-- go back to ...
From: Johannes Sixt
Date: Tuesday, April 29, 2008 - 12:12 am

Because it is hand-crafted. I'd expect rebase to suggest a series that
works as long as the user doesn't modify it. Like this:

	reset O'
	pick C
	reset O'
	pick B
	merge C -- recreate Y
	reset O'
	pick A
	merge B -- recreate X
	merge Y -- recreate Z

Here all commit names are clearly the original in the first insn that
references it, and the rewritten version in later references. No marks needed.

If the user modifies the insns, he better knows what he's doing, in
particular, when it's necessary to rebuild such complex histories.

-- Hannes

--

From: Johannes Schindelin
Date: Tuesday, April 29, 2008 - 3:52 am

Hi,


I fully agree.  rebase -i is _not_ about the same goal as git-sequencer.  
rebase -i is about user interaction.  sequencer is about having a common 
plumbing for the different porcelains.

And of course, if you want to play games with rebase -i, you can _always_ 
use the "edit" command (even if you do not plan to edit) to get the commit 
name of the new commit.

And you can _always_ use the _full_ commit name to reference the original 
commit (at least that is how I planned it: the original _short_ name would 
be replaced by the rewritten commit name, but not the _long_ name).

Ciao,
Dscho

--

From: Junio C Hamano
Date: Tuesday, April 29, 2008 - 2:16 pm

The problem is that both of you stopped reading after the part you quoted.

--

From: Johannes Schindelin
Date: Tuesday, April 29, 2008 - 2:25 pm

Hi,


I did not.  But I assumed that Hannes' example showed that it is always 
possible to reorder the commands such that you there is no problem with 
interpreting a short commit name as the original commit _until_ that 
commit is rewritten, and _then_ as the rewritten commit.

It is a simple matter of the word "acyclic" in the term "DAG".  It means 
that whenever you need to refer to a commit, it either comes before or 
after the commit you need it for, not both directions.

And I tried to make clear that I thought deeply about the issue by 
mentioning that you can always use "edit" to stop somewhere and mark (even 
by a lightweight tag), should you need to split a commit such that you 
need to reference both the original and the rewritten commit.

I think the balance to cut here is that of usability.  You can cater for 
the obscure cases, but that just does not make sense.  With some recipe -- 
as illustrated by Hannes -- it is very easy to see what it does, and as 
easy to modify it, should that be necessary.

The alternative is obviously easier for the cases that next to nobody will 
need.

So no, your argument does not convince me, and I still think that I 
understood it correctly from the start.

Ciao,
Dscho


--

From: Junio C Hamano
Date: Tuesday, April 29, 2008 - 3:23 pm

I fell in the same "acyclic" fallacy before I realized it was a mistake,
especially after thought about the "rewritten B needs to be used more than
twice as a merge source" issue.  That's why I earlier said the beauty of
your approach is attractive but it "unfortunately" breaks down.

For "rebase -i", the tool needs to spit out insns (and again I'd prefer
not to require the tools to be clever to be able to write them out), and
the generated sequence needs to be easily understood by the end user who
needs to be able to edit (e.g. drop lines, reorder them, s/pick/edit/) and
easily visualize what the resulting shape of the history would be.  If we
limit ourselves to the context of a non-merge-preserving "rebase -i", the
insns will not need 'mark' (nor 'merge') and the resulting todo file would
look identical in both approaches.

But we also would want to have a sequencer generic enough to be capable of
faithfully reproducing a history with merges.
--

From: Johannes Schindelin
Date: Tuesday, April 29, 2008 - 3:55 pm

hi,


I do not understand.  The topological order assures that you have 
rewritten every commit that needs to be rewritten before rewriting the 
current commit.

Puzzled,
Dscho
--

From: Junio C Hamano
Date: Tuesday, April 29, 2008 - 4:06 pm

Perhaps it would help to go back to the message J6t incompletely quoted,
and try the example with the parent order of Y swapped (i.e. B == Y^2, C
== Y^1)

Recreating X and Y both need to refer to the rewritten B as the parameter
to "merge" insn.  You create X first then you cannot refer to B anymore to
recreate Y.  The other way around you cannot name B to recreate X.

--

From: Johannes Schindelin
Date: Tuesday, April 29, 2008 - 4:31 pm

Hi,


If you refer to "B" as the "short name of the original commit which refers 
to the rewritten commit as soon as B was rewritten", then I really do not 
see the problem.

Every commit has 0..n parents.  These are properly identified before 
rebasing.  Some of them have to be rewritten, because they are rebased.

So if you order the commits topologically, so that ancestors come first, 
you will have to jump around a bit with the "reset" command, but you can 
basically make sure that all parents that needed rewriting were 
rewritten already before rewriting that commit.

Now, if you want to split a commit, you may want to refer to the original 
commit instead of an already rewritten commit, but I think that this 
occasion is rare enough, that we can ask the user to tag that commit, and 
refer to that commit by its tag in the todo list.

Or you write down the original's long name and use that one.

But if you use the _default_ todo list, i.e. you want to rebase preserving 
merges without interfering manually with the process, what I said about 
the topological ordering still holds true.

At no point will you need to refer _both_ to the original _and_ to the 
rewritten commit name.

Come to think of it, I cannot think of a (default) case where the 
_original_ name of a to-be-rewritten commit has to be referred to, except 
for the "pick" command.

Ciao,
Dscho

--

From: Junio C Hamano
Date: Tuesday, April 29, 2008 - 6:23 pm

Hmmm.  Perhaps you are thinking about using not just A, B, C but also
names like X, Y, and Z in the insn sequence?  I was operating under the
impression that you used only single parent commits to name things, and a
name will stand for the result of the last operation that used the name
(e.g. after "pick B", B names the result of cherry-picking the original B
to detached HEAD).


                 A
                / \
               /   X
              /   / \
             O---B   Z
              \   \ /
               \   Y
                \ /
                 C

            X = checkout A, merge B
            Y = checkout C, merge B
            Z = checkout X, merge Y

I start from Q, create A', B' and C' with:

	reset Q
	pick A
        reset Q
        pick B
        reset Q
        pick C

Then I can recreate X by doing

	reset A
        merge B

The problem I had was to figure out the way to go back to "rewritten X".
I assumed you would say "B" because that is the last insn in the sequence
that used that name.

But instead you are thinking of letting me just say "X", and somehow make
the machinery guess by noticing "Ah, original X is a merge between
original A and B, and we have a merge between rewritten A and rewritten B,
so we will treat that merge as rewritten "X"?

I actually was hoping we could avoid that, which feels messy.

But now I may be misunderstanding what you meant to say.


--

From: Johannes Sixt
Date: Tuesday, April 29, 2008 - 11:25 pm

You had used this notion in your post:

	merge B -- recreate X

Did you mean the '-- recreate X' part as just a comment? I understood it
as part of the instruction, namely to say that the result of the merge is
the rewritten X. In this case you can refer to X in subsequent insns
unambiguously (keep in mind that it is actually the abbreviated SHA1 of
the original merge commit).

-- Hannes

--

From: Junio C Hamano
Date: Wednesday, April 30, 2008 - 12:10 am

Purely as a comment to let the readers know what I was describing.
--

From: Johannes Schindelin
Date: Wednesday, April 30, 2008 - 1:47 am

Hi,


I cannot bring myself to feel that this is messy.  The more I think about 
it, the clearer it becomes for me that the pick call should use the 
original commit, whereas the merge call should use the rewritten commit 
(and should therefore only be called when all ancestors of that merge 
which need rebasing were rebased already).

BTW I think that I made a stupid mistake in one of my previous mails: when 
I wrote an example for the "merge" command (as I would like it), I did 
_not_ list the original commit name of that merge.  I.e.

	merge <parent2> <parent3>... <message>

I completely forgot that for the $DOTEST/rewritten/ to work, the original 
commit name of that merge has to be listed.

But this got me thinking, and I think that to leave out the first parent 
was another mistake I made, so I really would like to have this syntax:

	merge <orig-commit> <parent1> <parent2>... <message>

This would allow to change the parents in the interactive rebase, and if 
<parent1> is not the current commit at that point, it would implicitly 
call "reset".

What appeals to me is the simplicity of this approach: you refer to the 
commits by calling them by their (original) name.

In the (probably really rare) occasion that you really need to refer to an 
original _and_ a rewritten commit, you can always use _any_ commit-ish as 
argument to the command.

Ciao,
Dscho

--

From: Junio C Hamano
Date: Wednesday, April 30, 2008 - 2:19 am

Ok, that clears my confusion, but it raises another issue.

In the context of "rebase -i", this may not be a problem, but by forcing
us to name commits always with original commits, we cannot build (instead
of rebuild) a history that does not yet exist using the sequencer
machinery, can we?

If we want to transplant "^C ^N O" in this history elsewhere:

      --o---C---N
               / 
          B---M
         /   /
        O---A

while inserting a new fix-up commit F on top of B before we merge that
side branch to rewritten A:

              --o---C---N'
                       / 
              B'--X---M'
             /       /
        O---Q-------A'

Would/Should the machinery somehow figure out that the merge between the
rewritten A (which is A') and an inserted commit X (which is made on top
of the rewritten B) corresponds to M in the original history?

This is not a made up example, but something I have to do once (on my
non-git days) or many more times (on my git days) every day when
rebuilding 'pu' on top of updated 'next' using updated tips of topic
branches.  I was hoping that the sequencer mechanism can help me
automating the process a bit more.

--

From: Johannes Sixt
Date: Wednesday, April 30, 2008 - 3:29 am

No. The 'merge' insn tells the original merge commit. So, rebase would
suggest this todo list:

	pick B
	reset Q
	pick A
	merge M A B
	merge N M C

and you would have to change it to

	pick B
	pick X       <--
	reset Q
	pick A
	merge M A X  <--
	merge N M C

But you could just as well do this:

	edit B       <--
	reset Q
	pick A
	merge M A B
	merge N M C

When the 'edit B' stops, HEAD is at B'. Now you git-am X, then 'rebase
--continue'. Since now HEAD is at X, X is recorded as the rewritten
version of B, and the 'merge M A B' would pick X as the second parent.

-- Hannes

--

From: Johannes Schindelin
Date: Wednesday, April 30, 2008 - 4:56 am

Hi,


The idea I hinted at was to refer to them by another name than the short 
name.  Then we can use the sequencer machinery.

I still maintain that it is such a rare need (even if you are a power user 
of it) that it makes sense to cater for other, simpler uses.

Ciao,
Dscho

--

From: Dmitry Potapov
Date: Wednesday, April 30, 2008 - 6:06 am

Maybe, it would be better if re-written commits were marked a bit
differently, so there will be no confusion about whether an original
or re-written commit is referred. For instance, re-written commits can
be marked by adding apostrophe at the end, so if the original commit
was "abcdef" then the re-written should be called as "abcdef'". At
least, it will make plain clear for anyone where in merge rewritten
commits are mentioned. Otherwise, it looks too magical to me.

Dmitry
--

From: Johannes Schindelin
Date: Thursday, May 1, 2008 - 5:59 am

Hi,


Fair enough.  (For the "too magical" part.)

But it would break down if you picked one commit twice, in order to split 
it.  OTOH, this is a rare thing, and you really only need to refer to 
rewritten commits in the "reset" and "merge" commands.

But there is a bigger problem with what you suggest: When merging a commit 
that is _not_ in the rewritten part of the history, adding an apostrophe 
is actively wrong.

And I still believe strongly that a regular "rebase -i -p" user will not 
want to refer to original commits, except for the "pick" command.

Ciao,
Dscho

--

Previous thread: git-send-email: Skipping - not found. by Timur Tabi on Monday, April 28, 2008 - 2:15 pm. (3 messages)

Next thread: Re: Dumb idea on git library for script languages by Eric Hanchrow on Monday, April 28, 2008 - 9:06 pm. (1 message)