I think we could do something there. Let's have a look at a few
marker+immediate values fast paths on x86_32:

    4631:  b0 00              mov    $0x0,%al
    4633:  84 c0              test   %al,%al
    4635:  0f 85 c6 00 00 00  jne    4701 <try_to_wake_up+0xea>

    7059:  b0 00              mov    $0x0,%al
    705b:  84 c0              test   %al,%al
    705d:  75 63              jne    70c2 <sched_exec+0xb6>

    83ac:  b0 00              mov    $0x0,%al
    83ae:  84 c0              test   %al,%al
    83b0:  75 29              jne    83db <wait_task_inactive+0x69>

If we want to support NMI context and have the ability to instrument
preemptable code without too much headache, we must ensure that every
modification leaves the code in a "correct" state and that we do not
grow the size of any reachable instruction. We must also ensure that gcc
did not put code between these instructions. Modifying non-relocatable
instructions would also be a pain, since we would have to deal with
instruction pointer relocation in the breakpoint code while the code
modification is being done.
Luckily, gcc almost never places any code between the mov, test and jne
instructions. But since we cannot be sure, we could dynamically check
for this code pattern after the mov instruction. If we find it, then we
treat it as if it were a single asm block, but if we don't find what
we expect, then we use standard immediate values for that site. I expect
the heavily optimized version will be usable almost all the time.
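To make the dynamic check concrete, here is a minimal user-space sketch of
the pattern match on the byte encodings shown in the disassembly above. The
function name imv_match_pattern and its interface are hypothetical, not an
existing kernel API; it only recognizes the two jne encodings seen above
(short 75 cb and near 0f 85 cd).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: check whether the bytes at insn form
 *   mov $imm8,%al ; test %al,%al ; jne <target>
 * Returns the total length of the sequence in bytes (6 or 10),
 * or 0 if the pattern is not found, in which case the caller
 * would fall back to standard immediate values.
 */
static size_t imv_match_pattern(const uint8_t *insn, size_t avail)
{
	assert(insn != NULL);

	if (avail < 6)
		return 0;
	/* b0 ib: mov $imm8,%al */
	if (insn[0] != 0xb0)
		return 0;
	/* 84 c0: test %al,%al */
	if (insn[2] != 0x84 || insn[3] != 0xc0)
		return 0;
	/* 75 cb: jne rel8 (2 bytes) */
	if (insn[4] == 0x75)
		return 6;
	/* 0f 85 cd: jne rel32 (6 bytes) */
	if (avail >= 10 && insn[4] == 0x0f && insn[5] == 0x85)
		return 10;
	return 0;
}
```

A return value of 0 means "use the standard immediate value update for this
site", so an unexpected instruction scheduled by gcc in the middle of the
sequence only costs us the optimization, not correctness.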
This heavily optimized version could consist of a simple jump to the
address following the "jne" instruction. To activate the immediate
value, we could simply put back a mov $0x1,%al. I don't think we care
_that_ much about the active tracing performance, since we take a
supplementary function call already in that case.
We could probably force the mov into %al to make sure we search for only
one test pattern (%al,%al). We would have to decode the jne instruction
to see how big it is so we can put the correct offset in the jmp
instruction replacing the original mov.
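As a rough illustration of the offset computation, the sketch below decodes
the jne length and builds the 2-byte "jmp rel8" (eb cb) that would overwrite
the 2-byte mov. It assumes the mov has been forced into %al as suggested
above; the names imv_jne_len and imv_build_bypass_jmp are made up for this
example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Length of the jne at p: 2 for 75 cb, 6 for 0f 85 cd, 0 if not a jne. */
static size_t imv_jne_len(const uint8_t *p)
{
	if (p[0] == 0x75)
		return 2;
	if (p[0] == 0x0f && p[1] == 0x85)
		return 6;
	return 0;
}

/*
 * Hypothetical helper: given the site of a mov/test/jne sequence,
 * fill out[] with a short jmp that has the same size as the mov and
 * lands on the instruction following the jne.
 * Returns 0 on success, -1 if the expected pattern is not there.
 */
static int imv_build_bypass_jmp(const uint8_t *site, uint8_t out[2])
{
	size_t jne_len;

	assert(site != NULL && out != NULL);

	/* mov $imm8,%al ; test %al,%al */
	if (site[0] != 0xb0 || site[2] != 0x84 || site[3] != 0xc0)
		return -1;
	jne_len = imv_jne_len(site + 4);
	if (jne_len == 0)
		return -1;

	out[0] = 0xeb;			/* jmp rel8 */
	/* rel8 is relative to the end of the jmp: skip test (2) + jne */
	out[1] = (uint8_t)(2 + jne_len);
	return 0;
}
```

For the try_to_wake_up site above this gives "eb 08", i.e. a jump from 4631
to 463b, and for the sched_exec site "eb 04", a jump from 7059 to 705f.
Since the jmp is exactly as large as the mov it replaces, no reachable
instruction grows.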
The only problem arises if code following the jne instruction uses the
zero flag set by the test, since the jmp skips the test entirely. But in
our case, I don't see how gcc could ever want to reuse the zero flag set
by a test on our own mov to a register, unless we re-use the loaded value
somewhere else.
Dealing with the non-relocatable jmp instruction could be done by
checking, in the int3 immediate values notify callback, if the
instruction being modified is a jmp. If it is, we simply update the
return address without executing the bypass code.
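The decision made in the callback can be modeled in user space as below.
This is only a sketch of the idea, not the actual notifier code: struct
imv_fixup, its fields and imv_int3_resume_ip are assumptions made for the
example, and the real callback would write the result into the saved
instruction pointer of the trapping context.

```c
#include <stdint.h>

/* Hypothetical description of a site currently being patched. */
struct imv_fixup {
	uintptr_t site;		/* address where the int3 was written */
	uint8_t insn[2];	/* 2-byte instruction normally at that site */
	uintptr_t bypass;	/* out-of-line copy used for relocatable insns */
};

/*
 * Return the address at which execution should resume after an int3
 * trap. trap_ip is the address following the 1-byte int3.
 */
static uintptr_t imv_int3_resume_ip(const struct imv_fixup *f,
				    uintptr_t trap_ip)
{
	if (trap_ip != f->site + 1)
		return trap_ip;		/* not our breakpoint */

	if (f->insn[0] == 0xeb) {
		/*
		 * jmp rel8: emulate it by resuming at its target,
		 * without running any bypass code.
		 */
		return f->site + 2 + (uintptr_t)(intptr_t)(int8_t)f->insn[1];
	}

	/* Relocatable instruction (e.g. the mov): run the bypass copy. */
	return f->bypass;
}
```

With site = 0x4631 and insn = { 0xeb, 0x08 }, a trap at 0x4632 resumes at
0x463b, which is the instruction following the jne in try_to_wake_up.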
What do you think of these ideas?
Mathieu
--
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68
--