Re: [kvm-devel] include files for kvmclock

Previous thread: 2.6.23.1 NULL pointer dereference in NFS by Mathieu Desnoyers on Tuesday, November 6, 2007 - 1:48 pm. (4 messages)

Next thread: [PATCH] FRV: Fix the extern declaration of kallsyms_num_syms by David Howells on Tuesday, November 6, 2007 - 2:48 pm. (4 messages)
From: Glauber de Oliveira Costa
Date: Tuesday, November 6, 2007 - 3:18 pm

This is a new version of kvm paravirt clock implementation

This time, the clockevents part was completely wiped out.
Not that I'm dropping it: As said in my last message, I'm starting
to take the path of having a specialized irq chip, as for avi-san's
suggestion.

However, last iteration made clocksources and clockevents a lot more
independent, and so, turning it into a clocksource-only implementation
was just a matter of deleting code - that can be later added almost as-is.
It also favours people willing to use the clocksource, but going towards
userspace HPET emulation, or things like this.

The goal is to have this part included independently of the other bits.

From last release:

* no more clockevents (for a while, hang on!)
* per-vcpu hv_clock area, which led to...
* no more purely tsc timing.

If you have a new concern with this version, or I failed to address a previous concern of yours, just voice it!


-

From: Glauber de Oliveira Costa
Date: Tuesday, November 6, 2007 - 3:18 pm

This patch introduces the include files for kvm clock.
They'll be needed for both guest and host part.

Signed-off-by: Glauber de Oliveira Costa <gcosta@redhat.com>
---
 include/asm-x86/kvm_para.h |   23 +++++++++++++++++++++++
 include/linux/kvm.h        |    1 +
 include/linux/kvm_para.h   |   20 ++++++++++++++++++++
 3 files changed, 44 insertions(+), 0 deletions(-)

diff --git a/include/asm-x86/kvm_para.h b/include/asm-x86/kvm_para.h
index c6f3fd8..af9fb75 100644
--- a/include/asm-x86/kvm_para.h
+++ b/include/asm-x86/kvm_para.h
@@ -10,15 +10,38 @@
  * paravirtualization, the appropriate feature bit should be checked.
  */
 #define KVM_CPUID_FEATURES	0x40000001
+#define KVM_FEATURE_CLOCKEVENTS 0
+#define KVM_FEATURE_CLOCKSOURCE 1
+
 
 #ifdef __KERNEL__
 #include <asm/processor.h>
+extern void kvmclock_init(void);
+
+union kvm_hv_clock {
+	struct {
+		u64 tsc_mult;
+		u64 now_ns;
+		/* That's the wall clock, not the water closet */
+		u64 wc_sec;
+		u64 wc_nsec;
+		u64 last_tsc;
+		/* At first, we could use the tsc value as a marker, but Jeremy
+		 * well noted that it will cause us locking problems in 32-bit
+		 * sys, so we have a special version field */
+		u32 version;
+	};
+	char page_align[PAGE_SIZE];
+};
+
 
 /* This instruction is vmcall.  On non-VT architectures, it will generate a
  * trap that we will then rewrite to the appropriate instruction.
  */
 #define KVM_HYPERCALL ".byte 0x0f,0x01,0xc1"
 
+#define KVM_HCALL_REGISTER_CLOCK	1
+
 /* For KVM hypercalls, a three-byte sequence of either the vmrun or the vmmrun
  * instruction.  The hypervisor may replace it with something else but only the
  * instructions are guaranteed to be supported.
diff --git a/include/linux/kvm.h b/include/linux/kvm.h
index 71d33d6..7ac8786 100644
--- a/include/linux/kvm.h
+++ b/include/linux/kvm.h
@@ -359,6 +359,7 @@ struct kvm_signal_mask {
 #define KVM_CAP_MMU_SHADOW_CACHE_CONTROL 2
 #define KVM_CAP_USER_MEMORY 3
 #define KVM_CAP_SET_TSS_ADDR 4
+#define ...
From: Glauber de Oliveira Costa
Date: Tuesday, November 6, 2007 - 3:18 pm

This is the host part of kvm clocksource implementation. As it does
not include clockevents, it is a fairly simple implementation. We
only have to register a per-vcpu area, and start writting to it periodically.

Signed-off-by: Glauber de Oliveira Costa <gcosta@redhat.com>
---
 drivers/kvm/irq.c      |    1 +
 drivers/kvm/kvm_main.c |    2 +
 drivers/kvm/svm.c      |    1 +
 drivers/kvm/vmx.c      |    1 +
 drivers/kvm/x86.c      |   59 ++++++++++++++++++++++++++++++++++++++++++++++++
 drivers/kvm/x86.h      |   13 ++++++++++
 6 files changed, 77 insertions(+), 0 deletions(-)

diff --git a/drivers/kvm/irq.c b/drivers/kvm/irq.c
index 22bfeee..0344879 100644
--- a/drivers/kvm/irq.c
+++ b/drivers/kvm/irq.c
@@ -92,6 +92,7 @@ void kvm_vcpu_kick_request(struct kvm_vcpu *vcpu, int request)
 
 void kvm_inject_pending_timer_irqs(struct kvm_vcpu *vcpu)
 {
+	vcpu->time_needs_update = 1;
 	kvm_inject_apic_timer_irqs(vcpu);
 	/* TODO: PIT, RTC etc. */
 }
diff --git a/drivers/kvm/kvm_main.c b/drivers/kvm/kvm_main.c
index 0b8edca..5834573 100644
--- a/drivers/kvm/kvm_main.c
+++ b/drivers/kvm/kvm_main.c
@@ -20,6 +20,7 @@
 #include "x86_emulate.h"
 #include "irq.h"
 
+#include <linux/clocksource.h>
 #include <linux/kvm.h>
 #include <linux/module.h>
 #include <linux/errno.h>
@@ -1242,6 +1243,7 @@ static long kvm_dev_ioctl(struct file *filp,
 		case KVM_CAP_MMU_SHADOW_CACHE_CONTROL:
 		case KVM_CAP_USER_MEMORY:
 		case KVM_CAP_SET_TSS_ADDR:
+		case KVM_CAP_CLK:
 			r = 1;
 			break;
 		default:
diff --git a/drivers/kvm/svm.c b/drivers/kvm/svm.c
index 95a3489..cb8c19d 100644
--- a/drivers/kvm/svm.c
+++ b/drivers/kvm/svm.c
@@ -617,6 +617,7 @@ static void svm_free_vcpu(struct kvm_vcpu *vcpu)
 
 	__free_page(pfn_to_page(svm->vmcb_pa >> PAGE_SHIFT));
 	kvm_vcpu_uninit(vcpu);
+	release_clock(vcpu);
 	kmem_cache_free(kvm_vcpu_cache, svm);
 }
 
diff --git a/drivers/kvm/vmx.c b/drivers/kvm/vmx.c
index da3a339..b5edeed 100644
--- a/drivers/kvm/vmx.c
+++ b/drivers/kvm/vmx.c
@@ ...
From: Glauber de Oliveira Costa
Date: Tuesday, November 6, 2007 - 3:18 pm

This is the guest part of kvm clock implementation
It does not do tsc-only timing, as tsc can have deltas
between cpus, and it did not seem worthy to me to keep
adjusting them.

We do use it, however, for fine-grained adjustment.

Other than that, time comes from the host.

Signed-off-by: Glauber de Oliveira Costa <gcosta@redhat.com>
---
 arch/i386/Kconfig           |   10 +++
 arch/x86/kernel/Makefile_32 |    1 +
 arch/x86/kernel/kvmclock.c  |  164 +++++++++++++++++++++++++++++++++++++++++++
 arch/x86/kernel/setup_32.c  |    5 ++
 4 files changed, 180 insertions(+), 0 deletions(-)
 create mode 100644 arch/x86/kernel/kvmclock.c

diff --git a/arch/i386/Kconfig b/arch/i386/Kconfig
index b4437ce..a3b45f1 100644
--- a/arch/i386/Kconfig
+++ b/arch/i386/Kconfig
@@ -257,6 +257,16 @@ config VMI
 	  at the moment), by linking the kernel to a GPL-ed ROM module
 	  provided by the hypervisor.
 
+config KVM_CLOCK
+	bool "KVM paravirtualized clock"
+	select PARAVIRT
+	help
+	  Turning on this option will allow you to run a paravirtualized clock
+	  when running over the KVM hypervisor. Instead of relying on a PIT
+	  (or probably other) emulation by the underlying device model, the host
+	  provides the guest with timing infrastructure, as time of day, and
+	  timer expiration.
+
 source "arch/x86/lguest/Kconfig"
 
 endif
diff --git a/arch/x86/kernel/Makefile_32 b/arch/x86/kernel/Makefile_32
index b9d6798..df76d8c 100644
--- a/arch/x86/kernel/Makefile_32
+++ b/arch/x86/kernel/Makefile_32
@@ -43,6 +43,7 @@ obj-$(CONFIG_K8_NB)		+= k8.o
 obj-$(CONFIG_MGEODE_LX)		+= geode_32.o mfgpt_32.o
 
 obj-$(CONFIG_VMI)		+= vmi_32.o vmiclock_32.o
+obj-$(CONFIG_KVM_CLOCK)		+= kvmclock.o
 obj-$(CONFIG_PARAVIRT)		+= paravirt_32.o
 obj-y				+= pcspeaker.o
 
diff --git a/arch/x86/kernel/kvmclock.c b/arch/x86/kernel/kvmclock.c
new file mode 100644
index 0000000..8778d61
--- /dev/null
+++ b/arch/x86/kernel/kvmclock.c
@@ -0,0 +1,164 @@
+/*  KVM paravirtual clock driver. A clocksource ...
From: Avi Kivity
Date: Tuesday, November 6, 2007 - 10:50 pm

No one can actually see this as you're updating a private structure. 


What if !dst_vcpu?  What about locking?

Suggest simply using vcpu.  Every guest cpu can register its own



kmap() is bad since the page can move due to swapping. 


While it's a static inline, please prefix with kvm_ in case one day it
isn't.

-- 
Do not meddle in the internals of kernels, for they are subtle and quick to panic.

-

From: Glauber de Oliveira Costa
Date: Wednesday, November 7, 2007 - 6:08 am

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Because my plan was exactly, updating it at each timer interrupt.
There's a trade off between
updating every run (hopefully more precision, but more overhead), versus
updating at timer irqs, or other events.

Too slow. Updating guest time, even only in timer interrupts, was a too
frequent operation, and the kmap / kunmap (atomic) at every iteration
deemed the whole thing
Earlier version had a check for !dst_vcpu, you are absolutely right.

Locking was not a problem in practice, because these operations are done
 serialized, by the same cpu.

This hypercall is called by cpu_up, which, at least in the beginning,
it's called by cpu0. And that's why each vcpu cannot register its own.
(And why we don't need locking).

Well, theorectically each vcpu do can register its own clocksource, it
will just be a little bit more complicated, we have to fire out an IPI,
and have the other cpu to catch it, and call the hypercall.

But I honestly don't like it.
Usually, the cpu leaves start_secondary with a clock already registered,
You are right. I developed part of it in an older version of kvm, where
reality was:

struct page *gfn_to_page(struct kvm *kvm, gfn_t gfn)
{
        struct kvm_memory_slot *slot;

        gfn = unalias_gfn(kvm, gfn);
        slot = __gfn_to_memslot(kvm, gfn);
        if (!slot)
                return NULL;
        return slot->phys_mem[gfn - slot->base_gfn];
}

Yeah , right, but again: It will be slow to the point of making the

Okay, will do.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.7 (GNU/Linux)
Comment: Using GnuPG with Remi - http://enigmail.mozdev.org

iD8DBQFHMbjDjYI8LaFUWXMRAvcnAKCZOtPqHAxvcUkAfSaOezPWq1ib2wCg1TNz
fT1rt86/j25K/6lmFsRmbI0=
=nSkW
-----END PGP SIGNATURE-----
-

From: Avi Kivity
Date: Wednesday, November 7, 2007 - 6:59 am

I think kvm_inject_pending_timer_irqs() is called every __vcpu_run(), so 
your cunning plan has been foiled.


I think that we should update it every time a heavyweight exit has been 
taken.  That takes care of the tradeoff quite nicely -- heavyweight 

kvm_write_guest() will eventually be a copy_to_user(), so you need not 


Can it not be done via the processor startup sequence?  Then there's no 
need for ipis and locking.


What if the guest wants to place it in address 5GB?  That's unlikely for 
Linux and Windows, but let's do it right anyway.

-- 
error compiling committee.c: too many arguments to function

-

From: Glauber de Oliveira Costa
Date: Tuesday, November 6, 2007 - 2:35 pm

And of course, this was my test files by mistake ;-)
Oh god... ;-)

Patches aren't numbered but this one should go first. And please just ignore,
the replica of the hv_clock union definition in the patch bellow. It
should all go in asm/kvm_para.h

The other two patches are fine, and can be applied in any order, so


-- 
Glauber de Oliveira Costa.
"Free as in Freedom"
http://glommer.net

"The less confident you are, the more serious you have to act."
-

From: Avi Kivity
Date: Tuesday, November 6, 2007 - 10:55 pm

Do we really need 128-bit time?  you must be planning to live forever.

-- 
Do not meddle in the internals of kernels, for they are subtle and quick to panic.

-

From: Jeremy Fitzhardinge
Date: Tuesday, November 6, 2007 - 10:58 pm

Well, he's planning on having lots of very small nanoseconds.

    J
-

From: Glauber de Oliveira Costa
Date: Wednesday, November 7, 2007 - 5:49 am

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

The wc_nsec is legacy, and should be gone. It's not really used in
current code. However, you gave me a very good idea. Living forever
would be awesome! Where can I apply ?
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.7 (GNU/Linux)
Comment: Using GnuPG with Remi - http://enigmail.mozdev.org

iD8DBQFHMbRfjYI8LaFUWXMRAjsQAJ4vDBW0M48fMaL9sl6XfN0+Pd82egCgutwe
9rR7+H8MUQznyinlJc76kbo=
=b3wx
-----END PGP SIGNATURE-----
-

From: Jeremy Fitzhardinge
Date: Tuesday, November 6, 2007 - 3:50 pm

From: Glauber de Oliveira Costa
Date: Tuesday, November 6, 2007 - 3:58 pm

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

It's called silly mistake. ;-)

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.7 (GNU/Linux)
Comment: Using GnuPG with Remi - http://enigmail.mozdev.org

iD8DBQFHMPGvjYI8LaFUWXMRAgt2AJ9NKgq2LCueUidH56ZgUYA+5wBhGwCfdqQB
otFP1/SFowaANQ8FojEtJUE=
=8Xqp
-----END PGP SIGNATURE-----
-

From: Akio Takebe
Date: Wednesday, November 7, 2007 - 1:16 am

Hi, Glauber

Why does kvm_hv_clock need page_align?
And also the kvm_hv_clock is alloced with kvm_vcpu,
so the align is not enough, isn't it?
I thik __atribute__((__aligne__(PAGE_SIZE)))) is better than it.

Best Regards,

Akio Takebe

-

From: Glauber de Oliveira Costa
Date: Wednesday, November 7, 2007 - 6:16 am

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Each vcpu will register a page on its own. In the guest side, it will be
There's no requirements on the host part at all. So it doesn't really
It deals with the start of the structure, but not with its size. See the
guest part: Where it matters, I do use it.


Thanks

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.7 (GNU/Linux)
Comment: Using GnuPG with Remi - http://enigmail.mozdev.org

iD8DBQFHMbqyjYI8LaFUWXMRAgfOAKCTeKF3cWbhILYSXY+MjtXo8B87EwCeNNhn
z9RDYaCWHIxsqlciMF0i27w=
=EIEM
-----END PGP SIGNATURE-----
-

From: Akio Takebe
Date: Wednesday, November 7, 2007 - 6:27 am

Oh, I confused the guest/host part, but I could understand it.
Thank you for your explanation.

Best Regards,

Akio Takebe

-

Previous thread: 2.6.23.1 NULL pointer dereference in NFS by Mathieu Desnoyers on Tuesday, November 6, 2007 - 1:48 pm. (4 messages)

Next thread: [PATCH] FRV: Fix the extern declaration of kallsyms_num_syms by David Howells on Tuesday, November 6, 2007 - 2:48 pm. (4 messages)