This is a new version of kvm paravirt clock implementation This time, the clockevents part was completely wiped out. Not that I'm dropping it: As said in my last message, I'm starting to take the path of having a specialized irq chip, as for avi-san's suggestion. However, last iteration made clocksources and clockevents a lot more independent, and so, turning it into a clocksource-only implementation was just a matter of deleting code - that can be later added almost as-is. It also favours people willing to use the clocksource, but going towards userspace HPET emulation, or things like this. The goal is to have this part included independently of the other bits. From last release: * no more clockevents (for a while, hang on!) * per-vcpu hv_clock area, which led to... * no more purely tsc timing. If you have a new concern with this version, or I failed to address a previous concern of yours, just voice it! -
This patch introduces the include files for kvm clock.
They'll be needed for both guest and host part.
Signed-off-by: Glauber de Oliveira Costa <gcosta@redhat.com>
---
include/asm-x86/kvm_para.h | 23 +++++++++++++++++++++++
include/linux/kvm.h | 1 +
include/linux/kvm_para.h | 20 ++++++++++++++++++++
3 files changed, 44 insertions(+), 0 deletions(-)
diff --git a/include/asm-x86/kvm_para.h b/include/asm-x86/kvm_para.h
index c6f3fd8..af9fb75 100644
--- a/include/asm-x86/kvm_para.h
+++ b/include/asm-x86/kvm_para.h
@@ -10,15 +10,38 @@
* paravirtualization, the appropriate feature bit should be checked.
*/
#define KVM_CPUID_FEATURES 0x40000001
+#define KVM_FEATURE_CLOCKEVENTS 0
+#define KVM_FEATURE_CLOCKSOURCE 1
+
#ifdef __KERNEL__
#include <asm/processor.h>
+extern void kvmclock_init(void);
+
+union kvm_hv_clock {
+ struct {
+ u64 tsc_mult;
+ u64 now_ns;
+ /* That's the wall clock, not the water closet */
+ u64 wc_sec;
+ u64 wc_nsec;
+ u64 last_tsc;
+ /* At first, we could use the tsc value as a marker, but Jeremy
+ * well noted that it will cause us locking problems in 32-bit
+ * sys, so we have a special version field */
+ u32 version;
+ };
+ char page_align[PAGE_SIZE];
+};
+
/* This instruction is vmcall. On non-VT architectures, it will generate a
* trap that we will then rewrite to the appropriate instruction.
*/
#define KVM_HYPERCALL ".byte 0x0f,0x01,0xc1"
+#define KVM_HCALL_REGISTER_CLOCK 1
+
/* For KVM hypercalls, a three-byte sequence of either the vmrun or the vmmrun
* instruction. The hypervisor may replace it with something else but only the
* instructions are guaranteed to be supported.
diff --git a/include/linux/kvm.h b/include/linux/kvm.h
index 71d33d6..7ac8786 100644
--- a/include/linux/kvm.h
+++ b/include/linux/kvm.h
@@ -359,6 +359,7 @@ struct kvm_signal_mask {
#define KVM_CAP_MMU_SHADOW_CACHE_CONTROL 2
#define KVM_CAP_USER_MEMORY 3
#define KVM_CAP_SET_TSS_ADDR 4
+#define ...This is the host part of kvm clocksource implementation. As it does
not include clockevents, it is a fairly simple implementation. We
only have to register a per-vcpu area, and start writting to it periodically.
Signed-off-by: Glauber de Oliveira Costa <gcosta@redhat.com>
---
drivers/kvm/irq.c | 1 +
drivers/kvm/kvm_main.c | 2 +
drivers/kvm/svm.c | 1 +
drivers/kvm/vmx.c | 1 +
drivers/kvm/x86.c | 59 ++++++++++++++++++++++++++++++++++++++++++++++++
drivers/kvm/x86.h | 13 ++++++++++
6 files changed, 77 insertions(+), 0 deletions(-)
diff --git a/drivers/kvm/irq.c b/drivers/kvm/irq.c
index 22bfeee..0344879 100644
--- a/drivers/kvm/irq.c
+++ b/drivers/kvm/irq.c
@@ -92,6 +92,7 @@ void kvm_vcpu_kick_request(struct kvm_vcpu *vcpu, int request)
void kvm_inject_pending_timer_irqs(struct kvm_vcpu *vcpu)
{
+ vcpu->time_needs_update = 1;
kvm_inject_apic_timer_irqs(vcpu);
/* TODO: PIT, RTC etc. */
}
diff --git a/drivers/kvm/kvm_main.c b/drivers/kvm/kvm_main.c
index 0b8edca..5834573 100644
--- a/drivers/kvm/kvm_main.c
+++ b/drivers/kvm/kvm_main.c
@@ -20,6 +20,7 @@
#include "x86_emulate.h"
#include "irq.h"
+#include <linux/clocksource.h>
#include <linux/kvm.h>
#include <linux/module.h>
#include <linux/errno.h>
@@ -1242,6 +1243,7 @@ static long kvm_dev_ioctl(struct file *filp,
case KVM_CAP_MMU_SHADOW_CACHE_CONTROL:
case KVM_CAP_USER_MEMORY:
case KVM_CAP_SET_TSS_ADDR:
+ case KVM_CAP_CLK:
r = 1;
break;
default:
diff --git a/drivers/kvm/svm.c b/drivers/kvm/svm.c
index 95a3489..cb8c19d 100644
--- a/drivers/kvm/svm.c
+++ b/drivers/kvm/svm.c
@@ -617,6 +617,7 @@ static void svm_free_vcpu(struct kvm_vcpu *vcpu)
__free_page(pfn_to_page(svm->vmcb_pa >> PAGE_SHIFT));
kvm_vcpu_uninit(vcpu);
+ release_clock(vcpu);
kmem_cache_free(kvm_vcpu_cache, svm);
}
diff --git a/drivers/kvm/vmx.c b/drivers/kvm/vmx.c
index da3a339..b5edeed 100644
--- a/drivers/kvm/vmx.c
+++ b/drivers/kvm/vmx.c
@@ ...This is the guest part of kvm clock implementation It does not do tsc-only timing, as tsc can have deltas between cpus, and it did not seem worthy to me to keep adjusting them. We do use it, however, for fine-grained adjustment. Other than that, time comes from the host. Signed-off-by: Glauber de Oliveira Costa <gcosta@redhat.com> --- arch/i386/Kconfig | 10 +++ arch/x86/kernel/Makefile_32 | 1 + arch/x86/kernel/kvmclock.c | 164 +++++++++++++++++++++++++++++++++++++++++++ arch/x86/kernel/setup_32.c | 5 ++ 4 files changed, 180 insertions(+), 0 deletions(-) create mode 100644 arch/x86/kernel/kvmclock.c diff --git a/arch/i386/Kconfig b/arch/i386/Kconfig index b4437ce..a3b45f1 100644 --- a/arch/i386/Kconfig +++ b/arch/i386/Kconfig @@ -257,6 +257,16 @@ config VMI at the moment), by linking the kernel to a GPL-ed ROM module provided by the hypervisor. +config KVM_CLOCK + bool "KVM paravirtualized clock" + select PARAVIRT + help + Turning on this option will allow you to run a paravirtualized clock + when running over the KVM hypervisor. Instead of relying on a PIT + (or probably other) emulation by the underlying device model, the host + provides the guest with timing infrastructure, as time of day, and + timer expiration. + source "arch/x86/lguest/Kconfig" endif diff --git a/arch/x86/kernel/Makefile_32 b/arch/x86/kernel/Makefile_32 index b9d6798..df76d8c 100644 --- a/arch/x86/kernel/Makefile_32 +++ b/arch/x86/kernel/Makefile_32 @@ -43,6 +43,7 @@ obj-$(CONFIG_K8_NB) += k8.o obj-$(CONFIG_MGEODE_LX) += geode_32.o mfgpt_32.o obj-$(CONFIG_VMI) += vmi_32.o vmiclock_32.o +obj-$(CONFIG_KVM_CLOCK) += kvmclock.o obj-$(CONFIG_PARAVIRT) += paravirt_32.o obj-y += pcspeaker.o diff --git a/arch/x86/kernel/kvmclock.c b/arch/x86/kernel/kvmclock.c new file mode 100644 index 0000000..8778d61 --- /dev/null +++ b/arch/x86/kernel/kvmclock.c @@ -0,0 +1,164 @@ +/* KVM paravirtual clock driver. A clocksource ...
No one can actually see this as you're updating a private structure. What if !dst_vcpu? What about locking? Suggest simply using vcpu. Every guest cpu can register its own kmap() is bad since the page can move due to swapping. While it's a static inline, please prefix with kvm_ in case one day it isn't. -- Do not meddle in the internals of kernels, for they are subtle and quick to panic. -
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
Because my plan was exactly, updating it at each timer interrupt.
There's a trade off between
updating every run (hopefully more precision, but more overhead), versus
updating at timer irqs, or other events.
Too slow. Updating guest time, even only in timer interrupts, was a too
frequent operation, and the kmap / kunmap (atomic) at every iteration
deemed the whole thing
Earlier version had a check for !dst_vcpu, you are absolutely right.
Locking was not a problem in practice, because these operations are done
serialized, by the same cpu.
This hypercall is called by cpu_up, which, at least in the beginning,
it's called by cpu0. And that's why each vcpu cannot register its own.
(And why we don't need locking).
Well, theorectically each vcpu do can register its own clocksource, it
will just be a little bit more complicated, we have to fire out an IPI,
and have the other cpu to catch it, and call the hypercall.
But I honestly don't like it.
Usually, the cpu leaves start_secondary with a clock already registered,
You are right. I developed part of it in an older version of kvm, where
reality was:
struct page *gfn_to_page(struct kvm *kvm, gfn_t gfn)
{
struct kvm_memory_slot *slot;
gfn = unalias_gfn(kvm, gfn);
slot = __gfn_to_memslot(kvm, gfn);
if (!slot)
return NULL;
return slot->phys_mem[gfn - slot->base_gfn];
}
Yeah , right, but again: It will be slow to the point of making the
Okay, will do.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.7 (GNU/Linux)
Comment: Using GnuPG with Remi - http://enigmail.mozdev.org
iD8DBQFHMbjDjYI8LaFUWXMRAvcnAKCZOtPqHAxvcUkAfSaOezPWq1ib2wCg1TNz
fT1rt86/j25K/6lmFsRmbI0=
=nSkW
-----END PGP SIGNATURE-----
-
I think kvm_inject_pending_timer_irqs() is called every __vcpu_run(), so your cunning plan has been foiled. I think that we should update it every time a heavyweight exit has been taken. That takes care of the tradeoff quite nicely -- heavyweight kvm_write_guest() will eventually be a copy_to_user(), so you need not Can it not be done via the processor startup sequence? Then there's no need for ipis and locking. What if the guest wants to place it in address 5GB? That's unlikely for Linux and Windows, but let's do it right anyway. -- error compiling committee.c: too many arguments to function -
And of course, this was my test files by mistake ;-) Oh god... ;-) Patches aren't numbered but this one should go first. And please just ignore, the replica of the hv_clock union definition in the patch bellow. It should all go in asm/kvm_para.h The other two patches are fine, and can be applied in any order, so -- Glauber de Oliveira Costa. "Free as in Freedom" http://glommer.net "The less confident you are, the more serious you have to act." -
Do we really need 128-bit time? you must be planning to live forever. -- Do not meddle in the internals of kernels, for they are subtle and quick to panic. -
Well, he's planning on having lots of very small nanoseconds.
J
-
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 The wc_nsec is legacy, and should be gone. It's not really used in current code. However, you gave me a very good idea. Living forever would be awesome! Where can I apply ? -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.7 (GNU/Linux) Comment: Using GnuPG with Remi - http://enigmail.mozdev.org iD8DBQFHMbRfjYI8LaFUWXMRAjsQAJ4vDBW0M48fMaL9sl6XfN0+Pd82egCgutwe 9rR7+H8MUQznyinlJc76kbo= =b3wx -----END PGP SIGNATURE----- -
-
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 It's called silly mistake. ;-) -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.7 (GNU/Linux) Comment: Using GnuPG with Remi - http://enigmail.mozdev.org iD8DBQFHMPGvjYI8LaFUWXMRAgt2AJ9NKgq2LCueUidH56ZgUYA+5wBhGwCfdqQB otFP1/SFowaANQ8FojEtJUE= =8Xqp -----END PGP SIGNATURE----- -
Hi, Glauber Why does kvm_hv_clock need page_align? And also the kvm_hv_clock is alloced with kvm_vcpu, so the align is not enough, isn't it? I thik __atribute__((__aligne__(PAGE_SIZE)))) is better than it. Best Regards, Akio Takebe -
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Each vcpu will register a page on its own. In the guest side, it will be There's no requirements on the host part at all. So it doesn't really It deals with the start of the structure, but not with its size. See the guest part: Where it matters, I do use it. Thanks -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.7 (GNU/Linux) Comment: Using GnuPG with Remi - http://enigmail.mozdev.org iD8DBQFHMbqyjYI8LaFUWXMRAgfOAKCTeKF3cWbhILYSXY+MjtXo8B87EwCeNNhn z9RDYaCWHIxsqlciMF0i27w= =EIEM -----END PGP SIGNATURE----- -
Oh, I confused the guest/host part, but I could understand it. Thank you for your explanation. Best Regards, Akio Takebe -
