[PATCH 5/8] KVM: PVDMA: Update dma_alloc_coherent to make it paravirt-aware

From: Amit Shah
Date: Wednesday, November 7, 2007 - 7:21 am

This patchset is work in progress and is sent out for comments.

Guests within KVM can have paravirtualized DMA access. I've tested
the e1000 driver, and that works fine. A few problems/conditions to
get things to work:

- The pv driver should only be used as a module. If built into the
  kernel, it freezes during HD bringup
- Locks aren't taken on the host; multiple guests with passthrough
  devices won't work
- Only 64-bit hosts and 64-bit guests are supported

There are also several FIXMEs in the code, but none as grave as
the problems listed above.

The bulk of the passthrough work is done in userspace (qemu). Patches
will be sent shortly to the kvm-devel and qemu lists.
-

From: Amit Shah
Date: Wednesday, November 7, 2007 - 7:21 am

Introduce three hypercalls and one ioctl for enabling guest
DMA mappings.

Userspace (qemu) issues an ioctl to notify the host that a physical
device is being assigned to a guest. The guest makes a hypercall (once
per device) to find out whether the device is a passthrough device
and whether any DMA translations are necessary.

Two other hypercalls map and unmap DMA regions respectively
for the guest. We basically look up the host page address
and return it in case of a single-page request.

For a multi-page request, we do a dma_map_sg.

Since guests are pageable, we pin all the pages under the DMA
operation on the map request and unpin them on the unmap
operation.
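
The pin-on-map / unpin-on-unmap pairing described above can be sketched
in plain userspace C. This is only an illustration under stated
assumptions: struct sim_page, struct sim_dma_map and both functions are
hypothetical names, a counter stands in for the real page pinning, and
no locking is modelled (as in the patch itself).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a host page; only the pin count matters here. */
struct sim_page {
	int pin_count;
};

/* Hypothetical record of one guest DMA mapping (loosely modelled on
 * struct dma_map from the patch, minus the list and the scatterlist). */
struct sim_dma_map {
	struct sim_page **pages;
	int nents;
};

/* Map request: pin every page that backs the DMA region. */
static void sim_dma_map_pages(struct sim_dma_map *map,
			      struct sim_page **pages, int nents)
{
	int i;

	assert(nents >= 0);
	for (i = 0; i < nents; i++)
		pages[i]->pin_count++;
	map->pages = pages;
	map->nents = nents;
}

/* Unmap request: drop the pins taken by the matching map request. */
static void sim_dma_unmap_pages(struct sim_dma_map *map)
{
	int i;

	for (i = 0; i < map->nents; i++) {
		assert(map->pages[i]->pin_count > 0);
		map->pages[i]->pin_count--;
	}
	map->pages = NULL;
	map->nents = 0;
}
```

A page stays pinned for exactly as long as a mapping refers to it, so
every map request needs a matching unmap request.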

Major tasks still to be done: implement proper locking (get a
vm-lock) and free some memory that is currently never released.

Signed-off-by: Amit Shah <amit.shah@qumranet.com>
---
 drivers/kvm/x86.c          |  273 ++++++++++++++++++++++++++++++++++++++++++++
 include/asm-x86/kvm_para.h |   23 ++++-
 include/linux/kvm.h        |    3 +
 3 files changed, 297 insertions(+), 2 deletions(-)

diff --git a/drivers/kvm/x86.c b/drivers/kvm/x86.c
index e905d46..60ea93a 100644
--- a/drivers/kvm/x86.c
+++ b/drivers/kvm/x86.c
@@ -21,8 +21,11 @@
 
 #include <linux/kvm.h>
 #include <linux/fs.h>
+#include <linux/list.h>
+#include <linux/pci.h>
 #include <linux/vmalloc.h>
 #include <linux/module.h>
+#include <linux/highmem.h>
 
 #include <asm/uaccess.h>
 
@@ -61,6 +64,254 @@ struct kvm_stats_debugfs_item debugfs_entries[] = {
 	{ NULL }
 };
 
+/* Paravirt DMA: We pin the host-side pages for the GPAs that we get
+ * for the DMA operation. We do a sg_map on the host pages for a DMA
+ * operation on the guest side. We un-pin the pages on the
+ * unmap_hypercall.
+ */
+struct dma_map {
+	struct list_head list;
+	int nents;
+	struct scatterlist *sg;
+};
+
+/* This list is to store the guest bus:device:function and host
+ * bus:device:function mapping for passthrough'ed devices.
+ */
+/* FIXME: make this per-vm */
+/* FIXME: delete this list at the ...
From: Amit Shah
Date: Wednesday, November 7, 2007 - 7:21 am

Some of the structures defined in asm/kvm_para.h are going to be
used by userspace for ioctls, so include that header outside the
__KERNEL__ guard.

Signed-off-by: Amit Shah <amit.shah@qumranet.com>
---
 include/linux/kvm_para.h |    3 +--
 1 files changed, 1 insertions(+), 2 deletions(-)

diff --git a/include/linux/kvm_para.h b/include/linux/kvm_para.h
index e4db25f..ff6ac27 100644
--- a/include/linux/kvm_para.h
+++ b/include/linux/kvm_para.h
@@ -12,12 +12,12 @@
 /* Return values for hypercalls */
 #define KVM_ENOSYS		1000
 
-#ifdef __KERNEL__
 /*
  * hypercalls use architecture specific
  */
 #include <asm/kvm_para.h>
 
+#ifdef __KERNEL__
 static inline int kvm_para_has_feature(unsigned int feature)
 {
 	if (kvm_arch_para_features() & (1UL << feature))
@@ -26,4 +26,3 @@ static inline int kvm_para_has_feature(unsigned int feature)
 }
 #endif /* __KERNEL__ */
 #endif /* __LINUX_KVM_PARA_H */
-
-- 
1.5.3

-

From: Amit Shah
Date: Wednesday, November 7, 2007 - 7:21 am

[Empty message]
From: Amit Shah
Date: Wednesday, November 7, 2007 - 7:21 am

A guest can call dma_ops->is_pv_device() to find out
whether a device is a passthrough device (a device passed
on to the guest by the host). If it is, a hypercall
will be made to translate DMA mapping operations.

Alternatively, this function could be dropped in favour of a
plain kvm_is_pv_device() call, which would be a no-op
on a non-pv guest (or on the host).
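
The optional-hook idea can be sketched in userspace C as follows. All
names here (struct sim_device, struct sim_dma_ops,
sim_kvm_is_pv_device(), sim_guest_is_pv_device()) are hypothetical, and
the guest hook matches a fixed bus id where real code would presumably
make the hypercall.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical device: only the bus id string is modelled. */
struct sim_device {
	const char *bus_id;
};

/* Hypothetical ops table with an optional is_pv_device hook. */
struct sim_dma_ops {
	int (*is_pv_device)(struct sim_device *dev, const char *name);
};

/* Wrapper that degrades to a no-op (returns 0) when no hook is
 * installed, as it would on a non-pv guest or on the host. */
static int sim_kvm_is_pv_device(const struct sim_dma_ops *ops,
				struct sim_device *dev)
{
	assert(dev != NULL);
	if (ops == NULL || ops->is_pv_device == NULL)
		return 0;
	return ops->is_pv_device(dev, dev->bus_id) != 0;
}

/* Hypothetical guest-side hook: pretends that only one fixed bus id
 * is a passthrough device (assumption: stands in for a hypercall). */
static int sim_guest_is_pv_device(struct sim_device *dev, const char *name)
{
	(void)dev;
	return strcmp(name, "0000:00:03.0") == 0;
}
```

Callers then need no NULL check of their own; they just call the
wrapper and get 0 wherever paravirt DMA is not in use.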

Signed-off-by: Amit Shah <amit.shah@qumranet.com>
---
 include/asm-x86/dma-mapping_64.h |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/include/asm-x86/dma-mapping_64.h b/include/asm-x86/dma-mapping_64.h
index ecd0f61..3943edd 100644
--- a/include/asm-x86/dma-mapping_64.h
+++ b/include/asm-x86/dma-mapping_64.h
@@ -48,6 +48,8 @@ struct dma_mapping_ops {
 				int direction);
 	int             (*dma_supported)(struct device *hwdev, u64 mask);
 	int		is_phys;
+	/* Is this a physical device in a paravirtualized guest? */
+	int		(*is_pv_device)(struct device *hwdev, const char *name);
 };
 
 extern dma_addr_t bad_dma_address;
-- 
1.5.3

-

From: Amit Shah
Date: Wednesday, November 7, 2007 - 7:21 am

Of all the DMA calls, only dma_alloc_coherent might not actually
call dma_ops->alloc_coherent. Make sure it gets called
if the device being worked on is a PV device.
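
The control flow this adds can be sketched in userspace C. This is a
simplified model, not the kernel code: calloc()/free() stand in for the
page allocator, the pointer value stands in for virt_to_bus(), and
struct sim_alloc_ops plus all sim_* functions are hypothetical. Unlike
the patch, the sketch also treats a missing alloc_coherent hook as a
failure.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

typedef uint64_t sim_dma_addr_t;

/* Hypothetical ops table: both hooks are optional. */
struct sim_alloc_ops {
	int (*is_pv_device)(const char *name);
	void *(*alloc_coherent)(size_t size, sim_dma_addr_t *handle);
};

/* Allocate zeroed memory; for a PV device, additionally let the hook
 * translate the DMA handle, and undo the allocation if it fails. */
static void *sim_dma_alloc_coherent(const struct sim_alloc_ops *ops,
				    const char *name, size_t size,
				    sim_dma_addr_t *handle)
{
	void *memory;

	assert(handle != NULL);
	memory = calloc(1, size);
	if (memory == NULL)
		return NULL;

	/* Stand-in for virt_to_bus(): use the pointer value itself. */
	*handle = (sim_dma_addr_t)(uintptr_t)memory;

	if (ops->is_pv_device && ops->is_pv_device(name)) {
		if (ops->alloc_coherent == NULL ||
		    ops->alloc_coherent(size, handle) == NULL) {
			free(memory);
			return NULL;
		}
	}
	return memory;
}

/* Hypothetical hooks used to exercise the three paths. */
static int sim_always_pv(const char *name)
{
	(void)name;
	return 1;
}

static void *sim_pv_alloc_ok(size_t size, sim_dma_addr_t *handle)
{
	(void)size;
	*handle = 0x1000;	/* pretend host-translated address */
	return handle;
}

static void *sim_pv_alloc_fail(size_t size, sim_dma_addr_t *handle)
{
	(void)size;
	(void)handle;
	return NULL;
}
```

On the non-PV path the handle is left as the direct translation; on the
PV path the hook overwrites it, and a failed hook leaves the caller
with NULL and nothing leaked.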

Signed-off-by: Amit Shah <amit.shah@qumranet.com>
---
 arch/x86/kernel/pci-dma_64.c |   13 +++++++++++++
 1 files changed, 13 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kernel/pci-dma_64.c b/arch/x86/kernel/pci-dma_64.c
index aa805b1..d4b1713 100644
--- a/arch/x86/kernel/pci-dma_64.c
+++ b/arch/x86/kernel/pci-dma_64.c
@@ -11,6 +11,7 @@
 #include <asm/io.h>
 #include <asm/gart.h>
 #include <asm/calgary.h>
+#include <linux/kvm_para.h>
 
 int iommu_merge __read_mostly = 1;
 EXPORT_SYMBOL(iommu_merge);
@@ -134,6 +135,18 @@ dma_alloc_coherent(struct device *dev, size_t size, dma_addr_t *dma_handle,
 		memset(memory, 0, size);
 		if (!mmu) {
 			*dma_handle = virt_to_bus(memory);
+			if (unlikely(dma_ops->is_pv_device)
+			    && unlikely(dma_ops->is_pv_device(dev, dev->bus_id))) {
+				void *r;
+				r = dma_ops->alloc_coherent(dev, size,
+							    dma_handle,
+							    gfp);
+				if (r == NULL) {
+					free_pages((unsigned long)memory,
+						   get_order(size));
+					memory = NULL;
+				}
+			}
 			return memory;
 		}
 	}
-- 
1.5.3

-

From: Amit Shah
Date: Wednesday, November 7, 2007 - 7:21 am

Add a Makefile rule for compiling the new file
that we create.

Signed-off-by: Amit Shah <amit.shah@qumranet.com>
---
 drivers/kvm/Makefile |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/drivers/kvm/Makefile b/drivers/kvm/Makefile
index cf18ad4..f492e3e 100644
--- a/drivers/kvm/Makefile
+++ b/drivers/kvm/Makefile
@@ -8,3 +8,5 @@ kvm-intel-objs = vmx.o
 obj-$(CONFIG_KVM_INTEL) += kvm-intel.o
 kvm-amd-objs = svm.o
 obj-$(CONFIG_KVM_AMD) += kvm-amd.o
+kvm-pv-dma-objs = kvm_pv_dma.o
+obj-$(CONFIG_KVM_PV_DMA) += kvm_pv_dma.o
-- 
1.5.3

-

From: Amit Shah
Date: Wednesday, November 7, 2007 - 7:21 am

This is to be enabled on a guest. Currently, only
'module' works; compiling it in freezes the guest at HD bringup.

Signed-off-by: Amit Shah <amit.shah@qumranet.com>
---
 drivers/kvm/Kconfig |    8 ++++++++
 1 files changed, 8 insertions(+), 0 deletions(-)

diff --git a/drivers/kvm/Kconfig b/drivers/kvm/Kconfig
index 6569206..3385c10 100644
--- a/drivers/kvm/Kconfig
+++ b/drivers/kvm/Kconfig
@@ -47,6 +47,14 @@ config KVM_AMD
 	  Provides support for KVM on AMD processors equipped with the AMD-V
 	  (SVM) extensions.
 
+config KVM_PV_DMA
+	tristate "Para-virtualized DMA access"
+       ---help---
+         Provides support for DMA operations in the guest. A hypercall
+	 is raised to the host to enable devices owned by guest to use
+	 DMA. Select this if compiling a guest kernel and you need
+	 paravirtualized DMA operations.
+
 # OK, it's a little counter-intuitive to do this, but it puts it neatly under
 # the virtualization menu.
 source drivers/lguest/Kconfig
-- 
1.5.3

-

From: Amit Shah
Date: Wednesday, November 7, 2007 - 7:21 am

Check for CONFIG_VIRTUALIZATION instead of CONFIG_KVM,
since the PV drivers won't depend on CONFIG_KVM and we
still want them to be selectable.

Signed-off-by: Amit Shah <amit.shah@qumranet.com>
---
 drivers/Makefile |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/Makefile b/drivers/Makefile
index 8cb37e3..6f1c287 100644
--- a/drivers/Makefile
+++ b/drivers/Makefile
@@ -47,7 +47,7 @@ obj-$(CONFIG_SPI)		+= spi/
 obj-$(CONFIG_PCCARD)		+= pcmcia/
 obj-$(CONFIG_DIO)		+= dio/
 obj-$(CONFIG_SBUS)		+= sbus/
-obj-$(CONFIG_KVM)		+= kvm/
+obj-$(CONFIG_VIRTUALIZATION)	+= kvm/
 obj-$(CONFIG_ZORRO)		+= zorro/
 obj-$(CONFIG_MAC)		+= macintosh/
 obj-$(CONFIG_ATA_OVER_ETH)	+= block/aoe/
-- 
1.5.3

-
