Hi,
This set of patches adds support for the Large Physical Address
Extensions (LPAE) on the ARM architecture (available with the Cortex-A15
processor). LPAE comes with a 3-level page table format (compared to
2-level for the classic one), allowing up to 40-bit physical address
space.
The ARM LPAE documentation is available from (free registration needed):
http://infocenter.arm.com/help/topic/com.arm.doc.ddi0406b_virtualization_extns/index.html
The full set of patches (kernel fixes, LPAE and support for an emulated
Versatile Express with Cortex-A15 tile) is available on this branch:
git://git.kernel.org/pub/scm/linux/kernel/git/cmarinas/linux-2.6-cm.git arm-lpae
Changelog:
- Upgraded to latest mainline kernel (2.6.37-rc1) solving several
  conflicts.
- Enable CONFIG_ARCH_DMA_ADDR_T_64BIT for future compatibility with the
  unified dma_addr_t patch.
- PHYS_ADDR_FMT printk format changed to be ANSI C compliant.
- Alignment fault now uses SIGBUS instead of SIGILL.
- arch/arm/kernel/head.S modified to use SECTION_SHIFT and reduce the
  amount of #ifdef's.
- setup_mm_for_reboot() modified for LPAE.
- identity_mapping_add/del() modified for LPAE.
- Removed FIRST_USER_PGD_NR definition as the place where it was used
  has been modified.
Any comments are welcome. Thanks.
Catalin Marinas (13):
  ARM: LPAE: Use PMD_(SHIFT|SIZE|MASK) instead of PGDIR_*
  ARM: LPAE: Factor out 2-level page table definitions into separate files
  ARM: LPAE: Do not assume Linux PTEs are always at PTRS_PER_PTE offset
  ARM: LPAE: Introduce L_PTE_NOEXEC and L_PTE_NOWRITE
  ARM: LPAE: Introduce the 3-level page table format definitions
  ARM: LPAE: Page table maintenance for the 3-level format
  ARM: LPAE: MMU setup for the 3-level page table format
  ARM: LPAE: Change setup_mm_for_reboot() to work with LPAE
  ARM: LPAE: Remove the FIRST_USER_PGD_NR and USER_PTRS_PER_PGD definitions
  ARM: LPAE: Add fault handling support
  ARM: LPAE: Add context switching support
  ARM: LPAE: Add SMP support
  ...
The DFSR and IFSR register format is different when LPAE is enabled. In
addition, DFSR and IFSR have similar definitions for the fault type.
This patch modifies the fault code to correctly handle the new
format.
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
---
arch/arm/mm/alignment.c | 8 ++++-
arch/arm/mm/fault.c | 80 +++++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 87 insertions(+), 1 deletions(-)
diff --git a/arch/arm/mm/alignment.c b/arch/arm/mm/alignment.c
index 724ba3b..bc98a6e 100644
--- a/arch/arm/mm/alignment.c
+++ b/arch/arm/mm/alignment.c
@@ -906,6 +906,12 @@ do_alignment(unsigned long addr, unsigned int fsr, struct pt_regs *regs)
return 0;
}
+#ifdef CONFIG_ARM_LPAE
+#define ALIGNMENT_FAULT 33
+#else
+#define ALIGNMENT_FAULT 1
+#endif
+
/*
* This needs to be done after sysctl_init, otherwise sys/ will be
* overwritten. Actually, this shouldn't be in sys/ at all since
@@ -939,7 +945,7 @@ static int __init alignment_init(void)
ai_usermode = UM_FIXUP;
}
- hook_fault_code(1, do_alignment, SIGBUS, BUS_ADRALN,
+ hook_fault_code(ALIGNMENT_FAULT, do_alignment, SIGBUS, BUS_ADRALN,
"alignment exception");
/*
diff --git a/arch/arm/mm/fault.c b/arch/arm/mm/fault.c
index 5da7b0c..2dde9cd 100644
--- a/arch/arm/mm/fault.c
+++ b/arch/arm/mm/fault.c
@@ -33,10 +33,15 @@
#define FSR_WRITE (1 << 11)
#define FSR_FS4 (1 << 10)
#define FSR_FS3_0 (15)
+#define FSR_FS5_0 (0x3f)
static inline int fsr_fs(unsigned int fsr)
{
+#ifdef CONFIG_ARM_LPAE
+ return fsr & FSR_FS5_0;
+#else
return (fsr & FSR_FS3_0) | (fsr & FSR_FS4) >> 6;
+#endif
}
#ifdef CONFIG_MMU
@@ -108,7 +113,9 @@ void show_pte(struct mm_struct *mm, unsigned long addr)
pte = pte_offset_map(pmd, addr);
printk(", *pte=%08lx", pte_val(*pte));
+#ifndef CONFIG_ARM_LPAE
printk(", *ppte=%08lx", pte_val(pte[-LINUX_PTE_OFFSET]));
+#endif
pte_unmap(pte);
} while(0);
@@ -467,6 +474,72 @@ static ...
This is an unrelated change - should it be in a different patch?
--
It was intended to be in this patch as I couldn't find a better place. This patch sorts out the fault handling (and error reporting) for LPAE and we don't need the additional printk here. -- Catalin --
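For readers following the thread, the fault status decoding that this patch changes can be sketched in plain C. This is a user-space illustration only: the names `fsr_fs_classic` and `fsr_fs_lpae` are made up for the example (the patch uses a single `fsr_fs()` selected by `CONFIG_ARM_LPAE`), while the masks and the alignment fault codes (1 and 33) are taken from the diff above.

```c
#include <assert.h>

#define FSR_FS4    (1u << 10)  /* classic format: status bit 4 lives in FSR[10] */
#define FSR_FS3_0  (15u)       /* classic format: low four status bits */
#define FSR_FS5_0  (0x3fu)     /* LPAE format: six contiguous status bits */

/* Classic format: FS[3:0] combined with FS[4] moved down from bit 10. */
static inline unsigned int fsr_fs_classic(unsigned int fsr)
{
	return (fsr & FSR_FS3_0) | ((fsr & FSR_FS4) >> 6);
}

/* LPAE format: the status is simply FSR[5:0]. */
static inline unsigned int fsr_fs_lpae(unsigned int fsr)
{
	return fsr & FSR_FS5_0;
}
```

With these helpers, a raw status of 0x21 decodes to 33 in the LPAE format, which is the value the patch hooks for alignment faults, while 0x001 decodes to 1 in the classic format.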
It doesn't sort the fault error reporting actually. With pte_val() returning u64 constants on LPAE, all the above printk's using %08lx will issue warnings. Also, as one of your previous patches changed the non-LPAE stuff to use u32, which is 'unsigned int', %08lx is wrong for them too, and will cause the compiler to spit out warnings. I can only assume this patch hasn't been build-tested, or maybe it has but the warnings ignored? It seems a larger patch is required here - and as such might as well become a separate "fix fault reporting" patch. --
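The format mismatch described here can be illustrated outside the kernel. The sketch below uses `snprintf` rather than `printk`, and `pteval64_t` is a hypothetical stand-in for a 64-bit page table value type; it only shows one common way to keep the format string and the argument type in agreement, not the fix that was eventually merged.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for a 64-bit page table value type. */
typedef uint64_t pteval64_t;

/*
 * Format a 64-bit page table entry. Casting to unsigned long long and
 * using %llx keeps the format and the argument type consistent no
 * matter how the 64-bit typedef is defined on a given platform.
 */
static int format_pte(char *buf, size_t len, pteval64_t pte)
{
	return snprintf(buf, len, ", *pte=%016llx", (unsigned long long)pte);
}
```

Passing the same value to a `%08lx` conversion would be a type mismatch on a 32-bit target, which is what triggers the compiler warnings mentioned above.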
This has been fixed in a subsequent version of the series with the Probably the latter. I run the resulting kernels both on VE (with A9) and a model supporting A15. -- Catalin --
Placing the Linux PTEs at a 2KB offset inside a page is a workaround for
the 2-level page table format where not enough spare bits are available.
With LPAE this is no longer required. This patch removes that assumption
by using a different macro, LINUX_PTE_OFFSET, which is defined as
PTRS_PER_PTE for the 2-level page tables.
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
---
arch/arm/include/asm/pgalloc.h | 6 +++---
arch/arm/include/asm/pgtable-2level.h | 1 +
arch/arm/include/asm/pgtable.h | 6 +++---
arch/arm/mm/fault.c | 2 +-
arch/arm/mm/mmu.c | 3 ++-
5 files changed, 10 insertions(+), 8 deletions(-)
diff --git a/arch/arm/include/asm/pgalloc.h b/arch/arm/include/asm/pgalloc.h
index b12cc98..c2a1f64 100644
--- a/arch/arm/include/asm/pgalloc.h
+++ b/arch/arm/include/asm/pgalloc.h
@@ -62,7 +62,7 @@ pte_alloc_one_kernel(struct mm_struct *mm, unsigned long addr)
pte = (pte_t *)__get_free_page(PGALLOC_GFP);
if (pte) {
clean_dcache_area(pte, sizeof(pte_t) * PTRS_PER_PTE);
- pte += PTRS_PER_PTE;
+ pte += LINUX_PTE_OFFSET;
}
return pte;
@@ -95,7 +95,7 @@ pte_alloc_one(struct mm_struct *mm, unsigned long addr)
static inline void pte_free_kernel(struct mm_struct *mm, pte_t *pte)
{
if (pte) {
- pte -= PTRS_PER_PTE;
+ pte -= LINUX_PTE_OFFSET;
free_page((unsigned long)pte);
}
}
@@ -128,7 +128,7 @@ pmd_populate_kernel(struct mm_struct *mm, pmd_t *pmdp, pte_t *ptep)
* The pmd must be loaded with the physical
* address of the PTE table
*/
- pte_ptr -= PTRS_PER_PTE * sizeof(void *);
+ pte_ptr -= LINUX_PTE_OFFSET * sizeof(void *);
__pmd_populate(pmdp, __pa(pte_ptr) | _PAGE_KERNEL_TABLE);
}
diff --git a/arch/arm/include/asm/pgtable-2level.h b/arch/arm/include/asm/pgtable-2level.h
index d60bda9..36bdef7 100644
--- a/arch/arm/include/asm/pgtable-2level.h
+++ b/arch/arm/include/asm/pgtable-2level.h
@@ -71,6 +71,7 @@
#define PTRS_PER_PTE 512
#define ...
Hmm. I think we should be doing this a different way - in fact, I think
we should switch the order of the linux vs hardware page tables. This
actually simplifies the code a bit too - notice that we lose the arith.
in __pte_map, __pte_unmap, pmd_page_vaddr, which is all page table
walking stuff.
arch/arm/include/asm/pgalloc.h | 34 +++++++++++++++-------------------
arch/arm/include/asm/pgtable.h | 30 +++++++++++++++---------------
arch/arm/mm/fault.c | 2 +-
arch/arm/mm/mmu.c | 2 +-
arch/arm/mm/proc-macros.S | 10 +++++-----
arch/arm/mm/proc-v7.S | 8 +++-----
6 files changed, 40 insertions(+), 46 deletions(-)
diff --git a/arch/arm/include/asm/pgalloc.h b/arch/arm/include/asm/pgalloc.h
index b12cc98..e2a6613 100644
--- a/arch/arm/include/asm/pgalloc.h
+++ b/arch/arm/include/asm/pgalloc.h
@@ -38,6 +38,11 @@ extern void free_pgd_slow(struct mm_struct *mm, pgd_t *pgd);
#define PGALLOC_GFP (GFP_KERNEL | __GFP_NOTRACK | __GFP_REPEAT | __GFP_ZERO)
+static inline void clean_pte_table(void *ptr)
+{
+ clean_dcache_area(ptr + PTE_HWTABLE_OFF, PTE_HWTABLE_SIZE);
+}
+
/*
* Allocate one PTE table.
*
@@ -60,10 +65,8 @@ pte_alloc_one_kernel(struct mm_struct *mm, unsigned long addr)
pte_t *pte;
pte = (pte_t *)__get_free_page(PGALLOC_GFP);
- if (pte) {
- clean_dcache_area(pte, sizeof(pte_t) * PTRS_PER_PTE);
- pte += PTRS_PER_PTE;
- }
+ if (pte)
+ clean_pte_table(pte);
return pte;
}
@@ -79,10 +82,8 @@ pte_alloc_one(struct mm_struct *mm, unsigned long addr)
pte = alloc_pages(PGALLOC_GFP, 0);
#endif
if (pte) {
- if (!PageHighMem(pte)) {
- void *page = page_address(pte);
- clean_dcache_area(page, sizeof(pte_t) * PTRS_PER_PTE);
- }
+ if (!PageHighMem(pte))
+ clean_pte_table(page_address(pte));
pgtable_page_ctor(pte);
}
@@ -94,10 +95,8 @@ pte_alloc_one(struct mm_struct *mm, unsigned long addr)
*/
static inline void pte_free_kernel(struct mm_struct *mm, pte_t *pte)
...
On 15 November 2010 17:42, Russell King - ARM Linux
It looks like a good clean-up to me (though I need some refactoring on
my LPAE patches). Do you plan to push this upstream? If you add a
comment and a signed-off line, I can carry it in my LPAE branch until it
appears in mainline.
Thanks.
--
Catalin
--
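The reordering proposed above (Linux PTEs first, hardware PTEs second) can be sketched as follows. The excerpt uses `PTE_HWTABLE_OFF` and `PTE_HWTABLE_SIZE` without showing their definitions, so the values below are assumptions based on the 2KB offset mentioned in the original patch description; the helper names are invented for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PTRS_PER_PTE      512
/* Assumed values: the excerpt does not show these definitions. */
#define PTE_HWTABLE_OFF   (PTRS_PER_PTE * sizeof(uint32_t))  /* 2KB into the page */
#define PTE_HWTABLE_SIZE  (PTRS_PER_PTE * sizeof(uint32_t))  /* 2KB of hardware entries */

/* With Linux entries first, the page address can be handed back unchanged. */
static inline uint32_t *linux_pte_table(void *page)
{
	return (uint32_t *)page;
}

/* The hardware entries occupy the second half of the same page. */
static inline uint32_t *hw_pte_table(void *page)
{
	return (uint32_t *)((char *)page + PTE_HWTABLE_OFF);
}
```

Under these assumptions the allocation and free paths no longer need to add or subtract an offset, which matches the arithmetic removed by the diff; only the cache cleaning and the hardware-facing code need `PTE_HWTABLE_OFF`.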
This patch introduces the pgtable-3level*.h files with definitions
specific to the LPAE page table format (3 levels of page tables). Each
table is 4KB and has 512 64-bit entries. An entry can point to a 40-bit
physical address. The young, write and exec software bits share the
corresponding hardware bits (negated). Other software bits use spare
bits in the PTE.
The patch also changes some variable types from unsigned long or int to
pteval_t or pgprot_t.
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
---
arch/arm/include/asm/page.h | 4 +
arch/arm/include/asm/pgtable-3level-hwdef.h | 78 ++++++++++++++++++
arch/arm/include/asm/pgtable-3level-types.h | 55 +++++++++++++
arch/arm/include/asm/pgtable-3level.h | 113 +++++++++++++++++++++++++++
arch/arm/include/asm/pgtable-hwdef.h | 4 +
arch/arm/include/asm/pgtable.h | 6 +-
arch/arm/mm/mm.h | 8 +-
arch/arm/mm/mmu.c | 2 +-
8 files changed, 264 insertions(+), 6 deletions(-)
create mode 100644 arch/arm/include/asm/pgtable-3level-hwdef.h
create mode 100644 arch/arm/include/asm/pgtable-3level-types.h
create mode 100644 arch/arm/include/asm/pgtable-3level.h
diff --git a/arch/arm/include/asm/page.h b/arch/arm/include/asm/page.h
index 3848105..e5124db 100644
--- a/arch/arm/include/asm/page.h
+++ b/arch/arm/include/asm/page.h
@@ -151,7 +151,11 @@ extern void __cpu_copy_user_highpage(struct page *to, struct page *from,
#define clear_page(page) memset((void *)(page), 0, PAGE_SIZE)
extern void copy_page(void *to, const void *from);
+#ifdef CONFIG_ARM_LPAE
+#include <asm/pgtable-3level-types.h>
+#else
#include <asm/pgtable-2level-types.h>
+#endif
#endif /* CONFIG_MMU */
diff --git a/arch/arm/include/asm/pgtable-3level-hwdef.h b/arch/arm/include/asm/pgtable-3level-hwdef.h
new file mode 100644
index 0000000..2f99c3c
--- /dev/null
+++ b/arch/arm/include/asm/pgtable-3level-hwdef.h
@@ -0,0 ...
It is really not correct to have these constants type'd as pmd_t. The
idea behind pmd_t et.al. is to detect when normal arithmetic or logical
operations are performed on page table entries when the accessors
instead should be used. By typing these as pmd_t, it means operations
need to be:
u32 pmdval = pmd_val(foo) | pmd_val(PMD_TYPE_TABLE);
Again, this is wrong. There's an accessor for pgprot_t typed data. This
Ditto.
--
On 15 November 2010 18:34, Russell King - ARM Linux OK, I'll define pgprotval_t and accessors. -- Catalin --
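The typing argument in this exchange can be made concrete with a small sketch. The struct wrapper and the `pmd_val()`/`__pmd()` accessors follow the strict type-checking pattern quoted elsewhere in the thread; `pmdval_t`, `EX_PMD_TYPE_TABLE` and `make_table_entry` are names assumed for illustration and are not taken from the patches.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t pmdval_t;                /* raw value type (assumed name) */
typedef struct { pmdval_t pmd; } pmd_t;   /* opaque entry: no direct arithmetic */

#define pmd_val(x)  ((x).pmd)
#define __pmd(x)    ((pmd_t) { (x) })

/* Hypothetical constant, typed as a raw value rather than as pmd_t. */
#define EX_PMD_TYPE_TABLE  ((pmdval_t)3)

/* Constants combine with plain integer operators; only the result is wrapped. */
static inline pmd_t make_table_entry(pmdval_t table_phys)
{
	return __pmd(table_phys | EX_PMD_TYPE_TABLE);
}
```

If the constant were itself a `pmd_t`, the OR above would not compile without unwrapping it through `pmd_val()` first, which is the awkwardness the review points out.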
From: Will Deacon <will.deacon@arm.com>
This patch uses the types.h implementation in asm-generic to define the
dma_addr_t type as the same width as phys_addr_t.
NOTE: this is a temporary patch until the corresponding patches unifying
the dma_addr_t and removing the dma64_addr_t are merged into mainline.
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
---
arch/arm/include/asm/types.h | 20 +-------------------
1 files changed, 1 insertions(+), 19 deletions(-)
diff --git a/arch/arm/include/asm/types.h b/arch/arm/include/asm/types.h
index 345df01..dc1bdbb 100644
--- a/arch/arm/include/asm/types.h
+++ b/arch/arm/include/asm/types.h
@@ -1,30 +1,12 @@
#ifndef __ASM_ARM_TYPES_H
#define __ASM_ARM_TYPES_H
-#include <asm-generic/int-ll64.h>
+#include <asm-generic/types.h>
-#ifndef __ASSEMBLY__
-
-typedef unsigned short umode_t;
-
-#endif /* __ASSEMBLY__ */
-
-/*
- * These aren't exported outside the kernel to avoid name space clashes
- */
#ifdef __KERNEL__
#define BITS_PER_LONG 32
-#ifndef __ASSEMBLY__
-
-/* Dma addresses are 32-bits wide. */
-
-typedef u32 dma_addr_t;
-typedef u32 dma64_addr_t;
-
-#endif /* __ASSEMBLY__ */
-
#endif /* __KERNEL__ */
#endif
--
From: Will Deacon <will.deacon@arm.com>
The physical start address of memory may be > 4GB and therefore
unrepresentable using an unsigned long.
This patch changes early_mem and arm_add_memory to use phys_addr_t
instead of unsigned long for the start address.
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
---
arch/arm/kernel/setup.c | 5 +++--
1 files changed, 3 insertions(+), 2 deletions(-)
diff --git a/arch/arm/kernel/setup.c b/arch/arm/kernel/setup.c
index 3cadb46..751ac80 100644
--- a/arch/arm/kernel/setup.c
+++ b/arch/arm/kernel/setup.c
@@ -442,7 +442,7 @@ static struct machine_desc * __init setup_machine(unsigned int nr)
return list;
}
-static int __init arm_add_memory(unsigned long start, unsigned long size)
+static int __init arm_add_memory(phys_addr_t start, unsigned long size)
{
struct membank *bank = &meminfo.bank[meminfo.nr_banks];
@@ -478,7 +478,8 @@ static int __init arm_add_memory(unsigned long start, unsigned long size)
static int __init early_mem(char *p)
{
static int usermem __initdata = 0;
- unsigned long size, start;
+ unsigned long size;
+ phys_addr_t start;
char *endp;
/*
--
From: Will Deacon <will.deacon@arm.com>
LPAE provides support for memory banks with physical addresses of up
to 40 bits.
This patch adds a new atag, ATAG_MEM64, so that the Kernel can be
informed about memory that exists above the 4GB boundary.
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
---
arch/arm/include/asm/setup.h | 10 +++++++++-
arch/arm/kernel/compat.c | 4 ++--
arch/arm/kernel/setup.c | 12 +++++++++++-
3 files changed, 22 insertions(+), 4 deletions(-)
diff --git a/arch/arm/include/asm/setup.h b/arch/arm/include/asm/setup.h
index 5092118..fab849f 100644
--- a/arch/arm/include/asm/setup.h
+++ b/arch/arm/include/asm/setup.h
@@ -43,6 +43,13 @@ struct tag_mem32 {
__u32 start; /* physical start address */
};
+#define ATAG_MEM64 0x54420002
+
+struct tag_mem64 {
+ __u64 size;
+ __u64 start; /* physical start address */
+};
+
/* VGA text type displays */
#define ATAG_VIDEOTEXT 0x54410003
@@ -147,7 +154,8 @@ struct tag {
struct tag_header hdr;
union {
struct tag_core core;
- struct tag_mem32 mem;
+ struct tag_mem32 mem32;
+ struct tag_mem64 mem64;
struct tag_videotext videotext;
struct tag_ramdisk ramdisk;
struct tag_initrd initrd;
diff --git a/arch/arm/kernel/compat.c b/arch/arm/kernel/compat.c
index 9256523..f224d95 100644
--- a/arch/arm/kernel/compat.c
+++ b/arch/arm/kernel/compat.c
@@ -86,8 +86,8 @@ static struct tag * __init memtag(struct tag *tag, unsigned long start, unsigned
tag = tag_next(tag);
tag->hdr.tag = ATAG_MEM;
tag->hdr.size = tag_size(tag_mem32);
- tag->u.mem.size = size;
- tag->u.mem.start = start;
+ tag->u.mem32.size = size;
+ tag->u.mem32.start = start;
return tag;
}
diff --git a/arch/arm/kernel/setup.c b/arch/arm/kernel/setup.c
index 751ac80..0128db2 100644
--- a/arch/arm/kernel/setup.c
+++ b/arch/arm/kernel/setup.c
@@ -591,11 +591,21 @@ __tagtable(ATAG_CORE, parse_tag_core);
static int __init ...
From: Will Deacon <will.deacon@arm.com>
This patch ensures that the phys_addr_t datatype is used to represent
physical addresses which may be beyond the range of an unsigned long.
The virt <-> phys macros are updated accordingly to ensure that virtual
addresses can remain as they are.
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
---
arch/arm/include/asm/memory.h | 17 +++++++++--------
arch/arm/include/asm/outercache.h | 14 ++++++++------
arch/arm/include/asm/pgalloc.h | 2 +-
arch/arm/include/asm/pgtable.h | 2 +-
arch/arm/include/asm/setup.h | 2 +-
arch/arm/mm/init.c | 6 +++---
arch/arm/mm/mmu.c | 12 +++++++-----
7 files changed, 30 insertions(+), 25 deletions(-)
diff --git a/arch/arm/include/asm/memory.h b/arch/arm/include/asm/memory.h
index 23c2e8e..756252b 100644
--- a/arch/arm/include/asm/memory.h
+++ b/arch/arm/include/asm/memory.h
@@ -15,6 +15,7 @@
#include <linux/compiler.h>
#include <linux/const.h>
+#include <linux/types.h>
#include <mach/memory.h>
#include <asm/sizes.h>
@@ -138,15 +139,15 @@
* files. Use virt_to_phys/phys_to_virt/__pa/__va instead.
*/
#ifndef __virt_to_phys
-#define __virt_to_phys(x) ((x) - PAGE_OFFSET + PHYS_OFFSET)
-#define __phys_to_virt(x) ((x) - PHYS_OFFSET + PAGE_OFFSET)
+#define __virt_to_phys(x) (((phys_addr_t)(x) - PAGE_OFFSET + PHYS_OFFSET))
+#define __phys_to_virt(x) ((unsigned long)((x) - PHYS_OFFSET + PAGE_OFFSET))
#endif
/*
* Convert a physical address to a Page Frame Number and back
*/
-#define __phys_to_pfn(paddr) ((paddr) >> PAGE_SHIFT)
-#define __pfn_to_phys(pfn) ((pfn) << PAGE_SHIFT)
+#define __phys_to_pfn(paddr) ((unsigned long)((paddr) >> PAGE_SHIFT))
+#define __pfn_to_phys(pfn) ((phys_addr_t)(pfn) << PAGE_SHIFT)
/*
* Convert a page to/from a physical address
@@ -188,21 +189,21 @@
* translation for translating DMA addresses. Use the driver
* DMA ...
From: Will Deacon <will.deacon@arm.com>
Memory banks living outside of the 32-bit physical address
space do not have a 1:1 pa <-> va mapping and therefore the
__va macro may wrap.
This patch ensures that such banks are marked as highmem so
that the Kernel doesn't try to split them up when it sees that
the wrapped virtual address overlaps the vmalloc space.
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
---
arch/arm/mm/mmu.c | 5 +++--
1 files changed, 3 insertions(+), 2 deletions(-)
diff --git a/arch/arm/mm/mmu.c b/arch/arm/mm/mmu.c
index b03e431..787a409 100644
--- a/arch/arm/mm/mmu.c
+++ b/arch/arm/mm/mmu.c
@@ -785,7 +785,8 @@ static void __init sanity_check_meminfo(void)
#ifdef CONFIG_HIGHMEM
if (__va(bank->start) > vmalloc_min ||
- __va(bank->start) < (void *)PAGE_OFFSET)
+ __va(bank->start) < (void *)PAGE_OFFSET ||
+ bank->start > ULONG_MAX)
highmem = 1;
bank->highmem = highmem;
@@ -794,7 +795,7 @@ static void __init sanity_check_meminfo(void)
* Split those memory banks which are partially overlapping
* the vmalloc area greatly simplifying things later.
*/
- if (__va(bank->start) < vmalloc_min &&
+ if (!highmem && __va(bank->start) < vmalloc_min &&
bank->size > vmalloc_min - __va(bank->start)) {
if (meminfo.nr_banks >= NR_BANKS) {
printk(KERN_CRIT "NR_BANKS too low, "
--
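The wrap that motivates this patch can be reproduced with fixed-width integers. In the sketch below `virt_addr_t` stands in for a 32-bit `unsigned long`, and `EX_PHYS_OFFSET` and `EX_PAGE_OFFSET` are example values chosen for illustration, not taken from any platform in the series.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t phys_addr_t;   /* physical addresses: up to 40 bits with LPAE */
typedef uint32_t virt_addr_t;   /* stands in for a 32-bit unsigned long */

/* Example values only; real platforms define their own. */
#define EX_PHYS_OFFSET  0x80000000ULL
#define EX_PAGE_OFFSET  0xC0000000ULL

/* Same shape as __phys_to_virt(): the result is truncated to 32 bits. */
static inline virt_addr_t ex_phys_to_virt(phys_addr_t pa)
{
	return (virt_addr_t)(pa - EX_PHYS_OFFSET + EX_PAGE_OFFSET);
}

/* A bank starting above 4GB cannot be part of the linear mapping. */
static inline int ex_bank_is_highmem(phys_addr_t start)
{
	return start > UINT32_MAX;
}
```

With these example offsets, a bank at physical 0x880000000 translates to the same 32-bit virtual address as the start of lowmem, so a range check on the translated address alone cannot tell the two apart. Checking the physical start against the 32-bit limit, as the patch does with `ULONG_MAX`, avoids relying on the wrapped value.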
This patch modifies the pgd/pmd/pte manipulation functions to support
the 3-level page table format. Since there is no need for an 'ext'
argument to cpu_set_pte_ext(), this patch conditionally defines a
different prototype for this function when CONFIG_ARM_LPAE is enabled.
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
---
arch/arm/include/asm/cpu-multi32.h | 8 ++++
arch/arm/include/asm/cpu-single.h | 4 ++
arch/arm/include/asm/pgalloc.h | 26 ++++++++++++-
arch/arm/include/asm/pgtable.h | 72 ++++++++++++++++++++++++++++++++++++
arch/arm/include/asm/proc-fns.h | 13 ++++++
arch/arm/mm/ioremap.c | 8 ++-
arch/arm/mm/pgd.c | 18 +++++++--
arch/arm/mm/proc-v7.S | 8 ++++
8 files changed, 149 insertions(+), 8 deletions(-)
diff --git a/arch/arm/include/asm/cpu-multi32.h b/arch/arm/include/asm/cpu-multi32.h
index e2b5b0b..985fcf5 100644
--- a/arch/arm/include/asm/cpu-multi32.h
+++ b/arch/arm/include/asm/cpu-multi32.h
@@ -57,7 +57,11 @@ extern struct processor {
* Set a possibly extended PTE. Non-extended PTEs should
* ignore 'ext'.
*/
+#ifdef CONFIG_ARM_LPAE
+ void (*set_pte_ext)(pte_t *ptep, pte_t pte);
+#else
void (*set_pte_ext)(pte_t *ptep, pte_t pte, unsigned int ext);
+#endif
} processor;
#define cpu_proc_init() processor._proc_init()
@@ -65,5 +69,9 @@ extern struct processor {
#define cpu_reset(addr) processor.reset(addr)
#define cpu_do_idle() processor._do_idle()
#define cpu_dcache_clean_area(addr,sz) processor.dcache_clean_area(addr,sz)
+#ifdef CONFIG_ARM_LPAE
+#define cpu_set_pte_ext(ptep,pte) processor.set_pte_ext(ptep,pte)
+#else
#define cpu_set_pte_ext(ptep,pte,ext) processor.set_pte_ext(ptep,pte,ext)
+#endif
#define cpu_do_switch_mm(pgd,mm) processor.switch_mm(pgd,mm)
diff --git a/arch/arm/include/asm/cpu-single.h b/arch/arm/include/asm/cpu-single.h
index f073a6d..f436df2 100644
--- a/arch/arm/include/asm/cpu-single.h
+++ ...
Just make LPAE and non-LPAE both provide PTE_PFN_MASK - for non-LPAE
this can be defined as ~0UL to optimize it away. However, PTE_PFN_MASK
is the wrong name for this - you're not masking out the PFN, but the
physical address. It only becomes a PFN when you shift.
... here it becomes much more confusing - it suggests that
"pmd_val(pmd) & PTE_PFN_MASK" gives you a PFN, which you then pass to
a function which takes a physical address.
Also, pmd_page_vaddr() in my patches ends up as:
static inline pte_t *pmd_page_vaddr(pmd_t pmd)
{
+ return __va(pmd_val(pmd) & PAGE_MASK);
}
which is almost the same. I'd suggest that this becomes for both:
static inline pte_t *pmd_page_vaddr(pmd_t pmd)
{
return __va(pmd_val(pmd) & PTE_PFN_MASK & PAGE_MASK);
}
This actually wants to become something more like:
+ pgd = pgd_base + pgd_index(0);
+ if (pgd_none_or_clear_bad(pgd))
+ goto no_pgd;
+ pmd = pmd_offset(pgd, 0);
+ if (pmd_none_or_clear_bad(pmd))
+ goto no_pmd;
pte = pmd_pgtable(*pmd);
pmd_clear(pmd);
pte_free(mm, pte);
+no_pmd:
+ pgd_clear(pgd);
pmd_free(mm, pmd);
+no_pgd:
free_pgd(pgd_base);
--
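The naming point raised above (the mask yields a physical address, and only the shift yields a PFN) can be shown with a short sketch. `EX_PHYS_MASK` is a hypothetical name, and the 40-bit width is taken from the LPAE description earlier in the thread; the sample entry value used in the note is invented.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT    12
#define PAGE_MASK     (~((1ULL << PAGE_SHIFT) - 1))
/* Hypothetical name: keeps the low 40 bits of a 64-bit entry. */
#define EX_PHYS_MASK  ((1ULL << 40) - 1)

/* Masking yields a page-aligned physical address... */
static inline uint64_t entry_to_phys(uint64_t entry)
{
	return entry & EX_PHYS_MASK & PAGE_MASK;
}

/* ...which only becomes a page frame number after the shift. */
static inline uint64_t entry_to_pfn(uint64_t entry)
{
	return entry_to_phys(entry) >> PAGE_SHIFT;
}
```

For an invented entry value of 0x0040008012345003, the first helper gives 0x8012345000 and the second gives 0x8012345, which is why a name containing "PFN" is misleading for the mask itself.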
This patch moves page table definitions from asm/page.h, asm/pgtable.h
and asm/pgtable-hwdef.h into corresponding *-2level* files.
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
---
arch/arm/include/asm/page.h | 40 +-------
arch/arm/include/asm/pgtable-2level-hwdef.h | 91 +++++++++++++++++
arch/arm/include/asm/pgtable-2level-types.h | 64 ++++++++++++
arch/arm/include/asm/pgtable-2level.h | 147 +++++++++++++++++++++++++++
arch/arm/include/asm/pgtable-hwdef.h | 77 +--------------
arch/arm/include/asm/pgtable.h | 139 +-------------------------
6 files changed, 306 insertions(+), 252 deletions(-)
create mode 100644 arch/arm/include/asm/pgtable-2level-hwdef.h
create mode 100644 arch/arm/include/asm/pgtable-2level-types.h
create mode 100644 arch/arm/include/asm/pgtable-2level.h
diff --git a/arch/arm/include/asm/page.h b/arch/arm/include/asm/page.h
index a485ac3..3848105 100644
--- a/arch/arm/include/asm/page.h
+++ b/arch/arm/include/asm/page.h
@@ -151,45 +151,7 @@ extern void __cpu_copy_user_highpage(struct page *to, struct page *from,
#define clear_page(page) memset((void *)(page), 0, PAGE_SIZE)
extern void copy_page(void *to, const void *from);
-#undef STRICT_MM_TYPECHECKS
-
-#ifdef STRICT_MM_TYPECHECKS
-/*
- * These are used to make use of C type-checking..
- */
-typedef struct { unsigned long pte; } pte_t;
-typedef struct { unsigned long pmd; } pmd_t;
-typedef struct { unsigned long pgd[2]; } pgd_t;
-typedef struct { unsigned long pgprot; } pgprot_t;
-
-#define pte_val(x) ((x).pte)
-#define pmd_val(x) ((x).pmd)
-#define pgd_val(x) ((x).pgd[0])
-#define pgprot_val(x) ((x).pgprot)
-
-#define __pte(x) ((pte_t) { (x) } )
-#define __pmd(x) ((pmd_t) { (x) } )
-#define __pgprot(x) ((pgprot_t) { (x) } )
-
-#else
-/*
- * .. while these make it easier on the compiler
- */
-typedef unsigned long pte_t;
-typedef unsigned long pmd_t;
-typedef unsigned long ...
This also introduces pteval_t. It would be useful to have the
introduction of pteval_t as a separate patch, which not only
This should become:
typedef struct { pteval_t pte; } pte_t;
L_PTE_* can then be declared using linux/const.h stuff to typedef them
to pteval_t. shared_pte_mask also needs to be pteval_t.
As far as the __p*_error() functions, these should probably be passed
the pte/pmd/pgd value itself, rather than first passing them through
__pte_val() et.al.
Of course, I now have patches for this and my other points... will sort
them out into a series in the next few days.
--
On 15 November 2010 23:31, Russell King - ARM Linux I already do this for LPAE but can be done for the 2-level definitions for consistency. BTW, do you think it's worth adding STRICT_MM_TYPECHECKS for LPAE as Thanks. -- Catalin --
No you don't. You define the 2nd level definitions using pmd_t which Definitely, because it'll throw out warnings for most of your _AT(pmd_t,) definitions. --
On 16 November 2010 09:59, Russell King - ARM Linux I was only referring to L_PTE_*. The PMD_* definitions are wrong indeed. -- Catalin --
BTW, don't post another round of patches just because you've had _some_ comments back - your v2 patches are still being looked through, your v3 patches haven't even been looked at yet. It took some 4 hours for the mailing list to get through last night's posting frenzy that it really isn't worth overloading - is it really worth bringing the list server to its knees when the job is only half done? --
On 16 November 2010 10:04, Russell King - ARM Linux I'll wait for your patches on the PTE offset and then rebase mine on top. It may be sometime next week as I already have a lot to do. I posted the v3 patches just to clarify the issues around the 3/20 patch. The other patches are pretty much the same, so you can skip this version and wait for v4. -- Catalin --
This function was assuming that there are only two levels of page
tables. The patch changes looping over the PMD entries to make it
compatible with LPAE.
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
---
arch/arm/mm/mmu.c | 7 +++++--
1 files changed, 5 insertions(+), 2 deletions(-)
diff --git a/arch/arm/mm/mmu.c b/arch/arm/mm/mmu.c
index 4147cc6..3784acc 100644
--- a/arch/arm/mm/mmu.c
+++ b/arch/arm/mm/mmu.c
@@ -1098,13 +1098,16 @@ void setup_mm_for_reboot(char mode)
if (cpu_architecture() <= CPU_ARCH_ARMv5TEJ && !cpu_is_xscale())
base_pmdval |= PMD_BIT4;
- for (i = 0; i < FIRST_USER_PGD_NR + USER_PTRS_PER_PGD; i++, pgd++) {
+ for (i = 0; i < TASK_SIZE >> PMD_SHIFT; i++) {
unsigned long pmdval = (i << PMD_SHIFT) | base_pmdval;
pmd_t *pmd;
+ unsigned long addr = i << PMD_SHIFT;
- pmd = pmd_off(pgd, i << PMD_SHIFT);
+ pmd = pmd_off(pgd + pgd_index(addr), addr);
pmd[0] = __pmd(pmdval);
+#ifndef CONFIG_ARM_LPAE
pmd[1] = __pmd(pmdval + (1 << (PMD_SHIFT - 1)));
+#endif
flush_pmd_entry(pmd);
}
--
The same is required for the identity mapping code. If this uses that code, the problem becomes localized there. --
With LPAE, TTBRx registers are 64-bit. The ASID is stored in TTBR0
rather than a separate Context ID register. This patch makes the
necessary changes to handle context switching on LPAE.
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
---
arch/arm/mm/context.c | 18 ++++++++++++++++--
arch/arm/mm/proc-v7.S | 8 +++++++-
2 files changed, 23 insertions(+), 3 deletions(-)
diff --git a/arch/arm/mm/context.c b/arch/arm/mm/context.c
index b0ee9ba..d40d3fa 100644
--- a/arch/arm/mm/context.c
+++ b/arch/arm/mm/context.c
@@ -22,6 +22,20 @@ unsigned int cpu_last_asid = ASID_FIRST_VERSION;
DEFINE_PER_CPU(struct mm_struct *, current_mm);
#endif
+#ifdef CONFIG_ARM_LPAE
+#define cpu_set_asid(asid) { \
+ unsigned long ttbl, ttbh; \
+ asm(" mrrc p15, 0, %0, %1, c2 @ read TTBR0\n" \
+ " mov %1, %1, lsl #(48 - 32) @ set ASID\n" \
+ " mcrr p15, 0, %0, %1, c2 @ set TTBR0\n" \
+ : "=r" (ttbl), "=r" (ttbh) \
+ : "r" (asid & ~ASID_MASK)); \
+}
+#else
+#define cpu_set_asid(asid) \
+ asm(" mcr p15, 0, %0, c13, c0, 1\n" : : "r" (asid))
+#endif
+
/*
* We fork()ed a process, and we need a new context for the child
* to run in. We reserve version 0 for initial tasks so we will
@@ -37,7 +51,7 @@ void __init_new_context(struct task_struct *tsk, struct mm_struct *mm)
static void flush_context(void)
{
/* set the reserved ASID before flushing the TLB */
- asm("mcr p15, 0, %0, c13, c0, 1\n" : : "r" (0));
+ cpu_set_asid(0);
isb();
local_flush_tlb_all();
if (icache_is_vivt_asid_tagged()) {
@@ -99,7 +113,7 @@ static void reset_context(void *info)
set_mm_context(mm, asid);
/* set the new ASID */
- asm("mcr p15, 0, %0, c13, c0, 1\n" : : "r" (mm->context.id));
+ cpu_set_asid(mm->context.id);
isb();
}
diff --git a/arch/arm/mm/proc-v7.S b/arch/arm/mm/proc-v7.S
index 33a8c82..b0932c1 100644
--- a/arch/arm/mm/proc-v7.S
+++ b/arch/arm/mm/proc-v7.S
@@ -117,6 +117,11 @@ ENTRY(cpu_v7_switch_mm)
#ifdef CONFIG_MMU
...
This patch adds the MMU initialisation for the LPAE page table format.
The swapper_pg_dir size with LPAE is 5 rather than 4 pages. The
__v7_setup function configures the TTBRx split based on the PAGE_OFFSET
and sets the corresponding TTB control and MAIRx bits (similar to
PRRR/NMRR for TEX remapping). The 36-bit mappings (supersections) and a
few other memory types in mmu.c are conditionally compiled.
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
---
arch/arm/kernel/head.S | 96 +++++++++++++++++++++++++++++++------------
arch/arm/mm/mmu.c | 32 ++++++++++++++-
arch/arm/mm/proc-macros.S | 5 +-
arch/arm/mm/proc-v7.S | 99 ++++++++++++++++++++++++++++++++++++++++----
4 files changed, 193 insertions(+), 39 deletions(-)
diff --git a/arch/arm/kernel/head.S b/arch/arm/kernel/head.S
index dd6b369..fd8a29e 100644
--- a/arch/arm/kernel/head.S
+++ b/arch/arm/kernel/head.S
@@ -21,6 +21,7 @@
#include <asm/memory.h>
#include <asm/thread_info.h>
#include <asm/system.h>
+#include <asm/pgtable.h>
#ifdef CONFIG_DEBUG_LL
#include <mach/debug-macro.S>
@@ -45,11 +46,20 @@
#error KERNEL_RAM_VADDR must start at 0xXXXX8000
#endif
+#ifdef CONFIG_ARM_LPAE
+ /* LPAE requires an additional page for the PGD */
+#define PG_DIR_SIZE 0x5000
+#define PTE_WORDS 3
+#else
+#define PG_DIR_SIZE 0x4000
+#define PTE_WORDS 2
+#endif
+
.globl swapper_pg_dir
- .equ swapper_pg_dir, KERNEL_RAM_VADDR - 0x4000
+ .equ swapper_pg_dir, KERNEL_RAM_VADDR - PG_DIR_SIZE
.macro pgtbl, rd
- ldr \rd, =(KERNEL_RAM_PADDR - 0x4000)
+ ldr \rd, =(KERNEL_RAM_PADDR - PG_DIR_SIZE)
.endm
#ifdef CONFIG_XIP_KERNEL
@@ -129,11 +139,11 @@ __create_page_tables:
pgtbl r4 @ page table address
/*
- * Clear the 16K level 1 swapper page table
+ * Clear the swapper page table
*/
mov r0, r4
mov r3, #0
- add r6, r0, #0x4000
+ add r6, r0, #PG_DIR_SIZE
1: str r3, [r0], #4
str r3, [r0], #4
str r3, [r0], #4
@@ -141,6 +151,23 @@ ...
This should have been called PTE_ORDER, the PTE_WORDS naming is misleading. -- Catalin --
PTE is not the right prefix here - we don't deal with the lowest level of page tables, which in Linux is called PTE. I think you mean PMD_WORDS Are you sure these shifts by 18 places are correct? They're actually (val >> SECTION_SHIFT) << 2, so maybe they should be (SECTION_SHIFT - PMD_WORDS) ? --
On 22 November 2010 13:10, Russell King - ARM Linux SECTION_SHIFT - PMD_ORDER is (20 - 2) for classic page tables and (21 - 3) for LPAE. But we could change the 18 to some macros for clarification (the line would be long though). -- Catalin --
So yes, it's SECTION_SHIFT - PMD_ORDER, which is how they should be used IMHO. I don't see why another macro would be necessary. --
I didn't mean adding another macro but using (SECTION_SHIFT - PMD_ORDER) on a long line. -- Catalin --
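The arithmetic behind the shift by 18 discussed in this exchange can be checked with a small sketch. The shift and order values (20 and 2 for classic page tables, 21 and 3 for LPAE) come from the messages above; the `CLASSIC_`/`LPAE_` prefixes and the `entry_offset` helper are made up for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Classic 2-level format: 1MB sections, 4-byte entries. */
#define CLASSIC_SECTION_SHIFT  20
#define CLASSIC_PMD_ORDER      2
/* LPAE format: 2MB sections, 8-byte entries. */
#define LPAE_SECTION_SHIFT     21
#define LPAE_PMD_ORDER         3

/* Byte offset of the entry covering addr: section index scaled by entry size. */
static inline uint32_t entry_offset(uint32_t addr, int section_shift, int pmd_order)
{
	return (addr >> section_shift) << pmd_order;
}
```

For a section-aligned address the two-step computation equals a single right shift by (SECTION_SHIFT - PMD_ORDER), and that difference is 18 in both formats, which is why the literal 18 happens to work for LPAE as well.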
With 3-level page tables, starting secondary CPUs requires allocating
the pgd as well. Since LPAE Linux uses TTBR1 for the kernel page tables,
this patch reorders the CPU setup call in the head.S file so that the
swapper_pg_dir is used. TTBR0 is set to the value generated by the
primary CPU.
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
---
arch/arm/kernel/head.S | 10 +++++-----
arch/arm/kernel/smp.c | 39 +++++++++++++++++++++++++++++++++++++--
2 files changed, 42 insertions(+), 7 deletions(-)
diff --git a/arch/arm/kernel/head.S b/arch/arm/kernel/head.S
index fd8a29e..b54d00e 100644
--- a/arch/arm/kernel/head.S
+++ b/arch/arm/kernel/head.S
@@ -321,6 +321,10 @@ ENTRY(secondary_startup)
moveq r0, #'p' @ yes, error 'p'
beq __error_p
+ pgtbl r4
+ add r12, r10, #BSYM(PROCINFO_INITFUNC)
+ blx r12 @ initialise processor
+ @ (return control reg)
/*
* Use the page tables supplied from __cpu_up.
*/
@@ -328,12 +332,8 @@ ENTRY(secondary_startup)
ldmia r4, {r5, r7, r12} @ address to jump to after
sub r4, r4, r5 @ mmu has been enabled
ldr r4, [r7, r4] @ get secondary_data.pgdir
- adr lr, BSYM(__enable_mmu) @ return address
mov r13, r12 @ __secondary_switched address
- ARM( add pc, r10, #PROCINFO_INITFUNC ) @ initialise processor
- @ (return control reg)
- THUMB( add r12, r10, #PROCINFO_INITFUNC )
- THUMB( mov pc, r12 )
+ b __enable_mmu
ENDPROC(secondary_startup)
/*
diff --git a/arch/arm/kernel/smp.c b/arch/arm/kernel/smp.c
index 40b386c..089e2ae 100644
--- a/arch/arm/kernel/smp.c
+++ b/arch/arm/kernel/smp.c
@@ -82,8 +82,10 @@ static inline void identity_mapping_add(pgd_t *pgd, unsigned long start,
pmd = pmd_offset(pgd + pgd_index(addr), addr);
pmd[0] = __pmd(addr | prot);
addr += SECTION_SIZE;
+#ifndef CONFIG_ARM_LPAE
pmd[1] = __pmd(addr | prot);
addr += SECTION_SIZE;
+#endif
flush_pmd_entry(pmd);
outer_clean_range(__pa(pmd), __pa(pmd + 1));
}
@@ -98,7 +100,9 @@ static ...
I really don't like this being different in ordering from the boot CPU bring up. If we want to have the init function dealing with split page tables, we should pass in two pointers for it in both paths. --
PGDIR_SHIFT and PMD_SHIFT for the classic 2-level page table format have
the same value (21). This patch converts the PGDIR_* uses in the kernel
to the PMD_* equivalent.
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
---
arch/arm/kernel/module.c | 2 +-
arch/arm/kernel/smp.c | 4 ++--
arch/arm/mm/dma-mapping.c | 6 +++---
arch/arm/mm/mmu.c | 16 ++++++++--------
4 files changed, 14 insertions(+), 14 deletions(-)
diff --git a/arch/arm/kernel/module.c b/arch/arm/kernel/module.c
index d9bd786..6b30f01 100644
--- a/arch/arm/kernel/module.c
+++ b/arch/arm/kernel/module.c
@@ -32,7 +32,7 @@
* recompiling the whole kernel when CONFIG_XIP_KERNEL is turned on/off.
*/
#undef MODULES_VADDR
-#define MODULES_VADDR (((unsigned long)_etext + ~PGDIR_MASK) & PGDIR_MASK)
+#define MODULES_VADDR (((unsigned long)_etext + ~PMD_MASK) & PMD_MASK)
#endif
#ifdef CONFIG_MMU
diff --git a/arch/arm/kernel/smp.c b/arch/arm/kernel/smp.c
index 8c19595..40b386c 100644
--- a/arch/arm/kernel/smp.c
+++ b/arch/arm/kernel/smp.c
@@ -78,7 +78,7 @@ static inline void identity_mapping_add(pgd_t *pgd, unsigned long start,
if (cpu_architecture() <= CPU_ARCH_ARMv5TEJ && !cpu_is_xscale())
prot |= PMD_BIT4;
- for (addr = start & PGDIR_MASK; addr < end;) {
+ for (addr = start & PMD_MASK; addr < end;) {
pmd = pmd_offset(pgd + pgd_index(addr), addr);
pmd[0] = __pmd(addr | prot);
addr += SECTION_SIZE;
@@ -95,7 +95,7 @@ static inline void identity_mapping_del(pgd_t *pgd, unsigned long start,
unsigned long addr;
pmd_t *pmd;
- for (addr = start & PGDIR_MASK; addr < end; addr += PGDIR_SIZE) {
+ for (addr = start & PMD_MASK; addr < end; addr += PMD_SIZE) {
pmd = pmd_offset(pgd + pgd_index(addr), addr);
pmd[0] = __pmd(0);
pmd[1] = __pmd(0);
diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
index e4dd064..2aab1b4 100644
--- a/arch/arm/mm/dma-mapping.c
+++ b/arch/arm/mm/dma-mapping.c
@@ -120,8 +120,8 @@ static void ...
This lot really does need unifying - and in any case this last addition should be using SECTION_SIZE, not something related to PMD shifts. Strangely, it's something I've done over the weekend... --
On the classic page tables, the PMD_SHIFT is 21 while the SECTION_SHIFT is 20. They have slightly different meanings. But we currently have some hacks to cope with PMD_SHIFT being 21 by writing the pmd[0] and pmd[1] in the same call. The way I see to use SECTION_SHIFT is to drop the pmd[] array (but haven't looked closely
OK, the more clean-up the better. -- Catalin --
Correct, and if you look at the code again and analyze what it's doing, you'll see that it's using the wrong thing. The code pre-exists the SECTION_* macros, and was never fixed up when they were introduced. SECTION_SIZE is the right thing here - it's setting up sections. Just look at identity_mapping_add() to see how the code _should_ be. --
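The difference between stepping by PMD_SIZE (and writing a pair of entries) and stepping by SECTION_SIZE can be seen in a toy model. This is only a sketch under stated assumptions: the flat `table` array, the helper names and the protection value are hypothetical, and only the shift values (20 and 21) come from the discussion above.

```c
#include <assert.h>
#include <stdint.h>

#define SECTION_SHIFT 20u
#define SECTION_SIZE  ((uint64_t)1 << SECTION_SHIFT)
#define PMD_SHIFT     21u
#define PMD_SIZE      ((uint64_t)1 << PMD_SHIFT)
#define NR_SECTIONS   4096u /* 4GB / 1MB */

/* Toy first-level table: one word per 1MB section (illustrative only). */
static uint32_t table[NR_SECTIONS];

/* Map [start, end) one section at a time; returns the entries written. */
static unsigned map_by_section(uint64_t start, uint64_t end, uint32_t prot)
{
    unsigned n = 0;
    assert(start < end && end <= ((uint64_t)NR_SECTIONS << SECTION_SHIFT));
    for (uint64_t addr = start & ~(SECTION_SIZE - 1); addr < end;
         addr += SECTION_SIZE) {
        table[addr >> SECTION_SHIFT] = (uint32_t)addr | prot;
        n++;
    }
    return n;
}

/* Map in PMD_SIZE steps, writing both sections each step covers. */
static unsigned map_by_pmd_pair(uint64_t start, uint64_t end, uint32_t prot)
{
    unsigned n = 0;
    assert(start < end && end <= ((uint64_t)NR_SECTIONS << SECTION_SHIFT));
    for (uint64_t addr = start & ~(PMD_SIZE - 1); addr < end;
         addr += PMD_SIZE) {
        table[addr >> SECTION_SHIFT] = (uint32_t)addr | prot;
        table[(addr >> SECTION_SHIFT) + 1] =
            (uint32_t)(addr + SECTION_SIZE) | prot;
        n += 2;
    }
    return n;
}
```

For a request covering a single 1MB section that starts at an odd megabyte, the pair-based loop writes two entries where the section-based loop writes one, which is the kind of mismatch that using SECTION_SIZE avoids.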
This patch adds the ARM_LPAE and ARCH_PHYS_ADDR_T_64BIT Kconfig entries
allowing LPAE support to be compiled into the kernel.
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
---
arch/arm/Kconfig | 2 +-
arch/arm/mm/Kconfig | 13 +++++++++++++
2 files changed, 14 insertions(+), 1 deletions(-)
diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index f35fe82..e376b7b 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -1599,7 +1599,7 @@ config CMDLINE_FORCE
config XIP_KERNEL
bool "Kernel Execute-In-Place from ROM"
- depends on !ZBOOT_ROM
+ depends on !ZBOOT_ROM && !ARM_LPAE
help
Execute-In-Place allows the kernel to run from non-volatile storage
directly addressable by the CPU, such as NOR flash. This saves RAM
diff --git a/arch/arm/mm/Kconfig b/arch/arm/mm/Kconfig
index 8493ed0..3ca2d15 100644
--- a/arch/arm/mm/Kconfig
+++ b/arch/arm/mm/Kconfig
@@ -615,6 +615,19 @@ config IO_36
comment "Processor Features"
+config ARM_LPAE
+ bool "Support for the Large Physical Address Extension"
+ depends on MMU && CPU_V7
+ help
+ Say Y if you have an ARMv7 processor supporting the LPAE page table
+ format and you would like access memory beyond the 4GB limit.
+
+config ARCH_PHYS_ADDR_T_64BIT
+ def_bool ARM_LPAE
+
+config ARCH_DMA_ADDR_T_64BIT
+ def_bool ARM_LPAE
+
config ARM_THUMB
bool "Support Thumb user binaries"
depends on CPU_ARM720T || CPU_ARM740T || CPU_ARM920T || CPU_ARM922T || CPU_ARM925T || CPU_ARM926T || CPU_ARM940T || CPU_ARM946E || CPU_ARM1020 || CPU_ARM1020E || CPU_ARM1022 || CPU_ARM1026 || CPU_XSCALE || CPU_XSC3 || CPU_MOHAWK || CPU_V6 || CPU_V7 || CPU_FEROCEON
--
Hello.
Maybe "like to access"?
WBR, Sergei
--
These macros were only used in setup_mm_for_reboot and get_pgd_slow.
Both have been modified to no longer use these definitions. One of the
reasons is the different meaning that PGD has with the 2-level and
3-level page tables.
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
---
arch/arm/include/asm/pgtable-2level.h | 3 ---
arch/arm/include/asm/pgtable-3level.h | 3 ---
arch/arm/mm/pgd.c | 2 +-
3 files changed, 1 insertions(+), 7 deletions(-)
diff --git a/arch/arm/include/asm/pgtable-2level.h b/arch/arm/include/asm/pgtable-2level.h
index 4e21166..a0548b6 100644
--- a/arch/arm/include/asm/pgtable-2level.h
+++ b/arch/arm/include/asm/pgtable-2level.h
@@ -92,9 +92,6 @@
*/
#define FIRST_USER_ADDRESS PAGE_SIZE
-#define FIRST_USER_PGD_NR 1
-#define USER_PTRS_PER_PGD ((TASK_SIZE/PGDIR_SIZE) - FIRST_USER_PGD_NR)
-
/*
* section address mask and size definitions.
*/
diff --git a/arch/arm/include/asm/pgtable-3level.h b/arch/arm/include/asm/pgtable-3level.h
index 5b1482d..381b04b 100644
--- a/arch/arm/include/asm/pgtable-3level.h
+++ b/arch/arm/include/asm/pgtable-3level.h
@@ -58,9 +58,6 @@
*/
#define FIRST_USER_ADDRESS PAGE_SIZE
-#define FIRST_USER_PGD_NR 1
-#define USER_PTRS_PER_PGD ((TASK_SIZE/PGDIR_SIZE) - FIRST_USER_PGD_NR)
-
/*
* section address mask and size definitions.
*/
diff --git a/arch/arm/mm/pgd.c b/arch/arm/mm/pgd.c
index e7c149b..09238fa 100644
--- a/arch/arm/mm/pgd.c
+++ b/arch/arm/mm/pgd.c
@@ -25,7 +25,7 @@
#else
#define alloc_pgd() (pgd_t *)__get_free_pages(GFP_KERNEL, 2)
#define free_pgd(pgd) free_pages((unsigned long)pgd, 2)
-#define FIRST_KERNEL_PGD_NR (FIRST_USER_PGD_NR + USER_PTRS_PER_PGD)
+#define FIRST_KERNEL_PGD_NR (TASK_SIZE >> PGDIR_SHIFT)
#endif
/*
--
We don't actually need this macro anymore, it can be killed (and I've already done so.) --
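For reference, the replacement definition in the pgd.c hunk appears to compute the same value as the expression it removes. The sketch below checks that on a host, taking PGDIR_SHIFT as 21 (the classic 2-level value stated elsewhere in this series); the function names and the sample TASK_SIZE are assumptions made for illustration.

```c
#include <assert.h>

#define PGDIR_SHIFT       21u
#define PGDIR_SIZE        (1ul << PGDIR_SHIFT)
#define FIRST_USER_PGD_NR 1ul

/* Old definition: FIRST_USER_PGD_NR + USER_PTRS_PER_PGD. */
static unsigned long old_first_kernel_pgd_nr(unsigned long task_size)
{
    unsigned long user_ptrs_per_pgd;

    assert(task_size >= PGDIR_SIZE);
    user_ptrs_per_pgd = task_size / PGDIR_SIZE - FIRST_USER_PGD_NR;
    return FIRST_USER_PGD_NR + user_ptrs_per_pgd;
}

/* New definition from the patch: TASK_SIZE >> PGDIR_SHIFT. */
static unsigned long new_first_kernel_pgd_nr(unsigned long task_size)
{
    return task_size >> PGDIR_SHIFT;
}
```

The FIRST_USER_PGD_NR terms cancel, so the two forms agree for any TASK_SIZE of at least PGDIR_SIZE.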
The LPAE page table format needs to explicitly disable execution or
write permissions on a page by setting the corresponding bits (similar
to the classic page table format with Access Flag enabled). This patch
introduces null definitions for the 2-level format and the actual noexec
and nowrite bits for the LPAE format. It also changes several PTE
maintenance macros and masks.
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
---
arch/arm/include/asm/pgtable-2level.h | 2 +
arch/arm/include/asm/pgtable.h | 44 +++++++++++++++++++++------------
arch/arm/mm/mmu.c | 6 ++--
3 files changed, 33 insertions(+), 19 deletions(-)
diff --git a/arch/arm/include/asm/pgtable-2level.h b/arch/arm/include/asm/pgtable-2level.h
index 36bdef7..4e21166 100644
--- a/arch/arm/include/asm/pgtable-2level.h
+++ b/arch/arm/include/asm/pgtable-2level.h
@@ -128,6 +128,8 @@
#define L_PTE_USER (1 << 8)
#define L_PTE_EXEC (1 << 9)
#define L_PTE_SHARED (1 << 10) /* shared(v6), coherent(xsc3) */
+#define L_PTE_NOEXEC (0)
+#define L_PTE_NOWRITE (0)
/*
* These are the memory types, defined to be compatible with
diff --git a/arch/arm/include/asm/pgtable.h b/arch/arm/include/asm/pgtable.h
index ea08ab7..5bd0e64 100644
--- a/arch/arm/include/asm/pgtable.h
+++ b/arch/arm/include/asm/pgtable.h
@@ -66,23 +66,23 @@ extern pgprot_t pgprot_kernel;
#define _MOD_PROT(p, b) __pgprot(pgprot_val(p) | (b))
-#define PAGE_NONE pgprot_user
-#define PAGE_SHARED _MOD_PROT(pgprot_user, L_PTE_USER | L_PTE_WRITE)
+#define PAGE_NONE _MOD_PROT(pgprot_user, L_PTE_NOEXEC | L_PTE_NOWRITE)
+#define PAGE_SHARED _MOD_PROT(pgprot_user, L_PTE_USER | L_PTE_WRITE | L_PTE_NOEXEC)
#define PAGE_SHARED_EXEC _MOD_PROT(pgprot_user, L_PTE_USER | L_PTE_WRITE | L_PTE_EXEC)
-#define PAGE_COPY _MOD_PROT(pgprot_user, L_PTE_USER)
-#define PAGE_COPY_EXEC _MOD_PROT(pgprot_user, L_PTE_USER | L_PTE_EXEC)
-#define PAGE_READONLY _MOD_PROT(pgprot_user, L_PTE_USER)
-#define ...
Let's not make this more complicated than it has to be. If we need the inverse of WRITE and EXEC, then that's what we should change everyone to, not invent a new system to work along side the old system. We're already inverting the write bit for the vast majority of processors, and exec has always been inverted by the ARMv6 and v7 code. --
On 15 November 2010 18:30, Russell King - ARM Linux wrote:
Yes, that's fine. For PMD, we may still need a dummy PMD_SECT_AP_WRITE for the 3-level definitions unless we change the __pmd() accessor or __pmd_populate(). -- Catalin --
On 15 November 2010 18:30, Russell King - ARM Linux wrote:
This adds an additional instruction in set_pte_ext, unless you can write the bit checking in a better way:
	tst r1, #L_PTE_NOWRITE
	orrne r3, r3, #PTE_EXT_APX
	tsteq r1, #L_PTE_DIRTY
	orreq r3, r3, #PTE_EXT_APX
-- Catalin --
I think that would work with 3 instructions:
	eor r1, r1, #L_PTE_DIRTY
	tst r1, #L_PTE_NOWRITE | L_PTE_DIRTY
	orrne r3, r3, #PTE_EXT_APX
-- Catalin --
It actually results in the same number of instructions. From memory:
ARMv3-ARMv5:
- eor r3, r1, #L_PTE_PRESENT | L_PTE_YOUNG | L_PTE_WRITE | L_PTE_DIRTY
- tst r3, #L_PTE_WRITE | L_PTE_DIRTY @ write and dirty?
- orreq r2, r2, #PTE_SMALL_AP_UNO_SRW
+ eor r3, r1, #L_PTE_PRESENT | L_PTE_YOUNG | L_PTE_DIRTY
+ tst r3, #L_PTE_RDONLY | L_PTE_DIRTY @ write and dirty?
+ orreq r2, r2, #PTE_SMALL_AP_UNO_SRW
and for ARMv6+:
- tst r1, #L_PTE_WRITE
- tstne r1, #L_PTE_DIRTY
- orreq r3, r3, #PTE_EXT_APX
+ eor r1, r1, #L_PTE_DIRTY
+ tst r1, #L_PTE_RDONLY | L_PTE_DIRTY
+ orrne r3, r3, #PTE_EXT_APX
--
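The ARMv6+ sequences can be modelled in C to check that the inverted-bit version sets APX in exactly the same cases as the original. This is a sketch only: the bit positions below are placeholders chosen for the model, not the kernel's values, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Placeholder bit positions for this model, not the kernel's values. */
#define L_PTE_DIRTY  (1u << 0)
#define L_PTE_WRITE  (1u << 1) /* old scheme: set means writable */
#define L_PTE_RDONLY (1u << 1) /* new scheme: same bit, inverted meaning */

/* Old sequence: tst WRITE; tstne DIRTY; orreq APX. Returns true if APX is set. */
static bool apx_old(uint32_t pte)
{
    bool z = (pte & L_PTE_WRITE) == 0;      /* tst   r1, #L_PTE_WRITE */
    if (!z)
        z = (pte & L_PTE_DIRTY) == 0;       /* tstne r1, #L_PTE_DIRTY */
    return z;                               /* orreq r3, r3, #PTE_EXT_APX */
}

/* New sequence: eor DIRTY; tst RDONLY|DIRTY; orrne APX. */
static bool apx_new(uint32_t pte)
{
    bool z;

    pte ^= L_PTE_DIRTY;                                /* eor */
    z = (pte & (L_PTE_RDONLY | L_PTE_DIRTY)) == 0;     /* tst */
    return !z;                                         /* orrne */
}

/* Walk all four writable/dirty combinations and assert both forms agree. */
static void check_sequences_agree(void)
{
    for (int writable = 0; writable <= 1; writable++) {
        for (int dirty = 0; dirty <= 1; dirty++) {
            uint32_t d = dirty ? L_PTE_DIRTY : 0;
            uint32_t old_pte = (writable ? L_PTE_WRITE : 0) | d;
            uint32_t new_pte = (writable ? 0 : L_PTE_RDONLY) | d;

            assert(apx_old(old_pte) == apx_new(new_pte));
        }
    }
}
```

In both forms APX stays clear only for a page that is writable and dirty; every other combination sets it.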
On 15 November 2010 18:30, Russell King - ARM Linux wrote:
Question on the pgprot_noncached/writecombine/dmacoherent - in the current implementation we pass L_PTE_EXEC on the dmacoherent macro. Do we need to pass L_PTE_NOEXEC to the noncached/writecombine ones? I don't see a reason for any of these to be executable but maybe we can let the code calling them decide. Thanks. -- Catalin --
Erm. Please look at the code again. --
On 17 November 2010 17:16, Russell King - ARM Linux wrote:
Ah, good point, that was the mask. So for dmacoherent we make sure that L_PTE_EXEC is cleared. I suspect we should now make sure that L_PTE_NOEXEC is set. For the other two, just leave them as they are. -- Catalin --
Already done:
#define pgprot_dmacoherent(prot) \
- __pgprot_modify(prot, L_PTE_MT_MASK|L_PTE_EXEC, L_PTE_MT_BUFFERABLE)
+ __pgprot_modify(prot, L_PTE_MT_MASK, L_PTE_MT_BUFFERABLE|L_PTE_XN)
...
#define pgprot_dmacoherent(prot) \
- __pgprot_modify(prot, L_PTE_MT_MASK|L_PTE_EXEC, L_PTE_MT_UNCACHED)
+ __pgprot_modify(prot, L_PTE_MT_MASK, L_PTE_MT_UNCACHED|L_PTE_XN)
--
Are you already doing such changes? Just to avoid duplicating effort (and use common naming scheme). -- Catalin --
I did say that I had patches for all the issues I raised so far... They're just in the process of being posted (if lists.infradead.org this time can cope with one patch every 20 secs...) --
I wasn't sure which patches, so I did the XN/RDONLY as well (not big patch though). I'll rebase my LPAE stuff in the next days and repost. Thanks. -- Catalin --
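The pgprot_dmacoherent change quoted earlier in this exchange moves the no-execute handling from the mask argument to the set argument. A minimal model of that pattern follows, assuming __pgprot_modify clears the mask bits and then ORs in the new bits; the bit values and the helper names here are placeholders for illustration, not the kernel's definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Placeholder bit layout for this sketch. */
#define L_PTE_MT_BUFFERABLE (0x1u << 2)
#define L_PTE_MT_MASK       (0xfu << 2)
#define L_PTE_XN            (1u << 9)

/* Assumed behaviour of the modify helper: clear mask, then set bits. */
static uint32_t pgprot_modify(uint32_t prot, uint32_t mask, uint32_t bits)
{
    return (prot & ~mask) | bits;
}

/* New style: XN is set explicitly instead of EXEC being masked out. */
static uint32_t dmacoherent_bufferable(uint32_t prot)
{
    uint32_t out = pgprot_modify(prot, L_PTE_MT_MASK,
                                 L_PTE_MT_BUFFERABLE | L_PTE_XN);

    assert(out & L_PTE_XN); /* the result is never executable */
    return out;
}
```

With the old positive EXEC bit, "not executable" was expressed by listing the bit in the mask; with an inverted XN bit it has to appear in the set argument instead.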
From: Will Deacon <will.deacon@arm.com>
When using 2-level paging, pte_t and pmd_t are typedefs for
unsigned long but phys_addr_t is a typedef for u32.
This patch uses u32 for the page table entry types when
phys_addr_t is not 64-bit, allowing the same conversion
specifier to be used for physical addresses and page table
entries regardless of LPAE.
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
---
arch/arm/include/asm/page-nommu.h | 8 ++++----
arch/arm/include/asm/pgtable-2level-types.h | 18 +++++++++---------
2 files changed, 13 insertions(+), 13 deletions(-)
diff --git a/arch/arm/include/asm/page-nommu.h b/arch/arm/include/asm/page-nommu.h
index d1b162a..a20641a 100644
--- a/arch/arm/include/asm/page-nommu.h
+++ b/arch/arm/include/asm/page-nommu.h
@@ -29,10 +29,10 @@
/*
* These are used to make use of C type-checking..
*/
-typedef unsigned long pte_t;
-typedef unsigned long pmd_t;
-typedef unsigned long pgd_t[2];
-typedef unsigned long pgprot_t;
+typedef u32 pte_t;
+typedef u32 pmd_t;
+typedef u32 pgd_t[2];
+typedef u32 pgprot_t;
#define pte_val(x) (x)
#define pmd_val(x) (x)
diff --git a/arch/arm/include/asm/pgtable-2level-types.h b/arch/arm/include/asm/pgtable-2level-types.h
index 30f6741..adc4928 100644
--- a/arch/arm/include/asm/pgtable-2level-types.h
+++ b/arch/arm/include/asm/pgtable-2level-types.h
@@ -21,16 +21,16 @@
#undef STRICT_MM_TYPECHECKS
-typedef unsigned long pteval_t;
+typedef u32 pteval_t;
#ifdef STRICT_MM_TYPECHECKS
/*
* These are used to make use of C type-checking..
*/
-typedef struct { unsigned long pte; } pte_t;
-typedef struct { unsigned long pmd; } pmd_t;
-typedef struct { unsigned long pgd[2]; } pgd_t;
-typedef struct { unsigned long pgprot; } pgprot_t;
+typedef struct { u32 pte; } pte_t;
+typedef struct { u32 pmd; } pmd_t;
+typedef struct { u32 pgd[2]; } pgd_t;
+typedef struct { u32 pgprot; } pgprot_t;
...
However, code which prints the value of page table entries assumes that they are unsigned long, and places where we store the raw pte value also use 'unsigned long'. If we're going to make this change, we need to change more places than this patch covers. grep for pte_val to help find those places. --
On Sunday, November 14, 2010, Russell King - ARM Linux wrote:
Patch 19/20 introduces a common macro for formatting but we should probably order the patches a bit to avoid problems if anyone is bisecting in the middle of the series. -- Catalin --
Actually not a problem since LPAE is only enabled by the last patch. There may be some compiler warnings without 19/20, I need to check. -- Catalin --
There will be compiler warnings because u32 is unsigned int, and we print it as %08lx. Generic code casts pte values to (long long) and prints them using %08llx. We should do the same. In any case, this patch on its own introduces new compiler warnings. These need to be fixed in this patch, rather than relying on one later in the series. --
On 14 November 2010 15:14, Russell King - ARM Linux wrote:
We still need some kind of macro because with LPAE we need %016llx since the phys address can go to 40-bit and there are some additional bits in the top word. Unless you'd like to always print 16 characters even for 32-bit ptes (or if there is some other printk magic I'm not
Yes, we'll look into this. -- Catalin --
Why not just %010llx? That would just be two extra characters. Arnd --
We still have attributes (like XN, bit 54) stored in the top part of the pte. This may be of interest when debugging. -- Catalin --
They will be printed if they exist. The %010 in front of llx only means to have a minimum of 10 zero-padded digits if the value is smaller than that. However, not having aligned values will be confusing. A macro for the format might be the best compromise. Nicolas
It's what is done in the generic kernel code for page table entries.
printk(KERN_ALERT
"BUG: Bad page map in process %s pte:%08llx pmd:%08llx\n",
current->comm,
(long long)pte_val(pte), (long long)pmd_val(*pmd));
The places where this matters, there isn't any alignment between
lines to worry about:
printk(", *pmd=%08lx", pmd_val(*pmd));
printk(", *pte=%08lx", pte_val(*pte));
printk(", *ppte=%08lx", pte_val(pte[-PTRS_PER_PTE]));
in show_pte() are examples of what need changing.
--
We thought about using something like printk("%0*llx", (int)(sizeof(pteval_t) * 2), (long long)pte_val(pte)); but it complicates the code. Anyway, since these are printed mainly for debugging, we can probably cope with some lack of alignment (as Russell said, there may not be any where it matters). -- Catalin --
Not on non-LPAE build, please. Nicolas --
Eeh? %08llx prints 8 characters _minimum_. If it needs more to represent the number, it will use more characters. You surely don't think generic code is brain dead enough to cast something to a 64-bit long long and then only print 32 bits of it??? --
That's correct. I was just wondering whether the alignment would look weird with ptes being printed with different lengths. Anyway, here comes another set of patches with this update (%08llx in printk). -- Catalin --
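The point about %08llx being a minimum width can be checked in user space, since standard C printf-family functions follow the same rule for field widths. A small sketch; the helper name and the sample values (a 32-bit entry, a 40-bit address, and an entry with bit 54 set) are illustrative assumptions.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Format an entry with a minimum of 8 zero-padded hex digits.
 * Returns the number of characters produced. */
static int format_pte(char *buf, size_t len, unsigned long long pte)
{
    int n = snprintf(buf, len, "%08llx", pte);

    assert(n >= 8); /* the field width is a lower bound, not a limit */
    return n;
}
```

A value that fits in 32 bits is padded to 8 digits, while a wider value is printed in full, so the upper attribute bits of an LPAE entry are not lost. The output lengths differ, which is the alignment concern raised above.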
From: Will Deacon <will.deacon@arm.com>
Now that the kernel supports 2-level and 3-level page tables, physical
addresses (and also page table entries) may be 32 or 64 bits wide,
depending upon the configuration.
This patch adds a conversion specifier (PHYS_ADDR_FMT) which represents
a u32 or u64 depending on the width of a physical address.
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
---
arch/arm/include/asm/types.h | 6 ++++++
arch/arm/kernel/setup.c | 2 +-
arch/arm/mm/fault.c | 8 ++++----
arch/arm/mm/mmu.c | 18 +++++++++---------
4 files changed, 20 insertions(+), 14 deletions(-)
diff --git a/arch/arm/include/asm/types.h b/arch/arm/include/asm/types.h
index dc1bdbb..b740539 100644
--- a/arch/arm/include/asm/types.h
+++ b/arch/arm/include/asm/types.h
@@ -7,6 +7,12 @@
#define BITS_PER_LONG 32
+#ifdef CONFIG_PHYS_ADDR_T_64BIT
+#define PHYS_ADDR_FMT "%016llx"
+#else
+#define PHYS_ADDR_FMT "%08x"
+#endif
+
#endif /* __KERNEL__ */
#endif
diff --git a/arch/arm/kernel/setup.c b/arch/arm/kernel/setup.c
index 0128db2..d143241 100644
--- a/arch/arm/kernel/setup.c
+++ b/arch/arm/kernel/setup.c
@@ -448,7 +448,7 @@ static int __init arm_add_memory(phys_addr_t start, unsigned long size)
if (meminfo.nr_banks >= NR_BANKS) {
printk(KERN_CRIT "NR_BANKS too low, "
- "ignoring memory at %#lx\n", start);
+ "ignoring memory at " PHYS_ADDR_FMT "\n", start);
return -EINVAL;
}
diff --git a/arch/arm/mm/fault.c b/arch/arm/mm/fault.c
index 2dde9cd..8112f77 100644
--- a/arch/arm/mm/fault.c
+++ b/arch/arm/mm/fault.c
@@ -81,7 +81,7 @@ void show_pte(struct mm_struct *mm, unsigned long addr)
printk(KERN_ALERT "pgd = %p\n", mm->pgd);
pgd = pgd_offset(mm, addr);
- printk(KERN_ALERT "[%08lx] *pgd=%08lx", addr, pgd_val(*pgd));
+ printk(KERN_ALERT "[%08lx] *pgd=" PHYS_ADDR_FMT, addr, pgd_val(*pgd));
do {
pmd_t *pmd;
@@ -97,7 +97,7 @@ void ...
I hope this patch is gone in v3. --
Yes, everything is converted to %08llx now. Once you complete yo
