Re: [PATCH 1/6] x86: Change get_max_mapped() to inline

From: Yinghai Lu
Date: Friday, December 17, 2010 - 5:58 pm

Please check

These three patches make memblock allocation more top-down.

Thanks

Yinghai
--

From: Yinghai Lu
Date: Monday, December 27, 2010 - 5:48 pm

We pre-allocate this buffer from the top, so we should use it top-down; the
unused part that is returned will then be at the bottom side.
This leaves one less hole in unused RAM.
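
As a rough illustration of the top-down usage described above, the sketch below models the buffer with three page frame numbers, initialized the same way as the hunk in find_early_table_space(). The struct name, the helper names and the allocation routine are assumptions made for this example; the kernel's actual allocation path is not shown in this excerpt.

```c
#include <stdint.h>

#define PAGE_SHIFT 12

/* Hypothetical model of the early page table buffer; all fields are PFNs. */
struct pgt_buf_model {
	uint64_t start;   /* top of the buffer, allocation starts here */
	uint64_t end;     /* lowest PFN handed out so far */
	uint64_t bottom;  /* lowest PFN that belongs to the buffer */
};

static void pgt_buf_model_init(struct pgt_buf_model *b,
			       uint64_t base, uint64_t tables)
{
	b->start = (base + tables) >> PAGE_SHIFT;
	b->end = b->start;
	b->bottom = base >> PAGE_SHIFT;
}

/* Hand out one page from the top; returns 0 when the buffer is exhausted. */
static uint64_t pgt_buf_model_alloc(struct pgt_buf_model *b)
{
	if (b->end <= b->bottom)
		return 0;
	return --b->end;
}

/* Pages never used: [bottom, end), a single block at the low side. */
static uint64_t pgt_buf_model_unused(const struct pgt_buf_model *b)
{
	return b->end - b->bottom;
}
```

With base 0x7f74c000 and tables 0x4000 (the buffer location printed in the debug output elsewhere in this thread), two allocations return PFNs 0x7f74f and 0x7f74e, and the two pages [0x7f74c, 0x7f74e) remain as one block that can be given back.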

Signed-off-by: Yinghai Lu <yinghai@kernel.org>

---
 arch/x86/include/asm/init.h |    6 +++---
 arch/x86/mm/init.c          |   12 ++++++------
 arch/x86/mm/init_32.c       |    4 ++--
 arch/x86/mm/init_64.c       |    5 +++--
 4 files changed, 14 insertions(+), 13 deletions(-)

Index: linux-2.6/arch/x86/include/asm/init.h
===================================================================
--- linux-2.6.orig/arch/x86/include/asm/init.h
+++ linux-2.6/arch/x86/include/asm/init.h
@@ -11,8 +11,8 @@ kernel_physical_mapping_init(unsigned lo
 			     unsigned long page_size_mask);
 
 
-extern unsigned long __initdata e820_table_start;
-extern unsigned long __meminitdata e820_table_end;
-extern unsigned long __meminitdata e820_table_top;
+extern unsigned long __meminitdata e820_table_start;
+extern unsigned long __initdata e820_table_end;
+extern unsigned long __meminitdata e820_table_bottom;
 
 #endif /* _ASM_X86_INIT_32_H */
Index: linux-2.6/arch/x86/mm/init.c
===================================================================
--- linux-2.6.orig/arch/x86/mm/init.c
+++ linux-2.6/arch/x86/mm/init.c
@@ -18,9 +18,9 @@
 
 DEFINE_PER_CPU(struct mmu_gather, mmu_gathers);
 
-unsigned long __initdata e820_table_start;
-unsigned long __meminitdata e820_table_end;
-unsigned long __meminitdata e820_table_top;
+unsigned long __meminitdata e820_table_start;
+unsigned long __initdata e820_table_end;
+unsigned long __meminitdata e820_table_bottom;
 
 int after_bootmem;
 
@@ -73,12 +73,12 @@ static void __init find_early_table_spac
 	if (base == MEMBLOCK_ERROR)
 		panic("Cannot find space for the kernel page tables");
 
-	e820_table_start = base >> PAGE_SHIFT;
+	e820_table_start = (base + tables) >> PAGE_SHIFT;
 	e820_table_end = e820_table_start;
-	e820_table_top = e820_table_start + (tables >> ...
--

From: Yinghai Lu
Date: Monday, December 27, 2010 - 5:48 pm

We need to access it right away, so make sure that it is already mapped.

This prepares for putting the page table on the local node; the nodemap is used before that.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>

---
 arch/x86/mm/numa_64.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Index: linux-2.6/arch/x86/mm/numa_64.c
===================================================================
--- linux-2.6.orig/arch/x86/mm/numa_64.c
+++ linux-2.6/arch/x86/mm/numa_64.c
@@ -87,7 +87,7 @@ static int __init allocate_cachealigned_
 
 	addr = 0x8000;
 	nodemap_size = roundup(sizeof(s16) * memnodemapsize, L1_CACHE_BYTES);
-	nodemap_addr = memblock_find_in_range(addr, max_pfn<<PAGE_SHIFT,
+	nodemap_addr = memblock_find_in_range(addr, get_max_mapped(),
 				      nodemap_size, L1_CACHE_BYTES);
 	if (nodemap_addr == MEMBLOCK_ERROR) {
 		printk(KERN_ERR
--

From: Yinghai Lu
Date: Monday, December 27, 2010 - 5:48 pm

Introduce init_memory_mapping_high(), and use it on 64-bit.

It walks every memory segment above 4G and creates the page table for each
segment inside that memory range itself.

Before this patch, all page tables were on one node.

With this patch, one RED-PEN is killed.
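
To give a feel for how much page table memory each segment needs, here is a small sketch that counts the PMD pages required to map a physical range with 2M pages. It assumes that only PMD pages have to be allocated (the upper levels already exist); the function name is hypothetical and this is not the kernel's own sizing code.

```c
#include <stdint.h>

#define PUD_SHIFT 30   /* one PMD page holds 512 entries of 2M each, i.e. 1G */

/* Number of PMD pages needed to map [start, end) with 2M pages. */
static uint64_t pmd_pages_2m(uint64_t start, uint64_t end)
{
	if (end <= start)
		return 0;
	/* Count every 1G-aligned chunk that the range touches. */
	return ((end - 1) >> PUD_SHIFT) - (start >> PUD_SHIFT) + 1;
}
```

For the range [0x100000000, 0x1080000000) this gives 62 pages, or 0x3e000 bytes, which is the same size as the PGTABLE range [0x107ffc2000-0x107fffffff] in the debug output of this message. Whether the kernel's count breaks down the same way is not something the log alone establishes.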

Debug output for an 8-socket system after the patch:
[    0.000000] initial memory mapped : 0 - 20000000
[    0.000000] init_memory_mapping: [0x00000000000000-0x0000007f74ffff]
[    0.000000]  0000000000 - 007f600000 page 2M
[    0.000000]  007f600000 - 007f750000 page 4k
[    0.000000] kernel direct mapping tables up to 7f750000 @ [0x7f74c000-0x7f74ffff]
[    0.000000] RAMDISK: 7bc84000 - 7f745000
....
[    0.000000] Adding active range (0, 0x10, 0x95) 0 entries of 3200 used
[    0.000000] Adding active range (0, 0x100, 0x7f750) 1 entries of 3200 used
[    0.000000] Adding active range (0, 0x100000, 0x1080000) 2 entries of 3200 used
[    0.000000] Adding active range (1, 0x1080000, 0x2080000) 3 entries of 3200 used
[    0.000000] Adding active range (2, 0x2080000, 0x3080000) 4 entries of 3200 used
[    0.000000] Adding active range (3, 0x3080000, 0x4080000) 5 entries of 3200 used
[    0.000000] Adding active range (4, 0x4080000, 0x5080000) 6 entries of 3200 used
[    0.000000] Adding active range (5, 0x5080000, 0x6080000) 7 entries of 3200 used
[    0.000000] Adding active range (6, 0x6080000, 0x7080000) 8 entries of 3200 used
[    0.000000] Adding active range (7, 0x7080000, 0x8080000) 9 entries of 3200 used
[    0.000000] init_memory_mapping: [0x00000100000000-0x0000107fffffff]
[    0.000000]  0100000000 - 1080000000 page 2M
[    0.000000] kernel direct mapping tables up to 1080000000 @ [0x107ffbd000-0x107fffffff]
[    0.000000]     memblock_x86_reserve_range: [0x107ffc2000-0x107fffffff]          PGTABLE
[    0.000000] init_memory_mapping: [0x00001080000000-0x0000207fffffff]
[    0.000000]  1080000000 - 2080000000 page 2M
[    0.000000] kernel direct mapping tables up to 2080000000 @ [0x207ff7d000-0x207fffffff]
[    0.000000]     ...
--

From: Yinghai Lu
Date: Wednesday, December 29, 2010 - 4:46 pm

Introduce init_memory_mapping_high(), and use it on 64-bit.

It walks every memory segment above 4G and creates the page table for each
segment inside that memory range itself.

Before this patch, all page tables were on one node.

With this patch, one RED-PEN is killed.

Debug output for an 8-socket system after the patch:
[    0.000000] initial memory mapped : 0 - 20000000
[    0.000000] init_memory_mapping: [0x00000000000000-0x0000007f74ffff]
[    0.000000]  0000000000 - 007f600000 page 2M
[    0.000000]  007f600000 - 007f750000 page 4k
[    0.000000] kernel direct mapping tables up to 7f750000 @ [0x7f74c000-0x7f74ffff]
[    0.000000] RAMDISK: 7bc84000 - 7f745000
....
[    0.000000] Adding active range (0, 0x10, 0x95) 0 entries of 3200 used
[    0.000000] Adding active range (0, 0x100, 0x7f750) 1 entries of 3200 used
[    0.000000] Adding active range (0, 0x100000, 0x1080000) 2 entries of 3200 used
[    0.000000] Adding active range (1, 0x1080000, 0x2080000) 3 entries of 3200 used
[    0.000000] Adding active range (2, 0x2080000, 0x3080000) 4 entries of 3200 used
[    0.000000] Adding active range (3, 0x3080000, 0x4080000) 5 entries of 3200 used
[    0.000000] Adding active range (4, 0x4080000, 0x5080000) 6 entries of 3200 used
[    0.000000] Adding active range (5, 0x5080000, 0x6080000) 7 entries of 3200 used
[    0.000000] Adding active range (6, 0x6080000, 0x7080000) 8 entries of 3200 used
[    0.000000] Adding active range (7, 0x7080000, 0x8080000) 9 entries of 3200 used
[    0.000000] init_memory_mapping: [0x00000100000000-0x0000107fffffff]
[    0.000000]  0100000000 - 1080000000 page 2M
[    0.000000] kernel direct mapping tables up to 1080000000 @ [0x107ffbd000-0x107fffffff]
[    0.000000]     memblock_x86_reserve_range: [0x107ffc2000-0x107fffffff]          PGTABLE
[    0.000000] init_memory_mapping: [0x00001080000000-0x0000207fffffff]
[    0.000000]  1080000000 - 2080000000 page 2M
[    0.000000] kernel direct mapping tables up to 2080000000 @ [0x207ff7d000-0x207fffffff]
[    0.000000]     ...
--

From: H. Peter Anvin
Date: Wednesday, December 29, 2010 - 4:50 pm

Lovely, yet another interbranch conflict.  This makes me very concerned.

What is the delta between these?

	-hpa

-- 
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.
--

From: Yinghai Lu
Date: Wednesday, December 29, 2010 - 5:11 pm

Your new x86/numa has

       setup_physnodes(addr, max_addr, acpi, amd);
       fake_physnodes(acpi, amd, num_nodes);

instead of

        acpi_fake_nodes(nodes, num_nodes);

in  numa_emulation()

Thanks

Yinghai
--

From: David Rientjes
Date: Wednesday, December 29, 2010 - 5:39 pm

That's from f51bf3073a1 (x86, numa: Fake apicid and pxm mappings for NUMA 
emulation) and c1c3443c9c (x86, numa: Fake node-to-cpumask for NUMA 
emulation) in x86/numa.  Given the subject line, I think your patchset is 
targeted to the same branch so I'm not sure what's concerning?
--

From: H. Peter Anvin
Date: Wednesday, December 29, 2010 - 5:58 pm

No, it's part of a much bigger patchset which doesn't have anything to
do with NUMA.  That's the problem.

In other words, I need a sane way to merge them and resolve the conflict.

	-hpa

--

From: David Rientjes
Date: Wednesday, December 29, 2010 - 6:07 pm

The two patches above from x86/numa that create the conflict should be 
dependent only on 4e76f4e67a (x86, numa: Avoid compiling NUMA emulation 
functions without CONFIG_NUMA_EMU), so cherry-pick them into x86/bootmem?
--

From: H. Peter Anvin
Date: Wednesday, December 29, 2010 - 6:53 pm

That would hurt more, I think.

	-hpa

-- 
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.

--

From: Ingo Molnar
Date: Thursday, December 30, 2010 - 2:06 am

x86/bootmem could be based on x86/numa - the latter is stable so it's not like we'll 
have to undo it from under x86/bootmem. We can then send it to Linus once x86/numa 
is upstream.

Btw., i suspect we want to use x86/memblock instead of x86/bootmem?

Thanks,

	Ingo
--

From: Ingo Molnar
Date: Thursday, December 30, 2010 - 3:28 am

FYI, either the x86/numa or the x86/bootmem changes cause the early boot crash 
below. Config attached.

Thanks,

	Ingo

---------------->
Linux version 2.6.37-rc8-tip-01830-g7937b8c-dirty (mingo@sirius) (gcc version 4.5.1 20100924 (Red Hat 4.5.1-4) (GCC) ) #76886 SMP Thu Dec 30 12:12:49 CET 2010
BIOS-provided physical RAM map:
 BIOS-e820: 0000000000000000 - 000000000009f800 (usable)
 BIOS-e820: 000000000009f800 - 00000000000a0000 (reserved)
 BIOS-e820: 00000000000f0000 - 0000000000100000 (reserved)
 BIOS-e820: 0000000000100000 - 000000003fff0000 (usable)
 BIOS-e820: 000000003fff0000 - 000000003fff3000 (ACPI NVS)
 BIOS-e820: 000000003fff3000 - 0000000040000000 (ACPI data)
 BIOS-e820: 00000000e0000000 - 00000000f0000000 (reserved)
 BIOS-e820: 00000000fec00000 - 0000000100000000 (reserved)
bootconsole [earlyser0] enabled
debug: ignoring loglevel setting.
Notice: NX (Execute Disable) protection cannot be enabled: non-PAE kernel!
DMI 2.3 present.
DMI: A8N-E/System Product Name, BIOS ASUS A8N-E ACPI BIOS Revision 1008 08/22/2005
e820 update range: 0000000000000000 - 0000000000010000 (usable) ==> (reserved)
e820 remove range: 00000000000a0000 - 0000000000100000 (usable)
last_pfn = 0x3fff0 max_arch_pfn = 0x100000
MTRR default type: uncachable
MTRR fixed ranges enabled:
  00000-9FFFF write-back
  A0000-BFFFF uncachable
  C0000-C7FFF write-protect
  C8000-FFFFF uncachable
MTRR variable ranges enabled:
  0 base 0000000000 mask FFC0000000 write-back
  1 disabled
  2 disabled
  3 disabled
  4 disabled
  5 disabled
  6 disabled
  7 disabled
x86 PAT enabled: cpu 0, old 0x7040600070406, new 0x7010600070106
Scan SMP from c0000000 for 1024 bytes.
Scan SMP from c009fc00 for 1024 bytes.
Scan SMP from c00f0000 for 65536 bytes.
found SMP MP-table at [c00f5680] f5680
  mpc: f1400-f152c
Scanning 0 areas for low memory corruption
initial memory mapped : 0 - 02800000
init_memory_mapping: 0000000000000000-00000000373fe000
 0000000000 - 0000400000 page 4k
 0000400000 - 0037000000 ...
--

From: Ingo Molnar
Date: Thursday, December 30, 2010 - 3:30 am

It's x86/bootmem, one of these commits:

3c417751e4f0: x86: Rename e820_table_* to pgt_buf_*
d7992231c148: x86-64: Move out cleanup higmap [_brk_end, _end) out of init_memory_mapping()
4645b6af9427: x86: Use early pre-allocated page table buffer top-down
1411e0ec3123: x86-64, numa: Put pgtable to local node memory
dbef7b56d2fc: x86-64, numa: Allocate memnodemap under max_pfn_mapped
45635ab5e41b: x86: Change get_max_mapped() to inline
1a4a678b12c8: memblock: Make find_memory_core_early() find from top-down
32e3f2b00c52: x86-64, gart: Fix allocation with memblock
4b239f458c22: x86-64, mm: Put early page table high

i'm excluding them from tip:master for now.

Thanks,

	Ingo
--

From: Ingo Molnar
Date: Thursday, December 30, 2010 - 5:01 am

and x86/numa has this build failure:

arch/x86/mm/numa_64.c: In function ‘numa_set_cpumask’:
arch/x86/mm/numa_64.c:851:14: error: ‘physnodes’ undeclared (first use in this function)
arch/x86/mm/numa_64.c:851:14: note: each undeclared identifier is reported only once for each function it appears in

config attached.

Thanks,

	Ingo
--

From: David Rientjes
Date: Thursday, December 30, 2010 - 11:53 am

Yeah, you reported this one earlier and I sent a patch four days ago to 
fix it (http://marc.info/?l=linux-kernel&m=129340072128297).  I'll reply 
to this email with it again.

Thanks!
--

From: David Rientjes
Date: Thursday, December 30, 2010 - 11:54 am

"x86, numa: Fake node-to-cpumask for NUMA emulation" broke the build when
CONFIG_DEBUG_PER_CPU_MAPS is set and CONFIG_NUMA_EMU is not.  This is
because it is possible to map a cpu to multiple nodes when NUMA emulation
is used; the patch required a physical node address table to find those
nodes that was only available when CONFIG_NUMA_EMU was enabled.

This extracts the common debug functionality to its own function for
CONFIG_DEBUG_PER_CPU_MAPS and uses it regardless of whether
CONFIG_NUMA_EMU is set or not.

NUMA emulation will now iterate over the set of possible nodes for each
cpu and call the new debug function whereas only the cpu's node will be
used without NUMA emulation enabled.

Reported-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: David Rientjes <rientjes@google.com>
---
 arch/x86/mm/numa_64.c |   48 +++++++++++++++++++++++++++++++++++++-----------
 1 files changed, 37 insertions(+), 11 deletions(-)

diff --git a/arch/x86/mm/numa_64.c b/arch/x86/mm/numa_64.c
--- a/arch/x86/mm/numa_64.c
+++ b/arch/x86/mm/numa_64.c
@@ -833,15 +833,48 @@ void __cpuinit numa_remove_cpu(int cpu)
 #endif /* !CONFIG_NUMA_EMU */
 
 #else /* CONFIG_DEBUG_PER_CPU_MAPS */
+static struct cpumask __cpuinit *debug_cpumask_set_cpu(int cpu, int enable)
+{
+	int node = early_cpu_to_node(cpu);
+	struct cpumask *mask;
+	char buf[64];
+
+	mask = node_to_cpumask_map[node];
+	if (!mask) {
+		pr_err("node_to_cpumask_map[%i] NULL\n", node);
+		dump_stack();
+		return NULL;
+	}
+
+	cpulist_scnprintf(buf, sizeof(buf), mask);
+	printk(KERN_DEBUG "%s cpu %d node %d: mask now %s\n",
+		enable ? "numa_add_cpu" : "numa_remove_cpu",
+		cpu, node, buf);
+	return mask;
+}
 
 /*
  * --------- debug versions of the numa functions ---------
  */
+#ifndef CONFIG_NUMA_EMU
+static void __cpuinit numa_set_cpumask(int cpu, int enable)
+{
+	struct cpumask *mask;
+
+	mask = debug_cpumask_set_cpu(cpu, enable);
+	if (!mask)
+		return;
+
+	if (enable)
+		cpumask_set_cpu(cpu, ...
--

From: Yinghai Lu
Date: Thursday, December 30, 2010 - 2:18 pm

caused by
4645b6af9427: x86: Use early pre-allocated page table buffer top-down

The 32-bit fixmap uses the pre-allocated range too, and it needs the range to
be contiguous...

please drop
4645b6af9427: x86: Use early pre-allocated page table buffer top-down
3c417751e4f0: x86: Rename e820_table_* to pgt_buf_*

I will send out a new version of

 x86: Rename e820_table_* to pgt_buf_*

Thanks

Yinghai
--

From: Yinghai Lu
Date: Monday, December 27, 2010 - 5:47 pm

Move it into a header file, to prepare for using it in other files.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>

---
 arch/x86/include/asm/page_types.h |    5 +++++
 arch/x86/kernel/setup.c           |    9 ---------
 2 files changed, 5 insertions(+), 9 deletions(-)

Index: linux-2.6/arch/x86/include/asm/page_types.h
===================================================================
--- linux-2.6.orig/arch/x86/include/asm/page_types.h
+++ linux-2.6/arch/x86/include/asm/page_types.h
@@ -45,6 +45,11 @@ extern int devmem_is_allowed(unsigned lo
 extern unsigned long max_low_pfn_mapped;
 extern unsigned long max_pfn_mapped;
 
+static inline u64 get_max_mapped(void)
+{
+	return (u64)max_pfn_mapped<<PAGE_SHIFT;
+}
+
 extern unsigned long init_memory_mapping(unsigned long start,
 					 unsigned long end);
 
Index: linux-2.6/arch/x86/kernel/setup.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/setup.c
+++ linux-2.6/arch/x86/kernel/setup.c
@@ -680,15 +680,6 @@ static int __init parse_reservelow(char
 
 early_param("reservelow", parse_reservelow);
 
-static u64 __init get_max_mapped(void)
-{
-	u64 end = max_pfn_mapped;
-
-	end <<= PAGE_SHIFT;
-
-	return end;
-}
-
 /*
  * Determine if we were loaded by an EFI loader.  If so, then we have also been
  * passed the efi memmap, systab, etc., so we should use these data structures
--

From: H. Peter Anvin
Date: Wednesday, December 29, 2010 - 4:05 pm

This is broken.  <asm/page_types.h> doesn't include <linux/types.h>
which is required for the u64 type -- a simple compile test would have
told you this.  Furthermore, it seems to me that it would make more
sense for this to be phys_addr_t instead of u64; would you agree?

	-hpa
--

From: Yinghai Lu
Date: Wednesday, December 29, 2010 - 4:30 pm

yes.
--

From: Yinghai Lu
Date: Wednesday, December 29, 2010 - 4:37 pm

Or we could just use unsigned long instead.

On 32-bit it will be under 4G;
on 64-bit, unsigned long is already 64 bits.

Thanks

Yinghai
--
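
The trade-off discussed here can be shown with a small sketch. It uses uint32_t as a stand-in for unsigned long on a 32-bit kernel, so that it behaves the same on any host; that substitution and both function names are assumptions made for illustration only.

```c
#include <stdint.h>

#define PAGE_SHIFT 12

/* Shift done in 32 bits: the result wraps modulo 2^32. */
static uint32_t max_mapped_ulong32(uint32_t max_pfn_mapped)
{
	return (uint32_t)(max_pfn_mapped << PAGE_SHIFT);
}

/* Widen first, then shift: no bits are lost. */
static uint64_t max_mapped_u64(uint32_t max_pfn_mapped)
{
	return (uint64_t)max_pfn_mapped << PAGE_SHIFT;
}
```

For a PFN such as 0x3fff0 (the last_pfn in the 32-bit boot log in this thread) both variants return 0x3fff0000. For a hypothetical PFN of 0x100000, which is exactly 4G, the 32-bit variant wraps to 0 while the widened one returns 0x100000000. The narrower type is therefore only safe as long as the mapped limit really stays under 4G on 32-bit.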

From: H. Peter Anvin
Date: Wednesday, December 29, 2010 - 4:42 pm

This is true, although it seems fragile -- the whole terminology and the
differences between 32 and 64 bits are just a huge headache.

	-hpa

-- 
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.

--

From: Yinghai Lu
Date: Wednesday, December 29, 2010 - 4:45 pm

Move it into a header file, to prepare for using it in other files.

-v2: hpa pointed out that u64 should not be used here.
     Actually, we could use unsigned long here, because on 32-bit it will be under 4G.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>

---
 arch/x86/include/asm/page_types.h |    5 +++++
 arch/x86/kernel/setup.c           |    9 ---------
 2 files changed, 5 insertions(+), 9 deletions(-)

Index: linux-2.6/arch/x86/include/asm/page_types.h
===================================================================
--- linux-2.6.orig/arch/x86/include/asm/page_types.h
+++ linux-2.6/arch/x86/include/asm/page_types.h
@@ -45,6 +45,11 @@ extern int devmem_is_allowed(unsigned lo
 extern unsigned long max_low_pfn_mapped;
 extern unsigned long max_pfn_mapped;
 
+static inline unsigned long get_max_mapped(void)
+{
+	return max_pfn_mapped<<PAGE_SHIFT;
+}
+
 extern unsigned long init_memory_mapping(unsigned long start,
 					 unsigned long end);
 
Index: linux-2.6/arch/x86/kernel/setup.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/setup.c
+++ linux-2.6/arch/x86/kernel/setup.c
@@ -680,15 +680,6 @@ static int __init parse_reservelow(char
 
 early_param("reservelow", parse_reservelow);
 
-static u64 __init get_max_mapped(void)
-{
-	u64 end = max_pfn_mapped;
-
-	end <<= PAGE_SHIFT;
-
-	return end;
-}
-
 /*
  * Determine if we were loaded by an EFI loader.  If so, then we have also been
  * passed the efi memmap, systab, etc., so we should use these data structures
--

From: Yinghai Lu
Date: Monday, December 27, 2010 - 5:48 pm

It is now found through memblock.

Change the name to reflect its purpose.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>

---
 arch/x86/include/asm/init.h |    6 +++---
 arch/x86/mm/init.c          |   20 ++++++++++----------
 arch/x86/mm/init_32.c       |    8 ++++----
 arch/x86/mm/init_64.c       |    4 ++--
 arch/x86/xen/mmu.c          |    2 +-
 5 files changed, 20 insertions(+), 20 deletions(-)

Index: linux-2.6/arch/x86/include/asm/init.h
===================================================================
--- linux-2.6.orig/arch/x86/include/asm/init.h
+++ linux-2.6/arch/x86/include/asm/init.h
@@ -11,8 +11,8 @@ kernel_physical_mapping_init(unsigned lo
 			     unsigned long page_size_mask);
 
 
-extern unsigned long __meminitdata e820_table_start;
-extern unsigned long __initdata e820_table_end;
-extern unsigned long __meminitdata e820_table_bottom;
+extern unsigned long __meminitdata pgt_buf_start;
+extern unsigned long __initdata pgt_buf_end;
+extern unsigned long __meminitdata pgt_buf_bottom;
 
 #endif /* _ASM_X86_INIT_32_H */
Index: linux-2.6/arch/x86/mm/init.c
===================================================================
--- linux-2.6.orig/arch/x86/mm/init.c
+++ linux-2.6/arch/x86/mm/init.c
@@ -18,9 +18,9 @@
 
 DEFINE_PER_CPU(struct mmu_gather, mmu_gathers);
 
-unsigned long __meminitdata e820_table_start;
-unsigned long __initdata e820_table_end;
-unsigned long __meminitdata e820_table_bottom;
+unsigned long __meminitdata pgt_buf_start;
+unsigned long __initdata pgt_buf_end;
+unsigned long __meminitdata pgt_buf_bottom;
 
 int after_bootmem;
 
@@ -73,12 +73,12 @@ static void __init find_early_table_spac
 	if (base == MEMBLOCK_ERROR)
 		panic("Cannot find space for the kernel page tables");
 
-	e820_table_start = (base + tables) >> PAGE_SHIFT;
-	e820_table_end = e820_table_start;
-	e820_table_bottom = base >> PAGE_SHIFT;
+	pgt_buf_start = (base + tables) >> PAGE_SHIFT;
+	pgt_buf_end = pgt_buf_start;
+	pgt_buf_bottom = base >> ...
--

From: Yinghai Lu
Date: Thursday, December 30, 2010 - 2:54 pm

It is now found through memblock.

Change the name to reflect its purpose.

-v2: Ingo found that "4/6 x86: Use early pre-allocated page table buffer top-down"
     causes a 32-bit crash and needs to be dropped, so update this one accordingly.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>

---
 arch/x86/include/asm/init.h |    6 +++---
 arch/x86/mm/init.c          |   20 ++++++++++----------
 arch/x86/mm/init_32.c       |    8 ++++----
 arch/x86/mm/init_64.c       |    4 ++--
 arch/x86/xen/mmu.c          |    2 +-
 5 files changed, 20 insertions(+), 20 deletions(-)

Index: linux-2.6/arch/x86/include/asm/init.h
===================================================================
--- linux-2.6.orig/arch/x86/include/asm/init.h
+++ linux-2.6/arch/x86/include/asm/init.h
@@ -11,8 +11,8 @@ kernel_physical_mapping_init(unsigned lo
 			     unsigned long page_size_mask);
 
 
-extern unsigned long __initdata e820_table_start;
-extern unsigned long __meminitdata e820_table_end;
-extern unsigned long __meminitdata e820_table_top;
+extern unsigned long __initdata pgt_buf_start;
+extern unsigned long __meminitdata pgt_buf_end;
+extern unsigned long __meminitdata pgt_buf_top;
 
 #endif /* _ASM_X86_INIT_32_H */
Index: linux-2.6/arch/x86/mm/init.c
===================================================================
--- linux-2.6.orig/arch/x86/mm/init.c
+++ linux-2.6/arch/x86/mm/init.c
@@ -18,9 +18,9 @@
 
 DEFINE_PER_CPU(struct mmu_gather, mmu_gathers);
 
-unsigned long __initdata e820_table_start;
-unsigned long __meminitdata e820_table_end;
-unsigned long __meminitdata e820_table_top;
+unsigned long __initdata pgt_buf_start;
+unsigned long __meminitdata pgt_buf_end;
+unsigned long __meminitdata pgt_buf_top;
 
 int after_bootmem;
 
@@ -73,12 +73,12 @@ static void __init find_early_table_spac
 	if (base == MEMBLOCK_ERROR)
 		panic("Cannot find space for the kernel page tables");
 
-	e820_table_start = base >> PAGE_SHIFT;
-	e820_table_end = e820_table_start;
-	e820_table_top = ...
--

From: Yinghai Lu
Date: Monday, December 27, 2010 - 5:48 pm

It is not related to init_memory_mapping(), and init_memory_mapping() keeps
getting bigger.

So make it a separate function and call it from reserve_brk(), which is the
point at which _brk_end is final.
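
The range of entries that the moved loop clears can be worked out in user space. The sketch below mirrors the loop bounds from the patch (start one entry past the one that holds _brk_end - 1, stop at the one that holds _end - 1) and assumes both addresses fall under the same PUD entry. The helper names and the addresses in the note that follows are hypothetical.

```c
#include <stdint.h>

#define PMD_SHIFT    21    /* each PMD entry maps 2M */
#define PTRS_PER_PMD 512

static unsigned int pmd_index_of(uint64_t addr)
{
	return (unsigned int)((addr >> PMD_SHIFT) & (PTRS_PER_PMD - 1));
}

/*
 * Returns how many PMD entries lie beyond brk_end but still cover part
 * of [brk_end, end); *first receives the index of the first such entry.
 */
static unsigned int pmds_to_clear(uint64_t brk_end, uint64_t end,
				  unsigned int *first)
{
	unsigned int last_used = pmd_index_of(brk_end - 1);
	unsigned int last_mapped = pmd_index_of(end - 1);

	*first = last_used + 1;
	return last_mapped > last_used ? last_mapped - last_used : 0;
}
```

With a made-up _brk_end of 0xffffffff81a01000 and _end of 0xffffffff82000000, the last entry in use is index 13, so entries 14 and 15 would be cleared. If both addresses share one 2M page, nothing is cleared.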

Signed-off-by: Yinghai Lu <yinghai@kernel.org>

---
 arch/x86/kernel/setup.c |   24 ++++++++++++++++++++++++
 arch/x86/mm/init.c      |   19 -------------------
 2 files changed, 24 insertions(+), 19 deletions(-)

Index: linux-2.6/arch/x86/kernel/setup.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/setup.c
+++ linux-2.6/arch/x86/kernel/setup.c
@@ -293,10 +293,32 @@ static void __init init_gbpages(void)
 	else
 		direct_gbpages = 0;
 }
+
+static void __init cleanup_highmap_brk_end(void)
+{
+	pud_t *pud;
+	pmd_t *pmd;
+
+	mmu_cr4_features = read_cr4();
+
+	/*
+	 * _brk_end cannot change anymore, but it and _end may be
+	 * located on different 2M pages. cleanup_highmap(), however,
+	 * can only consider _end when it runs, so destroy any
+	 * mappings beyond _brk_end here.
+	 */
+	pud = pud_offset(pgd_offset_k(_brk_end), _brk_end);
+	pmd = pmd_offset(pud, _brk_end - 1);
+	while (++pmd <= pmd_offset(pud, (unsigned long)_end - 1))
+		pmd_clear(pmd);
+}
 #else
 static inline void init_gbpages(void)
 {
 }
+static inline void cleanup_highmap_brk_end(void)
+{
+}
 #endif
 
 static void __init reserve_brk(void)
@@ -307,6 +329,8 @@ static void __init reserve_brk(void)
 	/* Mark brk area as locked down and no longer taking any
 	   new allocations */
 	_brk_start = 0;
+
+	cleanup_highmap_brk_end();
 }
 
 #ifdef CONFIG_BLK_DEV_INITRD
Index: linux-2.6/arch/x86/mm/init.c
===================================================================
--- linux-2.6.orig/arch/x86/mm/init.c
+++ linux-2.6/arch/x86/mm/init.c
@@ -270,25 +270,6 @@ unsigned long __init_refok init_memory_m
 	load_cr3(swapper_pg_dir);
 #endif
 
-#ifdef CONFIG_X86_64
-	if (!after_bootmem && !start) {
-		pud_t *pud;
-		pmd_t ...
--

From: Yinghai Lu
Date: Monday, December 27, 2010 - 5:47 pm

Please check

These 6 patches need to be applied after the three patches that I sent out on 12/17/2010.

They put the page table in local node memory for 64-bit NUMA.

Thanks

Yinghai

--

From: H. Peter Anvin
Date: Tuesday, December 28, 2010 - 1:21 pm

Please explain what you mean with "more top to down".  Not what the code
does, but what is the goal of the patchset.

	-hpa
--

From: Yinghai Lu
Date: Tuesday, December 28, 2010 - 2:36 pm

For example, take a first node with 16g of RAM that is split into two parts: [0, 2g)
and [4g, 18g).

alloc_bootmem will always allocate from [0, 2g) until no more
can be found there.

With the third patch, it will try to allocate from [4g, 18g) first.

The second patch needs to be applied before the third patch, because the old way
only happened to get memory under 4g from the generic bootmem allocator.

The first one tries not to put the page table for [0, 4g) under 512M.

Thanks

Yinghai
--
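
The difference between the two allocation orders can be sketched with a toy search over the two ranges from the example above. This is not the memblock implementation; the struct, the function name and the NOT_FOUND value are assumptions chosen for the illustration.

```c
#include <stdint.h>

#define NOT_FOUND UINT64_MAX

struct mem_range {
	uint64_t start, end;   /* [start, end) */
};

/*
 * Walk free ranges (sorted by address) from the highest one down and
 * return the highest aligned address below 'limit' where 'size' bytes
 * fit. 'align' must be a power of two.
 */
static uint64_t find_top_down(const struct mem_range *r, int n,
			      uint64_t limit, uint64_t size, uint64_t align)
{
	for (int i = n - 1; i >= 0; i--) {
		uint64_t top = r[i].end < limit ? r[i].end : limit;
		uint64_t addr;

		if (top <= r[i].start || top - r[i].start < size)
			continue;
		addr = (top - size) & ~(align - 1);
		if (addr >= r[i].start)
			return addr;
	}
	return NOT_FOUND;
}
```

With the ranges [0, 2g) and [4g, 18g) and no limit, a 1M request lands just below 18g. With the limit set to 4g (for instance because only memory below that is mapped so far), the same request falls back to just below 2g.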

From: H. Peter Anvin
Date: Tuesday, December 28, 2010 - 3:09 pm

The goal of this is to free up low memory for DMA and kdump, I presume?

	-hpa

-- 
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.

--

From: Yinghai Lu
Date: Tuesday, December 28, 2010 - 3:25 pm

yes. Otherwise, if we put the pgtable around 512M, we have no chance
to allocate 512M for kdump under 896M.
If we put the pgtable near 2g (assuming [2g, 4g) is for mmio), we can make it happen.

The later 6 patches will try to put the pgtable on the local node.

Thanks
--
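
The kdump argument comes down to the largest contiguous free gap below 896M. A minimal sketch of that arithmetic follows; the reserved ranges used in the note afterwards (first 1M, a kernel image at 16M to 32M, a 16K page table) are invented for illustration and do not come from the thread.

```c
#include <stdint.h>

#define MB(x) ((uint64_t)(x) << 20)

struct resv_range {
	uint64_t start, end;   /* [start, end), sorted, non-overlapping */
};

/* Largest contiguous free gap in [0, limit) given the reserved ranges. */
static uint64_t largest_free_gap(const struct resv_range *resv, int n,
				 uint64_t limit)
{
	uint64_t best = 0, cursor = 0;

	for (int i = 0; i < n && resv[i].start < limit; i++) {
		if (resv[i].start > cursor && resv[i].start - cursor > best)
			best = resv[i].start - cursor;
		if (resv[i].end > cursor)
			cursor = resv[i].end;
	}
	if (limit > cursor && limit - cursor > best)
		best = limit - cursor;
	return best;
}
```

Under these assumed reservations, a page table at 512M leaves a largest gap of 480M below 896M, which is too small for a 512M crash kernel. Moving the page table to just under 2g leaves 864M free in one piece.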
