Age | Commit message | Author |
|
Also change it to static.
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/1353123563-3103-26-git-send-email-yinghai@kernel.org
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
|
|
Get pgt_buf early from BRK, and use it to map PMD_SIZE from the top first.
Then use the mapped pages to map more ranges below, and keep looping until
all pages are mapped.
alloc_low_page() uses pages from BRK first; once that buffer is used up,
it uses memblock to find and reserve pages for page table usage.
Introduce min_pfn_mapped to make sure new pages are found in already mapped
ranges; it is updated as lower pages get mapped.
Also add step_size to make sure that we don't try to map too big a range
with the limited mapped pages available initially, and increase step_size
when we have more mapped pages on hand.
We don't need to call pagetable_reserve() anymore; the reserve work is done
in alloc_low_page() directly.
Finally, we can get rid of the code that calculates and finds the early pgt
buffer.
-v2: update to after fix_xen change,
also use MACRO for initial pgt_buf size and add comments with it.
-v3: skip big reserved range in memblock.reserved near end.
-v4: don't need fix_xen change now.
-v5: add changelog about moving pagetable reservation into alloc_low_page().
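The top-down loop described above can be pictured with a small user-space sketch. It is not the kernel code: the names (map_top_down, struct map_step), the 1 MiB lower bound, and the 32x growth factor are assumptions made for illustration, and the actual page table writes are replaced by recording each range.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PMD_SIZE   (2ULL << 20) /* 2 MiB, one PMD-sized mapping */
#define SKETCH_LOW_LIMIT  (1ULL << 20) /* assumed lower bound of the walk */
#define SKETCH_STEP_SHIFT 5            /* assumed growth: step_size *= 32 */

/* One recorded mapping step: the half-open range [start, end). */
struct map_step {
    uint64_t start;
    uint64_t end;
};

/* Round x down to a multiple of align (align must be a power of two). */
static uint64_t round_down_pow2(uint64_t x, uint64_t align)
{
    return x & ~(align - 1);
}

/*
 * Walk from the top of memory downwards. The first step covers a single
 * PMD_SIZE chunk; step_size grows only after a step mapped more memory
 * than everything mapped before it. Returns the number of steps recorded.
 */
static size_t map_top_down(uint64_t top, struct map_step *out, size_t max_steps)
{
    uint64_t step_size = SKETCH_PMD_SIZE;
    uint64_t mapped = 0;
    uint64_t last_start;
    size_t n = 0;

    assert(top > SKETCH_LOW_LIMIT);
    last_start = round_down_pow2(top, SKETCH_PMD_SIZE);

    while (last_start > SKETCH_LOW_LIMIT && n < max_steps) {
        uint64_t start = SKETCH_LOW_LIMIT;
        uint64_t newly_mapped;

        if (last_start > step_size) {
            start = round_down_pow2(last_start - 1, step_size);
            if (start < SKETCH_LOW_LIMIT)
                start = SKETCH_LOW_LIMIT;
        }

        /* A real implementation would build page tables for the range here. */
        out[n].start = start;
        out[n].end = last_start;
        n++;

        newly_mapped = last_start - start;
        last_start = start; /* plays the role of min_pfn_mapped, in bytes */

        if (newly_mapped > mapped)
            step_size <<= SKETCH_STEP_SHIFT;
        mapped += newly_mapped;
    }
    return n;
}
```

With top set to 1 GiB, the sketch records three steps: the top 2 MiB, then the 62 MiB below it, then the remainder down to 1 MiB. Each step can draw its page table pages from the ranges mapped by the steps before it.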
Suggested-by: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/1353123563-3103-22-git-send-email-yinghai@kernel.org
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
|
|
Currently direct mappings are created for [ 0 to max_low_pfn<<PAGE_SHIFT )
and [ 4GB to max_pfn<<PAGE_SHIFT ), which may include regions that are not
backed by actual DRAM. This is fine for holes under 4GB which are covered
by fixed and variable range MTRRs to be UC. However, we run into trouble
on higher memory addresses which cannot be covered by MTRRs.
Our system with 1TB of RAM has an e820 that looks like this:
BIOS-e820: [mem 0x0000000000000000-0x00000000000983ff] usable
BIOS-e820: [mem 0x0000000000098400-0x000000000009ffff] reserved
BIOS-e820: [mem 0x00000000000d0000-0x00000000000fffff] reserved
BIOS-e820: [mem 0x0000000000100000-0x00000000c7ebffff] usable
BIOS-e820: [mem 0x00000000c7ec0000-0x00000000c7ed7fff] ACPI data
BIOS-e820: [mem 0x00000000c7ed8000-0x00000000c7ed9fff] ACPI NVS
BIOS-e820: [mem 0x00000000c7eda000-0x00000000c7ffffff] reserved
BIOS-e820: [mem 0x00000000fec00000-0x00000000fec0ffff] reserved
BIOS-e820: [mem 0x00000000fee00000-0x00000000fee00fff] reserved
BIOS-e820: [mem 0x00000000fff00000-0x00000000ffffffff] reserved
BIOS-e820: [mem 0x0000000100000000-0x000000e037ffffff] usable
BIOS-e820: [mem 0x000000e038000000-0x000000fcffffffff] reserved
BIOS-e820: [mem 0x0000010000000000-0x0000011ffeffffff] usable
and so direct mappings are created for the huge memory hole between
0x000000e038000000 and 0x0000010000000000. Even though the kernel never
generates memory accesses in that region, since the page tables
incorrectly mark it as WB, our (AMD) processor ends up raising an MCE
while doing some memory bookkeeping/optimizations around that area.
This patch iterates through e820 and only direct maps ranges that are
marked as E820_RAM, and keeps track of those pfn ranges. Depending on
the alignment of the E820 ranges, this may result in smaller page
mappings being used (i.e. 4K instead of 2M or 1G).
-v2: move changes from setup.c to mm/init.c, also use for_each_mem_pfn_range
instead. - Yinghai Lu
-v3: add calculate_all_table_space_size() to get correct needed page table
size. - Yinghai Lu
-v4: fix add_pfn_range_mapped() to get the correct max_low_pfn_mapped when
the memory map has a hole under 4g, as found by Konrad on a xen
domU with 8g of ram. - Yinghai
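As a rough illustration of the approach, the sketch below walks an e820-style table and records pfn ranges only for RAM entries, so a reserved hole never ends up in the list. The struct layouts, the SKETCH_ names, and collect_ram_pfn_ranges() are hypothetical stand-ins for this example, not the kernel's definitions.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12
#define SKETCH_PAGE_SIZE  (1ULL << SKETCH_PAGE_SHIFT)

/* Hypothetical stand-ins for the e820 entry types. */
enum sketch_e820_type {
    SKETCH_E820_RAM = 1,
    SKETCH_E820_RESERVED = 2
};

struct sketch_e820_entry {
    uint64_t addr; /* first byte of the region */
    uint64_t size; /* length of the region in bytes */
    enum sketch_e820_type type;
};

/* Half-open pfn range [start_pfn, end_pfn). */
struct sketch_pfn_range {
    uint64_t start_pfn;
    uint64_t end_pfn;
};

/*
 * Record one pfn range per RAM entry, merging ranges that touch.
 * Non-RAM entries are skipped, so holes are never direct mapped.
 * Returns the number of ranges written to out.
 */
static size_t collect_ram_pfn_ranges(const struct sketch_e820_entry *map,
                                     size_t nr_entries,
                                     struct sketch_pfn_range *out,
                                     size_t max_out)
{
    size_t n = 0;
    size_t i;

    for (i = 0; i < nr_entries; i++) {
        uint64_t start_pfn, end_pfn;

        if (map[i].type != SKETCH_E820_RAM)
            continue;

        /* Round inwards so only whole pages inside the region count. */
        start_pfn = (map[i].addr + SKETCH_PAGE_SIZE - 1) >> SKETCH_PAGE_SHIFT;
        end_pfn = (map[i].addr + map[i].size) >> SKETCH_PAGE_SHIFT;
        if (start_pfn >= end_pfn)
            continue;

        if (n > 0 && out[n - 1].end_pfn == start_pfn) {
            out[n - 1].end_pfn = end_pfn; /* extend the previous range */
            continue;
        }
        if (n == max_out)
            break;
        out[n].start_pfn = start_pfn;
        out[n].end_pfn = end_pfn;
        n++;
    }
    return n;
}
```

Fed the usable and reserved regions from the e820 listing above, this yields separate ranges on either side of the 0x000000e038000000 hole rather than one range spanning it.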
Signed-off-by: Jacob Shin <jacob.shin@amd.com>
Link: http://lkml.kernel.org/r/1353123563-3103-16-git-send-email-yinghai@kernel.org
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Reviewed-by: Pekka Enberg <penberg@kernel.org>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
|
|
Update code that previously assumed pfns [ 0 - max_low_pfn_mapped ) and
[ 4GB - max_pfn_mapped ) were always direct mapped, to now look up
pfn_mapped ranges instead.
-v2: change the order in which the patches are applied to keep git bisect
working, so add a dummy pfn_range_is_mapped(). - Yinghai Lu
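A lookup of this kind might look like the following minimal sketch. The table is passed in explicitly and the names (struct sketch_mapped_range, sketch_pfn_range_is_mapped) are assumptions for the example; the kernel's own helper and data layout may differ.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Half-open range of direct mapped pfns: [start, end). */
struct sketch_mapped_range {
    uint64_t start;
    uint64_t end;
};

/*
 * Return true only if [start_pfn, end_pfn) lies entirely inside a single
 * recorded range. A query that straddles a hole is reported as unmapped.
 */
static bool sketch_pfn_range_is_mapped(const struct sketch_mapped_range *table,
                                       size_t nr_ranges,
                                       uint64_t start_pfn,
                                       uint64_t end_pfn)
{
    size_t i;

    for (i = 0; i < nr_ranges; i++) {
        if (start_pfn >= table[i].start && end_pfn <= table[i].end)
            return true;
    }
    return false;
}
```

Callers that used to compare a pfn against max_low_pfn_mapped or max_pfn_mapped would instead ask whether the whole range they are about to touch falls inside one recorded range.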
Signed-off-by: Jacob Shin <jacob.shin@amd.com>
Link: http://lkml.kernel.org/r/1353123563-3103-12-git-send-email-yinghai@kernel.org
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
|
|
Move all declarations of free_initmem() to linux/mm.h so that there's only one
and it's used by everything.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: linux-c6x-dev@linux-c6x.org
cc: microblaze-uclinux@itee.uq.edu.au
cc: linux-sh@vger.kernel.org
cc: sparclinux@vger.kernel.org
cc: x86@kernel.org
cc: linux-mm@kvack.org
|
|
This patch reverts NUMA affine page table allocation added by commit
1411e0ec31 (x86-64, numa: Put pgtable to local node memory).
The commit made an undocumented change where the kernel linear mapping
strictly follows the intersection of the e820 memory map and the NUMA
configuration. If the physical memory configuration has holes or NUMA
nodes are not properly aligned, this leads to using an unnecessarily
small mapping size, which leads to increased TLB pressure. For
details, see
http://thread.gmane.org/gmane.linux.kernel/1104672
Patches to fix the problem have been proposed, but the underlying code
needs more cleanup and the approach itself seems a bit heavy handed, so
it has been decided to revert the feature for now and come back
to it in the next development cycle.
http://thread.gmane.org/gmane.linux.kernel/1105959
As init_memory_mapping_high() callsites have been consolidated since
the commit, reverting is done manually. Also, the RED-PEN comment in
arch/x86/mm/init.c is not restored as the problem no longer exists
with memblock based top-down early memory allocation.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
|
|
There's no reason for these to live in setup_arch(). Move them inside
initmem_init().
- v2: x86-32 initmem_init() wasn't updated, breaking 32bit builds.
Fixed. Found by Ankita.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Ankita Garg <ankita@in.ibm.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Cyrill Gorcunov <gorcunov@gmail.com>
Cc: Shaohui Zheng <shaohui.zheng@intel.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: H. Peter Anvin <hpa@linux.intel.com>
|
|
initmem_init() extensively accesses and modifies global data
structures and the parameters aren't even followed depending on which
path is being used. Drop @start/last_pfn and let it deal with
@max_pfn directly. This is in preparation for further NUMA init
cleanups.
- v2: x86-32 initmem_init() wasn't updated, breaking 32bit builds.
Fixed. Found by Yinghai.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Cyrill Gorcunov <gorcunov@gmail.com>
Cc: Shaohui Zheng <shaohui.zheng@intel.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: H. Peter Anvin <hpa@linux.intel.com>
|
|
Introduce init_memory_mapping_high(), and use it on 64bit.
It walks every memory segment above 4g and creates the page tables for
each memory range within that range itself.
Before this patch, all page tables were on one node.
With this patch, one RED-PEN is killed.
Debug output for an 8 socket system after the patch:
[ 0.000000] initial memory mapped : 0 - 20000000
[ 0.000000] init_memory_mapping: [0x00000000000000-0x0000007f74ffff]
[ 0.000000] 0000000000 - 007f600000 page 2M
[ 0.000000] 007f600000 - 007f750000 page 4k
[ 0.000000] kernel direct mapping tables up to 7f750000 @ [0x7f74c000-0x7f74ffff]
[ 0.000000] RAMDISK: 7bc84000 - 7f745000
....
[ 0.000000] Adding active range (0, 0x10, 0x95) 0 entries of 3200 used
[ 0.000000] Adding active range (0, 0x100, 0x7f750) 1 entries of 3200 used
[ 0.000000] Adding active range (0, 0x100000, 0x1080000) 2 entries of 3200 used
[ 0.000000] Adding active range (1, 0x1080000, 0x2080000) 3 entries of 3200 used
[ 0.000000] Adding active range (2, 0x2080000, 0x3080000) 4 entries of 3200 used
[ 0.000000] Adding active range (3, 0x3080000, 0x4080000) 5 entries of 3200 used
[ 0.000000] Adding active range (4, 0x4080000, 0x5080000) 6 entries of 3200 used
[ 0.000000] Adding active range (5, 0x5080000, 0x6080000) 7 entries of 3200 used
[ 0.000000] Adding active range (6, 0x6080000, 0x7080000) 8 entries of 3200 used
[ 0.000000] Adding active range (7, 0x7080000, 0x8080000) 9 entries of 3200 used
[ 0.000000] init_memory_mapping: [0x00000100000000-0x0000107fffffff]
[ 0.000000] 0100000000 - 1080000000 page 2M
[ 0.000000] kernel direct mapping tables up to 1080000000 @ [0x107ffbd000-0x107fffffff]
[ 0.000000] memblock_x86_reserve_range: [0x107ffc2000-0x107fffffff] PGTABLE
[ 0.000000] init_memory_mapping: [0x00001080000000-0x0000207fffffff]
[ 0.000000] 1080000000 - 2080000000 page 2M
[ 0.000000] kernel direct mapping tables up to 2080000000 @ [0x207ff7d000-0x207fffffff]
[ 0.000000] memblock_x86_reserve_range: [0x207ffc0000-0x207fffffff] PGTABLE
[ 0.000000] init_memory_mapping: [0x00002080000000-0x0000307fffffff]
[ 0.000000] 2080000000 - 3080000000 page 2M
[ 0.000000] kernel direct mapping tables up to 3080000000 @ [0x307ff3d000-0x307fffffff]
[ 0.000000] memblock_x86_reserve_range: [0x307ffc0000-0x307fffffff] PGTABLE
[ 0.000000] init_memory_mapping: [0x00003080000000-0x0000407fffffff]
[ 0.000000] 3080000000 - 4080000000 page 2M
[ 0.000000] kernel direct mapping tables up to 4080000000 @ [0x407fefd000-0x407fffffff]
[ 0.000000] memblock_x86_reserve_range: [0x407ffc0000-0x407fffffff] PGTABLE
[ 0.000000] init_memory_mapping: [0x00004080000000-0x0000507fffffff]
[ 0.000000] 4080000000 - 5080000000 page 2M
[ 0.000000] kernel direct mapping tables up to 5080000000 @ [0x507febd000-0x507fffffff]
[ 0.000000] memblock_x86_reserve_range: [0x507ffc0000-0x507fffffff] PGTABLE
[ 0.000000] init_memory_mapping: [0x00005080000000-0x0000607fffffff]
[ 0.000000] 5080000000 - 6080000000 page 2M
[ 0.000000] kernel direct mapping tables up to 6080000000 @ [0x607fe7d000-0x607fffffff]
[ 0.000000] memblock_x86_reserve_range: [0x607ffc0000-0x607fffffff] PGTABLE
[ 0.000000] init_memory_mapping: [0x00006080000000-0x0000707fffffff]
[ 0.000000] 6080000000 - 7080000000 page 2M
[ 0.000000] kernel direct mapping tables up to 7080000000 @ [0x707fe3d000-0x707fffffff]
[ 0.000000] memblock_x86_reserve_range: [0x707ffc0000-0x707fffffff] PGTABLE
[ 0.000000] init_memory_mapping: [0x00007080000000-0x0000807fffffff]
[ 0.000000] 7080000000 - 8080000000 page 2M
[ 0.000000] kernel direct mapping tables up to 8080000000 @ [0x807fdfc000-0x807fffffff]
[ 0.000000] memblock_x86_reserve_range: [0x807ffbf000-0x807fffffff] PGTABLE
[ 0.000000] Initmem setup node 0 [0000000000000000-000000107fffffff]
[ 0.000000] NODE_DATA [0x0000107ffbd000-0x0000107ffc1fff]
[ 0.000000] Initmem setup node 1 [0000001080000000-000000207fffffff]
[ 0.000000] NODE_DATA [0x0000207ffbb000-0x0000207ffbffff]
[ 0.000000] Initmem setup node 2 [0000002080000000-000000307fffffff]
[ 0.000000] NODE_DATA [0x0000307ffbb000-0x0000307ffbffff]
[ 0.000000] Initmem setup node 3 [0000003080000000-000000407fffffff]
[ 0.000000] NODE_DATA [0x0000407ffbb000-0x0000407ffbffff]
[ 0.000000] Initmem setup node 4 [0000004080000000-000000507fffffff]
[ 0.000000] NODE_DATA [0x0000507ffbb000-0x0000507ffbffff]
[ 0.000000] Initmem setup node 5 [0000005080000000-000000607fffffff]
[ 0.000000] NODE_DATA [0x0000607ffbb000-0x0000607ffbffff]
[ 0.000000] Initmem setup node 6 [0000006080000000-000000707fffffff]
[ 0.000000] NODE_DATA [0x0000707ffbb000-0x0000707ffbffff]
[ 0.000000] Initmem setup node 7 [0000007080000000-000000807fffffff]
[ 0.000000] NODE_DATA [0x0000807ffba000-0x0000807ffbefff]
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <4D1933D1.9020609@kernel.org>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
|
|
Move it into a header file, to prepare for using it in other files.
[ hpa: added missing <linux/types.h> and changed type to phys_addr_t. ]
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <4D1933BA.8000508@kernel.org>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
|
|
On a 32-bit non-PAE system, the cast to 'phys_addr_t' truncates the value
before the subtraction. Subtracting before the cast produces the same result
but removes the following warnings from sparse:
arch/x86/include/asm/pgtable_types.h:255:38: warning: cast truncates bits from constant value (100000000 becomes 0)
arch/x86/include/asm/pgtable_types.h:270:38: warning: cast truncates bits from constant value (100000000 becomes 0)
arch/x86/include/asm/pgtable.h:127:32: warning: cast truncates bits from constant value (100000000 becomes 0)
arch/x86/include/asm/pgtable.h:132:32: warning: cast truncates bits from constant value (100000000 becomes 0)
arch/x86/include/asm/pgtable.h:344:31: warning: cast truncates bits from constant value (100000000 becomes 0)
64-bit or PAE machines will not be affected by this change.
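The equivalence can be checked with a small stand-alone sketch. It assumes a 32-bit physical address type and a mask shift of 32, matching the non-PAE case; the typedef and function names are made up for this example and are not the kernel macros.

```c
#include <stdint.h>

/* Assumed stand-in for phys_addr_t on a 32-bit non-PAE build. */
typedef uint32_t sketch_phys_addr_t;

#define SKETCH_MASK_SHIFT 32

/*
 * Old order: the cast truncates 0x100000000 to 0 (this is what sparse
 * warns about), then the unsigned subtraction wraps around.
 */
static sketch_phys_addr_t mask_cast_then_subtract(void)
{
    return (sketch_phys_addr_t)(1ULL << SKETCH_MASK_SHIFT) - 1;
}

/*
 * New order: subtract in 64 bits first, so the value being cast already
 * fits in 32 bits and nothing is truncated.
 */
static sketch_phys_addr_t mask_subtract_then_cast(void)
{
    return (sketch_phys_addr_t)((1ULL << SKETCH_MASK_SHIFT) - 1);
}
```

Both functions return 0xffffffff. Only the second computes it without discarding set bits in the cast, which is why the warning goes away while the generated mask stays the same.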
Signed-off-by: Namhyung Kim <namhyung@gmail.com>
LKML-Reference: <1285770588-14065-1-git-send-email-namhyung@gmail.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
|
|
The generic resource based page_is_ram() works better with memory
hotplug/hotremove. So switch the x86 e820map based code to it.
CC: Andi Kleen <andi@firstfloor.org>
CC: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
CC: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
LKML-Reference: <20100122033004.470767217@intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
|
|
To eventually interleave emulated nodes over physical nodes, we
need to know the physical topology of the machine without actually
registering it. This does the k8 node setup in two parts:
detection and registration. NUMA emulation can then use the
physical topology detected to set up the address ranges of emulated
nodes accordingly. If emulation isn't used, the k8 nodes are
registered as normal.
Two formals are added to the x86 NUMA setup functions: `acpi' and
`k8'. These represent whether ACPI or K8 NUMA has been detected;
both cannot be true at the same time. This specifies to the NUMA
emulation code whether an underlying physical NUMA topology exists
and which interface to use.
This patch deals solely with separating the k8 setup path into
Northbridge detection and registration steps and leaves the ACPI
changes for a subsequent patch. The `acpi' formal is added here,
however, to avoid touching all the header files again in the next
patch.
This approach also ensures emulated nodes will not span physical
nodes so the true memory latency is not misrepresented.
k8_get_nodes() may now be used to export the k8 physical topology
of the machine for NUMA emulation.
Signed-off-by: David Rientjes <rientjes@google.com>
Cc: Andreas Herrmann <andreas.herrmann3@amd.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
Cc: Ankita Garg <ankita@in.ibm.com>
Cc: Len Brown <len.brown@intel.com>
LKML-Reference: <alpine.DEB.1.00.0909251518400.14754@chino.kir.corp.google.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
|
Impact: unification of declarations, cleanup
Unification of declarations:
moved init_memory_mapping, initmem_init and free_initmem from
page_XX_types.h to page_types.h
Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
Acked-by: Pekka Enberg <penberg@cs.helsinki.fi>
Cc: Andrew Morton <akpm@linux-foundation.org>
LKML-Reference: <1239693869.3033.31.camel@ht.satnam>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
|
gcc 3.2.2 reports:
In file included from /usr/src/all/linux-next/arch/x86/include/asm/page.h:8,
from /usr/src/all/linux-next/arch/x86/include/asm/processor.h:18,
from /usr/src/all/linux-next/arch/x86/include/asm/atomic_32.h:6,
from /usr/src/all/linux-next/arch/x86/include/asm/atomic.h:2,
from include/linux/crypto.h:20,
from arch/x86/kernel/asm-offsets_32.c:7,
from arch/x86/kernel/asm-offsets.c:2:
/usr/src/all/linux-next/arch/x86/include/asm/page_types.h:54: warning: parameter has incomplete type
/usr/src/all/linux-next/arch/x86/include/asm/page_types.h:56: warning: parameter has incomplete type
In file included from /usr/src/all/linux-next/arch/x86/include/asm/page.h:8,
from /usr/src/all/linux-next/arch/x86/include/asm/processor.h:18,
from include/linux/prefetch.h:14,
from include/linux/list.h:6,
from include/linux/module.h:9,
from init/main.c:13:
/usr/src/all/linux-next/arch/x86/include/asm/page_types.h:54: warning: parameter has incomplete type
/usr/src/all/linux-next/arch/x86/include/asm/page_types.h:56: warning: parameter has incomplete type
This is a bogus warning, but moving the pat-related functions
into asm/pat.h and including asm/pgtable_types.h should fix it.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Reported-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
|
PAGETABLE_LEVELS and the PTE masks should be in pgtable*.h
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jeremy/xen into x86/headers
Conflicts:
arch/x86/include/asm/page.h
arch/x86/include/asm/pgtable.h
arch/x86/mach-voyager/voyager_smp.c
arch/x86/mm/fault.c
|
|
pgtable*.h is intended for definitions relating to actual pagetables
and their entries, so move all the definitions for
(pte|pmd|pud|pgd)(val)?_t to the appropriate pgtable*.h headers.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
|
|
|
|
Signed-off-by: Jeremy Fitzhardinge <jeremy@goop.org>
|
|
Signed-off-by: Jeremy Fitzhardinge <jeremy@goop.org>
|
|
Signed-off-by: Jeremy Fitzhardinge <jeremy@goop.org>
|