aboutsummaryrefslogtreecommitdiff
path: root/arch/sparc/include/asm/pgtable_64.h
AgeCommit message (Collapse)Author
2014-01-09mm: fix TLB flush race between migration, and change_protection_rangeRik van Riel
commit 20841405940e7be0617612d521e206e4b6b325db upstream. There are a few subtle races, between change_protection_range (used by mprotect and change_prot_numa) on one side, and NUMA page migration and compaction on the other side. The basic race is that there is a time window between when the PTE gets made non-present (PROT_NONE or NUMA), and the TLB is flushed. During that time, a CPU may continue writing to the page. This is fine most of the time, however compaction or the NUMA migration code may come in, and migrate the page away. When that happens, the CPU may continue writing, through the cached translation, to what is no longer the current memory location of the process. This only affects x86, which has a somewhat optimistic pte_accessible. All other architectures appear to be safe, and will either always flush, or flush whenever there is a valid mapping, even with no permissions (SPARC). The basic race looks like this: CPU A CPU B CPU C load TLB entry make entry PTE/PMD_NUMA fault on entry read/write old page start migrating page change PTE/PMD to new page read/write old page [*] flush TLB reload TLB from new entry read/write new page lose data [*] the old page may belong to a new user at this point! The obvious fix is to flush remote TLB entries, by making sure that pte_accessible aware of the fact that PROT_NONE and PROT_NUMA memory may still be accessible if there is a TLB flush pending for the mm. This should fix both NUMA migration and compaction. [mgorman@suse.de: fix build] Signed-off-by: Rik van Riel <riel@redhat.com> Signed-off-by: Mel Gorman <mgorman@suse.de> Cc: Alex Thorlton <athorlton@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-04-19sparc64: Fix race in TLB batch processing.David S. Miller
As reported by Dave Kleikamp, when we emit cross calls to do batched TLB flush processing we have a race because we do not synchronize on the sibling cpus completing the cross call. So meanwhile the TLB batch can be reset (tb->tlb_nr set to zero, etc.) and either flushes are missed or flushes will flush the wrong addresses. Fix this by using generic infrastructure to synchonize on the completion of the cross call. This first required getting the flush_tlb_pending() call out from switch_to() which operates with locks held and interrupts disabled. The problem is that smp_call_function_many() cannot be invoked with IRQs disabled and this is explicitly checked for with WARN_ON_ONCE(). We get the batch processing outside of locked IRQ disabled sections by using some ideas from the powerpc port. Namely, we only batch inside of arch_{enter,leave}_lazy_mmu_mode() calls. If we're not in such a region, we flush TLBs synchronously. 1) Get rid of xcall_flush_tlb_pending and per-cpu type implementations. 2) Do TLB batch cross calls instead via: smp_call_function_many() tlb_pending_func() __flush_tlb_pending() 3) Batch only in lazy mmu sequences: a) Add 'active' member to struct tlb_batch b) Define __HAVE_ARCH_ENTER_LAZY_MMU_MODE c) Set 'active' in arch_enter_lazy_mmu_mode() d) Run batch and clear 'active' in arch_leave_lazy_mmu_mode() e) Check 'active' in tlb_batch_add_one() and do a synchronous flush if it's clear. 4) Add infrastructure for synchronous TLB page flushes. a) Implement __flush_tlb_page and per-cpu variants, patch as needed. b) Likewise for xcall_flush_tlb_page. c) Implement smp_flush_tlb_page() to invoke the cross-call. d) Wire up global_flush_tlb_page() to the right routine based upon CONFIG_SMP 5) It turns out that singleton batches are very common, 2 out of every 3 batch flushes have only a single entry in them. The batch flush waiting is very expensive, both because of the poll on sibling cpu completeion, as well as because passing the tlb batch pointer to the sibling cpus invokes a shared memory dereference. Therefore, in flush_tlb_pending(), if there is only one entry in the batch perform a completely asynchronous global_flush_tlb_page() instead. Reported-by: Dave Kleikamp <dave.kleikamp@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net> Acked-by: Dave Kleikamp <dave.kleikamp@oracle.com>
2013-02-13sparc64: Fix get_user_pages_fast() wrt. THP.David S. Miller
Mostly mirrors the s390 logic, as unlike x86 we don't need the SetPageReferenced() bits. On sparc64 we also lack a user/privileged bit in the huge PMDs. In order to make this work for THP and non-THP builds, some header file adjustments were necessary. Namely, provide the PMD_HUGE_* bit defines and the pmd_large() inline unconditionally rather than protected by TRANSPARENT_HUGEPAGE. Reported-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-12-18sparc64: Define pte_accessible()David S. Miller
We can elide flush_tlb_*() calls when _PAGE_VALID is clear as that is the test used to determine whether or not to queue up a TLB flush in set_pte_at(). Signed-off-by: David S. Miller <davem@davemloft.net>
2012-10-09sparc64: Support transparent huge pages.David Miller
This is relatively easy since PMD's now cover exactly 4MB of memory. Our PMD entries are 32-bits each, so we use a special encoding. The lowest bit, PMD_ISHUGE, determines the interpretation. This is possible because sparc64's page tables are purely software entities so we can use whatever encoding scheme we want. We just have to make the TLB miss assembler page table walkers aware of the layout. set_pmd_at() works much like set_pte_at() but it has to operate in two page from a table of non-huge PTEs, so we have to queue up TLB flushes based upon what mappings are valid in the PTE table. In the second regime we are going from huge-page to non-huge-page, and in that case we need only queue up a single TLB flush to push out the huge page mapping. We still have 5 bits remaining in the huge PMD encoding so we can very likely support any new pieces of THP state tracking that might get added in the future. With lots of help from Johannes Weiner. Signed-off-by: David S. Miller <davem@davemloft.net> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Gerald Schaefer <gerald.schaefer@de.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-10-09sparc64: Document PGD and PMD layout.David Miller
We're going to be messing around with the PMD interpretation and layout for the sake of transparent huge pages, so we better clearly document what we're starting with. Signed-off-by: David S. Miller <davem@davemloft.net> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Gerald Schaefer <gerald.schaefer@de.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-10-09sparc64: Halve the size of PTE tablesDavid Miller
The reason we want to do this is to facilitate transparent huge page support. Right now PMD's cover 8MB of address space, and our huge page size is 4MB. The current transparent hugepage support is not able to handle HPAGE_SIZE != PMD_SIZE. So make PTE tables be sized to half of a page instead of a full page. We can still map properly the whole supported virtual address range which on sparc64 requires 44 bits. Add a compile time CPP test which ensures that this requirement is always met. There is a minor inefficiency added by this change. We only use half of the page for PTE tables. It's not trivial to use only half of the page yet still get all of the pgtable_page_{ctor,dtor}() stuff working properly. It is doable, and that will come in a subsequent change. Signed-off-by: David S. Miller <davem@davemloft.net> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Gerald Schaefer <gerald.schaefer@de.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-10-09sparc64: Only support 4MB huge pages and 8KB base pages.David Miller
Narrowing the scope of the page size configurations will make the transparent hugepage changes much simpler. In the end what we really want to do is have the kernel support multiple huge page sizes and use whatever is appropriate as the context dictactes. Signed-off-by: David S. Miller <davem@davemloft.net> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Gerald Schaefer <gerald.schaefer@de.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-05-13sparc: Kill mmu_{un,}lockarea().David S. Miller
These were used on sun4c during floppy data transfers since on that chip we had to lock the cpu mappings into the TLB because we cannot take a TLB miss during the assembler floppy interrupt handler that does the data transfer. That is no longer necessary since we've removed sun4c support, thus this stuff can disappear completely. Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-01sparc: pgtable_64: change include orderAaro Koskinen
Fix the following build breakage in v3.4-rc1: CC arch/sparc/kernel/cpu.o In file included from /home/aaro/git/linux/arch/sparc/include/asm/pgtable_64.h:15:0, from /home/aaro/git/linux/arch/sparc/include/asm/pgtable.h:4, from arch/sparc/kernel/cpu.c:15: include/asm-generic/pgtable-nopud.h:13:16: error: unknown type name 'pgd_t' include/asm-generic/pgtable-nopud.h:25:28: error: unknown type name 'pgd_t' include/asm-generic/pgtable-nopud.h:26:27: error: unknown type name 'pgd_t' include/asm-generic/pgtable-nopud.h:27:31: error: unknown type name 'pgd_t' include/asm-generic/pgtable-nopud.h:28:30: error: unknown type name 'pgd_t' include/asm-generic/pgtable-nopud.h:38:34: error: unknown type name 'pgd_t' In file included from /home/aaro/git/linux/arch/sparc/include/asm/pgtable_64.h:783:0, from /home/aaro/git/linux/arch/sparc/include/asm/pgtable.h:4, from arch/sparc/kernel/cpu.c:15: include/asm-generic/pgtable.h: In function 'pgd_none_or_clear_bad': include/asm-generic/pgtable.h:258:2: error: implicit declaration of function 'pgd_none' [-Werror=implicit-function-declaration] include/asm-generic/pgtable.h:260:2: error: implicit declaration of function 'pgd_bad' [-Werror=implicit-function-declaration] include/asm-generic/pgtable.h: In function 'pud_none_or_clear_bad': include/asm-generic/pgtable.h:269:6: error: request for member 'pgd' in something not a structure or union Reported-by: Meelis Roos <mroos@linux.ee> Signed-off-by: Aaro Koskinen <aaro.koskinen@iki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-03-28Disintegrate asm/system.h for SparcDavid Howells
Disintegrate asm/system.h for Sparc. Signed-off-by: David Howells <dhowells@redhat.com> cc: sparclinux@vger.kernel.org
2011-11-17sparc: Kill custom io_remap_pfn_range().David S. Miller
To handle the large physical addresses, just make a simple wrapper around remap_pfn_range() like MIPS does. Signed-off-by: David S. Miller <davem@davemloft.net>
2011-07-25sparc64: add support for _PAGE_SPECIALDavid S. Miller
Luckily there are still a few software PTE bits remaining and they even match up in both the sun4u and sun4v pte layouts. Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-05-25sparc: mmu_gather reworkPeter Zijlstra
Rework the sparc mmu_gather usage to conform to the new world order :-) Sparc mmu_gather does two things: - tracks vaddrs to unhash - tracks pages to free Split these two things like powerpc has done and keep the vaddrs in per-cpu data structures and flush them on context switch. The remaining bits can then use the generic mmu_gather. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: David Miller <davem@davemloft.net> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Russell King <rmk@arm.linux.org.uk> Cc: Paul Mundt <lethal@linux-sh.org> Cc: Jeff Dike <jdike@addtoit.com> Cc: Richard Weinberger <richard@nod.at> Cc: Tony Luck <tony.luck@intel.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Hugh Dickins <hughd@google.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Nick Piggin <npiggin@kernel.dk> Cc: Namhyung Kim <namhyung@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-04-21sparc: consolidate show_cpuinfo in cpu.cSam Ravnborg
We have all the cpu related info in cpu.c - so move the remaining functions to support /proc/cpuinfo to this file. Signed-off-by: Sam Ravnborg <sam@ravnborg.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-10-26mm: remove pte_*map_nested()Peter Zijlstra
Since we no longer need to provide KM_type, the whole pte_*map_nested() API is now redundant, remove it. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Chris Metcalf <cmetcalf@tilera.com> Cc: David Howells <dhowells@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Rik van Riel <riel@redhat.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Russell King <rmk@arm.linux.org.uk> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: David Miller <davem@davemloft.net> Cc: Paul Mackerras <paulus@samba.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-02-20MM: Pass a PTE pointer to update_mmu_cache() rather than the PTE itselfRussell King
On VIVT ARM, when we have multiple shared mappings of the same file in the same MM, we need to ensure that we have coherency across all copies. We do this via make_coherent() by making the pages uncacheable. This used to work fine, until we allowed highmem with highpte - we now have a page table which is mapped as required, and is not available for modification via update_mmu_cache(). Ralf Beache suggested getting rid of the PTE value passed to update_mmu_cache(): On MIPS update_mmu_cache() calls __update_tlb() which walks pagetables to construct a pointer to the pte again. Passing a pte_t * is much more elegant. Maybe we might even replace the pte argument with the pte_t? Ben Herrenschmidt would also like the pte pointer for PowerPC: Passing the ptep in there is exactly what I want. I want that -instead- of the PTE value, because I have issue on some ppc cases, for I$/D$ coherency, where set_pte_at() may decide to mask out the _PAGE_EXEC. So, pass in the mapped page table pointer into update_mmu_cache(), and remove the PTE value, updating all implementations and call sites to suit. Includes a fix from Stephen Rothwell: sparc: fix fallout from update_mmu_cache API change Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2009-09-28sparc64: Increase vmalloc size to fix percpu regressions.David S. Miller
Since we now use the embedding percpu allocator we have to make the vmalloc area at least as large as the stretch can be between nodes. Besides some minor asm adjustments, this turned out to be pretty trivial. Signed-off-by: David S. Miller <davem@davemloft.net>
2009-08-25sparc64: Validate linear D-TLB misses.David S. Miller
When page alloc debugging is not enabled, we essentially accept any virtual address for linear kernel TLB misses. But with kgdb, kernel address probing, and other facilities we can try to access arbitrary crap. So, make sure the address we miss on will translate to physical memory that actually exists. In order to make this work we have to embed the valid address bitmap into the kernel image. And in order to make that less expensive we make an adjustment, in that the max physical memory address is decreased to "1 << 41", even on the chips that support a 42-bit physical address space. We can do this because bit 41 indicates "I/O space" and thus covers non-memory ranges. The result of this is that: 1) kpte_linear_bitmap shrinks from 2K to 1K in size 2) we need 64K more for the valid address bitmap We can't let the valid address bitmap be dynamically allocated once we start using it to validate TLB misses, otherwise we have crazy issues to deal with wrt. recursive TLB misses and such. If we're in a TLB miss it could be the deepest trap level that's legal inside of the cpu. So if we TLB miss referencing the bitmap, the cpu will be out of trap levels and enter RED state. To guard against out-of-range accesses to the bitmap, we have to check to make sure no bits in the physical address above bit 40 are set. We could export and use last_valid_pfn for this check, but that's just an unnecessary extra memory reference. On the plus side of all this, since we load all of these translations into the special 4MB mapping TSB, and we check the TSB first for TLB misses, there should be absolutely no real cost for these new checks in the TLB miss path. Reported-by: heyongli@gmail.com Signed-off-by: David S. Miller <davem@davemloft.net>
2008-09-12sparc64: Fix sparse warnings in fault.cDavid S. Miller
1) set_brkpt() is referenced by nothing and hasn't been used by anyone to my knowledge for many many years. So just delete it. 2) add extern decl for do_sparc64_fault() in asm/pgtable_64.h Signed-off-by: David S. Miller <davem@davemloft.net>
2008-07-27sparc, sparc64: use arch/sparc/includeSam Ravnborg
The majority of this patch was created by the following script: *** ASM=arch/sparc/include/asm mkdir -p $ASM git mv include/asm-sparc64/ftrace.h $ASM git rm include/asm-sparc64/* git mv include/asm-sparc/* $ASM sed -ie 's/asm-sparc64/asm/g' $ASM/* sed -ie 's/asm-sparc/asm/g' $ASM/* *** The rest was an update of the top-level Makefile to use sparc for header files when sparc64 is being build. And a small fixlet to pick up the correct unistd.h from sparc64 code. Signed-off-by: Sam Ravnborg <sam@ravnborg.org>