aboutsummaryrefslogtreecommitdiff
path: root/arch/powerpc/include/asm/exception-64s.h
AgeCommit message (Collapse)Author
2014-01-08powerpc: Fix bad stack check in exception entryMichael Neuling
commit 90ff5d688e61f49f23545ffab6228bd7e87e6dc7 upstream. In EXCEPTION_PROLOG_COMMON() we check to see if the stack pointer (r1) is valid when coming from the kernel. If it's not valid, we die but with a nice oops message. Currently we allocate a stack frame (subtract INT_FRAME_SIZE) before we check to see if the stack pointer is negative. Unfortunately, this won't detect a bad stack where r1 is less than INT_FRAME_SIZE. This patch fixes the check to compare the modified r1 with -INT_FRAME_SIZE. With this, bad kernel stack pointers (including NULL pointers) are correctly detected again. Kudos to Paulus for finding this. Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-06-20powerpc: Fix stack overflow crash in resume_kernel when ftracingMichael Ellerman
commit 0e37739b1c96d65e6433998454985de994383019 upstream. It's possible for us to crash when running with ftrace enabled, eg: Bad kernel stack pointer bffffd12 at c00000000000a454 cpu 0x3: Vector: 300 (Data Access) at [c00000000ffe3d40] pc: c00000000000a454: resume_kernel+0x34/0x60 lr: c00000000000335c: performance_monitor_common+0x15c/0x180 sp: bffffd12 msr: 8000000000001032 dar: bffffd12 dsisr: 42000000 If we look at current's stack (paca->__current->stack) we see it is equal to c0000002ecab0000. Our stack is 16K, and comparing to paca->kstack (c0000002ecab3e30) we can see that we have overflowed our kernel stack. This leads to us writing over our struct thread_info, and in this case we have corrupted thread_info->flags and set _TIF_EMULATE_STACK_STORE. Dumping the stack we see: 3:mon> t c0000002ecab0000 [c0000002ecab0000] c00000000002131c .performance_monitor_exception+0x5c/0x70 [c0000002ecab0080] c00000000000335c performance_monitor_common+0x15c/0x180 --- Exception: f01 (Performance Monitor) at c0000000000fb2ec .trace_hardirqs_off+0x1c/0x30 [c0000002ecab0370] c00000000016fdb0 .trace_graph_entry+0xb0/0x280 (unreliable) [c0000002ecab0410] c00000000003d038 .prepare_ftrace_return+0x98/0x130 [c0000002ecab04b0] c00000000000a920 .ftrace_graph_caller+0x14/0x28 [c0000002ecab0520] c0000000000d6b58 .idle_cpu+0x18/0x90 [c0000002ecab05a0] c00000000000a934 .return_to_handler+0x0/0x34 [c0000002ecab0620] c00000000001e660 .timer_interrupt+0x160/0x300 [c0000002ecab06d0] c0000000000025dc decrementer_common+0x15c/0x180 --- Exception: 901 (Decrementer) at c0000000000104d4 .arch_local_irq_restore+0x74/0xa0 [c0000002ecab09c0] c0000000000fe044 .trace_hardirqs_on+0x14/0x30 (unreliable) [c0000002ecab0fb0] c00000000016fe3c .trace_graph_entry+0x13c/0x280 [c0000002ecab1050] c00000000003d038 .prepare_ftrace_return+0x98/0x130 [c0000002ecab10f0] c00000000000a920 .ftrace_graph_caller+0x14/0x28 [c0000002ecab1160] c0000000000161f0 .__ppc64_runlatch_on+0x10/0x40 [c0000002ecab11d0] c00000000000a934 .return_to_handler+0x0/0x34 --- Exception: 901 (Decrementer) at c0000000000104d4 .arch_local_irq_restore+0x74/0xa0 ... and so on __ppc64_runlatch_on() is called from RUNLATCH_ON in the exception entry path. At that point the irq state is not consistent, ie. interrupts are hard disabled (by the exception entry), but the paca soft-enabled flag may be out of sync. This leads to the local_irq_restore() in trace_graph_entry() actually enabling interrupts, which we do not want. Because we have not yet reprogrammed the decrementer we immediately take another decrementer exception, and recurse. The fix is twofold. Firstly make sure we call DISABLE_INTS before calling RUNLATCH_ON. The badly named DISABLE_INTS actually reconciles the irq state in the paca with the hardware, making it safe again to call local_irq_save/restore(). Although that should be sufficient to fix the bug, we also mark the runlatch routines as notrace. They are called very early in the exception entry and we are asking for trouble tracing them. They are also fairly uninteresting and tracing them just adds unnecessary overhead. [ This regression was introduced by fe1952fc0afb9a2e4c79f103c08aef5d13db1873 "powerpc: Rework runlatch code" by myself --BenH ] Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-05-09powerpc/irq: Make alignment & program interrupt behave the sameBenjamin Herrenschmidt
Alignment was the last user of the ENABLE_INTS macro, which we can now remove. All non-syscall exceptions now disable interrupts on entry, they get re-enabled conditionally from C code. Don't unconditionally re-enable in program check either, check the original context. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-03-09powerpc: Rework lazy-interrupt handlingBenjamin Herrenschmidt
The current implementation of lazy interrupts handling has some issues that this tries to address. We don't do the various workarounds we need to do when re-enabling interrupts in some cases such as when returning from an interrupt and thus we may still lose or get delayed decrementer or doorbell interrupts. The current scheme also makes it much harder to handle the external "edge" interrupts provided by some BookE processors when using the EPR facility (External Proxy) and the Freescale Hypervisor. Additionally, we tend to keep interrupts hard disabled in a number of cases, such as decrementer interrupts, external interrupts, or when a masked decrementer interrupt is pending. This is sub-optimal. This is an attempt at fixing it all in one go by reworking the way we do the lazy interrupt disabling from the ground up. The base idea is to replace the "hard_enabled" field with a "irq_happened" field in which we store a bit mask of what interrupt occurred while soft-disabled. When re-enabling, either via arch_local_irq_restore() or when returning from an interrupt, we can now decide what to do by testing bits in that field. We then implement replaying of the missed interrupts either by re-using the existing exception frame (in exception exit case) or via the creation of a new one from an assembly trampoline (in the arch_local_irq_enable case). This removes the need to play with the decrementer to try to create fake interrupts, among others. In addition, this adds a few refinements: - We no longer hard disable decrementer interrupts that occur while soft-disabled. We now simply bump the decrementer back to max (on BookS) or leave it stopped (on BookE) and continue with hard interrupts enabled, which means that we'll potentially get better sample quality from performance monitor interrupts. - Timer, decrementer and doorbell interrupts now hard-enable shortly after removing the source of the interrupt, which means they no longer run entirely hard disabled. Again, this will improve perf sample quality. - On Book3E 64-bit, we now make the performance monitor interrupt act as an NMI like Book3S (the necessary C code for that to work appear to already be present in the FSL perf code, notably calling nmi_enter instead of irq_enter). (This also fixes a bug where BookE perfmon interrupts could clobber r14 ... oops) - We could make "masked" decrementer interrupts act as NMIs when doing timer-based perf sampling to improve the sample quality. Signed-off-by-yet: Benjamin Herrenschmidt <benh@kernel.crashing.org> --- v2: - Add hard-enable to decrementer, timer and doorbells - Fix CR clobber in masked irq handling on BookE - Make embedded perf interrupt act as an NMI - Add a PACA_HAPPENED_EE_EDGE for use by FSL if they want to retrigger an interrupt without preventing hard-enable v3: - Fix or vs. ori bug on Book3E - Fix enabling of interrupts for some exceptions on Book3E v4: - Fix resend of doorbells on return from interrupt on Book3E v5: - Rebased on top of my latest series, which involves some significant rework of some aspects of the patch. v6: - 32-bit compile fix - more compile fixes with various .config combos - factor out the asm code to soft-disable interrupts - remove the C wrapper around preempt_schedule_irq v7: - Fix a bug with hard irq state tracking on native power7
2012-03-09powerpc: Replace mfmsr instructions with load from PACA kernel_msr fieldBenjamin Herrenschmidt
On 64-bit, the mfmsr instruction can be quite slow, slower than loading a field from the cache-hot PACA, which happens to already contain the value we want in most cases. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-03-09powerpc: Improve behaviour of irq tracing on 64-bit exception entryBenjamin Herrenschmidt
Some exceptions would unconditionally disable interrupts on entry, which is fine, but calling lockdep every time not only adds more overhead than strictly needed, but also means we get quite a few "redudant" disable logged, which makes it hard to spot the really bad ones. So instead, split the macro used by the exception code into a normal one and a separate one used when CONFIG_TRACE_IRQFLAGS is enabled, and make the later skip th tracing if interrupts were already disabled. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-03-09powerpc: Rework runlatch codeBenjamin Herrenschmidt
This moves the inlines into system.h and changes the runlatch code to use the thread local flags (non-atomic) rather than the TIF flags (atomic) to keep track of the latch state. The code to turn it back on in an asynchronous interrupt is now simplified and partially inlined. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-03-09powerpc: Use the same interrupt prolog for perfmon as other interruptsBenjamin Herrenschmidt
The perfmon interrupt is the sole user of a special variant of the interrupt prolog which differs from the one used by external and timer interrupts in that it saves the non-volatile GPRs and doesn't turn the runlatch on. The former is unnecessary and the later is arguably incorrect, so let's clean that up by using the same prolog. While at it we rename that prolog to use the _ASYNC prefix. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-03-09powerpc: Remove legacy iSeries bits from assembly filesBenjamin Herrenschmidt
This removes the various bits of assembly in the kernel entry, exception handling and SLB management code that were specific to running under the legacy iSeries hypervisor which is no longer supported. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2011-07-12KVM: PPC: book3s_hv: Add support for PPC970-family processorsPaul Mackerras
This adds support for running KVM guests in supervisor mode on those PPC970 processors that have a usable hypervisor mode. Unfortunately, Apple G5 machines have supervisor mode disabled (MSR[HV] is forced to 1), but the YDL PowerStation does have a usable hypervisor mode. There are several differences between the PPC970 and POWER7 in how guests are managed. These differences are accommodated using the CPU_FTR_ARCH_201 (PPC970) and CPU_FTR_ARCH_206 (POWER7) CPU feature bits. Notably, on PPC970: * The LPCR, LPID or RMOR registers don't exist, and the functions of those registers are provided by bits in HID4 and one bit in HID0. * External interrupts can be directed to the hypervisor, but unlike POWER7 they are masked by MSR[EE] in non-hypervisor modes and use SRR0/1 not HSRR0/1. * There is no virtual RMA (VRMA) mode; the guest must use an RMO (real mode offset) area. * The TLB entries are not tagged with the LPID, so it is necessary to flush the whole TLB on partition switch. Furthermore, when switching partitions we have to ensure that no other CPU is executing the tlbie or tlbsync instructions in either the old or the new partition, otherwise undefined behaviour can occur. * The PMU has 8 counters (PMC registers) rather than 6. * The DSCR, PURR, SPURR, AMR, AMOR, UAMOR registers don't exist. * The SLB has 64 entries rather than 32. * There is no mediated external interrupt facility, so if we switch to a guest that has a virtual external interrupt pending but the guest has MSR[EE] = 0, we have to arrange to have an interrupt pending for it so that we can get control back once it re-enables interrupts. We do that by sending ourselves an IPI with smp_send_reschedule after hard-disabling interrupts. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>
2011-07-12KVM: PPC: Add support for Book3S processors in hypervisor modePaul Mackerras
This adds support for KVM running on 64-bit Book 3S processors, specifically POWER7, in hypervisor mode. Using hypervisor mode means that the guest can use the processor's supervisor mode. That means that the guest can execute privileged instructions and access privileged registers itself without trapping to the host. This gives excellent performance, but does mean that KVM cannot emulate a processor architecture other than the one that the hardware implements. This code assumes that the guest is running paravirtualized using the PAPR (Power Architecture Platform Requirements) interface, which is the interface that IBM's PowerVM hypervisor uses. That means that existing Linux distributions that run on IBM pSeries machines will also run under KVM without modification. In order to communicate the PAPR hypercalls to qemu, this adds a new KVM_EXIT_PAPR_HCALL exit code to include/linux/kvm.h. Currently the choice between book3s_hv support and book3s_pr support (i.e. the existing code, which runs the guest in user mode) has to be made at kernel configuration time, so a given kernel binary can only do one or the other. This new book3s_hv code doesn't support MMIO emulation at present. Since we are running paravirtualized guests, this isn't a serious restriction. With the guest running in supervisor mode, most exceptions go straight to the guest. We will never get data or instruction storage or segment interrupts, alignment interrupts, decrementer interrupts, program interrupts, single-step interrupts, etc., coming to the hypervisor from the guest. Therefore this introduces a new KVMTEST_NONHV macro for the exception entry path so that we don't have to do the KVM test on entry to those exception handlers. We do however get hypervisor decrementer, hypervisor data storage, hypervisor instruction storage, and hypervisor emulation assist interrupts, so we have to handle those. In hypervisor mode, real-mode accesses can access all of RAM, not just a limited amount. Therefore we put all the guest state in the vcpu.arch and use the shadow_vcpu in the PACA only for temporary scratch space. We allocate the vcpu with kzalloc rather than vzalloc, and we don't use anything in the kvmppc_vcpu_book3s struct, so we don't allocate it. We don't have a shared page with the guest, but we still need a kvm_vcpu_arch_shared struct to store the values of various registers, so we include one in the vcpu_arch struct. The POWER7 processor has a restriction that all threads in a core have to be in the same partition. MMU-on kernel code counts as a partition (partition 0), so we have to do a partition switch on every entry to and exit from the guest. At present we require the host and guest to run in single-thread mode because of this hardware restriction. This code allocates a hashed page table for the guest and initializes it with HPTEs for the guest's Virtual Real Memory Area (VRMA). We require that the guest memory is allocated using 16MB huge pages, in order to simplify the low-level memory management. This also means that we can get away without tracking paging activity in the host for now, since huge pages can't be paged or swapped. This also adds a few new exports needed by the book3s_hv code. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>
2011-07-12KVM: PPC: Split host-state fields out of kvmppc_book3s_shadow_vcpuPaul Mackerras
There are several fields in struct kvmppc_book3s_shadow_vcpu that temporarily store bits of host state while a guest is running, rather than anything relating to the particular guest or vcpu. This splits them out into a new kvmppc_host_state structure and modifies the definitions in asm-offsets.c to suit. On 32-bit, we have a kvmppc_host_state structure inside the kvmppc_book3s_shadow_vcpu since the assembly code needs to be able to get to them both with one pointer. On 64-bit they are separate fields in the PACA. This means that on 64-bit we don't need to copy the kvmppc_host_state in and out on vcpu load/unload, and in future will mean that the book3s_hv code doesn't need a shadow_vcpu struct in the PACA at all. That does mean that we have to be careful not to rely on any values persisting in the hstate field of the paca across any point where we could block or get preempted. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>
2011-07-12powerpc, KVM: Rework KVM checks in first-level interrupt handlersPaul Mackerras
Instead of branching out-of-line with the DO_KVM macro to check if we are in a KVM guest at the time of an interrupt, this moves the KVM check inline in the first-level interrupt handlers. This speeds up the non-KVM case and makes sure that none of the interrupt handlers are missing the check. Because the first-level interrupt handlers are now larger, some things had to be move out of line in exceptions-64s.S. This all necessitated some minor changes to the interrupt entry code in KVM. This also streamlines the book3s_32 KVM test. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>
2011-05-04powerpc: Save Come-From Address Register (CFAR) in exception framePaul Mackerras
Recent 64-bit server processors (POWER6 and POWER7) have a "Come-From Address Register" (CFAR), that records the address of the most recent branch or rfid (return from interrupt) instruction for debugging purposes. This saves the value of the CFAR in the exception entry code and stores it in the exception frame. We also make xmon print the CFAR value in its register dump code. Rather than extend the pt_regs struct at this time, we steal the orig_gpr3 field, which is only used for system calls, and use it for the CFAR value for all exceptions/interrupts other than system calls. This means we don't save the CFAR on system calls, which is not a great problem since system calls tend not to happen unexpectedly, and also avoids adding the overhead of reading the CFAR to the system call entry path. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2011-05-04powerpc: Save register r9-r13 values accurately on interrupt with bad stackPaul Mackerras
When we take an interrupt or exception from kernel mode and the stack pointer is obviously not a kernel address (i.e. the top bit is 0), we switch to an emergency stack, save register values and panic. However, on 64-bit server machines, we don't actually save the values of r9 - r13 at the time of the interrupt, but rather values corrupted by the exception entry code for r12-r13, and nothing at all for r9-r11. This fixes it by passing a pointer to the register save area in the paca through to the bad_stack code in r3. The register values are saved in one of the paca register save areas (depending on which exception this is). Using the pointer in r3, the bad_stack code now retrieves the saved values of r9 - r13 and stores them in the exception frame on the emergency stack. This also stores the normal exception frame marker ("regshere") in the exception frame. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2011-04-20powerpc: Always use SPRN_SPRG_HSCRATCH0 when running in HV modePaul Mackerras
This uses feature sections to arrange that we always use HSPRG1 as the scratch register in the interrupt entry code rather than SPRG2 when we're running in hypervisor mode on POWER7. This will ensure that we don't trash the guest's SPRG2 when we are running KVM guests. To simplify the code, we define GET_SCRATCH0() and SET_SCRATCH0() macros like the GET_PACA/SET_PACA macros. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2011-04-20powerpc: More work to support HV exceptionsBenjamin Herrenschmidt
Rework exception macros a bit to split offset from vector and add some basic support for HDEC, HDSI, HISI and a few more. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2011-04-20powerpc: Base support for exceptions using HSRR0/1Benjamin Herrenschmidt
Pass the register type to the prolog, also provides alternate "HV" version of hardware interrupt (0x500) and adjust LPES accordingly We tag those interrupts by setting bit 0x2 in the trap number Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2011-04-20powerpc: In HV mode, use HSPRG0 for PACABenjamin Herrenschmidt
When running in Hypervisor mode (arch 2.06 or later), we store the PACA in HSPRG0 instead of SPRG1. The architecture specifies that SPRGs may be lost during a "nap" power management operation (though they aren't currently on POWER7) and this enables use of SPRG1 by KVM guests. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2010-09-02powerpc: Account time using timebase rather than PURRPaul Mackerras
Currently, when CONFIG_VIRT_CPU_ACCOUNTING is enabled, we use the PURR register for measuring the user and system time used by processes, as well as other related times such as hardirq and softirq times. This turns out to be quite confusing for users because it means that a program will often be measured as taking less time when run on a multi-threaded processor (SMT2 or SMT4 mode) than it does when run on a single-threaded processor (ST mode), even though the program takes longer to finish. The discrepancy is accounted for as stolen time, which is also confusing, particularly when there are no other partitions running. This changes the accounting to use the timebase instead, meaning that the reported user and system times are the actual number of real-time seconds that the program was executing on the processor thread, regardless of which SMT mode the processor is in. Thus a program will generally show greater user and system times when run on a multi-threaded processor than on a single-threaded processor. On pSeries systems on POWER5 or later processors, we measure the stolen time (time when this partition wasn't running) using the hypervisor dispatch trace log. We check for new entries in the log on every entry from user mode and on every transition from kernel process context to soft or hard IRQ context (i.e. when account_system_vtime() gets called). So that we can correctly distinguish time stolen from user time and time stolen from system time, without having to check the log on every exit to user mode, we store separate timestamps for exit to user mode and entry from user mode. On systems that have a SPURR (POWER6 and POWER7), we read the SPURR in account_system_vtime() (as before), and then apportion the SPURR ticks since the last time we read it between scaled user time and scaled system time according to the relative proportions of user time and system time over the same interval. This avoids having to read the SPURR on every kernel entry and exit. On systems that have PURR but not SPURR (i.e., POWER5), we do the same using the PURR rather than the SPURR. This disables the DTL user interface in /sys/debug/kernel/powerpc/dtl for now since it conflicts with the use of the dispatch trace log by the time accounting code. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-11-05Make head_64.S aware of KVM real mode codeAlexander Graf
We need to run some KVM trampoline code in real mode. Unfortunately, real mode only covers 8MB on Cell so we need to squeeze ourselves as low as possible. Also, we need to trap interrupts to get us back from guest state to host state without telling Linux about it. This patch adds interrupt traps and includes the KVM code that requires real mode in the real mode parts of Linux. Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-08-20powerpc: Remove use of a second scratch SPRG in STAB codeBenjamin Herrenschmidt
The STAB code used on Power3 and RS/64 uses a second scratch SPRG to save a GPR in order to decide whether to go to do_stab_bolted_* or to handle a normal data access exception. This prevents our scheme of freeing SPRG3 which is user visible for user uses since we cannot use SPRG0 which, on RS/64, seems to be read-only for supervisor mode (like POWER4). This reworks the STAB exception entry to use the PACA as temporary storage instead. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-08-20powerpc: Use names rather than numbers for SPRGs (v2)Benjamin Herrenschmidt
The kernel uses SPRG registers for various purposes, typically in low level assembly code as scratch registers or to hold per-cpu global infos such as the PACA or the current thread_info pointer. We want to be able to easily shuffle the usage of those registers as some implementations have specific constraints realted to some of them, for example, some have userspace readable aliases, etc.. and the current choice isn't always the best. This patch should not change any code generation, and replaces the usage of SPRN_SPRGn everywhere in the kernel with a named replacement and adds documentation next to the definition of the names as to what those are used for on each processor family. The only parts that still use the original numbers are bits of KVM or suspend/resume code that just blindly needs to save/restore all the SPRGs. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-08-20powerpc: Rename exception.h to exception-64s.hBenjamin Herrenschmidt
The file include/asm/exception.h contains definitions that are specific to exception handling on 64-bit server type processors. This renames the file to exception-64s.h to reflect that fact and avoid confusion. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>