aboutsummaryrefslogtreecommitdiff
path: root/arch/arc
AgeCommit message (Collapse)Author
2013-11-13ARC: Incorrect mm reference used in vmalloc fault handlerVineet Gupta
commit 9c41f4eeb9d51f3ece20428d35a3ea32cf3b5622 upstream. A vmalloc fault needs to sync up PGD/PTE entry from init_mm to current task's "active_mm". ARC vmalloc fault handler however was using mm. A vmalloc fault for non user task context (actually pre-userland, from init thread's open for /dev/console) caused the handler to deref NULL mm (for mm->pgd) The reasons it worked so far is amazing: 1. By default (!SMP), vmalloc fault handler uses a cached value of PGD. In SMP that MMU register is repurposed hence need for mm pointer deref. 2. In pre-3.12 SMP kernel, the problem triggering vmalloc didn't exist in pre-userland code path - it was introduced with commit 20bafb3d23d108bc "n_tty: Move buffers into n_tty_data" Signed-off-by: Vineet Gupta <vgupta@synopsys.com> Cc: Gilad Ben-Yossef <gilad@benyossef.com> Cc: Noam Camus <noamc@ezchip.com> Cc: Peter Hurley <peter@hurleysoftware.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-10-18ARC: Ignore ptrace SETREGSET request for synthetic register "stop_pc"Vineet Gupta
commit 5b24282846c064ee90d40fcb3a8f63b8e754fd28 upstream. ARCompact TRAP_S insn used for breakpoints, commits before exception is taken (updating architectural PC). So ptregs->ret contains next-PC and not the breakpoint PC itself. This is different from other restartable exceptions such as TLB Miss where ptregs->ret has exact faulting PC. gdb needs to know exact-PC hence ARC ptrace GETREGSET provides for @stop_pc which returns ptregs->ret vs. EFA depending on the situation. However, writing stop_pc (SETREGSET request), which updates ptregs->ret doesn't makes sense stop_pc doesn't always correspond to that reg as described above. This was not an issue so far since user_regs->ret / user_regs->stop_pc had same value and both writing to ptregs->ret was OK, needless, but NOT broken, hence not observed. With gdb "jump", they diverge, and user_regs->ret updating ptregs is overwritten immediately with stop_pc, which this patch fixes. Reported-by: Anton Kolesov <akolesov@synopsys.com> Signed-off-by: Vineet Gupta <vgupta@synopsys.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-10-18ARC: Fix signal frame management for SA_SIGINFOChristian Ruppert
commit 10469350e345599dfef3fa78a7c19fb230e674c1 upstream. Previously, when a signal was registered with SA_SIGINFO, parameters 2 and 3 of the signal handler were written to registers r1 and r2 before the register set was saved. This led to corruption of these two registers after returning from the signal handler (the wrong values were restored). With this patch, registers are now saved before any parameters are passed, thus maintaining the processor state from before signal entry. Signed-off-by: Christian Ruppert <christian.ruppert@abilis.com> Signed-off-by: Vineet Gupta <vgupta@synopsys.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-10-18ARC: Workaround spinlock livelock in SMP SystemC simulationVineet Gupta
commit 6c00350b573c0bd3635436e43e8696951dd6e1b6 upstream. Some ARC SMP systems lack native atomic R-M-W (LLOCK/SCOND) insns and can only use atomic EX insn (reg with mem) to build higher level R-M-W primitives. This includes a SystemC based SMP simulation model. So rwlocks need to use a protecting spinlock for atomic cmp-n-exchange operation to update reader(s)/writer count. The spinlock operation itself looks as follows: mov reg, 1 ; 1=locked, 0=unlocked retry: EX reg, [lock] ; load existing, store 1, atomically BREQ reg, 1, rety ; if already locked, retry In single-threaded simulation, SystemC alternates between the 2 cores with "N" insn each based scheduling. Additionally for insn with global side effect, such as EX writing to shared mem, a core switch is enforced too. Given that, 2 cores doing a repeated EX on same location, Linux often got into a livelock e.g. when both cores were fiddling with tasklist lock (gdbserver / hackbench) for read/write respectively as the sequence diagram below shows: core1 core2 -------- -------- 1. spin lock [EX r=0, w=1] - LOCKED 2. rwlock(Read) - LOCKED 3. spin unlock [ST 0] - UNLOCKED spin lock [EX r=0,w=1] - LOCKED -- resched core 1---- 5. spin lock [EX r=1] - ALREADY-LOCKED -- resched core 2---- 6. rwlock(Write) - READER-LOCKED 7. spin unlock [ST 0] 8. rwlock failed, retry again 9. spin lock [EX r=0, w=1] -- resched core 1---- 10 spinlock locked in #9, retry #5 11. spin lock [EX gets 1] -- resched core 2---- ... ... The fix was to unlock using the EX insn too (step 7), to trigger another SystemC scheduling pass which would let core1 proceed, eliding the livelock. Signed-off-by: Vineet Gupta <vgupta@synopsys.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-10-18ARC: Fix 32-bit wrap around in access_ok()Vineet Gupta
commit 0752adfda15f0eca9859a76da3db1800e129ad43 upstream. Anton reported | LTP tests syscalls/process_vm_readv01 and process_vm_writev01 fail | similarly in one testcase test_iov_invalid -> lvec->iov_base. | Testcase expects errno EFAULT and return code -1, | but it gets return code 1 and ERRNO is 0 what means success. Essentially test case was passing a pointer of -1 which access_ok() was not catching. It was doing [@addr + @sz <= TASK_SIZE] which would pass for @addr == -1 Fixed that by rewriting as [@addr <= TASK_SIZE - @sz] Reported-by: Anton Kolesov <Anton.Kolesov@synopsys.com> Signed-off-by: Vineet Gupta <vgupta@synopsys.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-10-18ARC: Handle zero-overhead-loop in unaligned access handlerMischa Jonker
commit c11eb222fd7d4db91196121dbf854178505d2751 upstream. If a load or store is the last instruction in a zero-overhead-loop, and it's misaligned, the loop would execute only once. This fixes that problem. Signed-off-by: Mischa Jonker <mjonker@synopsys.com> Signed-off-by: Vineet Gupta <vgupta@synopsys.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-10-18ARC: Fix __udelay calculationMischa Jonker
commit 7efd0da2d17360e1cef91507dbe619db0ee2c691 upstream. Cast usecs to u64, to ensure that the (usecs * 4295 * HZ) multiplication is 64 bit. Initially, the (usecs * 4295 * HZ) part was done as a 32 bit multiplication, with the result casted to 64 bit. This led to some bits falling off, causing a "DMA initialization error" in the stmmac Ethernet driver, due to a premature timeout. Signed-off-by: Mischa Jonker <mjonker@synopsys.com> Signed-off-by: Vineet Gupta <vgupta@synopsys.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-10-18ARC: SMP failed to boot due to missing IVT setupNoam Camus
commit c3567f8a359b7917dcffa442301f88ed0a75211f upstream. Commit 05b016ecf5e7a "ARC: Setup Vector Table Base in early boot" moved the Interrupt vector Table setup out of arc_init_IRQ() which is called for all CPUs, to entry point of boot cpu only, breaking booting of others. Fix by adding the same to entry point of non-boot CPUs too. read_arc_build_cfg_regs() printing IVT Base Register didn't help the casue since it prints a synthetic value if zero which is totally bogus, so fix that to print the exact Register. [vgupta: Remove the now stale comment from header of arc_init_IRQ and also added the commentary for halt-on-reset] Cc: Gilad Ben-Yossef <gilad@benyossef.com> Signed-off-by: Noam Camus <noamc@ezchip.com> Signed-off-by: Vineet Gupta <vgupta@synopsys.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-10-18ARC: Setup Vector Table Base in early bootVineet Gupta
commit 05b016ecf5e7a8c24409d8e9effb5d2ec9107708 upstream. Otherwise early boot exceptions such as instructions errors due to configuration mismatch between kernel and hardware go off to la-la land, as opposed to hitting the handler and panic()'ing properly. Signed-off-by: Vineet Gupta <vgupta@synopsys.com> Cc: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-08-29ARC: [lib] strchr breakage in Big-endian configurationJoern Rennecke
commit b0f55f2a1a295c364be012e82dbab079a2454006 upstream. For a search buffer, 2 byte aligned, strchr() was returning pointer outside of buffer (buf - 1) ------------->8---------------- // Input buffer (default 4 byte aigned) char *buffer = "1AA_"; // actual search start (to mimick 2 byte alignment) char *current_line = &(buffer[2]); // Character to search for char c = 'A'; char *c_pos = strchr(current_line, c); printf("%s\n", c_pos) --> 'AA_' as oppose to 'A_' ------------->8---------------- Reported-by: Anton Kolesov <Anton.Kolesov@synopsys.com> Debugged-by: Anton Kolesov <Anton.Kolesov@synopsys.com> Cc: Noam Camus <noamc@ezchip.com> Signed-off-by: Joern Rennecke <joern.rennecke@embecosm.com> Signed-off-by: Vineet Gupta <vgupta@synopsys.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-08-29ARC: gdbserver breakage in Big-Endian configuration #2Vineet Gupta
[Based on mainline commit 352c1d95e3220d0: "ARC: stop using pt_regs->orig_r8"] Stop using orig_r8 as it could get clobbered by ST in trap_with_param, and further it is semantically not needed either. Signed-off-by: Vineet Gupta <vgupta@synopsys.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-08-29ARC: gdbserver breakage in Big-Endian configuration #1Vineet Gupta
[Based on mainline commit 502a0c775c7f0a: "ARC: pt_regs update #5"] gdbserver needs @stop_pc, served by ptrace, but fetched from pt_regs differently, based on in_brkpt_traps(), which in turn relies on additional machine state in pt_regs->event bitfield. unsigned long orig_r8:16, event:16; For big endian config, this macro was returning false, despite being in breakpoint Trap exception, causing wrong @stop_pc to be returned to gdb. Issue #1: In BE, @event above is at offset 2 in word, while a STW insn at offset 0 was used to update it. Resort to using ST insn which updates the half-word at right location. Issue #2: The union involving bitfields causes all the members to be laid out at offset 0. So with fix #1 above, ASM was now updating at offset 2, "C" code was still referencing at offset 0. Fixed by wrapping bitfield in a struct. Reported-by: Noam Camus <noamc@ezchip.com> Tested-by: Anton Kolesov <akolesov@synopsys.com> Signed-off-by: Vineet Gupta <vgupta@synopsys.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-05-25ARC: lazy dcache flush broke gdb in non-aliasing configsVineet Gupta
gdbserver inserting a breakpoint ends up calling copy_user_page() for a code page. The generic version of which (non-aliasing config) didn't set the PG_arch_1 bit hence update_mmu_cache() didn't sync dcache/icache for corresponding dynamic loader code page - causing garbade to be executed. So now aliasing versions of copy_user_highpage()/clear_page() are made default. There is no significant overhead since all of special alias handling code is compiled out for non-aliasing build Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2013-05-23ARC: Use enough bits for determining page's cache colorVineet Gupta
The current code uses 2 bits for determining page's dcache color, thus sorting pages into 4 bins, whereas the aliasing dcache really has 2 bins (8k page, 64k dcache - 4 way-set-assoc). This can cause extraneous flushes - e.g. color 0 and 2. Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2013-05-23ARC: Brown paper bag bug in macro for checking cache colorVineet Gupta
The VM_EXEC check in update_mmu_cache() was getting optimized away because of a stupid error in definition of macro addr_not_cache_congruent() The intention was to have the equivalent of following: if (a || (1 ? b : 0)) but we ended up with following: if (a || 1 ? b : 0) And because precedence of '||' is more that that of '?', gcc was optimizing away evaluation of <a> Nasty Repercussions: 1. For non-aliasing configs it would mean some extraneous dcache flushes for non-code pages if U/K mappings were not congruent. 2. For aliasing config, some needed dcache flush for code pages might be missed if U/K mappings were congruent. Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2013-05-23ARC: copy_(to|from)_user() to honor usermode-access permissionsVineet Gupta
This manifested as grep failing psuedo-randomly: -------------->8--------------------- [ARCLinux]$ ip address show lo | grep inet [ARCLinux]$ ip address show lo | grep inet [ARCLinux]$ ip address show lo | grep inet [ARCLinux]$ [ARCLinux]$ ip address show lo | grep inet inet 127.0.0.1/8 scope host lo -------------->8--------------------- ARC700 MMU provides fully orthogonal permission bits per page: Ur, Uw, Ux, Kr, Kw, Kx The user mode page permission templates used to have all Kernel mode access bits enabled. This caused a tricky race condition observed with uClibc buffered file read and UNIX pipes. 1. Read access to an anon mapped page in libc .bss: write-protected zero_page mapped: TLB Entry installed with Ur + K[rwx] 2. grep calls libc:getc() -> buffered read layer calls read(2) with the internal read buffer in same .bss page. The read() call is on STDIN which has been redirected to a pipe. read(2) => sys_read() => pipe_read() => copy_to_user() 3. Since page has Kernel-write permission (despite being user-mode write-protected), copy_to_user() suceeds w/o taking a MMU TLB-Miss Exception (page-fault for ARC). core-MM is unaware that kernel erroneously wrote to the reserved read-only zero-page (BUG #1) 4. Control returns to userspace which now does a write to same .bss page Since Linux MM is not aware that page has been modified by kernel, it simply reassigns a new writable zero-init page to mapping, loosing the prior write by kernel - effectively zero'ing out the libc read buffer under the hood - hence grep doesn't see right data (BUG #2) The fix is to make all kernel-mode access permissions mirror the user-mode ones. Note that the kernel still has full access to pages, when accessed directly (w/o MMU) - this fix ensures that kernel-mode access in copy_to_from() path uses the same faulting access model as for pure user accesses to keep MM fully aware of page state. The issue is peudo-random because it only shows up if the TLB entry installed in #1 is present at the time of #3. If it is evicted out, due to TLB pressure or some-such, then copy_to_user() does take a TLB Miss Exception, with a routine write-to-anon COW processing installing a fresh page for kernel writes and also usable as it is in userspace. Further the issue was dormant for so long as it depends on where the libc internal read buffer (in .bss) is mapped at runtime. If it happens to reside in file-backed data mapping of libc (in the page-aligned slack space trailing the file backed data), loader zero padding the slack space, does the early cow page replacement, setting things up at the very beginning itself. With gcc 4.8 based builds, the libc buffer got pushed out to a real anon mapping which triggers the issue. Reported-by: Anton Kolesov <akolesov@synopsys.com> Cc: <stable@vger.kernel.org> # 3.9 Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2013-05-23ARC: [mm] Prevent stray dcache lines after__sync_icache_dcach()Vineet Gupta
Flush and INVALIDATE the dcache page. This helper is only used for writeback of CODE pages to memory. So there's no value in keeping the dcache lines around. Infact it is risky as a writeback on natural eviction under pressure can cause un-needed writeback with weird issues on aliasing dcache configurations. Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2013-05-15ARC: [TB10x] Remove redundant abilis,simple-pinctrl mechanismChristian Ruppert
The TB10x platform port includes a custom mechanism using to set up default pin controller configurations using abilis,simple-default pin configurations of nodes compatible with abilis,simple-pinctrl. This mechanism is redundant with the Linux standard "default" pin configuration, see commit ab78029ecc347debbd737f06688d788bd9d60c1d "drivers/pinctrl: grab default handles from device core". This patch removes the TB10x custom mechanism in favour of the Linux standard. Signed-off-by: Christian Ruppert <christian.ruppert@abilis.com> Reviewed-by: Stephen Warren <swarren@nvidia.com> Reviewed-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2013-05-10Merge tag 'arc-v3.10-rc1-part2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc Pull second set of arc arch updates from Vineet Gupta: "Aliasing VIPT dcache support for ARC I'm satisified with testing, specially with fuse which has historically given grief to VIPT arches (ARM/PARISC...)" * tag 'arc-v3.10-rc1-part2' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc: ARC: [TB10x] Remove GENERIC_GPIO ARC: [mm] Aliasing VIPT dcache support 4/4 ARC: [mm] Aliasing VIPT dcache support 3/4 ARC: [mm] Aliasing VIPT dcache support 2/4 ARC: [mm] Aliasing VIPT dcache support 1/4 ARC: [mm] refactor the core (i|d)cache line ops loops ARC: [mm] serious bug in vaddr based icache flush
2013-05-10ARC: [TB10x] Remove GENERIC_GPIOVineet Gupta
This tracks Alexandre Courbot's mainline GPIO rework Signed-off-by: Vineet Gupta <vgupta@synopsys.com> Acked-by: Alexandre Courbot <acourbot@nvidia.com>
2013-05-09Merge tag 'arc-v3.10-rc1-part1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc Pull ARC port updates from Vineet Gupta: "Support for two new platforms based on ARC700: - Abilis TB10x SoC [Chritisian/Pierrick] - Simulator only System-C Model [Mischa] ARC specific MM improvements: - Avoid full TLB flush (ASID increment) on munmap (even single page) - VIPT Cache Flushing improvements + Delayed dcache flush for non-aliasing dcache (big performance boost) + icache flush aliasing agnostic (no need to kill all possible aliases) Others: - Avoid needless rebuild of DTB files for every kernel build - Remove builtin cmdline as that is already provided by DeviceTree/bootargs - Fixing unaligned access emulation corner case - checkpatch fixes [Sachin] - Various fixlets [Noam] - Minor build failures/cleanups" * tag 'arc-v3.10-rc1-part1' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc: (35 commits) ARC: [mm] Lazy D-cache flush (non aliasing VIPT) ARC: [mm] micro-optimize page size icache invalidate ARC: [mm] remove the pessimistic all-alias-invalidate icache helpers ARC: [mm] consolidate icache/dcache sync code ARC: [mm] optimise icache flush for kernel mappings ARC: [mm] optimise icache flush for user mappings ARC: [mm] optimize needless full mm TLB flush on munmap ARC: Add support for nSIM OSCI System C model ARC: [TB10x] Adapt device tree to new compatible string ARC: [TB10x] Add support for TB10x platform ARC: [TB10x] Device tree of TB100 and TB101 Development Kits ARC: Prepare interrupt code for external controllers ARC: Allow embedded arc-intc to be properly placed in DT intc hierarchy ARC: [cmdline] Don't overwrite u-boot provided bootargs ARC: [cmdline] Remove CONFIG_CMDLINE ARC: [plat-arcfpga] defconfig update ARC: unaligned access emulation broken if callee-reg dest of LD/ST ARC: unaligned access emulation error handling consolidation ARC: Debug/crash-printing Improvements ARC: fix typo with clock speed ...
2013-05-09ARC: [mm] Aliasing VIPT dcache support 4/4Vineet Gupta
Enforce congruency of userspace shared mappings Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2013-05-09ARC: [mm] Aliasing VIPT dcache support 3/4Vineet Gupta
Fix the one zillion warnings Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2013-05-09ARC: [mm] Aliasing VIPT dcache support 2/4Vineet Gupta
This is the meat of the series which prevents any dcache alias creation by always keeping the U and K mapping of a page congruent. If a mapping already exists, and other tries to access the page, prev one is flushed to physical page (wback+inv) Essentially flush_dcache_page()/copy_user_highpage() create K-mapping of a page, but try to defer flushing, unless U-mapping exist. When page is actually mapped to userspace, update_mmu_cache() flushes the K-mapping (in certain cases this can be optimised out) Additonally flush_cache_mm(), flush_cache_range(), flush_cache_page() handle the puring of stale userspace mappings on exit/munmap... flush_anon_page() handles the existing U-mapping for anon page before kernel reads it via the GUP path. Note that while not complete, this is enough to boot a simple dynamically linked Busybox based rootfs Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2013-05-09ARC: [mm] Aliasing VIPT dcache support 1/4Vineet Gupta
This preps the low level dcache flush helpers to take vaddr argument in addition to the existing paddr to properly flush the VIPT dcache Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2013-05-09ARC: [mm] refactor the core (i|d)cache line ops loopsVineet Gupta
Nothing semantical * simplify the alignement code by using & operation only * rename variables clearly as paddr Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2013-05-09ARC: [mm] serious bug in vaddr based icache flushVineet Gupta
vaddr used to index the cache was clipped from the wrong end, and thus would potentially fail to flush the correct lines. The problem was dorment for so long because up until the recent optimizations it was only used for ptrace break-point only flushes. Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2013-05-07ARC: [mm] Lazy D-cache flush (non aliasing VIPT)Vineet Gupta
flush_dcache_page( ) is MM hook to ensure that a page has consistent views between kernel and userspace. Thus it is called when * kernel writes to a page which at some later point could get mapped to userspace (so kernel mapping needs to be flushed-n-inv) * kernel is about to read from a page with possible userspace mappings (so userspace mappings needs to be made coherent with kernel ones) However for Non aliasing VIPT dcache, any userspace mapping will always be congruent to kernel mapping. Thus d-cache need need not be flushed at all (or delayed indefinitely). The only reason it does need to be flushed is when mapping code pages. Since icache doesn't snoop dcache, those dirty dcache lines need to be written back to memory and icache line invalidated so that icache lines fetch will get the right data. Decent gains on LMBench fork/exec/sh and File I/O micro-benchmarks. (1) FPGA @ 80 MHZ Processor, Processes - times in microseconds - smaller is better ------------------------------------------------------------------------------ Host OS Mhz null null open slct sig sig fork exec sh call I/O stat clos TCP inst hndl proc proc proc --------- ------------- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- 3.9-rc6-a Linux 3.9.0-r 80 4.79 8.72 66.7 116. 239. 8.39 30.4 4798 14.K 34.K 3.9-rc6-b Linux 3.9.0-r 80 4.79 8.62 65.4 111. 239. 8.35 29.0 3995 12.K 30.K 3.9-rc7-c Linux 3.9.0-r 80 4.79 9.00 66.1 106. 239. 8.61 30.4 2858 10.K 24.K ^^^^ ^^^^ ^^^ File & VM system latencies in microseconds - smaller is better ------------------------------------------------------------------------------- Host OS 0K File 10K File Mmap Prot Page 100fd Create Delete Create Delete Latency Fault Fault selct --------- ------------- ------ ------ ------ ------ ------- ----- ------- ----- 3.9-rc6-a Linux 3.9.0-r 317.8 204.2 1122.3 375.1 3522.0 4.288 20.7 126.8 3.9-rc6-b Linux 3.9.0-r 298.7 223.0 1141.6 367.8 3531.0 4.866 20.9 126.4 3.9-rc7-c Linux 3.9.0-r 278.4 179.2 862.1 339.3 3705.0 3.223 20.3 126.6 ^^^^^ ^^^^^ ^^^^^ ^^^^ (2) Customer Silicon @ 500 MHz (166 MHz mem) ------------------------------------------------------------------------------ Host OS Mhz null null open slct sig sig fork exec sh call I/O stat clos TCP inst hndl proc proc proc --------- ------------- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- abilis-ba Linux 3.9.0-r 497 0.71 1.38 4.58 12.0 35.5 1.40 3.89 2070 5525 13.K abilis-ca Linux 3.9.0-r 497 0.71 1.40 4.61 11.8 35.6 1.37 3.92 1411 4317 10.K ^^^^ ^^^^ ^^^ Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2013-05-07ARC: [mm] micro-optimize page size icache invalidateVineet Gupta
start address is already page aligned and size is const PAGE_SIZE, thus fixups for alignment not needed in generated code. bloat-o-meter vmlinux-mm5 vmlinux add/remove: 0/0 grow/shrink: 0/1 up/down: 0/-32 (-32) function old new delta __inv_icache_page 82 50 -32 Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2013-05-07ARC: [mm] remove the pessimistic all-alias-invalidate icache helpersVineet Gupta
No users of this code anymore - so RIP ! Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2013-05-07ARC: [mm] consolidate icache/dcache sync codeVineet Gupta
Now that we have same helper used for all icache invalidates (i.e. vaddr+paddr based exact line invalidate), consolidate the open coded calls into one place. Also rename flush_icache_range_vaddr => __sync_icache_dcache Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2013-05-07ARC: [mm] optimise icache flush for kernel mappingsVineet Gupta
This change continues the theme from prev commit - this time icache handling for kernel's own code modification (vmalloc: loadable modules, breakpoints for kprobes/kgdb...) flush_icache_range() calls the CDU icache helper with vaddr to enable exact line invalidate. For a true kernel-virtual mapping, the vaddr is actually virtual hence valid as index into cache. For kprobes breakpoint however, the vaddr arg is actually paddr - since that's how normal kernel is mapped in ARC memory map. This implies that CDU will use the same addr for indexing as for tag match - which is fine since kernel code would only have that "implicit" mapping and none other. This should speed up module loading significantly - specially on default ARC700 icache configurations (32k) which alias. Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2013-05-07ARC: [mm] optimise icache flush for user mappingsVineet Gupta
ARC icache doesn't snoop dcache thus executable pages need to be made coherent before mapping into userspace in flush_icache_page(). However ARC700 CDU (hardware cache flush module) requires both vaddr (index in cache) as well as paddr (tag match) to correctly identify a line in the VIPT cache. A typical ARC700 SoC has aliasing icache, thus the paddr only based flush_icache_page() API couldn't be implemented efficiently. It had to loop thru all possible alias indexes and perform the invalidate operation (ofcourse the cache op would only succeed at the index(es) where tag matches - typically only 1, but the cost of visiting all the cache-bins needs to paid nevertheless). Turns out however that the vaddr (along with paddr) is available in update_mmu_cache() hence better suits ARC icache flush semantics. With both vaddr+paddr, exactly one flush operation per line is done. Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2013-05-07ARC: [mm] optimize needless full mm TLB flush on munmapVineet Gupta
munmap ends up calling tlb_flush() which for ARC was flushing the entire TLB unconditionally (by moving the MMU to a new ASID) do_munmap unmap_region unmap_vmas unmap_single_vma unmap_page_range tlb_start_vma zap_pud_range tlb_end_vma() tlb_finish_mmu tlb_flush() ---> unconditional flush_tlb_mm() So even a single page munmap, a frequent operation when uClibc dynamic linker (ldso) is loading the dependent shared libraries, would move the the ASID multiple times - needlessly invalidating the pre-faulted TLB entries (and increasing the rate of ASID wraparound + full TLB flush). This is now optimised to only be called if tlb->full_mm (which means for exit/execve) cases only. And for those cases, flush_tlb_mm() is already optimised to be a no-op for mm->mm_users == 0. So essentially there are no mmore full mm flushes - except for fork which anyhow needs it for properly COW'ing parent address space. munmap now needs to do TLB range flush, which is implemented with tlb_end_vma() Results ------- 1. ASID now consistenly moves by 4 during a simple ls (as opposed to 5 or 7 before). 2. LMBench microbenchmark also shows improvements Basic system parameters ------------------------------------------------------------------------------ Host OS Description Mhz tlb cache mem scal pages line par load bytes --------- ------------- ----------------------- ---- ----- ----- ------ ---- 3.9-rc5-0 Linux 3.9.0-r 3.9-rc5-0404-gcc-4.4-ba 80 8 64 1.1000 1 3.9-rc5-0 Linux 3.9.0-r 3.9-rc5-0405-avoid-full 80 8 64 1.1200 1 Processor, Processes - times in microseconds - smaller is better ------------------------------------------------------------------------------ Host OS Mhz null null open slct sig sig fork exec sh call I/O stat clos TCP inst hndl proc proc proc --------- ------------- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- 3.9-rc5-0 Linux 3.9.0-r 80 4.81 8.69 68.6 118. 239. 8.53 31.6 4839 13.K 34.K 3.9-rc5-0 Linux 3.9.0-r 80 4.46 8.36 53.8 91.3 223. 8.12 24.2 4725 13.K 33.K File & VM system latencies in microseconds - smaller is better ------------------------------------------------------------------------------- Host OS 0K File 10K File Mmap Prot Page 100fd Create Delete Create Delete Latency Fault Fault selct --------- ------------- ------ ------ ------ ------ ------- ----- ------- ----- 3.9-rc5-0 Linux 3.9.0-r 314.7 223.2 1054.9 390.2 3615.0 1.590 20.1 126.6 3.9-rc5-0 Linux 3.9.0-r 265.8 183.8 1014.2 314.1 3193.0 6.910 18.8 110.4 Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2013-05-07ARC: Add support for nSIM OSCI System C modelMischa Jonker
This adds support for an ARC Virtual Platform. This platform is based on the System C standard promoted by the OSCI (Open System C Initiative) and uses nSIM to simulate the ARC CPU core itself. Users can build a virtual SoC by combining System C models of peripherals and CPU cores. Signed-off-by: Mischa Jonker <mjonker@synopsys.com> Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2013-05-07ARC: [TB10x] Adapt device tree to new compatible stringChristian Ruppert
The original device tree was written using a slightly different implementation of the fixed-factor-clock device tree binding. The compatible string must be modified in order to be compatible with the new implementation. Signed-off-by: Christian Ruppert <christian.ruppert@abilis.com> Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2013-05-07ARC: [TB10x] Add support for TB10x platformChristian Ruppert
Infrastructure required to make the Linux kernel compile and boot on the Abilis Systems TB10x series of SOCs based on ARC700 CPUs: - Kmake related files (Kconfig, Makefile, tb10x_defconfig) - TB10x platform initialisation Signed-off-by: Christian Ruppert <christian.ruppert@abilis.com> Signed-off-by: Pierrick Hascoet <pierrick.hascoet@abilis.com> Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2013-05-07ARC: [TB10x] Device tree of TB100 and TB101 Development KitsChristian Ruppert
These are the device tree files for the Abilis Systems TB100 and TB101 ICs and their respective development kit PCBs. These files are committed in preparation of the following patch set which adds support for these chips to the ARC platform. Signed-off-by: Christian Ruppert <christian.ruppert@abilis.com> Signed-off-by: Pierrick Hascoet <pierrick.hascoet@abilis.com> Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2013-05-07ARC: Prepare interrupt code for external controllersChristian Ruppert
This patch adds some room for CPU-external interrupt controllers in the Linux interrupt space. Until now, only the 32 CPU internal interrupt lines were supported which does not allow for external interrupt controllers such as GPIO modules etc. Signed-off-by: Christian Ruppert <christian.ruppert@abilis.com> Signed-off-by: Pierrick Hascoet <pierrick.hascoet@abilis.com> Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2013-05-07ARC: Allow embedded arc-intc to be properly placed in DT intc hierarchyVineet Gupta
arc-intc is initialized in arc common code as it is applicable to all platforms. However platforms with their own external intc still need to refer to it for correct DT interrupt tree hierarchy setup, e.g. static struct of_device_id __initdata tb10x_irq_ids[] = { { .compatible = "snps,arc700-intc", .data = dummy_init_irq }, { .compatible = "abilis,tb10x_ictl", .data = tb10x_init_irq }, {}, }; The fix is to use the generic irqchip framework to tie all irqchips in a special linker section and then call irqchip_init() which calls the DT of_irq_init() for all the intc in one go. That way the platform code need not be aware of arc-intc at all. Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2013-05-07ARC: [cmdline] Don't overwrite u-boot provided bootargsVineet Gupta
The existing code was wrong on several counts: * uboot provided bootargs were copied into @boot_command_line, only to be over-written by setup_machine_fdt(), effectively lost * @cmdline_p returned by setup_arch() to start_kernel() didn't include the DT /bootargs Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2013-05-07ARC: [cmdline] Remove CONFIG_CMDLINEVineet Gupta
Given that DeviceTree /bootargs can provide similar functionality, no point in providing duplicate infrastructure. Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2013-05-07ARC: [plat-arcfpga] defconfig updateVineet Gupta
* Allow initramfs path to be symlink * CONFIG_PREEMPT be default Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2013-05-07ARC: unaligned access emulation broken if callee-reg dest of LD/STVineet Gupta
The fixup code correctly updates the callee-regs on stack, but fails to unwind it into actual register file. Thus userspace won't see the update. Reported-by: Noam Camus <noamc@ezchip.com> Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2013-05-07ARC: unaligned access emulation error handling consolidationVineet Gupta
If CONFIG_ARC_MISALIGN_ACCESS is not enabled, or if the fixup fails, call the same error handler: same signal/si_code to user (SIGBUS) Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2013-05-07ARC: Debug/crash-printing ImprovementsVineet Gupta
* Remove the line-break between scratch/callee-regs (sneaked in when we converted from printk to pr_* * Use %pS to print the symbol names of faulting PC (ret pseudo register) and BLINK (call return register) * Don't print user-vma for a kernel crash (only do it for print-fatal-signals based regfile dump) * Verbose print the Interrupt/Exception Enable/Active state * for main executable link address is 0x10000 based (vs. 0) thus offset of faulting PC needs to be adjusted Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2013-05-07ARC: fix typo with clock speedNoam Camus
Signed-off-by: Noam Camus <noamc@ezchip.com> Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2013-05-07ARC: Respect the cpu_id passed for fetching correct cpu infoNoam Camus
Signed-off-by: Noam Camus <noamc@ezchip.com> Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2013-05-07ARC: Remove non existent refs to GENERIC_KERNEL_EXECVE & GENERIC_KERNEL_THREADAlexander Shiyan
This tracks mainline commit ae903caae267 "Bury the conditionals from kernel_thread/kernel_execve series" which we missed out as ARC port was not yet mainline. [vgupta: commit log modified] Signed-off-by: Alexander Shiyan <shc_work@mail.ru> Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2013-04-30arc, print-fatal-signals: reduce duplicated informationVineet Gupta
After the recent generic debug info on dump_stack() and friends, arc is printing duplicate information on debug dumps. [ARCLinux]$ ./crash crash/50: potentially unexpected fatal signal 11. <-- [1] /sbin/crash, TGID 50 <-- [2] Pid: 50, comm: crash Not tainted 3.9.0-rc4+ #132 <-- [3] ... Remove them. [tj@kernel.org: updated patch desc] Signed-off-by: Vineet Gupta <vgupta@synopsys.com> Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: David S. Miller <davem@davemloft.net> Cc: Fengguang Wu <fengguang.wu@intel.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Jesper Nilsson <jesper.nilsson@axis.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Mike Frysinger <vapier@gentoo.org> Cc: Sam Ravnborg <sam@ravnborg.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>