aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2014-02-13ACPI / init: Flag use of ACPI and ACPI idioms for power supplies to ↵Mark Brown
regulator API commit 49a12877d2777cadcb838981c3c4f5a424aef310 upstream. There is currently no facility in ACPI to express the hookup of voltage regulators, the expectation is that the regulators that exist in the system will be handled transparently by firmware if they need software control at all. This means that if for some reason the regulator API is enabled on such a system it should assume that any supplies that devices need are provided by the system at all relevant times without any software intervention. Tell the regulator core to make this assumption by calling regulator_has_full_constraints(). Do this as soon as we know we are using ACPI so that the information is available to the regulator core as early as possible. This will cause the regulator core to pretend that there is an always on regulator supplying any supply that is requested but that has not otherwise been mapped which is the behaviour expected on a system with ACPI. Should the ability to specify regulators be added in future revisions of ACPI then once we have support for ACPI mappings in the kernel the same assumptions will apply. It is also likely that systems will default to a mode of operation which does not require any interpretation of these mappings in order to be compatible with existing operating system releases so it should remain safe to make these assumptions even if the mappings exist but are not supported by the kernel. Signed-off-by: Mark Brown <broonie@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-02-13turbostat: Use GCC's CPUID functions to support PICJosh Triplett
commit 2b92865e648ce04a39fda4f903784a5d01ecb0dc upstream. turbostat uses inline assembly to call cpuid. On 32-bit x86, on systems that have certain security features enabled by default that make -fPIC the default, this causes a build error: turbostat.c: In function ‘check_cpuid’: turbostat.c:1906:2: error: PIC register clobbered by ‘ebx’ in ‘asm’ asm("cpuid" : "=a" (fms), "=c" (ecx), "=d" (edx) : "a" (1) : "ebx"); ^ GCC provides a header cpuid.h, containing a __get_cpuid function that works with both PIC and non-PIC. (On PIC, it saves and restores ebx around the cpuid instruction.) Use that instead. Signed-off-by: Josh Triplett <josh@joshtriplett.org> Signed-off-by: Len Brown <len.brown@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-02-13turbostat: Don't put unprocessed uapi headers in the include pathJosh Triplett
commit b731f3119de57144e16c19fd593b8daeb637843e upstream. turbostat's Makefile puts arch/x86/include/uapi/ in the include path, so that it can include <asm/msr.h> from it. It isn't in general safe to include even uapi headers directly from the kernel tree without processing them through scripts/headers_install.sh, but asm/msr.h happens to work. However, that include path can break with some versions of system headers, by overriding some system headers with the unprocessed versions directly from the kernel source. For instance: In file included from /build/x86-generic/usr/include/bits/sigcontext.h:28:0, from /build/x86-generic/usr/include/signal.h:339, from /build/x86-generic/usr/include/sys/wait.h:31, from turbostat.c:27: ../../../../arch/x86/include/uapi/asm/sigcontext.h:4:28: fatal error: linux/compiler.h: No such file or directory This occurs because the system bits/sigcontext.h on that build system includes <asm/sigcontext.h>, and asm/sigcontext.h in the kernel source includes <linux/compiler.h>, which scripts/headers_install.sh would have filtered out. Since turbostat really only wants a single header, just include that one header rather than putting an entire directory of kernel headers on the include path. In the process, switch from msr.h to msr-index.h, since turbostat just wants the MSR numbers. Signed-off-by: Josh Triplett <josh@joshtriplett.org> Signed-off-by: Len Brown <len.brown@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-02-13slub: Fix calculation of cpu slabsLi Zefan
commit 8afb1474db4701d1ab80cd8251137a3260e6913e upstream. /sys/kernel/slab/:t-0000048 # cat cpu_slabs 231 N0=16 N1=215 /sys/kernel/slab/:t-0000048 # cat slabs 145 N0=36 N1=109 See, the number of slabs is smaller than that of cpu slabs. The bug was introduced by commit 49e2258586b423684f03c278149ab46d8f8b6700 ("slub: per cpu cache for partial pages"). We should use page->pages instead of page->pobjects when calculating the number of cpu partial slabs. This also fixes the mapping of slabs and nodes. As there's no variable storing the number of total/active objects in cpu partial slabs, and we don't have user interfaces requiring those statistics, I just add WARN_ON for those cases. Acked-by: Christoph Lameter <cl@linux.com> Reviewed-by: Wanpeng Li <liwanp@linux.vnet.ibm.com> Signed-off-by: Li Zefan <lizefan@huawei.com> Signed-off-by: Pekka Enberg <penberg@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-02-13mmc: atmel-mci: fix timeout errors in SDIO mode when using DMALudovic Desroches
commit 66b512eda74d59b17eac04c4da1b38d82059e6c9 upstream. With some SDIO devices, timeout errors can happen when reading data. To solve this issue, the DMA transfer has to be activated before sending the command to the device. This order is incorrect in PDC mode. So we have to take care if we are using DMA or PDC to know when to send the MMC command. Signed-off-by: Ludovic Desroches <ludovic.desroches@atmel.com> Acked-by: Nicolas Ferre <nicolas.ferre@atmel.com> Signed-off-by: Chris Ball <cjb@laptop.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-02-13mmc: fix host release issue after discard operationRay Jui
commit f662ae48ae67dfd42739e65750274fe8de46240a upstream. Under function mmc_blk_issue_rq, after an MMC discard operation, the MMC request data structure may be freed in memory. Later in the same function, the check of req->cmd_flags & MMC_REQ_SPECIAL_MASK is dangerous and invalid. It causes the MMC host not to be released when it should. This patch fixes the issue by marking the special request down before the discard/flush operation. Reported by: Harold (SoonYeal) Yang <haroldsy@broadcom.com> Signed-off-by: Ray Jui <rjui@broadcom.com> Reviewed-by: Seungwon Jeon <tgih.jun@samsung.com> Acked-by: Seungwon Jeon <tgih.jun@samsung.com> Signed-off-by: Chris Ball <cjb@laptop.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-02-13mm/page-writeback.c: do not count anon pages as dirtyable memoryJohannes Weiner
commit a1c3bfb2f67ef766de03f1f56bdfff9c8595ab14 upstream. The VM is currently heavily tuned to avoid swapping. Whether that is good or bad is a separate discussion, but as long as the VM won't swap to make room for dirty cache, we can not consider anonymous pages when calculating the amount of dirtyable memory, the baseline to which dirty_background_ratio and dirty_ratio are applied. A simple workload that occupies a significant size (40+%, depending on memory layout, storage speeds etc.) of memory with anon/tmpfs pages and uses the remainder for a streaming writer demonstrates this problem. In that case, the actual cache pages are a small fraction of what is considered dirtyable overall, which results in an relatively large portion of the cache pages to be dirtied. As kswapd starts rotating these, random tasks enter direct reclaim and stall on IO. Only consider free pages and file pages dirtyable. Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Reported-by: Tejun Heo <tj@kernel.org> Tested-by: Tejun Heo <tj@kernel.org> Reviewed-by: Rik van Riel <riel@redhat.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Wu Fengguang <fengguang.wu@intel.com> Reviewed-by: Michal Hocko <mhocko@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-02-13mm/page-writeback.c: fix dirty_balance_reserve subtraction from dirtyable memoryJohannes Weiner
commit a804552b9a15c931cfc2a92a2e0aed1add8b580a upstream. Tejun reported stuttering and latency spikes on a system where random tasks would enter direct reclaim and get stuck on dirty pages. Around 50% of memory was occupied by tmpfs backed by an SSD, and another disk (rotating) was reading and writing at max speed to shrink a partition. : The problem was pretty ridiculous. It's a 8gig machine w/ one ssd and 10k : rpm harddrive and I could reliably reproduce constant stuttering every : several seconds for as long as buffered IO was going on on the hard drive : either with tmpfs occupying somewhere above 4gig or a test program which : allocates about the same amount of anon memory. Although swap usage was : zero, turning off swap also made the problem go away too. : : The trigger conditions seem quite plausible - high anon memory usage w/ : heavy buffered IO and swap configured - and it's highly likely that this : is happening in the wild too. (this can happen with copying large files : to usb sticks too, right?) This patch (of 2): The dirty_balance_reserve is an approximation of the fraction of free pages that the page allocator does not make available for page cache allocations. As a result, it has to be taken into account when calculating the amount of "dirtyable memory", the baseline to which dirty_background_ratio and dirty_ratio are applied. However, currently the reserve is subtracted from the sum of free and reclaimable pages, which is non-sensical and leads to erroneous results when the system is dominated by unreclaimable pages and the dirty_balance_reserve is bigger than free+reclaimable. In that case, at least the already allocated cache should be considered dirtyable. Fix the calculation by subtracting the reserve from the amount of free pages, then adding the reclaimable pages on top. [akpm@linux-foundation.org: fix CONFIG_HIGHMEM build] Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Reported-by: Tejun Heo <tj@kernel.org> Tested-by: Tejun Heo <tj@kernel.org> Reviewed-by: Rik van Riel <riel@redhat.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Wu Fengguang <fengguang.wu@intel.com> Reviewed-by: Michal Hocko <mhocko@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-02-13mm/memory-failure.c: shift page lock from head page to tail page after thp splitNaoya Horiguchi
commit 54b9dd14d09f24927285359a227aa363ce46089e upstream. After thp split in hwpoison_user_mappings(), we hold page lock on the raw error page only between try_to_unmap, hence we are in danger of race condition. I found in the RHEL7 MCE-relay testing that we have "bad page" error when a memory error happens on a thp tail page used by qemu-kvm: Triggering MCE exception on CPU 10 mce: [Hardware Error]: Machine check events logged MCE exception done on CPU 10 MCE 0x38c535: Killing qemu-kvm:8418 due to hardware memory corruption MCE 0x38c535: dirty LRU page recovery: Recovered qemu-kvm[8418]: segfault at 20 ip 00007ffb0f0f229a sp 00007fffd6bc5240 error 4 in qemu-kvm[7ffb0ef14000+420000] BUG: Bad page state in process qemu-kvm pfn:38c400 page:ffffea000e310000 count:0 mapcount:0 mapping: (null) index:0x7ffae3c00 page flags: 0x2fffff0008001d(locked|referenced|uptodate|dirty|swapbacked) Modules linked in: hwpoison_inject mce_inject vhost_net macvtap macvlan ... CPU: 0 PID: 8418 Comm: qemu-kvm Tainted: G M -------------- 3.10.0-54.0.1.el7.mce_test_fixed.x86_64 #1 Hardware name: NEC NEC Express5800/R120b-1 [N8100-1719F]/MS-91E7-001, BIOS 4.6.3C19 02/10/2011 Call Trace: dump_stack+0x19/0x1b bad_page.part.59+0xcf/0xe8 free_pages_prepare+0x148/0x160 free_hot_cold_page+0x31/0x140 free_hot_cold_page_list+0x46/0xa0 release_pages+0x1c1/0x200 free_pages_and_swap_cache+0xad/0xd0 tlb_flush_mmu.part.46+0x4c/0x90 tlb_finish_mmu+0x55/0x60 exit_mmap+0xcb/0x170 mmput+0x67/0xf0 vhost_dev_cleanup+0x231/0x260 [vhost_net] vhost_net_release+0x3f/0x90 [vhost_net] __fput+0xe9/0x270 ____fput+0xe/0x10 task_work_run+0xc4/0xe0 do_exit+0x2bb/0xa40 do_group_exit+0x3f/0xa0 get_signal_to_deliver+0x1d0/0x6e0 do_signal+0x48/0x5e0 do_notify_resume+0x71/0xc0 retint_signal+0x48/0x8c The reason of this bug is that a page fault happens before unlocking the head page at the end of memory_failure(). This strange page fault is trying to access to address 0x20 and I'm not sure why qemu-kvm does this, but anyway as a result the SIGSEGV makes qemu-kvm exit and on the way we catch the bad page bug/warning because we try to free a locked page (which was the former head page.) To fix this, this patch suggests to shift page lock from head page to tail page just after thp split. SIGSEGV still happens, but it affects only error affected VMs, not a whole system. Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Wanpeng Li <liwanp@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-02-13audit: correct a type mismatch in audit_syscall_exit()AKASHI Takahiro
commit 06bdadd7634551cfe8ce071fe44d0311b3033d9e upstream. audit_syscall_exit() saves a result of regs_return_value() in intermediate "int" variable and passes it to __audit_syscall_exit(), which expects its second argument as a "long" value. This will result in truncating the value returned by a system call and making a wrong audit record. I don't know why gcc compiler doesn't complain about this, but anyway it causes a problem at runtime on arm64 (and probably most 64-bit archs). Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Eric Paris <eparis@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Eric Paris <eparis@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-02-13audit: reset audit backlog wait time after error recoveryRichard Guy Briggs
commit e789e561a50de0aaa8c695662d97aaa5eac9d55f upstream. When the audit queue overflows and times out (audit_backlog_wait_time), the audit queue overflow timeout is set to zero. Once the audit queue overflow timeout condition recovers, the timeout should be reset to the original value. See also: https://lkml.org/lkml/2013/9/2/473 Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com> Signed-off-by: Dan Duval <dan.duval@oracle.com> Signed-off-by: Chuck Anderson <chuck.anderson@oracle.com> Signed-off-by: Richard Guy Briggs <rgb@redhat.com> Signed-off-by: Eric Paris <eparis@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-02-13fuse: fix pipe_buf_operationsMiklos Szeredi
commit 28a625cbc2a14f17b83e47ef907b2658576a32aa upstream. Having this struct in module memory could Oops when if the module is unloaded while the buffer still persists in a pipe. Since sock_pipe_buf_ops is essentially the same as fuse_dev_pipe_buf_steal merge them into nosteal_pipe_buf_ops (this is the same as default_pipe_buf_ops except stealing the page from the buffer is not allowed). Reported-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-02-13Revert "EISA: Initialize device before its resources"Bjorn Helgaas
commit 765ee51f9a3f652959b4c7297d198a28e37952b4 upstream. This reverts commit 26abfeed4341872364386c6a52b9acef8c81a81a. In the eisa_probe() force_probe path, if we were unable to request slot resources (e.g., [io 0x800-0x8ff]), we skipped the slot with "Cannot allocate resource for EISA slot %d" before reading the EISA signature in eisa_init_device(). Commit 26abfeed4341 moved eisa_init_device() earlier, so we tried to read the EISA signature before requesting the slot resources, and this caused hangs during boot. Link: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1251816 Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-02-13intel-iommu: fix off-by-one in pagetable freeingAlex Williamson
commit 08336fd218e087cc4fcc458e6b6dcafe8702b098 upstream. dma_pte_free_level() has an off-by-one error when checking whether a pte is completely covered by a range. Take for example the case of attempting to free pfn 0x0 - 0x1ff, ie. 512 entries covering the first 2M superpage. The level_size() is 0x200 and we test: static void dma_pte_free_level(... ... if (!(0 > 0 || 0x1ff < 0 + 0x200)) { ... } Clearly the 2nd test is true, which means we fail to take the branch to clear and free the pagetable entry. As a result, we're leaking pagetables and failing to install new pages over the range. This was found with a PCI device assigned to a QEMU guest using vfio-pci without a VGA device present. The first 1M of guest address space is mapped with various combinations of 4K pages, but eventually the range is entirely freed and replaced with a 2M contiguous mapping. intel-iommu errors out with something like: ERROR: DMA PTE for vPFN 0x0 already set (to 5c2b8003 not 849c00083) In this case 5c2b8003 is the pointer to the previous leaf page that was neither freed nor cleared and 849c00083 is the superpage entry that we're trying to replace it with. Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Joerg Roedel <joro@8bytes.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-02-13arch/sh/kernel/kgdb.c: add missing #include <linux/sched.h>Wanlong Gao
commit 53a52f17d96c8d47c79a7dafa81426317e89c7c1 upstream. arch/sh/kernel/kgdb.c: In function 'sleeping_thread_to_gdb_regs': arch/sh/kernel/kgdb.c:225:32: error: implicit declaration of function 'task_stack_page' [-Werror=implicit-function-declaration] arch/sh/kernel/kgdb.c:242:23: error: dereferencing pointer to incomplete type arch/sh/kernel/kgdb.c:243:22: error: dereferencing pointer to incomplete type arch/sh/kernel/kgdb.c: In function 'singlestep_trap_handler': arch/sh/kernel/kgdb.c:310:27: error: 'SIGTRAP' undeclared (first use in this function) arch/sh/kernel/kgdb.c:310:27: note: each undeclared identifier is reported only once for each function it appears in This was introduced by commit 16559ae48c76 ("kgdb: remove #include <linux/serial_8250.h> from kgdb.h"). [geert@linux-m68k.org: reworded and reformatted] Signed-off-by: Wanlong Gao <gaowanlong@cn.fujitsu.com> Signed-off-by: Geert Uytterhoeven <geert+renesas@linux-m68k.org> Reported-by: Fengguang Wu <fengguang.wu@intel.com> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-02-13tracing: Check if tracing is enabled in trace_puts()Steven Rostedt (Red Hat)
commit 3132e107d608f8753240d82d61303c500fd515b4 upstream. If trace_puts() is used very early in boot up, it can crash the machine if it is called before the ring buffer is allocated. If a trace_printk() is used with no arguments, then it will be converted into a trace_puts() and suffer the same fate. Fixes: 09ae72348ecc "tracing: Add trace_puts() for even faster trace_printk() tracing" Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-02-13tracing: Have trace buffer point back to trace_arraySteven Rostedt (Red Hat)
commit dced341b2d4f06668efaab33f88de5d287c0f45b upstream. The trace buffer has a descriptor pointer that goes back to the trace array. But it was never assigned. Luckily, nothing uses it (yet), but it will in the future. Although nothing currently uses this, if any of the new features get backported to older kernels, and because this is such a simple change, I'm marking it for stable too. Fixes: 12883efb670c "tracing: Consolidate max_tr into main trace_array structure" Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-02-13SELinux: Fix memory leak upon loading policyTetsuo Handa
commit 8ed814602876bec9bad2649ca17f34b499357a1c upstream. Hello. I got below leak with linux-3.10.0-54.0.1.el7.x86_64 . [ 681.903890] kmemleak: 5538 new suspected memory leaks (see /sys/kernel/debug/kmemleak) Below is a patch, but I don't know whether we need special handing for undoing ebitmap_set_bit() call. ---------- >>From fe97527a90fe95e2239dfbaa7558f0ed559c0992 Mon Sep 17 00:00:00 2001 From: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Date: Mon, 6 Jan 2014 16:30:21 +0900 Subject: SELinux: Fix memory leak upon loading policy Commit 2463c26d "SELinux: put name based create rules in a hashtable" did not check return value from hashtab_insert() in filename_trans_read(). It leaks memory if hashtab_insert() returns error. unreferenced object 0xffff88005c9160d0 (size 8): comm "systemd", pid 1, jiffies 4294688674 (age 235.265s) hex dump (first 8 bytes): 57 0b 00 00 6b 6b 6b a5 W...kkk. backtrace: [<ffffffff816604ae>] kmemleak_alloc+0x4e/0xb0 [<ffffffff811cba5e>] kmem_cache_alloc_trace+0x12e/0x360 [<ffffffff812aec5d>] policydb_read+0xd1d/0xf70 [<ffffffff812b345c>] security_load_policy+0x6c/0x500 [<ffffffff812a623c>] sel_write_load+0xac/0x750 [<ffffffff811eb680>] vfs_write+0xc0/0x1f0 [<ffffffff811ec08c>] SyS_write+0x4c/0xa0 [<ffffffff81690419>] system_call_fastpath+0x16/0x1b [<ffffffffffffffff>] 0xffffffffffffffff However, we should not return EEXIST error to the caller, or the systemd will show below message and the boot sequence freezes. systemd[1]: Failed to load SELinux policy. Freezing. Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Acked-by: Eric Paris <eparis@redhat.com> Signed-off-by: Paul Moore <pmoore@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-02-06Linux 3.10.29Greg Kroah-Hartman
2014-02-06x86, cpu, amd: Add workaround for family 16h, erratum 793Borislav Petkov
commit 3b56496865f9f7d9bcb2f93b44c63f274f08e3b6 upstream. This adds the workaround for erratum 793 as a precaution in case not every BIOS implements it. This addresses CVE-2013-6885. Erratum text: [Revision Guide for AMD Family 16h Models 00h-0Fh Processors, document 51810 Rev. 3.04 November 2013] 793 Specific Combination of Writes to Write Combined Memory Types and Locked Instructions May Cause Core Hang Description Under a highly specific and detailed set of internal timing conditions, a locked instruction may trigger a timing sequence whereby the write to a write combined memory type is not flushed, causing the locked instruction to stall indefinitely. Potential Effect on System Processor core hang. Suggested Workaround BIOS should set MSR C001_1020[15] = 1b. Fix Planned No fix planned [ hpa: updated description, fixed typo in MSR name ] Signed-off-by: Borislav Petkov <bp@suse.de> Link: http://lkml.kernel.org/r/20140114230711.GS29865@pd.tnic Tested-by: Aravind Gopalakrishnan <aravind.gopalakrishnan@amd.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-02-06powerpc: Make sure "cache" directory is removed when offlining cpuPaul Mackerras
commit 91b973f90c1220d71923e7efe1e61f5329806380 upstream. The code in remove_cache_dir() is supposed to remove the "cache" subdirectory from the sysfs directory for a CPU when that CPU is being offlined. It tries to do this by calling kobject_put() on the kobject for the subdirectory. However, the subdirectory only gets removed once the last reference goes away, and the reference being put here may well not be the last reference. That means that the "cache" subdirectory may still exist when the offlining operation has finished. If the same CPU subsequently gets onlined, the code tries to add a new "cache" subdirectory. If the old subdirectory has not yet been removed, we get a WARN_ON in the sysfs code, with stack trace, and an error message printed on the console. Further, we ultimately end up with an online cpu with no "cache" subdirectory. This fixes it by doing an explicit kobject_del() at the point where we want the subdirectory to go away. kobject_del() removes the sysfs directory even though the object still exists in memory. The object will get freed at some point in the future. A subsequent onlining operation can create a new sysfs directory, even if the old object still exists in memory, without causing any problems. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-02-06powerpc: Fix the setup of CPU-to-Node mappings during CPU onlineSrivatsa S. Bhat
commit d4edc5b6c480a0917e61d93d55531d7efa6230be upstream. On POWER platforms, the hypervisor can notify the guest kernel about dynamic changes in the cpu-numa associativity (VPHN topology update). Hence the cpu-to-node mappings that we got from the firmware during boot, may no longer be valid after such updates. This is handled using the arch_update_cpu_topology() hook in the scheduler, and the sched-domains are rebuilt according to the new mappings. But unfortunately, at the moment, CPU hotplug ignores these updated mappings and instead queries the firmware for the cpu-to-numa relationships and uses them during CPU online. So the kernel can end up assigning wrong NUMA nodes to CPUs during subsequent CPU hotplug online operations (after booting). Further, a particularly problematic scenario can result from this bug: On POWER platforms, the SMT mode can be switched between 1, 2, 4 (and even 8) threads per core. The switch to Single-Threaded (ST) mode is performed by offlining all except the first CPU thread in each core. Switching back to SMT mode involves onlining those other threads back, in each core. Now consider this scenario: 1. During boot, the kernel gets the cpu-to-node mappings from the firmware and assigns the CPUs to NUMA nodes appropriately, during CPU online. 2. Later on, the hypervisor updates the cpu-to-node mappings dynamically and communicates this update to the kernel. The kernel in turn updates its cpu-to-node associations and rebuilds its sched domains. Everything is fine so far. 3. Now, the user switches the machine from SMT to ST mode (say, by running ppc64_cpu --smt=1). This involves offlining all except 1 thread in each core. 4. The user then tries to switch back from ST to SMT mode (say, by running ppc64_cpu --smt=4), and this involves onlining those threads back. Since CPU hotplug ignores the new mappings, it queries the firmware and tries to associate the newly onlined sibling threads to the old NUMA nodes. This results in sibling threads within the same core getting associated with different NUMA nodes, which is incorrect. The scheduler's build-sched-domains code gets thoroughly confused with this and enters an infinite loop and causes soft-lockups, as explained in detail in commit 3be7db6ab (powerpc: VPHN topology change updates all siblings). So to fix this, use the numa_cpu_lookup_table to remember the updated cpu-to-node mappings, and use them during CPU hotplug online operations. Further, we also need to ensure that all threads in a core are assigned to a common NUMA node, irrespective of whether all those threads were online during the topology update. To achieve this, we take care not to use cpu_sibling_mask() since it is not hotplug invariant. Instead, we use cpu_first_sibling_thread() and set up the mappings manually using the 'threads_per_core' value for that particular platform. This helps us ensure that we don't hit this bug with any combination of CPU hotplug and SMT mode switching. Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-02-06btrfs: restrict snapshotting to own subvolumesDavid Sterba
commit d024206133ce21936b3d5780359afc00247655b7 upstream. Currently, any user can snapshot any subvolume if the path is accessible and thus indirectly create and keep files he does not own under his direcotries. This is not possible with traditional directories. In security context, a user can snapshot root filesystem and pin any potentially buggy binaries, even if the updates are applied. All the snapshots are visible to the administrator, so it's possible to verify if there are suspicious snapshots. Another more practical problem is that any user can pin the space used by eg. root and cause ENOSPC. Original report: https://bugs.launchpad.net/ubuntu/+source/apparmor/+bug/484786 Signed-off-by: David Sterba <dsterba@suse.cz> Signed-off-by: Josef Bacik <jbacik@fb.com> Signed-off-by: Chris Mason <clm@fb.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-02-06Btrfs: handle EAGAIN case properly in btrfs_drop_snapshot()Wang Shilong
commit 90515e7f5d7d24cbb2a4038a3f1b5cfa2921aa17 upstream. We may return early in btrfs_drop_snapshot(), we shouldn't call btrfs_std_err() for this case, fix it. Signed-off-by: Wang Shilong <wangsl.fnst@cn.fujitsu.com> Signed-off-by: Josef Bacik <jbacik@fb.com> Signed-off-by: Chris Mason <clm@fb.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-02-06target/iscsi: Fix network portal creation raceAndy Grover
commit ee291e63293146db64668e8d65eb35c97e8324f4 upstream. When creating network portals rapidly, such as when restoring a configuration, LIO's code to reuse existing portals can return a false negative if the thread hasn't run yet and set np_thread_state to ISCSI_NP_THREAD_ACTIVE. This causes an error in the network stack when attempting to bind to the same address/port. This patch sets NP_THREAD_ACTIVE before the np is placed on g_np_list, so even if the thread hasn't run yet, iscsit_get_np will return the existing np. Also, convert np_lock -> np_mutex + hold across adding new net portal to g_np_list to prevent a race where two threads may attempt to create the same network portal, resulting in one of them failing. (nab: Add missing mutex_unlocks in iscsit_add_np failure paths) (DanC: Fix incorrect spin_unlock -> spin_unlock_bh) Signed-off-by: Andy Grover <agrover@redhat.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-02-06virtio-scsi: Fix hotcpu_notifier use-after-free with virtscsi_freezeAsias He
commit f466f75385369a181409e46da272db3de6f5c5cb upstream. vqs are freed in virtscsi_freeze but the hotcpu_notifier is not unregistered. We will have a use-after-free usage when the notifier callback is called after virtscsi_freeze. Fixes: 285e71ea6f3583a85e27cb2b9a7d8c35d4c0d558 ("virtio-scsi: reset virtqueue affinity when doing cpu hotplug") Signed-off-by: Asias He <asias.hejun@gmail.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-02-06SCSI: bfa: Chinook quad port 16G FC HBA claim issueVijaya Mohan Guvva
commit dcaf9aed995c2b2a49fb86bbbcfa2f92c797ab5d upstream. Bfa driver crash is observed while pushing the firmware on to chinook quad port card due to uninitialized bfi_image_ct2 access which gets initialized only for CT2 ASIC based cards after request_firmware(). For quard port chinook (CT2 ASIC based), bfi_image_ct2 is not getting initialized as there is no check for chinook PCI device ID before request_firmware and instead bfi_image_cb is initialized as it is the default case for card type check. This patch includes changes to read the right firmware for quad port chinook. Signed-off-by: Vijaya Mohan Guvva <vmohan@brocade.com> Signed-off-by: James Bottomley <JBottomley@Parallels.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-02-06usb: core: get config and string descriptors for unauthorized devicesThomas Pugliese
commit 83e83ecb79a8225e79bc8e54e9aff3e0e27658a2 upstream. There is no need to skip querying the config and string descriptors for unauthorized WUSB devices when usb_new_device is called. It is allowed by WUSB spec. The only action that needs to be delayed until authorization time is the set config. This change allows user mode tools to see the config and string descriptors earlier in enumeration which is needed for some WUSB devices to function properly on Android systems. It also reduces the amount of divergent code paths needed for WUSB devices. Signed-off-by: Thomas Pugliese <thomas.pugliese@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-02-06iwlwifi: pcie: fix interrupt coalescing for 7260 / 3160Emmanuel Grumbach
commit 6960a059b2c618f32fe549f13287b3d2278c09e9 upstream. We changed the timeout for the interrupt coealescing for calibration, but that wasn't effective since we changed that value back before loading the firmware. Since calibrations are notification from firmware and not Rx packets, this doesn't change anyway - the firmware will fire an interrupt straight away regardless of the interrupt coalescing value. Also, a HW issue has been discovered in 7000 devices series. The work around is to disable the new interrupt coalescing timeout feature - do this by setting bit 31 in CSR_INT_COALESCING. This has been fixed in 7265 which means that we can't rely on the device family and must have a hint in the iwl_cfg structure. Fixes: 99cd47142399 ("iwlwifi: add 7000 series device configuration") Reviewed-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-02-06ALSA: hda/hdmi - allow PIN_OUT to be dynamically enabledStephen Warren
(This is upstream 75fae117a5db "ALSA: hda/hdmi - allow PIN_OUT to be dynamically enabled", backported to stable 3.10 through 3.12. 3.13 and later can take the original patch.) Commit 384a48d71520 "ALSA: hda: HDMI: Support codecs with fewer cvts than pins" dynamically enabled each pin widget's PIN_OUT only when the pin was actively in use. This was required on certain NVIDIA CODECs for correct operation. Specifically, if multiple pin widgets each had their mux input select the same audio converter widget and each pin widget had PIN_OUT enabled, then only one of the pin widgets would actually receive the audio, and often not the one the user wanted! However, this apparently broke some Intel systems, and commit 6169b673618b "ALSA: hda - Always turn on pins for HDMI/DP" reverted the dynamic setting of PIN_OUT. This in turn broke the afore-mentioned NVIDIA CODECs. This change supports either dynamic or static handling of PIN_OUT, selected by a flag set up during CODEC initialization. This flag is enabled for all recent NVIDIA GPUs. Reported-by: Uosis <uosisl@gmail.com> Signed-off-by: Stephen Warren <swarren@nvidia.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-02-06ALSA: hda - hdmi: introduce patch_nvhdmi()Anssi Hannula
(This is a backport of *part* of upstream 611885bc963a "ALSA: hda - hdmi: Disallow unsupported 2ch remapping on NVIDIA codecs" to stable 3.10 through 3.12. Later stable already contain all of the original patch.) Mainline commit 611885bc963a "ALSA: hda - hdmi: Disallow unsupported 2ch remapping on NVIDIA codecs" introduces function patch_nvhdmi(). That function is edited by 75fae117a5db "ALSA: hda/hdmi - allow PIN_OUT to be dynamically enabled". In order to backport the PIN_OUT patch, I am first back-porting just the addition of function patch_nvhdmi(), so that the conflicts applying the PIN_OUT patch are simplified. Ideally, one might backport all of 611885bc963a. However, that commit doesn't apply to stable kernels, since it relies on a chain of other patches which implement new features. Signed-off-by: Anssi Hannula <anssi.hannula@iki.fi> Signed-off-by: Takashi Iwai <tiwai@suse.de> [swarren, extracted just a small part of the original patch] Signed-off-by: Stephen Warren <swarren@nvidia.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-02-06KVM: PPC: e500: Fix bad address type in deliver_tlb_misss()Mihai Caraman
commit 70713fe315ed14cd1bb07d1a7f33e973d136ae3d upstream. Use gva_t instead of unsigned int for eaddr in deliver_tlb_miss(). Signed-off-by: Mihai Caraman <mihai.caraman@freescale.com> Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-02-06KVM: PPC: Book3S HV: use xics_wake_cpu only when definedAndreas Schwab
commit 48eaef0518a565d3852e301c860e1af6a6db5a84 upstream. Signed-off-by: Andreas Schwab <schwab@linux-m68k.org> Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-02-06parisc: fix cache-flushingHelge Deller
commit 57737c49dd72c96cfbcd4f66559f3ffc399aeb4f upstream. This commit: f8dae00684d678afa13041ef170cecfd1297ed40: parisc: Ensure full cache coherency for kmap/kunmap caused negative caching side-effects, e.g. hanging processes with expect and too many inequivalent alias messages from flush_dcache_page() on Debian 5 systems. This patch now partly reverts it and has been in production use on our debian buildd makeservers since a week without any major problems. Signed-off-by: Helge Deller <deller@gmx.de> Signed-off-by: John David Anglin <dave.anglin@bell.net> Signed-off-by: Helge Deller <deller@gmx.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-02-06iwlwifi: pcie: enable oscillator for L1 exitEmmanuel Grumbach
commit 2d93aee152b1758a94a18fe15d72153ba73b5679 upstream. Enabling the oscillator consumes slightly more power (100uA) but allows to make sure that we exit from L1 on time. Not doing so might lead to a PCIe specification violation since we might wake up from L1 at the wrong time. This issue has been identified on 3160 and 7260 only. On older NICs L1 off is not enabled, on newer NICs (7265), the issue is fixed. When the bug occurs the user sees that the NIC has disappeared from the PCI bridge, any access to the device returns 0xff. This fixes: https://bugzilla.kernel.org/show_bug.cgi?id=64541 and has been extensively discussed here: http://markmail.org/thread/mfmpzqt3r333n4bo Fixes: 99cd47142399 ("iwlwifi: add 7000 series device configuration") Reported-and-tested-by: wzyboy <wzyboy@wzyboy.org> Reviewed-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-02-06ip6tnl: fix double free of fb_tnl_dev on exitNicolas Dichtel
[ No relevant upstream commit. ] This problem was fixed upstream by commit 1e9f3d6f1c40 ("ip6tnl: fix use after free of fb_tnl_dev"). The upstream patch depends on upstream commit 0bd8762824e7 ("ip6tnl: add x-netns support"), which was not backported into 3.10 branch. First, explain the problem: when the ip6_tunnel module is unloaded, ip6_tunnel_cleanup() is called. rmmod ip6_tunnel => ip6_tunnel_cleanup() => rtnl_link_unregister() => __rtnl_kill_links() => for_each_netdev(net, dev) { if (dev->rtnl_link_ops == ops) ops->dellink(dev, &list_kill); } At this point, the FB device is deleted (and all ip6tnl tunnels). => unregister_pernet_device() => unregister_pernet_operations() => ops_exit_list() => ip6_tnl_exit_net() => ip6_tnl_destroy_tunnels() => t = rtnl_dereference(ip6n->tnls_wc[0]); unregister_netdevice_queue(t->dev, &list); We delete the FB device a second time here! The previous fix removes these lines, which fix this double free. But the patch introduces a memory leak when a netns is destroyed, because the FB device is never deleted. By adding an rtnl ops which delete all ip6tnl device excepting the FB device, we can keep this exlicit removal in ip6_tnl_destroy_tunnels(). CC: Steven Rostedt <rostedt@goodmis.org> CC: Willem de Bruijn <willemb@google.com> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Reported-by: Steven Rostedt <srostedt@redhat.com> Tested-by: Steven Rostedt <srostedt@redhat.com> (and our entire MRG team) Tested-by: "Luis Claudio R. Goncalves" <lgoncalv@redhat.com> Tested-by: John Kacur <jkacur@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-02-06Revert "ip6tnl: fix use after free of fb_tnl_dev"Nicolas Dichtel
[ No relevant upstream commit. ] This reverts commit 22c3ec552c29cf4bd4a75566088950fe57d860c4. This patch is not the right fix, it introduces a memory leak when a netns is destroyed (the FB device is never deleted). Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Reported-by: Steven Rostedt <srostedt@redhat.com> Tested-by: Steven Rostedt <srostedt@redhat.com> (and our entire MRG team) Tested-by: "Luis Claudio R. Goncalves" <lgoncalv@redhat.com> Tested-by: John Kacur <jkacur@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-02-06sit: fix double free of fb_tunnel_dev on exitNicolas Dichtel
[ No relevant upstream commit. ] This problem was fixed upstream by commit 9434266f2c64 ("sit: fix use after free of fb_tunnel_dev"). The upstream patch depends on upstream commit 5e6700b3bf98 ("sit: add support of x-netns"), which was not backported into 3.10 branch. First, explain the problem: when the sit module is unloaded, sit_cleanup() is called. rmmod sit => sit_cleanup() => rtnl_link_unregister() => __rtnl_kill_links() => for_each_netdev(net, dev) { if (dev->rtnl_link_ops == ops) ops->dellink(dev, &list_kill); } At this point, the FB device is deleted (and all sit tunnels). => unregister_pernet_device() => unregister_pernet_operations() => ops_exit_list() => sit_exit_net() => sit_destroy_tunnels() In this function, no tunnel is found. => unregister_netdevice_queue(sitn->fb_tunnel_dev, &list); We delete the FB device a second time here! Because we cannot simply remove the second deletion (sit_exit_net() must remove the FB device when a netns is deleted), we add an rtnl ops which delete all sit device excepting the FB device and thus we can keep the explicit deletion in sit_exit_net(). CC: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Acked-by: Willem de Bruijn <willemb@google.com> Reported-by: Steven Rostedt <srostedt@redhat.com> Tested-by: Steven Rostedt <srostedt@redhat.com> (and our entire MRG team) Tested-by: "Luis Claudio R. Goncalves" <lgoncalv@redhat.com> Tested-by: John Kacur <jkacur@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-02-06xen-netfront: fix resource leak in netfrontAnnie Li
[ Upstream commit cefe0078eea52af17411eb1248946a94afb84ca5 ] This patch removes grant transfer releasing code from netfront, and uses gnttab_end_foreign_access to end grant access since gnttab_end_foreign_access_ref may fail when the grant entry is currently used for reading or writing. * clean up grant transfer code kept from old netfront(2.6.18) which grants pages for access/map and transfer. But grant transfer is deprecated in current netfront, so remove corresponding release code for transfer. * fix resource leak, release grant access (through gnttab_end_foreign_access) and skb for tx/rx path, use get_page to ensure page is released when grant access is completed successfully. Xen-blkfront/xen-tpmfront/xen-pcifront also have similar issue, but patches for them will be created separately. V6: Correct subject line and commit message. V5: Remove unecessary change in xennet_end_access. V4: Revert put_page in gnttab_end_foreign_access, and keep netfront change in single patch. V3: Changes as suggestion from David Vrabel, ensure pages are not freed untill grant acess is ended. V2: Improve patch comments. Signed-off-by: Annie Li <annie.li@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-02-06net: Fix memory leak if TPROXY used with TCP early demuxHolger Eitzenberger
[ Upstream commit a452ce345d63ddf92cd101e4196569f8718ad319 ] I see a memory leak when using a transparent HTTP proxy using TPROXY together with TCP early demux and Kernel v3.8.13.15 (Ubuntu stable): unreferenced object 0xffff88008cba4a40 (size 1696): comm "softirq", pid 0, jiffies 4294944115 (age 8907.520s) hex dump (first 32 bytes): 0a e0 20 6a 40 04 1b 37 92 be 32 e2 e8 b4 00 00 .. j@..7..2..... 02 00 07 01 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace: [<ffffffff810b710a>] kmem_cache_alloc+0xad/0xb9 [<ffffffff81270185>] sk_prot_alloc+0x29/0xc5 [<ffffffff812702cf>] sk_clone_lock+0x14/0x283 [<ffffffff812aaf3a>] inet_csk_clone_lock+0xf/0x7b [<ffffffff8129a893>] netlink_broadcast+0x14/0x16 [<ffffffff812c1573>] tcp_create_openreq_child+0x1b/0x4c3 [<ffffffff812c033e>] tcp_v4_syn_recv_sock+0x38/0x25d [<ffffffff812c13e4>] tcp_check_req+0x25c/0x3d0 [<ffffffff812bf87a>] tcp_v4_do_rcv+0x287/0x40e [<ffffffff812a08a7>] ip_route_input_noref+0x843/0xa55 [<ffffffff812bfeca>] tcp_v4_rcv+0x4c9/0x725 [<ffffffff812a26f4>] ip_local_deliver_finish+0xe9/0x154 [<ffffffff8127a927>] __netif_receive_skb+0x4b2/0x514 [<ffffffff8127aa77>] process_backlog+0xee/0x1c5 [<ffffffff8127c949>] net_rx_action+0xa7/0x200 [<ffffffff81209d86>] add_interrupt_randomness+0x39/0x157 But there are many more, resulting in the machine going OOM after some days. From looking at the TPROXY code, and with help from Florian, I see that the memory leak is introduced in tcp_v4_early_demux(): void tcp_v4_early_demux(struct sk_buff *skb) { /* ... */ iph = ip_hdr(skb); th = tcp_hdr(skb); if (th->doff < sizeof(struct tcphdr) / 4) return; sk = __inet_lookup_established(dev_net(skb->dev), &tcp_hashinfo, iph->saddr, th->source, iph->daddr, ntohs(th->dest), skb->skb_iif); if (sk) { skb->sk = sk; where the socket is assigned unconditionally to skb->sk, also bumping the refcnt on it. This is problematic, because in our case the skb has already a socket assigned in the TPROXY target. This then results in the leak I see. The very same issue seems to be with IPv6, but haven't tested. Reviewed-by: Florian Westphal <fw@strlen.de> Signed-off-by: Holger Eitzenberger <holger@eitzenberger.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-02-06fib_frontend: fix possible NULL pointer dereferenceOliver Hartkopp
[ Upstream commit a0065f266a9b5d51575535a25c15ccbeed9a9966 ] The two commits 0115e8e30d (net: remove delay at device dismantle) and 748e2d9396a (net: reinstate rtnl in call_netdevice_notifiers()) silently removed a NULL pointer check for in_dev since Linux 3.7. This patch re-introduces this check as it causes crashing the kernel when setting small mtu values on non-ip capable netdevices. Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-02-06ip_tunnel: clear IPCB in ip_tunnel_xmit() in case dst_link_failure() is calledDuan Jiong
[ Upstream commit 11c21a307d79ea5f6b6fc0d3dfdeda271e5e65f6 ] commit a622260254ee48("ip_tunnel: fix kernel panic with icmp_dest_unreach") clear IPCB in ip_tunnel_xmit() , or else skb->cb[] may contain garbage from GSO segmentation layer. But commit 0e6fbc5b6c621("ip_tunnels: extend iptunnel_xmit()") refactor codes, and it clear IPCB behind the dst_link_failure(). So clear IPCB in ip_tunnel_xmit() just like commti a622260254ee48("ip_tunnel: fix kernel panic with icmp_dest_unreach"). Signed-off-by: Duan Jiong <duanj.fnst@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-02-06s390/bpf,jit: fix 32 bit divisions, use unsigned divide instructionsHeiko Carstens
[ Upstream commit 3af57f78c38131b7a66e2b01e06fdacae01992a3 ] The s390 bpf jit compiler emits the signed divide instructions "dr" and "d" for unsigned divisions. This can cause problems: the dividend will be zero extended to a 64 bit value and the divisor is the 32 bit signed value as specified A or X accumulator, even though A and X are supposed to be treated as unsigned values. The divide instrunctions will generate an exception if the result cannot be expressed with a 32 bit signed value. This is the case if e.g. the dividend is 0xffffffff and the divisor either 1 or also 0xffffffff (signed: -1). To avoid all these issues simply use unsigned divide instructions. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-02-06bpf: do not use reciprocal divideEric Dumazet
[ Upstream commit aee636c4809fa54848ff07a899b326eb1f9987a2 ] At first Jakub Zawadzki noticed that some divisions by reciprocal_divide were not correct. (off by one in some cases) http://www.wireshark.org/~darkjames/reciprocal-buggy.c He could also show this with BPF: http://www.wireshark.org/~darkjames/set-and-dump-filter-k-bug.c The reciprocal divide in linux kernel is not generic enough, lets remove its use in BPF, as it is not worth the pain with current cpus. Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Jakub Zawadzki <darkjames-ws@darkjames.pl> Cc: Mircea Gherzan <mgherzan@gmail.com> Cc: Daniel Borkmann <dxchgb@gmail.com> Cc: Hannes Frederic Sowa <hannes@stressinduktion.org> Cc: Matt Evans <matt@ozlabs.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: David S. Miller <davem@davemloft.net> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-02-06tcp: metrics: Avoid duplicate entries with the same destination-IPChristoph Paasch
[ Upstream commit 77f99ad16a07aa062c2d30fae57b1fee456f6ef6 ] Because the tcp-metrics is an RCU-list, it may be that two soft-interrupts are inside __tcp_get_metrics() for the same destination-IP at the same time. If this destination-IP is not yet part of the tcp-metrics, both soft-interrupts will end up in tcpm_new and create a new entry for this IP. So, we will have two tcp-metrics with the same destination-IP in the list. This patch checks twice __tcp_get_metrics(). First without holding the lock, then while holding the lock. The second one is there to confirm that the entry has not been added by another soft-irq while waiting for the spin-lock. Fixes: 51c5d0c4b169b (tcp: Maintain dynamic metrics in local cache.) Signed-off-by: Christoph Paasch <christoph.paasch@uclouvain.be> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-02-06net: rds: fix per-cpu helper usageGerald Schaefer
[ Upstream commit c196403b79aa241c3fefb3ee5bb328aa7c5cc860 ] commit ae4b46e9d "net: rds: use this_cpu_* per-cpu helper" broke per-cpu handling for rds. chpfirst is the result of __this_cpu_read(), so it is an absolute pointer and not __percpu. Therefore, __this_cpu_write() should not operate on chpfirst, but rather on cache->percpu->first, just like __this_cpu_read() did before. Signed-off-byd Gerald Schaefer <gerald.schaefer@de.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-02-06net,via-rhine: Fix tx_timeout handlingRichard Weinberger
[ Upstream commit a926592f5e4e900f3fa903298c4619a131e60963 ] rhine_reset_task() misses to disable the tx scheduler upon reset, this can lead to a crash if work is still scheduled while we're resetting the tx queue. Fixes: [ 93.591707] BUG: unable to handle kernel NULL pointer dereference at 0000004c [ 93.595514] IP: [<c119d10d>] rhine_napipoll+0x491/0x6 Signed-off-by: Richard Weinberger <richard@nod.at> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-02-06net: avoid reference counter overflows on fib_rules in multicast forwardingHannes Frederic Sowa
[ Upstream commit 95f4a45de1a0f172b35451fc52283290adb21f6e ] Bob Falken reported that after 4G packets, multicast forwarding stopped working. This was because of a rule reference counter overflow which freed the rule as soon as the overflow happend. This patch solves this by adding the FIB_LOOKUP_NOREF flag to fib_rules_lookup calls. This is safe even from non-rcu locked sections as in this case the flag only implies not taking a reference to the rule, which we don't need at all. Rules only hold references to the namespace, which are guaranteed to be available during the call of the non-rcu protected function reg_vif_xmit because of the interface reference which itself holds a reference to the net namespace. Fixes: f0ad0860d01e47 ("ipv4: ipmr: support multiple tables") Fixes: d1db275dd3f6e4 ("ipv6: ip6mr: support multiple tables") Reported-by: Bob Falken <NetFestivalHaveFun@gmx.com> Cc: Patrick McHardy <kaber@trash.net> Cc: Thomas Graf <tgraf@suug.ch> Cc: Julian Anastasov <ja@ssi.bg> Cc: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-02-06ieee802154: Fix memory leak in ieee802154_add_iface()Christian Engelmayer
[ Upstream commit 267d29a69c6af39445f36102a832b25ed483f299 ] Fix a memory leak in the ieee802154_add_iface() error handling path. Detected by Coverity: CID 710490. Signed-off-by: Christian Engelmayer <cengelma@gmx.at> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-02-06inet_diag: fix inet_diag_dump_icsk() timewait socket state logicNeal Cardwell
[ Based upon upstream commit 70315d22d3c7383f9a508d0aab21e2eb35b2303a ] Fix inet_diag_dump_icsk() to reflect the fact that both TIME_WAIT and FIN_WAIT2 connections are represented by inet_timewait_sock (not just TIME_WAIT). Thus: (a) We need to iterate through the time_wait buckets if the user wants either TIME_WAIT or FIN_WAIT2. (Before fixing this, "ss -nemoi state fin-wait-2" would not return any sockets, even if there were some in FIN_WAIT2.) (b) We need to check tw_substate to see if the user wants to dump sockets in the particular substate (TIME_WAIT or FIN_WAIT2) that a given connection is in. (Before fixing this, "ss -nemoi state time-wait" would actually return sockets in state FIN_WAIT2.) An analogous fix is in v3.13: 70315d22d3c7383f9a508d0aab21e2eb35b2303a ("inet_diag: fix inet_diag_dump_icsk() to use correct state for timewait sockets") but that patch is quite different because 3.13 code is very different in this area due to the unification of TCP hash tables in 05dbc7b ("tcp/dccp: remove twchain") in v3.13-rc1. I tested that this applies cleanly between v3.3 and v3.12, and tested that it works in both 3.3 and 3.12. It does not apply cleanly to 3.2 and earlier (though it makes semantic sense), and semantically is not the right fix for 3.13 and beyond (as mentioned above). Signed-off-by: Neal Cardwell <ncardwell@google.com> Cc: Eric Dumazet <edumazet@google.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>