summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2018-02-13sched/deadline: Prevent CPU hotplug operation if DL task on CPUv4.16-rc1-bandwidth-accounting-v3Mathieu Poirier
When a DL task is assigned a CPU the "utilisation" (this_bw) and the "active utilisation" (running_bw) fields of rq->dl are incremented accordingly. If the CPU is hotplugged out the DL task is transferred to another CPU but the task's contribution to this_bw and running_bw isn't substracted from the outgoing CPU's rq nor added to the newly appointed CPU. In this example (where a kernel has been instrumented to output the relevant information) we have a 4 CPU system with one 6:10 DL task that has been assigned to CPU 2: root@dragon:/home/linaro/# cat /proc/rq_debug dl_rq[0]: .online : yes .dl_nr_running : 0 .running_bw : 0 .this_bw : 0 .rd->span : 0-3 .dl_nr_migratory : 0 .rd->dl_bw->bw : 3984588 .rd->dl_bw->total_bw : 629145 dl_rq[1]: .online : yes .dl_nr_running : 0 .running_bw : 0 .this_bw : 0 .rd->span: : 0-3 .dl_nr_migratory : 0 .rd->dl_bw->bw : 3984588 <-- RD capacity for 4 CPUs .rd->dl_bw->total_bw : 629145 dl_rq[2]: .online : yes .dl_nr_running : 1 <-- One task running .running_bw : 629145 <-- Normal behavior .this_bw : 629145 <-- Normal behavior .rd->span : 0-3 .dl_nr_migratory : 1 .rd->dl_bw->bw : 3984588 .rd->dl_bw->total_bw : 629145 dl_rq[3]: .online : yes .dl_nr_running : 0 .running_bw : 0 .this_bw : 0 .rd->span : 0-3 .dl_nr_migratory : 0 .rd->dl_bw->bw : 3984588 .rd->dl_bw->total_bw : 629145 At this point we hotplug out CPU2 and list the status again: root@dragon:/home/linaro/# echo 0 > /sys/devices/system/cpu/cpu2/online root@dragon:/home/linaro/# cat /proc/rq_debug dl_rq[0]: .online : yes .dl_nr_running : 1 <-- DL task was moved here .running_bw : 0 <-- Contribution not added .this_bw : 0 <-- Contribution not added .rd->span : 0-1,3 .dl_nr_migratory : 1 .rd->dl_bw->bw : 2988441 <-- RD capacity updated .rd->dl_bw->total_bw : 629145 dl_rq[1]: .online : yes .dl_nr_running : 0 .running_bw : 0 .this_bw : 0 .rd->span : 0-1,3 .dl_nr_migratory : 0 .rd->dl_bw->bw : 2988441 .rd->dl_bw->total_bw : 629145 dl_rq[2]: .online : no <-- runqueue no longer online .dl_nr_running : 0 <-- DL task was moved .running_bw : 629145 <-- Contribution not substracted .this_bw : 629145 <-- Contribution not substracted .rd->span : 2 .dl_nr_migratory : 0 .rd->dl_bw->bw : 996147 .rd->dl_bw->total_bw : 0 dl_rq[3]: .online : yes .dl_nr_running : 0 .running_bw : 0 .this_bw : 0 .rd->span : 0-1,3 .dl_nr_migratory : 0 .rd->dl_bw->bw : 2988441 .rd->dl_bw->total_bw: 629145 Upon rebooting the system a splat is also produced: [ 578.184789] ------------[ cut here ]------------ [ 578.184813] dl_rq->running_bw > old [ 578.184838] WARNING: CPU: 0 PID: 4076 at /home/mpoirier/work/linaro/deadline/kernel/kernel/sched/deadline.c:98 dequeue_task_dl+0x128/0x168 [ 578.191693] Modules linked in: [ 578.191705] CPU: 0 PID: 4076 Comm: burn Not tainted 4.15.0-00009-gf597fc1e5764-dirty #259 [ 578.191708] Hardware name: Qualcomm Technologies, Inc. APQ 8016 SBC (DT) [ 578.191713] pstate: 60000085 (nZCv daIf -PAN -UAO) [ 578.191718] pc : dequeue_task_dl+0x128/0x168 [ 578.191722] lr : dequeue_task_dl+0x128/0x168 [ 578.191724] sp : ffff8000383ebbf0 [ 578.191727] x29: ffff8000383ebbf0 x28: ffff800038288000 [ 578.191733] x27: 0000000000000009 x26: ffff800038890000 [ 578.191739] x25: ffff800038994e60 x24: ffff800038994e00 [ 578.191744] x23: 0000000000000000 x22: 0000000000000000 [ 578.191749] x21: 000000000000000e x20: ffff800038288000 [ 578.191755] x19: ffff80003d950aa8 x18: 0000000000000010 [ 578.191761] x17: 0000000000000001 x16: 0000000000002710 [ 578.191766] x15: 0000000000000006 x14: ffff0000892ed37f [ 578.191772] x13: ffff0000092ed38d x12: 0000000000000000 [ 578.191778] x11: ffff8000383eb840 x10: 0000000005f5e0ff [ 578.191784] x9 : 0000000000000034 x8 : 625f676e696e6e75 [ 578.191794] x7 : 723e2d71725f6c64 x6 : 000000000000016c [ 578.191800] x5 : 0000000000000000 x4 : 0000000000000000 [ 578.191806] x3 : ffffffffffffffff x2 : 000080003480f000 [ 578.191812] x1 : ffff800038288000 x0 : 0000000000000017 [ 578.191818] Call trace: [ 578.191824] dequeue_task_dl+0x128/0x168 [ 578.191830] sched_move_task+0xa8/0x150 [ 578.191837] sched_autogroup_exit_task+0x20/0x30 [ 578.191843] do_exit+0x2c4/0x9f8 [ 578.191847] do_group_exit+0x3c/0xa0 [ 578.191853] get_signal+0x2a4/0x568 [ 578.191860] do_signal+0x70/0x210 [ 578.191866] do_notify_resume+0xe0/0x138 [ 578.191870] work_pending+0x8/0x10 [ 578.191874] ---[ end trace 345388d10dc698fe ]--- As a stop-gap measure before the real solution is available this patch prevents users from carrying out a CPU hotplug operation if a DL task is running (or suspended) on said CPU. Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
2018-02-13sched/core: Don't change the affinity of DL tasksMathieu Poirier
Now that we mandate that on creation, the ->cpus_allowed mask of a future DL task has to be equal to the rd->span of the root domain it will be associated with, changing the affinity of a DL task doesn't make sense anymore. Indeed, if we set the task to a smaller affinity set then we may be running out of CPU cycle. If we try to increase the size of the affinity set the new CPUs are necessarily part of another root domain where DL utilisation for the task won't be accounted for. As such simply refuse to change the affinity set of a DL task. Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
2018-02-13cgroup: Constrain the addition of CPUs to a new CPUsetMathieu Poirier
Care must be taken when CPUs are added to a new CPUset. If an ancestor of that set has its sched_load_balance flag switch off then the CPUs in the new CPUset will be added to a new root domain. If the ancestor also had DL tasks those will end up covering more than one root domain, breaking at the same time the DL integrity model. This patch prevents adding CPUs to a new CPUset if one of its ancestor had its sched_load_balance flag off and had DL tasks assigned to it. Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
2018-02-13cgroup: Constrain 'sched_load_balance' flag when DL tasks are presentMathieu Poirier
This patch prevents the 'sched_load_balance' flag from being set to 0 when DL tasks are present in a CPUset. Otherwise we end up with the DL tasks using CPUs belonging to different root domains, something that breaks the mathematical model behind DL bandwidth management. For example on a 4 core system CPUset "set1" has been created and CPUs 0 and 1 assigned to it. A DL task has also been spun off. By default the DL task can use all the CPUs in the default CPUset. If we set the base CPUset's cpuset.sched_load_balance to 0, CPU 0 and 1 are added to a newly created root domain while CPU 2 and 3 endup in the default root domain. But the DL task is still part of the base CPUset and as such can use CPUs 0 to 3, spanning at the same time more than one root domain. Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
2018-02-13sched/deadline: Keep new DL task within root domain's boundaryMathieu Poirier
When considering to move a task to the DL policy we need to make sure the CPUs it is allowed to run on matches the CPUs of the root domain of the runqueue it is currently assigned to. Otherwise the task will be allowed to roam on CPUs outside of this root domain, something that will skew system deadline statistics and potentially lead to over selling DL bandwidth. For example we have a system where the cpuset.sched_load_balance flag of the root cpuset has been set to 0 and the 4 core system split in 2 cpuset: set1 has CPU 0 and 1 while set2 has CPU 2 and 3. This results in 3 cpuset, i.e, the default set that has all 4 CPUs along with set1 and set2 as just depicted. We also have task A that hasn't been assigned to any CPUset and as such, is part of the default (root) CPUset. At the time we want to move task A to a DL policy it has been assigned to CPU1. Since CPU1 is part of set1 the root domain will have 2 CPUs in it and the bandwidth constraint checked against the current DL bandwidth allotment of those 2 CPUs. If task A is promoted to a DL policy it's 'cpus_allowed' mask is still equal to the CPUs in the default CPUset, making it possible for the scheduler to move it to CPU2 and CPU3, which could also be running DL tasks of their own. This patch makes sure that a task's cpus_allowed mask matches the CPUs of the root domain associated to the runqueue it has been assigned to. Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
2018-02-13cpuset: Rebuild root domain deadline accounting informationMathieu Poirier
When the topology of root domains is modified by CPUset or CPUhotplug operations information about the current deadline bandwidth held in the root domain is lost. This patch address the issue by recalculating the lost deadline bandwidth information by circling through the deadline tasks held in CPUsets and adding their current load to the root domain they are associated with. Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
2018-02-13sched/core: Prevent race condition between cpuset and __sched_setscheduler()Mathieu Poirier
No synchronisation mechanism exist between the cpuset subsystem and calls to function __sched_setscheduler(). As such it is possible that new root domains are created on the cpuset side while a deadline acceptance test is carried out in __sched_setscheduler(), leading to a potential oversell of CPU bandwidth. By making available the cpuset_mutex to the core scheduler it is possible to prevent situations such as the one described above from happening. Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
2018-02-13sched/core: Streamlining calls to task_rq_unlock()Mathieu Poirier
Calls to task_rq_unlock() are done several times in function __sched_setscheduler(). This is fine when only the rq lock needs to be handled but not so much when other locks come into play. This patch streamline the release of the rq lock so that only one location need to be modified when dealing with more than one lock. No change of functionality is introduced by this patch. Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
2018-02-13sched/topology: Adding function partition_sched_domains_locked()Mathieu Poirier
Introducing function partition_sched_domains_locked() by taking the mutex locking code out of the original function. That way the work done by partition_sched_domains_locked() can be reused without dropping the mutex lock. No change of functionality is introduced by this patch. Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
2018-02-13sched/topology: Add check to backup comment about hotplug lockMathieu Poirier
The comment above function partition_sched_domains() clearly state that the cpu_hotplug_lock should be held but doesn't mandate one to abide to it. Adding an explicit check backs that comment and make it impossible for anyone to miss the requirement. Suggested-by: Juri Lelli <juri.lelli@redhat.com> Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
2018-02-13cpuset: Add 'cpuset_debug' proc entryMathieu Poirier
2018-02-13sched/debug: Add 'rq_debug' proc entryMathieu Poirier
2018-02-13sched/topology: Add 'doms_debug' proc entryMathieu Poirier
2018-02-11Linux 4.16-rc1Linus Torvalds
2018-02-11unify {de,}mangle_poll(), get rid of kernel-side POLL...Al Viro
except, again, POLLFREE and POLL_BUSY_LOOP. With this, we finally get to the promised end result: - POLL{IN,OUT,...} are plain integers and *not* in __poll_t, so any stray instances of ->poll() still using those will be caught by sparse. - eventpoll.c and select.c warning-free wrt __poll_t - no more kernel-side definitions of POLL... - userland ones are visible through the entire kernel (and used pretty much only for mangle/demangle) - same behavior as after the first series (i.e. sparc et.al. epoll(2) working correctly). Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-02-11vfs: do bulk POLL* -> EPOLL* replacementLinus Torvalds
This is the mindless scripted replacement of kernel use of POLL* variables as described by Al, done by this script: for V in IN OUT PRI ERR RDNORM RDBAND WRNORM WRBAND HUP RDHUP NVAL MSG; do L=`git grep -l -w POLL$V | grep -v '^t' | grep -v /um/ | grep -v '^sa' | grep -v '/poll.h$'|grep -v '^D'` for f in $L; do sed -i "-es/^\([^\"]*\)\(\<POLL$V\>\)/\\1E\\2/" $f; done done with de-mangling cleanups yet to come. NOTE! On almost all architectures, the EPOLL* constants have the same values as the POLL* constants do. But they keyword here is "almost". For various bad reasons they aren't the same, and epoll() doesn't actually work quite correctly in some cases due to this on Sparc et al. The next patch from Al will sort out the final differences, and we should be all done. Scripted-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-02-11Merge branch 'work.poll2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull more poll annotation updates from Al Viro: "This is preparation to solving the problems you've mentioned in the original poll series. After this series, the kernel is ready for running for V in IN OUT PRI ERR RDNORM RDBAND WRNORM WRBAND HUP RDHUP NVAL MSG; do L=`git grep -l -w POLL$V | grep -v '^t' | grep -v /um/ | grep -v '^sa' | grep -v '/poll.h$'|grep -v '^D'` for f in $L; do sed -i "-es/^\([^\"]*\)\(\<POLL$V\>\)/\\1E\\2/" $f; done done as a for bulk search-and-replace. After that, the kernel is ready to apply the patch to unify {de,}mangle_poll(), and then get rid of kernel-side POLL... uses entirely, and we should be all done with that stuff. Basically, that's what you suggested wrt KPOLL..., except that we can use EPOLL... instead - they already are arch-independent (and equal to what is currently kernel-side POLL...). After the preparations (in this series) switch to returning EPOLL... from ->poll() instances is completely mechanical and kernel-side POLL... can go away. The last step (killing kernel-side POLL... and unifying {de,}mangle_poll() has to be done after the search-and-replace job, since we need userland-side POLL... for unified {de,}mangle_poll(), thus the cherry-pick at the last step. After that we will have: - POLL{IN,OUT,...} *not* in __poll_t, so any stray instances of ->poll() still using those will be caught by sparse. - eventpoll.c and select.c warning-free wrt __poll_t - no more kernel-side definitions of POLL... - userland ones are visible through the entire kernel (and used pretty much only for mangle/demangle) - same behavior as after the first series (i.e. sparc et.al. epoll(2) working correctly)" * 'work.poll2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: annotate ep_scan_ready_list() ep_send_events_proc(): return result via esed->res preparation to switching ->poll() to returning EPOLL... add EPOLLNVAL, annotate EPOLL... and event_poll->event use linux/poll.h instead of asm/poll.h xen: fix poll misannotation smc: missing poll annotations
2018-02-11Merge tag 'xtensa-20180211' of git://github.com/jcmvbkbc/linux-xtensaLinus Torvalds
Pull xtense fix from Max Filippov: "Build fix for xtensa architecture with KASAN enabled" * tag 'xtensa-20180211' of git://github.com/jcmvbkbc/linux-xtensa: xtensa: fix build with KASAN
2018-02-11Merge tag 'nios2-v4.16-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/lftan/nios2 Pull nios2 update from Ley Foon Tan: - clean up old Kconfig options from defconfig - remove leading 0x and 0s from bindings notation in dts files * tag 'nios2-v4.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/lftan/nios2: nios2: defconfig: Cleanup from old Kconfig options nios2: dts: Remove leading 0x and 0s from bindings notation
2018-02-11xtensa: fix build with KASANMax Filippov
The commit 917538e212a2 ("kasan: clean up KASAN_SHADOW_SCALE_SHIFT usage") removed KASAN_SHADOW_SCALE_SHIFT definition from include/linux/kasan.h and added it to architecture-specific headers, except for xtensa. This broke the xtensa build with KASAN enabled. Define KASAN_SHADOW_SCALE_SHIFT in arch/xtensa/include/asm/kasan.h Reported by: kbuild test robot <fengguang.wu@intel.com> Fixes: 917538e212a2 ("kasan: clean up KASAN_SHADOW_SCALE_SHIFT usage") Acked-by: Andrey Konovalov <andreyknvl@google.com> Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
2018-02-11nios2: defconfig: Cleanup from old Kconfig optionsKrzysztof Kozlowski
Remove old, dead Kconfig option INET_LRO. It is gone since commit 7bbf3cae65b6 ("ipv4: Remove inet_lro library"). Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org> Acked-by: Ley Foon Tan <ley.foon.tan@intel.com>
2018-02-11nios2: dts: Remove leading 0x and 0s from bindings notationMathieu Malaterre
Improve the DTS files by removing all the leading "0x" and zeros to fix the following dtc warnings: Warning (unit_address_format): Node /XXX unit name should not have leading "0x" and Warning (unit_address_format): Node /XXX unit name should not have leading 0s Converted using the following command: find . -type f \( -iname *.dts -o -iname *.dtsi \) -exec sed -E -i -e "s/@0x([0-9a-fA-F\.]+)\s?\{/@\L\1 \{/g" -e "s/@0+([0-9a-fA-F\.]+)\s?\{/@\L\1 \{/g" {} + For simplicity, two sed expressions were used to solve each warnings separately. To make the regex expression more robust a few other issues were resolved, namely setting unit-address to lower case, and adding a whitespace before the the opening curly brace: https://elinux.org/Device_Tree_Linux#Linux_conventions This is a follow up to commit 4c9847b7375a ("dt-bindings: Remove leading 0x from bindings notation") Reported-by: David Daney <ddaney@caviumnetworks.com> Suggested-by: Rob Herring <robh@kernel.org> Signed-off-by: Mathieu Malaterre <malat@debian.org> Acked-by: Ley Foon Tan <ley.foon.tan@intel.com>
2018-02-10Merge tag 'pci-v4.16-fixes-1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci Pull PCI fix from Bjorn Helgaas: "Fix a POWER9/powernv INTx regression from the merge window (Alexey Kardashevskiy)" * tag 'pci-v4.16-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: powerpc/pci: Fix broken INTx configuration via OF
2018-02-10Merge tag 'for-linus-20180210' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull block fixes from Jens Axboe: "A few fixes to round off the merge window on the block side: - a set of bcache fixes by way of Michael Lyle, from the usual bcache suspects. - add a simple-to-hook-into function for bpf EIO error injection. - fix blk-wbt that mischarectized flushes as reads. Improve the logic so that flushes and writes are accounted as writes, and only reads as reads. From me. - fix requeue crash in BFQ, from Paolo" * tag 'for-linus-20180210' of git://git.kernel.dk/linux-block: block, bfq: add requeue-request hook bcache: fix for data collapse after re-attaching an attached device bcache: return attach error when no cache set exist bcache: set writeback_rate_update_seconds in range [1, 60] seconds bcache: fix for allocator and register thread race bcache: set error_limit correctly bcache: properly set task state in bch_writeback_thread() bcache: fix high CPU occupancy during journal bcache: add journal statistic block: Add should_fail_bio() for bpf error injection blk-wbt: account flush requests correctly
2018-02-10Merge tag 'platform-drivers-x86-v4.16-3' of git://github.com/dvhart/linux-pdx86Linus Torvalds
Pull x86 platform driver updates from Darren Hart: "Mellanox fixes and new system type support. Mostly data for new system types with a correction and an uninitialized variable fix" [ Pulling from github because git.infradead.org currently seems to be down for some reason, but Darren had a backup location - Linus ] * tag 'platform-drivers-x86-v4.16-3' of git://github.com/dvhart/linux-pdx86: platform/x86: mlx-platform: Add support for new 200G IB and Ethernet systems platform/x86: mlx-platform: Add support for new msn201x system type platform/x86: mlx-platform: Add support for new msn274x system type platform/x86: mlx-platform: Fix power cable setting for msn21xx family platform/x86: mlx-platform: Add define for the negative bus platform/x86: mlx-platform: Use defines for bus assignment platform/mellanox: mlxreg-hotplug: Fix uninitialized variable
2018-02-10Merge tag 'chrome-platform-for-linus-4.16' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/bleung/chrome-platform Pull chrome platform updates from Benson Leung: - move cros_ec_dev to drivers/mfd - other small maintenance fixes [ The cros_ec_dev movement came in earlier through the MFD tree - Linus ] * tag 'chrome-platform-for-linus-4.16' of git://git.kernel.org/pub/scm/linux/kernel/git/bleung/chrome-platform: platform/chrome: Use proper protocol transfer function platform/chrome: cros_ec_lpc: Add support for Google Glimmer platform/chrome: cros_ec_lpc: Register the driver if ACPI entry is missing. platform/chrome: cros_ec_lpc: remove redundant pointer request cros_ec: fix nul-termination for firmware build info platform/chrome: chromeos_laptop: make chromeos_laptop const
2018-02-10Merge tag 'kvm-4.16-1' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull KVM updates from Radim Krčmář: "ARM: - icache invalidation optimizations, improving VM startup time - support for forwarded level-triggered interrupts, improving performance for timers and passthrough platform devices - a small fix for power-management notifiers, and some cosmetic changes PPC: - add MMIO emulation for vector loads and stores - allow HPT guests to run on a radix host on POWER9 v2.2 CPUs without requiring the complex thread synchronization of older CPU versions - improve the handling of escalation interrupts with the XIVE interrupt controller - support decrement register migration - various cleanups and bugfixes. s390: - Cornelia Huck passed maintainership to Janosch Frank - exitless interrupts for emulated devices - cleanup of cpuflag handling - kvm_stat counter improvements - VSIE improvements - mm cleanup x86: - hypervisor part of SEV - UMIP, RDPID, and MSR_SMI_COUNT emulation - paravirtualized TLB shootdown using the new KVM_VCPU_PREEMPTED bit - allow guests to see TOPOEXT, GFNI, VAES, VPCLMULQDQ, and more AVX512 features - show vcpu id in its anonymous inode name - many fixes and cleanups - per-VCPU MSR bitmaps (already merged through x86/pti branch) - stable KVM clock when nesting on Hyper-V (merged through x86/hyperv)" * tag 'kvm-4.16-1' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (197 commits) KVM: PPC: Book3S: Add MMIO emulation for VMX instructions KVM: PPC: Book3S HV: Branch inside feature section KVM: PPC: Book3S HV: Make HPT resizing work on POWER9 KVM: PPC: Book3S HV: Fix handling of secondary HPTEG in HPT resizing code KVM: PPC: Book3S PR: Fix broken select due to misspelling KVM: x86: don't forget vcpu_put() in kvm_arch_vcpu_ioctl_set_sregs() KVM: PPC: Book3S PR: Fix svcpu copying with preemption enabled KVM: PPC: Book3S HV: Drop locks before reading guest memory kvm: x86: remove efer_reload entry in kvm_vcpu_stat KVM: x86: AMD Processor Topology Information x86/kvm/vmx: do not use vm-exit instruction length for fast MMIO when running nested kvm: embed vcpu id to dentry of vcpu anon inode kvm: Map PFN-type memory regions as writable (if possible) x86/kvm: Make it compile on 32bit and with HYPYERVISOR_GUEST=n KVM: arm/arm64: Fixup userspace irqchip static key optimization KVM: arm/arm64: Fix userspace_irqchip_in_use counting KVM: arm/arm64: Fix incorrect timer_is_pending logic MAINTAINERS: update KVM/s390 maintainers MAINTAINERS: add Halil as additional vfio-ccw maintainer MAINTAINERS: add David as a reviewer for KVM/s390 ...
2018-02-10powerpc/pci: Fix broken INTx configuration via OFAlexey Kardashevskiy
59f47eff03a0 ("powerpc/pci: Use of_irq_parse_and_map_pci() helper") replaced of_irq_parse_pci() + irq_create_of_mapping() with of_irq_parse_and_map_pci(), but neglected to capture the virq returned by irq_create_of_mapping(), so virq remained zero, which caused INTx configuration to fail. Save the virq value returned by of_irq_parse_and_map_pci() and correct the virq declaration to match the of_irq_parse_and_map_pci() signature. Fixes: 59f47eff03a0 "powerpc/pci: Use of_irq_parse_and_map_pci() helper" Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> [bhelgaas: changelog] Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2018-02-09Merge tag 'kbuild-v4.16-2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild Pull more Kbuild updates from Masahiro Yamada: "Makefile changes: - enable unused-variable warning that was wrongly disabled for clang Kconfig changes: - warn about blank 'help' and fix existing instances - fix 'choice' behavior to not write out invisible symbols - fix misc weirdness Coccinell changes: - fix false positive of free after managed memory alloc detection - improve performance of NULL dereference detection" * tag 'kbuild-v4.16-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (21 commits) kconfig: remove const qualifier from sym_expand_string_value() kconfig: add xrealloc() helper kconfig: send error messages to stderr kconfig: echo stdin to stdout if either is redirected kconfig: remove check_stdin() kconfig: remove 'config*' pattern from .gitignnore kconfig: show '?' prompt even if no help text is available kconfig: do not write choice values when their dependency becomes n coccinelle: deref_null: avoid useless computation coccinelle: devm_free: reduce false positives kbuild: clang: disable unused variable warnings only when constant kconfig: Warn if help text is blank nios2: kconfig: Remove blank help text arm: vt8500: kconfig: Remove blank help text MIPS: kconfig: Remove blank help text MIPS: BCM63XX: kconfig: Remove blank help text lib/Kconfig.debug: Remove blank help text Staging: rtl8192e: kconfig: Remove blank help text Staging: rtl8192u: kconfig: Remove blank help text mmc: kconfig: Remove blank help text ...
2018-02-09mconsole_proc(): don't mess with file->f_posAl Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-02-09Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull misc vfs fixes from Al Viro. * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: seq_file: fix incomplete reset on read from zero offset kernfs: fix regression in kernfs_fop_write caused by wrong type
2018-02-10kconfig: remove const qualifier from sym_expand_string_value()Masahiro Yamada
This function returns realloc'ed memory, so the returned pointer must be passed to free() when done. So, 'const' qualifier is odd. It is allowed to modify the expanded string. Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
2018-02-10kconfig: add xrealloc() helperMasahiro Yamada
We already have xmalloc(), xcalloc(). Add xrealloc() as well to save tedious error handling. Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
2018-02-09platform/x86: mlx-platform: Add support for new 200G IB and Ethernet systemsVadim Pasternak
It adds support for new Mellanox system types of basic classes qmb7, sn34, sn37, containing systems QMB700 (40x200GbE InfiniBand switch), SN3700 (32x200GbE and 16x400GbE Ethernet switch) and SN3410 (6x400GbE plus 48x50GbE Ethernet switch). These are the Top of the Rack systems, equipped with Mellanox COM-Express carrier board and switch board with Mellanox Quantum device, which supports InfiniBand switching with 40X200G ports and line rate of up to HDR speed or with Mellanox Spectrum-2 device, which supports Ethernet switching with 32X200G ports line rate of up to HDR speed. Signed-off-by: Vadim Pasternak <vadimp@mellanox.com> Signed-off-by: Darren Hart (VMware) <dvhart@infradead.org>
2018-02-09platform/x86: mlx-platform: Add support for new msn201x system typeVadim Pasternak
It adds support for new Mellanox system types of basic half unit size class msn201x, containing system MSN2010 (18x10GbE plus 4x4x25GbE) half and its derivatives. This is the Top of the Rack system, equipped with Mellanox Small Form Factor carrier board and switch board with Mellanox Spectrum device, which supports Ethernet switching with 32X100G ports line rate of up to EDR speed. Signed-off-by: Vadim Pasternak <vadimp@mellanox.com> Signed-off-by: Darren Hart (VMware) <dvhart@infradead.org>
2018-02-09platform/x86: mlx-platform: Add support for new msn274x system typeVadim Pasternak
It adds support for new Mellanox system types of basic class msn274x, containing system MSN2740 (32x100GbE Ethernet switch with cost reduction) and its derivatives. These are the Top of the Rack system, equipped with Mellanox Small Form Factor carrier board and switch board with Mellanox Spectrum device, which supports Ethernet switching with 32X100G ports line rate of up to EDR speed. Signed-off-by: Vadim Pasternak <vadimp@mellanox.com> Signed-off-by: Darren Hart (VMware) <dvhart@infradead.org>
2018-02-09Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds
Pull networking fixes from David Miller: 1) Make allocations less aggressive in x_tables, from Minchal Hocko. 2) Fix netfilter flowtable Kconfig deps, from Pablo Neira Ayuso. 3) Fix connection loss problems in rtlwifi, from Larry Finger. 4) Correct DRAM dump length for some chips in ath10k driver, from Yu Wang. 5) Fix ABORT handling in rxrpc, from David Howells. 6) Add SPDX tags to Sun networking drivers, from Shannon Nelson. 7) Some ipv6 onlink handling fixes, from David Ahern. 8) Netem packet scheduler interval calcualtion fix from Md. Islam. 9) Don't put crypto buffers on-stack in rxrpc, from David Howells. 10) Fix handling of error non-delivery status in netlink multicast delivery over multiple namespaces, from Nicolas Dichtel. 11) Missing xdp flush in tuntap driver, from Jason Wang. 12) Synchonize RDS protocol netns/module teardown with rds object management, from Sowini Varadhan. 13) Add nospec annotations to mpls, from Dan Williams. 14) Fix SKB truesize handling in TIPC, from Hoang Le. 15) Interrupt masking fixes in stammc from Niklas Cassel. 16) Don't allow ptr_ring objects to be sized outside of kmalloc's limits, from Jason Wang. 17) Don't allow SCTP chunks to be built which will have a length exceeding the chunk header's 16-bit length field, from Alexey Kodanev. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (82 commits) ibmvnic: Remove skb->protocol checks in ibmvnic_xmit bpf: fix rlimit in reuseport net selftest sctp: verify size of a new chunk in _sctp_make_chunk() s390/qeth: fix SETIP command handling s390/qeth: fix underestimated count of buffer elements ptr_ring: try vmalloc() when kmalloc() fails ptr_ring: fail early if queue occupies more than KMALLOC_MAX_SIZE net: stmmac: remove redundant enable of PMT irq net: stmmac: rename GMAC_INT_DEFAULT_MASK for dwmac4 net: stmmac: discard disabled flags in interrupt status register ibmvnic: Reset long term map ID counter tools/libbpf: handle issues with bpf ELF objects containing .eh_frames selftests/bpf: add selftest that use test_libbpf_open selftests/bpf: add test program for loading BPF ELF files tools/libbpf: improve the pr_debug statements to contain section numbers bpf: Sync kernel ABI header with tooling header for bpf_common.h net: phy: fix phy_start to consider PHY_IGNORE_INTERRUPT net: thunder: change q_len's type to handle max ring size tipc: fix skb truesize/datasize ratio control net/sched: cls_u32: fix cls_u32 on filter replace ...
2018-02-09Merge tag 'nfs-for-4.16-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfsLinus Torvalds
Pull more NFS client updates from Trond Myklebust: "A few bugfixes and some small sunrpc latency/performance improvements before the merge window closes: Stable fixes: - fix an incorrect calculation of the RDMA send scatter gather element limit - fix an Oops when attempting to free resources after RDMA device removal Bugfixes: - SUNRPC: Ensure we always release the TCP socket in a timely fashion when the connection is shut down. - SUNRPC: Don't call __UDPX_INC_STATS() from a preemptible context Latency/Performance: - SUNRPC: Queue latency sensitive socket tasks to the less contended xprtiod queue - SUNRPC: Make the xprtiod workqueue unbounded. - SUNRPC: Make the rpciod workqueue unbounded" * tag 'nfs-for-4.16-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: SUNRPC: Don't call __UDPX_INC_STATS() from a preemptible context fix parallelism for rpc tasks Make the xprtiod workqueue unbounded. SUNRPC: Queue latency-sensitive socket tasks to xprtiod SUNRPC: Ensure we always close the socket after a connection shuts down xprtrdma: Fix BUG after a device removal xprtrdma: Fix calculation of ri_max_send_sges
2018-02-09Merge branch 'for-next' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending Pull SCSI target updates from Nicholas Bellinger: "The highlights include: - numerous target-core-user improvements related to queue full and timeout handling. (MNC) - prevent target-core-user corruption when invalid data page is requested. (MNC) - add target-core device action configfs attributes to allow user-space to trigger events separate from existing attributes exposed to end-users. (MNC) - fix iscsi-target NULL pointer dereference 4.6+ regression in CHAP error path. (David Disseldorp) - avoid target-core backend UNMAP callbacks if range is zero. (Andrei Vagin) - fix a iscsi-target 4.14+ regression related multiple PDU logins, that was exposed due to removal of TCP prequeue support. (Florian Westphal + MNC) Also, there is a iser-target bug still being worked on for post -rc1 code to address a long standing issue resulting in persistent ib_post_send() failures, for RNICs with small max_send_sge" * 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending: (36 commits) iscsi-target: make sure to wake up sleeping login worker tcmu: Fix trailing semicolon tcmu: fix cmd user after free target: fix destroy device in target_configure_device tcmu: allow userspace to reset ring target core: add device action configfs files tcmu: fix error return code in tcmu_configure_device() target_core_user: add cmd id to broken ring message target: add SAM_STAT_BUSY sense reason tcmu: prevent corruption when invalid data page requested target: don't call an unmap callback if a range length is zero target/iscsi: avoid NULL dereference in CHAP auth error path cxgbit: call neigh_event_send() to update MAC address target: tcm_loop: Use seq_puts() in tcm_loop_show_info() target: tcm_loop: Delete an unnecessary return statement in tcm_loop_submission_work() target: tcm_loop: Delete two unnecessary variable initialisations in tcm_loop_issue_tmr() target: tcm_loop: Combine substrings for 26 messages target: tcm_loop: Improve a size determination in two functions target: tcm_loop: Delete an error message for a failed memory allocation in four functions sbp-target: Delete an error message for a failed memory allocation in three functions ...
2018-02-09Merge tag 'trace-v4.16-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing fixes from Steven Rostedt: "Al Viro discovered some breakage with the parsing of the set_ftrace_filter as well as the removing of function probes. This fixes the code with Al's suggestions. I also added a few selftests to test the broken cases such that they wont happen again" * tag 'trace-v4.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: selftests/ftrace: Add more tests for removing of function probes selftests/ftrace: Add some missing glob checks selftests/ftrace: Have reset_ftrace_filter handle multiple instances selftests/ftrace: Have reset_ftrace_filter handle modules tracing: Fix parsing of globs with a wildcard at the beginning ftrace: Remove incorrect setting of glob search field
2018-02-09Merge tag '4.16-minor-rc-SMB3-fixes' of git://git.samba.org/sfrench/cifs-2.6Linus Torvalds
Pull cifs fixes from Steve French: "There are a couple additional security fixes that are still being tested that are not in this set." * tag '4.16-minor-rc-SMB3-fixes' of git://git.samba.org/sfrench/cifs-2.6: Add missing structs and defines from recent SMB3.1.1 documentation address lock imbalance warnings in smbdirect.c cifs: silence compiler warnings showing up with gcc-8.0.0 Add some missing debug fields in server and tcon structs
2018-02-09Merge tag 'fbdev-v4.16-fix' of git://github.com/bzolnier/linuxLinus Torvalds
Pull fbdev fix from Bartlomiej Zolnierkiewicz: "Fix building of the omapfb driver (Tomi Valkeinen)" * tag 'fbdev-v4.16-fix' of git://github.com/bzolnier/linux: video: omapfb: fix missing #includes
2018-02-09Merge tag 'kvm-ppc-next-4.16-2' of ↵Radim Krčmář
git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc Second PPC KVM update for 4.16 Seven fixes that are either trivial or that address bugs that people are actually hitting. The main ones are: - Drop spinlocks before reading guest memory - Fix a bug causing corruption of VCPU state in PR KVM with preemption enabled - Make HPT resizing work on POWER9 - Add MMIO emulation for vector loads and stores, because guests now use these instructions in memcpy and similar routines.
2018-02-09Merge branch 'msr-bitmaps' of git://git.kernel.org/pub/scm/virt/kvm/kvmRadim Krčmář
This topic branch allocates separate MSR bitmaps for each VCPU. This is required for the IBRS enablement to choose, on a per-VM basis, whether to intercept the SPEC_CTRL and PRED_CMD MSRs; the IBRS enablement comes in through the tip tree.
2018-02-09ibmvnic: Remove skb->protocol checks in ibmvnic_xmitJohn Allen
Having these checks in ibmvnic_xmit causes problems with VLAN tagging and balance-alb/tlb bonding modes. The restriction they imposed can be removed. Signed-off-by: John Allen <jallen@linux.vnet.ibm.com> Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-09bpf: fix rlimit in reuseport net selftestDaniel Borkmann
Fix two issues in the reuseport_bpf selftests that were reported by Linaro CI: [...] + ./reuseport_bpf ---- IPv4 UDP ---- Testing EBPF mod 10... Reprograming, testing mod 5... ./reuseport_bpf: ebpf error. log: 0: (bf) r6 = r1 1: (20) r0 = *(u32 *)skb[0] 2: (97) r0 %= 10 3: (95) exit processed 4 insns : Operation not permitted + echo FAIL [...] ---- IPv4 TCP ---- Testing EBPF mod 10... ./reuseport_bpf: failed to bind send socket: Address already in use + echo FAIL [...] For the former adjust rlimit since this was the cause of failure for loading the BPF prog, and for the latter add SO_REUSEADDR. Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org> Link: https://bugs.linaro.org/show_bug.cgi?id=3502 Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-09sctp: verify size of a new chunk in _sctp_make_chunk()Alexey Kodanev
When SCTP makes INIT or INIT_ACK packet the total chunk length can exceed SCTP_MAX_CHUNK_LEN which leads to kernel panic when transmitting these packets, e.g. the crash on sending INIT_ACK: [ 597.804948] skbuff: skb_over_panic: text:00000000ffae06e4 len:120168 put:120156 head:000000007aa47635 data:00000000d991c2de tail:0x1d640 end:0xfec0 dev:<NULL> ... [ 597.976970] ------------[ cut here ]------------ [ 598.033408] kernel BUG at net/core/skbuff.c:104! [ 600.314841] Call Trace: [ 600.345829] <IRQ> [ 600.371639] ? sctp_packet_transmit+0x2095/0x26d0 [sctp] [ 600.436934] skb_put+0x16c/0x200 [ 600.477295] sctp_packet_transmit+0x2095/0x26d0 [sctp] [ 600.540630] ? sctp_packet_config+0x890/0x890 [sctp] [ 600.601781] ? __sctp_packet_append_chunk+0x3b4/0xd00 [sctp] [ 600.671356] ? sctp_cmp_addr_exact+0x3f/0x90 [sctp] [ 600.731482] sctp_outq_flush+0x663/0x30d0 [sctp] [ 600.788565] ? sctp_make_init+0xbf0/0xbf0 [sctp] [ 600.845555] ? sctp_check_transmitted+0x18f0/0x18f0 [sctp] [ 600.912945] ? sctp_outq_tail+0x631/0x9d0 [sctp] [ 600.969936] sctp_cmd_interpreter.isra.22+0x3be1/0x5cb0 [sctp] [ 601.041593] ? sctp_sf_do_5_1B_init+0x85f/0xc30 [sctp] [ 601.104837] ? sctp_generate_t1_cookie_event+0x20/0x20 [sctp] [ 601.175436] ? sctp_eat_data+0x1710/0x1710 [sctp] [ 601.233575] sctp_do_sm+0x182/0x560 [sctp] [ 601.284328] ? sctp_has_association+0x70/0x70 [sctp] [ 601.345586] ? sctp_rcv+0xef4/0x32f0 [sctp] [ 601.397478] ? sctp6_rcv+0xa/0x20 [sctp] ... Here the chunk size for INIT_ACK packet becomes too big, mostly because of the state cookie (INIT packet has large size with many address parameters), plus additional server parameters. Later this chunk causes the panic in skb_put_data(): skb_packet_transmit() sctp_packet_pack() skb_put_data(nskb, chunk->skb->data, chunk->skb->len); 'nskb' (head skb) was previously allocated with packet->size from u16 'chunk->chunk_hdr->length'. As suggested by Marcelo we should check the chunk's length in _sctp_make_chunk() before trying to allocate skb for it and discard a chunk if its size bigger than SCTP_MAX_CHUNK_LEN. Signed-off-by: Alexey Kodanev <alexey.kodanev@oracle.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leinter@gmail.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-09Merge branch 's390-qeth-fixes'David S. Miller
Julian Wiedmann says: ==================== s390/qeth: fixes 2018-02-09 please apply the following two qeth patches for 4.16 and stable. One restricts a command quirk to the intended commandd type, while the other fixes an off-by-one during data transmission that can cause qeth to build malformed buffer descriptors. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-09s390/qeth: fix SETIP command handlingJulian Wiedmann
send_control_data() applies some special handling to SETIP v4 IPA commands. But current code parses *all* command types for the SETIP command code. Limit the command code check to IPA commands. Fixes: 5b54e16f1a54 ("qeth: do not spin for SETIP ip assist command") Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-09s390/qeth: fix underestimated count of buffer elementsUrsula Braun
For a memory range/skb where the last byte falls onto a page boundary (ie. 'end' is of the form xxx...xxx001), the PFN_UP() part of the calculation currently doesn't round up to the next PFN due to an off-by-one error. Thus qeth believes that the skb occupies one page less than it actually does, and may select a IO buffer that doesn't have enough spare buffer elements to fit all of the skb's data. HW detects this as a malformed buffer descriptor, and raises an exception which then triggers device recovery. Fixes: 2863c61334aa ("qeth: refactor calculation of SBALE count") Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com> Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>