path: root/kernel
AgeCommit message (Collapse)Author
6 daysMerge tag 'trace-v5.12-rc7' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing fix from Steven Rostedt: "Fix a memory link in dyn_event_release(). An error path exited the function before freeing the allocated 'argv' variable" * tag 'trace-v5.12-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: tracing/dynevent: Fix a memory leak in an error handling path
6 daystracing/dynevent: Fix a memory leak in an error handling pathChristophe JAILLET
We must free 'argv' before returning, as already done in all the other paths of this function. Link: https://lkml.kernel.org/r/21e3594ccd7fc88c5c162c98450409190f304327.1618136448.git.christophe.jaillet@wanadoo.fr Fixes: d262271d0483 ("tracing/dynevent: Delegate parsing to create function") Acked-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
8 daysMerge tag 'locking-urgent-2021-04-11' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull locking fixlets from Ingo Molnar: "Two minor fixes: one for a Clang warning, the other improves an ambiguous/confusing kernel log message" * tag 'locking-urgent-2021-04-11' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: lockdep: Address clang -Wformat warning printing for %hd lockdep: Add a missing initialization hint to the "INFO: Trying to register non-static key" message
10 daysMerge branch 'akpm' (patches from Andrew)Linus Torvalds
Merge misc fixes from Andrew Morton: "14 patches. Subsystems affected by this patch series: mm (kasan, gup, pagecache, and kfence), MAINTAINERS, mailmap, nds32, gcov, ocfs2, ia64, and lib" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: lib: fix kconfig dependency on ARCH_WANT_FRAME_POINTERS kfence, x86: fix preemptible warning on KPTI-enabled systems lib/test_kasan_module.c: suppress unused var warning kasan: fix conflict with page poisoning fs: direct-io: fix missing sdio->boundary ia64: fix user_stack_pointer() for ptrace() ocfs2: fix deadlock between setattr and dio_end_io_write gcov: re-fix clang-11+ support nds32: flush_dcache_page: use page_mapping_file to avoid races with swapoff mm/gup: check page posion status for coredump. .mailmap: fix old email addresses mailmap: update email address for Jordan Crouse treewide: change my e-mail address, fix my name MAINTAINERS: update CZ.NIC's Turris information
10 daysMerge tag 'net-5.12-rc7' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Jakub Kicinski: "Networking fixes for 5.12-rc7, including fixes from can, ipsec, mac80211, wireless, and bpf trees. No scary regressions here or in the works, but small fixes for 5.12 changes keep coming. Current release - regressions: - virtio: do not pull payload in skb->head - virtio: ensure mac header is set in virtio_net_hdr_to_skb() - Revert "net: correct sk_acceptq_is_full()" - mptcp: revert "mptcp: provide subflow aware release function" - ethernet: lan743x: fix ethernet frame cutoff issue - dsa: fix type was not set for devlink port - ethtool: remove link_mode param and derive link params from driver - sched: htb: fix null pointer dereference on a null new_q - wireless: iwlwifi: Fix softirq/hardirq disabling in iwl_pcie_enqueue_hcmd() - wireless: iwlwifi: fw: fix notification wait locking - wireless: brcmfmac: p2p: Fix deadlock introduced by avoiding the rtnl dependency Current release - new code bugs: - napi: fix hangup on napi_disable for threaded napi - bpf: take module reference for trampoline in module - wireless: mt76: mt7921: fix airtime reporting and related tx hangs - wireless: iwlwifi: mvm: rfi: don't lock mvm->mutex when sending config command Previous releases - regressions: - rfkill: revert back to old userspace API by default - nfc: fix infinite loop, refcount & memory leaks in LLCP sockets - let skb_orphan_partial wake-up waiters - xfrm/compat: Cleanup WARN()s that can be user-triggered - vxlan, geneve: do not modify the shared tunnel info when PMTU triggers an ICMP reply - can: fix msg_namelen values depending on CAN_REQUIRED_SIZE - can: uapi: mark union inside struct can_frame packed - sched: cls: fix action overwrite reference counting - sched: cls: fix err handler in tcf_action_init() - ethernet: mlxsw: fix ECN marking in tunnel decapsulation - ethernet: nfp: Fix a use after free in nfp_bpf_ctrl_msg_rx - ethernet: i40e: fix receiving of single packets in xsk zero-copy mode - ethernet: cxgb4: avoid collecting SGE_QBASE regs during traffic Previous releases - always broken: - bpf: Refuse non-O_RDWR flags in BPF_OBJ_GET - bpf: Refcount task stack in bpf_get_task_stack - bpf, x86: Validate computation of branch displacements - ieee802154: fix many similar syzbot-found bugs - fix NULL dereferences in netlink attribute handling - reject unsupported operations on monitor interfaces - fix error handling in llsec_key_alloc() - xfrm: make ipv4 pmtu check honor ip header df - xfrm: make hash generation lock per network namespace - xfrm: esp: delete NETIF_F_SCTP_CRC bit from features for esp offload - ethtool: fix incorrect datatype in set_eee ops - xdp: fix xdp_return_frame() kernel BUG throw for page_pool memory model - openvswitch: fix send of uninitialized stack memory in ct limit reply Misc: - udp: add get handling for UDP_GRO sockopt" * tag 'net-5.12-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (182 commits) net: fix hangup on napi_disable for threaded napi net: hns3: Trivial spell fix in hns3 driver lan743x: fix ethernet frame cutoff issue net: ipv6: check for validity before dereferencing cfg->fc_nlinfo.nlh net: dsa: lantiq_gswip: Configure all remaining GSWIP_MII_CFG bits net: dsa: lantiq_gswip: Don't use PHY auto polling net: sched: sch_teql: fix null-pointer dereference ipv6: report errors for iftoken via netlink extack net: sched: fix err handler in tcf_action_init() net: sched: fix action overwrite reference counting Revert "net: sched: bump refcount for new action in ACT replace mode" ice: fix memory leak of aRFS after resuming from suspend i40e: Fix sparse warning: missing error code 'err' i40e: Fix sparse error: 'vsi->netdev' could be null i40e: Fix sparse error: uninitialized symbol 'ring' i40e: Fix sparse errors in i40e_txrx.c i40e: Fix parameters in aq_get_phy_register() nl80211: fix beacon head validation bpf, x86: Validate computation of branch displacements for x86-32 bpf, x86: Validate computation of branch displacements for x86-64 ...
10 daysgcov: re-fix clang-11+ supportNick Desaulniers
LLVM changed the expected function signature for llvm_gcda_emit_function() in the clang-11 release. Users of clang-11 or newer may have noticed their kernels producing invalid coverage information: $ llvm-cov gcov -a -c -u -f -b <input>.gcda -- gcno=<input>.gcno 1 <func>: checksum mismatch, \ (<lineno chksum A>, <cfg chksum B>) != (<lineno chksum A>, <cfg chksum C>) 2 Invalid .gcda File! ... Fix up the function signatures so calling this function interprets its parameters correctly and computes the correct cfg checksum. In particular, in clang-11, the additional checksum is no longer optional. Link: https://reviews.llvm.org/rG25544ce2df0daa4304c07e64b9c8b0f7df60c11d Link: https://lkml.kernel.org/r/20210408184631.1156669-1-ndesaulniers@google.com Reported-by: Prasad Sodagudi <psodagud@quicinc.com> Tested-by: Prasad Sodagudi <psodagud@quicinc.com> Signed-off-by: Nick Desaulniers <ndesaulniers@google.com> Reviewed-by: Nathan Chancellor <nathan@kernel.org> Cc: <stable@vger.kernel.org> [5.4+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-04-04workqueue/watchdog: Make unbound workqueues aware of touch_softlockup_watchdog()Wang Qing
84;0;0c84;0;0c There are two workqueue-specific watchdog timestamps: + @wq_watchdog_touched_cpu (per-CPU) updated by touch_softlockup_watchdog() + @wq_watchdog_touched (global) updated by touch_all_softlockup_watchdogs() watchdog_timer_fn() checks only the global @wq_watchdog_touched for unbound workqueues. As a result, unbound workqueues are not aware of touch_softlockup_watchdog(). The watchdog might report a stall even when the unbound workqueues are blocked by a known slow code. Solution: touch_softlockup_watchdog() must touch also the global @wq_watchdog_touched timestamp. The global timestamp can no longer be used for bound workqueues because it is now updated from all CPUs. Instead, bound workqueues have to check only @wq_watchdog_touched_cpu and these timestamps have to be updated for all CPUs in touch_all_softlockup_watchdogs(). Beware: The change might cause the opposite problem. An unbound workqueue might get blocked on CPU A because of a real softlockup. The workqueue watchdog would miss it when the timestamp got touched on CPU B. It is acceptable because softlockups are detected by softlockup watchdog. The workqueue watchdog is there to detect stalls where a work never finishes, for example, because of dependencies of works queued into the same workqueue. V3: - Modify the commit message clearly according to Petr's suggestion. Signed-off-by: Wang Qing <wangqing@vivo.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2021-04-04workqueue: Move the position of debug_work_activate() in __queue_work()Zqiang
The debug_work_activate() is called on the premise that the work can be inserted, because if wq be in WQ_DRAINING status, insert work may be failed. Fixes: e41e704bc4f4 ("workqueue: improve destroy_workqueue() debuggability") Signed-off-by: Zqiang <qiang.zhang@windriver.com> Reviewed-by: Lai Jiangshan <jiangshanlai@gmail.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2021-04-02Merge tag 'trace-v5.12-rc5-2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing fix from Steven Rostedt: "Fix stack trace entry size to stop showing garbage The macro that creates both the structure and the format displayed to user space for the stack trace event was changed a while ago to fix the parsing by user space tooling. But this change also modified the structure used to store the stack trace event. It changed the caller array field from [0] to [8]. Even though the size in the ring buffer is dynamic and can be something other than 8 (user space knows how to handle this), the 8 extra words was not accounted for when reserving the event on the ring buffer, and added 8 more entries, due to the calculation of "sizeof(*entry) + nr_entries * sizeof(long)", as the sizeof(*entry) now contains 8 entries. The size of the caller field needs to be subtracted from the size of the entry to create the correct allocation size" * tag 'trace-v5.12-rc5-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: tracing: Fix stack trace event size
2021-04-01bpf: program: Refuse non-O_RDWR flags in BPF_OBJ_GETLorenz Bauer
As for bpf_link, refuse creating a non-O_RDWR fd. Since program fds currently don't allow modifications this is a precaution, not a straight up bug fix. Signed-off-by: Lorenz Bauer <lmb@cloudflare.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20210326160501.46234-2-lmb@cloudflare.com
2021-04-01bpf: link: Refuse non-O_RDWR flags in BPF_OBJ_GETLorenz Bauer
Invoking BPF_OBJ_GET on a pinned bpf_link checks the path access permissions based on file_flags, but the returned fd ignores flags. This means that any user can acquire a "read-write" fd for a pinned link with mode 0664 by invoking BPF_OBJ_GET with BPF_F_RDONLY in file_flags. The fd can be used to invoke BPF_LINK_DETACH, etc. Fix this by refusing non-O_RDWR flags in BPF_OBJ_GET. This works because OBJ_GET by default returns a read write mapping and libbpf doesn't expose a way to override this behaviour for programs and links. Fixes: 70ed506c3bbc ("bpf: Introduce pinnable bpf_link abstraction") Signed-off-by: Lorenz Bauer <lmb@cloudflare.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20210326160501.46234-1-lmb@cloudflare.com
2021-04-01bpf: Refcount task stack in bpf_get_task_stackDave Marchevsky
On x86 the struct pt_regs * grabbed by task_pt_regs() points to an offset of task->stack. The pt_regs are later dereferenced in __bpf_get_stack (e.g. by user_mode() check). This can cause a fault if the task in question exits while bpf_get_task_stack is executing, as warned by task_stack_page's comment: * When accessing the stack of a non-current task that might exit, use * try_get_task_stack() instead. task_stack_page will return a pointer * that could get freed out from under you. Taking the comment's advice and using try_get_task_stack() and put_task_stack() to hold task->stack refcount, or bail early if it's already 0. Incrementing stack_refcount will ensure the task's stack sticks around while we're using its data. I noticed this bug while testing a bpf task iter similar to bpf_iter_task_stack in selftests, except mine grabbed user stack, and getting intermittent crashes, which resulted in dumps like: BUG: unable to handle page fault for address: 0000000000003fe0 \#PF: supervisor read access in kernel mode \#PF: error_code(0x0000) - not-present page RIP: 0010:__bpf_get_stack+0xd0/0x230 <snip...> Call Trace: bpf_prog_0a2be35c092cb190_get_task_stacks+0x5d/0x3ec bpf_iter_run_prog+0x24/0x81 __task_seq_show+0x58/0x80 bpf_seq_read+0xf7/0x3d0 vfs_read+0x91/0x140 ksys_read+0x59/0xd0 do_syscall_64+0x48/0x120 entry_SYSCALL_64_after_hwframe+0x44/0xa9 Fixes: fa28dcb82a38 ("bpf: Introduce helper bpf_get_task_stack()") Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Song Liu <songliubraving@fb.com> Link: https://lore.kernel.org/bpf/20210401000747.3648767-1-davemarchevsky@fb.com
2021-04-01tracing: Fix stack trace event sizeSteven Rostedt (VMware)
Commit cbc3b92ce037 fixed an issue to modify the macros of the stack trace event so that user space could parse it properly. Originally the stack trace format to user space showed that the called stack was a dynamic array. But it is not actually a dynamic array, in the way that other dynamic event arrays worked, and this broke user space parsing for it. The update was to make the array look to have 8 entries in it. Helper functions were added to make it parse it correctly, as the stack was dynamic, but was determined by the size of the event stored. Although this fixed user space on how it read the event, it changed the internal structure used for the stack trace event. It changed the array size from [0] to [8] (added 8 entries). This increased the size of the stack trace event by 8 words. The size reserved on the ring buffer was the size of the stack trace event plus the number of stack entries found in the stack trace. That commit caused the amount to be 8 more than what was needed because it did not expect the caller field to have any size. This produced 8 entries of garbage (and reading random data) from the stack trace event: <idle>-0 [002] d... 1976396.837549: <stack trace> => trace_event_raw_event_sched_switch => __traceiter_sched_switch => __schedule => schedule_idle => do_idle => cpu_startup_entry => secondary_startup_64_no_verify => 0xc8c5e150ffff93de => 0xffff93de => 0 => 0 => 0xc8c5e17800000000 => 0x1f30affff93de => 0x00000004 => 0x200000000 Instead, subtract the size of the caller field from the size of the event to make sure that only the amount needed to store the stack trace is reserved. Link: https://lore.kernel.org/lkml/your-ad-here.call-01617191565-ext-9692@work.hours/ Cc: stable@vger.kernel.org Fixes: cbc3b92ce037 ("tracing: Set kernel_stack's caller size properly") Reported-by: Vasily Gorbik <gor@linux.ibm.com> Tested-by: Vasily Gorbik <gor@linux.ibm.com> Acked-by: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-03-31Merge tag 'trace-v5.12-rc5' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull ftrace fix from Steven Rostedt: "Add check of order < 0 before calling free_pages() The function addresses that are traced by ftrace are stored in pages, and the size is held in a variable. If there's some error in creating them, the allocate ones will be freed. In this case, it is possible that the order of pages to be freed may end up being negative due to a size of zero passed to get_count_order(), and then that negative number will cause free_pages() to free a very large section. Make sure that does not happen" * tag 'trace-v5.12-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: ftrace: Check if pages were allocated before calling free_pages()
2021-03-30ftrace: Check if pages were allocated before calling free_pages()Steven Rostedt (VMware)
It is possible that on error pg->size can be zero when getting its order, which would return a -1 value. It is dangerous to pass in an order of -1 to free_pages(). Check if order is greater than or equal to zero before calling free_pages(). Link: https://lore.kernel.org/lkml/20210330093916.432697c7@gandalf.local.home/ Reported-by: Abaci Robot <abaci@linux.alibaba.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-03-28Merge tag 'io_uring-5.12-2021-03-27' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull io_uring fixes from Jens Axboe: - Use thread info versions of flag testing, as discussed last week. - The series enabling PF_IO_WORKER to just take signals, instead of needing to special case that they do not in a bunch of places. Ends up being pretty trivial to do, and then we can revert all the special casing we're currently doing. - Kill dead pointer assignment - Fix hashed part of async work queue trace - Fix sign extension issue for IORING_OP_PROVIDE_BUFFERS - Fix a link completion ordering regression in this merge window - Cancellation fixes * tag 'io_uring-5.12-2021-03-27' of git://git.kernel.dk/linux-block: io_uring: remove unsued assignment to pointer io io_uring: don't cancel extra on files match io_uring: don't cancel-track common timeouts io_uring: do post-completion chore on t-out cancel io_uring: fix timeout cancel return code Revert "signal: don't allow STOP on PF_IO_WORKER threads" Revert "kernel: freezer should treat PF_IO_WORKER like PF_KTHREAD for freezing" Revert "kernel: treat PF_IO_WORKER like PF_KTHREAD for ptrace/signals" Revert "signal: don't allow sending any signals to PF_IO_WORKER threads" kernel: stop masking signals in create_io_thread() io_uring: handle signals for IO threads like a normal thread kernel: don't call do_exit() for PF_IO_WORKER threads io_uring: maintain CQE order of a failed link io-wq: fix race around pending work on teardown io_uring: do ctx sqd ejection in a clear context io_uring: fix provide_buffers sign extension io_uring: don't skip file_end_write() on reissue io_uring: correct io_queue_async_work() traces io_uring: don't use {test,clear}_tsk_thread_flag() for current
2021-03-27Revert "signal: don't allow STOP on PF_IO_WORKER threads"Jens Axboe
This reverts commit 4db4b1a0d1779dc159f7b87feb97030ec0b12597. The IO threads allow and handle SIGSTOP now, so don't special case them anymore in task_set_jobctl_pending(). Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-03-27Revert "kernel: freezer should treat PF_IO_WORKER like PF_KTHREAD for freezing"Jens Axboe
This reverts commit 15b2219facadec583c24523eed40fa45865f859f. Before IO threads accepted signals, the freezer using take signals to wake up an IO thread would cause them to loop without any way to clear the pending signal. That is no longer the case, so stop special casing PF_IO_WORKER in the freezer. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-03-27Revert "kernel: treat PF_IO_WORKER like PF_KTHREAD for ptrace/signals"Jens Axboe
This reverts commit 6fb8f43cede0e4bd3ead847de78d531424a96be9. The IO threads do allow signals now, including SIGSTOP, and we can allow ptrace attach. Attaching won't reveal anything interesting for the IO threads, but it will allow eg gdb to attach to a task with io_urings and IO threads without complaining. And once attached, it will allow the usual introspection into regular threads. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-03-27Revert "signal: don't allow sending any signals to PF_IO_WORKER threads"Jens Axboe
This reverts commit 5be28c8f85ce99ed2d329d2ad8bdd18ea19473a5. IO threads now take signals just fine, so there's no reason to limit them specifically. Revert the change that prevented that from happening. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-03-27kernel: stop masking signals in create_io_thread()Jens Axboe
This is racy - move the blocking into when the task is created and we're marking it as PF_IO_WORKER anyway. The IO threads are now prepared to handle signals like SIGSTOP as well, so clear that from the mask to allow proper stopping of IO threads. Acked-by: "Eric W. Biederman" <ebiederm@xmission.com> Reported-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-03-26bpf: Take module reference for trampoline in moduleJiri Olsa
Currently module can be unloaded even if there's a trampoline register in it. It's easily reproduced by running in parallel: # while :; do ./test_progs -t module_attach; done # while :; do rmmod bpf_testmod; sleep 0.5; done Taking the module reference in case the trampoline's ip is within the module code. Releasing it when the trampoline's ip is unregistered. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20210326105900.151466-1-jolsa@kernel.org
2021-03-26kernel: don't call do_exit() for PF_IO_WORKER threadsJens Axboe
Right now we're never calling get_signal() from PF_IO_WORKER threads, but in preparation for doing so, don't handle a fatal signal for them. The workers have state they need to cleanup when exiting, so just return instead of calling do_exit() on their behalf. The threads themselves will detect a fatal signal and do proper shutdown. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-03-26Merge tag 'pm-5.12-rc5' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull power management fixes from Rafael Wysocki: "These fix an issue related to device links in the runtime PM framework and debugfs usage in the Energy Model code. Specifics: - Modify the runtime PM device suspend to avoid suspending supplier devices before the consumer device's status changes to RPM_SUSPENDED (Rafael Wysocki) - Change the Energy Model code to prevent it from attempting to create its main debugfs directory too early (Lukasz Luba)" * tag 'pm-5.12-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: PM: EM: postpone creating the debugfs dir till fs_initcall PM: runtime: Defer suspending suppliers
2021-03-26bpf: Fix a spelling typo in bpf_atomic_alu_string disasmXu Kuohai
The name string for BPF_XOR is "xor", not "or". Fix it. Fixes: 981f94c3e921 ("bpf: Add bitwise atomic instructions") Signed-off-by: Xu Kuohai <xukuohai@huawei.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Brendan Jackman <jackmanb@google.com> Link: https://lore.kernel.org/bpf/20210325134141.8533-1-xukuohai@huawei.com
2021-03-26bpf: Enforce that struct_ops programs be GPL-onlyToke Høiland-Jørgensen
With the introduction of the struct_ops program type, it became possible to implement kernel functionality in BPF, making it viable to use BPF in place of a regular kernel module for these particular operations. Thus far, the only user of this mechanism is for implementing TCP congestion control algorithms. These are clearly marked as GPL-only when implemented as modules (as seen by the use of EXPORT_SYMBOL_GPL for tcp_register_congestion_control()), so it seems like an oversight that this was not carried over to BPF implementations. Since this is the only user of the struct_ops mechanism, just enforcing GPL-only for the struct_ops program type seems like the simplest way to fix this. Fixes: 0baf26b0fcd7 ("bpf: tcp: Support tcp_congestion_ops in bpf") Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/20210326100314.121853-1-toke@redhat.com
2021-03-25Merge branch 'akpm' (patches from Andrew)Linus Torvalds
Merge misc fixes from Andrew Morton: "14 patches. Subsystems affected by this patch series: mm (hugetlb, kasan, gup, selftests, z3fold, kfence, memblock, and highmem), squashfs, ia64, gcov, and mailmap" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: mailmap: update Andrey Konovalov's email address mm/highmem: fix CONFIG_DEBUG_KMAP_LOCAL_FORCE_MAP mm: memblock: fix section mismatch warning again kfence: make compatible with kmemleak gcov: fix clang-11+ support ia64: fix format strings for err_inject ia64: mca: allocate early mca with GFP_ATOMIC squashfs: fix xattr id and id lookup sanity checks squashfs: fix inode lookup sanity checks z3fold: prevent reclaim/free race for headless pages selftests/vm: fix out-of-tree build mm/mmu_notifiers: ensure range_end() is paired with range_start() kasan: fix per-page tags for non-page_alloc pages hugetlb_cgroup: fix imbalanced css_get and css_put pair for shared mappings
2021-03-25gcov: fix clang-11+ supportNick Desaulniers
LLVM changed the expected function signatures for llvm_gcda_start_file() and llvm_gcda_emit_function() in the clang-11 release. Users of clang-11 or newer may have noticed their kernels failing to boot due to a panic when enabling CONFIG_GCOV_KERNEL=y +CONFIG_GCOV_PROFILE_ALL=y. Fix up the function signatures so calling these functions doesn't panic the kernel. Link: https://reviews.llvm.org/rGcdd683b516d147925212724b09ec6fb792a40041 Link: https://reviews.llvm.org/rG13a633b438b6500ecad9e4f936ebadf3411d0f44 Link: https://lkml.kernel.org/r/20210312224132.3413602-2-ndesaulniers@google.com Signed-off-by: Nick Desaulniers <ndesaulniers@google.com> Reported-by: Prasad Sodagudi <psodagud@quicinc.com> Suggested-by: Nathan Chancellor <nathan@kernel.org> Reviewed-by: Fangrui Song <maskray@google.com> Tested-by: Nathan Chancellor <nathan@kernel.org> Acked-by: Peter Oberparleiter <oberpar@linux.ibm.com> Reviewed-by: Nathan Chancellor <nathan@kernel.org> Cc: <stable@vger.kernel.org> [5.4+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-03-24Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netLinus Torvalds
Pull networking fixes from David Miller: "Various fixes, all over: 1) Fix overflow in ptp_qoriq_adjfine(), from Yangbo Lu. 2) Always store the rx queue mapping in veth, from Maciej Fijalkowski. 3) Don't allow vmlinux btf in map_create, from Alexei Starovoitov. 4) Fix memory leak in octeontx2-af from Colin Ian King. 5) Use kvalloc in bpf x86 JIT for storing jit'd addresses, from Yonghong Song. 6) Fix tx ptp stats in mlx5, from Aya Levin. 7) Check correct ip version in tun decap, fropm Roi Dayan. 8) Fix rate calculation in mlx5 E-Switch code, from arav Pandit. 9) Work item memork leak in mlx5, from Shay Drory. 10) Fix ip6ip6 tunnel crash with bpf, from Daniel Borkmann. 11) Lack of preemptrion awareness in macvlan, from Eric Dumazet. 12) Fix data race in pxa168_eth, from Pavel Andrianov. 13) Range validate stab in red_check_params(), from Eric Dumazet. 14) Inherit vlan filtering setting properly in b53 driver, from Florian Fainelli. 15) Fix rtnl locking in igc driver, from Sasha Neftin. 16) Pause handling fixes in igc driver, from Muhammad Husaini Zulkifli. 17) Missing rtnl locking in e1000_reset_task, from Vitaly Lifshits. 18) Use after free in qlcnic, from Lv Yunlong. 19) fix crash in fritzpci mISDN, from Tong Zhang. 20) Premature rx buffer reuse in igb, from Li RongQing. 21) Missing termination of ip[a driver message handler arrays, from Alex Elder. 22) Fix race between "x25_close" and "x25_xmit"/"x25_rx" in hdlc_x25 driver, from Xie He. 23) Use after free in c_can_pci_remove(), from Tong Zhang. 24) Uninitialized variable use in nl80211, from Jarod Wilson. 25) Off by one size calc in bpf verifier, from Piotr Krysiuk. 26) Use delayed work instead of deferrable for flowtable GC, from Yinjun Zhang. 27) Fix infinite loop in NPC unmap of octeontx2 driver, from Hariprasad Kelam. 28) Fix being unable to change MTU of dwmac-sun8i devices due to lack of fifo sizes, from Corentin Labbe. 29) DMA use after free in r8169 with WoL, fom Heiner Kallweit. 30) Mismatched prototypes in isdn-capi, from Arnd Bergmann. 31) Fix psample UAPI breakage, from Ido Schimmel" * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (171 commits) psample: Fix user API breakage math: Export mul_u64_u64_div_u64 ch_ktls: fix enum-conversion warning octeontx2-af: Fix memory leak of object buf ptp_qoriq: fix overflow in ptp_qoriq_adjfine() u64 calcalation net: bridge: don't notify switchdev for local FDB addresses net/sched: act_ct: clear post_ct if doing ct_clear net: dsa: don't assign an error value to tag_ops isdn: capi: fix mismatched prototypes net/mlx5: SF, do not use ecpu bit for vhca state processing net/mlx5e: Fix division by 0 in mlx5e_select_queue net/mlx5e: Fix error path for ethtool set-priv-flag net/mlx5e: Offload tuple rewrite for non-CT flows net/mlx5e: Allow to match on MPLS parameters only for MPLS over UDP net/mlx5: Add back multicast stats for uplink representor net: ipconfig: ic_dev can be NULL in ic_close_devs MAINTAINERS: Combine "QLOGIC QLGE 10Gb ETHERNET DRIVER" sections into one docs: networking: Fix a typo r8169: fix DMA being used after buffer free if WoL is enabled net: ipa: fix init header command validation ...
2021-03-23PM: EM: postpone creating the debugfs dir till fs_initcallLukasz Luba
The debugfs directory '/sys/kernel/debug/energy_model' is needed before the Energy Model registration can happen. With the recent change in debugfs subsystem it's not allowed to create this directory at early stage (core_initcall). Thus creating this directory would fail. Postpone the creation of the EM debug dir to later stage: fs_initcall. It should be safe since all clients: CPUFreq drivers, Devfreq drivers will be initialized in later stages. The custom debug log below prints the time of creation the EM debug dir at fs_initcall and successful registration of EMs at later stages. [ 1.505717] energy_model: creating rootdir [ 3.698307] cpu cpu0: EM: created perf domain [ 3.709022] cpu cpu1: EM: created perf domain Fixes: 56348560d495 ("debugfs: do not attempt to create a new file before the filesystem is initalized") Reported-by: Ionela Voinescu <ionela.voinescu@arm.com> Signed-off-by: Lukasz Luba <lukasz.luba@arm.com> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-03-22lockdep: Address clang -Wformat warning printing for %hdArnd Bergmann
Clang doesn't like format strings that truncate a 32-bit value to something shorter: kernel/locking/lockdep.c:709:4: error: format specifies type 'short' but the argument has type 'int' [-Werror,-Wformat] In this case, the warning is a slightly questionable, as it could realize that both class->wait_type_outer and class->wait_type_inner are in fact 8-bit struct members, even though the result of the ?: operator becomes an 'int'. However, there is really no point in printing the number as a 16-bit 'short' rather than either an 8-bit or 32-bit number, so just change it to a normal %d. Fixes: de8f5e4f2dc1 ("lockdep: Introduce wait-type checks") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20210322115531.3987555-1-arnd@kernel.org
2021-03-21Merge tag 'io_uring-5.12-2021-03-21' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull io_uring followup fixes from Jens Axboe: - The SIGSTOP change from Eric, so we properly ignore that for PF_IO_WORKER threads. - Disallow sending signals to PF_IO_WORKER threads in general, we're not interested in having them funnel back to the io_uring owning task. - Stable fix from Stefan, ensuring we properly break links for short send/sendmsg recv/recvmsg if MSG_WAITALL is set. - Catch and loop when needing to run task_work before a PF_IO_WORKER threads goes to sleep. * tag 'io_uring-5.12-2021-03-21' of git://git.kernel.dk/linux-block: io_uring: call req_set_fail_links() on short send[msg]()/recv[msg]() with MSG_WAITALL io-wq: ensure task is running before processing task_work signal: don't allow STOP on PF_IO_WORKER threads signal: don't allow sending any signals to PF_IO_WORKER threads
2021-03-21Merge tag 'irq-urgent-2021-03-21' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull irq fix from Ingo Molnar: "A change to robustify force-threaded IRQ handlers to always disable interrupts, plus a DocBook fix. The force-threaded IRQ handler change has been accelerated from the normal schedule of such a change to keep the bad pattern/workaround of spin_lock_irqsave() in handlers or IRQF_NOTHREAD as a kludge from spreading" * tag 'irq-urgent-2021-03-21' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: genirq: Disable interrupts for force threaded handlers genirq/irq_sim: Fix typos in kernel doc (fnode -> fwnode)
2021-03-21Merge tag 'locking-urgent-2021-03-21' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull locking fixes from Ingo Molnar: - Get static calls & modules right. Hopefully. - WW mutex fixes * tag 'locking-urgent-2021-03-21' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: static_call: Fix static_call_update() sanity check static_call: Align static_call_is_init() patching condition static_call: Fix static_call_set_init() locking/ww_mutex: Fix acquire/release imbalance in ww_acquire_init()/ww_acquire_fini() locking/ww_mutex: Simplify use_ww_ctx & ww_ctx handling
2021-03-21Merge tag 'x86_urgent_for_v5.12-rc4' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Borislav Petkov: "The freshest pile of shiny x86 fixes for 5.12: - Add the arch-specific mapping between physical and logical CPUs to fix devicetree-node lookups - Restore the IRQ2 ignore logic - Fix get_nr_restart_syscall() to return the correct restart syscall number. Split in a 4-patches set to avoid kABI breakage when backporting to dead kernels" * tag 'x86_urgent_for_v5.12-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/apic/of: Fix CPU devicetree-node lookups x86/ioapic: Ignore IRQ2 again x86: Introduce restart_block->arch_data to remove TS_COMPAT_RESTART x86: Introduce TS_COMPAT_RESTART to fix get_nr_restart_syscall() x86: Move TS_COMPAT back to asm/thread_info.h kernel, fs: Introduce and use set_restart_fn() and arch_set_restart_data()
2021-03-21signal: don't allow STOP on PF_IO_WORKER threadsEric W. Biederman
Just like we don't allow normal signals to IO threads, don't deliver a STOP to a task that has PF_IO_WORKER set. The IO threads don't take signals in general, and have no means of flushing out a stop either. Longer term, we may want to look into allowing stop of these threads, as it relates to eg process freezing. For now, this prevents a spin issue if a SIGSTOP is delivered to the parent task. Reported-by: Stefan Metzmacher <metze@samba.org> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2021-03-21signal: don't allow sending any signals to PF_IO_WORKER threadsJens Axboe
They don't take signals individually, and even if they share signals with the parent task, don't allow them to be delivered through the worker thread. Linux does allow this kind of behavior for regular threads, but it's really a compatability thing that we need not care about for the IO threads. Reported-by: Stefan Metzmacher <metze@samba.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-03-21lockdep: Add a missing initialization hint to the "INFO: Trying to register ↵Tetsuo Handa
non-static key" message Since this message is printed when dynamically allocated spinlocks (e.g. kzalloc()) are used without initialization (e.g. spin_lock_init()), suggest to developers to check whether initialization functions for objects were called, before making developers wonder what annotation is missing. [ mingo: Minor tweaks to the message. ] Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20210321064913.4619-1-penguin-kernel@I-love.SAKURA.ne.jp Signed-off-by: Ingo Molnar <mingo@kernel.org>
2021-03-21genirq: Disable interrupts for force threaded handlersThomas Gleixner
With interrupt force threading all device interrupt handlers are invoked from kernel threads. Contrary to hard interrupt context the invocation only disables bottom halfs, but not interrupts. This was an oversight back then because any code like this will have an issue: thread(irq_A) irq_handler(A) spin_lock(&foo->lock); interrupt(irq_B) irq_handler(B) spin_lock(&foo->lock); This has been triggered with networking (NAPI vs. hrtimers) and console drivers where printk() happens from an interrupt which interrupted the force threaded handler. Now people noticed and started to change the spin_lock() in the handler to spin_lock_irqsave() which affects performance or add IRQF_NOTHREAD to the interrupt request which in turn breaks RT. Fix the root cause and not the symptom and disable interrupts before invoking the force threaded handler which preserves the regular semantics and the usefulness of the interrupt force threading as a general debugging tool. For not RT this is not changing much, except that during the execution of the threaded handler interrupts are delayed until the handler returns. Vs. scheduling and softirq processing there is no difference. For RT kernels there is no issue. Fixes: 8d32a307e4fa ("genirq: Provide forced interrupt threading") Reported-by: Johan Hovold <johan@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Johan Hovold <johan@kernel.org> Acked-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Link: https://lore.kernel.org/r/20210317143859.513307808@linutronix.de
2021-03-19Merge tag 'io_uring-5.12-2021-03-19' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull io_uring fixes from Jens Axboe: "Quieter week this time, which was both expected and desired. About half of the below is fixes for this release, the other half are just fixes in general. In detail: - Fix the freezing of IO threads, by making the freezer not send them fake signals. Make them freezable by default. - Like we did for personalities, move the buffer IDR to xarray. Kills some code and avoids a use-after-free on teardown. - SQPOLL cleanups and fixes (Pavel) - Fix linked timeout race (Pavel) - Fix potential completion post use-after-free (Pavel) - Cleanup and move internal structures outside of general kernel view (Stefan) - Use MSG_SIGNAL for send/recv from io_uring (Stefan)" * tag 'io_uring-5.12-2021-03-19' of git://git.kernel.dk/linux-block: io_uring: don't leak creds on SQO attach error io_uring: use typesafe pointers in io_uring_task io_uring: remove structures from include/linux/io_uring.h io_uring: imply MSG_NOSIGNAL for send[msg]()/recv[msg]() calls io_uring: fix sqpoll cancellation via task_work io_uring: add generic callback_head helpers io_uring: fix concurrent parking io_uring: halt SQO submission on ctx exit io_uring: replace sqd rw_semaphore with mutex io_uring: fix complete_post use ctx after free io_uring: fix ->flags races by linked timeouts io_uring: convert io_buffer_idr to XArray io_uring: allow IO worker threads to be frozen kernel: freezer should treat PF_IO_WORKER like PF_KTHREAD for freezing
2021-03-19bpf: Fix umd memory leak in copy_process()Zqiang
The syzbot reported a memleak as follows: BUG: memory leak unreferenced object 0xffff888101b41d00 (size 120): comm "kworker/u4:0", pid 8, jiffies 4294944270 (age 12.780s) backtrace: [<ffffffff8125dc56>] alloc_pid+0x66/0x560 [<ffffffff81226405>] copy_process+0x1465/0x25e0 [<ffffffff81227943>] kernel_clone+0xf3/0x670 [<ffffffff812281a1>] kernel_thread+0x61/0x80 [<ffffffff81253464>] call_usermodehelper_exec_work [<ffffffff81253464>] call_usermodehelper_exec_work+0xc4/0x120 [<ffffffff812591c9>] process_one_work+0x2c9/0x600 [<ffffffff81259ab9>] worker_thread+0x59/0x5d0 [<ffffffff812611c8>] kthread+0x178/0x1b0 [<ffffffff8100227f>] ret_from_fork+0x1f/0x30 unreferenced object 0xffff888110ef5c00 (size 232): comm "kworker/u4:0", pid 8414, jiffies 4294944270 (age 12.780s) backtrace: [<ffffffff8154a0cf>] kmem_cache_zalloc [<ffffffff8154a0cf>] __alloc_file+0x1f/0xf0 [<ffffffff8154a809>] alloc_empty_file+0x69/0x120 [<ffffffff8154a8f3>] alloc_file+0x33/0x1b0 [<ffffffff8154ab22>] alloc_file_pseudo+0xb2/0x140 [<ffffffff81559218>] create_pipe_files+0x138/0x2e0 [<ffffffff8126c793>] umd_setup+0x33/0x220 [<ffffffff81253574>] call_usermodehelper_exec_async+0xb4/0x1b0 [<ffffffff8100227f>] ret_from_fork+0x1f/0x30 After the UMD process exits, the pipe_to_umh/pipe_from_umh and tgid need to be released. Fixes: d71fa5c9763c ("bpf: Add kernel module with user mode driver that populates bpffs.") Reported-by: syzbot+44908bb56d2bfe56b28e@syzkaller.appspotmail.com Signed-off-by: Zqiang <qiang.zhang@windriver.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20210317030915.2865-1-qiang.zhang@windriver.com
2021-03-19static_call: Fix static_call_update() sanity checkPeter Zijlstra
Sites that match init_section_contains() get marked as INIT. For built-in code init_sections contains both __init and __exit text. OTOH kernel_text_address() only explicitly includes __init text (and there are no __exit text markers). Match what jump_label already does and ignore the warning for INIT sites. Also see the excellent changelog for commit: 8f35eaa5f2de ("jump_label: Don't warn on __exit jump entries") Fixes: 9183c3f9ed710 ("static_call: Add inline static call infrastructure") Reported-by: Sumit Garg <sumit.garg@linaro.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Jarkko Sakkinen <jarkko@kernel.org> Tested-by: Sumit Garg <sumit.garg@linaro.org> Link: https://lkml.kernel.org/r/20210318113610.739542434@infradead.org
2021-03-19static_call: Align static_call_is_init() patching conditionPeter Zijlstra
The intent is to avoid writing init code after init (because the text might have been freed). The code is needlessly different between jump_label and static_call and not obviously correct. The existing code relies on the fact that the module loader clears the init layout, such that within_module_init() always fails, while jump_label relies on the module state which is more obvious and matches the kernel logic. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Jarkko Sakkinen <jarkko@kernel.org> Tested-by: Sumit Garg <sumit.garg@linaro.org> Link: https://lkml.kernel.org/r/20210318113610.636651340@infradead.org
2021-03-19static_call: Fix static_call_set_init()Peter Zijlstra
It turns out that static_call_set_init() does not preserve the other flags; IOW. it clears TAIL if it was set. Fixes: 9183c3f9ed710 ("static_call: Add inline static call infrastructure") Reported-by: Sumit Garg <sumit.garg@linaro.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Jarkko Sakkinen <jarkko@kernel.org> Tested-by: Sumit Garg <sumit.garg@linaro.org> Link: https://lkml.kernel.org/r/20210318113610.519406371@infradead.org
2021-03-18Revert "PM: ACPI: reboot: Use S5 for reboot"Josef Bacik
This reverts commit d60cd06331a3566d3305b3c7b566e79edf4e2095. This patch causes a panic when rebooting my Dell Poweredge r440. I do not have the full panic log as it's lost at that stage of the reboot and I do not have a serial console. Reverting this patch makes my system able to reboot again. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-03-18bpf: Fix fexit trampoline.Alexei Starovoitov
The fexit/fmod_ret programs can be attached to kernel functions that can sleep. The synchronize_rcu_tasks() will not wait for such tasks to complete. In such case the trampoline image will be freed and when the task wakes up the return IP will point to freed memory causing the crash. Solve this by adding percpu_ref_get/put for the duration of trampoline and separate trampoline vs its image life times. The "half page" optimization has to be removed, since first_half->second_half->first_half transition cannot be guaranteed to complete in deterministic time. Every trampoline update becomes a new image. The image with fmod_ret or fexit progs will be freed via percpu_ref_kill and call_rcu_tasks. Together they will wait for the original function and trampoline asm to complete. The trampoline is patched from nop to jmp to skip fexit progs. They are freed independently from the trampoline. The image with fentry progs only will be freed via call_rcu_tasks_trace+call_rcu_tasks which will wait for both sleepable and non-sleepable progs to complete. Fixes: fec56f5890d9 ("bpf: Introduce BPF trampoline") Reported-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Paul E. McKenney <paulmck@kernel.org> # for RCU Link: https://lore.kernel.org/bpf/20210316210007.38949-1-alexei.starovoitov@gmail.com
2021-03-17bpf: Add sanity check for upper ptr_limitPiotr Krysiuk
Given we know the max possible value of ptr_limit at the time of retrieving the latter, add basic assertions, so that the verifier can bail out if anything looks odd and reject the program. Nothing triggered this so far, but it also does not hurt to have these. Signed-off-by: Piotr Krysiuk <piotras@gmail.com> Co-developed-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org>
2021-03-17bpf: Simplify alu_limit masking for pointer arithmeticPiotr Krysiuk
Instead of having the mov32 with aux->alu_limit - 1 immediate, move this operation to retrieve_ptr_limit() instead to simplify the logic and to allow for subsequent sanity boundary checks inside retrieve_ptr_limit(). This avoids in future that at the time of the verifier masking rewrite we'd run into an underflow which would not sign extend due to the nature of mov32 instruction. Signed-off-by: Piotr Krysiuk <piotras@gmail.com> Co-developed-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org>
2021-03-17bpf: Fix off-by-one for area size in creating mask to leftPiotr Krysiuk
retrieve_ptr_limit() computes the ptr_limit for registers with stack and map_value type. ptr_limit is the size of the memory area that is still valid / in-bounds from the point of the current position and direction of the operation (add / sub). This size will later be used for masking the operation such that attempting out-of-bounds access in the speculative domain is redirected to remain within the bounds of the current map value. When masking to the right the size is correct, however, when masking to the left, the size is off-by-one which would lead to an incorrect mask and thus incorrect arithmetic operation in the non-speculative domain. Piotr found that if the resulting alu_limit value is zero, then the BPF_MOV32_IMM() from the fixup_bpf_calls() rewrite will end up loading 0xffffffff into AX instead of sign-extending to the full 64 bit range, and as a result, this allows abuse for executing speculatively out-of- bounds loads against 4GB window of address space and thus extracting the contents of kernel memory via side-channel. Fixes: 979d63d50c0c ("bpf: prevent out of bounds speculation on pointer arithmetic") Signed-off-by: Piotr Krysiuk <piotras@gmail.com> Co-developed-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org>
2021-03-17bpf: Prohibit alu ops for pointer types not defining ptr_limitPiotr Krysiuk
The purpose of this patch is to streamline error propagation and in particular to propagate retrieve_ptr_limit() errors for pointer types that are not defining a ptr_limit such that register-based alu ops against these types can be rejected. The main rationale is that a gap has been identified by Piotr in the existing protection against speculatively out-of-bounds loads, for example, in case of ctx pointers, unprivileged programs can still perform pointer arithmetic. This can be abused to execute speculatively out-of-bounds loads without restrictions and thus extract contents of kernel memory. Fix this by rejecting unprivileged programs that attempt any pointer arithmetic on unprotected pointer types. The two affected ones are pointer to ctx as well as pointer to map. Field access to a modified ctx' pointer is rejected at a later point in time in the verifier, and 7c6967326267 ("bpf: Permit map_ptr arithmetic with opcode add and offset 0") only relevant for root-only use cases. Risk of unprivileged program breakage is considered very low. Fixes: 7c6967326267 ("bpf: Permit map_ptr arithmetic with opcode add and offset 0") Fixes: b2157399cc98 ("bpf: prevent out-of-bounds speculation") Signed-off-by: Piotr Krysiuk <piotras@gmail.com> Co-developed-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org>