summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2022-08-03Merge v5.18.16linaro-local/ci/tcwg_kernel/llvm-release-arm-stable-defconfiglinaro-local/ci/tcwg_kernel/llvm-release-arm-stable-allyesconfiglinaro-local/ci/tcwg_kernel/llvm-release-arm-stable-allnoconfiglinaro-local/ci/tcwg_kernel/llvm-release-arm-stable-allmodconfiglinaro-local/ci/tcwg_kernel/llvm-release-aarch64-stable-defconfiglinaro-local/ci/tcwg_kernel/llvm-release-aarch64-stable-allyesconfiglinaro-local/ci/tcwg_kernel/llvm-release-aarch64-stable-allnoconfiglinaro-local/ci/tcwg_kernel/llvm-release-aarch64-stable-allmodconfiglinaro-local/ci/tcwg_kernel/llvm-master-arm-stable-defconfiglinaro-local/ci/tcwg_kernel/llvm-master-arm-stable-allyesconfiglinaro-local/ci/tcwg_kernel/llvm-master-arm-stable-allnoconfiglinaro-local/ci/tcwg_kernel/llvm-master-arm-stable-allmodconfiglinaro-local/ci/tcwg_kernel/llvm-master-aarch64-stable-defconfiglinaro-local/ci/tcwg_kernel/llvm-master-aarch64-stable-allyesconfiglinaro-local/ci/tcwg_kernel/llvm-master-aarch64-stable-allnoconfiglinaro-local/ci/tcwg_kernel/llvm-master-aarch64-stable-allmodconfiglinaro-local/ci/tcwg_kernel/gnu-release-arm-stable-defconfiglinaro-local/ci/tcwg_kernel/gnu-release-arm-stable-allyesconfiglinaro-local/ci/tcwg_kernel/gnu-release-arm-stable-allnoconfiglinaro-local/ci/tcwg_kernel/gnu-release-arm-stable-allmodconfiglinaro-local/ci/tcwg_kernel/gnu-release-aarch64-stable-defconfiglinaro-local/ci/tcwg_kernel/gnu-release-aarch64-stable-allyesconfiglinaro-local/ci/tcwg_kernel/gnu-release-aarch64-stable-allnoconfiglinaro-local/ci/tcwg_kernel/gnu-release-aarch64-stable-allmodconfiglinaro-local/ci/tcwg_kernel/gnu-master-arm-stable-defconfiglinaro-local/ci/tcwg_kernel/gnu-master-arm-stable-allyesconfiglinaro-local/ci/tcwg_kernel/gnu-master-arm-stable-allnoconfiglinaro-local/ci/tcwg_kernel/gnu-master-arm-stable-allmodconfiglinaro-local/ci/tcwg_kernel/gnu-master-aarch64-stable-defconfiglinaro-local/ci/tcwg_kernel/gnu-master-aarch64-stable-allyesconfiglinaro-local/ci/tcwg_kernel/gnu-master-aarch64-stable-allnoconfiglinaro-local/ci/tcwg_kernel/gnu-master-aarch64-stable-allmodconfiglinaro-local/ci/tcwg_gnu_native_check_gdb/release-armlinaro-local/ci/tcwg_gnu_native_check_gdb/release-aarch64linaro-local/ci/tcwg_gnu_native_check_gcc/release-armlinaro-local/ci/tcwg_gnu_native_check_gcc/release-aarch64linaro-local/ci/tcwg_gnu_native_check_binutils/release-armlinaro-local/ci/tcwg_gnu_native_check_binutils/release-aarch64linaro-local/ci/tcwg_gnu_native_build/release-armlinaro-local/ci/tcwg_gnu_native_build/release-aarch64linaro-local/ci/tcwg_gnu_cross_check_gcc/release-armlinaro-local/ci/tcwg_gnu_cross_check_gcc/release-aarch64linaro-local/ci/tcwg_gnu_cross_build/release-armlinaro-local/ci/tcwg_gnu_cross_build/release-aarch64linaro-local/ci/tcwg_bmk_llvm_tx1/llvm-release-aarch64-spec2k6-O3_LTOlinaro-local/ci/tcwg_bmk_llvm_tx1/llvm-release-aarch64-spec2k6-O3linaro-local/ci/tcwg_bmk_llvm_tx1/llvm-release-aarch64-spec2k6-O2_LTOlinaro-local/ci/tcwg_bmk_llvm_tx1/llvm-release-aarch64-spec2k6-O2linaro-local/ci/tcwg_bmk_llvm_tx1/llvm-master-aarch64-spec2k6-O3_LTOlinaro-local/ci/tcwg_bmk_llvm_tx1/llvm-master-aarch64-spec2k6-O3linaro-local/ci/tcwg_bmk_llvm_tx1/llvm-master-aarch64-spec2k6-O2_LTOlinaro-local/ci/tcwg_bmk_llvm_tx1/llvm-master-aarch64-spec2k6-O2linaro-local/ci/tcwg_bmk_llvm_tk1/llvm-release-arm-spec2k6-O3_LTOlinaro-local/ci/tcwg_bmk_llvm_tk1/llvm-release-arm-spec2k6-O3linaro-local/ci/tcwg_bmk_llvm_tk1/llvm-release-arm-spec2k6-O2_LTOlinaro-local/ci/tcwg_bmk_llvm_tk1/llvm-release-arm-spec2k6-O2linaro-local/ci/tcwg_bmk_llvm_tk1/llvm-master-arm-spec2k6-O3_LTOlinaro-local/ci/tcwg_bmk_llvm_tk1/llvm-master-arm-spec2k6-O3linaro-local/ci/tcwg_bmk_llvm_tk1/llvm-master-arm-spec2k6-O2_LTOlinaro-local/ci/tcwg_bmk_llvm_tk1/llvm-master-arm-spec2k6-O2linaro-local/ci/tcwg_bmk_llvm_sq/llvm-master-aarch64-spec2k6-Oslinaro-local/ci/tcwg_bmk_llvm_fx/llvm-master-aarch64-spec2k6-O2linaro-local/ci/tcwg_bmk_llvm_fx/llvm-master-aarch64-cpu2017-O3linaro-local/ci/tcwg_bmk_llvm_fx/llvm-master-aarch64-cpu2017-O2linaro-local/ci/tcwg_bmk_llvm_apm/llvm-release-arm-spec2k6-Oz_LTOlinaro-local/ci/tcwg_bmk_llvm_apm/llvm-release-arm-spec2k6-Ozlinaro-local/ci/tcwg_bmk_llvm_apm/llvm-release-arm-spec2k6-Oslinaro-local/ci/tcwg_bmk_llvm_apm/llvm-release-aarch64-spec2k6-Oz_LTOlinaro-local/ci/tcwg_bmk_llvm_apm/llvm-release-aarch64-spec2k6-Ozlinaro-local/ci/tcwg_bmk_llvm_apm/llvm-release-aarch64-spec2k6-Os_LTOlinaro-local/ci/tcwg_bmk_llvm_apm/llvm-master-arm-spec2k6-Oz_LTOlinaro-local/ci/tcwg_bmk_llvm_apm/llvm-master-arm-spec2k6-Ozlinaro-local/ci/tcwg_bmk_llvm_apm/llvm-master-arm-spec2k6-Os_LTOlinaro-local/ci/tcwg_bmk_llvm_apm/llvm-master-arm-spec2k6-Oslinaro-local/ci/tcwg_bmk_llvm_apm/llvm-master-aarch64-spec2k6-Oz_LTOlinaro-local/ci/tcwg_bmk_llvm_apm/llvm-master-aarch64-spec2k6-Ozlinaro-local/ci/tcwg_bmk_llvm_apm/llvm-master-aarch64-spec2k6-Os_LTOlinaro-local/ci/tcwg_bmk_llvm_apm/llvm-master-aarch64-spec2k6-Oslinaro-local/ci/tcwg_bmk_gnu_tx1/gnu-release-aarch64-spec2k6-O3_LTOlinaro-local/ci/tcwg_bmk_gnu_tx1/gnu-release-aarch64-spec2k6-O3linaro-local/ci/tcwg_bmk_gnu_tx1/gnu-release-aarch64-spec2k6-O2_LTOlinaro-local/ci/tcwg_bmk_gnu_tx1/gnu-release-aarch64-spec2k6-O2linaro-local/ci/tcwg_bmk_gnu_tx1/gnu-master-aarch64-spec2k6-O3_LTOlinaro-local/ci/tcwg_bmk_gnu_tx1/gnu-master-aarch64-spec2k6-O3linaro-local/ci/tcwg_bmk_gnu_tx1/gnu-master-aarch64-spec2k6-O2_LTOlinaro-local/ci/tcwg_bmk_gnu_tx1/gnu-master-aarch64-spec2k6-O2linaro-local/ci/tcwg_bmk_gnu_tk1/gnu-release-arm-spec2k6-O3_LTOlinaro-local/ci/tcwg_bmk_gnu_tk1/gnu-release-arm-spec2k6-O3linaro-local/ci/tcwg_bmk_gnu_tk1/gnu-release-arm-spec2k6-O2_LTOlinaro-local/ci/tcwg_bmk_gnu_tk1/gnu-release-arm-spec2k6-O2linaro-local/ci/tcwg_bmk_gnu_tk1/gnu-master-arm-spec2k6-O3_LTOlinaro-local/ci/tcwg_bmk_gnu_tk1/gnu-master-arm-spec2k6-O3linaro-local/ci/tcwg_bmk_gnu_tk1/gnu-master-arm-spec2k6-O2_LTOlinaro-local/ci/tcwg_bmk_gnu_tk1/gnu-master-arm-spec2k6-O2linaro-local/ci/tcwg_bmk_gnu_sq/gnu-release-aarch64-spec2k6-Oslinaro-local/ci/tcwg_bmk_gnu_sq/gnu-master-aarch64-spec2k6-Oslinaro-local/ci/tcwg_bmk_gnu_fx/gnu-master-aarch64-cpu2017-O3linaro-local/ci/tcwg_bmk_gnu_fx/gnu-master-aarch64-cpu2017-O2linaro-local/ci/tcwg_bmk_gnu_apm/gnu-release-arm-spec2k6-Os_LTOlinaro-local/ci/tcwg_bmk_gnu_apm/gnu-release-arm-spec2k6-Oslinaro-local/ci/tcwg_bmk_gnu_apm/gnu-release-aarch64-spec2k6-Os_LTOlinaro-local/ci/tcwg_bmk_gnu_apm/gnu-release-aarch64-spec2k6-Oslinaro-local/ci/tcwg_bmk_gnu_apm/gnu-master-arm-spec2k6-Os_LTOlinaro-local/ci/tcwg_bmk_gnu_apm/gnu-master-arm-spec2k6-Oslinaro-local/ci/tcwg_bmk_gnu_apm/gnu-master-aarch64-spec2k6-Os_LTOlinaro-local/ci/tcwg_bmk_gnu_apm/gnu-master-aarch64-spec2k6-OsGreg Kroah-Hartman
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-08-03Linux 5.18.16Greg Kroah-Hartman
Link: https://lore.kernel.org/r/20220801114138.041018499@linuxfoundation.org Tested-by: Jon Hunter <jonathanh@nvidia.com> Tested-by: Ronald Warsow <rwarsow@gmx.de> Tested-by: Florian Fainelli <f.fainelli@gmail.com> Tested-by: Shuah Khan <skhan@linuxfoundation.org> Tested-by: Zan Aziz <zanaziz313@gmail.com> Tested-by: Ron Economos <re@w6rz.net> Tested-by: Guenter Roeck <linux@roeck-us.net> Tested-by: Justin M. Forbes <jforbes@fedoraproject.org> Tested-by: Bagas Sanjaya <bagasdotme@gmail.com> Tested-by: Linux Kernel Functional Testing <lkft@linaro.org> Tested-by: Sudip Mukherjee <sudip.mukherjee@codethink.co.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-08-03x86/bugs: Do not enable IBPB at firmware entry when IBPB is not availableThadeu Lima de Souza Cascardo
commit 571c30b1a88465a1c85a6f7762609939b9085a15 upstream. Some cloud hypervisors do not provide IBPB on very recent CPU processors, including AMD processors affected by Retbleed. Using IBPB before firmware calls on such systems would cause a GPF at boot like the one below. Do not enable such calls when IBPB support is not present. EFI Variables Facility v0.08 2004-May-17 general protection fault, maybe for address 0x1: 0000 [#1] PREEMPT SMP NOPTI CPU: 0 PID: 24 Comm: kworker/u2:1 Not tainted 5.19.0-rc8+ #7 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 0.0.0 02/06/2015 Workqueue: efi_rts_wq efi_call_rts RIP: 0010:efi_call_rts Code: e8 37 33 58 ff 41 bf 48 00 00 00 49 89 c0 44 89 f9 48 83 c8 01 4c 89 c2 48 c1 ea 20 66 90 b9 49 00 00 00 b8 01 00 00 00 31 d2 <0f> 30 e8 7b 9f 5d ff e8 f6 f8 ff ff 4c 89 f1 4c 89 ea 4c 89 e6 48 RSP: 0018:ffffb373800d7e38 EFLAGS: 00010246 RAX: 0000000000000001 RBX: 0000000000000006 RCX: 0000000000000049 RDX: 0000000000000000 RSI: ffff94fbc19d8fe0 RDI: ffff94fbc1b2b300 RBP: ffffb373800d7e70 R08: 0000000000000000 R09: 0000000000000000 R10: 000000000000000b R11: 000000000000000b R12: ffffb3738001fd78 R13: ffff94fbc2fcfc00 R14: ffffb3738001fd80 R15: 0000000000000048 FS: 0000000000000000(0000) GS:ffff94fc3da00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: ffff94fc30201000 CR3: 000000006f610000 CR4: 00000000000406f0 Call Trace: <TASK> ? __wake_up process_one_work worker_thread ? rescuer_thread kthread ? kthread_complete_and_exit ret_from_fork </TASK> Modules linked in: Fixes: 28a99e95f55c ("x86/amd: Use IBPB for firmware calls") Reported-by: Dimitri John Ledkov <dimitri.ledkov@canonical.com> Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: <stable@vger.kernel.org> Link: https://lore.kernel.org/r/20220728122602.2500509-1-cascardo@canonical.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-08-03locking/rwsem: Allow slowpath writer to ignore handoff bit if not set by ↵Waiman Long
first waiter commit 6eebd5fb20838f5971ba17df9f55cc4f84a31053 upstream. With commit d257cc8cb8d5 ("locking/rwsem: Make handoff bit handling more consistent"), the writer that sets the handoff bit can be interrupted out without clearing the bit if the wait queue isn't empty. This disables reader and writer optimistic lock spinning and stealing. Now if a non-first writer in the queue is somehow woken up or a new waiter enters the slowpath, it can't acquire the lock. This is not the case before commit d257cc8cb8d5 as the writer that set the handoff bit will clear it when exiting out via the out_nolock path. This is less efficient as the busy rwsem stays in an unlock state for a longer time. In some cases, this new behavior may cause lockups as shown in [1] and [2]. This patch allows a non-first writer to ignore the handoff bit if it is not originally set or initiated by the first waiter. This patch is shown to be effective in fixing the lockup problem reported in [1]. [1] https://lore.kernel.org/lkml/20220617134325.GC30825@techsingularity.net/ [2] https://lore.kernel.org/lkml/3f02975c-1a9d-be20-32cf-f1d8e3dfafcc@oracle.com/ Fixes: d257cc8cb8d5 ("locking/rwsem: Make handoff bit handling more consistent") Signed-off-by: Waiman Long <longman@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: John Donnelly <john.p.donnelly@oracle.com> Tested-by: Mel Gorman <mgorman@techsingularity.net> Link: https://lore.kernel.org/r/20220622200419.778799-1-longman@redhat.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-08-03docs/kernel-parameters: Update descriptions for "mitigations=" param with ↵Eiichi Tsukata
retbleed commit ea304a8b89fd0d6cf94ee30cb139dc23d9f1a62f upstream. Updates descriptions for "mitigations=off" and "mitigations=auto,nosmt" with the respective retbleed= settings. Signed-off-by: Eiichi Tsukata <eiichi.tsukata@nutanix.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: corbet@lwn.net Link: https://lore.kernel.org/r/20220728043907.165688-1-eiichi.tsukata@nutanix.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-08-03EDAC/synopsys: Re-enable the error interrupts on v3 hwSherry Sun
commit 4bcffe941758ee17becb43af3b25487f848f6512 upstream. zynqmp_get_error_info() writes 0 to the ECC_CLR_OFST register after an interrupt for a {un-,}correctable error is raised, which disables the error interrupts. Then the interrupt handler will be called only once. Therefore, re-enable the error interrupt line at the end of intr_handler() for v3.x Synopsys EDAC DDR. Fixes: f7824ded4149 ("EDAC/synopsys: Add support for version 3 of the Synopsys EDAC DDR") Signed-off-by: Sherry Sun <sherry.sun@nxp.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Shubhrajyoti Datta <Shubhrajyoti.datta@xilinx.com> Acked-by: Michal Simek <michal.simek@xilinx.com> Cc: <stable@vger.kernel.org> Link: https://lore.kernel.org/r/20220427015137.8406-3-sherry.sun@nxp.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-08-03EDAC/synopsys: Use the correct register to disable the error interrupt on v3 hwSherry Sun
commit be76ceaf03bc04e74be5e28f608316b73c2b04ad upstream. v3.x Synopsys EDAC DDR doesn't have the QOS Interrupt register. Use the ECC Clear Register to disable the error interrupts instead. Fixes: f7824ded4149 ("EDAC/synopsys: Add support for version 3 of the Synopsys EDAC DDR") Signed-off-by: Sherry Sun <sherry.sun@nxp.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Shubhrajyoti Datta <Shubhrajyoti.datta@xilinx.com> Acked-by: Michal Simek <michal.simek@xilinx.com> Cc: <stable@vger.kernel.org> Link: https://lore.kernel.org/r/20220427015137.8406-2-sherry.sun@nxp.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-08-03EDAC/ghes: Set the DIMM label unconditionallyToshi Kani
commit 5e2805d5379619c4a2e3ae4994e73b36439f4bad upstream. The commit cb51a371d08e ("EDAC/ghes: Setup DIMM label from DMI and use it in error reports") enforced that both the bank and device strings passed to dimm_setup_label() are not NULL. However, there are BIOSes, for example on a HPE ProLiant DL360 Gen10/ProLiant DL360 Gen10, BIOS U32 03/15/2019 which don't populate both strings: Handle 0x0020, DMI type 17, 84 bytes Memory Device Array Handle: 0x0013 Error Information Handle: Not Provided Total Width: 72 bits Data Width: 64 bits Size: 32 GB Form Factor: DIMM Set: None Locator: PROC 1 DIMM 1 <===== device Bank Locator: Not Specified <===== bank This results in a buffer overflow because ghes_edac_register() calls strlen() on an uninitialized label, which had non-zero values left over from krealloc_array(): detected buffer overflow in __fortify_strlen ------------[ cut here ]------------ kernel BUG at lib/string_helpers.c:983! invalid opcode: 0000 [#1] PREEMPT SMP NOPTI CPU: 1 PID: 1 Comm: swapper/0 Tainted: G I 5.18.6-200.fc36.x86_64 #1 Hardware name: HPE ProLiant DL360 Gen10/ProLiant DL360 Gen10, BIOS U32 03/15/2019 RIP: 0010:fortify_panic ... Call Trace: <TASK> ghes_edac_register.cold ghes_probe platform_probe really_probe __driver_probe_device driver_probe_device __driver_attach ? __device_attach_driver bus_for_each_dev bus_add_driver driver_register acpi_ghes_init acpi_init ? acpi_sleep_proc_init do_one_initcall The label contains garbage because the commit in Fixes reallocs the DIMMs array while scanning the system but doesn't clear the newly allocated memory. Change dimm_setup_label() to always initialize the label to fix the issue. Set it to the empty string in case BIOS does not provide both bank and device so that ghes_edac_register() can keep the default label given by edac_mc_alloc_dimms(). [ bp: Rewrite commit message. ] Fixes: b9cae27728d1f ("EDAC/ghes: Scan the system once on driver init") Co-developed-by: Robert Richter <rric@kernel.org> Signed-off-by: Robert Richter <rric@kernel.org> Signed-off-by: Toshi Kani <toshi.kani@hpe.com> Signed-off-by: Borislav Petkov <bp@suse.de> Tested-by: Robert Elliott <elliott@hpe.com> Cc: <stable@vger.kernel.org> Link: https://lore.kernel.org/r/20220719220124.760359-1-toshi.kani@hpe.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-08-03ARM: 9216/1: Fix MAX_DMA_ADDRESS overflowFlorian Fainelli
[ Upstream commit fb0fd3469ead5b937293c213daa1f589b4b7ce46 ] Commit 26f09e9b3a06 ("mm/memblock: add memblock memory allocation apis") added a check to determine whether arm_dma_zone_size is exceeding the amount of kernel virtual address space available between the upper 4GB virtual address limit and PAGE_OFFSET in order to provide a suitable definition of MAX_DMA_ADDRESS that should fit within the 32-bit virtual address space. The quantity used for comparison was off by a missing trailing 0, leading to MAX_DMA_ADDRESS to be overflowing a 32-bit quantity. This was caught thanks to CONFIG_DEBUG_VIRTUAL on the bcm2711 platform where we define a dma_zone_size of 1GB and we have a PAGE_OFFSET value of 0xc000_0000 (CONFIG_VMSPLIT_3G) leading to MAX_DMA_ADDRESS being 0x1_0000_0000 which overflows the unsigned long type used throughout __pa() and then __virt_addr_valid(). Because the virtual address passed to __virt_addr_valid() would now be 0, the function would loudly warn and flood the kernel log, thus making the platform unable to boot properly. Fixes: 26f09e9b3a06 ("mm/memblock: add memblock memory allocation apis") Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-08-03tcp: Fix data-races around sysctl_tcp_workaround_signed_windows.Kuniyuki Iwashima
[ Upstream commit 0f1e4d06591d0a7907c71f7b6d1c79f8a4de8098 ] While reading sysctl_tcp_workaround_signed_windows, it can be changed concurrently. Thus, we need to add READ_ONCE() to its readers. Fixes: 15d99e02baba ("[TCP]: sysctl to allow TCP window > 32767 sans wscale") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-08-03page_alloc: fix invalid watermark check on a negative valueJaewon Kim
commit 9282012fc0aa248b77a69f5eb802b67c5a16bb13 upstream. There was a report that a task is waiting at the throttle_direct_reclaim. The pgscan_direct_throttle in vmstat was increasing. This is a bug where zone_watermark_fast returns true even when the free is very low. The commit f27ce0e14088 ("page_alloc: consider highatomic reserve in watermark fast") changed the watermark fast to consider highatomic reserve. But it did not handle a negative value case which can be happened when reserved_highatomic pageblock is bigger than the actual free. If watermark is considered as ok for the negative value, allocating contexts for order-0 will consume all free pages without direct reclaim, and finally free page may become depleted except highatomic free. Then allocating contexts may fall into throttle_direct_reclaim. This symptom may easily happen in a system where wmark min is low and other reclaimers like kswapd does not make free pages quickly. Handle the negative case by using MIN. Link: https://lkml.kernel.org/r/20220725095212.25388-1-jaewon31.kim@samsung.com Fixes: f27ce0e14088 ("page_alloc: consider highatomic reserve in watermark fast") Signed-off-by: Jaewon Kim <jaewon31.kim@samsung.com> Reported-by: GyeongHwan Hong <gh21.hong@samsung.com> Acked-by: Mel Gorman <mgorman@techsingularity.net> Cc: Minchan Kim <minchan@kernel.org> Cc: Baoquan He <bhe@redhat.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Yong-Taek Lee <ytk.lee@samsung.com> Cc: <stable@vger.kerenl.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-08-03mm/hmm: fault non-owner device private entriesRalph Campbell
commit 8a295dbbaf7292c582a40ce469c326f472d51f66 upstream. If hmm_range_fault() is called with the HMM_PFN_REQ_FAULT flag and a device private PTE is found, the hmm_range::dev_private_owner page is used to determine if the device private page should not be faulted in. However, if the device private page is not owned by the caller, hmm_range_fault() returns an error instead of calling migrate_to_ram() to fault in the page. For example, if a page is migrated to GPU private memory and a RDMA fault capable NIC tries to read the migrated page, without this patch it will get an error. With this patch, the page will be migrated back to system memory and the NIC will be able to read the data. Link: https://lkml.kernel.org/r/20220727000837.4128709-2-rcampbell@nvidia.com Link: https://lkml.kernel.org/r/20220725183615.4118795-2-rcampbell@nvidia.com Fixes: 08ddddda667b ("mm/hmm: check the device private page owner in hmm_range_fault()") Signed-off-by: Ralph Campbell <rcampbell@nvidia.com> Reported-by: Felix Kuehling <felix.kuehling@amd.com> Reviewed-by: Alistair Popple <apopple@nvidia.com> Cc: Philip Yang <Philip.Yang@amd.com> Cc: Jason Gunthorpe <jgg@nvidia.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-08-03stmmac: dwmac-mediatek: fix resource leak in probeDan Carpenter
[ Upstream commit 4d3d3a1b244fd54629a6b7047f39a7bbc8d11910 ] If mediatek_dwmac_clks_config() fails, then call stmmac_remove_config_dt() before returning. Otherwise it is a resource leak. Fixes: fa4b3ca60e80 ("stmmac: dwmac-mediatek: fix clock issue") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Link: https://lore.kernel.org/r/YuJ4aZyMUlG6yGGa@kili Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-08-03net/funeth: Fix fun_xdp_tx() and XDP packet reclaimDimitris Michailidis
[ Upstream commit 51a83391d77bb0f7ff0aef06ca4c7f5aa9e80b4c ] The current implementation of fun_xdp_tx(), used for XPD_TX, is incorrect in that it takes an address/length pair and later releases it with page_frag_free(). It is OK for XDP_TX but the same code is used by ndo_xdp_xmit. In that case it loses the XDP memory type and releases the packet incorrectly for some of the types. Assorted breakage follows. Change fun_xdp_tx() to take xdp_frame and rely on xdp_return_frame() in reclaim. Fixes: db37bc177dae ("net/funeth: add the data path") Signed-off-by: Dimitris Michailidis <dmichail@fungible.com> Link: https://lore.kernel.org/r/20220726215923.7887-1-dmichail@fungible.com Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-08-03sctp: leave the err path free in sctp_stream_init to sctp_stream_freeXin Long
[ Upstream commit 181d8d2066c000ba0a0e6940a7ad80f1a0e68e9d ] A NULL pointer dereference was reported by Wei Chen: BUG: kernel NULL pointer dereference, address: 0000000000000000 RIP: 0010:__list_del_entry_valid+0x26/0x80 Call Trace: <TASK> sctp_sched_dequeue_common+0x1c/0x90 sctp_sched_prio_dequeue+0x67/0x80 __sctp_outq_teardown+0x299/0x380 sctp_outq_free+0x15/0x20 sctp_association_free+0xc3/0x440 sctp_do_sm+0x1ca7/0x2210 sctp_assoc_bh_rcv+0x1f6/0x340 This happens when calling sctp_sendmsg without connecting to server first. In this case, a data chunk already queues up in send queue of client side when processing the INIT_ACK from server in sctp_process_init() where it calls sctp_stream_init() to alloc stream_in. If it fails to alloc stream_in all stream_out will be freed in sctp_stream_init's err path. Then in the asoc freeing it will crash when dequeuing this data chunk as stream_out is missing. As we can't free stream out before dequeuing all data from send queue, and this patch is to fix it by moving the err path stream_out/in freeing in sctp_stream_init() to sctp_stream_free() which is eventually called when freeing the asoc in sctp_association_free(). This fix also makes the code in sctp_process_init() more clear. Note that in sctp_association_init() when it fails in sctp_stream_init(), sctp_association_free() will not be called, and in that case it should go to 'stream_free' err path to free stream instead of 'fail_init'. Fixes: 5bbbbe32a431 ("sctp: introduce stream scheduler foundations") Reported-by: Wei Chen <harperchen1110@gmail.com> Signed-off-by: Xin Long <lucien.xin@gmail.com> Link: https://lore.kernel.org/r/831a3dc100c4908ff76e5bcc363be97f2778bc0b.1658787066.git.lucien.xin@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-08-03sfc: disable softirqs for ptp TXAlejandro Lucero
[ Upstream commit 67c3b611d92fc238c43734878bc3e232ab570c79 ] Sending a PTP packet can imply to use the normal TX driver datapath but invoked from the driver's ptp worker. The kernel generic TX code disables softirqs and preemption before calling specific driver TX code, but the ptp worker does not. Although current ptp driver functionality does not require it, there are several reasons for doing so: 1) The invoked code is always executed with softirqs disabled for non PTP packets. 2) Better if a ptp packet transmission is not interrupted by softirq handling which could lead to high latencies. 3) netdev_xmit_more used by the TX code requires preemption to be disabled. Indeed a solution for dealing with kernel preemption state based on static kernel configuration is not possible since the introduction of dynamic preemption level configuration at boot time using the static calls functionality. Fixes: f79c957a0b537 ("drivers: net: sfc: use netdev_xmit_more helper") Signed-off-by: Alejandro Lucero <alejandro.lucero-palau@amd.com> Link: https://lore.kernel.org/r/20220726064504.49613-1-alejandro.lucero-palau@amd.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-08-03perf symbol: Correct address for bss symbolsLeo Yan
[ Upstream commit 2d86612aacb7805f72873691a2644d7279ed0630 ] When using 'perf mem' and 'perf c2c', an issue is observed that tool reports the wrong offset for global data symbols. This is a common issue on both x86 and Arm64 platforms. Let's see an example, for a test program, below is the disassembly for its .bss section which is dumped with objdump: ... Disassembly of section .bss: 0000000000004040 <completed.0>: ... 0000000000004080 <buf1>: ... 00000000000040c0 <buf2>: ... 0000000000004100 <thread>: ... First we used 'perf mem record' to run the test program and then used 'perf --debug verbose=4 mem report' to observe what's the symbol info for 'buf1' and 'buf2' structures. # ./perf mem record -e ldlat-loads,ldlat-stores -- false_sharing.exe 8 # ./perf --debug verbose=4 mem report ... dso__load_sym_internal: adjusting symbol: st_value: 0x40c0 sh_addr: 0x4040 sh_offset: 0x3028 symbol__new: buf2 0x30a8-0x30e8 ... dso__load_sym_internal: adjusting symbol: st_value: 0x4080 sh_addr: 0x4040 sh_offset: 0x3028 symbol__new: buf1 0x3068-0x30a8 ... The perf tool relies on libelf to parse symbols, in executable and shared object files, 'st_value' holds a virtual address; 'sh_addr' is the address at which section's first byte should reside in memory, and 'sh_offset' is the byte offset from the beginning of the file to the first byte in the section. The perf tool uses below formula to convert a symbol's memory address to a file address: file_address = st_value - sh_addr + sh_offset ^ ` Memory address We can see the final adjusted address ranges for buf1 and buf2 are [0x30a8-0x30e8) and [0x3068-0x30a8) respectively, apparently this is incorrect, in the code, the structure for 'buf1' and 'buf2' specifies compiler attribute with 64-byte alignment. The problem happens for 'sh_offset', libelf returns it as 0x3028 which is not 64-byte aligned, combining with disassembly, it's likely libelf doesn't respect the alignment for .bss section, therefore, it doesn't return the aligned value for 'sh_offset'. Suggested by Fangrui Song, ELF file contains program header which contains PT_LOAD segments, the fields p_vaddr and p_offset in PT_LOAD segments contain the execution info. A better choice for converting memory address to file address is using the formula: file_address = st_value - p_vaddr + p_offset This patch introduces elf_read_program_header() which returns the program header based on the passed 'st_value', then it uses the formula above to calculate the symbol file address; and the debugging log is updated respectively. After applying the change: # ./perf --debug verbose=4 mem report ... dso__load_sym_internal: adjusting symbol: st_value: 0x40c0 p_vaddr: 0x3d28 p_offset: 0x2d28 symbol__new: buf2 0x30c0-0x3100 ... dso__load_sym_internal: adjusting symbol: st_value: 0x4080 p_vaddr: 0x3d28 p_offset: 0x2d28 symbol__new: buf1 0x3080-0x30c0 ... Fixes: f17e04afaff84b5c ("perf report: Fix ELF symbol parsing") Reported-by: Chang Rui <changruinj@gmail.com> Suggested-by: Fangrui Song <maskray@google.com> Signed-off-by: Leo Yan <leo.yan@linaro.org> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20220724060013.171050-2-leo.yan@linaro.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-08-03virtio-net: fix the race between refill work and closeJason Wang
[ Upstream commit 5a159128faff151b7fe5f4eb0f310b1e0a2d56bf ] We try using cancel_delayed_work_sync() to prevent the work from enabling NAPI. This is insufficient since we don't disable the source of the refill work scheduling. This means an NAPI poll callback after cancel_delayed_work_sync() can schedule the refill work then can re-enable the NAPI that leads to use-after-free [1]. Since the work can enable NAPI, we can't simply disable NAPI before calling cancel_delayed_work_sync(). So fix this by introducing a dedicated boolean to control whether or not the work could be scheduled from NAPI. [1] ================================================================== BUG: KASAN: use-after-free in refill_work+0x43/0xd4 Read of size 2 at addr ffff88810562c92e by task kworker/2:1/42 CPU: 2 PID: 42 Comm: kworker/2:1 Not tainted 5.19.0-rc1+ #480 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014 Workqueue: events refill_work Call Trace: <TASK> dump_stack_lvl+0x34/0x44 print_report.cold+0xbb/0x6ac ? _printk+0xad/0xde ? refill_work+0x43/0xd4 kasan_report+0xa8/0x130 ? refill_work+0x43/0xd4 refill_work+0x43/0xd4 process_one_work+0x43d/0x780 worker_thread+0x2a0/0x6f0 ? process_one_work+0x780/0x780 kthread+0x167/0x1a0 ? kthread_exit+0x50/0x50 ret_from_fork+0x22/0x30 </TASK> ... Fixes: b2baed69e605c ("virtio_net: set/cancel work on ndo_open/ndo_stop") Signed-off-by: Jason Wang <jasowang@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-08-03scsi: ufs: core: Fix a race condition related to device managementBart Van Assche
[ Upstream commit f5c2976e0cb0f6236013bfb479868531b04f61d4 ] If a device management command completion happens after wait_for_completion_timeout() times out and before ufshcd_clear_cmds() is called, then the completion code may crash on the complete() call in __ufshcd_transfer_req_compl(). Fix the following crash: Unable to handle kernel NULL pointer dereference at virtual address 0000000000000008 Call trace: complete+0x64/0x178 __ufshcd_transfer_req_compl+0x30c/0x9c0 ufshcd_poll+0xf0/0x208 ufshcd_sl_intr+0xb8/0xf0 ufshcd_intr+0x168/0x2f4 __handle_irq_event_percpu+0xa0/0x30c handle_irq_event+0x84/0x178 handle_fasteoi_irq+0x150/0x2e8 __handle_domain_irq+0x114/0x1e4 gic_handle_irq.31846+0x58/0x300 el1_irq+0xe4/0x1c0 efi_header_end+0x110/0x680 __irq_exit_rcu+0x108/0x124 __handle_domain_irq+0x118/0x1e4 gic_handle_irq.31846+0x58/0x300 el1_irq+0xe4/0x1c0 cpuidle_enter_state+0x3ac/0x8c4 do_idle+0x2fc/0x55c cpu_startup_entry+0x84/0x90 kernel_init+0x0/0x310 start_kernel+0x0/0x608 start_kernel+0x4ec/0x608 Link: https://lore.kernel.org/r/20220720170228.1598842-1-bvanassche@acm.org Fixes: 5a0b0cb9bee7 ("[SCSI] ufs: Add support for sending NOP OUT UPIU") Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Avri Altman <avri.altman@wdc.com> Cc: Bean Huo <beanhuo@micron.com> Cc: Stanley Chu <stanley.chu@mediatek.com> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-08-03scsi: ufs: Support clearing multiple commands at onceBart Van Assche
[ Upstream commit d1a7644648b7cdacaf8d1013a4285001911e9bc8 ] Modify ufshcd_clear_cmd() such that it supports clearing multiple commands at once instead of one command at a time. This change will be used in a later patch to reduce the time spent in the reset handler. Link: https://lore.kernel.org/r/20220613214442.212466-3-bvanassche@acm.org Reviewed-by: Stanley Chu <stanley.chu@mediatek.com> Reviewed-by: Adrian Hunter <adrian.hunter@intel.com> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-08-03netfilter: nf_queue: do not allow packet truncation below transport header ↵Florian Westphal
offset [ Upstream commit 99a63d36cb3ed5ca3aa6fcb64cffbeaf3b0fb164 ] Domingo Dirutigliano and Nicola Guerrera report kernel panic when sending nf_queue verdict with 1-byte nfta_payload attribute. The IP/IPv6 stack pulls the IP(v6) header from the packet after the input hook. If user truncates the packet below the header size, this skb_pull() will result in a malformed skb (skb->len < 0). Fixes: 7af4cc3fa158 ("[NETFILTER]: Add "nfnetlink_queue" netfilter queue handler over nfnetlink") Reported-by: Domingo Dirutigliano <pwnzer0tt1@proton.me> Signed-off-by: Florian Westphal <fw@strlen.de> Reviewed-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-08-03octeontx2-pf: cn10k: Fix egress ratelimit configurationSunil Goutham
[ Upstream commit b354eaeec8637d87003945439209251d76a2bb95 ] NIX_AF_TLXX_PIR/CIR register format has changed from OcteonTx2 to CN10K. CN10K supports larger burst size. Fix burst exponent and burst mantissa configuration for CN10K. Also fixed 'maxrate' from u32 to u64 since 'police.rate_bytes_ps' passed by stack is also u64. Fixes: e638a83f167e ("octeontx2-pf: TC_MATCHALL egress ratelimiting offload") Signed-off-by: Sunil Goutham <sgoutham@marvell.com> Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-08-03sctp: fix sleep in atomic context bug in timer handlersDuoming Zhou
[ Upstream commit b89fc26f741d9f9efb51cba3e9b241cf1380ec5a ] There are sleep in atomic context bugs in timer handlers of sctp such as sctp_generate_t3_rtx_event(), sctp_generate_probe_event(), sctp_generate_t1_init_event(), sctp_generate_timeout_event(), sctp_generate_t3_rtx_event() and so on. The root cause is sctp_sched_prio_init_sid() with GFP_KERNEL parameter that may sleep could be called by different timer handlers which is in interrupt context. One of the call paths that could trigger bug is shown below: (interrupt context) sctp_generate_probe_event sctp_do_sm sctp_side_effects sctp_cmd_interpreter sctp_outq_teardown sctp_outq_init sctp_sched_set_sched n->init_sid(..,GFP_KERNEL) sctp_sched_prio_init_sid //may sleep This patch changes gfp_t parameter of init_sid in sctp_sched_set_sched() from GFP_KERNEL to GFP_ATOMIC in order to prevent sleep in atomic context bugs. Fixes: 5bbbbe32a431 ("sctp: introduce stream scheduler foundations") Signed-off-by: Duoming Zhou <duoming@zju.edu.cn> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Link: https://lore.kernel.org/r/20220723015809.11553-1-duoming@zju.edu.cn Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-08-03net: dsa: fix reference counting for LAG FDBsVladimir Oltean
[ Upstream commit c7560d1203b7a1ea0b99a5c575547e95d564b2a8 ] Due to an invalid conflict resolution on my side while working on 2 different series (LAG FDBs and FDB isolation), dsa_switch_do_lag_fdb_add() does not store the database associated with a dsa_mac_addr structure. So after adding an FDB entry associated with a LAG, dsa_mac_addr_find() fails to find it while deleting it, because &a->db is zeroized memory for all stored FDB entries of lag->fdbs, and dsa_switch_do_lag_fdb_del() returns -ENOENT rather than deleting the entry. Fixes: c26933639b54 ("net: dsa: request drivers to perform FDB isolation") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://lore.kernel.org/r/20220723012411.1125066-1-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-08-03i40e: Fix interface init with MSI interrupts (no MSI-X)Michal Maloszewski
[ Upstream commit 5fcbb711024aac6d4db385623e6f2fdf019f7782 ] Fix the inability to bring an interface up on a setup with only MSI interrupts enabled (no MSI-X). Solution is to add a default number of QPs = 1. This is enough, since without MSI-X support driver enables only a basic feature set. Fixes: bc6d33c8d93f ("i40e: Fix the number of queues available to be mapped for use") Signed-off-by: Dawid Lukwinski <dawid.lukwinski@intel.com> Signed-off-by: Michal Maloszewski <michal.maloszewski@intel.com> Tested-by: Dave Switzer <david.switzer@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Link: https://lore.kernel.org/r/20220722175401.112572-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-08-03ipv4: Fix data-races around sysctl_fib_notify_on_flag_change.Kuniyuki Iwashima
[ Upstream commit 96b9bd8c6d125490f9adfb57d387ef81a55a103e ] While reading sysctl_fib_notify_on_flag_change, it can be changed concurrently. Thus, we need to add READ_ONCE() to its readers. Fixes: 680aea08e78c ("net: ipv4: Emit notification when fib hardware flags are changed") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-08-03tcp: Fix data-races around sysctl_tcp_reflect_tos.Kuniyuki Iwashima
[ Upstream commit 870e3a634b6a6cb1543b359007aca73fe6a03ac5 ] While reading sysctl_tcp_reflect_tos, it can be changed concurrently. Thus, we need to add READ_ONCE() to its readers. Fixes: ac8f1710c12b ("tcp: reflect tos value received in SYN to the socket") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Acked-by: Wei Wang <weiwan@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-08-03tcp: Fix a data-race around sysctl_tcp_comp_sack_nr.Kuniyuki Iwashima
[ Upstream commit 79f55473bfc8ac51bd6572929a679eeb4da22251 ] While reading sysctl_tcp_comp_sack_nr, it can be changed concurrently. Thus, we need to add READ_ONCE() to its reader. Fixes: 9c21d2fc41c0 ("tcp: add tcp_comp_sack_nr sysctl") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-08-03tcp: Fix a data-race around sysctl_tcp_comp_sack_slack_ns.Kuniyuki Iwashima
[ Upstream commit 22396941a7f343d704738360f9ef0e6576489d43 ] While reading sysctl_tcp_comp_sack_slack_ns, it can be changed concurrently. Thus, we need to add READ_ONCE() to its reader. Fixes: a70437cc09a1 ("tcp: add hrtimer slack to sack compression") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-08-03tcp: Fix a data-race around sysctl_tcp_comp_sack_delay_ns.Kuniyuki Iwashima
[ Upstream commit 4866b2b0f7672b6d760c4b8ece6fb56f965dcc8a ] While reading sysctl_tcp_comp_sack_delay_ns, it can be changed concurrently. Thus, we need to add READ_ONCE() to its reader. Fixes: 6d82aa242092 ("tcp: add tcp_comp_sack_delay_ns sysctl") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-08-03net: Fix data-races around sysctl_[rw]mem(_offset)?.Kuniyuki Iwashima
[ Upstream commit 02739545951ad4c1215160db7fbf9b7a918d3c0b ] While reading these sysctl variables, they can be changed concurrently. Thus, we need to add READ_ONCE() to their readers. - .sysctl_rmem - .sysctl_rwmem - .sysctl_rmem_offset - .sysctl_wmem_offset - sysctl_tcp_rmem[1, 2] - sysctl_tcp_wmem[1, 2] - sysctl_decnet_rmem[1] - sysctl_decnet_wmem[1] - sysctl_tipc_rmem[1] Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-08-03tcp: Fix data-races around sk_pacing_rate.Kuniyuki Iwashima
[ Upstream commit 59bf6c65a09fff74215517aecffbbdcd67df76e3 ] While reading sysctl_tcp_pacing_(ss|ca)_ratio, they can be changed concurrently. Thus, we need to add READ_ONCE() to their readers. Fixes: 43e122b014c9 ("tcp: refine pacing rate determination") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-08-03net: mld: fix reference count leak in mld_{query | report}_work()Taehee Yoo
[ Upstream commit 3e7d18b9dca388940a19cae30bfc1f76dccd8c28 ] mld_{query | report}_work() processes queued events. If there are too many events in the queue, it re-queue a work. And then, it returns without in6_dev_put(). But if queuing is failed, it should call in6_dev_put(), but it doesn't. So, a reference count leak would occur. THREAD0 THREAD1 mld_report_work() spin_lock_bh() if (!mod_delayed_work()) in6_dev_hold(); spin_unlock_bh() spin_lock_bh() schedule_delayed_work() spin_unlock_bh() Script to reproduce(by Hangbin Liu): ip netns add ns1 ip netns add ns2 ip netns exec ns1 sysctl -w net.ipv6.conf.all.force_mld_version=1 ip netns exec ns2 sysctl -w net.ipv6.conf.all.force_mld_version=1 ip -n ns1 link add veth0 type veth peer name veth0 netns ns2 ip -n ns1 link set veth0 up ip -n ns2 link set veth0 up for i in `seq 50`; do for j in `seq 100`; do ip -n ns1 addr add 2021:${i}::${j}/64 dev veth0 ip -n ns2 addr add 2022:${i}::${j}/64 dev veth0 done done modprobe -r veth ip -a netns del splat looks like: unregister_netdevice: waiting for veth0 to become free. Usage count = 2 leaked reference. ipv6_add_dev+0x324/0xec0 addrconf_notify+0x481/0xd10 raw_notifier_call_chain+0xe3/0x120 call_netdevice_notifiers+0x106/0x160 register_netdevice+0x114c/0x16b0 veth_newlink+0x48b/0xa50 [veth] rtnl_newlink+0x11a2/0x1a40 rtnetlink_rcv_msg+0x63f/0xc00 netlink_rcv_skb+0x1df/0x3e0 netlink_unicast+0x5de/0x850 netlink_sendmsg+0x6c9/0xa90 ____sys_sendmsg+0x76a/0x780 __sys_sendmsg+0x27c/0x340 do_syscall_64+0x43/0x90 entry_SYSCALL_64_after_hwframe+0x63/0xcd Tested-by: Hangbin Liu <liuhangbin@gmail.com> Fixes: f185de28d9ae ("mld: add new workqueues for process mld events") Signed-off-by: Taehee Yoo <ap420073@gmail.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-08-03net: macsec: fix potential resource leak in macsec_add_rxsa() and ↵Jianglei Nie
macsec_add_txsa() [ Upstream commit c7b205fbbf3cffa374721bb7623f7aa8c46074f1 ] init_rx_sa() allocates relevant resource for rx_sa->stats and rx_sa-> key.tfm with alloc_percpu() and macsec_alloc_tfm(). When some error occurs after init_rx_sa() is called in macsec_add_rxsa(), the function released rx_sa with kfree() without releasing rx_sa->stats and rx_sa-> key.tfm, which will lead to a resource leak. We should call macsec_rxsa_put() instead of kfree() to decrease the ref count of rx_sa and release the relevant resource if the refcount is 0. The same bug exists in macsec_add_txsa() for tx_sa as well. This patch fixes the above two bugs. Fixes: 3cf3227a21d1 ("net: macsec: hardware offloading infrastructure") Signed-off-by: Jianglei Nie <niejianglei2021@163.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-08-03macsec: always read MACSEC_SA_ATTR_PN as a u64Sabrina Dubroca
[ Upstream commit c630d1fe6219769049c87d1a6a0e9a6de55328a1 ] Currently, MACSEC_SA_ATTR_PN is handled inconsistently, sometimes as a u32, sometimes forced into a u64 without checking the actual length of the attribute. Instead, we can use nla_get_u64 everywhere, which will read up to 64 bits into a u64, capped by the actual length of the attribute coming from userspace. This fixes several issues: - the check in validate_add_rxsa doesn't work with 32-bit attributes - the checks in validate_add_txsa and validate_upd_sa incorrectly reject X << 32 (with X != 0) Fixes: 48ef50fa866a ("macsec: Netlink support of XPN cipher suites (IEEE 802.1AEbw)") Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-08-03macsec: limit replay window size with XPNSabrina Dubroca
[ Upstream commit b07a0e2044057f201d694ab474f5c42a02b6465b ] IEEE 802.1AEbw-2013 (section 10.7.8) specifies that the maximum value of the replay window is 2^30-1, to help with recovery of the upper bits of the PN. To avoid leaving the existing macsec device in an inconsistent state if this test fails during changelink, reuse the cleanup mechanism introduced for HW offload. This wasn't needed until now because macsec_changelink_common could not fail during changelink, as modifying the cipher suite was not allowed. Finally, this must happen after handling IFLA_MACSEC_CIPHER_SUITE so that secy->xpn is set. Fixes: 48ef50fa866a ("macsec: Netlink support of XPN cipher suites (IEEE 802.1AEbw)") Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-08-03macsec: fix error message in macsec_add_rxsa and _txsaSabrina Dubroca
[ Upstream commit 3240eac4ff20e51b87600dbd586ed814daf313db ] The expected length is MACSEC_SALT_LEN, not MACSEC_SA_ATTR_SALT. Fixes: 48ef50fa866a ("macsec: Netlink support of XPN cipher suites (IEEE 802.1AEbw)") Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-08-03macsec: fix NULL deref in macsec_add_rxsaSabrina Dubroca
[ Upstream commit f46040eeaf2e523a4096199fd93a11e794818009 ] Commit 48ef50fa866a added a test on tb_sa[MACSEC_SA_ATTR_PN], but nothing guarantees that it's not NULL at this point. The same code was added to macsec_add_txsa, but there it's not a problem because validate_add_txsa checks that the MACSEC_SA_ATTR_PN attribute is present. Note: it's not possible to reproduce with iproute, because iproute doesn't allow creating an SA without specifying the PN. Fixes: 48ef50fa866a ("macsec: Netlink support of XPN cipher suites (IEEE 802.1AEbw)") Link: https://bugzilla.kernel.org/show_bug.cgi?id=208315 Reported-by: Frantisek Sumsal <fsumsal@redhat.com> Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-08-03Documentation: fix sctp_wmem in ip-sysctl.rstXin Long
[ Upstream commit aa709da0e032cee7c202047ecd75f437bb0126ed ] Since commit 1033990ac5b2 ("sctp: implement memory accounting on tx path"), SCTP has supported memory accounting on tx path where 'sctp_wmem' is used by sk_wmem_schedule(). So we should fix the description for this option in ip-sysctl.rst accordingly. v1->v2: - Improve the description as Marcelo suggested. Fixes: 1033990ac5b2 ("sctp: implement memory accounting on tx path") Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-08-03tcp: Fix a data-race around sysctl_tcp_invalid_ratelimit.Kuniyuki Iwashima
[ Upstream commit 2afdbe7b8de84c28e219073a6661080e1b3ded48 ] While reading sysctl_tcp_invalid_ratelimit, it can be changed concurrently. Thus, we need to add READ_ONCE() to its reader. Fixes: 032ee4236954 ("tcp: helpers to mitigate ACK loops by rate-limiting out-of-window dupacks") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-08-03tcp: Fix a data-race around sysctl_tcp_autocorking.Kuniyuki Iwashima
[ Upstream commit 85225e6f0a76e6745bc841c9f25169c509b573d8 ] While reading sysctl_tcp_autocorking, it can be changed concurrently. Thus, we need to add READ_ONCE() to its reader. Fixes: f54b311142a9 ("tcp: auto corking") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-08-03tcp: Fix a data-race around sysctl_tcp_min_rtt_wlen.Kuniyuki Iwashima
[ Upstream commit 1330ffacd05fc9ac4159d19286ce119e22450ed2 ] While reading sysctl_tcp_min_rtt_wlen, it can be changed concurrently. Thus, we need to add READ_ONCE() to its reader. Fixes: f672258391b4 ("tcp: track min RTT using windowed min-filter") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-08-03tcp: Fix a data-race around sysctl_tcp_tso_rtt_log.Kuniyuki Iwashima
[ Upstream commit 2455e61b85e9c99af38cd889a7101f1d48b33cb4 ] While reading sysctl_tcp_tso_rtt_log, it can be changed concurrently. Thus, we need to add READ_ONCE() to its reader. Fixes: 65466904b015 ("tcp: adjust TSO packet sizes based on min_rtt") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-08-03tcp: Fix a data-race around sysctl_tcp_min_tso_segs.Kuniyuki Iwashima
[ Upstream commit e0bb4ab9dfddd872622239f49fb2bd403b70853b ] While reading sysctl_tcp_min_tso_segs, it can be changed concurrently. Thus, we need to add READ_ONCE() to its reader. Fixes: 95bd09eb2750 ("tcp: TSO packets automatic sizing") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-08-03mlxsw: spectrum_router: simplify list unwindingTom Rix
[ Upstream commit 6f2f36e5f932c58e370bff79aba7f05963ea1c2a ] The setting of i here err_nexthop6_group_get: i = nrt6; Is redundant, i is already nrt6. So remove this statement. The for loop for the unwinding err_rt6_create: for (i--; i >= 0; i--) { Is equivelent to for (; i > 0; i--) { Two consecutive labels can be reduced to one. Signed-off-by: Tom Rix <trix@redhat.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Link: https://lore.kernel.org/r/20220402121516.2750284-1-trix@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-08-03net: sungem_phy: Add of_node_put() for reference returned by of_get_parent()Liang He
[ Upstream commit ebbbe23fdf6070e31509638df3321688358cc211 ] In bcm5421_init(), we should call of_node_put() for the reference returned by of_get_parent() which has increased the refcount. Fixes: 3c326fe9cb7a ("[PATCH] ppc64: Add new PHY to sungem") Signed-off-by: Liang He <windhl@126.com> Link: https://lore.kernel.org/r/20220720131003.1287426-1-windhl@126.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-08-03net: pcs: xpcs: propagate xpcs_read error to xpcs_get_state_c37_sgmiiVladimir Oltean
[ Upstream commit 27161db0904ee48e59140aa8d0835939a666c1f1 ] While phylink_pcs_ops :: pcs_get_state does return void, xpcs_get_state() does check for a non-zero return code from xpcs_get_state_c37_sgmii() and prints that as a message to the kernel log. However, a non-zero return code from xpcs_read() is translated into "return false" (i.e. zero as int) and the I/O error is therefore not printed. Fix that. Fixes: b97b5331b8ab ("net: pcs: add C37 SGMII AN support for intel mGbE controller") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://lore.kernel.org/r/20220720112057.3504398-1-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-08-03net/tls: Remove the context from the list in tls_device_downMaxim Mikityanskiy
commit f6336724a4d4220c89a4ec38bca84b03b178b1a3 upstream. tls_device_down takes a reference on all contexts it's going to move to the degraded state (software fallback). If sk_destruct runs afterwards, it can reduce the reference counter back to 1 and return early without destroying the context. Then tls_device_down will release the reference it took and call tls_device_free_ctx. However, the context will still stay in tls_device_down_list forever. The list will contain an item, memory for which is released, making a memory corruption possible. Fix the above bug by properly removing the context from all lists before any call to tls_device_free_ctx. Fixes: 3740651bf7e2 ("tls: Fix context leak on tls_device_down") Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-08-03ipv6/addrconf: fix a null-ptr-deref bug for ip6_ptrZiyang Xuan
commit 85f0173df35e5462d89947135a6a5599c6c3ef6f upstream. Change net device's MTU to smaller than IPV6_MIN_MTU or unregister device while matching route. That may trigger null-ptr-deref bug for ip6_ptr probability as following. ========================================================= BUG: KASAN: null-ptr-deref in find_match.part.0+0x70/0x134 Read of size 4 at addr 0000000000000308 by task ping6/263 CPU: 2 PID: 263 Comm: ping6 Not tainted 5.19.0-rc7+ #14 Call trace: dump_backtrace+0x1a8/0x230 show_stack+0x20/0x70 dump_stack_lvl+0x68/0x84 print_report+0xc4/0x120 kasan_report+0x84/0x120 __asan_load4+0x94/0xd0 find_match.part.0+0x70/0x134 __find_rr_leaf+0x408/0x470 fib6_table_lookup+0x264/0x540 ip6_pol_route+0xf4/0x260 ip6_pol_route_output+0x58/0x70 fib6_rule_lookup+0x1a8/0x330 ip6_route_output_flags_noref+0xd8/0x1a0 ip6_route_output_flags+0x58/0x160 ip6_dst_lookup_tail+0x5b4/0x85c ip6_dst_lookup_flow+0x98/0x120 rawv6_sendmsg+0x49c/0xc70 inet_sendmsg+0x68/0x94 Reproducer as following: Firstly, prepare conditions: $ip netns add ns1 $ip netns add ns2 $ip link add veth1 type veth peer name veth2 $ip link set veth1 netns ns1 $ip link set veth2 netns ns2 $ip netns exec ns1 ip -6 addr add 2001:0db8:0:f101::1/64 dev veth1 $ip netns exec ns2 ip -6 addr add 2001:0db8:0:f101::2/64 dev veth2 $ip netns exec ns1 ifconfig veth1 up $ip netns exec ns2 ifconfig veth2 up $ip netns exec ns1 ip -6 route add 2000::/64 dev veth1 metric 1 $ip netns exec ns2 ip -6 route add 2001::/64 dev veth2 metric 1 Secondly, execute the following two commands in two ssh windows respectively: $ip netns exec ns1 sh $while true; do ip -6 addr add 2001:0db8:0:f101::1/64 dev veth1; ip -6 route add 2000::/64 dev veth1 metric 1; ping6 2000::2; done $ip netns exec ns1 sh $while true; do ip link set veth1 mtu 1000; ip link set veth1 mtu 1500; sleep 5; done It is because ip6_ptr has been assigned to NULL in addrconf_ifdown() firstly, then ip6_ignore_linkdown() accesses ip6_ptr directly without NULL check. cpu0 cpu1 fib6_table_lookup __find_rr_leaf addrconf_notify [ NETDEV_CHANGEMTU ] addrconf_ifdown RCU_INIT_POINTER(dev->ip6_ptr, NULL) find_match ip6_ignore_linkdown So we can add NULL check for ip6_ptr before using in ip6_ignore_linkdown() to fix the null-ptr-deref bug. Fixes: dcd1f572954f ("net/ipv6: Remove fib6_idev") Signed-off-by: Ziyang Xuan <william.xuanziyang@huawei.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/20220728013307.656257-1-william.xuanziyang@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-08-03net: ping6: Fix memleak in ipv6_renew_options().Kuniyuki Iwashima
commit e27326009a3d247b831eda38878c777f6f4eb3d1 upstream. When we close ping6 sockets, some resources are left unfreed because pingv6_prot is missing sk->sk_prot->destroy(). As reported by syzbot [0], just three syscalls leak 96 bytes and easily cause OOM. struct ipv6_sr_hdr *hdr; char data[24] = {0}; int fd; hdr = (struct ipv6_sr_hdr *)data; hdr->hdrlen = 2; hdr->type = IPV6_SRCRT_TYPE_4; fd = socket(AF_INET6, SOCK_DGRAM, NEXTHDR_ICMP); setsockopt(fd, IPPROTO_IPV6, IPV6_RTHDR, data, 24); close(fd); To fix memory leaks, let's add a destroy function. Note the socket() syscall checks if the GID is within the range of net.ipv4.ping_group_range. The default value is [1, 0] so that no GID meets the condition (1 <= GID <= 0). Thus, the local DoS does not succeed until we change the default value. However, at least Ubuntu/Fedora/RHEL loosen it. $ cat /usr/lib/sysctl.d/50-default.conf ... -net.ipv4.ping_group_range = 0 2147483647 Also, there could be another path reported with these options, and some of them require CAP_NET_RAW. setsockopt IPV6_ADDRFORM (inet6_sk(sk)->pktoptions) IPV6_RECVPATHMTU (inet6_sk(sk)->rxpmtu) IPV6_HOPOPTS (inet6_sk(sk)->opt) IPV6_RTHDRDSTOPTS (inet6_sk(sk)->opt) IPV6_RTHDR (inet6_sk(sk)->opt) IPV6_DSTOPTS (inet6_sk(sk)->opt) IPV6_2292PKTOPTIONS (inet6_sk(sk)->opt) getsockopt IPV6_FLOWLABEL_MGR (inet6_sk(sk)->ipv6_fl_list) For the record, I left a different splat with syzbot's one. unreferenced object 0xffff888006270c60 (size 96): comm "repro2", pid 231, jiffies 4294696626 (age 13.118s) hex dump (first 32 bytes): 01 00 00 00 44 00 00 00 00 00 00 00 00 00 00 00 ....D........... 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace: [<00000000f6bc7ea9>] sock_kmalloc (net/core/sock.c:2564 net/core/sock.c:2554) [<000000006d699550>] do_ipv6_setsockopt.constprop.0 (net/ipv6/ipv6_sockglue.c:715) [<00000000c3c3b1f5>] ipv6_setsockopt (net/ipv6/ipv6_sockglue.c:1024) [<000000007096a025>] __sys_setsockopt (net/socket.c:2254) [<000000003a8ff47b>] __x64_sys_setsockopt (net/socket.c:2265 net/socket.c:2262 net/socket.c:2262) [<000000007c409dcb>] do_syscall_64 (arch/x86/entry/common.c:50 arch/x86/entry/common.c:80) [<00000000e939c4a9>] entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:120) [0]: https://syzkaller.appspot.com/bug?extid=a8430774139ec3ab7176 Fixes: 6d0bfe226116 ("net: ipv6: Add IPv6 support to the ping socket.") Reported-by: syzbot+a8430774139ec3ab7176@syzkaller.appspotmail.com Reported-by: Ayushman Dutta <ayudutta@amazon.com> Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: David Ahern <dsahern@kernel.org> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20220728012220.46918-1-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>