AgeCommit message (Collapse)Author
2014-06-19Linux Tarreau
Signed-off-by: Willy Tarreau <w@1wt.eu>
2014-06-19net: fix regression introduced in by sysctl fixesWilly Tarreau
Commits b7c9e4ee1 ("sysctl net: Keep tcp_syn_retries inside the boundary") and eedcafdc ("net: check net.core.somaxconn sysctl values") were missing a .strategy entry which is still required in 2.6.32. Because of this, the Ubuntu kernel team has faced kernel dumps during their testing. Tyler Hicks and Luis Henriques proposed this patch to fix the issue, which properly sets .strategy as needed in 2.6.32. Reported-by: Luis Henriques <luis.henriques@canonical.com> Cc: tyler.hicks@canonical.com Cc: Michal Tesar <mtesar@redhat.com> Cc: David S. Miller <davem@davemloft.net> Signed-off-by: Willy Tarreau <w@1wt.eu>
2014-06-19auditsc: audit_krule mask accesses need bounds checkingAndy Lutomirski
Fixes an easy DoS and possible information disclosure. This does nothing about the broken state of x32 auditing. eparis: If the admin has enabled auditd and has specifically loaded audit rules. This bug has been around since before git. Wow... Cc: stable@vger.kernel.org Signed-off-by: Andy Lutomirski <luto@amacapital.net> Signed-off-by: Eric Paris <eparis@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> (cherry picked from commit a3c54931199565930d6d84f4c3456f6440aefd41) [wt: no audit_filter_inode_name(), applied to audit_filter_inodes() instead] Signed-off-by: Willy Tarreau <w@1wt.eu>
2014-06-19futex: Make lookup_pi_state more robustThomas Gleixner
The current implementation of lookup_pi_state has ambigous handling of the TID value 0 in the user space futex. We can get into the kernel even if the TID value is 0, because either there is a stale waiters bit or the owner died bit is set or we are called from the requeue_pi path or from user space just for fun. The current code avoids an explicit sanity check for pid = 0 in case that kernel internal state (waiters) are found for the user space address. This can lead to state leakage and worse under some circumstances. Handle the cases explicit: Waiter | pi_state | pi->owner | uTID | uODIED | ? [1] NULL | --- | --- | 0 | 0/1 | Valid [2] NULL | --- | --- | >0 | 0/1 | Valid [3] Found | NULL | -- | Any | 0/1 | Invalid [4] Found | Found | NULL | 0 | 1 | Valid [5] Found | Found | NULL | >0 | 1 | Invalid [6] Found | Found | task | 0 | 1 | Valid [7] Found | Found | NULL | Any | 0 | Invalid [8] Found | Found | task | ==taskTID | 0/1 | Valid [9] Found | Found | task | 0 | 0 | Invalid [10] Found | Found | task | !=taskTID | 0/1 | Invalid [1] Indicates that the kernel can acquire the futex atomically. We came came here due to a stale FUTEX_WAITERS/FUTEX_OWNER_DIED bit. [2] Valid, if TID does not belong to a kernel thread. If no matching thread is found then it indicates that the owner TID has died. [3] Invalid. The waiter is queued on a non PI futex [4] Valid state after exit_robust_list(), which sets the user space value to FUTEX_WAITERS | FUTEX_OWNER_DIED. [5] The user space value got manipulated between exit_robust_list() and exit_pi_state_list() [6] Valid state after exit_pi_state_list() which sets the new owner in the pi_state but cannot access the user space value. [7] pi_state->owner can only be NULL when the OWNER_DIED bit is set. [8] Owner and user space value match [9] There is no transient state which sets the user space TID to 0 except exit_robust_list(), but this is indicated by the FUTEX_OWNER_DIED bit. See [4] [10] There is no transient state which leaves owner and user space TID out of sync. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Kees Cook <keescook@chromium.org> Cc: Will Drewry <wad@chromium.org> Cc: Darren Hart <dvhart@linux.intel.com> Cc: stable@vger.kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> (cherry picked from commit 54a217887a7b658e2650c3feff22756ab80c7339) Signed-off-by: Willy Tarreau <w@1wt.eu>
2014-06-19futex: Always cleanup owner tid in unlock_piThomas Gleixner
If the owner died bit is set at futex_unlock_pi, we currently do not cleanup the user space futex. So the owner TID of the current owner (the unlocker) persists. That's observable inconsistant state, especially when the ownership of the pi state got transferred. Clean it up unconditionally. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Kees Cook <keescook@chromium.org> Cc: Will Drewry <wad@chromium.org> Cc: Darren Hart <dvhart@linux.intel.com> Cc: stable@vger.kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> (cherry picked from commit 13fbca4c6ecd96ec1a1cfa2e4f2ce191fe928a5e) [wt: adjusted context - cmpxchg_futex_value_locked() takes 3 args] Signed-off-by: Willy Tarreau <w@1wt.eu>
2014-06-19futex: Validate atomic acquisition in futex_lock_pi_atomic()Thomas Gleixner
We need to protect the atomic acquisition in the kernel against rogue user space which sets the user space futex to 0, so the kernel side acquisition succeeds while there is existing state in the kernel associated to the real owner. Verify whether the futex has waiters associated with kernel state. If it has, return -EINVAL. The state is corrupted already, so no point in cleaning it up. Subsequent calls will fail as well. Not our problem. [ tglx: Use futex_top_waiter() and explain why we do not need to try restoring the already corrupted user space state. ] Signed-off-by: Darren Hart <dvhart@linux.intel.com> Cc: Kees Cook <keescook@chromium.org> Cc: Will Drewry <wad@chromium.org> Cc: stable@vger.kernel.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> (cherry picked from commit b3eaa9fc5cd0a4d74b18f6b8dc617aeaf1873270) Signed-off-by: Willy Tarreau <w@1wt.eu>
2014-06-19futex-prevent-requeue-pi-on-same-futex.patch futex: Forbid uaddr == uaddr2 ↵Thomas Gleixner
in futex_requeue(..., requeue_pi=1) If uaddr == uaddr2, then we have broken the rule of only requeueing from a non-pi futex to a pi futex with this call. If we attempt this, then dangling pointers may be left for rt_waiter resulting in an exploitable condition. This change brings futex_requeue() in line with futex_wait_requeue_pi() which performs the same check as per commit 6f7b0a2a5c0f ("futex: Forbid uaddr == uaddr2 in futex_wait_requeue_pi()") [ tglx: Compare the resulting keys as well, as uaddrs might be different depending on the mapping ] Fixes CVE-2014-3153. Reported-by: Pinkie Pie Signed-off-by: Will Drewry <wad@chromium.org> Signed-off-by: Kees Cook <keescook@chromium.org> Cc: stable@vger.kernel.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Darren Hart <dvhart@linux.intel.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> (cherry picked from commit e9c243a5a6de0be8e584c604d353412584b592f8) Signed-off-by: Willy Tarreau <w@1wt.eu>
2014-06-19futex: Prevent attaching to kernel threadsThomas Gleixner
We happily allow userspace to declare a random kernel thread to be the owner of a user space PI futex. Found while analysing the fallout of Dave Jones syscall fuzzer. We also should validate the thread group for private futexes and find some fast way to validate whether the "alleged" owner has RW access on the file which backs the SHM, but that's a separate issue. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Dave Jones <davej@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Darren Hart <darren@dvhart.com> Cc: Davidlohr Bueso <davidlohr@hp.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Clark Williams <williams@redhat.com> Cc: Paul McKenney <paulmck@linux.vnet.ibm.com> Cc: Lai Jiangshan <laijs@cn.fujitsu.com> Cc: Roland McGrath <roland@hack.frob.com> Cc: Carlos ODonell <carlos@redhat.com> Cc: Jakub Jelinek <jakub@redhat.com> Cc: Michael Kerrisk <mtk.manpages@gmail.com> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Link: http://lkml.kernel.org/r/20140512201701.194824402@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org (cherry picked from commit f0d71b3dcb8332f7971b5f2363632573e6d9486a) Signed-off-by: Willy Tarreau <w@1wt.eu>
2014-06-19futex: Add another early deadlock detection checkThomas Gleixner
Dave Jones trinity syscall fuzzer exposed an issue in the deadlock detection code of rtmutex: http://lkml.kernel.org/r/20140429151655.GA14277@redhat.com That underlying issue has been fixed with a patch to the rtmutex code, but the futex code must not call into rtmutex in that case because - it can detect that issue early - it avoids a different and more complex fixup for backing out If the user space variable got manipulated to 0x80000000 which means no lock holder, but the waiters bit set and an active pi_state in the kernel is found we can figure out the recursive locking issue by looking at the pi_state owner. If that is the current task, then we can safely return -EDEADLK. The check should have been added in commit 59fa62451 (futex: Handle futex_pi OWNER_DIED take over correctly) already, but I did not see the above issue caused by user space manipulation back then. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Dave Jones <davej@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Darren Hart <darren@dvhart.com> Cc: Davidlohr Bueso <davidlohr@hp.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Clark Williams <williams@redhat.com> Cc: Paul McKenney <paulmck@linux.vnet.ibm.com> Cc: Lai Jiangshan <laijs@cn.fujitsu.com> Cc: Roland McGrath <roland@hack.frob.com> Cc: Carlos ODonell <carlos@redhat.com> Cc: Jakub Jelinek <jakub@redhat.com> Cc: Michael Kerrisk <mtk.manpages@gmail.com> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Link: http://lkml.kernel.org/r/20140512201701.097349971@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org (cherry picked from commit 866293ee54227584ffcb4a42f69c1f365974ba7f) Signed-off-by: Willy Tarreau <w@1wt.eu>
2014-06-19ethtool: Report link-down while interface is downBen Hutchings
While an interface is down, many implementations of ethtool_ops::get_link, including the default, ethtool_op_get_link(), will report the last link state seen while the interface was up. In general the current physical link state is not available if the interface is down. Define ETHTOOL_GLINK to reflect whether the interface *and* any physical port have a working link, and consistently return 0 when the interface is down. Signed-off-by: Ben Hutchings <bhutchings@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net> (cherry picked from commit e596e6e4d578f2639416e620d367a3af34814a40) Cc: Wang Weidong <wangweidong1@huawei.com> Signed-off-by: Willy Tarreau <w@1wt.eu>
2014-05-19Linux Tarreau
Signed-off-by: Willy Tarreau <w@1wt.eu>
2014-05-19floppy: don't write kernel-only members to FDRAWCMD ioctl outputMatthew Daley
Do not leak kernel-only floppy_raw_cmd structure members to userspace. This includes the linked-list pointer and the pointer to the allocated DMA space. Signed-off-by: Matthew Daley <mattd@bugfuzz.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> (cherry picked from commit 2145e15e0557a01b9195d1c7199a1b92cb9be81f) Signed-off-by: Willy Tarreau <w@1wt.eu>
2014-05-19floppy: ignore kernel-only members in FDRAWCMD ioctl inputMatthew Daley
Always clear out these floppy_raw_cmd struct members after copying the entire structure from userspace so that the in-kernel version is always valid and never left in an interdeterminate state. Signed-off-by: Matthew Daley <mattd@bugfuzz.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> (cherry picked from commit ef87dbe7614341c2e7bfe8d32fcb7028cc97442c) [wt: be careful in 2.6.32 we still have the ugly macros everywhere] Signed-off-by: Willy Tarreau <w@1wt.eu>
2014-05-19netfilter: nf_conntrack_dccp: fix skb_header_pointer API usagesDaniel Borkmann
commit b22f5126a24b3b2f15448c3f2a254fc10cbc2b92 upstream Some occurences in the netfilter tree use skb_header_pointer() in the following way ... struct dccp_hdr _dh, *dh; ... skb_header_pointer(skb, dataoff, sizeof(_dh), &dh); ... where dh itself is a pointer that is being passed as the copy buffer. Instead, we need to use &_dh as the forth argument so that we're copying the data into an actual buffer that sits on the stack. Currently, we probably could overwrite memory on the stack (e.g. with a possibly mal-formed DCCP packet), but unintentionally, as we only want the buffer to be placed into _dh variable. Fixes: 2bc780499aa3 ("[NETFILTER]: nf_conntrack: add DCCP protocol support") Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Willy Tarreau <w@1wt.eu>
2014-05-19s390: fix kernel crash due to linkage stack instructionsMartin Schwidefsky
commit 8d7f6690cedb83456edd41c9bd583783f0703bf0 upstream The kernel currently crashes with a low-address-protection exception if a user space process executes an instruction that tries to use the linkage stack. Set the base-ASTE origin and the subspace-ASTE origin of the dispatchable-unit-control-table to point to a dummy ASTE. Set up control register 15 to point to an empty linkage stack with no room left. A user space process with a linkage stack instruction will still crash but with a different exception which is correctly translated to a segmentation fault instead of a kernel oops. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> [dannf: backported to Debian's 2.6.32] Signed-off-by: Willy Tarreau <w@1wt.eu>
2014-05-19SELinux: Fix kernel BUG on empty security contexts.Stephen Smalley
commit 2172fa709ab32ca60e86179dc67d0857be8e2c98 upstream Setting an empty security context (length=0) on a file will lead to incorrectly dereferencing the type and other fields of the security context structure, yielding a kernel BUG. As a zero-length security context is never valid, just reject all such security contexts whether coming from userspace via setxattr or coming from the filesystem upon a getxattr request by SELinux. Setting a security context value (empty or otherwise) unknown to SELinux in the first place is only possible for a root process (CAP_MAC_ADMIN), and, if running SELinux in enforcing mode, only if the corresponding SELinux mac_admin permission is also granted to the domain by policy. In Fedora policies, this is only allowed for specific domains such as livecd for setting down security contexts that are not defined in the build host policy. Reproducer: su setenforce 0 touch foo setfattr -n security.selinux foo Caveat: Relabeling or removing foo after doing the above may not be possible without booting with SELinux disabled. Any subsequent access to foo after doing the above will also trigger the BUG. BUG output from Matthew Thode: [ 473.893141] ------------[ cut here ]------------ [ 473.962110] kernel BUG at security/selinux/ss/services.c:654! [ 473.995314] invalid opcode: 0000 [#6] SMP [ 474.027196] Modules linked in: [ 474.058118] CPU: 0 PID: 8138 Comm: ls Tainted: G D I 3.13.0-grsec #1 [ 474.116637] Hardware name: Supermicro X8ST3/X8ST3, BIOS 2.0 07/29/10 [ 474.149768] task: ffff8805f50cd010 ti: ffff8805f50cd488 task.ti: ffff8805f50cd488 [ 474.183707] RIP: 0010:[<ffffffff814681c7>] [<ffffffff814681c7>] context_struct_compute_av+0xce/0x308 [ 474.219954] RSP: 0018:ffff8805c0ac3c38 EFLAGS: 00010246 [ 474.252253] RAX: 0000000000000000 RBX: ffff8805c0ac3d94 RCX: 0000000000000100 [ 474.287018] RDX: ffff8805e8aac000 RSI: 00000000ffffffff RDI: ffff8805e8aaa000 [ 474.321199] RBP: ffff8805c0ac3cb8 R08: 0000000000000010 R09: 0000000000000006 [ 474.357446] R10: 0000000000000000 R11: ffff8805c567a000 R12: 0000000000000006 [ 474.419191] R13: ffff8805c2b74e88 R14: 00000000000001da R15: 0000000000000000 [ 474.453816] FS: 00007f2e75220800(0000) GS:ffff88061fc00000(0000) knlGS:0000000000000000 [ 474.489254] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 474.522215] CR2: 00007f2e74716090 CR3: 00000005c085e000 CR4: 00000000000207f0 [ 474.556058] Stack: [ 474.584325] ffff8805c0ac3c98 ffffffff811b549b ffff8805c0ac3c98 ffff8805f1190a40 [ 474.618913] ffff8805a6202f08 ffff8805c2b74e88 00068800d0464990 ffff8805e8aac860 [ 474.653955] ffff8805c0ac3cb8 000700068113833a ffff880606c75060 ffff8805c0ac3d94 [ 474.690461] Call Trace: [ 474.723779] [<ffffffff811b549b>] ? lookup_fast+0x1cd/0x22a [ 474.778049] [<ffffffff81468824>] security_compute_av+0xf4/0x20b [ 474.811398] [<ffffffff8196f419>] avc_compute_av+0x2a/0x179 [ 474.843813] [<ffffffff8145727b>] avc_has_perm+0x45/0xf4 [ 474.875694] [<ffffffff81457d0e>] inode_has_perm+0x2a/0x31 [ 474.907370] [<ffffffff81457e76>] selinux_inode_getattr+0x3c/0x3e [ 474.938726] [<ffffffff81455cf6>] security_inode_getattr+0x1b/0x22 [ 474.970036] [<ffffffff811b057d>] vfs_getattr+0x19/0x2d [ 475.000618] [<ffffffff811b05e5>] vfs_fstatat+0x54/0x91 [ 475.030402] [<ffffffff811b063b>] vfs_lstat+0x19/0x1b [ 475.061097] [<ffffffff811b077e>] SyS_newlstat+0x15/0x30 [ 475.094595] [<ffffffff8113c5c1>] ? __audit_syscall_entry+0xa1/0xc3 [ 475.148405] [<ffffffff8197791e>] system_call_fastpath+0x16/0x1b [ 475.179201] Code: 00 48 85 c0 48 89 45 b8 75 02 0f 0b 48 8b 45 a0 48 8b 3d 45 d0 b6 00 8b 40 08 89 c6 ff ce e8 d1 b0 06 00 48 85 c0 49 89 c7 75 02 <0f> 0b 48 8b 45 b8 4c 8b 28 eb 1e 49 8d 7d 08 be 80 01 00 00 e8 [ 475.255884] RIP [<ffffffff814681c7>] context_struct_compute_av+0xce/0x308 [ 475.296120] RSP <ffff8805c0ac3c38> [ 475.328734] ---[ end trace f076482e9d754adc ]--- Reported-by: Matthew Thode <mthode@mthode.org> Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov> Cc: stable@vger.kernel.org Signed-off-by: Paul Moore <pmoore@redhat.com> Signed-off-by: Willy Tarreau <w@1wt.eu>
2014-05-19aacraid: missing capable() check in compat ioctlDan Carpenter
commit f856567b930dfcdbc3323261bf77240ccdde01f5 upstream In commit d496f94d22d1 ('[SCSI] aacraid: fix security weakness') we added a check on CAP_SYS_RAWIO to the ioctl. The compat ioctls need the check as well. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Cc: stable@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Willy Tarreau <w@1wt.eu>
2014-05-19xfs: underflow bug in xfs_attrlist_by_handle()Dan Carpenter
If we allocate less than sizeof(struct attrlist) then we end up corrupting memory or doing a ZERO_PTR_SIZE dereference. This can only be triggered with CAP_SYS_ADMIN. Reported-by: Nico Golde <nico@ngolde.de> Reported-by: Fabian Yamaguchi <fabs@goesec.de> Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Ben Myers <bpm@sgi.com> (cherry picked from commit 071c529eb672648ee8ca3f90944bcbcc730b4c06) [dannf: backported to Debian's 2.6.32] Signed-off-by: Willy Tarreau <w@1wt.eu>
2014-05-19qeth: avoid buffer overflow in snmp ioctlUrsula Braun
commit 6fb392b1a63ae36c31f62bc3fc8630b49d602b62 upstream Check user-defined length in snmp ioctl request and allow request only if it fits into a qeth command buffer. Signed-off-by: Ursula Braun <ursula.braun@de.ibm.com> Signed-off-by: Frank Blaschka <frank.blaschka@de.ibm.com> Reviewed-by: Heiko Carstens <heicars2@linux.vnet.ibm.com> Reported-by: Nico Golde <nico@ngolde.de> Reported-by: Fabian Yamaguchi <fabs@goesec.de> Cc: <stable@vger.kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net> [jmm: backport 2.6.32] Signed-off-by: Willy Tarreau <w@1wt.eu>
2014-05-19KVM: x86: Fix potential divide by 0 in lapic (CVE-2013-6367)Andy Honig
commit b963a22e6d1a266a67e9eecc88134713fd54775c upstream Under guest controllable circumstances apic_get_tmcct will execute a divide by zero and cause a crash. If the guest cpuid support tsc deadline timers and performs the following sequence of requests the host will crash. - Set the mode to periodic - Set the TMICT to 0 - Set the mode bits to 11 (neither periodic, nor one shot, nor tsc deadline) - Set the TMICT to non-zero. Then the lapic_timer.period will be 0, but the TMICT will not be. If the guest then reads from the TMCCT then the host will perform a divide by 0. This patch ensures that if the lapic_timer.period is 0, then the division does not occur. Reported-by: Andrew Honig <ahonig@google.com> Cc: stable@vger.kernel.org Signed-off-by: Andrew Honig <ahonig@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> [dannf: backported to Debian's 2.6.32] Signed-off-by: Willy Tarreau <w@1wt.eu>
2014-05-19KVM: Improve create VCPU parameter (CVE-2013-4587)Andy Honig
commit 338c7dbadd2671189cec7faf64c84d01071b3f96 upstream In multiple functions the vcpu_id is used as an offset into a bitfield. Ag malicious user could specify a vcpu_id greater than 255 in order to set or clear bits in kernel memory. This could be used to elevate priveges in the kernel. This patch verifies that the vcpu_id provided is less than 255. The api documentation already specifies that the vcpu_id must be less than max_vcpus, but this is currently not checked. Reported-by: Andrew Honig <ahonig@google.com> Cc: stable@vger.kernel.org Signed-off-by: Andrew Honig <ahonig@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Willy Tarreau <w@1wt.eu>
2014-05-19uml: check length in exitcode_proc_write()Dan Carpenter
commit 201f99f170df14ba52ea4c52847779042b7a623b upstream We don't cap the size of buffer from the user so we could write past the end of the array here. Only root can write to this file. Reported-by: Nico Golde <nico@ngolde.de> Reported-by: Fabian Yamaguchi <fabs@goesec.de> Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Cc: stable@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Willy Tarreau <w@1wt.eu>
2014-05-19crypto: ansi_cprng - Fix off by one error in non-block size requestNeil Horman
commit 714b33d15130cbb5ab426456d4e3de842d6c5b8a upstream Stephan Mueller reported to me recently a error in random number generation in the ansi cprng. If several small requests are made that are less than the instances block size, the remainder for loop code doesn't increment rand_data_valid in the last iteration, meaning that the last bytes in the rand_data buffer gets reused on the subsequent smaller-than-a-block request for random data. The fix is pretty easy, just re-code the for loop to make sure that rand_data_valid gets incremented appropriately Signed-off-by: Neil Horman <nhorman@tuxdriver.com> Reported-by: Stephan Mueller <stephan.mueller@atsec.com> CC: Stephan Mueller <stephan.mueller@atsec.com> CC: Petr Matousek <pmatouse@redhat.com> CC: Herbert Xu <herbert@gondor.apana.org.au> CC: "David S. Miller" <davem@davemloft.net> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Willy Tarreau <w@1wt.eu>
2014-05-19dm snapshot: fix data corruptionMikulas Patocka
CVE-2013-4299 BugLink: http://bugs.launchpad.net/bugs/1241769 This patch fixes a particular type of data corruption that has been encountered when loading a snapshot's metadata from disk. When we allocate a new chunk in persistent_prepare, we increment ps->next_free and we make sure that it doesn't point to a metadata area by further incrementing it if necessary. When we load metadata from disk on device activation, ps->next_free is positioned after the last used data chunk. However, if this last used data chunk is followed by a metadata area, ps->next_free is positioned erroneously to the metadata area. A newly-allocated chunk is placed at the same location as the metadata area, resulting in data or metadata corruption. This patch changes the code so that ps->next_free skips the metadata area when metadata are loaded in function read_exceptions. The patch also moves a piece of code from persistent_prepare_exception to a separate function skip_metadata to avoid code duplication. CVE-2013-4299 Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Cc: stable@vger.kernel.org Cc: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com> (back ported from commit e9c6a182649f4259db704ae15a91ac820e63b0ca) Signed-off-by: Luis Henriques <luis.henriques@canonical.com> Acked-by: Stefan Bader <stefan.bader@canonical.com> Signed-off-by: Tim Gardner <tim.gardner@canonical.com> Signed-off-by: Willy Tarreau <w@1wt.eu>
2014-05-19ipv6: call udp_push_pending_frames when uncorking a socket with AF_INET ↵Hannes Frederic Sowa
pending data CVE-2013-4162 BugLink: http://bugs.launchpad.net/bugs/1205070 We accidentally call down to ip6_push_pending_frames when uncorking pending AF_INET data on a ipv6 socket. This results in the following splat (from Dave Jones): skbuff: skb_under_panic: text:ffffffff816765f6 len:48 put:40 head:ffff88013deb6df0 data:ffff88013deb6dec tail:0x2c end:0xc0 dev:<NULL> ------------[ cut here ]------------ kernel BUG at net/core/skbuff.c:126! invalid opcode: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC Modules linked in: dccp_ipv4 dccp 8021q garp bridge stp dlci mpoa snd_seq_dummy sctp fuse hidp tun bnep nfnetlink scsi_transport_iscsi rfcomm can_raw can_bcm af_802154 appletalk caif_socket can caif ipt_ULOG x25 rose af_key pppoe pppox ipx phonet irda llc2 ppp_generic slhc p8023 psnap p8022 llc crc_ccitt atm bluetooth +netrom ax25 nfc rfkill rds af_rxrpc coretemp hwmon kvm_intel kvm crc32c_intel snd_hda_codec_realtek ghash_clmulni_intel microcode pcspkr snd_hda_codec_hdmi snd_hda_intel snd_hda_codec snd_hwdep usb_debug snd_seq snd_seq_device snd_pcm e1000e snd_page_alloc snd_timer ptp snd pps_core soundcore xfs libcrc32c CPU: 2 PID: 8095 Comm: trinity-child2 Not tainted 3.10.0-rc7+ #37 task: ffff8801f52c2520 ti: ffff8801e6430000 task.ti: ffff8801e6430000 RIP: 0010:[<ffffffff816e759c>] [<ffffffff816e759c>] skb_panic+0x63/0x65 RSP: 0018:ffff8801e6431de8 EFLAGS: 00010282 RAX: 0000000000000086 RBX: ffff8802353d3cc0 RCX: 0000000000000006 RDX: 0000000000003b90 RSI: ffff8801f52c2ca0 RDI: ffff8801f52c2520 RBP: ffff8801e6431e08 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000001 R11: 0000000000000001 R12: ffff88022ea0c800 R13: ffff88022ea0cdf8 R14: ffff8802353ecb40 R15: ffffffff81cc7800 FS: 00007f5720a10740(0000) GS:ffff880244c00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000005862000 CR3: 000000022843c000 CR4: 00000000001407e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600 Stack: ffff88013deb6dec 000000000000002c 00000000000000c0 ffffffff81a3f6e4 ffff8801e6431e18 ffffffff8159a9aa ffff8801e6431e90 ffffffff816765f6 ffffffff810b756b 0000000700000002 ffff8801e6431e40 0000fea9292aa8c0 Call Trace: [<ffffffff8159a9aa>] skb_push+0x3a/0x40 [<ffffffff816765f6>] ip6_push_pending_frames+0x1f6/0x4d0 [<ffffffff810b756b>] ? mark_held_locks+0xbb/0x140 [<ffffffff81694919>] udp_v6_push_pending_frames+0x2b9/0x3d0 [<ffffffff81694660>] ? udplite_getfrag+0x20/0x20 [<ffffffff8162092a>] udp_lib_setsockopt+0x1aa/0x1f0 [<ffffffff811cc5e7>] ? fget_light+0x387/0x4f0 [<ffffffff816958a4>] udpv6_setsockopt+0x34/0x40 [<ffffffff815949f4>] sock_common_setsockopt+0x14/0x20 [<ffffffff81593c31>] SyS_setsockopt+0x71/0xd0 [<ffffffff816f5d54>] tracesys+0xdd/0xe2 Code: 00 00 48 89 44 24 10 8b 87 d8 00 00 00 48 89 44 24 08 48 8b 87 e8 00 00 00 48 c7 c7 c0 04 aa 81 48 89 04 24 31 c0 e8 e1 7e ff ff <0f> 0b 55 48 89 e5 0f 0b 55 48 89 e5 0f 0b 55 48 89 e5 0f 0b 55 RIP [<ffffffff816e759c>] skb_panic+0x63/0x65 RSP <ffff8801e6431de8> This patch adds a check if the pending data is of address family AF_INET and directly calls udp_push_ending_frames from udp_v6_push_pending_frames if that is the case. This bug was found by Dave Jones with trinity. (Also move the initialization of fl6 below the AF_INET check, even if not strictly necessary.) Cc: Dave Jones <davej@redhat.com> Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net> (back ported from commit 8822b64a0fa64a5dd1dfcf837c5b0be83f8c05d1) Signed-off-by: Luis Henriques <luis.henriques@canonical.com> Acked-by: Brad Figg <brad.figg@canonical.com> Signed-off-by: Tim Gardner <tim.gardner@canonical.com> Signed-off-by: Willy Tarreau <w@1wt.eu>
2014-05-19exec/ptrace: fix get_dumpable() incorrect testsKees Cook
commit d049f74f2dbe71354d43d393ac3a188947811348 upstream The get_dumpable() return value is not boolean. Most users of the function actually want to be testing for non-SUID_DUMP_USER(1) rather than SUID_DUMP_DISABLE(0). The SUID_DUMP_ROOT(2) is also considered a protected state. Almost all places did this correctly, excepting the two places fixed in this patch. Wrong logic: if (dumpable == SUID_DUMP_DISABLE) { /* be protective */ } or if (dumpable == 0) { /* be protective */ } or if (!dumpable) { /* be protective */ } Correct logic: if (dumpable != SUID_DUMP_USER) { /* be protective */ } or if (dumpable != 1) { /* be protective */ } Without this patch, if the system had set the sysctl fs/suid_dumpable=2, a user was able to ptrace attach to processes that had dropped privileges to that user. (This may have been partially mitigated if Yama was enabled.) The macros have been moved into the file that declares get/set_dumpable(), which means things like the ia64 code can see them too. CVE-2013-2929 Reported-by: Vasily Kulikov <segoon@openwall.com> Signed-off-by: Kees Cook <keescook@chromium.org> Cc: Tony Luck <tony.luck@intel.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> [dannf: backported to Debian's 2.6.32] Signed-off-by: Willy Tarreau <w@1wt.eu>
2014-05-19n_tty: Fix n_tty_write crash when echoing in raw modePeter Hurley
The tty atomic_write_lock does not provide an exclusion guarantee for the tty driver if the termios settings are LECHO & !OPOST. And since it is unexpected and not allowed to call TTY buffer helpers like tty_insert_flip_string concurrently, this may lead to crashes when concurrect writers call pty_write. In that case the following two writers: * the ECHOing from a workqueue and * pty_write from the process race and can overflow the corresponding TTY buffer like follows. If we look into tty_insert_flip_string_fixed_flag, there is: int space = __tty_buffer_request_room(port, goal, flags); struct tty_buffer *tb = port->buf.tail; ... memcpy(char_buf_ptr(tb, tb->used), chars, space); ... tb->used += space; so the race of the two can result in something like this: A B __tty_buffer_request_room __tty_buffer_request_room memcpy(buf(tb->used), ...) tb->used += space; memcpy(buf(tb->used), ...) ->BOOM B's memcpy is past the tty_buffer due to the previous A's tb->used increment. Since the N_TTY line discipline input processing can output concurrently with a tty write, obtain the N_TTY ldisc output_lock to serialize echo output with normal tty writes. This ensures the tty buffer helper tty_insert_flip_string is not called concurrently and everything is fine. Note that this is nicely reproducible by an ordinary user using forkpty and some setup around that (raw termios + ECHO). And it is present in kernels at least after commit d945cb9cce20ac7143c2de8d88b187f62db99bdc (pty: Rework the pty layer to use the normal buffering logic) in 2.6.31-rc3. js: add more info to the commit log js: switch to bool js: lock unconditionally js: lock only the tty->ops->write call References: CVE-2014-0196 Reported-and-tested-by: Jiri Slaby <jslaby@suse.cz> Signed-off-by: Peter Hurley <peter@hurleysoftware.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Cc: <stable@vger.kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> (cherry picked from commit 4291086b1f081b869c6d79e5b7441633dc3ace00) [wt: 2.6.32 has no n_tty_data, so output_lock is in tty, not tty->disc_data] Signed-off-by: Willy Tarreau <w@1wt.eu>
2014-05-19tcp_cubic: fix the range of delayed_ackLiu Yu
commit b9f47a3aaeab (tcp_cubic: limit delayed_ack ratio to prevent divide error) try to prevent divide error, but there is still a little chance that delayed_ack can reach zero. In case the param cnt get negative value, then ratio+cnt would overflow and may happen to be zero. As a result, min(ratio, ACK_RATIO_LIMIT) will calculate to be zero. In some old kernels, such as 2.6.32, there is a bug that would pass negative param, which then ultimately leads to this divide error. commit 5b35e1e6e9c (tcp: fix tcp_trim_head() to adjust segment count with skb MSS) fixed the negative param issue. However, it's safe that we fix the range of delayed_ack as well, to make sure we do not hit a divide by zero. CC: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: Liu Yu <allanyuliu@tencent.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> (cherry picked from commit 0cda345d1b2201dd15591b163e3c92bad5191745) Signed-off-by: Willy Tarreau <w@1wt.eu>
2014-05-19tcp_cubic: limit delayed_ack ratio to prevent divide errorstephen hemminger
TCP Cubic keeps a metric that estimates the amount of delayed acknowledgements to use in adjusting the window. If an abnormally large number of packets are acknowledged at once, then the update could wrap and reach zero. This kind of ACK could only happen when there was a large window and huge number of ACK's were lost. This patch limits the value of delayed ack ratio. The choice of 32 is just a conservative value since normally it should be range of 1 to 4 packets. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net> (cherry picked from commit b9f47a3aaeabdce3b42829bbb27765fa340f76ba) [wt: in 2.6.32, this fix is needed for the next one] Signed-off-by: Willy Tarreau <w@1wt.eu>
2014-05-19tcp: fix tcp_trim_head() to adjust segment count with skb MSSNeal Cardwell
This commit fixes tcp_trim_head() to recalculate the number of segments in the skb with the skb's existing MSS, so trimming the head causes the skb segment count to be monotonically non-increasing - it should stay the same or go down, but not increase. Previously tcp_trim_head() used the current MSS of the connection. But if there was a decrease in MSS between original transmission and ACK (e.g. due to PMTUD), this could cause tcp_trim_head() to counter-intuitively increase the segment count when trimming bytes off the head of an skb. This violated assumptions in tcp_tso_acked() that tcp_trim_head() only decreases the packet count, so that packets_acked in tcp_tso_acked() could underflow, leading tcp_clean_rtx_queue() to pass u32 pkts_acked values as large as 0xffffffff to ca_ops->pkts_acked(). As an aside, if tcp_trim_head() had really wanted the skb to reflect the current MSS, it should have called tcp_set_skb_tso_segs() unconditionally, since a decrease in MSS would mean that a single-packet skb should now be sliced into multiple segments. Signed-off-by: Neal Cardwell <ncardwell@google.com> Acked-by: Nandita Dukkipati <nanditad@google.com> Acked-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net> (cherry picked from commit 5b35e1e6e9ca651e6b291c96d1106043c9af314a) Signed-off-by: Willy Tarreau <w@1wt.eu>
2014-05-19powernow-k6: reorder frequenciesMikulas Patocka
commit 22c73795b101597051924556dce019385a1e2fa0 upstream This patch reorders reported frequencies from the highest to the lowest, just like in other frequency drivers. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Willy Tarreau <w@1wt.eu>
2014-05-19powernow-k6: correctly initialize default parametersMikulas Patocka
commit d82b922a4acc1781d368aceac2f9da43b038cab2 upstream The powernow-k6 driver used to read the initial multiplier from the powernow register. However, there is a problem with this: * If there was a frequency transition before, the multiplier read from the register corresponds to the current multiplier. * If there was no frequency transition since reset, the field in the register always reads as zero, regardless of the current multiplier that is set using switches on the mainboard and that the CPU is running at. The zero value corresponds to multiplier 4.5, so as a consequence, the powernow-k6 driver always assumes multiplier 4.5. For example, if we have 550MHz CPU with bus frequency 100MHz and multiplier 5.5, the powernow-k6 driver thinks that the multiplier is 4.5 and bus frequency is 122MHz. The powernow-k6 driver then sets the multiplier to 4.5, underclocking the CPU to 450MHz, but reports the current frequency as 550MHz. There is no reliable way how to read the initial multiplier. I modified the driver so that it contains a table of known frequencies (based on parameters of existing CPUs and some common overclocking schemes) and sets the multiplier according to the frequency. If the frequency is unknown (because of unusual overclocking or underclocking), the user must supply the bus speed and maximum multiplier as module parameters. This patch should be backported to all stable kernels. If it doesn't apply cleanly, change it, or ask me to change it. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Willy Tarreau <w@1wt.eu>
2014-05-19powernow-k6: disable cache when changing frequencyMikulas Patocka
commit e20e1d0ac02308e2211306fc67abcd0b2668fb8b upstream I found out that a system with k6-3+ processor is unstable during network server load. The system locks up or the network card stops receiving. The reason for the instability is the CPU frequency scaling. During frequency transition the processor is in "EPM Stop Grant" state. The documentation says that the processor doesn't respond to inquiry requests in this state. Consequently, coherency of processor caches and bus master devices is not maintained, causing the system instability. This patch flushes the cache during frequency transition. It fixes the instability. Other minor changes: * u64 invalue changed to unsigned long because the variable is 32-bit * move the logic to set the multiplier to a separate function powernow_k6_set_cpu_multiplier * preserve lower 5 bits of the powernow port instead of 4 (the voltage field has 5 bits) * mask interrupts when reading the multiplier, so that the port is not open during other activity (running other kernel code with the port open shouldn't cause any misbehavior, but we should better be safe and keep the port closed) This patch should be backported to all stable kernels. If it doesn't apply cleanly, change it, or ask me to change it. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Willy Tarreau <w@1wt.eu>
2014-05-19powernow-k6: set transition latency value so ondemand governor can be usedKrzysztof Helt
Set the transition latency to value smaller than CPUFREQ_ETERNAL so governors other than "performance" work (like the "ondemand" one). The value is found in "AMD PowerNow! Technology Platform Design Guide for Embedded Processors" dated December 2000 (AMD doc #24267A). There is the answer to one of FAQs on page 40 which states that suggested complete transition period is 200 us. Tested on K6-2+ CPU with K6-3 core (model 13, stepping 4). Signed-off-by: Krzysztof Helt <krzysztof.h1@wp.pl> Signed-off-by: Dave Jones <davej@redhat.com> (cherry picked from commit db2820dd5445a44b4726f15a2bc89b9ded2503eb) [wt: in 2.6.32, we only need this one so that next series applies cleanly] Signed-off-by: Willy Tarreau <w@1wt.eu>
2014-05-19gianfar: disable TX vlan based on kernel 2.6.xZhu Yanjun
2.6.x kernels require a similar logic change as commit e1653c3e [gianfar: do vlan cleanup] and commit 51b8cbfc [gianfar: fix bug caused by e1653c3e] introduces for newer kernels. Since there is something wrong with tx vlan of gianfar nic driver, in kernel(3.1+), tx vlan is disabled. But in kernel 2.6.x, tx vlan is still enabled. Thus,gianfar nic driver can not support vlan packets and non-vlan packets at the same time. Signed-off-by: Zhu Yanjun <Yanjun.Zhu@windriver.com> Signed-off-by: Willy Tarreau <w@1wt.eu>
2014-05-19x86, fpu, amd: Clear exceptions in AMD FXSAVE workaroundLinus Torvalds
Before we do an EMMS in the AMD FXSAVE information leak workaround we need to clear any pending exceptions, otherwise we trap with a floating-point exception inside this code. Reported-by: halfdog <me@halfdog.net> Tested-by: Borislav Petkov <bp@suse.de> Link: http://lkml.kernel.org/r/CA%2B55aFxQnY_PCG_n4=0w-VG=YLXL-yr7oMxyy0WU2gCBAf3ydg@mail.gmail.com Signed-off-by: H. Peter Anvin <hpa@zytor.com> (cherry picked from commit 26bef1318adc1b3a530ecc807ef99346db2aa8b0) [wt: in 2.6.32, patch applies to arch/x86/include/asm/i387.h. There's no static_cpu_has() so we use boot_cpu_has() like other kernels do with gcc3. ] Signed-off-by: Willy Tarreau <w@1wt.eu>
2014-05-19libertas: potential oops in debugfsDan Carpenter
If we do a zero size allocation then it will oops. Also we can't be sure the user passes us a NUL terminated string so I've added a terminator. This code can only be triggered by root. Reported-by: Nico Golde <nico@ngolde.de> Reported-by: Fabian Yamaguchi <fabs@goesec.de> Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: Dan Williams <dcbw@redhat.com> Signed-off-by: John W. Linville <linville@tuxdriver.com> (cherry picked from commit a497e47d4aec37aaf8f13509f3ef3d1f6a717d88) Signed-off-by: Willy Tarreau <w@1wt.eu>
2014-05-19Fix a few incorrectly checked [io_]remap_pfn_range() callsLinus Torvalds
Nico Golde reports a few straggling uses of [io_]remap_pfn_range() that really should use the vm_iomap_memory() helper. This trivially converts two of them to the helper, and comments about why the third one really needs to continue to use remap_pfn_range(), and adds the missing size check. Reported-by: Nico Golde <nico@ngolde.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> (cherry picked from commit 7314e613d5ff9f0934f7a0f74ed7973b903315d1) [wt: vm_flags were absent in mainline, Ben removed them in 3.2, but I kept them to minimize changes and avoid any side effect] Signed-off-by: Willy Tarreau <w@1wt.eu>
2014-05-19vm: add vm_iomap_memory() helper functionLinus Torvalds
Various drivers end up replicating the code to mmap() their memory buffers into user space, and our core memory remapping function may be very flexible but it is unnecessarily complicated for the common cases to use. Our internal VM uses pfn's ("page frame numbers") which simplifies things for the VM, and allows us to pass physical addresses around in a denser and more efficient format than passing a "phys_addr_t" around, and having to shift it up and down by the page size. But it just means that drivers end up doing that shifting instead at the interface level. It also means that drivers end up mucking around with internal VM things like the vma details (vm_pgoff, vm_start/end) way more than they really need to. So this just exports a function to map a certain physical memory range into user space (using a phys_addr_t based interface that is much more natural for a driver) and hides all the complexity from the driver. Some drivers will still end up tweaking the vm_page_prot details for things like prefetching or cacheability etc, but that's actually relevant to the driver, rather than caring about what the page offset of the mapping is into the particular IO memory region. Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> (cherry picked from commit b4cbb197c7e7a68dbad0d491242e3ca67420c13e) [WT: only needed in 2.6.32 for next commit] Signed-off-by: Willy Tarreau <w@1wt.eu>
2014-05-19inet: fix possible memory corruption with UDP_CORK and UFOHannes Frederic Sowa
[ This is a simplified -stable version of a set of upstream commits. ] This is a replacement patch only for stable which does fix the problems handled by the following two commits in -net: "ip_output: do skb ufo init for peeked non ufo skb as well" (e93b7d748be887cd7639b113ba7d7ef792a7efb9) "ip6_output: do skb ufo init for peeked non ufo skb as well" (c547dbf55d5f8cf615ccc0e7265e98db27d3fb8b) Three frames are written on a corked udp socket for which the output netdevice has UFO enabled. If the first and third frame are smaller than the mtu and the second one is bigger, we enqueue the second frame with skb_append_datato_frags without initializing the gso fields. This leads to the third frame appended regulary and thus constructing an invalid skb. This fixes the problem by always using skb_append_datato_frags as soon as the first frag got enqueued to the skb without marking the packet as SKB_GSO_UDP. The problem with only two frames for ipv6 was fixed by "ipv6: udp packets following an UFO enqueued packet need also be handled by UFO" (2811ebac2521ceac84f2bdae402455baa6a7fb47). Cc: Jiri Pirko <jiri@resnulli.us> Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: David Miller <davem@davemloft.net> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk> (cherry picked from commit 5124ae99ac8a8f63d0fca9b75adaef40b20678ff) Signed-off-by: Willy Tarreau <w@1wt.eu>
2014-05-19ipv6: udp packets following an UFO enqueued packet need also be handled by UFOHannes Frederic Sowa
In the following scenario the socket is corked: If the first UDP packet is larger then the mtu we try to append it to the write queue via ip6_ufo_append_data. A following packet, which is smaller than the mtu would be appended to the already queued up gso-skb via plain ip6_append_data. This causes random memory corruptions. In ip6_ufo_append_data we also have to be careful to not queue up the same skb multiple times. So setup the gso frame only when no first skb is available. This also fixes a shortcoming where we add the current packet's length to cork->length but return early because of a packet > mtu with dontfrag set (instead of sutracting it again). Found with trinity. Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Reported-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> (cherry picked from commit 2811ebac2521ceac84f2bdae402455baa6a7fb47) [wt: 2.6.32 doesn't have dontfrag so remove the optimization] Signed-off-by: Willy Tarreau <w@1wt.eu>
2014-05-19aacraid: prevent invalid pointer dereferenceMahesh Rajashekhara
It appears that driver runs into a problem here if fibsize is too small because we allocate user_srbcmd with fibsize size only but later we access it until user_srbcmd->sg.count to copy it over to srbcmd. It is not correct to test (fibsize < sizeof(*user_srbcmd)) because this structure already includes one sg element and this is not needed for commands without data. So, we would recommend to add the following (instead of test for fibsize == 0). Signed-off-by: Mahesh Rajashekhara <Mahesh.Rajashekhara@pmcs.com> Reported-by: Nico Golde <nico@ngolde.de> Reported-by: Fabian Yamaguchi <fabs@goesec.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> (cherry picked from commit b4789b8e6be3151a955ade74872822f30e8cd914) CVE-2013-6380 Signed-off-by: Willy Tarreau <w@1wt.eu>
2014-05-19sctp: unbalanced rcu lock in ip_queue_xmit()Nicolas Dichtel
The bug was introduced in by commit b8710128e201 ("inet: add RCU protection to inet->opt") (it's a backport of upstream commit f6d8bd051c39). In SCTP case, packet is already routed, hence we jump to the label 'packet_routed', but without rcu_read_lock(). After this label, rcu_read_unlock() is called unconditionally. Spotted-by: Guo Fengtian <fengtian.guo@6wind.com> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: Willy Tarreau <w@1wt.eu>
2014-05-19isdnloop: Validate NUL-terminated strings from user.YOSHIFUJI Hideaki
[ Upstream commit 77bc6bed7121936bb2e019a8c336075f4c8eef62 ] Return -EINVAL unless all of user-given strings are correctly NUL-terminated. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Willy Tarreau <w@1wt.eu>
2014-05-19rds: prevent dereference of a NULL device in rds_iw_laddr_checkSasha Levin
[ Upstream commit bf39b4247b8799935ea91d90db250ab608a58e50 ] Binding might result in a NULL device which is later dereferenced without checking. Signed-off-by: Sasha Levin <sasha.levin@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Willy Tarreau <w@1wt.eu>
2014-05-19isdnloop: several buffer overflowsDan Carpenter
[ Upstream commit 7563487cbf865284dcd35e9ef5a95380da046737 ] There are three buffer overflows addressed in this patch. 1) In isdnloop_fake_err() we add an 'E' to a 60 character string and then copy it into a 60 character buffer. I have made the destination buffer 64 characters and I'm changed the sprintf() to a snprintf(). 2) In isdnloop_parse_cmd(), p points to a 6 characters into a 60 character buffer so we have 54 characters. The ->eazlist[] is 11 characters long. I have modified the code to return if the source buffer is too long. 3) In isdnloop_command() the cbuf[] array was 60 characters long but the max length of the string then can be up to 79 characters. I made the cbuf array 80 characters long and changed the sprintf() to snprintf(). I also removed the temporary "dial" buffer and changed it to use "p" directly. Unfortunately, we pass the "cbuf" string from isdnloop_command() to isdnloop_writecmd() which truncates anything over 60 characters to make it fit in card->omsg[]. (It can accept values up to 255 characters so long as there is a '\n' character every 60 characters). For now I have just fixed the memory corruption bug and left the other problems in this driver alone. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Willy Tarreau <w@1wt.eu>
2014-05-19netlink: don't compare the nul-termination in nla_strcmpPablo Neira
[ Upstream commit 8b7b932434f5eee495b91a2804f5b64ebb2bc835 ] nla_strcmp compares the string length plus one, so it's implicitly including the nul-termination in the comparison. int nla_strcmp(const struct nlattr *nla, const char *str) { int len = strlen(str) + 1; ... d = memcmp(nla_data(nla), str, len); However, if NLA_STRING is used, userspace can send us a string without the nul-termination. This is a problem since the string comparison will not match as the last byte may be not the nul-termination. Fix this by skipping the comparison of the nul-termination if the attribute data is nul-terminated. Suggested by Thomas Graf. Cc: Florian Westphal <fw@strlen.de> Cc: Thomas Graf <tgraf@suug.ch> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Willy Tarreau <w@1wt.eu>
2014-05-19net: socket: error on a negative msg_namelenMatthew Leach
[ Upstream commit dbb490b96584d4e958533fb637f08b557f505657 ] When copying in a struct msghdr from the user, if the user has set the msg_namelen parameter to a negative value it gets clamped to a valid size due to a comparison between signed and unsigned values. Ensure the syscall errors when the user passes in a negative value. Signed-off-by: Matthew Leach <matthew.leach@arm.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Willy Tarreau <w@1wt.eu>
2014-05-19net: sctp: fix skb leakage in COOKIE ECHO path of chunk->auth_chunkDaniel Borkmann
[ Upstream commit c485658bae87faccd7aed540fd2ca3ab37992310 ] While working on ec0223ec48a9 ("net: sctp: fix sctp_sf_do_5_1D_ce to verify if we/peer is AUTH capable"), we noticed that there's a skb memory leakage in the error path. Running the same reproducer as in ec0223ec48a9 and by unconditionally jumping to the error label (to simulate an error condition) in sctp_sf_do_5_1D_ce() receive path lets kmemleak detector bark about the unfreed chunk->auth_chunk skb clone: Unreferenced object 0xffff8800b8f3a000 (size 256): comm "softirq", pid 0, jiffies 4294769856 (age 110.757s) hex dump (first 32 bytes): 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 89 ab 75 5e d4 01 58 13 00 00 00 00 00 00 00 00 ..u^..X......... backtrace: [<ffffffff816660be>] kmemleak_alloc+0x4e/0xb0 [<ffffffff8119f328>] kmem_cache_alloc+0xc8/0x210 [<ffffffff81566929>] skb_clone+0x49/0xb0 [<ffffffffa0467459>] sctp_endpoint_bh_rcv+0x1d9/0x230 [sctp] [<ffffffffa046fdbc>] sctp_inq_push+0x4c/0x70 [sctp] [<ffffffffa047e8de>] sctp_rcv+0x82e/0x9a0 [sctp] [<ffffffff815abd38>] ip_local_deliver_finish+0xa8/0x210 [<ffffffff815a64af>] nf_reinject+0xbf/0x180 [<ffffffffa04b4762>] nfqnl_recv_verdict+0x1d2/0x2b0 [nfnetlink_queue] [<ffffffffa04aa40b>] nfnetlink_rcv_msg+0x14b/0x250 [nfnetlink] [<ffffffff815a3269>] netlink_rcv_skb+0xa9/0xc0 [<ffffffffa04aa7cf>] nfnetlink_rcv+0x23f/0x408 [nfnetlink] [<ffffffff815a2bd8>] netlink_unicast+0x168/0x250 [<ffffffff815a2fa1>] netlink_sendmsg+0x2e1/0x3f0 [<ffffffff8155cc6b>] sock_sendmsg+0x8b/0xc0 [<ffffffff8155d449>] ___sys_sendmsg+0x369/0x380 What happens is that commit bbd0d59809f9 clones the skb containing the AUTH chunk in sctp_endpoint_bh_rcv() when having the edge case that an endpoint requires COOKIE-ECHO chunks to be authenticated: ---------- INIT[RANDOM; CHUNKS; HMAC-ALGO] ----------> <------- INIT-ACK[RANDOM; CHUNKS; HMAC-ALGO] --------- ------------------ AUTH; COOKIE-ECHO ----------------> <-------------------- COOKIE-ACK --------------------- When we enter sctp_sf_do_5_1D_ce() and before we actually get to the point where we process (and subsequently free) a non-NULL chunk->auth_chunk, we could hit the "goto nomem_init" path from an error condition and thus leave the cloned skb around w/o freeing it. The fix is to centrally free such clones in sctp_chunk_destroy() handler that is invoked from sctp_chunk_free() after all refs have dropped; and also move both kfree_skb(chunk->auth_chunk) there, so that chunk->auth_chunk is either NULL (since sctp_chunkify() allocs new chunks through kmem_cache_zalloc()) or non-NULL with a valid skb pointer. chunk->skb and chunk->auth_chunk are the only skbs in the sctp_chunk structure that need to be handeled. While at it, we should use consume_skb() for both. It is the same as dev_kfree_skb() but more appropriately named as we are not a device but a protocol. Also, this effectively replaces the kfree_skb() from both invocations into consume_skb(). Functions are the same only that kfree_skb() assumes that the frame was being dropped after a failure (e.g. for tools like drop monitor), usage of consume_skb() seems more appropriate in function sctp_chunk_destroy() though. Fixes: bbd0d59809f9 ("[SCTP]: Implement the receive and verification of AUTH chunk") Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Cc: Vlad Yasevich <yasevich@gmail.com> Cc: Neil Horman <nhorman@tuxdriver.com> Acked-by: Vlad Yasevich <vyasevich@gmail.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Willy Tarreau <w@1wt.eu>
2014-05-19net: sctp: fix sctp_sf_do_5_1D_ce to verify if we/peer is AUTH capableDaniel Borkmann
[ Upstream commit ec0223ec48a90cb605244b45f7c62de856403729 ] RFC4895 introduced AUTH chunks for SCTP; during the SCTP handshake RANDOM; CHUNKS; HMAC-ALGO are negotiated (CHUNKS being optional though): ---------- INIT[RANDOM; CHUNKS; HMAC-ALGO] ----------> <------- INIT-ACK[RANDOM; CHUNKS; HMAC-ALGO] --------- -------------------- COOKIE-ECHO --------------------> <-------------------- COOKIE-ACK --------------------- A special case is when an endpoint requires COOKIE-ECHO chunks to be authenticated: ---------- INIT[RANDOM; CHUNKS; HMAC-ALGO] ----------> <------- INIT-ACK[RANDOM; CHUNKS; HMAC-ALGO] --------- ------------------ AUTH; COOKIE-ECHO ----------------> <-------------------- COOKIE-ACK --------------------- RFC4895, section 6.3. Receiving Authenticated Chunks says: The receiver MUST use the HMAC algorithm indicated in the HMAC Identifier field. If this algorithm was not specified by the receiver in the HMAC-ALGO parameter in the INIT or INIT-ACK chunk during association setup, the AUTH chunk and all the chunks after it MUST be discarded and an ERROR chunk SHOULD be sent with the error cause defined in Section 4.1. [...] If no endpoint pair shared key has been configured for that Shared Key Identifier, all authenticated chunks MUST be silently discarded. [...] When an endpoint requires COOKIE-ECHO chunks to be authenticated, some special procedures have to be followed because the reception of a COOKIE-ECHO chunk might result in the creation of an SCTP association. If a packet arrives containing an AUTH chunk as a first chunk, a COOKIE-ECHO chunk as the second chunk, and possibly more chunks after them, and the receiver does not have an STCB for that packet, then authentication is based on the contents of the COOKIE-ECHO chunk. In this situation, the receiver MUST authenticate the chunks in the packet by using the RANDOM parameters, CHUNKS parameters and HMAC_ALGO parameters obtained from the COOKIE-ECHO chunk, and possibly a local shared secret as inputs to the authentication procedure specified in Section 6.3. If authentication fails, then the packet is discarded. If the authentication is successful, the COOKIE-ECHO and all the chunks after the COOKIE-ECHO MUST be processed. If the receiver has an STCB, it MUST process the AUTH chunk as described above using the STCB from the existing association to authenticate the COOKIE-ECHO chunk and all the chunks after it. [...] Commit bbd0d59809f9 introduced the possibility to receive and verification of AUTH chunk, including the edge case for authenticated COOKIE-ECHO. On reception of COOKIE-ECHO, the function sctp_sf_do_5_1D_ce() handles processing, unpacks and creates a new association if it passed sanity checks and also tests for authentication chunks being present. After a new association has been processed, it invokes sctp_process_init() on the new association and walks through the parameter list it received from the INIT chunk. It checks SCTP_PARAM_RANDOM, SCTP_PARAM_HMAC_ALGO and SCTP_PARAM_CHUNKS, and copies them into asoc->peer meta data (peer_random, peer_hmacs, peer_chunks) in case sysctl -w net.sctp.auth_enable=1 is set. If in INIT's SCTP_PARAM_SUPPORTED_EXT parameter SCTP_CID_AUTH is set, peer_random != NULL and peer_hmacs != NULL the peer is to be assumed asoc->peer.auth_capable=1, in any other case asoc->peer.auth_capable=0. Now, if in sctp_sf_do_5_1D_ce() chunk->auth_chunk is available, we set up a fake auth chunk and pass that on to sctp_sf_authenticate(), which at latest in sctp_auth_calculate_hmac() reliably dereferences a NULL pointer at position 0..0008 when setting up the crypto key in crypto_hash_setkey() by using asoc->asoc_shared_key that is NULL as condition key_id == asoc->active_key_id is true if the AUTH chunk was injected correctly from remote. This happens no matter what net.sctp.auth_enable sysctl says. The fix is to check for net->sctp.auth_enable and for asoc->peer.auth_capable before doing any operations like sctp_sf_authenticate() as no key is activated in sctp_auth_asoc_init_active_key() for each case. Now as RFC4895 section 6.3 states that if the used HMAC-ALGO passed from the INIT chunk was not used in the AUTH chunk, we SHOULD send an error; however in this case it would be better to just silently discard such a maliciously prepared handshake as we didn't even receive a parameter at all. Also, as our endpoint has no shared key configured, section 6.3 says that MUST silently discard, which we are doing from now onwards. Before calling sctp_sf_pdiscard(), we need not only to free the association, but also the chunk->auth_chunk skb, as commit bbd0d59809f9 created a skb clone in that case. I have tested this locally by using netfilter's nfqueue and re-injecting packets into the local stack after maliciously modifying the INIT chunk (removing RANDOM; HMAC-ALGO param) and the SCTP packet containing the COOKIE_ECHO (injecting AUTH chunk before COOKIE_ECHO). Fixed with this patch applied. Fixes: bbd0d59809f9 ("[SCTP]: Implement the receive and verification of AUTH chunk") Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Cc: Vlad Yasevich <yasevich@gmail.com> Cc: Neil Horman <nhorman@tuxdriver.com> Acked-by: Vlad Yasevich <vyasevich@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Willy Tarreau <w@1wt.eu>